Comparison of different scoring methods based on latent variable models of the PHQ-9: an individual participant data meta-analysis

Felix Fischer; Brooke Levis; Carl Falk; Ying Sun; John P A Ioannidis; Pim Cuijpers; Ian Shrier; Andrea Benedetti; Brett D Thombs; Depression Screening Data (DEPRESSD) PHQ Collaboration

doi:10.1017/S0033291721000131


Standard

Comparison of different scoring methods based on latent variable models of the PHQ-9 : an individual participant data meta-analysis. / Fischer, Felix; Levis, Brooke; Falk, Carl; Sun, Ying; Ioannidis, John P A; Cuijpers, Pim; Shrier, Ian; Benedetti, Andrea; Thombs, Brett D; Depression Screening Data (DEPRESSD) PHQ Collaboration.

In: PSYCHOL MED, Vol. 52, No. 15, 2022, p. 3472-3483.

Research output: SCORING: Contribution to journal › SCORING: Journal article › Research › peer-review

Harvard

Fischer, F, Levis, B, Falk, C, Sun, Y, Ioannidis, JPA, Cuijpers, P, Shrier, I, Benedetti, A, Thombs, BD & Depression Screening Data (DEPRESSD) PHQ Collaboration 2022, 'Comparison of different scoring methods based on latent variable models of the PHQ-9: an individual participant data meta-analysis', PSYCHOL MED, vol. 52, no. 15, pp. 3472-3483. https://doi.org/10.1017/S0033291721000131

APA

Fischer, F., Levis, B., Falk, C., Sun, Y., Ioannidis, J. P. A., Cuijpers, P., Shrier, I., Benedetti, A., Thombs, B. D., & Depression Screening Data (DEPRESSD) PHQ Collaboration (2022). Comparison of different scoring methods based on latent variable models of the PHQ-9: an individual participant data meta-analysis. PSYCHOL MED, 52(15), 3472-3483. https://doi.org/10.1017/S0033291721000131

Vancouver

Fischer F, Levis B, Falk C, Sun Y, Ioannidis JPA, Cuijpers P et al. Comparison of different scoring methods based on latent variable models of the PHQ-9: an individual participant data meta-analysis. PSYCHOL MED. 2022;52(15):3472-3483. https://doi.org/10.1017/S0033291721000131

Bibtex

@article{82eff493cc8e4527a3cc91e817acad0a,

title = "Comparison of different scoring methods based on latent variable models of the PHQ-9: an individual participant data meta-analysis",

abstract = "BACKGROUND: Previous research on the depression scale of the Patient Health Questionnaire (PHQ-9) has found that different latent factor models have maximized empirical measures of goodness-of-fit. The clinical relevance of these differences is unclear. We aimed to investigate whether depression screening accuracy may be improved by employing latent factor model-based scoring rather than sum scores. METHODS: We used an individual participant data meta-analysis (IPDMA) database compiled to assess the screening accuracy of the PHQ-9. We included studies that used the Structured Clinical Interview for DSM (SCID) as a reference standard and split those into calibration and validation datasets. In the calibration dataset, we estimated unidimensional, two-dimensional (separating cognitive/affective and somatic symptoms of depression), and bi-factor models, and the respective cut-offs to maximize combined sensitivity and specificity. In the validation dataset, we assessed the differences in (combined) sensitivity and specificity between the latent variable approaches and the optimal sum score (⩾10), using bootstrapping to estimate 95% confidence intervals for the differences. RESULTS: The calibration dataset included 24 studies (4378 participants, 652 major depression cases); the validation dataset 17 studies (4252 participants, 568 cases). In the validation dataset, optimal cut-offs of the unidimensional, two-dimensional, and bi-factor models had higher sensitivity (by 0.036, 0.050, 0.049 points, respectively) but lower specificity (0.017, 0.026, 0.019, respectively) compared to the sum score cut-off of ⩾10. CONCLUSIONS: In a comprehensive dataset of diagnostic studies, scoring using complex latent variable models do not improve screening accuracy of the PHQ-9 meaningfully as compared to the simple sum score approach.",

author = "Felix Fischer and Brooke Levis and Carl Falk and Ying Sun and Ioannidis, {John P A} and Pim Cuijpers and Ian Shrier and Andrea Benedetti and Thombs, {Brett D} and {Depression Screening Data (DEPRESSD) PHQ Collaboration} and Bernd L{\"o}we",

year = "2022",

doi = "10.1017/S0033291721000131",

language = "English",

volume = "52",

pages = "3472--3483",

journal = "PSYCHOL MED",

issn = "0033-2917",

publisher = "Cambridge University Press",

number = "15",

}

RIS

TY - JOUR

T1 - Comparison of different scoring methods based on latent variable models of the PHQ-9

T2 - an individual participant data meta-analysis

AU - Fischer, Felix

AU - Levis, Brooke

AU - Falk, Carl

AU - Sun, Ying

AU - Ioannidis, John P A

AU - Cuijpers, Pim

AU - Shrier, Ian

AU - Benedetti, Andrea

AU - Thombs, Brett D

AU - Depression Screening Data (DEPRESSD) PHQ Collaboration

AU - Löwe, Bernd

PY - 2022

Y1 - 2022

N2 - BACKGROUND: Previous research on the depression scale of the Patient Health Questionnaire (PHQ-9) has found that different latent factor models have maximized empirical measures of goodness-of-fit. The clinical relevance of these differences is unclear. We aimed to investigate whether depression screening accuracy may be improved by employing latent factor model-based scoring rather than sum scores. METHODS: We used an individual participant data meta-analysis (IPDMA) database compiled to assess the screening accuracy of the PHQ-9. We included studies that used the Structured Clinical Interview for DSM (SCID) as a reference standard and split those into calibration and validation datasets. In the calibration dataset, we estimated unidimensional, two-dimensional (separating cognitive/affective and somatic symptoms of depression), and bi-factor models, and the respective cut-offs to maximize combined sensitivity and specificity. In the validation dataset, we assessed the differences in (combined) sensitivity and specificity between the latent variable approaches and the optimal sum score (⩾10), using bootstrapping to estimate 95% confidence intervals for the differences. RESULTS: The calibration dataset included 24 studies (4378 participants, 652 major depression cases); the validation dataset 17 studies (4252 participants, 568 cases). In the validation dataset, optimal cut-offs of the unidimensional, two-dimensional, and bi-factor models had higher sensitivity (by 0.036, 0.050, 0.049 points, respectively) but lower specificity (0.017, 0.026, 0.019, respectively) compared to the sum score cut-off of ⩾10. CONCLUSIONS: In a comprehensive dataset of diagnostic studies, scoring using complex latent variable models do not improve screening accuracy of the PHQ-9 meaningfully as compared to the simple sum score approach.

AB - BACKGROUND: Previous research on the depression scale of the Patient Health Questionnaire (PHQ-9) has found that different latent factor models have maximized empirical measures of goodness-of-fit. The clinical relevance of these differences is unclear. We aimed to investigate whether depression screening accuracy may be improved by employing latent factor model-based scoring rather than sum scores. METHODS: We used an individual participant data meta-analysis (IPDMA) database compiled to assess the screening accuracy of the PHQ-9. We included studies that used the Structured Clinical Interview for DSM (SCID) as a reference standard and split those into calibration and validation datasets. In the calibration dataset, we estimated unidimensional, two-dimensional (separating cognitive/affective and somatic symptoms of depression), and bi-factor models, and the respective cut-offs to maximize combined sensitivity and specificity. In the validation dataset, we assessed the differences in (combined) sensitivity and specificity between the latent variable approaches and the optimal sum score (⩾10), using bootstrapping to estimate 95% confidence intervals for the differences. RESULTS: The calibration dataset included 24 studies (4378 participants, 652 major depression cases); the validation dataset 17 studies (4252 participants, 568 cases). In the validation dataset, optimal cut-offs of the unidimensional, two-dimensional, and bi-factor models had higher sensitivity (by 0.036, 0.050, 0.049 points, respectively) but lower specificity (0.017, 0.026, 0.019, respectively) compared to the sum score cut-off of ⩾10. CONCLUSIONS: In a comprehensive dataset of diagnostic studies, scoring using complex latent variable models do not improve screening accuracy of the PHQ-9 meaningfully as compared to the simple sum score approach.

U2 - 10.1017/S0033291721000131

DO - 10.1017/S0033291721000131

M3 - SCORING: Journal article

C2 - 33612144

VL - 52

SP - 3472

EP - 3483

JO - PSYCHOL MED

JF - PSYCHOL MED

SN - 0033-2917

IS - 15

ER -