Comparison of different scoring methods based on latent variable models of the PHQ-9: an individual participant data meta-analysis

Published online by Cambridge University Press: 22 February 2021

Felix Fischer*
Affiliation:
Department of Psychosomatic Medicine, Center for Internal Medicine and Dermatology, Charité – Universitätsmedizin Berlin, corporate member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health, Berlin, Germany
Brooke Levis
Affiliation:
Lady Davis Institute for Medical Research, Jewish General Hospital, Montréal, Québec, Canada; Department of Epidemiology, Biostatistics and Occupational Health, McGill University, Montréal, Québec, Canada; Centre for Prognosis Research, School of Primary, Community and Social Care, Keele University, Staffordshire, UK
Carl Falk
Affiliation:
Department of Psychology, McGill University, Montréal, Québec, Canada
Ying Sun
Affiliation:
Lady Davis Institute for Medical Research, Jewish General Hospital, Montréal, Québec, Canada
John P. A. Ioannidis
Affiliation:
Department of Medicine, Department of Epidemiology and Population Health, Department of Biomedical Data Science, Department of Statistics, Stanford University, Stanford, California, USA
Pim Cuijpers
Affiliation:
Department of Clinical, Neuro and Developmental Psychology, Amsterdam Public Health Research Institute, Vrije Universiteit, Amsterdam, the Netherlands
Ian Shrier
Affiliation:
Lady Davis Institute for Medical Research, Jewish General Hospital, Montréal, Québec, Canada; Department of Epidemiology, Biostatistics and Occupational Health, McGill University, Montréal, Québec, Canada; Department of Family Medicine, McGill University, Montréal, Québec, Canada
Andrea Benedetti
Affiliation:
Department of Epidemiology, Biostatistics and Occupational Health, McGill University, Montréal, Québec, Canada; Respiratory Epidemiology and Clinical Research Unit, McGill University Health Centre, Montréal, Québec, Canada; Department of Medicine, McGill University, Montréal, Québec, Canada
Brett D. Thombs
Affiliation:
Lady Davis Institute for Medical Research, Jewish General Hospital, Montréal, Québec, Canada; Department of Epidemiology, Biostatistics and Occupational Health, McGill University, Montréal, Québec, Canada; Department of Psychology, McGill University, Montréal, Québec, Canada; Department of Medicine, McGill University, Montréal, Québec, Canada; Department of Psychiatry, McGill University, Montréal, Québec, Canada; Department of Educational and Counselling Psychology, McGill University, Montréal, Québec, Canada; Biomedical Ethics Unit, McGill University, Montréal, Québec, Canada
* Author for correspondence: Felix Fischer, E-mail: Felix.Fischer@charite.de

Abstract

Background

Previous research on the depression scale of the Patient Health Questionnaire (PHQ-9) has found that different latent factor models have maximized empirical measures of goodness-of-fit. The clinical relevance of these differences is unclear. We aimed to investigate whether depression screening accuracy may be improved by employing latent factor model-based scoring rather than sum scores.

Methods

We used an individual participant data meta-analysis (IPDMA) database compiled to assess the screening accuracy of the PHQ-9. We included studies that used the Structured Clinical Interview for DSM (SCID) as a reference standard and split those into calibration and validation datasets. In the calibration dataset, we estimated unidimensional, two-dimensional (separating cognitive/affective and somatic symptoms of depression), and bi-factor models, and the respective cut-offs to maximize combined sensitivity and specificity. In the validation dataset, we assessed the differences in (combined) sensitivity and specificity between the latent variable approaches and the optimal sum score (⩾10), using bootstrapping to estimate 95% confidence intervals for the differences.
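The two computational steps described here, choosing the cut-off that maximizes combined sensitivity and specificity and bootstrapping a confidence interval for the difference between two scoring methods, can be sketched roughly as follows. This is a minimal illustration only and not the authors' analysis code: the function names and the toy data are hypothetical, the scores could be sum scores or latent factor scores from any model, and the simple participant-level resampling shown here is an assumption (it ignores the clustering of participants within studies, which an IPDMA would normally need to account for).

```python
import random


def sens_spec(scores, truth, cutoff):
    """Sensitivity and specificity when score >= cutoff counts as a positive screen."""
    tp = sum(1 for s, t in zip(scores, truth) if t and s >= cutoff)
    fn = sum(1 for s, t in zip(scores, truth) if t and s < cutoff)
    tn = sum(1 for s, t in zip(scores, truth) if not t and s < cutoff)
    fp = sum(1 for s, t in zip(scores, truth) if not t and s >= cutoff)
    sens = tp / (tp + fn) if tp + fn else float("nan")
    spec = tn / (tn + fp) if tn + fp else float("nan")
    return sens, spec


def best_cutoff(scores, truth):
    """Observed score value that maximizes sensitivity + specificity.

    Ties are resolved in favour of the lowest cut-off.
    """
    candidates = sorted(set(scores))
    return max(candidates, key=lambda c: sum(sens_spec(scores, truth, c)))


def bootstrap_diff_ci(score_a, cut_a, score_b, cut_b, truth, n_boot=1000, seed=0):
    """Percentile 95% CI for the difference (A minus B) in sensitivity + specificity.

    Assumption: participants are resampled with replacement as if independent.
    """
    rng = random.Random(seed)
    n = len(truth)
    diffs = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        t = [truth[i] for i in idx]
        a = sum(sens_spec([score_a[i] for i in idx], t, cut_a))
        b = sum(sens_spec([score_b[i] for i in idx], t, cut_b))
        if a == a and b == b:  # skip resamples without cases or non-cases (NaN)
            diffs.append(a - b)
    diffs.sort()
    lower = diffs[int(0.025 * len(diffs))]
    upper = diffs[int(0.975 * len(diffs)) - 1]
    return lower, upper


# Toy data, invented for illustration (not from the study):
toy_scores = [2, 4, 6, 9, 11, 13, 15, 20]   # e.g. PHQ-9 sum scores
toy_truth = [0, 0, 0, 0, 1, 1, 1, 1]        # 1 = major depression per reference standard
toy_cut = best_cutoff(toy_scores, toy_truth)
```

In the design described above, the cut-off would be selected in the calibration dataset only and then applied unchanged in the validation dataset, so that the accuracy comparison is not biased by selecting the cut-off on the same data.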

Results

The calibration dataset included 24 studies (4378 participants, 652 major depression cases); the validation dataset included 17 studies (4252 participants, 568 cases). In the validation dataset, optimal cut-offs of the unidimensional, two-dimensional, and bi-factor models had higher sensitivity (by 0.036, 0.050, 0.049 points, respectively) but lower specificity (by 0.017, 0.026, 0.019 points, respectively) compared to the sum score cut-off of ⩾10.
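To see what these figures imply for combined sensitivity and specificity, the gain in sensitivity can be offset against the loss in specificity for each model. The short calculation below uses only the rounded values reported in this paragraph; the variable names are illustrative, and the results may differ in the last digit from the unrounded estimates reported in the article's Table 4.

```python
# Differences relative to the sum score cut-off of >=10 (rounded values from the abstract).
sens_gain = {"unidimensional": 0.036, "two-dimensional": 0.050, "bi-factor": 0.049}
spec_loss = {"unidimensional": 0.017, "two-dimensional": 0.026, "bi-factor": 0.019}

# Net change in combined sensitivity + specificity for each latent variable model.
net_change = {m: round(sens_gain[m] - spec_loss[m], 3) for m in sens_gain}

for model, delta in net_change.items():
    print(f"{model}: {delta:+.3f}")
```

On these rounded inputs, the net change in combined sensitivity and specificity is at most about 0.03 for any of the three models.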

Conclusions

In a comprehensive dataset of diagnostic studies, scoring using complex latent variable models does not meaningfully improve the screening accuracy of the PHQ-9 compared to the simple sum score approach.

Information

Type
Original Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution-NonCommercial-ShareAlike licence (http://creativecommons.org/licenses/by-nc-sa/4.0), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the same Creative Commons licence is used to distribute the re-used or adapted article and the original article is properly cited. The written permission of Cambridge University Press must be obtained prior to any commercial use.
Copyright
Copyright © The Author(s), 2021. Published by Cambridge University Press

Table 1. Characteristics of the included participants stratified by sample.

Table 2. Loadings, correlation with sum score and fit indices of the three latent variable models in the calibration sample

Fig. 1. ROC Curves comparing diagnostic accuracy of the sum score and the latent variable models in the calibration and validation sample.

Table 3. Estimates from the IPD meta-analyses for each model's cut-off maximizing combined sensitivity and specificity

Table 4. Mean differences of (combined) sensitivity, specificity between optimal cut-offs of latent factor models and sum score along with bootstrapped 95% confidence interval in parentheses

Supplementary material: File

Fischer et al. supplementary material