Accuracy of 12 short versions of the Geriatric Depression Scale to detect depression in a prospective study of a high-risk population with different levels of cognition.

OBJECTIVES
To determine the accuracy of 12 previously validated short versions of the Geriatric Depression Scale (GDS) to detect major depressive disorder (MDD) in a high-risk population with and without global cognitive impairment.


DESIGN
Cross-sectional study.


SETTING
Five hospitals, Western Sweden.


PARTICIPANTS
Older adults (age ≥70 years, n = 60) assessed at a home visit 1 year after hospital care in connection with suicide attempt.


MEASUREMENTS
Depression symptoms were rated using the established 15-item GDS. Eleven short GDS versions identified by a recent systematic review were derived from this administered version. Receiver operating characteristic curves and area under the curve (AUC) for the identification of MDD diagnosed according to Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition, were obtained for each version. The Youden Index optimal criterion was used to determine the appropriate cutoffs. Analyses were repeated after stratification by cognitive status (Mini Mental State Examination score ≤24 and >24) for the best performing GDS short versions and the established 15-item GDS.


RESULTS
The 7-item GDS according to Broekman et al. (2011), with a cutoff 3, was the most accurate among the 12 short versions (AUC 0.90, 95% confidence interval 0.80-1.00), identifying MDD with sensitivity 88% and specificity 81%. The cutoff score remained consistent in the presence of global cognitive impairment, which was not the case for the standardized 15-item GDS.


CONCLUSION
The Broekman 7-item GDS had high accuracy to detect MDD in this prospective clinical cohort at high risk for MDD. Further testing of GDS short versions in diverse settings is required.


Introduction
The prevalence of major depression in late life ranges between 4.6% and 9.3% (Luppa et al., 2012). Major depression is a contributor to decreased functional level and quality of life (Daly et al., 2010). Screening instruments that are easy to administer and have high sensitivity and specificity for the detection of major depression can play an important role in secondary preventive interventions.
The Geriatric Depression Scale (GDS) was originally developed as a 30-item screening tool for depression in older adults (Yesavage et al., 1982). Efforts have been made to establish the validity of the GDS, to improve its accuracy in diverse populations and settings, and to increase its efficiency by removing redundant items. The GDS 15-item version (Sheikh and Yesavage, 1986) is currently one of the most widely used instruments for detection of depression in older adults. A recent systematic review of brief GDS versions further identified 1-, 4-, 5-, 7-, 8-, and 10-item versions (Pocklington et al., 2016). In that review, a meta-analysis of the standard 15-item GDS demonstrated a pooled sensitivity of 89% and specificity of 77% for detecting depression at the established cutoff score of 5. These pooled estimates are only moderate, most likely reflecting differences between the clinical and community samples included in the meta-analysis.
A 12-item version was developed for administration in institutionalized patients and showed only slightly lower sensitivity, but higher specificity to detect depression compared to the 15-item GDS using the same cutoff (score 5) (Sutcliffe et al., 2000). The accuracy of other brief versions was not determined in that study. Moreover, it remains to be elucidated whether a given scale retains accuracy to detect depression in prospectively followed clinical cohorts when affective pathology may be in remission.
Depression often coexists with cognitive impairment in older adults (Van der Mussele et al., 2013), and it has been shown that the accuracy of longer GDS versions is affected by severe cognitive impairment (Sheikh and Yesavage, 1986). Therefore, it remains to be clarified whether abbreviated GDS versions could be sensitive enough to detect major depression, irrespective of the presence of cognitive impairment.
We aimed to assess the existing GDS short versions for their accuracy to detect major depressive disorder (MDD) according to the Diagnostic and Statistical Manual of Mental Disorders, fourth edition (DSM-IV) (APA, 1994). For the short scales with the highest accuracy, we also aimed to determine whether accuracy to discriminate MDD was affected by global cognitive impairment.

Participants
Data were obtained in a prospective study on attempted suicide in individuals 70 years and older (range 70-91 years) (Wiktorsson et al., 2011). Briefly, consecutive patients admitted to emergency departments at five hospitals in western Sweden in connection with a suicide attempt were recruited during 2003-2006. The ability to comprehend study aims and interview content and to give informed consent was determined by the attending physician. There was no formal testing of patients' capacity to participate in the research study. Patients were excluded due to terminal illness (n = 2), severe dementia (n = 2), and insufficient knowledge of the Swedish language (n = 1), leaving 140 patients eligible for the study. Of these, 7 patients were discharged without receiving study information, 16 died, and 28 declined participation. Thus, 103 patients were included (73.6% participation rate), but 6 of them did not complete the interview, leaving 97 participants assessed with GDS (Wiktorsson et al., 2010). At 1-year follow-up, there were 14 deceased, 1 nontraceable, and 22 refusals. Sixty individuals were alive and accepted a psychiatric assessment at follow-up, including GDS (Wiktorsson et al., 2011).
In accordance with the Declaration of Helsinki, all participants gave their informed and written consent for the study. The study was approved by the Research Ethics Committee at the University of Gothenburg.

Procedures
The interviews were performed by a psychologist (SW) who read aloud all study questions for the participants, including self-report questionnaires. Interviews were carried out in participants' homes (n = 48), nursing homes (n = 9), psychiatric wards (n = 2), and at a psychiatric outpatient department (n = 1). The median time from hospitalization to the follow-up interview was 391 days. The following scales and questionnaires were used during the interview: The established 15-item GDS (Sheikh and Yesavage, 1986) was used as a standard self-report screening instrument for clinical depression. The scale has 15 "yes/no" questions, and a score of 1 was assigned to all "yes" answers and to "no" answers in items 1, 5, 7, 11, and 13 to indicate depressive symptoms (score range 0-15). A score of 5 is considered the standard cutoff score that indicates depression (Shah et al., 1996). For the purpose of this study, we tested the accuracy to detect MDD of the 15-item GDS along with 11 shorter versions of the GDS, all derived from the 15-item version: a 12-item version (Sutcliffe et al., 2000), two 10-item versions (D'Ath et al., 1994; van Marwijk et al., 1995), an 8-item version (Allgaier et al., 2011; Jongenelis et al., 2007), a 7-item version (Broekman et al., 2011), two 5-item versions (Cheng et al., 2010; Hoyl et al., 1999), two 4-item versions (D'Ath et al., 1994; van Marwijk et al., 1995), and two 1-item versions (D'Ath et al., 1994; van Marwijk et al., 1995) (Table 1).
The Comprehensive Psychopathological Rating Scale (CPRS) (Åsberg et al., 1978) was used to assess past month psychopathology. The Montgomery Åsberg Depression Rating Scale (MADRS) (Montgomery and Åsberg, 1979) was derived from the CPRS and was employed at initial assessment and at follow-up to capture change in burden of depressive symptoms over time. MADRS includes 10 items rated 0 (no symptom) to 6 (severe symptom) with a maximum score of 60.
The Cumulative Illness Rating Scale for Geriatrics (CIRS-G) (Miller et al., 1992) was used to rate serious physical illness in 13 organ systems using scores 0-4. Participants were considered to have serious physical illness if assigned scores 3 ("severe/constant disability and/or uncontrollable chronic problems") or 4 ("extremely severe illness and/or functional impairment") on any of the 13 organ categories. A senior psychiatrist (MW) reviewed all ratings.
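As an illustration only, the scoring rule described above can be sketched in Python. This is not the authors' code (the study analyses were run in R). The function name score_gds and the three-item subset in the usage lines are hypothetical; the reverse-scored items (1, 5, 7, 11, and 13) are taken from the text.

```python
# Items for which a "no" answer indicates a depressive symptom (from the text).
REVERSE_SCORED = {1, 5, 7, 11, 13}


def score_gds(answers, items=None):
    """Sum depressive answers over the chosen items.

    answers: dict mapping item number (1-15) to "yes" or "no".
    items:   item numbers that make up a short version; defaults to all 15.
    """
    if items is None:
        items = range(1, 16)
    total = 0
    for item in items:
        answer = answers[item].strip().lower()
        if answer not in ("yes", "no"):
            raise ValueError(f"item {item}: expected 'yes' or 'no'")
        # "no" is the depressive answer for reverse-scored items, "yes" otherwise.
        depressive = "no" if item in REVERSE_SCORED else "yes"
        total += answer == depressive
    return total


# Hypothetical respondent who answers "yes" to every item.
all_yes = {i: "yes" for i in range(1, 16)}
print(score_gds(all_yes))                   # 10 (the five reverse-scored items add 0)
print(score_gds(all_yes, items=[1, 3, 7]))  # 1 (hypothetical subset, not a published version)
```

Because every short version is a subset of the 15 administered items, the same function can score any of them once its item numbers are supplied.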
The Mini Mental State Examination (MMSE) (Folstein et al., 1975) was used to assess global cognitive function. For the purpose of this study, participants with an MMSE score ≤ 24 were categorized as cognitively impaired (Creavin et al., 2016). No imputation was used for missing points due to physical or sensory handicap, e.g. visual impairment, and MMSE scores ranged from 14 to 30.

Psychiatric diagnoses
The research diagnosis of major depression according to the DSM-IV (APA, 1994) was established using an algorithm based on symptoms according to the CPRS (Sjoberg et al., 2013). History of alcohol use disorder (past and current) was registered at baseline if either alcohol misuse or dependence was acknowledged by any of three sources: interview with the patient, case records, or the regional hospital discharge register (Morin et al., 2013).

Statistical methods
Participants and nonparticipants (deceased and refusals) at follow-up were compared on demographic and clinical variables registered during hospitalization. As Shapiro-Wilk normality testing showed non-normal distributions for age, MMSE, MADRS, and GDS scores, the Mann-Whitney test was employed to compare participants and nonparticipants on these continuous numeric variables. Fisher's exact test was applied to test for differences in proportions regarding sex, marital status, education level, antidepressant prescription, MDD, alcohol use disorder, and serious physical illness. In ancillary analyses at follow-up, we also compared age and MADRS score across the subgroups with different levels of global cognitive function (MMSE score ≤ 24 vs. MMSE score >24) using the Mann-Whitney test. Distribution of sexes, education level, and serious physical illness were compared in the two subgroups using Fisher's exact test. Two-tailed statistical testing was considered significant at the α level 0.05 in these analyses (significant p-value <0.05).
We used receiver operating characteristic (ROC) curves with area under the curve (AUC) to estimate the diagnostic accuracy of the different GDS versions to detect depression in this prospective sample, using MDD diagnosed according to DSM-IV criteria as the "gold standard" (Murphy et al., 1987). The Youden Index was used to identify optimal cutoffs based on the ROC according to a parametric method (Fluss et al., 2005). The Youden Index was computed as $J = \max_{c}\,[\mathrm{sensitivity}(c) + \mathrm{specificity}(c) - 1]$, where $c$ ranges over the candidate cutoffs. Values of the Youden Index range between 0 (poor accuracy) and 1 (best accuracy). Accuracy of each GDS version was further evaluated by computing sensitivity, specificity, and positive and negative likelihood ratios (LR) at the optimal cutoffs identified using the Youden Index. Sensitivity was computed as the probability of a positive test result in individuals with MDD and specificity as the probability of a negative test result in individuals without MDD. Positive LR was computed as the ratio of the probability of a positive test result in individuals with MDD to the probability of a positive test result in those without MDD. Negative LR was computed as the ratio of the probability of a negative test result in individuals with MDD to the probability of a negative test result in those without MDD. Large positive LR values (closer to infinity) and small negative LR values (closer to 0) indicate accurate diagnostic tests. We also computed Cronbach's α as a measure of internal consistency of the GDS versions. Finally, we applied equivalence testing among the different GDS versions using two one-sided tests (TOST, i.e. null-hypothesis and equivalence testing of differences between mean values) (Lakens et al., 2018).
All statistical analyses were run in R (RStudio, version 3.5.1).

Table 1 (items 1-9: item number, question, and answer scored as depressive; the marks showing which short versions include each item are not shown)
1. Are you basically satisfied with your life? No
2. Have you dropped many of your activities and interests? Yes
3. Do you feel that your life is empty? Yes
4. Do you often get bored? Yes
5. Are you in good spirits most of the time? No
6. Are you afraid that something bad is going to happen to you? Yes
7. Do you feel happy most of the time? No
8. Do you often feel helpless? Yes
9. Do you prefer to stay at home, rather than going out and doing new things? Yes
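The accuracy measures defined in the statistical methods can be illustrated with a short Python sketch. It is not the authors' analysis: the study used R and a parametric method for the Youden Index, whereas this sketch uses the simpler empirical approach of evaluating every observed score as a candidate cutoff. It also assumes that a score at or above the cutoff counts as a positive test. All function names and the toy data are hypothetical.

```python
def accuracy_at_cutoff(scores, has_mdd, cutoff):
    """Return (sensitivity, specificity), treating score >= cutoff as positive.

    Assumes at least one person with MDD and one without.
    """
    pairs = list(zip(scores, has_mdd))
    tp = sum(s >= cutoff and d for s, d in pairs)
    fn = sum(s < cutoff and d for s, d in pairs)
    tn = sum(s < cutoff and not d for s, d in pairs)
    fp = sum(s >= cutoff and not d for s, d in pairs)
    return tp / (tp + fn), tn / (tn + fp)


def likelihood_ratios(sensitivity, specificity):
    """Return (positive LR, negative LR) as defined in the text."""
    pos = sensitivity / (1 - specificity) if specificity < 1 else float("inf")
    neg = (1 - sensitivity) / specificity if specificity > 0 else float("inf")
    return pos, neg


def youden_optimal_cutoff(scores, has_mdd):
    """Empirical Youden Index: maximize sensitivity + specificity - 1.

    Ties are resolved in favor of the lowest cutoff.
    """
    best = None
    for cutoff in sorted(set(scores)):
        sens, spec = accuracy_at_cutoff(scores, has_mdd, cutoff)
        j = sens + spec - 1
        if best is None or j > best[1]:
            best = (cutoff, j, sens, spec)
    return best


# Toy data (invented for illustration, not study data).
toy_scores = [0, 1, 1, 2, 2, 3, 4, 5, 6, 7]
toy_mdd = [False, False, False, False, True, False, True, True, True, True]
cutoff, j, sens, spec = youden_optimal_cutoff(toy_scores, toy_mdd)
print(cutoff, round(j, 2), round(sens, 2), round(spec, 2))

# Likelihood ratios implied by the rounded values reported in the abstract
# (sensitivity 88%, specificity 81%); rounding means they are approximate.
print(likelihood_ratios(0.88, 0.81))
```

With so few distinct scores, an empirical search is cheap. The parametric approach cited in the text instead models the score distributions, which can yield a different optimal cutoff in small samples.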

Results
No baseline differences were observed between participants and nonparticipants (deceased and refusals) at 12-month follow-up regarding sociodemographic factors (age, sex, married or cohabiting status, mandatory education) (Table 2). Most clinical characteristics (MMSE score, major depression, antidepressant prescription at discharge, serious physical illness) also did not differ, but participants had a higher frequency of alcohol use disorder and lower MADRS mean score than nonparticipants.
AUC and Youden Indices were similar for many of the GDS versions tested (Table 3). The AUC for the 7-item GDS according to Broekman (Broekman et al., 2011) was slightly numerically greater in comparison to AUCs for all the other versions, but the 10-item GDS according to Van Marwijk (van Marwijk et al., 1995) had the highest Youden Index (Table 3).
According to the Youden Index, the 4-item GDS according to D'Ath (D'Ath et al., 1994) was as accurate as the Broekman 7-item version in detecting MDD in this sample.
The 15-item GDS did not outperform these three versions in detecting MDD (Figure 1). However, internal consistency according to Cronbach's α was poor for both 4-item GDS versions, and for the 5-item version according to Hoyl (Hoyl et al., 1999). All other GDS versions showed good internal consistency (0.8-0.9) (Table 3). The value of Cronbach's α was similar for the Broekman 7-item version (0.84) and the established 15-item GDS (0.86), which may indicate that the Broekman 7-item version retains the unidimensional property of the scale.
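For readers unfamiliar with Cronbach's α, the following Python sketch shows the standard formula, α = k/(k − 1) × (1 − sum of item variances / variance of total scores), where k is the number of items. It is an illustration with invented data, not the authors' R code, and the function name cronbach_alpha is hypothetical.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents' item scores.

    rows: one list of item scores (0 or 1 for the GDS) per respondent,
          all of equal length. Requires at least two items and a total
          score that is not constant across respondents.
    """
    k = len(rows[0])
    if k < 2:
        raise ValueError("alpha is undefined for a single-item scale")
    item_vars = [variance([row[j] for row in rows]) for j in range(k)]
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Invented data: three items answered identically by each respondent.
consistent = [[1, 1, 1], [0, 0, 0], [1, 1, 1], [0, 0, 0]]
print(round(cronbach_alpha(consistent), 2))

# Invented data: two items that vary independently of each other.
unrelated = [[1, 0], [0, 1], [1, 1], [0, 0]]
print(round(cronbach_alpha(unrelated), 2))
```

The formula needs at least two items, so it cannot be computed for the 1-item versions. It also tends to give lower values for very short scales, which is worth keeping in mind when reading the values for the 4-item versions.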
The optimal cutoff GDS-15 score was 9 in this sample, as determined by the Youden Index. Overall accuracy parameters for the 15-item GDS did not improve when we applied the standard cutoff score 5 (sensitivity 88%, specificity 65%). Although the best sensitivity at follow-up (94%) was achieved by the 4-item version by Van Marwijk (van Marwijk et al., 1995), the specificity was poor (Table 3). Low AUC and Youden Index estimates were observed for the two 1-item versions.
Although the GDS versions were equivalent (nonsignificant p-values in equivalence tests), GDS scores were statistically different for the majority of the GDS versions (significant p-values in null-hypothesis tests of differences between mean values), and TOST results were inconclusive (see Supplementary Table S1).

The accuracy of selected GDS versions to detect MDD in subgroups with or without cognitive impairment
The most accurate short versions of the GDS at follow-up, the 10 items by Van Marwijk (van Marwijk et al., 1995), the 7 items by Broekman (Broekman et al., 2011), and the 4 items by D'Ath (D'Ath et al., 1994), were tested further for their accuracy to detect MDD in cognitively intact (MMSE score 25-30) and cognitively impaired individuals (MMSE score 14-24). The established GDS-15 was also tested for comparison. There were no differences between the two subgroups regarding age (cognitively intact median age 80 years, IQR 8, 1st-3rd quartile range 75.5-83.5; cognitively impaired median age 83 years, IQR 9, 1st-3rd quartile range 77.0-86.0; Mann-Whitney test p-value 0.174), MADRS score (cognitively intact median MADRS score 8, IQR 12.5, 1st-3rd quartile range 2.2-14.7; cognitively impaired median MADRS score 12.5, IQR 15.7, 1st-3rd quartile range 5.5-21.2; Mann-Whitney test p-value 0.184), sex distribution (53.3% women cognitively intact vs. 53.3% women cognitively impaired), education (48.9% more than mandatory education among those cognitively intact vs. 33.3% in cognitively impaired; Fisher's exact test p-value 0.375), or severe physical illness (56.8% in cognitively intact vs. 86.7% in cognitively impaired; Fisher's exact test p-value 0.219). Table 4 shows results for subgroups with and without cognitive impairment regarding diagnostic accuracy of GDS versions. Only the 7-item GDS according to Broekman and 4-item GDS according to D'Ath retained their cutoff scores in both subsamples, but sensitivity decreased from 90% to 86% among those with cognitive impairment. Among cognitively impaired patients, the established 15-item version showed accuracy similar to that of the Broekman 7-item version, but the cutoff for the 15-item version suggested by the Youden Index (7) was higher than the standard cutoff (5) and different from the optimal cutoff suggested by the Youden Index in the total sample (9). Internal consistency according to Cronbach's α was not affected by the level of global cognitive impairment.

Table 2 (notes). Differences between participants and nonparticipants were tested by Fisher's exact test for distribution of sexes, marital status, education, antidepressant prescription, MDD, alcohol use disorder, and serious physical illness; and by Mann-Whitney test for the not normally distributed variables age, MMSE, MADRS, and GDS scores. GDS score range was 0 to maximum in all versions, except GDS 15-item participants score range 0-14 and nonparticipants 1-13; 12-item nonparticipants score range 0-11; 10-item D'Ath participants score range 0-9; and 10-item Van Marwijk nonparticipants score range 1-10. Missing scores among nonparticipants at follow-up: MMSE n = 2 and MADRS n = 1. a Nonparticipants: 14 deceased and 23 refusals (including n = 1 nontraceable). b At hospital discharge. c Defined as score 3 or 4 on any somatic category on the Cumulative Illness Rating Scale for Geriatrics. * p-values ≤ 0.05 are considered statistically significant (two-tailed significance at α level 0.05).

Discussion
We found satisfactory accuracy for the brief 7-item GDS according to Broekman (Broekman et al., 2011) to detect MDD in a diagnostically heterogeneous sample of previously hospitalized suicide attempters at 1-year follow-up when many were in remission. The presence of cognitive impairment seemed not to affect the cutoff score for the Broekman 7-item scale, which was the case for the established 15-item version. However, the small size of the sample makes any definitive conclusion difficult.
In this prospective clinical sample at high risk for depression, the accuracy to detect MDD was higher for three short versions, i.e. the 10 items by Van Marwijk (van Marwijk et al., 1995), the 7 items by Broekman (Broekman et al., 2011), and the 4 items by D'Ath (D'Ath et al., 1994), compared to the 15-item GDS (Table 3). The cutoff score (9) for the 15-item GDS identified by the optimization statistic Youden Index in this clinical sample was higher than the standard cutoff (5), and the 15-item GDS did not outperform the Broekman 7-item scale. Our results suggest that the Broekman 7-item GDS with an optimal cutoff score of 3 might be applicable in similar clinical settings. This version may be preferred due to its shorter format. Moreover, the internal validity of the Broekman 7-item scale was as good as for the GDS versions with more items. Studies of brief GDS versions are scarce and differ in methodology, making comparisons difficult. Our finding contrasts with that of a community-based study using the Broekman 7-item GDS scale that identified a cutoff at 1 to detect MDD according to the DSM-IV with high sensitivity (93%) and specificity (91%) (Broekman et al., 2011). While application of the cutoff score 1 in our sample improved sensitivity (94%), specificity at this cutoff was unacceptable (44%). Our results seem to indicate that the short scales have lower accuracy in clinical samples at high risk of affective psychopathology compared to community samples. Furthermore, 5-item scales tested in our sample showed moderate sensitivity (76%) and high specificity (84-86%) at cutoff 3, in contrast with previous clinical studies. A study of the 5-item GDS by Cheng suggested cutoff 2, determined using the Youden Index, to detect depression in a clinical population (sensitivities 72-81% and specificities 55-58% for different old age groups) (Cheng et al., 2010).
Another study of the accuracy of the 5-item GDS according to Hoyl (Hoyl et al., 1999) tested using an a priori chosen cutoff score 2 in three settings, i.e. hospital, outpatient, and nursing home, demonstrated highest sensitivity 97% among hospitalized patients in the acute geriatric ward (specificity 74%) (Rinaldi et al., 2003). The accuracy of the short GDS scales varies when used in different populations. Taken together, these findings suggest that short GDS scales may be useful in clinical populations but standardized cutoffs may be difficult to establish.
At the cutoff score 3, the Broekman 7-item GDS had slightly better accuracy than the established 15-item GDS in our prospective clinical sample, but only among those who were cognitively intact. The versions tested after stratification by cognitive level had similar accuracies in those with global cognitive impairment, but the optimal cutoff varied in the 15- and 10-item GDS versions for those with and without impairment.
The short GDS scales tested after stratification by cognitive level performed better in those without cognitive impairment than in those impaired, in line with previous studies (Friedman et al., 2005). Others have reported on the accuracy of the 15-item GDS being negatively affected by global cognitive impairment if the standard cutoff was retained (Chiesi et al., 2018;de Craen et al., 2003). However, the small size of the group with global cognitive impairment (n = 15) makes clear-cut inferences difficult. To summarize, our findings shed further light on the diagnostic accuracy of the brief versions of the GDS, as called for by the authors of a recent review of the evidence (Pocklington et al., 2016). Furthermore, although we can draw no firm conclusions regarding the accuracy of the brief scales in persons with cognitive dysfunction, our results provide leads for much-needed future research that takes cognitive impairment into consideration (Chiesi et al., 2018).

Methodological considerations
The strengths of the study are the prospective clinical sample and the fact that the same licensed psychologist made assessments during hospitalization and at follow-up. Study limitations include the special nature of this clinical sample (high risk for MDD) and the small sample size. GDS scores in our sample were not normally distributed, which makes it difficult to interpret the results of the available TOST that rely on mean value distributions. Although we acknowledge a bias in obtaining the items of the shorter GDS versions from administering the 15-item GDS, this method allows a comparison of different versions using the same sample. We thus eliminate confounding factors that are difficult to control when comparing different samples. A further consideration is the fact that the cohort was assessed over a decade ago. Nevertheless, the results may be considered representative for individuals older than 70 years currently diagnosed with MDD using DSM-IV criteria, since no radical changes with impact on mental healthcare have occurred in the catchment area during the last decade. There may be a selection bias and a healthy survivor effect as those who took part in the follow-up interviews had lower MADRS scores at baseline. However, a history of alcohol use disorder was more prevalent among those who took part in the follow-up examination.
Another aspect that has to be discussed is that despite almost all participants receiving antidepressants at the 1-year follow-up, one in four individuals exhibited depressive symptomatology fulfilling criteria for major depression. Treatment-refractory depression is common in old age (Knochel et al., 2015). Although the short GDS scales detected MDD with low to moderate accuracy, short scales with high sensitivity (88% for the Broekman 7-item version in our sample) could be useful in the follow-up care of high-risk populations. The scale had good internal consistency.
In conclusion, the brief 7-item version of GDS according to Broekman (Broekman et al., 2011) could be useful in the follow-up of MDD in high-risk populations due to consistency in cutoff score in relation to global cognitive function. Further studies are warranted to compare the accuracy of these instruments and to standardize the cutoffs in older adult populations with diverse affective pathology.