There is increasing emphasis on objective assessment of patient centric outcomes that span function, symptoms and quality of life. Validated measures of patient-reported outcomes with standardized questionnaires are thus critical to clinical and research outcome assessment Reference Rotenstein, Huckman and Wagle. When patients seek treatment, the determination of severity before treatment and of improvement after treatment is mostly based on their subjective reporting of symptoms, and is linked to objective measures in those disorders where such measures are salient. Little is known about how well the patient’s subjective rating and the clinician’s objective rating of disorder improvement or treatment effect are aligned.
The Patient’s Global Impressions of Improvement (PGI-I) scale has been included in several studies conducted worldwide to assess patients’ overall perception of their condition by a simple and easy-to-use validated questionnaire Reference Guy. The PGI-I is a 1-item questionnaire that asks an individual patient to rate the perceived change in his/her condition in response to therapy at endpoint. It is derived from the Clinical Global Impressions – Improvement scale (CGI-I), which was first developed for use in psychopharmacology trials as part of the NIMH collaborative study of schizophrenia Reference Guy. Since then, it has been used as a standard primary outcome measure in studies investigating the efficacy of pharmacological treatments for psychiatric and medical conditions where subjective symptoms predominate, including pain, fatigue and mood Reference Rotenstein, Huckman and Wagle[3–Reference Liebowitz, Schneier, Campeas, Hollander, Hatterer and Fyer6], as well as for secondary outcomes and responder analyses in many more studies; for example Reference Papakostas and Fava[7–Reference Schneider, Dagerman and Insel9]. The CGI-I addresses the patient’s improvement from baseline as rated by the clinician. Both the PGI-I and the CGI-I use a bipolar scale from 1 (very much improved) to 7 (very much worse). These types of measures have been validated in clinical studies of patients with stress incontinence Reference Yalcin and Bump, urogenital prolapse Reference Srikrishna, Robinson and Cardozo, fibromyalgia Reference Arnold, Clauw, Wang, Ahl, Gaynor and Wohlreich, major depressive disorder Reference Demyttenaere, Desaiah, Petit, Croenlein and Brecht and stress urinary incontinence Reference Yalcin and Viktrup.
This article aims to evaluate the agreement between patient- and clinician-rated global impression of improvement (PGI-I, CGI-I) scales. We also examine the convergent validity of the PGI-I by comparing its correlations with other clinician-assigned ratings of disease severity, functioning and quality of life against the corresponding CGI-I correlations. Data were derived from three double-blind, placebo-controlled, multicentre, randomized controlled trials in adult outpatients with bipolar depression or major depressive disorder.
2.1. Study design and participants
This was a secondary analysis of data from 3 clinical trials. Details of the study designs and populations have previously been published Reference Hellerstein, Yanowitch, Rosenthal, Samstag, Maurer and Kasch[15–Reference Dean, Kanchanatawan, Ashton, Mohebbi, Ng and Maes18]. Study 1 was a randomized, double-blind, placebo-controlled, parallel-design study to evaluate the efficacy of 2 g/day N-acetylcysteine (NAC) as adjunct maintenance treatment for bipolar disorder Reference Berk, Dean, Cotton, Gama, Kapczinski and Fernandes[15, Reference Berk, Dean, Cotton, Gama, Kapczinski and Fernandes16]. Participants (N = 149) had a Montgomery–Åsberg Depression Rating Scale (MADRS) score ≥12 at trial entry and, after eight weeks of open-label NAC treatment, were randomized to adjunctive (in addition to treatment as usual) NAC or placebo for a further 24 weeks. Study participants were men and women residing in Australia and Brazil (www.anzctr.org.au: ACTRN12607000074493).
Study 2 was a randomized, double-blind, placebo-controlled, parallel-design study to evaluate the efficacy of 1 g/day NAC for major depressive disorder (MDD) in addition to existing treatments Reference Berk, Dean, Cotton, Jeavons, Tanious and Kohlmann. Participants (N = 252) had MADRS ≥18 at the time of entry with a current episode of MDD diagnosed according to DSM-IV-TR criteria. Participants were treated with NAC or placebo in addition to treatment as usual for 12 weeks and were followed to 16 weeks. Study participants were men and women residing in Australia (www.anzctr.org.au: ACTRN12607000134426).
Study 3 was a randomized, double-blind, placebo-controlled, parallel-design study to evaluate the efficacy of 200 mg/day of adjunctive minocycline or placebo for major depressive disorder (MDD) in addition to existing treatments Reference Dean, Kanchanatawan, Ashton, Mohebbi, Ng and Maes. Participants (N = 71) had MADRS ≥25 at the time of entry and met criteria for unipolar depression, based on Diagnostic and Statistical Manual of Mental Disorders–Fourth Edition (DSM-IV) criteria. Participants were randomized to minocycline or placebo (parallel groups) over 12 weeks of treatment and were followed to week 16. Study participants were men and women residing in Australia and Thailand (www.anzctr.org.au: ACTRN12612000283875).
2.2.1. Patient Global Impression of Improvement (PGI-I) and Clinician Global Impression of Improvement (CGI-I)
The Patient Global Impression of Improvement scale (PGI-I) is a single-item global rating-of-change scale that asks an individual patient to rate the severity of a specific condition at baseline and/or to rate, at endpoints, the perceived change in his/her condition in response to therapy. There are seven possible responses (scored 1–7): very much better, much better, a little better, no change, a little worse, much worse, and very much worse. The Clinical Global Impression of Improvement scale (CGI-I) is the clinician-rated single-item scale that uses the same seven-point response criteria as the PGI-I Reference Guy (see Appendix A in Supplementary material).
2.2.2. Depression severity
Severity of depressive symptomatology across study time points was measured using the Montgomery–Åsberg Depression Rating Scale (MADRS) Reference Montgomery and Asberg.
2.2.3. Quality of life
The Quality of Life Enjoyment and Satisfaction Questionnaire–Short Form (Q-LES-Q) Reference Nee, Harrison and Blumenthal was used to measure quality of life.
2.2.4. Functional impairment
Functional impairment was measured using the Range of Impaired Functioning Tool (LIFE–RIFT) Reference Leon, Solomon, Mueller, Turvey, Endicott and Keller.
2.2.5. Social and occupational functioning
The Social and Occupational Functioning Scale (SOFAS) Reference Goldman, Skodol and Lave was used to measure functioning over the duration of the study.
All trials were conducted according to the Declaration of Helsinki 1964 as revised in 2008, the requirements of the Australian National Statement on Ethical Conduct in Human Research, the federal patient privacy (HIPAA) law and the International Conference on Harmonisation Good Clinical Practice Guidelines (ICH-GCP), and were approved by institutional review boards at all sites.
2.4. Statistical analysis
Weighted agreement Reference Fleiss, Levin and Paik was reported as a descriptive measure. The weights were given by $w_{ij} = 1 - |i - j|/(k - 1)$, where i and j index the rows and columns of the ratings for CGI-I and PGI-I, |i − j| indicates the absolute difference and k is the maximum number of possible ratings. A weight of 1 indicates that an observation should count as perfect agreement and a weight of, say, 0.66 means that CGI-I and PGI-I are in two-thirds agreement (which happens if CGI-I and PGI-I are “two apart” on the seven-point scale). The agreement between clinician and patient ratings was assessed using the Intraclass Correlation Coefficient (ICC) and its 95% confidence interval (CI), estimated with a two-way random-effects model Reference Koo and Li. According to Fleiss Reference Fleiss, Levin and Paik, ICC values lower than 0.40 can be interpreted as poor, between 0.41 and 0.75 as fair, and above 0.75 as excellent agreement. The Bland-Altman plot Reference Bland and Altman was used to visually inspect agreement. This analysis involved plotting the difference between CGI-I and PGI-I measurements against the average of the two measurements, together with the mean difference ± 1.96 times the SD of the differences, known as the 95% limits of agreement. For all analyses, participants were included regardless of what type of treatment they received.
Convergent and divergent validity of PGI-I were evaluated by calculating Spearman correlations of PGI-I with CGI-I and with four other clinician-rated scales measuring depressive symptom severity (MADRS), quality of life (Q-LES-Q), social and occupational functioning (SOFAS), and functional impairment (LIFE–RIFT). The direction and strength of CGI-I correlations with MADRS, Q-LES-Q, LIFE-RIFT and SOFAS were compared with those of PGI-I correlations with MADRS, Q-LES-Q, LIFE-RIFT and SOFAS, respectively. Convergent validity of PGI-I was evaluated with the expectation that PGI-I is positively correlated with MADRS and LIFE–RIFT and that the strength of these correlations across follow-ups is similar to that of the CGI-I correlations with MADRS and LIFE–RIFT.
Furthermore, convergent validity was examined with the expectation that SOFAS and Q-LES-Q are negatively correlated with PGI-I and that the strength of these correlations across time points is similar for CGI-I and PGI-I. Pairwise correlations between CGI-I and PGI-I across follow-up time points are also reported as reference values for examining convergent validity. As CGI-I and PGI-I are measures of change in illness severity, convergent validity of PGI-I with change-from-baseline values of MADRS (MADRS change), LIFE–RIFT (LIFE-RIFT change), SOFAS (SOFAS change) and Q-LES-Q (Q-LES-Q change) was also evaluated in a similar manner. Sub-group analyses using similar analytical approaches were performed to evaluate agreement and convergent and divergent validity of PGI-I separately for bipolar and MDD patients.
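The weighted agreement described in the statistical analysis can be illustrated with a minimal Python sketch. It assumes linear weights of the form 1 − |i − j|/(k − 1), which is consistent with the two-thirds weight quoted for ratings that are two apart on a seven-point scale. The function names and the example ratings are hypothetical and are not taken from the trial data or from the software used in the study.

```python
def linear_weight(i: int, j: int, k: int = 7) -> float:
    """Agreement weight for ratings i and j on a k-point scale (assumed linear)."""
    return 1.0 - abs(i - j) / (k - 1)


def weighted_agreement(cgi, pgi, k: int = 7) -> float:
    """Mean agreement weight over paired ratings, expressed as a percentage."""
    if len(cgi) != len(pgi) or not cgi:
        raise ValueError("need two non-empty rating lists of equal length")
    total = sum(linear_weight(c, p, k) for c, p in zip(cgi, pgi))
    return 100.0 * total / len(cgi)


# Hypothetical paired ratings for six participants (not study data).
cgi_ratings = [2, 3, 4, 1, 5, 3]
pgi_ratings = [2, 4, 4, 3, 5, 2]
agreement_pct = weighted_agreement(cgi_ratings, pgi_ratings)
print(f"weighted agreement: {agreement_pct:.2f}%")
```

Because a one-point disagreement still earns a weight of 5/6, weighted agreement can remain high even when exact agreement is modest, which is worth bearing in mind when reading percentages in the high nineties.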
A total of 472 individuals (female 307, 65%) were randomized in the 3 studies; mean age ranged from 45.8 to 50.2 years. A total of 200 had MDD and 148 had bipolar depression (Table 1). Mean ± SD of PGI-I and CGI-I values at each post-baseline assessment time point are presented in Table 2. There was a systematic decrease in both PGI-I and CGI-I values across time, reflecting clinical improvement with treatment. In addition, PGI-I and CGI-I mean values and SDs were very similar at each time point. The weighted absolute agreement ranged from 94.27% to 98.69%, showing a very high level of agreement. The unadjusted and adjusted (for gender and age) ICCs at all time points were excellent.
a Study 1: The efficacy of N-acetylcysteine as an adjunctive treatment in bipolar depression: An open label trial, ACTRN12607000074493.
b Study 2: The Efficacy of Adjunctive N-Acetylcysteine in Major Depressive Disorder: A Double-Blind, Randomized, Placebo-Controlled Trial, ACTRN12607000134426.
c Study 3: Adjunctive minocycline treatment for major depressive disorder: A proof of concept trial; ACTRN12612000283875.
d Risk of suicide based on the Mini-International Neuropsychiatric Interview Reference Hergueta, Baker and Dunbar.
e Pooled DSM-IV anxiety disorders.
Note: SD: Standard deviation. ICC: Intraclass correlation coefficient. CI: Confidence interval.
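For readers unfamiliar with the ICC, the following Python sketch computes one common two-way random-effects form from ANOVA mean squares. The article specifies a two-way random-effects model but not the exact variant, so the choice of the single-rater, absolute-agreement form (often written ICC(2,1)) is an assumption made here for illustration, as are the function name and the example ratings. Confidence intervals and covariate adjustment are omitted.

```python
def icc_two_way_random(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `ratings` holds one row per participant and one column per rater,
    for example [CGI-I, PGI-I]. The variant is an assumption, see lead-in.
    """
    n_subjects = len(ratings)
    n_raters = len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n_subjects * n_raters)
    row_means = [sum(row) / n_raters for row in ratings]
    col_means = [
        sum(row[j] for row in ratings) / n_subjects for j in range(n_raters)
    ]

    # Sums of squares for subjects (rows), raters (columns) and residual.
    ss_rows = n_raters * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n_subjects * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n_subjects - 1)
    ms_cols = ss_cols / (n_raters - 1)
    ms_error = ss_error / ((n_subjects - 1) * (n_raters - 1))

    return (ms_rows - ms_error) / (
        ms_rows
        + (n_raters - 1) * ms_error
        + n_raters * (ms_cols - ms_error) / n_subjects
    )


# Hypothetical [CGI-I, PGI-I] pairs for six participants (not study data).
pairs = [[2, 2], [3, 4], [4, 4], [1, 3], [5, 5], [3, 2]]
icc_value = icc_two_way_random(pairs)
print(f"ICC(2,1) = {icc_value:.3f}")
```

The absolute-agreement form penalizes a systematic offset between raters through the rater mean square, whereas a consistency form would ignore it; which of the two was used affects how the reported ICCs should be read.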
Fig. 1 illustrates CGI-I and PGI-I Bland-Altman agreement plots across follow-up time points. The SDs of agreement from the Bland-Altman analysis were as follows: week 2, ±1.37; week 4, ±1.91; week 6, ±2.05; week 8, ±2.13; week 12, ±2.21; week 16, ±2.29; week 20, ±2.73; week 24, ±2.39; and week 28, ±2.41, showing an inverse association between time and agreement. The mean differences between CGI-I and PGI-I were close to zero across all follow-up time points in the overall, male-only and female-only data (Table 3), illustrating negligible measurement bias between CGI-I and PGI-I. The direction of the mean differences (CGI-I – PGI-I) varied without pattern across time points in the overall, male-only and female-only data, showing no time trend towards systematically higher or lower mean CGI-I values compared with mean PGI-I values. The percentage of pairs outside the 95% limits of agreement ranged from 2.27% to 8.99%, an acceptable proportion of out-of-range pairs at all time points except week 20. Comparing the number of pairs above and below zero (x axes in Fig. 1 plots) and the mean difference line (dashed lines in Fig. 1 plots) showed no tendency towards higher or lower CGI-I ratings compared with PGI-I ratings across time points. Fig. 1 also revealed acceptable homogeneity in CGI-I and PGI-I agreement across low, middle and high values of the mean of CGI-I and PGI-I.
Note. LoA: Limits of agreement.
a Difference between CGI-I and PGI-I.
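The quantities summarized from the Bland-Altman analysis (mean difference, SD of the differences, 95% limits of agreement and the percentage of pairs outside those limits) can be sketched as follows. This is an illustrative implementation under the usual definition of the limits as mean difference ± 1.96 SD; the function name, the returned keys and the example ratings are hypothetical and do not reproduce the study data.

```python
from statistics import mean, stdev


def bland_altman(cgi, pgi):
    """Summaries behind a Bland-Altman plot for paired CGI-I and PGI-I ratings."""
    diffs = [c - p for c, p in zip(cgi, pgi)]          # y axis: CGI-I minus PGI-I
    averages = [(c + p) / 2 for c, p in zip(cgi, pgi)]  # x axis: pair means
    bias = mean(diffs)
    sd = stdev(diffs)  # sample SD of the differences
    lower, upper = bias - 1.96 * sd, bias + 1.96 * sd
    outside = sum(1 for d in diffs if d < lower or d > upper)
    return {
        "bias": bias,
        "sd": sd,
        "limits": (lower, upper),
        "pct_outside": 100.0 * outside / len(diffs),
        "points": list(zip(averages, diffs)),
    }


# Hypothetical paired ratings for six participants (not study data).
result = bland_altman([2, 3, 4, 1, 5, 3], [2, 4, 4, 3, 5, 2])
print(result["bias"], result["limits"], result["pct_outside"])
```

A bias near zero indicates no systematic over- or under-rating by either party, while wider limits at later visits would correspond to the weakening agreement over time noted in the text.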
The convergent validity of the PGI-I was evaluated by comparing MADRS and LIFE-RIFT pairwise correlations with CGI-I against MADRS and LIFE-RIFT pairwise correlations with PGI-I across follow-ups (Table 4). As expected, moderately high correlations were observed between CGI-I and both MADRS and LIFE-RIFT, confirming high agreement between clinician-assessed scales. Similar pairwise correlations were observed between PGI-I and both MADRS and LIFE-RIFT, supporting convergent validity of PGI-I compared to clinician-assessed scales. Convergent correlations were weaker than the correlations between PGI-I and CGI-I, showing higher agreement between patient and clinician measures of illness severity than with other clinician-assessed scales. Of note, both MADRS and LIFE-RIFT convergent values increased across follow-up time points (see Table 4). MADRS change from baseline also had moderately high correlations with both CGI-I and PGI-I, illustrating convergent validity of PGI-I. In a similar manner, convergent validity of PGI-I was supported by moderately high negative correlations of similar strength between CGI-I and SOFAS and Q-LES-Q and between PGI-I and SOFAS and Q-LES-Q. Table S2 presents the weighted absolute agreement and the unadjusted and adjusted ICCs between PGI-I and CGI-I at all time points, stratified by mental disorder type (i.e. bipolar and MDD), and Table S3 shows the sub-group findings for convergent and divergent correlations of PGI-I with CGI-I, MADRS, LIFE-RIFT, SOFAS, and Q-LES-Q. Similar to the overall analysis, the weighted absolute agreements and ICCs confirmed the very high level of agreement between PGI-I and CGI-I across all time points. Similar correlation patterns were observed for PGI-I and CGI-I with MADRS, SOFAS, Q-LES-Q and LIFE-RIFT, confirming convergent and divergent validity of PGI-I in bipolar and MDD patients.
a Non-significant correlations (all other correlations are significant at the 0.01 level (2-tailed)).
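The expected sign pattern of the convergent correlations (positive with MADRS, negative with SOFAS) can be illustrated with a small Spearman correlation sketch in Python. It ranks each variable with average ranks for ties, which matters for a seven-point scale, and then takes the Pearson correlation of the ranks. The helper names and all example values are hypothetical; significance testing is omitted.

```python
import math
from statistics import mean


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1
        for idx in order[pos:end + 1]:
            ranks[idx] = shared
        pos = end + 1
    return ranks


def pearson(x, y):
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman correlation as the Pearson correlation of average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical scores for six participants at one visit (not study data).
pgi = [1, 2, 2, 3, 4, 5]          # higher = less improvement
madrs = [5, 8, 12, 15, 22, 30]    # higher = more severe depression
sofas = [80, 70, 65, 60, 50, 40]  # higher = better functioning
r_madrs = spearman(pgi, madrs)
r_sofas = spearman(pgi, sofas)
print(f"PGI-I vs MADRS: {r_madrs:.3f}, PGI-I vs SOFAS: {r_sofas:.3f}")
```

Running the same function with CGI-I in place of PGI-I gives the comparison values against which the PGI-I correlations are judged.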
The present study examined properties of patient’s global impressions of improvement, an outcome measure commonly used in clinical trials for the treatment of medical and psychiatric disorders with subjective endpoints. Overall, our findings support the utility of the PGI-I ratings among patients with MDD and bipolar disorder. These findings were replicated in bipolar and MDD patient sub-group analyses. PGI-I ratings were strongly associated with clinician’s global impressions of improvement and other clinician-administered measures of specific symptomatology and quality of life and functioning.
The agreement between the CGI-I and the PGI-I was investigated by calculating intraclass correlations and inspecting Bland and Altman plots. We found that agreement between clinician and patient estimates of improvement was excellent according to unadjusted ICCs, and very good according to ICCs adjusted for patients’ age and gender. This finding was confirmed by close examination of the Bland and Altman plots overall and by gender. The ICCs and Bland and Altman plots showed no gender bias in PGI-I. A similar conclusion was drawn for patients’ age by comparing unadjusted and age-adjusted ICCs. There was no systematic trend (bias) of over-estimation or under-estimation of PGI-I across follow-up times, as examined by mean CGI-I and PGI-I values and the Bland and Altman plots. A commonly raised concern regarding the PGI and related measures is that when patients assign a rating of global change, their reliance on memory of baseline functioning might compromise the validity of the rating. We observed a slow declining trend in agreement over follow-up, as illustrated by the increasing percentage of pairs outside the limits of agreement in the Bland and Altman analysis and by decreasing ICCs. Despite this trend, agreement remained acceptable at all follow-up time points. The strong relationship between PGI-I and CGI-I ratings at mid- and post-treatment follow-ups supports the assumption that PGI-I ratings reflected actual changes in functioning from baseline.
Global Improvement ratings were reassuringly shown to be highly correlated with self-reported and clinician-assessed indices of depressive symptoms and impairment Reference Khan, Khan, Shankles and Polissar[26, Reference Khan, Brodhead and Kolts27]. PGI-I convergent validity was evaluated by comparing PGI-I correlations with clinician-assessed indices against the equivalent CGI-I correlations, using the knowledge that CGI-I correlates highly with other clinician-assessed indices. The evidence provided in this study for convergent validity of the PGI-I with other clinician-rated scales, namely MADRS, SOFAS, Q-LES-Q and LIFE-RIFT, indicates that the PGI-I could be a valuable and useful tool for clinical studies and practice. Sub-group analyses confirmed both convergent and divergent validity of PGI-I in bipolar and MDD patients.
The present study has some limitations. First, there may be some inherent differences between bipolar disorder and major depression that would influence these findings. However, this did not appear to be a significant factor in the current analyses. Second, participants were recruited from several countries, increasing the heterogeneity of the sample. Conversely, this may enhance the generalizability of the congruence between patient and clinician ratings.
While the regulatory evaluation of treatment benefit in clinical trials may require multi-item instruments to fully describe the impact of treatment on various symptoms Reference Speight and Barendse, the PGI-I can provide an overall patient centric appraisal of the patient’s own condition. This is concordant with the philosophy of patient centric care and research that is increasingly the touchstone of modern clinical care and assessment, where the patient should be the final judge of their care and clinical change Reference Stewart. The PGI-I is brief and simple to administer and interpret, and hence practical for clinical use.
Availability of data and materials
The datasets analysed during the current study are not publicly available due to human research ethics restrictions. Data are however available from the authors upon reasonable request and upon institutional human research ethics permission.
The authors declare that they have no competing interests.
All authors were involved in designing the study. OMD prepared datasets for analysis. MM performed the analysis, and was a major contributor in writing the manuscript. All authors read and approved the final manuscript.
The data used in these analyses were provided from clinical trials supported by the NHMRC, Australian Rotary Health, the Stanley Medical Research Institute, the Brain and Behavior Foundation and an Australasian Society for Bipolar and Depressive Disorders/Servier Grant. MB is supported by an NHMRC Senior Principal Research Fellowship (1059660). OMD is supported by an R.D. Wright Biomedical NHMRC Research Fellowship (1145634).
Appendix A. Supplementary data
Supplementary data associated with this article can be found, in the online version, at https://doi.org/10.1016/j.eurpsy.2018.05.006.