The Brief Psychiatric Rating Scale (BPRS; Overall & Gorham, 1962) is one of the most frequently used instruments for evaluating psychopathology in patients with schizophrenia. Although its psychometric properties in terms of reliability, validity and sensitivity have been extensively examined (for a comprehensive review, see Hedlund & Vieweg, 1980), the clinical implications of BPRS scores are not always clear. For example, to our knowledge it has never been analysed how ill a patient with a BPRS total score of say, 30, 50 or 90 actually is from a clinical judgement point of view. Furthermore, in clinical studies a reduction of at least 20% (e.g. Kane et al, 1988; Marder & Meibach, 1994), 30% (e.g. Arvanitis et al, 1997; Small et al, 1997), 40% (e.g. Beasley et al, 1996) or 50% (e.g. Peuskens & Link, 1997) of the initial BPRS score has been used as a cut-off to define response, but what these cut-off levels mean clinically is again unclear. The Clinical Global Impression scale (CGI; Guy, 1976), another frequently used instrument, is to some extent more informative in this regard: because it describes a patient's overall clinical state as a ‘global impression’ by the rater, it provides results that (in contrast to BPRS scores) can be understood intuitively by clinicians (Nierenberg & DeCecco, 2002). The purpose of our study therefore was to find – with statistical means – corresponding points for BPRS and CGI ratings within a large sample of patients with schizophrenia participating in antipsychotic drug trials.
To know which BPRS score corresponds to a CGI – Severity rating of, for example, ‘moderately ill’ or ‘severely ill’ or which percentage BPRS reduction from baseline corresponds to a CGI – Improvement rating of ‘minimally better’ or ‘much better’ could increase our understanding of the clinical implications of BPRS scores.
Original patient data from seven trials (baseline n=1979; 1361 men, 618 women; age 35.8 years, s.d.=10.6; weight 72.6 kg, s.d.=15.8; height 172 cm, s.d.=9) comparing amisulpride or olanzapine with other antipsychotics or placebo, which used both the original BPRS (Overall & Gorham, 1962) and the CGI (Guy, 1976), were pooled for this analysis (Table 1). All studies were randomised, and all but one (Colonna et al, 2000) were double-blind. Each trial included patients with schizophrenia or schizophreniform disorder according to DSM–III–R or DSM–IV (American Psychiatric Association, 1987, 1994). With one exception (Carrière et al, 2000), all studies required various minimum scores as eligibility criteria to assure that the patients had florid positive symptoms. Please note that the criteria in Table 1 were eligibility criteria before the wash-out phases. Some patients had already improved during the wash-out phases and had scores below the eligibility criteria at baseline. The patients in the study without scale-derived minimum scores (Carrière et al, 2000) were all inpatients and had a mean BPRS score of 65 at baseline, so that patients with severe symptoms were also involved in this study. The mean BPRS total score at baseline in all studies was 58.9 (s.d.=12.2) and the mean CGI – Severity scale score was 5.2 (s.d.=0.8). All studies used the 18-item version of the BPRS with its original anchors; the items were not derived from the Positive and Negative Syndrome Scale (PANSS; Kay & Fiszbein, 1987). The single items were rated on a seven-point scale (1, not present; 2, very mild; 3, mild; 4, moderate; 5, moderately severe; 6, severe; 7, extremely severe). Thus, the range of possible BPRS total scores is from 18 to 126.
The CGI – Severity (CGI–S) and the CGI – Global Improvement (CGI–I) scales (Guy, 1976) were also available for all studies. The CGI–S assesses the clinician's impression of the patient's current illness state. The rater is asked to ‘consider his total clinical experience with the given population’. As with the BPRS, the time span considered is the week before the rating, and the following scores can be given: 1, normal, not at all ill; 2, borderline mentally ill; 3, mildly ill; 4, moderately ill; 5, markedly ill; 6, severely ill; 7, among the most extremely ill patients. The CGI–I assesses the patient's improvement or worsening since the start of the study using the following scores: 1, very much improved; 2, much improved; 3, minimally improved; 4, no change; 5, minimally worse; 6, much worse; 7, very much worse. A third item of the CGI, which tries to relate therapeutic effects and side-effects – the efficacy index – was not used for the analysis.
|Study|Antipsychotic drug used|Sample size (n)|Duration (weeks)|Selected patient characteristics|Mean BPRS score at baseline|
|---|---|---|---|---|---|
|Möller et al (1997)|Amisulpride, haloperidol|191|6|In-patients with paranoid, disorganised or undifferentiated schizophrenia (DSM–III–R), BPRS psychotic sub-score ≥ 12 and at least two BPRS psychosis items ≥ 4|61|
|Wetzel et al (1998)|Amisulpride, flupentixol|133|6|Acutely admitted in-patients with paranoid or undifferentiated schizophrenia, BPRS total score ≥ 36, but no predominant negative symptoms defined as SANS composite score > 55|53|
|Puech et al (1998)|Amisulpride, haloperidol|319|4|In-patients with acute exacerbations of paranoid, disorganised or undifferentiated schizophrenia (DSM–III–R), BPRS psychotic sub-score ≥ 12 and at least two BPRS psychosis items ≥ 4|61|
|Colonna et al (2000)|Amisulpride, haloperidol|487|51|In- or out-patients with paranoid, disorganised or undifferentiated schizophrenia (DSM–III–R), at least two BPRS psychosis items ≥ 4|56|
|Carrière et al (2000)|Amisulpride, haloperidol|202|17|In-patients with paranoid schizophrenia or schizophreniform disorder (DSM–IV)|65|
|Peuskens et al (1999)|Amisulpride, risperidone|228|8|In- or out-patients with paranoid, disorganised or undifferentiated schizophrenia (DSM–IV), BPRS total score ≥ 36, BPRS psychotic sub-score ≥ 12 and at least two BPRS psychosis items ≥ 4|55|
|Beasley et al (1996)|Olanzapine, haloperidol, placebo|419|6|In-patients with acute exacerbations of schizophrenia (DSM–III–R), BPRS total score ≥ 42|60|
An often-used, but nevertheless inadequate, method of comparing scores would have been to regress BPRS scores on CGI scores or vice versa. Both measures showed only moderately high correlations (see Results) and, therefore, regression equations would give different results depending on the direction of the regression. Linear regression treats one scale as the independent variable measured without error and the other as the dependent variable measured with error. This is conceptually wrong, because both variables are measured with random error. Within the psychometric literature the search for corresponding points on different, but correlated, measurement devices is referred to as ‘linking’ (Linn, 1993) or, in its strictest sense, as ‘equating’ (Kolen & Brennan, 1995). For this study we used equipercentile linking, a technique that identifies those scores on both measures that have the same percentile rank. We used the SAS program EQUIPERCENTILE (Price et al, 2001), an implementation of the algorithms described by Kolen & Brennan (1995). In the first step, percentile rank functions are calculated for both variables. Using the percentile rank function of one variable and the inverse percentile rank function of the other, one then finds for every score of one variable a score on the other variable that has the same percentile rank. The exact formulae are described in Chapter 2 of Kolen & Brennan (1995). Because our database was large, no smoothing was applied, either to the cumulative distribution functions or to the resulting linking functions. Only evaluations at baseline and at weeks 1, 2 and 4 were analysed: although the duration of the studies ranged from 4 weeks to 51 weeks, not all studies provided data for other time points, so trial effects could have biased the results at those points.
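The two steps described above (percentile rank function of one measure, inverse percentile rank function of the other) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the SAS program EQUIPERCENTILE used in the study: the function names are hypothetical, scores are assumed to be integers, the percentile rank uses the common mid-percentile convention for discrete scores, the inverse uses a single linear interpolation within the score interval, and no smoothing is applied. The ratings in the usage lines are made up.

```python
from collections import Counter


def cumulative_props(scores, lo, hi):
    """Return F, where F[x] is the proportion of scores <= x for x in lo-1..hi."""
    n = len(scores)
    counts = Counter(scores)
    F = {lo - 1: 0.0}
    running = 0
    for x in range(lo, hi + 1):
        running += counts.get(x, 0)
        F[x] = running / n
    return F


def percentile_rank(x, F):
    """Mid-percentile rank of integer score x: 100 * (F(x-1) + f(x)/2)."""
    return 100.0 * (F[x - 1] + (F[x] - F[x - 1]) / 2.0)


def inverse_percentile_rank(p, F, lo, hi):
    """Score (on a continuous scale) whose percentile rank equals p."""
    target = p / 100.0
    for y in range(lo, hi + 1):
        if F[y] > target:
            # Interpolate linearly inside the interval [y - 0.5, y + 0.5).
            return (target - F[y - 1]) / (F[y] - F[y - 1]) + (y - 0.5)
    return hi + 0.5  # p is 100: upper limit of the score scale


def equipercentile_link(x_scores, y_scores, x_range, y_range):
    """Map every possible x score to the y score with the same percentile rank."""
    Fx = cumulative_props(x_scores, *x_range)
    Fy = cumulative_props(y_scores, *y_range)
    return {
        x: inverse_percentile_rank(percentile_rank(x, Fx), Fy, *y_range)
        for x in range(x_range[0], x_range[1] + 1)
    }


if __name__ == "__main__":
    # Made-up ratings, only to show the call; not data from the trials.
    cgi_s = [3, 4, 4, 5, 5, 5, 6]
    bprs = [30, 41, 44, 52, 55, 57, 70]
    link = equipercentile_link(cgi_s, bprs, (1, 7), (18, 126))
    for score in range(3, 7):
        print(score, round(link[score], 1))
```

Unlike a regression line, the mapping is symmetric in the sense that neither measure is treated as error-free; swapping the two arguments gives the link in the opposite direction.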
For each linking task we included all patients with valid values on both measures, because analysing the data only of those who completed the studies would have implied a selection. However, approximately 20% of the patients withdrew between baseline and week 4. In a sensitivity analysis we therefore included only patients who were still in the studies at week 4, so that a rating was available at each time point. The results were so similar that only those of the primary analysis are shown. The one exception was a somewhat more notable variation, of up to 4–6 percentage points, in the percentage BPRS worsening linked to the CGI–I ratings ‘much worse’ and ‘very much worse’.
Correlation between CGI and BPRS
Spearman correlation coefficients between CGI–S ratings and BPRS total score were 0.41, 0.60, 0.68 and 0.74 respectively for baseline (n=1905), week 1 (n=1835), week 2 (n=1720) and week 4 (n=1512); all P<0.001. Spearman correlations between CGI–I score and percentage improvement of BPRS total score were –0.72, –0.74 and –0.76 for week 1 (n=1829), week 2 (n=1717) and week 4 (n=1511) respectively; all P<0.001.
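For readers unfamiliar with the statistic, a Spearman coefficient is the Pearson correlation of the rank-transformed values, with tied observations (frequent for a seven-point scale such as the CGI) given their average rank. The following is a minimal standard-library sketch with hypothetical function names and made-up example values; it is not the software used for the analysis.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation computed on average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


if __name__ == "__main__":
    # Made-up ratings, only to show the call; not data from the trials.
    cgi_s = [3, 4, 4, 5, 6]
    bprs = [30, 41, 45, 52, 70]
    print(round(spearman_rho(cgi_s, bprs), 2))
```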
Linking of CGI–S score and BPRS total score
Figure 1 shows the results of the linking between the CGI–S rating and the BPRS total score at baseline and at weeks 1, 2 and 4. The results suggest that being considered ‘mildly ill’ on the CGI (CGI–S score 3) approximately corresponded to a BPRS total score of 32 at baseline and at week 1 and a total score of 30 at weeks 2 and 4. Being considered ‘moderately ill’ (CGI–S score 4) corresponded to BPRS total scores of 44 at baseline and 40 at weeks 1, 2 and 4. ‘Markedly ill’ (CGI–S score 5) corresponded to BPRS scores of 55 at baseline, 53 at weeks 1 and 2, and 52 at week 4. ‘Severely ill’ (CGI–S score 6) corresponded to BPRS scores of 70 at baseline and 68, 67 and 65 at weeks 1, 2 and 4, respectively. Being considered ‘among the most extremely ill patients’ (CGI–S score 7) corresponded to BPRS scores of 85 at baseline and 89, 84 and 88 at weeks 1, 2 and 4, respectively. Thus, the results were relatively consistent over the four time points examined, although there was a slight tendency for CGI ratings at a given BPRS score to be somewhat less severe at baseline and to become more severe during the course of the treatment. This effect, however, was neither large nor always consistent.
Linking of CGI–I score and percentage BPRS change from baseline
Figure 2 shows the linking function between the CGI–I scale and the percentage BPRS change from baseline at weeks 1, 2 and 4. Ratings of ‘minimally improved’ (CGI–I score 3) at weeks 1, 2 and 4 corresponded to percentage BPRS reductions of 23, 27 and 30%, respectively. Ratings of ‘much improved’ (CGI–I score 2) corresponded to percentage BPRS reductions of 44, 53 and 58% at weeks 1, 2 and 4, respectively. Ratings of ‘very much improved’ (CGI–I score 1) corresponded to percentage BPRS reductions of 71, 79 and 85% at weeks 1, 2 and 4, respectively. Thus there was a consistent time effect indicating that a smaller percentage change in BPRS total score was necessary for a patient to be considered improved 1 week after the initiation of treatment than at later time points. This effect is also seen for the ‘no change’ rating according to the CGI–I (score 4), which was linked with a 5% BPRS score reduction at weeks 1 and 2 and an 8% reduction at week 4.
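As a worked illustration of how these anchors might be read, the sketch below computes a percentage BPRS reduction and reports the closest week-4 anchor quoted above. Two points are assumptions rather than statements from the study: the percentage is computed on raw total scores (the text does not say whether the minimum possible score of 18 was subtracted first), and the nearest-anchor rule is a hypothetical reading aid, not a procedure used by the authors. Anchors for worsening are omitted because no values are given for them.

```python
# Week-4 anchors taken from the linking results reported above
# (percentage BPRS reduction linked to each CGI-I rating).
WEEK4_ANCHORS = {
    "no change": 8.0,
    "minimally improved": 30.0,
    "much improved": 58.0,
    "very much improved": 85.0,
}


def percent_bprs_reduction(baseline, current):
    """Percentage reduction from baseline, computed on raw total scores.

    Assumption: no subtraction of the scale minimum (18) before dividing.
    """
    if baseline <= 0:
        raise ValueError("baseline score must be positive")
    return 100.0 * (baseline - current) / baseline


def nearest_cgi_i_label(pct_reduction):
    """Hypothetical helper: CGI-I label whose week-4 anchor is closest."""
    return min(
        WEEK4_ANCHORS,
        key=lambda label: abs(WEEK4_ANCHORS[label] - pct_reduction),
    )


if __name__ == "__main__":
    # Example: a fall from a BPRS total of 60 to 30 is a 50% reduction.
    pct = percent_bprs_reduction(60, 30)
    print(pct, nearest_cgi_i_label(pct))
```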
Although the BPRS is a frequently used and psychometrically sound instrument that explicitly records certain aspects of psychotic behaviour, the clinical meaning of a given scale value has not been anchored to a global clinical judgement. In our study the psychometric procedure of equipercentile linking was used to link the BPRS to a clinically meaningful global rating. Applying this procedure to a large sample of acutely ill patients across various multicentre studies resulted in a calibration, or anchoring, of the rating instrument to clinical judgement. The functions linking BPRS scores to the CGI can provide a better understanding of the BPRS and can help clinicians to interpret the results of clinical trials. For example, the data indicate that trials in which the average BPRS total score at baseline was 40 are unlikely to have examined a severely ill population. Furthermore, frequently used cut-off points to define response in treatment trials – a 20 or 50% reduction of the BPRS baseline scores – seem to mean that on average the patients were ‘minimally improved’ and ‘much improved’ respectively, according to the raters’ clinical impression. In fact, the data suggest that somewhat higher cut-off points than 20% (rather 25–30%) and 50% (rather 55%) might be better indicators of ‘minimal improvement’ and ‘much improvement’.
These results are relevant not only for the readers of publications on antipsychotic drugs, but also for the definition of response criteria of future trials: considering that a 25% BPRS score reduction means that the patient is just minimally better compared with baseline, this criterion might be a useful cut-off for studying patients with treatment-refractory disease, but not for the ‘average’ patient. In treatment-refractory cases even a small improvement in symptoms might be clinically important. However, in acutely ill patients with non-refractory conditions, a 50% criterion (i.e. clinically much improved) would seem to be a more appropriate reflection of clinically meaningful improvement, because such patients usually respond well to antipsychotic drugs (Cole, 1964). Considering only a 25% reduction (i.e. only minimally improved) of the overall symptoms as a ‘response’ would probably not meet clinicians’ expectations of drug treatment and would be of questionable clinical importance. In contrast to our findings, recent antipsychotic drug trials in patients with acute exacerbations often used a 20 or 30% criterion to distinguish between responders and nonresponders (Marder & Meibach, 1994; Arvanitis et al, 1997; Small et al, 1997). Ironically, the 20% cut-off level was indeed initially used in a study of patients with refractory disease (Kane et al, 1988), but was subsequently widely applied in studies of non-refractory cases.
The main strength of our analysis is the large number of patients, which should make the results rather robust. However, a number of limitations of our analysis must be considered. Despite the widespread use of the CGI in drug trials, there have been only a few studies of its psychometric characteristics, so the CGI is certainly not an ideal measure for ‘evaluating’ the BPRS. In 116 patients with panic disorder and depression, Leon et al (1993) found good concurrent validity and sensitivity for change using the CGI. In two trials, Khan et al (2002, 2004) showed that the sensitivity of the CGI–S and CGI–I was similar to that of the Montgomery–Åsberg Depression Rating Scale (Montgomery & Åsberg, 1979) and the Hamilton Rating Scale for Depression (Hamilton, 1960). However, Beneke & Rasmus (1992) criticised the CGI on semantic (e.g. asymmetric scaling), logical (e.g. non-meaningful combinations of CGI–S and CGI–I ratings) and statistical grounds (e.g. relatively low test–retest reliability in a heterogeneous sample of patients with ‘schizophrenic, depressive and anxiety disorders’).
Although the algorithms for linking and equating are the same, the terms have different meanings. For example, equating two forms of a college admission test is done to ensure that both forms can be used interchangeably and lead to the same decision. In our application the meaning is far less rigorous because the instruments differ: the correlation coefficients between the CGI–S and the BPRS total score were 0.60–0.74 in weeks 1 to 4 and only 0.41 at baseline (see Results). Linking is thus best understood here as a kind of anchoring that helps in understanding the clinical meaning of a given scale score. The especially low correlation at baseline may in part be explained by the minimum level of symptoms required at baseline by most studies, which reduced variability.
From a purely statistical point of view, correlating an implicit difference rating (CGI–I rating) with an explicit, calculated ‘percentage improvement’ score is problematic. It was nevertheless reassuring that these two measures showed higher correlations than the severity scores themselves, thus demonstrating that clinicians are able to give meaningful differential global ratings reflecting something like a ‘relative amount of change’. There was a time effect in the percentage BPRS reduction, suggesting that a somewhat smaller ‘objective’ percentage change as measured by the BPRS was necessary for patients to be considered improved according to the CGI–I at 1 week after the initiation of treatment than at later weeks. This result probably reflects physicians’ expectations, which may be lower after short durations of treatment than at later stages. Whereas the investigators received training in BPRS rating before the trials, this was usually not the case for the CGI. Interrater reliabilities for the BPRS between 0.87 and 0.97 have been reported (Collegium Internationale Psychiatriae Scalarum, 1996). A small study reported interrater reliabilities for the CGI–S and the CGI–I of 0.66 and 0.51, respectively (37 physicians rating 12 patients with dementia; Dahlke et al, 1992). Recently a somewhat better-anchored CGI scale for patients with schizophrenia has been developed (the Clinical Global Impression – Schizophrenia scale) and its validity and reliability have been verified: the interrater reliability was 0.75 (Haro et al, 2003). A replication with this new scale would be useful. Such data could also show whether a more objective measure of clinical psychopathology might be obtained by raters who were masked to which week of participation the patient is in.
It is important to emphasise the nature of the patients involved, as the results might not be the same when different patient populations are analysed. We assembled a data-set composed of people suffering from acute exacerbations of schizophrenia with positive symptoms. For example, in patients suffering only from negative symptoms, the relationship between the BPRS and the CGI – Severity scale might be very different. Such patients could be considered severely ill according to the CGI, but would have relatively low BPRS total scores owing to a lack of positive symptoms. Similarly, a 50% BPRS reduction might have a different clinical meaning in patients with low baseline BPRS scores. We therefore hasten to emphasise that our results relate only to acutely ill patients with schizophrenia with positive symptoms similar to those included in our database.
Despite these limitations, we consider that the results are an important contribution to a better understanding of the clinical meaning of the BPRS total score and percentage BPRS change in score in acutely ill patients with schizophrenia. Future studies should examine other patient populations (e.g. patients with residual schizophrenia and predominant primary negative symptoms) and should use anchored versions of the CGI and specifically trained raters. In addition, efforts are under way to develop criteria for ‘remission’ that could be applied to schizophrenia and used in evaluating treatment effects in a more objective and consistent fashion (Andreasen et al, 2005).
Clinical Implications and Limitations
▪ The linking functions linking Brief Psychiatric Rating Scale (BPRS) total scores to the Clinical Global Impression (CGI) severity ratings provide certain anchors that may help in understanding the results of clinical trials.
▪ Studies in acutely ill, treatment-responsive patients with schizophrenia and positive symptoms should use a 50% BPRS score reduction cut-off to define response rather than lower thresholds.
▪ Linking CGI improvement ratings with percentage BPRS reduction showed a time effect: a smaller percentage BPRS change was necessary for a patient to be considered improved 1 week after the initiation of treatment than at later time points, suggesting that expectation bias might play a part in assessing improvement.
▪ The results are only generalisable to patients with schizophrenia and at least moderate positive symptoms.
▪ The psychometric properties of the CGI have not been well evaluated, and the analysis should be repeated using better-anchored versions of this measure.
▪ Although using drug trial data to a certain extent reflects ‘real trial world’ conditions, replication studies with specifically trained CGI raters would be useful.
We are indebted to Sanofi-Aventis and Eli Lilly for allowing us to analyse individual patient data from their database. The study was supported by a grant from the Zucker Hillside Hospital Intervention Research Center for Schizophrenia (MH-60575).