Introduction
Bipolar disorder (BD) is a complex and chronic illness characterised by lasting functional and cognitive deficits during all phases, including remission. Indeed, more than half the individuals with BD experience significant functional impairment in several domains, such as family and social life and work, outside the acute phases of the illness (Sanchez-Moreno et al., 2017). Some patients also present significant cognitive impairments even in the euthymic phase of the disorder (Roux et al., 2019). Traditionally, outcomes for patients with BD have been defined as the reduction of mood symptoms. However, the endpoints of randomised placebo-controlled trials (RCTs) have recently shifted from clinical remission to functional recovery (Vieta and Torrent, 2016). In addition, as cognitive impairment is an important determinant of functional impairment in BD (Roux et al., 2017), functional recovery may be improved by cognitive remediation.
Recently, the number of clinical trials targeting cognition and psychosocial functioning in BD has markedly increased (Bellani et al., 2019), with RCTs showing promising results both for cognitive (Lewandowski et al., 2017) and functional (Torrent et al., 2013; Bonnin et al., 2016) remediation, while many trials are still ongoing (Strawbridge et al., 2016; Gomes et al., 2017; Ott et al., 2018). Given this considerable deployment of resources, there is an urgent need to confirm whether statistically significant changes identified in clinical trials are beneficial to the individual in daily life. The smallest clinically meaningful improvement which can be perceived by patient caregivers is called the minimal clinically important difference (MCID). The MCID is crucial to accurately estimate the number of patients needed to treat in RCTs for continuous outcomes (Guyatt et al., 1998), such as cognition and functioning, by preventing the loss of power resulting from the dichotomisation of continuous scores when the MCID is unknown (Falissard et al., 2016). The MCID also plays a crucial role in interpreting cognitive and functional scale scores in a clinical setting.
Until now, the interpretation of results from such instruments has relied on the personal experience of clinicians treating populations with BD and thus lacks objectivity (Phillips et al., 2015). The MCID has been proposed as a more objective way to attach clinical relevance to changes in standardised instrument scores and can be used to assess the effectiveness of treatment.
Methods for estimating the MCID are classified as either anchor-based or distribution-based (Revicki et al., 2008). Anchor-based methods compare the instrument scores to an external gold-standard criterion, whereas distribution-based methods estimate the MCID based on a measure of the variability of the observed scores. We aimed to characterise the MCID for cognition and psychosocial functioning in BD using both anchor- and distribution-based methods, as combining the two strategies is widely recommended (Revicki et al., 2008). In this study, we investigated the MCID for psychosocial functioning using the Functioning Assessment Short Test (FAST) because this scale was specifically designed for BD, is a domain-based measure of functioning (six domains: autonomy, occupational functioning, cognition, financial issues, interpersonal relationships and leisure; Rosa et al., 2007), and is a prevalent instrument in the literature (Chen et al., 2019). Cognition was investigated with a neuropsychological battery covering six relevant domains for BD. The use of multiple anchors is strongly recommended (Revicki et al., 2008).
Thus, the two anchor dimensions selected in this study were global functioning and BD severity, which have been significantly associated with cognition in two meta-analyses (Bourne et al., 2013; Bora, 2018), as well as with psychosocial functioning (Sanchez-Moreno et al., 2017).
Methods
Study design and characteristics of the recruiting network
This multicentre, longitudinal study included patients recruited into the FACE-BD (FondaMental Advanced Centers of Expertise for Bipolar Disorders) cohort within a French national network of 10 centres (Bordeaux, Colombes, Créteil, Grenoble, Marseille, Monaco, Montpellier, Nancy, Paris, and Versailles). This network was set up by the Fondation FondaMental (https://www.fondation-fondamental.org), which created an infrastructure and provided resources to follow clinical cohorts and comparative-effectiveness research in patients with BD. All procedures were approved by the local ethics committee (Comité de Protection des Personnes Ile de France IX) on January 18, 2010, under French law for non-interventional studies (observational studies without any risk, constraint, or supplementary or unusual procedure concerning diagnosis, treatment or monitoring). The board required that all patients be given an informational letter but waived the requirement for written informed consent. However, verbal consent was witnessed and formally recorded.
Participants
The diagnosis of BD was based on the Structured Clinical Interview for DSM-IV-TR (SCID) criteria (First et al., 1997). Outpatients with type 1, type 2, or not-otherwise-specified BD, between 18 and 65 years of age, were eligible for this analysis. No criteria related to the current mood state at inclusion were used, in order to preserve the variability of absolute levels of, and changes in, functioning and cognition in this longitudinal observational cohort. However, individuals whose symptom intensity was judged to be incompatible with the one-and-a-half-day evaluation at baseline were excluded (for instance, high suicidal risk, agitation, severe distractibility, inability to think or concentrate or severe indecisiveness).
Assessment tools
The socio-demographic variables collected at inclusion were sex, age and education level.
Clinical assessments at inclusion and 12 and 24 months
The following clinical variables were recorded using the SCID: age at onset of BD, number and type of previous mood episodes, subtype of BD and history of psychotic symptoms. Mania was measured using the Young Mania Rating Scale (YMRS; Young et al., 1978). Depression was measured using the Montgomery-Asberg Depression Rating Scale (MADRS; Montgomery and Asberg, 1979). We used a yes/no questionnaire to record patient treatment at the three evaluation time points: lithium carbonate, anticonvulsants, antipsychotics, antidepressants or anxiolytics.
Domain-based psychosocial functioning was measured using the total score of the FAST, a short instrument comprising 24 items administered during an interview by a trained clinician. Two external criteria were used to anchor and calibrate the FAST and cognition. The first was the Clinical Global Impression-Severity (CGI-S) scale, which assesses the severity of the disorder (Guy, 1976). This tool was selected as an anchor because it is a well-established rating used by practising clinicians and is widely used for this purpose in the field of MCID (Duru and Fantino, 2008; Hermes et al., 2012; Falissard et al., 2016). The CGI-S was preferred to the CGI-Improvement scale (CGI-I) to avoid any memory bias during the 2-year follow-up. For the CGI-S, the minimum clinically important difference has been defined as the minimal observable difference between two adjacent categories, which is 1. A difference of 2 was considered to be mild, 3 moderate, 4 marked, 5 severe or great (depending on the direction) and 6 extreme. The second anchor was the Global Assessment of Functioning (GAF; Jones et al., 1995), which measures global functioning. It was chosen because it is widely used in BD (Chen et al., 2019), particularly as a reference measure of functioning (Bonnin et al., 2018). For the GAF, the minimum clinically important absolute difference has been defined as the range of the score within one category, which is 10. An absolute difference of 20 was considered to be mild, 30 moderate, 40 marked, 50 severe or great (depending on the direction), and 60 extreme.
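The anchor grading described above can be sketched as a small lookup. This is only an illustration: the function and constant names are hypothetical, the label "minimal" for a one-step change is shorthand for the minimum clinically important difference, and rounding GAF changes down to whole 10-point steps (and capping at "extreme") is an assumption rather than a rule stated in the study.

```python
# Labels for absolute anchor changes, expressed in "steps".
# One step is 1 point on the CGI-S and 10 points on the GAF (from the text).
LABELS = {1: "minimal", 2: "mild", 3: "moderate",
          4: "marked", 5: "severe or great", 6: "extreme"}
STEP = {"CGI-S": 1, "GAF": 10}


def classify_anchor_change(scale, baseline, follow_up):
    """Return (direction, label) for a change on the CGI-S or GAF."""
    diff = follow_up - baseline
    # Assumption: partial steps are rounded down (e.g. 15 GAF points = 1 step).
    steps = abs(diff) // STEP[scale]
    if steps == 0:
        return ("none", "no clinically important change")
    # Higher CGI-S means more severe illness; higher GAF means better functioning.
    improved = diff < 0 if scale == "CGI-S" else diff > 0
    direction = "improvement" if improved else "worsening"
    # Assumption: anything beyond 6 steps is still labelled "extreme".
    return (direction, LABELS[min(steps, 6)])
```

For instance, under these assumptions a GAF score rising from 50 to 80 would be graded as a moderate improvement, and a CGI-S score rising from 2 to 5 as a moderate worsening.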
The battery of cognitive tests at inclusion and 24 months
Experienced neuropsychologists administered the tests in a fixed order that was the same for every centre. Testing lasted approximately 120 min, including 5-to-10-min breaks. This evaluation was not performed at 12 months. The standardised test battery complied with the recommendations of the International Society for Bipolar Disorders (Yatham et al., 2010). It included 11 tests, amongst which five were subtests from the Wechsler Adult Intelligence Scale (WAIS) version III (Wechsler, 1997a) or version IV (Wechsler et al., 2008), as the French version of the WAIS-IV was adopted once it became available. The battery evaluated six domains:
• Processing speed: Digit symbol coding (WAIS-III) or coding (WAIS-IV), WAIS symbol search and Trail-Making Test (TMT) part A
• Verbal memory: California Verbal Learning Test (Delis, 2000) short and long delay free recall and total recognition
• Attention: Conners’ Continuous Performance Test II (detectability; Conners and Staff, 2000)
• Working memory: WAIS digit span (total score) and spatial span (forward and backward scores) from the Wechsler Memory Scale version III (Wechsler, 1997b)
• Executive functions: colour/word condition of the Stroop test (Golden, 1978), semantic and phonemic verbal fluency (Lezak, 2004), and TMT part B (Reitan, 1958)
• Verbal and perceptual reasoning: WAIS vocabulary and matrices
Raw scores were transformed to demographically corrected standardised z-scores based on normative data (Golden, 1978; Conners and Staff, 2000; Poitrenaud et al., 2007; Godefroy, 2008). Higher scores reflected better performance.
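As a minimal sketch of this standardisation, assuming that the normative mean and s.d. for the participant's demographic stratum are already known, a z-score can be computed as below. The function name, the `higher_is_better` flag and the sign flip for timed or error-based scores are illustrative assumptions, not the procedure of the cited norms.

```python
def z_score(raw, norm_mean, norm_sd, higher_is_better=True):
    """Standardise a raw score against a normative mean and s.d."""
    if norm_sd <= 0:
        raise ValueError("norm_sd must be positive")
    z = (raw - norm_mean) / norm_sd
    # Flip scores such as completion times so that a higher z-score
    # always reflects better performance.
    return z if higher_is_better else -z
```

With hypothetical norms, a completion time of 90 s against a normative mean of 70 s (s.d. 20 s) would give a z-score of -1, that is, one s.d. below the normative level.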
Statistical analyses
Anchor-based MCID estimation
The Spearman rank correlation coefficient was used to quantify the association between the clinical anchors (CGI-S and GAF) and the instrument being investigated (FAST or cognition). Linking analysis aims to find corresponding points on tests that differ in length or content but are correlated (Lim, 1993). It has been recommended that the absolute correlation between the clinical anchors and the instrument being examined be ⩾0.30 (Revicki et al., 2008; Cheung et al., 2014). Thus, linking analyses were performed only for variables that met this threshold. Among the several available linking techniques, equipercentile linking is particularly useful, as it allows a non-linear relationship, with a symmetric attribution of random measurement error between the two tests, which is not true, for example, of linear regression (Kolen and Brennan, 2013). This technique sets the cumulative distribution functions of the two tests as equal and identifies the scores on each scale that have the same percentile ranks. The kernel method for equating tests was applied using the package kequate for R (Andersson et al., 2013).
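The correlation check that precedes linking can be illustrated with a short, dependency-free sketch. The function names are hypothetical, and this is not the code used in the study; it simply computes Spearman's coefficient as the Pearson correlation of tie-averaged ranks and applies the 0.30 threshold.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j share this rank
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / sqrt(var_x * var_y)  # undefined if either vector is constant


def eligible_for_linking(x, y, threshold=0.30):
    """Apply the |rho| >= 0.30 rule before attempting anchor-based linking."""
    return abs(spearman_rho(x, y)) >= threshold
```

The threshold is applied to the absolute value, so a strong negative association (for example between a severity scale and a performance score) qualifies in the same way as a positive one.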
In a cross-sectional analysis, the clinical anchor (CGI-S and GAF) scores were initially mapped to the FAST and cognition using equipercentile linking techniques for values at baseline and 12 and 24 months. The average linking values across all time points were also computed. Changes in CGI-S and GAF scores were then linked to corresponding changes in the FAST and cognition between baseline and 12 and 24 months. The average linking values for changes across all time points were also computed.
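To make the equipercentile idea concrete, the sketch below maps an anchor score to the instrument score that has the same percentile rank. It is a deliberately simplified, unsmoothed version with hypothetical function names; the study used kernel equating in the R package kequate, which this code does not reproduce. The same function could in principle be applied to change scores instead of cross-sectional scores.

```python
from bisect import bisect_left, bisect_right


def percentile_rank(sample, score):
    """Mid-percentile rank of `score` within `sample`, on a 0-100 scale."""
    ordered = sorted(sample)
    below = bisect_left(ordered, score)
    at = bisect_right(ordered, score) - below
    return 100.0 * (below + 0.5 * at) / len(ordered)


def equipercentile_link(anchor_sample, target_sample, anchor_score):
    """Return the target score whose percentile rank matches the anchor score."""
    p = percentile_rank(anchor_sample, anchor_score)
    ordered = sorted(target_sample)
    # Locate percentile p in the sorted target scores, interpolating linearly
    # between neighbouring observations (an assumption of this sketch).
    pos = p / 100.0 * (len(ordered) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (pos - lo) * (ordered[hi] - ordered[lo])
```

Because both scales are treated only through their percentile ranks, the mapping can follow a non-linear relationship between the two instruments, which a linear regression would not capture.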
Distribution-based MCID estimation
The distribution-based method estimates the MCID by comparing the observed change in the FAST and cognition to the variability in these instruments, calculated in this study as the standard error of measurement (s.e.m.), which is more concordant with a clinically meaningful change than other distribution-based methods (McHorney and Tarlov, 1995; Eisen et al., 2007). The formula for the s.e.m. is $\mathrm{s.e.m.} = \delta\sqrt{1 - r}$, where δ is the standard deviation (s.d.) and r is the reliability as measured by the intraclass correlation coefficient. Previous studies have shown that values between 1 and 1.96 s.e.m. approximate the MCID (Wyrwich, 2004; Rejas et al., 2008; Falissard et al., 2016). To calculate the s.d. of the FAST and cognition, a subset of the population with stable symptomatology during the follow-up period was chosen by identifying individuals whose CGI-S score did not change between baseline and follow-up (12 months for the FAST and 24 months for cognition), a method similar to that used by several authors (Duru and Fantino, 2008; Hermes et al., 2012). The s.d. of the FAST and cognition scores for this population at baseline was used for the s.e.m. calculation. The intraclass correlation coefficient was calculated using a two-way mixed model of the scores at baseline and at follow-up (12 months for the FAST and 24 months for cognition).
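The s.e.m. formula translates directly into code. The sketch below uses hypothetical function names and purely illustrative inputs (s.d. = 10, r = 0.75) that are not taken from the study; it returns the 1 s.e.m. and 1.96 s.e.m. values that bracket the distribution-based MCID.

```python
from math import sqrt


def standard_error_of_measurement(sd, reliability):
    """s.e.m. = s.d. * sqrt(1 - r), with r a reliability coefficient in [0, 1]."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return sd * sqrt(1.0 - reliability)


def mcid_bounds(sd, reliability):
    """Return the (1 s.e.m., 1.96 s.e.m.) pair used as MCID bounds."""
    sem = standard_error_of_measurement(sd, reliability)
    return sem, 1.96 * sem


# Illustrative values only (not study data): s.d. = 10, r = 0.75.
lower, upper = mcid_bounds(10.0, 0.75)
```

With these illustrative inputs the bounds are 5.0 and about 9.8, which shows how a lower reliability or a more heterogeneous sample widens the estimated MCID.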
Results
Participants
The breakdown of participants at each time point was as follows: baseline, 1422; 1 year, 742 (47.8% of the participants were lost); and 2 years, 571 (59.8% of the participants were lost). Participants were included between January 2009 and October 2015. Their socio-demographic, clinical and functional characteristics at inclusion are presented in Table 1. A current mood episode was present in 15.5% of individuals at inclusion.
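The attrition percentages follow directly from the counts given above, as this short check shows (the function name is hypothetical).

```python
def attrition_percent(n_baseline, n_follow_up):
    """Percentage of baseline participants not assessed at follow-up."""
    return 100.0 * (n_baseline - n_follow_up) / n_baseline


lost_1y = attrition_percent(1422, 742)  # participants lost at 1 year
lost_2y = attrition_percent(1422, 571)  # participants lost at 2 years
```

Rounded to one decimal place, these reproduce the reported 47.8% and 59.8%.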
MADRS, Montgomery Åsberg Depression Rating Scale; YMRS, Young Mania Rating Scale; CGI, Clinical Global Impression scale; GAF, Global Assessment of Functioning scale; FAST, Functioning Assessment Short Test.
The number of participants who benefited from the neuropsychological evaluation was 1221 at inclusion (41% had WAIS-IV) and 366 at 2 years. The results of the neuropsychological tests are presented in Table 2.
TMT, Trail Making Test; CPT, Continuous Performance Test; CVLT, California Verbal Learning Test.
Linking the FAST to the CGI-S and GAF
Linking of cross-sectional scores with the anchor-based MCID estimation
The correlations between the FAST total score, CGI-S and GAF are presented in online Supplementary Table 1. The observed correlations were statistically significant at all time points (p-value < 0.001) for the CGI-S and GAF, with all absolute values of Spearman's rank correlation coefficients >0.4, thus allowing anchor-based linking analysis.
The results of the linking between the FAST total scores and the CGI-S scores at each measurement wave are presented in online Supplementary Fig. 1. An average CGI-S ranking of 1-‘normal’ corresponded to a FAST score of 0, 2-‘borderline’ to 3, 3-‘mildly ill’ to 8, 4-‘moderately ill’ to 18, 5-‘markedly ill’ to 29, 6-‘severely ill’ to 40 and 7-‘extremely ill’ to 54.
The results of the linking between the FAST total scores and the GAF scores at each measurement wave are shown in online Supplementary Fig. 2. An average GAF ranking of 20 (some danger of hurting self or others) corresponded to a FAST score of 72, 30 (serious impairment in communication or judgment, or inability to function in almost all areas) to 63, 40 (major impairment in several areas, such as work or school, family relations, judgment, thinking or mood) to 50, 50 (any serious impairment in social, occupational, or school functioning) to 39, 60 (moderate difficulty in social, occupational, or school functioning) to 27, 70 (some difficulty in social, occupational or school functioning, but generally functioning pretty well, has some meaningful interpersonal relationships) to 16, 80 (no more than slight impairment in social, occupational or school functioning) to 7, 90 (good functioning in all areas, interested and involved in a wide range of activities; socially effective, generally satisfied with life, no more than everyday problems or concerns) to 2, and 100 (superior functioning in a wide range of activities, life's problems never seem to get out of hand, is sought out by others because of his or her many positive qualities) to 0.
Linking of change scores with the anchor-based MCID estimation
Among participants, 35.6% and 42.2% presented no change in CGI-S and FAST, respectively. The number of occurrences in each level of change in CGI-S and GAF is reported in Table 3. The correlations between changes in the FAST, CGI-S and GAF are presented in online Supplementary Table 2. The observed correlations were statistically significant at all time points (p < 0.001) for the CGI-S and GAF, with all absolute values of Spearman rank correlation coefficients >0.3, thus allowing anchor-based linking analysis. The results of the linking between changes in the FAST total score and CGI-S at each measurement wave are shown in Fig. 1 and Table 3. The results of the linking between changes in the FAST total score and GAF at each measurement wave are shown in Fig. 2 and Table 3.
CGI, Clinical Global Impression scale; GAF, Global Assessment of Functioning scale; FAST, Functioning Assessment Short Test.
The MCID for the FAST was equal to 8 or 9 points using the CGI-S and GAF (lower bound of the minimum clinically important improvement for CGI-S: 8; lower bound of the minimum clinically important worsening for CGI-S: 8; lower bound of the minimum clinically important improvement for GAF: 8; lower bound of the minimum clinically important worsening for GAF: 9). A change in FAST of 16 points was considered to be mild, 23 moderate and 31 marked.
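As an illustration of how these thresholds might be applied to an individual patient, the sketch below grades a change in FAST total score. The function name is hypothetical, and two simplifying assumptions are made: a single lower bound of 8 points is used for both improvement and worsening, and a decrease in the FAST is treated as an improvement, since lower FAST scores were linked to better CGI-S and GAF ratings.

```python
# Cut-offs on the absolute change in FAST total score, taken from the text.
FAST_THRESHOLDS = [(31, "marked"), (23, "moderate"), (16, "mild"), (8, "minimal")]


def interpret_fast_change(baseline, follow_up):
    """Grade a change in FAST total score against the anchor-based cut-offs."""
    diff = follow_up - baseline
    for cutoff, label in FAST_THRESHOLDS:
        if abs(diff) >= cutoff:
            # Assumption: a lower FAST score means better functioning.
            direction = "improvement" if diff < 0 else "worsening"
            return f"{label} {direction}"
    return "below the MCID"
```

Under these assumptions, a patient whose FAST score falls from 30 to 21 shows a minimal clinically important improvement, whereas a change from 20 to 25 remains below the MCID.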
Distribution-based MCID estimation for the FAST
The sub-population for which there was no change in the CGI-S between inclusion and 12 months consisted of 233 individuals with a mean FAST score of 18.6 (s.d. = 14.6) at inclusion. The reliability of the FAST, calculated as the intra-class correlation between the FAST scores at baseline and those at 12 months, was 0.73, which gives 1 s.e.m. = 7.6 and 1.96 s.e.m. = 14.9 FAST points.
Linking cognition to the CGI-S and GAF
Linking of cross-sectional scores with the anchor-based MCID estimation
The correlations between cognition, CGI-S and GAF are presented in online Supplementary Table 3. The absolute values of the Spearman rank correlation coefficients were all ⩽0.2: it was thus not possible to perform an anchor-based analysis linking cognition with the CGI-S and GAF. However, certain observed correlations were statistically significant. The strongest negative association between the CGI-S and cognition was found for CPT-Detectability at 24 months (ρ = −0.15, uncorrected p-value = 0.007). The strongest positive association between the GAF and cognition was found for Verbal Fluency-Semantic at 24 months (ρ = 0.2, uncorrected p-value ⩽0.001).
Linking of change scores with the anchor-based MCID estimation
The correlations between changes in cognition, CGI-S and GAF are presented in online Supplementary Table 4. The absolute values of Spearman rank correlation coefficients were all ⩽0.2: it was thus not possible to perform an anchor-based analysis linking cognition with the CGI-S and GAF. A few observed correlations were statistically significant. The strongest negative association between changes in CGI-S and cognition was found for Digit/symbol coding (ρ = −0.18, uncorrected p-value = 0.001) and the strongest positive association between changes in GAF and cognition was found for Digit Span Forward & backward (ρ = 0.13, uncorrected p-value = 0.019).
Distribution-based MCID estimation for cognition
The results are presented in Table 4. The MCID for cognition ranged from 0.45 (for Digit/symbol coding) to 0.93 (for TMT part B) for 1 s.e.m. and from 0.88 to 1.82 for 1.96 s.e.m.
ICC, Intra Class Correlation; TMT, Trail Making Test; CVLT, California Verbal Learning Test; CPT, Continuous Performance Test.
Discussion
This study estimated the MCID for the FAST, a widely used measure of domain-based functioning in BD, along with a battery of cognitive tests.
Main findings and comparison with other studies
This is the first study to report the MCID in psychosocial functioning and cognitive performance in BD. We found an estimate of 8 or 9 for the MCID in the FAST total score with the anchor-based approach, which was close to the threshold of 7.6 found with the 1-s.e.m. distribution-based approach. These results suggest that a change below 8 for the FAST total score would not be clinically significant at the individual level. Despite different conceptual underpinnings, the anchor- and distribution-based estimations of the MCID for the FAST were very close, thus providing additional evidence of the validity of these estimates. The 1.96 s.e.m. distribution-based approach gave a more conservative threshold of 14.9 for the FAST total score.
Moreover, using an anchor point of >80 for the GAF for functional remission (Bonnin et al., 2018), we obtained a cut-off of ⩽7 for the FAST. Considering the transition between borderline and mildly ill for the CGI-S as another anchor point for clinical remission, we obtained exactly the same threshold of <8 for the FAST. This threshold is lower than the cut-off of 11 previously estimated in a sample of 101 participants (Bonnin et al., 2018). This small gap between the two studies may be explained by the higher depressive and manic symptomatology in our sample than in the other, in which participants were strictly euthymic. Indeed, functional remission is more difficult to attain in cases of more pronounced mood symptoms. Selecting only euthymic participants when studying functioning may be problematic for the generalisability of results, as it excludes participants with a mild form of chronic or highly recurrent depression, who may yet benefit from functional remediation. By contrast, the present study used open inclusion criteria, allowing for selection of what is likely a generalisable population of outpatients with BD.
For cognition, the threshold correlation of |0.3| with clinical severity or global functioning was not reached for any of the cognitive tests. A meta-analysis reported a mean Pearson correlation between neurocognitive ability and functioning of 0.27 (Depp et al., 2012). However, the correlations were lower for clinician ratings (such as the three scales used in this study) than for performance-based tasks and real-world milestones, such as employment. Performance-based tasks may thus be better candidates for anchoring cognition on functioning than clinician-rated scales such as the GAF or CGI. Subtle cognitive impairments might also be detected with a self-reported scale assessing cognitive complaints, such as the ‘Cognitive complaints in Bipolar disorder Rating Assessment’ (COBRA). This scale may be more closely associated with functioning than objective neuropsychological performance. Anchor-based MCID in cognition measured with COBRA should thus be explored in further studies. In the present study, the MCID for cognition was evaluated using only distribution-based methods: the 1 s.e.m. estimate of the MCID ranged from 0.5 to 0.9 s.d. and the 1.96 s.e.m. estimate ranged from 0.9 to 1.8 s.d. Very few studies have explored the MCID in the context of a neuropsychological battery. An observational study reported a similar range of 0.5–0.9 s.d. for the 1 s.e.m. estimate of the MCID in cognition for mild cognitive impairment (Phillips et al., 2015). In that study, the anchor-based MCID in cognition ranged from 0.3 to 0.9. Another study investigating reliable cognitive changes in schizophrenia reported even larger values, between 0.7 and 1.7 s.d. (Gray et al., 2014).
The MCID found for cognition in this study may seem too large to be considered minimally detectable by patients and clinicians. Several factors may explain this large MCID in cognition. First, one might speculate that the 1.96 s.e.m. estimate of the MCID in cognition overestimated the true MCID, as the 1.96 s.e.m. estimate for the FAST was larger than the anchor-based MCID in our study. One previous study indeed reported that even the 1 s.e.m. estimate of the MCID in cognition was slightly larger than the anchor-based MCID (Cheung et al., 2014). Secondly, the neuropsychological performances were heterogeneous in our observational study, as the participants were not selected on their cognitive performance, as opposed to RCTs investigating cognitive remediation or enhancement. Substantial heterogeneity implies a high s.d. in cognitive performance, leading to a large s.e.m. and MCID. The MCID in cognition must thus be interpreted with caution, as the s.e.m. only reflects a change that cannot be attributed to measurement error alone. The assumption that it estimates the MCID is only theoretical (some authors consider, for example, that the s.e.m. measures a minimal detectable difference rather than a minimal clinically important difference; De Vet et al., 2011) and should be corroborated with clinical anchors. Here, the MCID was evaluated within an observational study. The MCID may differ depending on whether the data were gathered in an observational study or clinical trial (Revicki et al., 2008). RCTs may overestimate the anchor-based MCID, as substantial differences in outcomes are expected in a carefully selected population (Falissard et al., 2016).
By contrast, a distribution-based MCID would underestimate values due to the homogeneity of the selected population. Observational study-based MCIDs may conversely be more reliable as they are not affected by therapeutic interventions or eligibility criteria (Falissard et al., 2016).
Limitations
This study had several limitations. The first was the long time interval between the two waves used to calculate the distribution-based MCID for functioning (1 year) and cognition (2 years). This may have led to an overestimation of the s.e.m. by increasing the probability that a change occurred during the follow-up period, especially since previous reports showed an improvement in psychosocial functioning and cognition in this cohort (Ehrminger et al., 2019). However, we believe that such an overestimation bias may have been limited by the fact that the distribution-based estimates were computed on a sample of patients with a stable CGI-S score. The influence of mood symptoms (Bonnín et al., 2014), medication (Roux et al., 2019) and trauma (Jimenez et al., 2017) was not assessed in this study, although these variables could have influenced patient outcomes. Another significant limitation was the lack of a psychometrically validated MCID for the two gold-standard anchor measures (CGI-S and GAF); the values used were determined based on the expertise of the authors and on how the two scales were developed and clinically anchored. A final drawback was the loss of more than half of the patients to follow-up. No survey was offered to the non-completers; it was thus impossible to investigate the reasons for such a high rate of attrition.
Clinical implications
We estimated the MCID for the FAST with a large representative sample using various complementary analytical techniques. The results were consistent, giving an estimation of 8 points. This result may provide clinicians with a better understanding of a commonly used measure of functioning in BD in both research reports and clinical practice. In light of the recent developments of functional remediation in BD, it is crucial to know whether newer interventions are sufficient to achieve functional recovery and a clinically relevant change in functioning. The results presented here aid in the transposition of trial results into practice.
Our results were less clear for the determination of the MCID for cognition, as changes in cognitive performance did not consistently correlate with changes in clinical severity or functioning. Further studies should use performance-based measures of functioning as clinical anchors for cognition in BD, for example, the Brief University of California, San Diego (UCSD) Performance-based Skills Assessment (Patterson et al., 2001). Despite this limitation, our results provide the first estimates for interpreting cognitive changes in BD at an individual level; these results would also help in estimating the number of patients needed to treat in RCTs in the field of cognitive remediation in BD.
Supplementary material
The supplementary material for this article can be found at https://doi.org/10.1017/S2045796020000566.
Availability of data and materials
Data will not be shared as participants did not give their consent for sharing these data.
Acknowledgements
All co-authors were invited to proofread and amend the manuscript. We thank the Centre Hospitalier de Versailles and William Hempel (Alex Edelman and Associates) for editorial assistance.
Financial support
This work was supported by the Centre Hospitalier de Versailles, Fondation FondaMental, Créteil, France, and the Investissements d'Avenir Programs managed by the ANR under references ANR-11-IDEX-0004-02 and ANR-10-COHO-10-01.
Conflict of interest
The authors have no conflicts of interest to state.
Ethical standards
The authors assert that all procedures contributing to this work comply with the ethical standards of the relevant national and institutional committees on human experimentation and with the Helsinki Declaration of 1975, as revised in 2008.