Predicting relapse or recurrence of depression: systematic review of prognostic models

Background
Relapse and recurrence of depression are common, contributing to the overall burden of depression globally. Accurate prediction of relapse or recurrence while patients are well would allow the identification of high-risk individuals and may effectively guide the allocation of interventions to prevent relapse and recurrence.

Aims
To review prognostic models developed to predict the risk of relapse, recurrence, sustained remission, or recovery in adults with remitted major depressive disorder.

Method
We searched the Cochrane Library (current issue); Ovid MEDLINE (1946 onwards); Ovid Embase (1980 onwards); Ovid PsycINFO (1806 onwards); and Web of Science (1900 onwards) up to May 2021. We included development and external validation studies of multivariable prognostic models. We assessed risk of bias of included studies using the Prediction model risk of bias assessment tool (PROBAST).

Results
We identified 12 eligible prognostic model studies (11 unique prognostic models): 8 model development-only studies, 3 model development and external validation studies and 1 external validation-only study. Multiple estimates of performance measures were not available and meta-analysis was therefore not possible. Eleven out of the 12 included studies were assessed as being at high overall risk of bias and none examined clinical utility.

Conclusions
Due to high risk of bias of the included studies, poor predictive performance and limited external validation of the models identified, presently available clinical prediction models for relapse and recurrence of depression are not yet sufficiently developed for deployment in clinical settings. There is a need for improved prognosis research in this clinical area and future studies should conform to best practice methodological and reporting guidelines.


Background
Depression is the leading cause of disability worldwide. 1 After a first episode of depression, approximately half of patients will experience a relapse or recurrence (re-emergence of depressive symptoms after an initial improvement), 2 and most do so within the first 6 months. 3 Those who experience a relapse or recurrence are more likely to relapse again than those who do not 4 and may, possibly, develop increased treatment resistance. 5 Reliable prediction of individuals' risk of relapse and recurrence might enable a precision medicine approach to relapse prevention, personalising the allocation, and potentially the type, of relapse prevention interventions offered to ensure maximum benefit. Prognostic factors are variables that are associated with an outcome of interest, although they are not necessarily causal, and overall prognosis can be estimated within groups defined by the values of a prognostic factor. These are differentiated from prescriptive factors, which are associated with outcomes and also moderate treatment effects. Prognostic factors associated with relapse and recurrence include childhood maltreatment, history of recurrent depression and presence of residual depressive symptoms, among others, whereas evidence for prescriptive factors remains limited. 6 Multivariable prognostic models combine information about multiple prognostic factors for a particular person to provide individualised risk predictions. 7 There has been an increasing number of attempts to derive and validate prognostic models to predict depression-related outcomes. 8-11 There has been no previous systematic review to identify all prognostic models designed to predict relapse or recurrence of depression.

Objectives
To identify and critically appraise prognostic model development and validation studies aimed at predicting relapse, recurrence, sustained remission or recovery in adults with major depressive disorder who meet the criteria for remission or recovery. In addition, we planned to summarise and meta-analyse their predictive performance, to describe the characteristics of the models identified, and to review the clinical utility (net benefit) of the identified models, where possible.

Method
The protocol was preregistered in the Cochrane Database of Systematic Reviews (CD013491) 12,13 and the review is reported in line with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guideline. 14

Eligibility criteria
We specified the following inclusion criteria (see the Appendix for PICOTS criteria): 15 (a) adult population (18 years and over) with major depressive disorder (defined using validated diagnostic criteria) who met criteria for remission or recovery (i.e. no longer meeting diagnostic criteria for major depressive episode) at the point of prediction; (b) any setting (primary, secondary, or community care); (c) all multivariable prognostic models developed to predict individual risk of relapse, recurrence, sustained remission, or recovery of depression over any time period.
Remission and recovery are terms used to describe an improvement in depressive symptoms; remission meaning improved but still 'in episode' and recovery being the resolution of the underlying episode (usually after 6 to 12 months of remission). 16 Relapse occurs following some level of remission but precedes recovery, whereas recurrence is the onset of a new episode of depression following recovery. 17,18 Sustained remission can be thought of as the inverse, or opposite of relapse; and recovery as the inverse of recurrence. Both of these hold potentially valuable prognostic information pertinent to relapse risk prediction models in depression, and are therefore included as outcomes in this review. The precise temporal cut-offs of these terms have not been robustly validated empirically and are inconsistently operationalised in the literature. 6 For this reason, we accepted all definitions of these terms, as operationalised by the authors of the primary studies.
We included all three types of prognostic model study: (a) development studies with internal validation (which derive a model for individualised prediction and quantify predictive performance in the development data-set); (b) development with external validation (which develop a model and then quantify the performance in data external to the development set); and (c) external validation only (attempt to externally validate an existing model). 19 External validation did not include randomly splitting the development data-set to produce two separate data-sets (an approach more appropriately considered an inefficient form of internal validation), 7 but did include studies where a validation data-set was produced by a non-random split, for example, participants from the same institution but at different time points (temporal validation) or by location (geographical validation). 20 We excluded models developed in populations with comorbid severe mental illness (for example, schizophrenia and bipolar affective disorder), as these patients typically receive more intensive psychiatric input and the results would be less generalisable. We excluded studies where the intention was not to provide individualised risk predictions (for example those aimed at quantifying the adjusted effects of prognostic factors).

Information sources and search strategy
We searched the Cochrane Library (current issue); Ovid MEDLINE (1946 onwards); Ovid Embase (1980 onwards); and Ovid PsycINFO (1806 onwards) up to May 2021, using relevant subject headings (controlled vocabularies) and search syntax appropriate to each resource. We also searched several grey literature resources, primarily for dissertations and theses (Open Grey (www.opengrey.eu); ProQuest Dissertations & Theses Global (www.proquest.com/products-services/pqdtglobal.html); DART-Europe E-theses Portal (www.dart-europe.eu); EThOS, the British Library's e-theses online service (ethos.bl.uk); Open Access Theses and Dissertations (oatd.org)), also up to May 2021. We applied no restrictions by date, language or publication status. We checked the reference lists of all included articles and conducted forward citation searches on the Web of Science (12 March 2021 and 19 May 2021) to identify additional studies missed by the original electronic searches (for example unpublished or in-press citations).
We contacted corresponding authors for information on unpublished or ongoing studies.

Selection of studies
Two review authors (A.S.M. and N.M.) independently reviewed the titles and abstracts of studies identified by the search strategy. We excluded prognostic model studies that clearly did not meet our inclusion criteria at the title and abstract screening stage. For any studies where there was uncertainty, we undertook a full-text review. We resolved disagreement in judgements through discussion or, if necessary, by referral to a third review author (K.I.E.S. or D.M.).

Data collection
Two review authors (A.S.M. and N.M.) independently conducted the data extraction, commencing 1 September 2020. The Checklist for Critical Appraisal and Data Extraction for Systematic Reviews of Prediction Modelling Studies (CHARMS), which has been specifically designed for systematic reviews of prognostic models, was used to guide data extraction. 15 This included the following measures of predictive performance, where available:
• calibration, which measures the extent to which risk predictions and observed outcomes are in agreement (measures extracted included calibration slope, ratio of observed (O) to expected (E) events (O:E ratio), calibration plots); and
• discrimination, the model's ability to separate patients who develop the outcome of interest from those who do not (usually measured using the concordance (C)-statistic or area under the receiver operating characteristic curve (AUC)).
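As an illustration of the two measures defined above, the following minimal sketch computes a C-statistic and an O:E ratio for a binary outcome. The function names and the data are hypothetical and are not taken from any included study; the pairwise C-statistic shown here applies to binary outcomes only and is not Harrell's C for time-to-event data.

```python
from itertools import product


def c_statistic(risks, outcomes):
    """Discrimination for a binary outcome: the proportion of
    (event, non-event) pairs in which the participant with the event
    received the higher predicted risk. Ties count as half."""
    events = [r for r, y in zip(risks, outcomes) if y == 1]
    non_events = [r for r, y in zip(risks, outcomes) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    score = 0.0
    for e, n in product(events, non_events):
        if e > n:
            score += 1.0
        elif e == n:
            score += 0.5
    return score / (len(events) * len(non_events))


def observed_expected_ratio(risks, outcomes):
    """Calibration-in-the-large: observed events divided by the sum of
    predicted risks (the expected number of events)."""
    return sum(outcomes) / sum(risks)


# Hypothetical predicted risks of relapse and observed outcomes (1 = relapse)
risks = [0.1, 0.2, 0.3, 0.4, 0.6, 0.8]
outcomes = [0, 0, 1, 0, 1, 1]

print("C-statistic:", round(c_statistic(risks, outcomes), 3))
print("O:E ratio:", round(observed_expected_ratio(risks, outcomes), 3))
```

A C-statistic of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation; an O:E ratio above 1 indicates that the model under-predicts the number of events overall.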
Where these measures were not available directly, we planned to calculate them from other information available with reference to recent guidance. 21 We also planned to extract information on clinical utility, where available. Clinical utility is important to consider when a model's predicted risks are to be used to inform decision-making. It can be measured by the net benefit at a particular risk threshold, and by plotting decision curves of the net benefit across a range of relevant thresholds. 22
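For readers unfamiliar with net benefit, the sketch below shows the standard calculation at a single risk threshold: true positives per participant minus false positives per participant weighted by the odds of the threshold. The function names and data are hypothetical; no included study reported this analysis.

```python
def net_benefit(risks, outcomes, threshold):
    """Net benefit of intervening in everyone whose predicted risk is at
    or above the threshold: TP/n - (FP/n) * threshold / (1 - threshold)."""
    if not 0 < threshold < 1:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(outcomes)
    tp = sum(1 for r, y in zip(risks, outcomes) if r >= threshold and y == 1)
    fp = sum(1 for r, y in zip(risks, outcomes) if r >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1 - threshold)


def net_benefit_treat_all(outcomes, threshold):
    """Reference strategy: intervene in every participant."""
    prevalence = sum(outcomes) / len(outcomes)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Hypothetical predicted risks and observed outcomes (1 = relapse)
risks = [0.1, 0.2, 0.3, 0.4, 0.6, 0.8]
outcomes = [0, 0, 1, 0, 1, 1]

# A decision curve is this calculation repeated across relevant thresholds
for threshold in (0.25, 0.50):
    print(threshold,
          round(net_benefit(risks, outcomes, threshold), 3),
          round(net_benefit_treat_all(outcomes, threshold), 3))
```

A model has potential clinical utility at a given threshold only if its net benefit exceeds that of both reference strategies: intervening in everyone and intervening in no one (net benefit of zero).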

Data synthesis and meta-analysis approaches
If a sufficient number of external validation studies were identified for a particular model, we planned to conduct random-effects meta-analyses to summarise the performance of prognostic models, as the data were likely to be highly heterogeneous. In the absence of sufficient data for a meta-analysis, we have used a narrative synthesis instead.
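Had several external validations of the same model been available, pooling could have proceeded along the following lines. This is a minimal sketch only: the DerSimonian-Laird estimator, the logit transformation of the C-statistic and all numerical values are assumptions chosen for illustration, not methods or results reported in this review.

```python
import math


def dersimonian_laird(estimates, variances):
    """Random-effects pooling with the DerSimonian-Laird estimate of the
    between-study variance (tau squared). Returns (pooled, se, tau2)."""
    if len(estimates) != len(variances) or len(estimates) < 2:
        raise ValueError("need matching lists with at least two studies")
    k = len(estimates)
    w = [1.0 / v for v in variances]                      # fixed-effect weights
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum_w
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    w_star = [1.0 / (v + tau2) for v in variances]        # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, estimates)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return pooled, se, tau2


def logit(p):
    return math.log(p / (1 - p))


def inv_logit(x):
    return 1 / (1 + math.exp(-x))


# Hypothetical C-statistics from three external validations of one model,
# with hypothetical variances already expressed on the logit scale
c_stats = [0.59, 0.66, 0.72]
logit_variances = [0.010, 0.015, 0.008]

pooled, se, tau2 = dersimonian_laird([logit(c) for c in c_stats], logit_variances)
lower, upper = inv_logit(pooled - 1.96 * se), inv_logit(pooled + 1.96 * se)
print(round(inv_logit(pooled), 3), round(lower, 3), round(upper, 3), round(tau2, 4))
```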

Risk of bias assessment in included studies
Two independent review authors (A.S.M. and N.M.) assessed risk of bias (ROB) using the Prediction model risk of bias assessment tool (PROBAST), which assesses ROB (low, high or unclear) over four domains (participants, predictors, outcomes and analysis) and applicability (concerns about applicability; also low, high, or unclear) in the first three of the domains. 7,19,23 For the 'Analysis' domain, when determining whether an appropriate sample size was used, we adhered to PROBAST recommendations, which apply a rule of thumb based on events per candidate predictor parameter (EPP). The PROBAST guidance suggests an EPP of 20 and over for development studies (although those between 10 and 20 EPP can be rated 'probably yes' or 'probably no', depending on outcome frequency, overall model performance and distribution of predictors in the model) and at least 100 participants with the outcome and 100 without the outcome for external validation studies. For handling of missing data, multiple imputation is considered the most appropriate method when data are missing at random 7 and is recommended by PROBAST. 23 The PROBAST tool has been developed primarily for studies that used a more traditional regression method and guidance on best practice for machine learning models is less widely available. In the case of any machine learning models identified, we applied the PROBAST guidance as described for traditional regression techniques, but judgements should be interpreted with these limitations in mind.
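The sample size rules of thumb described above can be expressed as a short sketch. The function names and rating labels are hypothetical, and the thresholds are those stated in the paragraph above; the worked example uses the 104 recurrences and eight candidate predictor parameters reported for one included study in the Results.

```python
def events_per_parameter(n_events, n_candidate_parameters):
    """Events per candidate predictor parameter (EPP)."""
    if n_candidate_parameters <= 0:
        raise ValueError("at least one candidate predictor parameter is required")
    return n_events / n_candidate_parameters


def development_sample_size_rating(epp):
    """Map an EPP onto the bands described in the text (labels are illustrative)."""
    if epp >= 20:
        return "adequate"
    if epp >= 10:
        return "judgement required (probably yes / probably no)"
    return "inadequate"


def external_validation_sample_adequate(n_with_outcome, n_without_outcome):
    """At least 100 participants with and 100 without the outcome."""
    return n_with_outcome >= 100 and n_without_outcome >= 100


epp = events_per_parameter(104, 8)
print(epp, development_sample_size_rating(epp))
```

In the intermediate band the rating still depends on outcome frequency, overall model performance and the distribution of predictors, so the sketch flags these cases for reviewer judgement rather than deciding them.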

Results of the search
We identified a total of 8694 studies initially, with one study located through a forward citation search performed on 12 March 2021. 24 Deduplicated records (n = 5777) underwent title and abstract screening by two independent review authors (A.S.M. and N.M.), 51 underwent full-text screening and 12 studies were included in the final review (2 full-text articles required referral to K.I.E.S. and were excluded following this referral). These included 11 unique prognostic models; 1 of the studies 24 externally validated a model developed elsewhere. 25 Studies excluded after full-text screening (n = 37) fell into two categories: not meeting study design criteria (i.e. model not intended for prediction) or not meeting participant population criteria. Two studies (awaiting further information) were conference proceedings; we were unable to obtain further information on these studies and so did not include them in the review 26,27 (Fig. 1).

Description of studies
Of the included studies (Table 1), three were development and external validation studies, 28-30 eight were development-only studies 25,31-37 and one 24 was an external validation study. Three 25,35,36 of the development-only studies reported internal validation. No prognostic model was externally validated in more than one included study and, therefore, a meta-analysis was not possible. All included studies used prospectively gathered data for developing the prognostic models. Four of the models were developed in secondary care, 32-34,37 whereas the other seven were developed in primary care 28,36 or community settings. 25,29-31,35 Van Loo et al (2020) used a data-set drawn from primary care, secondary care and community settings (the Netherlands Study of Depression and Anxiety (NESDA)) for external validation. 24 Further details of the studies can be found in Supplementary Table S1 (available at https://doi.org/10.1192/bjp.2021.218).
The Appendix summarises the specific outcome definitions used. The included studies covered a wide range of predictors (Table S2 outlines the different predictors included in the final models and how they were measured for the individual studies). Most commonly, these were disease-related characteristics and demographic factors. Some studies explored less common predictors, such as neuropsychological predictors (emotional categorisation, emotional memory, and facial expression recognition); 36 personality characteristics such as neuroticism; 32 psychosocial predictors such as life stress and interpersonal difficulties; 31 biochemical predictors such as results from the corticotrophin-releasing factor test; 37 peripheral blood metabolomic markers; 35 and combinations of items from the Symptom Checklist (SCL-90). 34 Of the 11 development studies, nine used regression analysis (five used logistic regression 30,32-34,37 and four used Cox proportional hazards regression to study time to recurrence 25,28,29,35). Of the remaining two included development studies, one used a machine learning support vector machine model to predict recurrence over a median period of 233 days 36 and the other used discriminant function analysis (DFA), a statistical method to identify which continuous variables (predictors) best discriminate between two or more groups (in this case, relapse or stable remission). 31

Predictive performance of prognostic models
The predictive performance of all included models is summarised in Table S2. Six of the model development studies identified 25,28-30,35,36 reported internal validation to account for overfitting and optimism within the developed model. Three also reported external validation, using a data-set separate from the training data-set to give a truer reflection of model performance and generalisability. 28-30 Van Loo (2020) 24 presented the external validation of the model developed in Van Loo (2018). 25
Klein (2018) 28 used a randomised controlled trial data-set separate from that used for development for external validation and presented a calibration slope of 0.56 (0.81 on internal validation) and a Harrell's C-statistic of 0.59 (0.56 on internal validation). Van Loo (2015) 29 used a temporal cut-off to define their development and validation samples (temporal validation). They presented 'comparable' Kaplan-Meier curves as evidence that their prognostic model was well calibrated for people at lower risk of relapse but less so for higher-risk participants, and an AUC of 0.61 on external validation (0.79 on internal validation). Wang et al (2014) 30 used data from the same source but from a different geographical region (geographical validation) to define development and external validation data-sets. The authors presented a C-statistic of 0.72, indicating good discrimination, and presented the result of the Hosmer-Lemeshow goodness-of-fit test (3.51, P = 0.9) as evidence of 'excellent calibration'.
Van Loo et al (2020) 24 presented the results of the developed model in two 'test' sets. One of these, the Virginia Adult Twin Study of Psychiatric and Substance Use Disorders (VATSPSUD), comprised data from the same sample used in Van Loo et al (2018) 25 for model development and we have therefore classified this as an internal validation. The second test sample (NESDA) is separate from the development data-set and we have focused on this as the external validation. Discrimination was reported as good (AUC = 0.68 (95% CI 0.66-0.71) predicting recurrence over 0 to 2 years; AUC = 0.72 (95% CI 0.69-0.75) predicting recurrence over 0 to 9 years); calibration was not reported. Of the external validations included in this review, only Van Loo et al (2020) 24 included 95% CI for measures of predictive performance.
Klein et al (2018) 28 was the only included study to present all of the regression coefficients for the predictors included in the final model as well as the intercept and associated 95% CI. This model could therefore be used based on the information provided in the primary source. None of the included studies undertook net benefit analysis (clinical utility) of the developed models.

ROB and applicability assessment of included studies
We rated 11 of the 12 included studies as being at high overall ROB (see Fig. 2). Klein et al (2018) 28 was assessed to be at low ROB in all four domains. ROB was generally assessed as being low for most studies in the domains of participants and predictors. ROB was unclear for 8 out of 12 of the studies in the domain of outcomes, because the studies did not state that outcomes were determined masked to the predictor information. For the fourth domain (analysis), the quality of the reported methods was variable, and weaknesses and potential sources of bias were identified in this domain for 11 of the 12 included studies.

The most common weakness related to sample size or number of events, or both; too few of either seriously impairs a statistical model's performance in the real world because of a significant risk of overfitting. 38 Most studies did not describe how the sample size was determined. Only one study 28 reported sufficient EPP for model development (104 recurrences for eight candidate predictor parameters). All other regression models 25,29,30,32-35,37 had inadequate sample size, according to PROBAST (see Method). The sample size determination used by Backs-Dermott et al (2010), 31 which used DFA, appeared to be appropriate according to their reported methods.
Ruhe et al (2019) 36 used a machine learning approach for model development. Formal guidance is lacking to aid sample size determinations for prognostic model studies using machine learning techniques. The guidance and literature that do exist suggest that we should demand, if anything, significantly larger sample sizes when using a machine learning approach to prognostic model development, with one paper estimating that one would need more than ten times the EPP required for regression models to achieve a stable AUC and small optimism. 39 This study did not have an adequate sample size according to any of the existing guidance and recommendations. For Van Loo et al (2020), 24 although it was not explicitly stated, we made the assessment that the sample size probably met PROBAST requirements for external validation (at least 100 events).
Another limitation of the majority of the included studies (n = 8) was their handling of missing data. Multiple imputation was used to handle missing data in only four of the identified studies. 24,25,28,34 The remaining studies either did not report their approach 31-33,37 or used non-PROBAST recommended approaches for handling missing data, such as imputing the mean 36 or single imputation. 29,30 Finally, most studies (n = 11) did not present appropriate performance statistics. The PROBAST guidance recommends that, as a minimum, a calibration plot and discrimination statistics (C-statistic for binary and time-to-event outcome models) are presented as relevant performance measures for a prognostic model study. 19 Classification measures, such as sensitivity and specificity, can be presented in addition to calibration and discrimination statistics, but they have the drawback of loss of information and of requiring risk thresholds to be specified, often based on the data rather than on meaningful, clinical grounds. One study 28 presented both a calibration plot and C-statistic in line with minimum best practice.
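The threshold dependence of classification measures noted above can be seen in a small sketch. The function name and data are hypothetical: the same set of predicted risks yields different sensitivity and specificity depending solely on the risk threshold chosen, whereas the C-statistic for these predictions is unaffected by that choice.

```python
def sensitivity_specificity(risks, outcomes, threshold):
    """Classify participants as high risk when predicted risk >= threshold
    and return (sensitivity, specificity)."""
    tp = fn = fp = tn = 0
    for r, y in zip(risks, outcomes):
        predicted_positive = r >= threshold
        if y == 1 and predicted_positive:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted_positive:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical predicted risks and observed outcomes (1 = relapse)
risks = [0.1, 0.2, 0.3, 0.4, 0.6, 0.8]
outcomes = [0, 0, 1, 0, 1, 1]

for threshold in (0.25, 0.50):
    sens, spec = sensitivity_specificity(risks, outcomes, threshold)
    print(threshold, round(sens, 2), round(spec, 2))
```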
We had low concern about applicability for all included studies except for one, 32 which was rated at an unclear level of concern (Fig. 2(b)). It was unclear whether all participants had reached remission and it appears that a proportion of participants would have met the criteria for depression according to the Hamilton Rating Scale for Depression.

Discussion
This is the first systematic review of prognostic models predicting relapse and recurrence of depression. We have identified 11 unique models, across 12 included studies. None of the models underwent independent external validation (i.e. by researchers not involved in the original model development) or net benefit analysis to assess clinical utility. Only one of the included models was found to be at overall low ROB 28 and the discrimination and calibration of this model were poor on external validation. We were guided by the recent prognosis literature and guidance in developing our review methods and searches, and in critically appraising the included studies. Our planned meta-analysis was not possible because of an insufficient number of studies reporting performance statistics for the same model.

Comparison with the previous literature
The findings from this review align with previous prognosis research in this area, the majority of which has focused on prognostic factors. In contrast to prognostic models, which provide individualised risk prediction of particular outcomes conditional on multiple factors, prognostic factor studies focus on the factors themselves and whether they add (causal or prognostic) value over existing factors. Two recent systematic reviews and meta-analyses have explored prognostic factors associated with relapse and recurrence of depression. 6,40 There is 'strong evidence' that residual depressive symptoms are prognostic for relapse and recurrence, and 'good' evidence that the number of previous episodes is associated with increased risk of relapse and recurrence. 6 In addition, the following factors are associated with relapse and recurrence: childhood maltreatment, comorbid anxiety, neuroticism, age at first onset, rumination, 6 experiencing a higher number of dependent chronic stressors, or a severe independent life event post-treatment. 40 Individual participant data meta-analyses have also been used to explore prognostic and prescriptive factors 41,42 and have been broadly in agreement, finding that younger age at onset, residual symptoms and a shorter duration of remission are associated with an increased risk of relapse. The prescriptive value of these factors remains uncertain. Previous research has also found a higher odds of recurrence associated with both psychosocial impairment and poor coping skills, and that avoidant coping style and 'daily hassles/life events' were predictive of recurrence. 2,43
The number of previous episodes was the most common included predictor across the models identified in this review (n = 6). 25,28-30,33,36 The presence of residual symptoms was used as a predictor in only one developed model. 28 Childhood maltreatment was included as a predictor in four of our included studies, 25,29,30,36 comorbid anxiety in three, 25,29,30 neuroticism in one 32 and age of onset in two models. 25,36 Notably, rumination was not explored as a predictor in any of the included prognostic models, despite good evidence that this is associated with increased risk of relapse. 6,43 Wang et al (2014) 30 found that marital status 'contributed to' the prediction of recurrence, whereas Johansson et al (2015) 33 included having a partner or not as one of the two predictors in their final model (odds ratio of 0.12 (95% CI 0.02-0.64), P = 0.01). The extant literature does not support marital status as a predictor of recurrence 4,44 and weaknesses in the methodology of the prognostic model studies mean that we cannot make conclusive statements about this but, given the strength of the association presented, 33 the prognostic significance of 'having a partner or not' warrants further investigation. The model development study by Van Loo et al (2018) 25 supports the findings of earlier research suggesting that gender is unlikely to be predictive of relapse.
There have been some previous attempts to derive and validate multivariable prognostic models to predict depression-related outcomes other than relapse and recurrence. Existing prognostic models for depression outcomes include a model (the Depression Outcomes Calculator-Six Items (DOC-6©)) to predict remission (C-statistic (AUC) of 0.62, 95% CI 0.57-0.66) or persistent depressive symptoms (C-statistic (AUC) of 0.67, 95% CI 0.61-0.72) at 6 months post-diagnosis; 11 a model to predict persistent symptoms at 6 months (C-statistic not reported; R² of 0.40 in the development sample and 0.27 in the validation sample); 45 and a model to predict onset of depression in general practice attendees who did not currently have depression (C-statistic of 0.79, 95% CI 0.77-0.81). 11 The studies in this review present predictive performance statistics broadly in line with these, suggesting that successful individualised prediction might be possible for depression outcomes, but better quality studies and potentially different combinations of predictors are needed to explore this further.

Implications for clinical practice and research
Relapse and recurrence occur in a significant proportion of people with remitted depression and are a source of considerable morbidity. The economic burden of depression is higher in those who experience relapse or recurrence than in those who do not 46 and, although interventions to prevent relapse or recurrence of depression (including pharmacological and psychological approaches) can be resource-intensive, they are effective [47][48][49] and cost-effective. 50 Implementation research is needed to ensure that such interventions can be made available to a greater number of patients in a scalable and feasible way.
A potentially effective way of ensuring efficient allocation of relapse prevention interventions is by risk-stratifying patients according to risk of relapse and recurrence. Interventions can then be provided to those most likely to benefit from them. The aetiology of depression and depressive relapse is multifaceted, and multivariable models are likely to be a more helpful approach to predicting outcomes than relying on the presence or absence of single prognostic factors. None of the prognostic models identified in this review had sufficiently high-performance metrics to enable a personalised approach to relapse prevention for depression at present.
We reported some key methodological weaknesses in the studies identified in this review, particularly with respect to sample size. Unless the sample size is adequate, there will be limitations to how far we can trust the predictive performance statistics presented by the model development study as overfitting is likely. Going forward, it might be that data from multiple sources should be combined and harmonised to increase the available sample size for model development. A further consideration is that the data in the included studies were taken from samples collected for other purposes, for example randomised controlled trials and longitudinal cohort studies. Although these are considered acceptable and feasible sources of data for prognostic model studies, 51 there may be advantages to prospectively gathering data (in a pre-designed prospective cohort study) with the explicit purpose of prognostic model development. 7 A benefit of this is that researchers can control the collection and ensure standardised measurement of predictor and outcome information, but such an approach is more costly and time-consuming than the secondary analysis of pre-existing data and would require a commitment to resource and fund such work. The International Taskforce for relapse prevention of depression (ITFRA) (www.itfra.org) have begun to address these issues by bringing together data from trials of existing relapse prevention interventions and aiming to harmonise predictor and outcome measurement to improve personalised medicine in this area. Work is also underway aiming to move beyond stratification to provide more robust evidence for treatment moderators and prescriptive factors in relapse prevention. 52 Most of the included predictors in the studies identified in this review were clinical or demographic variables. 
It is possible that including a greater number of biomarkers or genetic information may help move towards such a precision medicine approach, as has shown promise in a number of other areas, including diagnosing mood disorders. 53 Nevertheless, such an approach may not be clinically feasible, and an important consideration for researchers is the context and setting in which a prognostic model is intended to be used. Models intended for a primary care setting, for example, may need to focus on a different set of predictors than those intended for use within a specialist service. Primary care-based models would ideally need to include predictors that are available and routinely collected in primary care, such as demographics, socioeconomic information, comorbidities and depression history characteristics.
This review has highlighted a range of statistical approaches to prognostic model development, from 'traditional' regression-based techniques to those using machine learning. Machine learning approaches offer the potential of greater predictive performance than more traditional approaches. 54 However, this is not always the case, as some studies 55 have shown. The technique can also be criticised for lack of interpretability and variable reporting standards, although the forthcoming TRIPOD-AI may encourage greater consistency in this regard. When designing future prognosis research, researchers should be mindful of the relative benefits and disadvantages associated with different methodological approaches. Prognosis research has grown as an area over recent years 7 and, with the development of the PROGRESS initiative, there are now standards and guidelines for conducting, 56 reporting 57 and appraising 19 prognostic model studies. Future studies looking to develop prognostic models for relapse and recurrence of depression should follow best practice guidance when designing methodology, and should be reported in line with the TRIPOD statement. 57 In conclusion, this review identified 11 prognostic models developed to predict the risk of relapse or recurrence in people with remitted depression. The models were developed in a variety of clinical settings and patient populations and with a range of included predictors. We are not yet at the point where we can reliably predict outcomes for a given person with remitted depression based on their demographic, clinical and disease-level characteristics. This review suggests that this might be possible, although the studies identified here were limited by their high ROB because of methodological weaknesses. Researchers should conform to best practice when developing prognostic models in future. 
Beyond this, any such prognostic models will require good-quality external validation, assessment of clinical utility and evaluation of implementation before they can successfully be translated into clinical practice.
We thank the Cochrane Prognosis Methods Group for providing guidance and the editorial team of the Cochrane Common Mental Disorders (CCMD) Group. The authors are grateful to the following Patient Advisory Group members who contributed to and provided constructive feedback on the final review: Gregory