Empirical evidence for definitions of episode, remission, recovery, relapse and recurrence in depression: a systematic review

Aims. For the past quarter of a century, Frank et al.’s (1991) consensus-based definitions of major depressive disorder (MDD) episode, remission, recovery, relapse and recurrence have been the paramount driving forces for consistency in MDD research as well as in clinical practice. This study aims to review the evidence for the empirical validation of Frank et al.’s proposed concept definitions and to discuss evidence-based modifications. Methods. A literature search of Web of Science and PubMed from 1/1/1991 to 08/30/2017 identified all publications which referenced Frank et al.’s request for definition validation. Publications with data relevant for validation were included and checked for referencing other studies providing such data. Results. A total of 56 studies involving 39 315 subjects were included, mainly presenting data to validate the severity and duration thresholds for defining remission and recovery. Most studies indicated that the severity threshold for defining remission should decrease. Additionally, specific duration thresholds to separate remission from recovery did not add any predictive value to the notion that increased remission duration alleviates the risk of reoccurrence of depressive symptoms. Only limited data were available to validate the severity and duration criteria for defining a depressive episode. Conclusions. Remission can best be defined as a less symptomatic state than previously assumed (Hamilton Rating Scale for Depression, 17-item version (HAMD-17) ⩽4 instead of ⩽7), without applying a duration criterion. Duration thresholds to separate remission from recovery are not meaningful. The minimal duration of depressive symptoms to define a depressive episode should be longer than 2 weeks, although further studies are required to recommend an exact duration threshold. These results are relevant for researchers and clinicians aiming to use evidence-based depression outcomes.


Introduction
Major depressive disorder (MDD) is a common, often chronic and recurrent condition, marked by persistent suffering and poor overall health and with deleterious effects on psychosocial, academic, vocational and family functioning. MDD is one of the most prevalent mental disorders and the leading cause of disability worldwide (World Health Organization, 2017), with lifetime prevalence estimates ranging from 7% to 21% (Kessler & Bromet, 2013).
In 1991, the MacArthur Foundation Network on the Psychobiology of Depression concluded that the randomness with which investigators referred to key changes in clinical status of individuals with depression led to considerable confusion in the literature. Subsequently, a task force was initiated to achieve consensus about the definition of key stages, change points and outcome definitions for MDD among clinical investigators and practicing clinicians. The resulting report by Frank et al. (1991) defined conceptualisations of an MDD episode, remission, recovery, relapse and recurrence (see Fig. 1 and supplementary Table) by a set of five parameters or thresholds: two severity scores (cut-offs for the 'asymptomatic' and fully symptomatic ranges) and three durations (minimal consecutive time durations in the fully symptomatic or asymptomatic range before an episode, remission, or recovery can be declared).
Specific consensus-based recommendations for these thresholds were provided in Frank et al.'s (1991) report and revised in a follow-up report by Rush et al. (2006). Both reports explicitly requested empirical validations of these now widely used consensus-based definitions. Therefore, the present paper reviews the accumulated evidence over the past 27 years to validate the proposed conceptualisations and operationalisations and to provide suggestions for future avenues.

Conceptual discussion
Here we focus on the conceptualisations of MDD episode, remission, recovery, relapse and recurrence by Frank et al. (1991, see supplementary Table). These are based exclusively on the severity (number/intensity) and duration of clinical symptoms, and each has its own rationale and clinical implications. An MDD episode means that illness is present and that treatment is indicated. When the state of remission (a relatively brief period without clinically relevant symptoms during or at the end of an episode) is reached, no intensified treatment regimen is required or justified. A recovery (a sustained period of absence of clinically relevant symptoms, i.e. a sustained remission) means that the episode has ended and treatment can be discontinued or aimed at preventing subsequent episodes. Relapse and recurrence imply a return of symptoms during remission and recovery, respectively, and indicate a need for treatment intensification. The implicit distinction between relapse and recurrence is that a relapse is thought to be a return of symptoms of an ongoing episode that was symptomatically suppressed, whereas a recurrence represents an entirely new episode.
Importantly, these concepts can only have the ascribed interpretations and treatment implications if they have substantial predictive value for a future course. For example, treatment is indicated for those experiencing an episode because they have a worse prognosis than those who are experiencing symptoms that do not meet episode criteria. Therefore, the operationalisations of these concepts (i.e. the choice of severity and duration thresholds) should be chosen in such a way that they have optimal prognostic significance.
In particular, it should be possible to distinguish remission from recovery (and therefore relapse from recurrence), which are different only in their duration, by a difference in prognosis. The hypothesis is that those in remission have not (yet) fully recovered from the latently present episode (i.e. they are still undergoing a healing process) and therefore have a relatively high relapse rate compared with those who recovered. Those who recovered have a low recurrence rate that is no longer dependent on the time since their last episode and equal to the incidence rate of a risk factor-comparable population who never experienced an episode. Similarly, in cancer research, 'full remission' is defined as the period during which any sign of the disease is lacking, but during which a patient is particularly vulnerable for a relapse of the tumour since latent disease might still be present. When the remission is of sufficiently long duration, the patient can be (retrospectively) considered to be recovered or 'cured' as the passing of even more time does not provide additional protection to disease recurrence, the risk of which is similar to the incidence risk of a comparable healthy population.
Some of the clinical status concepts that are the subject of this review are also defined in the Diagnostic and Statistical Manual of Mental Disorders (DSM-5; American Psychiatric Association, 2013) and the International Classification of Diseases (ICD-10; World Health Organization, 1993), as summarised in Table 1.

Fig. 1. Time course of depressive symptomatology in a hypothetical patient, showing an MDD episode, remission, relapse, recovery and recurrence. These stages are operationalised using two severity criteria (S1, S2) and three duration criteria (D, E, F). S1, severity threshold separating asymptomatic from partially symptomatic range; S2, severity threshold separating partially symptomatic range from fully symptomatic range; t1, start of MDD episode; t2, start of episode remission; t3, end of episode remission; t4, relapse of MDD episode; t5, start of episode recovery and end of MDD episode; t6, start of MDD recurrence.
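The five-parameter scheme (two severity thresholds and three durations) can be read as a small state machine over a series of symptom scores. The sketch below is only an illustration of that reading, not an implementation taken from Frank et al. (1991). The severity cut-offs follow the HAMD-17 values attributed to Frank et al. (⩽7 and ⩾15). The duration values, the weekly sampling, the class and function names and the example scores are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    s1: int = 7   # asymptomatic if score <= s1 (HAMD-17 <= 7)
    s2: int = 15  # fully symptomatic if score >= s2 (HAMD-17 >= 15)
    d: int = 2    # weeks fully symptomatic before an episode is declared (assumed)
    e: int = 2    # weeks asymptomatic before remission is declared (assumed)
    f: int = 8    # weeks asymptomatic before recovery is declared (assumed)


def label_course(scores, th=Thresholds()):
    """Return (week, event) pairs for a list of weekly symptom scores."""
    events = []
    status = "well"  # one of: well, episode, remission, recovery
    full_run = 0     # consecutive weeks in the fully symptomatic range
    asym_run = 0     # consecutive weeks in the asymptomatic range
    for week, score in enumerate(scores):
        full_run = full_run + 1 if score >= th.s2 else 0
        asym_run = asym_run + 1 if score <= th.s1 else 0
        if full_run == th.d and status != "episode":
            # The label depends on the state in which full symptoms returned.
            if status == "remission":
                events.append((week, "relapse"))
            elif status == "recovery":
                events.append((week, "recurrence"))
            else:
                events.append((week, "episode"))
            status = "episode"
        elif status == "episode" and asym_run == th.e:
            events.append((week, "remission"))
            status = "remission"
        elif status == "remission" and asym_run == th.f:
            events.append((week, "recovery"))
            status = "recovery"
    return events


# Hypothetical weekly HAMD-17 scores for one patient.
example_scores = [3, 16, 18, 20, 12, 6, 5, 17, 19, 10,
                  4, 3, 2, 3, 4, 2, 3, 5, 16, 17]
events = label_course(example_scores)
for week, event in events:
    print(week, event)
```

In this simplified reading a patient stays in remission until either full symptoms return or the asymptomatic run reaches the recovery duration; partially symptomatic weeks reset the asymptomatic count.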

Methods/literature search
This systematic review largely adhered to PRISMA guidelines (Moher et al. 2009). To review empirical evidence regarding the definitions proposed by Frank et al. (1991), we searched both Web of Science and PubMed for studies that referenced them without imposing language restrictions (see supplementary PRISMA flow diagram). Duplicates and non-obtainable studies were excluded. Based on title and abstract, studies were excluded that (i) did not focus on individuals with MDD, (ii) were non-empirical, (iii) were of study types not expected to be useful for the purpose of this review (see online supplement), or (iv) focused on the evaluation of some association or cause-effect relation between variables.
The remaining articles were scrutinised for data that could (in)validate at least one of Frank's definitions. Because severity-related criteria were necessarily instrument-specific, we focused on articles determining cut-offs on the HAMD-17 and the Montgomery-Åsberg Depression Rating Scale (MADRS), which are the most widely used instruments (Zimmerman et al. 2004a). Studies using different methodologies were included (see Results section). Criteria to define state duration should be maximally predictive of remaining in that state (Frank et al. 1991). Therefore, we sought studies that show the remission/recovery and relapse/recurrence of depressive episodes over time (via survival curves or equivalent).
Two authors (PLdZ, BFJ) extracted data independently and resolved discrepancies through discussion and consensus. References of included articles were searched for additional relevant studies. The literature search was last updated on August 30, 2017.

Results
The 1570 identified papers (supplementary eFigure) included 214 duplicates and 26 non-obtainable papers. The study selection criteria (as outlined above) reduced the number to 117 papers and yielded 49 additional records via reference checks. From these 166 papers, 110 were excluded based on the full-text assessment. Thus 56 studies covering 39 315 subjects were included, and summarised in Tables 2-5. Frank et al. (1991) categorised the level of MDD symptomatology in three clinical ranges: a fully symptomatic range that can indicate the start of an episode, an asymptomatic range that can indicate the start of a full remission, and a partially symptomatic range in between. The 'asymptomatic range' is supposed to represent the normal range consistent with the absence of disorder. The term is a bit of a misnomer as this range includes the presence of a minor level of symptomatology associated with the 'healthy' (non-depressed) population, in which the average HAMD-17 score is about 3.2 (Zimmerman et al. 2004b); however, for consistency, the term asymptomatic will be used throughout this review.
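The study-selection counts reported above can be checked with simple arithmetic. The short sketch below only restates the numbers given in the text; the variable names are illustrative, and the number excluded on title and abstract is derived here under the assumption that all non-duplicate, obtainable papers were screened (it is not stated in the text).

```python
identified = 1570
duplicates = 214
non_obtainable = 26
screened = identified - duplicates - non_obtainable  # papers left for screening

after_screening = 117   # remaining after title/abstract selection
from_references = 49    # additional records via reference checks
full_text_assessed = after_screening + from_references

excluded_full_text = 110
included = full_text_assessed - excluded_full_text

# Derived, not reported in the text (assumes every screened paper was assessed).
excluded_on_title_abstract = screened - after_screening

print(screened, full_text_assessed, included, excluded_on_title_abstract)
```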

Severity thresholds
Two instrument-specific 'thresholds' need to be defined on the HAMD-17 and MADRS (most widely used as endpoints in clinical trials; Zimmerman et al. 2004a) to operationalise these three different levels of symptomatology (see Fig. 1). Frank et al. (1991) defined HAMD-17 scores ⩾15 to correspond to the fully symptomatic range while HAMD-17 ⩽7 would indicate the asymptomatic range, the latter of which is roughly equivalent to MADRS ⩽10-11 (Zimmerman et al. 2004c).

Table 1 (excerpt). DSM-5 and ICD-10 criteria
Asymptomatic range cut-off. DSM-5: ⩽2 symptoms to no more than a mild degree. ICD-10: 'Free from any significant mood symptoms', not specified further.
Symptomatic range cut-off. DSM-5: ⩾1 out of 2 core symptoms and ⩾5 out of 9 total symptoms. ICD-10: ⩾2 out of 3 core symptoms and ⩾4 out of 10 total symptoms.
a If the symptoms are particularly severe and of very rapid onset, it may be justified to make the diagnosis after less than 2 weeks.
b Although the term 'recovery' is mentioned in the DSM-5 and ICD-10, it is not explicitly defined. 'Relapse' is not mentioned in DSM-5 and ICD-10, whereas a recurrent episode is defined in DSM-5 as a return of symptoms during a remission (i.e. equivalent to the concept of 'relapse' by Frank et al. (1991)) and in ICD-10 as a depressive episode separated from a previous episode by at least 2 months free from any significant mood symptoms.
Regarding the severity thresholds, the 32 studies that provided data are summarised in Tables 2-4.

Severity threshold for the asymptomatic range
Studies focusing on the asymptomatic threshold could be roughly divided into three groups, reflecting differences in the used criteria for determining the 'best' threshold for the asymptomatic range.
The first group of studies selected the optimal threshold by maximising the correspondence to some gold standard (Hawley et al. 2002; Zimmerman et al. 2004d, 2005; Bandelow et al. 2006; Ballesteros et al. 2007; Riedel et al. 2010; Romera et al. 2011; Leucht et al. 2013; Sacchetti et al. 2015), most often the Clinical Global Impression-Severity scale (CGI-S) or some measure of functioning (see Table 2). The second group of studies based the optimal asymptomatic threshold on the mean scores or statistical upper limits of the general population (Zimmerman et al. 2004a, b; see Table 3). These two groups mentioned a variety of optimal asymptomatic thresholds, ranging from ⩽2 (Zimmerman et al. 2005) to ⩽10 (Zimmerman et al. 2004b) for the HAMD-17 and from ⩽4 (Zimmerman et al. 2004a, 2004d) to ⩽11 (Bandelow et al. 2006) for the MADRS.
The third and largest group of studies compared the prognosis of patients with different levels of depressive symptomatology, usually in terms of relapse/recurrence risk (see Table 4). Based on this information, a threshold can be chosen that best distinguishes those with a favourable prognosis from those with a poor prognosis, which Zimmerman et al. (2004a) argued to be the best method of validating a threshold. Most of these studies show that the presence of 'subthreshold' symptoms (often called residual symptoms if occurring after an MDD episode) was associated with an enhanced risk of a (recurrent) episode or relapse (Maier et al. 1997; Riso et al. 1997; Judd et al. 1998, 2000, 2016; Van Londen et al. 1998; Fava et al. 1999; Kanai et al. 2003; Taylor et al. 2004; Nierenberg et al. 2010; Dunlop et al. 2012; Kiosses & Alexopoulos, 2013; Peselow et al. 2015). One study (Romera et al. 2011) did not find this increased risk. Often authors implicitly argued for a lower threshold for remission that does not encompass this level of symptomatology. Some studies also showed that remission as defined by Frank et al. (1991), HAMD-17 ⩽7, is associated with a better prognosis than not achieving this level of remission (Paykel et al. 1995; Pintor et al. 2004).
Notably, other studies showed a large discrepancy between Frank's definition of remission and patients' own judgement regarding their remission (Zimmerman et al. 2012a, b). Within the group of remitters as defined by Frank et al. (1991), substantial heterogeneity was observed with respect to reported symptoms (Zimmerman et al. 2012c), psychosocial impairment (Zimmerman et al. 2004e, 2007) and a range of other relevant outcomes (Zimmerman et al. 2012d) (see Table 3).
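The first approach described above, choosing the cut-off that corresponds best to a gold standard, can be sketched as a search over candidate cut-offs. The example below uses Youden's J (sensitivity + specificity - 1) as one possible correspondence measure; the included studies may have used other criteria. The scores, the gold-standard labels and the function name are hypothetical and do not come from any reviewed study.

```python
def youden_by_cutoff(scores, remitted, cutoffs):
    """Youden's J per cut-off, treating 'score <= cutoff' as predicted remission."""
    results = {}
    for c in cutoffs:
        pairs = list(zip(scores, remitted))
        tp = sum(1 for s, r in pairs if s <= c and r)
        fn = sum(1 for s, r in pairs if s > c and r)
        tn = sum(1 for s, r in pairs if s > c and not r)
        fp = sum(1 for s, r in pairs if s <= c and not r)
        sensitivity = tp / (tp + fn)
        specificity = tn / (tn + fp)
        results[c] = sensitivity + specificity - 1
    return results


# Hypothetical HAMD-17 scores and gold-standard remission status
# (for example, a clinician's global impression); invented for illustration.
hamd = [1, 2, 3, 4, 4, 5, 6, 7, 8, 9, 10, 12]
gold = [True, True, True, False, True, True,
        False, False, False, False, False, False]

j_values = youden_by_cutoff(hamd, gold, range(2, 11))
best_cutoff = max(j_values, key=j_values.get)
print(best_cutoff, round(j_values[best_cutoff], 3))
```

With real data the optimum depends heavily on the chosen gold standard, which is consistent with the wide range of thresholds reported by these two groups of studies.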

Severity threshold for the fully symptomatic range
Only one study focusing on the fully symptomatic threshold was obtained (see Table 2). By using the CGI-S of 2 or 3 as the gold standard, Leucht et al. (2013) advise a HAMD-17 threshold of ⩾7 or ⩾14, respectively.

Duration threshold for episode
Frank et al. (1991) categorised the symptomatic period following any non-depressive state using a time boundary, separating the time period before the symptoms were recognised as part of a depressive episode from the time period afterwards. The underlying assumption was that developing transient depressive symptoms is not necessarily pathological, as long as they do not culminate in a long-lasting depressive episode. Regarding the validation of this duration criterion, Frank et al. (1991) state that an episode should be declared 'when it is unlikely that the patient will spontaneously recover in the next day or two'. Although rather arbitrary, the concept is clear: for the validation of this duration criterion, data are necessary that shed light on the prognosis of those with recently started depressive symptomatology.
Such data were provided by four studies (see Table 5). The meta-analysis by Whiteford et al. (2013) covering the rate of spontaneous remission in untreated depression showed that this rate decreases continuously over time. However, data for short follow-up durations are rather scarce, and the studied population (wait-list and primary care samples) is not representative of the general population with depressive symptoms.
One study in the general population showed that 25% of depressive episodes remitted after 4 weeks and 50% after 8-12 weeks, using a methodology in which onset and end of depressive episodes were retrospectively assessed by asking the respondents for their depressive symptomatology in the past (Eaton et al. 1997). The finding of a median duration of 12 weeks was replicated in the NEMESIS study using a similar methodology, which also shows that the rate of recovery quickly diminishes after these 12 weeks (Spijker et al. 2002).

Table notes:
b Psychiatric status ratings (PSR): (1) asymptomatic (return to usual self); (2) residual/mild affective Sx; (3) partial remission, moderate Sx or impairment; (4) marked/major Sx or impairment; (5) meets definite MDD criteria without prominent psychotic Sx or extreme impairment; (6) meets definite criteria with prominent psychotic Sx or extreme impairment.
c The authors state that relapse becomes less likely when the MADRS score is lower, but there is no single cut-off that has high sensitivity and specificity for predicting relapse: 'This suggests that there is no particular cut-off that is sufficient to consider as 'low enough' to protect against future relapse, so the primary conclusion would be to strive for the lowest score possible'.
d No particular cut-off: those with a greater number of residual symptom domains (out of nine possible DSM-IV criterion symptom domains) had a greater probability of relapse.
Both relapse and recurrence def. as first of 2 weeks with syndromal MDD Sx (PSR = 5 or 6) b or minor depression (PSR = 3) b.

(1) Respondents rated whether they experienced 'a time when you felt sad or blue and had some of these other problems (e.g., weight loss or sleeplessness)'.
(2) Response was defined in various ways, and each definition was tested for validity.
(3) Authors appear to mix up recurrence and relapse, but we denote time after patient recovered as recurrence.
(4) Medication use was seen as indication for not being healthy, thus these people were not at risk for recurrence.
(5) PSR ≥3 during some of these weeks counts as residual Sx after remission, i.e., the patient is not yet considered to be relapsed or recurred before PSR ≥5.
(6) The authors mistakenly suggest that an 8-week duration was the standard before their paper was published; see Rush et al. (2006).

Wakefield & Schmitz (2013) argued that 'uncomplicated' depressive episodes, defined as <2 months in duration combined with the absence of certain 'heavy' symptoms such as suicidal ideation and psychomotor retardation, should not be classified as MDD. They argued that the risk of developing new depressive episodes for those who had such an uncomplicated episode is not higher than for the general population. Thus, this subgroup of patients does not seem to suffer from an underlying disorder that increases their risk of developing subsequent depressive episodes. This suggests that, at least for this subgroup, the depressive symptomatology should last at least 2 months before it is considered a depressive episode.

Duration thresholds for remission and recovery
Frank et al. (1991) categorised the asymptomatic period following a fully symptomatic period with two time boundaries, yielding three distinct time periods: those (i) before the onset of full remission, (ii) following the onset of full remission but before declaration of recovery and (iii) after declaration of recovery. The underlying assumption is that these three successive periods are each associated with a certain 'hazard' for a return of symptoms, which diminishes significantly at each time boundary and becomes constant when recovery is declared.
In the available literature, the hazard for a return of symptoms for asymptomatic individuals is usually shown indirectly in the form of survival curves, showing the fraction of subjects without relapse/recurrence over time. An exponential survival curve is thus equivalent to a constant hazard, whereas a sudden decrease in a hazard (for example, when remission is achieved) should be visible as an upward discontinuity in the survival curve slope.
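The link between a survival curve and the underlying hazard can be made concrete. If the hazard is treated as constant within each interval between two time points on the curve, it can be estimated as -ln(S(t2)/S(t1)) / (t2 - t1). The sketch below applies this to two invented curves: a purely exponential one and one whose hazard drops after 6 months. The function name, the time points and the hazard values are assumptions for illustration only and are not data from the reviewed studies.

```python
import math


def interval_hazards(times, survival):
    """Hazard per unit time in each interval, assuming it is constant within it."""
    return [
        -math.log(survival[i + 1] / survival[i]) / (times[i + 1] - times[i])
        for i in range(len(times) - 1)
    ]


months = [0, 3, 6, 9, 12]

# Constant hazard of 0.05 per month gives an exponential survival curve.
exponential_curve = [math.exp(-0.05 * t) for t in months]

# Hazard of 0.10 per month until month 6, then 0.02 per month afterwards.
two_phase_curve = [1.0, math.exp(-0.30), math.exp(-0.60),
                   math.exp(-0.66), math.exp(-0.72)]

constant = interval_hazards(months, exponential_curve)
two_phase = interval_hazards(months, two_phase_curve)
print([round(h, 3) for h in constant])
print([round(h, 3) for h in two_phase])
```

A sudden drop between consecutive interval hazards would correspond to the hypothesised boundary between remission and recovery, whereas a slow, steady decline would point to a gradual process.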
Survival curves (or equivalent) for asymptomatic individuals until relapse/recurrence or equivalent data were obtained from 31 studies (see Table 5). There is a substantial difference between studies in their studied populations (viz., general population, 1st, 2nd or 3rd line ambulant patients or inpatients), their operationalisations of remission, recovery, relapse and recurrence (because of different instruments or cut-offs on the same instruments) and in the involved treatments that are often uncontrolled.
Several studies show some indication of a sudden drop in relapse/recurrence rate a certain time after remission/recovery was obtained (Paykel et al. 1995; Riso et al. 1997; Judd et al. 1998; Van Londen et al. 1998; Heinze et al. 2002; Kanai et al. 2003; Kennedy et al. 2003; Naz et al. 2007; Holma et al. 2008; de Jonge et al. 2010; O'Leary et al. 2010; Kiosses & Alexopoulos, 2013). However, the exact amount of time necessary to achieve this drop (as counted from the start of the asymptomatic period) differs per study, ranging from about 2 months (O'Leary et al. 2010) to about 3 years (Judd et al. 1998). Other studies do not find such a sudden drop at all, instead suggesting that the diminishing hazard of return of symptoms is a gradual process rather than a discrete one (Maj et al. 1992; Shea et al. 1992; Flint & Rifat, 1997; Kessing et al. 1998; Van Weel-Baumgarten et al. 1998; Mueller et al. 1999; O'Leary et al. 2000; Solomon et al. 2000; Mattisson et al. 2007; Dunlop et al. 2012; Martínez-Amorós et al. 2012; Seemüller et al. 2014; Peselow et al. 2015; Judd et al. 2016). In particular, several studies of the long-term course of MDD show that recurrence rates stabilise only after many years, such as 2.5 years, 10 years (Mattisson et al. 2007) or about 15 years (Kessing et al. 1998). A third group of studies shows atypical survival curves where the time-specific risk of return of symptoms even increases over time during certain time intervals (Eaton et al. 1997; Emslie et al. 1997; Birmaher et al. 2004; Pintor et al. 2004; Yiend et al. 2009).

Discussion

Severity thresholds
The obtained studies that aimed to identify the optimal thresholds for the asymptomatic and fully symptomatic depressive ranges differed widely in their methodologies (see Tables 2-4). Frank et al. (1991) postulated that these ranges should (i) correspond to what clinicians view as asymptomatic and fully symptomatic and (ii) that classification of patients within these ranges should be reasonably stable over time. Other theorists argued that the optimal thresholds should be selected based on their predictive value for the future course (Zimmerman et al. 2004e), which would be most consistent with methods used in other medical fields (Zimmerman et al. 2004a).

Severity threshold for the asymptomatic range
Multiple studies showed that those who scored below a certain threshold on depressive symptom scales had a better prognosis than those who scored above it (Paykel et al. 1995; Maier et al. 1997; Riso et al. 1997; Judd et al. 1998, 2016; Van Londen et al. 1998; Fava et al. 1999; Kanai et al. 2003; Pintor et al. 2004; Taylor et al. 2004; Nierenberg et al. 2010; Dunlop et al. 2012; Kiosses & Alexopoulos, 2013; Peselow et al. 2015). Often this finding was presented as evidence for the perspective that the asymptomatic threshold is currently too high (Judd et al. 1998).
However, even though none of these studies systematically studied and compared the predictive value of all possible thresholds, we hypothesise that this is a general finding that can be obtained irrespective of the chosen threshold, as a lower score on a depressive symptom scale increases the 'symptomatic distance' to the fully symptomatic threshold and therefore the average time required for reaching that state. Indeed, some studies show that the currently often-used threshold (HAMD-17 ⩽7; Frank et al. 1991) also differentiates in this regard (Paykel et al. 1995; Pintor et al. 2004). Studies using other methodologies for determining the best asymptomatic threshold (such as optimising correspondence to clinical impressions of clinicians using the CGI-S as a gold standard, to different functioning scales, or to the general population) yield different optimal thresholds. The consensus among these authors seems to be that the currently often-used threshold of HAMD-17 ⩽7 is too high, as it leads to the inclusion of too many patients with poor functioning (Sacchetti et al. 2015), who are psychosocially impaired (Zimmerman et al. 2007) and who do not consider themselves as remitted (Zimmerman et al. 2012a).
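The hypothesis that a prognostic difference appears irrespective of the chosen threshold can be illustrated with a toy calculation. The sketch below assumes, purely for illustration, that relapse risk rises steadily with the symptom score and that scores are evenly distributed from 0 to 14 (below the fully symptomatic range). The risk function and all numbers are invented and are not estimates from the reviewed studies.

```python
def relapse_risk(score):
    """Hypothetical risk: 10% at score 0, rising linearly to 50% at score 14."""
    return 0.10 + 0.40 * score / 14


def risk_below_and_above(cutoff, scores=range(0, 15)):
    """Mean relapse risk for scores <= cutoff and for scores > cutoff."""
    below = [relapse_risk(s) for s in scores if s <= cutoff]
    above = [relapse_risk(s) for s in scores if s > cutoff]
    return sum(below) / len(below), sum(above) / len(above)


# Cut-offs 0..13 keep both groups non-empty.
comparison = {c: risk_below_and_above(c) for c in range(0, 14)}
for cutoff, (below, above) in comparison.items():
    print(cutoff, round(below, 3), round(above, 3))
```

Under these assumptions every cut-off separates a lower-risk group from a higher-risk group, so a better prognosis below a given threshold does not by itself single out that threshold as optimal.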
Ultimately, the particular choice of asymptomatic threshold is rather arbitrary given the available evidence. Nonetheless, the currently often-used threshold seems to be too high. We, therefore, suggest lowering the asymptomatic threshold to ⩽4 on the HAMD-17; this is on the low side of the suggested values in the obtained studies (which we think is justified given the better functioning below this score; Sacchetti et al. 2015), although still above the mean score in the general population (Zimmerman et al. 2004b). It has been shown that some patients who scored ⩽7 on the HAMD-17 still met diagnostic criteria for MDD (Zimmerman et al. 2004e), which is another argument for our suggestion to lower the asymptomatic threshold to ⩽4, as this largely prevents 'remitted' people from meeting the diagnostic criteria for MDD. This new HAMD-17 threshold is roughly equivalent to a threshold of ⩽5 on the MADRS (Mittmann et al. 1997), which is plausible given the reviewed evidence. Note that these thresholds are useful as endpoints in clinical studies, but do not necessarily mean that scoring below these thresholds should be the main treatment goal for clinicians, as treating individual patients by striving for the lowest score possible still improves prognosis (Taylor et al. 2004).

Severity threshold for the fully symptomatic range
Only one study was obtained that provides some evidence for the fully symptomatic cut-off (Leucht et al. 2013). This relative lack of evidence is understandable, as this seems to be the definition that least 'needs' empirical validation; this can be understood as a rather subjective clinical decision regarding the minimal level of symptomatology that can be considered to be a disorder. Therefore, there is not enough evidence to make any recommendations regarding this threshold.

Duration threshold for episode
Only a limited number of studies showed data on the prognosis of those with 'recent-onset' depression (see Table 5). This can be explained by the fact that epidemiological investigations typically include depressed populations for which it is unclear how long the depressive symptoms have been present at the start of the study. Although two studies show that half of the depressive episodes in the general population remit within 3 months after their onset (Eaton et al. 1997; Spijker et al. 2002), it seems likely that many short 'episodes' of only a few days are missed since these episodes are infrequently retrospectively indicated, and short episodes are more easily forgotten than long ones (Moffitt et al. 2010). Therefore, the rate of early remission is probably even higher than suggested by these studies.
In general, the reviewed data suggest that the rate of (spontaneous) remission of depressive symptoms is relatively high when the onset of these symptoms is recent, especially during the first 12 weeks, but diminishes quickly thereafter. This provides some justification for the suggestion by Frank et al. (1991) of requiring a certain amount of time at the fully symptomatic level before defining a depressive episode. However, the currently required 'waiting time' of only 2 weeks (see Table 1; DSM-5 criteria, APA, 2013; ICD-10 criteria, WHO, 1993) does not seem to be based on empirical evidence. The reviewed studies suggest that a longer time period might be advisable. Nonetheless, we refrain from a definitive conclusion, for which a prospective study in which the general population is screened with a high frequency (e.g. weekly) for depressive symptomatology is required but hitherto unavailable.

Duration thresholds for remission and recovery
A substantial body of literature studying depressive relapse/recurrence risk over time has been obtained (see Table 5), but comparing the studies is not straightforward; the studies differed in their studied populations, their operationalisations of remission, recovery, relapse and recurrence, and in the involved treatments. Some studies were consistent with the idea of a 'point of rarity' (Frank et al. 1991) at which the relapse/recurrence risk suddenly drops or becomes stable.
However, there is no consistency in the estimation of this time point. Combined with the fact that the majority of studies do not show such a point of rarity, the most likely conclusion is that prognosis gradually improves as remission/recovery duration is longer, rather than suddenly at a particular point in time.
The reviewed data do not suggest that any specific duration threshold to distinguish remission from recovery is warranted to add predictive value to the observation that prognosis improves over time as the duration of the asymptomatic period increases. Not only were the specific operationalisations of the duration criteria by Frank et al. (1991) and Rush et al. (2006) not empirically supported, it seems that the whole concept of these duration criteria must be rejected. The idea that a reoccurrence of depressive symptoms shortly after their initial remission constitutes a 'relapse' of the previous episode, whereas their later reoccurrence is the first sign of an entirely new episode, is a model that lacks empirical support. Additionally, it is of no additional value to the patient or clinician as the assumed origin of the reoccurring symptoms has no implications for treatment or prognosis.
Thus, based on these results, the duration criteria for declaring remission and recovery seem unnecessary. We suggest that depressive remission can simply be defined as the asymptomatic state after a depressive episode, without applying any duration criterion. Stability of remission is then relatively low on the first day but increases gradually with its duration. The term recovery can then be used as a concept that includes more than just absence of symptoms, such as social functioning or subjective well-being, possibly including the absence of significant treatment as this would better fit the concept of recovery from a patient's perspective.

Limitations
Limitations of this review include the greatly varying study populations and treatments within the included studies (which is also a strength). Moreover, a substantial part of the data had to be extracted from survival curves that only rarely showed confidence intervals and often did not possess a clearly labelled time axis, making it difficult to assess exactly when the measurement began.

Conclusions
More than a quarter-century after the landmark paper in which Frank et al. (1991) provided their consensus-based definitions for depressive states (episode, remission, recovery, relapse, recurrence), we reviewed the empirical evidence. The data suggest that remission can best be defined as a less symptomatic state than assumed earlier (HAMD-17 ⩽4 instead of ⩽7), without applying a duration criterion. Specific duration thresholds to separate remission from recovery are not meaningful. Evidence suggests that the minimal duration of depressive symptoms before a depressive episode can be defined should be longer than 2 weeks, although further studies are required to recommend an exact duration threshold.

Supplementary material
The supplementary material for this article can be found at https://doi.org/10.1017/S2045796018000227