The practice of evidence-based medicine requires clinicians to integrate their personal experience and expertise with the results of systematic research studies. Randomised controlled trials (RCTs) are the primary source of evidence with respect to treatments, and statistical tools such as effect sizes and ‘number needed to treat’ (NNT) statistics were developed to facilitate clinicians' comparisons of results across studies and the application of these results to their practice. This integration of systematic research into clinical decision-making has been of great benefit to patients over the past 25 years, but temporal trends in research practices and study results necessitate ongoing revision of how evidence-based medicine is integrated into clinical practice.
In this analysis, we describe one such example, in which rising placebo response rates in RCTs over the past 30 years might influence how practising clinicians and other interested stake-holders interpret and apply the results of these studies. We discuss this problem using the example of major depressive disorder (MDD), which has become perhaps the paradigmatic illustration of a disorder for which increasing placebo response is the dominant issue in RCTs. In short-term RCTs of antidepressants in adults with depression, for example, placebo response has risen steadily over the past 30 years, concurrent with a decline in drug v. placebo difference. Reference Walsh, Seidman, Sysko and Gould1,Reference Dunlop, Thase, Wun, Fayyad, Guice-Pabia and Musgnung2 Studies conducted in children and adolescents with MDD typically observe even higher placebo response rates (mean of 46% compared with a mean medication response of 59%), which have also been increasing over time. Reference Bridge, Birmaher, Iyengar, Barbe and Brent3 Although our discussion is focused on MDD, we believe the arguments considered are germane to the practice of evidence-based medicine in any disorder for which clinical trials have consistently demonstrated a high placebo response rate.
The conceptual basis of NNT
In a highly influential article, Laupacis et al reviewed several methods by which practising clinicians could ‘measure and compare the benefits and risks of various preventive, diagnostic, therapeutic, rehabilitative approaches’. Reference Laupacis, Sackett and Roberts4 These ‘yardsticks’ were intended to help clinicians transform the otherwise often difficult-to-interpret results of RCTs, considered the most valid data on which to base measurements of benefit and harm for treatments, into information directly applicable to the clinical situation. The ideal yardstick, according to Laupacis and colleagues, would have four attributes: it would (a) compare the consequences of doing nothing with the potential benefits of doing something; (b) summarise the potential harm of a treatment; (c) identify patients who are at high risk of an adverse event and who are likely to be responsive to therapy; and (d) permit comparisons between different treatment approaches. Reference Laupacis, Sackett and Roberts4
The metric most closely approximating this combination of properties was deemed to be the NNT. The NNT is defined mathematically as the reciprocal of the absolute risk reduction (ARR). For a therapeutic intervention, the ARR is calculated by subtracting a post-treatment measurement of disease severity in the treatment being studied from measured disease severity in the comparison group.
For example, in an antidepressant clinical trial in which the response rate in the medication group was 50% and the response rate in the placebo group was 35%:
NNT = 1/ARR = 1/(0.50 − 0.35) = 1/0.15 ≈ 6.7,
or 7, since it is customary to round the NNT to a whole number.
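The calculation above can be sketched in a few lines of Python. This is a minimal illustration rather than a validated statistical routine: the function names are illustrative, and rounding up to the next whole patient is an assumed convention (the text specifies only rounding to a whole number).

```python
import math


def absolute_risk_reduction(treatment_rate: float, control_rate: float) -> float:
    """Difference in response proportions (treatment minus control)."""
    return treatment_rate - control_rate


def number_needed_to_treat(treatment_rate: float, control_rate: float) -> float:
    """Unrounded NNT, the reciprocal of the absolute risk reduction."""
    arr = absolute_risk_reduction(treatment_rate, control_rate)
    if arr <= 0:
        raise ValueError("NNT is undefined unless treatment outperforms control")
    return 1.0 / arr


# Worked example from the text: 50% medication response v. 35% placebo response.
raw_nnt = number_needed_to_treat(0.50, 0.35)
# Assumed convention: round up to the next whole patient. The inner round()
# guards against floating-point noise before taking the ceiling.
rounded_nnt = math.ceil(round(raw_nnt, 6))
print(f"NNT = {raw_nnt:.2f}, reported as {rounded_nnt}")  # NNT = 6.67, reported as 7
```

The function raises an error when the treatment does not outperform the control, because a zero or negative difference has no meaningful NNT.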
As this example suggests, the most easily interpretable NNTs are calculated from objectively defined outcome data that are categorical, such as death v. survival or stroke v. no stroke. Reference Laupacis, Sackett and Roberts4 In many psychiatric conditions (e.g. MDD), categorical outcomes such as response v. non-response do exist, but difficulties may arise in other conditions (e.g. schizophrenia) for which there is no clear consensus on the definition of response. It is sometimes necessary to derive NNTs from continuous data (e.g. a change score on an outcome measure), for example with older studies that do not report response and remission rates or with meta-analyses that report summary mean difference statistics between interventions and controls. Complex statistical methods have been developed for these cases to permit calculation of NNTs from continuous outcome data, and in many reports these derived NNTs seem to be valid across a range of treatment response definitions. Reference Furukawa, Akechi, Wagenpfeil and Leucht5 However, these approaches may require patient-level data from clinical trials to which most clinicians do not have access, and their underlying assumptions may not be valid in all conditions. For the practising clinician, these considerations tend to limit the applicability of the NNT to disorders that have clinically meaningful, categorical outcomes and to studies that report such data.
The NNT tells the clinician the number of patients one would expect to treat with intervention A to have one more success (or one fewer failure) than if the same number were treated with intervention B (either a comparator treatment or an inactive control). As stated in the original article by Laupacis et al, a ‘number needed to be treated of 11 means that 10 of 11 patients either do not need therapy or will not respond to it’. Reference Laupacis, Sackett and Roberts4 An NNT of 2 indicates that a treatment is very effective and that this effect will be readily apparent to the clinician. By contrast, an NNT of 100 reflects a very small treatment effect that would contribute little to everyday clinical practice (unless, of course, that one patient could be identified prospectively). However, a particular NNT value does not in itself establish whether a treatment is clinically important, and the NNT alone should not be used to make comparative judgements across areas of therapeutics. Variables such as the prevalence of the illness being treated, the acuity and severity of the illness, and the time-frame of observation are critical. For example, despite having a very low NNT of 2, an anti-emetic medication relieving nausea for 2 h may be less clinically important than an anticoagulant medication used to prevent death from stroke, even if the latter had an NNT of 100.
Similarly, the number needed to harm (NNH) is a variation that gives the number of patients the clinician needs to treat with intervention A compared with B before they will see one additional case of a specified severe adverse event. Reference Sedgwick6 In order to make an informed assessment of the risk–benefit ratio of an intervention, both the NNT and NNH must be considered. For example, the ratio of NNH to NNT is sometimes computed, with higher values indicating greater relative benefit.
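As a rough sketch of how the two numbers might be combined, the following Python fragment computes an NNH in the same way as an NNT and then forms their ratio. The adverse-event rates (8% v. 3%) are hypothetical values chosen only for illustration, and expressing the ratio as NNH divided by NNT is an assumption, made because that is the direction in which higher values indicate greater relative benefit.

```python
def reciprocal_of_rate_difference(rate_a: float, rate_b: float) -> float:
    """Reciprocal of the difference in event proportions between two arms."""
    difference = rate_a - rate_b
    if difference <= 0:
        raise ValueError("rate_a must exceed rate_b")
    return 1.0 / difference


# Benefit: response rates from the worked example in the text (50% v. 35%).
nnt = reciprocal_of_rate_difference(0.50, 0.35)
# Harm: HYPOTHETICAL adverse-event rates, 8% on medication v. 3% on placebo.
nnh = reciprocal_of_rate_difference(0.08, 0.03)
# Assumed direction of the ratio: NNH / NNT, so that larger is more favourable.
benefit_harm_ratio = nnh / nnt

print(f"NNT = {nnt:.1f}, NNH = {nnh:.1f}, NNH/NNT = {benefit_harm_ratio:.1f}")
# NNT = 6.7, NNH = 20.0, NNH/NNT = 3.0
```

On these illustrative figures, about three patients would be expected to respond for every one who experiences the adverse event.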
Application of NNT: is antidepressant treatment worthwhile for this patient?
The meaning of the NNT for any treatment is always expressed relative to a control condition. In the original formulation, the control condition was pill placebo and, since placebo is inactive, the NNT was explicitly interpreted as informing clinicians about the consequences of prescribing a treatment as compared with doing nothing. Laupacis and colleagues argued that a benefit of the ARR (and thus, the NNT) as compared with relative risk reduction is that the ARR and NNT constructs are ‘expression[s] of the consequences of giving no treatment’. Reference Laupacis, Sackett and Roberts4 The notion of comparing a treatment with doing nothing is very relevant for clinicians, since this is often the decision confronting them in their offices. Do I prescribe aspirin for this patient at risk of myocardial infarction or not?
However, it must be noted that this interpretation of the NNT for a placebo-controlled trial (i.e. as an expression of the consequences of prescribing a treatment compared with doing nothing) is based on the assumption that the placebo intervention is tantamount to doing nothing – that placebo response is roughly equivalent to natural history (the course an illness would take if undisturbed by treatment). In the original examples illustrating the utility of the NNT concept, placebo response probably was a reasonable approximation of doing nothing. For example, participants in clinical trials of aspirin to prevent myocardial infarction received aspirin or placebo. Placebo effects do not seem to exist when the outcome is mortality caused by myocardial infarction, and participants in these studies typically do not receive other potentially therapeutic elements that are standard parts of the intensive follow-up comprising the placebo condition in RCTs for treatment of MDD (e.g. weekly clinic visits, advice about diet and exercise, feedback about lipid levels and blood pressure measurements).
By contrast, a substantial proportion of the therapeutic benefit of participating in an RCT evaluating a novel antidepressant is conveyed by the so-called nonspecific elements of treatment. In fact, 60–80% of benefit can be explained by nonspecific factors. Reference Walsh, Seidman, Sysko and Gould1,Reference Rutherford, Pott, Tandler, Wall, Roose and Lieberman9 RCTs in psychiatric disorders may be more influenced by these nonspecific effects because symptom reports are more subjective (e.g. depressive symptom reports) and physiological pathways seem to exist by which placebo effects may influence the target disorder. Reference Mayberg, Silva, Brannan, Tekell, Mahurin and McGinnis10 For example, taking a pill believed to represent an effective treatment may induce expectancies in depressed patients that are capable of causing both side-effects and therapeutic effects. Reference Rutherford, Marcus, Wang, Sneed, Pelton and Devanand11 Besides the expectancy effects generated in placebo-controlled RCTs for MDD, the study pills are administered as part of a multi-component clinical management program that typically includes weekly sessions with a clinician, self-reports, structured ratings, medical evaluation, and considerable contact with caring and helpful clinic staff. Reference Miller, Frank and Reynolds12
Increasing evidence suggests this therapeutic contact may also be a potent contributor to symptomatic improvement in patients with depression, particularly patients in the placebo arm of a medication v. placebo RCT. Reference Rutherford, Sneed and Roose13,Reference Posternak and Zimmerman14 In adolescents with depression, a greater number of visits during the acute treatment period is associated with significantly increased placebo response but not medication response. Reference Rutherford, Sneed, Tandler, Peterson and Roose15 Similarly, an analysis of antidepressant clinical trials enrolling adults aged 18–65 years found a cumulative and positive therapeutic effect of additional follow-up visits on placebo response, but the effect of this increased therapeutic contact was approximately 50% less in the medication groups. Reference Posternak and Zimmerman14 Analysis of response rates in clinical trials of antidepressants in older patients demonstrated that there is a significant interaction between study visits and treatment assignment (OR = 0.89, t = −2.186, d.f. = 36, P = 0.035), such that each additional visit over the grand mean for the sample increased average placebo response by 2.5%, but did not significantly affect medication response. Reference Rutherford, Tandler, Brown, Sneed and Roose16
Thus, increasing the number of study visits significantly increases placebo response while leaving medication response generally unaffected. The effect of this arm-assignment/visits interaction is to dramatically decrease average medication v. placebo differences with increasing visit frequency. For example, in a 12-week, placebo-controlled study of antidepressants for late-life depression, providing 6 clinic visits resulted in an average medication response rate of 51.7% and placebo response rate of 39.5%. Reference Valenstein, Eisenberg, McCarthy, Austin, Ganoczy and Kim17 Providing 8 clinic visits for the same duration of study decreased the average medication response rate to 50.7% and increased placebo response to 44.4%, while providing 10 clinic visits resulted in a medication and placebo response of 49.7% and 49.3%, respectively. Thus, intensifying supportive care from 6 to 10 visits over 12 weeks resulted in a reduction of the average medication/placebo difference from 12.2% to 0.4%.
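To make the consequence for the NNT explicit, the response rates quoted above can be converted into unrounded NNTs. The sketch below uses only the figures given in the text; the variable names are illustrative.

```python
# Clinic visits over 12 weeks -> (medication response %, placebo response %),
# as quoted in the text for late-life depression.
response_by_visits = {
    6: (51.7, 39.5),
    8: (50.7, 44.4),
    10: (49.7, 49.3),
}

results = []
for visits, (medication_pct, placebo_pct) in response_by_visits.items():
    difference = medication_pct - placebo_pct  # percentage points
    nnt = 100.0 / difference                   # reciprocal of the ARR
    results.append((visits, difference, nnt))
    print(f"{visits:>2} visits: difference {difference:.1f} points, NNT {nnt:.1f}")

#  6 visits: difference 12.2 points, NNT 8.2
#  8 visits: difference 6.3 points, NNT 15.9
# 10 visits: difference 0.4 points, NNT 250.0
```

On these figures the unrounded NNT rises from about 8 with 6 visits to about 16 with 8 visits and about 250 with 10 visits, although the medication response rate falls by only 2 percentage points.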
The weekly visits that are typical in RCTs are far more frequent than are commonly practised in the community treatment of depression. Claims-based analyses of the follow-up received by patients filling a new prescription for an antidepressant report that patients with depression have an average of 2.4 to 4.5 face-to-face visits with a physician in the 12 weeks after starting an antidepressant (i.e. they are seen approximately monthly). Reference Valenstein, Eisenberg, McCarthy, Austin, Ganoczy and Kim17–Reference Stettin, Yao, Verbrugge and Aubert19 Most patients (73.6%) are treated exclusively by their general medical provider, as opposed to a psychiatrist, Reference Mojtabai and Olfson20 fewer than 20% of patients have a mental healthcare visit in the first 4 weeks after starting an antidepressant, and fewer than 5% of adults beginning treatment with an antidepressant have as many as 7 physician visits in their first 12 weeks on the medication. These differences partially explain why results of RCTs are not always generalisable to standard clinical treatment: Reference Morrato, Libby, Orton, deGruy, Brent and Allen18,Reference Stettin, Yao, Verbrugge and Aubert19 the placebo response in an RCT is enhanced by the frequent clinic visits. If RCTs included a frequency of patient visits similar to that of clinical practice, this would decrease the placebo response while not significantly affecting the medication response. Thus, far from representing the case of doing nothing in clinical practice, the placebo arm in an antidepressant RCT represents an intensive form of clinical management that has therapeutic effects.
Another difference between the exemplar RCTs originally used to illustrate the utility of the NNT (e.g. aspirin to prevent myocardial infarction, warfarin for the prevention of stroke) and RCTs for disorders such as MDD is that placebo responses are extremely variable. Reference Walsh, Seidman, Sysko and Gould1 Most of the variability in drug–placebo differences in antidepressant RCTs can be attributed to fluctuation in placebo response rather than medication response. Reference Bridge, Birmaher, Iyengar, Barbe and Brent3 For example, in a recent meta-analysis of antidepressant RCTs, placebo response ranged from 13% to 56% (mean 34.7%), and NNTs for antidepressants calculated from individual RCTs within this data-set ranged from 2 to 25. Reference Rutherford, Sneed and Roose13 As a result, even when medication response seems to be relatively constant across studies, the clinical estimate of the importance of medication treatment changes dramatically on the basis of placebo response rates.
Insofar as the placebo condition significantly diverges from doing nothing, clinicians' application of the NNT from a given RCT to their practice becomes problematic. For example, the clinical question for a practising psychopharmacologist might be construed as ‘would it be worthwhile to prescribe an antidepressant for this individual with mild MDD?’ The options available are (a) prescribe an antidepressant or (b) not prescribe any medication and follow up with the patient at a later date. To decide between these options, the psychopharmacologist might consult the literature, where they would find an average medication response rate from placebo-controlled RCTs for mild depression of approximately 45–50% and a placebo response rate of 30–35%, translating into NNTs of between 10 and 20. Reference Fournier, DeRubeis, Hollon, Dimidjian, Amsterdam and Shelton21 Controversially, it has been suggested that these NNTs are high considering the nature of the benefit (i.e. moderate symptom improvement), the potential for side-effects, and the cost associated with the treatment. Reference Kirsch and Sapirstein22 Thus, the psychopharmacologist may decide not to prescribe treatment, judging the 10% response rate difference between medication and placebo to be too small in light of the cost and possibility of side-effects.
In our view, this treatment decision is confounded by a misinterpretation of the NNT construct. The NNT from an antidepressant RCT compares prescribing a medication plus intensive clinical management to prescribing a placebo plus intensive clinical management. The options available in the clinical situation, including to do nothing, do not exist in these RCTs, and for the reasons outlined above, there is reason to think that placebo plus intensive clinical management is substantially more therapeutic for patients with depression than doing nothing. Whereas response rates reported for an acute course of placebo plus clinical management are 30–40%, Reference Walsh, Seidman, Sysko and Gould1,Reference Dunlop, Thase, Wun, Fayyad, Guice-Pabia and Musgnung2 the expected response rate for an acute depressive episode without treatment over 8–12 weeks' follow-up is approximately 10%. Reference Rutherford, Mori, Sneed, Pimontel and Roose23 The true difference between the alternative treatment options available in the clinical situation, therefore, is more likely to be (taking a medication response rate of approximately 50%):
NNT = 1/(0.50 − 0.10) = 1/0.40 = 2.5, or 3 after rounding.
In fact, the NNT that best represents the clinical situation would be calculated using medication response rates from comparator trials, which are significantly higher than the response rates in placebo controlled trials. Reference Rutherford, Sneed and Roose13 Higher response rates may occur in comparator trials because these patients know they are receiving active medication as opposed to possibly receiving placebo, and this situation is more generalisable to clinical practice. A medication response rate of 60% from a comparator trial yields an NNT of 2 (where the control is 10% spontaneous remission).
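The effect of changing the control condition can be illustrated with a short Python sketch. The 10% untreated response rate and the 60% comparator-trial response rate are taken from the text; the 50% and 35% figures are the illustrative rates used in the earlier worked example, and the scenario labels and function name are ours.

```python
def nnt_from_percentages(treatment_pct: float, control_pct: float) -> float:
    """Unrounded NNT from response rates expressed as percentages."""
    difference = treatment_pct - control_pct
    if difference <= 0:
        raise ValueError("treatment must outperform control")
    return 100.0 / difference


# (treatment response %, control response %) for each comparison.
scenarios = {
    "medication v. placebo, both with intensive management": (50, 35),
    "medication (placebo-controlled RCT rate) v. no treatment": (50, 10),
    "medication (comparator-trial rate) v. no treatment": (60, 10),
}

nnts = {label: nnt_from_percentages(*rates) for label, rates in scenarios.items()}
for label, value in nnts.items():
    print(f"{label}: NNT = {value:.1f}")

# medication v. placebo, both with intensive management: NNT = 6.7
# medication (placebo-controlled RCT rate) v. no treatment: NNT = 2.5
# medication (comparator-trial rate) v. no treatment: NNT = 2.0
```

The medication itself is unchanged across the three comparisons; only the choice of control and the trial context from which the medication response rate is drawn differ.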
It is therefore misleading to apply an NNT derived from an RCT, in which an expectation of improvement is induced and frequent study visits are administered for both medication- and placebo-treated subjects, to a clinical setting, in which placebo plus clinical management is not an available treatment option. To inform the clinician about the relative effectiveness of treatments in a real-life clinical setting, the NNT for an antidepressant or antipsychotic should be calculated from data from a study that compares open medication with a no-treatment control group. Unfortunately, no such studies have been done in the past, and to propose such a study design now would raise significant feasibility and, more importantly, ethical concerns. Thus, while placebo-controlled RCTs are the gold standard for demonstrating the efficacy of an intervention, they have serious, possibly fatal limitations when used to guide clinical decision-making in disorders with high placebo response rates.
Another application of NNT: which of the available treatments should be prescribed?
A second clinical application of the NNT is to help clinicians decide between alternative interventions. Among the treatments available for a given condition, which would be the most useful to prescribe for this patient? A clinician might choose by simply comparing the NNTs calculated from clinical trials of each intervention and selecting the intervention with the lowest NNT. This would be cogent when choosing between antidepressants because in an RCT comparing medications all patients receive the same management, with the only variable being medication assignment. However, the situation is different if the clinician wants to compare antidepressants with psychotherapeutic modalities. Taking the treatment of mild MDD as an example, the NNTs in RCTs for adults with mild MDD range from 10 to 25, with an average of approximately 16. Reference Fournier, DeRubeis, Hollon, Dimidjian, Amsterdam and Shelton21 By contrast, much lower NNTs have been reported for the psychotherapeutic treatment of MDD. A series of influential meta-analyses have reported NNTs for cognitive–behavioural therapy (CBT) and other psychotherapeutic treatments for MDD in the 2–5 range. Reference Cuijpers, van Straten, Andersson and van Oppen24 On the basis of this substantially lower NNT, a clinician might reasonably opt to prescribe an evidence-based psychotherapy over medication.
In fact, guidelines for the treatment of mild MDD, based substantially on NNT comparisons, state that psychotherapy should be the first-line treatment for this condition. For example, National Institute for Health and Care Excellence (NICE) guidelines advise that practitioners ‘not use antidepressants routinely to treat persistent sub-threshold depressive symptoms or mild depression because the risk–benefit ratio is poor’. 25 Others have made stronger statements, suggesting that the continued high rates of antidepressant prescriptions for mild to moderate MDD in the face of these unfavourable NNTs reflect improper motives on the part of physicians and/or pharmaceutical companies. Reference Kirsch and Sapirstein22
While evidence-based psychotherapies are inarguably a valuable component of the clinician's armamentarium in the treatment of MDD, confusion in the interpretation of NNTs from medication and psychotherapy trials makes blanket statements in favour of psychotherapy difficult to justify. As discussed previously, the NNT is calculated relative to a control condition. It follows that, in order to compare NNTs across different treatment interventions (e.g. antidepressants and CBT), the control conditions upon which the NNTs are based must be comparable. As stated in a recent commentary by Garcia, ‘to directly compare NNTs one needs to ensure that … the control or comparison groups to which the treated group was compared were equivalent. The need for similar control groups is no small feat in psychotherapy research where there is not a readily available analog to the pill placebo conditions that are frequently used in psychopharmacology research’. Reference Garcia26
In the case of an antidepressant RCT, the placebo control condition replicates every element of the medication condition except for the actual pharmacological effect of medication (e.g. serotonin reuptake inhibition). By contrast, no comparable placebo control condition exists for psychotherapy, partly because the specific ‘active ingredients’ of psychotherapy are unknown. Instead, a psychotherapy typically is compared with the effects of being placed on a treatment waiting list (or other attention control condition), receiving a different active treatment, or a placebo pill.
A control condition such as a wait-list has substantially fewer therapeutic components than a pill placebo control condition. Subjects on a wait-list do not usually have frequent clinic visits and are not masked to the fact that they are not receiving an active treatment (and probably, therefore, have minimal expectancy of improvement), both of which stand in stark contrast to the treatment received by individuals in a pill placebo cell. Indeed, a recent meta-analysis reported that the relative benefit of psychotherapies v. their control conditions (and the NNT calculated) depends greatly upon the nature of the control condition. Reference Huang, Delucchi, Dunn and Nelson27 Thus, there is good reason to believe that comparisons between NNTs for CBT v. wait-list control and for antidepressants v. pill placebo are seriously confounded. Such comparisons would be biased in favour of psychotherapy over medication, given the much less effective control conditions in the psychotherapy trials.
The problem of mismatched control conditions is not necessarily avoided even when the treatments are administered within the same study. The Treatment for Adolescents with Depression Study (TADS) randomly allocated adolescents with MDD to CBT alone, fluoxetine alone, combined CBT and fluoxetine, or pill placebo. Reference March, Silva, Petrycki, Curry, Wells and Fairbank28 The authors state that ‘participants and all study staff remained masked in the ‘pills only’ conditions (fluoxetine therapy and placebo) … [but] patients and treatment providers in the combination and CBT conditions were aware of treatment assignment’. Reference March, Silva, Petrycki, Curry, Wells and Fairbank28 Thus, the study compared blinded treatments (fluoxetine and pill placebo) with un-blinded treatments (CBT and combined CBT/fluoxetine). It has been reported that comparisons of openly administered treatments with double-blind treatments are significantly biased against the blinded treatment. Reference Rutherford, Sneed and Roose13 The most clinically informative and fair study design would compare psychotherapy, open medication, and no treatment in individuals with mild MDD. However, no such study has been performed.
The point of the above discussion is not to advocate antidepressants as the treatment of choice for mild MDD, but rather to illuminate the difficulties posed by comparing treatments that do not have comparable control conditions. While NNTs can be calculated for any treatment and control condition, the validity of between-treatment comparisons and therefore of treatment decisions based on NNTs depends on careful verification that the controls upon which each NNT is based are comparable.
Suggestions for improving the clinical utility of NNT
After reviewing the difficulties faced by clinicians who are using NNTs calculated from RCTs to practise evidence-based medicine, the question naturally arises: what can be used instead? The answer is far from clear. The most obvious solution is to conduct studies that more closely resemble the treatment options available to doctors and patients in clinical practice. For example, a study comparing openly administered antidepressants and monthly clinic visits (the average follow-up visit frequency in community treatment Reference Kirsch and Sapirstein22) with periodic follow-up without pills (i.e. doing nothing) would more closely replicate the clinical situation and would permit the calculation of an NNT for antidepressants that would better guide clinical decision-making. Treatment development could shift to a two-stage model in which an intervention is first tested for efficacy in a placebo-controlled RCT, and then tested in an effectiveness study to inform clinicians how different therapeutic options would work in their clinical practice.
Unfortunately, effectiveness studies are difficult to execute and perhaps even more difficult to fund in today's research environment, so alternatives must be explored. Although all of the available alternatives may have significant drawbacks, in many cases they offer more valid clinical guidance than NNTs calculated from standard, placebo-controlled RCTs. One possibility would be to use information from psychotherapy studies utilising a wait-list control group as a means of estimating the option of doing nothing for MDD. We analysed the acute symptom change in wait-list control groups and found that individuals with MDD experience an average improvement of 4 Hamilton Rating Scale for Depression points (Cohen's d effect size = 0.5) over a mean follow-up of 10 weeks. Reference Rutherford, Mori, Sneed, Pimontel and Roose23 This figure could be construed as a type of ‘historical control’ and used in place of a placebo control condition to estimate a modified NNT for an active treatment condition in a different RCT. The obvious downside to this course is that there may be multiple confounds, making comparisons across different patient samples invalid.
Some steps can be taken to attenuate the problems faced by clinicians using the NNT to decide between different treatment options. Clinicians can reduce the effect of bias in their NNT comparison by ensuring the comparability of the control conditions. These conditions should be similar to one another in terms of blinding (i.e. do not compare open v. blinded), credibility to participants, and the amount of therapeutic contact (e.g. visits) provided. The control condition should also be appropriate to the treatment being assessed. Attention to these considerations can enhance the validity of using the NNT to compare treatments.
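One way to make such a check systematic is sketched below in Python. It is only an illustration of the three criteria named above: the class, the function, the two-visit tolerance and the example values for the pill placebo and wait-list conditions are all hypothetical and would need to be replaced by details taken from the trials being compared.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ControlCondition:
    name: str
    blinded: bool   # were participants masked to treatment assignment?
    credible: bool  # would participants plausibly expect to benefit?
    visits: int     # clinic visits during the acute treatment period


def comparability_issues(
    a: ControlCondition, b: ControlCondition, max_visit_gap: int = 2
) -> List[str]:
    """List the ways in which two control conditions are not comparable."""
    issues = []
    if a.blinded != b.blinded:
        issues.append("blinding differs (open v. blinded)")
    if a.credible != b.credible:
        issues.append("credibility to participants differs")
    # The tolerance for visit differences is an arbitrary, hypothetical choice.
    if abs(a.visits - b.visits) > max_visit_gap:
        issues.append("amount of therapeutic contact differs")
    return issues


# HYPOTHETICAL control conditions, for illustration only.
pill_placebo = ControlCondition("pill placebo", blinded=True, credible=True, visits=8)
wait_list = ControlCondition("wait-list", blinded=False, credible=False, visits=1)

for issue in comparability_issues(pill_placebo, wait_list):
    print(issue)
```

An empty list would suggest that the two NNTs rest on broadly similar controls; any listed issue is a reason to treat a direct comparison of the NNTs with caution.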
In many medical fields, but particularly in psychiatry, long-standing concerns have been expressed about how applicable the results of RCTs are to clinical practice. Although the NNT has proved itself a useful tool in the practice of evidence-based medicine in some circumstances, it is perhaps less useful with RCTs that have complex control and comparator conditions and high, variable placebo responses. The purpose of this analysis has been to raise the field's awareness of certain difficulties in the proper interpretation of NNTs derived from RCTs having high placebo response rates and to illustrate how NNTs may be misleading in certain circumstances.
This work was supported by National Institute of Mental Health grants K23 MH085236 (BRR) and T32 MH015144 (SPR).