Long-term outcomes of psychological treatment for posttraumatic stress disorder: a systematic review and meta-analysis

Several types of psychological treatment for posttraumatic stress disorder (PTSD) are considered well established and effective, but evidence of their long-term efficacy is limited. This systematic review and meta-analysis aimed to investigate the long-term outcomes across psychological treatments for PTSD. MEDLINE, Cochrane Library, PTSDpubs, PsycINFO, PSYNDEX, and related articles were searched for randomized controlled trials with at least 12 months of follow-up. Twenty-two studies (N = 2638) met inclusion criteria, and 43 comparisons of cognitive behavioral therapy (CBT) were available at follow-up. Active treatments for PTSD yielded large effect sizes from pretest to follow-up and a small controlled effect size compared with non-directive control groups at follow-up. Trauma-focused treatment (TFT) and non-TFT showed large improvements from pretest to follow-up, and effect sizes did not significantly differ from each other. Active treatments for comorbid depressive symptoms revealed small to medium effect sizes at follow-up, and improved PTSD and depressive symptoms remained stable from treatment end to follow-up. Military personnel, low proportion of female patients, and self-rated PTSD measures were associated with decreased effect sizes for PTSD at follow-up. The findings suggest that CBT for PTSD is efficacious in the long term. Future studies are needed to determine the lasting efficacy of other psychological treatments and to confirm benefits beyond 12-month follow-up.

Numerous psychological PTSD treatments have been developed which can be differentiated by content (e.g. Ehlers et al., 2010). Trauma-focused treatment (TFT) mainly focuses on processing the individual's memory of the trauma and/or its meaning. Trauma-focused cognitive behavioral therapy (TF-CBT) typically incorporates psychoeducation, homework, relaxation, and cognitive and/or behavioral-based components (e.g. cognitive therapy, Ehlers & Clark, 2000; cognitive processing therapy, Resick & Schnicke, 1992; prolonged exposure, Foa & Rothbaum, 1998; Foa, Hembree, & Rothbaum, 2007; narrative exposure therapy). Eye Movement Desensitization and Reprocessing (EMDR; Shapiro, 1995) combines recall of the traumatic memory with bilateral movements and some core elements of TF-CBT. Non-TFTs typically address coping with symptoms, emotion regulation, or current problems in life without a primary focus on the trauma. Techniques of non-TF-CBT comprise, inter alia, anxiety management, relaxation, stress management, social skills training, positive thinking, assertiveness training, or thought stopping (International Society for Traumatic Stress Studies, 2015; e.g. stress inoculation training, Veronen & Kilpatrick, 1983; seeking safety, Najavits, 2002). A series of other PTSD treatments exists (e.g. psychodynamic therapies or hypnotherapy), but these are less frequently studied (Cusack et al., 2016).
Seven recent systematic reviews and meta-analyses examined the lasting benefits of psychological treatment in adult PTSD. Their findings mainly represent treatment effects at short-term to medium-term, indicating medium to large improvements in PTSD symptoms up to 6 months of follow-up (Bisson, Roberts, Andrew, Cooper, & Lewis, 2013; Carpenter et al., 2018; Ehring et al., 2014; Kline, Cooper, Rytwinski, & Feeny, 2018; Lee et al., 2016; Mavranezouli et al., 2020; van Dis et al., 2020). Two of these studies give insights into the long-term effects of PTSD treatment, i.e. at 12-month follow-up and above. One meta-analysis investigated evidence-based treatments for PTSD at medium-term, with at least 6 months of follow-up (Kline et al., 2018). The findings were based on uncontrolled comparisons only, and showed larger treatment effects for active treatments from baseline to medium-term follow-up (d = 2.14) compared to active control conditions for the same period (d = 1.04). Uncontrolled comparisons between psychological treatments for PTSD demonstrated no significant differences from pretest to follow-up. For the posttest to follow-up phase, exposure-based treatments were superior, while CBTs without exposure were inferior to all other treatments combined. Of the 32 included trials, eight studies (25%) had a 12-month follow-up. Only one study comprised more than 12 months of follow-up, making it difficult to disentangle any long-term benefit from medium-term outcomes. Another meta-analysis focused on CBT for anxiety disorders compared with usual care or wait-list group with at least 1 month of follow-up (van Dis et al., 2020). Longer-term outcomes for PTSD alone showed medium effects at 6-12 months of follow-up and large treatment effects compared with the control group at more than 12 months of follow-up. The analyses were limited to direct comparisons, and 16 studies with a follow-up of at least 6 months were available for PTSD.
This systematic review and meta-analysis aimed to investigate the long-term outcomes across psychological treatments for adults with PTSD. We aimed to assess PTSD severity and comorbid depressive symptoms at least 12 months after treatment completion. Using comparisons both within and between studies, we aimed to increase the number of available studies with a long-term follow-up but simultaneously control for time and placebo effects. We examined whether (1) psychological treatment differed from control groups and whether (2) TFT differed from non-TFT in PTSD severity and comorbid depressive symptoms at long-term follow-up. In addition, we investigated specific moderators and their potential impact on long-term benefits of psychological treatment for PTSD.

Method
We followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement for conducting and reporting this meta-analysis (Liberati et al., 2009, see Appendix A). The study protocol was not registered a priori.

Eligibility criteria
Studies were selected if they comprised (1) face-to-face psychological treatment for PTSD, (2) adult participants, (3) at least 70% of participants diagnosed with PTSD (e.g. according to DSM-IV/V, ICD-10); (4) either active or passive, nonpharmacological control conditions (e.g. supportive counseling, wait-list) or psychological treatment as comparators; (5) PTSD severity as primary outcome measured at least 12 months after the end of treatment; (6) a randomized controlled trial design, and (7) at least ten participants per treatment arm. Trials were included based on any type of trauma, type of setting (e.g. inpatients, outpatients), presence of comorbidity, or adjuvant drug treatment (e.g. by prescription or as part of the study protocol).

Selection of studies
A systematic literature search in MEDLINE, PsycINFO, PSYNDEX, PTSDpubs, and Cochrane Library was conducted for articles in English or German language until November 7, 2019. The search strategy derived from the preparation of the German S3 treatment guideline for PTSD (Schäfer et al., 2019), and included the following keywords: (PTSD OR OR Posttraumatische Belastungsstörungen or PTBS) AND ((treatment trial OR randomized controlled trial) or (indexed by a thesaurus term as a clinical trial)). In addition, we performed a systematic snowball search by screening reference lists from included primary studies and relevant review articles. Two researchers independently screened articles and decided on eligible studies. In cases of disagreement between the researchers, a third researcher decided on eligibility.

Data extraction
Several study characteristics were extracted (see Appendix A). Means and standard deviations or reported effect sizes for PTSD severity (primary outcome) and comorbid depressive symptoms (secondary outcome) were extracted at baseline, posttest, and at ⩾12 months after treatment completion. In cases of multiple follow-up intervals, data from the latest were used (e.g. Karyotaki et al., 2016; Kline et al., 2018). Data for clinician-rated PTSD and intent-to-treat samples (ITT) were extracted if available. If further statistical data or data subsets were needed (e.g. for adult subsample), we contacted the study authors and sent a follow-up e-mail in case of non-response (57% response rate). One researcher (MW) extracted data, which were cross-checked by a second (SSch) to ensure accuracy.

Treatment coding
We coded treatment conditions as psychological treatment or control. Psychological treatment was classified as TFT or non-TFT (Ehring et al., 2014; see Ehlers et al., 2010 for discussion). Control conditions were rated as active if interventions were neither directive nor trauma-focused, such as supportive counseling or treatment as usual (TAU), and served to control for non-specific mechanisms (e.g. Kline et al., 2018; Lambert & Alhassoon, 2015; Powers, Halpern, Ferenschak, Gillihan, & Foa, 2010). We classified control groups as passive if there was no clinician involvement, i.e. during wait-list. Treatment coding was performed by two independent researchers (MW, WH), and a third (BK) was consulted if raters disagreed.

Study quality assessment
The included studies were assessed using the six domains from the Cochrane risk of bias tool: (1) random sequence generation, (2) allocation concealment, (3) blinding of participants and personnel, (4) blinding of outcome assessment, (5) incomplete outcome data (e.g. if no ITT data were available for follow-up analysis), and (6) selective outcome reporting (e.g. if studies deviated from trial registration). All domains were rated as either low, unclear (unknown), or high risk of bias, closely following the recommendations for risk of bias ratings in psychotherapy research (see Munder & Barth, 2018). Two researchers (MW, WH) independently coded risk of bias and consulted a third (SSch) in case of disagreement.

Statistical analyses
Effect size calculation
Two types of effect sizes were estimated using Hedges' g (Hedges, 1981). Within-group effect sizes were obtained by subtracting the follow-up (or posttest) mean from the baseline (or posttest) mean. For between-group effect sizes, the control group mean was subtracted from the treatment group mean at follow-up or posttest, and the difference was divided by the pooled standard deviation. For within-group effect sizes, the standard deviation within groups was used including the correlation between the two measurements (Borenstein, Hedges, Higgins, & Rothstein, 2009). Both types of effect sizes were corrected for small sample bias (Hedges & Olkin, 1985). A magnitude of 0.2, 0.5, and 0.8 represents a small, medium, and large effect size, respectively (Cohen, 1977). Positive effect sizes indicate improved symptoms, while the width of the respective 95% confidence interval (CI) quantifies its precision (Borenstein et al., 2009).
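The two effect size types described above can be illustrated with a minimal Python sketch. It is not the authors' implementation (the analyses were run in Comprehensive Meta-Analysis); it assumes the standard textbook formulas from Borenstein et al. (2009), and all function and parameter names are hypothetical. The sign is chosen here so that positive values indicate lower symptom scores after treatment or in the treatment group, which is an assumption made to match the stated interpretation of positive effect sizes.

```python
import math


def small_sample_correction(df):
    """Approximate Hedges' correction factor J for a given number of degrees of freedom."""
    return 1.0 - 3.0 / (4.0 * df - 1.0)


def hedges_g_between(mean_ctrl, sd_ctrl, n_ctrl, mean_trt, sd_trt, n_trt):
    """Between-group Hedges' g at one time point (e.g. follow-up).

    Assumption: the difference is taken as control minus treatment, so that a
    positive g means lower symptom severity in the treatment group.
    """
    df = n_trt + n_ctrl - 2
    sd_pooled = math.sqrt(((n_trt - 1) * sd_trt ** 2 + (n_ctrl - 1) * sd_ctrl ** 2) / df)
    d = (mean_ctrl - mean_trt) / sd_pooled
    var_d = (n_trt + n_ctrl) / (n_trt * n_ctrl) + d ** 2 / (2 * (n_trt + n_ctrl))
    j = small_sample_correction(df)
    return j * d, j ** 2 * var_d


def hedges_g_within(mean_pre, sd_pre, mean_post, sd_post, n, r):
    """Within-group Hedges' g (e.g. pretest to follow-up) using the pre-post correlation r.

    The standard deviation of the difference scores is rescaled to the
    within-group standard deviation, as in the matched-groups formulas.
    """
    sd_diff = math.sqrt(sd_pre ** 2 + sd_post ** 2 - 2 * r * sd_pre * sd_post)
    sd_within = sd_diff / math.sqrt(2 * (1 - r))
    d = (mean_pre - mean_post) / sd_within
    var_d = (1 / n + d ** 2 / (2 * n)) * 2 * (1 - r)
    j = small_sample_correction(n - 1)
    return j * d, j ** 2 * var_d


if __name__ == "__main__":
    # Made-up numbers for illustration only, not data from the included trials.
    print(hedges_g_between(60, 10, 20, 50, 10, 20))
    print(hedges_g_within(70, 10, 50, 10, 25, 0.5))
```

Each function returns the corrected effect size together with its variance, since the variance is what a subsequent pooling step would need as input.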
Comprehensive Meta-Analysis software, version 3 (Biostat) was applied to pool effect sizes using a random-effects model. If no correlation was available to calculate within-group effect sizes, sensitivity analyses were performed by replacing the coefficient with r = 0.2, r = 0.5 and r = 0.8; the default value was set to r = 0.5 (k = 10; Borenstein et al., 2009). Heterogeneity of effect sizes was tested with the Q-statistic, the I² value, and by visual inspections of forest plots. A p-value of the Q-statistic below 0.05 indicates heterogeneity (Cochran, 1954). I² values range from 0 to 100% and suggest presence of low (25%), medium (50%), and large (75%) heterogeneity (Higgins & Thompson, 2002).
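As an illustration of the pooling and heterogeneity statistics named above, the following Python sketch computes a random-effects pooled estimate together with Q and I². The source only states that a random-effects model was used; the DerSimonian-Laird estimator of the between-study variance is an assumption here, and the function name and return keys are hypothetical.

```python
import math


def random_effects_pool(effects, variances):
    """Pool effect sizes with a random-effects model (DerSimonian-Laird, assumed)."""
    k = len(effects)
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    # Fixed-effect mean is needed to compute Cochran's Q.
    fixed_mean = sum(w * y for w, y in zip(weights, effects)) / total
    q = sum(w * (y - fixed_mean) ** 2 for w, y in zip(weights, effects))
    df = k - 1
    # Method-of-moments estimate of the between-study variance tau^2.
    c = total - sum(w ** 2 for w in weights) / total
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    # I^2: share of total variability attributed to heterogeneity, in percent.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    re_weights = [1.0 / (v + tau2) for v in variances]
    pooled = sum(w * y for w, y in zip(re_weights, effects)) / sum(re_weights)
    se = math.sqrt(1.0 / sum(re_weights))
    return {
        "g": pooled,
        "ci95": (pooled - 1.96 * se, pooled + 1.96 * se),
        "Q": q,
        "df": df,
        "tau2": tau2,
        "I2": i2,
    }


if __name__ == "__main__":
    # Made-up effect sizes and variances for illustration only.
    print(random_effects_pool([0.2, 0.8], [0.04, 0.04]))
```

The p-value of Q would be obtained from a chi-square distribution with k - 1 degrees of freedom; that step is left out to keep the sketch free of external dependencies.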

Subgroup analysis
Subgroup analyses of treatment conditions (active treatment v. control condition) and treatment types (TFT v. non-TFT) were performed for the primary and secondary outcome. Dropout rates (i.e. ratio of participants initiating but not completing treatment; Ehring et al., 2014) were calculated for both conditions and treatment subgroups. For PTSD severity, six additional variables were analyzed: proportion of female participants [high (⩾50%) v. low (<50%); Sloan et al., 2013; Watts et al., 2013], type of population (military personnel v. civilian; Kline et al., 2018), treatment format (individual v. group-based; Ehring et al., 2014; Haagen et al., 2015), average number of treatment sessions [high (⩾10 sessions) v. low (<10 sessions); Lambert & Alhassoon, 2015], outcome measure (self-rated v. clinician-assessed; Lambert & Alhassoon, 2015), and type of analysis at follow-up (Kline et al., 2018). Analyses on within-group effect sizes were conducted using the mixed-effect model and the Q-statistic (Borenstein et al., 2009) if at least three comparisons per subgroup were available.
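A mixed-effects subgroup comparison of this kind can be sketched in Python as follows. This is a hedged illustration, not the procedure as configured in the original software: it assumes that each subgroup is pooled with its own DerSimonian-Laird between-study variance and that subgroup means are then compared with a fixed-effect Q-between statistic. The function names and the `min_k` parameter are hypothetical; `min_k=3` mirrors the stated requirement of at least three comparisons per subgroup.

```python
def pool_random_effects(effects, variances):
    """Return (pooled effect, variance of the pooled effect) for one subgroup."""
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    fixed = sum(w * y for w, y in zip(weights, effects)) / total
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, effects))
    c = total - sum(w ** 2 for w in weights) / total
    # Separate tau^2 per subgroup (an assumption; a common tau^2 is also possible).
    tau2 = max(0.0, (q - (len(effects) - 1)) / c) if c > 0 else 0.0
    re_weights = [1.0 / (v + tau2) for v in variances]
    pooled = sum(w * y for w, y in zip(re_weights, effects)) / sum(re_weights)
    return pooled, 1.0 / sum(re_weights)


def q_between(subgroups, min_k=3):
    """Compare subgroup means; subgroups maps a label to (effects, variances).

    Returns (Q_between, degrees of freedom, pooled results per subgroup).
    """
    pooled = {}
    for name, (effects, variances) in subgroups.items():
        if len(effects) < min_k:
            raise ValueError(f"subgroup {name!r} has fewer than {min_k} comparisons")
        pooled[name] = pool_random_effects(effects, variances)
    weights = {name: 1.0 / var for name, (_, var) in pooled.items()}
    total = sum(weights.values())
    grand_mean = sum(weights[n] * pooled[n][0] for n in pooled) / total
    q = sum(weights[n] * (pooled[n][0] - grand_mean) ** 2 for n in pooled)
    return q, len(pooled) - 1, pooled


if __name__ == "__main__":
    # Made-up subgroups for illustration only.
    example = {
        "military": ([0.2, 0.4, 0.3], [0.04, 0.04, 0.04]),
        "civilian": ([0.9, 1.1, 1.0], [0.04, 0.04, 0.04]),
    }
    print(q_between(example))
```

Q-between is referred to a chi-square distribution with (number of subgroups - 1) degrees of freedom to obtain the between-group p-value.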

Included studies
The search yielded 12 286 hits. Twenty-two eligible studies were included in this meta-analysis (Fig. 1). Studies were published between 1999 and 2018 and comprised 28-353 participants per  Of 46 included treatment conditions at posttest, 35 were active treatments including 28 TFT (k = 27 TF-CBT, k = 1 EMDR), and 7 non-TFT (all CBT). The 11 active, non-directive control conditions consisted of TAU (k = 5), social counseling (k = 3), present-centered therapy (k = 2), and educational groups (k = 1). One of six wait-list conditions was available at follow-up, and thus all passive control conditions were removed from analyses. Three active treatment conditions had inadequate follow-up data (k = 2 TF-CBT, k = 1 EMDR), and remained for qualitative and pre-post analyses, resulting in 43 CBT-based conditions for primary analyses. The average follow-up lasted 18 months and ranged from 12 to 74 months. Five studies (23%) had more than 12 months of follow-up. The dropout rate in active treatments (24%, k = 30) did not significantly differ from active control groups (18%, k = 7, p = 0.32). Dropouts in TFT (25%, k = 24) also did not differ significantly from non-TFT (21%, k = 6, p = 0.42).
Heterogeneity was large for within-group effect sizes from pretest to follow-up across condition and treatment types (Q > 62, p < 0.001, I² > 81, Table 2). None of the moderators examined for PTSD severity, except for self-rated PTSD measure (Q = 5.16, p = 0.16, I² = 41.91), increased homogeneity in active treatments. However, subgroup analyses showed higher effect sizes for civilian compared to military populations (p < 0.01), for studies with larger proportions of female participants (p < 0.001), and for interview-based compared to self-rated outcome measures (p < 0.001). Subgroups did not significantly differ regarding treatment format, number of sessions, type of analysis used, or follow-up duration.

Study quality
The overall risk of bias from studies included in this systematic review and meta-analysis ranged from low (50%) and unclear (24%) to high (26%) across all bias domains (see Appendix A for details per study). Most studies had a low risk of bias concerning sequence generation (68%, k = 15), allocation concealment (50%, k = 11), and blinding of outcome assessors (90%, k = 20). Nine studies (41%) provided complete outcome data at follow-up, indicating a low risk of bias. Nine studies (41%) also registered or published their study protocol a priori and adhered to it, while in half of the studies (50%) selective outcome reporting remained unclear. Few studies applied blinding of participants and personnel (9%, k = 2), and only one study had a low risk of bias in all domains.

Additional analyses
Replacing the missing correlations with r = 0.2 and r = 0.8 only marginally altered within-group effect sizes for active treatments and control conditions in all comparisons (see Appendix A for sensitivity analyses). Publication bias remained untested due to moderate or large heterogeneity between effect sizes and the small number of studies in our datasets (Ioannidis & Trikalinos, 2007; Rothstein et al., 2005; Sterne et al., 2006; Terrin et al., 2003).

Summary of evidence
This systematic review and meta-analysis of 22 randomized controlled trials indicates that psychological treatment for adults with PTSD is efficacious in the long term. Active treatments yielded large symptom reductions of PTSD from pretest up to at least 12 months after initial treatment. Small treatment effects favored psychological treatment over non-directive control groups at follow-up, and symptom improvements remained stable from posttest to follow-up. TFT and non-TFT yielded large sustained improvements in PTSD from pretest to follow-up. Treatment effects of TFT were medium relative to non-directive control groups, and not significantly different from non-TFT at follow-up. Effect size estimates were of considerable heterogeneity and the number of available comparisons was low.
The large and stable within-group effect sizes of psychological treatment for PTSD in this meta-analysis are comparable with previous results at both short-term and medium-term follow-up (Ehring et al., 2014; Kline et al., 2018), yet uncontrolled comparisons must be interpreted cautiously. The between-group effect size of psychological treatment compared with active control groups is smaller than reported in a previous study with pooled wait-list and active control groups as comparator at follow-up (van Dis et al., 2020). However, the between-group effect size of TFT relative to active control groups is consistent with the previous finding. All active treatments included cognitive behavioral therapy (CBT) with TFT most frequently (78%) studied long-term, which is in line with previous evidence at medium-term follow-up (e.g. Kline et al., 2018). The enduring treatment effects of trauma-focused CBT after an average of 18 months confirm its recommendation as first-line therapy for PTSD (e.g. APA, 2017). One follow-up study was available on EMDR, yet the long-term data were insufficient to be included for meta-analysis. This finding highlights its current weaker empirical support as PTSD treatment (APA, 2017), and lasting benefits beyond CBT outcomes require future research. A few studies examined non-TFT showing large and enduring benefits for PTSD, which mirrors prior short-term findings regarding its empirical support and efficacy (e.g. Ehring et al., 2014; Lenz et al., 2017; Powers et al., 2010). Effect sizes of non-TFT were smaller compared with TFT at follow-up as previously reported (Ehring et al., 2014), but the difference was statistically non-significant. Given the small number of comparisons for non-TFT in particular, non-significant results should be interpreted as the absence of statistical evidence rather than as evidence of non-inferiority (Rief & Hofmann, 2018).
The within-group effect sizes of psychological treatments for improved comorbid depressive symptoms were medium from pretest to follow-up (or posttest), and smaller than previous findings at short-term follow-up on treatments aiming to reduce depression or PTSD (Morina, Malek, Nickerson, & Bryant, 2017). Between-group effect sizes at follow-up were also smaller than in the previous study mainly comparing active treatments to wait-list groups. Studies less frequently (73%) reported secondary depression outcomes at follow-up, and comparisons for depressive symptoms were likely underpowered.
The included studies differed widely across sample-related and treatment-related characteristics, and meta-analyses including the present one are inherently associated with heterogeneous effect sizes. In addition, dropout in active treatments was slightly higher compared to previous rates from mixed populations (Kline et al., 2018; Lewis, Roberts, Gibson, & Bisson, 2020), but lower than in military samples alone (Goetter et al., 2015). TFT and non-TFT types did not differ in dropout rates, which reflects some findings (e.g. Imel, Laska, Jakupcak, & Simpson, 2013), and opposes others (Lewis et al., 2020).
Half of all studies (50%) were at high risk of bias in at least two domains, potentially increasing the risk of overestimated treatment effects (e.g. Cuijpers, van Straten, Bohlmeijer, Hollon, & Andersson, 2010b). Subgroup analysis indicated large and non-significantly different effect sizes from ITT and completer samples at follow-up. However, in ITT samples (but not in completer samples) higher dropout was associated with larger effect sizes. This suggests that dropout is a critical source of bias that ITT analyses may not fully resolve assuming data are missing at random (White, Horton, Carpenter, & Pocock, 2011). In addition, treatment effects were significantly higher for studies using interview-based outcome measures compared to self-rated measures, challenging prior results from subgroup analyses in PTSD (Kline et al., 2018; Lambert & Alhassoon, 2015). As suggested for depression outcomes earlier, self-rated measures either underrated the improved symptoms, clinician-rated interviews were more sensitive to change, or a combination of both (Cuijpers, Li, Hofmann, & Andersson, 2010a).
Note. A, alprazolam; AC, academic catch-up; B-CBT, brief cognitive behavioral therapy; BA + E, behavioral activation and exposure; BA + E TH, behavioral activation and exposure telehealth-based; BDI-II, Beck depression inventory-II; BDI, Beck depression inventory; C, d-cycloserine; CAPS, clinician-administered PTSD scale; CBT, cognitive behavioral therapy; CPT-M, cognitive processing therapy modified; CPT, cognitive processing therapy; CR, cognitive restructuring; EG, educational group therapy; EMDR, eye movement desensitization and reprocessing; HADS, hospital anxiety and depression scale, depression subscale; HAM-D, Hamilton depression scale; HDRS, Hamilton depression rating scale; ICBTcom, integrated cognitive behavioral treatment for comorbidities; IOE, impact of events scale; IR, imagery rescripting therapy; MINI, mini international neuropsychiatric interview; n, sample size; n session, average number of sessions; NET, narrative exposure therapy; NTF, non-trauma-focused treatment; PCL-C, PTSD checklist, civilian version; PCL-M, PTSD checklist, military version; PCLS, post-traumatic checklist scale; PCT, present-centered therapy; PDSi, posttraumatic diagnostic scale, interview-based; PE, prolonged exposure; Pla, placebo; PSS-I, PTSD symptom scale, interview; PTSD, posttraumatic stress disorder; SC, supportive counseling; Se, sertraline; SIT, stress inoculation training; SMT, self-management therapy; SS, seeking safety; TAU, treatment as usual; TF, trauma-focused treatment; VR, virtual reality exposure therapy; WET, written exposure therapy; WL, wait-list. a We thank primary study authors for providing additional data and/or data subsets. b Treatment condition was excluded from all meta-analyses. c Treatment condition was excluded from follow-up meta-analysis.
The improved PTSD symptoms at follow-up in female and civilian populations compared with male and military subgroups replicate previous findings at short-term (Sloan et al., 2013; Watts et al., 2013) and medium-term follow-up (Kline et al., 2018; Wade et al., 2016). However, we noted that proportions of male participants were small in civilian populations but close to 100% in military samples, and future studies with balanced samples need to disentangle the effect of gender and population type. Importantly, treatment effects for PTSD did not differ regarding the length of follow-up duration, indicating that symptom improvements persisted beyond 12 months after treatment. The current number of studies with longer follow-up intervals is limited, and future evidence is required to confirm treatment gains beyond 12 months following treatment. Subgroup analyses on pre-specified moderators are exploratory and should be interpreted with caution as additional study-level factors may confound the results (Higgins & Green, 2011).
Note. Analyses based on 21 studies (20 for proportion female). 1 Additional subgroup analyses for PTSD indicated significantly larger treatment effects for intention-to-treat samples with higher compared to lower dropout rates (⩾20%; <20%, p < 0.01), while treatment effects remained unaffected by losses to follow-up (<40%; ⩾40%, p = 0.61). In completer samples, attrition rates had no impact on treatment outcomes (p = 0.56; p = 0.34). a p-value of Hedges' g. b p-value of Q-statistics. c p-value between groups. d Analyses include active treatments only.

Strengths and limitations
This systematic review and meta-analysis on long-term benefits of psychological treatment exceeds the number and variety of treatment studies included in former meta-analyses for PTSD (Kline et al., 2018; van Dis et al., 2020), and provides first evidence for comorbid depressive symptoms in PTSD at long-term follow-up. In addition, we performed a comprehensive literature search, adhered closely to the PRISMA recommendations, and examined prespecified moderators of lasting treatment gains. However, there are several limitations to be noted. First, publication bias could not be tested due to small databases and considerable heterogeneity (Ioannidis & Trikalinos, 2007; Rothstein et al., 2005; Sterne et al., 2006; Terrin et al., 2003). Second, several studies had small sample sizes and included an unclear or high risk of bias. We analyzed two bias domains with no impact on effect sizes, but confidence in the results may remain limited. Third, the number of comparisons at follow-up was low, and specific treatment types beyond broad categories (i.e. TFT, non-TFT) remained untested. Finally, the studies provided few and inconsistent data on additional treatment (e.g. medication) during psychological treatment, and on whether participants received any treatment during the follow-up phase. It remains unclear if benefits can be ascribed to the initial psychological treatment alone.

Conclusion
The results of this meta-analysis on psychological treatment for PTSD demonstrate large and sustained benefits after at least 12 months of follow-up. Active treatments were CBTs, and most studies examined TFT. Effect sizes of TFT were large for improved PTSD and medium for comorbid depressive symptoms from pretest to follow-up and superior to active control groups at follow-up. Non-TFT showed small to large benefits for PTSD and depressive symptoms from pretest to follow-up, respectively, but the number of available studies was scarce. Future well-designed studies are essential to determine the sustained gains from specific CBT types and other psychological treatments, and to confirm benefits beyond 12 months following initial treatment.
Note. 95% CI = 95% confidence intervals, g = Hedges' g, k = number of comparisons, PTSD = posttraumatic stress disorder, TFT = trauma-focused treatment. a p-value of Hedges' g. b p-value of Q-statistics. c p-value between groups. d Analyses include active treatments only.
Supplementary material. The supplementary material for this article can be found at https://doi.org/10.1017/S003329172100163X.
Acknowledgements. With thanks to the German PTSD treatment guideline group for providing the results from earlier literature searches, and for supporting the implementation of this work. We thank Helen Niemeyer for statistical advice. We would also like to thank all authors of the included primary studies for providing additional data for this systematic review and meta-analysis.
Author contributions. MW and BK initiated this project. MW, WH, AL, TE, and ISch were involved in the literature search. MW extracted the data, which were cross-checked by SSch. MW and WH independently performed the treatment coding, which was supervised by BK. MW and WH independently performed the quality assessment, and SSch acted as the third rater. MW performed the statistical analyses and wrote the first draft of the manuscript under close supervision of BK and SSch. JB contributed to the statistical analyses and interpretation. All authors approved the final manuscript.
Financial support. This research received no specific grant from any funding agency, commercial or not-for-profit sectors.
Conflict of interest. All authors declare no conflict of interest.