Psychological treatment of perinatal depression: a meta-analysis

Background Depression during pregnancy and after the birth of a child is highly prevalent and an important public health problem. Psychological interventions are the first-line treatment and, although a considerable number of randomized trials have been conducted, no recent comprehensive meta-analysis has evaluated treatment effects. Methods We used an existing database of randomized controlled trials of psychotherapies for adult depression and included studies aimed at perinatal depression. Random effects models were used in all analyses. We examined the effects of the interventions in the short and long term, and also examined secondary outcomes. Results Forty-three studies with 49 comparisons and 6270 participants between an intervention and control group were included. The overall effect size was g = 0.67 [95% confidence interval (CI) 0.45~0.89; numbers needed-to-be-treated = 4.39] with high heterogeneity (I2 = 80%; 95% CI 75~85). This effect size remained largely unchanged and significant in a series of sensitivity analyses, although some publication bias was found. The effects remained significant at 6–12 months follow-up. Significant effects were also found for social support, anxiety, functional limitations, parental stress and marital stress, although the number of studies for each outcome was low. All results should be considered with caution because of the high levels of heterogeneity in most analyses. Conclusions Psychological interventions are probably effective in the treatment of perinatal depression, with effects that last at least up to 6–12 months and probably also have effects on social support, anxiety, functional impairment, parental stress, and marital stress.


Introduction
Depression during pregnancy and after the birth of a child is highly prevalent and results in a considerable reduction in quality of life, social functioning (Drury, Scaramella, & Zeanah, 2016), as well as parental and maternal functioning (Bernard, Nissim, Vaccaro, Harris, & Lindhiem, 2018;O'Hara & McCabe, 2013). The negative consequences of perinatal depression extend to infants and children, including diminished cognitive, emotional, behavioral, and physical outcomes (Liu et al., 2017;O'Hara & McCabe, 2013;Stein et al., 2014). Although it is not clear whether the prevalence of depression is actually more common in the postpartum period than in other periods of life (O'Hara & McCabe, 2013), one meta-analysis estimated that about one in seven women in high-income countries and one in 10 women in low-income countries are affected by perinatal depression (Woody, Ferrari, Siskind, Whiteford, & Harris, 2017). The effective treatment of perinatal depression is, therefore critical to public health.
Psychotherapy is recommended as the first-line approach for perinatal women with a new depression episode (O'Connor, Rossom, Henninger, Groom, & Burda, 2016). Importantly, women generally prefer psychotherapy due to concerns about the effects of medication on their infants (Dennis & Chung-Lee, 2006). More than 40 randomized controlled trials have tested the effects of psychological treatments of perinatal depression and several meta-analyses have evaluated overall effects (Bledsoe & Grote, 2016;Dhillon, Sparkes, & Duarte, 2017). Nevertheless, no recent meta-analysis has integrated the results of all published trials. Instead, meta-analytic evaluations have included a subset of trials across a wide range of dimensions including the type of therapy (Cluxton-Keller & Bruce, 2018;Huang, Zhao, Newby, 2019;Nair, Armfield, Chatfield, & Edirippulige, 2018;Roman et al., 2020;Stevenson et al., 2010;Zhou et al., 2020), settings (Rahman et al., 2013;Stephens, Ford, Paudyal, & Smith, 2016), and timing i.e. antenatal or postpartum period (Huang et al., 2020;Li et al., 2020a;Roman et al., 2020;van Ravesteyn, Lambregtse-van den Berg, Hoogendijk, & Kamperman, 2017). Additionally, published meta-analyses which focused on broader mental health problems in the perinatal period, such as anxiety and distress in general, resulted in unclear and incomplete effect estimates of depression treatments (Han, Guo, Ren, Duan, & Xu, 2020;Li et al., 2020b;Song, Kim, & Ahn, 2015). Some meta-analyses included non-randomized trials with the associated increase of uncertainty of effect estimates (Sockol, 2015). Finally, despite the link between maternal depression and diminished development in children, no prior meta-analyses of perinatal depression treatments have assessed whether treatment results in improved child outcomes.
With the substantial number of published clinical trials evaluating psychotherapy treatments for perinatal depression, the field has reached a critical point of having adequate statistical power to examine, not only the effect size of these treatments, but also the ability to examine secondary outcomes and potential sources of heterogeneity in subgroup analyses. A larger meta-analysis also allows to examine the relative effectiveness of different treatments, modalities, settings, and timing of treatment.
More studies also provide a more robust estimate of long-term effects, which are often understudied in psychotherapy research. In one earlier larger meta-analysis of all psychotherapies for adult depression, a relatively complete set of studies on psychotherapies in perinatal depression (N = 36) were included. However, this analysis only resulted in an overall pooled effect size (Cuijpers, Karyotaki, Reijnders, & Huibers, 2018a). No secondary outcomes, subgroup analyses or long-term effects of these studies were conducted.
The limitations of prior meta-analyses thus prompted a new comprehensive meta-analysis of randomized trials comparing psychological treatments of perinatal depression with control groups. Here we focus not only on the overall pooled effect size of treatments for depressed women, but also on secondary outcomes in mother and child, incorporating subgroup analyses to explore the potential impact of moderators, and long-term effects.

Identification and selection of studies
The protocol for this meta-analysis was registered at the Open Science Framework (Cuijpers, 2020b;Cuijpers & Karyotaki, 2020; https://osf.io/cmk4y). We used an existing database of studies on the psychological treatment of depression. This database has been described in detail elsewhere (Cuijpers & Karyotaki, 2020), and has been used in a series of earlier published meta-analyses (Cuijpers, 2017). The database is continuously updated and was developed through a comprehensive literature search (from 1966 to January 1, 2020). For this database, we searched four major bibliographical databases (PubMed, PsycINFO, Embase and the Cochrane Library) by combining terms (both index terms and text words) indicative of depression and psychotherapies, with filters for randomized controlled trials. The full search string for one database (PubMed) is given in online Supplementary Appendix A. We also searched a number of bibliographical databases to identify trials in non-Western countries (Cuijpers, Karyotaki, Reijnders, Purgato, & Barbui, 2018b), because the number of trials on psychological treatments in these countries is growing rapidly. Furthermore, we checked the references of earlier meta-analyses on psychological treatments of depression. For the current meta-analysis, we also checked the references of previous meta-analyses on the psychological treatment of perinatal depression. All records were screened by two independent researchers and all papers that could possibly meet inclusion criteria according to one of the researchers were retrieved as full-text. The decision to include or exclude a study in the database was also done by the two independent researchers, and disagreements were solved through discussion.
For the current meta-analysis, we included studies that were: (a) randomized trials (b) in which a psychological treatment (c) for perinatal depression (d) was compared with a control group (waiting list, care-as-usual, placebo, other inactive treatment). Perinatal depression was defined as depression during pregnancy (antenatal depression) and up to two years after giving birth (postpartum depression). Depression could be established with a diagnostic interview or with a score above a cut-off on a self-report measure. No language restrictions were applied.

Quality assessment and data extraction
As in our previous meta-analyses using our database of randomized trials, we assessed the validity of included studies using four criteria of the 'Risk of bias' assessment tool, developed by the Cochrane Collaboration (Higgins et al., 2011). This tool assesses possible sources of bias in randomized trials, including the adequate generation of allocation sequence; the concealment of allocation to conditions; the prevention of knowledge of the allocated intervention (masking of assessors); and dealing with incomplete outcome data (this was assessed as positive when intention-to-treat analyses were conducted, meaning that all randomized patients were included in the analyses). Assessment of the validity of the included studies was conducted by two independent researchers, and disagreements were solved through discussion.
We also coded participant characteristics (depressive disorder or scoring high on a self-rating scale; antenatal, postnatal, or mixed; the age of participants); we also rated whether studies were explicitly aimed at high-risk women or not (high risk was defined as low-income and/or ethnic minority); type of therapy (according to the framework developed previously; Cuijpers, Karyotaki, de Wit, & Ebert, 2020a;Cuijpers, van Straten, Andersson, & van Oppen, 2008) and other characteristics of the psychotherapies (treatment format; the number of sessions); and general characteristics of the studies (type of control group; the country where the study was conducted). The treatment format was coded as an individual, group or guided self-help (including internet-based guided self-help). We did not include unguided self-help, because this has been found to be less effective in the treatment of depression (Cuijpers, Noma, Karyotaki, Cipriani, & Furukawa, 2019).

Outcome measures
For each comparison between psychotherapy and a control condition, the effect size indicating the difference between the two groups at post-test was calculated (Hedges' g) (Hedges & Olkin, 1985). Effect sizes of 0.8 can be assumed to be large, while effect Psychological Medicine 2597 sizes of 0.5 are moderate, and effect sizes of 0.2 are small (Cohen, 1988). Effect sizes were calculated by subtracting (at post-test) the average score of the psychotherapy group from the average score of the control group and dividing the result by the pooled standard deviation. Because some studies had relatively small sample sizes we corrected the effect size for small sample bias (Hedges & Olkin, 1985). If means and standard deviations were not reported, we used the procedures of the Comprehensive Meta-Analysis software (see below) to calculate the effect size using dichotomous outcomes; and if these were not available either, we used other statistics (such as t value or p value) to calculate the effect size. In order to calculate effect sizes, we used all measures examining depressive symptoms (such as the Beck Depression Inventory/ BDI (Beck, Ward, Mendelson, Mock, & Erbaugh, 1961); the BDI-II (Beck, Steer, & Brown, 1996); or the Hamilton Rating Scale for Depression/HAMD-17 (Hamilton, 1960). When one study reported more than one depression outcome measure, we pooled the effect sizes within the studies before pooling across studies. We also conducted sensitivity analyses in which we used only one depression outcome measure from each study, based on an algorithm that we used in a previous meta-analysis on psychotherapies for depression, giving priority to the HAM-D-17, the BDI, the BDI-II, another clinician-rated instrument and another self-report instrument . Effect sizes were calculated using Comprehensive Meta-Analysis (version 3.3070; CMA).
We also extracted secondary outcomes, like the quality of life, social support, functional limitations and child outcomes. We checked in the included studies which secondary outcomes were included and reported, and if 5 or more studies reported on the same outcome, this outcome was included in the meta-analysis. For definitions and operationalizations of secondary outcomes, we used previous meta-analyses of our database on quality of life (Kolovos, Kleiboer, & Cuijpers, 2016), functional limitations (Renner, Cuijpers, & Huibers, 2014), social support (Park, Cuijpers, van Straten, & Reynolds, 2014), and child outcomes (Cuijpers, Weitz, Karyotaki, Garber, & Andersson, 2015).

Meta-analyses
We pooled the effect sizes using the 'meta' and 'metafor' packages in R, and ran all analyses in R studio, version 1.1.463, for Mac (the R Foundation). Because we expected considerable heterogeneity among the studies, we employed a random-effects pooling model in all analyses. We pooled the effect sizes using the inverse variance method, with the Hartung-Knapp adjustment for the random-effects model. We calculated the I 2 statistic and its 95% confidence interval (CI) to estimate heterogeneity (Higgins, Thompson, Deeks, & Altman, 2003). A value of 0% indicates no observed heterogeneity, and larger values indicate increasing heterogeneity, with 25% as low, 50% as moderate, and 75% as high heterogeneity. In addition, we calculated the prediction interval. Because the 95% CI of the effect size does not indicate how the true effects found in studies are distributed, we added the prediction interval which indicates the range in which the true effect size of 95% of all populations will fall (Borenstein, Hedges, Higgins, & Rothstein, 2009;Borenstein, Higgins, Hedges, & Rothstein, 2017).
Numbers needed-to-be-treated (NNT) for depression outcomes were calculated using the formulae provided by Furukawa (1999). The NNT is the inverse of the risk difference between the treatment and control group (Laupacis, Sackett, & Roberts, 1988), based on a binary outcome. The NNT can be estimated with a continuous outcome, especially if the proportion of a binary outcome is known. In our calculation, we conservatively set the proportion of the outcome at 19% for the control group (based on the pooled response rate of 50% reduction of symptoms across trials in psychotherapy for depression) . We did not calculate NNTs for other outcomes, because the control group's outcome rates were unknown.
We tested for publication bias by inspecting the funnel plot on primary outcome measures and by Duval and Tweedie's trim and fill procedure (Duval & Tweedie, 2000) as implemented in CMA, which yields an estimate of the effect size after the publication bias has been taken into account. We also conducted Egger's test of the intercept to quantify the bias captured by the funnel plot and to test whether it was significant. Studies were considered to be outliers when the 95% confidence interval of the effect size did not overlap with the pooled effect size.
We conducted subgroup analyses according to the mixed-effects model, in which effect sizes within subgroups are pooled according to the random-effects model and differences between subgroups are tested with a fixed-effects model. We conducted bivariate meta-regression analyses to examine the association between the effect size and continuous outcomes.
We conducted sensitivity analyses in which we included only studies with a low risk of bias, and analyses in which the alternative way of calculating effect sizes was used. We also calculated the relative risk (RR) of dropping out from the study (for any reason) in the intervention groups compared with the control groups.

Selection and inclusion of studies
After examining a total of 24 771 records (18 217 after removal of duplicates), we retrieved 2914 full-text papers for further consideration. We excluded 2871 of the retrieved papers. The PRISMA flowchart describing the inclusion process, including the reasons for exclusion, is presented in Fig. 1. A total of 43 randomized controlled trials (with 49 comparisons between psychotherapy and a control group) with 6270 participants (3158 in the treatment groups and 3112 in the control groups) met inclusion criteria for this meta-analysis.

Characteristics of included studies
A summary of key characteristics of the included studies is presented in Table 1. Eighteen of the 43 studies were aimed at pregnant women, 24 at women with postpartum depression and one was aimed at a mixed population. Eight studies were specifically aimed at high-risk women. The mean age of the women in the study populations ranged from 21.9 to 32.2 years (median 29.2; three studies did not report the mean age). In 22 studies women had to meet diagnostic criteria for a depressive disorder to participate, in 21 studies women had to score above the cut-off of a self-rating depression measure.
In the 43 studies, 49 psychological interventions were compared with a control group. Twenty-four of these were cognitive behavior therapy, seven were interpersonal psychotherapy, seven were supportive counseling, and 10 were other therapies. Twenty-four of the 49 interventions used an individual format, 15 a group format, three internet-based guided self-help, and
seven used a mixed or another format. Thirty-six interventions had between 6 and 12 sessions, seven had fewer sessions, and six had more sessions (range 2-21). A total of 33 studies used a care-as-usual as a control group, five used a waiting list and the other five studies used another control group. Thirteen studies were conducted in North America, 10 in Europe, six in Australia, six in East Asia and eight in other countries.
The risk of bias in many studies was considerable. Twenty-six of the 43 studies reported an adequate sequence generation (61%). Twenty-two reported allocation to conditions by an independent (third) party (51%). Eight studies reported using blinded outcome assessors (19%), and 30 used only self-report outcomes (70%). In 22 studies intent-to-treat analyses were conducted (51%). Fourteen of the 43 studies (33%) met all quality criteria, 15 studies (35%) met two or three of the criteria and the 14 remaining studies met no or only one criterion (33%).

Overall effects of psychological treatments of perinatal depression
The mean effect size of all 49 comparisons of psychotherapy with a control group was g = 0.67 (95% CI 0.45∼0.89), which corresponds with an NNT of 4.39. Heterogeneity was very high (I 2 = 80%; 95% CI 75∼85), and the prediction interval ranged from −0.82 to 2.16. The data of these analyses are presented in Table 2 and Fig. 2 gives the forest plot.
There were two studies in which two psychological treatments were compared with the same control group, and two more studies in which three interventions were compared with the same control group. These effect sizes are not independent of each other and may artificially reduce heterogeneity and affect the effect size. Therefore, we conducted two sensitivity analyses, one in which we only included the largest of the two effect sizes from these five studies, and another one in which we included only the smallest effect size. As can be seen in Table 2, these analyses did not point at major differences of the effect sizes or the level of heterogeneity.
After removal of six outliers (the confidence interval of the effect size did not overlap with the pooled effect size; Table 2), the effect size dropped somewhat (g = 0.56; 95%: 0.46−0.66), and heterogeneity dropped considerably, but remained moderately high (I 2 = 48; 95% CI 25−63). The 17 studies with a low risk of bias resulted in a somewhat higher effect size (g = 0.87; 95% CI 0.20∼1.54), but heterogeneity was still very high (I 2 = 87; 95% CI 81-91). We found some indications for significant   publication bias. Egger's test was not significant ( p = 0.33), but Duval and Tweedie's trim and fill procedure identified four missing studies. After adjustment for these missing studies, the effect size dropped to g = 0.53 (95% CI 0.24-0.82). The funnel plot is presented in online Supplementary Appendix B. The effect sizes found for specific outcome questionnaires ranged from g = 0.45 (95% CI 0.30-0.59) for the EPDS to g = 1.14 (95% CI 0.08-2.20) for the BDI-II. The mean differences between treatment and control groups, indicating the exact number of points on the scales, were 2.15 for the EPDS, 4.82 for the HAMD, 7.04 for the BDI-II and 8.32 for the BDI.

Subgroup analyses, long-term outcomes and acceptability
We conducted a series of subgroup analyses (Table 2). We did not find a significant difference between the effect sizes found for period (antenatal, postnatal, mixed), high-risk group v. no high-risk group, type of therapy, country where the study was conducted or higher v. lower risk of bias. We did find that studies in which participants had to meet diagnostic criteria for a depressive disorder had larger effect sizes than studies in which participants scored above a cut-off on a self-rating scale ( p < 0.05). We also found a differential effect size for different treatment formats, where a mixed-format had the lowest effects ( p = 0.02), and we found that type of control group was associated with the effect size, with waiting lists resulting in the largest effect sizes. Heterogeneity remained high in all subgroups with a substantial number of studies.
Fourteen studies reported outcomes at 6 months or longer follow-up. The results are summarized in Table 2. When we took from each study the effect size that was closest to 12 months follow-up, the pooled effect size was g = 0.40 (95% CI 0.17-0.63), with high heterogeneity (I 2 = 76; 95% CI 60-86). When we limited that to the follow-up measures between 6 and 12 months, the effect size was very comparable (g = 0.41). Four studies reported effect sizes at follow-up between 13 and 24 months, but this was not significant. Two studies reported effect sizes longer than 24 months, but we did not pool these. The sensitivity analyses with long-term outcomes were very comparable to the main results (Table 2).
We could calculate study drop-out for psychotherapy and control conditions in 43 comparisons, but we found no indication for a differential drop-out rate in treatment or control groups (RR = 1.11; 95% CI 0.86-1.42; p = 0.41).

Effects of psychotherapies for perinatal depression on other outcomes
The effects of psychotherapies for perinatal depression on other outcomes are reported in Table 3. Forest plots and funnel plots for the secondary outcomes are given in online Supplementary Appendix C. We found significant effects of treatment on social support (g = 0.41; 95% CI 0.10-0.72). However, after adjustment for publication bias, these results were no longer significant, and the studies with low-risk bias also did not show positive effects.
We also found significant effects on anxiety (g = 0.71; 95% CI 0.08-1.34), which remained significant when outliers were excluded and when only studies with low risk of bias were included. However, after adjustment for publication bias only a moderate, non-significant effect size remained.
We also found significant effects for functional impairment, parental stress and marital stress, also after adjustment for publication bias. Unfortunately, there were not enough studies with a low risk of bias to validate the robustness of these results in highquality research.
No significant effects were found on the weight and height of the children.

Discussion
We conducted a comprehensive meta-analysis of all randomized trials examining the effects of psychological interventions for perinatal depression compared to control conditions and also included important secondary outcomes, outcomes at follow-up, as well as child outcomes. We found that the effects of the interventions on depression were moderate to large, and they remained significant after a series of sensitivity analyses. There was some publication bias, but after adjustment for that, the effects were somewhat smaller, but still significant. These effects remained significant at one-year follow-up, although the number of studies with a low risk of bias was small at follow-up, so these results should be considered with caution.
One major problem was the very high level of heterogeneity in most analyses. This means that the effect sizes of the included studies do not point in the same direction. We found several important outliers with effect sizes that are so large that they lack credibility. But even after excluding these extreme outliers, the level of heterogeneity remained considerable. Subgroup analyses pointed at some significant differences between samples of studies. It is known from other meta-analytic research that the Abbreviations: CBT, cognitive behavior therapy; CI, confidence interval; ES, effect size; FU, follow-up; IPT, interpersonal psychotherapy; N comp , number of comparisons; NNT, numbers-needed-to-be-treated. a According to a random effects model. b Outliers were: Jiang et al., 2014;Lenze & Potts, 2017;Lund et al., 2020;Spinelli et al., 2012;Zemestani et al., 2016. c The 95% CI of I2 and the prediction interval cannot be calculated when the number of studies is less than 3. d Hou et al., 2014. type of control group is associated with differential effect sizes (Cuijpers, Quero, Papola, Cristea, & Karyotaki, 2021), and that was confirmed in this study. But we also found that treatment format may be related to the effect size, which is not confirmed in other meta-analytic research (Cuijpers et al., 2019). The same is true for the differential effects of studies in women with a diagnosed mood disorder compared to a score above the cut-off of a self-report depression measure. This has also not been confirmed in other meta-analytic research (Cuijpers, Karyotaki, Reijnders, & Huibers, 2018a). However, the number of studies in subgroups was relatively small, and may very well be chance findings, because moderator analyses need large sample sizes, especially when heterogeneity is high (Hempel et al., 2013). Because the p values are not that convincing, and because of potential confounding, these results should be considered with caution. Apart from the effects on depression, we found indications that the interventions also had positive effects on social support, anxiety, functional limitations, parental stress and marital stress. This is encouraging and is in line with a growing body of research showing that psychological interventions affect not only depression, but also have positive effects on a range of secondary outcomes (Cuijpers, 2020a(Cuijpers, , 2020b. These results should, however, also be considered with caution because of the risk for publication bias and because each of the outcomes was only examined in relatively small samples of studies, most of which did not meet the criteria for high quality.
We could include a considerable number of studies in highrisk women, with low incomes or from minority groups. It was encouraging to find that the effects of the interventions did not differ for studies in these groups compared to the effects found in other groups.
The results of this meta-analysis are in line with the results of previous meta-analyses, which mostly focused on subsamples of studies, and resulted in very uncertain outcomes. The current meta-analysis indicated more robust and precise outcomes, although the results remained problematic because of the high level of heterogeneity, publication bias, and the low quality of many included trials. Only low risk of bias 2 0.08 −0.04 to -0.19 x x a According to a random-effects model. b This sample of studies did not include studies with a low risk of bias. c The two outliers in the main analyses were excluded (Hsu et al., 2009;Tsai et al., 2008). d The 95% CI of I 2 and the prediction interval cannot be calculated when the number of studies is less than 3 Underlined values are significant (p<0.05) This study has important implications. First of all, it confirms that psychological treatments should be first-line treatments of perinatal depression. These treatments are effective, also in the longer term. Furthermore, they do not only affect depression, but also important secondary outcomes, such as anxiety, social support, functional impairment, parental stress and marital stress. However, the high heterogeneity also indicates that the effect of treatments vary considerably across studies and it is not clear what the causes are of these differences. This means that more and better research is needed to examine who benefits from which treatment under which conditions. One of the causes of the heterogeneity is certainly the fact that many different outcome measures are used and standardization of outcome measures across studies is certainly recommended.
The strengths of the current study include the incorporation of trials spanning the entire perinatal period, the evaluation of other outcomes than depression alone, child outcomes, follow-up outcomes, and state-of-the-art meta-analytic approaches. However, there are also several important limitations that have to be mentioned. We already mentioned the high level of heterogeneity, publication bias, and the low quality of many included trials. In addition to that, it is important to mention that the secondary outcomes were not very consistent across trials, and it would be good if researchers would find some consensus on what secondary outcomes are relevant and should be included in trials. We also included only a limited number of studies reporting longerterm outcomes. The effects in the longer term are highly relevant from a clinical and public health perspective.
Despite these limitations, we can conclude that psychological interventions are effective in the treatment of perinatal depression, with effects that last at least up to 6-12 months and probably also has effects on social support, anxiety, functional impairment, parental stress and marital stress. Conflict of interests. The authors report no conflict of interest.