Introduction
Mental disorders impose a significant global burden, with an estimated economic impact of 4.7 trillion USD and 125.3 million disability-adjusted life years (DALYs) lost worldwide in 2019 [Reference Arias, Saxena and Verguet1, 2]. Common first-line treatments for most mental disorders include psychotherapy and pharmacotherapy [Reference Barth, Munder, Gerger, Nüesch, Trelle and Znoj3]. However, research suggests that 30%–40% of individuals do not achieve significant symptom improvement with current treatment options [Reference Barth, Munder, Gerger, Nüesch, Trelle and Znoj3–Reference Cuijpers, Noma, Karyotaki, Vinkers, Cipriani and Furukawa5]. While the combination of psychotherapy and pharmacotherapy can demonstrate greater effectiveness than treatment-as-usual or placebo [Reference Cuijpers, Quero, Noma, Ciharova, Miguel and Karyotaki6–Reference Hoskins, Bridges, Sinnerton, Nakamura, Underwood and Slater9], these approaches have notable limitations. Reported effects for pharmacotherapy, psychotherapy, and their combination are generally small to moderate, and meta-analyses frequently identify high publication bias, which may lead to an overestimation of treatment efficacy [Reference Cuijpers, Quero, Noma, Ciharova, Miguel and Karyotaki6–Reference Hoskins, Bridges, Sinnerton, Nakamura, Underwood and Slater9].
Recent research has explored the potential of psychedelics as a novel treatment for mental disorders. Preliminary evidence suggests that psychedelic-assisted psychotherapy may alleviate symptoms of various mental disorders, such as posttraumatic stress disorder (PTSD), depression, and anxiety, with generally mild and short-term adverse events (AEs) [Reference Goodwin, Aaronson, Alvarez, Arden, Baker and Bennett10, Reference Muttoni, Ardissino and John11]. Despite growing enthusiasm for psychedelic-assisted psychotherapy, the field is constrained by small, homogeneous samples and methodological challenges that limit the interpretability and generalizability of findings [Reference Muttoni, Ardissino and John11–Reference Schneier, Feusner, Wheaton, Gomez, Cornejo and Naraindas16]. A particularly underexamined issue is how placebo and non-specific effects are handled in clinical trials involving psychedelics – interventions that are deeply experiential, highly suggestible, and often embedded in intensive psychotherapeutic frameworks. Randomized controlled trials (RCTs) remain the gold standard for evaluating new treatments [Reference Hariton and Locascio17, Reference Sedgwick and Greenwood18], but they are ill-equipped to disentangle the overlapping effects of expectancy, therapeutic alliance, and psychotherapeutic context – factors that are especially potent in psychedelic studies. These trials frequently rely on active or low-dose psychedelic placebos combined with extensive psychological support, making it difficult to isolate the drug’s specific therapeutic contribution. Moreover, the powerful subjective effects of psychedelics often result in functional unblinding, further amplifying expectancy and undermining internal validity [Reference Muthukumaraswamy, Forsyth and Lumley19–Reference Wen, Singhal, Jones, Zeifman, Mehta and Shenasa21].
Unlike typical pharmacotherapy trials, where placebo groups receive inert pills with minimal interaction, control groups in psychedelic trials often receive a credible but subtherapeutic agent along with full psychotherapy protocols. This creates a placebo context rich in therapeutic elements – preparation, guided dosing, and integration – that may independently improve outcomes. Furthermore, due to the inherent difficulties in maintaining blinding in psychedelic trials, accurately estimating treatment effects is contingent upon the choice of control condition [Reference Burke and Blumberger22, Reference Aday, Heifets, Pratscher, Bradley, Rosen and Woolley23]. However, the field currently lacks a standardized or widely accepted control condition. This meta-analysis addresses that critical gap. Our objectives are: (1) to estimate between-group treatment effects on symptom change using direct meta-analyses of change scores; (2) to descriptively quantify within-group symptom changes in control groups and treatment groups; (3) to exploratorily compare within-control symptom changes across different control types (e.g., inactive vs low-dose/active placebo) where data permit, including comparisons of within-control change (and, secondarily, whether between-group effects differ by placebo type). By doing so, we aim to clarify how much improvement in psychedelic trials may be attributable to non-drug factors, a question central to interpreting efficacy and designing future studies.
Methods
This meta-analysis was prepared in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analysis (PRISMA) guidelines [Reference Page, McKenzie, Bossuyt, Boutron, Hoffmann and Mulrow24] and was registered in PROSPERO (CRD420251111853).
Search strategy
A comprehensive search of the literature was conducted on 1 July 2025 through OVID (MEDLINE, Embase, and APA PsychInfo) and PubMed to identify studies evaluating the use of psychedelics for the treatment of psychiatric disorders. The search strategy included the following psychedelics: psilocybin, lysergic acid diethylamide (LSD), dimethyltryptamine (DMT), 3,4-Methylenedioxymethamphetamine (MDMA), and ayahuasca. Psychedelics that targeted NMDA receptors, such as ketamine, were excluded due to their distinct, non-serotonergic mechanisms of action. Search terms included anxiety and related disorders, depression, bipolar disorder, schizophrenia, PTSD, and eating disorders. Results were filtered to RCTs. No language or date restrictions were applied.
Inclusion/exclusion criteria and screening
Studies underwent a two-stage screening process: first-level screening, which included an assessment based on title and abstract, followed by second-level screening, which involved a full-text assessment. Eligible studies were peer-reviewed RCTs that (1) involved participants with a diagnosed mental disorder, and (2) investigated the efficacy of psychedelics or psychedelic-assisted therapy versus a placebo control condition (active or inactive). Exclusion criteria were as follows: (1) concurrent use of any other forms of pharmacotherapy or drug treatment (e.g., antidepressants); (2) non-human studies; and (3) non-English publications. Screening was performed by two independent reviewers (RS and SM). Conflicts between the two reviewers were resolved through discussion and, when necessary, consultation with VB.
Data extraction
The following data on study design and patient demographics were extracted by two independent reviewers (RSSM): country of origin, mental disorder of patient population, diagnostic assessment method (e.g., Diagnostic and Statistical Manual of Mental Disorders [DSM] IV or V), mean age, sex (male or female), and sample size. Regarding treatment, the type of psychedelic, dose, and duration (in weeks), route of administration, type of placebo, and concurrent psychotherapy treatment (if applicable) were extracted. Outcome data comprised scores from validated, disorder-specific instruments at all reported assessment time points.
Quality assessment
Two independent reviewers (RS and SM) assessed the quality of all included studies using the Cochrane Risk of Bias tool [Reference Higgins, Altman, Gøtzsche, Jüni, Moher and Oxman25] (Supplementary Table 1).
Meta-analysis
All analyses were conducted using R software (version 4.4.0). For each outcome (depressive symptom severity, PTSD symptoms, and anxiety symptom severity), our primary efficacy analysis was a between-group meta-analysis of standardized differences in change scores (treatment vs control). When the mean change was not reported, it was calculated as endpoint mean minus baseline mean. The standard deviation (SD) of the change score was obtained using a predefined hierarchy: (1) reported SD of change; (2) derived from standard error (SE) of change; (3) derived from the 95% confidence interval (CI) of change; or, if none were available, (4) derived from baseline and endpoint SDs assuming a pre–post correlation of r = 0.5. For trials with two treatment arms sharing one control group, the control group sample size was split evenly across comparisons (rounded to the nearest integer), following Cochrane guidance for multi-arm trials. Between-group change-score meta-analyses were conducted using the meta package (version 8.0-2) with the standardized mean difference (Cohen’s d) based on group sample sizes, mean changes, and SDs of change [Reference Balduzzi, Rücker and Schwarzer30].
To contextualize symptom change within psychedelic trials, we conducted secondary within-group meta-analyses separately for treatment and control arms using standardized mean change with change-score standardization (SMCC), which quantifies pre–post changes while accounting for the paired structure of the data [Reference Burgess, Leng, Brizan and Hawe26]. These analyses were intended to describe the magnitude of symptom change occurring within each arm, capturing placebo-associated and other non-pharmacological effects in control groups, and were not designed to estimate treatment efficacy or to isolate a causal placebo effect. Effect sizes were computed using the escalc() and rma() functions from the metafor package (version 4.8-0) [Reference Viechtbauer27]. Studies that reported change-score dispersion (SD/SE/CI) were analyzed directly from the change statistics; studies with baseline/endpoint data only used the derived SD change with the same assumed r = 0.5. One study [Reference Bouso, Doblin, Farré, Alcázar and Gómez-Jarabo34] in the control group was excluded from the SMCC meta-analysis due to a very small sample size (n = 2), which led to a non-estimable effect size.
Given anticipated heterogeneity across trials, all meta-analyses used random-effects models. We also conducted sensitivity analyses varying the assumed pre–post correlation used to derive SD of change (r = 0.3, 0.7, and 0.9) to assess robustness [Reference Balk, Earley, Patel, Trikalinos and Dahabreh28–Reference Harrer, Cuijpers, Furukawa and Ebert31]. In an additional sensitivity analysis restricted to the primary correlation assumption (r = 0.5), we repeated the meta-analyses for depressive and PTSD symptoms after excluding studies rated as having a high risk of bias or some concerns of bias; this analysis was not conducted for anxiety due to an insufficient number of studies. Control conditions were classified as inactive placebo versus active/low-dose placebo, and subgroup analyses by placebo type were performed as exploratory analyses for depressive and PTSD symptoms. Because few studies contributed to each subgroup, subgroup findings should be interpreted cautiously and viewed as exploratory rather than definitive. Statistical heterogeneity was summarized using I2 and τ2. Effect sizes were interpreted following standard thresholds, where Cohen’s d or SMCC values of 0.2, 0.5, and 0.8 represent small, medium, and large effects, respectively [Reference Higgins, Thomas, Chandler, Cumpston, Li and Page32]. Heterogeneity across studies was assessed using the I2 statistic, with values classified as follows: 0%–40% (not important), 30%–60% (moderate), 50%–90% (substantial), and 75%–100% (considerable) [Reference Higgins, Thomas, Chandler, Cumpston, Li and Page32]. Potential small-study effects/publication bias were assessed using funnel plots and Egger’s regression test when ≥10 study arms were available for a meta-analysis. For between-group analyses, k reflects the number of comparisons; for within-group analyses, k reflects the number of unique arms contributing data.
Results
Search results
The initial search identified 3295 articles. Duplicates were removed prior to screening (n = 320). A total of 2975 studies underwent first-level screening, and 32 full-text articles were retrieved for second-level assessment. Finally, 14 RCTs (n = 643) met the inclusion criteria and were included in the meta-analysis (Figure 1) [Reference Back, Freeman-Young, Morgan, Sethi, Baker and Myers33–Reference Wolfson, Andries, Feduccia, Jerome, Wang and Williams46]. Characteristics of included studies are indicated in Table 1.

Figure 1. PRISMA flow diagram.
Table 1. Included the characteristics of the studies

Note: Initially 30 mg, but after two of the first three participants discontinued due to one vomiting shortly after administration and the other due to personal reasons, it was lowered to 22 mg. Abbreviations: MDD, Major depressive disorder; MDMA, 3,4-ethylenedioxymethamphetamine; PTSD, Post-traumatic stress disorder; UK, United Kingdom; USA, United States of America.
Quality assessment
Across the 14 included trials, the risk of bias was predominantly low. Ten studies were low risk overall, with low risk across all five domains. Three studies were rated as having some concerns overall: Holze et al. [Reference Holze, Gasser, Müller, Dolder and Liechti36] and Wolfson et al. [Reference Wolfson, Andries, Feduccia, Jerome, Wang and Williams46], due to some concerns regarding deviations from intended interventions, and von Rotz et al. [Reference von Rotz, Schindowski, Jungwirth, Schuldt, Rieser and Zahoranszky45] due to some concerns in outcome measurement, while all other domains in these studies were low risk. One study, Bouso et al. [Reference Bouso, Doblin, Farré, Alcázar and Gómez-Jarabo34], was assessed as high overall risk of bias, driven by high risk due to deviations from intended interventions, with additional concerns for outcome measurement and selective reporting, despite low risk for the randomization process and missing outcome data.
Meta-analysis results
Between-group comparison: Direct meta-analysis of change scores
Across 14 trials comprising 17 arm comparisons, eight trials used inactive placebos, while six used active or low-dose placebos. Figures 2–4 present the direct between-arm meta-analyses of change scores for depressive symptoms, PTSD symptoms, and anxiety symptoms. The pooled effect favored treatment across all outcomes, showing large reductions in depressive symptoms (k = 13; SMD = −0.82; 95% CI = −1.17, −0.47; I2 = 60.1%) and PTSD symptoms (k = 10; SMD = −0.89; 95% CI = −1.14, −0.65; I2 = 0%), and medium reductions in anxiety symptoms (k = 5; SMD = −0.66; 95% CI = −0.94, −0.38; I2 = 0%) compared with control. Subgroup analyses did not indicate that effects differed by placebo type for depressive symptoms (inactive k = 5 versus active k = 8; subgroup differences p = 0.47) or PTSD symptoms (inactive k = 4 versus active k = 6; subgroup differences p = 0.81). Funnel plots and Egger’s tests provided limited evidence of small-study effects for depressive symptoms (p = 0.24; Supplementary Figure 1S) and PTSD symptoms (p = 0.15; Supplementary Figure 2S). Sensitivity analyses are presented in Supplementary Table 2S; varying the assumed within-study correlation (r) from 0.3 to 0.9 changed the magnitude of pooled effects (generally becoming larger in absolute value at higher r), while the overall direction remained consistent with greater symptom reduction in the treatment condition. The second sensitivity analysis excluding studies with high or some concerns of bias showed similar results for depressive symptoms (k = 10; SMD = −0.88; 95% CI = −1.31, −0.44; I2 = 68.3%; Supplementary Figure 6aS) and PTSD symptoms (k = 9; SMD = −0.89; 95% CI = −1.14, −0.64; I2 = 0%; Supplementary Figure 7aS).

Figure 2. Forest plot of direct between-arm comparisons for depressive symptom change, subgroup by placebo type.

Figure 3. Forest plot of direct between-arm comparisons for PTSD symptom change, subgroup by placebo type.

Figure 4. Forest plot of direct between-arm comparisons for anxiety symptom change.
Within-group effects
Supplementary Figures 3S–5S present within-group meta-analyses of SMCC across outcomes in control and treatment groups. In control groups, pooled estimates indicated medium-to-large symptom reductions across outcomes, including depressive symptoms (k = 9; SMCC = −0.58; 95% CI = −0.77, −0.38; I2 = 34%), PTSD symptoms (k = 7; SMCC = −0.75; 95% CI = −1.10, −0.39; I2 = 58.2%), and anxiety symptoms (k = 4; SMCC = −0.51; 95% CI = −0.71, −0.30; I2 = 0%). Subgroup analyses by placebo type showed no differences for depressive symptoms (inactive k = 4 versus active k = 5; subgroup differences p = 0.70), while larger within-group reductions in PTSD symptoms were observed in the inactive placebo group (SMCC = −1.15 vs. −0.37; subgroup differences p < 0.001).
In treatment groups, within-group analyses showed larger symptom reductions across outcomes, including depressive symptoms (k = 13; SMCC = −1.30; 95% CI = −1.69, −0.92; I2 = 84.1%), PTSD symptoms (k = 10; SMCC = −1.47; 95% CI = −1.86, −1.08; I2 = 61.7%), and anxiety symptoms (k = 5; SMCC = −1.16; 95% CI = −1.40, −0.92; I2 = 0%). Funnel plot inspection (Supplementary Figure 3cS) suggested some asymmetry, and Egger’s test indicated evidence of small-study effects (p = 0.005). Sensitivity analyses varying the assumed within-study correlation (r = 0.3–0.9; Supplementary Tables 3aS–3bS) showed that effect magnitudes changed as expected, while the direction of within-group change remained consistent across outcomes. In a separate sensitivity analysis excluding studies with a high risk of bias or some concerns of bias (Supplementary Figures 6S–7S; conducted under the primary r = 0.5 assumption), pooled effects for depressive symptoms and PTSD symptoms remained consistent.
Discussion
Our results demonstrated that between-group meta-analyses of change scores consistently favored treatment for depressive symptoms, PTSD symptoms, and anxiety symptoms, with larger benefits for depression and PTSD and more moderate benefits for anxiety, and no meaningful differences by placebo type. Within-group analyses showed symptom reductions in control conditions but larger improvements in treatment groups; placebo type did not materially affect within-group depression changes, whereas control-group PTSD improvements appeared greater with inactive placebo. Small-study effects were limited for between-group findings but suggested in treatment-group within-group analyses, and sensitivity analyses varying the within-study correlation changed effect sizes but not the direction of effects.
The magnitude of symptom reduction observed in control groups is clinically relevant but warrants cautious interpretation. The within-control changes in the present synthesis appear more modest than the “large” placebo responses reported in some broader overviews of mental health trials [Reference Huneke, Amin, Baldwin, Bellato, Brandt and Chamberlain47], although direct comparison is limited by differences in sampled populations, concomitant interventions (including psychotherapy), follow-up intervals, and effect-size calculations. In psychedelic-assisted psychotherapy trials, both treatment and control arms typically include substantial clinical contact and structured support [Reference Back, Freeman-Young, Morgan, Sethi, Baker and Myers33–Reference Wolfson, Andries, Feduccia, Jerome, Wang and Williams46], which may contribute to symptomatic improvement independent of psychedelic exposure; accordingly, observed control-arm changes are best interpreted as reflecting overall response under the trial context rather than evidence for any single operative mechanism. A further methodological consideration is that psychedelic trials may be especially susceptible to lessebo effects, attenuated improvement when participants infer assignment to placebo or a sub-therapeutic condition, particularly in settings characterized by strong prior expectations and imperfect masking [Reference Mestre48, Reference Huneke, Fusetto Veronesi, Garner, Baldwin and Cortese49]. Additional non-specific influences may include reactivity to repeated, structured outcome assessments (e.g., trauma-focused interviews in PTSD trials) [Reference Weathers, Keane and Davidson50], regression to the mean, and natural symptom fluctuation. Because expectancy and blinding integrity were inconsistently measured and reported across included studies [Reference Huneke, Fusetto Veronesi, Garner, Baldwin and Cortese49], these potential contributors could not be evaluated quantitatively and should be treated as hypotheses rather than causal explanations.
While between-group subgroup analyses did not indicate clear differences by control type, the within-control analyses suggested that PTSD symptom reductions may be larger in trials using inactive controls than in those using active controls, whereas depressive symptoms did not show a clear pattern by control condition. These observations raise important methodological considerations. Low-dose psychedelics and other active control strategies are often used to preserve blinding, yet they may still compromise trial integrity if participants infer allocation based on the presence or absence (or intensity) of expected acute effects, potentially shaping expectancy, engagement, and therapeutic response [Reference Yanakieva, Polychroni, Family, Williams, Luke and Terhune51–Reference Cameron, Nazarian and Olson52]. This issue is directly relevant to the design of the MAPS phase 3 MDMA trials, which used an inactive placebo comparator; the investigators noted that low-dose MDMA had improved blinding in earlier studies but was not selected for phase 3 in part to better isolate drug efficacy and permit cleaner safety comparisons. More broadly, low-dose psychedelics may not be pharmacologically inert: even sub-perceptual or low doses can produce subtle subjective, cognitive, or physiological effects that could influence participant experience and expectations, complicating interpretation of treatment effects. Finally, the absence of consistent control-type differences in many analyses highlights ongoing challenges in this literature, including limited power to detect subgroup effects and the frequent lack of systematic measurement of expectancy and blinding integrity factors that are critical for improving the rigor and interpretability of psychedelic trials.
Our study has several limitations. First, due to the absence of detailed information on expectancy and blinding across studies, we were not able to evaluate the effects of these factors in our analysis. Differences in participant expectations and inadequate blinding may have biased the results, potentially leading to an overestimation of the intervention’s true effects. Second, since the included psychedelic trials involved psychotherapy sessions even in the placebo groups, the magnitude of the group response cannot be accurately estimated, as the effects of therapy are inherently included. These limitations suggest that future research should aim for more consistent reporting, larger sample sizes, better control for expectancy and blinding, and the inclusion of a therapy-alone condition or, ideally, a 2 × 2 design to enhance the validity and applicability of the findings.
In conclusion, this meta-analysis showed that control conditions in psychedelic-assisted psychotherapy trials were associated with meaningful within-group symptom reductions, with moderate-to-large improvements in depressive and PTSD symptoms and moderate improvements in anxiety. Treatment groups consistently demonstrated greater symptom improvement than controls across outcomes, but the magnitude of change observed in control groups is consistent with an important role for non-pharmacological and contextual factors, although these mechanisms could not be evaluated directly in the included trials. In addition, while between-group effects did not appear to differ by control type, the within-control findings suggested larger PTSD improvements under inactive placebo than active control conditions, underscoring how comparator selection may shape outcomes and interpretation. Collectively, these findings emphasize the need for careful design, justification, and transparent reporting of control conditions, alongside routine assessment of blinding integrity and expectancy, so that the incremental benefit attributable to psychedelics can be interpreted within the therapeutic milieu. Future research should prioritize dismantling and head-to-head comparator designs, as well as mechanistic studies, to better disentangle drug-specific effects from psychological and contextual contributors to response.
Supplementary material
The supplementary material for this article can be found at http://doi.org/10.1192/j.eurpsy.2026.10168.
Data availability statement
This study is a meta-analysis of previously published data and did not involve the collection of individual participant data. Extracted summary data, as well as the study protocol and analysis code, are available from the corresponding author upon reasonable request.
Financial support
This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.
Competing interests
Lisa Burback, Olga Winkler, and Yanbo Zhang are supported by the Academic Medicine and Health Services Program (AMHSP), a joint program funded by the University of Alberta and Alberta Health Services to ensure physicians affiliated with Alberta’s faculties of medicine are compensated for providing patient care along with their work related to research, innovation, education, administration, and leadership. Rakesh Jetly is the CMO of Mydecine Innovation Group. Muhammad I. Husain has led contracted research for COMPASS Pathfinder Limited and has provided consultancy to Mindset Pharma Inc., Psyched Therapeutics, and Wake Network. Venkat Bhat is supported by an Academic Scholar Award from the University of Toronto Department of Psychiatry and has received research funding from the Canadian Institutes of Health Research, Brain & Behavior Foundation, Ontario Ministry of Health Innovation Funds, Royal College of Physicians and Surgeons of Canada, Department of National Defense (Government of Canada), New Frontiers in Research Fund, Associated Medical Services Inc. Healthcare, American Foundation for Suicide Prevention, Roche Canada, Novartis, and Eisai. Matthew J. Burke is supported by Academic Scholar Awards from the University of Toronto and Sunnybrook Health Sciences Centre Department of Psychiatry.





Comments
No Comments have been published for this article.