Auditory verbal hallucinations (AVHs), the experience of hearing voices without an external source, Reference Beck and Rector1 can occur across a wide range of psychiatric disorders, including schizophrenia spectrum disorders, affective disorders, borderline personality disorder, post-traumatic stress disorder, as well as in non-clinical populations. Reference Maijer, Begemann, Palmen, Leucht and Sommer2 Despite their cross-diagnostic presence, AVHs are most commonly associated with schizophrenia spectrum disorders, where approximately 70% of patients experience them at some point during their lives. Reference McCarthy-Jones, Trauer, Mackinnon, Sims, Thomas and Copolov3 Phenomenologically, AVHs are highly heterogeneous, varying in content, emotional valence and perceived agency. Reference Woods, Jones, Alderson-Day, Callard and Fernyhough4 They may be personified or non-personified, and experienced as persecutory, abusive, obscene, derogatory, threatening or critical, but also potentially helpful, affirming or inspirational. Reference McCarthy-Jones, Trauer, Mackinnon, Sims, Thomas and Copolov3
Although not inherently pathological, AVHs can become clinically relevant when they are experienced as intrusive, uncontrollable or malevolent. Reference Larøi, Bless, Laloyaux, Kråkvik, Vedul-Kjelsås and Kalhovde5 In such cases, AVHs are often associated with heightened distress, functional impairment and increased psychopathology. Reference Toh, Thomas, Hollander and Rossell6
In the treatment of schizophrenia spectrum disorders, guidelines emphasise a multidisciplinary approach, with antipsychotic medication as a central component. Reference Hasan, Falkai and Lehmann7–Reference Kuipers, Yesufu-Udechuku, Taylor and Kendall9 Although antipsychotics are effective for a substantial proportion of patients, Reference Samara, Nikolakopoulou, Salanti and Leucht10 approximately 20–35% do not experience clinically meaningful improvement. Reference Diniz, Fonseca and Rocha11–Reference Siskind, Orr and Sinha13 Limitations are further underscored by high relapse rates upon medication discontinuation, Reference Bogers, Hambarian, Walburgh Schmidt, Vermeulen and Haan14,Reference Zipursky, Menezes and Streiner15 and the risk of burdensome side-effects. Reference Leucht, Priller and Davis16 Notably, around 30% of treatment-resistant symptoms involve persistent AVHs, Reference Goghari, Harrow, Grossman and Rosen17–Reference Nathou, Etard and Dollfus19 highlighting the urgent need for additional, targeted interventions for individuals who continue to experience significant voice-related distress despite standard pharmacological care.
In addition, psychological interventions such as cognitive–behavioural therapy (CBT) are recommended. Reference Hasan, Falkai and Lehmann7–Reference Kuipers, Yesufu-Udechuku, Taylor and Kendall9 CBT has demonstrated small effects on psychotic symptoms in numerous domains, including overall positive Reference Bighelli, Salanti, Huhn, Schneider-Thoma, Krause and Reitmeir20 and AVH-related symptoms. Reference Turner, Burger, Smit, Valmaggia and Gaag21 However, CBT shows relevant limitations including a large number of sessions required for effectiveness. Reference Lincoln, Jung, Wiesjahn and Schlier22 More recently, a shift toward symptom-specific approaches has gained momentum, allowing for more personalised treatment strategies. Reference Freeman23 For AVHs, relational therapies have emerged as a promising line of intervention, drawing on the person-like qualities of voices and conceptualising AVHs as embedded within a dynamic, relationship-like framework. Reference Smailes, Alderson-Day, Fernyhough, McCarthy-Jones and Dodgson24 Among these, the therapy intervention Audio Visual Assisted Therapy Aid for Refractory Auditory Hallucinations (AVATAR) represents a novel therapeutic development explicitly aimed at improving outcomes for individuals experiencing distressing and persistent voices. Reference Leff, Williams, Huckvale, Arbuthnot and Leff25
Initially developed by Julian Leff in 2008, Reference Craig, Rus-Calafell, Ward, Leff, Huckvale and Howarth26 AVATAR therapy enables real-time interactions with a digital representation of a person’s most dominant AVH. The approach involves the creation of a patient-designed digital avatar displayed either on a screen or through a virtual reality headset. A clinician then animates the avatar by voicing it according to the patient’s descriptions, and the avatar’s facial and head movements are synchronised to simulate natural speech. This simulated ‘face-to-face’ interaction with the avatar functions as a form of exposure to anxiety-provoking stimuli, Reference Rus-Calafell, Ward, Zhang, Edwards, Garety and Craig27 with the therapeutic aim of gradually increasing the individual’s sense of control, reducing fear-based appraisals and altering the relational dynamics with the voice. By enabling patients to assert themselves and challenge previously threatening voices, AVATAR therapy may reduce the distress associated with AVHs. Reference Rus-Calafell, Ehrbar, Ward, Edwards, Huckvale and Walke28 In line with this, maladaptive appraisals of voices, such as beliefs about omnipotence or malevolence, have shown associations with voice-related distress, whereas more positive interpretations were modestly associated with reduced distress. Reference Tsang, Bucci and Branitsky29
Since its creation, AVATAR therapy has been evaluated in several clinical trials. Reference Leff, Williams, Huckvale, Arbuthnot and Leff25 Previous reviews have examined AVATAR therapy among virtual reality-based treatments in mental disorders or positive symptoms of schizophrenia spectrum disorders, Reference Spark, Pot-Kolder, Dzafic, Nelson, Byrne and Lum30,Reference Zeka, Clemmensen, Valmaggia, Veling, Hjorthøj and Glenthøj31 but they have not provided a comprehensive perspective of AVATAR therapy for AVHs beyond its role within the broader context of virtual reality-based treatments. Two previous systematic reviews and meta-analyses have examined AVATAR trials and have reported promising effects on AVH-related symptoms. Reference Aali, Kariotis and Shokraneh32,Reference Hsu, Tseng, Hsu, Yang, Changchien and Lin33 However, these were limited by the inclusion of few studies in the meta-analysis, by analysis of short follow-ups and by the lack of secondary aspects such as tolerability and acceptability of the treatment. These aspects are especially relevant for clinical decision-makers and guideline developers in evaluating the real-world utility of emerging therapies. Moreover, further high-quality clinical trials may have since been published, revealing the necessity for an updated review. In light of these limitations, we aimed to comprehensively assess the efficacy of AVATAR-based interventions for AVHs, including both clinical and functional outcomes, as well as their tolerability, acceptability and the overall quality of available evidence.
Method
This systematic review and meta-analysis was preregistered at PROSPERO on 15 March 2025 (identifier CRD420251005545) and followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) reporting guidelines (see Supplementary Tables 1 and 2 available at https://doi.org/10.1192/bjo.2026.11014). Reference Page, McKenzie, Bossuyt, Boutron, Hoffmann and Mulrow34
Search strategy
On 21 March 2025, the publication databases Embase, PubMed and PsycINFO, as well as the grey literature and trial registration databases Web of Science Core Collection, CENTRAL, ClinicalTrials.gov and ISRCTN, were searched for relevant studies, to minimise the potential of publication bias. Indexing and general terms were applied in the search (see Supplementary Table 3). Reference Bramer, De Jonge, Rethlefsen, Mast and Kleijnen35 References of prior reviews and included articles were checked for additional studies, and experts were contacted for knowledge of further suitable trials. Duplicate removal and record screening were performed in the systematic review program Rayyan. Reference Ouzzani, Hammady, Fedorowicz and Elmagarmid36 F.O. and S.H. independently screened titles and abstracts for full-text screening eligibility with hierarchical criteria (see Supplementary Table 4), Reference Polanin, Pigott, Espelage and Grotpeter37 before independently completing the full-text screening in a spreadsheet. Corresponding authors of trial registrations and missing data were contacted weekly over the period of 4 weeks to request missing information. For disagreements during screenings, a third reviewer (K.B.) was consulted for discussion. Interrater reliability in the metric of Cohen’s κ was calculated using the DeltaMAN package for both screening stages. Reference Cohen38,Reference Maldonado, Marzo and Andrés39
Inclusion criteria
To be included, studies had to be published or unpublished, randomised controlled trials or controlled trials, investigating the efficacy of AVATAR or equivalent therapy in treating samples of participants aged ≥16 years who reported AVHs and were diagnosed with a schizophrenia spectrum disorder in >50% of cases, compared to any active or passive control group. Finally, studies had to report one of the following outcomes of interest.
The primary outcome of this study was the between-group difference of severity of voice symptoms at post-treatment measured with the Psychotic Symptom Rating Scales for Auditory Hallucinations (PSYRATS-AH Reference Haddock, McCarron, Tarrier and Faragher40 ). Secondary outcomes included further voice-related (frequency and distress in the PSYRATS-AH), clinical (positive, negative, total psychotic, depressive and anxiety symptoms) and functional outcomes (social functioning and quality of life), as well as their follow-up assessments. We also indexed acceptability by examining within- and between-group drop-out rates, and tolerability from reports of treatment-related adverse events and symptom exacerbations. For multiple non-AVH measurements per study and outcome, the more primary measurement was included into the quantitative analysis (per study definition or earlier reference as an outcome). Follow-up measurements were categorised as short (12–23 weeks), medium (24–51 weeks) and long term (≥52 weeks) as a pragmatic approach reflecting common follow-up durations selected in meta-analyses of psychological intervention trials in schizophrenia spectrum disorders. Reference Guaiana, Abbatecola, Aali, Tarantino, Ebuenyi and Lucarini41,Reference Mayer, Corcoran, Kennedy, Leucht and Bighelli42
We examined both treatment and study drop-out. Treatment drop-out was identified by any participant who did not complete the treatment post-randomisation. Reference Cooper and Conklin43 The criteria for having finished a treatment followed that of each study. In contrast, study drop-out was characterised by participants lost to post-treatment assessments regardless of reason.
Data extraction, synthesis and effect sizes
Data was independently extracted by F.O. and S.H., and cross-checked by D.A. The data extraction started on 2 April 2025. Meta-analysis was performed when at least two effect sizes could be pooled per analysis. Although AVATAR therapy approaches are similar, we anticipated differences in control groups, leading us to the use of random-effects models, which allow for heterogeneous treatment effects. Reference Riley, Higgins and Deeks44
The metric of Hedges’ g was used to report the standardised mean difference for continuous outcomes and aggregate outcomes of overlapping constructs using different scales. Reference Hedges45 Negative effect sizes described outcomes with a lower mean score in the AVATAR groups relative to the control groups, with positive values indicating the opposite. Values of 0.2, 0.5 and 0.8 were considered the thresholds for low, medium and large effect sizes, respectively. Reference Cohen46 Proportion and risk ratio were calculated for within- and between-group drop-out effect sizes. Drop-out proportions were calculated with generalised linear-mixed models. Reference Lin and Chu47 Risk ratios >1 portrayed an increased risk in AVATAR relative to control groups. To adjust for zero-case studies, 0.5 cases of drop-out were added to each study’s risk table containing at least one instance of zero drop-outs in either group. Reference Weber, Knapp, Ickstadt, Kundt and Glass48 For three-arm studies, the sample sizes of the control groups were halved to include both AVATAR groups, per the suggestions in the Cochrane Handbook. Reference Higgins, Eldridge, Li, Higgins, Thomas, Chandler, Cumpston, Li and Page49
The presence of heterogeneity was tested with Cochran’s Q-test and interpreted as Higgins’ I 2 -statistic. Reference Cochran50,Reference Higgins51 Percentages of 25%, 50% and 75% were considered low, moderate and high levels of heterogeneity, respectively. All analyses were performed in R version 2025.05.1+513 in RStudio for macOS, using the metafor and Tidyverse packages, and were visualised with forest plots. 52–Reference Wickham, Averick and Bryan54 To exploratively assess the robustness of effects on AVH symptoms, subgroup analyses of studies with low and high risks of bias were performed. Additionally, for AVH symptoms, analyses by control group (treatment as usual versus treatment as usual plus waitlist versus active control group) were performed with subgroup analyses and mixed-effects meta-regression. Results with significant heterogeneity were examined with jackknife analyses. Reference Viechtbauer and Cheung55 An α-level of 0.05 was considered for all analyses.
Bias assessments, publication bias and quality of evidence
F.O. and S.H. independently assessed risk of bias with the Revised Cochrane Risk-of-Bias Tool for Randomized Trials (RoB-2 Reference Sterne, Savović, Page, Elbers, Blencowe and Boutron56 ), and differences were discussed with a third reviewer, K.B. Bias was rated under consideration of the primary outcome of AVH severity. The risk of bias was visualised using the robvis app. Reference McGuinness and Higgins57 Publication bias may occur when articles remain unpublished because of unwanted study results. Reference Dickersin and Min58 For main analyses, funnel plots were plotted for its assessment and asymmetry was tested with Egger’s tests. Reference Egger, Davey Smith, Schneider and Minder59 Asymmetric analyses were eligible to be corrected with the trim-and-fill procedure. Reference Duval and Tweedie60 F.O. characterised the certainty of meta-analytic evidence according to the Grading of Recommendations, Assessment, Development and Evaluation (GRADE) criteria, using GRADEpro online software. Reference Guyatt, Oxman and Vist61,62
Results
Study selection
The literature search identified 556 records (see the PRISMA flow diagram in Supplementary Fig. 1). After duplicate removal and dual screenings, a total of seven peer-reviewed articles and one letter to the editor were included. We contacted authors of trial registrations and conference abstracts, although no ongoing trials were able to provide includable information concerning relevant outcomes for this study. Abstract and full-text screening exclusion reasons can be viewed in Supplementary Tables 4 and 5. The original AVATAR trial included a small number of participants under 16 years of age, Reference Leff, Williams, Huckvale, Arbuthnot and Leff25 formally conflicting with our inclusion criteria. Given this small proportion and comparable mean age and standard deviation to other included samples, the study was retained. The title and abstract as well as full-text screening interrater reliabilities revealed Cohen’s κ = 0.95 and 0.93, respectively, which can be considered satisfactory. Reference McHugh63
Study and sample characteristics
The eight randomised and controlled studies included nine relevant comparisons (Garety et al Reference Garety, Edwards and Jafari64 included two AVATAR groups) and n = 554 participants in the AVATAR therapy groups and n = 424 in the control groups. Of note is that three studies were partial crossover trials, Reference Leff, Williams, Huckvale, Arbuthnot and Leff25,Reference Percie du Sert, Potvin, Lipp, Dellazizzo, Laurelli and Breton65,Reference Stefaniak, Sorokosz, Janicki and Wciórka66 with only the period before crossover included in the analyses. The post-treatment measurement varied from 7 to 16 weeks (mean 9.88 weeks). Each study that performed a follow-up assessment did so at 12–13-week follow-up, and one study additionally assessed at 24- and 52-week follow-ups. Reference Dellazizzo, Potvin, Phraxayavong and Dumais67 All participants received treatment as usual, typically including antipsychotic medication. Control groups consisted of treatment as usual alone, Reference Garety, Edwards and Jafari64 treatment as usual plus waitlist, Reference Leff, Williams, Huckvale, Arbuthnot and Leff25,Reference Percie du Sert, Potvin, Lipp, Dellazizzo, Laurelli and Breton65,Reference Stefaniak, Sorokosz, Janicki and Wciórka66 or treatment as usual with an additional active control condition (CBT, Reference Dellazizzo, Potvin, Phraxayavong and Dumais67,Reference Liang, Li and Guo68 supportive counselling, Reference Craig, Rus-Calafell, Ward, Leff, Huckvale and Howarth26 enhanced treatment as usual with supportive counselling sessions Reference Smith, Vernal, Mariegaard, Christensen, Jansen and Schytte69 ). The studies with an additional active control condition were categorised as active control groups in the subgroup analysis. The range of treatment durations was 6 to 9 weeks or sessions except for AVATAR-EXT, Reference Garety, Edwards and Jafari64 an extension of AVATAR in 12 weekly sessions. Samples consisted of persons with schizophrenia spectrum disorders reporting treatment-resistant, chronic or persisting AVHs, although three studies included small subsamples of other mental disorders. Reference Craig, Rus-Calafell, Ward, Leff, Huckvale and Howarth26,Reference Garety, Edwards and Jafari64,Reference Stefaniak, Sorokosz, Janicki and Wciórka66 Demographic and clinical outcomes, as well as specific study characteristics, can be found in Table 1. Additional information (protocols, criteria, settings, diagnosis distributions, participant ages and funding) is provided in Supplementary Table 6, whereas Supplementary Table 7 describes additional information received from study authors.
Table 1 Study characteristics

N, sample size; T0, baseline; AVATAR, Audio Visual Assisted Therapy Aid for Refractory Auditory Hallucinations; TAU, treatment as usual; AVH, auditory verbal hallucinations; T1, post-treatment; T2, follow-up; RCT, randomised controlled trial; RPCT, randomised partial crossover trial.
Sample sizes given at baseline.
AVATAR therapy protocols
The treatment protocols showed differences. Treatment began with the creation of a digital avatar, except in Stefaniak et al, Reference Stefaniak, Sorokosz, Janicki and Wciórka66 which used a standardised avatar. Although some protocols enabled viewing the avatar via a virtual reality headset, Reference Percie du Sert, Potvin, Lipp, Dellazizzo, Laurelli and Breton65,Reference Dellazizzo, Potvin, Phraxayavong and Dumais67–Reference Smith, Vernal, Mariegaard, Christensen, Jansen and Schytte69 others presented the avatar on a computer screen, Reference Leff, Williams, Huckvale, Arbuthnot and Leff25,Reference Craig, Rus-Calafell, Ward, Leff, Huckvale and Howarth26,Reference Garety, Edwards and Jafari64,Reference Stefaniak, Sorokosz, Janicki and Wciórka66 and one used full-face and body motion capture of the therapist to increase the feeling of presence and immersion. Reference Liang, Li and Guo68 The participants usually interacted with an avatar voiced by the therapist in real time, although in Stefaniak et al, Reference Stefaniak, Sorokosz, Janicki and Wciórka66 a pre-programmed avatar was used, which the therapist and participant both interacted with.
Post-creation sessions included a therapeutic session preparation phase, followed by exposure to distressing utterances of the avatar, during which the participant was encouraged respond assertively. Reference Craig, Rus-Calafell, Ward, Leff, Huckvale and Howarth26 As the treatment progressed, the avatar’s verbalisations gradually became less abusive and more supportive to reflect the participants’ increase in control. Later sessions were designated for future outlook and relapse prevention. As an extension of this framework, AVATAR-EXT provided six additional sessions, where participants discussed the trauma, marginalisation and social exclusion background of their AVHs with the therapist before avatar exposure. Reference Garety, Edwards and Jafari64 Half of the studies provided recordings of the sessions, which could be listened to as homework to increase transferral to daily life. Reference Leff, Williams, Huckvale, Arbuthnot and Leff25,Reference Craig, Rus-Calafell, Ward, Leff, Huckvale and Howarth26,Reference Garety, Edwards and Jafari64,Reference Liang, Li and Guo68
Risk of bias
The risk-of-bias assessments are displayed in Fig. 1, and the results of the bias assessment questions can be found in Supplementary Table 8. Overall, two studies were rated with low risk of bias, Reference Garety, Edwards and Jafari64,Reference Smith, Vernal, Mariegaard, Christensen, Jansen and Schytte69 and the remaining six studies high overall risk of bias. Reference Leff, Williams, Huckvale, Arbuthnot and Leff25,Reference Craig, Rus-Calafell, Ward, Leff, Huckvale and Howarth26,Reference Percie du Sert, Potvin, Lipp, Dellazizzo, Laurelli and Breton65–Reference Liang, Li and Guo68 Importantly, the two studies rated as having low risk of bias accounted for the majority (63%) of participants.

Fig. 1 Risk-of-bias assessments. Ext., extended; Brf., brief.
Meta-analysis and systematic review results
Although control groups varied across studies, a main analysis of all studies was performed. Treatments for schizophrenia spectrum disorders differ depending on the setting, availability and severity of symptoms. Reference Burgess-Barr, Nicholas, Venus, Singh, Nethercott and Taylor70–Reference McDonagh, Dana, Selph, Devine, Cantor and Bougatsos72 Consequently, this approach provided greater external validity, but may have induced heterogeneity. Subgroup analyses by specific control group type were conducted to complement this approach. Forest plots of all secondary outcomes can be found in Supplementary Figs 2–6, the assessments of the certainty of evidence according to GRADE criteria can be found in Supplementary Table 9 and information on which scales were included can be found in Supplementary Table 10.
Primary and secondary AVH outcomes: AVH severity, frequency and distress
Table 2 presents the meta-analytic results for voice-related outcomes alongside the corresponding certainty of evidence, and Table 3 shows subgroup and moderation analyses by subgroup and bias. Each study assessed AVH severity, frequency and distress at pre- and post-treatment. Short-term 12- to 13-week follow-up assessments were performed in all but three studies. Reference Leff, Williams, Huckvale, Arbuthnot and Leff25,Reference Percie du Sert, Potvin, Lipp, Dellazizzo, Laurelli and Breton65,Reference Stefaniak, Sorokosz, Janicki and Wciórka66 The pooled effect size on AVH severity at post-treatment was small, homogeneous and favoured AVATAR (Hedges’ g = −0.40, 95% CI −0.54 to −0.25, p < 0.001). This effect was moderated by control group (p = 0.01), although each subgroup favoured AVATAR. At short-term follow-up, the effect size remained small (Hedges’ g = −0.25, 95% CI −0.40 to −0.10, p < 0.001) and homogeneous. This effect was not moderated by control group (p = 0.82), but only the active control subgroup significantly favoured AVATAR. Medium- and long-term effect sizes were derived from only one study, and were small and non-significant. Studies rated with a low risk of bias revealed similar results at both post-treatment and short-term follow-up. Forest plots for AVH severity are displayed in Fig. 2. The certainty of evidence was rated moderate for post-treatment and low for the short-term-follow-up, whereas medium- and long-term follow-ups were rated very low.
Table 2 Meta-analytic results for auditory verbal hallucination-related outcomes

N, sample size; I 2, heterogeneity statistic; GRADE, Grading of Recommendations, Assessment, Development and Evaluation; AVH, auditory verbal hallucinations; T1, post-treatment; T2, follow-up at 12 weeks; T3, follow-up at 24 weeks; T4, follow-up at 52 weeks.
Negative effect sizes described lower scores in intervention groups.
a. GRADE ratings 73 : High: a lot of confidence that the true effect lies close to that of the estimated effect. Moderate: moderate confidence in the estimated effect. The true effect is likely to be close to the estimated effect, but there is a possibility that it is substantially different. Low: limited confidence in the estimated effect. The true effect might be substantially different from the estimated effect. Very low: very little confidence in the estimated effect. The true effect is likely to be substantially different from the estimated effect.
*p < 0.05; **p < 0.01; ***p < 0.001.
Table 3 Meta-analytic subgroup and moderator results for auditory verbal hallucination-related outcomes

N , sample size; I 2, heterogeneity statistic; AVH, auditory verbal hallucinations; T1, post-treatment; TAU, treatment as usual; T2, follow-up at 12–23 weeks; T3, follow-up at 24–51 weeks; T4, follow-up at 52 weeks.
Negative effect sizes described lower scores in intervention groups.
*p < 0.05; **p < 0.01; ***p < 0.001.

Fig. 2 Forest plot showing meta-analytic results for AVH severity at post-treatment and follow-up. Negative SMDs portray smaller means in AVATAR compared with control groups. ⊕, Low bias; ⊖, high bias; AVATAR, Audio Visual Assisted Therapy Aid for Refractory Auditory Hallucinations; AVH, auditory verbal hallucination; Brf., brief; CBT, cognitive–behavioural therapy; Ext., extended; I 2, Higgins’ heterogeneity statistic; k, number of comparisons; n, sample size; Q, Cochran’s Q-statistic; Qp, Cochran’s Q p-value; SMD, standardised mean difference in Hedges’ g; TAU, treatment as usual. Reference Stefaniak, Sorokosz, Janicki and Wciórka66
For AVH frequency, the pooled effect size at post-treatment was small, favoured AVATAR (Hedges’ g = −0.38, 95% CI −0.52 to −0.24, p < 0.001) and was homogeneous. Of the control group subgroup analyses, all but the AVATAR versus the treatment as usual plus waitlist group favoured AVATAR. The effect size at short-term follow-up remained small (Hedges’ g = −0.34, 95% CI −0.49 to −0.19, p < 0.001), significantly favoured AVATAR across subgroups and was homogeneous. Medium- and long-term follow-up effect sizes were derived from one study and were small and non-significant. Studies rated with a low risk of bias revealed similar results at both time points. The certainty of evidence was rated high for post-treatment and low for short term follow-up, whereas medium- and long-term follow-ups were rated very low.
For AVH distress, the pooled effect size at post-treatment was medium, significant (Hedges’ g = −0.32, 95% CI −0.46 to –0.18, p < 0.001) and homogeneous. This effect was moderated by control group (p = 0.02), but each of the subgroups significantly favoured AVATAR. The pooled small effect size at short-term follow-up favoured AVATAR (Hedges’ g = −0.20, 95% CI −0.35 to −0.06, p = 0.007) and was homogeneous. Only the active control subgroup favoured AVATAR at short-term follow-up. Medium- and long-term follow-up effect sizes were derived from one study and were negligible to small and non-significant. Studies rated with a low risk of bias revealed similar results at post-treatment, but were non-significant at short-term follow-up. The certainty of evidence was rated low for post-treatment and very low for short-term follow-up, whereas medium- and long-term follow-ups were rated very low.
Secondary outcomes: clinical and functional outcomes
Table 4 presents the results of meta-analyses performed for secondary clinical outcomes. PSYRATS subscales measured by one study Reference Garety, Edwards and Jafari64 were aggregated according to the recommendations of Borenstein, Reference Borenstein, Hedges, Higgins and Rothstein74 assuming a correlation of 0.34 between delusions and hallucinations. Reference Smith, Mar and Turoff75 For positive symptoms, small and negligible effect sizes significantly favoured AVATAR at post-treatment and short-term follow-up, respectively. In contrast, effect sizes for total psychotic and negative symptoms did not significantly favour either group at either time point. Small effect sizes significantly favoured AVATAR for anxiety and depressive symptoms at post-treatment, but remained significant only for anxiety symptoms at short-term follow-up. No significant between-group effects were observed for quality of life at either time point. Social functioning was measured by only one study, Reference Smith, Vernal, Mariegaard, Christensen, Jansen and Schytte69 and was therefore not meta-analytically aggregated. The removal of a singular outlier Reference Dellazizzo, Potvin, Phraxayavong and Dumais67 at post-treatment, specifically for quality of life, reduced heterogeneity without altering the conclusion of the statistical tests. Medium- and long-term follow-up assessments were derived from one study and were non-significant for each outcome.
Table 4 Meta-analytic results for clinical and functional outcomes

I 2 , heterogeneity statistic; T1, post-treatment; T2, follow-up at 12 weeks; T3, follow-up at 24 weeks; T4, follow-up at 52 weeks; AVATAR, Audio Visual Assisted Therapy Aid for Refractory Auditory Hallucinations.
Negative effect sizes described lower means in AVATAR groups.
a. Decimals are because of aggregation of subscales.
*p < 0.05; **p < 0.01; ***p < 0.001.
Secondary outcomes: tolerability and acceptability
Each study reported tolerability aspects concerning treatment-related adverse events or exacerbations, except for Leff et al. Reference Leff, Williams, Huckvale, Arbuthnot and Leff25 Adverse events and exacerbations were non-standardised, making narrative review necessary. In Percie du Sert et al, Reference Percie du Sert, Potvin, Lipp, Dellazizzo, Laurelli and Breton65 1 participant out of 12 received additional counselling during early treatment, because of transient symptom exacerbations. Additionally, each of the four participants who dropped out did so because of anxiety symptom exacerbations in early virtual reality sessions. Three studies stated that any adverse events were not attributable to AVATAR or the control group. Reference Craig, Rus-Calafell, Ward, Leff, Huckvale and Howarth26,Reference Dellazizzo, Potvin, Phraxayavong and Dumais67,Reference Liang, Li and Guo68 In Garety et al, Reference Garety, Edwards and Jafari64 1 case of hospital admission in AVATAR (out of 116) and 4 cases in AVATAR-EXT (out of 114) could not be ruled out from possibly being attributable to the treatments. In Smith et al, Reference Smith, Vernal, Mariegaard, Christensen, Jansen and Schytte69 of 140 participants, five cases of hospital admission and one case of self-harm were considered potentially related to the AVATAR treatment because of worsening of AVHs. AVH symptom increases occurred in 52 participants during early exposure, which gradually declined over the course of the therapy. Furthermore, 40 participants reported needing additional time to manage anxiety during virtual reality immersion. Finally, Stefaniak et al Reference Stefaniak, Sorokosz, Janicki and Wciórka66 reported one case of hospital admission among 14 in the AVATAR group, although causal connections to the treatment were not reported.
Each study reported on treatment drop-outs in AVATAR and five reported rates in control groups. As can be seen in Table 5, the overall aggregated proportion of treatment drop-out in AVATAR was 24% with moderate heterogeneity. In control groups, the aggregated proportion of treatment drop-out was 18%, with moderate heterogeneity. No single outlier was responsible for the observed heterogeneity. In terms of study drop-out, both intervention and control groups showed aggregated proportions of 16%. Comparative analyses showed no significant difference in risk between AVATAR and control groups for both treatment and study drop-out (risk ratio of 1.01 in both cases).
Table 5 Meta-analytic drop-out results

N, sample size; I 2 , heterogeneity statistic. Negative effect sizes described lower scores in intervention groups whereas risk ratios above 1 describe an increased risk in intervention groups.
**p ≤ 0.01; ***p ≤ 0.001.
Publication bias
The Egger’s test for AVH severity at post-treatment was significant (p = 0.009), indicating an asymmetric funnel plot. Reference Egger, Davey Smith, Schneider and Minder59 Examination of the corresponding funnel plot revealed the notable impact of the large effect size in Stefaniak et al. Reference Stefaniak, Sorokosz, Janicki and Wciórka66 Trim-and-fill procedures revealed a corrected small effect size favouring AVATAR (Hedges’ g corrected = −0.35, 95% CI −0.49 to −0.22, p < 0.001), with two studies estimated to be missing. The Egger’s test of AVH distress (p = 0.02) at post-treatment was significant. Examination of the corresponding funnel plot revealed the impact of the large effect sizes in Stefaniak et al Reference Stefaniak, Sorokosz, Janicki and Wciórka66 and Percie du Sert et al. Reference Percie du Sert, Potvin, Lipp, Dellazizzo, Laurelli and Breton65 Trim-and-fill procedures revealed a corrected small effect size favouring AVATAR (Hedges’ g corrected = −0.28, 95% CI −0.41 to −0.14, p < 0.001), with two studies estimated to be missing. Funnel plots and Egger’s tests statistics can be found in Supplementary Figs 7–10 and Supplementary Table 11, respectively.
Discussion
The present study employed a systematic review and meta-analysis framework to investigate the efficacy, tolerability and acceptability of AVATAR for AVHs. Analyses showed preliminary evidence of the efficacy of AVATAR in decreasing AVH severity at post-treatment and short-term follow-up. This was robust for the effect of bias, and after correction for potential publication bias, although the effect was moderated by control group. Notably, the analyses were performed in persons with treatment-resistant and persistent AVHs, suggesting that AVATAR may address a critical treatment gap for previously refractory symptoms. The effect size observed (Hedges’ g = –0.40) was comparable to that of guideline-recommended psychological interventions, such as CBT (Hedges’ g = –0.34); however, this was in populations not specifically characterised as treatment-resistant. Reference Turner, Burger, Smit, Valmaggia and Gaag21 This illustrates the compelling potential of a theory-based intervention that harnesses digital technologies in a novel and effective way to treat AVHs. Medium- and long-term follow-ups (non-significant small effect sizes) were measured by only one study, limiting interpretability and certainty of evidence.
Analyses of effects on AVH frequency and distress showed significant small effects at post-treatment. They remained small and significant at short-term follow-up, and robust for the effect of bias (exception: distress at short-term follow-up) and potential publication bias. From a theoretical standpoint, the effects may align with potential mechanisms underlying reductions in AVH severity. The exposure to distressing avatar utterances and subsequent assertive strategies seem to have reduced both the frequency and distress of AVHs. Qualitative evidence has shown that although those affected by AVHs prioritise a decrease in the frequency of AVHs (i.e. total cessation), a decrease in AVH distress and disruption is more prioritised by service providers. Reference Longden, Branitsky, Sheaves, Chauhan and Morrison76 These findings indicate that AVATAR may support shared goals between patients and caregivers, with the potential for mutually beneficial outcomes. Of note is that the duration of AVATAR generally consisted of fewer sessions (7–12) than a typical minimal dose of 16 sessions of a psychological intervention such as CBT, Reference Kuipers, Yesufu-Udechuku, Taylor and Kendall9 emphasising the efficiency of AVATAR. Nevertheless, AVATAR is comparatively resource-intensive and may be demanding to implement at scale, because of the requirements of software, hardware and training. Reference Garety, Edwards and Jafari64
Some concerns of publication bias were present in the AVH symptom analyses at post-treatment for AVH severity and distress. Funnel plots revealed the strong influence two small outlier studies with large effect sizes. Reference Percie du Sert, Potvin, Lipp, Dellazizzo, Laurelli and Breton65,Reference Stefaniak, Sorokosz, Janicki and Wciórka66 This may point less toward the potential for missing studies and may instead reflect true heterogeneity of small studies. These are often able to direct more resources into treatment intensity, but are also more likely to be afflicted by methodological effects of bias. Reference Sterne, Sutton, Ioannidis, Terrin, Jones and Lau77 Furthermore, the outlier studies had waitlist control groups, which a recent meta-analysis has shown to have the smallest within-group symptom reductions in treatment-resistant schizophrenia, Reference Schütz, Salahuddin, Priller, Bighelli and Leucht78 underlining the potential for inflated effects in these designs.
AVATAR therapy was generally well tolerated. Treatment-related adverse events and lasting symptom exacerbations were not common. However, two studies reported instances of anxiety occurring during early exposure to avatars, Reference Percie du Sert, Potvin, Lipp, Dellazizzo, Laurelli and Breton65,Reference Smith, Vernal, Mariegaard, Christensen, Jansen and Schytte69 and a few cases requiring hospital admission attributable to treatment-related AVH symptom exacerbations. Reference Smith, Vernal, Mariegaard, Christensen, Jansen and Schytte69 As with other exposure-based therapies, initial increases in anxiety are not unusual and typically diminish as the treatment progresses. Reference Heinig, Knappe and Hoyer79 Consistent with this, two included studies found that within-session anxiety significantly decreased over the course of the treatment. Reference Craig, Rus-Calafell, Ward, Leff, Huckvale and Howarth26,Reference Rus-Calafell, Ward, Zhang, Edwards, Garety and Craig27,Reference Percie du Sert, Potvin, Lipp, Dellazizzo, Laurelli and Breton65 Additionally, small effect sizes favoured AVATAR in the reduction of anxiety symptoms at post-treatment and short-term follow-up. This further suggests that anxiety exacerbation is likely transient for the majority of participants. Studies also employed techniques to mitigate acute distress, such as offering a panic button or showing images of a calming beach. Reference Leff, Williams, Huckvale, Arbuthnot and Leff25,Reference Smith, Vernal, Mariegaard, Christensen, Jansen and Schytte69 However, expanding upon these techniques to avoid overwhelming participants may be beneficial in AVATAR to minimise adverse events and attrition.
Treatment non-completion was observed in approximately a quarter of participants receiving AVATAR, which did not differ significantly in drop-out risk to control groups, supporting the general acceptability of the intervention. Notably, other exposure-based interventions report similar treatment drop-out rates (e.g. 28% for prolonged exposure in post-traumatic stress disorder Reference Varker, Jones, Arjmand, Hinton, Hiles and Freijah80 ). The aggregated study drop-out rate of 16% across intervention and control conditions closely aligns with drop-out rates of CBT in schizophrenia spectrum disorders (14% Reference Cuijpers, Harrer, Miguel, Ciharova, Papola and Basic81 ). This suggests that AVATAR is comparable to current recommended evidence-based treatments in terms of retention.
Positive symptoms improved compared to control groups, which is likely associated with the improvement in AVH symptoms. Effect sizes were comparable to those reported for CBT in treatment-resistant schizophrenia compared with TAU and supportive counselling (Hedges’ g = –0.31 and –0.19, respectively Reference Salahuddin, Schütz and Pitschel-Walz82 ). Total psychotic and negative symptoms did not show significant improvement, which may reflect the more specific focus on AVHs. In contrast, the small effect on anxiety symptoms at post-treatment and short-term follow-up makes the anxiolytic effect of exposure to distressing AVHs apparent. Significant results were not observed for social functioning and quality of life. Contrary to common impairment in schizophrenia spectrum disorders, Reference Laws, Darlington, Kondel, McKenna and Jauhar83 functional outcomes were measured in few of the included studies, reflecting a common oversight in schizophrenia spectrum disorder research. Reference Bighelli, Wallis, Reitmeir, Schwermann, Salahuddin and Leucht84 Future trials should aim to consistently include functional outcome measures.
Limitations
The results of this study should be considered alongside several limitations. As a meta-analysis of under ten studies, robustness of results is not assured, requiring additional high-quality, randomised controlled trials. Likewise, publication bias tests and meta-regression tests may have been underpowered, potentially missing significant effects. Reference Sterne, Sutton, Ioannidis, Terrin, Jones and Lau77,Reference Deeks, Higgins, Altman, Higgins, Thomas, Chandler, Cumpston, Li and Page85 Another limiting factor concerns the length of measurement: only one study measured follow-ups later than 3 months post-intervention. Future studies are encouraged to perform longer follow-ups to assess the retention of effects. Although statistical heterogeneity was non-significant in the majority of analyses and often controlled by the removal of outliers, differences in treatment and follow-up durations, inclusion criteria, study designs, outcome measures and immersion were observed, potentially leading to clinical heterogeneity not assessed in these analyses. Future reviews should plan further subgroup and meta-regression analyses. Similarly, the question remains as to the comparative efficacy and tolerability of immersive virtual reality-based versus less immersive screen-based approaches, which could not be answered in this review and may be assessed with future direct comparison trials. Many included samples were small, had heterogeneous outcomes and were monocentric, with one study enrolling only 19 participants, Reference Percie du Sert, Potvin, Lipp, Dellazizzo, Laurelli and Breton65 potentially introducing bias into effect sizes. Reference Lin86 Future trials should be designed as large, multicentric trials with standardised outcome measures to allow for greater comparability between studies. Finally, trials were overwhelmingly conducted in Western, Educated, Industrial, Rich and Democratic (WEIRD) countries, Reference Muthukrishna, Bell and Henrich87 limiting the generalisability of results for all populations reporting AVHs, especially considering the strong cultural aspects apparent in AVHs. Reference Khaled, Brederoo, Yehya, Alabdulla, Woodruff and Sommer88,Reference Larøi, Luhrmann, Bell, Christian, Deshpande and Fernyhough89
In conclusion, AVATAR therapy showed efficacious and robust results for our primary outcome of the severity of AVH at post treatment, with moderate certainty of evidence. Effects were maintained into short-term follow-up, and AVH dimensions of both frequency and distress showed similar results. Efficacy profiles for other clinical and functional outcomes were mixed. These findings support the efficacy of AVATAR as a focused treatment for AVHs, although heterogeneity was apparent, and additional medium- and long-term evidence is required to assess the retention of effects.
Supplementary material
The supplementary material is available online at https://doi.org/10.1192/bjo.2026.11014
Data availability
Additional material is available in the online Supplementary Material. Data and scripts can be supplied upon reasonable request.
Acknowledgements
We would like to thank Professor Alex Leff for providing support and additional information on the original AVATAR trial. Additionally, we thank Dr Daniel Schulze and Dr Lars Schulze for answering early statistical questions.
Author contributions
F.O. contributed to study conceptualisation, data curation, formal analysis, methodology, project administration, visualisation and software, and wrote the original draft of the manuscript. S.H. contributed to study conceptualisation, validation and data curation, wrote the original draft of the manuscript and reviewed and edited the manuscript. P.W., L.F., M.S., N.T., C.-S.L. and B.S. reviewed and edited the manuscript. D.A. contributed to validation and reviewed and edited the manuscript. O.P.d.S., I.S. and L.B.G. contributed to data curation and reviewed and edited the manuscript. K.B. contributed to study conceptualisation, methodology, project administration, resources and supervision, and reviewed and edited the manuscript.
Funding
This research received no specific grant from any funding agency, commercial or not-for-profit sectors.
Declaration of interest
L.B.G., O.P.d.S., N.T. and I.S. were directly involved with included AVATAR trials. These authors played purely advisory roles and had no influence over methodology, or rating of bias and certainty of evidence. B.S. is supported by a National Institute for Health and Care Research Advanced Fellowship (grant number NIHR301206). B.S. is on the editorial board of Nature Exercise Science and Health, The Journal of Physical Activity and Health, Ageing Research Reviews, Mental Health and Physical Activity, The Journal of Evidence Based Medicine and The Brazilian Journal of Psychiatry. B.S. has received honorarium from a co-edited book on exercise and mental illness (Elsevier), an education course and unrelated advisory work from ASICS and FitXR Ltd. L.B.G. has received honoraria from Heka-VR for clinician training in the Challenge-virtual reality-assisted therapy intervention in Denmark and internationally, is engaged in other research collaborations with Heka-VR, and has received research funding for related virtual reality-based studies. N.T. has received funding from the Australian National Health and Medical Research Council for research into AVATAR therapy. K.B. is a co-founder of the two healthcare start-ups Kiso GmbH and Mental Hub. He has also received consultancy fees and/or honoraria for lectures and/or educational materials from Boehringer Ingelheim and Angelini, as well as from publishers and training institutes for workshops, books and lectures on psychotherapy. F.O., S.H., P.W., D.A., L.F., M.S. and C.-S.L. report no competing interests.





eLetters
No eLetters have been published for this article.