Introduction
Mentalizing, or theory of mind (ToM), refers to the ability to attribute others’ mental states, such as thoughts, desires, and emotions [Reference Quesque, Apperly, Baillargeon, Baron-Cohen, Becchio and Bekkering1]. It is fundamental to social interaction and linked to psychiatric symptoms, social functioning, and social skills [Reference Lewandowski, Pinkham and Van Rheenen2, Reference Thibaudeau, Rae, Raucher-Chéné, Bougeard and Lepage3]. Recent reviews indicated widespread mentalizing impairments across psychiatric conditions [Reference Lewandowski, Pinkham and Van Rheenen2, Reference Cotter, Granger, Backx, Hobbs, Looi and Barnett4, Reference McLaren, Gallagher, Hopwood and Sharp5]. However, it remains unclear whether these deficits reflect shared vulnerabilities or disorder-specific mechanisms. Adopting a transdiagnostic approach in studying mentalizing impairments aligns with contemporary frameworks, such as the Research Domain Criteria initiative [Reference Kozak and Cuthbert6] and the Hierarchical Taxonomy of Psychopathology model [Reference Kotov, Krueger, Watson, Achenbach, Althoff and Bagby7], emphasizing the importance of identifying nuanced processes and underlying mechanisms in psychopathology across psychiatric conditions [Reference Abramovitch, Short and Schweiger8, Reference Lavigne, Deng, Raucher-Chéné, Hotte-Meunier, Voyer and Sarraf9]. Exploring shared and unique mentalizing impairments will improve our understanding of the distinct presentations and mechanistic features of social cognitive deficits, thereby informing the development of targeted treatments.
Previous meta-analyses have compared social cognitive impairments between disorders, such as schizophrenia versus autism spectrum disorder (ASD) [Reference Oliver, Moxon-Emre, Lai, Grennan, Voineskos and Ameis10], or stages along the psychosis continuum, including individuals with clinical high-risk for psychosis (CHR), familial high risk for psychosis (FHR), and early schizophrenia [Reference Bora and Pantelis11, Reference Tsui, Luk, Hsiao and Chan12]. However, traditional pairwise meta-analysis is limited to examining only two groups at a time and cannot simultaneously compare multiple conditions. Network meta-analysis (NMA) emerged as a promising tool derived from graph theory, which has primarily been used to compare multiple treatments [Reference White13]. Recently, NMA has been applied to cognitive biases [Reference Lavigne, Deng, Raucher-Chéné, Hotte-Meunier, Voyer and Sarraf9] and brain morphology [Reference McCutcheon, Pillinger, Guo, Rogdaki, Welby and Vano14] across psychiatric disorders, with distinct conditions represented as nodes. This framework allows simultaneous, statistically coherent comparisons among multiple conditions, yielding a more comprehensive map of mentalizing impairment.
Conceptually, mentalizing is an umbrella term encompassing diverse domains, such as false belief, humor comprehension, and intentionality. Its assessment has relied on varied modalities, including verbal and nonverbal tasks, text-based readings, comic strips, and videos, leading to considerable heterogeneity [Reference Eddy15–Reference Tsui, Wong, Ma, Wong, Hsiao and Chan17]. Recent reviews emphasize the need to analyze these methods separately. For example, Gao et al. [Reference Gao, Luo, Li and Zhao16] showed that ASD adults displayed distinct impairment patterns depending on task type, performing poorly on text-based reading comprehension and video-based ecological scene comprehension tasks, but showing only moderate deficits on perceptual scene comprehension tasks based on static illustrations. Despite such variability in experimental designs and psychometric properties, most meta-analyses have aggregated mentalizing tasks without accounting for these distinctions [Reference Tsui, Wong, Ma, Wong, Hsiao and Chan17, Reference Yeung, Apperly and Devine18]. Among assessment approaches, tasks using static illustrations or comic strips, such as the commonly used Brüne’s Picture Sequencing Task (PST) [Reference Brüne19] and Brunet’s Comic Strip Task (CST) [Reference Brunet, Sarfati, Hardy-Baylé and Decety20], combine visual and narrative elements, making them engaging, accessible, and relatively low in cognitive demand. They can mimic real-life social interactions, minimize language demands, and provide standardized formats for cross-population comparisons. Yet, no systematic review has comprehensively examined mentalizing impairment across psychiatric conditions, focusing specifically on such tasks.
This review aimed to examine static illustration-based mentalizing tasks across conditions to identify shared and disorder-specific patterns of impairments. Pairwise meta-analysis and NMA were conducted to compare psychiatric groups with healthy controls (HCs) and between groups. Subgroup analyses examined impairments across specific task domains, including false belief, humor, and intentionality tasks. Additionally, meta-regression analyses were employed to explore potential moderators related to participant and study characteristics, such as demographics and estimated intelligence. We also assessed the ceiling effects of specific tasks to evaluate the psychometric properties across different samples. By investigating mentalizing deficits across psychiatric groups, this review aims to uncover patterns of impairment that can inform theories of social cognition and contribute to the development of targeted treatment strategies.
Methods
Search strategy and eligibility
This systematic review and meta-analysis adhered to Preferred Reporting Items for Systematic Reviews and Meta-Analyses guidelines and was preregistered in PROSPERO (CRD42024629394) (Supplementary Methods). Four electronic databases – Embase, MEDLINE, PsycINFO, and Web of Science – were systematically searched up to February 2, 2025. The search strategy consisted of terms related to nonverbal mentalizing or ToM tasks using static illustrations: (“Attribution of Intention Task” OR “Cartoon Stories ToM Paradigm” OR “Cartoon Theory of Mind” OR “Cartoon Vignette” OR “Comic Strip Task” OR “Comic Theory of Mind” OR “Joke-Appreciation Task” OR “Nonverbal Theory of Mind” OR “Picture Sequencing Task” OR “Picture Stories Task” OR “Theory of Mind Stories Task” OR “Visual Jokes Test” OR “Yoni Task”). Two researchers (H.K.H.T and J.L.) independently conducted the screening, data extraction, and quality assessment procedures between February 2, 2025, and February 17, 2025, with discrepancies resolved through team discussion. Inter-rater reliability was high (Cohen’s κ = 0.72–0.81). Detailed inclusion and exclusion criteria are provided in the Supplementary Methods.
Data extraction
Demographics and clinical details of all groups were recorded, including age, gender, years of education, diagnosis or condition, estimated intelligence quotient (IQ), validated diagnostic or assessment tools used, comorbidity, and medication information. Information about the mentalizing tasks with static illustration was also documented, including behavioral performance metrics, scoring methods, experimental designs, and stimulus characteristics. Tasks were classified into three domains: false belief (understanding beliefs differing from one’s own or reality), intentionality (inferring goal-directed actions or intentions), and humor (detecting or resolving incongruities). This categorization, informed by prior theoretical reviews [Reference Gao, Luo, Li and Zhao16, Reference Achim, Guitton, Jackson, Boutin and Monetta21, Reference Quesque and Rossetti22], was verified through an independent review of task protocols and scoring methods. Additional study characteristics (year, author, and country) were recorded, and authors were contacted for missing information.
Participant or diagnostic groups were categorized according to established definitions. Early schizophrenia was defined as first-episode schizophrenia or psychosis, or illness onset within 5 years [Reference Newton, Rouleau, Nylander, Loze, Resemann and Steeves23, Reference Tsui, Wong, Sum, Chu, Hui and Chang24]. This distinction was made to capture studies explicitly targeting the early illness stage, rather than general schizophrenia samples that often include a mix of illness durations. As prior evidence suggests distinct profiles of social cognitive [Reference García-Fernández, Cabot-Ivorra, Romero-Ferreiro, Pérez-Martín and Rodriguez-Jimenez25], cognitive [Reference McCutcheon, Keefe and McGuire26], and neurostructural alterations [Reference Dietsche, Kircher and Falkenberg27] between early and chronic stages, this distinction aligned with the abundance of studies focusing on first-episode or early-stage schizophrenia, enabling an adequate sample size and examination of potential stage-related differences in mentalizing ability. CHR refers to the high-risk group of developing psychosis identified by validated tools such as the Comprehensive Assessment of At-Risk Mental States and the Structured Interview for Psychosis-Risk Syndromes. FHR-S and FHR-B referred to first-degree relatives of patients with schizophrenia or bipolar disorder, respectively.
Quality assessment
Study quality was assessed with a modified Newcastle-Ottawa Scale (Supplementary Methods), evaluating diagnostic methods, sample representativeness, group comparability, task validity, and outcome reporting. Composite scores summarized methodological quality and risk of bias.
Statistical analysis
Effect sizes were calculated as Hedges-adjusted standardized mean difference, where negative values indicated poorer mentalizing (Supplementary Methods), and interpreted as small (0.20), moderate (0.50), and large (0.80) following established guidelines [Reference Fritz, Morris and Richler28]. Pairwise meta-analyses adopted random-effects models with restricted maximum likelihood and inverse-variance weighting to compare each group with HCs, and with one another. Subgroup analyses were conducted by task domains (Intentionality, False Belief, and Humor). Heterogeneity was assessed using I 2, Q, and τ 2 statistics. Egger’s test was performed to evaluate the potential publication biases. Meta-regression was restricted to pairwise analyses because covariate adjustment is not supported in frequentist NMA, exploring potential moderators, such as demographics, estimated IQ, sample sizes, or quality scores. Only variables with at least four studies were included in meta-regression.
A frequentist random-effects NMA was conducted to compare all groups simultaneously, overall, and stratified by task domains. Nodes represented distinct groups, with only those having at least three studies included in the overall NMA model to ensure statistical power for reliable comparisons. Results were illustrated with network graphs, forest plots, and league tables. Global heterogeneity (τ 2 and I 2) and network inconsistency (separating indirect from direct evidence/SIDE test) were assessed (Supplementary Methods). Publication bias was evaluated using comparison-adjusted funnel plots and Egger’s test. Sensitivity analyses were conducted to assess the robustness of the results by excluding studies with a high risk of bias. The confidence in the evidence was evaluated using the Confidence in Network Meta-Analysis (CINeMA) framework. NMA results were prioritized, given greater statistical power from combining direct and indirect evidence than pairwise results.
Ceiling effects were defined as mean scores ≥80% of the maximum [Reference Petersen, Hoyniak, McQuillan, Bates and Staples29] and were assessed only for tasks with at least five studies. For each task, the number and percentage of samples exhibiting ceiling effects were calculated, along with mean percentage scores. Statistical analysis was performed in R version 4.4.1 (metafor and netmeta) with two-tailed α = 0.05.
Results
From 1,488 records screened, 89 studies met the inclusion criteria, comprising 9,038 participants across 11 psychiatric and neurodevelopmental conditions or high-risk groups (Figure 1). The overall sample had a mean age of 32.2 years, with 48.4% female participants (Supplementary Table S1). Schizophrenia (k = 35; N = 1,850) and early schizophrenia (k = 19; N = 718) were the most studied conditions, followed by bipolar disorder (k = 11, N = 366) and CHR (k = 11, N = 446), ASD (k = 10, N = 468) and depression (k = 10; N = 286), borderline personality disorder (BPD) (k = 4; N = 200) and FHR for schizophrenia (FHR-S (k = 4; N = 710), obsessive-compulsive disorder (OCD) (k = 3; N = 150), and anorexia nervosa (k = 2; N = 42) and FHR for bipolar disorder (FHR-B) (k = 2; N = 41). No eligible studies were identified for ADHD and general anxiety disorder. While most studies matched age and sex between groups, mean values varied across diagnostic categories (Supplementary Table S2). Nine distinct mentalizing tasks were identified, and details were provided in Supplementary Table S3. Intentionality (k = 43) and false belief (k = 35) tasks were most common, while humor comprehension was examined in only 11 studies, precluding subgroup NMA. The included studies demonstrated good methodological quality, with a mean modified Newcastle-Ottawa Scale score of 6 (median = 6; range = 4–8).

Figure 1. Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) flow diagram showing study selection. Note: Literature searches were conducted by two independent researchers from database inception until February 2, 2025. ASD, autism spectrum disorders; BPD, borderline personality disorder; CHR, individuals at clinical high risk for psychosis; FHR-B, familial high risk for bipolar disorder; FHR-S, familial high risk for schizophrenia; OCD, obsessive-compulsive disorder.
Pairwise meta-analysis
Thirty-seven pairwise meta-analyses were conducted comparing psychiatric conditions with HCs and with each other across overall mentalizing performance and three task-type subgroups. All psychiatric conditions demonstrated significant mentalizing impairment compared to HCs, except for the FHR-B group, with varying patterns across task types (Table 1 and Supplementary Figure S1). Particularly, only ASD, anorexia nervosa, BPD, and schizophrenia were impaired in humor tasks, although very few studies were available. Between-condition comparisons only revealed that schizophrenia showed greater impairment than depression (g = 0.565 [0.327–0.804], p < 0.001), and early schizophrenia showed greater impairment than CHR (g = 0.430 [0.047–0.814], p = 0.028). The main analyses showed moderate heterogeneity with a mean I 2 of 38.7% and a mean τ 2 of 0.080. High heterogeneity (I 2 > 75%, τ 2 > 0.160) was observed in comparisons between bipolar disorder, FHR-S, and schizophrenia with HCs.
Table 1. Stratified meta-analyses (or single effect size) of mentalizing ability with static illustrations across psychiatric conditions compared to healthy controls, and between conditions

Note: ASD, autism spectrum disorder; HC, healthy controls; BPD, borderline personality disorder; CHR, clinical high risk for psychosis; FHR-B, familial high risk for bipolar disorder; FHR-S, familial high risk for schizophrenia; OCD, obsessive-compulsive disorder. Bold values indicate statistical significance (p < 0.05).
a Between disorders comparisons were conducted only when direct comparisons were available in the included studies. Positive values indicate better performance in the former condition compared to the latter condition.
Meta-regression analyses revealed that years of education (Q m = 18.796, p = .049) and IQ (Q m = 6.742, p = .048) positively moderated performance in ASD (Supplementary Table S4). Age negatively moderated performance in the FHR-S group (Q m = 12.212, p = .04), while education (Q m = 7.125, p = .015) and sample size (Q m = 4.461, p = .042) showed positive associations in schizophrenia. Study quality score negatively moderated the effect size in CHR versus early schizophrenia comparisons (Q m = 10.755, p = .046). Residual heterogeneity was reduced to below 75% in these groups, except for schizophrenia. Egger’s tests and funnel plots indicated potential publication bias in comparisons between schizophrenia, FHR-S, and early schizophrenia with HCs (Table 1 and Supplementary Figure S2).
Network meta-analysis
Network graphs of overall mentalizing ability, false belief subgroup, and intentionality subgroup were illustrated in Figure 2. NMA of overall mentalizing ability with static illustrations revealed deficits across all psychiatric conditions compared to HCs (Figure 3). Effect sizes ranged from small-moderate in bipolar disorder (g = −0.326 [−0.596 to −0.057], p = 0.018), CHR (g = −0.408 [−0.677 to −0.139], p = 0.003), and depression (g = −0.426 [−0.679 to −0.172], p = 0.001); to moderate in ASD (g = −0.505 [−0.773 to −0.236], p < 0.001) and FHR-S (g = −0.567 [−0.919 to −0.215], p = 0.002); to moderate-large in OCD (g = −0.613 [−1.113 to −0.112], p = 0.017) and BPD (g = −0.612 [−1.041 to −0.183], p = 0.005); to large in early schizophrenia (g = −0.785 [−0.980 to −0.590], p < 0.001) and schizophrenia (g = −0.960 [−1.108 to −0.811], p < 0.001). Between-condition comparison revealed that both schizophrenia and early schizophrenia displayed greater impairments than most other conditions (Table 2). Notably, schizophrenia showed more pronounced impairments compared to ASD (g = −0.455 [−0.756 to −0.154]) and FHR-S (g = −0.393 [−0.761 to −0.025]), while both schizophrenia and early schizophrenia were more impaired than bipolar disorder, CHR, and depression.

Figure 2. Network graph of mentalizing ability with static illustrations across conditions.

Figure 3. Forest plots of mentalizing ability with static illustrations across conditions.
Table 2. League table of theory of mind with static illustrations between conditions

Note: Bold values indicate statistical significance (p < 0.05).
NMA subgroup analyses showed that early schizophrenia, schizophrenia, and depression demonstrated impairments in both false belief and intentionality tasks compared to HCs (Figure 3 and Table 2). However, bipolar disorder and OCD were significantly impaired only in false belief task performance, while ASD, CHR, and FHR-S were impaired only in intentionality tasks compared to HCs. In comparison between conditions, schizophrenia was more impaired than bipolar disorder, depression, and FHR-S in false belief tasks. For intentionality tasks, both early schizophrenia and schizophrenia displayed greater impairments than CHR. Furthermore, schizophrenia, but not early schizophrenia, demonstrated significant impairments compared to ASD and bipolar disorder. No significant differences were found between schizophrenia and early schizophrenia in overall mentalizing impairments or specific tasks.
NMA analyses demonstrated low-moderate heterogeneity for overall mentalizing ability and false belief tasks, and moderate-high heterogeneity for intentionality tasks, with minimal evidence of global and local inconsistency (Supplementary Tables S5 and S6). Potential publication biases were suggested by the comparison-adjusted funnel plot and Egger’s tests in overall mentalizing ability (p < 0.001) and false belief tasks (p = 0.042) (Supplementary Figure S3). The confidence of the evidence evaluated by CINeMA was moderate to low (Supplementary Table S7).
Ceiling effect
Ceiling effects were observed across mentalizing tasks (Table 3 & Supplementary Table S8). The Brüne PST showed the highest rate of ceiling effects (68.3%), followed by the Langdon PST (65.7%), Yoni Task (65.0%), and Brunet/Sarfati CST (58.8%). The Happé Cartoon Task demonstrated the lowest rate of ceiling effects (33.3%). Notably, HCs consistently showed higher rates of ceiling effects (86.7–94.7%) across the first four tasks, while clinical groups showed lower rates (36.8–50.0%). Among clinical groups, CHR and ASD frequently demonstrated ceiling effects, while patients with schizophrenia rarely showed ceiling performance across all tasks. Mean performance scores of maximum possible scores followed a similar pattern, with HCs generally achieving higher scores (85.2–89.0%) compared to clinical groups (70.6–81.0%). However, it should be noted that the number of studies in many psychiatric conditions was very low (often k ≤ 3).
Table 3. Ceiling effects across mentalizing tasks and conditions

Abbreviations: ASD, autism spectrum disorder; BPD, borderline personality disorder; CHR, clinical high risk for psychosis; FHR-B, familial high risk for bipolar disorder; FHR-S, familial high risk for schizophrenia; OCD, obsessive-compulsive disorder; CST, comic strip task; PST, picture sequencing task.
Note: Ceiling effects were defined as mean performance ≥80% of maximum possible score. Mean scores are presented as percentage of maximum possible score.
Discussion
This review represents the first comprehensive examination of mentalizing impairments assessed with static-illustration tasks across psychiatric disorders, neurodevelopmental disorders, and high-risk groups, encompassing 89 studies with 9,038 participants. Significant impairments emerged in anorexia nervosa, ASD, bipolar disorder, BPD, CHR, early schizophrenia, FHR-S, OCD, and schizophrenia, with the exception of FHR-B. Schizophrenia and early schizophrenia exhibited the largest impairments overall and in both false belief and intentionality tasks, compared to HCs and most other groups, including bipolar disorder, CHR, and depression. Domain-specific patterns were evident across conditions. NMA revealed that bipolar disorder and OCD were impaired only in false belief tasks, whereas ASD, CHR, and FHR-S were impaired only in intentionality tasks. Pairwise comparison indicated that humor comprehension deficits only appeared in ASD, anorexia nervosa, BPD, and schizophrenia. However, findings should be interpreted cautiously, given the limited studies for certain comparisons, the presence of heterogeneity, and potential publication bias. Prominent ceiling effects were observed across the commonly used tasks. This review offers valuable insights into transdiagnostic and domain-specific mentalizing impairments, guiding future research directions and the development of potential targeted interventions.
Main discussion
Our findings revealed mentalizing impairments of varying severity across psychiatric groups, from profound deficits in schizophrenia and early schizophrenia, to moderate impairments in ASD, BPD, OCD, and FHR-S, and milder deficits in bipolar disorder, CHR, and depression. This aligns with evidence that mentalizing deficits represent a transdiagnostic feature of psychopathology [Reference Lewandowski, Pinkham and Van Rheenen2–Reference Cotter, Granger, Backx, Hobbs, Looi and Barnett4]. The gradient and domain-specific profiles suggest both shared vulnerabilities and disorder-specific manifestations. These patterns may arise from bidirectional links with neural–cognitive mechanisms or symptoms across conditions [Reference Abramovitch, Short and Schweiger8, Reference Green, Horan and Lee30, Reference Schurz, Radua, Tholen, Maliske, Margulies and Mars31]. Schizophrenia showed the most pronounced impairments compared to CHR and FHR-S, while early schizophrenia was only more impaired than CHR. This supports a progressive social cognitive decline along the schizophrenia spectrum [Reference Bora and Pantelis11, Reference Tsui, Luk, Hsiao and Chan12]. Similar impairment magnitudes in early and chronic schizophrenia suggest chronicity does not further worsen deficits, and moderate deficits may serve as endophenotypic markers [Reference Thibaudeau, Rae, Raucher-Chéné, Bougeard and Lepage3, Reference van Neerven, Bos and van Haren32].
Although mentalizing difficulties are well-documented in ASD, our analysis found them milder than schizophrenia, contradicting previous meta-analyses [Reference Oliver, Moxon-Emre, Lai, Grennan, Voineskos and Ameis10]. One explanation may be that previous reviews pooled diverse tasks, obscuring task-specific patterns. In contrast, our review showed no false belief deficits in adults with ASD, challenging assumptions of pervasive difficulties [Reference Gernsbacher and Yergeau33]. Developmental compensation in adulthood (the mean age of our ASD samples is 23.6 years) may also account for the preserved performance [Reference Happé and Frith34]. This is consistent with a recent meta-analysis by Gao et al. [Reference Gao, Luo, Li and Zhao16], which reported only moderate deficits in perceptual scene-based abilities but greater impairments in more cognitively demanding ecological or conversational tasks in ASD compared to HCs. Furthermore, relationships between executive dysfunction and mentalizing difficulties have been shown in both ASD [Reference Jones, Simonoff, Baird, Pickles, Marsden and Tregay35] and schizophrenia [Reference Thibaudeau, Achim, Parent, Turcotte and Cellard36]. Our meta-regression analysis likewise indicated that higher education, as a proxy for cognitive reserve, attenuated impairments in both ASD and schizophrenia, highlighting the interplay between cognition and task type.
Our findings indicated small to moderate deficits on static-illustration tasks in both bipolar disorder and depression. Whether these are state- or trait-linked remains unresolved [Reference Lewandowski, Pinkham and Van Rheenen2, Reference van Neerven, Bos and van Haren32]. Associations with mood symptoms were inconsistent [Reference Inoue, Yamada and Kanba37–Reference Wolf, Brüne and Assion39], while evidence for a psychosis contribution was limited [Reference Corcoran, Rowse, Moore, Blackwood, Kinderman and Howard40, Reference Popolo, Borsella, Meschini, Pianella, Zampaglione and Vinci41]. In contrast, poorer neurocognition or lower intelligence consistently predicts weaker mentalizing [Reference Inoue, Yamada and Kanba37, Reference Moore, Blackwood, Corcoran, Rowse, Kinderman and Bentall42, Reference Zobel, Werden, Linster, Dykierek, Drieling and Berger43]. This is indeed consistent with findings that many individuals with bipolar disorders (about half to two-thirds) retain intact social cognition when neurocognitive functions are preserved [Reference Bora, Veznedaroğlu and Vahip44, Reference Burdick, Russo, Frangou, Mahon, Braga and Shanahan45].
Beyond cognition and mood symptoms, demographic and clinical factors also contributed to impairment severity. Longer illness duration, poorer insight, and male sex were linked to poorer social cognition [Reference Bora, Veznedaroğlu and Vahip44, Reference Tulacı, Cankurtaran, Özdel, Öztürk, Kuru and Özdemir46, Reference Vaskinn, Sundet and Haatveit47]. Neuroimaging studies further indicate that age, biological sex, and individual social cognitive performances, rather than diagnostic categories per se, explain much of the variance in mentalizing-related networks [Reference Schurz, Radua, Tholen, Maliske, Margulies and Mars31, Reference Bagheri, Yu, Gallucci, Tan, Oliver and Dickie48, Reference Oliver, Moxon-Emre, Hawco, Dickie, Dakli and Lyon49]. Such interindividual variability likely underlies the heterogeneity observed in our meta-analyses and underscores the need for transdiagnostic, multidimensional frameworks that account for personal and clinical factors alongside traditional diagnoses.
Our subgroup analyses further revealed distinct domain-specific patterns of mentalizing impairments across conditions, suggesting differential profiles of social cognitive dysfunction. Schizophrenia, early schizophrenia, and depression exhibited deficits in both false belief and intentionality tasks, indicating a generalized impairment. In contrast, bipolar disorder and OCD displayed selective impairments in false belief tasks, a relatively straightforward construct involving basic perspective taking and object permanence, while maintaining intact intentionality comprehension. This suggested that individuals with bipolar disorder and OCD may preserve the ability to infer intentions but struggle with self-other differentiation, potentially relying on their own mental states to infer others’ beliefs [Reference Eddy15, Reference Quesque and Rossetti22]. However, only two OCD studies assessed false beliefs, and heterogeneity and potential publication bias affected bipolar disorder analyses, warranting cautious interpretations. Moreover, false belief tasks vary in operationalization, targeting different aspects of belief reasoning, which likely contributes to inconsistent findings (Supplementary Table S3). Notably, OCD patients demonstrated impairments in affective but not cognitive second-order beliefs [Reference Liu, Fan, Gan, Lei, Niu and Chan50], whereas bipolar disorder patients were impaired only in cognitive second-order beliefs, with intact affective second-order and first-order performance [Reference Wang, Wang, Zou, Ni, Tian and Sun38].
Intentionality tasks, the most widely studied domain, revealed further divergence. ASD, CHR, and FHR-S showed specific difficulty in inferring intentions despite intact self-other differentiation. Our analyses addressed impairment magnitude but not direction. Theoretical models distinguish “hypomentalizing,” reduced attribution of mental states, from “hypermentalizing,” excessive, overly elaborate attributions, along the proposed “autism-psychosis continuum” [Reference Crespi and Badcock51, Reference Martinez, Alexandre, Mam-Lam-Fook, Bendjemaa, Gaillard and Garel52]. Indeed, evidence indicates that schizophrenia, early schizophrenia, bipolar disorder, and CHR groups often exhibit hypermentalizing biases during gaze perception [Reference Chan, Hsiao, Wong, Liao, Suen and Yan53–Reference Tsui, Liao, Hsiao, Suen, Yan and Poon56]. A meta-analysis by McLaren et al. [Reference McLaren, Gallagher, Hopwood and Sharp5] further suggests that hypermentalizing is a transdiagnostic feature across multiple disorders, not limited to BPD or schizophrenia, particularly on complex video-based tasks such as the Movie for the Assessment of Social Cognition. In BPD, attachment hyperactivation may underlie hypermentalizing in emotionally charged contexts. However, little is known about how these biases shift across contexts, such as low emotional arousal or third-person perspectives. Current static-illustration tasks cannot capture bias direction and may obscure subtle distinctions in how individuals infer others’ minds by heavily relying on standardized stimuli with correct/incorrect scoring. Future research could employ tasks capable of differentiating over- versus under-ascription of mental states [Reference Fretland, Andersson, Sundet, Andreassen, Melle and Vaskinn57–Reference Wastler and Lenzenweger59], offering deeper insights into hyper- and hypomentalizing across disorders.
For humor comprehension, significant impairments were found only in ASD, anorexia nervosa, BPD, and schizophrenia. These deficits likely reflect difficulties in processing non-literal meaning, interpreting subtle social cues, and managing complex social perceptions [Reference Berger, Bitsch and Falkenberg60, Reference Kalandadze, Norbury, Nærland and Næss61], signaling broader challenges in understanding social nuances essential for communication and social functioning. Given the limited number of studies, however, these findings should be interpreted cautiously.
Overall, our results reveal domain-specific patterns across false belief, intentionality, and humor tasks, reinforcing that mentalizing is not a unitary construct but a constellation of partially independent processes. Future work should disentangle specific components, such as affective and order components, to clarify distinct neural and clinical mechanisms. Beyond theory, the current synthesis offers empirical insights and directions for developing targeted interventions across diagnostic groups that can leverage preserved abilities while addressing areas of impairment. Therapeutic approaches, from traditional social cognition remediation [Reference Fernandez-Sotos, Torio, Fernandez-Caballero, Navarro, Gonzalez and Dompablo62], noninvasive brain stimulation [Reference Tsui, Kranz, Zheng, Hsiao and Chan63], to emerging virtual-reality-based interventions [Reference Pérez-Ferrara, Flores-Medina, Landa-Ramírez, González-Sánchez, Luna-Padilla and Sosa-Millán64], may differentially target mentalizing domains and enhance functional outcomes across psychiatric disorders.
Methodological considerations
This review differs from prior work by focusing on mentalizing tasks that employ static illustrations or comic strips, thereby reducing heterogeneity and allowing more consistent cross-study comparisons, a key prerequisite for meaningful meta-analysis [Reference Gurevitch, Koricheva, Nakagawa and Stewart65]. Despite these advantages, several methodological limitations warrant attention. Prominent ceiling effects were observed, especially in HC, ASD, CHR, and bipolar disorder groups, which may obscure milder impairments by compressing scores near the upper limit [Reference Tsui, Wong, Ma, Wong, Hsiao and Chan17, Reference Yeung, Apperly and Devine18, Reference Konstantin, Nordgaard and Henriksen66]. Consequently, subtle deficits may appear absent unless more sensitive measures are used.
Ecological validity presents another significant concern. Recent reviews have questioned whether these tasks truly measure mentalizing ability rather than specific problem-solving skills [Reference Quesque and Rossetti22, Reference Konstantin, Nordgaard and Henriksen66]. The laboratory–life gap risks misrepresenting social cognition in both clinical and general populations. Moreover, the relationship between mentalizing performance and real-world functioning remains underexplored, with only a few studies reporting mixed findings [Reference Morrison, Pinkham, Kelsven, Ludwig, Penn and Sasson67]. In response, more ecologically valid approaches have been developed, such as video-based paradigms [Reference Baksh, Abrahams, Auyeung and MacPherson68, Reference Dziobek, Fleck, Kalbe, Rogers, Hassenstab and Brand69] and interactive virtual reality assessments [Reference Canty, Neumann, Fleming and Shum70, Reference Cao, So, Wang, Hu, Xie and Gu71], which can better simulate dynamic social contexts. With advances in natural language processing and large language models (LLMs), dynamic and adaptive mentalizing assessments and interventions have become increasingly feasible. These approaches enable modeling of mentalizing processes at both computational and semantic levels [Reference Jara-Ettinger and Rubio-Fernandez72, Reference Nour, McNamee, Liu and Dolan73] and allow personalized, real-time adaptations based on participants’ inferred mental states and interactive behaviors [Reference Sarıtaş, Tezören and Durmazkeser74, Reference Xiao, Wang, Xu, Song, Xu and Cheng75]. Future research should explore the integration of dynamic, context-rich, and potentially LLM-enhanced paradigms to advance the modeling and understanding of mentalizing ability, complement existing tools, and inform the development of more effective and ecologically valid interventions.
Limitations
First, despite the large overall sample size, certain groups (e.g., anorexia nervosa and FHR-B) were underrepresented, limiting generalizability and precluding inclusion in the NMA. Similarly, some pairwise meta-analyses were based on only two or three studies per condition, which may yield imprecise or biased estimates. We retained these analyses to provide a comprehensive overview of all available literature, but the corresponding results should be interpreted with caution due to limited statistical power and potential bias. Second, substantial heterogeneity was observed, particularly in HC comparisons with ASD, bipolar disorder, FHR-S, and schizophrenia, as well as in the NMA for intentionality tasks. This likely reflected methodological, sample, and task differences. Publication bias was evident in several pairwise contrasts, and effect sizes were moderated by sample size and study quality. Third, comorbidity was inadequately reported despite its high prevalence [Reference Hossain, Khan, Sultana, Ma, McKyer and Ahmed76, Reference Wilson, Yung and Morrison77], and evolving diagnostic criteria may also have influenced results [Reference Blashfield, Keeley, Flanagan and Miles78]. Fourth, our focus on static illustration tasks reduced task heterogeneity and enabled meaningful synthesis but may limit generalization to other modalities. Future reviews should extend to other measures of mentalizing and social cognition, such as dynamic video-based tasks and naturalistic social scenarios. Fifth, this review focused exclusively on psychiatric populations, and findings may not generalize to other groups with social cognitive deficits (e.g., brain injury, Huntington’s disease, and Parkinson’s disease). Finally, restricting inclusion to English, peer-reviewed articles may have introduced language and publication bias.
Conclusion
This meta-analysis revealed significant mentalizing impairments, assessed through static illustration tasks, across psychiatric disorders, neurodevelopmental disorders, and at-risk groups. Notably, schizophrenia and early schizophrenia demonstrated the most pronounced deficits overall and across domains. While impairments were transdiagnostic, distinct domain-specific patterns (false belief, intentionality, and humor) suggested both shared vulnerabilities and disorder-specific mechanisms. Moreover, the substantial heterogeneity points to the importance of interindividual variability, which may complement diagnostic categories in identifying mechanistic pathways. At the same time, methodological challenges, such as ceiling effects and limited ecological validity, highlight the need for more refined and ecologically relevant measures. Future studies should prioritize domain-specific mechanisms within a transdiagnostic framework to advance both theory and intervention design.
Supplementary material
The supplementary material for this article can be found at http://doi.org/10.1192/j.eurpsy.2025.10146.
Data availability statement
The search strategy is detailed in the Supplementary Material. Full search results and data entry forms are available from the authors upon request.
Author contribution
H.K.H. Tsui: Conceptualization, data curation, formal analysis, investigation, methodology, project administration, software, visualization, and writing – original draft.
J.C.H. Lo: Data curation, investigation, project administration, and writing – review and editing.
S.K.W. Chan: Conceptualization, investigation, methodology, project administration, resources, supervision, and writing – review and editing.
Financial support
This study was funded by the Basic Research Seed Fund, The University of Hong Kong (109000324; 104006611).
Competing interests
The authors declare none.






Comments
No Comments have been published for this article.