Multilingualism and mentalizing abilities in adults

Bilingual children have better Theory-of-Mind compared to monolingual children


Introduction
Social cognition refers to a set of processes that impact the way people perceive, attend to, store, and use information about others to smoothly navigate the social world (Moscowitz, 2004). Language also plays a vital role in how people interact with others and so it is unsurprising that language and social cognition are closely linked (Baker, Peterson, Pulos, & Kirkland, 2014; Pavias, van den Broek, Hickendorff, Beker, & Van Leijenhorst, 2016). In particular, it has been proposed that those who know more than one language have an advantage when it comes to understanding others (Fan, Liberman, Keysar, & Kinzler, 2015; Genesee, Tucker, & Lambert, 1975; Javor, 2016).
As the number of international migrants has greatly increased over the past 20 years (International Organization for Migration, 2020), many countries are experiencing an increase in linguistic diversity. This makes the question of whether and how this diversity influences social cognition a timely one. Multilinguals have extensive experience with alternating between their languages, often managing conflict between them, which fosters greater metalinguistic awareness (e.g., Bialystok, 1988; Cummins, 1978; Friesen & Bialystok, 2012). This in turn may help multilinguals understand that different people can have different perspectives on and interpretations of the same event, which is a core aspect of social cognition (i.e., perspective-taking). Multilinguals, as a result, may have an advantage when it comes to social cognition.

Bilingualism and social cognition in children
When the impact of bilingualism on social cognition is studied in children, it is often in relation to Theory-of-Mind (ToM). ToM, often used synonymously with the term mentalizing (Frith & Frith, 2003), refers to the understanding that other people have mental states that can differ from our own and from reality (Premack & Woodruff, 1978; Wimmer & Perner, 1983).
Mentalizing helps individuals understand what others are thinking and feeling, allowing people to better adjust to social situations and empathize with and predict the behavior of others (Hooker, Verosky, Germine, Knight, & D'Esposito, 2008). The development of ToM appears to be accelerated for children raised in bilingual environments compared to those raised in monolingual environments (e.g., Goetz, 2003; Farhadian, Abdullah, Mansor, Redzuan, Gazanizadand, & Kumar, 2010; Kovács, 2009; cf. Buac & Kaushanskaya, 2020). False-belief tasks, such as the Sally-Anne task, evaluate whether children understand that others can hold beliefs that differ from reality, and bilingual children pass these tasks at earlier ages compared to monolinguals (e.g., Goetz, 2003; Kovács, 2009). A meta-analysis of 16 studies on this topic found a small effect in favor of bilinguals across studies that used a false-belief task or some other test of ToM, like the appearance-reality task or perspective-taking task (Cohen's d = .22, p = .05; Schroeder, 2018). This effect size was substantially larger once second-language proficiency was taken into account (Cohen's d = .58, p < .001). Controlling for language proficiency in these analyses is important because most ToM tasks recruit a host of language abilities (for a meta-analysis see Milligan, Astington, & Dack, 2007; for reviews see de Villiers, 2007 and de Villiers & de Villiers, 2014). For example, such tasks frequently rely on noticing subtle grammatical distinctions or following the details of a narrative. Considering bilinguals are often less proficient relative to monolinguals in each of their languages (Bialystok, Hawrylewicz, Grundy, & Chung-Fat-Yim, 2022; Friesen, Luo, Luk, & Bialystok, 2015; Kohnert & Bates, 2002), controlling for proficiency is necessary for a fair comparison. This does raise an important question, however. Given that bilinguals perform worse on verbal ability measures and ToM tasks rely on verbal ability, how is it that an overall ToM benefit for bilinguals can still be observed?
Various accounts have been proposed to explain why bilingual children develop ToM earlier than their monolingual peers. One explanation is that bilingual children gain this advantage by virtue of their enhanced executive functioning (Goetz, 2003; Rubio-Fernández, 2017). Executive functioning refers to a set of higher-order cognitive processes that are responsible for self-control and goal-oriented behavior, including inhibition, shifting of attention, and working memory (Diamond, 2013; Zelazo & Carlson, 2012). It has been proposed that to successfully communicate in the target language, bilinguals recruit domain-general executive functions to resolve the conflict that arises from two jointly activated language systems (e.g., Marian & Spivey, 2003; Shook & Marian, 2019; Thierry & Wu, 2007; for a review see Kroll, 2017). The need for bilinguals to maintain attention on the target language while ignoring the non-target language is thought to lead to enhanced executive functioning (Bialystok, 2015, 2017). Furthermore, because multilinguals are dealing with three or more co-activated languages, they may have better executive control abilities than bilinguals due to the additional demands placed on the cognitive system to inhibit or switch between languages. That said, evidence for the effect of trilingualism on executive control is mixed, with some studies showing better performance for trilinguals than bilinguals (Cedden & Şimşek, 2014; Madrazo & Bernardo, 2018), whereas others have reported equivalent performance between these two groups (Guðmundsdóttir & Lesk, 2019; Poarch & van Hell, 2012). In support of an additional benefit, however, multilinguals who are fluent in more than two languages are at a lower risk of cognitive decline compared to bilinguals (Perquin, Vaillant, Schuller, Pastore, Dartigues, Lair, & Diederich, 2013).
ToM tasks frequently draw upon executive functions, relying on the ability to suppress one's own knowledge (inhibition), take another person's perspective (shifting attention), and hold this information in mind to make an inference (working memory) (Perner & Lang, 1999). Not surprisingly then, both executive functions and ToM follow similar developmental trajectories (Vetter, Leipold, Kliegel, Phillips, & Altgassen, 2013) and share a common neurological basis in the prefrontal cortex (Carlson & Moses, 2001).
Another account of why bilingual children may have a ToM advantage relative to monolinguals is based on the observation that bilingual children do better than monolingual children on metalinguistic tasks (for a meta-analysis see Adesope, Lavin, Thompson, & Ungerleider, 2010; Friesen & Bialystok, 2012). Metalinguistic awareness refers to the ability to consciously reflect on language, independent of its literal meaning (Doherty & Perner, 1998). In other words, it is the awareness that language has a structure that can be manipulated and that a single word may have multiple meanings separate from its direct referent. Schroeder (2018) explained that bilinguals have an understanding that the same concept can be represented by two different labels, one in each language. This may transfer to the understanding that different people can hold different beliefs, desires, and intentions about the same event. Furthermore, bilingualism fosters an understanding that not everyone shares the same piece of information (Fan et al., 2015; Genesee, Boivin, & Nicoladis, 1996; Genesee, Nicoladis, & Paradis, 1995). For example, a bilingual child may adapt their language use depending on their knowledge of the language limitations of their conversational partners. Hence, bilinguals may be precocious in appreciating that others have a different perspective from their own.

Bilingualism and mentalizing in adults
Most research on bilingualism and mentalizing has focused on children, with very few studies examining this question in adults. One notable exception is a study by Rubio-Fernández and Glucksberg (2012), who asked 23 monolingual and 23 bilingual young adults to perform an adapted Sally-Anne task (Baron-Cohen, Leslie, & Frith, 1985) while their eye-movements were recorded. In this version of the Sally-Anne task, adults watched a cartoon in which a character (Sally) placed an object in a location, and the object was later moved in her absence by another character (Anne), with the target question being where Sally will look for this object. The test isolates whether individuals understand that Sally's beliefs about the world can deviate from the true state of affairs, given her ignorance of Anne's actions. Although there was no difference in reaction time between monolingual and bilingual adults, more than half of the bilinguals fixated directly on the correct location whereas only about a quarter of the monolinguals did the same. Furthermore, attentional control, as measured by the Simon task, was associated with false-belief performance in both groups.
In another study by Cox and colleagues (Cox, Bak, Allerhand, Redmond, Starr, Deary, & MacPherson, 2016), examining 90 older adults (∼74 years old; 26 bilinguals and 64 monolinguals), bilinguals exhibited less variability in scores on a measure of social reasoning (i.e., the Faux Pas test; Stone, Baron-Cohen, & Knight, 1998; Gregory, Lough, Stone, Erzinclioglu, Martin, Baron-Cohen, & Hodges, 2002) and better attentional control on the Simon task. In a young adult sample, Navarro and Conway (2021) compared monolinguals (n = 41) and bilinguals (n = 37) on an adult-appropriate measure of mentalizing known as the director task (Dumontheil, Küster, Apperly, & Blakemore, 2010; Keysar, Lin, & Barr, 2003), which measures visual perspective-taking. In this task, participants must monitor what objects on a grid are visible to both the self and another person with a different visual perspective. Bilinguals were more accurate than monolinguals on the trials requiring participants to consider the perspective of the other person. Another study of 89 participants revealed that performance on the director task was predicted by the frequency of second language use, amount of switching between languages, as well as the number of languages spoken by family members during the participant's childhood (Navarro, DeLuca, & Rossi, 2022). Finally, Tiv, O'Regan, and Titone (2021) examined 61 bilinguals who read sets of sentences that relied on making logical inferences, mental inferences (i.e., mentalizing), or neither. All sentences were presented in English, which was the first language learned for some participants (n = 31) and the second for others (n = 30). Participants reading in their second language made mental-state inferences more quickly compared to those reading in their first language. The authors also found that using multiple languages frequently in a variety of different contexts was associated with perceiving more mentalizing content in the mentalizing sentences.
Overall, these findings suggest that bilingualism influences social cognitive abilities in adults across a variety of tasks.

Current study
The current study builds on past work by testing the association between multilingualism and mentalizing. By capitalizing upon a large archival dataset, an aggregation of many past datasets from our laboratory, this question was investigated with a much larger sample size than in previous research. These datasets included scores for the most widely-used measure of mentalizing ability in adults, the Reading-the-Mind-in-the-Eyes Test (RMET; Baron-Cohen, Wheelwright, Hill, Raste, & Plumb, 2001). The RMET has been cited over 6,500 times according to Google Scholar as of August 2022 and has now had many translated versions validated, including in German (Pfaltz, McAleese, Saladin, Meyer, Stoecklin, Opwis, Dammann, & Martin-Soelch, 2013), French (Prevost, Carrier, Chowne, Zelkowitz, Joseph, & Gold, 2014), Italian (Vellante, Baron-Cohen, Melis, Marrone, Petretto, Masala, & Preti, 2013), and Spanish (Fernández-Abascal, Cabello, Fernández-Berrocal, & Baron-Cohen, 2013). The test involves presenting photographs of a person's eye region, and participants must choose which of four possible adjectives best describes what the person is thinking or feeling. Recognizing emotions from facial cues is an important aspect of mental inference and is measured by the RMET (Oakley, Brewer, Bird, & Catmur, 2016). However, a recent meta-analysis confirmed that other aspects of mentalizing are just as important contributors for this task (e.g., understanding causality and perceptual discrimination of socially relevant stimuli; Kittel, Olderbak, & Wilhelm, 2021), as an inspection of the items would also suggest (e.g., recognizing mental states such as preoccupied and tentative). 
Individuals with known social impairments tend to do worse on the RMET, including those with autism (Baron-Cohen et al., 2001), schizophrenia (Kettle, O'Brien-Simpson, & Allen, 2008; Köther, Veckenstedt, Vitzthum, Roesch-Ely, Pfueller, Scheu, & Moritz, 2012; Schimansky, David, Rössler, & Haker, 2010), and social anxiety (Machado-de-Sousa, Arrais, Alves, Chagas, de Meneses-Gaya, Crippa, & Hallak, 2010).
In order to adopt a conservative approach to examining our research question, we included and controlled for several variables relevant to mentalizing. Recent meta-analyses found that women scored higher than men on the RMET (e.g., Hall, Hutton, & Morgan, 2010; Kirkland, Peterson, Baker, Miller, & Pulos, 2013; McClure, 2000) and that verbal ability also contributes to RMET performance (Peñuelas-Calvo, Sareen, Sevilla-Llewellyn-Jones, & Fernández-Berrocal, 2019). For example, RMET is correlated with verbal intelligence at about r = .24, 95% CI [.13, .34], according to a meta-analysis by Baker et al. (2014). In addition to controlling for gender and years of English fluency, we also controlled for the Other-Race effect (i.e., the same-race bias), in which faces from other races are processed with greater difficulty (Golby, Gabrieli, Chiao, & Eberhardt, 2001; Kelly, Quinn, Slater, Lee, Ge, & Pascalis, 2007). In sum, our analyses controlled for gender, the same-race bias, and years of English fluency.
In addition to extending past work into the realm of mentalizing, analyzing a larger sample, and controlling for several relevant demographic variables, our study adds nuance by considering multilingualism in a continuous fashion. Several researchers have proposed that participants should not be categorized into two discrete groups based on language abilities (i.e., monolingual vs. multilingual), but rather that multilingualism should be examined along a range of factors due to its multidimensional nature (Baum & Titone, 2014; de Bruin, 2019; Kaushanskaya & Prior, 2015; Luk & Bialystok, 2013; Whitford & Luk, 2019). It remains relatively unknown whether knowing more than two languages provides any additional benefit to mentalizing. The majority of research to date has investigated only bilingual environments in comparison to monolingual ones, but our data allowed us to examine whether there is a linear association between the number of languages known and mentalizing ability (although one study did not find an association between number of languages and social cognition; Navarro et al., 2022). To summarize, our study builds on past work examining the relation between bilingualism and mentalizing by analyzing a large archival sample who completed an adult measure of mentalizing, while controlling for relevant demographic variables and examining multilingualism as a categorical and continuous variable.

Selection of studies
Twenty-eight pre-existing datasets collected from 2011 to 2020 for separate and unrelated studies were aggregated. These datasets were selected because they included scores for the RMET, demographic information regarding languages, and at least 10 valid participants with complete data for the RMET. Informed consent for the use of these data was obtained at the time of testing and ethics approval was obtained from the local Institutional Review Board for each study.

Participants
The initial sample size after aggregation was 2,443. Our pre-registered exclusion criteria removed participants who were missing any responses for the RMET items (n = 269), who reported an unusually low or high age in years (ages 8, 10, and 99 were removed; n = 3), or who did not report the language information needed for this study (n = 15). Since all pre-registered analyses controlled for gender, race, and years speaking English, participants were also automatically removed if they were missing any of these covariates because regression uses list-wise deletion (i.e., a participant missing even one covariate is removed from the analysis; n = 164).
Our final sample consisted of 1,995 participants. The average age of respondents was 23.38 years (SD = 9.25) and 64% were female. Participants had spent an average of 21 years speaking English (M = 21.44, SD = 10.46). Most were university undergraduates from a large and multicultural city in Canada, with 11% (n = 223) coming from Amazon's Mechanical Turk (MTurk; an online crowdsourcing platform). Most were of European (49%) or Asian (30%) heritage. Table 1 presents a detailed breakdown of the cultural and linguistic diversity of the total sample. Across the entire sample, 68% (n = 1,376) were L1 English speakers (English as the first acquired language). Forty-two percent of the sample reported speaking one language (n = 840), 53% reported speaking two languages (n = 1,056), 5% reported speaking three languages (n = 95), and only four people reported speaking four or more languages.

Measures
Reading-the-Mind-in-the-Eyes Test (RMET; Baron-Cohen et al., 2001). All of our datasets included the RMET as a measure of mentalizing, in which participants must correctly identify the mental state of a person based on a grayscale image of their eye-region (e.g., fantasizing, decisive). Responses were selected from four possible options and all options were accompanied by definitions to reduce the language load of the task. The RMET was scored as the total number of items answered correctly, out of a maximum of 36. This measure has demonstrated acceptable test-retest reliability across numerous countries and translations (e.g., Fernández-Abascal et al., 2013; Pfaltz et al., 2013; Vellante et al., 2013). The RMET has also been validated by capturing expected group differences in mentalizing ability, such as between women and men, and between neurotypical controls and those with high-functioning autism or Asperger's syndrome (Baron-Cohen et al., 2001; Khorashad, Baron-Cohen, Roshan, Kazemian, Khazai, Aghili, Talaei, & Afkhamizadeh, 2015; Warrier, Bethlehem, & Baron-Cohen, 2017). Lastly, the RMET is associated with neurophysiological responses consistent with mentalizing (Adams, Rule, Franklin, Wang, Stevenson, Yoshikawa, Nomura, Sato, Kveraga, & Ambady, 2010; Domes, Heinrichs, Michel, Berger, & Herpertz, 2007).
Language background information. All participants self-reported their first and second language, and the number of languages in which they were fluent. These data were coded in various ways in order to capture different aspects of multilingualism. First, Multilingual Status was coded as a binary predictor in which those who reported knowing only one language were coded as monolingual, with those reporting two or more languages coded as multilingual. Second, to fully explore whether multilingualism has advantages over bilingualism, the number of fluent languages was examined as both a continuous predictor (i.e., total number of languages) and a categorical predictor (i.e., monolingual vs. bilingual vs. multilingual). The latter approach accounts for the fact that we did not know in advance, at the time of pre-registration, whether the range of languages in our sample was sufficient to support treating multilingualism as a continuous predictor. Lastly, because the RMET is administered in English, we conducted an exploratory analysis to see whether the effect of multilingualism depends on whether English is the first language learned, by creating a categorical variable with three levels: English-Monolingual, English-Multilingual, and Other-Multilingual. This allowed us to investigate whether any multilingual advantage is boosted when operating in a non-native tongue, in line with the results of Tiv and colleagues (2021).
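The coding scheme described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Participant` record, its field names, and the three function names are hypothetical and do not come from the original datasets or analysis scripts, and the sketch assumes that every monolingual participant is an English speaker, since the source defines no category for other monolinguals.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    # Hypothetical record; field names are illustrative only.
    n_languages: int      # self-reported number of fluent languages
    l1_is_english: bool   # whether English was the first language learned


def multilingual_status(p: Participant) -> int:
    """Binary predictor: 0 = monolingual, 1 = multilingual (two or more)."""
    if p.n_languages < 1:
        raise ValueError("n_languages must be at least 1")
    return 0 if p.n_languages == 1 else 1


def language_group(p: Participant) -> str:
    """Three-level predictor: one, two, or three or more languages."""
    if p.n_languages < 1:
        raise ValueError("n_languages must be at least 1")
    if p.n_languages == 1:
        return "monolingual"
    if p.n_languages == 2:
        return "bilingual"
    return "multilingual"


def l1_group(p: Participant) -> str:
    """L1-by-multilingual status with the three levels named in the text."""
    if multilingual_status(p) == 1:
        return "English-Multilingual" if p.l1_is_english else "Other-Multilingual"
    if p.l1_is_english:
        return "English-Monolingual"
    # Assumption: the source defines no level for non-English monolinguals.
    raise ValueError("no category defined for non-English monolinguals")


print(language_group(Participant(n_languages=3, l1_is_english=False)))
print(l1_group(Participant(n_languages=2, l1_is_english=True)))
```

Keeping the three codings as separate functions mirrors the text, where each coding feeds a different model rather than being combined in one.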

Control variables
Gender, race, and years of English fluency were controlled for, as they have all been associated with mentalizing. We controlled for gender, as women consistently demonstrate better mentalizing abilities than men (Baron-Cohen, Bowen, Holt, Allison,  Kirkland et al., 2013; McClure, 2000). As the RMET targets were all White, there is a possibility of a cross-race effect in that participants who share the race of the RMET targets will be more accurate in recognizing the mental states compared to participants of another race (Adams et al., 2010; Elfenbein & Ambady, 2003). Thus, we used cultural background to compute a race variable, where 0 = Other-Race and 1 = Same-Race. Although all mental states in the RMET were accompanied by definitions in order to reduce the reliance on vocabulary, we also controlled for years of English fluency as a proxy for language proficiency (as in the meta-analysis by Schroeder, 2018). Although we pre-registered that we would use age as a control variable, due to high collinearity with years of English fluency (r = .91) we deviated from our pre-registration and omitted age as a control variable.

Descriptive statistics
All analyses were conducted with R in RStudio (version 4.1.2; RStudio Team, 2020). Zero-order correlations between all variables appear in Table 2. Dichotomous variables were dummy-coded (see Table 2 notes). The average RMET accuracy was 24 out of 36 items correct (68%; M = 24.39, SD = 5.40), consistent with past reports with non-clinical adult samples (Baron-Cohen et al., 2001; Dietze & Knowles, 2021; Kraus, Côté, & Keltner, 2010). These scores ranged from 4 to 36 items answered correctly, again consistent with past work (Dietze & Knowles, 2021). Being female, having more years of experience speaking English, and being the same race as the RMET targets were all associated with higher RMET scores. Having English as a first language was also associated with higher RMET scores. Contrary to our expectations, being multilingual, compared to being monolingual, was associated with lower RMET scores. Consistent with this, we also observed a negative correlation between the number of languages spoken and RMET scores.

Primary analyses
The following primary analyses were all pre-registered (https://osf.io/ngyf4/). The data satisfied all assumptions for regression and all reported regression estimates are unstandardized. We conducted hierarchical regressions to examine whether our language predictors predict mentalizing accuracy, above and beyond demographic variables. In Step 1, we controlled for gender, same- or other-race as target, and years of English fluency. In Step 2, we entered multilingual status as a predictor (0 = monolingual; 1 = multilingual). We observed that being multilingual (speaking two or more languages) was associated with lower mentalizing accuracy compared to being monolingual, b = -0.99, 95% CI [-1.53, -0.45], p < .001. Adding this variable improved the model over and above the control variables, χ²(1, N = 1,995) = 12.86, p < .001 (see Figure 1A).
In a parallel model, we used the number of languages known by participants in Step 2. We pre-registered that we would look at the number of languages as a continuous variable and as a categorical variable because we were unsure whether there would be enough participants who knew more than three languages. As suspected, the low counts for those who have four or five languages (n = 4) make it inappropriate to treat number of languages as a continuous variable. Instead, we opted to use number of languages as a categorical variable, coding for those with one, two, or three or more languages (see Table 3 for sample characteristics by number of languages). Given that the categories have a meaningful order, we also tested the number of languages as an ordered categorical variable (i.e., ordinal).
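The two-step logic of a hierarchical regression (fit the control variables first, then add the predictor of interest and ask whether the fit improves) can be illustrated with a small, self-contained Python sketch. Everything here is an assumption made for illustration: the function `fit_ols`, the variable names, and the ten rows of toy data are invented and are not the study's data. The sketch also compares the models by the change in R², not by the χ² model comparison reported above.

```python
def fit_ols(predictors, outcome):
    """Ordinary least squares with an intercept, solved through the normal
    equations with Gauss-Jordan elimination. Returns (coefficients, R^2)."""
    rows = [[1.0] + [float(v) for v in row] for row in predictors]
    k = len(rows[0])
    # Augmented matrix [X'X | X'y]
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * y for r, y in zip(rows, outcome))]
        for i in range(k)
    ]
    for col in range(k):
        # Partial pivoting for numerical stability
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    coefs = [aug[i][k] / aug[i][i] for i in range(k)]
    fitted = [sum(c * v for c, v in zip(coefs, r)) for r in rows]
    mean_y = sum(outcome) / len(outcome)
    ss_res = sum((y - f) ** 2 for y, f in zip(outcome, fitted))
    ss_tot = sum((y - mean_y) ** 2 for y in outcome)
    return coefs, 1.0 - ss_res / ss_tot


# Invented toy rows: (female, same_race, years_english, multilingual, rmet)
toy = [
    (1, 1, 20, 0, 27), (0, 1, 22, 0, 25), (1, 0, 18, 1, 24),
    (0, 0, 10, 1, 20), (1, 1, 25, 1, 25), (0, 1, 19, 1, 22),
    (1, 0, 21, 0, 26), (0, 0, 12, 1, 21), (1, 1, 30, 0, 28),
    (0, 0, 15, 0, 23),
]
rmet = [row[4] for row in toy]

# Step 1: control variables only
_, r2_step1 = fit_ols([row[:3] for row in toy], rmet)
# Step 2: controls plus multilingual status (0 = monolingual, 1 = multilingual)
coefs_step2, r2_step2 = fit_ols([row[:4] for row in toy], rmet)

print("R^2 step 1:", round(r2_step1, 3))
print("R^2 step 2:", round(r2_step2, 3))
print("b for multilingual status:", round(coefs_step2[4], 3))
```

Because the Step 1 model is nested in the Step 2 model, R² cannot decrease when the predictor is added; the inferential question is whether the gain is larger than chance would produce.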
We entered the number of languages as a categorical variable into the model as two dummy-coded variables for (1) monolingual versus bilingual, and (2) Figure 1B). Although multilinguals performed better than bilinguals (d = -0.20, 95% CI [-0.40, 0.01]), the post-hoc t-test was not statistically significant, t(1989) = 1.88, p = .15. The addition of these dummy-coded variables improved the model, χ²(1, N = 1,995) = 16.40, p < .001, and the variance explained by the model, adjusted R² = 0.07. Although being bilingual is associated with lower mentalizing accuracy compared to being monolingual, monolingual and multilingual speakers exhibit similar mentalizing accuracy.
In a third parallel model, we entered the number of known languages into the model as an ordinal variable, as an alternative to testing the number of languages continuously. In R, inputting an ordinal variable in a regression model automatically tests for higher-order polynomials up to the number of levels minus 1. Given that number of languages has three levels (i.e., monolingual, bilingual, multilingual), both linear and quadratic trends were automatically tested. The linear term was not statistically significant, suggesting that there is no evidence of a linear association between number of languages and mentalizing accuracy, b = -0.03, 95% CI [-0.83, 0.77], p = .95. However, the quadratic term was statistically significant, suggesting a curvilinear association between number of languages (as an ordinal variable) and mentalizing, b = 0.86, 95% CI [0.34, 1.38], p = .001 (see Figure 1C). The addition of number of languages as linear and quadratic terms improved the model, χ²(1, N = 1,995) = 16.40, p < .001, and the variance explained by the model, adjusted R² = 0.07. This is consistent with our categorical findings, in which monolinguals and multilinguals did not differ in their mentalizing ability but bilinguals were worse than monolinguals, suggesting a U-shaped trend.
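The linear and quadratic terms tested for a three-level ordinal predictor correspond to orthogonal polynomial contrasts. As a rough illustration, the Python sketch below rebuilds such contrasts with Gram-Schmidt orthogonalization for equally spaced levels. The function name `poly_contrasts` and the group means are hypothetical and are not the study's estimates. The sketch ignores covariates, so it does not reproduce the reported coefficients; it only shows why a dip in the middle group appears as a positive quadratic term.

```python
import math


def poly_contrasts(n_levels):
    """Orthonormal polynomial contrasts (linear, quadratic, ...) for equally
    spaced ordered levels, built by Gram-Schmidt on powers of the level score."""
    scores = [i - (n_levels - 1) / 2 for i in range(n_levels)]  # centered
    basis = [[1.0] * n_levels]  # constant term, dropped before returning
    for degree in range(1, n_levels):
        v = [s ** degree for s in scores]
        for b in basis:
            proj = sum(vi * bi for vi, bi in zip(v, b)) / sum(bi * bi for bi in b)
            v = [vi - proj * bi for vi, bi in zip(v, b)]
        norm = math.sqrt(sum(vi * vi for vi in v))
        basis.append([vi / norm for vi in v])
    return basis[1:]


linear, quadratic = poly_contrasts(3)
print([round(w, 3) for w in linear])     # weights for mono-, bi-, multilingual
print([round(w, 3) for w in quadratic])

# Hypothetical group means (NOT the study's values): a dip in the middle group.
means = [25.0, 24.0, 25.0]
linear_trend = sum(w * m for w, m in zip(linear, means))
quadratic_trend = sum(w * m for w, m in zip(quadratic, means))
print(round(linear_trend, 3), round(quadratic_trend, 3))
```

With no covariates, each trend is the contrast-weighted sum of the group means. Equal outer means give a linear trend of zero, and a lower middle mean gives a positive quadratic trend, which is the U-shaped pattern described in the text.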

Secondary analyses
The following analyses were conducted without a priori predictions about the results, but they were included in our pre-registration. We first examined whether the association between multilingualism and mentalizing accuracy depends on whether the participant was an L1 English speaker. We did this by creating a categorical variable for L1-multilingual status, resulting in three categories: English-monolingual, English-multilingual (i.e., multilingual with L1 English), and Other-multilingual (i.e., multilingual with a non-English L1). Based on this variable, we entered two dummy-coded variables into the model with English-multilingual as the reference category: (1) English-multilingual versus English-monolingual, and (2) English-multilingual versus Other-multilingual. The first dummy-coded variable tests whether multilinguals have worse mentalizing than monolinguals while only looking at English L1 speakers. The second tests whether multilinguals' mentalizing accuracy is affected by whether they are an English L1 speaker or not. Compared to being an English-multilingual (M = 23.87, SE = 0.22), being an English-monolingual (M = 24.82, SE = 0.20) was associated with better mentalizing accuracy, b = 0.94, 95% CI [0.35, 1.53], p = .002. This demonstrates that the poorer mentalizing performance observed in multilinguals relative to monolinguals is not exclusive to those who do not have English as their native tongue. There was no difference observed between English-multilinguals and Other-multilinguals (M = 23.75, SE = 0.24), t(1989) = 0.39, p = .92, d = 0.02, 95% CI [-0.09, 0.14], suggesting that the deficit observed for multilinguals is not a result of having English as a second language. Entering both dummy-coded variables improved the model, χ²(1, N = 1,995) = 13.02, p = .001, and the variance explained by the model, adjusted R² = 0.07 (Figure 1D).
Multilingualism, regardless of whether English was the first language or not, is therefore associated with lower mentalizing accuracy as captured by the RMET.

Exploratory analyses
Given the observed curvilinear effects, it was important to also examine whether this U-shaped pattern between monolinguals, bilinguals, and multilinguals is also present if we examine only the L1 English speakers (n = 1,376; see Table 4 for sample characteristics). This analysis was exploratory and not pre-registered. For L1 English speakers only, we specified a regression model with gender, same- or other-race as target, years of English fluency, and the number of languages as an ordinal variable. We again observed a quadratic effect for number of languages on mentalizing accuracy (b = 1.14, 95% CI [0.34, 1.94], p = .005), but not a linear effect, b = 0.43, 95% CI [-0.81, 1.68], p = .49 (see Figure 2). Specifically, being an L1 English monolingual (M = 25.10, SE = 0.20) was associated with better mentalizing accuracy than being  14.

Discussion
The current study investigated whether knowing more than one language leads to better mentalizing abilities in adults. In a large sample of almost 2,000 participants, monolinguals were more accurate than multilinguals (speakers of two or more languages) on a task-based measure of mentalizing abilities, the RMET. For all analyses, we controlled for gender, same-or otherrace as target, and years of English fluency, demonstrating that these effects are robust and not a function of these other relevant variables. When the multilingual group was further broken down into two groups (speakers of two languages versus speakers of three or more languages), a quadratic U-shaped trend across language groups emerged: monolinguals did better than bilinguals (speakers of only two languages), but multilinguals (speakers of three or more languages) also did better than bilinguals. Moreover, the U-shaped pattern remained even when only those who with English as their first language was examined (with English being the language of the test). Overall, these findings highlight the need to (1) differentiate speakers of two languages from speakers of three or more languages in research on multilingualism and cognition, and (2) make concerted efforts to recruit larger samples of multilingual participants to allow for adequate comparisons between language groups (as suggested by Brysbaert, 2021). The results we observed ran counter to our original theorizing and depart from past demonstrations that bilingualism is associated with a ToM advantage in both children (see Schroeder, 2018 for a meta-analysis) and adults (Navarro & Conway, 2021;Navarro et al., 2022;Rubio-Fernández & Glucksberg, 2012). A great deal of our initial theorizing was rooted in a potential executive functioning advantage for bilinguals. 
Undermining this line of reasoning, however, is that several recent meta-analyses and a very large-sample study fail to find an executive functioning advantage for bilinguals (Dick, Garcia, Pruden, Thompson, Hawes, Sutherland, Riedel, Laird, & Gonzalez, 2019;Lehtonen, Soveri, Laine, Järvenpää, de Bruin, & Antfolk, 2018;Lowe, Cho, Goldsmith, & Morton, 2021;Paap, Johnson, & Sawi, 2015). With respect to previous findings, as our study is based on a far larger sample than past studies our results are likely the best available estimates of these effects, with larger samples reducing the likelihood of observing falsely-positive results (Green, Munafò, DeYoung, Fossella, Fan, & Gray, 2008;Pashler & Harris, 2012). In addition, we may have also observed a different pattern of findings by controlling for several relevant nuisance variables that were not always taken into account in previous research (i.e., gender, same-or other-race as target, years of English fluency). Lastly, differences between studies might originate in the different tasks employed. We examined performance on the RMET, which measures the ability to infer mental states based on subtle nonverbal cues. Previous studies used either the Sally-Anne task (Rubio-Fernández & Glucksberg, 2012) or the director task (Navarro & Conway, 2021;Navarro et al., 2022), which may recruit executive functioning to a greater extent than the RMET. For example, in the director task, participants are required to ignore their own predominant viewpoint and focus on the director's viewpoint. Similarly, in the Sally-Anne task, participants are required to ignore the new location of the object (i.e., Anne's perspective) and instead report the object's previous location (i.e., Sally's perspective). In addition, the Sally-Anne task recruits working memory processes as participants need to keep track of the story. The RMET, in comparison, relies little on inhibition and working memory.
Differences in how monolinguals and bilinguals process faces may lie at the root of the differences we observed. The RMET involves deciphering the mental state of an actor based on the information from their eye region, and recent research indicates that bilingual adults tend to be slower at processing faces than monolingual adults (Hausmann, Durmusoglu, Yazgan, & Güntürkün, 2004; Kandel, Burfin, Méary, Ruiz-Tada, Costa, & Pascalis, 2016). This may be because bilinguals spend more time than monolinguals processing additional contextual information pertaining to race and language (Kandel et al., 2016). This, however, seems unlikely to be the full picture for our data, as all targets were the same race and it cannot explain why multilinguals (speaking three or more languages) seemed to do better than bilinguals. Future research should examine the face-processing capabilities of multilinguals, in addition to monolinguals and bilinguals. In addition, eye-tracking data might help inform the processes underlying the U-shaped pattern of results we observed.
One other possible explanation could be rooted in the observation that bilinguals perform worse than monolinguals on verbal tasks. For example, bilingual children and adults generally have a smaller vocabulary (e.g., Bialystok et al., 2022), are slower and less accurate at naming pictures (e.g., Gollan, Montoya, Fennema-Notestine, & Morris, 2005; Roberts, Garcia, Desrochers, & Hernandez, 2002), and generate fewer items on verbal fluency tasks relative to their monolingual peers (e.g., Bialystok, Craik & Luk, 2008; Portocarrero, Burright, & Donovick, 2007; Sandoval, Gollan, Ferreira, & Salmon, 2010). That said, the worse performance of bilinguals on the RMET, compared to monolinguals, is unlikely to be due to worse verbal skills. First, our results were observed even after controlling for years of English fluency and with definitions provided for all of the target mental state terms. Second, our multilingual group performed around the same as our monolingual group. Presumably, this multilingual group should be even less proficient in English than the bilingual group, given that they are dividing their time between three or more languages. That multilinguals performed on par with the monolingual group, rather than as poorly as the bilinguals, is therefore inconsistent with the idea that differences in verbal ability are driving our results. Third, the U-shaped trend we observed persisted even when we only examined those who had English as their first language (i.e., they had life-long English skills). Nonetheless, we were limited in our language measures as we were analyzing archival data. Future studies designed to examine this topic should include nuanced measures of language ability, including both subjective and objective measures of proficiency.
A related limitation of the current study is the lack of a detailed language background questionnaire to assess each participant's language experience. Questions pertaining to different aspects of bi-/multilingualism, such as the context of language acquisition (home, school, travel, or work), age of acquisition, and relative frequency of language use, would have provided a better picture of our participants' linguistic profile. As it stands, we do not know what level of proficiency the multilingual speakers had in their third language, for example. Such information would provide important context when interpreting the differences we observed between our bilinguals and multilinguals. In addition, asking participants about the language background of their main conversation partners could further elucidate our results. Similar to Navarro and colleagues (2022), we may find that other factors predict mentalizing ability, such as the number of speakers they interact with on a daily basis who speak their second or third language, or the amount of daily exposure to their other languages.
In sum, our study demonstrates that bilingual adults have worse mentalizing abilities than monolinguals, but that this difference appears to vanish when considering people who speak three or more languages. Moreover, these results cannot be attributed to differences in gender, race, or years of English fluency, nor to whether people have English as their first language. Because of the necessarily correlational nature of this research, however, the direction of causality cannot be inferred. Language experience may shape mentalizing ability, or those who are better at mentalizing may be more motivated to acquire multiple languages to better connect with others. Future research on this topic should attempt to collect large samples, control for potential nuisance variables, examine multilingualism in addition to bilingualism, attempt to evaluate motivations for becoming multilingual, and also explore key linguistic and socio-linguistic factors.

Data Availability Statement
All materials, the analysis script, and the Stage 1 manuscript will be made publicly available upon publication on the Open Science Framework (osf.io/vtjea). The raw data will be made promptly available to any researchers upon request.
Conflict of Interest. The authors have no conflicts of interest to declare.