Highlights
• We focused on monolingual and bilingual native speakers of English.
• We measured their attention allocation to spoken language using pupil dilation.
• Bilinguals and monolinguals allocate attentional resources differently.
1. Introduction
In recent decades, numerous studies have investigated the effects of bilingualism on cognition. While debated, the extant literature suggests that bilingual experience can lead to domain-general cognitive benefits, termed the bilingual cognitive advantage (e.g., Adesope et al., 2010; Bialystok, 2017; Grundy, 2020; cf. Bao et al., 2024; Lehtonen et al., 2018; Lowe et al., 2021). For example, bilinguals have shown better executive functions in the nonlinguistic domain relative to monolinguals (e.g., Crivello et al., 2016; Stocco et al., 2012). Attention has been proposed as a possible mechanism for such effects. According to the attentional control framework, bilingual experience can enhance attentional control, a repertoire of processing operations that higher level cognition uses to accomplish goals (Bialystok, 2017; Bialystok & Craik, 2022). In particular, the continuing need to manage two languages results in more efficient use of attentional resources, even in nonlinguistic settings (Bialystok & Craik, 2022).
Attention entails at least two aspects: the control processes and the processing resources (e.g., Bialystok & Craik, 2022; Kahneman, 1973; Shiffrin & Schneider, 1977). The former are necessary for managing cognitive or motor tasks, and the latter for supporting those activities (Bialystok & Craik, 2022). Nevertheless, bilingualism research has primarily studied attentional control (e.g., Krizman et al., 2014; Ooi et al., 2018; Soveri et al., 2010; Stafford, 2011). While investigators may speculate about the amount of attentional resources involved in certain cognitive tasks, such assumptions have rarely been tested explicitly. We thus designed this study to examine whether monolinguals and bilinguals differ in attentional resource allocation – an underexplored area in the bilingual attention literature – quantified via pupil dilation (i.e., increased pupil diameter). Specifically, we contextualized this question within spoken language processing, which extends the study of the bilingual cognitive advantage to the linguistic domain.
As a well-established physiological measure, pupil dilation has been used to index attention allocation changes (e.g., Hess & Polt, 1960; Laeng et al., 2012) and indicate attentional effort exerted in a cognitive task (e.g., Kahneman, 1973; van der Wel & van Steenbergen, 2018). In auditory processing, increased pupil dilation has been considered a proxy of greater listening effort, which refers to the deliberate allocation of attentional resources in goal pursuit when carrying out a listening task (Pichora-Fuller et al., 2016). Research also suggests that the pupil response is sensitive to input- and listener-based variables (e.g., Zekveld et al., 2018). For example, larger pupil size is associated with increases in stimulus complexity and processing demands (e.g., McCloy et al., 2017; Piquado et al., 2010), and with bilingual status when comparing monolinguals to second language (L2) learners, who are usually sequential bilinguals who learned the L2 after the age of three (e.g., Borghini & Hazan, 2018, 2020; further discussed below).
However, the relationship between pupil size and individual linguistic/cognitive abilities remains unclear given mixed results (e.g., better abilities associated with larger pupil size in Koelewijn et al., 2012 and Zekveld & Kramer, 2014, but with smaller pupil size in Koelewijn et al., 2014 and Zekveld et al., 2014).
2. Pupil dilation in spoken language processing in prior bilingualism research
Some studies have demonstrated that increased pupil dilation reflects greater listening effort or processing load in bilinguals during spoken language processing. For instance, Borghini and Hazan (2018, 2020) observed significantly larger pupil size in Italian-English bilinguals than in English monolinguals when listening to English sentences, both in quiet and noisy environments. They interpreted this as evidence that bilinguals exert more effort when processing speech in an L2. Similarly, Francis et al. (2018) reported that native Dutch speakers with varying L2-English proficiency showed greater pupil dilation when processing spoken sentences in English compared to their first language (L1). In developmental research, Brännström et al. (2021) found that bilingual Swedish children (ages 7–9) had more pupil dilation when processing spoken passages in a typical classroom condition than in a favorable setting with less noise, again supporting the idea that bilingual language processing may demand more listening effort.
The picture, however, becomes more complicated when considering individual differences (e.g., language proficiency and use, cognitive ability). For example, the modulatory role of L2 proficiency in the relationship between pupil dilation and effort appears inconsistent. In a picture–word matching experiment, Schmidtke (2014) examined English spoken word recognition among English monolinguals and Spanish-English bilinguals who learned English either in childhood or adulthood. He found that bilinguals with higher English proficiency showed smaller pupil dilation, suggesting that increased proficiency may ease processing and reduce effort. However, Borghini and Hazan (2018, 2020) showed that bilinguals’ linguistic profiles (e.g., self-reported English proficiency and use) did not reliably predict pupil size. Yet, Beatty-Martínez et al. (2021) found that early Spanish-English bilinguals with more code-switching experience (i.e., alternating between two languages in conversation) and better attention ability (measured by the elevator counting task) had larger pupil dilation when processing code-switched speech – indicative of greater cognitive engagement.
In addition, contextual factors (e.g., task demands) also shape pupillary responses. In an earlier study with highly proficient Finnish-English bilinguals, Hyönä et al. (1995) observed smaller pupil size during active listening compared to more demanding tasks (e.g., repetition or simultaneous interpretation), suggesting a positive link between processing difficulty and pupil dilation. Taken together, prior findings in the literature illustrate that pupil dilation may not simply be associated with listening effort but reflect the combined effects of individual differences and contextual factors. Nonetheless, it seems that sequential bilinguals often require more listening effort when processing L2 compared to L1, and relative to monolinguals processing the same language (e.g., Borghini & Hazan, 2018, 2020; Francis et al., 2018). Yet the influence of age of acquisition (AoA) and proficiency complicates interpretations of pupil data (e.g., Schmidtke, 2014; cf. Francis et al., 2018).
Notably, little research has investigated the processing of a shared L1 among monolinguals and simultaneous bilinguals. In contrast to sequential bilinguals, simultaneous bilinguals are those who learned both languages before the age of three and typically have comparable proficiency across languages. This is the entry point of our work in terms of the population being compared and the language being tested. Specifically, we chose to test simultaneous bilinguals because it enables us to understand more about the effects of a bilingual vs. monolingual environment on cognition, while minimizing the confounding influences of AoA and language proficiency across language groups.
3. The current study
The goal of our study was to investigate how listening effort, indexed by pupil dilation, is influenced by language environment and individual difference (e.g., language proficiency, cognitive ability) factors in monolinguals and simultaneous bilinguals during spoken language processing (pre-registered on the Open Science Framework at https://osf.io/csz3a). We approached this goal through the lens of attentional resources, an aspect of attention overlooked in bilingualism research. While bilinguals are suggested to have enhanced attentional control, especially in the nonlinguistic domain (e.g., Bialystok, 2017; Bialystok & Craik, 2022), it is often assumed that this benefit stems from their practice of language switching. Focusing on listening effort in speech processing, our study probed the origin of the bilingual advantage by shifting the spotlight from the nonlinguistic domain to a linguistic context and examining effects that contribute to monolingual vs. bilingual differences beyond language switching.
3.1. Population
Considering bilingualism as a continuum (e.g., Luk & Bialystok, 2013; Marian & Hayakawa, 2021), we tested two groups of native Canadian English speakers with comparable age and socioeconomic status (SES) background: monolinguals and simultaneous heritage bilinguals who had similar AoA of English. These two groups represent the two endpoints of the bilingualism spectrum with distinct linguistic environments. For example, compared to monolinguals, simultaneous bilinguals have less English exposure but higher linguistic complexity (e.g., more phonological and syntactic variability from different languages). Additionally, we were able to minimize the effects of various confounders that differ across monolinguals and sequential bilinguals (e.g., AoA, language proficiency), and consider factors that vary within our simultaneous bilingual sample.
3.2. Factors
Combining the categorical and continuous approaches to bilingualism (Kremin & Byers-Heinlein, 2021), we assigned participants to a monolingual or bilingual group while also factoring in continuous individual variation within each group. As such, we analyzed not only the categorical variable of language group, but also continuous variables related to language environment (e.g., language exposure, linguistic complexity) and individual differences (e.g., language proficiency, cognitive ability). By doing so, we assessed the modulatory role of various environmental and individual variables on pupil responses, which has not been systematically addressed in previous bilingualism research on spoken language processing.
3.3. Conditions
We designed two conditions in which participants listened to speech that varied in language familiarity. In an active listening task, participants were presented with passages spoken in a familiar (English) or unfamiliar (Hebrew) language. Neuroimaging evidence from monolinguals has shown that speech processing of an unfamiliar language, without semantic support, is more effortful than that of L1. For example, Cotosck et al. (2021) found that the brain network associated with attention-demanding tasks was activated in Brazilian listeners only when a story was presented in a language unknown to the participants (Japanese) but not in their L1 (Portuguese). Similar results were also obtained by Hernández et al. (2019) among Spanish speakers. Further, developmental studies suggest that simultaneous bilingual infants and toddlers appear to be more attentive to novel linguistic information, which may be explained by their cognitive adaptations to the bilingual environment, which features more novelty and variance than its monolingual counterpart (e.g., Costa & Sebastián-Gallés, 2014; Kovács, 2015; Molnar et al., 2021). Whether this is the case in adulthood, once more linguistic experience has accumulated, remains unclear.
Concerning the effects of bilingualism on listening effort, as indexed by pupil dilation during spoken language processing, we asked the three research questions described below. Our overarching hypothesis was that bilingual experience would alter pupillary response. To test it, we implemented a categorical comparison (i.e., monolingual vs. bilingual participants) in question (i), adopted a continuous approach (via language exposure) in question (ii) and examined the role of other control variables (i.e., language proficiency, cognitive ability) in question (iii).
(i) Is there a difference in pupil size between monolinguals and bilinguals when they listen to passages in different conditions: spoken in a familiar or unfamiliar language? We predicted that the unfamiliar language condition would elicit larger pupil size in monolinguals since it is more cognitively demanding, considering evidence from brain imaging data (e.g., Cotosck et al., 2021; Hernández et al., 2019). Further, as bilinguals have less English exposure and show increased pupil dilation in previous studies (e.g., Borghini & Hazan, 2018, 2020), we hypothesized that they may have larger pupil size than monolinguals when listening to the familiar language.
(ii) What are the effects of language exposure (e.g., the amount of English exposure, length of stay in an English-speaking environment) and linguistic complexity (e.g., the number of languages known) on participants’ pupil size during speech processing? Given previous evidence suggesting smaller pupil dilation in monolinguals than bilinguals when listening to English (e.g., Borghini & Hazan, 2018, 2020), we hypothesized that monolingual characteristics, such as more English exposure, a longer stay in an English-speaking environment and fewer languages known, would correlate with smaller pupil size in the familiar language condition.
(iii) Are participants’ language proficiency and cognitive ability associated with their pupil size during spoken passage processing? In light of mixed findings about the relationship between pupil dilation and linguistic/cognitive capabilities (e.g., Beatty-Martínez et al., 2021; Koelewijn et al., 2012; Schmidtke, 2014; cf. Koelewijn et al., 2014; Zekveld et al., 2014), we considered this as an exploratory question and thus did not have a specific hypothesis.
4. Methods
4.1. Participants
We recruited 73 young adults through flyers, word of mouth, and online posts in Toronto, Ontario, Canada. A great majority (82.19%) were undergraduate and graduate students at the University of Toronto, with the rest mainly being alumni and employees in the neighboring community. To be eligible, participants had to: (i) be aged between 18 and 25 years; (ii) have been born in or immigrated to Canada before the age of three; (iii) be either an English-speaking monolingual or an English-speaking bilingual who learned both or all languages before the age of three, and not know Hebrew (as it was the unfamiliar language in the experiment); (iv) have normal hearing and normal or corrected-to-normal vision. Some monolingual participants may have had passive exposure to another language (e.g., overhearing another language at school or work settings, as defined in Castro et al., 2022), but we only included those who reported ≤20% such exposure; we acknowledge that this criterion for defining our monolingual group was set post hoc. Before participation, we obtained written informed consent from all participants, who were compensated $15 per hour after the experiment. The University of Toronto Health Sciences Research Ethics Board reviewed and approved the ethics protocol.
Seventy participants (36 monolingual, 34 bilingual) were included in the final analysis; three bilinguals were excluded because of missing pupil data due to excessive blinking (i.e., <50% usable trials; see Oliva, 2018; Olmos-Solis et al., 2018). This sample size was determined based on previous pupillometry studies with similar designs (e.g., Borghini & Hazan, 2020). Participants’ language background was assessed using an online questionnaire adapted from the LEAP-Q (Language Experience and Proficiency Questionnaire; Marian et al., 2007). For English listening proficiency, all participants reported an 8 (very good) or above on a scale from 0 (none) to 10 (perfect). Notably, we observed a heterogeneous heritage language background in our bilingual sample, which reflected the diverse linguistic profile of the Toronto metropolitan area: the additional languages they knew included French, Cantonese, Mandarin, Spanish, Arabic, Korean, Greek, Italian, Vietnamese, Tamil and Urdu. We also assessed SES by scoring participants’ highest level of education and family or personal income. Moreover, participants’ language proficiency and cognitive ability were evaluated via CELF-5 (Clinical Evaluation of Language Fundamentals Fifth Edition; Wiig et al., 2013) and TONI-4 (Test of Nonverbal Intelligence Fourth Edition; Brown et al., 2010), respectively (see Section 4.3 for more details). Table 1 summarizes key participant characteristics.
Table 1. Participant characteristics

Note: Values in the group columns are means (standard deviation or SD). Asterisks indicate significant group comparisons (***p < .001, **p < .01).
We conducted t-tests (see Supplementary Table S1 for detailed results) and found significant group differences for three measures: English exposure (p < .001), years spent in an English-speaking family (p = .002) and the number of languages known (p < .001). As expected, monolinguals had significantly greater exposure to English, longer residence in an English-speaking family and fewer languages known than bilinguals. However, the two groups had spent a similar number of years in an English-speaking school/workplace. Further, comparable SES, CELF-5 and TONI-4 scores indicate that participants’ SES background, language proficiency and cognitive ability were matched across groups.
4.2. Materials
Materials used in this study are accessible at https://doi.org/10.5683/SP3/HBPALF.
4.2.1. Auditory materials
Auditory materials consisted of 20 spoken passages: 10 in English as the familiar language and 10 in Hebrew as the unfamiliar language. Two passages (one in each language) were used for practice and the remaining 18 for study trials. Passages were excerpts from travel guides describing cities worldwide and were recorded by three highly proficient Hebrew-English female bilingual Canadians. Recordings were processed using Adobe Audition to reduce noise and normalize intensity. The passages lasted 29.92 s on average (range: 28.52–30.76 s). Given the shortest duration, 28 s was used as the cutoff across trials in the analysis.
Moreover, questions were recorded by a female native Canadian English speaker in a soundproof booth using a SONY digital recorder. These questions asked whether the participant heard a specific word in the spoken passage (e.g., “Did you hear the word ‘beautiful’?”). In the unfamiliar language condition, we used cognates with similar meanings/pronunciations to English (e.g., “Italy,” “auto”) for half of the target words, and words that were totally unknown to participants for the other half. The speaker who recorded the questions was trained to pronounce these Hebrew words. Additionally, we varied where the target word occurred in the passage (i.e., at the beginning, in the middle, at the end, or not at all) to prevent participants from anticipating it.
4.2.2. Visual materials
Three screensavers (duration: 26–30 s) were used to maintain the participants’ gaze on the display. Adopted from Nencheva et al. (2020), these videos were made isoluminant to minimize the luminance effects on pupil dilation.
4.3. Procedure
Before the experiment, participants spent 10–15 minutes filling out an online language and SES background questionnaire. For language background, participants were asked to report their acquisition of, exposure to, and proficiency in each language they knew. For SES, questions were about participants’ highest level of education and family or personal income (contingent on whether they received financial support from family).
Participants sat in a sound-attenuated room during the experiment and completed an active listening task. The task began with two practice trials (the first in English, the second in Hebrew) to familiarize participants with the task, followed by 18 study trials equally divided into two blocks and randomized within each block. Participants started with either the familiar language or the unfamiliar language block. On each trial (Figure 1A), they listened to a spoken passage over headphones for about 30 s while looking at a screensaver on a 24-inch DELL monitor. To ensure that participants were paying attention while listening, a question asking whether they heard a specific word was presented 2 s after each passage. Participants responded by pressing the corresponding key on a keyboard within 10 s. The correct response was yes on all but two trials, for which it was no. This setup was designed to encourage participants to keep listening and remain attentive during the task. Participants’ eye movement data were recorded via an EyeLink 1000 Plus desktop mount eye tracker (SR Research) at a sampling rate of 500 Hz (Figure 1B). The experiment was programmed with the SR Research Experiment Builder software (v2.3.1; SR Research Ltd., 2020) and took around 15 min to complete.
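The blocked trial structure described above can be sketched in a few lines of Python. This is an illustrative sketch only: the experiment itself was programmed in Experiment Builder, the passage labels (`EN_01`, `HE_01`, ...) and the function name `build_trial_order` are hypothetical, and how block order was assigned to participants is not stated in the text, so it is left as an explicit argument.

```python
import random

def build_trial_order(familiar, unfamiliar, familiar_first, seed=None):
    """Return (condition, passage) pairs: two language blocks,
    each shuffled internally, presented one after the other."""
    rng = random.Random(seed)
    fam, unf = list(familiar), list(unfamiliar)
    rng.shuffle(fam)   # randomize within the familiar-language block
    rng.shuffle(unf)   # randomize within the unfamiliar-language block
    blocks = [("familiar", fam), ("unfamiliar", unf)]
    if not familiar_first:
        blocks.reverse()
    return [(label, p) for label, passages in blocks for p in passages]

# Hypothetical labels for the 9 study passages per language.
english = [f"EN_{i:02d}" for i in range(1, 10)]
hebrew = [f"HE_{i:02d}" for i in range(1, 10)]
order = build_trial_order(english, hebrew, familiar_first=True, seed=1)
```

Keeping the two languages in separate blocks, rather than interleaving them, means each participant hears nine consecutive passages in one language before switching.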

Figure 1. Trial process and recording setup of the active listening task. (A) Trial schema. At each trial, participants listened to a passage, spoken in either a familiar or unfamiliar language, for about 30 s. After a brief interval, they were asked whether they had heard a word in the passage. Then, they had 10 s to respond by pressing a key on the keyboard. (B) Recording setup. During the task, participants listened to the passages via headphones while looking at an isoluminant screensaver (to minimize the luminance effects on pupil size) on the display. A chinrest was used to ensure head stability. Pupil data were recorded via a desktop mount eye tracker.
After the experiment, participants’ language proficiency and cognitive ability were measured. Focusing on expressive language, three subtests of CELF-5 were used: Recalling Sentences, Sentence Assembly, and Formulated Sentences. Taking the participant’s age into account, raw scores obtained from these subtests were converted to scaled scores and then averaged into an overall CELF-5 score. Afterward, TONI-4 was administered to evaluate participants’ cognitive ability, and raw scores were converted to index scores. It took approximately 45 min to complete these two tests.
4.4. Data processing
Pupil data were preprocessed using the PupilPre package (v0.6.2; Kyröläinen et al., 2019) in R (v4.3.2; R Core Team, 2023). Based on the sample report generated from the EyeLink Data Viewer software (v4.1.1; SR Research Ltd., 2019), we aligned pupil data to the stimulus onset and created a time series for subsequent processing and modelling. As most participants had the right eye as the dominant eye (also see Reiss & Reiss, 1997), we used data from the right pupil across the sample, which consisted of 1260 trials (70 participants × 18 study trials). We removed 149 trials that had sparse data (<50%) in the baseline period (0–300 ms, which was the average time range for our auditory stimuli to initiate) and in the critical window (2–28 s; data visualization suggested that it took 2 s for the pupil size to stabilize in our design, see time evolution in Supplementary Figure S1). Afterward, we performed baseline correction by subtracting the average pupil size in the baseline period from that in the critical window. Then, we downsampled pupil data to 25 Hz to reduce autocorrelation between neighboring data points. Since pupil size was returned in arbitrary pixel units by the EyeLink system, we used a printed artificial eye for pupil calibration and then converted pupil size to millimeters (mm).
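The preprocessing steps above (trial exclusion, subtractive baseline correction, downsampling) can be illustrated with a minimal Python sketch. It is not the PupilPre pipeline used in the study. Several choices are assumptions made for illustration: a trial is dropped if either the baseline or the critical window has less than 50% valid samples, missing samples are represented as `None`, and downsampling from 500 Hz to 25 Hz is done by averaging non-overlapping bins of 20 samples. All function names are hypothetical.

```python
from statistics import mean

SAMPLING_HZ = 500           # recording rate reported in the text
TARGET_HZ = 25              # downsampled rate reported in the text
BASELINE_MS = (0, 300)      # baseline period
WINDOW_MS = (2000, 28000)   # critical window
MIN_VALID = 0.5             # assumed: at least 50% valid samples required

def window(samples, start_ms, end_ms, hz=SAMPLING_HZ):
    """Slice a trial (a list starting at stimulus onset) by time in ms."""
    return samples[start_ms * hz // 1000:end_ms * hz // 1000]

def valid_fraction(samples):
    """Share of samples that are not missing (None)."""
    if not samples:
        return 0.0
    return sum(s is not None for s in samples) / len(samples)

def preprocess_trial(samples):
    """Return the baseline-corrected, downsampled critical window,
    or None if the trial is too sparse to keep."""
    baseline = window(samples, *BASELINE_MS)
    critical = window(samples, *WINDOW_MS)
    if valid_fraction(baseline) < MIN_VALID or valid_fraction(critical) < MIN_VALID:
        return None
    base = mean(s for s in baseline if s is not None)
    # Subtractive baseline correction, applied before downsampling.
    corrected = [None if s is None else s - base for s in critical]
    step = SAMPLING_HZ // TARGET_HZ  # 20 samples per output bin
    out = []
    for i in range(0, len(corrected), step):
        chunk = [s for s in corrected[i:i + step] if s is not None]
        out.append(mean(chunk) if chunk else None)
    return out

# Synthetic trial: 4.0 mm during the baseline, 4.5 mm afterwards.
trial = [4.0] * 150 + [4.5] * (14000 - 150)
trace = preprocess_trial(trial)
```

In this sketch the dependent variable for a trial would be the mean of the non-missing values in `trace`, that is, the mean baseline-corrected pupil size over the 2–28 s window.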
4.5. Data analysis
To analyze pupil data, we built linear mixed effects models via the lme4 package (v1.1-35.1; Bates et al., 2015) after consulting a statistician. This modeling approach was deemed more suitable than generalized additive mixed modeling (GAMM; Wood, 2017), considering our relatively long interest period and that it was also used in previous studies with a similar length of interest period (e.g., Fawcett, 2022; Nencheva et al., 2020; Thoma, 2023). Estimated p-values for model fit and comparisons were calculated using the lmerTest package (v3.1-3; Kuznetsova et al., 2017). The dependent variable was mean pupil size (in mm), measured in the time window from 2 s (i.e., the time point when pupil size stabilized after stimulus onset) to 28 s (i.e., the time point when stimuli terminated at the earliest) while the spoken passages were presented. For fixed effects, we included the following independent variables: Condition (categorical: familiar language vs. unfamiliar language); the categorical bilingual variable Language Group (monolingual vs. bilingual); the continuous bilingual variables with significant group differences (see Table 1), namely English Exposure, Years Spent in an English-Speaking Family, and Number of Languages Known; and the control variables CELF-5 score, TONI-4 score, and SES score. For random effects, we included a random slope for Condition by participants.
Fixed and random effects were fitted separately in a stepwise forward manner. First, we created a null model with a random intercept for participants. We evaluated whether adding a new component (i.e., a predictor or an interaction) significantly contributed to the model fit via the likelihood ratio test, implemented using the anova function that compares a complex model and its simpler version. We inspected the estimated p-value derived from model comparison and the Akaike information criterion (AIC) values associated with each model. The complex model was favored when the resulting difference was significant (p < .05) and its AIC value was smaller; otherwise, the simpler model was selected (Matuschek et al., 2017). Following this model comparison approach (see Lewis et al., 2010; Stephens et al., 2005), we tested the significance of a predictor or an interaction. Once the optimal model was determined, we used it to conduct multiple comparisons for significant interactions with the emmeans package (v1.8.9; Lenth, 2023), applying the Tukey adjustment method to guard against Type I error; significant comparisons (p < .05) are reported below. For data visualization, we used the ggplot2 package (v3.4.4; Wickham, 2016) to plot the estimated marginal means and standard errors obtained from the best-fit model.
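As a rough illustration of the model structure just described, one of the models (Condition and Language Group with their interaction) could be written as below. This is a sketch under assumptions: the notation is introduced here and does not come from the study, only two of the listed predictors are shown, and the terms retained in the final models are those reported in the model summary tables.

```latex
\begin{aligned}
\text{Pupil}_{ij} &= \beta_0
  + \beta_1\,\text{Condition}_{ij}
  + \beta_2\,\text{Group}_{i}
  + \beta_3\,(\text{Condition}_{ij} \times \text{Group}_{i})
  + u_{0i} + u_{1i}\,\text{Condition}_{ij}
  + \varepsilon_{ij}, \\
(u_{0i}, u_{1i})^{\top} &\sim \mathcal{N}(\mathbf{0}, \boldsymbol{\Sigma}),
\qquad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^{2})
\end{aligned}
```

Here $i$ indexes participants and $j$ trials, $\text{Pupil}_{ij}$ is the mean baseline-corrected pupil size (mm) in the 2–28 s window, $u_{0i}$ is the by-participant random intercept, and $u_{1i}$ is the by-participant random slope for Condition.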
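The decision rule described above (keep the more complex model only if the likelihood ratio test is significant and its AIC is smaller) can be sketched in plain Python. This is an illustration of the rule, not the R `anova` call used in the study. The log-likelihoods and parameter counts in the usage line are made-up numbers, and the function names are hypothetical. The chi-square tail probability is computed from the standard recurrence for the regularized upper incomplete gamma function, which is valid for integer degrees of freedom.

```python
import math

def chi2_sf(stat, df):
    """Upper-tail probability of a chi-square variable (integer df >= 1)."""
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    x = stat / 2.0
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-x)                # Q(1, x)
    else:
        a, q = 0.5, math.erfc(math.sqrt(x))     # Q(1/2, x)
    # Recurrence: Q(a + 1, x) = Q(a, x) + x**a * exp(-x) / Gamma(a + 1)
    while a < df / 2.0:
        q += x ** a * math.exp(-x) / math.gamma(a + 1.0)
        a += 1.0
    return q

def aic(loglik, n_params):
    """Akaike information criterion: 2k - 2 log L."""
    return 2 * n_params - 2 * loglik

def prefer_complex(ll_simple, k_simple, ll_complex, k_complex, alpha=0.05):
    """Apply the rule: significant likelihood ratio test AND smaller AIC.
    Returns (decision, p_value)."""
    stat = 2 * (ll_complex - ll_simple)
    p = chi2_sf(stat, k_complex - k_simple)
    better_aic = aic(ll_complex, k_complex) < aic(ll_simple, k_simple)
    return (p < alpha and better_aic), p

# Made-up example: adding one term raises the log-likelihood from -100 to -95.
keep_complex, p_value = prefer_complex(-100.0, 3, -95.0, 4)
```

Requiring both criteria is more conservative than either alone: a term that lowers AIC slightly but fails the likelihood ratio test is not retained.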
5. Results
We first examined how accurately monolinguals and bilinguals responded to questions under familiar and unfamiliar language conditions by calculating the mean accuracy rate (i.e., the percentage of correct responses). The two groups performed similarly both across conditions (monolinguals: 57.87%, bilinguals: 60.13%; t(63.24) = .89, p = .375) and within each condition (see Supplementary Table S2 for further details). Additionally, we compared accuracy rates between conditions both across groups and within each group, but observed no significant differences (see Supplementary Table S2). In sum, there were no significant differences in accuracy between groups or conditions. The relatively low accuracy rates may reflect the difficulty of our task, which resembled a foil paradigm designed to maximize participants’ attention during listening. Given this, we retained all trials in the pupil data analysis regardless of response accuracy.
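The fractional degrees of freedom in the group comparison above, t(63.24), are consistent with Welch's unequal-variance t-test (the default in R's `t.test`); this is an inference from the reported statistic rather than something stated in the text. A minimal sketch of that test follows, using made-up accuracy values, not the study's data; `welch_t` is a hypothetical helper name.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom
    for two independent samples with possibly unequal variances."""
    va = variance(a) / len(a)   # squared standard error of mean(a)
    vb = variance(b) / len(b)   # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

# Made-up per-participant accuracy rates (proportion correct), illustration only.
group_a = [0.50, 0.61, 0.56, 0.67, 0.44, 0.61]
group_b = [0.56, 0.67, 0.61, 0.72, 0.50, 0.56]
t_stat, dof = welch_t(group_a, group_b)
```

Because the degrees of freedom depend on the two sample variances, they are generally not an integer and are smaller than the pooled value of n1 + n2 − 2 (68 for groups of 36 and 34).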
We then investigated how bilingual experience influenced participants’ pupil size in the two conditions. Using the model comparison approach, we found a significant effect of Condition (p < .001) along with three relevant significant interactions: Condition × Language Group (p < .001), Condition × English Exposure (p < .001) and Condition × Years Spent in an English-Speaking Family (p = .016). Adding an interaction term significantly improved the model fit compared to a reduced model containing only the main effects. However, due to multicollinearity, we analyzed each interaction separately in individual models. Notably, no other predictors yielded significant effects (e.g., Number of Languages Known: p = .750; CELF-5 Score: p = .933; TONI-4 Score: p = .913; SES Score: p = .204).
Focusing on the interaction between Condition and Language Group, we identified the optimal model (see Table 2). Post hoc comparisons revealed that monolinguals exhibited significantly larger pupil size when listening to the unfamiliar language compared to the familiar one (p = .042); however, no other comparisons reached significance. This interaction is visualized in Figure 2. Although visual inspection indicates that bilinguals tended to have larger pupil size than monolinguals across conditions (particularly in the familiar language condition) and that their pupil responses remained relatively similar between conditions, these patterns were not statistically significant.
Table 2. Model summary: Pupil size predicted by the interaction between Language Group and Condition

Note: ***p < .001, **p < .01, *p < .05.

Figure 2. Pupil size was predicted by the interaction between Condition and Language Group. Based on the optimal model, estimated marginal means and standard error bars were plotted. Applying the Tukey adjustment method, multiple comparisons suggest that only in monolinguals (the dashed line), listening to an unfamiliar language elicited significantly larger pupil size than listening to a familiar language (p = .042).
Similarly, we built separate models to examine the other two significant interactions: Condition × English Exposure (see optimal model summary in Supplementary Table S2) and Condition × Years Spent in an English-Speaking Family (see summary in Supplementary Table S3). Both models revealed a consistent pattern: greater English exposure and a longer duration of residence in an English-speaking family (i.e., toward the monolingual end of the spectrum) were associated with smaller pupil size, particularly in the familiar language condition.
6. Discussion
Studying bilingualism is important from the perspective of equity and diversity, as at least half of the global population is bilingual (Giannini, Reference Giannini2024); hence, experimental findings should be representative of a significant portion of the entire population. Most studies thus far have focused on comparing monolingual and bilingual abilities. While this approach has given us a glimpse into the bilingual mind, its explanatory power about bilingual language processing is limited. Consequently, there have been recent calls to increase the explanatory power of studies conducted on bilinguals, so that more detailed theoretical models relevant to language and cognition can be built (e.g., Blanco-Elorrieta & Caramazza, Reference Blanco-Elorrieta and Caramazza2021; Marian & Hayakawa, Reference Marian and Hayakawa2021). Responding to these calls, we included both categorical and continuous variables of bilingualism in the analyses. More importantly, our study quantified auditory attention allocation during speech processing via pupil dilation – providing a novel angle for understanding attention in bilingualism research. While changes in pupil size can reflect various cognitive processes (e.g., attention, memory, and arousal; see Fink et al., Reference Fink, Simola, Tavano, Lange, Wallot and Laeng2024 for a review), we focused on gauging attentional resources – equivalent to listening effort as defined by Pichora-Fuller et al. (Reference Pichora-Fuller, Kramer, Eckert, Edwards, Hornsby, Humes, Lemke, Lunner, Matthen, Mackersie, Naylor, Phillips, Richter, Rudner, Sommers, Tremblay and Wingfield2016). We also acknowledge that the relation between attention and effort remains debated, with some accounts treating them as overlapping but distinct constructs (e.g., Bruya & Tang, Reference Bruya and Tang2018).
Most theoretical frameworks describing differences between monolingual and bilingual cognition have primarily focused on attentional control in the nonlinguistic domain (e.g., Bialystok & Craik, Reference Bialystok and Craik2022; Green & Abutalebi, Reference Green and Abutalebi2013). Conversely, our study investigated attentional resources (or listening effort) during spoken language processing in two groups of native English speakers: monolinguals and simultaneous heritage bilinguals. Using an active listening task, we presented participants with passages in two conditions: spoken in a familiar (English) or unfamiliar (Hebrew) language. Combining the categorical and continuous approaches to bilingualism (Kremin & Byers-Heinlein, Reference Kremin and Byers-Heinlein2021), we compared pupil responses between the two language groups and examined the modulatory role of various environmental and individual factors on pupil dilation. Overall, we hypothesized that greater English exposure/proficiency would be associated with less pupil dilation.
We observed a significant effect of the experimental condition (familiar vs. unfamiliar language) and its interactions with language groups and with environmental variables, such as the amount of English exposure and length of stay in an English-speaking family. Specifically, a group distinction arose when comparing the two conditions: listening to an unfamiliar language evoked larger pupil size than listening to a familiar one in monolinguals only, whereas bilinguals had similar pupil responses across conditions. It should also be noted that though not statistically significant, bilinguals tended to exhibit larger pupil size than monolinguals, and the difference was more pronounced in the familiar language condition (i.e., while listening to English; Figure 2). Further, consistent results were found when evaluating participants on the spectrum of bilingual experience: more English exposure and a more extended stay in an English-speaking family correlated with smaller pupil size, particularly in the familiar language condition. Note that we minimized the effects of AoA by only including simultaneous bilinguals, and those with more English experience exhibited more monolingual-like patterns. Taken together, our findings suggest that listening effort is related to an individual’s overall linguistic experience, especially the amount of early language exposure at home. There seems to be a gradient of the relation across participants: processing a completely unknown language is most effortful, processing a language that was learned early in life but not the primary language spoken at home is less effortful, and processing a language that was the dominant language at home from infancy is the least effortful (further discussed in Section 6.2).
In addition, individual English language proficiency scores obtained from a standardized test were not associated with participants’ pupil dilation. This finding suggests that it is not the level of English proficiency but the amount of English exposure that seems relevant to the listening effort expended in speech processing. However, there was no significant difference in proficiency scores between our monolingual and bilingual participants, which was unsurprising because our simultaneous heritage bilinguals were also native English speakers. Finally, individual cognitive ability scores were not linked to pupil size changes either. Likewise, there was no significant group difference in this cognitive measure among our participants, who were mostly university students. Therefore, it may be that our data did not have enough variation in cognitive scores to show an effect on pupil responses.
6.1. Language group comparison: monolinguals vs. bilinguals
We first asked whether monolinguals and bilinguals differed in pupil responses during speech processing as a function of language familiarity, and we predicted that listening to an unfamiliar language would elicit larger pupil size in monolinguals than listening to a familiar language. This prediction was confirmed by our data, which are consistent with previous neuroimaging data obtained from monolinguals (e.g., Cotosck et al., Reference Cotosck, Meltzer, Nucci, Lukasova, Mansur and Amaro2021; Hernández et al., Reference Hernández, Ventura-Campos, Costa, Miró-Padilla and Ávila2019) suggesting that speech processing of an unknown language is more cognitively demanding than that of a known one. Further, we hypothesized that bilinguals would have larger pupil size than monolinguals when listening to the familiar language. The observed pattern was in the predicted direction, but the group difference did not reach significance. Our result suggests that simultaneous English bilinguals may not require significantly more listening effort than monolinguals when processing English-spoken passages. While this finding seems to contrast with those of Borghini and Hazan (Reference Borghini and Hazan2018, Reference Borghini and Hazan2020), it should be noted that they tested Italian L1–English L2 sequential bilinguals, who were included as long as they had lived in the United Kingdom for over three or 10 months (Borghini & Hazan, Reference Borghini and Hazan2018, Reference Borghini and Hazan2020). Thus, it is possible that we did not find a significant group difference because we included simultaneous bilinguals whose AoA is similar to that of monolinguals.
Moreover, our finding supports the attentional control framework regarding the situations in which a language group difference would occur. Specifically, it is postulated that monolinguals and bilinguals tend to exhibit differences under high attentional control demand (Bialystok & Craik, Reference Bialystok and Craik2022). We observed significantly larger pupil size in the unfamiliar language condition than in the familiar language condition in monolinguals only, while bilinguals had similar pupil responses across conditions. A possible explanation is that higher linguistic complexity in the bilingual environment facilitates adaptation to novelty in a linguistic auditory scene, whereas monolinguals are habituated to monotonous linguistic input and thus allocate more attentional resources to processing unfamiliar spoken language (also see Kovács, Reference Kovács2015). Similar findings have been demonstrated in Spanish monolingual and bilingual toddlers listening to novel syntactic constructs (Molnar et al., Reference Molnar, Alemán Bañón, Mancini and Caffarra2021).
6.2. Language environment factors: language exposure and linguistic complexity
Our second question sought to determine which language environment variables influence pupil responses during spoken language processing, with a focus on language exposure and linguistic complexity. We hypothesized that when listening to the familiar language, participants would show smaller pupil size if they were more monolingual-like, namely having more exposure to English and a more extended stay in an English-speaking environment, along with knowing fewer languages. This was partially corroborated by the association between smaller pupil size and more English exposure. This finding is in line with the interactions between listener- and input-based variables frequently reported in previous studies (e.g., McCloy et al., Reference McCloy, Lau, Larson, Pratt and Lee2017; Piquado et al., Reference Piquado, Isaacowitz and Wingfield2010) and reveals the complex processes involved in attentional resource allocation during auditory processing.
Further, as discussed earlier, our results reveal a gradient relation between language experience and listening effort, which highlights the role of early language exposure during infancy. Different processing patterns between monolinguals and simultaneous bilinguals can be partially attributed to the amount of early exposure to English, which may not be the dominant language spoken at home among many of our heritage bilingual participants. This is consistent with prior work showing that early exposure to a particular language can predict later language processing abilities (e.g., Hurtado et al., Reference Hurtado, Grüter, Marchman and Fernald2013; Schmidtke, Reference Schmidtke2014). However, pupil responses to both languages of bilinguals need to be measured, so as to disentangle the effects as a result of exposure to multiple languages (e.g., when bilinguals have equally larger pupil size for both of their languages relative to that of monolinguals listening to their native language) or exposure to their L1 only (e.g., when bilinguals’ pupil size is only larger than that of monolinguals for English but not for their heritage language).
We did not find effects of the length of stay in an English-speaking professional setting, which could be attributed to the fact that our two groups had a comparable number of years spent in an English-speaking school/workplace. Additionally, no effect was found for how many languages the participants knew either, even though bilinguals knew significantly more. A possible explanation is that the number of languages known only represents an approximate estimate of linguistic complexity, and thus, it may not capture the kernel of this multidimensional construct. We acknowledge this as a limitation of our study and suggest further investigations focusing on linguistic complexity and its multiple facets (e.g., measuring the linguistic distance between languages known by the bilingual participants). Alternatively, bilinguals and multilinguals might rely on similar attentional resources, which could also be further explored in future research.
6.3. Individual difference factors: language proficiency and cognitive ability
Our third question concerned interindividual differences in language proficiency and cognitive ability. We hypothesized that participants’ CELF-5 and TONI-4 scores might be related to their pupil responses, but neither was a significant predictor. While our results differ from some published research (e.g., Beatty-Martínez et al., Reference Beatty-Martínez, Guzzardo Tamargo and Dussias2021; Schmidtke, Reference Schmidtke2014), they are consistent with those of Koelewijn et al. (Reference Koelewijn, Zekveld, Festen and Kramer2014) and Zekveld et al. (Reference Zekveld, Rudner, Kramer, Lyzenga and Rönnberg2014). It should also be noted that our study used two standardized tests to assess language and cognitive abilities separately, whereas prior work has often measured them together, without distinction, through a single test or task. For example, Koelewijn et al. (Reference Koelewijn, Zekveld, Festen, Rönnberg and Kramer2012) measured linguistic/cognitive abilities via cognitive tests with substantial verbal components (e.g., reading span and listening span tests that assess verbal working memory capacity) and referred to those two abilities interchangeably (also see Zekveld et al., Reference Zekveld, Rudner, Kramer, Lyzenga and Rönnberg2014; Zekveld & Kramer, Reference Zekveld and Kramer2014). Yet, given mixed results in this area, more research is needed to establish a clear association between pupillary responses and language and cognitive abilities.
6.4. Limitations
Statistically, various methods are available to analyze pupil data. We chose the current technique because it was the most suitable for our research design, considering the experimental conditions and the trial duration (also see Fawcett, Reference Fawcett2022; Nencheva et al., Reference Nencheva, Piazza and Lew-Williams2020; Thoma, Reference Thoma2023). Another technique commonly used in the field is GAMM. As a time-series approach, GAMM can include nonlinear interactions between pupil dilation and experimental predictors while handling variability and autocorrelation within the data (van Rij et al., Reference van Rij, Hendriks, van Rijn, Baayen and Wood2019). However, it has only been used for time periods much shorter than our trial duration (e.g., 2–3 s; Chuang et al., Reference Chuang, Fon, Papakyritsis, Baayen and Ball2021; van Rij et al., Reference van Rij, Hendriks, van Rijn, Baayen and Wood2019). It is unclear whether GAMM is appropriate for much longer time periods (i.e., 28 s in our design). For full disclosure, we conducted preliminary exploratory analyses using GAMM across the whole trial duration, but they did not yield meaningful results, and the models often failed to converge. We also considered analyzing shorter segments of the trial; however, no established framework was available to guide trial truncation (i.e., the duration of the shorter segments).
7. Conclusions
Using pupil dilation as an index of listening effort, we investigated whether monolinguals and bilinguals differ in allocating their attentional resources – an overlooked aspect of cognition in the existing bilingualism literature and frameworks as they primarily focus on attentional control. Importantly, we approached this question in a linguistic context of spoken language processing, which addresses the origin of the bilingual advantage extensively studied in the nonlinguistic domain. Through testing two groups of native English speakers across the monolingual–bilingual spectrum, we not only compared monolinguals and simultaneous heritage bilinguals, but also explored the modulatory role of language environment and individual difference factors on pupil dilation. For the first time, our findings on pupillary responses provide support for neuroimaging evidence that phonological processing of an unknown language is more effortful than that of a native language in monolingual speakers. Further, they confirm the postulation that monolingual and bilingual differences are more observable in cognitively demanding situations. We conclude that simultaneous bilinguals differ from monolinguals in their attentional resource allocation, especially when listening to a novel linguistic input.
Supplementary material
The supplementary material for this article can be found at http://doi.org/10.1017/S1366728925100795.
Data availability statement
The data that support the findings of this study along with the materials used are openly available in Borealis (the Canadian Dataverse Repository) at https://doi.org/10.5683/SP3/HBPALF.
Acknowledgments
This work was supported by a grant from the Natural Sciences and Engineering Research Council of Canada awarded to M.M. (RGPIN-2019-06523).
Competing interests
The authors declare none.

