1. Introduction
To date, heritage language (HL) acquisition research has primarily focused on either early childhood development or adult competence/performance outcomes (Kupisch & Rothman, 2018; Montrul, 2016; Polinsky, 2018). As a result, HL development and use in adolescence remain understudied, despite compelling reasons to study them (Bayram et al., 2021; Korkus & Vihman, 2024; Minkov et al., 2019). Indeed, a clear picture of HL development over late childhood and adolescence is essential for capturing the full developmental trajectory of HL bilingualism, in particular for linguistic domains whose sub-properties are known to be acquired gradually, in stages (e.g., properties related to pronouns). Adolescent HL bilinguals are differentially subject to (sometimes drastic) changes in their input and usage patterns, emerging sociolinguistic realities with increased personal agency, and maturation of other cognitive and psychological abilities. This makes adolescence a rather unique natural laboratory: studying HL development over this transitional period, from childhood to adulthood, can be informative in multifarious ways.
Heritage speakers (HSs) are early bilinguals who acquire an HL from birth through naturalistic exposure at home or in their immediate communities. Importantly, the HL is not the language of the larger society (Montrul, 2016; Rothman, 2009). At the macro-group level, a substantial body of research has documented that adult HSs often perform differently from first language (L1)-dominant speakers of the same language (Montrul, 2016; Polinsky, 2018). To investigate the source of these differences, researchers have proposed a comparative developmental approach, examining child and adult HSs relative to monolingual “baselines” (Montrul & Polinsky, 2021; Polinsky, 2018; Polinsky & Scontras, 2020). The logic behind this approach is that if child and adult HSs pattern alike and both differ from monolinguals, the difference reflects a differential acquisition trajectory and outcome for HSs relative to L1-dominant users. In contrast, if child HSs are more similar to monolingual children but differ from adult HSs, then HL attrition may explain the adult HSs’ outcomes (Polinsky, 2018).
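To make this two-way logic explicit, it can be encoded as a small Python function. This is a hypothetical illustration rather than a tool from the cited work: `Pattern` and `interpret_comparative` are names introduced here, and each HS group is reduced to a binary label (monolingual-like vs. divergent relative to an age-matched baseline), which is precisely the categorical simplification that an ID approach calls into question.

```python
from enum import Enum


class Pattern(Enum):
    """How an HS group performs relative to its age-matched monolingual baseline."""
    MONOLINGUAL_LIKE = "monolingual-like"
    DIVERGENT = "divergent"


def interpret_comparative(child_hs: Pattern, adult_hs: Pattern) -> str:
    """Toy encoding of the comparative developmental logic (hypothetical helper).

    Each group is reduced to a binary label, mirroring the categorical
    assumption of the approach rather than modelling individual differences.
    """
    if child_hs is Pattern.DIVERGENT and adult_hs is Pattern.DIVERGENT:
        # Both HS groups differ from monolinguals: differential acquisition.
        return "differential acquisition"
    if child_hs is Pattern.MONOLINGUAL_LIKE and adult_hs is Pattern.DIVERGENT:
        # Children look monolingual-like but adults do not: possible attrition.
        return "possible attrition"
    if child_hs is Pattern.MONOLINGUAL_LIKE and adult_hs is Pattern.MONOLINGUAL_LIKE:
        return "no HS-specific difference"
    # Child divergent, adult monolingual-like: not one of the two cases in the text.
    return "not covered by the comparative logic"


print(interpret_comparative(Pattern.MONOLINGUAL_LIKE, Pattern.DIVERGENT))
# possible attrition
```

Note that the function has no way to represent within-group variability; every child or adult HS is assumed to share the group label.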
While this comparative approach has yielded important insights, it implicitly assumes a linear or categorical developmental path and often (unwittingly) treats HS populations as internally homogeneous. However, an accumulating body of research highlights that HSs exhibit striking individual differences (IDs) to a degree rarely observed among neurotypical L1-dominant users or even L2 populations (De Houwer, 2023; Rothman et al., 2023). These differences span multiple linguistic domains (Daskalaki et al., 2019) and are shaped by a complex interaction of child-internal factors, for example, cognitive abilities, and child-external factors, for example, input quantity and quality (see Paradis, 2023, for a summary). For the comparative framework to be truly explanatory, it must be embedded within a broader developmental model that not only traces language outcomes across the lifespan but also accounts for the heterogeneity observed within HS groups.
Under the comparative approach, another recurring question in the HL bilingualism literature concerns whether all linguistic domains are equally vulnerable to input reduction (Polinsky & Scontras, 2020). One influential proposal is that structures at the interface between syntax and other domains, especially discourse-pragmatics, which are often the same structures that L1-dominant children converge on late, are more susceptible to reduced input than properties governed strictly by narrow syntax. This idea is formalized in the Interface Hypothesis (IH), which maintains that morphosyntactic properties requiring integration with discourse-pragmatic information are particularly vulnerable in bilingual grammars (Sorace, 2011), most likely for processing-related reasons. In addition, structures involving long-distance (LD) dependencies, such as object relative clauses and wh-movement, are thought to pose greater difficulties for HSs, a phenomenon sometimes referred to as “the distance problem” (Polinsky & Scontras, 2020). The distance problem suggests that integrating elements across larger syntactic spans may interact with the vulnerability of interface structures, creating a cumulative challenge for bilingual grammars. The IH and the distance problem have been tested empirically in child and adult HSs and other bilinguals, but findings supporting their predictions remain mixed (Daskalaki et al., 2019; Hao et al., 2024; Leal et al., 2014).
These mixed results may not be surprising when considering that the comparative approach tends to focus on group-level patterns while often overlooking the substantial variability within HS populations. By averaging across individuals, this approach may obscure meaningful variability in how different structures are processed and acquired by different individuals. An ID approach offers a powerful complementary lens to refine and test the IH and the distance problem. From the ID perspective, narrow syntactic structures and local dependencies are expected to be more uniform across HSs, whereas interface structures and LD dependencies are more susceptible to variation.
Additionally, while there has been an upsurge in research adopting an ID approach in the HL literature, research combining this approach with online processing methods lags significantly behind. Online processing methods measure how language users respond to linguistic information in real time, offering insights into the automatic, time-sensitive mechanisms that underpin language comprehension and use. In contrast, offline methods, such as grammaticality judgment tasks or elicited production tasks, are more likely to be influenced by language users’ metalinguistic awareness, task strategies, and explicit knowledge. This distinction is particularly relevant for HSs, whose performance is known to be highly sensitive to such factors (Polinsky, 2018). Online methods therefore offer a more direct window into HL knowledge and are essential for evaluating how ID factors shape language comprehension and use beyond what offline tasks can reveal (Bayram et al., 2021).
The current study investigates how IDs in cognitive and input-related factors affect online HL processing in adolescent Mandarin-English HSs. Using a visual world eye-tracking paradigm, we focus on three types of Mandarin pronouns: pronominals (ta “he/she/it,” an LD dependency located at the syntax-pragmatics interface), simplex reflexives (ziji “self,” an LD dependency located at the syntax-semantics-pragmatics interface), and complex reflexives (taziji “himself/herself/itself,” a local dependency governed by narrow syntax). By examining how adolescent HSs interpret these forms in real time, we evaluate the extent to which working memory (WM), inhibition, and HL exposure and use patterns modulate HL processing. In doing so, we aim to contribute to a more developmentally grounded, cognitively informed model of HL bilingualism.
2. ID factors modulating HL development and processing
ID factors can be broadly categorized into user-internal and user-external dimensions (see Paradis, 2023, for a summary). Internal factors refer to the cognitive capacities that language users bring to the task of language acquisition and use, such as WM, inhibitory control/inhibition, and other executive functions. These factors influence how efficiently learners can process, store, and retrieve linguistic information. In contrast, external factors encompass the broader socioenvironmental context of language exposure and use. These include proximal factors, such as the amount and quality of direct HL input and opportunities to use the HL, as well as distal factors related to the wider environment, such as socioeconomic status (SES), which can shape the availability and richness of proximal experiences. It is worth noting that while ID factors should, in principle, affect language learners and users generally, their effects on HL processing need not reflect the same underlying mechanisms across populations. HSs occupy a unique sociolinguistic and developmental position, characterized by early bilingual exposure, long-term dominance shifts, and often sustained reduction in naturalistic HL input. As a result, ID effects in this population are likely to arise from interactions between user-internal cognitive resources and user-external experiential factors that are not directly comparable to those in other bilingual or monolingual populations. For example, a WM effect on HL processing, where input is reduced, variable, and largely naturalistic, may reflect different underlying mechanisms from WM effects observed in L1-dominant speakers with full input or in L2 learners whose reduced input is often instruction-driven (see also Cunnings, 2017, on the differential role of WM in L1-dominant and L2 users). To avoid generalizations that might incorrectly imply shared underlying mechanisms, we therefore limit our discussion of ID effects to HSs.
Starting with (language) user-internal factors, a growing body of research has demonstrated that WM plays a positive role in bilinguals’ language abilities across a wide range of linguistic domains (e.g., vocabulary, morphosyntax, and narrative skills) and task modalities (i.e., offline comprehension, production, and online measures). However, the role of WM in HL development and use has received comparatively little attention. Among the few existing studies, Paradis et al. (2020) found that HL vocabulary size in child HSs was positively associated with WM capacity, while Soto-Corominas et al. (2022) reported a similar positive relationship between WM and HL sentence repetition accuracy across a range of morphosyntactic structures. These findings suggest that WM supports HL development, at least in young children. There is good reason to hypothesize a strong relationship between HL processing and WM capacity for grammatical domains that tax memory, such as lexical retrieval, LD dependencies, and structures requiring integration between grammar and discourse. Nevertheless, the relationship between WM and HL processing remains severely understudied, and thus unclear. For grammar proper, Bice and Kroll (2021) is, to our knowledge, the only study directly examining the role of WM in HL processing. They found that WM capacity correlated with sensitivity to subject-verb agreement violations in L1-dominant users, but not in adult HSs. This contrasts with Paradis et al. (2020) and Soto-Corominas et al. (2022), although the discrepancy may be task-specific, structure-specific, or age-specific. It highlights the need to carefully consider the linguistic domains and developmental stages under investigation.
Another user-internal factor that is particularly relevant to the current study is inhibitory control, or inhibition. While the role of inhibition in bilingualism has been extensively studied in the context of language switching and lexical selection as well as in research on domain-general cognitive advantages associated with bilingual experience, relatively little attention has been paid to how inhibition supports real-time sentence processing, particularly in HSs. Yet, examining inhibition at the processing level holds significant promise for addressing a central question in bilingualism: how does cross-linguistic influence (CLI) manifest during language comprehension, and what cognitive mechanisms help bilinguals manage competing representations from their two languages?
At the processing level, CLI has been reported in the form of HSs’ use of majority-language processing strategies even when processing the HL (see Chondrogianni, 2023, for a summary). In such cases, HSs may rely on cues or parsing strategies that are more consistent with the societal language than with those typically employed by dominant speakers of the HL. Several theoretical models of bilingual sentence processing account for this phenomenon by emphasizing cue-based transfer. For instance, both the cue-based retrieval model (e.g., Cunnings, 2017) and the Unified Competition Model (e.g., MacWhinney, 2018) posit that bilinguals interpret sentences by drawing on cues associated with both of their languages. When the two languages differ in their cues, bilinguals may transfer cue preferences from one language to the other, resulting in non-target-like processing. Therefore, when HSs process HL structures that diverge from the majority language, those with stronger inhibitory control may be more successful at suppressing default strategies aligned with the societally dominant language.
Turning to user-external factors, a substantial body of evidence highlights the importance of HL input quantity (a proximal factor typically measured as current or cumulative exposure to and use of the HL) in shaping HL development and use, particularly as reflected in offline measures across a wide range of linguistic domains (Chondrogianni & Daskalaki, 2023; Daskalaki et al., 2019; Kubota et al., 2025; Paradis et al., 2020; Soto-Corominas et al., 2022). However, the role of HL input quantity in HL processing remains less clear. The emerging literature presents mixed findings, suggesting that the role of input quantity may vary across developmental stages or task types. For example, Hao et al. (2024) found that among pre-teenage Mandarin-English HSs, HL input quantity significantly predicted offline comprehension and production of various non-canonical structures, yet it did not modulate online processing of the same structures in a self-paced listening with picture verification task. In contrast, studies with adult HSs have reported more consistent effects of input. For instance, using the visual world eye-tracking paradigm, Hao, Rossi et al. (2025) found that input modulated processing strategy preferences (see also Karaca et al., 2024). Moreover, emerging neuroimaging (EEG) work also shows a positive correlation between HL grammatical processing and both quantitative and qualitative aspects of HL input (Hao, Rossi et al., 2025).
These divergent findings suggest that the role of HL input in processing may be developmentally mediated, and that adolescence could represent a transitional phase in which the relationship between input and real-time processing begins to emerge or reorganize. Moreover, compared to self-paced listening with picture verification, the visual world eye-tracking and EEG paradigms, at least as implemented in the above studies, offer a more naturalistic and certainly more temporally fine-grained measure of language processing. Eye-tracking captures participants’ real-time attention to referents as they listen to spoken language, and EEG records neural responses as language unfolds, allowing the detection of subtle effects of input on processing. Additionally, unlike self-paced listening with picture verification, eye-tracking and EEG do not require an additional metalinguistic or verification task at the end of each trial. This minimizes task-related demands and reduces the likelihood that processing is influenced by response strategies, affective factors (argued to be particularly relevant for HSs; Polinsky, 2018), or metalinguistic knowledge, making these paradigms particularly suitable for capturing the automatic aspects of HL processing.
The present study focuses on three language user-level factors: WM, inhibition, and HL input quantity. While other user-level factors may also contribute to IDs in HL processing, including additional predictors would require a substantially larger sample to ensure adequate statistical power and avoid overfitting. This is especially true because we also examine whether IDs manifest differently across linguistic domains. Moreover, many user-external factors, such as input quantity and input richness, tend to be highly correlated, complicating model specification and interpretation.
To maintain analytical clarity while achieving a robust understanding of the role of WM, inhibition, and HL input quantity, the present study controls for other user-level factors to the extent possible. As detailed in the Participants section, we accounted for HL education, parental language background, and SES, three distal factors that have been shown to influence HL development and use (see Paradis, 2023, for a review). By focusing on theoretically motivated core predictors while minimizing potential confounds, this study aims to provide a precise and developmentally sensitive account of how internal and external factors shape real-time HL processing in adolescence.
3. Pronoun systems in English and Mandarin
Pronouns are linguistic devices whose referential interpretation depends on other elements within the sentence and/or discourse. That is, their meanings are not fixed; they are bound by and/or co-referential with other noun phrases (NPs) or entities. Pronouns include reflexives (sometimes referred to as anaphors), such as himself, and pronominals (sometimes referred to reductively as pronouns), such as he. Reflexives and pronominals have different distributions. For example, while the reflexive himself in (1) must be interpreted as Joe and not Jack, the pronominal him in (2) must not refer to Joe but can be interpreted as Jack or some other third-person male in the discourse.
(1) Jack thinks Joe likes himself.
(2) Jack thinks Joe likes him.
Chomsky (1993) proposed Binding Principles A and B to capture the distribution of reflexives and pronominals, respectively. More specifically, Binding Principle A states that a reflexive must be bound in its governing category, that is, it must be locally bound, where the governing category is the minimal category that contains the reflexive, the element that assigns it a thematic role or Case, and a SUBJECT. Binding Principle B stipulates that a pronominal must be free within its governing category. Although Jack and Joe both c-command himself in (1) and him in (2), only Joe is within the governing category of the pronoun. In other words, Joe is a local antecedent and Jack is an LD antecedent: reflexives must be bound by a local antecedent, whereas pronominals cannot be bound by a local antecedent but can take an LD antecedent.
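A toy Python encoding of Principles A and B, applied to examples (1) and (2), may make the locality contrast concrete. This is a simplified sketch under stated assumptions: `Antecedent` and `can_be_bound_by` are hypothetical names, membership in the governing category is supplied as a precomputed flag rather than derived from a syntactic tree, and only sentence-internal c-commanding antecedents are considered, so a discourse referent for him falls outside its scope.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Antecedent:
    name: str
    c_commands: bool             # does this NP c-command the pronoun?
    in_governing_category: bool  # is it inside the pronoun's governing category?


def can_be_bound_by(form: str, ante: Antecedent) -> bool:
    """Toy check of Binding Principles A and B (hypothetical helper).

    'reflexive': Principle A, must be bound inside its governing category.
    'pronominal': Principle B, must be free inside its governing category,
    so only a c-commanding antecedent outside that category can bind it.
    """
    if not ante.c_commands:
        return False  # binding requires c-command
    if form == "reflexive":
        return ante.in_governing_category
    if form == "pronominal":
        return not ante.in_governing_category
    raise ValueError(f"unknown form: {form}")


# Examples (1) and (2): both NPs c-command the pronoun; only Joe is local.
JACK = Antecedent("Jack", c_commands=True, in_governing_category=False)
JOE = Antecedent("Joe", c_commands=True, in_governing_category=True)

for form in ("reflexive", "pronominal"):
    licensed = [a.name for a in (JACK, JOE) if can_be_bound_by(form, a)]
    print(form, licensed)
# reflexive ['Joe']
# pronominal ['Jack']
```

The complementary distribution falls out of a single flag, which is the intuition the two principles capture; the hard part in real parsing is computing that flag, which the sketch takes for granted.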
While Binding Principle A reliably accounts for the distribution of reflexives in argument positions in English, Binding Principle B faces some challenges: pronominals often exhibit greater interpretive flexibility than Principle B alone predicts. Consider example (3):
(3) I know what Jack and Joe have in common. Jack adores him, and Joe adores him too.
In this case, many speakers allow the second him to refer to Joe, its apparent local antecedent, which seemingly violates Principle B. Such examples are nonetheless often judged acceptable in context, especially when contrastive focus or a parallel discourse structure is present. This example highlights the difference between binding and co-reference (see Reuland, 2006, for a summary). Binding involves a syntactic dependency between a pronoun and its antecedent, constrained by structural rules such as locality and c-command. Co-reference, by contrast, holds when two expressions refer to the same entity without being syntactically dependent on each other. Building on this distinction, Grodzinsky and Reinhart (1993) proposed Rule I, which formalizes a principle of interpretive economy: when a bound variable reading (via syntactic binding) and a coreferential reading (via discourse) would yield the same interpretation, the coreferential reading is blocked. In example (2), him can neither be bound by Joe (which would violate Principle B) nor co-refer with Joe (this reading is indistinguishable from the bound reading and is therefore blocked by Rule I). In example (3), by contrast, although the bound reading is ruled out by Principle B, the coreferential reading (Jack and Joe both adore Joe) differs from the bound variable reading (Jack adores Jack and Joe adores Joe), so Rule I does not block it and a local coreferential reading becomes available. These patterns show that (LD) pronominal resolution draws on both syntactic constraints and pragmatic computations: syntax constrains the space of possible antecedents, and pragmatic and discourse principles determine which referent is ultimately selected.
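Rule I can be illustrated with a deliberately crude model in which each reading is represented as the set of (agent, patient) facts it asserts for the shared predicate (likes in (2), adores in (3)). The helper names are hypothetical, and the model ignores focus, ellipsis, and wider discourse context; it only checks whether the coreferential reading is truth-conditionally distinguishable from the bound variable reading.

```python
def bound_reading(subjects: list[str]) -> frozenset[tuple[str, str]]:
    """Bound variable reading: each subject x has the property 'x P x'."""
    return frozenset((s, s) for s in subjects)


def coref_reading(subjects: list[str], referent: str) -> frozenset[tuple[str, str]]:
    """Coreferential reading: the pronoun picks out one fixed individual."""
    return frozenset((s, referent) for s in subjects)


def rule_i_allows_coreference(subjects: list[str], referent: str) -> bool:
    """Toy Rule I (hypothetical helper): local co-reference is blocked when it
    asserts exactly the same facts as the bound variable reading."""
    return coref_reading(subjects, referent) != bound_reading(subjects)


# Example (2): "Joe likes him", him = Joe. Both readings assert {(Joe, Joe)}.
print(rule_i_allows_coreference(["Joe"], "Joe"))          # False
# Example (3): Jack and Joe both adore Joe vs. each adores himself.
print(rule_i_allows_coreference(["Jack", "Joe"], "Joe"))  # True
```

The contrast comes entirely from the second subject in (3): with one subject the two readings collapse, whereas with two they diverge, which is why the parallel discourse structure matters.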
Under such an account, reflexives, governed by Principle A, are interpreted via syntactic binding: once the structural dependency is established, the reflexive’s reference is determined compositionally. In contrast, pronominals are referentially independent and interpreted through discourse co-reference (Rule I) once syntactic constraints (Principle B) have ruled out illicit binding. This asymmetry explains why reflexives are handled within narrow syntax, whereas pronominals require computation at the syntax-pragmatics interface.
Mandarin has three forms of pronouns: the pronominal ta “he/she/it” (ta henceforth), the complex reflexive taziji “himself/herself/itself” (taziji henceforth), and the simplex reflexive ziji “self” (ziji henceforth). The pronominal ta, like its English counterpart him, is governed by Binding Principle B and Rule I (which blocks co-reference between ta and its local antecedent when the complex reflexive taziji yields the same interpretation). The complex reflexive taziji, corresponding closely to English himself, is subject to Binding Principle A, requiring local syntactic binding. In contrast, the simplex reflexive ziji, which has no English counterpart, can take either an LD or a local antecedent; its LD reading apparently violates Binding Principle A. The status of ziji in Mandarin remains highly debated, with no theoretical consensus.
Some approaches treat ziji as a syntactic anaphor in both its LD and local readings (see Cole et al., 2006, for a summary). The basic idea is that the LD reading of ziji results from Logical Form (LF) movement: ziji moves to the T(ense) position at LF, where it receives its features from the subject through Specifier-Head agreement. Under this approach, even LD binding is ultimately local, conforming to Binding Principle A. However, this movement-based approach fails to account for some empirical observations, such as island constraints and blocking effects (see Wang & Pan, 2021, for a summary of criticisms of the movement approach). One influential alternative proposes that ziji functions both as a syntactic anaphor that follows Binding Principle A, giving rise to local binding, and as a pragmatic logophor, giving rise to an LD reading (C. T. J. Huang & Liu, 2000). Logophors impose a consciousness requirement: the antecedent must be conscious of the event being reported. For example, in (4), the LD antecedent Jack must be aware of the claim made in the embedded clause, as it reflects his own view; because the consciousness requirement is met, ziji can take the LD antecedent as its referent. Under this non-uniform approach, the interpretation of LD ziji is influenced by syntactic structure, verb semantics (e.g., whether the verb is logophoric), and discourse-level factors (e.g., perspective alignment).
In the present study, because we are interested in how IDs modulate the processing of interface structures versus structures governed by narrow syntax, all main verbs in the experiment are logophoric, favouring LD binding of ziji. Thus, the LD interpretation of ziji arises at the syntax-semantics-pragmatics interface: syntax constrains possible antecedents, verb semantics encode logophoricity, and discourse/pragmatic context ultimately selects the referent.
(4)
杰克    认为     乔    喜欢     自己。
Jack    renwei   Joe   xihuan   ziji
Jack    believe  Joe   like     SELF
‘Jack believes that Joe likes him/himself.’
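The division of labour described above for the three Mandarin forms can be summarized as a toy Python function over example (4). It is a simplified sketch, assuming a non-uniform (anaphor plus logophor) analysis of ziji along the lines of C. T. J. Huang and Liu (2000); `Context` and `candidate_antecedents` are hypothetical names, logophoricity and the consciousness requirement are supplied as boolean flags, and the function lists permissible sentence-internal antecedents without ranking them (ta can also pick out a discourse referent, which is not modelled).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Context:
    local: str                # subject of the embedded clause (e.g. Joe)
    long_distance: str        # matrix subject (e.g. Jack)
    verb_is_logophoric: bool  # e.g. renwei 'believe'
    ld_is_conscious: bool     # is the LD subject aware of the reported event?


def candidate_antecedents(form: str, ctx: Context) -> list[str]:
    """Sentence-internal antecedents each form may take (hypothetical helper)."""
    if form == "taziji":
        # Complex reflexive: Principle A, local binding only.
        return [ctx.local]
    if form == "ta":
        # Pronominal: Principle B and Rule I exclude the local subject.
        return [ctx.long_distance]
    if form == "ziji":
        # Anaphor reading is always local; logophor reading adds the LD subject
        # only when the verb is logophoric and the consciousness requirement holds.
        out = [ctx.local]
        if ctx.verb_is_logophoric and ctx.ld_is_conscious:
            out.append(ctx.long_distance)
        return out
    raise ValueError(f"unknown form: {form}")


ex4 = Context(local="Joe", long_distance="Jack", verb_is_logophoric=True, ld_is_conscious=True)
for form in ("taziji", "ta", "ziji"):
    print(form, candidate_antecedents(form, ex4))
# taziji ['Joe']
# ta ['Jack']
# ziji ['Joe', 'Jack']
```

Only ziji returns two candidates, which is why its resolution depends on verb semantics and discourse in a way that taziji and ta do not.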
4. The acquisition and processing of English and Mandarin pronouns
In terms of the acquisition of pronouns by L1-dominant children, both English- and Mandarin-speaking children have been shown to acquire Binding Principle A early, by around age 5. That is, they correctly link reflexives to a local c-commanding antecedent (Chien, 1992; Chien & Wexler, 1990; Clackson et al., 2011). In contrast, at the same age, English-speaking children have consistently been shown to erroneously accept local binding of pronominals, a phenomenon often referred to as the Delay of Principle B Effect (see Thornton & Wexler, 1999, for a review). This has been attributed either to children’s immature development of the pragmatic principles governing pronominal resolution (e.g., Rule I), despite their having acquired Binding Principle B (e.g., Chien & Wexler, 1990), or to domain-general limitations such as WM constraints (e.g., Kim & Yoon, 2020). Studies testing the Delay of Principle B Effect in Mandarin, however, have yielded mixed results (Chien & Lust, 2006). It is typically not until the age of 9 that children begin to show adult-like performance in correctly rejecting local binding of pronominals.
As for the development of Mandarin ziji, while children younger than 4 show unsystematic performance patterns, children from the age of 5 predominantly co-index ziji with local rather than LD antecedents (Chien & Lust, 2006). This local preference is also found in adults. However, results vary considerably across studies: for example, mean acceptance rates of LD readings range from under 40% to over 90%, and even mean acceptance rates of local readings range from under 70% to 90% (see Chen & Ionin, 2023, for a summary). Such inter-study variation could be attributed to methodological differences (e.g., truth value judgement tasks vs. picture-biasing sentence acceptability judgement tasks, or logophoric vs. generic verbs), to individual variation in executive function (particularly WM), and/or to the relative importance placed on syntactic versus discourse factors (see Kim & Yoon, 2020, for a summary). Importantly, offline tasks vary in their demands on WM and in the degree to which they require the integration of pragmatic information. However, research efforts addressing task effects and individual variation in this domain remain limited. Notably, studies using online processing methods tend to report a local-reading advantage, such that binding ziji to a local antecedent incurs smaller processing costs than binding it to an LD antecedent (Dillon et al., 2016; Lyu & Kaiser, 2021).
Among bilingual speakers, to our knowledge, all existing studies have employed offline tasks. For example, C. Chen and Ionin (2023) used a picture-based truth-value judgment task to examine the acceptability of local and LD readings of ta, ziji, and taziji in two proficiency-matched groups of L2 learners of Mandarin: L1-Korean and L1-English speakers. Their results showed that L1-Korean learners were more likely than both L1-English learners and L1-dominant Mandarin speakers to accept local readings of ta, suggesting CLI from Korean, a language that permits locally bound pronominals. In the comprehension of taziji, all three groups preferred local readings, although the L1-English group was somewhat more likely to reject them. For ziji, L1-dominant Mandarin speakers showed a numerical preference for local readings, while both L2 groups were significantly more likely to reject LD readings and accept local readings.
In a related study, C. Chen (2020) used the same methodology to compare the comprehension of all three pronoun forms among Mandarin-English HSs and L1-English L2 learners of Mandarin. L2 learners were more likely to reject LD readings and accept local readings of ta (in contrast to Chen & Ionin, 2023), taziji (partially consistent with Chen & Ionin, 2023), and ziji (consistent with Chen & Ionin, 2023). HSs, by contrast, patterned more closely with L1-dominant speakers in their interpretation of ta and taziji. However, they were less likely to accept the LD reading of ziji and more likely to accept its local reading, a pattern partially attributed to CLI from English.
5. The present study
The present study employs the web-based visual world eye-tracking paradigm to investigate the automatic, online processing of three types of Mandarin pronouns, ta, ziji, and taziji, among Mandarin-English late-adolescent HSs aged 14 to 18. Importantly, to reduce individual variability in the binding preferences of ziji and to bias an LD reading, we use only logophoric verbs in the experiment. This manipulation allows us to probe whether different pronoun types engage distinct resolution mechanisms during real-time processing. Specifically, we address the following research questions (RQs):
RQ1: How do HSs process the three types of Mandarin pronouns? Do the pronouns elicit distinct processing patterns?
RQ2: Which user-internal and user-external factors modulate IDs in HSs’ processing, and how? Do these factors operate differentially across pronoun types?
As described above, two user-internal factors are of particular interest in the present study: WM and inhibition. As for the user-external factor, we focus on HL input quantity. Our focus on late adolescence offers a particularly informative window for testing the effects of user-external and user-internal factors. Compared to early adolescence and childhood, late adolescence is marked by the maturation of executive functions, enabling us to better distinguish between effects attributable to ongoing cognitive development and those driven by stable individual variability. Importantly, this age range is also when adolescents begin to exert greater autonomy over language use, making more independent choices about when, how, and with whom they engage in their HL. As a result, patterns of HL exposure and use (user-external factors) become more variable and personalized than in childhood, offering richer variation in external input that can be captured within an ID framework.
5.1. Predictions
Starting with RQ1, following C. Chen’s (Reference Chen2020) study with Mandarin-English adult HSs and the broader HL processing literature (e.g., Fuchs, Reference Fuchs2022; Hao et al., Reference Hao, Chondrogianni and Sturt2024; Karaca et al., Reference Karaca, Brouwer, Unsworth and Huettig2024), we predict that adolescent HSs will show distinct processing patterns across pronoun types. More specifically, they should show local binding preferences for taziji and LD preferences for ziji and ta. Nevertheless, according to the IH and the distance problem, HSs may show greater variability in their preferences during the processing of ziji and ta, but relatively robust local preferences for taziji. This is because taziji resolution is governed by narrow syntactic constraints (Binding Principle A) and involves local dependencies, whereas the resolution of ziji and ta requires the integration of information across domains and the establishment of LD dependencies, making these structures more susceptible to input reduction and individual variation. The ID approach adopted in the present study allows us to move beyond a simple group-level analysis and ask why some HSs are more likely than others to exhibit local binding of ziji and ta despite the LD bias (driven by logophoric verbs in the case of ziji and by Binding Principle B in the case of ta), by examining the role of user-internal and user-external factors.
It is worth noting that local interpretations of ziji and ta in the present study need not reflect the same underlying mechanisms. Because ziji allows both local and LD binding in Mandarin and exhibits a default preference for local binding, local interpretations of ziji in the present design may arise from multiple sources, including difficulties with interface/LD structures, reliance on a default local strategy, or CLI from English, where reflexives are locally bound. In contrast, ta does not permit local binding in either Mandarin or English; therefore, any local binding of ta in the present study cannot be attributed to CLI or default strategies and is more likely to reflect difficulties associated with interface/LD integration.
For RQ2, we predict that user-internal and user-external factors will play differential roles depending on pronoun type. With respect to user-internal factors, WM is expected to facilitate LD interpretations of ziji and ta, insofar as establishing LD dependencies requires maintaining and integrating multiple candidate antecedents and sources of information (e.g., Rule I, logophoric verb semantics) during online processing. In addition, inhibitory control is expected to modulate the extent to which ziji is interpreted locally. Specifically, HSs with higher inhibitory ability may be better able to suppress English-like local binding strategies and/or the Mandarin default local binding preference, leading to a higher likelihood of LD readings. Turning to user-external factors, although it may seem intuitive that HL input has a default, ubiquitous effect on HL processing, previous findings are mixed (Hao et al., Reference Hao, Chondrogianni and Sturt2024; Hao, Kubota, et al., Reference Hao, Kubota, Bayram, González Alonso, Grüter, Li and Rothman2025; Karaca et al., Reference Karaca, Brouwer, Unsworth and Huettig2024) and likely depend on other factors, such as the specific grammatical domain and the typical time course of its acquisition (Tsimpli, Reference Tsimpli2014). Two possibilities therefore emerge: HL input either modulates HL processing or it does not. Given the limited literature on this specific domain and age range, we treat the effect of HL input as exploratory, without strong directional predictions.
However, given that the properties we examine, while inherently related, have independent time courses for full acquisition even in monolingual children (Chien & Lust, Reference Chien, Lust, Li, Tan, Bates and Tzeng2006), and in light of claims that reduced input affects interface-related structures more, it is possible, if not likely, that input factors exert a greater influence on some of the properties examined here (ziji and ta) than on others (taziji).
5.2. Participants
In total, 125 eligible participants accessed the experimental platform and provided informed consent. Of these, 44 did not successfully complete the main visual world eye-tracking task, primarily due to repeated calibration failures. An additional nine participants were excluded because their effective eye-tracking sampling rate during the main task was below 15 Hz. Consequently, data from 72 Mandarin-English late adolescent HSs (aged 14–18) who completed the study online were deemed of sufficient quality for analysis.
However, we further excluded 21 participants to control for variation in several user-level factors that have been shown to modulate HL development and use beyond the factors in focus in the current study. More specifically, we excluded three participants due to relatively low SES, five participants who currently reside in Ireland, eight participants with low Mandarin proficiency as measured by the Peabody Picture Vocabulary Test, Fourth Edition (PPVT-4; Dunn & Dunn, Reference Dunn and Dunn2012), three participants who speak a language other than Mandarin and English (including other Chinese languages, e.g., Cantonese, Hokkien), and two participants who were first exposed to the societally dominant language, English, after the age of three. The final sample included 51 participants (18 girls; mean age = 15.8 years, SD = 1.7, min = 14 years, max = 18 years). Of these 51 participants, 16 currently reside in the UK and 35 in the USA. All HSs were exposed to Mandarin from birth at home and to English before the age of 3 years. They were either born and raised in their current country of residence (second-generation immigrants; n = 17) or immigrated there before the age of 3 (first-generation immigrants; n = 34), with a mean age of onset of English acquisition of 11.57 months (SD = 7.96, min = 0, max = 27). All participants had received formal instruction in Mandarin (e.g., via Saturday schools, tuition).
5.3. Baseline tasks
5.3.1. Language background questionnaire
To collect participants’ language background and demographic information, we administered the Quantifying Bilingual Experience (Q-BEx) questionnaire (De Cat et al., Reference De Cat, Kašćelan, Prévost, Serratrice, Tuller and Unsworth2023). Q-BEx is a validated, user-friendly online instrument designed to quantify multilingual language experience. We included all mandatory and optional modules except the detailed attitudinal module, thereby obtaining a comprehensive assessment of HSs’ language exposure and use, self-rated proficiency, richness of linguistic experience, and language mixing patterns. The questionnaire provides four composite scores that serve as proxies for HL exposure and use. From these, we derived two key measures by summing the respective exposure and use components: current HL exposure and use (Mean = 0.41, SD = 0.11, Range = 0.07–0.60) and cumulative HL exposure and use (Mean = 80.53, SD = 30.61, Range = 11.70–118.38). These aggregated scores were used as continuous user-external variables in subsequent analyses.
5.3.2. English and Mandarin receptive vocabulary
We administered the Peabody Picture Vocabulary Test, Fourth Edition (PPVT-4; Dunn & Dunn, Reference Dunn and Dunn2012) to assess participants’ receptive vocabulary ability in both English and Mandarin. Form A was administered in English, and Form B was translated from English into Mandarin and administered in Mandarin. Because the PPVT-4 is neither available nor normed for Mandarin and was not designed or normed for bilingual populations, even in its English version, we report raw scores only, which were used solely for participant screening and exclusion purposes. For the same reasons, we caution readers against interpreting PPVT scores as direct measures of language proficiency or comparing English and Mandarin vocabulary scores within the same participant. Nevertheless, test administration followed the procedures outlined in the PPVT-4 manual, including age-appropriate starting items, ceiling rules, and termination criteria based on error counts. The final sample has a mean PPVT score of 139 in English (SD = 8.08, min = 115, max = 152) and 124 in Mandarin (SD = 7.44, min = 106, max = 151).
5.3.3. Flanker/no-go task
To examine inhibitory control, participants completed an engaging variant of the flanker task that also includes a Go/No-Go component (Woodard et al., Reference Woodard, Pozzan and Trueswell2016). On a Go trial, participants pressed a key on the keyboard (“Z” for left and “M” for right) to indicate the direction of the middle fish, which was flanked by two fish on its left and two on its right. There were two types of Go trials: congruent (30 trials) and incongruent (30 trials). In congruent trials, the flanker fish faced the same direction as the middle fish; in incongruent trials, they faced the opposite direction. On a No-Go trial (30 trials), the middle fish was surrounded by fishbowls, and participants were instructed to refrain from responding. The direction of the middle fish was counterbalanced. Each trial was preceded by a 1,000-ms fixation cross in the middle of the screen and lasted until a response was recorded or until a maximum display time of 5,000 ms was reached. Reaction time and accuracy were recorded. We calculated the No-Go cost (mean = −0.16, SD = 0.18, min = −0.33, max = 0.33) as a proxy for inhibitory control by taking the average accuracy difference between omission errors on Go trials and hits on No-Go trials. This No-Go cost score was carried forward into subsequent analyses, with higher values indexing weaker inhibitory control.
5.3.4. Working memory task
We used a spatial sequence WM task inspired by the Alloway Working Memory Assessment (Alloway et al., Reference Alloway, Gathercole, Kirkwood and Elliott2008) to assess participants’ non-verbal visuospatial WM. In this task, participants were asked to help a forgetful alien return home by recalling, in reverse order, the sequence of squares the alien had walked through in a 4 × 4 matrix. In each trial, the alien moved through a random sequence of one or more squares. In the first block, the alien walked through a single square; in each subsequent block, the number of visited squares increased by one. Each block consisted of six trials, with a maximum of eight blocks in total. Scoring followed the standard procedure outlined in the Alloway assessment: if a participant responded correctly to the first four trials in a block, they automatically progressed to the next block and were awarded the full six points for that block, and the task was terminated once the participant responded incorrectly to three trials within the same block. The mean score was 22.69 (SD = 4.76, min = 15, max = 30). This score was carried forward into the analyses as an index of WM capacity, with higher scores indicating greater WM capacity.
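The scoring rule can be sketched as follows. This is a minimal illustration under our reading of the description, not the code used in the study or in the Alloway assessment: it assumes that each block is recorded as a list of per-trial outcomes in presentation order and that the score is the total number of correct trials, including the automatic six-point credit. The function and constant names are hypothetical.

```python
from typing import List, Sequence

TRIALS_PER_BLOCK = 6    # six trials per block
MAX_BLOCKS = 8          # at most eight blocks (sequence lengths 1 to 8)
AUTO_ADVANCE_AFTER = 4  # first four trials correct -> full credit, next block
STOP_AFTER_ERRORS = 3   # three errors within one block -> task ends


def trial_correct(path: Sequence[int], response: Sequence[int]) -> bool:
    """A trial counts as correct only if the squares are recalled in reverse order."""
    return list(response) == list(reversed(path))


def score_backward_span(blocks: List[List[bool]]) -> int:
    """Total WM score from per-block lists of trial outcomes (True = correct)."""
    total = 0
    for block in blocks[:MAX_BLOCKS]:
        correct = errors = 0
        for i, ok in enumerate(block[:TRIALS_PER_BLOCK], start=1):
            if ok:
                correct += 1
            else:
                errors += 1
            if i == AUTO_ADVANCE_AFTER and correct == AUTO_ADVANCE_AFTER:
                correct = TRIALS_PER_BLOCK  # award the full six points and move on
                break
            if errors == STOP_AFTER_ERRORS:
                return total + correct  # discontinue the task
        total += correct
    return total
```

Under this rule, the maximum attainable score is 48 (eight blocks of six points), and a participant who clears the first two blocks outright, gets five of six trials correct in the third, and then makes three errors early in the fourth block scores 6 + 6 + 5 + 1 = 18.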
5.3.5. The visual world eye-tracking experiment
The visual world eye-tracking paradigm was adopted to examine participants’ online processing. In the task, participants listened to sentences while viewing three pictures of potential referents on the screen (Figure 1). We embedded the pronouns in genitive forms by adding the genitive marker de after the pronouns and included three experimental conditions (ta, ziji, and taziji). Additionally, we included a control condition with full NPs followed by the de marker to ensure that participants understood genitives. Each condition had nine trials, giving rise to 27 experimental trials and nine control trials. All experimental sentences followed the same format: This morning/last night + Long Distance referent NP + Main Verb + Local referent NP + Embedded Verb + ta/ziji/taziji/control NP + de + NP. We chose three logophoric verbs (xiangrang “want somebody to do something,” yaoqiu “demand somebody do something,” and mengjian “dream about somebody doing something”) as the main verbs, and each appeared three times per condition. For the embedded verbs and de NP pairs, we chose three pairs, namely liang … de tiwen “check someone’s temperature,” mo … de erdu “touch someone’s ear,” and la … de weiba “pull someone’s tail,” each of which appeared three times per condition. All referent NPs (including the control condition NPs) were frequent disyllabic animal names, and each animal appeared equally often as the LD referent, the local referent, and the third referent (either not mentioned or the control condition NP). The positions of the pictures were counterbalanced such that the LD referent, the local referent, and the third potential referent each appeared an equal number of times in the top, bottom-left, and bottom-right positions. To avoid item-specific effects, we created four lists such that each item appeared once in each of the ta, ziji, taziji, and NP conditions across the four lists.
For example, the visual scene in Figure 1 was accompanied by sentence (5a) in List A, (5b) in List B, (5c) in List C, and (5d) in List D.
(5) a. jintianzaoshang  daxiang   yaoqiu  xiaogou  liang  ta      de  tiwen
       This morning     elephant  demand  dog      check  PRO     de  temperature
       “This morning, the elephant demanded that the dog check his temperature.”
    b. jintianzaoshang  daxiang   yaoqiu  xiaogou  liang  ziji    de  tiwen
       This morning     elephant  demand  dog      check  SE      de  temperature
       “This morning, the elephant demanded that the dog check his temperature/the temperature of himself.”
    c. jintianzaoshang  daxiang   yaoqiu  xiaogou  liang  taziji  de  tiwen
       This morning     elephant  demand  dog      check  SELF    de  temperature
       “This morning, the elephant demanded that the dog check the temperature of himself.”
    d. jintianzaoshang  daxiang   yaoqiu  xiaogou  liang  shizi   de  tiwen
       This morning     elephant  demand  dog      check  lion    de  temperature
       “This morning, the elephant demanded that the dog check the lion’s temperature.”
Figure 1. Example of a visual scene in the visual world eye-tracking experiment.
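The four-list rotation amounts to a Latin square. The sketch below illustrates one such design under stated assumptions: 36 items (nine per condition in each list) indexed from 0, and a simple cyclic rotation in which item i appears in condition (i + list index) mod 4. The item indexing and helper names are hypothetical, and the study’s actual item-to-list mapping may differ. Under this rotation, item 0 appears as ta in List A, ziji in List B, taziji in List C, and the NP control in List D, mirroring the pattern of example (5).

```python
from collections import Counter
from typing import Dict

CONDITIONS = ("ta", "ziji", "taziji", "NP")  # NP = full-NP control condition
LISTS = ("A", "B", "C", "D")
N_ITEMS = 36  # assumption: 9 trials x 4 conditions per list


def build_lists(n_items: int = N_ITEMS) -> Dict[str, Dict[int, str]]:
    """Cyclic Latin square: in list k, item i is presented in condition (i + k) mod 4."""
    return {
        name: {item: CONDITIONS[(item + k) % len(CONDITIONS)] for item in range(n_items)}
        for k, name in enumerate(LISTS)
    }


if __name__ == "__main__":
    lists = build_lists()
    for name, mapping in lists.items():
        # Each list should contain nine items per condition.
        print(name, Counter(mapping.values()))
```

Because every item rotates through all four conditions across lists, item-specific properties (e.g., a particular animal pairing) are not confounded with condition when data are pooled across lists.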

The experiment also included 18 filler trials in which the processing of relative clauses was the focus. Following Y. T. Huang et al.’s (Reference Huang, Zheng, Meng and Snedeker2013) design, relative clauses were chosen because they allow two animate referents to be mentioned while a third can be inferred. As such, the visual scenes for the relative clauses are comparable to those used for pronoun processing in terms of animacy and number of potential referents. For example, for the filler sentence “The fish that catches the lion is singing,” the visual scene consisted of a fish, a lion, and either a potential patient of the action not mentioned in the sentence (e.g., a shrimp) or a potential agent of the action not mentioned (e.g., a crab).
To ensure that participants paid attention to the sentences and understood genitives, the control trials, along with 15 other randomly selected trials, included an offline comprehension check in the form of picture-sentence verification (no participant was removed due to low or below-chance offline comprehension accuracy). In the comprehension check, participants were shown a picture that either matched the sentence or did not and were asked to press the “Z” key if it matched or the “M” key if it did not. This comprehension check was embedded in another alien game. We instructed participants to listen to the sentences and look at the pictures. We also informed them that, occasionally, an alien would try to draw the event described in the sentence, and that when this happened, their task was to decide whether the alien’s drawing matched the sentence by pressing “Z” or “M.” All trials appeared in a completely random order for each participant. Each trial began with a 1,500-ms display of the visual scene, followed by the auditory experimental sentence.
Auditory stimuli were recorded by a male L1-dominant Mandarin user in a soundproof booth. Experimental stimuli were constructed by concatenating extracted tokens of This morning/last night + Long Distance referent NP + Main Verb + Local referent NP + Embedded Verb + ta/ziji/taziji/control NP + de + NP. All recordings were produced with neutral prosody, and no systematic prosodic manipulation (e.g., stress) was applied to any segment. Two L1-dominant Mandarin users checked the naturalness of all stimuli. The duration of all segments except the ta/ziji/taziji/control NP was held constant across items. Importantly, the duration of de + NP was exactly 1,200 ms in each experimental item; this 1,200-ms period constitutes the critical region for analysis.
5.4. Procedure
All participants took part in the study from their homes. We implemented all tasks on a webpage using Gorilla (Anwyl-Irvine et al., Reference Anwyl-Irvine, Massonnié, Flitton, Kirkham and Evershed2020), which uses WebGazer.js (Papoutsaki et al., Reference Papoutsaki, Sangkloy, Laskey, Daskalova, Huang and Hays2016) for webcam-based eye-tracking. To minimize any carry-over effect between the Mandarin and English vocabulary tests, all participants completed the tasks in the following order: the eye-tracking task, Mandarin PPVT, Flanker/No-Go task, WM task, English PPVT, and the Q-BEx. The whole experiment lasted approximately 65 min. The study was approved by the institutional ethics committee. All participants were informed in writing of their rights as participants prior to the experiment. Before starting any tasks, participants checked boxes on the webpage to give consent for their participation.
Prior to participation, participants received an introduction video accompanied by written instructions in both Mandarin and English, detailing how they could help optimize data quality (e.g., closing all applications and webpages other than the experiment page, maximizing ambient lighting). Participants were also provided with video and written instructions on how to complete the calibration procedure. Eye-tracking calibration used a 9-point routine. Recalibration was performed every nine trials or every 5 min, whichever occurred first. For each calibration phase, participants were allowed up to three attempts. An attempt was classified as unsuccessful if at least two of the nine calibration points failed to calibrate. Furthermore, to minimize system lag and reduce computational load, eye-tracking data were recorded only from the onset of the LD NPs. As a result, eye-gaze data during the 1,500-ms preview window and during the initial temporal adverbial segment (e.g., “last night”/“this morning”) were not recorded. These quality optimization procedures yielded a mean effective sampling rate of 30.6 Hz (SD = 9.3, Range = 15–60).
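The calibration rules can be summarized as a few decision functions. This is a sketch of the logic as described, not Gorilla’s or WebGazer.js’s API: the function and constant names are hypothetical, and each calibration attempt is assumed to be represented as nine pass/fail flags, one per calibration point.

```python
from typing import List, Sequence

N_POINTS = 9            # 9-point calibration routine
MAX_FAILED_POINTS = 1   # an attempt fails if two or more of the nine points fail
MAX_ATTEMPTS = 3        # attempts allowed per calibration phase
TRIALS_BETWEEN = 9      # recalibrate every nine trials ...
SECONDS_BETWEEN = 300   # ... or every 5 min, whichever comes first


def attempt_ok(point_ok: Sequence[bool]) -> bool:
    """An attempt succeeds only if at most one calibration point failed."""
    if len(point_ok) != N_POINTS:
        raise ValueError("expected one flag per calibration point")
    failed = sum(1 for ok in point_ok if not ok)
    return failed <= MAX_FAILED_POINTS


def phase_ok(attempts: List[Sequence[bool]]) -> bool:
    """A calibration phase succeeds if any of the first three attempts succeeds."""
    return any(attempt_ok(a) for a in attempts[:MAX_ATTEMPTS])


def recalibration_due(trials_since: int, seconds_since: float) -> bool:
    """Recalibrate after nine trials or five minutes, whichever occurs first."""
    return trials_since >= TRIALS_BETWEEN or seconds_since >= SECONDS_BETWEEN
```

In this sketch, a participant whose phase fails all three attempts cannot proceed, which corresponds to the repeated calibration failures reported as the main reason for non-completion.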
6. Results
For plotting and data analyses, we resampled the eye-movement data into 50-ms time bins. Given the mean sampling rate, this bin width minimizes empty bins while avoiding excessive aggregation of multiple samples within a single bin. This resampling yielded 24 time bins (data points) within the 1,200-ms critical time window. To ensure data quality, we excluded trials with more than 50% invalid data points (e.g., out of bounds), retaining 1,285 of a total of 1,479 trials. Figure 2 illustrates the mean fixation proportion to the long-distance (LD) referent minus the mean fixation proportion to the local referent over the course of a trial. A positive value thus indicates more looks to the LD referent than to the local referent, and a negative value indicates the reverse. The dotted vertical line marks the onset of the critical time window (the onset of the genitive marker de). Because time was centred on the onset of the genitive marker and the pronouns differ in length, the onset of each condition differs in the figure. Visual inspection suggests that, at the group level (RQ1), HSs preferred the LD referent over the local referent after hearing ta (LD-advantage score: Mean = 10.88, SD = 4.37, Range = 1 to 21) and ziji (LD-advantage score: Mean = 6.76, SD = 6.65, Range = −13 to 21). In contrast, they preferred the local referent over the LD referent after hearing taziji (LD-advantage score: Mean = −7.40, SD = 3.89, Range = −24 to 0).
Figure 2. Difference in proportion of fixations to the LD versus local referent by Condition.
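The binning and trial-exclusion steps can be sketched as follows. This is a minimal illustration, not the authors’ R pipeline: it assumes that each gaze sample is a pair of (time in ms relative to the onset of de, AOI label), that invalid samples such as out-of-bounds gaze carry the label None, and that a 50-ms bin counts as invalid when it contains no valid sample. The function names and AOI labels are hypothetical.

```python
from typing import List, Optional, Sequence, Set, Tuple

BIN_MS = 50
WINDOW_MS = 1200  # de + NP critical region; 1200 / 50 = 24 bins

# (ms from de onset, AOI label such as "LD", "local", "third", or None if invalid)
Sample = Tuple[float, Optional[str]]


def bin_window(samples: Sequence[Sample], bin_ms: int = BIN_MS,
               window_ms: int = WINDOW_MS) -> Tuple[List[Set[str]], List[bool]]:
    """Return, per 50-ms bin, the set of AOIs looked at and whether the bin is valid."""
    n_bins = window_ms // bin_ms
    looks: List[Set[str]] = [set() for _ in range(n_bins)]
    valid = [False] * n_bins
    for t, aoi in samples:
        if not 0 <= t < window_ms:
            continue  # sample falls outside the critical window
        b = int(t // bin_ms)
        if aoi is not None:  # None marks an invalid sample (e.g., out of bounds)
            valid[b] = True
            looks[b].add(aoi)
    return looks, valid


def keep_trial(valid: Sequence[bool], max_invalid: float = 0.5) -> bool:
    """Exclude trials in which more than 50% of the bins are invalid."""
    invalid_share = sum(1 for v in valid if not v) / len(valid)
    return invalid_share <= max_invalid
```

With a mean sampling rate of about 30 Hz, a 50-ms bin holds one or two samples on average, which is why empty (invalid) bins remain possible and the trial-level exclusion threshold is needed.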

Figure 2. Long description
The graph features a horizontal x-axis labeled Time Bin ranging from −3000 to 1000 and a vertical y-axis labeled Difference in fixation proportions to LD referent versus local referent ranging from −1.0 to 1.0. A vertical dotted line marks the zero point on the x-axis. A legend on the right identifies three conditions: ta (solid black line), ziji (dotted line), and taziji (dashed line). All lines include a light gray shaded area representing the confidence interval.
* The ta condition (solid line) starts near zero, rises to a peak of approximately 0.35 at −1500, dips back to zero at the vertical dotted line, and then rises sharply toward 0.5 at the end of the timeline.
* The ziji condition (dotted line) remains relatively flat near the zero baseline throughout the negative time bins, showing a slight upward trend toward 0.3 after passing the zero mark.
* The taziji condition (dashed line) fluctuates near zero until −2000, then dips to a trough of −0.2 at −1200, returns to zero at the vertical dotted line, and then drops sharply to −0.5 by the end of the timeline.
To analyse the results statistically, we calculated an LD-advantage score (relative to the local referent) within the critical time window. This score was calculated by subtracting the number of 50-ms time bins within the critical time window that contained looks to the local referent from the number of 50-ms time bins that contained looks to the LD referent. We did not adjust for the 200 ms needed to initiate a saccadic eye movement in response to an acoustic stimulus because we do not distinguish between prediction and integration; rather, we are interested in how pronouns are interpreted online in general. For statistical analyses, linear mixed-effects regressions were fitted with the lme4 package (Bates et al., Reference Bates, Mächler, Bolker and Walker2015) in R (R Core Team, 2018). Whenever possible, our statistical modelling follows a confirmatory approach that is subjective and theory-driven (McElreath, Reference McElreath2020), in which only theoretically motivated fixed effects were included. For random effects, we included the maximal structure justified by the design where possible (Barr et al., Reference Barr, Levy, Scheepers and Tily2013), that is, by-subject and by-item random intercepts, as well as by-subject and by-item random slopes for Condition. When the maximal model failed to converge, we iteratively simplified the random-effects structure, removing the random effect(s) accounting for the least variance, until convergence was achieved.
For RQ1 (Do the pronouns elicit distinct processing patterns?), the maximal converging model had the R syntax lmer(LD_Adv ~ Condition + (1 + Condition | Participant) + (1 | Item)), with the fixed effect Condition treatment coded and ziji as the reference level (ziji vs. ta and taziji). The model suggests that (1) the ziji condition induced significantly more looks to the LD referent than to the local referent (the intercept is significantly larger than zero: Estimate = 7.41, SE = 0.51, CI [6.40, 8.42], t = 14.42, p < 0.001); (2) the ta condition induced more looks to the LD referent than the ziji condition (Estimate = 3.46, SE = 0.48, CI [2.51, 4.41], t = 7.15, p < 0.001); and (3) the taziji condition induced more looks to the local referent than the ziji condition (Estimate = −15.38, SE = 0.74, CI [−16.83, −13.92], t = −20.76, p < 0.001). As post hoc analyses, we reran the model with ta and then taziji as the reference level to examine HSs’ preference for the LD versus the local referent in each condition. The results suggest that the ta condition induced significantly more looks to the LD referent than to the local referent (the intercept is significantly larger than zero: Estimate = 10.87, SE = 0.35, CI [10.18, 11.56], t = 31.00, p < 0.001). In contrast, the taziji condition induced significantly more looks to the local referent than to the LD referent (the intercept is significantly smaller than zero: Estimate = −7.96, SE = 0.48, CI [−8.91, −7.01], t = −16.42, p < 0.001).
For RQ2 (which user-internal and user-external factors modulate IDs in HSs’ processing), we first examined the correlations among all user-internal and user-external factors to avoid multicollinearity in statistical modelling. This step identified strong correlations between Mandarin PPVT score and all other variables and between current and cumulative HL exposure and use, among others (see the R script on OSF for more information). Therefore, to estimate the effects of theoretical interest, we included only current HL exposure and use (centred), WM (centred), and inhibition (No-Go cost; centred) as fixed effects, each interacting with Condition (treatment coded). Given the amount of data at our disposal, we did not include interactions among these background factors. The final converging model had the R syntax lmer(LD_Adv ~ Condition*(Current_Exposure_Use_Mandarin_c + WM_c + NoGo_Cost_c) + (1 | Participant) + (1 | Item)). We calculated the variance inflation factor (VIF) to confirm that this model was not affected by multicollinearity (all VIFs < 2). Table 1 summarizes the statistical output with ziji as the reference level for Condition (ziji vs. ta, taziji). Note that, because the categorical variable was treatment coded, the effects reported in the table are simple effects rather than main effects: the estimates for the continuous predictors reflect their effects in the reference condition (ziji), and the interaction terms reflect how these effects differ in the other conditions.
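The LD-advantage score can be expressed directly in terms of binned looks. The sketch below assumes each of the 24 bins in the critical window is represented as the set of AOIs fixated in that bin; it also assumes that a bin containing looks to both referents counts toward both tallies, a detail the description leaves open. Names and labels are hypothetical. Because the window has 24 bins, the score ranges from −24 to 24.

```python
from typing import Sequence, Set


def ld_advantage(bin_looks: Sequence[Set[str]], ld: str = "LD", local: str = "local") -> int:
    """Number of bins with looks to the LD referent minus bins with looks to the local referent."""
    n_ld = sum(1 for looks in bin_looks if ld in looks)
    n_local = sum(1 for looks in bin_looks if local in looks)
    return n_ld - n_local
```

For example, a trial with 10 bins containing only LD looks, 3 containing only local looks, 1 containing both, and 10 empty bins yields 11 − 4 = 7. No 200-ms offset is applied, consistent with the decision not to separate prediction from integration.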
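Under treatment coding, refitting with a different reference level is a reparameterization, so the post hoc intercepts can be checked by hand: each condition’s intercept equals the ziji intercept plus that condition’s contrast. The short sketch below uses the point estimates reported above; the variable names are ours, and standard errors or confidence intervals cannot be recovered this way because they also depend on the covariance between estimates.

```python
# Treatment coding with ziji as the reference level (RQ1 model estimates above).
ZIJI_INTERCEPT = 7.41
CONTRASTS = {"ta": 3.46, "taziji": -15.38}  # condition minus ziji


def implied_intercept(condition: str) -> float:
    """Mean LD-advantage implied for a condition: reference intercept plus its contrast."""
    if condition == "ziji":
        return ZIJI_INTERCEPT
    return ZIJI_INTERCEPT + CONTRASTS[condition]


for cond in ("ziji", "ta", "taziji"):
    print(cond, round(implied_intercept(cond), 2))
```

This gives 7.41 + 3.46 = 10.87 for ta, matching the refitted intercept, and 7.41 − 15.38 = −7.97 for taziji, which matches the reported −7.96 up to rounding of the published estimates.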
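The multicollinearity check can be illustrated with the textbook definition of the variance inflation factor, VIF_j = 1 / (1 − R²_j), where R²_j is obtained by regressing predictor j (with an intercept) on all other predictors. The plain-Python sketch below implements that definition for a set of predictor columns. It is only an illustration: it is not necessarily how the R tooling in the authors’ OSF script computes VIFs for a mixed model with interaction terms, and the function names are hypothetical.

```python
from typing import List, Sequence


def _solve(A: List[List[float]], b: List[float]) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def r_squared(y: Sequence[float], predictors: List[Sequence[float]]) -> float:
    """R^2 of an OLS regression of y on the predictors plus an intercept (normal equations)."""
    rows = [[1.0] + [p[i] for p in predictors] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[a] * r[c] for r in rows) for c in range(k)] for a in range(k)]
    xty = [sum(r[a] * yi for r, yi in zip(rows, y)) for a in range(k)]
    beta = _solve(xtx, xty)
    fitted = [sum(bj * rj for bj, rj in zip(beta, r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def vifs(columns: List[Sequence[float]]) -> List[float]:
    """VIF for each predictor column: 1 / (1 - R^2 from regressing it on the others)."""
    out = []
    for j, col in enumerate(columns):
        others = [c for i, c in enumerate(columns) if i != j]
        out.append(1.0 / (1.0 - r_squared(col, others)))
    return out
```

A VIF of 1 means a predictor is uncorrelated with the others; values below 2, as reported here, are conventionally taken to indicate negligible collinearity, whereas a predictor that is nearly a linear combination of the others produces a very large VIF.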
Table 1. The model with Condition (ziji as the reference level) interacting with Current HL Exposure and Use, WM, and NoGo Cost as fixed effects

Table 1. Long description
The table contains six columns: Predictors, Estimates, SE, CI, t, and p.
* (Intercept): Estimate 7.41, SE 0.29, CI [6.83, 7.98], t 25.36, p < 0.001.
* taziji: Estimate −15.42, SE 0.41, CI [−16.23, −14.61], t −37.39, p < 0.001.
* ta: Estimate 3.46, SE 0.36, CI [2.76, 4.17], t 9.62, p < 0.001.
* Current HL Exposure Use: Estimate 0.50, SE 0.26, CI [−0.00, 1.00], t 1.95, p 0.052.
* WM: Estimate 1.78, SE 0.25, CI [1.30, 2.27], t 7.25, p < 0.001.
* NoGo Cost: Estimate −2.13, SE 0.25, CI [−2.63, −1.63], t −8.41, p < 0.001.
* taziji × Current HL Exposure Use: Estimate −2.62, SE 0.34, CI [−3.30, −1.95], t −7.62, p < 0.001.
* ta × Current HL Exposure Use: Estimate −0.66, SE 0.30, CI [−1.24, −0.08], t −2.24, p 0.025.
* taziji × WM: Estimate −2.18, SE 0.35, CI [−2.86, −1.50], t −6.26, p < 0.001.
* ta × WM: Estimate 0.14, SE 0.29, CI [−0.42, 0.70], t 0.48, p 0.628.
* taziji × NoGo Cost: Estimate 1.99, SE 0.35, CI [1.31, 2.67], t 5.74, p < 0.001.
* ta × NoGo Cost: Estimate 2.13, SE 0.29, CI [1.55, 2.70], t 7.27, p < 0.001.
This model suggests that individual HSs’ looking patterns in the ziji condition were modulated by WM and NoGo Cost but not significantly by Current HL Exposure and Use. More specifically, higher WM was associated with more looks to the LD referent, and higher NoGo Cost was associated with fewer looks to the LD referent. To unpack the significant interactions between Condition and WM, NoGo Cost, and Current HL Exposure and Use, we ran post hoc analyses by refitting the model with each level of Condition as the reference level. As can also be seen in Figure 3, post hoc analyses revealed that looking patterns in the ta condition were modulated by WM, with participants with higher WM capacity more likely to look at the LD referent (Estimate = 1.92, SE = 0.24, CI [1.45, 2.39], t = 8.07, p < 0.001). However, looking patterns in the ta condition were not modulated by Current HL Exposure and Use (Estimate = −0.17, SE = 0.25, CI [−0.65, 0.32], t = −0.67, p = 0.51) or by NoGo Cost (Estimate = −0.01, SE = 0.24, CI [−0.47, 0.47], t = −0.02, p = 0.98). Looking patterns in the taziji condition were modulated neither by WM (Estimate = −0.40, SE = 0.31, CI [−1.01, 0.21], t = −1.27, p = 0.20) nor by NoGo Cost (Estimate = −0.14, SE = 0.30, CI [−0.73, 0.46], t = −0.45, p = 0.65), but they were modulated by Current HL Exposure and Use, such that participants with more HL exposure and use were more likely to look at the local referent (Estimate = −2.13, SE = 0.30, CI [−2.72, −1.53], t = −6.99, p < 0.001).
Figure 3. Effect of WM (left), Inhibition (mid), and Current Exposure and Use of Mandarin on LD advantage.
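Because Condition is treatment coded, the slope of each centred predictor within ta or taziji equals its slope in the reference condition (ziji) plus the corresponding interaction term. The sketch below recovers the post hoc point estimates from the Table 1 coefficients up to rounding (e.g., 1.78 + 0.14 = 1.92 for WM in the ta condition). It is an arithmetic illustration only: the dictionary and function names are ours, and standard errors and p-values for these simple slopes additionally require the coefficient covariance matrix, which refitting with each reference level provides directly.

```python
# Fixed-effect estimates from Table 1 (reference level: ziji).
ZIJI_SLOPES = {"exposure": 0.50, "wm": 1.78, "nogo": -2.13}
INTERACTIONS = {
    "ta": {"exposure": -0.66, "wm": 0.14, "nogo": 2.13},
    "taziji": {"exposure": -2.62, "wm": -2.18, "nogo": 1.99},
}


def simple_slope(condition: str, predictor: str) -> float:
    """Slope of a centred predictor within one condition under treatment coding."""
    slope = ZIJI_SLOPES[predictor]
    if condition != "ziji":
        slope += INTERACTIONS[condition][predictor]
    return slope


for cond in ("ziji", "ta", "taziji"):
    print(cond, {p: round(simple_slope(cond, p), 2) for p in ZIJI_SLOPES})
```

The small discrepancies (e.g., 0.50 − 0.66 = −0.16 versus the reported −0.17 for exposure in the ta condition) are consistent with rounding of the published two-decimal estimates.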

Figure 3. Long description
A multi-panel line graph with three panels arranged horizontally. All panels share a y-axis labeled LD_vs_Local ranging from −10 to 15. A legend at the bottom identifies three conditions: taziji (red), ta (blue), and ziji (green).
* Left panel (Mandarin exposure/use): The x-axis is Current_Exposure_Use_Mandarin from 0.00 to 0.75. The blue line (ta) remains high and stable near 10. The green line (ziji) shows a slight linear increase from 5 to 10. The red line (taziji) shows a sharp linear decrease from 0 to −15.
* Middle panel (Working memory): The x-axis is WM from 15 to 30. The blue line (ta) and green line (ziji) both show a parallel linear increase, with ta rising from 8 to 14 and ziji rising from 5 to 10. The red line (taziji) is stable and flat near −8.
* Right panel (Inhibition): The x-axis is NoGo_Cost from −0.2 to 0.2. The blue line (ta) is flat at 11. The red line (taziji) is flat at −8. The green line (ziji) shows a clear linear decrease from 9 to 2.
7. Discussion
The present study investigated how Mandarin-English HSs interpret different types of pronouns in real time and how linguistic-level and individual-level factors shape this process. We focused on a group of late adolescent HSs, an age group that is underrepresented in the literature. Using a web-based visual world eye-tracking paradigm, we examined the online interpretation of three Mandarin pronouns, that is, ta, ziji, and taziji. Importantly, while taziji is governed by narrow syntax, interpreting ta and ziji requires integrating different information sources, with ta requiring syntax-discourse integration and ziji requiring syntax-semantics-discourse integration.
Starting with group-level performance across pronoun types (RQ1), we observed clear distinctions in pronoun processing patterns. Specifically, HSs interpreted taziji as referring to the local antecedent, whereas they interpreted ta and ziji as referring to the LD antecedent. These findings align with those reported by C. Chen (2020), who found L1-dominant-like offline interpretations of ta and taziji among adult HSs. Our results, together with those of C. Chen (2020), suggest that Mandarin HSs successfully process and interpret pronouns, even when pronominal resolution requires establishing complex linguistic dependencies, such as those involving LD antecedents, and integrating information at the interface between syntax (and semantics) and discourse (as in ta and ziji). This observation is particularly noteworthy in light of claims that linguistic dependencies, especially LD ones, constitute a vulnerable domain in HL grammars (Polinsky & Scontras, 2020). Moreover, under the IH (Sorace, 2011), structures that require integration across grammatical interfaces between syntax and discourse are predicted to be especially prone to attrition or arrested development in bilingual populations. Both ta and ziji fall into this category, as their resolution depends on information at the syntax-(semantics-)discourse interface.
The absence of evidence for such vulnerability in our data therefore challenges the scope of these theoretical predictions and highlights the need to better understand the conditions under which interface-dependent phenomena/LD dependencies may or may not be vulnerable in HSs (Daskalaki et al., 2019; Leal et al., 2014). One plausible explanation for this discrepancy lies in differences in method and population. Whereas previous studies reporting vulnerability in pronominal interpretation among HSs have primarily used offline comprehension or production tasks and focused on adult HSs (e.g., Kim & Yoon, 2020), our study employed a real-time eye-tracking paradigm and targeted late-adolescent HSs. The findings thus underscore the need for future research that adopts a developmental perspective and systematically varies task modality to capture a more comprehensive picture of HL bilingual processing and representation across the lifespan (Fuchs, 2022).
Additionally, the current study found that ta elicited more looks to the LD referent than ziji, suggesting stronger and more consistent LD resolution for ta across participants. Strikingly, all LD advantage scores for the ta condition were positive, indicating that every participant consistently interpreted ta as referring to the LD antecedent across all items. In contrast, the LD advantage scores for ziji showed much more variability. This suggests that although there was a group-level preference for LD interpretations of ziji, individual participants occasionally interpreted ziji as referring to the local antecedent. This variability is not entirely unexpected, even though all the verbs used in the study were logophoric and thus biased toward LD readings, and it may reflect differences in the linguistic architecture supporting each pronoun. The resolution of ta relies on syntax-discourse interface mechanisms. In contrast, ziji resolution requires integration across multiple interfaces (syntax, semantics, and discourse), potentially making it more susceptible to individual variation, as HSs may weigh sources of information differently depending on their individual language experiences and cognitive resources (Hao, Kubota, et al., 2025; Hao, Rossi, et al., 2025; Kim & Yoon, 2020). These findings underscore the importance of examining not only group-level trends but also participant-level variability in HL bilingualism (De Houwer, 2023; Paradis, 2023; Rothman et al., 2023), a point we now turn to in the discussion of RQ2.
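The contrast between a uniform pattern and a variable one can be checked with a simple participant-level summary of the kind sketched below. It assumes, hypothetically, that each item-level LD advantage score is signed so that positive values favour the LD referent and negative values favour the local referent, consistent with the LD advantage measure plotted in Figure 3; the function name, data layout, and toy scores are illustrative only. The summary reports whether every participant (and every item) points in the LD direction, which corresponds to the pattern described for ta, and the spread of participant means, which indexes the kind of between-participant variability described for ziji.

```python
from statistics import mean, pstdev


def summarize_ld_advantage(scores):
    """Summarize item-level LD advantage scores for one pronoun condition.

    `scores` maps a participant ID to that participant's item-level scores,
    where (by assumption) positive values favour the LD referent and
    negative values favour the local referent.
    """
    participant_means = {pid: mean(items) for pid, items in scores.items()}
    means = list(participant_means.values())
    return {
        # Does every participant show a net LD (or net local) preference?
        "all_participants_positive": all(m > 0 for m in means),
        "all_participants_negative": all(m < 0 for m in means),
        # Does every single item score point in the LD direction?
        "all_items_positive": all(s > 0 for items in scores.values() for s in items),
        # Spread of participant means: a simple index of between-participant variability.
        "sd_of_participant_means": pstdev(means),
    }


# Hypothetical toy scores for two participants (not the study's data).
ta_scores = {"p01": [10.0, 12.0], "p02": [8.0, 9.0]}
ziji_scores = {"p01": [9.0, 7.0], "p02": [-3.0, 1.0]}

ta_summary = summarize_ld_advantage(ta_scores)
ziji_summary = summarize_ld_advantage(ziji_scores)
```

With these toy values, every ta participant and item is positive, whereas one ziji participant has a net local preference and the spread of ziji participant means is larger. The `all_participants_negative` flag covers the mirror case of a uniformly local pattern.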
Our second research question (RQ2) investigated which user-internal (i.e., WM, inhibition) and user-external (i.e., current HL exposure and use) factors modulate IDs in HSs’ pronoun processing and whether these effects differ across pronoun types. The results reveal distinct patterns of modulation for each pronoun. For taziji, only the user-external factor of current HL exposure and use significantly modulated participants’ look patterns: HSs with greater current exposure to and use of the HL were more likely to fixate on the local antecedent. In contrast, for ta, only the user-internal factor of WM, but not inhibition or HL exposure and use, predicted processing behaviour. That is, participants with higher WM capacity showed stronger preferences for the LD antecedent. Lastly, for ziji, both WM and inhibition modulated look patterns: participants with higher WM capacity and stronger inhibitory control (indicated by lower NoGo cost) showed a greater tendency to fixate on the LD antecedent.
The role of user-internal factors aligns well with our predictions and broader cognitive accounts of sentence processing. Specifically, WM emerged as a key modulator of successful LD resolution for both ta and ziji, but not for taziji. This pattern suggests that pronouns that require resolution across longer syntactic distances or across multiple domains (e.g., syntax, discourse, semantics) impose greater cognitive demands and thus rely more heavily on WM resources. In contrast, the processing of taziji, which is governed primarily by narrow syntactic constraints and typically resolved locally, does not appear to require substantial WM resources. This accords well with findings from Bice and Kroll (2021), who also reported no WM effects on morphosyntactic processing (subject-verb agreement) among HSs, reinforcing the view that local dependencies/narrow syntax may not engage domain-general cognitive systems to the same extent as LD dependencies/interface structures.
Unlike ta, ziji exhibited sensitivity to both WM and inhibition, suggesting that processing ziji may place demands not only on memory resources but also on participants’ ability to manage competing interpretations. One possibility is that inhibition is involved in suppressing CLI. English lacks a direct equivalent of Mandarin ziji, which might lead HSs to map English reflexives onto Mandarin ziji, giving rise to a preference for local interpretations. However, this does not mean that Mandarin HSs have a reduced inventory of pronouns (ta vs. reflexives), a possibility suggested by Polinsky and Scontras (2020), as the current study shows a clear distinction in how HSs process the three pronouns. Alternatively, or perhaps additionally, inhibition may be required to suppress a dominant or default preference for local binding of ziji, even when the semantic cues from the logophoric verb bias an LD reading. This possibility is consistent with prior research suggesting that local interpretations of ziji are more accessible and less costly to process (Dillon et al., 2016; Lyu & Kaiser, 2021). Under such an account, participants with better inhibitory control would be more successful in suppressing the local (default) interpretation, especially when it conflicts with the logophoric semantics of the main verbs.
The fact that inhibition modulated ziji but not ta, despite both requiring LD interpretation, further suggests that ziji may involve more interpretive conflict. An open question, however, is which of these mechanisms, CLI or a default local binding preference, is more influential in shaping ziji interpretation among HSs. Future studies could address this question by directly manipulating verb type (e.g., logophoric vs. generic), thereby testing whether the availability of strong semantic cues reduces inhibitory demands in ziji interpretation. Additionally, examining the processing of other LD dependencies that differ in cross-linguistic similarity and in within-language defaults may help dissociate CLI effects from interpretive biases.
Lastly, regarding the effect of the user-external factor, that is, current HL exposure and use, we found that it modulated look patterns only when HSs processed taziji, but not ta or ziji. Although it seems intuitive that more HL exposure and use would lead to better performance in general, as observed in offline tasks (Chondrogianni & Daskalaki, 2023; Daskalaki et al., 2019; Kubota et al., 2025; Paradis et al., 2020; Soto-Corominas et al., 2022), findings from online processing studies have been mixed. HL input effects have been reported in adult HSs (e.g., Hao, Kubota, et al., 2025; Hao, Rossi, et al., 2025) but have been reported to be absent in children (e.g., Hao et al., 2024). One interpretation is that HL input effects are developmentally mediated (i.e., the older HSs get, and possibly the more distant the average HS becomes from the HL in multiple senses, the more deterministic IDs in exposure become). The current findings, however, suggest a more nuanced possibility: HL input may have selective effects on real-time processing depending on the grammatical domain.
Specifically, we propose that structures involving interface-level integration and/or LD dependencies (ziji and ta) place greater demands on domain-general cognitive resources. As such, their processing may rely less on the amount of HL input alone and more on individual cognitive capacities like WM and inhibition. In contrast, narrow syntactic structures, such as taziji, which is constrained by Binding Principle A and requires local binding, may benefit more directly and robustly from increased HL exposure and use, as these structures are more rule-governed, frequent in input, and less cognitively taxing.
This interpretation, however, contrasts with the predictions of the IH (Sorace, 2011), which posits that interface phenomena are more vulnerable to variability and should be more sensitive to input (reduction). One possible resolution is to distinguish between representation and processing efficiency, especially given that most studies supporting the IH have used offline measures. It may be that increased HL use enhances the automaticity and efficiency of processing well-established syntactic representations (e.g., Hao et al., 2024; Hao, Kubota, et al., 2025) but is less effective at resolving the more variable and inferential demands of interface phenomena, where cognitive effort, rather than input frequency, may be the bottleneck.
The present study provides some evidence for this view. When we examined individual participants’ LD versus local antecedent preferences for taziji across items, we found that all participants consistently interpreted taziji as referring to the local antecedent, as LD advantage scores were uniformly negative. This suggests categorical application of Binding Principle A. Here, HL exposure and use may have influenced not which interpretation participants arrived at, but how efficiently they processed and resolved the dependency in real time. In other words, HL exposure may have facilitated more rapid retrieval or application of binding constraints, even when interpretive outcomes were uniform across participants. Future research could test this by manipulating HL use (e.g., through longitudinal or training designs), comparing real-time processing and interpretive accuracy across pronoun types, and varying verb semantics to isolate input effects from cognitive demands.
Overall, the current findings strongly suggest that not all interface phenomena/LD dependencies are equally vulnerable, nor do they respond uniformly to input variation or cognitive factors (Leal et al., 2014). Rather, this study highlights the nuanced role that cognitive and experiential factors play in shaping real-time pronoun processing among adolescent HSs.
Data availability statement
Supplementary materials, including the full experimental lists and the data that support the findings of this study, are openly available in OSF at https://osf.io/xs23z.
Acknowledgements
We thank the participants who made this research possible. Our special thanks go to the enthusiastic families who advertised the study on our behalf.
Funding statement
This project was funded by the European Union’s Horizon Europe research and innovation programme under the Marie Skłodowska-Curie grant agreement No 101104834, and the Trond Mohn Foundation, under the Center for Language, Brain, and Learning (C-LaBL) grant No. TMS2023UiT01. Views and opinions expressed are, however, those of the author(s) only and do not necessarily reflect those of the European Union. Neither the European Union nor the granting authority can be held responsible for them.
Competing interests
The author(s) declare none.
Disclosure of use of AI tools
None declared.
Ethics statement
The Research Ethics Committee at the Faculty of Humanities, Social Sciences, and Education at UiT The Arctic University of Norway has assessed the study protocol, including the methodology, recruitment of participants, data processing, as well as the information letter and informed consent. The study protocol was approved by the committee in accordance with the Guidelines for Research Ethics in the Social Sciences and the Humanities (Ref: 12–2024).