Highlights
• Instructed and immersed bilinguals differ in L1 attrition in oral production.
• L1 attrition effects are modulated by patterns of L1/L2 use and exposure.
• Greater overexplicitness in referring expressions is attested in bilinguals.
• Overexplicitness in bilinguals is partly linked to ambiguity avoidance strategies.
• Pragmatic, activation, and processing costs explain L1 attrition differences.
1. Introduction
Research in bilingualism has increasingly focused on identifying the cognitive and linguistic mechanisms that underlie changes in first language (L1) use among bilinguals. In this context, particular attention has been directed towards how distinct types of bilingual experience, especially immersion versus instruction contexts, shape the nature and extent of these L1 changes. This study addresses these concerns by examining how bilingualism affects the L1 production of subject referring expressions (REs), particularly in topic continuity (TC) contexts. This study specifically targets the spoken production of REs by two types of L1 Spanish-L2 English bilinguals, i.e., early immersed bilinguals in the second language (L2) setting and instructed bilinguals in an L1-dominant environment [Footnote 1], to determine how attrition effects unfold and whether such effects vary as a function of language use and exposure.
While variability in subject expression has been widely studied in both L1 and L2 populations (Lozano, 2021), research specifically targeting the production patterns in potential L1 attriters remains limited (Giannakou & Sitaridou, 2022; Köpke & Genevska-Hanke, 2018). Existing studies on L1 attrition in REs have primarily employed interpretation and processing tasks (e.g., Chamorro et al., 2016; Kaltsa et al., 2015; Tsimpli et al., 2004), leaving the domain of spoken production underexplored. However, spontaneous production provides key insights into the interface between linguistic representation and real-time use, particularly in discourse-sensitive phenomena such as subject expression.
Moreover, the literature has largely focused on bilinguals who have been immersed in an L2 environment for an extended period of time, generally longer than 5 years (Chamorro et al., 2016; Kaltsa et al., 2015; Tsimpli et al., 2004). This tendency broadly reflects a conventional view of L1 attrition as a representational phenomenon that requires both prolonged and intense L2 immersion (Gürel, 2004; Schmid, 2013; Seliger & Vago, 1991). However, more recent frameworks propose a broader and more dynamic conception of L1 attrition. Schmid and Köpke (2017, pp. 637–638), for instance, define attrition as ‘any of the phenomena that arise in the native language of a sequential bilingual as the consequence of the co-activation of languages, crosslinguistic transfer or disuse, at any stage of second language development and use’. Adopting this broader view, this study investigates attrition as a gradual, multidimensional process that can affect both representation and processing along a continuum and emerge under various conditions of L2 exposure.
In line with this approach, this study includes not only early immersed bilinguals but also instructed bilinguals who acquire and use their L2 within a formal educational setting while still residing in an L1-dominant environment. Although often overlooked in L1 attrition research, instructed bilinguals provide a valuable comparison group: while they also experience crosslinguistic influence and language co-activation to different degrees, their frequency of L1 use is considerably higher than that of immersed bilinguals, and importantly, previous research has evidenced L1 changes in this type of bilingual (Cook, 2003; Długosz, 2021; Kecskes & Papp, 2003; Requena & Berry, 2021). Crucially, this distinction allows us to examine how attrition effects are modulated by language use, as predicted by current models of bilingual language interaction such as the Activation Threshold Hypothesis (ATH) (Paradis, 1993, 2004, 2007). This account posits that competing items in the language not in use are inhibited following the activation of their counterparts in the language that is used more frequently. It additionally hypothesises that such L1 changes emerge as a function of frequency and recency of L1 use. Consequently, immersed bilinguals are expected to exhibit more pronounced attrition effects than their instructed counterparts.
A further contribution of this study lies in addressing the lack of control over discourse-pragmatic variables in prior L1 attrition research. While earlier work has shown that potential L1 attriters tend to be overexplicit in RE production, particularly in TC contexts (Giannakou & Sitaridou, 2022; Köpke & Genevska-Hanke, 2018; Sorace, 2011, 2016), the specific discourse conditions that give rise to these patterns have rarely been analysed in detail. This study focuses on two such variables, specifically antecedent distance and number of potential antecedents, which have been shown to influence the use of fuller REs (Arnold & Griffin, 2007; Lozano, 2016; Martín-Villena & Lozano, 2020). By systematically controlling for these factors, we aim to more accurately isolate differences between bilingual groups and functional monolinguals [Footnote 2] to better understand the conditions under which attrition effects surface. Such methodological refinement is critical for accurately assessing the nature and locus of attrition phenomena.
On a final note, the theoretical underpinnings guiding this research draw on two complementary hypotheses. The Interface Hypothesis (IH) (Sorace, 2011, 2016) suggests that structures at the syntax-discourse interface are particularly vulnerable in bilingual grammars, while the Pragmatic Principles Violation Hypothesis (PPVH) (Lozano, 2016, 2018) attributes the overuse of explicit forms to violations of pragmatic economy and referential appropriateness. Although the former has been widely used in L1 attrition research, the latter has not yet been applied to this context, despite offering a potentially fruitful explanatory model for bilingual production patterns.
2. Factors constraining the production of null and overt subject REs in native Spanish
Spanish is a null subject language where null and overt subject REs, including overt pronouns and NPs, can grammatically alternate (Rizzi, 1993). However, the selection between these forms in null subject languages such as Spanish is not completely arbitrary, and several differences are found across pro-drop languages (Contemori & Di Domenico, 2021; Filiaci et al., 2014; Giannakou & Sitaridou, 2020; Leonetti-Escandell & Torregrossa, 2024; Lozano et al., 2023; Torregrossa et al., 2020). In fact, several discursive factors have been shown to constrain the form of the RE used in different contexts within this syntax-discourse interface phenomenon. Firstly, previous research has drawn distinctions concerning how information status can account for the use of differentially explicit subject REs in Spanish. On the one hand, less explicit forms – largely null pronouns – tend to be employed in TC (see 1), that is, where the same subject referent, which typically coincides with the topic, is maintained across clauses (Sánchez, 2010). On the other hand, topic shift (TS) is generally encoded via fuller forms, such as overt pronouns and NPs (Bel et al., 2016; Blackwell & Quesada, 2012; Lozano, 2016; Martín-Villena & Lozano, 2020; Shin & Smith Cairns, 2009).
This aligns with the expected biases of null and overt pronouns suggested by Carminati (2002): null pronouns largely link back to the previous subject (i.e., TC) and overt pronouns to the previous object (i.e., TS). In TC, as the referent is kept constant across clauses, minimal forms – null pronouns – are generally preferred in Spanish (Martín-Villena & Lozano, 2020; Montrul & Rodríguez-Louro, 2006).
(1) Chaplin_i abrió la puerta. ∅_i se encontró con una habitación vacía.
Chaplin_i opened the door. (He)_i found an empty room.
Thus, in TC, null and non-null subject languages can be broadly differentiated. As opposed to Spanish, a non-null subject language (e.g., English), which does not grammatically allow for the dropping of overt subjects across the board [Footnote 3], encodes TC via overt subject REs (Martín-Villena et al., 2024). It is in these differentially encoded contexts, as opposed to TS, where L2 learners have been shown to differ from monolinguals, and particularly when they involve third person anaphoric singular subjects (Lozano, 2009, 2016; Martín-Villena & Lozano, 2020; Quesada, 2021), as they do not seem to struggle with deictic uses of first and second person pronouns. Thus, this study will particularly target TC in the oral production of L1 Spanish-L2 English potential L1 attriters.
Apart from information status, other discursive factors have been found to modulate rates of (over)explicitness in Spanish subject REs. Among these, the distance between a given RE and its antecedent has been argued to be a strong predictor of the appearance of overt forms in TC. Based on accounts such as Accessibility Theory (Ariel, 1990, 1991), the Givenness Hierarchy (Gundel et al., 1993), or Givón’s (1983) Continuity Scale, those referents that are less accessible or salient require the use of explicit material to be recovered. Importantly, referential distance is claimed to play a very central role in modulating referent accessibility or salience, with more distant antecedents making referents less salient and accessible. In fact, previous research has found that more explicit material is employed under the presence of more distant antecedents both in learners and natives (Lozano, 2016; Quesada & Lozano, 2020).
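As a rough illustration of how referential distance could be operationalised over an annotated transcript, the sketch below counts how many clauses separate a subject RE from the most recent earlier clause with the same subject referent. The one-referent-per-clause representation, the function name, and the referent labels are hypothetical simplifications and do not reproduce the annotation scheme used in this study.

```python
from typing import List, Optional


def antecedent_distance(subject_referents: List[str], index: int) -> Optional[int]:
    """Return how many clauses back the same subject referent was last mentioned.

    Assumption: each clause is reduced to the label of its subject referent,
    and only earlier subject mentions count as antecedents.
    Returns None when the referent has no earlier mention.
    """
    target = subject_referents[index]
    for back in range(1, index + 1):
        if subject_referents[index - back] == target:
            return back
    return None


# Hypothetical sequence of subject referents, one label per clause.
clauses = ["chaplin", "chaplin", "man", "baby", "chaplin"]
distances = [antecedent_distance(clauses, i) for i in range(len(clauses))]
```

Under this simplification, a distance of 1 corresponds to an antecedent in the immediately preceding clause, and larger values correspond to the more distant, less accessible antecedents discussed above.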
In addition to antecedent distance, the presence of multiple activated antecedents has been shown to modulate the selection of subject REs in production. Generally, selecting a fuller or less explicit subject RE first requires narrowing down from a choice of potential activated antecedents in discourse, which increases the cognitive load (Arnold & Griffin, 2007). For instance, in (2), three potential antecedents are introduced (namely Chaplin, a man, and a baby).
(2) Chaplin_i coge el bebé_j y ∅_i se va corriendo y ∅_i encuentra a un hombre_k para deshacerse del bebé_j. Chaplin_i le_k da el bebé_j al hombre_k. [ES_SP_18_14_ASO] [Footnote 4]
Chaplin_i takes the baby_j and ∅_i runs off and ∅_i finds a man_k to get rid of the baby_j. Chaplin_i gives the baby_j to the man_k.
Selection of the required RE in the second sentence (Chaplin le da el bebé al hombre) needs to be done considering the number of potential antecedents that match in features (e.g., third person singular) with the verb, e.g., Chaplin, the man, and the baby. Thus, in the presence of matching features in more than one potential antecedent, a fuller RE should be selected to avoid ambiguity. Importantly, these hypotheses have been confirmed in previous research, which has attested an increase in the use of more explicit REs in contexts with a higher number of activated antecedents (Arnold & Griffin, 2007; Lozano, 2016; Martín-Villena & Lozano, 2020; Quesada, 2021), a factor which we further explore. Therefore, controlling the effect of these variables is deemed crucial to account for potential differences in overexplicitness between Spanish monolinguals and bilinguals who are proficient in L2 English.
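The feature-matching reasoning above can be sketched in a few lines. The example below is only an illustration under stated assumptions: the `Referent` class, its fields, and the threshold rule are hypothetical, and the referents are those of example (2). It counts the activated antecedents whose person and number features match the verb and flags the context as one where a fuller RE would help avoid ambiguity when more than one candidate remains.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Referent:
    """Hypothetical representation of an activated discourse referent."""
    name: str
    person: int
    number: str  # "sg" or "pl"


def matching_antecedents(activated: List[Referent], person: int, number: str) -> List[Referent]:
    """Keep only the referents whose features match the verb's agreement."""
    return [r for r in activated if r.person == person and r.number == number]


# Referents activated in example (2); all are third person singular.
activated = [
    Referent("Chaplin", 3, "sg"),
    Referent("bebé", 3, "sg"),
    Referent("hombre", 3, "sg"),
]

candidates = matching_antecedents(activated, person=3, number="sg")
# Assumed rule of thumb: more than one matching candidate calls for a fuller RE.
needs_fuller_form = len(candidates) > 1
```

In a single-character narrative such as Task 1, the same check would return one candidate, so a null pronoun would suffice on this simplified view.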
3. Pragmatic Principles Violation Hypothesis
Research on anaphora resolution has consistently shown that bilinguals experience difficulties when interpreting and producing subject REs, especially in interface structures where pragmatic and syntactic cues must be integrated simultaneously (Lozano, 2021). For instance, previous studies have generally reported an overproduction of overt REs in null subject languages among different types of bilinguals, e.g., late bilinguals, heritage speakers, or L1 attriters (Lozano, 2021). From a processing perspective, the IH proposes that bilinguals – in both their languages – tend to be less efficient than monolinguals when integrating multiple sources of information from different domains (e.g., syntax or discourse) simultaneously. These differences, which may result from a less automatic syntactic processing of interface structures, are argued to manifest in production as a potential overproduction of overt pronouns. These are claimed to be used as a default strategy to compensate for potential failures that might arise when computing mappings at the syntax-discourse interface in real time, partly because of bilinguals’ enhanced sensitivity to ambiguity and the consequent tendency to be pragmatically redundant rather than ambiguous. In addition, part of bilinguals’ processing resources may be taken up by inhibiting the language not currently in use, which requires constant engagement of executive control and leaves fewer resources available for computing such mappings.
Connected with this idea, Lozano (2016) observed that L2 learners tend to produce redundant subject REs more frequently than ambiguous ones, as they overuse overt pronouns in TC rather than overuse null subjects in TS. This asymmetry, which frames deficits as a balance between redundancy and ambiguity, motivated his subsequent pragmatic account of anaphora resolution in bilinguals. Thus, taking as a point of departure the IH’s insight that learners resort to overt pronouns as a ‘default’ processing strategy to compensate for deficits at the syntax-discourse interface, Lozano (2016) proposed a pragmatic model to account for this and related observations, i.e., the Pragmatic Principles Violation Hypothesis (PPVH). Based on previous work that used Grice’s (1975) Maxims of Quantity and Manner to explain differences in the distribution of subject REs, Lozano formulated the PPVH, including principles that would ‘call for the avoidance of ambiguity and redundancy as long as the anaphora can be resolved’ (2016, p. 261). The PPVH additionally introduces the notion of different violation strengths (strong to mild) along a continuum, which are connected to the principle that is violated (Manner/Clarity versus Informativeness/Economy) and to a specific violation type (ambiguity versus redundancy), respectively.
In particular, Lozano (2016) argues that violating the Informativeness/Economy principle results in redundancy, a mild violation, as it does not lead to a communicative breakdown since the anaphora can be easily resolved. Within this scenario, an additional gradience is included within redundancy: an overt pronoun is more redundant in TC in the presence of one antecedent as opposed to where there are two (or more). By contrast, violating the Manner/Clarity principle makes it impossible to resolve the anaphora due to its ambiguity, and thus, it is considered a strong violation given that communication breakdowns likely emerge. It is then hypothesised that, in bilinguals, the violation of one principle (Informativeness/Economy principle) will be more frequent than the violation of the other (Manner/Clarity principle). Moreover, Lozano (2016) observes that the pragmatic violations attested in learners are also present in native grammars, although rather marginally. To a much lesser extent when compared with L2 learners, it has been found that native speakers also produce explicit material that results in redundancy in TC, which is in line with our results.
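The mapping between principles, violation types, and violation strengths described above can be summarised in a small sketch. The function name, the labels, and the category values below are hypothetical and serve only to make the logic explicit. Two simplifications are assumed: every overt form (pronoun or NP) in TC is treated as redundant, and every null form in TS is treated as ambiguous, whereas in actual discourse ambiguity depends on whether the anaphora can be resolved.

```python
from typing import Tuple

# Hypothetical labels for overt subject REs.
OVERT_FORMS = {"overt_pronoun", "np"}


def ppvh_violation(context: str, form: str, n_antecedents: int = 1) -> Tuple[str, str]:
    """Classify a subject RE as (violation type, strength) under a simplified PPVH.

    context: "TC" (topic continuity) or "TS" (topic shift)
    form: "null", "overt_pronoun", or "np"
    n_antecedents: number of potential antecedents in the discourse
    """
    if context == "TC" and form in OVERT_FORMS:
        # Informativeness/Economy violated: redundancy, a mild violation,
        # graded by the number of potential antecedents.
        degree = "more redundant" if n_antecedents <= 1 else "less redundant"
        return ("redundancy", f"mild, {degree}")
    if context == "TS" and form == "null":
        # Manner/Clarity violated: ambiguity, a strong violation.
        return ("ambiguity", "strong")
    return ("none", "none")
```

For instance, `ppvh_violation("TC", "overt_pronoun", 1)` falls under the more redundant end of the mild range, while the same form with several potential antecedents falls under the less redundant end.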
Following the original formulation of the PPVH (Lozano, 2016), which incorporated factors such as information status (TC/TS) and number of potential antecedents, Quesada and Lozano (2025) expanded it to accommodate the interaction of redundancy with additional factors such as syntactic configuration (e.g., coordination), picture transition (same/new image), and characterhood (main/secondary character). All these factors are hypothesised to trigger a milder or a stronger violation within the redundancy spectrum. Hence, this second formulation of the hypothesis further expands on the variables that contribute to gradience within redundancy, apart from the number of potential antecedents that was initially proposed. This paper will primarily contribute to exploring the factors that modulate redundancy in L1 attriters, including unexplored variables such as antecedent distance.
Finally, it is important to note that the PPVH has exclusively been applied to L2 acquisition contexts (Feng, 2022; García-Tejada, 2022; Lozano, 2016, 2018; Lumley, 2020; Margaza & Gavarró, 2022; Quesada, 2021). Native speakers have been found to obey pragmatic principles and their violations of such principles are thought to be minimal. Moreover, sequential bilingual adults ‘are supposed to already manage pragmatic principles in their L1’ (Quesada, 2021, p. 300). Nevertheless, ample previous evidence suggests that the two languages of a bilingual are in constant interaction and (can) thus influence each other in multiple domains (Chamorro & Sorace, 2019; Green, 1998; Schmid & Köpke, 2017). Therefore, it remains to be explored whether different types of highly proficient bilinguals produce differentially explicit redundant subject REs in their L1 framed within the PPVH, and additionally whether this potential redundancy is further modulated by L2 exposure and use in different contexts (namely immersion versus instruction), which is one of the main goals of the current paper. In addition, it should be investigated whether these violations would be similar across the board or whether arguably more cognitively challenging tasks (i.e., under the presence of multiple antecedents instead of only one antecedent) would trigger more instances of overt forms in bilinguals in line with the IH (Sorace, 2011, 2012) and with the PPVH’s prediction that redundancy (i.e., the production of overt forms when they are not pragmatically required) is milder as the number of antecedents increases.
We thus aim to explore to what extent the PPVH prediction that bilinguals are generally pragmatically redundant holds and is modulated across different L1 attrition contexts, and how the expected redundancy interacts with factors such as the number of potential antecedents and antecedent distance in TC in L1 Spanish-L2 English bilinguals differing in the type of predominant L2 exposure, i.e., instruction and naturalistic immersion. Notably, differences would be expected between these two types of bilinguals based on the claims from the ATH (Paradis, 1993, 2004, 2007), i.e., increased redundancy would be expected in bilinguals who use the L1 less frequently. We thus expect immersed bilinguals to be more pragmatically redundant than instructed bilinguals and the latter than Spanish monolinguals. Moreover, several factors, e.g., number of potential antecedents and antecedent distance, are expected to contribute to grading redundancy, thus exhibiting stronger or milder violations of the Informativeness/Economy principle. Importantly, this study will concentrate on the analysis of spoken production due to the scarcity of data available to date.
4. Research questions and hypotheses
As a summary of key findings from previous studies, the production of third person singular subject REs has been found to be vulnerable in language contact settings, e.g., L2 acquisition and L1 attrition (Chamorro & Sorace, 2019; Köpke & Genevska-Hanke, 2018; Lozano, 2021). While Spanish monolinguals largely employ null pronouns in TC, L1 attriters generally resort to the use of more explicit subject REs in line with the predictions from the IH (Chamorro & Sorace, 2019; Sorace, 2011, 2016). In interface phenomena such as the distribution of subject REs in discourse, bilinguals are expected to show L1 vulnerability, particularly when integrating information from different domains (e.g., syntax and discourse) in real time. Moreover, the PPVH (Lozano, 2016) also argues that bilinguals tend to be more pragmatically redundant than ambiguous, which parallels the overproduction of overt pronouns predicted by the IH.
In addition to the potential increase in overt REs in bilinguals in TC, several discourse-related factors have been found to trigger the use of more explicit subject REs: antecedent distance or the number of potential antecedents (Torregrossa et al., 2019). However, the role of some of these factors is still poorly understood. Furthermore, research on production in L1 attrition has only focused on immersed bilinguals but not on L2 instructed bilinguals in the L1 environment, whose performance should lie between that of Spanish monolinguals and immersed bilinguals considering the claims of frequency of L1 use made by the ATH (Paradis, 1993, 2004, 2007).
Therefore, originating from the predictions of vulnerability in interface structures in bilinguals from the IH (Chamorro & Sorace, 2019; Sorace, 2011, 2016), the PPVH (Lozano, 2016), and the ATH (Paradis, 2007) as well as the role that discourse-related factors may play on the realisation of subject REs, we used two film-retelling oral production tasks to assess the following research questions and hypotheses:
RQ1: How do third person singular null and overtly realised (overt pronouns and NPs) subject REs distribute in TC in the oral production of instructed versus immersed L1 Spanish-L2 English bilinguals compared to Spanish monolinguals? Does the distribution vary in tasks that differ in overall cognitive demands?
H1: The three groups are expected to largely produce null pronouns to encode TC. However, the production of overt subject REs in instructed and immersed bilinguals will most likely be significantly higher than that of Spanish monolinguals given their high(er) L2 English use and exposure in line with the predictions from the IH and the ATH. Moreover, differences will be expected between instructed and immersed bilinguals in that the former will show attrition effects to a lesser extent since they are less exposed to the L2 and use it less frequently.
Additionally, differences in the distribution of subject REs are more likely to emerge in more cognitively demanding tasks where the selection of the appropriate subject RE needs to be done in the presence of multiple activated referents. Particularly, differences with bilinguals will be more pronounced in tasks with increased cognitive demands given that some cognitive resources will be necessary to inhibit the language not in use following the IH.
RQ2: Which discourse-related factors constrain the production of null and overt subject REs in TC in instructed versus immersed L1 Spanish-L2 English bilinguals compared to that of Spanish monolinguals? Are the two bilingual groups more pragmatically redundant than Spanish monolinguals?
H2: Considering previous results on variability in subject realisation in production, we hypothesise that third person singular overt subject REs (namely, overt pronouns and NPs) will be modulated by factors such as a longer distance between a given subject RE and its antecedent, and scenarios with a higher number of activated antecedents.
Firstly, more explicit subject material is expected with more distant antecedents considering that they are less accessible in working memory and should therefore be retrieved overtly to make them more salient. Secondly, a higher number of potential antecedents might make an antecedent less salient, and hence, more overt material will be employed to avoid ambiguity following the PPVH predictions.
Additionally, instructed and immersed bilinguals could possibly be more sensitive to pragmatic factors and would be more pragmatically redundant in their L1 to avoid potential ambiguity. This prediction would follow the PPVH, by which it could be hypothesised that these bilinguals are more sensitive than Spanish monolinguals in production due to enhanced sensitivity to pragmatic principles that constrain the use of overt REs (e.g., antecedent distance) to avoid potential ambiguity.
5. Methodology
5.1. Participants
Three groups participated [Footnote 5] in this study (see Table 1), namely, a control group of L1 Spanish functional monolinguals and two experimental groups of advanced L1 Spanish-L2 English instructed and immersed bilinguals. The group of monolinguals (N = 33; 20 females) were all undergraduate or postgraduate university students majoring in degrees unrelated to languages (M age = 21.6; SD = 2.15; range = 18–26). They were all monolingually raised Peninsular Spanish speakers who lived in an L1-dominant monolingual context, e.g., Granada. Additionally, they had not spent time abroad during their (pre-)university studies, so their exposure to L2 English was minimal and limited to formal instruction, with a mean age of onset at 5.34 years (SD = 1.76) and an average length of instruction of 11.71 years (SD = 1.42). They had not attended a bilingual school during primary or secondary education, and their mean L2 English proficiency, as objectively measured by the Oxford Quick Placement Test (OQPT), was 21.94/60 (SD = 3.29, range = 15–29), corresponding to A1-A2 levels from the CEFR. Hence, the proficiency level of this group was considerably low. Participants also reported using their L2 daily at a mean rate of 3.81% (SD = 4.7). Thus, both exposure to and use of the L2 were minimal. Regarding dominance, all participants obtained high overall scores (range = 114.43–176.72) in the Bilingual Language Profile (BLP) (Birdsong et al., 2012), indicating that they were all L1-dominant.
Table 1. Participants’ background

Secondly, the group of instructed bilinguals consisted of 80 undergraduate students (64 females; M age = 20.41; SD = 1.7; range = 18–26) pursuing a degree in English Studies at various Spanish universities, where they were regularly exposed to L2 English. All participants were raised monolingually in Spain, speaking and being exposed to Peninsular Spanish. On average, they were first exposed to L2 English at 5.12 years of age (SD = 1.99), and their mean length of L2 English instruction was 14.89 years (SD = 1.97). In terms of proficiency, all participants were highly advanced L2 English learners, with a mean OQPT score of 52.41/60 (SD = 3.68; range = 48–60), corresponding to C1-C2 CEFR levels. Regarding daily L2 English use, participants reported a mean of 25.33% (SD = 11.56), which was primarily restricted to instructional settings. However, half of the participants also reported using L2 English with friends outside lectures or for activities such as social media and entertainment (e.g., reading or writing). Finally, their BLP scores ranged from 29.42 to 108.43, indicating L1 dominance, though their dominance profiles varied considerably.
Thirdly, we included a group of 94 advanced L1 Spanish-L2 English bilinguals (69 females; M age = 26.9; SD = 3.74; range = 19–34) who were immersed in an L2 environment, using and being exposed to L2 English daily. All participants were monolingually raised Peninsular Spanish speakers who had been living in the UK for 1–12 years (M = 3.88; SD = 2.4). They were first exposed to L2 English at an average age of 6.23 years (SD = 2.31), and their mean length of L2 English instruction was 15.13 years (SD = 3.3). In addition, participants reported using the L2 daily (M = 64.84%; SD = 17.67) and were all highly advanced L2 English bilinguals, as measured by the OQPT (M = 52.89/60; SD = 3.03; range = 48–60). Based on their BLP scores (range = −20.35 to 104.34), this group included participants with varying dominance profiles, from more L2-dominant to L1-dominant individuals.
5.2. Oral production tasks
To investigate the production of subject REs, participants completed two semi-guided oral narrative tasks based on Charles Chaplin clips. Film retellings have been previously employed in studies analysing the production of subject REs in written (Lozano & Quesada, 2023; Quesada, 2021) and oral formats (Blackwell & Quesada, 2012; Quesada & Blackwell, 2009; Ryan, 2016). Two videos [Footnote 6] were selected to explore the role of the number of potential antecedents in the production of fuller subject REs. Having two videos with different antecedent configurations (one main character versus several antecedents with different genders) allowed us to explore the potential effect of the number of activated antecedents more deeply. These tasks have been shown to trigger the semi-spontaneous production of third person singular animate subject REs in TC, contexts that have been found to be largely problematic for L2 acquisition and hypothetically vulnerable for L1 attrition (García-Alcaraz & Bel, 2019; Lozano, 2016; Martín-Villena & Lozano, 2020; Quesada, 2021).
Both short black and white clips (lasting roughly 2 and 4 minutes, respectively) presented actions performed by different characters. Nonetheless, the main character was kept constant in the two videos to avoid a potential character effect (Quesada & Lozano, Reference Quesada and Lozano2020). The first clip (Task 1) only contained actions performed by Charles Chaplin from the film One A.M. and no other animate characters intervened. Thus, this video would elicit (almost) exclusively TC contexts where reference to the main character is maintained across clauses. This makes it possible to explore the subject REs produced in complete absence of additional antecedents. The second video clip (Task 2) is an excerpt from the film The kid by Charles Chaplin and up to six characters appear: Charles Chaplin, a baby, a policeman, a woman and her baby, and an old man. Most of them take a leading role in different scenes, and hence, this task enables to explore the selection of subject REs used when different competing antecedents intervene. Task 2 was arguably more cognitively demanding given that the selection of the appropriate RE in each context would have to be done in the presence of multiple antecedents that could match in number and gender features, e.g., Chaplin and the old man. This requires holding several potential antecedents in working memory and selecting one out of different options, which may in turn increase the cognitive load (Arnold & Griffin, Reference Arnold and Griffin2007). By contrast, participants in Task 1 had to narrate actions exclusively performed by the main character, who was always kept constant, which additionally keeps away the cognitive cost of maintaining referents in working memory.
Importantly, participants were instructed to narrate the story of the video in Spanish to someone who had not watched it to minimise assumptions of shared knowledge with the potential addressee. This instruction is crucial in tasks of this kind, as familiarity with the video could arguably influence the choice of REs (Sorace, Reference Sorace2004). Participants recorded their oral narrations using their own devices and uploaded them via a provided link. The recordings were transcribed by a research assistant and later checked by the first author of the paper for consistency. In total, the recordings amounted to over 1000 minutes.
5.3. Analysis and tagset
The data from the two video-retelling tasks were analysed using the UAM Corpus Tool (O’Donnell, 2009), a stand-off XML annotation tool that allows for the creation of fine-grained tagsets with varying levels of specificity. All third person singular animate subject REs in TC that appeared with finite verbs in coordinated, subordinated, and juxtaposed contexts were annotated and included in the analysis (see Supplementary Figure S1). Only third person singular animate subject REs were considered, following Lozano (2009), who showed that deficits at the syntax-discourse interface are selective and do not affect the entire pronominal paradigm. Additionally, third person singular pronouns were identified as particularly problematic for L1 Spanish-L2 English learners in TC (Martín-Villena & Lozano, 2020) and are thus hypothesised to be problematic for L1 Spanish attriters. Moreover, a potential L2 English influence on the REs used in L1 Spanish within the investigated groups would most likely manifest in TC, where null pronouns are expected in L1 Spanish but overt forms in L1 English. Notably, this effect would primarily appear in contexts outside of coreferential coordination, where both English and Spanish predominantly use null pronouns (Martín-Villena et al., 2024). Although examining the specific role of crosslinguistic influence in L1 attrition falls outside the scope of this paper, since it would require comparisons with other relevant language pairs (e.g., L1 Spanish-L2 Italian or Greek attriters), our focus on TC provides valuable opportunities for comparison with previous studies exploring this phenomenon.
For each subject RE analysed, various tags were applied using a linguistically informed, fine-grained tagset developed on the basis of Lozano (2009, 2016), Martín-Villena and Lozano (2020), and Quesada and Lozano (2020). First, each subject was classified according to its form: null pronoun, overt pronoun, or NP (see 3). These are the primary REs previously examined in corpus and experimental studies.
3. ∅_i/Él_i/Chaplin_i se vuelve a resbalar con las alfombras hasta que ∅_i vuelve a caer al suelo. [ES_SP_20_15_JFM]
∅_i/He_i/Chaplin_i slips again with the rugs until he_i falls back on the floor.
Regarding potential antecedents, various tags were included in the analysis based on previous studies (Lozano, 2016; Martín-Villena & Lozano, 2020; Quesada & Lozano, 2020). Activated antecedents (Footnote 7) were defined as those active within up to four (non-)finite clauses preceding the tagged subject RE. First, we coded the number of potential antecedents preceding each subject. In some instances only one activated antecedent was identified, which coincided with the annotated subject (4); in others there were two (5) or three (6) potential antecedents.
4. ∅_i Intenta abrir la casa pero ∅_i no puede. ∅_i Busca por todos lados y ∅_i no encuentra la llave. [ES_SP_19_15_CMJ]
He_i tries to open the house but he_i can’t. He_i searches everywhere and cannot find the key.
5. La señora_i […] parece que discute con él_j y que ∅_i le_j dice que ∅_j se lleve al niño_k. [ES_SP_27_14_AM]
It looks like the woman_i is arguing with him_j and she_i tells him_j to take the baby_k.
6. [Chaplin] ∅_i Se vuelve a cruzar con el carro que se ha cruzado el otro hombre_j y esta vez, la mujer_k está fuera, y al reconocerle_i, pues ∅_k va detrás de él_i. [ES_SP_20_14_AGD]
[Chaplin] He_i comes across the pram again that the other man_j has come across, and this time, the woman_k is outside, and as she_k recognises him_i, she_k goes after him_i.
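The four-clause window for activated antecedents can be illustrated with a minimal sketch. It is not the authors' annotation procedure, which was carried out manually in the UAM Corpus Tool. The clause-by-clause encoding, the function name `activated_antecedents`, and the referent labels are all hypothetical and serve only to make the definition concrete.

```python
from typing import List, Set

WINDOW = 4  # number of preceding clauses considered, as stated in the text


def activated_antecedents(clauses: List[Set[str]], position: int,
                          window: int = WINDOW) -> Set[str]:
    """Collect the referents mentioned in up to `window` clauses before `position`."""
    start = max(0, position - window)
    activated: Set[str] = set()
    for mentioned in clauses[start:position]:
        activated |= mentioned
    return activated


# Hypothetical encoding loosely modelled on example (5):
# "i" = the woman, "j" = Chaplin, "k" = the baby.
example_5 = [
    {"i", "j"},  # the woman_i argues with him_j
    {"i", "j"},  # she_i tells him_j ...
    {"j", "k"},  # ... that he_j should take the baby_k
]

# Referents available when the null subject of the second clause is produced.
print(len(activated_antecedents(example_5, 1)))
```

Under this encoding, the null subject of the second clause has two activated antecedents, which matches the way example (5) is classified in the text.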
Finally, the last tags captured the distance between the subject RE and its antecedent, an important factor in the selection of anaphoric forms (García-Tejada, 2022; Givón, 1983; Lozano, 2016). In this analysis, antecedents included any mention that activated a given referent, regardless of its form. To measure the distance between the subject RE and its antecedent, we counted the number of clauses (both finite and non-finite) and applied tags for one (7), two (8), or three (9) clauses apart.
7. La mujer_i se enfada y ∅_i le_j hace llevarse al bebé_k. [ES_SP_31_14_CP]
The woman_i gets mad and ∅_i makes him_j take the baby_k.
8. Cuando ∅_i entra por la puerta con las llaves pues hay una serie de alfombras y demás y cada vez que ∅_i pisa una alfombra […]. [ES_SP_24_15_CCC]
When he_i enters through the door with the keys, there are some rugs and the like and every time he_i steps on a rug […].
9. ∅_i Los tira y el vídeo es cómico porque no hay voces ni sonidos, solo una música de fondo y entonces mientras ∅_i está fumando […] [ES_SP_21_14_CSI]
He_i throws them away and the video is comical because there are no voices or sounds, only background music and then while he_i is smoking […].
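The clause-counting measure of antecedent distance can be sketched in the same spirit. This is an illustration under stated assumptions rather than the authors' tooling. The segmentation of example (9) into four clauses and the function name `antecedent_distance` are hypothetical.

```python
from typing import List, Optional, Set


def antecedent_distance(clauses: List[Set[str]], position: int,
                        referent: str) -> Optional[int]:
    """Count clauses back from `position` to the latest mention of `referent`."""
    for distance in range(1, position + 1):
        if referent in clauses[position - distance]:
            return distance
    return None  # the referent is not mentioned earlier in the narrative


# Hypothetical segmentation loosely modelled on example (9); "i" = Chaplin.
example_9 = [
    {"i"},  # he_i throws them away
    set(),  # the video is comical
    set(),  # because there are no voices or sounds
    {"i"},  # while he_i is smoking
]

print(antecedent_distance(example_9, 3, "i"))
```

Any mention counts as an antecedent regardless of its form, so a clitic or a null pronoun in an intervening clause would reset the distance to that clause.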
Once all REs were tagged, we fitted generalised linear mixed-effects models using the glmer function with a binomial family from the lme4 package (Bates et al., 2015) in R (R Core Team, 2021). These models analysed the probability of selecting an overtly realised RE (including both overt pronouns and NPs, given the low frequency of overt pronouns) versus a null pronoun. The models included relevant dummy-coded categorical fixed factors (e.g., group), scaled continuous predictors (e.g., number of activated antecedents and antecedent distance), and the BLP score as a covariate to control for the effect of language dominance (Footnote 8). A by-participant random intercept was also included; random slopes were not, as adding them led to convergence issues. Additional comparisons were conducted by changing the reference level of the categorical predictors. The model output gives the log odds of producing an overt subject RE rather than a null pronoun. The final models used for RQ1 and RQ2 were the following:
- RQ1: glmer(re_form ~ group + scale(BLPDom) + (1|participant), family = binomial), fitted separately for Task 1 and Task 2, and
- RQ2: glmer(re_form ~ group*scale(ant_dist) + scale(BLPDom) + (1|participant), family = binomial) for Tasks 1 and 2 together, and glmer(re_form ~ group*scale(n_act_ant) + scale(BLPDom) + (1|participant), family = binomial) for Task 2.
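Written out, the RQ1 specification corresponds to a random-intercept logistic regression. The notation below is our own gloss rather than the authors' and assumes dummy coding with monolinguals as the reference level, as reported in the Results. Here i indexes subject REs, j indexes participants, and z(·) denotes the scaled BLP score.

```latex
\operatorname{logit} P(\text{overt}_{ij})
  = \beta_0
  + \beta_1\,\text{instructed}_j
  + \beta_2\,\text{immersed}_j
  + \beta_3\, z(\text{BLP}_j)
  + u_j,
\qquad u_j \sim \mathcal{N}(0, \sigma_u^2)
```

The RQ2 models add a scaled continuous predictor (antecedent distance or number of activated antecedents) and its interaction with the two group dummies to the same structure.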
6. Results
6.1. Descriptive results
A total of 9209 subject REs were analysed. Despite differences in the raw number of subjects tagged per group, the ratio of subjects to the total number of words produced is similar across groups (monolinguals: 1396/16506 = 8.46%, instructed bilinguals: 3371/41218 = 8.18%, and immersed bilinguals: 4442/59679 = 7.44%), making the three groups comparable.
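The ratios above can be recomputed directly from the counts given in the text. The short check below is only an arithmetic illustration, and the variable and function names are ours.

```python
# (tagged subject REs, total words) per group, as reported in the text
counts = {
    "monolinguals": (1396, 16506),
    "instructed bilinguals": (3371, 41218),
    "immersed bilinguals": (4442, 59679),
}


def subject_rate(subjects: int, words: int) -> float:
    """Tagged subject REs per 100 words."""
    return 100 * subjects / words


total_subjects = sum(subjects for subjects, _ in counts.values())
rates = {group: round(subject_rate(s, w), 2) for group, (s, w) in counts.items()}

print(total_subjects)
print(rates)
```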
The following sections present the results by research question. First, the distribution of subject REs will be compared across groups in Task 1 and 2 separately to address RQ1, followed by the effect of antecedent distance and the number of activated antecedents to address RQ2.
6.2. Overall distribution of subject referring expressions (RQ1)
This section presents the results from the overall production of subject REs in TC across the three groups in the two tasks that differ in cognitive demands. When the results from the two tasks are considered separately, two clearly differentiated patterns emerge.
As shown in Figure 1 and Table 2, there are no significant differences in the distribution of subject REs in Task 1. Spanish monolinguals and the two bilingual groups (instructed and immersed) almost exclusively resort to null pronouns (99.2%, 98.9%, and 98.2%, respectively), with the production of overt pronouns and NPs being marginal. Specifically, only 5, 17, and 36 explicit REs, respectively, were produced out of the total tagged subjects. Importantly, the results from a generalised linear mixed-effects model, with group as a fixed factor (monolinguals as the reference level), the BLP score as a covariate, and a by-participant random intercept, do not reveal a significant effect of group. There were no significant differences between monolinguals and instructed bilinguals (β = 1.23, 95% CI [−.61, 3.07], SE = .94, z = 1.32, p = .19), between monolinguals and immersed bilinguals (β = 1.85, 95% CI [−.19, 3.90], SE = 1.05, z = 1.77, p = .08), or between the two bilingual groups (β = .62, 95% CI [−.20, 1.45], SE = .42, z = 1.47, p = .14). This suggests that all groups exhibit comparable distribution patterns of subject RE production, with null pronouns being predominantly used, as predicted.
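The reported intervals are consistent with Wald-type confidence intervals, i.e., the estimate plus or minus 1.96 standard errors. The sketch below assumes that this is how they were obtained, which the text does not state. It recomputes the monolingual versus instructed contrast from the rounded values above.

```python
from typing import Tuple


def wald_summary(beta: float, se: float,
                 z_crit: float = 1.96) -> Tuple[float, Tuple[float, float]]:
    """Return the z statistic and a Wald interval for a log-odds estimate."""
    z = beta / se
    interval = (beta - z_crit * se, beta + z_crit * se)
    return z, interval


# Monolinguals vs instructed bilinguals in Task 1 (rounded values from the text)
z, (low, high) = wald_summary(1.23, 0.94)
print(round(z, 2), round(low, 2), round(high, 2))
```

Small discrepancies with the published z values are expected, because the inputs used here are already rounded to two decimals.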

Figure 1. Overall production of subject REs in Task 1 across groups.
Table 2. Overall production of subject REs in Task 1 across groups

Note: The percentage of production is followed by the raw frequency in brackets.
By contrast, their performance in the more cognitively demanding task proves to be somewhat dissimilar, as illustrated in Figure 2 and Table 3. In this task, TC is also predominantly encoded via null pronouns (monolinguals = 94.6%, instructed bilinguals = 91.6%, and immersed bilinguals = 88.5%), followed by NPs (4.1%, 5.4%, and 8.5%, respectively) and overt pronouns (1.3%, 3%, and 3%). The production of the latter remains very limited, in line with previous research (Martín-Villena & Lozano, 2020; Montrul & Rodríguez-Louro, 2006; Quesada, 2021). The results from another generalised linear mixed-effects model, with group as a fixed factor (monolinguals as the reference level), the BLP score as a covariate, and a by-participant random intercept, reveal a significant effect of group. All three pairwise comparisons are significant: monolinguals versus instructed bilinguals (β = .66, 95% CI [.11, 1.21], SE = .28, z = 2.37, p = .02), monolinguals versus immersed bilinguals (β = 1.05, 95% CI [.45, 1.65], SE = .31, z = 3.42, p < .001), and instructed versus immersed bilinguals (β = .39, 95% CI [.14, .64], SE = .13, z = 3.09, p = .002).
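Because the coefficients are on the log-odds scale, exponentiating them gives odds ratios, which may be easier to interpret. The sketch below applies this standard transformation to the Task 2 contrasts reported above. The dictionary layout and function name are ours, and the conversion is not part of the original analysis.

```python
import math

# (estimate, CI lower bound, CI upper bound) on the log-odds scale, from the text
task2_contrasts = {
    "instructed vs monolinguals": (0.66, 0.11, 1.21),
    "immersed vs monolinguals": (1.05, 0.45, 1.65),
    "immersed vs instructed": (0.39, 0.14, 0.64),
}


def odds_ratio(log_odds: float) -> float:
    """Convert a log-odds coefficient into an odds ratio."""
    return math.exp(log_odds)


for label, (beta, low, high) in task2_contrasts.items():
    print(f"{label}: OR = {odds_ratio(beta):.2f} "
          f"[{odds_ratio(low):.2f}, {odds_ratio(high):.2f}]")
```

On this reading, the odds of producing an overt RE in Task 2 are roughly twice as high for instructed bilinguals and almost three times as high for immersed bilinguals as for monolinguals, holding the BLP score constant.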

Figure 2. Overall production of subject REs in Task 2 across groups.
Table 3. Overall production of subject REs in Task 2 across groups

Overall, different patterns of overt RE production have been found across tasks. Importantly, while Task 1 includes only one main character, i.e., Charles Chaplin, the second task presents actions performed by both Chaplin and other characters. Thus, more explicit forms are triggered mostly in Task 2.
6.3. Factors conditioning the use of more explicit REs (RQ2)
Regarding RQ2, several factors are hypothesised to affect the distribution of differentially explicit subject REs. This section will present the findings on the effect of variables such as antecedent distance and the number of potential antecedents on the increased use of overt REs across all groups. Although TC is predominantly encoded via null pronouns in Spanish monolinguals and bilinguals, the overall distribution of subject REs includes some instances of overt REs, which will be examined in more detail.
Antecedent distance and number of activated antecedents
RQ2 explored the role of antecedent distance and the number of activated antecedents. On the one hand, regarding the distance between a given subject RE and its antecedent, it was hypothesised that retrieving a more distant antecedent would require more explicit material, as the referent’s activation in working memory would likely decrease. Importantly, antecedents were considered those that activate a given referent at a specific point in discourse and trigger its mental representation, even if they are not fully realised through explicit material. Therefore, antecedents could take a more explicit form (e.g., Chaplin, the old man) or a less explicit one (e.g., a clitic pronoun or an overt or a null pronoun).
Concerning this factor, Figure 3 and Table 4 show the distribution of null and overt subject REs, combining overt pronouns (which were infrequently produced) and NPs, across groups in contexts that vary in antecedent distance. Visual inspection of the results suggests that instructed and immersed bilinguals tend to use more overt forms than monolinguals throughout. Additionally, immersed bilinguals exhibit the highest rate of overt form production overall. Monolinguals show relatively consistent production rates of overt forms across different antecedent distances. In contrast, instructed and immersed bilinguals display observable differences, with an increase in overt forms as the antecedent becomes more distant.

Figure 3. Overall production of null and overt subject REs by antecedent distance across groups.
Table 4. Overall production of null and overt subject REs by antecedent distance across groups

On the other hand, the number of potential antecedents has been proposed as a trigger for the use of more explicit REs, as explored in RQ2 (Lozano, 2016; Martín-Villena & Lozano, 2020; Quesada, 2021; Torregrossa et al., 2019). Importantly, activated antecedents include all referents that are recovered, either explicitly or implicitly, within several clauses prior to a given subject RE. Figure 4 and Table 5 display the distribution of overt material in TC, considering the number of activated antecedents within the last four clauses prior to a given subject RE. All groups generally produce more overt REs as the number of activated antecedents increases (monolinguals: 2.1%, 4.6%, and 7.3%; instructed bilinguals: 3.8%, 6.7%, and 10.9%; and immersed bilinguals: 10%, 8.4%, and 14%). Notably, immersed bilinguals tend to produce more overt REs than the other two groups, particularly in contexts with just one activated antecedent. This suggests that immersed bilinguals use more explicit REs even in less demanding contexts, where ambiguity is not necessarily at stake.

Figure 4. Overall production of overt subject REs by number of activated antecedents across groups in Task 2.
Table 5. Overall production of overt subject REs by number of activated antecedents across groups in Task 2

To analyse the effect of antecedent distance, the number of activated antecedents, and the potential interaction with the group, we ran two generalised linear mixed-effects models. These models included the interaction of group, which was dummy-coded (with monolinguals as the reference level), with the two scaled continuous predictors (antecedent distance and number of activated antecedents) separately. The models also included the scaled BLP score as a measure of language dominance and a by-participant random intercept. Importantly, the model exploring the effect of antecedent distance was run on the entire dataset, including both Task 1 and Task 2, while the model addressing the effect of the number of potential antecedents was run exclusively on Task 2, as this is the only task where this variable plays a role.
In the model investigating the effect of antecedent distance in interaction with group, a significant effect of group was found, with significant differences between monolinguals and instructed bilinguals (β = .72, 95% CI [.19, 1.24], SE = .27, z = 2.69, p = .007), between monolinguals and immersed bilinguals (β = 1.13, 95% CI [.55, 1.71], SE = .30, z = 3.82, p < .001), and between the two bilingual groups (β = .41, 95% CI [.17, .65], SE = .12, z = 3.35, p < .001). Additionally, the effect of antecedent distance was not significant for monolinguals (β = −.08, 95% CI [−.49, .32], SE = .21, z = −.40, p = .69), whereas it was significant for both bilingual groups. A significant group*antecedent distance interaction revealed that the effect of antecedent distance was significantly different for immersed bilinguals when compared to monolinguals (β = .43, 95% CI [.02, .84], SE = .21, z = 2.05, p = .04). However, this effect was not significantly different between monolinguals and instructed bilinguals, nor between the two bilingual groups. These findings suggest that the production of more explicit subject REs in instructed bilinguals lies between that of monolinguals and immersed bilinguals.
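Under dummy coding with monolinguals as the reference level, the slope of antecedent distance for immersed bilinguals can be recovered by adding the interaction coefficient to the reference-level slope. The worked step below assumes that the reported −.08 is the monolingual slope and .43 the immersed-by-distance interaction term, which is how the text presents them. The subscripts are our own labels.

```latex
\beta_{\text{dist}}^{\text{immersed}}
  = \beta_{\text{dist}} + \beta_{\text{immersed} \times \text{dist}}
  = -0.08 + 0.43
  = 0.35,
\qquad e^{0.35} \approx 1.42
```

On this reading, each one-standard-deviation increase in antecedent distance multiplies the odds of an overt RE by about 1.42 for immersed bilinguals, while leaving them essentially unchanged for monolinguals.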
The results from the model exploring the role of the number of activated antecedents in Task 2, with monolinguals at the intercept, revealed a significant effect of group, with differences between monolinguals and instructed (β = .65, 95% CI [.09, 1.21], SE = .29, z = 2.26, p = .02) and immersed bilinguals (β = 1.11, 95% CI [.50, 1.72], SE = .31, z = 3.57, p < .001), and between the two bilingual groups (β = .47, 95% CI [.21, .72], SE = .13, z = 3.56, p < .001). The effect of the number of activated antecedents was significant overall for all three groups (β = .45, 95% CI [.07, .82], SE = .19, z = 2.31, p = .02). However, the interaction group*number of activated antecedents was not significant.
7. Discussion
This section discusses the results from the two corpus-based tasks in relation to previous studies on the production of subject REs in native Spanish, addressing the research questions and hypotheses formulated.
RQ1 aimed to investigate the distribution of subject REs in TC in the oral production of instructed and immersed bilinguals compared to monolinguals. This was done using two narrative tasks that varied in cognitive demands. The first task involved narrating actions performed by a single, constant main character (i.e., Chaplin), arguably minimising cognitive demands by eliminating the need to track multiple referents. The second task required participants to select the appropriate referring expression in contexts with multiple antecedents matching in number and gender (e.g., Chaplin or the old man). This arguably introduced an additional cognitive load, as participants had to maintain several potential antecedents in working memory and select the correct option, significantly increasing the task’s complexity.
As previously illustrated, TC is almost exclusively encoded via null pronouns in all groups, consistent with previous studies (Blackwell & Quesada, 2012; Giannakou & Sitaridou, 2020; Lozano, 2009, 2016; Martín-Villena & Lozano, 2020; Montrul & Rodríguez-Louro, 2006; Quesada, 2021). Additionally, similar to the patterns observed in previous research, NPs are used more frequently than overt pronouns, whose production has been reported to be markedly limited (Giannakou & Sitaridou, 2022; Lozano, 2016; Martín-Villena & Lozano, 2020; Montrul & Rodríguez-Louro, 2006). Despite the overall similarity in subject production patterns across groups, significant differences were observed, though not consistently across all contexts. Bilinguals do not differ from monolinguals, nor from each other, in the production of null pronouns in contexts without multiple potential antecedents (i.e., in Task 1, which features only one main character). This suggests that, in the absence of pressing cognitive demands, advanced bilinguals pattern similarly to monolinguals when tested in their L1. Nevertheless, a different trend emerges in Task 2, where multiple characters play a leading and active role throughout the video. This arguably increases the cognitive demands of the task, as the correct selection of subject REs is necessary to avoid potential referential ambiguity.
Tasks 1 and 2 primarily differ in the presence of potential ambiguity, which is almost exclusive to the latter. To avoid ambiguity, speakers need to consider various factors (e.g., degree of antecedent activation, antecedent distance, or number of activated antecedents), making the second task more cognitively demanding. It is in this second task that both advanced bilingual groups significantly differed from monolinguals. Notably, both instructed and immersed bilinguals used significantly more explicit forms than monolinguals to encode TC. Furthermore, the proportion of overt forms used by instructed and immersed bilinguals significantly differed, with immersed bilinguals being the most (over)explicit group, as reported in previous studies (Köpke & Genevska-Hanke, 2018). Thus, in line with the predictions of the IH for L1 attrition (Chamorro & Sorace, 2019; Sorace, 2011), the L1 production of third person singular subject REs in TC (Lozano, 2009) has been shown to be a vulnerable domain in the two bilingual groups, particularly in contexts that require the simultaneous integration of information from different domains such as syntax and discourse. These differences between monolinguals and bilinguals may stem from the increased processing demands faced by bilinguals, who must inhibit the language not in use. Possibly, the effort bilinguals devote to inhibitory control is in a trade-off relationship with the capacity to integrate discursive or pragmatic information. Crucially, this inefficient integration results in overexplicitness, arguably because using overt subject REs as a ‘default’ processing strategy reduces the processing demands involved (Chamorro & Sorace, 2019; Sorace, 2016).
Alternatively, or complementarily, from a PPVH perspective, bilinguals may use more explicit REs to avoid ambiguity and be pragmatically more ‘obedient’. Thus, the findings can be explained from both an online processing costs (IH) and a pragmatic costs (PPVH) perspective, as processing costs are generally a reflection of complex linguistic operations (be they pragmatic, syntactic, or morphological).
Moreover, it is important to note that these results are partially consistent with the ATH (Paradis, 2007). Considering variables such as frequency of L1 use, the groups analysed exhibit distinct distribution patterns of subject REs. Monolinguals, who use the L1 more frequently, are the least overexplicit when encoding TC. In contrast, instructed bilinguals are more overexplicit than monolinguals, but still use significantly fewer overt forms than immersed bilinguals, who use the L1 less frequently than the other groups. Thus, these results can also be explained by theories that focus on variables central to the bilingual experience, i.e., activation costs accounts, which do not make predictions for categorically distinct groups but can be used to understand bilingualism as a continuum, a perspective that future studies should explore.
On another note, RQ2 examined the factors that constrain the production of null and overt subject REs, with antecedent distance being the first explored. Several accounts, such as Accessibility Theory (Ariel, 1990, 1991), the Givenness Hierarchy (Gundel et al., 1993), or Givón’s (1983) Continuity Scale, have emphasised the role of distance in contributing to antecedent salience, prominence, or accessibility. According to these theories, closer antecedents are more salient, prominent, or accessible, thus requiring less explicit material for activation. Our analyses revealed that antecedent distance significantly accounted for an increase in the use of overt material, but exclusively in the bilingual groups; the production of overt material by monolinguals was not modulated by this factor. This suggests that bilinguals may be more sensitive to the last mention of a given referent. Overall, it appears that more distant antecedents require the use of more explicit forms in the production of the bilingual groups.
Importantly, this finding aligns with the predictions from the PPVH (Lozano, 2016, 2018), which hypothesises that bilinguals are more likely to be pragmatically redundant than ambiguous. However, instances of redundancy are likely influenced by additional variables, such as antecedent distance, which contribute to the gradience of redundancy. These variables can either mitigate or exacerbate the violation of the principle of Informativeness/Economy. In the presence of distant antecedents, an overt form encoding TC would be considered less redundant (i.e., a milder violation), since its use might be motivated by a desire to reduce potential ambiguity. Notably, although bilinguals have been shown to produce more overt material than monolinguals, instances of overproduction are partly modulated by antecedent distance. Taken together, these results suggest that antecedent distance should be added to the list of factors contributing to the gradation of redundancy proposed within the PPVH.
It is noteworthy that, although the predictions of the PPVH were originally formulated for L2 acquisition, they provide a valuable framework for studying L1 attrition. This observation aligns with the hypothesis that L1 attrition and L2 acquisition are two sides of the same coin (Sorace, 2016). Moreover, the PPVH’s claims (specifically, that bilinguals exhibit more redundancy than ambiguity due to increased pragmatic obedience and possibly enhanced sensitivity to ambiguity) may apply to both their L1 and L2. This phenomenon could thus be viewed as a by-product of bilingualism, potentially extending to bilinguals of any language combination regardless of crosslinguistic similarities. Future studies should investigate this further.
The final factor explored in relation to the production of different subject REs was the number of potential antecedents (i.e., the number of active referents in the last four clauses preceding a given RE). It was found that a higher number of activated antecedents was associated with an increased production of overt forms across all groups (Arnold & Griffin, 2007; Blackwell & Quesada, 2012; Cunnings et al., 2017; Fukumura & van Gompel, 2010; Lozano, 2016; Martín-Villena & Lozano, 2020; Quesada, 2021). Although this factor influenced all groups, both bilingual groups produced significantly more overt REs than monolinguals.
Bilinguals are generally more pragmatically redundant than monolinguals, arguably to avoid potential ambiguity when different antecedents are available to be recovered by a subject RE. In our corpus-based study, the effect of the number of activated antecedents is confirmed as a modulator of the graded violation of the Informativeness/Economy principle, in line with the findings reported in Lozano (2016) and Quesada (2021). Furthermore, bilinguals are shown to be more overexplicit than monolinguals, as predicted by the IH. Differences are also observed between the two bilingual groups depending on frequency of L1 use, in line with the ATH, as illustrated below (see Figure 5; Footnote 9). Crucially, this phenomenon can be explained based on activation costs (ATH), online processing costs (IH), or pragmatic costs (PPVH) accounts.

Figure 5. Relationship between L1 redundancy and L1/L2 exposure/use in bilinguals along a continuum.
Overall, regarding the role played by language-internal variables in the use of overexplicit forms, as predicted by the PPVH (Lozano, 2016, 2018), instances of redundancy appear to be modulated by factors such as antecedent distance and the number of activated antecedents, to which both bilingual groups have been shown to be particularly sensitive. As expected, more instances of redundancy are observed in the two bilingual groups than in monolinguals, arguably as a strategy to avoid potential ambiguity. These results suggest that overproduction in L1 attrition settings is more likely driven by increased processing demands in bilinguals and the interaction of language-internal factors than merely by crosslinguistic differences between the L1 and the L2, although L2 influence cannot be entirely ruled out. On the whole, the findings from this paper thus enrich the initial PPVH proposal by illustrating the modulatory role of previously unexplored factors such as antecedent distance and type of L2 exposure and use.
8. Limitations
This study examines how redundancy is modulated by discourse-related factors such as the number of potential antecedents and antecedent distance in instructed and immersed bilinguals. Because of the exclusive focus on TC, the PPVH’s predictions have only been partially explored in L1 attrition. Future research should also investigate TS scenarios, where underspecification can arise, to determine whether the prediction that bilinguals (particularly potential L1 attriters) are more redundant than ambiguous applies consistently in L1 attrition contexts. Furthermore, this study focuses on L1 Spanish-L2 English sequential bilinguals, a group characterised by crosslinguistic differences in subject RE realisation between their two languages. As a result, disentangling the effects of crosslinguistic influence from those of discourse-related factors remains challenging. Future research should incorporate bilinguals from language pairs with similar pronoun resolution patterns to isolate these effects more effectively. Finally, additional input and experience measures should be gathered to further explore the role played by these factors in modulating L1 attrition outcomes in different types of bilinguals.
9. Conclusion
This paper investigated L1 attrition in the production of subject REs in L1 Spanish-L2 English instructed and immersed bilinguals. The main differences between groups emerged in the most cognitively demanding task, where bilinguals produced significantly more overt material than monolinguals. Furthermore, significant differences were found between the bilingual groups, with the immersed group being the most overexplicit, consistent with the ATH predictions linking higher rates of L2 use – and decreased L1 use – to increased overexplicitness. Additionally, antecedent distance and the number of potential antecedents were shown to modulate overexplicitness rates in TC, warranting consideration in future studies. These factors were particularly salient in the bilingual groups, which produced more explicit material to avoid potential ambiguity, as predicted by the PPVH. Thus, the observed phenomenon has been explained through three interrelated and complementary perspectives: activation costs (ATH), online processing costs (IH), and pragmatic costs (PPVH). Future research should investigate bilinguals with different language combinations to determine whether crosslinguistic influence plays a role in this domain or whether the observed effects are merely a by-product of bilingualism, as partially anticipated by the PPVH.
Supplementary material
The supplementary material for this article can be found at http://doi.org/10.1017/S1366728925100680.
Data availability statement
All data and code generated for the analysis are publicly available in the first author’s OSF repository (https://osf.io/4mrfc/) as well as in the CEDEL2 corpus.
Acknowledgements
We thank the editorial team and anonymous reviewers at Bilingualism: Language and Cognition for their helpful and constructive feedback. We are also grateful to our colleagues at the Universidad de Granada, the University of Edinburgh, the University of Cambridge, and Universitat Pompeu Fabra for their valuable comments and support during the development of this work.
Competing interests
The authors declare none.




