Understanding L1 and L2 reference comprehension in speech: focusing referents and pronouns

Regina Hert; Anja Arnhold; Juhani Järvikivi

doi:10.1017/S0142716425100143

Published online by Cambridge University Press: 15 September 2025

Regina Hert*: Department of Linguistics, University of Alberta, Edmonton, AB, Canada; Université Toulouse-Jean Jaurès, Maison de la Recherche, Toulouse, France
Anja Arnhold: Department of Linguistics, University of Alberta, Edmonton, AB, Canada
Juhani Järvikivi: Department of Linguistics, University of Alberta, Edmonton, AB, Canada
*Corresponding author: Regina Hert; Email: regina.hert@univ-tlse2.fr


Abstract

In this study, we present data from two experiments investigating the effect of prosodic focus marking on German L1 and L2 speakers’ interpretation of pronouns. Experiment 1 tested L2 speakers’ interpretation of personal and demonstrative subject pronouns. Experiment 2 examined L1 and L2 speakers’ interpretation of unaccented and accented personal subject and object pronouns. The results of experiment 1 reveal that L2 speakers are sensitive to the different functions of the two subject pronouns. However, grammatical role and focus marking influenced referential choice to similar degrees for both pronouns, suggesting that L2 speakers’ weighting of these linguistic factors differs from that of L1 speakers. Experiment 2 showed L1 and L2 speakers to prefer the subject referent for both subject and object pronouns. Referent preference reversal is only observed with the accented subject pronoun in L1 speakers. Ultimately, this study emphasizes the varying levels of sensitivity to grammatical role and information structure observed not only for the different pronoun types but also among different speaker groups.

Keywords

d-pronouns; information structure; L1 and L2 speakers; pronoun interpretation; prosody

Information

Type: Original Article
Information: Applied Psycholinguistics, Volume 46, 2025, e25

DOI: https://doi.org/10.1017/S0142716425100143
Creative Commons: This is an Open Access article, distributed under the terms of the Creative Commons Attribution-NonCommercial-ShareAlike licence (https://creativecommons.org/licenses/by-nc-sa/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the same Creative Commons licence is used to distribute the re-used or adapted article and the original article is properly cited. The written permission of Cambridge University Press must be obtained prior to any commercial use.
Copyright: © The Author(s), 2025. Published by Cambridge University Press

Introduction

For successful language comprehension, new information has to be stored and linked to already existing information. The use of a pronoun suggests that its referent is already known, but pronouns themselves generally encode little information (apart from, e.g., person and number). Therefore, linking the pronoun to a referent is necessary to retrieve more information for the comprehension process. But how do comprehenders decide to which referent they link a particular pronoun? That is the central question in pronoun resolution¹ research. To answer this question, many theoretical accounts appeal to a ranking of referents in terms of their accessibility or prominence (e.g., Grosz et al., 1995a; Ariel, 2001; Arnold, 2001; Arnold et al., 2007; von Heusinger and Schumacher, 2019). Others, however, have pointed out that accounts of prominence or accessibility as the determining factor in pronoun resolution risk being circular, as well as being unable to fully account for the complexity of the emerging picture (Kaiser and Trueswell, 2008; Hert et al., 2024; Bader and Portele, 2025).

A wide variety of factors can affect the likelihood of a certain referent being linked to a pronominal form, among them grammatical role (e.g., Alonso-Ovalle et al., 2002; Carminati, 2002; Crawley and Stevenson, 1990; Frederiksen, 1981; Fukumura and van Gompel, 2015; Gordon and Chan, 1995; Hert et al., 2024; Järvikivi et al., 2005; Kaiser, 2011a; Okuma, 2011; Song and Fisher, 2005) and information structure (e.g., Colonna et al., 2012, 2014, 2015; de la Fuente and Hemforth, 2013; Ellert, 2013; Xu, 2015). However, it is not quite clear why certain factors seem to have stronger effects than others. Moreover, the factors and their relative weights have been shown to differ between languages and individual pronominal forms (e.g., Bader and Portele, 2019a,b; Ellert, 2010; Ellert et al., 2011; Kaiser and Trueswell, 2008). Kaiser and Trueswell (2008) captured these findings in the form-specific approach, which states that multiple factors (e.g., subjecthood, focus) play a role in linking a specific referent to a specific pronoun, and that the degree of sensitivity to these factors varies across pronominal forms (e.g., personal vs demonstrative pronouns, or overt vs null pronouns).
In other words, what makes a referent more likely to form a link with the pronoun depends not only on the factors’ contribution to the referent’s accessibility, but also on the pronoun’s sensitivity to these factors.
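
As a purely illustrative sketch (not a model proposed or fitted in this study), the form-specific idea can be written as a weighted sum of cues in which the weights depend on the pronominal form. The cue names, the weight values, and the softmax choice rule below are all assumptions introduced for illustration; only the signs of the weights loosely follow tendencies reported for German personal and demonstrative pronouns.

```python
import math

# Hypothetical sensitivities of each pronoun form to two cues.
# The magnitudes are invented for illustration and are NOT estimates from any data.
SENSITIVITY = {
    "er": {"is_subject": 2.0, "is_topic": 0.5},     # personal pronoun
    "der": {"is_subject": -0.5, "is_topic": -2.0},  # demonstrative pronoun
}


def referent_probabilities(pronoun, referents):
    """Return a choice probability per referent via a softmax over weighted cue sums."""
    weights = SENSITIVITY[pronoun]
    scores = {
        name: sum(weights[cue] * value for cue, value in cues.items())
        for name, cues in referents.items()
    }
    total = sum(math.exp(score) for score in scores.values())
    return {name: math.exp(score) / total for name, score in scores.items()}


# Two candidate referents in a canonical sentence: the subject is also the topic.
referents = {
    "subject_referent": {"is_subject": 1, "is_topic": 1},
    "object_referent": {"is_subject": 0, "is_topic": 0},
}

for pronoun in ("er", "der"):
    print(pronoun, referent_probabilities(pronoun, referents))
```

The same referents receive different probabilities depending on the pronoun, because each form applies its own weights to identical cue values. This is the sense in which sensitivity, and not only referent accessibility, shapes the outcome.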

The current study examines how changes in information structure affect the resolution of different pronouns in German for first (L1) and second language (L2) speakers. Experiment 1 tests how focus marking on preceding subject and object referents influences referential choice for the personal pronoun er and demonstrative pronoun der for L2 speakers of German. In experiment 2, we employ prosodic focus marking in the form of accents on pronouns themselves, comparing the subject pronoun er (“he”) and the object (accusative case-marked) pronoun ihn, and examine the effect of prosodic focus marking on referent selection for both L1 and L2 speakers. We concentrate on information structure for three reasons. First, the notions of prominence and accessibility are more or less explicitly connected to information structural concepts such as focus (Gundel et al., 1993; von Heusinger and Schumacher, 2019; Ladd and Arvaniti, 2023). Therefore, investigating the effect of information structure on pronoun resolution is a promising way of assessing prominence- and accessibility-based accounts of pronoun resolution (also see Hert et al., 2024). Second, information structure is a central factor connecting several other factors that have been found to affect pronoun resolution, such as grammatical role and position. Thus, while grammatical role and position generally coincide in fixed word order languages like English (e.g., in The baker called the tailor, the baker is both the subject and the first-mentioned entity, whereas the tailor is the object and second-mentioned), they can be disentangled in a flexible word order language like German (Järvikivi et al., 2005; Sauermann et al., 2011; Kaiser and Trueswell, 2004). However, changes in word order mark differences in information structure (Frey, 2006; Weskott et al., 2011; Fanselow, 2015), which should therefore be manipulated directly. Third, as reviewed below, several studies have suggested that L2 speakers are more affected by information structure than L1 speakers (Okuma, 2011; Schimke and Colonna, 2016; Ellert et al., 2011), whereas one study suggests they are less likely to take information structure into account than L1 speakers (Abashidze et al., 2023). This disagreement warrants another look, especially since none of the existing studies explicitly controlled for prosody, which is the most natural way to mark information structure in many languages, including German. Our objective is therefore to contribute new insights to the literature on pronoun resolution and on L2 processing.

Pronoun resolution in L1 German

In German, the personal subject pronoun er (“he”) usually refers to the preceding subject referent (e.g., Bader and Portele, 2019b; Bouma and Hopp, 2007; Colonna et al., 2012; Hert et al., 2024). In addition to the personal pronoun, demonstrative pronouns² like der are used anaphorically as well. The two pronominal forms, personal and demonstrative pronouns, have been found to differ in their preference regarding choice of referents. The difference between the two pronominal forms has been described in terms of complementary preferences for grammatical role as well as for information structure. Unstressed personal pronouns have been shown to prefer the subject/topical referent, whereas demonstratives are more likely to be linked to the object/non-topical referent (Bosch et al., 2003, 2007; Comrie, 1997; Diessel, 1999; Kaiser, 2011b). Importantly, in line with Kaiser and Trueswell’s form-specific approach, studies have shown that the extent of sensitivity toward these factors varies between the two pronominal forms: demonstratives are affected more by information structure than personal pronouns, whereas personal pronouns are influenced by grammatical role to a greater degree than demonstrative pronouns (e.g., Bader and Portele, 2019a, 2025; Hert et al., 2024; Kaiser and Trueswell, 2008; Kaiser, 2011c; Portele and Bader, 2016).

Regarding the difference between subject and object pronouns, Sauermann and Gagarina (2017) investigated the effects of word order and grammatical role parallelism (i.e., a subject referent preference for subject pronouns like er “he” and an object referent preference for object pronouns like ihn “him”) during online pronoun processing using eye-tracking. The gaze data showed an effect of grammatical role parallelism: with the subject pronoun er, participants fixated subject referents more than object referents, whereas the opposite pattern was found for the object pronoun ihn. This effect was present 750–2500 ms from pronoun onset, irrespective of word order. However, since recent research suggests that online preferences in the visual world do not necessarily reflect the final interpretation (Blything et al., 2021; Hert et al., 2024; Schumacher et al., 2016, 2017), it is not clear to what extent these findings reflect the final pronoun interpretation. In fact, Sauermann and Gagarina, who investigated eye gaze during 12 consecutive 250-ms time segments after pronoun onset, note that the effect of grammatical role was not present in the last two time segments, comprising 2500–3000 ms after pronoun onset. They further assumed that this effect might decrease during later processing, but did not collect offline responses on final interpretations.
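
The time-window arithmetic in the preceding paragraph can be checked with a short sketch. It assumes, as the description suggests, that the 12 segments are contiguous and that the first one starts at pronoun onset (0 ms); the function and variable names are hypothetical.

```python
SEGMENT_MS = 250  # width of one analysis segment, as reported in the text
N_SEGMENTS = 12   # number of consecutive segments, as reported in the text


def segment_windows(n_segments=N_SEGMENTS, width_ms=SEGMENT_MS):
    """Return (start, end) times in ms for contiguous segments starting at pronoun onset."""
    return [(i * width_ms, (i + 1) * width_ms) for i in range(n_segments)]


windows = segment_windows()
last_two = windows[-2:]  # the two segments in which the effect was reported absent
print(last_two)
```

Under these assumptions the full analysis window spans 0 to 3000 ms, and the last two segments together cover the 2500 to 3000 ms range mentioned above.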

Following Sauermann and Gagarina’s experimental design, Abashidze, Gagarina, and Bittner (2023) examined the influence of grammatical role and positional parallelism (a preference for the referent occupying the same syntactic position as the pronoun, regardless of grammatical role; see, e.g., Smyth, 1994) during online and offline resolution of subject and object pronouns in German. Their gaze data revealed an initial preference toward the subject referent for subject and object pronouns, which was higher for the object pronoun at first, but increased further over time for the subject pronoun. The offline results showed a preference for the subject referent with the subject pronoun, while for the object pronoun, referent choice was at chance level. Thus, similar to Sauermann and Gagarina, Abashidze et al. (2023) found that grammatical role strongly affects online processing. Unlike in Sauermann and Gagarina’s study, however, the gaze pattern for object pronouns also showed a subject referent preference. Abashidze et al. explain their results in terms of topicality. They see personal pronouns as a tool for topic continuation, meaning participants would be biased toward interpreting personal pronouns as topics. Resolution of subject pronouns in the sentence-initial topic position would be straightforward because grammatical role and topicality align. On the other hand, the preferences for object pronouns would not be as clear, since grammatical role would point to the object as the referent and the topic bias to the subject (the topic), which might explain the chance-level performance on the offline interpretation. However, unlike Sauermann and Gagarina (2017), Abashidze et al. (2023) only tested subject-verb-object (SVO) word order, which impedes the comparison between the results of the two studies. Further, as word order is known to express information structure in German (Fanselow, 2015; Frey, 2006), the fact that Abashidze et al. (2023) did not manipulate word order or vary information structure in any other way makes it difficult to directly assess their claims that both subject and object pronouns preferentially refer to topics.

For the difference in referent selection between unaccented and accented pronouns, there is no research on German. The theoretical literature generally assumes that accented pronouns reverse the referent preference (Akmajian and Jackendoff, 1970). For example, in Rachel texted Monica and Ross called her/HER, the unaccented pronoun would refer to the object Monica, but the accented one to the subject Rachel. This pattern has been derived in both coherence-based accounts (e.g., Kehler, 2005) and Accessibility Theory (Ariel, 1988, 1990, 2001; see also, e.g., Givón, 1983; Gundel et al., 1993; Kameyama, 1999; Smyth, 1994, for similar accounts). Experimental studies suggest that the reversal effect of accented pronouns is only present in certain contexts (Mozuraitis and Heller, 2017; Taylor et al., 2013), but there is no agreement on the precise conditions for it. However, most of these findings are based on English, which only has one type of subject and object pronoun. Since German has different types of pronouns that differ in their degree of sensitivity to different factors (e.g., er vs der, as detailed above), we may find differences in the resolution of their accented versions as well.

To summarize, the studies above show not only that personal subject pronouns are preferably interpreted as the preceding subject/topic and demonstrative pronouns as the preceding object/non-topic, but also that these pronouns differ in their sensitivity toward grammatical role and information structure (Abashidze et al., 2023; Bader and Portele, 2019a,b; Bosch et al., 2003, 2007; Kaiser, 2011b; Sauermann and Gagarina, 2017). Similarly to the personal subject pronoun, the two determining factors for the resolution of the (personal) object pronoun are grammatical role (object) and topicality. However, it is not quite clear which of these two factors exerts a stronger influence. Finally, the effect of accenting the pronoun has not been investigated for German so far.

Pronoun resolution in L2

In research on L2 pronoun resolution, the focus has been on whether L2 speakers can perform like native speakers. In particular, null subject or pro-drop languages have been contrasted with languages that generally only allow overt pronouns. Since they differ in their referential options (pro-drop languages utilize overt and null pronominal forms, whereas non-pro-drop languages only employ overt forms), it has been questioned whether learners can fully acquire the differences in the use of the pronominal forms. However, when compared to native speaker control groups, differences in the use, interpretation, and processing of null and overt pronouns have been found for L2 speakers regardless of whether their L1 is a non-pro-drop or a pro-drop language (Belletti et al., 2007; Lozano, 2018; Okuma, 2011; Polio, 1995; Roberts et al., 2008; Sorace and Filiaci, 2006). While some findings suggest that L1 influence can emerge when speakers try to resolve ambiguity (e.g., Roberts et al., 2008), other findings reveal that even when both L1 and L2 are pro-drop languages, L2 learners do not necessarily benefit from this similarity (e.g., Lozano, 2018; Polio, 1995). This means that differences in L2 pronoun resolution cannot entirely be explained by the difficulty of switching from a non-pro-drop to a pro-drop system or vice versa.

Turning now to German, several studies have investigated pronoun resolution in L2 learners. While German does not generally allow null subject pronouns, demonstrative pronouns can be used anaphorically, as previously mentioned. Thus, in German too, there are different referential forms with different underlying referential functions: similar to null and overt pronouns, they are affected to different degrees by grammatical roles and information structural roles (see the section “Pronoun resolution in L1 German”). What may obscure the differences in interpretation between personal and demonstrative pronouns is that the anaphoric use of the demonstrative pronoun often gets overlooked in grammars (cf. Ahrenholz, 2007) and may therefore not even be part of the L2 acquisition process. So far, three studies have investigated the different pronominal forms explicitly and considered the effect of information structure in non-native German speakers’ pronoun resolution.

Ellert, Roberts, and Järvikivi (2011) investigated the effect of topicality on the resolution of subject personal and demonstrative pronouns (er and der) in L2 German. The L2 speakers’ L1 was Dutch, which allows demonstrative pronouns to be used anaphorically and shows the same referent preferences of personal and demonstrative pronouns as German (e.g., Bosch et al., 2007; Kaiser, 2011c). In their items (e.g., Der Schrank ist schwerer als der Tisch. Er/Der stammt […]. “The cabinet is heavier than the table. It comes […].”), the first-mentioned NP (Schrank “cabinet”) is assumed to be topical, whereas the second-mentioned NP (Tisch “table”) is non-topical. The results revealed that L2 learners linked both pronouns to the topical referent, which is different from the L1 preference (both in Dutch and German) to resolve the personal pronoun toward the topical referent and the demonstrative toward the non-topical referent. Nevertheless, even though L2 speakers showed a topic preference for both pronouns, the preference was stronger for personal than for demonstrative pronouns. This was observed in their online eye-tracking data as well as in the offline comprehension questionnaire. These results suggest that the referential function of the individual pronouns played a decisive role also for L2 speakers. Ellert (2010) further suggests that proficiency is a crucial factor in L2 resolution of personal and demonstrative pronouns. She observes that less proficient learners used both pronominal forms for the same function (i.e., linked to topical referents), whereas highly proficient L2 learners differentiated distinct functions for the personal pronoun (i.e., linked to topics) and for demonstrative pronouns (i.e., linked to non-topics).

In contrast to Ellert et al. (2011), who investigated topicality, Patterson, Esaulova, and Felser (2017) conducted three experiments to examine how focus affects the resolution of the within-sentence subject pronoun er in native and non-native German speakers, as well as in native Russian speakers. Focus was established through the use of cleft constructions and focus-sensitive particles. The results indicated a distinct contrast between native and non-native speakers that could not be attributed to L1 influence. Specifically, native speakers of German and Russian were less likely to link a pronoun with a referent in focus (via cleft) than with a non-focused referent in the same position. In contrast, non-native speakers did not display this effect, but rather tended to resolve a pronoun toward referents appearing with a focus-sensitive particle. Thus, L2 speakers showed sensitivity to focus marking. Since results were the same for both L1 groups, the L2 group’s divergence cannot be explained by possible L1 influences. Lastly, Wilson (2009) tested word order and grammatical role effects on the resolution of German personal and demonstrative subject pronouns in L1 and L2 speakers (L1 English). The results showed that L2 speakers preferred the first-mentioned (topical) referent with the personal pronoun, while L1 speakers showed no preference. For the demonstrative pronoun, L2 speakers had no preference, whereas L1 speakers linked it to the second-mentioned (non-topical) referent. However, Wilson mentions that her manipulation of word order may not have triggered changes in information structure as intended, since her stimuli did not employ appropriate prosody.

As for the resolution of object pronouns, German as L2 has not yet been extensively researched. The above-mentioned study by Abashidze et al. (2023) contrasted L2 speakers (L1 Georgian) with native German speakers. In the gaze data, they found L2 speakers to attend more to the subject referent than the object referent after a subject pronoun, which corresponded to L1 speakers’ gaze pattern. For object pronouns, L2 speakers fixated the object referent more than L1 speakers. In the offline results, L2 speakers showed the same tendency as L1 speakers, namely, selecting the subject referent more often for the subject pronoun than for the object pronoun. However, L2 speakers preferred the object referent with the object pronoun, whereas L1 speakers did not show a preference for either referent. Abashidze et al. conclude that while L1 speakers’ preference is affected by grammatical role and topicality, L2 speakers may have difficulties employing information structural cues and hence rely only on grammatical parallelism during pronoun resolution. Note, however, that their study did not manipulate information structure directly, whereas studies that directly investigated information structural effects showed that L2 German speakers were sensitive to changes in information structure (Ellert et al., 2011; Patterson et al., 2017). This has also been shown for L2 speakers of other languages; for example, Schimke and Colonna (2016) suggest that L2 learners might rely on discourse-level information to a greater extent than L1 speakers when interpreting pronouns (also see Okuma, 2011).

In sum, the studies presented in this section show that native and non-native speakers are affected by grammatical role and information structure when interpreting pronouns. However, these factors seem to be weighted differently in L2 than in L1 speakers. The underlying cause for the differences in weighting is not obvious. While proficiency plays an important role in L2 speakers achieving a more native-like performance (e.g., Ellert, 2010; Lozano, 2018; Polio, 1995), the role of L1 influence is not certain (cf. Ellert et al., 2011; Patterson et al., 2017; Roberts et al., 2008; Lozano, 2018; Polio, 1995). However, language influence in the sense of dominance/proficiency cannot completely be disregarded, as Tsimpli et al. (2004) have shown that even native speakers under attrition can behave more like L2 than L1 speakers. This is also indicated in Roberts et al.’s (2008) study, where L2 speakers of different L1s showed similar online processing patterns, but deviated in their final interpretation. Thus, these findings suggest that the effect of language proficiency should be further investigated together with the role of information structure.

The role of prosodic focus marking in L1 and L2 speakers

Intonation is commonly used as an indication of information structure. For instance, it can mark whether an element of a sentence has been introduced in the previous discourse (Schwarzschild, 1999), whether that element is new (i.e., an update of the common ground; Lambrecht, 1994), or whether that element indicates the relevance of alternatives (Roberts, 2012; Rooth, 1985, 1992). In German, while focus is associated with a falling accent (H*(+L)) (e.g., Baumann, 2006; Büring, 1997; Féry, 1993), topics are connected to rising accents (L*+H), especially when contrastive (e.g., Féry, 1993; Büring, 1997; Braun, 2006; Repp and Drenhaus, 2015). Focus is acoustically marked with a wider pitch range, increased intensity, and increased duration compared to other speech elements that are not in focus (Féry and Kügler, 2008).

The literature suggests that native speakers are able to identify and integrate prosodic information to build information structure in real time (Heim and Alter, 2006; Wang et al., 2011). An event-related potential (ERP) experiment in German (Hruska and Alter, 2004) revealed an increased N400 response at words that were expected to carry a focus pitch accent but did not. In eye-tracking studies, contrastive focus marking has been shown to trigger anticipatory eye movements, e.g., hearing blue ball followed by GREY raises expectations that the upcoming noun will also be ball (Ito and Speer, 2008; Ito et al., 2014), which in turn can support target search. Similarly, it has been observed that in H*L (focus) conditions, the initial higher proportion of looks directed toward the competitor decreases earlier compared to L*H (non-focus) conditions (Chen et al., 2007; Sedivy et al., 1999). This shows that native listeners can make predictions about upcoming referents in real time using prosodic cues.

For L2, on the one hand, some studies suggest that L2 learners might have difficulties producing and perceiving prosodic cues, particularly if these differ from prosodic cues in their L1 (Mennen and De Leeuw, 2014). Akker and Cutler (2003) found that Dutch-speaking L2 learners of English were not able to map pitch accents to semantic information as effectively as native speakers of English, even though the use of prosodic cues for information structure is similar in Dutch and English (also see Chen and Lai, 2011). On the other hand, Takahashi et al. (2018) found that Chinese-speaking L2 learners of English showed shorter reaction times for sentences with felicitous contrastive pitch accent than for sentences with infelicitous use of contrastive pitch. The authors assumed that the effective use of English contrastive prosodic cues in L2 speakers stemmed from similarities in pitch cues to focus in English and Mandarin Chinese. ERP studies (Reichle, 2010; Reichle and Birdsong, 2014) revealed that L2 proficiency can affect the online perception of information structure. Unlike low-proficiency English-speaking L2 learners of French, high-proficiency learners showed a native-like anterior negativity response for contrastive focus. Perdomo and Kaan (2021) looked into the effects of proficiency and working memory on L2 information structure processing. They found that while L2 speakers used prosodic information to build information structure during listening, neither proficiency nor working memory influenced L2 speakers’ use of contrastive pitch accent to predict or process the following noun phrase.

Thus, the evidence so far suggests that L2 speakers show sensitivity to modulations of L2 prosody, but may experience difficulties in effectively using the prosodic information for the subsequent discourse. As to the role of proficiency, its effect on L2 performance is not yet clear.

The role of prosodic focus marking in L2 pronoun resolution has not been investigated so far (see Tsoukala et al., 2024, for effects of implicit prosodic rhythm cues in L2 English). The present study is intended to fill this gap.

Hypotheses

Various accounts have been put forward attempting to explain observed differences between L1 and L2 language processing, such as the interface hypothesis (Sorace and Filiaci, 2006; Sorace, 2011) or the shallow structure hypothesis (Clahsen and Felser, 2006, 2018). Despite making different predictions about when difficulties for L2 speakers arise, what these accounts have in common is that they describe L1-L2 differences in terms of difficulties in applying information during online processing. Cunnings (2017) proposes that during the cue-based memory retrieval processes, similarities among the cues may interfere and lead to differences in weighting of the cues, which in turn may result in differences during processing. Cunnings’s approach fits particularly well with the idea of the form-specific account that different syntactic, pragmatic, and discourse-level cues are weighted to render each pronoun’s referent (cf. Kaiser and Trueswell, 2008; Kaiser, 2017). Since languages and individual pronouns may differ in how they weight cues, bilingual speakers may have more weighting options available. Moreover, discourse-based cues seem to generally be weighted more strongly in L2 than L1 processing (cf. Schimke and Colonna, 2016). This could be explained in terms of cue-weighting as follows: Prosodic focus marking results in increased attention to focused referents during processing and memory retention (Hert et al., 2024; Káldi and Babarczy, 2021). Attention thus provides an explicit link to pronoun resolution: A prosodically focus-marked referent may receive a boost in its memory representation, which in turn will make establishing the link between it and a pronoun easier. In L1 processing, this does not determine the ultimate likelihood of establishing such a link, since pronoun resolution is generally determined by grammatical characteristics of individual pronouns, e.g. subject pronouns like English he and German er are linked to preceding subject referents (Foraker and McElree, 2007; Blything et al., 2021; Hert et al., 2024). For L2 speakers, for whom cue-weighting is generally expected to be challenging, the effect of focus marking can be expected to be even stronger, and it is possible that, unlike L1 speakers, L2 speakers are generally more likely to select focused referents in pronoun resolution.

Thus, the first overarching hypothesis tested in the present study is that L2 speakers are more sensitive to information structure than L1 speakers in processing pronouns. In particular, we predict them to be more likely to select focused entities as pronoun referents than L1 speakers. Our second overarching hypothesis, based on previous findings (e.g., Ellert, 2010; Lozano, 2018; Wilson, 2009), is that proficiency will affect L2 speakers’ pronoun processing, such that more proficient L2 speakers will be more similar to L1 speakers than less proficient ones. Further, we predict proficiency to be more important than type of L1 in the L2 speaker group, since L1 type does not necessarily lead to a native-like performance (see Polio, 1995) and even native speakers under attrition behaved more similarly to L2 speakers than to native controls (Polio, 1995; Tsimpli et al., 2004). The third overarching hypothesis to be tested is therefore that proficiency is more important for L2 speakers’ performance than whether their L1 is a pro-drop or non-pro-drop language.

More detailed predictions are derived in the individual sections for the two experiments below. Both experiments were approved by the Research Ethics Board 2 of the University of Alberta (study ID Pro00105075).

Experiment 1: Interpretation of er and der in L2 speakers

In experiment 1, we investigate whether L2 speakers’ referential choice can be aided by focusing on possible referents. That is, can their referent preference be biased toward one referent if prosody explicitly marks that referent as focused in the discourse context (see section 3.1.2 for an example)? In accordance with the first overarching hypothesis, we predict that the answer is yes, and that focused referents will be chosen more often than those that are not focused.

Further, we want to examine whether L2 speakers are sensitive to the different referential functions of the subject pronouns er and der. Previous L2 research suggests that L2 speakers differ from L1 speakers in their resolution of the demonstrative pronoun der, but are more similar to L1 speakers with the personal pronoun er (e.g., Ellert et al., 2011). Therefore, we predict interpretation of the personal pronoun er in L2 speakers to be similar to L1 speakers, i.e. they will show a preference for the subject referent, but we predict a less clear preference for the demonstrative pronoun der.

As for proficiency, we predict that L2 speakers’ referent selection interacts with their level of proficiency, resulting in increasingly native-like performance with increasing levels of proficiency, as per the second overarching hypothesis. This means that more proficient L2 speakers should select the subject referent more often than less proficient ones, especially for er. Finally, in accordance with the third overarching hypothesis, we predict that proficiency will have a stronger effect on the results than whether participants’ L1 is a pro-drop language.

Method

Design and materials were identical to those used with L1 speakers in experiment 2 reported in Hert et al. (2024).

Participants

In total, 80 participants with various L1s (see Table 1 and footnote 3) completed the experiment via Prolific for monetary compensation (£9/h). For the analysis, we excluded 15 participants based on their high error rate with filler items (more than 60% incorrect), resulting in 65 participants (age range: 19–62, mean: 33, sd: 12.4). Participants indicated they had learned German in language classes in school or university. At the time of testing, participants had spent all or the majority of their lives in their respective L1 country.

Table 1. L1s and number of speakers for experiment 1

Materials

We designed a comprehension task where participants listened to short dialogues. Two experimental factors were manipulated: (i) whether the dialogue contained an unaccented personal pronoun er or an unaccented demonstrative pronoun der; (ii) which possible referent for the pronoun was focused (subject, object; see Table 2 for an example). Information structure was manipulated in a twofold way, coupling prosodic focus marking with changes in the context that licensed the prosodically indicated information structure. A total of four conditions were tested, and we used ten sentences per condition, which resulted in 40 experimental dialogues.

Table 2. Example dialogue with critical manipulation in all four conditions. Prosodic focus marking in italics, unaccented ambiguous pronoun in bold. Contexts were identical for all conditions except where indicated with slashes and condition names in brackets. Note that information structure was manipulated prosodically in the critical sentence, as well as in the preceding context

The dialogues were recorded using Shure SM10A headset microphones in a sound-attenuated booth by two native speakers of German, one female and one male. The female speaker (the second author, a prosody researcher) recorded all the introductions and critical sentences (A-turns in Table 2) for the experimental items, while the male speaker recorded them for the filler items. The speaker uniformly produced a single falling accent on the focused constituent, while the rest of the sentence remained unaccented (represented as H* L-% in GToBI notation; Grice et al., 2005, 2017).

The experimental items were distributed across four lists in a Latin square design. Additionally, we constructed 40 filler items that were the same across all lists. These fillers also contained four possible referents, but unlike the experimental items, we did not include any ambiguous pronouns.
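The Latin square distribution described above can be illustrated with a minimal sketch. It assumes the standard rotation scheme, in which each item's condition is shifted by one from list to list; the condition labels and the function name are hypothetical and are not taken from the experiment's materials.

```python
# Hypothetical labels for the 2 x 2 design (Pronoun x Focus); not from the original materials.
CONDITIONS = [
    "er/subject focus",
    "er/object focus",
    "der/subject focus",
    "der/object focus",
]


def latin_square_lists(n_items=40, conditions=CONDITIONS):
    """Return one {item_index: condition} mapping per presentation list.

    Assumes a simple rotation: in list k, item i appears in condition (i + k) mod 4.
    """
    n_lists = len(conditions)
    lists = []
    for list_idx in range(n_lists):
        assignment = {
            item: conditions[(item + list_idx) % n_lists]
            for item in range(n_items)
        }
        lists.append(assignment)
    return lists


lists = latin_square_lists()
```

Under this scheme, every list contains ten items per condition, and every item occurs in each of the four conditions exactly once across the four lists, so no participant encounters the same item in more than one condition.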

Note that, like prior research on German pronouns, we only use masculine third-person singular pronouns in the experimental items. This is because the feminine third-person singular pronouns sie and die are the same in nominative and accusative cases, and are additionally homonymous with the third-person plural and the polite second-person singular pronouns, which would induce additional ambiguity. The possible referents in all target items were grammatically masculine occupation names, which are interpreted as referring to men in German (Horvath et al., 2016), meaning all of them were equally suitable referents for the target pronouns.

Procedure

The experiment was created with the jsPsych framework for carrying out online experiments (version 7.2.1, de Leeuw, 2015). The participants were given a brief written explanation of the tasks they were about to complete. First, participants filled out a questionnaire about their language background. Additionally, we included the German LexTALE (Lemhöfer and Broersma, 2012) as a measure of L2 speakers’ vocabulary knowledge (see Table 3). Afterwards, a screen with instructions appeared, asking participants to carefully listen to the dialogues. They were also given the chance to check their speakers’/headphones’ volume before starting the task.

Table 3. LexTALE scores (raw) for L2 speakers, including range, mean, and standard deviation

While listening to the dialogues, participants saw the names of the four mentioned referents on the screen. Following each dialogue, they saw a question on the screen probing to which of the two target referents, subject or object, the pronoun referred (see last row in Table 2). We also included the other two referents as possible responses to ensure that participants paid attention during the experiment. Participants gave their answer by clicking on one of the names on the screen. The positions of the referents’ names on the screen were randomized for each list. Halfway through the experiment, participants were given a break.

Results

We performed generalized linear mixed-effects regression modeling (GLMER) using the lme4 package (version 1.1-35.1, Bates et al., 2015) in the software R (version 4.3.0, R Core Team, 2023) to analyze the participants’ responses. The models included a binomial dependent variable coding whether the participant chose the subject or the object as the referent of the pronoun. We therefore excluded 111 responses choosing a distractor referent, i.e. 4.27% of the data, leaving 2489 data points for analysis. We added a three-way interaction for Condition (subject focus vs object focus), Pronoun (er vs der), and LexTALE (centered). L1 type (a binary variable coding whether participants’ L1 was a pro-drop language) was also included as a fixed effect (a more complex model with a four-way interaction did not converge). Opting for a backward-fitting procedure, we excluded fixed factors one by one to see whether they significantly contributed to the model fit. Neither L1 type nor any interactions between Pronoun, Condition, and LexTALE affected the model’s fit significantly. The fixed effect L1 type and the interactions were therefore excluded from the final model shown in Table 4, which retains only the significant effects of Condition, Pronoun, and LexTALE. Moreover, the model contained by-Participant random slopes for Pronoun, as well as a random intercept for Item. Models with more complex random effects structures did not converge.
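The data preparation described above can be sketched as follows. This is an illustration only, not the authors' R code: the function names and response labels are hypothetical, and the total of 2600 responses assumes that each of the 65 retained participants answered all 40 experimental items.

```python
from statistics import mean

N_PARTICIPANTS = 65   # participants retained for analysis
N_ITEMS = 40          # experimental dialogues per participant (assumed complete)
N_DISTRACTOR = 111    # responses choosing a distractor referent

total_responses = N_PARTICIPANTS * N_ITEMS
analyzed_responses = total_responses - N_DISTRACTOR
percent_excluded = round(100 * N_DISTRACTOR / total_responses, 2)


def code_response(choice):
    """Binary coding of the dependent variable (labels are hypothetical).

    Returns 1 for a subject choice, 0 for an object choice, and None for a
    distractor choice, which is dropped before modeling.
    """
    if choice == "subject":
        return 1
    if choice == "object":
        return 0
    return None


def center(scores):
    """Center a predictor such as LexTALE by subtracting the sample mean."""
    m = mean(scores)
    return [s - m for s in scores]
```

With these assumptions, the counts reproduce the figures reported in the text: 2600 responses in total, of which 111 (4.27%) are excluded, leaving 2489 for analysis.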

Table 4. Fixed effects for best fitting generalized linear mixed-effects model of referent choice for er and der

As illustrated in Figure 1, the subject referent was chosen significantly less often in the object focus condition compared to the subject focus condition (the intercept in Table 4). With regard to the pronouns, participants selected the subject referent significantly less often for der than for er. Moreover, participants’ LexTALE score significantly predicted their referent choice: higher scores were associated with an increase in subject preference.

Figure 1. Referent choice for er and der by condition, with error bars for standard error.

In addition, we compared the subject preference for the two pronouns in both prosody conditions to chance level using one-sample Wilcoxon signed rank tests and found the difference to be significant only for er in subject focus, where subject preference was significantly above chance level, and der in object focus conditions, where it was significantly below chance level (for both p < 0.05).
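As an illustration of the chance-level comparison, the sketch below computes the rank sums that underlie a one-sample Wilcoxon signed rank test against a chance proportion of 0.5. It rests on assumptions that the text does not confirm: that the test is run over per-participant subject-choice counts, and that every participant contributes the same number of trials in the cell being tested. The function name and the example counts are hypothetical, and no p-value is computed.

```python
def signed_rank_sums(subject_counts, n_trials):
    """Return (W_plus, W_minus) for subject-choice proportions tested against 0.5.

    subject_counts: number of subject choices per participant (hypothetical unit).
    n_trials: number of trials per participant, assumed equal for everyone.
    """
    # 2 * count - n_trials has the same sign and rank order as count / n_trials - 0.5,
    # and keeps the arithmetic in integers so that ties are detected exactly.
    diffs = [2 * c - n_trials for c in subject_counts]
    diffs = [d for d in diffs if d != 0]  # participants exactly at chance are dropped

    ordered = sorted(diffs, key=abs)
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and abs(ordered[j]) == abs(ordered[i]):
            j += 1
        # Tied absolute differences share the average of ranks i + 1 .. j.
        ranks[abs(ordered[i])] = (i + 1 + j) / 2
        i = j

    w_plus = sum(ranks[abs(d)] for d in diffs if d > 0)
    w_minus = sum(ranks[abs(d)] for d in diffs if d < 0)
    return w_plus, w_minus


# Made-up counts for five participants with ten trials each.
example = signed_rank_sums([7, 8, 6, 9, 4], 10)
```

A rank sum for positive differences that is much larger than the one for negative differences indicates a subject preference above chance; the reverse pattern indicates a preference below chance. In practice, the significance test itself would be run with an established statistics package rather than this sketch.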

Discussion

As predicted by our first hypothesis, the L2 speakers were sensitive to focus marking and chose focused referents more often than non-focused ones. Unlike for L1 speakers (cf. Hert et al., 2024), the effect of information structure did not significantly interact with the factor Pronoun. Thus, while effects of focus were attenuated and ultimately overridden by the individual pronoun’s referent preference in L1 speakers, the effect of focus on L2 speakers was the same for both pronouns. Figure 1 depicts a preference for the subject referent in the subject focus condition with the personal pronoun er, and a preference for the object referent in the object focus condition for the demonstrative pronoun der. These two preferences were not as pronounced as for L1 speakers (cf. Hert et al., 2024). Nonetheless, the results show that L2 speakers were sensitive to the two different referential forms: er was more likely to be resolved toward the subject referent, and der was more often linked to the object referent. Similar to L1 speakers, there was no preference for either referent with der in the subject focus condition for the L2 group. However, unlike L1 speakers, who still preferred the subject referent, L2 speakers did not show a referent preference for er in the object focus condition. As hypothesized, L2 speakers did not show a strong preference for der, but, surprisingly, there was also no clear preference for er. This can be ascribed to the effect of information structure: for both pronouns, a preference for one referent emerged only when grammatical role and focus marking were combined, and, as hypothesized, L2 speakers were more swayed by focus marking than L1 speakers.

In line with the second hypothesis, more proficient L2 speakers chose the subject referent more often than less proficient speakers. In addition, while proficiency had a significant effect, L1 type did not, supporting the third hypothesis.

Experiment 2: Interpretation of er and ihn in L1 and L2 speakers

In experiment 2, we test L1 and L2 speakers’ referential choice for the subject pronoun er and object pronoun ihn. We manipulated whether pronouns were unaccented or accented, as for the subject pronoun in (1), to investigate whether prosody would affect referent selection and if L1 speakers would differ from L2 speakers (see section 4.1.2 for full example and all pronoun conditions).

(1) Der Arzt bringt den Koch mit einer Clownsnase zum Lachen, als er/ER die Musikerin mit der Kamera filmt.

“The doctor makes the cook laugh with a clown’s nose, when he/HE films the musician with the camera.”

As discussed in section 1.1, unaccented personal subject pronouns are preferably resolved toward the preceding subject referents (e.g., Abashidze et al., 2023; Bader and Portele, 2019b; Bouma and Hopp, 2007; Colonna et al., 2012; Hert et al., 2024), whereas for object pronouns, a preference for object referents (Sauermann and Gagarina, 2017) or subject and object referents (Abashidze et al., 2023) was observed in L1. Therefore, for L1 speakers, we expect a clear preference for linking the unaccented subject pronoun to the subject referent. For the unaccented object pronoun, if its interpretation is indeed affected to the same extent by multiple factors as proposed by Abashidze et al. (2023), then we expect no preference for either referent. With respect to accented pronouns, previous findings suggest that whether or not the accent leads to a reversal in referential choices depends on alternatives being explicitly available in the previous discourse (Mozuraitis and Heller, 2017). Our experimental items include alternative referents, which should enable the reversal of accented subject pronouns. For the object pronoun, we assume no preference following Abashidze et al. (2023) and Sauermann and Gagarina (2017), and accenting the pronoun should therefore not lead to a reversal in referent preference.

Turning to L2 speakers, considering the research presented in section 1.2, we assume referent choice to be similar to that of L1 speakers with unaccented pronouns, but the preference may not be as pronounced as with L1 speakers. Regarding accented pronouns, in line with our first hypothesis, experiment 1 showed that L2 speakers are highly sensitive to focus marking where it helps them with the task of resolving pronoun ambiguity, even though this can lead to non-native-like preferences. But will the effect of information structure marking be equally pronounced when it applies to the pronoun itself rather than highlighting one of the possible referents, and thus does not directly aid ambiguity resolution?

In addition to the accent, we included an order of mention manipulation for the object pronoun ihn. This was done so that we could target possible effects of parallel position (cf. Abashidze et al., 2023; Sauermann and Gagarina, 2017). Note that while Abashidze et al. (2023) and Sauermann and Gagarina (2017) have looked into the effect of parallel position as well, they did not manipulate order of mention of the pronoun.

Lastly, in relation to previous work on L2 pronoun resolution mentioned in section 1.2, we also considered the role of proficiency and included a measure of vocabulary knowledge. Following previous findings (e.g., Ellert et al., 2011; Lozano, 2018; Wilson, 2009), we assume that L2 speakers’ performance correlates with their level of proficiency. This means the higher their score on the proficiency measure, the closer their performance on referent selection should be to that of the native speakers, as stated by our second overarching hypothesis. Moreover, following the third hypothesis, we expect proficiency to be of greater importance for pronoun resolution than L2 speakers’ L1 type.