Why non-native speakers sometimes outperform native speakers in agreement processing

Abstract It is well-known that native English speakers sometimes erroneously accept subject-verb agreement violations when there is a number-matching attractor (e.g., *The key to the cabinets were…). Whether bilinguals whose L1 lacks number agreement are prone to such interference is unclear, given previous studies that report conflicting findings using different structures, participant groups, and experimental designs. To resolve the conflict, we examined highly proficient Korean–English bilinguals’ susceptibility to agreement attraction, comparing prepositional phrase (PP) and relative clause (RC) modifiers in a speeded acceptability judgment task and a speeded forced-choice comprehension task. The bilinguals’ judgments revealed attraction with RCs but not with PPs, while reaction times indicated attraction with both structures. The results therefore showed L2 attraction in all measures, with the consistent exception of judgments for PPs. We argue that this supports an overall native-like agreement processing mechanism, augmented by an additional monitoring mechanism that filters explicit judgments in simple structures.


Introduction
Learning a second language involves acquiring new morpho-syntactic features and rules. While the knowledge of such rules may develop through increased experience with the language, being able to put them to use to quickly process sentences in real time remains a challenging task for a non-native speaker. Some studies have shown that highly proficient L2 learners show native-like sensitivity to morpho-syntactic violations in online sentence comprehension, including number, gender, and tense agreement (Hopp, 2006; Foote, 2011; Tanner, 2011; Sagarra & Ellis, 2013; Coughlin & Tremblay, 2013), but other studies have found significant differences in the way non-native speakers respond to such grammatical errors (Chen, Shu, Liu, Zhao & Li, 2007; Jiang, 2004, 2007; Jiang, Novokshanova, Masuda & Wang, 2011; Keating, 2009; Tanner, Nicol, Herschensohn, Osterhout, Biller, Chung & Kimball, 2012; Lago & Felser, 2018; Song, 2015). Such differences have often been noted in cases where learners had to newly acquire the L2-specific feature, such as number agreement for Chinese learners of English (e.g., Jiang, 2004) or Korean learners of English (e.g., Song, 2015).
Despite the growing research on non-native speakers' morpho-syntactic processing and the unresolved debate over whether L1-L2 differences play a critical role in determining ultimate attainment of L2 morpho-syntactic sensitivity, less work has investigated the underlying cognitive mechanism in non-native speakers' computation of less-familiar operations that are specific to the L2. A psycholinguistic phenomenon that is useful for probing such a mechanism in native speakers is agreement attraction, a case of systematic failure in native speakers' processing of a relatively simple operation like subject-verb agreement, which informs psycholinguistic theories about how the parser uses different kinds of information to process agreement in real time. Examining whether non-native speakers show attraction effects like native speakers, with similar attraction profiles, can provide evidence for either a shared or a fundamentally distinct processing mechanism in native and non-native language processing. It can also provide insight into whether difficulties in L2 processing are due to a unique underlying mechanism that is more error-prone than the one used in native language processing. Agreement attraction also provides a promising test of parallels between native and non-native language processing, because it has been attributed to pervasive processing operations such as cue-based memory retrieval (Dillon, Mishler, Sloggett & Phillips, 2013; Lago, Shalom, Sigman, Lau & Phillips, 2015; Wagers, Lau & Phillips, 2009). While some previous studies show that L2 learners whose first language has agreement are susceptible to such illusions, like native speakers (e.g., Jegerski, 2016; Lago & Felser, 2018; Tanner, 2011), studies with learners whose first language lacks agreement seem to show conflicting results, where attraction is found in one study (Lim & Christianson, 2015) but not in another (Schlueter, Momma & Lau, 2019).
Since those studies used different methods, examined different structures, and tested speakers of different L1s, it is difficult to ascertain the underlying reason for the divergence in the results. The goal of the present study is to resolve the conflict by directly comparing the different structures tested in the previous studies, namely prepositional phrase (PP) and relative clause (RC) subject modifiers, in the same experimental design with the same group of L2 learners, and in so doing, to examine whether highly proficient Korean-English bilinguals whose first language lacks number agreement use the same mechanism to compute subject-verb number agreement in their L2 English as native speakers.

Agreement attraction in L1 sentence processing
Agreement attraction refers to a phenomenon commonly found in real-time sentence comprehension and production (e.g., Bock & Miller, 1991). Here, we focus on agreement attraction in comprehension, where the parser fails to notice a number mismatch between the subject and the verb when a nearby noun matches the verb's number. To take an example, (1a) and (1b) are both ungrammatical in English, because they violate subject-verb number agreement. The critical difference between them is whether the nearby noun agrees with the verb in grammatical number (1b; cabinets) or not (1a; cabinet). Studies have shown that native speakers tend to show facilitated processing of ungrammatical sentences containing an attractor that matches the verb's number (1b) compared to those without a number-matching attractor (1a), in various measures including acceptability judgments (e.g., Clifton, Frazier & Deevy, 1999), reading times during comprehension using self-paced reading (e.g., Pearlmutter, Garnsey & Bock, 1999;Wagers et al., 2009) and eye-tracking (e.g., Dillon et al., 2013), and neural responses using ERPs (e.g., Tanner, Nicol & Brehm, 2014).
(1) a. *The key to the cabinet were old and rusty.
b. *The key to the cabinets were old and rusty.
This kind of illusion in online sentence comprehension has been observed with different kinds of structures in English, including prepositional phrase and relative clause subject modifiers (Tanner, 2011), double modifier constructions (Lago & Felser, 2018), and even object relative clause constructions (Wagers et al., 2009), which indicates that the attractor does not need to linearly intervene between the subject and verb. It is not specific to English; agreement attraction is found cross-linguistically with other kinds of agreement, such as number agreement in Spanish (Lago et al., 2015), gender agreement in Russian (Slioussar & Malko, 2016) and Spanish (Gonzalez Alonso, Cunnings, Fujita, Miller & Rothman, 2021), and honorific agreement in Korean (Kwon & Sturt, 2016). Although their degree of susceptibility may differ, children (Veenstra, Antoniou, Katsos & Kissine, 2017), older adults (Reifegerste, Hauer & Felser, 2017), and nonnative speakers (Tanner, 2011) all seem to be prone to the illusion to some extent. There are several reasons why agreement attraction has gained interest in the psycholinguistics literature. One is that it represents a situation where even native speakers fail to correctly compute a simple syntactic operation which is not limited to certain cases but rather robustly found across various structures, languages, and populations. Another more theoretically motivated reason is that the attraction effect has allowed psycholinguists to investigate the cognitive mechanism involved in computing linguistic dependencies like subject-verb agreement, where the parser has to complete the dependency while encoding and maintaining other elements in the sentence in memory. 
While there have been various accounts explaining the phenomenon, a widely adopted view has been cue-based memory retrieval accounts (Dillon et al., 2013;Lago et al., 2015;Wagers et al., 2009), which suggest that the illusion occurs because of how memory retrieval works (Lewis & Vasishth, 2005;Lewis, Vasishth & Van Dyke, 2006;McElree, 2000;Van Dyke & McElree, 2006). According to this view, as the parser comprehends a sentence, each element is encoded and stored in memory until a point at which a previously encountered item has to be retrieved, and this backward search for the target item is driven by retrieval cues that define the features of each item, such as [+subject] or [+plural]. This mechanism makes the parser susceptible to similarity-based interference, where sometimes irrelevant items that partially match those cues are retrieved instead of the correct target. For example, when processing the ungrammatical sentence (1b), the parser initiates a backward search at the verb to find a plural subject to agree with the plural verb, using the [+subject] and [+plural] cues. The subject key provides a partial match ([+subject; −plural]) while the plural attractor cabinets also provides a partial match with the number cue ([−subject; +plural]). As a result, the number-matching attractor cabinets is sometimes mistakenly retrieved instead of the subject key, resulting in higher acceptance rates or reduced reading times for the ungrammatical sentence.
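The retrieval logic described above can be made concrete with a minimal Python sketch. It is an illustration under simplifying assumptions, not the implementation of any of the cited models: cues are weighted equally, a candidate's score is simply the number of cues it matches, and a tie stands in for the probabilistic competition that activation-based models describe. The names Chunk, match_score, and retrieval_candidates are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    """A noun held in memory with its features (hypothetical representation)."""
    word: str
    features: frozenset


def match_score(chunk, cues):
    """Count how many retrieval cues the chunk's features satisfy."""
    return len(cues & chunk.features)


def retrieval_candidates(memory, cues):
    """Return the words with the highest cue-match score; a tie means competition."""
    best = max(match_score(chunk, cues) for chunk in memory)
    return [chunk.word for chunk in memory if match_score(chunk, cues) == best]


key = Chunk("key", frozenset({"subject", "singular"}))
cabinet = Chunk("cabinet", frozenset({"singular"}))
cabinets = Chunk("cabinets", frozenset({"plural"}))

were_cues = frozenset({"subject", "plural"})    # cues at the plural verb "were"
was_cues = frozenset({"subject", "singular"})   # cues at the singular verb "was"

print(retrieval_candidates([key, cabinet], were_cues))   # (1a): ['key']
print(retrieval_candidates([key, cabinets], were_cues))  # (1b): ['key', 'cabinets']
print(retrieval_candidates([key, cabinets], was_cues))   # grammatical: ['key']
```

Under these assumptions, the plural verb in (1b) yields two equally good candidates (key and cabinets), whereas a grammatical counterpart with was yields only key, because the head noun then matches both cues. This is one way to see why a retrieval account expects attraction mainly in ungrammatical sentences.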
An alternative explanation for the illusion proposed by representational accounts (e.g., Eberhard, Cutting & Bock, 2005;Franck, Vigliocco & Nicol, 2002;Pearlmutter et al., 1999) is that agreement errors arise due to uncertainty in the encoding of a complex subject noun phrase. A plural modifier might lead the parser to incorrectly treat the whole NP as plural, leading to erroneous acceptance of an ungrammatical sentence containing a plural attractor. A key piece of evidence that favors cue-based retrieval accounts over this alternative representational view is the grammatical asymmetry, where attraction effects tend to be large and robust in ungrammatical sentences but small or absent in grammatical sentences 1 (Lago et al., 2015;Tanner et al., 2012;Wagers et al., 2009). While such an effect can be explained through a cue-based retrieval mechanism, the grammatical asymmetry is not expected under a representational view, because the misrepresentation of the head NP should occur in grammatical sentences just as often as it does in ungrammatical sentences (but see Hammerly, Staub & Dillon, 2019 for a proposed explanation of the grammatical asymmetry under a representational account). In the present study, we also observe the grammatical asymmetry, even in the L2 learners in cases where they show susceptibility to agreement attraction. We therefore interpret our findings as supporting evidence for the idea that both an L1 and L2 parser use a cue-based memory retrieval mechanism to compute subject-verb agreement.
For the purpose of the present study, the presence of a grammatical asymmetry would not only help adjudicate between the retrieval account and a representational account, but it would also indicate whether a similar mechanism is used to compute agreement in the L2. If non-native speakers show the grammatical asymmetry, it would suggest that learners who had to newly acquire the number agreement feature use a native-like cue-based retrieval mechanism to compute agreement in the L2.
1 Although some studies have reported attraction effects with grammatical sentences (Jiang, 2004; Kwon & Sturt, 2016; Pearlmutter et al., 1999), it has been proposed by others that these cases reflect spill-over effects which arise from the processing cost differences between singular and plural nouns, instead of retrieval processes (Wagers et al., 2009).

Agreement attraction in L2 sentence processing
Most of the previous studies on agreement attraction in L2 sentence comprehension have investigated learners whose first language also has agreement (e.g., Jegerski, 2016;Lago & Felser, 2018;Tanner, 2011). These studies tend to show that L2 learners are also susceptible to agreement attraction, suggesting that the same cue-based memory retrieval mechanism is involved in processing subject-verb agreement in a non-native language. For example, Tanner (2011) showed that Spanish-English bilinguals' behavioral judgments and ERP responses revealed native-like attraction effects regardless of a structural manipulation. Native-like L2 attraction effects have also been found with English-Spanish bilinguals (Jegerski, 2016) and Russian-German bilinguals (Lago & Felser, 2018). However, it remains unclear whether learners whose first language lacks the agreement feature that is required in L2 processing also show susceptibility to agreement attraction, given the sparse existing literature. Two studies that specifically investigated the agreement attraction phenomenon in L2 real-time comprehension and which included comparisons between grammatical and ungrammatical sentences with and without number-matching attractors are Lim and Christianson (2015) and Schlueter et al. (2019). They present conflicting results. Lim and Christianson (2015) examined number attraction effects in native English speakers and highly proficient Korean-English bilinguals using eye-tracking.
(2) a. The teachers who instructed the students were very strict.
b. The teachers who instructed the student were very strict.
c. *The teacher who instructed the student were very strict.
d. *The teacher who instructed the students were very strict.
The results showed that both groups fixated longer and regressed more when reading ungrammatical sentences with subject-verb mismatches (2c, d) than reading grammatical sentences (2a, b), indicating that they were sensitive to agreement errors. Critically, both groups made shorter fixations and fewer regressions on the verb and one word following the verb in sentences that contained number-matching attractors (2d) compared to those that did not have attractors (2c), suggesting facilitation of processing agreement violations when a number-matching attractor was present. Similar attraction effects were replicated in another study, where Korean-English bilinguals' eye-fixations revealed interference from attractors, regardless of animacy manipulations (Chung & Nam, 2019). Only a quantitative difference between the native and non-native speakers was observed in Lim and Christianson (2015): the Korean-English bilinguals exhibited a delay in showing the effects, indicating that relatively more time is needed to process and register information in the L2. In contrast to Lim and Christianson (2015), Schlueter and her colleagues (2019) did not find an attraction effect with highly proficient Chinese-English bilinguals.
(3) a. The owner of the expensive car has been drinking a lot.
b. *The owner of the expensive car have been drinking a lot.
c. The owner of the expensive cars has been drinking a lot.
d. *The owner of the expensive cars have been drinking a lot.
In both a speeded acceptability judgment task and a self-paced reading task, the bilinguals demonstrated sensitivity to the subject-verb number mismatch (3a vs. 3b; 3c vs. 3d) but showed no significant difference in performance between the singular attractor and plural attractor conditions (3a vs. 3c; 3b vs. 3d). However, the reading time data did not reveal attraction with the native English speakers either, which was unexpected considering that it is a reliably observed effect with native speakers. As the authors mentioned, this makes it difficult to draw strong conclusions from the Chinese speakers' performance in the self-paced reading task. Nevertheless, their judgment data showed clear contrasts between native and non-native speakers regarding the attraction effect: the non-native speakers were not susceptible to agreement attraction, in contrast to the L2 attraction effect found in Lim and Christianson (2015). At this point, it is important to note that the main issue at hand is not a question of whether or not L2 learners who had to newly acquire the L2 agreement feature are sensitive to agreement violations in online sentence comprehension, since nonnative speakers in both Lim and Christianson (2015) and Schlueter et al. (2019) showed sensitivity to agreement violations, even though the findings diverged in whether that sensitivity was modulated by the presence of number-matching attractors. While the question of L2 learners' sensitivity to agreement has been extensively investigated in the literature (e.g., Chen et al., 2007; Coughlin & Tremblay, 2013; Foote, 2011; Hopp, 2006; Keating, 2009; Lago & Felser, 2018; Jiang, 2004, 2007; Jiang et al., 2011; Sagarra & Ellis, 2013; Song, 2015; Tanner, 2011; Tanner et al., 2012), the focus of the present study was whether learners who have already achieved high L2 performance and are able to compute agreement in real-time use the same kind of mechanism to do so as native speakers.
We therefore examined L2 agreement attraction with highly proficient bilinguals in order to avoid the risk of measuring effects that are due to less developed proficiency rather than their processing mechanisms (Keating, 2017).
Resolving the conflict between the two earlier works is important because they support different theories on L2 agreement processing, either claiming a qualitative difference between L1 and L2 processing or supporting a common system with potentially quantitative differences. Lim and Christianson's (2015) data supports the idea that there is only a quantitative difference, as the native and non-native speakers showed similar attraction effects, with the non-native speakers only exhibiting a delay in the effect. Conversely, Schlueter et al. (2019) proposed that Chinese-English bilinguals process agreement in a qualitatively different way compared to native speakers, in that they only use the retrieval cue that is available in both L1 and L2 (i.e., structural cue [+subject]) and not the newly acquired cue which is specific to L2 (i.e., number cue [+plural]). Since the learners do not use the [+plural] cue, the plural attractor does not interfere with the search for the head noun, which makes the learners actually outperform native speakers who use both types of cues and become susceptible to attraction. This suggests that there is a qualitative difference in the way native and non-native speakers compute agreement, whereas the former account proposes that there is only a quantitative difference. There is also an independently motivated view which predicts L2 learners to be more prone to agreement attraction or other kinds of similarity-based interference than native speakers (Cunnings, 2017). While these accounts differ in their predictions, the conflicting findings from Lim and Christianson (2015) and Schlueter et al. (2019) cannot provide clear answers, given that the two studies differed in many ways.

Eun-Kyoung Rosa Lee and Colin Phillips
One critical contrast between the two studies was that Lim and Christianson (2015) used sentences with relative clause (RC) subject modifiers (e.g., The teacher that instructed the student was very strict.), while Schlueter et al. (2019) used prepositional phrase (PP) modifiers (e.g., The owner of the expensive car has been drinking a lot.). The fact that L2 agreement attraction was found in the former study with RCs and not in the latter with PPs is surprising, given that native speakers reliably show attraction effects with both types of structures in comprehension (Tanner, 2011) and sometimes even show a lack of attraction with RCs (Parker & An, 2018), or the opposite pattern, in production (e.g., Bock & Miller, 1991; Bock & Cutting, 1992). Even bilinguals whose L1 has grammatical agreement showed attraction effects with both RC and PP modifiers under the same experimental condition (e.g., Spanish-English bilinguals; Tanner, 2011). Therefore, the contrast between Lim and Christianson (2015) and Schlueter et al. (2019) calls for further examination.
The two previous studies targeted different L1 speakers (Korean vs. Chinese), used different methods (eye-tracking vs. behavioral judgments), and tested different conditions (singular vs. plural head nouns), which makes it difficult to draw strong conclusions about the contribution of the difference between PPs and RCs to L2 agreement attraction. In the present study, we directly compared the two structures in the same within-subjects design with the same group of participants while controlling for other variables. We set out to investigate whether advanced Korean-English bilinguals whose L1 does not have subject-verb number agreement are prone to number attraction and whether this is modulated by sentence structure. In Experiment 1, we used a speeded acceptability judgment task to examine the native and non-native speakers' real-time judgments of sentences with and without number-matching attractors and searched for an attraction effect in the two structures, PPs and RCs. Based on the results from Experiment 1, we used a modified paradigm in Experiment 2, a speeded forced-choice judgment task, to probe more immediate judgments and to collect reaction times as an additional measure, in order to examine potential attraction effects that may not have been detected in the judgments.

Experiment 1
The goal of Experiment 1 was to test whether highly proficient bilinguals would show similar or different agreement attraction effects between sentences with PP and RC modifiers in the same experimental design. The main data of interest was the presence of an attraction effect (differences between sentences with number-matching attractors and those without), in the PP and RC conditions, as well as the presence of the grammatical asymmetry, i.e., an interaction between grammaticality and attraction, with larger attraction effects in the ungrammatical condition.
The potential outcomes and their implications were as follows: if Korean-English bilinguals do not use a native-like mechanism to compute subject-verb agreement in the L2, they would not show attraction effects with PPs and RCs, unlike native speakers. If Korean-English bilinguals use a native-like mechanism to compute subject-verb agreement in the L2, they would show attraction effects with both PPs and RCs, like native speakers.

Participants
Participants were 41 native Korean speakers (28 females, mean age = 24) attending Seoul National University in Korea who were highly proficient in English, as evidenced by their high standardized English proficiency test scores (mean iBT TOEFL score = 112 out of 120) 2. In addition, a control group of 23 native English speakers (10 females, mean age = 19.31) was recruited 3. They were all undergraduate students enrolled at the University of Maryland. No participant reported having any visual impairments.

Materials
For the speeded acceptability judgment task, 48 sets of target items were constructed in a 2 x 2 x 2 design, manipulating modifier structure (PP vs. RC), grammaticality (grammatical vs. ungrammatical), and attractor (singular attractor vs. plural attractor). Thus, for every target sentence, eight (2 x 2 x 2) different versions were created. Half included PP modifiers while the other half included RC modifiers, and for each modifier type, half were grammatical and the other half were ungrammatical, each with and without a verb-matching attractor. The eight conditions and example sentences are presented in Table 1.
Most of the items were adapted from Tanner (2011), where careful measures were taken to balance the items between the PP and RC conditions, such as inserting an adjective at the end of PP modifiers to control the distance between the subject and verb in both conditions. Verbs with strong transitive biases were specifically chosen to ensure that the attractors in the RC condition would be parsed as the object of the verb in the relative clause and to prevent garden-path effects (Tanner, 2011). Target sentences included only singular head nouns, in order to avoid complications with the markedness effect (Wagers et al., 2009), while the distractor sentences contained both singular and plural subjects. The present and past tense forms of the be-verb (i.e., is/are, was/were) were used.
Each participant saw six sentences from each condition, resulting in 48 target items in total, half grammatical and half ungrammatical. In addition, 48 distractor items were created, which included manipulations of pronoun gender, past and future tense, and third person -s. These were also half grammatical and half ungrammatical. In total, each participant saw 96 sentences, plus the four practice items in the beginning of the experiment.
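The counts above follow directly from the design, as the following Python sketch shows. How item sets were distributed across presentation lists is not stated here; the rotation below assumes a standard Latin-square scheme, and presentation_list is a hypothetical helper name.

```python
from collections import Counter
from itertools import product

STRUCTURES = ("PP", "RC")
GRAMMATICALITY = ("grammatical", "ungrammatical")
ATTRACTORS = ("singular", "plural")

# 2 x 2 x 2 = 8 conditions
CONDITIONS = list(product(STRUCTURES, GRAMMATICALITY, ATTRACTORS))
N_ITEM_SETS = 48


def presentation_list(list_index):
    """Map each item set to one condition by rotation (assumed Latin square)."""
    return {
        item: CONDITIONS[(item + list_index) % len(CONDITIONS)]
        for item in range(N_ITEM_SETS)
    }


per_condition = Counter(presentation_list(0).values())
print(len(CONDITIONS))              # 8
print(set(per_condition.values()))  # {6}
```

With 48 item sets rotated through eight conditions, every list contains six targets per condition, and each item set appears in each condition exactly once across the eight lists.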

Procedure
The experiment was conducted in a quiet lab, where participants were seated in front of a computer monitor and keyboard. Participants provided written consent and then were given instructions about the experimental procedure. The speeded acceptability judgment task was presented using PsychoPy (Peirce, 2007). Experimental sentences were presented word-by-word on the computer screen and participants were asked to judge whether or not each sentence was acceptable in English. Each sentence was preceded by a fixation mark that was presented for 1000 ms for the native speaker control group and 1500 ms for the L2 group. After the fixation mark, the words were presented one at a time on the screen, 400 ms per word for native speakers and 500 ms per word for nonnative speakers. The speed at which the fixation mark and the words were presented was different for the two groups, taking into account that delays in reading time are often observed in L2 processing 4. At the end of every sentence, the question, "Was that acceptable?" appeared, and participants had to press either the F or J key on the keyboard to respond "yes" or "no," respectively, as quickly as possible. The entire experiment session took no more than 30 minutes, and participants were compensated at the end of the experiment.
2 Their mean self-rated proficiency scores on a scale of 1 (low) to 10 (high) were as follows: reading (M = 8.22, SD = 1.08), listening (M = 7.27, SD = 1.72), speaking (M = 6.32, SD = 2.02), writing (M = 6.59, SD = 1.69).

Analysis
Responses exceeding three seconds for native speakers and four seconds for non-native speakers were removed from analysis (6.16% of data from native speakers, 5.34% of data from nonnative speakers). Statistical analyses were conducted by constructing generalized linear mixed effects logit models for acceptance rates (Jaeger, 2008), in the R computing environment (R Core Team, 2017) using the "lme4" package (Bates, Maechler, Bolker & Walker, 2015) and the "bobyqa" optimizer (Powell, 2009). The models for each group included Grammaticality (G = grammatical, U = ungrammatical), Attractor (S = singular attractor, P = plural attractor), and Structure (PP = prepositional phrase, RC = relative clause) as fixed effects, and post-hoc models for each modifier structure were built when there was a three-way interaction. We analyzed the data from the two groups in separate models in order to ensure comparability with prior studies on attraction in single groups. This also avoided the difficulty of interpreting the presence or absence of a 4-way interaction. The random effects structures were initially maximally specified, and random slopes were progressively dropped until the models converged (Barr, Levy, Scheepers & Tily, 2013). The resulting random effects structures included by-subject and by-item random intercepts as well as a by-subject random slope for Grammaticality, unless noted otherwise. All reported models represent the final converged models. All factors were deviation coded as follows: Grammaticality (G = .5, U = −.5), Attractor (N = .5, A = −.5), and Structure (PP = .5, RC = −.5). The statistical significance of main effects and interactions was judged based on p-values (p < .05). Figure 1 represents participants' mean acceptance rates (proportion of accepted sentences), and Table 2 summarizes the results of the logit models on acceptance rates.
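To illustrate what the ±.5 deviation coding implies for the Grammaticality x Attractor term, the following Python sketch recovers the coefficients of a saturated 2 x 2 logit model from cell proportions. It is a simplified illustration, not the analysis reported here: it has fixed effects only, omits Structure and all random effects, and uses invented acceptance proportions. The function names are hypothetical, and the two attractor levels are referred to only by their codes.

```python
import math


def logit(p):
    """Log-odds of a proportion."""
    return math.log(p / (1 - p))


# Invented acceptance proportions (NOT the study's data), keyed by
# (Grammaticality code, Attractor code) under the +/-.5 deviation coding.
cells = {
    (0.5, 0.5): 0.90,
    (0.5, -0.5): 0.85,
    (-0.5, 0.5): 0.15,
    (-0.5, -0.5): 0.40,
}


def coefficients(cells):
    """Solve the saturated model logit = b0 + bG*g + bA*a + bGA*g*a."""
    m = {k: logit(p) for k, p in cells.items()}
    b0 = sum(m.values()) / 4
    # Main effects: difference between levels, averaged over the other factor.
    b_gram = ((m[0.5, 0.5] - m[-0.5, 0.5]) + (m[0.5, -0.5] - m[-0.5, -0.5])) / 2
    b_attr = ((m[0.5, 0.5] - m[0.5, -0.5]) + (m[-0.5, 0.5] - m[-0.5, -0.5])) / 2
    # Interaction: difference between the two attractor differences.
    b_inter = (m[0.5, 0.5] - m[0.5, -0.5]) - (m[-0.5, 0.5] - m[-0.5, -0.5])
    return b0, b_gram, b_attr, b_inter


def predict(coefs, g, a):
    """Predicted log-odds for a cell with codes g and a."""
    b0, b_gram, b_attr, b_inter = coefs
    return b0 + b_gram * g + b_attr * a + b_inter * g * a


coefs = coefficients(cells)
for (g, a), p in cells.items():
    print(g, a, round(predict(coefs, g, a), 3), round(logit(p), 3))
```

With this coding, each main-effect coefficient is the difference between the two levels of a factor averaged over the other factor, and the interaction coefficient is the difference between the attractor effects in the two grammaticality conditions, which is the quantity that indexes the grammatical asymmetry.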

Results
The model for the native speakers' acceptance rates revealed a significant main effect of Grammaticality and a Grammaticality x Attractor interaction (both p < .001), indicating the commonly observed grammatical asymmetry, in which plural attractors increase the acceptance of ungrammatical sentences more than they lower the acceptance of grammatical sentences (p = .01 for grammatical sentences; p < .001 for ungrammatical sentences). This attraction effect did not interact with Structure (p = .42), indicating that the same pattern applied to both PP and RC modifiers.
Similarly, the non-native speakers' acceptance rates showed a main effect of Grammaticality and a Grammaticality x Attractor interaction (both p < .001), revealing the same grammatical asymmetry found with the native speakers. However, unlike with the native speakers, there was a significant three-way interaction between Grammaticality, Attractor, and Structure ( p < .001) for the non-native speakers, suggesting that the attraction effect differed between the PP and RC conditions. Separate post-hoc models were built for each of the two structures. The model for the non-native speakers' PP condition revealed the expected Grammaticality main effect, where non-native speakers accepted grammatical sentences more than ungrammatical sentences (est. = 5.16, SE = 0.45, z = 11.35, p < .001), while there was no significant Grammaticality x Attractor interaction (est. = −0.19, SE = 0.49, z = −0.40, p = .69). In contrast, the model for the nonnative speakers' RC condition revealed a Grammaticality main effect (est. = 5.36, SE = 0.51, z = 10.56, p < .001) and a significant Grammaticality x Attractor interaction (est. = −2.13, SE = 0.53, z = −4.04, p < .001), with patterns resembling native speakers' (i.e., higher acceptance rates for grammatical items than ungrammatical items, and higher acceptance rates for ungrammatical sentences with plural attractors than with singular attractors). In other words, the Korean-English bilinguals showed an attraction effect only with items containing RC modifiers and not with sentences containing PP modifiers.

Discussion
The results of the speeded acceptability judgment task in Experiment 1 revealed a divergence between native and nonnative speakers' susceptibility to agreement attraction: the native speakers showed comparable attraction between PP and RC conditions, while the Korean-English bilinguals showed an attraction effect only with the RC modifier. There was an interaction between grammaticality and attractor type, indicating the grammatical asymmetry expected under the cue-based memory retrieval model. For the non-native speakers, this interacted with modifier structure, and subsequent analyses indicated an

Plural attractor
The artist with the tall sculptures is very talented.

Ungrammatical Singular attractor
The artist with the tall sculpture are very talented.

Plural attractor
The artist with the tall sculptures are very talented.

RC Grammatical Singular attractor
The artist who made the sculpture is very talented.

Plural attractor
The artist who made the sculptures is very talented.

Singular attractor
The artist who made the sculpture are very talented.

Plural attractor
The artist who made the sculptures are very talented.

4
The decision to give the learners a slower presentation rate was based on a pilot study where some learners reported that the presentation was too fast to process the sentences. Even when we matched the presentation rate between native and non-native speakers in the second experiment, we replicated the contrast between the two groups, which indicates that the group differences are unlikely to be due to different presentation rates.

Eun-Kyoung Rosa Lee and Colin Phillips
attraction effect with sentences containing an RC modifier and not with sentences containing a PP modifier. These results overall replicated the contrast found between the two previous studies: Schlueter et al. (2019) did not find L2 attraction with PPs, while Lim and Christianson (2015) found attraction with RCs. It had been difficult to determine the cause of the divergence between the earlier findings because the studies used different experimental designs, different participant backgrounds, and different structures. Our results indicate that, of those factors, structure (PP vs. RC) was a likely source of the diverging findings, and they show that the same contrast can be replicated in a within-subjects design when other factors are controlled for. Thus, agreement attraction with RCs and no attraction with PPs appears to be a systematic contrast found with non-native speakers whose L1 lacks number agreement.
The structural contrast found in L2 attraction challenges previous accounts that predict either parallel attraction effects across the different types of structures (Lim & Christianson, 2015;Schlueter et al., 2019) or stronger attraction effects in L2 than in L1 processing (Cunnings, 2017). However, the contrast leaves open the question of whether it is the PPs or the RCs that count as the exceptional case. Under the account that learners do not use the plural cue that is not available in their L1 (Schlueter et al., 2019), the RCs seem to be the exceptional case, where attraction is unexpectedly observed. In contrast, if learners use all available cues in the L2 (Lim & Christianson, 2015), then the PPs are the unusual case, where learners seem to do better than native speakers in avoiding attraction.
Although the results from Experiment 1 confirmed the structural contrast observed across the two previous studies, it was necessary to replicate the same contrast in a more direct measure that reflects real-time retrieval processes right at the verb. While the speeded acceptability judgment task used in Experiment 1 has been widely used to observe agreement attraction effects in previous studies, a limitation is that it reflects end-of-sentence judgments, which makes it difficult to observe responses at the verb when subject-verb agreement checking occurs. Therefore, in Experiment 2, we examined attraction effects using a speeded forced-choice comprehension task (Hammerly et al., 2019). The modified paradigm allowed us to: 1) directly probe decision processes at the verb when retrieval occurs, and more importantly, 2) examine any hints of interference from number-matching attractors even in trials where participants made correct judgments, by measuring the time it took to make those judgments. If there was any interference from attractors, we expected it to appear as a delay in reaction times, even if it did not modulate the actual judgments. This additional reaction time measurement could also help determine whether it is the PPs or RCs that should be treated as the exceptional case where non-native speakers' attraction effect deviates from that of native speakers.

Experiment 2
Method
Participants
A total of 38 native Korean speakers attending Seoul National University in Korea (25 females, mean age = 25) participated in Experiment 2 (see footnote 5). They were all advanced learners of English, who began learning English after age 6 and had high standardized English proficiency test scores (mean iBT TOEFL score = 111 out of 120). For the control group, 36 native English speakers (10 females, mean age = 32) were recruited through Amazon Mechanical Turk. None of the participants had taken part in Experiment 1. No participant reported having any visual impairments.

Materials
The same stimuli used in Experiment 1 were used to create the preambles for the speeded forced-choice comprehension task in Experiment 2. The critical sentences from Experiment 1 were cut off after the verb, such that the preamble consisted of the subject phrase (e.g., The key to the brown cabinet/cabinets) followed by the verb as its last word, which served as the target word (e.g., was/were) and either agreed with the subject in grammatical number or caused an agreement violation. The same distractor items were used, edited such that the last word of the preamble determined the grammaticality of the sentence (e.g., a reflexive either matching the grammatical gender of the subject or not). Since we used the same items as in Experiment 1, the types of errors participants had to judge were the same as in the previous experiment. Because the forced-choice task drew more attention to the target verb, which determined acceptability, a portion of the items (23%) were presented as full sentences rather than preambles and were followed by comprehension questions about the content of the sentence (i.e., "comprehension trials"), to prevent participants from focusing on the types of errors included in the task.

Procedure
All experimental procedures for Experiment 2 were conducted online because of changing circumstances during the COVID-19 pandemic. The speeded forced-choice comprehension task was presented using PCIbex (Zehr & Schwarz, 2018). Participants gave written consent and were given written instructions for the experiment. Sentence preambles were presented on the screen word-by-word in RSVP. For each trial, a fixation cross appeared for 1500 ms, followed by the words of the sentence, which were presented for 400 ms each. Participants were asked to read the sentence as the words appeared on the screen and, when a word appeared in green, to judge whether that word was a good continuation of the sentence by pressing the F or J key to respond yes or no, respectively. For the few comprehension trials, the sentence continued after the participant responded to the green word, a comprehension question then appeared at the end of the sentence, and participants responded using the same yes/no keys.
Participants were asked to respond as quickly as possible, within three seconds. After the time limit, a warning message appeared, and the screen moved on to the next trial. Negative feedback was provided for incorrect responses to the comprehension questions. The entire experimental session took no more than 30 minutes, and participants were compensated at the end of the experiment.
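The trial timing described above can be laid out as a simple schedule. The sketch below is illustrative only and is not the PCIbex script used in the study: it assumes no blank interval between words and that the three-second response limit is timed from the onset of the target word, neither of which is stated in the text. The function names are hypothetical.

```python
FIXATION_MS = 1500   # fixation cross duration, from the procedure
WORD_MS = 400        # per-word presentation duration, from the procedure
DEADLINE_MS = 3000   # response time limit, from the procedure

def rsvp_schedule(words):
    """Return (onset_ms, item) pairs for the fixation cross and each word.
    Assumes words follow one another with no blank interval (assumption)."""
    schedule = [(0, "+")]
    for i, word in enumerate(words):
        schedule.append((FIXATION_MS + i * WORD_MS, word))
    return schedule

def response_window(words):
    """Return (open_ms, close_ms) for the judgment on the final (target) word.
    Assumes the deadline is timed from the target's onset (assumption)."""
    target_onset = FIXATION_MS + (len(words) - 1) * WORD_MS
    return target_onset, target_onset + DEADLINE_MS

preamble = "The key to the brown cabinets were".split()
for onset, item in rsvp_schedule(preamble):
    print(onset, item)
print(response_window(preamble))
```

For this seven-word preamble, the target verb appears 3900 ms after trial onset under the stated assumptions.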

Analysis
Acceptance rates were analyzed using the same procedures as in Experiment 1. Reaction times for correctly answered trials were analyzed using linear mixed effects models (Baayen, Davidson & Bates, 2008), following the same procedures as for the acceptance rates. The statistical significance of main effects and interactions for reaction times was judged based on t-values (|t| > 2) (Gelman & Hill, 2007).
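The |t| > 2 criterion can be made concrete with the t-values reported for the Experiment 2 reaction times. The sketch below only applies the threshold to those reported values; it does not refit any model. The normal approximation at the end is an added assumption for orientation, and the variable names are hypothetical.

```python
import math

T_THRESHOLD = 2.0  # criterion stated in the Analysis section

# t-values as reported in the text for the reaction time models
reported_t = {
    ("native", "Grammaticality"): -2.45,
    ("native", "Attractor"): 3.85,
    ("non-native", "Grammaticality"): -3.16,
    ("non-native", "Attractor"): 3.10,
}

significant = {key: abs(t) > T_THRESHOLD for key, t in reported_t.items()}

# Under a standard normal approximation (assumption), |t| = 2 corresponds
# to a two-sided p-value of roughly .05.
approx_p_at_threshold = math.erfc(T_THRESHOLD / math.sqrt(2))

for (group, effect), flag in significant.items():
    print(f"{group:10s} {effect:15s} significant: {flag}")
print(f"approximate p at |t| = 2: {approx_p_at_threshold:.4f}")
```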

Acceptance rates
The mean acceptance rates are shown in Figure 2, and the results of the logit models on acceptance rates are presented in Table 3.
For the native speakers, there was a main effect of Grammaticality (p < .001) and Attractor (p = .01), where acceptance rates were higher in the grammatical items than the ungrammatical ones and in sentences with singular attractors than those with plural attractors. There was also a Grammaticality x Attractor interaction (p < .001), where the presence of a number-matching attractor affected the ungrammatical conditions more than the grammatical ones (p > .05 for grammatical sentences; p = .02 for ungrammatical sentences), indicating a standard grammatical asymmetry.
The non-native speaker group also showed a main effect of Grammaticality and Attractor, as well as the Grammaticality x Attractor interaction (all p < .001), revealing similar patterns to the native group's. A critical difference, however, was that the non-native speaker group also showed an Attractor x Structure interaction (p = .04), indicating a greater effect of the attractor in the RC structure than the PP structure. The three-way Grammaticality x Attractor x Structure interaction was marginally significant (p = .06). Subsequent models for the PP and RC conditions revealed that the Grammaticality x Attractor interaction was present only in the RC structure (p < .001) and absent in the PP structure (p = .33). The results overall suggest greater attraction effects in the RC structures compared to PPs in the non-native speaker group, similar to Experiment 1.
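To make the notion of a Grammaticality x Attractor interaction concrete, it can be thought of as a difference of differences on the log-odds scale. The sketch below uses HYPOTHETICAL acceptance proportions chosen only to mimic a grammatical asymmetry; they are not the study's data (which are reported in Figure 2 and Table 3), and the sign of a fitted interaction coefficient depends on the contrast coding used.

```python
import math

def logit(p):
    """Log-odds of a proportion p (0 < p < 1)."""
    return math.log(p / (1 - p))

# HYPOTHETICAL acceptance proportions, for illustration only
rates = {
    ("grammatical", "singular"): 0.95,
    ("grammatical", "plural"): 0.93,
    ("ungrammatical", "singular"): 0.10,
    ("ungrammatical", "plural"): 0.30,
}

def attractor_effect(grammaticality):
    """Plural-minus-singular attractor difference in log-odds of acceptance."""
    return (logit(rates[(grammaticality, "plural")])
            - logit(rates[(grammaticality, "singular")]))

effect_gram = attractor_effect("grammatical")
effect_ungram = attractor_effect("ungrammatical")

# The interaction is the difference between the two attractor effects.
interaction = effect_ungram - effect_gram

print(f"attractor effect, grammatical:   {effect_gram:.2f}")
print(f"attractor effect, ungrammatical: {effect_ungram:.2f}")
print(f"difference of differences:       {interaction:.2f}")
```

With these invented numbers, the plural attractor raises acceptance of ungrammatical sentences far more than it changes acceptance of grammatical ones, which is the shape of the asymmetry described above.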

Reaction times
The mean reaction times for correct trials are presented in Figure 3, and the results of the linear mixed effects models are shown in Table 4.
In the native speaker group, there was a main effect of Grammaticality (t = −2.45) and a main effect of Attractor (t = 3.85), where native speakers were faster to judge the grammatical sentences than the ungrammatical ones and the sentences with singular attractors than with plural attractors. While the non-native speakers overall showed slower reaction times compared to the native speakers, they exhibited the same pattern of attraction: a main effect of Grammaticality (t = −3.16) and a main effect of Attractor (t = 3.10), with faster reaction times for grammatical sentences and for sentences with singular attractors. These effects did not interact with each other or with Structure, in either group (|t| < 2).

5
Two participants were removed from analysis because they did not fit the recruitment criteria.


Discussion
The judgment accuracy in the speeded forced-choice comprehension task in Experiment 2 replicated the pattern found in Experiment 1 and the contrast between previous studies on L2 agreement attraction (Lim & Christianson, 2015;Schlueter et al., 2019). Native English speakers were susceptible to interference from a plural attractor, regardless of the structure, whereas Korean-English bilinguals, who do not have subject-verb number agreement in their L1, showed attraction with RCs but not with PPs. The interaction between attractor and grammaticality indicated the attraction effect with the grammatical asymmetry in both groups, while the effect of modifier structure only appeared with the non-native speakers, through an interaction between attractor and structure and a marginally significant three-way interaction of these two factors with grammaticality. While the three-way interaction found in Experiment 1 turned out marginally significant in Experiment 2, the significant interaction between attractor and structure suggests that the effect of the attractor was modulated by the structure the attractor was in (PP vs. RC).
The reaction time data revealed a different pattern from the judgments. In both the native and non-native speaker groups, participants were slower to correctly respond to trials in the plural attractor condition than in the singular attractor condition, regardless of grammaticality or modifier structure. This pattern diverges from the judgment data in two critical ways: an absence of the grammatical asymmetry (i.e., no difference between grammatical and ungrammatical conditions in the effect of attractors) and an absence of the structural asymmetry in L2 attraction (i.e., no difference between PPs and RCs).
The reaction times revealed a more general effect of the presence of a number-matching attractor than was found with acceptance rates, and importantly, they revealed an attraction effect in the non-native speakers' PP conditions, which did not show attraction in judgments. Taken together, the judgment and reaction time results indicate that overall, the non-native speakers exhibited similar patterns of attraction to native speakers, with the exception of the judgments for PPs, which did not show an attraction effect in either Experiment 1 or Experiment 2. We further discuss the implications in the General Discussion.
One may wonder why the native speakers showed relatively poor performance in the forced-choice task in Experiment 2, compared to the non-native speakers as well as the native speakers in Experiment 1. The two experiments differed in many ways aside from the task, including the data collection method and participant background. Unlike Experiment 1, which was conducted in person, all procedures in Experiment 2 were carried out online, including recruitment of the native control group. The native speakers were recruited through an online crowd-sourcing platform (i.e., Amazon Mechanical Turk), where the participant pool tends to show wide variation in factors such as age and level of education. The native speakers in Experiment 1, in contrast, were all undergraduate students at the same university in the US, representing a more homogeneous, highly educated group, as were the Korean speakers, who were recruited from a highly selective university in Korea. Moreover, in the forced-choice task in Experiment 2, the native speakers showed overall significantly faster reaction times than the non-native speakers, which suggests that the relatively poor performance could be a consequence of a speed-accuracy trade-off. A combination of these factors could have contributed to the greater variability and increased error rates in the native speakers' judgments in Experiment 2.

General discussion
The main goal of the study was to examine whether bilinguals with an L1 that does not have subject-verb number agreement are susceptible to number attraction in L2 agreement processing. While previous studies have found native-like attraction effects in bilinguals who speak languages with agreement (e.g., Jegerski, 2016;Lago & Felser, 2018;Tanner, 2011), there have been conflicting findings with learners who had to newly acquire the number agreement feature in the L2 (Chung & Nam, 2019;Lim & Christianson, 2015;Schlueter et al., 2019). Through two experiments, using a speeded acceptability judgment task and a speeded forced-choice comprehension task, we examined highly proficient Korean-English bilinguals' susceptibility to agreement attraction in L2 English using sentences containing different types of subject modifiers. The speeded judgments in Experiment 1 and 2 showed that the bilinguals were indeed prone to interference from plural attractors but in a selective way: attraction with RC subject modifiers but not with PP subject modifiers. This structural asymmetry was not observed with native speakers, who were affected by the presence of a number-matching attractor regardless of sentence structure. However, the reaction times in Experiment 2 revealed a general attraction effect in both groups with both PP and RC modifiers. A summary of the findings is presented in Table 5. In the following discussion, we first explain how the judgment data from Experiment 1 and 2 indicates a structural asymmetry in L2 agreement attraction, then discuss the divergence between the accuracy and reaction time measures from Experiment 2 and whether they represent different attraction effects, and finally provide some explanations for the structural asymmetry found with the bilinguals and the implications for existing theories of L2 agreement processing.

The main finding based on the judgment accuracies in Experiments 1 and 2 was the structural contrast found in the Korean speakers' susceptibility to agreement attraction. In the speeded acceptability judgment task in Experiment 1, both the native and non-native speakers made more judgment errors when an intervening noun matched the verb's number than when it mismatched, and this effect was found in the ungrammatical sentences only, representing the classic attraction effect with the grammatical asymmetry expected under a cue-based retrieval model (Lewis & Vasishth, 2005;Martin & McElree, 2009;McElree, 2000;McElree, Foraker & Dyer, 2003;Van Dyke & McElree, 2006). Critically, in the non-native speaker group, the attraction effect interacted with modifier structure, where attraction was found with RC modifiers but not with PP modifiers. The native controls did not show this structural contrast. A similar pattern emerged in the judgments in the speeded forced-choice comprehension task in Experiment 2, where both groups tended to erroneously accept ungrammatical sentences when there was a plural attractor, but only the Korean speakers showed an interaction with modifier structure, where the presence of the number-matching attractor affected judgments in the RC condition but not the PP condition. The judgment data from the two experiments together suggest that non-native speakers' susceptibility to number interference is modulated by sentence structure.
These results provide a potential explanation for the diverging findings between earlier works: Lim and Christianson (2015), which found L2 attraction with RC modifiers, and Schlueter et al. (2019), which did not find L2 attraction with PP modifiers. Given that the two previous studies used different tasks and tested different structures with participants from different L1 backgrounds, it was difficult to determine whether the contrasting findings were strictly due to the sentence structure or were an artifact of other experimental variables. In the present study, we directly compared the two structures, PPs and RCs, while controlling for other differences such as the distance between the subject and verb, in a within-subject design where the same participants made judgments for both structures, and we were able to replicate the contrast in two experiments. This suggests that the different patterns observed in the previous studies were due to the structure investigated, and the fact that native English speakers did not show the contrast between PPs and RCs in either of our experiments or in the earlier studies indicates that it is a unique pattern observed with L2 learners.
Unlike the judgment data, however, the reaction times from Experiment 2 revealed a general attraction effect, even in the one case where judgments did not reveal attraction, namely the non-native speakers' PP condition. The speeded forced-choice comprehension task in Experiment 2 made it possible to examine interference effects even when the participants were getting the judgments correct. Unlike acceptance rates, the reaction times for the correct trials showed a main effect of attractor in both groups, which was not modulated by whether the subject and verb agreed or not (i.e., no grammatical asymmetry) or whether the attractor was in a PP or RC (i.e., no structural asymmetry). Both the native and non-native speakers were relatively slow at judging sentences containing a plural attractor, indicating that even in trials where they made correct judgments, the presence of a number-matching attractor slowed those decisions.
The judgment and reaction time results together suggest that Korean-English bilinguals are indeed susceptible to number attraction in a similar way to native speakers, although the learners sometimes show immunity to attraction in one specific combination of tasks and structures (i.e., judgments for PP modifiers). This raises the question of why we see a divergence between judgments and reaction times, where we only see immunity to attraction in the Korean speakers' judgments and not in their reaction times. We break this into two questions. First, mechanistically, how can attraction impact judgment times but not the judgments themselves? Second, why should the Korean speakers behave differently in PP and RC conditions?
On the first question, we suggest that interference from a plural attractor can be high enough to influence reaction times but below the threshold for affecting judgments. That is, the reaction time measure could detect an interference effect that is too weak to impact final judgments. It may take longer to judge the acceptability of a plural verb given a singular subject and a plural attractor noun compared to when the nouns are both singular, but for the learners, this interference from the plural attractor can sometimes be successfully overcome at the end, resulting in similar judgment accuracies in the singular attractor and plural attractor conditions. Mechanistically, this contrast between measures is possible in an activation-based retrieval model such as ACT-R (Anderson & Lebiere, 1998;Lewis & Vasishth, 2005), where the presence of a partially matching item may delay retrieval times without ultimately changing the item that is retrieved as the best match to the retrieval cues.
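The idea that interference can slow a decision without changing its outcome can be illustrated with a deliberately simple toy model. The sketch below is not the ACT-R equations and is not fitted to any data: the match scores, the timing rule (decision time grows as the winner's margin over the runner-up shrinks), and all parameter values are hypothetical assumptions made only to show the qualitative pattern.

```python
def retrieve(match_scores, base_time=0.5, scale=0.1):
    """Toy retrieval rule (hypothetical, not ACT-R): the best-matching item
    always wins, and decision time grows as its margin over the runner-up
    shrinks. Returns (winner, decision_time_in_seconds)."""
    ranked = sorted(match_scores.items(), key=lambda item: item[1], reverse=True)
    (winner, best), (_, runner_up) = ranked[0], ranked[1]
    margin = best - runner_up
    if margin <= 0:
        raise ValueError("toy model assumes a unique best match")
    return winner, base_time + scale / margin

# HYPOTHETICAL match scores between each noun and the verb's retrieval cues
singular_attractor = {"head noun": 1.0, "attractor": 0.0}  # attractor matches no cue
plural_attractor = {"head noun": 1.0, "attractor": 0.6}    # attractor partially matches

winner_sg, time_sg = retrieve(singular_attractor)
winner_pl, time_pl = retrieve(plural_attractor)

print(winner_sg, round(time_sg, 3))
print(winner_pl, round(time_pl, 3))
```

In both cases the head noun is retrieved, so the judgment would be the same, but the decision is slower when the attractor partially matches. This mirrors the dissociation between judgments and reaction times described above without committing to a particular architecture.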
Second, why are the learners doing better than native speakers in the PP condition? We consider two possibilities. One is that the interference from a plural attractor is somehow weaker in a PP than in an RC due to its structural position. It may be the case that for L2 learners, having a number-matching noun in an oblique argument position (e.g., The key to the brown cabinets…) interferes less with subject-verb agreement processing than having an attractor in a core argument position (e.g., The key that opened the cabinets…). However, previous studies on agreement attraction with native speakers (mostly in production but also recently in comprehension) have found that native speakers tend to show the opposite effect, where being in a core argument position makes the attractor irrelevant to subject-verb agreement processing and, hence, sometimes does not create illusions (e.g., Parker & An, 2018). Thus, it is difficult to explain why, for L2 learners, having an attractor in a core argument position would make them more susceptible to attraction. Another potential reason why we see immunity to attraction in the Korean speakers' judgments is that the learners have additional control of the L2 that helps them overcome the interference and avoid judgment errors. This explanation may be supported by the fact that our Korean participants were all highly proficient bilingual speakers, highly educated, with substantial experience with the L2. It is possible that some combination of these factors allowed them to avoid interference in the particular measure we used; the proficient learners may have additional control over their L2 that native speakers do not have, and this may actually help them to resist interference from a number-matching attractor and correctly rule out agreement violations in their judgments, even though it slows down their reaction times.
Critically, these factors seem to come into play only when the sentence structure is simple, as with PPs, whereas more complex structures like RCs induce an attraction effect that is impossible to avoid or overcome. Processing an RC involves determining the boundary of the embedded clause, computing a filler-gap dependency, and figuring out its relation to the main clause, which, for non-native speakers, may demand greater cognitive resources, especially in real-time comprehension (Kim et al., 2015;Dussias & Guzzardo Tamargo, 2013). Previous studies have shown that structural manipulations modulate non-native speakers' ability to compute agreement more than they affect native speakers' processing (Keating, 2009;Lago & Felser, 2018;Song, 2015). Thus, the greater complexity of RCs could prevent learners from overcoming the interference from plural attractors, which they are able to do with a simpler structure.
The present findings contribute to the advancement of existing theories and accounts on L2 agreement processing in several respects. First, the robust attraction found with RCs, as well as the hidden attraction effect found with PPs in reaction times, suggests that non-native speakers whose L1 lacks subject-verb number agreement use a cue-based retrieval mechanism to compute number agreement in the L2, which makes them susceptible to similarity-based interference like native speakers (e.g., Lim & Christianson, 2015). The pattern of the learners' attraction effect revealed a grammatical asymmetry like native speakers' which is consistent with a cue-based memory retrieval mechanism in subject-verb agreement processing (Dillon et al., 2013;Lago et al., 2015;Wagers et al., 2009), whereas representational accounts do not predict this asymmetry (e.g., Eberhard et al., 2005;Franck et al., 2002;Pearlmutter et al., 1999). Although there is a possibility that the asymmetry arises from response biases rather than from the mechanism involved (Hammerly et al., 2019), it is questionable whether L2 learners would have similar response biases as native speakers to account for the similar attraction pattern observed in the present study. Moreover, even if response biases generalize across the two groups, it is questionable how a representational account can capture the structural contrast between PPs and RCs in the learners' attraction effects, as well as the divergence between the judgment and RT measures. It is unclear whether and why response biases would differ across different structures, and how this contrast only appears in the learners' judgment measures, specifically. The attraction effect found with the Korean-English bilinguals also challenges an account that claims that non-native speakers do not use the number cue that is absent in their L1 (e.g., Schlueter et al., 2019). 
However, the lack of an attraction effect in the PP judgments suggests that the non-native speakers' way of computing agreement allows them to avoid the illusion under certain circumstances. While it has previously been suggested that non-native speakers' inability to use all the available cues in the language helps them avoid attraction in judgments (Schlueter et al., 2019), our findings indicate that it may actually be something that the highly proficient learners have in addition to what native speakers have, such as greater language control, that prevents them from making judgment errors and leads to immunity to attraction in judgments. Finally, our results suggest that non-native speakers do not show greater susceptibility to similarity-based interference than native speakers (cf. Cunnings, 2017), but rather that they generally show attraction effects to a similar or even smaller extent, sometimes overcoming the interference, depending on sentence structure and the learners' L1 background and experience with the L2.
It should be noted that the present findings were based on data from a specific population of L2 learners: those whose L1 lacks subject-verb number agreement and who have high L2 proficiency. Although previous studies with learners whose L1 and L2 both have agreement did not find this structural contrast in attraction effects (e.g., Spanish-English bilinguals; Tanner, 2011), it remains an open question whether other structural manipulations that potentially change the relative activation levels of the subject and attractor noun influence the degree to which non-native speakers become susceptible to interference. It is also reasonable to expect that the pattern of findings in the present study extends to other cases where L2-specific computations are required, such as English-Spanish bilinguals' use of the gender cue when checking Spanish gender agreement (e.g., Shantz & Tanner, 2019). These questions remain to be investigated in future studies. Moreover, L2 participants with high English proficiency were recruited for the present study, because we had to ensure the learners were sensitive to agreement in the L2 in order to test whether they show susceptibility to agreement attraction. Also, exploring these questions with the most advanced learners helps avoid the risk of measuring performance differences that are simply due to underdeveloped proficiency (Keating, 2017). However, if the immunity to attraction effects in the judgment data is indeed due to the L2 learners' proficient control of the L2, it would be predicted that susceptibility to attraction will be more pronounced in less proficient speakers who may have had less experience with L2-specific computations like agreement, which could lead to increased errors in judgments or a lack of attraction due to less experience with the L2-specific number cue.
Future studies could explore how other L1 backgrounds and individual differences like proficiency modulate non-native speakers' susceptibility to interference effects. Finally, the divergence between judgment and reaction time measures found in the forced-choice task in Experiment 2 suggests that effects observed with one type of measure could sometimes be misleading, depending on how sensitive the measure is in detecting a specific effect. For example, the grammatical asymmetry that was reliably observed in the judgment measure in both of our experiments did not appear in reaction times. Given that the only previous study we know of which used a similar measure (RTs of correct forced-choice judgments) did find an asymmetry in RTs (Hammerly et al., 2019), it is difficult to explain the cause of the divergence. One possibility is that although the attractor affects the retrieval and judgment-making process in grammatical sentences, as reflected in RTs, the interference is not strong enough to affect final judgments, because in the grammatical sentences, the head noun matches more retrieval cues of the verb than the attractor, which only has a partial match, while in the ungrammatical sentences, both the head noun and attractor have partial matches with retrieval cues. We leave it to future work to explore this issue further, considering that different kinds of measurements may be sensitive to varying activation thresholds in agreement attraction.

Conclusion
This study examined Korean-English bilinguals' susceptibility to agreement attraction in L2 sentence processing, comparing sentences with PP and RC subject modifiers. The results of a speeded acceptability judgment task and a speeded forced-choice comprehension task indicated overall native-like attraction effects in the learners' judgment accuracies and reaction times, with one exception. The learners consistently showed immunity to attraction in their judgments with PPs, across two experiments in the present study as well as in previous work with a different L2 group (Schlueter et al., 2019). In all other measures, the learners' pattern of attraction closely resembled the effect found with native speakers, including the grammatical asymmetry. Together, the findings suggest that non-native speakers whose L1 does not have number agreement use a cue-based retrieval mechanism to compute subject-verb agreement like native speakers, which makes them susceptible to interference from number-matching attractors in the L2. However, the learners may show immunity to attraction in certain cases, as with simpler structures like PPs, when they are able to use their additional language control to overcome interference effects and make correct judgments.