EXAMINING THE CONTRIBUTION OF MARKEDNESS TO THE L2 PROCESSING OF SPANISH PERSON AGREEMENT

Abstract We used event‑related potentials to investigate how markedness impacts person agreement in English‑speaking learners of L2‑Spanish. Markedness was examined by probing agreement with both first‑person (marked) and third‑person (unmarked) subjects. Agreement was manipulated by crossing first‑person subjects with third‑person verbs and vice versa. Native speakers showed a P600 for both errors, larger for “first‑person subject + third‑person verb” violations. This aligns with claims that, when the first element in the dependency is marked (first person), the parser generates stronger predictions regarding upcoming agreeing elements using feature activation. Twenty‑two upper‑intermediate/advanced learners elicited a P600 across both errors. Learners were equally accurate detecting both errors, but the P600 was marginally reduced for “first‑person subject + third‑person verb” violations, suggesting that learners overused unmarked forms (third person) online. However, this asymmetry mainly characterized lower‑proficiency learners. Results suggest that markedness impacts L2 agreement without constraining it, although learners are less likely to use marked features top‑down.

The present study investigates the relationship between L2 inflectional variability and markedness, the observation that different feature values are asymmetrically represented. Morphological theory proposes that marked features are more complex, have more internal structure, or are less frequent than unmarked features. For person, the claim is that first and second person are marked, while third person is unmarked (e.g., Forchheimer, 1953; Harley & Ritter, 2002; Harris, 1995; McGinnis, 2005; Nevins, 2011). An additional claim is that only marked features are specified. For example, first and second person are specified as participants in the speech act, the speaker and addressee, respectively. Third person, in contrast, is underspecified (i.e., it lacks featural information) because it is a nonparticipant in the speech act. Third person is considered to be the "default" person, a sort of "elsewhere" form. Our study examines whether/how this asymmetry impacts L1-English L2-Spanish learners' processing of sentences like (1a-b), which differ with respect to person markedness. We build on previous observations that learners overuse "defaults," that is, forms that are unmarked/underspecified (although not necessarily uninflected) and thus appear in a wider range of contexts (e.g., third-person verbal inflection emerges with personless subjects, as in Correr relaja "Running relax-3RD-SG") (e.g., Hawkins, 2001, 2009; McCarthy, 2008; Prévost & White, 2000).
(1) a. Yo estudio.
I-1ST-SG study-1ST-SG
b. El médico estudia.
the doctor-3RD-SG study-3RD-SG

López Prego and Gabriele's proposal connecting markedness to facilitation originated in the psycholinguistics literature, where it has been argued that marked features remain longer in the focus of attention (e.g., Wagers & McElree, 2011; Wagers & Phillips, 2014) and are, thus, more likely to impact agreement operations. For example, ungrammatical strings such as the key to the cabinets *are cause less disruption when the attractor noun (cabinets) is plural (e.g., Acuña Fariña et al., 2014; Dillon et al., 2013; Pearlmutter et al., 1999; Wagers et al., 2009). Likewise, Carminati (2005) showed that coreference between a null pronoun and a structurally dispreferred antecedent is less disruptive when the disambiguating verb is inflected for first/second person (marked) relative to third person (unmarked). Crucially, Nevins et al. (2007) posited that markedness might determine whether agreement is established predictively. Their proposal is that, upon encountering a marked feature, the parser can generate a stronger prediction regarding upcoming agreement elements.
To summarize, certain L2 theoretical models (McCarthy, 2008, 2012; Prévost & White, 2000) posit that learners have difficulty accessing marked/specified forms when establishing agreement. Consequently, they incorrectly overextend unmarked/underspecified forms (i.e., defaults) to contexts that require marked ones, due either to computational pressure (e.g., the MSIH) or to representational issues (e.g., McCarthy, 2008). Alternatively, proposals from the psycholinguistics literature (e.g., Nevins et al., 2007; Wagers & Phillips, 2014) capitalize on the predictive value of marked features. Under this proposal, when the first element in the dependency is marked, feature activation allows the parser to better resolve agreement further down the line.

BACKGROUND OF THE PRESENT STUDY
In previous studies, we evaluated the previously mentioned proposals regarding how markedness impacts agreement resolution in both native (Alemán Bañón & Rothman, 2016, 2019) and nonnative speakers of Spanish (Alemán Bañón et al., 2017). In those studies, we examined online agreement resolution using event-related potentials (ERPs). In addition to providing high temporal resolution and being multidimensional, ERPs can unveil qualitative differences between different agreement dependencies. For example, Foucart and Frenck-Mestre (2012) found that the same L1-English L2-French learners elicited qualitatively different brain responses to gender errors (N400 vs. P600), depending on the syntactic configuration where they were realized. ERPs are, therefore, well suited for investigating both quantitative and qualitative differences between agreement dependencies that differ with respect to markedness.
In Alemán Bañón and Rothman (2019), we argued in favor of Nevins et al.'s proposal (2007) that markedness allows the parser to resolve agreement top-down, at least when the dependency is sufficiently constraining (e.g., Dillon et al., 2013). In that study, we examined subject-verb person agreement with both first-person (marked/specified: speaker; see 2) and third-person singular subjects (unmarked/underspecified: see 3) in a group of 28 native speakers of European Spanish. We then manipulated agreement by crossing first-person subjects with third-person verbs (2b) and vice versa (3b). Both violation types yielded a P600 (500-1,000 ms), a component associated with various morphosyntactic operations (e.g., Osterhout & Holcomb, 1992), consistent with previous studies on person agreement (e.g., Mancini et al., 2011, 2019; Nevins et al., 2007; Silva Pereyra & Carreiras, 2007; Zawiszewski et al., 2016).
(2) a. Yo a menudo acaricio a los caballos.
I-1ST-SG often pet-1ST-SG CASE the horses
b. Yo a menudo *acaricia a los caballos.
(3) a. El cartero a menudo acaricia a los gatos.
the postman-3RD-SG often pet-3RD-SG CASE the cats
b. El cartero a menudo *acaricio a los gatos.
the postman-3RD-SG often pet-1ST-SG CASE the cats

Violations with a first-person subject (2b) yielded a larger P600 (700-900 ms) than violations with a third-person subject (3b), which we interpreted as evidence that the parser can better resolve agreement when the first element in the dependency is marked (Nevins et al., 2007). Recall that the rationale behind this proposal is that, when the first element in the dependency is marked/specified for person, feature activation allows the parser to generate a stronger prediction regarding the upcoming verb. When this prediction is unmet, the result is a larger P600. This is in line with current proposals interpreting the P600 as an index of the reanalysis processes triggered by violations of top-down expectations (e.g., Bornkessel-Schlesewsky & Schlesewsky, 2008; Kuperberg, 2007; Tanner et al., 2017; van de Meerendonk et al., 2010). In contrast, our investigation of noun-adjective number and gender agreement with the same native speakers (Alemán Bañón & Rothman, 2016) failed to provide similar evidence. In that study, we probed agreement with both feminine and masculine nouns (corresponding to the marked/specified and unmarked/underspecified genders, respectively), which could be used in the plural or in the singular (marked/specified vs. unmarked/underspecified number values, respectively) (see an example of a sentence with a feminine singular noun in (4)).
(4) Carlos fotografió una catedral que parecía inmensa para una revista.
Carlos photographed a cathedral-FEM-SG that looked huge-FEM-SG for a magazine

Our results revealed an earlier P600 for number and gender violations realized on marked adjectives (example from the gender conditions: coche que parecía *cara "car-MASC that looked expensive-FEM") relative to the opposite error type (e.g., catedral que parecía *inmenso "cathedral-FEM that looked huge-MASC"). In addition, number violations realized on plural adjectives (e.g., coche que parecía *caros "car-SG that looked expensive-PL") yielded a larger P600 than the reverse error type (e.g., coches que parecían *caro "car-PL that looked expensive-SG"). In sum, native speakers were sensitive to number/gender markedness (like the Spanish natives under stress in López Prego and Gabriele's study, 2014, and in line with McCarthy's predictions for L2ers), but we found no evidence that the marked status of the first element in the dependency (i.e., the noun) facilitated agreement. Possibly, the structure where we examined number/gender agreement was not sufficiently constraining to allow the parser to generate predictions regarding upcoming adjectives because continuations other than an adjective were possible (e.g., una calle que parecía zigzaguear "a street that seemed to zigzag"). The same is not true of the subject-verb agreement manipulation in (2-3), where the subject made it certain that a verb would appear to satisfy the sentence-building phrase structure rule (e.g., Chomsky, 1957, 1995).

In Alemán Bañón et al. (2017), we extended the examination of noun-adjective number and gender agreement to 22 upper-intermediate/advanced L1-English L2-Spanish learners, and found qualitatively similar results to the native controls. That is, the learners showed a P600 across both types of gender violations, with an earlier onset for gender violations realized on marked/specified (feminine) adjectives. In addition, they elicited a P600 across both number violation types, marginally larger for errors on marked/specified (plural) adjectives. Thus, similar to the native controls, the L2ers were sensitive to the marked status of the adjective, as opposed to the marked status of the noun (i.e., the first element in the dependency). Importantly, this sensitivity was qualitatively nativelike, which does not align with representational accounts of variability.
Herein, we investigate how the same learners resolve subject-verb person dependencies that differ with respect to markedness (first person: marked/specified as speaker; third person: unmarked/underspecified). We will examine whether learners overuse defaults (i.e., third-person verbal inflection), as predicted by L2 theories of morphological variability. We will further investigate whether this asymmetry is computational (e.g., Prévost & White, 2000) or representational (McCarthy, 2008). We will do so by examining the learners' brain responses to the violating verbs (i.e., the point when the dependency is established) and their accuracy with both dependencies in the untimed GJT at the end of each sentence.
We will also examine the extent to which proficiency explains this potential overreliance on default morphology. Recall that, in McCarthy's study (2012), intermediate learners outperformed low-intermediate learners with first-person verbs, but not with third-person verbs, suggesting that marked forms (i.e., first-person inflection) emerge later in L2 production. We will examine whether a similar asymmetry characterizes L2 comprehension, as predicted by McCarthy. Importantly, proficiency is the most reliable predictor of whether L2ers elicit a P600 for morphosyntactic errors (Caffarra et al., 2015). We will also follow recent claims for the need to dissociate global proficiency from experiential factors, such as amount of L2 instruction and immersion in L2-speaking countries (Bowden et al., 2013; Caffarra et al., 2015; DeLuca et al., 2019). A few studies have investigated the relative contribution of these factors to L2 processing, but results remain inconclusive (e.g., Alemán Bañón et al., 2018; Faretta-Stutenberg & Morgan-Short, 2018).
Alternatively, we will examine whether, similar to the native controls in Alemán Bañón and Rothman (2019), the marked status of the first element in the dependency eases person agreement resolution in the L2. If so, learners should show increased sensitivity (i.e., a larger P600) to person violations with a first-person subject (marked/specified). To our knowledge, only López Prego (2015) has examined this question in L2 comprehension. López Prego used self-paced reading to examine adjective-noun gender agreement in Spanish with a design manipulating whether the adjective showed overt gender cues (e.g., nueva "new-FEM "/verde "green" in 5a) and whether the trigger noun was feminine (5a) or masculine (5b), corresponding to the marked/specified and unmarked/underspecified genders. Her results showed that both native speakers and advanced English-speaking learners read the complementizer following the trigger noun (blusa que …) faster when the preceding adjective was feminine (marked/specified), relative to when it was morphologically invariant (nueva vs. verde in 5a), although the effect was marginal in the learners. In contrast, no such facilitation emerged in the comparison of masculine (unmarked/underspecified) versus invariant adjectives (nuevo vs. verde in 5b). López Prego argues that marked features ease agreement resolution at the noun (although in her study facilitation emerged after the noun).
The present study examines whether other markedness-related properties, such as speech participant status, also facilitate L2 agreement resolution. To our knowledge, only Rossi et al. (2006) and Tanner et al. (2013) have used ERPs to examine L2 person agreement resolution, and neither study manipulated markedness. Rossi et al.'s bidirectional study found that German-Italian learners elicited a P600 for "third-person subject + first/second-person verb" errors (e.g., Il signore…beve/*bevo "the man drink-3RD-SG/drink-1ST-SG"), which was modulated by proficiency. Tanner et al. (2013) found that third-year L1-English L2-German learners elicited a P600 for "first-person subject + third-person verb" errors (e.g., Ich wohne/*wohnt in Berlin "I live-1ST-SG/live-3RD-SG …"), but first-year students showed a biphasic N400-P600 pattern, which they argue reflects individual differences with respect to morphosyntactic development. Thus, the question of how person markedness modulates L2 processing remains unexplored. Importantly, as Slabakova (2018) points out, studies examining potential reliance on defaults among intermediate/advanced learners are lacking. Our study fills these gaps.

THE PRESENT STUDY
The present study investigates how markedness modulates subject-verb person agreement in Spanish among upper-intermediate/advanced English-speaking learners. Our design manipulates both person markedness (first person: marked/specified as speaker; third person: unmarked/underspecified) and agreement. Our research questions (RQs) are:
RQ1. Does markedness impact subject-verb person agreement resolution in the L2?
McCarthy's representational account (2008, 2012) predicts reduced sensitivity to "first-person subject + third-person verb" errors relative to "third-person subject + first-person verb" errors across all metrics (accuracy in an untimed GJT, brain responses). This is because the learner's grammar allows the former error type due to representational issues. Alternatively, it is possible that such an asymmetry will only emerge in the ERP responses time-locked to the presentation of the verb because the learners' brain must detect the error exactly when the verb is encountered, whereas the end-of-the-sentence GJT is less time constrained (e.g., Prévost & White, 2000).

RQ2. To what extent does L2 proficiency account for a potential overreliance on default morphology?
Proficiency should explain variability with "first-person subject + third-person verb" errors across metrics, but not with the opposite error type (McCarthy, 2012). This is because the former error type involves overusing third-person inflection (a default), a representational issue that learners are only predicted to overcome at higher levels of proficiency (based on McCarthy, 2012). Alternatively, it is possible that proficiency and markedness will only interact in the ERP data (e.g., Prévost & White, 2000).
RQ3. Do L2 learners use person markedness information to ease agreement resolution?
If so, L2ers should elicit larger ERP responses to "first-person subject + third-person verb" errors (similar to the Spanish native speakers in Alemán Bañón and Rothman, 2019) due to the fact that feature activation at the subject (i.e., the first element in the dependency) should facilitate agreement at the verb (e.g., López Prego, 2015; Nevins et al., 2007).

PARTICIPANTS
Twenty-two L1-English L2-Spanish learners (12 females; mean age: 25; SD: 7.5) with a mean age of L2 acquisition of 14 years (range: 8-23) provided their informed written consent to participate in the study. Their Spanish proficiency was monitored with a 50-item test including the cloze section of the Diploma de Español como Lengua Extranjera and the vocabulary section from the MLA Cooperative Foreign Language Test (e.g., Alemán Bañón et al., 2014; McCarthy, 2008; White et al., 2004). Sixteen learners were of advanced proficiency (range: 43-50/50) and six of intermediate proficiency (range: 33-38/50). Mean duration of Spanish instruction was 7.3 years (SD: 2.7 years; range: .5-12 years) and mean duration of immersion in Spanish-speaking countries was 15 months (SD: 13 months; range: 0-48 months). Only four learners had lived in L2-speaking countries for less than eight months. All learners grew up as monolingual speakers of English, with the exception of one heritage speaker of Japanese, a language that does not instantiate subject-verb person agreement. Twenty of the learners reported knowledge of other foreign languages to varying degrees of proficiency.
A group of 28 native speakers of Spanish reported in Alemán Bañón and Rothman (2019) served as the control group. All 50 participants met the standard requirements for language-related ERP studies: right-handedness (Edinburgh Handedness Questionnaire; Oldfield, 1971), normal or corrected-to-normal vision, and no history of neurological impairments. The testing took place in the United Kingdom and all participants received monetary compensation for their time.

MATERIALS
To examine the contribution of person markedness to agreement, we created 80 sentences with a first-person singular subject (Table 1: condition 1), which is marked/specified for person (speaker), and 80 sentences with a third-person singular lexical subject (condition 3), which is unmarked/underspecified for person (the default person). Agreement was manipulated by pairing up first-person subjects with third-person verbs (condition 2), and vice versa (condition 4).

Structure of the Sentences
All experimental sentences followed the structure in conditions 1-4. They started with the subject, followed by the adverb a menudo "often," the verb, and a three-word continuation that ensured that the verb (i.e., the critical word) was not sentence-final. The adverb a menudo intervened between the subject and the verb to create linear distance between the agreeing elements. Previous ERP studies argued that comprehenders are more likely to exploit predictive strategies when they have sufficient time for prediction generation (e.g., Ito et al., 2017; Wlotko & Federmeier, 2015). Thus, if the L2ers herein can successfully use the subject's person markedness to ease agreement resolution at the verb, the present setup is suitable to investigate that possibility.

The Subjects of the Sentences
The conditions with a marked/specified subject involved the first-person singular pronoun yo (conditions 1-2). For theoretical reasons, the conditions with an unmarked/underspecified subject involved third-person singular lexical determiner phrases (DPs) (conditions 3-4). There is disagreement in the literature regarding whether third-person pronouns are underspecified for person (e.g., Harley & Ritter, 2002) or carry a nonparticipant person feature (e.g., Nevins, 2007). However, there is consensus that referential DPs like el maestro "the teacher" carry no person specification (e.g., Bianchi, 2006; Den Dikken, 2011; Nevins, 2011). Because the present study is concerned with whether the parser can better establish person agreement when the subject carries person information, relative to when it does not, we opted for lexical DPs as unmarked/underspecified subjects. Because the Spanish verb also encodes number, all subjects were used in the singular (unmarked/underspecified for number). Two measures were taken to match the two markedness conditions as much as possible. First, we submitted the sentences with lexical DP subjects, truncated at the adverb a menudo, to a cloze probability rating (e.g., la artista a menudo…) to rule out the possibility that participants could predict the target verbs based on the lexical features of the subjects. This rating, which involved 33 Spanish native speakers who did not participate in the ERP study reported in Alemán Bañón and Rothman (2019), revealed that mean cloze probability was very low (mean cloze: .03; SD: .1), suggesting that the target verbs were, overall, not predictable. In addition, we added 80 fillers using other (nominative case) personal pronouns in coordinated structures with ellipsis (see Table 1). In 40 of those fillers, the pronouns were used in contrastive focus. The use of other person pronouns was expected to attenuate the salience of pronoun yo in the experiment. Likewise, the use of contrastive focus and ellipsis was expected to improve the naturalness of overt pronouns in the sentences, given that Spanish is a null-subject language.

The Target Verbs
We used the same verbs in the conditions with first- and third-person subjects. Thus, at the verb (i.e., the critical word) the two markedness conditions only differed with respect to the subject. Verbs inflected for first- and third-person singular were controlled with respect to number of characters (mean length of third-person verbs: 6.56; SD = 1.61; 95% CI [6.20, 6.92]; first-person verbs: 6.57; SD = 1.65; 95% CI [6.21, 6.94]; t(79) = .445, p = .658; ηp2 = .003). However, third-person verbs were significantly more frequent than first-person ones (EsPal database; Duchon et al., 2013). Being underspecified for person, third-person verbs emerge in more syntactic contexts than first-person verbs, which results in the former being more frequent. Finally, the critical verbs were always located midsentence. These 160 sentences were intermixed with 240 sentences from Alemán Bañón et al. (2017), a study that does not manipulate subject-verb agreement. These materials were counterbalanced across 12 lists. Across lists, all sentences occurred in their grammatical and ungrammatical versions, but no participant saw the same sentence twice. Each participant saw two different lists, administered on separate days. Each list contained an equal number of items per condition. After combining the two lists, each participant saw 80 sentences with a first-person subject (40 ungrammatical) and 80 sentences with a third-person subject (40 ungrammatical). They also saw 20 items from each of the 12 conditions in Alemán Bañón et al. (2017) (80 grammatical, 80 number violations, 80 gender violations) and 80 grammatical fillers. The ratio of grammatical to ungrammatical sentences was 1/1. In total, participants saw 480 sentences across the two sessions. All materials associated with this study are published in Alemán Bañón and Rothman (2019). All materials from the study on number/gender agreement are published in Alemán Bañón and Rothman (2016).
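The item counts reported above can be cross-checked with simple arithmetic. The following is only an illustrative sketch; the variable names are ours and do not come from the published materials.

```python
# Per-participant sentence counts after combining the two lists, as reported in the text.
person = {"first_gram": 40, "first_ungram": 40, "third_gram": 40, "third_ungram": 40}
noun_adjective = {"grammatical": 80, "number_violation": 80, "gender_violation": 80}
fillers_grammatical = 80

total = sum(person.values()) + sum(noun_adjective.values()) + fillers_grammatical
grammatical = (person["first_gram"] + person["third_gram"]
               + noun_adjective["grammatical"] + fillers_grammatical)
ungrammatical = total - grammatical
per_session = total // 2  # two EEG sessions with one list each

print(total, grammatical, ungrammatical, per_session)
```

The totals agree with the reported 480 sentences, the 1/1 ratio of grammatical to ungrammatical sentences, and 240 sentences per session.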

PROCEDURE
The testing involved two EEG recordings, each including 240 sentences (with an equal number of items per condition, including the fillers). We used the software Paradigm (Perception Research Systems Inc.; Tagliaferri, 2005) for sentence presentation. Participants were instructed to read the sentences silently, without blinking, and to decide whether each was grammatical or ungrammatical in Spanish (e.g., Rossi et al., 2006; Tanner et al., 2013). They were asked to favor accuracy over speed. Each EEG recording included eight practice sentences. Four of them were ungrammatical, but none involved agreement violations or nouns/verbs from the experimental materials. Feedback was provided for the first three practice trials. The experiment began upon completion of the practice. Each recording was divided into six 40-sentence blocks, separated by five short breaks. Each EEG recording lasted approximately 1 hour.

Trial Structure
First, a fixation cross was displayed in the center of the screen for 500 ms. Then, the presentation of the sentence began, one word at a time. Each word was displayed for 450 ms, followed by a 300 ms blank (e.g., Alemán Bañón et al., 2012, 2014; see Molinaro et al., 2011). At the end of the sentence came a 1,000 ms pause. Participants then saw the words Bien "good" or Mal "bad" on the screen and decided if the sentence was grammatical or ungrammatical by pressing a button (middle and index fingers of the left hand, respectively). The prompts remained on the screen until the participant provided a response. Upon the button press, an intertrial interval was added ranging between 500-1,000 ms, pseudorandomly varied at 50 ms increments.
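As an illustration of the trial timeline just described, the sketch below computes word onsets for example (2a). It rests on two assumptions that the text does not spell out: that the 1,000 ms pause follows the final 300 ms blank, and that the intertrial interval is drawn uniformly from the 50 ms steps. All function and variable names are hypothetical.

```python
import random

FIXATION_MS = 500    # fixation cross
WORD_MS = 450        # each word on screen
BLANK_MS = 300       # blank after each word
END_PAUSE_MS = 1000  # pause before the Bien/Mal prompt

def word_onsets(n_words):
    """Onset (ms from trial start) of each word in a sentence of n_words words."""
    soa = WORD_MS + BLANK_MS
    return [FIXATION_MS + i * soa for i in range(n_words)]

def prompt_onset(n_words):
    """Onset of the prompt, assuming the pause follows the final blank."""
    return FIXATION_MS + n_words * (WORD_MS + BLANK_MS) + END_PAUSE_MS

# Intertrial intervals between 500 and 1,000 ms in 50 ms increments.
ITI_CHOICES_MS = list(range(500, 1001, 50))

def draw_iti(rng):
    """Draw one intertrial interval (assumed uniform over the 50 ms steps)."""
    return rng.choice(ITI_CHOICES_MS)

sentence = "Yo a menudo acaricio a los caballos".split()
onsets = word_onsets(len(sentence))
verb_onset = onsets[sentence.index("acaricio")]
print(verb_onset, prompt_onset(len(sentence)), draw_iti(random.Random(0)))
```

Under these assumptions, the critical verb in a sentence with the pronoun yo appears 2,750 ms after trial onset, that is, 2,250 ms after the subject.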

We used Brain Vision Analyzer 2.0 (Brain Products, GmbH, Germany) for offline data processing. First, we re-referenced the recordings to the average of near-mastoid electrodes (TP7/8). We then segmented the EEG into epochs from -300 ms to 1,200 ms relative to the verb. Upon visual inspection, we rejected trials with blinks, horizontal eye movements, excessive alpha waves, or excessive muscle movement. We then discarded trials with incorrect behavioral responses. Finally, the epochs were baseline-corrected relative to the 300 ms prestimulus baseline, averaged per condition and per subject, and filtered with a phase-shift free Infinite Impulse Response Butterworth filter, with a high cutoff of 30 Hz and a 12 dB/octave rolloff.
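The baseline-correction step can be illustrated with a minimal sketch. This is not the Brain Vision Analyzer implementation: the 4 ms sampling step (250 Hz) is a hypothetical value chosen for the example, and the function names are ours.

```python
def epoch_times(start_ms, stop_ms, step_ms):
    """Sample times (ms) from start_ms up to and including stop_ms."""
    return list(range(start_ms, stop_ms + 1, step_ms))

def baseline_correct(epoch, times_ms):
    """Subtract the mean of the prestimulus samples (time < 0 ms) from every sample."""
    pre = [v for v, t in zip(epoch, times_ms) if t < 0]
    baseline = sum(pre) / len(pre)
    return [v - baseline for v in epoch]

# Epoch from -300 ms to 1,200 ms relative to the verb; the 4 ms step is an assumption.
times = epoch_times(-300, 1200, 4)

# Toy single-channel epoch: a flat 2 microvolt offset before the verb, 5 microvolts after.
toy_epoch = [2.0 if t < 0 else 5.0 for t in times]
corrected = baseline_correct(toy_epoch, times)
print(corrected[0], corrected[-1])
```

After correction, the prestimulus interval sits at zero and poststimulus amplitudes are expressed relative to it, which is what makes mean amplitudes comparable across trials and conditions.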
Rejection of trials with artifacts or incorrect behavioral responses resulted in approximately 16% of data loss. The mean number of good trials per condition ranged between 33-35/40 (Condition 1: 35; Condition 2: 33; Condition 3: 34; Condition 4: 33). A repeated-measures ANOVA revealed that we had retained fewer trials associated with person violations than with grammatical sentences overall, F(1, 21) = 4.450, p = .047 (we discarded incorrectly judged trials, and learners were marginally less accurate rejecting person errors overall in the GJT). As Luck (2014, supplement, chapter 8, pp. 4-5) points out, different numbers of trials per condition is not problematic when analyzing mean amplitudes. Most importantly, this difference does not affect the examination of the Markedness by Agreement interaction because we retained a comparable number of trials for each error type.

Statistical Analysis
To evaluate RQ1 and RQ3, mean amplitudes were submitted to a repeated-measures ANOVA with Markedness (first-person subject, third-person subject), Agreement (grammatical, ungrammatical), Anterior-Posterior (anterior, medial, posterior), and Hemisphere (left, right) as the within-subjects factors. The hemisphere and midline regions were analyzed separately. For the analyses in the midline, the only topographical factor in the ANOVA was Anterior-Posterior. Because the Markedness by Agreement interaction is critical for our discussion, whenever this interaction was qualified by a topographical factor, follow-up analyses were conducted by examining the Markedness by Agreement interaction separately within the relevant ROIs. The Geisser and Greenhouse correction was applied for sphericity violations (we report corrected degrees of freedom; Field, 2005). To evaluate RQ2, we ran a series of multiple regressions with repeated measures (detailed in the following text). These analyses allowed us to examine how the linear relationship between proficiency measures (score in standardized Spanish proficiency test, amount of L2 instruction, immersion in L2-speaking countries) and measures of sensitivity to agreement (P600 magnitude, D-prime) varied for each level of the repeated factor (person error type) (Schneider et al., 2015).
A False Discovery Rate correction (Benjamini & Hochberg, 1995) was applied to all follow-up tests, to control for Type I error. For all follow-up tests, we provide both the raw p value and the adjusted significance level (q*). We consider effects where p is below .05 as significant and effects where p is between .05 and .1 as marginal.
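For readers unfamiliar with the procedure, the following sketch shows the Benjamini and Hochberg step-up rule in plain Python. It assumes that the adjusted significance level q* corresponds to the rank-based threshold (rank / number of tests × .05); this is our reading rather than something the text states. The p values in the example are hypothetical.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return (thresholds, rejected) for the Benjamini-Hochberg step-up procedure.

    thresholds[i] is the rank-based critical value for test i (rank / m * alpha);
    rejected[i] is True if test i is declared significant.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    thresholds = [0.0] * m
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        thresholds[idx] = rank / m * alpha
        if p_values[idx] <= thresholds[idx]:
            max_rank = rank  # largest rank whose p value meets its threshold
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        rejected[idx] = rank <= max_rank
    return thresholds, rejected

# Hypothetical family of three follow-up tests.
thresholds, rejected = benjamini_hochberg([0.006, 0.20, 0.04])
print([round(t, 3) for t in thresholds], rejected)
```

With three tests, the smallest p value is compared against .05/3 ≈ .017, so a raw p of .04 ranked second (threshold ≈ .033) would not survive the correction.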

BEHAVIORAL
The percentage of accurate responses in the GJT is provided in Table 2 for all experimental conditions. For each person dependency type, the rightmost column of the table provides D-Prime Scores (i.e., a single measure of sensitivity to each person dependency type that controls for response bias). The learners were very accurate across the board (above 90% across conditions), suggesting that they understood the task and were able to complete it. A repeated-measures ANOVA with Markedness (first-person subject, third-person subject) and Agreement (grammatical, ungrammatical) as the repeated factors revealed a marginal main effect of Agreement, F(1, 21) = 3.641, p = .07; ηp2 = .148, driven by the fact that learners were less accurate rejecting ungrammatical sentences overall (M = 92; SD = 11; 95% CI [87, 97]) than accepting grammatical ones (M = 96; SD = 5; 95% CI [93, 98]).
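A D-prime score of this kind can be computed as the difference between the z-transformed hit rate and false-alarm rate. The sketch below is an illustration only: it treats correctly rejected ungrammatical sentences as hits and incorrectly rejected grammatical sentences as false alarms, and it applies a log-linear correction for rates of 0 or 1. Neither choice is specified in the text, and the counts in the example are hypothetical.

```python
from statistics import NormalDist

def d_prime(hits, n_signal, false_alarms, n_noise):
    """Sensitivity index d' = z(hit rate) - z(false-alarm rate).

    A log-linear correction (add 0.5 to counts, 1 to totals) keeps the rates
    away from 0 and 1, where the z-transform is undefined. This correction is
    an assumption of the sketch.
    """
    hit_rate = (hits + 0.5) / (n_signal + 1)
    fa_rate = (false_alarms + 0.5) / (n_noise + 1)
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(fa_rate)

# Hypothetical learner: rejects 37 of 40 violations and 2 of 40 grammatical sentences.
example = d_prime(hits=37, n_signal=40, false_alarms=2, n_noise=40)
print(round(example, 2))
```

Because the score subtracts the false-alarm rate, a participant who rejects every sentence regardless of grammaticality obtains a value near zero rather than a high score.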

ERP EFFECTS
Both types of person errors yielded more positive waveforms than their grammatical counterparts (Figures 1-2). This positivity starts ~450 ms upon presentation of the violating verb, does not go back to baseline by the end of the epoch (1,200 ms), and shows a central-posterior distribution. Overall, this is consistent with the P600. The positivity appears less robust for "first-person subject + third-person verb" errors, as can be seen in Figure 3, which shows the magnitude of the effects. In the same time window where the P600 emerged (~450 ms until the end of the epoch), violations yielded more negative waveforms than their grammatical counterparts in Left Anterior. This negativity appears equally robust for both error types.

In the hemispheres, the omnibus ANOVA also revealed a significant Agreement by Anterior-Posterior interaction (Table 3). In the midline, the omnibus ANOVA revealed a significant Agreement by Anterior-Posterior interaction (Table 3). Follow-up tests revealed a main effect of Agreement in Midline Posterior, F(1, 21) = 9.376, p = .006, q* = .017; ηp2 = .309, driven by the fact that person violations overall were more positive (M = 2.23 μV; SD = 2.50; 95% CI [1.13, 3.34]) than grammatical sentences (M = 1.67 μV; SD = 2.58; 95% CI [.53, 2.81]).

450-900 ms Time Window (P600)
In the midline, the omnibus ANOVA also revealed a significant main effect of Agreement, which was qualified by an interaction with Anterior-Posterior (see Table 3). To summarize, person violations overall yielded a P600 with a central-posterior distribution. Importantly, the P600 was reduced for "first-person subject + third-person verb" violations (in the midline), although the interaction remained marginal. In this time window, a negativity emerged across person violations relative to grammatical sentences in Left Anterior. The P600 was already apparent in the preceding time window (250-450 ms).

P600 Magnitude
We used multiple regression with repeated measures to examine whether overreliance on defaults (reduced sensitivity to "first-person subject + third-person verb" errors) was accounted for by variables related to the learners' proficiency in and experience with their L2. The dependent variable was P600 magnitude, corresponding to the mean amplitude between 450-900 ms in the ungrammatical minus the grammatical condition in a ROI including all regions where the P600 emerged (Left Medial, Right Medial, Left Posterior, Right Posterior, Midline Medial, and Midline Posterior). P600 magnitude was calculated separately for each person violation type. The two error types correspond to the two levels of the within-subjects predictor, Error_Type. The between-subjects predictors were Global_Proficiency (score in a standardized Spanish proficiency test), Instruction (years of instruction in L2 Spanish), and Months_Abroad (months spent in a Spanish-speaking country). Table 4 shows all zero-order correlations between the variables of interest.
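The dependent variable described above can be sketched as follows: for each region in the ROI, take the mean amplitude between 450 and 900 ms in each condition, subtract grammatical from ungrammatical, and average the differences across regions. The data layout (a dictionary mapping region names to lists of μV values), the inclusive window boundaries, the function names, and the toy waveforms are all assumptions made for illustration. They do not reproduce the authors' pipeline or data.

```python
from statistics import fmean

# Regions where the P600 emerged, as listed in the text.
P600_ROI = ["Left Medial", "Right Medial", "Left Posterior",
            "Right Posterior", "Midline Medial", "Midline Posterior"]


def mean_amplitude(times_ms, amplitudes_uv, start_ms=450, end_ms=900):
    """Mean amplitude over samples with start_ms <= t <= end_ms (inclusive bounds assumed)."""
    window = [a for t, a in zip(times_ms, amplitudes_uv) if start_ms <= t <= end_ms]
    if not window:
        raise ValueError("no samples fall inside the time window")
    return fmean(window)


def p600_magnitude(times_ms, ungrammatical, grammatical, roi=P600_ROI):
    """Ungrammatical-minus-grammatical mean amplitude, averaged over the ROI.

    `ungrammatical` and `grammatical` map a region name to a list of amplitudes,
    one value per time point (hypothetical layout).
    """
    differences = [
        mean_amplitude(times_ms, ungrammatical[region])
        - mean_amplitude(times_ms, grammatical[region])
        for region in roi
    ]
    return fmean(differences)


# Toy data invented for the example: flat waveforms sampled every 50 ms,
# with a 1.5 uV positivity added to the ungrammatical condition in the window.
times = list(range(0, 1201, 50))
gram = {region: [1.0] * len(times) for region in P600_ROI}
ungram = {region: [1.0 + (1.5 if 450 <= t <= 900 else 0.0) for t in times]
          for region in P600_ROI}
magnitude = p600_magnitude(times, ungram, gram)
```

In the study this quantity would be computed once per learner and per error type, yielding the two repeated measurements that enter the regression.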
The assumptions of multiple regression (evaluated using the residuals) were met. Table 5 summarizes the results of the regression. The significant Global_Proficiency by Error_Type interaction was driven by the fact that Global_Proficiency significantly predicted P600 magnitude for "first-person subject + third-person verb" violations, t(18) = 2.84, p = .011, q* = .017, ηp2 = .309, an instance of overreliance on defaults, but not for the reverse error (Figure 4, Plots A-B). Examination of the unstandardized regression coefficient suggests that a one standard deviation increase in Global_Proficiency (i.e., a 5-point score increase in the proficiency test) results in an estimated .57 μV increase in P600 magnitude (95% CI [.15, .98]) for "first-person subject + third-person verb" errors. Notice also that a few learners have negative values, consistent with the possibility that their sensitivity to the errors was qualitatively different. Thus, the positive value of the regression coefficient suggests development toward increasingly nativelike processing for "first-person subject + third-person verb" errors, both qualitatively and quantitatively. The main effect of Months_Abroad was also significant, driven by the fact that P600 magnitude overall tended to be larger for learners with longer immersion time. The estimated increase in P600 magnitude for every one standard deviation increase in Months_Abroad (i.e., a 13-month increase in immersion time) was .37 μV (95% CI [.02, .73]). However, this effect was qualified by a marginal interaction with Error_Type (Table 5).
As Plots A and B of Figure 5 reveal, this interaction was driven by the fact that Months_Abroad predicted P600 magnitude for "third-person subject + first-person verb" errors, t(18) = 2.782, p = .012, q* = .017, ηp2 = .301 (estimated increase in P600 magnitude for every one standard deviation increase in Months_Abroad = .68 μV; 95% CI [.17, 1.19]), but not for the other error.
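The effect sizes reported alongside these tests can be cross-checked against the test statistics, because for a single effect partial eta squared equals F × df_effect / (F × df_effect + df_error), and F(1, df) = t² for a t test. The short sketch below is a reader-side consistency check under that standard identity, not part of the authors' analysis. The function names are hypothetical.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


def partial_eta_squared_from_t(t_value: float, df_error: int) -> float:
    """Same quantity for a t test, using F(1, df_error) = t ** 2."""
    return partial_eta_squared(t_value ** 2, 1, df_error)


# Test statistics as reported in the text, rounded to three decimals.
checks = {
    "Agreement, GJT accuracy, F(1, 21) = 3.641":
        round(partial_eta_squared(3.641, 1, 21), 3),
    "Agreement, Midline Posterior, F(1, 21) = 9.376":
        round(partial_eta_squared(9.376, 1, 21), 3),
    "Global_Proficiency, t(18) = 2.84":
        round(partial_eta_squared_from_t(2.84, 18), 3),
    "Months_Abroad, t(18) = 2.782":
        round(partial_eta_squared_from_t(2.782, 18), 3),
}
```

The four values come out as .148, .309, .309, and .301, matching the effect sizes reported for the corresponding tests.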

D-Prime Scores
A similar approach was undertaken to examine the relationship between the learners' behavioral sensitivity to person agreement and their proficiency in and experience with L2 Spanish. The dependent variable was D-Prime Score. All other aspects of the analysis were held constant. Zero-order correlations and a summary of the effects are provided in Tables 4 and 5, respectively. The assumptions of multiple regression were met. The main effect of Global_Proficiency was significant, with more proficient L2ers showing higher D-Prime scores overall (Figure 4, Plots C-D). The estimated increase in D-Prime_Score for every one standard deviation increase in Global_Proficiency (i.e., a 5-point score increase in the proficiency test) was .52 (95% CI [.24, .80]). The main effect of Months_Abroad was also significant. Learners with longer immersion time showed lower D-Prime scores (i.e., lower accuracy) (Figure 5, Plots C-D). The estimated decrease in ...
Figure 4. Relationship between the learners' global proficiency (score in a standardized Spanish proficiency test) and their sensitivity to person agreement both in terms of P600 magnitude (Plots A and B) and in terms of behavioral accuracy (Plots C and D). Note. P600 effect size was calculated by subtracting the grammatical from the ungrammatical condition. Effects were averaged across all regions where P600 effects emerged for both types of person errors. Behavioral accuracy was operationalized as D-prime Score for each type of person dependency in the GJT. Each dot represents a data point from a single learner. The dashed line represents the best-fit regression line. Minimal jitter has been added to make learners with identical or near identical values visible. Marked subject = first-person; Unmarked subject = third-person.

DISCUSSION
The present study investigated the role of markedness in the processing of subject-verb person agreement in Spanish by upper-intermediate/advanced English-speaking learners. Recall that our main aim was to adjudicate between different proposals regarding how markedness impacts L2 agreement resolution. L2 theoretical models posit that learners overuse default morphology (e.g., third-person verbal inflection) in contexts that require marked/specified forms (e.g., first-person subjects) due either to computational pressure (Prévost & White, 2000) or to the asymmetrical representation of features in the learner's grammar (e.g., McCarthy, 2008, 2012). Alternative proposals from the psycholinguistics literature suggest that marked features remain longer in the focus of attention and, thus, facilitate agreement operations, for example, by recruiting top-down mechanisms to resolve agreement (Nevins et al., 2007;Wagers & McElree, 2011;see López Prego, 2015). Under the latter proposal, feature activation upon encountering a marked/specified feature (e.g., a first-person subject) allows the parser to better resolve agreement.
To that end, we probed subject-verb person agreement with both first-person (marked/specified as speaker) and third-person (unmarked/underspecified) subjects (e.g., Bianchi, 2006;Den Dikken, 2011;Harley & Ritter, 2002;Harris, 1995;Jakobson, 1971;McGinnis, 2005;Nevins, 2011). By crossing each subject type with verbs inflected for the opposite person, we created two error types for which the previously mentioned proposals make different predictions. The learners' brain responses revealed reduced sensitivity (i.e., a reduced P600) to "first-person subject + third-person verb" errors, relative to the opposite error type, although this effect remained marginal in the ANOVA. Recall, however, that in the regression analyses, Proficiency interacted with Error_Type (i.e., P600 magnitude as a function of error type). This analysis showed that the reduced sensitivity to "first-person subject + third-person verb" errors characterized the less proficient learners in the sample. These data suggest that, in the course of online processing, learners tolerated unmarked/underspecified forms (third-person inflection) in contexts that required marked/specified ones (first-person subject), mainly at intermediate levels of proficiency. Importantly, the fact that no asymmetry emerged in the end-of-the-sentence GJT, for which learners took as much time as they needed, indicates that such overreliance on defaults most likely results from computational burden (e.g., Alemán Bañón et al., 2017;Hopp, 2010;López Prego & Gabriele, 2014;McDonald, 2006;Prévost & White, 2000). That is, while the learners' brain might not have detected "first-person subject + third-person verb" errors exactly at the time when the violating verb was presented, it is possible that the learners detected the agreement error by the time they provided the grammaticality judgment. The learners also elicited a Late Anterior Negativity (similar to the native speakers) across both error types.
This component has been argued to reflect the costs associated with keeping the violations in working memory for the GJT (e.g., Alemán Bañón et al., 2012;Gillon-Dowens et al., 2010;Sabourin & Stowe, 2004). This might explain why the component was unaffected by markedness: the learners were equally accurate at detecting both error types.
Recall that the same learners were sensitive to markedness asymmetries in the processing of noun-adjective number and gender agreement (Alemán Bañón et al., 2017). In that study, number violations on plural adjectives (marked for number) yielded a marginally larger P600 than number errors on singular adjectives. In addition, the P600 emerged earlier for gender errors realized on feminine (marked for gender) relative to masculine adjectives. Crucially, similar effects emerged in the L1-Spanish controls (Alemán Bañón & Rothman, 2016), suggesting that the learners' processing profile was qualitatively nativelike, at least for noun-adjective number/gender. The overall picture that emerges from these studies is that learners are sensitive to markedness asymmetries at the point when the agreement dependency is resolved (the adjective for noun-adjective agreement; the verb for subject-verb agreement), and that such sensitivity can also characterize native speaker processing (e.g., Alemán Bañón & Rothman, 2016;López Prego & Gabriele, 2014). 6 We also asked whether learners eventually abandon their reliance on default morphology with development. Our regression analysis suggests so. As Figure 4 shows, more proficient learners showed larger P600 effects for "first-person subject + third-person verb" violations (Plot A), but not for "third-person subject + first-person verb" errors (Plot B). One possibility is that, at the upper-intermediate level of proficiency (i.e., the lower bound of the proficiency range examined herein), the learners' grammar still allows first-person subjects with third-person inflection, which still functions as a sort of all-purpose form. However, the learners' grammar already rules out "third-person subject + first-person verb" configurations. This aligns with McCarthy's analysis (2012) of corpus production data, which showed that less proficient learners supplied third-person verbs with first-person subjects, but rarely did the reverse. 
Plots A-B from Figure 4 reveal that P600 effects were larger for "third-person subject + first-person verb" errors overall (Plot B). Twelve learners show a P600 of approximately 1 μV or larger for this error type, and these learners are scattered across the proficiency spectrum examined. In contrast, fewer learners (seven) show P600 effects of approximately similar size for "first-person subject + third-person verb" violations (Plot A), and all of them scored in the upper range of the advanced level. In addition, although we see negative effects for both error types, negativities tend to be larger for "first-person subject + third-person verb" errors, and the largest negativities for this error type are associated with learners near the lower end of the proficiency range examined. It is thus possible that lower proficiency learners rely on qualitatively different processing mechanisms for harder dependencies (e.g., Carrasco-Ortiz et al., 2017;Osterhout et al., 2006;Tanner et al., 2014) although the small number of negative responders in our sample precludes strong conclusions. That global proficiency interacted with error type in the ERP data is consistent with the claim we made in Alemán Bañón et al. (2017) that markedness impacts L2 processing without constraining it. Overreliance on default forms (i.e., reduced sensitivity to "first-person subject + third-person verb" errors) might characterize the intermediate levels of proficiency, but is progressively abandoned at higher ones. Figure 4 (Plots C-D) shows that higher proficiency learners also tended to show higher D-Prime scores for both person violation types, providing additional evidence for development (but not markedness) (e.g., Alemán Bañón et al., 2018). The reader might wonder why global proficiency interacted with markedness in the ERP data, but not in the D-Prime data. The two metrics probably tap into different types of sensitivity. 
While P600 magnitude is a measure of brain sensitivity to person dependencies exactly at the time when they are established, D-Prime scores provide a measure of sensitivity to the same dependencies once the learner has read the whole sentence. That D-Prime scores and P600 magnitude did not significantly correlate for either type of person dependency is consistent with this possibility (see Table 4). It is thus conceivable that proficiency modulated both types of sensitivity differently. Given what is required of a judgment task (i.e., detecting the ungrammaticality, maintaining decisions about grammaticality in working memory until the end of the sentence, wrapping up sentence meaning), a lower level of proficiency might have impacted accuracy with both types of person dependencies.
Therefore, the response to RQ1 and RQ2 (Does markedness impact subject-verb person agreement resolution in the L2? To what extent does L2 proficiency account for a potential overreliance on default morphology?) is that L2 processing is not constrained by markedness, but is sensitive to it, particularly among less advanced learners. Because markedness did not impact the learners' accuracy while judging the sentences at a later point and with no time pressure, sensitivity to markedness is more likely to be computational.
We now turn to the question of why these learners, who were qualitatively nativelike with noun-adjective number and gender agreement, showed a qualitatively different processing profile from native speakers for subject-verb person agreement. 7 One possibility is that, although the learners could successfully resolve person agreement at the verb, the markedness of the subject (i.e., the first element in the dependency) did not facilitate agreement. The question of whether L2 learners can use linguistic cues to facilitate integration of the bottom-up input has played a central role in recent L2 processing research (see Kaan, 2014 for a review). While some studies have argued that L2 learners, even advanced ones, fail to use lexical, morphosyntactic, syntactic, or discourse cues predictively (e.g., Grüter et al., 2012, 2017;Kaan et al., 2016;Martin et al., 2013), others have claimed that predictive processing is similar in the L1 and L2 (e.g., Kaan, 2014), but is modulated by proficiency (e.g., Dussias et al., 2013), the strength of lexical representations (e.g., Hopp, 2013), L1-L2 similarity (e.g., Foucart et al., 2014;Hopp & Lemmerth, 2018), and individual differences in cognitive factors (e.g., Hopp, 2013). It is still unclear, however, which cues learners can use predictively and which linguistic representations they can activate because most of the evidence supporting prediction in the L2 comes from studies manipulating overt gender cues (Dussias et al., 2013;Foucart et al., 2014;Hopp, 2013;López Prego, 2015). Thus, it is possible that, unlike gender cues, speech participant status was insufficient to facilitate agreement at the verb. So, the answer to RQ3 (Do L2 learners use person markedness information to ease agreement resolution?) is preliminarily "no," based on the evidence provided herein.
It could be argued that the larger P600 for "third-person subject + first-person verb" errors does not reflect sensitivity to markedness, but rather facilitation from L1 English, which instantiates agreement with third-person singular subjects (e.g., Jiang, 2004, 2007;Lardiere, 2009;Tokowicz & MacWhinney, 2005). First, we point out that using plural subjects would not have worked because third-person plural DPs in Spanish agree with first- and second-person plural verbs, a process called unagreement (e.g., Las viudas lloramos/lloráis "[we/you-2ND-PL] the widows cry-1ST-PL/cry-2ND-PL") (e.g., Höhn, 2016;Hurtado, 1985), and previous work by Mancini et al. (2011, 2019) has shown that native speakers treat these sentences differently from outright person violations (i.e., they do not elicit a P600). Most importantly, while previous ERP studies manipulating L1-L2 similarity have consistently shown an advantage for L2 features instantiated in the L1 (e.g., Alemán Bañón et al., 2014;Bond et al., 2011;Gabriele et al., in press;Gillon-Dowens et al., 2010), they have consistently found no advantage for shared morphological instantiations of common features (e.g., Alemán Bañón et al., 2014;Bond et al., 2011;Gabriele et al., in press). For example, Bond et al. (2011) and Gabriele et al. (in press) examined L1-English L2-Spanish learners' brain responses to subject-verb number agreement with third-person singular subjects (instantiated in English) and noun-adjective number agreement (unique to Spanish), and found no facilitation for the former. Other studies have even found a disadvantage for contexts where the L1 overtly marks agreement. For example, Alemán Bañón et al. (2014) examined demonstrative-noun and noun-adjective number agreement in Spanish (e.g., este apartamento/*apartamentos "this apartment/apartments"; órgano muy complejo/*complejos "organ-SG very complex-SG/complex-PL") among advanced English-speaking learners.
Crucially, English instantiates number on demonstratives, but not on adjectives. Their results showed a larger P600 for number violations on adjectives, an effect that can be explained by markedness (i.e., este was unmarked, but complejos was marked) or differences in syntactic category, but crucially not by transfer (e.g., Tokowicz & MacWhinney, 2005). Thus, it is unlikely that the effects herein reflect L1 facilitation, especially because the same learners showed enhanced sensitivity to gender errors realized on marked adjectives (Alemán Bañón et al., 2017), an effect that cannot come from the L1.
Before concluding, we address the role of immersion in L2 morphosyntactic development, which showed inconsistency across measures. While Plot B of Figure 5 suggests that longer immersion results in greater brain sensitivity to "third-person subject + first-person verb" errors, Plots C-D reveal lower D-Prime scores (i.e., less sensitivity) to both error types for learners with similar immersion time. We can think of no reason why immersion in an L2-speaking environment would result in poorer ability to resolve any type of morphosyntactic dependency. Importantly, immersion represents an indirect measure of proficiency. Learners with longer immersion time are assumed to have benefited from richer input, alongside increased opportunities for output. In turn, this is expected to promote morphosyntactic development. However, this might simply not be the case. In fact, previous studies on morphosyntactic development do not consistently report advantages for learners in study-abroad programs relative to "at home" learners (Faretta-Stutenberg & Morgan-Short, 2018, p. 5). In our study, the two learners with the longest immersion time (48 and 36 months, respectively) had mainly lived in English-speaking communities and they both scored in the intermediate range in the standardized test (38 and 36, respectively). Both of them showed D-Prime scores roughly one or two standard deviations below the mean. Likewise, one of the learners with the highest D-Prime scores (approximately one standard deviation above the mean) had never resided in Spanish-speaking countries. However, this learner had lived immersed in a Spanish-speaking community for two years in the United States and benefited from native speaker input. It is, therefore, possible that, being an indirect measure of proficiency, immersion time is a noisier predictor of grammatical development.
Another possibility is that there was not sufficient variability in the sample for reliable relations to emerge because half of the learners had an immersion time of ~10 months.

CONCLUSION
The present study found that a group of 22 L1-English L2-Spanish learners showed brain sensitivity (i.e., a P600) across two types of subject-verb person dependencies that differed with respect to markedness. The learners were marginally less sensitive to violations where the subject corresponded to a marked/specified person (first person: speaker) and the verb was unmarked/underspecified for person (third person), suggesting some overreliance on default morphology (i.e., third-person inflection). Markedness did not impact the learners' ability to detect the same violations in an end-of-the-sentence GJT, for which no time constraints were imposed, suggesting that overreliance on defaults is computational. Regression analyses showed that L2ers gradually abandon this overreliance on defaults with increased proficiency. Finally, unlike what we found in the Spanish controls (Alemán Bañón & Rothman, 2019), the speech participant status of the subject (i.e., speaker vs. the default person) did not ease agreement resolution among learners.

NOTES
1 See Battistella (1990) for a list of criteria to determine markedness asymmetries.
2 Between ~300-500 ms, the P600 is sometimes preceded by a Left Anterior Negativity (e.g., Rossi et al., 2005), associated with morphosyntactic processing, or an N400 (e.g., Mancini et al., 2011, 2019;Zawiszewsky et al., 2016), associated with lexical-semantic operations. Because no negativities emerged in Alemán Bañón and Rothman (2019) for either person error type, we focus on the P600.
3 Because we examined agreement violations, our study cannot dissociate prediction from integration effects (see Kuperberg & Jaeger, 2016). We reasoned, however, that only "first-person subject + third-person verb" violations would contribute to top-down processing of person information because third-person subjects lack person information. In Alemán Bañón and Rothman (2019), we did not argue that the larger P600 for violations with first-person subjects reflected prediction disconfirmation, but rather the reanalysis cost engendered by an unmet prediction.
4 The 12 counterbalanced lists and the instructions to the task can be downloaded from the IRIS database at https://www.iris-database.org/iris/app/home/index
5 Because electrode TP10 was noisy in several recordings, we re-referenced them to the average of TP7/TP8 (e.g., Simor et al., 2019).
6 Although here we focused on comprehension, our design includes an elicited production task, which revealed virtually no variability.
7 Because our learners' proficiency ranged from intermediate to advanced, we analyzed them separately from native speakers. An ANOVA with Error_Type as the repeated factor and Group (natives, learners) as the between-subjects factor showed a significant Error_Type by Group interaction, F(1, 48) = 4.359, p = .042, ηp2 = .083, with learners showing a reduced P600 for "first-person subject + third-person verb" errors relative to native speakers, F(1, 48) = 6.672, p = .013, q* = .033, ηp2 = .122 (P600 was calculated for one region including 16 central-posterior electrodes, between 500-1,000 ms for natives and 450-900 ms for L2ers).