In the native speaker’s eye: Online processing of anomalous learner syntax

Abstract
How do native speakers process texts with anomalous learner syntax? Second-language learners of Norwegian and other verb-second (V2) languages frequently place the verb in third position (e.g., *Adverbial-Subject-Verb), although in these languages the verb must appear in second position (Adverbial-Verb-Subject). In an eye-tracking study, native Norwegian speakers read sentences with either grammatical V2 or ungrammatical verb-third (V3) word order. Unlike previous eye-tracking studies of ungrammaticality, which have primarily addressed morphosyntactic anomalies, we manipulated only word order, with no morphological or semantic changes. We found that native speakers reacted immediately to ungrammatical V3 word order, as indicated by longer fixation durations and more regressions out on the subject, and subsequently on the verb. Participants also recovered quickly, already on the following word. The effects of grammaticality were unaffected by the length of the initial adverbial. The study informs future models of sentence processing, which should be able to accommodate various types of “noisy” input, that is, non-standard variation. Together with new studies of the processing of other L2 anomalies in Norwegian, the current findings can help language instructors and students prioritize which aspects of grammar to focus on.

specifically relevant in the context of increased global mobility, where native speakers of a language need to accommodate anomalies produced by immigrant adult L2 learners.
In the present eye-tracking study, we investigated how native speakers process anomalous L2 syntax. We presented native Norwegian speakers with written sentences containing syntactic anomalies in order to elicit their responses to typical non-native word order.
The study focuses on verb-second (V2) word order, which is common in most Germanic languages (apart from English). In V2 languages, the finite verb occurs in the second position of a declarative main clause, preceded only by a single first constituent. In the Norwegian examples below, sentence (1a) is grammatical, as the verb spiller 'plays' is correctly placed in second position, preceded by one constituent, the fronted adverbial på torsdager 'on Thursdays.' The subject gutten 'the boy' follows the finite verb. However, (1b) is ungrammatical in Norwegian, since two constituents, the adverbial på torsdager 'on Thursdays' and the subject gutten 'the boy,' precede the verb, which therefore appears in third position. Thus, (1b) is an example of ungrammatical V3 word order.
The ungrammatical sentence with V3 in (1b) does not express a different propositional content than the grammatical sentence with V2 in (1a). Sentences with V3 are found in multiethnic urban vernaculars in Sweden (Kotsinas, 2000), Denmark (Quist, 2008), Norway (Hårstad & Opsahl, 2013), and Germany (Freywald et al., 2015), and they are used with the same meaning as an equivalent sentence with V2, but as part of a different stylistic practice (see Quist, 2008). Language attitude experiments document that V3 may be associated either with immigrant status or with multiethnic youth varieties (Freywald et al., 2015; Quist, 2008).
Though ungrammatical V3 word order is common in L2 production (and in urban vernaculars), there are only a few studies on the perception of V3. Generally, there is little research on native speakers' processing of non-native or non-standard syntax, which is surprising given the prevalence of this type of "noisy" and non-standard variation. Such research is also critically important for developing models of sentence processing that can accommodate this variability. The current study contributes valuable input to such models for two reasons. First, the word order anomalies in the study occur naturally in both oral and written production, rather than consisting of randomly scrambled words, as in the previous eye-tracking study on word order (Huang & Staub, 2021). Second, previous eye-tracking studies of ungrammaticality have primarily addressed morphosyntactic anomalies, and we cannot know a priori whether word order anomalies elicit the same effects as anomalies involving morphological changes. According to some neurolinguistic models (e.g., Friederici, 2002), initial syntactic structure building and morphosyntactic processes differ in timing.
The current findings can thus inform future models of how naturally occurring word order anomalies, which are part of everyday communication in multilingual and multiethnic societies, are processed. This can lead to more robust models that accommodate noisy input from non-proficient language users and other types of non-standard variation.

Background
In this section, we review results from EEG studies on the processing of V3 word order. Given the lack of eye-tracking studies on V3, we review results from the relatively few eye-tracking studies that have investigated other types of ungrammaticality, that is, morphosyntactic anomalies or transposed words. The review focuses on the time course of the effects of ungrammaticality, which has varied in previous studies and which we return to in the discussion. We expect that all types of ungrammaticality will result in a surprisal effect when predictions about the morphological form of words, or the order of words, are not met, consistent with prediction-based approaches to sentence processing (Christiansen & Chater, 2016; Kamide, 2008; Levy, 2008). However, given the different nature of word order anomalies versus morphosyntactic anomalies, their eye-tracking record may differ.

Processing of V3: evidence from EEG
Three studies on Swedish have examined online processing of ungrammatical V3 after sentence-initial adverbials, measured by event-related potentials (ERPs) (Andersson et al., 2019; Yeaton, 2019; Sayehli et al., 2022). Andersson et al. and Yeaton manipulated the order of subject and verb, as shown in (2).
2. a. Idag läste hon tidningen. 'Today read she the paper.'
b. *Idag hon läste tidningen. 'Today she read the paper.'
Both studies found a P600 effect, an ERP component often elicited by syntactic violations and considered a later response, typically related to an effort to integrate anomalous input into the context of the sentence. The P600 occurred for the processing of anomalous compared to correct sentences, both in native Swedish speakers and in L2 learners (with German, English, or French as L1). Despite similar patterns for this late effect, only the native speakers showed a left anterior negativity (LAN) effect, which may reflect more automatic processing (Andersson et al., 2019). The stimuli in these studies had little variation in the choice of adverbials: sentences always started with the adverbs idag ('today') or hemma ('at home'). Sayehli et al. (2022) also included sentences with V3 after kanske 'maybe,' which were judged to be more acceptable than sentences with the other two adverbials. Accordingly, the ERP analyses showed stronger effects for V3 after hemma and idag, especially for the P600. The authors suggest V3 with kanske "is processed differently than V3 with other adverbials where the V2 norm is stronger" (Sayehli et al., 2022, p. 1). Swedish and Norwegian are closely related languages and may show similarities in the processing of V3.
Effects of syntactic processing difficulty reflected in the eye movements
Syntactic processing difficulty¹ has been examined in a number of eye-tracking studies (for an overview, see Clifton et al., 2007). Typically, such studies have employed grammatical structures that result in ambiguous sentences or garden paths (e.g., Frazier & Rayner, 1982), structures that disconfirm expectations (e.g., Staub & Clifton, 2006), non-canonical word order (Gattei et al., 2021), and structures that violate rules of grammar, both in the form of real and "seeming" violations (e.g., Pearlmutter et al., 1999). Effects of syntactic processing difficulty differ from study to study and appear at various points in the eye-tracking record, leaving it open which factors determine the observed patterns of effects (Clifton et al., 2007; Clifton & Staub, 2011).
There are relatively few eye-tracking studies of ungrammaticality, and, to our knowledge, only one manipulating word order. Huang and Staub (2021) investigated readers' tendency to overlook random transposition errors like The white was cat big. Transpositions were less likely to be noticed when both words were short, and when readers' eyes skipped one of the two words, instead of directly fixating on both. The transpositions caused early and sustained disruption on the critical word cat (see Table 1), but only on trials that participants judged to be ungrammatical.
Eye-tracking studies with ungrammatical items in their manipulations mostly examine morphosyntactic anomalies (Braze et al., 2002; Dank et al., 2015; Deutsch & Bentin, 2001; Lim & Christianson, 2015; Ni et al., 1998; Pearlmutter et al., 1999). Most of these studies find increased regressions out from the site of the morphosyntactic anomaly and from subsequent words, often, but not always, combined with longer reading times (see Hallberg & Niehorster, 2021). Thus, there are systematic effects, but the results differ regarding when the effect of the anomaly first appears in the eye movements. Ni et al. (1998) compared reading patterns for sentences where the verb was morphosyntactically anomalous (3a) to non-anomalous sentences (3b).
3. a. It seems that the cats won't usually eating the food we put on the porch.
b. It seems that the cats won't usually eat the food we put on the porch.
The authors did not find significant differences between the baseline and the morphosyntactically anomalous version at any sentence position in either first-pass reading times (i.e., the sum of all fixations in a region from first entering it until leaving it again, a.k.a. gaze duration) or residual reading times.² However, morphosyntactically anomalous sentences induced significantly more regressions than baseline sentences in the region containing the anomalous progressive verb form (eating the), as well as in the subsequent region (food we). Thus, the increase in regressions was "immediate, but short-lived" (Ni et al., 1998, p. 532). A study by Braze et al. (2002) used similar materials (but also included anomalies in past tense inflection, cf. Table 1) and found similar effects, as well as increased first-pass reading times in the verb region, for example, cracking after. It is worth noting that both studies tested morphosyntactic anomalies that are typically not attested in natural speech. Another strand of studies using ungrammatical items has investigated so-called attraction phenomena, for example, when a word erroneously agrees with a local distractor noun instead of the head noun (Hallberg & Niehorster, 2021). Attraction errors have been investigated in subject-verb number agreement in English (Lim & Christianson, 2015; Pearlmutter et al., 1999) and in subject-predicate gender agreement in Hebrew (Dank et al., 2015). In general, these studies report higher regression ratios and increased total times on the anomalous word in ungrammatical sentences without a distractor, compared to anomalous sentences with a distractor and to correct control sentences (Hallberg & Niehorster, 2021). However, the results, especially for early measures, differ (cf. Table 1).

Table 1. Overview of eye-tracking studies using ungrammatical items (transpositions and morphosyntactic anomalies)

Reference | Type of anomaly | Example sentence | Effects of anomaly found | No effects found
Huang and Staub (2021) | Transposition | The white was cat big. | More regressions in (to the critical 4th word) and out; increased first fixation duration, gaze duration, regression path duration, and total time. | –
Ni et al. (1998) | Modal verb followed by a progressive verb in English | It seems that the cats won't usually eating the food we put on the porch. | More regressions out in the verb region and subsequent region. | First-pass reading times**, total reading time (residual reading time)
Braze et al. (2002) | Modal verb followed by a progressive or past tense form of a verb in English | The wall will surely cracking after a few years in this harsh climate. / The engine will softly whined while it is running at low capacity. | More regressions out in the verb region and increased first-pass reading times** (but only by subject). | –
Pearlmutter et al. (1999) | Subject-verb number agreement in English (the attraction phenomenon) | The key to the cabinet(s) were rusty from many years of disuse. | More regressions out of and increased total reading times in the verb region. | First-pass reading times**
Lim and Christianson (2015) | Subject-verb number agreement in English (the attraction phenomenon) | The teacher who instructed the student(s) were very strict. | Increased gaze duration, regression path duration, and total reading time. | Regressions out, first fixation duration
Dank et al. (2015) | Subject-predicate gender agreement in Hebrew (the attraction phenomenon) | … | … | …
Based on the previous studies of morphosyntactic anomalies, Hallberg and Niehorster (2021, p. 32) conclude that syntactic anomalies "reliably produce increased regressions out from the site of the anomaly and from subsequent words, and often also longer reading time." Readers respond immediately, as they make more regressions. However, the time course of reading times is less clear. Ni et al. (1998) and Pearlmutter et al. (1999) do not find increased first-pass reading times. Dank et al. (2015) and Deutsch and Bentin (2001) find very early effects on first fixation duration, but Lim and Christianson (2015) do not. Finally, readers recover relatively quickly from the anomalies (e.g., compared with their pragmatic counterparts; see Braze et al., 2002; Ni et al., 1998).

The present study
In the present study, we investigated native readers' online responses to sentences with anomalous word order. The aim of the study was to test whether processing slowed down, as expected, for ungrammatical V3 sentences compared with grammatical V2 baselines. According to the E-Z Reader model of eye movement control in reading (Reichle et al., 2009), severe syntactic violations can result in rapid integration failure of a word n. If the integration of n fails rapidly, the forward saccade to n + 1 is canceled. This results in a pause (increasing first fixation duration and gaze duration on n), a refixation (increasing gaze duration), and/or an interword regression. Thus, the model predicts that "problems with postlexical integration can sometimes have very rapid effects" (Reichle et al., 2009, p. 10). Rather than assuming that integration only happens after the input is presented, prediction-based approaches to sentence processing (Christiansen & Chater, 2016; Kamide, 2008; Levy, 2008) assume that readers make predictions about the input before it is presented, for example, about the word order of upcoming sentences. When these predictions are not met, extra resources are spent, reflected in increased reading times (Kristensen & Wallentin, 2015). Based on previous eye-tracking studies of ungrammaticality, we expected to find similar surprisal effects on the subject and verb (the critical regions), manifested as longer fixation durations and more regressions out in the ungrammatical condition, in measures reflecting both early (first fixation duration, gaze duration, first-pass regression ratio, regression path duration) and later stages of processing (total duration). Because previous studies (e.g., Braze et al., 2002; Huang & Staub, 2021; Pearlmutter et al., 1999) document that readers recover relatively quickly, we did not expect to see effects of ungrammaticality in the post-critical or wrap-up region.
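Surprisal, the quantity that prediction-based accounts (e.g., Levy, 2008) link to processing cost, is conventionally defined as the negative log probability of a word given its preceding context. The Python sketch below illustrates that logic at the first position after a fronted adverbial. The probabilities are hypothetical values invented for illustration; they are not corpus estimates and were not part of this study.

```python
import math


def surprisal_bits(p: float) -> float:
    """Surprisal of an event with conditional probability p, in bits: -log2(p)."""
    if not 0.0 < p <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return -math.log2(p)


# HYPOTHETICAL probabilities of the next constituent after a fronted adverbial
# such as "På torsdager ...". Invented for illustration only: under a V2 grammar
# a finite verb is highly expected here, and a subject is not.
next_constituent = {
    "finite verb (V2 continuation)": 0.90,
    "subject (V3 continuation)": 0.02,
}

for label, p in next_constituent.items():
    print(f"{label}: {surprisal_bits(p):.2f} bits")
```

With these assumed values, the V3 subject carries about 5.6 bits of surprisal versus about 0.15 bits for the V2 verb, so a prediction-based account would expect extra processing cost to appear first on the subject, the position at which the V2 and V3 conditions first diverge.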
The results may shed light on how L1 readers react to different types of non-standard variation, since the time course of V3 processing can be compared with results from previous eye-tracking studies of morphosyntactic anomalies and of non-canonical, but grammatical, word order.
We also manipulated the length of the sentence-initial adverbials, which vary greatly in sentences with V3 in L2 production (Søby & Kristensen, to appear), in order to examine whether long sentence-initial constituents increase the severity of the ungrammaticality effect (inspired by Braze et al., 2002). Finally, we expected an adaptation effect for all trials, including the ungrammatical sentences, such that participants generally became faster and regressed less over time.

Method
Participants
Fifty-two native speakers of Norwegian participated in the study, primarily students and employees from the Norwegian University of Science and Technology. Participants were monolingual until starting school (with a wide variety of dialectal backgrounds) and had normal or corrected-to-normal vision and no reading deficits. None of them participated in the norming of the test stimuli. As compensation for participation, they chose between a gift voucher (160 NOK) and a lab t-shirt.
Data from four of these 52 participants were identified as outliers and were not entered into the analysis. Three of these participants were excluded because more than 33% of their experimental trials had track losses or blinks in the critical regions; track losses may indicate poor data quality (Staub & Goddard, 2019). A fourth participant was excluded due to an exceptionally long average sentence reaction time (>6.8 SD above the group mean) (Weiss et al., 2018). This participant also read all sentences twice, a possible indicator of reading difficulties. No participants were excluded due to poor accuracy on comprehension questions (see Response accuracy). This left 48 participants in the analysis (18 males, 30 females; aged 19-36 years, M = 23.7 years, SD = 3.7 years).
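The two exclusion criteria can be expressed as a simple filter. The Python sketch below is a minimal illustration under stated assumptions: the Participant record and its field names are hypothetical, and the group mean and sample standard deviation are assumed to be computed over all participants, including the one being tested, which the text does not specify.

```python
from dataclasses import dataclass
from statistics import mean, stdev


@dataclass(frozen=True)
class Participant:
    # Hypothetical record; field names are illustrative, not from the study.
    pid: str
    n_trials: int            # experimental trials presented
    n_lost: int              # trials with track loss or blinks in critical regions
    mean_sentence_rt: float  # average sentence reaction time in ms


MAX_LOST_PROPORTION = 0.33   # "more than 33% of experimental trials"
MAX_RT_SD = 6.8              # more than 6.8 SD above the group mean


def excluded_ids(participants):
    """Return the ids of participants failing either exclusion criterion."""
    rts = [p.mean_sentence_rt for p in participants]
    group_mean, group_sd = mean(rts), stdev(rts)  # sample SD over the whole group
    out = set()
    for p in participants:
        too_much_track_loss = p.n_lost / p.n_trials > MAX_LOST_PROPORTION
        too_slow = (p.mean_sentence_rt - group_mean) / group_sd > MAX_RT_SD
        if too_much_track_loss or too_slow:
            out.add(p.pid)
    return out
```

One consequence of this assumption: with a sample SD computed over the whole group, no participant can lie more than (n − 1)/√n SD from the mean. For n = 52 this bound is about 7.07, so a 6.8 SD criterion is attainable, but only by an extreme outlier.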

Apparatus
Participants' right eyes were tracked using an EyeLink 1000 eye tracker (SR Research Ltd., Ontario, Canada) with a sampling rate of 1000 Hz. Stimuli were displayed in a fixed-width font (Courier New, size 27) in black on a light gray background. All sentences were displayed on a single line. Participants viewed stimuli binocularly on a monitor approximately 68 cm from their eyes, so that approximately three characters corresponded to 1 degree of visual angle. Head movements were minimized using a chin rest and (when possible) a forehead rest. The experiment was programmed in Experiment Builder (SR Research Ltd., version 2.2.61).
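The reported viewing geometry can be checked with the standard visual-angle formula, θ = 2·atan(s / 2d), for an object of size s at distance d. The Python sketch below derives the physical extent of 1 degree at 68 cm and the character width implied by "approximately three characters per degree"; that character width is a back-calculation, not a value reported for this setup.

```python
import math


def visual_angle_deg(size_cm: float, distance_cm: float) -> float:
    """Visual angle (degrees) subtended by an object of a given size."""
    return math.degrees(2 * math.atan(size_cm / (2 * distance_cm)))


def size_for_angle_cm(angle_deg: float, distance_cm: float) -> float:
    """Physical size (cm) that subtends a given visual angle."""
    return 2 * distance_cm * math.tan(math.radians(angle_deg) / 2)


VIEWING_DISTANCE_CM = 68   # from the Apparatus description
CHARS_PER_DEGREE = 3       # "approximately three characters equaled 1 degree"

one_degree_cm = size_for_angle_cm(1.0, VIEWING_DISTANCE_CM)
char_width_cm = one_degree_cm / CHARS_PER_DEGREE  # derived, not reported

print(f"1 degree at {VIEWING_DISTANCE_CM} cm = {one_degree_cm:.2f} cm")
print(f"implied character width = {char_width_cm:.2f} cm")
```

At 68 cm, 1 degree spans about 1.19 cm, so each character would be roughly 0.40 cm wide on screen.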

Materials
Instructions and stimuli were written in Bokmål, the most commonly used standard for written Norwegian (Vikør, 2015). There were 40 items with four conditions in a 2 × 2 design: grammatical (V2) vs. ungrammatical (V3); short adverbial vs. long adverbial.
All experimental items consisted of sentences with five regions³ (cf. Table 2), and each region contained at least five characters. To avoid confounds, we compared exactly the same words or phrases with each other (apart from the long-short distinction), so that the compared regions had the same length, shape, and frequency.
The pre-critical region contained either a short or a long temporal adverbial. The last word(s) of the short and long adverbial phrases (e.g., på tirsdager) were identical. "Short" adverbials consisted of 1-2 words with a total length of 5 to 12 characters including spaces (mean length = 9.1 characters). "Long" adverbials were at least twice as long, consisting of 4-7 words with a total length of 25 to 38 characters (mean length = 30.58 characters). A one-tailed paired-samples t test showed a significant difference in number of characters between the two groups, p < .0001. All 40 adverbials in the short condition were different. However, some adverbials had to be reused in order to create the 40 long adverbials. Structurally, all adverbials can be considered a unit, since they can be topicalized together in the sentence.
The two critical regions (critical region 1 and critical region 2) contained either the subject followed by the verb (the ungrammatical condition) or the verb followed by the subject (the grammatical condition). All verbs were frequent (defined as having more than 15,000 occurrences of the lemma in the HaBiT Norwegian Web Corpus, 2015) and referred to typical everyday activities; they were all transitive and in the present tense. Most of the verbs were reused once. In order to create some variation, many different subjects were used: typical Norwegian first names, nouns (gutten 'the boy,' jenta 'the girl'), kinship terms (bestemor 'grandmother'), occupations (sjefen 'the boss'), non-human subjects (kattene 'the cats'), and inanimate subjects (kommunen 'the municipality'). The length of the subjects was 5-13 characters (mean = 6.45).
The post-critical region contained a syntactic object, which referred to a physical object, an animal, or a human.
The wrap-up region contained another adverbial, primarily prepositional phrases like på kjøkkenet ('in the kitchen'). This region made it possible to distinguish spill-over effects from the critical regions (i.e., when a region is "swamped by processing continuing from the (immediately) preceding region" (Vasishth, 2006, p. 97)) from sentence wrap-up effects.
The Appendix contains a list of all experimental items. Conditions were counterbalanced across four lists in a Latin square design, so each participant only saw each item in one of the four conditions. All participants were exposed to 10 items from each of the four conditions. Each list of stimuli was presented in four blocks, so that conditions were balanced across blocks. The presentation order was randomized, both of the blocks and of the trials in each block.
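The counterbalancing scheme can be sketched as follows. Rotating the condition index by list number is one standard way to build a Latin square; the condition labels, the seed, and the round-robin blocking below are assumptions. Because 10 items per condition cannot be split evenly over four blocks, each block in this sketch receives two or three items per condition; the exact balancing procedure used in the study is not described.

```python
import random

# Assumed labels for the 2 x 2 design (grammaticality x adverbial length).
CONDITIONS = ("V2-short", "V3-short", "V2-long", "V3-long")


def latin_square_lists(n_items=40, conditions=CONDITIONS):
    """Item i appears in condition (i + list_index) mod k in each list, so every
    list has each condition equally often and every item appears in every
    condition exactly once across lists."""
    k = len(conditions)
    return [
        [(item, conditions[(item + lst) % k]) for item in range(n_items)]
        for lst in range(k)
    ]


def blocked_order(trials, n_blocks=4, seed=None):
    """Deal trials into equal-sized blocks, spreading each condition as evenly
    as possible, then randomize block order and trial order within blocks."""
    rng = random.Random(seed)
    by_condition = {}
    for trial in trials:
        by_condition.setdefault(trial[1], []).append(trial)
    blocks = [[] for _ in range(n_blocks)]
    counter = 0  # one running counter keeps block sizes equal
    for condition in sorted(by_condition):
        cond_trials = by_condition[condition]
        rng.shuffle(cond_trials)
        for trial in cond_trials:
            blocks[counter % n_blocks].append(trial)
            counter += 1
    rng.shuffle(blocks)        # randomize the order of the blocks
    for block in blocks:
        rng.shuffle(block)     # randomize trial order within each block
    return blocks
```

Each participant would be assigned one of the four lists, and blocked_order would then produce that participant's presentation order.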
All lists also contained 40 filler sentences (see online-only Supplementary materials A) with various kinds of syntactic constructions (e.g., passives and cleft constructions). Half of the fillers contained morphological anomalies, such as agreement errors, incorrect gender, or incorrect definite vs. indefinite form, which occurred in many different sentence positions. Ten fillers had a structure similar to the target items, with a locative or temporal sentence-initial adverbial (some also containing a morphological anomaly). The purpose of the fillers was to prevent participants from expecting V3 whenever a sentence-initial adverbial was presented.
Thirty items (50% targets and 50% fillers) were followed by a simple yes/no comprehension question about the content of the sentence (half with "yes" and half with "no" as the correct answer) in order to keep participants' attention and make them read for comprehension. All comprehension questions can be found in Supplementary materials A.

Norming
Prior to the eye-tracking experiment, a judgment task was carried out with 44 grammatical sentences, each in a short and a long version, distributed across two lists. Forty-two participants, who did not later take part in the eye-tracking experiment, rated the naturalness of the sentences on a five-point Likert scale from 1 (very unnatural) to 5 (very natural). In a second "correction" task, participants saw two incorrect sentences with V3 and were asked to state whether or not the sentences were grammatically correct in Norwegian and, if not, where the error was. In the judgment task, all items with an average score below three were either discarded (4 items) or changed and re-normed (3 items). In the correction task, 95% of the anomalies were detected, indicating that native speakers notice this type of anomaly.

Procedure
Participants provided informed consent and various background information, for example, about handedness and dialect. Participants were instructed to read for comprehension in a natural manner and to avoid blinking while reading. A break screen appeared three times during the experiment, but breaks could be taken whenever needed. The experiment lasted around 15 min.
The eye tracker was calibrated using a nine-point calibration grid. Re-calibrations were performed during the experiment, if necessary. A short (two-trial) practice session followed the calibration. Participants responded to the questions by pressing buttons on the keyboard. Corrective feedback was given on the screen.

Response accuracy
To ensure that all participants had read the sentences for comprehension, we analyzed the accuracy of comprehension questions. The group mean was >90% in all four experimental conditions, and all participants had at least 76% correct answers.

Data cleaning
The experimental trials were inspected visually in the EyeLink Data Viewer software package (SR Research Ltd., version 4.1.1). Trials with track losses or blinks in the two critical regions (subject and verb) were removed (following, e.g., Frisson et al., 2017; Micai, 2018; Warren et al., 2015). As noted above, data from four participants were excluded. For the remaining 48 participants, track losses or blinks led to the removal of 72 trials (3.6%), leaving 1,848 trials for the analysis. Data were cleaned using the automatic Four-stage Fixation Cleaning in EyeLink Data Viewer (SR Research): short fixations (<80 ms) within one character position of a preceding or following fixation longer than 80 ms were merged with that fixation. Other fixations shorter than 80 ms were removed, as were fixations longer than 1500 ms (following Frisson et al., 2017; Milburn, 2018).
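The merging and removal rules described above can be sketched in Python. This is a simplified illustration of those three rules only, not a reimplementation of Data Viewer's Four-stage Fixation Cleaning; the Fixation record, the use of horizontal character position as the distance metric, and the preference for merging into the preceding rather than the following fixation are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fixation:
    char_pos: float  # horizontal position in character units (assumed metric)
    duration: int    # ms


def clean_fixations(fixations, short_ms=80, long_ms=1500, max_dist=1.0):
    """Merge or drop short fixations, then drop overly long ones."""
    pos = [f.char_pos for f in fixations]
    dur = [f.duration for f in fixations]
    keep = [True] * len(fixations)
    for i in range(len(fixations)):
        if dur[i] >= short_ms:
            continue
        # A short fixation is merged into an adjacent fixation that is longer
        # than short_ms and lies within max_dist character positions.
        prev_ok = (i > 0 and keep[i - 1] and dur[i - 1] > short_ms
                   and abs(pos[i - 1] - pos[i]) <= max_dist)
        next_ok = (i + 1 < len(fixations) and dur[i + 1] > short_ms
                   and abs(pos[i + 1] - pos[i]) <= max_dist)
        if prev_ok:
            dur[i - 1] += dur[i]
        elif next_ok:
            dur[i + 1] += dur[i]
        keep[i] = False  # merged into a neighbour, or removed as too short
    return [Fixation(pos[i], dur[i]) for i in range(len(fixations))
            if keep[i] and dur[i] <= long_ms]
```

In this sketch the long-fixation cutoff is applied after merging, so a merged fixation that ends up longer than 1500 ms is also removed.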

Reading measurements
We conducted analyses over the five regions (cf. Table 2). The following four standard fixation duration measures were computed:
• First fixation duration: the duration of the first fixation on a region during first-pass reading.
• Gaze duration: the total duration of all first-pass fixations on a region until leaving it in either direction.
• Regression path duration⁴: the total duration of all fixations from first entering a region during first-pass reading until leaving it to the right, including any refixations on previous text.
• Total duration: the total duration of all fixations on a region.
Furthermore, the following fixation ratio measures were computed:
• First-pass regression ratio: the proportion of trials on which the fixation immediately following first-pass reading of a region is a regression out of that region.
• First-pass skipping ratio: the proportion of trials on which the region is skipped during first-pass reading.
We included standard measures that reflect both early (first fixation duration, gaze duration, first-pass regression ratio) and later processing (total duration). Both total duration and regression path duration include gaze duration and are therefore not independent of it. Regression path duration is sometimes categorized as a later processing measure because it includes re-reading. However, we consider it to reflect early processing, since it indicates how long it takes to move past a certain region during first-pass reading (Warren et al., 2015).
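The region-based measures defined above can be computed from the temporal sequence of fixations on a trial. The Python sketch below is one minimal way to do so, assuming each fixation has already been assigned to a numbered region (0 = pre-critical, 1 and 2 = critical, 3 = post-critical, 4 = wrap-up) and that fixations outside the sentence have been discarded. The function and field names are hypothetical, and this is not necessarily how the measures were computed in the study.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# One fixation = (region index, duration in ms), in temporal order.
Fix = Tuple[int, int]


@dataclass
class RegionMeasures:
    skipped: bool                           # not fixated in first-pass reading
    total: int                              # total duration (all passes)
    first_fixation: Optional[int] = None    # first fixation duration
    gaze: Optional[int] = None              # gaze duration
    regression_path: Optional[int] = None   # regression path duration
    first_pass_regression: Optional[bool] = None


def region_measures(fixations: List[Fix], region: int) -> RegionMeasures:
    total = sum(d for r, d in fixations if r == region)

    # The region is read in first pass only if the first fixation at or
    # beyond it lands on the region itself; otherwise it was skipped.
    start = next((i for i, (r, _) in enumerate(fixations) if r >= region), None)
    if start is None or fixations[start][0] != region:
        return RegionMeasures(skipped=True, total=total)

    # Gaze duration: consecutive fixations on the region from first entry.
    end = start
    while end < len(fixations) and fixations[end][0] == region:
        end += 1
    gaze = sum(d for _, d in fixations[start:end])
    regressed = end < len(fixations) and fixations[end][0] < region

    # Regression path duration: all fixations from first entry until the
    # eyes first move to a region further to the right.
    rp_end = start
    while rp_end < len(fixations) and fixations[rp_end][0] <= region:
        rp_end += 1
    regression_path = sum(d for _, d in fixations[start:rp_end])

    return RegionMeasures(
        skipped=False,
        total=total,
        first_fixation=fixations[start][1],
        gaze=gaze,
        regression_path=regression_path,
        first_pass_regression=regressed,
    )
```

The ratio measures are then proportions over trials: the skipping ratio is the share of trials with skipped set, and the first-pass regression ratio is the share of non-skipped trials with first_pass_regression set.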

Statistical models
Data were analyzed using linear mixed effects models in RStudio (R Core Team, 2019, version 1.2.1335), using the lme4 package (Bates et al., 2015, ver. 1.1.21). P-values were obtained using the lmerTest package (Kuznetsova et al., 2017, ver. 3.1.1). All models included the following fixed effects: grammaticality (grammatical vs. ungrammatical), length (short vs. long), trial order, and the interactions of grammaticality with length and of grammaticality with trial order. Models also included random intercepts for participant and item. Random slopes were not included in the models, as they either resulted in a "singular fit" or failed to converge (even in very simple models). Factors were coded using sum contrasts (Schad et al., 2020), so that short and grammatical were coded as −0.5 and long and ungrammatical were coded as 0.5.
For binomial data (skips and regressions), generalized linear mixed models were used to carry out logistic regressions (Frisson et al., 2017). Trial order was rescaled to range from 0 to 1. In one case (skipping ratio in the pre-critical region), we used the BOBYQA optimizer (Powell, 2009), which allows more iterations when attempting to reach convergence.
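To make the ±0.5 coding concrete: in a balanced 2 × 2 design, the model y = b0 + bG·g + bL·l + bGL·g·l with g, l ∈ {−0.5, +0.5} reproduces the four cell means exactly; the intercept is the grand mean, each main-effect coefficient is the difference between the two levels averaged over the other factor, and the interaction coefficient is a difference of differences. The Python sketch below verifies this with hypothetical cell means. It ignores random effects, trial order, and the imbalance introduced by trial exclusion, so it shows how to read the coefficients, not how the reported estimates were obtained.

```python
# Scaled sum contrasts as described: grammatical/short = -0.5, ungrammatical/long = +0.5.
GRAM = {"grammatical": -0.5, "ungrammatical": 0.5}
LEN = {"short": -0.5, "long": 0.5}


def coefficients_from_cell_means(cell):
    """cell maps (grammaticality, length) -> mean in a balanced 2 x 2 design.

    Returns (b0, b_gram, b_len, b_interaction) for
    y = b0 + b_gram*g + b_len*l + b_interaction*g*l with +-0.5 coding."""
    g_s, g_l = cell[("grammatical", "short")], cell[("grammatical", "long")]
    u_s, u_l = cell[("ungrammatical", "short")], cell[("ungrammatical", "long")]
    b0 = (g_s + g_l + u_s + u_l) / 4                 # grand mean
    b_gram = (u_s + u_l) / 2 - (g_s + g_l) / 2       # ungrammatical minus grammatical
    b_len = (g_l + u_l) / 2 - (g_s + u_s) / 2        # long minus short
    b_int = (u_l - u_s) - (g_l - g_s)                # difference of differences
    return b0, b_gram, b_len, b_int


def predict(coefs, gram, length):
    """Fitted cell mean for one grammaticality x length combination."""
    b0, b_gram, b_len, b_int = coefs
    g, l = GRAM[gram], LEN[length]
    return b0 + b_gram * g + b_len * l + b_int * g * l
```

For example, hypothetical cell means of 300 and 320 ms (grammatical short/long) and 360 and 400 ms (ungrammatical short/long) give a grand mean of 345 ms, a grammaticality effect of 70 ms, a length effect of 30 ms, and an interaction of 20 ms.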

Results
Model results for total reading time and for all eye-tracking measures in the five regions are found in the online-only Supplementary materials B. An overview of all main effects and interactions in the different regions is shown in Table 3 (unexpected effects are in italics). Because of the nature of our stimuli, syntactic subjects appear in different critical regions depending on whether the sentence is grammatical or ungrammatical (i.e., ungrammatical subjects are presented in the first critical region, and grammatical subjects in the second critical region). Nevertheless, we compare subjects with subjects regardless of sentence position.
Likewise, verbs in the grammatical conditions (in the first critical region) are compared to verbs in the ungrammatical conditions (in the second critical region). Table 3 shows that we found the expected effects of grammaticality in the critical regions, with longer fixation durations and more regressions out. The effects were found on several early measures, such as first fixation duration (FFD) (only on the verb, though), gaze duration (GD), first-pass regression ratio (RR), and regression path duration (RPD), which includes re-reading, as well as on the only late measure, total duration (TD). This confirms that V3 causes immediate disruption on the subject and subsequently on the verb. There were no reliable effects of grammaticality after the critical regions, confirming that V3 causes local disruption, that is, participants recover quickly. An unexpected interaction between grammaticality and length was found on the object for regression path duration, but this effect is doubtful for several reasons: it arises in only one measure, it is not localized in the critical region, and it is accompanied by an unexpected main effect of length.⁵
The results of the length manipulation are mixed. If the length of the adverbial prior to the anomaly influenced processing of the anomaly, we should see crossing interactions between grammaticality and length in the critical regions. This is not the case, and thus, it seems that the effects of V3 are stable across contexts with short or long adverbials. Unsurprisingly, length effects were found in the pre-critical region, which was either short or long. Here, as expected, fixation durations were longer in the long conditions for three measures, both early and late. We assumed that working memory load would be higher after long adverbials, so the length effects on first fixation duration in the critical regions and on two measures in the wrap-up region were expected.
However, since unexpected length effects, with longer fixation durations in the short conditions, were also found in the pre-critical region, in one of the critical regions, and in the post-critical region, the results regarding length are uncertain. Effects of trial order, with shorter fixation durations and fewer regressions (or skips) for later trials, were found for several measures in all regions: always for regression path duration and total duration, often for gaze duration and first-pass regression ratio, and in one case also for first-pass skipping ratio (SR). Crossing interactions between grammaticality and trial order were also found in the pre-critical and critical regions for regression path duration and total duration, such that adaptation seemed greater in the ungrammatical conditions (but see the discussion on adaptation).
In the following subsections, we present the results in more detail, first for total sentence reading time and then for the five regions of the sentence. Table 4 shows total sentence reading time. As expected, there is an effect of grammaticality on total sentence reading time, which is longer in the ungrammatical conditions (β = 409.93 ms, SE = 69.75, t = 5.88, p < .001; see Figure 1). There is also an expected effect of length (β = 749.94 ms, SE = 34.04, t = 22.03, p < .001). Furthermore, we find an increased reading speed for later trials, that is, an effect of trial order (β = −15.01 ms, SE = 1.53, t = −9.81, p < .001). Finally, there is a crossing interaction between grammaticality and trial order (β = −11.02 ms, SE = 3.08, t = −3.58, p < .001), as seen in Figure 2: the slope is much steeper for the ungrammatical conditions, which seems to indicate a larger adaptation effect there. Table 5 shows means and standard deviations for all eye movement measures in the individual regions; these are presented in more detail in the following sections.
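Because grammaticality was coded −0.5/+0.5, the trial-order coefficient and the grammaticality × trial order interaction reported above imply a separate trial-order slope for each condition: slope = β_trial + β_interaction × code. The short Python calculation below uses only these two reported estimates. The unit of trial order for this model is not stated, so the slopes are expressed per unit of the trial-order predictor.

```python
# Reported fixed effects for total sentence reading time (ms).
B_TRIAL = -15.01          # main effect of trial order
B_GRAM_X_TRIAL = -11.02   # grammaticality x trial order interaction


def trial_slope(gram_code: float) -> float:
    """Model-implied trial-order slope within one grammaticality condition,
    given the -0.5 (grammatical) / +0.5 (ungrammatical) coding."""
    return B_TRIAL + B_GRAM_X_TRIAL * gram_code


for label, code in (("grammatical (V2)", -0.5), ("ungrammatical (V3)", 0.5)):
    print(f"{label}: {trial_slope(code):.2f} ms per unit of trial order")
```

This gives about −9.5 ms per unit for grammatical and about −20.5 ms per unit for ungrammatical sentences, which corresponds to the steeper ungrammatical slope described for Figure 2.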

Pre-critical region: Adverbial
In the pre-critical region, we did not expect grammaticality to affect any measures besides the total duration (the only late measurement). This pattern was confirmed. Participants had longer total durations in the ungrammatical conditions (β = 149.47 ms, SE = 37.92, t = 3.94, p < .001), meaning that they regressed more to the sentence-initial adverbial from other regions. This region was either short or long, and we found an effect of length with increased durations in long conditions for several measurements: Gaze duration (β = 649.98 ms, SE = 11.80, t = 55.07, p < .001), regression path duration (β = 650.13 ms, SE = 11.79, t = 55.15, p < .001), and total duration (β = 780.92 ms, SE = 18.53, t = 42.15, p < .001). Furthermore, there was an unexpected effect of length on first fixation duration (β = −8.23 ms, SE = 3.07, t = −2.68, p < .01), with shorter fixations on long adverbials.
A crossing interaction between grammaticality and trial order was found for total duration (β = −4.31 ms, SE = 1.67, t = −2.58, p < .01), showing a larger adaptation effect in ungrammatical conditions (see plot in Supplementary materials D).
As found for the subjects, there was an effect of length on first fixation duration (β = 12.08 ms, SE = 3.87, t = 3.12, p < .01), so that durations increased in the long conditions.

Post-critical region: Object
In the post-critical region, there were no effects of grammaticality.
A crossing interaction between grammaticality and length was found for regression path duration (β = −57.35 ms, SE = 24.15, t = −2.37, p < .05) (see plot in Supplementary materials D). In the short conditions, regression path duration increased in the ungrammatical versions, but for the long conditions, it decreased in the ungrammatical versions.

Wrap-up region: Adverbial
In the wrap-up region, there were no effects of grammaticality on any measures.

Post hoc analysis with combined critical regions
Since word order is V-S in the grammatical conditions and S-V in the ungrammatical conditions, we compared constituents in different sentence positions. In order to check whether this confounded the results, we carried out a post hoc analysis on a unified subject-verb region. The only reading measurement which we could calculate for the combined subject-verb region post hoc was total duration. The model results of total durations for the combined region showed the same effects as the original analyses of the two regions (see model results in the Supplementary materials C), that is, no indication of a confound.
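The article does not say why total duration was the only measure available for the merged subject-verb region. One plausible reason, offered here as an assumption, is that total duration is additive across regions whereas first-pass measures are not: the total duration of a merged region is simply the sum of its parts, but gaze duration on the merged region depends on the order of the original fixations, which per-region summaries no longer contain. The sketch below illustrates this with hypothetical helper functions and an invented fixation sequence.

```python
from __future__ import annotations


def total_duration(fixations: list[tuple[int, float]], regions: set[int]) -> float:
    """Sum of all fixation durations on the given region(s), over all passes."""
    return sum(dur for reg, dur in fixations if reg in regions)


def gaze_duration(fixations: list[tuple[int, float]], regions: set[int]) -> float | None:
    """First-pass reading time on the given region(s); None if skipped on first pass."""
    first = next((i for i, (reg, _) in enumerate(fixations) if reg in regions), None)
    if first is None or any(reg > max(regions) for reg, _ in fixations[:first]):
        return None
    gd = 0.0
    for reg, dur in fixations[first:]:
        if reg not in regions:  # the eyes have left the region: first pass is over
            break
        gd += dur
    return gd


# Invented trial: 0 = adverbial, 1 and 2 = the two critical words, 3 = object.
trial = [(0, 200), (1, 250), (0, 150), (2, 220), (3, 200)]
subject, verb, merged = {1}, {2}, {1, 2}

# Total duration is additive: 250 + 220 == 470.
print(total_duration(trial, subject) + total_duration(trial, verb),
      total_duration(trial, merged))
# Gaze duration is not: 250 + 220 == 470, but the merged first pass is only 250.
print(gaze_duration(trial, subject) + gaze_duration(trial, verb),
      gaze_duration(trial, merged))
```

Here the reader leaves the first critical word with a regression before ever reaching the second, so the merged region's first pass ends early, and the sum of the separate gaze durations overstates it.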

Discussion
In sum, how do the eyes move in response to anomalous V3 word order?
In the pre-critical region (the short vs. long adverbial, for example, På tirsdager 'On Tuesdays'/Klokken halv sju på tirsdager 'Half past six on Tuesdays'), participants displayed longer total durations in the ungrammatical conditions, as expected. This is because participants regressed more to the sentence-initial adverbial from other regions. Because this region was either short or long, length effects on several measurements were expected and found. However, there was also an unexpected effect of length on first fixation duration, so that fixations were shorter in the long conditions.
Results for the two critical regions, the subject (e.g., biblioteket 'the library') and the verb (e.g., tilbyr 'offers'), were quite similar. There were effects of grammaticality on most measurements, except first-pass skipping ratio (and first fixation duration on the subject). Fixation durations were longer, and more regressions were made, in the ungrammatical conditions. In both regions, first fixation duration (assumed to reflect processing difficulties) was longer after long adverbials, indicating that participants paid more attention in this condition. However, this effect of length was not echoed in other measurements; on the subject, a reversed effect of length was found for regression path duration, which decreased in the long conditions.
In the post-critical region, the object (e.g., høytlesning 'a read-aloud'), no main effects of grammaticality were found. An unexpected interaction between grammaticality and length was found for regression path duration. It was only found for one measurement and was furthermore accompanied by an unexpected effect of length (which was also found for gaze duration), with durations decreasing in the long conditions.
In the wrap-up region, the second adverbial (e.g., for barn og unge 'for children and adolescents'), no effects of grammaticality were found. There were effects of length on regression path duration and first-pass regression ratio, with more regressions and longer durations in the long conditions. This could be explained by a heavier load on working memory: the need to regress to previous parts of the sentence is likely greater when sentences are long.
In sum, participants responded immediately to the V3 anomalies, as reflected in longer fixation durations and more regressions out on the subject and subsequently the verb. Participants recovered quickly, already on the word after the misplaced subject and verb, that is, the object. The effects of V3 were stable across contexts with short or long sentence-initial adverbials. Finally, participants generally read faster and regressed less for later trials.

Effects of V3: a prominent anomaly
Our results are in line with previous EEG studies of Swedish V3 (e.g., Andersson et al., 2019), as we also found that V3 after temporal adverbials affects online processing.
Previous eye-tracking studies with ungrammatical items have addressed morphosyntactic anomalies, for example, agreement errors (Dank et al., 2015; Deutsch & Bentin, 2001; Lim & Christianson, 2015; Pearlmutter et al., 1999), anomalous verb conjugations (Braze et al., 2002; Ni et al., 1998), and randomly transposed words (Huang & Staub, 2021). Their results varied regarding the time course of the effects found. As expected, our results were similar to those of Huang and Staub (2021), whose word order manipulation caused early and sustained disruption on the critical word. Furthermore, our results are similar to those of the studies of gender agreement in Hebrew (Dank et al., 2015; Deutsch & Bentin, 2001), as both of these also found effects on early (including first fixation duration) and later measurements. The only other study that included first fixation duration was Lim and Christianson (2015), who surprisingly did not find effects of missing subject-verb agreement on regressions out or first fixation duration in English. Based on Huang and Staub (2021), our study of V3, and the studies of Hebrew (Dank et al., 2015; Deutsch & Bentin, 2001), it seems that word order anomalies and morphosyntactic anomalies elicit the same responses, with similar time courses. However, as our experiment does not directly compare the two, it remains uncertain whether they differ in prominence during reading. A behavioral error detection study in Danish shows that there are indeed differences in prominence. High school students underlined different anomalies (syntactic, morphological, and orthographic) in texts under time pressure. As many as 71% of the V3 anomalies were discovered, compared to 59% of anomalous verb conjugations and 55% of gender mismatches in NPs (Søby et al., to appear). Behavioral data from our eye-tracking study confirm that V3 is a prominent anomaly.
In a post-experimental interview, all participants either reported noticing the word order anomalies or, if they did not mention them initially, confirmed that they had noticed them. Also, a different set of participants, who carried out a correction task when norming the stimuli, corrected 95% of sentences with V3.
The reaction to V3 anomalies in our study was immediate, as reflected in effects on early measurements on the subject. Previous eye-tracking studies that compared grammatical but non-canonical OVS word orders to canonical SVO word orders in Spanish (e.g., Gattei et al., 2021) primarily found effects on later measurements. The early effects in our study and in Huang and Staub (2021) therefore seem unique to ungrammatical, not just atypical, word order. This suggests that the degree of acceptability of non-standard variation has consequences for the reactions seen in the eye-tracking record. Similarly, Sayehli et al. (2022) suggested, based on their EEG study, that V3 after kanske 'maybe' (which is a more acceptable construction) was processed differently than V3 after other adverbials.

Adverbial length does not affect processing of V3
To test whether the length of the preceding constituent affected anomaly processing, we manipulated the length of the first constituent. However, the manipulation did not result in crossing interactions in the critical regions, suggesting that the effects of V3 are stable across contexts with short or long adverbials. Instead, we found main effects of length on a few measurements for the subject, verb, and second adverbial, which could be explained by a heavier working memory load in the long conditions. However, since these were accompanied by unexpected effects of length for a few measurements on the first adverbial, subject, and object, the interpretation is uncertain. Braze et al. (2002) also examined whether readers' sensitivity to anomaly detection and anomaly processing is affected by variation in processing load prior to the anomaly. They hypothesized that "[i]mposing a decoding challenge prior to the anomaly might plausibly reduce a reader's capability to cope with the anomaly" (Braze et al., 2002, p. 4). They varied the length and frequency of the subject nouns preceding the anomalous verbs. Length and frequency were correlated, so that long nouns (mean length: 9.94 letters) were reliably lower in frequency than short ones (mean length: 5.39 letters). However, they found no consistent effects of length, possibly due to the relatively small difference in length between the nouns. The difference between short and long conditions in our experiment was larger. We initially assumed that longer (and less common) adverbial phrases are more demanding on working memory until the point of the anomaly than short (and frequent) ones. Thus, we expected a (larger) effect of length for the ungrammatical sentences (i.e., an interaction), manifested as longer fixation durations and more regressions in the critical regions after long adverbials compared to short ones. Yet, length effects could also manifest as less disturbance after long adverbials.
Participants might overlook more anomalies in the long condition, that is, increased processing load prior to the anomaly might camouflage its presence.
A corpus study of learners' production of written Danish by Søby and Kristensen (to appear) found that V3 anomalies occur most frequently after subordinate clauses, for example, Selv om det er rigtig sjovt, jeg [S] savner [V] dig! 'Even though it is a lot of fun, I miss you!' Although the length of the adverbial did not affect the processing of the anomalies in our study, there may be differences between processing the long adverbials in our study and the even lengthier and more structurally complex subordinate clauses in naturally occurring V3 anomalies.
The (non)finding regarding sentence-initial adverbial length is supported by data from the Danish error detection study (Søby et al., to appear), which found no significant differences in the probabilities of discovering V3 anomalies after short vs. long adverbials. Furthermore, although the EEG studies of Swedish V3 (Andersson et al., 2019; Yeaton, 2019) included a length manipulation of the sentence-initial adverbials, they did not report results regarding length effects.
Effects of trial: Task adaptation or syntactic adaptation to V3?
It is well documented that participants can adapt to the experimental task and perform faster and better during an experiment (e.g., Kristensen et al., 2014; Prasad & Linzen, 2021). An interesting question is whether participants also adapt to word order anomalies, such as V3. According to prediction theory, language users constantly update their expectations about language input (Kristensen & Wallentin, 2015; Levy, 2008). Therefore, it may be that the first occurrence of a word order anomaly results in a surprisal effect and disrupted eye movements, but that, for later occurrences, readers update their expectations for language input and adapt to the anomaly at hand.
In this study, we found that participants in general read sentences faster for later trials, that is, an adaptation effect. This effect was seemingly larger for ungrammatical sentences. In the analysis of the five sentence regions, we also found effects of trial in all regions and on several measurements (see Table 3), as well as crossing interactions between grammaticality and trial order for total duration (in the first three regions) and for regression path duration (in the critical regions), suggesting that adaptation may be greater in ungrammatical sentences. However, as an anonymous reviewer noted, the current study design does not allow us to determine whether the effects of trial result from syntactic adaptation to V3 or simply from task adaptation. The speed-up in processing time could reflect a shift in task-related strategies. Participants might lose focus toward the end of the experiment and read faster, or learn that the comprehension questions can be answered correctly with less re-reading. As pointed out by the reviewer, task-related effects might not reliably affect early processing measurements, such as first fixation duration and gaze duration, but they are likely to affect regression strategies (Weiss et al., 2018), and thus the late measurement, total duration, as well as regression path duration, which includes refixations on previous text. Task adaptation predicts a main effect of trial, but it could also predict an interaction between grammaticality and trial if the grammatical conditions have floor-level regressions to begin with. The fact that first fixation duration is never affected by trial order, and that the interactions between grammaticality and trial are only observed in regression path duration and total duration, speaks in favor of the adaptation effect being due simply to task adaptation rather than to satiation toward V3 (or to a combination of the two).
Going forward, better-suited study designs could examine adaptation to V3. However, finding a task that is less vulnerable to strategic processing is difficult. V3 sentences do not express different propositional content from their V2 counterparts, and therefore one cannot ask control questions where the anomaly is crucial. One option is to use a between-group design like that of Prasad and Linzen (2021) and compare V3 effects in two groups of participants: one exposed to V3 sentences prior to the actual experiment, and one exposed to filler sentences. In this way, it could be clarified whether there is syntactic adaptation "over and above" task adaptation (Prasad & Linzen, 2021, p. 19). Also, one could test participants with extensive exposure to V3, for example, through a spouse who speaks Norwegian as an L2 or through friends who speak the multiethnic urban vernacular, to see whether they react less to V3. Adaptation to non-standard syntax after extensive exposure, that is, a change in predictions based on non-standard input, would speak in favor of prediction-based approaches to sentence processing (e.g., Christiansen & Chater, 2016).

Applications of the study
There is surprisingly little research on native speakers' processing of non-native or non-standard syntax. The current study used manipulations based on naturally occurring anomalies typical of L2 learners, increasing the ecological validity.
Thus, the results can be valuable to research on processing of non-standard language varieties, including future models of sentence processing, which should be able to accommodate "noisy" input from non-proficient language users and other types of non-standard variation. The study may also contribute to research on L2 processing by providing a useful baseline for comparison. The study could, for example, be repeated with two groups of L2 speakers of Norwegian (one whose L1 features V2, one whose L1 does not) to examine crosslinguistic influence, as in Andersson et al. (2019). Furthermore, our study is a first step in helping language instructors prioritize which aspects of grammar to focus on in an often tight curriculum. The behavioral data from the Danish proofreading study (Søby et al., to appear) indicate that V3 is noticed more than other common L2 anomalies. However, future studies on online processing of other L2 anomalies in Norwegian are needed to make a direct comparison with processing of V3 in this study.
Norwegian, Danish, and Swedish are to a great extent mutually intelligible (Vikør, 2015). Compared to Danes and Swedes, Norwegians are described as being more receptive to linguistic variation (Torp, 2004). In the Norwegian "polylectal" language situation, dialect use is well accepted, with dialects used widely in all registers and contexts, and there is no officially codified spoken standard variety of the language (Havas & Vulchanova, 2018; Røyneland, 2009). Furthermore, Norwegian has two distinct written standards, Bokmål ('Book Language') and Nynorsk ('New Norwegian'), both taught in school. Even in this context, with active diglossia at both the spoken and written level, including grammar, we find clear responses and sensitivity to syntactic anomalies. Therefore, we expect that native speakers of other V2 languages will show the same, or an even larger, degree of sensitivity to V3 anomalies. Indeed, Andersson et al. (2019) found ERP effects in the processing of V3 in Swedish. Interestingly, that study, which also included learners of Swedish, found that effects were more native-like for German learners, whose L1 also features V2, than for English learners. Thus, future controlled comparisons between native speakers' and L2 learners' sensitivity to syntactic anomalies, and of the impact of learner proficiency and language background, are in order.
Tolerance for various anomalies can be modulated by participants' perception of the speaker or experimenter, so that tolerance and willingness to repair are higher for non-native speakers (Gibson et al., 2017; Hanulíková et al., 2012; Konieczny et al., 1994). We do not know whether the participants in our study perceived the author of the stimuli as a non-native speaker, but given the association between V3 and immigrant status (Freywald et al., 2015), combined with the relatively high number of anomalies in the stimuli, including the fillers, it seems likely. The study was conducted by a Danish experimenter in Danish. This might have affected the participants: at the first appearance of an anomaly, some participants asked whether the experimenter was aware that there was a mistake. However, even if they were affected by the non-nativeness of either the experimenter or the stimuli, they still responded to the V3 anomalies. Whether tolerance toward V3 can be modulated could, for example, be tested in an EEG paradigm similar to that of Hanulíková et al. (2012), where the P600 effects of Dutch gender agreement errors disappeared when the sentences were presented in a foreign accent. If morphological and syntactic processing are similar in this respect, we would expect a similar decrease in responses to ungrammatical V3 in Norwegian when it is produced by speakers with a foreign accent and by speakers of the multiethnic urban vernacular.

Conclusion
The present study demonstrates the consequences of using non-native syntax in written production aimed at native speakers. The study contributes new knowledge to the relatively unexplored field of native speaker responses to naturally occurring anomalies, for example, those produced by L2 learners of the language. Hopefully, this knowledge can be used to create more robust sentence processing models in the future, which can accommodate various types of "noisy" input from non-proficient language users and other types of non-standard variation.
Our results show that native speakers react immediately to V3 word order, as reflected in longer fixation durations and more regressions out on the subject and subsequently on the verb (for reading measurements reflecting both early and later stages of processing). However, participants also appear to recover from the anomaly just as quickly, already on the following word. The effects of grammaticality on fixation durations and regressions out are stable across contexts with short or long sentence-initial adverbials.
We argue that V3 is a prominent anomaly in V2 languages, to which native speakers show sensitivity and which negatively affects processing. This first step in a line of potential future studies of online processing of other L2 anomalies in Norwegian can help teachers and learners at language schools prioritize which aspects of grammar to focus on.
Supplementary material. To view supplementary material for this article, please visit https://doi.org/10.1017/S0142716422000418