Phonological transfer effects in novice learners: A learner's brain detects grammar errors only if the language sounds familiar

Abstract
Many aspects of a new language, including grammar rules, can be acquired and accessed within minutes. In the present study, we investigate how initial learners respond when the rules of a novel language are not adhered to. Through spoken word-picture association-learning, tonal and non-tonal speakers were taught artificial words. Along with lexicosemantic content expressed by consonants, the words contained grammatical properties embedded in vowels and tones. Pictures that were mismatched with any of the words’ phonological cues elicited an N400 in tonal learners. Non-tonal learners only produced an N400 when the mismatch was based on a word's vowel or consonants, not the tone. The emergence of the N400 might indicate that error processing in L2 learners (unlike canonical processing) does not initially differentiate between grammar and semantics. Importantly, only errors based on familiar phonological cues evoked a mismatch-related response, highlighting the importance of phonological transfer in initial second language acquisition.


Introduction
Second language learners can acquire many aspects of a new language (L2) at surprisingly fast rates. In both naturalistic and artificial acquisition settings, novice learners show phonological (e.g., Gullberg, Roberts, Dimroth, Veroude & Indefrey, 2010), lexicosemantic (e.g., Dittinger, Scherer, Jäncke, Besson & Elmer, 2019;Gullberg et al., 2010;Kimppa, Kujala, Leminen, Vainio & Shtyrov, 2015), and even grammatical L2 knowledge (e.g., Cunillera et al., 2009;de Diego-Balaguer, Toro, Rodríguez-Fornells & Bachoud-Lévi, 2007;Gosselke Berthelsen, Horne, Shtyrov & Roll, 2020) after mere minutes of language contact or training. An important facilitating factor during such early stages of second language acquisition (SLA) is transfer. To this effect, initial learners often acquire words with native language (L1) morphology and phonology more accurately and quickly than words with unfamiliar morphological and phonological features (Havas, Taylor, Vaquero, de Diego-Balaguer, Rodríguez-Fornells & Davis, 2018;McKean, Letts & Howard, 2013). This is likely due to a more comprehensive neural processing of L1-like novel words which can draw on fine-tuned L1 networks. This claim finds support in electroencephalographic (EEG) studies which show that only words with familiar phonology induce rapid automatic word assessment and other pre-attentive processing mechanisms in initial learners (Gosselke Berthelsen et al., 2020;Kimppa et al., 2015). In the present paper, we extend the investigation of both initial L2 processing and the role of transfer by examining how novice learners initially process visually presented referents that mismatch lexicosemantic or morphosyntactic properties of acoustically presented words in a new language context. We investigate whether the learners respond differently to mismatches that are based on familiar and unfamiliar phonological cues.

Lexicosemantic and grammatical errors in L1 and L2
When a native speaker encounters a lexicosemantic or grammatical mismatch in the incoming language stream, their brain reacts to these errors with characteristic responses that can easily be observed using EEG measurements. A lexicosemantic error, for instance, will increase the amplitude of the N400 component. The N400 is a natural response to meaningful stimuli measured over the posterior part of the scalp. Its amplitude is in an inverse relation to the expectancy of an encountered stimulus with respect to context and world knowledge (Ganis, Kutas & Sereno, 1996; Kutas & Federmeier, 2014; Kutas & Hillyard, 1980). Hence, the component is particularly enhanced for lexicosemantic errors and can serve as an indicator of lexical and semantic accuracy. Sometimes, the N400 has even been observed for morphosyntactic manipulations (Barber & Carreiras, 2005; Guajardo & Wicha, 2014). In those cases, the morphosyntactic irregularities are either embedded in contexts that also include prominent semantic manipulations (Guajardo & Wicha, 2014) or constructions with a strong semantic load (i.e., N400 for noun-postnominal adjective agreement vs. left anterior negativity [LAN] for determiner-noun agreement, Barber & Carreiras, 2005). Most often, however, morphosyntactic processing is distinct from semantic processing. It typically evokes one or several of a range of sequential event-related potentials (ERPs): early left anterior negativity (ELAN), LAN, and/or P600. These components are enhanced and thus most easily elicited in contexts where specific parts of morphosyntactic processing are manipulated so as to fail, i.e., in error contexts.
The suggested functions of the different morphosyntactic components are as follows: the ELAN, with a latency of ∼150 ms, is believed to signal the assessment of a word's grammatical category, presumably based on the initial activation of its morphosyntactic features (e.g., Friederici, 2002; Gosselke Berthelsen et al., 2020; Neville, Nicol, Barss, Forster & Garrett, 1991). The LAN, subsequently, is similar to the lexicosemantic N400 in timing, but not distribution. It is largest at frontal electrodes, often left-lateralised, and is likely reflective of assessment of, for instance, gender or number relations (Molinaro, Barber, Caffarra & Carreiras, 2014; Roll, Gosselke, Lindgren & Horne, 2013; Schremm, Novén, Horne & Roll, 2019). This assessment is likely made possible by a rule-based integration of morphosyntactic and semantic content (Gosselke Berthelsen et al., 2020; Krott & Lebib, 2013). The P600, finally, is a positive response strongest at posterior electrodes at around 600 ms. It reflects utterance-level integration processes and is enhanced in contexts of encountered errors (morphosyntactic and sometimes semantic) due to a need for revision or repair (Kim & Osterhout, 2005; Osterhout & Holcomb, 1992; Roll, Horne & Lindgren, 2007). Given their different but often complementary assumed functions, the three major morphosyntactic components can occur together or separately. For the most part, they stand in clear contrast to the lexicosemantic N400.
Error processing has also been studied in second language learners. With respect to lexicosemantic mismatches in the L2, learners resemble native speakers and produce an N400 already at low proficiency levels. With respect to morphosyntactic processing, however, learners differ profoundly from native speakers. Rather than an ELAN, LAN, or P600 for grammatical irregularities, learners often produce an N400, especially at low proficiency levels (McLaughlin et al., 2010;Tanner, McLaughlin, Herschensohn & Osterhout, 2012). The N400 initially acts as a second language learner's default response to any linguistic inconsistency, (morpho)syntax included. During continued acquisition, language-specific regularities gradually become grammaticalised in the learner's brain (Steinhauer, 2014): at intermediate proficiencies, learners produce P600 responses (e.g., Tanner et al., 2012;Tokowicz & MacWhinney, 2005), followed by the LAN at high proficiency levels (e.g., Bowden, Steinhauer, Sanz & Ullman, 2013;Gillon Dowens, Vergara, Barber & Carreiras, 2010;Ojima, Nakata & Kakigi, 2005), and, very rarely, an ELAN (e.g., Hanna, Shtyrov, Williams & Pulvermüller, 2016;Hed, Schremm, Horne & Roll, 2019;Rossi, Gugler, Friederici & Hahne, 2006). Evidently, however, language processing can be accelerated dramatically in targeted experimental settings. Here, learners frequently show both lexicosemantic and morphosyntactic ERP effects with minimal instruction or exposure. The order in which the ERP components emerge stays the same. Yet, the P600 emerges already after hours or even minutes of training in experimental settings (Batterink & Neville, 2013;Davidson & Indefrey, 2009;de Diego-Balaguer et al., 2007;Friederici, Steinhauer & Pfeifer, 2002;Grey, Sanz, Morgan-Short & Ullman, 2018;Havas, Laine & Rodríguez-Fornells, 2017;Mueller, Hahne, Fujii & Friederici, 2005). 
The same holds true for the LAN and possibly the ELAN which, however, seem to depend more on intervening consolidation periods (Mueller, Hirotani & Friederici, 2007; Morgan-Short, Steinhauer, Sanz & Ullman, 2012). While a recent artificial learning study found an ELAN and a LAN without consolidation, this may well be explained by the fact that this study analysed canonical grammar processing rather than error processing (Gosselke Berthelsen et al., 2020). Collectively, the findings for error processing in L2 learners suggest that lexicosemantic error processing precedes grammatical error processing and that the latter develops gradually. Learners can process L2 words like native speakers, but it takes either time or a specific experimental focus for the error-related ERP responses to emerge. In experimental settings, language processing can emerge rapidly, even in initial learners and often, in fact, before behavioural performance transcends chance levels (cf. McLaughlin, Osterhout & Kim, 2004; Hed et al., 2019). However, some grammar-related ERP components (ELAN/LAN) have thus far only been observed after consolidation periods, at least in violation paradigms.

Transfer in the early stages of SLA and word-level tone
One factor that appears critical for how rapidly and naturally novel items and rules can be acquired and processed is transfer from the learner's native language on the basis of L1-L2 similarity. Transfer is relevant on many linguistic levels, including syntax (e.g., Foucart & Frenck-Mestre, 2011) and phonology (e.g., Gosselke Berthelsen et al., 2020; Kimppa et al., 2015). Transfer tends to affect early, pre-conscious processing more than late, post-conscious processing or offline responses (e.g., Andersson, Sayehli & Gullberg, 2019; Gillon Dowens et al., 2010; Gillon Dowens, Guo, Guo, Barber & Carreiras, 2011). With respect to phonological similarity, only novel words that are in accordance with the learners' L1 rules for sounds and sound structure are assessed and processed pre-attentively by initial learners. This has been found for a number of different pre-attentive responses: an early lexicosemantic gating component at ∼50 ms, for instance, shows rapid lexical trace formation only for words that have L1-like phonotactic (Kimppa et al., 2015) or prosodic structure (Gosselke Berthelsen et al., 2020). For words where L1 transfer is not possible, the lexicosemantic gating component is unaffected. Similarly, the fronto-central N1, strongly involved in automatic auditory processing, is significantly reduced for phonologically illegal pseudowords but not for legal pseudowords (Silva, Vigário, Leone Fernandez, Jerónimo, Alter & Frota, 2019). This may suggest that novel words with illegal phonology are not processed as words, but as unusual sound patterns. Finally, even pre-conscious grammar processing is affected by the possibility for phonological transfer. In a study on artificial word acquisition, Gosselke Berthelsen et al. (2020) found an ELAN response for pre-attentive morphosyntax activation only in learners for whom L1-L2 similarity was high.
Together, these studies thus indicate that, at least initially, only words with L1-like phonology can be processed outside of consciousness. This is likely a result of listeners being able to extend their neural subsystems, fine-tuned to the automatic processing of native sounds, to native-sounding foreign words, in line with the concept of L1-L2 transfer. An exception to the illustrated limitations in pre-attentive processing has been observed in children (Partanen et al., 2017). Children produced changes in early components for non-native as well as native-sounding words. This is likely due to larger neural plasticity in the developing brain. However, the children still displayed clear hemispheric differences in the processing of phonologically native and non-native novel words, thus indicating that phonological similarity and transfer have an important impact on pre-attentive processing in all learners, even children. Presumably as a consequence of transfer and the possibility for pre-attentive processing, unfamiliar phonology also sometimes negatively affects behavioural learning outcomes (Havas et al., 2018; McKean et al., 2013) and impedes predictive processing (Lozano-Argüelles, Sagarra & Casillas, 2018).
Bilingualism: Language and Cognition, https://doi.org/10.1017/S1366728921000134
A grammar feature that lends itself well to the study of transfer in initial L2 acquisition is the so-called grammatical tone. Grammatical tone is a suprasegmental feature that is added onto the segments (vowels, consonants) of a word to make grammatical distinctions (e.g., case, gender, or number). Since the tone itself as well as its grammatical content are independent, additive features, the word's lexical and non-tonal grammatical meaning (e.g., suffixes) can be accessed irrespective of the tone. The tone's grammatical meaning, on the other hand, can only be accessed through the tone itself.
If words with grammatical tone are taught to (beginner) learners from tonal and non-tonal language backgrounds, it is possible to directly compare general acquisition abilities (segmental features) to potentially transfer-affected acquisition (suprasegmental features) within the same words¹. Importantly for the purpose of the current study, this can be extended to a direct comparison of initial learners' error processing (i.e., errors across segmental features) and error processing in light of potential transfer (errors in the suprasegmental tone dimension). An important premise, of course, is that L1-L2 transfer is possible for tone. This has, indeed, been observed behaviourally, at least when the L2 tone is less complex than the L1 tone and similar in form and function (e.g., Braun & Johnson, 2011; Hallé, Chang & Best, 2004; So & Best, 2010; Wayland & Li, 2008). Neurophysiological accounts of transfer effects in L2 learners of tone languages, on the other hand, are rare. However, studies using mismatch negativity (MMN) paradigms have observed transfer-based modulations of L2 tone perception. To this effect, only learners with a tonal L1 showed modulations of MMN effects based on tone functionality and not only pitch height (Shen & Froud, 2019; Yu et al., 2019). For post-perceptive processing, no N400 for lexical tone mismatches or LAN/P600 for mismatches in grammar-related tone is found in natural L2 learners with a non-tonal L1 (Gosselke Berthelsen et al., 2020; Pelzl, Lau, Guo & DeKeyser, 2019). In addition to these null results, the clearest indication of transfer was found for artificial, grammatical tone in Gosselke Berthelsen et al. (2020): a rapidly emerging left anterior negativity (LAN), a P600, as well as early, pre-attentive processing (i.e., word recognition effect and ELAN) emerged in learners with a tonal L1. Non-tonal learners, in contrast, showed no evidence of pre-attentive processing and a later LAN onset.
Thus, we see distinct transfer effects for L2 tone acquisition and processing. We, therefore, suggest that grammatical tone is well suited for examining how learners, differentially affected by transfer, differ in their processing of L2 errors.

Word-picture association learning and picture processing
While transfer is an important and often deliberately studied feature in initial L2 acquisition, studies attempt to limit the EXPLICIT presence of the learners' L1 in initial learning contexts to avoid a conscious mediation of the L2 through the L1. A useful method for circumventing overt L1 exposure during initial L2 learning is word-picture association learning where new L2 words are taught with the help of pictures. Interestingly, studies using this paradigm have uniformly found that picture associations can install rapid lexicosemantic (Dittinger et al., 2019;François, Cunillera, Garcia, Laine & Rodríguez-Fornells, 2017;Havas et al., 2017;Yang & Li, 2019) or grammatical (Gosselke Berthelsen et al., 2020;Havas et al., 2017) processing in novel words. For this to occur, learners must be aware of the lexical and grammatical content of the meaning-assigning pictures. Relatively little, however, is known about how this affects the processing of grammatically relevant features (e.g., number or gender) in meaning-carrying image referents themselves. We know from early N400 literature, that at least the N400 appears to be largely amodal and increases for inconsistent sentence endings regardless of whether they are expressed through words or pictures (e.g., Ganis et al., 1996;Kutas & van Petten, 1990;Nigam, Hoffman & Simons, 1992). In fact, even pictures primed with single auditory words show language-like N400 modulations (Pratarelli, 1994). Processing of grammatical properties in pictures, on the other hand, has been studied considerably less. One study that did investigate this found a LAN (but no P600) for gender mismatched pictures in sentential contexts (Wicha, Moreno & Kutas, 2003). This suggests that at least some lexicosemantic and morphosyntactic processes can be activated by pictures. 
Based on these findings, we assume that lexicosemantic and grammatical errors in the picture of a word-picture association paradigm should evoke the corresponding ERP responses (e.g., N400, LAN) and therefore constitute a useful tool for testing error processing in learners at the very beginning of the acquisition process.

The current study
Addressing a thus far understudied issue in initial second language acquisition, we investigated how error processing is realised during the initial acquisition of L2 words. As mentioned, learners at first process all L2 errors lexicosemantically. Grammatical components for error processing emerge later in both natural and artificial SLA. In artificial SLA, this process can, however, be sped up and grammatical processing can be found with minimal exposure. In the present study, we drew on a word-picture association paradigm that had previously produced rapid grammatical processing in novel words (Gosselke Berthelsen et al., 2020). The words in the present study contained grammatical tone and were tested on learners with and without a tonal L1. This allowed for a fine-grained study of phonological transfer effects. As a control condition, the words also contained a grammatical vowel change which both learner groups were equally familiar with from their L1. Importantly, we included mismatches in the word-picture pairs that became transparent at picture onset and analysed the learners' behavioural and neural responses to the resulting errors. Previous studies have illustrated that pictures in linguistic contexts¹ can evoke both lexicosemantic and grammatical responses. Moreover, we were aware that word-picture association paradigms can successfully be used to teach grammar; a fact which entails that the linguistic content of pictures in this paradigm must be evident to learners. Therefore, pictures were considered a suitable, straightforward way of studying the processing of morphosyntactic mismatches in novice learners. Based on previous results and due to the strong focus on grammar in this paradigm, we expected grammatical responses (e.g., LAN/P600) to errors in the initial learners. We further anticipated error processing to be facilitated by L1-L2 transfer.
To test for this, we elicited errors in a phonological dimension that one learner group was unfamiliar with (i.e., tone) or in dimensions that both learner groups knew from their L1 (i.e., vowel and consonant). We hypothesised that transfer effects in the form of reduced or missing mismatch responses would only appear in the unfamiliar tone condition, since the tone was an additive feature and the remainder of the word could be acquired independently of the tone.
All experiments were conducted in agreement with the ethical guidelines for experiments in the Declaration of Helsinki and carried out in the Lund University Humanities Lab. All NTL1 participants were exchange students at Lund University. The NTL1s had no extensive knowledge of Swedish (highest self-assessed level of proficiency was B1 [=intermediate] of the Common European Framework of Reference, mean proficiency was A1 [=beginner]). Despite having lived in Sweden for some months (M = 23 weeks, SD = 16) and having studied Swedish to some extent (M = 10 weeks, SD = 7), they only engaged with Swedish actively for on average 5 hours per week (i.e., studying, conversation, listening, SD = 7) and passively for 10 hours (i.e., Swedish spoken in the background, SD = 8). Only one of the NTL1 participants reported having heard of Swedish word accents and none were aware of their ties to grammar nor did they include them in their vocabulary acquisition routines. Additionally, we tested the NTL1s on their perception of Swedish word accents and both behavioural and neuropsychological data suggested that they processed them as linguistically irrelevant.
In keeping with participant sizes in previous studies with related paradigms (e.g., 23: de Diego-Balaguer, Rodríguez-Fornells & Bachoud-Lévi, 2015; 19: Havas, Laine & Rodríguez-Fornells, 2017; Leminen, Kimppa, Leminen, Lehtonen, Mäkelä & Shtyrov, 2016), we originally recorded 24 participants per group. As each participant had their own set of stimuli, multiples of eight allowed us to counterbalance the distribution of the four grammatical features over vowels and tones. Prior to analyses, we excluded one participant from each group, due to rudimentary knowledge of a tone language (Chinese) and experienced discomfort with the EEG equipment, respectively. Closely inspecting the behavioural data, we subsequently found that some participants performed poorly on the acquisition task of the main study and remained at chance level for error detection throughout the first session of the experiment (see section 2.3 below). Closer inspection of the distribution of the accuracy data (Figure 1) indicated a discontinuity in accuracy between 55 and 60% that we interpreted as a natural cut-off point. We therefore excluded the four participants (2 NTL1s, 2 TL1s) whose accuracy during the first session stayed below 55%, resulting in final group sizes of 21 participants.

Stimuli
We used a learning paradigm that has previously been reported to be efficient in learning novel lexical and morphosyntactic features with tonal and non-tonal distinctions (Gosselke Berthelsen et al., 2018). Participants learned new word-picture associations by hearing an auditorily presented novel word followed by a meaning-assigning picture, as described in more detail below.

Auditory stimuli
For the sound stimuli, we constructed pseudowords from consonants and vowels, which were produced by a native speaker of Russian (rather than German or Swedish to prevent differential carry-over effects) and chosen on the basis of being phonologically close in all three languages. Using Praat (Boersma, 2001), consonant and vowel durations were standardised. Subsequently, consonants and vowels were spliced into 24 simple CVC pseudowords and four different tones were added to the words with the help of pitch-manipulation. See Figure 2 for an overview of all words and an example stimulus.

¹ Swedish has tones (word accents) that are strongly associated with grammatical suffixes. A low tone (accent 1) on the stem of the word fisk ("fish"), for instance, can be followed by the definite singular suffix -en (fisk₁-en, "fish-the") but not by the plural suffix -ar (*fisk₁-ar, "fish-PL"). The plural instead requires a high tone (accent 2) on the word stem: fisk₂-ar.

Picture stimuli
We chose images of humans portrayed as 24 different professions for the pictorial stimuli. For each profession, we constructed eight different images: one picture each with one, two, three, or four female or male workers, respectively. This way, in addition to the lexical information (profession), we were able to visually implement two grammatical categories: gender (masculine-feminine) and number (singular-plural). For the control condition, nonmeaningful pictures were constructed by scrambling all pixels from the profession pictures. See Figure 2 for an overview of all professions and an example set of picture stimuli. See the supplementary material for an overview of all used picture stimuli.

Experimental procedure
All participants took part in two acquisition sessions, 24 hours apart. They sat on a chair one meter from a computer screen and kept their index fingers on a response box that stood on a table in front of them. The experiment was controlled by E-Prime 2 stimulation software (Psychology Software Tools Inc., Sharpsburg, PA). All auditory stimuli were routed through a GSI 16 Audiometer (Grason & Stadler Inc., Eden Prairie, MN) and presented at 70 dB SPL through a pair of circumaural earphones (California Headphone Company, Danville, CA). The presentation level was verified using a Brüel and Kjaer 2231 sound level meter with a 4134 microphone in a 4153 Artificial Ear. After a short instruction and training session (only in session 1), participants were asked to learn 24 words consisting of six professions that could be masculine or feminine (gender) and singular or plural (number). Professions were always expressed through the initial and final consonant (consonant frame). Gender and number were always expressed by vowel or tone, equally distributed across participants. Each participant had a unique set of words (i.e., list of six professions and combination of vowels, tones and their respective meaning). All vowels and tones were used equally often as part of target and control words. For an example set of stimuli, see Table 1.  Using the procedure shown in Figure 3, all 48 word-picture combinations from each participant's target and control set were presented 30 times in each of the two sessions. In every full repetition of all 48 experiment items (cycle), three pseudorandomly selected meaningful word-picture trials were followed by overt questions pertaining to the correctness of the previous word-picture pair. Every cycle also included three additional question trials where there was a mismatch between word and picture related to one of the learned categories: number, gender (i.e., based on tone or vowel) or profession (based on consonants). 
Participants responded to all question trials by pressing a button on the response box and overt feedback on their response was given. After every 10 cycles (∼40 minutes), participants were offered a break.
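The design arithmetic in this procedure can be checked with a short sketch. It is only an illustration under stated assumptions: the profession labels are placeholders, and the control set size of 24 is inferred from the reported total of 48 items minus the 24 target words.

```python
from itertools import product

# Placeholder labels (assumption): the text gives only the categories and counts.
PROFESSIONS = [f"profession_{i}" for i in range(1, 7)]  # six professions per participant
GENDERS = ["masculine", "feminine"]
NUMBERS = ["singular", "plural"]


def target_words():
    """Return every profession x gender x number combination."""
    return list(product(PROFESSIONS, GENDERS, NUMBERS))


def session_counts(n_target, n_control=24, cycles=30, matched_q=3, mismatched_q=3):
    """Trial counts for one session.

    n_control=24 is inferred from the reported 48 items per cycle.
    """
    items_per_cycle = n_target + n_control
    return {
        "items_per_cycle": items_per_cycle,
        "item_presentations": items_per_cycle * cycles,
        "question_trials": (matched_q + mismatched_q) * cycles,
        "mismatch_trials": mismatched_q * cycles,
    }


words = target_words()
counts = session_counts(len(words))
print(len(words), counts)
```

Under these assumptions, each session contains 30 presentations of every item and 180 question trials, half of which involve a word-picture mismatch.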

Electroencephalography (EEG)
Throughout the experiment, the participants' brain activity was recorded using 64 Ag-AgCl EEG electrodes mounted in an electrode cap (EASYCAP GmbH, Herrsching, Germany), a SynAmps 2 EEG amplifier (Compumedics Neuroscan, Victoria, Australia), and Curry Neuroimaging Suite 7 software (Compumedics Neuroscan). To monitor eye movements, horizontal and vertical bipolar electrooculogram channels (EOG) were added. Impedances for the scalp channels were kept below 3 kΩ and below 10 kΩ for the eye channels. The left mastoid (M1) was used as online reference and the frontocentral electrode AFz as ground. EEG was recorded with a 500 Hz sampling rate using DC mode and an online anti-aliasing low pass filter at 200 Hz.
The recorded EEG data was then re-referenced offline to average reference, and subsequently filtered with a 0.01 Hz high pass and a 30 Hz low pass filter. ERP epochs of 1200 ms including a 200-ms baseline were extracted for word and picture stimuli at word disambiguation point and picture onset, respectively. The extracted epochs included both learning and question trials. Independent component analysis (ICA) (Jung, Makeig, Humphries, Lee, McKeown, Iragui & Sejnowski, 2000) was conducted on all epochs. ICA components representing eye artefacts and single bad channels were removed. Finally, all epochs still exceeding ±100 μV were excluded. Only the epochs pertaining to picture stimuli were considered for the analysis.

Statistical analysis
Behavioural data analysis was conducted separately for the two dependent variables 'Response Time' (RT) and 'Response Accuracy', which were recorded for question trials. RT measures were log-transformed to normalise the distribution and submitted to a mixed Analysis of Variance (ANOVA) in IBM SPSS Statistics 25 (International Business Machines Corp., Armonk, NY, United States) with the within-subject factor 'Type' (consonant-related word-picture mismatch [henceforth, consonant mismatch], vowel-related word-picture mismatch [henceforth, vowel mismatch], tone-related word-picture mismatch [henceforth, tone mismatch], and correctly matched word-picture trials [henceforth, matched pictures]), the temporal factors 'Session' (session 1 vs. session 2) and 'Half' (first vs. second half of a session), and the between-subject factor 'Learner Group' (TL1s vs. NTL1s). For the response accuracy data, d′ scores were calculated for each participant, condition and time window by comparing z transforms of hit rates (correct acceptance of matched trials relative to the total number of matched trials) and false alarm rates (incorrect acceptance of mismatched trials relative to the total number of mismatched trials, by mismatch type). Log-linear corrections (Hautus, 1995) were applied to extreme values (i.e., 0 and 1). The d′ scores were submitted to a mixed ANOVA with the factors 'Mismatch Type' (consonant, vowel, tone), the temporal factors 'Session' and 'Half' and the between-subject factor 'Learner Group'. Greenhouse-Geisser correction was used when applicable. Main effects and interactions were considered significant at a p-value of < 0.05. For pairwise comparisons, False Discovery Rate (FDR) corrections (Benjamini & Hochberg, 1995) were used.
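The d′ computation can be illustrated with a minimal sketch. The function name and arguments are hypothetical, and this version applies the log-linear correction (adding 0.5 to each count and 1 to each trial total) to every cell, which is one common reading of Hautus (1995); the analysis described here reports applying it to extreme values.

```python
from statistics import NormalDist


def d_prime(hits, n_matched, false_alarms, n_mismatched):
    """Sensitivity index d' = z(hit rate) - z(false alarm rate).

    Log-linear correction (assumed here for all cells): add 0.5 to each
    count and 1 to each trial total, so rates of 0 or 1 never occur.
    """
    hit_rate = (hits + 0.5) / (n_matched + 1)
    fa_rate = (false_alarms + 0.5) / (n_mismatched + 1)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)


# Hypothetical counts: 15 matched trials all accepted, 15 mismatched trials
# of one type all rejected. Without the correction, d' would be infinite.
print(d_prime(hits=15, n_matched=15, false_alarms=0, n_mismatched=15))
```

Equal hit and false alarm rates give d′ = 0, which corresponds to chance-level discrimination between matched and mismatched pairs.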
To find effects of interest in the ERPs, global root mean squares (gRMS) of the data were used to investigate when the learner groups' neural activity was maximal (cf. Lehmann & Skrandies, 1980). When there were peak latency differences between the learner groups, we submitted the participants' unweighted mean gRMS peaks to a 2-tailed independent samples t-test. If there were significant between-group differences for gRMS peak latency, we established different time windows for the ERP analyses for the respective peak and analysed the groups separately. Subsequent to the gRMS analysis, we inspected the ERPs at gRMS peak latencies to find the ERP effects which the peaks were related to. The gRMS curve reaches a maximum when there is a CHANGE in neural activity and gRMS peaks therefore often mark effect onset rather than effect peak (cf. e.g., Roll, Söderström, Mannfolk, Shtyrov, Johansson, van Westen & Horne, 2015). Thus, using gRMS maxima as effect onsets, we selected time windows of 70 ms for the semi-early peaks (150-300 ms) and 200 ms for late peaks (above 300 ms) for the ERP analysis.
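As a rough illustration of this step, the sketch below computes a gRMS curve from average-referenced data, locates its peak within a search range, and derives the analysis window from the peak. The function names and the list-of-lists data layout are assumptions made for the example, not the software actually used.

```python
import math


def grms(samples):
    """Global root mean square per time point.

    `samples` is a list of time points, each a list of average-referenced
    voltages with one value per electrode (assumed layout).
    """
    return [math.sqrt(sum(v * v for v in point) / len(point)) for point in samples]


def peak_latency(curve, times_ms, lo, hi):
    """Latency (ms) of the largest gRMS value within [lo, hi] ms."""
    candidates = [(g, t) for g, t in zip(curve, times_ms) if lo <= t <= hi]
    return max(candidates)[1]


def analysis_window(peak_ms):
    """Window starting at the peak: 70 ms for peaks up to 300 ms, else 200 ms."""
    length = 70 if peak_ms <= 300 else 200
    return (peak_ms, peak_ms + length)


# Toy data: three time points (0, 2, 4 ms) and two electrodes.
curve = grms([[1.0, -1.0], [3.0, -3.0], [2.0, -2.0]])
print(curve, peak_latency(curve, [0, 2, 4], 0, 4), analysis_window(185))
```

Treating the peak as the window start follows the reasoning above that gRMS maxima tend to mark effect onsets.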
Testing for ERP effects at the gRMS peak latencies, mean amplitudes over the selected time windows for each electrode and condition were submitted to a cluster-based permutation test to find electrode groups that differed significantly between conditions. Each permutation test included one of the mismatched conditions and the match condition. We also tested for interactions with the between-subject factor 'Learner Group' as well as the temporal factors 'Session' and 'Half' and combinations thereof (e.g., interaction with 'Half' in session 1) in all conditions. All permutation analyses were carried out with help of the nonparametric cluster-based permutation approach implemented in Fieldtrip (Maris & Oostenveld, 2007). Using the Monte Carlo method to account for large data sets, we ran 1000 random permutations of the data. Clusters of three or more electrodes that had a p-value of < 0.05 were considered significant.
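The logic of a permutation test can be sketched for a single electrode or electrode average. This is a simplified sign-flip test on paired mismatch-minus-match amplitudes and is not the FieldTrip implementation: it omits the clustering of neighbouring electrodes entirely. The function name, the seed and the example amplitudes are hypothetical.

```python
import random


def paired_permutation_p(mismatch, match, n_perm=1000, seed=0):
    """Two-sided Monte Carlo p-value for a within-subject amplitude difference.

    Under the null hypothesis the condition labels are exchangeable within
    each participant, so the sign of each paired difference is flipped at random.
    """
    diffs = [a - b for a, b in zip(mismatch, match)]
    observed = abs(sum(diffs) / len(diffs))
    rng = random.Random(seed)
    extreme = 0
    for _ in range(n_perm):
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        if abs(sum(flipped) / len(flipped)) >= observed:
            extreme += 1
    # Count the observed data as one permutation so that p is never 0.
    return (extreme + 1) / (n_perm + 1)


# Hypothetical mean amplitudes (microvolts) for 12 participants.
match_amp = [0.5] * 12
mismatch_amp = [-0.5] * 12  # consistently more negative, as for an N400-like effect
print(paired_permutation_p(mismatch_amp, match_amp))
```

A cluster-based approach additionally groups adjacent electrodes that exceed a threshold and compares the summed cluster statistic against its permutation distribution, which controls for testing many electrodes at once.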
Finally, we conducted two-tailed Pearson correlations between response times and effect amplitudes (i.e., difference between mismatch and match amplitudes for electrodes in mismatch clusters) for all emerging error-related clusters in order to examine possible relationships between error processing in novice learners and subsequent behavioural responses.
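A minimal sketch of the correlation step, assuming one effect amplitude and one mean response time per participant. The helper names and the numbers are made up for illustration, and the significance test for r is omitted.

```python
import math


def effect_amplitudes(mismatch, match):
    """Per-participant effect amplitude: mismatch minus match."""
    return [a - b for a, b in zip(mismatch, match)]


def pearson_r(x, y):
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical values for four participants.
effects = effect_amplitudes([-1.2, -0.4, -2.0, -0.9], [0.1, 0.2, -0.3, 0.0])
rts_ms = [820.0, 990.0, 760.0, 900.0]
print(pearson_r(effects, rts_ms))
```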

Behaviour
For the behavioural variable Response Accuracy, the mixed ANOVA produced a main effect of Mismatch Type, see Table 2. FDR corrected pairwise comparisons revealed that participants were significantly more accurate for vowel or consonant conditions than for tones. Further, mismatches in consonants were more accurately detected than those involving vowels. Learners improved throughout and between sessions, particularly from the beginning to the end of session 1. The non-tonal learners' tone detection ability was initially so poor that it differed clearly from vowel and consonant mismatches but increased over time and approached vowel mismatch levels towards the end of the learning sessions.
For Response Times, we found a main effect of Type (see Table 3). FDR corrected pairwise comparisons revealed that participants were faster at detecting consonant mismatches and vowel mismatches than tone mismatches (Table 3). Furthermore, Response Times improved throughout the sessions and between sessions. The strongest improvement took place within session 1. Vowel mismatch detection was not significantly faster than tone mismatch detection until session 2. There were no significant interactions with Learner Group.

Electrophysiology
Results for gRMS peak latency
Two gRMS peaks emerged in the data for both participant groups: ∼180 ms (visual N1 latency) and ∼370 ms (N400/LAN latency) after the picture onset. In the group average, the first peak was delayed for the TL1 group compared to the NTL1 group, cf. Figure 4. A 2-tailed independent samples t-test showed significant differences in average gRMS peak latency between the learner groups, t(40) = -2.22, p = .032. The TL1 group's first gRMS peak (M = 185 ms, SD = 12) was significantly delayed compared to the NTL1s' (M = 176 ms, SD = 13). Accordingly, the time windows for the first gRMS peak were 185-255 ms for the TL1 group and 176-246 ms for the NTL1 group (i.e., peak +70 ms). For the second gRMS peak (M TL1 = 373 ms, SD = 49, M NTL1 = 379 ms, SD = 44), there were no significant differences in timing. Hence, the time window submitted for the second gRMS peak was defined as 370-570 ms (i.e., ∼peak +200 ms).
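For readers who wish to retrace the group comparison, the following sketch computes a pooled-variance independent samples t statistic from summary values. The group sizes of 21 participants each are an assumption inferred from the reported degrees of freedom, not a figure stated in this section. Because the means and standard deviations are rounded, the result is close to but not identical with the reported statistic. The function name is hypothetical.

```python
import math


def pooled_t(m1, sd1, n1, m2, sd2, n2):
    """Independent samples t statistic with pooled variance, plus its df."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (m1 - m2) / se, df


# Rounded first-peak latencies from the text (NTL1 minus TL1);
# n = 21 per group is an assumption.
t, df = pooled_t(176, 13, 21, 185, 12, 21)
print(round(t, 2), df)
```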

Correlation between behavioural and neurophysiological results
When testing for correlations between response times and ERP effects for error processing (i.e., N400), we found no significant correlations in the consonant condition. In the vowel condition, there was a significant positive correlation between the vowel N400 and response times to vowel mismatches, r(40) = .370, p = .016. Because the N400 effect is a negative-going difference (mismatch minus match), a positive correlation means that the larger (i.e., more negative) the vowel N400 effect, the faster the response times to vowel mismatches (cf. Figure 5C). For the tone N400 in the TL1 group, we similarly observed a positive correlation between the tone N400 and response times to tone mismatches, r(19) = .452, p = .040. (For the non-tonal learners, in comparison, neural responses to tone mismatches and response times were not correlated: r(19) = .033, p = .889.)

Discussion
The present study investigated the processing of lexicosemantic and morphosyntactic errors in initial second language acquisition and the role of transfer in this process. In a word-picture association paradigm, learners acquired novel words with grammatical tone and, as a control, grammatical vowel change. All participants were familiar with systematic, grammatically distinctive vowel alterations from their L1. With respect to grammatical tone, however, the learners differed, such that they either were (TL1s) or were not (NTL1s) familiar with grammatically meaningful tone from their L1. This created the possibility of differential L1-L2 phonological transfer in the novel words. We occasionally presented pictures that were mismatched with the different phonological features of the preceding auditory words in the word-picture association paradigm. By doing so, we introduced lexicosemantic and morphosyntactic errors during the first four hours of acquisition. We expected to find neurophysiological responses related to lexicosemantic (N400) or morphosyntactic (E/LAN, P600) processing, respectively, for all phonological mismatch conditions in the TL1 learners. We anticipated grammatical ERP components in this learner group even though grammatical processing usually emerges relatively late in L2 learners. Acquisition conditions in the present study were optimised by allowing for complete L1-L2 transfer and through a strong focus on grammar. Furthermore, as mentioned in the introduction, we knew from previous studies that learners can deduce the grammatical content of the pictures in the word-picture association paradigm. We therefore assumed that grammatical ERP components, most likely a P600 and possibly a LAN or ELAN, would emerge for grammatically mismatched pictures in the tonal learners. 
For the NTL1 group, on the other hand, we expected reduced or missing ERP effects and possibly poorer behavioural performance for errors which were based on mismatches across the tone dimension but similar processing to the TL1 group for the other mismatch conditions.

Differential processing demands in beginner learners: Posterior N1
Somewhat unexpectedly, an early gRMS peak pointed to an important role of the visual perception-related posterior N1. The N1 (or N170), with a posterior distribution for visual input, is an exogenous response which is sensitive to task demands (Callaway & Halliday, 1982; Nash & Williams, 1982). Like other ERP components, the N1 is usually analysed with respect to amplitude differences between conditions or groups. The posterior N1 in the present study, however, showed timing rather than amplitude differences: the TL1 group's N1 was delayed in relation to the NTL1 group's N1. Such differences in N1 timing, although rare, have been attested in the literature and are generally associated with differentially effortful processing. In the auditory domain, for instance, word recognition tasks in children with specific language impairment have elicited a delayed N1 compared to the N1 response from a matched control group of normally developing children (Malins, Desroches, Robertson, Newman, Archibald & Joanisse, 2013). With visual input, similar results have been reported when stimuli were processed under differential task demands. Children diagnosed with ADHD, for example, had a delayed N1 response to light flashes during an active detection task compared to passive viewing (Callaway & Halliday, 1982). Similarly, healthy adults have shown a delayed N1 when tasked with stimulus identification rather than simple stimulus detection for moving visual stimuli (Fort, Besle, Giard & Pernier, 2005). In line with those studies, we believe that the N1 latency in the present study is related to differential cognitive demands for picture perception and identification in the two learner groups and that this is based on top-down influences. From the word-picture association, the tonal learners rapidly internalise the words' three phonological features (tone, vowel, and consonant) and their respective meaning. 
The activation of these features during word processing likely pre-activates the associated pictorial properties (number, gender, and profession) in the TL1 learners prior to the emergence of the picture. Thus, when the picture referent appears in the input, the TL1 learners are faced with the difficult task of pre-consciously checking and identifying the image against the three pre-activated categories. The NTL1 learners, on the other hand, have presumably only internalised two phonological categories, i.e., vowel and consonant, and their corresponding meaning, which influence their pre-conscious perception and the picture identification task. We believe that the automatic access to a greater number of pictorial properties, through pre-activation, increased automatic picture identification difficulty in the tonal learners, which manifested in a delayed posterior N1. Interestingly, there were no differences in either N1 latency or amplitude between mismatching or matching pictures in either participant group, suggesting that validity does not affect processing at this latency. Importantly, the N1 results show that the absence or presence of positive L1-L2 transfer in auditory words has a strong influence on the pre-attentive processing of upcoming pictorial referents. Hence, transfer not only affects the processing of the transfer-facilitated words themselves but also allows for pre-activation of closely related items in the immediate context via top-down processes, which influences word perception. This finding calls for more detailed investigations into the generalisability and scope of transfer-facilitated processing effects. Can a comparable N1 latency effect also be found for written word form identification and possibly even in the auditory modality? If so, how close to each other do the associated elements need to be? 
Can positive transfer effects on the first element in a non-adjacent dependency, for instance, influence the pre-attentive perception and identification of the second element further down the language stream? Addressing these and related questions could provide important insights for a more comprehensive understanding of the N1 in the perceptive processing of pre-activated elements in language contexts.
Mismatch responses in beginner learners: N400 as a response to all error types
For mismatch processing, we anticipated differential results depending on mismatch type. For semantic errors, which were contained in mismatches between the words' consonants and the profession depicted in the associated picture, we expected an N400 response. For morphosyntactic errors, which were elicited by mismatches of picture properties related to the words' vowels or tones, on the other hand, we expected to see morphosyntactic ERP components. The vowels and tones had clear and easily abstractable morphosyntactic content and L1-L2 transfer was possible. For comparable circumstances, previous studies on artificial acquisition rapidly elicited at the very least a P600 (e.g., Batterink & Neville, 2013; Havas et al., 2017) and sometimes even additional early frontal negativities (de Diego-Balaguer et al., 2007). However, we find no evidence of a P600-related late positivity in response to the morphosyntactic mismatches in the present study. Although surprising, the lack of a P600 might be explained by the fact that mismatches were between novel spoken words and pictorial referents. While N400 and LAN effects have previously been observed for pictures in linguistic contexts (e.g., Federmeier & Kutas, 2001; Wicha et al., 2003), a P600 was never reported in those studies. It is thus possible that the absence of the P600 in the present study is simply due to the mismatch format. Instead of a P600, we found an earlier negativity, as indicated by a prominent gRMS peak at ∼370 ms. This negativity was related to an N400 response and was found consistently for all mismatch types, but only when L1-L2 transfer was possible. The emergence of an N400 rather than the expected LAN may be an unintended result of the use of visual referents. The pictorial form of the mismatches may automatically have induced semantic processing. 
We assume that learners are aware of 'gender' and 'number' categories in a word-picture association paradigm, as the related visual cues in the pictures can install grammatical processing in the words (Gosselke Berthelsen et al., 2020). However, this does not necessarily entail that the same pictorial features could not in theory be processed semantically when they are presented as part of the pictures, especially in L2 contexts. That is to say, the abstracted, morphosyntactic label 'feminine' for the words could be processed concretely as the semantic feature 'female' during picture processing using visual features. For number, this would correspond to a grammatical vs numeric number distinction between words and pictures such as 'plural' vs 'three'. The pictorial form of the mismatches is thus an entirely possible explanation of the absence of P600 and LAN effects. An alternative explanation for the lack of morphosyntactic responses in the present data might be categorical differences between canonical processing and error processing in L2 learners. In a direct comparison of results elicited with the present word-picture association learning paradigm, we find an ELAN, LAN and P600 for canonical words but an N400 for mismatched pictures. This shows that learners can use the present paradigm to produce a full range of grammatical components for newly acquired words. These are elicited after just 20 minutes of acquisition in the comparison of learned words and meaningless pseudowords and thus for canonical processing (Gosselke Berthelsen et al., 2020). For errors in the word-picture pairs, in contrast, we do not observe morphosyntactic processing although this should in theory have been possible. Previous literature does find quickly emerging morphosyntactic components even in error contexts (Batterink & Neville, 2013; de Diego-Balaguer et al., 2007; Havas et al., 2017). 
Those studies, however, either had a much stronger, solitary focus on syntax (Batterink & Neville, 2013;de Diego-Balaguer et al., 2007), or a considerably less complex morphosyntactic system (Havas et al., 2017). The learners in the present study, instead, produced an initial L2 default response to all types of L2 errors: an N400 (Osterhout, Poliakov, Inoue, McLaughlin, Valentine, Pitkanen, Frenck-Mestre & Hirschensohn, 2008). This default response suggests that differentiations of mismatches as lexicosemantic or morphosyntactic do not take place. This differentiation might generally be of little importance to beginner learners. Error detection arguably takes precedence over error classification. In fact, this might be an important factor to keep in mind for our interpretation of grammar processing in second language learners. Traditionally, grammar processing is measured with the help of violation paradigms. The present results hint that this very tradition might potentially skew the picture. Learners might not in fact be incapable of grammar processing per se but instead simply be indifferent to different error types. They might then respond only to the unexpectedness of an erroneous item and the effect that this has on utterance-level meaning, something which would elicit an N400. Therefore, the late emergence of full-fledged grammatical processing of errors in both natural (e.g., Gillon Dowens et al., 2010; Tanner et al., 2012) and artificial second language acquisition (cf. e.g., Tanner et al., 2012;Morgan-Short et al., 2012) might be related to the fact that the learners' L2 processing resources are initially exhausted by canonical processing and the detection but not classification of errors. In summary, the present findings suggest a need for carefully evaluating the use of violation paradigms as measurements of L2 proficiency in beginner learners and, more generally, grammar processing in second language contexts. 
Future studies should more closely investigate potential categorical differences between canonical processing and error processing in L2 learners. Also, to untangle the contribution of error processing and picture processing from the current findings, it is essential that future studies test the processing of grammar errors contained in auditory words rather than pictures. This could even be achieved using the same word-picture association paradigm as in the present study but including intermittent reverse test pairs (i.e., picture-word) after every learning cycle, where matches and mismatches are then contained in the auditory words.
The second main finding in the mismatch processing responses is the different processing of tone-mismatched picture referents between learner groups. This emphasises once more the importance of L1-L2 similarity in initial L2 acquisition: only mismatches based on familiar phonological features elicited a mismatch response in the learners. This is presumably related to the inability to pre-consciously process unfamiliar phonological elements; a realisation that is well-attested in the rapid learning literature (e.g., Gosselke Berthelsen et al., 2020;Kimppa et al., 2015;Silva et al., 2019). The non-tonal learners cannot pre-attentively access the word's tone or its content, which in turn likely inhibits automatic pre-activation of the tone-related picture features. As there are no pre-activated features for the tone condition for the NTL1s, the appearance of a mismatch in the pictures will not lead to an expectancy-based mismatch, hence preventing the emergence of an N400. Notice, however, that the lack of an expectancy-related mismatch response in the NTL1 group does not entail that mismatches were not detected. Behavioural results show no significant differences between the tonal and non-tonal learners in tone mismatch detection accuracy (M TL1 = 68.9%; M NTL1 = 62.6%) or response times (M TL1 = 1724 ms; M NTL1 = 1853 ms), replicating findings which show that transfer affects online but not offline processing in L2 learners (e.g., Andersson et al., 2019). For features where transfer is not possible, learners presumably rely on later, more consciously evaluative processing to detect mismatches. This contrasts with pre-activation-induced mismatch detection for transferable features, which is likely initiated during pre-conscious processing of the feature and its associated meaning. 
Interestingly, the additive nature of the grammatical tone entailed that the non-tonal learners could process the remainder of the word, i.e., the consonant and vowel content, independently of the tone. We could, therefore, observe an N400 for errors based on mismatches across the vowel and consonant dimension even in the NTL1 group. The transfer-related effect of the words' phonological features on the processing of mismatches in pictures again shows that transfer affects not only the word itself but also the processing of tightly associated referents, demonstrating the overarching influence of L1-L2 similarity in SLA. As argued for the N1, above, further studies are needed to determine the potential reach of such transfer-based top-down effects on the processing of upcoming words. It would be useful to determine, for instance, whether only the immediately following element can be top-down mediated or whether this effect extends to any closely associated element, possibly with a number of intervening constituents.
Finally, in accordance with previous findings on the rapidness of initial L2 acquisition (e.g., Cunillera et al., 2009; de Diego-Balaguer et al., 2007; Dittinger et al., 2019; Gullberg et al., 2010; Gosselke Berthelsen et al., 2020; Kimppa et al., 2015), the N400 for mismatched pictures manifested already within 1 hour, i.e., within 15 learning cycles and 15 mismatch trials per condition. At this point, mismatch processing had manifested to such a degree that further changes over time were undetectable. Indeed, the effect was likely present already after 30 minutes. We see great improvements in the behavioural responses within the first session and know from previous literature that neurophysiological changes tend to precede behavioural ones (e.g., McLaughlin et al., 2010). A targeted investigation into the onset of error processing in L2 acquisition would require a reliable way of studying ERP responses at smaller time intervals: for instance, by means of testing a very large number of participants. Alternatively, future studies could use an acquisition paradigm with a considerably larger number of target words so that the number of error trials increases as a function of the total number of trials.

Behavioural responses to errors in initial learners
The two learner groups in the present study were virtually indistinguishable in their behavioural performance on mismatch detection. Although mean accuracy for tone mismatches was slightly lower in the non-tonal learner group and mean response times longer, between-group differences were not significant. The observed findings tie in nicely with previous literature showing that learners' offline responses are often unaffected by transfer (e.g., Andersson et al., 2019). Instead, facilitation effects based on L1-L2 similarity tend to manifest in online measures. Learners who cannot profit from positive L1-L2 transfer have to resort to different types of processing (for instance, conscious rather than pre-conscious; cf. Gosselke Berthelsen et al., 2020) but reach the same learning outcome.
A secondary finding in the behavioural data in the present study was an overall poor acquisition outcome for tone. Of all mismatch conditions, tone had the lowest accuracy and slowest response times. The inferior learning performance is likely based on two factors: a slight functional dissimilarity in the L1 and L2 tone in this study and a general difficulty in the acquisition of L2 tone. The tonal participants' L1 tone is associated with grammar but not itself grammatically meaningful, unlike the L2 tone in the present study. This might have negatively impacted on the TL1s' tone acquisition. Besides, L2 tone is generally difficult to acquire and even distinguish. To this effect, Yang and Chan (2010) found that even highly advanced learners of Chinese performed poorly in the perceptual discrimination of tone. For the least accurate tone contrast in their study, discrimination accuracy was below 30% in the advanced learners, compared to nearly 100% in the native speaker control group. This considerable discrimination difference is a clear indicator of the general perceptual and in turn acquisitional challenges associated with L2 tone. Coupled with the slight L1-L2 tone dissimilarity, this readily explains the lower acquisition outcome for L2 tone even in the tonal learners.
Interestingly, however, while there were no between-group differences for mismatch detection in the behavioural variables, we found a correlation between response times and the N400 such that response times were faster the larger the N400. This correlation emerged for vowel mismatches and, in the tonal learners, also for tone mismatches. The emergence of the correlation in the tonal learners (and a clear lack of correlation in the NTL1s) strengthens the suggested impact of the words' phonological properties, via pre-activation of linguistic content, on the processing and detection of errors in picture referents. Those learners who react most strongly to picture mismatches neurophysiologically (visible in larger N400 effects) are also fastest at identifying the corresponding error behaviourally. Both outcomes are likely mediated through expectation-related processes.

Conclusions
In the current study, we investigated how behavioural error detection and neurophysiological error processing proceeded in novice learners. We found that mismatching pictures in word-picture pairs uniformly elicited an N400 response, regardless of whether the mismatch was lexicosemantic or morphosyntactic in nature. We believe this to be indicative of the increased difficulty of error processing in second language learners, making the study of grammatical ERP responses in violation paradigms a questionable measure of L2 proficiency and processing. The N400 emerged well within an hour of acquisition and was correlated with behavioural responses. Importantly, we only observed an N400 when the error was based on a familiar phonological property in the preceding word, highlighting the importance of cross-linguistic transfer. Besides the N400, we found a visual perception-related posterior N1 for the pictures which was delayed for learners who could, via phonological transfer, internalise all three sound-meaning relationships and thus automatically pre-activate three rather than only two of the picture's associated visual properties.