The congruency effect in L2 collocational processing: The underlying mechanism and moderating factors

Abstract
The congruency effect, that is, the faster and more accurate processing of congruent multiword units, has been demonstrated in multiple studies. It is still unclear, however, what its underlying mechanism is and how congruency may interact with other factors. Using an acceptability judgement task, this study examined the congruency effect in immersive (Experiment 1) and nonimmersive (Experiment 2) L2 learners’ collocational processing while taking into account L2 collocation frequency, immersive learners’ L2 use, their length and starting age of immersion, nonimmersive learners’ length of instruction, and their L2 proficiency. The study also tested whether L1 counterparts of words in L2 collocations were activated. Nonimmersive learners showed a congruency effect in both processing speed and accuracy. In contrast, immersive learners were affected by congruency only in processing accuracy. Higher L2 collocation frequency, greater length of instruction, and higher L2 proficiency did not reduce the congruency effect, whereas longer duration of immersion improved the processing of incongruent items. An effect of L1 lexical frequency was found, an indication of L1 activation. Results are discussed in light of how L2 proficiency and experiences change the amount of L1 influence in L2 collocational processing.


Introduction
The first language (L1) plays a significant role in second language (L2) learning and processing. Studies have demonstrated L1 influence in the processing of L2 multiword units (MWUs) through the congruency effect. MWUs, often used interchangeably with other terms such as formulaic sequences/language, may be defined as word sequences that frequently co-occur, that together often represent a single concept (Wood, 2019), and that are considered conventional by speakers of a language (Siyanova-Chanturia & Pellicer-Sánchez, 2019). The main subcategories of MWUs include idioms (e.g., spill the beans), collocations (e.g., single parents), and phrasal verbs (e.g., pick up), among others. The congruency effect in MWU processing occurs when congruent MWUs (i.e., those that have word-for-word translations between the L1 and the L2) are processed faster and more accurately than incongruent ones (i.e., those without word-for-word L1 translations; e.g., Wolter & Gyllstad, 2013; Wolter & Yamashita, 2018; Yamashita & Jiang, 2010). An example of a congruent MWU between Chinese and English is strong wind, the direct Chinese translation of which is felicitous in Chinese. Strong tea, on the other hand, is incongruent, because its word-for-word Chinese translation does not constitute a felicitous expression in Chinese.
Little is known about the mechanism underlying the congruency effect, with some attributing the effect to L1 activation in L2 processing (e.g., Conklin & Carrol, 2018), and others to differences in the order of acquisition of congruent and incongruent MWUs (e.g., Wolter & Gyllstad, 2013). Few studies to date have directly tested these explanations. In addition, few studies on the congruency effect in MWU processing have considered how learner-related factors such as the learning context and item-related factors such as frequency (see Wolter & Gyllstad, 2013, for an exception) may moderate the congruency effect. This study was intended to fill these research gaps by taking into account L2 MWU frequency and by looking into the MWU processing of learners in two different contexts: English as a foreign language (EFL) and English as a second language (ESL). These two groups of learners differ mostly in their L2 experiences (e.g., the amount and quality of exposure), and their comparison will shed light on how experiences shape L2 MWU representations and the bilingual lexicon. The study also directly tested whether the L1 was activated in L2 MWU processing by examining whether the L1 lexical frequencies of individual words in an L2 MWU affected learners' processing.
The underlying mechanism of the congruency effect in MWU processing
The congruency effect has been demonstrated across multiple subcategories of MWUs, including idioms (e.g., Carrol & Conklin, 2014), binomials (Du et al., 2021), and collocations (e.g., Sonbul & El-Dakhs, 2020; Wolter & Gyllstad, 2011; Wolter & Yamashita, 2018; Yamashita & Jiang, 2010). In general, data collected from eye-tracking and reaction-time (RT) tasks have suggested that an L2 MWU with a word-for-word translation in the L1 is processed faster and more accurately than an L2 MWU without such a counterpart in the L1. Conklin and Carrol (2018) attributed the congruency effect to the automatic activation of L1 translation equivalents of words in an L2 MWU. They argued that in the case of congruent MWUs, the L1 translation equivalents of individual words would then activate corresponding L1 MWUs, which leads to the activation of the underlying concepts; alternatively, the concepts can be directly activated by L2 MWUs without L1 mediation. For incongruent MWUs, on the other hand, there are no corresponding L1 MWUs to be activated, and L2 learners have to access the meaning of L2 MWUs directly without relying on the L1. They explained that in the initial stages of L2 learning, L2 MWU representations are likely to be weak and thus the L1-mediated route may be faster, resulting in an advantage for congruent MWUs. Conklin and Carrol's explanation is in line with the revised hierarchical model (Kroll & Stewart, 1994), which hypothesized that access to L2 word meanings is mediated through the L1 at least at the beginning of L2 learning. Similarly, Jiang (2000) proposed a bilingual lexicon model where L2 forms are linked to L1 semantics in the early stages of L2 lexical development.
Applying Jiang's model to L2 MWU processing, congruent MWUs would be recognized faster because they are linked to existing L1 MWUs; the recognition of incongruent MWUs, in contrast, would be slower, because it requires rejecting the word-for-word L1 translations first and matching L2 MWUs to the right L1 expressions that bear the same meanings (Yamashita & Jiang, 2010).
A few studies interpreted their findings as counterevidence against L1 activation as the mechanism underlying the congruency effect. Wolter and Yamashita (2018), using an acceptability judgement task (AJT), included three types of adjective-noun collocations: congruent, incongruent, and translated collocations (those that exist only in the L1, i.e., L1-only). Their results revealed that although Japanese learners of English showed a congruency effect, they did not show a processing difference between L1-only collocations and noncollocational items. Similarly, Wolter and Yamashita (2015) used a double lexical decision task and found no difference between the processing of L1-only collocations and noncollocates. The existence of the congruency effect and the lack of a processing difference between L1-only collocations and noncollocates were used as evidence against the argument that L1 activation underlies the congruency effect. Further, Wolter and Gyllstad (2013), focusing on adjective-noun collocations, did not find an effect of L1 collocation frequency on L2 collocational processing, an indication that L1 collocations were not activated. Authors of these studies referred to the order of acquisition as an alternative explanation for the congruency effect. They argued that congruent MWUs are learned earlier than incongruent ones based on available L1 MWU knowledge. MWUs that are learned earlier then become more entrenched in the neural networks and thus enjoy a processing advantage. However, it is also possible that the lack of difference between L1-only and noncollocational items in these studies did not come from the lack of L1 activation but was in fact due to the nature of the task. In Wolter and Yamashita (2015, 2018), participants had to make a Yes/No decision on the acceptability of a collocation, as in an AJT, or on the lexicality of the component words in a collocation, as in a lexical decision task.
It is possible that participants activated the L1 translations of individual words in an L1-only collocation and initially leaned toward a Yes response based on their L1. They were then unable to find the L1-only collocation in their L2 mental lexicon to support that decision and had to revise their initial Yes response before eventually pressing the response key, resulting in a slowdown and thus a null difference between L1-only collocations and noncollocates. This potential explanation is supported by eye-tracking data from Carrol and Conklin (2017) and Carrol et al. (2016). In these studies, when participants did not have to make decisions, they read L1-only idioms faster than control items.
Regarding the null results of L1 collocation frequency effect in Wolter and Gyllstad (2013), it could well be the case that the L1 counterparts of individual words in a MWU rather than the whole L1 MWU are activated. In this case, L1 frequencies of individual words rather than of MWUs will have an effect on L2 processing. In the current study, I explored this possibility by examining whether L2 learners' processing of MWUs is influenced by the L1 translation frequencies of individual words in the MWUs. This approach is less likely to be subject to task effects than the use of L1-only MWUs and provides more direct evidence of L1 activation (Jiang et al., 2020).

Factors moderating the congruency effect
This study focused on the item-related variable of L2 MWU frequency and the learner-related variable of learning context and investigated how they interacted with congruency. These two variables were chosen to represent two aspects of language learning experiences. In particular, MWU frequency indexes the more specific micro-L2 experiences that learners have with individual MWUs, and the learning context represents the macro aspect of one's L2 experiences, that is, whether or not one is immersed in the L2 environment. Research into how these two factors moderate the congruency effect may tell a more complete story about how experiences influence the bilingual lexicon.

L2 frequency of multiword units
Frequency represents the prevalence of a linguistic unit in the input and indicates the amount of experience one has with the unit (Tremblay & Tucker, 2011). The more frequent the unit, the more likely one is to encounter it repeatedly. Repeated encounters with a structure make its representation stronger and more readily accessible in memory; and, particularly for MWUs, high frequency also leads to structure autonomy, that is, being processed as a unit rather than as individual parts (Bybee, 1985, 2006). Empirical research has suggested that more experience with a MWU, or higher MWU frequency, leads to better learning (e.g., Pellicer-Sánchez et al., 2022; Puimège & Peters, 2020), faster processing (e.g., Ellis et al., 2008; Yi, 2018), and faster production (e.g., Janssen & Barber, 2012). Although studies on L2 MWU processing and learning have largely confirmed the effects of L2 MWU frequency and congruency, not much is known about how congruency interacts with frequency. Yamashita and Jiang (2010) hypothesized that congruent MWUs may be less influenced by L2 MWU frequency because they can be accepted based on learners' L1, whereas for incongruent MWUs, L2 frequency may play a bigger role, as they receive little L1 support. Another possibility is that the congruency effect may attenuate among MWUs of high L2 frequency because as these MWUs are encountered repeatedly, their L2 representations become stronger and are subject to less L1 influence. So far, only Wolter and Gyllstad (2013) have examined this potential interaction. Counter to Yamashita and Jiang's predictions, their results revealed that the congruency effect applied to both high- and low-frequency collocations, and that both congruent and incongruent collocations were subject to L2 frequency influence. Further investigation is needed to understand in what circumstances this interaction may or may not occur.
Several studies have suggested that L2 learners' sensitivity to L2 frequency may vary based on learners' amount of L2 exposure: more L2 exposure is associated with reduced sensitivity to L2 frequency (Cop et al., 2015; Yi, 2018; Yi et al., 2017). If learners with different amounts of L2 experience vary in their sensitivity to L2 MWU frequency, it is possible that these learners' reaction to congruency is also influenced by L2 MWU frequency to different extents.

Learning context
The previously mentioned bilingual lexicon models by Kroll and Stewart (1994) and Jiang (2000), in addition to hypothesizing how the L1 and the L2 interact, also provided predictions on how the L1-L2 interaction develops as the L2 learner progresses. Kroll and Stewart (1994) argued that although access to L2 word meanings is mediated by the L1 at the beginning of L2 learning, the connections between L2 words and their meanings will grow stronger and thus reliance on the L1 would decrease when (1) the L2 becomes dominant possibly when learners are in an immersive environment and/or (2) L2 proficiency improves. From a developmental perspective, Jiang (2000) also argued that as learners' L2 exposure and proficiency increase, words in their L2 mental lexicon might connect to concepts directly without L1 mediation.
Findings regarding whether an immersive environment reduces L1 influence in L2 learning and processing in general have been mixed so far. In Linck et al. (2009), English learners of Spanish, after three months of studying abroad, showed attenuated L1 access in comprehension and production compared with students who learned the L2 in a classroom setting in their home country. Jiang et al. (2020) included Chinese learners of English who resided in the US and in China. The study revealed that when processing L2 words, only learners in China, but not those immersed in the US, exhibited L1 activation. Several studies, in contrast, demonstrated that learners who were immersed in the L2 environment still activated their L1 during L2 processing. Carrol and Conklin (2017), using eye tracking, showed activation of L1 forms but not meanings in L2 idiom processing by Chinese learners of English in the UK. In Thierry and Wu (2007) and Wu and Thierry (2010), Chinese learners of English in the UK activated L1 translations in the L2 processing of single words as shown by ERP data; however, such L1 activation was not detected in behavioral RT data.
With respect to L2 MWU processing, most studies recruited a homogenous group of L2 learners who were residing in their home country or in the target language country. These studies were therefore not positioned to explore the potential moderating effects of learning context on the congruency effect. One exception, Yamashita and Jiang (2010), compared EFL and ESL learners, who resided in China and the US, respectively, in their processing of adjective-noun and verb-noun collocations. The authors found that only EFL but not ESL learners showed a congruency effect in processing speed, which was interpreted as an indication of ESL learners' L2 collocation representations being independent of the L1. However, it is not known which aspects of the immersive environment (e.g., the length of residence, age of arrival, or the amount of L2 use) contribute to the development of L2 MWU representations. Although these variables have been found to affect language learning to different extents in an immersive context (length of residence: e.g., Amuzie & Winke, 2009; age of arrival: e.g., Baker, 2010; L2 use: e.g., Baker-Smemoe et al., 2014; Cubillos & Ilvento, 2012), they have not been studied in L2 MWU processing. For L2 learners residing in their home countries, it is unclear whether more instruction and/or higher proficiency levels will allow them to eventually develop strong L2 MWU representations like their counterparts in the immersive environment. For the variable of length of instruction, Muñoz (2014) suggested that in terms of oral proficiency, nonimmersive learners' syntactic complexity but not fluency or lexical diversity was predicted by the number of years of formal instruction. Again, this variable has not been explored in MWU processing studies. Regarding L2 proficiency, three studies on L2 collocational processing so far (Sonbul & El-Dakhs, 2020; Wolter & Gyllstad, 2013; Wolter & Yamashita, 2018) have examined its effect and showed mixed findings.
In Sonbul and El-Dakhs (2020), participants reacted to adjective-noun and verb-noun collocations in a timed AJT. The authors found an interaction between L2 proficiency and congruency, suggesting that the processing difference in RT between congruent and incongruent collocations was smaller for more advanced learners. This interaction, however, was not found in the untimed multiple-choice test. Wolter and Gyllstad (2013), also using an AJT, uncovered an L2 proficiency effect in the accuracy but not the RT data: the effect of L2 proficiency was more pronounced on the processing accuracy of incongruent than congruent collocations. Wolter and Yamashita (2018), on the other hand, did not reveal an L2 proficiency effect: learners displayed a congruency effect regardless of their proficiency. As Sonbul and El-Dakhs (2020) pointed out, Wolter and Yamashita (2018) may not have found an effect of L2 proficiency because they included proficiency as a categorical rather than a continuous variable; further, the proficiency range included in previous studies was somewhat narrow. Although Sonbul and El-Dakhs (2020) addressed these two gaps, their proficiency measure only assessed the 1k and 2k levels of vocabulary, which may not have fully revealed participants' proficiency. The inconsistent findings regarding the effect of L2 proficiency on congruency warrant further research with more comprehensive proficiency measures.

The current study
The goal of the current study was to test whether L1 activation of individual words in a MWU underlies the congruency effect and how the strength of the congruency effect may vary in relation to learners' learning context and L2 MWU frequency. Two experiments are reported. Experiment 1 involved ESL learners residing in the US. Experiment 2 was intended to see whether the results for ESL learners also apply to EFL learners in China. ESL learners' length of residence and age of arrival in the target L2 country, the amount of L2 use, and EFL learners' L2 proficiency and length of instruction were taken into account to further gauge the effects of learning context and language experiences.
The current study focused on the processing of adjective-noun collocations, a subcategory of MWUs. Collocations are more likely to be subject to L1 influence because, unlike other types of MWUs such as idioms, an L1 collocation usually has a corresponding expression in the L2, though sometimes with different word combinations (Wolter, 2006; Yamashita & Jiang, 2010). Adjective-noun collocations were chosen in this study because the lack of variability in whether a determiner is present before the noun in this type of collocation makes it easier to develop comparable items for the experiments (Wolter & Gyllstad, 2013; Yi, 2018). Because collocation frequency was included as a variable in the study, the frequency-based approach, which defines collocations as words that frequently co-occur (e.g., Gyllstad & Wolter, 2016; Nesselhauf, 2003; Sinclair, 1991; Webb et al., 2013), was followed when choosing experimental items. The following research questions were addressed:
1. Does the congruency effect show in the L2 collocational processing of ESL (Experiment 1) and EFL (Experiment 2) learners?
2. Are the L1 translation equivalents of individual words in an L2 collocation activated during ESL (Experiment 1) and EFL (Experiment 2) learners' L2 processing?
3. To what extent do L2 collocation frequency (Experiments 1 and 2), ESL learners' length of residence, age of arrival, and L2 use (Experiment 1), and EFL learners' L2 proficiency and length of instruction (Experiment 2) moderate the congruency effect?

Materials
Using the Phrases in English database (Fletcher, 2011), I retrieved a preliminary list of 66,285 adjective-noun collocations with a minimum frequency of 10 from the British National Corpus (BNC). The raw frequencies were then standardized using the Zipf scale (Van Heuven et al., 2014). The Zipf scale is logarithmic and does not contain negative values. Zipf frequency in this study was calculated by the formula log10(frequency per million words) + 3. The conversion resulted in a Zipf-frequency range of 2 to 4.44. Following Siyanova-Chanturia and Spina (2015) and Yi (2018), I divided the frequency range into four bins: 2-2.3, 2.5-2.8, 2.9-3.2, 3.3-4.44. Next, I went through the items in each bin to determine which were congruent and which were incongruent. A congruent collocation has a word-for-word translation from English to Chinese. That is, the most typical Chinese translations of the English collocation's components constitute an acceptable Chinese word or collocation that expresses the same meaning as the English collocation. For example, good and idea in good idea are typically translated into Chinese as 好 (hǎo) and 主意 (zhǔ yì), respectively; 好主意 is a legitimate Chinese word and corresponds to the meaning of good idea. An incongruent collocation is one without such a word-for-word translation between the two languages. An example is heavy rain, where heavy and rain are typically translated into 重 (zhòng) and 雨 (yǔ), respectively. The combination of the two characters, 重雨, is infelicitous in Chinese and does not correspond to the meaning of heavy rain, which is expressed in Chinese as 大 (dà; big) 雨 (yǔ; rain).
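The Zipf conversion and binning described above can be illustrated with a minimal Python sketch. It is not the author's script: the function names are hypothetical, and the corpus size of 100 million words is an assumption (it is consistent with a raw frequency of 10 mapping to the reported minimum Zipf value of 2). The bin boundaries are copied as reported, so values falling between two reported bins are returned as unbinned.

```python
import math

# Assumption: the BNC is treated as roughly 100 million words.
BNC_SIZE_MILLIONS = 100.0

# Bin boundaries exactly as reported in the text (inclusive on both ends).
ZIPF_BINS = [(2.0, 2.3), (2.5, 2.8), (2.9, 3.2), (3.3, 4.44)]


def zipf_frequency(raw_count, corpus_millions=BNC_SIZE_MILLIONS):
    """Convert a raw corpus count to the Zipf scale: log10(per-million) + 3."""
    per_million = raw_count / corpus_millions
    return math.log10(per_million) + 3


def frequency_bin(zipf_value):
    """Return the 1-based bin number, or None if the value lies outside all bins."""
    z = round(zipf_value, 2)  # bins are reported to two decimal places
    for number, (low, high) in enumerate(ZIPF_BINS, start=1):
        if low <= z <= high:
            return number
    return None


if __name__ == "__main__":
    for count in (10, 100, 1000):
        z = zipf_frequency(count)
        print(count, round(z, 2), frequency_bin(z))
```

Under these assumptions, the minimum raw frequency of 10 corresponds to 0.1 occurrences per million words and therefore to a Zipf value of 2, the lower bound of the reported range.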
Pilot tests following Yamashita and Jiang's (2010) procedure were run to make sure that the typical translations of the collocations and their constituents I assumed were also agreed upon by other L1 speakers of Chinese. Chinese learners of English who had a background similar to those of prospective ESL participants in the study were asked to provide Chinese translations for (1) the constituent words in the English collocations in the first pilot (n = 4) and (2) the English collocations in the second pilot (n = 5). Collocations in the study fulfilled the following criteria, as shown by responses from at least three out of four participants in the first pilot and four out of five in the second pilot. First, the meanings of the collocations and their components should be known. Second, for congruent items, the translations of words in the first pilot should be equal to individual Chinese words in the Chinese translations of collocations in the second pilot. For example, white and paper were translated into 白 (bái) and 纸 (zhǐ), respectively, in the first pilot, each corresponding to the words in the translation of the congruent collocation white paper (白纸). Third, for incongruent items, the translations of words in the first pilot should not completely overlap with Chinese words in the Chinese translations of collocations in the second pilot. Living rooms, for example, was translated to 客 (kè; guest) 厅 (tīng; hall) in the second pilot; neither 客 nor 厅 corresponded to the translations provided for living (生活 shēng huó) and rooms (房间 fáng jiān) in the first pilot.
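The pilot criteria above can be read as a simple decision rule, sketched below in Python. This is an illustration only, not the procedure actually used: the function names are hypothetical, agreement is approximated by the most frequent response reaching the stated threshold (three of four, or four of five), and "corresponding to a word in the collocation's translation" is approximated by substring matching, which sidesteps Chinese word segmentation.

```python
from collections import Counter


def agreed_response(responses, threshold):
    """Return the most common translation if enough raters gave it, else None."""
    answer, count = Counter(responses).most_common(1)[0]
    return answer if count >= threshold else None


def classify_item(word1_translations, word2_translations, collocation_translations):
    """Classify one item as 'congruent', 'incongruent', or 'excluded'.

    word*_translations: responses from the first pilot (4 raters).
    collocation_translations: responses from the second pilot (5 raters).
    """
    w1 = agreed_response(word1_translations, threshold=3)
    w2 = agreed_response(word2_translations, threshold=3)
    phrase = agreed_response(collocation_translations, threshold=4)
    if w1 is None or w2 is None or phrase is None:
        return "excluded"  # raters did not agree on a typical translation
    # Assumption: substring matching stands in for word-level correspondence.
    if w1 in phrase and w2 in phrase:
        return "congruent"
    return "incongruent"  # the word translations do not fully overlap


if __name__ == "__main__":
    print(classify_item(["白"] * 4, ["纸"] * 4, ["白纸"] * 5))
    print(classify_item(["生活"] * 4, ["房间"] * 4, ["客厅"] * 5))
```

With the examples given in the text, white paper (白 + 纸 vs. 白纸) comes out as congruent and living rooms (生活 + 房间 vs. 客厅) as incongruent.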
The congruent and incongruent items were then matched for length (number of letters) of the whole collocation as well as its individual words, mutual information (MI; the strength of co-occurrence between words in a collocation), L2 collocation frequency, L2 Word1 frequency, and L2 Word2 frequency, using data from the BNC. The final materials included 40 congruent and 40 incongruent collocations, with 10 congruent and 10 incongruent items in each of the four frequency bins; 99% of the words used in the stimuli were from the most frequent 4,000 word families in the BNC/COCA word frequency list (Nation, 2012). Words within this frequency band were likely to be known by target EFL participants in China, who would have passed the College English Test Band 4 (CET4), a standardized proficiency test in China (see Zhao & Ji, 2018, for the relationship between vocabulary size and CET4 scores). In addition, five EFL students in China, who were similar to prospective EFL participants in the study, rated the items for familiarity (i.e., how well they knew the meanings of the items) on a 7-point Likert-type scale. The average familiarity rating was 6.70 (SD = .28).
Eighty noncollocates, serving as control items, were created by randomly combining adjectives and nouns from the congruent and incongruent collocations. The control items did not appear in the BNC, except for black meal, which has a frequency of 1 per 100 million and a low MI of .15. The L1 (Chinese) translation frequencies of Word1 and Word2 were obtained from SUBTLEX-CH (Cai & Brysbaert, 2010). Table 1 presents a summary of item characteristics. Mann-Whitney U tests revealed no significant difference between the congruent and incongruent collocations in terms of L2 collocation frequency (W = 794.50, p = 1.00, r = -.006) and MI (W = 953.00, p = .14, r = .16). No significant difference was found between congruent, incongruent, and control items, as suggested by Kruskal-Wallis tests, for L2 Word1 frequency, χ²(2) = .64, p = .73, ε² = .004; L2 Word2 frequency, χ²(2) = 4.43, p = .11, ε² = .028; Word1 length, χ²(2) = .08, p = .96, ε² = .001; Word2 length, χ²(2) = .01, p = 1.00, ε² = 4.89e-05; and item length, χ²(2) = .07, p = .96, ε² = .0005. Both Experiments 1 and 2 used the same set of items. For the complete list of stimuli, see Appendix A. Twenty-four L1 speakers of English were recruited to respond to the stimuli in an AJT. This manipulation check confirmed that the stimuli were working as intended: L1 speakers of English (1) responded significantly faster and more accurately to collocations than to control items, (2) did not show a congruency effect, and (3) did not show effects of L1 Chinese Word1 or Word2 frequencies. For L1 English speakers' RT and accuracy and details of the analyses, see Appendix B.
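As an illustration of how noncollocate control items such as those described above could be generated, the following Python sketch reshuffles the nouns across adjectives until no resulting pair is an attested collocation. It is a sketch under stated assumptions rather than the procedure actually used: the function name, the seed, and the retry limit are hypothetical, each adjective and noun is assumed to be reused exactly once, and the set of attested pairs stands in for a corpus lookup in the BNC.

```python
import random


def make_noncollocates(collocations, attested, seed=0, max_tries=1000):
    """Recombine adjectives and nouns at random, avoiding attested pairs.

    collocations: list of (adjective, noun) tuples used as experimental items.
    attested: set of (adjective, noun) tuples known to occur in the corpus.
    """
    rng = random.Random(seed)  # fixed seed so the list is reproducible
    adjectives = [adjective for adjective, _ in collocations]
    nouns = [noun for _, noun in collocations]
    for _ in range(max_tries):
        shuffled = nouns[:]
        rng.shuffle(shuffled)
        pairs = list(zip(adjectives, shuffled))
        # Accept the shuffle only if every pair is unattested.
        if all(pair not in attested for pair in pairs):
            return pairs
    raise RuntimeError("no valid recombination found; relax the constraints")


if __name__ == "__main__":
    # Illustrative items taken from examples in the text, not the full stimulus list.
    items = [("strong", "wind"), ("heavy", "rain"), ("good", "idea"), ("white", "paper")]
    print(make_noncollocates(items, attested=set(items)))
```

In practice the attested set would need to cover every adjective-noun pair found in the reference corpus, not only the experimental items themselves.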
The acceptability judgement task
An AJT was used in Experiments 1 and 2 to examine collocational processing. This task taps into participants' processing of meaning (Wolter & Yamashita, 2018). In this task, participants were asked to decide whether or not a word combination was acceptable as quickly and accurately as possible (see Appendix C for task instructions). The instructions were written in both English and Chinese. The task was created and administered online using Gorilla, an online experiment builder (Anwyl-Irvine et al., 2019).
The items in the AJT were presented one at a time in a randomized order at the center of a screen. Each trial began with a fixation cross for 500 ms, followed by the item, which remained on the screen until response or disappeared and was replaced by the next trial after 5,000 ms. Before the presentation of the 160 test items, there was a handedness questionnaire and 20 practice items. Left-handers pressed s for YES responses and k for NO responses. For those who were right-handed, YES responses were indicated by pressing k and NO responses by pressing s. Participants took a break after 90 items and resumed by pressing the space bar.

Procedure
In both Experiments 1 and 2, participants first completed a language background questionnaire on Qualtrics (https://www.qualtrics.com) before doing the AJT on Gorilla. It was emphasized to the participants that they should do the task (1) on a desktop or laptop, (2) in a quiet environment with a good internet connection, and (3) after reading the instructions carefully. For the first requirement, a constraint was also set on Gorilla such that the AJT was only accessible on desktops or laptops. Each experiment lasted around 20 min.

Experiment 1
Participants
Thirty-three ESL learners in the US participated in the study. Two of them, who arrived in the US at the ages of 6 and 11 years, respectively, and were studying at high school, were excluded to make the ESL group more homogenous. The remaining ESL learners all spoke Mandarin as their L1, were studying at university in the US or had graduated from a US university, and were residing in the US at the time of participation. Their mean age was 25.52 years (SD = 5.38, range = 20-42). They had been in the US for at least a year (M = 4.82, SD = 2.13, range = 1.5-12) and came to the US at a mean age of 20.84 years (SD = 5.47, range = 13-36). On a 100-point scale in the language background questionnaire, the ESL learners reported using English 38.10% of their day on average (SD = 22.10, range = 0-81). The proficiency level of the ESL learners was estimated to be at least intermediate-low, given that a TOEFL score of 60 or an IELTS score of 6.0 is at the lower end of the admission scores required by US universities.

Analysis
All statistical analyses were performed in R (version 3.6.1; R Core Team, 2019). The RT and accuracy data collected from the AJT were analyzed with mixed-effects modeling. Mixed models were built using the lme4 package (version 1.1-21; Bates et al., 2015). The p values were calculated by lmerTest (version 3.1-0; Kuznetsova et al., 2017). Responses with RTs shorter than 400 ms or longer than 4,000 ms were trimmed using the trimr package (version 1.0.1; Grange, 2015). These responses were considered too fast or too slow to accurately reflect the genuine recognition process (see also Gyllstad & Wolter, 2016; Öksüz et al., 2021; Wolter & Yamashita, 2018, for similar outlier removal criteria). RT trimming affected 0.52% of the ESL data. Trimmed RT data were then log transformed to bring the variable closer to a normal distribution. Log-transformed RTs were analyzed with linear mixed-effects models, and only correct responses were included. Generalized linear mixed-effects models were built for accuracy data in which correct responses were coded as 1 and incorrect ones as 0. A maximal model was first built, which included (1) main effects of theoretical interest: L2 collocation frequency, condition (congruent and incongruent), L1 Word1 and Word2 frequencies, age of arrival, length of residence, and L2 use; (2) interactions between the main effects of interest; (3) covariates: collocation length (number of letters), L2 Word1 and Word2 frequencies, and MI; and (4) the maximal random-effects structure justified by the data (Barr et al., 2013). Condition was treatment coded with the congruent condition serving as the reference. Three items (one congruent and two incongruent) were excluded from the analysis because the L1 frequencies of these items were not found in the corpus. In case of nonconvergence, the random-effects structure was simplified by dropping by-subject random slopes first (Barr et al., 2013), followed by by-item random slopes.
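The trimming, transformation, and coding steps described above can be summarized in a short sketch. The study's analysis was carried out in R with trimr and lme4; the Python below is only an illustration of the logic, with hypothetical function and field names. Two details are assumptions not stated in the text: the natural logarithm is used for the transformation, and the accuracy data are taken from the same trimmed set of trials.

```python
import math

MIN_RT_MS = 400   # responses faster than this are trimmed
MAX_RT_MS = 4000  # responses slower than this are trimmed


def prepare_trials(trials):
    """Split raw trials into RT-analysis and accuracy-analysis data sets.

    trials: list of dicts with keys 'rt' (milliseconds) and 'correct' (bool).
    Returns (rt_data, accuracy_data).
    """
    # Step 1: absolute-cutoff trimming.
    kept = [t for t in trials if MIN_RT_MS <= t["rt"] <= MAX_RT_MS]
    # Step 2: code accuracy as 1 (correct) or 0 (incorrect).
    accuracy_data = [{**t, "acc": 1 if t["correct"] else 0} for t in kept]
    # Step 3: log-transform RTs, keeping correct responses only.
    # Assumption: natural log; the text does not state the base.
    rt_data = [
        {**t, "log_rt": math.log(t["rt"])}
        for t in accuracy_data
        if t["correct"]
    ]
    return rt_data, accuracy_data


if __name__ == "__main__":
    sample = [
        {"rt": 350, "correct": True},    # trimmed: too fast
        {"rt": 800, "correct": True},
        {"rt": 1200, "correct": False},  # kept for accuracy, dropped for RT
        {"rt": 4500, "correct": True},   # trimmed: too slow
    ]
    rt_data, accuracy_data = prepare_trials(sample)
    print(len(rt_data), len(accuracy_data))
```

The resulting `log_rt` and `acc` values correspond to the dependent variables of the linear and generalized linear mixed-effects models, respectively.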
A backward stepwise modeling procedure was adopted in which nonsignificant interactions were first excluded, followed by nonsignificant covariates. Interactions and covariates that did not improve model fit were then also removed. Models were compared based on the Akaike Information Criterion (AIC), and the model with the lowest AIC was chosen. All continuous independent variables were mean centered to reduce collinearity. The variance inflation factors of the final models were checked using the performance package (version 0.4.0; Lüdecke et al., 2019) and were all under 10 (Hair et al., 1995). Residuals of the final models were also examined, and observations with a residual greater than 2.5 SD away from the mean were removed. The 95% Wald confidence intervals for estimates were obtained through the confint.merMod() function.
Table 2 includes the descriptive statistics of ESL learners' trimmed RTs (in milliseconds) and accuracy (in percentage) by item type. In response to RQ1 on the congruency effect, the RT analysis (Table 3) showed that overall, ESL learners responded to congruent and incongruent collocations at a similar speed, that is, there was no congruency effect; in contrast, the accuracy analysis (Table 4) showed that ESL learners made more errors when responding to incongruent than congruent items. In terms of RQ2 on L1 activation, ESL learners' processing speed but not accuracy was affected by the L1 frequencies of component words in the L2 collocations. Further, the RT data showed no significant interaction between condition (congruent vs. incongruent) and L1 word frequencies, suggesting that the L1 was activated in the processing of both congruent and incongruent collocations.
As for the moderating variables on the congruency effect (RQ3), the interaction between L2 collocation frequency and congruency did not survive model selection, suggesting that ESL learners were sensitive to frequency regardless of congruency and that, more importantly, the congruency effect in accuracy and the lack of it in RT held regardless of collocation frequency. ESL learners' reaction to congruency in L2 collocational processing was affected by their length of residence in the US. Longer residence in the US was associated with faster and more accurate processing of incongruent items (see Figure 1). Neither age of arrival in the country nor percentage of daily L2 use moderated the congruency effect. The lack of an age effect may be attributed to the fact that most ESL participants arrived in the US after age 18 (DeKeyser, 2000; DeKeyser et al., 2010). For the null effect of L2 use, one explanation could be that percentage of L2 use did not reflect the quality of L2 use, which may contribute to language development more than quantity (Baker-Smemoe et al., 2014).

Experiment 2
Participants
Thirty-three EFL learners were recruited. They were non-English majors in their sophomore or junior years in college in mainland China, with an average age of 19.52 years (SD = 0.94, range = 18-22). They had never lived or studied in an English-speaking country and had received English instruction for an average of 10.39 years (SD = 2.36, range = 6-15). Self-reported CET4 scores were used as a proxy of EFL learners' proficiency level. All but four participants had taken the test, and those who had done so took it less than 2 years before data collection. The CET4 has a maximum score of 710, and the average score of the EFL participants was 494.97 (SD = 57.59, range = 372-593). According to China's National Education Examinations Authority (n.d.), the participants' mean score was around the 44th percentile, with a range from the 4th to the 90th percentile. The EFL learners' CET4 scores suggested that they were of intermediate-low to intermediate proficiency.

Analysis
Mixed-effects models were built to analyze RT and accuracy data. The RT trimming and transformation, data coding, modeling procedure, model comparison, and model criticism followed those in Experiment 1. RT trimming affected 2.50% of the EFL data. The initial maximal models for RT and accuracy included (1) main effects of theoretical interest: L2 collocation frequency, condition (congruent and incongruent), L1 Word1 and Word2 frequencies, L2 proficiency (CET4 scores), and length of instruction; (2) interactions between the main effects of interest; (3) covariates: collocation length (number of letters), L2 Word1 and Word2 frequencies, and MI; and (4) the maximal random-effects structure justified by the data (Barr et al., 2013). Four EFL participants were excluded from analysis due to missing CET4 scores.

Results and discussion
Table 5 presents the descriptive statistics for EFL learners' RT and accuracy. Mixed-effects models (see Tables 6 and 7) showed that, unlike ESL learners, EFL learners showed a congruency effect in both processing speed and accuracy (RQ1), reacting significantly faster and more accurately to congruent than incongruent items. Similar to ESL learners, EFL learners were affected by the L1 frequencies of component words in RT but not in accuracy (RQ2). L1 word frequencies did not interact with condition, an indication of L1 activation in both congruent and incongruent collocation processing. The interaction between L1 Word1 frequency and length of instruction showed a trend toward significance. This interaction suggested that the L1 Word1 frequency effect may be slightly larger for learners with a longer length of instruction.
In terms of variables that moderated the congruency effect (RQ3), there was no significant interaction between L2 collocation frequency and congruency, indicating that EFL learners showed a congruency effect even for high-frequency collocations. Length of instruction did not interact with congruency, meaning that more classroom instruction did not reduce L1 influence in EFL learners' collocational processing. The RT data showed that L2 proficiency did not moderate congruency. This is in line with Wolter and Gyllstad (2013) and Wolter and Yamashita (2018), who found no effect of L2 proficiency, but stands in contrast with Sonbul and El-Dakhs (2020), who found that advanced learners showed a smaller congruency effect. As mentioned in the literature review, Sonbul and El-Dakhs (2020) attributed the null results in Wolter and Gyllstad (2013) and Wolter and Yamashita (2018) to their treatment of L2 proficiency as a categorical variable and to their inclusion of participants with a limited range of proficiency levels. In the current study, even when L2 proficiency was analyzed as a continuous variable and the range of proficiency was fairly wide (i.e., from the 4th to the 90th percentile), L2 proficiency did not emerge as a significant moderator of congruency. In the accuracy analysis, L2 proficiency interacted with L2 collocation frequency. It seems that EFL learners with higher proficiency benefitted more from greater collocation frequency. Proficiency interacted with congruency in an unexpected direction such that the higher the L2 proficiency, the more errors EFL learners made when responding to incongruent collocations. Although such results seemed counterintuitive, one speculation was that EFL learners of lower proficiency lacked confidence in their ability and tended to think that items they did not know were not necessarily unacceptable.
Thus, even when these lower level learners did not know the incongruent items, they accepted the items as collocations, leading to higher accuracy. If this was the case, lower level learners should also have made more errors for noncollocational control items, thinking that those items were collocations that they had not learned. In contrast, learners of higher proficiency would be more certain and reject items that they were not familiar with, namely incongruent collocations as well as noncollocates. To evaluate this explanation, a follow-up analysis on the relationship between proficiency and accuracy for noncollocational control items was conducted. Proficiency turned out to be significantly and positively related to accuracy for noncollocational control items (z = 4.60, p < .001; see Appendix D for the full model), suggesting that EFL learners with lower proficiency were indeed more likely to accept a control item as a collocation. The lack of a proficiency by congruency interaction in the RT analysis, which only involved correct responses, also supported this explanation: Once the EFL learners had learned the collocations, proficiency had little influence on their processing of congruent and incongruent collocations.

General discussion
The findings of the study can be summarized as follows. First, EFL and ESL learners reacted differently to congruency (RQ1): ESL learners only showed a congruency effect in processing accuracy but not speed, whereas EFL learners' accuracy and speed were both affected by congruency. In response to RQ2 on L1 activation, L1 lexical frequencies of individual words in an L2 collocation affected both ESL and EFL learners' processing speed, an indication that L1 was activated regardless of whether a learner showed a congruency effect. Finally, in terms of the moderating variables on the congruency effect (RQ3), L2 collocation frequency did not interact with congruency, meaning that repeated encounter of a structure or the lack of it did not affect how L2 learners were influenced by their L1 during L2 processing. ESL learners' length of residence in the US moderated the effect of congruency: The longer ESL learners resided in the US, the faster and more accurately they reacted to incongruent collocations. The congruency effect was not reduced by EFL learners' L2 proficiency or length of instruction. Below I discuss the results in relation to (1) the underlying mechanisms of the congruency effect (based on RQ2) and (2) how language experiences and L2 proficiency moderate L1 influence (based on RQs 1 and 3).

The underlying mechanism of the congruency effect
To reiterate, two explanations have been proposed in the literature to account for the congruency effect, namely L1 activation and order of acquisition. The results of the current study favor L1 activation for two reasons. First, the absence of a congruency effect in ESL learners' RTs contradicts the proposal that the congruency effect comes from earlier acquisition of congruent items. If it is the case that congruent items are processed faster because they are learned earlier, they are likely to always enjoy an advantage over their incongruent counterparts regardless of whether a learner has lived in the L2 environment. Second, the effect of L1 lexical frequency on L2 collocational processing constitutes direct evidence for L1 activation.¹ That is, the higher the L1 lexical frequencies of individual words in the L2 collocations, the faster L2 learners processed the collocations. More importantly, L1 was activated for both congruent and incongruent collocations and for EFL as well as ESL learners even when ESL learners did not show a congruency effect in RT. These findings indicate that, consistent with Conklin and Carrol's (2018) hypothesis, L1 translation equivalents of words in the L2 collocations are automatically activated. It is not known, however, what routes learners may take after the activation of L1 translation equivalents to access the concepts of the L2 collocations. Based on Conklin and Carrol (2018) and Jiang (2000), I propose some possible scenarios in the following paragraphs that could account for the presence of the congruency effect among EFL learners and the absence of it among ESL learners. For incongruent collocations, it is most likely that after activation of the L1 translation equivalents of individual words, L2 learners need to suppress their L1 because the activated L1 translation equivalents do not constitute felicitous L1 expressions.
L2 learners then switch to the L2 route, which involves the activation of L2 collocations to directly access the concepts. The processing speed for incongruent collocations, therefore, depends on the strength of the conceptual link between an L2 collocation and its meaning and/or how good L2 learners are at suppressing their L1. EFL learners may have a comparatively weak conceptual link and/or a more dominant L1 that is harder to suppress. ESL learners, in comparison, may have established a stronger link between L2 collocations and concepts, or their L1 might be inhibited by the L2 immersive environment (Linck et al., 2009).
For congruent collocations, there are three possible scenarios. The first one entails dual activation, where the activated L1 translation equivalents lead to the activation of the corresponding L1 collocation as well as the L2 collocation. The dual activation may explain EFL learners' faster processing of congruent collocations over incongruent ones (Wolter & Gyllstad, 2011). Such a route, however, seems less plausible for ESL learners, who did not show a congruency effect. In the second scenario, the L1 translation equivalents activate the L1 collocation, through which meaning is accessed. For ESL learners, strong representations have been established for L2 collocations, and thus there is little difference in speed between the L2 route for incongruent collocations and the L1-mediated route for congruent ones. EFL learners may also take this route. But because their L2 representations are still weak, the L2 route may be slower than the L1-mediated route, resulting in the congruency effect. Finally, in the third scenario, for ESL learners, it is possible that after the automatic activation of L1 translation equivalents of the individual words, the L2 collocation and its concept are directly activated, without the mediation of the corresponding L1 collocation. The strength of the conceptual links for both congruent and incongruent collocations is similar, and thus there was no congruency effect. This route, which does not involve activation of L1 collocations, may account for the lack of L1 collocation frequency effect in Wolter and Gyllstad (2013).

¹ A reviewer pointed out that to establish direct evidence for L1 activation, apart from significant effects of L1 lexical frequencies, it was also necessary to demonstrate that models with L1 frequencies provide a better fit than those with L2 frequencies. Following this suggestion, additional analyses were conducted. In the first analysis, I fit two models for the RT data: one included L1 frequencies, the main variables, and covariates in the study (i.e., condition, L2 collocation frequency, length of instruction, L2 proficiency, length of residence, age of arrival, percentage of L2 use, collocation length, and MI); the other included L2 frequencies, the main variables, and covariates. This analysis was done for both EFL and ESL RT data. In the second analysis, I used the final models for RT data selected in the original manuscript and replaced the L1 frequencies in those models with L2 frequencies. Again, this was done for both EFL and ESL RT data. In both analyses, the models with L1 frequencies were better fitting (i.e., lower AIC) than the ones with L2 frequencies. Appendix E summarizes the specifications and AICs for each model in the two analyses.
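The AIC-based comparison between models with L1 frequencies and models with L2 frequencies, described in the footnote on the reviewer-suggested analyses, rests on the standard definition AIC = 2k - 2 ln(L), where k is the number of estimated parameters and L is the maximized likelihood. The sketch below illustrates only the selection rule (lowest AIC is preferred). The log-likelihoods and parameter counts are hypothetical placeholders, not values from the study or from Appendix E, and the function and model names are illustrative.

```python
def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood


def best_by_aic(candidates):
    """Return the name of the candidate model with the lowest AIC.

    `candidates` maps a model name to (log_likelihood, n_params).
    """
    return min(candidates, key=lambda name: aic(*candidates[name]))


# Hypothetical (log-likelihood, parameter count) pairs for illustration only.
candidates = {
    "L1_frequencies": (-1200.0, 12),
    "L2_frequencies": (-1210.0, 12),
}

preferred = best_by_aic(candidates)
```

With equal parameter counts, as in this toy case, the comparison reduces to choosing the model with the higher log-likelihood; the penalty term only matters when the competing models differ in complexity.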

Language experiences, proficiency, and L1 influence
The congruency effect is an indication of the amount of L1 influence in L2 processing. Analyzing learners in two different learning contexts (ESL and EFL) and the moderating variables on the congruency effect (i.e., L2 collocation frequency, length of residence, age of arrival, and length of instruction) informs us about how language experiences may affect L1 influence. Frequency indicates the more specific micro-L2 experiences that learners have with individual linguistic units (Spätgens & Schoonen, 2019). Learning context and related variables (length of residence, age of arrival, and length of instruction) represent the macro aspect of one's L2 experiences, that is, time spent immersed in the L2 environment or learning in a classroom setting.
For the micro aspect of experiences, the results for both RT and accuracy showed that regardless of learning context, ESL and EFL learners were tuned to L2 collocation frequency, a finding consistent with usage-based approaches to language acquisition, which hold that learning is driven by experiences (e.g., Bybee, 2006). More importantly, the effect of L2 collocation frequency did not interact with congruency, a result similar to that of Wolter and Gyllstad (2013). That is, the congruency effect existed for both high- and low-frequency collocations in EFL learners' RTs and accuracy and in ESL learners' accuracy. This suggests that although repeated encounters can contribute to fluent and accurate processing of collocations, they do not seem to reduce the influence of the L1.
Learning context, the macro aspect of L2 experiences, on the other hand, affects the amount of L1 influence in L2 processing. ESL learners, unlike EFL learners, did not show a congruency effect in terms of processing speed. This finding is in line with Yamashita and Jiang (2010), who attributed the discrepancy between ESL and EFL learners to their different degrees of dependence on the L1 lexicon as a result of learning context difference: When an L2 collocation is first learned, it is linked to its L1 counterpart, which serves as a mediation to meaning, and this dependence on L1 gradually subsides with more L2 exposure, leading to the lack of congruency effect in ESL learners' RTs. The effects of length of residence on ESL learners' collocational processing further support the role of learning context in reducing L1 influence. ESL learners' processing of incongruent collocations was faster with longer length of residence in the US. For processing accuracy, though ESL learners still showed a congruency effect, the longer they lived in the L2 environment, the smaller the congruency effect was. This implies that immersion in an L2 environment alleviates the negative effect of L1-L2 incongruency, making it less difficult to learn and accept L2 collocations that do not have counterparts in the L1. In contrast, for EFL learners in a classroom setting, having more language instruction or being higher in language proficiency did not seem to reduce L1 influence in L2 collocational processing in terms of either speed or accuracy.
The differential effects of L2 collocation frequency, learning context and its related variables (i.e., length of residence, age of arrival, and length of instruction), and L2 proficiency on L1 influence suggest that frequent inhibition of the L1 may be the key to reducing L1 influence. Linck et al. (2009) showed that when learners were immersed in the L2 environment, L1 access was constantly suppressed, leading to attenuated L1 influence. In the current study, EFL learners may have encountered an L2 collocation repeatedly, had substantial classroom instruction, and achieved relatively high proficiency, but they were still in an L1 environment with active use of the L1. In contrast, ESL learners not only used the L1 less but also had to suppress the L1 when they used the L2. Reduced access to and frequent inhibition of the L1 may have resulted in a lower level of L1 activation, which allowed the ESL learners to become more independent of the L1 in L2 collocational processing. This explanation also supports the argument made in the previous section that the processing of incongruent collocations depends on how good L2 learners are at L1 suppression.

Limitations
The findings of the study should be interpreted in light of several limitations. First, the L2 proficiency levels of ESL and EFL learners were not matched. The potential differences in proficiency levels between the two groups made it hard to attribute their processing differences solely to the learning context. Although it was found that L2 proficiency did not reduce the congruency effect, we still cannot rule out the possibility that the lack of congruency effect in ESL learners was a result of immersion and higher L2 proficiency combined. However, L2 proficiency and learning context are two constructs that are challenging to disentangle. Immersion might improve L2 proficiency in ways such as enhancing processing efficiency, which may not be captured solely by standardized tests. In fact, in Jiang et al. (2020), length of immersion was used as a proxy of L2 proficiency. When matching the proficiency levels of ESL and EFL learners, future studies can use both standardized tests and RT-based measures to provide more comprehensive proficiency profiles of their participants.
The second limitation is related to participants' familiarity with constituent words in the collocations. Participants' not knowing the constituent words may have changed the nature of "No" responses in the AJT: Participants may have indicated that an item was not a collocation because they did not know some of the constituent words, not because they thought that the item was a noncollocate. Although the constituent words were estimated to be known to participants based on their proficiency level, future studies should directly check participants' knowledge of the words in an exit questionnaire.
Another limitation of the study concerns the accuracy of information about learners' L2 use. ESL participants' understanding of language use may have varied, with some, for example, considering only speaking the language as language use. The use of percentage as the unit of L2 use may have further contributed to variations in responses. In future studies, definitions of L2 use should be provided and L2 use should be measured in hours to obtain more accurate information.
Finally, as Yamashita (2018) pointed out, the semantic transparency effect may be involved in the congruency effect: Congruent collocations used in previous studies tended to be more transparent than incongruent collocations, and transparent collocations were processed faster (e.g., Gyllstad & Wolter, 2016). Although the manipulation check showed that L1 English speakers processed congruent and incongruent collocations used in the current study in a similar manner, indicating that these two types of collocations were comparable in transparency, future studies should still obtain semantic transparency ratings of items to reduce the potential confounding effect of transparency.

Conclusion
The current study adds to our understanding of the mechanism underlying the congruency effect in L2 collocational processing and how L1 influence in L2 collocation processing may vary in relation to learners' L2 proficiency and linguistic experiences, indexed by L2 collocation frequency and the learning context. By examining the effect of L1 translation frequencies of individual words in an L2 collocation, the study provides direct evidence of L1 activation and pinpoints that the unit of L1 activation is individual words in an L2 collocation. The findings also showed that repeated encounters with an L2 collocation alone may not be sufficient to attenuate L1 influence in L2 processing. Being immersed in the L2 environment and the duration of immersion, on the other hand, may have a positive effect on reducing L1 influence. Further research is needed to elucidate how immersion attenuates L1 influence, for example, through L1 inhibition. Findings in this study should also be verified in other tasks, such as self-paced reading, and with other types of collocations.
Supplementary material. The supplementary material for this article can be found at http://doi.org/10.1017/S0272263123000281.
Data availability statement. The experiment in this article earned an Open Materials badge for transparent practices. The materials and data are available at https://osf.io/e4sng/?view_only=97246182b67546f08ce160b461399e92
Competing interest. We have no known conflict of interest to disclose.