Highlights
• Native English speakers demonstrated a significant semantic priming effect in both tasks.
• The L2 semantic priming effect was not significant for L2 learners in either task.
• Task effects were not found for either native English speakers or L2 learners.
• The L2 form-meaning connection is weak and nonnative-like for advanced L2 learners.
1. Introduction
A long-standing issue in bilingual processing research is the connection between L2 words and concepts, or L2 lexical form-meaning connections. It comprises several specific questions. The first is whether a bilingual’s two languages are linked to a shared semantic or conceptual system or to two separate ones. The second is whether L2 words are linked to conceptual representations directly or indirectly through L1 translations. The third is the strength of connections between L2 words and concepts, if such direct connections exist. Earlier research seemed to have helped create a consensus regarding the first two questions. For example, researchers have compared response times between two tasks: L2 picture naming and L1–L2 translation. If the two languages are linked through shared conceptual representations, performing the two tasks would involve similar activation pathways, that is, picture-concept-L2 word in picture naming and L1 word-concept-L2 word in L1–L2 translation. As a result, bilinguals should complete the two tasks in a similar amount of time. However, if L2 words are only connected with L1 translations rather than via shared concepts, bilinguals should perform the translation task much faster than the L2 picture naming task. Several studies showed similar response latencies in the two tasks (e.g., Chen & Leung, 1989; Kroll & Curley, 1988; Potter et al., 1984), and these findings have been considered evidence for a direct connection between L2 words and concepts.
Similarly, comparable semantic priming effect sizes from interlingual and intralingual prime-target pairs (as reported by Caramazza & Brones, 1980; Frenck & Pynte, 1987; Schwanenflugel & Rey, 1986) were taken as evidence for the view that “bilinguals have a single semantic representation that subserves two distinct sets of lexical entries for their two languages” (Caramazza & Brones, 1980, p. 81). Thus, it is not surprising that current models of bilingual lexical representation such as the Revised Hierarchical Model (RHM; Kroll & Stewart, 1994), the distributed conceptual feature model (DCFM; de Groot, 1992, 1993), and the BIA+ model (Dijkstra & van Heuven, 2002) all recognize conceptual representations shared by the two languages and a direct L2-concept connection.
1.1. The strength of L2-concept connections: Conflicting findings
The third issue, that of the strength of connections between L2 word form and meaning, remains controversial. Many researchers endorse the view that L2 word-concept connections are weaker than L1 word-concept connections, and acknowledge, at the same time, that this connection can become stronger with increased L2 proficiency (e.g., Kroll & Tokowicz, 2005). Other researchers emphasize a robust and direct conceptual connection with L2 words, often based on the finding that a semantic effect can be observed in L2 processing that is similar in magnitude to that in L1 processing (e.g., Duyck & de Houwer, 2008).
Empirical evidence is mixed. Some studies reported comparable semantic processing effects in L1 and L2. For example, a similar imageability effect (de Groot & Poot, 1997) and a similar concreteness effect (de Groot et al., 1994) were found in forward (L1 to L2) and backward (L2 to L1) translations. De Groot et al. (2002) also reported that semantic properties (imageability, context availability, and definition accuracy) affected Dutch-English bilinguals’ lexical decision latencies in both L1 and L2. Duyck and Brysbaert (2004, 2008) examined the topic in terms of the number magnitude effect, that is, individuals responding to smaller numbers (e.g., 2 or 3) faster than to larger numbers (e.g., 8 or 9), and they found that bilinguals showed a similar number magnitude effect in both translation directions. Duyck and de Houwer (2008) adopted a semantic Simon task to explore this issue. They presented animal names (e.g., fox) and occupation names (e.g., driver) to Dutch-English bilinguals in upper and lower cases and asked them to say “animal” to uppercase targets and “occupation” to lowercase targets. They manipulated word type (animal or occupation names) and letter case (upper or lower) such that the two could be congruent (e.g., FOX presented in uppercase) or incongruent (e.g., fox presented in lowercase). This setup allowed them to determine if there was a congruency effect (or a semantic Simon effect, in their terms) in processing L1 and L2. They found a comparable congruency effect of 28 ms and 23 ms for L1 and L2 items, respectively.
The finding that newly learned words could produce reliable semantic priming effects, as reported by Elgort (2011) and Elgort and Piasecki (2014), is also consistent with immediate and robust semantic connections for L2 words.
Where other paradigms were adopted, many studies have shown a weaker semantic effect associated with an L2. Researchers have compared the emotional effects in L1 and L2 processing, for example. The emotional effect refers to an elevated response to emotional words (e.g., murder, war) compared to neutral words (e.g., sleep, jar), as assessed in terms of recall accuracy, valence or arousal rating scores, response latencies, electrophysiological responses, level of skin conductance, or pupil size in visual word recognition. There is increasing evidence showing a reduced emotional effect in L2 processing. In one of the earliest studies, Anooshian and Hertel (1994) asked English-Spanish bilinguals to first rate the difficulty and emotionality of a set of emotional and neutral words, which was then followed by a surprise recall task. The participants showed a better recall rate for emotional words than for neutral words in their L1, but not in L2. Winskel (2013) tested Thai-English bilinguals in a color decision task with emotional and neutral words displayed in different colors. The bilinguals showed a delay in responding to emotional words as compared to neutral words in L1, but they showed no such effect in L2. Eilola et al. (2007) and Eilola and Havelka (2010) demonstrated that while participants showed an elevated skin conductance response to L1 negative and taboo words, they did not show this pattern in L2. Similar results were also reported by Iacozza et al. (2017), where the emotional effect was measured in terms of emotionality rating scores and pupillary responses, that is, the size of the pupils as recorded by an eye tracker. Spanish-English bilinguals showed different pupillary response patterns while performing the task in their L1 and L2.
The change in pupil size between negative and neutral words was larger in L1 than in L2. Assuming that the emotional effect is semantic in nature, that is, arising from the meaning of the stimulus words, these findings suggested a weaker L2-concept connection.
The same pattern can be seen in false memory studies where L1 and L2 are compared. These studies often employed the Deese-Roediger-McDermott (DRM) paradigm. In the study phase, a set of words related to a theme is displayed to participants for memorization, for example, beer, drunk, liquor, wine, glass, and alcohol, as related to the “alcohol” theme. In the test phase, words are displayed for a recognition task in which participants have to decide if a word is one of those shown in the study phase. The test items include both previously displayed “old” words that require a positive response, and “new” words that were not displayed previously and thus require a negative response. Among the latter are words related or not related to the themes involved in the study phase. Individuals produce an incorrect positive response more often to a theme-related word, for example, whiskey, than to a control word, which is referred to as the false memory effect. Several studies have shown that this effect was much weaker in L2 than in L1. For example, the false recognition rate was 29% and 15% in the dominant and nondominant languages, respectively, among the English-Spanish bilinguals tested by Arndt and Beato (2017, Experiment 1). They also showed that more proficient L2 speakers produced a higher false recognition rate (22%) than less proficient bilinguals (13%). This L1–L2 difference has also been shown in several other studies (Anastasi et al., 2005; Cabeza & Lennartson, 2005; Howe et al., 2008; Miyaji-Kawasaki et al., 2004; Sahlin et al., 2005).
1.2. Semantic priming studies involving L2 primes: Further inconsistency
A frequently used research paradigm for studying word-concept connections, also the one employed in the present study, is the priming paradigm. Strength of conceptual connections for L1 and L2 words can be assessed by comparing the semantic priming effect size involving L1 and L2 primes. This can involve both intralingual (L1–L1, L2–L2) and interlingual (L1–L2, L2–L1) prime-target pairs. Semantic priming arising from semantically or associatively related prime-target pairs, such as dog-cat or dog-mao (for Chinese-English bilinguals), is more informative than translation pairs, as translation priming can potentially arise from direct lexical links.
Evidence from semantic priming studies is also quite inconsistent, though. Many early bilingual priming studies showed a relatively consistent pattern: interlingual semantic priming effects were stronger in the L1–L2 direction than in the L2–L1 direction. For example, Schwanenflugel and Rey (1986) showed a priming effect of 135 ms and 47 ms in the two priming directions in their Experiment 1, and that of 63 ms and 12 ms in Experiment 2, where the stimulus onset asynchrony (SOA) was reduced from 300 ms to 100 ms. The priming effects in the two directions were 91 ms and 15 ms in Jin (1990), 95 ms and 10 ms and 45 ms and 5 ms in the two experiments reported by Tzelgov and Eben-Ezra (1992), and 38 ms and −5 ms in Keatley et al. (1994, Experiment 1, SOA 250 ms). This pattern has been replicated in more recent studies, for example, 53 ms and −3 ms in Smith et al. (2019). The priming effect in the L2–L1 direction was absent in multiple studies. These results suggest that L2-concept connections are weaker than L1-concept connections. Other studies, however, have reported comparable priming effects or an opposite pattern, for example, 20 ms and 33 ms in Keatley and de Gelder (1992, Experiment 1, high proportion of related items), 18 ms and 26 ms in Grainger and Beauvillain (1988, Experiment 2, long SOA), and 170 ms and 250 ms in Chen and Ng (1989, Experiment 2).
All these studies adopted an unmasked priming paradigm in which the prime was visible to a participant, which may give rise to guessing or anticipation once a participant noticed a semantic relationship between the prime and the target. Subsequent masked priming studies overcame this problem by presenting the prime subliminally, for example, for 50 ms, and by sandwiching it between a visual mask and the target. Two studies showed a semantic priming effect in both directions (Perea et al., 2008; Schoonbaert et al., 2009). Basnight-Brown and Altarriba (2007) also tested bilinguals in both priming directions, but reported priming only from the dominant language. Two additional masked priming studies tested bilinguals only in the L1–L2 direction. Williams (1994) showed a semantic priming effect only among items of strong semantic relation, but de Groot and Nas (1991) reported semantic priming from L1 to L2 only when the prime was visible; there was no masked semantic priming from L1 to L2.
In contrast to a relatively large number of interlingual semantic priming studies, only a small number of studies have examined intralingual semantic priming in L2. Frenck and Pynte (1987) tested English-French bilinguals with English or French targets (e.g., sparrow or moineau) displayed alone or preceded by a category name in the same language (e.g., bird or oiseau). They found an intralingual semantic priming effect of 81 ms and 21 ms in L1 and L2, respectively, among their skilled bilinguals. Grainger and Beauvillain (1988) also reported significant semantic priming in L1 and L2. De Groot and Nas (1991) included L2–L2 semantic priming involving noncognates in their Experiment 3, and a significant masked priming effect of 35 ms was found. Smith et al. (2019) tested unbalanced Hebrew-English bilinguals on semantically related and unrelated prime-target pairs in both L1 and L2 in a lexical decision task (LDT). The prime duration was 150 ms, followed by a 50-ms blank screen. They found a significant semantic effect of 41 ms in L2, but the 14 ms priming effect in L1 was not significant. The latter finding was inconsistent with those of previous studies in which reliable semantic priming was found in L1 (e.g., 28 ms of priming in Perea et al., 2008).
1.3. The present study
L2-concept connections represent a fundamental issue in bilingual processing research. In light of the inconsistent findings, the present study was intended as a further effort to examine this issue. Specifically, we adopted the unmasked priming paradigm to examine L2–L2 semantic priming among unbalanced but advanced bilinguals in two tasks: lexical decision and semantic categorization. These methodological decisions were made on the basis of several considerations that we explain below.
First, we examined L2–L2 semantic priming for three reasons. The first reason is that whether L2 primes can produce a reliable semantic priming effect sheds light on the strength of L2-concept connections. A reliable L2–L2 semantic priming effect would provide evidence for a strong L2-concept connection. Previous research has already documented such priming effects, but methodological characteristics of these studies often limit the generalizability of their findings. For example, Perea et al. (2008) reported L2–L2 semantic priming, but they tested balanced bilinguals. Smith et al. (2019) also reported this effect, but they adopted a high relatedness ratio (1:1) for their test items, which could have led to strategic effects. The second reason is that, compared to the large number of interlingual semantic priming studies, only a small number of studies have examined intralingual semantic priming in L2. The third reason is that intralingual L2–L2 priming was preferred over L2–L1 interlingual priming not only to avoid language switching complications but also for a better chance of observing a reliable priming effect. As Keatley et al. (1994) argued, within-language association is generally stronger than cross-language association because words in the same language share the same semantic network, which would result in more efficient transfer of activation to related words stored in the same interrelated network.
We chose to test unbalanced L2 speakers because this is the population (in contrast to balanced bilinguals) that has been studied in most bilingual processing studies. Current models of bilingual representation, such as the Revised Hierarchical Model of Kroll and Stewart (1994) and the BIA+ model of Dijkstra and van Heuven (2002), are based on the findings from this population. We focused on advanced L2 speakers because substantial lexicosemantic development has occurred in these individuals, and their performance (in contrast to that of lower-proficiency L2 speakers) would be more informative about whether strong L2-concept connections can be developed in L2 learning.
Furthermore, the unmasked priming paradigm was chosen over the masked priming paradigm because semantic priming under the masked condition was often quite small even among L1 speakers, for example, 7 ms, 17 ms, and 11 ms with prime durations of 33 ms, 50 ms, and 67 ms, respectively, in Perea and Gotor (1997), 11 ms in Tan and Yap (2016), and 9 ms and 20 ms with the prime displayed for 40 ms and 60 ms, respectively, in Kiefer et al. (2023). Given the slower processing associated with L2 words, using the masked priming paradigm risks missing the priming effect due to increased processing demand. We were aware of the risk of the participants developing strategic effects once they became aware of the semantic relationship between an unmasked prime and a target. This risk was dealt with by keeping a low proportion of related items, as was done in previous studies (e.g., Frenck & Pynte, 1987; Grainger & Beauvillain, 1988).
Finally, the comparison of semantic priming in the two tasks of lexical decision and semantic categorization was inspired by the task effect in translation priming. Several studies have demonstrated that masked L2 words often failed to prime their L1 translations in an LDT. However, when the task was switched to a semantic categorization task, reliable L2–L1 translation priming effects were observed. This task effect has been demonstrated where the masked priming paradigm was adopted (e.g., Finkbeiner et al., 2004; Grainger & Frenck-Mestre, 1998; Xia & Andrews, 2015). The same task effect was also reported where a two-phase priming paradigm was adopted. In the latter case, participants were asked to perform either a lexical task, such as the LDT, or a semantic task, such as animacy judgment, in L2 in the study phase, which was then followed by a test phase in which the participants performed the same task but with words in the other language. An L2–L1 translation priming effect was observed only when a semantic task was performed (e.g., Taylor & Francis, 2017; Zeelenberg & Pecher, 2003). The same task effect has also been observed in semantic priming in L1 (e.g., de Wit & Kinoshita, 2015a). The task effect suggests that word-concept connections are more likely to be involved or boosted in a semantic task than in a lexical task (Gollan & Kroll, 2001; Zeelenberg & Pecher, 2003). Thus, the semantic task offers a better opportunity for semantic priming to materialize.
Thus, the present study examined the strength of L2-concept connections via L2–L2 semantic priming. Two favorable conditions were created for this priming effect to surface if strong L2-concept connections existed: the display of visible L2 primes and the adoption of a semantic task in comparison to a lexical task. A group of native speakers (NS) of English was included for comparison. We expected NS to show a reliable semantic priming effect in both tasks, thus replicating previous findings and confirming the adequacy of the design. They might produce a stronger semantic priming effect in the semantic task than in the lexical task. If strong L2-concept connections are present, we would expect advanced L2 speakers to show a semantic priming effect, possibly smaller in magnitude than that of NS, at least in the semantic task. If L2 speakers did not show a priming effect in either task, the findings would suggest a weak word-concept connection in L2 speakers.
2. Experiment 1: L2 semantic priming in LDT
2.1. Method
2.1.1. Participants
The study tested 38 native English speakers and 40 L2 learners. Native English participants were recruited from the SONA system and were all undergraduate students at a mid-Atlantic university in the United States, with an average age of 19.28 (SD = 1.16). All the L2 learners were recruited through snowball sampling and were native Chinese speakers learning English as a foreign or second language. Eighty percent of them were current university students (10% undergraduate juniors and seniors; 70% graduate students) in the United States, majoring in various disciplines (37.5% in language/linguistics-related fields, 37.5% in social sciences, and 25% in STEM), and the remaining 20% were recent graduates who were working full-time in the United States. All of them used English on a daily basis, and their average age was 26.23 years (SD = 3.98).
The language learning and use questionnaire, adapted from the Language Experience and Proficiency Questionnaire (LEAP-Q; Marian et al., 2007), indicated that the average age of acquisition for the L2 learners was 8.12 (SD = 2.69). They had been living in an English-speaking country for an average of 3.98 years (SD = 3.32). Seventy-five percent of the participants reported that they were generally more comfortable using Chinese, and the rest reported being equally comfortable using English and Chinese, indicating that most of them were L1 dominant. Moreover, the participants scored on average 75.38% (SD = 8.87; Min = 58.75%; Max = 92.5%) on the LexTALE test, which confirmed their high English proficiency, ranging from level B2 to level C2 (Lemhöfer & Broersma, 2012). Their self-reported English proficiency on a 10-point Likert scale in reading (M = 7.98, SD = 1.40), listening (M = 7.75, SD = 1.26), speaking (M = 7.1, SD = 1.52), and writing (M = 6.93, SD = 1.62) also demonstrated their high English proficiency. All participants had normal or corrected-to-normal vision and no language or learning problems. They were given a $10 Amazon gift card for their participation.
2.1.2. Materials
Thirty-six semantically related English word pairs were selected from the USF Free Association Norms (Nelson et al., 1998). The selected stimuli were all concrete nouns containing three to seven letters, with forward association strength greater than 0.38 and word frequency larger than 10 per million. These critical items were chosen based on the following criteria: 1) the forward association strength of words should be as large as possible to ensure a high relatedness of the word pairs; 2) the frequency of each prime and target word should be as high as possible to ensure that participants were familiar with the stimuli; 3) the target words should be easily categorized into either the “man-made” or “natural” category to be used in both tasks, so we selected concrete nouns instead of other parts of speech; 4) the length of words should be kept to a minimum to facilitate recognition. By applying these criteria to material selection, 42 pairs of English words were initially selected. Five highly proficient Chinese learners of English who shared a similar language learning background with the participants were instructed to complete an English-Chinese translation task for all selected pairs. They also rated their familiarity with each word on a scale from 1 (least familiar) to 5 (most familiar). Six word pairs were removed from the list due to either low translation accuracy or low familiarity ratings, leaving a total of 36 semantically related pairs for the present study. The mean familiarity ratings for the primes and targets were 4.97 and 4.99, respectively, suggesting that Chinese learners of English were very familiar with these word pairs. The average word association strength of the final pairs was 0.55, which means that, on average, 55% of the native English speakers tested in Nelson et al. (1998) produced the targets when presented with the primes.
Two counterbalanced lists were created. The target words that were primed by semantically related words in List 1 were primed by semantically unrelated words in List 2. The related primes had a mean frequency of 48.86 per million and were on average 5.00 letters long, with a mean concreteness of 5.67. The unrelated primes were matched to the related primes on word length, concreteness, and frequency. See Table 1 for the lexical properties of the items. To avoid strategic processes, the proportion of the related items was reduced to 20% (McNamara, 2005) by adding another 54 unrelated filler pairs to make up a total of 90 word pairs. Filler primes and filler targets were matched with the related primes and targets, respectively, on word length, concreteness, and frequency. Moreover, to ensure that the same materials could be used in the subsequent semantic categorization task, the 90 word pairs had an equal number of word targets that could be categorized as “natural” (N = 45) or “man-made” (N = 45). To construct the lexical decision task, 90 orthographically legal and pronounceable English nonwords were selected from the ARC nonword database (Rastle et al., 2002) by matching their length to that of the critical word targets. Moreover, another 90 English words that matched the preceding English primes on word length, concreteness, and frequency were selected to prime the nonwords. The same filler pairs and nonword pairs were incorporated into both lists. To help participants familiarize themselves with the task, six practice items were added at the beginning of each list. Those items were different from the items used in the main experiment, and responses to these items were not included in the data analysis. Thus, each list contained a total of 186 trials, including six practice pairs, 36 critical word pairs, 54 unrelated filler pairs, and 90 nonword pairs.
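As a rough illustration of the quantitative selection criteria described above, the following Python sketch filters candidate pairs by word length, forward association strength, and frequency. The candidate records, their values, and the field names (`fas`, `prime_freq_pm`, `target_freq_pm`) are hypothetical and are not taken from the USF norms. Forward association strength is computed here as the proportion of respondents who produce the target in response to the prime, which is how the 0.55 average is interpreted in the text.

```python
def forward_strength(n_producing_target, n_respondents):
    """Proportion of respondents who gave the target when shown the prime."""
    return n_producing_target / n_respondents


def meets_criteria(pair, min_len=3, max_len=7, min_fas=0.38, min_freq=10):
    """Apply the numeric thresholds stated in the Materials section."""
    length_ok = all(min_len <= len(w) <= max_len
                    for w in (pair["prime"], pair["target"]))
    strength_ok = pair["fas"] > min_fas
    freq_ok = min(pair["prime_freq_pm"], pair["target_freq_pm"]) > min_freq
    return length_ok and strength_ok and freq_ok


# Hypothetical candidates; counts and frequencies are made up for illustration.
candidates = [
    {"prime": "dog", "target": "cat", "fas": forward_strength(82, 150),
     "prime_freq_pm": 75.0, "target_freq_pm": 23.0},
    {"prime": "chair", "target": "table", "fas": forward_strength(45, 150),
     "prime_freq_pm": 66.0, "target_freq_pm": 198.0},   # association too weak
    {"prime": "refrigerator", "target": "food", "fas": forward_strength(90, 150),
     "prime_freq_pm": 12.0, "target_freq_pm": 147.0},   # prime too long
]

selected = [p for p in candidates if meets_criteria(p)]
print([(p["prime"], p["target"]) for p in selected])
```

The qualitative criteria (concreteness, categorizability as "man-made" or "natural", translation accuracy, and familiarity ratings) are not modeled in this sketch.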
The trials (except for the practice pairs) were pseudorandomized to ensure that participants did not provide the same response for more than three consecutive items. Additionally, participants were randomly assigned to one of the two lists and they only saw each target word once throughout the experiment.
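The list composition and the ordering constraint can be sketched in Python as follows. This is only an illustration under stated assumptions: the 18/18 split of critical items into related and unrelated conditions per list is inferred from the counterbalancing and the 20% figure, and the function names and the restart-based ordering procedure are hypothetical rather than the procedure actually used to build the lists.

```python
import random


def build_trials():
    """Main-experiment trials for one list (the six practice items are excluded)."""
    spec = [("critical_related", "word", 18),
            ("critical_unrelated", "word", 18),
            ("filler_unrelated", "word", 54),
            ("nonword", "nonword", 90)]
    return [{"type": kind, "response": resp}
            for kind, resp, n in spec for _ in range(n)]


def longest_run(order, key):
    """Length of the longest streak of consecutive trials sharing key(trial)."""
    best = run = 0
    prev = object()
    for t in order:
        run = run + 1 if key(t) == prev else 1
        prev = key(t)
        best = max(best, run)
    return best


def constrained_order(trials, key, max_run=3, seed=None, max_restarts=1000):
    """Random order in which no more than max_run consecutive trials share key."""
    rng = random.Random(seed)
    for _ in range(max_restarts):
        pool, order = list(trials), []
        label, run = None, 0
        while pool:
            # Indices of trials that would not push the current streak past max_run.
            allowed = [i for i, t in enumerate(pool)
                       if not (key(t) == label and run >= max_run)]
            if not allowed:
                break  # dead end: start over with a fresh attempt
            t = pool.pop(rng.choice(allowed))
            run = run + 1 if key(t) == label else 1
            label = key(t)
            order.append(t)
        else:
            return order  # pool emptied without hitting a dead end
    raise RuntimeError("no valid order found")


trials = build_trials()
order = constrained_order(trials, key=lambda t: t["response"], seed=1)
n_words = sum(t["response"] == "word" for t in trials)
n_related = sum(t["type"] == "critical_related" for t in trials)
print(len(trials) + 6, n_related / n_words, longest_run(order, lambda t: t["response"]))
```

Adding the six practice pairs to the 180 main trials gives the 186 trials per list reported above, and 18 related pairs out of 90 word pairs gives the 20% relatedness proportion.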
Table 1. Lexical properties of the primes and targets

2.1.3. Procedure
The experiment was conducted online. The participants first read and signed the consent form before moving on to the main experiment. The experiment was delivered via PsychoPy (2022.2.4). Both the primes and targets were displayed in lowercase to facilitate recognition by Chinese learners of English (see Jiang, 2021). In the lexical decision task, participants were instructed to decide whether the letter strings shown on the screen were real English words or not. They were directed to press the “a” key on the keyboard if the letter string was a word and the “l” key if it was not. Participants were encouraged to create tags or take notes to help them remember which key represented which decision, to avoid confusion during the experiment. They were also instructed to respond as quickly and accurately as possible. Their reaction times were recorded from the onset of the target until they provided a response. Each trial began with a fixation cross (+) appearing on the screen for 500 ms, followed by a prime lasting for 250 ms and then by a target, which remained on the screen until the participant responded or for a maximum of 3 seconds. After the main experiment, participants were instructed to complete the LexTALE test and fill out the revised LEAP-Q questionnaire. This order was chosen to avoid test effects, as the LexTALE and the main experiment share a similar task format.
2.1.4. Data analysis
Before the main analyses, the data were first cleaned and trimmed. Participants whose error rate was above 20% were excluded from the final dataset (N = 2). The RT analysis focused only on the 36 critical items. We first deleted all the incorrect responses. This led to the exclusion of 2.34% of the data in the native speakers and 1.61% in the L2 learners. RT outliers were also removed as they may not reflect a genuine word recognition process (Jiang, Reference Jiang2012). We excluded RTs longer than 2,500 ms and shorter than 300 ms and RTs that deviated from each participant’s mean RT by 2.5 standard deviations (Jiang, Reference Jiang2012). This procedure resulted in the deletion of another 3.89% of the data in the native speakers and 2.97% in the L2 learners. The whole data trimming procedure affected a total of 6.23% of the data for the native speakers and 4.58% for the L2 learners. After data trimming, the data were inverse transformed (−1000/RT) to improve distribution normality (as judged from the histograms, the inverse transformation functioned better than the log transformation).
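The trimming and transformation steps described above can be sketched in Python as follows. This is a minimal sketch, not the authors' R code (which is available on OSF): the data structure and the function name `trim_and_transform` are hypothetical, and the sketch assumes that the 2.5 SD criterion is computed per participant after the absolute cutoffs have been applied, an ordering the text does not specify.

```python
from statistics import mean, stdev


def trim_and_transform(trials, low=300, high=2500, sd_cut=2.5):
    """trials: list of dicts with 'participant', 'rt' (in ms), and 'correct' (bool)."""
    # Step 1: drop incorrect responses and RTs outside the absolute cutoffs.
    kept = [t for t in trials if t["correct"] and low <= t["rt"] <= high]

    # Step 2: drop RTs more than sd_cut SDs from each participant's own mean
    # (assumption: mean and SD are computed on the data surviving step 1).
    rts_by_participant = {}
    for t in kept:
        rts_by_participant.setdefault(t["participant"], []).append(t["rt"])

    cleaned = []
    for t in kept:
        rts = rts_by_participant[t["participant"]]
        if len(rts) > 1:
            m, s = mean(rts), stdev(rts)
            if abs(t["rt"] - m) > sd_cut * s:
                continue
        # Step 3: inverse transformation, -1000/RT.
        cleaned.append({**t, "inv_rt": -1000 / t["rt"]})
    return cleaned


# Made-up data for one participant: 11 typical RTs, one slow outlier,
# one error, one anticipation (< 300 ms), and one timeout-like RT (> 2,500 ms).
sample = [{"participant": "p1", "rt": rt, "correct": True}
          for rt in range(500, 601, 10)]
sample += [
    {"participant": "p1", "rt": 2000, "correct": True},   # beyond 2.5 SD
    {"participant": "p1", "rt": 550, "correct": False},   # incorrect response
    {"participant": "p1", "rt": 250, "correct": True},    # too fast
    {"participant": "p1", "rt": 2600, "correct": True},   # too slow
]

cleaned = trim_and_transform(sample)
print(len(sample), len(cleaned), cleaned[0]["inv_rt"])
```

Because the transformation is −1000/RT, larger (less negative) values still correspond to slower responses, so a positive coefficient for the unrelated condition indicates slower responses after unrelated primes.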
Then, the trimmed and transformed RT data were analyzed through linear mixed effects (LME) models. All the analyses were conducted using R (Version 4.2.1: R Core Team, 2022) in the R Studio environment with the lme4 (Version 1.1–33: Bates et al., Reference Bates, Mächler, Bolker and Walker2015) and the lmerTest (Version 3.1–3: Kuznetsova et al., Reference Kuznetsova, Brockhoff and Christensen2017) packages. The transformed RT data were treated as the outcome variable and were continuous in nature. Condition (Related vs. Unrelated), Group (Native vs. L2), Association Strength, Target Word Frequency, Concreteness, and Length were taken as fixed effects, with Association Strength, Target Word Frequency, Concreteness, and Length being continuous and Condition and Group being categorical. All the continuous variables were grand-mean-centered, and the two categorical variables were dummy-coded with the related condition and native speakers as the reference groups. Item and Participant were treated as random intercepts, and both by-item and by-participant random slopes were considered in each analysis.
To fit the random effects, the maximal random effects structures recommended by Barr et al. (Reference Barr, Levy, Scheepers and Tily2013) were used, meaning that all the by-participant and by-item random intercepts and slopes were first included in the model. We examined whether deleting one random component would significantly decrease the model fit. If removing one random component led to a significant decrease in the model fit, the more complex model would be retained. Otherwise, a simpler model should be chosen. Models were estimated using the maximum likelihood technique. The χ2 likelihood ratio test with its associated p-value was examined for model comparison in this stepwise procedure.
To evaluate the fixed effects, we adopted a confirmatory approach. This was achieved by treating the transformed RT data as the outcome variable, Condition, Group, and the interaction between Condition and Group as the fixed effects, with Association Strength, Target Word Frequency, Concreteness, and Length as covariates. The semantic priming effect for native speakers was indicated by the main effect of Condition when the reference group was set as native speakers. We releveled the reference group to L2 learners to evaluate the semantic priming effect for the L2 learners. We acknowledge that generalized linear mixed-effects models (GLMMs) with inverse-Gaussian or gamma distributions can be used to handle positively skewed RT data. However, we did not adopt this approach because such models often present convergence issues and are sensitive to complex random-effects structures, as in the present case. Complete mixed-effects model output and R code for all the analyses are available on Open Science Framework (OSF): https://osf.io/d2ph4
2.2. Results and discussion
Table 2 displays the descriptive statistics of the average RT and ER of the native and non-native English participants’ performance in the LDT. The ERs did not differ systematically across the related and unrelated conditions for either the native English speakers or the L2 learners.
Table 2. Native English speakers’ and L2 learners’ mean RT (in ms) and ER (in percentage) in the LDT (standard deviation in parentheses)

Note: Asterisk means statistically significant at p < .05 level. Inferential statistics were computed on inverse-transformed RTs, and the raw means should be interpreted descriptively only.
When native English speakers were set as the reference group, the best-fitting LME model, which included by-participant random slopes for Condition and Association Strength and a by-item random intercept, showed a significant main effect of Condition: b = 0.074, SE = 0.022, p = .001. Native English speakers thus responded significantly more slowly in the unrelated condition than in the related condition; that is, the semantic priming effect was significant for native speakers. However, after releveling the reference group to L2 learners, the main effect of Condition was not significant, even after controlling for covariates: b = 0.005, SE = 0.022, p = .817, indicating that L2 learners did not respond significantly differently in the related and unrelated conditions. In other words, the semantic priming effect was not significant for L2 learners. The interaction between Condition and Group was significant: b = −0.069, SE = 0.031, p = .027, indicating that the priming effect differed significantly between native speakers and L2 learners. This suggests that the two groups engaged in qualitatively different processing in the LDT (i.e., the NSs showed an effect, whereas the NNSs did not).
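The three coefficients reported in this paragraph are linked by the dummy coding: with native speakers as the reference group, the Condition effect for L2 learners equals the native-speaker Condition effect plus the Condition × Group interaction. The short Python check below uses only the values reported here; the variable and function names are illustrative.

```python
# Coefficients reported for Experiment 1 (inverse-transformed RT scale,
# native speakers as the reference group).
b_condition_native = 0.074  # Condition effect for native speakers
b_interaction = -0.069      # Condition x Group interaction

def simple_effect_other_group(b_reference, b_interaction):
    """Condition effect for the non-reference group under dummy coding."""
    return b_reference + b_interaction

l2_effect = round(simple_effect_other_group(b_condition_native, b_interaction), 3)
print(l2_effect)  # 0.005
```

The result matches the Condition coefficient obtained after releveling the reference group to L2 learners (b = 0.005), which is what the releveling step is expected to produce.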
In sum, Experiment 1 examined L2 intralingual semantic priming with both native English speakers and L2 learners in the LDT. The results demonstrate a significant semantic priming effect for native English speakers, but a null effect for L2 learners. The nonsignificant L2 semantic priming effect was consistent with the findings of many previous bilingual processing studies (e.g., Chen & Ng, Reference Chen and Ng1989; Jin, Reference Jin1990; Smith et al., Reference Smith, Walters and Prior2019), indicating a weak connection between L2 words and their meanings.
3. Experiment 2: L2 semantic priming in SCT
As reviewed earlier, previous translation priming studies (e.g., Finkbeiner et al., Reference Finkbeiner, Forster, Nicol and Nakamura2004; Grainger & Frenck-Mestre, Reference Grainger and Frenck-Mestre1998) have consistently found a more robust presence of L2 translation priming effects in a semantic task than in a lexical decision task. Given that translation priming can be confounded by language switching and form-level activation across languages, intralingual L2 semantic priming may be a better paradigm to examine the strength of L2 form-meaning connections. However, whether a significant L2 semantic priming effect could be found in a semantic task among advanced L2 learners is unknown. Thus, Experiment 2 was conducted to see whether a robust L2 semantic priming effect could be observed in a semantic categorization task.
3.1. Method
3.1.1. Participants
Another 38 native English speakers and 40 highly proficient L2 learners were recruited. A between-group design was adopted to ensure that no participant would see the same targets twice across the experiments. To ensure the equivalence of learner characteristics, Experiment 1 and Experiment 2 were conducted concurrently, and the participants were randomly assigned to either the LDT or the SCT. Two L2 participants who reported feeling more comfortable using English than Chinese were excluded. Thus, there were a total of 38 L2 learners. Table 3 displays the learner characteristics of the L2 learners recruited for the two experiments. A set of independent samples t-tests or Wilcoxon rank sum tests (for nonnormally distributed data) demonstrated that there was no statistically reliable evidence for a difference between the two groups in terms of age: W = 772.5, p = .904, r(rank biserial) = 0.02, 95% CI [−0.24, 0.27]; age of acquisition: t(76) = 1.30, p = .20, Cohen’s d = 0.29, 95% CI [−0.15, 0.75]; length of residence: W = 610, p = .134, r(rank biserial) = −0.20, 95% CI [−0.43, 0.06]; LexTALE score: t(74) = −0.79, p = .430, Cohen’s d = −0.181, 95% CI [−0.63, 0.27]; self-reported reading: W = 786.5, p = .788, r(rank biserial) = 0.03, 95% CI [−0.22, 0.28]; listening: W = 724, p = .714, r(rank biserial) = −0.05, 95% CI [−0.30, 0.21]; writing: W = 754.5, p = .959, r(rank biserial) = −0.007, 95% CI [−0.26, 0.25]; and speaking proficiency: W = 700, p = .542, r(rank biserial) = −0.08, 95% CI [−0.32, 0.18]. These results indicate that the two groups possessed a comparable level of English proficiency and shared a similar language learning background.
Table 3. Characteristics of the L2 participants (mean and standard deviation in parentheses)

3.1.2. Materials
The same 36 critical word pairs and 54 filler pairs used for Experiment 1 were adopted in Experiment 2. As in Experiment 1, the critical items were counterbalanced into two lists. The filler items were the same across the two lists. Thus, each presentation list included 36 critical word pairs and 54 unrelated filler pairs displayed pseudo-randomly, preceded by six practice pairs at the beginning of the list.
3.1.3. Procedure
After the consent process, participants were asked to complete a semantic categorization task where they needed to decide whether the target word refers to something that is “man-made” or “natural.” If a target word refers to something made by humans (e.g., table), participants were instructed to press the “a” button on the keyboard. If a target word represents something that comes from nature (e.g., ocean), they were asked to press the “l” button. All other details remained the same as those outlined in Experiment 1.
3.1.4. Data analysis
The same data trimming and transformation procedure, statistical models, and R packages were used to analyze the RT data obtained from the SCT. One native English speaker was excluded due to a high error rate. The deletion of incorrect responses led to a loss of 7.88% of the data in the native speakers and 7.53% in the L2 learners. The removal of outliers further affected 3.50% of the data in the native speakers and 3.40% in the L2 learners. In total, the cleaning procedure excluded 11.38% of the data in the native speakers and 10.93% in the L2 learners. The same statistical analyses conducted in Experiment 1 were also applied to Experiment 2. In addition, an extra analysis was conducted to explore task effects. This analysis combined the data collected from the LDT and the SCT. The transformed RT was the outcome variable; Condition, Group, Task (LDT vs. SCT; LDT as the reference level), and their interactions were treated as fixed effects, with appropriate by-item and by-participant random intercepts and slopes entered into the initial model. The evaluation of both the random and fixed effects followed the same procedure as described in Experiment 1.
3.2. Results and discussion
Table 4 presents the descriptive statistics of the average RT and ER of the native and non-native English participants’ performance in the SCT. The ERs of both the native and L2 speakers did not differ systematically across the related and unrelated conditions.
Table 4. Native English speakers’ and L2 learners’ mean RT (in ms) and ER (in percentage) in the SCT (standard deviation in parentheses)

Note: Asterisk means statistically significant at p < .05 level. Inferential statistics were computed on inverse-transformed RTs, and the raw means should be interpreted descriptively only.
The best-fitting LME model with by-participant and by-item random intercepts showed a significant main effect of Condition: b = 0.041, SE = 0.015, p = .008, meaning that native speakers responded significantly more slowly in the unrelated condition than in the related condition. The result suggests that a significant semantic priming effect was found for native English speakers. However, after releveling the reference group to L2 learners, Condition was not a significant predictor of RT: b = −0.002, SE = 0.015, p = .877, indicating that the semantic priming effect was not significant for L2 learners even after controlling for covariates. The interaction between Condition and Group was significant: b = −0.043, SE = 0.021, p = .045, indicating that the semantic priming effects were significantly different between native English speakers and L2 learners. This finding suggests that the two groups engaged in qualitatively different processing in the SCT.
Regarding task effects, the best-fitting model, which included by-participant random slopes for Condition and by-item random slopes for Task, failed to reveal a significant interaction between Condition and Task for native English speakers: b = −0.035, SE = 0.028, p = .203, meaning that the semantic priming effects were similar across the two tasks. Thus, no task effect was found for native speakers. A similar pattern was observed for L2 learners. After releveling the reference group to L2 learners, the interaction between Condition and Task was not significant either: b = −0.037, SE = 0.027, p = .891, signaling the lack of a task effect for L2 learners. Figure 1 plots the semantic priming effects across tasks and groups.

Figure 1. The semantic priming effect across tasks and groups.
In sum, Experiment 2 investigated the L2 intralingual semantic priming effect via a semantic categorization task. A significant semantic priming effect was observed among native English speakers, but not among L2 learners. Additionally, no task effect was found for either group. The results were inconsistent with the findings of previous translation priming studies (e.g., Grainger & Frenck-Mestre, Reference Grainger and Frenck-Mestre1998; Finkbeiner et al., Reference Finkbeiner, Forster, Nicol and Nakamura2004; Xia & Andrews, Reference Xia and Andrews2015), where a significant L2 translation priming effect was more robustly found in a semantic task among unbalanced bilinguals.
4. General discussion
To summarize the results of both experiments, native English speakers demonstrated a significant semantic priming effect in both the LDT and the SCT, and the priming magnitude was similar across tasks. This finding confirmed the adequacy of the experimental design: a reliable priming effect could be obtained with the experimental setup if strong word-concept links were present. The lack of a task effect among native speakers was not consistent with our expectation or with the results of de Wit and Kinoshita (Reference de Wit and Kinoshita2015a). However, similar priming effect sizes across tasks have been reported in previous studies involving native speakers. For example, in an unmasked priming study with an SOA of 250 ms, Sanchez-Casas et al. (Reference Sanchez-Casas, Ferre, Garcia-Albea and Guasch2006) reported a semantic priming effect of 29 ms and 27 ms in the lexical and semantic tasks, respectively, for their “very close” semantic pairs, and the priming effect was 14 ms and 17 ms in the two tasks for the “close” semantic pairs. No task effect was observed. Thus, the task effect among native speakers can be an elusive phenomenon that depends on the circumstances of the experimental setup.
In contrast to the reliable semantic priming effect among native speakers, advanced L2 learners failed to produce a significant L2 semantic priming effect in either task. The non-significant L2 semantic priming effect found in the LDT corroborated the results of many previous bilingual processing studies showing no semantic priming when L2 words served as primes (e.g., Chen & Ng, Reference Chen and Ng1989; Jin, Reference Jin1990; Keatley et al., Reference Keatley, Spinks and de Gelder1994; Smith et al., Reference Smith, Walters and Prior2019). The finding that semantic priming was hard to come by even when a semantic task was adopted provided stronger evidence for a weak word-concept connection among L2 speakers. It helped to rule out the possibility that a lack of semantic priming effects among L2 speakers was a result of adopting a lexical task.
These findings were inconsistent with those of several previous studies that demonstrated L2–L2 semantic priming effects. We believe methodological differences may underlie the discrepancy. The first one was the linguistic profile of the participants. In the present study, we tested advanced unbalanced bilinguals who were L1 dominant. Even though they were living in an English-speaking country at the time of testing, they were classroom L2 learners with very limited access to natural input and interaction opportunities as their English emerged. In contrast, among the studies that produced L2–L2 semantic priming, Grainger and Beauvillain (Reference Grainger and Beauvillain1988) tested early English-French bilinguals who had “acquired both languages in infancy” (p. 268). They were a different population with a much earlier onset age and much higher L2 proficiency. The Dutch-English bilinguals tested in de Groot and Nas (Reference de Groot and Nas1991) were unbalanced bilinguals, but as bilingual speakers of two typologically related languages who lived in an environment with easier access to L2, they were likely to be more proficient in L2 than the participants tested in our study. There were no data for noncognates in their L1 Dutch and L2 English for comparison, but the L1–L2 difference in RT on cognates (e.g., in their Experiment 2) was smaller than the difference shown by our participants. This proficiency difference could have contributed to the difference in the results.
A second methodological difference was the percentage of related items in the stimuli. In the present study, we adopted a low percentage of related items (20%) by adding unrelated filler items in order to minimize the chance that the participants would notice the prime-target relationship and thus produce a strategic effect. The unbalanced bilinguals in Smith et al.’s (Reference Smith, Walters and Prior2019) study showed a reliable L2–L2 semantic priming effect, but that study did not use filler items to reduce the proportion of related items. As a result, 50% of their items were related items, compared to 20% in the present study, and 16% and 33% in Grainger and Beauvillain (Reference Grainger and Beauvillain1988) and Frenck and Pynte (Reference Frenck and Pynte1987), respectively. Relatedness proportion is known to affect the semantic priming effect size (de Wit & Kinoshita, Reference de Wit and Kinoshita2015b), with higher proportions associated with larger priming effects. Under these circumstances, at least some priming effect may be strategic in nature.
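The 20% figure can be reproduced from the list composition given in the Materials sections (36 critical pairs and 54 unrelated filler pairs per list). The Python sketch below assumes that half of the critical pairs in each counterbalanced list appeared with a related prime and that practice pairs are not counted; both points are assumptions, and the function name is illustrative.

```python
def relatedness_proportion(n_critical, n_filler, related_share=0.5):
    """Proportion of related prime-target pairs in one presentation list.

    Assumption: `related_share` of the critical pairs are shown with a
    related prime, and all filler pairs are unrelated.
    """
    n_related = n_critical * related_share
    return n_related / (n_critical + n_filler)

print(relatedness_proportion(36, 54))  # 0.2
print(relatedness_proportion(36, 0))   # 0.5
```

The second call shows that, under the same assumption, a design without filler pairs yields a relatedness proportion of 50%.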
A third possible methodological cause was the test items used. The study reported by Frenck and Pynte (Reference Frenck and Pynte1987) produced a reliable L2–L2 semantic priming effect among unbalanced bilinguals with a low related item ratio, so these two methodological features cannot explain the discrepancy. However, they used categories and exemplars (e.g., bird-sparrow) as prime-target pairs. Six categories (e.g., clothing, body, vegetable) were involved, each with six exemplars. In contrast, we selected our stimuli by relying on associative strength, resulting in a mixture of different types of prime-target pairs, such as members of the same category (e.g., table-chair), pairs with semantic feature overlap (e.g., forest-tree), and associative pairs with limited semantic feature overlap (e.g., nest-bird). A related difference was that the test items used in Frenck and Pynte involved narrower categories compared to the much broader distinction of natural and man-made objects adopted in the present study. Previous research has demonstrated that a priming effect was more likely to occur when the stimuli involved smaller or narrower categories. For example, in a semantic judgment task, Forster (Reference Forster2004) reported a significant semantic priming effect (in the form of the congruency effect) with narrower categories such as Months (e.g., related/unrelated pairs being january/machine-AUGUST), but no such priming effect was observed in an earlier study involving a broader category (Forster et al., Reference Forster, Mohan, Hector, Kinoshita and Lupker2003). Whether or how these stimulus-related differences contributed to the different findings has yet to be determined.
Another point worth considering is why a task effect was observed in translation priming but not in L2–L2 semantic priming. Another way to frame this question is why L2–L1 translation priming, but not L2–L2 semantic priming, can be observed in a semantic task, even though L2 primes were involved in both cases and even when similar participants were involved (e.g., Finkbeiner et al., Reference Finkbeiner, Forster, Nicol and Nakamura2004; Xia & Andrews, Reference Xia and Andrews2015). This difference can be explained in terms of the type of connections involved. Assuming that the same conceptual representations are linked to a bilingual’s two languages, translation priming involves a single shared conceptual representation, thus with a relatively short path of connections: L2 word prime to shared concept to L1 translation target. In contrast, a longer activation pathway is involved in L2–L2 semantic priming: L2 word prime to concept to related concept to L2 word target. The difference between translation priming and semantic priming can also be seen in a distributed model of semantic representation. Translation pairs are likely to share more semantic features than semantically related pairs. Consistent with this analysis, the translation priming effect was often found to be larger than the semantic priming effect in the same study (e.g., Schoonbaert et al., Reference Schoonbaert, Duyck, Brysbaert and Hartsuiker2009; Smith et al., Reference Smith, Walters and Prior2019). Thus, the semantic task may boost the involvement of L2 word-concept links to uncover L2–L1 translation priming due to its short activation pathway, leading to the task effect. However, it may not be enough to trigger L2–L2 semantic priming because of the longer activation pathway involved. It is worth noting in this context that Guasch et al. (Reference Guasch, Sánchez-Casas, Ferré and García-Albea2011) examined interlingual translation and semantic priming among highly proficient Catalan-Spanish and Spanish-Catalan bilinguals in a lexical and a semantic task, and the priming effect was similar between the two tasks as well.
To summarize, the lack of intralingual L2 semantic priming in both tasks corroborates existing findings in demonstrating a weaker word-concept link in L2 compared to L1 in these unbalanced L1-dominant bilinguals. These existing findings include a weaker interlingual semantic priming effect in the L2–L1 direction than in the L1–L2 direction (e.g., Jin, Reference Jin1990; Schwanenflugel & Rey, Reference Schwanenflugel and Rey1986), a lower percentage of meaning-related responses in word association in L2 than in L1 (e.g., Jiang & Zhang, Reference Jiang and Zhang2021), a weaker false memory effect among L2 speakers (e.g., Beato & Arndt, Reference Beato and Arndt2021; Howe et al., Reference Howe, Gagnon and Thouas2008), and a reduced emotional resonance in processing L2 (e.g., Jończyk et al., Reference Jończyk, Naranowicz, Bel-Bahar, Jankowiak, Korpal, Bromberek-Dyzman and Thierry2024; Toivo & Scheepers, Reference Toivo and Scheepers2019).
We recognize that some studies have shown reliable intralingual semantic priming effects in L2. Other studies have demonstrated comparable semantic effects between bilinguals’ two languages or among L1 and L2 speakers. We explained some discrepancies in terms of methodological differences. However, further research is needed to sort out the circumstances where semantic effects are comparable and different, and to explore L2-concept connections in general.
5. Conclusion
We want to conclude with a discussion of the limitations of the study and some ideas for future research. One potential limitation was related to the test items. We inadvertently included three critical items and two control items whose prime and target may be considered to belong to different categories in the natural/man-made classification (i.e., juice – orange; garden – flower; net – fish; café – bird; cousin – job). This incongruency in category membership between the prime and the target may have an impact on a participant’s response to the target. However, after deleting these five pairs, the post-hoc analysis still did not find a significant L2 semantic priming effect in either the LDT: b = 0.002, SE = 0.022, df = 69.450, p = .941 or the SCT: b = 0.001, SE = 0.016, df = 1948, p = .940, suggesting that the non-significance was not due to these items. This incongruency was also present in some filler items. They could have had some carryover effects, thus affecting the responses to some critical items, but we suspect the impact should be limited.
A second issue related to the test items was the number of items. According to Brysbaert and Stevens (Reference Brysbaert and Stevens2018), a minimum of 40 participants and 40 items is recommended to detect a median effect size. After data trimming, there were fewer than 36 critical items for each participant and fewer than 40 participants per task. This may compromise the effect size to some extent when using mixed-effects models. More items and participants are desirable to increase statistical power. Finally, with a focus on selecting prime-target pairs that were strong in associative relationships, we ended up including a mixture of items with different prime-target relationships, some being members of the same category and others being more associative in nature. In future research, it is desirable to include a more homogeneous set of items in this regard.
In addition, it is also worthwhile comparing participants of different L2 proficiencies, particularly very advanced L2 speakers, in this research. We classified our participants as advanced L2 speakers, but due to typological differences between their L1 and L2, they may be better classified as developing L2 speakers. Data from more advanced L2 speakers would be more informative for understanding this topic. Furthermore, given the null effect among L2 speakers, it is desirable to include a repetition priming manipulation to demonstrate that the L2 primes are processed by the L2 participants. Finally, one may also ask to what extent an associative relationship based on native speakers’ performance also applies to nonnative speakers. For example, bread-butter may be a strong associate for English native speakers, but less so for Chinese ESL speakers because butter may not be part of their diet. It is desirable to select test items by norming them among prospective participants first, rather than based on L1 norming data.
Data availability statement
The data and material that support the findings of this study are openly available in OSF at: https://osf.io/d2ph4.
Competing interests
The authors declare none.



