The tug of war between an idiom's figurative and literal meanings: Evidence from native and bilingual speakers

In two lexical-decision experiments, we investigated the processing of figurative and literal meaning in idioms. Dutch native and German-Dutch bilingual speakers responded to target words presented after a minimal context idiom prime (e.g., 'He kicked the bucket'). Target words were related to the figurative meaning of the prime ('die'), the literal word at the end of the idiom ('water'), or unrelated to both ('face'). We observed facilitation in RTs for figuratively and literally related targets relative to unrelated targets for both participant groups. A higher frequency idiom-final word caused inhibition in responses to the literally related target for native speakers, indicating competition between the idiom as a whole and its literal word constituents. Native speakers further showed sensitivity to transparency of the idiom's meaning and the plausibility of the idiom as a literally interpretable sentence. The results are interpreted in terms of available L1/L2 idiom comprehension models, and a more detailed processing account for literal and idiomatic sentence interpretation.


Introduction
The presence of idioms in language often leads to advantages in processing speed over purely literal language for native language (L1) speakers (Gibbs, 1980; Ortony, Schallert, Reynolds & Antos, 1978; Swinney & Cutler, 1979). This finding is commonly referred to as the IDIOM SUPERIORITY EFFECT. In the L1, sentences containing idioms are read faster than comparable literal sentences, and less fixation time is spent on idiom-final words than on words in a literal context sentence (Siyanova-Chanturia, Conklin & Schmitt, 2011; Underwood, Schmitt & Galpin, 2004). However, when an idiomatic expression is understood, activating the meaning of its component words does not NECESSARILY lead to the meaning of the expression as a whole. For example, a semantic analysis of the words in the idiomatic expression to kick the bucket could result in a literal interpretation instead of the idiom's meaning to die. Idioms themselves also differ in transparency: the meaning of some idioms is directly derived from their component parts (e.g., to miss the boat means to miss out on an opportunity), whereas that of other idioms such as kick the bucket (to die) is more opaque.
In the current study our first aim was to clarify the online processing of figurative and literal meaning activation during native and bilingual idiom comprehension by means of a novel visual-visual priming paradigm. This paradigm tapped into the activation resulting from literal and idiomatic sentence processing AFTER the final word of the sentence was presented rather than before it, as is often done. The collected empirical data were then used to reach the second aim of our study: a more detailed theoretical account of how idiomatic and literal sentence processing takes place at form and meaning levels in natives and bilinguals performing priming tasks.
Before we zoom in on our own study, we discuss available models and empirical studies on idiom processing for native and bilingual speakers in the following sections.

Models of idiom processing
Accounts of L1 idiom processing differ in the extent to which they allow for literal word activation during the unfolding of an idiomatic expression. Available models can be broadly divided into three categories: (1) models in which figurative meaning takes precedence, (2) models in which literal word and/or sentence meaning takes precedence, and (3) hybrid models in which figurative and literal meaning are allowed to interact. We define these categories based on speed of retrieval, where precedence for figurative meaning means that the figurative reading of an idiom becomes available at an earlier point in time than its literal sentence reading (or literal word meaning is suppressed by the idiom's figurative meaning).
First, some models postulate that literal word activation in the context of an idiom is only possible when the idiom's meaning does not provide a sufficient interpretation for the presented context (Gibbs, 1980). One of the most influential views on idiom processing, the LEXICAL REPRESENTATION HYPOTHESIS, assumes that literal word activation runs in parallel with retrieval of the idiom as a separate holistic representation (Swinney & Cutler, 1979). However, the idiom retrieval route wins out in speed over the computation of literal word meanings. As a consequence, the idiomatic meaning of a string of words becomes available faster than its literal, compositionally computed, interpretation.
Second, retrieving literal meaning may be considered an immediate priority, with some researchers assuming that retrieval of the idiomatic meaning of a sentence is only possible once a literal interpretation of that sentence has been reliably rejected (Bobrow & Bell, 1973). According to the CONFIGURATION HYPOTHESIS (Cacciari & Tabossi, 1988), literal activation of incoming word strings is prioritized, but when enough information is gathered for the string to be recognized as an idiom (the IDIOM KEY), its figurative meaning is retrieved. Here idiom familiarity is the key to fast retrieval of idiomatic expressions, rather than the string's idiomaticity. Familiarity has been shown to facilitate processing speed (Connine, Mullennix, Shernoff & Yellen, 1990; Schweigert, 1991) and to be a key contributor to reaction time (RT) facilitation effects for targets presented at the offset of an idiomatic string in a lexical decision task (Titone & Libben, 2014).
Finally, hybrid models allow activation to spread between higher order representations of idioms and literal word components. These models represent idioms as units in the lexicon that are connected to their individual words (Cutting & Bock, 1997; Sprenger, Levelt & Kempen, 2006). In their production-based model of idiom processing, Sprenger et al. (2006) represent idiom meanings as SUPERLEMMAS that act as mediators between the conceptual representations of an idiom's meaning and the simple lemmas of the literal words the idiom is composed of. The model allows activation to spread from the superlemma to these simple lemmas and back via an 'element-of' relationship. Applied to idiom comprehension, the model assumes an initial activation of simple lemmas that ultimately leads to activation of the idiom's superlemma. Activation is allowed to spread from the simple lemmas to the superlemma and vice versa. Another proposed hybrid model, the CONSTRAINT-BASED MODEL OF IDIOM PROCESSING (Libben & Titone, 2008; Titone & Libben, 2014), stresses the importance of constraints placed on idiomatic processing, such as context: Activation of the idiomatic meaning of a sentence builds up over time as evidence accumulates for a figurative interpretation of the sentence, and as such different information may be available at differing timepoints during the unfolding of an idiom.

Native studies
With respect to sentence comprehension, facilitation effects of figurative meaning have been shown in many studies, for instance, using cross-modal priming (Titone & Libben, 2014). Lexical decisions to visual target words related to the idiomatic meaning of an auditory prime sentence (e.g., sleep for the idiom Fred hit the sack) were faster than to words presented after a literal control sentence (e.g., They liked the coffee). Different idiom properties mediated this facilitatory effect at different time points of presentation of the visual target. For example, increased literal plausibility of the idiom slowed RTs to targets presented at the penultimate position of an idiom (e.g., after the in Fred hit the sack), idiom familiarity facilitated RTs to targets presented at the offset of the idiom (e.g., after sack in Fred hit the sack), and idiom decomposability was the most important facilitator of RTs for targets presented at 1000 ms post idiom-offset. Further studies indicate that words within an idiom can also prime literally related words, by showing that the idiom to kick the bucket can prime a semantic associate of bucket such as pail (Hillert & Swinney, 2001; Cacciari & Tabossi, 1988). Rommers, Dijkstra, and Bastiaansen (2013) addressed the issue of figurative versus literal (word) meaning activation in an EEG study by comparing the priming effects for literally and idiomatically biased sentences. N400 effects were obtained on a critical word that functioned as an index of the violation of semantic expectancy (Kutas & Hillyard, 1980; Kutas & Federmeier, 2011). Participants silently read sentences like After many transactions the careless scammer eventually walked against the lamp yesterday. Here a critical word was either the expected word for the idiom (lamp in this example, which is a constituent in the Dutch idiom to walk against the lamp, meaning: to get caught), literally related to this expected word (candle) or unrelated to both conditions (fish). Analogous literal sentences provided a similar bias towards the crucial noun in a contrasting literal context (e.g., After lunch the electrician screwed the new light bulb into the lamp yesterday). In the literal context, the expected noun lamp showed the smallest N400 effect, followed by candle and fish in a graded pattern. In the idiomatic context, however, this graded effect disappeared as there was no longer a difference in N400 effects between the semantic associate candle and the unrelated fish. These results show that literal word activation may be suppressed when it is part of a strong figuratively biasing context.

Bilingual studies
The available studies provide a two-sided view of bilingual idiom comprehension. Some studies have shown faster processing in advanced bilinguals for literal meaning than figurative meaning, in contrast to L1 studies (e.g., Cieślicka, 2006; Siyanova-Chanturia et al., 2011). Whereas these studies generally reported an absence of figurative facilitation in bilinguals, a recent study by Beck and Weber (2016a) yielded facilitatory effects of figurative meaning, although facilitation for literal meaning remained more prominent. American English L1 speakers and German-English bilinguals participated in a cross-modal priming experiment in which auditory sentences were presented that included idioms (e.g., a sentence ending in likes to pull my leg). Following the auditory sentence, a visual word was presented for English lexical decision. Targets that were figuratively related to the meaning of the idiom (e.g., JOKE for the previous example) were responded to faster than unrelated targets (SHIP), but targets that were semantically related to the last word of the idiom (WALK) were responded to fastest. Although this study shows priming of figurative meaning in L2 speakers, facilitation in the literal condition was larger than in the figurative condition for bilinguals as well as natives. This finding contrasts with previous evidence on the saliency of figurative meaning in native speakers (Gibbs, 1980; Ortony et al., 1978; Swinney & Cutler, 1979). The authors further compared LEXICAL LEVEL IDIOMS (idioms with word-for-word translations between German and English, such as to lend someone an ear, German jemanden sein Ohr leihen) and POST-LEXICAL LEVEL IDIOMS (idioms with matching concepts but differing lexical items, such as to kick the bucket, German den Löffel abgeben (to give away the spoon)). Interestingly, different levels of translation equivalency did not influence results. The authors argue that "highly proficient L2 learners may access the figurative meaning of idioms directly in their experiment and therefore do not show effects of translation equivalency with the L1."
When ample time for translation is available, facilitation of figurative meaning has been shown in bilinguals (Carrol & Conklin, 2014; Charteris-Black, 2002; Irujo, 1986; Laufer & Hill, 2000). Carrol and Conklin (2014) suggested spreading activation from native to non-native words within idioms in an English lexical decision task with self-paced reading of a prime sentence. The prime contained an idiom translated from the participants' native language Chinese, and subsequently either a target that finished the idiom was presented (e.g., feet in the translated Chinese idiom to draw a snake and add feet, meaning to ruin something by adding unnecessary detail) or a matched control appeared (hair). The authors proposed a modified dual route model in which a representation of the idiom is accessible either directly via translation of an L1 idiom to the L2, or through analysis and computation of the phrase itself. In a follow-up study, Carrol and Conklin (2017) embedded translated idioms in a story context in eye-tracking experiments. Chinese-English bilinguals were shown to read idiom-final words more quickly in translated idioms than in control sentences. However, bilingual participants showed inhibitory effects for figurative readings of idiomatic phrases as compared to native speakers. This suggests that processing of a native idiom in a non-native language can remain problematic even for advanced Chinese-English second language learners. Carrol, Conklin, and Gyllstad (2016) addressed effects of cross-language overlap in an eye-tracking study with L1 English speakers and highly proficient L2 English speakers with L1 Swedish. Idioms were either L2-only (English-only), L1-only (Swedish idioms translated to English) or congruent (existing in both languages). Across the board, highly proficient L2 speakers showed faster reading times for idioms than for literal phrases. However, facilitation of reading times and an increased likelihood of skipping the final word were more similar in size for two of the categories of language overlap: those idioms that were L1-only (Swedish idioms translated into English) and congruent idioms. The authors argue this facilitation stems from participants' familiarity with the L1 idiom for both types of facilitated idioms.
Available studies show that the strength of figurative meaning effects is modulated by the experimental context for L2 speakers (Beck & Weber, 2016b; Bobrow & Bell, 1973). If the percentage of idioms in a particular experiment increases, the likelihood of finding figurative meaning effects and the strength of these effects may also be boosted. This concept of 'figurative attunement' is particularly interesting when considering L2 speakers. Because the percentage of idioms included in experiments is generally small, literal interpretations may be more salient to L2 speakers whose idiom representations may be less entrenched.

The current study
We conducted two visual-visual priming studies, one addressing L1 Dutch speakers and one addressing German-Dutch bilinguals. Dutch natives and German-Dutch advanced bilinguals were presented with idiom primes that were followed by target words for Dutch lexical decision. This paradigm allowed us to assess the activation of the idiomatic and the literal sentence interpretation as a whole, rather than confounded with the properties of the last item in the sentence.
All sentences presented in the two experiments were idioms, given that we were interested in literal word activation in circumstances where idiom activation may likely be assumed for L2 speakers as well as L1 speakers. To consider effects for idioms proper, all idioms were presented in isolation, without a biasing context. In three experimental conditions, target words were either (1) figuratively related to the idiom as a whole (FIG condition), (2) semantically related to the literal word at the end of the idiom (LIT condition), or (3) unrelated to both the idiom as a whole and the idiom-final literal word (UNREL condition). For the Dutch idiom Hij doet iets uit de losse pols (He does something from the loose wrist, he does something with ease), target words corresponding to these conditions were: (1) MAKKELIJK (EASY), (2) HARTSLAG (PULSE), and (3) SCHAAMTE (SHAME).
Comparing the three conditions, we predicted the following outcomes. First, for our Dutch native speakers we predicted RT facilitation for FIG targets relative to UNREL targets, in line with previous native idiom comprehension studies (e.g., Titone & Libben, 2014).
Second, we expected facilitated RTs in the LIT condition relative to UNREL. Although Rommers et al. (2013) reported no facilitation for literally related words over unrelated words in an idiom context, Beck and Weber (2016a) did. We hypothesized that this difference across studies could be attributed to the use of a larger biasing context in the first study and a minimal idiom context in the second. The critical word also differed across studies: Rommers et al. measured on a target that was part of the idiom, while Beck and Weber measured on a target presented after the idiom prime. Using minimal idiom contexts and measuring on a separate target presented at idiom offset, we predicted RT facilitation of LIT targets compared to UNREL targets.
Finally, we did not predict any difference in facilitation between the FIG and LIT conditions. Without a strongly biasing context, we did not expect the figurative meaning of the idiom to suppress semantic facilitation effects for the literal words. At first sight, this prediction contrasts with the findings of Beck and Weber (2016a), who reported more facilitation for literally than figuratively related targets in a comparable experimental setting. However, our prediction is based on a post-hoc analysis of the stimuli used in their study. Cognate status across American English and German was not controlled for and was unbalanced across conditions, which may have affected results as the presence of German-English cognates could affect English lexical decision responses. Furthermore, literal targets and their matched controls were significantly shorter in word length than figurative targets and their matched controls. Literal targets may therefore also have received more facilitation than figurative targets because of their shorter word length. We assume no difference between figurative and literal conditions in our study when these factors are balanced.
In the L2 (German-Dutch) study, we predicted facilitation in RTs for the Dutch FIG condition compared to the UNREL condition. Our idioms were selected to be well-known to L1 speakers, and our bilingual participants had a high level of language proficiency and immersion in the language environment. We further predicted facilitation for LIT targets compared to UNREL targets, considering previous evidence for the strength of literal word activation in idiom processing even for more advanced learners.
We predicted no difference in RTs between the FIG and LIT condition for the German-Dutch bilinguals. This prediction contrasts with the LITERAL SALIENCY HYPOTHESIS (Cieślicka, 2006) that either expects facilitation only for the literal target category, or more facilitation in the literal condition than in the figurative condition. As for the natives, due to the absence of a strongly biasing context we did not expect a strong suppression of literal word activation in the bilinguals.
Because word frequency can be considered as a typical marker for literal (word) activation, we tested several of its effects. First, we predicted that a higher target word frequency would facilitate RTs in Dutch lexical decision for both the native and bilingual group. This expectation is in line with previous research into frequency effects in lexical decision (Grainger, 1990; Rajaram & Neely, 1992; Scarborough, Cortese & Scarborough, 1977). Second, and more important for the present study, we examined how the frequency of the noun at the end of each idiom (e.g., 'lamp' at the end of hij liep tegen de lamp (he walked against the lamp, he got caught)) affects literal word processing within the idiom. We reasoned that if this idiom-final word is not suppressed by the idiom, it should be able to affect the RT to a literally related target word. Relative to a low frequency idiom-final word, a higher frequency word should prime both the idiom and the literal reading of the word more strongly, and responses to literally related target words should therefore be slowed due to the resulting increased lexical competition.
Finally, we were interested in the effects of several idiom properties: idiom familiarity, imageability, transparency, subjective frequency, and literal plausibility. We expected that familiarity would be the most influential predictor for lexical decision RTs in both participant groups, as idiom familiarity has commonly been identified as one of the most important predictors in idiom processing, which is understandable, considering that lack of familiarity with the meaning of an idiom should result in a lack of idiomatic meaning effects.
From a methodological perspective, our study added to the existing body of knowledge on idiom comprehension in several ways. First, by selecting the idioms most commonly known to L1 speakers, we aimed to maximize the odds of our bilingual participants being familiar with the idioms as well. Earlier studies commonly assessed idiom familiarity by asking participants the question 'are you familiar with the meaning of this idiom?', without objectively testing knowledge of the actual meaning of the idiom. We included open-ended questions at the end of our experiment to avoid this pitfall.
Second, we considered the effect of frequency not only with respect to the target word, but also with respect to literal words within the idiom. Previous research neglected the frequency aspects of individual words within idioms, but these may affect processing if literal word activation remains during idiom processing. Comparing processing of Dutch idioms between Dutch native speakers and German learners of Dutch, we used idioms that overlap between Dutch and German only in meaning and not in orthographic aspects (i.e., their constituent words were not cognates). For example, the Dutch idiom iemand aan de tand voelen (to feel someone on the tooth, meaning: to interrogate someone) is a translation equivalent of German jemandem auf den Zahn fühlen, but the constituent words in each idiom are non-cognates (at a Levenshtein distance of 2 or more for each word). For these expressions, we assumed direct translation effects to be minimal to none, considering the short word presentation times in an online experiment.
Finally, we examined the performance of advanced bilinguals (with more than three years of experience with L2 Dutch in an immersive environment) and included an objective measure of L2 vocabulary knowledge in the Dutch LexTALE vocabulary test (Lemhöfer & Broersma, 2012).
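The non-cognate criterion mentioned above (a Levenshtein distance of 2 or more between corresponding Dutch and German words) can be illustrated with a minimal sketch. The function names, the lowercasing step, and the pairing of tand with Zahn and voelen with fühlen are assumptions made for illustration; the sketch is not the procedure actually used to screen the stimuli.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertion, deletion, and substitution."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def is_non_cognate(dutch_word: str, german_word: str, threshold: int = 2) -> bool:
    """Hypothetical helper: treat a pair as non-cognate at distance >= threshold."""
    return levenshtein(dutch_word.lower(), german_word.lower()) >= threshold


# Illustrative content-word pairs from the example idiom (pairing assumed).
pairs = [("tand", "Zahn"), ("voelen", "fühlen")]
for dutch, german in pairs:
    print(dutch, german, levenshtein(dutch.lower(), german.lower()),
          is_non_cognate(dutch, german))
```

Comparing lowercased forms keeps German noun capitalization from inflating the distance.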
In the next two sections, we present the L1 and L2 experiments of our study in turn.

Participants
In total, 46 students from Radboud University Nijmegen participated in the study (38 females, mean age = 22.7). All participants were right-handed native speakers of Dutch and had normal or corrected-to-normal vision. Participation in the experiment was voluntary and compensated with participant credits or a gift card.

Materials and design
Experimental materials consisted of 26 Dutch idiomatic expressions. Dutch native speakers rated each expression on several five-point Likert scales. Each expression received at least 20 ratings. Four questions were posed before the idiom's meaning was shown, assessing: subjective frequency ('how often have you seen or heard this expression?'), familiarity ('how familiar are you with the meaning of this expression?'), literal plausibility ('how literally plausible is this sentence?'), and imageability ('how easily can you associate an image with this sentence?') (see also Hubers, van Ginkel, Cucchiarini, Strik & Dijkstra, 2018). Then, participants answered an open-ended and a four-choice multiple-choice question assessing recall and recognition of the idiom's meaning, respectively. Finally, the correct meaning of the idiom was provided and idiom transparency was assessed ('how clear is the meaning of this expression based on the constituent words?'). Idioms were only selected for this experiment if they were well known to Dutch native speakers, as determined by the average percentage correct on the multiple-choice question. The 26 selected idioms for this study had an average of 91.96% correct answers (SD = 13). All idioms included in the experiment are listed in Table A1 in Appendix 1 (norming data are included in Table A2 in Appendix 2).
Primes consisted of each idiom presented as a sentence in a Rapid Serial Visual Presentation (RSVP) paradigm. Each sentence ended in a noun (henceforth: idiom-final word). Target words were selected for three conditions per idiom: figuratively related to the idiom's meaning (FIG), semantically related to the idiom-final word (LIT), or unrelated to both (UNREL). An example of trial sentences and lexical decision targets is given in Table 1 for the Dutch idiom iemand aan de tand voelen (to feel someone on the tooth). Target words for the LIT condition were primarily obtained from the Dutch Word Association Database (De Deyne & Storms, 2008). Frequency information for all targets was obtained from SUBTLEX-NL, a database of Dutch word frequencies based on 44 million words from film and television subtitles (Keuleers, Brysbaert & New, 2010). Words were only selected for the pool of possible targets if their LOG10 frequency in the SUBTLEX-NL database was two or higher. Target word relatedness for each of the conditions was assessed in several validation surveys. Each potential target was rated by at least 20 participants for its relatedness to the idiom prime and the idiom-final word on a five-point Likert scale ranging from 'completely unrelated' to 'completely related'. Literal filler pairs (e.g., 'tiger-stripes') were inserted into each validation survey to mask its purpose. From the validation surveys, target words were selected and balanced across conditions on word length and frequency. Paired t-tests showed that the average relatedness scores for figuratively related target words in relation to the idiom (mean = 4.16, SD = .49) did not differ significantly from literally related words in relation to the idiom-final noun (mean = 4.26, SD = .31, t(26) = −.979, p = .336). Unrelated targets were rated as less related to the idiom than FIG words (mean = 1.39, SD = .34, t(25) = 23.093, p < .001) and as less related to the idiom-final word than LIT words (mean = 1.61, SD = .68, t(25) = 17.363, p < .001), but were not significantly less related to one condition over the other.
To further ensure that there was no arbitrary meaning relationship between FIG targets and the idiom-final word that could cause priming independently from the context of the idiom, we conducted a control lexical decision experiment. Here, we stripped the idiom of its context and presented only the idiom-final word as a prime (for example: 'tooth' in the idiom to feel someone on the tooth). A linear mixed effects regression analysis showed that idiom-final words in isolation only primed LIT targets (mean RT in milliseconds = 504, SD = 97) in comparison to FIG targets (mean = 519, SD = 101; Estimate = .04325, SE = .009615, t(1761) = 4.498, p < .001) and UNREL targets (mean = 519, SD = 104; Estimate = .03180, SE = .009297, t(1858) = 3.420, p < .001). This study validated our stimulus materials (see Appendix S1 for the full control experiment, Supplementary Materials).

Procedure
Participants were tested individually in a single session. Presentation of visual primes and targets was programmed in PsychoPy (Peirce, 2007). RTs were recorded with one-millisecond accuracy via a dedicated button box (BitsiBox) designed by the Technical Group of the Radboud Donders Centre for Cognition. Participants were seated at a table at 60 cm distance from the computer screen. They received written instructions in Dutch that were repeated orally, informing them that they would see a series of sequentially presented words in white, with one word presented in red capital letters. They were asked to decide whether the red word was a Dutch word or not by pressing one of two buttons on the button box in front of them. Participants were instructed to respond as quickly and as accurately as possible. A short practice session consisting of 16 example stimuli preceded the experimental trials, to allow participants to familiarize themselves with the task. A fixation cross (+) was presented at the start of each trial with a duration of 1.5 seconds. An idiom sentence prime was then presented one word at a time in the center of the screen in white letters in the font Arial (font size 42.5) on a black background. Each word was presented for 300 ms and followed by a 300 ms blank screen. Accordingly, each target word was presented 300 ms after the offset of the idiom-final word. Participants had a three-second time window to provide a response before the next trial started automatically.
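The presentation timing described above can be laid out as a simple event schedule. The following is a minimal sketch, assuming that the target stays available for the full three-second response window and using hypothetical names (`trial_timeline`, the `*_MS` constants); it is not the PsychoPy script used in the experiment and makes no PsychoPy calls.

```python
FIXATION_MS = 1500         # fixation cross at trial start
WORD_MS = 300              # duration of each prime word
BLANK_MS = 300             # blank screen after each prime word
RESPONSE_WINDOW_MS = 3000  # maximum time to respond to the target


def trial_timeline(prime_words, target):
    """Return (onset_ms, duration_ms, event) tuples for a single trial."""
    events = [(0, FIXATION_MS, "+")]
    onset = FIXATION_MS
    for word in prime_words:
        events.append((onset, WORD_MS, word))
        onset += WORD_MS
        events.append((onset, BLANK_MS, "<blank>"))
        onset += BLANK_MS
    # The blank after the idiom-final word places the target 300 ms
    # after that word's offset.
    events.append((onset, RESPONSE_WINDOW_MS, target.upper()))
    return events


timeline = trial_timeline("Hij doet iets uit de losse pols".split(), "makkelijk")
for onset, duration, event in timeline:
    print(f"{onset:>5} ms  {duration:>4} ms  {event}")
```

Under these assumptions, the target onset depends only on the number of prime words, so longer idioms delay the target by 600 ms per additional word.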
Each participant worked through a pseudorandomized stimulus list in which each expression occurred three times. The order of presentation was counterbalanced across lists. Filler sentences were created to mask the presence of the experimental sentences. All fillers were also idioms, so every sentence prime had a possible idiomatic interpretation. Primes were presented with either three nonwords or a balanced selection of nonwords and words. For example, the idiom de eerste viool spelen (to play the first violin, to have the most important role) was presented once with a literally related target word (ORKEST / ORCHESTRA) and twice with nonword targets. This resulted in a 50% chance of encountering a word or a nonword in the lexical decision task. The task was split into five blocks for a total of 213 trials. Participants were allowed to take a break after each block for as long as they wished. In total, the lexical decision task took about 25 to 30 minutes per participant.
After the task, participants provided biographical information. Then they answered open-ended questions assessing their knowledge of the experimental idioms by typing in the meaning of each expression.

Reaction times
One expression and its targets (tussen twee vuren zitten) were excluded from the analysis, as participants provided too many idiosyncratic meanings for this expression in the open-ended question to reliably include it. One participant was excluded for slow overall RTs at 2.5 SDs from the overall mean. One item was excluded for more than 20% data loss ('KAAK' / 'JAW'), and one further target was removed for overall slow outlier RTs at 2.5 SDs from the overall mean ('INGEWANDEN' / 'INTESTINES').
Outliers were removed at both subject and item level at and above 2.5 SDs from the mean. One further item was excluded for over 20% data loss after outlier analysis ('BOFFEN' / 'TO BE LUCKY'). In total, 10.1% of raw data were removed due to these procedures. Paired t-tests showed that target word length and target word frequency remained balanced after these three items were removed. Means and SDs for targets are provided in Table 2 along with error rates. The mean RT for target nonwords was 637 ms (SD = 175). Linear mixed effects model regression analyses were conducted in RStudio (lmerTest package; R Project for Statistical Computing, R version 3.4.1) on correct responses only. This analysis takes into account random effects at both subject and item level, eliminating the need for separate subject and item (F1, F2) ANOVAs. Log-transformed RTs were taken as the dependent variable, and the contribution of several predictors was assessed: Target Word Condition (figuratively related (FIG), literally related (LIT), or unrelated (UNREL)), Target Word Length, Target Word Frequency, and Idiom-Final Word Frequency. The contribution of idiom-level predictors was also assessed: Subjective Frequency, Familiarity, Transparency, Imageability, and Literal Plausibility. Participant and Item were included as random factors, where Item was defined at the level of the idiomatic expression as target words were matched in triplets corresponding to each idiom. Multi-collinearity of idiom-level predictors was addressed by computing bivariate correlations. Literal Plausibility and Imageability were shown to be correlated (r² = .655, p < .001), and a significant correlation was also found between Transparency and Imageability (r² = .488, p < .01).
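The trimming and transformation steps can be sketched as follows. This is a minimal illustration under stated assumptions: trimming is two-sided, the SD is the sample SD within each subject or item, the log is the natural logarithm, and the data and the `trim_outliers` helper are hypothetical. It does not reproduce the analysis scripts, which were run in R.

```python
import math
from collections import defaultdict
from statistics import mean, stdev


def trim_outliers(trials, key, cutoff=2.5):
    """Drop trials whose RT lies at or beyond `cutoff` SDs from the mean
    of their group (grouped by `key`, e.g. 'subject' or 'item')."""
    groups = defaultdict(list)
    for trial in trials:
        groups[trial[key]].append(trial["rt"])
    kept = []
    for trial in trials:
        rts = groups[trial[key]]
        if len(rts) < 2:
            kept.append(trial)  # an SD cannot be computed for one trial
            continue
        m, sd = mean(rts), stdev(rts)
        if sd == 0 or abs(trial["rt"] - m) < cutoff * sd:
            kept.append(trial)
    return kept


# Hypothetical data: one subject with ten typical RTs and one very slow RT.
rts = [480, 490, 500, 510, 520, 495, 505, 515, 485, 500, 2000]
trials = [{"subject": "s01", "item": f"i{n}", "rt": rt} for n, rt in enumerate(rts)]

kept = trim_outliers(trials, key="subject")
for trial in kept:
    trial["log_rt"] = math.log(trial["rt"])  # natural log, assumed

print(len(trials), "->", len(kept), "trials after trimming")
```

The same function could be applied a second time with `key="item"` to trim at the item level.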
We started with the simplest regression model and iteratively added predictors until we reached the most explanatory, theoretically relevant model (Hox, 2002). Each model was tested against its predecessor in an ANOVA in a stepwise selection procedure until the best fitting model had been selected. The final model took the log-transformed RTs as the dependent variable and included a random slope for Participant (over Trial Number to take into account individual order effects), and for Item. A fixed effect of Trial Number was also included. Several interactions between Target Word Condition and other predictors were included: a two-way interaction between Target Word Condition*Idiom-Final Word Frequency, and two three-way interactions between (1) Target Word Condition*Target Word Frequency*Target Word Length, and (2) Target Word Condition*Transparency (Idiom-Level Predictor)*Literal Plausibility (Idiom-Level Predictor). We applied |t| > 1.96 as an indication for significance. All tests that are reported to be significant according to this t-criterion are also significant when the lmerTest statistic is applied, and the p-values reported here are those provided by this package. The predictors Target Word Frequency and Idiom-Final Word Frequency were centered. Condition effects were examined by releveling the Target Word Condition factor within the linear mixed effects model. The most relevant results of Experiment 1 are summarized in Table 3 for the relevel of the model where the unrelated condition is placed on the intercept. Other comparisons are made in text in the next two sections.
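The stepwise comparison of each model against its predecessor can be sketched as a likelihood-ratio test, which is the kind of test an ANOVA comparison of two nested mixed models typically reports. The sketch below is an assumption about the procedure, not the authors' script: the log-likelihood values are hypothetical, and `chi2_sf` and `likelihood_ratio_test` are illustrative helper names.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variate (integer df >= 1)."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i < df/2} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) plus a finite correction series
    term, total = 1.0, 0.0
    for i in range((df - 1) // 2):
        if i > 0:
            term *= x / (2 * i + 1)
        total += term
    return (math.erfc(math.sqrt(half))
            + math.sqrt(2 * x / math.pi) * math.exp(-half) * total)


def likelihood_ratio_test(loglik_simple, loglik_complex, extra_params):
    """Compare two nested models; returns (chi-square statistic, p-value)."""
    stat = 2.0 * (loglik_complex - loglik_simple)
    return stat, chi2_sf(stat, extra_params)


# Hypothetical log-likelihoods; the complex model adds two parameters.
stat, p = likelihood_ratio_test(-1012.4, -1007.1, extra_params=2)
keep_predictor = p < 0.05
```

Under this reading, a predictor is retained only when the more complex model fits reliably better than its predecessor.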

Condition and frequency effects
Table 3 shows that, on average, an increase in target word frequency caused a slowing in RTs specifically for longer targets (Estimate = .003375, SE = .01216, t(874.7) = 2.775, p < .001), even though the overall effect of word frequency was facilitatory. The same pattern of results fell just short of significance in the LIT condition (Estimate = .02175, SE = .01134, t(60.6) = 1.918, p = .056), and was not present in the UNREL condition.
For Idiom-Final Word Frequency, we found a simple effect only in the LIT condition (Estimate = .04847, SE = .01504, t(33.2) = 3.221, p = .003), where a higher idiom-final word frequency resulted in slower RTs on LIT targets. The two-way interaction effect between Target Word Condition and Idiom-Final Word Frequency (see Figure 1) showed that RTs were slowed on targets following a higher frequency idiom-final word in the LIT condition only as compared to the FIG condition (Estimate = −.04639, SE = .01313, t(1536) = −3.534, p < .001) and the UNREL condition (Estimate = −.04256, SE = .01394, t(1229) = −3.053, p = .002). In sum, Idiom-Final Word Frequency was only an important predictor for RTs in the LIT condition, where higher frequencies resulted in slower RTs.
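As a reading aid, the t values reported above can be related to the |t| > 1.96 criterion by dividing each estimate by its standard error. The sketch below uses the estimates and standard errors from this paragraph; the labels and the 0.01 rounding tolerance are illustrative assumptions.

```python
def t_value(estimate, se):
    """t statistic of a fixed effect: estimate divided by its standard error."""
    return estimate / se


# (label, Estimate, SE, reported t) as given in the text above
reported = [
    ("LIT simple effect", 0.04847, 0.01504, 3.221),
    ("LIT vs FIG", -0.04639, 0.01313, -3.534),
    ("LIT vs UNREL", -0.04256, 0.01394, -3.053),
]

checks = []
for label, est, se, t_rep in reported:
    t = t_value(est, se)
    significant = abs(t) > 1.96            # the criterion used in the paper
    matches_report = abs(t - t_rep) < 0.01  # allow for rounding of inputs
    checks.append((label, round(t, 3), significant, matches_report))
```

Small deviations from the reported values are expected, because the published estimates and standard errors are themselves rounded.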

Effects of idiom-level predictors
There was a simple effect of Transparency in all conditions, but directions of the effect differed. Higher Transparency of the idiom resulted in faster RTs in both the FIG condition

Error analysis
Error rates on words in the lexical decision task averaged .04 with a maximum of .15 (SD = .03). Table 2 reports list means and standard deviations for the error analysis. The mean error rate for nonwords was .05 (SD = .04). A binary logistic regression run on correctness of judgments did not yield differences in error rates between the experimental conditions.

Discussion
We found priming of the meaning of idioms presented in a minimal context as reflected by faster RTs in the FIG than the UNREL condition, in line with the hypothesis that the idiom's representation is activated as a whole during online processing. We also found priming of targets in the LIT condition compared to the UNREL condition, showing that literal word meanings were also activated. We ensured that the figurative priming effect was not due to spurious relationships between the idiom-final word and the target words (e.g., tooth and QUESTION) by conducting a control experiment in which the idiom-final word was isolated and presented in a word-word priming paradigm (see Methods section above and Appendix S1). In this control experiment, we found facilitation only for targets in the LIT condition (e.g., tooth - JAW). Target word frequency facilitated RTs for targets in FIG and LIT conditions as compared to the UNREL condition. In the UNREL condition, the absence of a frequency effect could be due to a ceiling effect where RTs simply could not benefit from more facilitation from target word frequency on top of target word length effects. Target word length was important across all three conditions, with shorter targets receiving faster responses.
The word frequency of the idiom-final word negatively affected RTs on LIT targets. A higher word frequency of the idiom-final word was associated with slower RTs on LIT targets. This inhibition effect could be attributed to conflicting processes: the idiomatic input causes strong activation of the idiom as a whole, but literal words within the expression also become activated (as shown by the priming effect for LIT targets). If the idiom-final word is more frequent, it becomes harder to suppress its activation. When required to respond to a literally related target, this conflicting process slows down RTs. This finding stands in contrast to Rommers et al. (2013), who reported suppression of literal word meanings by the idiom. One explanation for our finding lies in the presence of an idiomatically biasing context in Rommers et al., in contrast to the current experiment. A strongly idiomatically biasing context could suppress literal word meanings as the retrieval of such meanings is no longer necessary (and, in fact, disadvantageous) to understand the input once the idiom has been recognized as such. The idiom's meaning can be retrieved and the literal word meaning is no longer relevant. If word meanings were suppressed by the idiom as a whole, we would not have found any literal word priming in our task. Therefore, we take this finding as a strong, additional indication of literal word processing during idiom comprehension in the absence of a strongly biasing context.
Considering our idiom-level predictors, we found effects only for idiom transparency and literal plausibility. In the figurative condition, a competition effect arose between transparency and literal plausibility. A higher transparency of idiom meaning facilitates RTs on figuratively related target words, as this meaning becomes available more quickly for this type of idiom. However, higher transparency idioms experienced more interference from a highly literally plausible interpretation of the idiom, reflected in an inhibitory effect on RTs in the FIG condition. As such, integration of the figurative meaning of the idiom is hindered by the presence of a strongly possible literal sentence reading of the same idiom, resulting in competition between the idiom's figurative and literal sentence meanings.
Literally related targets also showed competition, although the effect was less strong. A high literal plausibility of the idiom generally facilitates RTs on targets literally related to the sentence-final word. However, if the same literally plausible idiom has a very transparent meaning, the idiomatic interpretation competes with its literal sentence interpretation, causing inhibition on RTs. This competition effect was present in both the FIG and LIT conditions, but it was stronger in the FIG condition. We suggest that the integration of a more abstract, idiomatic meaning is hindered by such competition.

Participants
Participants were L2 learners of Dutch with L1 German (29 total, 24 females, mean age = 25.31) who had an average of 5.6 years of experience actively using the Dutch language and who had been living in the Netherlands for an average of 4.2 years. All participants had normal or corrected to normal vision. Participation in the experiment was voluntary and compensated with a gift card.

Materials and Design
The same materials were used as in Experiment 1. Idioms selected for the current experiment existed in both Dutch and German, but were comprised of non-cognate words at a Levenshtein distance of two or more compared to their German counterpart. For example: the Dutch expression iets uit de losse pols doen (to do something from the loose wrist) and the German expression etwas aus dem Handgelenk schütteln share the same overall meaning 'to do something with ease', but are comprised of words with differing orthography. Overlap with German was determined by subjective ratings of six native speakers of German who provided German counterparts for each Dutch expression. These suggestions were validated through the use of dictionaries such as the German-Dutch and Dutch-German 'van Dale' dictionary and an online index for German expressions, Redensarten Index (2001).
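The non-cognate criterion described above can be sketched with a standard Levenshtein (edit) distance. This is a minimal illustration, not the authors' procedure: the helper names, the case folding, and the word-by-word comparison are assumptions, and the cognate-like pair is a hypothetical example rather than a stimulus from the study.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def is_non_cognate(dutch: str, german: str, min_distance: int = 2) -> bool:
    """True if the two word forms differ by at least `min_distance` edits."""
    return levenshtein(dutch.lower(), german.lower()) >= min_distance


# 'pols' and 'Handgelenk' come from the example in the text.
pols_ok = is_non_cognate("pols", "Handgelenk")
# Hypothetical form-identical pair, which the criterion would reject.
hand_ok = is_non_cognate("hand", "Hand")
```

With a threshold of two, word pairs that differ by a single letter would still be treated as cognates and excluded.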

Procedure
The same procedure as in Experiment 1 was employed. Participants performed a lexical decision task in which they were presented with Dutch words and nonwords, and pressed one of two buttons to indicate whether the presented item was an existing Dutch word or not. Additionally, participants completed the LexTALE Dutch vocabulary test (Lemhöfer & Broersma, 2012) that rates vocabulary knowledge on a scale of zero to 100. Participants scored an average of 71.81. Participants also completed a language background questionnaire assessing their years of experience with Dutch and the time they had spent living in the Netherlands, as well as their exposure to the Dutch language.

Reaction times
As in Experiments 1 and 2, the expression tussen twee vuren zitten was excluded from analysis to maximize comparability between experiments. Three participants were excluded for error percentages over 20%. Nine items were excluded for over 20% data loss. Outliers were removed at both subject and item level at 2.5 SDs from the mean. In total, 20.3% of the raw data were removed due to these procedures. Items remained balanced across conditions in terms of word frequency and word length in paired t-tests. Mean RTs and error rates are shown in Table 2. The mean RT for nonwords was 758 ms (SD = 193). The final linear mixed effects regression model was made by adding predictors in an iterative manner, testing each model against its predecessor in an ANOVA until the most complex theoretically relevant model had been reached. None of the idiom-level predictors provided a significant contribution to the final model. The final model took the log-transformed RTs as the dependent variable and included a random slope for Participant (over Trial Number), and Item at the idiom-level. Main effects were included for Trial Number, Target Word Condition, Target Word Frequency, and Target Word Length. No interaction effects were included. Again, we compared our three target conditions by releveling the linear mixed effects regression model. Results are shown in Table 4 for the relevel of the model with the unrelated (UNREL) condition on the intercept.
Participants responded faster to both FIG (Estimate = .03814, SE = .009517, t(1340) = 4.007, p < .001) and LIT targets (Estimate = .04430, SE = .009517, t(1487) = 4.336, p < .001) compared to UNREL targets, but there was no difference between FIG and LIT targets. Trial Number did not have a significant effect on RTs in the bilingual group, but accounted for some of the variance in the model. There was a simple effect of Target Word Frequency in all three conditions (for each condition: Estimate = −.06836, SE = .01022, t(350) = −5.914, p < .001), where higher frequency targets were responded to significantly faster (see Figure 2). The effect of Target Word Length showed a trend towards shorter targets receiving faster responses by the bilingual participants (for each condition: Estimate = −.007769, SE = .004219, t(21) = −1.842, p = .07).
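Because the dependent variable is log-transformed, the estimates above are on a log scale. A minimal sketch of how such an estimate could be read as a proportional change in RT is given below; it assumes a natural-log transformation and a one-unit change in the (centered) predictor, neither of which is stated explicitly in the text, and `percent_change` is an illustrative name.

```python
import math


def percent_change(log_estimate):
    """Percentage change in RT implied by an estimate on the natural-log scale."""
    return (math.exp(log_estimate) - 1.0) * 100.0


# Target Word Frequency estimate reported for the bilingual group
freq_effect = percent_change(-0.06836)  # roughly -6.6
```

Under these assumptions, a one-unit increase in the frequency predictor would correspond to RTs that are roughly 6.6% shorter.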

Error analysis
Error rates in the lexical decision task averaged .09 across conditions with a maximum of .24 (SD = .05). Table 2 lists means and SDs for the error analysis. The mean error rate for nonwords was .09 (SD = .07). A binary logistic regression model was run with correctness of lexical decision judgments as the dependent variable. For the experimental conditions, we found that FIG targets were responded to slightly more accurately than UNREL targets (Estimate = .8874, SE = .3525, z = 2.518, p = .01), but no other differences were found. Furthermore, higher frequency targets received more accurate responses overall than lower frequency targets (Estimate = −1.6059, SE = .3674, z = −4.370, p < .001). Lastly, shorter targets received more accurate responses overall than longer targets (Estimate = −.3322, SE = .1150, z = −2.890, p < .01).
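Estimates from a binary logistic regression are expressed in log-odds, so exponentiating them gives odds ratios. The sketch below applies this to two of the estimates above; it assumes that the model predicts the probability of a correct response, and the variable names are illustrative.

```python
import math


def odds_ratio(log_odds_estimate):
    """Convert a logistic regression estimate (log-odds) into an odds ratio."""
    return math.exp(log_odds_estimate)


# FIG vs UNREL targets: odds of a correct response (roughly 2.4 times higher)
fig_vs_unrel = odds_ratio(0.8874)

# Target Word Length: multiplicative change in odds per unit of length
length_effect = odds_ratio(-0.3322)
```

An odds ratio above 1 indicates higher odds of a correct response, and a value below 1 indicates lower odds, under the assumption stated above.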

Discussion
For German-Dutch bilinguals processing in their L2 Dutch, we observed faster response times to figuratively related target words than to unrelated target words, a sign of meaning activation of the associated idioms in the participants' L2. Comparable facilitation was found for literally related vs unrelated target words. No RT difference was found between figuratively and literally related words. These findings suggest that both the figurative meaning of the idiom as a whole as well as literal word meanings are available online to highly proficient bilinguals during idiom comprehension.
A higher target word frequency facilitated RTs in all three conditions. Target Word Length showed a trend towards shorter targets being responded to faster, but this effect fell just short of significance. We suggest that the same mechanisms are at play for both L1 and L2 speakers during idiom processing in our study, but that our bilingual participants were less sensitive to certain word aspects due to less experience with their L2.
Whereas the processing by native speakers was sensitive to idiom properties, this was not the case for the bilinguals. This relative insensitivity might be attributed to the reduced experience with idioms in our bilingual group. Because they have not been in the language environment for as long, bilinguals may have encountered these idioms less frequently than natives. As a result, there would be a weaker activation of idiom representations and their properties in L2 speakers than in L1 speakers.

General discussion
Dutch L1 speakers and advanced German-Dutch bilinguals made Dutch lexical decisions on target words following sentences that had idiomatic and literal interpretations, such as Hij voelt hem aan de tand (He feels him on the tooth). Relative to unrelated target words, Dutch L1 speakers responded faster to both figuratively related words (QUESTION) as well as words related to the last word of the idiom prime (tooth - JAW). Dutch L1 speakers were sensitive to the targets' word frequency and length. A higher transparency of the idiom's meaning caused interference effects for highly literally plausible idioms (i.e., idioms with a highly likely literal sentence interpretation, such as He shook her awake). This effect can be interpreted as a reflection of competition between the idiom's meaning as a whole and its interpretation as a literal sentence. If a sentence has a very clear idiomatic meaning, but also has a highly likely literal interpretation, this hinders responses to both figuratively and literally related targets through competition effects. Finally, a higher idiom-final word frequency inhibited responses to literally related targets. This effect may be ascribed to competition between the idiom's figurative meaning and its literal word constituents. A higher word frequency of the idiom-final word can only inhibit lexical decision responses on literally related targets if there is competition between the idiom and its literal word constituents. Through this competition, responses to literally related targets are slowed, because the prime word they are related to is also part of an idiom. Thus, the idiom-final word primes both its semantic associates and the idiom it is a part of.
When idiom-final words were presented in isolation in our control experiment (see Appendix S1, Supplementary Materials), this inhibition effect disappeared, because there was no longer competition between the prime word and an idiomatic context in which it was contained.
In a second idiom-priming experiment, highly proficient German-Dutch bilinguals performed the same lexical decision task as the Dutch L1 speakers in Experiment 1. L2 speakers showed priming for both figuratively and literally related targets compared to unrelated targets. Target word frequency facilitated responses in all conditions. However, L2 participants were not sensitive to the frequency of the idiom-final word. Bilinguals also did not show sensitivity to idiom-level predictors such as transparency and literal plausibility.
To account for these data patterns in native and non-native language users, a hybrid processing model is required, such as that proposed by Sprenger et al. (2006). In line with such a model, both the figurative meaning of the idioms and literal word meaning of idiom-final words were equally available when L1 and L2 speakers of Dutch processed idioms online. Furthermore, different idiom properties can affect the processing and retrieval of idiomatic meaning for L1 speakers, which is in line with the CONSTRAINT-BASED MODEL OF IDIOM PROCESSING (Libben & Titone, 2008;Titone & Libben, 2014). Models that give precedence to either figurative or literal sentence or word meaning cannot account for these findings. We should have found priming only for figuratively related targets if figurative meaning had taken precedence in our study, as literal word processing would have been aborted once the idiom had been recognized as such (Gibbs, 1980;Swinney & Cutler, 1979). Similarly, if literal word meaning took precedence and idiomatic meaning was only computed after reliable rejection of a literal interpretation of our sentences (Bobrow & Bell, 1973), we would not have found priming for both figuratively and literally related targets.

Processing account
More specifically, our data for Dutch native speakers appear to be in line with the following task-dependent processing account for native speakers and bilinguals. For each incoming word, word form and then meaning is retrieved. Lexical meaning is integrated into a meaning representation for the sentence as far as it is available. Normally, sentence context or other previously presented information provides a relative bias towards a literal or an idiomatic sentence interpretation. This bias will affect the speed with which the meaning of an upcoming word can be integrated within the sentence interpretation under construction.
In a purely literal context, the integration of a word in the sentence framework is co-determined by its lexical properties (e.g., its frequency and plausibility). In case the sentence-final item has a higher word frequency, it will result in more spreading activation to a semantically-related target word presented after the sentence.
In our experimental situation however, the sentence-final word completes an idiomatic expression. The idiomatic meaning representation of the sentence has gradually built up during its word-by-word presentation, and it is completed when the final word is presented. Importantly, to complete the idiomatic expression, only the FORM of the last item is relevant. In fact, the meaning of the last item in the sentence may compete with the often completely unrelated meaning of the idiom. It takes time to resolve this competition. During this time, the isolated target item for lexical decision is presented. Temporarily, both the idiomatic and the literal sentence interpretation are active. The transitory competition is reflected in the native data in the observed interaction of transparency and literal plausibility: processing more transparent idioms is hindered more by a higher literal plausibility of the idiom for both figuratively and literally related targets. In contrast to Rommers et al. (2013), we did not find evidence for suppression of literal word activation. We account for this difference in findings in terms of the strength of context effects. Whereas Rommers et al. presented their target word as the last word of a sentence in extensive idiomatically or literally biasing contexts, we presented idioms in relative isolation. We propose that, given time, such a biasing context may override or suppress literal word activation.

Task aspects
Crucial in our processing account is the moment at which a target word is presented relative to the activation state at the sentence level. This makes our account task-dependent in the light of timing differences between research paradigms. In many studies, it is the sentence-final word that is the focus of investigation (e.g., Carrol & Conklin, 2017; Carrol et al., 2016; Rommers et al., 2013). When this word is processed, both the idiomatic sentence interpretation and the literal sentence interpretation are still under development. As a consequence, the relationship between the earlier presented words and the last word of the sentence (e.g., CLOZE probability) will play an important role (as well as the properties of this word itself). However, in our paradigm the last word of the sentence has already been integrated, which allows completion of both the idiomatic expression and the literal interpretation of the sentence. As such, the sentence-final word has already contributed to the sentence as a whole and its prediction may be less relevant as it is readily available. On the basis of this analysis, we recommend that task differences in terms of activation time-course are carefully considered when findings are compared across empirical studies.

Native and bilingual idiom processing
One implication of our processing account is that changes in the relative activation speed of different sentence or word properties may be reflected in the processing of the target word. Under the assumption that the two groups process sentences according to similar mechanisms, their global result patterns for figurative and literal meaning conditions might look more or less similar (e.g., in terms of main effects and interactions for the same predictors). However, timing differences may result in different contributions of form and meaning properties to the performance of the two groups. Global similarity but difference in detail is exactly what we found for the results in the two groups.

Individual word properties
For the L1 speakers in Experiment 1, representations of individual Dutch words are strong and readily available. This results in facilitatory effects of target word frequency and length, but also in a sensitivity to other word frequency aspects. In particular, a higher final-word frequency inhibited responses on literally related targets, indicating the presence of competition between literal word aspects (i.e., frequency) and the sentence's idiomatic interpretation.
In contrast, L2 participants in Experiment 2 did not show such a sensitivity to idiom-final word frequency. Having less experience with Dutch, their Dutch (L2) word representations are weaker than for L1 speakers. This is reflected in overall slower responses in Experiment 2. L2 participants did show facilitation of target word frequency, but this was the only significant contributor in terms of word aspects. Their representations of individual words may be too weak to induce idiom-final word frequency effects such as in L1. In sum, because they do not have the same degree of exposure to the Dutch language, L2 participants are not sensitive to subtle variations in the frequency of the idiom-final word when making their decision on the subsequent target word.

Idiom properties
A similar reasoning about relative activation speed holds for L1 and L2 figurative processing. Our native speakers have extensive experience with Dutch idioms, which are likely to be strongly represented in their mental lexicon. Note that our idioms were selected based on how well-known their meaning is to Dutch natives, a criterion which resulted in relatively little variance in idiom properties given that these idioms may also be more transparent, subjectively frequent, etc. We found competition of transparency and literal plausibility in our L1 study. Processing of an idiom with a more transparent meaning was hindered by a more literally plausible interpretation of the sentence. In sum, L1 speakers are sensitive to context information provided by both word and sentence properties when they make their lexical decision on the isolated target word.
In contrast, Dutch idioms are less strongly represented in L2 participants, due to considerably less language exposure. L2 participants were familiar with the meaning of the experimental idioms, as these were selected from among the best-known Dutch idioms. However, few encounters with these idioms and less knowledge of Dutch idioms overall made the L2 participants less sensitive to their properties than L1 participants. Because in our experimental set-up only sentences with a possible idiomatic continuation were included, participants may have been 'figuratively attuned', and as such activation of figurative meaning may have been boosted (Beck & Weber, 2016b). Nevertheless, only general effects of figurative and literal word meaning were found in the L2 group and no particular sensitivities to different idiom properties.

Future research
In the present experimental context, all sentences included word sequences that had a possible idiomatic interpretation. Employing only a minimal context, we obtained idiom-final word frequency effects, indicative of literal meaning activation. Future research should determine if the effects are sustained in STRONGER BIASING CONTEXTS, e.g., with a preceding disambiguating sentence. If literal word activation is indeed suppressed in a sufficiently biasing idiomatic context, inhibition effects caused by higher idiom-final word frequency should disappear.
These effects can also be investigated by manipulating TASK DEMANDS. Measuring on the idiom-final word or on a target word following the idiom prime should yield different result patterns, as in one scenario the word is predicted and needs to be integrated, whereas in the other it is already available as a sentence completion before a target word is presented.
Differences in activation speed of individual words may not only be present for words differing in frequency, but also in terms of their CROSS-LINGUISTIC OVERLAP. For instance, the presence of cognates in the idiom could facilitate translation between languages. In our study, we carefully avoided the inclusion of cognates in our stimulus materials. This aspect might explain why, in contrast to Beck and Weber (2016a), we did not obtain more facilitation for literally than figuratively related targets relative to the unrelated conditions for either our L1 or our L2 groups. At a more global level, the result patterns in our study were comparable to theirs in that L1 and L2 participants performed similarly under the same task demands, although result patterns differed across studies. Thus, future research should explicitly focus on the effects of cognates in the stimulus lists or otherwise carefully balance the number of cognates across conditions.
Another important type of cross-language overlap is conceptual or meaning overlap. We chose to select idiomatic stimuli without form overlap (hence: no cognates) but with conceptual overlap between the two languages. However, the consequences of such overlap on processing are not fully clear (see our review of cross-linguistic overlap in the Introduction). An interesting future manipulation is to consider the frequency effect of an idiom-final word when this word is a cognate. Contrasting literal and figurative sentences with or without conceptual overlap by positioning cognates or non-cognates at the crucial idiom-final position opens up an interesting avenue of research.

Conclusions
In our study, we observed both quantitative and qualitative similarities and differences between L1 and L2 speakers of Dutch. Although L1 and L2 speakers were both familiar with the meaning of the presented Dutch idioms, they differed in speed of processing and sensitivity to properties of the idiomatic and literal sentences as a whole and of each word individually (e.g., frequency). The different result patterns for the two groups could be ascribed to a different exposure to Dutch words and idioms, but also to differences in processing capacity and working memory with respect to the target language (their L1 or L2).
In all, the current study offers several novel insights. First, our idiom-priming study was the first to manipulate the word frequency of literal word constituents within the idiom. Idiom-final word frequency effects pointed at competition effects between the idiom sentence's representation as a whole and its constituent words. Their direct competition was also evident from an interaction of transparency and literal plausibility. Second, L2 speakers are able to process both figurative and literal word meaning aspects in idiom priming similarly to L1 speakers, but their depth of knowledge and processing differs in terms of sensitivity to more subtle variations such as idiom transparency when variation in the presented materials is limited.
Finally, our study suggests that the time-course of sentence activation and consecutive result patterns depend on task properties. It makes a difference whether measurement takes place on the idiom-final word or on an isolated word presented after the entire idiom is available. Future studies should examine the consequences of different task demands when theoretical questions are posed involving time-course aspects of figurative and literal sentence processing. What you see depends both on where and when you look.