Prediction in challenging situations: Most bilinguals can predict upcoming semantically-related words in their L1 source language when interpreting

Abstract Prediction is an important part of language processing. An open question is to what extent people predict language in challenging circumstances. Here we tested the limits of prediction by asking bilingual Dutch native speakers to interpret Dutch sentences into their English counterparts. In two visual world experiments, we recorded participants’ eye movements to co-present visual objects while they engaged in interpreting tasks (consecutive and simultaneous interpreting). Most participants showed anticipatory eye movements to semantically-related upcoming target words in their L1 source language during both consecutive and simultaneous interpretation. A quarter of participants during simultaneous interpretation however did not move their eyes, an extremely unusual participant behaviour in visual world studies. Overall, the findings suggest that most people predict in the source language under challenging interpreting situations. Further work is required to understand the causes of the absence of (anticipatory) eye movements during simultaneous interpretation in a substantial subset of individuals.

In the present study we tested the limits of prediction by asking (native) Dutch, L2 speakers of English, to translate Dutch sentences into their English counterparts during consecutive interpreting (CI) and simultaneous interpreting (SI) tasks. Given that interpreting is an extremely demanding task, especially for untrained bilinguals, we chose to test their tendency to predict language in a strongly supportive task environment: verb-based semantic prediction in a visual world context.

Prediction in challenging situations
Some research suggests that, despite the advantage of making language processing more efficient, predictive processing is far from effortless. Previous work, for example, found that taxing participants' working memory delayed predictive eye movements. This finding, together with the findings in Huettig and Janse (2016) and Chun, Chen, Liu, and Chan (2021), suggests that prediction, at least in challenging situations, is constrained by available cognitive resources. In addition, prior empirical work shows that limited processing time reduces prediction in both listening (Huettig & Guerra, 2019) and reading (Ito, Corley, Pickering, Martin, & Nieuwland, 2016). Challenging situations that impede predictive processing also include perceptual difficulties in adverse conditions, such as casual speech with many phonological reductions (Brouwer et al., 2013) and foreign-accented speech with unreliable and potentially ambiguous input (Porretta, Buchanan, & Järvikivi, 2020; Romero-Rivas, Martin, & Costa, 2016; Schiller et al., 2020).
Motivated by 'prediction-is-production' views, which posit a fundamental role of the production system for prediction in language processing (Pickering & Gambi, 2018; Pickering & Garrod, 2013), several studies explored prediction by accompanying a comprehension task with a production task. Most of these studies obtained some evidence consistent with a role for the production system (Hintz, Meyer, & Huettig, 2016; Lelonkiewicz, Rabagliati, & Pickering, 2021; Rommers, Dell, & Benjamin, 2020), though, arguably, the overall evidence is still limited.
Finally, compared to native language processing, L2 settings appear to impose extra challenges on predictive processing. While some studies provided evidence for the occurrence of prediction in L2 to a similar extent as in L1 (e.g., Chambers & Cooke, 2009; Dijkgraaf, Hartsuiker, & Duyck, 2017), ample work showed smaller, delayed, or null effects of prediction among L2 speakers (for reviews, see Kaan, 2014; Ito & Pickering, 2021; Kaan & Grüter, 2021). This can be attributed to the fact that non-native language processing is generally more resource-demanding and less automatic in some sub-processes, including accessing lexical representations (McDonald, 2006), building syntactic representations (Clahsen & Felser, 2006), and determining sentence meanings (MacWhinney & Bates, 1989). Consequently, it is conceivable that, at least during the early stages of L2 processing, limited time and resources are available for prediction. L2 speakers, for example, face difficulties in using lexical or grammatical features that are reliable for prediction but absent in their L1 (e.g., Dussias et al., 2013; Hopp, 2013, 2016; Lew-Williams & Fernald, 2010; Mitsugi & MacWhinney, 2016). Finally, interference from L1 forms another source of challenge. Given that L2 speakers are often more dominant and proficient in their L1, unidirectional cross-linguistic influence tends to take place from L1 to L2 largely automatically (Karaca, Brouwer, Unsworth, & Huettig, 2021), which delays the pre-activation of lexical representations of L2 words and thus makes prediction less efficient.
In short, several challenging situations limit predictive processing, including those in which processing resources are taxed, the production system is occupied concurrently, and non-native language processing is involved.

Prediction and interpreting
The case of prediction during interpreting is particularly interesting because the interpreting task involves several of the challenges for predictive processing mentioned above. Interpreting is a linguistically and cognitively demanding bilingual experience, in which interpreters must comprehend one language and produce another language under extreme time pressure (Dong & Li, 2020; Frauenfelder & Schriefers, 1997). There are two typical interpreting types, namely consecutive interpreting (CI) and simultaneous interpreting (SI). CI is a two-stage process, in which interpreters must first comprehend the speech input in the source language and subsequently produce the output in the target language (Pöchhacker, 2011a), with memory load accumulating until the interpretation has been finished (Liang, Fang, Lv, & Liu, 2017). In SI, production is in synchrony with perception and comprehension of information in the source language (Pöchhacker, 2011b). Given the need to divide attention across multiple tasks simultaneously, SI is generally regarded as the more challenging interpreting type.
Considering that prediction may be more limited in situations when cognitive load, time pressure, concurrent production, and additional L2 processing are involved (reviewed in the preceding section), further research is warranted to explore prediction during the challenging circumstances of interpreting. It is noteworthy that traditional interpreting accounts consistently assume an important role of prediction in achieving successful interpreting. This role is usually assumed because of the notion that the potential benefits of prediction during interpreting may motivate interpreters to use it as a practical strategy (Moser, 1978;Gerver, Longley, Long, & Lambert, 1984;Setton, 2005). It remains however the case that time pressure is a big burden for interpreters, especially when they interpret in the simultaneous way. Relying on prediction, interpreters have a chance to maintain a shorter time lag between the onset of input and output, and thus to keep pace with the speaker. On the other hand, cognitive resource constraints form another challenge imposed on interpreters by the task situation, accounting for impaired fluency, numerous errors, omissions and infelicities in interpreting (Gile, 2009). In this regard, prediction has the potential of easing the high cognitive load caused by the multiplicity and simultaneity of interpreting (Gile, 2009;Seeber & Kerzel, 2011), and is commonly taught as an efficient strategy (Li, 2015). Recently, Amos and Pickering (2020) proposed a theory of prediction in simultaneous interpreting based on a set of psycholinguistic studies on prediction. The authors assume that the prediction-by-production mechanism may underlie prediction in SI, in which interpreters rely on their production system to deploy rapid prediction with semantic, syntactic and phonological representations involved.
From theoretical modelling to empirical testing, a number of studies have been conducted, but few of them provide solid evidence for prediction in interpreting due to limitations of research focus and design. One reason is that the definition of prediction within the framework of interpreting studies tends to be vague (Liontou, 2012). Another reason is that exploring prediction during the processing of the source language is rare (Wilss, 1978;Van Besien, 1999;Kurz & Färber, 2003). Some empirical interpreting studies sought to tap predictive processing during comprehension using measures such as latency (van Hell & de Groot, 2008;Chmiel, 2016;Hodzik & Williams, 2017;Chmiel, 2021). Hodzik and Williams (2017), for example, attempted to index prediction using latency measures between the target word in the source language and its equivalent in the target language, showing that target words were interpreted faster in the high (vs. low) constraining condition. However, one must interpret such data as evidence for prediction with caution, because they can also be explained by integration (for extensive discussion, see Pickering & Gambi, 2018). To obtain solid evidence for prediction, measuring the preactivation of linguistic representations is the key, which, however, is difficult to detect using off-line measures.
It is clear that, with prediction defined as the pre-activation of linguistic information, more appropriate on-line methods are needed to examine prediction in interpreting. Several studies based on eye-tracking data indicated that experience with interpreting and code-switching could help bilinguals predict target linguistic units based on grammatical gender (Valdés Kroff, Dussias, Gerfen, Perrotti, & Bajo, 2017) and morphological cues (Lozano-Argüelles, Sagarra, & Casillas, 2019). Although these empirical efforts strengthen the grounds for prediction in interpreting, the current evidence for an important role of prediction is scarce. Up to now, direct evidence that prediction occurs during interpretation appears to be limited to a recent PhD dissertation by Amos (2020), who, using the visual world paradigm, found that prediction often takes place in both SI (interpreting from L2 English to L1 French) and CI (interpreting from L2 English to L1 Dutch).

The current study
Here, we sought to test the limits of prediction by observing prediction in two challenging tasks, i.e., consecutive and simultaneous interpreting. The former setting involves a production process following a comprehension process, while in the latter setting comprehension and production overlap. By doing so, we can test whether different task settings affect prediction. We focused on the prediction of the L1 source language during interpreting because we were particularly interested in prediction in challenging situations. Investigating prediction of the source language allowed us to compare our results to previous results during L1 listening without any interpreting tasks (Hintz, Meyer, & Huettig, 2017).
To this end, we conducted two visual world eye-tracking experiments. The eye-tracking method allowed us to measure semantic prediction in speech processing of the source sentences unequivocally, that is, before participants heard the anticipated target. Participants' anticipatory eye movements to the target objects in the predictable and nonpredictable conditions were recorded while they engaged in two Dutch-English interpreting tasks. In Experiment 1, participants were asked to interpret consecutively while they were looking at co-present visual objects. In Experiment 2, a different set of participants was asked to interpret simultaneously, the more demanding interpreting task. Experiment 1 and Experiment 2 were carried out in parallel. Participants were from a homogeneous undergraduate student population.
Notably, we used the same manipulation of verb-noun predictability, participant population, and spoken and visual stimuli as Hintz et al. (2017), not only because their manipulation and stimuli have been shown to elicit robust anticipatory eye movements with a large effect size in Dutch L1 processing, but also to directly compare prediction across tasks that differ in how challenging they are: a) mere comprehension (Hintz et al., 2017); b) consecutive interpreting; c) simultaneous interpreting. The interpreting direction from L1 Dutch to L2 English was chosen partly for the same reason, namely to enable comparison with the earlier study. In addition, the L1-L2 interpreting direction is common practice, especially on national markets (Denissenko, 1989; Lim, 2005; Chmiel, 2016, 2021), although the reverse L2-L1 direction is more widely used, especially in international organizations like the UN (Donovan, 2004; Pavlović, 2007; Nicodemus & Emmorey, 2013), and is favored by interpreting studies on prediction (Hodzik & Williams, 2017; Amos, 2020).
The current study thus also complements prior interpreting studies by focusing on a less frequently tested interpreting direction.

Experiment 1
Participants
Thirty-three participants from the participant pool of the Max Planck Institute for Psycholinguistics were paid for their participation. The data from thirty participants (24 females; mean age = 22.03, SD = 2.06) were used for analysis (for data exclusion, see the Results and interim discussion section below). All of them were students at Radboud University, with Dutch as their native language. They all reported using English frequently. On average, the participants had started learning English at around the age of 10 (M = 9.70, SD = 2.22). English language television programs are typically not dubbed in the Netherlands and thus daily English language exposure is a normal part of Dutch life.
All participants had normal or corrected-to-normal vision as well as normal hearing. All participants gave informed written consent. Ethical approval to conduct the study was provided by the ethics board of the Social Sciences faculty at Radboud University.
To assess their English proficiency, participants were asked to rate their own level of English in terms of reading, speaking, writing and understanding spoken language, using a Likert-type scale (1 = very low, 7 = very comfortable). They considered themselves highly proficient in English (reading: M = 6.20, SD = 1.01; speaking: M = 5.33, SD = 1.32; writing: M = 5.43, SD = 1.36; understanding spoken language: M = 6.23, SD = 0.80). Furthermore, we administered the (English) National Adult Reading Test (NART) and the English version of the Peabody Picture Vocabulary Test to assess their reading skills and receptive vocabulary size in English. These tests were carried out after the eye-tracking experiment.

National Adult Reading Test
The National Adult Reading Test comprises 50 written words in British English, which have irregular pronunciation. The NART was developed by Nelson (1982) and, along with its American English version (Blair & Spreen, 1989), is widely used as a measure of premorbid intelligence levels of English-speaking patients with dementia. More importantly for the present purposes, NART performance highly correlates with adults' reading and verbal comprehension skills (Bright, Hale, Gooch, Myhill, & van der Linde, 2018). Thus, the English version of NART was used to assess participants' verbal comprehension skills in L2, which is an important component of general English proficiency.
Participants were told to read the 50 words slowly and aloud, and they were encouraged to guess the pronunciation of words they were unfamiliar with. They were allowed to correct their responses and the test was untimed. Following the NART scoring guidelines, a score for each participant was calculated based on the number of errors using the following formula: Verbal Comprehension Index = 126.81-1.0745 × errors
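As a worked illustration of the scoring formula above, the following minimal Python sketch converts an error count into the index. The function name and the range check against 50 items are assumptions added for illustration; only the two constants come from the text.

```python
def nart_verbal_comprehension_index(errors: int, n_items: int = 50) -> float:
    """Index = 126.81 - 1.0745 * errors (constants as given in the text).

    The function name and the input check are illustrative assumptions.
    """
    if not 0 <= errors <= n_items:
        raise ValueError("errors must lie between 0 and the number of items")
    return 126.81 - 1.0745 * errors


# Example: a participant who mispronounces 10 of the 50 words.
print(round(nart_verbal_comprehension_index(10), 3))
```

Under this formula, each additional error lowers the index by 1.0745 points, so a participant with no errors receives 126.81.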

Peabody Picture Vocabulary Test
The Peabody Picture Vocabulary Test was developed by Dunn and Dunn (1997) and has been used widely to measure receptive vocabulary size (also for participants in visual world eye-tracking experiments, e.g., Borovsky, Elman, & Fernald, 2012; Rommers, Meyer, & Huettig, 2015; Hintz et al., 2017). A digitized version of the English Peabody test was used in the current study to assess participants' L2 lexical ability, which is another important component of L2 proficiency. Following the standard protocol of the test, on each trial, participants heard a word and saw four numbered pictures. Participants were asked to give the number (1, 2, 3, or 4) that corresponded to the correct picture indicated by the spoken word. Trials were presented in blocks of 12, increasing in difficulty. The test ended if fewer than five correct responses were provided within the current block. Participants' score was the number of the last item they saw minus the number of errors made. Because we tested Dutch non-native speakers of English, we did not apply the age-sensitive transformation procedure described in the test manual, as the population norms were based on native English speakers.
The raw scores of both the NART and the Peabody test were used for analysis, with their descriptive results shown in Table 1a. The results of the two language proficiency tests as well as the self-rating scores correlated with each other positively and robustly, see Table 1b. The overall results reveal that the participants were highly proficient in L2 English.
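The stopping and scoring rule described above can be sketched as follows. This is a simplified illustration, not the test's official scoring software: it assumes that testing starts at item 1 (the real test may use a different start item), and the function and parameter names are hypothetical.

```python
def peabody_raw_score(responses, block_size=12, min_correct=5):
    """Raw score = number of the last item seen minus the number of errors.

    `responses` is a list of booleans (True = correct) in presentation order.
    Assumption for illustration: testing starts at item 1.
    """
    last_item = 0
    errors = 0
    for start in range(0, len(responses), block_size):
        block = responses[start:start + block_size]
        last_item += len(block)
        errors += sum(1 for correct in block if not correct)
        # Stop once fewer than `min_correct` responses in a block are correct.
        if sum(1 for correct in block if correct) < min_correct:
            break
    return last_item - errors


# Hypothetical participant: perfect first block, 4 of 12 correct in the second.
responses = [True] * 12 + [True] * 4 + [False] * 8 + [True] * 12
print(peabody_raw_score(responses))  # third block is never administered
```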

Stimuli
The same Dutch sentence recordings and visual displays as in Hintz et al. (2017) were used. The materials consisted of 40 target nouns and 80 verbs used in the sentence "The man (verb) at this moment a (noun)". The adverbial phrase "at this moment" separated verb and noun, and was included to give participants enough opportunity to engage in predictive language processing. The stimulus sentences lasted, on average, 2483 ms. The resulting sentence construction is deemed quite natural by native speakers of Dutch. Each target noun appeared in two versions, as a predictable and as a nonpredictable item, depending on the verb preceding it (e.g., "De man schilt/tekent op dit moment een appel", the man peels/draws at this moment an apple; see Appendix A for all items). Each target noun was paired with a set of four objects, one of which was a depiction of the target noun and the other three of which were unrelated distractors (see Figure 1 for an example).
To evaluate whether predictable and nonpredictable sentences were classified properly, Hintz et al. (2017) had pretested all sentences for cloze probability according to Taylor (1953). In the predictable condition, the mean cloze probability of the target nouns was .39 (SD = .24; ranging from .06 to .8); in the nonpredictable condition, it was zero. In addition, a series of pretests assessing the verb-noun relationship was conducted, including free association strength, plausibility, typicality rating (for more details, see Hintz et al., 2017).

Procedure
The eye-tracking experiment consisted of 80 experimental items in total (40 target nouns, each presented in a predictable and a nonpredictable condition). Predictable and nonpredictable items were evenly distributed across two lists such that the same target noun did not appear twice on one list. Specifically, each list contained all 40 target nouns, with half of them (20) paired with a predictable verb and the other half (20) paired with a nonpredictable verb. Participants were randomly assigned to one list and sat in a sound-shielded room. Eye movements were tracked using an EyeLink 1000 Tower Mount (SR Research) sampling at 1000 Hz.
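The counterbalancing scheme described above can be illustrated with a short sketch. How the authors assigned individual nouns to conditions within each list is not stated, so the alternation by item number used here is purely an assumption, as are the function name and the list labels.

```python
def build_counterbalanced_lists(n_targets=40):
    """Distribute 2 * n_targets items over two lists.

    Each list contains every target noun exactly once; a noun that is
    predictable on list A is nonpredictable on list B and vice versa.
    Assumption: conditions alternate by item number (illustrative only).
    """
    lists = {"A": [], "B": []}
    for item in range(1, n_targets + 1):
        if item % 2 == 1:
            cond_a, cond_b = "predictable", "nonpredictable"
        else:
            cond_a, cond_b = "nonpredictable", "predictable"
        lists["A"].append((item, cond_a))
        lists["B"].append((item, cond_b))
    return lists


lists = build_counterbalanced_lists()
for name, trials in lists.items():
    n_pred = sum(1 for _, cond in trials if cond == "predictable")
    print(name, len(trials), n_pred)
```

Each list ends up with 40 trials, 20 per condition, and the two lists together cover all 80 experimental items.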
After successful calibration of the eye-tracker, participants received the task instruction: they were told to listen to the sentences carefully and interpret them from Dutch to English. Importantly, participants were instructed to interpret in a consecutive fashion. That is, they should listen to a given sentence first and start producing the translated sentence AFTER the spoken sentence had ended. In line with previous studies, no explicit instruction was given as to where they should look on the visual display (i.e., a look-and-listen task, for further discussion, see Huettig, Rommers, & Meyer, 2011).
Each trial began with a central fixation dot presented for two seconds. After the dot disappeared, a picture consisting of 4 objects was displayed and then the playback of the sentence started. The presentation of the visual displays was timed to precede the onset of the spoken verb by one second to provide sufficient time to preview all four objects. The position of the four objects was random on a (virtual) 2 × 2 grid (see Figure 1 for an example). A beep marked the end of the spoken sentence and indicated to participants that they could initiate their interpretation. The visual display of four objects remained in view until the end of the trial, see Figure 2. Each participant was presented with all 40 trials on one list (20 trials in the predictable and 20 trials in the nonpredictable condition). The order of trials was randomized automatically before the experiment. The eye-tracking experiment, including calibration and validation, took approximately 10 min.

Data Analysis
Four areas of interest (200 × 200 pixels) were defined for the four objects in the display. Using the algorithm provided by the EyeLink software, eye gaze was analyzed in terms of fixations directed to the target object, to one of the three unrelated distractors, or elsewhere. We plotted participants' fixation proportions for each object (target, distractors) and each condition (predictable, nonpredictable) during the whole interpreting process (Figure 3), spanning 2.5 seconds before and 5 seconds after target word onset. This period captured both comprehension and production processes. A magnitude estimation approach was used for data analysis. This was in line with the 'new statistics' approach (Cumming, 2014), which advocates a change from null-hypothesis testing to interpreting results by using measures of effect sizes and confidence intervals. As shown empirically by Fidler and Loftus (2009), reporting confidence intervals leads to a better interpretation of results than null-hypothesis testing does (for extensive discussion, see Cumming, 2012; Cumming, 2014). We reported the mean fixation proportions accompanied by by-participant 95% confidence intervals (area shaded in gray), see Figure 3. As in previous studies (e.g., Huettig & Janse, 2016; Hintz et al., 2017; Huettig & Guerra, 2019), in doing so we provide a detailed graphical description of eye movements over time in each experimental condition.
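To make the area-of-interest coding concrete, the sketch below assigns a fixation to one of four 200 × 200 pixel regions or to "elsewhere" and then computes fixation proportions. The screen resolution (1024 × 768), the quadrant centres, and all names are hypothetical; the actual analysis relied on the EyeLink software, whose algorithm is not reproduced here.

```python
from collections import Counter

AOI_SIZE = 200  # pixels, as stated in the text


def quadrant_aois(width=1024, height=768, size=AOI_SIZE):
    """Return AOI boxes (left, top, right, bottom) centred in each quadrant.

    The resolution and the centre positions are assumptions for illustration.
    """
    half = size / 2
    centres = {
        "top_left": (width * 0.25, height * 0.25),
        "top_right": (width * 0.75, height * 0.25),
        "bottom_left": (width * 0.25, height * 0.75),
        "bottom_right": (width * 0.75, height * 0.75),
    }
    return {name: (cx - half, cy - half, cx + half, cy + half)
            for name, (cx, cy) in centres.items()}


def classify_fixation(x, y, aois):
    """Name of the AOI containing (x, y), or 'elsewhere' if none does."""
    for name, (left, top, right, bottom) in aois.items():
        if left <= x < right and top <= y < bottom:
            return name
    return "elsewhere"


def fixation_proportions(labels):
    """Proportion of samples falling on each label."""
    counts = Counter(labels)
    total = len(labels)
    return {name: n / total for name, n in counts.items()}


aois = quadrant_aois()
samples = [(256, 192), (260, 200), (768, 576), (512, 384)]
labels = [classify_fixation(x, y, aois) for x, y in samples]
print(fixation_proportions(labels))
```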

Results and interim discussion
The data of three participants were excluded because they did not fixate any displayed object on more than 25% of trials. The recordings of participants' interpreted sentences were transcribed using Praat (Boersma, 2001) and scored for accuracy. Interpreting outputs were scored as correct if they were identical to our translation of the target sentence or when a semantically similar verb and/or noun was used. In Experiment 1, the overall accuracy of interpreting was 87.58% (SD = 11.38) and 12.42% of data (incorrect translations) were excluded from further analyses (see footnote 1). Participants completed their interpretation earlier in the predictable condition (M = 6571 ms, SD = 465 ms) than in the nonpredictable condition (M = 6686 ms, SD = 492 ms), although this difference was not statistically significant (t = 0.93, p = .357, d = 0.24, 95% CI [-298, 67]). With regard to cross-condition accuracy, the accuracy of interpretation in the predictable condition (M = 84%, SD = 12%) was significantly lower than that in the nonpredictable condition (M = 91%, SD = 9%), t = 2.47, p = .017, d = 0.67, 95% CI [-0.13, -0.01]. Figure 3 presents the fixation proportions for Experiment 1: fixations to the target (solid lines) and to the averaged distractors (dashed lines) over time for the predictable (green) and nonpredictable (red) condition. The shaded grey areas surrounding the lines represent by-participant 95% confidence intervals (Huettig & Janse, 2016; Hintz et al., 2017; Huettig & Guerra, 2019).
[Figure 2: trial procedure and diagram of the consecutive interpreting task used in Experiment 1. A picture consisting of 4 objects (one target, three distractors) was displayed and then the playback of the sentence started. The presentation of the visual displays was timed to precede the onset of the spoken verb by one second. After the offset of the spoken sentence, participants were instructed to initiate their interpretation. The visual display remained in view until the end of the trial.]
Figure 3 covers a time course of 7500 ms, with time zero indicating the acoustic onset of the spoken target. Consistent with the task instruction, comprehension and production happened sequentially.
In the predictable condition, the likelihood of looking at the target object increased well before it was mentioned, at around one second before target word onset. In contrast, in the nonpredictable condition, participants only looked at the same objects after they were referred to in the speech signal, starting 200 ms after target word onset, which is the time needed to launch a saccadic eye movement (Saslow, 1967). In spite of the differences in fixations prior to target word onset, the eye-movement patterns after target word onset looked very similar in predictable and nonpredictable conditions: fixations to the target objects dropped slightly after the offset of the spoken sentence but increased again after participants had started producing their interpretation.
Overall, the results provide clear evidence for predictive processing in a consecutive interpreting task. In Experiment 2, participants were required to do a more demanding interpreting task (i.e., simultaneous interpreting). That is, Experiment 2 was identical to Experiment 1, except that participants were instructed to interpret the spoken sentence while they were still listening to it.

Experiment 2
Participants
Another forty-one participants from the participant pool of the Max Planck Institute for Psycholinguistics were recruited for Experiment 2 and were paid for their participation. The data of thirty participants (23 females; mean age = 23.87, SD = 3.68) were used for analysis (for data exclusion, see the Results and interim discussion section below). They were again all students from Radboud University, with Dutch as their native language and English as a frequently used foreign language. As with the participants in Experiment 1, they had started learning English at around the age of 10 (M = 9.87, SD = 1.87). All participants had normal or corrected-to-normal vision as well as normal hearing. All participants gave informed written consent before taking part in the experiment. Ethical approval to conduct the study was provided by the ethics board of the faculty of Social Sciences at Radboud University.
The same self-report as in Experiment 1 was administered and showed that participants rated their English proficiency as high (reading: M = 6.33, SD = 0.69; speaking: [...] Table 2a. The correlations between self-rating scores, NART score, and Peabody score were calculated; the scores correlated robustly and positively with each other (except for three pairs: NART-speaking, NART-understanding, and Peabody-understanding), see [...] [-11.51, 4.24]).

Stimuli, procedure and data analysis
Stimuli, procedure and data analysis were the same as in Experiment 1, except that participants were instructed to interpret the sentences in a simultaneous rather than consecutive fashion, see Figure 4 for the procedure. To that end, participants were asked before the experiment to initiate their interpretation as soon as possible. Additionally, we implemented an auditory beep to occur two seconds after the end of the spoken sentence and told participants that their interpretation should be finished before the beep. Pretests had shown that this setting was feasible.

Results and interim discussion
Among the forty-one participants, seven did not look at any displayed object on more than 25% of trials, while another four always focused on one or two fixed positions on the screen. According to their post-experiment verbal reports, these eleven participants simply focused on the interpreting task without viewing the displayed objects. We excluded these participants' data as they had engaged in a specific form of strategic processing. In addition, the accuracy of interpreting was 86.58% (SD = 11.16%), so that 13.42% of data (incorrect translations) were excluded from further analyses.
Participants showed a similar efficiency-accuracy trade-off as in Experiment 1. That is, they completed their interpretation earlier in the predictable condition (M = 4346 ms, SD = 365 ms) than in the nonpredictable condition (M = 4461 ms, SD = 341 ms), although this difference was not statistically significant (t = 1.26, p = .212, d = 0.33, 95% CI [-298, 67]). With regard to cross-condition accuracy, the accuracy of interpretation in the predictable condition (M = 81%, SD = 11%) was significantly lower than that in the nonpredictable condition (M = 92%, SD = 9%), t = 4.32, p < .001, d = 0.67, 95% CI [-0.16, -0.06]. Figure 5 plots participants' fixation behavior during the SI process. As can be seen in Figure 5, participants initiated their interpretation approximately 1080 ms after the onset of the spoken Dutch sentence, and 1403 ms before the offset of the spoken sentence. Very similar to Experiment 1, in the predictable condition, the likelihood of looks to the target increased shortly after participants had heard the verb in the spoken sentence, about one second prior to target word onset. Similarly, the time course of fixations to the target object in the nonpredictable condition was comparable to that in Experiment 1: compared to the unrelated distractors, more looks to the target were made from about 200 ms after target word onset. Thus, the results suggest that participants predicted the upcoming target noun shortly after hearing the verb, while they had already started interpreting the spoken sentence.

Comparison of Experiment 1 and Experiment 2
To complement our magnitude estimation analysis approach and to quantify differences in eye gaze behavior between predictable and nonpredictable conditions across Experiment 1 and 2, we additionally fitted a linear mixed-effects model using the lme4 package (Bates, Mächler, Bolker, & Walker, 2015) in R (R Development Core Team, 2012). To this end, fixation proportions during the onset verb-onset target period (i.e., prediction window) of the spoken Dutch sentences (200 ms were added to both onsets to account for the time it takes to program and launch a saccadic eye movement, Saslow, 1967) were extracted. To calculate the dependent variable, we divided each participant's  After that, a picture consisting of 4 objects (one target, three distractors) was displayed and then the playback of the sentence started. The presentation of the visual displays was timed to precede the onset of the spoken verb by one second. After hearing the spoken sentence, participants were asked to initiate their interpretation as soon as possible, and finish interpretation within a 2000ms window after the offset of sentence. The visual display remained in view until the end of the trial. The diagram of simultaneous interpreting task used in Exp 2 is also shown. proportion of looks to the target during the prediction window on a given trial by that participant's proportion of looks to the averaged distractors during the same time window. The resulting values were log-transformed. Prior to the division and logtransformation fixation proportions of 0 or 1 were replaced with 0.01 and 0.99, respectively (cf. Macmillan & Creelman, 1991). The model included Experiment (1 vs. 2) and Condition (predictable vs. nonpredictable) as fixed factors. Experiment 1 and the nonpredictable condition, respectively, were mapped onto the intercept (i.e., using treatment-/dummy-coding 2 ). 
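The dependent variable described above can be sketched as a small function. The text does not state the base of the logarithm, so the natural log used here is an assumption, and the function names are hypothetical.

```python
import math


def adjust_proportion(p):
    """Replace proportions of exactly 0 or 1 with 0.01 and 0.99, as in the text."""
    if p == 0:
        return 0.01
    if p == 1:
        return 0.99
    return p


def target_preference(p_target, p_distractors):
    """Log ratio of looks to the target over looks to the averaged distractors.

    Assumption: natural logarithm (the base is not specified in the text).
    Positive values indicate a preference for the target; zero indicates none.
    """
    return math.log(adjust_proportion(p_target) / adjust_proportion(p_distractors))


# Hypothetical trial: 60% of the prediction window on the target,
# 10% on the averaged distractors.
print(round(target_preference(0.6, 0.1), 3))
```

Replacing 0 and 1 before dividing keeps the ratio finite and the logarithm defined on trials where a participant never looked at the distractors or at the target.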
To test for a potential influence of participants' English reading skills and their English receptive vocabulary size on eye movements, NART and PPVT scores (both scaled and centered) were added as continuous predictors. Participants and Items were added as random factors, both with random intercepts. Using a maximal random effects structure (with random slopes for Condition by Participants and Items and random slopes for Experiment by Item) resulted in 'model singularity'. We systematically simplified the random effects structure until the error no longer occurred. The formula of the final model was: targetpref ∼ Exp * Cond * (PPVT_cs + NART_cs) + (1|Participant) + (1 + Cond|Item), data = data, control = lmerControl(optimizer = "bobyqa").
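Under treatment coding with Experiment 1 and the nonpredictable condition as reference levels, the fixed-effect coefficients for Exp * Cond map onto cell means in a specific way, which matters for interpreting the Condition effect. The Python sketch below shows this mapping for a balanced design without random effects or covariates. The cell means are invented for illustration and are not the study's data, and the function name is hypothetical.

```python
def treatment_coefficients(cell_means):
    """Map 2 x 2 cell means onto treatment-coded coefficients.

    cell_means: dict keyed by (experiment, condition) -> mean target preference.
    Reference levels (assumed, following the text): "Exp1" and "nonpredictable".
    """
    ref = cell_means[("Exp1", "nonpredictable")]
    # Simple effect of Condition within the reference experiment (Experiment 1).
    cond = cell_means[("Exp1", "predictable")] - ref
    # Simple effect of Experiment within the reference condition (nonpredictable).
    exp = cell_means[("Exp2", "nonpredictable")] - ref
    # Interaction: how much the Condition effect in Experiment 2 differs from Experiment 1.
    cond_in_exp2 = cell_means[("Exp2", "predictable")] - cell_means[("Exp2", "nonpredictable")]
    return {"Intercept": ref, "Cond": cond, "Exp": exp, "Exp:Cond": cond_in_exp2 - cond}


# Invented cell means, for illustration only.
example_means = {
    ("Exp1", "nonpredictable"): 0.0,
    ("Exp1", "predictable"): 1.0,
    ("Exp2", "nonpredictable"): 0.25,
    ("Exp2", "predictable"): 1.0,
}
coefs = treatment_coefficients(example_means)
```

In this scheme the Cond coefficient is the condition difference in Experiment 1 only, and whether that difference changes in Experiment 2 is carried by the Exp:Cond term. In the full model, which includes the centered PPVT and NART predictors, these coefficients would additionally refer to participants with average scores.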
This model revealed a significant effect of Condition (β = .82, SEβ = .15, t = 5.63), suggesting that target objects were looked at significantly more during the predictive period than the distractors in the predictable but not the nonpredictable condition in Experiment 1. None of the other factors, predictors or interactions reached statistical significance (see Table 3). Specifically, the lack of a significant interaction between Experiment and Condition indicates that gaze during the predictive window did not differ between the two experiments. The same model without the fixed factor Condition provided a significantly worse fit to the data (χ²(6) = 43.79, p < .001).
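The reported model comparison can be checked by hand, because the upper-tail probability of a chi-square statistic has a closed form when the degrees of freedom are even. The Python sketch below uses that closed form; the function name is hypothetical and only the standard library is used. The six degrees of freedom would be consistent with removing Condition together with its five interaction terms from the fixed effects, although that reading is an inference rather than something the text states.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable with even df.

    Uses the closed form exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i  # builds (x/2)^i / i! incrementally
        total += term
    return math.exp(-half) * total


# Likelihood-ratio statistic and df as reported in the text.
p_value = chi2_sf_even_df(43.79, 6)
```

The resulting p-value is far below .001, in line with the reported result.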
In sum, the complementary mixed-effects modelling analysis suggests that the prediction effects in Experiment 1 and 2 were very similar and that neither NART nor PPVT scores contributed to explaining variance in participants' gaze behavior. It is noteworthy that 11 of the 41 participants in Experiment 2 did not move their eyes during the trials. Only 3 of 33 participants did not move their eyes in Experiment 1. We will discuss the relevance of this observation in the General Discussion.
Comparison of current study with Hintz et al. (2017)
Finally, given the similarity of the present experiments with our previous study (i.e., same materials, L1 input, participants sampled from the same population), we conducted an additional analysis comparing eye gaze across interpreting and comprehension tasks. That is, we assessed whether having an interpreting task, either in a consecutive or simultaneous fashion, leads to differences in fixation behavior, compared to when participants merely comprehend the spoken sentences (i.e., look and listen, Hintz et al., 2017). To that end, we incorporated the data from Experiment 1 of Hintz et al. (2017) into the analysis described above. The model structure was identical, except that PPVT and NART were dropped and that the fixed factor Experiment had three levels, with 'Hintz2017' mapped onto the intercept (using treatment-/dummy-coding 3 ). As before, the maximal random effects structure yielded 'model singularity'. We therefore simplified the model until the error no longer occurred. The final model had the following structure: targetpref ∼ Exp * Cond + (1 + Cond|Participant) + (1|Item), data = data, control = lmerControl(optimizer = "bobyqa"). Table 4 summarizes the results of this analysis. While Condition, as in the previous model, contributed significantly to explaining variance in eye gaze in the three experiments (larger preference for the target over the unrelated distractors in the predictable than in the nonpredictable condition, β = .98, SEβ = .10, t = 9.68 in our previous experiment, Hintz et al., 2017), none of the other predictors showed a significant effect. In particular, none of the interactions showed even a trend towards an effect, suggesting that gaze during the predictive window did not differ across the three experiments.
In sum, this analysis suggests that the presence of an interpreting task, where listeners comprehend speech in their native language and translate it into an L2, does not modulate (predictive) fixation behavior as compared to a setting where participants merely comprehend speech in their L1.

General discussion
We investigated the limits of prediction by asking native Dutch speakers, who were also proficient L2 speakers of English, to translate Dutch sentences into their English counterparts during consecutive and simultaneous interpreting. To this end, we conducted two visual-world eye-tracking experiments, in which participants viewed a visual display consisting of four objects (one target and three distractors) while interpreting simple Dutch sentences into English. In both experiments, the main manipulation was the predictability of the spoken sentences. In the predictable condition, participants could use the semantic information of the verbs to predict the upcoming target nouns (e.g., "The man peels the apple"), whereas the target nouns were not predictable in the nonpredictable sentences (e.g., "The man draws the apple").
In Experiment 1, participants were asked to engage in a consecutive interpreting task; that is, they were asked to comprehend the speech input first and render the interpretation after the offset of the spoken sentences. The results of Experiment 1 show that the participants fixated the targets before they were mentioned in the predictable condition, but such predictive looks to the targets were not observed in the nonpredictable condition. The bilingual participants of Experiment 1 thus showed anticipatory eye movements to semantically-related upcoming target words in the source language when concurrently planning consecutive interpretation.
Experiment 2 was conducted to examine whether prediction of the source language in novice interpreting can also routinely occur in a more difficult kind of interpreting task, namely simultaneous interpreting. Participants in Experiment 2 were required to interpret the heard sentences simultaneously, with comprehension and production happening nearly concurrently. The participants of Experiment 2 exhibited anticipatory eye movements to semantically-related upcoming target words in the source language when engaging in simultaneous interpretation.
The present findings thus suggest that proficient L2 speakers can engage in prediction in their L1 despite the adverse conditions imposed by an interpreting task on prediction, including cognitive load, time pressure, L2 processing and concurrent (or subsequent) production. These overall results could also be taken to support the notion that prediction of upcoming semantically-related words in the source language is advantageous for interpreting in both types of interpreting (consistent with recent findings by Amos, 2020, but with a focus on the L1-L2 interpreting direction). We note, however, that accuracy rates in both Experiment 1 and Experiment 2 in the predictable condition were significantly lower than in the nonpredictable condition. This raises the possibility that prediction, in certain situations, may be harmful, or at least is not beneficial (cf. Frisson, Harvey, & Staub, 2017; Huettig & Mani, 2016; Luke & Christianson, 2016). Further research could usefully investigate this possibility.
Too taxing to predict or too taxing to move the eyes?
It is important to point out at this juncture that in Experiment 2, when participants engaged in a simultaneous interpreting task, more participants (11 of 41, about every 4th participant) than in Experiment 1 (3 of 33, about every 10th participant) chose not to move their eyes and view the displayed objects.
What does this difference mean? Is it the enhanced cognitive burden of the simultaneous interpreting task that caused this difference? Does it mean that prediction is not taking place at all in these cases, or is it that the predictive processing is not manifesting in eye movement behavior as measured through the visual world paradigm? We cannot be sure about the correct answer from the present data, but there are hints in the previous literature that warrant a little speculation. To not move one's eyes in a visual world task is very unusual behavior. Visual-world eye-tracking behavior is a reflection of the tight connection between spoken language processing and visual processing that has been established in a great number of studies (for reviews, see Huettig et al., 2011; Magnuson, 2019). When participants hear a word that refers (directly or in an anticipatory fashion) to a visual object in their concurrent visual environment, they quickly and semi-automatically (see Mishra, Olivers, & Huettig, 2013, for extensive discussion) direct their eye gaze to objects which are similar (e.g., semantically) to the heard word. Indeed, all participants in Hintz et al. (2017) showed this typical eye movement behavior. We thus speculate that the 25% of participants in the simultaneous interpreting task likely did not move their eyes because of the extreme cognitive burden of this version of the interpreting task. What this means with regard to prediction in these 25% of people is unclear. It is possible that these 25% of participants did not predict semantically-related upcoming target words in the source language in simultaneous interpreting. This could be due to the higher-level complexity of SI relative to CI, with the former featuring high degrees of multiplicity and simultaneity. We believe that future work on this particular issue would be particularly useful and informative.
If it turns out that these 25% of people did not predict or showed substantially reduced prediction, then this would suggest that there are important limits to prediction (cf. Huettig & Mani, 2016; Huettig & Guerra, 2019) during simultaneous interpreting, given the relatively easy sentences participants translated in the current study. The present data cannot reveal whether this interpretation is correct, and additional research is needed to explore this account. What our data do reveal, however, is that the 75% of participants who engaged in the typical semi-automatic visual world eye gaze behavior showed clear evidence of prediction of the source language also in simultaneous interpreting. Thus, for most highly proficient bilinguals, the present results suggest that prediction does not break down during interpreting, even in a very challenging task such as simultaneous interpreting.

Prediction and production
It is noteworthy that the settings of the two experiments here were very similar to the experiments reported in Hintz et al. (2017) except for the addition of a production phase (CI task in Experiment 1) and the concurrent execution of comprehension and production (SI task in Experiment 2). This similarity allowed us to compare prediction across tasks of differing difficulty, which is another novel contribution of the present study. With the direct involvement of production processes, the current findings are consistent with accounts of prediction in language processing which assume a role of the production system during comprehension (Federmeier, 2007; Pickering & Garrod, 2013; Dell & Chang, 2014; Huettig, 2015; Pickering & Gambi, 2018). However, unlike relevant studies demonstrating reduced prediction when the prediction system was 'occupied' (Martin, Branzi, & Bar, 2018), or boosted prediction in the case of increased engagement of the production system (Hintz et al., 2017; Rommers et al., 2020; Lelonkiewicz et al., 2021), the current study showed null effects of the addition of a production phase on anticipatory eye gaze.

Adaptive prediction
It is also surprising that the prediction effects were similar between mere comprehension and interpreting tasks and across different interpreting tasks (at least in 75% of the participants), given the different cognitive challenges and processing mechanisms involved in them (Liang et al., 2017; Liang, Lv, & Liu, 2019; Jia & Liang, 2020). It is conceivable that such surprising results can be attributed to the variability of prediction being the result of not only the passive constraints of various mediating factors but also the 'active' adaptability of the language users involved. Kuperberg and Jaeger (2016), for example, have put forward a 'utility view of prediction': language users dynamically adjust their predictive behavior by weighting the costs and benefits of prediction for achieving their communicative goals. In the present experiments, prediction during the interpreting tasks came with both extra challenges (cognitive load, time pressure, concurrent production and cross-language processing) and benefits (relieving cognitive burden and coping with intense time pressure). How such a potential 'cost-benefit analysis' plays out in specific communicative situations, as well as on a mechanistic level, is another interesting implication of and challenge from the present study for further work.

Ecological considerations
Finally, we note that in the present study we chose a strongly supportive task environment: verb-based semantic prediction in a visual world context. Semantic prediction effects are typically much larger than syntactic or phonological prediction effects in native language processing (but see Ferreira & Qiu, 2021). Furthermore, the language stimuli to be interpreted in the present study were relatively simple compared with those in real interpreting situations. Future work is now in a good position to explore prediction in interpreting using various non-semantic cues as well as the kinds of sentences and phrases that are used in actual real-world interpreting situations.

Data availability
The data that support the findings of this study are openly available in OSF at https://osf.io/54zup, DOI: 10.17605/OSF.IO/54ZUP.