Highlights
• VR shows additive effects of Valence and Weather on speeded language decisions, with no reliable Language effect
• Negative high-arousal words speed language decisions in L1 and L2
• Rainy VR induces an overall slowing without modulating valence or language effects
• The weather effect grows over time: practice-related speeding emerges in sunny conditions but is constrained under rain
• Supports VR as a tool for ecologically realistic research in bilingual psycholinguistics
1. Introduction
Language and emotion are deeply interconnected. Language conveys affective content while also shaping how emotions are perceived, experienced, and regulated (Ferré et al., 2025). Psycholinguistic research has consistently shown that the degree to which a word is experienced as positive or negative (i.e., the word’s valence) directly influences its recognition (Kousta et al., 2009). Across lexical decision and emotional categorization tasks, positive words typically elicit faster responses than neutral and negative words (Crossfield & Damian, 2021; Kauschke et al., 2019; Scott et al., 2014). Evidence regarding the processing of negative words is mixed. Early accounts of ‘automatic vigilance’ proposed that negative stimuli capture attention and slow responses, a view supported by large-scale analyses showing slower recognition of negative relative to positive words (Kuperman et al., 2014). However, other studies controlling for lexical confounds suggest that both negative and positive words can be recognized faster than neutral words, consistent with the idea that motivationally relevant stimuli enjoy a processing advantage (Kousta et al., 2009; see Hinojosa et al., 2019 for a review). Critically, these effects are driven not only by valence but also by arousal: high-arousal negative words have been shown to facilitate recognition relative to neutral ones, whereas low-arousal negative words tend to slow down responses (Citron et al., 2014; Hofmann et al., 2009; Larsen et al., 2008).
This variability highlights that the impact of negativity on word recognition is not uniform and may depend on task demands, stimulus properties, and individual differences. Importantly, large-scale evidence confirms robust valence effects while also showing that their size varies across individuals (Haro et al., 2024). Complementarily, computational work suggests that positive words reach stable semantic representations earlier and more reliably during development than negative words, which may underpin their processing advantage (Martínez-Huertas et al., 2025). Together, these strands position valence and its interaction with arousal as a central determinant of lexical access.
In bilingual contexts, emotional effects are further modulated by language. The emotional resonance of words is typically attenuated in a second language (L2) compared to the native language (L1). This phenomenon, often termed the foreign-language effect, has been demonstrated in experimental studies across fields (Costa et al., 2014; Geipel et al., 2015; Hayakawa et al., 2017; see Aguilar et al., 2024 for an extensive review). Emotional words in L2 often evoke weaker neural and physiological responses than in L1. For instance, Chen et al. (2015) showed that emotional words elicited reduced ERP and fMRI responses in L2 compared to L1, indicating attenuated neural engagement. Similarly, Iacozza et al. (2017) reported reduced pupil dilation when participants read emotional sentences in L2 relative to L1. This asymmetry is often attributed to differences in acquisition context and frequency of emotional use (Caldwell-Harris, 2014; Opitz & Degner, 2012): the L1 is typically learned in emotionally rich, embodied contexts, whereas the L2 often lacks such grounding experiences. Yet, when proficiency in both languages is accounted for, lexical decision tasks do not show reliable language-specific differences in emotional word processing (Ponari et al., 2015). This tentatively suggests that, as L2 proficiency increases, L2 emotional disembodiment decreases and responses come to parallel those observed in L1.
Beyond internal linguistic and affective variables, environmental context can also shape lexical processing. Outside the laboratory, word recognition unfolds amid perceptual variability and concurrent sensory demands. Research on processing fluency shows that perceptual disfluency, such as degraded or visually challenging input, slows recognition and can alter processing strategies (Alter & Oppenheimer, 2009). Weather is one of the most pervasive sources of such variability in everyday perception. It has been associated with fluctuations in mood and expressed sentiment (Baylis et al., 2018; Jiang et al., 2022), although experimental effects are often subtle and depend on the manipulation (Behnke et al., 2021), underscoring the need for paradigms that are both controlled and ecologically grounded. Virtual reality (VR) affords precisely this combination (Rocabado et al., 2025a).
Within VR, initial studies suggest that naturalistic environmental load (e.g., simulated rain) negatively impacts L1 single-word processing and sentence reading (Rocabado et al., 2024). Importantly for the current study, in an affective L1 word judgment task, rainy VR conditions slowed responses while leaving valence ratings, and thus intrinsic affective evaluations, intact (Rocabado & Duñabeitia, 2025). This dissociation suggests that the impact of simulated rain is better interpreted as perceptual interference than as mood induction: environmental disfluency taxes early visual processing while leaving affective representations largely unchanged. In another study, Rocabado et al. (2025b) employed a bilingual language-decision task with orthotactically unmarked Spanish and English words presented under two types of visual degradation: simulated rain and a laboratory degradation (namely, a noise mask). Rainy weather induced a general slowdown across both languages, whereas masking with visual noise selectively reduced the L1 advantage. These findings point to a dissociation in which naturalistic environmental disfluencies, such as simulated rain, impose broad additive costs on processing, whereas laboratory perceptual degradations, such as artificially superimposed visual noise, can disrupt bilingual lexical dynamics more selectively. This dissociation motivates a critical next step: testing whether realistic weather manipulations modulate emotional word processing during timed lexical access, where early perceptual constraints and decisional pressure are paramount.
Despite these advances, the interaction between valence and environmental disfluency during timed bilingual word recognition in real-life-like settings remains largely unexplored, especially for negative items, which are presumably the most sensitive to interference and thus the most likely to reveal boundary conditions. Prior VR studies have either examined bilingual language decisions with items not controlled for valence (Rocabado et al., 2025a) or assessed valence through reflective rating tasks without testing the efficiency of lexical access during speeded online processing (Rocabado & Duñabeitia, 2025). Consequently, it is unclear whether the tendency of negative words to elicit longer response times and less consistent performance, owing to attentional capture and its associated costs (Hinojosa et al., 2019; Kousta et al., 2009; Kuperman et al., 2014), is amplified, attenuated, or preserved under perceptually challenging and realistic conditions such as simulated rain. This issue is especially relevant in bilingual settings. Given that emotional processing in the L2 is often described as attenuated, one might expect the negative-versus-neutral contrast to be smaller in L2 than in L1. Nevertheless, in tightly controlled lexical decision tasks, highly proficient L2 speakers show the same advantage for valenced over neutral words as native speakers, with no reliable Language by Valence interaction. Late L2 speakers are slower overall and more sensitive to frequency, but the valence advantage persists in both languages (Ponari et al., 2015; Wang et al., 2023).
The present study embeds a Spanish (L1) and English (L2) language-decision task in a realistic VR environment under sunny versus rainy conditions while manipulating valence (negative versus neutral). This task was selected for its heightened sensitivity to both lexical and sublexical processing demands, as it requires the successful integration of phonotactic, orthotactic, and semantic cues. Unlike traditional lexical decision tasks, which can be administered in language-blocked contexts, the language-decision task requires explicit identification of the language of each item, thereby maximizing cross-linguistic competition and providing a stringent test of bilingual lexical access under perceptual load (van Hell & Tanner, 2012). This design reflects situated bilingual cognition within an immersive framework, simulating how languages compete for attention in dynamic real-world settings rather than being processed in isolated blocks. We focus on the contrast between highly negative, impactful words (i.e., words with low valence and high arousal) and neutral items (i.e., words with neutral valence and low arousal), since such negative stimuli are, theoretically and empirically, the most sensitive to interference and thus offer maximal leverage for detecting contextual modulation. This contrast is particularly relevant for bilingual processing, because negative high-arousal words are precisely those whose emotional resonance tends to be most attenuated in L2 compared to L1. If L2 emotional grounding is weaker, the facilitation associated with high-arousal negativity might be reduced, absent, or delayed in the L2 relative to the L1.
Alternatively, if motivational salience (i.e., a stimulus’s capacity to capture attention owing to its biological or behavioral relevance) overrides language-based attenuation, facilitation could emerge in both languages, paralleling the patterns observed in prior studies (Ponari et al., 2015). By focusing on this critical region of the affective space (negative high-arousal versus neutral low-arousal words) and targeting lexical–semantic and decisional processes, our design provides a stringent test of whether L2 emotional disembodiment persists under realistic perceptual load, or whether arousal-driven vigilance effects generalize across languages, acting as domain-general constraints on the time course of word recognition, even when both languages are processed with comparable efficiency.
2. Method
2.1. Participants
A total of 40 students from Nebrija University participated in the present experiment. Of these, 32 self-identified as female (mean age = 22.22 years, SD = 4.53) and 8 as male (mean age = 22.00 years, SD = 4.41). All participants had normal or corrected-to-normal vision and hearing and were native speakers of Spanish (L1) with English as their second language (L2). To enroll in the experimental session, participants were required to self-report a B2 level of English proficiency according to the Common European Framework of Reference for Languages (CEFR; Council of Europe, 2001). This was further confirmed using LexTALE (Lemhöfer & Broersma, 2012), a vocabulary knowledge test designed to estimate lexical proficiency. Scores around 60–70 correspond to a B2 level, and participants achieved an average score of 70.96 (SD = 7.66). Written informed consent was obtained prior to participation, and the study was approved by the Research Ethics Committee at Nebrija University (approval code UNNE-2022-0017).
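For readers unfamiliar with how a LexTALE score such as 70.96 is obtained, the sketch below assumes the standard "averaged % correct" scoring described by Lemhöfer and Broersma (2012): the mean of the percentage of the 40 real words accepted and the percentage of the 20 nonwords rejected. The function name and the response counts in the example are hypothetical and are not taken from this sample.

```python
def lextale_score(words_correct: int, nonwords_correct: int,
                  n_words: int = 40, n_nonwords: int = 20) -> float:
    """Averaged % correct: mean of word accuracy and nonword accuracy (in %)."""
    if not (0 <= words_correct <= n_words and 0 <= nonwords_correct <= n_nonwords):
        raise ValueError("counts must lie within the number of items")
    word_pct = 100 * words_correct / n_words           # real words correctly accepted
    nonword_pct = 100 * nonwords_correct / n_nonwords  # nonwords correctly rejected
    return (word_pct + nonword_pct) / 2


# Hypothetical participant: 36 of 40 words and 12 of 20 nonwords correct.
print(lextale_score(36, 12))  # 75.0
```

Averaging the two percentages, rather than pooling all 60 items, keeps the smaller nonword set from being swamped by the words, so a participant who simply accepts every item cannot score highly.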
2.2. Materials
A total of 160 Spanish words and their English partial-cognate translations were selected from Multilex (Spanish: Martínez et al., 2024a; English: Brysbaert et al., 2024). Based on the available Spanish word norms, we selected 80 negative and 80 neutral word pairs. To minimize trivial form-based cues, items were orthographically non-identical cognates (Comesaña et al., 2015). This partial overlap was confirmed using the length-corrected orthographic Levenshtein distance (see Duñabeitia et al., 2013, 2020), a metric that ranges from 0 (no overlap between translation equivalents) to 1 (completely overlapping cognates). The corrected Levenshtein distance averaged 0.66, with values ranging from 0.34 (e.g., diablo–devil) to 0.80 (e.g., secta–sect). Word pairs were additionally matched on frequency, length, and orthographic neighborhood density (OLD20; Keuleers, 2013; Yarkoni et al., 2008; see Table 1). Crucially, the negative and neutral sets differed not only in valence but also in arousal. Normative ratings from Martínez et al. (2024b) confirmed that negative words were both highly negative and high in arousal, whereas neutral words were mid-valence and low in arousal. The manipulation thus contrasted very negative, high-arousal items with neutral, low-arousal ones, directly targeting the region of the affective space most likely to reveal vigilance-driven differences in lexical access.
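Both the cross-language overlap score and OLD20 are built on the Levenshtein edit distance. The sketch below assumes that the length correction is 1 − LD / (length of the longer word); this reproduces the reported 0.80 for secta–sect but yields about 0.33 (rather than 0.34) for diablo–devil, so the published metric may differ in detail. The function names are hypothetical, and the toy lexicon in the usage check is illustrative only; real OLD20 values are computed against a full lexicon.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit costs for insertion, deletion, and substitution."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (0 if equal)
        prev = curr
    return prev[-1]


def overlap_score(a: str, b: str) -> float:
    """Assumed length correction: 0 = no overlap, 1 = identical spellings."""
    return 1 - levenshtein(a, b) / max(len(a), len(b))


def old20(word: str, lexicon: list[str], k: int = 20) -> float:
    """Mean edit distance from `word` to its k closest entries in `lexicon`."""
    distances = sorted(levenshtein(word, w) for w in lexicon if w != word)
    nearest = distances[:k]
    if not nearest:
        raise ValueError("lexicon contains no other words")
    return sum(nearest) / len(nearest)


print(round(overlap_score("secta", "sect"), 2))    # 0.8
print(round(overlap_score("diablo", "devil"), 2))  # 0.33
```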
Table 1. Descriptive statistics of the materials

Note: Values are means, with standard deviations in parentheses, for word frequency (Zipf scale), word length (number of letters), and the length-corrected Levenshtein distance (LD) score indexing interlingual orthographic overlap (Duñabeitia et al., 2013). Neighborhood density is indexed by OLD20, the mean orthographic distance to the 20 closest lexical neighbors (lower values indicate denser neighborhoods). Valence and arousal values are reported for the Spanish words.
2.3. Virtual reality setting and apparatus
The virtual environment and the task were designed and programmed in Python 3 using Vizard 7.5 (WorldViz, 2024), a VR software platform, and presented to participants through a head-mounted display (HMD). The experimental environments and stimuli were displayed using the HTC VIVE Pro HMD, which offers a rendering resolution of 2880 × 1600 pixels (1440 × 1600 pixels per eye), a 90 Hz refresh rate, and a 110° field of view. The participant’s virtual position remained fixed throughout the task, irrespective of their real-world movements.
Modifications to the 3D model and the main environment were implemented using Vizard Inspector (WorldViz, 2024). This tool was used to remove unnecessary 3D objects and to add a white canvas for presenting the experimental stimuli. Ambient sounds were included to enhance the realism of the VR experience: rain sounds in the rainy condition, and the sounds of a fountain and pigeons in the sunny condition. To ensure auditory consistency, all ambient sounds were normalized and presented at a constant volume across conditions. These sounds were selected to be strictly congruent with the respective virtual contexts, providing a realistic level of environmental noise.
The background sky was animated to reflect the respective weather conditions (see Supplementary Materials for a video demonstration). In addition, the rainy condition featured darker building textures to convey a cloudy, wet setting. Crucially, these environmental changes did not extend to the central white canvas used for stimulus presentation, which maintained uniform brightness and contrast across both weather conditions. This ensured that any visual interference resulted from the falling raindrops between the participant and the target, rather than from changes in the legibility or illumination of the display itself.
The items were presented in an open-street 3D residential neighborhood, chosen as the main scenario for its high realism, familiarity, and openness. This immersive setting allowed participants to experience simulated weather in a more naturalistic context (see Figure 1 and Supplementary Materials).
Figure 1. Example of the participant’s perspective in the main scenario under sunny (left) and rainy (right) weather conditions.

2.4. Task and procedure
Participants were seated on a rotating chair and fitted with an HMD that immersed them in the 3D virtual environment, allowing a full 360° view from a stationary position. Once the headset was placed and calibrated, participants were given two controllers, represented as virtual hands, for interacting with the environment. The instructions for the language-decision task were displayed on a floating canvas in the virtual setting. These instructions outlined the two phases of the experiment: a practice phase followed by the main experimental phase. Participants categorized each stimulus as Spanish or English by pressing a button on the left or right controller, respectively, and were encouraged to respond as quickly as possible based on their first impression.
The target words were displayed centrally on a simulated white canvas in black Courier New, a monospaced font, to ensure readability. Each trial began with a central fixation mark presented for 500 ms to orient participants’ gaze toward the stimulus area and minimize distraction from environmental elements. The target word followed and remained visible until a response was given or for a maximum of 3000 ms. A blank inter-stimulus interval of 500 ms preceded the next trial.
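The trial structure described above can be summarized as a simple timeline. This plain-Python sketch only computes event onsets from the stated durations; it does not use the Vizard API, and the constant and function names are hypothetical. It assumes that responses slower than 3000 ms count as timeouts.

```python
FIXATION_MS = 500     # central fixation mark
MAX_TARGET_MS = 3000  # response window; the target disappears at this point
ISI_MS = 500          # blank screen before the next trial


def trial_timeline(rt_ms=None):
    """Return the events of one trial and whether it timed out.

    Each event is (name, onset_ms, duration_ms), with onsets measured from
    fixation onset. rt_ms is the latency from target onset, or None if no
    response arrived within the response window.
    """
    timed_out = rt_ms is None or rt_ms > MAX_TARGET_MS
    target_ms = MAX_TARGET_MS if timed_out else rt_ms
    events = [
        ("fixation", 0, FIXATION_MS),
        ("target", FIXATION_MS, target_ms),
        ("blank", FIXATION_MS + target_ms, ISI_MS),
    ]
    return events, timed_out


def trial_duration(rt_ms=None):
    """Total trial length in ms, from fixation onset to the end of the blank."""
    events, _ = trial_timeline(rt_ms)
    _, onset, duration = events[-1]
    return onset + duration


print(trial_duration(650))   # 1650
print(trial_duration(None))  # 4000
```

Because the target is response-terminated, trial length varies with RT: a trial lasts between roughly 1000 ms (an implausibly fast response) and 4000 ms (a timeout).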
The order of the weather conditions (sunny or rainy) was counterbalanced across participants. In each weather condition, participants completed 160 unique trials in random order: 40 negative and 40 neutral words in Spanish, and 40 negative and 40 neutral words in English. After a 10–15-minute unrelated filler task, participants completed a second language-decision session of 160 trials under the other weather condition. Critically, translation pairs were split across weather conditions, so that translation equivalents were never presented in the same weather condition. The assignment of the two 160-word lists to weather conditions was counterbalanced across participants, and words within each list were presented in random order.
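One way to build lists that satisfy these constraints is sketched below. It is a hypothetical illustration, not the authors' script: it assumes each pair is a (spanish, english, valence) tuple, sends the Spanish member of half of each valence set to list A and the English member to list B (the other half the reverse), and rotates weather order and list assignment across consecutive participants so that all four combinations occur.

```python
import random


def build_lists(pairs, seed=0):
    """Split (spanish, english, valence) translation pairs into two lists.

    Within each valence, half of the pairs send their Spanish member to
    list A and their English member to list B; the other half do the
    reverse. Each list thus holds exactly one member of every pair.
    """
    rng = random.Random(seed)
    list_a, list_b = [], []
    for valence in ("negative", "neutral"):
        subset = [p for p in pairs if p[2] == valence]
        rng.shuffle(subset)
        half = len(subset) // 2
        for i, (spanish, english, val) in enumerate(subset):
            if i < half:
                list_a.append((spanish, "Spanish", val))
                list_b.append((english, "English", val))
            else:
                list_a.append((english, "English", val))
                list_b.append((spanish, "Spanish", val))
    return list_a, list_b


def session_plan(participant_index, list_a, list_b, seed=None):
    """Pair weather conditions with lists and shuffle the trial order.

    Weather order and list-to-weather assignment rotate across
    consecutive participants so that all four combinations occur.
    """
    rng = random.Random(seed)
    weathers = ["sunny", "rainy"] if participant_index % 2 == 0 else ["rainy", "sunny"]
    lists = [list_a, list_b] if (participant_index // 2) % 2 == 0 else [list_b, list_a]
    plan = []
    for weather, items in zip(weathers, lists):
        trials = items[:]   # copy so the master list is not reordered
        rng.shuffle(trials)
        plan.append((weather, trials))
    return plan
```

With 80 negative and 80 neutral pairs, each list contains 40 items per Language by Valence cell, and no word appears in both sessions.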
2.5. Data processing
All data handling was conducted in RStudio (RStudio Team, 2022). Reaction-time (RT) analyses were restricted to correct trials; incorrect responses were removed prior to trimming. For each participant, RTs more than 2.5 standard deviations above or below that participant’s mean were excluded, which removed 3.77% of the data. Accuracy was computed on the full set of valid trials prior to RT trimming. The final datasets were inspected to verify the normality of residuals for the RT models and to ensure that exclusion rules had been applied uniformly across experimental cells.
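The exclusion steps above can be sketched as a single pass over trial records. This is a minimal illustration, not the authors' R code: it assumes each trial is a dict with hypothetical `participant`, `rt`, and `correct` fields, uses the sample standard deviation, and trims once rather than recursively.

```python
from collections import defaultdict
from statistics import mean, stdev


def trim_rts(trials, sd_cutoff=2.5):
    """Keep correct trials whose RT lies within mean +/- sd_cutoff * SD
    of that participant's correct-trial RTs."""
    correct = [t for t in trials if t["correct"]]  # errors dropped before trimming

    rts_by_participant = defaultdict(list)
    for t in correct:
        rts_by_participant[t["participant"]].append(t["rt"])

    bounds = {}
    for pid, rts in rts_by_participant.items():
        m = mean(rts)
        s = stdev(rts) if len(rts) > 1 else 0.0
        bounds[pid] = (m - sd_cutoff * s, m + sd_cutoff * s)

    return [t for t in correct
            if bounds[t["participant"]][0] <= t["rt"] <= bounds[t["participant"]][1]]
```

Computing the bounds per participant means a 1500 ms response can be retained for a slow participant yet excluded for a fast one; because the error trials are removed first, they do not influence either the mean or the SD.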
2.6. Statistical analysis
All inferential statistics were computed in jamovi (The jamovi project, 2022) using the GAMLj module (Gallucci, 2019), which interfaces with R’s mixed-effects engines. Accuracy was analyzed with a binomial generalized linear mixed-effects model (GLMM), and RTs with a linear mixed-effects model (LMM). Fixed factors were Language (Spanish/L1 versus English/L2), Valence (negative high-arousal versus neutral low-arousal), and Weather (sunny versus rainy). Trial Order was additionally included as a z-scored covariate to control for systematic variance related to time on task, following psycholinguistic modeling conventions (Baayen et al., 2008).
All categorical predictors were coded using deviation (sum-to-zero) contrasts, so that main effects reflect differences averaged across the levels of the other factors. Random intercepts for participants and items were included in all models. We began with a maximal random-effects structure (Barr et al., 2013; Brysbaert & Debeer, 2025), specifying by-participant slopes for all within-subject predictors (Language, Valence, Weather, and Trial Order) and by-item slopes for Weather and Trial Order. Random slopes were modeled as uncorrelated to mitigate convergence issues (Matuschek et al., 2017).
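Deviation coding and z-scoring are simple transformations applied before model fitting. The sketch below assumes ±0.5 codes, a common choice; GAMLj's own scaling may differ, and rescaling a two-level code changes the size of the coefficient but not its test. With ±0.5 codes in a balanced design, the Weather coefficient equals the rainy-minus-sunny difference averaged over the other factors. The z-score uses the sample standard deviation, as R's `scale()` does. The design vectors are hypothetical.

```python
from statistics import mean, stdev


def deviation_code(levels, positive_level):
    """Sum-to-zero coding of a two-level factor: +0.5 for one level, -0.5 for the other."""
    return [0.5 if lvl == positive_level else -0.5 for lvl in levels]


def z_score(values):
    """Centre on the mean and scale by the sample SD."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


weather = ["sunny", "rainy"] * 4       # hypothetical balanced design
weather_code = deviation_code(weather, positive_level="rainy")
trial_order = list(range(1, 161))      # hypothetical 160-trial session
trial_z = z_score(trial_order)

print(sum(weather_code))  # 0.0
```

Because Trial Order is z-scored, the simple effects reported later at −1 SD, the mean, and +1 SD correspond to z values of −1, 0, and +1.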
Because of persistent convergence and identifiability warnings (e.g., a degenerate Hessian or negative eigenvalues), we adopted a principled stepwise reduction strategy. The final RT model retained random intercepts for participants and items and by-participant slopes for Language and Trial Order, together with fixed effects of Language, Valence, Weather, and Trial Order and the critical Weather by Trial Order interaction. No other interactions improved model fit or reached significance (all ps > .19), so the model was otherwise additive.
For accuracy, the final GLMM included fixed effects of Language, Valence, Weather, and Trial Order, but no interaction terms, as none reached significance or improved model fit (all ps > .10). The random-effects structure included intercepts for participants and items, and a by-participant random slope for Language.
We report likelihood-ratio χ² tests for the GLMM and F-tests for the LMM, together with the associated p-values. Odds ratios and mean differences in milliseconds are provided where relevant. Estimated marginal means are plotted with 95% confidence intervals in all figures.
3. Results
3.1. Accuracy
Accuracy was generally high across conditions. The final binomial GLMM included fixed effects of Language (Spanish versus English), Valence (negative versus neutral), Weather (sunny versus rainy), and centered Trial Order, with random intercepts for participants and items and a by-participant random slope for Language.
There were no significant effects of Language, χ²(1) = 0.21, p = .645, or Weather, χ²(1) = 1.93, p = .165, and the effect of Trial Order was also not significant, χ²(1) = 0.83, p = .362. A marginal effect of Valence emerged, χ²(1) = 3.49, p = .062: post-hoc contrasts indicated that the odds of a correct response were 1.42 times higher for negative than for neutral words (see Figure 2). In sum, accuracy did not vary systematically with language or weather and showed only a weak tendency toward better performance for negative than for neutral stimuli.
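Because the GLMM operates on the log-odds scale, the 1.42 above is an odds ratio, not a ratio of probabilities. The sketch shows how an odds ratio maps onto accuracy; the 95% baseline for neutral words is a hypothetical value chosen for illustration, since cell accuracies are not reported in this section, and the function name is likewise hypothetical.

```python
def prob_from_odds_ratio(p_reference: float, odds_ratio: float) -> float:
    """Probability implied by applying an odds ratio to a reference probability."""
    odds = p_reference / (1 - p_reference)
    new_odds = odds * odds_ratio
    return new_odds / (1 + new_odds)


p_neutral = 0.95  # hypothetical accuracy for neutral words
p_negative = prob_from_odds_ratio(p_neutral, 1.42)
print(round(p_negative, 3))              # 0.964
print(round(p_negative / p_neutral, 3))  # 1.015
```

Near ceiling, an odds ratio of 1.42 corresponds under this assumed baseline to a gain of only about 1.4 percentage points, which is consistent with describing the valence effect on accuracy as a weak tendency.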
Figure 2. Estimated marginal means of accuracy (proportion correct) for the three main factors: language (left), valence (middle), and weather (right). Each panel shows the main effect of one factor on accuracy. Vertical bars represent ±1 standard error of the mean.

3.2. Reaction times
RTs were analyzed using a linear mixed-effects model with fixed effects of Language, Valence, Weather, and Trial Order (centered), as well as the Weather by Trial Order interaction. The final model included random intercepts for participants and items, and by-participant random slopes for Language and Trial Order.
A robust main effect of Valence emerged, F(1, 312.7) = 10.34, p = .001, with faster responses to negative words compared to neutral ones, yielding an average facilitation of 28 ms. The main effect of Weather was also significant, F(1, 11,895.6) = 4.39, p = .036, with overall slower responses in rainy compared to sunny conditions (mean difference = 9 ms). In contrast, the main effect of Language did not reach significance, F(1, 90.5) = 2.33, p = .13 (see Figure 3).
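For one-degree-of-freedom tests, the reported p-values can be checked by hand. A χ²(1) statistic is the square of a standard normal variable, so its upper-tail probability is erfc(√(x/2)); and F(1, ν) converges to χ²(1) as ν grows, so the same function closely approximates the p-value for the Weather effect here, whose denominator df exceeds 11,000. For small denominator dfs (e.g., 90.5 for Language), an exact F distribution should be used instead. The function name is hypothetical.

```python
import math


def chi2_1df_p(x: float) -> float:
    """Upper-tail p-value for a chi-square statistic with 1 degree of freedom.

    A chi-square(1) variable is the square of a standard normal Z, so
    P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(x / 2))


# Accuracy GLMM: likelihood-ratio tests with 1 df each
for label, stat in [("Valence", 3.49), ("Weather", 1.93), ("Trial Order", 0.83)]:
    print(label, round(chi2_1df_p(stat), 3))

# RT model: F(1, 11,895.6) = 4.39 for Weather, treated as chi-square(1)
print("Weather (RT)", round(chi2_1df_p(4.39), 3))
```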
Figure 3. Estimated marginal means of reaction times (ms) for the three main factors: language (left), valence (middle), and weather (right). Each panel shows the main effect of one factor on reaction time. Vertical bars represent ±1 standard error of the mean.

Crucially, we observed a significant Weather by Trial Order interaction, F(1, 12,024) = 7.22, p = .007 (Figure 4), indicating that the performance gap between weather conditions changed over time. This change was driven primarily by improvement under sunny weather rather than by a progressive decline under rain. Simple effects analyses revealed that early in the task (−1 SD of Trial Order), the difference between conditions was minimal (2.65 ms, SE = 6.09, t = 0.435, p = .664). At the task midpoint (mean Trial Order), responses were significantly slower under rainy than under sunny weather, by 8.98 ms (SE = 4.29, t = −2.10, p = .036). Later in the task (+1 SD), this difference grew to 20.61 ms (SE = 6.09, t = −3.38, p < .001), as responses in the sunny condition became progressively faster while those in the rainy condition remained relatively stable.
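Because Trial Order was z-scored and entered linearly, the three simple effects should lie on a single straight line. Treating the sunny-minus-rainy difference as positive at −1 SD and negative at the mean and +1 SD (signs inferred from the reported t values), the two ±1 SD estimates alone imply the midpoint estimate and the per-SD growth of the weather gap. The function name is hypothetical.

```python
def simple_effects_line(diff_low: float, diff_high: float):
    """Recover the intercept (difference at the mean) and per-SD slope of a
    linear interaction from simple effects at -1 SD and +1 SD of a
    z-scored covariate."""
    slope = (diff_high - diff_low) / 2
    intercept = (diff_high + diff_low) / 2
    return intercept, slope


# Sunny minus rainy RT differences in ms (signs inferred from the t values).
intercept, slope = simple_effects_line(diff_low=2.65, diff_high=-20.61)
print(round(intercept, 2), round(slope, 2))  # -8.98 -11.63
```

The implied difference at the mean (−8.98 ms) matches the reported midpoint estimate, and the slope indicates that the rain-related cost widened by roughly 11.6 ms per SD of Trial Order.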
Figure 4. Estimated marginal means of reaction times (ms) as a function of weather condition across trial order, shown at −1 SD, the mean, and +1 SD of the trial order distribution. The panel illustrates the significant Weather by Trial Order interaction, with the sun-related speeding increasing over the course of the task. Vertical bars represent ±1 standard error of the mean.

Taken together, these results show that negative high-arousal words were consistently processed faster than neutral words, and that virtual weather imposes a perceptual cost that becomes increasingly apparent as the task progresses, because performance improves over time under sunny conditions while response speed remains relatively stable under rain. Importantly, these effects were additive: no two-way or three-way interactions among Language, Valence, and Weather reached significance (all ps > .15). These findings support the interpretation that emotional and environmental factors independently constrain word recognition during dynamic VR-based processing.
4. Discussion
The present study examined how bilingual word recognition is modulated by emotional valence and environmental context under realistic perceptual conditions. Using a VR-based language-decision task, we tested whether simulated weather (sunny versus rainy conditions) interacts with word valence (negative versus neutral) and language (L1 Spanish versus L2 English). Exploratory models including trial-level order did not reveal any modulation of these main patterns, except for a significant interaction between Weather and Trial Order in the RT data: the rain-related cost grew as the task progressed because responses became faster under sunny conditions but not under rain. This suggests that the perceptual load imposed by rain may constrain practice-related gains, without altering the fundamental additive architecture of the effects. Moreover, accuracy was high across conditions and no interactions involving valence, weather, or language emerged in the accuracy data, which further reinforces the additive pattern and indicates that the main RT results are not confounded by speed–accuracy tradeoffs. Taken together, the findings indicate that valence and environmental context each exert reliable influences on word recognition, whereas language does not, and that these effects combine additively rather than interactively.
While previous studies have emphasized the robustness of L1 advantages in bilingual lexical processing, our findings revealed no significant effect of Language in the RT data. Although responses tended to be slower to English than to Spanish items, this trend did not reach statistical significance. This contrasts with the classic L2 cost typically observed in bilingual lexical access, even when the two languages are closely matched in their properties and proficiency is relatively high (Duyck & De Houwer, 2008; Gollan et al., 2008). Developmental evidence similarly shows that slower responses in the non-dominant language emerge even in balanced bilingual children, and that this asymmetry gradually diminishes as reading skills consolidate (Duñabeitia et al., 2016). In the present context, the lack of a reliable Language effect provides an important baseline for interpreting the emotional and environmental effects, as it suggests that processing demands across languages may converge under immersive and ecologically valid conditions. Although our design was sensitive enough to detect other small main effects and interactions, the absence of a robust Language effect may reflect the relatively high proficiency in, and frequent use of, both languages in our bilingual sample. This aligns with previous accounts suggesting that language dominance effects may attenuate in immersive or ecologically valid contexts where both languages are highly active (van Hell & Tanner, 2012).
Regarding the valence effect (namely, faster responses to negative than to neutral words), our results converge with studies showing that both negative and positive words can enjoy a processing advantage over neutral items (Citron et al., 2014; Hofmann et al., 2009). This facilitation is consistent with the approach–withdrawal model (Robinson et al., 2004), according to which the combination of negative valence and high arousal signals potential threat and triggers rapid avoidance-oriented responses. Although the classic automatic vigilance account predicts a generic slowdown for negative words (Algom et al., 2004; Estes & Adelman, 2008), subsequent work has clarified that negativity is not categorical and that not all negative words produce the same degree of interference. In fact, interactions with arousal show that low-arousal negative words are the most likely to produce slowdowns, whereas high-arousal negative words often produce less interference and sometimes even facilitation (Larsen et al., 2008; Haro et al., 2024). Our findings fit this pattern, suggesting that the use of highly arousing negative items may have shifted processing from interference toward vigilance-driven facilitation.
Importantly, the facilitation observed for negative words was parallel across L1 and L2, with no evidence for a Language by Valence interaction. At first glance, this absence of attenuation in L2 may seem unexpected given frequent reports of weaker emotional resonance in the second language (Caldwell-Harris, 2014; Pavlenko, 2012). However, our results converge with previous well-controlled lexical decision studies showing that, when proficiency and stimulus properties are carefully matched, valence effects can be equivalent across L1 and L2. Along these lines, Ponari et al. (2015) found that both native speakers and highly proficient L2 speakers exhibited comparable facilitation for emotional relative to neutral words in lexical decision (see Sutton et al., 2007, for early evidence from a Stroop task). Although reviews emphasize that findings in this domain remain heterogeneous (see Aguilar et al., 2024), our results are in line with this body of evidence, suggesting that arousal-driven vigilance effects generalize across L1 and L2 when motivational salience is high.
Simulated weather also reliably modulated word processing, as reflected in language-decision times. Responses under rainy conditions were slower than under sunny conditions, an effect that was consistent across both languages and valence categories. This pattern supports the perceptual load account proposed by Rocabado et al. (2025a), according to which the environmental disfluency imposed by simulated rain taxes early perceptual processing, yielding a general slowdown without selectively amplifying or attenuating the effects of language or emotion. These results are in line with previous VR work in which rain slowed lexical access and reading but did not alter intrinsic affective evaluations of words (Rocabado & Duñabeitia, 2025). By contrast, laboratory degradations such as visual masking can disrupt bilingual dynamics more selectively (e.g., by reducing the general L1 advantage). Notably, however, the effect of Weather was not constant across time. The interaction between Weather and Trial Order indicated that responses became progressively faster under sunny conditions, whereas response speed under rainy conditions remained stable across the task. In other words, the classic practice effect, in which participants respond faster as they become familiar with the task, emerged only in the sunny condition; the difficulty introduced by the rainy environment appears to have prevented this improvement. Although a central fixation mark was used to orient attention before each trial, the falling droplets introduced a persistent visual disturbance that contributed to perceptual disfluency without necessarily diverting gaze.
Prior research has shown that dynamic but task-irrelevant visual input can impair performance by introducing noise that disrupts perceptual clarity, reduces processing fluency, and slows target detection (Li & Shimomura, 2025; Oppenheimer, 2008). Conversely, perceptual environments with stable and predictable features, such as the sunny condition, tend to support practice-related gains by allowing participants to allocate attention more efficiently to task demands (Oppenheimer, 2008). In this regard, the sunny weather may have facilitated faster adaptation, whereas the rain constrained performance by maintaining a constant level of interference. Importantly, the Weather effect was not accompanied by any interaction with Valence or Language, further supporting the view that the influence of simulated rain was additive and did not modulate emotional or linguistic processing.
At first sight, our results may seem at odds with preceding VR studies reporting mood- or situation-congruent effects in emotional processing. In those studies, exposure to negative versus positive VR contexts biased subsequent word selection and amplified physiological responses such as skin conductance (Wang et al., 2023). However, the two lines of work differ in a key respect: the timing of the processes each task taps. Affect-labeling and appraisal tasks require reflective judgment and therefore allow emotional context to influence decision criteria. In contrast, lexical and language decision tasks demand rapid categorization, tapping earlier stages of lexical access. Our findings indicate that in such speeded tasks, realistic weather manipulations function primarily as early perceptual load rather than as affective inductions, producing additive slowdowns without interacting with emotional content. This dissociation highlights the importance of task characteristics in determining when environmental factors shape emotion–language coupling.
Furthermore, the significant Weather by Trial Order interaction confirms that the disruptive influence of simulated rain was not uniform but accumulated over time. This dynamic pattern supports the interpretation of environmental disfluency as a cumulative perceptual constraint rather than a transient mood induction, consistent with prior findings by Rocabado et al. (2025a). At the same time, the absence of interactions with Language or Valence indicates that the interference induced by simulated rain operates independently of lexical–emotional content. This points to a domain-general interference mechanism, likely arising from reduced visual clarity or increased attentional demands, rather than to affective congruence or semantic integration.
Finally, certain methodological constraints regarding the environmental manipulation should be acknowledged. While ambient sounds were matched for volume to ensure consistency, they were not controlled for other acoustic dimensions such as pitch or frequency spectrum. This matters because distinct sound profiles across weather conditions could introduce uncontrolled variance in auditory salience. Although these sounds were selected for their contextual congruence and ecological plausibility, future research should explicitly control or manipulate spectral features to examine whether auditory complexity interacts with visual load or lexical processing. Moreover, while word lists were matched for frequency, length, and orthographic neighborhood density, other uncontrolled lexical and sublexical factors may have contributed to the observed effects. These residual design limitations should be addressed in future research.
5. Conclusion
In this study, we examined how language (L1 Spanish versus L2 English), emotional valence (negative high-arousal versus neutral low-arousal), and realistic weather (sun versus rain) shape speeded lexical access in VR, finding that their effects combined additively. The findings demonstrate that while rainy weather imposes a general processing cost, both languages are processed with comparable efficiency under these immersive conditions, and the pattern of valence effects remains consistent across languages. Crucially, the disruptive influence of rain increased over time, as revealed by the significant Weather by Trial Order interaction, indicating that perceptual load accumulates under sustained exposure rather than operating as a fixed or transient effect. These findings help delineate the boundary conditions of the foreign-language effect in emotion: while immersive VR contexts can bias affective judgments in reflective tasks, early lexical decision appears dominated by perceptual load and shows parallel valence effects in L1 and L2. The study also shows that realistic environmental manipulations can systematically affect cognitive processing without necessarily altering emotional or linguistic dynamics, supporting the interpretation of simulated rain as a domain-general source of perceptual interference rather than an affective induction. More broadly, the study illustrates the utility of VR for bridging laboratory control with ecological realism, providing a framework for testing how environmental and emotional factors converge in situated bilingual cognition.
Supplementary materials
Video samples of the virtual reality task conducted under both sunny and rainy weather conditions, along with a list of experimental materials, can be found at the following link: https://doi.org/10.17605/OSF.IO/2WK8S.
Data availability statement
The data and analysis code that support the findings will be made available on the Open Science Framework (OSF) at the following link: https://doi.org/10.17605/OSF.IO/2WK8S.
Acknowledgements
This research was partially funded by PID2024-161331NB-I00 (MCIN/AEI/10.13039/501100011033). We thank Nebrija University participants for their contribution.
Competing interests
The authors declare no competing interests.
Ethics statement
The authors assert that all procedures contributing to this work comply with the ethical standards of the relevant national and institutional committees on human experimentation and with the Helsinki Declaration of 1975, as revised in 2008.
AI tools disclosure
The authors declare no use of artificial intelligence (AI) tools in the preparation of the present document.
