
The role of audiovisual modality in predicting the neurodynamics of language control in Tibetan–Chinese bilinguals

Published online by Cambridge University Press:  12 September 2025

Yanbing Hu
Affiliation:
Department of Psychology, Northwest Normal University, Lanzhou, Gansu, China; Key Laboratory of Education Digitalization of Gansu Province; Key Laboratory of Behavior and Mental Health of Gansu Province
Keyu Pan
Affiliation:
Department of Psychology, Northwest Normal University, Lanzhou, Gansu, China; Key Laboratory of Education Digitalization of Gansu Province; Key Laboratory of Behavior and Mental Health of Gansu Province
Xiaofeng Ma*
Affiliation:
Department of Psychology, Northwest Normal University, Lanzhou, Gansu, China; Key Laboratory of Education Digitalization of Gansu Province; Key Laboratory of Behavior and Mental Health of Gansu Province
*Corresponding author: Xiaofeng Ma; Email: Maxiaofeng@nwnu.edu.cn

Abstract

Although bilinguals draw on both auditory and visual cues, the cognitive cost of language switching in audiovisual contexts remains unclear. We investigated this cost in Tibetan–Chinese bilinguals using a switching task presented in audiovisual, visual and auditory modalities. In Study 1, the audiovisual modality yielded the fastest reaction times, reflecting improved processing efficiency. ERP data revealed smaller positive amplitudes in the early time window (200–350 ms) for the audiovisual modality, indicating reduced neural demand, whereas only the auditory modality showed significant divergence in the later window (350–700 ms). Moreover, audiovisual context, L2-to-L1 switching and early neural responses predicted switching behavior. Study 2 replicated the behavioral and ERP findings of Study 1 and showed that auditory input and second-language processing exacerbated switch costs. Together, these findings clarify the role of multisensory integration in language switching: audiovisual cues reduce switch costs, whereas auditory input and second-language processing amplify them, with implications for language education and cognitive interventions.

Information

Type
Research Article
Creative Commons
Creative Commons License - CC BY
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2025. Published by Cambridge University Press

Table 1. Self-assessment of language proficiency of Tibetan–Mandarin bilingual participants


Figure 1. The experimental flowchart illustrates the repeat and switch tasks in visual, auditory and audiovisual conditions. In the diagram, we use the L1 repeat task and the L1–L2 switch task as examples to explain the process. In the L1 repeat task, participants see a red cue, prompting them to name the image in L1. In the L1–L2 switch task, participants first see a red cue, instructing them to respond in L1. Then, in the second trial, they see a blue cue, signaling them to switch to L2 for the response.


Figure 2. Response times (in milliseconds) for language switching and repetition tasks across three perceptual modalities (audiovisual, auditory and visual) in Tibetan–Chinese bilinguals. The bar plots represent mean response times, with error bars indicating standard error of the mean (SEM). The red bars correspond to tasks involving Chinese (L2), while the gray bars correspond to Tibetan (L1). The data points indicate individual participant responses.


Figure 3. ERP results for bilingual language switching tasks in different perceptual modalities (audiovisual, auditory, visual) in the medial anterior region. The top panel displays ERP waveforms for repetition and switching tasks involving Tibetan (L1) and Chinese (L2). The red dashed line represents the Chinese repetition task, the gray dashed line represents the Tibetan repetition task, the solid red line represents the Chinese switching task and the solid black line represents the Tibetan switching task. The middle and bottom panels show topographic maps and bar plots of ERP responses within the early (200–350 ms) and late (350–700 ms) time windows across different task and language conditions (A = auditory, V = visual, AV = audiovisual, CC = Chinese (L2) repetition task, TT = Tibetan (L1) repetition task, CT = Chinese-to-Tibetan (L2-to-L1) switching task, TC = Tibetan-to-Chinese (L1-to-L2) switching task), with the region of interest (ROI) being the medial anterior area. The bar plots represent mean amplitudes, and error bars indicate the standard error of the mean (SEM).


Figure 4. Machine learning model fitting results. The left panel shows the distribution of mean squared error (MSE) values across multiple iterations, with the green dashed line indicating the p-value from the MSE permutation test. The middle panel illustrates the distribution of R-squared (R²) values, with the red dashed line representing the p-value from the R² permutation test. The right panel presents a scatter plot of the model’s predicted scores versus actual scores, with an R² value of .35. The solid line represents the best-fit line.
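The permutation-test logic behind Figure 4 can be sketched in plain Python. This is a minimal illustration, not the authors' actual analysis pipeline: the single-predictor least-squares model, the data and the permutation count are all hypothetical, and the idea is only that the outcome labels are shuffled, the model is refit, and the p-value is the proportion of shuffled fits that match or exceed the observed R².

```python
import random

def r_squared(y, yhat):
    # Coefficient of determination: 1 - SS_residual / SS_total.
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, yhat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1 - ss_res / ss_tot

def fit_predict(x, y):
    # Ordinary least squares with a single predictor; returns in-sample predictions.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    intercept = my - slope * mx
    return [intercept + slope * a for a in x]

def permutation_p(x, y, n_perm=1000, seed=0):
    # Shuffle the outcomes, refit, and count how often the shuffled fit
    # reaches the observed R²; the (count+1)/(n+1) form avoids a p of zero.
    rng = random.Random(seed)
    observed = r_squared(y, fit_predict(x, y))
    count = 0
    for _ in range(n_perm):
        y_perm = y[:]
        rng.shuffle(y_perm)
        if r_squared(y_perm, fit_predict(x, y_perm)) >= observed:
            count += 1
    return observed, (count + 1) / (n_perm + 1)
```

With strongly related toy data, the observed R² is high and very few shuffled fits reach it, so the permutation p-value comes out small, mirroring the dashed significance lines in the figure.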


Figure 5. Machine learning feature importance results. The bar chart illustrates the feature importance values in the ridge machine learning model, indicating the relative contribution of different features in predicting language switching behavior. The darker bars represent higher feature importance. MP = medial posterior, MA = medial anterior, MC = medial central, SW = switching task, AV = audiovisual modality, V = visual modality, A = auditory modality, Tib = Tibetan, Chi = Chinese.
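In a ridge model like the one in Figure 5, a common way to obtain feature importance is to standardize the predictors and compare the absolute values of the fitted coefficients. The sketch below illustrates that idea in plain Python with gradient descent on the ridge objective; the data, the penalty strength and the function names are hypothetical and are not taken from the authors' pipeline.

```python
def standardize(col):
    # Z-score one feature column so coefficient magnitudes are comparable.
    n = len(col)
    mean = sum(col) / n
    sd = (sum((v - mean) ** 2 for v in col) / n) ** 0.5
    return [(v - mean) / sd for v in col]

def ridge_coefficients(rows, y, alpha=1.0, lr=0.05, epochs=5000):
    # rows: samples as lists of feature values; y: outcome per sample.
    # Minimizes mean squared error plus an L2 penalty on the weights.
    cols = [standardize(list(c)) for c in zip(*rows)]
    X = list(zip(*cols))
    n, p = len(X), len(cols)
    w = [0.0] * p
    b = 0.0
    for _ in range(epochs):
        preds = [b + sum(wj * xj for wj, xj in zip(w, xi)) for xi in X]
        errs = [pr - yi for pr, yi in zip(preds, y)]
        grad_w = [2 * sum(e * xi[j] for e, xi in zip(errs, X)) / n
                  + 2 * alpha * w[j] / n for j in range(p)]
        grad_b = 2 * sum(errs) / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w

# Importance of feature j can then be read off as abs(w[j]):
# larger magnitude = larger contribution to the prediction.
```

When the first feature tracks the outcome and the second is noise, the first standardized coefficient dominates, which is the pattern the bar chart in Figure 5 summarizes across ERP regions, modalities and languages.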

Supplementary material: File

Hu et al. supplementary material (File, 240.5 KB)