There is general agreement in the literature that bilingualism involves the workings of cognitive control, i.e., a collection of top-down processes responsible for achieving goal-directed behaviour in the face of distraction (Bialystok, 2017; Green, 1998; Kroll, Dussias, Bogulski & Kroff, 2012). This proposal is supported by prominent models of bilingual language control which posit that bilinguals need to recruit domain-general inhibition mechanisms in order to reduce interference between constantly active languages (Inhibitory Control Model; Green, 1998; Bilingual Interactive Activation Model; Grainger, Midgley & Holcomb, 2010; Grainger & Dijkstra, 1992; also see van Heuven & Dijkstra, 2010). This continuous management of multiple languages is proposed to develop and enhance the cognitive control system. However, the exact conditions under which between-language competition recruits cognitive control remain a matter of ongoing research because the available evidence is inconsistent (de Bruin, 2019; Leivada, Westergaard, Duñabeitia & Rothman, 2020). Only recently has it been proposed that the engagement of cognitive control in bilinguals may depend on the patterns in which bilinguals use their languages (Abutalebi & Green, 2016; Bak, 2016; Bialystok, 2017; de Bruin, 2019; DeLuca, 2019; Green & Abutalebi, 2013; Pliatsikas, DeLuca & Voits, 2020). In particular, the Adaptive Control Hypothesis (Green & Abutalebi, 2013; ACH hereafter) proposes that different patterns of language use act as a form of cognitive training, thus triggering different adaptive changes in the cognitive control system.
The current study examined how bilinguals’ patterns of language use shape domain-general inhibition. Specifically, we tested the predictions of the ACH that are related to the language-use experiences of single-language and dual-language contexts (for details, see section 1.1). The patterns of language use were experimentally manipulated in a single group of bilinguals via language games. Since the language games involved natural language use (i.e., real conversation), they provided an ecologically valid manipulation of the language-use experience. The study should thereby allow us to assess whether and how natural language use affects inhibition.
1.1. The Adaptive Control Hypothesis
The ACH (Green & Abutalebi, 2013) distinguishes between three basic patterns of language use (the so-called interactional contexts in the ACH): single-language context, dual-language context, and dense code-switching. Importantly, the ACH posits that these patterns engage cognitive control in different ways. Bilinguals who mix elements of two languages, e.g., words, within single utterances (i.e., representing the dense code-switching context) practically do not engage cognitive control as they utilise whichever language route is most readily available. In contrast, bilinguals who switch but do not mix their languages daily engage much more cognitive control during language use. Such bilinguals operate in either a single-language context (SL context), in which the person speaks only one language in each context (e.g., one language at home, another one at work), or a dual-language context (DL context), in which the person speaks two languages in one context but distinct languages are spoken with distinct speakers. Bilinguals representing both SL and DL contexts are hypothesized to engage such cognitive processes as interference control and goal maintenance as this helps them to suppress cross-language interference and maintain fluent use of the target language. However, since interference is more likely to happen when two languages are used in the same situation, bilinguals in a DL context are assumed to engage interference control and goal maintenance to a greater extent. Moreover, since bilinguals in a DL context typically use both languages in one conversation, they also engage additional cognitive processes in their language use, i.e., cue detection, response inhibition, and task engagement/disengagement. These processes enable bilinguals in a DL context to monitor environmental cues that suggest language switches and suppress the currently used language if there is a need to switch.
Overall, bilinguals in both SL and DL contexts are assumed to engage inhibitory processes in their language use. Crucially, however, since bilinguals in a DL context are expected to experience relatively high inhibitory demands on language use, they are assumed to engage and train this mechanism to a greater extent compared with other bilinguals. The available evidence for the ACH comes from two lines of research: cross-sectional studies that assess everyday habits of language use, and studies that manipulate language experience in an experimental setting. The following two sections present relevant findings concerning inhibition (for a detailed summary of the reviewed experiments, see Tables A.3 and A.4 in Wodniecka, Casado, Kałamała, Marecka, Timmer & Wolna, 2020).
1.2. Efficiency of inhibition and bilinguals’ everyday language-use habits
There are only a few studies that have focused on how variation in everyday language-use habits differentiates bilinguals in terms of their inhibitory skills (Hartanto & Yang, 2020; Kałamała, Szewczyk, Chuderski, Senderecka & Wodniecka, 2020b; Pot, Keijzer & de Bot, 2018; see also Beatty-Martínez, Navarro-Torres, Dussias, Bajo, Guzzardo Tamargo & Kroll, 2019; Gullifer, Chai, Whitford, Pivneva, Baum, Klein & Titone, 2018; Henrard & Van Daele, 2017; Ooi, Goh, Sorace & Bak, 2018). Pot and colleagues (2018) found that greater self-assessed diversity in language use across social contexts (an SL context in this paper) is related to a smaller flanker effect in RTs. This effect was observed in a group of older adults who were highly proficient in L2 and used this language on a daily basis. However, some results do not support the predictions of the ACH. Kałamała and colleagues (2020b) did not find support for the relationship between the self-assessed intensity of the DL context experience and inhibition (assessed by four different tasks) in a group of young adult bilinguals who declared high proficiency in L2 and everyday use of this language. In turn, in a group with similar self-assessed L2 proficiency and daily use of L2 as in Kałamała and colleagues (2020b), Hartanto and Yang (2020) showed that greater self-assessed exposure to a DL context was related to better task switching (assessed by three different switching tasks). However, neither exposure to a DL context nor exposure to an SL context impacted indices of inhibition in this study, which is at odds with the predictions of the ACH (for the effects related to the dense code-switching (DCS) context, see the text).
There are three potential reasons for the current inconsistency. Firstly, in most studies the patterns of language use were assessed via the participants' self-reports (Hartanto & Yang, 2020; Kałamała et al., 2020b; Pot et al., 2018). However, it is not clear to what extent individuals are able to adequately self-assess their language-use patterns, and studies usually do not report psychometric properties for measures derived from self-reports (for an exception, see Kałamała et al., 2020b). Secondly, even if bilinguals' patterns of language use are adequately assessed, individuals experiencing the same patterns can still differ in other aspects of bilingualism, such as language proficiency (for arguments, see Beatty-Martínez et al., 2019; de Bruin, 2019). Therefore, it is possible that the demands imposed by a pattern of language use could interact with other aspects of bilingualism, and these interactions may confound the measurement of pattern-specific effects (DeLuca, 2019; DeLuca, Rothman, Bialystok & Pliatsikas, 2019; Gullifer et al., 2018; Pliatsikas et al., 2020). Thirdly, the available evidence mostly comes from behavioural studies. However, behavioural measures such as RTs and accuracy reflect not only the cognitive process in question but also other irrelevant processes related to performance (e.g., perceptual processing, memorizing task rules, and so on; the task-impurity problem; Gratton, Cooper, Fabiani, Carter & Karayanidis, 2017; Miyake, Friedman, Emerson, Witzki, Howerter & Wager, 2000). In turn, this might contaminate the measurement of the targeted process (inhibition in this case). In contrast to behavioural measures, neuroimaging methods with a high temporal resolution, such as event-related potentials (ERP), track the neural processes that lead to the behaviourally observed outcomes in real time (Cespón & Carreiras, 2020; DeLuca, 2019; Gratton et al., 2017; Pliatsikas et al., 2020). Although the use of neuroimaging methods enables the investigation of processes that might otherwise be obscured in behavioural measurements, these methods were not used to test the cognitive effects of daily patterns of language use in any of the studies (but for evidence on resting-state brain connectivity, see Bice, Yamasaki & Prat, 2020; Gullifer et al., 2018).
1.3. Efficiency of inhibition and short-term manipulation of language experience
A promising approach to circumventing the problems of assessing real-life patterns of language use and controlling the rich diversity of language experiences is to experimentally manipulate patterns of language use within the same group of bilinguals (a so-called within-subject design; Pliatsikas et al., 2020; Wodniecka et al., 2020), which should allow straightforward assessment of cognitive effects related to differences in bilingual patterns of language use. Researchers have only recently begun to experimentally manipulate language experience. Crucially, some of these studies collected both behavioural and neuroimaging data (e.g., Jiao, Liu, Liang, Plummer, Perfetti & Chen, 2019; Wu & Thierry, 2013; see also Cespón & Carreiras, 2020) and thereby provided a detailed measurement of inhibition.
In their seminal study, Wu and Thierry (2013) showed that passive exposure to both languages (i.e., imitating a DL context in this study) improves inhibition. In this study, young adult bilinguals who declared high-to-moderate proficiency in L2 and everyday use of this language performed a flanker task in which the flanker trials were interspersed with words in either L1 (L1 condition), L2 (L2 condition), or L1 and L2 presented alternately (mixed condition). Participants were more accurate in resolving the flanker conflict when exposed to both languages than when exposed only to L1 or only to L2. Furthermore, their better performance during ongoing exposure to both languages was accompanied by reduced P300 amplitude for flanker-incongruent trials, suggesting that participants experienced less interference from incongruent trials (but see Jiao et al., 2019; see also Adler, Valdés Kroff & Novick, 2020; Hofweber, Marinis & Treffers-Daller, 2020).
In addition to trial-by-trial manipulations, language experience can be manipulated with short-term language training, which usually takes the form of a cued picture-naming paradigm. While some studies have shown that short-term language training impacts bilinguals’ non-linguistic switching abilities (Prior & Gollan, 2011; Timmer, Calabria & Costa, 2019; Timmer, Christoffels & Costa, 2019) and proactive control (Zhang, Kang, Wu, Ma & Guo, 2015), we are aware of only one study which utilized within-subject, short-term training when testing inhibition (Yang, Ye, Wang, Zhou & Wu, 2018; for evidence in a between-subject design see Liu, Yang, Jiao, Schwieter, Sun & Wang, 2019). In the study by Yang and colleagues (2018), trilingual speakers who were balanced in terms of proficiency and use of L1 and L2 (but not L3) underwent three versions of a blocked picture-naming task (i.e., blocks of pictures requiring the use of one language alternated with blocks of pictures requiring the use of another language), each of which was followed by performance of a flanker task. Additionally, the behavioural measurement was accompanied by fMRI data recording. The language training imitated three different instances of the DL context: L1-L2, L2-L3, L1-L3. The study showed that the flanker effect was substantially reduced after the L1-L2 training compared to both L1-L3 and L2-L3 (but only in terms of accuracy), thus suggesting improved inhibition after a short session of switching between well-known languages. This effect was further confirmed by the fMRI data, which showed reduced neural activation in the prefrontal cortex and some subcortical areas after a session of switching between L1 and L2 compared to the other conditions.
Taken together, studies that experimentally manipulate language experience provide promising alternatives for testing how cognitive control in bilinguals can be affected by differences in their language-use experience. Importantly, however, when one attempts to relate experimental manipulations to real-life language-use habits, experimental protocols should involve natural language use. This has not been the case in previous studies, as participants were either passively exposed to language-related stimuli (e.g., Wu & Thierry, 2013) or were engaged in cued picture-naming tasks which require the memorizing of arbitrary associations between cues and languages and artificially force language changes between single words (e.g., Yang et al., 2018). Therefore, these types of language interventions have relatively low ecological validity (for additional arguments, see Blanco-Elorrieta & Pylkkänen, 2018; van den Noort, Struys, Bosch, Jaswetz, Perriard, Yeo, Barisch, Vermeire, Lee & Lim, 2019; Wodniecka et al., 2020).
1.4. Present study
In the current study, we tested how natural bilingual language use influences inhibition efficiency. The patterns of language use were experimentally manipulated in a single group of bilinguals via a series of language games. Since the language games involved real conversations, they provided an ecologically valid manipulation of language-use experience. After each game, the participants performed two inhibition tasks (for an overview of the study design, see Figure 1). This within-subject design allows more straightforward attribution of the observed cognitive after-effects to natural patterns of language use while controlling for individual differences in the participants’ background characteristics.
We tested a group of bilinguals who lived in their native-language environment (i.e., Polish, L1) and were relatively homogenous in terms of their background characteristics. Only bilinguals who were proficient in English (L2) and rarely used this language on a daily basis participated in the study (for details, see section 2.1). The games differed in terms of how the languages were used: 1) the L1 game required the use of L1; 2) the L2 game required the use of the non-dominant language, i.e., L2 in an L1 environment; 3) the dual-language (DL) game required switching between L1 and L2 depending on the game partner. Since the participants lived in their L1 environment and mostly used L1 on a daily basis, the L1 game did not differ from their typical language use and was considered as the baseline for between-game comparisons. The L2 game and the DL game differed from participants’ typical language use and represented the SL context and the DL context, respectively.
We used two well-established inhibition tasks: the stop-signal task and the Stroop task (Diamond, 2013). The former task requires inhibition at the level of manual response, whereas the latter task requires inhibition at the level of speech production. The use of two intrinsically different tasks was intended to provide a more fine-grained measurement of inhibition. At the behavioural level, we focused on the stop-signal reaction time (so-called SSRT) in the stop-signal task and the Stroop effect (in RTs and accuracy) in the Stroop task. We also supplemented the behavioural measurement by recording ERPs. We focused on a set of ERP components whose spatiotemporal characteristics differ and thus reflect the engagement of the inhibition mechanism at various stages of information processing, i.e., N2 and P3 in the stop-signal task (Nieuwenhuis, Yeung, van den Wildenberg & Ridderinkhof, 2003) and N450 in the Stroop task (Liotti, Woldorff, Perez & Mayberg, 2000; see also Cespón & Carreiras, 2020). N2 is a fronto-central negativity that peaks around 200–300 ms after the stimulus onset. More negative N2 amplitudes for unsuccessfully than for successfully inhibited trials in the stop-signal task (i.e., the N2 unsuccessful > successful inhibition effect) are typically interpreted as reflecting detection and/or monitoring of the conflict between go and inhibitory responses (Dimoska, Johnstone & Barry, 2006; Nieuwenhuis et al., 2003; Senderecka, 2016). P3 is a centro-parietal positivity which peaks around 300–350 ms after the stimulus. More positive P3 amplitudes for successfully than for unsuccessfully inhibited trials in the stop-signal task (i.e., the P3 successful > unsuccessful inhibition effect) are assumed to reflect mechanisms involved in successful response inhibition (Berkman, Kahn & Merchant, 2014; Manuel, Bernasconi & Spierer, 2013; Senderecka, 2018; Senderecka, Szewczyk, Wichary & Kossowska, 2018; Spierer, Chavan & Manuel, 2013). N450 is a fronto-central negative deflection peaking around 350–500 ms post-stimulus (Liotti et al., 2000). More negative N450 amplitudes for incongruent than for congruent trials in the Stroop task (i.e., the N450 incongruent > congruent trial effect) are assumed to reflect monitoring and/or suppression of semantic interference between the colour of the ink and the meaning of the word (Hsieh, Huang, Wu, Chang & Hung, 2018; Larson, Kaufman & Perlstein, 2009; Liotti et al., 2000; Szűcs & Soltész, 2012).
Based on research which shows cognitive effects of short-term manipulations of language use (Prior & Gollan, 2011; Timmer, Calabria et al., 2019; Yang et al., 2018), we expected that the manipulation of language-use experience would affect the subsequent performance of inhibition tasks. Two predictions were formulated on the basis of the ACH (Green & Abutalebi, 2013). Firstly, if the DL and SL contexts improve inhibition, we should observe more efficient performance in inhibition tasks after playing the DL game and the L2 game compared to after the L1 game. Secondly, if the use of two languages in one context without mixing them (i.e., DL context) benefits inhibition more than the use of these languages separately (i.e., different languages in different contexts; SL context), we should observe more efficient performance in the inhibition tasks after a session of the DL game compared to after the L2 game (see also Figure 1). Regarding the behavioural measures, more efficient inhibition should be reflected in shorter SSRT and smaller Stroop effects in RTs and accuracy. With respect to ERPs, following previous ERP research that examined training effects on inhibitory performance (Chang, Alderman, Chu, Wang, Song & Chen, 2017; Hsieh et al., 2018; Schroder, Dubuson, Dousset, Mortier, Kornreich & Campanella, 2020; see also Cespón & Carreiras, 2020; Wu & Thierry, 2013; Jiao, Grundy, Liu & Chen, 2020), we predicted that improvements in inhibition-related mechanisms after the DL and L2 games would be reflected in a reduction of ERP effects: the N2 unsuccessful > successful inhibition effect, the P3 successful > unsuccessful inhibition effect, and the N450 incongruent > congruent trial effect (see Footnote 1).
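The behavioural Stroop indices named in the predictions are difference scores between incongruent and congruent trials. The following is a minimal sketch of how such scores could be computed, not the analysis pipeline of the study. The function name, the trial format, the restriction of RTs to correct trials, and the sign convention for the accuracy effect are all assumptions made for illustration.

```python
from statistics import mean


def stroop_effects(trials):
    """Compute illustrative Stroop effects from a list of trials.

    Each trial is a (condition, rt_ms, correct) tuple, where condition is
    "congruent" or "incongruent". This format is hypothetical.
    """
    trials = list(trials)

    def correct_rts(condition):
        # Assumption: only correct responses enter the RT effect.
        return [rt for c, rt, ok in trials if c == condition and ok]

    def accuracy(condition):
        return mean(1.0 if ok else 0.0 for c, _, ok in trials if c == condition)

    return {
        # Larger values mean more interference from incongruent trials.
        "rt_effect_ms": mean(correct_rts("incongruent")) - mean(correct_rts("congruent")),
        # Assumption: accuracy effect is congruent minus incongruent accuracy.
        "accuracy_effect": accuracy("congruent") - accuracy("incongruent"),
    }


example_trials = [
    ("congruent", 600, True),
    ("congruent", 640, True),
    ("incongruent", 700, True),
    ("incongruent", 760, True),
    ("incongruent", 900, False),
]
effects = stroop_effects(example_trials)
```

Under these conventions, a smaller value on either index after a given game would be read as more efficient inhibition.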
2.1. Participants
All participants (N = 32, 18 female; see Footnote 2) were right-handed, healthy young adults (mean age 22 years; SD = 2.2 years). They were recruited via an experimental recruitment system at Jagiellonian University, Kraków. Using an online platform, volunteers completed a socio-demographic background questionnaire and two English proficiency tests: the Cambridge General English test (Cambridge Assessment English, 2018; Cambridge test) and the Lexical Test for Advanced Learners of English (Lemhöfer & Broersma, 2012; LexTALE). Only Polish native speakers (L1) in good health (i.e., free of medications and with normal or corrected-to-normal vision) who were relatively proficient in English (L2) (i.e., scored at least 20 out of 25 in the Cambridge test and at least 45 out of 60 in the LexTALE test) were invited to participate in the study. Thirty-one participants completed all three sessions; one participant missed the third session but was included in the analyses where possible.
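The inclusion criteria above amount to a simple conjunction of conditions. The sketch below restates them as a screening function; the function and parameter names are hypothetical, and only the cut-off values (20 out of 25 and 45 out of 60) come from the text.

```python
CAMBRIDGE_MIN_SCORE = 20  # out of 25, as stated in the text
LEXTALE_MIN_SCORE = 45    # out of 60, as stated in the text


def is_eligible(native_language, in_good_health, cambridge_score, lextale_score):
    """Return True if a volunteer meets all stated inclusion criteria.

    Parameter names are illustrative; `in_good_health` stands for being free
    of medications and having normal or corrected-to-normal vision.
    """
    return (
        native_language == "Polish"
        and in_good_health
        and cambridge_score >= CAMBRIDGE_MIN_SCORE
        and lextale_score >= LEXTALE_MIN_SCORE
    )
```

For instance, a healthy Polish native speaker scoring 22 and 50 would be invited, whereas a LexTALE score of 44 would lead to exclusion regardless of the Cambridge score.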
On average, the participants were highly educated (years of formal education, M = 14.56, SD = 1.98) and obtained relatively high scores on the fluid intelligence test (M = 14.19 out of 18, SD = 3.26, on a shortened version of Raven's Advanced Progressive Matrices test; only odd-numbered items and 20 minutes to complete). Their language proficiency and history of language learning were evaluated using a language-background questionnaire based on Marian, Blumenfeld and Kaushanskaya (2007) and Li, Zhang, Tsai and Puls (2014). The participants were language-unbalanced Polish–English bilinguals, all of whom had acquired only their native language in early childhood. On average, they started learning English as their L2 at around the age of six (SD = 1.87). Table 1 presents self-assessment data concerning the participants' language abilities. The participants rated their overall L1 proficiency higher than their L2 proficiency, which they considered intermediate to high (p < .001). Consistent with the self-assessment, they correctly completed 97% (SD = 4%) of the Cambridge test and 85% (SD = 7%) of the LexTALE test, thus indicating their moderate-to-high proficiency in L2. Thirty participants declared they had started learning an additional foreign language (predominantly German, French, or Russian), but their overall self-rated proficiency in these languages (M = 3.49, SD = 1.36; for a rating scale see Table 1) was lower than their L1 or L2 proficiency (ps < .001).
Notes. M, mean; SD, standard deviation; 1) self-ratings were 1 = “no knowledge of given language” to 9 = “native-like proficiency”; 2) self-ratings were 1 = “never” to 9 = “always”; 3) general tendency to switch languages between sentences; 4) general tendency to switch languages within sentences.
The participants declared that they used their L1 on a daily basis more often than their L2 (p < .001). Their language-switching habits were evaluated using two indices: the frequency of intrasentential code-switching and the frequency of intersentential code-switching, both of which are derived from the Code-switching and interactional contexts questionnaire (Hartanto & Yang, 2016). Overall, participants rarely used two languages in the same situation on a daily basis. When they did, they more frequently switched their languages within single sentences than between sentences (p < .001; for details see Table 1).
2.2. Measures and procedure
Experimental procedures for data collection and analysis were approved by the Institutional Review Board of Jagiellonian University and the Pennsylvania State University. The participants signed an informed consent form prior to the experiment and were paid PLN 250 (about $65.75) for their participation in the study.
The experiment consisted of three testing sessions that were conducted on separate days with at least a two-day break between them (up to a ten-day break). Each testing session consisted of one language game immediately followed by an electroencephalography (EEG) recording. Both the language games and the EEG sessions were conducted in appropriately adapted laboratory rooms. The participants were always tested individually. After completion of a game, each participant was informed about their overall score in the game and asked to assess the difficulty of the game in terms of speaking effort (from 1 = extremely easy to 15 = extremely difficult). The assessment of the game's difficulty served as an explicit measure of whether the participants experienced differences in speaking effort across the language games. Then, the participant was immediately directed to an EEG recording room in the company of the main experimenter, seated (approximately 80 cm from the computer screen) and EEG capped, which took up to 20 minutes. Afterwards, two inhibition tasks were performed, i.e., the stop-signal task and the Stroop task, administered in a fixed order using DMDX software (Forster & Forster, 2003). To reduce the number of EEG artifacts, the participant was instructed to restrict their body movements and try to blink only after the response. After completing the experimental tasks in the first session, the participant filled out two language-background questionnaires (both described in section 2.1). At the end of the third session, they completed a shortened form of Raven's Advanced Progressive Matrices test and were informed about the goals of the study. With the exception of language games (see below), all instructions and communication were in English (L2 for participants).
Patterns of language-use manipulation: language games
The language game was based on the Map Task (Brown, Anderson, Shillcock & Yule, 1985). One game involved three players and consisted of six rounds (see also Panel A of Figure 2). One round involved two game partners, each of whom received a set of six picture slides. The slides differed in the number of elements and their arrangement on the slide (see also Panel B of Figure 2). The role of one game partner (the host) was to describe the content of the slides to the other game partner (the confederate), who had to rearrange the elements on their slides and remove unnecessary ones in order to match the host's versions as closely as possible.
The participant was always assigned to the role of the host; the two experimenters who acted as game players switched their roles (confederate and inactive player) between the game rounds. The participants were not aware of the nature of the experimenters who were acting as confederates; instead they were told that all roles in the game were assigned based on a random draw. The experimenters who were acting as confederates were aware of the reasoning behind the experiment. Each game lasted approximately 120 min, out of which around 90 min were used purely for speaking between the host (i.e., a participant) and the confederates. The remaining time was used to clarify the rules and to set up the game and equipment. A detailed description of the stimuli and game procedure can be found in Appendix S1. A fully documented game set-up is available online at https://osf.io/xy4qg.
There were three games that differed only in the language-use rules (see also Panel C of Figure 2): 1) all players used Polish (the L1 game); 2) all players used English (the L2 game); 3) one experimenter used Polish and the other used English, so the participant had to switch between Polish (L1) and English (L2) between the rounds of the game, depending on the confederate's language (the DL game). The participants gained points when the elements on the confederate's slides were correctly rearranged (one point for every correctly completed slide) and lost points if they went over the time limit for the round (minus one point for every thirty seconds of extra time). Additionally, they lost points when incorrect elements were placed on the confederate's slide (minus one point for every incorrect element on a slide) or if they spoke in the wrong language (minus one point for every utterance in the wrong language).
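The scoring rules can be summarised as one gain term and three penalty terms. The sketch below illustrates this for a single round. The class and field names are hypothetical, and counting only completed 30-second blocks of extra time is an assumption, since the text does not say how partial blocks were treated.

```python
from dataclasses import dataclass


@dataclass
class RoundResult:
    """Outcome of one game round (field names are illustrative)."""
    correct_slides: int
    overtime_seconds: float
    incorrect_elements: int
    wrong_language_utterances: int


def round_score(result):
    """Apply the stated rules: +1 per correct slide, -1 per penalty unit."""
    # Assumption: only completed 30-second blocks of extra time are penalised.
    overtime_penalty = int(max(0.0, result.overtime_seconds) // 30)
    return (
        result.correct_slides
        - overtime_penalty
        - result.incorrect_elements
        - result.wrong_language_utterances
    )


example_round = RoundResult(
    correct_slides=6,
    overtime_seconds=65,
    incorrect_elements=1,
    wrong_language_utterances=2,
)
example_score = round_score(example_round)
```

In this example, six correct slides are offset by two overtime blocks, one incorrect element and two wrong-language utterances.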
The differences in the language requirements across the three games allowed the patterns of language use to be manipulated, thus constituting the three experimental conditions: the L2 game, the dual-language (DL) game and the L1 game. The L2 game (use of the non-dominant language) was meant to imitate the SL context, while the DL game (switching between L1 and L2 in the same situation) was meant to imitate the DL context. The L1 game (use of the dominant language) was considered as a baseline for between-game comparisons. The order of the games was counterbalanced between participants in a Latin square design; the participants were informed about the version of the game at the beginning of a language-game session.
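Counterbalancing three games with a Latin square means that each game appears once in each session position across the set of orders. The sketch below builds a cyclic 3 × 3 square as an illustration. The specific square and the rule that assigns participants to its rows are assumptions, as the text does not report them.

```python
GAMES = ["L1", "L2", "DL"]


def latin_square(conditions):
    """Build a cyclic Latin square: row r is the list rotated by r places."""
    n = len(conditions)
    return [[conditions[(row + col) % n] for col in range(n)] for row in range(n)]


def game_order(participant_index, conditions=GAMES):
    """Assign a row of the square by participant index (hypothetical rule)."""
    square = latin_square(conditions)
    return square[participant_index % len(square)]


square = latin_square(GAMES)
```

Here the rows are L1-L2-DL, L2-DL-L1 and DL-L1-L2, so every game occupies the first, second and third session exactly once across the three orders.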
Measurement of response inhibition: Stop-signal task
Participants completed a stop-signal task with an auditory stop stimulus (e.g., Senderecka, Reference Senderecka2018). Each trial began with the presentation of a central fixation cross for 1200 ms, immediately followed by the presentation of a black screen for 200 ms. Afterwards, a visual go stimulus was presented for 100 ms in the centre of the screen. The go stimulus consisted of a horizontal arrow pointing to the left or the right with 50% probability for each direction. The stimuli were shown in white against a black background. The length of the arrow in the display was 20 mm (1.71°). The fixation cross was 6 mm (0.51°) in width. Participants were instructed to indicate the direction of the arrow (i.e., left or right) by pressing the corresponding Ctrl key (i.e., left or right, respectively) using their index fingers. In a random sample of 25% of trials, a 1400 Hz tone served as the stop signal. It was presented binaurally over EEG-compatible headphones (Sennheiser HD 429; intensity 60 dB SPL, duration 100 ms) immediately after the presentation of the arrow. The sound prompted participants to inhibit their responses to the primary go task, regardless of the arrow direction.
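The reported stimulus sizes in millimetres and degrees are linked by the standard visual-angle formula, angle = 2 · atan(size / (2 · distance)). The sketch below illustrates the conversion. The viewing distance is not stated in this section; the value of roughly 670 mm used here is an assumption back-calculated from the reported 20 mm / 1.71° pair.

```python
import math


def visual_angle_deg(size_mm, distance_mm):
    """Visual angle subtended by a stimulus of a given size."""
    return math.degrees(2 * math.atan(size_mm / (2 * distance_mm)))


def distance_from_angle_mm(size_mm, angle_deg):
    """Viewing distance implied by a size and its visual angle."""
    return size_mm / (2 * math.tan(math.radians(angle_deg) / 2))


# Assumed viewing distance, inferred from the arrow (20 mm, 1.71 degrees).
implied_distance = distance_from_angle_mm(20, 1.71)
print(round(implied_distance))

# Check against the fixation cross (6 mm, reported as 0.51 degrees).
print(round(visual_angle_deg(6, implied_distance), 2))
```

Under this assumed distance, the sizes reported for the Stroop stimuli (40 to 70 mm; 7 mm fixation cross) also map onto the reported angles to within rounding.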
The interval between the presentation of the go stimulus and the stop signal (i.e., the stop-signal delay, SSD) was varied trial-by-trial using a tracking method: the interval increased or decreased by 50 ms (from 100 to 400 ms) for the next stop-signal trial, depending on whether participants had successfully or unsuccessfully inhibited their response to the go stimulus. There were seven possible SSDs: 100, 150, 200, 250, 300, 350, and 400 ms. After a successful inhibition, the interstimulus interval became longer; after an unsuccessful inhibition, it became shorter. The initial value of the SSD was set to 150 ms. The tracking method aimed to converge on an SSD at which participants successfully inhibited responses to approximately 50% of the stop-signal trials. The timeout for a trial was 1500 ms.
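The tracking rule can be sketched as a one-up/one-down staircase. This is a minimal illustration rather than the experiment's actual code: the names are hypothetical, and holding the SSD at 100 or 400 ms once a limit is reached is an assumption based on the stated range.

```python
SSD_MIN_MS, SSD_MAX_MS = 100, 400
SSD_STEP_MS = 50
SSD_START_MS = 150


def next_ssd(current_ssd_ms, stopped_successfully):
    """Longer delay after a successful stop, shorter after a failed one."""
    step = SSD_STEP_MS if stopped_successfully else -SSD_STEP_MS
    # Assumption: the SSD is held at the limit once 100 or 400 ms is reached.
    return min(SSD_MAX_MS, max(SSD_MIN_MS, current_ssd_ms + step))


def track_ssd(outcomes, start_ms=SSD_START_MS):
    """SSD presented on each stop-signal trial for a sequence of outcomes."""
    ssds, ssd = [], start_ms
    for stopped in outcomes:
        ssds.append(ssd)
        ssd = next_ssd(ssd, stopped)
    return ssds


# Two successful stops followed by three failures and one success.
print(track_ssd([True, True, False, False, False, True]))
# -> [150, 200, 250, 200, 150, 100]
```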
In each testing session, participants received one practice block of 20 go trials and six stop-signal trials. They were instructed to react to the go stimulus as quickly and as accurately as possible, but they also had to try to stop their response during trials that included the stop signal. After the practice block, they completed five experimental blocks, each consisting of 56 trials, with short breaks in between. During each break, accuracy feedback for go trials and the mean RT were presented centrally on the screen.
Measurement of response inhibition: Stroop task
Participants completed a modified version of the Stroop task (Stroop, Reference Stroop1935). Each trial began with a white fixation cross presented centrally for 500 ms, immediately followed by the presentation of a black screen for 300 ms. Afterwards, a coloured word was presented in the centre of the screen until a response was made or time ran out (1600 ms). Then, a blank screen was shown for 180 ms. The stimuli were four Polish words displayed in blue, green, red, or yellow: blue (“niebieski”), green (“zielony”), red (“czerwony”), and yellow (“żółty”). The length of words on the screen was 40 to 70 mm (3.42° to 5.98°). The fixation cross was 7 mm (0.60°) in width. The stimuli were presented against a black background. For the congruent trials, the colour of the ink corresponded to the word's meaning (e.g., “red” printed in red). For the incongruent trials, the colour of the ink did not correspond to the word meaning (e.g., “red” printed in blue). Participants were instructed to name the colour of the ink aloud as quickly and accurately as possible. RTs for vocal responses were automatically measured using a DMDX voice key and were manually screened for any artifactual sounds. During each experimental session, participants completed two experimental blocks, each consisting of 186 trials with a short break in between. Each block consisted of 30% incongruent and 70% congruent trials presented in random order. In the first testing session, participants first received two practice blocks. The first practice block consisted of 12 trials, and participants named the colour of the ink when a neutral string of letters was presented (i.e., HHHHHH). The second practice block consisted of 12 trials (30% of which were incongruent), and participants named the colour of the ink when the coloured words were presented. In the other testing sessions, they received only the latter practice block in order to remind them of the task's rules.
2.3. Data pre-processing
Accuracy and reaction times: Stop-signal task
We focused on SSRT, which provides an estimate of the latency of the inhibitory process (Verbruggen & Logan, Reference Verbruggen and Logan2008). It was calculated following the standard procedure by Logan (Reference Logan1994). RTs from go responses in which no stop signal occurred were rank ordered. The nth RT was selected, where n was obtained by multiplying the number of no-signal RTs in the distribution (210) by the probability of responding (e.g., 0.5 if the inhibition rate in the task was 50%) for each participant separately. The SSRT was calculated by subtracting the average SSD from the nth RT (for details, see Logan & Cowan, Reference Logan and Cowan1984; Verbruggen & Logan, Reference Verbruggen and Logan2008). The SSRT scores were normally distributed.
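The calculation described above can be sketched as follows. This is an illustration of the rank-order procedure under stated assumptions, not the authors' script: the function name is hypothetical, and rounding n up and clamping it to the valid range are assumptions, since the text does not say how a non-integer n was handled.

```python
import math


def ssrt_from_go_rts(go_rts_ms, p_respond_given_signal, mean_ssd_ms):
    """SSRT = nth ranked no-signal RT minus the average stop-signal delay."""
    ranked = sorted(go_rts_ms)
    # n = number of no-signal RTs x probability of responding on stop trials.
    # Assumption: n is rounded up and kept within 1..len(ranked).
    n = math.ceil(len(ranked) * p_respond_given_signal)
    n = min(max(n, 1), len(ranked))
    nth_rt = ranked[n - 1]
    return nth_rt - mean_ssd_ms


# Hypothetical participant: 210 no-signal RTs from 400 to 609 ms,
# a 50% probability of responding, and an average SSD of 250 ms.
go_rts = list(range(400, 610))
print(ssrt_from_go_rts(go_rts, 0.5, 250))
# -> 254
```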
Accuracy and reaction times: Stroop task
We focused on accuracy and RTs for congruent trials and incongruent trials. For the accuracy measure, timeouts were taken as erroneous responses. For RTs, only correct trials were included. Also, extremely short RTs (< 300 ms) and RTs that were more than three standard deviations above or below the condition mean for a participant were discarded from the analysis (2.8% of all trials). Due to the skewed distribution of RTs, the data were log-transformed.
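The trimming steps for the Stroop RTs can be sketched for one participant and one condition. This is a minimal illustration with hypothetical names. It assumes that the mean and standard deviation were computed after the removal of RTs below 300 ms and that the natural logarithm was used; neither detail is specified above.

```python
import math
import statistics


def trim_and_log(correct_rts_ms, min_rt_ms=300, sd_cutoff=3):
    """Trim correct-trial RTs for one participant and condition, then log them."""
    # Step 1: drop extremely short RTs.
    plausible = [rt for rt in correct_rts_ms if rt >= min_rt_ms]
    # Step 2 (assumed order): compute mean and SD on the remaining RTs.
    mean = statistics.mean(plausible)
    sd = statistics.stdev(plausible)
    kept = [rt for rt in plausible if abs(rt - mean) <= sd_cutoff * sd]
    # Step 3: log-transform (natural log assumed) to reduce skew.
    return [math.log(rt) for rt in kept]


# Hypothetical data: 30 typical RTs plus one very slow and one very fast trial.
rts = [500 + 10 * (i % 5) for i in range(30)] + [2000, 150]
print(len(trim_and_log(rts)))
```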
The continuous scalp EEG was recorded from 32 Ag/AgCl active electrodes (with preamplifiers) using the BioSemi ActiveTwo system. The electrodes were secured in an elastic cap according to the extended 10–20 international electrode placement system. The zero-reference principal voltage values (each site was quantified relative to the DRL and CMS loop) were digitized at a sampling rate of 256 Hz. The horizontal and vertical electro-oculograms were monitored using additional electrodes placed above and below the right eye and in the external canthi of both eyes in order to control for ocular artifacts.
EEG data were pre-processed using the BrainVision Analyzer 2 (Brain Products, Munich, Germany). All channels were re-referenced to the average of the two mastoid electrodes. The data were filtered with a 0.05 Hz high-pass filter (slope 24 dB/oct) and a 45 Hz low-pass filter (slope 12 dB/oct). The EEG data were then segmented relative to stimulus onset into segments from −100 to 700 ms. Ocular artifacts were corrected using the Gratton and Coles method (Gratton, Coles & Donchin, Reference Gratton, Coles and Donchin1983). After ocular correction, contaminated trials exceeding amplitudes of ±75 μV were rejected by a semi-automatic procedure.
Stimulus-locked segments were subsequently checked separately for each trial type (i.e., a successful and an unsuccessful stop in the stop-signal task, and correct-congruent and correct-incongruent in the Stroop task). Afterwards, ERPs were aligned to the pre-stimulus baseline from -100 ms to 0 ms. The mean number of artifact-free epochs per participant included in the ERP analysis for the stop-signal task was as follows: successful stop, M = 35 (SD = 4; min = 18); unsuccessful stop, M = 33 (SD = 3; min = 23). For the Stroop task: correct-congruent, M = 256 (SD = 14; min = 171); correct-incongruent, M = 103 (SD = 5; min = 78).
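The amplitude criterion and the baseline alignment can be illustrated for a single channel of a single segment. This sketch is a simplification with hypothetical names: in the study, rejection was semi-automatic and performed in BrainVision Analyzer, and excluding the 0 ms sample from the baseline is an assumption.

```python
import statistics


def exceeds_threshold(epoch_uv, threshold_uv=75.0):
    """True if any sample falls outside +/- threshold (in microvolts)."""
    return any(abs(v) > threshold_uv for v in epoch_uv)


def baseline_correct(epoch_uv, times_ms, start_ms=-100.0, end_ms=0.0):
    """Subtract the mean of the pre-stimulus interval from every sample."""
    # Assumption: the baseline covers start_ms <= t < end_ms.
    baseline = [v for v, t in zip(epoch_uv, times_ms) if start_ms <= t < end_ms]
    offset = statistics.fmean(baseline)
    return [v - offset for v in epoch_uv]


# Hypothetical four-sample segment.
times = [-100, -50, 0, 50]
epoch = [2.0, 4.0, 10.0, 13.0]
if not exceeds_threshold(epoch):
    print(baseline_correct(epoch, times))
    # -> [-1.0, 1.0, 7.0, 10.0]
```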
Appropriate electrode clusters and time windows for the targeted ERP components, i.e., N2, P3 and N450, were selected a priori based on previous studies using the same or similar experimental paradigms: a pronounced negativity around the fronto-central electrodes (Fz, Cz, FC1, FC2) in the 220–270 ms time window for N2 (Dimoska et al., Reference Dimoska, Johnstone and Barry2006; Senderecka, Reference Senderecka2016); a pronounced positivity around the centro-parietal electrodes (Cz, Pz, CP1, CP2) in the 270–400 ms time window for P3 (Senderecka, Reference Senderecka2018; Senderecka et al., Reference Senderecka, Szewczyk, Wichary and Kossowska2018); and a pronounced negativity at the fronto-central electrodes (Fz, FC1, FC2, Cz) in the 350–500 ms time window for N450 (Kałamała, Ociepka & Chuderski, Reference Kałamała, Ociepka and Chuderski2020a; Larson, Clayson & Clawson, Reference Larson, Clayson and Clawson2014; Rey-Mermet, Gade & Steinhauser, Reference Rey-Mermet, Gade and Steinhauser2019). Mean voltage amplitudes in the pre-specified electrode clusters and time windows for each trial were used for statistical analysis. The distributions of the ERP data did not differ from the normal distribution.
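The single-trial measure described above, a mean voltage over an electrode cluster and a time window, can be sketched as follows. The clusters and windows are taken from the text; the data layout, the function name, and the inclusive window boundaries are assumptions made for illustration.

```python
import statistics

# Electrode clusters and time windows (ms) as listed in the text.
COMPONENTS = {
    "N2": (("Fz", "Cz", "FC1", "FC2"), (220, 270)),
    "P3": (("Cz", "Pz", "CP1", "CP2"), (270, 400)),
    "N450": (("Fz", "FC1", "FC2", "Cz"), (350, 500)),
}


def mean_amplitude(trial_uv, times_ms, component):
    """Mean voltage across the cluster and window for one trial.

    trial_uv maps a channel name to a list of samples aligned with times_ms.
    """
    channels, (start_ms, end_ms) = COMPONENTS[component]
    # Assumption: both window boundaries are inclusive.
    values = [v
              for ch in channels
              for v, t in zip(trial_uv[ch], times_ms)
              if start_ms <= t <= end_ms]
    return statistics.fmean(values)


# Hypothetical trial with four samples per channel.
times = [200, 250, 300, 450]
trial = {ch: [1.0, 2.0, 3.0, 4.0]
         for ch in ("Fz", "Cz", "FC1", "FC2", "Pz", "CP1", "CP2")}
print(mean_amplitude(trial, times, "N2"))
# -> 2.0
```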
2.4. Statistical analyses
Linear mixed effects (LME) regression models were used to establish how the prior language-use manipulation affected the estimates of inhibition: the SSRT, N2 and P3, Stroop RT and N450. Initially, we also planned to analyse accuracy in the Stroop task; however, accuracy scores were very high (see Table 2), so no further analysis was conducted. All models were fitted using the lme4 package in R (version 1.1-13; R Core Team, 2019) with the BOBYQA optimizer (Bates, Mächler, Bolker & Walker, Reference Bates, Mächler, Bolker and Walker2015). The fixed effects were coded using a priori contrasts, as recommended by Schad, Vasishth, Hohenstein and Kliegl (Reference Schad, Vasishth, Hohenstein and Kliegl2020). We fitted two models to each of the outcome variables.
Notes. L1, Polish; L2, English; DL, dual-language; M, mean; SD, standard deviation; SSRT, stop-signal reaction times; RTs, response latencies.
Model 1 tested the first prediction, i.e., whether the language experiences that arise in the DL and SL contexts enhance inhibition compared to the use of L1 in an L1 environment. The model for the SSRT included Prior language use (i.e., L1 game, L2 game, DL game) as the participant-related fixed effect. The model for the other outcome variables included two additional participant-related fixed effects: Trial type (successful and unsuccessful stops in the stop-signal task; congruent and incongruent in the Stroop task) and the interaction between Prior language use and Trial type. Prior language use was coded using treatment contrast with the L1 game as the reference level so that the estimated model parameters reflected differences between the L2 game (SL context) and the L1 game (i.e., L2-L1 game contrast) and between the DL game (DL context) and the L1 game (DL-L1 game contrast). Trial type was coded using sum contrast such that the model parameters reflected the difference between the trial types (i.e., successful vs. unsuccessful stops in the stop-signal task, and congruent vs. incongruent in the Stroop task). Model 2 tested the second prediction, i.e., whether the DL context improves inhibition more than the SL context. The model included the same participant-related fixed effects but differed in the levels of treatment contrast for Prior language use. Here, the L2 game (SL context) was taken as the reference for the DL game (DL context), and the L1 game was excluded (i.e., DL-L2 game contrast). Models 1 and 2 were supplemented with direct tests for each type of Prior language use separately, i.e., DL game, L2 game, L1 game, each of which included Trial type as defined above.
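The contrast coding for Model 1 can be made concrete by writing out the fixed-effects predictors for one observation. The models themselves were fitted in R with lme4; the Python sketch below only illustrates the coding scheme. The ±0.5 values for the sum contrast and the choice of which trial type receives the positive code are assumptions, as are all names.

```python
# Treatment contrast, L1 game as reference: (L2 vs. L1, DL vs. L1).
PRIOR_LANGUAGE_USE_MODEL1 = {"L1": (0, 0), "L2": (1, 0), "DL": (0, 1)}

# Model 2: L2 game as reference for the DL game; the L1 game is excluded.
PRIOR_LANGUAGE_USE_MODEL2 = {"L2": 0, "DL": 1}

# Sum contrast for Trial type (assumed +/-0.5 so that the coefficient
# equals the difference between the two trial types).
TRIAL_TYPE = {"congruent": -0.5, "incongruent": 0.5,
              "successful": -0.5, "unsuccessful": 0.5}


def model1_design_row(game, trial_type):
    """Fixed-effects predictors for one observation in Model 1."""
    l2_vs_l1, dl_vs_l1 = PRIOR_LANGUAGE_USE_MODEL1[game]
    tt = TRIAL_TYPE[trial_type]
    return {
        "intercept": 1,
        "L2_vs_L1": l2_vs_l1,
        "DL_vs_L1": dl_vs_l1,
        "trial_type": tt,
        "trial_type:L2_vs_L1": tt * l2_vs_l1,
        "trial_type:DL_vs_L1": tt * dl_vs_l1,
    }


print(model1_design_row("DL", "incongruent"))
```

Under this coding, the Trial type coefficient is the trial-type effect after the L1 game, and each interaction term is the change in that effect after the L2 or DL game relative to the L1 game.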
For each model, we started with the maximal structure of random effects. If the model did not converge, we first removed correlations between random effects; in the next step, we removed the random effects with the smallest unique variance, following the recommendation by Bates and colleagues (Reference Bates, Mächler, Bolker and Walker2015). Only models for the SSRT needed trimming in the structure of the random effects. Absolute t values greater than the conventional level of two were considered significant. The data and the R scripts are available at https://osf.io/xy4qg.
Of primary interest were two types of effects. The interaction of Trial type and Prior language use in Models 1 and 2 served to assess differences in the behavioural and neural efficiency of inhibition due to prior language-use manipulation. The main effect of Trial type after each game separately served to assess sensitivity to a task manipulation after a language-use manipulation.
3.1. Missing data and data exclusion
We excluded data from participants with < 90% accuracy. For the stop-signal task, we removed two participants’ data and one other participant's first testing session data. For the Stroop task, we removed two participants’ data and one participant's third testing session data. In the LME analyses, participants with missing data from only one session were included. In total, 30 participants were included in the analyses for the stop-signal and Stroop tasks.
3.2. Behavioural data
The L2 game and the DL game were assessed as similarly difficult in terms of speaking effort (M = 7.93, SD = 2.34 and M = 7.68, SD = 2.76, respectively; p > .05) and both were assessed as more difficult than the L1 game (M = 5.45, SD = 2.56; t(29) = 5.46, p < .001 and t(29) = 5.24, p < .001, respectively). Table 2 presents behavioural data from the inhibition tasks across the three language-game conditions; Table 3 presents the results of the LME analyses for behavioural outcomes.
Notes. CI, Confidence Intervals; SD, standard deviation; SSRT, stop-signal reaction time; RT, reaction time; significant effects bolded.
Stop-signal task (SSRT)
The analysis of the SSRT did not show any effects of Prior language use. The effects of the L2-L1 comparison, the DL-L1 comparison in Model 1, and the DL-L2 comparison in Model 2 were non-significant.
Stroop task (RT)
Both Model 1 and Model 2 revealed a main effect of Trial type. Consistent with this, the direct tests showed a Trial type effect in each language-game condition separately (ts ≥ 13.74). RTs were longer for incongruent trials than for congruent ones. None of the other effects were significant in Models 1 and 2.
3.3. ERP data
Figure 3 presents grand averages of stimulus-locked ERPs in the stop-signal task; Figure 4 presents grand averages of stimulus-locked ERPs in the Stroop task. The targeted ERP components, i.e., N2, P3 and N450, were identified and all demonstrated their classic spatiotemporal characteristics. Consistent with previous research that used auditory stop-signal stimuli (Dimoska et al., Reference Dimoska, Johnstone and Barry2006; Ramautar, Kok & Ridderinkhof, Reference Ramautar, Kok and Ridderinkhof2006; Skippen, Fulham, Michie, Matzke, Heathcote & Karayanidis, Reference Skippen, Fulham, Michie, Matzke, Heathcote and Karayanidis2019), the N2 component did not show a clearly distinguished peak as it partially overlapped in time with a broadly distributed positivity, which smeared the N2 peak out.
Stop-signal task: N2 (220–270 ms)
Both models revealed a main effect of Trial type. The N2 was more negative for unsuccessful than for successful stop trials. Consistent with this, the direct tests showed a Trial type effect in each language-game condition (ts ≥ 4.10). Neither Model 1 nor Model 2 revealed any effects of Prior language use (see Table 4).
Notes. CI, Confidence Intervals; significant effects bolded.
Stop-signal task: P3 (270–400 ms)
Table 5 presents the estimates of the LME models. Model 1 showed a significant main effect of Trial type. P3 was more positive for successful than for unsuccessful stop trials. The Trial type × DL-L1 game interaction effect revealed a trend toward significance, which suggested that the P3 successful > unsuccessful inhibition effect was reduced after the DL game compared to after the L1 game. Model 2 did not show any significant effects. The direct tests for each language-game condition separately showed that the main effect of Trial type was significant after the L1 game (t = −2.98) but was non-significant after the DL and L2 games (t = −1.03 and t = −1.34, respectively). The analyses indicated that successfully inhibited stop signals evoked a more pronounced P3 than unsuccessfully inhibited ones after the L1 game but not after the DL and L2 games.
Notes. CI, Confidence Intervals; significant effects bolded.
Stroop task: N450 (350–500 ms)
Table 6 presents the estimates of the LME models. Model 1 revealed the main effect of Trial type: N450 was more negative for incongruent trials than for congruent trials. Moreover, Trial type interacted with the L2-L1 game and the DL-L1 game comparisons in Model 1: the N450 incongruent > congruent trial effect was reduced after the DL and L2 games compared to after the L1 game. Model 2 did not show any effects. The direct tests showed that the main effect of Trial type was significant after the L1 game (t = 2.52) but was non-significant after the DL and L2 games (t = 0.07 and t = 0.27, respectively). The analyses indicated that the N450 amplitudes were sensitive to the congruency manipulation after the L1 game but not after the DL or L2 games.
Notes. CI, Confidence Intervals; significant effects bolded.
4.1. Results summary
This study investigated how natural patterns of language use shape inhibition efficiency in L1-dominant bilinguals living in an L1 environment. Rather than identifying the patterns of language use via the lifelong language experiences of bilinguals (Hartanto & Yang, Reference Hartanto and Yang2020; Kałamała et al., Reference Kałamała, Szewczyk, Chuderski, Senderecka and Wodniecka2020b) or imitating them via artificial experimental tasks (e.g., Prior & Gollan, Reference Prior and Gollan2011; Timmer, Calabria et al., Reference Timmer, Calabria and Costa2019; Yang et al., Reference Yang, Ye, Wang, Zhou and Wu2018; Wu & Thierry, Reference Wu and Thierry2013), we induced these patterns in a series of language games involving natural language use. Moreover, in order to provide a fine-grained measurement of inhibition, we used two well-established inhibition tasks, the stop-signal task and the Stroop task (Diamond, Reference Diamond2013), and supplemented the behavioural measurement by recording ERPs (Cespón & Carreiras, Reference Cespón and Carreiras2020; Pliatsikas et al., Reference Pliatsikas, DeLuca and Voits2020).
Drawing on the ACH (Green & Abutalebi, Reference Green and Abutalebi2013) and previous research showing cognitive improvements after short picture-naming sessions (Liu et al., Reference Liu, Yang, Jiao, Schwieter, Sun and Wang2019; Prior & Gollan, Reference Prior and Gollan2011; Timmer, Calabria et al., Reference Timmer, Calabria and Costa2019; Timmer, Christoffels et al., Reference Timmer, Christoffels and Costa2019; Yang et al., Reference Yang, Ye, Wang, Zhou and Wu2018; Zhang et al., Reference Zhang, Kang, Wu, Ma and Guo2015), we formulated two predictions regarding the relationship between patterns of language use and inhibition. More efficient inhibition after the L2 and DL games compared to after the L1 game (baseline) would indicate beneficial roles of both the DL and SL contexts (compared to the use of L1 in the L1 environment). More efficient inhibition after the DL game compared to after the L2 game would indicate that the DL context benefits inhibition to a greater extent than the SL context.
Overall, we replicated classic behavioural and ERP effects in the inhibition tasks. In the stop-signal task, the SSRT fell within the standard range (from 150 to 300 ms in young, healthy participants, Wessel & Aron, Reference Wessel and Aron2015), whereas in the Stroop task faster responses were observed for congruent than for incongruent trials (Stroop, Reference Stroop1935). The targeted ERPs, i.e., N2, P3, and N450, demonstrated spatiotemporal characteristics consistent with expectations based on previous ERP reports (for evidence on N2 and P3, see Berkman et al., Reference Berkman, Kahn and Merchant2014; Dimoska et al., Reference Dimoska, Johnstone and Barry2006; Manuel et al., Reference Manuel, Bernasconi and Spierer2013; Nieuwenhuis et al., Reference Nieuwenhuis, Yeung, van den Wildenberg and Ridderinkhof2003; Senderecka, Reference Senderecka2018; for evidence on N450, see Hsieh et al., Reference Hsieh, Huang, Wu, Chang and Hung2018; Larson et al., Reference Larson, Kaufman and Perlstein2009; Liotti et al., Reference Liotti, Woldorff, Perez and Mayberg2000). Importantly, faster RT was related to smaller amplitude differences in P3 and N450 (i.e., a smaller P3 successful > unsuccessful inhibition effect and a smaller N450 incongruent > congruent trial effect; for details, see Appendix S2). This indicates that smaller ERP effects were associated with more efficient cognitive processing, which is in line with previous research (Chang et al., Reference Chang, Alderman, Chu, Wang, Song and Chen2017; Hsieh et al., Reference Hsieh, Huang, Wu, Chang and Hung2018; Schroder et al., Reference Schroder, Dubuson, Dousset, Mortier, Kornreich and Campanella2020). At the same time, RT was unrelated to the N2 unsuccessful > successful inhibition effect. However, we found this measure unreliable and therefore excluded it from the interpretation in the study (for details, see Appendix S3).
With regard to the prior language-use manipulation, the participants perceived the games as being different in terms of speaking effort. While the L1 game was assessed as very easy, both the L2 game and the DL game were assessed as difficult. The results suggest that the games involving the use of L2 indeed imposed demands on the participants' language use. Notably, however, the DL game, which on the basis of the ACH was assumed to induce the highest demands on language use, was judged to be as difficult as the L2 game (imitating the SL context).
With regard to the behavioural data, we did not observe any effects of the prior language-use manipulation. The latency of the response inhibition mechanism (indexed by SSRT) and the efficiency of interference resolution (indexed by the Stroop effect in RT) were similar regardless of how the participants used their languages in the preceding language game. In fact, the absence of behavioural effects in this study corroborates the findings from our recent latent-variable study (Kałamała et al., Reference Kałamała, Szewczyk, Chuderski, Senderecka and Wodniecka2020b), in which the behavioural measures of inhibition were unrelated to the self-assessed patterns of language use in a large group of bilinguals derived from the same population as in this study. However, in contrast to previous studies (e.g., Kałamała et al., Reference Kałamała, Szewczyk, Chuderski, Senderecka and Wodniecka2020b; Pot et al., Reference Pot, Keijzer and de Bot2018; Liu et al., Reference Liu, Yang, Jiao, Schwieter, Sun and Wang2019), this study employed not only behavioural but also ERP measurements. The P3 component in the stop-signal task and the N450 component in the Stroop task showed effects related to prior language use. As predicted, differences in the N450 amplitudes between incongruent and congruent trials were reduced after the DL and L2 games compared to after the L1 game. Similarly, differences in the P3 amplitudes between successfully and unsuccessfully inhibited trials were reduced after the DL game compared to after the L1 game (but this effect was marginal). In contrast to our predictions, however, we did not find differences in the P3 and N450 amplitudes between the DL and the L2 games. Crucially, the direct tests showed that both P3 and N450 were sensitive to the inhibition demands imposed by the tasks (i.e., stop signals and interference, respectively) after the L1 game but were insensitive to these after the L2 and the DL games.
The modulation of ERPs in the inhibition tasks consistently suggests that it is less effortful to implement inhibition when prior language use involved language-switching or the exclusive use of a non-dominant language.
4.2. Evidence for the Adaptive Control Hypothesis
Evidence on how patterns of language use shape inhibition comes from studies that assessed everyday habits of language use (e.g., Beatty-Martínez et al., Reference Beatty-Martínez, Navarro-Torres, Dussias, Bajo, Guzzardo Tamargo and Kroll2019; Kałamała et al., Reference Kałamała, Szewczyk, Chuderski, Senderecka and Wodniecka2020b) and studies that experimentally manipulated language experience (e.g., Liu et al., Reference Liu, Yang, Jiao, Schwieter, Sun and Wang2019; Wu & Thierry, Reference Wu and Thierry2013; Zhang et al., Reference Zhang, Kang, Wu, Ma and Guo2015). However, the lifelong experience of bilingualism can be challenging to measure, and the low ecological validity of language-production tasks complicates inferences about how patterns of language use shape cognitive control. In contrast to previous research, this study induced natural language use in an experimental setting in order to investigate the direct effects of natural language use on inhibition.
Although the behavioural measures of inhibition were not modulated by the prior language-use manipulation, the ERP results provide the first evidence for a direct relationship between natural patterns of language use and inhibition. The reduction of the P3 and N450 effects after the DL and L2 games suggests that it is less effortful to implement inhibition when prior language use involved language-switching or the exclusive use of a non-dominant language. These findings are in line with the ACH and suggest that the DL and SL contexts train inhibitory mechanisms, which translates into less effortful implementation of inhibition in a subsequent task. At the same time, the fact that the magnitudes of P3 and N450 did not differ from each other after the DL and L2 games contradicts the ACH's prediction. The absence of differences for the two conditions suggests that, regardless of whether bilinguals switch languages within a context (i.e., operate in a DL context) or are restricted to using only one language (i.e., operate in an SL context), their neural mechanisms of inhibition are trained in a similar way (but for an alternative interpretation, see section 4.3). Interestingly, the effects observed for the DL and SL contexts correspond to the participants' perception of how difficult the games were: the L2 game and the DL game were assessed as being similarly difficult in terms of speaking effort, but both were assessed as more difficult than speaking during the L1 game. In summary, the current pattern of results indicates that the cognitive training that bilinguals receive during their everyday use of a non-dominant language in an L1 environment affects the neural implementation of inhibitory control.
4.3. Limitations and future directions
The study provides direct evidence for cognitive effects related to language-use patterns. However, some findings are limited by the nature of the data and therefore require further investigation. The lack of evidence in the ERP data for better inhibition after the DL game compared to the L2 game suggests that the language experiences of the DL and SL contexts impact inhibition to the same extent. However, this finding may not generalize to the entire bilingual population as it may be a consequence of the specific language-dominance profile of the tested population. Since we tested a group of L1-dominant bilinguals living in an L1 environment, we speculate that the effect of inhibiting L1 in the L2 game (mimicking the SL context) was disproportionately large (compared to what was originally proposed in the ACH), which might have translated into the absence of differences between the SL and DL contexts in this study. This in turn suggests that L1-dominant bilinguals living in an L1 environment recruit inhibition processes in a unique way (for a similar argument, see Hofweber et al., Reference Hofweber, Marinis and Treffers-Daller2020; see also Goral, Campanelli & Spiro, Reference Goral, Campanelli and Spiro2015). Future research should thoroughly examine the interactions between language dominance and language-use patterns. An alternative explanation for the absence of differences in the ERP data between the DL and L2 games is that we encountered a floor effect in measuring ERPs. Since the P3 and N450 amplitude differences were not sensitive to inhibition demands after the DL and L2 games (as reflected by the absence of task-manipulation effects after these games), no further cognitive improvement related to the DL game could have been captured by our ERP measurement (Bialystok, Poarch, Luo & Craik, Reference Bialystok, Poarch, Luo and Craik2014).
In terms of the study design, it is important to note that the use of EEG required a short break between the language and the EEG sessions (dedicated to EEG capping). During the break, conversation was kept to a minimum, but if something required an explanation, the experimenter always used English (L2 for participants). Therefore, one may argue that the additional use of L2 could have interfered with the language-use patterns induced during the games. It is worth noting, however, that the aim of this study was to test how prior cognitive training in the form of a language game affects inhibition. Therefore, while the additional use of L2 may have led to slight deviations in the induced language-use pattern, it should not have removed the cognitive effects of the two-hour training sessions. Nevertheless, in order to provide a more methodologically rigorous design, future research should limit language use between subsequent sessions. Relatedly, since we used a vocal Stroop task, it could be argued that naming in L1 in the Stroop task interfered with the preceding language-use pattern and thereby contaminated the measurement of pattern-specific effects in this task. We consider this scenario unlikely because the Stroop task data was consistent with the data from the stop-signal task, which did not include any linguistic material. However, in order to obtain quantitative evidence, we performed an additional analysis. Assuming that the preceding language-use manipulation indeed interfered with performance on the Stroop task, we should observe some changes during the ongoing performance of this task after games involving the use of L2. 
The analysis clearly showed that performance on the Stroop task did not differ across trials after the DL and L2 games, which suggests that the use of L1 in the Stroop task did not contaminate the measurement of pattern-specific effects (see Footnote 3). Importantly, the evidence for the stability of the pattern-specific effect in the task requiring overt language production suggests an important property of the language-induced effects on cognitive control: although induced by speaking, they did not dissipate despite subsequent language production. Future research should address this issue more thoroughly.
Finally, the pattern of neural activation did not translate into behaviourally observed outcomes. We see three possible explanations for this discrepancy. The first explanation might be that the inhibition tasks were performed at the upper limit, which made it impossible to observe the inhibition benefits of the SL and DL contexts at the behavioural level (a so-called ceiling effect). This is a likely explanation as accuracy in the Stroop task was close to 100% and the SSRT was shorter than in several previous studies using the same or similar versions of the stop-signal task (e.g., Greenhouse & Wessel, Reference Greenhouse and Wessel2013; Senderecka, Reference Senderecka2018; Wagner, Wessel, Ghahremani & Aron, Reference Wagner, Wessel, Ghahremani and Aron2017). The nature of the participant sample additionally supports this explanation: we tested young bilingual adults, who are often argued to be at the peak of cognitive efficiency and are therefore susceptible to ceiling effects (Bialystok, Reference Bialystok2017; Bialystok, Martin & Viswanathan, Reference Bialystok, Martin and Viswanathan2005; but see Samuel, Roehr-Brackin, Pak & Kim, Reference Samuel, Roehr-Brackin, Pak and Kim2018). The second explanation is related to the specificity of the experimental manipulation. Since we incorporated a relatively short-term language-use manipulation, it can be speculated that behavioural effects would be observed with longer-lasting language use. Since the ACH does not define a time frame for the cognitive effects of language use, this issue requires further research.
The third explanation is that the behavioural and ERP measurements are to some extent dissociable, and therefore differences in neural activation patterns do not always translate into behaviourally observed effects (Gratton, Sun & Petersen, 2018; van den Noort et al., 2019). While behavioural measures reflect not only the specific cognitive process targeted in the study but also different peripheral processes involved in performance, ERPs, which reflect temporal changes in the activity of specific brain processors, are more precise manifestations of specific cognitive processes. In the ERP literature, the P3 and the N450 are well-established markers of inhibition (Larson et al., 2014; Pires, Leitão, Guerrini & Simões, 2014). Building on this logic, the P3 and the N450 likely reflected the engagement of inhibition in this study, but the behavioural measures did not.
This study shows how inhibition efficiency can be modulated by bilinguals’ language-use experience. By adopting a within-subject design and a multiple-measure approach, the study is the first to test the direct effects of a relatively natural and ecologically valid language-use manipulation (i.e., conversation) on inhibition at both the behavioural and the electrophysiological level. The study increases our knowledge about the specific conditions in which language use can benefit inhibition. Specifically, we observed a more neurally efficient implementation of inhibition after prior use of L2. Crucially, the study suggests that the exclusive use of L2 and the alternate use of L1 and L2 might be comparable in enhancing inhibitory control when bilinguals reside in an L1 environment.
The study is also timely with respect to the ongoing discussion regarding the ecological validity of manipulating language use in an experimental setting (Blanco-Elorrieta & Pylkkänen, 2018; van den Noort et al., 2019). Evidence for the cognitive effects of language training mostly comes from studies utilizing artificial cued language-production paradigms. Our findings indicate that natural patterns of language use can be successfully induced in well-controlled experimental settings and may affect the workings of the cognitive control system. The study should thereby inspire future research to use more ecologically valid manipulations.
The research was funded by a National Science Centre grant to Z.W. [2015/18/E/HS6/00428] and by a National Science Foundation grant [OISE-1545900]. During work on the paper, M.S. was supported by a National Science Centre grant [2015/19/B/HS6/00341] and P.K. was supported by National Science Centre grants [2017/27/N/HS6/01029; 2020/36/T/HS6/00363] and by the Foundation for Polish Science (START). The authors gratefully acknowledge the help of Maria Badanova with experiment preparation and data collection; Jakub Szewczyk with preparing the experimental tasks and analysis; and the research assistants who contributed to data collection as confederates in the language games.
Appendix S1. Language games: stimuli and game procedure
Each picture slide included five to twelve elements on a geometrically shaped background. In total, 54 unique slides were prepared, which allowed the creation of nine unique sets, each including six slides (one set of slides per game round). The host's and confederate's slides shared elements and backgrounds but differed in the number of elements (up to five additional elements were given to the confederate, i.e., these were to be removed) and their arrangement on the slide. In each set, the slides were split into three pairs that differed within the following dimensions: 1) the number of elements per slide (5, 8 or 12 elements on the host's slide); 2) complexity of the background (basic geometric shapes, e.g., cylinders and rectangles, vs. complex geometric shapes, e.g., overlapping lines); 3) semantic category (elements from distinct semantic categories, e.g., cat, car, lemon, etc. vs. elements from the same semantic category, e.g., lion, tiger, cat, etc.). These dimensions allowed the difficulty of the game to be manipulated (i.e., more elements, more complex background, and more semantically related elements were assumed to make the slide description more difficult), and thus constituted the three levels of difficulty within a game round: simple, moderate, and difficult. When two slides had been completed, the level of difficulty for the game increased. All slides were 1280 x 720 pixels in bitmap image format and were presented using Microsoft PowerPoint on laptop computers with a screen resolution of 1366 x 768 pixels. The host was presented with the slides in presentation mode, while the confederate was presented with the slides in editing mode, which enabled them to rearrange the slide elements using a computer mouse. All slides used in the experiment and the materials required for the games are available online at https://osf.io/xy4qg.
The full game consisted of six rounds. Each round was time-limited and its duration depended on the difficulty of the slide. The time limits for the simple, moderate, and difficult sets were 1:30 min, 3:00 min, and 6:00 min, respectively. If the confederate finished rearranging their slides on a given difficulty level before the time limit, they received additional slides that contained the same elements as the finished slides but in a different arrangement so that they matched the difficulty level. The host and the confederate were allowed to communicate freely with each other; however, they could not use gestures, show each other the laptop screens, or use any other communication tools.
In total, four individuals were present during a language game session: three game players and the main experimenter, who was responsible for explaining and enforcing the game rules and monitoring the course of the game so that each language game was similar in terms of duration and sequence of events. At the beginning of the game, the three players (i.e., the two experimenters acting as players and the participant) were seated at a table with two laptops. Then, the main experimenter explained the rules of the game in English (L2 for participants). Once the procedure was clear to everyone, the three players engaged in the game. During the game, the main experimenter kept time and gave the sound signals for the start of each round and thirty seconds before the end of the time limit. This approach allowed a natural and voluntary end to the conversation between the host and the confederate. Moreover, the experimenters who acted as players were instructed to engage the participant during the game to enable a fluid and natural conversation. For example, if the participant could not remember the correct name of an object, the confederate would ask them questions about its visual features to allow the game to continue smoothly and to use the available time effectively. After six rounds of the game had been completed, the main experimenter scored the slides for accuracy.
Appendix S2. Associations between behavioural and ERP data
In order to test the functional interpretation of the targeted ERP effects, i.e., the better the efficiency of inhibition-related mechanisms, the smaller the ERP amplitude differences between the task conditions, we used LME regression models. N2, P3 and N450 were regressed on the following variables: RTs (i.e., the SSRT for N2 and P3, and the log-transformed Stroop RTs for N450; both scaled), Trial type (i.e., successful and unsuccessful stops for N2 and P3; congruent and incongruent for N450) and their interaction. All models included Trial type and Prior language use as random effects. The fitting procedure and contrasts were as described for Model 1 in section 2.4. None of the models needed trimming. We expected that RTs would interact with Trial type so that faster RTs would be related to smaller ERP effects, i.e., the N2 unsuccessful > successful inhibition effect, the P3 successful > unsuccessful inhibition effect and the N450 incongruent > congruent trial effect.
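The structure of these models can be written out as a regression equation. The following is only a sketch under stated assumptions, not the authors' exact specification: it assumes that the random effects are by-participant (a random intercept plus random slopes for Trial type and Prior language use), and the symbols ($\beta$, $u$, $\varepsilon$, and the indices $i$ for observations and $j$ for participants) are introduced here for illustration. Prior language use has three levels (L1, L2, DL), so in practice its random slope would correspond to more than one contrast-coded term; it is shown as a single term for readability.

```latex
\begin{align*}
\mathrm{ERP}_{ij} ={}& \beta_0
  + \beta_1\,\mathrm{RT}_{ij}
  + \beta_2\,\mathrm{TrialType}_{ij}
  + \beta_3\,(\mathrm{RT}_{ij} \times \mathrm{TrialType}_{ij}) \\
 &+ u_{0j}
  + u_{1j}\,\mathrm{TrialType}_{ij}
  + u_{2j}\,\mathrm{PriorLanguageUse}_{ij}
  + \varepsilon_{ij}
\end{align*}
```

Under this reading, the prediction that faster RTs go with smaller ERP effects corresponds to a non-zero interaction coefficient $\beta_3$; its sign depends on how Trial type is contrast-coded.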
The model for N2 showed the main effects of Trial type and SSRT (t = −7.47 and t = −8.64, respectively). However, Trial type and SSRT did not interact with each other (t = −0.39). Contrary to the prediction, the magnitude of N2 (i.e., the N2 unsuccessful > successful inhibition effect) was not related to behavioural performance in the stop-signal task. The model for P3 showed the main effect of Trial type (t = −2.48), but it did not show the main effect of SSRT (t = 0.41). Crucially, it showed an interaction between Trial type and SSRT (t = −4.39). SSRT positively predicted the P3 amplitude for successful stop trials (t = 2.01) but did not predict the P3 amplitude for unsuccessful stop trials (t = −1.24). This indicates that the P3 successful > unsuccessful inhibition effect was smaller for faster SSRT, which is in line with the prediction. The model for N450 did not show a main effect of Trial type (t = 0.82) but it revealed the main effect of Stroop RTs and an interaction between Trial type and Stroop RTs (t = −2.20 and t = −2.91, respectively). The Stroop RTs negatively predicted the N450 amplitude for incongruent trials (t = −3.64) and did not predict N450 for congruent trials (t = −0.62). As predicted, the N450 incongruent > congruent trial effect was smaller for faster Stroop RTs.
Appendix S3. Test-retest reliability analysis
The test-retest reliabilities of the inhibition measures (i.e., the SSRT, N2 and P3, Stroop RTs and N450) were computed to verify whether the study had sufficient psychometric properties to detect intra-individual variation within the experimental manipulation, i.e., prior language use. Since SSRT is a single value for a participant, the estimates were assessed using only the classic Pearson's correlation coefficient (r). The estimates for the other measures were additionally assessed using hierarchical models, as recommended by Rouder and Haaf (2019).
The classic r was computed for each pair of language games. The hierarchical models resembled those presented in Rouder and Haaf (2019). The effects of Trial type (i.e., successful vs. unsuccessful stops in the stop-signal task; incongruent vs. congruent in the Stroop task) for each language game (i.e., L1 game, L2 game, DL game) were taken as the fixed effects, whereas the overall Trial type effect (i.e., an effect regardless of a language game) and the idiosyncratic deviations within an individual (i.e., differences in the Trial type effect within a participant) were taken as random effects. The correlation coefficients derived from the hierarchical models were expressed by the multivariate distribution (for more details, see Rouder & Haaf, 2019).
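The pairwise computation of the classic r can be illustrated with a short sketch. This is a minimal example, not the authors' analysis code: the function names `pearson_r` and `pairwise_reliability` are hypothetical, and the SSRT values are invented purely to make the example run. It assumes one score per participant per language game, with participants listed in the same order in each game.

```python
from itertools import combinations
from math import sqrt


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable has zero variance")
    return sxy / sqrt(sxx * syy)


def pairwise_reliability(scores_by_game):
    """Return r for every pair of games, keyed by (game_a, game_b)."""
    return {
        (a, b): pearson_r(scores_by_game[a], scores_by_game[b])
        for a, b in combinations(scores_by_game, 2)
    }


# Hypothetical SSRT values (ms) for five participants; NOT data from the study.
ssrt = {
    "L1": [210.0, 245.0, 198.0, 260.0, 230.0],
    "L2": [205.0, 240.0, 202.0, 251.0, 228.0],
    "DL": [208.0, 238.0, 195.0, 255.0, 233.0],
}

for pair, r in pairwise_reliability(ssrt).items():
    print(pair, round(r, 3))
```

With three games this yields three coefficients (L1 vs. L2, L1 vs. DL, L2 vs. DL), one per between-game comparison. The hierarchical estimates of Rouder and Haaf are not covered by this sketch.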
Table S3 presents the test-retest reliabilities. All of the measures except for the N2 component demonstrate acceptable test-retest reliabilities in our study. For the SSRT, the Stroop RTs, and the P3 successful > unsuccessful inhibition effect, the estimates indicate excellent reliability. For the N450 incongruent > congruent trial effect, the estimates differ depending on the between-game comparison, but overall they are considered acceptable. Non-significant reliability estimates of the N2 unsuccessful > successful inhibition effect indicate that the N2 data was not stable over time and as such should be excluded from the interpretation in our study. At the same time, sufficient reliabilities of both the behavioural and the P3-N450 data imply that the current discrepancy between the behavioural and the ERP findings cannot be easily explained by the idiosyncratic properties of the study. While sufficient reliability for SSRT and the Stroop effect in RT shows that the absence of effects in the behavioural data is not a measurement error, the satisfactory reliabilities for the P3 and N450 data further support the presence of the reported effects.