
Patterns of speech and gesture production in the communications of bilinguals and monolinguals: Do speakers’ proficiency and discourse context matter?

Published online by Cambridge University Press:  24 March 2026

Armita Ghobadi*
Affiliation:
Psychology, Barnard College, Columbia University, USA
Şeyda Özçalışkan
Affiliation:
Psychology, GSU, USA
*Corresponding author: Armita Ghobadi; Email: armita.ghobadi@gmail.com

Abstract

Gesture and speech form a tightly integrated system in first language (L1). We know less about the gesture-speech system in second language (L2) production, particularly with respect to speaker proficiency and discourse context. In this study, we focused on the speech and gestures produced by adult Persian (L1)-English (L2) bilinguals with high or low L2 proficiency and English native speakers (n = 22/group). We asked whether speaker proficiency (native, high, low) and discourse context (narratives, explanations) influence the amount, diversity and complexity of speech and gesture production. Our results showed an effect of context, with greater production of speech and gesture in narratives than explanations across proficiency levels. More importantly, we found an effect of proficiency – with lower speech complexity coupled with greater gesture complexity in bilinguals with low proficiency, particularly in the explanation context – suggesting a compensatory role for gesture among bilinguals with low L2 proficiency in more demanding communicative contexts.

Information

Type
Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2026. Published by Cambridge University Press

1. Introduction

Previous research in first language (L1) production has shown a strong connection between speech and gesture use in adults (Kita et al., 2007; Özçalışkan et al., 2016). Adults use gestures when they speak, with gestures either reinforcing what they express in speech (e.g., ‘chair’ + pointing at chair) or adding new information not found in speech (e.g., ‘sit’ + pointing at chair; Goldin-Meadow, 2007; Hostetter, 2011). However, we know less about how speech and gesture are linked in second language (L2) production, and the existing studies provide inconclusive results (Brown & Gullberg, 2008; Özçalışkan, 2016; Özyürek, 2002). In this study, we focused on two groups of adult Persian (L1)-English (L2) bilinguals – with either high or low proficiency in their L2 – and compared them to monolingual English speakers. Our goal was to determine whether there were systematic differences in the amount, diversity, and complexity of speech and gesture production among these three groups of speakers in different contexts of language use, with a focus on narratives and explanations. Narration and explanation both involve extended language use; they also systematically differ in their communicative function. Narratives require recounting a temporally ordered sequence of events and emphasize coherence across episodes (Berman & Slobin, 1994), while explanations ask speakers to provide reasons or clarifications and rely more on causal reasoning (Kotthoff, 2007). Accordingly, these two discourse contexts place different cognitive and communicative demands on speakers, particularly on speakers with different levels of language proficiency.
Given our two key variables, namely, proficiency and discourse context, we predicted one of two possible outcomes: One possibility was that gesture and speech would follow the same pattern, namely, that greater amount, diversity and complexity of speech production would be coupled with greater amount, diversity and complexity of gesture production across all groups and discourse contexts. Another possibility, however, was that gesture and speech might follow opposing patterns. More specifically, we predicted that gesture would compensate for the difficulties in speech production, particularly for bilinguals with low L2 proficiency and particularly in the more demanding explanation context – a possibility that would result in greater gesture production outcomes coupled with lower speech production outcomes.

In this study, we focused on multiple facets of speech and gesture production in adult bilinguals with different L2 proficiencies. Our goal was to provide a relatively more comprehensive assessment of multi-modal communication strategies that bilingual speakers might employ across different discourse contexts. The findings from this study could inform instructional strategies in second language learning contexts.

1.1. Patterns of speech production in bilinguals

Bilinguals show variability in speech production in both their L1 and L2 compared to monolingual speakers, with respect to the amount, diversity and complexity of their production.

1.1.1. Amount of speech production

Beginning with the amount of speech production (i.e., total number of words), research has shown that bilingual speakers tend to talk more in their L2, particularly when compared to monolingual speakers of the same language. Studies that examined amount of speech production across a variety of contexts – from descriptions of animated stimuli to narratives – showed that bilinguals produced more speech in their L2 compared to monolingual speakers of their L2. This pattern was shown across different bilingual groups, including Mandarin (L1)-English (L2), Hindi (L1)-English (L2), French (L1)-English (L2) and Spanish (L1)-English (L2) bilinguals (Nicoladis et al., 2018), Spanish (L1)-English (L2) bilinguals (Cruz, 2021), Turkish (L1)-English (L2) bilinguals (Özçalışkan, 2016) and Korean (L1)-English (L2) bilinguals (Park, 2020). One possible explanation for this difference could be the relative L2 proficiency of bilingual speakers. For example, in a discourse context, bilinguals might lack the vocabulary to express the different entities in speech (e.g., characters, objects; Bosch, 1983; Garrod, 2001; Lyons, 1977; So et al., 2009) and might instead rely on other linguistic tools, such as more extended descriptions (Johns et al., 2016). This, in turn, might result in greater speech production in bilinguals, particularly compared to monolinguals.

This difference becomes particularly pronounced for bilinguals with lower proficiency in L2. A study by So et al. (2013) that examined Mandarin (L1)-English (L2) bilinguals showed that bilinguals with low proficiency produced a greater amount of speech than bilinguals with high proficiency. The greater amount of speech production among less proficient bilinguals was attributed to their tendency to overspecify referents by including a greater amount of descriptive speech to characterize them (Gullberg, 2003, 2006; So et al., 2013; Yoshioka, 2008) – a pattern that was particularly pronounced in more complex speech production tasks (i.e., tasks that place higher cognitive demands such as reasoning, argumentation; Yoshioka, 2008).

Some other studies also suggested that the amount of speech production could show variability based on the type of L1 and L2 (Nicoladis et al., 2018). Languages like English tend to employ a more chronicle-style narrative, focusing on actions (e.g., how events happened); languages like Greek, on the other hand, employ a more descriptive type of narrative, focusing on why events happened (Ryan, 1993; Tannen, 1980). Accordingly, the greater emphasis placed on providing a chronicle of the story with more details might result in longer narratives in a language like English, which has been the L2 in much of the earlier work with bilinguals (Tannen, 1982). Most of the earlier work also primarily used either narrative tasks (e.g., So et al., 2013) or description of animated scenes (e.g., Nicoladis et al., 2018; Pika et al., 2006), leaving patterns of speech production in other relatively more demanding speech contexts (e.g., explanations) unexamined and highlighting this as an important area in need of further research.

1.1.2. Diversity of speech production

Turning to the diversity of speech production (i.e., number of different words), we know relatively less about the variability in the range of meanings bilinguals convey in speech. We know from earlier developmental work that bilingual children convey a narrower range of meanings in speech in each of their languages compared to their monolingual peers (Bialystok, 2009; Oller & Eilers, 2002), suggesting early differences. Research with adult bilinguals also showed that the proficiency level of the speaker might be one of the key factors in determining the diversity of spoken vocabulary in bilinguals (Johns et al., 2016). As shown in earlier work with English (L1)-French (L2) bilinguals, bilinguals with high proficiency were more likely to exhibit greater semantic diversity (i.e., greater variety of words) in their speech than the ones with low proficiency (Johns et al., 2016). A study conducted on three groups of French (L1)-English (L2) bilinguals with varying levels of proficiency in a picture description task found that bilinguals with high proficiency displayed higher levels of lexical diversity, particularly for verbs (Treffers-Daller, 2009). Similarly, another study with Korean (L1)-English (L2) bilinguals with different levels of proficiency showed that bilinguals with high proficiency produced a more diverse set of verb types than the ones with low proficiency in L2 (Park, 2020).

Apart from proficiency, earlier work has shown that discourse context might also be an important variable in determining speech diversity in bilinguals. An earlier study with Persian (L1)-English (L2) bilinguals – all with high proficiency – examined lexical diversity in three types of contexts, including argumentation (e.g., argue why money might bring happiness), description (e.g., describe a good time you had with a friend) and narrative (e.g., narrate a story based on pictures). The study showed that argumentation and description resulted in higher levels of lexical diversity than narratives (Bayazidi et al., 2019), suggesting greater speech diversity in discourse contexts that impose greater production demands.

1.1.3. Complexity of speech production

Turning to the complexity of speech production (i.e., the mean length or mean syntactic complexity of an utterance), the existing sparse research suggests that proficiency might be an important factor for speech complexity. As shown in earlier work (e.g., Bayazidi et al., 2019), bilinguals with high L2 proficiency produced more complex speech (e.g., longer utterances) in more demanding tasks (e.g., argumentation) than in less demanding tasks (e.g., narratives), further highlighting both discourse context and proficiency as important contributors to speech complexity as well.

In summary, research on the amount, diversity and complexity of speech production in bilinguals suggests that both proficiency and discourse context are important factors in explaining variability in speech production. Bilinguals with high L2 proficiency tend to talk less (i.e., fewer word tokens), but they show greater diversity (i.e., more word types) and complexity (i.e., longer utterances) in their speech compared to bilinguals with low L2 proficiency. Discourse context and language type might also affect speech production, with certain types of tasks (e.g., argumentation) and types of languages (e.g., languages with chronicle style of reporting) resulting in higher speech production outcomes in bilinguals compared to their monolingual peers. Even though existing research underscores the significant influence of both proficiency and context demands on speech production in bilinguals, no research to date has examined systematic variability in speech production in bilinguals based on both proficiency and discourse context across multiple outcome measures, including amount, diversity, and complexity of speech production. There is also a scarcity of research on patterns of speech production among bilinguals with Persian as L1 – a language that shows both systematic similarities to and differences from English – highlighting the need for future studies.

1.2. Patterns of gesture production in bilinguals

Gesture and speech form a tightly integrated system in monolinguals (e.g., de Ruiter, 2000; Kita & Özyürek, 2003; McNeill, 1992; Özçalışkan et al., 2016). Speakers frequently use their hands when speaking and these co-speech gestures convey substantial information integral to the semantic information conveyed in speech (Goldin-Meadow, 2007). However, relatively less is known about gesture production in bilinguals, with most studies focusing primarily on the amount of gesture production.

1.2.1. Amount of gesture production

Beginning with the amount of gesture production (i.e., number of gestures), studies suggest that bilinguals gesture more, particularly when speaking their L2, compared to monolinguals (Nicoladis, 2007; Nicoladis et al., 2009, 2018; Özçalışkan, 2016; Pika et al., 2006; So, 2010; So et al., 2013). An earlier study by Pika et al. (2006), which compared advanced French (L1)-English (L2) and English (L1)-Spanish (L2) bilinguals to English monolinguals, found that bilinguals produced more gestures in their L2 than their monolingual counterparts speaking the same language. This pattern was further supported by Özçalışkan (2016), who found that highly proficient Turkish (L1)-English (L2) bilinguals produced more gestures in L2 English than monolingual English speakers. Similar findings emerged across different proficiency levels. For instance, Nicoladis et al. (2018) examined bilinguals with Mandarin, Hindi, French or Spanish as L1 and English as L2, while Park (2020) studied Korean (L1)-English (L2) bilinguals. Both studies reported that bilinguals, regardless of proficiency, gestured more in their L2 English than monolingual English speakers. In contrast, Azar et al. (2020) examined second-generation heritage Turkish (L1)–Dutch (L2) bilinguals in comparison to monolingual Turkish and monolingual Dutch speakers and reported no significant differences in gesture frequency between bilinguals and monolinguals in either language. However, unlike earlier work, Azar et al. (2020) tested heritage speakers with native-like proficiency in both of their languages, relying on silent video stimuli (e.g., cooking activities, office work scenes, all without words), and their tasks did not impose distinct discourse-level demands.

Other studies (e.g., So et al., 2013) examined amount of gesture use in relation to different proficiency levels and showed that bilinguals might differ in their relative use of different gesture types (e.g., iconic vs. pointing gestures) based on their proficiency level. So et al. (2013), in a picture description study with Mandarin (L1)-English (L2) bilinguals, showed that bilinguals with lower L2 proficiency produced more deictic gestures that indicated objects (e.g., pointing to a car), while bilinguals with higher L2 proficiency used more iconic gestures that were symbolic (e.g., flapping arms to convey flying). This finding was also evident at early ages in children (ages 2–3; Nicoladis et al., 1999). These findings thus suggest that not only overall gesture production but also relative use of each gesture type might differ based on the proficiency level of the bilinguals. Much of this earlier work on gesture production has focused on narrative or picture/video description tasks; thus, we do not yet know whether these patterns extend to other discourse contexts (e.g., explanations) and how discourse context may also interact with speakers’ level of L2 proficiency.

1.2.2. Diversity of gesture production

Turning to the diversity of gesture production (i.e., the unique referents, actions, attributes conveyed by gesture, e.g., point at cat vs. point at chair vs. wiggle fingers in place for crawling), research on bilinguals remains virtually non-existent. We do not yet know whether the diversity of meanings conveyed in gesture varies either by proficiency level or discourse context. However, we know from earlier work on lexical diversity that bilinguals with high proficiency in L2 produce more diverse speech (i.e., use more different word types) than the ones with low proficiency in L2. If gesture further augments what is conveyed in speech, then we would expect the same pattern in gesture, namely, that bilinguals with low proficiency would convey a narrower range of meanings in their gesture akin to their speech. If, on the other hand, gesture compensates for the difficulties in speech production, we would expect that bilinguals with low proficiency in L2 would convey a greater diversity of meanings in gesture when speaking their L2, possibly to express meanings that they cannot convey in speech. This possibility might also be more pronounced in discourse contexts that impose greater demands in speech production. The paucity of research in this domain thus renders it an important area for future research.

1.2.3. Complexity of gesture production

Turning last to the complexity of gesture production (i.e., gesture’s informational relation to speech, namely, whether gesture conveys the same information as speech or additional information to speech), there is relatively limited research as well. The few studies that focused on the informational relation of gesture to speech showed that bilinguals with high proficiency in L2 were more likely to use gesture to supplement what they conveyed in speech than bilinguals with low proficiency (Swedish (L1)-French (L2) bilinguals: Gullberg, 2003, 2006; Mandarin (L1)-English (L2) bilinguals: So et al., 2013). They might, for example, use a deictic gesture to clarify an ambiguous referent in speech (e.g., ‘The lady asked him a question’ + point at picture of tall man) or produce an iconic gesture to add further descriptive detail to a spoken description (e.g., ‘The lady asked the man to take a taxi’ + spread palms away from each other vertically to convey how tall the man is). Apart from these two earlier studies that focused on narratives produced by bilinguals with high L2 proficiency, there is no work that has yet examined how discourse context or speaker proficiency might impact complexity of gesture production.

In summary, the limited research on bilingual speakers suggests that proficiency might serve as an important factor in explaining variability in gesture production – with some studies suggesting an advantage for bilinguals with higher proficiency in the amount and complexity of gesture production. There is no work that has yet systematically examined how gesture production might vary by discourse context or the interaction between proficiency and discourse context, highlighting these areas as important avenues in need of further research. Similar to speech, there is also a paucity of research on patterns of gesture production among bilinguals with Persian as L1, raising the need for future studies.

1.3. Current Study

In this study, we focused on the gestures and speech produced by Persian (L1)-English (L2) bilinguals with either high or low L2 proficiency in two different language production tasks (narratives, explanations) and compared them to the gestures and speech produced by a group of monolingual English speakers – all focusing on productions only in English. We asked two questions:

  (1) We first asked whether language proficiency (native, high, low) and discourse context (narrative, explanation) would have an effect on the amount, diversity, and complexity of speech production in English. We expected an effect of proficiency. Specifically, we predicted that bilinguals with low L2 proficiency would produce a greater amount of speech than bilinguals with high L2 proficiency, who, in turn, would use more speech than monolinguals. We expected this pattern to be reversed for the diversity and complexity of speech production, namely, that bilinguals with low L2 proficiency would produce less diverse and less complex speech than bilinguals with high L2 proficiency and monolinguals. These predictions stemmed from prior research, which indicated that bilinguals with low proficiency talked more but used less diverse and less complex speech when speaking their L2 (Gullberg, 2003, 2006; Johns et al., 2016; Nicoladis et al., 2018; So et al., 2013). We also expected an effect of discourse context, namely, that the differences based on proficiency would be more pronounced in the explanation task as compared to the narrative task, based on earlier research that underscored the significance of task demands on L2 speech production (e.g., Bayazidi et al., 2019).

  (2) We next asked whether language proficiency (native, high, low) and discourse context (narrative, explanation) would have an effect on the amount, diversity and complexity of gesture production in English. We predicted one of two possibilities based on the relatively scarce and inconclusive findings. One possibility would be that gestures would mirror the patterns observed in speech, based on earlier work that suggested a strong coupling between gesture and speech production in adults across a variety of tasks in first language production contexts (Chu & Kita, 2011; Hostetter et al., 2007; Theocharopoulou et al., 2015). That is, we expected a greater amount of gesture production, coupled with lower diversity and complexity of gesture use in bilinguals with low L2 proficiency as compared to bilinguals with high L2 proficiency and monolinguals – thus mimicking the patterns observed in speech. A second possibility, as part of our two-way prediction, would be that gestures would show the opposite pattern to speech, compensating for the difficulties in speech production. That is, we expected bilinguals with low proficiency to produce more gestures than bilinguals with high proficiency, who, in turn, would gesture more than monolinguals in their productions in English – a pattern that we expected to be evident in the diversity and complexity of gesture production as well. This prediction was based on earlier work that suggested a compensatory role for gesture in bilingual children and adults (Arslan et al., 2023; Nicoladis, 2007; Smithson & Nicoladis, 2013). We also expected these differences to be more prevalent in the explanation task than in the narrative task, following earlier work which showed that more difficult tasks (e.g., argumentation) result in greater gesture production than the ones with less difficulty (e.g., narratives; Nicoladis, 2007).

2. Method

2.1. Participants

The sample consisted of 44 Persian (L1)-English (L2) adult bilingual speakers – 22 with high L2 proficiency and 22 with low L2 proficiency – and 22 adult monolingual English L1 speakers who had very limited or no knowledge of other languages (see Table 1). As can be seen in Table 1, the three groups were comparable in age and education (see Footnote 1), but they differed in their L2 English proficiency. The sample size was based on a power analysis (G*Power 3.1; Faul et al., 2007), which showed that n = 22 per group would provide a power of 85% at an alpha level of 0.05 with medium effect sizes (R² = 0.20).
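For readers who want to relate the reported effect size to the input that G*Power's regression-type F tests accept, an R² value can be converted to Cohen's f². This is a standard conversion offered only as an illustration; the source does not state which G*Power test family or settings were used, so the line below is an assumption about how the reported R² might be entered rather than a reconstruction of the authors' analysis.

```latex
f^2 = \frac{R^2}{1 - R^2} = \frac{0.20}{1 - 0.20} = 0.25
```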

Table 1. Summary of sample characteristics by group

Note: M, mean (years; months); SD, Standard Deviation (years; months).

The bilinguals were classified into high versus low proficiency groups based on three criteria, including (1) their Test of English as a Foreign Language (TOEFL; Educational Testing Service, 2024) scores, (2) their extent of residency in the United States and (3) their score on a verbal fluency task that assesses vocabulary knowledge in English (Portocarrero et al., 2007; Spreen & Strauss, 1998; see Footnote 2). Bilinguals with low English L2 proficiency included speakers who had a B2 level on the Common European Framework of Reference for Languages (CEFR) and had resided in the United States for less than 4 years. Bilinguals with high English L2 proficiency consisted of speakers who had a CEFR level of C1 and who had lived in the United States for more than 4 years. Five of the participants from the high L2 proficiency group did not have TOEFL scores or any other equivalent English proficiency tests as they immigrated to the United States at an early age and completed their education in English. Their fluency scores and extent of residency were comparable to those of the bilinguals in the high proficiency group; we therefore categorized these five participants in the high L2 proficiency group. All monolingual English speakers were born in English-speaking households, had minimal knowledge of any languages other than English and completed all their education in English. Both samples were recruited through student organizations at an urban research university in the United States. Accordingly, the participants in each group were either college students or recent college graduates. All participants received a small gift for their participation. The research was approved by the institutional review board of a research university in the United States and was conducted in accordance with the Code of Ethics for the protection of human research participants.

2.2. Procedure for data collection

Each participant was interviewed in English by a native English speaker individually in one session in a quiet place. All participants first completed the consent forms, followed by a demographic questionnaire that provided information about their age, gender, education and language background. Bilingual participants also completed the short form of the Language Experience and Proficiency Questionnaire (LEAP-Q; Marian et al., 2007), which provides information about language dominance, attitudes toward second language learning and self-assessment of fluency in speaking, listening, reading and writing in English. Upon completion of both questionnaires, each participant was asked to participate in two language production tasks – one eliciting narratives and the other eliciting explanations – in one of six different orders. Three of the orders started with narratives and three with explanations. We also systematically varied the presentation order within the narrative and explanation task to control for any possible order effects on performance. Each participant was randomly assigned to one of the six presentation orders, with approximately equal representation of each order across the three participant groups.
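The assignment scheme described above (random assignment to one of six orders, with roughly equal representation of each order within each group) can be sketched as follows. This is a minimal illustration and not the authors' procedure: the order labels `N1`–`N3` and `E1`–`E3`, the participant identifiers, the seeds and the function name `assign_orders` are all hypothetical, and the source does not describe how the randomization was actually carried out.

```python
import random
from collections import Counter

# Hypothetical labels: the source says only that there were six orders,
# three starting with narratives (N*) and three with explanations (E*).
ORDERS = ["N1", "N2", "N3", "E1", "E2", "E3"]


def assign_orders(participant_ids, seed):
    """Randomly assign participants to orders with near-equal counts per order."""
    rng = random.Random(seed)
    ids = list(participant_ids)
    rng.shuffle(ids)        # random sequence of participants
    orders = ORDERS[:]
    rng.shuffle(orders)     # vary which orders receive the extra participant
    # Cycling through the shuffled orders keeps counts within one of each other.
    return {pid: orders[i % len(orders)] for i, pid in enumerate(ids)}


# Three groups of 22 participants, as in the study (identifiers are made up).
GROUPS = ("native", "high", "low")
assignments = {
    group: assign_orders([f"{group}_{i:02d}" for i in range(1, 23)], seed=k)
    for k, group in enumerate(GROUPS)
}
counts = {group: Counter(a.values()) for group, a in assignments.items()}

for group in GROUPS:
    print(group, dict(counts[group]))
```

With 22 participants and six orders, the counts cannot be exactly equal; under this sketch four orders receive four participants and two receive three within each group, which is one way to read "approximately equal representation".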

The stimuli used in the narrative and explanation tasks were similar in length (5 minutes total for each), to ensure that variability in amount, diversity and complexity of gesture and speech production by discourse type was not due to differences in length of the stimuli in each task. Each participant also completed a fluency task in English at the end of the session (see Footnote 2 for a description of the task). All responses were video-recorded.

2.2.1. Narration task

Each participant was shown two cartoons, each approximately 2.5 minutes long, one at a time, resulting in a total of 5 minutes of stimulus exposure time. They were then asked to narrate what happened in the cartoon to the experimenter (i.e., ‘Tell me what happened in the cartoon in as much detail as you can remember’). Each cartoon depicted adventures of a cat and a mouse (Tom & Jerry; Hanna & Barbera, 2014), featuring dynamic, temporally organized event scenes that were well-suited for eliciting gestures in a narrative context (see Figure 1 for stills from one of the cartoons used for the narrative task).

Figure 1. Screenshots from a sample task eliciting narrative.

2.2.2. Explanation task

Each participant was shown five cartoons, each approximately 1 minute long, one at a time, resulting in a total of 5 minutes of stimulus exposure time. After each cartoon, they were asked to explain what happened in the cartoon to the experimenter (i.e., ‘Explain to me what the problem in the cartoon was and how it was solved in as much detail as you can remember’). Each cartoon depicted a mouse and an elephant (Die Sendung mit der Maus; Schmidt, 1971), facing a particular problem and then solving it individually or together, making them suitable for eliciting speech and gestures within an explanatory framework (see Figure 2 for stills from a sample cartoon used for the explanation task).

Figure 2. Screenshots from a sample task eliciting explanation.

2.3. Transcription, coding and reliability

2.3.1. Speech

All speech produced by each participant was transcribed from video recordings by native speakers, using the Codes for the Human Analysis of Transcripts (CHAT) system (MacWhinney, 2000). Speech transcripts were divided into utterances, defined as a sequence of words that were preceded and followed by a change in conversational turn, intonation or pause, following the CHAT system guidelines.

Following previous research (Özçalışkan et al., 2017), we counted as words all meaningful sounds used to refer to concrete and abstract entities (e.g., ‘cat’, ‘idea’, ‘tree’), events (e.g., ‘run’, ‘eat’) or properties of entities or events (e.g., ‘pretty’, ‘fast’), as well as onomatopoeic sounds (e.g., ‘meow’), conventionalized evaluative sounds (e.g., ‘wow’) and function words (e.g., ‘and’, ‘that’, ‘with’). Speech responses were further coded for the amount, diversity and complexity of production. Specifically, we used the number of words (i.e., word tokens) as a measure of speech amount, the number of different words (i.e., word types, a measure of lexical diversity; e.g., ‘cat’ vs. ‘bird’ vs. ‘run’ vs. ‘toward’) as a measure of speech diversity and the mean length of utterance in words (MLU) as a measure of speech complexity, following earlier work (Ozturk et al., 2021). We treated words with the same stem but with derivational morphemes as different words (e.g., ‘run’ vs. ‘runner’ as two different word types), while words with the same stem but with inflectional morphemes (e.g., ‘run’ vs. ‘running’) were considered the same word type. Repetitions of the same sentence, false starts (i.e., a speaker starts a sentence but abandons it without finishing), non-verbal vocalizations (e.g., laughter, throat clearing) and backchannel responses (i.e., listener responses to the experimenter’s instructions, such as ‘yes’, ‘sure’, ‘of course’) were excluded from all analyses.
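The three speech measures described above can be illustrated with a minimal sketch. The function name `speech_measures`, the `stem_of` lookup and the sample utterances are hypothetical and are not taken from the study's materials; the sketch assumes that transcripts have already been cleaned according to the exclusion rules and that inflected forms are mapped to a shared entry by hand.

```python
from typing import Dict, List, Optional


def speech_measures(utterances: List[List[str]],
                    stem_of: Optional[Dict[str, str]] = None) -> Dict[str, float]:
    """Return word tokens, word types and MLU (in words) for one transcript.

    `utterances` is a list of utterances, each a list of words that already
    passed the exclusion rules (no false starts, repetitions or backchannels).
    `stem_of` is a hypothetical lookup that maps inflected forms to a shared
    entry (e.g. 'running' -> 'run'); derived forms such as 'runner' are left
    unmapped so that they count as separate word types.
    """
    stem_of = stem_of or {}
    words = [w.lower() for utt in utterances for w in utt]
    tokens = len(words)                                   # speech amount
    types = len({stem_of.get(w, w) for w in words})       # speech diversity
    mlu = tokens / len(utterances) if utterances else 0.0  # speech complexity
    return {"tokens": tokens, "types": types, "mlu": mlu}


# Hypothetical two-utterance transcript.
sample = [
    ["the", "cat", "is", "running"],
    ["the", "runner", "can", "run", "fast"],
]
result = speech_measures(sample, stem_of={"running": "run"})
print(result)
```

In this toy transcript, ‘running’ and ‘run’ collapse into one word type while ‘runner’ remains a separate type, mirroring the inflectional versus derivational distinction in the coding scheme.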

2.3.2. Gestures

All gestures produced by each participant were also coded. Gesture was defined as a communicative hand or body movement that did not involve direct manipulation of an object, following earlier work (e.g., Özçalışkan & Goldin-Meadow, 2005a). Each gesture was further categorized into different types, including deictic gestures (e.g., pointing at empty space to indicate the imaginary location of a cat), emblematic gestures (e.g., thumbs up to convey okay), iconic gestures (e.g., flapping arms to convey flying, placing a cup-shaped hand in the air as if holding a bird’s nest) and beats (e.g., rhythmic hand gestures marking speech boundaries; McNeill, 1992; Özçalışkan & Goldin-Meadow, 2005a, 2005b). Each gesture was also coded in terms of its informational relation to speech (i.e., gesture + speech combination). These included reinforcing gestures (i.e., gesture-speech combinations where the gesture conveyed the same information as speech; e.g., saying ‘flying’ while flapping arms), emphasizing gestures (i.e., combinations where the gesture marked speech boundaries; e.g., flicking fingers at the end of each speech segment), disambiguating gestures (i.e., combinations in which gesture clarified a pronominal referent; e.g., saying ‘this one’ while pointing to the character on screen) and supplementary gestures (i.e., gesture-speech combinations where the gesture added new information to speech; e.g., saying ‘he ran to that door’ while moving right index finger in a circle to indicate that the door was a revolving door), following the guidelines outlined in earlier work (Özçalışkan & Goldin-Meadow, 2005a, 2005b).

Gesture responses were further coded for the amount, diversity and complexity of production, following earlier work (Ozturk et al., 2021; Pınar et al., 2021). Specifically, we used the total number of gestures (collapsing across beat, deictic, emblematic and iconic gestures) as a measure of gesture amount and the number of different referents conveyed in gesture (e.g., flapping arms for bird vs. tracing whiskers under nose for cat vs. pointing up to indicate upward location vs. nodding for affirmation) as a measure of gesture diversity. For gesture diversity, we did not include beat gestures, as beat gestures do not convey semantic information on their own (i.e., no lexical meaning). Instead, they function to emphasize speech and regulate discourse flow (McNeill, 1992). For gesture complexity, we included all four gesture types (beat, deictic, emblematic, iconic) but divided them into two categories, simple or complex, in terms of their informational relation to speech. Simple gestures consisted of gestures that either conveyed the same information as speech (i.e., reinforcing; e.g., ‘the mama bird is knitting’ + moving fisted hands in opposite circular directions to convey knitting) or emphasized speech (i.e., emphasizing; e.g., ‘the mama bird is knitting’ + moving palms rhythmically to emphasize speech). Complex gestures consisted of gestures that either added new information to speech (i.e., supplementing; e.g., ‘she goes’ + moving fisted hands in opposite circular directions to convey knitting) or clarified a pronominal referent in speech (i.e., disambiguating; e.g., ‘she was knitting’, accompanied by an index-finger point at the previous location of the referred character of mother bird on computer screen).
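As a rough illustration of how the three gesture measures relate to the coding categories, the sketch below tallies amount, diversity and simple versus complex gestures from a list of coded gestures. The `Gesture` record, its field names and the sample data are assumptions made for this example; they do not reproduce the authors' coding files.

```python
from typing import Dict, List, NamedTuple


class Gesture(NamedTuple):
    gtype: str     # 'beat', 'deictic', 'emblematic' or 'iconic'
    referent: str  # meaning conveyed in gesture; empty string for beats
    relation: str  # 'reinforcing', 'emphasizing', 'disambiguating' or 'supplementary'


# Grouping of gesture-speech relations into simple and complex, as in the text.
SIMPLE = {"reinforcing", "emphasizing"}
COMPLEX = {"disambiguating", "supplementary"}


def gesture_measures(gestures: List[Gesture]) -> Dict[str, int]:
    """Tally gesture amount, diversity and simple/complex counts."""
    amount = len(gestures)  # all four gesture types
    # Diversity counts distinct referents and leaves out beats.
    diversity = len({g.referent for g in gestures if g.gtype != "beat"})
    simple = sum(g.relation in SIMPLE for g in gestures)
    complex_count = sum(g.relation in COMPLEX for g in gestures)
    return {"amount": amount, "diversity": diversity,
            "simple": simple, "complex": complex_count}


# Hypothetical coded gestures from one response.
coded = [
    Gesture("iconic", "knitting", "reinforcing"),
    Gesture("beat", "", "emphasizing"),
    Gesture("iconic", "knitting", "supplementary"),
    Gesture("deictic", "mother bird", "disambiguating"),
    Gesture("iconic", "flying", "reinforcing"),
]
measures = gesture_measures(coded)
print(measures)
```

Note that the same referent (‘knitting’) can appear in both a simple and a complex gesture, because complexity depends on the accompanying speech rather than on the gesture form itself.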

The use of three different outcome measures separately for speech and gestures (amount, diversity, complexity) allowed us to provide a more comprehensive account of variability in patterns of speech and gesture production by proficiency and elicitation context – an analysis strategy used in earlier work examining adult and child multi-modal productions (Ozturk et al., Reference Ozturk, Pınar, Ketrez and Özçalışkan2021; Pınar et al., Reference Pınar, Öztürk, Ketrez and Özçalışkan2021).

2.3.3. Reliability

We assessed reliability for gesture coding by a trained independent coder, naïve to the hypotheses of the study, who coded a randomly selected 30% of the video recordings. Inter-coder agreement was high across all coding dimensions. The agreement between coders was 97% (κ = 0.94) for identifying gestures (i.e., presence vs. absence of gestures in an utterance), 98% (κ = 0.96) for assigning meaning to gestures (i.e., referents conveyed in gestures), 93% (κ = 0.91) for coding gesture into types (as deictic, iconic, emblematic, beat) and 92% (κ = 0.89) for coding gesture complexity (i.e., gesture’s informational relation to speech as reinforcing, emphasizing, disambiguating, or supplementing).
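For readers less familiar with the two agreement statistics reported above, the following sketch computes percent agreement and Cohen's kappa for two coders. The function name and the ten sample labels are hypothetical; the formula is the standard definition of kappa, κ = (p_o − p_e) / (1 − p_e), and is not a reconstruction of the authors' own script.

```python
from collections import Counter
from typing import Sequence, Tuple


def agreement_and_kappa(coder_a: Sequence[str],
                        coder_b: Sequence[str]) -> Tuple[float, float]:
    """Return (proportion agreement, Cohen's kappa) for two coders."""
    if len(coder_a) != len(coder_b) or len(coder_a) == 0:
        raise ValueError("both coders must label the same non-empty set of items")
    n = len(coder_a)
    # Observed agreement: share of items given the same label by both coders.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement: product of the coders' marginal label proportions.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when both coders use one identical label")
    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa


# Hypothetical gesture-type codes for ten gestures.
coder_1 = ["iconic"] * 4 + ["beat"] * 3 + ["deictic"] * 3
coder_2 = ["iconic"] * 4 + ["beat"] * 2 + ["deictic"] * 4
observed, kappa = agreement_and_kappa(coder_1, coder_2)
print(f"agreement = {observed:.2f}, kappa = {kappa:.2f}")
```

Because kappa discounts agreement expected by chance, it is lower than raw percent agreement whenever the coders' label distributions overlap, which is the pattern visible in the values reported for each coding dimension.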

2.4. Statistical analysis

We analyzed speech and gestures separately using a set of Generalized Linear Mixed Models (GLMMs). GLMMs allowed us to account for the non-normal distribution of the data and to model the count-based outcomes associated with speech and gesture production. We conducted three separate GLMMs for speech, one for each of the three outcome measures (speech amount, speech diversity, speech complexity). Similarly, we conducted three separate GLMMs for gesture, one for each of the three outcome measures (gesture amount, gesture diversity, gesture complexity). As an exploratory analysis, we also conducted GLMMs on types of gestures (beat, deictic, emblematic, iconic) and simple gestures (i.e., gestures that only reinforce or emphasize the accompanying speech).

Each model had the same structure, Outcome = Proficiency * Context + (1 | Subject_ID), for each of the six outcome measures (see Footnote 3). Proficiency (native, high, low) and context (narrative, explanation) were included as fixed effects, and participant was treated as a random effect due to the repeated-measures design. Dummy coding was applied to each of the two categorical independent variables: the native group served as the baseline for the proficiency variable (native, high, low) and the narrative task served as the baseline for the discourse context variable (narrative, explanation). Data manipulation was conducted using Python (Python Software Foundation, 2023), with the Pandas (McKinney, 2010) and NumPy (Harris et al., 2020) libraries facilitating data management. Data visualization was performed using Matplotlib (Hunter, 2007) and Seaborn (Waskom, 2021). Scikit-learn (Pedregosa et al., 2011) was employed to standardize the data and ensure consistency in scale. For multiple regression model analyses, we used the Statsmodels package (Seabold & Perktold, 2010).
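The dummy coding described above can be made concrete with a small sketch that builds the fixed-effects row for one observation, with the native group and the narrative task as baselines. The column names, the `design_row` helper and the formula string are assumptions for illustration only; the source does not report the exact model call, distribution family or link function, and in Statsmodels the random intercept for participant would be specified separately from the fixed-effects formula.

```python
from typing import Dict

# The first entry of each list is the baseline (reference) level.
PROFICIENCY_LEVELS = ["native", "high", "low"]
CONTEXT_LEVELS = ["narrative", "explanation"]

# Hypothetical patsy-style fixed-effects formula with explicit baselines.
# It is shown for orientation and is not executed here.
FORMULA = ("outcome ~ C(proficiency, Treatment(reference='native'))"
           " * C(context, Treatment(reference='narrative'))")


def design_row(proficiency: str, context: str) -> Dict[str, int]:
    """Return the dummy-coded fixed-effects row for one observation."""
    if proficiency not in PROFICIENCY_LEVELS or context not in CONTEXT_LEVELS:
        raise ValueError("unknown proficiency or context level")
    row = {"intercept": 1}
    for level in PROFICIENCY_LEVELS[1:]:
        row[level] = int(proficiency == level)
    for level in CONTEXT_LEVELS[1:]:
        row[level] = int(context == level)
    # Interaction columns are products of the corresponding dummies.
    for p in PROFICIENCY_LEVELS[1:]:
        for c in CONTEXT_LEVELS[1:]:
            row[f"{p}:{c}"] = row[p] * row[c]
    return row


baseline_row = design_row("native", "narrative")
low_explanation_row = design_row("low", "explanation")
print(baseline_row)
print(low_explanation_row)
```

Under this coding, the intercept corresponds to native speakers in the narrative task, each main-effect coefficient is a contrast against that baseline cell, and each interaction coefficient captures how the context contrast changes for a given bilingual group.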

3. Results

3.1. Patterns of speech production

We first examined whether production of speech showed variability by speaker proficiency (native, high, low) and discourse context (narratives, explanations). Beginning with speech amount, we found no main effect of speaker proficiency (β = 0.29, 95% CI [−0.30, 0.88], z = 0.30, p = 0.33) but a main effect of context (β = 0.42, 95% CI [0.07, 0.77], z = 2.33, p = 0.01). Overall, speakers produced a greater amount of speech in the narrative than in the explanation context. The interaction between proficiency and discourse context approached, but did not reach, significance (β = 0.47, 95% CI [−0.02, 0.98], z = 1.87, p = 0.06), with a tendency for a greater amount of speech production in the narrative context, but only among native English speakers and bilinguals with high L2 English proficiency (see Figure 3, panel A1).

Figure 3. Mean amount (A1–A2), diversity (B1–B2) and complexity (C1–C2) of speech and gesture production in English by bilinguals with low proficiency, bilinguals with high proficiency and monolingual English speakers with native proficiency in narrative (solid red bars) and explanation tasks (striped bars). Error bars represent standard error; also note that the scales of the bars in panels A and B are different and panel C2 only depicts the means for complex gestures (i.e., gestures that either disambiguate speech or add new information to speech).

Turning next to speech diversity (i.e., number of different words), we found no main effect of speaker proficiency (β = 0.17, 95% CI [−0.42, 0.77], z = 0.58, p = 0.56), but there was a marginal effect of context (β = 0.35, 95% CI [−0.02, 0.73], z = 1.88, p = 0.06): speakers tended to show greater speech diversity in the narrative than in the explanation context – consistent with their amount of speech production. There was no interaction between proficiency and discourse context (β = 0.06, 95% CI [−0.47, 0.59], z = 0.23, p = 0.82; see Figure 3, panel B1).

Turning last to speech complexity (i.e., MLU), we found a main effect of proficiency: native speakers produced more complex speech (i.e., longer sentences) compared to bilinguals with either high proficiency (β = 0.70, 95% CI [0.17, 1.24], z = 2.61, p = 0.01) or low proficiency (β = 1.22, 95% CI [0.70, 1.76], z = 4.53, p < 0.001). However, there was neither an effect of context (β = 0.03, 95% CI [−0.24, 0.29], z = 0.19, p = 0.85) nor an interaction between speaker proficiency and discourse context (β = 0.19, 95% CI [−0.19, 0.57], z = 0.99, p = 0.32) in the complexity of speech production; see Figure 3, panel C1.

It is important to note here that participants spent more time on the narrative than on the explanation task (narrative: M = 200.30 seconds, SD = 121.42 vs. explanation: M = 160.00 seconds, SD = 85.39; t(66) = 3.62, p = 0.001, d = 0.44), even though the total stimulus exposure time was comparable in the two elicitation contexts by design. This difference was particularly pronounced among speakers in the native (M = 155.65, SD = 90.67 vs. M = 119.96, SD = 78.69; t(21) = 3.02, p = 0.006, d = 0.63) and high proficiency (M = 222.36, SD = 139.67 vs. M = 161.14, SD = 80.80; t(21) = 3.39, p = 0.003, d = 0.72) groups, but was less evident in the low proficiency group (narrative: M = 224.91, SD = 122.02 vs. explanation: M = 200.73, SD = 80.04; t(21) = 0.93, p = 0.36, d = 0.20).
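The within-participant comparison of task durations can be sketched as a paired t-test with a standardized effect size. The durations below are invented for illustration, and the effect size is computed as the mean difference divided by the standard deviation of the differences, which is one common variant of Cohen's d for paired data; the source does not state which variant was used.

```python
import math
from statistics import mean, stdev
from typing import Sequence, Tuple


def paired_t_and_d(first: Sequence[float],
                   second: Sequence[float]) -> Tuple[float, int, float]:
    """Return (t statistic, degrees of freedom, Cohen's d) for paired samples."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two samples of equal length with at least two pairs")
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    # Effect size: mean difference in units of the SD of the differences.
    d = mean(diffs) / stdev(diffs)
    # Paired t statistic: mean difference divided by its standard error.
    t = d * math.sqrt(n)
    return t, n - 1, d


# Hypothetical response durations (seconds) for five participants.
narrative_secs = [210.0, 180.0, 250.0, 160.0, 200.0]
explanation_secs = [170.0, 175.0, 190.0, 150.0, 185.0]
t_stat, df, cohen_d = paired_t_and_d(narrative_secs, explanation_secs)
print(f"t({df}) = {t_stat:.2f}, d = {cohen_d:.2f}")
```

A paired design is appropriate here because every participant contributed durations in both tasks, so the comparison is made on within-person differences rather than on two independent groups.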

3.2. Patterns of gesture production

We next examined patterns of gesture production by speaker proficiency and discourse context. Beginning with gesture amount (i.e., number of gestures), our results showed no main effect of proficiency (β = 0.06, 95% CI [−0.87, 0.99], z = 0.13, p = 0.89), but a main effect of context (β = 0.43, 95% CI [0.04, 0.82], z = 2.18, p = 0.02). Speakers produced a greater number of gestures in the narrative than in the explanation context – a finding consistent with their patterns of speech production. There was no interaction between proficiency and discourse context (β = 0.06, 95% CI [−0.48, 0.60], z = 0.23, p = 0.82; see Figure 3, panel A2).

Speakers – across proficiency levels and contexts – produced iconic gestures (e.g., moving index finger in circles to convey a revolving door, moving the index and middle finger forward repeatedly to convey a bird’s pecking; M = 24.02, SD = 18.33) and beat gestures (M = 21.59, SD = 18.12) the most, followed by deictic gestures (e.g., pointing to the screen to indicate a character; M = 7.34, SD = 6.12) and emblematic gestures (e.g., shaking head for negation; M = 3.02, SD = 3.62). As can be seen in Table 2, the distribution of each gesture type also varied by speaker proficiency and discourse context. Beginning with iconic gestures, we found no effect of proficiency, but an effect of context favoring the narrative context (β = 0.28, 95% CI [0.02, 0.55], z = 2.10, p = 0.04), which also interacted with proficiency (β = 0.41, 95% CI [0.02, 0.81], z = 2.04, p = 0.04): speakers with low proficiency produced more iconic gestures in the explanation than in the narrative context. Turning next to beat gestures, we found a marginal effect of proficiency (β = 0.48, 95% CI [−0.02, 0.98], z = 1.88, p = 0.06) and a main effect of context (β = 0.69, 95% CI [0.32, 1.06], z = 3.66, p = 0.001). There was also an interaction between proficiency and context (β = 0.54, 95% CI [0.06, 1.01], z = 2.22, p = 0.03): speakers with lower proficiency produced more beats, but only in the explanation context. Deictic gestures followed a similar pattern: our results showed a main effect of proficiency (β = 0.63, 95% CI [0.13, 1.14], z = 2.44, p = 0.02), with greater deictic gesture production among bilinguals with low proficiency. Deictic gesture production also showed a main effect of context (β = 0.63, 95% CI [0.26, 0.99], z = 3.39, p = 0.001), with greater deictic gesture use in the narrative context, but no interaction (β = 0.31, 95% CI [−0.26, 0.79], z = 1.23, p = 0.22).
The use of emblematic gestures showed no effect of proficiency (β = 0.14, 95% CI [−0.62, 0.91], z = 0.37, p = 0.71) or context (β = 0.15, 95% CI [−0.20, 0.50], z = 0.82, p = 0.41), and no proficiency × context interaction (β = 0.44, 95% CI [−0.13, 1.02], z = 1.51, p = 0.13).

Table 2. Mean (SD) production of gesture types by speaker proficiency and discourse context

Note: SD, standard deviation.

Turning next to gesture diversity (i.e., number of different referents conveyed in gesture), we found no main effect of speaker proficiency (β = 0.17, 95% CI [−0.71, 1.04], z = 0.38, p = 0.70) or context (β = 0.16, 95% CI [−0.19, 0.50], z = 0.88, p = 0.38), and no interaction between proficiency and discourse context (β = 0.16, 95% CI [−0.32, 0.63], z = 0.64, p = 0.52; see Figure 3, panel B2).

Turning last to gesture complexity, we first examined complex gestures (i.e., gestures that disambiguate or supplement speech). As can be seen in Figure 3 panel C2, we found no main effect of context (β = 0.29, 95% CI [−0.18, 0.74], z = 1.50, p = 0.14) but a significant main effect of proficiency (β = 1.15, 95% CI [0.19, 2.09], z = 2.37, p = 0.02): bilinguals with low proficiency used gestures in more complex ways to add new information to speech, showing the opposite pattern from what was observed for speech complexity. We also found an interaction between proficiency and context (β = 0.88, 95% CI [0.35, 1.48], z = 3.23, p = 0.001), with bilinguals with low proficiency producing more complex gestures, but only in the explanation context (see Figure 3, panel C2).

Looking next at simple gestures (i.e., gestures that reinforce or emphasize speech; not shown in Figure 3), we found no effect of proficiency (β = 6.84, 95% CI [−8.78, 22.45], z = 0.85, p = 0.39), but a marginal main effect of context favoring the narrative context (β = 10.07, 95% CI [−1.44, 21.59], z = 1.71, p = 0.09). Context also interacted with proficiency, with bilinguals with low proficiency producing more simple gestures in the explanation context (β = 24.35, 95% CI [8.43, 40.26], z = 2.10, p = 0.003).

Speakers – across proficiency levels and discourse contexts – primarily used gestures to reinforce (e.g., ‘revolving door’ + moving index finger in circles to convey revolving; M = 26.66, SD = 21.78) or emphasize (i.e., beats; M = 14.68, SD = 13.07) their speech. As can be seen in Table 3, using gesture in more complex ways to either supplement speech (e.g., ‘the mouse uses his tail to exit the hole’ + spinning index finger upward to convey spinning motion; M = 4.43, SD = 3.43) or disambiguate speech (e.g., ‘the mouse looks for something sweet inside that’ + pointing to sugar bowl on screen; M = 3.73, SD = 0.58) was overall less frequent across groups and contexts. At the same time, however, bilinguals with low proficiency used gestures more to add further information to their speech than the other two groups (see Figures 4 and 5 for descriptions of a sample scene in speech and gesture by speakers with different levels of proficiency in the narrative and the explanation context).

Table 3. Mean (SD) production of simple and complex gestures by speaker proficiency and discourse context

Note: SD, standard deviation.

Figure 4. The sample scene of the mother bird knitting (A), and its depiction in gesture by a monolingual speaker with native fluency (B) and by bilingual speakers with either high (C) or low (D) English proficiency. The speakers with native and high proficiency both used gesture to further reinforce the information already conveyed in their speech (‘the mama bird is knitting’ + moving fisted hands in opposite circular directions to convey knitting); the speaker with low English proficiency used gesture to add new information not expressed in speech (‘she is doing something’ + moving fisted hands in opposite circular directions to convey knitting).

Figure 5. The sample explanation scene of a mouse pumping a tire (A), and its depiction in gesture by a monolingual speaker with native English fluency (B), and by bilingual speakers with either high (C) or low (D) English proficiency. The speakers with native and high proficiency both used gestures to further reinforce the information already expressed in speech (‘the mouse is pumping the tire’ + moving cupped hands up and down rapidly to convey pumping); the speaker with low English proficiency used gestures to add new information that was missing in speech (‘she is doing this’ + moving cupped hands up and down rapidly to convey pumping).

4. Discussion

Gesture and speech form an integrated system in first language (L1) production (Kita et al., 2017; McNeill, 1992). In this study, we asked whether this pattern extends to second language (L2) production across different proficiency levels (native, high, low) and discourse contexts (narratives, explanations), using data from a sample of 44 Persian (L1)-English (L2) bilinguals – 22 with high and 22 with low L2 proficiency – and 22 native English speakers. Our analysis showed an effect of context on the amount of speech and gesture production, suggesting close coupling between the two systems in L2 production. We also found a strong effect of proficiency on the complexity of speech and gesture production – with lower complexity in speech coupled with higher complexity in gesture production by bilinguals with low L2 proficiency, compared to monolinguals and bilinguals with high proficiency – suggesting a compensatory role for gesture in the communications of bilinguals with lower proficiency in L2.

4.1. Does discourse context affect patterns of speech and gesture production?

The participants in our study – monolinguals and bilinguals – produced more speech in their narratives than in their explanations. This finding aligns well with previous research, which also showed an effect of discourse context on speech production, with an advantage for narratives (Masson-Carro et al., Reference Masson-Carro, Goudbeek and Krahmer2017; Nicoladis et al., Reference Nicoladis, Pika, Yin and Marentette2007; Yoshioka, Reference Yoshioka2008). One possible explanation for the greater amount of speech production in narratives could be the nature of narrative practices in English. Several studies have shown that languages like English emphasize chronicle-style narratives focusing on actions (e.g., how events happened), which, in turn, result in more detailed descriptions and greater speech production in the narrative context (Nicoladis et al., Reference Nicoladis, Nagpal, Marentette and Hauer2018; Ryan, Reference Ryan1993; Tannen, Reference Tannen and Chafe1980).

More importantly, participants also produced more gestures in their narratives than in their explanations, suggesting a strong connection between the two modalities in production. The close coupling between speech and gesture use has been shown in earlier work in both L1 (de Ruiter, Reference De Ruiter and McNeill2000; Kita & Özyürek, Reference Kita and Özyürek2003; McNeill, Reference McNeill1992; Özçalışkan et al., Reference Özçalışkan2016) and L2 production (Cruz, Reference Cruz2021; Nicoladis, Reference Nicoladis2007; Nicoladis et al., Reference Nicoladis, Nagpal, Marentette and Hauer2018; Özçalışkan, Reference Özçalışkan2016; So et al., Reference So, Kita and Goldin-Meadow2013). Our study furthers these findings, showing a similar pattern in the amount of speech and gesture production in a new and understudied group of speakers, namely, Persian (L1)-English (L2) bilinguals.

The greater amount of speech and gesture production in the narratives compared to explanations could also be an outcome of the greater amount of time speakers spent in the narrative task than in the explanation task. We kept the stimulus time similar in the two tasks and did not impose any time constraints on speakers’ responses so as not to influence patterns of speech and gesture production. The speakers in our study – with the exception of bilinguals with low proficiency – spent more time telling narratives than giving explanations. Future studies that place a time cap on production are needed to further elucidate the effect of time pressure on patterns of speech and gesture production in the two discourse contexts.

Importantly, however, the context-based variability in production was not quite as evident when we looked at the diversity of meanings conveyed in speech and gesture. Speakers did not produce reliably more diverse speech (i.e., different words) in their narratives than in their explanations – a pattern that was also observable in the diversity of meanings conveyed in gesture (i.e., different gesture referents). This finding differs from earlier work (Bayazidi et al., 2019), which showed lower diversity in speech production in the narrative context than in contexts that elicit descriptive and argumentative discourse among Persian (L1)-English (L2) bilinguals. However, Bayazidi et al. (2019) did not include speaker proficiency as a variable in their study, which thus differed in its design from ours. The question remains, however, why speakers differed in the amount but not the diversity of their speech and gesture production across the two discourse contexts. One possibility could be the nature of the tasks. The stimuli – both narratives and explanations – centered around the same characters, with specific verbal prompts in elicitation (i.e., Narrative: ‘Can you tell me what happened in the video’; Explanation: ‘Can you tell me what the problem in the video was and how the character resolved it’), which, in turn, might have limited the range of referents participants could employ in their speech and gesture production. In fact, the participants frequently referred to the same character in their speech and gestures (e.g., ‘the mouse’; pointing at the screen where the mouse was; ‘the bird’; pecking gesture for woodpecker) multiple times in the production of a single narrative or explanation. This, in turn, might have affected the diversity of the meanings conveyed in speech and gesture equally in both discourse contexts.
The relatively narrow range of characters in the stimuli of both discourse contexts might be a possible limitation of our study. As such, future studies that elicit narratives and explanations using cartoons with a greater range of characters, settings and events might shed further light on this possibility.

The lack of context-based variability was also observed in the complexity of speech and gesture production. Research on the complexity of speech and gesture is quite limited. However, one such study on bilinguals with high proficiency found that they produced more complex speech (i.e., longer utterances) in more demanding tasks (e.g., argumentation; Bayazidi et al., Reference Bayazidi, Ansarin and Mohammadnia2019). The relatively small body of research on gesture complexity has focused primarily on narratives produced by bilinguals, with no comparison to other discourse contexts. Our study is the first of its kind to examine the effect of discourse context on the complexity of gesture production, revealing no reliable context effects.

More importantly, the context-based variability (or lack thereof) in speech and gesture production was tightly integrated. Narratives elicited a greater amount of speech than explanations, a pattern that was also mirrored in gesture production. Conversely, narratives and explanations did not differ reliably in the diversity or complexity of speech production – a pattern that was also reflected in gesture production. This close coupling between the two modes of communication suggests that gesture and speech form tightly integrated systems across different production contexts and across proficiency levels.

4.2. Does speaker proficiency affect patterns of speech and gesture production?

The three groups of participants in our study did not show differences in the amount of speech or gesture production – a pattern that diverged from our predictions. This finding also contrasts with earlier work that showed greater speech production in bilinguals with low proficiency as compared to those with high proficiency (e.g., Gullberg, Reference Gullberg, Dimroth and Starren2003, Reference Gullberg2006; So et al., Reference So, Kita and Goldin-Meadow2013; Yoshioka, Reference Yoshioka2008). Several factors could explain the lack of proficiency effects in our study. One possibility could be the plateau effect in language proficiency (Richards, Reference Richards2008). After reaching an intermediate level, learners may experience a plateau where further increases in proficiency do not significantly impact language production in their L2. The participants in our study had all passed the English proficiency test required for admission into colleges in the United States and had lived in an English-speaking environment for an extended period of time. Therefore, even though the low proficiency group was lower in proficiency than the high proficiency group, they both were still using English frequently in their everyday lives. As such, they may have accumulated enough language experience to manage tasks like narratives and explanations at levels comparable to native speakers or bilingual speakers with high proficiency. On a related point, Han (Reference Han2004) discusses the concept of fossilization, where language development slows down or stops, potentially leading to similar speech patterns among intermediate and advanced learners. The concepts of plateau and fossilization together might explain why our participants, despite varying proficiency levels, exhibited comparable production in both speech and gesture. In fact, one possible limitation of our study could be the relatively smaller proficiency gap between the two bilingual groups. 
Future studies on bilinguals with more graded proficiency differences can shed further light on the relative effect of more granular proficiency differences on patterns of speech and gesture production.

It is important to note that our criteria for categorizing bilinguals into two groups were based on multiple factors, including English test scores, verbal fluency measures and length of residency. Although test scores are often treated as the primary indicator of proficiency, other dimensions, such as verbal fluency, play an equally critical role. Rosselli et al. (2002), for example, showed that verbal fluency outcomes are directly influenced by speakers’ proficiency in a language. Length of residency has also been established as a key factor. As shown by Park (2020), extended exposure to an L2-dominant environment substantially contributes to proficiency, with bilinguals who had more than 5 years of L2 residency producing more diverse vocabulary than those with only 2 years of residency. Azar et al. (2020) also found that heritage speakers with native-like proficiency and long-term residency in the L2 environment did not differ from monolinguals in gesture frequency. This suggests that with sufficiently high proficiency and extended exposure, differences in gesture production between bilinguals and monolinguals may diminish. In line with this, our definition of ‘high proficiency’ required not only strong performance on standardized measures but also a longer period of residency. This allowed us to ensure that verbal fluency and broader communicative competence in the L2 were both considered in assessing proficiency. More specifically, while bilinguals can reach high levels of test-based proficiency, achieving strong verbal fluency – as shown in measures such as the fluency test (Portocarrero et al., 2007) – typically depends on prolonged residency and sustained experience with the target language.

The comparability of overall speech (and consequently gesture) production might also be a result of more formulaic language use – chunks of memorized phrases, such as ‘all of a sudden’, ‘then out of nowhere’, or ‘that is how it happened’ – among bilinguals with lower proficiency in our study. The use of such formulaic speech might have allowed them to produce enough speech to appear comparable to the other two groups in terms of the sheer amount of speech output (Pawlak, Reference Pawlak2011). This, in fact, aligns well with Grosjean’s (Reference Grosjean2010) notion of functional bilingualism, where bilinguals develop coping strategies that enable them to adapt their language use effectively, thereby reducing variability across proficiency levels. Future studies that include bilinguals at lower proficiency levels (e.g., beginners) or that analyze speech content with particular attention to formulaic expressions might offer more insight into the lack of proficiency-based differences in the relative amount of speech and gesture production.

The lack of proficiency-based variability was also evident in the diversity of speech and gesture production – a pattern that also differed from our predictions. Earlier work focusing on the diversity of speech production has shown that bilinguals with high proficiency used a more diverse lexicon in speech than those with low proficiency (Johns et al., Reference Johns, Sheppard, Jones and Taler2016; Treffers-Daller, Reference Treffers-Daller, Richards, Daller, Malvern, Meara, Milton and Treffers-Daller2009). For example, Treffers-Daller (Reference Treffers-Daller, Richards, Daller, Malvern, Meara, Milton and Treffers-Daller2009) conducted a study on French (L1)-English (L2) bilinguals with varying levels of proficiency in a picture description task and found that bilinguals with high proficiency displayed higher levels of lexical diversity, particularly for verbs. Our findings, in contrast, did not replicate the previously observed proficiency-based differences in speech diversity and extended this lack of difference to the diversity of meanings conveyed in gesture. One possible reason for this could be the task design and complexity. Tasks that are too simple or too structured can limit the range of linguistic and gestural expressions participants could use. Robinson (Reference Robinson2001) suggests that task complexity influences language output, namely, that less complex tasks may produce more uniform results across proficiency levels. Therefore, the relatively simple structure of the narrative and explanation stimuli, along with the specific instructions, might have constrained the extent of expressive possibilities, resulting in similar levels of lexical diversity in speech and gesture across different proficiency groups.

Importantly, however, we observed that the patterns of speech and gesture production remained closely aligned across all proficiency levels. Specifically, the three groups (native, high, low) did not show differences in either the amount or the diversity of their speech production; this was also the case for their gesture production. This suggests that the integration of speech and gesture operates similarly across proficiency levels, reinforcing the idea that both modalities are tightly linked in communication regardless of language proficiency. These findings further support earlier models of gesture-speech integration, where the two modalities form a tightly integrated system from conceptualization to articulation (Kita & Özyürek, 2003; McNeill, 1992).

The lack of group differences in the amount and diversity of speech and gesture production, however, co-occurred with a strong effect of proficiency on the complexity of speech and gesture production. Bilinguals with low proficiency produced less complex speech than both bilinguals with high proficiency and monolinguals with native proficiency – a pattern consistent with earlier work suggesting that higher proficiency correlates with greater linguistic complexity (Bayazidi et al., 2019). More importantly, this pattern was reversed for the complexity of gestures. Bilinguals with low proficiency produced more complex gestures that either disambiguated their speech or added information not expressed verbally, suggesting a compensatory role for gesture.

This finding may be linked to earlier theories of gesture-speech production: Krauss et al. (1991) and Butterworth and Hadar (1989) suggest that gestures occur more frequently in situations where speech difficulties arise. Rauscher et al. (1996), for example, found that gesturing facilitates speech, especially in discourse contexts requiring complex spatial descriptions. Addressing this compensatory function, de Ruiter (2000) also proposed the Sketch Model, which considers gesture and speech as two separate but interrelated systems. As he argues, when one system does not function fully, the other can compensate. We see evidence of a similar phenomenon in adults with aphasia, for whom speech production is effortful. Not surprisingly, adults with aphasia frequently rely on gesture to either replace speech (i.e., pantomimes) or add information to their speech (e.g., Cocks et al., 2011; Mol et al., 2013; Ozturk & Özçalışkan, 2024; Sekine et al., 2013). A similar pattern might be evident for bilinguals – especially those with lower L2 proficiency – who might use gestures to convey meanings that they have difficulty expressing through speech. In fact, the participants in our low proficiency group frequently used gestures to clarify or supplement their L2 speech (‘He is doing this’ + moving cupped hands up and down rapidly to convey pumping; ‘She is doing something’ + moving fisted hands in opposite circular directions to convey knitting).
As proposed in the Sketch Model (de Ruiter, 2000), these gestures appear to compensate for difficulties in L2 production and assist speech production by providing visual cues and by expanding the complexity of the meaning conveyed in speech. This was particularly evident in the explanation task, where problem-solving and reasoning were required, and bilinguals with low proficiency produced more complex gestures. This illustrates how proficiency interacts with context, with the low-proficiency group producing more complex gestures in tasks demanding higher cognitive resources. While our findings align with these accounts in suggesting that bilinguals with low proficiency use gestures as a compensatory tool, it is important to note that our study did not directly test whether gestures facilitated lexical retrieval (e.g., Krauss et al., 2000; Rauscher et al., 1996). Rather, our evidence points more broadly to the role of gestures in supplementing or clarifying speech and enriching communication when linguistic resources in the L2 are more limited.

The compensatory role of gesture in language production of adult L2 speakers with low proficiency in our study also aligns well with earlier studies with children on L1 development. Children frequently use gestures to supplement their speech across multiple milestones in their L1 development, from early sentences (e.g., Goldin-Meadow & Butcher, 2003; Özçalışkan & Goldin-Meadow, 2005a) to early narratives and explanations (Demir et al., 2015; Özçalışkan, 2007; Özçalışkan & Goldin-Meadow, 2025; Stites & Özçalışkan, 2017, 2021). Moreover, gestures may reduce production demands, facilitating communication for speakers with limited linguistic proficiency (Goldin-Meadow & Alibali, 2013). Similarly, in second language learning contexts, gestures might serve as a compensatory tool, helping bilinguals with low proficiency convey complex ideas they cannot express verbally.

Of interest, one other domain where we observed differences based on speaker proficiency was the relative production of different gesture types (i.e., iconic, point, beat, emblem). Bilinguals with lower proficiency produced more pointing and beat gestures, particularly in the explanation context. The greater production of deictic gestures by the low proficiency group further extends earlier work: bilingual children have been shown to use pointing gestures more frequently when speaking in their weaker language (Nicoladis, 2007). We also know from earlier work that monolingual children rely on pointing gestures extensively at the early stages of language learning to express meanings that they cannot yet convey in speech (e.g., Iverson & Goldin-Meadow, 2005; Özçalışkan et al., 2017). Even adult bilinguals with native-like proficiency in both languages use deictic gestures more often than their monolingual counterparts (see Azar et al., 2020, for a review).

Similar to this earlier work, bilinguals with low proficiency in our study used more pointing gestures, especially when referring to entities or actions they struggled to lexicalize (e.g., ‘then that thing helped’ + pointing at the screen to indicate the pump; ‘he fell into that’ + pointing at the screen to indicate the deep hole that the main character fell into). We also know that pointing tends to increase under more challenging communicative conditions (Alibali et al., 2000), raising the possibility that this strategy persists into adulthood, particularly for bilinguals navigating asymmetries in proficiency. Unlike points, beats are regarded as gestures without symbolic referents, either emphasizing speech or regulating discourse flow (McNeill, 1992). Beyond emphasis, beats have been shown to support attention and facilitate semantic integration (Dimitrova et al., 2016; Molnar et al., 2023; Prieto et al., 2018). In explanation tasks with prompts such as ‘tell me what the problem is and how the character resolves it’, beat gestures may thus have helped participants ensure clarity and highlight critical aspects of their account. Prior research also indicates that bilinguals often rely more on beats in their weaker language (Molnar et al., 2023), and among monolingual adults, beats have been linked to alleviating tip-of-the-tongue states (Ravizza, 2003). The frequent use of beats and points, particularly in the explanation context, thus suggests that both gesture types remain useful resources for adults with lower proficiency, supporting referential clarity and maintaining discourse coherence when linguistic resources are more limited.

In contrast, the use of iconic gestures – those resembling entities or actions (McNeill, 1992) – did not differ significantly across proficiency groups. All participants used more iconic gestures during narratives than explanations, consistent with McNeill’s (1992) observation that storytelling contexts elicit greater reliance on depictive gestures to illustrate actions and events. Iconic gestures have been shown to aid both language access and conceptual activation for speakers and listeners alike (Nicoladis, 2007). Similar to pointing, they help resolve tip-of-the-tongue states (Frick-Horbury & Guttentag, 1998) and can supplement speech when lexical access falters (Nicoladis, 2002). The consistent use of iconic gestures across groups in our study suggests that, regardless of proficiency, speakers drew on the depictive affordances of iconic gestures to enrich their narratives. Importantly, however, similarity in the amount of iconic gesture production across different proficiency levels also co-occurred with variability in the function of these iconic gestures in the two discourse contexts. More specifically, and as reflected in the interaction between gesture type and complexity, iconic gestures served different communicative functions for the different speaker groups. For speakers with higher proficiency, iconic gestures predominantly reinforced spoken content, thus further augmenting the content conveyed in speech (e.g., saying ‘pumping’ while producing a pumping gesture). In contrast, speakers with lower proficiency relied on iconic gestures more to supplement their speech: they used iconic gestures to add new information to their speech or to clarify actions, referents or meanings that were underspecified in their speech (e.g., saying ‘goes like this’ while producing a pumping gesture).
Thus, while the overall frequency of iconic gestures was comparable across groups, the complexity of these gestures in relation to speech differed as a function of proficiency and discourse context – with greater production of iconic gestures to supplement speech among bilinguals with lower L2 proficiency, particularly in the relatively more difficult explanation context.

Our study focused on productions in English as either L1 or L2. An important open question, however, is how the patterns of speech and gesture production we observed in monolingual English L1 speakers compare to those of monolingual Persian L1 speakers. The absence of this comparison is a limitation of our study that was beyond our control: due to the geopolitical context in the United States, we were unable to obtain ethics board approval to conduct research with Persian monolinguals in Iran, which prevented us from including data from monolingual Persian speakers.

In summary, our analysis of speech and gestures produced by speakers with respect to discourse context and speaker proficiency showed that gesture and speech form tightly integrated systems. Gesture both mirrors the patterns observed in speech (e.g., greater amount of speech and gestures in narratives than explanations) and compensates for difficulties in speech production (i.e., greater use of more complex gestures by speakers with low proficiency in the more challenging explanation context). These findings contribute to our understanding of the intricate relationship between speech and gesture in bilingual communication, highlighting that this integration becomes even more pronounced in speakers with lower proficiency. By revealing that bilinguals with lower proficiency employ more complex gestures to overcome linguistic challenges, our study extends existing theories on gesture’s compensatory role in language production. This underscores the adaptive strategies speakers use to maintain effective communication, emphasizing gestures as a crucial resource in conveying complex ideas when speech alone is insufficient. Furthermore, our research emphasizes the influence of discourse context on gesture use, suggesting that task complexity can modulate the reliance on gestural communication, especially among those with limited language proficiency. These insights have practical implications for language education, where encouraging gesture use could facilitate language learning and fluency in second-language speakers. Future research could build on these findings by exploring other communicative contexts or by examining how gestures interact with other non-verbal modalities to support communication in bilinguals, particularly focusing on groups with lower language proficiency.

Data availability statement

The quantitative data for the study are available on the Open Science Framework (OSF) and can be accessed using the link (https://osf.io/7rwj6/?view_only=0ffdb1aca97248c5af7799ae6b732a70).

Competing interests

The authors declare none.

Footnotes

1 Our original goal was to make the three groups comparable for gender as well – a goal we attained for native English speakers and bilinguals with low English proficiency (13 females, 9 males per group). The recruitment of bilinguals with high proficiency who also met our residency, TOEFL and fluency score criteria presented challenges. This was because many female Persian L1 speakers with high English proficiency tend to return to Iran after completing their studies, resulting in a smaller number of available female speakers in this group. Given these recruitment constraints, we ensured that groups remained comparable on key background variables (age, education), but not on gender.

2 For the verbal fluency task, the participant was first asked to generate as many words as possible related to a specific category, such as animals and fruits, in 1 minute, assessing their category (i.e., semantic) fluency. The participant was next asked to produce words starting with a given letter, such as “A”, “S” or “F”, in 1 minute, assessing their letter (i.e., phonemic) fluency. The mean fluency score (as reported here) – in line with earlier work (Portocarrero et al., 2007) – provides an average across both semantic and letter fluency scores.

3 Each dependent variable was uncorrelated with the others, and each GLMM was conducted separately for its respective dependent variable. As a result, the power for each GLMM remained consistent at 85% with an alpha level of 0.05, assuming a medium effect size (R² = 0.20).

References

Alibali, M. W., Kita, S., & Young, A. J. (2000). Gesture and the process of speech production: We think, therefore we gesture. Language and Cognitive Processes, 15, 569–613. https://doi.org/10.1080/016909600750040571
Arslan, B., Aktan-Erciyes, A., & Göksun, T. (2023). Multimodal language in bilingual and monolingual children: Gesture production and speech disfluency (pp. 1–13). Bilingualism: Language and Cognition.
Azar, Z., Backus, A., & Özyürek, A. (2020). Language contact does not drive gesture transfer: Heritage speakers maintain language specific gesture patterns in each language. Bilingualism: Language and Cognition, 23(2), 414–428. https://doi.org/10.1017/S136672891900018X
Bayazidi, A., Ansarin, A. A., & Mohammadnia, Z. (2019). The relationship between syntactic and lexical complexity in speech monologues of EFL learners. Applied Research on English Language, 8(4), 473–488.
Berman, R. A., & Slobin, D. I. (1994). Relating events in narrative: A crosslinguistic developmental study. Lawrence Erlbaum.
Bialystok, E. (2009). Bilingualism: The good, the bad, and the indifferent. Bilingualism: Language and Cognition, 12(1), 3–11. https://doi.org/10.1017/S1366728908003477
Bosch, P. (1983). Agreement and anaphora: A study of the roles of pronouns in discourse and syntax. Academic Press.
Brown, A., & Gullberg, M. (2008). Bidirectional crosslinguistic influence in L1-L2 encoding of manner in speech and gesture: A study of Japanese speakers of English. Studies in Second Language Acquisition, 30(2), 225–251. https://doi.org/10.1017/S0272263108080327
Butterworth, G., & Hadar, U. (1989). Gesture, speech, and computational stages: A reply to McNeill. Cognition, 32(3), 237–244.
Chu, M., & Kita, S. (2011). The nature of gestures’ beneficial role in spatial problem solving. Journal of Experimental Psychology: General, 140(1), 102–116. https://doi.org/10.1037/a0021790
Cocks, N., Morgan, G., & Kita, S. (2011). Iconic gesture and speech integration in younger and older adults. Gesture, 11(1), 24–39. https://doi.org/10.1075/gest.11.1.02coc
Cruz, A. (2021). A syntactic approach to gender assignment in Spanish–English bilingual speech. Glossa: A Journal of General Linguistics, 6(1).
De Ruiter, J. P. (2000). The production of gesture and speech. In McNeill, D. (Ed.), Language and gesture (pp. 284–311). Cambridge University Press. https://doi.org/10.1017/CBO9780511620850.018
Demir, Ö. E., Levine, S., & Goldin-Meadow, S. (2015). A tale of two hands: Children’s early gesture use in narrative production. Journal of Child Language, 42(3), 662–681. https://doi.org/10.1017/S0305000914000415
Dimitrova, D., Chu, M., Wang, L., Özyürek, A., & Hagoort, P. (2016). Beat that word: How listeners integrate beat gesture and focus in multimodal speech discourse. Journal of Cognitive Neuroscience, 28(9), 1255–1269. https://doi.org/10.1162/jocn_a_00963
Educational Testing Service. (2024). TOEFL iBT® test prep planner. https://www.ets.org/toefl
Faul, F., Erdfelder, E., Buchner, A., & Lang, A.-G. (2007). G*Power 3: A flexible statistical power analysis program for the social, behavioral, and biomedical sciences. Behavior Research Methods, 39(2), 175–191. https://doi.org/10.3758/BF03193146
Frick-Horbury, D., & Guttentag, R. E. (1998). The effects of restricting hand gesture production on lexical retrieval and free recall. American Journal of Psychology, 1, 43–62. https://doi.org/10.2307/1423536
Garrod, S. (2001). Anaphora resolution. In Smelser, N. J. & Baltes, P. B. (Eds.), International encyclopedia of the social and behavior sciences (pp. 490–494). Elsevier. https://doi.org/10.1016/B0-08-043076-7/01527-8
Goldin-Meadow, S. (2007). Gesture with speech and without it (pp. 31–50). Duncan Cassell Levy.
Goldin-Meadow, S., & Alibali, M. W. (2013). Gesture’s role in speaking, learning, and creating language. Annual Review of Psychology, 64(1), 257–283. https://doi.org/10.1146/annurev-psych-113011-143802
Goldin-Meadow, S., & Butcher, C. (2003). Pointing toward two-word speech in young children. In Kita, S. (Ed.), Pointing: Where language, culture, and cognition meet (pp. 85–107). Erlbaum.
Grosjean, F. (2010). Bilingual: Life and reality. Harvard University Press. https://doi.org/10.4159/9780674056459
Gullberg, M. (2003). Gestures, referents, and anaphoric linkage in learner varieties. In Dimroth, C. & Starren, M. (Eds.), Information structure, linguistic structure and dynamics of language acquisition (pp. 311–328). Benjamin. https://doi.org/10.1075/sibil.26.15gul
Gullberg, M. (2006). Handling discourse: Gestures, reference tracking, and communication strategies in early L2. Language Learning, 56(1), 156–196. https://doi.org/10.1111/j.0023-8333.2006.00344.x
Han, Z. (2004). Fossilization: Five central issues. International Journal of Applied Linguistics, 14(2), 212–242. https://doi.org/10.1111/j.1473-4192.2004.00060.x
Hanna, W., & Barbera, J. (Creators). (2014). Tom and Jerry. Warner Bros. Television.
Harris, C. R., Millman, K. J., van der Walt, S. J., et al. (2020). Array programming with NumPy. Nature, 585, 357–362. https://doi.org/10.1038/s41586-020-2649-2
Hostetter, A. B. (2011). When do gestures communicate? A meta-analysis. Psychological Bulletin, 137(2), 297. https://doi.org/10.1037/a0022128
Hostetter, A. B., Alibali, M. W., & Kita, S. (2007). I see it in my hands’ eye: Representational gestures reflect conceptual demands. Language and Cognitive Processes, 22(3), 313–336. https://doi.org/10.1080/01690960600632812
Hunter, J. D. (2007). Matplotlib: A 2D graphics environment. Computing in Science & Engineering, 9(3), 90–95. https://doi.org/10.1109/MCSE.2007.55
Iverson, J. M., & Goldin-Meadow, S. (2005). Gesture paves the way for language development. Psychological Science, 16(5), 367–371. https://doi.org/10.1111/j.0956-7976.2005.01542.x
Johns, B. T., Sheppard, C. L., Jones, M. N., & Taler, V. (2016). The role of semantic diversity in word recognition across aging and bilingualism. Frontiers in Psychology, 7, 703. https://doi.org/10.3389/fpsyg.2016.00703
Kita, S., Alibali, M. W., & Chu, M. (2017). How do gestures influence thinking and speaking? The gesture-for-conceptualization hypothesis. Psychological Review, 124(3), 245. https://doi.org/10.1037/rev0000059
Kita, S., & Özyürek, A. (2003). What does cross-linguistic variation in semantic coordination of speech and gesture reveal? Evidence for an interface representation of spatial thinking and speaking. Journal of Memory and Language, 48(1), 16–32. https://doi.org/10.1016/S0749-596X(02)00505-3
Kita, S., Özyürek, A., Allen, S., Brown, A., Furman, R., & Ishizuka, T. (2007). Relations between syntactic encoding and co-speech gestures: Implications for a model of speech and gesture production. Language and Cognitive Processes, 22(8), 1212–1236. https://doi.org/10.1080/01690960701461426
Kotthoff, H. (2007). Oral genres of explanation: Some theoretical and practical remarks. Linguistics and Education, 18(2), 129–144.
Krauss, R., Chen, Y., & Gottesman, R. (2000). Lexical gestures and lexical access: A process model. In McNeill, D. (Ed.), Language and gesture (Vol. 2, pp. 261–283). Cambridge University Press. https://doi.org/10.1017/CBO9780511620850.017
Krauss, R. M., Morrel-Samuels, P., & Colasante, C. (1991). Do conversational gestures communicate? Journal of Personality and Social Psychology, 61, 743–754. https://doi.org/10.1037/0022-3514.61.5.743
Lyons, J. (1977). Semantics 2. Cambridge University Press.
MacWhinney, B. (2000). The childes project: Tools for analyzing talk (3rd ed.). Lawrence Erlbaum Associates.
Marian, V., Blumenfeld, H. K., & Kaushanskaya, M. (2007). The language experience and proficiency questionnaire (LEAP–Q): Assessing language profiles in bilinguals and multilinguals. Journal of Speech, Language, and Hearing Research, 50, 940–967. https://doi.org/10.1044/1092-4388(2007/067)
Masson-Carro, I., Goudbeek, M., & Krahmer, E. (2017). How what we see and what we know influence iconic gesture production. Journal of Nonverbal Behavior, 41, 367–394. https://doi.org/10.1007/s10919-017-0261-4
McKinney, W. (2010). Data structures for statistical computing in python. In van der Walt, S. & Millman, J. (Eds.), Proceedings of the 9th python in science conference (pp. 51–56). https://doi.org/10.25080/Majora-92bf1922-00a
McNeill, D. (1992). Hand and mind: What gestures reveal about thought. University of Chicago Press.
Mol, L., Krahmer, E., & van de Sandt-Koenderman, M. (2013). Gesturing by speakers with aphasia: How does it compare? Journal of Speech, Language, and Hearing Research, 56(4), 1224–1236. https://doi.org/10.1044/1092-4388(2012/11-0159)
Molnar, M., Leung, K. I., Santos Herrera, J., & Giezen, M. (2023). Toddler-directed and adult-directed gesture frequency in monolingual and bilingual caregivers. International Journal of Bilingualism, 27(5), 717–730. https://doi.org/10.1177/13670069221120929
Nicoladis, E. (2002). Some gestures develop in conjunction with spoken language development and others don’t: Evidence from bilingual preschoolers. Journal of Nonverbal Behavior, 26, 241–266. https://doi.org/10.1023/A:1022112201348
Nicoladis, E. (2007). The effect of bilingualism on the use of manual gestures. Applied Psycholinguistics, 28, 441–454. https://doi.org/10.1017/S0142716407070245
Nicoladis, E., Mayberry, R. I., & Genesee, F. (1999). Gesture and early bilingual development. Developmental Psychology, 35(2), 514–522. https://doi.org/10.1037/0012-1649.35.2.514
Nicoladis, E., Nagpal, J., Marentette, P., & Hauer, B. (2018). Gesture frequency is linked to story-telling style: Evidence from bilinguals. Language and Cognition, 10(4), 641–664. https://doi.org/10.1017/langcog.2018.25
Nicoladis, E., Pika, S., & Marentette, P. (2009). Do French–English bilingual children gesture more than monolingual children? Journal of Psycholinguistic Research, 38, 573–585. https://doi.org/10.1007/s10936-009-9121-7
Nicoladis, E., Pika, S., Yin, H., & Marentette, P. (2007). Gesture use in story recall by Chinese–English bilinguals. Applied Psycholinguistics, 28, 721–735. https://doi.org/10.1017/S0142716407070385
Oller, D. K., & Eilers, R. E. (2002). Language and literacy in bilingual children (2). Multilingual Matters. https://doi.org/10.21832/9781853595721
Özçalışkan, Ş. (2007). Metaphors we move by: Children’s developing understanding of metaphorical motion in typologically distinct languages. Metaphor and Symbol, 22(2), 147–168. https://doi.org/10.1080/10926480701235429
Özçalışkan, Ş. (2016). Do gestures follow speech in bilinguals’ description of motion? Bilingualism: Language and Cognition, 19(3), 644–653. https://doi.org/10.1017/S1366728915000796
Özçalışkan, Ş., Adamson, L. B., Dimitrova, N., & Baumann, S. (2017). Early gesture provides a helping hand to spoken vocabulary development for children with autism, Down syndrome and typical development. Journal of Cognition and Development, 18(3), 325–337. https://doi.org/10.1080/15248372.2017.1329735
Özçalışkan, Ş., & Goldin-Meadow, S. (2005a). Gesture is at the cutting edge of early language development. Cognition, 96(3), B101–B113. https://doi.org/10.1016/j.cognition.2005.01.001
Özçalışkan, Ş., & Goldin-Meadow, S. (2005b). Do parents lead their children by the hand? Journal of Child Language, 32(3), 481–505. https://doi.org/10.1017/S0305000905007002
Özçalışkan, Ş., & Goldin-Meadow, S. (2025). Does gesture follow speech in describing metaphorical motion events over developmental time? Brain & Language, 270, 105620. https://doi.org/10.1016/j.bandl.2025.105620
Ozturk, S., Pınar, E., Ketrez, F. N., & Özçalışkan, Ş. (2021). Effect of sex and dyad composition on speech and gesture development of singleton and twin children. Journal of Child Language, 48(5), 1048–1066. https://doi.org/10.1017/S0305000920000744
Ozturk, S., & Özçalışkan, Ş. (2024). Gesture’s role in the communication of adults with different types of aphasia. American Journal of Speech-Language Pathology, 33(4), 1811–1830. https://doi.org/10.1044/2024_AJSLP-23-00046
Özyürek, A. (2002). Speech-language relationship across languages and in second language learners: Implications for spatial thinking and speaking. In Skarabella, B. (Ed.), Proceedings of the 26th Boston University conference on language development (pp. 500–509). Cascadilla Press.
Park, H. I. (2020). How do Korean–English bilinguals speak and think about motion events? Evidence from verbal and non-verbal tasks. Bilingualism: Language and Cognition, 23(3), 483–499. https://doi.org/10.1017/S1366728918001074
Pawlak, M. (2011). Speaking and instructed foreign language acquisition. Multilingual Matters. https://doi.org/10.21832/9781847694126
Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., Duchesnay, E., Shultz, E., Bertrand, J., Dufour, N., & Oliphant, T. E. (2011). Scikit-learn: Machine learning in python. Journal of Machine Learning Research, 12, 2825–2830.
Pika, S., Nicoladis, E., & Marentette, P. (2006). A cross-cultural study on the use of gestures: Evidence for cross-linguistic transfer? Bilingualism: Language and Cognition, 9, 319–327. https://doi.org/10.1017/S1366728906002665
Pınar, E., Öztürk, S., Ketrez, N., & Özçalışkan, Ş. (2021). Parental speech and gesture input to girls versus boys in singletons and twins. Journal of Nonverbal Behavior, 45, 297–318. https://doi.org/10.1007/s10919-020-00356-w
Portocarrero, J. S., Burright, R. G., & Donovick, P. J. (2007). Vocabulary and verbal fluency of bilingual and monolingual college students. Archives of Clinical Neuropsychology, 22, 415–422. https://doi.org/10.1016/j.acn.2007.01.015
Prieto, P., Cravotta, A., Kushch, O., Rohrer, P. L., & Vilà-Giménez, I. (2018). Deconstructing beat gestures: A labelling proposal. In Klessa, K., Bachan, J., Wagner, A., Karpiński, M., & Śledziński, D. (Eds.), Proceedings of the 9th international conference on speech prosody (pp. 201–205). International Speech Communication Association.
Python Software Foundation. (2023). Python (Version 3.x) [Software]. https://www.python.org
Rauscher, F. H., Krauss, R. M., & Chen, Y. (1996). Gesture, speech, and lexical access: The role of lexical movements in speech production. Psychological Science, 7(4), 226–231. https://doi.org/10.1111/j.1467-9280.1996.tb00364.x
Ravizza, S. (2003). Movement and lexical access: Do noniconic gestures aid in retrieval? Psychonomic Bulletin & Review, 10, 610–615. https://doi.org/10.3758/BF03196522
Richards, J. C. (2008). Moving beyond the plateau: From intermediate to advanced levels in language learning. Cambridge University Press.Google Scholar
Robinson, P. (2001). Task complexity, task difficulty, and task production: Exploring interactions in a componential framework. Applied Linguistics, 22(1), 2757.10.1093/applin/22.1.27CrossRefGoogle Scholar
Rosselli, M., Ardila, A., Salvatierra, J., Marquez, M., Matto, L., & Weekes, V. A. (2002). A cross-linguistic comparison of verbal fluency tests. International Journal of Neuroscience, 112(6), 759776.10.1080/00207450290025752CrossRefGoogle ScholarPubMed
Ryan, M.-L. (1993). Narrative in real time: Chronicle, mimesis and plot in baseball broadcasts. Narrative, 1, 138155.Google Scholar
Schmidt, A. (Director). (1971). Die Sendung mit der Maus [TV series]. WDR.Google Scholar
Seabold, S., & Perktold, J. (2010). Statsmodels: Econometric and statistical modeling with python. SciPy, 7(1). https://doi.org/10.25080/Majora-92bf1922-011.CrossRefGoogle Scholar
Sekine, K., Rose, M. L., Foster, A. M., Attard, M. C., & Lanyon, L. E. (2013). Gesture production patterns in aphasic discourse: In-depth description and preliminary predictions. Aphasiology, 27(9), 1031–1049. https://doi.org/10.1080/02687038.2013.803017
Smithson, L., & Nicoladis, E. (2013). Verbal memory resources predict iconic gesture use among monolinguals and bilinguals. Bilingualism: Language and Cognition, 16(4), 934–944. https://doi.org/10.1017/S1366728913000175
So, W. C. (2010). Cross-cultural transfer in gesture frequency in Chinese–English bilinguals. Language and Cognitive Processes, 25(10), 1335–1353. https://doi.org/10.1080/01690961003694268
So, W. C., Kita, S., & Goldin-Meadow, S. (2009). Using the hands to identify who does what to whom: Gesture and speech go hand-in-hand. Cognitive Science, 33(1), 115–125. https://doi.org/10.1111/j.1551-6709.2008.01006.x
So, W. C., Kita, S., & Goldin-Meadow, S. (2013). When do speakers use gestures to specify who does what to whom? The role of language proficiency and type of gestures in narratives. Journal of Psycholinguistic Research, 42, 581–594. https://doi.org/10.1007/s10936-012-9230-6
Spreen, O., & Strauss, E. (1998). A compendium of neuropsychological tests (2nd ed.). Oxford University Press.
Stites, L., & Özçalışkan, Ş. (2017). Who does what to whom: Children track story referents first in gesture. Journal of Psycholinguistic Research, 46(4), 1019–1032. https://doi.org/10.1007/s10936-017-9476-0
Stites, L., & Özçalışkan, Ş. (2021). The time is at hand: Literacy predicts changes in children’s gestures about time. Journal of Psycholinguistic Research, 50(5), 967–983. https://doi.org/10.1007/s10936-021-09782-3
Tannen, D. (1980). A comparative analysis of oral narrative strategies: Athenian Greek and American English. In Chafe, W. L. (Ed.), The pear stories: Cognitive, cultural, and linguistic aspects of narrative production (pp. 51–87). Ablex Publishing Corporation.
Tannen, D. (1982). The oral/literate continuum in discourse. In Tannen, D. (Ed.), Spoken and written language: Exploring orality and literacy (pp. 1–16). Ablex Publishing Corporation.
Theocharopoulou, F., Cocks, N., Pring, T., & Dipper, L. T. (2015). TOT phenomena: Gesture production in younger and older adults. Psychology and Aging, 30(2), 245. https://doi.org/10.1037/a0038913
Treffers-Daller, J. (2009). Language dominance and lexical diversity: How bilinguals and L2 learners differ in their knowledge and use of French lexical and functional items. In Richards, B., Daller, M. H., Malvern, D. D., Meara, P., Milton, J., & Treffers-Daller, J. (Eds.), Vocabulary studies in first and second language acquisition: The Interface between theory and application (pp. 74–90). Palgrave MacMillan. https://doi.org/10.1057/9780230242258_5
Waskom, M. L. (2021). Seaborn: Statistical data visualization. Journal of Open-Source Software, 6(60), 3021. https://doi.org/10.21105/joss.03021
Yoshioka, K. (2008). Gesture and information structure in first and second language. Gesture, 8(2), 236–255. https://doi.org/10.1075/gest.8.2.07yos

Table 1. Summary of sample characteristics by group

Figure 1. Screenshots from a sample task eliciting narrative.

Figure 2. Screenshots from a sample task eliciting explanation.

Figure 3. Mean amount (A1–A2), diversity (B1–B2) and complexity (C1–C2) of speech and gesture production in English by bilinguals with low proficiency, bilinguals with high proficiency and monolingual English speakers with native proficiency in narrative (solid red bars) and explanation tasks (striped bars). Error bars represent standard error. Note that the scales of the bars in panels A and B differ, and that panel C2 depicts only the means for complex gestures (i.e., gestures that either disambiguate speech or add new information to speech).

Table 2. Mean (SD) production of gesture types by speaker proficiency and discourse context

Table 3. Mean (SD) production of simple and complex gestures by speaker proficiency and discourse context

Figure 4. The sample narrative scene of the mother bird knitting (A), and its depiction in gesture by a monolingual speaker with native English fluency (B) and by bilingual speakers with either high (C) or low (D) English proficiency. The speakers with native and high proficiency both used gesture to further reinforce the information already conveyed in their speech (‘the mama bird is knitting’ + moving fisted hands in opposite circular directions to convey knitting); the speaker with low English proficiency used gesture to add new information not expressed in speech (‘she is doing something’ + moving fisted hands in opposite circular directions to convey knitting).

Figure 5. The sample explanation scene of a mouse pumping a tire (A), and its depiction in gesture by a monolingual speaker with native English fluency (B), and by bilingual speakers with either high (C) or low (D) English proficiency. The speakers with native and high proficiency both used gestures to further reinforce the information already expressed in speech (‘the mouse is pumping the tire’ + moving cupped hands up and down rapidly to convey pumping); the speaker with low English proficiency used gestures to add new information that was missing in speech (‘she is doing this’ + moving cupped hands up and down rapidly to convey pumping).