

Published online by Cambridge University Press:  05 August 2021

Aki Tsunemoto*
Concordia University
Rachael Lindberg
Concordia University
Pavel Trofimovich
Concordia University
Kim McDonough
Concordia University
Correspondence concerning this article should be addressed to Aki Tsunemoto, Department of Education, Concordia University (FG 5.150), 1455 de Maisonneuve Blvd W., Montreal, Quebec, Canada H3G 1M8. E-mail:


This study examined the role of visual cues (facial expressions and hand gestures) in second language (L2) speech assessment. University students (N = 60) at English-medium universities assessed 2-minute video clips of 20 L2 English speakers (10 Chinese and 10 Spanish speakers) narrating a personal story. The raters evaluated the speakers’ comprehensibility, accentedness, and fluency using 1,000-point sliding scales. To manipulate access to visual cues, the raters were assigned to one of three conditions that presented audio along with (a) a static image of the speaker, (b) a static image of the speaker’s torso with a dynamic face, or (c) a dynamic torso and face. Results showed that raters with access to the full video tended to perceive the speakers as more comprehensible and significantly less accented than raters in the less visually informative conditions. The findings are discussed in terms of how the integration of visual cues may impact L2 speech assessment.

Research Article
© The Author(s), 2021. Published by Cambridge University Press



We would like to thank the members of our research group (Tzu-Hua Chen, Teng Hsu, YooLae Kim, Chen Liu, Oguzhan Tekin, and Pakize Uludag) for their valuable insights and all the research assistants who helped with data collection and coding: Marie Apaloo, Tzu-Hua Chen, Dalia Elsayed, Sarah Ercoli, Lisa Gonzalez, Xuanji Hu, Chen Liu, Ashley Montgomery, Jie Qiu, Quinton Stotz, Lauren Strachan, Kym Taylor Reid, Oguzhan Tekin, Pakize Uludag, and Roza van Lieshout. Also, we are grateful to Masaki Eguchi and Shungo Suzuki for their help with data analysis, and to the anonymous reviewers and the journal editors of Studies in Second Language Acquisition for their insightful comments and suggestions.

This research was supported by the Social Sciences and Humanities Research Council of Canada (SSHRC) grants 435-2016-1406 (to Pavel Trofimovich and Sara Kennedy) and 435-2019-0754 (to Kim McDonough and Pavel Trofimovich).

The experiment in this article earned Open Materials and Open Data badges for transparent practices. The materials and data are available at


