Speaking

doi:10.1017/9781009294850.030

24 - Speaking

from Part VI - Language Skills and Areas

Published online by Cambridge University Press: 15 June 2025

Gilbert Dizon

Edited by Glenn Stockwell (The Education University of Hong Kong) and Yijen Wang (Waseda University, Japan)


Summary

This chapter examines the concept of L2 speaking by detailing several technologies that can be used to support the development of oral production in a foreign language. Relevant theoretical and historical concepts are first discussed to give readers a foundation to understand the factors that influence the L2 speaking process. The next sections delve into emerging technologies that show promise in supporting speaking development. The chapter concludes with future directions related to L2 speaking teaching and learning.

Keywords

speaking, dialogue-based CALL, intelligent personal assistants (IPAs), automatic speech recognition (ASR), virtual reality (VR), augmented reality (AR), machine translation

Information

Type: Chapter
Information: The Cambridge Handbook of Technology in Language Teaching and Learning, pp. 395–409

DOI: https://doi.org/10.1017/9781009294850.030

Publisher: Cambridge University Press

Print publication year: 2025

24 Speaking

Introduction

Conversing effectively in a second language (L2) is one of the most challenging language skills to develop because of the time constraints involved (Jong, 2020). In natural conversation, interlocutors must be able to sufficiently understand the topic of discussion, decide what to say, and then actually say it. If there are any issues related to comprehension or message formulation, then an L2 speaker may miss their chance to join a conversation as the topic of discussion may change abruptly. Even when L2 speakers are given more time to prepare, as is often the case with oral presentations, they may still struggle to speak coherently and fluently in the target language. Given the difficulties surrounding L2 speaking, the topic has been one of the most examined areas in computer-assisted language learning (CALL) research (Gillespie, 2020). However, the plethora of technologies that can be used in and outside the language classroom to promote L2 speaking can make it difficult for teachers to choose the most appropriate digital tools for their particular context. It would be impossible to detail all the available technologies that can be used to support L2 speaking development in a single book chapter. Therefore, this chapter highlights emerging technologies that have yet to be widely adopted in the teaching of L2 speaking but have shown potential in promoting L2 oral interaction and speaking development.

Background

Although there are several explanations as to how learners acquire an L2 and, by extension, develop L2 speaking, two of the most recognizable and easily contrasted theories in L2 learning are cognitive theory and sociocultural theory (SCT). Accordingly, this section provides an overview of the two theories in the context of L2 speaking while also touching upon other important concepts that pertain to the speaking process.

The most influential model in relation to cognitive theory and L2 speaking is Levelt’s Blueprint of the Speaker (Levelt, 1994, 1999). According to this model, the speaking process involves a series of steps, the first of which is the conceptualization of a speaker’s communicative intention, that is, the message that the speaker would like to convey. This message gets formulated through three different operations: grammatical, morpho-phonological, and phonetic encoding. Grammatical encoding involves the activation of lexical concepts or lemmas from the lexicon, a mental repository which stores the words a person has acquired throughout their life. Once grammatical encoding begins, morpho-phonological encoding is activated, leading to the creation of a phonological score, that is, the “syllabified words, phrases and intonation pattern” of an utterance (Levelt, 1999, p. 88). Finally, a pronounceable structure is formulated through phonetic encoding, which triggers the appropriate articulatory gestures depending on the syllables in a particular phonological score. As Hulstijn (2006) notes, in the context of L2 speaking instruction, one important point to consider is that the mental processes outlined in this paragraph occur outside a speaker’s conscious awareness. This makes these cognitive processes sensitive to working memory (WM), the topic of the subsequent paragraph.

Working memory is defined as “the mental processes responsible for the temporary storage and manipulation of information in the course of on-going processing” (Juffs & Harrington, 2011, p. 137). Because of the complexity of mental processes and their interaction with one another, WM has the potential to limit performance during cognitive tasks. Thus, it is believed that individuals with high working memory capacity (WMC) tend to perform better in productive L2 tasks than those with low WMC (Mackey et al., 2010). Research by Kormos and Safar (2008) demonstrating a strong correlation between WMC and language learning supports this notion. However, more recent findings by Hayashi and colleagues (Hayashi, 2019; Hayashi, Kobayashi, & Toyoshige, 2016) suggest that WM does not have a significant impact on foreign language development. Instead, other factors such as context, strategy, and other individual differences may have a larger influence on L2 performance. Nevertheless, more empirical studies need to be conducted to better establish the relationship between WM and L2 speaking.

Complexity, accuracy, and fluency (CAF) have also featured prominently in L2 research inspired by cognitive theory (Housen & Kuiken, 2009). There is still debate regarding the definitions of these three constructs. However, accuracy is the one with the most consensus and refers to the degree to which an individual’s language deviates from the norm (Wolfe-Quintero, Inagaki, & Kim, 1998). Fluency is typically described as a “person’s general language proficiency, particularly as characterized by perceptions of ease, eloquence, and ‘smoothness’ of speech or writing” (Housen & Kuiken, 2009, p. 463). While complexity is the most controversial construct (Housen & Kuiken, 2009), it can be described as the capacity to use a variety of sophisticated structures and lexis (Suzuki & Kormos, 2020). In the context of L2 speaking, several factors have been shown to positively influence some or all of the CAF constructs, including WMC (Ahmadian, 2012), pre-task planning (Ahmadian & Tavakoli, 2011), task repetition (Ahmadian & Tavakoli, 2011), and topic familiarity (Qiu, 2019).

In contrast to cognitive theory, which tends to focus on the mental processes that underlie L2 learning, SCT stresses social context and how cultural factors mediate L2 development. Sociocultural theory is highly influenced by the work of Vygotsky (1978) and is defined as the field that “studies the content, mode of operation, and interrelationships of psychological phenomena that are socially constructed and shared, and are rooted in other social artifacts” (Ratner, 2002, p. 9). As noted by Surtees and Duff (2022), SCT is comprised of multiple theories, each one drawing upon Vygotsky’s work in different ways. However, a commonality shared by theories informed by SCT is that social interaction is central to all learning. Thus, from an SCT perspective, speaking acts as a vehicle for people to express their identities through socialization, which, in turn, provides them with opportunities to understand different cultural practices such as language and culture (Surtees & Duff, 2022).

The zone of proximal development (ZPD) is one of the most frequently cited concepts when discussing SCT. The term is generally referred to as the difference between what a learner can do on their own versus what they can do through mediation (Lantolf & Beckett, 2009). Referencing SCT, Nassaji and Swain (2000) state that ZPD can serve to promote language awareness through metalinguistic reflection. In other words, the researchers posit that interaction with others mediates language learning by promoting learners’ metalinguistic awareness, which is “an individual’s ability to focus attention on language as an object in and of itself, to reflect upon language, and to evaluate it” (Dillon, 2009, p. 186). According to Goh (2017a), metalinguistic awareness is key to developing L2 speaking as learners who are metacognitively aware can better use speaking strategies such as planning, monitoring, and evaluation during oral interaction.

Scaffolding, or the process in which a teacher or more capable peer provides assistance that enables a learner to accomplish a task they otherwise would not be able to complete (Goh, 2017b), is a concept that is closely related to ZPD. While some are hesitant to compare the two as scaffolding may place a greater emphasis on the individual providing aid, thereby restricting freedom of discourse (Kinginger, 2002), others point out that scaffolding can be a collective endeavor in which learners support one another to reach higher levels of linguistic output (Donato, 1994). In this regard, research suggests that learners who have received metacognitive training on peer scaffolding can improve their L2 speaking skills (Fujii, Ziegler, & Mackey, 2016).

Independent of the two theories outlined above, accentedness and comprehensibility are two concepts that have received much attention in L2 speaking literature. Accentedness refers to how nativelike a learner’s speech is based on listener judgements of their L2 pronunciation, whereas comprehensibility relates to the amount of effort required by listeners to understand L2 speech (Suzuki & Kormos, 2020). Research by Derwing and Munro (Derwing & Munro, 1997; Munro & Derwing, 1995) demonstrated that accentedness and comprehensibility are related yet distinct constructs, with accentedness not necessarily interfering with comprehensibility. Current research indicates that comprehensibility is more important than accentedness when teaching L2 speaking (Tsang, 2019), and this is reflected in the fact that the construct is included in the descriptors of high-stakes language assessments such as the Test of English as a Foreign Language (TOEFL) and the International English Language Testing System (IELTS) exam (Suzuki & Kormos, 2020). Findings from major studies indicate that L2 comprehensibility is impacted by several factors, most notably fluency, grammar, lexis, and pronunciation (Isaacs & Trofimovich, 2012; Saito, Trofimovich, & Isaacs, 2017).

Primary Themes

Improving L2 Speaking through Dialogue-Based CALL

Although numerous technologies can be used to enhance L2 speaking, this section is devoted to the use of dialogue-based CALL. Dialogue-based CALL consists of the collection of digital tools that enable learners to interact with a computer in a target language, including but not limited to intelligent personal assistants (IPAs), automatic speech recognition (ASR)-based CALL, intelligent tutoring systems, and chatbots (Bibauw, Francois, & Desmet, 2019). Due to word constraints, only two of these technologies are covered in this section: ASR-based CALL and IPAs.

Early research involving ASR and L2 speech indicated that the technology struggled to reach comprehensibility rates similar to those of human listeners. For example, Derwing, Munro, and Carbonaro (2000) found that a popular ASR program could understand L2 speech 71–73 percent of the time, while human listeners were able to recognize 90 percent of what was spoken. However, recent research by McCrocklin and Edalatishams (2020) shows that ASR-based systems have made considerable improvements in their ability to reliably understand L2 speech. In their study, the researchers analyzed how accurately Google’s cloud-based voice transcription software understood L2 English learners whose first languages (L1) were Chinese and Spanish, which mirrored Derwing et al.’s (2000) research. The adult participants also dictated the same sentences used in Derwing et al. (2000), thereby allowing a like-for-like comparison between past and modern ASR software. Findings revealed that Google’s ASR was able to understand 91–93 percent of the L2 speech, which was similar to the 88–93 percent rate among human listeners. There was also a significant correlation between Google’s ASR and the human listeners when it came to overall comprehensibility. Nonetheless, this correlation was significant only for the L1 Chinese speakers, that is, there was no significant relationship between Google’s ASR and human listener comprehensibility for the L1 Spanish speakers. According to the researchers, this indicates that the value of ASR software may depend on learners’ L1 and L2 proficiency level.

Studies involving ASR-based systems for L2 learning have revealed some of the affordances of the technology. Research focusing on learners’ experiences and perspectives toward ASR shows that it can promote language learning autonomy (McCrocklin, 2016), reduce speaking anxiety (Bashori et al., 2021), and provide useful pronunciation feedback (McCrocklin, 2019). Recent studies examining the capacity of ASR in promoting speaking improvements have also yielded positive results. For instance, Jiang et al. (2021) found that L2 English learners using ASR were able to make greater improvements in oral language complexity than those in the control group. In another quantitative study, results from Evers and Chen (2020) demonstrated that adult L2 English learners could make significant gains in pronunciation in both read aloud and spontaneous conversation tasks. Taken together, these studies demonstrate that learners have favorable perceptions toward ASR-based CALL and that the technology can be useful in promoting L2 speaking skills.

Intelligent personal assistants such as Amazon Alexa, Google Assistant, and Siri rely on ASR and natural language processing (NLP) to understand user requests and respond accordingly. Because of their popularity and widespread availability, speaking with an IPA through a compatible device (e.g. smartphone, smart speaker, headphones) is an easy way for L2 learners to practice their speaking skills. In line with SCT, it appears that IPAs support metalinguistic awareness, that is, speaking with virtual assistants can help direct L2 learners to gaps in their linguistic output. For example, learners in Dizon (2017) reported that L2 English interactions with Alexa helped them notice deficiencies in their pronunciation that interfered with IPA-mediated communication. This finding is supported by Tai and Chen (2020), who found that the indirect feedback provided by Google Assistant encouraged the L2 English learners in their study to modify their pronunciation in order to be more easily comprehended by the IPA.

It is important to note that communication breakdowns with Alexa, Google Assistant, or other IPAs may not necessarily be due to an individual learner’s L2 pronunciation issues. Instead, the primary cause may be the inability of a particular IPA to accurately understand any speech that deviates, however slightly, from the most popular varieties of English. However, given the work by McCrocklin and Edalatishams (2020) as well as Chen, Yang, and Lai’s (2020) finding that pronunciation errors were the most common reason for communication breakdowns with the target IPA, it is probable that current IPAs can reliably recognize comprehensible L2 speech provided that a learner’s oral output does not suffer from major deviations in target language pronunciation. As a result, IPAs may be more suitable for use among intermediate to advanced L2 learners.

Although limited in number, experimental studies involving IPAs have provided insight into the impact they can have on speaking development. In a small-scale study involving English as a foreign language (EFL) learners, Dizon (2020) found that students who interacted with Alexa were able to make more significant speaking gains than those who did not. These findings are supported by recent research by Hsu, Chen, and Todd (2021) and Tai and Chen (2022), as the EFL learners who interacted with an IPA in these studies also made greater gains in L2 speaking skills than those who did not have access to the technology. The speaking improvements are attributed to several factors: IPA-mediated interactions decrease speaking anxiety, promote oral interaction, and increase language learning enjoyment.

Even though ASR systems and IPAs have great potential in promoting oral interaction and speaking development, research indicates that certain steps should be taken to maximize their effectiveness for L2 teaching and learning. First, while it may be tempting for L2 learners to interact with these technologies individually given that they provide speakers with an L2 interlocutor, research suggests that students benefit from group work tasks when using dialogue-based CALL. For instance, in Evers and Chen (2020), students who used the ASR software with peers were able to make greater pronunciation improvements than learners who worked on their own to identify pronunciation mistakes. Learners in Tai and Chen (2022) also clearly preferred collaborative group work over individual activities, as peer feedback enabled them to have smoother and more enjoyable interactions with the IPA. Accordingly, activities involving dialogue-based CALL should utilize group work, thereby encouraging collaborative dialogue and peer scaffolding. Additionally, tasks utilizing ASR or IPAs should incorporate visual feedback in order to decrease the cognitive load of learners and better direct them to gaps in their linguistic output. Learners in Tai and Chen (2022) reported that the aural-only mode (i.e. using a device that lacked a display) made it difficult for them to properly respond to communication breakdowns with the IPA. Consequently, abandonment was a common communication strategy used by these students, which resulted in less successful interactions. In contrast, students who used a device with a display were able to more easily identify errors in their speech by checking the visual feedback, which, in turn, allowed them to adjust their output accordingly. Lastly, dialogue-based CALL activities should incorporate a variety of tasks that target different speaking skills and scenarios. One way to increase task diversity when using IPAs is to take advantage of the skills or apps that can be freely downloaded through their respective platforms. For example, Tai and Chen (2020) identified Google Assistant skills including Song Quiz, Smart Story Teller, and Car Quiz Pro that allow for different interaction styles. Dozens of Alexa skills were used by the participants in Dizon (2020) such as vocabulary skills (e.g. Magoosh Vocabulary Builder), interactive audio stories (e.g. Earplay), as well as conversational socialbots. Engaging students in a variety of IPA-mediated tasks will not only help them improve different aspects related to L2 speaking but will also reduce the risk of learner fatigue and disinterest.

Current Research and Practice

Emerging Technologies for Speaking Development

Besides dialogue-based CALL, several other emerging technologies show promise when it comes to the development of L2 speaking. One of them is virtual reality (VR), with many studies since the early 2010s examining how the technology can affect L2 learning. As noted by Ebadi and Ebadijalal (2020), VR offers several affordances for language learning that are pertinent for speaking development such as increased opportunities for collaborative learning and enhanced motivation and engagement.

Foreign language anxiety (FLA) and its effect on L2 communication is an area that has often been studied when it comes to VR. In a study comparing three modalities (voice, video, and VR), York et al. (2021) concluded that the VR environment was the easiest, most fun, and most effective medium for English communication. Participants also reported lower levels of FLA in VR, although the differences between VR and the other modalities in this regard were not statistically significant. Trasher (2022) also investigated VR and FLA using two measures, self-reported anxiety and levels of salivary cortisol, a biological marker of anxiety, in a study involving L2 French students. An additional goal of her study was to determine if VR had a positive influence on L2 speech comprehensibility. Results from the research revealed that students had lower levels of FLA, both in terms of self-reported data and cortisol, in VR compared to interaction in the traditional classroom. The learners’ L2 speech comprehensibility was also found to be higher in the VR environment and when students had lower levels of anxiety, thus suggesting a link between the two variables. These two studies suggest that VR has the potential to reduce FLA when speaking in an L2, which, in turn, can positively affect speech comprehensibility.

A few studies have explored the impact that VR can have on L2 speaking skills. Ebadi and Ebadijalal (2020) compared two groups, one that used the Google Expeditions VR platform and a control group, to evaluate if VR could support L2 English oral proficiency and willingness to communicate. Results from the study indicated significant differences between the groups concerning the two variables: VR better contributed to enhanced oral proficiency and willingness to communicate than conventional instruction. In another study utilizing Google’s VR tools, that is, Google Expeditions and Google Cardboard, Xie, Chen, and Ryder (2021) investigated the impact that VR could have on oral presentations. The L2 Chinese participants in their study gave six presentations, four using VR and the remaining two using PowerPoint in a traditional classroom environment. While no significant differences were found in relation to the fluency, grammar, or pronunciation subscales, the learners’ overall content and vocabulary scores were significantly higher using VR compared to PowerPoint. While also examining VR and its potential to improve L2 speaking, Chien, Hwang, and Jong (2020) took a different approach in that the researchers did not compare a VR and non-VR group. Instead, the researchers examined the role of peer assessment and its impact on speaking performance, FLA, and other variables in a VR environment among EFL students. Results from the study indicated that the experimental group which utilized peer feedback made greater gains in speaking fluency and maturity of language, that is, the ability to include details in a response that exceed the minimum requirements. Students in the peer assessment group also exhibited lower levels of FLA, which again underscores the importance of collaborative tasks in technology-mediated speaking activities.

Similar to VR, the use of augmented reality (AR) in L2 teaching and learning has become increasingly popular. In a systematic review paper, Parmaxi and Demetriou (2020) identified fifty-four studies published between 2014 and 2019 that pertained to AR and language learning. However, among those studies, only 9.9 percent of them focused on L2 speaking skills, which implies L2 speaking is underexplored in AR research. One exception is an early case study by Liu (2009), who measured the impact of an AR system on EFL students’ listening and speaking achievements. The researcher found that those in the experimental group who utilized AR had significantly higher listening and speaking test scores throughout the experiment compared to the control group. In a follow-up study investigating the same AR system, Liu and Chu (2010) had similar results. In other words, EFL learners who used AR had greater improvements in English listening and speaking. Results from interviews indicated that the AR system allowed the students to practice English speaking in an authentic context, which, in turn, increased their confidence in speaking the target language. More recent studies investigating AR in an L2 speaking context have explored not the potential linguistic gains learners can make through the technology but rather student perceptions toward AR in communicative classroom environments. For instance, Taskiran (2019) measured EFL students’ views toward AR games in a survey-based study, with results indicating that the participants enjoyed using AR for language learning and believed the technology supported language development. The researcher posited that the AR games promoted collaboration among the students, which, in turn, helped support L2 speaking and listening.

The growing ubiquity of smartphones has made mobile-assisted language learning (MALL) a popular area within technology-mediated language learning research. Having said that, fewer MALL studies have examined speaking compared to other areas related to language proficiency, namely, listening and vocabulary (Shadiev, Hwang, & Huang, 2017). Hwang et al. (2016) found that an experimental group that used mobile games outperformed a control group on an L2 speaking post-test. Based on these results, the researchers identified three affordances of MALL that relate to both cognitive theory and SCT: It (1) provides more opportunities for L2 speaking and reflection; (2) promotes speaking accuracy; and (3) allows for L2 speaking in real-life contexts. In a mixed-methods study, Wu and Miller (2020) investigated EFL students’ perceptions toward a mobile application called PeerEval. The participants gave each other peer feedback using the app after the students’ oral performances. Survey and interview findings suggested that the mobile app promoted improvements in L2 English speaking, with the collaborative nature of MALL being one of the primary affordances of the activity.

While there is a strong body of literature examining machine translation (MT) in the context of L2 writing (Lee, 2021), research related to MT and L2 speaking is scarce. An exception to this is van Lieshout and Cardoso’s (2022) study of Google Translate as a digital tool for self-directed L2 Dutch learning. In the one-hour experiment, adult L2 learners were tasked with ten learning objectives, that is, they were instructed to learn ten Dutch phrases and their corresponding pronunciations by using Google Translate. The participants had no knowledge of the target phrases prior to the study, thus any gains in L2 vocabulary and pronunciation could be directly attributed to the use of MT. Native speaker raters’ judgements of the participants’ recorded speech showed that the participants’ L2 Dutch was comprehensible and contained a low degree of accentedness, thereby highlighting the potential of MT as a tool for L2 speaking.

Recommendations for Research and Practice

The studies detailed above highlight the significance of group work when utilizing CALL for L2 speaking. Although technology has shown promise in promoting independent, self-directed learning (van Lieshout & Cardoso, 2022), the peer feedback that group work brings is invaluable in supporting an engaging and motivational learning environment (e.g. Chien et al., 2020; Evers & Chen, 2020). Having said that, it is important that learners be given sufficient training in providing constructive feedback as many of them may lack the confidence or skills needed to give such feedback (Wu & Miller, 2020).

Training is important not only when it comes to giving peer feedback in L2 speaking tasks but also as it relates to CALL in general. It is therefore critical to provide quality training so that learners can properly leverage technology for L2 learning. Hubbard and Romeo (2012) summarize a training model that can be used to train learners in the use of CALL. Their model is comprised of three parts: technical, strategic, and pedagogical training. Technical training consists of giving learners the necessary information regarding how to use a specific technology or application for language learning purposes. Strategic training relates to the teaching of strategies that enable learners to complete specific learning objectives. Finally, pedagogical training refers to supporting learners in their understanding of why they should use certain strategies to reach a given objective. While maintaining a balance between these three areas may be difficult, they should all be integrated into any training program for CALL to be most effective.

Another important consideration when implementing technology for L2 teaching is whether a CALL task is suitable for one’s teaching context. To that end, Chapelle (2001) created a list of six criteria that can be used to evaluate CALL activities (see Table 24.1). As noted by Chapelle (2001), CALL evaluation is context-dependent, and what may work in one situation with a group of learners may not work in another: “[A]n evaluation has to result in an argument indicating in what ways a particular CALL task is appropriate for particular learners at a given time” (p. 53).

Table 24.1 CALL evaluation criteria by Chapelle (2001, p. 55)

Authenticity	The degree of correspondence between the learning activity and target language activities of interest to learners out of the classroom.
Language learning potential	The degree of opportunity present for beneficial focus on form.
Learner fit	The amount of opportunity for engagement with language under appropriate conditions given learner characteristics.
Meaning focus	The extent to which learners’ attention is directed toward the meaning of language.
Positive impact	The positive effects of the CALL activity on those who participate in it.
Practicality	The adequacy of resources to support the use of the CALL activity.

Future Directions

Given the rise of artificial intelligence (AI) in education, it is likely that AI applications will play a more prominent role in L2 speaking going forward. However, in light of the potential ethical and privacy concerns, researchers and teachers should be cautious about utilizing AI for language teaching purposes. AI software typically collects large amounts of data about its users, so those who are interested in utilizing AI for language research or teaching must carefully consider the privacy issues involved (X. Chen et al., 2020). Learning analytics, which describes the process of collecting, analyzing, and reporting data for the purpose of creating an optimal learning environment (Zeng et al., 2020), is another avenue of L2 speaking research that is likely to garner more attention in the coming years. Initial research involving learning analytics for L2 learning suggests that it can be valuable in understanding learners’ behaviors in online courses focused on L2 oral communication (Zeng et al., 2020). Finally, the popularity of social media, gaming, and online video means that students are able to learn foreign languages incidentally through these digital practices. Accordingly, there has been increased interest from both young people and researchers in the digital wilds, that is, “informal language learning that takes place in digital spaces, communities, and networks that are independent of formal instructional contexts” (Sauro & Zourou, 2017, p. 186). These types of digital practice often involve oral interaction, thereby making them potentially useful for L2 speaking development.

References

Ahmadian, M. J. (2012). The relationship between working memory capacity and L2 oral performance under task-based careful online planning condition. TESOL Quarterly, 46(1), 165–175. https://doi.org/10.1002/tesq.8

Ahmadian, M. J., & Tavakoli, M. (2011). The effects of simultaneous use of careful online planning and task repetition on accuracy, complexity, and fluency in EFL learners’ oral production. Language Teaching Research, 15(1), 35–59. https://doi.org/10.1177/1362168810383329

Bashori, M., van Hout, R., Strik, S., & Cucchiarini, C. (2021). Effects of ASR-based websites on EFL learners’ vocabulary, speaking anxiety, and language enjoyment. System, 99, 1–16. https://doi.org/10.1016/j.system.2021.102496

Bibauw, S., Francois, T., & Desmet, P. (2019). Discussing with a computer to practice a foreign language: Research synthesis and conceptual framework of dialogue-based CALL. Computer Assisted Language Learning, 32(8), 827–877. https://doi.org/10.1080/09588221.2018.1535508

Chapelle, C. (2001). Computer applications in second language acquisition: Foundations for teaching, testing and research. Cambridge University Press.

Chen, H. H.-J., Yang, C. T.-Y., & Lai, K. K.-W. (2020). Investigating college EFL learners’ perceptions toward the use of Google Assistant for foreign language learning. Interactive Learning Environments, 1–16. https://doi.org/10.1080/10494820.2020.1833043

Chen, X., Xie, H., Zou, D., & Hwang, G. J. (2020). Application and theory gaps during the rise of artificial intelligence in education. Computers & Education: Artificial Intelligence, 1, 100002. https://doi.org/10.1016/j.caeai.2020.100002

Chien, S. Y., Hwang, G. J., & Jong, M. S. Y. (2020). Effects of peer assessment within the context of spherical video-based virtual reality on EFL students’ English-speaking performance and learning perceptions. Computers & Education, 146, 103751. https://doi.org/10.1016/j.compedu.2019.103751

Derwing, T. M., & Munro, M. J. (1997). Accent, intelligibility and comprehensibility: Evidence from four L1s. Studies in Second Language Acquisition, 19(1), 1–16. https://doi.org/10.1017/S0272263197001010

Derwing, T. M., Munro, M. J., & Carbonaro, M. (2000). Does popular speech recognition software work with ESL speech? TESOL Quarterly, 34(3), 592–603. https://doi.org/10.2307/3587748

Dillon, A. (2009). Metalinguistic awareness and evidence of cross-linguistic influence among bilingual learners in Irish primary schools. Language Awareness, 18, 182–197. https://doi.org/10.1080/09658410902928479

Dizon, G. (2017). Using intelligent personal assistants for L2 learning: A case study of Alexa. TESOL Journal, 8(4), 811–830. https://doi.org/10.1002/tesj.353

Dizon, G. (2020). Evaluating intelligent personal assistants for L2 listening and speaking development. Language Learning & Technology, 24(1), 16–26. https://doi.org/10125/44705

Donato, R. (1994). Collective scaffolding in second language learning. In Lantolf, J. & Appel, G. (Eds.), Vygotskian approaches to second language research (pp. 33–56). Praeger.

Ebadi, S., & Ebadijalal, M. (2020). The effect of Google expeditions virtual reality on EFL learners’ willingness to communicate and oral proficiency. Computer Assisted Language Learning, 1–25. https://doi.org/10.1080/09588221.2020.1854311

Evers, K., & Chen, S. (2020). Effects of an automatic speech recognition system with peer feedback on pronunciation instruction for adults. Computer Assisted Language Learning, 1–22. https://doi.org/10.1080/09588221.2020.1839504

Fujii, A., Zeigler, N., & Mackey, A. (2016). Peer interaction and metacognitive instruction in the EFL classroom. In Sato, M. & Ballinger, S. (Eds.), Peer interaction and second language learning (pp. 63–89). John Benjamins.

Gillespie, J. (2020). CALL research: Where are we now? ReCALL, 32(2), 127–144. https://doi.org/10.1017/S0958344020000051

Goh, C. (2017a). Language awareness and the teaching of listening and speaking. In Garrett, P. & Cots, J. M. (Eds.), The Routledge handbook of language awareness (pp. 92–107). Routledge.

Goh, C. (2017b). Research into practice: Scaffolding learning processes to improve speaking performance. Language Teaching, 50(2), 247–260. https://doi.org/10.1017/S0261444816000483

Hayashi, Y. (2019). Investigating effects of working memory training on foreign language development. The Modern Language Journal, 103(3), 665–685. https://doi.org/10.1111/modl.12584

Hayashi, Y., Kobayashi, T., & Toyoshige, T. (2016). Investigating the relative contributions of computerized working memory training and English language teaching to cognitive and foreign language development. Applied Cognitive Psychology, 30, 196–213. https://doi.org/10.1002/acp.3177

Housen, A., & Kuiken, F. (2009). Complexity, accuracy and fluency in second language acquisition. Applied Linguistics, 30(4), 461–473. https://doi.org/10.1093/applin/amp048

Hsu, H. L., Chen, H. H. J., & Todd, A. G. (2021). Investigating the impact of the Amazon Alexa on the development of L2 listening and speaking skills. Interactive Learning Environments. Advance online publication. https://doi.org/10.1080/10494820.2021.2016864

Hubbard, P., & Romeo, K. (2012). Diversity in learner training. In Stockwell, G. (Ed.), Computer-assisted language learning: Diversity in research and practice (pp. 33–48). Cambridge University Press.

Hulstijn, J. H. (2006). Psycholinguistic perspectives on second language acquisition. In Cummins, J., & Davison, C. (Eds.), The international handbook on English language teaching (pp. 701–713). Springer.

Hwang, W. Y., Shih, T. K., Ma, Z. H., Shadiev, R., & Chen, S. Y. (2016). Evaluating listening and speaking skills in a mobile game-based learning environment with situational contexts. Computer Assisted Language Learning, 29(4), 639–657. https://doi.org/10.1080/09588221.2015.1016438

Isaacs, T., & Trofimovich, P. (2012). Deconstructing comprehensibility: Identifying the linguistic influences on listeners’ L2 comprehensibility ratings. Studies in Second Language Acquisition, 34(3), 475–505. https://doi.org/10.1017/S0272263112000150

Jiang, M. Y. C., Jong, M. S. Y., Lau, W. W. F., Chai, C. S., & Wu, N. (2021). Using automatic speech recognition technology to enhance EFL learners’ oral language complexity in a flipped classroom. Australasian Journal of Educational Technology, 37(2), 110–131. https://doi.org/10.14742/ajet.6798

Jong, N. H. de. (2020). Teaching speaking. In Chapelle, C. A. (Ed.), The concise encyclopedia of applied linguistics (pp. 1071–1077). Wiley-Blackwell.

Juffs, A., & Harrington, M. (2011). State of the article: Aspects of working memory in L2 learning. Language Teaching, 44(2), 137–166. https://doi.org/10.1017/S0261444810000509

Kinginger, C. (2002). Defining the zone of proximal development in US foreign language education. Applied Linguistics, 23(2), 240–261.

Kormos, J., & Safar, A. (2008). Phonological short-term memory, working memory and foreign language performance in intensive language learning. Bilingualism: Language and Cognition, 11(2), 261–271. https://doi.org/10.1017/S1366728908003416

Lantolf, J. P., & Beckett, T. G. (2009). Sociocultural theory and second language acquisition. Language Teaching, 42(4), 459–475. https://doi.org/10.1017/S0261444809990048

Lee, S.-M. (2021). The effectiveness of machine translation in foreign language education: A systematic review and meta-analysis. Computer Assisted Language Learning, 36(1–2), 103–125. https://doi.org/10.1080/09588221.2021.1901745

Levelt, W. J. M. (1994). The skill of speaking. In Bertelson, P., Eelen, P., & d’Ydewalle, G. (Eds.), International perspectives on psychological science: Vol. 1. Leading themes (pp. 89–103). Lawrence Erlbaum Associates, Inc.

Levelt, W. J. M. (1999). Language production: A blueprint of the speaker. In Brown, C. & Hagoort, P. (Eds.), Neurocognition of language (pp. 83–122). Oxford University Press.

Liu, T.-Y. (2009). A context-aware ubiquitous learning environment for language listening and speaking. Journal of Computer Assisted Learning, 25(6), 515–527. https://doi.org/10.1111/j.1365-2729.2009.00329.x

Liu, T.-Y., & Chu, Y.-L. (2010). Using ubiquitous games in an English listening and speaking course: Impact on learning outcomes and motivation. Computers & Education, 55(2), 630–643. https://doi.org/10.1016/j.compedu.2010.02.023

Mackey, A., Adams, R., Stafford, C., & Winke, P. (2010). Exploring the relationship between modified output and working memory capacity. Language Learning, 60(3), 501–533. https://doi.org/10.1111/j.1467-9922.2010.00565.x

McCrocklin, S. (2016). Pronunciation learner autonomy: The potential of automatic speech recognition. System, 57, 25–42. https://doi.org/10.1016/j.system.2015.12.013

McCrocklin, S. (2019). Learners’ feedback regarding ASR-based dictation practice for pronunciation learning. CALICO Journal, 36(2), 119–137. https://doi.org/10.1558/cj.34738

McCrocklin, S., & Edalatishams, I. (2020). Revisiting popular speech recognition software for ESL speech. TESOL Quarterly, 54(4), 1086–1097. https://doi.org/10.1002/tesq.3006

Munro, M. J., & Derwing, T. M. (1995). Processing time, accent, and comprehensibility in the perception of native and foreign-accented speech. Language and Speech, 38(3), 289–306. https://doi.org/10.1177/002383099503800305

Nassaji, H., & Swain, M. (2000). A Vygotskian perspective on corrective feedback in L2: The effect of random versus negotiated help on the learning of English articles. Language Awareness, 9(1), 34–51. https://doi.org/10.1080/09658410008667135

Parmaxi, A., & Demetriou, A. A. (2020). Augmented reality in language learning: A state‐of‐the‐art review of 2014–2019. Journal of Computer Assisted Learning, 36(5), 1–15. https://doi.org/10.1111/jcal.12486

Qiu, X. (2019). Functions of oral monologic tasks: Effects of topic familiarity on L2 speaking performance. Language Teaching Research, 24(6), 1–20. https://doi.org/10.1177/1362168819829021

Ratner, C. (2002). Cultural psychology: Theory and method. Kluwer/Plenum.

Saito, K., Trofimovich, P., & Isaacs, T. (2017). Using listener judgments to investigate linguistic influences on L2 comprehensibility and accentedness: A validation and generalization study. Applied Linguistics, 38(4), 439–462. https://doi.org/10.1093/applin/amv047

Sauro, S., & Zourou, K. (2017). Call for papers. Language Learning & Technology, 21(1), 186. https://doi.org/10125/44603

Shadiev, R., Hwang, W.-Y., & Huang, Y.-M. (2017). Review of research on mobile language learning in authentic environments. Computer Assisted Language Learning, 30(3–4), 284–303. http://dx.doi.org/10.1080/09588221.2017.1308383

Surtees, V., & Duff, P. (2022). Sociocultural approaches to speaking in SLA. In Derwing, T. M., Munro, M. J., & Thomson, R. I. (Eds.), The Routledge handbook of second language acquisition and speaking (pp. 54–67). Routledge.

Suzuki, S., & Kormos, J. (2020). Linguistic dimensions of comprehensibility and perceived fluency: An investigation of complexity, accuracy, and fluency in second language argumentative speech. Studies in Second Language Acquisition, 42(1), 143–167. https://doi.org/10.1017/S0272263119000421

Tai, T.-Y., & Chen, H. H.-J. (2020). The impact of Google Assistant on adolescent EFL learners’ willingness to communicate. Interactive Learning Environments, 31(3), 1485–1502. https://doi.org/10.1080/10494820.2020.1841801

Tai, T.-Y., & Chen, H. H.-J. (2022). The impact of intelligent personal assistants on adolescent EFL learners’ speaking proficiency. Computer Assisted Language Learning, 37(5–6), 1224–1251. https://doi.org/10.1080/09588221.2022.2070219

Taskiran, A. (2019). The effect of augmented reality games on English as foreign language motivation. E-Learning and Digital Media, 16(2), 122–135. https://doi.org/10.1177/2042753018817541

Trasher, T. (2022). The impact of virtual reality on L2 French learners’ language anxiety and oral comprehensibility: An exploratory study. CALICO Journal, 39(2), 1–20. https://doi.org/10.1558/cj.42198

Tsang, A. (2019). Reconceptualizing speaking, listening, and pronunciation: Glocalizing TESOL in the contexts of World Englishes and English as a lingua franca. TESOL Quarterly, 53(2), 580–588. https://doi.org/10.1002/tesq.504

van Lieshout, C., & Cardoso, W. (2022). Google Translate as a tool for self-directed language learning. Language Learning & Technology, 26(1), 1–19. http://hdl.handle.net/10125/73460

Vygotsky, L. S. (1978). Interaction between learning and development. In Cole, M., John-Steiner, V., Scribner, S., & Souberman, E. (Eds.), Mind in society: The development of higher psychological processes (pp. 79–91). Harvard University Press.

Wolfe-Quintero, K., Inagaki, S., & Kim, H. Y. (1998). Second language development in writing: Measures of fluency, accuracy, and complexity (Technical Report No. 17). National Foreign Language Resource Center.

Wu, J. G., & Miller, L. (2020). Improving English learners’ speaking through mobile-assisted peer feedback. RELC Journal, 51(1), 168–178. https://doi.org/10.1177/0033688219895335

Xie, Y., Chen, Y., & Ryder, L. H. (2021). Effects of using mobile-based virtual reality on Chinese L2 students’ oral proficiency. Computer Assisted Language Learning, 34(3), 225–245. https://doi.org/10.1080/09588221.2019.1604551

York, J., Shibata, K., Tokutake, H., & Nakayama, H. (2021). Effect of SCMC on foreign language anxiety and learning experience: A comparison of voice, video, and VR-based oral interaction. ReCALL, 33(1), 49–70. https://doi.org/10.1017/S0958344020000154

Zeng, S., Zhang, J., Gao, M., Xu, K. M., & Zhang, J. (2020). Using learning analytics to understand collective attention in LMOOCs. Computer Assisted Language Learning, 1–27. https://doi.org/10.1080/09588221.2020.1825094
