
Automated assessment of second language comprehensibility: Review, training, validation, and generalization studies

Published online by Cambridge University Press:  28 March 2022

Kazuya Saito*
Affiliation:
University College London, London, UK
Konstantinos Macmillan
Affiliation:
Birkbeck, University of London, London, UK
Magdalena Kachlicka
Affiliation:
Birkbeck, University of London, London, UK
Takuya Kunihara
Affiliation:
University of Tokyo, Tokyo, Japan
Nobuaki Minematsu
Affiliation:
University of Tokyo, Tokyo, Japan
*Corresponding author. Email: k.saito@ucl.ac.uk

Abstract

Whereas many scholars have emphasized the relative importance of comprehensibility as an ecologically valid goal for L2 speech training, testing, and development, eliciting listeners’ judgments is time-consuming. Following calls for research on more efficient L2 speech rating methods in applied linguistics, and growing attention toward using machine learning on spontaneous unscripted speech in speech engineering, the current study examined the possibility of establishing quick and reliable automated comprehensibility assessments. Orchestrating a set of phonological (maximum posterior probabilities and gaps between L1 and L2 speech), prosodic (pitch and intensity variation), and temporal measures (articulation rate, pause frequency), the regression model significantly predicted how naïve listeners intuitively judged low, mid, high, and nativelike comprehensibility among 100 L1 and L2 speakers’ picture descriptions. The strength of the correlation (r = .823 for machine vs. human ratings) was comparable to naïve listeners’ interrater agreement (r = .760 for humans vs. humans). The findings were successfully replicated when the model was applied to a new dataset of 45 L1 and L2 speakers (r = .827) and tested under a more freely constructed interview task condition (r = .809).
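The modeling approach described in the abstract — a multiple regression mapping automated speech measures onto listeners' comprehensibility ratings, evaluated via the Pearson correlation between predicted and human scores — can be illustrated with a minimal sketch. This is not the authors' code: the feature names follow the abstract, but the data below are synthetic and the coefficients are invented for illustration.

```python
# Illustrative sketch: ordinary least squares regression from automated
# speech measures to human comprehensibility scores, then machine-human
# agreement as a Pearson r (cf. the abstract's r = .823).
import numpy as np

rng = np.random.default_rng(0)
n = 100  # speakers, as in Study 1

# Synthetic automated measures (one column each): posterior probability,
# L1-L2 posterior gap, pitch variation, intensity variation,
# articulation rate, pause frequency.
X = rng.normal(size=(n, 6))

# Synthetic "human" comprehensibility scores: a weighted blend of the
# measures plus rater noise (weights are arbitrary, for illustration).
true_w = np.array([1.2, -0.8, 0.5, 0.3, 0.9, -0.7])
y = X @ true_w + rng.normal(scale=1.0, size=n)

# Fit OLS: prepend an intercept column and solve the least-squares system.
A = np.column_stack([np.ones(n), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
y_pred = A @ coef

# Agreement between model predictions and human ratings.
r = np.corrcoef(y, y_pred)[0, 1]
print(round(r, 3))
```

With real data, the same correlation would be compared against human interrater agreement (the abstract's r = .760 benchmark) to judge whether the automated scores are as consistent with listeners as listeners are with each other.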

Information

Type
Methods Forum
Creative Commons
CC BY
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution, and reproduction in any medium, provided the original work is properly cited.
Open Practices
Open materials
Copyright
© The Author(s), 2022. Published by Cambridge University Press
Table 1. Summary of 11 key studies on automated L2 speech assessment

Figure 1. Conceptual summary of averaged posterior and posterior gap.

Table 2. Intraclass correlations among 10 listeners’ comprehensibility judgments (Study 1)

Table 3. Descriptive statistics of automated measures

Table 4. Results of multiple regression analysis using automated measures as predictors of listeners’ comprehensibility scores

Figure 2. The relationship between human comprehensibility scores and predicted comprehensibility scores (r = .823).

Table 6. Intraclass correlations among five listeners’ comprehensibility judgments (Study 2)

Figure 3. The relationship between human comprehensibility scores and predicted comprehensibility scores (r = .827).

Table 7. Intraclass correlations among five listeners’ comprehensibility judgments (Study 2)

Figure 4. Relationship between human comprehensibility scores and predicted comprehensibility scores (r = .809).

Supplementary material

Saito et al. supplementary material (File, 21.9 KB)