
DOING L2 SPEECH RESEARCH ONLINE: WHY AND HOW TO COLLECT ONLINE RATINGS DATA

Published online by Cambridge University Press:  28 June 2021

Charles L. Nagle*
Affiliation: Iowa State University

Ivana Rehman
Affiliation: Iowa State University

*Correspondence concerning this article should be addressed to Charles L. Nagle, Department of World Languages and Cultures, Iowa State University, 3102 Pearson Hall, 505 Morrill Road, Ames, Iowa 50011. E-mail: cnagle@iastate.edu

Abstract

Listener-based ratings have become a prominent means of defining second language (L2) users’ global speaking ability. In most cases, local listeners are recruited to evaluate speech samples in person. However, in many teaching and research contexts, recruiting local listeners may not be possible or advisable. The goal of this study was to hone a reliable method of recruiting listeners to evaluate L2 speech samples online through Amazon Mechanical Turk (AMT) using a blocked rating design. Three groups of listeners were recruited: local laboratory raters and two AMT groups, one inclusive of the dialects to which L2 speakers had been exposed and another inclusive of a variety of dialects. Reliability was assessed using intraclass correlation coefficients, Rasch models, and mixed-effects models. Results indicate that online ratings can be highly reliable as long as appropriate quality control measures are adopted. The method and results can guide future work with online samples.
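The abstract names intraclass correlation coefficients as one of the reliability measures. As a minimal sketch of how a two-way, average-measures ICC — ICC(2,k), a common choice for rater panels — can be computed from a speech-samples × raters matrix (an illustration with NumPy and complete data, not the authors' analysis code):

```python
import numpy as np

def icc_2k(ratings):
    """Two-way random-effects, average-measures ICC (ICC(2,k)).

    ratings: (n_items, k_raters) array with no missing cells.
    """
    ratings = np.asarray(ratings, dtype=float)
    n, k = ratings.shape
    grand = ratings.mean()
    row_means = ratings.mean(axis=1)   # per speech sample
    col_means = ratings.mean(axis=0)   # per rater

    # Mean squares from the two-way ANOVA decomposition.
    ms_rows = k * np.sum((row_means - grand) ** 2) / (n - 1)
    ms_cols = n * np.sum((col_means - grand) ** 2) / (k - 1)
    ss_err = np.sum(
        (ratings - row_means[:, None] - col_means[None, :] + grand) ** 2
    )
    ms_err = ss_err / ((n - 1) * (k - 1))

    return (ms_rows - ms_err) / (ms_rows + (ms_cols - ms_err) / n)
```

For example, two raters who agree perfectly yield an ICC of 1.0, while a constant between-rater offset lowers the estimate because ICC(2,k) treats rater severity as error.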

Information

Type
Methods Forum
Creative Commons
Creative Commons License - CC BY
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Open Practices
Open materials
Copyright
© The Author(s), 2021. Published by Cambridge University Press

TABLE 1. Listener demographics


FIGURE 1. Overview of the structure and timing of the online ratings in Amazon Mechanical Turk.


TABLE 2. Listener demographics: screened and approved AMT workers


TABLE 3. Means and (SDs) by rater group and construct


FIGURE 2. Score distribution by rater group and construct.


TABLE 4. Reliability coefficients by rater group and construct


TABLE 5. Summary of least square means: group × rating type


FIGURE 3. Reliability of the AMT ratings at different listener sample sizes. Note. Each sample size simulation consists of 100 runs. These ICC estimates hold for the blocked design of the current study, in which listeners evaluated at least 11 L2 files and up to 22 files if they participated in both experimental blocks. According to Cicchetti (1994), ICC > .60 = good and ICC > .75 = excellent. Solid black lines have been added to the figure at these values.
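The note describes estimating reliability at different listener sample sizes, with 100 simulation runs per size. A hedged sketch of that kind of resampling procedure on synthetic ratings — random rater subsets of each size, ICC(2,k) per draw — where all data are simulated and a fully crossed design is assumed (the study itself used a blocked, partially crossed design):

```python
import numpy as np

def icc_2k(ratings):
    """Two-way random-effects, average-measures ICC for an items x raters matrix."""
    ratings = np.asarray(ratings, dtype=float)
    n, k = ratings.shape
    grand = ratings.mean()
    rm, cm = ratings.mean(axis=1), ratings.mean(axis=0)
    ms_rows = k * np.sum((rm - grand) ** 2) / (n - 1)
    ms_cols = n * np.sum((cm - grand) ** 2) / (k - 1)
    ms_err = (
        np.sum((ratings - rm[:, None] - cm[None, :] + grand) ** 2)
        / ((n - 1) * (k - 1))
    )
    return (ms_rows - ms_err) / (ms_rows + (ms_cols - ms_err) / n)

def simulate_icc(ratings, sample_sizes, n_runs=100, seed=1):
    """Mean ICC(2,k) over n_runs random rater subsets of each size."""
    rng = np.random.default_rng(seed)
    ratings = np.asarray(ratings, dtype=float)
    k_total = ratings.shape[1]
    out = {}
    for k in sample_sizes:
        draws = [
            icc_2k(ratings[:, rng.choice(k_total, size=k, replace=False)])
            for _ in range(n_runs)
        ]
        out[k] = float(np.mean(draws))
    return out

# Synthetic rater panel: 30 speech samples, 12 raters = true score + rater noise.
rng = np.random.default_rng(0)
true_scores = rng.normal(5.0, 2.0, size=(30, 1))
panel = true_scores + rng.normal(0.0, 1.0, size=(30, 12))
curve = simulate_icc(panel, sample_sizes=[2, 4, 8])
```

Averaging the 100 draws per sample size traces a reliability curve like the one in the figure: average-measures ICC rises as more raters are pooled, so the curve can be compared against the Cicchetti .60/.75 benchmarks to choose a rater sample size.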

Supplementary material: File

Nagle and Rehman supplementary material (File, 2.7 MB)