
A closer look at a marginalized test method: Self-assessment as a measure of speaking proficiency

Published online by Cambridge University Press:  28 April 2022

Paula Winke*
Affiliation:
Michigan State University, East Lansing, MI, USA
Xiaowan Zhang
Affiliation:
MetaMetrics, Durham, NC, USA
Steven J. Pierce
Affiliation:
Michigan State University, East Lansing, MI, USA
*Corresponding author: E-mail: winke@msu.edu

Abstract

Second language (L2) teachers may shy away from self-assessments because of warnings that students are not accurate self-assessors. These warnings stem from meta-analyses in which self-assessment scores, on average, did not correlate highly with proficiency test results. However, the researchers mostly used Pearson correlations in cases where polyserial correlations could have been used. Furthermore, self-assessments today can be computer adaptive, and nonlinear statistics are needed to investigate how such assessments relate to other measurements. We wondered whether we would find different results if we explored the relationship between self-assessment and proficiency test scores using more robust measurements (polyserial correlation, continuation-ratio modeling). We had 807 L2-Spanish learners take a computer-adaptive, L2-speaking self-assessment and the ACTFL Oral Proficiency Interview – computer (OPIc). The scores correlated at .61 (polyserial). Using continuation-ratio modeling, we found that each one-unit increase on the OPIc scale was associated with a 131% increase in the odds of passing the self-assessment thresholds. In other words, a student was more likely to move on to higher self-assessment subsections if they had a higher OPIc rating. We found computer-adaptive self-assessments appropriate for low-stakes L2-proficiency measurements, especially because they are cost-effective, make intuitive sense to learners, and promote learner agency.
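
To make the "131% increase in the odds" concrete, the following is a minimal sketch, not the authors' analysis code. It assumes a standard logit link, under which a 131% increase corresponds to an odds ratio of 2.31 per OPIc unit. The intercept value is a hypothetical placeholder (the fitted intercepts are not given in this section), and the function names are invented for illustration.

```python
import math


def percent_change_in_odds(odds_ratio):
    """Convert an odds ratio into a percent change in the odds."""
    return (odds_ratio - 1.0) * 100.0


def conditional_pass_probability(intercept, log_odds_ratio, opic_centered):
    """Probability of passing a threshold, given that it was reached.

    Assumes a logit link: logit(p) = intercept + log_odds_ratio * opic_centered.
    """
    logit = intercept + log_odds_ratio * opic_centered
    return 1.0 / (1.0 + math.exp(-logit))


# A 131% increase in the odds is the same as an odds ratio of 1 + 1.31 = 2.31.
ODDS_RATIO = 2.31
BETA = math.log(ODDS_RATIO)  # slope on the log-odds scale

# Hypothetical intercept for one threshold (assumption, not a reported value).
HYPOTHETICAL_INTERCEPT = 0.0

if __name__ == "__main__":
    print(f"Percent change in odds: {percent_change_in_odds(ODDS_RATIO):.0f}%")
    # opic_centered = 0 is the reference rating; each step multiplies the odds by 2.31.
    for opic_centered in (-1, 0, 1):
        p = conditional_pass_probability(HYPOTHETICAL_INTERCEPT, BETA, opic_centered)
        print(f"centered OPIc = {opic_centered:+d}: conditional pass probability = {p:.3f}")
```

Note that the odds ratio multiplies odds, not probabilities: the change in pass probability per OPIc unit depends on the starting probability.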

Information

Type
Research Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Open Practices
Open data
Copyright
© The Author(s), 2022. Published by Cambridge University Press

Figure 1. Timeline of 15 studies on L2 self-assessment of speaking 1985–2019

Figure 2. This study’s validity claim, warrants, backings, and rebuttal that will be tested through the analysis of the data.

Figure 3. The sequential selection process of the self-assessment. Level 1 through level 5 represent the five testlets of 10 can-do statements in the self-assessment. Threshold 1 through threshold 4 represent the four thresholds that implicitly exist between every two levels of self-assessment testlets.

Table 1. The five ACTFL OPIc test forms offered to the students in spring 2017 (adapted from Isbell & Winke, p. 469).

Table 2. Frequency counts of students’ highest self-assessment level by OPIc score

Table 3. Polyserial, Pearson, and Spearman correlations between self-assessment levels and OPIc scores

Table 4. The number of students who completed, demonstrated mastery on, and failed to demonstrate mastery on the statements in each testlet level of the self-assessment

Table 5. The number of students who took and passed each threshold (see Figure 3) and each threshold’s conditional pass rate

Table 6. Model estimates, standard errors, p-values, and odds ratio for Models 1 and 2

Table 7. Goodness-of-fit indices for Models 1 and 2 and likelihood-ratio test result

Figure 4. Predicted conditional pass rates by OPIc rating for each self-assessment threshold. The nine lines represent the nine observed OPIc ratings (1 = novice-low, 2 = novice-mid, 3 = novice-high, 4 = intermediate-low, 5 = intermediate-mid, 6 = intermediate-high, 7 = advanced-low, 8 = advanced-mid, 9 = advanced-high). Due to how OPIc ratings were centered for modeling purposes, the line for OPIc = 5 visualizes the interpretation of the set of transition-specific intercepts.

Figure 5. Predicted unconditional (or absolute) pass rates by OPIc rating for each self-assessment threshold. The nine lines represent the nine observed OPIc ratings (1 = novice-low, 2 = novice-mid, 3 = novice-high, 4 = intermediate-low, 5 = intermediate-mid, 6 = intermediate-high, 7 = advanced-low, 8 = advanced-mid, 9 = advanced-high).
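
The relationship between conditional and unconditional pass rates can be sketched as follows. The sketch assumes the sequential design described for the self-assessment, in which a student reaches a threshold only after passing every earlier one, so the unconditional rate at a threshold is the running product of the conditional rates up to it. The input rates are hypothetical and the function name is invented for illustration; neither comes from the study's tables.

```python
from itertools import accumulate
from operator import mul


def unconditional_pass_rates(conditional_rates):
    """Turn per-threshold conditional pass rates into unconditional ones.

    conditional_rates[k] is the probability of passing threshold k+1 given
    that the student reached it. The unconditional probability of passing
    threshold k+1 is the product of the conditional rates for thresholds
    1 through k+1.
    """
    return list(accumulate(conditional_rates, mul))


if __name__ == "__main__":
    # Hypothetical conditional pass rates for thresholds 1 to 4 (assumption).
    hypothetical_conditional = [0.9, 0.8, 0.5, 0.25]
    for k, rate in enumerate(unconditional_pass_rates(hypothetical_conditional), start=1):
        print(f"Threshold {k}: unconditional pass rate = {rate:.3f}")
```

Because every factor is at most 1, the unconditional rates can only stay level or fall from one threshold to the next, even when a later conditional rate is higher than an earlier one.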