Predicting lexico-grammatical competence in L2 Spanish writing through multiple lexical diversity measures

Published online by Cambridge University Press:  15 May 2026

Earl Kjar Brown*
Affiliation:
Department of Linguistics, Brigham Young University, Provo, Utah, USA
Brett Hashimoto
Affiliation:
Department of Linguistics, Brigham Young University, Provo, Utah, USA
Alan V. Brown
Affiliation:
Department of Hispanic Studies, University of Kentucky, Lexington, Kentucky, USA
*Corresponding author: Earl Kjar Brown; Email: ekbrown@byu.edu

Abstract

The present study compares several lexical diversity (LD) measures to determine which measure, or measures, best predict receptive lexico-grammatical ability in written L2 Spanish, and whether a composite LD score predicts that ability better than any single measure. We analyzed 1,225 written responses with eight LD measures: six popular measures, a composite score based on Principal Component Analysis (PCA) of those six measures, and the traditional type-token ratio (TTR), which served as a baseline for the other measures. Other predictor variables included the age and gender of the writers, the age of first exposure to Spanish, the number of years studying Spanish, and study abroad participation. The results of a series of mixed-effect logistic regression models suggest that the composite LD measure is the best predictor, and that among the individual LD measures studied, two predicted lexico-grammatical ability nearly equally well. We conclude with the recommendation that L2 researchers use multiple LD measures, including a composite measure based on PCA, rather than any single LD measure.
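
As a rough illustration of the kind of calculation the abstract describes, the following Python sketch computes TTR, a moving-average TTR (MATTR), and a PCA-based composite score (the score on the first principal component of standardized measures). It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the window size of 50, and the use of power iteration to find the first component are all illustrative choices that do not come from the article.

```python
from math import sqrt
from statistics import mean, pstdev


def ttr(tokens):
    """Type-token ratio: number of unique tokens divided by total tokens."""
    return len(set(tokens)) / len(tokens)


def mattr(tokens, window=50):
    """Moving-average TTR: mean TTR over every contiguous window.

    The window size of 50 is an assumption made for this sketch.
    Texts no longer than the window fall back to plain TTR.
    """
    if len(tokens) <= window:
        return ttr(tokens)
    scores = [ttr(tokens[i:i + window]) for i in range(len(tokens) - window + 1)]
    return mean(scores)


def standardize(column):
    """Convert a list of values to z-scores using the population SD."""
    mu, sd = mean(column), pstdev(column)
    return [(x - mu) / sd for x in column]


def first_pc_scores(rows, iterations=200):
    """Score each row (one text) on the first principal component.

    rows: one list of LD measure values per text.
    Uses power iteration on the correlation matrix of the z-scored
    measures, starting from a vector of ones (an illustrative choice
    that suits positively correlated measures).
    """
    cols = [standardize(list(c)) for c in zip(*rows)]
    n, k = len(rows), len(cols)
    corr = [[sum(a * b for a, b in zip(cols[i], cols[j])) / n
             for j in range(k)] for i in range(k)]
    w = [1.0] * k
    for _ in range(iterations):
        w = [sum(corr[i][j] * w[j] for j in range(k)) for i in range(k)]
        norm = sqrt(sum(x * x for x in w))
        w = [x / norm for x in w]
    # Project each text's standardized measures onto the loading vector.
    return [sum(w[j] * cols[j][r] for j in range(k)) for r in range(n)]
```

In this sketch, each text would contribute one row of LD values to `first_pc_scores`, and the returned scores could then serve as a single composite predictor in a regression model.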

Information

Type
Methods Forum
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2026. Published by Cambridge University Press

Table 1. Descriptive Statistics for L2 Spanish Writers

Table 2. Descriptive Statistics of Placement Exam Scores

Figure 1. Distribution of exam scores.

Table 3. Summary of Calculations of LD Measures Used in This Study

Table 4. Correlation between Manually Corrected and Hunspell-Corrected Versions of 40 Randomly Selected Responses

Figure 2. Residuals by fitted values of mixed-effect linear regression model with MATTR and other predictor variables.

Table 5. Logistic Regression Model with MTLD-MA-Wrap

Table 6. Variance Inflation Factor Scores of Logistic Regression Model That Included MTLD-MA-Wrap

Figure 3. Exam score by MTLD-MA-Wrap by binned text length.

Table 7. Pearson Correlation Coefficients of Six LD Measures

Table 8. Kaiser-Meyer-Olkin Sampling Adequacy Scores of Five LD Measures

Table 9. Eigenvalues and Percentage of Explained Variance of Principal Components

Figure 4. Vector plot of PCA.

Figure 5. Contribution of six LD measures to PC1.

Table 10. Mixed-Effect Logistic Regression with PC1 Along with Other Predictors

Table 11. Variance Inflation Factor Scores of Logistic Regression Model That Included PC1

Figure 6. Exam score by PC1 by text length.

Table 12. Bayes Factors for Model Comparison