Second language learning of degree expressions: A computational approach

Published online by Cambridge University Press:  03 December 2024

Yan Cong*
Affiliation:
School of Languages and Cultures, Purdue University, West Lafayette, IN 47907, USA

Abstract

Degrees, unlike entities or events, refer to comparative qualities and are closely tied to gradable adjectives such as “tall.” Degree expressions have been explored in second language (L2) research, covering areas such as learnability, first language (L1) transfer, contrastive analysis, and acquisition difficulty. However, a computational approach to the learning of degree expressions in L2 contexts, particularly for L1 Chinese learners of English, has not been thoroughly investigated. This study aims to fill this gap by using natural language processing (NLP) methods, drawing insights from recent advancements in large language models (LLMs). We extend Cong’s (2024) general-purpose assessment pipeline to specifically analyze degree expressions, predicting that surprisal metrics will correlate with proficiency levels and distinct developmental stages of L2 learners. Crucially, we address the limitations of surprisal metrics in capturing underuse or avoidance (both common in L2 development) by integrating frequency-based analyses. Using an NLP pipeline developed with Stanza, we automatically identified and analyzed degree expressions, constructing linear mixed-effects models to track L2 development trajectories. Our findings reveal that as proficiency increases, learners use complex degree expressions more frequently, supporting theories linking difficulty and learnability. Higher surprisal values are associated with lower proficiency in using degree expressions, and these surprisals are more predictive of proficiency with degree expressions than classic NLP measures. These results add further evidence that LLMs and NLP tools provide valuable insights into L2 development, specifically in the domain of degree expressions, expanding upon previous research and offering new approaches for understanding L2 learning processes.
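
For readers unfamiliar with the surprisal metric mentioned in the abstract, the following is a minimal sketch of how it is commonly defined: the surprisal of a token is the negative log probability of that token given its preceding context. This is not the study's pipeline. The study derives probabilities from LLMs, whereas the `BigramModel` class, its add-one smoothing, and the toy training sentences below are hypothetical stand-ins chosen so the example runs without external models.

```python
import math
from collections import Counter


def token_surprisals(probs):
    """Surprisal in bits for each token: -log2 p(token | context)."""
    return [-math.log2(p) for p in probs]


def mean_surprisal(probs):
    """Average per-token surprisal of a sentence."""
    values = token_surprisals(probs)
    return sum(values) / len(values)


class BigramModel:
    """Toy add-one-smoothed bigram model (illustrative stand-in for an LLM)."""

    def __init__(self, sentences):
        self.context_counts = Counter()  # how often each token precedes another
        self.bigram_counts = Counter()   # counts of (previous, current) pairs
        self.vocab = set()
        for sentence in sentences:
            tokens = ["<s>"] + sentence.lower().split()
            self.vocab.update(tokens)
            for prev, cur in zip(tokens, tokens[1:]):
                self.context_counts[prev] += 1
                self.bigram_counts[(prev, cur)] += 1

    def prob(self, prev, cur):
        # Add-one smoothing; the extra +1 reserves mass for unseen tokens.
        vocab_size = len(self.vocab) + 1
        return (self.bigram_counts[(prev, cur)] + 1) / (
            self.context_counts[prev] + vocab_size
        )

    def sentence_probs(self, sentence):
        tokens = ["<s>"] + sentence.lower().split()
        return [self.prob(p, c) for p, c in zip(tokens, tokens[1:])]


# Hypothetical reference sentences, not data from the study.
model = BigramModel([
    "alex is taller than sam",
    "sam is more careful than alex",
])
attested = mean_surprisal(model.sentence_probs("alex is taller than sam"))
unattested = mean_surprisal(model.sentence_probs("alex is more tall than sam"))
```

In this toy setting the comparative form the model has not seen ("more tall") receives a higher mean surprisal than the attested form ("taller"), which illustrates the direction of the association the abstract reports between surprisal and proficiency. It also shows why surprisal alone cannot capture underuse or avoidance: a construction that a learner never produces contributes no tokens to score, which is why the study adds frequency-based analyses.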

Information

Type
Squib
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2024. Published by Cambridge University Press

Table 1. A subset of PELIC that we extracted for our degree expressions analysis

Table 2. Subtypes of degree expressions extracted from the L2 corpus

Figure 1. Stanza dependency parse of the English clausal comparative sentence “Alex marched more quickly than she thought.”

Table 3. Descriptive statistics for the LLMs variables

Figure 2. L2 development in degree expressions indexed by LLMs-surprisals. *p < 0.05; **p < 0.001; ***p < 0.0001; ns, no significant difference.

Table 4. Effect sizes of paired comparisons

Table 5. Linear mixed-effects models summary for each degree expression subtype

Figure 3. Degree expressions subtypes usage development across proficiency levels.

Table 6. Evaluation metrics of the random forest models predicting writing proficiency

Table 7. Factor analysis of the surprisals and existing (syntactic) sophistication indices

Table 8. Spearman’s correlations between Writing_Sample and different kinds of surprisals