
Relative importance of lexical features in word processing during L2 English reading

Published online by Cambridge University Press: 29 August 2025

Shingo Nahatame*
Affiliation: University of Tsukuba, Tsukuba, Japan
Satoru Uchida
Affiliation: Kyushu University, Fukuoka, Japan
*Corresponding author: Shingo Nahatame; Email: nahatame.shingo.gp@u.tsukuba.ac.jp

Abstract

Word processing during reading is known to be influenced by lexical features, especially word length, frequency, and predictability. This study examined the relative importance of these features in word processing during second language (L2) English reading. We used data from an eye-tracking corpus and applied a machine-learning approach to model word-level eye-tracking measures and identify key predictors. Predictors comprised several lexical features, including length, frequency, and predictability (e.g., surprisal). Additionally, sentence, passage, and reader characteristics were considered for comparison. The analysis found that word length was the most important variable across several eye-tracking measures. However, for certain measures, word frequency and predictability were more important than length, and in some cases, reader characteristics such as proficiency were more significant than lexical features. These findings highlight the complexity of word processing during reading, the shared processes between first language (L1) and L2 reading, and their potential to refine models of eye-movement control.

Information

Type
Research Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution-NonCommercial-ShareAlike licence (http://creativecommons.org/licenses/by-nc-sa/4.0), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the same Creative Commons licence is used to distribute the re-used or adapted article and the original article is properly cited. The written permission of Cambridge University Press must be obtained prior to any commercial use.
Copyright
© The Author(s), 2025. Published by Cambridge University Press

Table 1. Descriptive statistics for six eye-tracking measures used in this study

Table 2. Variables of lexical features

Figure 1. Relative importance of each predictor for skipping (skip). Note: Each score is presented with its mean value and accompanied by error bars indicating the standard error.
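As a rough illustration of the mean and standard-error summary described in the note, the following minimal sketch computes both for a single predictor. The input values and the function name `mean_and_se` are hypothetical; the page does not state over which repetitions (for example, folds or repeated runs) the scores were aggregated, so that is an assumption here.

```python
import math
import statistics


def mean_and_se(scores):
    """Return the mean and the standard error of the mean for a list of scores."""
    mean = statistics.mean(scores)
    # Standard error = sample standard deviation / sqrt(number of scores).
    se = statistics.stdev(scores) / math.sqrt(len(scores))
    return mean, se


# Hypothetical importance scores for one predictor across repeated runs.
runs = [2.0, 4.0, 4.0, 6.0]
mean, se = mean_and_se(runs)
print(f"mean = {mean:.2f}, SE = {se:.3f}")
```

The error bar for a predictor would then span `mean - se` to `mean + se`.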

Figure 2. Relationship between skipping probability and word length.

Figure 3. Relative importance of each predictor for first fixation duration (ffd). Note: Each score is presented with its mean value and accompanied by error bars indicating the standard error.

Figure 4. Relative importance of each predictor for regression-in (regin). Note: Each score is presented with its mean value and accompanied by error bars indicating the standard error.

Figure 5. Relative importance of each predictor for regression path duration (rpd). Note: Each score is presented with its mean value and accompanied by error bars indicating the standard error.

Figure 6. Relative importance of each predictor for the number of fixations (nfix). Note: Each score is presented with its mean value and accompanied by error bars indicating the standard error.

Figure 7. Relative importance of each predictor for total fixation duration (tfd). Note: Each score is presented with its mean value and accompanied by error bars indicating the standard error.

Figure 8. Heatmap of predictor importance across eye-tracking measures. Note: Warmer colors (red, orange, and yellow) indicate higher relative importance, whereas cooler green indicates lower relative importance. To enhance interpretability, the color gradient in this heatmap is based on normalized values within each column, allowing the relative importance of predictors to be compared independently within each measure.
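The within-column normalization mentioned in the note can be sketched as follows. This is a minimal illustration that assumes min-max scaling to the range [0, 1]; the caption does not specify which normalization was used, and the function name `normalize_columns` and the example scores are hypothetical.

```python
def normalize_columns(matrix):
    """Min-max scale each column of a row-major matrix to [0, 1].

    Rows stand for predictors and columns for eye-tracking measures,
    so each measure is rescaled independently of the others.
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    scaled = [[0.0] * n_cols for _ in range(n_rows)]
    for j in range(n_cols):
        column = [row[j] for row in matrix]
        low, high = min(column), max(column)
        span = high - low
        for i in range(n_rows):
            # A constant column has no spread; map it to 0.0 by convention.
            scaled[i][j] = (matrix[i][j] - low) / span if span else 0.0
    return scaled


# Hypothetical importance scores: 3 predictors (rows) x 2 measures (columns).
importance = [
    [4.0, 1.0],
    [2.0, 3.0],
    [0.0, 2.0],
]
scaled = normalize_columns(importance)
for row in scaled:
    print(row)
```

Under this assumption, the most important predictor for each measure receives the warmest color regardless of how large its raw score is compared with scores for other measures, which is why colors are comparable within a column but not across columns.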

Supplementary material: File

Nahatame and Uchida supplementary material

File 40.4 KB