Abstract
Estimating word complexity is essential for many computer-assisted language learning technologies. We introduce semantic error prediction (SEP), a novel task that assesses the production complexity of content words. In SEP, a system must predict which word tokens are replacements of tokens from the original text. We apply LLMs to this task and establish its practical relevance for predicting the vocabulary scores of learner essays, providing a finer-grained assessment of learner skills.


