
NLP-powered quantitative verification of the English Grammar Profile’s structure-level assignment

Published online by Cambridge University Press: 25 July 2025

Daniela Verratti-Souto*
Affiliation:
Hector Research Institute of Education Sciences and Psychology, University of Tübingen, Tübingen, Germany
Nelly Sagirov
Affiliation:
Hector Research Institute of Education Sciences and Psychology, University of Tübingen, Tübingen, Germany
Xiaobin Chen
Affiliation:
Hector Research Institute of Education Sciences and Psychology, University of Tübingen, Tübingen, Germany
*Corresponding author: Daniela Verratti-Souto; Email: daniverratti@gmail.com

Abstract

Since its inception, the Common European Framework of Reference (CEFR) has become increasingly influential in the field of second language (L2) education. In an effort to define the grammatical structures that English learners acquire at each CEFR level, the English Grammar Profile (EGP) provides a list of over 1,200 structure-level mappings derived from a largely manual analysis of learner corpora. Though highly valuable for the design of didactic materials and examinations, the EGP lacks comprehensive quantitative methods to verify the acquisition levels it proposes for the grammatical structures. This paper presents an approach for revisiting the EGP structure-level mappings with empirical statistics. The approach uses automatic grammatical construction extraction, a large learner corpus, and statistical testing to empirically determine the level of each structure. The structure-level mappings resulting from our approach show limited agreement with the original EGP proposals, suggesting that frequency data alone do not provide enough evidence for the acquisition of the grammatical structures at the levels presented by the EGP.

Information

Type
Research Article
Creative Commons
Creative Commons License: CC BY
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2025. Published by Cambridge University Press.
Figure 1. Example EGP form- and use-based descriptors for the A1 level.

Table 1. Englishtown levels and their CEFR and Cambridge English counterparts

Figure 2. Obtaining L2 knowledge representations with the UIMA framework.

Table 2. Distribution of extractable structures across CEFR levels

Table 3. Average precision, recall, and f1-score across all evaluated structures for each CEFR level

Figure 3. Example of an expected trajectory for an A2 structure.

Table 4. Number and percentage of structures of each level not found in the data

Table 5. Number and percentage of structures per EGP level presenting significant differences in their frequencies across text levels

Table 6. Percentage of informative structures with a significant difference at their corresponding EGP level

Table 7. Precision, recall, and f1-score for structure-level assignments

Figure 4. Classification of structures into levels: our predictions vs. the EGP-assigned levels.

Supplementary material: File

Verratti-Souto et al. supplementary material

File 2.7 MB