Predicting CEFR levels in learners of English: The use of microsystem criterial features in a machine learning approach

Published online by Cambridge University Press:  10 November 2021

Thomas Gaillat
Université Rennes 2, France
Andrew Simpkin
School of Mathematics, Statistics and Applied Mathematics, National University of Ireland, Galway
Nicolas Ballier
Université de Paris, France
Bernardo Stearns
Data Science Institute (DSI), National University of Ireland, Galway
Annanda Sousa
Data Science Institute (DSI), National University of Ireland, Galway
Manon Bouyé
Université de Paris, France
Manel Zarrouk
Université Sorbonne Paris Nord, France


This paper focuses on automatically assessing language proficiency levels according to linguistic complexity in learner English. We implement a supervised learning approach as part of an automatic essay scoring system. The objective is to uncover Common European Framework of Reference for Languages (CEFR) criterial features in writings by learners of English as a foreign language. Our method relies on the concept of microsystems with features related to learner-specific linguistic systems in which several forms operate paradigmatically. Results on internal data show that different microsystems help classify writings from A1 to C2 levels (82% balanced accuracy). Overall results on external data show that a combination of lexical, syntactic, cohesive and accuracy features yields the most efficient classification across several corpora (59.2% balanced accuracy).
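The results above are reported as balanced accuracy. As a minimal sketch of what that metric measures, assuming the common definition (the unweighted mean of per-class recall), the following toy example contrasts it with plain accuracy. The function name and the labels are hypothetical illustrations, not the authors' code or data.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Unweighted mean of per-class recall over the classes present in y_true.

    Assumption: this is the usual definition of balanced accuracy; the paper's
    exact computation is not given in the abstract.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one labelled example is required")

    total = defaultdict(int)    # number of texts per true level
    correct = defaultdict(int)  # number of correctly predicted texts per true level
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1

    recalls = [correct[level] / total[level] for level in total]
    return sum(recalls) / len(recalls)


# Hypothetical, imbalanced toy labels (not taken from the paper's corpora).
y_true = ["A1"] * 6 + ["B1"] * 2 + ["C2"] * 2
y_pred = ["A1"] * 6 + ["B1", "A1"] + ["B1", "C2"]

plain = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
balanced = balanced_accuracy(y_true, y_pred)
print(f"plain accuracy:    {plain:.3f}")
print(f"balanced accuracy: {balanced:.3f}")
```

In this toy case the majority level is always predicted correctly, so plain accuracy looks higher than balanced accuracy, which gives each CEFR level equal weight regardless of how many texts it contains.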

Research Article
© The Author(s), 2021. Published by Cambridge University Press on behalf of European Association for Computer Assisted Language Learning

