
The capacity of ChatGPT-4 for L2 writing assessment: A closer look at accuracy, specificity, and relevance

Published online by Cambridge University Press:  31 July 2025

Aysel Saricaoglu
Affiliation:
Department of English Language and Literature, Social Sciences University of Ankara, Ankara, Türkiye
Zeynep Bilki*
Affiliation:
Department of Foreign Language Education, TED University, Ankara, Türkiye
*Corresponding author: Zeynep Bilki; Email: zeynep.bilki@tedu.edu.tr

Abstract

This study examined the capacity of ChatGPT-4 to assess L2 writing in an accurate, specific, and relevant way. Based on 35 argumentative essays written by upper-intermediate L2 writers in higher education, we evaluated ChatGPT-4’s assessment capacity across four L2 writing dimensions: (1) Task Response, (2) Coherence and Cohesion, (3) Lexical Resource, and (4) Grammatical Range and Accuracy. The main findings were (a) ChatGPT-4 was exceptionally accurate in identifying the issues across the four dimensions; (b) ChatGPT-4 demonstrated more variability in feedback specificity, with more specific feedback in Grammatical Range and Accuracy and Lexical Resource, but more general feedback in Task Response and Coherence and Cohesion; and (c) ChatGPT-4’s feedback was highly relevant to the criteria in the Task Response and Coherence and Cohesion dimensions, but it occasionally misclassified errors in the Grammatical Range and Accuracy and Lexical Resource dimensions. Our findings contribute to a better understanding of ChatGPT-4 as an assessment tool, informing future research and practical applications in L2 writing assessment.

Information

Type
Research Article
Creative Commons
Creative Commons License: CC BY
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Open Practices
Open materials
Copyright
© The Author(s), 2025. Published by Cambridge University Press.
Table 1. Coded samples from the dataset

Table 2. Descriptive results for accuracy across L2 writing dimensions

Table 3. Descriptive results for specificity across L2 writing dimensions

Table 4. Descriptive results for relevance across L2 writing dimensions

Table 5. Chi-square test results for the association between dimension type and feedback features

Table 6. Adjusted standardized residuals for the four dimensions across feedback features

Supplementary material: File

Saricaoglu and Bilki supplementary material
