
Exploring the dual impact of AI in post-entry language assessment: Potentials and pitfalls

Published online by Cambridge University Press:  05 June 2025

Tiancheng Zhang*
Affiliation:
Faculty of International Studies, Southwestern University of Finance and Economics, Chengdu, China DELNA, The University of Auckland, Auckland, New Zealand
Rosemary Erlam
Affiliation:
Faculty of Arts and Education, The University of Auckland, Auckland, New Zealand
Morena Botelho de Magalhães
Affiliation:
DELNA, The University of Auckland, Auckland, New Zealand
*
Corresponding author: Tiancheng Zhang; Email: tzha305@aucklanduni.ac.nz

Abstract

This paper explores the complex dynamics of using AI, particularly generative artificial intelligence (GenAI), in post-entry language assessment (PELA) at the tertiary level. Empirical data from trials with Diagnostic English Language Needs Assessment (DELNA), the University of Auckland’s PELA, are presented.

The first study examines the capability of GenAI to generate reading texts and assessment items that might be suitable for use in DELNA. A trial of this GenAI-generated academic reading assessment with a group of target participants (n = 132) further evaluates its suitability. The second study investigates the use of a fine-tuned GPT-4o model for rating DELNA writing tasks, assessing whether automated writing evaluation (AWE) provides feedback of comparable quality to that of human raters. Findings indicate that while GenAI shows promise in generating content for reading assessments, expert evaluations reveal a need for refinement in question complexity and in the targeting of specific subskills. In AWE, the fine-tuned GPT-4o model aligns closely with human raters in overall scoring but requires improvement in delivering detailed and actionable feedback.

A Strengths, Weaknesses, Opportunities, and Threats (SWOT) analysis highlights AI’s potential to enhance PELA by increasing efficiency, adaptability, and personalization. AI could extend PELA’s scope to areas such as oral skills and dynamic assessment. However, challenges such as academic integrity and data privacy remain critical concerns. The paper proposes a collaborative model integrating human expertise and AI in PELA, emphasizing the irreplaceable value of human judgment. We also emphasize the need to establish clear guidelines for a human-centered AI approach within PELA to maintain ethical standards and uphold assessment integrity.

Information

Type
Research Article
Creative Commons
Creative Commons License - CC BY
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution, and reproduction in any medium, provided the original work is properly cited.
Copyright
© The Author(s), 2025. Published by Cambridge University Press.
Figure 1. The three stages of the DELNA process.

Figure 2. Scalable content creation using human-in-the-loop AI in the Duolingo English Test (Hao et al., 2024, p. 3).

Figure 3. Wright map for the reading trial of the GenAI-generated assessment.

Table 1. Performance of the fine-tuned LLM-based AWE model in writing assessment

Figure 4. SWOT analysis of AI in PELA.

Figure 5. A proposed collaborative model between humans and AI within PELA.