Introducing the Chinese Learner English Corpus (CLEC)

A new resource for exploring the impact of extramural activities on L2 writing

Published online by Cambridge University Press: 03 October 2025

Ying Wang*, Karlstad University, Sweden
Henrik Kaatari, University of Gävle, Sweden
Tove Larsson, Northern Arizona University, USA
Hongping Xiong, Wuhan University, China
Fei Liu, Northwestern Polytechnical University, China

*Corresponding author: Ying Wang; Email: ying.wang@kau.se

Abstract

This paper introduces the Chinese Learner English Corpus (CLEC), comprising argumentative texts written by Chinese lower and upper secondary school students. CLEC expands learner corpus research by including texts from intermediate-level learners and rich metadata on their backgrounds, including engagement with self-initiated, so-called Extramural English (EE) activities outside the classroom. To illustrate potential uses, two case studies are presented. The first uses a keyword analysis to reveal thematic and stylistic differences between CLEC and its Swedish counterpart, SLEC, highlighting linguistic priorities related to distinct learning contexts. The second investigates lexical bundles associated with gaming, demonstrating how EE engagement might influence learners’ use of multiword units. Freely available online, CLEC facilitates contrastive interlanguage analysis and supports further research into L2 learning and use, particularly regarding the role of language exposure. The corpus is also a valuable resource for teacher trainees aiming to deepen their understanding of SLA processes.

Information

Type
Data Report
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Open Practices
Open data
Copyright
© The Author(s), 2025. Published by Cambridge University Press
Table 1. Overview of the metadata included in CLEC

Table 2. Distribution across gender

Table 3. Distribution across school years

Figure 1. Time spent on EE activities.

Table 4. Popular films/TV series, songs, and reading genres

Figure 2. Time spent on EE activities: comparing CLEC and SLEC.

Table 5. Top 30 keywords from CLEC and SLEC

Table 6. Top 30 3-word bundles in gaming and nongaming subsets

Supplementary material: File

Wang et al. supplementary material

File 23.7 KB