
Characterising postgraduate students’ corpus query and usage patterns for disciplinary data-driven learning

Peter Crosthwaite, Lillian L.C. Wong and Joyce Cheung


Data-driven learning (DDL; Johns, 1991), involving students’ hands-on use of corpora for self-guided language learning, is a methodology now increasingly used in many tertiary contexts to enhance the teaching of disciplinary postgraduate thesis writing. However, there are still few studies tracking students’ actual engagement with corpora for DDL. This mixed-methods study reports on the tracking of students’ corpus use via a purpose-built corpus query and data visualisation platform integrated into a large postgraduate disciplinary thesis writing program at a university in Hong Kong. Data on corpus usage history (e.g. times of access, duration of use), query syntax (e.g. query lexis/phraseology and use of wildcards and part-of-speech tags), query function (e.g. frequency lists/distribution, concordance sorting and collocation) and query filters (e.g. searches by faculty, discipline, or thesis section) were collected from 327 students spanning over 11,000 individual corpus queries. The results show significant interdisciplinary and inter-/intra-user trends and variation in the use of particular corpus functions and query syntax adopted by corpus users. Students varied in the type of knowledge (e.g. domain-specific, language-specific) they were accessing, and frequently went beyond the exemplars of the DDL course materials to generate unique queries under their own initiative. Qualitative case study data from three corpus users’ activity logs also show distinctive individual corpus engagement by query frequency and function. These data provide a clearer insight into what students actually do during DDL and the different directions and trajectories that individual users take as a result of DDL. All accompanying DDL tasks are also included as supplementary materials.



Anthony, L. (2014) AntConc (Version 3.4.4). Tokyo: Waseda University.
Anthony, L. (2017) AntFileConverter (Version 1.2.1). Tokyo: Waseda University.
Baisa, V., and Suchomel, V. (2014) SkELL: Web interface for English language learning. In Horák, A. & Rychlý, P. (eds.), RASLAN 2014: Eighth Workshop on Recent Advances in Slavonic Natural Language Processing (pp. 63–70). Brno: NLP Consulting.
Boulton, A. (2015) Applying data-driven learning to the web. In Leńko-Szymańska, A. & Boulton, A. (eds.), Multiple affordances of language corpora for data-driven learning. Amsterdam: John Benjamins, 267–295.
Boulton, A., Carter-Thomas, S., and Rowley-Jolivet, E. (eds.) (2012) Corpus-informed research and learning in ESP: Issues and applications. Amsterdam: John Benjamins.
Boulton, A., and Cobb, T. (2017) Corpus use in language learning: A meta-analysis. Language Learning, 67(2): 348–393.
Centre for Applied English Studies (2017) Introduction to Thesis Writing. Hong Kong: The University of Hong Kong.
Chambers, A., and O’Sullivan, Í. (2004) Corpus consultation and advanced learners’ writing skills in French. ReCALL, 16(1): 158–172.
Charles, M. (2007) Reconciling top-down and bottom-up approaches to graduate writing: Using a corpus to teach rhetorical functions. Journal of English for Academic Purposes, 6(4): 289–302.
Charles, M. (2014) Getting the corpus habit: EAP students’ long-term use of personal corpora. English for Specific Purposes, 35: 30–40.
Charles, M. (2015) Same task, different corpus: The role of personal corpora in EAP classes. In Leńko-Szymańska, A. & Boulton, A. (eds.), Multiple affordances of language corpora for data-driven learning. Amsterdam: John Benjamins, 131–153.
Chen, M., and Flowerdew, J. (2018) Introducing data-driven learning to PhD students for research writing purposes: A territory-wide project in Hong Kong. English for Specific Purposes, 50: 97–112.
Cobb, T., and Boulton, A. (2015) Classroom applications of corpus analysis. In Biber, D. & Reppen, R. (eds.), The Cambridge handbook of English corpus linguistics. Cambridge: Cambridge University Press, 478–497.
Cotos, E. (2014) Enhancing writing pedagogy with learner corpus data. ReCALL, 26(2): 202–224.
Cotos, E., Link, S., and Huffman, S. (2017) Effects of DDL technology on genre learning. Language Learning & Technology, 21(3): 104–130.
Crosthwaite, P. (2017) Retesting the limits of data-driven learning: Feedback and error correction. Computer Assisted Language Learning, 30(6): 447–473.
Flowerdew, L. (2015) Data-driven learning and language learning theories: Whither the twain shall meet. In Leńko-Szymańska, A. & Boulton, A. (eds.), Multiple affordances of language corpora for data-driven learning. Amsterdam: John Benjamins, 15–36.
Flowerdew, J. (2016) English for specific academic purposes (ESAP): Making the case. Writing & Pedagogy, 8(1): 5–32.
Frankenberg-Garcia, A. (2005) A peek into what today’s language learners as researchers actually do. International Journal of Lexicography, 18(3): 335–355.
Gaskell, D., and Cobb, T. (2004) Can learners use concordance feedback for writing errors? System, 32: 301–319.
Hafner, C. A., and Candlin, C. N. (2007) Corpus tools as an affordance to learning in professional legal education. Journal of English for Academic Purposes, 6: 303–318.
Hyland, K. (2000) Disciplinary discourses: Social interactions in academic writing. Harlow: Longman.
Johns, T. (1991) Should you be persuaded: Two examples of data-driven learning materials. In Johns, T. & King, P. (eds.), Classroom concordancing: English Language Research Journal 4. Birmingham: Centre for English Language Studies, University of Birmingham, 1–16.
Kilgarriff, A., and Grefenstette, G. (2003) Introduction to the special issue on the web as corpus. Computational Linguistics, 29(3): 333–347.
Kilgarriff, A., Rychly, P., Smrz, P., and Tugwell, D. (2004) The Sketch Engine. Information Technology, 105: 116–127.
Lee, D., and Swales, J. (2006) A corpus-based EAP course for NNS doctoral students: Moving from available specialized corpora to self-compiled corpora. English for Specific Purposes, 25(1): 56–75.
Lee, H., Warschauer, M., and Lee, J. H. (2018) The effects of corpus use on second language vocabulary learning: A multilevel meta-analysis. Applied Linguistics. Advance online publication.
Leńko-Szymańska, A., and Boulton, A. (eds.) (2015) Multiple affordances of language corpora for data-driven learning. Amsterdam: John Benjamins.
Long, M. H. (1991) Focus on form: A design feature in language teaching methodology. In de Bot, K., Ginsberg, R. B. & Kramsch, C. (eds.), Foreign language research in cross-cultural perspective. Amsterdam: John Benjamins, 39–52.
Luo, Q. (2016) The effects of data-driven learning activities on EFL learners’ writing development. SpringerPlus, 5(1): 1255.
Millar, N. (2011) The processing of malformed formulaic language. Applied Linguistics, 32(2): 129–148.
Pérez-Paredes, P., Sánchez-Tornel, M., Alcaraz Calero, J. M., and Jiménez, P. A. (2011) Tracking learners’ actual uses of corpora: Guided vs non-guided corpus consultation. Computer Assisted Language Learning, 24(3): 233–253.
Schmidt, R. W. (1990) The role of consciousness in second language learning. Applied Linguistics, 11(2): 129–158.
Steel, C. (2012) Fitting learning into life: Language students’ perspectives on benefits of using mobile apps. In Brown, M., Hartnett, M. & Stewart, T. (eds.), Future challenges, sustainable futures. Proceedings ASCILITE. Wellington: Massey University, 875–880.
Steel, C. H., and Levy, M. (2013) Language students and their technologies: Charting the evolution 2006–2011. ReCALL, 25(3): 306–320.
Widmann, J., Koh, K., and Ziai, R. (2011) The SACODEYL search tool: Exploiting corpora for language learning purposes. In Frankenberg-Garcia, A., Flowerdew, L. & Aston, G. (eds.), New trends in corpora and language learning. London: Continuum, 167–178.
Yoon, H. (2008) More than a linguistic reference: The influence of corpus technology on L2 academic writing. Language Learning & Technology, 12(2): 31–48.
Yoon, H., and Hirvela, A. (2004) ESL student attitude toward corpus use in L2 writing. Journal of Second Language Writing, 13(4): 257–283.


Supplementary materials

Crosthwaite et al. supplementary material 1 (731 KB)
