Testing the limits of data-driven learning: language proficiency and training

Published online by Cambridge University Press:  01 January 2009

Alex Boulton
CRAPEL – ATILF/CNRS, Nancy-Université, 3 place Godefroi de Bouillon, BP 3397, 54015 Nancy – cedex, France


The potential for corpora in language learning has attracted a significant amount of attention in recent years, including in the form of data-driven learning (DDL). Careful not to appear to over-promote the field, enthusiasts have urged caution in its application, in particular with regard to lower-level learners, and have argued that extensive learner-training in corpus techniques is an essential condition for DDL to be successful. Such limits seem eminently reasonable, but there is a notable dearth of empirical studies to support them. This paper describes a simple experiment to see how lower-level learners cope with corpus data with no prior training.

The language focus here is on linking adverbials in English, which are notoriously difficult to teach using traditional methods. The subjects are 132 first-year students at an engineering college in France, of roughly intermediate and lower levels of English. They were randomly divided into groups to compare their ability to deal with the target items using traditional sources (extracts from a bilingual dictionary or a grammar/usage manual) or corpus data (short contexts or truncated concordances). Performance was tested prior to the experiment; subsequently, to check ability to use the different information sources as a reference; and later, to test recall.

No evidence was found that traditional sources promote better recall, and corpus data seemed to be more effective for reference purposes. While the results of any single experiment must be treated with caution, these findings suggest the need for more empirical studies to complement the theoretical arguments and qualitative data which currently dominate the discussions of DDL.

Original Article
Copyright © European Association for Computer Assisted Language Learning 2009
