Words that second language learners are likely to hear, read, and use*



In the present study, we explore whether multiple data sources may be more effective than single sources at predicting the words that language learners are likely to know. Second language researchers have hypothesized that there is a relationship between word frequency and the likelihood that words will be encountered or used by second language learners, but it is not yet clear how this relationship should be effectively measured. An analysis of word frequency measures showed that spoken language frequency alone may predict the occurrence of words in learner textbooks, but that multiple corpora as well as textbook status can improve predictions of learner usage.


Corresponding author

Address for correspondence: Doug Davidson, F. C. Donders Centre for Cognitive Neuroimaging, P.O. Box 9101, 6500 HB Nijmegen, The


Arna van Doorn assembled the vocabulary lists from the three Dutch textbooks. The Max Planck Institute for Psycholinguistics provided access to the CELEX, the CGN, and the ESF corpora. The analysis was conducted using R (R Development Team, 2005), and the stats (R Development Team, 2005) and MASS (Venables & Ripley, 2002) libraries. This research was supported by the Nederlandse Organisatie voor Wetenschappelijk Onderzoek (NWO). We would also like to thank two anonymous reviewers for useful suggestions, and Jan Hulstijn for providing helpful comments and references for textbook vocabulary selection including, in addition to those cited in the text, Hazenberg (1994) and Sciarone (1979).



Akaike, H. 1974. A new look at statistical model identification. IEEE Transactions on Automatic Control, AU-19, 716–722.
Anderson, J. R. & Schooler, L. J. 1991. Reflections of the environment in memory. Psychological Science, 2, 396–408.
Baayen, R. H., Piepenbrock, R. & Gulikers, L. 1995. The CELEX Lexical Database (Release 2) [CD-ROM]. Philadelphia, PA: Linguistic Data Consortium, University of Pennsylvania.
Bossers, B. 1996. Woordenschat. In Hulstijn, J. H., Stumpel, R., Bossers, B. & Van Veen, C. (eds.), Nederlands als tweede taal in de volwasseneneducatie: Handboek voor docenten, pp. 167–193. Amsterdam: Meulenhoff Educatief.
Brown, C. 1993. Factors affecting the acquisition of vocabulary: Frequency and saliency of words. In T. Huckin, M. Haynes & J. Coady (eds.), Second language reading and vocabulary learning, pp. 263–286. Norwood, NJ: Ablex.
Corpus Gesproken Nederlands. Copyright Nederlandse Taalunie 2004. (accessed 17 October 2007).
Day, R., Omura, C. & Hiramatsu, M. 1991. Incidental EFL vocabulary learning and reading. Reading in a Foreign Language, 7, 541–551.
De Kleijn, P. & Nieuwborg, E. 1983. Basiswoordenboek Nederlands. Leuven: Wolters.
Donaldson, B. 1996. Colloquial Dutch: The complete course for beginners. New York: Routledge.
Dunn, L. M. & Dunn, L. M. 1997. The Peabody Picture Vocabulary Test – 3rd edition. Circle Pines, MN: American Guidance Service.
Dupuy, B. & Krashen, S. 1993. Incidental vocabulary acquisition in French as a foreign language. Applied Language Learning, 4, 55–63.
Fukkink, R. G., Hulstijn, J. & Simis, A. 2005. Does training of second-language word recognition skills affect reading comprehension? An experimental study. The Modern Language Journal, 89, 54–75.
Van Gelderen, A., Schoonen, R., De Glopper, K., Hulstijn, J., Simis, A., Snellings, P. & Stevenson, M. 2004. Linguistic knowledge, processing speed, and metacognitive knowledge in first- and second-language reading comprehension: A componential analysis. Journal of Educational Psychology, 96, 19–30.
Hazenberg, S. 1994. Een keur van woorden. Ph.D. dissertation, Vrije Universiteit Amsterdam.
Hulstijn, J. H., Hollander, M. & Greidanus, T. 1996. Incidental vocabulary learning by advanced foreign language students: The influence of marginal glosses, dictionary use, and reoccurrence of unknown words. Modern Language Journal, 80, 327–339.
van Kampen, H. & Stumpel, R. 2002. Dutch for self-study/Nederlands voor anderstaligen (4th edn.). Utrecht: Prisma (Het Spectrum B.V.).
Laufer, B. 2005. Lexical frequency profiles: From Monte Carlo to the real world. Applied Linguistics, 26 (4), 582–588.
Laufer, B., Elder, C., Hill, K. & Congdon, P. 2004. Size and strength: Do we need both to measure vocabulary knowledge? Language Testing, 21, 202–226.
Meara, P. 2005. Lexical frequency profiles: A Monte Carlo analysis. Applied Linguistics, 26 (1), 32–47.
Oostdijk, N. 2000. The Spoken Dutch Corpus: Overview and first evaluation. In Gravilidou, M., Carayannis, G., Markantonatou, S., Piperidis, S. & Stainhaouer, G. (eds.), LREC-2000 (Second International Conference on Language Resources and Evaluation) Proceedings, vol. 2, pp. 887–894. Paris: European Language Resources Association.
Pavlik, P. I., Jr. & Anderson, J. R. 2005. Practice and forgetting effects on vocabulary memory: An activation-based model of the spacing effect. Cognitive Science, 29, 559–586.
Perdue, C. (ed.) 1984. Second language acquisition by adult immigrants: A field manual. Rowley: Newbury House.
Perdue, C. (ed.) 1993. Adult language acquisition: Cross-linguistic perspectives (vol. 1: Field methods). Cambridge: Cambridge University Press.
Pitts, M., White, H. & Krashen, S. 1989. Acquiring second language vocabulary through reading: A replication of the Clockwork Orange study using second language acquirers. Reading in a Foreign Language, 5, 271–275.
R Development Team 2005. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria.
Rott, S. 1999. The effect of exposure frequency on intermediate language learners' incidental vocabulary acquisition and retention through reading. Studies in Second Language Acquisition, 21, 589–619.
Schneider-Broekmans, J. 2000. Taal vitaal: Nederlands voor beginners. Amsterdam & Antwerpen: Intertaal.
Sciarone, A. G. 1979. Woordjes leren in het vreemde-talenonderwijs. Muiderberg: Coutinho.
Uit den Boogaart, P. C. (ed.) 1975. Woordfrequenties in geschreven en gesproken Nederlands. Utrecht: Oosthoek, Scheltema & Holkema.
Venables, W. N. & Ripley, B. D. 2002. Modern applied statistics with S. New York: Springer.
Vermeer, A. 2001. Breadth and depth of vocabulary in relation to L1/L2 acquisition and frequency of input. Applied Psycholinguistics, 22, 217–234.
Bilingualism: Language and Cognition
  • ISSN: 1366-7289
  • EISSN: 1469-1841
  • URL: /core/journals/bilingualism-language-and-cognition