Wordbank: an open repository for developmental vocabulary data*


The MacArthur-Bates Communicative Development Inventories (CDIs) are a widely used family of parent-report instruments for easy and inexpensive data-gathering about early language acquisition. CDI data have been used to explore a variety of theoretically important topics, but, with few exceptions, researchers have had to rely on data collected in their own lab. In this paper, we remedy this issue by presenting Wordbank, a structured database of CDI data combined with a browsable web interface. Wordbank archives CDI data across languages and labs, providing a resource for researchers interested in early language, as well as a platform for novel analyses. The site allows interactive exploration of patterns of vocabulary growth at the level of both individual children and particular words. We also introduce wordbankr, a software package for connecting to the database directly. Together, these tools extend the abilities of students and researchers to explore quantitative trends in vocabulary development.

Corresponding author
Address for correspondence: Michael C. Frank, Department of Psychology, Jordan Hall (Bldg. 420), 450 Serra Mall, Stanford, CA 94305; tel: (650) 724-4003; e-mail:

This work was supported by a John Merck Scholars award and NSF BCS-1528526. Thanks to Ranjay Krishna for contributions to the initial development of the site, to Rune Nørgaard Jørgensen for helping port data from CLEX, to all of the contributors listed at <> for generously sharing their data, and to the Advisory Board of the MacArthur-Bates Communicative Development Inventories, especially Philip Dale and Larry Fenson, for their support.


Journal of Child Language
  • ISSN: 0305-0009
  • EISSN: 1469-7602
  • URL: /core/journals/journal-of-child-language