
Wordbank: an open repository for developmental vocabulary data*

Published online by Cambridge University Press:  18 May 2016

Stanford University, USA
Address for correspondence: Michael C. Frank, Department of Psychology, Jordan Hall (Bldg. 420), 450 Serra Mall, Stanford, CA 94305; tel: (650) 724-4003; e-mail:


The MacArthur-Bates Communicative Development Inventories (CDIs) are a widely used family of parent-report instruments for easy and inexpensive data-gathering about early language acquisition. CDI data have been used to explore a variety of theoretically important topics, but, with few exceptions, researchers have had to rely on data collected in their own lab. In this paper, we remedy this issue by presenting Wordbank, a structured database of CDI data combined with a browsable web interface. Wordbank archives CDI data across languages and labs, providing a resource for researchers interested in early language, as well as a platform for novel analyses. The site allows interactive exploration of patterns of vocabulary growth at the level of both individual children and particular words. We also introduce wordbankr, a software package for connecting to the database directly. Together, these tools extend the abilities of students and researchers to explore quantitative trends in vocabulary development.

Copyright © Cambridge University Press 2016 




This work was supported by a John Merck Scholars award and NSF BCS-1528526. Thanks to Ranjay Krishna for contributions to the initial development of the site, to Rune Nørgaard Jørgensen for helping port data from CLEX, to all of the contributors listed at <> for generously sharing their data, and to the Advisory Board of the MacArthur-Bates Communicative Development Inventories, especially Philip Dale and Larry Fenson, for their support.


