Wordbank: an open repository for developmental vocabulary data*

Published online by Cambridge University Press:  18 May 2016

Stanford University, USA
Address for correspondence: Michael C. Frank, Department of Psychology, Jordan Hall (Bldg. 420), 450 Serra Mall, Stanford, CA 94305; tel: (650) 724-4003; e-mail:


The MacArthur-Bates Communicative Development Inventories (CDIs) are a widely used family of parent-report instruments for easy and inexpensive data-gathering about early language acquisition. CDI data have been used to explore a variety of theoretically important topics, but, with few exceptions, researchers have had to rely on data collected in their own lab. In this paper, we remedy this issue by presenting Wordbank, a structured database of CDI data combined with a browsable web interface. Wordbank archives CDI data across languages and labs, providing a resource for researchers interested in early language, as well as a platform for novel analyses. The site allows interactive exploration of patterns of vocabulary growth at the level of both individual children and particular words. We also introduce wordbankr, a software package for connecting to the database directly. Together, these tools extend the abilities of students and researchers to explore quantitative trends in vocabulary development.

Copyright © Cambridge University Press 2016 




This work was supported by a John Merck Scholars award and NSF BCS-1528526. Thanks to Ranjay Krishna for contributions to the initial development of the site, to Rune Nørgaard Jørgensen for helping port data from CLEX, to all of the contributors listed at <> for generously sharing their data, and to the Advisory Board of the MacArthur-Bates Communicative Development Inventories, especially Philip Dale and Larry Fenson, for their support.


