Morphosyntactic annotation of CHILDES transcripts*

KENJI SAGAE; ERIC DAVIS; ALON LAVIE; BRIAN MACWHINNEY; SHULY WINTNER

doi:10.1017/S0305000909990407

Published online by Cambridge University Press: 25 March 2010

KENJI SAGAE*: Institute for Creative Technologies, University of Southern California
ERIC DAVIS: Language Technologies Institute, Carnegie Mellon University
ALON LAVIE: Language Technologies Institute, Carnegie Mellon University
BRIAN MACWHINNEY: Department of Psychology, Carnegie Mellon University
SHULY WINTNER: Department of Computer Science, University of Haifa, Israel
* Address for correspondence: Kenji Sagae, USC Institute for Creative Technologies, 13274 Fiji Way, Marina del Rey, CA 90292. E-mail: sagae@usc.edu

Abstract

Corpora of child language are essential for research in child language acquisition and psycholinguistics. Linguistic annotation of the corpora provides researchers with better means for exploring the development of grammatical constructions and their usage. We describe a project whose goal is to annotate the English section of the CHILDES database with grammatical relations in the form of labeled dependency structures. We have produced a corpus of over 18,800 utterances (approximately 65,000 words) with manually curated gold-standard grammatical relation annotations. Using this corpus, we have developed a highly accurate data-driven parser for the English CHILDES data, which we used to automatically annotate the remainder of the English section of CHILDES. We have also extended the parser to Spanish, and are currently working on supporting more languages. The parser and the manually and automatically annotated data are freely available for research purposes.

Information

Type: Articles
Information: Journal of Child Language, Volume 37, Special Issue 3: Computational models of child language learning, June 2010, pp. 705–729

DOI: https://doi.org/10.1017/S0305000909990407
Copyright: © Cambridge University Press 2010

Footnotes

[*] We thank Marina Fedner for help with annotation of the English data, and Bracha Nir for help with annotation of the Hebrew data. This research was supported in part by Grant No. 2007241 from the United States–Israel Binational Science Foundation (BSF) and by the National Science Foundation (NSF) under grant IIS-0414630.

References

Berger, A., Della Pietra, S. A. & Della Pietra, V. J. (1996). A maximum entropy approach to natural language processing. Computational Linguistics 22(1), 39–71.

Berman, R. A. (1978). Modern Hebrew structure. Tel Aviv: University Publishing Projects.

Berman, R. A. (1979). Lexical decomposition and lexical unity in the expression of derived verbal categories in modern Hebrew. Afroasiatic Linguistics 6, 1–26.

Bloom, L. (1970). Language development: Form and function in emerging grammars. Cambridge, MA: MIT Press.

Bod, R. (2009). From exemplar to grammar: A probabilistic analogy-based model of language learning. Cognitive Science 33(5), 752–93.

Borensztajn, G., Zuidema, J. & Bod, R. (2009). Children's grammars grow more abstract with age – evidence from an automatic procedure for identifying the productive units of language. Topics in Cognitive Science 1, 175–88.

Briscoe, T. & Carroll, J. (1993). Generalised probabilistic LR parsing of natural language (corpora) with unification-based grammars. Computational Linguistics 19(1), 25–59.

Brown, R. (1973). A first language: The early stages. Cambridge, MA: Harvard University Press.

Buchholz, S. & Marsi, E. (2006). CoNLL-X shared task on multilingual dependency parsing. In Proceedings of the Tenth Conference on Computational Natural Language Learning (CoNLL-X), 149–64. New York City: Association for Computational Linguistics.

Charniak, E. (2000). A maximum-entropy-inspired parser. In Proceedings of the First Conference of the North American Chapter of the Association for Computational Linguistics, 132–39. San Francisco, CA: Morgan Kaufmann Publishers Inc.

Doron, E. (1983). Verbless predicates in Hebrew. Unpublished doctoral dissertation, University of Texas at Austin.

Hudson, R. A. (1984). Word grammar. Oxford: Basil Blackwell.

Knuth, D. (1965). On the translation of languages from left to right. Information and Control 8(6), 607–39.

Lee, L. (1974). Developmental sentence analysis. Evanston, IL: Northwestern University Press.

MacWhinney, B. (2000). The CHILDES project: Tools for analyzing talk, 3rd edn. Mahwah, NJ: Lawrence Erlbaum Associates.

MacWhinney, B. (2008). Enriching CHILDES for morphosyntactic analysis. In Behrens, H. (ed.), Corpora in language acquisition research: History, methods, perspectives, Vol. 6, 165–98. Amsterdam: Benjamins.

Mel'čuk, I. A. (1988). Dependency syntax: Theory and practice. Albany, NY: SUNY Press.

Nivre, J. (2003). An efficient algorithm for projective dependency parsing. In Proceedings of the Eighth International Workshop on Parsing Technologies (IWPT), 149–60. Nancy.

Nivre, J., Hall, J., Nilsson, J., Eryigit, G. & Marinov, S. (2006). Labeled pseudo-projective dependency parsing with support vector machines. In Proceedings of the Tenth Conference on Computational Natural Language Learning, 221–25. New York: Association for Computational Linguistics.

Parisse, C. & Le Normand, M.-T. (2000). Automatic disambiguation of the morphosyntax in spoken language corpora. Behavior Research Methods, Instruments and Computers 32, 468–81.

Peters, A. M. (1983). The units of language acquisition. New York: Cambridge University Press.

Sagae, K. & Lavie, A. (2006). A best-first probabilistic shift-reduce parser. In Proceedings of the COLING/ACL Poster Session, 691–98. Sydney: Association for Computational Linguistics.

Sagae, K., Lavie, A. & MacWhinney, B. (2004). Adding syntactic annotations to transcripts of parent–child dialogs. In Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC 2004), 1815–18. Lisbon: European Language Resources Association.

Sagae, K., Lavie, A. & MacWhinney, B. (2005). Automatic measurement of syntactic development in child language. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL'05), 197–204. Ann Arbor, MI: Association for Computational Linguistics.

Sagae, K. & Tsujii, J. (2007). Dependency parsing and domain adaptation with LR models and parser ensembles. In Proceedings of the CoNLL Shared Task Session of the Joint Conferences on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL 2007), 1044–50. Prague: Association for Computational Linguistics.

Scarborough, H. S. (1990). Index of productive syntax. Applied Psycholinguistics 11, 1–22.

Tomita, M. (ed.) (1991). Generalized LR parsing. Boston: Kluwer Academic Publishing.

Wilson, B. & Peters, A. M. (1988). What are you cookin' on a hot?: A three-year-old blind child's ‘violation’ of universal constraints on constituent movement. Language 64, 249–73.
