
Adding semantic roles to the Chinese Treebank


We report work on adding semantic role labels to the Chinese Treebank, a corpus already annotated with phrase structures. The work involves locating all verbs and their nominalizations in the corpus and semi-automatically adding semantic role labels to their arguments, which are constituents in a parse tree. Although the same procedure is followed for both, different issues arise in the annotation of verbs and of nominalized predicates. For verbs, identifying arguments is generally straightforward given their syntactic structure in the Chinese Treebank, as they tend to occupy well-defined syntactic positions. Our discussion focuses on the syntactic variations in the realization of the arguments, as well as on our approach to annotating dislocated and discontinuous arguments. In comparison, identifying the arguments of nominalized predicates is more challenging, and we discuss criteria and procedures for distinguishing arguments from non-arguments. In particular, we focus on the role of support verbs and on the relevance of event/result distinctions in the annotation of the predicate-argument structure of nominalized predicates. We also present our approach to taking advantage of the syntactic structure in the Chinese Treebank to bootstrap the predicate-argument structure annotation of verbs. Finally, we discuss the creation of a lexical database of frame files and its role in guiding predicate-argument annotation. Procedures for ensuring annotation consistency and the results of an inter-annotator agreement evaluation are also presented.
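The idea of bootstrapping verb annotation from syntactic position can be illustrated with a minimal sketch. It is not the procedure used in this work: the nested-tuple tree encoding, the function names, the placeholder role names ARG0 and ARG1, and the rule "subject maps to ARG0, object maps to ARG1" are all assumptions made for illustration. In practice such automatic guesses would still need to be checked by annotators against the frame files.

```python
from typing import List, Optional, Tuple, Union

# Hypothetical tree encoding: a leaf is a word (str); an internal node is
# a tuple (label, child, child, ...). Labels follow treebank-style tags.
Tree = Union[str, Tuple]


def label(node: Tree) -> Optional[str]:
    """Return the constituent label, or None for a leaf word."""
    return node[0] if isinstance(node, tuple) else None


def words(node: Tree) -> List[str]:
    """Collect the leaf words under a node, left to right."""
    if isinstance(node, str):
        return [node]
    out: List[str] = []
    for child in node[1:]:
        out.extend(words(child))
    return out


def propose_roles(clause: Tree) -> Tuple[Optional[str], List[Tuple[str, str]]]:
    """Guess (predicate, [(role, text), ...]) for one simple clause.

    Assumed rule of thumb: the subject NP is proposed as ARG0 and the
    object NP inside the VP as ARG1. Real annotation has many exceptions.
    """
    predicate = None
    roles = []
    for child in clause[1:]:
        if label(child) == "NP-SBJ":
            roles.append(("ARG0", "".join(words(child))))
        elif label(child) == "VP":
            for sub in child[1:]:
                if label(sub) == "VV":
                    predicate = "".join(words(sub))
                elif label(sub) == "NP-OBJ":
                    roles.append(("ARG1", "".join(words(sub))))
    return predicate, roles


# Toy clause: "警方 调查 事故" (the police investigate the accident).
clause = ("IP",
          ("NP-SBJ", ("NN", "警方")),
          ("VP", ("VV", "调查"), ("NP-OBJ", ("NN", "事故"))))
predicate, roles = propose_roles(clause)
```

A rule this simple only covers arguments in canonical positions; dislocated or discontinuous arguments, and the arguments of nominalized predicates, are exactly the cases where a positional guess breaks down and manual judgment is needed.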

Natural Language Engineering
  • ISSN: 1351-3249
  • EISSN: 1469-8110