Annotation projection for temporal information extraction

  • Chris R. Giannella, Ransom K. Winder and Joseph P. Jubinski

Abstract

Approaches to building temporal information extraction systems typically rely on large, manually annotated corpora. Thus, porting these systems to new languages requires acquiring large corpora of manually annotated documents in the new languages. Acquiring such corpora is difficult owing to the complexity of temporal information extraction annotation. One strategy for addressing this difficulty is to reduce or eliminate the need for manually annotated corpora through annotation projection. This technique uses a temporal information extraction system for a source language (typically English) to automatically annotate the source language side of a parallel corpus. It then uses automatically generated word alignments to project the annotations, thereby creating noisily annotated target language training data. We developed an annotation projection technique for producing target language temporal information extraction systems. We carried out an English (source) to French (target) case study in which we compared a French temporal information extraction system built using annotation projection with one built using a manually annotated French corpus. While annotation projection has been applied to building other kinds of Natural Language Processing tools (e.g., Named Entity Recognizers), to our knowledge this is the first paper to examine annotation projection for temporal information extraction without any manual correction of the target language annotations. We found that F-scores were relatively low (<0.35) even for the system built from manually annotated data, which suggests that the problem is challenging even with manual annotation.
Our annotation projection approach performed well (relative to the system built from manually annotated data) on some aspects of temporal information extraction (e.g., event–document creation time temporal relation prediction), but it performed poorly on other kinds of temporal relation prediction (e.g., event–event and event–time).
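The abstract describes the projection step only at a high level. The following is a minimal sketch of the general idea, under assumptions that are ours rather than the paper's: annotations are half-open token spans, word alignments arrive as `i-j` index pairs (source token i aligned to target token j), a projected mention covers the smallest contiguous target span containing every token aligned to the source mention, and a temporal relation (TLINK) is kept only if both its endpoints survive projection, with the document creation time (DCT) treated as always available because it is not tied to any sentence. `Mention`, `TLink`, `project_document` and the English/French sentence pair are hypothetical illustrations, not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple

# Assumption: the DCT lives in document metadata, so it needs no word alignment.
DCT = "DCT"


@dataclass(frozen=True)
class Mention:
    """Hypothetical annotation: an event or time expression over tokens [start, end)."""
    mid: str    # identifier, e.g. "e1" or "t1"
    start: int  # first token index (inclusive)
    end: int    # one past the last token index (exclusive)
    label: str  # e.g. "EVENT" or "TIMEX3"


@dataclass(frozen=True)
class TLink:
    """Hypothetical temporal relation between two mention ids (or the DCT)."""
    source: str
    target: str
    rel: str    # e.g. "BEFORE", "IS_INCLUDED"


def parse_alignment(line: str) -> Set[Tuple[int, int]]:
    """Parse space-separated 'i-j' pairs into a set of (source, target) indices."""
    pairs: Set[Tuple[int, int]] = set()
    for pair in line.split():
        i, j = pair.split("-")
        pairs.add((int(i), int(j)))
    return pairs


def project_mention(m: Mention, alignment: Set[Tuple[int, int]]) -> Optional[Mention]:
    """Map a source span to the min..max span of its aligned target tokens."""
    targets = [j for (i, j) in alignment if m.start <= i < m.end]
    if not targets:
        return None  # no aligned target token: the annotation is lost
    return Mention(m.mid, min(targets), max(targets) + 1, m.label)


def project_document(
    mentions: List[Mention], links: List[TLink], alignment: Set[Tuple[int, int]]
) -> Tuple[List[Mention], List[TLink]]:
    """Project mentions, then keep only relations whose endpoints both survived."""
    projected: Dict[str, Mention] = {}
    for m in mentions:
        pm = project_mention(m, alignment)
        if pm is not None:
            projected[m.mid] = pm

    def survives(endpoint: str) -> bool:
        return endpoint == DCT or endpoint in projected

    kept = [link for link in links if survives(link.source) and survives(link.target)]
    return list(projected.values()), kept


if __name__ == "__main__":
    # Hypothetical pair:
    #   EN: The(0) meeting(1) ended(2) yesterday(3) .(4)
    #   FR: La(0) réunion(1) s'(2) est(3) terminée(4) hier(5) .(6)
    # The meeting/réunion link (1-1) is omitted to simulate an alignment error.
    alignment = parse_alignment("0-0 2-2 2-3 2-4 3-5 4-6")
    mentions = [
        Mention("e1", 2, 3, "EVENT"),   # "ended"
        Mention("e2", 1, 2, "EVENT"),   # "meeting"
        Mention("t1", 3, 4, "TIMEX3"),  # "yesterday"
    ]
    links = [
        TLink("e1", "t1", "IS_INCLUDED"),  # event-time
        TLink("e1", DCT, "BEFORE"),        # event-DCT
        TLink("e2", "e1", "ENDED_BY"),     # event-event
    ]
    fr_mentions, fr_links = project_document(mentions, links, alignment)
    for m in fr_mentions:
        print(m)
    for link in fr_links:
        print(link)
```

Taking the min..max span fills gaps, so an unaligned target token inside the span is absorbed into the projection; a stricter variant could discard non-contiguous projections instead. Because a relation survives only when its endpoints do, in this sketch an alignment error on a single event or time expression also removes every relation involving it, as the dropped `e2` link illustrates.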


Corresponding author. Email: cgiannella@mitre.org
