Improving mention detection for Basque based on a deep error analysis

ANDER SORALUZE, OLATZ ARREGI, XABIER ARREGI and ARANTZA DÍAZ DE ILARRAZA
Abstract

This paper presents the process of improving a mention detector for Basque. The system is rule-based and takes into account the characteristics of mentions in Basque. We propose a classification of error types based on the errors that occur during mention detection, present a deep error analysis that distinguishes error types and their causes, and propose improvements. At the final stage, the system obtains an F-measure of 74.57% under the Exact Matching protocol and of 80.57% under Lenient Matching. We also report the performance of the mention detector with gold standard data as input, to exclude errors caused by the previous stages of linguistic processing. In this scenario, we obtain an F-measure of 85.89% with Strict Matching and of 89.06% with Lenient Matching, i.e., a difference of 11.32 and 8.49 percentage points, respectively. Finally, we analyse how improvements in mention detection affect coreference resolution.
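The two matching protocols can be illustrated with a minimal sketch. The abstract does not define them, so the definitions below are assumptions made for illustration only: strict (exact) matching counts a predicted mention as correct when its span is identical to a gold span, and lenient matching counts it as correct when it shares at least one token with a gold span that has not been matched yet. The spans, function names and toy data are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

# A mention span as (start, end) token offsets, end exclusive (assumed representation).
Span = Tuple[int, int]


def f_measure(true_positives: int, n_pred: int, n_gold: int) -> float:
    """Harmonic mean of precision and recall; 0.0 when undefined."""
    precision = true_positives / n_pred if n_pred else 0.0
    recall = true_positives / n_gold if n_gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def strict_matches(pred: List[Span], gold: List[Span]) -> int:
    """Count predicted spans whose boundaries equal a gold span exactly."""
    return len(set(pred) & set(gold))


def lenient_matches(pred: List[Span], gold: List[Span]) -> int:
    """Count predicted spans that overlap a not-yet-matched gold span.

    Assumption: 'lenient' means any token overlap, matched greedily one-to-one.
    """
    unused = list(gold)
    matched = 0
    for p_start, p_end in pred:
        for g in unused:
            g_start, g_end = g
            if p_start < g_end and g_start < p_end:  # the two spans share a token
                unused.remove(g)
                matched += 1
                break
    return matched


# Toy data: one exact hit, one partial overlap, one spurious mention.
gold = [(0, 2), (3, 6), (8, 9)]
pred = [(0, 2), (4, 6), (10, 11)]

f_strict = f_measure(strict_matches(pred, gold), len(pred), len(gold))
f_lenient = f_measure(lenient_matches(pred, gold), len(pred), len(gold))
print(f"strict F = {f_strict:.2f}, lenient F = {f_lenient:.2f}")

# The gaps reported in the abstract are plain differences of F-measures.
gap_strict = round(85.89 - 74.57, 2)
gap_lenient = round(89.06 - 80.57, 2)
print(gap_strict, gap_lenient)
```

In this toy case the partially overlapping mention counts only under the lenient protocol, which is why lenient scores are never lower than strict ones on the same output.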

Natural Language Engineering
  • ISSN: 1351-3249
  • EISSN: 1469-8110
  • URL: /core/journals/natural-language-engineering