Improving mention detection for Basque based on a deep error analysis

Published online by Cambridge University Press: 12 July 2016

ANDER SORALUZE
OLATZ ARREGI
XABIER ARREGI
ARANTZA DÍAZ DE ILARRAZA
Affiliation:
IXA Group, University of the Basque Country, Donostia-San Sebastián, Spain. E-mails: ander.soraluze@ehu.es, olatz.arregi@ehu.es, xabier.arregi@ehu.es, a.diazdeilarraza@ehu.es.

Abstract

This paper describes the process of improving a mention detector for Basque. The system is rule-based and takes into account the characteristics of mentions in Basque. We propose a classification of the error types that occur during mention detection, present a deep error analysis that distinguishes error types and their causes, and propose improvements based on that analysis. In its final version, the system obtains an F-measure of 74.57% under the Exact (Strict) Matching protocol and of 80.57% under Lenient Matching. We also report the performance of the mention detector with gold standard data as input, in order to exclude errors introduced by the earlier stages of linguistic processing. In this scenario, we obtain an F-measure of 85.89% with Strict Matching and of 89.06% with Lenient Matching, i.e., gains of 11.32 and 8.49 percentage points, respectively, over the automatically processed input. Finally, we analyse how improvements in mention detection affect coreference resolution.
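
To make the difference between the two matching protocols concrete, the following minimal Python sketch scores predicted mention spans against gold spans. It is an illustration only and not the authors' scorer: the function names, the token-offset span representation, and in particular the lenient criterion (any token overlap with an unmatched gold mention) are assumptions, since the abstract does not define Lenient Matching.

```python
from typing import List, Tuple

# Hypothetical representation: a mention is (start, end) in token offsets, end inclusive.
Span = Tuple[int, int]


def overlaps(a: Span, b: Span) -> bool:
    """Return True if the two inclusive spans share at least one token."""
    return a[0] <= b[1] and b[0] <= a[1]


def score(gold: List[Span], predicted: List[Span], lenient: bool = False):
    """Return (precision, recall, F-measure) of predicted mentions against gold.

    Exact matching counts a prediction as correct only if its boundaries equal
    those of a gold mention. With lenient=True, a prediction left unmatched may
    also be credited if it overlaps a still-unmatched gold mention
    (assumed criterion, see the text above).
    """
    remaining_gold = list(gold)
    unmatched_pred = []
    true_positives = 0

    # Pass 1: exact boundary matches, each gold mention used at most once.
    for p in predicted:
        if p in remaining_gold:
            remaining_gold.remove(p)
            true_positives += 1
        else:
            unmatched_pred.append(p)

    # Pass 2 (lenient only): credit partial overlaps with leftover gold mentions.
    if lenient:
        for p in unmatched_pred:
            for g in remaining_gold:
                if overlaps(p, g):
                    remaining_gold.remove(g)
                    true_positives += 1
                    break

    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    denominator = precision + recall
    f_measure = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f_measure


# Toy data, invented for illustration (not taken from the paper).
gold_mentions = [(0, 2), (5, 5), (8, 10)]
predicted_mentions = [(0, 2), (5, 6), (12, 13)]

exact_scores = score(gold_mentions, predicted_mentions)
lenient_scores = score(gold_mentions, predicted_mentions, lenient=True)
print("exact:", exact_scores)
print("lenient:", lenient_scores)
```

In the toy data, the prediction `(5, 6)` has a wrong right boundary, so it counts only under the lenient setting. This is why lenient scores are never lower than exact scores for the same system output.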

Information

Type
Articles
Copyright
Copyright © Cambridge University Press 2016 