BLANC: Implementing the Rand index for coreference evaluation

M. RECASENS; E. HOVY

doi:10.1017/S135132491000029X

Published online by Cambridge University Press: 06 December 2010

M. RECASENS: CLiC, University of Barcelona, Gran Via 585, Barcelona 08007, Spain; email: mrecasens@ub.edu
E. HOVY: USC Information Sciences Institute, 4676 Admiralty Way, Marina del Rey, CA 90292, USA; email: hovy@isi.edu

Abstract

This paper addresses the current state of coreference resolution evaluation, in which different measures (notably MUC, B³, CEAF, and ACE-value) are applied in different studies. None of them is fully adequate, and their scores are not commensurate. We enumerate the desiderata for a coreference scoring measure, discuss the strong and weak points of the existing measures, and propose the BiLateral Assessment of Noun-Phrase Coreference (BLANC), a variation of the Rand index adapted to the coreference task. BLANC rewards both coreference and non-coreference links by averaging the F-scores of the two link types. It does not ignore singletons, which is the main problem with the MUC score, and it does not inflate the score in their presence, which is a problem with the B³ and CEAF scores. In addition, its fine granularity is consistent over the whole range of scores and affords better discrimination between systems.
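The abstract describes BLANC only at the level of link types, so a small worked sketch may help. The Python below is a minimal illustration, not the authors' scorer: it assumes the key and the response partition the same set of mentions (true mentions), given as lists of entities with hypothetical string ids, and the helper names (`link_counts`, `rand_index`, `blanc`) are illustrative. It counts right and wrong coreference and non-coreference links, computes the plain Rand index, and then averages the coreference-link and non-coreference-link F-scores as the abstract describes. Degenerate ratios (0/0, e.g. when there are no coreference links at all) are simply scored as 0 here, which may differ from how the paper treats such boundary cases.

```python
from itertools import combinations


def _links(partition):
    """Split all mention pairs of a partition into coreference and non-coreference links.

    `partition` is a list of entities; each entity is a list of mention ids
    (assumed hashable and mutually comparable, e.g. all strings).
    """
    entity_of = {m: i for i, entity in enumerate(partition) for m in entity}
    coref, non_coref = set(), set()
    # combinations() over the sorted ids yields each unordered pair exactly once
    for a, b in combinations(sorted(entity_of), 2):
        (coref if entity_of[a] == entity_of[b] else non_coref).add((a, b))
    return coref, non_coref, set(entity_of)


def link_counts(key, response):
    """Return (rc, wc, rn, wn): right/wrong coreference and non-coreference links."""
    key_c, key_n, key_m = _links(key)
    res_c, res_n, res_m = _links(response)
    if key_m != res_m:
        raise ValueError("this sketch assumes key and response share one mention set")
    rc = len(key_c & res_c)  # coreferent in both key and response
    wc = len(res_c & key_n)  # response links the pair, key does not
    rn = len(key_n & res_n)  # non-coreferent in both
    wn = len(res_n & key_c)  # response separates the pair, key links it
    return rc, wc, rn, wn


def _ratio(num, den):
    # Simplifying assumption: an undefined ratio (0/0) is scored as 0.
    return num / den if den else 0.0


def _f1(p, r):
    return _ratio(2 * p * r, p + r)


def rand_index(key, response):
    """Plain Rand index: share of mention pairs on which key and response agree."""
    rc, wc, rn, wn = link_counts(key, response)
    return _ratio(rc + rn, rc + wc + rn + wn)


def blanc(key, response):
    """Mean of the coreference-link F-score and the non-coreference-link F-score."""
    rc, wc, rn, wn = link_counts(key, response)
    f_coref = _f1(_ratio(rc, rc + wc), _ratio(rc, rc + wn))
    f_non_coref = _f1(_ratio(rn, rn + wn), _ratio(rn, rn + wc))
    return (f_coref + f_non_coref) / 2


if __name__ == "__main__":
    # Toy example: the response wrongly splits mention "c" off the entity {a, b, c}.
    key = [["a", "b", "c"], ["d"], ["e"]]
    response = [["a", "b"], ["c"], ["d"], ["e"]]
    print(link_counts(key, response))      # (1, 0, 7, 2)
    print(rand_index(key, response))       # 0.8
    print(round(blanc(key, response), 4))  # 0.6875
```

In this toy example 8 of the 10 mention pairs are classified correctly, so the Rand index is 0.8, driven largely by the 7 non-coreference pairs. The averaging step instead gives the coreference-link F-score (0.5) the same weight as the non-coreference-link F-score (0.875), which yields 0.6875.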

Information

Type: Articles
Journal: Natural Language Engineering, Volume 17, Issue 4, October 2011, pp. 485–510

Copyright: Copyright © Cambridge University Press 2010
