BLANC: Implementing the Rand index for coreference evaluation

  • M. RECASENS and E. HOVY

This paper addresses the current state of coreference resolution evaluation, in which different studies apply different measures (notably MUC, B³, CEAF, and ACE-value). None of these measures is fully adequate, and their scores are not commensurate. We enumerate the desiderata for a coreference scoring measure, discuss the strong and weak points of the existing measures, and propose the BiLateral Assessment of Noun-Phrase Coreference (BLANC), a variation of the Rand index created to suit the coreference task. BLANC rewards both coreference and non-coreference links by averaging the F-scores of the two types. It does not ignore singletons, the main problem with the MUC score, and it does not inflate the score in their presence, a problem with the B³ and CEAF scores. In addition, its fine granularity is consistent over the whole range of scores and affords better discrimination between systems.
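The averaging described in the abstract can be illustrated with a minimal sketch. It is not the authors' reference implementation: the function names (`link_labels`, `f_score`, `blanc_sketch`) are hypothetical, the key and the response are assumed to contain exactly the same mentions, and a zero denominator is simply scored as 0, which is an assumption and may differ from how the paper treats boundary cases.

```python
from itertools import combinations


def link_labels(clusters):
    """Label every unordered mention pair: True if coreferent, else False."""
    cluster_of = {m: i for i, cluster in enumerate(clusters) for m in cluster}
    mentions = sorted(cluster_of)
    return {
        (a, b): cluster_of[a] == cluster_of[b]
        for a, b in combinations(mentions, 2)
    }


def f_score(correct, n_response, n_key):
    """Harmonic mean of precision and recall; 0.0 on empty denominators
    (an assumption of this sketch, not taken from the paper)."""
    p = correct / n_response if n_response else 0.0
    r = correct / n_key if n_key else 0.0
    return 2 * p * r / (p + r) if p + r else 0.0


def blanc_sketch(key, response):
    """Average the F-scores of coreference and non-coreference links."""
    key_links = link_labels(key)
    resp_links = link_labels(response)
    if key_links.keys() != resp_links.keys():
        raise ValueError("key and response must contain the same mentions")
    scores = []
    for label in (True, False):  # coreference links, then non-coreference links
        n_key = sum(1 for v in key_links.values() if v == label)
        n_resp = sum(1 for v in resp_links.values() if v == label)
        correct = sum(
            1 for pair, v in key_links.items()
            if v == label and resp_links[pair] == label
        )
        scores.append(f_score(correct, n_resp, n_key))
    return sum(scores) / 2


# Toy example: the key has one two-mention entity and two singletons;
# the response wrongly merges singleton "c" into the entity.
key = [{"a", "b"}, {"c"}, {"d"}]
response = [{"a", "b", "c"}, {"d"}]
print(round(blanc_sketch(key, response), 3))  # 0.625
```

In the toy example the coreference links score F = 0.5 (precision 1/3, recall 1) and the non-coreference links score F = 0.75 (precision 1, recall 3/5). The singletons therefore contribute through the non-coreference links without dominating the result.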

Natural Language Engineering
  • ISSN: 1351-3249
  • EISSN: 1469-8110
  • URL: /core/journals/natural-language-engineering