BLANC: Implementing the Rand index for coreference evaluation

Published online by Cambridge University Press: 06 December 2010

CLiC, University of Barcelona, Gran Via 585, Barcelona 08007, Spain
USC Information Sciences Institute, 4676 Admiralty Way, Marina del Rey, CA 90292, USA


This paper addresses the current state of coreference resolution evaluation, in which different measures (notably MUC, B³, CEAF, and ACE-value) are applied in different studies. None of them is fully adequate, and their scores are not commensurate. We enumerate the desiderata for a coreference scoring measure, discuss the strong and weak points of the existing measures, and propose the BiLateral Assessment of Noun-Phrase Coreference (BLANC), a variation of the Rand index adapted to the coreference task. BLANC rewards both coreference and non-coreference links by averaging the F-scores of the two link types. It does not ignore singletons (the main problem with the MUC score), nor does it inflate the score in their presence (a problem with the B³ and CEAF scores). In addition, its fine granularity is consistent over the whole range of scores and affords better discrimination between systems.
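The averaging described above can be illustrated with a minimal Python sketch. It is not the authors' reference implementation: the abstract only states that the F-scores of coreference and non-coreference links are averaged, so the per-link-type precision and recall below follow the usual pairwise definitions, and the function names (`link_labels`, `f_score`, `blanc_sketch`) are hypothetical. Returning 0.0 for an undefined F-score is a simplifying assumption; the full paper may treat such boundary cases differently. The plain Rand index is returned alongside for comparison.

```python
from itertools import combinations


def link_labels(clusters):
    """Label every unordered mention pair: True if coreferent, else False."""
    entity_of = {m: i for i, cluster in enumerate(clusters) for m in cluster}
    return {
        frozenset(pair): entity_of[pair[0]] == entity_of[pair[1]]
        for pair in combinations(sorted(entity_of), 2)
    }


def f_score(right, wrong_predicted, wrong_missed):
    """F1 for one link type. Returning 0.0 when undefined is an assumption."""
    predicted = right + wrong_predicted
    actual = right + wrong_missed
    precision = right / predicted if predicted else 0.0
    recall = right / actual if actual else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def blanc_sketch(gold, system):
    """Return (average of the two link-type F-scores, Rand index)."""
    gold_links, sys_links = link_labels(gold), link_labels(system)
    if gold_links.keys() != sys_links.keys():
        raise ValueError("gold and system must cover the same mentions")
    rc = wc = rn = wn = 0  # right/wrong coreference and non-coreference links
    for pair, gold_coref in gold_links.items():
        if sys_links[pair]:
            if gold_coref:
                rc += 1
            else:
                wc += 1
        elif gold_coref:
            wn += 1  # system predicts non-coreference, gold says coreference
        else:
            rn += 1
    f_coref = f_score(rc, wc, wn)
    f_non_coref = f_score(rn, wn, wc)
    total = rc + wc + rn + wn
    rand = (rc + rn) / total if total else 0.0
    return (f_coref + f_non_coref) / 2, rand


# Toy example: mentions are plain strings, each inner list is one entity.
gold = [["a", "b", "c"], ["d"], ["e"]]
system = [["a", "b"], ["c", "d"], ["e"]]
score, rand = blanc_sketch(gold, system)
print(round(score, 3), round(rand, 3))
```

In this toy example the seven correct links out of ten give a Rand index of 0.7, dominated by the six correct non-coreference links. Averaging the two F-scores (0.4 for coreference links, 0.8 for non-coreference links) gives 0.6, so the weak performance on coreference links is not masked by the many singleton-related non-links.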

Copyright © Cambridge University Press 2010
