A scaffolding approach to coreference resolution integrating statistical and rule-based models


We describe a scaffolding approach to the task of coreference resolution that incrementally combines statistical classifiers, each designed for a particular mention type, with rule-based models (for sub-tasks well-matched to determinism). We motivate our design by an oracle-based analysis of errors in a rule-based coreference resolution system, showing that rule-based approaches are poorly suited to tasks that require a large lexical feature space, such as resolving pronominal and common-noun mentions. Our approach combines many advantages: it incrementally builds clusters integrating joint information about entities, uses rules for deterministic phenomena, and integrates rich lexical, syntactic, and semantic features with random forest classifiers well-suited to modeling the complex feature interactions that are known to characterize the coreference task. We demonstrate that all these decisions are important. The resulting system achieves 63.2 F1 on the CoNLL-2012 shared task dataset, outperforming the rule-based starting point by over seven F1 points. Similarly, our system outperforms an equivalent sieve-based approach that relies on logistic regression classifiers instead of random forests by over four F1 points. Lastly, we show that by changing the coreference resolution system from relying on constituent-based syntax to using dependency syntax, which can be generated in linear time, we achieve a runtime speedup of 550 per cent without considerable loss of accuracy.
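The incremental combination of deterministic rules and per-mention-type classifiers described above can be illustrated with a minimal sketch. This is not the authors' system: the `Mention` type, the `exact_match_rule`, the threshold, and the `toy_pronoun_score` function are all hypothetical, and the hand-written score merely stands in for a trained classifier such as a random forest. The sketch only shows the control flow, in which each stage runs over the clusters built by the stages before it.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Mention:
    idx: int   # position in document order
    text: str
    kind: str  # illustrative mention types: "proper", "common", "pronoun"


# A stage inspects a mention and a candidate antecedent and decides whether to link them.
Stage = Callable[[Mention, Mention], bool]


def exact_match_rule(m: Mention, ant: Mention) -> bool:
    """Deterministic stage: link non-pronominal mentions with identical strings."""
    return (m.kind != "pronoun" and ant.kind != "pronoun"
            and m.text.lower() == ant.text.lower())


def make_scored_stage(kind: str, score: Callable[[Mention, Mention], float],
                      threshold: float = 0.5) -> Stage:
    """Statistical stage for one mention type; `score` stands in for a trained model."""
    def stage(m: Mention, ant: Mention) -> bool:
        return m.kind == kind and score(m, ant) >= threshold
    return stage


def toy_pronoun_score(m: Mention, ant: Mention) -> float:
    """Placeholder scorer (assumption): prefers nearby non-pronoun antecedents."""
    if ant.kind == "pronoun":
        return 0.0
    return 1.0 / (m.idx - ant.idx)


def resolve(mentions: List[Mention], stages: List[Stage]) -> List[List[int]]:
    """Apply the stages in order, merging clusters incrementally."""
    cluster_of: Dict[int, int] = {m.idx: m.idx for m in mentions}
    for stage in stages:
        for i, m in enumerate(mentions):
            # Scan candidate antecedents from nearest to farthest.
            for ant in reversed(mentions[:i]):
                if cluster_of[m.idx] == cluster_of[ant.idx]:
                    continue  # already coreferent from an earlier decision
                if stage(m, ant):
                    old, new = cluster_of[m.idx], cluster_of[ant.idx]
                    for k in cluster_of:
                        if cluster_of[k] == old:
                            cluster_of[k] = new
                    break  # at most one new link per mention per stage
    clusters: Dict[int, List[int]] = {}
    for m in mentions:
        clusters.setdefault(cluster_of[m.idx], []).append(m.idx)
    return sorted(clusters.values())


mentions = [
    Mention(0, "Barack Obama", "proper"),
    Mention(1, "the senator", "common"),
    Mention(2, "he", "pronoun"),
    Mention(3, "Barack Obama", "proper"),
]
stages = [exact_match_rule, make_scored_stage("pronoun", toy_pronoun_score)]
print(resolve(mentions, stages))
```

In a setup closer to the one the abstract describes, each scored stage would wrap a classifier trained on lexical, syntactic, and semantic features for its mention type, and features could be computed over whole clusters rather than mention pairs.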

Natural Language Engineering
  • ISSN: 1351-3249
  • EISSN: 1469-8110
  • URL: /core/journals/natural-language-engineering