In this paper we discuss a persistent problem arising from polysemy: namely the difficulty of finding consistent criteria for making fine-grained sense distinctions, either manually or automatically. We investigate sources of human annotator disagreements stemming from the tagging for the English Verb Lexical Sample Task in the SENSEVAL-2 exercise in automatic Word Sense Disambiguation. We also examine errors made by a high-performing maximum entropy Word Sense Disambiguation system we developed. Both sets of errors are at least partially reconciled by a more coarse-grained view of the senses, and we present the groupings we use for quantitative coarse-grained evaluation as well as the process by which they were created. We compare the system's performance with our human annotator performance in light of both fine-grained and coarse-grained sense distinctions and show that well-defined sense groups can be of value in improving word sense disambiguation by both humans and machines.
Email your librarian or administrator to recommend adding this journal to your organisation's collection.