
Predicting word choice in affective text

  • M. GARDINER and M. DRAS

Choosing the best word or phrase for a given context from among candidate near-synonyms, such as slim and skinny, is a difficult language generation problem. In this paper, we describe approaches to solving an instance of this problem, the lexical gap problem, with a particular focus on affect and subjectivity; to do this, we draw upon techniques from the fields of sentiment and subjectivity analysis. We present a supervised approach to this problem, initially with a unigram model that solidly outperforms the baseline, with a 6.8% increase in accuracy. The results to some extent confirm those from related problems, where feature presence outperforms feature frequency and immediate context features generally outperform wider context features. Somewhat surprisingly, however, the latter is not always the case, and the exceptions do not necessarily occur where intuition might first suggest. An analysis of the cases where document-level models perform better suggested that, in our corpus, broader features related to the ‘tone’ of the document could be useful, including document sentiment, document author, and a distance metric for weighting the wider lexical context of the gap itself. Using these features, our best model achieves a 10.1% increase in accuracy, corresponding to a 38% reduction in errors. Moreover, our models improve accuracy not only on affective word choice but also on non-affective word choice.
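To make the feature distinctions in the abstract concrete, the following is a minimal sketch of how unigram context features for a gap might be built: binary presence versus raw frequency, an immediate window versus the whole document, and a distance-based weighting. The abstract does not specify the paper's feature extraction, classifier, or distance metric, so the function names, the example sentence, and the 1/distance weighting are all illustrative assumptions rather than the authors' method.

```python
from collections import Counter


def context_features(tokens, gap_index, window=None, presence=True):
    """Unigram features for the words surrounding a gap.

    window=None uses the whole document; window=k uses k tokens on each side.
    presence=True yields binary features; presence=False yields raw counts.
    (Hypothetical helper; not taken from the paper.)
    """
    if window is None:
        lo, hi = 0, len(tokens)
    else:
        lo = max(0, gap_index - window)
        hi = min(len(tokens), gap_index + window + 1)
    counts = Counter(
        tok.lower()
        for i, tok in enumerate(tokens[lo:hi], start=lo)
        if i != gap_index  # never use the word being predicted as a feature
    )
    if presence:
        return {tok: 1 for tok in counts}
    return dict(counts)


def distance_weighted_features(tokens, gap_index):
    """Weight each context word by 1/distance to the gap.

    The 1/distance form is an assumption chosen for illustration; when a word
    occurs several times, its closest occurrence determines the weight.
    """
    feats = {}
    for i, tok in enumerate(tokens):
        if i == gap_index:
            continue
        weight = 1.0 / abs(i - gap_index)
        key = tok.lower()
        feats[key] = max(feats.get(key, 0.0), weight)
    return feats


# Illustrative sentence; "slim" is the word whose choice would be predicted.
tokens = "the model was very slim and the dress was very elegant".split()
gap = tokens.index("slim")

immediate_presence = context_features(tokens, gap, window=2, presence=True)
document_frequency = context_features(tokens, gap, window=None, presence=False)
weighted = distance_weighted_features(tokens, gap)
```

Any of these dictionaries could then be passed to a supervised classifier that chooses among the candidate near-synonyms; which classifier the authors used is not stated in the abstract.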


The authors would like to thank the anonymous reviewers of the article, and to acknowledge the support of ARC Discovery grant DP0558852.



Natural Language Engineering
  • ISSN: 1351-3249
  • EISSN: 1469-8110
  • URL: /core/journals/natural-language-engineering