
A cross-corpus study of subjectivity identification using unsupervised learning

  • DONG WANG and YANG LIU

In this study, we investigate unsupervised generative learning methods for subjectivity detection across different domains. We create an initial training set using simple lexicon information and then evaluate two iterative learning methods, each built on a naive Bayes base classifier, that learn from unannotated data. The first method is self-training, which adds high-confidence instances to the training set in each iteration. The second is a calibrated EM (expectation-maximization) method, in which we calibrate the posterior probabilities from EM so that the class distribution is similar to that in the real data. We evaluate both approaches on three domains: movie data, news resources, and meeting dialogues. In some cases, the unsupervised learning methods achieve performance close to that of the fully supervised setup. We perform a thorough analysis of contributing factors, such as the self-labeling accuracy of the initial training set in unsupervised learning, the accuracy of the examples added in self-training, and the size of the initial training set in the different methods. Our experiments and analysis show inherent differences across domains and identify the factors that explain the models' behavior.
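The self-training loop described above can be illustrated with a minimal sketch. This is not the authors' implementation: the toy lexicon, the seed-labeling rule in `seed_label`, the confidence `threshold`, and the add-one smoothing are all assumptions made for illustration, and the abstract does not specify any of them.

```python
import math
from collections import Counter

# Hypothetical toy lexicon; the lexicon used in the paper is not given here.
SUBJECTIVE_WORDS = {"love", "hate", "terrible", "great", "awful", "wonderful"}

def seed_label(tokens):
    """Assumed seed rule: 2+ lexicon hits -> 'subj', 0 hits -> 'obj', else unlabeled."""
    hits = sum(1 for t in tokens if t in SUBJECTIVE_WORDS)
    if hits >= 2:
        return "subj"
    if hits == 0:
        return "obj"
    return None

def train_nb(labeled, alpha=1.0):
    """Fit a multinomial naive Bayes model with add-alpha smoothing."""
    class_counts = Counter(label for _, label in labeled)
    word_counts = {c: Counter() for c in class_counts}
    for tokens, label in labeled:
        word_counts[label].update(tokens)
    vocab = {t for tokens, _ in labeled for t in tokens}
    return class_counts, word_counts, vocab, alpha

def posterior(model, tokens):
    """Return P(class | tokens) as a dict, computed in log space for stability."""
    class_counts, word_counts, vocab, alpha = model
    total = sum(class_counts.values())
    log_scores = {}
    for c, n in class_counts.items():
        denom = sum(word_counts[c].values()) + alpha * len(vocab)
        score = math.log(n / total)
        for t in tokens:
            if t in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[c][t] + alpha) / denom)
        log_scores[c] = score
    top = max(log_scores.values())
    exp = {c: math.exp(s - top) for c, s in log_scores.items()}
    z = sum(exp.values())
    return {c: v / z for c, v in exp.items()}

def self_train(sentences, threshold=0.8, max_iters=10):
    """Seed with the lexicon, then repeatedly add confidently classified sentences."""
    labeled, pool = [], []
    for tokens in sentences:
        label = seed_label(tokens)
        if label is None:
            pool.append(tokens)
        else:
            labeled.append((tokens, label))
    if not labeled:
        raise ValueError("the lexicon produced no seed examples")
    model = train_nb(labeled)
    for _ in range(max_iters):
        confident, remaining = [], []
        for tokens in pool:
            probs = posterior(model, tokens)
            label = max(probs, key=probs.get)
            if probs[label] >= threshold:
                confident.append((tokens, label))
            else:
                remaining.append(tokens)
        if not confident:  # nothing new to add, so stop early
            break
        labeled.extend(confident)
        pool = remaining
        model = train_nb(labeled)  # retrain on the enlarged training set
    return model, labeled, pool
```

Calling `self_train` on a list of tokenized sentences returns the final model, the self-labeled training set, and any sentences that never reached the confidence threshold. Because wrongly labeled seeds or additions are fed back into training, the accuracy of both directly shapes the final classifier, which is why the analysis examines those factors.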

Natural Language Engineering
  • ISSN: 1351-3249
  • EISSN: 1469-8110
  • URL: /core/journals/natural-language-engineering