A cross-corpus study of subjectivity identification using unsupervised learning

DONG WANG and YANG LIU

In this study, we investigate unsupervised generative learning methods for subjectivity detection across different domains. We create an initial training set using simple lexicon information. We then evaluate two iterative learning methods that use a naive Bayes base classifier to learn from unannotated data. The first method is self-training, which adds high-confidence instances to the training set in each iteration. The second is a calibrated EM (expectation-maximization) method, in which we calibrate the posterior probabilities from EM so that the class distribution is similar to that of the real data. We evaluate both approaches on three domains: movie data, news resources, and meeting dialogues. In some cases, the unsupervised learning methods achieve performance close to that of the fully supervised setup. We perform a thorough analysis of contributing factors, such as the self-labeling accuracy of the initial training set in unsupervised learning, the accuracy of the examples added in self-training, and the size of the initial training set for the different methods. Our experiments and analysis reveal inherent differences across domains and identify factors that explain the models' behavior.
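The abstract does not give implementation details, so the following is only a minimal Python sketch of the two iterative schemes it names. It rests on several assumptions. The classifier is a multinomial naive Bayes over bag-of-words tokens with add-one smoothing. Self-training adds a fixed number of instances per round. The calibration step repeatedly rescales each class's posteriors until their average matches a target class distribution; this is an assumed scheme, and the authors' exact calibration may differ. The lexicon-based initial training set is represented simply as `seed_docs`/`seed_labels`. All function names, the class labels, and the toy data are hypothetical.

```python
import math
from collections import Counter

# Hypothetical label names; the abstract only says "subjectivity detection".
CLASSES = ("subjective", "objective")


def train_nb(docs, weights, alpha=1.0):
    """Multinomial naive Bayes from soft labels (hard labels are 0/1 weights)."""
    vocab = {tok for doc in docs for tok in doc}
    prior = Counter()
    counts = {c: Counter() for c in CLASSES}
    for doc, w in zip(docs, weights):
        for c in CLASSES:
            prior[c] += w[c]
            for tok in doc:
                counts[c][tok] += w[c]
    total = sum(prior.values())
    log_prior, log_lik = {}, {}
    for c in CLASSES:
        log_prior[c] = math.log((prior[c] + alpha) / (total + alpha * len(CLASSES)))
        denom = sum(counts[c].values()) + alpha * len(vocab)
        log_lik[c] = {t: math.log((counts[c][t] + alpha) / denom) for t in vocab}
    return {"vocab": vocab, "log_prior": log_prior, "log_lik": log_lik}


def posterior(model, doc):
    """P(class | doc); tokens outside the training vocabulary are ignored."""
    scores = {
        c: model["log_prior"][c]
        + sum(model["log_lik"][c][t] for t in doc if t in model["vocab"])
        for c in CLASSES
    }
    top = max(scores.values())  # subtract the max for numerical stability
    exp = {c: math.exp(s - top) for c, s in scores.items()}
    z = sum(exp.values())
    return {c: v / z for c, v in exp.items()}


def hard(label):
    """Turn a single label into 0/1 class weights."""
    return {c: float(c == label) for c in CLASSES}


def self_train(seed_docs, seed_labels, unlabeled, iterations=5, per_iter=2):
    """Each round, move the most confident unlabeled instances into training."""
    docs, weights = list(seed_docs), [hard(y) for y in seed_labels]
    pool = list(unlabeled)
    for _ in range(iterations):
        if not pool:
            break
        model = train_nb(docs, weights)
        pool.sort(key=lambda d: max(posterior(model, d).values()), reverse=True)
        for doc in pool[:per_iter]:
            p = posterior(model, doc)
            docs.append(doc)
            weights.append(hard(max(p, key=p.get)))  # self-assigned label
        pool = pool[per_iter:]
    return train_nb(docs, weights)


def calibrate(posts, target, rounds=20):
    """Rescale posteriors until their average matches `target` (assumed scheme)."""
    for _ in range(rounds):
        mean = {c: sum(p[c] for p in posts) / len(posts) for c in CLASSES}
        ratio = {c: target[c] / max(mean[c], 1e-12) for c in CLASSES}
        scaled = [{c: p[c] * ratio[c] for c in CLASSES} for p in posts]
        posts = [{c: s[c] / sum(s.values()) for c in CLASSES} for s in scaled]
    return posts


def calibrated_em(seed_docs, seed_labels, unlabeled, target, iterations=10):
    """EM on seed (hard) + unlabeled (soft) data with a calibrated E-step."""
    seed_w = [hard(y) for y in seed_labels]
    model = train_nb(seed_docs, seed_w)
    for _ in range(iterations):
        posts = calibrate([posterior(model, d) for d in unlabeled], target)  # E-step
        model = train_nb(list(seed_docs) + list(unlabeled), seed_w + posts)  # M-step
    return model


# Toy usage with made-up data; the seed labels stand in for lexicon-based labels.
seed_docs = [["great", "love"], ["report", "said"]]
seed_labels = ["subjective", "objective"]
unlabeled = [["love", "awful"], ["great", "awful"],
             ["said", "announced"], ["report", "announced"]]
st_model = self_train(seed_docs, seed_labels, unlabeled)
em_model = calibrated_em(seed_docs, seed_labels, unlabeled,
                         target={"subjective": 0.5, "objective": 0.5})
print(posterior(st_model, ["awful"]), posterior(em_model, ["awful"]))
```

The two schemes handle errors differently. Self-training commits to hard labels, so a wrong label added early is carried into every later round. The EM variant keeps soft labels, and only the calibration step ties the unlabeled predictions to the class distribution `target`. In practice, `target` would have to be estimated or supplied, because the sketch does not infer it.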

Natural Language Engineering
  • ISSN: 1351-3249
  • EISSN: 1469-8110