
Classifying news versus opinions in newspapers: Linguistic features for domain independence

  • K. R. KRÜGER, A. LUKOWIAK, J. SONNTAG, S. WARZECHA and M. STEDE
Abstract

Newspaper text can be broadly divided into the classes ‘opinion’ (editorials, commentary, letters to the editor) and ‘neutral’ (reports). We describe a classification system for performing this separation, which uses a set of linguistically motivated features. Working with various English newspaper corpora, we demonstrate that it significantly outperforms bag-of-lemma and PoS-tag models. We conclude that the linguistic features constitute the best method for achieving robustness against changes of newspaper or domain.
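
To make the contrast in the abstract concrete, the following is a minimal sketch of how topic-independent linguistic features differ from a bag-of-words representation. It is an illustration only: the word lists, the three features and the function names are assumptions chosen for this example and are not the feature set or system described in the paper.

```python
import re
from collections import Counter

# Hypothetical word lists chosen for illustration; NOT the paper's feature set.
FIRST_PERSON = {"i", "we", "my", "our", "me", "us"}
MODALS = {"should", "must", "ought", "might", "could", "would"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def linguistic_features(text):
    """Return length-normalised rates that do not depend on topic vocabulary."""
    tokens = tokenize(text)
    n = max(len(tokens), 1)  # avoid division by zero on empty input
    return {
        "first_person_rate": sum(t in FIRST_PERSON for t in tokens) / n,
        "modal_rate": sum(t in MODALS for t in tokens) / n,
        "question_exclaim_rate": sum(text.count(c) for c in "?!") / n,
    }


def bag_of_words(text):
    """Return raw token counts; these are tied to the topic vocabulary."""
    return Counter(tokenize(text))


# Invented example sentences, not taken from any corpus.
opinion = "We must act now. Why should our city wait?"
report = "The council approved the budget on Tuesday."

for label, text in (("opinion", opinion), ("report", report)):
    print(label, linguistic_features(text))
```

The rates stay comparable when the topic changes, whereas the bag-of-words counts for two articles on different subjects share few keys. That is the intuition behind expecting such features to transfer across newspapers or domains; a real system would feed them to a trained classifier and evaluate on held-out corpora.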

Natural Language Engineering
  • ISSN: 1351-3249
  • EISSN: 1469-8110
  • URL: /core/journals/natural-language-engineering