
Fightin' Words: Lexical Feature Selection and Evaluation for Identifying the Content of Political Conflict

Burt L. Monroe, Michael P. Colaresi, and Kevin M. Quinn

Entries in the burgeoning "text-as-data" movement are often accompanied by lists or visualizations of how word (or other lexical feature) usage differs across some pair or set of documents. These are intended to establish some target semantic concept (like the content of partisan frames), to estimate word-specific measures that feed forward into another analysis (like locating parties in ideological space), or both. We discuss a variety of techniques for selecting words that capture partisan, or other, differences in political speech and for evaluating the relative importance of those words. We introduce and emphasize several new approaches based on Bayesian shrinkage and regularization. We illustrate the relative utility of these approaches with analyses of partisan, gender, and distributive speech in the U.S. Senate.
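To make the idea of shrinkage-based word selection concrete, here is a minimal Python sketch of one such technique: a log-odds ratio computed with Dirichlet pseudo-counts and divided by its approximate standard error, so that words seen only a few times are not ranked as strongly partisan. It is an illustration under our own assumptions, not necessarily the estimator used in the article; the function name `log_odds_dirichlet`, the flat prior of 0.5 pseudo-counts per word, and the toy "speeches" are all hypothetical.

```python
import math
from collections import Counter


def log_odds_dirichlet(counts_a, counts_b, prior):
    """Return {word: z-score} of a Dirichlet-smoothed log-odds ratio, group A vs. group B.

    counts_a, counts_b: Counter of word counts for each group's speech.
    prior: dict mapping every vocabulary word to a pseudo-count alpha_w > 0
           (hypothetical choice; the vocabulary should contain at least two words).
    """
    n_a = sum(counts_a.values())
    n_b = sum(counts_b.values())
    alpha0 = sum(prior.values())
    scores = {}
    for w, alpha_w in prior.items():
        y_a = counts_a.get(w, 0)
        y_b = counts_b.get(w, 0)
        # Smoothed log-odds of word w in each group; pseudo-counts keep zero counts finite
        lo_a = math.log((y_a + alpha_w) / (n_a + alpha0 - y_a - alpha_w))
        lo_b = math.log((y_b + alpha_w) / (n_b + alpha0 - y_b - alpha_w))
        delta = lo_a - lo_b
        # Approximate variance of delta: sum of inverse smoothed counts
        var = 1.0 / (y_a + alpha_w) + 1.0 / (y_b + alpha_w)
        scores[w] = delta / math.sqrt(var)
    return scores


# Hypothetical toy data standing in for two parties' floor speech
party_a = Counter("tax cut tax relief family tax".split())
party_b = Counter("tax credit worker family worker wage".split())
vocab = set(party_a) | set(party_b)
prior = {w: 0.5 for w in vocab}  # flat prior: an assumption for illustration

scores = log_odds_dirichlet(party_a, party_b, prior)
for w, z in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{w:8s} {z:+.2f}")
```

In this toy run, `cut` has a slightly larger raw log-odds difference than `tax`, but because it appears only once its variance is larger and its z-score smaller. The prior pseudo-counts keep zero-count words finite, and the variance scaling ranks well-attested differences above sparse ones. A prior built from word frequencies in a larger background corpus is a common alternative to the flat prior assumed here.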


Author's note: We would like to thank Mike Crespin, Jim Dillard, Jeff Lewis, Will Lowe, Mike MacKuen, Andrew Martin, Prasenjit Mitra, Phil Schrodt, Corwin Smidt, Denise Solomon, Jim Stimson, Anton Westveld, Chris Zorn, and participants in seminars at the University of North Carolina, Washington University, and Pennsylvania State University for helpful comments on earlier and related efforts. Any opinions, findings, and conclusions or recommendations expressed in the paper are those of the authors and do not necessarily reflect the views of the National Science Foundation.

Published in Political Analysis (ISSN: 1047-1987; EISSN: 1476-4989).