
Fightin' Words: Lexical Feature Selection and Evaluation for Identifying the Content of Political Conflict

Published online by Cambridge University Press: 04 January 2017

Burt L. Monroe*
Department of Political Science, Quantitative Social Science Initiative, The Pennsylvania State University
Michael P. Colaresi*
Department of Political Science, Michigan State University
Kevin M. Quinn*
Department of Government and Institute for Quantitative Social Science, Harvard University
e-mail: (corresponding author)



Entries in the burgeoning “text-as-data” movement are often accompanied by lists or visualizations of how word (or other lexical feature) usage differs across some pair or set of documents. These are intended either to establish some target semantic concept (like the content of partisan frames), to estimate word-specific measures that feed forward into another analysis (like locating parties in ideological space), or both. We discuss a variety of techniques for selecting words that capture partisan, or other, differences in political speech and for evaluating the relative importance of those words. We introduce and emphasize several new approaches based on Bayesian shrinkage and regularization. We illustrate the relative utility of these approaches with analyses of partisan, gender, and distributive speech in the U.S. Senate.

Special Issue: The Statistical Analysis of Political Text
Copyright © The Author 2009. Published by Oxford University Press on behalf of the Society for Political Methodology 


Author's note: We would like to thank Mike Crespin, Jim Dillard, Jeff Lewis, Will Lowe, Mike MacKuen, Andrew Martin, Prasenjit Mitra, Phil Schrodt, Corwin Smidt, Denise Solomon, Jim Stimson, Anton Westveld, Chris Zorn, and participants in seminars at the University of North Carolina, Washington University, and Pennsylvania State University for helpful comments on earlier and related efforts. Any opinions, findings, and conclusions or recommendations expressed in the paper are those of the authors and do not necessarily reflect the views of the National Science Foundation.

