Fightin' Words: Lexical Feature Selection and Evaluation for Identifying the Content of Political Conflict

Burt L. Monroe; Michael P. Colaresi; Kevin M. Quinn

doi:10.1093/pan/mpn018

Published online by Cambridge University Press: 04 January 2017

Burt L. Monroe: Department of Political Science, Quantitative Social Science Initiative, The Pennsylvania State University (corresponding author; e-mail: burtmonroe@psu.edu)
Michael P. Colaresi: Department of Political Science, Michigan State University (e-mail: colaresi@msu.edu)
Kevin M. Quinn: Department of Government and Institute for Quantitative Social Science, Harvard University (e-mail: kevin_quinn@harvard.edu)

Abstract

Entries in the burgeoning “text-as-data” movement are often accompanied by lists or visualizations of how word (or other lexical feature) usage differs across some pair or set of documents. These are intended either to establish some target semantic concept (like the content of partisan frames), to estimate word-specific measures that feed forward into another analysis (like locating parties in ideological space), or both. We discuss a variety of techniques for selecting words that capture partisan, or other, differences in political speech and for evaluating the relative importance of those words. We introduce and emphasize several new approaches based on Bayesian shrinkage and regularization. We illustrate the relative utility of these approaches with analyses of partisan, gender, and distributive speech in the U.S. Senate.
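As a rough illustration of what a shrinkage-based word comparison can look like, the sketch below scores each word by a log-odds ratio between two groups of speeches, smoothed with Dirichlet pseudo-counts and divided by an approximate standard deviation. The abstract does not give formulas, so the function name `shrunk_log_odds`, the `prior_strength` parameter, the choice of a prior proportional to pooled word frequency, and the toy counts are all assumptions made for this example, not the authors' code or data.

```python
import math
from collections import Counter


def shrunk_log_odds(counts_a, counts_b, prior_strength=10.0):
    """Return a z-score per word; positive means group A favors the word.

    Assumes both groups are non-empty and the vocabulary has at least two
    words, so every odds denominator below is strictly positive.
    """
    vocab = sorted(set(counts_a) | set(counts_b))
    n_a = sum(counts_a.values())
    n_b = sum(counts_b.values())
    pooled_total = n_a + n_b

    # Hypothetical prior: pseudo-counts proportional to pooled frequency,
    # scaled so that they sum to prior_strength.
    alpha = {
        w: prior_strength * (counts_a.get(w, 0) + counts_b.get(w, 0)) / pooled_total
        for w in vocab
    }
    alpha_0 = sum(alpha.values())

    scores = {}
    for w in vocab:
        y_a = counts_a.get(w, 0) + alpha[w]  # smoothed count in group A
        y_b = counts_b.get(w, 0) + alpha[w]  # smoothed count in group B
        log_odds_a = math.log(y_a / (n_a + alpha_0 - y_a))
        log_odds_b = math.log(y_b / (n_b + alpha_0 - y_b))
        delta = log_odds_a - log_odds_b
        variance = 1.0 / y_a + 1.0 / y_b  # approximate variance of delta
        scores[w] = delta / math.sqrt(variance)
    return scores


# Toy word counts for two hypothetical groups of speeches.
group_a = Counter({"tax": 30, "freedom": 20, "the": 200, "health": 5})
group_b = Counter({"tax": 10, "freedom": 5, "the": 210, "health": 40})

scores = shrunk_log_odds(group_a, group_b)
for word, z in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{word:10s} {z:+.2f}")
```

Dividing by the standard deviation is what keeps a very frequent word such as "the" from dominating the list: its raw counts are large in both groups, but its smoothed log-odds difference is small relative to its uncertainty, so its score stays close to zero.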

Information

Type: Special Issue: The Statistical Analysis of Political Text
Information: Political Analysis, Volume 16, Issue 4: Special Issue: The Statistical Analysis of Political Text, Autumn 2008, pp. 372–403

DOI: https://doi.org/10.1093/pan/mpn018
Copyright: Copyright © The Author 2009. Published by Oxford University Press on behalf of the Society for Political Methodology

Footnotes

Authors' note: We would like to thank Mike Crespin, Jim Dillard, Jeff Lewis, Will Lowe, Mike MacKuen, Andrew Martin, Prasenjit Mitra, Phil Schrodt, Corwin Smidt, Denise Solomon, Jim Stimson, Anton Westveld, Chris Zorn, and participants in seminars at the University of North Carolina, Washington University, and Pennsylvania State University for helpful comments on earlier and related efforts. Any opinions, findings, and conclusions or recommendations expressed in the paper are those of the authors and do not necessarily reflect the views of the National Science Foundation.
