Fightin' Words: Lexical Feature Selection and Evaluation for Identifying the Content of Political Conflict

Burt L. Monroe, Michael P. Colaresi, and Kevin M. Quinn
Abstract

Entries in the burgeoning “text-as-data” movement are often accompanied by lists or visualizations of how word (or other lexical feature) usage differs across some pair or set of documents. These are intended either to establish some target semantic concept (like the content of partisan frames), to estimate word-specific measures that feed forward into another analysis (like locating parties in ideological space), or both. We discuss a variety of techniques for selecting words that capture partisan, or other, differences in political speech and for evaluating the relative importance of those words. We introduce and emphasize several new approaches based on Bayesian shrinkage and regularization. We illustrate the relative utility of these approaches with analyses of partisan, gender, and distributive speech in the U.S. Senate.
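
The abstract does not spell out any estimator, so the following is only a minimal sketch of what a shrinkage-based word score can look like: a log-odds ratio between two groups of speakers, smoothed by Dirichlet-style prior pseudo-counts and divided by an approximate standard deviation. The function name `shrunk_log_odds`, the toy counts, and the flat prior of 0.5 per word are all illustrative assumptions, not values or code taken from the article.

```python
import math
from collections import Counter


def shrunk_log_odds(counts_a, counts_b, prior_counts):
    """Return {word: z-score} comparing word use in group A vs. group B.

    prior_counts maps every vocabulary word to a positive pseudo-count
    (an assumed prior); larger pseudo-counts pull scores toward zero.
    """
    n_a = sum(counts_a.values())
    n_b = sum(counts_b.values())
    alpha_0 = sum(prior_counts.values())
    scores = {}
    for word, alpha_w in prior_counts.items():
        # Smoothed counts: observed count plus prior pseudo-count.
        y_a = counts_a.get(word, 0) + alpha_w
        y_b = counts_b.get(word, 0) + alpha_w
        # Log-odds of the word within each group, then their difference.
        log_odds_a = math.log(y_a / (n_a + alpha_0 - y_a))
        log_odds_b = math.log(y_b / (n_b + alpha_0 - y_b))
        delta = log_odds_a - log_odds_b
        # Approximate variance of the difference; rare words get large
        # variance, so their scores are damped.
        variance = 1.0 / y_a + 1.0 / y_b
        scores[word] = delta / math.sqrt(variance)
    return scores


# Hypothetical toy counts for two groups of speeches.
counts_a = Counter({"tax": 30, "the": 100, "women": 2})
counts_b = Counter({"tax": 5, "the": 100, "women": 25})
prior = {word: 0.5 for word in ("tax", "the", "women")}

scores = shrunk_log_odds(counts_a, counts_b, prior)
for word, z in sorted(scores.items(), key=lambda item: item[1]):
    print(f"{word:>6s} {z:+.2f}")
```

In this toy example a common function word such as "the" lands near zero, while words used mostly by one group receive large positive or negative scores. Dividing by the standard deviation is one way to keep very rare words from dominating the ranking.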

Corresponding author
e-mail: burtmonroe@psu.edu (corresponding author)
e-mail: colaresi@msu.edu
e-mail: kevin_quinn@harvard.edu
Footnotes

Author's note: We would like to thank Mike Crespin, Jim Dillard, Jeff Lewis, Will Lowe, Mike MacKuen, Andrew Martin, Prasenjit Mitra, Phil Schrodt, Corwin Smidt, Denise Solomon, Jim Stimson, Anton Westveld, Chris Zorn, and participants in seminars at the University of North Carolina, Washington University, and Pennsylvania State University for helpful comments on earlier and related efforts. Any opinions, findings, and conclusions or recommendations expressed in the paper are those of the authors and do not necessarily reflect the views of the National Science Foundation.

Political Analysis
  • ISSN: 1047-1987
  • EISSN: 1476-4989