Fightin' Words: Lexical Feature Selection and Evaluation for Identifying the Content of Political Conflict

Published online by Cambridge University Press:  04 January 2017

Burt L. Monroe
Affiliation:
Department of Political Science, Quantitative Social Science Initiative, The Pennsylvania State University
Michael P. Colaresi
Affiliation:
Department of Political Science, Michigan State University
Kevin M. Quinn
Affiliation:
Department of Government and Institute for Quantitative Social Science, Harvard University

Abstract

Entries in the burgeoning “text-as-data” movement are often accompanied by lists or visualizations of how word (or other lexical feature) usage differs across some pair or set of documents. These are intended to establish some target semantic concept (like the content of partisan frames), to estimate word-specific measures that feed forward into another analysis (like locating parties in ideological space), or both. We discuss a variety of techniques for selecting words that capture partisan, or other, differences in political speech and for evaluating the relative importance of those words. We introduce and emphasize several new approaches based on Bayesian shrinkage and regularization. We illustrate the relative utility of these approaches with analyses of partisan, gender, and distributive speech in the U.S. Senate.
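
As a rough illustration of what shrinkage-based word selection can look like, the sketch below scores each word by a log-odds-ratio between two groups of speakers, smoothed with a symmetric Dirichlet prior and divided by an approximate standard error. The abstract does not state a formula, so the estimator, the function name `log_odds_dirichlet`, the `prior_strength` parameter, and the toy counts are all assumptions made for illustration rather than the authors' exact specification.

```python
import math


def log_odds_dirichlet(counts_a, counts_b, prior_strength=0.01):
    """Return a z-score per word: positive means relatively more use in group A.

    counts_a, counts_b: dicts mapping word -> raw count in each group.
    prior_strength: pseudo-count added to every word (assumed symmetric prior).
    The combined vocabulary must contain at least two distinct words.
    """
    vocab = sorted(set(counts_a) | set(counts_b))
    alpha_w = prior_strength
    alpha_0 = alpha_w * len(vocab)  # total prior mass across the vocabulary
    n_a = sum(counts_a.values())
    n_b = sum(counts_b.values())

    scores = {}
    for word in vocab:
        y_a = counts_a.get(word, 0)
        y_b = counts_b.get(word, 0)
        # Smoothed log-odds of the word within each group.
        log_odds_a = math.log((y_a + alpha_w) / (n_a + alpha_0 - y_a - alpha_w))
        log_odds_b = math.log((y_b + alpha_w) / (n_b + alpha_0 - y_b - alpha_w))
        delta = log_odds_a - log_odds_b
        # Approximate variance of the difference; rare words get a large
        # variance, which pulls their z-scores toward zero.
        variance = 1.0 / (y_a + alpha_w) + 1.0 / (y_b + alpha_w)
        scores[word] = delta / math.sqrt(variance)
    return scores


# Toy counts, invented for illustration only.
group_a = {"the": 100, "women": 20, "right": 15, "abortion": 5}
group_b = {"the": 100, "babies": 20, "kill": 15, "abortion": 5}

scores = log_odds_dirichlet(group_a, group_b)
for word, z in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{word:10s} {z:+.2f}")
```

Dividing by the standard error is the design choice that matters here: a word seen only a handful of times can have an extreme raw log-odds-ratio, but its large variance keeps it from dominating the ranking, while words used equally by both groups score near zero.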

Type
Special Issue: The Statistical Analysis of Political Text
Copyright
Copyright © The Author 2009. Published by Oxford University Press on behalf of the Society for Political Methodology 

Footnotes

Authors' note: We would like to thank Mike Crespin, Jim Dillard, Jeff Lewis, Will Lowe, Mike MacKuen, Andrew Martin, Prasenjit Mitra, Phil Schrodt, Corwin Smidt, Denise Solomon, Jim Stimson, Anton Westveld, Chris Zorn, and participants in seminars at the University of North Carolina, Washington University, and Pennsylvania State University for helpful comments on earlier and related efforts. Any opinions, findings, and conclusions or recommendations expressed in the paper are those of the authors and do not necessarily reflect the views of the National Science Foundation.
