Crowd-sourced Text Analysis: Reproducible and Agile Production of Political Data

KENNETH BENOIT; DREW CONWAY; BENJAMIN E. LAUDERDALE; MICHAEL LAVER; SLAVA MIKHAYLOV

doi:10.1017/S0003055416000058

Published online by Cambridge University Press: 04 August 2016

Kenneth Benoit is Professor, London School of Economics and Trinity College, Dublin (kbenoit@lse.ac.uk).
Drew Conway, New York University.
Benjamin E. Lauderdale is Associate Professor, London School of Economics and Political Science.
Michael Laver is Professor, New York University.
Slava Mikhaylov is Senior Lecturer, University College London.

Abstract

Empirical social science often relies on data that are not observed in the field, but are transformed into quantitative variables by expert researchers who analyze and interpret qualitative raw sources. While generally considered the most valid way to produce data, this expert-driven process is inherently difficult to replicate or to assess on grounds of reliability. Using crowd-sourcing to distribute text for reading and interpretation by massive numbers of nonexperts, we generate results comparable to those obtained by experts reading and interpreting the same texts, but do so far more quickly and flexibly. Crucially, the data we collect can be reproduced and extended transparently, making crowd-sourced datasets intrinsically reproducible. This focuses researchers’ attention on the fundamental scientific objective of specifying reliable and replicable methods for collecting the data needed, rather than on the content of any particular dataset. We also show that our approach works straightforwardly with different types of political text, written in different languages. While findings reported here concern text analysis, they have far-reaching implications for expert-generated data in the social sciences.
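The abstract does not spell out how many nonexpert judgments become a single measurement. As a minimal sketch, assuming each sentence is coded by several crowd workers on a numeric policy scale (here a hypothetical -2 to +2 scale), one simple aggregation averages the codings of each sentence and then averages those sentence means within each text. The function names `aggregate_positions` and `pearson`, the toy codings, and the "expert" scores are all hypothetical and illustrative, not the authors' code, data, or results; Pearson correlation is used here only as one possible check of agreement with expert placements.

```python
from collections import defaultdict
from statistics import mean


def aggregate_positions(codings):
    """Hypothetical aggregator. `codings` is an iterable of
    (document_id, sentence_id, score) tuples, one per crowd judgment.
    Returns {document_id: estimated position}."""
    # Step 1: pool the independent crowd judgments of each sentence.
    by_sentence = defaultdict(list)
    for doc, sent, score in codings:
        by_sentence[(doc, sent)].append(score)
    # Step 2: average each sentence, then average sentence means per document.
    by_doc = defaultdict(list)
    for (doc, _sent), scores in by_sentence.items():
        by_doc[doc].append(mean(scores))
    return {doc: mean(sentence_means) for doc, sentence_means in by_doc.items()}


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


# Toy data (illustrative only): two workers code each of two sentences in
# three hypothetical documents A, B, C.
codings = [
    ("A", 1, 2), ("A", 1, 1), ("A", 2, 1), ("A", 2, 1),
    ("B", 1, -1), ("B", 1, -2), ("B", 2, 0), ("B", 2, 0),
    ("C", 1, 0), ("C", 1, 1), ("C", 2, 0), ("C", 2, -1),
]
pos = aggregate_positions(codings)          # {'A': 1.25, 'B': -0.75, 'C': 0.0}
expert = {"A": 1.0, "B": -1.0, "C": 0.2}    # hypothetical expert placements
r = pearson([pos[d] for d in "ABC"], [expert[d] for d in "ABC"])
```

Averaging first within sentences keeps a sentence that happened to receive extra judgments from outweighing the others in its document; a real study might instead use a scaling model that also weights coders by their estimated reliability.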

Type: Research Article
Information: American Political Science Review, Volume 110, Issue 2, May 2016, pp. 278–295
Copyright: Copyright © American Political Science Association 2016