
Crowd-sourced Text Analysis: Reproducible and Agile Production of Political Data

Published online by Cambridge University Press: 04 August 2016

Kenneth Benoit is Professor, London School of Economics and Trinity College, Dublin.
Drew Conway, New York University.
Benjamin E. Lauderdale is Associate Professor, London School of Economics.
Michael Laver is Professor, New York University.
Slava Mikhaylov is Senior Lecturer, University College London.


Empirical social science often relies on data that are not observed in the field, but are transformed into quantitative variables by expert researchers who analyze and interpret qualitative raw sources. While generally considered the most valid way to produce data, this expert-driven process is inherently difficult to replicate or to assess on grounds of reliability. Using crowd-sourcing to distribute text for reading and interpretation by massive numbers of nonexperts, we generate results comparable to those using experts to read and interpret the same texts, but do so far more quickly and flexibly. Crucially, the data we collect can be reproduced and extended transparently, making crowd-sourced datasets intrinsically reproducible. This focuses researchers’ attention on the fundamental scientific objective of specifying reliable and replicable methods for collecting the data needed, rather than on the content of any particular dataset. We also show that our approach works straightforwardly with different types of political text, written in different languages. While findings reported here concern text analysis, they have far-reaching implications for expert-generated data in the social sciences.

Research Article
Copyright © American Political Science Association 2016 




Supplementary material: PDF
