
Crowd-sourced Text Analysis: Reproducible and Agile Production of Political Data



Empirical social science often relies on data that are not observed in the field, but are transformed into quantitative variables by expert researchers who analyze and interpret qualitative raw sources. While generally considered the most valid way to produce data, this expert-driven process is inherently difficult to replicate or to assess on grounds of reliability. Using crowd-sourcing to distribute text for reading and interpretation by massive numbers of nonexperts, we generate results comparable to those using experts to read and interpret the same texts, but do so far more quickly and flexibly. Crucially, the data we collect can be reproduced and extended transparently, making crowd-sourced datasets intrinsically reproducible. This focuses researchers’ attention on the fundamental scientific objective of specifying reliable and replicable methods for collecting the data needed, rather than on the content of any particular dataset. We also show that our approach works straightforwardly with different types of political text, written in different languages. While findings reported here concern text analysis, they have far-reaching implications for expert-generated data in the social sciences.


Corresponding author

Kenneth Benoit is Professor, London School of Economics and Trinity College, Dublin.
Drew Conway, New York University.
Benjamin E. Lauderdale is Associate Professor, London School of Economics.
Michael Laver is Professor, New York University.
Slava Mikhaylov is Senior Lecturer, University College London.



