Generalizing from Survey Experiments Conducted on Mechanical Turk: A Replication Approach

  • Alexander Coppock

To what extent do survey experimental treatment effect estimates generalize to other populations and contexts? Survey experiments conducted on convenience samples have often been criticized on the grounds that subjects are sufficiently different from the public at large to render the results of such experiments uninformative more broadly. In the presence of moderate treatment effect heterogeneity, however, such concerns may be allayed. I provide evidence from a series of 15 replication experiments that results derived from convenience samples like Amazon’s Mechanical Turk are similar to those obtained from national samples. Either the treatments deployed in these experiments cause similar responses for many subject types or convenience and national samples do not differ much with respect to treatment effect moderators. Using evidence of limited within-experiment heterogeneity, I show that the former is likely to be the case. Despite a wide diversity of background characteristics across samples, the effects uncovered in these experiments appear to be relatively homogeneous.
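The two explanations offered in the abstract can be illustrated with a small arithmetic sketch. It assumes, purely for illustration, that an average treatment effect is a share-weighted average of subgroup effects. The subgroup labels, shares, and effect sizes below are hypothetical and are not taken from the article's data.

```python
def average_effect(shares, subgroup_effects):
    """Share-weighted average of subgroup treatment effects."""
    if abs(sum(shares.values()) - 1.0) > 1e-9:
        raise ValueError("subgroup shares must sum to 1")
    return sum(shares[g] * subgroup_effects[g] for g in shares)


# Hypothetical sample compositions (illustrative numbers only).
convenience_shares = {"younger": 0.7, "older": 0.3}
national_shares = {"younger": 0.3, "older": 0.7}

# Case 1: the treatment moves every subgroup by the same amount.
homogeneous_effects = {"younger": 0.2, "older": 0.2}
# Case 2: the treatment moves only one subgroup.
heterogeneous_effects = {"younger": 0.4, "older": 0.0}

gap_homogeneous = (
    average_effect(convenience_shares, homogeneous_effects)
    - average_effect(national_shares, homogeneous_effects)
)
gap_heterogeneous = (
    average_effect(convenience_shares, heterogeneous_effects)
    - average_effect(national_shares, heterogeneous_effects)
)

print(f"gap with homogeneous effects:   {gap_homogeneous:.2f}")
print(f"gap with heterogeneous effects: {gap_heterogeneous:.2f}")
```

Under these made-up numbers, the two samples agree when effects are homogeneous even though their compositions differ sharply, and they diverge only when the effect varies along a dimension on which the samples differ.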

Alexander Coppock is an Assistant Professor in the Department of Political Science, Yale University, New Haven, CT 06520. Portions of this research were funded by the Time-sharing Experiments in the Social Sciences (TESS) organization. This research was reviewed and approved by the Institutional Review Board of Columbia University (IRB-AAAP1312). The author is grateful to James N. Druckman and Donald P. Green for their guidance and support. To view supplementary material for this article, please visit

Political Science Research and Methods
  • ISSN: 2049-8470
  • EISSN: 2049-8489
  • URL: /core/journals/political-science-research-and-methods