Generalizing from Survey Experiments Conducted on Mechanical Turk: A Replication Approach

  • Alexander Coppock

To what extent do survey experimental treatment effect estimates generalize to other populations and contexts? Survey experiments conducted on convenience samples have often been criticized on the grounds that subjects are sufficiently different from the public at large to render the results of such experiments uninformative more broadly. In the presence of moderate treatment effect heterogeneity, however, such concerns may be allayed. I provide evidence from a series of 15 replication experiments that results derived from convenience samples like Amazon’s Mechanical Turk are similar to those obtained from national samples. Either the treatments deployed in these experiments cause similar responses for many subject types or convenience and national samples do not differ much with respect to treatment effect moderators. Using evidence of limited within-experiment heterogeneity, I show that the former is likely to be the case. Despite a wide diversity of background characteristics across samples, the effects uncovered in these experiments appear to be relatively homogeneous.
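The reasoning in the abstract can be made concrete with a small arithmetic sketch. If a sample's average treatment effect is viewed as a share-weighted mean of subgroup effects, then two samples with different compositions give the same average whenever the subgroup effects are equal, and can diverge when they are not. The subgroup labels, shares, and effect sizes below are hypothetical values chosen for illustration; they are not taken from the replication experiments.

```python
def sample_ate(shares, subgroup_effects):
    """Average treatment effect as a share-weighted mean of subgroup effects."""
    return sum(shares[g] * subgroup_effects[g] for g in shares)

# Hypothetical sample compositions (assumed, not from the article).
convenience = {"younger": 0.7, "older": 0.3}
national = {"younger": 0.4, "older": 0.6}

# Hypothetical subgroup effects (assumed, not from the article).
homogeneous = {"younger": 0.5, "older": 0.5}    # same effect for every subject type
heterogeneous = {"younger": 0.8, "older": 0.2}  # effect varies with the moderator

# Difference between the convenience-sample and national-sample estimates.
gap_homogeneous = sample_ate(convenience, homogeneous) - sample_ate(national, homogeneous)
gap_heterogeneous = sample_ate(convenience, heterogeneous) - sample_ate(national, heterogeneous)

print("gap under homogeneous effects:", round(gap_homogeneous, 3))
print("gap under heterogeneous effects:", round(gap_heterogeneous, 3))
```

Under these assumed numbers the two samples agree when effects are homogeneous and disagree when effects vary with a moderator whose distribution differs across samples, which are the two explanations the abstract distinguishes.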

Alexander Coppock is an Assistant Professor in the Department of Political Science, Yale University, New Haven, CT 06520. Portions of this research were funded by the Time-sharing Experiments in the Social Sciences (TESS) organization. This research was reviewed and approved by the Institutional Review Board of Columbia University (IRB-AAAP1312). The author is grateful to James N. Druckman and Donald P. Green for their guidance and support. To view supplementary material for this article, please visit

