Evaluating Online Labor Markets for Experimental Research: Amazon.com's Mechanical Turk

Adam J. Berinsky; Gregory A. Huber; Gabriel S. Lenz

doi:10.1093/pan/mpr057

Part of: PA editors' choice articles

Published online by Cambridge University Press: 04 January 2017

Adam J. Berinsky*: Department of Political Science, Massachusetts Institute of Technology, Cambridge, MA 02139
Gregory A. Huber: Institution for Social and Policy Studies, Yale University, New Haven, CT 06511. e-mail: gregory.huber@yale.edu
Gabriel S. Lenz: Department of Political Science, University of California, Berkeley, Berkeley, CA 94720. e-mail: glenz@berkeley.edu
*Corresponding author. e-mail: berinsky@mit.edu

Abstract

We examine the trade-offs associated with using Amazon.com's Mechanical Turk (MTurk) interface for subject recruitment. We first describe MTurk and its promise as a vehicle for performing low-cost and easy-to-field experiments. We then assess the internal and external validity of experiments performed using MTurk, employing a framework that can be used to evaluate other subject pools. We first investigate the characteristics of samples drawn from the MTurk population. We show that respondents recruited in this manner are often more representative of the U.S. population than in-person convenience samples—the modal sample in published experimental political science—but less representative than subjects in Internet-based panels or national probability samples. Finally, we replicate important published experimental work using MTurk samples.

Type: Research Article
Information: Political Analysis, Volume 20, Issue 3, Summer 2012, pp. 351-368

DOI: https://doi.org/10.1093/pan/mpr057
Copyright: © The Author 2012. Published by Oxford University Press on behalf of the Society for Political Methodology

Footnotes

Authors' note: Supplementary data for this article are available on the Political Analysis Web site.
