Evaluating Online Labor Markets for Experimental Research: Amazon.com's Mechanical Turk

  • Adam J. Berinsky, Gregory A. Huber and Gabriel S. Lenz
Abstract

We examine the trade-offs associated with using Amazon.com's Mechanical Turk (MTurk) interface for subject recruitment. We first describe MTurk and its promise as a vehicle for performing low-cost and easy-to-field experiments. We then assess the internal and external validity of experiments performed using MTurk, employing a framework that can be used to evaluate other subject pools. We first investigate the characteristics of samples drawn from the MTurk population. We show that respondents recruited in this manner are often more representative of the U.S. population than in-person convenience samples—the modal sample in published experimental political science—but less representative than subjects in Internet-based panels or national probability samples. Finally, we replicate important published experimental work using MTurk samples.

Corresponding author
e-mail: berinsky@mit.edu (corresponding author)
Footnotes

Authors' note: Supplementary data for this article are available on the Political Analysis Web site.

Political Analysis
  • ISSN: 1047-1987
  • EISSN: 1476-4989
  • URL: /core/journals/political-analysis
Supplementary materials

Berinsky et al. supplementary material: Appendix (Word, 891 KB)