Evaluating Online Labor Markets for Experimental Research: Amazon.com's Mechanical Turk

Published online by Cambridge University Press:  04 January 2017

Adam J. Berinsky, Department of Political Science, Massachusetts Institute of Technology, Cambridge, MA 02139

Gregory A. Huber, Institution for Social and Policy Studies, Yale University, New Haven, CT 06511. e-mail: gregory.huber@yale.edu

Gabriel S. Lenz, Department of Political Science, University of California, Berkeley, Berkeley, CA 94720. e-mail: glenz@berkeley.edu

Abstract

We examine the trade-offs associated with using Amazon.com's Mechanical Turk (MTurk) interface for subject recruitment. We first describe MTurk and its promise as a vehicle for performing low-cost and easy-to-field experiments. We then assess the internal and external validity of experiments performed using MTurk, employing a framework that can be used to evaluate other subject pools. Within this framework, we investigate the characteristics of samples drawn from the MTurk population. We show that respondents recruited in this manner are often more representative of the U.S. population than in-person convenience samples (the modal sample in published experimental political science), but less representative than subjects in Internet-based panels or national probability samples. Finally, we replicate important published experimental work using MTurk samples.
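
The sample-comparison exercise the abstract describes can be sketched in a few lines of code. The snippet below is a minimal illustration, not code or data from the article: it compares the demographic margins of a hypothetical MTurk sample against hypothetical population benchmarks (for example, figures of the kind one might take from the Current Population Survey) and reports the absolute deviation on each margin, plus one summary number. Every name and figure in it is invented for illustration.

```python
# A minimal, hypothetical sketch of the representativeness check
# described in the abstract -- not code or data from the article.

# Hypothetical population benchmarks (e.g., proportions from the CPS).
benchmarks = {
    "female":       0.52,
    "age_18_29":    0.22,
    "college_grad": 0.28,
    "white":        0.68,
}

# Hypothetical demographic margins observed in an MTurk sample.
mturk_sample = {
    "female":       0.60,
    "age_18_29":    0.41,
    "college_grad": 0.45,
    "white":        0.74,
}

def absolute_deviations(sample, targets):
    """Absolute deviation of each sample margin from its benchmark."""
    return {k: abs(sample[k] - targets[k]) for k in targets}

deviations = absolute_deviations(mturk_sample, benchmarks)
for margin, dev in deviations.items():
    print(f"{margin:>12}: |sample - benchmark| = {dev:.2f}")

# One summary number: the mean absolute deviation across margins.
mad = sum(deviations.values()) / len(deviations)
print(f"mean absolute deviation: {mad:.3f}")
```

Run against a second convenience pool (say, a student sample), the same calculation would let one rank subject pools by how far their margins sit from the population benchmarks, which is the spirit of the comparison the abstract summarizes.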

Type: Research Article

Copyright © The Author 2012. Published by Oxford University Press on behalf of the Society for Political Methodology.

Footnotes

Authors' note: Supplementary data for this article are available on the Political Analysis Web site.

References

Ballew, Charles C. II, and Todorov, Alexander. 2007. Predicting political elections from rapid and unreflective face judgments. Proceedings of the National Academy of Sciences of the United States of America 104: 17948–53.
Berinsky, Adam J., Huber, Gregory A., and Lenz, Gabriel S. 2011. Replication data for: Evaluating online labor markets for experimental research: Mechanical Turk. IQSS Dataverse Network [Distributor] V1 [Version]. http://hdl.handle.net/1902.1/17220 (accessed January 19, 2012).
Berinsky, Adam J., and Kinder, Donald R. 2006. Making sense of issues through media frames: Understanding the Kosovo crisis. Journal of Politics 68: 640–56.
Bless, Herbert, Betsch, Tilmann, and Franzen, Axel. 1998. Framing the framing effect: The impact of context cues on solutions to the ‘Asian disease’ problem. European Journal of Social Psychology 28: 287–91.
Buhrmester, Michael D., Kwang, Tracy, and Gosling, Samuel D. 2011. Amazon's Mechanical Turk: A new source of inexpensive, yet high-quality, data? Perspectives on Psychological Science 6: 3–5.
Chandler, Dana, and Kapelner, Adam. 2010. Breaking monotony with meaning: Motivation in crowdsourcing markets. University of Chicago Mimeo.
Chen, Daniel L., and Horton, John J. 2010. The wages of pay cuts: Evidence from a field experiment. Harvard University Mimeo.
Chong, Dennis, and Druckman, James N. 2010. Dynamic public opinion: Communication effects over time. American Political Science Review 104: 663–80.
Downs, Julie S., Holbrook, Mandy B., Sheng, Steve, and Cranor, Lorrie Faith. 2010. Are your participants gaming the system? Screening Mechanical Turk workers. Proceedings of the 28th International Conference on Human Factors in Computing Systems, CHI 2010, 2399–402. New York: ACM Press.
Druckman, James N. 2001. Evaluating framing effects. Journal of Economic Psychology 22: 91–101.
Druckman, James N., and Kam, Cindy D. 2011. Students as experimental participants: A defense of the ‘narrow data base’. In Handbook of experimental political science, eds. Druckman, James N., Green, Donald P., Kuklinski, James H., and Lupia, Arthur, 41–57. New York: Cambridge University Press.
Gerber, Alan S., Gimpel, James G., Green, Donald P., and Shaw, Daron R. 2011. How large and long-lasting are the persuasive effects of televised campaign ads? Results from a randomized field experiment. American Political Science Review 105: 135–50.
Gosling, Samuel D., Vazire, Simine, Srivastava, Sanjay, and John, Oliver P. 2004. Should we trust web-based studies? American Psychologist 59: 93–104.
Green, Donald P., and Kern, Holger L. 2010. Modeling heterogeneous treatment effects in survey experiments with Bayesian additive regression trees. Yale University Mimeo.
Horton, John J., Rand, David G., and Zeckhauser, Richard J. 2010. The online laboratory: Conducting experiments in a real labor market. Available at SSRN: http://ssrn.com/abstract=1591202 (accessed January 19, 2012).
Horton, J., and Chilton, L. 2010. The labor economics of paid crowdsourcing. Proceedings of the 11th ACM Conference on Electronic Commerce, Cambridge, MA.
Jou, Jerwen, Shanteau, James, and Harris, Richard. 1996. An information processing view of framing effects: The role of causal schemas in decision making. Memory & Cognition 24(1): 1–15.
Kam, Cindy D., and Simas, Elizabeth N. 2010. Risk orientations and policy frames. Journal of Politics 72: 381–96.
Kam, Cindy D., Wilking, Jennifer R., and Zechmeister, Elizabeth J. 2007. Beyond the ‘narrow data base’: Another convenience sample for experimental research. Political Behavior 29: 415–40.
Kittur, Aniket, Chi, Ed H., and Suh, Bongwon. 2008. Crowdsourcing user studies with Mechanical Turk. Proceedings of the 26th Annual CHI Conference on Human Factors in Computing Systems, CHI 2008, 453–6. New York: ACM Press.
Kühberger, Anton. 1995. The framing of decisions: A new look at old problems. Organizational Behavior & Human Decision Processes 62: 230–40.
Lawson, C., Lenz, Gabriel S., Myers, Mike, and Baker, Andy. 2010. Looking like a winner: Candidate appearance and electoral success in new democracies. World Politics 62: 561–93.
Mason, Winter, and Watts, Duncan J. 2009. Financial incentives and the performance of crowds. Proceedings of the ACM SIGKDD Workshop on Human Computation, 77–85. New York: ACM Press.
Orne, M. T. 1962. On the social psychology of the psychological experiment: With particular reference to demand characteristics and their implications. American Psychologist 17: 776–83.
Paolacci, Gabriele, Chandler, Jesse, and Ipeirotis, Panagiotis G. 2010. Running experiments on Amazon Mechanical Turk. Judgment and Decision Making 5: 411–19.
Rasinski, Kenneth A. 1989. The effect of question wording on public support for government spending. Public Opinion Quarterly 53: 388–94.
Ross, Joel, Irani, Lily, Six Silberman, M., Zaldivar, Andrew, and Tomlinson, Bill. 2010. Who are the crowdworkers? Shifting demographics in Amazon Mechanical Turk. In CHI EA 2010, 2863–72. New York: ACM Press.
Sears, David O. 1986. College sophomores in the laboratory: Influences of a narrow data base on social psychology's view of human nature. Journal of Personality and Social Psychology 51: 515–30.
Sheng, Victor S., Provost, Foster, and Ipeirotis, Panagiotis G. 2008. Get another label? Improving data quality and data mining using multiple, noisy labelers. Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 614–22. New York: ACM Press.
Snow, Rion, O'Connor, Brendan, Jurafsky, Daniel, and Ng, Andrew Y. 2008. Cheap and fast—But is it good? Evaluating non-expert annotations for natural language tasks. EMNLP '08: Proceedings of the Conference on Empirical Methods in Natural Language Processing, 254–63. Morristown, NJ: Association for Computational Linguistics.
Sorokin, Alexander, and Forsyth, David. 2008. Utility data annotation with Amazon Mechanical Turk. Computer Vision and Pattern Recognition Workshops '08, 1–8.
Takemura, Kazuhisa. 1994. Influence of elaboration on the framing of decision. Journal of Psychology 128: 33–9.
Transue, John E., Lee, Daniel J., and Aldrich, John H. 2009. Treatment spillover effects across survey experiments. Political Analysis 17: 143–61.
Tversky, Amos, and Kahneman, Daniel. 1981. The framing of decisions and the psychology of choice. Science 211: 453–8.
Supplementary material

Berinsky et al. supplementary material: Appendix (File, 891 KB)