
Multi-modes for Detecting Experimental Measurement Error

Published online by Cambridge University Press:  14 October 2019

Raymond Duch*
Nuffield College, University of Oxford, Oxford, UK
Denise Laroze
Centre for Experimental Social Sciences and Departamento de Administración, Universidad de Santiago de Chile, Santiago, Chile
Thomas Robinson
Department of Politics and International Relations, University of Oxford, Oxford, UK
Pablo Beramendi
Department of Political Science, Duke University, Durham, NC 27708, USA


Experiments should be designed to facilitate the detection of experimental measurement error. To this end, we advocate the implementation of identical experimental protocols employing diverse experimental modes. We suggest iterative nonparametric estimation techniques for assessing the magnitude of heterogeneous treatment effects across these modes. And we propose two diagnostic strategies—measurement metrics embedded in experiments, and measurement experiments—that help assess whether any observed heterogeneity reflects experimental measurement error. To illustrate our argument, first we conduct and analyze results from four identical interactive experiments: in the lab; online with subjects from the CESS lab subject pool; online with an online subject pool; and online with MTurk workers. Second, we implement a measurement experiment in India with CESS Online subjects and MTurk workers.
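The abstract describes estimating heterogeneous treatment effects separately within each experimental mode and comparing the results across modes. As a hedged illustration only, the sketch below implements one simple nonparametric approach to that idea, a T-learner with random forests on synthetic data; it is not the authors' actual estimation pipeline, and all variable names and the data-generating process are invented for the example.

```python
# Illustrative sketch (not the authors' code): a T-learner estimate of
# average treatment effects within each experimental mode, so that
# mode-level heterogeneity can be compared. Data are synthetic.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n = 2000
mode = rng.choice(["lab", "online", "mturk"], size=n)  # experimental mode
x = rng.normal(size=(n, 3))                            # covariates
t = rng.integers(0, 2, size=n)                         # random assignment
# Synthetic truth: the treatment effect is larger in the "mturk" mode.
tau = np.where(mode == "mturk", 2.0, 1.0)
y = x[:, 0] + tau * t + rng.normal(size=n)

def cate_by_mode(x, y, t, mode):
    """T-learner: fit separate outcome models for treated and control
    units within each mode, then average the predicted difference."""
    out = {}
    for m in np.unique(mode):
        idx = mode == m
        m1 = RandomForestRegressor(n_estimators=200, random_state=0)
        m0 = RandomForestRegressor(n_estimators=200, random_state=0)
        m1.fit(x[idx & (t == 1)], y[idx & (t == 1)])
        m0.fit(x[idx & (t == 0)], y[idx & (t == 0)])
        out[m] = float(np.mean(m1.predict(x[idx]) - m0.predict(x[idx])))
    return out

estimates = cate_by_mode(x, y, t, mode)
```

Divergent mode-level estimates from a design like this are only a symptom; as the abstract notes, the paper's embedded measurement metrics and measurement experiments are what distinguish genuine heterogeneity from experimental measurement error.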

Copyright © The Author(s) 2019. Published by Cambridge University Press on behalf of the Society for Political Methodology.



Authors’ note: We would like to acknowledge the contributions of the Nuffield College Centre for Experimental Social Sciences postdocs who were instrumental in helping design and implement the experiments reported in the manuscript: John Jensenius III, Aki Matsuo, Sonke Ehret, Mauricio Lopez, Hector Solaz, Wojtek Przepiorka, David Klinowski, Sonja Vogt, and Amma Parin. We have also benefited from very helpful comments from colleagues, including Vera Troeger, Thomas Pluemper, Dominik Duell, Luke Keele, and Mats Ahrenshop, and from the Political Analysis reviewers, editor, and editorial team. Of course, we assume responsibility for all shortcomings of the design and analysis. All replication materials are available from the Political Analysis Dataverse (Duch et al. 2019).

Contributing Editor: Jeff Gill


Ahlquist, J. S. 2018. “List Experiment Design, Non-Strategic Respondent Error, and Item Count Technique Estimators.” Political Analysis 26(1):34–53.
Al-Ubaydli, O., List, J. A., LoRe, D., and Suskind, D. 2017. “Scaling for Economists: Lessons from the Non-Adherence Problem in the Medical Literature.” Journal of Economic Perspectives 31(4):125–144.
Athey, S., and Imbens, G. 2017. “The Econometrics of Randomized Experiments.” Handbook of Economic Field Experiments 1:73–140.
Bader, F., Baumeister, B., Berger, R., and Keuschnigg, M. 2019. “On the Transportability of Laboratory Results.” Sociological Methods & Research.
Bertrand, M., and Mullainathan, S. 2001. “Do People Mean What They Say? Implications for Subjective Survey Data.” Economics and Social Behavior 91(2):67–72.
Blair, G., Chou, W., and Imai, K. 2019. “List Experiments with Measurement Error.” Political Analysis.
Blattman, C., Jamison, J., Koroknay-Palicz, T., Rodrigues, K., and Sheridan, M. 2016. “Measuring the Measurement Error: A Method to Qualitatively Validate Survey Data.” Journal of Development Economics 120:99–112.
Burleigh, T., Kennedy, R., and Clifford, S. 2018. “How to Screen Out VPS and International Respondents Using Qualtrics: A Protocol.” Working Paper.
Camerer, C. 2015. The Promise and Success of Lab-Field Generalizability in Experimental Economics: A Critical Reply to Levitt and List. Oxford Scholarship Online.
Centola, D. 2018. How Behavior Spreads: The Science of Complex Contagions. Princeton, NJ: Princeton University Press.
Chang, L., and Krosnick, J. A. 2009. “National Surveys Via RDD Telephone Interviewing Versus the Internet: Comparing Sample Representativeness and Response Quality.” Public Opinion Quarterly 73(4):641–678.
Coppock, A. 2018. “Generalizing from Survey Experiments Conducted on Mechanical Turk: A Replication Approach.” Political Science Research and Methods 7(3):613–628.
Coppock, A., Leeper, T. J., and Mullinix, K. J. 2018. “Generalizability of Heterogeneous Treatment Effect Estimates Across Samples.” Proceedings of the National Academy of Sciences 115(49):12441–12446.
de Quidt, J., Haushofer, J., and Roth, C. 2018. “Measuring and Bounding Experimenter Demand.” American Economic Review 108(11):3266–3302.
Duch, R., Laroze, D., and Zakharov, A. 2018. “Once a Liar Always a Liar?” Nuffield Centre for Experimental Social Sciences Working Paper.
Duch, R., Laroze, D., Robinson, T., and Beramendi, P. 2019. “Replication Data for: Multi-Modes for Detecting Experimental Measurement Error.” Harvard Dataverse, V1, UNF:6:4Q/frMH7kswKFTcwuHsmGQ== [fileUNF].
Dupas, P., and Miguel, E. 2017. “Impacts and Determinants of Health Levels in Low-Income Countries.” In Handbook of Economic Field Experiments, edited by Duflo, E. and Banerjee, A., 3–93. Amsterdam: North-Holland.
Engel, C., and Kirchkamp, O. 2018. “Measurement Errors of Risk Aversion and How to Correct Them.” Working Paper.
Gelman, A. 2013. “Preregistration of Studies and Mock Reports.” Political Analysis 21(1):40–41.
Gerber, A. S., and Green, D. P. 2008. “Field Experiments and Natural Experiments.” In The Oxford Handbook of Political Methodology, 357–381. Oxford: Oxford University Press.
Gillen, B., Snowberg, E., and Yariv, L. 2019. “Experimenting with Measurement Error: Techniques with Applications to the Caltech Cohort Study.” Journal of Political Economy 127(4):1826–1863.
Gooch, A., and Vavreck, L. 2019. “How Face-to-Face Interviews and Cognitive Skill Affect Item Non-Response: A Randomized Experiment Assigning Mode of Interview.” Political Science Research and Methods 7(1):143–162.
Green, D. P., and Kern, H. L. 2012. “Modeling Heterogeneous Treatment Effects in Survey Experiments with Bayesian Additive Regression Trees.” Public Opinion Quarterly 76(3):491–511.
Grimmer, J., Messing, S., and Westwood, S. J. 2017. “Estimating Heterogeneous Treatment Effects and the Effects of Heterogeneous Treatments with Ensemble Methods.” Political Analysis 25(4):413–434.
Hill, J. L. 2011. “Bayesian Nonparametric Modeling for Causal Inference.” Journal of Computational and Graphical Statistics 20(1):217–240.
Huff, C., and Tingley, D. 2015. “‘Who are these People?’ Evaluating the Demographic Characteristics and Political Preferences of MTurk Survey Respondents.” Research & Politics 2(3):2053168015604648.
Imai, K., and Ratkovic, M. 2013. “Estimating Treatment Effect Heterogeneity in Randomized Programme Evaluation.” The Annals of Applied Statistics 7(1):443–470.
Imai, K., and Strauss, A. 2011. “Estimation of Heterogeneous Treatment Effects from Randomized Experiments, with Application to the Optimal Planning of the Get-Out-the-Vote Campaign.” Political Analysis 19(1):1–19.
Kennedy, R., Clifford, S., Burleigh, T., Jewell, R., and Waggoner, P. 2018. “The Shape of and Solutions to the MTurk Quality Crisis.” Working Paper.
Künzel, S. R., Sekhon, J. S., Bickel, P. J., and Yu, B. 2019. “Metalearners for Estimating Heterogeneous Treatment Effects Using Machine Learning.” Proceedings of the National Academy of Sciences 116(10):4156–4165.
Levitt, S., and List, J. 2007. “What Do Laboratory Experiments Measuring Social Preferences Reveal about the Real World?” The Journal of Economic Perspectives 21(7):153–174.
Levitt, S. D., and List, J. A. 2015. “What Do Laboratory Experiments Measuring Social Preferences Reveal About the Real World?” Oxford Scholarship Online.
Loomes, G. 2005. “Modelling the Stochastic Component of Behaviour in Experiments: Some Issues for the Interpretation of Data.” Experimental Economics 8(4):301–323.
Maniadis, Z., Tufano, F., and List, J. A. 2014. “One Swallow Doesn’t Make a Summer: New Evidence on Anchoring Effects.” American Economic Review 104(1):277–290.
Morton, R., and Williams, K. 2010. From Nature to the Lab: Experimental Political Science and the Study of Causality. New York: Cambridge University Press.
Mullainathan, S., and Spiess, J. 2017. “Machine Learning: An Applied Econometric Approach.” Journal of Economic Perspectives 31(2):87–106.
Mutz, D. C. 2011. Population-Based Survey Experiments. Princeton, NJ: Princeton University Press.
Open Science Collaboration. 2015. “Estimating the Reproducibility of Psychological Science.” Science 349(6251):aac4716.
Tourangeau, R., and Yan, T. 2007. “Sensitive Questions in Surveys.” Psychological Bulletin 133(5):859–883.
Wager, S., and Athey, S. 2018. “Estimation and Inference of Heterogeneous Treatment Effects using Random Forests.” Journal of the American Statistical Association 113(523):1228–1242.
Zizzo, D. J. 2010. “Experimenter Demand Effects in Economic Experiments.” Experimental Economics 13(1):75–98.
Supplementary material: Duch et al. supplementary material (File, 510 KB)