
Multi-modes for Detecting Experimental Measurement Error

Published online by Cambridge University Press:  14 October 2019

Raymond Duch*
Nuffield College, University of Oxford, Oxford, UK
Denise Laroze
Centre for Experimental Social Sciences and Departamento de Administración, Universidad de Santiago de Chile, Santiago, Chile
Thomas Robinson
Department of Politics and International Relations, University of Oxford, Oxford, UK
Pablo Beramendi
Department of Political Science, Duke University, Durham, NC 27708, USA


Experiments should be designed to facilitate the detection of experimental measurement error. To this end, we advocate the implementation of identical experimental protocols employing diverse experimental modes. We suggest iterative nonparametric estimation techniques for assessing the magnitude of heterogeneous treatment effects across these modes. And we propose two diagnostic strategies—measurement metrics embedded in experiments, and measurement experiments—that help assess whether any observed heterogeneity reflects experimental measurement error. To illustrate our argument, first we conduct and analyze results from four identical interactive experiments: in the lab; online with subjects from the CESS lab subject pool; online with an online subject pool; and online with MTurk workers. Second, we implement a measurement experiment in India with CESS Online subjects and MTurk workers.
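The abstract describes estimating heterogeneous treatment effects separately within each experimental mode and comparing the results across modes. As a hedged illustration only, the sketch below implements one simple nonparametric approach to that idea, a T-learner with random forests on synthetic data; it is not the authors' actual estimation pipeline, and all variable names and the data-generating process are invented for the example.

```python
# Illustrative sketch (not the authors' code): a T-learner estimate of
# average treatment effects within each experimental mode, so that
# mode-level heterogeneity can be compared. Data are synthetic.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n = 2000
mode = rng.choice(["lab", "online", "mturk"], size=n)  # experimental mode
x = rng.normal(size=(n, 3))                            # covariates
t = rng.integers(0, 2, size=n)                         # random assignment
# Synthetic truth: the treatment effect is larger in the "mturk" mode.
tau = np.where(mode == "mturk", 2.0, 1.0)
y = x[:, 0] + tau * t + rng.normal(size=n)

def cate_by_mode(x, y, t, mode):
    """T-learner: fit separate outcome models for treated and control
    units within each mode, then average the predicted difference."""
    out = {}
    for m in np.unique(mode):
        idx = mode == m
        m1 = RandomForestRegressor(n_estimators=200, random_state=0)
        m0 = RandomForestRegressor(n_estimators=200, random_state=0)
        m1.fit(x[idx & (t == 1)], y[idx & (t == 1)])
        m0.fit(x[idx & (t == 0)], y[idx & (t == 0)])
        out[m] = float(np.mean(m1.predict(x[idx]) - m0.predict(x[idx])))
    return out

estimates = cate_by_mode(x, y, t, mode)
```

Divergent mode-level estimates from a design like this are only a symptom; as the abstract notes, the paper's embedded measurement metrics and measurement experiments are what distinguish genuine heterogeneity from experimental measurement error.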

Copyright © The Author(s) 2019. Published by Cambridge University Press on behalf of the Society for Political Methodology.



Authors’ note: We would like to acknowledge the contributions of the Nuffield College Centre for Experimental Social Sciences postdocs who were instrumental in helping design and implement the experiments reported in the manuscript: John Jensenius III, Aki Matsuo, Sonke Ehret, Mauricio Lopez, Hector Solaz, Wojtek Przepiorka, David Klinowski, Sonja Vogt, and Amma Parin. We have also benefited from very helpful comments from colleagues, including Vera Troeger, Thomas Pluemper, Dominik Duell, Luke Keele, and Mats Ahrenshop, and from the Political Analysis reviewers, editor, and editorial team. Of course, we assume responsibility for all shortcomings of the design and analysis. All replication materials are available from the Political Analysis Dataverse (Duch et al. 2019).

Contributing Editor: Jeff Gill


Ahlquist, J. S. 2018. “List Experiment Design, Non-Strategic Respondent Error, and Item Count Technique Estimators.” Political Analysis 26(1):34–53.
Al-Ubaydli, O., List, J. A., LoRe, D., and Suskind, D. 2017. “Scaling for Economists: Lessons from the Non-Adherence Problem in the Medical Literature.” Journal of Economic Perspectives 31(4):125–144.
Athey, S., and Imbens, G. 2017. “The Econometrics of Randomized Experiments.” Handbook of Economic Field Experiments 1:73–140.
Bader, F., Baumeister, B., Berger, R., and Keuschnigg, M. 2019. “On the Transportability of Laboratory Results.” Sociological Methods & Research.
Bertrand, M., and Mullainathan, S. 2001. “Do People Mean What They Say? Implications for Subjective Survey Data.” Economics and Social Behavior 91(2):67–72.
Blair, G., Chou, W., and Imai, K. 2019. “List Experiments with Measurement Error.” Political Analysis.
Blattman, C., Jamison, J., Koroknay-Palicz, T., Rodrigues, K., and Sheridan, M. 2016. “Measuring the Measurement Error: A Method to Qualitatively Validate Survey Data.” Journal of Development Economics 120:99–112.
Burleigh, T., Kennedy, R., and Clifford, S. 2018. “How to Screen Out VPS and International Respondents Using Qualtrics: A Protocol.” Working Paper.
Camerer, C. 2015. The Promise and Success of Lab-Field Generalizability in Experimental Economics: A Critical Reply to Levitt and List. Oxford Scholarship Online.
Centola, D. 2018. How Behavior Spreads: The Science of Complex Contagions. Princeton, NJ: Princeton University Press.
Chang, L., and Krosnick, J. A. 2009. “National Surveys Via RDD Telephone Interviewing Versus the Internet: Comparing Sample Representativeness and Response Quality.” Public Opinion Quarterly 73(4):641–678.
Coppock, A. 2018. “Generalizing from Survey Experiments Conducted on Mechanical Turk: A Replication Approach.” Political Science Research and Methods 7(3):613–628.
Coppock, A., Leeper, T. J., and Mullinix, K. J. 2018. “Generalizability of Heterogeneous Treatment Effect Estimates Across Samples.” Proceedings of the National Academy of Sciences 115(49):12441–12446.
de Quidt, J., Haushofer, J., and Roth, C. 2018. “Measuring and Bounding Experimenter Demand.” American Economic Review 108(11):3266–3302.
Duch, R., Laroze, D., and Zakharov, A. 2018. “Once a Liar Always a Liar?” Nuffield Centre for Experimental Social Sciences Working Paper.
Duch, R., Laroze, D., Robinson, T., and Beramendi, P. 2019. “Replication Data for: Multi-Modes for Detecting Experimental Measurement Error.” Harvard Dataverse, V1, UNF:6:4Q/frMH7kswKFTcwuHsmGQ== [fileUNF].
Dupas, P., and Miguel, E. 2017. “Impacts and Determinants of Health Levels in Low-Income Countries.” In Handbook of Economic Field Experiments, edited by Duflo, E. and Banerjee, A., 3–93. Amsterdam: North-Holland.
Engel, C., and Kirchkamp, O. 2018. “Measurement Errors of Risk Aversion and How to Correct Them.” Working Paper.
Gelman, A. 2013. “Preregistration of Studies and Mock Reports.” Political Analysis 21(1):40–41.
Gerber, A. S., and Green, D. P. 2008. “Field Experiments and Natural Experiments.” In The Oxford Handbook of Political Methodology, 357–381. Oxford: Oxford University Press.
Gillen, B., Snowberg, E., and Yariv, L. 2019. “Experimenting with Measurement Error: Techniques with Applications to the Caltech Cohort Study.” Journal of Political Economy 127(4):1826–1863.
Gooch, A., and Vavreck, L. 2019. “How Face-to-Face Interviews and Cognitive Skill Affect Item Non-Response: A Randomized Experiment Assigning Mode of Interview.” Political Science Research and Methods 7(1):143–162.
Green, D. P., and Kern, H. L. 2012. “Modeling Heterogeneous Treatment Effects in Survey Experiments with Bayesian Additive Regression Trees.” Public Opinion Quarterly 76(3):491–511.
Grimmer, J., Messing, S., and Westwood, S. J. 2017. “Estimating Heterogeneous Treatment Effects and the Effects of Heterogeneous Treatments with Ensemble Methods.” Political Analysis 25(4):413–434.
Hill, J. L. 2011. “Bayesian Nonparametric Modeling for Causal Inference.” Journal of Computational and Graphical Statistics 20(1):217–240.
Huff, C., and Tingley, D. 2015. “‘Who are these People?’ Evaluating the Demographic Characteristics and Political Preferences of MTurk Survey Respondents.” Research & Politics 2(3):2053168015604648.
Imai, K., and Ratkovic, M. 2013. “Estimating Treatment Effect Heterogeneity in Randomized Programme Evaluation.” The Annals of Applied Statistics 7(1):443–470.
Imai, K., and Strauss, A. 2011. “Estimation of Heterogeneous Treatment Effects from Randomized Experiments, with Application to the Optimal Planning of the Get-Out-the-Vote Campaign.” Political Analysis 19(1):1–19.
Kennedy, R., Clifford, S., Burleigh, T., Jewell, R., and Waggoner, P. 2018. “The Shape of and Solutions to the MTurk Quality Crisis.” Working Paper.
Künzel, S. R., Sekhon, J. S., Bickel, P. J., and Yu, B. 2019. “Metalearners for Estimating Heterogeneous Treatment Effects Using Machine Learning.” Proceedings of the National Academy of Sciences 116(10):4156–4165.
Levitt, S., and List, J. 2007. “What Do Laboratory Experiments Measuring Social Preferences Reveal about the Real World?” The Journal of Economic Perspectives 21(7):153–174.
Levitt, S. D., and List, J. A. 2015. “What Do Laboratory Experiments Measuring Social Preferences Reveal About the Real World?” Oxford Scholarship Online.
Loomes, G. 2005. “Modelling the Stochastic Component of Behaviour in Experiments: Some Issues for the Interpretation of Data.” Experimental Economics 8(4):301–323.
Maniadis, Z., Tufano, F., and List, J. A. 2014. “One Swallow Doesn’t Make a Summer: New Evidence on Anchoring Effects.” American Economic Review 104(1):277–290.
Morton, R., and Williams, K. 2010. From Nature to the Lab: Experimental Political Science and the Study of Causality. New York: Cambridge University Press.
Mullainathan, S., and Spiess, J. 2017. “Machine Learning: An Applied Econometric Approach.” Journal of Economic Perspectives 31(2):87–106.
Mutz, D. C. 2011. Population-Based Survey Experiments. Princeton, NJ: Princeton University Press.
Open Science Collaboration. 2015. “Estimating the Reproducibility of Psychological Science.” Science 349(6251):aac4716.
Tourangeau, R., and Yan, T. 2007. “Sensitive Questions in Surveys.” Psychological Bulletin 133(5):859–883.
Wager, S., and Athey, S. 2018. “Estimation and Inference of Heterogeneous Treatment Effects using Random Forests.” Journal of the American Statistical Association 113(523):1228–1242.
Zizzo, D. J. 2010. “Experimenter Demand Effects in Economic Experiments.” Experimental Economics 13(1):75–98.
Supplementary material: Duch et al. supplementary material (File, 510 KB)