Choosing Imputation Models

Moritz Marbach

doi:10.1017/pan.2021.39

Published online by Cambridge University Press: 10 December 2021

Affiliation: The Bush School of Government & Public Service, Texas A&M University, 4220 TAMU, College Station, TX 77843-4220, USA. Email: moritz.marbach@tamu.edu (corresponding author)

Abstract

Imputing missing values is an important preprocessing step in data analysis, but the literature offers little guidance on how to choose between imputation models. This letter suggests adopting the imputation model that generates a density of imputed values most similar to that of the observed values for an incomplete variable after balancing all other covariates. We recommend stable balancing weights as a practical approach to balance covariates whose distribution is expected to differ if the values are not missing completely at random. After balancing, discrepancy statistics can be used to compare the density of imputed and observed values. We illustrate the application of the suggested approach using simulated and real-world survey data from the American National Election Study, comparing popular imputation approaches including random forests, hot-deck, predictive mean matching, and multivariate normal imputation. An R package implementing the suggested approach accompanies this letter.
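The logic of the abstract can be illustrated with a minimal sketch. It is not the accompanying R package, and every name in it (`tilting_weights`, `weighted_ks`, the simulated data) is hypothetical. Three simplifying assumptions are made: there is a single, fully observed covariate; exponential tilting weights (in the spirit of entropy balancing) stand in for stable balancing weights, which would require a quadratic programming solver; and a weighted Kolmogorov-Smirnov gap serves as the discrepancy statistic. Observed cases are reweighted so that their covariate mean matches that of the cases with missing values, and the candidate whose imputed values lie closest to the reweighted observed values is preferred.

```python
import math
import random


def tilting_weights(x_obs, target_mean, iters=100, tol=1e-10):
    """Weights proportional to exp(lam * x), summing to 1, whose weighted
    mean of x equals target_mean (stand-in for stable balancing weights)."""
    def weights(lam):
        shift = max(lam * x for x in x_obs)  # guard against overflow
        raw = [math.exp(lam * x - shift) for x in x_obs]
        total = sum(raw)
        return [r / total for r in raw]

    lam = 0.0
    for _ in range(iters):
        w = weights(lam)
        mean = sum(wi * xi for wi, xi in zip(w, x_obs))
        var = sum(wi * (xi - mean) ** 2 for wi, xi in zip(w, x_obs))
        if abs(mean - target_mean) < tol or var == 0.0:
            break
        lam += (target_mean - mean) / var  # Newton step on the balance condition
    return weights(lam)


def weighted_ks(y_obs, w_obs, y_imp):
    """Largest absolute gap between the weighted ECDF of the observed values
    and the unweighted ECDF of the imputed values."""
    points = sorted(set(y_obs) | set(y_imp))
    obs = sorted(zip(y_obs, w_obs))
    imp = sorted(y_imp)
    i = j = 0
    cdf_obs = 0.0
    gap = 0.0
    for p in points:
        while i < len(obs) and obs[i][0] <= p:
            cdf_obs += obs[i][1]
            i += 1
        while j < len(imp) and imp[j] <= p:
            j += 1
        gap = max(gap, abs(cdf_obs - j / len(imp)))
    return gap


# Simulated data (assumption): y depends on x, and y is more likely to be
# missing when x is large, so values are not missing completely at random.
rng = random.Random(1)
n = 2000
x = [rng.gauss(0, 1) for _ in range(n)]
y = [2 * xi + rng.gauss(0, 1) for xi in x]
missing = [rng.random() < 1 / (1 + math.exp(-xi)) for xi in x]

x_obs = [xi for xi, m in zip(x, missing) if not m]
y_obs = [yi for yi, m in zip(y, missing) if not m]
x_mis = [xi for xi, m in zip(x, missing) if m]

# Candidate A: random draws from the observed y values, ignoring x.
imp_a = [rng.choice(y_obs) for _ in x_mis]

# Candidate B: linear regression of y on x plus a randomly drawn residual.
x_bar = sum(x_obs) / len(x_obs)
y_bar = sum(y_obs) / len(y_obs)
slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x_obs, y_obs))
         / sum((xi - x_bar) ** 2 for xi in x_obs))
intercept = y_bar - slope * x_bar
resid = [yi - intercept - slope * xi for xi, yi in zip(x_obs, y_obs)]
imp_b = [intercept + slope * xi + rng.choice(resid) for xi in x_mis]

# Balance the covariate, then compare each candidate with the observed values.
target = sum(x_mis) / len(x_mis)
w = tilting_weights(x_obs, target)
ks_a = weighted_ks(y_obs, w, imp_a)
ks_b = weighted_ks(y_obs, w, imp_b)
print(f"discrepancy A (ignores x): {ks_a:.3f}")
print(f"discrepancy B (uses x):    {ks_b:.3f}")
```

In this simulation the candidate that conditions on the covariate should show the smaller discrepancy. Without the balancing step, the comparison would instead favor the candidate that simply reproduces the observed distribution, even though the missing cases differ systematically in the covariate.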

Keywords

missing data, imputation, weighting

Information

Type: Letter
Information: Political Analysis, Volume 30, Issue 4, October 2022, pp. 597–605

DOI: https://doi.org/10.1017/pan.2021.39
Copyright: © The Author(s) 2021. Published by Cambridge University Press on behalf of the Society for Political Methodology

Footnotes

Edited by Daniel Hopkins

References

Abayomi, K., Gelman, A., and Levy, M. 2008. “Diagnostics for Multivariate Imputations.” Journal of the Royal Statistical Society: Series C 57 (3): 273–291.

Andridge, R. R., and Little, R. J. 2010. “A Review of Hot Deck Imputation for Survey Non-response.” International Statistical Review 78 (1): 40–64.

Bondarenko, I., and Raghunathan, T. 2016. “Graphical and Numerical Diagnostic Tools to Assess Suitability of Multiple Imputations and Imputation Models.” Statistics in Medicine 35 (17): 3007–3020.

Cranmer, S. J., and Gill, J. 2013. “We Have to be Discrete About This: A Non-Parametric Imputation Technique for Missing Categorical Data.” British Journal of Political Science 43 (2): 425–449.

Doove, L. L., Van Buuren, S., and Dusseldorp, E. 2014. “Recursive Partitioning for Missing Data Imputation in the Presence of Interaction Effects.” Computational Statistics & Data Analysis 72: 92–104.

Franklin, J. M., Rassen, J. A., Ackermann, D., Bartels, D. B., and Schneeweiss, S. 2014. “Metrics for Covariate Balance in Cohort Studies of Causal Effects.” Statistics in Medicine 33 (10): 1685–1699.

Hainmueller, J. 2012. “Entropy Balancing for Causal Effects: A Multivariate Reweighting Method to Produce Balanced Samples in Observational Studies.” Political Analysis 20 (1): 25–46.

Honaker, J., King, G., and Blackwell, M. 2011. “Amelia II: A Program for Missing Data.” Journal of Statistical Software 45 (7): 1–47.

King, G., Honaker, J., Joseph, A., and Scheve, K. 2001. “Analyzing Incomplete Political Science Data: An Alternative Algorithm for Multiple Imputation.” American Political Science Review 95 (1): 49–69.

Kropko, J., Goodrich, B., Gelman, A., and Hill, J. 2014. “Multiple Imputation for Continuous and Categorical Data: Comparing Joint Multivariate Normal And Conditional Approaches.” Political Analysis 22 (4): 497–519.

Lall, R. 2016. “How Multiple Imputation Makes a Difference.” Political Analysis 24 (4): 414–433.

Little, R. J. 1988. “Missing-data Adjustments in Large Surveys.” Journal of Business & Economic Statistics 6 (3): 287–296.

Little, R. J. A., and Rubin, D. B. 2019. Statistical Analysis with Missing Data (3rd edn.). New York: Wiley.

Marbach, M. 2021. “Replication Data for: Choosing Imputation Models.” https://doi.org/10.7910/DVN/IIXGBM, Harvard Dataverse, V1.

Mealli, F., and Rubin, D. B. 2015. “Clarifying Missing at Random and Related Definitions, and Implications When Coupled With Exchangeability.” Biometrika 102 (4): 995–1000.

Rubin, D. B. 1976. “Inference and Missing Data.” Biometrika 63 (3): 581–592.

Rubin, D. B. 1987. Multiple Imputation for Nonresponse in Surveys. New York: Wiley.

Rubin, D. B. 1996. “Multiple Imputation After 18+ Years.” Journal of the American Statistical Association 91 (434): 473–489.

Schafer, J. L. 1997. Analysis of Incomplete Multivariate Data. Boca Raton: Chapman & Hall.

Seaman, S. R., White, I. R., Copas, A. J., and Li, L. 2012. “Combining Multiple Imputation and Inverse-Probability Weighting.” Biometrics 68 (1): 129–137.

Stekhoven, D. J., and Bühlmann, P. 2012. “MissForest—Non-parametric Missing Value Imputation for Mixed-Type Data.” Bioinformatics 28 (1): 112–118.

Van Buuren, S. 2007. “Multiple Imputation of Discrete and Continuous Data by Fully Conditional Specification.” Statistical Methods in Medical Research 16 (3): 219–242.

Van Buuren, S. 2018. Flexible Imputation of Missing Data. Boca Raton: Chapman & Hall.

Van Buuren, S., Brand, J. P., Groothuis-Oudshoorn, C. G., and Rubin, D. B. 2006. “Fully Conditional Specification in Multivariate Imputation.” Journal of Statistical Computation and Simulation 76 (12): 1049–1064.

Van Buuren, S., and Groothuis-Oudshoorn, K. 2011. “MICE: Multivariate Imputation by Chained Equations in R.” Journal of Statistical Software 45 (3): 1–67.

Zubizarreta, J. R. 2015. “Stable Weights that Balance Covariates for Estimation With Incomplete Outcome Data.” Journal of the American Statistical Association 110 (511): 910–922.

Marbach Dataset

https://doi.org/10.7910/DVN/IIXGBM

Marbach supplementary material

PDF 183 KB
