When Can Multiple Imputation Improve Regression Estimates?

  • Vincent Arel-Bundock and Krzysztof J. Pelc
Abstract

Multiple imputation (MI) is often presented as an improvement over listwise deletion (LWD) for regression estimation in the presence of missing data. Against a common view, we demonstrate anew that the complete case estimator can be unbiased, even if data are not missing completely at random. As long as the analyst can control for the determinants of missingness, MI offers no benefit over LWD for bias reduction in regression analysis. We highlight the conditions under which MI is most likely to improve the accuracy and precision of regression results, and develop concrete guidelines that researchers can adopt to increase transparency and promote confidence in their results. While MI remains a useful approach in certain contexts, it is no panacea, and access to imputation software does not absolve researchers of their responsibility to know the data.
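
The claim that listwise deletion can remain unbiased when the determinants of missingness are controlled for can be illustrated with a small simulation. The sketch below is not taken from the authors' replication files; the data-generating process, the coefficient values, the two missingness rules, and the helper names `ols` and `simulate` are all assumptions made for illustration. It fits a regression on complete cases only, once when incompleteness depends on a regressor `z` that is included in the model, and once when it depends on the outcome `y` itself.

```python
import math
import random


def ols(rows):
    """Least-squares fit of y on (1, x, z); rows are (y, x, z) tuples."""
    k = 3
    xtx = [[0.0] * k for _ in range(k)]
    xty = [0.0] * k
    for y, x, z in rows:
        v = (1.0, x, z)
        for i in range(k):
            xty[i] += v[i] * y
            for j in range(k):
                xtx[i][j] += v[i] * v[j]
    # Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination.
    a = [xtx[i] + [xty[i]] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [a[r][c] - f * a[col][c] for c in range(k + 1)]
    return [a[i][k] / a[i][i] for i in range(k)]


def simulate(n=20000, seed=1):
    """Return OLS estimates for full data and two complete-case samples.

    Assumed (hypothetical) model: y = 1 + 2*x + 1*z + e, with x, z, e ~ N(0, 1).
    """
    rng = random.Random(seed)
    full, dep_on_z, dep_on_y = [], [], []
    for _ in range(n):
        x = rng.gauss(0, 1)
        z = rng.gauss(0, 1)
        y = 1.0 + 2.0 * x + 1.0 * z + rng.gauss(0, 1)
        row = (y, x, z)
        full.append(row)
        # Rule A: the chance that a row is incomplete rises with z,
        # and z is a regressor in the fitted model.
        p_incomplete = 1.0 / (1.0 + math.exp(-2.0 * z))
        if rng.random() > p_incomplete:
            dep_on_z.append(row)
        # Rule B: a row is incomplete whenever the outcome exceeds its mean.
        if y < 1.0:
            dep_on_y.append(row)
    return ols(full), ols(dep_on_z), ols(dep_on_y)


if __name__ == "__main__":
    for label, est in zip(("full data", "LWD, rule A", "LWD, rule B"), simulate()):
        print(label, [round(b, 3) for b in est])
```

Under these assumptions, the complete-case slopes under rule A should land close to the true values even though roughly half the rows are dropped and the data are not missing completely at random. Under rule B, where incompleteness depends on the outcome, the complete-case slope on `x` should be visibly attenuated toward zero. The sketch deliberately omits an imputation step, so it says nothing about how MI would perform in either scenario.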

Corresponding author
* Email: kj.pelc@mcgill.ca
Footnotes

Authors’ note: We thank Neal Beck, Timm Betz, Christina Davis, Tom Pepinsky, Amy Pond, and Erik Voeten for valuable comments. Replication files and supplementary materials are hosted on the Harvard Dataverse (https://dataverse.harvard.edu/dataverse/pan, doi:10.7910/DVN/S9G9XS) and on the authors’ websites (http://arelbundock.com, https://sites.google.com/site/krzysztofpelc/).

Contributing Editor: R. Michael Alvarez

Political Analysis
  • ISSN: 1047-1987
  • EISSN: 1476-4989
  • URL: /core/journals/political-analysis
Supplementary materials

  • Arel-Bundock and Pelc supplementary material 1 (214 KB)
  • Arel-Bundock and Pelc Dataset
