Multiple Imputation for Continuous and Categorical Data: Comparing Joint Multivariate Normal and Conditional Approaches

Published online by Cambridge University Press:  04 January 2017

Jonathan Kropko*
Affiliation:
Woodrow Wilson Department of Politics, University of Virginia, 1540 Jefferson Park Avenue, Charlottesville, VA 22903
Ben Goodrich
Affiliation:
Department of Political Science, Columbia University, 420 W. 118th St., Mail Code 3320, New York, NY 10027. e-mail: bg2382@columbia.edu
Andrew Gelman
Affiliation:
Departments of Statistics and Political Science, Columbia University, 1255 Amsterdam Avenue, Room 1016, New York, NY 10027. e-mail: gelman@stat.columbia.edu
Jennifer Hill
Affiliation:
Department of Humanities and Social Sciences, New York University Steinhardt, 246 Greene Street, Room 804, New York, NY 10003. e-mail: jennifer.hill@nyu.edu
* e-mail: jkropko@virginia.edu (corresponding author)

Abstract

We consider the relative performance of two common approaches to multiple imputation (MI): joint multivariate normal (MVN) MI, in which the data are modeled as a sample from a joint MVN distribution; and conditional MI, in which each variable is modeled conditionally on all the others. In order to use the multivariate normal distribution, implementations of joint MVN MI typically assume that categories of discrete variables are probabilistically constructed from continuous values. We use simulations to examine the implications of these assumptions. For each approach, we assess (1) the accuracy of the imputed values; and (2) the accuracy of coefficients and fitted values from a model fit to completed data sets. These simulations consider continuous, binary, ordinal, and unordered-categorical variables. One set of simulations uses multivariate normal data, and one set uses data from the 2008 American National Election Studies. We implement a less restrictive approach than is typical when evaluating methods using simulations in the missing data literature: in each case, missing values are generated by carefully following the conditions necessary for missingness to be “missing at random” (MAR). We find that in these situations conditional MI is more accurate than joint MVN MI whenever the data include categorical variables.
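As a rough illustration of the MAR condition mentioned in the abstract, the sketch below deletes values of one variable with a probability that depends only on another, fully observed variable. It is not the authors' simulation design: the toy data, the logistic missingness rule, and the function name `make_mar_missing` are all assumptions made for this example.

```python
import math
import random


def make_mar_missing(rows, seed=0):
    """Blank out y with a probability that depends only on the observed x.

    Hypothetical helper for illustration; not taken from the article.
    """
    rng = random.Random(seed)
    out = []
    for x, y in rows:
        # Missingness probability is logistic in x and never uses y,
        # so y is missing at random (MAR) given x.
        p_missing = 1.0 / (1.0 + math.exp(-x))
        out.append((x, None if rng.random() < p_missing else y))
    return out


# Assumed toy data: y is correlated with the fully observed x.
data_rng = random.Random(1)
complete = []
for _ in range(2000):
    x = data_rng.gauss(0.0, 1.0)
    complete.append((x, 0.5 * x + data_rng.gauss(0.0, 1.0)))

incomplete = make_mar_missing(complete, seed=2)
n_missing = sum(1 for _, y in incomplete if y is None)
```

Because the deletion rule uses only `x`, an imputation model that conditions on `x` can in principle recover the distribution of the missing `y` values; a rule that used `y` itself would violate MAR.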

Information

Type
Research Article
Copyright
Copyright © The Author 2014. Published by Oxford University Press on behalf of the Society for Political Methodology 
