Introduction
Missing data is a long-standing problem in survey data, arising from nonresponse or partial response to survey questions. Reasons for nonresponse include unwillingness to provide the information asked for, difficulty in recalling events that occurred in the past, and not knowing the correct response. Imputation is the process of estimating or predicting the missing observations.
In this chapter we deal with the regression setup with data vector (y_i, x_i), i = 1, …, N. For some of the observations some elements of x_i or of both (y_i, x_i) are missing. A number of questions are considered. When can we proceed with an analysis of only the complete observations, and when should we attempt to fill the gaps left by the missing observations? What methods of imputation are available? When imputed values for missing observations are obtained, how should estimation and inference then proceed?
If a data set has missing observations, and if these gaps can be filled by a statistically sound procedure, then benefit comes from a larger and possibly more representative sample and, under ideal circumstances, more precise inference. The cost of estimating missing data comes from having to make (possibly wrong) assumptions to support a procedure for generating proxies for the missing observations, and from the approximation error inherent in any such procedure. Further, statistical inference that follows data augmentation after imputed values replace missing data is more complicated because such inference must take into account the approximation errors introduced by imputation.
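To make the trade-off concrete, the following minimal sketch contrasts an analysis of only the complete observations with an analysis after filling the gaps. It is an illustration under stated assumptions, not a method taken from this chapter: the data are made up, there is a single regressor, missing values are coded as `None`, and the gaps in x_i are filled by the mean of the observed x_i, a deliberately naive stand-in for a proper imputation procedure. The function names are hypothetical.

```python
from statistics import mean

def complete_cases(data):
    """Keep only observations (y_i, x_i) with no missing element (None)."""
    return [(y, x) for y, x in data if y is not None and x is not None]

def mean_impute_x(data):
    """Fill missing x_i with the mean of the observed x_i (naive assumption).

    Observations with y_i missing are dropped, but their x_i still
    contribute to the mean used for imputation.
    """
    x_bar = mean(x for _, x in data if x is not None)
    return [(y, x_bar if x is None else x) for y, x in data if y is not None]

def ols_slope(pairs):
    """Slope from a least-squares regression of y on x with an intercept."""
    y_bar = mean(y for y, _ in pairs)
    x_bar = mean(x for _, x in pairs)
    s_xy = sum((x - x_bar) * (y - y_bar) for y, x in pairs)
    s_xx = sum((x - x_bar) ** 2 for _, x in pairs)
    return s_xy / s_xx

# Made-up data: one observation lacks x_i, another lacks y_i.
data = [(1.0, 1.0), (2.0, 2.0), (2.5, None), (4.0, 4.0), (None, 3.0), (5.0, 5.0)]

cc = complete_cases(data)    # 4 usable observations
imp = mean_impute_x(data)    # 5 usable observations, one with a proxy for x_i

slope_cc = ols_slope(cc)
slope_imp = ols_slope(imp)
```

Imputation recovers one extra observation here, but the filled-in value is a proxy, not data. Standard errors computed from `imp` as if every x_i had been observed would ignore the approximation error that the imputation introduces.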