
Semi-parametric Selection Models for Potentially Non-ignorable Attrition in Panel Studies with Refreshment Samples

Published online by Cambridge University Press: 04 January 2017

Yajuan Si*
Department of Statistics, MC 4690, Columbia University, NY, NY 10027, USA
Jerome P. Reiter
Department of Statistical Science, Box 90251, Duke University, Durham, NC 27708, USA
D. Sunshine Hillygus
Department of Political Science, Box 90204, Duke University, Durham, NC 27708, USA


Panel studies typically suffer from attrition. Ignoring the attrition can result in biased inferences if the missing data are systematically related to outcomes of interest. Unfortunately, panel data alone cannot inform the extent of bias due to attrition. Many panel studies also include refreshment samples, which are data collected from a random sample of new individuals during the later waves of the panel. Refreshment samples offer information that can be utilized to correct for biases induced by non-ignorable attrition while reducing reliance on strong assumptions about the attrition process. We present a Bayesian approach to handle attrition in two-wave panels with one refreshment sample and many categorical survey variables. The approach includes (1) an additive non-ignorable selection model for the attrition process; and (2) a Dirichlet process mixture of multinomial distributions for the categorical survey variables. We present Markov chain Monte Carlo algorithms for sampling from the posterior distribution of model parameters and missing data. We apply the model to correct attrition bias in an analysis of data from the 2007–08 Associated Press/Yahoo News election panel study.
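The role of the refreshment sample can be illustrated with a small simulation. This is a minimal sketch, not the authors' model: it assumes one binary outcome per wave, and the function name `simulate_panel`, the sample sizes, and all probabilities are hypothetical values chosen for illustration. Dropout depends on the wave-2 outcome itself, so the attrition is non-ignorable.

```python
import random


def simulate_panel(n_panel=20000, n_refresh=20000, seed=0):
    """Simulate a two-wave panel with non-ignorable attrition and a refreshment sample.

    All probabilities below are illustrative assumptions, not estimates from the paper.
    """
    rng = random.Random(seed)

    def draw_person():
        # Wave-1 outcome is a fair coin; wave-2 outcome is positively related to it.
        y1 = 1 if rng.random() < 0.5 else 0
        y2 = 1 if rng.random() < (0.7 if y1 == 1 else 0.3) else 0
        return y1, y2

    # Original panel: every member is observed at wave 1, but only some stay for wave 2.
    completers_y2 = []
    for _ in range(n_panel):
        _, y2 = draw_person()
        # Non-ignorable attrition: the chance of staying depends on the
        # wave-2 value, which is unobserved for those who drop out.
        p_stay = 0.9 if y2 == 1 else 0.5
        if rng.random() < p_stay:
            completers_y2.append(y2)

    # Refreshment sample: new random draws from the same population, observed at wave 2 only.
    refresh_y2 = [draw_person()[1] for _ in range(n_refresh)]

    return {
        "n_completers": len(completers_y2),
        "completers_mean_y2": sum(completers_y2) / len(completers_y2),
        "refreshment_mean_y2": sum(refresh_y2) / len(refresh_y2),
    }


if __name__ == "__main__":
    result = simulate_panel()
    print(result)
```

Under these assumed probabilities the population share with a wave-2 outcome of 1 is 0.5. The mean among panel completers overstates that share, while the refreshment-sample mean stays close to it. That gap is the extra information a selection model can use to learn about the attrition process.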

Research Article
Copyright © The Author 2014. Published by Oxford University Press on behalf of the Society for Political Methodology 



Authors' note: Replication materials are available in Si et al. (2014).


Albert, J. H., and Chib, S. 1993. Bayesian analysis of binary and polychotomous response data. Journal of the American Statistical Association 88(422): 669–79.
Allison, P. 2000. Multiple imputation for missing data: A cautionary tale. Sociological Methods and Research 28:301–9.
Bartels, L. M. 1999. Panel effects in the American National Election Studies. Political Analysis 8(1): 1–20.
Behr, A., Bellgardt, E., and Rendtel, U. 2005. Extent and determinants of panel attrition in the European community household panel. European Sociological Review 21(5): 489–512.
Bhattacharya, D. 2008a. Inference in panel data models under attrition caused by unobservables. Journal of Econometrics 144(2): 430–46.
Bhattacharya, D. 2008b. Inference in panel data models under attrition caused by unobservables. Journal of Econometrics 144:430–46.
Brehm, J. 1993. The phantom respondents: Opinion surveys and political representation. Ann Arbor: University of Michigan Press.
Brown, C. H. 1990. Protecting against nonrandomly missing data in longitudinal studies. Biometrics 46(1): 143–55.
Burgette, L. F., and Reiter, J. P. 2010. Multiple imputation via sequential regression trees. American Journal of Epidemiology 172:1070–6.
Callegaro, M., and DiSogra, C. 2008. Computing response metrics for online panels. Public Opinion Quarterly 72(5): 1008–32.
Clinton, J. 2001. Panel bias from attrition and conditioning: A case study of the Knowledge Networks panel. In AAPOR 55th Annual Conference.
Cranmer, S. J., and Gill, J. 2013. We have to be discrete about this: A non-parametric imputation technique for missing categorical data. British Journal of Political Science 43(02): 425–49.
Deng, Y., Hillygus, D. S., Reiter, J. P., Si, Y., and Zheng, S. 2013. Handling attrition in longitudinal studies: The case for refreshment samples. Statistical Science 22:238–56.
Diggle, P., and Kenward, M. G. 1994. Informative dropout in longitudinal data analysis. Journal of the Royal Statistical Society Series C (Applied Statistics) 43(1): 49–93.
Dunson, D. B., and Xing, C. 2009. Nonparametric Bayes modeling of multivariate categorical data. Journal of the American Statistical Association 104:1042–51.
Erosheva, E. A., Fienberg, S. E., and Junker, B. W. 2002. Alternative statistical models and representations for large sparse multi-dimensional contingency tables. Annales de la Faculté des Sciences de Toulouse 11(4): 485–505.
Frankel, L., and Hillygus, S. 2013. Looking beyond demographics: Panel attrition in the ANES and GSS. Political Analysis 22(1): 1–18.
Frick, J. R., Goebel, J., Schechtman, E., Wagner, G. G., and Yitzhaki, S. 2006. Using analysis of Gini (ANOGI) for detecting whether two subsamples represent the same universe: The German Socio-Economic Panel Study (SOEP) experience. Sociological Methods and Research 34:427–68.
Gelman, A., Van Mechelen, I., Verbeke, G., Heitjan, D. F., and Meulders, M. 2005. Multiple imputation for model checking: Completed-data plots with missing and latent data. Biometrics 61:74–85.
Grimmer, J. 2010. A Bayesian hierarchical topic model for political texts: Measuring expressed agendas in Senate press releases. Political Analysis 18(1): 1–35.
Hausman, J. A., and Wise, D. A. 1979. Attrition bias in experimental and panel data: The Gary income maintenance experiment. Econometrica 47(2): 455–73.
He, Y., Zaslavsky, A. M., and Landrum, M. B. 2010. Multiple imputation in a large-scale complex survey: A guide. Statistical Methods in Medical Research 19:653–70.
Heeringa, S. 1997. Russia longitudinal monitoring survey sample attrition, replenishment, and weighting: Rounds V-VII. University of Michigan Institute for Social Research.
Henderson, M., and Hillygus, D. S. 2011. The dynamics of health care opinion, 2008–2010: Partisanship, self-interest, and racial resentment. Journal of Health Politics, Policy, and Law 36(6): 945–60.
Henderson, M., Hillygus, D., and Tompson, T. 2010. “Sour grapes” or rational voting? Voter decision making among thwarted primary voters in 2008. Public Opinion Quarterly 74(3): 499–529.
Hirano, K., Imbens, G. W., Ridder, G., and Rubin, D. B. 1998. Combining panel data sets with attrition and refreshment samples. Technical report 230, National Bureau of Economic Research.
Hirano, K., Imbens, G. W., Ridder, G., and Rubin, D. B. 2001. Combining panel data sets with attrition and refreshment samples. Econometrica 69:1645–59.
Hogan, J. W., and Daniels, M. J. 2008. Missing data in longitudinal studies. Boca Raton, FL: Chapman and Hall.
Holmes, C. C., and Held, L. 2006. Bayesian auxiliary variable models for binary and multinomial regression. Bayesian Analysis 1(1): 145–68.
Honaker, J., and King, G. 2010. What to do about missing values in time-series cross-section data. American Journal of Political Science 54(2): 561–81.
Ishwaran, H., and James, L. F. 2001. Gibbs sampling for stick-breaking priors. Journal of the American Statistical Association 96:161–73.
Iyengar, S., Sood, G., and Lelkes, Y. 2012. Affect, not ideology: A social identity perspective on polarization. Public Opinion Quarterly 76(3): 405–31.
Keeter, S., Kennedy, C., Dimock, M., Best, J., and Craighill, P. 2006. Gauging the impact of growing nonresponse on estimates from a national RDD telephone survey. Public Opinion Quarterly 70(5): 759–79.
Kenward, M. G. 1998. Selection models for repeated measurements with non-random dropout: An illustration of sensitivity. Statistics in Medicine 17:2723–32.
Kenward, M. G., Molenberghs, G., and Thijs, H. 2003. Pattern-mixture models with proper time dependence. Biometrika 90:52–71.
King, G., Honaker, J., Joseph, A., and Scheve, K. 2001. Analyzing incomplete political science data: An alternative algorithm for multiple imputation. American Political Science Review 95:49–69.
Kish, L., and Hess, I. 1959. A “replacement” procedure for reducing the bias of nonresponse. American Statistician 13:17–19.
Kropko, J., Goodrich, B., Gelman, A., and Hill, J. 2014. Multiple imputation for continuous and categorical data: Comparing joint multivariate normal and conditional approaches. Political Analysis. Published online doi:10.1093/pan/mpu007.
Kruse, Y., Callegaro, M., Dennis, J., Subias, S., Lawrence, M., DiSogra, C., and Tompson, T. 2009. Panel conditioning and attrition in the AP-Yahoo! News Election Panel Study. In 64th Conference of the American Association for Public Opinion Research (AAPOR). Hollywood, FL.
Kyung, M., Gill, J., and Casella, G. 2011. New findings from terrorism data: Dirichlet process random effects models for latent groups. Journal of the Royal Statistical Society Series C (Applied Statistics) 60:701–21.
Lin, I., and Schaeffer, N. C. 1995. Using survey participants to estimate the impact of nonparticipation. Public Opinion Quarterly 59:236–58.
Little, R. J. A. 1993. Pattern-mixture models for multivariate incomplete data. Journal of the American Statistical Association 88:125–34.
Little, R. J. A., and Rubin, D. B. 2002. Statistical Analysis with Missing Data. 2nd ed. New York: John Wiley & Sons.
Little, R. J. A., and Wang, Y. 1996. Pattern-mixture models for multivariate incomplete data with covariates. Biometrics 52(1): 98–111.
Meng, X. 1994. Posterior predictive p-values. Annals of Statistics 22:1142–60.
Olsen, R. J. 2005. The problem of respondent attrition: Survey methodology is key. Monthly Labor Review 128:63–71.
Olson, K., and Witt, L. 2011. Are we keeping the people who used to stay? Changes in correlates of panel survey attrition over time. Social Science Research 40(4): 1037–50.
Papaspiliopoulos, O. 2008. A note on posterior sampling from Dirichlet mixture models. Technical report, Centre for Research in Statistical Methodology, University of Warwick.
Pasek, J., Tahk, A., Lelkes, Y., Krosnick, J. A., Payne, B. K., Akhtar, O., and Tompson, T. 2009. Determinants of turnout and candidate choice in the 2008 US presidential election illuminating the impact of racial prejudice and other considerations. Public Opinion Quarterly 73(5): 943–94.
Prior, M. 2010. You've either got it or you don't? The stability of political interest over the life cycle. Journal of Politics 72:747–66.
Reiter, J. P., Raghunathan, T. E., and Kinney, S. 2006. The importance of modeling the sampling design in multiple imputation for missing data. Survey Methodology 32(2): 143–50.
Ridder, G. 1992. An empirical evaluation of some models for non-random attrition in panel data. Structural Change and Economic Dynamics 3:337–55.
Rubin, D. B. 1987. Multiple imputation for nonresponse in surveys. New York: John Wiley & Sons.
Scharfstein, D. O., Rotnitzky, A., and Robins, J. M. 1999. Adjusting for nonignorable dropout using semiparametric nonresponse models. Journal of the American Statistical Association 94(448): 1096–120.
Schluchter, M. D. 1992. Methods for the analysis of informatively censored longitudinal data. Statistics in Medicine 11(14): 1861–70.
Sethuraman, J. 1994. A constructive definition of Dirichlet priors. Statistica Sinica 4:639–50.
Si, Y., and Reiter, J. P. 2013. Nonparametric Bayesian multiple imputation for incomplete categorical variables in large-scale assessment surveys. Journal of Educational and Behavioral Statistics 38(5): 499–521.
Si, Y., Reiter, J. P., and Hillygus, D. S. 2014. Replication data for: Semi-parametric selection models for potentially nonignorable attrition in panel studies with refreshment samples. (accessed April 19, 2014). IQSS Dataverse Network, V1.
Su, Y.-S., Yajima, M., Gelman, A. E., and Hill, J. 2011. Multiple imputation with diagnostics (mi) in R: Opening windows into the black box. Journal of Statistical Software 45(2): 1–31.
Thompson, M., Fong, G., Hammond, D., Boudreau, C., Driezen, P., Hyland, A., Borland, R., Cummings, K., Hastings, G., and Siahpush, M. 2006. Methods of the International Tobacco Control (ITC) four-country survey. Tobacco Control 15(Suppl. 3): iii12–iii18.
Traugott, M. W., and Tucker, C. 1984. Strategies for predicting whether a citizen will vote and estimation of electoral outcomes. Public Opinion Quarterly 48(1): 330–43.
Vehovar, V. 1999. Field substitution and unit nonresponse. Journal of Official Statistics 15:335–50.
Vermunt, J. K., Van Ginkel, J. R., Van der Ark, L. A., and Sijtsma, K. 2008. Multiple imputation of incomplete categorical data using latent class analysis. Sociological Methodology 38(1): 369–97.
Walker, S. G. 2007. Sampling the Dirichlet mixture model with slices. Communications in Statistics - Simulation and Computation 36:45–54.
Wawro, G. 2002. Estimating dynamic panel data models in political science. Political Analysis 10:25–48.
Wissen, L., and Meurs, H. 1989. The Dutch mobility panel: Experiences and evaluation. Transportation 16:99–119.
Zabel, J. 1998. An analysis of attrition in the Panel Study of Income Dynamics and the Survey of Income and Program Participation with an application to a model of labor market behavior. Journal of Human Resources 33:479–506.