A Bayesian Approach Towards Missing Covariate Data in Multilevel Latent Regression Models

Published online by Cambridge University Press:  01 January 2025

Christian Aßmann
Affiliation:
Leibniz Institute for Educational Trajectories Bamberg Otto-Friedrich-Universität Bamberg
Jean-Christoph Gaasch
Affiliation:
Otto-Friedrich-Universität Bamberg
Doris Stingl*
Affiliation:
Otto-Friedrich-Universität Bamberg
* Correspondence should be made to Doris Stingl, Otto-Friedrich-Universität Bamberg, Bamberg, Germany. Email: doris.stingl@uni-bamberg.de

Abstract

Measuring latent traits and investigating their relations to a potentially large set of explanatory variables is typical in psychology, economics, and the social sciences. Such analyses often rely on survey data from large-scale studies that involve hierarchical structures and missing values in the set of considered covariates. This paper proposes a Bayesian estimation approach based on the device of data augmentation that addresses the handling of missing values in multilevel latent regression models. Population heterogeneity is modeled via multiple groups enriched with random intercepts. Bayesian estimation is implemented in terms of a Markov chain Monte Carlo sampling approach. To handle missing values, the sampling scheme is augmented to incorporate sampling from the full conditional distributions of missing values. We suggest modeling the full conditional distributions of missing values in terms of non-parametric classification and regression trees. This offers the possibility to consider information from latent quantities functioning as sufficient statistics. A simulation study reveals that this Bayesian approach provides valid inference and outperforms complete case analysis and multiple imputation in terms of statistical efficiency and computation time involved. An empirical illustration using data on mathematical competencies demonstrates the usefulness of the suggested approach.
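
To make the idea of tree-based draws for missing covariate values concrete, the following is a minimal sketch and not the authors' implementation. It assumes a single covariate `x` with missing entries (coded as `None`), a single fully available predictor `z` (standing in for a current draw of a latent quantity such as the person parameter), and a tree with only one split instead of a full classification and regression tree. Each missing value is replaced by a randomly chosen observed value from the same leaf, which is one common way to sample from a tree-based conditional distribution. All names (`sse`, `best_split`, `draw_missing`) are hypothetical.

```python
import random


def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(z, x, min_leaf=5):
    """Threshold on z that minimizes the within-leaf squared error of x.

    Returns None if no split leaves at least min_leaf cases in each leaf.
    """
    pairs = sorted(zip(z, x))
    best_sse, best_cut = float("inf"), None
    for k in range(min_leaf, len(pairs) - min_leaf + 1):
        if pairs[k - 1][0] == pairs[k][0]:
            continue  # cannot split between tied predictor values
        total = sse([v for _, v in pairs[:k]]) + sse([v for _, v in pairs[k:]])
        if total < best_sse:
            best_sse = total
            best_cut = (pairs[k - 1][0] + pairs[k][0]) / 2
    return best_cut


def draw_missing(z, x, rng, min_leaf=5):
    """Return a copy of x with each None replaced by a donor draw from its leaf."""
    # Fit the one-split tree on the observed cases only.
    obs = [(zi, xi) for zi, xi in zip(z, x) if xi is not None]
    cut = best_split([p[0] for p in obs], [p[1] for p in obs], min_leaf)
    completed = list(x)
    for i, (zi, xi) in enumerate(zip(z, x)):
        if xi is None:
            if cut is None:
                donors = [v for _, v in obs]  # no split found: one leaf
            else:
                donors = [v for zo, v in obs if (zo <= cut) == (zi <= cut)]
            completed[i] = rng.choice(donors)
    return completed


# Illustrative data (an assumption, not from the article): a binary
# covariate driven by z, with roughly 20% of its values deleted at random.
rng = random.Random(1)
theta = [rng.gauss(0.0, 1.0) for _ in range(200)]
x = [1.0 if t > 0 else 0.0 for t in theta]
x_obs = [None if rng.random() < 0.2 else v for v in x]
x_completed = draw_missing(theta, x_obs, rng)
```

Within a data augmentation scheme, a step like `draw_missing` would be repeated in every MCMC iteration, with the predictor refreshed by the current draws of the latent quantities, so that the uncertainty about the missing values carries over into the posterior of the structural parameters.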

Information

Type
Theory and Methods
Creative Commons
This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Copyright
Copyright © 2022 The Author(s)

Table 1. Prior specifications and MCMC starting values.

Table 2. Simulated missing data mechanisms.

Table 3. Simulation study (scenario 1, missing rates: $X_1 = 19\%$, $X_2 = 26\%$, overall = 33%). True parameter values, mean posterior medians and standard deviations, RMSEs and coverage ratios of structural parameters (regression coefficients, variance parameters) over 1000 replications obtained from BD, CC, IBM and DART.

Table 4. Simulation study (scenario 2, missing rates: $X_1 = 40\%$, $X_2 = 50\%$, overall = 59%). True parameter values, mean posterior medians and standard deviations, RMSEs and coverage ratios of structural parameters (regression coefficients, variance parameters) over 1000 replications obtained from BD, CC, IBM and DART.

Table 5. Simulation study (scenario 3, missing rates: $X_1 = 20\%$, $X_2 = 36\%$, overall = 46%). True parameter values, mean posterior medians and standard deviations, RMSEs and coverage ratios of structural parameters (regression coefficients, variance parameters) over 1000 replications obtained from BD, CC, IBM and DART.

Table 6. Simulation study (scenario 4, missing rates: $X_1 = 17\%$, $X_2 = 28\%$, overall = 40%). True parameter values, mean posterior medians and standard deviations, RMSEs and coverage ratios of structural parameters (regression coefficients, variance parameters) over 1000 replications obtained from BD, CC, IBM and DART.

Table 7. Comparison of prediction accuracy of conditioning variables X.

Table 8. NEPS grade 9—descriptive statistics (complete case summary).

Table 9. NEPS grade 9, mathematical competencies—parameter estimates of model I.

Table 10. NEPS grade 9, mathematical competencies—relative effects for structural parameter estimates of model I.

Table 11. NEPS grade 9, mathematical competencies—relative effects for structural parameter estimates of model II.

Table 12. NEPS grade 9, mathematical competencies—structural parameter estimates of model II.

Figure 1. NEPS grade 9, Gaussian kernel density estimates for the set of conditional variances on person level $\sigma_g^2$ and school level $\upsilon_g^2$ and expected a posteriori estimates of scalar person parameter $\theta_i$ referring to mathematical competence in model I.

Figure 2. NEPS grade 9, Gaussian kernel density estimates for the set of conditional variances on person level $\sigma_g^2$ and school level $\upsilon_g^2$ and expected a posteriori estimates of scalar person parameter $\theta_i$ referring to mathematical competence in model II.

Supplementary material: File

Aßmann et al. supplementary material

Tables 1-9
File 3.1 MB