Hierarchical imputation of categorical variables in the presence of systematically and sporadically missing data

Published online by Cambridge University Press: 10 June 2025

Shahab Jolani*
Affiliation:
Department of Methodology and Statistics, Care and Public Health Research Institute (CAPHRI), Maastricht University, Maastricht, The Netherlands

Abstract

Modern quantitative evidence synthesis methods often combine patient-level data from different sources, known as individual participants data (IPD) sets. A specific challenge in meta-analysis of IPD sets is the presence of systematically missing data, when certain variables are not measured in some studies, and sporadically missing data, when measurements of certain variables are incomplete across different studies. Multiple imputation (MI) is among the better approaches to deal with missing data. However, MI of hierarchical data, such as IPD meta-analysis, requires advanced imputation routines that preserve the hierarchical data structure and accommodate the presence of both systematically and sporadically missing data. We have recently developed a new class of hierarchical imputation methods within the MICE framework tailored for continuous variables. This article discusses the extensions of this methodology to categorical variables, accommodating the simultaneous presence of systematically and sporadically missing data in nested designs with arbitrary missing data patterns. To address the challenge of the categorical nature of the data, we propose an accept–reject algorithm during the imputation process. Following theoretical discussions, we evaluate the performance of the new methodology through simulation studies and demonstrate its application using an IPD set from patients with kidney disease.
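The distinction between systematically and sporadically missing data drawn in the abstract can be illustrated with a minimal sketch. This is not the article's imputation method; it only labels each study-by-variable combination according to the definitions above. The function name, the `study` key, and the toy records are hypothetical and chosen for illustration.

```python
from collections import defaultdict


def classify_missingness(rows, study_key="study"):
    """Label each (study, variable) pair as 'systematic', 'sporadic', or 'complete'.

    rows: list of dicts, one per participant; None marks a missing value.
    """
    # (study, variable) -> [number missing, number of participants]
    counts = defaultdict(lambda: [0, 0])
    for row in rows:
        study = row[study_key]
        for var, value in row.items():
            if var == study_key:
                continue
            cell = counts[(study, var)]
            cell[0] += value is None
            cell[1] += 1

    labels = {}
    for key, (n_missing, n_total) in counts.items():
        if n_missing == n_total:
            # Variable not measured at all in this study
            labels[key] = "systematic"
        elif n_missing > 0:
            # Variable measured, but incomplete within this study
            labels[key] = "sporadic"
        else:
            labels[key] = "complete"
    return labels


# Toy IPD set with hypothetical values, two studies and two variables
toy_ipd = [
    {"study": "A", "age": 61, "smoker": "yes"},
    {"study": "A", "age": None, "smoker": "no"},
    {"study": "B", "age": 55, "smoker": None},
    {"study": "B", "age": 70, "smoker": None},
]
labels = classify_missingness(toy_ipd)
```

In this toy set, `smoker` is never recorded in study B (systematically missing there), while `age` is recorded in study A but incomplete (sporadically missing).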

Information

Type
Research Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (https://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Open Practices
Open materials
Copyright
© The Author(s), 2025. Published by Cambridge University Press on behalf of The Society for Research Synthesis Methodology

Table 1 Estimates of the fixed- and random-effects parameters in the simulation study for the binary outcome with n = 10 studies and weak between-study heterogeneity

Table 2 Estimates of the fixed- and random-effects parameters in the simulation study for the binary outcome with n = 10 studies and moderate between-study heterogeneity

Table 3 Estimates of the fixed- and random-effects parameters in the simulation study for the binary outcome with n = 10 studies and strong between-study heterogeneity

Figure 1 Bias of the fixed-effects estimates with 10% systematically missing data. Methods include the reference analysis (REF, before introducing missing data), complete case analysis (CCA), stratified multiple imputation (STI), multilevel multiple imputation (MLMI), and two-stage multilevel multiple imputation (2STG).

Figure 2 Coverage rate of the 95% confidence interval for the fixed-effects parameters with 10% systematically missing data. Methods include the reference analysis (REF, before introducing missing data), complete case analysis (CCA), stratified multiple imputation (STI), multilevel multiple imputation (MLMI), and two-stage multilevel multiple imputation (2STG).

Figure 3 Root mean squared error (RMSE) of the fixed-effects estimates with 10% systematically missing data. Methods include the reference analysis (REF, before introducing missing data), complete case analysis (CCA), stratified multiple imputation (STI), multilevel multiple imputation (MLMI), and two-stage multilevel multiple imputation (2STG).

Figure 4 Bias of the random-effects estimates with 10% systematically missing data. Methods include the reference analysis (REF, before introducing missing data), complete case analysis (CCA), stratified multiple imputation (STI), multilevel multiple imputation (MLMI), and two-stage multilevel multiple imputation (2STG).

Figure 5 Root mean squared error (RMSE) of the random-effects estimates with 10% systematically missing data. Methods include the reference analysis (REF, before introducing missing data), complete case analysis (CCA), stratified multiple imputation (STI), multilevel multiple imputation (MLMI), and two-stage multilevel multiple imputation (2STG).

Table 4 Percentage of missing data by variable and study in the empirical example

Table 5 Estimates of the fixed- and random-effects parameters in the empirical example

Figure A1 Bias of the fixed-effects estimates with 30% systematically missing data. Methods include the reference analysis (REF, before introducing missing data), complete case analysis (CCA), stratified multiple imputation (STI), multilevel multiple imputation (MLMI), and two-stage multilevel multiple imputation (2STG).

Figure A2 Coverage rate of the 95% confidence interval for the fixed-effects parameters with 30% systematically missing data. Methods include the reference analysis (REF, before introducing missing data), complete case analysis (CCA), stratified multiple imputation (STI), multilevel multiple imputation (MLMI), and two-stage multilevel multiple imputation (2STG).

Figure A3 Root mean squared error (RMSE) of the fixed-effects estimates with 30% systematically missing data. Methods include the reference analysis (REF, before introducing missing data), complete case analysis (CCA), stratified multiple imputation (STI), multilevel multiple imputation (MLMI), and two-stage multilevel multiple imputation (2STG).

Figure A4 Bias of the random-effects estimates with 30% systematically missing data. Methods include the reference analysis (REF, before introducing missing data), complete case analysis (CCA), stratified multiple imputation (STI), multilevel multiple imputation (MLMI), and two-stage multilevel multiple imputation (2STG).

Figure A5 Root mean squared error (RMSE) of the random-effects estimates with 30% systematically missing data. Methods include the reference analysis (REF, before introducing missing data), complete case analysis (CCA), stratified multiple imputation (STI), multilevel multiple imputation (MLMI), and two-stage multilevel multiple imputation (2STG).

Supplementary material: File

Jolani supplementary material (File, 326.5 KB)