An Iterative GLMM–XGBoost Algorithm with Group-Aware Conditional Permutation Importance for Explaining Multilevel Item Response Data

Published online by Cambridge University Press:  24 April 2026

Sun-Joo Cho*
Affiliation:
Vanderbilt University, USA

Abstract

Identifying which individual-, cluster-, and item-level predictors explain binary responses requires balancing flexibility and statistical rigor. Generalized linear mixed models (GLMMs) explicitly partition fixed and random effects and capture multilevel structure, but they make it challenging to model nonlinear relationships and higher-order interactions. In contrast, tree ensembles, such as extreme gradient boosting (XGBoost), flexibly capture nonlinearities and interactions but typically ignore clustering and random effects. This study introduces an iterative GLMM–XGBoost algorithm that replaces the penalized weighted least squares step in the penalized IRLS (PIRLS) routine with a boosted-tree learner, while retaining Laplace-approximation-based updates of the random effects via their conditional modes. Weighted C-projection and global centering enforce orthogonality between the tree component and grouping factors, avoiding redundancy. The algorithm yields an approximate Newton–Fisher scoring update, preserves the ability to model random effects, and maintains the flexibility of XGBoost. In addition, a group-aware conditional permutation importance measure and its associated uncertainty measure are developed to identify predictor contributions under multilevel data. Results indicate that iterative GLMM–XGBoost improves predictive accuracy and importance ranking relative to standalone XGBoost under clustering and random effects. It also remains competitive with alternative methods across both smooth and tree-based fixed-effects specifications while accounting for random variation.
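The step that the abstract describes replacing can be illustrated with the standard IRLS quantities for a logit-link Bernoulli model: a working response z = η + (y − μ)/(μ(1 − μ)) and a working weight w = μ(1 − μ). The sketch below is a minimal illustration under stated assumptions, not the article's implementation. The function names are hypothetical, the weighted fit is an intercept-only weighted mean standing in for the boosted-tree learner, and the random-effects update, C-projection, and centering steps are omitted entirely.

```python
import math


def irls_working_quantities(eta, y):
    """Working responses z and weights w for a logit-link Bernoulli model.

    Hypothetical helper for illustration only; not taken from the article.
    """
    z, w = [], []
    for eta_i, y_i in zip(eta, y):
        mu_i = 1.0 / (1.0 + math.exp(-eta_i))      # inverse logit
        w_i = mu_i * (1.0 - mu_i)                  # Bernoulli variance function
        z.append(eta_i + (y_i - mu_i) / w_i)       # linearized response
        w.append(w_i)
    return z, w


def weighted_mean(values, weights):
    """Intercept-only weighted least squares fit (stand-in for a weighted learner)."""
    return sum(v * k for v, k in zip(values, weights)) / sum(weights)


# Toy data: four binary responses, current linear predictor at zero.
eta = [0.0, 0.0, 0.0, 0.0]
y = [1, 0, 1, 1]

z, w = irls_working_quantities(eta, y)
eta_new = weighted_mean(z, w)  # one scoring step from eta = 0 gives 1.0
```

In a weighted least squares step, z would be regressed on the predictors with weights w. The abstract indicates that a boosted-tree learner takes over that regression, while the random effects are still updated through their conditional modes.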

Information

Type
Theory and Methods
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (https://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2026. Published by Cambridge University Press on behalf of Psychometric Society
Figure 1 Empirical study: Log-likelihood values over iterations in the iterative GLMM–XGBoost.

Table 1 Empirical study: Permutation importance comparisons between GLMM–XGBoost and XGBoost

Figure 2 Simulation study: True tree structure.

Table 2 Simulation Study 1: Parameter recovery of GLMM–XGBoost (Panel A), prediction accuracy comparisons among GLMM–XGBoost, XGBoost, and GLMM (Panel B), and permutation importance comparisons between GLMM–XGBoost and XGBoost (Panel C)

Table 3 Simulation Study 2: Prediction accuracy comparisons for Structure 1 (Panel A: GLMM–XGBoost vs. GAMM) and Structure 2 (Panel B: GLMM–XGBoost vs. GLMM-Tree)

Table A.1 Data and inputs for the C-projection example

Table A.2 Calculation of the weighted C-projected vector

Table B.1 Data and inputs for the omission risk example

Table B.2 Projection under omission ($\mathbf{x}_g = \emptyset$)