
Mixed-Effects XGBoost with Group-Aware Permutation Importance and Cross-Validation for Multilevel Cross-Classified Continuous Outcomes

Published online by Cambridge University Press: 10 April 2026

Sun-Joo Cho* (Vanderbilt University, USA)
Sophia Mueller (The University of North Carolina at Chapel Hill, USA)
*Corresponding author: Sun-Joo Cho; Email: sj.cho@vanderbilt.edu

Abstract

This article proposes a mixed-effects machine-learning framework for modeling complex, nonlinear relations between predictors and continuous outcomes in multilevel cross-classified data. The proposed method, termed LMM–XGBoost, embeds extreme gradient boosting (XGBoost) within a linear mixed model (LMM) to combine flexible modeling of nonlinear and interaction effects with random effects that model dependence. In addition, an iterative estimation procedure for LMM–XGBoost is developed, a group-aware permutation importance measure that respects multilevel dependence is proposed, and a combined-group cross-validation (CV) strategy for hyperparameter tuning, out-of-fold (OOF) prediction, and importance estimation is developed for cross-classified designs. The simulation study shows that the proposed estimation method for LMM–XGBoost yields good parameter recovery under non-zero random-effect variances. In addition, relative to standard LMM and XGBoost, LMM–XGBoost achieves lower OOF prediction error and more accurate recovery of variable importance. The study further shows that combined-group CV and group-aware permutation importance yield less biased error estimates and substantially higher agreement with the true importance rankings than conventional permutation measures. An empirical application using the Add Health study illustrates how the proposed methods can identify important factors across multiple domains associated with adolescent depressive symptoms.
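The abstract does not spell out how the group-aware permutation is carried out. One plausible way to respect multilevel dependence, assumed here purely for illustration, is to shuffle a predictor only within each group so that between-group structure is preserved. The sketch below uses NumPy only; the function name `within_group_permutation_importance`, the toy data, and the stand-in `predict` function are hypothetical and are not taken from the article.

```python
import numpy as np

def within_group_permutation_importance(predict, X, y, groups, col,
                                        n_repeats=20, seed=0):
    """Mean increase in MSE when column `col` is shuffled within each group.

    Assumption: "group-aware" is read as within-group shuffling; the
    article's actual measure may be defined differently.
    """
    rng = np.random.default_rng(seed)
    base_mse = np.mean((y - predict(X)) ** 2)
    increases = []
    for _ in range(n_repeats):
        Xp = X.copy()
        for g in np.unique(groups):
            idx = np.flatnonzero(groups == g)
            # Shuffle the column among rows that share the same group label.
            Xp[idx, col] = X[rng.permutation(idx), col]
        increases.append(np.mean((y - predict(Xp)) ** 2) - base_mse)
    return float(np.mean(increases))

# Toy data (hypothetical): 10 groups of 20 rows, a random intercept per group,
# and an outcome that depends on the first predictor only.
rng = np.random.default_rng(1)
groups = np.repeat(np.arange(10), 20)
X = rng.normal(size=(200, 2))
y = 3.0 * X[:, 0] + rng.normal(size=10)[groups] + rng.normal(scale=0.1, size=200)

def predict(X_):
    # Stand-in for a fitted model's fixed-part prediction.
    return 3.0 * X_[:, 0]

imp0 = within_group_permutation_importance(predict, X, y, groups, col=0)
imp1 = within_group_permutation_importance(predict, X, y, groups, col=1)
```

In this toy setup the first predictor should receive a clearly positive importance, while the second, which the stand-in model ignores, should receive none. In a cross-classified design the `groups` vector would need to encode the combination of both grouping factors; how the article handles that is not stated in the abstract.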

Information

Type
Theory and Methods
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (https://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2026. Published by Cambridge University Press on behalf of Psychometric Society

Table 1 Overview of literature reviews on LMM-ML for continuous outcomes

Table 2 Simulation results: Panel A (prediction accuracy), Panel B (parameter recovery), and Panel C (accuracy of permutation importance)

Table 3 Empirical study: Prediction accuracy of LMM, XGBoost, and LMM–XGBoost (MSE)

Table 4 Empirical study: Random-effect estimates of the LMMs and LMM-RF

Table 5 Empirical study: Permutation importance comparisons between LMM–XGBoost and XGBoost (averaged ranks for ties)

Figure 1 Empirical study: ALE plots and scatter plots of selected predictors, namely standardized self-esteem scores (top), standardized social support scores (middle), and the interaction between the two (bottom). Note: Each tick mark on the x-axis or the y-axis represents a value of a continuous predictor; the smooth function is plotted in blue, and the fitted linear function is shown as a red dotted line.

Figure B1 True tree structure in the simulation study.

Figure C1 Paper selection for literature review on using Add Health data. The literature review results are posted on the Open Science Framework, https://osf.io/jkctg/.

Table D1 Descriptive statistics of predictors in the empirical study.

Table E1 Fixed-effect estimates of the LMMs in the empirical study.

Figure F1 Residual analyses of LMM.