
Modeling Missing at Random Neuropsychological Test Scores Using a Mixture of Binomial Product Experts

Published online by Cambridge University Press:  22 October 2025

Daniel Suen*
Affiliation: Statistics, University of Washington, USA
Yen-Chi Chen
Affiliation: Statistics, University of Washington, USA
*Corresponding author: Daniel Suen; Email: dsuen@uw.edu

Abstract

Multivariate bounded discrete data arise in many fields. In dementia studies, such data are collected when individuals complete neuropsychological tests. We outline a modeling and inference procedure that models the joint distribution of the test scores conditional on baseline covariates, building on previous work on mixtures of experts and latent class models. Furthermore, we illustrate how the approach can be extended, using a nested EM algorithm, to outcome data that are missing at random. The proposed model can incorporate covariate information and perform imputation and clustering. We apply our model to simulated data and an Alzheimer’s disease data set.
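To make the abstract's description concrete, the following is a minimal sketch of how a mixture of binomial product experts could evaluate an observed-data likelihood. It is not the authors' implementation. It assumes, for illustration only, that latent class weights come from a softmax over linear functions of the covariates and that, within a class, each test score is an independent binomial count with a class-specific success probability. All names (`class_weights`, `observed_likelihood`, `class_posterior`, `betas`, `thetas`, `n_max`) are hypothetical. Because each class-conditional distribution is a product over tests, summing out a missing score simply drops its factor, which is what keeps likelihood evaluation simple when some scores are unobserved.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of real scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def binom_pmf(y, n, p):
    """Binomial probability of y successes out of n trials."""
    return math.comb(n, y) * p**y * (1 - p) ** (n - y)


def class_weights(x, betas):
    """Covariate-dependent class weights (assumed softmax gating)."""
    scores = [sum(b * xi for b, xi in zip(beta, x)) for beta in betas]
    return softmax(scores)


def class_components(y, n_max, thetas):
    """Per-class probability of the observed scores.

    Entries of y equal to None are treated as missing and skipped,
    which marginalizes them out of the binomial product.
    """
    comps = []
    for theta in thetas:
        prob = 1.0
        for y_j, n_j, p_j in zip(y, n_max, theta):
            if y_j is not None:
                prob *= binom_pmf(y_j, n_j, p_j)
        comps.append(prob)
    return comps


def observed_likelihood(y, x, n_max, betas, thetas):
    """Mixture likelihood of the observed scores given covariates x."""
    weights = class_weights(x, betas)
    comps = class_components(y, n_max, thetas)
    return sum(w * c for w, c in zip(weights, comps))


def class_posterior(y, x, n_max, betas, thetas):
    """Posterior class membership probabilities, usable for clustering."""
    weights = class_weights(x, betas)
    comps = class_components(y, n_max, thetas)
    joint = [w * c for w, c in zip(weights, comps)]
    total = sum(joint)
    return [j / total for j in joint]
```

In this sketch, a record with a missing second score would be passed as `y = [2, None]`; the posterior from `class_posterior` could then be used to assign the individual to a latent class or to draw imputations for the missing score.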

Information

Type
Theory and Methods
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution-NonCommercial-ShareAlike licence (https://creativecommons.org/licenses/by-nc-sa/4.0), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the same Creative Commons licence is used to distribute the re-used or adapted article and the original article is properly cited. The written permission of Cambridge University Press must be obtained prior to any commercial use.
Copyright
© The Author(s), 2025. Published by Cambridge University Press on behalf of Psychometric Society
Figure 1 This flowchart describes the overall inference procedure.

Table 1 These results are the estimated MSEs and estimated coverage after imputing $M=20$ times from our model and running Algorithms 3 and 4 for $U=1,000$ replicates

Table 2 These results are the estimated MSEs after imputing $M=20$ times using mice for $U=200$ replicates

Table 3 For the simulations in Tables 1 and 2, we report the average computation time in minutes and the standard deviation in parentheses across all the randomly generated data sets

Figure 2 The CDR score distributions for the complete cases, the individuals missing at least one outcome variable, and the entire data set are provided in the left, middle, and right panels, respectively.

Table 4 These tables contain point estimates of the test score means for each latent class

Table 5 This table contains point estimates for the coefficients for each of the covariates and each latent class

Figure 3 Clustering on complete data only. Note: These barplots summarize the composition of each of the five latent groups. We order the groups from the most healthy to the least healthy. This is reflected in the mean CDR score of each group.

Figure 4 Clustering on the entire data with MAR assumption. Note: These barplots summarize the composition of each of the five latent groups. We order the groups from the most healthy to the least healthy. This is reflected in the mean CDR score of each group.

Supplementary material

Suen and Chen supplementary material 1 (File, 12 KB)
Suen and Chen supplementary material 2 (File, 1 MB)