
Bayesian Federated Inference for regression models based on non-shared medical center data

Published online by Cambridge University Press:  10 March 2025

Marianne A. Jonker*
Affiliation:
Research Institute for Medical Innovation, Science Department IQ Health, Section Biostatistics, Radboud University Medical Center, Nijmegen, Netherlands
Hassan Pazira
Affiliation:
Research Institute for Medical Innovation, Science Department IQ Health, Section Biostatistics, Radboud University Medical Center, Nijmegen, Netherlands
Anthony C. C. Coolen
Affiliation:
DCN Donders Institute, Faculty of Science, Radboud University, Nijmegen, Netherlands; Saddle Point Science Europe, Mercator Science Park, Nijmegen, Netherlands
*Corresponding author: Marianne A. Jonker; Email: marianne.jonker@radboudumc.nl

Abstract

To estimate the parameters of a regression model accurately, the sample size must be large enough relative to the number of possible predictors for the model. In practice, sufficient data are often lacking, which can lead to overfitting of the model and, as a consequence, unreliable predictions of the outcome for new patients. Pooling data sets collected in different (medical) centers would alleviate this problem, but is often not feasible due to privacy regulations or logistical problems. An alternative route is to analyze the local data in the centers separately and combine the statistical inference results with the Bayesian Federated Inference (BFI) methodology. The aim of this approach is to compute, from the inference results in the separate centers, what would have been found if the statistical analysis had been performed on the combined data. We explain the methodology under homogeneity and under heterogeneity across the populations in the separate centers, and give real-life examples for better understanding. Excellent performance of the proposed methodology is shown. An R package that performs all the calculations has been developed and is illustrated in this article. The mathematical details are given in the Appendix.
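The idea of reconstructing the combined-data result from local inference results can be illustrated with a minimal Python sketch. It is not the authors' R package and not their general algorithm; it covers only the simplest case one might assume: a linear regression model with unit noise variance and the same zero-mean Gaussian prior with precision $\lambda$ in every center and in the combined analysis. Under these assumptions each center shares only its MAP estimate and its posterior precision matrix, and the combination reproduces the combined-data MAP estimate exactly; for other models one would expect an approximation at best. The function names `local_map` and `combine` and the simulated data are hypothetical.

```python
import numpy as np

def local_map(X, y, lam):
    """MAP estimate and posterior precision for one center.

    Assumes unit noise variance and a N(0, I / lam) prior on the coefficients.
    """
    A = X.T @ X + lam * np.eye(X.shape[1])  # posterior precision (curvature)
    beta = np.linalg.solve(A, X.T @ y)      # local MAP estimate
    return beta, A

def combine(estimates, precisions, lam):
    """Combine local results without access to the local data."""
    n_centers = len(estimates)
    p = precisions[0].shape[0]
    # Each local precision contains the prior once; the combined analysis
    # should contain it once in total, so remove the surplus copies.
    A_comb = sum(precisions) - (n_centers - 1) * lam * np.eye(p)
    # Precision-weighted sum of the local estimates.
    b = sum(A @ beta for A, beta in zip(precisions, estimates))
    return np.linalg.solve(A_comb, b), A_comb

# Simulated data for three hypothetical centers (illustration only).
rng = np.random.default_rng(0)
beta_true = np.array([1.0, -2.0, 0.5])
lam = 0.1
centers = []
for n in (30, 50, 40):
    X = rng.normal(size=(n, 3))
    y = X @ beta_true + rng.normal(size=n)
    centers.append((X, y))

# Step 1: each center analyzes its own data.
local = [local_map(X, y, lam) for X, y in centers]

# Step 2: a central server combines the local inference results.
beta_bfi, A_bfi = combine([b for b, _ in local], [A for _, A in local], lam)

# Reference: the analysis that would require pooling the data.
X_all = np.vstack([X for X, _ in centers])
y_all = np.concatenate([y for _, y in centers])
beta_com, A_com = local_map(X_all, y_all, lam)

print(np.allclose(beta_bfi, beta_com))
```

Only the local estimates and precision matrices cross center boundaries in this sketch; the pooled analysis at the end is included purely to check the agreement.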

Information

Type
Research Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (https://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Open Practices
Open data
Copyright
© The Author(s), 2025. Published by Cambridge University Press on behalf of The Society for Research Synthesis Methodology
Table 1 Homogeneous setting

Table 2 Heterogeneous setting

Table 3 Heterogeneous setting

Table 4 Heterogeneous setting

Table 5 The BFI estimates of the parameters in the linear regression model, $\widehat{\beta}_{\mathrm{BFI}}$, and the MAP estimates obtained from the analysis after combining the data, $\widehat{\beta}_{\mathrm{com}}$

Table 6 The BFI estimates of the parameters in the linear regression model with a cluster effect for hospital size, $\widehat{\beta}_{\mathrm{BFI}}$, and the MAP estimates obtained from the analysis after combining the data, $\widehat{\beta}_{\mathrm{com}}$

Table 7 The BFI estimates of the parameters in the linear regression model with hospital specific intercepts, $\widehat{\beta}_{\mathrm{BFI}}$, and the MAP estimates obtained from the analysis after combining the data, $\widehat{\beta}_{\mathrm{com}}$

Figure 1 Outcome predictions based on the BFI strategy (vertical axis) versus those based on the MAP estimates obtained from the analysis of the combined training data sets (horizontal axis). Left: Heterogeneous populations. Predictions are based on the model that includes the covariates age, gender, and experience. Middle: Heterogeneous populations. Predictions are based on the model that includes the covariates hospital size, age, gender, and experience. Right: Homogeneous populations. Predictions are based on the model that includes the covariates age, gender, and experience. Perfect agreement corresponds to all points lying on the diagonal (yellow line). Here, $\lambda = 0.1$. The plots look similar for other values of $\lambda$.