Analyzing pension fund mortality with Gaussian processes in a subpopulation framework

Published online by Cambridge University Press: 08 May 2026

Eduardo Fraga L. de Melo
Affiliation:
Universidade do Estado do Rio de Janeiro, Brazil; Susep – Superintendência de Seguros Privados, Brazil; School of Applied Mathematics, Fundacao Getulio Vargas, Brazil
Michael Ludkovski*
Affiliation:
Statistics and Applied Probability, UC Santa Barbara, USA
Rodrigo S. Targino
Affiliation:
School of Applied Mathematics, Fundacao Getulio Vargas, Brazil
*Corresponding author: Michael Ludkovski; Email: ludkovski@pstat.ucsb.edu

Abstract

Pension fund populations often have mortality experiences that are substantially different from the national benchmark. In a motivating case study of Brazilian corporate pension funds, pensioners are observed to have mortality that is 40–55% below the national average, due to the underlying socioeconomic disparities. Direct analysis of a pension fund population is challenging due to very sparse data, with age-specific annual death counts often in low single digits. We design and study a collection of stochastic subpopulation frameworks that coherently capture and project pensioner mortality rates via deflator factors relative to a reference population. Superseding parametric approaches, we propose Gaussian process (GP)-based models that flexibly estimate age- and/or year-specific deflators. We demonstrate that the GP models achieve better goodness of fit and uncertainty quantification. Our models are illustrated on two Brazilian pension funds in the context of exogenous national mortality tables. The GP models are implemented in R Stan using a fully Bayesian approach and take into account over-dispersion relative to the Poisson likelihood.
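
As a rough illustration of the deflator idea described above, the following minimal sketch assumes that expected pension-fund deaths equal exposure times a deflator times the reference mortality rate, and that counts follow a negative binomial distribution with mean `mu` and variance `mu + mu**2 / size`. The function names, the `size` parameterization, and the toy numbers are all hypothetical and are not taken from the paper, whose models are fully Bayesian and implemented in Stan.

```python
import math


def expected_deaths(exposure, ref_rate, deflator):
    """Expected deaths if fund mortality = deflator * reference mortality (assumed form)."""
    return exposure * deflator * ref_rate


def neg_binomial_logpmf(d, mu, size):
    """Log-pmf of a negative binomial with mean mu and variance mu + mu**2 / size.

    Larger `size` means less over-dispersion; the Poisson is the limit size -> infinity.
    """
    return (
        math.lgamma(d + size) - math.lgamma(size) - math.lgamma(d + 1)
        + size * math.log(size / (size + mu))
        + d * math.log(mu / (size + mu))
    )


def actual_to_expected(deaths, exposures, ref_rates):
    """Crude flat deflator: observed deaths over deaths expected at reference rates."""
    expected = sum(e * m for e, m in zip(exposures, ref_rates))
    return sum(deaths) / expected


# Toy inputs invented purely for illustration (not data from the paper).
exposures = [1000.0, 800.0, 500.0]   # exposure by age
ref_rates = [0.010, 0.020, 0.040]    # reference-population mortality rates by age
deaths = [5, 8, 10]                  # observed fund deaths by age

theta_hat = actual_to_expected(deaths, exposures, ref_rates)
mu = expected_deaths(exposures[0], ref_rates[0], theta_hat)
loglik = neg_binomial_logpmf(deaths[0], mu, size=5.0)
print(f"flat deflator: {theta_hat:.3f}, log-likelihood at first age: {loglik:.3f}")
```

The actual-to-expected ratio here is only a point estimate of a single flat deflator; the age- and year-specific GP deflators in the paper replace it with a latent function that is inferred jointly with the over-dispersion parameter.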

Information

Type
Original Research Paper
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution-NonCommercial licence (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original article is properly cited. The written permission of Cambridge University Press or the rights holder(s) must be obtained prior to any commercial use.
Copyright
© The Author(s), 2026. Published by Cambridge University Press on behalf of The Institute and Faculty of Actuaries

Figure 1 Time series of total exposure $E_{t} \,:\!=\, \sum _x E_{x,t}$ (blue solid lines) and number of deaths $d_{t} \,:\!=\, \sum _x d_{x,t}$ (red dashed lines) per year $t$ for both pension funds. Data from 2013 to 2021, males, ages 60–89, pension fund 1 (squares) and pension fund 2 (triangles).

Figure 2 Age pyramids of exposures $E_{x,t}$ (left) and number of deaths $d_{x,t}$ (right) per age $x$ for year $t=2018$ for pension funds 1 and 2: ages 60–89.

Figure 3 Brazilian national population (BRA) log-mortality rates for males for years 2010/2015/2021, ages 60–80.

Figure 4 Inferred overdispersion parameters $\omega$ for males. We show the prior density (red curve), the posterior histogram, the posterior mean (solid vertical line), and the 90% quantile range (dashed lines).

Figure 5 Inferred deflators $\theta (\!\cdot\! )$ across six models. For the FD-1 model, we show the prior and posterior densities. For all other models, thicker error bars denote the 50% posterior credible interval, thinner bars the 90% interval, and the dots the posterior mean. For models AD-FE, AD-AR, and AD-GP, the horizontal axis denotes age, while for TD-AR and TD-GP, it denotes year. For the time-dependent models, 2019 is a forecast.

Figure 6 Top: predicted pension fund male log-mortality rates as a function of age $x$ for year 2019 (blue) induced by models FD-1, AD-GP, and GP-S2. We compare them to the reference population (BRA) mortality curve (green squares) and raw mortality (red crosses). Bottom: predicted number of deaths for each age $x \in \{60, \ldots , 89\}$ in $t=2019$ (blue) relative to the observed number of deaths $d_{x,t}$ (red crosses). Circle sizes represent the probability assigned to the corresponding number of deaths, thin/thick lines represent 50%/90% posterior intervals, and the squares are the posterior means.

Figure 7 Posterior distribution of GP length scales: $\phi _{ag}$ in AD-GP, GP-S1, and GP-S2 (top row) and $\phi _{yr}$ in TD-GP, GP-S2 (bottom row) models. The red curve indicates the prior density as listed in Table A.1; the vertical solid (dashed) lines denote the posterior mean (90% posterior interval).

Figure 8 Predicted pension fund 2019 log-mortality rates for ages 60–89 based on the proposed GP-based models trained on 2013–2018. x’s indicate the observed raw mortality rates in 2019; note that for some ages there were zero recorded deaths, shown as circles at the bottom of the plot.

Table 1. Minimum, mean, and maximum of the yearly performance indexes (8) and (9) for leave-one-out cross-validation across years 2013–2019, ages 60–89 for males. $BRA$ is the reference population, except for models GP-S1 and GP-S2. Out-of-sample and in-sample results considering models described in Table A.1 and negative binomial likelihood. Bolded numbers indicate the best-performing model for the mean of each metric

Table A.1. Model specifications for the pension fund number of deaths

Figure B.1 Inferred deflators $\theta (\!\cdot\! )$ for males in pension fund 2. For the FD-1 model, we show the prior and posterior densities. For all other models, thicker error bars denote the 50% posterior credible interval, thinner bars the 90% interval, and the dots the posterior mean. For models AD-FE, AD-AR, and AD-GP, the horizontal axis denotes age, while for TD-AR and TD-GP, it denotes calendar year. For the time-dependent models TD-AR and TD-GP, 2019 is a forecast.

Table B.1. Mean of the yearly performance indexes (8) and (9) for leave-one-out cross-validation across years 2013–2019 for pension fund 2. The shown metrics are for ages 60–89 for males. Out-of-sample and in-sample results considering models described in Table A.1 and negative binomial likelihood. Bolded numbers indicate the best-performing model for each metric

Supplementary material: File

de Melo et al. supplementary material (File, 1.5 MB)