Survival energy models for mortality prediction and future prospects

Abstract The survival energy model (SEM) is a recently introduced approach to mortality prediction that yields a cohort-wise distribution function of the time of death, defined as the first hitting time of a "survival energy" diffusion process to zero. In this study, we propose a new SEM that can serve as a suitable candidate in the family of prediction models. We also propose a method to improve the prediction procedure of an earlier work. Finally, we examine the practical advantages of SEM over existing mortality models.


Introduction
Statistics over the last few decades show an increase in life expectancy in many countries. For example, in Japan, life expectancy was 85 years in 2020, compared with 60 years in 1950. Such a rapid change in longevity is called the "Longevity Revolution". For individuals, this trend broadens the choices available in life and adds value to it. However, it also gives rise to several medical, economic, and social welfare problems; for instance, the financial crisis of the Japanese national pension system is a pressing matter. Mortality prediction is therefore becoming a critical social issue worldwide.
Since the early 20th century, numerous authors have studied mortality prediction, and the methodology is well established. Most mortality models treat "death" as the first event of a time-inhomogeneous Poisson process. Let $T_x$ be the remaining lifetime of an individual of age $x$. It is assumed that

$$P(T_x > t) = \mathbb{E}\left[\exp\left(-\int_0^t \mu(x+s, s)\,ds\right)\right],$$

where $\mu(x, t)$ is a (possibly stochastic) intensity function called the force of mortality in the insurance context. Previous studies have derived various models for $\mu(x, t)$. Deterministic mortality models, such as the Gompertz, Makeham, and Heligman-Pollard laws, were introduced in earlier years; more recently, numerous stochastic mortality models have been proposed, such as those of Olivieri (2001), Biffis (2005), Cairns et al. (2006b), Hainaut and Devolder (2008), Biffis et al. (2010), and Blackburn and Sherris (2013). Moreover, assuming that $\mu(x, \cdot)$ is constant on $(t, t+1]$, with value $m(x, t)$, reduces the problem to modeling the mortality rate $m(x, t)$. This approach underlies many established classical models, such as the Lee-Carter (1992), Renshaw-Haberman (2006), and CBD (Cairns et al., 2008, 2009) models, among others. We refer to these approaches as reduced-form approaches because they treat death merely as a stochastic event.

© The Author(s), 2023. Published by Cambridge University Press on behalf of The International Actuarial Association. This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.

Shimizu et al. (2020) proposed a structural approach under the "survival energy hypothesis", which assumes that human beings possess a survival energy and that death occurs when this energy is exhausted. They used inhomogeneous diffusion (ID) processes as the cohort-wise survival energy model (SEM): for cohort $c$, the process $X^c = (X^c_t)_{t \ge 0}$, called ID-SEM, satisfies

$$dX^c_t = U_c(t, \vartheta_c)\,dt + V_c(t, \vartheta_c)\,dW_t, \qquad X^c_0 = x_c,$$

where $x_c$ is a positive constant corresponding to the initial survival energy, $U_c$ and $V_c$ are deterministic functions on $\mathbb{R}_+ \times \Theta$ with parameter space $\Theta$, and $W$ is a Wiener process. The coefficients are specified relative to $T_c$, a known parameter called the change point, at which the trend of survival energy changes drastically; the parameter $\vartheta_c = (\alpha_c, \beta_c, \gamma_c, \kappa_c)$ ranges over $\Theta$ (see Shimizu et al. (2020) for the explicit specification). Under this ID-SEM, they defined the time of death as the first hitting time of $X^c$ to zero,

$$\tau_c := \inf\{t > 0 \mid X^c_t < 0\},$$

and illustrated that the mortality function $q_c(t) := P(\tau_c \le t)$, or, more practically, the conditional mortality function

$$q_c(t \mid S) := P(\tau_c \le t \mid \tau_c > S) = \frac{q_c(t) - q_c(S)}{1 - q_c(S)}, \qquad t \ge S, \tag{1.3}$$

for a suitably chosen age $S$, can fit its empirical version computed from the Human Mortality Database (HMD); see also Remark 2.5. This indicates that SEM can provide an excellent parametric family for predicting future mortality functions, even though survival energy itself is merely a fictitious construct. As described in Shimizu et al. (2020), the term "structural approach" is borrowed from credit risk analysis, where a stochastic process describes the asset value and the default time is defined as the first hitting time of a certain level. The two approaches are mathematically identical, but they differ significantly from a statistical perspective: in the credit risk context the asset process can be observed, whereas "survival energy" cannot. On the other hand, deaths are observed for many individuals, whereas defaults are often not directly observed in default risk calculations (because they are predetermined or assumed to occur before default). We estimate the parameters of the SEM family with careful attention to this point.
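To make the first-hitting-time definition concrete, the following Python sketch simulates an ID-SEM-type diffusion with the Euler-Maruyama scheme and estimates $q_c(t)$ and $q_c(t \mid S)$ by Monte Carlo. The `drift` and `diffusion` callables are hypothetical stand-ins for $U_c$ and $V_c$, whose explicit forms are not reproduced here; the step size, path count, and function names are illustrative choices, not the authors' implementation.

```python
import math
import random
from typing import Callable, List


def simulate_death_times(
    x0: float,
    drift: Callable[[float], float],
    diffusion: Callable[[float], float],
    n_paths: int = 1000,
    t_max: float = 120.0,
    dt: float = 0.05,
    seed: int = 0,
) -> List[float]:
    """Euler-Maruyama sketch of dX_t = drift(t) dt + diffusion(t) dW_t, X_0 = x0.

    Returns, for each path, the first grid time at which X drops below zero,
    or math.inf if the path survives up to t_max. `drift` and `diffusion` are
    hypothetical placeholders for U_c(t, theta_c) and V_c(t, theta_c).
    """
    rng = random.Random(seed)
    n_steps = int(round(t_max / dt))
    sqrt_dt = math.sqrt(dt)
    death_times = []
    for _ in range(n_paths):
        x, tau = x0, math.inf
        for k in range(n_steps):
            t = k * dt
            x += drift(t) * dt + diffusion(t) * sqrt_dt * rng.gauss(0.0, 1.0)
            if x < 0.0:  # tau_c = inf{t > 0 | X_t < 0}, observed on the time grid
                tau = (k + 1) * dt
                break
        death_times.append(tau)
    return death_times


def mortality(death_times: List[float], t: float) -> float:
    """Monte Carlo estimate of q_c(t) = P(tau_c <= t)."""
    return sum(tau <= t for tau in death_times) / len(death_times)


def conditional_mortality(death_times: List[float], t: float, S: float) -> float:
    """Monte Carlo estimate of q_c(t | S) = P(tau_c <= t | tau_c > S)."""
    survivors = [tau for tau in death_times if tau > S]
    if not survivors:
        return float("nan")
    return sum(tau <= t for tau in survivors) / len(survivors)
```

For instance, `conditional_mortality(simulate_death_times(1.0, lambda t: -0.012, lambda t: 0.05), 80.0, 20.0)` (hypothetical coefficients) gives a simulated value that could be compared with an empirical conditional mortality at age 80.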
The main contribution of this paper is a new SEM, proposed in Section 2. The mortality function of ID-SEM is sensitive to the change-point parameter $T_c$, and $T_c$ is difficult to predict for a future cohort because it shows no clear trend. To address this issue, we propose a new SEM, IG-SEM, which is a simple parametric family without such a threshold yet flexible enough to fit the training data. A further advantage is that its mortality function can be written explicitly.
Another contribution is a methodology to improve long-term predictions, presented in Section 3. The prediction procedure proposed by Shimizu et al. (2020) is satisfactory for the mid-term future (approximately 10-30 years ahead) but not for the long-term future (e.g., a cohort 40 years ahead); occasionally, the predicted mortality function does not fit the existing data. Therefore, we implement a two-step procedure: the first step is the same as in Shimizu et al. (2020), and in the second step, we refit the predicted mortality function to the existing data at younger ages, keeping the parameters within their 95% prediction intervals. We illustrate that this second step can drastically improve long-term prediction.
Section 5 discusses some advantages of SEM over classical regression-type models in the reduced-form approach. Although this section presents only a theoretical discussion, experimental studies are ongoing; see, for example, Shirai and Shimizu (2022) for the prediction of full life expectancy via SEM.
Finally, Section 6 introduces the SEM project, which provides cohort- and country-wise mortality functions with explicit parameter values on a website.

A new SEM: Inverse Gaussian SEM
We first introduce some notation needed to define a new SEM with an explicit mortality function.
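A random variable $Y$ follows the inverse Gaussian distribution $IG(a, b)$, with mean $a > 0$ and variance $a^3/b$ ($b > 0$), if its probability density is given by

$$f(y; a, b) = \sqrt{\frac{b}{2\pi y^3}}\,\exp\left(-\frac{b(y-a)^2}{2a^2 y}\right), \qquad y > 0.$$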
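As a quick sanity check of this parameterization, the sketch below codes the density and approximates its total mass, mean, and variance with a midpoint rule; for $IG(a, b)$ one expects $1$, $a$, and $a^3/b$. The truncation point and grid size are arbitrary illustrative choices.

```python
import math


def ig_pdf(y: float, a: float, b: float) -> float:
    """Density of IG(a, b), the inverse Gaussian law with mean a and variance a**3 / b."""
    if y <= 0.0:
        return 0.0
    return math.sqrt(b / (2.0 * math.pi * y**3)) * math.exp(
        -b * (y - a) ** 2 / (2.0 * a**2 * y)
    )


def ig_moments(a: float, b: float, upper: float = 60.0, n: int = 120_000):
    """Midpoint-rule approximation of (total mass, mean, variance) on (0, upper]."""
    h = upper / n
    mass = m1 = m2 = 0.0
    for i in range(n):
        y = (i + 0.5) * h
        w = ig_pdf(y, a, b) * h
        mass += w
        m1 += y * w
        m2 += y * y * w
    mean = m1 / mass
    return mass, mean, m2 / mass - mean**2
```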

Definition 2.1 (IG-SEM; Inverse Gaussian SEM). A survival energy process $X^c = (X^c_t)_{t \ge 0}$ of the form

$$X^c_t = x_c - Y^c_t, \qquad t \ge 0,$$

where $x_c > 0$ is the initial energy and $Y^c \sim IG(\Lambda_c, \sigma_c)$ is an inverse Gaussian process with mean function $\Lambda_c$ and parameter $\sigma_c > 0$, is called an IG-SEM. That is, $Y^c_0 = 0$ a.s., $Y^c$ has independent increments, and, for an increasing function $\Lambda_c$ with $\Lambda_c(0) = 0$ and any $t > s \ge 0$,

$$Y^c_t - Y^c_s \sim IG\left(\Lambda_c(t) - \Lambda_c(s),\ \sigma_c\{\Lambda_c(t) - \Lambda_c(s)\}^2\right).$$

Remark 2.2. If $\Lambda_c(t) = t$, then $Y^c$ is an inverse Gaussian Lévy process, which is a spectrally positive pure-jump subordinator. Hence, the path of survival energy in IG-SEM can include jumps, whereas the path of ID-SEM is continuous.
Such processes are used in engineering to model the time of system failure: failure occurs at $\tau_c = \inf\{t > 0 \mid Y^c_t > x_c\}$, that is, when the accumulated damage $Y^c_t$ exceeds a threshold $x_c$ (see Ye and Chen (2014)). This is the same idea as our survival energy hypothesis for human death. The following theorem provides the mortality function.
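The damage-threshold mechanism can be simulated directly. The sketch below draws inverse Gaussian increments with the Michael-Schucany-Haas (1976) transformation method, a standard sampler swapped in here for illustration (the text does not prescribe a simulation method), using the increment law $IG(\Delta\Lambda, \sigma_c \Delta\Lambda^2)$ of Definition 2.1, and records the first grid time at which the accumulated damage exceeds $x_c$. Because $Y^c$ is non-decreasing, the event $\{\tau_c \le t_k\}$ coincides with $\{Y^c_{t_k} > x_c\}$ at every grid point $t_k$, so the grid only blurs where $\tau_c$ falls within a cell. The mean function and numbers in the test are hypothetical.

```python
import math
import random
from typing import Callable


def sample_ig(mean: float, shape: float, rng: random.Random) -> float:
    """Michael-Schucany-Haas (1976) draw from IG(mean, shape); variance = mean**3 / shape."""
    y = rng.gauss(0.0, 1.0) ** 2
    x = (
        mean
        + mean * mean * y / (2.0 * shape)
        - (mean / (2.0 * shape)) * math.sqrt(4.0 * mean * shape * y + (mean * y) ** 2)
    )
    # Keep the smaller root x with probability mean / (mean + x); otherwise use mean**2 / x.
    if rng.random() <= mean / (mean + x):
        return x
    return mean * mean / x


def simulate_failure_time(
    x_c: float,
    sigma: float,
    Lambda: Callable[[float], float],
    dt: float,
    t_max: float,
    rng: random.Random,
) -> float:
    """First grid time with Y_t > x_c, where Y_t - Y_s ~ IG(dL, sigma * dL**2), dL = Lambda(t) - Lambda(s)."""
    y, lam_prev = 0.0, Lambda(0.0)
    n_steps = int(round(t_max / dt))
    for k in range(1, n_steps + 1):
        t = k * dt
        lam = Lambda(t)
        d = lam - lam_prev
        if d > 0.0:
            y += sample_ig(d, sigma * d * d, rng)
        if y > x_c:
            return t
        lam_prev = lam
    return math.inf
```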

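Theorem 2.3. The mortality function for IG-SEM is given by

$$q^{IG}_c(t) := P(\tau_c \le t) = \Phi\left(\sqrt{\frac{\sigma_c}{x_c}}\,\{\Lambda_c(t) - x_c\}\right) - e^{2\sigma_c\Lambda_c(t)}\,\Phi\left(-\sqrt{\frac{\sigma_c}{x_c}}\,\{\Lambda_c(t) + x_c\}\right),$$

where $\Phi(x) = \int_{-\infty}^x \frac{1}{\sqrt{2\pi}}\,e^{-z^2/2}\,dz$ is the standard normal distribution function.

Subsequently, we specify a parametric mean function $\Lambda_{\vartheta_c}$ with parameters $a_c$ and $b_c$, so that the IG-SEM parameter is $\vartheta_c = (a_c, b_c, \sigma_c)$. The parameters are estimated in a manner similar to those of ID-SEM, as shown in the data analysis in Section 4.

Remark 2.4. As described in Shimizu et al. (2020), the parameters and coefficients of the SDE can be interpreted. For example, in ID-SEM, the drift term represents the intrinsic survival power of a human, and the diffusion term is affected by the social environment. In IG-SEM, $\Lambda_c$ may correspond to the drift term because it is the mean of the accumulated damage process $Y^c$, and $\sigma_c$ may be an environmental parameter because it affects the variance of the damage.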
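The closed form in Theorem 2.3 can be evaluated with the standard library alone. The sketch below implements $q^{IG}_c(t)$ and the conditional version (1.3); the second term is computed on the log scale, switching to the Gaussian tail asymptotics for very large arguments, so that $e^{2\sigma_c\Lambda_c(t)}$ never overflows. The mean function is passed as a callable because its explicit parameterization is not reproduced here; the quadratic $\Lambda$ and the numbers in the test are hypothetical.

```python
import math
from typing import Callable


def log_norm_sf(u: float) -> float:
    """log P(Z > u) for Z ~ N(0, 1); asymptotic expansion for large u."""
    if u < 30.0:
        return math.log(0.5 * math.erfc(u / math.sqrt(2.0)))
    return -0.5 * u * u - math.log(u * math.sqrt(2.0 * math.pi)) + math.log1p(-1.0 / u**2 + 3.0 / u**4)


def q_ig(t: float, x_c: float, sigma: float, Lambda: Callable[[float], float]) -> float:
    """IG-SEM mortality P(tau_c <= t) = P(Y_t >= x_c) with Y_t ~ IG(L, sigma * L**2), L = Lambda(t)."""
    lam = Lambda(t)
    if lam <= 0.0:
        return 0.0
    r = math.sqrt(sigma / x_c)
    first = 0.5 * math.erfc(-r * (lam - x_c) / math.sqrt(2.0))  # Phi(r (L - x_c))
    second = math.exp(2.0 * sigma * lam + log_norm_sf(r * (lam + x_c)))  # e^{2 sigma L} Phi(-r (L + x_c))
    return min(1.0, max(0.0, first - second))


def q_ig_conditional(t: float, S: float, x_c: float, sigma: float,
                     Lambda: Callable[[float], float]) -> float:
    """Conditional mortality q_c(t | S) = (q_c(t) - q_c(S)) / (1 - q_c(S)) for t > S, else 0."""
    if t <= S:
        return 0.0
    q_S = q_ig(S, x_c, sigma, Lambda)
    return (q_ig(t, x_c, sigma, Lambda) - q_S) / (1.0 - q_S)
```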
Remark 2.5. We estimate the parameters by least-squares fitting of the conditional mortality function $q_c(t \mid S)$ in (1.3) to its empirical version, which can be computed from the data in the HMD (Human Mortality Database), as explained in Shimizu et al. (2023). We recommend choosing a conditioning age $S$ of approximately 20 years, because mortality at young ages is highly volatile and unstable, which makes it difficult to capture with simple models such as ours. The value of $S$ should be determined empirically by examining the amount of data and the mortality rates at young ages, which depend on the country.
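A minimal least-squares fit in the spirit of this remark is sketched below. For illustration only, it assumes the hypothetical mean function $\Lambda(t) = a t$ (cf. Remark 2.2) and fixes $x_c = 1$: under IG-SEM the mortality function depends on the parameters only through $\Lambda_c/x_c$ and $\sigma_c x_c$, so $x_c$ cannot be identified separately in this sketch. A coarse grid search stands in for a proper nonlinear optimizer, and the "empirical" values in the test are synthetic, not HMD data.

```python
import math
from itertools import product
from typing import Callable, Sequence, Tuple


def log_norm_sf(u: float) -> float:
    """log P(Z > u) for Z ~ N(0, 1); asymptotic expansion for large u."""
    if u < 30.0:
        return math.log(0.5 * math.erfc(u / math.sqrt(2.0)))
    return -0.5 * u * u - math.log(u * math.sqrt(2.0 * math.pi)) + math.log1p(-1.0 / u**2 + 3.0 / u**4)


def q_ig(t: float, x_c: float, sigma: float, Lambda: Callable[[float], float]) -> float:
    """IG-SEM mortality function (closed form of Theorem 2.3)."""
    lam = Lambda(t)
    if lam <= 0.0:
        return 0.0
    r = math.sqrt(sigma / x_c)
    first = 0.5 * math.erfc(-r * (lam - x_c) / math.sqrt(2.0))
    second = math.exp(2.0 * sigma * lam + log_norm_sf(r * (lam + x_c)))
    return min(1.0, max(0.0, first - second))


def q_ig_conditional(t, S, x_c, sigma, Lambda) -> float:
    """q_c(t | S) = (q_c(t) - q_c(S)) / (1 - q_c(S)) for t > S, else 0."""
    if t <= S:
        return 0.0
    q_S = q_ig(S, x_c, sigma, Lambda)
    return (q_ig(t, x_c, sigma, Lambda) - q_S) / (1.0 - q_S)


def sse(a: float, sigma: float, ages: Sequence[float], q_emp: Sequence[float], S: float) -> float:
    """Least-squares criterion with the hypothetical mean function Lambda(t) = a * t and x_c = 1."""
    Lambda = lambda t: a * t
    return sum((q_ig_conditional(t, S, 1.0, sigma, Lambda) - q) ** 2 for t, q in zip(ages, q_emp))


def fit_by_grid(ages, q_emp, S, a_grid, sigma_grid) -> Tuple[float, float]:
    """Coarse grid search standing in for a nonlinear least-squares optimizer."""
    return min(product(a_grid, sigma_grid), key=lambda p: sse(p[0], p[1], ages, q_emp, S))
```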

Modification of estimated mortality functions
Suppose that we have estimated the values of $\vartheta_c$ for cohorts $c_1 < c_2 < \cdots < c_m$, say $\hat\vartheta_{c_1}, \ldots, \hat\vartheta_{c_m}$, via least-squares estimation (LSE), as in Shimizu et al. (2020). We assume that the parameter $\vartheta_c$ of a future cohort is determined as follows:

$$\vartheta_c = h(c) + \varepsilon_c, \tag{3.1}$$

for a deterministic (unknown) mean function $h$ and a centered Gaussian error $\varepsilon_c$ with covariance $\Sigma_c$. Regarding the estimates $\hat\vartheta_{c_1}, \ldots, \hat\vartheta_{c_m}$ as realisations of $\vartheta_{c_i}$ ($i = 1, \ldots, m$), we estimate $h$, parameterized case by case as described in Section 4. Once $h$ is estimated, say $\hat h$, we predict the parameter for a future cohort $c'$ by

$$\hat\vartheta_{c'} = \hat h(c') \tag{3.2}$$

and obtain a predicted mortality function (PMF) $q_{c'}(\cdot, \hat\vartheta_{c'})$, as in Shimizu et al. (2020). In this study, we propose a further modification to improve the prediction. Based on assumption (3.1), we can construct an $\alpha$-prediction interval for $\vartheta_{c'}$, denoted by $I^{c',m}_\alpha$ (3.3), from $\hat h(c')$, an estimator $\hat\Sigma_{c'}$ of $\Sigma_{c'}$ in (3.1), and $z_\alpha$, the $(1-\alpha)$-percentile of $N(0, 1)$. (Numerical illustrations of these 95% prediction intervals are shown in Figures 1 and 2 in the real data analysis.) Using the prediction interval $I^{c',m}_\alpha$, we readjust the parameters within it so that the mortality function fits the existing (younger-age) data for cohort $c'$, as follows.

Definition 3.1 (Modified PMF). When empirical data $\hat q_{c'}(t \mid S)$ exist for $t = t_1, \ldots, t_d$, we reselect the predictor as

$$\tilde\vartheta_{c'} = \mathop{\mathrm{arg\,min}}_{\vartheta \in I^{c',m}_\alpha} \sum_{i=1}^d \left|q_{c'}(t_i, \vartheta \mid S) - \hat q_{c'}(t_i \mid S)\right|^2, \tag{3.4}$$

where $I^{c',m}_\alpha$ is given by (3.3). We use $q_{c'}(\cdot, \tilde\vartheta_{c'})$ as the final predicted mortality function and refer to it as the modified predicted mortality function (MPMF).
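The two-step procedure can be sketched for a single scalar parameter as follows. Assumptions for illustration only: the trend $h$ is linear in the cohort year; the $\alpha$-prediction interval is the classical ordinary-least-squares prediction interval with the normal quantile $z_\alpha$, standing in for the estimator-based interval (3.3); the mortality model `q_model` is a generic callable; and the constrained minimization (3.4) is done by grid search. For a vector $\vartheta_c$, the same steps would apply over a box of componentwise intervals.

```python
import math
from statistics import NormalDist, fmean
from typing import Callable, Sequence, Tuple


def fit_linear_trend(cohorts: Sequence[float], estimates: Sequence[float]) -> dict:
    """OLS fit of the hypothetical trend h(c) = b0 + b1 * c to past parameter estimates."""
    n = len(cohorts)
    c_bar, e_bar = fmean(cohorts), fmean(estimates)
    s_xx = sum((c - c_bar) ** 2 for c in cohorts)
    b1 = sum((c - c_bar) * (e - e_bar) for c, e in zip(cohorts, estimates)) / s_xx
    b0 = e_bar - b1 * c_bar
    rss = sum((e - b0 - b1 * c) ** 2 for c, e in zip(cohorts, estimates))
    return {"b0": b0, "b1": b1, "sd": math.sqrt(rss / (n - 2)), "c_bar": c_bar, "s_xx": s_xx, "n": n}


def prediction_interval(fit: dict, c_new: float, alpha: float = 0.025) -> Tuple[float, float]:
    """Point prediction (3.2) plus/minus z_alpha times the OLS prediction standard error."""
    center = fit["b0"] + fit["b1"] * c_new
    se = fit["sd"] * math.sqrt(1.0 + 1.0 / fit["n"] + (c_new - fit["c_bar"]) ** 2 / fit["s_xx"])
    z = NormalDist().inv_cdf(1.0 - alpha)  # (1 - alpha)-percentile of N(0, 1)
    return center - z * se, center + z * se


def modified_parameter(
    q_model: Callable[[float, float], float],
    ages: Sequence[float],
    q_emp: Sequence[float],
    interval: Tuple[float, float],
    n_grid: int = 301,
) -> float:
    """Grid-search version of (3.4): best fit to the observed ages within the interval."""
    lo, hi = interval
    grid = [lo + (hi - lo) * i / (n_grid - 1) for i in range(n_grid)]
    return min(grid, key=lambda th: sum((q_model(t, th) - q) ** 2 for t, q in zip(ages, q_emp)))
```

When the best in-interval fit lies at a boundary, the MPMF parameter is pulled as close to the data as the prediction interval allows, which is the intended compromise between the trend forecast and the observed younger-age data.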
Later, in certain examples, we compare the direct prediction (3.2) with the above modification (3.4).

Data analysis: ID-SEM versus IG-SEM
In this section, we compare ID-SEM and IG-SEM using actual data from the HMD (Human Mortality Database) and illustrate that the MPMF obtained by (3.4) can predict future mortality considerably better than the PMF without modification. Because the mortality function of ID-SEM is sensitive to the change point $T_c$, we fix $T_c = 50$ in the following examples; with this value, ID-SEM fits the training data relatively well.

Denmark
The first example is Denmark. We use mortality data from the m = 25 cohorts c = 1816, 1817, ..., 1840 and suppose that the current year is 1951 (because the data up to age 110 for the 1840 birth cohort are already available). Based on these data, we predict the mortality functions from age 20 for the future cohorts c = 1850, 1870, and 1890, for females and males. Because the current year is assumed to be 1951, these cohorts have reached ages 101, 81, and 61, respectively. The data analysis proceeds as follows.
1. We estimate the parameters of $q^{ID}_c(t, \vartheta_c)$ and $q^{IG}_c(t, \vartheta_c)$ for the cohorts c = 1816-1840 and obtain the parameter values for the future cohorts c = 1850, 1870, and 1890 as in Section 3 (see also Shimizu et al. (2020)). The results for ID-SEM and IG-SEM are summarized by the coefficient of determination $R^2$, its adjusted version $\bar R^2$, and the 95% prediction intervals (95%-PIs). For males, we show the tables of $R^2$ ($\bar R^2$) and the regression curves with the width of the 95%-PI, but not the corresponding figures.
2. To obtain the MPMF, we split the data into training and test data. For example, for c = 1890, we split the mortality data into ages 20-60 (training data: red dots) and ages 61-110 (test data: black dots) and use the training data for modification (3.4).
3. In Figure 3, we visually compare the two mortality curves with the test data (black dots) for males and females, but only for c = 1890. For the other cohorts (c = 1850 and 1870), we show only the MSE between the predicted mortality function and the empirical mortality function in Table 3.

Remark 4.2. We employed the simplest feasible regression functions for ease of use. For $\alpha_c$, $\beta_c$, and $\kappa_c$, we used a negative increasing function of the form $-c_1 e^{-c_2 x} < 0$ ($c_1, c_2 > 0$) because these values should be negative. Although $\gamma_c$ should be positive, modeling it with a linear function, among other possible forms, may be justified given the available data. Alternatively, one can use information criteria, such as AIC or BIC, to select a regression function, or a time-series model to predict future parameters. However, every model has merits and demerits; therefore, we kept the regression functions as simple as possible.
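For the negative increasing form $-c_1 e^{-c_2 x}$ in Remark 4.2, a quick way to obtain (starting) values is to regress $\log(-y)$ on $x$ by ordinary least squares, since $\log(-y) = \log c_1 - c_2 x$. This log-linearization is a swapped-in shortcut for illustration (the analysis here uses nonlinear regression), $x$ could be the cohort year shifted to start at 0 (a hypothetical normalization), and the data in the test are synthetic. The helper `mse` computes the test-set error used in step 3; all names are hypothetical.

```python
import math
from statistics import fmean
from typing import Sequence, Tuple


def fit_negative_exponential(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Fit y = -c1 * exp(-c2 * x) (all y < 0) via OLS on log(-y) = log(c1) - c2 * x."""
    if any(y >= 0.0 for y in ys):
        raise ValueError("the negative exponential form requires y < 0")
    zs = [math.log(-y) for y in ys]
    x_bar, z_bar = fmean(xs), fmean(zs)
    slope = sum((x - x_bar) * (z - z_bar) for x, z in zip(xs, zs)) / sum((x - x_bar) ** 2 for x in xs)
    return math.exp(z_bar - slope * x_bar), -slope  # (c1, c2)


def negative_exponential(x: float, c1: float, c2: float) -> float:
    """Regression function -c1 * exp(-c2 * x), negative and increasing for c1, c2 > 0."""
    return -c1 * math.exp(-c2 * x)


def mse(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """Mean squared error between predicted and empirical mortality on the test ages."""
    return fmean([(p - o) ** 2 for p, o in zip(predicted, observed)])
```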
For this cohort (a relatively long-term prediction), the difference between ID-SEM and IG-SEM is more pronounced. Even the modified IG-SEM cannot predict male mortality well because of the poor prediction of the parameters $a_c$ and $b_c$. This is an example in which ID-SEM, with its change point $T_c$ and more parameters than IG-SEM, succeeds.

Norway
The second example is Norway. As for Denmark, we use mortality data from m = 25 cohorts.

Figure 3. Mortality functions by ID-SEM (left) and IG-SEM (right) for the 1890 birth cohort in Denmark; females (top) and males (bottom). The magenta curve is the PMF before modification, and the blue curve is the modified version (MPMF). The prediction region covers ages above 60.
All other procedures are identical to those used for Denmark. We estimate the parameters by nonlinear regression and obtain the PMFs before and after the modification. We show the figures of the PMFs before and after modification only for c = 1900. For the other cohorts, we show only the nonlinear regression results, with the values of $R^2$ ($\bar R^2$) and their 95%-PIs, in Tables 4 and 5. Moreover, Table 6 lists the MSEs of the MPMFs.

Remark 4.3.
For Denmark, ID-SEM is superior to IG-SEM. However, IG-SEM is effective in this example and occasionally outperforms ID-SEM. Because it is challenging to determine a suitable change point $T_c$ in ID-SEM, IG-SEM, which has fewer parameters, is also a good candidate prediction model for the mortality function.
In this example, IG-SEM outperforms ID-SEM for females but not for males. Accordingly, it can be difficult to decide in advance which SEM to use for predicting and computing quantities of interest; we should compute them with both ID-SEM and IG-SEM and compare the values objectively before making a decision.

Shimizu et al. (2020) … along with their MSEs. The results demonstrate that the prediction errors are of comparable size, but ID-SEM often outperforms the Renshaw-Haberman model (RHM) at senior ages.

Comparison with the classical model with cohort-effects
Remark 5.1. Although we also considered the CBD model (e.g., Cairns et al., 2006a, 2008) as a candidate cohort model, it was unsuitable for long-term prediction; therefore, its results are excluded from this study.

Figure 4. Mortality functions by ID-SEM (left) and IG-SEM (right) for the 1900 birth cohort in Norway; females (top) and males (bottom).
The magenta curve is the PMF before modification, and the blue curve is the modified version (MPMF). The prediction region covers ages above 60.

Reducing statistical errors
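One advantage of the proposed SEM approach concerns the statistical estimation of actuarial quantities. Consider, for example, the single premium of whole life insurance at age $x$, say $A_x$. It is written as

$$A_x = \sum_{k=0}^\infty v^{k+1} P(k < T_x \le k+1),$$

where $v \in (0, 1)$ is the discount factor. If we use the Lee-Carter model, then, for an individual aged $x$ in year $t$, it is written as

$$A_x = \sum_{k=0}^\infty v^{k+1} \exp\left(-\sum_{j=0}^{k-1} m_{x+j, t+j}\right)\left(1 - e^{-m_{x+k, t+k}}\right),$$

where $m_{x,t}$ is the (crude) mortality rate parameterized by

$$\log m_{x,t} = \alpha_x + \beta_x \kappa_t + \varepsilon_{x,t},$$

with parameters $\alpha_x$ and $\beta_x$, an index $\kappa_t$ whose future values are predicted by a time-series model that includes some unknown parameters, and a noise process $\varepsilon_{x,t}$. Here, we must estimate $\alpha_{x+k}$ and $\beta_{x+k}$ and predict $\kappa_{t+k}$ for every $k = 0, 1, 2, \ldots$. In contrast, the SEM expression for cohort $c$,

$$A_x = \sum_{k=0}^\infty v^{k+1}\left\{q_c(x+k+1, \vartheta_c \mid x) - q_c(x+k, \vartheta_c \mid x)\right\},$$

requires only one parameter estimation for $\vartheta_c$ because $\vartheta_c$ is independent of $k = 0, 1, 2, \ldots$. This can make the statistical error smaller than that of classical mortality models.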
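The point can be made concrete with a small sketch: with SEM, $A_x = \sum_{k \ge 0} v^{k+1}\{q_c(x+k+1, \vartheta_c \mid x) - q_c(x+k, \vartheta_c \mid x)\}$ needs only the single function $t \mapsto q_c(t, \vartheta_c \mid x)$ and hence one parameter vector. The code truncates the series at a finite horizon and also evaluates the equivalent summation-by-parts form $(1-v)\sum_{k \ge 1} v^k q_c(x+k, \vartheta_c \mid x)$. To have a closed form to check against, the test uses a hypothetical constant force of mortality $\mu$, for which $A_x = v(1-e^{-\mu})/(1 - v e^{-\mu})$.

```python
from typing import Callable


def whole_life_premium(q_cond: Callable[[float], float], x: float, v: float, horizon: int = 1000) -> float:
    """A_x = sum_k v^(k+1) {q_c(x+k+1 | x) - q_c(x+k | x)}, truncated after `horizon` years."""
    return sum(v ** (k + 1) * (q_cond(x + k + 1) - q_cond(x + k)) for k in range(horizon))


def whole_life_premium_by_parts(q_cond: Callable[[float], float], x: float, v: float,
                                horizon: int = 1000) -> float:
    """Equivalent summation-by-parts form (1 - v) * sum_{k>=1} v^k q_c(x+k | x)."""
    return (1.0 - v) * sum(v**k * q_cond(x + k) for k in range(1, horizon + 1))
```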

Sensitivity analysis
As shown in the previous section, most actuarial quantities can be written as functionals of the mortality function $q_c(t, \vartheta_c)$, which can often be rewritten in terms of the conditional mortality function $q_c(t, \vartheta_c \mid S)$ with a few unknown parameters $\vartheta_c$. This situation is well suited to sensitivity analysis with respect to parameter changes. Consider an actuarial quantity for age $x$ and cohort $c$ represented in a Stieltjes-type integral form such as

$$H(\vartheta) = \int_0^\infty q_c(t, \vartheta \mid x)\,dh(t),$$

where $h$ is a measurable function on $[0, \infty)$ for which this integral is well defined, and the integral sign means $\int_0^\infty := \int_{[0,\infty)}$. We assume that $\int_0^\infty$ and the differentiation $\partial_\vartheta$ can be interchanged as needed and that $\partial_\vartheta q_c(t, \vartheta \mid x)$ is continuous in $\vartheta$.
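For a smooth $h$, $dh(t) = h'(t)\,dt$ and $H(\vartheta)$ reduces to an ordinary integral, so the quantity and its sensitivities $\partial_\vartheta H$ can be approximated numerically. The sketch below uses the trapezoidal rule on a truncated range and central finite differences. The toy mortality function (constant force $\mu$ after age $x$) and the choice $h'(t) = \delta v^{t-x}$ for $t \ge x$ with $\delta = -\log v$, for which $H = \mathbb{E}[v^{\tau_c - x} \mid \tau_c > x] = \mu/(\mu + \delta)$ in the toy model, are hypothetical illustrations, not the paper's implementation.

```python
import math
from typing import Callable, Dict

Params = Dict[str, float]


def stieltjes_functional(q: Callable[[float, Params], float], h_prime: Callable[[float], float],
                         theta: Params, t0: float, t1: float, n: int = 20_000) -> float:
    """Trapezoidal approximation of H(theta) = int_{t0}^{t1} q(t, theta) h'(t) dt."""
    step = (t1 - t0) / n
    total = 0.5 * (q(t0, theta) * h_prime(t0) + q(t1, theta) * h_prime(t1))
    for i in range(1, n):
        t = t0 + i * step
        total += q(t, theta) * h_prime(t)
    return total * step


def sensitivities(q, h_prime, theta: Params, t0: float, t1: float, rel_step: float = 1e-5) -> Params:
    """Central finite differences of H with respect to each named parameter."""
    grads = {}
    for name, value in theta.items():
        eps = rel_step * max(1.0, abs(value))
        up = dict(theta, **{name: value + eps})
        down = dict(theta, **{name: value - eps})
        grads[name] = (stieltjes_functional(q, h_prime, up, t0, t1)
                       - stieltjes_functional(q, h_prime, down, t0, t1)) / (2.0 * eps)
    return grads
```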
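Most actuarial quantities can be written in this form (see Shimizu et al. (2020)). For example, $A_x$, the single premium of whole life insurance at age $x$, is given by

$$A_x = \sum_{k=0}^\infty v^{k+1}\left\{q_c(x+k+1, \vartheta_c \mid x) - q_c(x+k, \vartheta_c \mid x)\right\},$$

where $v \in (0, 1)$. Moreover, for the immediate payment version,

$$\bar A_x = \int_x^\infty v^{t-x}\,q_c(dt, \vartheta_c \mid x).$$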

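It follows from integration by parts that

$$A_x = \int_0^\infty q_c(t, \vartheta_c \mid x)\,dh(t), \qquad h(t) = (1-v)\sum_{k=1}^\infty v^k\,\mathbf{1}_{\{t \ge x+k\}},$$

$$\bar A_x = \int_0^\infty q_c(t, \vartheta_c \mid x)\,dh(t), \qquad h(t) = 1 - v^{(t-x)^+},$$

so that both quantities take the form $H(\vartheta_c)$.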
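We are interested in the difference $H(\vartheta) - H(\vartheta_c)$ for different parameter values $\vartheta$ and $\vartheta_c$. By Taylor's formula,

$$H(\vartheta) - H(\vartheta_c) = \int_0^\infty \partial_\vartheta q_c(t, \vartheta_c \mid x)\,dh(t)\cdot(\vartheta - \vartheta_c) + o(|\vartheta - \vartheta_c|).$$

The integral $\int_0^\infty \partial_\vartheta q_c(t, \vartheta_c \mid x)\,dh(t)$ can be evaluated by direct computation. For instance, we have the following inequalities.

Lemma 5.2. For the mortality function of IG-SEM, $q^{IG}_c(t, \vartheta)$ with $\vartheta = (a, b, \sigma)$, we obtain the following estimates:

$$\partial_a q^{IG}_c(t, \vartheta) \le 2t e^{at}\left\{1 + \sigma\Lambda_\vartheta(t) + x_c\right\}\phi_{x_c, x_c/\sigma}(\Lambda_\vartheta(t)),$$

$$\partial_b q^{IG}_c(t, \vartheta) \le 2t\left\{1 + \sigma\Lambda_\vartheta(t) + x_c\right\}\phi_{x_c, x_c/\sigma}(\Lambda_\vartheta(t)),$$

where $\phi_{m, s^2}$ denotes the normal density with mean $m$ and variance $s^2$.

For the LSE $\hat\vartheta_c$ of $\vartheta_c$ given in Theorem 3.2 of Shimizu et al. (2020) (see also the erratum Shimizu, 2022) and the sample size $n_c$ used to obtain the estimator, the delta method in statistics yields

$$\sqrt{n_c}\left(H(\hat\vartheta_c) - H(\vartheta_c)\right) = \int_0^\infty \partial_\vartheta q_c(t, \vartheta_c \mid x)\,dh(t)\cdot\sqrt{n_c}(\hat\vartheta_c - \vartheta_c) + o_p(1) \xrightarrow{d} N(0, \Gamma_{c,x}),$$

where the asymptotic variance $\Gamma_{c,x}$ can be estimated using the estimators of $R_d$ and $Q_d$ in Theorem 3.2 of Shimizu et al. (2020) (with the erratum Shimizu, 2022) and the plug-in estimator $\int_0^\infty \partial_\vartheta q_c(t, \hat\vartheta_c \mid x)\,dh(t)$. This yields the confidence interval for $H(\vartheta_c)$:

$$\left[H(\hat\vartheta_c) - z_\alpha\sqrt{\hat\Gamma_{c,x}/n_c},\ H(\hat\vartheta_c) + z_\alpha\sqrt{\hat\Gamma_{c,x}/n_c}\right],$$

where $z_\alpha$ is the upper $\alpha$-percentile of the standard normal distribution and $\hat\Gamma_{c,x}$ is an estimator of the asymptotic variance $\Gamma_{c,x}$.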
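Once the plug-in gradient $g = \int_0^\infty \partial_\vartheta q_c(t, \hat\vartheta_c \mid x)\,dh(t)$ and an estimate $\hat V$ of the asymptotic covariance of $\sqrt{n_c}(\hat\vartheta_c - \vartheta_c)$ are available, the confidence interval for $H(\vartheta_c)$ takes only a few lines. In this sketch the asymptotic variance is formed by the standard delta-method quadratic form $g^\top \hat V g$, with $\hat V$ supplied by the user; in the setting of this section it would come from the estimators of $R_d$ and $Q_d$ in Shimizu et al. (2020), which are not reproduced here. The inputs in the test are made-up numbers.

```python
import math
from statistics import NormalDist
from typing import Sequence, Tuple


def delta_method_ci(H_hat: float, grad: Sequence[float], cov: Sequence[Sequence[float]],
                    n: int, alpha: float = 0.025) -> Tuple[float, float]:
    """[H_hat -/+ z_alpha * sqrt(Gamma_hat / n)] with Gamma_hat = grad' cov grad.

    z_alpha is the upper alpha-point of N(0, 1); cov estimates the asymptotic
    covariance of sqrt(n) * (theta_hat - theta).
    """
    k = len(grad)
    gamma_hat = sum(grad[i] * cov[i][j] * grad[j] for i in range(k) for j in range(k))
    if gamma_hat < 0.0:
        raise ValueError("covariance estimate must be positive semidefinite")
    z = NormalDist().inv_cdf(1.0 - alpha)
    half = z * math.sqrt(gamma_hat / n)
    return H_hat - half, H_hat + half
```

With `alpha=0.025`, the interval has nominal two-sided coverage of 95%.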

Conclusions
We proposed two parametric families of SEMs, ID-SEM and IG-SEM, which provide accurate cohort-wise PMFs. Using the prediction intervals for the unknown parameters, we can modify the PMF into the MPMF, which fits the existing data in a manner consistent with LSE (see Definition 3.1). SEM is a viable alternative for modeling mortality prediction. We illustrated that both SEMs have high potential for long-term mortality prediction and are superior to classical models, possibly with cohort effects, such as the LC, RH, and CBD models. Moreover, SEM has several theoretical advantages: its notation is easy for non-actuaries to understand, its estimation error is reduced owing to fewer parameters, and it is useful for sensitivity analysis.
For further information regarding SEM, such as graphs and other topics, please refer to the supplementary article by Shimizu et al. (2023).