WHY DOES A HUMAN DIE? A STRUCTURAL APPROACH TO COHORT-WISE MORTALITY PREDICTION UNDER SURVIVAL ENERGY HYPOTHESIS

Abstract We propose a new approach to mortality prediction under the survival energy hypothesis (SEH). We assume that a human is born with an initial energy that changes stochastically in time, and that the human dies when the energy vanishes. The time of death is then represented by the first hitting time of the survival energy (SE) process to zero. This study assumes that SE follows a time-inhomogeneous diffusion process and defines the mortality function as the first hitting time distribution function of the SE process. Although the SEH is a fictitious construct, we illustrate that this assumption has the potential to yield a good parametric family of cumulative probabilities of death, and that this parametric family yields surprisingly good predictions of future mortality rates.

remains constant over each year of integer age x and over each calendar year t, we see that μ(t, x) = m(t, x), and the mortality rate is given by q(x, t) = 1 − exp(−m(x, t)).
The Lee-Carter model is a regression-type model for this mortality rate:

log m(x, t) = α_x + β_x κ_t + ε_{x,t},   (1.1)

where α_x, β_x are time-independent parameters, κ = (κ_t)_{t≥0} is a latent time series, and ε_{x,t} is an i.i.d. noise. There are many extensions and modifications of this model. For example, Renshaw and Haberman (2003) considered a multifactor version of the Lee-Carter model, Renshaw and Haberman (2006) and Cairns et al. (2009) considered cohort effects, and Chen and Cox (2009) included jump effects. See the review by Cairns et al. (2008) for other studies. These models concentrate on "probability" by treating death as an accidental event, but they do not explain why a human dies. An analogy can be seen in credit risk analysis, where τ is the time of default of a company. Such a model is called the reduced-form approach in credit risk analysis; see, for example, Jarrow and Turnbull (1995). Besides the Lee-Carter-type approaches, there are other reduced-form approaches to mortality prediction, as in Dahl (2004), Biffis (2005), Biffis et al. (2010), and Bauer et al. (2012) (see also the references therein).
On the other hand, there is a structural approach, in which the "firm value" of a company is modeled by a stochastic process and the time of default is defined as the first passage time of the process to a certain region (the Merton model), initiated by Merton (1974, 1976). There are many extensions in this stream; see, for example, Schoutens and Cariboni (2009) and the references therein. This is also the same idea as the probability of ruin in insurance risk theory, originally introduced by Lundberg (1903), where ruin occurs if the insurance surplus reaches the negative region. Moreover, a similar idea appears in the analysis of system failure in engineering, where the "failure time τ" occurs when accumulated "damage" exceeds a certain level; see, for example, Ye and Chen (2014), among others.
In our study, we use the same idea as these "structural approaches" in mortality forecasting. That is, we assume that there is an underlying survival energy (SE) for human beings and that a human dies when that energy vanishes: the survival energy hypothesis (SEH); see also Ito and Shimizu (2019). The flow of energy should depend on the cohort in which a human was born, because it depends on the period: the natural, economic, and social environments, the levels of technology and medical care, and so on. Therefore, we consider cohort-wise SE dynamics.
Mathematically, we suppose that the SE at time t in a cohort c is given by a random variable X^c_t with initial energy X^c_0 = x_c. The sample space Ω is a family of humans in the corresponding cohort, and each sample ω ∈ Ω corresponds to a person. That is, for a person ω, his/her energy has a different flow X^c_·(ω), but its law is governed by an underlying probability law P on the σ-field F of Ω. In this way, (Ω, F, P) becomes a probability space, and X^c = (X^c_t)_{t≥0} is a stochastic SE process on that space.
Later, we will assume that the SE process follows a stochastic differential equation with an unknown parameter, which means that every human in a fixed cohort follows the same SE model but has a different path due to randomness. Therefore, we cannot separate the individual SE processes, but we can compare the mortality changes in one cohort with those in the others.
In the sequel, we are given a stochastic basis (Ω, F, P; F) with the usual conditions, where F = (F_t)_{t≥0}, and we assume that the cohort-wise SE process X^c = (X^c_t)_{t≥0} is an F-adapted process. The time of death of a human in the cohort c is given by the first hitting time of X^c to zero:

τ^c := inf{t > 0 : X^c_t ≤ 0},

which can be an F-stopping time under some regularity conditions, and the probability of death up to time t is given by

q_c(t) := P(τ^c ≤ t).   (1.3)

We call q_c the mortality function.
Our setting for the mortality function is similar to those for the default and ruin probabilities. That is, our analysis is essentially that of the first hitting time to zero of the SE process X^c, which is the basic structure for human beings. Therefore, following the earlier literature, we call this approach a (cohort-wise) structural approach to mortality prediction.
There are many practical advantages to identifying the cohort-wise mortality function (1.3). For example, the life expectancy at age x, say e_x, is computed in practice as

e_x = ∫_0^∞ t p_x dt,

where t p_x is the actuarial notation for the probability that a person of age x survives for t more years. Since most conventional approaches for estimating m(x, t) give a graph in x for a fixed t, we would need many graphs of m(x, t) (t = 1, 2, . . .) to calculate e_x. However, if we can obtain q_c(t) in (1.3) as a closed analytical function of t, we can obtain e_x by a single (numerical) integral:

e_x = ∫_0^∞ (1 − q_c(x + t)) / (1 − q_c(x)) dt.

Moreover, the single premium of a whole life insurance policy at age x (with force of interest δ > 0) is given by

A_x = (1 − q_c(x))^{−1} ∫_0^∞ e^{−δt} dq_c(x + t),

among others. A single mortality function thus yields many cohort-wise actuarial quantities by simple computations.

The rest of this paper is organized as follows. Section 2 explains the structural approach for computing the cohort-wise probability of death, using a Brownian survival energy model (BSEM) as a simple illustration. We then extend this toy model in a natural way to a more general time-inhomogeneous diffusion process and propose a specific model that gives an explicit form of the hitting time distribution function, which we call the mortality function. Section 3 gives a method for estimating the model parameters under the restricted data situations that are peculiar to our context. Section 4 is devoted to real-life data analysis, where cohort-wise mortality functions are estimated and predicted based on open data from the Human Mortality Database (https://www.mortality.org/). Lastly, we conclude in Section 5.
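To illustrate how a closed-form q_c streamlines these actuarial computations, here is a minimal numerical sketch; the function q is a stand-in for a fitted cohort mortality function, and all names and values are illustrative (for the check we use an exponential lifetime, whose e_x and single premium are known in closed form):

```python
import math

def life_expectancy(q, x, horizon=200.0, n=20000):
    """e_x = integral of tpx over t, with tpx = (1 - q(x + t)) / (1 - q(x)); trapezoidal rule."""
    sx = 1.0 - q(x)
    h = horizon / n
    tpx = [(1.0 - q(x + i * h)) / sx for i in range(n + 1)]
    return h * (0.5 * tpx[0] + sum(tpx[1:-1]) + 0.5 * tpx[-1])

def whole_life_premium(q, x, delta, horizon=200.0, n=20000):
    """Single premium: integral of exp(-delta*t) dq(x + t), normalized by 1 - q(x)."""
    sx = 1.0 - q(x)
    h = horizon / n
    total = 0.0
    for i in range(n):
        dq = q(x + (i + 1) * h) - q(x + i * h)      # increment of the death probability
        total += math.exp(-delta * (i + 0.5) * h) * dq
    return total / sx

# Check against an exponential lifetime q(t) = 1 - exp(-lam*t):
lam, delta = 0.05, 0.03
q = lambda t: 1.0 - math.exp(-lam * t)
e30 = life_expectancy(q, 30.0)            # memorylessness: e_x = 1/lam = 20
A30 = whole_life_premium(q, 30.0, delta)  # closed form: lam/(lam + delta) = 0.625
```

The quadratures reproduce the closed-form values up to truncation error, which is the kind of "simple computation" a closed-form mortality function makes possible.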

Describing the idea with a toy SEM
To understand the structural approach that we use, let us consider a simple example of an SE model (SEM).
For simplicity, we assume that the SE of cohort c is a drifted Brownian motion:

X^c_t = x_c + μ_c t + σ_c W_t,   (2.1)

where W is a Wiener process and μ_c, σ_c, and x_c are unknown parameters. Although this model is too simple for use in practice, it is useful for understanding some of the advantages of the structural approach, which will be described in the sequel.
In this model, the hitting time distribution function (1.3) is known explicitly:

q_c(t, θ) = Φ((−x_c − μ_c t)/(σ_c √t)) + exp(−2 μ_c x_c/σ_c²) Φ((−x_c + μ_c t)/(σ_c √t)),   (2.2)

where θ = (μ_c, σ_c) and Φ is the standard normal distribution function; see Baukai (1990). If μ_c < 0, the distribution of τ^c is an inverse-Gaussian distribution with mean E[τ^c] = x_c/|μ_c| and variance x_c σ_c²/|μ_c|³.

The meaning of the parameters

• Initial SE: x_c is the initial SE of cohort c, which is common within the cohort.
• SE drift: μ_c is the average speed at which the SE changes; if μ_c < 0, the life expectancy of the cohort is given by E[τ^c] = x_c/|μ_c|.
• SE volatility: σ_c² can be a measure of sudden death, since the higher σ_c² is, the more easily the energy process hits zero.
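The closed form above is straightforward to implement; the following sketch evaluates it and checks the defective-mass limit for a positive drift (the parameter values are illustrative only):

```python
import math

def Phi(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def bsem_mortality(t, x0, mu, sigma):
    """First-hitting-time CDF to zero of X_t = x0 + mu*t + sigma*W_t (inverse-Gaussian type)."""
    if t <= 0.0:
        return 0.0
    s = sigma * math.sqrt(t)
    return Phi((-x0 - mu * t) / s) + math.exp(-2.0 * mu * x0 / sigma ** 2) * Phi((-x0 + mu * t) / s)

# Illustrative cohort parameters: negative drift, so death is certain (q -> 1 as t grows).
q80 = bsem_mortality(80.0, x0=50.0, mu=-0.6, sigma=4.0)
# For mu > 0 the distribution is defective: q(t) -> P(tau < inf) = exp(-2*mu*x0/sigma^2) < 1.
```

The function is increasing in t, tends to 1 when μ < 0, and tends to exp(−2μx_0/σ²) when μ > 0, matching the "a human could live forever" discussion below.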

Interpretation of the model
By estimating the parameters (μ_c, σ_c², x_c) and observing their changes year on year for a fixed cohort, we may be able to interpret "Why does a human die?"
• If the initial SE x_c is estimated to be high, it indicates that the corresponding cohort has a potential for longevity.
• If the estimated μ_c is increasing year on year, it indicates that humans in that cohort are living longer year after year. If μ_c ≤ 0 is estimated, then it implies that P(τ^c < ∞) = 1, which will be the usual case. However, it would be interesting if μ_c > 0 were estimated, which means that E[τ^c] = ∞: a human would live forever! Such a belief does exist in some areas of biological philosophy, and if it were true, we might be able to "prove" it statistically.
• If σ_c² is estimated to be high, it implies that there is a probability of "sudden death" because the SE process easily hits zero. If the estimated σ_c² is decreasing year on year, it indicates that sudden deaths are becoming less frequent, which might also indicate an improved quality of life (medical care and lifestyle).

A practical point of view
It would not be realistic to assume that the parameters μ_c and σ_c² are constant throughout life as in the previous model. Intuitively speaking, in the first stage of life (childhood and the teenage years), the death and survival rates are unstable due to high volatility. In the middle stage (20-40 years of age), μ_c may not be that high, although σ_c² is somewhat high. In the last stage, after retirement, μ_c and σ_c² decrease simultaneously; see Figure 1. In fact, the curve (2.2) cannot follow the empirical mortality function, which increases significantly in the last stage. Considering this, we shall consider a stochastic model whose drift and volatility depend on time.

Inhomogeneous diffusion SEM: Id-SEM
Hereafter, we assign a certain cohort as c and omit the index c.
On a filtered probability space (Ω, F, P, F = (F_t)_{t≥0}) with the usual conditions, we assume that the SE process of a certain cohort, say X^c = (X^c_t)_{t≥0}, satisfies a stochastic differential equation

dX^c_t = U(t, μ_c) dt + V(t, σ_c) dW_t,   X^c_0 = x_c,   (2.3)

where W is a Wiener process, U : [0, ∞) × Θ_μ → R and V : [0, ∞) × Θ_σ → R are measurable functions with Θ_μ ⊂ R^p, Θ_σ ⊂ R^q for p, q ∈ N, and we set θ = (x_c, μ_c, σ_c) ∈ Θ with Θ = Θ_x × Θ_μ × Θ_σ, where Θ_x ⊂ (0, ∞). We use such a time-inhomogeneous diffusion (Id) process as the SE model (SEM) in this study, and we call this model the Id-SEM. More specifically, we consider change point models, where the parameters in U and V have several change points in time.
In particular, the path is continuous since X^1_{T_1} = X^0_{T_1}. This mortality function is the same as the hitting time distribution function computed in Baukai (1990).

Brownian SEM with N change points
Let 0 = T_0 < T_1 < · · · < T_N < T_{N+1} = ∞ be change points and let the coefficients be piecewise constant:

U(t, μ_c) = Σ_{k=0}^{N} μ_{c,k} 1_{[T_k, T_{k+1})}(t),   V(t, σ_c) = Σ_{k=0}^{N} σ_{c,k} 1_{[T_k, T_{k+1})}(t),   (2.4)

where μ_c = (μ_{c,k})_{k=0,...,N} and σ_c = (σ_{c,k})_{k=0,...,N}. Then the corresponding SE process, say X^{c,N}, follows a piecewise (drifted) Brownian motion. This is equivalent to X^N = (X^N_t)_{t≥0} defined by the following recurrence system of stochastic equations: for k = 1, 2, . . . , N,

X^k_t = X^{k−1}_{T_k} + μ_{c,k}(t − T_k) + σ_{c,k}(W_t − W_{T_k}),   t ∈ [T_k, T_{k+1}),   (2.5)

with X^0 the drifted Brownian motion with coefficients (μ_{c,0}, σ_{c,0}) on [0, T_1). The mortality function of the N-BSEM is given by a recurrence-type formula in terms of the mortality functions of the processes X^k (k = 1, 2, . . . , N); see Ito and Shimizu (2019). However, it is difficult to use in practice since the mortality function is not explicit. In reality, 1- and 2-BSEMs are fitted to real-life mortality data, and an ad hoc modification of the mortality function is used due to theoretical and practical difficulties. To overcome these difficulties, we shall use a model that arises as a limit of the N-BSEM.
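Since the mortality function of the N-BSEM is not explicit, a Monte Carlo approximation on a time grid is a practical sanity check; the change points, drifts, and volatilities below are invented for illustration, not fitted values:

```python
import random

def nbsem_mortality_mc(t_grid, x0, change_pts, mus, sigmas, n_paths=4000, dt=0.25, seed=1):
    """Monte Carlo estimate of P(tau <= t) for a piecewise drifted Brownian SE process.
    change_pts = [T1, ..., TN]; mus[k], sigmas[k] are in force on [T_k, T_{k+1})."""
    rng = random.Random(seed)
    horizon = t_grid[-1]
    hits = [0] * len(t_grid)
    for _ in range(n_paths):
        x, t, k, tau = x0, 0.0, 0, None
        while t < horizon:
            if k < len(change_pts) and t >= change_pts[k]:
                k += 1                              # move to the next regime
            x += mus[k] * dt + sigmas[k] * dt ** 0.5 * rng.gauss(0.0, 1.0)
            t += dt
            if x <= 0.0:                            # energy exhausted: record death time
                tau = t
                break
        for i, u in enumerate(t_grid):
            if tau is not None and tau <= u:
                hits[i] += 1
    return [h / n_paths for h in hits]

# Two invented change points (T1 = 20, T2 = 60) with faster SE decay in old age:
q = nbsem_mortality_mc([20.0, 40.0, 60.0, 80.0, 100.0], x0=60.0,
                       change_pts=[20.0, 60.0],
                       mus=[-0.2, -0.4, -1.2], sigmas=[3.0, 2.0, 2.5])
```

The Euler grid introduces a small discretization bias in the hitting time, but the estimated curve is nondecreasing in t by construction, as a distribution function must be.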

As a limit process of the N-BSEM
In the N-BSEM with (2.4), suppose that there exist functions μ_c(t) and σ_c(t) such that the piecewise constant coefficients converge to them as N → ∞, with the mesh of the change points tending to zero. As is well known in stochastic analysis, it then holds that X^{c,N} converges to the Id-SEM with coefficients U(t) = μ_c(t) and V(t) = σ_c(t), uniformly on any compact set for t in probability (the ucp topology); see Protter (2004).
Although there are many studies on the hitting time distribution for such a general diffusion process, only a few models give explicit formulas for the corresponding mortality functions.

Explicit mortality functions
Suppose X^c satisfies

dX^c_t = U(t, μ_c) dt + V(t) dW_t,   X^c_0 = x_c,   (2.6)

and that there exists a constant κ_c ≠ 0 such that

V²(t) = 2 κ_c U(t, μ_c).   (2.7)

This special SEM gives us an explicit form of the mortality function, as follows.

Theorem 2.1 (Molini et al., 2011). Consider the mortality function of an Id-SEM satisfying the condition (2.7), and put

M_c(t) = ∫_0^t U(s, μ_c) ds.

Then

q_c(t) = Φ((−x_c − M_c(t))/√(2 κ_c M_c(t))) + exp(−x_c/κ_c) Φ((−x_c + M_c(t))/√(2 κ_c M_c(t))),

which reduces to (2.2) when U is a constant.

Consequently, we can construct a parametric family of mortality functions as

Q := { q_c( · , θ | S) : θ ∈ Θ },   (2.8)

where q_c(t, θ|S) := (q_c(t, θ) − q_c(S, θ))/(1 − q_c(S, θ)) is the conditional version of the function given in Theorem 2.1, and Θ ⊂ R^m with m := dim Θ. Figure 2 shows examples of (conditional) mortality functions of the inhomogeneous diffusion process (2.6), with coefficients as in Section 4 and parameters μ_c = (α_c, β_c, γ_c) = (−7.6, −0.009, 2.7) and κ_c = −0.002. The left curve is the unconditional version (S = 0) and the right one is conditional with S = 20, which will be used in the data analysis later. Compare these with, for example, Figure 3, the empirical mortality function of the Swedish 1830 birth-cohort.
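Numerically, the mortality function under condition (2.7) amounts to evaluating an inverse-Gaussian-type distribution in the transformed time M_c(t) = ∫_0^t U(s, μ_c) ds. The sketch below assumes exactly this form (an assumption consistent with the Brownian special case, which the final check verifies); the time-varying drift used at the end is made up for illustration and is not the paper's fitted Model C:

```python
import math

def Phi(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def idsem_mortality(t, x0, kappa, U, n=2000):
    """Hitting-time CDF to zero of dX_t = U(t) dt + sqrt(2*kappa*U(t)) dW_t, X_0 = x0,
    assuming the inverse-Gaussian form in the transformed time M(t) = int_0^t U(s) ds."""
    if t <= 0.0:
        return 0.0
    h = t / n
    M = h * (0.5 * U(0.0) + sum(U(i * h) for i in range(1, n)) + 0.5 * U(t))  # trapezoid
    v = 2.0 * kappa * M            # effective variance; positive for a decaying SE with kappa < 0
    if v <= 0.0:
        return 0.0
    s = math.sqrt(v)
    return Phi((-x0 - M) / s) + math.exp(-x0 / kappa) * Phi((-x0 + M) / s)

# Sanity check: with constant U = mu and kappa = sigma^2/(2*mu), this must reduce
# to the drifted-Brownian-motion formula.
mu, sigma, x0, t = -0.6, 4.0, 50.0, 80.0
kappa_bm = sigma ** 2 / (2.0 * mu)
s_bm = sigma * math.sqrt(t)
ref = Phi((-x0 - mu * t) / s_bm) + math.exp(-2.0 * mu * x0 / sigma ** 2) * Phi((-x0 + mu * t) / s_bm)
got = idsem_mortality(t, x0, kappa_bm, lambda u: mu)

# An invented time-varying drift (NOT the paper's fitted Model C):
q80 = idsem_mortality(80.0, x0=5.0, kappa=-0.5, U=lambda u: -(0.05 + 0.002 * u))
```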

Empirical conditional mortality function
Since we cannot observe the path of X^c in reality, estimating the unknown parameters in the SE process is not straightforward. However, we can "in principle" see many observations of the time of death τ^c:

τ^c_1, τ^c_2, . . . , τ^c_{n_c},

where τ^c_i is the time of death of the ith human and n_c is the population of cohort c. Let

q_c(t|S) := P(τ^c ≤ t | τ^c > S)

be the true conditional mortality function for an S-year-old in cohort c. If we observed the actual times of death τ^c_i (i = 1, 2, . . . , n_c), we could compute the empirical version of q_c( · |S) as

q̂_c(t|S) = [Σ_{i=1}^{n_c} 1{S < τ^c_i ≤ t}] / [Σ_{i=1}^{n_c} 1{τ^c_i > S}].

Although it might be hard to obtain such individual data, we can construct the empirical version q̂_c(t|S) using life tables from the Human Mortality Database (HMD) (https://www.mortality.org/) as follows. Let q^{(c)}_x be the mortality rate for one year of age x in "calendar year" c, and let q^c_x be the mortality rate for one year of age x with "birth year" c (cohort). We note that the life table of calendar year c (from HMD) gives q^{(c)}_x for x = 0, 1, . . . , ω, where ω is the final age in the life table. Then we have that q^c_k = q^{(c+k)}_k for k = 2, 3, . . . , ω (e.g., ω = 110 in HMD). From these, we can compute the empirical mortality function

q̂_c(t|S) = 1 − Π_{k=S}^{t−1} (1 − q^c_k),   t = S + 1, S + 2, . . . , ω.   (3.1)

We show in Figure 3 an example of the unconditional/conditional mortality functions of the Swedish 1830 birth-cohort made from the HMD. The unconditional curve seems to have two points of inflection, which is hard to fit by our parametric family (2.8). However, if we take, for example, S = 20, we obtain a conditional curve with one point of inflection, which can be fitted by a mortality function generated by our diffusion models.
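Assembling the empirical conditional mortality function from cohort-wise one-year death probabilities is a cumulative product of survival probabilities. A sketch with fabricated rates (not real HMD data):

```python
def empirical_mortality(qx, S):
    """qx[k] = one-year death probability at age k for a birth cohort (k = 0, 1, ...).
    Returns {t: P(tau <= t | tau > S)} for integer ages t = S+1, ..., len(qx)."""
    surv = 1.0
    out = {}
    for k in range(S, len(qx)):
        surv *= 1.0 - qx[k]        # survival probability from age S to age k + 1
        out[k + 1] = 1.0 - surv
    return out

# Fabricated one-year rates, increasing with age (not real HMD data):
qx = [0.01] * 20 + [0.002 + 0.0004 * k for k in range(70)]
q20 = empirical_mortality(qx, S=20)   # conditional on surviving to age 20
```

In practice, the qx list for a birth cohort c would be assembled diagonally from HMD period tables via q^c_k = q^{(c+k)}_k.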

Least squares estimation for hitting time distribution
Consider a parametric model of the mortality function q_c(t) = P(τ^c ≤ t):

{ q_c(t, θ) : θ ∈ Θ },

where Θ ⊂ R^m is a bounded open set, and suppose that there exists a true value of the parameter θ_0 ∈ Θ such that q_c(t) = q_c(t, θ_0) for all t ≥ 0. The conditional version

q_c(t, θ|S) := (q_c(t, θ) − q_c(S, θ))/(1 − q_c(S, θ))   (3.2)

is a parametric model for P(τ^c ≤ t | τ^c > S). By the Glivenko-Cantelli theorem in the classical theory of statistics, it is easy to see for q̂_c(t|S) with the expression (3.1) that

sup_{t > S} |q̂_c(t|S) − q_c(t|S)| → 0 a.s. (n_c → ∞)   (3.3)

if q_c(S) < 1. Therefore, we shall estimate the parameter θ_0 by fitting our parametric model to the empirical version in terms of the least squares error.
Definition 3.1. For a given S with 0 < q_c(S) < 1 and S < t_1 < t_2 < · · · < t_d ≤ ω, the least squares estimator (LSE) of θ_0 is defined as

θ̂_n := argmin_{θ ∈ Θ̄} Σ_{i=1}^{d} ( q̂_c(t_i|S) − q_c(t_i, θ|S) )²,

where Θ is an open bounded subset of R^m and Θ̄ is its closure. The index n := n_c is only implicit in the estimator because, in practice, the empirical function q̂_c(t_i|S) is computed from life tables as in (3.1).
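The minimization in Definition 3.1 can be carried out by any numerical optimizer. As a self-contained illustration, the sketch below fits the Brownian toy model by a crude grid search over (μ_c, σ_c) with x_c fixed, recovering known parameters from a synthetic empirical curve; a real application would use the Id-SEM family and a proper optimizer:

```python
import math

def Phi(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def q_bsem(t, x0, mu, sigma):
    s = sigma * math.sqrt(t)
    return Phi((-x0 - mu * t) / s) + math.exp(-2.0 * mu * x0 / sigma ** 2) * Phi((-x0 + mu * t) / s)

def q_cond(t, S, x0, mu, sigma):
    """Conditional mortality function P(tau <= t | tau > S) for the Brownian toy model."""
    qS = q_bsem(S, x0, mu, sigma)
    return (q_bsem(t, x0, mu, sigma) - qS) / (1.0 - qS)

def lse_fit(emp, S, x0, mu_grid, sigma_grid):
    """emp = {t_i: empirical q_c(t_i | S)}; minimize the sum of squared errors on a grid."""
    best = None
    for mu in mu_grid:
        for sigma in sigma_grid:
            err = sum((v - q_cond(t, S, x0, mu, sigma)) ** 2 for t, v in emp.items())
            if best is None or err < best[0]:
                best = (err, mu, sigma)
    return best[1], best[2]

# Synthetic "empirical" curve from known parameters, then recovered by the grid search:
x0, mu0, sig0, S = 100.0, -1.3, 6.0, 20.0
emp = {t: q_cond(t, S, x0, mu0, sig0) for t in range(30, 101, 10)}
mu_hat, sig_hat = lse_fit(emp, S, x0,
                          [round(-2.0 + 0.1 * i, 1) for i in range(15)],
                          [round(4.0 + 0.5 * i, 1) for i in range(9)])
```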
Theorem 3.1. Suppose that q_c(t, ·|S) belongs to C¹(Θ̄) for each t and S. Moreover, for a given S > 0, suppose the following identifiability condition holds true:

q_c(t_i, θ|S) = q_c(t_i, θ_0|S) for all i = 1, . . . , d  ⟹  θ = θ_0.   (3.4)

Then, the least squares estimator θ̂_n is weakly consistent: θ̂_n →^P θ_0 as n_c → ∞.

Proof. Note that θ̂_n is an M-estimator for the contrast function

Ψ_n(θ) := Σ_{i=1}^{d} ( q̂_c(t_i|S) − q_c(t_i, θ|S) )²,

and that, by (3.3), it holds for each θ ∈ Θ̄ that

Ψ_n(θ) →^P Ψ(θ) := Σ_{i=1}^{d} ( q_c(t_i|S) − q_c(t_i, θ|S) )².

According to M-estimation theory (e.g., van der Vaart 1998, Theorem 5.7), the proof of consistency is complete if we show the following (a) and (b):

(a) sup_{θ ∈ Θ̄} |Ψ_n(θ) − Ψ(θ)| →^P 0;
(b) inf_{θ : |θ − θ_0| ≥ ε} Ψ(θ) > Ψ(θ_0) for any ε > 0.

Note that (b) is clear with Ψ(θ_0) = 0 by the identifiability condition, so we now show (a). We regard Ψ_n as a random map Ψ_n : Ω → (C(Θ̄), ‖·‖), where ‖·‖ is the sup norm over Θ̄, by extending the domain of Ψ_n to Θ̄ by continuity. Then, the uniform tightness of the C(Θ̄)-valued random element Ψ_n implies a weak convergence Ψ_n →^D Ψ in C(Θ̄), which is equivalent to (a). A well-known tightness criterion reduces to the boundedness of Ψ_n and ∂_θ Ψ_n, and this is clear since q̂_c and q_c are bounded by 1 and ∂_θ q_c is bounded on Θ̄ by the assumption. Hence, the proof is complete.
Theorem 3.2. Suppose the same assumptions as in Theorem 3.1 hold. Moreover, suppose q_c(t, ·|S) ∈ C²(Θ̄) and that Θ is a convex subset of R^m. Then,

√(n_c) (θ̂_n − θ_0) →^D N(0, Σ)  (n_c → ∞),

where the variance-covariance matrix Σ = (σ_{ij})_{1≤i,j≤m} is determined by the derivatives ∂_θ q_c(t_i, θ_0|S) (i = 1, . . . , d) and the limiting covariance of the empirical process, as identified in the proof below.

Proof. We use the notation ∂_θ = (∂_{θ_1}, . . . , ∂_{θ_m}) and ∂²_θ = ∂_θ ∂_θ^T, as well as the same notation as in the proof of Theorem 3.1. Moreover, for simplicity, we suppose that the M-estimator θ̂_n belongs to the interior of the parameter space, as usual. Then we can assume, by the usual argument, that ∂_θ Ψ_n(θ̂_n) = 0 and that det ∂²_θ Ψ_n(θ) ≠ 0 around θ_0 without loss of generality, because the probability of the complements of these events vanishes as n_c → ∞.

Under the above assumptions, the mean value theorem yields

√(n_c) (θ̂_n − θ_0) = −[∂²_θ Ψ_n(θ*_n)]^{−1} √(n_c) ∂_θ Ψ_n(θ_0)

for some θ*_n between θ_0 and θ̂_n. Note that q̂_c(t_i|S) is the empirical distribution function, consistent with q_c(t_i|S), where F is the distribution function of the τ^c_i. It then follows from a Donsker-type theorem that, jointly for i = 1, 2, . . . , d,

√(n_c) ( q̂_c(t_i|S) − q_c(t_i|S) ) →^D N(0, V)

for a covariance matrix V determined by F, and therefore

√(n_c) (θ̂_n − θ_0) →^D N(0, Σ),

where Σ = (σ_{ij})_{1≤i,j≤m} is as described in the statement. Moreover, since θ*_n →^P θ_0 by the consistency of θ̂_n, we see from the continuous mapping theorem that the limit holds. This completes the proof.

Id-SEM with the exact mortality function
Here, we apply the inhomogeneous diffusion process (2.6) under condition (2.7) as the Id-SEM process in order to obtain explicit mortality functions. We use life tables of several countries from the HMD (https://www.mortality.org/) and estimate the model parameters as described in Section 3. We choose three models, A-C, as candidates for the drift coefficient U(t, μ_c) of the diffusion model with parameter μ_c = (α_c, β_c, γ_c), and the diffusion coefficient is given by V²(t) = 2 κ_c U(t, μ_c). We apply the least squares method described in the previous section with S = 20 to each model. Here, Model A is the toy model described in Section 2.1; we can expect that its mortality function will not show the rapid increase at old ages, say after 50 years. Model B improves on this disadvantage of Model A with a polynomial growth term after a change point T. Model C is an exponential growth model.
Throughout the analyses in this section, we fix the initial value x_c = 1000 to satisfy the identifiability condition (3.4) in Theorem 3.1. Although we can take x_c arbitrarily, we chose x_c = 1000 so that the order of the estimated values of the other parameters is numerically appropriate. We note that, for any x_c, there is some μ_c that gives the same fit in terms of the least squares error in our model; that is, the initial value x_c is not essential in this SEM. Thus, the parameter θ = (α_c, β_c, γ_c, κ_c) is to be estimated and predicted for the future model.
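The scale indeterminacy behind fixing x_c is easy to check numerically in the Brownian toy model: multiplying (x_c, μ_c, σ_c) by any λ > 0 leaves the hitting-time law unchanged, so x_c can be normalized arbitrarily (the values below are illustrative):

```python
import math

def q_bsem(t, x0, mu, sigma):
    """Hitting-time CDF to zero of x0 + mu*t + sigma*W_t."""
    Phi = lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    s = sigma * math.sqrt(t)
    return Phi((-x0 - mu * t) / s) + math.exp(-2.0 * mu * x0 / sigma ** 2) * Phi((-x0 + mu * t) / s)

lam = 12.5
for t in (10.0, 40.0, 80.0):
    base = q_bsem(t, 80.0, -1.0, 5.0)
    scaled = q_bsem(t, 80.0 * lam, -1.0 * lam, 5.0 * lam)
    assert abs(base - scaled) < 1e-12   # identical law after rescaling all three parameters
```

The same invariance holds for the Id-SEM, since only the ratios of x_c to the drift and volatility enter the hitting-time distribution.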

Swedish mortality data
Swedish mortality data are well suited to testing predictions for the distant future because the HMD (https://www.mortality.org/) has long-term life tables (20-110 years) from 1751 to 2016. We apply our Id-SEM to the 1801-1830 birth-cohorts and try to predict the mortality functions of cohorts 10, 30, 50, and 70 years later.
First, we shall estimate the parameters from the historical mortality functions with the change point T = 50. That is, assuming that we are now in 1940, we estimate the parameters of, for instance,

• the 1830 birth-cohort by the life tables of 1850-1940,

and similarly for the other cohorts. Figure 4 shows an example of the estimated curves and the least squares errors of Models A-C for the c = 1830 birth-cohort. As expected, Model A does not follow the empirical curves. Model B performs better than Model A, but Model C is the best in terms of the mean squared error over all cohorts (1801-1830). Hence, we use the exponential growth model (Model C) to predict the future mortality functions. The results of all the parameter estimates in Model C are given in Figure 5. From the plots of the estimated values, we use linear regression to predict the value of γ_c, and exponential regression functions for α_c, β_c, and κ_c, because they seem to be increasing while their signs should remain negative. We predict the future parameters using these regression functions and show the prediction curves 10, 30, 50, and 70 years ahead in Figure 6, with the prediction (LSE) errors in Table 1.
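The parameter-trend extrapolation can be sketched as follows: for a series of estimates that stays negative while drifting toward zero, fit a line to log(−value) by ordinary least squares and exponentiate back; the cohort series below is synthetic, not the estimates of Figure 5:

```python
import math

def fit_negative_exponential(years, values):
    """Fit value_c ~ -exp(a + b*c) by ordinary least squares on log(-value)."""
    ys = [math.log(-v) for v in values]
    n = len(years)
    mx = sum(years) / n
    my = sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(years, ys)) / sum((x - mx) ** 2 for x in years)
    a = my - b * mx
    return a, b

# Synthetic cohort series: negative, shrinking toward zero (not the estimates of Figure 5).
years = list(range(1801, 1831))
alpha = [-math.exp(2.0 - 0.01 * (c - 1800)) for c in years]
a, b = fit_negative_exponential(years, alpha)
alpha_1880 = -math.exp(a + b * 1880)   # extrapolated value for the 1880 cohort
```

Fitting on the log scale keeps every extrapolated value strictly negative, which is the sign constraint mentioned above for α_c, β_c, and κ_c.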
From the results, we observe that our Id-SEM can predict future mortality very well, even 50 years ahead, though a small gap appears after 70 years.

Dutch mortality data
Next, we apply our Models A-C to the Dutch data of the 1841-1870 birth-cohorts. In this example, we chose S = 20 and the change point T = 60, which was selected so that the least squares fit of the models is as good as possible. According to Figure 7, with the table of mean squared errors, Model C is the best for the 1870 cohort; the same held for the other cohorts of 1841-1869. From the results of the estimated parameters for the 1841-1870 birth-cohorts (see Figure 8), we predict the future parameters using nonlinear regression as in the previous section, and we show the mortality functions 10, 20, and 30 years after 1870 together with the obtained (empirical) mortality data in Figure 9, with the prediction errors in Table 2. We also see in this example that our model can predict future mortality very well.

Japanese mortality data
We next apply our models to Japanese mortality data.
As Japan currently has a rapidly aging population, it is very important to predict the future mortality function, especially at old ages. Since there are not many Japanese mortality data in the HMD (https://www.mortality.org/), we cannot confirm the accuracy of our predictions for the distant future. However, given the good prediction results for the Swedish and Dutch data, we expect that our model would yield good predictions for the Japanese data too.
We use the Japanese mortality data from 1947 to 2017. First, we estimate the parameters for the 1897-1907 birth-cohorts (11 cohorts) with S = 50 and T = 70, up to age 90, and predict the mortality from age 50 for the cohorts 10-40 years after 1907. Figure 10 shows the least squares fits of SEMs A-C at old ages for the 1907 birth-cohort. We also observe that Model C always gives a better fit to the empirical data for 1897-1906. The results of the parameter estimation and nonlinear regression for each cohort are given in Figure 11. Based on these, the predicted mortality functions are plotted in Figure 12, with the prediction errors in Table 3. The predicted curve in each cohort seems to fit the observed mortality, although the actual data run only up to 2017. We observe that the death rate at each age is decreasing from cohort to cohort, which indicates that Japanese society will continue to age.

English and Welsh mortality data
Lastly, we use data from England and Wales, for which we find a different feature in the transition of the parameters. We tried a preliminary analysis with Model C for the data of the 1841-1920 birth-cohorts; see Figure 13. We observed a "structural change" around 1890, though we do not know the reason for it. Therefore, we use the data of the 1901-1920 birth-cohorts (20 cohorts) for the least squares fitting of the mortality function with S = 20 and T = 50, up to age 90, and predict the mortality from age 20 for the cohorts 10-30 years after 1920. (In the plots of the estimated parameters and their (non)linear regressions for c = 1901-1920, the horizontal axis is t, where c = 1900 + t.)
The results of fitting Models A-C for the 1920 birth-cohort are given in Figure 14. We also observe that Model C again gave a better fit to the data for 1901-1920, and the results of its parameter estimation and nonlinear regression for each cohort are given in Figure 15.
Remark 4.1. In order to capture the "structural change" around the 1890 birth-cohort, we also tried to predict the parameter changes by a time series ARIMA model (with model selection by AIC) over the whole period 1841-1920. However, the prediction results were very poor. Hence, we only show the results of the nonlinear regression in this paper. How to predict such structural changes is an important issue for future work.
We predict the mortality functions from age 20 for the future cohorts c = 1930 (10 years later), 1940 (20 years later), and 1950 (30 years later). The mean squared errors of the predictions are given in Table 4. We find that Model C is again the better choice for prediction, and the predicted mortality functions are given in Figure 16. Although the predicted curve is slightly higher than the actual one around age 70, it seems to fit the data in each cohort very well. As in the previous results for Japan, we again observe that the death rate at each age is decreasing across cohorts, which indicates that English society will also continue to age.

Comparison with the Lee-Carter model
Finally, we close this section by comparing our model with the classical Lee-Carter model given in (1.1). Although this comparison may be unfair, since the Lee-Carter model (1.1) does not include cohort effects, we attempt it because the Lee-Carter model is the most standard model in practice. For the prediction by the Lee-Carter model (1.1), we use the same procedure as in Lee and Carter (1992) to estimate and predict the parameters, together with the life tables of 1901-1940, which are sufficient for prediction by this model since it does not consider cohort effects.
The results are given in Figure 17. The black dots show the empirical mortality function (actual), the red solid lines the mortality functions predicted by our Id-SEM, and the blue dots the results of the Lee-Carter model. We find that the SEM gives better predictions than the Lee-Carter model in all cohorts, especially at older ages.

Summary
In this study, we proposed a new approach to cohort-wise mortality prediction. We assumed an SE process for the people in each cohort and defined the time of death as the first hitting time τ^c of the SE process to the zero level. We call the distribution function of τ^c the mortality function, which is useful for computing many actuarial quantities and for longevity analysis.
This study explains a human's death in terms of survival energy, but this does not mean that we claim the existence of such an energy. We claim that some hitting time distributions of diffusion processes can give a good parametric family for fitting empirical mortality functions and predicting their evolution. Indeed, we proposed a special inhomogeneous diffusion (Id) process as in Section 2.4, which gives an explicit mortality function, and showed in Section 4.1 that it yields surprisingly good predictions for the distant future if we choose the drift and diffusion coefficients appropriately.
Although we have tried, in some sense, ad hoc models for fitting and predictions, our structural approach is worth using and studying further for mortality prediction.

Model selection and predictions
There are many things to be considered from the perspective of modeling.
We fixed the change point T of the drift coefficient in the Id-SEM for all the cohorts in a country. However, T may depend on the cohort in practice, and we need to investigate the estimation and prediction of T from the data. Moreover, we should take S in (3.1) somewhat high, since the mortality function for the youngest ages, say 0-5 years old, is not likely to be the hitting time distribution function of a diffusion process; compare Figures 2 and 3 (S = 0). Therefore, prediction for younger ages is an important open problem.
How to choose T (as well as S) is a problem of model selection since it affects the coefficients of the SE processes. Moreover, we should compare several competing models with different coefficients. A standard procedure for model selection is based on information criteria. However, we did not discuss the information criteria in this study because of several theoretical problems.
We estimated the parameters of the SEM by the least squares method, for which the Akaike information criterion (AIC) and the Bayesian information criterion (BIC) are not available, since they must be based on maximum likelihood estimators. For example, from a theoretical (statistical) perspective, AIC cannot be applied even to the Lee-Carter model because the consistency of the estimators of the Lee-Carter parameters is not ensured; see Remark 3.1. In our case, we would need to construct generalized information criteria (see Konishi and Kitagawa 1996), which would not be easy since our data on the τ_i are restricted: in the HMD (https://www.mortality.org/), we cannot obtain the actual value of each τ_i, but only the (approximate) empirical mortality functions manufactured from the data of death probabilities. This could be improved if the mortality database were arranged in more detail. Otherwise, many technical (mathematical) issues remain to be solved under our limited sampling situation. This would be an important issue for future studies.
In the prediction of parameter changes, we used nonlinear regressions for simplicity. However, this might be inadequate because the estimated parameters are sometimes unstable and their trends do not always follow the selected nonlinear functions. For the prediction of parameters, it might be possible to use machine learning methods such as Gaussian process regression, among others. If we could propose a kind of confidence interval for the future parameters, then we could give a confidence interval for the predicted mortality function. Currently, we can only give confidence intervals for estimated mortality functions via the asymptotic normality of the estimators (Theorem 3.2).

Some directions of model extension
Future studies will have to extend our SEM in several directions: one is to make the coefficients of the SDE (2.3) more general, for example, state-dependent or random coefficients; another is to introduce "jumps" into the SE process, via jump-diffusions or Lévy processes, since the SE could jump if there is a disease or an accident in a person's life.
In those cases, we face the problem that the hitting time distribution q_c is usually not easy to use. In some special cases, we could use implicit results, for example, by Hao et al. (2013) or Abbring (2012), where an integral-type equation or the Laplace transform of the hitting time distribution is available. However, in most cases, the mortality function is not explicit. We then face problems in statistical inference, which could be computationally intensive in the end.
One practicable and strong candidate is the inverse Gaussian process. For example, Ye and Chen (2014) investigate this process in the context of modeling the degradation of products in random environments. Since "degradation" is an accumulation of damage over time that ultimately leads to a system failure when it crosses a certain threshold, the motivation is very similar to our SE hypothesis. In the inverse Gaussian model, the hitting time distribution can be written in explicit form, so it is a strong candidate for the SEM.
Although Ye and Chen (2014) also investigate the procedures of parameter estimation, there is an important difference between their model and ours.
That is, we cannot observe the SE process directly, while they can observe the degradation process. This makes our problem harder in terms of many statistical issues and model selection. We have proposed one possible strategy in this paper, but further study is required on those points, as noted in Section 5.2.1.