Hostname: page-component-8448b6f56d-c47g7 Total loading time: 0 Render date: 2024-04-18T20:17:56.571Z Has data issue: false hasContentIssue false

A unified approach to mortality modelling using state-space framework: characterisation, identification, estimation and forecasting

Published online by Cambridge University Press:  22 May 2017

Man Chung Fung*
Affiliation:
Data61, CSIRO, Australia
Gareth W. Peters
Affiliation:
Department of Statistical Science, University College London, 1–19 Torrington Place, WC1E 7HB, United Kindom Oxford Mann Institute, Oxford University Systemic Risk Center, London School of Economics
Pavel V. Shevchenko
Affiliation:
Department of Applied Finance and Actuarial Studies, Macquarie University, NSW 2109, Australia
*
*Correspondence to: Man Chung Fung, Data61, CSIRO, Pembroke Road, Marsfield, NSW 2122, Australia. Tel: (612) 94905873. E-mail: simon.fung@csiro.au
Rights & Permissions [Opens in a new window]

Abstract

This paper explores and develops alternative statistical representations and estimation approaches for dynamic mortality models. The framework we adopt is to reinterpret popular mortality models such as the Lee–Carter class of models in a general state-space modelling methodology, which allows modelling, estimation and forecasting of mortality under a unified framework. We propose alternative model identification constraints which are more suited to statistical inference in filtering and parameter estimation. We then develop a class of Bayesian state-space models which incorporate a priori beliefs about the mortality model characteristics as well as for more flexible and appropriate assumptions relating to heteroscedasticity that present in observed mortality data. To study long-term mortality dynamics, we introduce stochastic volatility to the period effect. The estimation of the resulting stochastic volatility model of mortality is performed using a recent class of Monte Carlo procedure known as the class of particle Markov chain Monte Carlo methods. We illustrate the framework using Danish male mortality data, and show that incorporating heteroscedasticity and stochastic volatility markedly improves model fit despite an increase of model complexity. Forecasting properties of the enhanced models are examined with long-term and short-term calibration periods on the reconstruction of life tables.

Type
Papers
Copyright
© Institute and Faculty of Actuaries 2017 

1. Introduction

An ageing population is a major challenge that many countries are facing today. The problem arises from the fact that fertility rates are declining while life expectancy has been increasing in the past several decades without any sign of slowing down. The adverse financial outcome of people living longer than expected, and hence the possibility of outliving their retirement savings, is known as longevity risk. This long-term demographic risk has significant implications for societies and manifests as a systematic risk for pension plans and annuity providers. Policymakers rely on mortality projection to determine appropriate pension benefits and to understand the costing of different economic assumptions and regulations regarding the age of retirement of a given population. For instance, in the United Kingdom and Australia defined-benefit pension plans prior to 2000s had limited exposure to effects of longevity risk since high equity returns on pension fund wealth management portfolios were masking the impact of longevity risk, however, post-2000 declining equity returns coupled with record low interest rate financial environments has demonstrated the significance of decades of longevity improvements, posing a very real problem for pension schemes. Furthermore, by regulation, insurers who offer retirement income products are required to hold additional reserving capital to cover longevity risk. A key input to address longevity risk is the development of advanced mortality modelling methodology, so that human longevity can be predicted with better accuracy and any uncertainties can be accounted for in mortality forecasting.

Since the introduction of the Lee–Carter (LC) model (Lee & Carter, Reference Lee and Carter1992), a range of stochastic mortality models have been proposed in the literature. Renshaw & Haberman (Reference Renshaw and Haberman2003) and Renshaw & Haberman (Reference Renshaw and Haberman2006) introduce multiple period effects and cohort effect to capture the change of mortality with respect to year and year-of-birth, respectively, to the LC model. Cairns et al. (Reference Cairns, Blake and Dowd2006) proposed a two-factor period effect mortality model, known as the Cairns–Blake–Dowd (CBD) model, for pensioner ages. A cohort extension of the CBD model was studied in Cairns et al. (Reference Cairns, Blake, Dowd, Coughlan, Epstein, Ong and Balevich2009). Plat (Reference Plat2009) draws on the strengths of the LC model and the CBD model to produce an age–period–cohort model that includes a term to capture young mortality dynamics. In these well-known cases it is common practice in actuarial settings to estimate stochastic mortality models based on a singular value decomposition (SVD) approach (Lee & Carter, Reference Lee and Carter1992; Koissi et al., Reference Koissi, Shapiro and Hognas2006) or via a maximum likelihood-based approach if a discrete Poisson regression setting is considered (Brouhns et al., Reference Brouhns, Denuit and Vermunt2002; Cairns et al., Reference Cairns, Blake, Dowd, Coughlan, Epstein, Ong and Balevich2009).

A common feature of the estimation methods adopted in the frameworks mentioned above is that the dynamics of the period effect, the stochastic latent processes, are not directly incorporated into joint parameter and state estimation, and instead form a component of a second stage of estimation. Typically this involves specifying a model for the period effect for forecasting purpose only after an estimation is performed. Such approaches often suffer from a statistical lack of efficiency compared to methods that perform joint static model parameter estimation and latent process filtering. Hence, the first argument we make is that recasting different classes of mortality models in a state-space formulation can better facilitate state-space-based inference under either frequentist or Bayesian estimation. This is especially true in the case that the inference is performed jointly on the latent process and static model parameters, rather than in a less statistically efficient two-stage procedure.

We note that in the classical actuarial approach that utilises the two-stage procedure, a wide variety of model choices may be adopted to model the latent stochastic dynamics such as period effect, cohort effect, etc. These models may be time series models such as random walks with drift, seasonal autoregressive integrated moving average (ARIMA) model, long memory processes such as generalized autoregressive moving average (GARMA) model or stochastic volatility model such as generalized autoregressive conditional heteroscedasticity (GARCH) model, etc. It is upto the modeller to perform statistical estimation, model selection and model criticism stages such as via Box–Jenkins type procedures to select an appropriate model choice. In the one-stage estimation framework of the state-space model, exactly the same flexible classes of model are also admissible for the state equation, so there is no loss in model generality when working with the state-space formulation. As we will discuss in linear Gaussian state-space choices such as under seasonal ARIMA (SARIMA) class of models there are optimal state-space estimators based on Kalman filters, but in more general classes of non-linear time series models one can use Sequential Monte Carlo (SMC) filtering as discussed below. Note also that one can always carry out model diagnostics in the state-space framework to decide whether the model proposed is suitable given the data. In addition, there are various formal statistical criteria designed for model ranking as well as model selection which are applicable for the one-stage procedure in state-space setting.

Typically the studies carried out in practice and in the literature have the feature that only mortality data from the past several decades is considered. For many countries, age-specific death rates are evolving rather smoothly except for some potential change of trend in the last 50 years or so in some developed countries. Besides ARIMA models, structural change model have been proposed to take into account the trend-changing behaviour of the period effect (Li et al., Reference Li, Chan and Cheung2011; van Berkum et al., Reference van Berkum, Antonio and Vellekoop2016). Despite this, the implication of including earlier periods that exhibit significant volatility of mortality, which can be attributed to some life-critical events such as wars and epidemics, is still not yet being investigated. The ability to incorporate such structural information into a mortality model is greatly facilitated when recasting the model in a state-space formulation. Furthermore, extensions to mortality models that can also be facilitated in a state-space formulation are increasingly able to be considered and may better explain the stochastic dynamics of such processes. These include features such as time-varying volatility; cross-sectional volatility between different age groups; extremal dependence features; cohort effects; structure breaks in regimes; long memory or persistence in mortality features in different age groups; cointegration and non-stationarity features; as well as regression-based structures that decompose mortality according to categorical features such as official death causes, regional categories, etc.

Moreover, additional stochastic factor models such as two- and three-factor models can be easily considered. This can be particularly relevant when modelling features such as trends in excess mortality in particular age groups resulting from disease epidemics (Zucs et al., Reference Zucs, Buchholz, Haas and Uphoff2005; Dawood et al., Reference Dawood, Iuliano, Reed, Meltzer, Shay, Cheng, Bandaranayake, Breiman, Brooks, Buchy, Feikin, Fowler, Gordon, Hien, Horby, Huang, Katz, Krishnan, Lal, Montgomery, Molbak, Pebody, Presanis, Razuri, Steens, Tinoco, Wallinga, Yu, Vong, Bresee and Widdowson2012), cold and heat waves (Fouillet et al., Reference Fouillet, Rey, Laurent, Pavillon, Bellec, Guihenneuc-Jouyaux, Clavel, Jougla and Hémon2006; Analitis et al., Reference Analitis, Katsouyanni, Biggeri, Baccini, Forsberg, Bisanti, Kirchmayer, Ballester, Cadum, Goodman, Hojs, Sunyer, Tittanen and Michelozzi2008) and other effects such as medical impairments, occupational hazards, hazardous persuites, geographical location of residence and ethnic origin, see Eloranta et al. (Reference Eloranta, Lambert, Andersson, Czene, Hall, Björkholm and Dickman2012) and England & Haberman (Reference England and Haberman1993). Hence, the second argument we make is that all these different model structures can readily be encoded in state-space model structures. Furthermore, they can be consistently combined in joint estimation procedures in such state-space model structures in either frequentist and Bayesian formulations, whilst also admitting consistent joint forecasting models for predictive purposes.

A variety of state-space model approaches exist in a range of different literatures, in this paper we propose to begin with the widely adopted frameworks typically introduced in state-space modelling settings in Harvey (Reference Harvey1989) or West & Harrison (Reference West and Harrison1997), which we develop to address some of the aforementioned issues. In contrast to the SVD and Poisson regression estimation approaches where the period effect is treated as parameter without any temporal structure in the first-stage estimation, period effect is regarded as a latent process with a Markovian structure under the state-space approach. In other words, a state-space formulation permits modelling, estimation and forecasting of mortality under a unified framework. Recent progress in sampling-based techniques has allowed statistical inference to be conducted on sophisticated state-space models that can incorporate multiple latent-driving factors which may exhibit non-linear and non-Gaussian stochastic dynamics. We take advantage of this development and utilise realistic model to capture the long-term volatility structure of mortality time series.

Pedroza (Reference Pedroza2006) and Kogure & Kurachi (Reference Kogure and Kurachi2010) consider Bayesian estimation of the LC model in state-space form. A maximum likelihood approach is studied in De Jong & Tickle (Reference De Jong and Tickle2006). Here, we extend such frameworks to show how to adopt a combination of filtering procedures with Rao-Blackwellisation to obtain gradient-based Fisher score equation recursions to accurately and efficiently perform optimal filtering of the latent state process, in the sense of mean square error minimisation, and recursive least squares estimation for the static model parameters jointly in a recursive manner. Furthermore, we extend such state-space models to incorporate non-linear and non-Gaussian features in the state-space structure that no longer admit simple Kalman filter forward-backward algorithm recursions, leading us to more cutting edge filtering techniques based on SMC methods. In this regard, we estimate and examine the LC model with heteroscedasticity using both gradient-based maximum likelihood and Bayesian analysis. Alternative models that have tried to include such features include, for example, the Poisson regression in Brouhns et al. (Reference Brouhns, Denuit and Vermunt2002) and Czado et al. (Reference Czado, Delwarde and Denuit2005), who aimed to replace the homogeneous additive error term in the LC model by a Poisson error structure. Also, we note a recently developed framework for modelling death counts with common latent risk factors via credit risk plus methodology with model estimation via Markov chain Monte Carlo (MCMC) in Hirz et al. (Reference Hirz, Schmock and Shevchenko2017). However, we argue that the state-space formulation allows heteroscedasticity to be accounted for in a more straightforward manner.

Note that Pedroza (Reference Pedroza2006) also considers a LC model with heteroscedasticity structure. Our approach in this paper, however, is more general compared to Pedroza (Reference Pedroza2006) as we explain in details the machinery behind the gradient-based maximum likelihood estimation and sampling-based inference, not confining to the restricted class of linear and Gaussian state-space models as previous studies focus on. Such an effort is important especially for researchers and practitioners who are interested in more realistic and sophisticated models which would fall into the class of non-linear and non-Gaussian models, while not wanting to be handicapped by the complications arising from model estimations.

Through reformulation and extensions of the LC type mortality models in a state-space model structure, we investigate several key properties observed in mortality data. First, the cross-sectional variance–covariance matrix between age group structures is non-homogeneous. Second, examination of mortality data over a long period indicates that volatility of the evolution of death rates is not constant, that is, heteroscedasticity is present. We show that the incorporation of a second stochastic volatility latent factor will allow us to identify the periods in which mortality demonstrates heightened volatility. This will aid in interpretation and forecasting from such models. Specifically, we introduce a stochastic volatility model for the period effect, aiming to capture long-term mortality dynamics. The state-space framework provides a natural platform to analyse stochastic volatility models (Kim et al., Reference Kim, Shephard and Chib1998; Chib et al., Reference Chib, Nardari and Shephard2002). In this paper we develop a particle Markov chain Monte Carlo (PMCMC) (Andrieu et al., Reference Andrieu, Doucet and Holenstein2010). Bayesian model formulation in order to estimate the resulting stochastic volatility model of mortality jointly with the other latent stochastic factors and the static model parameters.

We introduce to mortality modelling the estimation framework based around the PMCMC algorithm which utilises SMC (Doucet et al., Reference Doucet, De Freitas and Gordon2001; Peters et al., Reference Peters, Fan and Sisson2012) to obtain required quantities in Metropolis–Hastings algorithms that has found many applications in a variety of areas, for example, finance (Peters et al., Reference Peters, Briers, Shevchenko and Doucet2013), economics (Flury & Shephard, Reference Flury and Shephard2011), non-life insurance (Peters et al., 2010 Reference Peters, Wüthrich and Shevchenkob ), risk management (Targino et al., Reference Targino, Peters and Shevchenko2015) and computational biology (Golightly & Wilkinson, Reference Golightly and Wilkinson2011). We apply this powerful tool in mortality modelling and it allows us to develop efficient algorithms to estimate a stochastic volatility extension of the LC model.

Recently, there are growing interests in applying a state-space framework to the pricing and hedging of longevity risk. For example, Kogure & Kurachi (Reference Kogure and Kurachi2010) considers the LC model in state-space form and show that a pricing risk premium for longevity instruments can be derived based on the maximum entropy principle. As another example, Liu & Li (2016 Reference Liu and Lia ) investigates longevity hedging strategies accounting for population basis riskFootnote 1 by formulating multi-population mortality models in the state-space framework. Their approach relies crucially on the determination of the sensitivities of the hedge portfolio and the liability with respect to the hidden states of a state-space multi-population mortality model. In another paper, Liu & Li (2016 Reference Liu and Lib ) addresses the issues of trend-changing behaviours of mortality rates by considering a locally linear CBD model in state-space representation. In addition, they consider a state-space hedging method including drift risk where the hidden states play an important role. We believe that our general presentation of the state-space framework in the paper, including the machinery such as PMCMC developed in the statistics literature and extended models we proposed, will further facilitate the potentials of utilising state-space methods in modelling human mortality and the risk management of longevity risk.

The paper is organised as follows. In section 2 we give an overview of the conventional mortality modelling and estimation methodology in the literature. A state-space approach for mortality modelling is formulated and discussed in section 3. Section 4 is devoted to state-space inference for stochastic mortality models in a frequentist approach. Section 5 focusses on Bayesian inference for dynamic mortality models in state-space framework. In section 6 we analyse Danish mortality data based on the enhanced models and methodologies proposed in the paper. Section 7 provides concluding remarks.

2. Classical Bayesian and Frequentist Approaches

In this section we first briefly recall some important definitions on mortality modelling. We then review stochastic mortality models that are commonly found in the literature. Standard estimation procedures under frequentist and Bayesian approaches are discussed.

2.1. Definitions and notation

Hereafter, we use the following standard definitions from actuarial literature on mortality modelling (Dickson et al., Reference Dickson, Hardy and Waters2009; Pitacco et al., Reference Pitacco, Denuit, Haberman and Olivieri2009). Let T x be a random variable representing the remaining lifetime of a person aged x. The cumulative distribution function and survival function of T x are written as $$_{\tau } q_{x} \, {\equals}\, P\left( {T_{x} \leq \tau } \right)$$ and $$_{\tau } p_{x} \, {\equals}\, P\left( {T_{x} \,\gt\,\tau } \right)$$ , respectively. For a person aged x, the force of mortality at age x+τ is defined as

(1) $$\mu _{{x{\plus}\tau }} \,\colon\! {\equals}\, \mathop {\rm lim}\limits_{h\to0} {1 \over h}P\left( {T_{x} \,\lt\,\tau {\plus}h\!\mid\!T_{x} \,\gt\,\tau } \right)\, {\equals}\, \, {\minus}\, {d \over {d\tau }}{\rm ln}_{\tau } p_{x} $$

Let f x (t) be the density function of T x , then from (1) we have $$_{\tau } q_{x} \, {\equals}\, {\int}_0^\tau \,{f_{x} (s)\:ds} \, {\equals}\, {\int}_0^\tau \,{{}_{s}p_{x} \,\mu _{{x{\plus}s}} \:ds} $$ . The central death rate for a x-year-old, where $$x\in{\Bbb N}$$ , is defined as

(2) $$m_{x} \,\colon\! {\equals} \, {{q_{x} } \over {{\int}_0^1 \,{_{s} \:p_{x} ds} }}\, {\equals} \, {{{\int}_0^1 \,{_{s} p_{x} \mu _{{x{\plus}s}} \:ds} } \over {{\int}_0^1 \,{_{s} p_{x} \:ds} }}$$

which is a weighted average of the force of mortality (here $$q_{x} \,\colon\! {\equals} \, {}_{1}q_{x} $$ ). Under the so-called piecewise constant force of mortality assumption, that is $$\mu _{{x{\plus}s}} \, {\equals} \, \mu _{x} $$ , where 0≤s<1 and $$x\in{\Bbb N}$$ , we have, from (2), m x =μ x . Moreover, if a Poisson assumption is made for the actual number of deaths, then the resulting maximum likelihood estimate of the force of mortality $$\hat{\mu }_{x} $$ (and hence $$\hat{m}_{x} $$ ) is given by $$\hat{\mu }_{x} \, {\equals} \, {{D_{x} } \mathord{\left/ {\vphantom {{D_{x} } {E_{x} }}} \right. \kern-\nulldelimiterspace} {E_{x} }}\, {\equals} \, \hat{m}_{x} $$ , where D x is the number of deaths recorded at age x last birthday and the exposure-to-risk E x the average number of people aged x last birthday, during the observation year. Note that E x is approximated by an estimate of the population aged x last birthday in the middle of the observation year. We refer to $$\hat{m}_{x} $$ as the crude death rate.

In the above setup it is assumed that the force of mortality μ is deterministic. The stochastic case can be handled by the intensity-based framework where death time is modelled as the first jump time of a doubly stochastic process (Biffis, Reference Biffis2005). Hereafter, we treat the force of mortality μ x+t (t), the central death rate m x,t and the crude death rate $$\hat{m}_{{x,t}} $$ as stochastic processes. For a detailed discussion of the background of stochastic mortality modelling in discrete-time and continuous-time, see Cairns et al. (Reference Cairns, Blake and Dowd2008).

2.2. Stochastic mortality models

One of the most widely considered examples of stochastic factor model in the context of mortality modelling is the approach first presented in Lee & Carter (Reference Lee and Carter1992) who proposed a stochastic mortality model for the age-specific crude death rate $$\hat{m}_{{x,t}} $$ , where x=x 1, … , x p and t=1, … , T represent age (or age group) and year (time), respectively. Under the model, the dynamics of the log crude death rates, $$y_{{x,t}} \, {\equals} \, {\rm ln}\,\hat{m}_{{x,t}} $$ , is given byFootnote 2

(3) $$y_{{x,t}} \, {\equals} \, \alpha _{x} {\plus}\beta _{x} \kappa _{t} {\plus}{\varepsilon}_{{x,t}} ,\quad {\varepsilon}_{{x,t}} \mathop{\,\sim\,}\limits^{{iid}} N\left( {0,\sigma _{{\varepsilon}}^{2} } \right)$$

where $$N\left( {0,\sigma _{{\varepsilon}}^{2} } \right)$$ denotes a Gaussian distribution with zero mean and variance $$\sigma _{{\varepsilon}}^{2} $$ . The vector $${\mib \bialpha} \, {\equals} \, \alpha _{{x_{1} \! \colon\,\! x_{p} }} \,\colon\! {\equals} \, \left[ {\alpha _{{x_{1} }} ,\,\ldots\!,\alpha _{{x_{p} }} } \right]$$ represents the age-profile of the log death rates and $${\mib \bibeta } \, {\equals} \, \beta _{{x_{1} \!\colon\! \,x_{p} }} $$ measures the sensitivity of death rates for different age group to a change of the time series κ t . The period effect, κ t , for forecasting purpose, is assumed to satisfy the equation

(4) $$\kappa _{t} \, {\equals} \, \kappa _{{t\, {\minus}\, 1}} {\plus}\theta {\plus}\omega _{t} ,\quad \omega _{t} \mathop{\,\sim\,}\limits^{{iid}} N(0,\sigma _{\omega }^{2} )$$

where ε x,t and ω t are independent.

Under this specification, it is clear that the LC model is not identifiable, since (3) is invarifollowing standard definitions from actuarial literature on mortality modellinagnt up to some linear transformations of the parameters:

(5) $${\mib{y}} _{t} \, {\equals} \, {\mib \bialpha } {\plus}{\mib \bibeta } \kappa _{t} {\plus}{\mib {\bivarepsilon}} _{t} \, {\equals} \, {\mib \bialpha } {\plus}{\mib \bibeta } c{\plus}{{\mib \bibeta } \over d}\left( {\left( {k_{t} \, {\minus}\, c} \right)d} \right){\plus}{\mib {\bivarepsilon}} _{t} \, {\equals} \, {\tilde{\bialpha }} {\plus}{\tilde{\bibeta }} \tilde{\kappa }_{t} {\plus}{\mib {\bivarepsilon}} _{t} $$

where $${\tilde{\bialpha }} \, {\equals} \, { \bialpha } {\plus}{ \bibeta } c$$ , $${\tilde{\bibeta }} \, {\equals} \, {{\bibeta } \mathord{\left/ {\vphantom {{\beta } d}} \right. \kern-\nulldelimiterspace} d}$$ and $$\tilde{\kappa }_{t} \, {\equals} \, \left( {\kappa _{t} \, {\minus}\, c} \right)d$$ .

To overcome this identification issue when estimating the LC model, one has to impose a non-unique choice of constraints to restrict the model to an identifiable class. It is standard practice in actuarial literature to consider the following two constraints:

(6) $$\mathop{\sum}\limits_{x\, {\equals} \, x_{1} }^{x_{p} } \beta _{x} \, {\equals} \, 1,\quad \mathop{\sum}\limits_{t\, {\equals} \, 1}^T \kappa _{t} \, {\equals} \, 0$$

as suggested in Lee & Carter (Reference Lee and Carter1992) to remedy the identifiability issue. This choice of constraints is equivalent to fixing $$c\, {\equals} \, (1\,/\,T)\mathop{\sum}\nolimits_{t\, {\equals} \, 1}^T \kappa _{t} $$ and $$d\, {\equals} \, \mathop{\sum}\nolimits_{x\, {\equals} \, x_{1} }^{x_{p} } \beta _{x} $$ . Consequently we have $$\mathop{\sum}\nolimits_{t\, {\equals} \, 1}^T \,\tilde{\kappa }_{t} \, {\equals} \, 0$$ and $$\mathop{\sum}\nolimits_{x\, {\equals} \, x_{1} }^{x_{p} } \,\tilde{\beta }_{x} \, {\equals} \, 1$$ . The reason for these particular form of identification constraints relates to the fact that the constraint on the path space of the stochastic factor κ 1, … , κ T is intended to have the effect of centring the κ t values over the range t∈{1, … , T}, such that the structure is designed to capture age–period effects with the α x terms incorporating the main age effects, averaged over time and the bilinear terms β x κ t incorporating the age-specific period trends (relative to the main age effects).

Since the introduction of the LC model it has found a widespread uptake of this class of factor model in both practice, where the LC model is now used as a benchmark methodology by the US Bureau of the Census, and in academia where a range of stochastic mortality model extensions have been proposed in the literature, see Table 1.

Table 1 Several popular stochastic mortality models.

We note here that Renshaw & Haberman (Reference Renshaw and Haberman2003) and Renshaw & Haberman (Reference Renshaw and Haberman2006) introduces multi-period $$\left( {\mathop{\sum}\nolimits_{i\, {\equals} \, 1}^k \beta _{x}^{{(i)}} \kappa _{t}^{{(i)}} } \right)$$ and cohort factor $$\left( {\zeta _{{t\, {\minus}\, x}} } \right)$$ , respectively, to the LC method. Currie (Reference Currie2009) considers a simplified version of the model in Renshaw & Haberman (Reference Renshaw and Haberman2006). Cairns et al. (Reference Cairns, Blake and Dowd2006) propose to model $${\rm logit}\left( {q_{{x,t}} } \right)\,\colon\! {\equals} \, {\rm ln}\left( {q_{{x,t}} \,/\,(1\, {\minus}\, q_{{x,t}} )} \right)$$ instead of log death rates and $$\bar{x}$$ is the average age in the sample range. An addition of cohort factor is studied in Cairns et al. (Reference Cairns, Blake, Dowd, Coughlan, Epstein, Ong and Balevich2009). Plat (Reference Plat2009) introduces a model which combines the desirable features of the previous models and include a term $$\left( {\bar{x}\, {\minus}\, x} \right)^{{\plus}} \,\colon\! {\equals} \, {\rm max}\left( {\bar{x}\, {\minus}\, x,0} \right)$$ to capture better young age mortality. The specification of identification constraints for the LC type models, that is for those where the log death rate is being modelled in Table 1, is discussed in Hunt & Villegas (Reference Hunt and Villegas2015).

2.3. Two-stage estimation approaches: frequentist view

Several “classical” approaches to LC model estimation have been proposed in the literature, though they typically involve a two-stage procedure looking first at the observation equation as a regression (ignoring the latent factor structure explicitly) and then in the second stage they fit time series models to the latent factor structures. A good overview of such methods is obtained in Pitacco et al. (Reference Pitacco, Denuit, Haberman and Olivieri2009). This two-stage procedure is at odds with modern state-space modelling procedures which have been progressively moving towards joint parameter estimation and latent state estimation in frequentist and Bayesian formulations, which will be discussed in subsequent sections. This is reflected in the first attempt to improve the calibration approaches as reflected in the comment in Cairns et al. (Reference Cairns, Blake, Dowd, Coughlan and Khalaf-Allah2011) where they highlight that the “… key element of the proposed framework is our single-stage approach to model fitting and process parameter estimation”. Such sentiments, relating to consistent single-stage joint estimation are also echoed in the work of Czado et al. (Reference Czado, Delwarde and Denuit2005).

2.3.1. Multi-factor LC SVD-based two-stage calibration

One of the most commonly adopted approaches to estimate stochastic mortality models is via SVD. We use the multi-period (k-factor) LC model (Renshaw & Haberman, Reference Renshaw and Haberman2003) with identification constraints given by

(7) $$\mathop{\sum}\limits_{t\, {\equals} \, 1}^T \kappa _{t}^{{(i)}} \, {\equals} \, 0,\quad \mathop{\sum}\limits_{x\, {\equals} \, x_{1} }^{x_{p} } \beta _{x}^{{(i)}} \, {\equals} \, 1$$

where i=1, … , k, as an example to illustrate the methodology below (Koissi et al., Reference Koissi, Shapiro and Hognas2006).

Stage 1a – Observation Equation Estimation Stage: We first notice that the constraint $$\mathop{\sum}\nolimits_{t\, {\equals} \, 1}^T \kappa _{t}^{{(i)}} \, {\equals} \, 0$$ will lead to an estimator for the level α given by

(8) $$\hat{\bialpha }_{x} \, {\equals} \, {1 \over T}\mathop{\sum}\limits_{y\, {\equals} \, 1}^T y_{{x,t}} $$

Stage 1b – Observation Equation Estimation Stage: The next stage is to de-trend the observations { y 1:T } by the level estimate $${\hat{\bialpha }} $$ and then to perform a SVD on the resulting (p×T) matrix of residual observations to obtain the decomposition:

(9) $${\rm SVD}\left[ {{\mib{y}} _{{1\: \colon\: T}} \, {\minus}\, { \hat{\bialpha }} } \right]\, {\equals} \, \mathop{\sum}\limits_{i\, {\equals} \, 1}^h \rho _{i}\; {\mib \bimu } _{i} {\mib \binu } _{i}^{ \top } $$

where ⊤ denotes transposition and ρ i , for i∈{1, … , h}, are the descending singular values where h is the rank of the data matrix. Here μ i and ν i are the corresponding left and right singular vectors of the singular value ρ i with dimension p and T, respectively. For a k-rank, where kh, approximation of the matrix, we have

(10) $${\mib{y}} _{{1\: \colon\: T}} \, {\minus}\, {\hat{\alpha }} \, {\equals} \, \mathop{\sum}\limits_{i\, {\equals} \, 1}^k \rho _{i} {\mib \bimu } _{i} {\mib \binu } _{i}^{ \top } {\plus}\bivarphi _{{1\: \colon\: T}} $$

where $${\mib \bivarphi } _{{1\: \colon\: T}} \, {\equals}\, \mathop{\sum}\nolimits_{i\, {\equals} \, k{\plus}1}^h \,\rho _{i} {\bimu } _{i} {\binu } _{i}^{ \top } $$ is the k-rank residuals. We then identify $${ \tilde{\beta }} ^{{(i)}} \, {\equals} \, { \bimu } _{i} $$ and $${ \tilde{\kappa }} ^{{(i)}} \, {\equals} \, \rho _{i} {\mib \binu } _{i} $$ , for i=1, … , k. One then performs the transformation

(11) $$\kappa _{t}^{{(i)}} \, {\equals} \, \tilde{\kappa }_{t}^{{(i)}} \mathop{\sum}\limits_x \tilde{\beta }_{x}^{{(i)}} ,\quad \beta _{x}^{{(i)}} \, {\equals} \, {{\tilde{\beta }_{x}^{{(i)}} } \over {\mathop{\sum}\limits_x \tilde{\beta }_{x}^{{(i)}} }}$$

to ensure the constraints $$\mathop{\sum}\nolimits_x \beta _{x}^{{(i)}} \, {\equals} \, 1$$ , for i=1, … , k, are satisfied.

Stage 2 – Latent Process Factor Estimation Stage: At this stageFootnote 3 , the estimation of the latent factors can be performed by specifying a time series model structure such as ARIMA model for each of the factors:

(12) $$\kappa _{t}^{{(j)}} \, {\equals} \, \theta ^{{(j)}} {\plus}\mathop{\sum}\limits_{r\, {\equals} \, 1}^p \kappa _{{t\,\, {\minus}\, \,r}}^{{(j)}} {\plus}\mathop{\sum}\limits_{s\, {\equals} \, 1}^q {\varepsilon}_{{t\,\, {\minus}\, \,s}}^{{(j)}} {\plus}{\varepsilon}_{t} $$

or alternatively one could fit the equivalent vector autoregressive model structure not treating each factor as independent in the time series specification. One would typically perform this stage of estimation via the Yule–Walker equations, see for instance discussions in Tsay & Tiao (Reference Tsay and Tiao1984). Under such specifications, one then obtain closed form distributions and estimators for period effect latent factor forecasts that can be substituted into the observation model for forecasts of the mortality by age in future forecast horizons and used to construct life tables.

2.3.2. Regression-based approaches

It is important to note that the SVD approach assumes homoscedasticity in the error structure. Therefore, to account for heteroscedasticity in mortality data for different ages, Brouhns et al. (Reference Brouhns, Denuit and Vermunt2002) propose to model death counts, instead of death rates, via Poisson regression where the addition error term in the LC approach is replaced by Poisson random variation. Specifically, the number of death D x,t is modelled as

(13) $$D_{{x,t}} \:\sim\:{\rm Poisson}\left( {E_{{x,t}} \:m_{{x,t}} (\Phi )} \right)$$

where E x,t is the death exposure, m x,t (Φ) is a model of the central death rate and Φ is the parameter vector according to the model being used, including time dynamic factors such as period and cohort effect, see, for example, Table 1. The parameter vector is then estimated by maximising the log-likelihood function, which is given by

(14) $$l(\Phi;\,D,E)\, {\equals} \, \mathop{\sum}\limits_t \mathop{\sum}\limits_x \left( {D_{{x,t}} {\rm ln}\left( {E_{{x,t}} \:m_{{x,t}} (\Phi )} \right)\, {\minus}\, E_{{x,t}} \:m_{{x,t}} (\Phi )\, {\minus}\, {\rm ln}\left( {D_{{x,t}} \,!\,} \right)} \right)$$

where D x,t ! indicates the factorial of D x,t . Times series models are then used to model the time dynamic factors forming a second-stage estimation procedure for forecasting purpose. Note that the CBD type models can be estimated under this approach since we have q x,t =1−exp{−m x, t} (Cairns et al., Reference Cairns, Blake, Dowd, Coughlan, Epstein, Ong and Balevich2009).

Remark 2.1 In all the discussed cases above, there is the general idea that the two-stage estimation approaches (SVD and regression) treat the unobserved factors corresponding to for instance a period effect κt and a cohort effect ζ tx as parameters. For forecasting purpose, these dynamics factors are then modelled as time series, typically under the ARIMA framework. In this paper we argue that a more consistent approach involves embedding the specification of the model formally within a state-space model structure and to perform the estimation via a joint combination of filtering and static parameter estimation, which can be achieved either in Bayesian (posterior-based) or frequentist (likelihood-based) settings. We will demonstrate both in this paper.

2.4. Estimation approaches: Bayesian view

From the Bayesian modelling perspective there are few papers that study stochastic mortality models, the main papers in this area involve the works of Czado et al. (Reference Czado, Delwarde and Denuit2005), Pedroza (Reference Pedroza2006), Kogure et al. (Reference Kogure, Kitsukawa and Kurachi2009) and Cairns et al. (Reference Cairns, Blake, Dowd, Coughlan and Khalaf-Allah2011). As observed in these studies, there are many possible advantages to adopting a Bayesian approach for mortality modelling, especially in the context of small populations which may also have substantial quantities of missing data.

An important point to note is that all Bayesian model formulations to date in the mortality modelling literature, that we are aware of, have utilised what would, in modern statistical approaches be considered rudimentary sampling-based approaches to performing Bayesian estimation of the LC type models. The criticism here can be levelled in two ways:

  1. 1. The first relates to the fact that in these Bayesian formulations the latent dynamic process states are still treated in the MCMC sampling procedures as if they were a set of static model parameters. The issues with doing this have been mentioned in numerous places, see for example, Carter & Kohn (Reference Carter and Kohn1994). Recently new approaches to such inference in Bayesian models have been developed to avoid having to make univariate conjugate Gibbs or Metropolis-within-Gibbs steps for the latent processes. The reason for this is that it is known in general to be very inefficient in performing inference and can be prone to misleading posterior inference results due to poor mixing performance of the Markov chain for a finite computational budget. Detailed discussions have been provided on such problems in Andrieu et al. (Reference Andrieu, Doucet and Holenstein2010) and subsequently in work such as Chopin et al. (Reference Chopin, Jacob and Papaspiliopoulos2013), and the specific case to population-based state-space models in ecology in Peters et al. (2010 Reference Peters, Hosack and Hayesa ). Pedroza (Reference Pedroza2006) also follows this more efficient sampling approach for mortality modelling based on Carter & Kohn (Reference Carter and Kohn1994).

  2. 2. Second, all existing MCMC sampling-based approaches we are aware of for Bayesian inference in the mortality modelling literature tends to neglect the issue of model identification in the likelihood which can cause issues in the Bayesian formulation. In fact, some approaches implement identification constraint in the Bayesian model and develop an MCMC sampler that tries to impose the identification constraint in such a manner that the resulting Markov chain may not be consistent with preserving the correct invariant stationary distribution if one applies the constraints inappropriately. We investigate this issue in a separate paper (Peters et al., Reference Peters, Fung and Shevchenko2017).

These two considerations need to be resolved to update the approaches to more efficient sampling approaches with enhanced specifications of the model formulation to deal with such issues directly. In particular, modern approaches to such model estimations are to treat the latent unobserved process not as static parameters but as a state-space model in which filtering-based methods (Kalman filter variants, SMC) can be utilised for the latent process estimations jointly with consistent estimation of the “static” model parameters. We will detail such estimation procedures which are also consistent with imposing specific identification constraints of relevance to the LC model formulations, that are developed to ensure the correct invariant Bayesian posterior model is preserved by the Markov chain sampler and filters developed.

Remark 2.2 (Likelihood Identification Issues and Bayesian Modelling): We note the fact that model parameters that are not identified in the likelihood pose no formal problem in a Bayesian analysis. Identification is a property of the likelihood function, whereas Bayesian inference simply uses the likelihood function to map through the data from prior beliefs to posterior beliefs. However, it is often the case that working with unidentified likelihood functions is usually unsatisfactory from a practical perspective as it may lead to partial identification issues in the posterior or problematic multimodality in the posterior. In general if one utilises a proper prior distribution it may act to provide a “near-identification” in the sense that one considers parameter restrictions as limiting forms of prior densities, then there is at least a functional equivalence between introducing prior information about parameters, and imposing identifying restrictions.

3. State-Space Formulations of Mortality Models

We are now in a position to present an alternative representations of stochastic mortality modelling based on state-space methodology (Harvey, Reference Harvey1989; West & Harrison, Reference West and Harrison1997). A key advantage of this approach is that the two-stage estimation and forecasting procedure under the SVD or Poisson regression maximum likelihood approaches can be combined in a single setting. The improved statistical consistency of a single-stage approach is recognised in Cairns et al. (Reference Cairns, Blake, Dowd, Coughlan and Khalaf-Allah2011). Another key advantage comes from the recent progress in sampling-based techniques in the estimation of state-space models. The advancement allows statistical inference to be conducted on sophisticated state-space models. We take advantage of this development and utilise realistic model aiming to capture long-term mortality dynamics.

A general state-space model consists of a state equation

(15) $$\phi _{t} \, {\equals} \, a\left( {\phi _{{t\, {\minus}\, 1}} ,{\mib \bimu } _{t} } \right)$$

and an observation equation

(16) $${\mib{z}} _{t} \, {\equals} \, b\left( {\phi _{t} ,{\mib \binu } _{t} } \right)$$

where the states ϕ t form a hidden/latent Markov process with disturbance ν t , and the observed time series data z t depends only on ϕ t and disturbance ν t . Here a(.) and b(.) are possibly non-linear functions, and the states ϕ t and observations z t can be multi-dimensional.

It is clear that the models in Table 1 specify the observation equation of different state-space models that can be considered. For example, for the multi-period LC model (Renshaw & Haberman, Reference Renshaw and Haberman2003), the observed data is $$z_{{x,t}} \, {\equals} \, {\rm ln}\left( {\hat{m}_{{x,t}} } \right)$$ for different age x and the latent states are the period effects $$\phi _{t} \, {\equals} \, \big( {\kappa _{t}^{{(1)}} ,\,\ldots\!,\kappa _{t}^{{(k)}} } \big)$$ . We also note that multi-population (i.e. multi-curve) structures can be incorporated in the following state-space models in a number of different ways and the approaches we will develop for estimation will accommodate such settings. In the following sub-sections we will discuss a few different classes of mortality models that are difficult to deal with in the approaches mentioned in section 2, but can be handled straightforwardly in state-space framework.

3.1. LC model with heteroscedasticity (LC-H model)

We present here a state-space formulation of the LC model with heteroscedasticity structure. In this context, the heteroscedasticity refers to a relaxation of the constant single degree of freedom diagonal covariance assumption typically made on the observation vector for each year t across the panel of age group stratifications $${\mib{y}} _{t} \, {\equals} \, \left( {y_{{x_{1} ,t}} ,y_{{x_{2} ,t}} ,\,\ldots\!,y_{{x_{p} ,t}} } \right)$$ . Within this state-space model structure we propose an alternative identification constraint which is tailored for the estimation under the state-space approach.

The LC model with heteroscedasticity structure can be written in state-space form by combining the processes $${\mib{y}} _{t} \, {\equals} \, \left( {y_{{x_{1} ,t}} ,\,\ldots\!,y_{{x_{p} ,t}} } \right)$$ and κ t into one dynamical system

(17a) $${\mib{y}} _{t} \, {\equals} \, {\mib \bialpha } {\plus}{\mib \bibeta } \kappa _{t} {\plus}{\mib {\bivarepsilon}} _{t} ,\quad {\mib {\bivarepsilon}} _{t} \mathop{\,\sim\,}\limits^{{iid}} N(0,\Sigma )$$
(17b) $$\kappa _{t} \, {\equals} \, \kappa _{{t\, {\minus}\, 1}} {\plus}\theta {\plus}\omega _{t} ,\quad \omega _{t} \mathop{\,\sim\,}\limits^{{iid}} N\left( {0,\sigma _{\omega }^{2} } \right)$$

where $${\mib \bialpha } \, {\equals} \, \alpha _{{x_{1} \!\colon\,x_{p} }} $$ , $${\mib \bibeta } \, {\equals} \, \beta _{{x_{1} \!\colon\,x_{p} }} $$ and κ t is the latent state of the resulting linear Gaussian state-space model. Here, Σ is a p by p diagonal matrix with $$\sigma _{{{\varepsilon},x_{1} \,\colon\,x_{p} }}^{2} $$ on the diagonal. We refer to this model as LC-H model, and the special case with $$\sigma _{{{\varepsilon},x_{i} }}^{2} \, {\equals} \, \sigma _{{\varepsilon}}^{2} $$ , i∈{1, … , p}, as LC model.

Instead of the identification constraint (6), we suggest an alternative constraint which is simpler and more readily applicable to Monte Carlo-based procedures such as MCMC and SMC. Our formulation of the identification constraint is given by setting

(18) $$\alpha _{{x_{1} }} \, {\equals} \, {\rm constant},\quad \beta _{{x_{1} }} \, {\equals} \, {\rm constant}$$

Such a choice is a valid identification constraint since if one of the elements of each α and β are known (here we have arbitrarily chosen $$\alpha _{{x_{1} }} $$ and $$\beta _{{x_{1} }} $$ ; in general one can choose $$\alpha _{{x_{i} }} $$ and $$\beta _{{x_{j} }} $$ , where i, j∈{1, … , p}), then a non-trivial linear transformation in (5) is not allowed anymore; that is, we must have c=0 and d=1. Note that implementing the proposed constraint is straightforward in both maximum likelihood and Bayesian setting compared to the constraint (6).

Remark 3.1 As discussed in section 2.4, one needs to be careful about the way how the constraint (6) is implemented in a MCMC-based sampling setting. An advantage of the proposed constraint (18) here is that it is error proof in this setup. Note that regardless of which constraint one chooses, a natural interpretation of α and β is that α represents the general pattern of the age-specific log death rates while β captures the sensitivity of log death rates with respect to a change of the period effect. Therefore, one can naturally set a value for $$\alpha _{{x_{1} }} $$ as the average of the log death rate at age x 1. On the other hand, there is no restriction on what value $$\beta _{{x_{1} }} $$ can take. However, we recommend it to be in the range of [0.01, 1], as too small or too large value may lead to numerical issues such as overflow/underflow problems. We also emphasise here that the resulting fit and forecasting ability of the model will not be affected by the choice of constraint since an identification constraint serves only to identify the model in a unique way.

3.2. Two-factor LC model with age-based heteroscedasticity (LC2-H model)

A natural extension of the LC-H model is to include a second stochastic factor for the cohort effect. We denote this model by LC2-H model. The cohort effect (Renshaw & Haberman, Reference Renshaw and Haberman2006) can be modelled under the state-space framework as follows:

(19) $$\left[ {\matrix{ {y_{{x_{1} ,t}} } \cr {y_{{x_{2} ,t}} } \cr \vdots \cr {y_{{x_{p} ,t}} } \cr } } \right]\, {\equals} \, \left[ {\matrix{ {\alpha _{{x_{1} }} } \cr {\alpha _{{x_{2} }} } \cr \vdots \cr {\alpha _{{x_{p} }} } \cr } } \right]{\plus}\left[ {\matrix{ {\beta _{{x_{1} }}^{{(1)}} } &#x0026; {\beta _{{x_{1} }}^{{(2)}} } &#x0026; 0 &#x0026; \cdots &#x0026; 0 \cr {\beta _{{x_{2} }}^{{(1)}} } &#x0026; 0 &#x0026; {\beta _{{x_{2} }}^{{(2)}} } &#x0026; \cdots &#x0026; 0 \cr \vdots &#x0026; \vdots &#x0026; \vdots &#x0026; \ddots &#x0026; \vdots \cr {\beta _{{x_{p} }}^{{(1)}} } &#x0026; 0 &#x0026; 0 &#x0026; \cdots &#x0026; {\beta _{{x_{p} }}^{{(2)}} } \cr } } \right]\left[ {\matrix{ {\kappa _{t} } \cr {\zeta _{t}^{{x_{1} }} } \cr {\zeta _{t}^{{x_{2} }} } \cr \vdots \cr {\zeta _{t}^{{x_{p} }} } \cr } } \right]{\plus}\left[ {\matrix{ {{\varepsilon}_{{x_{1} ,t}} } \cr {{\varepsilon}_{{x_{2} ,t}} } \cr \vdots \cr {{\varepsilon}_{{x_{p} ,t}} } \cr } } \right]$$

where $$\zeta _{t}^{x} \,\colon\! {\equals} \, \zeta _{{t\, {\minus}\, x}} $$ . The state equation can be expressed as

(20) $$\left[ {\matrix{ {\kappa _{t} } \cr {\zeta _{t}^{{x_{1} }} } \cr {\zeta _{t}^{{x_{2} }} } \cr \vdots \cr {\zeta _{t}^{{x_{{p\, {\minus}\, 1}} }} } \cr {\zeta _{t}^{{x_{p} }} } \cr } } \right]\, {\equals} \, \left[ {\matrix{ 1 &#x0026; 0 &#x0026; 0 &#x0026; \cdots &#x0026; 0 &#x0026; 0 \cr 0 &#x0026; \vartheta &#x0026; 0 &#x0026; \cdots &#x0026; 0 &#x0026; 0 \cr 0 &#x0026; 1 &#x0026; 0 &#x0026; \cdots &#x0026; 0 &#x0026; 0 \cr 0 &#x0026; 0 &#x0026; 1 &#x0026; \cdots &#x0026; 0 &#x0026; 0 \cr \vdots &#x0026; \vdots &#x0026; \vdots &#x0026; \ddots &#x0026; \vdots &#x0026; \vdots \cr 0 &#x0026; 0 &#x0026; 0 &#x0026; \cdots &#x0026; 1 &#x0026; 0 \cr } } \right]\left[ {\matrix{ {\kappa _{{t\, {\minus}\, 1}} } \cr {\zeta _{{t\, {\minus}\, 1}}^{{x_{1} }} } \cr {\zeta _{{t\, {\minus}\, 1}}^{{x_{2} }} } \cr \vdots \cr {\zeta _{{t\, {\minus}\, 1}}^{{x_{{p\, {\minus}\, 1}} }} } \cr {\zeta _{{t\, {\minus}\, 1}}^{{x_{p} }} } \cr } } \right]{\plus}\left[ {\matrix{ \theta \cr 0 \cr 0 \cr \vdots \cr 0 \cr 0 \cr } } \right]{\plus}\left[ {\matrix{ {\omega _{t}^{\kappa } } \cr {\omega _{t}^{{\zeta _{1} }} } \cr 0 \cr \vdots \cr 0 \cr 0 \cr } } \right]$$

Here, we assume κ t is a random walk with drift process and an autoregressive model of order 1 (AR(1)) process is assumed for the cohort effect, that is $$\zeta _{t}^{{x_{1} }} \, {\equals} \, \vartheta \zeta _{{t\, {\minus}\, 1}}^{{x_{1} }} {\plus}\omega _{t}^{{\zeta _{1} }} $$ , where |ϑ|<1. Note that, from (20), we have $$\zeta _{t}^{{x_{i} }} \, {\equals} \, \zeta _{{t\, {\minus}\, 1}}^{{x_{{i\, {\minus}\, 1}} }} $$ for i=2, … , p, which is the defining property of the cohort effect and consequently we are only required to model the dynamics of $$\zeta _{t}^{{x_{1} }} $$ . We can write the model (19)–(20) in the following form:

(21a) $${\mib{y}} _{t} \, {\equals} \, {\mib \bialpha } {\plus}B\left[ {\kappa _{t} ,{\mib \bizeta } _{t} } \right]^{ \top } {\plus}{\mib {\bivarepsilon}} _{t} ,\quad {\mib {\bivarepsilon}} _{t} \mathop{\,\sim\,}\limits^{{iid}} N\left( {0,\Sigma } \right)$$
(21b) $$\kappa _{t} \, {\equals} \, \kappa _{{t\, {\minus}\, 1}} {\plus}\theta {\plus}\omega _{t}^{\kappa } ,\quad \omega _{t}^{\kappa } \mathop{\,\sim\,}\limits^{{iid}} N\left( {0,\sigma _{{\omega ^{\kappa } }}^{2} } \right)$$
(21c) $${\mib \bizeta } _{t} \, {\equals} \, C\,{\mib \bizeta } _{{t\, {\minus}\, 1}} {\plus}D{\mib \biomega } _{t}^{\zeta } ,\quad \omega _{t}^{{\zeta _{1} }} \mathop{\,\sim\,}\limits^{{iid}} N\left( {0,\sigma _{{\omega ^{\zeta } }}^{2} } \right)$$

where B is the p by p+1 matrix in (19), C the corresponding p by p sub-matrix in (20) and D a zero p by p matrix except for the (1, 1) element with value 1.

For further details on the formulation, estimation and forecasting of cohort models in state-space framework, see Fung et al. (Reference Fung, Peters and Shevchenko2017).

3.3. Three-factor LC model with dynamic time-based heteroscedasticity (LC3-H2 model)

We can further extend the LC2-H model by adding a third factor for the dynamic of volatility in the observation vector over time. Specifically, a time dependent stochastic factor $$\sqrt {\gamma _{t}^{y} } $$ , which is independent to age (or age group), is incorporated into the observation noise term as follows:

(22a) $${\mib{y}} _{t} \, {\equals} \, {\mib \bialpha } {\plus}B\left[ {\kappa _{t} ,{\mib \bizeta } _{t} } \right]^{ \top } {\plus}\sqrt {\gamma _{t}^{y} } {\bf {\bivarepsilon}}_{t} ,\quad {\bf {\bivarepsilon}}_{t} \mathop{\,\sim\,}\limits^{{iid}} N\left( {0,\Sigma } \right)$$
(22b) $$\kappa _{t} \, {\equals} \, \kappa _{{t\, {\minus}\, 1}} {\plus}\theta {\plus}\omega _{t}^{\kappa } ,\quad \omega _{t}^{\kappa } \mathop{\,\sim\,}\limits^{{iid}} N\left( {0,\sigma _{{\omega ^{\kappa } }}^{2} } \right)$$
(22c) $${\mib \bizeta } _{t} \, {\equals} \, C{\mib \bizeta } _{{t\, {\minus}\, 1}} {\plus}D{\mib \biomega } _{t}^{\zeta } ,\quad \omega _{t}^{{\zeta _{1} }} \mathop{\,\sim\,}\limits^{{iid}} N\left( {0,\sigma _{{\omega ^{\zeta } }}^{2} } \right)$$
(22d) $$\gamma _{t}^{y} \, {\equals} \, a\left( {b\, {\minus}\, \gamma _{{t\, {\minus}\, 1}}^{y} } \right){\plus}\gamma _{{t\, {\minus}\, 1}}^{y} {\plus}\sigma \sqrt {\gamma _{{t\, {\minus}\, 1}}^{y} } {\varepsilon}_{t}^{{\gamma ^{y} }} ,\quad {\varepsilon}_{t}^{{\gamma ^{y} }} \mathop{\,\sim\,}\limits^{{iid}} N\left( {0,1} \right)$$

where $$\gamma _{t}^{y} $$ is a process obtained via an Euler discretisation of a square Bessel process corresponding to the Cox–Ingersoll–Ross process given by

$$d\gamma _{t}^{y} \, {\equals} \, a\left( {b\, {\minus}\, \gamma _{t}^{y} } \right)\:dt{\plus}\sigma \sqrt {\gamma _{t}^{y} } \:dW_{t} $$

where 2abσ 2 to ensure $$\gamma _{t}^{y} $$ is strictly positive. Such a dynamic volatility factor can be used to explain time-varying periods of heightened observation variance, which potentially occur in some populations over time. These may be attributed to disease, war, famine, environmental factors or shocks as well as changes in migration and immigration patterns that could influence the volatility of the observed death counts in different age groups. Note that this is one possible way to introduce stochastic volatility to mortality models where the stochastic volatility factor enters via the observation noise term. Another possible choice is to consider stochastic volatility through the state equation described in section 3.4 which will be the focus of this paper.

3.4. Multi-factor model with stochastic volatility in the latent process (LCSV and LCSV-H models)

A common assumption in mortality modelling is that the period effect is derived from a discretisation of a random walk with drift process. Such a process may be sufficient for modelling simple dynamics, but can be insufficient if time-varying periods of volatility are present in the time series. In fact, much of the literature focusses mainly on capturing the trend of the period effect κ t for the past several decades where mortality time series for many countries are reasonably smooth.

Here, we extend the LC framework to incorporate stochastic volatility in the latent process. As a result, the impact of epidemics, natural disasters, medical breakthrough or wars on the evolution of mortality can be taken into account. This will produce different structural effects on the calibration and importantly on the forecasting when compared to the previously developed model of LC3-H2. We refer (23a)–(23c) as LC stochastic volatility model which we denote as LCSV model:

(23a) $${\mib{y}} _{t} \, {\equals} \, {\mib \bialpha } {\plus}{\mib \bibeta } \kappa _{t} {\plus}{\mib {\bivarepsilon}} _{t} ,\quad {\mib {\bivarepsilon}} _{t} \mathop{\,\sim\,}\limits^{{iid}} N\left( {{\bf{0}} ,\sigma _{{\varepsilon}}^{2} {\bf{1}} _{p} } \right)$$
(23b) $$\kappa _{t} \, {\equals} \, \kappa _{{t\, {\minus}\, 1}} {\plus}\theta {\plus}\omega _{t} ,\quad \omega _{t} \!\mid\!\gamma _{t} \!\sim\1N\left( {0,exp\left\{ {\gamma _{t} } \right\}} \right)$$
(23c) $$\gamma _{t} \, {\equals} \, \lambda _{1} \gamma _{{t\, {\minus}\, 1}} {\plus}\lambda _{2} {\plus}\eta _{t} ,\quad \eta _{t} \mathop{\,\sim\,}\limits^{{iid}} N\left( {0,\sigma _{\gamma }^{2} } \right)$$

The log-volatility process γ t is introduced in the state equation for κ t via the error term ω t . The process γ t is an AR(1) with |λ 1|<1 and the mean reverting level is given by λ 2/(1−λ 1). A heteroscedasticity structure can be introduced in (23a) and will be referred to as LCSV-H model. Specifically, the LCSV-H model is given by

(24a) $${\mib{y}} _{t} \, {\equals} \, {\mib \bialpha } {\plus}{\mib \bibeta } \kappa _{t} {\plus}{\mib {\bivarepsilon}} _{t} ,\quad {\mib {\bivarepsilon}} _{t} \mathop{\,\sim\,}\limits^{{iid}} N\left( {{\bf{0}} ,\Sigma } \right)$$
(24b) $$\kappa _{t} \, {\equals} \, \kappa _{{t\, {\minus}\, 1}} {\plus}\theta {\plus}\omega _{t} ,\quad \omega _{t} \!\mid\!\gamma _{t} \: \sim\: N\left( {0,{\rm exp}\left\{ {\gamma _{t} } \right\}} \right)$$
(24c) $$\gamma _{t} \, {\equals} \, \lambda _{1} \gamma _{{t\, {\minus}\, 1}} {\plus}\lambda _{2} {\plus}\eta _{t} ,\quad \eta _{t} \mathop{\,\sim\,}\limits^{{iid}} N\left( {0,\sigma _{\gamma }^{2} } \right)$$

Cohort effect can also be incorporated into the LCSV model as follows:

(25a) $${\mib y}_{t} \, {\equals} \, {\mib \bialpha } {\plus}B\left[ {\kappa _{t} ,{\mib \bizeta } _{t} } \right]^{ \top } {\plus}{\mib {\bivarepsilon}} _{t} ,\quad {\mib {\bivarepsilon}} _{t} \mathop{\,\sim\,}\limits^{{iid}} N\left( {{\bf{0}} ,\sigma _{{\varepsilon}}^{2} {\bf{1}} _{p} } \right)$$
(25b) $$\kappa _{t} \, {\equals} \, \kappa _{{t\, {\minus}\, 1}} {\plus}\theta {\plus}\omega _{t} ,\quad \omega _{t} \!\mid\!\gamma _{t} \:\sim\: N\left( {0,exp\left\{ {\gamma _{t} } \right\}} \right)$$
(25c) $$\gamma _{t} \, {\equals} \, \lambda _{1} \gamma _{{t\, {\minus}\, 1}} {\plus}\lambda _{2} {\plus}\eta _{t} ,\quad \eta _{t} \mathop{\,\sim\,}\limits^{{iid}} N\left( {0,\sigma _{\gamma }^{2} } \right)$$
(25d) $${\mib \bizeta } _{t} \, {\equals} \, C{\mib \bizeta } _{{t\, {\minus}\, 1}} {\plus}D{\mib \biomega } _{t}^{\zeta } ,\quad {\mib \omega } _{t}^{{\zeta _{1} }} \mathop{\,\sim\,}\limits^{{iid}} N\left( {0,\sigma _{{\omega ^{\zeta } }}^{2} } \right)$$

Compared to the LC3-H2 model where stochastic volatility is included in the observation noise term, introducing stochastic volatility in the latent period process has the advantage, in terms of simplicity and ease of interpretation, that the variability of mortality data in the time dimension is captured purely by the latent process.

4. Frequentist State-Space Inference

Given these different state-space model structures, the next task to consider is the inference for the joint single-stage state and parameter estimation. In this section, we consider full likelihood-based joint inference procedures based on filtering and gradient estimation. To achieve this we must describe both filtering in linear Gaussian and non-linear/non-Gaussian filtering via SMC method (particle filters) and their application to gradient-based estimation in the marginal likelihood, having integrated out the latent state processes. We will do this in a general way and then present particular examples of relevance to this paper.

Under the classical maximum likelihood approach, parameters are estimated by maximising a model’s log-likelihood function. In the case of state-space models in the form of (15)–(16), the likelihood is in two forms: the complete data likelihood, assuming ϕ 0 fixed, is given by

(26) $$p_{{\mib \bipsi } } \left( {{\mib \biphi } _{{1\: \colon\: T}} ,{\mib{z}} _{{1\: \colon\: T}} } \right)\, {\equals} \, \prod\limits_{t\, {\equals} \, 1}^T p_{{\mib \bipsi } } \left( {{\mib{z}} _{t} \!\mid\!{\mib \biphi } _{t} } \right)p_{{\mib \bipsi } } \left( {{\mib \biphi } _{t} \!\mid\!{\mib \biphi } _{{t\, {\minus}\, 1}} } \right)$$

and the marginal likelihood, typically used for the static model-based inference, is given by

(27) $$p_{{\mib \bipsi } } \left( {{\mib{z}} _{{1\: \colon\: T}} } \right)\, {\equals} \, {\int} {\prod\limits_{t\, {\equals} \, 1}^T p_{{\mib \bipsi } } \left( {{\mib{z}} _{t} \!\mid\!{\mib \biphi } _{t} } \right)p_{{\mib \bipsi } } \left( {{\mib \biphi } _{t} \!\mid\!{\mib \biphi } _{{t\, {\minus}\, 1}} } \right)\:d{\mib \biphi } _{t} } $$

where $$\bipsi $$ denotes the d-dimensional parameter vector of the model. Two challenges now arise. The first is that typically the integral in (27) cannot be evaluated in closed form, except for linear Gaussian state-space model systems. The second issue is that the gradient equations for such a state-space model marginal likelihood, even if they can be calculated in closed form, requires a non-linear multiple equation solver. For this reason it is common to adopt a solution based on a recursive estimation using gradient and Hessian information from the marginal likelihood. Under the gradient-based approach, the optimal parameter vector can be found by iterations, where the (m+1)th estimate is obtained by

(28) $${\mib \bipsi } ^{{(m{\plus}1)}} \, {\equals} \, {\mib \bipsi } ^{{(m)}} \, {\minus}\, \left[ {\nabla _{{\mib \bipsi } }^{2} \:\ell ({\mib \bipsi } ^{{(m)}} )} \right]^{{ {\minus} 1}} \nabla _{{\mib \bipsi } } \:\ell \left( {{\mib \bipsi } ^{{(m)}} } \right)$$

where $$\ell ({\mib \bipsi } )$$ , $$\nabla _{{\mib \bipsi } } \ell ({\mib \bipsi } )$$ and $$\, {\minus} \nabla _{{\mib \bipsi } }^{2} \:\ell ({\mib \bipsi } )$$ denote the log-likelihood function, the gradient (or score) vector and the Hessian information matrix of the log-likelihood function, respectively, defined with respect to grad and Laplacian differential operators given by

(29) $$\eqalignno{ &#x0026; \left[ {\nabla _{{\mib \bipsi } } } \right]_{i} \,\colon\! {\equals} \, {\partial \over {\partial \psi _{i} }},\quad \forall i\in\{ 1,\,\ldots,n\} \cr &#x0026; \left[ {\nabla _{{\mib \bipsi } }^{2} } \right]_{{i,j}} \,\colon\! {\equals} \, {{\partial ^{2} } \over {\partial \psi _{i} \partial \psi _{j} }},\quad \forall i,j\in\left\{ {1,\,\ldots ,n} \right\} $$

The iterating scheme will stop once certain criterion is met, for example, when the magnitude of the score vector is small enough. This will be illustrated using the LC-H model as an example in section 4.2.

The results developed are based on the marginal likelihood of the state-space model, with generic static model parameters $$\bipsi $$ for observations z 1:T = z 1, … , z T having integrated out latent states ϕ 1, … , ϕ T , denoted by p $$_\bipsi $$ ( z 1:T ). We are then interested in forming recursive filtering to integrate the complete data likelihood to find the marginalised likelihood and then working with recursive gradient-based estimation to update static model parameters in Newton–Descent type algorithm, or for linear Gaussian systems a recursive least squares-based approach.

As observed in Poyiadjis et al. (Reference Poyiadjis, Doucet and Singh2005) and Poyiadjis et al. (Reference Poyiadjis, Doucet and Singh2011), it is useful to consider two classes of identities for the gradient and Hessian of the marginalised likelihood, given by the Fisher’s identity and the Louis’ identity, respectively, according to

(30) $$\matrix{ {\nabla _{{\mib \bipsi } } p_{{\mib \bipsi } } \left( {{\mib{z}} _{{1\: \colon\: T}} } \right)} \hfill &#x0026;\!\!\!\!\!\!\!\!\! {{\equals} \, {\int} {\nabla _{{\mib \bipsi } }{\rm ln}\, p_{{\mib \bipsi } } \left( {{\mib \biphi } _{{1\: \colon\: T}} ,{\mib{z}} _{{1\: \colon\: T}} } \right)p_{{\mib \bipsi } } \left( {\left. {{\mib \biphi } _{{1\: \colon\: T}} } \right|{\mib{z}} _{{1\: \colon\: T}} } \right)\,d{\mib \biphi } _{{1\: \colon\: T}} } } \hfill \cr { \!{\minus}\! \nabla _{{\mib \bipsi } }^{2} p_{{\mib \bipsi } } \left( {{\mib{z}} _{{1\: \colon\: T}} } \right)} \hfill &#x0026; \!\!\!\!\!\!{\, {\equals} \, \nabla _{{\mib \bipsi } } {\rm ln }\;p_{{\mib \bipsi} } \left( {{\mib{z}} _{{1\: \colon\: T}} } \right)\nabla _{{\mib \bipsi } } {\rm ln} \,p_{{\bf \bipsi }} \left( {{\mib{z}} _{{1\: \colon\: T}} } \right)^{{\rm \top }} \, {\minus}\, {{\nabla _{{\mib \bipsi } }^{2} {\rm ln}\;p_{{\mib \bipsi } } \left( {{\mib{z}} _{{1\: \colon\: T}} } \right)} \over {p_{{\mib \bipsi } } \left( {{\mib{z}} _{{1\: \colon\: T}} } \right)}}} \hfill \cr } $$

where

(31) $$\eqalignno{ {{\nabla _{{\mib \bipsi } }^{2} p_{{\mib \bipsi } } \left( {{\mib{z}} _{{1\: \colon\: T}} } \right)} \over {p_{{\mib \bipsi } } \left( {{\mib{z}} _{{1\: \colon\: T}} } \right)}}\, {\equals} \, &#x0026; {\int} {\nabla _{{\mib \bipsi } } {\rm ln}p_{{\mib \bipsi } } \left( {{\mib \biphi } _{{1\: \colon\: T}} ,{\mib{z}} _{{1\: \colon\: T}} } \right)\nabla _{{\mib \bipsi } } {\rm ln}\;p_{{\mib \bipsi } } \left( {{\mib \biphi } _{{1\: \colon\: T}} ,{\mib{z}} _{{1\: \colon\: T}} } \right)^{{\rm \top }} p_{{\mib \bipsi } } \left( {{\mib \biphi } _{{1\: \colon\: T}} \!\mid\!{\mib{z}} _{{1\: \colon\: T}} } \right)d{\mib \biphi } _{{1\: \colon\: T}} } \cr &#x0026; {\plus}{\int} {\nabla _{{\mib \bipsi } }^{2} {\rm ln}p_{{\mib \bipsi } } \left( {{\mib \biphi } _{{1\: \colon\: T}} ,{\mib{z}} _{{1\: \colon\: T}} } \right)p_{{\mib \bipsi } } \left( {{\mib \biphi } _{{1\: \colon\: T}} \!\mid\1{\mib{z}} _{{1\: \colon\: T}} } \right)d{\mib \biphi } _{{1\: \colon\: T}} } $$

An important point about these recursions is that the integrals for the gradient vector and Hessian matrix are expressed in terms of the path-space distribution p $$_\bipsi $$ ( ϕ 1:T | z 1:T ). In the case of the linear Gaussian dynamics this distribution can be obtained based on variations of the Kalman filter recursion, however, when the state-space model is non-linear or non-Gaussian this distribution must be estimated via Monte Carlo methods. The most efficient of these methods for state-space modelling purposes is known as the class of SMC methods (particle filters). In this case it will be more accurate from the perspective of the variance of the estimated gradient and Hessian matrices to utilise the filter distribution-based estimators in a recursive fashion based on the local estimates of the distributions p $$_\bipsi $$ ( ϕ t | z 1:t ), for each t∈{1, … , T} rather than the path-space estimator which is based on distribution p $$_\bipsi $$ ( ϕ 1:T | z 1:T ) at the final time T. The result explaining the difference in estimation precision for the gradient and Hessian, from the perspective of variance of the solution on the path-space distribution versus filter distributions is provided in theorem 1 of Poyiadjis et al. (Reference Poyiadjis, Doucet and Singh2011). This motivates the need to work with the filter recursions.

To achieve this one can replace in the Fisher and Louis’ identities the path-space quantities p $$_\bipsi $$ ( ϕ 1:T , z 1:T ) and p $$_\bipsi $$ ( ϕ 1:T | z 1:T ) by the filter quantities given by p $$_\bipsi $$ ( ϕ t , z 1:t ) and p $$_\bipsi $$ ( ϕ t | z 1:t ). After this substitution, one may utilise the following recursive formulations to evaluate the gradient and Hessian, see Poyiadjis et al. (Reference Poyiadjis, Doucet and Singh2005) and Poyiadjis et al. (Reference Poyiadjis, Doucet and Singh2011). In this case the Fisher identity is recursively given by

$$\eqalignno{ {\nabla _{{\mib \bipsi } } {\rm ln}\,p_{{\mib \bipsi } } \left( {{\mib{z}} _{{1\: \colon\: t}} } \right)} \,&#x0026; { {\equals} \, {\int} {\nabla _{{\mib \bipsi } } {\rm ln}\,p_{{\mib \bipsi } } \left( {{\mib \biphi } _{t} ,{\mib{z}} _{{1\:\colon\: t}} } \right)p_{{\mib \bipsi } } \left( {{\mib \biphi } _{t} \!\mid\!{\mib{z}} _{{1\: \colon\: t}} } \right)\,d{\mib \biphi } _{t} } } \cr {\nabla _{{\mib \bipsi } } {\rm ln}\,p_{{\mib \bipsi } } \left( {{\mib \biphi } _{t} ,{\mib{z}} _{{1\: \colon\: t}} } \right)} \,&#x0026; { {\equals} \, {{p_{{\mib \bipsi } } \left( {{\mib{z}} _{{1\:\colon\:t\, {\minus}\, 1}} } \right)p_{{\mib \bipsi } } \left( {{\mib{z}} _{t} \!\mid\:{\mib \biphi } _{t} } \right)} \over {p_{{\mib \bipsi } } \left( {{\mib \biphi } _{t} ,{\mib{z}} _{{1\: \colon\: t}} } \right)}}{\int} p_{{\mib \bipsi } } \left( {{\mib \biphi } _{t} \!\mid\!{\mib \biphi } _{{t\, {\minus}\, 1}} } \right)p_{{\mib \bipsi } } \left( {{\mib \biphi } _{{t\, {\minus}\, 1}} \!\mid\!{\mib{z}} _{{1\: \colon\: t\, {\minus}\, 1}} } \right)} \cr {} &#x0026; \qquad {{\times}\left[ {\nabla _{{\mib \bipsi } } {\rm ln}p_{{\mib \bipsi } } \left( {z_{t} \!\mid\!{\mib \biphi } _{t} } \right){\plus}\nabla _{{\mib \bipsi } } {\rm ln}\,p_{{\mib \bipsi } } \left( {{\mib \biphi } _{t} \!\mid\!{\mib \biphi } _{{t\, {\minus}\, 1}} } \right){\plus}\nabla _{{\mib \bipsi } } {\rm ln}\,p_{{\mib \bipsi } } \left( {{\mib \biphi } _{{t\, {\minus}\, 1}} ,{\mib{z}} _{{1\:\colon\: t\, {\minus}\, 1}} } \right)} \right]d{\mib \biphi } _{{t\, {\minus}\, 1}} } \cr {p_{\psi } \left( {{\mib \biphi } _{t} ,{\mib{z}} _{{1\: \colon\: t}} } \right)} &#x0026; {\, {\equals} \, p_{{\mib \bipsi } } \left( {{\mib{z}} _{{1\: \colon\: t\, {\minus}\, 1}} } \right)p_{{\mib \bipsi } } \left( {{\mib{z}} _{t} \!\mid\!{\mib \biphi } _{t} } \right){\int} p_{{\mib \bipsi } } \left( {{\mib \biphi } _{t} \!\mid\!{\mib \biphi } _{{t\, {\minus}\, 1}} } \right)p_{{\mib \bipsi } } \left( {{\mib \biphi } _{{t\, {\minus}\, 1}} \!\mid\!{\mib{z}} _{{1\: \colon\: t\, {\minus}\, 1}} } \right)d{\mib \biphi } _{{t\, {\minus}\, 1}} } \cr } $$

The recursive form of Luis’ identity is given by

$$\eqalignno{ {{{\nabla _{{\mib \bipsi } }^{2} p_{{\mib \bipsi } } \left( {{\mib{z}} _{{1\: \colon\: t}} } \right)} \over {p_{{\mib \bipsi } } \left( {{\mib \biphi } _{t} ,{\mib{z}} _{{1\: \colon\: t}} } \right)}}} &#x0026;\, { {\equals} \, {\int} {\nabla _{{\mib \bipsi } } {\rm ln}\,p_{{\mib \bipsi } } \left( {{\mib \biphi } _{t} ,{\mib{z}} _{{1\: \colon\: t}} } \right)\nabla _{{\mib \bipsi } } {\rm ln}\,p_{{\mib \bipsi } } \left( {{\mib \biphi } _{t} ,{\mib{z}} _{{1\: \colon\: t}} } \right)^{{\rm \top }} p_{{\mib \bipsi } } \left( {{\mib \biphi } _{t} \!\mid\!{\mib{z}} _{{1\: \colon\: t}} } \right)\:d{\mib \biphi } _{t} } } \cr {} &#x0026; \qquad {{\plus}{\int} {\nabla _{{\mib \bipsi } }^{2} {\rm ln}\,p_{{\mib \bipsi } } \left( {{\mib \biphi } _{t} ,{\mib{z}} _{{1\: \colon\: t}} } \right)p_{{\mib \bipsi } } \left( {{\mib \biphi } _{t} \!\mid\!{\mib{z}} _{{1\: \colon\: t}} } \right)\:d{\mib \biphi } _{t} } } \cr {\nabla _{{\mib \bipsi } }^{2} {\rm ln}\,p_{{\mib \bipsi } } \left( {{\mib \biphi } _{t} ,{\mib{z}} _{{1\: \colon\: t}} } \right)} &#x0026; {\, {\equals} \, {{\nabla _{{\mib \bipsi } }^{2} p_{{\mib \bipsi } } \left( {{\mib \biphi } _{t} ,{\mib{z}} _{{1\: \colon\: t}} } \right)} \over {p_{\psi } \left( {{\mib \biphi } _{t} ,{\mib{z}} _{{1\: \colon\: t}} } \right)}}\, {\minus}\, \nabla _{{\mib \bipsi } } {\rm ln}\,p_{{\mib \bipsi } } \left( {{\mib \biphi } _{t} ,{\mib{z}} _{{1\: \colon\: t}} } \right)\nabla _{{\mib \bipsi } } {\rm ln} p_{{\mib \bipsi } } \left( {{\mib \biphi } _{t} ,{\mib{z}} _{{1\: \colon\: t}} } \right)^{{\rm \top }} } \cr {\nabla _{{\mib \bipsi } }^{2} p_{{\mib \bipsi } } \left( {{\mib \biphi } _{t} ,{\mib{z}} _{{1\: \colon\: t}} } \right)} &#x0026; {\, {\equals} \, p_{{\mib \bipsi } } \left( {{\mib{z}} _{{1\: \colon\: t\, {\minus}\, 1}} } \right)p_{{\mib \bipsi } } \left( {{\mib{z}} _{t} \!\mid\!{\mib \biphi } _{t} } \right){\int} {p_{{\mib \bipsi } } \left( {{\mib \biphi } _{t} \!\mid\!{\mib \biphi } _{{t\, {\minus}\, 1}} } \right)p_{{\mib \bipsi } } \left( {{\mib \biphi } _{{t\, {\minus}\, 1}} \!\mid\!{\mib{z}} _{{1\: \colon\: t\, {\minus}\, 1}} } \right)} } \cr {} &#x0026; \qquad {{\times}\left\{ {\left[ {\nabla _{{\mib \bipsi } } {\rm ln}\,p_{{\mib \bipsi } } \left( {{\mib{z}} _{t} \!\mid\!{\mib \biphi } _{t} } \right){\plus}\nabla _{{\mib \bipsi } } {\rm ln}\,p_{{\mib \bipsi } } \left( {{\mib \biphi } _{t} \!\mid\!{\mib \biphi } _{{t\, {\minus}\, 1}} } \right){\plus}\nabla _{{\mib \bipsi } } {\rm ln}\,p_{{\mib \bipsi } } \left( {{\mib \biphi } _{{t\, {\minus}\, 1}} ,{\mib{z}} _{{1\: \colon\: t\, {\minus}\, 1}} } \right)} \right]} \right.} \cr {} &#x0026; \qquad {{\times}\left[ {\nabla _{{\mib \bipsi } } {\rm ln}\,p_{{\mib \bipsi } } \left( {{\mib{z}} _{t} \!\mid\!{\mib \biphi } _{t} } \right){\plus}\nabla _{{\mib \bipsi } } {\rm ln}\,p_{{\mib \bipsi } } \left( {{\mib \biphi } _{t} \!\mid\!{\mib \biphi } _{{t\, {\minus}\, 1}} } \right){\plus}\nabla _{{\mib \bipsi } } {\rm ln}\,p_{{\mib \bipsi } } \left( {{\mib \biphi } _{{t\, {\minus}\, 1}} ,{\mib{z}} _{{1\: \colon\: t\, {\minus}\, 1}} } \right)} \right]^{{\rm \top }} } \cr {} &#x0026; \qquad {\left. {{\plus}\left[ {\nabla _{{\mib \bipsi } }^{2} {\rm ln}\,p_{{\mib \bipsi } } \left( {{\mib{z}} _{t} \!\mid\!{\mib \biphi } _{t} } \right){\plus}\nabla _{{\mib \bipsi } }^{2} {\rm ln}\,p_{{\mib \bipsi } } \left( {{\mib \biphi } _{t} \!\mid\!{\mib \biphi } _{{t\, {\minus}\, 1}} } \right){\plus}\nabla _{{\mib \bipsi } }^{2} {\rm ln}\,p_{{\mib \bipsi } } \left( {{\mib \biphi } _{{t\, {\minus}\, 1}} ,{\mib{z}} _{{1\: \colon\: t\, {\minus}\, 1}} } \right)} \right]} \right\}d{\mib \biphi } _{{t\, {\minus}\, 1}} } \cr } $$

In general, the solution to these recursions can be achieved via SMC as detailed in Poyiadjis et al. (Reference Poyiadjis, Doucet and Singh2005) and Poyiadjis et al. (Reference Poyiadjis, Doucet and Singh2011).

In the following sections, we will illustrate the use of these recursive identities for the special case of the LC-H model where the state-space takes a linear Gaussian form. In this case the integrals and recursive evaluation of the gradient and Hessian can be written in closed form. To proceed we first introduce the optimal filter recursion, with respect to minimisation of mean-squared error, in the case of linear Gaussian state-space models, known as the Kalman filter (Kalman, Reference Kalman1960; Harvey, Reference Harvey1989).

4.1. Closed form filter recursions for LC-H model

The aim of filtering is to obtain the distribution of the latest state given observations. For a general state-space model, (15)–(16), the filtering density π( ϕ t | z 1:t ) at time t can be calculated sequentially by first assuming the filtering density π( ϕ t−1| z 1:t−1) at time t−1 is known. Then the one-step ahead predictive density for the state is given by

(32) $$\pi \left( {{\mib \biphi } _{t} \!\mid\!{\mib{z}} _{{1\: \colon\: t\, {\minus}\, 1}} } \right)\, {\equals} \, {\int} {\pi \left( {{\mib \biphi } _{t} \!\mid\!{\mib \biphi } _{{t\, {\minus}\, 1}} } \right)\pi \left( {{\mib \biphi } _{{t\, {\minus}\, 1}} \!\mid\!{\mib{z}} _{{1\: \colon\: t\, {\minus}\, 1}} } \right)d{\mib \biphi } _{{t\, {\minus}\, 1}} } $$

From Bayes’ formula and the structure of the conditional dependency of the state-space model, one can obtain the filtering density as

(33) $$\pi \left( {{\mib \biphi } _{t} \!\mid\!{\mib{z}} _{{1\: \colon\: t}} } \right)\, {\equals} \, {{\pi \left( {{\mib \biphi } _{t} \!\mid\!{\mib{z}} _{{1\: \colon\: t\, {\minus}\, 1}} } \right)\pi \left( {{\mib{z}} _{t} \!\mid\!{\mib \biphi } _{t} } \right)} \over {{\int} {\pi \left( {{\mib \biphi } _{t} \!\mid\!{\mib{z}} _{{1\: \colon\: t\, {\minus}\, 1}} } \right)\pi \left( {{\mib{z}} _{t} \!\mid\!{\mib \biphi } _{t} } \right)d{\mib \biphi } _{t} } }}$$

For non-linear and non-Gaussian state-space models, numerical techniques such as SMC methods are required to estimate the filtering density, see Doucet et al. (Reference Doucet, De Freitas and Gordon2001), Doucet et al. (Reference Doucet, Godsill and Andrieu2000) and Liu (Reference Liu2008).

In the case of the LC-H model, since it is a linear and Gaussian state-space model, the filtering distribution can be obtained analytically via Kalman filtering. In particular, we can find the conditional distributions of the key quantities in the filtering recursions are all Gaussian distributions as follows:

(34a) $$\kappa _{{t\, {\minus}\, 1}} \!\mid\!{\mib{y}} _{{1\: \colon\: t\, {\minus}\, 1}} \: \sim\: N\left( {m_{{t\, {\minus}\, 1}} ,C_{{t\, {\minus}\, 1}} } \right)$$
(34b) $$\kappa _{t} \!\mid\!{\mib{y}} _{{1\: \colon\: t\, {\minus}\, 1}} \: \sim\: N\left( {a_{t} ,R_{t} } \right)$$
(34c) $${\mib{y}} _{t} \!\mid\!{\mib{y}} _{{1\: \colon\: t\, {\minus}\, 1}} \: \sim\: N\left( {{\mib{f}} _{t} ,{\mib{Q}} _{t} } \right)$$
(34d) $$\kappa _{t} \!\mid\!{\mib{y}} _{{1\: \colon\: t}} \: \sim\: N\left( {m_{t} ,C_{t} } \right)$$

where the recursive nature of these distributions arises from the recursions of the sufficient statistics:

(35a) $$a_{t} \, {\equals} \, m_{{t\, {\minus}\, 1}} {\plus}\theta ,\quad R_{t} \, {\equals} \, C_{{t\, {\minus}\, 1}} {\plus}\sigma _{\omega }^{2} $$
(35b) $${\mib{f}} _{t} \, {\equals} \, {\mib \bialpha } {\plus}{\mib \bibeta } a_{t} ,\quad {\mib{Q}} _{t} \, {\equals} \, {\mib \bibeta \bibeta } ^{{\rm \top }} R_{t} {\plus}\Sigma $$
(35c) $$m_{t} \, {\equals} \, a_{t} {\plus}R_{t} {\mib \bibeta } ^{{\rm \top }} {\mib{Q}} _{t}^{{\, {\minus}\, 1}} \left( {{\mib{y}} _{t} \, {\minus}\, {\mib{f}} _{t} } \right),\quad C_{t} \, {\equals} \, R_{t} \, {\minus}\, R_{t} {\mib \bibeta } ^{{\rm \top }} {\mib{Q}} _{t}^{{\, {\minus}\! 1}} {\mib \bibeta } R_{t} $$

That is, given the filtering distribution at t−1, (34a), the filtering distribution at t is given by (34d) using (35a)–(35c).

4.2. Closed form gradient-based estimation via score and Hessian recursions for LC-H model

For the LC-H model, the log-likelihood function $$\ell ({\mib \bipsi } )\,\colon\! {\equals} \, {\rm ln}\,\pi \left( {{\mib{y}} _{{1\: \colon\: T}} \!\mid\!{\mib \bipsi } } \right)$$ is given by

(36) $$\ell ({\mib \bipsi } )\, {\equals} \, {\rm ln}\,\left( {\prod\limits_{t\, {\equals} \, 1}^T \pi ({\mib{y}} _{t} \!\mid\!{\mib{y}} _{{1\: \colon\: t\, {\minus}\, 1}} ,{\mib \bipsi } )} \right)\, {\equals} \, \, {\minus}\, {{pT} \over 2}{\rm ln}\,2\pi \, {\minus}\, {1 \over 2}\mathop{\sum}\limits_{t\, {\equals} \, 1}^T \left( {{\rm ln}\,\!\mid\!{\mib{Q}} _{t} \!\mid\!{\plus}{\mib \binu } _{t}^{{\rm \top }} {\bf Q}_{t}^{{\, {\minus}\! 1}} {\mib \binu } _{t} } \right)$$

where ν t := y t f t , and $${\mib \bipsi } \, {\equals} \, \big( {\alpha _{{x_{2} \: \colon\: x_{p} }} ,\beta _{{x_{2} \: \colon\: x_{p} }} ,\theta ,\sigma _{{{\varepsilon},x_{1} \: \colon\: x_{p} }}^{2} ,\sigma _{\omega }^{2} } \big)$$ is an n-dimensional parameter vector. The log-likelihood function (54) can be derived directly from (34c).

It can be shown that (Harvey, Reference Harvey1989) the elements of the score vector and the information matrix are given in closed form for the LC-H model according to the expressions:

(37) $${{\partial \ell } \over {\partial \psi _{i} }}\, {\equals} \, {1 \over 2}\mathop{\sum}\limits_{t\, {\equals} \, 1}^T \left\{ {{\rm tr}\left[ {\left( {{\mib{Q}} _{t}^{{\, {\minus}\! 1}} {{\partial {\mib{Q}} _{t} } \over {\partial \psi _{i} }}} \right)({\bf{1}} _{p} \, {\minus}\, {\mib{Q}} _{t}^{{\, {\minus}\! 1}} {\mib \binu } _{t} {\mib \binu } _{t}^{{\rm \top }} )} \right]{\plus}2{{\partial {\mib \binu } _{t}^{{\rm \top }} } \over {\partial \psi _{i} }}{\mib{Q}} _{t}^{{\, {\minus} 1}} {\mib \binu } _{t} } \right\},\quad i\, {\equals} \, 1,\,\ldots\:,n$$

where tr[·] denotes the trace operator and

(38) $$\, {\minus}\, {\rm E}\left[ {{{\partial ^{2} \ell } \over {\partial \psi _{i} \partial \psi _{j} }}} \right]\, {\equals} \, {1 \over 2}\mathop{\sum}\limits_{t\, {\equals} \, 1}^T \left[ {{\rm tr}\left( {{\mib{Q}} _{t}^{{\, {\minus}\! 1}} {{\partial {\mib{Q}} _{t} } \over {\partial \psi _{i} }}{\mib{Q}} _{t}^{{\, {\minus}\! 1}} {{\partial {\mib{Q}} _{t} } \over {\partial \psi _{j} }}} \right)} \right]{\plus}{\rm E}\left[ {\mathop{\sum}\limits_{t\, {\equals} \, 1}^T {{\partial {\mib \binu } _{t}^{{\rm \top }} } \over {\partial \psi _{i} }}{\mib{Q}} _{t}^{{\, {\minus}\! 1}} {{\partial {\mib \binu } _{t} } \over {\partial \psi _{j} }}} \right],\quad i,j\, {\equals} \, 1,\,\ldots\:,n$$

and the expectation operator E[·] on the second term in (38) can be dropped (since the expressions are asymptotically equivalent). In order to evaluate the score vector and the information matrix, we need

(39) $${{\partial {\mib \binu } _{t} } \over {\partial \psi _{i} }}\, {\equals} \, \, {\minus}\, {{\partial {\mib \bialpha } } \over {\partial \psi _{i} }}\, {\minus}\, {{\partial {\mib \bibeta } } \over {\partial \psi _{i} }}a_{t} \, {\minus}\, {\mib \bibeta } {{\partial a_{t} } \over {\partial \psi _{i} }}$$

and

(40) $${{\partial {\mib{Q}} _{t} } \over {\partial \psi _{i} }}\, {\equals} \, {{\partial {\mib \bibeta } } \over {\psi _{i} }}R_{t} {\mib \bibeta } ^{{\rm \top }} {\plus}{\mib \bibeta } {{\partial R_{t} } \over {\partial \psi _{i} }}{\mib \bibeta } ^{{\rm \top }} {\plus}{\mib \bibeta } R_{t} {{\partial {\mib \bibeta } ^{{\rm \top }} } \over {\partial \psi _{i} }}{\plus}{{\partial \Sigma } \over {\partial \psi _{i} }}$$

The expressions (39) and (40) require, for t=1, … , T and i=1, … , n

(41) $${{\partial a_{t} } \over {\partial \psi _{i} }}\, {\equals} \, {{\partial m_{{t\, {\minus}\, 1}} } \over {\partial \psi _{i} }}{\plus}{{\partial \theta } \over {\partial \psi _{i} }}\quad {\rm and}\quad {{\partial R_{t} } \over {\partial \psi _{i} }}\, {\equals} \, {{\partial C_{{t\, {\minus}\, 1}} } \over {\partial \psi _{i} }}{\plus}{{\partial \sigma _{\omega }^{2} } \over {\partial \psi _{i} }}$$

The expressions in (41) in turn require, for t=1, … , T−1 and i=1, … , n

(42) $${{\partial m_{t} } \over {\partial \psi _{i} }}\, {\equals} \, {{\partial a_{t} } \over {\partial \psi _{i} }}{\plus}{{\partial R_{t} } \over {\partial \psi _{i} }}{\mib \bibeta } ^{{\rm \top }} {\mib{Q}} _{t}^{{\, {\minus}\! 1}} {\mib \binu } _{t} {\plus}R_{t} {{\partial {\mib \bibeta } } \over {\partial \psi _{i} }}{\mib{Q}} _{t}^{{\, {\minus}\! 1}} {\mib \binu } _{t} \, {\minus}\, R_{t} {\mib \bibeta } ^{{\rm \top }} {\mib{Q}} _{t}^{{\, {\minus}\! 1}} {{\partial {\mib Q } _{t} } \over {\partial \psi _{i} }}{\mib{Q}} _{t}^{{\, {\minus}\! 1}} {\mib \binu } _{t} {\plus}R_{t} {\mib \bibeta } ^{{\rm \top }} {\mib{Q}} _{t}^{{\, {\minus}\! 1}} {{\partial {\mib \binu } _{t} } \over {\partial \psi _{i} }}$$

and

(43) $$\eqalignno{ {{\partial C_{t} } \over {\partial \psi _{i} }}\, {\equals} \, &#x0026; {{\partial R_{t} } \over {\partial \psi _{i} }}\, {\minus}\, {{\partial R_{t} } \over {\partial \psi _{i} }}{\mib \bibeta } ^{{\rm \top }} {\mib{Q}} _{t}^{{\, {\minus}\! 1}} {\mib \bibeta } R_{t} \, {\minus}\, R_{t} {{\partial {\mib \bibeta } } \over {\partial \psi _{i} }}{\mib{Q}} _{t}^{{\, {\minus}\! 1}} {\mib \bibeta } R_{t} \cr &#x0026; {\plus}R_{t} {\mib \bibeta } ^{{\rm \top }} {\mib{Q}} _{t}^{{\, {\minus}\! 1}} {{\partial {\mib{Q}} _{t} } \over {\partial \psi _{i} }}{\mib{Q}} _{t}^{{\, {\minus}\! 1}} {\mib \bibeta } R_{t} \, {\minus}\, R_{t} {\mib \bibeta } ^{{\rm \top }} {\mib{Q}} _{t}^{{\, {\minus}\! 1}} {{\partial {\mib \bibeta } } \over {\partial \psi _{i} }}R_{t} \, {\minus}\, R_{t} {\mib \bibeta } ^{{\rm \top }} {\mib{Q}} _{t}^{{\, {\minus}\! 1}} {\mib \bibeta } {{\partial R_{t} } \over {\partial \psi _{i} }} $$

Note that $${{\partial m_{0} } \over {\partial \psi _{i} }}\, {\equals} \, {{\partial C_{0} } \over {\partial \psi _{i} }}\, {\equals} \, 0$$ for i=1, … , n and the required differentiation matrices $${{\partial {\mib \bialpha } } \over {\partial {\mib \bipsi } _{i} }}$$ , $${{\partial {\mib \bibeta } } \over {\partial {\bf \psi }_{i} }}$$ , $${{\partial {\bf \biSigma } } \over {\partial {\bf \psi }_{i} }}$$ , $${{\partial {\mib \bitheta } } \over {\partial {\mib \bipsi } _{i} }}$$ and $${{\partial {\mib \bisigma } _{{\mib \biomega} }^{{\bf 2}} } \over {\partial {\mib \bipsi } _{i} }}$$ are displayed in Appendix A. The gradient-based estimation for the LC-H model is described in Algorithm 1.

5. Bayesian State-Space Inference

In contrast to the classical maximum likelihood approach where parameters are deterministic but unknown, in a Bayesian view the parameters are treated as random variables. In this way the Bayesian paradigm can take parameter uncertainty into account and incorporate in a consistent manner a priori beliefs on the important model parameters, as encoded through the prior.

In this section, we aim to develop modern approaches to Bayesian inference for state-space modelling that do not rely on potentially inefficient sampling approaches based on Gibbs or Metropolis-within-Gibbs for the latent state process. Instead we will introduce to stochastic mortality modelling state-space models of two classes of Bayesian inference:

  • Linear Gaussian stochastic mortality models: a Rao-Blackwellised Gibbs sampler approach which is based on a combination of Metropolis-within-Gibbs and Gibbs sampling steps for the static model parameters, combined with a forward-backward Kalman filter recursion for the state process. We assume the proposed constraints (18) throughout, avoiding the constraint issue when performing MCMC as discussed in section 2.4.

  • Non-linear/non-Gaussian stochastic mortality models: in the case of non-linear and or non-Gaussian state-space model dynamics such as the stochastic volatility models of LC3-H2 and the LCSV models, the sampler we develop is based on novel developments of the particle Metropolis–Hastings samplers of Andrieu et al. (Reference Andrieu, Doucet and Holenstein2010) adapted to the stochastic mortality models. In particular, we consider a combination of Rao-Blackwellised Kalman filter and particle filter for the latent state process full posterior conditionals, combined with a combination of Metropolis-within-Gibbs and Gibbs sampling steps for the static model parameters.

In general under all the Bayesian approaches we consider, we aim to obtain the joint posterior density:

(44) $$\pi \left( {\kappa _{{0\: \colon\: T}} ,{\mib \bipsi } \!\mid\!{\mib{y}} _{{1\: \colon\: T}} } \right)$$

of the states is κ 0:T as well as the parameters, $$\bipsi $$ , given the observations y 1:T . We begin with the first case of the linear Gaussian state-space stochastic mortality models and we use the LC-H model as an example where the parameter vector is $${\mib \bipsi } \,\colon\! {\equals} \, \big( {\alpha _{{x_{2} \: \colon\: x_{p} }} ,\beta _{{x_{2} \: \colon\: x_{p} }} ,\theta ,\sigma _{{{\varepsilon},x_{1} \: \colon\: x_{p} }}^{2} ,\sigma _{\omega }^{2} } \big)$$ as we use the constraint proposed in (18).

5.1. Linear Gaussian state-space inference

We develop an efficient approach involving a combined Gibbs sampling conjugate model sampler for the marginal target distributions of the static model parameters along with a forward-backward Kalman filter sampler for the latent process κ 1:T . A sample of the targeted density is obtained via Gibbs sampling where M is the number of MCMC iterations (Algorithm 2).

The general block Gibbs sampling algorithm steps require to sample from the full conditional densities π(κ 0:T | $$\bipsi $$ , y 1:T ) and π( $$\bipsi $$ |κ 0:T , y 1:T ), which are shown in the following.

5.1.1. Sampling from the full conditional density π(κ 0:T | $$\bipsi $$ , y 1:T )

Samples of the full conditional density π(κ 0:T | $$\bipsi $$ , y 1:T ) can be obtained via the so-called forward-filtering-backward sampling (FFBS) procedure (Carter & Kohn, Reference Carter and Kohn1994). This sampling approach is also utilised in Pedroza (Reference Pedroza2006). We can write

(45) $$\pi \left( {\kappa _{{0\: \colon\: T}} \!\mid\!{\mib \bipsi } ,{\mib{y}} _{{1\: \colon\: T}} } \right)\, {\equals} \, \prod\limits_{t\, {\equals} \, 0}^T \pi \left( {\kappa _{t} \!\mid\!\kappa _{{t{\plus}1\: \colon\: T}} ,{\mib \bipsi } ,{\mib{y}} _{{1\: \colon\: T}} } \right)\, {\equals} \, \prod\limits_{t\, {\equals} \, 0}^T \pi \left( {\kappa _{t} \!\mid\!\kappa _{{t{\plus}1}} ,{\mib \bipsi } ,{\mib{y}} _{{1\: \colon\: t}} } \right)$$

where the last term in the product, π(κ T | $$\bipsi $$ , y 1:T ), is distributed as N(m T ,C T ) which is obtained from the last iteration of the Kalman filtering procedure.

Once we draw a sample κ T from N(m T ,C T ), then (45) suggests that we can draw recursively and backwardly κ t from π(κ t |κ t+1, $$\bipsi $$ , y 1:t ), where t=T−1, T−2, … , 1, 0. Moreover, we have

(46) $$\kappa _{t} \!\mid\!\kappa _{{t{\plus}1}} ,{\mib \bipsi } ,{\mib{y}} _{{1\: \colon\: t}} {\rm \, \sim\, }N\left( {h_{t} ,H_{t} } \right)$$

where

(47a) $$h_{t} \, {\equals} \, m_{t} {\plus}C_{t} R_{{t{\plus}1}}^{{\, {\minus}\! 1}} \left( {\kappa _{{t{\plus}1}} \, {\minus}\, a_{{t{\plus}1}} } \right)$$
(47b) $$H_{t} \, {\equals} \, C_{t} \, {\minus}\, C_{t} R_{{t{\plus}1}}^{{\, {\minus}\! 1}} C_{t} $$

which can be derived based on Kalman smoother (Petris et al., Reference Petris, Petrone and Campagnoli2009).

The FFBS procedure is displayed in Algorithm 3. Note that the prior distribution for κ 0 can be set to be vague to run the Kalman filter; the output of the algorithm includes the posterior distribution of κ 0.

5.1.2. Sampling from the full conditional density π( $$\bipsi $$ |κ 0:T , y 1:T )

The first thing to observe is that under the reparameterisation of the identification constraints (18), the following Gibbs sampling stages can be performed exactly.

We assume that the prior for $$\big( {\alpha _{{x_{2} \: \colon\: x_{p} }} ,\beta _{{x_{2} \: \colon\: x_{p} }} ,\theta ,\sigma _{{{\varepsilon},x_{1} \: \colon\: x_{p} }}^{2} ,\sigma _{\omega }^{2} } \big)$$ are given by

(48a) $$\alpha _{x} \: \sim\: N\left( {\tilde{\bimu }_{\alpha } ,\tilde{\sigma }_{\alpha }^{2} } \right),\quad \beta _{x} \: \sim\: N\big( {\tilde{\bimu }_{\beta } ,\tilde{\sigma }_{\beta }^{2} } \big),\quad \theta \: \sim\: N\left( {\tilde{\bimu }_{\theta } ,\tilde{\sigma }_{\theta }^{2} } \right)$$
(48b) $$\sigma _{{{\varepsilon},x}}^{2} \: \sim\: {\rm IG}\big( {\tilde{a}_{{\varepsilon}} ,\tilde{b}_{{\varepsilon}} } \big),\quad \sigma _{\omega }^{2} \: \sim\: {\rm IG}\big( {\tilde{a}_{\omega } ,\tilde{b}_{\omega } } \big)$$

where $${\rm IG}\big( {\tilde{a}_{\omega } ,\tilde{b}_{\omega } } \big)$$ denotes an inverse-γ distribution with mean $${{\tilde{b}_{\omega } } /{\left( {\tilde{a}_{\omega } {\minus}\, 1} \right)^{2}}$$ and variance $${{\tilde{b}_{\omega }^{2} } {/{{{\big( {\big( {\tilde{a}_{\omega } \, {\minus}\, 1} \big)^{2} \big( {\tilde{a}_{\omega } \, {\minus}\, 2} \right)} \big)}}} \big. \kern-\nulldelimiterspace} }$$ for $$\tilde{a}_{\omega } \,\gt\,2$$ . We assume that the priors for all parameters are independent. In this case the posterior densities of parameters are of the same type as the prior densities, a so-called conjugate prior. The posterior distribution for each parameter is given by (we write, for ease of notation, y = y 1 :T , κ =κ 0:T and family $$\bipsi $$ λ means “parameter vector $$\bipsi $$ without the parameter λ”):

(49) $$\alpha _{x} \!\mid\!{\mib{y}} _{{1\: \colon\: T}} ,{\bikappa } ,{\mib \bipsi } _{{\, {\minus}\, \alpha _{x} }} {\rm \: \sim\: }N\left( {{{\tilde{\mu }_{\alpha } \sigma _{{{\varepsilon},x}}^{2} {\plus}\tilde{\sigma }_{\alpha }^{2} \mathop{\sum}\nolimits_t \left( {y_{{xt}} \, {\minus}\, \beta _{x} \kappa _{t} } \right)} \over {\tilde{\sigma }_{\alpha }^{2} T{\plus}\sigma _{{{\varepsilon},x}}^{2} }},{{\tilde{\sigma }_{\alpha }^{2} \sigma _{{{\varepsilon},x}}^{2} } \over {\tilde{\sigma }_{\alpha }^{2} T{\plus}\sigma _{{{\varepsilon},x}}^{2} }}} \right)$$
(50) $$\beta _{x} \!\mid\!{\mib{y}} _{{1\: \colon\: T}} ,{\mib \bikappa } ,{\mib \bipsi } _{{\, {\minus}\, \beta _{x} }} \: \sim\: N\left( {{{\tilde{\sigma }_{\beta }^{2} \mathop{\sum}\nolimits_t \left( {y_{{xt}} \, {\minus}\, \alpha _{x} } \right)\kappa _{t} {\plus}\tilde{\mu }_{\beta } \sigma _{{{\varepsilon},x}}^{2} } \over {\tilde{\sigma }_{\beta }^{2} \mathop{\sum}\nolimits_t \kappa _{t}^{2} {\plus}\sigma _{{{\varepsilon},x}}^{2} }},{{\tilde{\sigma }_{\beta }^{2} \sigma _{{{\varepsilon},x}}^{2} } \over {\tilde{\sigma }_{\beta }^{2} \mathop{\sum}\nolimits_t \kappa _{t}^{2} {\plus}\sigma _{{{\varepsilon},x}}^{2} }}} \right)$$
(51) $$\theta \!\mid\!{\mib{y}} _{{1\: \colon\: T}} ,{\mib \bikappa } ,{\mib \bipsi } _{{\, {\minus}\, \theta }} {\rm \: \sim\: }N\left( {{{\tilde{\sigma }_{\theta }^{2} \mathop{\sum}\nolimits_{t\, {\equals} \, 1}^T {\left( {\kappa _{t} \, {\minus}\, \kappa _{{t\, {\minus}\, 1}} } \right){\plus}\tilde{\mu }_{\theta } \sigma _{\omega }^{2} } } \over {\tilde{\sigma }_{\theta }^{2} T{\plus}\sigma _{\omega }^{2} }},{{\tilde{\sigma }_{\theta }^{2} \sigma _{\omega }^{2} } \over {\tilde{\sigma }_{\theta }^{2} T{\plus}\sigma _{\omega }^{2} }}} \right)$$
(52) $$\sigma _{{{\varepsilon},x}}^{2} \!\mid\!{\mib{y}} _{{1\: \colon\: T}} ,{\mib \bikappa } ,{\mib \bipsi } _{{\, {\minus}\, \sigma _{{{\varepsilon},x}}^{2} }} {\rm \: \sim\: IG}\left( {\tilde{a}_{{\varepsilon}} {\plus}{{pT} \over 2},\:\tilde{b}_{{\varepsilon}} {\plus}{1 \over 2}\mathop{\sum}\limits_{t\, {\equals} \, 1}^T \left( {y_{{xt}} \, {\minus}\, (\alpha _{x} {\plus}\beta _{x} \kappa _{t} )} \right)^{2} } \right)$$
(53) $$\sigma _{\omega }^{2} \!\mid\!{\mib{y}} _{{1\: \colon\: T}} ,{\mib \bikappa } ,{\mib \bipsi } _{{\, {\minus}\, \sigma _{\omega }^{2} }} {\rm \: \sim\: IG}\left( {\tilde{a}_{\omega } {\plus}{T \over 2},\:\tilde{b}_{\omega } {\plus}{1 \over 2}\mathop{\sum}\limits_{t\, {\equals} \, 1}^T \left( {\kappa _{t} \, {\minus}\, (\kappa _{{t\, {\minus}\, 1}} {\plus}\theta )} \right)^{2} } \right)$$

5.2. Non-linear/non-Gaussian state-space inference

In the case of non-linear/non-Gaussian state-space model dynamics such as the stochastic volatility models of LC3-H2 and the LCSV models, the sampler we develop is based on novel developments of the particle Metropolis–Hastings samplers of Andrieu et al. (Reference Andrieu, Doucet and Holenstein2010) adapted to the stochastic mortality models. In particular, we consider a combination of Rao-Blackwellised Kalman filter and particle filter for the latent state process full posterior conditionals, combined with a combination of Metropolis-within-Gibbs and Gibbs sampling steps for the static model parameters, both embedded within a PMCMC framework. We will illustrate the idea of this methodology for the LCSV model where a stochastic volatility dynamics is included in the latent process for the period effect.

5.2.1. Estimation for the LCSV mortality model

The static parameter vector is denoted as $${\mib \bipsi } \, {\equals} \, \left( {\alpha _{{x_{2} \: \colon\: x_{p} }} ,\beta _{{x_{2} \: \colon\: x_{p} }} ,\theta ,\sigma _{{\varepsilon}}^{2} ,\sigma _{\gamma }^{2} ,\lambda _{1} ,\lambda _{2} ,\gamma _{0} } \right)$$ . Note that we treat γ 0 as a static parameter and our task is to obtain samples from the joint posterior distribution:

(54) $$\pi \left( {\kappa _{{0\: \colon\: T}} ,\gamma _{{1\: \colon\: T}} ,{\mib \bipsi } \!\mid\!{\mib{y}} _{{1\: \colon\: T}} } \right)$$

In this setting one can try a number of different approaches, the first would be to sample jointly from the full posterior distribution (54) via PMCMC methods to be described below. A second approach would be to combine PMCMC methods within a block-Gibbs-based sampler such as the following sampling scheme, where we apply Gibbs sampling to sample from the full conditional densities:

(55a) $$\pi \left( {\kappa _{{0\: \colon\: T}} \!\mid\!{\mib \bipsi } ,\gamma _{{1\: \colon\: T}} ,{\mib{y}} _{{1\: \colon\: T}} } \right)$$
(55b) $$\pi \left( {{\mib \bipsi } \!\mid\!\kappa _{{0\: \colon\: T}} ,\gamma _{{1\: \colon\: T}} ,{\mib{y}} _{{1\: \colon\: T}} } \right)$$
(55c) $$\pi \left( {\gamma _{{1\: \colon\: T}} \!\mid\!{\mib \bipsi } ,\kappa _{{0\: \colon\: T}} ,{\mib{y}} _{{1\: \colon\: T}} } \right)$$

Note that sampling from (55a) can be achieved by the FFBS procedure described in Algorithm 3, as one can apply Kalman filtering since γ 1:T is assumed to be given. The only difference compared to section 5.1.1 is that the term $$\sigma _{\omega }^{2} $$ is replaced by exp{γ t } in Kalman filtering.

In the following, we provide details on how to sample from either the full posterior (54) or from full conditionals such as the density in (55c), via PMCMC method. Sampling from the posteriors of the static parameters (55b) is detailed in section 5.2.4.

5.2.2. PMCMC for mortality models

In this section, we explain the generic form of the PMCMC methodology that can be applied in a range of approaches for state-space stochastic mortality models. In general a PMCMC sampling method is a class of MCMC method where SMC algorithm is used as a proposal distribution within a MCMC algorithm. Though this seems trivial, it is actually based on a key observation that by using such a filter within the MCMC, the dimension of the acceptance probability in the Metropolis–Hastings acceptance–rejection stage is significantly reduced and can therefore facilitate much better mixing performance of the resulting Markov chain, reducing variance in estimation, see discussion in detail in Andrieu et al. (Reference Andrieu, Doucet and Holenstein2010).

To bring out the essence of PMCMC, we first discuss a generic approach to sample from a target distribution:

(56) $$\pi \left( {{\mib \biphi } _{{1\: \colon\: T}} ,{\mib \bipsi } \!\mid\!{\bf z}_{{1\: \colon\: T}} } \right)$$

where ϕ 1:T and $$\bipsi $$ are the latent state and static parameters of a general state-space model. Note, the state processes in this context are generally non-linear and potentially non-Gaussian.

From the perspective of obtaining the most efficiently mixing Markov chain to sample from this posterior, the ideal proposal distribution for constructing the Markov chain for $$\left( {{\mib \biphi } '_{{1\: \colon\: T}} ,{\mib \bipsi '} } \right)$$ is easily seen to be given by

(57) $$q\left( {{\mib \bipsi '} \!\mid\!{\mib \bipsi } } \right)p_{{{\mib \bipsi '} }} \left( {{\mib \biphi \prime \!}_{{1\: \colon\: T}}\!\mid\!{\mib{z}} _{{1\: \colon\: T}} } \right)$$

where q( $$\bipsi $$ ′| $$\bipsi $$ ) is a proposal for the parameters and the proposal for the latent state, $$p_{{{\mib \bipsi '} }} \left( {{\mib \biphi \prime } _{{1\: \colon\: T}}{\mib z}_{{1\: \colon\: T}} } \right)$$ , is from the state equation (given $$\bipsi $$ ′). Here ( ϕ 1:T , $$\bipsi $$ ) is the current state at MCMC iteration j−1 and $$\left( {{\mib \biphi \prime}\!_{{1\: \colon\: T}} ,{\mib \bipsi '} } \right)$$ is the proposed next move at MCMC iteration j.

In this ideal case, the acceptance probability of this ideal proposal is given by

(58) $$\alpha \left( {\left( {{\mib \biphi }{{\mib \prime} }\!_{{1\: \colon\: T}} ,{\mib \bipsi '} } \right),\left( {{\mib \biphi } _{{1\: \colon\: T}} ,{\mib \bipsi } } \right)} \right)\, {\equals} \, 1\wedge{{p\left( {{\mib \biphi }{{\mib \prime} }\! _{{1\: \colon\: T}} ,{\mib \bipsi '} \!\mid\!{\mib{z}} _{{1\: \colon\: T}} } \right)q\left( {{\mib \bipsi } \!\mid\!{\mib \bipsi '} } \right)p_{{\mib \bipsi } } \left( {{\mib \biphi } _{{1\: \colon\: T}} \!\mid\!{\mib{z}} _{{1\: \colon\: T}} } \right)} \over {p\left( {{\mib \biphi } _{{1\: \colon\: T}} ,{\mib \bipsi } \!\mid\!{\mib{z}} _{{1\: \colon\: T}} } \right)q\left( {{\mib \bipsi '} \!\mid\!{\mib \bipsi } } \right)p_{{{\mib \bipsi '} }} \left( {{\mib \biphi } _{{1\: \colon\: T}}^{{\mib \prime} } \!\mid\!{\mib{z}} _{{1\: \colon\: T}} } \right)}}$$
(59) $$\, {\equals} \, 1\wedge{{p_{{{\mib \bipsi '} }} \left( {{\mib \biphi } _{{1\: \colon\: T}} \!\mid\!{\bf z}_{{1\: \colon\: T}} } \right)p\left( {{\mib \bipsi '} \!\mid\!{\mib{z}} _{{1\: \colon\: T}} } \right)q\left( {{\mib \bipsi } \!\mid\!{\mib \bipsi '} } \right)p_{{\mib \bipsi } } \left( {{\mib \biphi } _{{1\: \colon\: T}} \!\mid\!{\mib{z}} _{{1\: \colon\: T}} } \right)} \over {p_{{\mib \bipsi } } \left( {{\mib \biphi } _{{1\: \colon\: T}} \!\mid\!{\mib{z}} _{{1\: \colon\: T}} } \right)p\left( {{\mib \bipsi } \!\mid\!{\mib{z}} _{{1\: \colon\: T}} } \right)q\left( {{\mib \bipsi '} \!\mid\!{\mib \bipsi } } \right)p_{{{\mib \bipsi '} }} \left( {{\mib \biphi }{{\mib \prime} }\! _{{1\: \colon\: T}} \!\mid\!{\mib{z}} _{{1\: \colon\: T}} } \right)}}$$
(60) $$\, {\equals} \, 1\wedge{{p\left( {{\mib \bipsi '} \!\mid\!{\mib{z}} _{{1\: \colon\: T}} } \right)q\left( {{\mib \bipsi } \!\mid\!{\mib \bipsi '} } \right)} \over {p\left( {{\mib \bipsi } \!\mid\!{\mib{z}} _{{1\: \colon\: T}} } \right)q\left( {{\mib \bipsi '} \!\mid\!{\mib \bipsi } } \right)}}$$
(61) $$\, {\equals} \, 1\wedge{{p_{{{\mib \bipsi '} }} ({\mib{z}} _{{1\: \colon\: T}} )p({\mib \bipsi '} )q\left( {{\mib \bipsi } \!\mid\!{\mib \bipsi '} } \right)} \over {p_{{\mib \bipsi } } ({\mib{z}} _{{1\: \colon\: T}} )p({\mib \bipsi } )q({\mib \bipsi '} \!\mid\!{\mib \bipsi } )}}$$

where r 1Λr 2:=min(r 1, r 2). A desirable property of the ideal proposal is that the acceptance probability depends only on the marginal likelihood, together with the prior and proposal for the static parameters. This is optimal in the sense that the dimension of the numerator and denominator is reduced significantly to the static model parameter dimensions, and not including explicitly the path-space latent process dimensions, a reduction of d×T dimensions for a d-dimensional state vector ϕ t . However, clearly one can never achieve this goal as it requires perfect knowledge of $$p_{{{\mib \bipsi '} }} \left( {{\mib \biphi }{{\mib \prime} }\!_{{1\: \colon\: T}} } \!\mid\!{\mib{z}} _{{1\: \colon\: T}} } \right)$$ as well as the ability to sample this distribution, both of which are unachievable except in the special case of the Linear Gaussian case explained in section 5.1.1.

To circumvent this problem, the particle marginal Metropolis–Hastings sampler (PMMH; Andrieu et al., Reference Andrieu, Doucet and Holenstein2010) applies SMC method to obtain an approximate of the state transition density (which is also the state proposal):

(62) $$\hat{p}_{{{\mib \bipsi '} }} \left( {{\mib \biphi } _{{1\: \colon\: T}} \!\mid\!{\mib{z}} _{{1\: \colon\: T}} } \right)\, {\equals} \, \mathop{\sum}\limits_{i\, {\equals} \, 1}^N w_{T}^{{(i)}} \delta _{{{\mib \biphi } _{{1\: \colon\: T}}^{{(i)}} }} \left( {{\mib \biphi } _{{1\: \colon\: T}} } \right)$$

where $$w_{T}^{{(i)}} $$ is the importance weight, $$\delta _{x} (X)$$ a Dirac mass function centred at X and a proposed next move of the latent state is drawn from this discrete approximate distribution. Moreover, a by-product of a SMC algorithm is the marginal likelihood, $$\hat{p}_{{\mib \bipsi } } ({\mib{z}} _{{1\: \colon\: T}} )$$ , which has the following important property:

Lemma 5.1 A SMC proposal admits as a by-product an unbiased estimator of the marginal likelihood p $$_\bipsi $$ ( z 1:T ) given by

(63) $$\hat{p}_{{\mib \bipsi } } \left( {{\mib{z}} _{{1\: \colon\: T}} } \right)\,\colon\! {\equals} \, \prod\limits_{t\, {\equals} \, 2}^T \hat{p}_{{\mib \bipsi } } \left( {{\mib{z}} _{t} \!\mid\!{\mib{z}} _{{1\: \colon\: t\, {\minus}\, 1}} } \right)$$

where a SMC approximation with N-particles produces, for all t

(64) $$\hat{p}_{{\mib \bipsi } } \left( {{\mib{z}} _{t} \!\mid\!{\mib{z}} _{{1\: \colon\: t\, {\minus}\, 1}} } \right)\, {\equals} \, {1 \over N}\mathop{\sum}\limits_{i\, {\equals} \, 1}^N w_{t}^{{(i)}} $$

which is an unbiased particle estimate of p $$_\bipsi $$ ( z t | z 1:t−1). This non-trivial unbiasedness was first presented in Del Moral (Reference Del Moral2004) and has since been utilised to great advantage as explained in Chopin et al. (Reference Chopin, Jacob and Papaspiliopoulos2013). In addition, the variance of this estimator typically only grows linearly with T.

The unbiased approximate marginal likelihoods are then used in the acceptance probability (61):

(65) $$\alpha \left( {\left( {{\mib \biphi } _{{1\: \colon\: T}}^{{\mib \prime} } ,{\mib \bipsi '} } \right),\left( {{\mib \biphi } _{{1\: \colon\: T}} ,{\mib \bipsi } } \right)} \right)\, {\equals} \, 1\wedge{{\hat{p}_{{{\mib \bipsi '} }} \left( {{\mib{z}} _{{1\: \colon\: T}} } \right)p\left( {{\mib \bipsi '} } \right)q\left( {{\mib \bipsi } \!\mid\!{\mib \bipsi '} } \right)} \over {\hat{p}_{{\mib \bipsi } } \left( {{\mib{z}} _{{1\: \colon\: T}} } \right)p\left( {\mib \bipsi } \right)q\left( {{\mib \bipsi '} \!\mid\!{\mib \bipsi } } \right)}}$$

Due to the unbiasedness of the estimated marginal likelihood, Andrieu et al. (Reference Andrieu, Doucet and Holenstein2010) show that, even though only SMC approximates are used (with finite number of particles N), the invariant distribution of PMMH is the target distribution π( ϕ 1:T , $$\bipsi $$ | z 1:T ).

To apply PMCMC for an efficient estimation of the LCSV model, we first notice that we can obtain explicitly the posteriors of static parameters via conjugate priors. As a result we are only required to sample from the density π(γ 1:T | $$\bipsi $$ , κ 0:n , y 1:T ), instead of the joint density π(γ 1:T , $$\bipsi $$ |κ 0:n , y 1:T ). It turns out that there is a class of PMCMC algorithm, called particle-independent Metropolis–Hastings sampler (PIMH), which provide a mechanism to sample exactly from π1:T | $$\bipsi $$ , κ 0:n , y 1:T ).

Our approach to sampling from the joint posterior distribution, π(k 0:T , γ 1:T, $$\bipsi $$ | y 1:T , of the LCSV model is summarised in Algorithm 4.

5.2.3. PIMH: sampling from π(γ 1:T | $$\bipsi $$ , κ 0:n , y 1:T )

We first note that π(γ 1:T | $$\bipsi $$ , κ 0:n , y 1:T =π $$_\bipsi $$ (γ 1:T |k 0:n ) given the structure of the LCSV model. Using an independent proposal density, q $$_\bipsi $$ 1:T 0:n ), in the Metropolis–Hastings algorithm, the acceptance probability is given by

(66) $$\alpha \left( {\gamma '_{{1\: \colon\: T}} ,\gamma _{{1\: \colon\: T}} } \right)\, {\equals} \, 1\wedge{{\pi _{{\bf \psi }} (\gamma{{\mib \prime} }\! _{{1\: \colon\: T}} \!\mid\!\kappa _{{0\: \colon\: n}} )q_{{\bf \psi }} \left( {\gamma _{{1\: \colon\: T}} \!\mid\!\kappa _{{0\: \colon\: n}} } \right)} \over {\pi _{{\bf \psi }} (\gamma _{{1\: \colon\: T}} \!\mid\!\kappa _{{0\: \colon\: n}} )q_{{\bf \psi }} \left( {\gamma _{{1\: \colon\: T}}^{{\mib \prime} } \!\mid\!\kappa _{{0\: \colon\: n}} } \right)}}$$

Ideally, one may take q $$_\bipsi $$ (γ 1:T |κ 0:n )=π $$_\bipsi $$ (γ 1:T |κ 0:n ). However, in most cases such an ideal choice is impossible to sample from and to evaluate. The PIMH sampler proposes instead to use the SMC approximation $$\hat{\pi }_{\psi } (\gamma _{{1\: \colon\: T}} \!\mid\!\kappa _{{0\: \colon\: n}} )$$ as the proposal density and calculate the acceptance probability as

(67) $$\alpha \left( {\gamma{{\mib \prime} }\! _{{1\: \colon\: T}} ,\gamma _{{1\: \colon\: T}} } \right)\, {\equals} \, 1\wedge{{\hat{\pi }_{{\mib \bipsi } } \left( {\kappa _{{0\: \colon\: n}} } \right){\prime} } \over {\hat{\pi }_{{\mib \bipsi } } \left( {\kappa _{{0\: \colon\: n}} } \right)[j\, {\minus}\, 1]}}$$

where $$\hat{\pi }_{{\mib \bipsi } } \left( {\kappa _{{0\: \colon\: n}} } \right){\prime} $$ and $$\hat{\pi }_{{\mib \bipsi } } \left( {\kappa _{{0\: \colon\: n}} } \right)[j\, {\minus}\, 1]$$ are unbiased marginal likelihoods estimated by SMC (see Lemma 5.1) in the current MCMC iteration j and the previous iteration j−1, respectively. It can be shown that the invariant distribution of the PIMH sampler is the target distribution π $$_\bipsi $$ (γ 1:T |κ 0:n ) (Andrieu et al., Reference Andrieu, Doucet and Holenstein2010).

It remains to specify an SMC approximation $$\hat{\pi }_{{\mib \bipsi } } \left( {\gamma _{{1\: \colon\: T}} \!\mid\!\kappa _{{0\: \colon\: n}} } \right)$$ (Appendix B). We use the so-called bootstrap filter, that is, the proposal distribution in the SMC algorithm to draw γ t is given by the state equation (23c):

(68) $$g_{t} \left( {\gamma _{t} \!\mid\!\gamma _{{1\: \colon\: t\, {\minus}\, 1}} ,\kappa _{{0\: \colon\: t}} } \right)\,\colon\! {\equals} \, \pi _{{\mib \bipsi } } \left( {\gamma _{t} \!\mid\!\gamma _{{t\, {\minus}\, 1}} } \right)$$

Consequently, the importance weight is evaluated as

(69) $$\tilde{w}_{t} \propto\tilde{w}_{{t\, {\minus}\, 1}} \pi _{{\mib \bipsi } } \left( {\kappa _{t} \!\mid\!\gamma _{t} ,\kappa _{{t\, {\minus}\, 1}} } \right)$$

where π(κ t |γ t , κ t−1) is the incremental importance weight. Our approach for sampling from π $$_\bipsi $$ (γ 1:T |κ 0:T ) is summarised in Algorithm 5 (together with Algorithm 6).

5.2.4. Sampling from π( $$\bipsi $$ |κ 0:T , γ 1:T , y 1:T )

We assume the prior for $$\big( {\alpha _{{x_{2} \: \colon\: x_{p} }} ,\,\beta _{{x_{2} \: \colon\: x_{p} }} ,\,\theta ,\,\sigma _{{\varepsilon}}^{2} ,\,\sigma _{\gamma }^{2} ,\,\lambda _{1} ,\,\lambda _{2} ,\,\gamma _{0} } \big)$$ is given by

(75a) $$\alpha _{x} \: \sim\: N\left( {\tilde{\bimu }_{\alpha } ,\tilde{\sigma }_{\alpha }^{2} } \right),\quad \beta _{x} \: \sim\: N\left( {\tilde{\bimu }_{\beta } ,\tilde{\sigma }_{\beta }^{2} } \right),\quad \theta \: \sim\: N\left( {\tilde{\bimu }_{\theta } ,\tilde{\sigma }_{\theta }^{2} } \right),\quad \sigma _{{\varepsilon}}^{2} \: \sim\: {\rm IG}\left( {\tilde{a}_{{\varepsilon}} ,\tilde{b}_{{\varepsilon}} } \right)$$
(75b) $$\sigma _{\gamma }^{2} \: \sim\: {\rm IG}\left( {\tilde{a}_{\gamma } ,\,\tilde{b}_{\gamma } } \right),\:\,\lambda _{1} \: \sim\: N_{{[\, {\minus}\, 1,1]}} \left( {\tilde{\bimu }_{{\lambda _{1} \!}} ,\,\tilde{\sigma }_{{\lambda _{1} }}^{2} } \right),\,\lambda _{2} \: \sim\: N\left( {\tilde{\bimu }_{{\lambda _{2} }\!} ,\,\tilde{\sigma }_{{\lambda _{2} }}^{2} } \right)\:\gamma _{0} \: \sim\: N\left( {\tilde{\bimu }_{{\gamma _{0\!} }} ,\,\tilde{\sigma }_{{\gamma _{0} }}^{2} } \right)$$

where x=x 2, … , x p and N [−1, 1] denotes a truncated Gaussian with support [−1, 1]. It is assumed that the priors for all parameters are independent.

Samples from the density π( $$\bipsi $$ |κ 0:T , γ 1:T , Y 1:T ) are obtained by sampling from the following posteriors:

(76) $$\alpha _{x} \!\mid\!{\mib{y}} _{{1\: \colon\: T}} ,\,{\mib{\bikappa }} ,\,{\mib{\bigamma }} ,\,{\mib{\bipsi }} _{{\, {\minus}\, \alpha _{x} }} \: \sim\: N\left( {{{\tilde{\mu }_{\alpha } \sigma _{{\varepsilon}}^{2} {\plus}\tilde{\sigma }_{\alpha }^{2} \mathop{\sum}\nolimits_t {\left( {y_{{xt}} \, {\minus}\, \beta _{x} \kappa _{t} } \right)} } \over {T\tilde{\sigma }_{\alpha }^{2} {\plus}\sigma _{{\varepsilon}}^{2} }},{{\tilde{\sigma }_{\alpha }^{2} \sigma _{{\varepsilon}}^{2} } \over {T\tilde{\sigma }_{\alpha }^{2} {\plus}\sigma _{{\varepsilon}}^{2} }}} \right)$$
(77) $$\beta _{x} \!\mid\!{\mib{y}} _{{1\: \colon\: T}} ,\,{\mib{\kappa }} ,\,{\mib \gamma } ,\,{\mib \bipsi } _{{\, {\minus}\, \beta _{x} }} \: \sim\: N\left( {{{\tilde{\sigma }_{\beta }^{2} \mathop{\sum}\nolimits_t {\left( {y_{{xt}} \, {\minus}\, \alpha _{x} } \right)\kappa _{t} {\plus}\tilde{\mu }_{\beta } \sigma _{{\varepsilon}}^{2} } } \over {\tilde{\sigma }_{\beta }^{2} \mathop{\sum}\nolimits_t {\kappa _{t}^{2} {\plus}\sigma _{{\varepsilon}}^{2} } }},{{\tilde{\sigma }_{\beta }^{2} \sigma _{{\varepsilon}}^{2} } \over {\tilde{\sigma }_{\beta }^{2} \mathop{\sum}\nolimits_t {\kappa _{t}^{2} {\plus}\sigma _{{\varepsilon}}^{2} } }}} \right)$$
(78) $$\theta \!\mid\!{\mib{y}} _{{1\: \colon\: T}} ,\,{\mib \kappa } ,\,{\mib \gamma } ,\,{\mib \bipsi } _{{\, {\minus}\, \theta }} \: \sim\: N\left( {{{\tilde{\mu }_{\theta } \,/\,\tilde{\sigma }_{\theta }^{2} {\plus}\mathop{\sum}\nolimits_t {\left( {\kappa _{t} \, {\minus}\, \kappa _{{t\, {\minus}\, 1}} } \right)\,/\,e^{{\gamma _{t} }} } } \over {1\,/\,\tilde{\sigma }_{\theta }^{2} {\plus}\mathop{\sum}\nolimits_t {1\,/\,e^{{\gamma _{t} }} } }},{1 \over {1\,/\,\tilde{\sigma }_{\theta }^{2} {\plus}\mathop{\sum}\nolimits_t {1\,/\,e^{{\gamma _{t} }} } }}} \right)$$
(79) $$\sigma _{{\varepsilon}}^{2} \!\mid\!{\mib{y}} _{{1\: \colon\: T}} ,\,{\mib{\kappa }} ,\,{\mib \gamma } ,\,{\mib \bipsi } _{{\, {\minus}\, \sigma _{{\varepsilon}}^{2} }} \: \sim\: {\rm IG}\left( {\tilde{a}_{{\varepsilon}} {\plus}{{pT} \over 2},\:\,\tilde{b}_{{\varepsilon}} {\plus}{1 \over 2}\mathop{\sum}\limits_{t\, {\equals} \, 1}^T {\mathop{\sum}\limits_{x\, {\equals} \, 1}^p {\left( {y_{{xt}} \, {\minus}\, \left( {\alpha _{x} {\plus}\beta _{x} \kappa _{t} } \right)} \right)^{2} } } } \right)$$
(80) $$\sigma _{\gamma }^{2} \!\mid\!{\mib{y}} _{{1\: \colon\: T}} ,\,{\mib{\kappa }} ,\,{\mib \gamma } ,\,{\mib \bipsi } _{{\, {\minus}\, \sigma _{\gamma }^{2} }} \: \sim\: {\rm IG}\left( {\tilde{a}_{\gamma } {\plus}{T \over 2},\,\:\tilde{b}_{\gamma } {\plus}{1 \over 2}\mathop{\sum}\limits_{t\, {\equals} \, 1}^T \left( {\gamma _{t} \, {\minus}\, \lambda \gamma _{{t\, {\minus}\, 1}} } \right)^{2} } \right)$$
(81) $$\lambda _{1} \!\mid\!{\mib{y}} _{{1\: \colon\: T}} ,\,{\mib{\kappa }} ,{\mib \gamma } ,{\mib \bipsi } _{{\, {\minus}\, \lambda _{1} }} \: \sim\: N_{{[\, {\minus}\, 1,1]}} \left( {{{\sigma _{\gamma }^{2} \tilde{\mu }_{{\lambda _{1} }} {\plus}\tilde{\sigma }_{{\lambda _{1} }}^{2} \mathop{\sum}\nolimits_t {\gamma _{{t\, {\minus}\, 1}} \gamma _{t} } } \over {\sigma _{\gamma }^{2} {\plus}\tilde{\sigma }_{{\lambda _{1} }}^{2} \mathop{\sum}\nolimits_t {\gamma _{{t\, {\minus}\, 1}}^{2} } }},\,{{\tilde{\sigma }_{{\lambda _{1} }}^{2} \sigma _{\gamma }^{2} } \over {\sigma _{\gamma }^{2} {\plus}\tilde{\sigma }_{{\lambda _{1} }}^{2} \mathop{\sum}\nolimits_t {\gamma _{{t\, {\minus}\, 1}}^{2} } }}} \right)$$
(82) $$\lambda _{2} \!\mid\!{\mib{y}} _{{1\: \colon\: T}} ,\,{\mib \kappa } ,\,{\mib \gamma } ,\,{\mib \bipsi } _{{\, {\minus}\, \lambda _{2} }} \: \sim\: N\left( {{{\sigma _{\gamma }^{2} \tilde{\mu }_{{\lambda _{2} }} {\plus}\tilde{\sigma }_{{\lambda _{2} }}^{2} \mathop{\sum}\nolimits_t {\left( {\gamma _{t} \, {\minus}\, \lambda _{1} \gamma _{{t\, {\minus}\, 1}} } \right)} } \over {\sigma _{\gamma }^{2} {\plus}T\tilde{\sigma }_{{\lambda _{2} }}^{2} }},\,{{\tilde{\sigma }_{{\lambda _{2} }}^{2} \sigma _{\gamma }^{2} } \over {\sigma _{\gamma }^{2} {\plus}T\tilde{\sigma }_{{\lambda _{2} }}^{2} }}} \right)$$
(83) $$\gamma _{0} \!\mid\!{\mib{y}} _{{1\: \colon\: T}} ,{\mib \kappa } ,{\mib \gamma } ,{\mib \bipsi } _{{\, {\minus}\, \gamma _{0} }} \: \sim\: N\left( {{{\sigma _{\gamma }^{2} \tilde{\mu }_{{\gamma _{0} }} {\plus}\tilde{\sigma }_{{\gamma _{0} }}^{2} \lambda \gamma _{1} } \over {\sigma _{\gamma }^{2} {\plus}\tilde{\sigma }_{{\gamma _{0} }}^{2} \lambda ^{2} }},\,{{\tilde{\sigma }_{{\gamma _{0} }}^{2} \sigma _{\gamma }^{2} } \over {\sigma _{\gamma }^{2} {\plus}\tilde{\sigma }_{{\gamma _{0} }}^{2} \lambda ^{2} }}} \right)$$

where the posterior distributions are obtained similarly as in section 5.1.2.

6. Empirical Analysis: Danish Male Population

In this section, a comprehensive empirical studyFootnote 4 is conducted on Danish mortality data using the models summarised in Table 2. The LC, LC-H, LCSV and LCSV-H models are described in section 3. While the LC-H model addresses heteroscedasticity in the observation equation, the LCSV model attempts to incorporate stochastic volatility in the state dynamics. The LCSV-H model includes both features of the LC-H model and the LCSV model, thus allowing for a full consideration of variability in long-term mortality dynamics.

Table 2 A summary of state-space mortality models considered in our empirical study.

Note that we have omitted the LC2-H model (a cohort model) in this analysis as a detailed account of the formulation, Bayesian estimation and forecasting of state-space cohort models is considered in a separate paper, see Fung et al. (Reference Fung, Peters and Shevchenko2017). We are also aware that there are growing interests in the literature to consider more realistic models for the latent factor; for example, Li et al. (Reference Li, Chan and Cheung2011), van Berkum et al. (Reference van Berkum, Antonio and Vellekoop2014) and Liu & Li (2016 Reference Liu and Lib ) all study trend-changing stochastic behaviour for the latent period effects. For this reason we will focus on the LCSV and LCSV-H models, where stochastic volatility is introduced in the latent dynamics, instead of the LC3-H2 model where stochastic volatility enters via the observation equation, in our empirical studies.

The Human Mortality DatabaseFootnote 5 provides a particularly long time series of mortality data from year 1835 to 2011 for the Danish population, supplemented with a detailed document analysing the data (Andreev, Reference Andreev2002). The provision of a long time series is important to our analysis concerning stochastic volatility. In the past several decades, mortality trend for developed countries generally exhibit a rather smooth pattern. The inclusion of periods that involve wars, epidemics or other life-critical events are crucial factors in witnessing significant volatility in mortality time series. In the following, we analyse the population mortality from Denmark based on the models in Table 2 and Bayesian methodologies studied in this paper. We then examine the models in terms of the forecasting properties of death rates and life expectancies. We also comment on the linear trend assumption and jump-off bias in mortality forecasting.

6.1. Data description

The data set consists of Danish male population death rates for 21 age groups (0, 1–4, 5–9, … , 95–99) from year 1835 to 2010 where we fix year 2010 as the end year. Figure 1 displays some of the time series of the log death rates for the Danish male population. It is clear that the multi-dimensional time series exhibit different volatility for different age groups, which justify the introduction of heteroscedasticity into the observation equation as discussed in section 3.1. We also observe that, mainly before 1950, there are periods that the volatility of death rates for some age groups are markedly different. Such a change of volatility in the temporal dimension suggests that stochastic volatility may be present in the underlying time period effect.

Figure 1 Time series of log death rates for Danish male population from year 1835 to 2010.

6.2. Estimation results

In our empirical study we focus on Bayesian inference and forecasting. We assume vague priors so that all inferences are mainly based on the data and the impact of the prior is not material. Taking the LCSV model as an example, we assume κ 0 $$\sim$$ N(0, 10), α x $$\sim$$ N(0, 10), β x $$\sim$$ N(0, 10), θ $$\sim$$ N(0, 10), $$\sigma _{{\varepsilon}}^{2} \: \sim\: {\rm IG}(2.001,0.001)$$ , $$\sigma _{\gamma }^{2} \: \sim\: {\rm IG}(2.001,0.001)$$ , λ 1 $$\sim$$ N(0, 10), λ 2 $$\sim$$ N(0, 10) and γ 0 $$\sim$$ N(0, 10), where x∈{x 2, … , x p }. The number of iterations of the Markov chain is 15,000 with 5,000 burn-in. We fix $$\alpha _{{x_{1} }} \, {\equals} \, {1 \over T}\mathop{\sum}\nolimits_{t\, {\equals} \, 1}^T y_{{x_{1} ,t}} $$ and $$\beta _{{x_{1} }} \, {\equals} \, 0.2$$ as an identification constraint; see Remark 3.1 for a discussion for our choice of values here.

Estimated values of the static parameters (except α and β ) for the Danish mortality data (1835–2010) are shown in Table 3. The rest of the estimated parameters and states are displayed in Figure 2. Here, we only show the plots for the LCSV-H model since the corresponding figures obtained from the LC, LC-H and LCSV model are visually similar to the case of the LCSV-H model.

Figure 2 Estimation of (upper panels) α , β and $$\sigma _{{x_{1} \,\colon\,x_{{21}} ,{\epsilon}}}^{2} $$ ; (lower panels) time effect κ 1834:2010, log-volatility γ 1835:2010 and first difference $$\Delta \bar{\kappa }_{t} $$ , for Danish male mortality data (1835–2010) using the LCSV-H model. LCSV-H, Lee–Carter stochastic volatility model with heteroscedasticity; CI, confidence interval.

Table 3 Estimated values of the static parameters (except α and β ) for the Danish male mortality data (1835–2010).

The range in (,) represents 95% credible interval.

LC, Lee–Carter; LC-H, LC model with heteroscedasticity; LCSV, LC stochastic volatility; LCSV-H, LCSV model with heteroscedasticity.

It is evident from Figure 2 that there are periods, namely 1850–1870, 1910–1920, 1930–1950, that the time effect κ exhibits higher volatility compared with other periods. We also observe that κ accelerates markedly downward after 1990 and is relatively smooth in the recent period 1950–2010. The filtering of the log-volatility process γ 1835:2010 (Figure 2) quantifies the volatility level $$\left( {e^{{\gamma _{t} }} } \right)$$ of the time effect and gives further evidence on the stochastic volatility nature of mortality. To see more clearly the phenomenon of changing volatility, we plot the first difference $$\Delta \bar{\kappa }_{t} \, {\equals} \, \bar{\kappa }_{t} \, {\minus}\, \bar{\kappa }_{{t\, {\minus}\, 1}} $$ in Figure 2 for the LCSV-H model, where $$\bar{\kappa }_{t} $$ denotes the posterior mean of κ t , t=1836, … , 2010. It shows evidently the change of volatility level in the latent process κ t . The patterns of the estimated log-volatility γ 1835:2010 and the first difference $$\Delta \bar{\kappa }_{t} $$ clearly suggest that it is not appropriate to assume constant volatility $$\left( {\sigma _{\omega }^{2} } \right)$$ for the time effect.

The state-space modelling approach is able to uncover the age-specific heteroscedasticity structure hidden in the Danish mortality time series. Figure 2 reveals that variability is particularly high for the very young and very old age group. Implications of the heteroscedastic structure on forecasting will be discussed in section 6.4.

To investigate the forecasting properties of the stochastic volatility model, we also estimate the models based on calibration periods 1835–1990 and 1950–1990. Figures 3 and 4 show the estimated parameters and states for the LCSV-H model in those periods.

Figure 3 Estimation of (upper panels) α , β and $$\sigma _{{x_{1} \,\colon\,x_{{21}} ,{\epsilon}}}^{2} $$ ; (lower panels) time effect κ 1834:1900, log-volatility γ 1835:1990 and first difference $$\Delta \bar{\kappa }_{t} $$ , for Danish male mortality data (1835–1990) using the LCSV-H model. LCSV-H, Lee–Carter stochastic volatility model with heteroscedasticity; CI, confidence interval.

Figure 4 Estimation of (upper panels) α , β and $$\sigma _{{x_{1} \,\colon\,x_{{21}} ,{\epsilon}}}^{2} $$ ; (lower panels) time effect κ 1949:1990, log-volatility γ 1950:1990 and first difference $$\Delta \bar{\kappa }_{t} $$ , for Danish male mortality data (1950–1990) using the LCSV-H model. LCSV-H, Lee–Carter stochastic volatility model with heteroscedasticity; CI, confidence interval.

6.3. Model assessment

To compare the fit of the models to the data, we apply deviance information criterion (DIC) as a Bayesian measures of model complexity and fit (Spiegelhalter et al., Reference Spiegelhalter, Best, Carlin and van der Linde2002). It is common to assess and compare models with latent variables using conditional DIC (Berg et al., Reference Berg, Meyer and Yu2004; Celeux et al., 2006). Specifically, we use the so-called conditional log-likelihood which is calculated as

(84) $${\rm ln} f\left( {{\mib{y}} _{{1\: \colon\: T}} \!\mid\!{\mib \bipsi } ,\kappa _{{1\: \colon\: T}} } \right)\, {\equals} \, \mathop{\sum}\limits_{x\, {\equals} \, x_{1} }^{x_{p} } \mathop{\sum}\limits_{t\, {\equals} \, 1}^T \left( {\, {\minus}\, {1 \over 2}{\rm ln}\,2\pi \, {\minus}\, {\rm ln}\,\sigma _{{{\varepsilon},x}} \, {\minus}\, {1 \over 2}\left( {{{y_{{x,t}} \, {\minus}\, \left( {\alpha _{x} {\plus}\beta _{x} \kappa _{t} } \right)} \over {\sigma _{{{\varepsilon},x}} }}} \right)^{2} } \right)$$

Note that the likelihood is conditional on parameters that include both static parameters and the latent process κ. Using the conditional log-likelihood function, the deviance is defined as

(85) $$D({\bf \Psi })\, {\equals} \, \, {\minus}\, 2{\rm ln}\,f\left( {{\mib{y}} _{{1\: \colon\: T}} \!\mid\!{\bf \Psi }){\plus}2{\rm ln}\,h({\mib{y}} _{{1\: \colon\: T}} } \right)$$

where $${\bf \Psi } $$ =( $$\bipsi $$ , κ 1:T ) and we assume h( y 1:T )=1 since in the models we consider it plays the role of a constant which is the same for competing models. The effective dimension, p D , is evaluated as

(86) $$p_{D} \, {\equals} \, \bar{D}({\bf \Psi })\, {\minus}\, D({\bf \bar{\Psi }})$$

where $$\bar{D}({\bf \Psi })$$ and $${\bf \bar{\Psi }}$$ denote, respectively, the mean of D( $${\bf \Psi } $$ ) and the mean of the posterior distribution of $${\bf \Psi } $$ . The conditional DIC is then given by

(87) $${\rm DIC}\,\colon\! {\equals} \, \bar{D}({\bf \Psi }){\plus}p_{D} \, {\equals} \, 2\bar{D}({\bf \Psi })\, {\minus}\, D({\bf \bar{\Psi }})$$

which can be evaluated straightforwardly using MCMC samples.

The DIC values for the models with different calibration periods are shown in Table 4 Footnote 6 . The LCSV-H model clearly outperforms other models for all calibration periods considered. The LC-H is the second best; in fact it even outperforms the LCSV model for long calibration periods 1835–2010 and 1835–1990. These results suggest that one is strongly encouraged to include heteroscedasticity structure when modelling Danish mortality data. Note that for long calibration periods, the incorporation of stochastic volatility can markedly improve model fit as can be seen by comparing the LCSV-H model and the LC-H model, as well as the comparison between LCSV model and the LC model.

Table 4 Deviance information criterion of models with different calibration periods.

LC, Lee–Carter; LC-H, LC model with heteroscedasticity; LCSV, LC stochastic volatility; LCSV-H, LCSV model with heteroscedasticity.

6.4. Forecasting

In this section, we investigate the forecasting properties of the mortality models summarised in Table 2 where heteroscedasticity as well as stochastic volatility structures are incorporated. Our analysis is based on the forecasting distributions of (log) death rates and life expectancy. The Bayesian state-space framework allows us to obtain the forecasting distributions using MCMC samples which is shown below.

6.4.1. Death rates

For the LC (LC-H) model, the k-step ahead forecasting distribution of y T+k , given y 1:T , is given by

(88) $$\pi \left( {{\mib{y}} _{{T{\plus}k}} \!\mid\!{\mib{y}} _{{1\: \colon\: T}} } \right)\, {\equals} \, {\int} {\pi \left( {{\mib{y}} _{{T{\plus}k}} \!\mid\!\kappa _{{T{\plus}k}} ,{\mib \bipsi } } \right)\pi \left( {\kappa _{{T{\plus}k}} \!\mid\!\kappa _{{T{\plus}k\, {\minus}\, 1}} ,{\mib \bipsi } } \right)\,\ldots\,\pi \left( {\kappa _{T} ,{\mib \bipsi } \!\mid\!{\mib{y}} _{{1\: \colon\: T}} } \right)\:d{\mib \bipsi } d\kappa _{{T\: \colon\: T{\plus}k}} } $$

where $$\bipsi $$ is the parameter vector for the LC (LC-H) model; (88) suggests that we can sample recursively to obtain the forecasting distribution, for k≥1, as follows:

(89a) $$\kappa _{{T{\plus}k}}^{{(\ell )}} \: \sim\: N\left( {\kappa _{{T{\plus}k\, {\minus}\, 1}}^{{(\ell )}} {\plus}\theta ^{{(\ell )}} ,\left( {\sigma _{\omega }^{2} } \right)^{{(\ell )}} } \right)$$
(89b) $${\mib{y}} _{{T{\plus}k}}^{{(\ell )}} \: \sim\: N\left( {{\mib{\alpha }} ^{{(\ell )}} {\plus}{\mib{\beta }} ^{{(\ell )}} \kappa _{{T{\plus}k}}^{{(\ell )}} ,\,\Sigma ^{{(\ell )}} } \right)$$

where $$\ell \, {\equals} \, 1,\,\ldots\:,L$$ and L is the number of MCMC iterations after burn-in. Here Σ is a diagonal matrix with $$\sigma _{{{\varepsilon},x}}^{2} $$ on the diagonal for the LC-H model and $$\sigma _{{\varepsilon}}^{2} $$ for the LC model. This procedure generates an estimate of the forecasting distribution.

Similarly, the forecasting distribution of y T+k , given y T , for the LCSV (LCSV-H) model is given by

(90) $$\eqalignno{ \pi \left( {{\mib{y}} _{{T{\plus}k}} \!\mid\!{\mib{y}} _{{1\: \colon\: T}} } \right)\, {\equals} \, &#x0026; {\int} {\pi \left( {{\mib{y}} _{{T{\plus}k}} \!\mid\!\kappa _{{T{\plus}k}} ,\,{\mib \bipsi } } \right)\pi \left( {\kappa _{{T{\plus}k}} \!\mid\!\kappa _{{T{\plus}k\, {\minus}\, 1}} ,\,\gamma _{{T{\plus}k}} ,\,{\mib \bipsi } } \right)\,\ldots\,} \cr &#x0026;\pi \left( {\gamma _{{T{\plus}1}} \!\mid\!\gamma _{T} ,\,{\mib \bipsi } } \right)\pi \left( {\kappa _{T} ,\,\gamma _{T} ,\,{\mib \bipsi } \!\mid\!{\mib{y}} _{{1\: \colon\: T}} } \right)d{\mib \bipsi } d\kappa _{{T\: \colon\: T{\plus}k}} d\gamma _{{T\: \colon\: T{\plus}k}} $$

For k≥1, the forecasting distribution can be obtained by sampling recursively

(91a) $$\gamma _{{T{\plus}k}}^{{(\ell )}} \: \sim\: N\left( {\lambda _{1}^{{(\ell )}} \gamma _{{T{\plus}k\, {\minus}\, 1}}^{{(\ell )}} {\plus}\lambda _{2}^{{(\ell )}} ,\,\big( {\sigma _{\gamma }^{2} } \big)^{{(\ell )}} } \right)$$
(91b) $$\kappa _{{T{\plus}k}}^{{(\ell )}} \: \sim\: N\left( {\kappa _{{T{\plus}k\, {\minus}\, 1}}^{{(\ell )}} {\plus}\theta ^{{(\ell )}} ,exp\left\{ {\gamma _{{T{\plus}k}}^{{(\ell )}} } \right\}} \right)$$
(91c) $${\mib{y}} _{{T{\plus}k}}^{{(\ell )}} \: \sim\: N\left( {{\mib \bialpha } ^{{(\ell )}} {\plus}{\mib \bibeta } ^{{(\ell )}} \kappa _{{T{\plus}k}}^{{(\ell )}} ,\,\Sigma ^{{(\ell )}} } \right)$$

where $$\ell \, {\equals} \, 1,\,\ldots\,,L$$ , and Σ is a diagonal matrix with $$\sigma _{{{\varepsilon},x}}^{2} $$ on the diagonal for the LCSV-H model and $$\sigma _{{\varepsilon}}^{2} $$ for the LCSV model.

Figure 5 shows the forecasted log death rates based on the LC-H, LCSV and LCSV-H model, using the LC model as a benchmark. We show age groups 5–9, 35–39, 65–69 and 95–99 as representatives of young, adult, old and very old age. The models are estimated using data for the period 1835–2010 and forecast for 30 years.

Figure 5 (Colour online) 30-year forecasted log death rates (2011–2041) for Danish male population under (left column) LC-H model, (middle column) LCSV model and (right column) LCSV-H model in comparison with LC model. Calibration period: 1835–2010. LC, Lee–Carter; LC-H, LC model with heteroscedasticity; LCSV, LC stochastic volatility; LCSV-H, LCSV model with heteroscedasticity; CI, confidence interval.

The heteroscedasticity structure, from the LC-H model, gives rise to materially larger forecasting intervals for the young and very old age group, while the forecasting interval for the age group 35–39 is narrower than predicted by the LC model. The LCSV model, on the other hand, produces a wider forecasting interval compared to the LC model except for the very old age group. The observed wider forecasting interval is due to the fact that the volatility level is increasing in the last estimation periods and is larger than $$\sigma _{\omega }^{2} $$ estimated in the LC model. Moreover, as the estimated β x is close to zero at older ages (Figure 2), the impact of the forecasted κ on the prediction of death rates diminished significantly as older ages are considered. The LCSV-H model exhibits similar features of the LC-H and the LCSV model. It is interesting to note that the forecasted means obtained from the different models are very similar and their differences mainly lie in the forecasting interval.

To illustrate further the forecasting property of the LCSV model, we estimate the models for the period 1835–1990 and plot 20-year out-of-sample forecasted log death rates in Figure 6. It turns out the forecasting intervals predicted by the LCSV model tends to be narrower than the LC model, as the estimated $$\sigma _{\omega }^{2} $$ in the LC model is larger than the volatility level at the last estimation period for the LCSV model in this case. Note that the forecasted distributions produced by the LC-H model are biased compared to the benchmark LC model since the fitted rates at the last estimation period, that is year 1990, are different for the LC and LC-H model. This feature is known as jump-off error (Lee & Miller, Reference Lee and Miller2001). One may remove this jump-off bias by forcing the forecasted death rates to start at the actual rates instead of the fitted rates (Bell, Reference Bell1997; Shang et al., Reference Shang, Booth and Hyndman2011). In this paper we do not perform this procedure, however.

Figure 6 (Colour online) 20-year out-of-sample forecasted log death rates for Danish male population under (left column) LC-H model, (middle column) LCSV model and (right column) LCSV-H model in comparison with LC model. Calibration period: 1835–1990. LC, Lee–Carter; LC-H, LC model with heteroscedasticity; LCSV, LC stochastic volatility; LCSV-H, LCSV model with heteroscedasticity; CI, confidence interval.

Figure 7 shows the forecasting distributions of log death rates where we assume a shorter calibration period from 1950 to 1990. For all the models, the estimated β x for all age groups, except for age groups 0, 1–4 and 5–9, are very close to zero. It is in fact expected since there is no clear downward trend in the observed mortality data besides the first few age groups, during the period 1950–1990. Therefore, there is only small difference between the forecasting distributions produced by the LC model and the LCSV model, except for young age groups. Note that there is a clear change of downward trend for some of the middle age groups for the Danish male mortality data as shown in Figure 7. It results in the out-of-sample data falling out of the lower bound of the 95% credible intervals and its consequence for the forecasting of life expectancy will be discussed in section 6.4.2 and generally in section 6.4.3.

Figure 7 (Colour online) 20-year out-of-sample forecasted log death rates for Danish male population under (left column) LC-H model, (middle column) LCSV model and (right column) LCSV-H model in comparison with LC model. Calibration period: 1950–1990. LC, Lee–Carter; LC-H, LC model with heteroscedasticity; LCSV, LC stochastic volatility; LCSV-H, LCSV model with heteroscedasticity; CI, confidence interval.

By comparing the forecast performance using in-sample data from 1835 to 1990 and from 1950 to 1990 displayed in Figures 6 and 7, we expose the influence that leaving out important historical events, that may affect the mortality rates markedly in a population, can have on the ability to accurately model trend and volatility structures in population dynamics. In particular, we observe that one must be cautious as forecast performance can degrade markedly when important historical events are excluded from the sample as the forecast using data from 1835 to 1990 has clearly outperformed the forecast using only shorter calibration data from 1950 to 1990.

6.4.2. Life expectancy

Using the samples of the forecasted log death rates $$y_{{x,t}}^{{(\ell )}} \, {\equals} \, {\rm ln}\,\hat{m}_{{x,t}}^{{(\ell )}} $$ , where $$\ell \, {\equals} \, 1,\,\ldots\,,L$$ and L is the number of MCMC samples, we can obtain the so-called period life expectancy at different ages by constructing an abridged life table, since we use age group data, as follows (Koissi et al., Reference Koissi, Shapiro and Hognas2006; Yusuf et al., Reference Yusuf, Martins and Swanson2014). We consider age group x∈{0, 1–4, 5–9, … , 95–99} and $$\tilde{x}$$ is defined as the initial age of age group $$x$$ , that is $$\tilde{x}\in\{ 0,1,5,\,\ldots\,,90,95\} $$ . Define $$n_{{\tilde{x}}} $$ as the length of the interval of age group x (corresponds to $$\tilde{x}$$ ) and hence we have n 0=1, n 1=4, n 5=5, … , n 95=5. We then calculate the (crude) death probabilityFootnote 7 that a person aged $$\tilde{x}$$ in year t will die in the next $$n_{{\tilde{x}}} $$ years as

(92) $$_{{n_{{\tilde{x}}} }} \hat{q}_{{\tilde{x},t}}^{{(\ell )}} \, {\equals} \, {{n_{{\tilde{x}}} \:\hat{m}_{{x,t}}^{{(\ell )}} } \over {1{\plus}n_{{\tilde{x}}} \left( {1\, {\minus}\, a\left( {\tilde{x},n_{{\tilde{x}}} } \right)} \right)\hat{m}_{{x,t}}^{{(\ell )}} }}$$

where $$a\left( {\tilde{x},n_{{\tilde{x}}} } \right)$$ is the average fraction of the $$n_{{\tilde{x}}} $$ years lived by the people who is initially aged $$\tilde{x}$$ in that interval. Using the assumption that deaths are distributed uniformly in the interval, we set $$a(\tilde{x},n_{{\tilde{x}}} )\, {\equals} \, 0.5$$ for every $$\tilde{x}$$ Footnote 8 . The hypothetical number of people alive at age $$\tilde{x}{\plus}n_{{\tilde{x}}} $$ , $$l_{{\tilde{x}{\plus}n_{{\tilde{x}}} ,t}}^{{(\ell )}} $$ , is determined by $$l_{{\tilde{x}{\plus}n_{{\tilde{x}}} ,t}}^{{(\ell )}} \, {\equals} \, l_{{\tilde{x},t}}^{{(\ell )}} \big( {1\, {\minus}\, _{{n_{{\tilde{x}}} }} q_{{\tilde{x},t}}^{{(\ell )}} } \big)$$ , where $$l_{{0,t}}^{{(\ell )}} $$ is assumed to be 100,000. We can then calculate the number of deaths $$_{{n_{{\tilde{x}}} }} d_{{\tilde{x},t}}^{{(\ell )}} \, {\equals} \, l_{{\tilde{x},t}}^{{(\ell )}} \, {\minus}\, l_{{\tilde{x}{\plus}n_{{\tilde{x}}} ,t}}^{{(\ell )}} $$ and the person-years lived $$_{{n_{{\tilde{x}}} }} L_{{\tilde{x},t}}^{{(\ell )}} \, {\equals} \, n_{{\tilde{x}}} \big( {l_{{\tilde{x}{\plus}n_{{\tilde{x}}} ,t}}^{{(\ell )}} {\plus}a(\tilde{x},n_{{\tilde{x}}} ){\times}_{{n_{{\tilde{x}}} }} d_{{\tilde{x},t}}^{{(\ell )}} } \big)$$ . The total future lifetime of the $$l_{{\tilde{x},t}}^{{(\ell )}} $$ persons who attain age $$\tilde{x}$$ is $$T_{{\tilde{x},t}}^{{(\ell )}} \, {\equals} \, \mathop{\sum}\nolimits_{i\geq \tilde{x}} _{{n_{{\tilde{x}}} }} L_{{i,t}}^{{(\ell )}} $$ , where i∈{0, 1, 5, … , 90, 95}. Finally, a sample of the period life expectancy at age $$\tilde{x}$$ is obtained as

(93) $$e_{{\tilde{x},t}}^{{(\ell )}} \, {\equals} \, {{T_{{\tilde{x},t}}^{{(\ell )}} } \mathord{\left/ {\vphantom {{T_{{\tilde{x},t}}^{{(\ell )}} } {l_{{\tilde{x},t}}^{{(\ell )}} }}} \right. \kern-\nulldelimiterspace} {l_{{\tilde{x},t}}^{{(\ell )}} }}$$

and the distributions are obtained in different forecasting year t=T+k, where k≥1.

Remark 6.1 (Period and cohort life expectancy): Period life expectancy assumes there is no trend for future death rates (it is evaluated based on the age-specific death rates in a fixed year t) while cohort life expectancy assumes death rates following the lifetime of a cohort and hence it takes mortality trend into account. For example, to evaluate period life expectancy at age 65 in year t, one needs $$\{ _{5} q_{{65,t}} ,_{5} q_{{70,t}} ,\,\ldots\,,_{5} q_{{95,t}} \} $$ while for cohort life expectancy, $$\{ _{5} q_{{65,t}} ,_{5} q_{{70,t{\plus}5}} ,\,\ldots\,,_{5} q_{{95,t{\plus}30}} \} $$ are used instead. However, the cohort life expectancy for people born in recent years cannot be evaluated using data alone since some of the death rates data are yet to be observed. As a result we focus on period life expectancy so that our forecasts can be compared with the observed data.

Figure 8 shows the forecasted life expectancy at birth, age 65 and age 85 for all the models estimated using data from 1835 to 2010. Interestingly, the forecasted life expectancy at birth is very similar for the LC model and LC-H model for the calibration period 1835–2010. It reflects the fact that forecasting intervals of death rates produced by the LC-H model are wider for some age groups and narrower for other, compared to the LC model, as can be seen from Figure 5. It turns out that these effects almost cancel each other out for this case as death rates are aggregated for all age groups to form the life expectancy at birth distribution. This explanation does not apply to life expectancy at ages 65 and 85, however, since only forecasted death rates for age groups larger than 65 and 85, respectively, are used to obtain the corresponding life expectancy distribution. As the forecasting intervals of death rates generated by the LC-H model tend to be narrower for old age groups compared with the LC model, the interval for the forecasted life expectancy at ages 65 and 85 distribution produced by the LC-H model is observably narrower than the LC model. Note that we only show four different age groups in Figure 5; projected death rates for other age groups are also relevant but are not displayed due to limited space. In contrast to the LC-H model, we can observe from Figure 5 that LCSV model produces generally wider forecasting intervals for death rates at different age groups compared to the LC model. Consequently, the forecasting intervals for life expectancy at various ages generated by the LCSV model is wider than the LC model, as shown in Figure 8. Similarly to the case of death rates forecasting, the LCSV-H model has both the features of the LC-H and LCSV model in terms of life expectancy prediction.

Figure 8 (Colour online) 30-year forecasted life expectancy (2011–2041) at birth, ages 65 and 85 for Danish male population under (left column) LC-H model, (middle column) LCSV model and (right column) LCSV-H model in comparison with LC model. Calibration period: 1835–2010. LC, Lee–Carter; LC-H, LC model with heteroscedasticity; LCSV, LC stochastic volatility; LCSV-H, LCSV model with heteroscedasticity; CI, confidence interval.

Results for forecasted life expectancy using data from 1835 to 1990 are shown in Figure 9. It is clear that the forecasting intervals for life expectancy at different ages produced by the LCSV model are narrower compared to the LC model, which can be traced back to the fact that the LCSV model predicts narrower forecasting intervals for death rates at various age groups than the prediction by the LC model, see Figure 6. Apparently the jump-off bias are quite different for the LC-H (LCSV-H) model and the LC model. Note also that the “cancel out” effect of aggregated death rates (Figure 6) for the life expectancy at birth for the LC-H model is less prominent for the calibration period 1835–1990 than the period 1835–2010.

Figure 9 (Colour online) 20-year out-of-sample forecasted life expectancy (1991–2010) at birth, ages 65 and 85 for Danish male population under (left column) LC-H model, (middle column) LCSV model and (right column) LCSV-H model in comparison with LC model. Calibration period: 1835–1990. LC, Lee–Carter; LC-H, LC model with heteroscedasticity; LCSV, LC stochastic volatility; LCSV-H, LCSV model with heteroscedasticity; CI, confidence interval.

As we use the fitted death rates instead of the observed death rates in the jump-off year (i.e. year 2010), there is a jump-off bias in the forecasted death rates. The forecasted life expectancy at age 65 is particularly sensitive to this jump-off bias. It comes from a sudden decline of death rates for age groups larger than 65 beginning in year 1990, hence a significant increase of life expectancy at age 65 is observed. The jump-off bias is significantly smaller when the calibration period 1835–1990 is considered, see Figure 9. As expected, the forecasted distributions of life expectancy at birth and age 65 are similar for all the models estimated using mortality data from year 1950–1990 (Figure 10). Note that the 95% credible intervals capture poorly the out-of-sample data in this case except for the life expectancy at age 85. It is a consequence of the sudden change of significant downward trend for the death rates observed in the middle age groups of the Danish mortality data starting from around 1990, see Figure 7, as well as the jump-off bias. We discuss about the linear trend assumption and jump-off bias in the next section.

Figure 10 (Colour online) 20-year out-of-sample forecasted life expectancy (1991–2010) at birth, ages 65 and 85 for Danish male population under (left column) LC-H model, (middle column) LCSV model and (right column) LCSV-H model in comparison with LC model. Calibration period: 1950–1990. LC, Lee–Carter; LC-H, LC model with heteroscedasticity; LCSV, LC stochastic volatility; LCSV-H, LCSV model with heteroscedasticity; CI, confidence interval.

6.4.3. Linear trend assumption and jump-off bias

In performing the forecasting of death rates and life expectancies, we use the models summarised in Table 2 where the LC-H, LCSV and LCSV-H models are variants of the LC model in which a linear trend of the period effect is assumed. However, for the Danish male mortality data that we used, the overall trend is reasonably linear for the whole period 1835–2010, but the same may not be said for the shorter period 1950–2010 as Figures 6 and 7 indicate.

In particular, we observe in Figure 7 that there is a clear change of trend for the death rates of middle age groups. Such a change of trend is difficult, if not impossible, to predict in terms of timing and magnitude. We also perform the analysis on French male mortality data and found similar patterns. For forecasting purpose, one may therefore argue that expert opinion will be an important factor in predicting mortality. Even though any change of mortality trend in the short run cannot be predicted with reasonable accuracy using data alone, it can be detected if the instantaneous volatility of mortality is quantified. For example, using the LCSV-H model, the log-volatility γ is quantified and we observe from Figure 2 that γ started to increase around 1990 after several decades of declining. The change of volatility level not only affect the prediction intervals as discussed in previous sections, but also indicates that the change of mortality is heightened and one should be cautious whether a change of trend is taking place.

We also find that jump-off bias, discussed in sections 6.4.1–6.4.2, is an important factor in predicting deaths rates and life expectancies. We note that it is straight forward to remove the jump-off bias by adjusting (89)–(91) so that the actual death rates, instead of the fitted death rates, are used in the beginning of the forecasting period. We do not provide the corresponding plots in the paper but one can envisage the results simply by shifting the forecasted distribution so that the forecasted mean is attached to the (in-sample) data at the end of the estimation year. Removing the jump-off bias will have a significant impact on the accuracy of mortality forecasting especially when data exhibit clear trending.

7. Concluding Remarks

We developed and presented a comprehensive state-space framework for stochastic mortality modelling. The state-space approach has two key advantages. First, it puts modelling, estimation and forecasting of mortality in a unified framework in contrast to common practice in this area. Second, the methodology permits realistic and sophisticated mortality models to be estimated and forecasted, which could be difficult to handled using other approaches.

We show that many of the popular mortality models exist in the literature can be cast in state-space form. We then suggest several classes of mortality models that can be classified as linear Gaussian state-space models and non-linear/non-Gaussian state-space models. Our proposals are not exhaustive but aim to illustrate the flexibility of the methodology. In particular, we incorporate heteroscedasticity and stochastic volatility in mortality modelling, as an examination of mortality data suggests that volatility of death rates is not constant in the age and time dimension over a long time period. Moreover, we propose an alternative identification constraint for the LC type modelling, which is tailored for the state-space approach.

Frequestist state-space inference for stochastic mortality models is carried out and explained based on the gradient and Hessian of the marginalised likelihood developed recently in statistics literature. We also utilise a modern approach to Bayesian inference for state-space modelling hinged on the PMCMC framework. In particular, we develop a sampler using a combination of Rao-Blackwellised Kalman filter and particle filter for the latent state process full posterior conditionals, combined with Gibbs sampling steps for the static model parameters to estimate a stochastic volatility model for mortality proposed in this paper.

Using mortality data of Danish male population, we assess the extended models based on deviance conditional criterion. It is found that incorporating heteroscedasticity is a crucial improvement factor in model fitting, while model complexity is accounted for. The incorporation of stochastic volatility clearly enhances model performance for fitting of long-term mortality time series. Estimation results for long calibration period support the assumption of stochastic volatility. We show that forecasting can be carried out straightforwardly in state-space framework under a Bayesian setting. We examine the forecasting properties of the models using different calibration periods. The inclusion of heteroscedasticity and stochastic volatility substantially affects prediction intervals of death rate and life expectancy distributions. The linear trend assumption commonly found in mortality modelling and jump-off bias are discussed in light of the Danish mortality data.

State-space framework provides attractive features that are of importance to mortality modelling. The methods and results developed and shown in the paper will have significant implications for longevity risk management in actuarial applications. In particular, we anticipate the models and methods introduced in this paper can be employed to longevity risk applications such as the state-space longevity hedging methods proposed in Liu & Li (2016 Reference Liu and Lia ) and Liu & Li (2016 Reference Liu and Lib ) which is a topic of future research.

Acknowledgements

This research was supported by the CSIRO-Monash Superannuation Research Cluster, a collaboration among CSIRO, Monash University, Griffith University, the University of Western Australia, the University of Warwick and stakeholders of the retirement system in the interest of better outcomes for all. This research was also partially supported under the Australian Research Council’s Discovery Projects funding scheme (project number: DP160103489). The authors also acknowledge project “Research on Urban Intelligence” from Research Organization of Information and Systems in Japan.

Appendix A: Differentiation Matrices in Gradient-Based Estimation

For the LC-H model, the parameter vector is denoted by $${\mib \bipsi } \, {\equals} \, \big( {\alpha _{{x_{2} \: \colon\: x_{p} }} ,\beta _{{x_{2} \: \colon\: x_{p} }} ,\sigma _{{{\varepsilon},x_{1} \: \colon\: x_{p} }}^{2} ,\theta ,\sigma _{\omega }^{2} } \big)$$ with dimension n=3p, where p is the number of age group considered. We are required to evaluate $${{\partial {\mib \bialpha } } \over {\partial {\mib \bipsi } _{i} }}$$ , $${{\partial {\mib \bibeta } } \over {\partial {\mib \bipsi } _{i} }}$$ , $${{\partial {\bf \Sigma } } \over {\partial {\mib \bipsi } _{i} }}$$ , $${{\partial {\mib \bitheta } } \over {\partial {\mib \bipsi } _{i} }}$$ and $${{\partial {\mib \bisigma } _{{\mib \omega } }^{{\bf 2}} } \over {\partial {\mib \bipsi } _{i} }}$$ in the gradient-based estimation (section 4.2). Define

$$\eqalignno{ &#x0026; {\mib \bipsi } _{\alpha } \,\colon\! {\equals} \, \psi _{{1\: \colon\: p\, {\minus}\, 1}} \, {\equals} \, \alpha _{{x_{2} \: \colon\: x_{p} }} \left( {\,\colon\! {\equals} \, {\mib \bialpha } _{{\, {\minus}\, x_{1} }} } \right),\quad {\mib \bipsi } _{\beta } \,\colon\! {\equals} \, \psi _{{p\: \colon\: 2p\, {\minus}\, 2}} \, {\equals} \, \beta _{{x_{2} \: \colon\: x_{p} }} \left( {\,\colon\! {\equals} \, {\mib \bibeta } _{{\, {\minus}\, x_{1} }} } \right) \cr &#x0026; {\mib \bipsi } _{{\sigma _{{\varepsilon}}^{2} }} \,\colon\! {\equals} \, \psi _{{2p\, {\minus}\, 1\: \colon\: 3p\, {\minus}\, 2}} \, {\equals} \, \sigma _{{{\varepsilon},x_{1} \: \colon\: x_{p} }}^{2} ,\quad \psi _{\theta } \,\colon\! {\equals} \, \psi _{{3p\, {\minus}\, 1}} \, {\equals} \, \theta ,\quad \psi _{{\sigma _{\omega }^{2} }} \,\colon\! {\equals} \, \psi _{{3p}} \, {\equals} \, \sigma _{\omega }^{2} $$

Then we have

$${{\partial \left( {{\mib \bialpha } _{{\, {\minus}\, x_{1} }} } \right)_{j} } \over {\partial \left( {{\mib \bipsi } _{\alpha } } \right)_{i} }}\, {\equals} \, \delta _{{ij}} \, {\equals} \, {{\partial \left( {{\mib \bibeta } _{{\, {\minus}\, x_{1} }} } \right)_{j} } \over {\partial \left( {{\mib \bipsi } _{\beta } } \right)_{i} }},\quad i,j\, {\equals} \, 1,\,\ldots\!,p\, {\minus}\, 1$$
$${{\partial (\Sigma )_{{jj}} } \over {\partial \left( {{\mib \bipsi } _{{\sigma _{{\varepsilon}}^{2} }} } \right)_{i} }}\, {\equals} \, \delta _{{ij}} ,\quad i,j\, {\equals} \, 1,\,\ldots\!,p$$
$${{\partial \theta } \over {\partial \psi _{\theta } }}\, {\equals} \, 1\, {\equals} \, {{\partial \sigma _{\omega }^{2} } \over {\partial \psi _{{\sigma _{\omega }^{2} }} }}$$

where δ ij =1 if j=i and zero otherwise; Σ is a diagonal matrix with diagonal $$\sigma _{{{\varepsilon},x_{1} \: \colon\: x_{p} }}^{2} $$ . Note that $${{\partial \left( {\mib \bialpha } \right)_{1} } \over {\partial \left( {{\mib \bipsi } _{\alpha } } \right)_{i} }}\, {\equals} \, {{\partial \left( {\mib \bibeta } \right)_{1} } \over {\partial \left( {{\mib \bipsi } _{\beta } } \right)_{i} }}\, {\equals} \, 0$$ for i=1, … , p−1, where $${\mib \bialpha } \, {\equals} \, \alpha _{{x_{1} \: \colon\: x_{p} }} $$ and $${\mib \bibeta } \, {\equals} \, \beta _{{x_{1} \: \colon\: x_{p} }} $$ .

Appendix B: A Review of SMC Method

SMC, also known as particle filtering, can be viewed as a generalisation of Kalman filtering in state-space modelling context. The method is based on importance sampling and it has become an essential sampling-based tool in many domains (Doucet et al., Reference Doucet, De Freitas and Gordon2001). In the following, we give a brief review of the method using the LCSV model, (23b)–(23c), as an example to derive a basic particle filtering algorithm. Our target density is the joint posterior distribution of the states for stochastic volatility:

(94) $$\pi \left( {\gamma _{{1\: \colon\: t}} \!\mid\!\kappa _{{0\: \colon\: t}} } \right)$$

where the parameters of the model are assumed to be known and is suppressed here for ease of notation. To apply importance sampling, we first calculate

(95) $$\eqalignno{ \pi \left( {\gamma _{{1\: \colon\: t}} \!\mid\!\kappa _{{0\: \colon\: t}} } \right)\, {\equals} \, &#x0026; {{\pi \left( {\kappa _{t} \!\mid\!\gamma _{{1\: \colon\: t}} ,\kappa _{{0\: \colon\: t\, {\minus}\, 1}} } \right)\pi \left( {\gamma _{{1\: \colon\: t}} \!\mid\!\kappa _{{0\: \colon\: t\, {\minus}\, 1}} } \right)} \over {\pi \left( {\kappa _{t} \!\mid\!\kappa _{{0\: \colon\: t\, {\minus}\, 1}} } \right)}} \cr \, {\equals} \, &#x0026; {{\pi \left( {\kappa _{t} \!\mid\!\gamma _{{1\: \colon\: t}} ,\kappa _{{0\: \colon\: t\, {\minus}\, 1}} } \right)\pi \left( {\gamma _{t} \!\mid\!\gamma _{{1\: \colon\: t\, {\minus}\, 1}} ,\kappa _{{0\: \colon\: t\, {\minus}\, 1}} } \right)} \over {\pi \left( {\kappa _{t} \!\mid\!\kappa _{{0\: \colon\: t\, {\minus}\, 1}} } \right)}}\pi \left( {\gamma _{{1\: \colon\: t\, {\minus}\, 1}} \!\mid\!\kappa _{{0\: \colon\: t\, {\minus}\, 1}} } \right) \cr \, {\equals} \, &#x0026; {{\pi \left( {\kappa _{t} \!\mid\!\gamma _{t} ,\kappa _{{t\, {\minus}\, 1}} } \right)\pi \left( {\gamma _{t} \!\mid\!\gamma _{{t\, {\minus}\, 1}} } \right)} \over {\pi \left( {\kappa _{t} \!\mid\!\kappa _{{0\: \colon\: t\, {\minus}\, 1}} } \right)}}\pi \left( {\gamma _{{1\: \colon\: t\, {\minus}\, 1}} \!\mid\!\kappa _{{0\: \colon\: t\, {\minus}\, 1}} } \right) $$

The importance density is assumed to satisfy

(96) $$g_{{1\: \colon\: t}} \left( {\gamma _{{1\: \colon\: t}} \!\mid\!\kappa _{{0\: \colon\: t}} } \right)\,\colon\! {\equals} \, g_{t} \left( {\gamma _{t} \!\mid\!\gamma _{{1\: \colon\: t\, {\minus}\, 1}} ,\kappa _{{0\: \colon\: t}} } \right)g_{{1\: \colon\: t\, {\minus}\, 1}} \left( {\gamma _{{1\: \colon\: t\, {\minus}\, 1}} \!\mid\!\kappa _{{0\: \colon\: t\, {\minus}\, 1}} } \right)$$

and the importance weight is given by

(97) $$\eqalignno{ \tilde{w}_{t} \, {\equals} \, &#x0026; {{\pi \left( {\kappa _{t} \!\mid\!\gamma _{t} ,\kappa _{{t\, {\minus}\, 1}} } \right)\pi \left( {\gamma _{t} \!\mid\!\gamma _{{t\, {\minus}\, 1}} } \right)} \over {\pi \left( {\kappa _{t} \!\mid\!\kappa _{{t\, {\minus}\, 1}} } \right)g_{t} \left( {\gamma _{t} \!\mid\!\gamma _{{1\: \colon\: t\, {\minus}\, 1}} ,\kappa _{{0\: \colon\: t}} } \right)}}{{\pi \left( {\gamma _{{1\: \colon\: t\, {\minus}\, 1}} \!\mid\!\kappa _{{0\: \colon\: t\, {\minus}\, 1}} } \right)} \over {g_{{0\: \colon\: t\, {\minus}\, 1}} \left( {\gamma _{{1\: \colon\: t\, {\minus}\, 1}} \!\mid\!\kappa _{{0\: \colon\: t\, {\minus}\, 1}} } \right)}} \cr \propto &#x0026; {{\pi \left( {\kappa _{t} \!\mid\!\gamma _{t} ,\kappa _{{t\, {\minus}\, 1}} } \right)\pi \left( {\gamma _{t} \!\mid\!\gamma _{{t\, {\minus}\, 1}} } \right)} \over {g_{t} \left( {\gamma _{t} \!\mid\!\gamma _{{1\: \colon\: t\, {\minus}\, 1}} ,\kappa _{{0\: \colon\: t}} } \right)}}\:\tilde{w}_{{t\, {\minus}\, 1}} \cr \,\colon\! {\equals} \, &#x0026; \hat{w}_{t} \:\tilde{w}_{{t\, {\minus}\, 1}} $$

where $$\hat{w}_{t} $$ is called the incremental importance weight. The normalised importance weights are then obtained as $$w_{t}^{{(i)}} \,\colon\! {\equals} \, {{\tilde{w}_{t}^{{(i)}} } \mathord{\big / { \vphantom {{\tilde{w}_{t}^{{(i)}} } {{\sum}\nolimits_{j\, {\equals} \, 1}^N \tilde{w}_{t}^{{(j)}} }}} \right \kern-\nulldelimiterspace} {\mathop{\sum}\nolimits_{j\, {\equals} \, 1}^N \tilde{w}_{t}^{{(j)}} }}$$ . To summarise, suppose we have N particle paths $$\big( {\gamma _{{1\: \colon\: t\, {\minus}\, 1}}^{{(i)}} ,w_{{t\, {\minus}\, 1}}^{{(i)}} } \big)_{{i\, {\equals} \, 1}}^{N} $$ to approximate the density $$\pi \left( {\gamma _{{1\: \colon\: t\, {\minus}\, 1}} \!\mid\!\kappa _{{0\: \colon\: t\, {\minus}\, 1}} } \right)$$ at time t−1. Then, from (96), the tth particle path at time t is given by $$\gamma _{{1\: \colon\: t}}^{{(i)}} \, {\equals} \, \big( {\gamma _{{1\: \colon\: t\, {\minus}\, 1}}^{{(i)}} ,\gamma _{t}^{{(i)}} } \big)$$ , where $$\gamma _{t}^{{(i)}} $$ is sampled from $$g_{t} \big( {\gamma _{t} \!\mid\!\gamma _{{1\: \colon\: t\, {\minus}\, 1}}^{{(i)}} ,\kappa _{{0\: \colon\: t}} } \big)$$ . The target density π(γ 1:t |κ 0:t ) is approximated by $$\big( {\gamma _{{1\: \colon\: t}}^{{(i)}} ,w_{t}^{{(i)}} } \big)_{{i\, {\equals} \, 1}}^{N} $$ , where the normalised weight $$w_{t}^{{(i)}} $$ is obtained from (97) and normalisation is carried out.

The problem of degeneracy, that is a majority of the particle paths may have negligible weight, can be handled by resampling. Specifically, we define the so-called effective sample size

(98) $$N_{{eff}} \,\colon\! {\equals} \, \left( {\mathop{\sum}\limits_{i\, {\equals} \, 1}^N \left( {w_{t}^{{(i)}} } \right)^{2} } \right)^{{\!\!\! {\minus}\: 1}} $$

At each time t, if N eff is smaller than some threshold (e.g. 80% of N) then we draw N samples (denoted by N(i), i=1, … , N) from a multinomial distribution with probability weights $$w_{t}^{{(i)}} $$ , i=1, … , N, and replace the particle paths $$\gamma _{{1\: \colon\: t}}^{{(i)}} $$ by $$\gamma _{{1\: \colon\: t}}^{{(N(i))}} $$ , and set $$w_{t}^{{(i)}} \, {\equals} \, 1\,/\,N$$ . The resampling step allows to keep the particle paths in proportion to their weights and tend to discard those that have negligible weights.

Footnotes

1 Population basis risk refers to the risk that the mortality experience of a portfolio being hedged is different to the mortality experience underlying an index-based longevity hedging instrument. This discrepancy causes the hedge to be less effective.

2 Alternatively, one may treat the LC model as a model for the log central death rate ln m x,t =α x +β x κ t . The distinction of the crude and central death rate is of particular importance when one considers a Poisson regression setup of death counts (discussed in section 2.3.2) where the dynamics of the central death rate is being modelled (Cairns et al., Reference Cairns, Blake, Dowd, Coughlan, Epstein, Ong and Balevich2009; Dowd et al., Reference Dowd, Cairns, Blake, Coughlan, Epstein and Khalaf-Allah2010).

3 We omit here the refitting procedure for κ suggested in Lee & Carter (Reference Lee and Carter1992).

4 We have also performed numerous simulation studies using synthetic data to confirm the effectiveness of our estimation approaches but these are omitted here for space considerations. They are available upon request.

5 http://www.mortality.org/ (accessed on September 2015).

6 The lower the DIC value, the better the model in terms of a trade-off of fit and complexity.

7 The death probability is “crude” in the sense that the crude death rate is used for the calculation. For a discussion of crude and true death probabilities, see Dowd et al. (Reference Dowd, Cairns, Blake, Coughlan, Epstein and Khalaf-Allah2010).

8 The Human Mortality Database provides abridged life tables with specific values for $$a\left( {\tilde{x},n_{{\tilde{x}}} } \right)$$ in different period. For ease of comparison we use the typical assumption that $$a\left( {\tilde{x},n_{{\tilde{x}}} } \right)\, {\equals} \, 0.5$$ . For the special case when age, instead of age group is considered (i.e. $\tilde{x}\, {\equals} \, x$ ), the 1-year death probability q x,t in year t is defined as the ratio of death counts and the population at the beginning of the year. Assuming half of the deaths occurred during the first half of the year, then we have $q_{{x,t}} \,\colon\! {\equals} \, {{D_{{x,t}} } \mathord{\left/ {\vphantom {{D_{{x,t}} } {\left( {E_{{x,t}} {\plus}0.5\:D_{{x,t}} } \right)}}} \right. \kern-\nulldelimiterspace} {\left( {E_{{x,t}} {\plus}0.5\:D_{{x,t}} } \right)}}\, {\equals} \, {{\hat{m}_{{x,t}} } \mathord{\left/ {\vphantom {{\hat{m}_{{x,t}} } {\left( {1{\plus}0.5\:\hat{m}_{{x,t}} } \right)}}} \right. \kern-\nulldelimiterspace} {\left( {1{\plus}0.5\:\hat{m}_{{x,t}} } \right)}}$ where E x,t is the population at the middle of the year. The $n_{{\tilde{x}}} \, {\minus}\, {\rm year}$ death probability (92) for the general case when age group is considered can be derived similarly.

References

Analitis, A., Katsouyanni, K., Biggeri, A., Baccini, M., Forsberg, B., Bisanti, L., Kirchmayer, U., Ballester, F., Cadum, E., Goodman, P., Hojs, A., Sunyer, J., Tittanen, P. & Michelozzi, P. (2008). Effects of cold weather on mortality: results from 15 European cities within the PHEWE project. American Journal of Epidemiology, 168(12), 13971408.Google Scholar
Andreev, K.F. (2002). Evolution of the Danish Population from 1835 to 2000. Odense University Press, Denmark.Google Scholar
Andrieu, C., Doucet, A. & Holenstein, R. (2010). Particle Markov chain Monte Carlo methods. Journal of Royal Statistical Society Series B, 72, 269342.Google Scholar
Bell, W.R. (1997). Comparing and assessing time series methods for forecasting age-specific fertility and mortality rates. Journal of Official Statistics, 13(3), 279303.Google Scholar
Berg, A., Meyer, R. & Yu, J. (2004). Deviance information criterion for comparing stochastic volatility models. Journal of Business and Economic Statistics, 22(1), 107120.Google Scholar
Biffis, E. (2005). Affine processes for dynamic mortality and actuarial valuations. Insurance: Mathematics and Economics, 37(3), 443468.Google Scholar
Brouhns, N., Denuit, M. & Vermunt, J.K. (2002). A Poisson log-bilinear regression approach to the construction of projected lifetables. Insurance: Mathematics and Economics, 31, 373393.Google Scholar
Cairns, A., Blake, D. & Dowd, K. (2006). A two-factor model for stochastic mortality with parameter uncertainty: theory and calibration. Journal of Risk and Insurance, 73(4), 687718.CrossRefGoogle Scholar
Cairns, A., Blake, D. & Dowd, K. (2008). Modelling and management of mortality risk: a review. Scandinavian Actuarial Journal, 2(3), 79113.Google Scholar
Cairns, A., Blake, D., Dowd, K., Coughlan, G., Epstein, D., Ong, A. & Balevich, I. (2009). A quantitative comparison of stochastic mortality models using data from England and Wales and the United States. North American Actuarial Journal, 13(1), 135.Google Scholar
Cairns, A., Blake, D., Dowd, K., Coughlan, G. & Khalaf-Allah, M. (2011). Bayesian stochastic mortality modeling for two populations. ASTIN Bulletin, 41(1), 2959.Google Scholar
Carter, C.K. & Kohn, R. (1994). On Gibbs sampling for state-space models. Biometrika, 81(3), 541553.Google Scholar
Celeux, C., Forbes, F., Robert, C.P. & Titterington, D.M. (2006). Deviance information criteria for missing data models. Bayesian Analysis, 1(4), 651674.Google Scholar
Chib, S., Nardari, F. & Shephard, N. (2002). Markov chain Monte Carlo methods for stochastic volatility models. Journal of Econometrics, 108, 281316.Google Scholar
Chopin, N., Jacob, P.E. & Papaspiliopoulos, O. (2013). SMCˆ2: an efficient algorithm for sequential analysis of state-space models. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 75(3), 397426.Google Scholar
Currie, I.D. (2009). Smoothing and forecasting mortality rates with P-splines. Available online at the address http://www.ma.hw.ac.uk/ iain/research.talks.html (accessed on 20th August 2015).Google Scholar
Czado, C., Delwarde, A. & Denuit, M. (2005). Bayesian Poisson log-bilinear mortality projections. Insurance: Mathematics and Economics, 36, 260284.Google Scholar
Dawood, F.S., Iuliano, A.D., Reed, C., Meltzer, M.I., Shay, D.K., Cheng, P.-Y., Bandaranayake, D., Breiman, R.F., Brooks, W.A., Buchy, P., Feikin, D.R., Fowler, K.B., Gordon, A., Hien, N.T., Horby, P., Huang, Q.S., Katz, M.A., Krishnan, A., Lal, R., Montgomery, J.M., Molbak, K., Pebody, R., Presanis, A.M., Razuri, H., Steens, A., Tinoco, Y.O., Wallinga, J., Yu, H., Vong, S., Bresee, K. & Widdowson, M.A. (2012). Estimated global mortality associated with the first 12 months of 2009 pandemic influenza A H1N1 virus circulation: a modelling study. The Lancet Infectious Diseases, 12(9), 687695.Google Scholar
De Jong, P. & Tickle, L. (2006). Extending the Lee-Carter mortality forecasting. Mathematical Population Studies, 13, 118.CrossRefGoogle Scholar
Del Moral, P. (2004). Feynman-Kac Formulae: Genealogical and Interacting Particle Systems with Applications. Springer-Verlag, New York.Google Scholar
Dickson, D., Hardy, M. & Waters, H.R. (2009). Actuarial Mathematics for Life Contingent Risks. Cambridge University Press.Google Scholar
Doucet, A., De Freitas, J.F.G. & Gordon, N.J. (Eds.) (2001). Sequential Monte Carlo Methods in Practice. Springer, New York.Google Scholar
Doucet, A., Godsill, S. & Andrieu, C. (2000). On sequential Monte Carlo sampling methods for Bayesian filtering. Statistics and Computing, 10(3), 197208.Google Scholar
Dowd, K., Cairns, A., Blake, D., Coughlan, G., Epstein, D. & Khalaf-Allah, M. (2010). Evaluating the goodness of fit of stochastic mortality models. Insurance: Mathematics and Economics, 47, 255265.Google Scholar
Eloranta, S., Lambert, P.C., Andersson, T.M., Czene, K., Hall, P., Björkholm, M. & Dickman, P.W. (2012). Partitioning of excess mortality in population-based cancer patient survival studies using flexible parametric survival models. BMC Medical Research Methodology, 12(1), 1.CrossRefGoogle ScholarPubMed
England, P.D. & Haberman, S. (1993). A new approach to modeling excess mortality. Journal of Actuarial Practice, 1, 85117.Google Scholar
Flury, T. & Shephard, N. (2011). Bayesian inference based only on simulated likelihood: particle filter analysis of dynamic economic models. Econometric Theory, 27, 933956.Google Scholar
Fouillet, A., Rey, G., Laurent, F., Pavillon, G., Bellec, S., Guihenneuc-Jouyaux, C., Clavel, J., Jougla, E. & Hémon, D. (2006). Excess mortality related to the August 2003 heat wave in France. International Archives of Occupational and Environmental Health, 80(1), 1624.Google Scholar
Fung, M.C., Peters, G.W. & Shevchenko, P.V. (2017). Cohort effects in mortality modelling: a Bayesian state-space analysis. Preprint, available at SSRN: 2907868.Google Scholar
Golightly, A. & Wilkinson, D.J. (2011). Bayesian parameter inference for stochastic biochemical network models using particle MCMC. Interface Focus, 1(6), 807820.Google Scholar
Harvey, A.C. (1989). Forecasting: Structural Time Series Models and the Kalman Filter. Cambridge University Press.Google Scholar
Hirz, J., Schmock, U. & Shevchenko, P.V. (2017). Actuarial applications and estimation of extended CreditRisk+. Risks 5, 1–23.Google Scholar
Hunt, A. & Villegas, A.M. (2015). Robustness and convergence in the Lee-Carter model with cohort effects. Insurance: Mathematics and Economics, 64, 186202.Google Scholar
Kalman, R.E. (1960). A new approach to linear filtering and prediction problems. Transactions of the ASME – Journal of Basic Engineering, 82(Series D), 3545.CrossRefGoogle Scholar
Kim, S., Shephard, N. & Chib, S. (1998). Stochastic volatility: likelihood inference and comparison of ARCH models. Review of Economic Studies, 65, 361393.Google Scholar
Kogure, A., Kitsukawa, K. & Kurachi, Y. (2009). A Bayesian comparison of models for changing mortalities toward evaluating longevity risk in Japan. Asia-Pacific Journal of Risk and Insurance, 3(2), https://doi.org/10.2202/2153-3792.1036.Google Scholar
Kogure, A. & Kurachi, Y. (2010). A Bayesian approach to pricing longevity risk based on risk-neutral predictive distributions. Insurance: Mathematics and Economics, 46, 162172.Google Scholar
Koissi, M., Shapiro, A.F. & Hognas, Goran (2006). Evaluating and extending Lee-Carter model for mortality forecasting: bootstrap confidence interval. Insurance: Mathematics and Economics, 38, 120.Google Scholar
Lee, R.D. & Carter, L.R. (1992). Modeling and forecasting U.S. mortality. Journal of the American Statistical Association, 87, 659675.Google Scholar
Lee, R.D. & Miller, T. (2001). Evaluating the performance of the Lee-Carter method for forecasting mortality. Demography, 38(4), 537549.Google Scholar
Li, J., Chan, W. & Cheung, S. (2011). Structural changes in the Lee-Carter mortality indexes. North American Actuarial Journal, 15(1), 1331.Google Scholar
Liu, J.S. (2008). Monte Carlo Strategies in Scientific Computing. Springer-Verlag, New York.Google Scholar
Liu, Y. & Li, J.S.-H. (2016 a). It’s all in the hidden states: a longevity hedging strategy with an explicit measure of population basis risk. Insurance: Mathematics and Economics, 70, 301319.Google Scholar
Liu, Y. & Li, J.S.-H. (2016 b). The locally linear Cairns-Blake-Dowd model: a note on delta-nuga hedging of longevity risk. ASTIN Bulletin, 47(1), 79151, 173.Google Scholar
Pedroza, C. (2006). A Bayesian forecasting model: predicting U.S. male mortality. Biostatistics, 7(4), 530550.Google Scholar
Peters, G.W., Briers, M., Shevchenko, P.V. & Doucet, A. (2013). Calibration and filtering for multi factor commodity models with seasonality: incorporating panel data from futures contracts. Methodology and Computing in Applied Probability, 15(4), 841874.Google Scholar
Peters, G.W., Fan, Y. & Sisson, S.A. (2012). On sequential Monte Carlo, partial rejection control and approximate Bayesian computation. Statistics and Computing, 22(6), 12091222.Google Scholar
Peters, G.W., Fung, M.C. & Shevchenko, P.V. (2017). Addressing identification problems in MCMC estimation of stochastic mortality models. Working paper.Google Scholar
Peters, G.W., Hosack, G.R. & Hayes, K.R. (2010 a). Ecological non-linear state-space model selection via adaptive particle Markov chain Monte Carlo (AdPMCMC). Available at arXiv preprint, available at arXiv:1005.2238.Google Scholar
Peters, G.W., Wüthrich, M.V. & Shevchenko, P.V. (2010b). Chain ladder method: Bayesian bootstrap versus classical bootstrap. Insurance: Mathematics and Economics, 47(1), 3651.Google Scholar
Petris, G., Petrone, S. & Campagnoli, P. (2009). Dynamic Linear Models with R. Springer-Verlag, New York.Google Scholar
Pitacco, E., Denuit, M., Haberman, S. & Olivieri, A. (2009). Modelling Longevity Dynamics for Pensions and Annuity Business. Oxford University Press.Google Scholar
Plat, R. (2009). On stochastic mortality modeling. Insurance: Mathematics and Economics, 45, 393404.Google Scholar
Poyiadjis, G., Doucet, A. & Singh, S.S. (2005). Maximum likelihood parameter estimation in general state-space models using particle methods. In Proceedings of the American Statistical Association, Minneapolis, USA. August.Google Scholar
Poyiadjis, G., Doucet, A. & Singh, S.S. (2011). Particle approximations of the score and observed information matrix in state-space models with application to parameter estimation. Biometrika, 98(1), 6580.Google Scholar
Renshaw, A. & Haberman, S. (2003). Lee-Carter mortality forecasting with age-specific enhancement. Insurance: Mathematics and Economics, 33, 255272.Google Scholar
Renshaw, A. & Haberman, S. (2006). A cohort-based extension to the Lee-Carter model for mortality reduction factors. Insurance: Mathematics and Economics, 38, 556570.Google Scholar
Shang, H.L., Booth, H. & Hyndman, R.J. (2011). Point and interval forecasts of mortality rates and life expectancy: a comparison of ten principal component methods. Demography, 25(5), 173214.Google Scholar
Spiegelhalter, D.J., Best, N.G., Carlin, B.P. & van der Linde, A. (2002). Bayesian measures of model complexity and fit. Journal of the Royal Statistical Society, Series B, 64, 583639.Google Scholar
Targino, R.S., Peters, G.W. & Shevchenko, P.V. (2015). Sequential Monte Carlo samplers for capital allocation under copula-dependent risk models. Insurance: Mathematics and Economics, 61, 206226.Google Scholar
Tsay, R.S. & Tiao, G.C. (1984). Consistent estimates of autoregressive parameters and extended sample autocorrelation function for stationary and nonstationary ARMA models. Journal of the American Statistical Association, 79(385), 8496.Google Scholar
van Berkum, F., Antonio, K. & Vellekoop, M. (2016). The impact of multiple structural changes on mortality predictions. Scandinavian Actuarial Journal, 2016(7), 581603.Google Scholar
West, M. & Harrison, J. (1997). Bayesian Forecasting and Dynamic Models. Springer Series in Statistics.Google Scholar
Yusuf, F., Martins, J.M. & Swanson, D.A. (2014). Methods of Demographic Analysis. Springer, Netherlands.Google Scholar
Zucs, P., Buchholz, U., Haas, W. & Uphoff, H. (2005). Influenza associated excess mortality in Germany, 1985–2001. Emerging Themes in Epidemiology, 2(1), 1.Google Scholar
Figure 0

Table 1 Several popular stochastic mortality models.

Figure 1

70

Figure 2

71

Figure 3

Table 2 A summary of state-space mortality models considered in our empirical study.

Figure 4

Figure 1 Time series of log death rates for Danish male population from year 1835 to 2010.

Figure 5

Figure 2 Estimation of (upper panels) α, β and $$\sigma _{{x_{1} \,\colon\,x_{{21}} ,{\epsilon}}}^{2} $$; (lower panels) time effect κ1834:2010, log-volatility γ1835:2010 and first difference $$\Delta \bar{\kappa }_{t} $$, for Danish male mortality data (1835–2010) using the LCSV-H model. LCSV-H, Lee–Carter stochastic volatility model with heteroscedasticity; CI, confidence interval.

Figure 6

Table 3 Estimated values of the static parameters (except α and β) for the Danish male mortality data (1835–2010).

Figure 7

Figure 3 Estimation of (upper panels) α, β and $$\sigma _{{x_{1} \,\colon\,x_{{21}} ,{\epsilon}}}^{2} $$; (lower panels) time effect κ1834:1900, log-volatility γ1835:1990 and first difference $$\Delta \bar{\kappa }_{t} $$, for Danish male mortality data (1835–1990) using the LCSV-H model. LCSV-H, Lee–Carter stochastic volatility model with heteroscedasticity; CI, confidence interval.

Figure 8

Figure 4 Estimation of (upper panels) α, β and $$\sigma _{{x_{1} \,\colon\,x_{{21}} ,{\epsilon}}}^{2} $$; (lower panels) time effect κ1949:1990, log-volatility γ1950:1990 and first difference $$\Delta \bar{\kappa }_{t} $$, for Danish male mortality data (1950–1990) using the LCSV-H model. LCSV-H, Lee–Carter stochastic volatility model with heteroscedasticity; CI, confidence interval.

Figure 9

Table 4 Deviance information criterion of models with different calibration periods.

Figure 10

Figure 5 (Colour online) 30-year forecasted log death rates (2011–2041) for Danish male population under (left column) LC-H model, (middle column) LCSV model and (right column) LCSV-H model in comparison with LC model. Calibration period: 1835–2010. LC, Lee–Carter; LC-H, LC model with heteroscedasticity; LCSV, LC stochastic volatility; LCSV-H, LCSV model with heteroscedasticity; CI, confidence interval.

Figure 11

Figure 6 (Colour online) 20-year out-of-sample forecasted log death rates for Danish male population under (left column) LC-H model, (middle column) LCSV model and (right column) LCSV-H model in comparison with LC model. Calibration period: 1835–1990. LC, Lee–Carter; LC-H, LC model with heteroscedasticity; LCSV, LC stochastic volatility; LCSV-H, LCSV model with heteroscedasticity; CI, confidence interval.

Figure 12

Figure 7 (Colour online) 20-year out-of-sample forecasted log death rates for Danish male population under (left column) LC-H model, (middle column) LCSV model and (right column) LCSV-H model in comparison with LC model. Calibration period: 1950–1990. LC, Lee–Carter; LC-H, LC model with heteroscedasticity; LCSV, LC stochastic volatility; LCSV-H, LCSV model with heteroscedasticity; CI, confidence interval.

Figure 13

Figure 8 (Colour online) 30-year forecasted life expectancy (2011–2041) at birth, ages 65 and 85 for Danish male population under (left column) LC-H model, (middle column) LCSV model and (right column) LCSV-H model in comparison with LC model. Calibration period: 1835–2010. LC, Lee–Carter; LC-H, LC model with heteroscedasticity; LCSV, LC stochastic volatility; LCSV-H, LCSV model with heteroscedasticity; CI, confidence interval.

Figure 14

Figure 9 (Colour online) 20-year out-of-sample forecasted life expectancy (1991–2010) at birth, ages 65 and 85 for Danish male population under (left column) LC-H model, (middle column) LCSV model and (right column) LCSV-H model in comparison with LC model. Calibration period: 1835–1990. LC, Lee–Carter; LC-H, LC model with heteroscedasticity; LCSV, LC stochastic volatility; LCSV-H, LCSV model with heteroscedasticity; CI, confidence interval.

Figure 15

Figure 10 (Colour online) 20-year out-of-sample forecasted life expectancy (1991–2010) at birth, ages 65 and 85 for Danish male population under (left column) LC-H model, (middle column) LCSV model and (right column) LCSV-H model in comparison with LC model. Calibration period: 1950–1990. LC, Lee–Carter; LC-H, LC model with heteroscedasticity; LCSV, LC stochastic volatility; LCSV-H, LCSV model with heteroscedasticity; CI, confidence interval.