Statistical features of persistence and long memory in mortality data

Published online by Cambridge University Press:  11 May 2021

Gareth W. Peters*
Affiliation:
Department of Actuarial Mathematics and Statistics, Heriot-Watt University, Edinburgh EH14 4AS, UK
Hongxuan Yan
Affiliation:
Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China
Jennifer Chan
Affiliation:
School of Mathematics and Statistics, The University of Sydney, Camperdown, NSW 2006, Australia
*Corresponding author. E-mail: g.peters@hw.ac.uk

Abstract

Understanding the core statistical properties and data features of mortality data is fundamental to the development of machine learning methods for demographic and actuarial applications of mortality projection. The study of statistical features in such data forms the basis for classification, regression and forecasting tasks. In particular, understanding the key statistical structure of such data can improve the accuracy of mortality projection and forecasting when constructing life tables. The ability to forecast mortality accurately is critical for the study of demography, life insurance product design and pricing, pension planning and insurance-based decision risk management. Though many stylised facts of mortality data have been discussed in the literature, we provide evidence for a novel statistical feature that is pervasive in national-level mortality data and as yet unexplored. Specifically, we demonstrate, first, strong evidence for the existence of long memory features in mortality data and, second, that such long memory structures display multifractality as a statistical feature that can act as a discriminator of mortality dynamics by age, gender and country. To achieve this, we first outline how we choose to represent the persistence of long memory from an estimator perspective. We make a natural link between a class of long memory features and an attribute of stochastic processes based on fractional Brownian motion. This allows us to use well-established estimators of the Hurst exponent to study the long memory features of mortality data robustly and accurately. We then introduce to mortality analysis the notion from data science known as multifractality. This allows us to study the long memory persistence features of mortality data on different timescales. We demonstrate its accuracy for sample sizes commensurate with national-level age term structure historical mortality records.
A series of synthetic studies as well as a comprehensive analysis of real mortality death count data are undertaken to demonstrate the pervasiveness of long memory structures in mortality data. Both mono-fractal and multifractal functional features are verified to be present as stylised facts of national-level mortality data for most countries and most age groups by gender. We conclude by demonstrating how such features can be used in kernel clustering and mortality model forecasting to improve these actuarial applications.
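
To make the idea of a Hurst exponent estimator concrete, the following is a minimal sketch of detrended fluctuation analysis (DFA), one of the estimators named in the table captions of this paper. It is an illustration under stated assumptions, not the authors' implementation: the function name `dfa_exponent`, the default window sizes and the linear detrending order are all hypothetical choices made here. The series is integrated into a profile, a polynomial trend is removed within non-overlapping windows, and the slope of the log fluctuation against the log window size is returned. For a stationary noise-like series this slope is commonly read as an estimate of H.

```python
import numpy as np


def dfa_exponent(series, scales=None, order=1):
    """Illustrative DFA scaling exponent (hypothetical helper, not from the paper)."""
    x = np.asarray(series, dtype=float)
    n = x.size
    if n < 32:
        # Assumption of this sketch: the default window sizes need n >= 32.
        raise ValueError("series too short for the default window sizes")

    # Profile: cumulative sum of the mean-centred series.
    profile = np.cumsum(x - x.mean())

    if scales is None:
        # Assumed default: up to 10 log-spaced window sizes between 8 and n // 4.
        scales = np.unique(
            np.logspace(np.log10(8), np.log10(n // 4), 10).astype(int)
        )

    fluctuations = []
    for s in scales:
        m = n // s  # number of non-overlapping windows of length s
        windows = profile[: m * s].reshape(m, s)
        t = np.arange(s)
        sq_resid = []
        for w in windows:
            # Fit and remove a polynomial trend inside each window.
            coef = np.polyfit(t, w, order)
            sq_resid.append(np.mean((w - np.polyval(coef, t)) ** 2))
        # Root-mean-square fluctuation at this window size.
        fluctuations.append(np.sqrt(np.mean(sq_resid)))

    # Slope of log F(s) against log s is the scaling exponent.
    slope, _ = np.polyfit(np.log(scales), np.log(fluctuations), 1)
    return slope


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    noise = rng.standard_normal(4000)
    print("white noise exponent:", dfa_exponent(noise))
```

Uncorrelated noise should give an exponent close to 0.5, while values clearly above 0.5 indicate persistence. On short series of the lengths considered in the synthetic studies, any such estimate is noisy, so the window sizes would need to be chosen with care.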

Information

Type
Paper
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution, and reproduction in any medium, provided the original work is properly cited.
Copyright
© The Author(s), 2021. Published by Cambridge University Press on behalf of Institute and Faculty of Actuaries
Table 1. Estimates of H by the R/S, DFA and PR methods for data generated by the ARFIMA model with $d=0.45$ and $n=50$, 100 and 300. Empirical 95% confidence intervals deviated by less than 5% from the point estimate in all cases.
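
As a rough illustration of how synthetic data of this kind can be produced, the sketch below simulates an ARFIMA(0, d, 0) series from its truncated moving-average representation, whose weights satisfy $\psi_0 = 1$ and $\psi_k = \psi_{k-1}(k-1+d)/k$. The function names, the burn-in length and the truncation are assumptions made here, not details taken from the paper. For a stationary ARFIMA(0, d, 0) process the standard relation $H = d + 1/2$ applies, so $d = 0.45$ corresponds to $H = 0.95$.

```python
import numpy as np


def arfima_ma_weights(d, n_weights):
    """MA(infinity) weights of (1 - B)^(-d), truncated to n_weights terms."""
    psi = np.empty(n_weights)
    psi[0] = 1.0
    for k in range(1, n_weights):
        psi[k] = psi[k - 1] * (k - 1 + d) / k
    return psi


def simulate_arfima(n, d, burn_in=1000, seed=None):
    """Simulate n points of ARFIMA(0, d, 0) with Gaussian innovations.

    Hypothetical helper: the infinite moving average is truncated, and the
    first `burn_in` points are discarded so that every retained point uses
    at least `burn_in` lagged innovations.
    """
    rng = np.random.default_rng(seed)
    total = n + burn_in
    eps = rng.standard_normal(total)
    psi = arfima_ma_weights(d, total)
    # x_t = sum_k psi_k * eps_{t-k}; keep the first `total` convolution terms.
    x = np.convolve(eps, psi)[:total]
    return x[burn_in:]


if __name__ == "__main__":
    d = 0.45
    for n in (50, 100, 300):
        x = simulate_arfima(n, d, seed=n)
        print(n, "points, implied H =", d + 0.5, ", sample mean =", x.mean())
```

Truncating the moving average is the simplest approach but only approximates the long-range dependence; exact methods based on the autocovariance function are an alternative when fidelity at long lags matters.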

Table 2. Estimates of H by the R/S, DFA and PR methods for data generated by the ARFIMA model with $n=150$ and $d=0.15$, 0.25 and 0.35. Empirical 95% confidence intervals deviated by less than 5% from the point estimate in all cases.

Table 3. Estimates of H by the R/S, DFA and PR methods for data generated by the ARFIMA model with $n = 150$, $d = 0.45$ and 5%, 10% and 30% missing values. Empirical 95% confidence intervals deviated by less than 5% from the point estimate in all cases.

Table 4. Estimates of H by the R/S, DFA and PR methods for data generated by the ARFIMA model with $n=150$, $d=0.45$ and aggregation of every 5 and 10 values. Empirical 95% confidence intervals deviated by less than 5% from the point estimate in all cases.

Table 5. Estimates of H by the R/S, DFA and PR methods for data generated by the ARFIMA model with $n=150$, $d=0.45$ and rounding to one and two decimal places. Empirical 95% confidence intervals deviated by less than 5% from the point estimate in all cases.

Table 6. Estimates of H by the R/S, DFA and PR methods for data generated by the ARFIMA model with $n=150$, $d=0.45$ and scaling every 10, 100 and 1,000. Empirical 95% confidence intervals deviated by less than 5% from the point estimate in all cases.

Table 7. Data length (years) and abbreviation for country names

Table 8. Estimated Hurst exponent for three types of aggregation. The age group index refers to a different age range under each aggregation type; for example, age group 2 corresponds to age 1 under type 1, ages 1–4 under type 2 and ages 11–20 under type 3. Empirical 95% confidence intervals deviated by less than 5% from the point estimate in all cases.

Figure 1 Heat map of estimated H across countries and age groups for female (left) and male (right).

Figure 2 Boxplot of estimated H across age groups aggregated over countries to show gender effect. For each age group, the first box plot is female (pink) and the second is male (blue).

Figure 3 Boxplot of estimated H across countries aggregated over ages to show gender effect. For each country, the first box plot is female (pink) and the second is male (blue).

Figure 4 Source: Max Roser and Esteban Ortiz-Ospina (2019) Life expectancy by sex. Published online at OurWorldInData.org. Retrieved from: https://ourworldindata.org [Online Resource].

Figure 5 Boxplots of estimated H by MFDFA across age groups aggregated over countries to show gender effect. For each age group, the left panel is female (pink), and the right panel is male (blue).

Figure 6 Boxplots of estimated H by MFDFA across age groups aggregated over countries to show gender effect. For each age group, the left panel is female (pink), and the right panel is male (blue).

Table 9. Kernel k-means clustering assignments for ages 10–14, 35–39 and 70–74 across 16 countries

Table 10. Kernel k-means clustering assignments for AU, IS, UK, US and JP across all age groups

Table 11. Kernel k-means cluster representative data sets selected by age groups 10–14, 35–39 and 70–74 to be used for forecasting analysis application

Figure 7 Observed mortality rates $\boldsymbol\mu_{x,T-19\,:\,T}$ (black line) from HMD and forecast $\hat{\boldsymbol{\mu}}_{x,T-19\,:\,T}$ by Lee–Carter model (female in purple) and long memory extended Lee–Carter model (female in red) for age groups (10–14, 35–39 and 70–74).

Figure 8 Observed mortality rates $\boldsymbol\mu_{x,T-19\,:\,T}$ (black line) from HMD and forecast $\hat{\boldsymbol {\mu}}_{x,T-19\,:\,T}$ by Lee–Carter model (male in skyblue) and long memory extended Lee–Carter model (male in blue) for age groups (10–14, 35–39 and 70–74).

Supplementary material: File

Peters et al. supplementary material