
A compositional approach to modeling cause-specific mortality with zero counts

Published online by Cambridge University Press: 17 January 2025

Zhe Michelle Dong*, Research School of Finance, Actuarial Studies and Statistics, The Australian National University, Canberra, Australia
Han Lin Shang, Department of Actuarial Studies and Business Analytics, Macquarie University, Sydney, Australia
Francis Hui, Research School of Finance, Actuarial Studies and Statistics, The Australian National University, Canberra, Australia
Aaron Bruhn, Research School of Finance, Actuarial Studies and Statistics, The Australian National University, Canberra, Australia
*Corresponding author: Zhe Michelle Dong; Email: zhe.dong@anu.edu.au

Abstract

Understanding and forecasting mortality by cause is an essential branch of actuarial science, with wide-ranging implications for decision-makers in public policy and industry. To accurately capture trends in cause-specific mortality, it is critical to consider dependencies between causes of death and to produce forecasts by age and cause that are coherent with aggregate mortality forecasts. One way to achieve these aims is to model cause-specific deaths using compositional data analysis (CODA), treating the density of deaths by age and cause as a set of dependent, nonnegative values that sum to one. A major drawback of standard CODA methods is their difficulty with zero values, which frequently occur in cause-of-death mortality modeling. We therefore propose using a compositional power transformation, the $\alpha$-transformation, to model cause-specific life-table death counts. The $\alpha$-transformation offers a statistically rigorous approach to handling zero-valued subgroups in CODA, compared with ad hoc techniques such as adding an arbitrarily small amount to zero values. We illustrate the $\alpha$-transformation on England and Wales and US death counts by cause from the Human Cause-of-Death database, focusing on cardiovascular-related causes of death. The results demonstrate that the $\alpha$-transformation improves the forecast accuracy of cause-specific life-table death counts compared with log-ratio-based CODA transformations. The forecasts suggest declines in the proportions of deaths from major cardiovascular causes (myocardial infarction and other ischemic heart diseases).
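To make the transformation concrete, the following Python sketch implements the $\alpha$-transformation in its usual form: raise each share to the power $\alpha$, re-close, centre, scale by $1/\alpha$, and project onto $d-1$ orthonormal coordinates with a Helmert sub-matrix, with $\alpha = 0$ treated as the limiting isometric log-ratio (ILR) case. It is a minimal illustration, restricted to $0 \le \alpha \le 1$, and not the authors' code; the function names are hypothetical.

```python
import math
from typing import List, Sequence


def helmert_submatrix(d: int) -> List[List[float]]:
    """Return the (d-1) x d Helmert sub-matrix: orthonormal rows, each summing to zero."""
    rows = []
    for i in range(1, d):
        s = 1.0 / math.sqrt(i * (i + 1))
        rows.append([s] * i + [-i * s] + [0.0] * (d - i - 1))
    return rows


def alpha_transform(x: Sequence[float], alpha: float) -> List[float]:
    """Map a composition x (non-negative parts summing to one) to d-1 coordinates.

    Sketch assumes 0 <= alpha <= 1. For alpha > 0, zero parts are allowed;
    alpha == 0 is the ILR limit, where math.log raises ValueError on a zero.
    """
    d = len(x)
    if alpha == 0:
        logs = [math.log(v) for v in x]
        centre = sum(logs) / d
        u = [lv - centre for lv in logs]  # centred log-ratio (CLR) vector
    else:
        powered = [v ** alpha for v in x]  # 0.0 ** alpha == 0.0 for alpha > 0
        total = sum(powered)
        u = [(d * p / total - 1.0) / alpha for p in powered]
    # u sums to zero, so projecting with the Helmert rows loses no information.
    return [sum(h * ui for h, ui in zip(row, u)) for row in helmert_submatrix(d)]


def inverse_alpha_transform(z: Sequence[float], alpha: float) -> List[float]:
    """Map d-1 coordinates back to a composition with d parts."""
    d = len(z) + 1
    h = helmert_submatrix(d)
    u = [sum(h[i][j] * z[i] for i in range(d - 1)) for j in range(d)]  # H^T z
    if alpha == 0:
        parts = [math.exp(uj) for uj in u]
    else:
        # Assumption: bases below zero (possible for forecasts that leave the
        # image of the transformation) are clipped to zero in this sketch.
        parts = [max(alpha * uj + 1.0, 0.0) ** (1.0 / alpha) for uj in u]
    total = sum(parts)
    return [p / total for p in parts]


if __name__ == "__main__":
    shares = [0.5, 0.3, 0.2, 0.0]  # illustrative shares with one zero cause
    coords = alpha_transform(shares, 0.5)
    print(coords)
    print(inverse_alpha_transform(coords, 0.5))  # recovers shares up to rounding
```

For $\alpha > 0$ the zero share passes through and is recovered exactly, whereas the $\alpha = 0$ (ILR) branch cannot be evaluated on it at all, because the logarithm of zero is undefined.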

Type: Original Research Paper

This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.

© The Author(s), 2025. Published by Cambridge University Press on behalf of Institute and Faculty of Actuaries

Table 1. Selected causes of death, disaggregated for cardiovascular causes, used in our application to England and Wales data (top) and US data (bottom) from the Human Cause-of-Death Data series (2024)

Figure 1 Death counts by cause for England and Wales deaths from 2001 to 2016. The top row presents death counts by cause (disaggregated cardiovascular causes) for males (left) and females (right) in our application to England and Wales data from the Human Cause-of-Death Data series (2024). The bottom row presents the same data but converted to the composition of cardiovascular deaths by cause.
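The bottom rows of Figures 1 and 2 convert death counts to compositions. A minimal sketch of that closure step, assuming counts are stored per cause as year-ordered lists (a hypothetical layout), shows that a zero count stays an exact zero share, which is exactly where log-ratio transformations break down.

```python
from typing import Dict, List


def close_counts(counts: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Turn year-ordered death counts per cause into yearly shares summing to one."""
    causes = list(counts)
    n_years = len(counts[causes[0]])
    shares: Dict[str, List[float]] = {c: [] for c in causes}
    for t in range(n_years):
        total = sum(counts[c][t] for c in causes)
        if total <= 0:
            raise ValueError(f"no deaths recorded at year index {t}")
        for c in causes:
            shares[c].append(counts[c][t] / total)
    return shares


if __name__ == "__main__":
    # Made-up counts for three hypothetical causes over two years.
    demo = {"cause_a": [10, 0], "cause_b": [30, 5], "cause_c": [60, 15]}
    print(close_counts(demo))
    # {'cause_a': [0.1, 0.0], 'cause_b': [0.3, 0.25], 'cause_c': [0.6, 0.75]}
```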

Figure 2 Death counts by cause for US deaths from 1979 to 2021. Note that 2020 and 2021 show a spike in deaths, likely due to COVID-19. The top row presents death counts by cause (disaggregated cardiovascular causes) for males (left) and females (right) in our application to US data from the Human Cause-of-Death Data series (2024). The bottom row presents the same data but converted to the composition of cardiovascular deaths by cause.

Table 2. Forecast performance on test data, applying CLR, ILR, and the $\alpha$-transformation coupled with the LC model to England and Wales data from the Human Cause-of-Death Data series (2024), disaggregated for cardiovascular causes of death. For each metric and gender, bold values correspond to the error using the optimal value of $\alpha$ tuned by cross-validation, while underlined values correspond to the lowest value of that metric in the out-of-sample forecast.

Figure 3 Male (top row) and female (bottom row) mortality by cause in our application to England and Wales data from the Human Cause-of-Death Data series (2024), disaggregated for cardiovascular causes of death. The figures show the movement in the actual proportion of deaths for each cause from 2001 to 2016 (left column), while the remaining three columns present results from applying the CLR, ILR (with zeros removed), and $\alpha$-transformations, respectively.

Figure 4 Forecast of cause-specific mortality up to 2026 in our application to England and Wales data from the Human Cause-of-Death Data series (2024), disaggregated for cardiovascular causes of death. Solid lines represent the observed mortality by cause proportions, and dashed lines show the forecast using the CLR, ILR (with zeros removed), and $\alpha$-transformations (L–R). Mortality by cause is shown for males (top row) and females (bottom row). This figure omits non-cardiovascular causes for presentation purposes.
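The LC step behind these forecasts can be sketched as follows, assuming the textbook Lee–Carter recipe applied to transformed coordinates (years in rows): subtract the column means $a$, take the leading singular vector of the centred matrix as $b$ (found here by power iteration, to stay within the standard library), read off the period index $k_t$, and extrapolate $k_t$ with a random walk with drift. Scaling $b$ to unit length and the function names are assumptions of this sketch; the paper's fitting details may differ. Each forecast row would then be mapped back to a composition with the inverse of whichever transformation (CLR, ILR, or $\alpha$) was used.

```python
import math
from typing import List, Sequence, Tuple


def fit_lee_carter(z: Sequence[Sequence[float]],
                   iters: int = 1000) -> Tuple[List[float], List[float], List[float]]:
    """Rank-one fit z[t][j] ~ a[j] + b[j] * k[t]; returns (a, b, k) with |b| == 1."""
    n, p = len(z), len(z[0])
    a = [sum(z[t][j] for t in range(n)) / n for j in range(p)]
    m = [[z[t][j] - a[j] for j in range(p)] for t in range(n)]  # centred data
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iters):
        mv = [sum(m[t][j] * v[j] for j in range(p)) for t in range(n)]
        w = [sum(m[t][j] * mv[t] for t in range(n)) for j in range(p)]  # M^T M v
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # no variation over time
        v = [x / norm for x in w]
    k = [sum(m[t][j] * v[j] for j in range(p)) for t in range(n)]
    return a, v, k


def forecast_lee_carter(a: Sequence[float], b: Sequence[float], k: Sequence[float],
                        horizon: int) -> List[List[float]]:
    """Extrapolate k by a random walk with drift and rebuild the coordinates."""
    drift = (k[-1] - k[0]) / (len(k) - 1)
    return [[aj + bj * (k[-1] + h * drift) for aj, bj in zip(a, b)]
            for h in range(1, horizon + 1)]
```

The sign of $b$ and $k_t$ is arbitrary in the singular-vector step, but their product, and hence the forecast, is not.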

Table 3. Forecast performance on test data, applying CLR, ILR, and the $\alpha$-transformation coupled with the LC model to US data from the Human Cause-of-Death Data series (2024), disaggregated for cardiovascular causes of death. For each metric and gender, bold values correspond to the error using the optimal value of $\alpha$ tuned by cross-validation, while underlined values correspond to the lowest value of that metric in the out-of-sample forecast.

Figure 5 Male (top row) and female (bottom row) mortality by cause in our application to US data from the Human Cause-of-Death Data series (2024), disaggregated for cardiovascular causes of death. The figures show the movement in the actual proportion of deaths for each cause from 1979 to 2021 (left column), while the remaining three columns present results from applying the CLR, ILR (with zeros removed), and $\alpha$-transformations, respectively.

Figure 6 Forecast of cause-specific mortality up to 2051 in our application to US data from the Human Cause-of-Death Data series (2024), disaggregated for cardiovascular causes of death. Solid lines represent the observed mortality by cause proportions, and dashed lines show the forecast using the CLR, ILR (with zeros removed), and $\alpha$-transformations (L–R). Mortality by cause is shown for males (top row) and females (bottom row). This figure omits non-cardiovascular causes for presentation purposes.

Figure 7 Male (top row) and female (bottom row) 90% interval forecasts up to 2026 in our application to England and Wales data from the Human Cause-of-Death Data series (2024), disaggregated for cardiovascular causes of death.
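The captions do not state how the 90% intervals are constructed. As one common option, assumed here rather than taken from the paper, a Gaussian random walk with drift for the period index $k_t$ gives bounds whose half-width grows with $\sqrt{h}$ at horizon $h$. This sketch ignores uncertainty in the estimated drift and covers only $k_t$, not the mapping back to cause-specific shares; the function name is hypothetical.

```python
import math
import statistics
from typing import List, Sequence, Tuple


def rw_drift_intervals(k: Sequence[float], horizon: int,
                       level: float = 0.90) -> List[Tuple[float, float, float]]:
    """(lower, centre, upper) for k at horizons 1..horizon under a random walk with drift."""
    diffs = [k[t] - k[t - 1] for t in range(1, len(k))]
    drift = statistics.fmean(diffs)   # equals (k[-1] - k[0]) / (len(k) - 1)
    sigma = statistics.stdev(diffs)   # innovation standard deviation (needs len(k) >= 3)
    zq = statistics.NormalDist().inv_cdf(0.5 + level / 2)  # about 1.645 for 90%
    out = []
    for h in range(1, horizon + 1):
        centre = k[-1] + h * drift
        half = zq * sigma * math.sqrt(h)
        out.append((centre - half, centre, centre + half))
    return out
```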

Figure 8 Male (top row) and female (bottom row) 90% interval forecasts up to 2051 in our application to US data from the Human Cause-of-Death Data series (2024), disaggregated for cardiovascular causes of death.

Figure 9 Male mortality by cause using US data from the Human Cause-of-Death Data series (2024), disaggregated for cardiovascular causes of death. The figures show the movement in the actual proportion of deaths for each cause from 1979 to 2021 (left column), while the remaining four columns present results from applying MLR simple, single, quadratic, and cubic regressions, respectively.

Figure 10 Fits of cause-specific male mortality in our application to US data from the Human Cause-of-Death Data series (2024), disaggregated for cardiovascular causes of death. Solid lines represent the observed mortality by cause proportions, and dashed lines show the fit using MLR regressions. This figure omits non-cardiovascular causes for presentation purposes.

Table A.1 Results for validation sets (RMSE and MAE, based on fourfold expanding-window cross-validation) used to tune $\alpha$, using the $\alpha$-transformation coupled with an LC model for forecasting in our application to England and Wales death counts by cause from the Human Cause-of-Death Data series (2024), disaggregated for cardiovascular causes of death. Optimal values of $\alpha$ are shown in bold; all results are multiplied by 100.
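Tables A.1 and A.2 tune $\alpha$ by expanding-window cross-validation. A minimal sketch of that loop follows. The fold layout (each fold extends the training window by `horizon` years), selection on mean RMSE, and the hypothetical `forecaster(train_rows, horizon, alpha)` interface are assumptions; the toy forecaster exists only to exercise the code.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Rows = Sequence[Sequence[float]]
# Hypothetical interface: forecaster(train_rows, horizon, alpha) -> forecast rows.
Forecaster = Callable[[Rows, int, float], List[List[float]]]


def rmse(actual: Rows, predicted: Rows) -> float:
    sq = [(a - p) ** 2 for ra, rp in zip(actual, predicted) for a, p in zip(ra, rp)]
    return math.sqrt(sum(sq) / len(sq))


def mae(actual: Rows, predicted: Rows) -> float:
    ab = [abs(a - p) for ra, rp in zip(actual, predicted) for a, p in zip(ra, rp)]
    return sum(ab) / len(ab)


def expanding_window_folds(n_rows: int, n_folds: int, horizon: int) -> List[Tuple[int, int]]:
    """(train_end, test_end) pairs; each fold adds `horizon` rows to the training window."""
    first = n_rows - n_folds * horizon
    if first < 2:
        raise ValueError("too few years for the requested number of folds")
    return [(first + i * horizon, first + (i + 1) * horizon) for i in range(n_folds)]


def tune_alpha(rows: Rows, alphas: Sequence[float], forecaster: Forecaster,
               n_folds: int = 4, horizon: int = 1
               ) -> Tuple[float, Dict[float, Tuple[float, float]]]:
    """Average RMSE and MAE over folds for each alpha; pick the lowest mean RMSE."""
    folds = expanding_window_folds(len(rows), n_folds, horizon)
    scores: Dict[float, Tuple[float, float]] = {}
    for alpha in alphas:
        r, m = 0.0, 0.0
        for train_end, test_end in folds:
            pred = forecaster(rows[:train_end], test_end - train_end, alpha)
            actual = rows[train_end:test_end]
            r += rmse(actual, pred)
            m += mae(actual, pred)
        scores[alpha] = (r / n_folds, m / n_folds)
    best = min(scores, key=lambda a: scores[a][0])  # assumption: select on RMSE
    return best, scores


if __name__ == "__main__":
    # Toy stand-in forecaster: repeat the last row, shifted by alpha.
    def toy(train: Rows, h: int, alpha: float) -> List[List[float]]:
        return [[v + alpha for v in train[-1]] for _ in range(h)]

    best, scores = tune_alpha([[0.5, 0.5]] * 10, [0.0, 0.5, 1.0], toy, n_folds=4)
    # Scores are scaled by 100, mirroring the convention in the tables.
    print(best, {a: tuple(100 * s for s in sc) for a, sc in scores.items()})
```

In a full pipeline, `forecaster` would chain the transformation, the LC fit and forecast, and the inverse transformation; `n_folds` would be 4 or 10 to match the two tables.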

Table A.2 Results for validation sets (RMSE and MAE, based on tenfold expanding-window cross-validation) used to tune $\alpha$, using the $\alpha$-transformation coupled with an LC model for forecasting in our application to US death counts by cause from the Human Cause-of-Death Data series (2024), disaggregated for cardiovascular causes of death. Optimal values of $\alpha$ are shown in bold; all results are multiplied by 100.

Figure A.1 Forecast of cause-specific mortality up to 2026 in our application to England and Wales death counts by cause from the Human Cause-of-Death Data series (2024), disaggregated for cardiovascular causes of death. Solid lines represent the observed mortality by cause proportions, and dashed lines show the forecast using the CLR transformation with variations in the treatment of zeros in the data. Mortality by cause is shown for males (top row) and females (bottom row). This figure omits non-cardiovascular causes for presentation purposes.

Figure A.2 Forecast of cause-specific mortality up to 2026 in our application to England and Wales death counts by cause from the Human Cause-of-Death Data series (2024), disaggregated for cardiovascular causes of death. Solid lines represent the observed mortality by cause proportions, and dashed lines show the forecast using the ILR transformation with variations in the treatment of zeros in the data. Mortality by cause is shown for males (top row) and females (bottom row). This figure omits non-cardiovascular causes for presentation purposes.
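Figures A.1 and A.2 (and their US counterparts, Figures A.3 and A.4) vary the treatment of zeros before the log-ratio transformations. The captions do not list the variants, so the sketch below assumes two common ones: dropping any cause that is zero in some year (as in the "ILR (with zeros removed)" results) and replacing zeros with a small constant $\delta$ before re-closing. Because ILR coordinates are an orthonormal rotation of CLR coordinates, the same treatments apply to both. Function names are hypothetical.

```python
import math
from typing import List, Sequence


def replace_zeros(x: Sequence[float], delta: float = 1e-6) -> List[float]:
    """Ad hoc fix: swap zeros for a small constant delta, then re-close to sum to one."""
    bumped = [v if v > 0 else delta for v in x]
    total = sum(bumped)
    return [v / total for v in bumped]


def drop_zero_parts(rows: Sequence[Sequence[float]]) -> List[List[float]]:
    """Remove any part that is zero in at least one year, then re-close each row."""
    keep = [j for j in range(len(rows[0])) if all(r[j] > 0 for r in rows)]
    if not keep:
        raise ValueError("every part contains a zero")
    out = []
    for r in rows:
        total = sum(r[j] for j in keep)
        out.append([r[j] / total for j in keep])
    return out


def clr(x: Sequence[float]) -> List[float]:
    """Centred log-ratio; requires strictly positive parts."""
    logs = [math.log(v) for v in x]
    centre = sum(logs) / len(logs)
    return [lv - centre for lv in logs]


if __name__ == "__main__":
    shares = [0.5, 0.3, 0.2, 0.0]
    for delta in (1e-6, 1e-9):
        print(delta, clr(replace_zeros(shares, delta))[-1])
    print(drop_zero_parts([shares, [0.4, 0.3, 0.2, 0.1]]))
```

The CLR coordinate of the replaced part shifts by $\tfrac{3}{4}\log 1000 \approx 5.18$ between $\delta = 10^{-6}$ and $\delta = 10^{-9}$, although the data are unchanged; this sensitivity to an arbitrary constant is the kind of ad hoc behaviour the abstract contrasts with the $\alpha$-transformation.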

Figure A.3 Forecast of cause-specific mortality up to 2051 in our application to US death counts by cause from the Human Cause-of-Death Data series (2024), disaggregated for cardiovascular causes of death. Solid lines represent the observed mortality by cause proportions, and dashed lines show the forecast using the CLR transformation with variations in the treatment of zeros in the data. Mortality by cause is shown for males (top row) and females (bottom row). This figure omits non-cardiovascular causes for presentation purposes.

Figure A.4 Forecast of cause-specific mortality up to 2051 in our application to US death counts by cause from the Human Cause-of-Death Data series (2024), disaggregated for cardiovascular causes of death. Solid lines represent the observed mortality by cause proportions, and dashed lines show the forecast using the ILR transformation with variations in the treatment of zeros in the data. Mortality by cause is shown for males (top row) and females (bottom row). This figure omits non-cardiovascular causes for presentation purposes.