Hedging targeted risks with reinforcement learning: application to life insurance contracts with embedded guarantees

Published online by Cambridge University Press: 20 February 2026

Carlos Octavio Pérez-Mendoza
Affiliation:
Concordia University, Canada
Frédéric Godin*
Affiliation:
Concordia University, Canada
*Corresponding author: Frédéric Godin; Email: frederic.godin@concordia.ca

Abstract

We propose a deep reinforcement learning (RL) framework designed to optimize the hedging of specific, user-defined risk factors (referred to as targeted risks) in financial instruments affected by multiple sources of uncertainty. Our methodology uses Shapley value decompositions to quantify the contribution of each group of risk sources to the projected contract cash flows, providing a clear attribution of the profit and loss (P&L) to distinct risk categories. Leveraging this decomposition, we apply deep RL to hedge only the targeted risks while leaving non-targeted risks mostly unaffected. In addition, we introduce a joint neural network architecture in which the agent network uses risk estimates from a risk measurement neural network to stabilize the hedging strategy, taking local risk dynamics into account. Numerical experiments on variable annuities show that our approach outperforms traditional methods such as delta hedging and standard deep hedging, significantly reducing targeted risks while remaining flexible enough for broader applications.
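As a minimal sketch of the attribution idea (not the authors' implementation), assume a hypothetical function `value(S)` that returns the P&L obtained when only the risk-source groups in `S` are stochastic and the others are frozen at a baseline scenario. The Shapley value then splits the total P&L among the groups by averaging each group's marginal contribution over all coalitions of the other groups. The group names and the toy P&L function below are illustrative assumptions; the paper's own construction of the decomposition is described in its Section 2.1.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict, FrozenSet, Sequence


def shapley_decomposition(
    groups: Sequence[str],
    value: Callable[[FrozenSet[str]], float],
) -> Dict[str, float]:
    """Exact Shapley allocation of value(all groups) - value(no groups)."""
    n = len(groups)
    phi: Dict[str, float] = {}
    for g in groups:
        others = [h for h in groups if h != g]
        total = 0.0
        # Enumerate every coalition S drawn from the other groups.
        for k in range(n):
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Standard Shapley weight |S|! (n - |S| - 1)! / n!
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                # Marginal contribution of group g when added to S.
                total += weight * (value(s | {g}) - value(s))
        phi[g] = total
    return phi


# Hypothetical P&L function: P&L when only the risk groups in `active`
# are stochastic (illustrative numbers, not from the paper).
def toy_pnl(active: FrozenSet[str]) -> float:
    eq = "equity" in active
    ir = "interest_rate" in active
    mort = "mortality" in active
    # The equity/rate interaction term makes the attribution non-trivial.
    return 3.0 * eq - 1.0 * ir + 2.0 * (eq and ir) + 0.5 * mort


groups = ["equity", "interest_rate", "mortality"]
phi = shapley_decomposition(groups, toy_pnl)
for g, contribution in phi.items():
    print(f"{g:>14s}: {contribution:+.3f}")
```

By construction the allocations sum to `value(all) - value(none)`, and the interaction term is split evenly between the two groups involved. Exact enumeration needs on the order of 2^n evaluations of `value`, which stays cheap for a handful of risk groups.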

Information

Type
Research Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2026. Published by Cambridge University Press on behalf of The International Actuarial Association

Table 1. Estimates of the three-factor model.

Table 2. Estimates of the equity index model.

Table 3. Estimates of the fund model.

Table 4. Computation time for different components of the hedging frameworks, GMWB example.

Table 5. Total risk allocation of 20-year GMMB variable annuities, with and without hedging.

Table 6. Total risk allocation of GMWB variable annuities, with and without hedging.

Table 7. Total risk allocation across different RL settings for a GMMB on the assumption mixed fund.

Table 8. Impact of the local penalty on total risk allocation for a GMMB on the assumption mixed fund.

Figure 1. CVaR of cumulative gains and losses over the hedging horizon. Results are based on 1000 out-of-sample paths. The hedged target is the equity component of a GMMB variable annuity with a maturity of $T = 240$ months, rebalanced monthly under the assumed fund dynamics. The CVaR is computed at a 95% confidence level. In the local penalization scenario, the penalization parameter is set to $\frac{1}{T}$. The CVaR is applied to the negative P&L to capture tail risk.
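A minimal sketch of the risk metric in Figure 1, assuming the usual empirical estimator (the mean of the losses at or above the 95% quantile; other CVaR estimators differ slightly in how they treat the quantile point). The P&L array is synthetic placeholder data, not the paper's simulations, and the local penalty with parameter $\frac{1}{T}$ is not modeled. The function name `empirical_cvar` is hypothetical.

```python
import numpy as np


def empirical_cvar(losses: np.ndarray, alpha: float = 0.95) -> float:
    """Average of the losses at or above the empirical alpha-quantile (VaR)."""
    losses = np.asarray(losses, dtype=float)
    var = np.quantile(losses, alpha)
    return float(losses[losses >= var].mean())


# Synthetic placeholder data standing in for simulated hedged P&L:
# 1000 out-of-sample paths, T = 240 monthly rebalancing periods.
rng = np.random.default_rng(seed=0)
n_paths, T = 1000, 240
monthly_pnl = rng.normal(loc=0.0, scale=1.0, size=(n_paths, T))

# Cumulative gains and losses up to each month, per path.
cumulative_pnl = np.cumsum(monthly_pnl, axis=1)

# CVaR is applied to the negative P&L, so large losses sit in the right tail.
# One value per month traces the metric over the hedging horizon.
cvar_path = np.array(
    [empirical_cvar(-cumulative_pnl[:, t], alpha=0.95) for t in range(T)]
)
print(f"CVaR_95 of -P&L at maturity: {cvar_path[-1]:.2f}")
```

Negating the P&L before taking the upper tail is what makes a larger CVaR mean a worse hedge, matching the caption's convention.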

Figure 2. Average local contribution of the risk factors to gains and losses. Results are based on 1000 out-of-sample paths. The hedged target is the equity component of a variable annuity on the Assumption fund with a maturity of $T = 240$ months. Rebalancing is performed monthly. The risk factor decomposition is computed as described in Section 2.1. The shaded bands represent the interquartile range, from the 25th to the 75th percentile.
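The curves and bands in Figure 2 can be summarized per rebalancing date as a mean plus a 25th to 75th percentile band. A minimal sketch, assuming the per-path local contributions have already been computed and stored as an array of shape (paths, months); the helper `mean_and_iqr`, the factor names, and the random data are hypothetical placeholders, not the Section 2.1 decomposition itself.

```python
import numpy as np


def mean_and_iqr(contributions: np.ndarray):
    """Per-step mean and interquartile band of local contributions.

    contributions: array of shape (n_paths, n_steps), one row per path.
    Returns (mean, q25, q75), each of shape (n_steps,).
    """
    mean = contributions.mean(axis=0)
    q25, q75 = np.percentile(contributions, [25, 75], axis=0)
    return mean, q25, q75


# Synthetic placeholder contributions for hypothetical risk-factor groups.
rng = np.random.default_rng(seed=1)
n_paths, T = 1000, 240
local = {
    "equity": rng.normal(0.0, 1.0, size=(n_paths, T)),
    "interest_rate": rng.normal(0.0, 0.3, size=(n_paths, T)),
}

bands = {name: mean_and_iqr(c) for name, c in local.items()}
mean_eq, q25_eq, q75_eq = bands["equity"]
print(mean_eq[:3], q25_eq[:3], q75_eq[:3])
```

Computing percentiles along `axis=0` gives one band per month, which is what a shaded region around a time-indexed mean curve requires.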