
Pseudo-model-free hedging for variable annuities via deep reinforcement learning

Published online by Cambridge University Press: 14 March 2023

Wing Fung Chong
Affiliation:
Maxwell Institute for Mathematical Sciences and Department of Actuarial Mathematics and Statistics, Heriot-Watt University, Edinburgh EH14 4AS, UK
Haoen Cui
Affiliation:
School of Computer Science, Georgia Institute of Technology, Atlanta, GA 30332, USA
Yuxuan Li*
Affiliation:
Department of Mathematics, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
*Corresponding author. E-mail: yuxuanl9@illinois.edu

Abstract

This paper proposes a two-phase deep reinforcement learning approach for hedging variable annuity contracts with both guaranteed minimum maturity benefit (GMMB) and guaranteed minimum death benefit (GMDB) riders, which can address model miscalibration in Black-Scholes financial and constant force of mortality actuarial market environments. In the training phase, an infant reinforcement learning agent interacts with a pre-designed training environment, collects sequential anchor-hedging reward signals, and gradually learns how to hedge the contracts. As expected, after a sufficient number of training steps, the trained reinforcement learning agent hedges, in the training environment, as well as the correct Delta, while outperforming misspecified Deltas. In the online learning phase, the trained reinforcement learning agent interacts with the market environment in real time, collects single terminal reward signals, and self-revises its hedging strategy. The hedging performance of the further-trained reinforcement learning agent is demonstrated via an illustrative example on a rolling basis, revealing the self-revising capability of its hedging strategy through online learning.
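
To make the training phase concrete, below is a minimal, hypothetical Python sketch of such a training-environment MDP: Black-Scholes fund dynamics, a constant force of mortality, and an anchor-hedging reward, taken here to penalise the squared deviation of the hedge portfolio from an anchor process. The class name, parameter values, and the exact reward form are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of a training-phase MDP, assuming Black-Scholes fund
# dynamics, a constant force of mortality, and an anchor-hedging reward
# that penalises squared deviation of the hedge portfolio from an anchor
# process.  All names and parameter values here are illustrative.
import numpy as np

class HedgingEnv:
    def __init__(self, s0=100.0, mu=0.05, sigma=0.2, r=0.02,
                 force_of_mortality=0.01, maturity=10.0, dt=1 / 12, seed=0):
        self.s0, self.mu, self.sigma, self.r = s0, mu, sigma, r
        self.lam = force_of_mortality   # constant force of mortality
        self.T, self.dt = maturity, dt
        self.rng = np.random.default_rng(seed)

    def reset(self):
        self.t, self.s, self.portfolio = 0.0, self.s0, 0.0
        return np.array([self.t, self.s, self.portfolio])

    def step(self, hedge_units):
        # Black-Scholes step for the underlying fund value
        z = self.rng.standard_normal()
        s_next = self.s * np.exp((self.mu - 0.5 * self.sigma ** 2) * self.dt
                                 + self.sigma * np.sqrt(self.dt) * z)
        # self-financing hedge: stock position plus bank-account accrual
        self.portfolio = ((self.portfolio - hedge_units * self.s)
                          * np.exp(self.r * self.dt) + hedge_units * s_next)
        self.s, self.t = s_next, self.t + self.dt
        # policyholder dies over [t, t + dt) with prob. 1 - exp(-lam * dt)
        died = self.rng.random() < 1.0 - np.exp(-self.lam * self.dt)
        done = died or self.t >= self.T - 1e-9
        anchor = self._anchor_value()   # placeholder for the anchor process
        reward = -(self.portfolio - anchor) ** 2
        return np.array([self.t, self.s, self.portfolio]), reward, done

    def _anchor_value(self):
        # In the paper the anchor tracks the liability value; a constant
        # placeholder keeps this sketch self-contained.
        return 0.0
```

A reinforcement learning agent (Proximal Policy Optimisation in the paper) would interact with such an environment episode by episode during training; in the online learning phase, the same agent instead faces the real market and receives only a single terminal reward per contract.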

Information

Type
Original Research Paper
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2023. Published by Cambridge University Press on behalf of Institute and Faculty of Actuaries
Figures and tables

Table 1. Contract characteristics.


Table 2. Parameter settings of the market environment.


Table 3. Parameter settings of the model of the market environment; bolded parameters differ from those in the market environment.


Table 4. Fee structures derived from the model of the market environment.


Table 5. Summary statistics of empirical distributions of realised terminal P&Ls by different Delta strategies.


Figure 1 Empirical density and cumulative distribution functions of realised terminal P&Ls by different Delta strategies.
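
For context, the Delta strategies compared above differ in the model used to compute the hedge ratio. A minimal sketch, assuming the GMMB liability can be treated as a survival-weighted European put on the fund value under a constant force of mortality (an illustrative simplification with hypothetical parameter values, not the paper's exact valuation):

```python
# Hypothetical sketch: Black-Scholes Delta of a GMMB-style guarantee,
# treating the rider as a European put on the fund value weighted by the
# probability of surviving to maturity under a constant force of mortality.
from math import exp, log, sqrt
from statistics import NormalDist

def gmmb_delta(fund, guarantee, r, sigma, tau, force_of_mortality):
    """Delta of a survival-weighted put with strike `guarantee` and
    time to maturity `tau` (in years)."""
    d1 = (log(fund / guarantee) + (r + 0.5 * sigma ** 2) * tau) / (sigma * sqrt(tau))
    put_delta = NormalDist().cdf(d1) - 1.0     # Black-Scholes put Delta
    survival = exp(-force_of_mortality * tau)  # constant-force survival probability
    return survival * put_delta

# A misspecified Delta arises from using, e.g., the wrong volatility:
print(gmmb_delta(100.0, 100.0, r=0.02, sigma=0.20, tau=10.0, force_of_mortality=0.01))
print(gmmb_delta(100.0, 100.0, r=0.02, sigma=0.30, tau=10.0, force_of_mortality=0.01))
```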


Figure 2 The relationship among the insurer, the RL agent, the MDP training environment, and the market environment in the two-phase RL approach.


Figure 3 An example of policy and value function artificial neural networks with a shared hidden layer and a non-shared hidden layer.
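
A minimal PyTorch sketch of the kind of architecture this figure depicts, with one shared hidden layer feeding separate policy and value branches; the layer widths, activations, and the Gaussian-policy parameterisation are assumptions for illustration only:

```python
# Illustrative actor-critic network: one shared hidden layer, then one
# non-shared hidden layer per head, as in the figure.  Sizes are assumed.
import torch
import torch.nn as nn

class ActorCritic(nn.Module):
    def __init__(self, state_dim=3, hidden=64):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(state_dim, hidden), nn.Tanh())
        self.policy_head = nn.Sequential(nn.Linear(hidden, hidden), nn.Tanh(),
                                         nn.Linear(hidden, 1))  # mean hedge ratio
        self.value_head = nn.Sequential(nn.Linear(hidden, hidden), nn.Tanh(),
                                        nn.Linear(hidden, 1))   # state value
        self.log_std = nn.Parameter(torch.zeros(1))  # Gaussian policy scale

    def forward(self, state):
        h = self.shared(state)                       # shared representation
        return self.policy_head(h), self.log_std.exp(), self.value_head(h)
```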


Table 6. Hyperparameter settings of Proximal Policy Optimisation and the neural network.


Figure 4 Training log in terms of bootstrapped sum of rewards and batch entropy.


Table 7. Parameter settings of the increasing force of mortality actuarial model for Delta.


Table 8. Parameter settings of the Heston financial model for Delta.


Figure 5 Empirical density and cumulative distribution functions of realised terminal P&Ls by the approaches of reinforcement learning, classical Deltas, and deep hedging.


Table 9. Summary statistics of empirical distributions of realised terminal P&Ls by the approaches of reinforcement learning, classical Deltas, and deep hedging.


Figure 6 Empirical density functions of realised pathwise differences of terminal P&Ls compared with the approaches of classical Deltas and deep hedging.


Table 10. Summary statistics of empirical distributions of realised pathwise differences of terminal P&Ls compared with the approaches of classical Deltas and deep hedging.


Figure 7 An illustrative timeline with the real time and the contract effective time in the online learning phase.


Table 11. Hyperparameter settings of Proximal Policy Optimisation for online learning; bolded hyperparameters differ from those for training.


Figure 8 Best-case and worst-case samples of future trajectories for rolling-basis evaluation of the reinforcement learning agent with the online learning phase, and comparisons with classical Deltas and the reinforcement learning agent without the online learning phase.


Figure 9 Empirical conditional density functions of first surpassing times, conditioning on the reinforcement learning agent with the online learning phase exceeding the correct Delta in terms of sample means of terminal P&L within 3 years.


Table 12. Summary statistics of empirical conditional distributions of first surpassing times, conditioning on the reinforcement learning agent with the online learning phase exceeding the correct Delta in terms of sample means of terminal P&L within 3 years.


Table 13. Estimated proportions of future trajectories where the reinforcement learning agent with the online learning phase statistically significantly exceeds the correct Delta and the incorrect Delta within 3 years, at various levels of significance.


Figure 10 Empirical conditional density functions of first statistically significant surpassing times, conditioning on the reinforcement learning agent with the online learning phase statistically significantly exceeding the correct Delta within 3 years at the $0.1$ level of significance.


Table 14. Summary statistics of empirical conditional distributions of first statistically significant surpassing times, conditioning on the reinforcement learning agent with the online learning phase statistically significantly exceeding the correct Delta within 3 years at the $0.1$ level of significance.


Figure 11 Snapshots of empirical density functions of the sample mean of terminal P&L by the reinforcement learning agent with the online learning phase, the reinforcement learning agent without the online learning phase, the correct Delta, and the incorrect Delta at different time points.


Table 15. Summary statistics of empirical distributions of the sample mean of terminal P&L by the reinforcement learning agent with the online learning phase, the reinforcement learning agent without the online learning phase, the correct Delta, and the incorrect Delta at different time points.


Algorithm 1. Pseudo-code for the deep hedging method.
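
For readers unfamiliar with this benchmark, the deep hedging idea admits a compact sketch: a neural network maps the current state to a hedge ratio at each rebalancing date and is trained end-to-end over simulated paths. The toy version below uses Black-Scholes paths, a put-like terminal liability, and a quadratic objective; it is a hypothetical illustration of the technique, not the paper's Algorithm 1.

```python
# Toy deep hedging sketch: a network outputs the hedge ratio at each
# rebalancing date; training minimises mean squared terminal hedged P&L
# over simulated Black-Scholes paths.  All values are illustrative.
import torch
import torch.nn as nn

torch.manual_seed(0)
n_paths, n_steps, dt = 4096, 120, 1.0 / 12.0
r, sigma, s0, strike = 0.02, 0.2, 100.0, 100.0

net = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

for epoch in range(100):
    # simulate Black-Scholes paths, prepending the initial price
    z = torch.randn(n_paths, n_steps)
    s = s0 * torch.exp(torch.cumsum((r - 0.5 * sigma ** 2) * dt
                                    + sigma * dt ** 0.5 * z, dim=1))
    s = torch.cat([torch.full((n_paths, 1), s0), s], dim=1)
    pnl = torch.zeros(n_paths)
    for k in range(n_steps):
        state = torch.stack([torch.full((n_paths,), k * dt), s[:, k]], dim=1)
        hedge = net(state).squeeze(1)
        pnl = pnl + hedge * (s[:, k + 1] - s[:, k])      # gains from hedging
    liability = torch.clamp(strike - s[:, -1], min=0.0)  # put-like payoff
    loss = ((pnl - liability) ** 2).mean()               # quadratic hedging loss
    opt.zero_grad(); loss.backward(); opt.step()
```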


Table C.1. The hyperparameters of deep hedging training and the neural network.