
DPTree and DPForest: tree-based methods fulfilling demographic parity

Published online by Cambridge University Press: 26 August 2025

Pierre-Alexandre Simon
Affiliation:
Department of Mathematics, Université Libre de Bruxelles (ULB), Brussels, Belgium
Michel Denuit
Affiliation:
Institute of Statistics, Biostatistics and Actuarial Science, UCLouvain, Louvain-la-Neuve, Belgium
Julien Trufin*
Affiliation:
Department of Mathematics, Université Libre de Bruxelles (ULB), Brussels, Belgium
*Corresponding author: Julien Trufin; Email: julien.trufin@ulb.be

Abstract

Tree-based methods are widely used in insurance pricing due to their simple and accurate splitting rules. However, there is no guarantee that the resulting premiums avoid indirect discrimination when features recorded in the database are correlated with the protected variable under consideration. This paper shows that splitting rules in regression trees and random forests can be adapted in order to avoid indirect discrimination related to a binary protected variable like gender. The new procedure is illustrated on motor third-party liability insurance claim data.

Information

Type
Original Research Paper
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2025. Published by Cambridge University Press on behalf of Institute and Faculty of Actuaries

Figure 1 Regression tree “stump” with only one node and two leaves.

Algorithm 1. DPTree(m) algorithm at node t

Table 1. Summary statistics of the numerical features in $\mathcal{D}^{\texttt {training}}$: power of the vehicle, weight of the vehicle, price of the vehicle, age of the vehicle, number of drivers mentioned in the contract, age of the main driver, and seniority of the contract

Table 2. Categorical features with associated levels and proportions according to gender on $\mathcal{D}^{\texttt {training}}$: use of the vehicle, ADAS (Advanced Driver Assistance Systems) equipped vehicle, and make of the vehicle

Table 3. Categorical features with associated levels and proportions according to gender on $\mathcal{D}^{\texttt {training}}$: parking in a garage, fuel of the vehicle, mileage limit specified in the policy, and premium payment

Table 4. Best random forests and associated OOS deviances (OOS dev) computed on $\mathcal{D}^{\texttt {valid}}$ for different relative margins $\varepsilon$ together with the OOS deviance of the null model and the normalized deviances (norm. dev). The null hypothesis $H_0: F_0=F_1$ supporting demographic parity is rejected when $J^* > 1.358$
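
The rejection rule quoted in this caption can be illustrated with a short sketch. It assumes that $J^*$ is the scaled two-sample Kolmogorov-Smirnov statistic $\sqrt{n_0 n_1/(n_0+n_1)}\,\sup_x |\widehat{F}_0(x)-\widehat{F}_1(x)|$, for which 1.358 is the usual asymptotic critical value at the 5% level; the caption itself does not define $J^*$, so this form, and the function names `scaled_ks_statistic` and `rejects_demographic_parity`, are assumptions made for illustration only.

```python
import numpy as np

# Threshold quoted in the caption; interpreted here (assumption) as the
# asymptotic 5% critical value of the Kolmogorov distribution.
KS_CRITICAL_5PCT = 1.358


def scaled_ks_statistic(pred_group0, pred_group1):
    """Scaled two-sample Kolmogorov-Smirnov statistic (assumed form of J*)."""
    x0 = np.sort(np.asarray(pred_group0, dtype=float))
    x1 = np.sort(np.asarray(pred_group1, dtype=float))
    n0, n1 = x0.size, x1.size
    # The supremum of |F0 - F1| is attained at one of the observed values.
    grid = np.concatenate([x0, x1])
    f0 = np.searchsorted(x0, grid, side="right") / n0  # empirical CDF, group D=0
    f1 = np.searchsorted(x1, grid, side="right") / n1  # empirical CDF, group D=1
    sup_distance = np.max(np.abs(f0 - f1))
    return float(np.sqrt(n0 * n1 / (n0 + n1)) * sup_distance)


def rejects_demographic_parity(pred_group0, pred_group1,
                               critical=KS_CRITICAL_5PCT):
    """Return True when H0: F0 = F1 is rejected at the assumed 5% level."""
    return bool(scaled_ks_statistic(pred_group0, pred_group1) > critical)
```

In use, `pred_group0` and `pred_group1` would hold the predicted premiums on the validation set for the policyholders with $D=0$ and $D=1$, respectively.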

Figure 2 In-sample deviance (left panel) and out-of-sample deviance on $\mathcal{D}^{\texttt {testing}}$ (right panel) for each best model listed in Table 4.

Figure 3 Pairs of predictions for unconstrained random forest ($m=1$) and constrained random forest with the largest tolerable relative margin $\varepsilon =5\%$ on $\mathcal{D}^{\texttt {valid}}$.

Figure 4 Feature importance for the unconstrained random forest (corresponding to $m=1$) in the bottom panel and for the constrained random forest with the largest tolerable relative margin $\varepsilon =5\%$ in the top panel.

Table 5. Best and constrained random forests and associated OOS deviances (OOS dev) on $\mathcal{D}^{\texttt {valid}}$ for the largest tolerable relative margin $\varepsilon =0.05$ together with the OOS deviance obtained with discrimination-free prices (DFP) and Wasserstein barycenters correction (WBC). The null hypothesis $H_0: F_0=F_1$ supporting demographic parity is rejected when $J^* > 1.358$

Figure 5 Pairs of predictions on $\mathcal{D}^{\texttt {valid}}$ for unconstrained random forest ($m=1$) and those obtained with Wasserstein barycenters corrections in the left panel. Corresponding pairs with discrimination-free prices in the right panel.

Figure 6 Empirical distribution functions $\widehat {F}_0$ and $\widehat {F}_1$ of $\widehat {\mu }(\boldsymbol{X})$ given $D=0$ and $D=1$, respectively, computed on $\mathcal{D}^{\texttt {valid}}$, for the unconstrained random forest ($m=1$, left), the constrained random forest ($\varepsilon =0.05$, middle left), the Wasserstein barycenters corrections (middle right), and the discrimination-free price (right).