Bonus-Malus Scale Premiums for Tweedie's Compound Poisson Models

Based on the recent paper by Delong et al. (2021), two distributions for the total claims amount (loss cost) are considered: the Compound Poisson-gamma (CPG) and the Tweedie. Each is used as an underlying distribution in the Bonus-Malus Scale (BMS) model described in the paper by Boucher (2023). The BMS model links the premium of an insurance contract to a function of the insurance experience of the related policy. In other words, the idea is to model the increase and the decrease in premiums for insureds who do or do not file claims. Our proposed models can therefore be seen as a generalization of the paper of Delong et al. (2021) and an extension of the work of Boucher (2023). We applied our approach to a sample of data from a major insurance company in Canada. Data fit and predictability were analyzed. We showed that the studied models are interesting alternatives to consider from a practical point of view, and that predictive ratemaking models can address some important practical considerations.


Introduction
Experience rating and predictive ratemaking refer to ratemaking models that use claims information from past insurance contracts to predict the future total amount of claims (also known as the "loss cost"). From a ratemaking point of view, the idea of experience rating is to compute a premium for insured i, for a contract of period T, that considers all of the insured's past insurance contracts 1, ..., T - 1.
Historically, research on this type of predictive ratemaking has focused on modeling the annual claims number of a contract. For an insured i, one can use panel data theory to model the joint distribution of the annual claims numbers of the T contracts. This is expressed as a product of predictive distributions, $\prod_{t=1}^{T} \Pr(N_{i,t} \mid n_{i,(1:t-1)})$, where $n_{i,(1:t-1)} = \{n_{i,1}, n_{i,2}, \ldots, n_{i,t-1}\}$ is the vector of annual past claims numbers observed at the beginning of each contract t.
Considering each predictive distribution $N_{i,t} \mid n_{i,(1:t-1)}$, we can calculate the frequency component premium of contract t (t = 1, ..., T), denoted $\pi^{(F)}_{i,t}$, using the following equation: $\pi^{(F)}_{i,t} = E[N_{i,t} \mid n_{i,(1:t-1)}]$. Therefore, the premium of contract t of policy i can be interpreted as a function of $n_{i,(1:t-1)}$, which materializes the dependency between the T contracts of insured i.
Usually, the classic actuarial approach to introducing dependency between the T contracts of an insured i is to introduce a random term common to all of the insured's T contracts. See Turcotte and Boucher (2023) or Pechon et al. (2019) for a review of this approach. Several other approaches have also been tried in the literature, such as that of Bermúdez et al. (2018) for models based on time series, or Shi and Valdez (2014) for models using copulas via the 'jittering' method.
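As a concrete illustration of this random-effects approach (a sketch of ours, not taken from the paper), a Poisson model with a gamma-distributed random effect yields a closed-form predictive premium; the prior parameters a and b below are hypothetical:

```python
def predictive_frequency_premium(past_counts, a=1.5, b=10.0):
    """Predictive mean E[N_t | n_1, ..., n_{t-1}] under a Poisson model with a
    gamma(a, b) random effect: the posterior is gamma(a + sum(n), b + t - 1),
    so the predictive mean is their ratio. a and b are illustrative values."""
    return (a + sum(past_counts)) / (b + len(past_counts))

# A new insured pays the prior mean a/b; claim-free years lower the premium.
base = predictive_frequency_premium([])
after_three_claim_free = predictive_frequency_premium([0, 0, 0])
```

Each claim-free year increases the denominator, so the premium decreases, while each claim increases the numerator; this is the dependency between contracts that the Kappa-N and BMS models instead capture with an explicit claims-history function.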
More recently, instead of using the random effects approach, several papers have highlighted the advantages of two families of models that take advantage of the fact that the average claims frequency in property and casualty insurance is often between 0 and 1. Called the Kappa-N model and the Bonus-Malus Scale (BMS) model, these families propose to use a claims history function directly in the mean parameter of a counting distribution to model the decrease in premiums for insureds who do not file claims and the increase for insureds who do. Research on modelling the total annual claims number across several different databases and insurance products has shown that these models can generate an excellent fit on training data and excellent predictions on test data, often outperforming Bayesian random-effects models (Boucher, 2023; Boucher and Inoussa, 2014).
In this paper, we generalize this BMS approach by working with data with a slightly more complex structure, close to what is used in practice, at least in North America. In Canada, some families have multiple insured vehicles. From an experience-rating standpoint, the premium of one specific vehicle insured by a family could be calculated using the number of past claims of all the others in this family, not just the number of past claims of that particular vehicle. Indeed, within a family, the different drivers all use one or the other of the vehicles, so it makes sense to use the experience of all vehicles for rating. If we take as an example a family with two insured vehicles, we end up with a total premium defined as:

$E[N_{i,1,t} + N_{i,2,t} \mid n_{i,1,(1:t-1)}, n_{i,2,(1:t-1)}] = E[N_{i,1,t} \mid n_{i,1,(1:t-1)}, n_{i,2,(1:t-1)}] + E[N_{i,2,t} \mid n_{i,1,(1:t-1)}, n_{i,2,(1:t-1)}]$
$\neq E[N_{i,1,t} \mid n_{i,1,(1:t-1)}] + E[N_{i,2,t} \mid n_{i,2,(1:t-1)}]$,

where $n_{i,j,(1:t-1)} = \{n_{i,j,1}, n_{i,j,2}, \ldots, n_{i,j,t-1}\}$ is the vector of annual past claims numbers observed at the beginning of contract t of vehicle j of this family i.
Instead of counting the number of claims per vehicle in a family, we can count the number of claims per coverage/warranty (see Boucher and Inoussa, 2014, for an illustration). We could also analyze the number of claims per insurance product, for example counting the number of home insurance claims to model the number of auto insurance claims. One can also consult Verschuren (2021) for this type of application of the BMS model.
In this paper, we also propose to generalize the type of random variables modeled. Instead of using only the annual claims number of each contract t of an insured i, we propose to develop a structure to model the cost of each claim k, $Z_{i,j,k,t}$, and the annual claims amount $Y_{i,j,t}$. First, we deal with the joint distribution of the annual claims number and the costs of each of these claims for the T contracts of insured i. Second, we deal with the joint distribution of the annual claims number and the annual claims amount for these same T contracts of insured i. The joint modelling of the annual claims number and the costs of each of these claims is called frequency-severity modelling. See Jeong and Valdez (2020), Oh et al. (2020), Lee and Shi (2019), and Shi and Yang (2018) for a literature review of this model. Even if, in the loss cost model, the target variable is the annual claims amount of a contract, researchers recommend that this target variable and the annual claims number be modeled jointly (Delong et al., 2021; Frees et al., 2014).

Terminologies and Definitions
Similar to what has been done for annual claims number modelling, one could model the conditional distribution of the annual claims amount $Y_{i,j,t}$ according to the t - 1 past annual claims amounts. However, in keeping with the idea of the Kappa-N and BMS models, which are built conditionally on the number of past claims, our approach is based on the analysis of the distribution of the annual claims amount, conditional on the number of past claims. In such a case, for an insured i, the premium of contract t of vehicle j is calculated as:

$\pi_{i,j,t} = E[Y_{i,j,t} \mid n_{i,1,(1:t-1)}, \ldots, n_{i,J,(1:t-1)}]$,

where J is the total number of insured vehicles in the past. As shown in Sections 2 and 3, the severity could also be modeled using this approach.
More generally, to cover all of these possibilities, Boucher (2023) defined two types of variables to be used in experience rating: 1. the variable to model, named the target variable; 2. the information used to define what we consider the past claims experience, named the scope variable.
Using these definitions means that for this paper, three target variables will be modeled: the annual claims number (the claims frequency), the claims costs (the claims severity) and the annual claims amount (also called the loss cost).In contrast, all three target variables will be modeled based only on one type of scope variable: the number of past claims.

Summary
In Section 2 of the paper, we present some contextual elements and hypotheses to better introduce the models we propose. The models' notation is revised so that predictive ratemaking approaches can be used for the two new target variables, severity and loss cost. In the recent paper by Delong et al. (2021), the Compound Poisson-gamma (CPG) and Tweedie distributions were studied for their practical advantages in loss cost modelling. Our paper has the same objective as that of Delong et al. (2021), but we also consider the past insurance experience of each insured vehicle in calculating the premium of its future contracts. Details on how to use these two distributions in our proposed models are given in Section 3. In Section 4, we apply our proposed models to an auto insurance database of a major Canadian insurer. A variable selection step based on Elastic-net regularization is also introduced to measure the impact of adding an experience-rating component. Section 5 concludes the paper.
2 Data Structure and Hypotheses

Definitions and Form of Available Data
We assume a hierarchical data structure in which the claims experience of the policies, the vehicles and the insurance contracts associated with each vehicle is observed. To ensure a common vocabulary, we have retained some of the terms used in the introduction but have clarified their definitions further:
• A policy is usually associated with a single insured. In insurers' databases, a policy is usually identified by a unique number.
• An insured vehicle, or simply a vehicle, is associated with a policy. For an in-force policy, the minimum number of insured vehicles is one, but many insureds (particularly in North America) have several vehicles.
• A policy is often made up of several insurance contracts. An insurance contract is also often referred to as an insurance term, and it is usually one year long. Insurance contracts are sometimes shorter, for example three or six months. Some insurance policies contain only one term, but a significant portion of policies contain multiple contracts.
For a given policy i, i = 1, ..., m, we assume that the claims of vehicle j, j = 1, ..., $J_i$, are observable through $T_{i,j}$ contracts. We denote by t the index associated with the T contracts (i and j are removed to simplify reading). Our variables of interest are therefore the claims experience of the T contracts of each vehicle in a policy.
To better capture the form of the available data, Table 1 provides an illustration of a sample of three insurance policies. It shows that policy #1 contained only one insured vehicle for the 2018 and 2021 contracts, but two vehicles were insured in 2019 and 2020. Thus, the claims experience of the first vehicle in policy #1 was observed during four annual contracts, while the claims experience of the second vehicle was observed during only two annual contracts. Policy #2 in this sample contains only one insured vehicle, and that vehicle was insured for only one annual contract. Finally, two vehicles insured on a single annual contract were observed in the third and final policy in this table.
It should be noted that the contract number is obtained according to its associated policy and its effective date. That is, in a given policy, all contracts with the same effective year have the same contract number. The first contract for vehicle #2 of policy #1 illustrates this situation. Such a notation is important for the rest of the paper.
As can be seen in the sample in the same table, the characteristics of the insured and of the insured vehicle are also available. Finally, the claims frequency and the cost of each claim are also available.

Target Variables
For vehicle j of an insured i, the random variable $N_{i,j,t}$ represents the annual claims number of its contract t. If the observed annual claims number $n_{i,j,t} = n > 0$, the random vector $Z_{i,j,t} = (Z_{i,j,t,1}, \ldots, Z_{i,j,t,n})'$ represents the vector of the costs of the insured's n claims. It is not defined if the associated observed annual claims number $n_{i,j,t} = n = 0$. Thus, the loss cost, denoted by the random variable $Y_{i,j,t}$, is calculated as follows:

$Y_{i,j,t} = \sum_{k=1}^{N_{i,j,t}} Z_{i,j,t,k}$ if $N_{i,j,t} > 0$, and $Y_{i,j,t} = 0$ otherwise.

We define our three target variables by the following three random variables: $N_{i,j,t}$, $Z_{i,j,t}$ and $Y_{i,j,t}$.

Premiums
For the m insureds in the portfolio, assuming a minimization of the squared distance for the calculation of the premium of contract t of each vehicle j in policy i, the parameter of interest corresponds to $E[Y_{i,j,t}]$, for all i = 1, ..., m, j = 1, ..., $J_i$, t = 1, ..., $T_{i,j}$. This parameter can be calculated in two ways: (1) by multiplying the frequency component premium and the severity component premium, under some assumptions; (2) by considering the conditional distribution of the loss cost, denoted $f_{Y_{i,j,t}}(\cdot)$. Formally, these two ways are expressed as:

$E[Y_{i,j,t}] = E[N_{i,j,t}] \times E[Z_{i,j,t,k}]$ (1), and $E[Y_{i,j,t}] = \int y\, f_{Y_{i,j,t}}(y)\, dy$ (2).
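The equality of the two computations can be checked numerically for a compound Poisson-gamma loss cost; this is a simulation sketch of ours, and the parameter values are hypothetical, only loosely inspired by typical portfolio statistics:

```python
import numpy as np

rng = np.random.default_rng(42)

def simulate_loss_cost(lam, sev_shape, sev_scale, n_sim=200_000):
    """Simulate annual loss costs Y = sum of N iid gamma severities,
    with N ~ Poisson(lam); illustrative parameters, not fitted values."""
    n = rng.poisson(lam, size=n_sim)
    # Sum of n iid gamma(shape, scale) claims is gamma(n * shape, scale);
    # the (n == 0) shift avoids a zero shape, and the mask sets Y = 0 there.
    y = rng.gamma(sev_shape * n + (n == 0), sev_scale) * (n > 0)
    return y

lam, shape, scale = 0.05, 2.0, 3750.0   # hypothetical frequency/severity
y = simulate_loss_cost(lam, shape, scale)
analytic = lam * shape * scale          # E[Y] = E[N] * E[Z]
```

The simulated mean of Y agrees with the product of the frequency and severity means, which is exactly assumption-driven equality (1).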
Computing the premium as the product of the frequency and severity components requires the following assumptions: 1. the independence between the annual claims number and the costs of each claim; 2. the independence of the costs of each claim across distinct policies; 3. for the same contract t, the costs of each claim are identically distributed.
In order to include some form of segmentation in the rating (Frees et al., 2014), it should be noted that the premium is generally calculated considering specific observable characteristics of each contract, such as those illustrated in Table 1.

Experience Rating with Compound Poisson-gamma (CPG) and Tweedie Models
For experience rating, the Kappa-N and BMS models are generally proposed (Boucher, 2023); they model the conditional distribution of a target variable according to the scope variables. In this section, the CPG and Tweedie distributions are used as the underlying distribution in each model. Before presenting the Kappa-N and BMS models in our context, we start with an example to better explain how the scope variables are calculated in practice.

Scope variables
It is well known in actuarial science that insureds who make claims tend to have a higher claims frequency in their future contracts. This can be explained in several ways: some insureds behave more riskily than others, some insureds live in areas that are more prone to disasters, and some insured property is more likely to be damaged. Individual characteristics used as segmentation variables may partly explain this situation. However, many of these characteristics cannot be measured and modeled directly in rating. Thus, past claims experience can be used to approximate the effect of these non-measurable characteristics on premiums. This is why, in addition to conditioning on characteristics $X_{i,j,t}$, we price an insured according to their claims history, defined as a scope variable in the introduction.
To illustrate the situation adequately, we use Table 1 as an example, which we generalize to Table 2. For each vehicle in Table 1, we can calculate the number of claims from past contracts, i.e., $n_{i,j,(1:t-1)} = \{n_{i,j,1}, n_{i,j,2}, \ldots, n_{i,j,t-1}\}$. This is shown in columns 5, 6 and 7 of Table 2. However, our frequency scope variable will not only be composed of the number of past claims of the same vehicle, but also the number of past claims of the entire policy. Thus, in the last three columns of Table 2, the sum of the past claims of all vehicles of the same policy is shown for the previous contracts, namely:

$n_{i,\bullet,(1:t-1)} = \sum_{j=1}^{J_i} n_{i,j,(1:t-1)}$.

For a loss cost model, we are looking for the joint conditional distribution of $(N_{i,j,t}, Y_{i,j,t})$ according to $n_{i,\bullet,(1:t-1)}$ and $X_{i,j,t}$. For a frequency-severity model, we are looking for the joint conditional distribution of $(N_{i,j,t}, Z_{i,j,t})$ according to $n_{i,\bullet,(1:t-1)}$ and $X_{i,j,t}$.
Using a logarithmic link between the covariates and the mean parameter, as in GLM models (Delong et al., 2021; De Jong et al., 2008), the expectations of our three variables of interest are expressed with a claims history function added to the linear predictor. It should be noted that several possibilities exist to define these historical claims functions. Boucher (2023) listed some of them, and the problems that they could create. Taking advantage of the fact that the average automobile insurance claims frequency is between 0 and 1, and that insureds expect a discount when they do not claim and a surcharge when they do, Boucher proposed instead to define a new indicator variable $\kappa_{i,j,t} = I(n_{i,j,t} = 0)$ that identifies claims-free contracts. We thus have two new variables summarizing the claims experience, $\kappa_{i,\bullet,(1:t-1)}$ and $n_{i,\bullet,(1:t-1)}$, and, for the frequency for example, a mean of the form

$E[N_{i,j,t}] = \exp\!\big(X_{i,j,t}\beta^{(N)} - \gamma_0^{(N)} \kappa_{i,\bullet,(1:t-1)} + \gamma_1^{(N)} n_{i,\bullet,(1:t-1)}\big)$,

with $\gamma_0^{(\cdot)}, \gamma_1^{(\cdot)} \in \mathbb{R}$. The negative sign in front of the positive parameter $\gamma_0^{(\cdot)}$ highlights that an additional year without a claim will decrease the premium. This simple way of summarizing the claims history in the mean parameter of a random variable is called Kappa-N modelling. The idea is to consider $\kappa_{i,\bullet,\bullet}$ and $n_{i,\bullet,\bullet}$ as covariates in premium modelling.

Claims Score
For each policy i and each vehicle's contract t, we define a positive quantity $\ell^{(\cdot)}_{i,t}$, called the claims score, based on a function $h^{(\cdot)}(\cdot)$ and an initial score $\ell_1$. This initial score is interpreted as the maximum number of years for which a contract can remain without a claim from its effective date to its end date. Boucher (2023) sets $\ell_1 = 100$ for a simple aesthetic reason. The claims score takes the form

$\ell^{(\cdot)}_{i,t} = \ell_1 - \kappa_{i,\bullet,(1:t-1)} + \Psi^{(\cdot)}\, n_{i,\bullet,(1:t-1)}$,

where $\Psi^{(\cdot)}$ is the jump parameter and $\gamma_0$ is the relativity parameter of the penalties related to past claims.
For a Kappa-N model using the claims score, the expectations of our three variables of interest can be expressed with the score in the linear predictor, e.g., for the frequency, $E[N_{i,j,t}] = \exp\!\big(X_{i,j,t}\beta^{(N)} + \gamma_0^{(N)} \ell^{(N)}_{i,t}\big)$. From these equations, one can quickly assess the impact of the claims score on the insurance premium. This has several desirable qualities in terms of the contract's rating structure:
• For an insured i without insurance experience, $n_{i,\bullet,\bullet} = 0$ and $\kappa_{i,\bullet,\bullet} = 0$, which means a claims score of $\ell_t = 100 = \ell_1$;
• Each annual contract without a claim decreases the claims score ℓ by 1;
• Each claim increases the claims score ℓ by Ψ;
• The impact of a single claim on the premium is then roughly equal to Ψ years without claims;
• The penalty for a claim is an increase of $(\exp(\Psi\gamma_0) - 1)\%$ of the premium;
• Each year without a claim decreases the premium by $(1 - \exp(-\gamma_0))\%$.

Kappa-N For Independent Poisson Annual Claims Numbers
Considering the risk exposure of contract t, denoted $d_{i,j,t}$, we assume that its annual claims number follows

$N_{i,j,t} \mid \ell^{(N)}_{i,t} \sim \text{Poisson}\!\big(d_{i,j,t}\, \exp(X_{i,j,t}\beta^{(N)} + \gamma_0^{(N)} \ell^{(N)}_{i,t})\big)$.

For inference purposes, we assume independence between the contracts' annual claims numbers of distinct policies. Further, given the use of the claims score $\ell^{(N)}$, a form of dependency exists between contracts of the same vehicle and between contracts of vehicles of the same policy. The likelihood contribution for the claims frequency of a single policy i is the product, over its vehicles and contracts, of the corresponding Poisson probabilities (i is removed for easy reading). Finally, the idea is to estimate the parameters $\beta^{(N)}$ and $\gamma^{(N)}$ by maximizing the likelihood function built by multiplying the contribution (3.2.9) over the m policies. This optimization can also be done using the glm function in R.
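The frequency likelihood can be sketched as a Poisson negative log-likelihood in the claims score; the design matrix and parameter names below are illustrative (in practice, as noted above, the glm function in R handles this directly):

```python
import numpy as np
from math import lgamma

def poisson_nll(counts, exposures, scores, x, beta, gamma0):
    """Negative log-likelihood of a Kappa-N/BMS Poisson frequency model with
    mean d * exp(x'beta + gamma0 * ell); a sketch, not the paper's code."""
    mu = exposures * np.exp(x @ beta + gamma0 * scores)
    log_fact = np.array([lgamma(n + 1.0) for n in counts])
    return float(-(counts * np.log(mu) - mu - log_fact).sum())

counts = np.array([0.0, 1.0])
exposures = np.ones(2)
scores = np.zeros(2)
x = np.ones((2, 1))
nll = poisson_nll(counts, exposures, scores, x, np.array([0.0]), 0.0)  # mu = 1
```

Minimizing this function over beta and gamma0 (for fixed jump and bound parameters) reproduces the profile-likelihood strategy described later for the BMS model.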

Kappa-N for Independent Gamma Claims Costs
For a contract t, we assume that the common distribution of the costs of each of its $n_{i,j,t} = n$ claims, $Z_{i,j,t,k}$, is gamma, with a mean that depends on the score $\ell^{(Z)}$ through a logarithmic link. It is important to note that the cost of a claim, $Z_{i,j,t,k}$, k = 1, ..., n, depends on the score $\ell^{(Z)}$, which is set at the beginning of period t. Thus, the first, second, and third claims of contract t all depend on the same score $\ell^{(Z)}$. This score, as expressed in equation (3.2.11), is only updated at the end of contract t for the rating of contract t + 1.
Similar to the frequency part, we assume that the claims severities of contracts of distinct policies are independent. Further, we take into account the dependency between the severities of contracts of the same vehicle and between those of vehicles of the same policy. We therefore evaluate the likelihood contribution for the severity of a policy i as the product of the gamma densities of its observed claim costs (i is removed for easy reading), where $z_{j,t,k}$ is the observed cost of claim k of contract t. The inference then consists in maximizing the likelihood function built from the likelihood contributions (3.2.12) of the m policies. Similar to the Poisson model, the glm function in R can also be used to estimate the parameters $\beta^{(Z)}$ and $\gamma_1$.

Remarks
In the Compound Poisson-gamma (CPG) model, for each contract, it is assumed that the individual's annual claims number is Poisson distributed and that the costs of each of the insured's claims are gamma distributed. This model also assumes independence between the frequency and the severity of claims of each contract. Therefore, to calculate the annual premium of a contract, one can model its frequency and severity components separately (Delong et al., 2021).
For the severity modelling, we consider the costs of each claim, unlike Delong et al. (2021), who considered the average claims amount of each contract as the target variable. Note that these two approaches lead to the same inference results (Delong et al., 2021).

Kappa-N for Independent Tweedie Annual Claims Amount
Instead of using the CPG approach to model the loss cost of a contract, an alternative is to use the distribution of its annual claims amount directly and to calculate the expectation of this distribution to obtain the annual premium. This is the idea of the Tweedie model.
Consistent with Delong et al. (2021), we consider, for each contract t, the pair of random variables $(N_{i,j,t}, Y_{i,j,t})$ representing the annual claims number and the annual claims amount: $Y_{i,j,t}$ is Tweedie distributed and $N_{i,j,t}$ is Poisson distributed.
For inference purposes, we are interested in the conditional distribution of $(N_{i,j,t}, Y_{i,j,t})$ according to $\ell^{(Y)}_{i,t}$ and $X_{i,j,t}$. We also assume logarithmic links for the mean $\mu_{i,j,t}$ and the dispersion $\phi_{i,j,t}$ parameters of the Tweedie distribution:

$\mu_{i,j,t} = \exp\!\big(X_{i,j,t}\beta^{(Y)} + \gamma_0^{(Y)} \ell^{(Y)}_{i,t}\big), \qquad \phi_{i,j,t} = \exp\!\big(X_{i,j,t}\beta^{(D)} + \gamma\, \ell^{(Y)}_{i,t}\big), \qquad (3.2.13)$

where $\beta^{(D)}$ is a real vector of the same dimension as $\beta^{(Y)}$ and $\gamma$ is a real parameter.
According to Delong et al. (2021), the probability density function of $(N_{i,j,t}, Y_{i,j,t})$ can be written explicitly (i is removed for easy reading); in particular, the probability of a claim-free contract is

$f(0, 0) = \exp\!\Big(-\frac{w_{j,t}\, \mu_{j,t}^{2-p}}{(2-p)\,\phi_{j,t}}\Big)$ for $n = 0$,

where $w_{j,t}$ and p represent respectively the weight and the Tweedie variance parameter; p is a positive real with $1 < p < 2$. By assuming independence between the loss costs of contracts of distinct policies, the contribution of policy i to the likelihood is the product of these densities over its vehicles and contracts (i is removed for easy reading). To estimate all the parameters, one can use maximum likelihood considering (3.2.16). The idea is to define, for each policy i, each vehicle j and each contract t, a response variable $D_{i,j,t}$ for the dispersion parameter $\phi_{i,j,t}$, where $\nu_{i,j,t} = \frac{2\, w_{i,j,t}\, \mu_{i,j,t}^{2-p}}{\phi_{i,j,t}\,(p-1)(2-p)}$. The main motivation for this response variable is that we obtain $E[D_{i,j,t}] = \phi_{i,j,t}$. Therefore, the optimization of the likelihood function can be seen as two connected GLMs (Generalized Linear Models): 1) a GLM for the mean parameter; and 2) a GLM for the dispersion parameter. This approach is called the Double-GLM (DGLM) (Delong et al., 2021).
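One consequence of the CPG-Tweedie correspondence worth making explicit is the probability of a claim-free contract: the implied Poisson rate is w μ^(2-p) / (φ (2-p)). A small sketch, with hypothetical parameter values:

```python
import math

def tweedie_zero_prob(mu, phi, p, w=1.0):
    """P(N = 0) under the compound Poisson-gamma representation of the
    Tweedie distribution; requires the Tweedie power p in (1, 2)."""
    assert 1.0 < p < 2.0, "compound Poisson-gamma region"
    lam = w * mu ** (2.0 - p) / (phi * (2.0 - p))
    return math.exp(-lam)
```

Because the claims score enters both μ and φ, the mass at zero, and hence the Tweedie fit for claim-free contracts, reacts to the past experience of the policy.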
This BMS level depends on the parameters Ψ, $\ell_{min}$ and $\ell_{max}$, which are all unknown in the model. For the inference, one can use profile maximization of the likelihood function over these three parameters: the idea is to estimate the other parameters of the model for each possible combination of Ψ, $\ell_{min}$ and $\ell_{max}$. For the jump parameter Ψ, natural numbers are used.

Data

We consider a non-random sample of an automobile insurance database of a major insurer in Canada over a total period of 13 consecutive years. The data concern the Canadian province of Ontario and contain more than 2 million observations. Each observation corresponds to an annual contract for one vehicle. The form of the database is similar to Table 1, introduced at the beginning of our paper. For each observation in the database, we have a policy number, a vehicle number, as well as the effective and end dates of the vehicle's contract. Several characteristics of the insured or the insured vehicle are also available. Finally, for each contract of each vehicle, the number of claims and the cost of each claim are available.
The database is also divided by coverage type, which provides information on third-party liability, collision and comprehensive claims. To illustrate the approach described in this paper, we focused on a single coverage. Thus, in connection with the terms defined in the introduction, we illustrate our experience-rating model with:
• A target variable based on collision coverage, representing the property damage protection of auto insurance for accidents for which the driver is at fault;
• A scope variable also based on collision coverage.
As mentioned earlier, however, the proposed pricing approach is very flexible, and any combination of target and scope variables would be possible. For example, the sum of liability and collision claims could be analyzed by conditioning on the past claims experience of the comprehensive coverage.
For confidentiality reasons, the full description of the data will not be detailed. That being said, we can provide some summary information on the studied sample:
• The observed annual claims frequency is approximately 2%;
• The average severity of a claim is around $7,500;
• The average annual loss cost is about $160 for all available years;
• The average number of vehicles insured per contract is around 1.70;
• On average, a vehicle is insured for 2.75 contracts.
We also split the data into a training set and a test set. To maintain the dependency between the contracts and the vehicles of the same policy, the training and test sets were formed by policy number selection. For example, if a policy is in the training set, all vehicles and contracts of that policy are in the same training set. Thus, 75% of the policies were assigned to the training set and 25% to the test set, corresponding respectively to 75% and 25% of all observations. We made these splits while ensuring that the two sets have the same claims frequency and the same average claims severity.
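The policy-level split described above can be sketched as a grouped holdout; the 75/25 proportion comes from the text, while the function name, seed and toy ids are ours:

```python
import numpy as np

def split_by_policy(policy_ids, train_frac=0.75, seed=0):
    """Grouped holdout: whole policies are assigned to the training set so
    that contracts of one policy never straddle the train/test split."""
    rng = np.random.default_rng(seed)
    unique_ids = np.unique(np.asarray(policy_ids))
    rng.shuffle(unique_ids)
    n_train = int(round(train_frac * unique_ids.size))
    train_ids = set(unique_ids[:n_train].tolist())
    return np.array([pid in train_ids for pid in np.asarray(policy_ids)])

policy_ids = [1, 1, 2, 2, 3, 3, 4, 4]    # two contracts per toy policy
train_mask = split_by_policy(policy_ids)  # 3 of 4 policies -> training
```

Splitting at the policy level rather than the contract level is what preserves the within-policy dependency that the BMS score exploits.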

Available Covariates
We have several characteristics for each vehicle and each contract. To illustrate the impact of segmentation on rating, we select eight of these characteristics as covariates. For confidentiality reasons, but also because this is not the focus of the paper, these covariates are simply labelled X1-X8 and their meaning is not explained. A summary of the proportions of each modality of these variables is given in Figure 1. To be consistent with the rating approach usually used in practice, which is also often used in the actuarial scientific literature, we chose classic risk characteristics related, for example, to the sex and age of the insured, the use of the vehicle or the type of vehicle driven. We did not seek to artificially inflate residual heterogeneity by withholding risk characteristics that would normally be used in pricing. Thus, we consider that the pricing model developed with the chosen covariates is representative of standard pricing models. In addition to the selected covariates, indicator variables for each of the calendar years of the contracts were included in the modelling.

Impact of Past Insurance Experience
Although we have 13 years of experience, we use a portion of the database to create a claims history for all insureds. It should be noted that many insureds present in the database during the first year already have a claims history with the insurer. However, this claims history is not available owing to the structure of the data. Therefore, the first six years of the database are used exclusively to obtain the claims history of each insured, and only the following seven years are used for modelling purposes. For consistency purposes, a fixed window of six years in the past is always used to calculate the past claims statistics, $n_{i,\bullet,\bullet}$ and $\kappa_{i,\bullet,\bullet}$, for each insured and each contract. In other words, the BMS level of a contract t depends only on the claims experience of contracts t-1, ..., t-6, and not on the claims experience of contracts t-7, t-8, .... The impact of this choice of window on the models used is minimal, but it does mean that the Markovian property for a single contract, defined in Equation (3.4.1), no longer holds. See Boucher (2023) for a study of the window of experience to be used in predictive ratemaking.
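The fixed six-year window can be sketched as a small helper computing the two scope statistics of a contract; the function and variable names are ours, not the paper's:

```python
def past_claim_stats(yearly_counts, t, window=6):
    """Scope variables of contract t (1-indexed): n = total claims over
    contracts t-1, ..., t-window; kappa = claim-free contracts in that window."""
    past = yearly_counts[max(0, t - 1 - window):t - 1]
    n_past = sum(past)
    kappa_past = sum(1 for c in past if c == 0)
    return n_past, kappa_past
```

With this rolling window, a claim made seven or more contracts ago no longer affects the scope variables, which is exactly why the single-contract Markovian property is lost.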
A classic quote from Lemaire (1995) is that if only one segmentation variable were to be used for rating, it should be based on claims experience. For our preliminary analysis of the impact of claims history on premiums, we create six groups of contracts according to their past experience. The first three groups of contracts are based on the number of past contracts ([0,1], [2,3] or [4,5]) and contain only those insureds who could be qualified as inexperienced. We can also call them the new insureds or new policies. The last three groups include only insureds who have been observed for six years or more. The difference between these three groups is based on claims history: the insureds in groups E and F have filed claims at least once in the past, while group D insureds have not filed any claims in the last six years. Table 4 summarizes the groups of contracts and indicates the frequency, severity and loss cost ratios. These ratios are obtained by dividing the average claims frequency, average claims cost and average loss cost of each group by the corresponding portfolio averages.
For each of the seven years studied, Figure 2 shows the frequency and severity ratios of each group. For a given calendar year, or for all years combined, the Frequency Ratio is defined as the ratio of the observed frequency of a group to the observed frequency of the portfolio. This value indicates how much better or worse a group of policyholders is than the portfolio average. The Severity Ratio and the Loss Cost Ratio are defined in the same way. Although the impact of covariates may need to be considered in order to better understand the statistics shown in Table 4 and Figure 2, it is still relevant to comment directly on each group.
Type A: We observe that the new policyholders of Group A have a much worse claims experience than the other groups, in terms of both frequency and severity. With a claims frequency 30.3% higher than average, and an 11.0% higher severity, the total burden of Group A policyholders is approximately 45% higher than the portfolio average.

Type B: Group B insureds, with only one or two years of experience more than Group A insureds, seem to differ from the latter. Indeed, the curves illustrated in Figure 2, representing their loss experience in frequency and severity compared to the portfolio average, are close to one. The insureds of Group B have a higher claims frequency and claims severity than the insureds of Group C, who have one or two years more experience than Group B.
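As a quick arithmetic check (under the simplifying assumption that the loss cost ratio factors into the frequency and severity ratios), the Group A figures quoted above are mutually consistent:

```python
freq_ratio = 1.303                         # claims frequency 30.3% above the portfolio
sev_ratio = 1.110                          # severity 11.0% above the portfolio
loss_cost_ratio = freq_ratio * sev_ratio   # roughly 1.45, i.e. ~45% above average
```

The factorization is exact only when frequency and severity are independent, which is precisely the CPG assumption discussed earlier.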
Type C: Group C policyholders have four or five years of past experience.They may or may not have had claims during those years.However, when we look at Figure 2, their average claims frequency and severity are better than the averages of the portfolio.We can see, based on the number of years of experience in the company, that a minimum of about four years is necessary to have insureds with claims experience similar to the average of the portfolio.
Type D: Insureds in this group have insurance experience but have not filed any claims in the last six years. What can be quickly noticed from the figure and the table is that experienced insureds who have not claimed in the last six years (Type D) have a lower claims frequency than other insureds. Surprisingly, this same group of insureds also has a better severity than the others.
Type E: Group E policyholders are those who have insurance experience but have filed exactly one claim in the last six years. These insureds have a claims frequency comparable to that of the new insureds with two or three years of insurance experience. In contrast, their average claims costs are generally lower than those of the new insureds and than the average claims cost of the portfolio.
Type F: Finally, Group F insureds also have insurance experience but have made at least two claims in the last six years. These insureds produce the most striking claims statistics. In fact, they have a 57% higher claims frequency than the portfolio average, and they claim more often than the new policies of Group A. However, unlike the insureds of Group A, their average claims cost is lower than the average claims cost of the portfolio.
Through these analyses, we show how important it is to distinguish the contracts of new insureds (especially those in Group A) from those of Group D, because the insureds of these two groups have the same value of n_•. The use of a covariate κ_• counting the number of past contracts without claims is then justified.
Finally, to better understand how past insurance experience impacts each target variable, it is necessary to model their distributions. Other covariates can also be used to obtain more flexible rating models.

Covariate selection
The data were used to adjust three types of models for frequency, severity, and loss cost:
1. A model that will be called standard, which has no component related to past claims experience;
2. The Kappa-N model (Section 3.2), using the covariates n_• and κ_•;
3. The Bonus-Malus Scale model (Section 3.4), limiting the claims score to ℓ_min and ℓ_max.
For each of the models, we considered the same vector of characteristics, except that the Kappa-N and BMS models also use the covariates κ_{i,•,•} and n_{i,•,•}. However, the risk characteristics do not all have the same impact on our three variables of interest. For example, the frequency of claims is greatly impacted by the characteristics of the insured, such as age and gender, while the severity of collision coverage is usually more influenced by the characteristics of the vehicle, mainly its value. Therefore, a statistical procedure for selecting covariates seems necessary.
We adopted Elastic-net regularization to select the covariates. This method can be seen as a combination of the Lasso and Ridge regressions; see Hastie et al. (2009) and Hastie et al. (2015) for more details about this approach. One of the advantages of this approach is that it handles the redundancy of variables and the multicollinearity of risk factors. The idea of the procedure is to impose constraints on the coefficients of the model. Excluding the intercept from the procedure, the penalty added to the log-likelihood to be maximized is expressed as: λ Σ_j [(1 − α)|β_j| + (α/2) β_j²]. This penalty constraint depends on the values chosen for the parameters α and λ. If α = 0, the Elastic-net is equivalent to a Lasso regression; in contrast, if α = 1, it is equivalent to a Ridge regression. For each studied model, the optimal values of λ and α were obtained by cross-validation, using the deviance as the selection criterion.
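As an illustration, the selection mechanism can be sketched with an elastic-net penalized Poisson regression fitted by proximal gradient descent. This is a minimal sketch, not the paper's implementation: the simulated data, the values of λ and α, and the optimizer settings are all illustrative, and the penalty follows the α convention above (α = 0 gives the Lasso, α = 1 the Ridge).

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated portfolio: 200 contracts, 5 covariates, only the first two matter.
n, p = 200, 5
X = rng.normal(size=(n, p))
beta_true = np.array([0.5, -0.4, 0.0, 0.0, 0.0])
y = rng.poisson(np.exp(0.1 + X @ beta_true))

def nll(b0, beta):
    """Average Poisson negative log-likelihood (dropping the log(y!) constant)."""
    eta = b0 + X @ beta
    return np.mean(np.exp(eta) - y * eta)

def penalty(beta, lam, alpha):
    # Elastic-net penalty with the convention used here:
    # alpha = 0 gives the Lasso, alpha = 1 gives the Ridge.
    return lam * np.sum((1 - alpha) * np.abs(beta) + 0.5 * alpha * beta**2)

def fit(lam, alpha, step=0.05, iters=3000):
    """Proximal gradient descent: the ridge part stays in the smooth term,
    the L1 part is handled by a soft-thresholding step."""
    b0, beta = 0.0, np.zeros(p)
    for _ in range(iters):
        mu = np.exp(b0 + X @ beta)
        b0 -= step * np.mean(mu - y)                  # intercept is unpenalized
        g = X.T @ (mu - y) / n + lam * alpha * beta   # smooth gradient
        z = beta - step * g
        thr = step * lam * (1 - alpha)                # soft-threshold level
        beta = np.sign(z) * np.maximum(np.abs(z) - thr, 0.0)
    return b0, beta

b0_hat, beta_hat = fit(lam=0.05, alpha=0.5)
print("non-zero coefficients:", np.flatnonzero(np.abs(beta_hat) > 1e-8))
```

In practice, this fit would be repeated over a grid of (α, λ) pairs and the pair minimizing the cross-validated deviance retained, as described above.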

Fitting Statistics and Prediction Scores
Table 5 shows the fit of the models on the training set and their prediction quality on the test set. The number of parameters used in each model (after applying the Elastic-net procedure), the log-likelihood, and the AIC (Akaike information criterion) and BIC (Bayesian information criterion) are indicated. To evaluate prediction quality on the test set, we avoided using least squares, because it is not always adequate for frequency, severity, or loss cost statistics. A logarithmic score SL, the negative log-likelihood of the test set evaluated at the parameters estimated on the training set, is used instead. More formally, if we denote by P̂ the estimated parameters of each model, the logarithmic prediction score is calculated as SL = − Σ_{i ∈ Test set} log f(y_i; P̂), where f(·) is the probability density function of the target variable. As with least squares, the aim is to obtain the smallest value of SL on the test set, in order to control the over-fitting resulting from estimating the parameters on the training set. Optimal use of this score implies estimating all parameters by maximizing the likelihood function, not only the parameters associated with the mean of each target variable.
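The computation of SL can be sketched in a few lines. This is a toy illustration with a simulated Poisson frequency, not the paper's data: the mean is estimated on a training set and the score is the negative log-likelihood on a held-out test set.

```python
import numpy as np
from scipy.stats import poisson

rng = np.random.default_rng(1)

# Toy split: simulated annual claim counts with a true mean of 0.12.
train = rng.poisson(0.12, size=5000)
test = rng.poisson(0.12, size=2000)

lam_hat = train.mean()  # maximum-likelihood estimate on the training set

# SL = minus the sum of the log-densities of the test observations,
# evaluated at the training-set estimates; smaller is better.
SL = -poisson.logpmf(test, lam_hat).sum()
print(f"lam_hat = {lam_hat:.4f}, SL = {SL:.1f}")
```

For severity or loss cost, the same computation applies with the gamma or Tweedie density in place of the Poisson mass function.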

CPG vs Tweedie
The first angle of analysis is to compare the fit and prediction quality of the CPG model with those of the Tweedie model. Some analyses of the insurance data showed that the gamma distribution was not rejected by a hypothesis test (the QQ-plots are available in Appendix B), but the Poisson distribution for the number of claims is not ideal. Given the convergence of the Poisson model estimators, however, the Poisson assumption for the annual claims number is retained, which allows us to continue the analysis with the Tweedie.
As mentioned by Delong et al. (2021), to use likelihood-based criteria to compare the two approaches (CPG and Tweedie), the data samples must be the same in each model. Using our remark from Section 3.3, we thus obtain the fit statistics and prediction scores of the two models given in Table 5. Delong et al. (2021) also argued that the fit quality of the Tweedie is always better than that of the CPG if the same covariates are used in both models, under additional assumptions that we have also adopted here. However, we do not have the same covariates in the two models, due to the Elastic-net procedure and the addition of the claims score or BMS level in the mean parameter modelling of each model. Considering the fit statistics and the prediction scores in Table 5, the Tweedie models are better than the CPG models.
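The link between the two specifications rests on the classical compound Poisson-gamma representation of the Tweedie distribution. The following sketch verifies the parameter correspondence by matching first and second moments; the parameter values are illustrative only, not estimates from the paper.

```python
import numpy as np

# Moment check of the CPG <-> Tweedie correspondence: if N ~ Poisson(lam) and
# the individual claims are Gamma(shape g, rate b), the aggregate loss S is
# Tweedie with power parameter 1 < p < 2, where
#   p   = (g + 2) / (g + 1)
#   mu  = lam * g / b
#   phi = lam**(1 - p) * (g / b)**(2 - p) / (2 - p)
lam, g, b = 0.8, 2.5, 0.01

p = (g + 2.0) / (g + 1.0)
mu = lam * g / b
phi = lam ** (1.0 - p) * (g / b) ** (2.0 - p) / (2.0 - p)

# Compound Poisson-gamma moments computed directly:
mean_cpg = lam * g / b                  # E[S] = lam * E[Y]
var_cpg = lam * g * (g + 1.0) / b**2    # Var[S] = lam * E[Y^2]

# They must match the Tweedie moments E[S] = mu and Var[S] = phi * mu**p.
assert np.isclose(mean_cpg, mu) and np.isclose(var_cpg, phi * mu**p)
print(f"p = {p:.3f}, mu = {mu:.1f}, phi = {phi:.3f}")
```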
For the BMS case only, we also consider a Tweedie model (Tweedie's CP) with the same covariates as those selected in the CPG model, in line with Delong et al. (2021). This Tweedie model is better than the CPG, as Delong et al. (2021) assert, but our proposed Tweedie model is still better than the Tweedie's CP model.

Standard vs Kappa-N and BMS
Another angle of analysis, more specific to what we have presented in this paper, relates to the modelling of past experience. First, the analysis is divided according to the chosen distribution; then, a summary analysis is done of all the models combined.

Poisson (Frequency):
The introduction of a claims score or BMS level into the mean parameter of a Poisson distribution is not new. Nevertheless, it is still interesting to see that the Kappa-N and BMS models significantly improve the AIC and BIC statistics compared to the standard models. The prediction score is also improved when switching from the standard model to the Kappa-N or BMS model. The differences between the fit statistics and prediction scores of the Kappa-N model and the BMS model are minimal. Knowing that the Kappa-N model is difficult to use in practice, it is interesting to see that the cost of moving to a predictive pricing model that can actually be deployed is negligible.

Gamma (Severity):
Modelling severity based on the number of past claims is not a very common approach in actuarial science. As we saw earlier in the severity analysis based on the groups of insureds, severity approaches that include a loss experience component were expected to perform well. This is indeed what we observe: the Kappa-N and BMS models produced better values of the AIC and BIC, and a better logarithmic prediction score, than the standard model did. Considering that the standard model does not use claims history in the premium calculation, this result is very interesting, since it seems to indicate that there is value for insurers in including a discount and surcharge component on severity based on past claims. As with frequency, the differences observed between the AIC, BIC and logarithmic score SL values of the BMS approach and the Kappa-N approach are very small, showing once again that the requirement of a practical approach is not very restrictive.

CPG (Loss cost):
The CPG model is the combination of the frequency and severity approaches discussed above. Both the frequency and severity analyses favor the Kappa-N and BMS models, so it is expected that both of these approaches will be better than the standard approach. We also notice that the results are clearly better with the BMS model, except for the BIC criterion, which penalizes its greater number of parameters (37, compared to 33 for Kappa-N).
Tweedie (Loss cost): As with severity modelling, using the number of past claims to model the loss cost is not a very common approach in actuarial science, and it is therefore very interesting to check whether this generalization of the approach is relevant. Analysis of the AIC, BIC and SL tends to show that adding elements related to past insurance experience helps to better segment the risk for collision coverage. Indeed, the values obtained for the Kappa-N and BMS approaches strongly favor these models compared to the standard approach. As before, the BMS model performed better than the Kappa-N model, except for the BIC criterion and the prediction score, where a slight advantage was observed for the Kappa-N model.

Estimated Parameters
Table 10 in Appendix A shows the estimated values of the β coefficients for the selected covariates in the Standard model and the BMS model. For frequency, severity, and loss cost, there is a marked difference between some estimators. The impact of adding components linked to a BMS level on the parameters of the segmentation variables had already been observed and analyzed by Boucher and Inoussa (2014) for frequency analysis; we will not elaborate further, especially since the same explanations apply to the analysis of severity and loss cost. It is more important to analyze in detail the estimated values of the parameters related to the claims experience.

Table 6 shows the values of the parameters related to past claims experience for the Kappa-N model and the values of the structural parameters for the BMS approach. For frequency, severity, and loss cost, the table shows that the impact of estimating the parameters γ_0 and Ψ is small when the minimum and maximum BMS limits ℓ_min and ℓ_max are added. The results in Table 6 indicate that the jump parameter for a past claim is the same for frequency and loss cost (Ψ_N = Ψ_Y = 3), but different for severity (Ψ_Z = 2). The relativity parameter γ_0 of the frequency is also closer to that of the loss cost than to that of the severity. Finally, the frequency model proposes BMS level limits that are slightly wider than those of the severity or loss cost models.

To better understand how past claims experience impacts policyholders' premiums in the studied models, we can refer to Figure 3, which illustrates the relativity curves of the frequency (Poisson), severity (gamma) and loss cost (Tweedie) models as a function of the BMS level. The impact of the parameters Ψ, γ_0, ℓ_min and ℓ_max can all be observed simultaneously in the same figure:

• It shows that the range of possible penalties for severity (in blue) is much smaller than those for frequency (red) and loss cost (black).
• The maximum penalty for frequency is higher than the maximum penalty for loss cost, which in turn is much higher than the penalty for severity. We reach the same conclusion by comparing the maximum discount obtained in each model.

In the CPG model, an insured's total premium is the frequency premium multiplied by the severity premium. For the premium calculation of this model, the BMS levels of both frequency and severity must be tracked. Thus, it is not possible to illustrate the BMS relativities of the CPG model in two dimensions, as is done in Figure 3. To compare BMS surcharges, discounts, and minimum and maximum relativities, the BMS models of frequency and severity must be combined. The result of this comparison is shown in Table 7. The direct comparison between the CPG model and the Tweedie model shows that the surcharges of the two approaches for a claim are similar: a premium increase of 40.1% compared to an increase of 39.5%. The discounts for a claims-free year are also similar: -11.3% versus -10.6%. The most striking difference between the two models appears in the range of relativities of each model. In the CPG approach, an insured with many past claims might have a premium relativity of more than 175.3%, while this relativity is limited to 156.8% for the Tweedie model. In contrast, the maximum discount is comparable in both models.

Table 7: Impacts of past claims for all BMS models
              Frequency        Severity        CPG (combined)   Tweedie
Relativities  [0.626, 1.753]   [0.855, 1.000]  [0.535, 1.753]   [0.570, 1.568]
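The mechanics behind these relativities can be sketched as follows. This is a minimal sketch assuming the usual BMS specification: a claim-free year moves the level down by one, each claim adds Ψ levels, the level is capped at [ℓ_min, ℓ_max], and level ℓ carries the multiplier exp(γ_0(ℓ − 100)). The values of Ψ, γ_0 and the limits below are illustrative, not the paper's estimates.

```python
import numpy as np

def next_level(level, n_claims, psi, l_min, l_max):
    """One-year BMS update: -1 if claim-free, +psi per claim, then cap."""
    return int(np.clip(level - 1 + psi * n_claims, l_min, l_max))

def relativity(level, gamma0):
    """Premium multiplier attached to a BMS level (level 100 -> 1.0)."""
    return float(np.exp(gamma0 * (level - 100)))

psi, gamma0, l_min, l_max = 3, 0.10, 96, 108
level = 100
for n_claims in [0, 0, 2, 0, 1, 0]:   # hypothetical annual claim counts
    level = next_level(level, n_claims, psi, l_min, l_max)
    print(level, round(relativity(level, gamma0), 3))
```

With this structure, the per-claim surcharge and the claims-free discount reported above correspond to the ratios exp(γ_0·Ψ) and exp(−γ_0), inside the limits.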

Numerical Example
To better illustrate the similarities and differences between the CPG model and the Tweedie model, we use the estimated parameters of the models and consider four insureds with the claims histories shown in Table 8. Each insured is observed for twelve consecutive years. The first insured has not filed a claim during the 12-year period; insured #2 is a bad driver who claims frequently; insured #3 filed many claims in the first three years, but that number diminished in the following years; and the last insured has a deteriorating driving experience. All insureds are assumed to start at BMS level 100 in year 0, for the bonus-malus scales of frequency, severity and loss cost.
As was done with the insurance data used earlier, a six-year window is assumed for the calculation of the levels, and the first six years are used to create a claims history. We will analyze the resulting premiums for each insured in years 7 to 12.
At the beginning of each year t, Figure 4 shows the evolution of the BMS levels of frequency, severity and loss cost for each of the four fictitious insureds. The grey area of each graph corresponds to the first six years, used for the calculation of the initial BMS level.
Figure 5 shows the resulting BMS relativities, where the BMS relativity of the CPG model is the combined effect of frequency and severity. For all years after the sixth contract (t ≥ 7), we can see that the BMS relativities of the two models are similar for the four insureds in the example. Thus, even though the BMS levels sometimes appear to be different, the combination of severity and frequency means that the relativity obtained is close to that of the Tweedie model. Of course, there are some differences between the two curves, but the general trend is always the same.

CPG vs Tweedie
Figure 6 shows the ratio between the CPG premium and the Tweedie premium for the training set and the test set. The distribution is similar for both parts of the dataset. We can also see that the premium ratio is centred around 1, but that ratios below 95% or above 110% exist in the portfolio. The choice of the type of model for predictive pricing thus has a significant potential impact. A relevant way to compare the BMS models under the two underlying distributions (CPG and Tweedie) is to check the fit between what is observed and what is predicted by each model, according to the type of contract. Using the five types of policyholders defined in Section 4.1.2, we can potentially see the types of contracts for which the two models could be improved. Table 9 below shows the ratio of the predicted loss cost to the annual average for the five types of policyholders. Figure 7, in turn, illustrates the observed loss cost ratios and those predicted by both models: the two graphs at the top are for the CPG and the ones at the bottom are for the Tweedie. For the test portion of the database, the graphs on the left refer to insureds of types A, B and C (what could be called new insureds), and those on the right show the results for insureds of types D and E, i.e., insureds with at least six years of insurance experience.
This analysis by type of insured makes it possible to see which types of insureds are best or worst predicted by the different models. Both approaches, CPG and Tweedie, predict well the loss costs of policyholders whose average total cost is close to or smaller than that of the portfolio; the prediction is almost perfect for Group D. In contrast, for policyholders whose average loss cost is higher than that of the portfolio, the costs are generally underestimated by both approaches.

It may also be interesting to check the fit between the observed and predicted values according to the BMS level of the contract. Figure 8 illustrates this fit for frequency, severity, and loss cost. The blue curves represent the predicted mean relativities, while the red curves represent the observed relativities. The solid lines are for the training set, while the dotted lines represent the results on the test set. For the claims frequency, we see that the difference between the predicted and observed relativities is minimal for contracts with BMS levels below 103. For levels 103 and above, where there are far fewer insureds, the general trend of the model is in line with the average of what has been observed. For the severity model, the relativity curve clearly decreases as the BMS level decreases, which is what the pricing model also assumes. The difference between the observed and predicted values is more variable for severity than for frequency. Finally, the gap between the observed relativities and those obtained by the Tweedie model for the loss cost is similar to what was observed for the frequency model: the difference is minimal for lower BMS levels and more variable for higher levels.
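The level-by-level check behind such a figure can be sketched as below. This is a hypothetical illustration on simulated data (the level range, base premium and noise distribution are all assumptions): observed and predicted mean loss costs are grouped by BMS level and expressed as relativities to the portfolio average.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)

# Simulated contracts: BMS levels in [96, 108], a predicted mean that grows
# exponentially with the level, and noisy observed outcomes around it.
n = 10_000
df = pd.DataFrame({"bms_level": rng.integers(96, 109, size=n)})
df["predicted"] = 500.0 * np.exp(0.10 * (df["bms_level"] - 100))
df["observed"] = df["predicted"] * rng.gamma(2.0, 0.5, size=n)  # mean-1 noise

portfolio_avg = df["observed"].mean()
by_level = df.groupby("bms_level").agg(
    obs_rel=("observed", lambda s: s.mean() / portfolio_avg),
    pred_rel=("predicted", lambda s: s.mean() / portfolio_avg),
)
print(by_level.round(3))
```

Plotting the two columns against the level reproduces the kind of predicted-versus-observed relativity curves discussed above, with more sampling noise at the sparsely populated high levels.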

Conclusion
We generalized the paper of Delong et al. (2021) by including a predictive ratemaking component in the premium. Our approach can also be seen as an extension of the work of Boucher (2023), by considering the severity and the loss cost as target variables in addition to the frequency. In other words, our objective was to compare the BMS models with the standard models when the CPG and the Tweedie are used as underlying distributions. Our main conclusions and remarks are as follows.
First, in the BMS model with the CPG as underlying distribution, we found that the BMS level impacts the frequency and severity components of the CPG differently. Although both are positively related to the BMS level, the impact is stronger for the frequency component than for the severity component. As a result, the surcharges and discounts are larger for the frequency component than for severity. Finally, comparing the relativities, the frequency component penalizes insureds who have filed many claims in the past more than the severity component does.
Second, in the BMS model with the Tweedie as underlying distribution, we found a positive dependency between the BMS level and the corresponding premium. We also noted that the surcharge per claim and the claims-free discount are comparable in the Tweedie and CPG cases. However, the CPG model penalizes insureds who have filed many claims in the past more than the Tweedie model does.
Finally, we reached the same conclusions as Boucher (2023) about the BMS model. All BMS models considered in our analysis have a better fit and better prediction quality than the standard approaches. These statistics are better when the Tweedie is used as the underlying distribution rather than the CPG. In addition, the Tweedie model with a single BMS level is better than the Tweedie's CP model, which uses the BMS levels obtained from the CPG model.
We conclude the paper with some remarks about the underlying distributions used. First, there are other excellent distributions for frequency and severity modelling; for example, the Negative Binomial is often preferable to the Poisson. Yet, to compare the CPG and Tweedie models, the frequency and severity components of the CPG must be modelled by the Poisson and gamma distributions. Finally, because the loss cost has a positive probability mass at zero, a mixed distribution (discrete and continuous) is an adequate choice for loss cost modelling. This is why the variance (power) parameter of the Tweedie distribution must be chosen appropriately to allow loss cost modelling.
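This last point can be made concrete: with a power parameter 1 < p < 2, the Tweedie distribution is a compound Poisson-gamma and therefore carries an explicit point mass at zero. The parameter values below are illustrative only.

```python
import numpy as np

# For a Tweedie variable with power parameter 1 < p < 2:
#   P(S = 0) = P(N = 0) = exp(-lam),  with  lam = mu**(2 - p) / (phi * (2 - p)),
# where lam is the implied Poisson frequency of the compound representation.
mu, phi, p = 500.0, 120.0, 1.5

lam = mu ** (2.0 - p) / (phi * (2.0 - p))
p_zero = np.exp(-lam)
print(f"implied Poisson rate = {lam:.3f}, P(S = 0) = {p_zero:.3f}")
```

Outside this range (p ≤ 1 or p ≥ 2) the zero mass disappears or the support changes, which is why the choice of p is what makes the Tweedie suitable for loss costs that are exactly zero with positive probability.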

Figure 2: Average claim frequency and severity by group
Figure 3: BMS relativities

Figure 4: BMS levels for all four fictitious insureds
Figure 5: Resulting BMS relativities for the four fictitious insureds

Figure 8: Predicted vs Observed for the claims frequency (left) and the claims severity (right)

Table 1: Illustration of frequency and severity data. The loss cost, representing the sum of the costs of each claim, is shown in the last column of the table. An insured who has not filed any claims during the contract period has a loss cost of zero.

Table 4: Group of contracts by past experience (columns: Type, Past Experience, n_{i,•,•}, Frequency Ratio, Severity Ratio, Loss Cost Ratio)

Table 6: Estimation of the other parameters of the Kappa-N and BMS models

Table 8: Insureds with claims experience

Table 9: Loss Cost Ratio for all types of insureds

4.6.2 BMS Levels