Mack’s estimator motivated by large exposure asymptotics in a compound Poisson setting

Abstract The distribution-free chain ladder of Mack justified the use of the chain ladder predictor and enabled Mack to derive an estimator of conditional mean squared error of prediction for the chain ladder predictor. Classical insurance loss models, that is, models of compound Poisson type, are not consistent with Mack’s distribution-free chain ladder. However, for a sequence of compound Poisson loss models indexed by exposure (e.g., number of contracts), we show that the chain ladder predictor and Mack’s estimator of conditional mean squared error of prediction can be derived by considering large exposure asymptotics. Hence, quantifying chain ladder prediction uncertainty can be done with Mack’s estimator without relying on the validity of the model assumptions of the distribution-free chain ladder.


Introduction
We consider the problem of predicting outstanding claims costs from insurance contracts whose coverage periods have expired but for which not all claims are known to the insurer. Such prediction tasks are referred to as claims reserving. The chain ladder method is arguably the most widespread and well-known technique for claims reserving based on claims data organized in run-off triangles, with cells indexed by accident year and development year. The chain ladder method is a deterministic prediction method for predicting the not yet known south-east corner (target triangle) based on the observed north-west corner (historical triangle) of a square with cell values representing accumulated total claims amounts. The square and historical triangle can easily be generalized to a rectangle and a trapezoid, reflecting claims data for more historical accident years. However, we will here consider the traditional setup in order to simplify comparison with influential papers. We refer to the textbook [20] by Wüthrich and Merz for an overview of methods for claims reserving.
Important contributions appeared in the 1990s presenting stochastic models and properties of parametric stochastic models that give rise to the chain ladder predictor. Mack [9] presented three model properties, known as the distribution-free chain ladder model, that together with weighted least squares estimation give rise to the chain ladder predictor. Renshaw and Verrall [15] showed that independent Poisson distributed cell values for incremental total claims amounts, together with Maximum Likelihood estimation of parameters for row and column effects, give rise to the chain ladder predictor. The Poisson model is inconsistent with the distribution-free chain ladder.
The most impressive contribution of Mack in [9] is the estimator of conditional mean squared error of prediction. The key contribution is the estimator of the contribution of parameter estimation error to conditional mean squared error of prediction. A number of papers have derived the same estimator based on different approaches to statistical estimation in settings consistent with the distribution-free chain ladder, see, e.g., Merz and Wüthrich [13], Röhr [16], Diers et al. [2], Gisler [5], Lindholm et al. [8].
Different approaches to the estimation of, and estimators of, prediction error for the chain ladder method sparked some scientific debate, both regarding which stochastic model underlies the chain ladder method, see, e.g., the papers by Mack and Venter [11] and Verrall and England [19], and regarding prediction error estimation for the chain ladder method, see Buchwalder et al. [1], Gisler [4], Mack et al. [12] and Venter [18]. Gisler revisited, in [6], different estimators for conditional mean squared error in the setting of the distribution-free chain ladder. Ultimately, Mack's estimator of conditional mean squared error of prediction has stood the test of time.
The main contribution of the present paper is that we show that a simple but natural compound Poisson model is fully compatible with both the chain ladder predictor and Mack's estimator of conditional mean squared error of prediction, although the model is incompatible with Mack's distribution-free chain ladder, provided that we consider an insurance portfolio with sufficiently large exposure (e.g., accumulated total claims amounts based on sufficiently many contracts). The Poisson model considered by Renshaw and Verrall in [15] is a special case of the compound Poisson model we consider, and consequently their Poisson model also gives rise to Mack's estimator of conditional mean squared error of prediction.
The rest of the paper is organized as follows. Section 2 presents the stochastic model we consider, both a simple model called the special model and a more general model. The special model is a classical insurance loss model (independent compound Poisson processes in each cell of the run-off triangle of incremental total claim amounts). Section 3 recalls Mack's distribution-free chain ladder. Section 4 presents asymptotic results that demonstrate that we can retrieve Mack's classical estimators in model settings that are incompatible with the distribution-free chain ladder. Section 5 presents a numerical example that illustrates the theoretical results in Section 4. The proofs are found in Section 6.

The model
We will focus on a simple yet general class of models for the number of reported claims and the cost of these claims. In line with classical reserving methods based on claims data organized in run-off triangles, we consider $T$ accident years and $T$ development years. For $i, t \in \mathcal{T} = \{1, \dots, T\}$, let $C^\alpha_{i,t}$ denote the accumulated total claims amount due to accident events in accident year $i$ that are paid up to and including development year $t$. The parameter $\alpha$ is a measure of exposure, such as the number of contracts of not yet fully developed accident years. We will analyze asymptotics as $\alpha \to \infty$ and use the findings to motivate the use of well-established predictors and estimators in settings that are not consistent with model assumptions used to derive the classical results for the chain ladder method. A given claims reserving situation of course corresponds to a single, typically large, number $\alpha$. As in any other situation where asymptotic arguments are the basis for approximation, we embed the prediction problem in a sequence of prediction problems, indexed by $\alpha$.
The special model is simply a set of independent Cramér-Lundberg (compound Poisson) models, indexed by accident year and development year, with a common claim size distribution with finite variance and positive mean, where the exposure parameter $\alpha$ plays the role of time in the Cramér-Lundberg models. Consider incremental accumulated total claim amounts $X^\alpha_{i,t}$ due to accident events in accident year $i$ that are paid during development year $t$. Consider constants $\lambda_1, \dots, \lambda_T \in (0, \infty)$ and $q_1, \dots, q_T \in (0, 1)$ with $\sum_{t=1}^{T} q_t = 1$. For each $i, t \in \mathcal{T}$, $(X^\alpha_{i,t})_{\alpha \ge 0}$ is a Cramér-Lundberg model with representation
\[
X^\alpha_{i,t} = \sum_{k=1}^{N^\alpha_{i,t}} Z_{i,t,k},
\]
where $(N^\alpha_{i,t})_{\alpha \ge 0}$ is a homogeneous Poisson process with intensity $\lambda_i q_t \in (0, \infty)$, independent of the i.i.d. sequence $(Z_{i,t,k})_{k=1}^\infty$. The claim size variables satisfy $Z_{i,t,k} \stackrel{d}{=} Z$ for all $i, t, k$ for some $Z$ with finite variance and positive mean. Moreover, the compound Poisson processes $(X^\alpha_{i,t})_{\alpha \ge 0}$, $(i, t) \in \mathcal{T} \times \mathcal{T}$, are independent. We want to highlight the special case of the special model obtained by letting $Z \equiv 1$. In this case the special model is simply a set of independent homogeneous Poisson processes, indexed by accident year and development year, where the exposure parameter $\alpha$ plays the role of time. In particular, for a fixed $\alpha$, we obtain the model considered by Renshaw and Verrall in [15] as a model underlying the chain ladder method, since it gives rise to the chain ladder predictor (see Section 3) upon replacing unknown parameters by their Maximum Likelihood estimates.
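As an illustration, the special model can be simulated cell by cell. The following is a minimal sketch in our own notation (the function and variable names are ours, not the paper's); `claim_sampler` draws the i.i.d. claim sizes, and omitting it corresponds to the special case $Z \equiv 1$:

```python
import numpy as np

def simulate_special_model(alpha, lam, q, rng, claim_sampler=None):
    """Draw incremental amounts X[i, t] of the special model: independent
    compound Poisson cells with Poisson(alpha * lam[i] * q[t]) claim counts
    and i.i.d. claim sizes Z; claim_sampler=None means Z = 1 (claim counts)."""
    T = len(lam)
    X = np.zeros((T, T))
    for i in range(T):
        for t in range(T):
            n = rng.poisson(alpha * lam[i] * q[t])
            X[i, t] = n if claim_sampler is None else claim_sampler(n).sum()
    return X

rng = np.random.default_rng(1)
q = np.array([0.5, 0.3, 0.2])     # settlement-delay probabilities, sum to one
lam = np.array([1.0, 1.2, 0.9])   # exposure weights per accident year
X = simulate_special_model(1000.0, lam, q, rng)
C = X.cumsum(axis=1)              # cumulative amounts C[i, t]
```

Passing, say, `claim_sampler=lambda n: rng.gamma(2.0, 1.0, size=n)` yields genuinely compound cells with positive claim sizes.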

The general model
Several of the statements in Section 4 below hold for a wider class of models than the special model. The general model, (GM1)-(GM4) below, allows us to write
\[
X^\alpha_{i,t} = \sum_{k=1}^{M^\alpha_i} Z_{i,k}\, \mathbf{1}\{D_{i,k} = t\},
\]
where $M^\alpha_i$ denotes the number of accident events in accident year $i$, $Z_{i,k}$ denotes the size of the $k$th such claim and $D_{i,k}$ denotes the corresponding development year, the indicator $\mathbf{1}\{D_{i,k} = t\}$ selecting the claims settled during development year $t$. The properties (GM1)-(GM4) together constitute the general model. The common distribution of the terms $(D_{i,k}, Z_{i,k})$ does not depend on the accident year $i$; let $(D, Z)$ denote a generic such pair. By (GM3), claims data variables are independent if they correspond to different accident years. However, the components of $(D, Z)$ are possibly dependent, allowing the distribution of the claim size to depend on the development year. Note that we allow exposures to vary between accident years, reflected in possibly different parameters $\lambda_1, \dots, \lambda_T$. Note also that the incremental accumulated claims amounts $X^\alpha_{i,s}$ and $X^\alpha_{i,t}$, $s \neq t$, are in general not independent (unless $M^\alpha_i$ is Poisson distributed). In order to derive Mack's estimator in [9] of conditional mean squared error of prediction for the chain ladder predictor we must consider a special case of the general model: the properties (SM1)-(SM3) together form an alternative way of specifying the special model. Since (SM3) implies (GM4), the special model is a special case of the general model.

Mack's distribution-free chain ladder
Arguably the most well-known method for claims reserving is the chain ladder method.
In the seminal paper [9], Thomas Mack presented properties, see (1) and (2) below, for conditional distributions of accumulated total claims amounts that, together with (3) below, make the chain ladder prediction method the optimal prediction method for predicting outstanding claims amounts. Moreover, and this is the main contribution of [9], he showed that these properties lead to an estimator of the conditional mean squared error of the chain ladder predictor. With $C_{i,t}$ denoting the accumulated total claims amount up to and including development year $t$ for accidents during accident year $i$, Mack considered the following assumptions for the data generating process: for $t = 1, \dots, T-1$ there exist constants $f_{\mathrm{MCL},t} > 0$ and $\sigma^2_{\mathrm{MCL},t} \ge 0$ such that
\[
\mathrm{E}[C_{i,t+1} \mid C_{i,1}, \dots, C_{i,t}] = f_{\mathrm{MCL},t}\, C_{i,t}, \tag{1}
\]
\[
\mathrm{Var}(C_{i,t+1} \mid C_{i,1}, \dots, C_{i,t}) = \sigma^2_{\mathrm{MCL},t}\, C_{i,t}, \tag{2}
\]
and
\[
\text{the accident years } (C_{i,1}, \dots, C_{i,T}),\ i = 1, \dots, T, \text{ are independent.} \tag{3}
\]
The conditions (1), (2) and (3) together are referred to as Mack's distribution-free chain ladder model. The parameters $f_{\mathrm{MCL},t}$ and $\sigma^2_{\mathrm{MCL},t}$ are estimated by
\[
\widehat{f}_t = \frac{\sum_{j=1}^{T-t} C_{j,t+1}}{\sum_{j=1}^{T-t} C_{j,t}}
\quad \text{and} \quad
\widehat{\sigma}^2_t = \frac{1}{T-t-1} \sum_{j=1}^{T-t} C_{j,t} \Big( \frac{C_{j,t+1}}{C_{j,t}} - \widehat{f}_t \Big)^2,
\]
respectively. We refer to [9] for properties of these parameter estimators. The property (2) for the conditional variance is very difficult to assess from data in the form of run-off triangles on which the chain ladder method is applied. We refer to [10] for tests assessing the assumptions of Mack's distribution-free chain ladder. Moreover, it is notoriously difficult to find stochastic models that satisfy this property. Note that the special model, see Section 2, does not satisfy Mack's conditions: neither (1) nor (2) holds. Indeed, for the special model the increments are independent of the past, so, by Theorem 3.3.6 in [14],
\[
\mathrm{E}[C^\alpha_{i,t+1} \mid C^\alpha_{i,1}, \dots, C^\alpha_{i,t}] = C^\alpha_{i,t} + \alpha \lambda_i q_{t+1} \mathrm{E}[Z].
\]
It is shown in Theorem 1 below that large exposure limits, as $\alpha \to \infty$, do exist for the estimators $\widehat{f}_t$ and $\widehat{\sigma}^2_t$. The constant (a.s. convergence) limit for the parameter estimator $\widehat{f}_t$ has a meaningful interpretation in terms of the general model we consider, and the parameter estimators $\widehat{f}_t$ can be transformed into estimators of parameters of our model, see Remark 4. However, Mack's parameter estimator $\widehat{\sigma}^2_t$ converges in distribution to a nondegenerate random variable. Hence, although $\widehat{\sigma}^2_t$ will generate numerical values that may seem reasonable, such values do not correspond to outcomes of random variables converging to a parameter.
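For concreteness, Mack's parameter estimators and the chain ladder predictor translate directly into code. The following is a sketch under our own conventions (0-based NumPy indices, cell $(i, t)$ observed when $i + t \le T - 1$, zeros marking unobserved cells); the function names are ours:

```python
import numpy as np

def mack_estimators(C):
    """Development factors f_hat[t] and Mack's variance estimators s2_hat[t]
    from a cumulative run-off triangle C (C[i, t] observed for i + t <= T - 1)."""
    T = C.shape[0]
    f_hat = np.empty(T - 1)
    s2_hat = np.full(T - 1, np.nan)   # the last entry needs extrapolation (Mack)
    for t in range(T - 1):
        rows = np.arange(T - 1 - t)   # accident years with C[., t+1] observed
        f_hat[t] = C[rows, t + 1].sum() / C[rows, t].sum()
        if len(rows) > 1:             # the variance estimator needs >= 2 ratios
            dev = C[rows, t + 1] / C[rows, t] - f_hat[t]
            s2_hat[t] = (C[rows, t] * dev ** 2).sum() / (len(rows) - 1)
    return f_hat, s2_hat

def chain_ladder_predict(C, f_hat):
    """Fill the unobserved lower-right cells with chain ladder predictions."""
    T = C.shape[0]
    C_hat = C.astype(float).copy()
    for i in range(T):
        for t in range(T - i, T):     # first unobserved cell in row i is t = T - i
            C_hat[i, t] = C_hat[i, t - 1] * f_hat[t - 1]
    return C_hat

C = np.array([[100.0, 150.0, 160.0],
              [110.0, 165.0, 0.0],    # zeros mark unobserved cells
              [120.0, 0.0, 0.0]])
f_hat, s2_hat = mack_estimators(C)
pred = chain_ladder_predict(C, f_hat)
```

On this toy triangle, $\widehat f_1 = 315/210 = 1.5$ and $\widehat f_2 = 160/150$, so the bottom-row ultimate is $120 \cdot 1.5 \cdot 160/150 = 192$.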
The main contribution of Mack's paper [9] is the derivation of an estimator of the conditional mean squared error of prediction
\[
\mathrm{E}\big[ (C_{i,T} - \widehat{C}_{i,T})^2 \mid \mathcal{D} \big],
\]
where $\mathcal{D}$ is the $\sigma$-algebra generated by the data observed at the time of prediction: $\{C_{j,t} : j, t \in \mathcal{T},\ j + t \le T + 1\}$. Mack's estimator is
\[
\widehat{C}_{i,T}^{\,2} \sum_{t=T-i+1}^{T-1} \frac{\widehat{\sigma}^2_t}{\widehat{f}^{\,2}_t} \Big( \frac{1}{\widehat{C}_{i,t}} + \frac{1}{\sum_{j=1}^{T-t} C_{j,t}} \Big), \tag{4}
\]
where $\widehat{C}_{i,t} = C_{i,T-i+1} \prod_{s=T-i+1}^{t-1} \widehat{f}_s$. We will show that when considering the special model (SM1)-(SM3), large exposure asymptotics naturally lead to Mack's estimator of conditional mean squared error of prediction despite the fact that the special model is inconsistent with Mack's distribution-free chain ladder. Hence, the chain ladder predictor $\widehat{C}_{i,T} = C_{i,T-i+1} \prod_{s=T-i+1}^{T-1} \widehat{f}_s$ may be used together with an assessment of its accuracy by (4) without having to rely on the validity of (1) and (2) of Mack's distribution-free chain ladder.
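A direct transcription of Mack's mean squared error estimator into code may look as follows. This is a sketch under the same 0-based array conventions as above; `f_hat` and `s2_hat` hold the parameter estimates (the illustrative variance values below are arbitrary, and in practice the last variance parameter must be supplied, e.g. via Mack's extrapolation rule):

```python
import numpy as np

def mack_msep(C, f_hat, s2_hat):
    """Mack's estimator of conditional MSEP per accident year:
    msep[i] = Chat_{i,T}^2 * sum_t (s2_t / f_t^2) * (1/Chat_{i,t} + 1/S_t),
    with S_t the column sum of observed C_{j,t} and the sum over the
    development years that are unobserved for accident year i."""
    T = C.shape[0]
    msep = np.zeros(T)
    for i in range(1, T):                  # accident year 0 is fully developed
        C_hat = C[i, T - 1 - i]            # latest observed value of row i
        acc = 0.0
        for t in range(T - 1 - i, T - 1):
            S_t = C[: T - 1 - t, t].sum()  # observed part of column t
            acc += (s2_hat[t] / f_hat[t] ** 2) * (1.0 / C_hat + 1.0 / S_t)
            C_hat *= f_hat[t]              # roll forward to Chat_{i,t+1}
        msep[i] = C_hat ** 2 * acc
    return msep

C = np.array([[100.0, 200.0, 220.0, 230.0],
              [ 80.0, 170.0, 190.0,   0.0],
              [ 90.0, 180.0,   0.0,   0.0],
              [120.0,   0.0,   0.0,   0.0]])
f_hat = np.array([550 / 270, 410 / 370, 230 / 220])
s2_hat = np.array([0.44, 0.03, 0.01])      # illustrative variance parameters
msep = mack_msep(C, f_hat, s2_hat)
```

Taking the square root of `msep[i]` gives the prediction standard error that Mack reports alongside the reserve for accident year $i$.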

Large exposure asymptotics
We will next present the main results, motivating the use of the chain ladder method and Mack's estimator of conditional mean squared error of prediction, in the setting of the general or special model. Recall that, for $i, t \in \mathcal{T}$, $C^\alpha_{i,t} = \sum_{s=1}^{t} X^\alpha_{i,s}$. Let $\chi^2_\nu$ denote a random variable with a chi-squared distribution with $\nu$ degrees of freedom. Let $\mathrm{N}_T(\mu, \Sigma)$ denote the $T$-dimensional normal distribution with mean $\mu$ and covariance matrix $\Sigma$. In what follows, convergence of random variables should be understood as convergence as $\alpha \to \infty$.
Theorem 1. Consider the general model (GM1)-(GM4). For each $t \in \mathcal{T}$ with $t \le T-1$, the convergence (5) holds; for each $i \in \mathcal{T}$ with $i \ge 2$, the convergence (6) holds; and for each $t \in \mathcal{T}$ with $t \le T-2$, the convergence (7) holds.

Remark 1. We do not index $\widehat{f}_t$ and $\widehat{\sigma}^2_t$ by the exposure parameter $\alpha$. It should be clear from the context whether $\widehat{f}_t$ should be seen as an element of a convergent sequence or simply as a function of the given data. Similarly for $\widehat{\sigma}^2_t$.

Remark 2. For the convergence in (5) and (6) it is not necessary to assume that $M^\alpha_1, \dots, M^\alpha_T$ are independent. If $Z$ and $D$ are independent, then the limit expressions in (5) and (7) simplify, with $q_t = \mathrm{P}(D = t)$.
Remark 3. The convergence (6) supports the use of the chain ladder predictor whose prediction error is studied in [9] and [10]. However, (7) says that from numerical estimates $\widehat{\sigma}^2_t$ we may not conclude that there is empirical evidence in support of the assumption (2) of Mack's distribution-free chain ladder.

Remark 4. It follows from (5) that if we either replace the claims amounts by the numbers of claims (corresponding to $Z \equiv 1$) in the estimator $\widehat{f}_t$, or assume that the variables $D$ and $Z$ are independent, then the estimators $\widehat{f}_1, \dots, \widehat{f}_{T-1}$ can be transformed into consistent estimators $\widehat{q}_1, \dots, \widehat{q}_T$ of $q_1, \dots, q_T$, where $q_t = \mathrm{P}(D = t)$: with the cumulative proportions $\widehat{\beta}_t = \prod_{s=t}^{T-1} \widehat{f}_s^{\,-1}$, $\widehat{\beta}_T = 1$ and $\widehat{\beta}_0 = 0$, one sets $\widehat{q}_t = \widehat{\beta}_t - \widehat{\beta}_{t-1}$. More generally, $(\widehat{q}_1, \dots, \widehat{q}_T)$ converges a.s. to a vector $(\widetilde{q}_1, \dots, \widetilde{q}_T)$. In particular, if the generic pair $(D, Z)$ has independent components or if $Z \equiv 1$, then $(\widetilde{q}_1, \dots, \widetilde{q}_T) = (q_1, \dots, q_T)$.
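One standard way to turn chain ladder factors into a settlement-delay pattern is via the cumulative proportions $\beta_t = \prod_{s \ge t} 1/f_s$ (with $\beta_T = 1$), differenced to give the $q_t$; this reproduces the numbers reported in Section 5. A sketch (the function name is ours; the factors are the rounded values for the Taylor-Ashe triangle as reported in Mack (1993)):

```python
import numpy as np

def development_pattern(f_hat):
    """Estimated settlement-delay probabilities q_hat[0..T-1] from chain
    ladder factors f_hat[0..T-2], via the cumulative proportions
    beta_t = prod_{s >= t} 1 / f_s (beta_T = 1) and q_t = beta_t - beta_{t-1}."""
    beta = np.append(1.0 / np.cumprod(f_hat[::-1])[::-1], 1.0)
    return np.diff(np.concatenate(([0.0], beta)))

# development factors for the Taylor-Ashe triangle, rounded as in Mack (1993)
f_hat = np.array([3.491, 1.747, 1.457, 1.174, 1.104,
                  1.086, 1.054, 1.077, 1.018])
q_hat = development_pattern(f_hat)
```

By construction the entries of `q_hat` sum to one, and the first entries agree with the pattern $(0.069, 0.172, 0.180, \dots)$ quoted in the numerical illustration.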

Conditional mean squared error of prediction
The natural measure of prediction error is the conditional mean squared error of prediction
\[
\mathrm{E}\big[ (C^\alpha_{i,T} - \widehat{C}^\alpha_{i,T})^2 \mid \mathcal{D}^\alpha \big], \tag{8}
\]
where $\mathcal{D}^\alpha$ is the $\sigma$-algebra generated by $\{C^\alpha_{j,t} : j, t \in \mathcal{T},\ j + t \le T + 1\}$, the run-off triangle that is fully observed at the time of prediction. Since we are considering large exposure limits, the conditional expectation (8) diverges as $\alpha \to \infty$ and is hence not meaningful. However, we show (Theorems 2, 3 and 4 together with Remark 10) that there exists a random variable $L$ such that the standardized mean squared error of prediction converges in distribution, and that the limit $L$ has a natural $\mathcal{D}^\alpha$-measurable estimator $\widehat{L}^\alpha$ (Remarks 5, 6 and 8).
Consequently, the natural estimator of the prediction error (8) is $C^\alpha_{i,T-i+1} \widehat{L}^\alpha$. Our aim is to arrive at an estimator of conditional mean squared error of prediction that coincides with Mack's estimator (4), and this is not in general true in the setting of the general model. Therefore, we need to consider the special model (SM1)-(SM3). Combining Theorems 2, 3, 4 and Remarks 5, 6, 8 below, we show the convergence (9) and that the resulting estimator (10) coincides with the estimator of conditional mean squared error of prediction obtained by Mack in [9]. Note that $C^\alpha_{i,T-i+1}$ is independent of $\widehat{f}_{T-i+1}, \dots, \widehat{f}_{T-1}$, since the latter estimators are functions only of data from accident years $\le i - 1$. Hence, $\widehat{C}^\alpha_{i,T} = C^\alpha_{i,T-i+1} \prod_{s=T-i+1}^{T-1} \widehat{f}_s$ is a product of two independent factors. In order to verify the convergence in (9), note that the left-hand side of (9) can be expressed as a sum of three terms, (11), (12) and (13). In the literature, the first term (11) (upon multiplication by $C^\alpha_{i,T-i+1}$) is referred to as process variance, and the second term (12) (upon multiplication by $C^\alpha_{i,T-i+1}$) is referred to as estimation error. In the setting of the distribution-free chain ladder, (11) is a conditional variance. However, in our setting (the general or special model, see Section 2) this term is not a conditional variance. Hence, we will not use the terminology "process variance". Note that the two factors in (13) are independent because of independent accident years. This fact will enable us to study the asymptotic behavior of (13), convergence in distribution, and verify that the limit distribution has zero mean.
Theorem 2 below shows that the second term (12) converges in distribution in the setting of the general model. Theorem 3 below shows that the first term (11) converges in distribution in the setting of the special model. In fact, the Poisson assumption for the counting variables is not needed for convergence in distribution. However, we need it in order to obtain an estimator of conditional mean squared error of prediction that coincides with the estimator derived by Mack in [9]. Theorem 4 below shows that the third term (13) converges in distribution in the setting of the special model. Remark 10 below clarifies that the sum of the terms converges in distribution in the setting of the special model.

Theorem 2. Consider the general model (GM1)-(GM4).
For each $i \in \mathcal{T}$ with $i \ge 2$, there exists $\gamma_i \in (0, \infty)$ such that the stated convergence holds; if $Z$ and $D$ are independent, then the limit expression simplifies.

Remark 5. Motivated by (5) and (7), we estimate $f_t$ by $\widehat{f}_t$ and $\sigma^2_t$ by $\widehat{\sigma}^2_t$. This yields an estimator $\widehat{\gamma}^2_i$ of $\gamma^2_i$ and, consequently, an estimator of the limit in Theorem 2 that coincides with Mack's estimator (see [9], p. 219).

Theorem 3. Consider the special model (SM1)-(SM3). For each $i \in \mathcal{T}$ with $i \ge 2$, the convergence (14) holds. In particular, the expectation of the limit variable in (14) is given by (15).

Remark 6. Estimating $f_t$ by $\widehat{f}_t$ and $\sigma^2_t$ by $\widehat{\sigma}^2_t$ in (15) gives the corresponding estimator of (15).
Remark 7. Convergence of the conditional expectations considered in Theorem 3 does not require the Poisson assumption for the counting variables. However, we have used the fact that $\mathrm{E}[M^\alpha_i] = \mathrm{var}(M^\alpha_i)$ to derive the limit in (14). If $\mathrm{E}[M^\alpha_i]$ and $\mathrm{var}(M^\alpha_i)$ were to increase with $\alpha$ at rates that differ asymptotically, then the limit corresponding to (14) would look different, and consequently we would arrive at an estimator of conditional mean squared error of prediction that differs from the one obtained by Mack in [9].

Theorem 4. Consider the special model (SM1)-(SM3). Let $A^\alpha_1$ and $A^\alpha_2$ denote the two factors of the third term (13). Then $(A^\alpha_1)_{\alpha \ge 0}$ and $(A^\alpha_2)_{\alpha \ge 0}$ are independent and both converge in distribution to normally distributed random variables with zero means. In particular, $(A^\alpha_1 A^\alpha_2)_{\alpha \ge 0}$ converges in distribution to a random variable with zero mean.

Remark 8. By Theorem 4, the third term (13) in the expression for the standardized mean squared error of prediction converges in distribution to a random variable with zero mean. Consequently, we estimate (13) by 0.

Theorem 5. Suppose that for each accident year $j$, $(M^\alpha_j)_{\alpha \ge 0}$ is a renewal counting process given by $M^\alpha_j = \sup\{m \ge 1 : T_{j,m} \le \alpha\}$, where the steps $Y_{j,k}$ of the random walk $(T_{j,m})_{m \ge 1}$ are i.i.d. Then the random vectors $\alpha^{-1/2}(S^\alpha_j - \mathrm{E}[S^\alpha_j])$ converge in distribution to normally distributed limits.

Corollary 1. Consider the setting of Theorem 5. Let $H^\alpha$ and $F^\alpha$ be defined in terms of $S^\alpha_j$; then $(H^\alpha, F^\alpha)$ converges in distribution to $(H, F)$, where $(H, F)$ is jointly normally distributed.

Remark 9. If $(M^\alpha_j)_{\alpha \ge 0}$ is a homogeneous Poisson process, then $\mathrm{var}(Y) = \lambda_j^{-2}$, the random vectors $S^\alpha_j$ in Theorem 5 have independent components, and $H^\alpha$ and $F^\alpha$ in Corollary 1 are independent.
Remark 10. Theorems 2, 3 and 4 show convergence in distribution separately for the three terms (11), (12) and (13) of conditional mean squared error of prediction. We treat them separately since we want to emphasize that convergence to the appropriate limits occurs under different assumptions; only for two of the terms do we use the compound Poisson assumption of the special model. However, the sum of the terms converges in distribution under the assumptions made in Theorem 3. This convergence of the sum is a consequence of the convergence in distribution of the random vectors $\alpha^{-1/2}(S^\alpha_j - \mathrm{E}[S^\alpha_j])$ in Theorem 5. That the convergence in distribution in Theorems 2, 3 and 4 can be extended to joint convergence in distribution can then be verified by combining the convergence of $\alpha^{-1/2}(S^\alpha_j - \mathrm{E}[S^\alpha_j])$ in Theorem 5 with an application of the continuous mapping theorem for weak convergence together with Slutsky's theorem. Such an argument verifies that $L = L^{(1)} + L^{(2)} + L^{(3)}$, where $L^{(1)}$, $L^{(2)}$ and $L^{(3)}$ correspond to the limits in Theorems 2, 3 and 4.

Numerical illustration
In the setting of the special model, we may simulate a run-off triangle $\{C^\alpha_{j,t} : j, t \in \mathcal{T},\ j + t \le T + 1\}$ and explicitly compute the standardized conditional mean squared error of prediction (standardized means division by $C^\alpha_{i,T-i+1}$) in (9) as a known function of the simulated run-off triangle. For the same run-off triangle, we may compute the standardized estimator of mean squared error by Mack, and then compare the two random variables, or their distributions. We first show how to explicitly compute the standardized conditional mean squared error of prediction. Since $C^\alpha_{i,T} = C^\alpha_{i,T-i+1} + \sum_{k=1}^{N^\alpha} Z_k$ with $N^\alpha \sim \mathrm{Pois}(\alpha \lambda_i \sum_{t=T-i+2}^{T} q_t)$ independent of the i.i.d. sequence $(Z_k)$, we may use the independence between $\sum_{k=1}^{N^\alpha} Z_k$ and $\mathcal{D}^\alpha$ to compute the conditional expectation (17) explicitly. From Theorems 2, 3 and 4 together with Remark 10 we know that $L^\alpha \stackrel{d}{\to} L$, and we may compute $\mathrm{E}[L]$ explicitly. We have not shown convergence in distribution for $\widehat{L}^\alpha$, but it follows from Theorem 1 and Slutsky's theorem that each term in the expression for $\widehat{L}^\alpha$ converges in distribution, and the corresponding expectations of the limits add up to $\mathrm{E}[L]$. Hence, if we draw many realizations of run-off triangles based on the special model, and convert these into a random sample from the distribution of $\widehat{L}^\alpha - L^\alpha$, then we expect the empirical mean to be approximately zero.
For the numerical illustration, we take the claims data from Table 1 in Mack [9], originally presented by Taylor and Ashe [17], in order to choose values for the model parameters of exposure and distribution of delay. Applying the formula from Remark 4, we can transform the development factors $\widehat{f}_t$ corresponding to Table 1 in [9] into
\[
(\widehat{q}_t)_{t=1}^{T} = (0.069, 0.172, 0.180, 0.194, 0.107, 0.075, 0.069, 0.047, 0.070, 0.018).
\]
For the exposures, we simply use the first column of the run-off triangle in Mack (1993) and normalize it by dividing by its first entry (this procedure suffices for illustration; more sophisticated estimation could be considered). This yields
\[
(\widehat{\lambda}_i)_{i=1}^{T} = (1.000, 0.984, 0.812, 0.868, 1.239, 1.107, 1.230, 1.005, 1.053, 0.961)
\]
across accident years. For simplicity, we choose $Z \equiv 1$ and $\alpha = 4{,}000{,}000$, which roughly corresponds to the order of magnitude found in [9]. We generate 100,000 realizations of run-off triangles and for each one compute both the true standardized conditional mean squared error (17) and the standardized version of Mack's estimator of conditional mean squared error (16), for accident years $i = 3, 5$ and $8$. The results can be seen in Figure 1. The results are not sensitive to the value chosen for $\alpha$; the histograms in Figure 1 are essentially indistinguishable from those with $\alpha = 10{,}000$.
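The experiment just described can be sketched end to end for a single simulated triangle. The following is a minimal version in our own notation ($Z \equiv 1$, one accident year, and Mack's usual extrapolation rule supplying the last variance parameter), printing the standardized Mack estimator next to the true standardized conditional mean squared error:

```python
import numpy as np

T = 10
q = np.array([0.069, 0.172, 0.180, 0.194, 0.107,
              0.075, 0.069, 0.047, 0.070, 0.018])
q = q / q.sum()                                  # normalize the rounded pattern
lam = np.array([1.000, 0.984, 0.812, 0.868, 1.239,
                1.107, 1.230, 1.005, 1.053, 0.961])
alpha = 4_000_000

rng = np.random.default_rng(7)
X = rng.poisson(alpha * np.outer(lam, q)).astype(float)
C = X.cumsum(axis=1)                             # full square of cumulative amounts
obs = C.copy()
for i in range(T):                               # only cells i + t <= T - 1 observed
    obs[i, T - i:] = 0.0

f_hat = np.empty(T - 1)
s2_hat = np.full(T - 1, np.nan)
for t in range(T - 1):
    r = np.arange(T - 1 - t)
    f_hat[t] = obs[r, t + 1].sum() / obs[r, t].sum()
    if len(r) > 1:
        d = obs[r, t + 1] / obs[r, t] - f_hat[t]
        s2_hat[t] = (obs[r, t] * d * d).sum() / (len(r) - 1)
s2_hat[-1] = min(s2_hat[-2] ** 2 / s2_hat[-3], s2_hat[-3], s2_hat[-2])

i = 4                                            # accident year 5 in 1-based terms
latest = obs[i, T - 1 - i]
C_hat, acc = latest, 0.0
for t in range(T - 1 - i, T - 1):                # standardized Mack estimator
    S_t = obs[: T - 1 - t, t].sum()
    acc += (s2_hat[t] / f_hat[t] ** 2) * (1.0 / C_hat + 1.0 / S_t)
    C_hat *= f_hat[t]
mack_std = C_hat ** 2 * acc / latest

m = alpha * lam[i] * q[T - i:].sum()             # mean of the outstanding claim count
true_std = (m + (latest + m - C_hat) ** 2) / latest
print(mack_std, true_std)
```

With $Z \equiv 1$, the outstanding amount is Poisson distributed and independent of the observed triangle, so the conditional mean squared error reduces to its variance plus a squared bias term, both divided by the latest observed value. Repeating this over many triangles and collecting the differences reproduces the histograms of Figure 1.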

Proofs
Before the proof of Theorem 1 we state a result on stochastic representations of norms of multivariate normal random vectors that will be used in the proof.
Proof of Theorem 1. We first prove (5). Note that, for $1 \le i_0 < i_1 \le T$, the claimed convergence follows using Theorem 2.1 in [7]. In order to prove (6), note that, similarly to the above, the corresponding limit holds. We proceed to the more involved task of proving (7). For $j = i_0, \dots, i_1$, let $W^\alpha_j$ denote the $j$th term in the sum in the expression for $\widehat{\sigma}^2_t$. The numerator of $W^\alpha_j$ can be rewritten so that $W^\alpha = B^\alpha U^\alpha$, where $B^\alpha$ is a square matrix determined by the data. The multivariate central limit theorem together with Theorem 1.1 in [7] yields $U^\alpha \stackrel{d}{\to} U$, where $U \sim \mathrm{N}_{i_1 - i_0 + 1}(0, c^2_t\, \mathrm{diag}(\lambda_{i_0}, \dots, \lambda_{i_1}))$. By the strong law of large numbers, $B^\alpha \stackrel{\mathrm{a.s.}}{\to} B$, whose eigenvalues have geometric multiplicities $i_1 - i_0$ and $1$, respectively. By Lemma 1, the limit admits a representation in terms of independent standard normal variables $Q_{i_0}, \dots, Q_{i_1 - 1}$. Altogether, this proves (7). If $Z$ and $D$ are independent, then it is seen from the above expression that $\Sigma$ is diagonal. In this case, (18) converges in distribution to a centered normally distributed random variable.

Proof of Theorem 4. $A^\alpha_2$ can be expressed in terms of $S_t$ as in the proof of Theorem 2. Hence, the arguments in the proof of Theorem 2 show that $(A^\alpha_2)_{\alpha \ge 0}$ converges in distribution to a normally distributed random variable with zero mean. $A^\alpha_1$ can be expressed as
where $Q_1, \dots, Q_n$ are independent and standard normal and $\mu_1, \dots, \mu_n$ are the eigenvalues of $\Sigma$.

Proof of Lemma 1. Write $\Sigma = L L^{\mathsf{T}}$ and note that $W \stackrel{d}{=} L Q$ with $Q \sim \mathrm{N}_n(0, I)$. Hence, $W^{\mathsf{T}} W \stackrel{d}{=} Q^{\mathsf{T}} L^{\mathsf{T}} L Q$. The matrix $L^{\mathsf{T}} L$ is orthogonally diagonalizable and has the same eigenvalues as $\Sigma = L L^{\mathsf{T}}$. Write $L^{\mathsf{T}} L = O^{\mathsf{T}} D O$, where $O$ is orthogonal and $D = \mathrm{diag}(\mu_1, \dots, \mu_n)$. Hence,
\[
W^{\mathsf{T}} W \stackrel{d}{=} (OQ)^{\mathsf{T}} D\, (OQ) \stackrel{d}{=} \sum_{k=1}^{n} \mu_k Q_k^2,
\]
since $OQ \sim \mathrm{N}_n(0, I)$.