Estimation of Conditional Mean Squared Error of Prediction for Claims Reserving

This paper studies estimation of conditional mean squared error of prediction, conditional on what is known at the time of prediction. The particular problem considered is the assessment of actuarial reserving methods given data in the form of runoff triangles (trapezoids), where the use of prediction assessment based on out-of-sample performance is not an option. The prediction assessment principle advocated here can be viewed as a generalization of Akaike's final prediction error. A direct application of this simple principle in the setting of a data generating process given in terms of a sequence of general linear models yields an estimator of conditional mean squared error of prediction that can be computed explicitly for a wide range of models within this model class. Mack's distribution-free chain ladder model and the corresponding estimator of the prediction error for the ultimate claim amount are shown to be a special case. It is demonstrated that the prediction assessment principle easily applies to quite different data generating processes and results in estimators that have been studied in the literature.


Introduction
Actuarial reserving amounts to forecasting future claim costs from incurred claims that the insurer is unaware of and from claims known to the insurer that may lead to future claim costs. The predictor commonly used is an expectation of future claim costs computed with respect to a parametric model, conditional on the currently observed data, where the unknown parameter vector is replaced by a parameter estimator. A natural question is how to calculate an estimate of the conditional mean squared error of prediction, MSEP, given the observed data, so that this estimate is a fair assessment of the accuracy of the predictor. The main question is how the variability of the predictor due to estimation error should be accounted for and quantified.
Mack's seminal paper [14] addressed this question for the chain ladder reserving method. Given a set of model assumptions, referred to as Mack's distribution-free chain ladder model, Mack justified the use of the chain ladder reserve predictor and, more importantly, provided an estimator of the conditional MSEP for the chain ladder predictor. Another significant contribution to measuring variability in reserve estimation is the paper [8] which introduced bootstrap techniques to actuarial science. For more on other approaches to assess the effect of estimation error in claims reserving, see e.g. [5,9,27,20,6] and the references therein.
Even though [14] provided an estimator of conditional MSEP for the chain ladder predictor of the ultimate claim amount, the motivation for the approximations in the derivation of the conditional MSEP estimator is somewhat opaque, something commented upon in e.g. [5]. Moreover, by inspecting the above references it is clear that there is no general agreement on how estimation error should be accounted for when assessing prediction error.
Many of the models underlying commonly encountered reserving methods, such as Mack's distribution-free chain ladder model, have an inherent conditional or autoregressive structure. This conditional structure will make the observed data not only a basis for parameter estimation, but also a basis for prediction. More precisely, expected future claim amounts are functions, expressed in terms of observed claim amounts, of the unknown model parameters. These functions form the basis for prediction. Predictors are obtained by replacing the unknown model parameters by their estimators. In particular, the same data are used for the basis for prediction and parameter estimation. In order to estimate prediction error in terms of conditional MSEP it is necessary to account for the fact that the parameter estimates differ from the unknown parameter values. As demonstrated in [14], not doing so will make the effect of estimation error vanish in the conditional MSEP estimation.
We start by considering assessment of a prediction method without reference to a specific model. Given a random variable $X$ to be predicted and a predictor $\hat X$, the conditional MSEP, conditional on the available observations, is defined as
$$\mathrm{MSEP}_{\mathcal F_0}(X,\hat X) := E\big[(X-\hat X)^2 \mid \mathcal F_0\big] = \mathrm{Var}(X \mid \mathcal F_0) + E\big[(\hat X - E[X \mid \mathcal F_0])^2 \mid \mathcal F_0\big]. \tag{1}$$
The variance term is usually referred to as the process variance and the expected value is referred to as the estimation error. Notice that $\mathrm{MSEP}_{\mathcal F_0}(X,\hat X)$ is the optimal predictor of the squared prediction error $(X-\hat X)^2$ in the sense that it minimizes $E[((X-\hat X)^2 - V)^2]$ over all $\mathcal F_0$-measurable random variables $V$ having finite variance. However, $\mathrm{MSEP}_{\mathcal F_0}(X,\hat X)$ typically depends on unknown parameters.
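The split of conditional MSEP into process variance and estimation error can be checked in one step. The following derivation is a sketch, assuming only that the predictor $\hat X$ is $\mathcal F_0$-measurable and that $X$ has finite variance; the cross term vanishes because $E[X - E[X \mid \mathcal F_0] \mid \mathcal F_0] = 0$.

```latex
\begin{align*}
E\big[(X-\hat X)^2 \mid \mathcal F_0\big]
 &= E\big[(X-E[X\mid\mathcal F_0])^2 \mid \mathcal F_0\big]
  + 2\,\big(E[X\mid\mathcal F_0]-\hat X\big)\,
      E\big[X-E[X\mid\mathcal F_0] \mid \mathcal F_0\big]
  + \big(E[X\mid\mathcal F_0]-\hat X\big)^2 \\
 &= \operatorname{Var}(X\mid\mathcal F_0)
  + \big(E[X\mid\mathcal F_0]-\hat X\big)^2 .
\end{align*}
```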
Typically, the predictor $\hat X$ is taken as the plug-in estimator of the conditional expectation $E[X \mid \mathcal F_0]$: if $X$ has a probability distribution with a parameter vector $\theta$, then we may write
$$h(\theta;\mathcal F_0) := E[X \mid \mathcal F_0] \quad\text{and}\quad \hat X := h(\hat\theta;\mathcal F_0),$$
where $z \mapsto h(z;\mathcal F_0)$ is an $\mathcal F_0$-measurable function and $\hat\theta$ is an $\mathcal F_0$-measurable estimator of $\theta$. (Note that this definition of a plug-in estimator, i.e. the estimator obtained by replacing an unknown parameter $\theta$ with an estimator $\hat\theta$ of the parameter, is not to be confused with the so-called plug-in principle, see e.g. [7, Ch. 4.3].) Since the plug-in estimator of […] of model selection is relevant. This topic will not be pursued any further, but it is worth noting that the techniques and methods discussed in the present paper allow for "distribution-free" model selection.
In Section 2 we present in detail the general approach to estimation of conditional mean squared error of prediction briefly summarized above. Moreover, in Section 2 we illustrate how the approach applies to the situation with runoff triangle based reserving when we are interested in calculating conditional MSEP for the ultimate claim amount and the claims development result (CDR). We emphasize the fact that the conditional MSEP given by (1) is the standard (conditional) L 2 distance between a random variable and its predictor. The MSEP quantities considered in [28] in the setting of the distribution-free chain ladder model are not all conditional MSEP in the sense of (1).
In Section 3 we put the quantities introduced in the general setting in Section 2 in the specific setting where data emerging during a particular time period (calendar year) form a diagonal in a runoff triangle (trapezoid).
In Section 4, development-year dynamics for the claim amounts are given by a sequence of general linear models. Mack's distribution-free chain ladder model is a special case, but the model structure is more general and includes e.g. development-year dynamics given by sequences of autoregressive models. Given the close connection between our proposed estimator of conditional MSEP and Akaike's FPE, our approach naturally lends itself to model selection within a set of models.
In Section 5 we show that we retrieve Mack's famous conditional MSEP estimator for the ultimate claim amount and demonstrate that our approach coincides with the approach in [6] to estimation of conditional MSEP for the ultimate claim amount for Mack's distribution-free chain ladder model. We also argue that conditional MSEP for the CDR is simply a special case, choosing CDR as the random variable of interest instead of e.g. the ultimate claim amount. In Section 5 we show agreement with certain CDR-expressions obtained in [28] for the distribution-free chain ladder model, while noting that the estimation procedure is different from those used in e.g. [28,6].
Although Mack's distribution-free chain ladder model and the associated estimators/predictors provide canonical examples of the claim amount dynamics and estimators/predictors of the kind considered in Section 4, analysis of the chain ladder method is not the purpose of the present paper. In Section 6 we demonstrate that the general approach to estimation of conditional MSEP presented here applies naturally to non-sequential models such as the overdispersed Poisson chain ladder model. Moreover, for the overdispersed Poisson chain ladder model we derive a (semi-) analytical MSEP-approximation which turns out to coincide with the well-known estimator from [19].

Estimation of conditional MSEP in a general setting
We will now formalize the procedure briefly described in Section 1. All random objects are defined on a probability space $(\Omega, \mathcal F, P)$. Let $\mathcal T = \{\underline t, \underline t + 1, \dots, \overline t\}$ be an increasing sequence of integer times with $\underline t < 0 < \overline t$ and $0 \in \mathcal T$ representing current time. Let $((S_t, S_t^\perp))_{t\in\mathcal T}$ be a stochastic process generating the relevant data. $(S_t)_{t\in\mathcal T}$ and $(S_t^\perp)_{t\in\mathcal T}$ are independent and identically distributed stochastic processes, where the former represents outcomes over time in the real world and the latter represents outcomes in an imaginary parallel universe. Let $(\mathcal F_t)_{t\in\mathcal T}$ denote the filtration generated by $(S_t)_{t\in\mathcal T}$. It is assumed that the probability distribution of $(S_t)_{t\in\mathcal T}$ is parametrized by an unknown parameter vector $\theta$. Consequently, the same applies to $(S_t^\perp)_{t\in\mathcal T}$. The problem considered in this paper is the assessment of the accuracy of the prediction of a random variable $X$, that may be expressed as some functional applied to $(S_t)_{t\in\mathcal T}$, given the currently available information represented by $\mathcal F_0$. The natural object to consider as the basis for predicting $X$ is
$$h(\theta;\mathcal F_0) := E[X \mid \mathcal F_0],$$
which is an $\mathcal F_0$-measurable function evaluated at $\theta$. The corresponding predictor is then obtained as the plug-in estimator
$$\hat X := h(\hat\theta;\mathcal F_0),$$
where $\hat\theta$ is an $\mathcal F_0$-measurable estimator of $\theta$. We define
$$\mathrm{MSEP}_{\mathcal F_0}(X,\hat X) := E\big[(X-\hat X)^2 \mid \mathcal F_0\big] = \mathrm{Var}(X \mid \mathcal F_0) + \big(h(\hat\theta;\mathcal F_0) - h(\theta;\mathcal F_0)\big)^2. \tag{5}$$
We write
$$H(\theta;\mathcal F_0) := \mathrm{Var}(X \mid \mathcal F_0)(\theta) + \big(h(\hat\theta;\mathcal F_0) - h(\theta;\mathcal F_0)\big)^2$$
to emphasize that $\mathrm{MSEP}_{\mathcal F_0}(X,\hat X)$ can be seen as an $\mathcal F_0$-measurable function of $\theta$. Consequently, the plug-in estimator of $\mathrm{MSEP}_{\mathcal F_0}(X,\hat X)$ is given by
$$H(\hat\theta;\mathcal F_0) = \mathrm{Var}(X \mid \mathcal F_0)(\hat\theta),$$
which coincides with the plug-in estimator of the process variance, leading to a likely underestimation of $\mathrm{MSEP}_{\mathcal F_0}(X,\hat X)$. This problem was highlighted already in [14] in the context of prediction/reserving using the distribution-free chain ladder model. The analytical MSEP approximation suggested for the chain ladder model in [14] is, in essence, based on replacing the second term on the right-hand side in (5), relating to estimation error, by another term based on certain conditional moments, conditioning on σ-fields strictly smaller than $\mathcal F_0$.
These conditional moments are natural objects and straightforward to calculate due to the conditional structure of the distribution-free chain-ladder claim-amount dynamics. This approach to estimate conditional MSEP was motivated heuristically as "average over as little as possible", see [14, p. 219]. In the present paper, we present a conceptually clear approach to quantifying the variability due to estimation error that is not model specific. The resulting conditional MSEP estimator for the ultimate claim amount is found to coincide with that found in [14] for the distribution-free chain ladder model, see Section 5. This is further illustrated by applying the same approach to non-sequential, unconditional, models, see Section 6, where it is shown that the introduced method can provide an alternative motivation of the estimator from [19] for the overdispersed Poisson chain ladder model.
With the aim of finding a suitable estimator of $\mathrm{MSEP}_{\mathcal F_0}(X,\hat X)$, notice that the predictor $\hat X := h(\hat\theta;\mathcal F_0)$ is obtained by evaluating the $\mathcal F_0$-measurable function $z \mapsto h(z;\mathcal F_0)$ at $\hat\theta$. The chosen model and the stochastic quantity of interest, $X$, together form the function $z \mapsto h(z;\mathcal F_0)$ that is held fixed. This function may be referred to as the basis of prediction. However, the estimator $\hat\theta$ is a random variable whose observed outcome may differ substantially from the unknown true parameter value $\theta$. In order to obtain a meaningful estimator of $\mathrm{MSEP}_{\mathcal F_0}(X,\hat X)$, the variability in $\hat\theta$ should be taken into account. Towards this end, consider the random variable $\hat\theta^*$, which is not $\mathcal F_0$-measurable and which is constructed to share key properties with $\hat\theta$. Based on $\hat X^* := h(\hat\theta^*;\mathcal F_0)$ we will introduce versions of conditional MSEP from which estimators of conditional MSEP in (5) will follow naturally.
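In general, evaluation of the second term on the right-hand side above requires full knowledge about the model. Typically, we want only to make weaker moment assumptions. The price paid is the necessity to consider the approximation obtained from a first-order expansion of $z \mapsto h(z;\mathcal F_0)$ around $\theta$,
$$\mathrm{MSEP}^{*,\nabla}_{\mathcal F_0}(X,\hat X) := \mathrm{Var}(X \mid \mathcal F_0) + \nabla h(\theta;\mathcal F_0)'\,\Lambda(\theta;\mathcal F_0)\,\nabla h(\theta;\mathcal F_0), \tag{6}$$
where $\Lambda(\theta;\mathcal F_0) := \mathrm{Cov}(\hat\theta^* \mid \mathcal F_0)$.

Remark 1. Akaike introduced, in [1,2], the quantity FPE (final prediction error) for assessment of the accuracy of a predictor, intended for model selection by rewarding models that give rise to small prediction errors. Akaike demonstrated the merits of FPE when used for order selection among autoregressive processes. Akaike's FPE assumes a stochastic process $(S_t)_{t\in\mathcal T}$ of interest and an independent copy $(S_t^\perp)_{t\in\mathcal T}$ of that process. Let $\mathcal F_0$ be the σ-field generated by $(S_t)_{t\in\mathcal T, t\le 0}$ and let $X$ be the result of applying some functional to $(S_t)_{t\in\mathcal T}$ such that $X$ is not $\mathcal F_0$-measurable. If $(S_t)_{t\in\mathcal T}$ is 1-dimensional, then $X = S_t$, for some $t > 0$, is a natural example. Let $h(\theta;\mathcal F_0) := E[X \mid \mathcal F_0]$ and let $h(\hat\theta;\mathcal F_0)$ be the corresponding predictor of $X$ based on the $\mathcal F_0$-measurable parameter estimator $\hat\theta$. Let $\mathcal F_0^\perp$, $X^\perp$, $h(\theta;\mathcal F_0^\perp)$ and $\hat\theta^\perp$ be the corresponding quantities based on $(S_t^\perp)_{t\in\mathcal T}$. FPE is defined as
$$\mathrm{FPE} := E\big[(X^\perp - h(\hat\theta;\mathcal F_0^\perp))^2\big],$$
and it is clear that the roles of $(S_t)_{t\in\mathcal T}$ and $(S_t^\perp)_{t\in\mathcal T}$ may be interchanged to get
$$\mathrm{FPE} = E\big[(X - h(\hat\theta^\perp;\mathcal F_0))^2\big].$$
Naturally, we may consider the conditional version of FPE, which gives
$$\mathrm{FPE}_{\mathcal F_0} := E\big[(X - h(\hat\theta^\perp;\mathcal F_0))^2 \mid \mathcal F_0\big] = \mathrm{Var}(X \mid \mathcal F_0) + E\big[(h(\hat\theta^\perp;\mathcal F_0) - h(\theta;\mathcal F_0))^2 \mid \mathcal F_0\big].$$

We write $H^*(\theta;\mathcal F_0) := \mathrm{MSEP}^*_{\mathcal F_0}(X,\hat X)$ and $H^{*,\nabla}(\theta;\mathcal F_0) := \mathrm{MSEP}^{*,\nabla}_{\mathcal F_0}(X,\hat X)$ to emphasize that $\mathrm{MSEP}^*_{\mathcal F_0}(X,\hat X)$ and $\mathrm{MSEP}^{*,\nabla}_{\mathcal F_0}(X,\hat X)$ are $\mathcal F_0$-measurable functions of $\theta$.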
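The plug-in estimator $H^*(\hat\theta;\mathcal F_0)$ of $\mathrm{MSEP}^*_{\mathcal F_0}(X,\hat X)$ may appear to be a natural estimator of $\mathrm{MSEP}_{\mathcal F_0}(X,\hat X)$. However, in most situations there will not be sufficient statistical evidence to motivate specifying the full distribution of $\hat\theta^*$. Therefore, $H^*(\hat\theta;\mathcal F_0)$ is not likely to be an attractive estimator of $\mathrm{MSEP}_{\mathcal F_0}(X,\hat X)$. The plug-in estimator $H^{*,\nabla}(\hat\theta;\mathcal F_0)$ of $\mathrm{MSEP}^{*,\nabla}_{\mathcal F_0}(X,\hat X)$ is more likely to be a computable estimator of $\mathrm{MSEP}_{\mathcal F_0}(X,\hat X)$, requiring only the covariance matrix $\Lambda(\theta;\mathcal F_0) := \mathrm{Cov}(\hat\theta^* \mid \mathcal F_0)$ as a matrix-valued function of the parameter $\theta$ instead of the full distribution of $\hat\theta^*$. We will henceforth focus solely on the estimator $H^{*,\nabla}(\hat\theta;\mathcal F_0)$.

Definition 2.3. The estimator of the conditional mean squared error of prediction is given by
$$\widehat{\mathrm{MSEP}}_{\mathcal F_0}(X,\hat X) := \mathrm{Var}(X \mid \mathcal F_0)(\hat\theta) + \nabla h(\hat\theta;\mathcal F_0)'\,\Lambda(\hat\theta;\mathcal F_0)\,\nabla h(\hat\theta;\mathcal F_0). \tag{7}$$

We emphasize that the estimator we suggest in Definition 2.3 relies on one approximation and one modeling choice. The approximation refers to
$$E\big[(h(\hat\theta^*;\mathcal F_0) - h(\theta;\mathcal F_0))^2 \mid \mathcal F_0\big] \approx \nabla h(\theta;\mathcal F_0)'\,\Lambda(\theta;\mathcal F_0)\,\nabla h(\theta;\mathcal F_0)$$
and no other approximations will appear. The modeling choice refers to deciding on how the estimation error should be accounted for in terms of the conditional covariance structure $\mathrm{Cov}(\hat\theta^* \mid \mathcal F_0)$.

Before proceeding further with the specification of $\hat\theta^*$, one can note that in many situations it will be natural to structure data according to e.g. accident year. In these situations it will be possible to express $X$ as $X = \sum_{i\in\mathcal I} X_i$ and consequently also $h(\theta;\mathcal F_0) = \sum_{i\in\mathcal I} h_i(\theta;\mathcal F_0)$. This immediately implies that the estimator (7) of conditional MSEP can be expressed in a way that simplifies computations.

Lemma 2.1. Given that $X = \sum_{i\in\mathcal I} X_i$ and $h(\theta;\mathcal F_0) = \sum_{i\in\mathcal I} h_i(\theta;\mathcal F_0)$, the estimator (7) takes the form
$$\widehat{\mathrm{MSEP}}_{\mathcal F_0}(X,\hat X) = \sum_{i\in\mathcal I} \widehat{\mathrm{MSEP}}_{\mathcal F_0}(X_i,\hat X_i) + 2\sum_{i<k} \Big(\mathrm{Cov}(X_i, X_k \mid \mathcal F_0)(\hat\theta) + Q_{i,k}(\hat\theta;\mathcal F_0)\Big),$$
where $Q_{i,k}(\hat\theta;\mathcal F_0) := \nabla h_i(\hat\theta;\mathcal F_0)'\,\Lambda(\hat\theta;\mathcal F_0)\,\nabla h_k(\hat\theta;\mathcal F_0)$.

The proof of Lemma 2.1 follows from expanding the original quadratic form in the obvious way, see Appendix C. Even though Lemma 2.1 is trivial, it will be used repeatedly in later sections when the introduced methods are illustrated using e.g. different models for the data generating process.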
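As a numerical illustration of the estimator (7) and of the accident-year decomposition in Lemma 2.1, the following sketch evaluates the quadratic form built from the gradient of the basis of prediction and a covariance matrix for the parameter estimator. All names (`num_grad`, `quad_form`, `msep_estimate`) and the two-component toy functions `h1`, `h2` are hypothetical, the gradient is approximated by finite differences rather than computed analytically, and the process (co)variances are simply assumed to be given numbers.

```python
import numpy as np

def num_grad(h, theta, eps=1e-6):
    """Central finite-difference gradient of a scalar function h at theta."""
    theta = np.asarray(theta, dtype=float)
    grad = np.zeros_like(theta)
    for k in range(theta.size):
        step = np.zeros_like(theta)
        step[k] = eps
        grad[k] = (h(theta + step) - h(theta - step)) / (2 * eps)
    return grad

def quad_form(h_i, h_k, theta_hat, Lambda):
    """grad h_i' Lambda grad h_k, evaluated at theta_hat."""
    return num_grad(h_i, theta_hat) @ Lambda @ num_grad(h_k, theta_hat)

def msep_estimate(process_var, h, theta_hat, Lambda):
    """Plug-in process variance plus the estimation-error quadratic form."""
    return process_var + quad_form(h, h, theta_hat, Lambda)

# Hypothetical toy input: two parameters, two accident years.
theta_hat = np.array([2.0, 1.5])
Lambda = np.diag([0.01, 0.04])           # assumed Cov(theta* | F_0) at theta_hat
h1 = lambda th: 100.0 * th[0] * th[1]    # basis of prediction, accident year 1
h2 = lambda th: 50.0 * th[1]             # basis of prediction, accident year 2
v1, v2, c12 = 500.0, 200.0, 0.0          # assumed plug-in process (co)variances

# Estimator for the aggregate X = X_1 + X_2 ...
total = msep_estimate(v1 + v2 + 2 * c12,
                      lambda th: h1(th) + h2(th), theta_hat, Lambda)
# ... and the same quantity assembled accident year by accident year.
by_year = (msep_estimate(v1, h1, theta_hat, Lambda)
           + msep_estimate(v2, h2, theta_hat, Lambda)
           + 2 * (c12 + quad_form(h1, h2, theta_hat, Lambda)))
```

In this toy example the two routes agree, which is all that the expansion of the quadratic form asserts; in practice an analytical gradient would normally be preferred to finite differences.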
Since the specification $\hat\theta^* := \hat\theta^\perp$ implies that $\mathrm{Cov}(\hat\theta^* \mid \mathcal F_0)$ does not depend on $\mathcal F_0$, we refer to $\hat\theta^\perp$ as the unconditional specification of $\hat\theta^*$. In this case, as described in Remark 1, $\mathrm{MSEP}^*_{\mathcal F_0}(X,\hat X)$ coincides with Akaike's FPE in the conditional setting. Moreover, $\Lambda(\theta;\mathcal F_0) = \mathrm{Cov}(\hat\theta^\perp \mid \mathcal F_0) = \mathrm{Cov}(\hat\theta)$. For some models for the data generating process $(S_t)_{t\in\mathcal T}$, such as the conditional linear models investigated in Section 4 below, computation of the unconditional covariance matrix $\mathrm{Cov}(\hat\theta)$ is not feasible. Moreover, it may be argued that observed data should be considered also in the specification of $\hat\theta^*$, although there is no statistical principle justifying this argument. The models investigated in Section 4 are such that $\theta = (\theta_1, \dots, \theta_p)$ and there exist nested σ-fields $\mathcal G_1 \subseteq \dots \subseteq \mathcal G_p \subseteq \mathcal F_0$ such that $E[\hat\theta_k \mid \mathcal G_k] = \theta_k$ for $k = 1, \dots, p$ and $\hat\theta_k$ is $\mathcal G_{k+1}$-measurable for $k = 1, \dots, p-1$.
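Consequently, $\mathrm{Cov}(\hat\theta_j, \hat\theta_k \mid \mathcal G_j, \mathcal G_k) = 0$ for $j \neq k$. If further the covariance matrices $\mathrm{Cov}(\hat\theta_k \mid \mathcal G_k)$ can be computed explicitly, as demonstrated in Section 4, then we may choose $\hat\theta^* := (\hat\theta^*_1, \dots, \hat\theta^*_p)$ such that $E[\hat\theta^*_k \mid \mathcal F_0] = E[\hat\theta_k \mid \mathcal G_k]$ and $\mathrm{Cov}(\hat\theta^*_k \mid \mathcal F_0) = \mathrm{Cov}(\hat\theta_k \mid \mathcal G_k)$ for $k = 1, \dots, p$ and $\mathrm{Cov}(\hat\theta^*_j, \hat\theta^*_k \mid \mathcal F_0) = 0$ for $j \neq k$. Since the specification $\hat\theta^* := \hat\theta^{*,c}$ implies that $\mathrm{Cov}(\hat\theta^* \mid \mathcal F_0)$ depends on $\mathcal F_0$, we refer to $\hat\theta^{*,c}$ as the conditional specification of $\hat\theta^*$. In this case, $\Lambda(\theta;\mathcal F_0) = \mathrm{Cov}(\hat\theta^{*,c} \mid \mathcal F_0)$ is block diagonal with blocks $\mathrm{Cov}(\hat\theta_k \mid \mathcal G_k)$, $k = 1, \dots, p$. Notice that if $\hat\theta^{*,u} := \hat\theta^\perp$ and $\hat\theta^{*,c}$ denote the unconditional and conditional specifications of $\hat\theta^*$, respectively, then covariance decomposition yields
$$\mathrm{Cov}(\hat\theta^{*,u} \mid \mathcal F_0) = \mathrm{Cov}(\hat\theta) = E\big[\mathrm{Cov}(\hat\theta^{*,c} \mid \mathcal F_0)\big]. \tag{8}$$
$\mathrm{Cov}(\hat\theta^{*,c} \mid \mathcal F_0)$ is thus an unbiased estimator of $\mathrm{Cov}(\hat\theta^{*,u} \mid \mathcal F_0)$. The estimators of conditional MSEP for the distribution-free chain ladder model given in [5] and, more explicitly, in [6] are essentially based on the conditional specification of $\hat\theta^*$. We refer to Section 5 below for details.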
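2.1. Selection of estimators of conditional MSEP. As noted in the introduction, $\mathrm{MSEP}_{\mathcal F_0}(X,\hat X)$ is the optimal predictor of the squared prediction error $(X-\hat X)^2$ in the sense that it minimizes $E[((X-\hat X)^2 - V)^2]$ over all $\mathcal F_0$-measurable random variables $V$ having finite variance. Therefore, given a set of estimators $\hat V$ of $\mathrm{MSEP}_{\mathcal F_0}(X,\hat X)$, the best estimator is the one minimizing $E[((X-\hat X)^2 - \hat V)^2]$. Write $V := \mathrm{MSEP}_{\mathcal F_0}(X,\hat X)$ and $\hat V := V + \Delta V$ and notice that
$$E\big[((X-\hat X)^2 - \hat V)^2\big] = E\big[((X-\hat X)^2 - V)^2\big] + E[\Delta V^2].$$
In our setting, by (5) and (7),
$$\Delta V = \mathrm{Var}(X \mid \mathcal F_0)(\hat\theta) - \mathrm{Var}(X \mid \mathcal F_0)(\theta) + \nabla h(\hat\theta;\mathcal F_0)'\,\Lambda(\hat\theta;\mathcal F_0)\,\nabla h(\hat\theta;\mathcal F_0) - \big(h(\hat\theta;\mathcal F_0) - h(\theta;\mathcal F_0)\big)^2.$$
Recall that $\Lambda(\theta;\mathcal F_0) = \mathrm{Cov}(\hat\theta^* \mid \mathcal F_0)$ depends on the specification of $\hat\theta^*$. Therefore we may in principle search for the optimal specification of $\hat\theta^*$. However, it is unlikely that any specification will enable explicit computation of $E[\Delta V^2]$. Moreover, for so-called distribution-free models defined only in terms of certain (conditional) moments, the required moments appearing in the computation of $E[\Delta V^2]$ may be unspecified.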
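The reason the selection criterion separates into a part that does not depend on the estimator and the term $E[\Delta V^2]$ can be sketched as follows, assuming that $\Delta V$ is $\mathcal F_0$-measurable with finite variance and that $V = E[(X-\hat X)^2 \mid \mathcal F_0]$; the cross term then vanishes by the tower property.

```latex
\begin{align*}
E\big[((X-\hat X)^2 - V - \Delta V)^2\big]
 &= E\big[((X-\hat X)^2 - V)^2\big]
  - 2\,E\Big[\Delta V\; E\big[(X-\hat X)^2 - V \mid \mathcal F_0\big]\Big]
  + E[\Delta V^2] \\
 &= E\big[((X-\hat X)^2 - V)^2\big] + E[\Delta V^2],
\end{align*}
```

since $E[(X-\hat X)^2 - V \mid \mathcal F_0] = 0$ by the definition of $V$.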
We may consider the approximations […]. Therefore, the specification of $\hat\theta^*$ should be such that […]. Appendix D compares the two estimators of conditional MSEP based on unconditional and conditional, respectively, specification of $\hat\theta^*$, in the setting of Mack's distribution-free chain ladder model. No significant difference between the two estimators can be found. However, in the setting of Mack's distribution-free chain ladder model, only the estimator based on the conditional specification of $\hat\theta^*$ is computable.

Data in the form of runoff triangles
One of the main objectives of this paper is the estimation of the precision of reserving methods when the data in the form of runoff triangles (trapezoids), explained below, have conditional development-year dynamics of a certain form. Mack's chain ladder model, see e.g. [14], will serve as the canonical example.
Let $I_{i,j}$ denote the incremental claims payments during development year $j \in \{1, \dots, J\} =: \mathcal J$ and from accidents during accident year $i \in \{i_0, \dots, J\} =: \mathcal I$, where $i_0 \le 1$. This corresponds to the indexation used in [14], i.e. $j = 1$ corresponds to the payments that are made during a particular accident year. Clearly, the standard terminology accident and development year used here could refer to any other appropriate time unit. The observed payments as of today, at time 0, form what is called a runoff triangle or runoff trapezoid:
$$D_0 := \{I_{i,j} : (i,j) \in \mathcal I \times \mathcal J,\ i + j \le J + 1\},$$
and let $\mathcal F_0 := \sigma(D_0)$. Notice that accident years $i \le 1$ are fully developed. Notice also that in the often considered special case $i_0 = 1$, the runoff trapezoid takes the form of a triangle. Instead of incremental payments $I_{i,j}$ we may of course equivalently consider cumulative payments $C_{i,j} := \sum_{k=1}^{j} I_{i,k}$. The incremental payments that occur between (calendar) time $t-1$ and $t$ correspond to the following diagonal in the runoff triangle of incremental payments:
$$S_t = \{I_{i,j} : (i,j) \in \mathcal I \times \mathcal J,\ i + j = J + 1 + t\}.$$
Consequently the filtration $(\mathcal F_t)_{t\in\mathcal T}$ is given by
$$\mathcal F_t = \sigma(D_t), \qquad D_t := \{I_{i,j} : (i,j) \in \mathcal I \times \mathcal J,\ i + j \le J + 1 + t\}.$$
Let
$$B_k := \{I_{i,j} : (i,j) \in \mathcal I \times \mathcal J,\ j \le k,\ i + j \le J + 1\},$$
i.e. the subset of $D_0$ corresponding to claim amounts up to and including development year $k$, and notice that $\mathcal G_k := \sigma(B_k) \subset \mathcal F_0$, $k = 1, \dots, J$, form an increasing sequence of σ-fields. Conditional expectations and covariances with respect to these σ-fields appear naturally when estimating conditional MSEP in the distribution-free chain ladder model, see [14], and also in the more general setting considered here when $\hat\theta^*$ is chosen according to the conditional specification. We refer to Section 4 for details.

Electronic copy available at: https://ssrn.com/abstract=3302576
3.1. Conditional MSEP for the ultimate claim amount. The outstanding claims reserve $R_i$ for accident year $i$ that is not yet fully developed, i.e. the future payments stemming from claims incurred during accident year $i$, and the total outstanding claims reserve $R$ are given by
$$R_i := \sum_{j=J-i+2}^{J} I_{i,j}, \qquad R := \sum_{i=2}^{J} R_i.$$
The ultimate claim amount $U_i$ for accident year $i$ that is not yet fully developed, i.e. the future and past payments stemming from claims incurred during accident year $i$, and the ultimate claim amount $U$ are given by
$$U_i := \sum_{j=1}^{J} I_{i,j}, \qquad U := \sum_{i=2}^{J} U_i.$$
Similarly, the amount of paid claims $P_i$ for accident year $i$ that is not yet fully developed, i.e. the past payments stemming from claims incurred during accident year $i$, and the total amount of paid claims $P$ are given by
$$P_i := \sum_{j=1}^{J-i+1} I_{i,j}, \qquad P := \sum_{i=2}^{J} P_i.$$
Obviously, $U_i = P_i + R_i$ and $U = P + R$.
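The index sets behind paid claims, reserves and ultimate claim amounts can be made concrete with a small array example. The sketch below assumes the triangle case $i_0 = 1$, so that cell $(i,j)$ is observed at time 0 exactly when $i + j \le J + 1$; the function name `observed_mask` and the fully known 3 × 3 array `full` of incremental payments are hypothetical and serve only to show the bookkeeping.

```python
import numpy as np

def observed_mask(J):
    """Boolean J x J mask: True where i + j <= J + 1 (1-based indices)."""
    i, j = np.indices((J, J)) + 1
    return i + j <= J + 1

# Hypothetical incremental payments I[i, j] for a fully developed 3 x 3 array;
# rows are accident years, columns are development years.
full = np.array([[10.0, 5.0, 2.0],
                 [12.0, 6.0, 3.0],
                 [11.0, 4.0, 1.0]])
J = full.shape[0]
obs = observed_mask(J)

P = np.where(obs, full, 0.0).sum(axis=1)   # paid claims per accident year
R = np.where(~obs, full, 0.0).sum(axis=1)  # outstanding reserve per accident year
U = full.sum(axis=1)                       # ultimate claim amount per accident year
C = np.cumsum(full, axis=1)                # cumulative payments C[i, j]

# Latest observed diagonal: cells with i + j = J + 1 (0-based: r + c = J - 1).
latest_diagonal = full[np.arange(J), J - 1 - np.arange(J)]
```

Accident year 1 has no future cells, so its reserve is zero, and the last column of the cumulative array equals the ultimate claim amounts.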
We are interested in calculating the conditional MSEP of $U$ and we can start by noticing that if the $\mathcal F_0$-measurable random variable $P$ is added to the random variable $R$ to be predicted, then the same applies to its predictor: $\hat U = P + \hat R$. Therefore,
$$\mathrm{MSEP}_{\mathcal F_0}(U,\hat U) = \mathrm{MSEP}_{\mathcal F_0}(R,\hat R).$$
Further, in order to be able to compute the conditional MSEP estimators from Definition 2.1, and in particular the final plug-in estimator given by (7), we need to specify the basis of prediction, i.e. $z \mapsto h(z;\mathcal F_0)$, which is given by
$$h(\theta;\mathcal F_0) := E[U \mid \mathcal F_0],$$
as well as specify the choice of $\hat\theta^*$. Neither of these is meaningful to pursue any further without specifying the underlying model structure. In Sections 4, 5 and 6 we discuss how this can be done for specific models using Lemma 2.1.

3.2. Conditional MSEP for the claims development result. Above we have described an approach for calculating MSEP for the ultimate claim amount. Another quantity which has received considerable attention is MSEP for the claims development result, CDR, which is the difference between the ultimate claim amount predictor and its update based on one more year of observations. For the chain ladder method, an estimator of the variability of CDR is provided in [26]. We will now describe how this may be done consistently in terms of $\mathrm{MSEP}^*$. As will be seen, there is no conceptual difference compared to the calculations for the ultimate claim amount: all steps will follow verbatim from Section 2. For more on the estimator in [26] for the distribution-free chain ladder model, see Section 5. Let
$$h^{(0)}(\theta;\mathcal F_0) := E[U \mid \mathcal F_0], \qquad h^{(1)}(\theta;\mathcal F_1) := E[U \mid \mathcal F_1],$$
and
$$\mathrm{CDR} := h^{(0)}(\hat\theta^{(0)};\mathcal F_0) - h^{(1)}(\hat\theta^{(1)};\mathcal F_1),$$
where $\hat\theta^{(0)}$ and $\hat\theta^{(1)}$ are $\mathcal F_0$- and $\mathcal F_1$-measurable estimators of $\theta$, based on the observations at times 0 and 1, respectively. Hence, CDR is simply the difference between the predictor at time 0 of the ultimate claim amount and that at time 1. Thus, given the above, it follows by choosing
$$X := \mathrm{CDR}, \qquad h(\theta;\mathcal F_0) := E[\mathrm{CDR} \mid \mathcal F_0], \qquad \widehat{\mathrm{CDR}} := h(\hat\theta;\mathcal F_0)$$
that we may again estimate $\mathrm{MSEP}_{\mathcal F_0}(\mathrm{CDR}, \widehat{\mathrm{CDR}})$ using Definitions 2.1 and 2.2; in particular we may calculate the plug-in estimator given by (7). Note, from the definition of CDR, regardless of the specification of $\hat\theta^*$, that it directly follows that
$$\mathrm{MSEP}^*_{\mathcal F_0}(\mathrm{CDR}, \widehat{\mathrm{CDR}}) = E\Big[\big(h^{(1)}(\hat\theta^{(1)};\mathcal F_1) - E[h^{(1)}(\hat\theta^{(1)};\mathcal F_1) \mid \mathcal F_0](\hat\theta^*)\big)^2 \,\Big|\, \mathcal F_0\Big],$$
where the $\mathcal F_0$-measurable term $h^{(0)}(\hat\theta^{(0)};\mathcal F_0)$ appearing both in $\mathrm{CDR}$ and $\widehat{\mathrm{CDR}}$ has canceled out. Thus, from the above definition of $h(\theta;\mathcal F_0)$, together with the definition of $\mathrm{MSEP}^*$, Definition 2.1, it is clear that the estimation error will only correspond to the effect of perturbing $\theta$ in $E[h^{(1)}(\hat\theta^{(1)};\mathcal F_1) \mid \mathcal F_0](\theta)$. Moreover, the notion of conditional MSEP and the suggested estimation procedure for the CDR are in complete analogy with those for the ultimate claim amount. This estimation procedure is however different from the ones used in e.g. [26,28,20,6] for the distribution-free chain ladder model.
For Mack's distribution-free chain ladder model, $\widehat{\mathrm{CDR}} = 0$, and therefore $\mathrm{MSEP}_{\mathcal F_0}(\mathrm{CDR}, \widehat{\mathrm{CDR}}) = \mathrm{MSEP}_{\mathcal F_0}(\mathrm{CDR}, 0)$. This is however not true in general for other models. More details on CDR-calculations for the distribution-free chain ladder model are found in Section 5.
Moreover, by introducing
$$h^{(k)}(\theta;\mathcal F_k) := E[U \mid \mathcal F_k]$$
we can, of course, repeat the above steps to obtain the conditional MSEP for the $k$-year CDR by using the following definition
$$\mathrm{CDR}(k) := h^{(0)}(\hat\theta^{(0)};\mathcal F_0) - h^{(k)}(\hat\theta^{(k)};\mathcal F_k),$$
together with the obvious changes. We want to stress that we have no particular interest in these CDR-calculations, one-year or $k$-year, but merely want to illustrate the applicability and transparency of the suggested approach. As an illustration, in Section 5 calculations for the ultimate claim amount and one-year CDR are carried out in more detail for the distribution-free chain ladder model. This is, again, based on using Lemma 2.1.

Dynamics in the form of sequential conditional linear models
We will now describe how the theory introduced in Section 2 applies to specific models. We will first introduce a class of sequential conditional linear models of which the distribution-free chain ladder model is a special case, and which also contains more general autoregressive reserving models investigated in e.g. [11] and [12]. Since this class of models has a natural conditional structure, it is interesting to discuss the specification of $\hat\theta^*$ as being either conditional or unconditional. As concluded in Section 2, the parameter estimator $\hat\theta$ and $\Lambda(\theta;\mathcal F_0)$ are needed in order to obtain a computable estimator of $\mathrm{MSEP}_{\mathcal F_0}(X,\hat X)$ following (7). In the present section we will present rather general development-year dynamics for claim amounts that immediately give the estimator $\hat\theta$, and we will discuss how $\hat\theta^*$ can be specified, which gives us $\Lambda(\theta;\mathcal F_0)$.
For the remainder of the current section we will focus on the following development-year dynamics for claim amounts:
$$\tilde Y_{j+1} = \tilde A_j \beta_j + \sigma_j \tilde D_j \tilde e_{j+1}, \qquad j = 1, \dots, J-1. \tag{11}$$
Here $\tilde Y_{j+1}$ is a $|\mathcal I| \times 1$ vector that may represent incremental or cumulative claim amounts, corresponding to either $\tilde Y_{j+1} = (I_{i,j+1})_{i\in\mathcal I}$ or $\tilde Y_{j+1} = (C_{i,j+1})_{i\in\mathcal I}$, respectively, $\tilde A_j$ is a $|\mathcal I| \times p_j$ matrix, $\beta_j$ is a $p_j \times 1$ parameter vector, $\sigma_j$ is a positive scalar parameter, $\tilde D_j$ is a diagonal $|\mathcal I| \times |\mathcal I|$ matrix with positive diagonal elements and $\tilde e_{j+1}$ is a $|\mathcal I| \times 1$ vector. The random matrices $\tilde A_j$ and $\tilde D_j$ and the random vector $\tilde e_{j+1}$ all have independent rows. This requirement ensures that claim amounts stemming from different accident years are independent. Moreover, the components of $\tilde e_{j+1}$ all have, conditional on $\tilde A_j$ and $\tilde D_j$, mean zero and variance one. Therefore, the same holds for the unconditional first two moments: $E[\tilde e_{j+1}] = 0$ and $\mathrm{Cov}(\tilde e_{j+1}) = I$. Notice, however, that the variables $\tilde e_{2,k}, \dots, \tilde e_{J,k}$ are not required to be independent. In fact, if the variables $\tilde Y_{2,k}, \dots, \tilde Y_{J,k}$ are required to be positive, then $\tilde e_{2,k}, \dots, \tilde e_{J,k}$ cannot be independent. See Remark 2 in Section 5 for an example, and [15] for further comments in the setting of Mack's distribution-free chain ladder model.
The development-year dynamics (11) with the above dimensions of $\tilde A_j$, $\tilde D_j$ and $\tilde e_{j+1}$ do not correspond to the dynamics of data observed at time 0. For runoff triangle data, observations come in the form of a diagonal. In particular, at time 0 only the first $n_j := J - j - i_0 + 1$ components of $\tilde Y_{j+1}$ are observed. The development-year dynamics of claim amounts that are observed at time 0 are therefore of the form
$$Y_{j+1} = A_j \beta_j + \sigma_j D_j e_{j+1}, \qquad j = 1, \dots, J-1,$$
where $Y_{j+1}$ is an $n_j \times 1$ vector, $A_j$ is an $n_j \times p_j$ matrix, $D_j$ is a diagonal $n_j \times n_j$ matrix and $e_{j+1}$ is an $n_j \times 1$ vector. We will throughout assume that $n_j \ge p_j$. Hence, we will in what follows consider a sequence of conditional linear models where the dimension of the parameters is fixed whereas the dimension of the random objects varies with the development year. Notice that $Y_{j+1}$, $A_j$, $D_j$ and $e_{j+1}$ are the subvectors/matrices of $\tilde Y_{j+1}$, $\tilde A_j$, $\tilde D_j$ and $\tilde e_{j+1}$ obtained by considering only the first $n_j$ rows.
Recall the following notation introduced in Section 3:
$$B_k := \{I_{i,j} : (i,j) \in \mathcal I \times \mathcal J,\ j \le k,\ i + j \le J + 1\},$$
i.e. the subset of $D_0$ corresponding to claim amounts up to and including development year $k$. $A_j$ and $D_j$ are both $\sigma(B_j)$-measurable with independent rows. Moreover, by the independence between the rows in $e_{j+1}$, the components of $e_{j+1}$ all have, conditional on $A_j$ and $D_j$, mean zero and variance one. These observations form the basis of parameter estimation since they allow $\beta_j$ to be estimated by the standard weighted least squares estimator from the theory of general linear models:
$$\hat\beta_j := \big(A_j' \Sigma_j^{-1} A_j\big)^{-1} A_j' \Sigma_j^{-1} Y_{j+1}, \qquad \Sigma_j := D_j^2, \tag{13}$$
which is independent of $\sigma_j$. Notice in particular that
$$E[\hat\beta_j \mid B_j] = \beta_j.$$
Moreover,
$$\mathrm{Cov}(\hat\beta_j \mid B_j) = \sigma_j^2 \big(A_j' \Sigma_j^{-1} A_j\big)^{-1}.$$
The estimator of the dispersion parameter $\sigma_j^2$ is, for $j = 1, \dots, J-1$, given by
$$\hat\sigma_j^2 := \frac{1}{n_j - p_j}\,\big(Y_{j+1} - A_j\hat\beta_j\big)' \Sigma_j^{-1} \big(Y_{j+1} - A_j\hat\beta_j\big), \tag{16}$$
given that $n_j - p_j > 0$, i.e. given that $i_0 \le J - j - p_j$. If $i_0 = 1$, then $\hat\sigma^2_{J-1}$ has to be defined by an ad hoc choice. The weighted least squares estimator in (13) is the best linear unbiased estimator of $\beta_j$ in the sense that, for any $a \in \mathbb R^{p_j}$, $\hat\beta_j$ is such that $a'\hat\beta_j$ has minimum variance among all unbiased linear estimators. Similarly the estimator in (16) is the best unbiased estimator of $\sigma_j^2$. For further details on weighted (generalized) least squares see e.g. [21, Sec. 3.10].
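The weighted least squares step for a single development year can be sketched numerically as follows. The sketch assumes the observed dynamics $Y_{j+1} = A_j\beta_j + \sigma_j D_j e_{j+1}$ with weight matrix $D_j^2$, which is the conditional covariance of $Y_{j+1}$ up to the factor $\sigma_j^2$; the function name `wls_fit` and the numbers in the example are hypothetical. The example uses a single regressor with $D_j$ equal to the square root of that regressor, in which case the estimator reduces to a ratio of column sums.

```python
import numpy as np

def wls_fit(Y, A, d):
    """Weighted least squares for Y = A beta + sigma * diag(d) * e.

    Y: (n,) responses, A: (n, p) design matrix, d: (n,) positive diagonal of D.
    Returns beta_hat, sigma2_hat (nan if n == p) and (A' W A)^{-1}, W = diag(d)^{-2}.
    """
    Y, A, d = (np.asarray(x, dtype=float) for x in (Y, A, d))
    w = 1.0 / d**2                          # diagonal of the inverse weight matrix
    AtWA = A.T @ (w[:, None] * A)
    beta_hat = np.linalg.solve(AtWA, A.T @ (w * Y))
    resid = Y - A @ beta_hat
    n, p = A.shape
    sigma2_hat = (w * resid**2).sum() / (n - p) if n > p else float("nan")
    return beta_hat, sigma2_hat, np.linalg.inv(AtWA)

# Hypothetical cumulative claims in two consecutive development years.
C_j = np.array([100.0, 110.0, 120.0])
C_next = np.array([150.0, 170.0, 175.0])

beta_hat, sigma2_hat, AtWA_inv = wls_fit(C_next, C_j[:, None], np.sqrt(C_j))
cond_cov = sigma2_hat * AtWA_inv            # plug-in conditional covariance of beta_hat
```

With these inputs `beta_hat[0]` equals `C_next.sum() / C_j.sum()`, and `cond_cov` is the dispersion estimate divided by `C_j.sum()`.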
Basic properties of the estimators are presented next. The essential properties are that, for each $j$, $\hat\beta_j$ is unbiased and, for $j \neq k$, $\hat\beta_j$ and $\hat\beta_k$ are uncorrelated.
Considering the similarities of the model considered here and general linear models, it is clear that there are conditions ensuring that $h(\theta;\mathcal F_0)$, with $\theta = (\beta, \sigma)$, does not depend on $\sigma$. In what follows we hence make the following assumption:

Assumption 4.1. $h(\theta;\mathcal F_0) = E[X \mid \mathcal F_0]$ does not depend on $\sigma = (\sigma_1, \dots, \sigma_{J-1})$.

Assumption 4.1 is fulfilled by e.g. the distribution-free chain ladder model, see Section 5, as well as the models stated in Appendix A, which cover e.g. [11,12]. Given Assumption 4.1 we write $h(\beta;\mathcal F_0)$ for $h((\beta, z);\mathcal F_0)$ for an arbitrary $z$.
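Recall from Section 2 that $\mathrm{MSEP}_{\mathcal F_0}(X,\hat X)$ is approximated by (6), which in turn has a computable estimator (7). Under Assumption 4.1, the gradient of $h$ with respect to $\sigma$ vanishes, and therefore (6) simplifies as follows:
$$\mathrm{MSEP}^{*,\nabla}_{\mathcal F_0}(X,\hat X) = \mathrm{Var}(X \mid \mathcal F_0) + \nabla h(\beta;\mathcal F_0)'\,\mathrm{Cov}(\hat\beta^* \mid \mathcal F_0)\,\nabla h(\beta;\mathcal F_0).$$

4.1. Specification of $\hat\theta^*$. Recall from Section 2 that we introduced the two independent and identically distributed stochastic processes $(S_t)_{t\in\mathcal T}$ and $(S_t^\perp)_{t\in\mathcal T}$, where the former is the one generating data that can be observed. In the current setting we have a parallel universe (another independent runoff triangle) with development-year dynamics
$$\tilde Y^\perp_{j+1} = \tilde A^\perp_j \beta_j + \sigma_j \tilde D^\perp_j \tilde e^\perp_{j+1}, \qquad j = 1, \dots, J-1.$$
If the unconditional specification of $\hat\theta^*$ is chosen, i.e. $\hat\theta^{*,u} = \hat\theta^\perp$, then
$$\hat\beta^{*,u}_j = \hat\beta^\perp_j = \big(A_j^{\perp\prime} (\Sigma_j^\perp)^{-1} A_j^\perp\big)^{-1} A_j^{\perp\prime} (\Sigma_j^\perp)^{-1} Y^\perp_{j+1},$$
i.e. simply the weighted least squares estimator applied to the data in the independent triangle with identical features as the observable one. It follows directly from Proposition 4.1 that $\mathrm{Cov}(\hat\beta^\perp_j, \hat\beta^\perp_k) = 0$ for $j \neq k$. It is also clear that these unconditional covariances $\mathrm{Cov}(\hat\beta_j)$ are not possible to compute analytically.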
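On the other hand, if we specify $\hat\theta^*$ conditionally, then
$$\hat\beta^*_j := \big(A_j' \Sigma_j^{-1} A_j\big)^{-1} A_j' \Sigma_j^{-1} \big(A_j \beta_j + \sigma_j D_j e^\perp_{j+1}\big),$$
which is identical to $\hat\beta_j$ except that $e^\perp_{j+1}$ appears instead of $e_{j+1}$. Notice that this definition of $\hat\beta^*_j$ satisfies Assumption 2.1. Notice also that
$$E[\hat\beta^*_j \mid \mathcal F_0] = \beta_j, \qquad \mathrm{Cov}(\hat\beta^*_j \mid \mathcal F_0) = \sigma_j^2 \big(A_j' \Sigma_j^{-1} A_j\big)^{-1}, \qquad \mathrm{Cov}(\hat\beta^*_j, \hat\beta^*_k \mid \mathcal F_0) = 0 \ \text{ for } j \neq k.$$
Hence, $\Lambda(\sigma;\mathcal F_0)$ is block diagonal with blocks $\sigma_j^2 (A_j' \Sigma_j^{-1} A_j)^{-1}$, $j = 1, \dots, J-1$. Further, in Section 2 arguments were given for when the conditional specification of $\hat\theta^*$ resulting in $\Lambda(\sigma;\mathcal F_0)$ may be seen as an unbiased estimator of $\Lambda(\beta, \sigma)$, given by the corresponding unconditional $\hat\theta^*$, see (8). Within the class of models given by (11) this relation may be strengthened: Proposition 4.2 below tells us that $\Lambda(\hat\sigma;\mathcal F_0)$ is an unbiased estimator of $\mathrm{Cov}(\hat\beta)$ and an empirical estimator of $\mathrm{Cov}(\hat\beta)$ based on a single claims trapezoid.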
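The conditional specification can be illustrated by simulation: the observed design is held fixed, only the noise vector is redrawn from the independent copy, and the weighted least squares map is reapplied. The sketch below assumes weight matrix $D_j^2$ and, purely for illustration, standard normal noise (the model itself only fixes mean zero and variance one); the function names and all numerical inputs are hypothetical. It compares the empirical variance of the redrawn estimator with the closed form $\sigma_j^2 (A_j' D_j^{-2} A_j)^{-1}$.

```python
import numpy as np

def beta_star(A, d, beta, sigma, e_perp):
    """WLS estimator applied to A beta + sigma * d * e_perp, with A and d held fixed."""
    w = 1.0 / d**2
    AtWA = A.T @ (w[:, None] * A)
    y_star = A @ beta + sigma * d * e_perp
    return np.linalg.solve(AtWA, A.T @ (w * y_star))

def conditional_cov(A, d, sigma):
    """Closed-form conditional covariance sigma^2 (A' D^{-2} A)^{-1}."""
    w = 1.0 / d**2
    return sigma**2 * np.linalg.inv(A.T @ (w[:, None] * A))

# Hypothetical observed column of cumulative claims and parameter values.
C_j = np.array([100.0, 110.0, 120.0])
A, d = C_j[:, None], np.sqrt(C_j)
beta, sigma = np.array([1.5]), 0.5

rng = np.random.default_rng(0)
draws = np.array([beta_star(A, d, beta, sigma, rng.standard_normal(C_j.size))[0]
                  for _ in range(20000)])

empirical_var = draws.var(ddof=1)
analytic_var = conditional_cov(A, d, sigma)[0, 0]   # equals sigma^2 / C_j.sum()
```

Stacking such blocks for all development years along the diagonal gives the matrix used in the estimation-error term; the normal noise is only a convenience for the simulation and is not required by the moment assumptions.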
The proof of Proposition 4.2 is given in the appendix. Moreover, in Appendix B we have collected a number of asymptotic results where it is shown that, given suitable regularity conditions, $\mathrm{Cov}(\hat\beta_j)$ and $\mathrm{Cov}(\hat\beta_j \mid B_j)$ will converge to the same limit as the number of accident years tends to infinity, see Proposition B.1. This implies that, given a sufficient amount of data, the two views on estimation error will result in conditional MSEP estimates that are close. In Section 5 this is shown to be the case in an illustration based on real data.

Mack's distribution-free chain ladder
The classical chain ladder reserving method is a prediction algorithm for predicting the ultimate claim amount. In order to justify the use of this method and in order to measure the prediction accuracy, Mack introduced, in [14], conditions that should be satisfied by the underlying model. The chain ladder method with Mack's conditions is referred to as Mack's distribution-free chain ladder model. We will see that this setting is compatible with the development-year dynamics (11) in Section 4 and we will show in Proposition 5.1 that the estimator of $\mathrm{MSEP}_{\mathcal F_0}(U,\hat U)$ from Section 3.1 calculated according to Definition 2.3 coincides with the celebrated estimator of $\mathrm{MSEP}_{\mathcal F_0}(U,\hat U)$ provided by Mack in [14].
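In accordance with Mack's distribution-free chain ladder model, assume that, for $j = 1, \ldots, J-1$, there exist constants $f_j > 0$, called development factors, and constants $\sigma_j^2 \ge 0$ such that
$$E[C_{i,j+1} \mid C_{i,j}, \ldots, C_{i,1}] = f_j C_{i,j}, \qquad (18)$$
$$\mathrm{Var}(C_{i,j+1} \mid C_{i,j}, \ldots, C_{i,1}) = \sigma_j^2 C_{i,j}, \qquad (19)$$
where $i = i_0, \ldots, J$. Moreover, assume that
$$\{C_{i_0,1}, \ldots, C_{i_0,J}\}, \ldots, \{C_{J,1}, \ldots, C_{J,J}\} \text{ are independent.} \qquad (20)$$
Notice that the claim amounts during the first development year $I_{i_0,1}, \ldots, I_{J,1}$ are independent but not necessarily identically distributed.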
Mack's distribution-free chain ladder fits into the development-year dynamics (11) in Section 4 as follows: for j = 1, . . . , J − 1, set p j = 1, β j = f j , where diag[a] denotes a diagonal matrix with diagonal [a]. Notice that this choice of (Y j+1 , A j , Σ j ) corresponds to a special case of (38) of Assumption A.1. Therefore, the statement of Assumption 4.1 holds.
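The correspondence can be checked numerically. The following is a minimal sketch, assuming the textbook weighted least-squares formula is the one referred to as (13); the claim amounts and the names `A`, `Y`, `Sigma`, `sigma2` are hypothetical and chosen only for illustration. It shows that, with a single regressor and $\Sigma_j$ equal to the diagonal matrix of current cumulative claims, the weighted least-squares estimator reduces to a ratio of column sums.

```python
import numpy as np

# Hypothetical cumulative claims for one development year j (three accident years).
A = np.array([[100.0], [110.0], [120.0]])      # C_{i,j}, the single regressor (p_j = 1)
Y = np.array([150.0, 170.0, 175.0])            # C_{i,j+1}, the response
Sigma = np.diag(A[:, 0])                       # Sigma_j = diag[C_{i,j}]

# Textbook weighted least squares: (A' Sigma^{-1} A)^{-1} A' Sigma^{-1} Y.
Sigma_inv = np.linalg.inv(Sigma)
gram = A.T @ Sigma_inv @ A
beta_hat = np.linalg.solve(gram, A.T @ Sigma_inv @ Y)[0]

# Classical chain ladder development factor: ratio of column sums.
f_hat = Y.sum() / A[:, 0].sum()

# Conditional variance of the estimator for an assumed value of sigma_j^2.
sigma2 = 0.5
cond_var = sigma2 * np.linalg.inv(gram)[0, 0]  # equals sigma2 / sum_i C_{i,j}
```

Here `beta_hat` and `f_hat` agree, and `cond_var` is the assumed variance parameter divided by the column sum, which is the quantity that enters the diagonal of the conditional covariance matrix.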

Remark 2.
For the elements of Σ j to have positive diagonal elements we need the additional condition In this case, conditional on C i,j , {e j+1 } i is simply a translated lognormal random variable, translated by −f j C 1/2 i,j /σ j , with zero mean and unit variance.

Notice that $\hat f_j = \sum_{i=i_0}^{J-j} C_{i,j+1} \big/ \sum_{i=i_0}^{J-j} C_{i,j}$, which coincides with the classical chain ladder development factor estimator and is hence a standard weighted least-squares estimator for the model (11). Furthermore, and similarly for Using the tower property of conditional expectations together with (18) and (20) it is straightforward to verify that

In order to calculate MSEP for the ultimate claim amount following Lemma 2.1, we need to obtain expressions for process (co)variances and the Q i,j s given by The process variances are given in [14,Thm. 3,Cor.] and follow by using variance decomposition, the tower property of conditional expectations, (18), (19) and (20), and may, after simplifications, be expressed as For detailed calculations, see [14,Thm. 3,Cor.]. Further, letting Thus, if we set we see that

If we turn to the calculation of Q i,j ( θ; F 0 ) we see that for i = 2, . . . , J and j = 1, . . . , J − 1 and that where {Λ(σ; F 0 )} i,j = 0 for all i ≠ j. Hence, and it follows by direct calculations that Thus, from Lemma 2.1 it follows that for a single accident year i, which is equivalent to [14,Thm. 3]. We state this result together with the corresponding result for the total ultimate claim amount in the following proposition:

Proposition 5.1. In the setting of Mack's distribution-free chain ladder, where Γ U i,J is given by (24) and ∆ U i,J is given by (27).

The remaining part of the proof is given in Appendix C and amounts, due to

The figures in Appendix D illustrate the differences between conditional MSEP estimates when using the conditional specification and when using the unconditional specification of f * .
The illustrations are based on simulations and data from [14]. It is seen that the two sets of estimates are essentially indistinguishable.
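For readers who want to experiment, the classical single-accident-year formula of [14, Thm. 3] can be sketched in a few lines. This is an illustration under stated assumptions, not the computation used for the figures: the 4 x 4 triangle is hypothetical, the helper names `chain_ladder_fit` and `mack_msep` are made up, indices are zero-based with $i_0 = 1$, and the last variance parameter, which cannot be estimated from a single row, is simply set equal to the previous one.

```python
import numpy as np

def chain_ladder_fit(C):
    """Development factors and variance parameters from a J x J cumulative
    triangle (np.nan below the anti-diagonal). Requires J >= 3."""
    J = C.shape[0]
    f = np.empty(J - 1)
    sigma2 = np.empty(J - 1)
    for j in range(J - 1):
        n = J - 1 - j                          # rows where columns j and j+1 are observed
        prev, nxt = C[:n, j], C[:n, j + 1]
        f[j] = nxt.sum() / prev.sum()
        if n > 1:
            sigma2[j] = (prev * (nxt / prev - f[j]) ** 2).sum() / (n - 1)
        else:
            sigma2[j] = sigma2[j - 1]          # ASSUMPTION: reuse previous value for the tail
    return f, sigma2

def mack_msep(C, f, sigma2):
    """Mack's estimator of conditional MSEP of the ultimate claim amount,
    one value per accident year (zero for the fully developed first year)."""
    J = C.shape[0]
    msep = np.zeros(J)
    for i in range(1, J):
        last = J - 1 - i                       # last observed development column of row i
        c_hat = C[i, last]                     # running prediction of C_{i,k}
        acc = 0.0
        for k in range(last, J - 1):
            col_sum = C[: J - 1 - k, k].sum()  # observed claims used to estimate f_k
            acc += sigma2[k] / f[k] ** 2 * (1.0 / c_hat + 1.0 / col_sum)
            c_hat *= f[k]
        msep[i] = c_hat ** 2 * acc
    return msep

# Hypothetical cumulative triangle.
C = np.array([
    [100.0, 150.0, 165.0, 170.0],
    [110.0, 170.0, 185.0, np.nan],
    [120.0, 175.0, np.nan, np.nan],
    [130.0, np.nan, np.nan, np.nan],
])
f_hat, sigma2_hat = chain_ladder_fit(C)
msep = mack_msep(C, f_hat, sigma2_hat)
```

The term `1.0 / c_hat` inside the sum carries the process variance and the term `1.0 / col_sum` carries the estimation error, so the two contributions can be inspected separately by dropping one of them.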
Before ending the discussion of conditional MSEP estimation for the ultimate claim amount, recall that the conditional MSEP can be split into one process variance part and one estimation error part. In [14] all process variances are calculated without using any approximations, and the estimation error is calculated exactly up until a final step where (p. 219 in [14]) "...we replace $S_k^2$ with $E(S_k^2 \mid B_k)$ and $S_j S_k$, $j < k$, with $E(S_j S_k \mid B_k)$". This last step may, as noted already in [5], be seen as a specific choice of $f^*$, following the general approach in the present paper. Given this specific choice of $f^*$, the calculations carried out in [14] are exact. However, the implicit choice of $f^*$ used in [14] is different from the one used in the present paper, since Proposition 5.1 relies on a certain Taylor approximation. In [5] an exact MSEP calculation for the ultimate claim amount is carried out using a choice of $f^*$ which is identical to that used in the present paper. Moreover, from the calculations in [5] it is clear that the Taylor approximation used in Proposition 5.1 will result in underestimation with respect to the specific choice of $f^*$ used in the current paper. For further details, see [5] as well as the discussion in [15].
We will now provide the necessary building blocks needed in order to be able to arrive at the estimator of conditional MSEP for the CDR following Section 3.2 using Definition 2.3. This will be done using the same notion of conditional MSEP for both the ultimate claim amount and for CDR which, as introduced in Section 2, is the F 0 -conditional expectation of the squared distance between a random variable and its F 0 -measurable predictor, as well as the same estimation procedures.
We now proceed with the derivation of the estimator of conditional MSEP for the CDR in the chain ladder setting, in complete analogy with the corresponding derivation of the estimator of conditional MSEP for the ultimate claim amount. Note that many of the partial results needed for the computation of our suggested estimator of conditional MSEP for the CDR can be found in [16,26,28]. The results in the mentioned papers do, however, use a different indexation than that used in [14], which is the indexation used in the present paper. Due to this, we have rephrased all results for the CDR-calculations in terms of the indexation used in [14].
As before, let h(θ; F 0 ) denote the theoretical predictor, but now w.r.t. CDR:
In order to calculate conditional MSEP for the CDR, we again make use of Lemma 2.1. The plug-in estimator of the process variance for a single accident year, one of the two terms of the estimator of conditional MSEP, is derived in [28], where Γ CDR i,J The process variance for all accident years is given by which may be written as Hence, it follows that where which corresponds to [26,Eq. (3.4)], then Combining the above, using Lemma 2.1, gives that MSEP F0 (CDR i , CDR i ), given by Definition 2.3, simplifies to Note that by using the linearisation of the process variance used in [26, Eq. (A.1)] it follows that it in turn follows that (35) reduces to Result 3.1, Eq. (3.9), in [26]. Notice that our estimator of conditional MSEP coincides with that in [26] despite the quite different logics of the two approaches for deriving the estimator. The derivation of Result 3.1 in [26] is based on perturbing the initial f j s, i.e. the f (0) j , that in our setting are a part of the basis of prediction and therefore may not be perturbed. That the two approaches give estimators that coincide is due to the underlying symmetry MSEP F0 (CDR i , CDR i ) = MSEP F0 ( CDR i , CDR i ) and the fact that the CDR-quantities are multilinear in the model parameters.
Furthermore, the MSEP calculations for the CDR aggregated over all accident years follow the same steps as those used for the derivation of the corresponding MSEP calculations for the ultimate claim amount verbatim. The only resulting difference is the necessity to keep track of covariance terms across accident years. That is, we will get contributions of the form when i < k, which by introducing allows us to summarize the results obtained in the following proposition: In the setting of Mack's distribution-free chain ladder, As noted in the discussion leading up to Proposition 5.2, the proof is identical to that of Proposition 5.1 in all aspects, except for the covariance terms, see Appendix C for details. Again, in analogy with the situation for a single accident year, using the process (co)variance approximation following [26, Eq. (A.1)], it is seen that Proposition 5.2 will coincide with Result 3.3 in [26]. Even though the results from Proposition 5.2, given the mentioned approximation, will coincide with those obtained in [26,Result 3.3], the underlying estimation procedures differ. The procedure advocated here for the CDR is consistent with that for the ultimate claim amount and is straightforward to apply.
As mentioned in Section 3.2, the primary purpose of the current section was to illustrate how the introduced methods can be applied to different functions of the future development of the underlying stochastic process, here the ultimate claim amount and the CDR. In the next, and final, section, we illustrate how the general approach to calculating conditional MSEP introduced in the present paper applies to other reserving methods.

Applications to non-sequential reserving models
We will now demonstrate that the general approach to estimation of conditional MSEP presented in Section 2 also applies when the model is quite different from the sequential conditional linear models considered in Section 4. We will show how to compute conditional MSEP estimates for the ultimate claim amount for the overdispersed Poisson chain ladder model, see e.g. [13,8]. The overdispersed Poisson chain ladder model is based on the following assumptions:
$$E[I_{i,j}] = \mu_{i,j}, \qquad \mathrm{Var}(I_{i,j}) = \phi\,\mu_{i,j}, \qquad \log(\mu_{i,j}) = \eta + \alpha_i + \beta_j,$$
where i, j = 1, . . . , J and α 1 = β 1 = 0. The model parameters may be estimated using standard quasi-likelihood theory and the natural predictor for the ultimate claim amount for accident year i is given by where θ = (η, {α i }, {β k }).

We may use Lemma 2.1 to calculate conditional MSEP for the ultimate claim amount. Firstly, due to independence across all indices, Secondly, in order to determine the Q i,j ( θ; F 0 )s we need the partial derivatives of h i (θ; F 0 ) which are given by Hence, are independent of F 0 , and in particular By combining the above relations together with Lemma 2.1 it follows that the estimator of conditional MSEP in Definition 2.3, applied to the ultimate claim amount, is given by and takes the form

What remains for having a computable estimator of conditional MSEP for the ultimate claim amount is to compute the covariance matrix Λ(θ) = Cov( θ). Notice that the estimator (37) corresponds to the general conditional MSEP estimator upon choosing θ * as an independent copy θ ⊥ of θ, which gives Notice also that, since the overdispersed Poisson chain ladder model relies on quasi-likelihood theory, we do not have access to an explicit expression for the covariance of the parameter estimators. However, no such explicit expression is needed since a numerical approximation is easily obtained as output of a standard quasi-Poisson GLM-fit. That is, using standard numerical procedures for approximating the covariance matrix, e.g.
GLM-fitting procedures, one obtains a non-simulation based procedure for estimation of the conditional MSEP for the ultimate claim amount. Further, since quasi-likelihood estimators are M-estimators, see e.g. [24,Ch. 5], these can be shown to be consistent given certain regularity conditions. This motivates neglecting possible bias when using Definition 2.3. Another alternative is, of course, to introduce a bias correction, see e.g. [12]. Another observation concerning the conditional MSEP estimator (37) for the overdispersed Poisson chain ladder model is the following: The proof follows by noting that all ∇g i ( θ) are functions of ∇µ i,j ( θ)s and where η i,j := log(µ i,j ). See also [8,Eq.  Notice that due to Lemma 2.1 the semi-analytical estimator (37) is valid for any non-sequential GLM-based reserving model.
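The non-simulation-based procedure described above can be sketched as follows. This is a hedged illustration in the spirit of (37), not the authors' implementation: the fit is done with plain Fisher scoring in NumPy rather than a GLM library so that the block is self-contained, the dispersion is estimated by the Pearson statistic (an assumption), the function name `odp_msep` and the incremental triangle are hypothetical, and the quantity computed is for the total over all accident years.

```python
import numpy as np

def odp_msep(I):
    """Quasi-Poisson fit of log(mu_ij) = eta + alpha_i + beta_j to a J x J
    incremental triangle (np.nan where unobserved). Returns the predicted
    outstanding amount, a plug-in process variance and a delta-method
    estimation error for the total."""
    J = I.shape[0]

    def design_row(i, j):
        x = np.zeros(2 * J - 1)
        x[0] = 1.0                             # intercept eta
        if i > 0:
            x[i] = 1.0                         # alpha_i, with alpha_1 = 0
        if j > 0:
            x[J - 1 + j] = 1.0                 # beta_j, with beta_1 = 0
        return x

    observed = [(i, j) for i in range(J) for j in range(J - i)]
    future = [(i, j) for i in range(J) for j in range(J - i, J)]
    X = np.array([design_row(i, j) for i, j in observed])
    y = np.array([I[i, j] for i, j in observed])

    theta = np.zeros(2 * J - 1)
    theta[0] = np.log(y.mean())
    for _ in range(100):                       # Fisher scoring for the log link
        mu = np.exp(X @ theta)
        step = np.linalg.solve(X.T @ (mu[:, None] * X), X.T @ (y - mu))
        theta += step
        if np.max(np.abs(step)) < 1e-12:
            break

    mu = np.exp(X @ theta)
    phi = np.sum((y - mu) ** 2 / mu) / (len(y) - len(theta))   # Pearson dispersion
    cov_theta = phi * np.linalg.inv(X.T @ (mu[:, None] * X))   # approximate Cov(theta_hat)

    X_fut = np.array([design_row(i, j) for i, j in future])
    mu_fut = np.exp(X_fut @ theta)
    grad = X_fut.T @ mu_fut                    # gradient of the sum of future means
    process_var = phi * mu_fut.sum()
    est_err = grad @ cov_theta @ grad
    return mu_fut.sum(), process_var, est_err

# Hypothetical incremental triangle.
I = np.array([
    [100.0, 50.0, 15.0, 5.0],
    [110.0, 60.0, 15.0, np.nan],
    [120.0, 55.0, np.nan, np.nan],
    [130.0, np.nan, np.nan, np.nan],
])
reserve, process_var, est_err = odp_msep(I)
msep_total = process_var + est_err
```

The gradient does not involve the observed data beyond the fitted parameters, which mirrors the remark that the partial derivatives are independent of $F_0$. In practice `cov_theta` would be read off the output of a standard quasi-Poisson GLM routine.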
The above example of calculating a semi-analytical expression for the estimator of conditional MSEP for the ultimate claim amount according to Definition 2.3 for the overdispersed Poisson chain ladder model can of course be extended to more complex models as long as it is possible to compute (i) h(θ; F 0 ) together with its partial derivatives, (ii) (an approximation of) a suitable, conditional or unconditional, covariance matrix of θ. One example of a more complex GLM-based reserving model is the one introduced in [25], which is based on one triangle for observed counts and one triangle for incremental payments. In this model the counts are modelled as an overdispersed Poisson chain ladder model, and the incremental payments are modelled as a quasi-Poisson GLM model conditional on counts. Due to the overall quasi-Poisson structure of the model it is possible to obtain explicit expressions for the predictor of the ultimate claim amount, together with the corresponding process variance, but where F 0 now also contains information concerning observed counts. The conditional MSEP for the ultimate claim amount can again be calculated using Lemma 2.1.
Furthermore, the general exposition of the methods introduced in the present paper does not rely on the data generating process being defined in terms of runoff triangles. Examples of another type of model are the continuous time point process models treated in e.g. [17,3]. These models rely on extensive stochastic simulations in order to be used in practice. One simple special case of a point process model, for which the quantities needed for a semi-analytical MSEP estimator for the ultimate claim amount according to Definition 2.3 can be calculated, is the model described in Section 8.A in [17]. Hence, it is again possible to use Lemma 2.1 to calculate the conditional MSEP of the ultimate claim amount.
The above examples provide semi-analytical MSEP estimators which rely only on our being able to calculate certain expected values and (co)variances. One advantage of this approach is that there is no need for simulation-based techniques in order to carry out MSEP calculations.
Remark 3. The models with intercepts defined by (39) and (41) require that the payment data is normalized by an exposure measure before any statistical analysis. The normalization may correspond to dividing all payments stemming from a given accident year by the number of written insurance contracts that accident year.
where each coefficient a i,j is either 0 or a finite product of distinct β-parameters β jk for j ∈ {1, . . . , J − 1} and k ∈ {1, . . . , p j }. In particular, E[U | F 0 ] is an F 0 -measurable multi-affine function in the parameters β jk , an expression of the form c + dβ jk . Under Assumption A.2, using the tower property of conditional expectations, where each coefficient b i,j is either 0 or a finite product of distinct β-parameters β jk for j ∈ {1, . . . , J − 1} and k ∈ {1, . . . , p j }. In particular, E[U | F 0 ] is again an F 0 -measurable multi-affine function in the parameters β jk , an expression of the form c + dβ jk .
It is clear that each of Assumption A.1 and A.2 implies that the statement in Assumption 4.1 holds.

Appendix B. Asymptotic properties of conditional weighted least squares estimators
The following result motivates the approximation of Cov( β j ) by Cov( β j | B j ), and hence also the approximation of Cov( β) by Λ(σ; F 0 ), by asymptotic arguments, corresponding to letting the number of accident years in the available data set tend to infinity.
The proof of Proposition B.1 is given in the appendix and relies on that the conditional covariance may be written in the form of weighted sums of independent random variables.
Remark 5. Conditions (i)-(iii) are technical conditions that can be verified given additional mild assumptions, essentially existence of higher order moments, on the development-year dynamics in (11). The conditions can be simplified if it is assumed that the development-year dynamics for different accident years are identical, corresponding to identically distributed rows for A j and Σ j . Condition (iii) is equivalent to the existence of an invertible p j × p j matrix ν j such that If the rows of A j and Σ j are identically distributed, then has an invertible covariance matrix. Remark 6. Proposition B.1 provides the asymptotic behavior of Cov( β) and Λ(σ; F 0 ) as the number of accident years in the available data set tends to infinity. Proposition B.1 can be extended to also address the asymptotic behavior of Λ( σ; F 0 ) by considering conditions ensuring consistency and a certain rate of convergence for the estimators σ 2 j . We will not analyze such conditions in this paper.
Proof of Lemma 2.1. Recall from Definition 2.3 that it is possible to split the conditional MSEP approximation into a process variance part and an estimation error part. Thus, given that $X = \sum_{i \in I} X_i$, it follows that the process variance may be expressed as and, if it in addition holds that $h(\theta; F_0) = \sum_{i \in I} h_i(\theta; F_0)$, the estimation error part of (7) may be re-written according to

Proof of Statement (ii): Let $Z_{j+1} := \Sigma_j^{-1/2} Y_{j+1}$ and $C_j := \Sigma_j^{-1/2} A_j$ and rewrite the weighted linear model (12) as $Z_{j+1} = C_j \beta_j + \sigma_j e_j$. Notice that It now follows from Theorem 3.3 in [21] that $E[\hat\sigma_j^2 \mid B_j] = \sigma_j^2$ holds for $j = 1, \ldots, J-1$ given that $i_0 \le J - j - p_j$.
Proof of Proposition 4.2. Covariance decomposition together with (14) gives on the one hand On the other hand, using Proposition 4.
Proof of Proposition B.1. The constant parameter σ j is irrelevant for the argument of the proof and therefore here set to 1. Notice that, for i, k ∈ {1, . . . , p j }, where the terms are independent since A j and Σ j have independent rows. Further, by assumption (ii) it follows that, for i, k ∈ {1, . . . , p j }, This allows us to use Corollary 4.22 in [10], i.e. that, for i, k ∈ {1, . . . , p j }, → ν j as n j → ∞.
Since ν j is invertible, the latter convergence implies n j ( From the proof of Proposition 4.2 we know that $\mathrm{Cov}(\hat\beta_j) = E[\mathrm{Cov}(\hat\beta_j \mid B_j)]$. The assumed uniform integrability and Proposition 4.12 in [10] give

Proof of Corollary B.1. We start by proving that $\hat\beta \xrightarrow{P} \beta$ as $|I| \to \infty$. By Proposition 4.1, $\hat\beta$ is an unbiased estimator of $\beta$. Now Markov's inequality combined with Proposition B.1 immediately gives consistency: for k ∈ {1, . . . , p j } and any ε > 0, We continue by showing that |I|Λ(σ; F 0 ) converges in probability as |I| → ∞. First, from Proposition B.1 we know that $|I|\Lambda(\sigma; F_0) \xrightarrow{P} C$ as $|I| \to \infty$, where $C$ is block diagonal with blocks $\nu_j^{-1}$. From this, (15) and the assumption that $\hat\sigma_j^2 \xrightarrow{P} \sigma_j^2$ as $|I| \to \infty$ for all $j = 1, \ldots, J-1$, an application of Slutsky's theorem yields $|I|\Lambda(\hat\sigma; F_0) \xrightarrow{P} C$ as $|I| \to \infty$. Further, $h$ is only a function of elements in either $(I_{ij})_{i \ge 2, j \in J}$ or $(C_{ij})_{i \ge 2, j \in J}$ and thus it follows that, for a fixed $J$, $h$ is independent of $|I|$. Therefore $\beta \mapsto \nabla_\beta h(\beta; F_0)$ does not depend on $|I|$. Moreover, from Remark 4, each component of $\nabla_\beta h(\hat\beta; F_0)$ is either constant or a multi-affine function of the components of $\hat\beta$, i.e. a sum of products of the components of $\hat\beta$. Therefore, since $\hat\beta \xrightarrow{P} \beta$ as $|I| \to \infty$, we can use the continuous mapping theorem to conclude that as |I| → ∞. Putting it all together we have

Proof of Proposition 5.1. The proof of MSEP for the ultimate claim amount for a single accident year is already given in Section 5 in the text leading up to the statement of Proposition 5.1. We will now go through the remaining steps needed in the derivation of MSEP for the ultimate claim amount aggregated over all accident years.
In Section 5 we provided the process variance, see (23), hence, following Lemma 2.1, what remains to determine are the Q i,k ( θ; F 0 )s: where {Λ( σ; F 0 )} i,j = 0 for all i ≠ j. By using the above, for i ≤ k, it follows that where ∆ U i,J is given by (27). Given the above, the statement in Proposition 5.1 follows by using Lemma 2.1.
Proof of Proposition 5.2. As in the proof of Proposition 5.1, the process (co)variances are obtained from the references given in the text leading up to the formulation of Proposition 5.2. Thus, given Lemma 2.1, what remains to determine are the Q i,k ( θ; F 0 )s: where ∇ f h i ( f ; F 0 ) is given by (33), which may be expressed as and Thus, for all i ≤ k it holds that is given by (34) and χ CDR i,J is given by (36). Finally, Proposition 5.2 follows by combining the above together with the corresponding process (co)variances and Lemma 2.1.

Appendix D. Numerical Example
In this section a simulation study is presented whose purpose is to analyze and compare the two estimators of conditional MSEP based on the conditional and unconditional specification of θ * . The data used is the runoff triangle of [23], see Table 1, which has been widely used and analyzed, e.g. in [14]. The performance of the two estimators of conditional MSEP, based on this particular data set, is examined by estimating, through simulations, E[∆V 2 ] as specified in Section 2.1. The practical relevance of computing these estimators is investigated by comparing the size of the estimation error to the size of the process variance.
The data generating process in the simulation study is assumed to be a sequence of general linear models of the form in (11) in Section 4. More specifically, for each $i \in I$, it is assumed that
$$C_{i,1} = \alpha + \tau e_{i,1}, \qquad C_{i,j+1} = f_j C_{i,j} + \sigma_j \sqrt{C_{i,j}}\, e_{i,j+1}, \quad j = 1, \ldots, J-1.$$
The error terms are given by Remark 2, i.e. by translated log-normal variables, which also holds for the first column by setting C i0 := 1 for all i ∈ I.
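A single triangle from this data generating process can be simulated along the following lines. The sketch assumes the interpretation suggested by Remark 2: since the error is a translated lognormal variable with zero mean and unit variance, each new cumulative amount is itself lognormal with conditional mean $f_j C_{i,j}$ and conditional variance $\sigma_j^2 C_{i,j}$, and the first column is lognormal with mean $\alpha$ and variance $\tau^2$. The function names and all parameter values are hypothetical, and a full triangle (rather than a trapezoid) is generated.

```python
import numpy as np

def lognormal_with_moments(mean, var, rng):
    """Draw one lognormal variate with the given mean and variance."""
    s2 = np.log1p(var / mean ** 2)             # variance on the log scale
    return np.exp(np.log(mean) - 0.5 * s2 + np.sqrt(s2) * rng.standard_normal())

def simulate_triangle(f, sigma2, alpha, tau2, rng):
    """Simulate a J x J cumulative triangle, np.nan where unobserved."""
    J = len(f) + 1
    C = np.full((J, J), np.nan)
    for i in range(J):
        C[i, 0] = lognormal_with_moments(alpha, tau2, rng)
        for j in range(J - 1 - i):             # row i is observed up to column J - 1 - i
            C[i, j + 1] = lognormal_with_moments(
                f[j] * C[i, j], sigma2[j] * C[i, j], rng
            )
    return C

# Hypothetical parameter values, not those fitted to Table 1.
f = np.array([1.5, 1.1, 1.05])
sigma2 = np.array([2.0, 1.0, 0.5])
rng = np.random.default_rng(0)
tri = simulate_triangle(f, sigma2, alpha=100.0, tau2=25.0, rng=rng)
```

Repeating the call with the same parameters and refitting the chain ladder model to each simulated triangle gives a Monte Carlo sample of parameter estimates of the kind used in the study.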
The parameter values used in the simulation study are the ones acquired from fitting this model to the data in Table 1 following the weighted least squares estimation introduced in Section 4, see (13) and (16). As seen in Section 5, this is equivalent to fitting a chain ladder model to this triangle together with estimating an intercept and a variance for the first column (using the sample mean and the unbiased sample variance of the first column). The resulting parameter estimates are taken to be the true parameter values in the simulation study; they are denoted by $f$, $\sigma^2$, $\alpha$ and $\tau^2$, and referred to jointly as $\theta$. To be able to use the unbiased estimators of the $\sigma_j^2$s, the last column of the triangle is removed. An alternative to this approach could be to use maximum likelihood or some form of extrapolation of the $\sigma_j^2$s. Since comparison of methods for estimating tail variances is not the purpose of the simulation study, the former simpler approach is chosen.

Based on the above development-year dynamics and $\theta$, $N = 10^6$ new triangles are generated giving rise to $\{F_0^{(i)}\}_{i=1}^N$. For each such triangle, a chain ladder model is fitted together with an intercept and variance for the first column, as described above, to get the parameter estimator $\hat\theta^{(i)}$. For $i = 1, \ldots, N$, the following quantities are computed:
• the (true) process variance $\mathrm{Var}(U^{(i)} \mid F_0^{(i)})$, given in (22), and the plug-in estimator Var( , ( σ (i) ) 2 ) given in (23)
• $\Delta V_i^2$ for the two resampling specifications as given in Section 2.1.

As already mentioned, $\mathrm{Cov}(\hat f)$ is not analytically tractable and is therefore estimated using simulations. Recall, from Proposition 4.  ). The choice of $M_i$ is as follows. For a fixed $n$, consider the increasing sequence $(2^{k-1} n)_{k \ge 1}$. Conditional on not having stopped for the value $k$, $2^{k+1} n$ new triangles are generated based on the parameters θ are computed as well as Upon stopping the two independent samples of size $2^k n$ are merged. Consequently $M_i = 2^{k+1} n$, where $k$ is the smallest number such that the stopping criterion is satisfied.
The results of the simulation study are the following. In Figure 1 the distribution of the difference between the simulated values of $\Delta V^2$ for the unconditional and the conditional specification of $\theta^*$ is illustrated. The distribution is leptokurtic, has a slight positive skewness and is approximately centered at zero. The mean and the median of this distribution are small relative to the scale of the data ($-0.94 \cdot 10^{22}$ and $0.28 \cdot 10^{22}$, respectively). To quantify the uncertainty in these quantities, 95% bootstrap confidence intervals are computed based on the percentile method, see [7], yielding $[-1.2, -0.7] \cdot 10^{22}$ and $[0.2, 0.3] \cdot 10^{22}$, respectively, using $10^5$ bootstrap samples. As a matter of fact, none of the bootstrap samples of the mean are above 0 and none of the samples of the median are below 0. This indicates that the unconditional specification is better on average (the mean is negative), but the conditional specification is better more often (the median is positive). The practical relevance of this is, however, questionable since, on the relative scale of the data, the mean and median are both approximately zero, indicating that the difference between the two estimators is negligible and that one should therefore focus on the computability of the estimators. In Figure 2 the ratio between the conditional and the unconditional estimators of the estimation error is shown. From this figure it is clear that the two estimators are comparable and do not deviate from each other by much.
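The percentile method used for these intervals is simple to reproduce. The following minimal sketch applies it to a hypothetical heavy-tailed sample standing in for the simulated differences; the function name `percentile_ci`, the sample, and the number of bootstrap replicates are illustrative assumptions and do not correspond to the values used in the study.

```python
import numpy as np

def percentile_ci(x, stat, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for stat(x)."""
    rng = np.random.default_rng(seed)
    # Each row of idx selects one resample of x, drawn with replacement.
    idx = rng.integers(0, len(x), size=(n_boot, len(x)))
    reps = np.array([stat(x[row]) for row in idx])
    lo, hi = np.quantile(reps, [(1 - level) / 2, (1 + level) / 2])
    return lo, hi

# Hypothetical stand-in for the differences between the two Delta V^2 values.
rng = np.random.default_rng(1)
diffs = rng.standard_t(df=3, size=500)

mean_ci = percentile_ci(diffs, np.mean)
median_ci = percentile_ci(diffs, np.median)
```

Comparing the sign of the endpoints of `mean_ci` and `median_ci` is the kind of check described above, where the interval for the mean lies below zero and the interval for the median lies above zero.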
The distribution of the difference between the ∆V 2 s is heavy tailed, and one is therefore led to question whether this is due to the log-normally distributed error terms. Therefore, the marginal distributions of the components of θ are illustrated in Figure 3 (first column parameters), Figure 4 (development factors) and Figure 5 (chain ladder variances). The estimators of the intercept of the first column and the development factors are, for all intents and purposes, marginally Gaussian. The variances, however, do have heavier tails (the standard deviations are illustrated in Figure 5). This can have a large effect on the estimated process variance, and thus in turn on the ∆V 2 s.

Figure 4. Kernel density estimators of the estimators of the development factors. Some density curves are cut in order to make it easier to visually discriminate between the development factors centered close to 1.
So far the relative performance of the two estimators has been presented. It is of interest to also investigate the absolute performance. Figure 6 shows the distributions of the true estimation error minus the estimated ones based on the conditional and unconditional specification of θ * . It is seen that there is a tendency to overestimate the true estimation error, although there is a tail to the right indicating that the estimation error will occasionally be greatly underestimated.
The mean estimation error in the simulations is $1.9 \cdot 10^{12}$ and the 95% quantiles of the two above distributions are approximately $5 \cdot 10^{12}$. The estimated estimation error will therefore, in the 95% worst case scenario, be underestimated on the scale of, approximately, 2.5 estimation error means. The practical relevance of estimating the estimation error requires that it is of size comparable to the process variance. Figure 7 shows the distributions of the estimated estimation errors divided by the estimated process variances, together with dashed black vertical lines indicating some of the quantiles of the distribution of the true estimation error divided by the true process variance. On average the estimation error is half the size of the process variance, which is also more or less the center of the distributions of the estimated versions. The median, however, of the true distribution lies approximately around 0.25. Therefore, it is as likely that the estimation error is greater than a quarter of the process variance as that it would be less than a quarter of the process variance.
Finally, to illustrate how plug-in estimation of the process variance performs, Figure 8 shows the distribution of the ratio between the estimated process variance (based on plug-in) and the true process variance. Both the mean and the median of this distribution lie close to 1, indicating that on average the estimator yields the correct variance and that we are more or less equally likely to overestimate it as to underestimate it. It is also seen that there are extreme cases where the variance is estimated to be either half or double the true variance.