Detection and treatment of outliers for multivariate robust loss reserving

Abstract Traditional techniques for calculating outstanding claim liabilities such as the chain-ladder are notoriously at risk of being distorted by outliers in past claims data. Unfortunately, the literature in robust methods of reserving is scant, with notable exceptions such as Verdonck & Debruyne (2011, Insurance: Mathematics and Economics, 48, 85–98) and Verdonck & Van Wouwe (2011, Insurance: Mathematics and Economics, 49, 188–193). In this paper, we put forward two alternative robust bivariate chain-ladder techniques to extend the approach of Verdonck & Van Wouwe (2011, Insurance: Mathematics and Economics, 49, 188–193). The first technique is based on Adjusted Outlyingness (Hubert & Van der Veeken, 2008. Journal of Chemometrics, 22, 235–246) and explicitly incorporates skewness into the analysis while providing a unique measure of outlyingness for each observation. The second technique is based on bagdistance (Hubert et al., 2016. Statistics: Methodology, 1–23) which is derived from the bagplot; however; it is able to provide a unique measure of outlyingness and a means to adjust outlying observations based on this measure. Furthermore, we extend our robust bivariate chain-ladder approach to an N-dimensional framework. The implementation of the methods, especially beyond bivariate, is not trivial. This is illustrated on a trivariate data set from Australian general insurers and results under the different outlier detection and treatment mechanisms are compared.


Motivation
At any moment, the number, timing and severity of future claims payments for a general insurer is shrouded in uncertainty.Reserves are set up to ensure, with a given degree of confidence, that necessary claims payments are met as they arise.The reserving problem is one of solvency, however it is also one of capital efficiency; if an insurer holds reserves over and above what is needed, they forfeit the opportunity to utilise this capital elsewhere, and insurance becomes more expensive than necessary.It is critical that the results from models and techniques applied to the loss reserving problem are as accurate as possible when tasked to a range of different data sets.
Some of these data sets may include abnormal observations; outliers or deviations from model assumptions.Full inclusion of these data points in an analysis may prove detrimental to the accuracy of reserve estimates and the resulting inference.This is an issue that needs to be addressed if these models and techniques are going to reflect reality and be used to inform decisions.Robustness refers to the ability of a model or estimation procedure to not be overtly influenced by outliers in the data set under investigation and/or deviations from the underlying assumptions of the model.Understanding how outliers may impact the results of a model and having statistically sound procedures to detect and treat abnormal observations will improve the robustness of reserving techniques and ultimately lead to more informed and reliable decisions.
While some authors have explored the issue of robustness in reserving, the body of literature in this area is relatively scant.Of particular importance for this paper is the robust GLM chain-ladder of Verdonck and Debruyne (2011) and the robust bivariate chain-ladder of Verdonck and Van Wouwe (2011), however there has been notable work that moves away from the chain-ladder technique.For robust non-chain ladder methods, please also see for example Chan and Choy (2003); Chan et al. (2008); Pitselis et al. (2015).Verdonck, Van Wouwe, and Dhaene (2009) provide a two-stage deterministic robust chain-ladder technique which fundamentally relies on the analysis of residuals given after fitting an over-dispersed Poisson (ODP) GLM to the cumulative and then incremental claims data as described in England and Verrall (1999).A boxplot is employed on the Pearson residuals after fitting the ODP GLM at each stage to detect outlying observations.Under their robust technique, development factors are calculated as medians which are known to be much more robust than means.Unfortunately, in its original formulation, results were poor as reserves were still being heavily influenced by outliers.
The approach of Verdonck, Van Wouwe, and Dhaene (2009) was refined by Verdonck and Debruyne (2011).In particular, it was found that the standard threshold value of 1.345 to identify outliers was often too low for triangular loss data.To combat this an additional stage was added to the methodology whereby the threshold point was taken to be the 75%-quantile of the residuals after an initial fit using the threshold value of 1.345 which led to better results in terms of robustness.Verdonck and Debruyne (2011) also showed that the influence function for reserves with respect to incremental claims is unbounded when assuming a Poisson GLM specification.This provides a formal basis for the non-robustness of the chainladder technique, and related techniques.A comprehensive study of impact functions of central estimates, mean squared errors, and quantiles of reserves can be found in Avanzi, Lavender, Taylor, and Wong (2023).

Statement of contributions
The refined robust GLM chain-ladder technique is an integral component of the robust bivariate chainladder as developed in Verdonck and Van Wouwe (2011).In particular, the residuals given after fitting the robust GLM chain-ladder are used to generate a bagplot (Rousseeuw, Ruts, and Tukey, 1999) which may be considered as a bivariate boxplot.The second approach employed to detect and treat outliers is the minimum covariance determinant (MCD) (Rousseeuw, 1984) Mahalanobis distance, also applied to the residuals.
Each of these two techniques have shortcomings that this paper aims to address.In particular, they fail to take skewness of the data sufficiently into account, which may lead to the misclassification of data points between regular and outliers.
We introduce two alternative methodologies that offer a consistent and structured approach to the detection, measurement and treatment of outlying observations with statistical backing.Our methods are based on Adjusted Outlyingness (Hubert and Van der Veeken, 2008) and bagdistance (Hubert, Rousseeuw, and Segaert, 2016).Adjusted Outlyingness provides a unique measure of outlyingness for each observation and explicitly incorporates a robust measure of skewness into the detection process.Bagdistance is derived from the bagplot and provides a measure of outlyingness for each observation.Through calculation of the bagdistance a greater variety of alternative treatments for outliers becomes available then when simply using the bagplot.These methodologies are applied and compared on real data.Improvements over the techniques introduced in Verdonck and Van Wouwe (2011) are shown to be material and significant.
Additionally, we extend the methodologies beyond the bivariate case.Multivariate reserving techniques primarily benefit from the consideration of the dependence structure between loss triangles.Such consideration should provide a more accurate estimation of reserves, however the impact of outliers can still be significant in this setting.Furthermore, information about other lines of business may inform the identification and adjustment of outliers.Hence appropriate detection techniques, and adjustment procedures help ensure that the benefits of multivariate reserving analysis are not lost due to a lack of robustness.This is also a focus of this paper, which develops a multivariate robust reserving technique.We extend the bivariate chain-ladder to an N-dimensional framework and illustrate the implementation of all four outlier detection and treatment techniques on a trivariate example.
Implementation of the methodology is not trivial, and R codes are freely available on GitHub for any interested user; see Section 5.

Outlier Detection Techniques
In Section 2.1 we start by reviewing some basic concepts related to robust statistics, which are required to introduce four detection techniques in Section 2.2.

Preliminary: General Robustness Concepts 2.1.1. Influence Functions and Breakdown Points
When considering robust methodologies, two common concepts are influence functions and breakdown points, which we define here for later reference: Influence functions consider an estimation technique and calculate the potential 'influence' a single data point could have on that estimator.For robustness it is desirable to have a bounded influence function such that no single data point can have an unlimited impact on estimators.An example of an estimator with an unbounded influence function is the mean whereas the median has a bounded influence function.
Breakdown points describe the proportion of the data set required to be contaminated (or outlying, however this may be defined) before an estimator provides inaccurate results.For example, when calculating the mean only one data point has to be contaminated to render the technique invalid, however, when calculating the median, 50% of data points need to be contaminated to invalidate the technique.
The halfspace depth (Tukey, 1975) of a point is defined as the minimum number of points (from the data sample) in a closed halfplane through the point of interest.We refer to Figure 1 which illustrates the concept of halfspace depth in the bivariate case (note that this is scalable to higher dimensions).In Figure 1, to calculate the halfspace depth of the point marked with a green asterisk, we would consider numerous lines through this point (such as the lines denoted as L 1 and L 2 in the Figure ) and look for the minimum number of points on either side of each of these lines.Romanazzi (2001) showed that halfspace depth has a bounded influence function.This means that the impact an outlier may have on the halfspace depth of a given observation is limited and highlights the robustness of this statistic.
The depth median (Tukey median) T * , is the point with the greatest halfspace depth.It represents a central point of the data and is marked by the red asterisks in Figure 2. If the point is not unique the centre of gravity of the deepest region is used.

Minimum Covariance Determinant (MCD) Estimation Procedure
A popular technique to estimate location and dispersion parameters in a robust fashion is the minimum covariance determinant (MCD) (Rousseeuw, 1984) procedure, whereby one finds the observations whose classical estimator of the covariance matrix has the smallest determinant where n represents the number of observations and p represents the dimension of the data.The variable h represents the number of observations that are ultimately used, and the procedure puts conditions around the minimum number of observations to avoid selection of a small set that doesn't truly represent the data.The location vector is then the arithmetic mean of these points and the scale matrix is taken as a multiple of this covariance matrix.Heuristically, this procedure identifies a 'core' set of data points which are mostly related to one another.

Swamping and Masking
Inappropriate outlier detection techniques may lead to swamping or masking.Swamping occurs when one classifies too many observations as outliers i.e. the impact outliers have on the technique being applied results in regular observations incorecctly appearing to be outliers.Masking, on the other hand, refers to the failure to classify outliers as such i.e. outliers have impacted the technique being applied in a way that 'masks' them from being detected as outliers.

Outlier Detection Techniques
A robust bivariate chain-ladder technique hinges on the detection and adjustment of bivariate outliers.Verdonck and Van Wouwe (2011) put forward two techniques to be used for this task: the MCD (Rousseeuw, 1984) Mahalanobis Distance, and the bagplot (Rousseeuw, Ruts, and Tukey, 1999).In this section we outline these approaches and two different techniques; the bagdistance and Adjusted Outlyingnessthat can be applied to the multivariate reserving problem and which address some of the shortcomings of the former two.

MCD Mahalanobis Distance
A standard method used to detect outliers in multivariate analysis is to calculate some measure of distance of each data point from the centre of the data.A popular measure is the Mahalanobis Distance (MD), given by, where µ µ µ represents the sample location vector and Σ Σ Σ represents the sample scale matrix.If these are not estimated in a robust manner then outliers may fail to be detected due to masking and swamping effects.Essentially this means that outlying observations will influence the estimate of the central point and dispersion of the data leading to outliers themselves having small MD (masking) and/or non-outlying observations having high MD (swamping).To combat this, the MCD estimates of location and dispersion are used, and outliers are then flagged as observations that have a MD greater than a certain threshold.Note that MD 2 ∼ χ 2 p if the underlying data is normal.

Bagplot
We refer to Figure 2 to outline the components of a bagplot.
The bag (B), is shown in Figure 2 by the darker inner area surrounding T* and is constructed as follows.First, denote D k as the region of all data points that have halfspace depth greater than k and #D k as the number of data points contained in this region.The bag is given by the linear interpolation with respect to T * of the two regions that satisfy #D k ≤ n 2 < #D k−1 .Here linear interpolation refers to the method of constructing new data points that are in the range of both of #D k and #D k−1 .
The fence is then given by multiplying the bag by some factor relative to T * .Typically this factor is three and Rousseeuw, Ruts, and Tukey (1999) point out that this value was chosen based on simulations.However, in their recent R package, mrfDepth, Segaert, Hubert, Rousseeuw, Raymaekers, and Vakili (2017) have instead opted to use the square root of the 99 th percentile of a chi-squared distribution where the degrees of freedom is equal to the number of dimensions being considered.This new fence factor was chosen after derivation of the distribution of the bagdistance (Hubert, Rousseeuw, and Segaert, 2015); see also the next section.In the two dimensional case it is very close to three.We will use this newly developed fence factor throughout the paper.
When displaying a bagplot, the fence is generally not drawn (Rousseeuw, Ruts, and Tukey, 1999).Figure 2, however, shows the same bagplot with the fence drawn in green and we can see that the four outlying points marked in red are outside this fence.Note that the choice of this fence factor will directly influence both the number of outliers detected and how they are adjusted.Data points outside the fence are considered outliers and are adjusted to facilitate the application of loss reserving techniques.The adjustment may be done in a purely graphical manner i.e. bringing outliers back to the fence or loop or a weighting function based on bagdistance may be employed.
The perimeter of the lighter blue area is know as the loop and is given by the convex hull of all nonoutlying points.This is analogous to the whiskers in a univariate boxplot (Tukey, 1977).

Bagdistance
Now we present the bagdistance (bd ) (Hubert, Rousseeuw, and Segaert, 2016).This statistic provides a measure of outlyingness for each observation and hence does not rely solely on graphical representations.It can also handle skewness often found in loss data.However, similarly to the bagplot, the bd utilises the bag to capture the shape of the data and subsequently detect outliers.As the bag is formulated based on circa 50% of the data points, there is potential that it does not fully encapsulate the skewness in the set.When this is the case, the ensuant outlier detection results may be flawed.To calculate the bd, firstly define c x as the intersections of the boundary of the bag (B) and the ray from the Tukey median T * through the points x (red lines in Figure 2).The bd is defined as follows

bd(x; P
where P n represents the distribution of the data set and ||.|| is the Euclidean norm such that ||x|| = x 2 1 + ... + x 2 n .The denominator scales the distance of the data point (x) to T * , relative to the dispersion of the bag (in that direction of the projected ray).A cut-off point is then set, such that data points with a bd beyond this threshold are considered outliers and are adjusted back to an appropriate point on the ray emanating from T * passing through x.Note that if the cutoff value is chosen to be the same as the fence factor under the bagplot technique, then the same outliers will be detected.

Adjusted Outlyingness
Adjusted Outlyingness (Hubert and Van der Veeken, 2008) is based on an adjustment of the Stahel-Donoho estimate of outlyingness (Donoho, 1982) to explicitly incorporate skewness.Furthermore, it is based on robust estimates of location, scale and skewness such that it achieves a theoretical breakdown point of 25%.Further highlighting its robustness, the adjusted outlyingness technique has a bounded influence function (Hubert and Van der Veeken, 2008).The technique is applied by firstly considering a p-dimensional sample X n = (x 1 , ..., x n ) where x i = (x i1 , ..., x ip ) and a ∈ R p .The measure of adjusted outlyingness (AO) for x i is given by where where w 1 and w 2 are the lower and upper whiskers of the skew-adjusted boxplot (Hubert and Vandervieren, 2008) applied to the data set X n , and where med(X n ) denotes the median of that data set.The skewadjusted boxplot for all the AO-values is then constructed and those that are beyond the cut-off value are declared as outliers, where Q 3 represents the third quartile of the data, MC represents the medcouple (a robust measure of skewness; see Brys, Hubert, and Struyf, 2004), and where IQR is the interquartile range (i.e.Q 3 −Q 1 , where Q 1 represents the first quartile of the data).We are only concerned with the upper cut-off value as we are performing the skew-adjusted boxplot technique on the measures of outlyingness and hence small results in this context are not of interest.An alternative AO cut-off value of χ 2 {99,p} • median(AO) is given in mrfDepth (Segaert et al., 2017), where p represents the dimension of the data, χ 2 {99,p} is the 99 th percentile of the chi-squared distribution with p degrees of freedom and AO represents the set of all AO values for the data set.This cut-off value aligns closely with the fence factor used for the bagplot approach.In this paper we consider both cut-off values.
Under the AO methodology, not all univariate vectors a can be considered, however Hubert and Van der Veeken (2008) note that taking m = 250p directions provides a good balance between 'efficiency' and computation time.From here, if p = 2, to visualise the bivariate data a version of the bagplot based on the AO values (rather than halfspace depth) may be constructed (Hubert and Van der Veeken, 2008).
When constructing an AO based bagplot, the bag is given by the convex hull of the 50% of data points with the smallest AO (note this is different from Figure 2, which was based on halfspace depth).There are three possible approaches, which differ in the mechanism used to detect and hence treat outliers: 1.A fence is drawn by multiplying the AO based bag by 3 (or some other factor) relative to the point with the lowest AO.Outliers are then flagged as those observations outside the fence and the loop is the convex hull of all points within the fence.Since the fence is generated from the bag which only considers 50% of the data points it may fail to fully capture the shape of the data and in particular the skewness in the set.2. Utilise the alternative cut-off value given in mrfDepth where no fence is drawn.
Here, the cut-off does not incorporate a robust measure of skewness and instead relies on the median value of the AOs and a quantile of the chi-squared distribution.
3. Outliers are flagged using the traditional cut-off value given by Equation (2.6).In this case, the loop will be generated by the convex hull of all points with AO less than this value and no fence will be generated.
The traditional AO cut-off value incorporates a robust measure of skewness known as the medcouple which considers the whole data set.Hence, it is more equipped to capture the total skewness, and the loop that is generated by this approach more fully captures the skewness in the data in comparison to the fence methodology.
For the reasons explained above and unless stated otherwise, we will use the third approach.

An extension of the robust bivariate chain-ladder
In this section we implement the Adjusted Outlyingness and bagdistance (see Section 2) outlier detection and treatment techniques in a bivariate reserving setting.We show that the four techniques often lead to different results (although the bagplot and bagdistance will be the same if you use the fence factor as the cut-off distance).The differences are not only in the number of outliers flagged but also which observations are detected and hence treated.In each case, the different techniques should be implemented and results compared.
This section is motivated by some of the shortcomings of the current robust bivariate chain-ladder methodology.Firstly, the MCD Mahalanobis distance approach assumes elliptical symmetry of the multivariate data.If this assumption is not met, we may fail to detect outlying observations as well as falsely declare regular observations as outliers due to masking and swamping effects.The bagplot is better able to effectively visualise bivariate data; highlighting any correlation, skewness and tail behaviour.
We now present comparisons between these four outlier detection techniques when applied to real non-life insurance data.

The Robust Bivariate Chain-Ladder
The robust bivariate chain-ladder technique was put forward by Verdonck and Van Wouwe (2011) and the general steps involved are as follows: 1. Apply the robust Poisson GLM chain-ladder technique (Verdonck and Debruyne, 2011) to each triangle separately.Obtain residuals from each triangle given by where X ij represents the incremental claim for accident year i and development year j. 2. Store residuals from each triangle as bivariate observations in a n × 2 matrix X = (x 1 , ..., x n ) where x i = (r kj1 , r kj2 ).We have that n = I(I + 1)/2.3. Apply either the bagplot or MCD Mahalanobis Distance (see Section 2) to these bivariate residuals.4. Adjust outliers.For the bagplot, outlying observations are brought back to the fence or loop.For the MCD Mahalanobis distance technique, observations are brought back to the tolerance ellipse representing the 95% quantile of the χ 2 2 distribution.5. Backtransform adjusted residuals to obtain robust incremental claims X Rob i,j .6. Apply the multivariate time series chain-ladder (Merz and Wüthrich, 2008) to the robust (adjusted) observations.We now provide an example of the above framework on real bivariate data.We however include two alternative outlier detection and treatment techniques: the Adjusted Outlyingness and bagdistance.

Bivariate Example using Belgian data
The data for this example is from Belgian non-life insurers and is given in Appendix A.1 (from Shi, Basu, and Meyers, 2012).

Bagplot
We begin by illustrating the major shortcoming of the bagplot approach in that it does not provide a measure of outlyingness that is unique for each observation.This is because the bagplot approach is based on halfspace depth whereby multiple different observations can have the same halfspace depth even if they are at different levels of outlyingness.Figure 3 shows the bagplot for this data, where 6 outliers have been detected.Four of these observations have the lowest halfspace depth of 1 which may be expected for outliers.However, two of these outliers, X 6,5 and X 7,3 , have a halfspace depth of 2 and 3 respectively.These points are marked with red crosses.Additionally, one non-outlying observation, X 9,2 has a halfspace depth of 1 and is marked with a purple '+'.Hence without the corresponding bagplot, little inference can be made about how outlying the outliers are and importantly about whether an observation is outlying or not.This will become an even bigger concern when extending this technique to higher dimensions where graphical representations become less available.Indeed, the bagplot is purely based on a form of ranking (halfspace depth) (see Section 2.2.2).As a result, without having the graph available (such as for higher dimensions as considered in the next section) it is difficult to communicate how outlying an observation is.

MCD Mahalanobis Distance
When employing the MCD Mahalanobis Distance technique a graphical representation is also available in the bivariate case.In particular, we can plot each data point as well as tolerance ellipses of which have a squared distance to the central estimate of the data equal to a quantile of the χ 2 2 distribution.The classical tolerance ellipse is constructed when the Mahalanobis Distance is calculated using the classical estimators of the location vector and scale matrix rather than their robust counterparts from the Minimum Covariance Determinant procedure outlined in Section 2.1.3.Outliers are adjusted by a technique known as bivariate Winsorization such that an outlying observation x is adjusted according to where c is equal to the 95% quantile of a χ 2 2 distribution (χ 2 0.95,2 ).Under this methodology 7 observations are detected as outliers.These are the same observations as was flagged under the bagplot approach as well as X 9,2 , the only other observation with a halfspace depth of 1. Figure 4 gives these plots before and after outliers have been adjusted.

Adjusted Outlyingness
We now explore Adjusted Outlyingness (AO) as an outlier detection and adjustment technique.AO explicitly incorporates a robust measure of skewness and provides a measure of outlyingness for each multivariate observation, alleviating the aforementioned issue with the bagplot based on halfspace depth.Additionally, it may be better equipped to handle more extreme levels of skewness.A bagplot based on AO is also available in the two dimensional case as shown in Figures 5 and 6, which illustrate all three methods outlined in Section 2.2.4.
For these plots, the red asterisk is the point with the lowest AO, and represents a central point of the data analogous to the Tukey median, the dark blue area represents the bag which contains the 50% of points with the lowest AO and the light blue area represents the loop whose perimeter is constructed by the convex hull of all points not declared as outliers when using the traditional cut-off method (Hubert and Van der Veeken, 2008).
In Figure 5 we have used approach 3 with the traditional cut-off value given in Hubert and Van der Veeken (2008) (cut-off = Q 3 + 1.5e 3M C IQR) which incorporates a robust measure of skewness calculated from the whole data set and hence more fully considers the shape of the data to declare outliers, which are shown in red.This cut-off value was calculated to be 5.5813.The light blue area represents the convex hull of all points not declared as outliers under this methodology and may be considered as an AO-based loop.
The fence approach (approach 1) only captures the shape of the data from the 50% of observations determined to be least outlying.Figure 6a shows the AO bagplot with the fence drawn and we see that under this approach an additional 6 observations are detected as outliers.A further issue that presents itself in this situation is the adjustment to the fence or the loop.
Finally, we may use the alternative cut-off value as given in the mrfDepth package (Segaert et al., 2017) (Approach 2).The cut-off value under this approach is calculated to be 7.6227 compared with 5.5813 under the traditional calculation.An AO bagplot for this approach is shown in Figure 6b where no fence is drawn.In this case we detect 1 less outlier in comparison to the traditional case.Based on our previous arguments regarding the potential lack of total consideration of skewness under the two alternative approaches we recommend use of the traditional cut-off value (Approach 3).

bagdistance
The second outlier detection approach we propose is the bagdistance (bd ) (see Section 2.2.3) which utilises the bag to capture the shape of the data and provides a distance measure for each observation to represent its outlyingness.A cut-off point is set such that observations with a bd beyond this threshold are classified as outliers and are adjusted back to an appropriate point on the ray emanating from T * passing through x.An illustration of the bd is given in Figure 7 where the bd for the outliers is given by taking the ratio of the orange line to the white line (noting that the orange line continues through to T * ).If we choose the bd cut-off distance to be uniformly the same as the fence factor for a corresponding bagplot we will detect the same outliers under the two approaches.

Summary of Detection Results
Table 1 summarises the observations that were detected as outliers by the four techniques discussed here where a tick indicates that the observation was flagged under the relevant technique and a cross indicates that it was not.All observations not listed here were not flagged by any of the techniques.Further, the results in brackets correspond to the halfspace depth, MCD Mahalanobis distance, AO and bd values for each observation respectively.

Adjustment of Outliers
In the previous section we outlined how outliers may be detected.Of course, an equally crucial task is to determine how to treat those outliers.In this section we discuss different methods for such treatment.
In some robust analyses, if a data point is extremely outlying, it is simply removed.Unfortunately, this option is not available in most triangular reserving techniques (which require a fully populated triangle), and hence we must formulate and choose appropriate adjustment mechanisms.

Bagplot Based Adjustments
Under the bagplot approach, Verdonck and Van Wouwe (2011) suggest that outlying observations should be brought back to the fence however upon inspecting their plots they appear to have brought observations back to the loop.
Figure 8a and Figure 8b show the bagplot with the fence drawn in green and the bagplot after adjusting residuals back to the fence respectively.In Figure 8b, all of the adjusted outliers are within the loop, however, this is not always the case as after adjustment of the outlying observations, the bagplot methodology is performed again and with some data sets, some of the observations will be outlying because the overall shape of the data has changed which will lead to a different bag and hence fence.
Figure 9 shows the bagplot after adjusting outliers to the loop.Note that the adjustment in this case is more drastic in comparison to the fence adjustment.

MCD Mahalanobis Distance Based Adjustments
The adjustment methodology under the MCD Mahalanobis Distance approach is outlined in Section 3.2.2.where f represents the factor of the bag that we wish to adjust outliers back to.For adjustment back to the bagplot fence we choose f = χ 2 0.99,2 .Note that other adjustment functions may be employed.For illustration purposes, we have considered two other adjustment functions such that, dependant on the level of outlyingness as measured by bd, observations are adjusted to differing degrees.In the first case all observations (x) are adjusted according to This means that moderate outliers will be brought back to the fence whereas more extreme outliers will be adjusted to points beyond the fence according to their levels of outlyingness.Specifically, the 1 2 (2f + √ 4f + 1+1) constraint is chosen as it is the value of bd x where the adjustment function in case 3 of equations 3.4 and 3.5 (i.e.f + < 1 which means that the points that fall into case 3 will be adjusted back towards the centre of the data, however, to a point beyond the fence.For completeness, f + , which means that if the constraint was set at a lower value, then case 3 could lead to outliers being adjusted further away from the central point of the data rather than closer to it.This takes one view point on outliers whereby if an observation is very far outlying then it should still represent moderately rare events in the data.Another approach would be to consider outliers beyond a certain limit as misleading or errors and adjust them further towards the centre of the data, limiting their influence.Further investigation will be required based on the data set being analysed and the adjustment functions can then be designed depending on the conclusions drawn.
The second approach we consider is given by (3.5) Under this adjustment function, we set an additional limit u such that if outliers are beyond this point they are also brought back to the fence.The rationale for this approach is that if an observation is this far outlying it should be given full adjustment back to the fence as it either represents a data entry error or the information contained in that observation is too irrelevant for the current task to include more fully.Note that u is an arbitrary selection that will be dependent on the data set under review.The key takeaway is that with unique measures of outlyingness for each observation, there is significantly greater flexibility in how observations can be adjusted.
Figure 10a shows the outliers after adjustment according to Equation (3.4) and Figure 10b shows the outliers after adjustment according to Equation (3.5) where we have set f = χ 2 0.99,2 and u = 15.Under the latter approach with u = 15, observation X 9,1 which has bagdistance of 15.8 as shown in Table 1 is brought back to the fence whereas it remains beyond the fence in Figure 10a.
Remark 3.1.Additionally, we may use the bd values as a residual term when fitting a model with a robust loss function (e.g.robust M-estimation).For example we could utilise the bd in Huber's loss function such that we have (3.6)

Adjusted Outlyingness Based Adjustments
The AO approach provides a measure of outlyingness for each observation.This measure is based on numerous one-dimensional projections of the individual multivariate observations and the total multivariate sample.For each of these projections, the univariate measure of AO is calculated.The maximum of these univariate AO values after all the projections is taken to be the AO measure in the multivariate case.Without knowing which direction has led to the final AO we are seemingly unable to backtransform the AO measure to lead to adjusted residuals and ultimately adjusted claim amounts.A solution is to use the will be seen for the corresponding claim observations.Note that '-' values are provided where outliers were not detected under that methodology.

Final Reserves and Discussion
Once the outliers have been appropriately identified and treated, the data can be used to compute reserves estimates.Here we used the multivariate time series chain-ladder technique as described in Merz and Wüthrich (2008) Table 4 summarises the final reserve estimates and their associated RMSE for each individual triangle and total reserves under each outlier detection and adjustment technique as well as when we simply apply the multivariate chain-ladder without adjusting any observations.However we consider the last three development periods as separate univariate triangles.This is because there are few data points for these development periods and applying the multivariate chain-ladder to such periods often leads to highly volatile results or potentially failure in that elements of the estimated correlation matrices may have absolute values greater than one.This in turn may lead to a lack of convergence.Note that some authors (including Merz and Wüthrich, 2008) suggest extrapolation of these correlation variables from the previous periods however as our main focus is on outlier detection and adjustment we have not pursued this option.
The robust reserves in this example are always modestly less than the original reserves.This phenomenon is well known in the robust literature, and is obviously related to all adjustements being reductions; losses are thus decreased and so are the associated calculated reserves.The greatest adjustment is for the bagdistance technique where a limit is set for the adjustment function.
In each case, we saw a much greater reduction in the RMSE than reserves.For example, the bagplot-loop methodology leads to a 0.91% reduction in reserves and a 12.17% reduction in RMSE.This suggests that even if the change in reserve estimates are only minor, the accuracy of such estimates may be enhanced considerably as a result of these robust techniques.

An N-dimensional Framework
We now put forward a framework for N-dimensional robust chain-ladder reserving.The methodology is as follows.Firstly, perform the robust Poisson GLM chain-ladder technique on each triangle separately to obtain residuals.These residuals are then stored in an N-dimensional matrix.It is these N-dimensional residuals that outlier detection and adjustment techniques are applied to.The techniques are based on AO, halfspace depth, MCD Mahalanobis distance and bagdistance.
Under the AO approach outliers are detected using the standard cut-off value (see Section 2.2.4).In the three dimensional case a graphical representation is also constructed.Firstly, declare the point with the lowest AO as the AO median.Next, form the convex polyhedron that contains the 50% of points with the lowest AO which can be considered as a N-dimensional AO based bag.We then construct an N-dimensional AO based loop by forming the convex polyhedron that contains all non-outlying points.
Outliers are adjusted by bringing them back to the intersection point of the ray from the AO-median to the outlying point and the AO based loop.This intersection is found using the parametric line clipping algorithm (Cyrus and Beck, 1978;Liang and Barsky, 1984).Adjusted residuals are then backtransformed to give robust incremental claims.The multivariate time series chain-ladder (Merz and Wüthrich, 2008) is applied on these robust claims.
For the halfspace depth approach we calculate residuals in the same fashion, however the outlier detection and treatment methodology is altered.The halfspace depth methodology outlined in Section 2.1.2can be applied in N dimensions with Figure 1 providing an illustration in 2 dimensions.Firstly, calculate the halfspace depth of each observation.The Tukey Median is then the observation with the greatest halfspace depth or the centre of gravity of the deepest region if this is not a unique point.Next, formulate the Ndimensional bag.Firstly, let D s represent the region that contains all points with halfspace depth greater than or equal to s.Let #D s be the number of data points in D s .The bag is constructed by linearly interpolating the two convex regions that satisfy #D s ≤ n 2 < #D s−1 with respect to the Tukey Median.This interpolation is done in the same way as for the bivariate bagplot as described in Miller et al. (2003) however now applicable for any polytype rather than only two dimensional polygons.Firstly, calculate The bag is given by λ•outer polytype+(1−λ)•inner polytype.Next, the N-dimensional fence is constructed by multiplying the N-dimensional bag by a fence factor with respect to the Tukey Median.Outliers are declared as points beyond the fence.The fence factor we use is χ 2 N .For all N > 1, this is greater than the previously used arbitrary value of 3 in the two dimensional case and hence less outliers will be detected and adjusted.Then construct the N-dimensional loop by forming the convex hull of all non-outlying points.Outliers are treated by bringing them back to either the fence or loop.In three dimensions, the bag, loop and fence are convex polyhedrons and graphical representation is available.Adjusted residuals are backtransformed and the multivariate time series chain-ladder applied.
The MCD Mahalanobis distance and bagdistance approaches are implemented in the same fashion as they were for the bivariate case.For the trivariate case a graphical representation of the tolerance ellipsoid for the MCD Mahalanobis technique may be displayed.We now show the implementation of these four techniques on real data and compare the results.

Illustration with the AUSI dataset
For the purposes of illustration we use data from three different lines of business of two Australian insurers.A short discussion of this data is provided in Appendix A.2.

Data Adjustments
The fitting of the robust Poisson GLM stage of the methodology is only possible if there are no negative incremental claims in each triangle.However, of the 820 observations for each line, the CTP line from insurer 1, the CTP line from insurer 2 and the Home line from insurer 1 have 1, 21 and 57 negative claims respectively.We have set all these claims to zero.To ensure that this did not significantly impact results, we calculated development factors under the classical chain-ladder technique before and after the adjustment.The largest change in a development factor was 0.231% and hence the effect of the adjustment is small and is deemed acceptable.

Detection of outliers
Firstly, we perform the AO technique on these residuals.34 outliers were flagged under this approach.We then construct a 3D version of the AO-based bagplot such that the 50% of data points with the lowest AO are contained in a convex polyhedron analogous to the bag and an additional convex polyhedron is formed for all non-outlying observations analogous to the loop.Figure 11a shows the residuals with the 3D AO bag drawn in yellow and Figure 11b shows the residuals with the 3D AO loop in orange and the corresponding bag contained inside.Outliers are shown in red in each case.
We note that this loop extends in the various directions of the data's dispersion and seemingly captures its general shape.For completeness we have also investigated the use of the cut-off value for AO of χ 2 {99,N } • median(AO) as suggested in Segaert et al. (2017).Under this approach 48 outliers are detected.We now illustrate the halfspace depth based approach.Figure 12a shows the residuals with the bag drawn in yellow and Figure 12b shows the residuals with the bag loop and fence all drawn.The loop is again in orange and is within the light blue fence.Outliers are declared as the points outside the fence and under this methodology we detect 48 outliers.Interestingly, 29 of the 34 observations flagged as outliers under the traditional AO methodology are also detected under the halfspace depth approach.This highlights some agreement between the two methodologies.Note that the bags under both the AO and halfspace depth approaches are relatively similar, however the fence which is used to declare outliers under the halfspace depth methodology appears unable to capture the three dimensional skewness in the data.Rather, it remains relatively elliptical in shape whereas the AO loop stretches in each direction that the data is dispersed.This is further reflected in the number of outliers flagged.We conject that the halfspace depth approach may be misclassifying regular observations as outliers in this instance.To highlight the differences between a fence-based approach and AO approach, we may also draw an AO-based fence by multiplying the AO bag by three and declare as outliers observations outside this fence.Figure 13a shows the residuals with the AO bag drawn in yellow and AO fence in light blue.In Figure 13b we have drawn the AO bag, loop and fence.In this case the convex polyhedrons representing the loop and fence are overlapping and in particular the loop seemingly extends further in each direction that the data is dispersed.Notably, the AO fence appears reasonably elliptical in shape, similar to its halfspace depth counterpart.
Under this fence based approach, 66 observations are flagged as outliers.We believe that using an AO-based fence may again fail to fully capture the shape of the data as we suspect has occurred for the halfspace depth approach.In particular, we believe this will be of the greatest concern when there is significant skewness in the data set.Now, the MCD Mahalanobis Distance technique which does not consider skewness in the data set at all.Analogous to tolerance ellipses as described in Section 3.2.2,tolerance ellipsoids have a squared distance to the central estimate of the data equal to a quantile of the χ 2 3 distribution.This concept is extendable to higher dimensions where the squared distance is equal to a quantile of the χ 2 p distribution where p represents the dimension of the data.Figure 14a shows the residuals with the 97.5% robust tolerance ellipsoid drawn in orange and Figure 14b shows the non-robust tolerance ellipsoid drawn in green with its robust counterpart encapsulated within it.This is analogous to what we saw in the bivariate case (see Figure 6).Under the MCD Mahalanobis distance approach, 120 observations were flagged as outliers.This approach saw a substantially greater number of observations detected as outliers than the other available techniques as it does not consider skewness, likely leading to misclassifications e.g.swamping as the tolerance ellipsoid does not extend towards the skewed data.

Treatment of Outliers
Now we turn to the treatment of outliers.Under the AO, halfspace depth and MCD Mahalanobis distance approaches, outlying observations are brought back to a relevant convex polyhedron (i.e.loop or fence) or tolerance ellipsoid.Under the bagdistance methodology an alternative approach may be used which weights the degree of adjustment according to the level of outlyingness of each observation.In particular we have again considered the weighting functions given by equations (3.4) and (3.5) and set f = χ 2 3 and u = 10.As noted in Section 3.3.3,u is an arbitrary selection that will be dependent on the data set under review.Adjusted residuals under each methodology are backtransformed to give robust incremental observations.

Final Reserves and Discussion
The time series multivariate chain-ladder technique (Merz and Wüthrich, 2008) is performed on these robust observations.The results for final reserves and estimated RMSE are given in Table 5.Note that we have scaled the presented results within each LOB to ensure commercial data confidentiality.Under each methodology reserves have been reduced.The greatest reduction was seen when outliers were detected and adjusted using the MCD Mahalanobis distance approach.This is likely because there was a much greater proportion of observations detected as outliers under this approach.However as this methodology does not consider the skewness in the data set it may be inappropriate in this situation (likely excessive).Rather the AO technique based on the traditional cut-off value is likely more equipped to capture the shape of the data in this regard and we believe it to be the most suitable approach for this data set.The smallest reduction in reserves is given for the bd approach where there was no upper limit on the weighting function.This is likely because under this approach outliers have been adjusted to a smaller degree than in comparison to the alternative mechanisms.Additionally, we have seen the RMSE of reserves decrease in each case.The MCD Mahalanobis distance approach saw the greatest reduction in this metric of 21.71% whereas the traditional AO approach (loop) saw a 8.17% reduction.Notably the shift in the RMSE for each approach was significantly greater than the corresponding change in reserves.This highlights that even for small adjustments in terms of the point estimate of reserves when using these robust techniques, we may see significant improvements in accuracy.

R Codes
All relevant R codes used for the bivariate and trivariate illustrations can be found at https://github.com/agi-lab/reserving-robust, along with a synthetic dataset (with features inspired by the AUSI data used in section 4) to help replication of results.

Conclusion
We put forward two alternative robust bivariate chain-ladder techniques.The first technique is based on Adjusted Outlyingness and explicitly incorporates skewness into the analysis whilst providing a unique measure of outlyingness for each observation.The second technique is based on bagdistance which is derived from the bagplot however is able to provide a unique measure of outlyingness and a means to adjust outlying observations based on this measure.Using a real bivariate data set, we illustrated how those techniques compared to existing ones.While the (inevitable) reduction in central estimates is (desirably) modest, the reduction in RMSE is much more significant.This highlights that even for small adjustments in terms of the point estimate of reserves when using these robust techniques, we may see significant improvements in accuracy.We showed that our methodology offers material and significant improvements, compared to the techniques introduced in Verdonck and Van Wouwe (2011).
We then extended our framework to N dimensions and implemented all four outlier detection and treatment techniques on a real trivariate data set.The mathematical generalisation of the bivariate methodologies were not trivial.The illustration demonstrated good performance again, and results were qualitatively similar to those discussed in the bivariate example.
Through our expositions and extension to higher dimensions, we have added to the toolbox of techniques available to detect and treat outliers in multivariate reserving.We believe that the new techniques applied in this paper address some of the shortcomings of the previous approaches and should be explored as common practice when implementing robust multivariate reserving techniques in practice.
While we focused on the multivariate time series chain-ladder model of Merz and Wüthrich (2008) for our illustrations, it should be noted that the detection and treatment methodologies developed in this paper can be applied to other reserving techniques, including in most recent machine learning actuarial procedures (see, e.g., Richman, 2018).Notably, our work is concerned with the detection and treatment of outliers rather than finding the best reserving model for the data and we have used the chain ladder model for illustrative purposes.In the case that one proceeds to model the robustified data, back-testing of results compared to those from unadjusted data may provide useful insights into the effectiveness of the techniques.

Figure 9 :
Figure 9: Bagplot After Adjusting Outliers to Loop Figure 11: AO Based Detection Figure 12: Halfspace Depth Based Detection Figure 13: AO Based Detection (Approach 1 vs Approach 3) (a) MCD Robust Tolerance Ellipse (b) MCD Robust and Non-Robust Tolerance Ellipses