Impact of combination methods on extreme precipitation projections

Abstract Climate change is expected to increase the frequency and intensity of extreme weather events. To properly assess the increased economical risk of these events, actuaries can gain in relying on expert models/opinions from multiple different sources, which requires the use of model combination techniques. From non-parametric to Bayesian approaches, different methods rely on varying assumptions potentially leading to very different results. In this paper, we apply multiple model combination methods to an ensemble of 24 experts in a pooling approach and use the differences in outputs from the different combinations to illustrate how one can gain additional insight from using multiple methods. The densities obtained from pooling in Montreal and Quebec City highlight the significant changes in higher quantiles obtained through different combination approaches. Areal reduction factor and quantile projected changes are used to show that consistency, or lack thereof, across approaches reflects the uncertainty of combination methods. This shows how an actuary using multiple expert models should consider more than one combination method to properly assess the impact of climate change on loss distributions, seeing as a single method can lead to overconfidence in projections.


Introduction
Climate change and global warming are expected to lead to increases in catastrophic weather events such as wildfires, droughts, and extreme precipitation.These changes can have many effects such as crop damage, soil erosion, and increased risk of flooding.Quantifying severe weather events is of particular interest to actuaries, since events such as flooding account for a large part of global economic losses (Boudreault et al., 2020).An increase in extreme rainfall can lead to a possibly greater increase in river discharge (Breinl et al., 2021).Therefore, one would gain from obtaining reliable rainfall projections to assess flood risks.
Modelling precipitation behaviour, and weather events in general, requires complex models.For example, seasonality needs to be taken into account (e.g.Kodra et al., 2020), and one further needs to model spatial interpolation (Wagner et al., 2012;Hu et al., 2019, etc.).As such, projecting changes in extreme precipitation would mean combining these elements with extreme value theory in a limited data context.Given that different models may capture different elements of a system's behaviour, when interested in extreme precipitation, one will often receive diverging information from multiple sources and may wish to combine these sources of information.These sources can often be considered as expert opinions, which are used in actuarial science, for example, particularly in mortality studies, where deterministic projections are incorporated into mortality forecasting via Continuous Mortality Investigation (Huang & Browne, 2017) and P-Splines (Djeundje, 2022).Combining expert opinions and models is especially important for actuaries to set credible hypotheses when modelling losses from weather events.
Extreme weather events caused $2.1 billion in insured damage in Canada alone in 2021 (Insurance Bureau of Canada, 2022), and losses from natural catastrophes have been increasing over the last 20 years.In this context, the last few years have seen increased demand for catastrophe insurance, particularly flood insurance, and private insurers have been developing new products to respond to this demand.The challenge with modelling flood losses, or severe weather events in general, is that the covered events do not occur frequently, and the changing nature of climate implies that only relatively short spans of time can be considered to have similar risks.This compounds the lack of data necessary for developing actuarial models with traditional techniques requiring a high volume of frequency and severity data.Given that expert climate models specialise in the complex dynamics of weather events, combining these models offers an appealing solution for insurers by allowing for an alternate way of obtaining reliable models for catastrophic events.
To efficiently combine models, one needs to determine how much weight to give to each expert's opinion.Clemen (1989) reviewed forecast combination literature, concluding that combining individual forecasts substantially improves accuracy and that simple methods work reasonably well relative to more complex methods.By reviewing statistical techniques for combining multiple probability distributions, Jacobs (1995) showed that independent experts yield more information than dependent experts, where dependent experts might for example have models relying on one another.Cooke (1991) also reviewed expert combination and offered a non-parametric approach for attributing weights to experts based on specific quantiles.From a different perspective allowing for the potential use of a prior opinion about each of the experts, Mendel & Sheridan (1989) and Raftery et al. (1997) used Bayesian approaches to combine expert distributions.
Such methods have been further developed, in particular with Bayesian Model Averaging (BMA) gaining popularity in recent years.For example, Broom et al. (2012) considered BMA in a limited data context, and Fragoso et al. (2018) provided a review of its applications in 587 articles from 1990 to 2014, covering biology, social sciences, environmental studies, and financial applications.In the last few years, the concept of BMA has been generalised into Bayesian Predictive Synthesis (BPS) in a financial time series context (e.g.Johnson, 2017;McAlinn & West, 2019;McAlinn et al., 2020).Model combination can be useful in areas such as climate modelling, where significant uncertainty is present, especially in the context of climate change, and different models rely on different hypotheses.BMA is currently used to this end, for example Massoud et al. (2020) used BMA to study mean precipitation changes in the US by region.
In the context of extreme rainfall leading to flooding, spatial distribution becomes important as it can significantly change risk exposure, where a local rainfall does not lead to the same risks as widespread rainfall.To analyse this spatial distribution, areal reduction factors (ARF) are often used to convert point rainfall into areal rainfall (see for example Svensson & Jones, 2010).The impact of climate change on ARFs was studied by Li et al. (2015) for the region of Sydney, Australia.A limitation of this study is that the authors used a single expert model to obtain precipitation projections.One would seek to improve this type of analysis by combining multiple expert projections.A challenge with this idea is that combination methods often require larger datasets than are available in an extreme precipitation context.This is especially true given that precipitation patterns are changing, where considering an extended span of time means differences in precipitation distribution within the dataset.To circumvent this issue, Innocenti et al. (2019) used a model pooling approach with a 50-member ensemble when studying extreme precipitation in Northeastern North-America, allowing the authors to use 3-year periods of data.Supposing that all expert projections are equally likely, the authors could then apply frequency analysis to study 99th quantiles.An advantage of this method, beyond its simplicity and effectiveness, is that it allows for observing how variability between expert models can be used to improve the estimation of annual maxima statistics.A question that naturally arises is whether attributing weights to each expert based on combination methods instead of supposing all projections are equally likely would yield significantly different projections.This question is of particular interest to actuaries, since changing the underlying precipitation hypotheses would have an effect on event probabilities, and thus affect both pricing and reserving.
We thus focus on the impact of model combination methods on quantile and ARF projections when applied to the pooling approach of Innocenti et al. (2019) in Montreal and Quebec, Canada.The paper is divided as follows: section 2 provides details regarding parametric and nonparametric model combination methods, section 3 applies these methods to pooling to obtain extreme precipitation quantile and ARF projections and briefly explains how such projections can be used for flood damage modelling.Finally, section 4 provides concluding comments.Additional material can be found in Appendices A-C.

Model Combination Methods
Expert climate research groups often provide diverging information based on varying methods and underlying hypotheses regarding greenhouse gas emissions, changes in global convection patterns, the impact of topography, etc.One may seek to combine this information by using an array of tools such as non-parametric approaches or Bayesian approaches.This section presents approaches from various combination methods relying on different hypotheses.To easily analyse the differences between approaches, we choose well known approaches allowing for establishing weights to attribute to each expert, as compared to less transparent machine learning methods such as neural networks, for example.Such methods are however increasing in popularity, as highlighted in a review of recent AI applications in actuarial science by Richman (2021).As will be shown in section 3, the choice of method can lead to very different probabilities attributed to each expert's projections, suggesting that one can benefit from investigating the differences between expert models with higher probability.
Before going into each method's details, the following notation will be used throughout the remainder of this paper.Consider a vector of years τ = {s, s + 1, . . ., t}, where s ∈ {0, . . ., t}, t ≤ T, with T ∈ N the latest available year.Let Y τ ,x be a vector of random variables representing the precipitation annual maxima of G(x, τ , d), the daily precipitation at site x for day d, for years in τ .Further let the vector of random variables Y τ ,A be the annual maxima of H(A, τ , d) for the same period from s to t, where H(A, τ , d) is the average areal rainfall for day d, such that x∈X G(x, τ , d) for a collection of sites x ∈ X within the area A. The respective realisations of G(x, τ , d) and H(A, τ , d) are then y τ ,x and y τ ,A , with length t − s + 1.
Consider n experts providing a model M e allowing for projections of annual maxima for site x and area A, y (e)  τ ,x and y (e) τ ,A, respectively, where e ∈ {1, . . ., n}, over a period τ as described above.With a certain weight w e attributed to each expert, the objective is then to obtain a precipitation projection with a weighted sum of the experts' projections, that is, w e y (e)  τ ,x .
The goal of each method is then to obtain these w e from calibration variables.These are variables for which we know the true values, while the experts providing their opinion do not.This information then allows us to calibrate how much weight we give to each expert.Consider K such calibration variables V 1 , . . ., V K .We specify M percentages for which each one of n experts provides corresponding quantiles v (e)  k,m , k = 1, . . ., K; m = 1, . . ., M; and e = 1, . . ., n.In the context of extreme precipitation projection, we would have card(X) calibration variables corresponding to Y τ ,x for a calibration period τ .

Inverse distance weighting
A first possible approach to model combination is to intuitively build weights based on the distance between an expert's projection about a variable of interest, or vector of variables, and the true value of this variable.This idea can be achieved through Inverse Distance Weighting (IDW).The advantage of this approach is its intuitiveness and ease of use.
Classically, IDW was used with Euclidean distance.In a geometric context, Shepard (1968) used IDW to consider distance while taking angles into account.In a probabilistic setting, the challenge with this method is then to determine an appropriate distance measure.One such measure is the Wasserstein distance, which Kantorovitch & Rubinštein (1958) first realised was applicable to probability distributions.This idea was expanded on by Givens & Shortt (1984) and used recently by Pesenti et al. (2021) for sensitivity analysis.In the univariate case, the distance for expert e over time period τ at location x is defined as With this distance, the weight attributed to each expert's projection is then n l=1 1/D (l) .

Non-parametric calibration
From a literature-based approach, model combination can be approached from many angles.Cooke (1991) offered a review of early expert combination methods.Clemen & Winkler (1999) further elaborated on this review, suggesting issues that need to be considered when combining expert opinions such as expert selection and the role of interaction between experts.Since then, Cooke & Goossens (2008) and Hammitt & Zhang (2012) compared the performance of multiple combination methods, among which a classical approach which was first presented by Cooke (1991).This combination method uses desirable properties of scoring rules, namely that they should be coherent, strictly proper, and relevant (see Cooke, 1991 for details).A three-part method was established attributing weights to each expert distribution based on a relative information component, a calibration component, and an entropy component.This method has the advantage of being non-parametric, suggesting that an expert does not need to have a complete statistical model.Such a method can be appropriate for example in actuarial science, where an expert might reasonably provide an estimate for a small, medium, and large loss, but not a full loss distribution.
From the calibration variables V 1 to V K defined previously, we set v k,0 and v k,M+1 such that We compare these selections and expert-provided values with the true observed values to find the proportion of calibration variables in each interquantile space.This forms an empirical distribution q = {q 1 , . . ., q M+1 } that we can compare to the theoretical proportion p = {p 1 , . . ., p M+1 }.As shown by Cooke (1991), we can obtain the calibration and entropy components, C(e) and O(e) respectively, as where is the relative information component, and It can readily be shown that the relative information component I(q, p) multiplied by 2K (i.e.twice the number of calibration variables) follows a chi-squared distribution.The calibration component uses this fact to measure the goodness of fit of each expert forecast, while the entropy component measures the distance of expert forecasts from a uniform distribution.The intuition for this component is that a uniform model provides very little useful information.From these, we finally obtain for a specified threshold α chosen by optimising the score of the combined distributions, where 0 < α < 1.This α can be seen as a hyperparameter representing the minimal calibration level that each model needs to satisfy to receive weight.As such, a higher α means we give probability to less models.This also implies that the maximal value for α is the highest value of C(e).We can then recalibrate the weights to make their sum equal to 1 by dividing w e by the sum over all experts: These w e do not require the analyst to have a prior opinion of each expert's projections.We will refer to this method as Cooke's method for the sake of brevity.In the context of daily precipitation annual maxima, the corresponding calibration variable is then Y τ ,x , where we consider K different sites x.

Bayesian model averaging
As an alternative to the previous approaches, one may seek to exploit their prior knowledge using Bayesian methods, updating a prior belief with observed data to obtain a posterior distribution more representative of recent data.Bayesian Model Averaging (BMA) is a widely used tool for model combination.Recently, in the United States, BMA was used to study extreme rainfall density as well as daily mean rainfall by Zhu et al. (2013) and Massoud et al. (2020), respectively.First made popular by Raftery et al. (1997) in linear models, BMA uses observed data to update weights to different models based on their likeliness.This relies on the premise that any of the models could be right, but selecting only one model would fail to capture the uncertainty around this choice.This in turn leads to reducing overconfidence from ignoring a model's uncertainty.BMA however implicitly relies on the assumption that one of the models must be right (Hoeting et al., 1999).Note that the method presented in Cooke (1991) relies on a similar assumption, given that the optimal α requires at least one model to be chosen.
Let M be a discrete variable representing this best model, with possible values {M 1 , . . ., M n }.An analyst has some prior belief about the probability that each expert's model is right, Pr (M = M e ), which we will denote Pr (M e ), normalised such that n e=1 Pr (M e ) = 1.In the absence 3: Calculate the likelihood assuming residuals follow a normal distribution: Pr M e | y τ ,x = 1, and posterior probabilities Pr M e | y τ ,x can therefore be considered as updated weights attributed to each expert.This supposes that all models are independent since we ignore possible interactions between models.This assumption is appropriate in this case since all experts rely on different approaches, but this will be discussed in section 4.There are different ways of calculating the expert-associated probabilities.
A first possibility is to use an expectation-maximisation (EM) algorithm, as shown by Darbandsari & Coulibaly (2019), where the residuals between the model projections y (e)  τ ,x , representing an expert's projection generated from model M e about the variable Y τ ,x , and actual data are assumed to follow a Gaussian distribution.This assumption allows for iterating through these residuals' Gaussian likelihood while updating the weights attributed to each expert model until the difference between iterations is less than some threshold β.The algorithm is outlined in Appendix A. The algorithm allows for projecting a posterior distribution for a period ψ = {s , s + 1, . . ., t }, with s ∈ {t, t + 1, . . ., t }, t < t ≤ T. This approach must be used carefully as it can lead to overfitting.With a low threshold, EM will be optimised for training data, but will also learn the noise surrounding the signal.Because of this, the algorithm can then perform poorly on testing data.This limitation of the EM algorithm will be further explored in section 3.
The same hypothesis that residuals follow a normal distribution was used by Zhu et al. (2013), but with a different approach due to limited datasets, where the authors used bootstrapping under Generalised Likelihood Uncertainty Estimation (GLUE, see Beven & Freer, 2001) to obtain the posterior likelihoods.The algorithm is presented in Algorithm 1, where y τ ,x,m is the mth quantile of the vector y τ ,x , y τ ,x,m,b is the bth bootstrap resampling of this quantile with B resamplings, and Pr (Y ψ,x = y|M e ) is the probability distribution of extreme precipitation under model M e for a future period ψ.

Application to Areal Reduction Factors
In the context of extreme precipitation, where projections from multiple models are available, model combination can become a particularly useful tool.The issue with combining models with annual maxima data is that datasets are limited.To find projected precipitation trends in annual maxima at a 1 in 100 return level, Innocenti et al. (2019) pooled y (e)   ψ,x across all experts for projected time period ψ, thus significantly increasing available data for small spans of time.Let Y ψ,x be the vector of random variables describing annual maxima for period ψ.The pooled "observations" for this variable are then , where all elements of y ψ,x are considered equiprobable, such that with y ∈ y ψ,x , n experts, and ψ having length t − s + 1.
Applying frequency analysis to this pooled set, we define the quantile corresponding to a certain frequency R as Y ∈ Y ψ,x such that where for example for a 1 in 20-year return level, we would have 1 − 1/20 = 0.95.

Non-equiprobable pooling
In the previous section, we saw different methods to attribute weights to expert opinions depending on the probability of each expert projection being accurate.We can incorporate these ideas into the pooling idea of Innocenti et al. (2019).We use their pooling method as a baseline, where one may consider all expert-provided models as equally likely, which we will refer to as the equiprobable scenario.Instead of supposing that all model projections are equally likely (Pr (M e ) = 1/n), we can update our belief about the probability of each model with observed data.By defining with y ∈ y ψ,x , we obtain a shifted distribution reflecting this updated belief, where t − s + 1 is the number of years in the future projection period ψ, and τ is the historical observed period.

Calculating areal reduction factors
We can now incorporate the model combination methods and pooling presented previously into ARFs to investigate their impact on extreme precipitation quantile and ARF projections.
Although there are slightly varying definitions of ARFs, we will focus on the one used by Le et al. (2018), which can be thought of as a quantile of average areal precipitation over an average of point precipitation quantiles.This particular definition has the advantage of being applicable to any station within a region and not only one station.Starting from the notation introduced in section 2, let Y τ ,A,R and Y τ ,x,R, respectively, represent the areal and point rainfall for area A, point x, and frequency R over period τ .The ARF based on daily precipitation is then , where there are a collection of sites x ∈ X within area A.
An issue that arises when calculating ARFs with climate models is that expert projections are often not available at each point x, but rather at a grid scale.This issue can however be solved by assuming that scaling from point precipitation to grid average precipitation is time invariant.Li et al. (2015) demonstrated the validity of this hypothesis, enabling the use of grid cells for ARF calculation, where we would have grid-to-area instead of point-to-area.
With this notion of time-constant scaling, we can thus consider the points x as grid cell coordinates instead of stations.This enables us to calculate ARFs using grid data, as made available by climate agencies such as Climate Data Canada and Copernicus Climate Change Service.Grid cells are available at a resolution of approximately 0.1 degrees of latitude and longitude and represent average precipitation over the grid cell.We consider zones of 6 × 4 grid cells in the regions of Montreal and Quebec.We have access to 24 different climate models using historical data from 1951 to 2005 to project precipitation from 2006 to 2100.These models rely on three different Representative Concentration Pathways (RCP) emission scenarios: a low emissions scenario (RCP 2.6), a moderate emissions scenario (RCP 4.5) and a high emissions scenario (RCP 8.5).In keeping with Innocenti et al. (2019), we will focus on the 8.5 scenario, corresponding to a 4.5 degree increase by 2100.We calibrate weights using data from 2001 to 2020, for which we have both real and projected precipitation.This allows us to compare quantiles for Bayesian Model Averaging, or interquantile space for Cooke's method and inverse Distance Weighting, and so calibrate combination weights using each method.With the obtained weights, all future time periods are then forecasted.It is worth noting that this relies on the hypothesis that weights remain the same whether forecasting near or far future.
To use pooling, we need to have sufficient data for frequency analysis.Due to having 24 models instead of the 50 in Innocenti et al. (2019), we consider 6-year periods, such as precipitation from 2016 to 2021, rather than 3-year periods to obtain a similar number of data points.Applying weights calculated using the different methods presented in section 2, we calculate shifted densities reflecting these adjusted weights, as can be observed in Figures 4 and 5.However, before using the BMA-EM algorithm, a threshold or number of iterations must be chosen to prevent overfitting.This is because too many iterations of the expectation-maximisation algorithm will lead to learning the signal as well as the noise in the training data.Figure 1 illustrates the average MSE resulting from splitting data from 2001 to 2020 into ten-year training and testing periods.Overfitting occurs passed 4 iterations of the expectation-maximisation algorithm, where we see that the testing sample MSE starts increasing significantly while the training sample MSE stabilises and even slightly increases.To prevent this overfitting, we choose to stop the algorithm after 4 iterations.This is a known issue of BMA (see e.g.Domingos, 2000), added to BMA tending to select only one model asymptotically, as BMA implicitly relies on the assumption that one of the models is true (Hoeting et al., 1999).An α of 0.65 is also selected for Cooke's method by optimising the error as shown in Figure 1.
We first note that different combination methods can yield very different weights attributed to each model.Figure 2 illustrates the difference in weights for the cities of Montreal and Quebec for a period from 2001 to 2020.Note that for the rest of the article, when we refer to Quebec, this will imply Quebec City and not the province.We see that for Montreal, the two BMA methods generally agree, whereas they do not for Quebec.On the other hand, both Cooke and IDW lead   to relatively similar weights in both locations, but they differ from BMA results.These different weight attributions can lead to different projected quantiles.
One may seek to investigate the expert models with larger probability to ensure they agree with those models' hypotheses.For example, in Montreal, the next to last model (MPI_MR) receives a large weight from the EM algorithm, but gets truncated by the calibration approach.This happens because the model has a jump in precipitation level around the 50th quantile, as illustrated in Figure 3. 7 observations out of 20 fall in the 45-50% interquantile space for model MPI_MR.This  causes a poor fit in calibration in terms of Cooke's method, but the quantile-to-quantile residuals are quite small, meaning that we still have a good fit in terms of low residuals when compared to real data, making the BMA methods give this model high weight.In similar fashion, one can gain additional insight by comparing the outputs of different combination approaches.
Figures 4 and 5 illustrate the upper tail of the resulting empirical cumulative distribution functions under different possible combination methods for Montreal and Quebec, respectively.We see that the quantiles obtained from varying combination methods are substantially different depending on the weights attributed to each model.From a risk management perspective, such differences can alter conclusions reached by an analyst concerning risk level.As such, one would benefit from considering multiple combination methods, given that this would allow for better understanding of projection uncertainty.
Since different combination methods yield different results, one may be interested in the variability induced by attributing weights to each expert.Let F (e)  τ ,x be the cumulative distribution function corresponding to model M e .We define the CDF of Y τ ,A as , where w 1 , . . ., w n are the weights attributed to each expert (which correspond to probabilities Pr (M e | y τ ,A )).It is easy to show that for a given return level, the boundaries for Y τ ,A,R will be the minimum and maximum of Y (1)  τ ,A,R , . . ., Y (n) τ ,A,R .Indeed, we have τ ,A Y τ ,A for some i ∈ {1, . . ., n} and ∀ j ∈ {1, . . ., n}.Then, it follows that , and so F −1 τ ,A Y τ ,A must be the minimum for Y τ ,A,R for any combination of weights.Similarly, the reverse logic allows for stating that the lowest CDF must yield the maximum quantile.
From this reasoning, Figure 6 illustrates the CDF obtained with each combination method in Montreal between 2001 and 2020 compared to the minimum and maximum boundaries of quantiles, where the period is expanded to 20 years to allow for empirical quantiles from each expert in a short enough period that precipitation is not expected to change significantly.We notice that the combination methods are grouped within a much narrower range than the theoretical boundaries from the minimum and maximum projections.
We can suppose that the weights provided by the different combination methods will improve the variance around a quantile estimate compared to having no information about each expert.While we cannot obtain this variance mathematically, we can use bootstrap resampling to compare the quantile distribution under each scenario.Figure 7 illustrates the resulting density distributions for the 95th quantile in Montreal between 2001 and 2020.In keeping with intervals presented in Climate Data Canada, the 10% and 90% quantiles of the distribution supposing no information about experts are shown (corresponding to the equiprobable scenario), which can be thought of as the lower and upper bounds that a user with no evaluation of the expert models might consider as plausible.We notice that the two BMA methods differ largely from the other two methods, with modes lying outside the 10-90% boundaries, while the other methods are more similar to not evaluating experts, particularly for the 95th quantile.This difference is driven by the same phenomenon as the difference in weight attribution.BMA methods rely on the assumption that residuals between projections and real data follow a normal distribution, whereas Cooke's method and IDW using Wasserstein distance use the distance between (cumulative) densities of the projections and real data.If expert distributions have jumps in their CDFs, this will cause aggregation for both Cooke and Wasserstein, leading to these models receiving little weight.Nonetheless, the residuals between these experts' projections and real data might still be small, such that BMA methods will attribute larger weight to these models.These different weights cause the gap between quantile values of BMA methods compared to the other methods, as observed in Figure 6.Given the similarity in results between the non-parametric methods using densities, and the BMA methods using residuals as in Figure 7, it is natural to suppose that keeping only one method using densities and another using residuals provides sufficient information for analysis purposes.
Moreover, these combination methods allow for alternate confidence bounds based on an evaluation of expert models as opposed to supposing all expert projections are equally likely.Table 1 also highlights the reduction in variance for the 95th quantile in Montreal, while the much lower variance is similar for all methods in Quebec.
Applying the same exercise to multiple grid cells within the Montreal region, we can calculate the resulting ARF for each method.Given that we observe a 10% difference in 95th quantiles between methods, we can expect different weights to yield significantly different ARF curves.
It is worth noting that directly using quantiles found with model combination methods can yield non-sensical results when computing ARFs.This is because the spatiotemporal relation between the full region Y τ ,A,R and the underlying grid cells Y τ ,x,R for each expert's projection is broken when comparing a weighted average of y (e)  τ ,x,R and y (e) τ ,A,R , leading to ARFs possibly exceeding 1.From a point-to-area point of view, this would not make sense, seeing as a whole area cannot have more intense precipitation than its maximal component, limiting the applicability of such a method.This effect is lessened by using the same weights for all grid cells within an area.
From the significant variability in higher quantiles observed in the previous figures depending on the weights attributed to model projections, we choose to study percentage changes in ARF and quantiles because they yield more comparable information between the different combination methods than actual quantile and ARF values.Mathematically, the modelled quantile change for area A corresponds to quant = Y ψ,A,R /Y τ ,A,R , and the ARF change to ARF = ARF A,R, ψ /ARF A,R, τ for future period ψ and current period τ .
Using the quantile boundaries found previously, we can establish boundaries for possible quantile change by comparing the future maximum to the current minimum and vice versa for the minimum possible change.This exercise is not well-defined for ARFs, since the area value depends on the underlying grid cells, and so we cannot for example use the highest area quantile with the lowest grid quantiles, as this would not make sense from a rainfall perspective.Keeping the same 20-year period, we compare it to a near-future period of 2011-2030 and a far future of 2071-2090.The idea behind comparing two future periods is that the variability in near future should be lower than for a later period.Figures 8 and 9 show the change in quantiles and ARFs in Montreal for the near future and far future at a 1 in 20-year return level.While we observe the expected change in variability for quantiles, Figure 9 shows that change in ARF does not significantly vary between near and far projections.This could be explained by looking at the underlying composition of the ARF, where , such that the first ratio is the change in quantiles, but the second ratio has the current period and future period inverted, suggesting that it will be approximately inversely proportional to the quantile change.As such, the two ratios will cancel out, other than the random noise between different grid cell precipitation, which is what we observe in Figure 9.The fatter tails for Bayesian methods are induced by the distribution of quantile change, which is less centred around a mode, as seen in Figure 8.The same idea is applied to Quebec in Appendix B, where all methods generally agree, and the Bayesian quantile change projections are more centred around their mode than for Montreal, such that the ARF change projection has smaller tails.The distributions resulting from different combination methods can provide valuable information about the uncertainty of projections, where for example in this case the confidence level is higher regarding Quebec projections than Montreal projections.Moreover, compared to the 10-90% confidence bounds usually presented, we see that the resulting distributions from combination methods provide alternate bounds based on an evaluation of expert projections.In an actuarial context, this could be very important as it can highlight whether a projection is too conservative or not conservative enough.
Figures 10 and 11 compare the mean percentage change in ARF and quantiles respectively for a 1 in 20-year return level for Montreal between Cooke's method and BMA-EM, divided into   approximately 24 km × 22 km areas.These two methods are chosen to illustrate the substantial variation between a density-based method and a residuals-based method.For example, both methods project increases in quantile, but one projects a 10% increase with little change to the ARF, while the other projects a 22% increase with a reduction to the ARF.From a risk management perspective, this would imply differing scenarios of a moderate increase with similar spatial distribution and a heavier increase with more localised precipitation, which can lead to different losses (see for example Cheng et al., 2012 and American Academy of Actuaries, 2020).Flood losses provide a particular example of how Cooke's method and Bayesian model averaging with expectation-maximisation would lead to different loss projections.While the link between extreme rainfall and flooding is complex, the difference in scenarios between Cooke's method and BMA-EM allows for a theoretical discussion of its impact for an actuary.Through a combination of hydrological and hydraulic models such as Hydrotel (Fortin et al., 2001), HEC-RAS (Brunner, 2016) or the Hillslope Link Model (Demir & Krajewski, 2013), one can produce discharge flood projections under different rainfall scenarios.Breinl et al. (2021) used elasticity to illustrate the relationship between extreme precipitation and flooding, where depending on ground dampness, an increase in precipitation will have an at least equivalent increase in river discharge, leading to increased flood severity.Supposing that the reduction in ARF will mitigate the impact of an increase in quantiles due to more localised rainfall, such that for example we have an approximately 7% and 19% increase under, respectively, the Cooke and BMA-EM scenarios, the relationship between discharge and rainfall would clearly imply a greater risk of increased flood losses in the latter case.
Using a hierarchical model such as the one used by Boudreault et al. (2020), flood intensities are associated with different levels of discharge, and their respective probabilities are established from frequency analysis.In their study, the second and third levels of flood intensities have discharges of 1,570 m 3 /s and of 1,740 m 3 /s, respectively, with occurrence probabilities of 0.01496 and 0.00842.This 10.8% difference in discharge is lower than the projected increase in extreme precipitation using BMA-EM, which is not the case for Cooke's method.All else being equal, the probability of observing more severe flooding in the BMA-EM scenario would increase relative to the Cooke scenario.This change in probability can then be used to calculate premiums and/or reserves for flooding, where BMA-EM would lead to a more conservative estimate than the other method in this case.In a changing climate perspective, the range of scenarios resulting from different combination methods becomes even more important to have a fuller understanding of the impact of climate change on insurable losses.An analyst using only one method would fail to obtain a complete picture of projection uncertainty and may find themselves being overconfident in the result of a single combination method.
Similar graphics are available in Appendix C for Quebec.Projections for this city are much more similar across methods, leading to smaller confidence intervals in this case.
In summary, we see that the different combination methods considered can yield varying sets of weights, or probabilities, assigned to each model, which impacts projected quantiles.From the similarities between methods using densities compared to methods using residuals, we see that one only needs to use one method from each approach to obtain a picture of the underlying projection uncertainty, and the difference between the approaches provides a measure of this uncertainty.In cases where methods agree, one could more confidently reach conclusions about the analysed data, but in cases where methods disagree, using only one method would fail to capture projection uncertainty.Moreover, combination methods can yield alternate confidence bounds based on an evaluation of expert models and offer an improved pooling projection over considering all expert projections as equally likely.

Conclusion
In this paper, we applied model combination methods to the pooling approach used by Innocenti et al. (2019) to highlight the resulting difference in quantile estimation and areal reduction factor (ARF) calculation.More specifically, we compared Cooke's method, an inverse distance weighting approach, and two Bayesian model averaging approaches to equiprobable pooling when considering precipitation annual maxima.
Our main focus was to investigate the impact, if any, of various model combination methods on quantiles obtained through pooling and therefore on the resulting ARFs.We considered two non-parametric approaches, namely Cooke's method as well as Inverse Distance Weighting using Wasserstein distance, in addition to Bayesian Model Averaging using an Expectation-Maximisation algorithm, and a Generalised Likelihood Uncertainty Estimation algorithm.The choice of these methods was motivated by having an approach not requiring much information, an easy to use and intuitive method, and Bayesian approaches frequently used in recent studies.
We focused on a 1 in 20-year return level in Montreal and Quebec to show that different weighting methods lead to significantly different results for both quantiles and ARF curves.By considering the projected percentage change in quantiles and ARFs from 2001-2020 to 2071-2090, the variability in results offered insight into the uncertainty of future projections, where results seemed to generally agree around Quebec, whereas results varied significantly between methods for Montreal.This suggests that despite past literature demonstrating that combination methods significantly increase accuracy (Clemen, 1989), one should use more than one combination method, given that a single method may lead to overconfidence about projections.Moreover, it may be sufficient to compare a method using densities to another using residuals to obtain alternate confidence bounds instead of the standard bounds used in weather projections.Combination methods can be of particular interest to actuaries in a changing climate context to have a better understanding of the impact of projected changes on potential losses.
A limitation of this study is that the combination methods used ignored the potential dependence between different expert projections by assuming independence between experts.The new method of Bayesian Predictive Synthesis presented in McAlinn & West (2019) would be an interesting extension, as it is a generalisation of Bayesian Model Averaging taking dependence into account in a time series context.
y) dy, with F Y τ ,x the real cumulative distribution function and F Y (e) τ ,x the expert's CDF.

Figure 1 .
Figure 1.Grid cell MSE of the expectation-maximisation algorithm (left) and Cooke's method (right) in the Montreal region from 2001 to 2020.

Figure 2 .
Figure 2. Model weight by method for Montreal (left) and Quebec (right) for precipitation from 2001 to 2020.

Figure 3 .
Figure 3. Cumulative distribution for model MPI_MR and real data in Montreal for a grid cell between 2001 and 2020.

Figure 4 .
Figure 4. Upper tail of empirical cumulative distribution functions of pooled annual maximum daily rainfall (mm) for Montreal from 2016 to 2021 with different weighting methods.

Figure 5 .
Figure 5. Upper tail of empirical cumulative distribution functions of pooled annual maximum daily rainfall (mm) for Quebec from 2016 to 2021 with different weighting methods.

Figure 6 .
Figure 6.Upper tail of empirical cumulative distribution functions of pooled annual maximum daily rainfall (mm) for Montreal from 2001 to 2020 with different weighting methods, and minimum and maximum boundaries.

Table 1 .Figure 7 .
Figure 7.Comparison of bootstrap densities under different combination methods for the 90th quantile (left) and 95th quantile (right) in Montreal between 2001 and 2020 for 10,000 iterations.

Figure 10 .
Figure 10.Percentage change in quantiles for a 1 in 20-year return level between 2001-2020 and 2071-2090 for the region of Montreal using Cooke's method (left) and BMA-EM (right).

Figure 11 .
Figure 11.Percentage change in ARFs for a 1 in 20-year return level between 2001-2020 and 2071-2090 for the region of Montreal using Cooke's method (left) and BMA-EM (right).