Calibration of transition risk for corporate bonds

Abstract Under the European Union's Solvency II regulations, insurance firms are required to use a one-year VaR (Value at Risk) approach. This involves a one-year projection of the balance sheet and requires sufficient capital to be solvent in 99.5% of outcomes. The Solvency II Internal Model risk calibrations require annual changes in market indices/term structure/transitions for the estimation of the risk distribution for each of the Internal Model risk drivers. Transition and default risk are typically modelled using transition matrices; modelling this risk therefore requires a model of transition matrices and of how these can change from year to year. In this paper, four such models are investigated and compared to the raw data they are calibrated to:
• a bootstrapping approach, sampling from an historical data set with replacement;
• the Vašíček model, calibrated using the Belkin approach;
• the K-means model, a new non-parametric model produced using the K-means clustering algorithm; and
• a two-factor model, a new parametric model using two factors (instead of the Vašíček model's single factor) to represent each matrix.
The models are compared in several ways:
1. a principal components analysis (PCA) approach that compares how closely the models move compared to the raw data;
2. a backtesting approach that compares each model's extreme percentile against regulatory backtesting requirements;
3. a commentary on the amount of expert judgement in each model; and
4. comments on model simplicity and breadth of uses.


Introduction
In this paper, we look at transition and default risk for credit assets in the context of the Solvency II requirement to develop a 1-in-200 VaR and a full risk distribution. This primarily involves developing a model of transition matrices. Relative to other risks on an insurance company balance sheet, this risk is more complex, with a wider range of considerations.
• Section 2 outlines the key risk drivers associated with this risk and introduces the core modelling component: the transition matrix.
• Section 3 looks at the primary sources of historical transition matrix data, with a discussion of how this data is treated.
• Section 4 analyses the data, presenting the key features any credit model should aim to capture.
• Section 5 discusses the different types of credit models and then presents a range of models split between parametric and non-parametric model types. The parametric models explored are the Vašíček model (Vašíček, 1987, 2002), calibrated using an approach described in Belkin et al. (1998), and a new two-factor model introduced in this paper. Two non-parametric models are also explored: a model (known as the K-means model) which uses the K-means algorithm to group historical matrices, and a simple bootstrapping approach of simulating historical transition matrices with replacement.
• Section 6 includes a quantitative and qualitative comparison of the various credit models. For the quantitative comparison, principal components analysis (PCA) is used to identify the directions of most variance in the historical data, which is then compared for each of the models. The focus is on the first and second principal components (PC1 and PC2). A second quantitative comparison compares the 99.5th percentile with that expected under Solvency II regulations. For the qualitative comparison, the strengths and weaknesses of each model are discussed. The simpler models are easier to calibrate and explain to stakeholders, but at the cost of not capturing as many of the key features seen in the data in practice. The more complex models allow a closer replication of the key data features, but are harder to explain to stakeholders.
The key questions this paper has sought to answer when comparing the models are:
• Does the model output move in a consistent way compared to the historical data (i.e. are PC1 and PC2 from the underlying data consistent with PC1 and PC2 from the model)?
• Does the model produce stress transition matrices that are sufficient to meet reasonable back-testing requirements?
• Is the model calibration largely objective (i.e. based on prescribed calibration methods/data), or is there significant scope for expert judgement in the model calibration?
In this paper, we find that the Vašíček model does not move consistently with the raw data: PC1 from the raw data is more consistent with PC2 from the Vašíček model, and PC1 from the Vašíček model is more consistent with PC2 from the raw data. The other models explored in this paper move more consistently with the raw data. The Vašíček and two-factor models require additional strengthening to ensure their 99.5th percentile exceeds the 1932 transition matrix. A bootstrapping approach can never exceed the worst event in its data set, which is a significant issue for models of future events, as the worst case can never be worse than the worst event in history (an example of the Lucretius fallacy). The K-means model is specified to pass back-testing as required and includes events worse than the worst event in history.
The K-means model as implemented in this paper requires significant expert judgement. This allows flexibility in model development but is also less objective. The bootstrapping approach requires no expert judgement at all beyond the choice of data. The Vašíček and two-factor models can be applied with varying amounts of expert judgement, depending on the purpose for which the model is designed.

Risk Driver Definition
Transition and default risk apply to both the modelling of assets and liabilities.
• On the asset side, credit ratings are given to individual assets, and movement between rating classes can impact the asset price. Default on any asset also means a significant loss of value on that asset. It might be possible to use credit spreads rather than credit ratings to model credit risk; however, historical time series of credit spreads are largely split by credit rating, so it is difficult to avoid the use of credit ratings. Hence transition and default risk are modelled using transition matrices.
• On the liability side, many solvency regulations link the discount rate used to discount liabilities to the assets held to back the liabilities. In the case of the matching adjustment in the Solvency II regime, the credit rating of the assets is explicitly used to define default allowances.
Transition matrices are used to capture probabilities of transitioning between credit ratings and default (an absorbing state).They are produced from the number of corporate bonds that moved between credit ratings or defaulted over a given time period.
An S&P transition matrix is shown below. This gives the one-year transition and default probabilities based on averages over 1981-2018.

The transition matrix itself is the data item being modelled. A historical time series of transition matrices can be obtained, and this time series of 30-100 transition matrices is used to gain an understanding of the risk. Each matrix is itself 7 × 7 data points: the default final column is simply 100% minus the sum of the other columns in that row, and the bottom default row is always a row of 0% with 100% in the final column (as above).
The complexity of this data source makes transition and default risk one of the most complex risks to model.

Data Sources
For historical transition matrices, there are three main data sources for modelling:
1. Moody's Default and Recovery Database (DRD) and published Moody's data.
2. Standard and Poor's (S&P) transition data via S&P Credit Pro.
3. Fitch's transition data.
We present a qualitative comparison of the data sources in Section 4. We have used published S&P transition matrices as the key market data input for the corporate downgrade and default risk calibration in the models analysed in this paper. This data is freely available for the period 1981-2019 in published S&P indices, and this data, combined with transition matrices from the Great Depression (Varotto, 2011), can be used to calibrate transition matrix models. A sample matrix is shown in Table 3.
Some key points to note about transition matrices are:
1. Each row sums to 1 (100%), as this represents the total probability of where a particular rated bond can end up at the end of the year.
2. The leading diagonal of the transition matrix usually contains by far the largest values, representing bonds that have remained at the same credit rating over the year. Note that as well as the main credit ratings, this data contains a category called "Not Rated" (NR). We have removed the NR category by reallocating it to all other ratings, dividing the remaining probabilities by (1 − p(NR)).
3. A transition matrix multiplied by another transition matrix is also a valid transition matrix, with rows summing to 1; the resulting matrix contains transition probabilities over two periods.
4. For completeness, there is also a row for the default state, with zero in every column except the default state itself, which has value 1.
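The points above can be checked numerically. The following is a minimal sketch using a hypothetical two-rating (plus default) matrix; the numbers are illustrative, not taken from the S&P data:

```python
import numpy as np

# Hypothetical 3-state matrix: rows/columns ordered [A, B, Default]
P = np.array([
    [0.90, 0.08, 0.02],
    [0.05, 0.85, 0.10],
    [0.00, 0.00, 1.00],   # point 4: default is an absorbing state
])

# Point 1: each row sums to 1
assert np.allclose(P.sum(axis=1), 1.0)

# Point 3: a product of transition matrices is itself a transition
# matrix, giving the two-period transition probabilities
P2 = P @ P
assert np.allclose(P2.sum(axis=1), 1.0)

# NR reallocation: given a row whose NR probability has been dropped,
# divide the remaining entries by (1 - p(NR)) so the row sums to 1 again
row_without_nr = np.array([0.88, 0.07, 0.02])   # NR mass of 0.03 removed
p_nr = 1.0 - row_without_nr.sum()
row_clean = row_without_nr / (1.0 - p_nr)
assert np.isclose(row_clean.sum(), 1.0)
```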

Stylised Facts of Data
The data set used is a series of transition matrices, one for each year. This makes it the most complex data set most Internal Models will use. There are upgrades, downgrades, and defaults, each of which has a complex probability distribution and relationships with the others. Downgrades and defaults tend to be fat-tailed with excess kurtosis (i.e. kurtosis higher than for a Normal distribution).
The probabilities of each of these events can vary significantly over time.
For the purpose of detailed empirical data analysis, we have used publicly available data:
• the 1932 Moody's transition matrix;
• 1981-2019 S&P transition matrix data.
The above analysis shows:
• When comparing the types of transitions:
○ For investment grade ratings, the probability of downgrade is more significant, with defaults forming a very small percentage of transitions (although note that the scale of the asset loss is much more material for defaults than for transitions).
○ Defaults are much more material at the sub-investment grade ratings.
• When comparing across years:
○ The 1932 matrix is shown as straight lines across all the plots for each rating, to compare with the 1981-2019 period. The 1932 matrix was worse than any in the more recent period 1981-2019.
○ 2009 and 2001 show relatively high levels of default and downgrade, as expected given the financial crisis and dot-com bubble respectively.
Table 4 shows the mean, standard deviation, skewness and excess kurtosis for the upgrades, downgrades and defaults based on data from 1981 to 2019, including the 1932 transition matrix. The main comments on the first four moments for upgrades, downgrades and defaults for each credit rating are:
• For upgrades, the mean and standard deviation increase as the ratings decrease. Each rating has a slightly positive skew; excess kurtosis is either close to zero or slightly above zero.
• For downgrades, the mean and standard deviation decrease as the ratings decrease. The positive skewness is higher than for upgrades, and the excess kurtosis is very high, indicating non-normal characteristics.
• For defaults, the mean and standard deviation rise significantly as the ratings fall, with the mean default rate for AAA at zero, rising to 25.7% of bonds defaulting within a year for CCC/C. The higher-rated assets have a more positive skewness, which gradually falls from AA to CCC/C. The AA and A ratings have very high excess kurtosis, with occasional defaults and long periods of no defaults from these ratings.
• The ratings above CCC are more likely to downgrade/default than to upgrade. This feature is specifically captured in the two-factor model later in the paper with the "Optimism" parameter.

Models Explored
Four credit models are described in detail, split between parametric and non-parametric models.

Parametric Models
For parametric models, the systemic components of transition matrices are expressed as a function of a small number of parameters. In this Section, two parametric models are discussed:
• the Vašíček model (calibrated using the Belkin approach);
• the two-factor model (a model introduced in this paper).

The Vašíček model
Oldrich Vašíček first considered the probability of loss on loan portfolios in 1987 (Vašíček, 1987).
Starting from Merton's model of a company's asset returns (Merton, 1974), the question Vašíček was seeking to answer was relatively simple: what is the probability distribution of default for a portfolio of fixed cashflow assets? Vašíček required several assumptions for the portfolio of assets:
• All asset returns are described by a Wiener process. In other words, all asset values are lognormally distributed, similar to Merton's approach.
• All assets have the same probability of default p.
• All assets are of equal amounts.
• Any two of the assets are correlated with coefficient ρ (rho).
The starting point in Vašíček's model was Merton's model of a company's asset returns, defined by the formula:

ln A(T) = ln A(0) + (µ − σ²/2)T + σ(W(T) − W(0))    (1)

where T is the maturity of the asset, W(t) is standard Brownian motion, asset values (denoted A(t)) are lognormally distributed, µ and σ² are the instantaneous expected rate and instantaneous variance of asset returns respectively, and X represents the return on a firm's assets. In this setting, X follows a standard normal distribution, given by X = (W(T) − W(0))/√T.

The next step in Vašíček's model was to adapt Merton's single-asset model to a portfolio of assets. For a firm denoted i (with i = 1, …, n), Equation (1) can be rewritten as:

ln A_i(T) = ln A_i(0) + (µ_i − σ_i²/2)T + σ_i √T X_i    (2)

Given the assumptions above, and the equi-correlation assumption for the variables X_i, it follows that the variables X_i belong to an equi-correlated standard normal distribution (equi-correlation means all assets in the portfolio are assumed to have the same correlation with one another). Any variable X_i that belongs to an equi-correlated standard normal distribution can be represented as a linear combination of jointly standard normal random variables Z and Y_i such that:

X_i = √ρ Z + √(1 − ρ) Y_i    (3)

Equation (3) is a direct result of the statistical properties of jointly equi-correlated standard normal variables, which stipulate that any two variables X_i and X_j are bivariate standard normal with correlation coefficient ρ if there are two independent standard normal variables Z and Y for which X_i = Z and X_j = ρZ + √(1 − ρ²) Y, with ρ a real number in [−1, 1]. Note that it can be shown that the common correlation of n random variables has a lower bound equal to −1/(n − 1). As n tends to infinity, this lower bound tends to 0, also known as the zero lower bound limit of common correlation. In other words, for (very) large portfolios, firms' assets can only be positively correlated, as is their dependence on systematic factors.
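The equi-correlated representation can be verified by simulation. The following sketch checks, for an illustrative ρ = 0.3, that variables built from a common factor Z and firm-specific factors Y_i are standard normal with pairwise correlation ρ:

```python
import numpy as np

rng = np.random.default_rng(0)
rho, n_firms, n_sims = 0.3, 5, 200_000

Z = rng.standard_normal(n_sims)             # common systematic factor
Y = rng.standard_normal((n_firms, n_sims))  # independent firm-specific factors

# X_i = sqrt(rho) * Z + sqrt(1 - rho) * Y_i
X = np.sqrt(rho) * Z + np.sqrt(1 - rho) * Y

# each X_i is (approximately) standard normal...
assert abs(X[0].std() - 1.0) < 0.01
# ...and any two X_i, X_j have correlation close to rho
emp_rho = np.corrcoef(X[0], X[1])[0, 1]
assert abs(emp_rho - rho) < 0.02
```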
With each firm's asset return X_i of the form given in Equation (3), the variable Z is common across the entire portfolio of assets, while Y_i is the i-th firm's specific risk, independent of the variable Z and of the variables Y_j for j ≠ i.
As an aside, the covariance between two firms' asset returns X_i and X_j is given by cov(X_i, X_j) = ρ_ij σ_{X_i} σ_{X_j}, where σ_{X_i} and σ_{X_j} are the standard deviations of each firm's asset returns. For a fixed ρ, a higher variance of asset returns requires a higher covariance of asset returns, and vice versa. For standard normal variables, σ_{X_i} = σ_{X_j} = 1, and hence ρ_ij = cov(X_i, X_j).

The final step in Vašíček's model is the derivation of a firm's probability of default, conditional on the common factor Z. This is relatively straightforward:

p(Z) = Φ( (Φ⁻¹(p) − √ρ Z) / √(1 − ρ) )    (4)

where Φ is the standard normal distribution function and p is the unconditional probability of default. Finally, although Vašíček did not consider credit ratings in his setting other than the default state, from Equation (4) it follows that, conditional on the value of the common factor Z, firms' loss variables are independent and identically distributed with finite variance.
The loss of an asset portfolio (e.g. a portfolio represented in a transition matrix) can thus be represented by a single variable on the scaled distribution of the variables X_i. Although overly simplistic, this setup is helpful for analysing historical data (such as historical default rates or transition matrices) and understanding the implications of asset distributions and correlations in credit risk modelling. In the following Section, we consider a method that applies a firm's conditional probability of default to historical transition matrices.
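The conditional default probability can be expressed in a few lines of code. This is a minimal sketch using the standard Vašíček formula with purely illustrative values of p, ρ and z:

```python
from statistics import NormalDist

N = NormalDist()

def conditional_pd(p, rho, z):
    """Probability of default conditional on the systematic factor z."""
    return N.cdf((N.inv_cdf(p) - rho ** 0.5 * z) / (1 - rho) ** 0.5)

# An adverse systematic outcome (z = -2.58, roughly the 0.5th percentile
# of Z) lifts an unconditional 2% PD substantially; p and rho here are
# purely illustrative
base = conditional_pd(p=0.02, rho=0.2, z=0.0)
stressed = conditional_pd(p=0.02, rho=0.2, z=-2.58)
assert stressed > base
```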
The Belkin et al. (1998) approach models transitions through a latent variable of the form

X_i = √ρ Z + √(1 − ρ) Y_i

where Z, Y_i and ρ are as per Vašíček's framework, and X_i is the standardised asset return of a portfolio in a transition matrix.
The method proposed in Belkin et al. (1998) employs a numerical algorithm to calibrate the asset correlation parameter ρ and the systematic factors Z, subject to meeting certain statistical properties (e.g. the unit variance of Z on the standard normal distribution), using a set of historical transition matrix data. This approach allows a transition matrix to be represented by a single factor, representing that transition matrix's difference from the average transition matrix. Matrices with more downgrades and defaults are captured with a strongly negative factor; matrices with relatively few defaults and downgrades have a positive factor.
Representing a full transition matrix (of 49 probabilities) with a single factor inevitably leads to loss of information.More information can be captured in a two-factor model, which is introduced in Section 5.1.2.
The Belkin implementation uses the standard normal distribution. If this distribution were replaced with a fatter-tailed distribution, the calibration of extreme percentiles (e.g. the 99.5th percentile) could be strengthened. However, other moments of the distribution would also be impacted (e.g. the mean), which would need to be carefully understood before implementation. This approach has not been explored in this paper, but it would not be expected to change the directional results seen in Section 6.
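The mechanics of the single-factor representation can be sketched as follows: thresholds are backed out of an average matrix row's cumulative probabilities, shifted by the systematic factor Z, and mapped back to conditional probabilities. This is an illustrative sketch of those mechanics only, not the published Belkin calibration algorithm, and the example row is hypothetical:

```python
import numpy as np
from statistics import NormalDist

N = NormalDist()

def stressed_row(avg_row, rho, z):
    """Shift one row of an average transition matrix by systematic factor z.

    Thresholds are the inverse-normal of the cumulative probabilities of
    the average row, read from the worst outcome (default) to the best.
    """
    cum = np.cumsum(avg_row[::-1])[:-1]          # cumulative, worst-first
    thresholds = np.array(
        [N.inv_cdf(min(max(c, 1e-12), 1 - 1e-12)) for c in cum])
    shifted = np.array([N.cdf((t - np.sqrt(rho) * z) / np.sqrt(1 - rho))
                        for t in thresholds])
    probs = np.diff(np.concatenate([[0.0], shifted, [1.0]]))
    return probs[::-1]                           # back to best-first order

# hypothetical average row for one rating: [stay, downgrade, default]
avg = np.array([0.90, 0.08, 0.02])
adverse = stressed_row(avg, rho=0.2, z=-2.0)     # a bad systematic year
assert np.isclose(adverse.sum(), 1.0)
assert adverse[-1] > avg[-1]                     # default probability rises
```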
The two-factor model

Two-factor model description
The Vašíček model includes just a single parameter to model transition matrices, and some of the limitations of the model arise from this oversimplification of the risk. In this Section, a two-factor model is described to capture two important features of the way transition matrices change over time, particularly in stress. This model is based on a description given by Rosch and Scheule (2008), who use two defined features of a transition matrix they describe as "Inertia" and "Bias". In this paper we use the terms "Inertia" and "Optimism", as the term "bias" is widely used in statistics, potentially causing confusion with other uses such as statistical bias in parameter estimation.
Inertia is defined as the sum of the leading diagonal of the transition matrix. For a transition matrix in our setting with probabilities p_ij, where i denotes the row and j denotes the column:

Inertia = Σ_i p_ii

Optimism is defined as the ratio between the upgrade probabilities and downgrade probabilities, summed over all seven rows and weighted by the default probabilities in each row:

Optimism = ( Σ_i p_iD U_i ) / ( Σ_i p_iD D_i )

where p_iD is the default probability for row i, U_i is the sum of all upgrade probabilities in row i, and D_i is the sum of all downgrade probabilities in row i. Appendix B gives a sample calculation of Inertia and Optimism for a given matrix.
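The two statistics defined above can be computed directly from a matrix. The sketch below assumes rows and columns are ordered best rating first with the default probability in the final column; the exact index conventions of the paper's Appendix B may differ, and the example matrix is hypothetical:

```python
import numpy as np

def inertia(P):
    """Sum of the leading diagonal (one entry per rating row)."""
    return sum(P[i, i] for i in range(P.shape[0]))

def optimism(P):
    """Default-weighted ratio of upgrade to downgrade probabilities.

    Assumes rows/columns are ordered best rating first, with the final
    column holding the default probability (excluded from downgrades).
    """
    n = P.shape[0]
    up = sum(P[i, -1] * P[i, :i].sum() for i in range(n))          # j < i
    down = sum(P[i, -1] * P[i, i + 1:-1].sum() for i in range(n))  # j > i
    return up / down

# hypothetical 3-rating matrix, columns ordered [A, B, C, Default]
P = np.array([
    [0.92, 0.06, 0.01, 0.01],
    [0.05, 0.88, 0.05, 0.02],
    [0.01, 0.08, 0.81, 0.10],
])
assert np.isclose(inertia(P), 0.92 + 0.88 + 0.81)
assert optimism(P) > 1.0   # here default-weighted upgrades dominate
```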
Any historical matrix can be characterised by these two factors, and the base transition matrix (a long-term average transition matrix used as the mean in the best estimate) can be adjusted so that its Inertia and Optimism correspond to those of an historical transition matrix. This allows the generation of an historical time series of Inertia and Optimism based on our historical data set; probability distributions can then be fitted to the values of these parameters and combined with a copula.
It would also be possible to extend the model by weighting the values of Inertia and Optimism by the actual assets held in the portfolio (but this is not explored in this paper).In this paper, Optimism has only been calculated based on AA-B ratings; but in practice, this could be changed to be closer to the actual assets held.
This means that with probability distributions for Inertia, Optimism and a copula all calibrated from historical data, a full probability distribution of transition matrices can be produced.
The base transition matrix can be adjusted so that its Inertia and Optimism are consistent with the parameters of a historical matrix, or with parameters output from a probability distribution, using the following steps (BaseInertia and BaseOptimism denote the base matrix's values; StressInertia and StressOptimism denote the target values from the matrix being matched):
1. Multiply each of the diagonal values by StressInertia/BaseInertia.
2. Adjust upgrades and downgrades so the rows sum to 1, by dividing them by a single value.
3. Adjust upgrades and downgrades so that their ratio is now in line with StressOptimism.
The adjustments required for steps 2 and 3 above are now defined. To do this, the matrices required to calculate the adjustments are first defined:
• Elements of the base transition matrix are denoted p⁽¹⁾_ij for the i-th row and j-th column.
• Elements of the matrix after step 1 (the diagonals adjusted by StressInertia/BaseInertia) are denoted p⁽²⁾_ij.
• Elements of the matrix after step 2 (upgrades and downgrades adjusted so rows sum to 1) are denoted p⁽³⁾_ij.
• Elements of the matrix after step 3 (upgrades and downgrades adjusted so Optimism equals StressOptimism) are denoted p⁽⁴⁾_ij.
Following step 1, the upgrades and downgrades in step 2 are given by:

p⁽³⁾_ij = p⁽²⁾_ij × (1 − p⁽²⁾_ii) / (1 − p⁽¹⁾_ii),  for j ≠ i

This gives a matrix which has the same Inertia as StressInertia, but whose Optimism is still not equal to StressOptimism.
To carry out step 3, no change is needed for the AAA or CCC/C categories, which have no upgrades or no downgrades respectively. For each of the other ratings, a single factor x is found to add to the sum of the upgrades and subtract from the sum of the downgrades, so that the new upgrades and downgrades have a ratio equal to StressOptimism. Writing U for the sum of the upgrades and D for the sum of the downgrades after step 2, this factor satisfies (U + x)/(D − x) = StressOptimism, giving:

x = (StressOptimism × D − U) / (1 + StressOptimism)

The final matrix p⁽⁴⁾_ij is then found by scaling the upgrades and downgrades in each row proportionally, so that the upgrades sum to U + x and the downgrades to D − x.

Two-factor model calibration
This Section describes an approach to calibrating the two-factor model. Using historical transition matrices from 1981 to 2019, as well as the 1932 matrix and a matrix covering 1931 to 1935, a time series of Inertia and Optimism can be constructed. It is then possible to fit probability distributions to this data and simulate from these distributions, combined with the approach of adjusting the base transition matrix for given Inertia and Optimism, to give a full probability distribution of transition matrices.
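The first two steps of the adjustment above can be sketched as follows. This is an illustrative implementation for a hypothetical matrix only: whether the default column is rescaled in step 2 is ambiguous in the description, and this sketch rescales all off-diagonal entries; step 3 (the Optimism tilt) is not shown:

```python
import numpy as np

def scale_inertia(P, stress_inertia):
    """Steps 1-2: scale the diagonal to a target Inertia, then rescale
    each row's remaining entries so rows still sum to 1. This sketch
    rescales all off-diagonal entries, including the default column."""
    base_inertia = sum(P[i, i] for i in range(P.shape[0]))
    k = stress_inertia / base_inertia
    Q = P.copy()
    for i in range(Q.shape[0]):
        diag_new = Q[i, i] * k                      # step 1: scale diagonal
        off_mass = 1.0 - Q[i, i]                    # old off-diagonal mass
        Q[i, i] = diag_new
        mask = np.ones(Q.shape[1], dtype=bool)
        mask[i] = False
        Q[i, mask] *= (1.0 - diag_new) / off_mass   # step 2: renormalise row
    return Q

# hypothetical 3-rating base matrix, columns [A, B, C, Default]
P = np.array([
    [0.92, 0.06, 0.01, 0.01],
    [0.05, 0.88, 0.05, 0.02],
    [0.01, 0.08, 0.81, 0.10],
])
stressed = scale_inertia(P, stress_inertia=2.40)    # base Inertia is 2.61
assert np.allclose(stressed.sum(axis=1), 1.0)
assert np.isclose(sum(stressed[i, i] for i in range(3)), 2.40)
```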
The data is shown in Figure 5 with the time series compared to the values for 1932 and 1931-1935.
The first four moments for this data are shown in Table 5 (including the 1932 and 1931-1935 matrices).
The correlations between the two data series are given in Table 6. The main comments on these data series are:
• Inertia is negatively skewed and has a fat-tailed distribution.
• Optimism is slightly positively skewed, with a slightly fatter tail than the normal distribution.
• The two data series are correlated, based on the Pearson, Spearman or Kendall tau measures of correlation. In the most extreme tail event, both Inertia and Optimism were at their lowest values; this means that this year had the greatest amount of assets changing rating as well as the greatest number of downgrades relative to upgrades.
To simulate from these data sets, probability distributions have been fitted to the two data series using the Pearson family of probability distributions. The Pearson Type I distribution produces a satisfactory fit to the two data sets, as shown in Figure 6, which compares the raw data ("data") against 10,000 simulated values from the fitted distributions ("fitted distribution").
The plot above shows the actual historical data values for Inertia and Optimism from each of the historical transition matrices compared to the distributions fitted to this data.Note that the darker pink indicates where both the data (blue) and fitted distribution (pink) overlap.
As well as probability distributions for Inertia and Optimism, a copula is also needed to capture how the two probability distributions move with respect to one another. For this purpose, a Gaussian copula has been selected with a correlation of 0.5. This correlation is slightly higher than that found in the empirical data. The model could be extended to use a more complex copula, such as the t-copula, instead of the Gaussian copula.

An alternative to a one- or two-factor model is to use non-parametric models and, rather than fitting a model to the data, use the data itself directly to generate a distribution for the risk.
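The Gaussian copula step can be sketched as follows: correlated standard normals are drawn via a Cholesky factor and mapped to dependent uniforms, which would then be pushed through the inverse CDFs of the fitted marginals. The correlation of 0.5 matches the choice above; the marginal transformation itself is only indicated in a comment, since the fitted Pearson Type I parameters are not given here:

```python
import numpy as np
from statistics import NormalDist

N = NormalDist()
rng = np.random.default_rng(1)
n, rho = 50_000, 0.5

# Gaussian copula: draw correlated standard normals via a Cholesky factor
L = np.linalg.cholesky(np.array([[1.0, rho], [rho, 1.0]]))
z = rng.standard_normal((n, 2)) @ L.T

# map to dependent uniforms; these would then be pushed through the
# inverse CDFs of the fitted Inertia and Optimism marginals (Pearson
# Type I, i.e. a shifted/scaled beta) to give (Inertia, Optimism) pairs
u = np.vectorize(N.cdf)(z)

# the induced normal-score correlation matches the copula parameter
emp = np.corrcoef(z[:, 0], z[:, 1])[0, 1]
assert abs(emp - rho) < 0.02
```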

Non-Parametric Models
While the Vašíček and two-factor models involve a specific functional form for transition matrices, there are also non-parametric methods for constructing distributions of transition matrices. These are multivariate analogues of the empirical distribution function, in contrast to formulas such as Vašíček's, which are akin to fitting a parametric distribution family. Two different non-parametric models are considered: the first involves dimension reduction using the K-means algorithm; the second is a bootstrapping approach, whereby historical transition matrices are sampled many times with replacement to give a full risk distribution.

The K-means model
Under the K-means model, the key steps are:
1. Apply the K-means algorithm to the data to identify a set of groups within the data set, and decide how many groups are required for the analysis.
2. Assign each of the groups to real-line percentiles manually: e.g. assign a group containing the 1932 matrix to the 0.5th percentile, assign the average matrix of 1931-1935 to the 0.025th percentile, put an identity matrix at the 100th percentile, put the square of the 1932 matrix at the 0th percentile, etc.
3. Interpolate any percentiles needed in between using a matrix interpolation approach.
Applying K-means to transition data
K-means clustering is an unsupervised clustering algorithm used to group data points based on similar features or characteristics. It is widely used when unlabelled data (i.e. data without defined categories or groups) needs to be organised into a number of groups, represented by the variable K (Trevino, 2016). The algorithm works iteratively to assign each data point to one of K groups based on the features provided. The results of the K-means clustering algorithm are:
1. The means of the K clusters, which can be used to label new data.
2. Labels for the training data (each data point is assigned to a single cluster).
Rather than defining groups before looking at the data, clustering allows groups that have formed organically to be found and analysed. The "Choosing K" discussion below describes how the number of groups can be determined.
Based on the transition risk data, we apply the K-means algorithm to group each of the transition-year data points into groups with a similar profile of transitions from one rating to another. We present the K-means visualisations in Figure 7. In K-means clustering, as the number of clusters increases, the sum of squares within and between groups generally reduces. The key idea is to optimise the number of groups (i.e. the value of K) such that the reduction in the sum of squares stops being substantial for higher numbers of groups.
Further details of the K-means clustering algorithm are given in Appendix A.
As shown in Figure 7:
• The 1931 and 1931-1935 average matrices fall into separate groups as we increase the number of groups from 7 onwards.
• Below K = 7 groups, the average 1931-1935 matrix does not come out as a separate group in its own right.
As shown in Figure 8, the total within-cluster sum of squares reduces significantly as we increase the number of groups from K = 2 to K = 6, and does not reduce materially after K = 8. We have applied equal weights to each of the transition matrices.
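The elbow analysis described above can be sketched with a plain implementation of Lloyd's algorithm. The data here is synthetic (three well-separated regimes standing in for benign, average and stress years), since the S&P matrices themselves are not reproduced in this paper:

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    """Plain K-means (Lloyd's algorithm) on flattened transition matrices."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(axis=0)
    wss = ((X - centers[labels]) ** 2).sum()    # total within-cluster SS
    return labels, wss

# hypothetical data: each row is one year's transition matrix, flattened
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(loc, 0.01, size=(10, 6)) for loc in (0.0, 0.5, 1.0)])

# elbow check: within-cluster sum of squares falls as K increases, with
# little further reduction once the true number of groups is reached
wss = [kmeans(X, k)[1] for k in (1, 2, 3, 4)]
assert wss[0] > wss[2]
```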
Manually assigning the groups to percentiles
Once K-means has been run, each of the transition matrices is assigned to a group, as shown in Figure 7. For our example, we have selected K = 8. The next step is to manually assign each of the groups to a percentile value on the empirical distribution. This is set via an expert judgement process as follows:
• The square of the 1932 matrix is assigned to the 0th percentile.
• The 1932 matrix is assigned to the 0.5th percentile.
• The 1932-1935 average matrix is assigned to the 1.25th percentile.
• A group containing the 2002 matrix is assigned to the 8.75th percentile.
• A group containing the 2009 matrix is assigned to the 27.5th percentile.
• A group containing the 1981 matrix is assigned to the 48.75th percentile.
• A group containing the 2016 matrix is assigned to the 66.25th percentile.
• A group containing the 2017 matrix is assigned to the 81.25th percentile.
• A group containing the 2019 matrix is assigned to the 93.75th percentile.
• An identity matrix is assigned to the 100th percentile.
All other percentiles are derived using a matrix interpolation approach as follows, where P₁ is the known matrix at percentile p₁, P₂ is the known matrix at percentile p₂, and Q is the interpolated matrix at percentile q:

Q = Interp × P₁ + (1 − Interp) × P₂,  where Interp = (p₂ − q)/(p₂ − p₁)
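The interpolation just described can be sketched as follows, assuming element-wise linear interpolation between the two anchored matrices (one plausible reading of the approach; the anchored matrices shown are hypothetical):

```python
import numpy as np

def interpolate_matrix(P1, p1, P2, p2, q):
    """Element-wise linear interpolation between two percentile-anchored
    transition matrices."""
    w = (p2 - q) / (p2 - p1)
    return w * P1 + (1.0 - w) * P2

# hypothetical anchored matrices, columns [A, B, Default]
P1 = np.array([[0.80, 0.15, 0.05], [0.10, 0.75, 0.15], [0.0, 0.0, 1.0]])
P2 = np.array([[0.95, 0.04, 0.01], [0.03, 0.93, 0.04], [0.0, 0.0, 1.0]])

# interpolate a percentile lying between the two anchors
Q = interpolate_matrix(P1, 0.5, P2, 27.5, q=10.0)

# a convex combination of transition matrices is itself a transition matrix
assert np.allclose(Q.sum(axis=1), 1.0)
assert np.all(Q >= 0.0)
```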
The bootstrapping model
The bootstrapping model refers to the relatively simplistic approach of sampling from the original data set with replacement. In this case, there are n transition matrices from which 100,000 random samples are taken with replacement to give a distribution of transitions and defaults. This is a very simple model, with the main benefit of being true to the underlying data without many expert judgements or assumptions. A significant downside of this model is that it cannot produce scenarios worse than the worst event seen in history; this means it is unlikely to be useful for Economic Capital models, where the extreme percentiles are a crucial feature of the model. Nevertheless, this model is included for comparison purposes, as it is very close in nature to the underlying data.
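The bootstrapping approach, and its limitation of never exceeding the historical worst, can be sketched in a few lines. The history here is synthetic (random rows standing in for 39 annual matrices), purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)

# hypothetical history: 39 annual transition rows for one rating,
# columns [stay, upgrade, downgrade, default]; rows sum to 1
history = rng.dirichlet(np.ones(4), size=39)

# bootstrap: sample years with replacement to build a risk distribution
idx = rng.integers(0, len(history), size=100_000)
samples = history[idx]

# by construction, no simulated default rate can exceed the historical
# worst: the Lucretius fallacy limitation noted above
worst_hist = history[:, -1].max()
assert samples[:, -1].max() <= worst_hist
```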

Comparison of the Models Explored
In this Section, several metrics are used to compare the models described in Section 5:
1. Do the models show movements of a similar nature to historical data, capturing the types of stresses seen historically?
2. Do the models produce extreme percentiles sufficiently extreme to pass back-testing?
3. Are the models calibrated in an objective, easily definable way?
The first of these metrics requires an assessment of how the model transition matrices compare to historical transition matrices. In particular, how do the model transition matrices move over time compared to how the historical transition matrices move? Sections 6.1, 6.2 and 6.3 describe how the movements of transition matrices generated by the models can be compared to historical data. Section 6.4 shows how the various models described in Section 5 compare in terms of this metric. Section 6.5 shows how the models compare against the historical back-tests expected under UK regulatory frameworks. Section 6.6 compares the models in terms of the amount of expert judgement required to calibrate them.

The need to reduce dimensions
Transition matrix modelling is a high-dimensional exercise. It is almost impossible to visualise the 56-dimensional distribution of a random 7 × 8 matrix. To make any progress comparing models, we need to reduce the number of dimensions while endeavouring to mitigate the loss of information.
Dimension reduction is even more necessary because commonly used transition models employ a low number of risk drivers, in order to limit calibration effort, particularly the need to develop correlation assumptions with other risks within an internal model. Vašíček's model, for example, has a single risk driver when a portfolio is large. If we have 7 origin grades and 8 destination grades (including default, but excluding NR), the set of feasible matrices under Vašíček's model is a one-dimensional manifold (i.e. a curve) in 56-dimensional space. This is a dramatic dimension reduction relative to the historical data.
We can simplify matters to some extent by modelling the transition matrices row by row, considering different origin grades separately. This is possible because the investment mandates for many corporate bond portfolios dictate a narrow range of investment grades most of the time, with some flexibility to allow the fund time to liquidate holdings that are re-graded outwith the fund mandate. In that case, we are dealing with a few 8-dimensional random variables; still challenging, but not as intimidating as 56 dimensions.
Popular transition models generally calibrate exactly to a mean transition matrix so that the means of two alternative consistently calibrated models typically coincide.It is the variances and covariances that distinguish models.

Principal components analysis
Principal components analysis is a well-known dimension-reduction technique based on the singular value decomposition of a variance-covariance matrix. A common criticism of PCA, valid in the case of transition modelling, is that it implicitly weights all variances equally, implying that transitions (such as defaults) with low frequency but high commercial impact have little effect on PCA results. We propose a weighted PCA approach which puts greater weight on the less frequent transitions.
Standard PCA can also be distorted by granulation, that is, lumpiness in historical transition rates caused by the finiteness of the number of bonds in a portfolio. We now describe granulation in more detail and show how a weighted PCA approach, applied one origin grade at a time, can reveal the extent of granulation.
Granulation

Systematic and granulated models
Some theoretical models start with transition probabilities for an infinitely large portfolio (the systematic model), and then use a granulation procedure (such as a multinomial distribution) for bond counts so that, for example, each destination contains an integer number of bonds. Other models, such as that of Vašíček, are specified at the individual bond level, and then the systematic model emerges in the limit of diverse bond portfolios.
It is possible that two transition models might have the same systematic model, differing only in the extent of granulation. It could also be that discrepancies between a proposed model and a series of historical matrices are so large that granulation cannot be the sole explanation. It is important to develop tests to establish when model differences could be due to granulation.

Granulation frustrates statistical transformations
When a theoretical model puts matrices on a low-dimensional manifold, granulation can cause noise in both historical matrices and simulated future matrices, which are scattered about that systematic manifold. Granulation complicates naïve attempts to transform historical transition data. For example, under Vašíček's model, the proportion of defaults (or transitions to an X-or-worse set of grades) is given by an expression involving a cumulative normal distribution whose argument is linear in the risk factor. We might attempt to apply the inverse normal distribution function to historical default rates and then reconstruct the risk factor by linear regression. However, when the expected number of bond defaults is low, the observed default rate in a given year can be exactly zero, so that the inverse normal transformation cannot be applied.
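The failure mode described above can be shown directly; the default counts below are hypothetical.

```python
# Sketch: the naïve inverse-normal transformation of observed default rates
# fails in years with zero observed defaults. Counts are hypothetical.
import numpy as np
from scipy.stats import norm

defaults = np.array([0, 1, 0, 2, 0])  # hypothetical annual default counts
n_bonds = 200
rates = defaults / n_bonds

z = norm.ppf(rates)  # inverse normal of the observed rates
print(z)             # zero rates map to -inf, so a linear regression on z
                     # cannot be run directly on these values
```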

Mathematical definition of granulation
For n ≥ 2, let S_n denote the n-simplex, that is, the set of ordered (n + 1)-tuples (x_0, x_1, x_2, ..., x_n) whose components are non-negative and sum to one.
We define a granulation to be a set of probability laws {ℙ_x : x ∈ S_n}, taking values in S_n, such that if a vector Y satisfies Y ∼ ℙ_x then:

E[Y] = x and Var(Y) = h (diag(x) − x xᵀ)

The parameter h, which must lie between 0 and 1, is the Herfindahl index (Herfindahl, 1950) of the granulation.

Granulation examples
One familiar example of a granulation is a multinomial distribution with n bonds and probabilities x, in which case the Herfindahl index is n⁻¹. In the extreme case where n = 1, this is a categorical distribution where all probability lies on the vertices of the simplex. In the other extreme, as n becomes large, the law ℙ_x is a point mass at x. Other plausible mechanisms for individual matrix transitions also conform to the mathematical definition of a granulation. For example, if bonds have different face values, we might measure transition rates weighted by bond face values. Provided the bonds are independent, this is still a granulation, with the standard definition of the Herfindahl index. In a more advanced setting, we might allocate bonds randomly to clusters, with all bonds in each cluster transitioning in the same way, but different clusters transitioning independently. This too satisfies the covariance structure of a granulation. Transition models based on Dirichlet (multivariate-beta) distributions are granulations, with h⁻¹ equal to one plus the sum of the alpha parameters. Finally, we can compound two granulations to make a third granulation, in which case the respective Herfindahl indices satisfy:

1 − h = (1 − h_1)(1 − h_2)
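The multinomial example can be checked numerically: conditional on a systematic row x, the covariance of the observed proportions over n bonds should approximate (1/n)(diag(x) − x xᵀ), i.e. a Herfindahl index of h = 1/n. The row x and bond count below are illustrative, and the compound-index identity is stated in the form 1 − h = (1 − h₁)(1 − h₂).

```python
# Numerical check that a multinomial draw over n bonds is a granulation with
# Herfindahl index h = 1/n; parameters are illustrative.
import numpy as np

rng = np.random.default_rng(0)
x = np.array([0.05, 0.15, 0.60, 0.15, 0.05])  # illustrative systematic row
n = 50                                        # bonds, so h = 1/n
Y = rng.multinomial(n, x, size=200_000) / n   # observed transition proportions

emp_cov = np.cov(Y, rowvar=False)
theory = (np.diag(x) - np.outer(x, x)) / n    # h * (diag(x) - x x^T)
print(np.abs(emp_cov - theory).max())         # close to zero

# Compounding two granulations multiplies the "non-granular" shares:
h1, h2 = 0.02, 0.05
h = 1 - (1 - h1) * (1 - h2)                   # combined Herfindahl index
print(h)
```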

Granulation effect on means and variances of transition rates
We now investigate the effect of granulation on means and variance matrices of simplex-valued random vectors. Suppose that X is an S_n-valued random vector, representing the systematic component of a transition model, and that Y is another random vector with Y | X ∼ ℙ_X for some granulation.
Let us denote the (vector) mean of X by π = E[X] and the variance(-covariance) matrix of X by V_sys. Then it is easy to show that the mean of Y is the same as that of X:

E[Y] = π
and the variance(-covariance) matrix of Y is:

V_Y = (1 − h) V_sys + h (diag(π) − π πᵀ)

We are now able to propose a weighted principal components approach for models of simplex-valued transition matrices.
Suppose then that we have a model with values in S_n. Its mean vector π is, of course, still in S_n. Let us denote the variance matrix by V. Our proposed weighted PCA method is based on a singular value decomposition of the matrix:

diag(π)^(−1/2) V diag(π)^(−1/2)
As this is a real positive-semidefinite symmetric matrix, the eigenvalues are real and non-negative. The simplex constraint in fact implies that no eigenvalue can exceed 1 (which is the limit of a categorical distribution). We can, without loss of generality, take the eigenvectors to be orthonormal. We fix the signs of eigenvectors such that the component corresponding to the default grade is non-positive, so that a positive quantity of each eigenvector reduces default rates (and a negative quantity increases defaults). This is consistent with our definitions of Optimism and Inertia in Section 5.1.2.1.
As the components of a simplex add to 1, it follows that V 1 = 0, where 1 is a vector of 1s. This implies that the weighted PCA produces a trivial eigenvector e_triv with eigenvalue zero, where

e_triv = diag(π)^(1/2) 1

Weighted PCA and granulation
Suppose now that we have a non-trivial eigenvector e of the weighted systematic matrix with eigenvalue λ_sys, so that

diag(π)^(−1/2) V_sys diag(π)^(−1/2) e = λ_sys e

It is easy to show that e is also an eigenvector of any corresponding granulated model, with the transformed eigenvalue λ_gran = (1 − h) λ_sys + h shrunk towards 1.
Thus, if one model is a granulation of another, the weighted eigenvectors are the same and the eigenvalues are related by a shrinkage transformation towards 1. This elegant result is the primary motivation for our proposed weighting. PCA usually focuses on the most significant components, that is, those with the largest associated eigenvalues. In the context of a transition matrix, the smallest (non-zero) eigenvalue of a granulated model has a role as an upper bound for the Herfindahl index of any granulation. Where the systematic model inhabits a low-dimensional manifold, the smallest non-zero eigenvalue is typically close to zero, which implies that the smallest non-zero eigenvalue of a granulated model is a tight upper bound for the Herfindahl index.
Knowing the Herfindahl index allows us to strip out granulation effects, reconstructing the variance matrix of an underlying systematic model as V_sys = (V_Y − h (diag(π) − π πᵀ)) / (1 − h).
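A minimal numerical sketch of the weighted PCA: eigendecompose diag(π)^(−1/2) V diag(π)^(−1/2), observe that the trivial eigenvector √π has eigenvalue zero, and confirm that granulating V moves each non-trivial eigenvalue λ to (1 − h) λ + h. The mean vector π, movement direction u, and Herfindahl index h are all illustrative.

```python
# Weighted PCA sketch: trivial eigenvalue zero, and eigenvalue shrinkage
# towards 1 under granulation. All inputs are illustrative.
import numpy as np

pi = np.array([0.05, 0.15, 0.60, 0.15, 0.05])
u = np.array([0.01, 0.02, -0.05, 0.015, 0.005])  # sums to zero, so V_sys @ 1 = 0
V_sys = np.outer(u, u)                           # rank-1 systematic variance

D = np.diag(pi ** -0.5)                          # diag(pi)^(-1/2)
vals_sys = np.sort(np.linalg.eigvalsh(D @ V_sys @ D))

h = 0.02                                         # Herfindahl index of granulation
V_gran = (1 - h) * V_sys + h * (np.diag(pi) - np.outer(pi, pi))
vals_gran = np.sort(np.linalg.eigvalsh(D @ V_gran @ D))

print(vals_sys.round(4))   # one non-zero eigenvalue, the rest zero
print(vals_gran.round(4))  # trivial eigenvalue stays 0; others shift by h
```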
• The bootstrap model has movements most like the raw data, which is in line with expectations, as it is simply the raw data sampled with replacement.
• The K-means model eigenvalues are also like the raw data because, again, it is effectively a closely summarised version of the raw data, grouping the raw data into eight groups and interpolating between them for intermediate percentiles.
• The two-factor and Vašíček models both have just two components. This is to be expected, as these models summarise full transition matrices with two or one parameters, respectively.
• Based on the eigenvalues, none of the models appears sufficiently different from the data to make it inappropriate for use.
The eigenvectors for each model are now compared in Figure 9 for BBB assets. Figure 9 shows the first two eigenvectors for BBB-rated assets, which have a similar pattern to other ratings. For PC1, there is a clear similarity between the raw data, bootstrapping, and K-means. The PC1 direction for these assets is a fall in the assets staying unchanged and a rise in all other categories (with a small rise for upgrades). The two-factor model is similar in nature to these non-parametric approaches, albeit with a larger rise in upgrades by one rating. However, the Vašíček model is structurally different for PC1: the upgrades move in the same direction as the assets staying unchanged, and in the opposite direction from the downgrades/defaults.
For PC2, the raw data, bootstrapping, and K-means models are all very similar in nature. The two-factor model is also similar, but notable in that the down-more-than-one-rating category moves in the opposite direction to the non-parametric models. The Vašíček model is again structurally different from the other models, with the "no change" group moving in the opposite direction to all the other categories. PC2 for the Vašíček model is perhaps like PC1 for the other models, in that the "no change" category moves in the opposite direction to all other categories.
In this comparison, the non-parametric models move most closely in line with historical data. The Vašíček model is structurally different from the raw data, with PC2 of the Vašíček model being more akin to PC1 for the other models. This suggests that the Vašíček model is not capturing the movement in the historical data. The two-factor model is an improvement on the Vašíček model in that it is a closer representation of the movement in the underlying raw data, which is what might reasonably be expected from the additional parameter.

Backtesting Comparison
A key requirement for transition and default models is that they meet any back-testing requirements.For example, the UK-specific requirement for Matching Adjustment Internal models is given in Supervisory Statement 8/18 point 4.3.4 as "compare their modelled 1-in-200 transition matrix and matrices at other extreme percentiles against key historical transition events, notably the 1930s Great Depression (and 1932 and 1933 experience in particular).This should include considering how the matrices themselves compare as well as relevant outputs".
In this Section, the 99.5th percentile from the models is compared to the 1932 matrix. The 1932 matrix itself (Varotto, 2011) is shown in Table 8.
The four models being compared are:
1. Bootstrapping
2. The K-means model
3. The Vašíček model
4. The two-factor model
The bootstrapping model simply uses the raw data, sampled with replacement. Thus, the most extreme percentile is simply the worst data point; in this case, the 1932 matrix. On this basis, it might be concluded that the bootstrapping model passes the backtest; however, it also has no scenarios worse than the 1932 matrix. This means that scenarios worse than the worst event in history cannot be modelled, which is a significant model weakness.
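The bootstrap limitation can be seen in a one-line experiment; the annual default-rate series below is hypothetical.

```python
# Sketch: resampling historical observations with replacement can never
# produce an outcome beyond the worst historical point. Data are hypothetical.
import numpy as np

rng = np.random.default_rng(2)
annual_default_rates = np.array([0.2, 0.5, 0.3, 1.1, 0.4, 2.4, 0.6, 0.8])  # %
sims = rng.choice(annual_default_rates, size=100_000, replace=True)

print(np.quantile(sims, 0.995))       # capped at the historical maximum
print(annual_default_rates.max())
```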
The K-means model produced as part of this paper had the 99.5th percentile specifically set to be the 1932 matrix. On this basis, the model passes the backtest by construction. The model is flexible enough that the percentiles of the various K-means clusters can be selected by the user for the specific purpose required. The K-means model also has transition matrices stronger than the 1932 event, with the 100th percentile set at the 1932 matrix multiplied by itself (effectively two such events happening in a single year).
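The clustering step behind this model can be sketched as follows. The data here are synthetic Dirichlet draws standing in for historical matrices, K and the iteration count are illustrative, and the percentile assignment to clusters (the expert-judgement step described in the text) is omitted.

```python
# Sketch of the K-means clustering step: flatten each annual transition matrix
# to a vector, cluster with a minimal Lloyd's iteration, and use centroids as
# representative matrices. Data, K and iteration count are illustrative.
import numpy as np

rng = np.random.default_rng(3)
matrices = rng.dirichlet(np.full(8, 5.0), size=(40, 7))  # 40 "years" of 7x8 rows
X = matrices.reshape(40, -1)                             # flatten to 56-vectors

K = 8
centroids = X[rng.choice(len(X), size=K, replace=False)].copy()
for _ in range(50):
    dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    labels = dists.argmin(axis=1)                        # nearest centroid
    for k in range(K):
        if (labels == k).any():
            centroids[k] = X[labels == k].mean(axis=0)   # update step

rep_matrices = centroids.reshape(K, 7, 8)  # centroid rows still sum to one
print(rep_matrices[0].sum(axis=1))
```

Because each centroid is an average of simplex-valued rows, every representative matrix remains a valid transition matrix.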
The Vašíček model calibrated to the data set described in Section 1 gives a rho value of just over 8%. The 99.5th percentile from this model is shown in Table 9. Compared to the 1932 matrix, it can be clearly seen that:
• The defaults are lower for most ratings.
• The transitions one rating lower for AA, A and BBB assets are lower.
• The leading diagonal values are higher.
This shows that this matrix is not as strong as the 1932 matrix. However, it would be possible to strengthen the calibration of the Vašíček model, moving the rho parameter to, say, 30% as an expert judgement loading specifically targeted at passing the backtesting requirements. The updated 99.5th percentile for this strengthened Vašíček model is shown in Table 10. Compared to the 1932 matrix, it can be clearly seen that:
• The defaults are now largely higher than the 1932 matrix.
• The transitions one rating lower for AA, A and BBB assets are higher or comparable.
• The leading diagonal values are lower or more comparable.
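The effect of the rho uplift on the extreme percentile can be sketched with the standard one-factor stressed-PD formula, PD_stress = Φ((Φ⁻¹(PD) + √ρ Φ⁻¹(0.995)) / √(1 − ρ)). The unstressed PD below is illustrative, not the paper's calibration.

```python
# Sketch: raising rho strengthens the 99.5th percentile under the one-factor
# Vašíček model. The unstressed PD is illustrative.
import numpy as np
from scipy.stats import norm

def stressed_pd(pd: float, rho: float, q: float = 0.995) -> float:
    """Stressed one-year default probability at percentile q."""
    return norm.cdf((norm.ppf(pd) + np.sqrt(rho) * norm.ppf(q))
                    / np.sqrt(1.0 - rho))

pd_bbb = 0.002  # illustrative through-the-cycle default probability
for rho in (0.08, 0.30):
    print(rho, stressed_pd(pd_bbb, rho))
```

Moving rho from roughly 8% to 30% materially increases the stressed default rate, which is the mechanism behind the strengthened calibration in Table 10.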
The two-factor model has a range of 99.5th percentiles that could be used, depending on the portfolio of assets it is applied to; but for the purposes of this paper, the 99.5th percentile has been taken as the 99.5th percentile of the sum of investment-grade default rates. Using this approach, the 99.5th percentile model shown in Table 11 has been produced. Compared to the 1932 matrix, it can be seen that:
• The defaults are higher only for BBB assets, with the 1932 matrix higher for the other investment grades.
• The transitions one rating lower for AA, A and BBB assets are more comparable to the 1932 matrix than the unadjusted Vašíček calibration, but slightly lower than the 1932 matrix.
• The leading diagonal values are slightly higher than the 1932 matrix.
Overall, the two-factor transition matrix at the 99.5th percentile is slightly weaker than the 1932 matrix, but stronger than the Vašíček 99.5th percentile. This model could also be strengthened in a similar way to the strengthening of the Vašíček model, with an expert judgement uplift to one of the parameters. There are a few more options where such an adjustment might be applied:
1. In the probability distributions used to model Inertia and Optimism. These could be fatter-tailed distributions than those chosen in the calibration in this paper.
2. The copula used to model Inertia and Optimism could be made a t-copula rather than a Gaussian copula. This is potentially more appropriate, as in practice the 1932 matrix has the most extreme values for both variables, indicating a tail dependence perhaps more in line with the t-copula than the Gaussian copula.
3. Specific adjustments could be made to the parameters of the calibrated risk distributions.

Objectivity Comparison
Each of the models has varying levels of expert judgement applied, and this Section compares them.

Conclusions
Transition and default modelling is one of the most complex risks to be modelled by insurance companies. The use of transition matrices creates a large modelling challenge due to the large number of data items contained in each matrix and how these interact with each other. This paper has reviewed four models for assessing this risk, two of which (the K-means and two-factor models) have not previously been captured in the literature. The four models have been compared on several metrics, including a new test using PCA to compare model movements to historical data movements for transition matrix models. The PCA-based test has highlighted a deficiency in the Vašíček model: it does not replicate the way historical data moves. The first principal component of the Vašíček model is not well matched to the first principal component of the underlying data, and the second principal component of the Vašíček model is perhaps closer to the first principal component of the underlying data. The other three models shown in this paper capture the historical movement of the underlying data more accurately than the Vašíček model.
The non-parametric models have the advantage of having historical movements very close to the historical data, but the bootstrapping approach has the limitation of not producing stresses worse than the worst historical data point. The K-means model in the form presented in this paper has a relatively significant amount of expert judgement in its construction. The two-factor model has the advantage of being relatively simple to apply, with an improved representation of the historical data (relative to the Vašíček model), but not as close to the historical data as the non-parametric models.

Figures
Figures 1, 2, 3 and 4 show the 1932 values compared to the 1981-2019 data. The 1931-1935 transition matrix has been used in the model calibrations in this paper but is not shown in the plots.

Figure 3 .
Figure 3. Default rates for investment-grade assets.
5.1.1.1. Vašíček model calibration - Belkin. Belkin et al. (1998) introduced a statistical method to estimate the correlation parameter ρ and common factors Z in Vašíček's model based on historical transition matrices. The starting point in their model is the one-factor representation of annual transition matrices (denoted by variables X_i, with i representing the year):

X_i = √ρ Z_i + √(1 − ρ) ε_i

Figure 6 .
Figure 6. Historical plots of (a) Inertia and (b) Optimism compared to fitted distributions.

Figure 8 .
Figure 8. K-means clustering examples with different K values: sum of squares within clusters.

Figure 9 .
Figure 9. Plots of the eigenvectors of four models and raw data for (a) BBB for PC1 and (b) BBB for PC2.

Table 2 .
Comparison of transition risk data sources considered.

Table 4 .
First four moments for downgrades, upgrades, and defaults.

Table 5 .
First four moments of inertia and optimism.

Table 6 .
Correlation between inertia and optimism.

Table 9 .
The 99.5th percentile transition matrix from the Vašíček model.

Table 10 .
The 99.5th percentile transition matrix with strengthened Vašíček calibration.

Table 11 .
The two-factor 99.5 th percentile transition matrix.

Table B1 .
Highlighted example of inertia.

Table B2 .
Highlighted example of optimism.