11.1 Introduction
In this chapter, we describe in more detail validation issues and activities surrounding PD, LGD, and EAD modeling and internal borrower and facility ratings. To accomplish this task, we present six case studies based on banking institutions’ practices observed by the author and other regulators; they focus on important issues identified in the validation process.
This chapter is organized according to the various major steps that a complete model validation would take in addressing the following issues: (1) use of the model; (2) internal and external data; (3) model assumptions and methodologies; (4) model performance; (5) outcomes analysis; and (6) the quality and comprehensiveness of development documentation (see, e.g., Glowacki 2012). This chapter also includes some observations on partial model validation and review of vendor models.
11.2 Validation of Use
A model validation generally begins the way one would start to develop a financial model; that is, by understanding the use of the model. This will help determine the level of detail of the model validation and allow the model validation group to focus on key areas of the model. For example, when reviewing a loss model used for CCAR/DFAST stress tests, it is important that the results produced by the model be reasonable and robust under a stressful environment. In contrast, for pricing models, where the focus is to develop an average cost, the results produced by the model for extremely stressful scenarios may be less important in the model validation. A model validation should identify the use of the model, assess whether the model is consistent with and applicable to that use, and ensure that the model is not being used for purposes beyond its capabilities.
11.2.1 Use Validation: AIRB Regulatory Capital Models
Basel II AIRB Models Guidance: IRB components should be integrated into internal risk management, and validation activities related to their use are therefore required. The use test for estimates is broader; its requirements are based on paragraph 444 of Basel II (June 2006).
The IRB use test is based on the premise that supervisors can take additional comfort in the IRB components where such components “play an essential role” in how banks measure and manage risk in their businesses. If the IRB components are used solely for regulatory capital purposes, there could be incentives to minimize capital requirements rather than produce accurate measurements of the IRB components and the resulting capital requirement. Moreover, if IRB components were used for regulatory purposes only, banks would have fewer internal incentives to keep them accurate and up-to-date, whereas the use of IRB components in internal decision making creates an automatic incentive to ensure sufficient quality and adequate robustness of the systems that produce such data.
In an internal Basel survey (BCBS SIGBB Internal Discussion Memo 2016), supervisors acknowledged that universal usage of the IRB components for all internal purposes is not necessary. In some cases, differences between IRB components and other internal risk estimates can result from mismatches between prudential requirements in the Basel II Framework and reasonable risk management practices, business considerations, or other regulatory and legal considerations. Examples include different regulatory and accounting requirements for downturn LGD, PD and LGD floors, annualized PDs and provisioning. Other examples of where differences could occur include pricing practices and default definitions.
In general, there are three main areas where the use of IRB components for internal risk management purposes should be observable: strategy and planning processes, credit exposure management, and reporting. Uses in any of these areas provide evidence of internal use of IRB components. If IRB components are not used in some of these areas, the supervisor may require an explanation for such non-use, or may raise concerns about the quality of the IRB components.
In many instances, supervisors will need to exercise considerable judgment in assessing the use of IRB components. For example, supervisors have observed the use of adjusted IRB parameters in key business processes. The types of adjustments that require justification include:
Removal of conservative layers, such as a downturn adjustment or the application of floors
Adjustments to obtain point-in-time (PIT) rather than through-the-cycle (TTC) parameters
Adjustments to the time horizon, which may differ from the twelve months used for regulatory capital.
Banking institutions are responsible for proving that they comply with the use test requirement. They should document and justify adjustments made to IRB components for use in key operational processes such as:
Risk appetite / capital allocation
Credit granting (including pricing, limits)
Credit monitoring /early warning
Internal reporting
NPL management /recovery
Provisioning/cost of risk
Performance, RAROC, remuneration
Economic capital and ICAAP
Stress testing.
While one banking institution’s approach to wholesale IRB was consistent with risk management practices at the institution, the bank’s model validation team noted several important areas of divergence: First, the banking institution’s facility risk ratings (FRRs) were judgmental adjustments to the obligor risk ratings (ORR), achieved through a notching process. Second, when combined with the final ORR, the FRR approximates an expected loss. This is different from LGD, which is a stressed loss number based on facility characteristics only. Third, for internal credit risk management purposes, the banking institution was not using a dual risk ratings system, which strictly separates obligor and facility characteristics.
Since the banking institution’s FRRs approximate expected loss, the resulting internal risk capital calculation would likely underestimate the amount of capital required to be held against a given loan under Basel II rules. The bank’s management believed that the FRR is a better assessment of facility risk than LGD because the latter’s numerous LGD segments may not contain enough actual data to support estimates. The layer of conservatism that is required, given the lack of LGD data, would therefore overestimate the capital requirement. The bank was also using its FRRs in the allowance for loan and lease losses (ALLL) calculations.
While baseline and final ORRs were consistent and were validated against PDs used in the AIRB approach, the bank was layering on a further qualitative assessment in determining obligor limit ratings (OLRs) for committed facilities beyond one year. OLRs were used to determine risk tolerance for individual borrowers as well as related groups of borrowers and they were generally viewed to be a compensating adjustment for facilities with a longer term. The ability to predict default becomes more uncertain for longer-term facilities and relies upon a judgmental process that calls for expertise that credit analysts may not possess. In reality, the vast majority of OLRs were the same as the ORRs, but management was unable to produce a reliable analysis of these data. The OLR is standardized at the economic group level of the borrower and it was usually based on consolidated financial information. It did not take into account individual group member (obligor) PD. So while each obligor, in compliance with Basel II standards, is assigned a PD, this PD was not used for limit setting.
11.2.2 Use Validation: CCAR/DFAST Models
Stress Testing Models: The validation focus for model use is the reasonableness and robustness of outcomes. Consequently, in addition to performance and back-testing, guidance emphasizes validation related to:
Model assumptions and limitations
Model overlays
Sensitivity analysis
Challenger/Benchmark models.
Supervisory Guidance: Models used in the capital planning process should be reviewed for suitability for their intended uses. A firm should give particular consideration to the validity of models used for calculating post-stress capital positions. In particular, models designed for ongoing business activities may be inappropriate for estimating losses, revenue, and expenses under stressed conditions. If a firm identifies weaknesses or uncertainties in a model, the firm should make adjustments to model output if the findings would otherwise result in the material understatement of capital needs (SR 15–18 guidance, pp. 9–10). SR 15–18 also outlines expectations for model overlays, benchmark models, and sensitivity analysis.
Model Assumptions and Limitations: Banking institutions make modeling assumptions that are informed by their business activities and overall strategy. For example, in CRE stress-loss modeling, banks facing a lack of historical data may assume LTV or DSCR default threshold triggers based upon LOB specialist judgment. Such assumptions can significantly impact model effectiveness, to the extent that the resulting model output differs from what is realized in practice. As a result, banks should mitigate model risk by quantifying the effects of assumptions to demonstrate:
Consistency of results with economic scenarios
Outcomes conservatism and the consideration of adjustments to account for model weaknesses
Comprehensive documentation of specialist panel discussions
Reasonableness from a business perspective
Use of benchmark data to the extent possible
Testing of assumptions and quantifying the impact of these on model output.
Model Overlays (SR 15–18, Appendix B): A BHC may need to rely on overrides or adjustments to model output (model overlay) to compensate for model, data, or other known limitations. If well-supported, use of a model overlay can represent a sound practice.
Model overlays (including those based solely on expert or management judgment) should be subject to validation or some other type of effective challenge. Consistent with the materiality principle in SR 11-7 and OCC Bulletin 2011–12, the intensity of model risk management for overlays should be a function of the materiality of the model and overlay. Effective challenge should occur before the model overlay is formally applied, not on an ex post basis.
Sensitivity Analysis: Sensitivity analysis is an important tool for stress testing model robustness and checking for model stability. In sensitivity analysis, a model’s output is evaluated by changing or stressing individual input factors to understand the model’s dependency on those factors. Sensitivity analysis can be used as part of a bank’s champion/challenger process, in model validation testing, in quantifying a model risk buffer, and in demonstrating the conservatism of model assumptions.
Sensitivity analysis should be conducted during model development as well as in model validation to provide information about how models respond to changes in key inputs and assumptions, and how those models perform in stressful conditions (SR 15–18).
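As a concrete illustration, the following sketch performs a simple one-factor-at-a-time sensitivity analysis on a hypothetical logistic PD model; the functional form, coefficients, and baseline scenario are invented purely for illustration.

```python
import numpy as np

def pd_model(unemployment, gdp_growth, hpi_growth):
    """Hypothetical logistic PD model; coefficients are illustrative only."""
    z = -3.0 + 0.35 * unemployment - 0.25 * gdp_growth - 0.10 * hpi_growth
    return 1.0 / (1.0 + np.exp(-z))

# Baseline scenario inputs (illustrative).
baseline = {"unemployment": 5.0, "gdp_growth": 2.0, "hpi_growth": 3.0}
base_pd = pd_model(**baseline)

# One-factor-at-a-time sensitivity: shock each input by +/- 1 unit while
# holding the others at baseline, and report the change in the model PD.
for factor in baseline:
    for shock in (-1.0, 1.0):
        shocked = dict(baseline, **{factor: baseline[factor] + shock})
        print(f"{factor:>12s} {shock:+.0f}: dPD = {pd_model(**shocked) - base_pd:+.4f}")
```

Validators typically compare such sensitivities against business intuition (e.g., PD should rise with unemployment) and flag inputs whose shocks move the output disproportionately.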
Challenger/Benchmark Models: Champion/challenger frameworks are important for model governance, delivering model robustness, usability, and long-term performance. They are a critical source of benchmarking for assessing the performance of the primary model.
11.2.3 Use Validation: Summary and Conclusions
The use test’s ultimate goal is to enhance the quality of IRB parameters or stress model estimates, through continuous emphasis on improving the estimation processes. The conditions to create continuous emphasis on the quality of model outputs are: active interaction between users and modelers, and a good understanding of the model, its assumptions, and its limitations among model developers and users.
Active involvement of model users is expected in model development and model maintenance. This should be clearly described in the model development or governance policy and verified through the analysis of modeling workgroup minutes (regular presence of business representatives, suggestions made by the business etc.). For CCAR/DFAST, supervisors look for evidence of active LOB engagements during the risk identification (e.g., segmentation, risk drivers, variable selection) and outcomes challenge processes.
Model developers should demonstrate the efforts made to explain their models to users. This can be found in the agenda of the modeling workgroup, supported by an assessment of the clarity of the presentations and minutes of that workgroup and of the model documentation. The number and quality of the training sessions with users could also be checked. In their discussions with users, modelers should be especially transparent regarding key modeling assumptions and the main constraints and shortcomings of the model. Senior management should also be aware of the main features of the models and all major shortcomings. The validation report must clearly state the constraints, shortcomings, and the corrective actions, if any.
11.3 Validation of Data (Internal and External)
The data and other information used to develop a model are of critical importance. As a result, there should be a rigorous assessment of data quality and relevance, along with appropriate documentation. The second step of a model validation is therefore to review the data used to develop the model, starting with the same data the development team used. The model validation review of the data could include: univariate analyses to independently identify potential variables to include in the model; a review of the range of the response or outcome variable being modeled (e.g., the minimum and maximum default rate in the data by calendar quarter); a review of the number and magnitude of stressful events included in the data; data exclusions; and other tests. External data not used in model development could be added to the validation dataset to check for other risk drivers that were not considered in the development stage of the model. The intent of this part of model validation is to understand limitations of the data used to develop the model and their implications for the estimates produced by the model (see, e.g., Glowacki (2012)).
Data availability for wholesale portfolio loss modeling is a challenge for many banking institutions. Several types of portfolios may have very few defaults. For example, some portfolios historically have experienced low numbers of defaults and are generally – but not always – considered to be low-risk (e.g., portfolios of exposures to sovereigns, banks, insurance companies or highly rated corporates). Other portfolios may be relatively small in terms of total exposures, either globally or at an individual bank level (e.g. project finance, shipping), or a banking institution may be a recent market entrant for a given portfolio. Other portfolios may not have incurred recent losses, but historical experience, or other analysis, may suggest there is a greater likelihood of losses than is captured in recent data.
11.3.1 Data Validation: AIRB Regulatory Capital Models
The Basel II framework recognizes that relatively sparse data might require increased reliance on alternative data sources and data-enhancing tools for quantification and alternative techniques for validation. The Basel guidance (BCBS (2005), No. 6) also recognizes that there are circumstances in which banking institutions will legitimately lack sufficient default history to compare realized default rates with parameter estimates that may be based in part on historical data. In such cases, greater reliance must be placed on validation techniques such as:
Pooling of data with other banks or market participants, the use of other external data sources, and the use of market measures of risk can be effective methods to complement internal loss data. A bank would need to satisfy itself and its supervisor that these sources of data are relevant to its own risk profile. Data pooling, external data and market measures can be effective means to augment internal data in appropriate circumstances. This can be especially relevant for small portfolios or for portfolios where a bank is a recent market entrant.
Internal portfolio segments with similar risk characteristics might be combined. For example, a bank might have a broad portfolio with adequate default history that, if narrowly segmented, could result in the creation of a number of low default portfolios. While such segmentation might be appropriate from the standpoint of internal use (e.g., pricing), for purposes of quantifying risk parameters for regulatory capital purposes, it might be more appropriate to combine the sub-portfolios.
In some circumstances, different rating categories might be combined and PDs quantified for the combined category. A bank using a rating system that maps to rating agency categories might find it useful, for example, to combine AAA, AA and A-rated credits, provided this is done in a manner that is consistent with paragraphs 404–405 of the Basel II Framework. This could enhance default data without necessarily sacrificing the risk-sensitivity of the bank’s internal rating system.
The upper bound of the PD estimate can be used as an input to the formula for risk-weighted assets for those portfolios where the PD estimate itself is deemed to be too unreliable to warrant direct inclusion in capital adequacy calculations.
Banks may derive PD estimates from data with a horizon that differs from one year. Where defaults are spread out over several years, a bank may calculate a multi-year cumulative PD and then annualize the resulting figure (a worked sketch of this conversion follows this list). Where intra-year rating migrations contain additional information, these migrations could be examined as separate rating movements in order to infer PDs. This may be especially useful for the higher-quality rating grades.
If low default rates in a particular portfolio are the result of credit support, the lowest non-default rating could be used as a proxy for default (e.g., banks, investment firms, thrifts, pension funds, insurance firms) in order to develop ratings that differentiate risk. When such an approach is taken, calibration of such ratings to a PD consistent with the Basel II definition of default would still be necessary. While banks would not be expected to utilize all of these tools, they may nevertheless find some of them useful. The suitability and most appropriate combination of individual tools and techniques will depend on the bank’s business model and characteristics of the specific portfolio.
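As a worked example of the annualization technique noted above, the sketch below applies the standard constant-hazard conversion of a multi-year cumulative PD to a one-year figure; the numbers are illustrative, and the formula is the common survival-based identity rather than a prescription from the Basel text.

```python
# Annualize a multi-year cumulative PD under a constant-hazard assumption:
#   PD_1yr = 1 - (1 - PD_cum) ** (1 / T)
def annualize_pd(cumulative_pd: float, horizon_years: float) -> float:
    return 1.0 - (1.0 - cumulative_pd) ** (1.0 / horizon_years)

# Example: a 5-year cumulative PD of 4% implies a one-year PD of about 0.81%.
print(f"{annualize_pd(0.04, 5):.4%}")
```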
11.3.2 Data Validation: CCAR/DFAST Models
The CCAR Range of Practice Expectations (ROPE) guidance states that banking institutions should ensure that models are developed using data that contain sufficiently adverse outcomes. If an institution experienced better-than-average performance during previous periods of stress, it should not assume that those prior patterns will remain unchanged in the stress scenario. As such, institutions should carefully review the applicability of key assumptions and critically assess how historically observed patterns may change in unfavorable ways during a period of severe stress for the economy, the financial markets, and the institution.
For CCAR/DFAST loss and revenue estimates, banking institutions should generally include all applicable loss events in their analysis, unless an institution no longer engages in a line of business, or its activities have changed such that the institution is no longer exposed to a particular risk. Losses should not be selectively excluded based on arguments that the nature of the ongoing business or activity has changed – for example, because certain loans were underwritten to standards that no longer apply, or were acquired and, therefore, differ from those that would have been originated by the acquiring institution.
The supervisory expectations for model validation as laid out in SR 11–7 and OCC Bulletin 2011–12 address all stages of the model development lifecycle, including the review of reference data. Specifically, data quality assessment would include:
Assessing the appropriateness of the selected data sample for model development; for stress testing purposes the data sample should include at least one business cycle
Evaluating the portfolio segmentation scheme in accordance with FR Y-14A reports submitted to the Federal Reserve
Data reconciliation (e.g., exclusions) and validity checks
Assessing treatment of missing values and outliers
Assessing suitability of using proxy data where applicable.
One banking institution used a credit ratings transition matrix model (TMM) framework to forecast quarterly transition rates under specified macroeconomic scenarios. The TMM was developed for each of a dozen segments defined by business type and region (including international segments), and it forecasts key credit ratings migration rates, i.e., upgrade, downgrade and default, at a segment level.
The Bank’s Model Validation assessed the model development data inputs and sources; the quality and relevance of the model development data; the data processing and data exclusions; and the dependent and independent variable definitions and transformations.
Data Inputs and Sources: The TMM was developed using internal historical ratings and default data and external data from Moody’s Default and Recovery Database (DRD). Validation reviewed Loan data; Risk Rating Data; Historical Defaults; and Moody’s DRD. Validation reported the following observations:
Obligor ratings were not actually refreshed every quarter. Therefore, historical quarterly transitions may appear muted and rating inertia may be overstated in the calibration dataset.
Based on independent analysis of the raw datasets for the TMM and the LGD models, discrepancies existed in the default counts between the two datasets.
Inconsistencies exist with respect to assignment of risk segment, risk ratings and defaults for the population of common obligors in both the internal and external data.
To assess data quality and completeness, validation:
Performed data reconciliation and found discrepancies between position data and the development data.
Checked Accuracy of Raw Data: Verified that the default flag provided in the raw dataset was accurate.
Checked for Completeness of External Data: Verified that complete external data were used and all exclusions were reasonable and documented.
Checked for Quality of External Data: Ensured that key fields were populated with intuitive and valid data points.
Review of Model Development Data Relevance: Validation noted that the historical data ranges were not identical across all segments, which is not good practice. The internal dataset was augmented with external data to address the issue of insufficient data, particularly ratings-transition data for the international segments. Developers presented an analysis of the consistency of external data with internal data with respect to the definition of default, risk rating, and risk segment. Validation observed that, while the results for default rates and risk segment were comparable between internal and external data, results for mapping Moody’s to internal risk ratings were less satisfactory. Validation also noted that the model reference data period was sufficiently long and included a period of severe economic stress (as per supervisory guidelines).
Data Processing and Exclusions: The model developers did not perform an analysis of the impact of data exclusions on the TMM component. Validation independently implemented the exclusions applied to the development dataset and assessed the rationale for each exclusion in light of business intuition and impact of observed default in the development data. Validation observed that, due to the exclusion of scorecards with data anomalies, upgrade and downgrade rates in several risk segments within the TMM framework changed significantly.
11.3.3 Data Validation: Summary and Conclusions
Effective data validation practices include:
Checking data samples used for model development, calibration or validation to verify the accuracy, consistency and completeness of the data used
Checking a sample of observations to verify whether rating processes are applied appropriately
Investigating missing data and data exclusions to ensure the representativeness of data used
Reconciling figures between business reports (e.g., accounting information) and model development (e.g., risk databases)
Understanding the bank’s rationale for certain data aggregations, as well as evaluating inconsistencies between the source data and the data actually used for model development
Evaluating the exclusion or filtering criteria used for creating model development and validation data samples
Reviewing the adequacy of the data cleaning policy
Reviewing computer codes (e.g., SAS, Stata, R, MATLAB, Excel) used for the risk rating, parameter estimation, or model validation processes.
11.4 Validation of Assumptions and Methodologies
This step of a model validation process involves a review of the selection of the type of model and the associated modeling assumptions to determine if they are reasonable approximations of reality. The selection of the model type should be justified by the model development team and include a discussion of the types of models considered but not selected. The structure of the model should reflect significant properties of the response or outcome variable being modeled. As noted by Glowacki (2012), a logistic model is commonly selected to estimate PDs as it has desirable attributes such as the ability to model dichotomous events (default or no default) and produce a probability estimate between 0 and 1. The logistic model is not always appropriate, however, particularly when average default rates are small (around 1%). For stress-test applications, stressed default rates can easily spike to greater than 10% or 20%, due to the shape of the logistic curve. This can be problematic, since such results are generally not consistent with actual experience. In sum, the model validation group must be aware of the properties of a logistic model and be able to assess the appropriateness of the assumptions underlying its use.
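A small numerical sketch makes the point; the one-factor model and its coefficients are hypothetical, with the intercept chosen so that the baseline default rate is near 1%.

```python
import numpy as np

# Hypothetical one-factor logistic PD model; beta0 is set so that the
# baseline (x = 0) default rate is roughly 1%.
beta0, beta1 = -4.6, 0.5

def logistic_pd(x):
    return 1.0 / (1.0 + np.exp(-(beta0 + beta1 * x)))

print(f"baseline (x=0): {logistic_pd(0.0):.2%}")   # ~1.0%
# Pushing the risk driver into stressed territory moves the model up the
# steep part of the logistic curve, so the PD multiplies rapidly.
for x in (2.0, 4.0, 6.0):
    print(f"stressed (x={x:.0f}): {logistic_pd(x):.2%}")  # ~2.7%, ~6.9%, ~16.8%
```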
If the model under review is a regression model, a model validation should include a review of the variables and coefficient estimates in the model, the methodology used for selecting the variables, and the goodness-of-fit results for the model. This review would include an understanding of any transformations performed on the data in the regression model for reasonableness, as well as discussions with the model development team on variable selection to understand the process utilized in developing the model.
If the model is not a regression model, a model validation should include a review of the form of the model, the inputs into the model, and the sensitivity of the model to these inputs. Part of the model validation should also include discussions with the model development team on how the model was developed, the reasoning for ultimate model selection, and limitations of the model.
11.4.1 Validation of Assumptions and Methodologies: AIRB Regulatory Capital Models
The validation process involves the examination of the rating system and the estimation process and quantification methods for PD, LGD and EAD. It also requires verification of the minimum requirements for the AIRB approach. The application of validation methods is closely linked to the type of rating system and its underlying data, e.g., ratings for small business lending will typically be of a more quantitative nature, based on a rather large quantity of data. Sovereign ratings instead will typically place more emphasis on qualitative aspects because these borrowers are more opaque and default data are scarce (see, e.g., BCBS (2005), No. 14, p. 8 and BCBS (2005) No. 6).
Validation by a banking institution consists of two main components: validation of the rating system and estimates of the risk components PD, LGD, and EAD; and validation of the rating process, focusing on how the rating system is implemented.
In the case of a model-based rating system, the validation of the model design should include, for example, a qualitative review of the statistical model building technique, the relevance of the data used to build the model for a specific business segment, the method for selecting the risk factors, and whether the selected risk factors are economically meaningful.
Evaluation of an internal rating process involves important issues like data quality, internal reporting, how problems are handled and how the rating system is used by the credit officers. It also entails the training of credit officers and a uniform application of the rating system across different branches. Although quantitative techniques are useful, especially for the assessment of data quality, the validation of the rating process is mainly qualitative in nature and should rely on the skills and experience of typical banking supervisors. The following paragraph provides more detail on these issues.
Banking institutions must first assign obligors to risk grades. All obligors assigned to a grade should share the same credit quality as assessed by the bank’s internal credit rating system. Once obligors have been grouped into risk grades, the bank must calculate a “pooled PD” for each grade. The credit-risk capital charges associated with exposures to each obligor will reflect the pooled PD for the risk grade to which the obligor is assigned. While supervisory guidance presents permissible approaches to estimating pooled PDs, it permits banks a great deal of latitude in determining how obligors are assigned to grades and how pooled PDs for those grades are calculated. This flexibility allows banks to make maximum use of their own internal rating and credit data systems in quantifying PDs, but it also raises important challenges for PD validation. Supervisors and bank model validators will not be able to apply a single formulaic approach to PD validation, because the dynamic properties of pooled PDs depend on each bank’s particular approach to rating obligors. Supervisors need to exercise considerable skill to verify that a bank’s approach to PD quantification is consistent with its rating philosophy. The underlying rating philosophy must be assessed before validation results can be judged, because the rating philosophy is an important driver of the expected range of deviation between the PDs and actual default rates.
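As a simple illustration of one permissible quantification approach, the sketch below computes a pooled PD per grade as the long-run average of annual default rates; the obligor-year panel is hypothetical.

```python
import pandas as pd

# Hypothetical obligor-year panel: one row per obligor per year, with the
# assigned grade and a default indicator for that year.
panel = pd.DataFrame({
    "year":    [2019, 2019, 2019, 2020, 2020, 2020, 2021, 2021, 2021],
    "grade":   ["A",  "B",  "B",  "A",  "B",  "B",  "A",  "B",  "B"],
    "default": [0,    0,    1,    0,    1,    0,    0,    0,    0],
})

# Annual default rate per grade, then the simple average across years --
# one common way of deriving a "pooled PD" for each grade.
annual_rates = panel.groupby(["grade", "year"])["default"].mean()
pooled_pd = annual_rates.groupby("grade").mean()
print(pooled_pd)
```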
Banking institutions typically employ two stages in the validation of PD: validation of the discriminatory power of the internal obligor risk rating; and validation of the calibration (i.e., accuracy) of the internal rating system.
Quantitative measures used to test the discriminatory power of a rating system include (see, e.g., Scandizzo (2016), Loffler and Posch (2007), Bohm and Stein (2009), and Christodoulakis and Satchell (2009)):
Cumulative Accuracy Profile (CAP) and Gini Coefficient
Receiver operating characteristic (ROC) curve, the ROC measure (area under the ROC curve), and the Pietra index
Bayesian error rate
Entropy measures (e.g., conditional information entropy ratio (CIER))
Information value
Kendall’s tau and Somers’ D
Brier score
Divergence
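For example, two of the most widely used of these measures, the ROC area and the Gini coefficient (accuracy ratio), are related by Gini = 2·AUC − 1 and can be computed as in the following sketch; the scores and default flags are hypothetical.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Hypothetical rating-model scores (higher score = riskier obligor) and
# realized default flags.
defaults = np.array([0, 0, 1, 0, 1, 0, 0, 1, 0, 0])
scores   = np.array([0.1, 0.3, 0.8, 0.2, 0.7, 0.4, 0.1, 0.25, 0.3, 0.2])

auc = roc_auc_score(defaults, scores)
gini = 2.0 * auc - 1.0   # Gini / accuracy ratio from the ROC area
print(f"AUC = {auc:.3f}, Gini = {gini:.3f}")
```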
Commonly-used calibration methodologies include:
Binomial test with an assumption of independent default events
Binomial test with an assumption of non-zero default correlation
Chi-square test
Brier score
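The first of these, the binomial test under independence, is sketched below; the grade size, pooled PD, and default count are illustrative.

```python
from scipy import stats

# Binomial calibration test for one rating grade, assuming independent
# defaults: with n obligors and an assigned pooled PD p, is the observed
# default count consistent with the estimate?
n, p = 500, 0.02          # obligors in the grade, estimated pooled PD
observed_defaults = 16    # realized defaults over the year

# One-sided test for under-calibration: are there significantly more
# defaults than the PD estimate implies?
result = stats.binomtest(observed_defaults, n, p, alternative="greater")
print(f"p-value = {result.pvalue:.4f}")
```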
For LGD and EAD models, quantitative validation techniques are significantly less advanced than those used for PD. There are four generally accepted methods for assigning LGD to non-default facilities: workout LGD; market LGD; implied historical LGD; and implied market LGD. Of these four methods, workout LGD is the most commonly used in the industry. Risk drivers such as the type and seniority of the loan, existing collateral, the liquidation value of the obligor’s assets, and the prevailing bankruptcy laws should be considered for LGD estimation.
For EAD models, banking institutions typically use either the cohort method or fixed-horizon method in the construction of the development dataset for EAD estimation. The requirements for the estimation process of EAD and the validation of EAD estimates are similar to those for LGD.
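A minimal sketch of the cohort method follows, under the simplifying (assumed, though common) convention that EAD is quantified through a credit conversion factor (CCF) applied to the undrawn amount at the cohort start date; the facility data are hypothetical.

```python
import pandas as pd

# Cohort method, simplified: snapshot drawn balance and limit at the start
# of a one-year cohort, then record realized exposure at default for
# facilities that defaulted within the cohort window.
cohort = pd.DataFrame({
    "facility":   ["F1", "F2"],
    "drawn_t0":   [40.0, 10.0],
    "limit_t0":   [100.0, 50.0],
    "ead_at_def": [85.0, 30.0],
})

# CCF: the share of the undrawn amount at cohort start that was drawn
# down by the time of default.
undrawn = cohort["limit_t0"] - cohort["drawn_t0"]
cohort["ccf"] = (cohort["ead_at_def"] - cohort["drawn_t0"]) / undrawn
print(cohort[["facility", "ccf"]])   # 0.75 and 0.50
```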
11.4.2 Validation of Assumptions and Methodologies: CCAR/DFAST Models
In C&I stress loss modeling, estimations are typically made at the loan level with reporting by segments. There are many dimensions to segmentation, but industry classification is one of the most statistically significant. To show that a particular segmentation approach has appropriate granularity (i.e., segments have sufficient data to develop robust and testable estimates that capture different underlying portfolio risk characteristics), given the modeling objective of forecasting losses under normal and stressed macroeconomic environments, banking institutions provide:
Business rationale for the segmentation, which could be either the business requirements driving a non-statistical segmentation, or the business intuition for the segmentation variables of a statistical segmentation
Evidence that developing a model based on the segmentation approach is feasible (e.g., the number of defaults per segment is adequate based on some criterion)
For statistical segmentations, discussion of the trade-offs compared to alternative segmentations, such as a methodology with appropriate explanatory variables but fewer or no segments
For statistical segmentation, if available, analysis to justify differences between segments (e.g., central tendency and / or dispersion of distributions, risk factors, sensitivities to common risk factors).
One bank used a five-step, iterative segmentation process that combined business intuition with statistical analysis to define segments for the PD and Rating Migration Models:
Step 1: Risk managers, in consultation with representatives from the front-line business and risk units, propose initial industry and geographic segments.
Step 2: Model developers statistically test the proposed segments, working closely with the risk unit to refine the segmentation. The model developers may also suggest additional risk drivers for each segment based on any statistical analyses conducted.
Step 3: Risk and model developers work iteratively to refine the list of segments and determine appropriate and statistically relevant risk drivers.
Step 4: Risk and model developers propose industry and geographic segments to the senior committee for review. Alternative segmentation schemes may be proposed at this stage.
Step 5: The senior committee reviews, challenges, and decides on the segmentation approach with the understanding that some segments may be adjusted and re-approved following model calibration.
Model validation observed that model developers systematically refined and combined the initially proposed segments and tested the adjusted-segment models against historical data to measure the impact on the model’s performance across industries and geographies. Key statistical performance measures, including the ROC measure and the variance inflation factor (VIF), among others, were provided, and the values were found to be stable across all segments. However, no quantitative analysis was provided in support of the stated objective: “First, portfolios must exhibit relatively homogenous behavior within a given segment (i.e., relatively uniform default and rating migration behavior with respect to changes in the model inputs). Second, portfolios must exhibit differentiated risk characteristics across segments.” The segmentation issue was extensively discussed by the senior committee from a functional soundness perspective.
11.4.3 Validation of Assumptions and Methodologies: Summary and Conclusions
Effective validation practices should include:
Assessing conceptual soundness of the model and relevance to published research and/or sound industry practices
Testing assumptions and assessing appropriateness of the chosen modeling approach for intended business purposes
Reviewing alternative methodologies and designs
Evaluating the segmentation and variable selection processes, reflecting appropriate consideration of portfolio risk characteristics.
11.5 Validation of Model Performance
Quantitative credit risk models, particularly those that are complex, can produce inconsistent results. One such example is forecasting CRE property values under stressed macroeconomic scenarios. Some banking institutions develop NOI and Cap Rate models separately in deriving stressed property values (where Property Value = NOI/Cap Rate) for income-producing CRE properties as functions of CRE-related macroeconomic variables such as the mortgage rate, interest rate, and federal funds rate. These NOI and Cap Rate models, when developed separately, can lead to inconsistent forecasts, such as property values rising during stress periods. Therefore, an important step in a model validation is to assess the reasonableness of model outputs. This aspect of model validation relies heavily upon a model validator’s professional expertise and judgment. The review of model performance should include sensitivity analysis, statistical tests (performed either independently from or with the model development team), and other evaluations commensurate with the type of model and scope of the validation. The focus here is to understand the limits of the model and the conditions that indicate whether the model is performing appropriately or not.
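A simple consistency check of the kind a validator might run is sketched below: derive the implied property value path from the separately forecast NOI and Cap Rate series and flag quarters in which the value rises under stress; all inputs are illustrative.

```python
import pandas as pd

# Separately forecast NOI and Cap Rate paths for a stress scenario
# (illustrative numbers).
forecast = pd.DataFrame({
    "quarter":  ["Q1", "Q2", "Q3", "Q4"],
    "noi":      [100.0, 97.0, 95.0, 96.0],
    "cap_rate": [0.060, 0.062, 0.058, 0.061],
})

# Implied property value and a flag for quarters where it rises -- a
# potentially inconsistent joint forecast in a stress scenario.
forecast["property_value"] = forecast["noi"] / forecast["cap_rate"]
forecast["value_rises"] = forecast["property_value"].diff() > 0
print(forecast)
```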
Sensitivity analysis is an important tool for assessing model robustness and for checking model stability. In sensitivity analysis, a model’s output is evaluated by changing individual inputs, or sets of inputs, to understand the model’s dependency on them.
11.5.1 Validation of Model Performance: AIRB Regulatory Capital Models
Basel II (2006) paragraphs 388, 389, 417, 420, 449, 450 and 500–504 provide guidance related to the performance of internal rating systems and the accuracy of risk estimates. Paragraph 389 emphasizes that a banking institution’s “rating and risk estimation systems and processes provide for a meaningful assessment of borrower and transaction characteristics; a meaningful differentiation of risk; and reasonably accurate and consistent quantitative estimates of risk.” However, “it is not the Committee’s intention to dictate the form or operational detail of banks’ risk management policies and practices.”
Based on an internal Basel survey (BCBS (2016)), no jurisdiction has defined a minimum standard for the discriminatory power of rating systems or minimum standards for PD calibration. Banks define their own standards for model performance, informed by Basel Committee guidelines and industry standards and, for certain performance metrics, benchmark themselves against industry practices. The banks’ own standards are then reviewed and challenged by their supervisors. For low-default portfolios, banks and supervisors seek “alternatives to statistical evidence” or take recourse to benchmarking. The internal Basel survey also identified commonly used statistical tests and statistics for backtesting purposes. Table 11.1 summarizes statistical tests and related statistics used by banks for back-testing.
Table 11.1. Statistical tests and related statistics used by banks for backtesting (BCBS (2016)).

| Discriminatory power of rating systems | PD calibration | LGD and EAD calibration |
|---|---|---|
| CAP and Gini coefficient; ROC curve, ROC measure and Pietra index; Bayesian error rate; entropy measures (e.g., CIER); information value; Kendall’s tau and Somers’ D; Brier score; divergence | Binomial test (with and without an assumption of default correlation); chi-square test; Brier score | Comparison of realized loss rates and realized exposures on defaulted facilities to LGD and EAD estimates |
Most of these statistical tests are sensitive to the assumption of independent observations; however, tests that assume independence generally give conservative results. Backtesting generally includes out-of-sample and out-of-time tests. A backtesting failure is generally not a standalone trigger for rejection of a model. Instead of rejecting a model because of backtesting failures, less strict reactions such as capital add-ons may be applied until the model weaknesses are addressed.
11.5.2 Validation of Model Performance: CCAR/DFAST Models
11.5.2.1 Federal Reserve SR 15–18 Guidance on Assessing Model Performance
A firm should use measures to assess model performance that are appropriate for the type of model being used. The firm should outline how each performance measure is evaluated and used. A firm should also assess the sensitivity of material model estimates to key assumptions and use benchmarking to assess reliability of model estimates (see Appendix C, “Use of Benchmark Models in the Capital Planning Process” and Appendix D, “Sensitivity Analysis and Assumptions Management”).
A firm should employ multiple performance measures and tests, as generally no single measure or test is sufficient to assess model performance. This is particularly the case when the models are used to project outcomes in stressful circumstances. For example, assessing model performance through out-of-sample and out-of-time backtesting may be challenging due to the short length of observed data series or the paucity of realized stressed outcomes against which to measure the model performance. When using multiple approaches, the firm should have a consistent framework for evaluating the results of different approaches and supporting rationale for why it chose the methods and estimates ultimately used.
A firm should provide supporting information about models to users of the model output, including descriptions of known measurement problems, simplifying assumptions, model limitations, or other ways in which the model exhibits weaknesses in capturing the relationships being modeled. Providing such qualitative information is critical when certain quantitative criteria or tests measuring model performance are lacking.
Quantitative validation of loss models involves backtesting, sensitivity analysis and application of key statistical tests to gauge overall model robustness. Depending on the underlying modeling approach, the most appropriate metrics should be selected covering relevant validation areas. These metrics may be evaluated in-sample, out-of-sample or across multiple sub-samples. Table 11.2 describes various validation areas and the associated key metrics.
Table 11.2. Metrics for outcomes analysis (Shaikh et al. (2016)).

| Validation Areas | Description | Key Metrics |
|---|---|---|
| Accuracy | Comparison of actual outcomes to model predictions (e.g., default, upgrade, and downgrade rates by risk rating). | Backtesting error measures comparing predicted to realized rates. |
| Stability | Analysis of shifts in population characteristics from the time of model development to any reference time period. | Population stability index (PSI). |
| Sensitivity | Capturing the sensitivity of models to macroeconomic factors by performing factor prioritization and factor mapping. | Sensitivity ratio. |
| Model discrimination | Validation of the statistical measure of the model’s ability to discriminate risk. | Discriminatory power statistics (e.g., CAP/Gini, ROC; see Section 11.4.1). |
| Vintage analysis | Comparison of the behavior of losses over time. | Loss curves by origination vintage. |
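As an example of the stability analysis in Table 11.2, the sketch below computes the population stability index (PSI), a commonly used stability metric; the bucket shares are hypothetical and the quoted thresholds are informal industry rules of thumb, not regulatory standards.

```python
import numpy as np

def psi(expected_pct, actual_pct, eps=1e-6):
    """Population stability index between the development-sample and
    current-period distributions over the same bins."""
    e = np.clip(np.asarray(expected_pct, dtype=float), eps, None)
    a = np.clip(np.asarray(actual_pct, dtype=float), eps, None)
    return float(np.sum((a - e) * np.log(a / e)))

# Share of the portfolio in each rating bucket at model development vs.
# today.  Rules of thumb often treat PSI < 0.10 as stable and PSI > 0.25
# as a material population shift.
dev   = [0.10, 0.25, 0.40, 0.20, 0.05]
today = [0.08, 0.20, 0.38, 0.26, 0.08]
print(f"PSI = {psi(dev, today):.4f}")
```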
A banking institution modeled LGD for income-producing CRE loans using a Tobit regression with the inverse of LTV, i.e., 1/LTV, property value, and macroeconomic variables as predictor variables. The LGD model was segmented by property types, where the region-specific indicators were used for the US loans to account for the LGD variation across regions. LGD regressions were estimated using a combination of the bank’s internal default data and external Trepp default data. Model risk management (MRM) evaluated the following items:
The modeling methodology and pros and cons of the selected modeling approach
The economic intuition behind the choice of explanatory variables used in the model
Consistency of the explanatory variables across the different property type and regional segments.
More specifically, MRM evaluated the pros and cons of the selected modeling approach with respect to industry publications and academic research in the public domain and observed that the model may not sufficiently capture the following effects: (1) impact of the rent rate on vacancy rate; (2) impact of new construction on the vacancy rate and rent rate; (3) impact of usage factor on vacancy rate and rent rate; (4) impact of property-type specific risk drivers on vacancy rate and rent rate; (5) cyclical nature of CRE market dynamics; and (6) impact of rent rates on cap rates.
Additionally, the validation tests conducted by MRM showed:
Residuals for all LGD regressions (Tobit model) failed normality and homoscedasticity tests (a diagnostic sketch follows this list). This is important, since the Tobit model makes normality and homoscedasticity distributional assumptions for regression residuals.
The LGD model does not capture differences due to default type (term default vs maturity default) or recourse type (recourse vs non-recourse).
The model underpredicts losses for specific property classes (retail, multifamily, industrial, etc.) based on in-sample and out-of-sample backtesting results. The underprediction in the LGD model is partially mitigated by a defaulted property value adjustment applied to the loss forecasts.
The coefficients for GDP growth rate in the retail segment and for property value in the multifamily segment showed statistically significant counter-intuitive signs, when the model was calibrated using a different time frame.
When re-estimated at a regional level, the GDP growth rate variable in the Retail LGD model shows a statistically significant counter-intuitive sign for a geographic segment.
When the model is re-estimated separately using the bank’s internal data only and the external Trepp CMBS data only, the coefficient estimate on the Unemployment rate in the retail and industrial segments shows opposite signs between the two datasets, which is not intuitive.
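The residual diagnostics cited in the first finding above can be sketched as follows; synthetic heteroscedastic data stand in for the Tobit residuals, and the Jarque-Bera and Breusch-Pagan tests serve as generic examples of normality and homoscedasticity checks.

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats
from statsmodels.stats.diagnostic import het_breuschpagan

# Synthetic data with variance that grows with x, standing in for model
# residuals; in practice, use the residuals from the fitted LGD model.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, 200)
y = 0.4 + 0.3 * x + rng.normal(0.0, 0.05 * (1.0 + 2.0 * x))

exog = sm.add_constant(x)
resid = sm.OLS(y, exog).fit().resid

jb_stat, jb_pvalue = stats.jarque_bera(resid)               # normality
bp_stat, bp_pvalue, _, _ = het_breuschpagan(resid, exog)    # homoscedasticity
print(f"Jarque-Bera p = {jb_pvalue:.3f}, Breusch-Pagan p = {bp_pvalue:.3f}")
```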
11.5.3 Model Performance Validation: Summary and Conclusions
In validating model performance, sensitivity analysis is an important and effective tool for assessing model robustness and checking for model stability and can be used for:
Model validation testing
Quantifying a model risk buffer
Demonstrating the conservatism of model assumptions
11.5.4 Outcomes Analysis
Outcomes analysis compares the estimates produced by a model against historically observed outcomes, as opposed to identifying the limitations of a model. Examples of outcomes analysis include backtesting, out-of-sample testing, and actual-to-expected comparisons on an ongoing basis. Outcomes analysis should be performed prior to implementing the model and at least annually after implementation to ensure the model is performing as expected. Error limits should be developed for the outcomes analysis results, and if the actual errors from the model exceed those limits, predetermined actions should be required. If the model is recalibrated or updated on an annual basis, limits should also be developed to monitor the size and frequency of re-estimation. If updating the model repeatedly results in large changes in the estimates produced by the model, then certain actions should be required, including external model validation, a recalibration of the model, or even development of an entirely new methodology or type of model. The action type and triggers for the specific action should be set out in advance of, and in accordance with, the use and risk of the model. These policies should be written and included in the model governance policies of the banking institution. The initial model validation could be used to help set error limits for the model.
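A minimal sketch of such a predetermined-action scheme follows; the error limits and actions are hypothetical, and in practice they would be set in the bank's model governance policy.

```python
# Actual-to-expected outcomes check with predetermined escalation limits.
# The thresholds below are illustrative only.
def outcomes_check(actual_rate: float, predicted_rate: float) -> str:
    ratio = actual_rate / predicted_rate
    if 0.8 <= ratio <= 1.2:
        return "within limits: no action required"
    if 0.5 <= ratio <= 1.5:
        return "amber: investigate and document the deviation"
    return "red: recalibrate, revalidate, or redevelop the model"

print(outcomes_check(actual_rate=0.018, predicted_rate=0.012))  # ratio = 1.5
```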
11.5.4.1 Outcomes Analysis: AIRB Regulatory Capital Models
Banking institutions are expected to conduct a number of exercises to demonstrate the accuracy of their IRB estimates (paragraphs 388 and 389 of the Basel II Accord). Such exercises should include comparisons of estimates to relevant internal and external data (benchmarking), comparisons of estimates to those produced by other estimation techniques (often referred to as Challenger Models), and the comparison of model estimates to realized outcomes (backtesting). The benchmarking exercises could be any of the following (BCBS internal observations):
Cross-bank comparisons: These exercises involve aggregating IRB estimates or internal ratings across portfolios and portfolio segments, and comparing the results with those of other peer banks or external sources (e.g., external agency ratings such as those provided by Moody’s and S&P).
Common obligor analysis: These exercises involve aggregating IRB estimates for a subset of exposures where multiple banks are exposed to an identical set of obligors. Identifying commonly held sets of obligors effectively controls for differences in underlying risk, so that deviations of the three key parameters from the benchmark are less likely to be attributable to differences in risk.
Hypothetical portfolio exercises: In these exercises, banks are asked to develop IRB estimates for a hypothetical set of exposures.
Backtesting exercises: Backtesting exercises involve the comparison of realizations of historical defaults and losses to IRB estimates. Specifically, historical realizations of default rates are compared to PD estimates, historical realization of loss rates on defaulted exposures are compared to LGD estimates, and historical realizations of exposure sizes on defaulted exposures are compared to EAD estimates. Such comparisons can show whether bank estimates show a reasonable relationship to actual risk-determined outcomes.
Thematic reviews of modeling practices (mostly conducted by supervisors): Benchmarking exercises need not be restricted to quantitative considerations. Some supervisors mentioned thematic reviews of specific modeling practices across banks. The “benchmark” in this case might be practices observed to be common or expected, with an objective to identify bank practices that deviate from this benchmark. Such exercises are resource-intensive in that they typically require on-site interactions and in-depth reviews of model development documentation.
Regression-based exercises (mostly conducted by supervisors): Some supervisors apply regression techniques to the development of benchmarks for PD, LGD and EAD estimates. Regression specifications and techniques vary but they all necessarily rely on risk-driver information obtained through supervisory reporting to produce benchmark estimates that can then be compared to banks’ estimates.
Benchmarking and challenger models are both important analytical tools that can be used in a bank’s validation efforts. However, it is important to note that the two types of tools focus on different aspects of model validation. Benchmarking provides insights into the performance of IRB parameter estimates (outcomes analysis) relative to a benchmark; whereas, challenger models provide insights into the bank’s chosen modeling approach (process analysis) relative to alternative modeling approaches.
A benchmark model was developed by a banking institution as a part of its LGD validation using external data from Moody’s Ultimate Recovery Database (URD). The benchmark model assumed a linear relationship between LGD and several risk drivers. For each observation, the dataset included risk drivers and other attributes, such as instrument type and default type, which were used to build a statistically based benchmark model. The LGD was estimated as a linear function of the following variables: debt cushion; instrument type; default type; instrument ranking; issuer total debt; and principal amount at default. All estimated coefficients were statistically significant at the 95% confidence level and the R-squared was above 40%. Predicted LGDs from the benchmark model developed on Moody’s data were higher than the observed LGDs.
The validation report noted, among other issues, the following limitations for the benchmark model:
The majority of Moody’s URD data used to construct the benchmark model used corporate bonds; whereas, the bank’s LGD data used bank loans.
The bank faced challenges in mapping its internal data to external data in order to use the benchmark model that was developed on Moody’s data. The selected risk drivers in the benchmark model, for example, debt cushion, may not have been mapped perfectly to an equivalent variable in the internal data.
Facility LGD within a segment typically exhibits a bi-modal or beta-like distribution, driven by common risk mitigants such as collateral, seniority, or type of business or product. LGD performance can be related to the macroeconomic factors in a country or a sector, and to a bank’s recovery process or practice.
The predicted LGDs from the benchmarking model exhibit a Gaussian-like distribution, which differs from the observed LGD distribution based on Moody’s URD data. This result indicates that a linear regression model based on the statistically selected risk drivers is not adequate to capture the LGD profile.
The predicted facility-level LGDs were mostly concentrated in the 30%–45% facility grades, indicating that the benchmarking model performed poorly in differentiating facility-level LGDs.
11.5.4.2 Outcomes Analysis: CCAR/DFAST Models
In CCAR/DFAST, benchmark models should provide a significantly different perspective (such as a different theoretical approach, different methodology, and different data) as opposed to just tweaking or making minor changes to the primary or champion approach.
Examples of good or leading practices would include the following:
Identification of material portfolios that require a benchmark
Having a set process for developing and implementing benchmark models (as banks have for all models)
Clear expectations for benchmarks to supplement results of the primary or champion model
Using several different benchmark models, each with its own strengths, thereby allowing the bank to triangulate around an acceptable model outcome
Using benchmark models to alter primary or champion model results as a “bridge” or transition to eventual better modeling in the future.
Examples of bad or lagging practices would include the following:
No benchmark model, a poorly fitting benchmark, or a benchmark not evaluated for quality
Overreliance on developer benchmarks by validation staff
Differences in results not reconciled or explained
Changing a variable or two, or otherwise making slight tweaks to the primary model, and then claiming the result is a benchmark model
Results of benchmarking exercises can be a valuable diagnostic tool in identifying potential weaknesses in a bank’s risk quantification system. However, benchmarking results should never be considered definitive indicators of the relative accuracy or conservativeness of banks’ estimates. The benchmark itself is an alternative estimate, and differences from that estimate may be due to different data, different levels of risk, or different modeling methods. The identification of outliers from the benchmark should always be investigated further to determine underlying causes of divergences.
Because no single benchmarking technique is likely to be adequate for all situations, the development of underlying benchmarks should also consider multiple approaches to arrive at more informed conclusions. As examples, benchmarks can be constructed using unweighted or exposure-weighted averages, PIT or TTC estimates, or with or without regulatory add-ons.
Benchmarking exercises should consider multiple layers of analyses to avoid drawing misleading conclusions. For example, analysis at a portfolio level may suggest alignment with the benchmark when a bank’s estimates overpredict for some sub-portfolios or segments (by product type, by geography, or by rating grade) but underpredict for others.
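The sketch below illustrates the point: exposure-weighted portfolio-level PDs can align closely while segment-level comparisons reveal offsetting over- and under-prediction; all figures are hypothetical.

```python
import pandas as pd

# Bank and benchmark PD estimates by segment, with exposures (hypothetical).
df = pd.DataFrame({
    "segment":  ["office", "retail", "multifamily", "industrial"],
    "bank_pd":  [0.020, 0.015, 0.030, 0.025],
    "bench_pd": [0.015, 0.022, 0.035, 0.018],
    "exposure": [400.0, 300.0, 200.0, 100.0],
})

# Exposure-weighted portfolio-level PDs look nearly identical...
for col in ("bank_pd", "bench_pd"):
    weighted = (df[col] * df["exposure"]).sum() / df["exposure"].sum()
    print(f"{col}: {weighted:.4f}")   # 0.0210 vs 0.0214

# ...but the segment-level view shows offsetting deviations.
df["bank_minus_bench"] = df["bank_pd"] - df["bench_pd"]
print(df[["segment", "bank_minus_bench"]])
```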
Benchmarking analyses that rely on multiple data sources are likely to produce more robust analyses than those that rely on a single data source. Similarly, benchmarking analyses that also consider qualitative factors (differing modeling approaches and environmental factors) are likely to be more informative than strictly quantitative exercises.
11.6 Model Validation Report
The final step of a model validation is communication of the results through a model validation report. The model validation report should be a written report that documents the model validation process and results. The report should highlight potential limitations and assumptions of the model and it may include suggestions on model improvements.
11.6.1 Model Validation Report: AIRB Regulatory Capital Models
Validation reports should be transparent. Transparency refers to the extent to which third parties, such as rating system reviewers and internal or external auditors and supervisors, are able to understand the design, operations and accuracy of a bank’s IRB systems and to evaluate whether the systems are performing as intended (US Final Rule, Section 22(k)). Transparency should be a continuing requirement and achieved through documentation. Banks are required to update their documentation in a timely manner, such as when modifications are made to the rating systems.
Documentation should encompass, but is not limited to, the internal risk rating and segmentation systems, risk parameter quantification processes, data collection and maintenance processes, and model design, assumptions, and validation results. The guiding principle governing documentation is that it should support the requirements for the quantification, validation, and control and oversight mechanisms, as well as the bank’s broader credit risk management and reporting needs. Documentation is critical to the supervisory oversight process. A bank’s validation policy should outline the document requirements.
One bank’s validation policy specified the documentation template for model assessment along the following topics:
Validation timeline
Summary of validation
Intended uses of the Model
Model input and data requirement
Data processing procedures and transformations
Model assumptions
◦ Market, business or data related assumptions and decisions
◦ Mathematical, statistical or technical assumptions and decisions
◦ General assessment of model assumptions
Model review
◦ General model review
◦ Alternative modeling approach
◦ Assessment of business model documentation and testing
Limitations and compensating controls
◦ Limitations of the general modeling framework
◦ Limitations of the model implementation and technical assumptions
◦ Compensating controls
Validation restrictions and corrective actions
Model or system control environment
Model implementation and approximation
Testing approach and validation procedures
◦ Justification for choice of the testing approach
◦ Independent implementation of business model
◦ Independent implementation of benchmark model
◦ Comparison of business models/engines
◦ Business test results
11.6.2 Validation Report: CCAR/DFAST Models
SR 15–18 guidance (p. 9): “A firm’s documentation should cover key aspects of its capital planning process, including its risk-identification, measurement and management practices and infrastructure; methods to estimate inputs to post-stress capital ratios; the process used to aggregate estimates and project capital needs; the process for making capital decisions; and governance and internal control practices. A firm’s capital planning documentation should include detailed information to enable independent review of key assumptions, stress testing outputs, and capital action recommendations.”
CCAR/DFAST validation reports should include assessment of model overlays, if any.
11.6.3 Model Validation Report: Summary and Conclusions
The Model Validation Report should be sufficiently detailed so that parties unfamiliar with a model can understand how the model operates, its limitations, and its key assumptions.
For a complex model with many components (i.e., segments or sub-models), model developers are required to document all tests for each segment or sub-model in a comprehensive document, providing sufficient evidence that a test conducted for one segment remains valid for another, so that model performance can be compared and reported separately for each segment or sub-model. Technical soundness must be assessed for the overall model and for every model component contained in the same model submission.
11.7 Vendor Model Validation and Partial Model Validation
The regulatory expectation is that banks will apply the same rigor in validating vendor models as they do for their internally developed models. Comprehensive validations of so-called black box models developed and maintained by third-party vendors are therefore problematic, because the mathematical code and formulas are not typically available for review (in many cases, a validator can only infer the cause-and-effect relationships between inputs and outputs from the documentation provided by the vendor). Where the proprietary nature of these models limits full-fledged validation, banking institutions should perform robust outcomes analysis, including sensitivity and benchmarking analyses. Banking institutions should monitor vendor models periodically and assess each model’s conceptual soundness, supported by adequate documentation on model customization, developmental evidence, and the applicability of the vendor model to the bank’s portfolio.
Applicable standards from supervisory guidance include:
The design, theory, and logic underlying the model should be well documented and generally supported by published research and sound industry practice.
The model methodologies and processing components that implement the theory, including the mathematical specification and the numerical techniques and approximations, should be explained in detail with particular attention to merits and limitations.
Banking institutions are expected to validate their own use of vendor products and should have systematic validation procedures to help them understand the vendor product and its capabilities, applicability, and limitations. Validation should begin with a review of the documentation provided by the vendor, with a focus on the following:
What is the level of model transparency?
Does the system log results of intermediate calculations?
How complete/detailed and granular is the level of reporting?
Are limitations of the model clearly communicated with the magnitude/scope of possible effects?
Are boundary conditions (i.e., conditions under which the model does not perform well) described in the documentation?
What is the level of documentation provided?
11.7.1 Partial Model Validation
As noted previously, comprehensive model validations consist of three main components: conceptual soundness, ongoing monitoring and benchmarking, and outcomes analysis and backtesting. A comprehensive validation encompassing all these areas is usually required when a model is first put into use. Any validation that does not fully address all three of these areas is by definition a limited-scope or partial validation.
Four considerations can inform the decision as to whether a full-scope model validation is necessary: