SIMULTANEOUS CONFIDENCE BANDS FOR CONDITIONAL VALUE-AT-RISK AND EXPECTED SHORTFALL

Conditional value-at-risk (CVaR) and conditional expected shortfall (CES) are widely adopted risk measures that help monitor potential tail risk while adapting to evolving market information. In this paper, we propose an approach to constructing simultaneous confidence bands (SCBs) for tail risk as measured by CVaR and CES, with the confidence bands uniformly valid for a set of tail levels. We consider one-sided tail risk (downside or upside tail risk) as well as relative tail risk (the ratio of upside to downside tail risk). A general class of location-scale models with heavy-tailed innovations is employed to filter out the return dynamics. Then, CVaR and CES are estimated with the aid of extreme value theory. In the asymptotic theory, we consider two scenarios: (i) the extreme scenario, which allows for extrapolation beyond the range of the available data, and (ii) the intermediate scenario, which works exclusively in the case where the available data are adequate relative to the tail level. For finite-sample implementation, we propose a novel bootstrap procedure to circumvent the slow convergence rates of the SCBs as well as the infeasibility of approximating the limiting distributions. A series of Monte Carlo simulations confirms that our approach works well in finite samples.


INTRODUCTION
When investing in a risky asset, investors assuming a long position are exposed to downside tail risk due to a sharp fall of the asset price, and investors assuming a short position are exposed to upside tail risk due to a sharp rise of the asset price.
In the practice of risk management and investment evaluation, it is important to monitor potential tail risk while adapting to evolving market information, particularly the volatility dynamics (McNeil and Frey, 2000, pp. 272-273). A leading vehicle for this task is the conditional tail risk measure, the most prominent examples of which are conditional value-at-risk (CVaR) and conditional expected shortfall (CES).¹ Let {R_t} be a sequence of returns on a risky asset, and let I_t be the historical information available up to time t. Based on observed data from time 1 through n, at a tail level τ close to 0, the one-period-ahead upside CVaR and CES are, respectively, defined as

U-CVaR(τ) = Q_{R_{n+1}}(1 − τ | I_n)  and  U-CES(τ) = E[R_{n+1} | R_{n+1} > Q_{R_{n+1}}(1 − τ | I_n), I_n],

where Q_{R_{n+1}}(· | I_n) is the conditional quantile function of R_{n+1} given I_n. Similarly, the one-period-ahead downside CVaR and CES at the tail level τ are, respectively, defined as

D-CVaR(τ) = Q_{−R_{n+1}}(1 − τ | I_n)  and  D-CES(τ) = E[−R_{n+1} | −R_{n+1} > Q_{−R_{n+1}}(1 − τ | I_n), I_n],

where Q_{−R_{n+1}}(· | I_n) is the conditional quantile function of −R_{n+1} given I_n. In addition, to facilitate a comparison of downside and upside tail risk, we define, respectively, the one-period-ahead relative CVaR and CES at the tail level τ as

R-CVaR(τ) = U-CVaR(τ)/D-CVaR(τ)  and  R-CES(τ) = U-CES(τ)/D-CES(τ).

A naive approach to simultaneous inference over a set of tail levels is to conduct pointwise inference at each tail level, but this incurs the multiplicity effect that leads to deflated coverage rates of confidence intervals (e.g., Romano, Shaikh, and Wolf, 2018). On the contrary, the SCB is immune to the multiplicity effect and asymptotically targets the nominal confidence level.

We model the dynamics of R_t via a semiparametric location-scale model, in which the conditional mean and variance are parametrically specified but the innovation distribution is left unspecified. The key assumption on the innovation distribution is that it has Pareto-type heavy tails. We first fit the location-scale model to obtain standardized residuals and then employ extreme value theory (EVT) to estimate the risk measures. This type of two-stage approach was pioneered by McNeil and Frey (2000) and has become standard in conditional risk measure estimation (e.g., Chan et al., 2007; Martins-Filho et al., 2018; Hoga, 2019a, 2022). When deriving the asymptotic results, we work under double asymptotics, namely n → ∞ and τ → 0, which is common in EVT-based asymptotic theory (e.g., de Haan and Ferreira, 2006).

In our asymptotic theory, we consider two scenarios: (i) the extreme scenario, which allows for extrapolation beyond the range of the available data, and (ii) the intermediate scenario, which works exclusively in the case where the available data are adequate relative to the tail level. Both scenarios have their relative merits, and when to use which depends on whether extrapolation is needed. Theoretically, when the data are adequate relative to the tail level (that is, when extrapolation is not needed), the intermediate scenario can provide a more accurate finite-sample approximation, because some terms that are neglected in the extreme scenario are retained and hence the approximation error is reduced. This is confirmed by a set of Monte Carlo simulations in Section 5. Nonetheless, our simulations also reveal that when extrapolation is indeed necessary (that is, when the tail lies beyond the range of the available data), the extreme scenario works reasonably well.

This paper makes several contributions. First, this paper provides a unifying asymptotic theory for simultaneous inference for tail risk. The literature has witnessed continued efforts to develop asymptotic theories for conditional risk measure estimators. Chan et al.
(2007) and Hoga (2019a) derive asymptotic properties of estimators of CVaR and CES in the autoregressive moving average (ARMA)-generalized autoregressive conditional heteroskedasticity (GARCH) model, whereas Martins-Filho et al. (2018) do so in a nonparametric location-scale model. The above papers confine their attention to one-sided tail risk (downside or upside only) and only consider tail risk measures at a single tail level. Hoga (2022) establishes a limiting theory for the distortion risk measure and the expectile in a general location-scale model and considers SCBs; again, only one-sided tail risk is considered in Hoga (2022). In addition, each of the existing studies only considers one of the two scenarios of asymptotic theory in this paper. Chan et al. (2007) and Hoga (2019a, 2022) derive their asymptotic theory in the extreme scenario, and the asymptotic theory of Martins-Filho et al. (2018) is developed under a condition that is essentially equivalent to the intermediate scenario. By incorporating the two scenarios into a unifying framework, our theory substantially generalizes the existing studies.
Second, this paper provides a limiting theory for the relative risk measures, allowing risk managers to monitor tail risk not only in the absolute sense, but also in the relative sense. Inference for relative tail risk is relevant because it is useful to both expected utility and nonexpected utility investors in making investment decisions. The cumulative prospect theory (CPT) of Tversky and Kahneman (1992) holds that nonexpected utility investors are generally risk-seeking for tail gains but risk-averse for tail losses. A CPT investor generally overweights the tails of the return distribution and hence prefers lottery-like, or positively skewed, assets. Therefore, our approach possesses practical relevance in that it can help CPT investors identify assets with more tail gains than tail losses (that is, assets with more upside risk than downside risk), which can make their overall wealth more lottery-like. On the other hand, Barberis and Huang (2008) show that preference for a lottery-like asset can make the asset overpriced, which creates an opportunity for expected utility investors, who can try to exploit the overpricing by assuming short positions. Nonetheless, as Barberis and Huang (2008) argue, when shorting the lottery-like asset, expected utility investors face the risk of poor short-term performance. Our approach would then help expected utility investors monitor the short-selling strategy by quantifying the loss relative to the gain.
Third, we propose a novel bootstrap procedure for finite-sample implementation. Two implementational issues arise in employing our established theory to construct SCBs. The first issue concerns the slow convergence rates of the theoretical SCBs in both the extreme and intermediate scenarios, and the other issue is due to the infeasibility of approximating the limiting distributions in the intermediate scenario. To circumvent these issues, we suggest an easy-to-implement bootstrap procedure. The procedure is based on uniform expansions of the risk measure estimators and employs the idea of the multiplier bootstrap. Extensive simulations confirm that the bootstrap procedure delivers favorable finite-sample performance.
Finally, we give a formal treatment of the issue of information truncation under general conditions. Information truncation arises frequently because, although the dynamics usually involve infinite historical information, a feasible forecast of tail risk must be based on finitely many observations. In addition, our framework allows for flexible dynamic models. Existing studies are usually confined to linear models such as the ARMA-GARCH model (McNeil and Frey, 2000; Chan et al., 2007; Hoga, 2019a). On the contrary, our framework facilitates flexible specifications, particularly those depicting nonlinear and nonsmooth dynamics (e.g., the threshold GARCH [TGARCH] model). The flexibility is of much practical importance, as accounting for nonlinearity is often critical to the success of financial time series modeling (e.g., Tsay, 2010, Chaps. 3 and 4).
The rest of this paper is organized as follows. Section 2 presents the framework and illustrates how to estimate CVaR and CES. Section 3 establishes the asymptotic theory. Section 4 describes the bootstrap method and justifies its validity. Section 5 contains results of a series of Monte Carlo simulations. Section 6 concludes the paper and describes future work. All mathematical proofs of main results are relegated to the Appendix. Auxiliary lemmas along with their proofs can be found in the Supplementary Material.

The Framework
Assumption 2.1. R_t follows the location-scale model

R_t = m(I_{t−1}, θ_0) + σ(I_{t−1}, θ_0) ε_t,   (2.1)

where m(I_{t−1}, θ_0) and σ²(I_{t−1}, θ_0), both known up to an unknown parameter θ_0, are, respectively, the conditional mean and conditional variance of R_t given I_{t−1}. The innovation ε_t is independent of I_{t−1} and forms a sequence of independent and identically distributed (i.i.d.) continuous random variables with zero mean and unit variance. The information set at time t is I_t, which is generated by the history of returns up to time t.

Our framework allows for flexible specifications. Model (2.1) naturally nests many commonly used models, such as the ARMA-GARCH model that has been widely adopted in estimation of conditional tail risk measures (e.g., Chan et al., 2007; Hoga, 2019a). In addition, our setting facilitates specifications that are nonlinear (even nonsmooth) in the information set. A typical example is the TGARCH model of Glosten, Jagannathan, and Runkle (1993). Finally, our framework subsumes the popular class of volatility-in-mean models (e.g., Engle, Lilien, and Robins, 1987).
Remark 2.1. In an alternative framework, we may work with nonparametric conditional mean and variance functions. Inspired by Martins-Filho et al. (2018), suppose X_t is a d-dimensional random vector which may include the lagged returns {R_{t−ℓ}}_{ℓ=1}^p for positive integers d and p. We may then model R_t nonparametrically by

R_t = m(X_t) + σ(X_t) ε_t,   (2.2)

where m(·) and σ(·) are unknown functions, and the innovation ε_t is independent of X_t. The advantage of the nonparametric model (2.2) is that it can avoid the risk of misspecification that emerges from parametric functional forms for m(·) and σ(·). In this paper, we focus on the parametric model in Assumption 2.1, but our theory can be extended to the nonparametric model (2.2) with appropriate technical adjustments. We refer the reader to Remark 3.1 and Lemma S2.1 in the Supplementary Material for more detailed discussions.
We also assume that the distribution of ε_t has Pareto-type heavy tails. Let F_ε(·) = Pr(ε_t ≤ ·) and F_{−ε}(·) = Pr(−ε_t ≤ ·) be, respectively, the distribution functions of ε_t and −ε_t. Then, assume that 1 − F_ε and 1 − F_{−ε} are both regularly varying at infinity in the sense that, for all x > 0,

lim_{t→∞} (1 − F_ε(tx))/(1 − F_ε(t)) = x^{−1/γ_R}   (2.3)

and

lim_{t→∞} (1 − F_{−ε}(tx))/(1 − F_{−ε}(t)) = x^{−1/γ_L},   (2.4)

where γ_R > 0 and γ_L > 0 are the extreme value indices associated with the right and left tails, respectively. It is worth mentioning that, under Assumption 2.1, the innovation ε_t is assumed to have unit variance, which implicitly requires γ_R < 1/2 and γ_L < 1/2.

The motivation for assuming heavy-tailed innovations is twofold. First, in the context of financial time series modeling, models with the classical normal innovation are frequently found to be inadequate (e.g., Hamilton, 1994, Chap. 21), and the literature has witnessed increasing popularity of models with heavy-tailed innovations such as the Student's t (e.g., Bollerslev, 1987) and the skewed t (e.g., Theodossiou and Savva, 2015). Second, in the context of conditional risk measure estimation, McNeil and Frey (2000) show by backtesting that procedures based on heavy-tailed innovations deliver better risk measure estimates than methods that ignore the heavy-tailed feature. Since McNeil and Frey (2000), heavy-tailed innovations have been pervasive in subsequent studies (e.g., Chan et al., 2007; Martins-Filho et al., 2018; Hoga, 2019a).

Based on the model structure (2.1), the conditional tail risk measures become

U-CVaR(τ) = m(I_n, θ_0) + σ(I_n, θ_0) Q_ε(1 − τ),   (2.5)
U-CES(τ) = m(I_n, θ_0) + σ(I_n, θ_0) E[ε | ε > Q_ε(1 − τ)],   (2.6)
D-CVaR(τ) = −m(I_n, θ_0) − σ(I_n, θ_0) Q_ε(τ),   (2.7)
D-CES(τ) = −m(I_n, θ_0) − σ(I_n, θ_0) E[ε | ε < Q_ε(τ)],   (2.8)
where Q_ε(·) is the quantile function of ε_t. Suppose, for type ∈ {U, D, R}, that we have estimators type-ĈVaR(τ) and type-ĈES(τ) for, respectively, type-CVaR(τ) and type-CES(τ). For a tail region [τ_l, τ_u] with 0 < τ_l ≤ τ_u, we consider the maximum absolute log-ratios

sup_{τ∈[τ_l,τ_u]} |log(type-ĈVaR(τ)/type-CVaR(τ))|  and  sup_{τ∈[τ_l,τ_u]} |log(type-ĈES(τ)/type-CES(τ))|.   (2.9)

It is the task of Section 3 to establish the asymptotic distributions of these maximum absolute log-ratios, which enable us to construct SCBs for the risk measures.

Estimation of CVaR and CES
In this section, we demonstrate how to employ EVT to estimate CVaR and CES for a given tail level, based on observed data from time 1 to n. The key step is to obtain estimators of the quantiles Q_ε(1 − τ) and Q_ε(τ) as well as estimators of the conditional tail means E[ε | ε > Q_ε(1 − τ)] and E[ε | ε < Q_ε(τ)]. We note that simple nonparametric estimators of the above quantities based on the empirical distribution function have also been considered, and their asymptotic properties have been studied (see Gao and Song, 2008).

Suppose that there is a consistent estimator θ̂ of θ_0. Let Ĩ_{t−1} be the truncated information set, which is generated by feasible information up to time t − 1. The truncation is necessary when the information set relies on infinite past observations, for example, when the conditional variance follows a GARCH recursion and hence depends on the infinite past of returns. Define the standardized residuals

ε̂_t = (R_t − m(Ĩ_{t−1}, θ̂))/σ(Ĩ_{t−1}, θ̂),  t = 1, . . . , n.

For d_n < n such that d_n → ∞ as n → ∞, we discard the residuals for t < d_n and work with ε̂_t for t = d_n, . . . , n. The discarding eliminates the effect of information truncation. Similar discarding is conducted in Chan et al. (2007) and Hoga (2019a) for the ARMA-GARCH model; here, we generalize their treatments to general models.
The following estimation procedure is standard in EVT-based tail estimation (see Hoga (2019a) for a recent example and de Haan and Ferreira (2006) for a comprehensive documentation). Denote by F_ε^← the left-continuous inverse of F_ε. Then, for x > 0, the tail quantile functions are U_ε(x) = F_ε^←(1 − 1/x) and U_{−ε}(x) = F_{−ε}^←(1 − 1/x), and the regular variation conditions (2.3) and (2.4) are equivalent to, for x > 0,

lim_{t→∞} U_ε(tx)/U_ε(t) = x^{γ_R}   (2.10)

and

lim_{t→∞} U_{−ε}(tx)/U_{−ε}(t) = x^{γ_L}.   (2.11)

We first consider estimation of the right tail, namely Q_ε(1 − τ) and E[ε | ε > Q_ε(1 − τ)]. For this purpose, we employ an integer sequence k_1 ≡ k_{1,n} → ∞ with 1 ≤ k_1 ≤ n and k_1/n → 0 as n → ∞. Note that, for a small τ, (2.10) implies the following approximation:

Q_ε(1 − τ) = U_ε(1/τ) ≈ U_ε(n/k_1) (k_1/(nτ))^{γ_R}.   (2.12)

Then, an estimator of Q_ε(1 − τ) can be obtained based on estimators of U_ε(n/k_1) and γ_R. Let ε̂_(k) be the (k + 1)-largest value of {ε̂_t}_{t=d_n}^n, for k = 0, 1, . . . , n − d_n. A suitable estimator of U_ε(n/k_1) is ε̂_(k_1), which is usually known as the intermediate order statistic (de Haan and Ferreira, 2006, Chap. 2). For estimation of γ_R, we concentrate on the prevalent Hill's (1975) estimator

γ̂_R = (1/k_1) Σ_{k=0}^{k_1−1} log(ε̂_(k)/ε̂_(k_1)).

Plugging ε̂_(k_1) and γ̂_R into (2.12) delivers the following estimator of Q_ε(1 − τ):

Q̂_ε(1 − τ) = ε̂_(k_1) (k_1/(nτ))^{γ̂_R}.   (2.13)

To estimate E[ε | ε > Q_ε(1 − τ)], we invoke Proposition 4.1 of Pan, Leng, and Hu (2013), which implies that lim_{τ→0} E[ε | ε > Q_ε(1 − τ)]/Q_ε(1 − τ) = 1/(1 − γ_R). Since τ is close to 0, this motivates the following estimator of E[ε | ε > Q_ε(1 − τ)]:

Ê[ε | ε > Q_ε(1 − τ)] = Q̂_ε(1 − τ)/(1 − γ̂_R).   (2.14)

Estimation of the left tail proceeds in the same way as for the right tail. Note that Q_ε(τ) = −Q_{−ε}(1 − τ), where Q_{−ε}(·) is the quantile function of −ε_t. For another integer sequence k_2 ≡ k_{2,n} → ∞ with 1 ≤ k_2 ≤ n and k_2/n → 0 as n → ∞, following arguments similar to those leading to (2.13) and (2.14), we have the estimators

Q̂_ε(τ) = ε̂_(n−k_2−d_n) (k_2/(nτ))^{γ̂_L}  and  Ê[ε | ε < Q_ε(τ)] = Q̂_ε(τ)/(1 − γ̂_L),

where ε̂_(n−k_2−d_n) is the (k_2 + 1)-smallest residual and γ̂_L is the Hill estimator of γ_L based on {−ε̂_t}, written as γ̂_L = (1/k_2) Σ_{k=0}^{k_2−1} log(ε̂_(n−k−d_n)/ε̂_(n−k_2−d_n)).

Plugging the above estimators into (2.5)-(2.8), we have the estimators of the risk measures

U-ĈVaR(τ) = m(Ĩ_n, θ̂) + σ(Ĩ_n, θ̂) Q̂_ε(1 − τ),  U-ĈES(τ) = m(Ĩ_n, θ̂) + σ(Ĩ_n, θ̂) Ê[ε | ε > Q_ε(1 − τ)],
D-ĈVaR(τ) = −m(Ĩ_n, θ̂) − σ(Ĩ_n, θ̂) Q̂_ε(τ),  D-ĈES(τ) = −m(Ĩ_n, θ̂) − σ(Ĩ_n, θ̂) Ê[ε | ε < Q_ε(τ)].

Accordingly, we obtain the estimators of the relative risk measures

R-ĈVaR(τ) = U-ĈVaR(τ)/D-ĈVaR(τ)  and  R-ĈES(τ) = U-ĈES(τ)/D-ĈES(τ).
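To make the estimation steps concrete, the following Python sketch implements the Hill estimator, the quantile estimator (2.13), and the tail mean estimator (2.14). It is a minimal illustration under the setting above: the residuals are simulated from a Student's t distribution rather than obtained from a fitted location-scale model, and all function names are ours.

```python
import numpy as np

def hill_estimator(x, k):
    """Hill estimator of the extreme value index from the k largest values of x."""
    xs = np.sort(x)[::-1]                     # descending order statistics
    return np.mean(np.log(xs[:k] / xs[k]))    # average log-excess over the (k+1)-largest value

def weissman_quantile(x, k, n, tau):
    """EVT estimator of the (1 - tau)-quantile as in (2.13), extrapolating from xs[k]."""
    xs = np.sort(x)[::-1]
    return xs[k] * (k / (n * tau)) ** hill_estimator(x, k)

def tail_mean(x, k, n, tau):
    """EVT estimator of E[X | X > Q(1 - tau)] as in (2.14); requires gamma < 1."""
    return weissman_quantile(x, k, n, tau) / (1.0 - hill_estimator(x, k))

# Illustration on simulated residuals (a t distribution with 5 degrees of freedom):
rng = np.random.default_rng(0)
eps_hat = rng.standard_t(df=5, size=2000)
n, k1, tau = len(eps_hat), 100, 0.01
q_up = weissman_quantile(eps_hat, k1, n, tau)     # estimates Q_eps(1 - tau)
es_up = tail_mean(eps_hat, k1, n, tau)            # estimates E[eps | eps > Q_eps(1 - tau)]
q_dn = -weissman_quantile(-eps_hat, k1, n, tau)   # left tail via the negated residuals
```

The left tail is handled by applying the same functions to the negated residuals, which mirrors the construction of Q̂_ε(τ) and γ̂_L above.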
Assumption 3.1 states the convergence rate of θ̂. It is standard to assume n^{1/2}-consistency (Bai and Ng, 2001; Hoga, 2019a), which is nested by Assumption 3.1 with υ_0 = 1. In addition, Assumption 3.1 allows for estimators with slower convergence rates, thereby providing more practical flexibility. Take the GARCH model for example. For the conventional Gaussian quasi-maximum likelihood estimator (QMLE), a standard condition that ensures n^{1/2}-consistency is the moment condition E(ε_t^4) < ∞. When this moment condition is suspect, Assumption 3.1 can still be satisfied by, for example, the estimator of Hill (2015), which is n^{υ_0/2}-consistent with υ_0 as close to 1 as desired.
Assumptions 3.2(i) and (ii) are standard in the estimation of heteroskedastic time series models. Assumption 3.2(iii) imposes common differentiability conditions. Assumption 3.2(iv) is similar to Assumptions A2 and A3 of Bai and Ng (2001) and is satisfied by commonly used models under suitable moment conditions on ε_t. For example, Assumption 3.2(iv) is easily shown to hold for the ARMA(1,1) model when E|ε_t|^{υ_1} < ∞ for some υ_1 > 2/υ_0.
Assumption 3.3(i) is common. Assumptions 3.3(ii) and (iii) are used to control the effect of information truncation so that the truncation does not affect the asymptotic theory (see Bai and Ng (2001) for similar assumptions). In particular, Assumption 3.3(iii) means that the cumulative effect of information truncation is asymptotically negligible. We note that the effect of discarding the residuals for t < d_n matters: if there were no discarding (that is, if the cumulative effect started from t = 1 rather than t = d_n), the cumulative effect of truncation would generally no longer be negligible. Assumptions 3.2 and 3.3 are high-level conditions, and it takes effort to verify them for specific models. Appendix A.3 collects primitive conditions for them in the ARMA-GARCH, the TGARCH, and the GARCH-in-mean models.
Remark 3.1. As discussed in Remark 2.1, it is possible to extend our work to the nonparametric model (2.2). Let m̂(·) and σ̂(·) be, respectively, estimators of m(·) and σ(·), and let ε̂*_t = (R_t − m̂(X_t))/σ̂(X_t), for t = 1, . . . , n, be the corresponding standardized nonparametric residuals. The ε̂*_t are employed to estimate CVaR and CES in the same manner as in Section 2.2. Inspecting the technical derivations of our theoretical results, we find that the only adjustment needed to obtain the theoretical results for nonparametric models is to prove that the results of Lemma S1.7 in the Supplementary Material hold for ε̂*_t. This lemma quantifies the difference between the tail empirical distribution of {ε̂*_t} and that of {ε_t}. This can be achieved by assuming some uniform convergence conditions on the estimators m̂(·) and σ̂(·).

Assumption 3.4. (i) There exist a constant ρ_R < 0 and a function A_R(·) with constant sign and A_R(t) → 0 as t → ∞ such that, for x > 0, lim_{t→∞} (U_ε(tx)/U_ε(t) − x^{γ_R})/A_R(t) = x^{γ_R}(x^{ρ_R} − 1)/ρ_R, where k_1^{1/2} A_R(n/k_1) → 0 as n → ∞. (ii) There exist a constant ρ_L < 0 and a function A_L(·) with constant sign and A_L(t) → 0 as t → ∞ such that, for x > 0, lim_{t→∞} (U_{−ε}(tx)/U_{−ε}(t) − x^{γ_L})/A_L(t) = x^{γ_L}(x^{ρ_L} − 1)/ρ_L, where k_2^{1/2} A_L(n/k_2) → 0 as n → ∞.

Assumption 3.4(i) and (ii) provides second-order controls over the quality of the approximations in (2.10) and (2.11), respectively. This type of second-order condition has been commonly employed to develop asymptotic results in tail estimation (see, e.g., de Haan and Ferreira, 2006; Chan et al., 2007; Einmahl, de Haan, and Zhou, 2016; Hoga, 2019a). A commonly employed class of distributions that satisfies Assumption 3.4 is the Hall class of heavy-tailed distributions (Hall, 1982). For the Hall class, the distribution function F(x) can be expanded as

1 − F(x) = C x^{−β_1} (1 + D x^{−β_2} + o(x^{−β_2}))  as x → ∞,

where C > 0, β_1 > 0, β_2 > 0, and D is a real number. Then, Assumption 3.4 holds with γ = 1/β_1 and ρ = −β_2/β_1. A typical member of the Hall class is the Student's t distribution. The t distribution with ν degrees of freedom belongs to the Hall class with β_1 = ν and β_2 = 2, which implies that both its left and right tails satisfy Assumption 3.4 with γ_R = γ_L = 1/ν and ρ_R = ρ_L = −2/ν.

Assumption 3.5. (i) τ_l ≤ τ_u satisfies τ_l → 0 as n → ∞ and τ_u = ℓτ_l for some constant ℓ ≥ 1. (ii) There exist two constants c_0 > 0 and c_1 ≥ 0 such that lim_{n→∞} k_1/k_2 = c_0 and lim_{n→∞} nτ_l/k_1 = c_1. (iii) lim_{n→∞} k_1^{−1/2} log(nτ_l/k_1) = 0, k_1 = o(n^{δ_0}) for some 0 < δ_0 < υ_0, and d_n = o(k_1^{1/2}).

Assumption 3.5 states the orders of the effective sample sizes k_1 and k_2 as well as the order of the discarded sample size d_n. In particular, the conditions lim_{n→∞} k_1^{−1/2} log(nτ_l/k_1) = 0 and k_1 = o(n^{δ_0}), for some 0 < δ_0 < υ_0, ensure that the convergence rate of θ̂ is faster than that of the estimators of CVaR and CES, so that the estimation uncertainty of θ̂ does not affect the limiting behaviors of the estimators of CVaR and CES. The condition k_1 = o(n^{δ_0}) merits further discussion: it requires that only a limited portion of the sample be used for estimation so that the estimation error contained in θ̂ can be ignored. We note that this condition is mild when θ̂ is n^{1/2}-consistent (that is, υ_0 = 1), because EVT always requires k_1/n → 0. When υ_0 is small, however, a potential issue is that fewer data can be employed while the estimation error of θ̂ is large. We investigate the severity of this issue through simulations in Section 5. The simulation results indicate that, for a reasonable range of k_1 and k_2, the estimation error of θ̂ does not have an evident influence on the finite-sample performance of our approach.

Two Scenarios of Asymptotics
The following three propositions serve as the basis of our theoretical findings. In Propositions 3.1 and 3.2, we only present the results for upside CVaR and CES. The results for downside CVaR and CES are similar and can be found in Propositions A.1 and A.2 in Appendix A.1.
PROPOSITION 3.3. Under Assumptions 2.1 and 3.1-3.5, as n → ∞,

(k_1^{1/2}(ε̂_(k_1)/U_ε(n/k_1) − 1), k_1^{1/2}(γ̂_R − γ_R), k_2^{1/2}(ε̂_(n−k_2−d_n)/(−U_{−ε}(n/k_2)) − 1), k_2^{1/2}(γ̂_L − γ_L))

converges in distribution to a vector of mutually independent normal random variables with zero means and variances γ_R², γ_R², γ_L², and γ_L², respectively.

Proposition 3.1 indicates that the estimation uncertainty of the risk measures is dominated by the estimation uncertainty of the quantiles and conditional tail means of ε_t. In particular, the estimation effect of the model parameter does not affect the limiting behaviors of the risk measure estimators up to the dominating order. Proposition 3.1 generalizes similar results established by Chan et al. (2007) and Hoga (2019a) in ARMA-GARCH models to general dynamic models and also strengthens them to uniform results. Proposition 3.2 moves on to decompose the estimation uncertainty of the quantiles and conditional tail means of ε_t into two parts, with one part concerning the order statistics and the other concerning the extreme value index estimators. Finally, Proposition 3.3 indicates that the order statistics and the extreme value index estimators have the same convergence rates and are asymptotically independent.
Contemplating the decompositions in Proposition 3.2, we observe that γ̂_R − γ_R is scaled by log(k_1/(nτ)). Consequently, the order of log(k_1/(nτ)) plays a key role in the limiting behaviors. Take the decomposition of Q̂_ε(1 − τ)/Q_ε(1 − τ) − 1 for example. Proposition 3.3 indicates that ε̂_(k_1)/U_ε(n/k_1) − 1 and γ̂_R − γ_R converge at the same rate. Hence, if log(k_1/(nτ)) diverges, the term involving γ̂_R − γ_R dominates; if instead log(k_1/(nτ)) remains bounded, the term ε̂_(k_1)/U_ε(n/k_1) − 1 is no longer asymptotically negligible, and the limiting behavior of Q̂_ε(1 − τ)/Q_ε(1 − τ) − 1 is driven by both terms.

Based on the above discussion, we distinguish our asymptotic theory between two scenarios, depending on the magnitude of the effective sample sizes k_1 and k_2 relative to nτ. The first one, the extreme scenario, corresponds to the case where c_1 = 0. In this scenario, τ is of smaller order than k_1/n, meaning that the tail to be estimated lies beyond the range of the available data. The second one, the intermediate scenario, corresponds to the case where c_1 > 0. In this scenario, τ is of the same order as k_1/n, meaning that the tail level is comparable to the size of the available data.
We argue that both the extreme and intermediate scenarios have their own appeal, and when to use which depends on the specific problem. In the extreme scenario, the intuition behind the condition c_1 = 0 is that the starting point of extrapolation (that is, k_1/n and k_2/n) is more central relative to the level at which the tail is to be estimated (that is, τ). Accordingly, this scenario is capable of estimating very deep tails. For example, when one needs to estimate the tail risk at τ = 0.01% but the available data are only of size 1,000, the tail to be estimated lies beyond the range of the data. In this case, the extreme scenario applies, but the intermediate scenario cannot work because no reasonable k_1 is comparable to nτ = 0.1.
To appreciate the relative merit of the intermediate scenario, consider the case where the sample size is 1,000 and the tail level is 1%. In this case, the size of the available data is comparable to the tail level. Though theoretically the extreme scenario still applies to this case, the intermediate scenario seems more appealing. The reason is that the terms that are negligible in the extreme scenario (that is, the order-statistic terms ε̂_(k_1)/U_ε(n/k_1) − 1 and ε̂_(n−k_2−d_n)/(−U_{−ε}(n/k_2)) − 1) may result in approximation errors when the associated limiting distributions are used to construct SCBs. On the contrary, these terms are retained in the intermediate scenario, which can lead to more reliable finite-sample performance. Indeed, our simulation results in Section 5 confirm that, when the available data are adequate relative to the tail level, the intermediate scenario produces more accurate coverage rates than the extreme scenario.

Main Results
The following theorem establishes uniform convergences of the maximum absolute log-ratios in the two scenarios.
THEOREM 3.1. Under Assumptions 2.1 and 3.1-3.5 and for type ∈ {U, D, R}, the maximum absolute log-ratios in (2.9), suitably normalized, converge in distribution: in the extreme scenario (c_1 = 0), the limits are the absolute values of centered normal random variables, while in the intermediate scenario (c_1 > 0), the limits are sup_{u∈[1,ℓ]} |G_{1,type}(u)| and sup_{u∈[1,ℓ]} |G_{2,type}(u)|, where G_{1,type}(u) and G_{2,type}(u) are centered Gaussian processes with variance-covariance functions γ_{1,type}(u_1, u_2) = Cov(G_{1,type}(u_1), G_{1,type}(u_2)) and γ_{2,type}(u_1, u_2) = Cov(G_{2,type}(u_1), G_{2,type}(u_2)).

Remark 3.2. In the extreme scenario, CVaR and CES share the same limiting distributions for each type. This is because the order-statistic terms ε̂_(k_1)/U_ε(n/k_1) − 1 and ε̂_(n−k_2−d_n)/(−U_{−ε}(n/k_2)) − 1 are asymptotically negligible, and hence CVaR and CES have the same asymptotic expansion. Similar results are also found in Hoga (2019a).

Remark 3.3. According to Theorem 3.1(i), we can construct SCBs for the risk measures in the extreme scenario based on consistent estimators of the limiting variances. Take the relative risk measures for example. A consistent estimator of the limiting variance is γ̂_R² + k_1 γ̂_L²/k_2. Hence, the asymptotic (1 − α)-SCBs for R-CVaR(τ) and R-CES(τ) are, respectively,

R-ĈVaR(τ) exp(± Φ^{−1}(1 − α/2) k_1^{−1/2} log(k_1/(nτ)) (γ̂_R² + k_1 γ̂_L²/k_2)^{1/2})

and

R-ĈES(τ) exp(± Φ^{−1}(1 − α/2) k_1^{−1/2} log(k_1/(nτ)) (γ̂_R² + k_1 γ̂_L²/k_2)^{1/2}),

where Φ^{−1}(·) is the quantile function of the standard normal distribution.
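As a concrete illustration, here is a minimal Python sketch of the extreme-scenario SCB of Remark 3.3, using the display as reconstructed above; the function name and the plug-in variance are ours, and in practice the bootstrap procedure of Section 4 is preferable because of the slow convergence discussed there.

```python
import numpy as np
from scipy.stats import norm

def extreme_scb(rcvar_hat, tau, n, k1, k2, gamma_R, gamma_L, alpha=0.05):
    """(1 - alpha)-SCB for R-CVaR(tau) in the extreme scenario (sketch of Remark 3.3)."""
    sd = np.sqrt(gamma_R**2 + (k1 / k2) * gamma_L**2)   # plug-in limiting standard deviation
    half = norm.ppf(1 - alpha / 2) * np.log(k1 / (n * tau)) / np.sqrt(k1) * sd
    return rcvar_hat * np.exp(-half), rcvar_hat * np.exp(half)
```

Since the band is of multiplication-division form, its relative length exp(2 · half) does not depend on the point estimate, which motivates reporting relative lengths in Section 5.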
By incorporating the two scenarios as well as the relative risk measures in a unifying theory, Theorem 3.1 substantially generalizes and unifies existing studies. The extreme scenario is conventional in EVT-based tail estimation (e.g., de Haan and Ferreira, 2006), particularly in estimation of conditional risk measures, and it is the scenario adopted by Chan et al. (2007) and Hoga (2019a).

BOOTSTRAP IMPLEMENTATION
Although Theorem 3.1 provides limiting distributions for the maximum absolute log-ratios, two implementational issues arise when employing the theorem to construct SCBs. The first issue is due to the slow convergence rates of the maximum absolute log-ratios, because k_1 and k_2 are small compared to n and τ is close to 0. Consequently, even though Remark 3.3 gives the SCBs in the extreme scenario, our preliminary simulation results indicate that the coverage of such theoretical SCBs is seriously distorted. The second issue arises because it is quite hard to approximate the limiting distributions in the intermediate scenario, which are constructed from Gaussian processes that depend on the underlying model in a very complicated manner. To circumvent these two issues, we propose an easy-to-implement bootstrap procedure for finite-sample implementation.
We first discuss the bootstrap procedure for the extreme scenario. For any real number x, let log_> x take the value log x if x > 1 and 0 otherwise, and let I(A) be the indicator function of the event A, which equals 1 if A occurs and 0 otherwise. Under Assumptions 2.1 and 3.1-3.5 and for type ∈ {U, D, R}, the proof of Theorem 3.1 implies that, uniformly in τ ∈ [τ_l, τ_u],

(k_1^{1/2}/log(k_1/(nτ))) log(type-ĈVaR(τ)/type-CVaR(τ)) = I(type ∈ {U,R}) k_1^{1/2}(γ̂_R − γ_R) − I(type ∈ {D,R}) (k_1/k_2)^{1/2} k_2^{1/2}(γ̂_L − γ_L) + o_p(1)

and

(k_1^{1/2}/log(k_1/(nτ))) log(type-ĈES(τ)/type-CES(τ)) = I(type ∈ {U,R}) k_1^{1/2}(γ̂_R − γ_R) − I(type ∈ {D,R}) (k_1/k_2)^{1/2} k_2^{1/2}(γ̂_L − γ_L) + o_p(1).

It is worth mentioning that, in the extreme scenario, CVaR and CES share the same asymptotic expansions in terms of the log-ratio, although they are very different quantities. The reason is as follows. As judged from (2.14), the tail mean estimator is constructed by dividing the quantile estimator by the factor 1 − γ̂_R. The estimation uncertainty of γ̂_R is of order k_1^{−1/2}, which is of smaller order than k_1^{−1/2} log(k_1/(nτ)) (the order of the leading term in the extreme scenario). Therefore, the estimation uncertainty of γ̂_R does not affect the limiting property up to the leading order.

PROPOSITION 4.1. Under Assumptions 2.1 and 3.1-3.5, as n → ∞,

γ̂_R − γ_R = (1/k_1) Σ_{t=d_n}^n [log_>(ε_t/U_ε(n/k_1)) − γ_R I(ε_t > U_ε(n/k_1))] + o_p(k_1^{−1/2})   (4.1)

and

γ̂_L − γ_L = (1/k_2) Σ_{t=d_n}^n [log_>(−ε_t/U_{−ε}(n/k_2)) − γ_L I(−ε_t > U_{−ε}(n/k_2))] + o_p(k_2^{−1/2}).   (4.2)

Proposition 4.1 gives linear expansions of the tail index estimators. In addition, Proposition 3.3 implies that the tail index estimators γ̂_R and γ̂_L are asymptotically independent. Our proposal is to employ the idea of the multiplier bootstrap to approximate the uncertainty characterized by the components γ̂_R − γ_R and γ̂_L − γ_L, by exploiting the linear expansions as well as the asymptotic independence. Specifically, suppose we have two independent sequences of multipliers {ξ_t}_{t=d_n}^n and {ν_t}_{t=d_n}^n, which consist of i.i.d. random variables with zero mean, unit variance, and bounded support, and are both independent of the original data. Then, multiplying the summands in (4.1) and (4.2) (with the unknown quantities replaced by their estimates), respectively, with {ξ_t}_{t=d_n}^n and {ν_t}_{t=d_n}^n, we obtain

M*_R = (1/k_1) Σ_{t=d_n}^n ξ_t [log_>(ε̂_t/ε̂_(k_1)) − γ̂_R I(ε̂_t > ε̂_(k_1))]   (4.3)

and

M*_L = (1/k_2) Σ_{t=d_n}^n ν_t [log_>(−ε̂_t/(−ε̂_(n−k_2−d_n))) − γ̂_L I(−ε̂_t > −ε̂_(n−k_2−d_n))].   (4.4)

Bootstrap Algorithm for the Extreme Scenario
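As an illustration of this algorithm, the following Python sketch implements the right-tail part of the extreme-scenario multiplier bootstrap implied by (4.3) and the expansions above, with Rademacher multipliers as in Section 5.2. All names are ours, and the left-tail component M*_L (needed for the D and R types) would be handled analogously with the independent multipliers ν_t.

```python
import numpy as np

def log_gt(x):
    """log_> x: log(x) if x > 1 and 0 otherwise."""
    return np.where(x > 1.0, np.log(np.maximum(x, 1e-300)), 0.0)

def multiplier_draws_right(eps_hat, k1, B=500, rng=None):
    """Bootstrap draws mimicking k1^{1/2}(hat{gamma}_R - gamma_R) via (4.3)."""
    rng = rng or np.random.default_rng()
    xs = np.sort(eps_hat)[::-1]
    thresh = xs[k1]                                       # (k1 + 1)-largest residual
    gamma_R = np.mean(np.log(xs[:k1] / thresh))           # Hill estimator
    summand = log_gt(eps_hat / thresh) - gamma_R * (eps_hat > thresh)
    xi = rng.choice([-1.0, 1.0], size=(B, len(eps_hat)))  # Rademacher multipliers
    return np.sqrt(k1) * (xi @ summand) / k1              # B draws of k1^{1/2} M*_R

# SCB for U-CVaR: bootstrap the critical value of the maximum absolute log-ratio.
rng = np.random.default_rng(1)
eps_hat = rng.standard_t(df=5, size=2000)
n, k1, tau_grid = len(eps_hat), 100, np.array([0.0005, 0.001])
crit = np.quantile(np.abs(multiplier_draws_right(eps_hat, k1, rng=rng)), 0.95)
half = crit * np.log(k1 / (n * tau_grid)) / np.sqrt(k1)   # log-scale half-widths over the grid
```

The band for U-ĈVaR(τ) is then obtained by multiplying and dividing the point estimate by exp(half), in line with the multiplication-division construction used throughout.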
We now turn to the intermediate scenario. In this scenario, the order-statistic terms contribute to the limiting behavior through two additional components X_R and X_L, where X_R and X_L are two independent normal random variables with zero mean and respective variances γ_R² and γ_L², and they are both independent of γ̂_R − γ_R and γ̂_L − γ_L.

Unlike in the extreme scenario, we need to account for the contributions of the components X_R and X_L apart from those of the tail index estimators. To this end, we randomly draw observations X*_R and X*_L, which are independent normal random variables with zero mean and respective variances γ̂_R² and γ̂_L², and are independent of {ξ_t}_{t=d_n}^n and {ν_t}_{t=d_n}^n. For CVaR, define, for type ∈ {U, D, R} and τ ∈ [τ_l, τ_u],

T^{int*}_{type-CVaR}(τ) = I(type ∈ {U,R}) [X*_R + log(k_1/(nτ)) k_1^{1/2} M*_R] − I(type ∈ {D,R}) (k_1/k_2)^{1/2} [X*_L + log(k_2/(nτ)) k_2^{1/2} M*_L],

where M*_R and M*_L are defined in (4.3) and (4.4), respectively. Then, for type ∈ {U, D, R}, T^{int*}_{type-CVaR}(τ) is used to mimic k_1^{1/2} log(type-ĈVaR(τ)/type-CVaR(τ)). Similarly, for CES, define, for type ∈ {U, D, R} and τ ∈ [τ_l, τ_u],

T^{int*}_{type-CES}(τ) = I(type ∈ {U,R}) [X*_R + (log(k_1/(nτ)) + (1 − γ̂_R)^{−1}) k_1^{1/2} M*_R] − I(type ∈ {D,R}) (k_1/k_2)^{1/2} [X*_L + (log(k_2/(nτ)) + (1 − γ̂_L)^{−1}) k_2^{1/2} M*_L].

Then, for type ∈ {U, D, R}, T^{int*}_{type-CES}(τ) is used to mimic k_1^{1/2} log(type-ĈES(τ)/type-CES(τ)).

NUMERICAL STUDY
In this section, we first provide guidelines on how to choose the effective sample sizes k_1 and k_2 to implement the proposed approach, and then proceed to show the results of a Monte Carlo study.

Data-Driven Selection of k 1 and k 2
The implementation of the proposed approach requires selection of the effective sample sizes k_1 and k_2. Previous studies on EVT-based risk measure estimation have found that the choice of the effective sample size is crucial (e.g., Chan et al., 2007; Martins-Filho et al., 2018). Various authors have proposed different empirical suggestions. For example, when constructing pointwise confidence intervals for downside CVaR, Chan et al. (2007) suggest ⌊1.5(log n)²⌋, where ⌊·⌋ denotes the integer part of a real number. Nonetheless, their suggestion is based on simulations for n = 1,000, and Spierdijk (2016) finds that it is no longer adequate for larger sample sizes. Compared to such ad hoc suggestions, a data-driven rule is desirable because it selects k_1 and k_2 tailored to the data at hand.
Below we suggest a data-driven method for selecting k_1 and k_2. Our estimation relies on the Pareto approximation to the tails of the innovation distribution, and k_1 and k_2 determine where the approximation starts for the right and left tails, respectively. Therefore, it is reasonable to choose the effective sample sizes that lead to the best Pareto approximation. To this end, we follow Danielsson et al. (2019), who propose to measure the quality of the Pareto approximation by the maximum distance between the fitted Pareto-type tail and the empirical quantile; the optimal effective sample size is then chosen as the minimizer of this measure. Since estimating CVaR essentially involves estimating quantiles, this quantile-based method is suitable for CVaR estimation.
Following the proposal of Danielsson et al. (2019), for the estimation of upside CVaR, we use k_1 equal to

k*_1 = argmin_{l = k_min, . . . , k_max} max_{j = 1, . . . , k_max} |ε̂_(j) − ε̂_(l) (j/l)^{−γ̂_R(l)}|,   (5.1)

where γ̂_R(l) is the Hill estimator computed with effective sample size l, ε̂_(j) is the empirical quantile, and ε̂_(l)(j/l)^{−γ̂_R(l)} is the quantile estimator given by the Pareto approximation. Similarly, when estimating downside CVaR, k_2 is chosen as k*_2, which is obtained by applying the above criterion to {−ε̂_t}.
For a conceptually suitable criterion for CES estimation, we modify (5.1) by employing its tail mean-based analog. Specifically, when estimating upside CES, we use k_1 equal to

k*_1 = argmin_{l = k_min, . . . , k_max} max_{j = 1, . . . , k_max} | Σ_{t=d_n}^n I(ε̂_t > ε̂_(j)) ε̂_t / Σ_{t=d_n}^n I(ε̂_t > ε̂_(j)) − ε̂_(l)(j/l)^{−γ̂_R(l)}/(1 − γ̂_R(l)) |,   (5.2)

where Σ_{t=d_n}^n I(ε̂_t > ε̂_(j)) ε̂_t / Σ_{t=d_n}^n I(ε̂_t > ε̂_(j)) is the empirical estimator of the tail mean over the (1 − j/n)-quantile and ε̂_(l)(j/l)^{−γ̂_R(l)}/(1 − γ̂_R(l)) is the estimator given by the Pareto approximation. Again, when estimating downside CES, k_2 is chosen as k*_2, which is obtained by applying the above criterion to {−ε̂_t}.
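A minimal Python sketch of the quantile-based selection rule (5.1), under the notation above; names are ours, and the CES criterion (5.2) would replace the quantile distance by the corresponding tail-mean distance.

```python
import numpy as np

def select_k_cvar(eps_hat, k_min, k_max):
    """Select k by minimizing the maximum distance between empirical quantiles
    and Pareto-fitted quantiles, following Danielsson et al. (2019)."""
    xs = np.sort(eps_hat)[::-1]              # descending order statistics
    j = np.arange(1, k_max + 1)
    emp_q = xs[j - 1]                        # empirical upper-tail quantiles
    best_k, best_dist = k_min, np.inf
    for l in range(k_min, k_max + 1):
        gamma_l = np.mean(np.log(xs[:l] / xs[l]))   # Hill estimator at effective size l
        fit_q = xs[l] * (j / l) ** (-gamma_l)       # Pareto-approximation quantiles
        dist = np.max(np.abs(emp_q - fit_q))
        if dist < best_dist:
            best_k, best_dist = l, dist
    return best_k

rng = np.random.default_rng(2)
eps_hat = rng.standard_t(df=5, size=1000)
n = len(eps_hat)
k1 = select_k_cvar(eps_hat, k_min=int(0.02 * n), k_max=int(0.15 * n))   # right tail
k2 = select_k_cvar(-eps_hat, k_min=int(0.02 * n), k_max=int(0.15 * n))  # left tail
```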
Finally, we note that Assumption 3.5(ii) and (iii) sets conditions for k_1 and k_2. Below we provide suggestions on how these conditions can be satisfied. First, we choose k_1 and k_2 from the same range [k_min, k_max] so that k_1 and k_2 are comparable, as required by Assumption 3.5(ii). Second, for the intermediate scenario (corresponding to Assumption 3.5(ii) with c_1 > 0), k_1 and k_2 should be comparable to nτ. Accordingly, we can set k_min = anτ_u and k_max = bnτ_u with 0 < a < b. For example, if the tail region is [0.5%, 1%], we can set k_min = 2% × n and k_max = 15% × n, corresponding to the setting a = 2 and b = 15. In addition, this choice of k_max means that at most 15% of the sample is used, meeting the requirement of Assumption 3.5(iii) that k_max should be small relative to n^{υ_0}, where υ_0 is usually equal to, or at least close to, 1. Third, for the extreme scenario (corresponding to Assumption 3.5(ii) with c_1 = 0), k_1 and k_2 should be large relative to nτ. Accordingly, we should set k_min to be larger than nτ_u. For example, if the tail region is [0.05%, 0.1%], we can set k_min = 1% × n and k_max = 10% × n. Again, the choice of k_max meets the requirement of Assumption 3.5(iii). Lastly, given the above two rules, k_1^{1/2} would generally be much larger than |log(nτ_l/k_1)|, and hence the condition lim_{n→∞} k_1^{−1/2} log(nτ_l/k_1) = 0 in Assumption 3.5(iii) is satisfied.

Monte Carlo Simulations
In this section, we conduct a series of Monte Carlo simulations to evaluate the finite-sample performance of the proposed approach. In the bootstrap implementation, the number of bootstrap replications is B = 500, and the multipliers are generated from the two-point distribution with probability mass function P(ξ = 1) = P(ξ = −1) = 1/2. The confidence level is 95%. The number of discarded observations is d_n = ⌊5(nτ_l)^{1/3}⌋, which should satisfy d_n = o(k_1^{1/2}) in order to meet Assumption 3.5. Since k_1 ∼ nτ_l under the intermediate scenario and nτ_l = o(k_1) under the extreme scenario, this choice of d_n meets the required order for both scenarios. All reported simulation results are based on 1,000 replications.
The data are generated from the GARCH(1,1) model

R_t = σ_t ε_t,  σ_t² = ω + ω_1 R_{t−1}² + ω_2 σ_{t−1}²,   (5.3)

where ω = 0.001, ω_1 = 0.04, and ω_2 = 0.85. The innovation ε_t follows the standardized skewed Student's t distribution with shape parameter ν and skewness parameter λ (in short, sstd(ν,λ); see Fernández and Steel, 1998). The sstd is symmetric if λ = 1 and asymmetric otherwise, being skewed to the left if λ < 1 and to the right if λ > 1. In our simulations, we consider sstd(5,λ) with λ ∈ {0.95, 1}; a simulation sketch of this DGP is given at the end of this part. The conclusions are qualitatively unchanged for other models such as the TGARCH model.

In the first part, we compare the finite-sample performance of the SCBs based on the two scenarios of asymptotics. The tail region is [0.5%, 1%], and the sample size is n ∈ {500, 1,000, 1,500, 5,000}. Note that the tail region is not beyond the range of the sample; hence, theoretically, both the extreme and intermediate scenarios apply. The effective sample sizes k_1 and k_2 are selected using the data-driven method in Section 5.1 with k_min = 2% × n and k_max = 15% × n for the intermediate scenario and k_min = 5% × n and k_max = 15% × n for the extreme scenario. Apart from the data-driven (k*_1, k*_2), we also use (k*_1 + n/100, k*_2 + n/100) and (k*_1 + n/50, k*_2 + n/50). This mimics a user who wants to employ more data, at the cost of less accurate Pareto approximations. The model parameters are estimated with the Gaussian QMLE method. To save space, we only report the results for the relative risk measures.

The empirical coverage rates and relative lengths are collected in Tables 1 and 2. Since our proposed SCBs are constructed in a multiplication-division manner, it is more informative to consider the relative length, defined as the ratio of the SCB's upper bound to its lower bound. Several observations follow. First, the SCB based on the intermediate scenario has satisfactory coverage rates, but the SCB based on the extreme scenario suffers from evident undercoverage. This confirms our discussion in Section 3.2 that, when data are adequate so that no extrapolation is needed, the intermediate scenario provides more accurate approximations. Second, the more accurate coverage of the intermediate scenario is accompanied by a larger length. This means that the risk measure estimators under the intermediate scenario have more variability, the consequence of retaining the terms neglected in the extreme scenario. In addition, the coverage of the extreme scenario does not improve even if n increases to 5,000, indicating that the impact of the neglected terms persists even for a very large sample size. Third, for n = 500, (k*_1, k*_2) leads to better coverage than (k*_1 + n/100, k*_2 + n/100) and (k*_1 + n/50, k*_2 + n/50) under the intermediate scenario. However, as n increases, the performance becomes more robust to this choice. Finally, the results are basically robust to different innovation distributions.
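For reference, the following is a minimal Python sketch of the data-generating process (5.3). It is illustrative only: we simulate the symmetric case λ = 1, using a variance-standardized Student's t innovation, and omit the Fernández and Steel (1998) skewing step.

```python
import numpy as np

def simulate_garch(n, omega=0.001, omega1=0.04, omega2=0.85, nu=5, burn=500, rng=None):
    """Simulate model (5.3): R_t = sigma_t * eps_t with a GARCH(1,1) variance."""
    rng = rng or np.random.default_rng()
    scale = np.sqrt((nu - 2) / nu)                 # rescale t(nu) to unit variance
    eps = rng.standard_t(df=nu, size=n + burn) * scale
    r = np.empty(n + burn)
    sig2 = omega / (1 - omega1 - omega2)           # start at the unconditional variance
    for t in range(n + burn):
        r[t] = np.sqrt(sig2) * eps[t]
        sig2 = omega + omega1 * r[t]**2 + omega2 * sig2
    return r[burn:]                                # drop the burn-in period

returns = simulate_garch(1000, rng=np.random.default_rng(3))
```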
To summarize, we find that when the data are adequate so that extrapolation is not needed, the intermediate scenario produces SCBs with more accurate coverage rates than the extreme scenario, but accompanied by larger lengths. In addition, the data-driven methods yield the best finite-sample performance when n = 500 and the performance tends to be robust as n increases.
In the second part, we evaluate the performance of the SCB based on the extreme scenario when the tail region is indeed beyond the range of the available data, that is, when extrapolation is needed. We consider the tail region [0.05%, 0.1%]. To ensure extrapolation, the sample size should be no larger than 1,000, and hence we consider n ∈ {500, 1,000}. The data are still generated from model (5.3). The effective sample sizes k_1 and k_2 are selected using the data-driven method in Section 5.1 with k_min = 1% × n and k_max = 10% × n. Moreover, in addition to the data-driven (k*_1, k*_2), we also use (k*_1 + n/100, k*_2 + n/100) and (k*_1 + n/50, k*_2 + n/50) to check robustness.

The simulation results are collected in Table 3. We observe that the coverage rates of the SCBs for CVaR are very close to the nominal level, but the coverage rates of the SCBs for CES fall below it. Hence, it seems more difficult to estimate CES than CVaR under the extreme scenario; the same issue is also found in previous simulation studies. Finally, we observe that the SCBs based on (k*_1, k*_2) have more accurate coverage rates and larger lengths than the SCBs based on (k*_1 + n/100, k*_2 + n/100) and (k*_1 + n/50, k*_2 + n/50).

Notes to Tables 1-3: the results are for the relative risk measures under the confidence level 95%; k*_1 and k*_2 are selected by the data-driven methods in Section 5.1; the relative length is defined as the ratio of the SCB's upper bound to its lower bound; under the extreme scenario, the SCBs for CVaR and CES have the same relative length.

A potential explanation for the undercoverage for CES under the extreme scenario is as follows. As judged from (2.14), the tail mean estimator is constructed by dividing the quantile estimator by the factor 1 − γ̂_R, thereby introducing more variability into the CES estimator than into the CVaR estimator. Nonetheless, Theorem 3.1 indicates that CVaR and CES share the same limiting distributions under the extreme scenario, meaning that the variability of the dividing factor is of smaller order. In finite samples, however, the variability of the dividing factor increases estimation uncertainty, and since the multiplier bootstrap only mimics the first-order stochastic behavior, the resulting SCBs tend to be too tight.
In the third part, we investigate the impact of the convergence rate of θ̂ on the performance of the proposed approach. We employ model (5.3) with sstd(3,1) innovations, and the sample size is n = 1,000. We consider τ = 1% for the intermediate scenario and τ = 0.1% for the extreme scenario. For the estimation of the model parameters, two methods are employed: the Gaussian QMLE and the Laplace QMLE. When the innovation is sstd(3,1), the convergence rate of the Laplace QMLE is n^{−1/2} (Francq and Zakoïan, 2010, Exam. 9.3), but the convergence rate of the Gaussian QMLE is roughly n^{−1/3} (Mikosch and Straumann, 2006, Thm. 4.4). We also consider setting the parameter estimates to the true parameter value to investigate the performance when there is no estimation uncertainty. Figure 1 shows the coverage rates and relative lengths of the confidence intervals for R-CVaR for k_1 = k_2 = k with k ∈ {20, 25, . . . , 195, 200}. There is no visible difference among the performances based on the different estimators and the true parameter. Hence, for a reasonable range of k_1 and k_2 (at most 20% of the sample here), the first-stage estimation uncertainty does not seem to affect the performance of the proposed approach.
In the final part, we compare our EVT-based approach with the purely nonparametric approach of Gao and Song (2008). We employ model (5.3) with sstd(5,1) innovations, and the sample size is n = 1,000. We apply our intermediate scenario and the approach of Gao and Song (2008) to construct confidence intervals for downside CVaR and CES for τ ∈ {0.1%, 0.2%, . . . , 1%}. For our approach, k_1 and k_2 are selected by the data-driven methods in Section 5.1 with k_min = nτ + 3% × n and k_max = nτ + 10% × n. Figure 2 shows the coverage rates and lengths for different τ. For a direct comparison of the interval length, we use the absolute length, defined as the difference between the confidence interval's upper bound and its lower bound.
We first discuss the results for CVaR. When 0.4% ≤ τ ≤ 1%, the coverage rates are basically comparable, but the nonparametric approach has larger lengths. When τ becomes more extreme (that is, when τ < 0.4%), the coverage rate of the nonparametric approach declines sharply, and its length is much smaller than that of the EVT-based approach. Turning to the results for CES, we observe that the EVT-based approach has better coverage and, again, the gap in coverage becomes very evident when τ < 0.7%. In addition, the length of the EVT-based approach is larger than that of the nonparametric approach. Combining the above observations, we find that our approach performs more reliably for small tail levels, thereby highlighting the merit of EVT in estimating deep tails.

Figure 1. Comparison of coverage rates and relative lengths of the confidence intervals for R-CVaR based on different estimators of θ_0. The confidence level is 95%. "Gaussian" means the Gaussian QMLE, "Laplace" means the Laplace QMLE, and "True" means the true parameter. The relative length is defined as the ratio of the confidence interval's upper bound to its lower bound.
Note that s*_{R,t} = s_{R,1,t} ξ_t and (s*_{R,t})² = s_{R,2,t} ξ_t², where s_{R,1,t} denotes the summand in (4.3) and s_{R,2,t} = s_{R,1,t}². For any δ > 0, the Lindeberg-Feller condition can be verified by a direct calculation, in which the second-to-last equality is due to Σ_{t=d_n}^n s_{R,2,t} = Σ_{t=d_n}^n E*((s*_{R,t})²), the last inequality follows from the Cauchy-Schwarz inequality, and the last equality holds because max_t |s_{R,1,t}| = o_p(1). Thus, {s*_{R,t}}_{t=d_n}^n satisfies the Lindeberg-Feller condition, and hence k_1^{1/2} γ_R^{−1} M*_R converges to the standard normal distribution conditional on the original data. Similarly, we can show that k_2^{1/2} γ_L^{−1} M*_L also converges to the standard normal distribution conditional on the original data, which proves the first part of the theorem.
(ii) Given the formulations of T^{int*}_{type-CVaR} and T^{int*}_{type-CES}, and given that k_1^{1/2} γ_R^{−1} M*_R and k_2^{1/2} γ_L^{−1} M*_L both converge to the standard normal distribution conditional on the original data as shown in part (i), this part of the theorem can be proved in the same manner as part (ii) of Lemma S1.11 in the Supplementary Material and is thus omitted. This completes the proof of the theorem.

C. Primitive Conditions for Assumptions 3.2 and 3.3.
This section provides primitive conditions for Assumptions 3.2 and 3.3 in the ARMA-GARCH, the TGARCH, and the GARCH-in-mean models. In the following discussions, we restrict our attention to the simplest forms of these models to reduce notational complication.