TAIL DEPENDENCE OF OLS

This paper shows that if the errors in a multiple regression model are heavy-tailed, the ordinary least squares (OLS) estimators of the regression coefficients are tail-dependent. The tail dependence arises because the OLS estimators are stochastic linear combinations of heavy-tailed random variables. Moreover, tail dependence also exists between the fitted sum of squares (FSS) and the residual sum of squares (RSS), because these are stochastic quadratic combinations of heavy-tailed random variables.


INTRODUCTION
Financial data are often heavy-tailed. For instance, Jansen and de Vries (1991) investigate the tail behavior of stock returns and find tail index estimates consistent with a finite variance, but with infinite higher moments. The heavy-tailed nature of financial data can distort statistical analysis, especially in small samples. Mikosch and de Vries (2013) show that in an ordinary least squares (OLS) regression, an econometric method often applied to financial data, the estimator for the regression coefficient is heavy-tailed if the errors are heavy-tailed. As a follow-up study, we explore the joint tail behavior of the OLS estimators. In particular, we establish that the OLS estimators in a multiple regression model with heavy-tailed errors are tail-dependent.
This study contributes to the literature on regression analysis with heavy-tailed data. There are two streams of literature dealing with heavy tails in regression: one stream aims at finding more robust, or more efficient, estimation methods than OLS, and the other aims at characterizing the behavior of the OLS estimators when the errors are heavy-tailed.
The first stream of literature addresses the concern of efficiency loss of the OLS estimator when applied to severely heavy-tailed data with infinite variance. Blattberg and Sargent (1971) examine the performance of the minimum sum of absolute errors (MSAE) estimator and a class of best linear unbiased estimators in a regression model with stable Paretian distributed errors. They find that the turning point in efficiency lies at a tail index of approximately 1.5: for lower values of the tail index, the MSAE estimator performs considerably better than the OLS estimator, while the OLS estimator performs only slightly better for higher values of the tail index. Others have studied the OLS estimator in terms of robustness. For example, He et al. (1990) investigate the robustness of regression estimators by means of their "tail performance." They show that when the errors follow heavy-tailed distributions, the tail performance of the OLS estimator does not depend on the sample size and is effectively as poor as that of an estimator based on a single observation. By contrast, the MSAE, or least absolute deviations, estimator and the least median of squares estimator have considerably better tail performance and are therefore also not expected to exhibit tail dependence.
The second stream of literature aims at describing the (tail) behavior of the OLS estimators under heavy-tailed errors. Mikosch and de Vries (2013) is the only study to provide small sample analytical results on the distribution of the OLS estimator when the errors follow heavy-tailed distributions. Specifically, they find that the OLS estimator is heavy-tailed in the case of a simple linear regression model with additive or multiplicative errors. Consequently, approximations prescribed by the central limit theorem do not suffice in small samples. Our study fits into the second stream of literature. Compared to Mikosch and de Vries (2013), we go beyond the simple regression model and study the tail dependence among the OLS estimators in a multiple regression model with heavy-tailed errors. Furthermore, we study the tail dependence between the regression fitted sum of squares (FSS) and the residual sum of squares (RSS).
To establish the theoretical result in this study, we express the OLS estimators for the regression coefficients as stochastically weighted sums of the errors. van Oordt (2013) shows that positive deterministic linear combinations of positive heavy-tailed random variables exhibit tail dependence. We extend this result in three ways: (i) allowing for real-valued weights, (ii) allowing for stochastic weights, and (iii) allowing for real-valued heavy-tailed random variables, of which the second extension is the most technically involved one. The tail dependence between the OLS estimators then follows from this extended result. Furthermore, the FSS and the RSS are stochastically weighted sums of squares and cross products of the errors. Consequently, their tail dependence can be established in a similar way.
In Section 2, we define the multiple regression model and present the main results on the tail dependence between the OLS estimators and between the FSS and the RSS. Section 3 illustrates the theoretical results by means of a simulation study. Proofs are deferred to Section 4.

Model Setup
Consider the multiple regression model

Y_t = X_{1,t} β_1 + · · · + X_{k,t} β_k + η_t,  t = 1, 2, ..., n,  (1)

under the conditions that (i) {η_t}_{t=1}^n is an independently and identically distributed (i.i.d.) sequence of random variables, and (ii) {(X_{1,t}, ..., X_{k,t})}_{t=1}^n is an i.i.d. sequence of k-dimensional random vectors containing the explanatory variables, independent of the error sequence {η_t}_{t=1}^n. We assume that η_t follows a heavy-tailed distribution, i.e., its distribution function is regularly varying. More specifically, a random variable, or its distribution function F, is said to be regularly varying with tail index α > 0 if there exist p, q ≥ 0 with p + q = 1 and a slowly varying function L such that, as x → ∞,

P(η_t > x) ∼ p x^{-α} L(x)  and  P(η_t ≤ -x) ∼ q x^{-α} L(x).  (2)

Here, a slowly varying function L is defined by the property L(λx) ∼ L(x) as x → ∞, for any λ > 0 (see Bingham, Goldie, and Teugels, 1989). Condition (2) is referred to as a tail balance condition. Note that this assumption on F is semiparametric in the sense that it imposes no restrictions on the moderate levels of the distribution. We do not restrict the range of α: when α < 1, η_t does not have a finite mean.
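As a quick numerical illustration of the regular variation assumption (not part of the formal development), the sketch below simulates Student t errors, which satisfy condition (2) with p = q = 1/2 by symmetry, and recovers the tail index with a Hill estimator. The sample size, the tuning parameter k, and the use of the Hill estimator are our own illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Student t(alpha) errors are regularly varying with tail index alpha and,
# by symmetry, tail balance p = q = 1/2 in condition (2).
alpha, n = 3.0, 200_000
eta = rng.standard_t(df=alpha, size=n)

def hill(sample, k):
    """Hill estimator of the tail index from the k largest absolute values."""
    x = np.sort(np.abs(sample))[::-1]          # descending order statistics
    return 1.0 / np.mean(np.log(x[:k] / x[k]))

alpha_hat = hill(eta, k=1_000)                 # should be near alpha = 3

# tail balance: among the largest 1% of |eta|, about half are positive
big = np.abs(eta) > np.quantile(np.abs(eta), 0.99)
p_hat = np.mean(eta[big] > 0)
```

The Hill estimate is noisy and sensitive to k; it serves only to show that the simulated errors behave as condition (2) prescribes.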
Observe that the difference between an OLS estimator and the corresponding true value is a stochastic linear combination of the errors. For example, if k = 2, we can write β̂_1 = β_1 + Σ_{t=1}^n C_t η_t and β̂_2 = β_2 + Σ_{t=1}^n D_t η_t, where the weights (C_t, D_t), given in (3), are functions of the regressors. Here, {(C_t, D_t)}_{t=1}^n is an identically distributed, but not independent, sequence of random vectors. Consequently, we omit subscripts and write (C, D) whenever only the distribution of (C_t, D_t) is involved.
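The decomposition β̂_j − β_j = Σ_t (weight) η_t can be checked numerically. In the sketch below, the rows of (X'X)^{-1}X' play the role of the stochastic weights (C_t, D_t); the simulated design is our own choice.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 25, 2
X = rng.normal(size=(n, k))                 # regressors, independent of errors
eta = rng.standard_t(df=3, size=n)          # heavy-tailed errors
beta = np.array([1.0, 1.0])
Y = X @ beta + eta

# OLS estimate; column t of (X'X)^{-1} X' plays the role of (C_t, D_t),
# so that beta_hat - beta = sum_t (C_t, D_t) * eta_t.
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
W = np.linalg.solve(X.T @ X, X.T)           # shape (2, n): row 0 -> C_t, row 1 -> D_t
reconstructed = beta + W @ eta              # matches beta_hat exactly
```

Since β̂ = (X'X)^{-1}X'(Xβ + η) = β + (X'X)^{-1}X'η, the reconstruction is an algebraic identity, not an approximation.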
We aim at calculating the tail dependence between the OLS estimators. We define the tail dependence between any two random variables Z_1 and Z_2 as

λ_{Z_1,Z_2} := lim_{u→1} P(Z_1 > Q_u(Z_1) | Z_2 > Q_u(Z_2)),

where Q_u(Z) = inf{l : F_Z(l) ≥ u} denotes the quantile of a random variable Z at probability level u. Similarly, we define the multivariate tail dependence between Z_i and the other random variables as

λ_{Z_i,Z_{-i}} := lim_{u→1} P(Z_1 > Q_u(Z_1), ..., Z_n > Q_u(Z_n) | Z_i > Q_u(Z_i)),

where Z_{-i} is a shorthand notation for all random variables Z_1, ..., Z_n except Z_i.
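The limit definition above suggests a finite-sample empirical analogue at a fixed probability level u. The sketch below implements that analogue; the level u = 0.95 and the sanity-check data are our own choices.

```python
import numpy as np

def tail_dependence(z1, z2, u=0.95):
    """Empirical analogue, at fixed level u, of
    lambda_{Z1,Z2} = lim_{u->1} P(Z1 > Q_u(Z1) | Z2 > Q_u(Z2))."""
    q1, q2 = np.quantile(z1, u), np.quantile(z2, u)
    # joint exceedance probability divided by the marginal P(Z2 > Q_u) = 1 - u
    return np.mean((z1 > q1) & (z2 > q2)) / (1.0 - u)

# sanity checks: comonotone variables give a value near 1,
# independent ones give a value near 1 - u = 0.05
rng = np.random.default_rng(2)
z = rng.normal(size=100_000)
lam_comonotone = tail_dependence(z, z)
lam_independent = tail_dependence(z, rng.normal(size=100_000))
```

At any fixed u this is only a pre-limit quantity; letting u → 1 with the sample size recovers the coefficient defined above.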

Tail Dependence Between the OLS Estimators
Consider model (1) with k = 2. We then have the following result for the tail dependence between the OLS estimators.

THEOREM 1. Assume that the errors {η_t}_{t=1}^n in model (1), for k = 2, follow a heavy-tailed distribution, as defined in (2), with p, q > 0 and tail index α. In addition, assume that the weights (C, D) satisfy E|C|^{α+ε}, E|D|^{α+ε} < ∞, for some ε > 0, where (C, D) is defined as in (3). Then, the tail dependence between the OLS estimators for β_1 and β_2 is given by (4), where I{·} is the indicator function, C^+ := max{C, 0}, C^- := max{-C, 0}, the variables D^+ and D^- are defined similarly, and A is the event appearing in (4).

Mathematically, Theorem 1 can be interpreted as showing that stochastically weighted sums of regularly varying random variables are tail-dependent. This also follows directly from the multivariate version of Breiman's lemma (see Proposition A.1 in Basrak, Davis, and Mikosch, 2002). Our result, however, is more specific in that it provides an explicit formula relating the level of tail dependence to the distribution of the regressors. With the explicit formula, one can estimate the tail dependence using the observed regressors (see Section 3 below). Following (4), the level of tail dependence is determined by the tail index, α, and the joint distribution of the stochastic weights (C, D). We resort to a simulation study in Section 3 to illustrate the effect of these determinants.
One high-level condition required in Theorem 1 is E|C|^{α+ε}, E|D|^{α+ε} < ∞. In Proposition 1 below, we provide sufficient conditions on the distribution of the regressors which guarantee this requirement.

PROPOSITION 1. Assume that the following conditions are satisfied:
1. The common marginal distribution of X_{i,t} satisfies either of the following:
(a) there exist γ > 2α/n, c > 0, and x_0 > 0 such that, for all x < x_0, P(|X_{i,t}| ≤ x) ≤ c x^γ; or
(b) the common density function of X_{i,t} is bounded in a neighborhood of zero and n > 2α.
2. There exist some c_1 > 0, γ_1 > 2α, and 0 < x_1 < 1 such that, for all 0 < x < x_1, the probability that the regressors come within distance x of linear dependence is at most c_1 x^{γ_1}.
Then, there exists some ε > 0 such that E|C_t|^{α+ε} < ∞ and E|D_t|^{α+ε} < ∞.
We remark that the first condition in Proposition 1 is a requirement on the marginal distribution of the regressors. It is similar to the condition in Lemma 3.6 of Mikosch and de Vries (2013). Note that for financial data, condition 1(b) may be unlikely to hold (see, e.g., Han, Cho, and Phillips, 2011). The second condition in Proposition 1 requires that the probability of having two linearly dependent regressors is sufficiently low. Finally, as an example, these conditions hold when the regressors X_{1,t}, X_{2,t} follow a bivariate normal distribution with correlation parameter |ρ| < 1 and n > 4α + 2. Proofs of Proposition 1 and of the example are postponed to Section 4.
In the general case with k regressors, we may still write β̂_j = β_j + Σ_{t=1}^n W_{j,t} η_t, for j = 1, ..., k, where each weight W_{j,t} is a function of the regressor matrix X. The weights {(W_{1,t}, W_{2,t}, ..., W_{k,t})}_{t=1}^n form an identically distributed, but not independent, sequence of k-dimensional random vectors. Therefore, we omit the time subscripts and continue to write (W_1, W_2, ..., W_k). Consequently, we have the following extension of Theorem 1.

Tail Dependence Between the FSS and the RSS
In regression analysis, two important quantities are the FSS and the RSS. These quantities are often used to test the explanatory power of the regression model via the F-test, that is, to test β_1 = · · · = β_k = 0 in the regression model given by (1). Under the assumption that the errors {η_t}_{t=1}^n in (1) are normally distributed, the F-statistic is defined as

F = (FSS/k) / (RSS/(n - k)),

where FSS = (Xβ̂)'(Xβ̂) and RSS = (Y - Xβ̂)'(Y - Xβ̂). Under the null hypothesis β_1 = · · · = β_k = 0, we can write the FSS and the RSS as

FSS = η'Πη  and  RSS = η'(I - Π)η,  (5)

where Π = X(X'X)^{-1}X'. Similar to the OLS estimators, the FSS and the RSS are weighted quadratic combinations of the errors, where the weights are contained in Π. Denote the t-th diagonal element of Π as L_tt. Then, {L_tt}_{t=1}^n are identically distributed random variables. We omit the subscripts and use L whenever only the distribution of L_tt is involved. In the following theorem, we show that the FSS and the RSS are tail-dependent when the errors are heavy-tailed.
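The identities in (5) are easy to verify numerically. The sketch below builds the hat matrix Π, computes the FSS, the RSS, and the F-statistic under the null, and checks the quadratic-form representations together with the leverage bounds 0 ≤ L_tt ≤ 1; the simulated design is our own choice.

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 25, 2
X = rng.normal(size=(n, k))
eta = rng.standard_t(df=3, size=n)
Y = eta                                     # under the null beta_1 = ... = beta_k = 0

P = X @ np.linalg.solve(X.T @ X, X.T)       # hat matrix Pi = X (X'X)^{-1} X'
fitted = P @ Y
fss = fitted @ fitted                       # FSS; equals eta' Pi eta under the null
rss = (Y - fitted) @ (Y - fitted)           # RSS; equals eta' (I - Pi) eta
F = (fss / k) / (rss / (n - k))             # the F-statistic

lev = np.diag(P)                            # leverages L_tt of the observations
```

Because Π is a symmetric idempotent projection, (Πη)'(Πη) = η'Πη, so the quadratic-form identities hold exactly, and the leverages lie in [0, 1] with trace equal to k.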
THEOREM 3. Assume that the errors {η_t}_{t=1}^n in (1) follow a heavy-tailed distribution with p, q > 0 and tail index α. Then, under the null hypothesis β_1 = · · · = β_k = 0, the tail dependence between the FSS and the RSS is given by an explicit formula in terms of α and the distribution of L, in which B denotes the corresponding tail event.

We have two remarks about Theorem 3. First, for a given α, λ_{FSS,RSS} is solely determined by the distribution of L, the distribution of a diagonal element of Π. In regression analysis, a diagonal element of Π is known as the leverage of an observation and indicates to what extent the observation "determines" the OLS estimate. Note that 0 ≤ L_tt ≤ 1, which guarantees the existence of all moments of L; therefore, no moment condition is needed in Theorem 3. Second, under normally distributed errors, the FSS and the RSS are independent. By contrast, we show that under heavy-tailed errors, the two are at least tail-dependent. Consequently, the F-statistic may not follow an F-distribution if the errors are heavy-tailed. Nevertheless, if the regressors are deterministic and the errors belong to the class of multivariate elliptical distributions, the F-statistic still follows an F-distribution (see, e.g., Qin and Wan, 2004). Multivariate elliptical distributions include heavy-tailed distributions, such as the multivariate Cauchy and Student t distributions (recall the Gaussian scale mixture representation; see Breusch, Robertson, and Welsh, 1997), but do not include the i.i.d. case we are considering.
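A small Monte Carlo sketch along the lines of Section 3 illustrates the remark that the FSS and the RSS are tail-dependent under heavy-tailed errors. The parameter values and the fixed threshold u = 0.95 are our own illustrative choices; the theoretical λ is a limit as u → 1, so the number below is only indicative.

```python
import numpy as np

rng = np.random.default_rng(4)
m, n, k, alpha = 4_000, 25, 2, 3.0
fss, rss = np.empty(m), np.empty(m)
for i in range(m):
    X = rng.normal(scale=0.2, size=(n, k))
    eta = rng.standard_t(alpha, size=n)     # heavy-tailed errors, null beta = 0
    P = X @ np.linalg.solve(X.T @ X, X.T)   # hat matrix
    fss[i] = eta @ P @ eta                  # FSS under the null, as in (5)
    rss[i] = eta @ eta - fss[i]             # RSS = eta'eta - FSS

u = 0.95
joint = np.mean((fss > np.quantile(fss, u)) & (rss > np.quantile(rss, u)))
lam_hat = joint / (1 - u)   # well above 1 - u = 0.05 under heavy tails
```

Under independence lam_hat would be close to 0.05; the heavy-tailed errors push it far above that level, in line with the theorem.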
Although the null hypothesis in Theorem 3 appears to be specific, the same conclusion is valid for a broader null where only a subset of the coefficients is set to zero, i.e., a nested model test. Partition the model as Y = X^{(0)} β^{(0)} + X^{(1)} β^{(1)} + η, where the first term, with superscript (0), contains the first k_0 dimensions and the second term, with superscript (1), contains the other k - k_0 dimensions. Suppose we are testing the null that the first k_0 dimensions of the coefficients are zero, i.e., β^{(0)} = 0_{k_0}. By the Frisch-Waugh-Lovell theorem, the OLS estimator for β^{(0)} is the same as the OLS estimator obtained when regressing the residuals of Y on X^{(1)} against the residuals of X^{(0)} on X^{(1)}. Moreover, the residuals in both regressions are the same. Hence, testing β^{(0)} = 0_{k_0} in the full model is equivalent to testing the explanatory power in the regression after eliminating the impact of X^{(1)}. Therefore, the same conclusion as in Theorem 3 holds.

SIMULATION STUDY
The main results in Section 2 show that the tail dependence between the OLS estimators depends on the tail index α and the joint distribution of the regressors. Here, we perform a simulation study to validate this result and demonstrate to what extent these determinants affect the level of tail dependence.

Bivariate Tail Dependence Between the OLS Estimators
We simulate data from the regression model

Y_t = X_{1,t} β_1 + X_{2,t} β_2 + η_t,  t = 1, ..., n,  (6)

where {(X_{1,t}, X_{2,t})}_{t=1}^n are i.i.d. observations drawn from a bivariate normal distribution with mean zero, standard deviation 1/5, and correlation ρ. In addition, {η_t}_{t=1}^n is an i.i.d. sequence of errors, independent of {(X_{1,t}, X_{2,t})}_{t=1}^n. For the heavy-tailed case, we draw these errors from a Student t distribution with mean zero and α degrees of freedom. In this model, we set β_1 = β_2 = 1. Under these conditions, the regression model (6) satisfies the conditions of Theorems 1 and 3, because the Student t distribution with α degrees of freedom is regularly varying with tail index α and p = q = 0.5 due to its symmetry. To compare with the heavy-tailed case, we also draw errors from a standard normal distribution. In all simulation studies, we consider the regression model with n ∈ {25, 50, 100, 200} and repeat the simulation m times for each n. Figure 1 shows the scatter plots of the (transformed) OLS estimators under heavy-tailed and normally distributed errors in the left and right panels, respectively. For the sake of comparison, we transform the estimates to have uniform marginal distributions using the empirical distribution functions estimated from the m runs. The top two plots show the results of m = 2,000 runs. We observe that the dependence structure of the OLS estimators under heavy-tailed errors differs from that under normally distributed errors: observations are more concentrated along the outer diagonals, suggesting the presence of tail dependence in the case of heavy-tailed errors and potential tail independence under normally distributed errors. To further demonstrate the dependence structure in the right tails, the two bottom plots show the results of m = 10,000 runs, zooming in on the joint tail region where both transformed estimators are above 0.95. Under (tail) independence, one would expect to find 10,000 × 0.05 × 0.05 = 25 such pairs in this area.
We observe that under the heavy-tailed errors, there are 152 such pairs in this region, about five times more than the expected number under (tail) independence. However, under the normally distributed errors, there are 32 such pairs, close to the expected number.
Subsequently, we evaluate the tail dependence between the OLS estimators by two methods. The first method estimates the tail dependence from the simulated regressors using the result in Theorem 1. Note that, under the distribution of {η_t}_{t=1}^n specified above, the tail dependence in Theorem 1 simplifies to (7), because C and D are symmetrically and identically distributed. Based on (7), we can estimate the tail dependence by its empirical analog, λ̂_{β̂_1,β̂_2}, computed from the observed regressors as in (8). With the m simulation runs, we obtain m estimates of λ̂_{β̂_1,β̂_2}. From these estimates, we calculate the mean and standard deviation of λ̂_{β̂_1,β̂_2} and report them in Table 1 as the lower entry for each given set of parameters (α, n, ρ). For the case ρ = 0, we consider a wider range of values for α, from 1 to 5. In this case, we draw the errors from the standard Student t(α) distribution. Notice that, for α = 1, the errors have no finite mean, and for α = 2, the errors have no finite variance. For the other cases, where ρ ≠ 0, we only consider α > 2, because it is often assumed in regression models that the errors have a finite variance. In these cases, we draw the errors from the Student t distribution with α degrees of freedom, standardized to unit variance; this choice facilitates comparison across different values of α. Note that in this simulation study we refrain from estimating α and consider α to be known in (8). In practice, one needs to estimate α from the observations {Y_t}_{t=1}^n, which may inflate the variance of the estimator.
As a second method, we use a nonparametric estimator from multivariate extreme value theory to estimate the tail dependence. The m simulation runs result in m pairs of estimates (β̂_1, β̂_2). Therefore, we can estimate the tail dependence between β̂_1 and β̂_2 nonparametrically as in (9), where the thresholds are the (m - k)-th order statistics of the m estimates β̂_{i,1}, ..., β̂_{i,m}, for i = 1, 2, and k is an intermediate sequence such that k/m → 0 as (k, m) → ∞. For each given set of parameters (α, n, ρ), we report the estimate Î_{β̂_1,β̂_2} in Table 1 as the upper entry. All estimates in Table 1 are obtained by fixing m = 1,000,000 and k = 1,000. Note: For each set of parameters (α, n, ρ), the upper entry presents an estimate of Î_{β̂_1,β̂_2} as in (9) and the lower entry presents the mean and standard deviation of λ̂_{β̂_1,β̂_2} (in parentheses) obtained from (8).
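A minimal implementation of a nonparametric tail dependence estimator of this type is sketched below. The exact form of (9) is not reproduced here, so this should be read as a standard variant based on joint exceedances of the (m − k)-th order statistics.

```python
import numpy as np

def nonparametric_tail_dependence(z1, z2, k):
    """(1/k) * #{j : z1_j and z2_j both exceed their (m-k)-th order statistic},
    a standard empirical tail dependence estimator with intermediate sequence k."""
    m = len(z1)
    t1 = np.sort(z1)[m - k - 1]     # (m-k)-th smallest value of z1
    t2 = np.sort(z2)[m - k - 1]     # (m-k)-th smallest value of z2
    return np.sum((z1 > t1) & (z2 > t2)) / k

# sanity check on comonotone data: the estimator equals 1 exactly
z = np.arange(10_000, dtype=float)
check = nonparametric_tail_dependence(z, z, k=100)
```

In practice k trades off bias (large k) against variance (small k), which is why the fixed choice k = 1,000 can distort the estimates for light tails, as noted below.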
We have the following observations from the simulation. First, the result in Theorem 1 is in line with the simulations. The average of λ̂_{β̂_1,β̂_2} coincides closely with Î_{β̂_1,β̂_2} for most combinations of α, ρ, and n. Some differences occur between the two estimators Î_{β̂_1,β̂_2} and λ̂_{β̂_1,β̂_2} for high values of α: for example, for α = 5, n ≥ 25, and ρ = 0, the average of λ̂_{β̂_1,β̂_2} and Î_{β̂_1,β̂_2} differ in excess of 50%. This result might be a consequence of the fixed choice of k in the estimator Î_{β̂_1,β̂_2}. Second, for given α and ρ, the mean of λ̂_{β̂_1,β̂_2} is generally invariant as n increases. For given n and ρ, both Î_{β̂_1,β̂_2} and λ̂_{β̂_1,β̂_2} decrease as α increases. As α increases, the Student t distribution with α degrees of freedom becomes more similar to the normal distribution. Consequently, tail dependence between the OLS estimators decreases as the tail of the errors becomes lighter. Table 1 also illustrates that more negative correlation corresponds to higher tail dependence between the OLS estimators. This is due to the increase in P(C > 0, D > 0) as ρ decreases.
Finally, we remark that the standard deviation of λ̂_{β̂_1,β̂_2} can be relatively high for small n. This suggests that one should be cautious when applying the estimator in (8) to regressions with a limited number of observations. In addition, the standard deviation of λ̂_{β̂_1,β̂_2} increases as the correlation ρ decreases, but is relatively insensitive to the level of α, except in the case ρ = -0.9.

Tail Dependence Between the FSS and the RSS
Similar to the previous subsection, we estimate the tail dependence between the FSS and the RSS from the simulated regression model. We use an estimator based on Theorem 3 and a nonparametric estimator, denoted λ̂_{FSS,RSS} and Î_{FSS,RSS}, respectively. These estimators are constructed in the same way as λ̂_{β̂_1,β̂_2} and Î_{β̂_1,β̂_2}; we omit the details. The estimates are based on the simulated regression model (6), with m = 10,000,000 and k = 1,000. We only present the results for n = 25, since the results for other values of n remain qualitatively the same. The average of λ̂_{FSS,RSS} and Î_{FSS,RSS} coincide closely, except for the highest value of α. Furthermore, we observe that the tail dependence between the FSS and the RSS decreases as α increases. In contrast with the results in Table 1, the level of tail dependence between the FSS and the RSS does not depend on the correlation between the regressors, ρ. In addition, the tail dependence between the FSS and the RSS is stronger than the tail dependence between the OLS estimators in the base case ρ = 0. This can partly be explained by the fact that the quadratic forms in the FSS and the RSS are more heavy-tailed, with tail index α/2, and by the fact that the weights L_tt and (1 - L_tt) are always nonnegative.
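The claim that the quadratic terms are more heavy-tailed can be illustrated with Hill estimates: squares of t(α) errors should show a tail index near α/2, and cross products a tail index near α. The sample sizes and the Hill tuning parameter k are our own choices, and Hill estimates of products are known to be biased by the slowly varying factor in their tails.

```python
import numpy as np

rng = np.random.default_rng(6)
alpha, N = 3.0, 400_000
e1 = rng.standard_t(alpha, size=N)
e2 = rng.standard_t(alpha, size=N)

def hill(sample, k=2_000):
    """Hill estimator of the tail index from the k largest absolute values."""
    x = np.sort(np.abs(sample))[::-1]
    return 1.0 / np.mean(np.log(x[:k] / x[k]))

idx_square = hill(e1 ** 2)    # squares eta_t^2: tail index about alpha/2 = 1.5
idx_product = hill(e1 * e2)   # cross products eta_t eta_j: tail index about alpha = 3
```

Note that the Hill estimate for the squared sample is exactly half the estimate for the absolute sample at the same k, since log(x²) = 2 log(x).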

PROOFS
To calculate the tail dependence between the OLS estimators, we will analyze the joint probability that the estimators exceed high thresholds. Since the estimators β̂_1, ..., β̂_k are stochastically weighted sums of regularly varying errors, we start by analyzing nonstochastic linear combinations of regularly varying random variables.

Extension of the Feller Theorem
Theorem 4 provides an approximation to the joint probability that two real-valued linear combinations of real-valued regularly varying random variables exceed high thresholds.
THEOREM 4. Let η_1, η_2, ..., η_n be real-valued regularly varying random variables. Assume that they are independent. In addition, assume that c_i, d_i are real-valued coefficients, for i = 1, 2, ..., n, and that x/y → κ as x → ∞, with κ > 0. Then, relation (10) holds as x → ∞.

The relation in (10) has a similar interpretation as the Feller theorem (see Feller, 1971, p. 278): for high x and y, the events {Σ_{i=1}^n c_i η_i > x} and {Σ_{i=1}^n d_i η_i > y} occur simultaneously solely due to a sufficiently extreme value of one of the η_1, ..., η_n whose corresponding coefficients have the same sign.
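The single-big-jump interpretation can be checked by Monte Carlo in the one-sided special case p = 1, q = 0 with x = y, where a Theorem 4-type joint tail approximation reduces to Σ_i min(c_i, d_i)^α x^{-α} for positive weights. This special case and all parameter values are our own illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(7)
alpha = 2.0
c = np.array([2.0, 1.0, 0.5])       # weights of the first linear combination
d = np.array([0.5, 1.0, 2.0])       # weights of the second
N, x = 1_000_000, 100.0

# iid Pareto(alpha): P(eta > t) = t^(-alpha) for t >= 1 (one-sided, so p = 1)
eta = rng.uniform(size=(N, 3)) ** (-1.0 / alpha)

# single-big-jump heuristic: both sums exceed x only because one eta_i is huge,
# and that eta_i must carry a positive weight in both combinations
mc = np.mean((eta @ c > x) & (eta @ d > x))
approx = np.sum(np.minimum(c, d) ** alpha) * x ** (-alpha)
ratio = mc / approx                 # should be of order 1 at high thresholds
```

The ratio exceeds 1 slightly at any finite threshold, since the remaining terms of the sums shift the exceedance level; it approaches 1 as x grows.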
As a corollary of Theorem 4, we consider the joint tail probability of k linear combinations of n real-valued regularly varying random variables.

COROLLARY 1. Let η_1, η_2, ..., η_n be real-valued regularly varying random variables. Assume that they are independent. In addition, assume that w_{1,i}, ..., w_{k,i} are real-valued coefficients, for i = 1, 2, ..., n, and that x_1/x_j → κ_j as x_1 → ∞, with κ_j > 0, for j = 2, ..., k. Then, the corresponding joint tail relation holds as x_1 → ∞.

To prove Theorem 4, we first deal with n nonnegative regularly varying random variables in Lemma 1. This lemma does not require the assumption of independence. Instead, it only requires the weaker assumption that each pair of random variables is either independent or satisfies P(η_i > 0, η_j > 0) = 0.

LEMMA 1. Let η_1, η_2, ..., η_n be nonnegative regularly varying random variables. Assume that, for all 1 ≤ i ≤ n, η_i is independent of {η_j : P(η_i > 0, η_j > 0) > 0}. In addition, assume that c_i, d_i are real-valued coefficients, for i = 1, 2, ..., n, and that x/y → κ as x → ∞, with κ > 0. Then, relation (11) holds as x → ∞. If Σ_{i=1}^n I{c_i > 0, d_i > 0} = 0, the relation should be read as the joint probability being negligible relative to Σ_{i=1}^n P(η_i > x).

Proof of Lemma 1. We prove Lemma 1 by mathematical induction. We start by proving the lemma for n = 2. Here, we only deal with the case c_1 < 0 and c_2, d_1, d_2 > 0. The other cases, including the case Σ_{i=1}^2 I{c_i > 0, d_i > 0} = 0, are similar or simpler. We handle the joint probability on the left-hand side of (11) by providing its upper and lower bounds using set manipulations.
Note that, for any 0 < δ < 1/2, we have the set inclusion (12). From (12), we get the lower bound on the joint probability. If P(η_1 > 0, η_2 > 0) = 0, the corresponding term vanishes, and if η_1 and η_2 are independent, the corresponding probability factorizes. Note that P(η_1 < δx/(-c_1)) → 1 as x → ∞. By first letting x → ∞ and then letting δ → 0, we obtain the lower bound. From (13), we get the upper bound on the joint probability in (14). If η_1 and η_2 are independent, the last term on the right-hand side of (14) is then P(η_1 > δy), which is of higher order than the first term as x → ∞. If P(η_1 > 0, η_2 > 0) = 0, then the last term on the right-hand side of (14) is zero. Therefore, by letting δ → 0, we obtain the upper bound. Combining the upper and lower bounds gives the result for n = 2.
Assuming that the lemma holds for n - 1 nonnegative regularly varying random variables, we prove that it also holds for n. First, we deal with the case Σ_{i=1}^n I{c_i > 0, d_i > 0} > 0. Then, there is at least one positive pair (c_i, d_i). Without loss of generality (w.l.o.g.), assume that c_n, d_n > 0. Note that, for any 0 < δ < 1/2, we have the inclusions in (15) and (16). By (15), we can split the lower bound into three terms, I_1, I_2, and I_3. We apply the induction hypothesis to I_1. Since η_n is independent of {η_i : i ∈ I_ind} and P(η_n > 0, η_i > 0) = 0, for i ∈ I_ind^c, the term I_2 follows from regular variation. The term I_3 is of a higher order than Σ_{i=1}^n P(η_i > x): the proof follows in a similar way as the handling of I_2, by splitting the variables into the set I_ind and its complement. Combining all three terms and further taking δ → 0, we obtain the lower bound. For the upper bound, we start from (16), which splits the probability into three terms, L_1, L_2, and L_3. The proof follows in a similar way as the handling of the lower bound: the term L_1 can be handled using the induction hypothesis, and the term L_2 can be handled using regular variation. Finally, similar to I_3, the term L_3 is of a higher order than Σ_{i=1}^n P(η_i > x). Combining the three terms and further taking δ → 0, we obtain the upper bound. The combination of the lower and upper bounds yields the result for n. If Σ_{i=1}^{n-1} I{c_i > 0, d_i > 0} = 0, the proof is similar and simpler, because the terms I_1 and L_1 are of a higher order than Σ_{i=1}^n P(η_i > x). Next, we handle the statement regarding the case Σ_{i=1}^n I{c_i > 0, d_i > 0} = 0, again by mathematical induction. The proof for n = 2 is trivial. Assume that the result holds for n - 1.
If all c_i and all d_i are nonpositive, the result for n holds trivially. Suppose at least one c_i or d_i is positive; w.l.o.g., assume that c_n > 0 and d_n ≤ 0. The statement is proved for n provided that the probability of each set on the right-hand side of the corresponding decomposition is of a higher order than Σ_{i=1}^n P(η_i > x). The first set can be handled by the induction hypothesis. The second and third sets can be handled, again, by splitting the variables into the set I_ind and its complement.
To prove Theorem 4, we separate the positive and negative parts of a realvalued regularly varying random variable and "absorb" the negative part into the corresponding coefficient. The result then follows by Lemma 1.
Proof of Theorem 4. W.l.o.g., consider the case n = 2 and write η_i = η_i^+ - η_i^-, where η_i^+ := max{η_i, 0} and η_i^- := max{-η_i, 0}. In addition, let e_i = -c_i and z_i = -d_i. By definition, it holds that P(η_i^+ > 0, η_i^- > 0) = 0, and that c_i η_i = c_i η_i^+ + e_i η_i^-, with the analogous identity for d_i η_i. Since the random variables {η_1^+, η_1^-, η_2^+, η_2^-} satisfy the conditions of Lemma 1, the theorem is proved by applying the lemma.

Proofs of Theorems 1 and 2
We can apply the results in Section 4.1 to the OLS estimators by a conditioning argument. Denoting P(· | X) by P̃(·), we can write the tail dependence as the limit of an expectation of conditional probabilities. In the Appendix, we show that we may interchange the limit and the expectation, which leads to (17). Then, the "conditional tail dependence," the term within the expectation in (17), can be calculated with the aid of Theorem 4. We start by obtaining an approximation for P(η_t > Q_u(β̂_1 - β_1)), the probability which needs to be dealt with after we apply Theorem 4 to the conditional tail dependence. To obtain the required approximation, we need to deal with the unconditional marginal distribution of β̂_1 - β_1. For that purpose, we state here Lemma 3.4 of Mikosch and de Vries (2013), which extends Breiman's result (see Proposition 3 in Breiman, 1965) to a multivariate regularly varying random vector ξ and another random vector ζ of the same dimension. We define a random vector ξ = (ξ_1, ..., ξ_d) to be multivariate regularly varying with index α > 0 if there exists a Radon measure μ on R^d \ (-ε, ε)^d, for some ε > 0, such that the regular variation limit relation holds for every Borel set, with the homogeneity condition μ(xE) = x^{-α} μ(E); see Theorem 6.1 in Resnick (2007). A special case of multivariate regular variation of ξ is when ξ_1, ..., ξ_d are independent regularly varying random variables (Resnick, 2007, Sect. 6.5.1).
LEMMA 2. Assume that ξ is multivariate regularly varying in R^d with index α > 0 and is independent of the random vector ζ. Furthermore, assume that E|ζ|^{α+ε} < ∞, for some ε > 0, where |·| denotes any given norm on R^d. Then, the scalar product ψ = ξ'ζ is regularly varying with index α. Moreover, if ξ has independent components, the tail of ψ admits an explicit expansion as x → ∞. We now present an approximation for P(η_t > Q_u(β̂_1 - β_1)) in the following lemma.
LEMMA 3. Let {η_t}_{t=1}^n be a sequence of real-valued regularly varying i.i.d. random variables satisfying (2) with p, q > 0. Assume that {C_t}_{t=1}^n is a sequence of identically distributed real-valued random variables independent of {η_t}_{t=1}^n. Furthermore, assume that C_t satisfies E|C_t|^{α+ε} < ∞, for some ε > 0. Then, relation (18) holds as u → 1.

Proof of Lemma 3. Define |η|_∞ := max_t |η_t|. By the regular variation of η_t and the tail balance condition (2), we obtain the tail behavior of |η|_∞ as x → ∞. Recall that, by definition, P(Σ_{t=1}^n C_t η_t > Q_u(Σ_{j=1}^n C_j η_j)) = 1 - u. Equation (18) then follows immediately from Lemma 2 and the tail balance condition.
Using Lemma 3, we can now prove Theorem 1.
Proof of Theorem 1. We start by deriving the conditional tail dependence between the OLS estimators, which is the limit appearing in (17).
First, we show that the ratio Q_u(β̂_1 - β_1)/Q_u(β̂_2 - β_2) converges to a positive constant. From Lemma 3, we obtain, as u → 1, the tail equivalence of the two quantiles; the statement thus follows from the fact that η has a regularly varying tail. We can then take x = Q_u(β̂_1 - β_1) and y = Q_u(β̂_2 - β_2) in Theorem 4 to obtain that, as u → 1, the conditional joint tail probability is asymptotically equivalent to M_u^+ + M_u^-, where M_u^+ collects the terms with positive weight pairs and M_u^- collects those with negative weight pairs. We first handle the term M_u^+. Note that P̃(η_t > Q_u(β̂_j - β_j)) = P(η_t > Q_u(β̂_j - β_j)), for j = 1, 2. Therefore, by Lemma 3 and the regular variation of η_t, we obtain the limit of each summand as u → 1. The indicator functions in the expression of M_u^+ allow us to consider only C_t > 0 and D_t > 0. Furthermore, note that the limit of the minimum of two convergent sequences is equal to the minimum of the two limits. Consequently, we obtain the limit M^+ of M_u^+, and, similarly, the limit M^- of M_u^-. By relation (19), it follows that the unconditional tail dependence between the OLS estimators is given by E(M^+ + M^-). Since {(C_t, D_t)}_{t=1}^n are identically distributed random vectors, we may drop the time subscript. By the law of total expectation, we obtain the right-hand side of (4), which yields the theorem by (17).
The proof of Theorem 2 follows similar steps as that of Theorem 1 and is therefore omitted.

Proof of Theorem 3
Recall from (5) that, under the null hypothesis β = 0, the FSS and the RSS can be written as η'Πη and η'(I - Π)η, respectively, where Π = X(X'X)^{-1}X'. Therefore, the tail dependence between the FSS and the RSS is given by the limit of the corresponding joint exceedance probability. We shall interchange the expectation and the limit. Because Π contains bounded entries only, the justification of the interchange can be obtained in a similar manner as in the situation of (17) (see the Appendix). Consequently, we get (20). Note that, under the P̃ measure, the matrix Π can be regarded as nonstochastic. In regression analysis, the elements on the diagonal of Π, L_tt, reflect the "leverage" of observation Y_t and satisfy 0 ≤ L_tt ≤ 1, whereas the off-diagonal elements, L_tj, satisfy -1/2 ≤ L_tj ≤ 1/2, for t ≠ j.
In the proof, we use the fact that, for the cross products η_t η_j with t ≠ j, both the left and right tails are regularly varying with tail index α (Embrechts and Goldie, 1980), whereas the squared terms η_t^2 are regularly varying with tail index α/2.

Proof of Theorem 3. For any x > 0, define the sets A_i(x), B_i(x), and C_i(x), for i = 0, 1, as in the corresponding displays. We study the probabilities of these sets under the measure P̃.
First, we handle the sets A_i(x), for i = 0, 1. From the Feller theorem, we have that, as x → ∞,
P̃(A_0(x))/P(η_t² > x) → Σ_{t=1}^n L_tt^{α/2} and P̃(A_1(x))/P(η_t² > x) → Σ_{t=1}^n (1 − L_tt)^{α/2}. (21)
Here, in the denominators, we use the fact that P̃(η_t² > x) = P(η_t² > x), due to the independence between the error terms and the regressors.
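The difference between the two tail indices is visible in a small simulation. For Student-t errors with (illustrative) tail index α = 1.5, exceedances of a high threshold are far more frequent for η_t² (tail index α/2) than for the cross product η_t η_j (tail index α), which is what makes the B_i sets asymptotically negligible.

```python
import numpy as np

rng = np.random.default_rng(2)
m, alpha, x = 500_000, 1.5, 100.0

# Student-t errors with df = alpha have regularly varying tails with index alpha.
eta_t = rng.standard_t(alpha, size=m)
eta_j = rng.standard_t(alpha, size=m)

p_square = np.mean(eta_t ** 2 > x)    # tail index alpha/2 (heavier tail)
p_cross = np.mean(eta_t * eta_j > x)  # tail index alpha (lighter tail)
```

As x grows, the ratio p_cross/p_square tends to zero, mirroring the relation P̃(2L_tj η_t η_j > x) = o(P(η_t² > x)) used in the proof.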
Next, we handle the sets B_i(x), for i = 0, 1. Note that B_0(x) = {Σ_{t<j} 2L_tj η_t η_j > x} and B_1(x) = {Σ_{t<j} 2L_tj η_t η_j < −x}. For any t < j, since η_t η_j has a higher tail index than η_t², and hence a lighter tail, we get that, as x → ∞, P̃(2L_tj η_t η_j > x) = o(P(η_t² > x)), which implies that, for both i = 0, 1, P̃(B_i(x)) = o(P(η_t² > x)). (22)
Finally, we handle the sets C_i(x), for i = 0, 1. Note that, for any 0 < ε < 1/2, the set C_0(x) can be sandwiched between A- and B-type sets. Hence, we get an upper bound for P̃(C_0(x)) via a lim sup, where, in the last step, we apply (21), (22), and the fact that P(η_t² > x) is a regularly varying function with index −α/2. Similarly, we obtain a lower bound for P̃(C_0(x)). By taking ε → 0, the upper and lower bounds coincide, which yields the limit of P̃(C_0(x)). A limit result for P̃(C_1(x)) can be obtained in a similar way. We present both limit results as follows: as x → ∞,
P̃(C_0(x))/P(η_t² > x) → Σ_{t=1}^n L_tt^{α/2} and P̃(C_1(x))/P(η_t² > x) → Σ_{t=1}^n (1 − L_tt)^{α/2}.
By taking expectations on both sides of the two limit relations, we get that
lim_{x→∞} P(C_0(x))/P(η_t² > x) = nE(L^{α/2}) and lim_{x→∞} P(C_1(x))/P(η_t² > x) = nE((1 − L)^{α/2}).
Recall that C_0(x) = {η'Lη > x}; from the properties of regularly varying functions, we get the asymptotic behavior of the quantiles Q_u(η'Lη) and Q_u(η'(I − L)η) as u → 1, which further yields (23).
We are now ready to calculate the limit inside the expectation in (20). Choose the specific values x = Q_u(η'Lη) and y = Q_u(η'(I − L)η). Using the notation of the C_i sets, we can write the set in the numerator as C_0(x) ∩ C_1(y). Similar to the way we handled the C_i individually, for any 0 < ε < 1, we can obtain an upper bound for this set. Clearly, the upper bound can be expanded as the union of four sets of the form A_0 ∩ A_1, B_0 ∩ A_1, A_0 ∩ B_1, and B_0 ∩ B_1. Under the P̃ measure, the limit relation (22) ensures that the probabilities of the last three sets, which involve B_i sets, tend to zero faster than 1 − u as u → 1. Hence, we only need to handle the first set, of the form A_0 ∩ A_1. The limit in (23) shows that x/y converges to a positive constant as u → 1. Since {η_t²}_{t=1}^n are n i.i.d. heavy-tailed random variables, we can apply Lemma 1 to the series {η_t²}_{t=1}^n, which leads to an upper bound for P̃(C_0(x) ∩ C_1(y)) via a lim sup. Similarly, to obtain a lower bound for P̃(C_0(x) ∩ C_1(y)), we use a corresponding set inclusion. Under the P̃ measure, the probability of the set B_1(2x/(n(n−1))) ∪ B_0(2y/(n(n−1))) tends to zero faster than 1 − u as u → 1, which yields the lower bound, where the last step is derived in a similar way as for the upper bound. Combining (24) and (25), together with taking ε → 0, we obtain the limit of the conditional probability. The theorem follows from taking expectations on both sides of this limit relation, combined with interchanging the expectation and the limit, which is justified in the Appendix.
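The conclusion of Theorem 3 can also be illustrated numerically: with heavy-tailed errors, a single extreme η_t² contributes L_tt η_t² to the FSS and (1 − L_tt) η_t² to the RSS, lifting both quadratic forms at once, whereas under Gaussian errors the FSS and RSS are independent (Cochran's theorem). The design and parameters below are illustrative, not the paper's simulation setup.

```python
import numpy as np

rng = np.random.default_rng(3)
n, k, reps, u = 10, 3, 50_000, 0.99

# Illustrative fixed design; L is the projection matrix from the proof.
X = rng.standard_normal((n, k))
L = X @ np.linalg.inv(X.T @ X) @ X.T

def joint_exceedance(errors, u):
    """P(RSS > its u-quantile | FSS > its u-quantile) over replications."""
    fss = ((errors @ L) * errors).sum(axis=1)   # eta' L eta
    rss = (errors ** 2).sum(axis=1) - fss       # eta' (I - L) eta
    hit = (fss > np.quantile(fss, u)) & (rss > np.quantile(rss, u))
    return np.mean(hit) / (1 - u)

lam_heavy = joint_exceedance(rng.standard_t(1.5, size=(reps, n)), u)
lam_gauss = joint_exceedance(rng.standard_normal(size=(reps, n)), u)
```

Under independence the conditional exceedance probability is approximately 1 − u, which is what the Gaussian case delivers; the heavy-tailed case sits well above it.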

CONCLUSION
In this paper, we study the tail dependence between OLS estimators for the regression coefficients when the error terms in the regression are heavy-tailed. We show the presence of tail dependence and provide an explicit formula relating the level of tail dependence to the distribution of the regressors. We also show that the FSS and the RSS are tail-dependent by providing an explicit formula for calculating the level of tail dependence. Simulation studies confirm our theoretical findings.
In practice, the error terms in a regression may possess heavy tails, for instance, when regressing a heavy-tailed dependent variable, such as a financial variable, on a set of thin-tailed independent variables. If, in addition, the number of observations is low, then the OLS estimators for the regression coefficients can be heavy-tailed and tail-dependent. This means that if the estimator for one coefficient deviates substantially from its true value, the estimator for another coefficient is likely to deviate substantially as well. A practitioner therefore has to be cautious in interpreting the significance of the results from such a regression.

Interchanging the Expectation and the Limit
We show that we may interchange the expectation and the limit in (17). The proof follows similar steps as in the proof of Breiman's lemma (see Breiman, 1965).