PROPERTIES OF THE INVERSE OF A NONCENTRAL WISHART MATRIX

The inverse of a noncentral Wishart matrix occurs in a variety of contexts in multivariate statistical work, including instrumental variable (IV) regression, but there has been very little work on its properties. In this paper, we first provide an expression for the expectation of the inverse of a noncentral Wishart matrix, and then go on to do the same for a number of scalar-valued functions of the inverse. The main result is obtained by exploiting simple but powerful group-equivariance properties of the expectation map involved. Subsequent results exploit the consequences of other invariance properties.


INTRODUCTION
Many inference problems in multivariate analysis, and in econometrics, involve various properties of the noncentral Wishart distribution. For example, all of the estimators for the parameters in structural equation models (Limited information maximum likelihood [LIML], Full information maximum likelihood [FIML], ordinary least squares [OLS], instrumental variable [IV], etc.) are functions of a noncentral Wishart matrix, and the distribution theory for them derives directly from this fact. See, for instance, Phillips (1983), and the many references therein. Recent work on partially identified models (Phillips, 1989) and weak instruments (e.g., Andrews and Stock, 2005) involves the same structure. In some applications, however, the relevant matrix is the inverse of a noncentral Wishart matrix, rather than the matrix itself, and this makes the problem considerably more difficult. That is the motivation for the present paper.
If the rows of the n × m matrix Z are independent normal with covariance matrix Σ and E[Z] = M, then, if n ≥ m, the density of W = Z′Z is given by (Muirhead, 1982, p. 442)

f(W) = [etr(−Θ/2) / (2^{mn/2} Γ_m(n/2) |Σ|^{n/2})] etr(−½Σ^{−1}W) |W|^{(n−m−1)/2} 0F1(n/2; ¼ΘΣ^{−1/2}WΣ^{−1/2}),

where etr(A) denotes exp(tr(A)), tr(A) is the trace of A, |A| is the determinant of A, and Γ_m(t) = π^{m(m−1)/4} ∏_{s=1}^m Γ(t − (s−1)/2) is the multivariate gamma function, with Γ(t) the univariate gamma function. Here and throughout, we use the usual notation for the hypergeometric functions of matrix argument (see Constantine, 1966, or Muirhead, 1982):

pFq(a_1, . . . , a_p; b_1, . . . , b_q; A) = Σ_{j=0}^∞ (1/j!) Σ_{α⊢j} [(a_1)_α ⋯ (a_p)_α / ((b_1)_α ⋯ (b_q)_α)] C_α(A).

Here, α = (j_1, j_2, . . . , j_m) is a partition of j with at most m parts (i.e., j_1 ≥ j_2 ≥ ⋯ ≥ j_m ≥ 0, and Σ_{s=1}^m j_s = j), this being denoted by α ⊢ j. The numerical coefficients (c)_α are defined by

(c)_α = ∏_{s=1}^m (c − (s−1)/2)_{j_s},

where (c)_r = c(c+1)⋯(c+r−1). Finally, C_α(·) denotes the zonal polynomial in the elements of the indicated matrix corresponding to the partition α ⊢ j (see Muirhead, 1982, Chapter 7, or Macdonald, 1995). The important properties of these polynomials for our purposes are their invariance under orthogonal transformations, i.e., C_α(HAH′) = C_α(A), H ∈ O(m), where O(m) is the group of m × m orthogonal matrices, and the integral identity (Constantine, 1966, equation (5)):

∫_{O(m)} C_α(AHBH′)(dH) = C_α(A)C_α(B)/C_α(I_m).

Here, (dH) denotes the normalized invariant (Haar) measure on O(m). We will also use the averaged hypergeometric functions based on this integral, defined by

pFq(a_1, . . . , a_p; b_1, . . . , b_q; A, B) = Σ_{j=0}^∞ (1/j!) Σ_{α⊢j} [(a_1)_α ⋯ (a_p)_α / ((b_1)_α ⋯ (b_q)_α)] C_α(A)C_α(B)/C_α(I_m)   (6)

(see Muirhead, 1982, p. 259).¹ The noncentrality matrix is usually defined as Ω = Σ^{−1}M′M. Here, we shall use the more symmetric Θ = Σ^{−1/2}M′MΣ^{−1/2}. In addition, n in this definition need not be an integer: the expression given is a density for the positive definite symmetric matrix W for any real number n > m − 1.
We abbreviate the fact that W has this density by W ∼ W_m(n, Σ, Θ), or W ∼ W_m(n, Σ) when Θ = 0. Expectations of functions of W when W has this density will be denoted by expressions like E_{Σ,Θ}[·], with Σ omitted in the subscript when Σ = I_m, and Θ omitted in the central case Θ = 0. When Θ = 0 and Σ = I_m, both subscripts are omitted. Analogous notation will also be used for other random variables.
In this paper, we examine some of the properties of the inverse of such a matrix W, beginning with the expectation of W^{−1}. For the problem of evaluating the expectation of W^{−1}, we may assume that Σ = I_m, so that the density of W depends only on Θ. For other functions of W^{−1} discussed later, this assumption is restrictive, but the more general case of arbitrary Σ is more complicated. So, later, we will discuss both the case Σ = I_m and the case of arbitrary Σ. We first seek an expression for the mean, E_Θ[W^{−1}]. In the central case where W ∼ W_m(n, Σ), the result is easily obtained (Muirhead, 1982, p. 97), and is, for n > m + 1,

E_Σ[W^{−1}] = Σ^{−1}/(n − m − 1).   (8)

In the (noncentral) scalar case (m = 1), the result is also well known, since W ∼ χ²_n(λ) (i.e., noncentral chi-square with noncentrality parameter λ) and we have, for n > 2 (Krishnan, 1967),

E_λ[W^{−1}] = [e^{−λ/2}/(n − 2)] 1F1(n/2 − 1; n/2; λ/2),   (9)

where 1F1(a; b; z) is the confluent hypergeometric function. However, as far as we know, no explicit results are available for the noncentral case when m > 1, although some attempts on the problem have been made (Ullah, 1994; Letac and Massam, 2008). These known results are not explicit, but are expressed in terms of unresolved integrals and/or differential operators. The expressions obtained below are, in contrast, explicit and relatively simple.
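Both displayed results are easy to check by simulation. The following sketch is ours, not the paper's; it assumes NumPy and SciPy (whose hyp1f1 implements 1F1), estimates E[W^{−1}] by Monte Carlo in the central case, and checks the scalar noncentral case against equation (9).

```python
# Monte Carlo sanity check (a sketch, not part of the paper) of equations (8) and (9).
import numpy as np
from scipy.special import hyp1f1

rng = np.random.default_rng(0)
n, m = 12, 3
A = rng.standard_normal((m, m))
Sigma = A @ A.T + m * np.eye(m)          # an arbitrary positive definite Sigma
L = np.linalg.cholesky(Sigma)

# Central case: W = Z'Z with rows of Z ~ N(0, Sigma).
reps = 200_000
inv_mean = np.zeros((m, m))
for _ in range(reps):
    Z = rng.standard_normal((n, m)) @ L.T
    inv_mean += np.linalg.inv(Z.T @ Z)
print(np.round(inv_mean / reps, 4))
print(np.round(np.linalg.inv(Sigma) / (n - m - 1), 4))    # equation (8)

# Scalar noncentral case (m = 1): W ~ chi2_n(lam).
lam = 5.0
z = rng.standard_normal((reps, n)) + np.sqrt(lam / n)     # noncentrality = lam
w = (z ** 2).sum(axis=1)
print((1.0 / w).mean())
print(np.exp(-lam / 2) * hyp1f1(n / 2 - 1, n / 2, lam / 2) / (n - 2))  # equation (9)
```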
After evaluating E_Θ[W^{−1}] itself, we then go on to discuss the expectations of certain scalar functions of W^{−1}. Here, we confine attention to orthogonally invariant functions (i.e., functions f(W^{−1}) invariant under W → HWH′, H ∈ O(m)). In particular, we evaluate the expectations of the elements of a particular basis for the vector space of such functions, the zonal polynomials. The expectations of functions such as (tr(W^{−1}))^k and tr(W^{−k}) naturally have expansions in terms of expectations of zonal polynomials.
We denote by P_m the space of m × m symmetric matrices, and write A > 0 to indicate that A ∈ P_m is positive definite. Short proofs are given in the text, and more elaborate ones are given in Appendixes A–C.

Application: IV Regression
In an IV regression with known reduced form covariance matrix, we have (after suitable standardizing transformations)³ the system

y = Xβ + u,  X = ZΠ + U,

with y (n × 1) independent of X (n × k), and Z (n × p) fixed (exogenous). The IV estimator for β is the OLS estimator b = (X′X)^{−1}X′y and, assuming standard normality for the rows of (u, U), we have that, conditionally, given X,

b − β | X ∼ N(0, (X′X)^{−1}) = N(0, W^{−1}), with W = X′X.

Thus, the conditional covariance matrix of (b − β) is W^{−1}, and the unconditional covariance matrix is its expectation.⁴
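The following minimal simulation (ours; all names are illustrative) makes the point concrete: with the errors standardized to identity covariance, the sample second-moment matrix of b − β across replications estimates E[W^{−1}], the object studied below, and agrees with the average of (X′X)^{−1}.

```python
# Sketch: the unconditional covariance of b - beta equals E[(X'X)^{-1}] = E[W^{-1}].
import numpy as np

rng = np.random.default_rng(1)
n, k, p = 50, 2, 5
Z = rng.standard_normal((n, p))           # fixed instruments
Pi = rng.standard_normal((p, k)) * 0.5    # reduced-form coefficients
beta = np.ones(k)

reps = 100_000
E = np.empty((reps, k))
Winv = np.zeros((k, k))
for i in range(reps):
    X = Z @ Pi + rng.standard_normal((n, k))
    y = X @ beta + rng.standard_normal(n)
    E[i] = np.linalg.solve(X.T @ X, X.T @ y) - beta
    Winv += np.linalg.inv(X.T @ X)
print(E.T @ E / reps)                     # covariance of b - beta
print(Winv / reps)                        # Monte Carlo E[W^{-1}] -- should match
```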

A Useful Representation
The IV regression example just discussed is useful for expository purposes. Thus, we introduce a vector x whose conditional distribution, given W, is N(0, W^{−1}). That is, the conditional covariance matrix of x, given W, is E_{x|W}[xx′] = W^{−1}, and its unconditional covariance matrix is the object of our desire. But, confining attention to the case Σ = I_m,

E_Θ[W^{−1}] = E_Θ[E_{x|W}[xx′]] = E[xx′],

where the final expectation is now with respect to the marginal density of x. ³ Briefly, we start from an equation involving T × (k + 1) variables (ỹ, Ỹ), say. The IV equation is ỹ = Ỹβ + u, and in the background is a reduced form for Ỹ, Ỹ = Z̃Π + U, with Z̃ T × p. Since we are assuming a known covariance matrix for the rows of (u, U), we may assume that standardizing transformations have reduced this to an identity matrix. In addition, if the IV equation involves other variables, say Z_1 (including a constant term), we assume these have been partialled out (i.e., (ỹ, Ỹ) are actually residuals after regression on Z_1).
The IV estimators are of the form b = (Ỹ′PỸ)^{−1}Ỹ′Pỹ, where P is an idempotent matrix of rank n ≥ k. It therefore has the form P = V(V′V)^{−1}V′ for some choice of instruments V (T × n). Both n and V depend on the choice of estimator. Now, setting X = (V′V)^{−1/2}V′Ỹ (n × k) and y = (V′V)^{−1/2}V′ỹ (n × 1), we have Ỹ′PỸ = X′X and Ỹ′Pỹ = X′y. The estimator is then, as defined in the text, b = (X′X)^{−1}X′y. Further details can be found in Hillier, Kinal, and Srivastava (1984). ⁴ The result we obtain below for this expectation, although reasonably complex, is considerably simpler than that given in Hillier et al. (1984) for the second moment matrix of the IV estimator. That is because we are here assuming that the covariance matrix of (u, U) is known (and so can be taken to be the identity). That assumption was not made in the earlier paper, and both the mean and covariance matrix of the conditional distribution of b, given X, there depend on X. This complicates matters considerably.
Using a standard result on Laplace transforms of zonal polynomials (see Constantine, 1963, equation (1), for example), the marginal density of x is easily seen to be

pdf(x; Θ) = [Γ((n+1)/2) / (π^{m/2} Γ((n−m+1)/2))] etr(−Θ/2) (1 + x′x)^{−(n+1)/2} 1F1((n+1)/2; n/2; ½Θ(I_m + xx′)^{−1}).   (16)

In the IV regression example, this equation (with m = k and Θ = Π′Z′ZΠ) is simply the unconditional density of the recentered IV estimator b − β. We note, for later use, the fact that, when evaluating the expectation of any function of x that is invariant under x → Hx, H ∈ O(m), the density (16) may be replaced by its average over O(m) under the transformation x → Hx, H ∈ O(m). Using (6), this is:

pdf̄(x; Θ) = [Γ((n+1)/2) / (π^{m/2} Γ((n−m+1)/2))] etr(−Θ/2) (1 + x′x)^{−(n+1)/2} 1F1((n+1)/2; n/2; ½Θ, (I_m + xx′)^{−1}).   (17)

An exactly analogous argument applies, of course, to the density of W itself (see equation (37) below).
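The hierarchy just described also gives a direct way to simulate from the marginal density (16) without evaluating it; the sketch below (ours, assuming NumPy) checks that E[xx′] and E[W^{−1}] agree, as the displayed identity asserts.

```python
# Sketch: sample x by the hierarchy W ~ W_m(n, I_m, Theta), x | W ~ N(0, W^{-1});
# then E[xx'] is the unconditional covariance E[W^{-1}].
import numpy as np

rng = np.random.default_rng(2)
n, m = 10, 3
M = np.zeros((n, m)); M[0, 0] = 2.0       # so Theta = M'M = diag(4, 0, 0)

reps = 200_000
S1 = np.zeros((m, m)); S2 = np.zeros((m, m))
for _ in range(reps):
    Z = M + rng.standard_normal((n, m))
    Winv = np.linalg.inv(Z.T @ Z)
    x = np.linalg.cholesky(Winv) @ rng.standard_normal(m)
    S1 += np.outer(x, x); S2 += Winv
print(np.round(S1 / reps, 4))             # E[xx']
print(np.round(S2 / reps, 4))             # E[W^{-1}] -- should match
```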

Remark 1.
This argument can easily be generalized to the case of an m × r matrix X whose columns are independent N(0,W −1 ) vectors. See Section 4.2.2.

PRELIMINARIES
In the representation in terms of x, note that since the conditional mean E_{x|W}[x] is zero, so is the unconditional mean. The required covariance matrix E_Θ[W^{−1}] = E_Θ[xx′] will be a matrix-valued function of Θ, say Φ(Θ). The expectation operator thus defines a mapping Φ from the space of symmetric matrices, P_m, to itself. This map has the following simple but important property.

LEMMA 1. The map Φ is equivariant under the action of O(m) on P_m: for every H ∈ O(m), Φ(HΘH′) = HΦ(Θ)H′.

This follows from the fact that if W ∼ W_m(n, I_m, Θ), then HWH′ ∼ W_m(n, I_m, HΘH′), so that HW^{−1}H′ = (HWH′)^{−1} has expectation Φ(HΘH′) = HΦ(Θ)H′.
Restricting attention to the subgroup of diagonal orthogonal matrices T = diag{±1, . . . , ±1} (i.e., the group of sign changes), it is easy to see that A ∈ P_m is invariant under the action of T if and only if A is diagonal. But, by equivariance, for every T in this subgroup, Φ(TDT′) = TΦ(D)T′ = Φ(D) when Θ = D is diagonal, so that Φ(D) is itself diagonal. Translated into conclusions about the problem of interest, these results say: (i) Φ(D) is diagonal whenever D is; (ii) Θ and Φ(Θ) have common eigenvectors; (iii) when Θ is spherical (proportional to the identity matrix), so is Φ(Θ); and (iv) variables with density pdf(x; D), D diagonal, are uncorrelated (although manifestly not independent). Thus, only the variances of the x_i are needed to determine Φ(D).
The fact that Φ(D) is diagonal, coupled with the equivariance of Φ, means that the m diagonal elements of Φ(D), together with the matrix L of eigenvectors of Θ = LDL′, completely determine Φ(Θ). However, even more can be said about these diagonal elements, as follows. Let P_r, r = 2, . . . , m, denote the permutation (transposition) matrix that interchanges the first and rth elements of a vector in R^m. Note that the P_r are elements of O(m). We have the following lemma.

LEMMA 2. Let ψ_i(D), i = 1, . . . , m, denote the diagonal elements of Φ(D). Then, for each r = 2, . . . , m, ψ_r(D) = ψ_1(P_r D P_r).
Proof. By Lemma 1, Φ(P_r D P_r) = P_r Φ(D) P_r, for each r = 2, . . . , m. The (1,1) element of the matrix on the left is ψ_1(P_r D P_r). On the right, the (1,1) element is ψ_r(D), establishing the result. This result means that we need only determine one of the elements ψ_i(D) of Ψ = Φ(D), say ψ_1(D), since this determines all the remaining elements. We will see the properties described in Lemma 2 directly in Section 3 (Theorem 1).
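Lemma 2 is easy to check numerically. The sketch below (ours) estimates the diagonal of Φ(D) by Monte Carlo and verifies that swapping ω_1 and ω_2 in D moves ψ_2(D) into the (1,1) position.

```python
# Numerical check of Lemma 2: psi_r(D) = psi_1(P_r D P_r).
import numpy as np

rng = np.random.default_rng(3)
n, m, reps = 10, 3, 100_000

def phi_diag(omegas):
    # diagonal of Phi(D) = E[W^{-1}] for W ~ W_m(n, I_m, D), D = diag(omegas)
    Mean = np.vstack([np.diag(np.sqrt(omegas)), np.zeros((n - m, m))])
    acc = np.zeros(m)
    for _ in range(reps):
        Z = Mean + rng.standard_normal((n, m))
        acc += np.diag(np.linalg.inv(Z.T @ Z))
    return acc / reps

D = np.array([3.0, 1.0, 0.5])
print(phi_diag(D))                  # (psi_1, psi_2, psi_3)
print(phi_diag(D[[1, 0, 2]])[0])    # psi_1 after swapping omega_1 and omega_2:
                                    # should match psi_2(D) above
```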

A Useful Lemma
We will make use of the following result.
LEMMA 3. Let ρ be a partition of an integer r > 0, C_ρ(·) the zonal polynomial associated with ρ, and n > m − 1 a real number. We have

∫_{R^m} (1 + x′x)^{−(n+1)/2} C_ρ((I_m + xx′)^{−1}) dx = π^{m/2} [Γ((n−m+1)/2)/Γ((n+1)/2)] [(n/2)_ρ / ((n+1)/2)_ρ] C_ρ(I_m),   (20)

where, on transforming to q = x′x, the integral on the left may equivalently be written as

[π^{m/2}/Γ(m/2)] ∫_0^∞ q^{m/2−1} (1 + q)^{−(n+1)/2} C_ρ(Δ(q)) dq, with Δ(q) = diag((1 + q)^{−1}, 1, . . . , 1).

The proof is in Appendix A. Using Lemma 3, the following corollary, which will be useful later, is easily established.

COROLLARY 2. For x with the density pdf̄(x; Θ) in equation (17), q = x′x, and any real s with s < (n − m + 1)/2,

E_Θ[(1 + q)^s] = [Γ((n+1)/2) Γ((n−m+1)/2 − s) / (Γ((n−m+1)/2) Γ((n+1)/2 − s))] etr(−Θ/2) 2F2((n+1)/2, n/2 − s; n/2, (n+1)/2 − s; Θ/2).

Special Cases
In the central case (Θ = 0), we have already noted the result in equation (8). In the spherical case Θ = αI_m, Σ = I_m, part (ii) of Corollary 1 says that we need only find the common diagonal element ψ̄(α) of Φ(αI_m) = ψ̄(α)I_m, and this can be deduced from the fact that tr(Φ) = E_Θ[x′x] = mψ̄(α). Thus, we need simply to evaluate E_Θ[x′x] for this case. The result is easily obtained as an application of Corollary 2.
PROPOSITION 1. When Θ = αI_m and n > m + 1,

ψ̄(α) = m^{−1} {[(n − 1)/(n − m − 1)] e^{−mα/2} 2F2((n+1)/2, n/2 − 1; n/2, (n−1)/2; (α/2)I_m) − 1}.   (23)

Proof. Using Corollary 2 with s = 1, we have, when Θ = αI_m,

mψ̄(α) = E_Θ[x′x] = E_Θ[1 + q] − 1 = [(n − 1)/(n − m − 1)] e^{−mα/2} 2F2((n+1)/2, n/2 − 1; n/2, (n−1)/2; (α/2)I_m) − 1.

This gives the result stated.
When Θ is of rank r < m, it can easily be shown that the diagonal elements of Φ(D) corresponding to zero eigenvalues of Θ are all equal to (n − m − 1)^{−1}. The remaining elements will follow from the main result, the proof of which applies whatever the rank of Θ.

Main Result
According to Lemma 2, we need only consider the case Θ = D, and also need only evaluate ψ_1(D) = Var[x_1], the remaining terms being obtained by transpositions (1, r) of the diagonal elements of D. We prove the following theorem.
THEOREM 1. When n > m + 1, the elements on the diagonal of Ψ = Φ(D) are given by

ψ_i(D) = (n − m − 1)^{−1} Σ_{k=0}^∞ [(−ω_i/2)^k / ((n − m + 1)/2)_k] μ_k(D_i), i = 1, . . . , m,

where D_i is D with its ith row and column removed, and the coefficients μ_k(D_i) are defined in the proof. The ψ_i(D) are the eigenvalues of Φ(Θ), and Φ(Θ) = LΨL′ is the spectral decomposition of Φ(Θ).
The proof can be found in Appendix B. It is straightforward to check that this result agrees with the known result for the case m = 1, i.e., the expression for E[(χ²_n(ω))^{−1}] in equation (9). As expected (because Φ(Θ) is the (unconditional) covariance matrix of x, and W^{−1} > 0 almost surely), we have the following corollary.
COROLLARY 3. Φ(Θ) > 0. Moreover, for general Σ, E_{Σ,Θ}[W^{−1}] = Σ^{−1/2} Φ(Θ_1) Σ^{−1/2}, with Φ(Θ_1) as given in Theorem 1 with Θ replaced by Θ_1.
Remark 3. Note that, for n large, the term μ_k(D_1) as defined in the proof of Theorem 1 is approximately equal to ((n − m + 1)/2)_k / (n/2)_k, so that, substituting this into the series in Theorem 1,

ψ_i(D) ≈ (n − m − 1)^{−1} 1F1(1; n/2; −ω_i/2) = (n − m − 1)^{−1} e^{−ω_i/2} 1F1(n/2 − 1; n/2; ω_i/2).

Thus, for n large, Φ(D) has diagonal elements of exactly the form in the scalar-case result (9), with (n − 2)^{−1} there replaced by (n − m − 1)^{−1}.

Remark 4. Theorem 1 evidently enables the evaluation of the expectation of any linear function of the elements of W^{−1}. For instance, for any fixed matrix A, E_Θ[tr(AW^{−1})] = tr(AΦ(Θ)).

Estimation of the Precision Matrix Σ^{−1}
In the central case, equation (8) means that (n − m − 1)W^{−1} is an unbiased estimator of the precision matrix Σ^{−1}. In the noncentral case, this is no longer true. Instead, as noted above,

E_{Σ,Θ}[(n − m − 1)W^{−1}] = Σ^{−1/2} [(n − m − 1)Φ(Θ_1)] Σ^{−1/2}.

To the extent that (n − m − 1)Φ(Θ_1) differs from an identity matrix, the estimator is biased by the second factor in this expression. In the case Θ_1 = αI_m, for instance,

E_{Σ,Θ}[(n − m − 1)W^{−1}] = (n − m − 1)ψ̄(α) Σ^{−1},

with ψ̄(α) as in equation (23). For n large, (n − m − 1)ψ̄(α) is near 1, but the estimator is, in general, biased for moderate n even in this simple case.

EXPECTATION OF THE ZONAL POLYNOMIALS C_κ(W^{−1})
The zonal polynomials C_κ(A), κ ⊢ k, form a basis for the space of orthogonally invariant homogeneous polynomials of degree k in the elements of a symmetric matrix A (see, for instance, Macdonald, 1995). That is, any scalar function of W^{−1} that is homogeneous and invariant under W → HWH′, H ∈ O(m), can, in principle, be expressed as a linear combination of the corresponding zonal polynomials. Scalar functions of W^{−1} of potential interest in this class include the following three cases: powers of the determinant, |W|^{−k}; powers of the trace, (tr(W^{−1}))^k; and the power sums, tr(W^{−k}). Other examples include f_ν(W) = ∏_{i=1}^s (tr(W^{−i}))^{ν_i}, with ν_1 + 2ν_2 + ⋯ + sν_s = k, which can be written as linear combinations of the C_κ(W^{−1}), for κ ⊢ k. The cases (tr(W^{−1}))^k = Σ_{κ⊢k} C_κ(W^{−1}) and tr(W^{−k}) (i.e., τ_k and π_k) are both in this class, with s = 1 and ν_1 = k in the first case, and s = k, ν_i = 0, for i = 1, . . . , k − 1, and ν_k = 1 in the second case. Hence, if the expectations of the C_κ(W^{−1}), for all κ ⊢ k, are known, so are the expectations of all other functions of W^{−1} in this class. This fact is the motivation for this section. The computation of the functions appearing in the results in this section is discussed in Section 5.
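For k = 2 the expansion (tr(W^{−1}))^k = Σ_{κ⊢k} C_κ(W^{−1}) can be verified directly from the classical degree-2 zonal polynomials, C_(2) = (p_1² + 2p_2)/3 and C_(1,1) = 2(p_1² − p_2)/3, where p_i are the power sums. A small check (ours):

```python
# Degree-2 zonal polynomials in terms of power sums: their sum is p1^2 = (tr A)^2,
# illustrating (tr(W^{-1}))^k = sum over partitions of C_kappa(W^{-1}) for k = 2.
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((4, 4)); A = A @ A.T + np.eye(4)
Ainv = np.linalg.inv(A)
p1, p2 = np.trace(Ainv), np.trace(Ainv @ Ainv)
C2, C11 = (p1**2 + 2*p2) / 3, 2*(p1**2 - p2) / 3
print(C2 + C11, p1**2)        # equal: (tr W^{-1})^2 = C_(2) + C_(1,1)
print(p2, C2 - C11 / 2)       # equal: tr(W^{-2}) = C_(2) - C_(1,1)/2
```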
For orthogonally invariant functions f(W) (including the zonal polynomials), the following results hold. The proofs are trivial and omitted.

LEMMA 4. If f(W) is invariant under W → HWH′, H ∈ O(m), then E_{Σ,Θ}[f(W)] is invariant under the simultaneous transformations (Σ, Θ) → (HΣH′, HΘH′), H ∈ O(m).

LEMMA 5. If W ∼ W_m(n, Σ), λ ⊢ l, and r < (n − m + 1)/2,

E_Σ[|W|^{−r} C_λ(W)] = 2^{l−rm} [Γ_m(n/2 − r)/Γ_m(n/2)] (n/2 − r)_λ |Σ|^{−r} C_λ(Σ).

Evidently, in the general case, the expectations E_{Σ,Θ}[f(W)] will involve a class of functions of (Σ, Θ) more general than symmetric polynomials in either argument. We will see this directly below. However, when Σ = I_m, the expectations of functions in the class referred to in the lemma depend only on the eigenvalues of Θ, i.e., on D, and may be computed from the averaged density of W,

f̄(W; Θ) = [etr(−Θ/2) / (2^{mn/2} Γ_m(n/2))] etr(−W/2) |W|^{(n−m−1)/2} 0F1(n/2; ¼Θ, W).   (37)

We begin with the simplest case, the top-order zonal polynomials, with Σ = I_m.

Expectations of Top-Order Zonal Polynomials
The case of the top-order zonal polynomials can be dealt with by exploiting the conditioning argument explained in Section 1.2.⁷ When x|W ∼ N(0, W^{−1}), it is straightforward to see (using the moment generating function of q = x′x) that the conditional moments of q are given by

E[q^k | W] = 2^k (1/2)_k C_{(k)}(W^{−1}),

and therefore

E_Θ[C_{(k)}(W^{−1})] = (2^k (1/2)_k)^{−1} E_Θ[q^k],

with the latter expectation evaluated in the density given in equation (17). We may evaluate the expectations E_Θ[(1 + q)^k] first, using Corollary 2, then use the (binomial) expansion q^k = ((1 + q) − 1)^k to recover E_Θ[q^k]. Defining φ_k(Θ) := E_Θ[(1 + q)^k], φ_0 = 1, we will then have

E_Θ[q^k] = Σ_{j=0}^k (k choose j) (−1)^{k−j} φ_j(Θ).

Thus, directly from Corollary 2, we have the following theorem.
THEOREM 2. For W ∼ W_m(n, I_m, Θ) and k < (n − m + 1)/2, the expectation of the top-order zonal polynomial C_{(k)}(W^{−1}) is given by

E_Θ[C_{(k)}(W^{−1})] = (2^k (1/2)_k)^{−1} Σ_{j=0}^k (k choose j) (−1)^{k−j} φ_j(Θ),

with the φ_j(Θ) given explicitly by Corollary 2.

⁷ The conditioning argument used here can evidently be applied to any function of x, say g(x), which, when x ∼ N(0, Σ), has known expectation G(Σ). In particular, for any g(·) that is orthogonally invariant (g(Hx) = g(x) for all H ∈ O(m)), g depends on x only through q = x′x, say g(x) = ḡ(q) (since q = x′x is a maximal invariant under this action). In such cases, we will then have E_Θ[G(W^{−1})] = E_Θ[ḡ(q)], with the expectation on the right evaluated in the density in equation (17). The usefulness of this fact depends, of course, on whether the expectation of the function G(W^{−1}) is itself of interest.
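The conditional-moment identity underlying Theorem 2 can be checked by simulation for k = 2, where E[(x′x)² | W] = (tr S)² + 2 tr(S²) with S = W^{−1}, which equals 2²(1/2)_2 C_(2)(S) = p_1² + 2p_2. A sketch (ours):

```python
# Check of E[q^k | W] = 2^k (1/2)_k C_(k)(W^{-1}) for k = 2, with S = W^{-1}.
import numpy as np

rng = np.random.default_rng(5)
m = 3
B = rng.standard_normal((m, m))
S = np.linalg.inv(B @ B.T + np.eye(m))            # plays the role of W^{-1}
L = np.linalg.cholesky(S)
x = L @ rng.standard_normal((m, 500_000))         # columns: x | W ~ N(0, S)
q = (x ** 2).sum(axis=0)
p1, p2 = np.trace(S), np.trace(S @ S)
print((q ** 2).mean())                            # Monte Carlo E[q^2 | W]
print(p1 ** 2 + 2 * p2)                           # = 4 * (1/2)_2 * C_(2)(S)
```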

General Zonal Polynomials
4.2.1. The Central Case. Before tackling the noncentral case, we record here the results for the central case, for ease of reference later. The following result for C_κ(W) is standard:

E_Σ[C_κ(W)] = 2^k (n/2)_κ C_κ(Σ).   (43)

For C_κ(W^{−1}), the result is essentially given by Constantine (1966, equation (10)) and (in different notation) Khatri (1966, Lemma 4), which we repeat here for completeness: for k_1 < (n − m + 1)/2,

E_Σ[C_κ(W^{−1})] = 2^{−k} [Γ_m(n/2, −κ)/Γ_m(n/2)] C_κ(Σ^{−1}),   (46)

where Γ_m(a, κ) = π^{m(m−1)/4} ∏_{i=1}^m Γ(a + k_i − (i − 1)/2) and −κ denotes (−k_m, . . . , −k_1).

The Noncentral Case.
For the general case, the conditioning device used in the previous subsection for the top-order polynomials is one possible route. For, from Kushner and Meisner (1984, equation (2.12)), we have an integral expression for a zonal polynomial C_κ(W^{−1}) as a conditional expectation, given W, of the function Δ_κ(X′X), where X is m × r with independent columns each distributed as N(0, W^{−1}). Here, r ≤ m is the number of nonzero parts of κ, and the function Δ_κ(·), for κ = (k_1, k_2, . . . , k_m) ⊢ k, is defined, for a symmetric matrix A, by

Δ_κ(A) = ∏_{i=1}^m |A_i|^{k_i − k_{i+1}} (with k_{m+1} = 0),

where A_i is the upper left i × i principal submatrix of A (so A_1 = a_11 and A_m = A). The expectation can, in principle, therefore be evaluated, as before, in the unconditional density of X by interchanging the expectation operations. In the case r = 1 (i.e., when C_κ(·) is the top-order polynomial), this yields the results given in the previous subsection, since Δ_{(k)}(X′X) = (x_1′x_1)^k, where x_1 is the first column of X. In general, however, this approach produces an integral that seems difficult to evaluate. We therefore adopt a different, but closely related, direct approach.

Direct Approach, Noncentral Case
Consider first the case Σ = I_m. Direct computation using the averaged density for W in equation (37) means that we can write

E_Θ[C_κ(W^{−1})] = etr(−Θ/2) Σ_{j=0}^∞ (1/j!) Σ_{α⊢j} c_{κ,α} C_α(Θ),

with numerical coefficients

c_{κ,α} = E[C_κ(W^{−1}) C_α(W)] / (4^j (n/2)_α C_α(I_m)).

Recall that the absence of subscripts on E[·] means that W ∼ W_m(n, I_m). The problem thus reduces to the evaluation of the expectations of the products C_κ(W^{−1})C_α(W) when W ∼ W_m(n, I_m). We will give an explicit expression for these coefficients.
The key to the results given below is the following simple lemma, expressing the zonal polynomial C_κ(W^{−1}) in terms of a power of the determinant of W, multiplied by a zonal polynomial with argument W itself.

LEMMA 6. For κ = (k_1, . . . , k_m) ⊢ k and any integer r ≥ k_1, the zonal polynomial C_κ(W^{−1}) satisfies

C_κ(W^{−1}) = [C_κ(I_m) / C_{κ^r}(I_m)] |W|^{−r} C_{κ^r}(W),

where κ^r is given by κ^r = (r − k_m, r − k_{m−1}, . . . , r − k_1), a partition of rm − k. Constantine (1966, p. 217) mentions this relationship without proof (there the partition κ^r is denoted by κ*). Takemura (1984) states the result in Lemma 2, p. 54, and gives a detailed proof. Another statement and short proof may be found in Macdonald's (2013) notes.⁹ Lemma 6 enables a direct evaluation of the c_{κ,α}, but we also require the following result on the so-called linearization of a product of zonal polynomials (Constantine, 1966; Kushner, 1988; Macdonald, 1995). This says that, for α ⊢ j, λ ⊢ l, and certain coefficients g^δ_{α,λ}, we have

C_α(A) C_λ(A) = Σ_{δ⊢j+l} g^δ_{α,λ} C_δ(A).

The existence of such coefficients follows from the fact that the left-hand side is a homogeneous invariant polynomial of degree j + l, and the zonal polynomials span the space of such functions. The coefficients g^δ_{α,λ} may not all be nonzero, as δ varies over the partitions of j + l; they can be related to the transition matrices between the various bases of symmetric functions, and are discussed further in Section 5 below.
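A quick numerical illustration of Lemma 6 (ours), for m = 2, κ = (2, 0), and r = 2, in which case κ^r = (2, 0) and the ratio C_κ(I_m)/C_{κ^r}(I_m) equals 1:

```python
# Lemma 6 check for m = 2, kappa = (2,0), r = 2: C_(2)(W^{-1}) = |W|^{-2} C_(2)(W).
import numpy as np

rng = np.random.default_rng(6)
B = rng.standard_normal((2, 2)); W = B @ B.T + np.eye(2)

def C2(A):                          # top-order zonal polynomial of degree 2
    p1, p2 = np.trace(A), np.trace(A @ A)
    return (p1**2 + 2*p2) / 3

print(C2(np.linalg.inv(W)))
print(C2(W) / np.linalg.det(W)**2)  # equal
```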
We also use the following easy consequence of the result in equation (43) and the linearization just given.
LEMMA 7. If W ∼ W_m(n, Σ), α ⊢ j, and λ ⊢ l,

E_Σ[C_α(W) C_λ(W)] = 2^{j+l} Σ_{δ⊢j+l} g^δ_{α,λ} (n/2)_δ C_δ(Σ).

Combining the previous two lemmas, we have the following lemma.¹⁰

LEMMA 8. For α ⊢ j and κ = (k_1, . . . , k_m) ⊢ k, and r an integer satisfying k_1 ≤ r < (n − m + 1)/2,

E_Σ[C_κ(W^{−1}) C_α(W)] = 2^{j−k} [Γ_m(n/2 − r)/Γ_m(n/2)] [C_κ(I_m)/C_{κ^r}(I_m)] |Σ|^{−r} Σ_{δ⊢rm−k+j} g^δ_{κ^r,α} (n/2 − r)_δ C_δ(Σ).

⁹ When m = 2, the problem simplifies because, in that case, we can take r = k in Lemma 6 and observe that κ^k = κ, so that C_κ(W^{−1}) = |W|^{−k}C_κ(W), generalizing the fact that (tr(W^{−1}))^k = |W|^{−k}(tr(W))^k when m = 2. ¹⁰ Note that Lemma 8 implies that, unlike its two components, the function f_{κ,α}(W) = C_κ(W^{−1})C_α(W) is not an eigenfunction of the W_m(n, Σ) distribution, i.e., (regrettably) E_Σ[C_κ(W^{−1})C_α(W)] is not a multiple of C_κ(Σ^{−1})C_α(Σ).

Remark 5. In the central case, the term C_α(W) is missing, and the above result becomes

E_Σ[C_κ(W^{−1})] = 2^{−k} [Γ_m(n/2 − r)/Γ_m(n/2)] (n/2 − r)_{κ^r} [C_κ(I_m)/C_{κ^r}(I_m)] |Σ|^{−r} C_{κ^r}(Σ).

This agrees with the result in equation (46), because, for any k_1 ≤ r < (n − m + 1)/2, it is easy to see that Γ_m(n/2 − r, κ^r) = Γ_m(n/2, −κ), while, by Lemma 6 applied to Σ, |Σ|^{−r} C_{κ^r}(Σ) [C_κ(I_m)/C_{κ^r}(I_m)] = C_κ(Σ^{−1}). Applying Lemma 8 with Σ = I_m, we have

E[C_κ(W^{−1}) C_α(W)] = 2^{j−k} [Γ_m(n/2 − r)/Γ_m(n/2)] [C_κ(I_m)/C_{κ^r}(I_m)] Σ_{δ⊢rm−k+j} g^δ_{κ^r,α} (n/2 − r)_δ C_δ(I_m),

which produces the following theorem.

THEOREM 3. For W ∼ W_m(n, I_m, Θ), κ ⊢ k, and any integer r with k_1 ≤ r < (n − m + 1)/2,

E_Θ[C_κ(W^{−1})] = 2^{−k} [Γ_m(n/2 − r)/Γ_m(n/2)] [C_κ(I_m)/C_{κ^r}(I_m)] etr(−Θ/2) Σ_{j=0}^∞ (2^{−j}/j!) Σ_{α⊢j} [C_α(Θ)/((n/2)_α C_α(I_m))] Σ_{δ⊢rm−k+j} g^δ_{κ^r,α} (n/2 − r)_δ C_δ(I_m).

General Σ
When Σ is not proportional to I_m, the above argument using the averaged density (37) is not available, and an alternative approach is needed. Instead, expanding the 0F1 term in the density itself, we have

E_{Σ,Θ}[C_κ(W^{−1})] = etr(−Θ/2) Σ_{j=0}^∞ (1/(4^j j!)) Σ_{α⊢j} (1/(n/2)_α) E_Σ[C_κ(W^{−1}) C_α(ΘΣ^{−1/2}WΣ^{−1/2})].

The expectation term on the right-hand side can be evaluated, on using Lemma 6 and Davis (1979, equation (2.6)), in terms of the coefficients θ^φ_{κ^r,α} = C^{κ^r,α}_φ(I_m, I_m)/C_φ(I_m). The somewhat elaborate notation needed here is explained in Davis (1979). The polynomials C^{λ,α}_φ(A, B) are Davis's (1980, 1981) invariant polynomials with two matrix arguments. Using this result, we obtain the most general expression for the expectation of C_κ(W^{−1}).¹¹

THEOREM 4. For κ ⊢ k, any r satisfying k_1 ≤ r < (n − m + 1)/2, and W ∼ W_m(n, Σ, Θ), E_{Σ,Θ}[C_κ(W^{−1})] is given by the series above, with each expectation on the right expressed as a finite linear combination of Davis's invariant polynomials in (Σ, Θ), with coefficients involving the θ^φ_{κ^r,α}.

Applications: The Functions δ_k(Σ, Θ), τ_k(Σ, Θ), and π_k(Σ, Θ)
First observe that, in the case m = 1, the three functions coincide, being just the expected value of a negative power of a noncentral χ²_n(λ) variate. Hence, for m = 1, we have, for all k < n/2,

E_λ[W^{−k}] = 2^{−k} [Γ(n/2 − k)/Γ(n/2)] e^{−λ/2} 1F1(n/2 − k; n/2; λ/2).

Note, in particular, that (unlike the expression for positive powers) this involves an infinite series in λ. It follows that, for general m, we cannot expect to obtain finite expansions for the expectations of these functions. Indeed, the simplest example is δ_k(Σ, Θ) = E_{Σ,Θ}[|W|^{−k}], for which the result is given in Muirhead (1982, Theorem 10.3.7), with r replaced by −k, and is a generalization of the result just given for m = 1: for k < (n − m + 1)/2,

δ_k(Σ, Θ) = 2^{−mk} |Σ|^{−k} [Γ_m(n/2 − k)/Γ_m(n/2)] 1F1(k; n/2; −Θ/2).

In contrast to the case E_{Σ,Θ}[|W|^k], which is a polynomial of degree km in Θ, the series expansion for E_{Σ,Θ}[|W|^{−k}] does not terminate. We do not need to consider this case further.

4.5.1. τ_k and π_k in the Central Case. Since (tr(W^{−1}))^k = Σ_{κ⊢k} C_κ(W^{−1}), we can set r = k in Lemma 6 and, using Lemma 5, obtain at once:

τ_k(Σ) = E_Σ[(tr(W^{−1}))^k] = 2^{−k} Σ_{κ⊢k} [Γ_m(n/2, −κ)/Γ_m(n/2)] C_κ(Σ^{−1}).

¹¹ The analogous (infinite series) result for E_{Σ,Θ}[C_κ(W)] is given as Corollary 4.1 in Díaz-García and Gutiérrez-Jáimez (2001). However, we are able to show that it is possible to obtain an expression for the expectation of C_κ(W) as a finite sum of invariant polynomials in the matrices (Σ, Θ). The derivation is beyond the scope of the present paper.
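The m = 1 expression above is easily verified numerically; the sketch below (ours, using SciPy's hyp1f1 and gammaln) compares a Monte Carlo estimate of E[(χ²_n(λ))^{−k}] with the formula.

```python
# Check of the m = 1 inverse-moment formula for k < n/2.
import numpy as np
from scipy.special import hyp1f1, gammaln

rng = np.random.default_rng(7)
n, lam, k, reps = 9, 4.0, 2, 2_000_000
w = rng.noncentral_chisquare(n, lam, size=reps)
print((w ** -k).mean())
print(np.exp(-lam/2 + gammaln(n/2 - k) - gammaln(n/2)) / 2**k
      * hyp1f1(n/2 - k, n/2, lam/2))
```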
The corresponding result for the power sums π_k is less straightforward. However, being a homogeneous and invariant function of W, the power sum tr(W^{−k}) has an expansion in terms of zonal polynomials of the form

tr(W^{−k}) = Σ_{κ⊢k} d_{k,κ} C_κ(W^{−1}),   (63)

with coefficients d_{k,κ} that, we show in Appendix C, are given by an explicit product formula (equation (64)), for κ = (k_1, k_2, . . . , k_m) ⊢ k, with (c)_0 = 1 by convention.¹² Given this expansion, the result in Lemma 5 can again be invoked to yield, for the central case,

π_k(Σ) = E_Σ[tr(W^{−k})] = 2^{−k} Σ_{κ⊢k} d_{k,κ} [Γ_m(n/2, −κ)/Γ_m(n/2)] C_κ(Σ^{−1}).

4.5.2. τ_k and π_k in the Noncentral Case. In the noncentral case, for τ_k, we use the expansion (tr(W^{−1}))^k = Σ_{κ⊢k} C_κ(W^{−1}), and then set r = k in Lemma 6 to obtain the following immediate consequence.
COROLLARY 4. For k < (n − m + 1)/2 and Σ = I_m,

τ_k(Θ) = E_Θ[(tr(W^{−1}))^k] = Σ_{κ⊢k} E_Θ[C_κ(W^{−1})],

with each term given by Theorem 3 with r = k. Finally, for π_k(Σ, Θ) = E_{Σ,Θ}[tr(W^{−k})], we may use equation (63) and, combining this with Lemma 6, we have the expansion

tr(W^{−k}) = |W|^{−k} Σ_{κ⊢k} d_{k,κ} [C_κ(I_m)/C_{κ^k}(I_m)] C_{κ^k}(W).

The coefficients d_{k,κ} are those given earlier in equation (64). Multiplying the above expansion by C_α(W) and taking the expectation using Lemma 8 produces the following proposition.¹² ¹² The coefficients d_{k,κ} are the elements in the first row of the transition matrix D_k, say, that maps the zonal polynomials into the power-sum symmetric functions. The matrix D_k is not known explicitly, and equation (64) does not seem to be widely known in the literature. See Section 5 and Appendix C for further discussion.
PROPOSITION 2. For α ⊢ j and k < (n − m + 1)/2, the expectation E[tr(W^{−k})C_α(W)] is obtained by applying Lemma 8, with r = k, term by term in the expansion above. This produces, for the case Σ = I_m, the analogue of Theorem 3 for π_k(Θ).
In the case where Σ is not proportional to I_m, the density-averaging device is again unavailable, and we need to rely once more on the Davis two-matrix-argument polynomials. The result (really a corollary of Theorem 4) is the analogue of Theorem 4 for π_k(Σ, Θ).

TRANSITION MATRICES AND COMPUTATION ISSUES
In order to compute E[W −1 ] in Theorem 1, and expectations of the scalar functions of W −1 discussed in Section 4, we need to compute zonal polynomials and hypergeometric functions with matrix argument. The fastest algorithms for computing the Jack polynomials (of which zonal polynomials are a special case) and hypergeometric functions with matrix arguments are those available in Koev and Edelman (2006) and Chan et al. (2019).
For the results in Section 4, we need to compute the transition matrices between the different bases for the space of homogeneous symmetric functions in the eigenvalues of W. If w_1, . . . , w_m are the eigenvalues of W, the three bases that we are interested in are (1) the monomial symmetric functions, (2) the power-sum symmetric functions, and (3) the zonal polynomials. For κ = (k_1, . . . , k_m) ⊢ k, the monomial symmetric function is defined as

m_κ(w) = Σ w_{i_1}^{k_1} w_{i_2}^{k_2} ⋯ w_{i_l}^{k_l},

the sum being over all distinct monomials with distinct indices i_1, . . . , i_l, where l is the number of nonzero parts of κ. The power-sum symmetric functions are defined as

p_κ(w) = ∏_{i: k_i > 0} p_{k_i}, where p_i = Σ_{j=1}^m w_j^i = tr(W^i).

For a given value of k, we arrange these symmetric functions in reverse lexicographical order and define the corresponding basis vectors m^{(k)}, p^{(k)}, and C^{(k)}, whose elements are the m_κ, p_κ, and C_κ, respectively, for κ ⊢ k.
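Before turning to the transition matrix D_k defined in the next paragraph, the idea is easily illustrated for k = 2: evaluating the vectors p^{(2)} and C^{(2)} at two generic points determines the (universal) matrix connecting them. A sketch (ours, using the degree-2 zonal polynomials quoted earlier):

```python
# Determine D_2 numerically from p^(2) = D_2 C^(2), with basis order (2), (1,1).
import numpy as np

rng = np.random.default_rng(8)
P, C = [], []
for _ in range(2):                         # two generic evaluation points
    w = rng.uniform(0.5, 2.0, size=3)      # eigenvalues of a 3x3 matrix
    p1, p2 = w.sum(), (w**2).sum()
    P.append([p2, p1**2])                  # p_(2), p_(1,1) = p1 * p1
    C.append([(p1**2 + 2*p2)/3, 2*(p1**2 - p2)/3])   # C_(2), C_(1,1)
D2 = np.array(P).T @ np.linalg.inv(np.array(C).T)
print(np.round(D2, 10))                    # first row: (1, -1/2), i.e., d_{2,kappa}
```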
Our particular objective is to compute the transition matrix D_k between C^{(k)} and p^{(k)}, defined by

p^{(k)} = D_k C^{(k)}.

The matrix D_k (and other analogous transition matrices) is universal, i.e., its elements do not depend on the argument matrix W, so, once computed, it is known. Its dimension, though, being the number of partitions of k, grows rapidly with k. For k ≤ 5, D_k is available from James (1961), and it is obviously desirable to be able to compute D_k for arbitrary k. This task has hitherto been regarded as prohibitively time-consuming. For example, Gutiérrez, Rodríguez, and Sáez (2000) took about 8 days to compute the transition matrix that transforms C^{(k)} into m^{(k)}, for k = 20, using double precision. However, we have recently written programs that are vastly more efficient than existing alternatives. Programs to implement the calculation of D_k, as well as other results in the paper, are available from the authors upon request. We shall briefly outline how these are obtained. We first compute the inverse of D_k. We do this in two steps, first computing the transition matrix C_k that expands C^{(k)} in terms of m^{(k)}, and then the transition matrix connecting m^{(k)} and p^{(k)}.

CONCLUDING REMARKS

The approach adopted in this paper, based on the group-equivariance properties of the expectation map, appears to be novel in the econometric literature. Although very technical, the results obtained for the expectations of zonal polynomials provide the basis for solving a large class of moment problems that are potentially of interest. These also seem to be new. Finally, it may be noted that all of the results presented here can easily be generalized to the complex noncentral Wishart case. This distribution has important applications in physics and electronics.

A. Proof of Lemma 3
The functions in the integrand on the left are invariant under x → Hx, H ∈ O(m). We may therefore use the density (17) in evaluating the integral. Integrating over x ∈ R^m, the coefficients of the polynomials C_ρ(Θ) in the expansion of the hypergeometric function with two matrix arguments are the numerical coefficients multiplied by the integrals on the left of equation (20). Since the integral of the density must evaluate to unity, we can equate coefficients of the polynomials C_ρ(Θ) in this expression and in the expansion of etr(Θ/2) to obtain equation (20). The second expression arises by transforming x → (q, h) in the first integral, with q = x′x and h = x(x′x)^{−1/2}. The integrand is constant on the unit sphere h′h = 1, so integrating over h′h = 1 produces the result.

B. Proof of Theorem 1
In this appendix, we use the following notation: for any n × k matrix A with rank(A) = k (so n ≥ k), we define M_A = I_n − A(A′A)^{−1}A′. In addition, we denote by e_1 the unit vector (e_1 = (1, 0, . . . , 0)′), whatever its dimension. According to Lemma 2, we need only consider the case Θ = D, i.e., W ∼ W_m(n, I_m, D). The conditional variance of x_1 given W is the (1,1) element of W^{−1}:

e_1′W^{−1}e_1 = (w_11 − w_12 W_22^{−1} w_21)^{−1},

where W_22 is the submatrix of W without its first row and column. We need the expectation of this with respect to the distribution of W. Writing W = Z′Z, we can assume that Z ∼ N((D^{1/2}, 0)′, I_n ⊗ I_m), i.e., that the first m rows of E[Z] form the matrix D^{1/2}, the remaining rows being zero. Partitioning Z as (z_1, Z_2), with z_1 its first column, we have w_11 − w_12W_22^{−1}w_21 = z_1′M_{Z_2}z_1, and, conditionally on Z_2, z_1′M_{Z_2}z_1 ∼ χ²_{n−m+1}(ω_1 e_1′M_{Z_2}e_1). Therefore, as in equation (9) in the text (with n replaced by n − m + 1),

E[(z_1′M_{Z_2}z_1)^{−1} | Z_2] = (n − m − 1)^{−1} 1F1(1; (n − m + 1)/2; −ω_1 e_1′M_{Z_2}e_1/2).

When ω_1 > 0, the unconditional variance of x_1 is the expectation of this with respect to the distribution of Z_2. Now, let z̄_21′ denote the first row of Z_2 and Z_22 the remaining n − 1 rows, so that

e_1′M_{Z_2}e_1 = (1 + z̄_21′R^{−1}z̄_21)^{−1},

where R = Z_22′Z_22 ∼ W_{m−1}(n − 1, I_{m−1}, D_1) is independent of z̄_21 ∼ N(0, I_{m−1}). Writing z = R^{−1/2}z̄_21, we have z|R ∼ N(0, R^{−1}). That is, the joint distribution properties of (z, R) are exactly as above for (x, W), but with (n, m) replaced by (n − 1, m − 1) and D replaced by D_1. Since e_1′M_{Z_2}e_1 = (1 + z′z)^{−1}, we need to evaluate the expectations

μ_k(D_1) = E[(1 + z′z)^{−k}], k = 0, 1, 2, . . . ,

when z has density pdf^{(m−1)}(z; D_1). These clearly exist for all k ≥ 0, and we can use Corollary 2, with (n, m) replaced by (n − 1, m − 1) and s replaced by −k, to obtain them explicitly. Expanding the 1F1 function above in powers of its argument and substituting these expectations term by term gives ψ_1(D), and Lemma 2 then implies the result stated. Note that, because the hypergeometric function is symmetric in the elements of its argument, the order of the terms in D_i is irrelevant.
Therefore, as in equation (9) in the text (with n replaced by n − m + 1), When ω 1 > 0, the unconditional variance of x 1 is the expectation of this with respect to the distribution of Z 2 : where Now, let where R = Z 22 Z 22 ∼ W m−1 (n − 1,I m−1 ,D 1 ) is independent ofz 21 . Writing z = R − 1 2z 21 , we have z|R ∼ N(0,R −1 ). That is, the joint distribution properties of (z,R) are exactly as above for (x,W), but with (n,m) replaced by (n − 1,m − 1) and D replaced by D 1 . Since e 1 M Z 2 e 1 = (1 + z z) −1 , we need to evaluate the expectations when z has density pdf (m−1) (z;D 1 ). These clearly exist for all k ≥ 0, and we can use Corollary 2, with (n,m) replaced by (n − 1,m − 1) and s replaced by −k to obtain Substituting this into equation (88) gives ψ 1 (D), and Lemma 2 then implies the result stated. Note that, because the hypergeometric function is symmetric in the elements of its argument, the order of the terms in D i is irrelevant.