Limiting empirical spectral distribution for the non-backtracking matrix of an Erdős–Rényi random graph

In this note, we give a precise description of the limiting empirical spectral distribution (ESD) of the non-backtracking matrix of an Erdős–Rényi graph, assuming np/log n tends to infinity. We show that derandomizing part of the non-backtracking random matrix simplifies the spectrum considerably, and then we use Tao and Vu's replacement principle and the Bauer–Fike theorem to show that the partly derandomized spectrum is, in fact, very close to the original spectrum.


Introduction
For a simple undirected graph G = (V, E), the non-backtracking matrix is defined as follows.
For each {i, j} ∈ E, form two directed edges i → j and j → i. The non-backtracking matrix B is a 2|E| × 2|E| matrix indexed by these directed edges, such that
$$B_{(i \to j), (k \to l)} = \begin{cases} 1 & \text{if } j = k \text{ and } l \neq i, \\ 0 & \text{otherwise.} \end{cases}$$
The central question of the current paper is the following:

Question 1.1. What can be said about the eigenvalues of the non-backtracking matrix B of random graphs as |V| → ∞?
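The definition is easy to exercise on a small example. Below is a minimal Python sketch (our own; the dart ordering and the helper name non_backtracking are ours, not from the paper) that builds B from an undirected edge list.

```python
import numpy as np

def non_backtracking(edges):
    """Build the 2|E| x 2|E| non-backtracking matrix B from an
    undirected edge list (a sketch; the dart ordering is our own)."""
    darts = []
    for (i, j) in edges:          # each undirected edge gives two darts
        darts.append((i, j))
        darts.append((j, i))
    m = len(darts)
    B = np.zeros((m, m))
    for a, (i, j) in enumerate(darts):
        for b, (k, l) in enumerate(darts):
            # B[(i->j),(k->l)] = 1 iff the walk continues without
            # backtracking: j = k and l != i.
            if j == k and l != i:
                B[a, b] = 1.0
    return B

# Tiny example: a triangle with a pendant edge.
B = non_backtracking([(0, 1), (1, 2), (2, 0), (2, 3)])
print(B.shape)                          # (8, 8)
print(np.round(np.linalg.eigvals(B), 3))
```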
The non-backtracking matrix was proposed by Hashimoto [Has89a]. The spectrum of the non-backtracking matrix for random graphs was studied by Angel, Friedman, and Hoory [AFH15] in the case where the underlying graph is the tree covering of a finite graph. Motivated by the question of community detection (see [KMM+13, Mas14, MNS13, MNS15]), Bordenave, Lelarge, and Massoulié [BLM15] determined the size of the largest eigenvalue and gave bounds for the sizes of all other eigenvalues of non-backtracking matrices when the underlying graph is drawn from a generalization of Erdős–Rényi random graphs called the Stochastic Block Model (see [HLL83]), and this work was further extended to the Degree-Corrected Stochastic Block Model (see [KN11]) by Gulikers, Lelarge, and Massoulié [GLM16]. In recent work, Benaych-Georges, Bordenave, and Knowles [BGBK17] studied the spectral radii of sparse inhomogeneous Erdős–Rényi graphs through a novel application of non-backtracking matrices. Stephan and Massoulié [SM20] also studied the non-backtracking spectra of weighted inhomogeneous random graphs.
In the current paper, we give a precise characterization of the limiting distribution of the eigenvalues of the non-backtracking matrix when the underlying graph is the Erdős–Rényi random graph G(n, p), where each edge ij is present independently with probability p, and where we exclude loops (edges of the form ii). We will allow p to be constant or decreasing sublinearly with n, in contrast to the bounds proved in [BLM15] and [GLM16], which correspond to the case p = c/n with c a constant. Let A be the adjacency matrix of G(n, p), so A_ij = 1 exactly when edge ij is part of the graph G and A_ij = 0 otherwise, and let D be the diagonal matrix with $D_{ii} = \sum_{j=1}^{n} A_{ij}$. Much is known about the eigenvalues of A, going back to work of Wigner in the 1950s [Wig55, Wig58] (see also [Gre63] and [Arn67]), who proved that the distribution of eigenvalues follows the semicircular law for any constant p ∈ (0, 1). More recent results have considered the case where p tends to zero, making the random graph sparse. It is known that, assuming np → ∞, the empirical spectral distribution (ESD) of the adjacency matrix A converges to the semicircle distribution (see for example [KP93] or [TVW13]). In fact, much stronger results have been proved about the eigenvalues of A (see the surveys [Vu08] and [BGK16]). For example, Erdős, Knowles, Yau, and Yin [EKYY13] proved that as long as there is a constant C so that $np \ge (\log n)^{C \log \log n}$ (and thus np → ∞ faster than any fixed power of log n), the eigenvalues of the adjacency matrix A satisfy a result called the local semicircle law. This law characterizes the distribution of the eigenvalues in small intervals that shrink as the size n of the matrix increases. The most recent developments regarding the local semicircle law can be found in [HKM19] and [ADK22].
It has been shown in [Has89b, Bas92, AFH15] (for example, Theorem 1.5 of [AFH15]) that the spectrum of B is the set {±1} ∪ {µ : det(µ²I − µA + D − I) = 0}, or equivalently, the set {±1} ∪ {eigenvalues of H}, where
$$H = \begin{pmatrix} A & I - D \\ I & 0 \end{pmatrix}. \tag{1.1}$$
We will call this 2n × 2n matrix H the non-backtracking spectrum operator for A, and we will show that the spectrum of H may be precisely described, thus giving a precise description of the eigenvalues of the non-backtracking matrix B. We will study the eigenvalues of H in two regions: the dense region, where p ∈ (0, 1) is a fixed constant; and the sparse region, where p = o(1) and np → ∞. The diluted region, where p = c/n for some constant c > 1, is the region for which the bounds in [BLM15] and [GLM16] apply, and, as pointed out by [BLM15], it would be interesting to determine the limiting eigenvalue distribution of H in this region.
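This spectral identity is easy to confirm numerically. The sketch below (our own; parameters, seed, and indexing conventions are illustrative assumptions) samples G(n, p), builds both B and H, and checks that the spectrum of B consists of ±1 (each with multiplicity |E| − n, per the Ihara–Bass identity) together with the eigenvalues of H.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 60, 0.2

# Sample G(n, p): symmetric 0/1 adjacency matrix, zero diagonal.
U = rng.random((n, n)) < p
A = np.triu(U, 1).astype(float)
A = A + A.T
D = np.diag(A.sum(axis=1))
I, Z = np.eye(n), np.zeros((n, n))

# Non-backtracking spectrum operator: companion linearization of
# det(mu^2 I - mu A + D - I) = 0.
H = np.block([[A, I - D], [I, Z]])

# Build B directly on the 2|E| directed edges.
darts = [(i, j) for i in range(n) for j in range(n) if A[i, j]]
idx = {d: a for a, d in enumerate(darts)}
B = np.zeros((len(darts), len(darts)))
for (i, j) in darts:
    for l in np.flatnonzero(A[j]):
        if l != i:
            B[idx[(i, j)], idx[(j, int(l))]] = 1.0

eigB = np.linalg.eigvals(B)
# spec(B) = {+-1, each with multiplicity |E| - n} union spec(H).
extra = [1.0] * (len(darts) // 2 - n) + [-1.0] * (len(darts) // 2 - n)
target = np.concatenate([np.linalg.eigvals(H), extra])
print(max(np.min(np.abs(target - w)) for w in eigB))   # ~ 1e-8
```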
Note that E(D) = (n − 1)pI, and so we will let α = (n − 1)p − 1 and consider the partly averaged matrix
$$H_0 = \begin{pmatrix} A & -\alpha I \\ I & 0 \end{pmatrix}, \tag{1.2}$$
obtained from H by replacing I − D with its expectation I − E(D) = −αI. The partly averaged matrix H_0 will be an essential tool in quantifying the eigenvalues of the non-backtracking spectrum operator H. Three main ideas are at the core of this paper: first, that partial derandomization can greatly simplify the spectrum; second, that Tao and Vu's replacement principle [TV10, Theorem 2.1] can be usefully applied to two sequences of random matrices that are highly dependent on each other; and third, that in this case, the partly derandomized matrix may be viewed as a small perturbation of the original matrix, allowing one to apply results from perturbation theory like the Bauer–Fike theorem. The use of Tao and Vu's replacement principle here is novel, as it is used to compare the spectra of a sequence of matrices with some dependencies among the entries to a sequence of matrices where all random entries are independent; typically, the Tao–Vu replacement principle has been applied in cases where the two sequences of random matrices both have independent entries, see for example [TV10, Woo12, Woo16].
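The effect of the partial averaging can be previewed numerically. The sketch below (ours; the parameter choices are illustrative) constructs H and H_0 and confirms that, after rescaling by 1/√α, all but two eigenvalues of H_0 sit exactly on the unit circle, while those of H sit close to it when np ≫ log n.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 400, 0.1
U = rng.random((n, n)) < p
A = np.triu(U, 1).astype(float); A = A + A.T
D = np.diag(A.sum(axis=1))
I, Z = np.eye(n), np.zeros((n, n))
alpha = (n - 1) * p - 1

H  = np.block([[A, I - D],      [I, Z]])       # (1.1)
H0 = np.block([[A, -alpha * I], [I, Z]])       # (1.2): D replaced by E(D)

mu0 = np.linalg.eigvals(H0) / np.sqrt(alpha)
mu  = np.linalg.eigvals(H)  / np.sqrt(alpha)
# All but two eigenvalues of H0/sqrt(alpha) lie on the unit circle.
print(np.sort(np.abs(np.abs(mu0) - 1))[-3])    # third-largest deviation ~ 0
print(np.median(np.abs(np.abs(mu) - 1)))       # small when np >> log n
```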
1.1. Results. Throughout the rest of this paper, we mainly focus on sparse random graphs, where p ∈ (0, 1) may tend to zero as n → ∞. Our first result shows that the spectrum of H_0 can be determined very precisely in terms of the spectrum of the random Hermitian matrix A, which is well understood.
Proposition 1.2 (Spectrum of the partly averaged matrix). Let H_0 be defined as in (1.2), and let 0 < p ≤ p_0 < 1 for a constant p_0. If p ≥ C/√n for some large constant C > 0, then, with probability 1 − o(1), the matrix $\frac{1}{\sqrt{\alpha}}H_0$ has exactly two real eigenvalues, $\mu_1 = (1+o(1))\sqrt{np}$ and $\mu_2 = 1/\mu_1 = (1+o(1))/\sqrt{np}$; all other eigenvalues of $\frac{1}{\sqrt{\alpha}}H_0$ are complex with magnitude 1 and occur in complex conjugate pairs. If np → ∞ with n, then the real parts of the eigenvalues on the circular arcs are distributed according to the semicircle law.
Remark 1.3. When $n^{-1+\epsilon} \le p \le n^{-1/2}$, more real eigenvalues of H_0 emerge. We provide a short discussion of the real eigenvalues of H_0 in Section 2.2. Note that as long as the number of real eigenvalues is bounded by a fixed constant, for example when p ≥ C/√n, the bulk distribution of H_0 is supported on two arcs of the unit circle, with density such that the real parts of the eigenvalues follow the semicircular law.
The spectrum of the non-backtracking matrix for a degree-regular graph was studied in [Bor15], including proving some precise eigenvalue estimates. One can view Proposition 1.2 as extending this general approach by using averaged degree counts while allowing the graph to no longer be degree regular. Thus, Proposition 1.2 shows that partly averaging H to get H_0 is enough to allow the spectrum to be computed very precisely. Our main results are the theorems below, which show that the empirical spectral measures $\mu_H$ for H and $\mu_{H_0}$ for H_0 are very close to each other, even for p a decreasing function of n. (The definition of the measure $\mu_M$ for a matrix M and the definition of almost sure convergence of measures are given in Section 1.3.)
Theorem 1.4. Let A be the adjacency matrix of an Erdős–Rényi random graph G(n, p). Assume 0 < p ≤ p_0 < 1 for a constant p_0 and np/log n → ∞ with n. Let $\frac{1}{\sqrt{\alpha}}H$ be the rescaling of the non-backtracking spectrum operator for A defined in (1.1), with α = (n − 1)p − 1, and let $\frac{1}{\sqrt{\alpha}}H_0$ be its partial derandomization defined in (1.2). Then $\mu_{H/\sqrt{\alpha}} - \mu_{H_0/\sqrt{\alpha}}$ converges almost surely (thus, also in probability) to zero as n goes to infinity.

Remark 1.5. When p ≫ log n/n, the graph G(n, p) is almost a random regular graph, and thus H_0 appears to be a good approximation of H. When p becomes smaller, such an approximation is no longer accurate. In this sense, Theorem 1.4 is optimal.
In Figure 1, we plot the eigenvalues of H/√α and H_0/√α for an Erdős–Rényi random graph G(n, p), where n = 500. The blue circles mark the eigenvalues of H/√α and the red x's mark the eigenvalues of H_0/√α. We can see that the empirical spectral measures of H/√α and H_0/√α are very close for p not too small. As p becomes smaller (note that here log n/n ≈ 0.0054), the eigenvalues of H_0/√α still lie on the arcs of the unit circle, whereas the eigenvalues of H/√α start to escape the arcs and are attracted to the interior of the circle.
To prove that the bulk eigenvalue distributions converge in Theorem 1.4, we will use Tao and Vu's replacement principle [TV10, Theorem 2.1] (see also Theorem 3.2), which was a key step in proving the circular law. The replacement principle lets one compare eigenvalue distributions of two sequences of random matrices, and it has often been used in cases where one type of random input (for example, standard Gaussian entries) is replaced by a different type of random input (for example, arbitrary mean-0, variance-1 entries). This is how the replacement principle was used to prove the circular law in [TV10], and it was used similarly in, for example, [Woo12, Woo16]. The application of the replacement principle in the current paper is distinct in that the entries of one ensemble of random matrices, namely H, have some dependencies among them, whereas the random entries of H_0 are all independent.
Our third result (Theorem 1.6 below) proves that all eigenvalues of H are close to those of H_0 with high probability when $p \gg \log^{2/3} n / n^{1/6}$, which implies that there are no outlier eigenvalues of H, that is, no eigenvalues of H far outside the support of the spectrum of H_0 (described in Proposition 1.2).
Theorem 1.6. Assume 0 < p ≤ p_0 < 1 for a constant p_0 and $p \ge \log^{2/3+\varepsilon} n / n^{1/6}$ for some ε > 0. Let A be the adjacency matrix of an Erdős–Rényi random graph G(n, p), and let $\frac{1}{\sqrt{\alpha}}H$ be the rescaling of the non-backtracking spectrum operator for A defined in (1.1). Then, with probability 1 − o(1), every eigenvalue of $\frac{1}{\sqrt{\alpha}}H$ is within distance $R = 40\sqrt{\frac{\log n}{np^2}}$ of some eigenvalue of $\frac{1}{\sqrt{\alpha}}H_0$.

In the upcoming Section 2, it will be demonstrated that eigenvalues in the bulk of the distributions for $\frac{1}{\sqrt{\alpha}}H$ and for $\frac{1}{\sqrt{\alpha}}H_0$ have absolute value 1. Since $p \gg \log^{2/3} n / n^{1/6}$, we have $R = 40\sqrt{\frac{\log n}{np^2}} = o(1)$, and consequently, Theorem 1.6 provides informative results. We would like to mention that the above result has since been improved to hold for p ≫ log n/n, with each eigenvalue of $\frac{1}{\sqrt{\alpha}}H$ within distance o(1) of an eigenvalue of $\frac{1}{\sqrt{\alpha}}H_0$, using a variant of the Bauer–Fike perturbation theorem that appeared later in [CZ21, Corollary 2.4], as opposed to invoking the classical Bauer–Fike theorem as in this paper (see Theorem 4.1).

1.2. Outline. We will describe the ESD of the partly averaged matrix H_0 and prove Proposition 1.2 in Section 2. In Section 3, we will show that the ESDs of H and H_0 approach each other as n goes to infinity, using the replacement principle [TV10, Theorem 2.1], and in Section 4 we will use the Bauer–Fike theorem to prove Theorem 1.6, showing that the partly averaged matrix H_0 has eigenvalues close to those of H in the limit as n → ∞.
1.3. Background definitions. We give a few definitions to make precise the convergence described in Theorem 1.4 between the empirical spectral measures of H and H_0. For an n × n matrix M_n with eigenvalues λ_1, . . ., λ_n, the empirical spectral measure $\mu_{M_n}$ of M_n is defined to be
$$\mu_{M_n} = \frac{1}{n}\sum_{i=1}^{n} \delta_{\lambda_i},$$
where δ_x is the Dirac delta measure with mass 1 at x. Note that $\mu_{M_n}$ is a probability measure on the complex numbers C. The empirical spectral distribution (ESD) for M_n is defined to be
$$F^{M_n}(x, y) = \frac{1}{n}\,\#\{1 \le i \le n : \operatorname{Re}(\lambda_i) \le x,\ \operatorname{Im}(\lambda_i) \le y\}.$$
For T a topological space (for example R or C) and B its Borel σ-field, we can define convergence of a sequence (µ_n)_{n≥1} of random probability measures on (T, B) to a nonrandom probability measure µ also on (T, B) as follows. We say that µ_n converges weakly to µ in probability as n → ∞ (written µ_n → µ in probability) if for all bounded continuous functions f : T → R and all ϵ > 0 we have
$$P\left(\left|\int_T f\,d\mu_n - \int_T f\,d\mu\right| > \epsilon\right) \to 0 \quad \text{as } n \to \infty.$$
Also, we say that µ_n converges weakly to µ almost surely as n → ∞ (written µ_n → µ a.s.) if for all bounded continuous functions f : T → R, we have that $\int_T f\,d\mu_n - \int_T f\,d\mu \to 0$ almost surely as n → ∞.
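For concreteness, here is a small Python sketch (our own) of how this weak-convergence criterion can be probed empirically: integrate a fixed bounded continuous test function against the two empirical spectral measures and compare.

```python
import numpy as np

def esd_integral(M, f):
    """Integrate a test function f against the empirical spectral
    measure of M: (1/n) * sum_i f(lambda_i)."""
    return np.mean(f(np.linalg.eigvals(M)))

rng = np.random.default_rng(2)
n, p = 300, 0.2
U = rng.random((n, n)) < p
A = np.triu(U, 1).astype(float); A = A + A.T
D = np.diag(A.sum(axis=1))
I, Z = np.eye(n), np.zeros((n, n))
alpha = (n - 1) * p - 1
H  = np.block([[A, I - D],      [I, Z]]) / np.sqrt(alpha)
H0 = np.block([[A, -alpha * I], [I, Z]]) / np.sqrt(alpha)

f = lambda z: np.exp(-np.abs(z) ** 2)   # bounded continuous test function
print(abs(esd_integral(H, f) - esd_integral(H0, f)))  # small for large n
```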
We will use $\|A\|_F := \left(\operatorname{tr}(AA^*)\right)^{1/2}$ to denote the Frobenius norm (or Hilbert–Schmidt norm), and ∥A∥ to denote the operator norm. We denote $\|A\|_{\max} = \max_{ij}|a_{ij}|$. We use the notation o(1) to denote a quantity that tends to zero as n goes to infinity, and we write $a_n = O(b_n)$ if $a_n \le C\,b_n$ for a constant C > 0 when n is sufficiently large. Finally, we will use I or I_n to denote the identity matrix, where the subscript n will be omitted when the dimension can be inferred from context.

The spectrum of $H_0$
We are interested in the limiting ESD of H when H is scaled to have bounded support (except for one outlier eigenvalue), and so we will work with the following rescaled conjugation of H, which has the same eigenvalues as H/√α:
$$\widetilde{H} = \begin{pmatrix} \frac{1}{\sqrt{\alpha}}A & \frac{1}{\alpha}(I - D) \\ I & 0 \end{pmatrix}, \qquad \widetilde{H}_0 = \begin{pmatrix} \frac{1}{\sqrt{\alpha}}A & -I \\ I & 0 \end{pmatrix}. \tag{2.1}$$
Note that the diagonal matrix $\frac{1}{\alpha}(I - D)$ is equal to −I in expectation, and so we will compare the eigenvalues of $\widetilde{H}$ to those of the partly averaged matrix $\widetilde{H}_0$, noting that $\widetilde{H} = \widetilde{H}_0 + E$, where
$$E = \begin{pmatrix} 0 & \frac{1}{\alpha}(I - D) + I \\ 0 & 0 \end{pmatrix} = \begin{pmatrix} 0 & \frac{1}{\alpha}(\mathbb{E}D - D) \\ 0 & 0 \end{pmatrix}.$$
Note that H_0/√α and $\widetilde{H}_0$ also have identical eigenvalues. We will show that $\widetilde{H}_0$ is explicitly diagonalizable in terms of the eigenvectors and eigenvalues of $\frac{1}{\sqrt{\alpha}}A$, and then use this information to find an explicit form for the characteristic polynomial of $\widetilde{H}_0$.
Write $\frac{1}{\sqrt{\alpha}}A = U \Lambda U^{T}$, where $\Lambda = \operatorname{diag}(\lambda_1, \dots, \lambda_n)$ contains the eigenvalues $\lambda_i = \lambda_i(A)/\sqrt{\alpha}$ and U is an orthogonal matrix. Consider the matrix $xI - \widetilde{H}_0$, and note that
$$\det\left(xI - \widetilde{H}_0\right) = \det\left(x^2 I - x\,\tfrac{1}{\sqrt{\alpha}}A + I\right) = \prod_{i=1}^{n}\left(x^2 - \lambda_i x + 1\right). \tag{2.2}$$
With the characteristic polynomial for $\widetilde{H}_0$ factored into quadratics as in (2.2), we see that for each eigenvalue λ_i of $\frac{1}{\sqrt{\alpha}}A$, there are two eigenvalues $\mu_{2i-1}$ and $\mu_{2i}$ of $\widetilde{H}_0$, namely the two solutions to $x^2 - \lambda_i x + 1 = 0$; thus,
$$\mu_{2i-1},\ \mu_{2i} = \frac{\lambda_i \pm \sqrt{\lambda_i^2 - 4}}{2}. \tag{2.3}$$
The eigenvalues of A are well understood. We use the following results from the literature.
Theorem 2.1. Let A be the adjacency matrix of G(n, p) and assume np/log n → ∞. Then for any ϵ > 0, the following holds with probability 1 − o(1):
$$\lambda_1(A) = (1+o(1))np \quad\text{and}\quad \max_{2 \le i \le n}|\lambda_i(A)| \le (2+\epsilon)\sqrt{np(1-p)}.$$

Proof. We collect relevant results regarding the eigenvalues of A from different works in the literature. In [KS03], it is shown that with probability 1 − o(1), λ_1(A) = (1 + o(1)) max{np, √∆}, where ∆ is the maximum degree. As long as np/log n → ∞, we have max{np, √∆} = np (for the bounds on ∆ see, for instance, the proof of Lemma 3.5 below).
The operator norm of A − EA and the extreme eigenvalues of A have been studied in various works (see [FK81, Vu07, BGBK17, EKYY13, LS18, HLY20, HK21]). In particular, in [LS18, Theorem 2.9], assuming $p \ge n^{-1+\phi}$, the authors proved that for any ϵ > 0 and C > 0, the following estimate holds with probability at least $1 - n^{-C}$:
$$\|A - \mathbb{E}A\| \le (2+\epsilon)\sqrt{np(1-p)}.$$
The conclusion of the theorem follows immediately from the classical Weyl's inequality: since $\mathbb{E}A = p(J - I)$, where J is the all-ones matrix, we have $\max_{2 \le i \le n}|\lambda_i(A)| \le \|A - \mathbb{E}A\| + p$. Now we are ready to derive Proposition 1.2.
Proof of Proposition 1.2. Note that λ_i = λ_i(A)/√α and α = (n − 1)p − 1. We have, with probability 1 − o(1) by Theorem 2.1,
$$\lambda_1 = \frac{(1+o(1))np}{\sqrt{(n-1)p - 1}} = (1+o(1))\sqrt{np}.$$
Therefore, for λ_1, we see from (2.3) that µ_1, µ_2 are real eigenvalues with
$$\mu_1 = (1+o(1))\sqrt{np} \quad\text{and}\quad \mu_2 = \frac{1}{\mu_1} = (1+o(1))\frac{1}{\sqrt{np}}$$
with probability 1 − o(1). Next, by Theorem 2.1, it holds with probability 1 − o(1) for any 2 ≤ i ≤ n that
$$|\lambda_i| \le (2+\epsilon)\sqrt{\frac{np(1-p)}{\alpha}}.$$
Since p ≥ C/√n for a sufficiently large constant C, we have $(2+\epsilon)^2\,\frac{np(1-p)}{\alpha} < 4$ for all sufficiently large n. Hence, for all i ≥ 2, we have λ_i² < 4, and thus µ_{2i−1}, µ_{2i} are complex eigenvalues with magnitude 1 (since |µ_{2i−1}| = |µ_{2i}| = 1). One should also note that µ_{2i−1}µ_{2i} = 1 for every i, and that whenever µ_{2i−1} is complex (i.e., i ≥ 2), its complex conjugate is µ_{2i}. It is known that the empirical spectral measure of $A/\sqrt{np(1-p)}$ converges to the semicircular law supported on [−2, 2] assuming np → ∞ (see for instance [KP93] or [TVW13]). Since $2\operatorname{Re}(\mu_{2i-1}) = 2\operatorname{Re}(\mu_{2i}) = \lambda_i$ for i ≥ 2, the ESD of the scaled real parts of the µ_j,
$$\frac{1}{2n}\sum_{j=1}^{2n} \delta_{\,2\sqrt{\alpha/(np(1-p))}\,\operatorname{Re}(\mu_j)},$$
converges weakly almost surely to µ_sc, the semicircular law supported on [−2, 2]. The proof of Proposition 1.2 is now complete. □

2.2. Real eigenvalues of $\widetilde{H}_0$ when p ≤ n^{−1/2}. As mentioned in Remark 1.3, when p becomes smaller than $n^{-1/2}$, more real eigenvalues of $\widetilde{H}_0$ emerge. We can identify some of these eigenvalues using recent results of [EKYY13, LS18, HLY20, HK21] on the extreme eigenvalues of A. For instance, [LS18, Corollary 2.13] shows that if $n^{2\phi - 1} \le p \le n^{-2\phi'}$ for ϕ > 1/6 and ϕ′ > 0, then the fluctuation of the second eigenvalue around the spectral edge is asymptotically governed by the Tracy–Widom law:
$$\lim_{n\to\infty} P\left(n^{2/3}\left(\frac{\lambda_2(A)}{\sqrt{np(1-p)}} - L_n\right) \le s\right) = F_{TW_1}(s), \tag{2.5}$$
where L_n is an explicit deterministic location of the spectral edge. Therefore, when $p \ge n^{-2/3+\epsilon}$, by noting that $F_{TW_1}(s) \to 1$ as s → ∞ and selecting s to be a large constant in (2.5), we see that λ_2² > 4 with probability tending to one in this range of p. Hence, from (2.3), both µ_3 and µ_4 are real. The convergence result (2.5) holds for finitely many extreme eigenvalues of A, and thus they also generate real eigenvalues of $\widetilde{H}_0$. The fluctuation of the extreme eigenvalues of A has been obtained in [HLY20, Corollary 1.5] for $n^{-7/9} \ll p \ll n^{-2/3}$ and in [HK21] for the remaining range of p down to $p \ge n^{-1+\epsilon}$. One could use a similar discussion as above to extract information about the real eigenvalues of $\widetilde{H}_0$. The details are omitted.
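The semicircular behavior of the real parts is easy to see in simulation. The following sketch (our own; n and p are illustrative) compares a histogram of the bulk values λ_i/2 = Re(µ_{2i−1}) with the density (2/π)√(1 − x²) obtained by pushing the semicircle on [−2, 2] forward under λ ↦ λ/2.

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 2000, 0.05                  # np = 100, well above log n
U = rng.random((n, n)) < p
A = np.triu(U, 1).astype(float); A = A + A.T

# Bulk eigenvalues mu of H0/sqrt(alpha) have 2 Re(mu) = lambda_i, so after
# the usual sqrt(np(1-p)) normalization the real parts follow a semicircle.
lam = np.linalg.eigvalsh(A) / np.sqrt(n * p * (1 - p))
re_parts = lam[:-1] / 2            # drop the Perron eigenvalue; Re(mu) = lam/2

counts, edges = np.histogram(re_parts, bins=40, range=(-1, 1), density=True)
mid = (edges[:-1] + edges[1:]) / 2
density = (2 / np.pi) * np.sqrt(np.clip(1 - mid ** 2, 0, None))
print(np.max(np.abs(counts - density)))   # small (finite-n fluctuation)
```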

2.3. $\widetilde{H}_0$ is diagonalizable. We can now demonstrate an explicit diagonalization for $\widetilde{H}_0$. Since µ_{2i−1} and µ_{2i} are the solutions to µ² − µλ_i + 1 = 0, one can check that the vectors
$$y_{2i-1} = \frac{1}{\sqrt{1+|\mu_{2i-1}|^2}}\begin{pmatrix} \mu_{2i-1} v_i \\ v_i \end{pmatrix} \quad\text{and}\quad y_{2i} = \frac{1}{\sqrt{1+|\mu_{2i}|^2}}\begin{pmatrix} \mu_{2i} v_i \\ v_i \end{pmatrix}$$
are eigenvectors of $\widetilde{H}_0$ with eigenvalues µ_{2i−1} and µ_{2i}, respectively, for all i, where v_i is a unit eigenvector of $\frac{1}{\sqrt{\alpha}}A$ with eigenvalue λ_i. Furthermore, y_{2i−1} and y_{2i} are unit vectors. For 1 ≤ i ≤ n, define the vectors
$$x_{2i-1}^{T} = \frac{\sqrt{1+|\mu_{2i-1}|^2}}{\mu_{2i-1} - \mu_{2i}}\left(v_i^{T},\ -\mu_{2i}\,v_i^{T}\right), \qquad x_{2i}^{T} = \frac{\sqrt{1+|\mu_{2i}|^2}}{\mu_{2i} - \mu_{2i-1}}\left(v_i^{T},\ -\mu_{2i-1}\,v_i^{T}\right), \tag{2.6}$$
and let X be the matrix with columns y_1, . . ., y_{2n} and Y the matrix with rows $x_1^T, \dots, x_{2n}^T$; we see that X = Y^{−1} since v_1, . . ., v_n are orthonormal. Also it is easy to check that $Y \widetilde{H}_0 X = \operatorname{diag}(\mu_1, \dots, \mu_{2n})$.
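Both the quadratic factorization (2.2)–(2.3) and the eigenvector formulas can be verified numerically; the sketch below (our own, with illustrative parameters) does so for a sampled G(n, p).

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 200, 0.3
U = rng.random((n, n)) < p
A = np.triu(U, 1).astype(float); A = A + A.T
alpha = (n - 1) * p - 1
I, Z = np.eye(n), np.zeros((n, n))
Ht0 = np.block([[A / np.sqrt(alpha), -I], [I, Z]])

lam, V = np.linalg.eigh(A / np.sqrt(alpha))

# (2.3): the 2n eigenvalues are the roots of x^2 - lambda_i x + 1 = 0.
mu_pred = np.concatenate([np.roots([1.0, -l, 1.0]) for l in lam])
mu = np.linalg.eigvals(Ht0)
print(max(np.min(np.abs(mu_pred - w)) for w in mu))    # ~ 1e-8

# Eigenvector check: (mu v_i, v_i)/sqrt(1+|mu|^2) is an eigenvector for mu.
err = 0.0
for i in range(n):
    for m_ in np.roots([1.0, -lam[i], 1.0]):
        y = np.concatenate([m_ * V[:, i], V[:, i]])
        y = y / np.sqrt(1 + abs(m_) ** 2)
        err = max(err, np.linalg.norm(Ht0 @ y - m_ * y))
print(err)                                             # ~ 1e-12
```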

The bulk distribution: proving Theorem 1.4
We begin by re-stating Theorem 1.4 using the conjugated matrices defined in (2.1).
Theorem 3.1. Let A be the adjacency matrix of an Erdős–Rényi random graph G(n, p). Assume 0 < p ≤ p_0 < 1 for a constant p_0 and np/log n → ∞ with n. Let $\widetilde{H}$ be the rescaled conjugation of the non-backtracking spectrum operator for A defined in (2.1), and let $\widetilde{H}_0$ be its partial derandomization, also defined in (2.1). Then $\mu_{\widetilde{H}} - \mu_{\widetilde{H}_0}$ converges almost surely (thus, also in probability) to zero as n goes to infinity.
To prove Theorem 3.1, we will show that the bulk distribution of $\widetilde{H}$ matches that of $\widetilde{H}_0$ using the replacement principle [TV10, Theorem 2.1], which we rephrase slightly as a perturbation result below (see Theorem 3.2). First, we give a few definitions that we will use throughout this section. We say that a random variable X_n is bounded in probability if
$$\lim_{C\to\infty} \liminf_{n\to\infty} P\left(|X_n| \le C\right) = 1,$$
and we say that X_n is almost surely bounded if there exists a constant C > 0 such that
$$\limsup_{n\to\infty} |X_n| \le C \quad \text{almost surely.}$$

Theorem 3.2 (Replacement principle [TV10]). Suppose for each m that M_m and M_m + P_m are random m × m matrices with entries in the complex numbers. Assume that
$$\frac{1}{m}\|M_m\|_F^2 + \frac{1}{m}\|M_m + P_m\|_F^2 \tag{3.1}$$
is bounded in probability (resp., almost surely), and that, for almost all complex numbers z ∈ C,
$$\frac{1}{m}\log\left|\det\left(M_m + P_m - zI\right)\right| - \frac{1}{m}\log\left|\det\left(M_m - zI\right)\right| \tag{3.2}$$
converges in probability (resp., almost surely) to zero; in particular, this second condition requires that for almost all z ∈ C, the matrices M_m + P_m − zI and M_m − zI have non-zero determinant with probability 1 − o(1) (resp., almost surely non-zero for all but finitely many m). Then $\mu_{M_m} - \mu_{M_m + P_m}$ converges in probability (resp., almost surely) to zero.
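Condition (3.2) can be probed directly in simulation. The sketch below (our own; the fixed z and the parameters are illustrative assumptions) computes the normalized log-determinant difference for $\widetilde{H}$ and $\widetilde{H}_0$ at a fixed z and watches it shrink as n grows.

```python
import numpy as np

rng = np.random.default_rng(10)

def sample_pair(n, p):
    U = rng.random((n, n)) < p
    A = np.triu(U, 1).astype(float); A = A + A.T
    D = np.diag(A.sum(axis=1))
    I, Z = np.eye(n), np.zeros((n, n))
    alpha = (n - 1) * p - 1
    Ht  = np.block([[A / np.sqrt(alpha), (I - D) / alpha], [I, Z]])
    Ht0 = np.block([[A / np.sqrt(alpha), -I],              [I, Z]])
    return Ht, Ht0

z = 0.4 + 0.5j
for n in (100, 200, 400, 800):
    Ht, Ht0 = sample_pair(n, 0.2)
    m = 2 * n
    _, ld  = np.linalg.slogdet(Ht  - z * np.eye(m))
    _, ld0 = np.linalg.slogdet(Ht0 - z * np.eye(m))
    print(n, abs(ld - ld0) / m)    # condition (3.2): shrinks as n grows
```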
Note that there is no independence assumption anywhere in Theorem 3.2; thus, entries in P_m may depend on entries in M_m and vice versa.
We will use the following corollary of Theorem 3.2, which essentially says that if the perturbation P_m has largest singular value of order less than the smallest singular value of M_m − zI for almost every z ∈ C, then adding the perturbation P_m does not appreciably change the bulk distribution of M_m.
Corollary 3.3. For each m, let M_m and P_m be random m × m matrices with entries in the complex numbers, and let f(z, m) ≥ 1 be a real function depending on z and m. Assume that
$$\frac{1}{m}\|M_m\|_F^2 + \frac{1}{m}\|M_m + P_m\|_F^2 \tag{3.3}$$
is bounded in probability (resp., almost surely); that for almost all z ∈ C,
$$f(z, m)\,\|P_m\| \tag{3.4}$$
converges to zero in probability (resp., almost surely); and that for almost all z ∈ C the smallest singular value of M_m − zI satisfies
$$s_{\min}(M_m - zI) \ge \frac{1}{f(z, m)} \tag{3.5}$$
with probability tending to 1 (resp., almost surely for all but finitely many m). Then $\mu_{M_m} - \mu_{M_m + P_m}$ converges in probability (resp., almost surely) to zero.
Proof. We will show that the three conditions (3.3), (3.4), and (3.5) of Corollary 3.3 together imply the two conditions needed to apply Theorem 3.2. First note that (3.3) is identical to the first condition (3.1) of Theorem 3.2. Next, we will show in the remainder of the proof that condition (3.2) of Theorem 3.2 holds, by noting that sufficiently small perturbations have a small effect on the singular values, and also that the absolute value of the determinant is equal to the product of the singular values.
Let z be a complex number for which (3.5) holds, let M_m − zI have singular values $s_1 \ge s_2 \ge \dots \ge s_m \ge 0$, and let M_m + P_m − zI have singular values $s_1' \ge s_2' \ge \dots \ge s_m' \ge 0$. We will use the following result, which is sometimes called Weyl's perturbation theorem for singular values, to show that the differences $|s_i - s_i'|$ are small.

Lemma 3.4 ([Cha09, Theorem 1.3]). Let A and B be m × n real or complex matrices with singular values $\sigma_1(A) \ge \dots \ge \sigma_{\min(m,n)}(A)$ and $\sigma_1(B) \ge \dots \ge \sigma_{\min(m,n)}(B)$, respectively. Then
$$\max_{1 \le i \le \min(m,n)} |\sigma_i(A) - \sigma_i(B)| \le \|A - B\|.$$

We then have that $\max_i |s_i - s_i'| \le \|P_m\|$, and by (3.5),
$$\max_{1 \le i \le m}\left|\frac{s_i'}{s_i} - 1\right| \le f(z, m)\,\|P_m\|,$$
which converges to zero in probability (resp., almost surely) by (3.4). Thus we know that
$$1 - f(z, m)\|P_m\| \le \frac{s_i'}{s_i} \le 1 + f(z, m)\|P_m\| \quad \text{for all } 1 \le i \le m,$$
where the inequalities hold with probability tending to 1 (resp., almost surely for all sufficiently large m). Using the fact that the absolute value of the determinant is the product of the singular values, we may write (3.2) as
$$\left|\frac{1}{m}\log\prod_{i=1}^{m}\frac{s_i'}{s_i}\right| \le \max_{1 \le i \le m}\left|\log\frac{s_i'}{s_i}\right| \longrightarrow 0$$
in probability (resp., almost surely). This proves the corollary. □

To prove Theorem 3.1, we apply Corollary 3.3. Indeed, Lemma 3.7 verifies (3.3), Lemma 3.9 verifies (3.5), and (3.4) follows by combining Lemma 3.5 and Lemma 3.9. Note that the assumption np/log n → ∞ in Theorem 3.1 is only needed to prove conditions (3.3) and (3.4). Condition (3.5) in fact holds for any p and for more general matrices; see the proof of Lemma 3.9. In Corollary 3.3, we will take M_m to be the partly derandomized matrix $\widetilde{H}_0$ and P_m to be the matrix E (see (2.1)), where we suppress the dependence of $\widetilde{H}_0$ and E on n = m/2 to simplify the notation. There are two interesting features: first, the singular values of $\widetilde{H}_0$ may be written out explicitly in terms of the eigenvalues of the Hermitian matrix A (which are well understood; see Lemma 3.9); and second, the matrix E is completely determined by the matrix $\widetilde{H}_0$, making this a novel application of the replacement principle (Theorem 3.2 and Corollary 3.3), where the sequence of matrices $\widetilde{H}_0 + E = \widetilde{H}$ has some dependencies among the entries.
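Lemma 3.4 is simple to sanity-check numerically; here is our own sketch with arbitrary Gaussian test matrices.

```python
import numpy as np

rng = np.random.default_rng(6)
m = 80
M = rng.standard_normal((m, m))
P = 0.01 * rng.standard_normal((m, m))

s  = np.linalg.svd(M, compute_uv=False)
sp = np.linalg.svd(M + P, compute_uv=False)
# Weyl's perturbation theorem for singular values:
# max_i |sigma_i(M+P) - sigma_i(M)| <= ||P||.
print(np.max(np.abs(sp - s)), np.linalg.norm(P, 2))
```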
Lemma 3.5. Assume 0 < p ≤ p_0 < 1 for a constant p_0, and further assume np/log n → ∞. For E as defined in (2.1), we have that
$$\|E\| \le 20\sqrt{\frac{\log n}{np}}$$
almost surely for all but finitely many n. In particular, ∥E∥ converges to zero almost surely.
Proof. First, note that $\mathbb{E}D = (n-1)pI = (\alpha+1)I$, and thus
$$E = \begin{pmatrix} 0 & \frac{1}{\alpha}(\mathbb{E}D - D) \\ 0 & 0 \end{pmatrix}.$$
Since $\mathbb{E}D - D$ is a diagonal matrix, it is easy to check that
$$\|E\| = \frac{1}{\alpha}\max_{1 \le i \le n}\left|D_{ii} - (n-1)p\right|.$$
Note that the D_ii all have the same distribution, namely Binomial(n − 1, p). By the union bound, it follows that for any s > 0,
$$P\left(\max_{1\le i\le n}\left|D_{ii} - (n-1)p\right| \ge s\right) \le n\,P\left(\left|D_{11} - (n-1)p\right| \ge s\right).$$
Next we will apply the following general form of the Chernoff bound.

Theorem 3.6 (Chernoff bound). Let S be a Binomial(N, q) random variable. Then for any K ≥ Nq and any L ≤ Nq,
$$P(S \ge K) \le \exp\left(-N\,D\!\left(\tfrac{K}{N}\,\middle\|\,q\right)\right) \quad\text{and}\quad P(S \le L) \le \exp\left(-N\,D\!\left(\tfrac{L}{N}\,\middle\|\,q\right)\right),$$
where $D(x\|q) = x\log\frac{x}{q} + (1-x)\log\frac{1-x}{1-q}$ is the relative entropy.
By our assumption, np = ω(n) log n, where ω(n) is a positive function that tends to infinity with n. Now take K = (n − 1)p + npt, where $t = t(n) = 5\sqrt{\frac{\log n}{np}}$ (say). Our assumption np/log n → ∞ implies t → 0 with n. Thus, by Theorem 3.6 with N = n − 1 and q = p,
$$P\left(D_{11} \ge (n-1)p + npt\right) \le \exp\left(-(n-1)\,D\!\left(\tfrac{K}{n-1}\,\middle\|\,p\right)\right) \le \exp\left(-\tfrac{1}{3}npt^2\right)$$
for n sufficiently large, where the last inequality follows by expanding $D(\cdot\|p)$ for small t. Therefore, for n sufficiently large, taking $t = 5\sqrt{\frac{\log n}{np}}$, we get
$$P\left(D_{11} \ge (n-1)p + npt\right) \le \exp\left(-\tfrac{25}{3}\log n\right) \le n^{-8}.$$
Similarly, take L = (n − 1)p − npt, again with $t = 5\sqrt{\frac{\log n}{np}}$. We take n sufficiently large such that t = t(n) < 0.01 (say). Applying the Chernoff bound then yields
$$P\left(D_{11} \le (n-1)p - npt\right) \le \exp\left(-(n-1)\,D\!\left(\tfrac{L}{n-1}\,\middle\|\,p\right)\right) \le \exp\left(-\tfrac{1}{3}npt^2\right) \le n^{-8},$$
where we use the facts that log(1 − x) < −x for x ∈ (0, 1) and log(1 − x) > −x − (3/5)x² for x ∈ (0, 0.01). Hence, we get
$$P\left(\max_{1\le i\le n}\left|D_{ii} - (n-1)p\right| \ge npt\right) \le 2n^{-7}.$$
Since 2αt = 2((n − 1)p − 1)t ≥ npt for n sufficiently large, it follows that
$$P\left(\|E\| \ge 2t\right) \le P\left(\max_{1\le i\le n}\left|D_{ii} - (n-1)p\right| \ge npt\right) \le 2n^{-7}.$$
By the Borel–Cantelli lemma, we have that $\|E\| \le 2t = 10\sqrt{\frac{\log n}{np}} \le 20\sqrt{\frac{\log n}{np}}$ almost surely for all but finitely many n.

□
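This degree-concentration step is easy to test empirically. The sketch below (our own; sample size and number of trials are arbitrary) compares the worst observed deviation of the degrees from their mean with the scale npt used in the proof above.

```python
import numpy as np

rng = np.random.default_rng(7)
n, p, trials = 1000, 0.1, 20
t = 5 * np.sqrt(np.log(n) / (n * p))      # deviation scale from the proof
worst = 0.0
for _ in range(trials):
    U = rng.random((n, n)) < p
    A = np.triu(U, 1).astype(float); A = A + A.T
    deg = A.sum(axis=1)
    worst = max(worst, np.max(np.abs(deg - (n - 1) * p)))
print(worst, n * p * t)   # observed max deviation vs Chernoff scale n*p*t
```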
To show (3.3), we combine Hoeffding's inequality and Lemma 3.5 to prove the following lemma.

Lemma 3.7. Assume 0 < p ≤ p_0 < 1 for a constant p_0 and np/log n → ∞. Then
$$\frac{1}{2n}\|\widetilde{H}_0\|_F^2 + \frac{1}{2n}\|\widetilde{H}_0 + E\|_F^2$$
is bounded almost surely.
Theorem 3.8 (Hoeffding's inequality [Hoe63]). Let β_1, . . ., β_k be independent random variables such that for 1 ≤ i ≤ k we have $P(\beta_i \in [a_i, b_i]) = 1$, and let $S = \sum_{i=1}^{k}\beta_i$. Then for any t > 0,
$$P\left(|S - \mathbb{E}S| \ge t\right) \le 2\exp\left(-\frac{2t^2}{\sum_{i=1}^{k}(b_i - a_i)^2}\right).$$

Proof of Lemma 3.7. Recall that α = (n − 1)p − 1 and
$$\widetilde{H}_0 = \begin{pmatrix} \frac{1}{\sqrt{\alpha}}A & -I \\ I & 0 \end{pmatrix},$$
where A = (a_ij)_{1≤i,j≤n} is the adjacency matrix of an Erdős–Rényi random graph G(n, p). Thus
$$\frac{1}{2n}\|\widetilde{H}_0\|_F^2 = \frac{1}{2n}\left(\frac{1}{\alpha}\sum_{i,j} a_{ij}^2 + 2n\right) = \frac{1}{n\alpha}\sum_{i<j} a_{ij} + 1.$$
To apply Hoeffding's inequality, note that the a_ij (i < j) are iid random variables, each taking the value 1 with probability p and 0 otherwise. Let b_i = 1 and a_i = 0 for all i, and let $k = \binom{n}{2}$, which is the number of random entries in A (recall that the diagonal of A is all zeros by assumption). Letting $S = \sum_{i<j} a_{ij}$, we see that ES = kp, and so
$$P\left(|S - kp| \ge kt\right) \le 2\exp\left(-2kt^2\right).$$
Take t = p. For n sufficiently large, and since p ≥ ω(n) log n/n for some ω(n) > 0 with ω(n) → ∞, we get
$$P\left(|S - kp| \ge kp\right) \le 2\exp\left(-2kp^2\right) \le 2n^{-2}.$$
By the Borel–Cantelli lemma, we conclude that $\frac{1}{2n}\|\widetilde{H}_0\|_F^2$ is bounded almost surely. Since $\|E\|_{\max} = \|E\|$, by the triangle inequality we see
$$\|\widetilde{H}_0 + E\|_F \le \|\widetilde{H}_0\|_F + \|E\|_F \le \|\widetilde{H}_0\|_F + \sqrt{n}\,\|E\|.$$
By Lemma 3.5, we get that $\frac{1}{2n}\|\widetilde{H}_0 + E\|_F^2$ is bounded almost surely. This completes the proof. □

The last part of proving Theorem 3.1 by way of Corollary 3.3 is proving that (3.5) holds with $M_m = \widetilde{H}_0$ and f(z, m) = C_z, a constant depending only on z. The following lemma will be proved by writing a formula for the singular values of $\widetilde{H}_0$ in terms of the eigenvalues of the adjacency matrix A, which are well understood. A number of elementary technical details will be needed to prove that the smallest singular value is bounded away from zero, and these appear in Lemma 3.10.

Lemma 3.9. Let $\widetilde{H}_0$ be as defined in (2.1), and let z be a complex number such that Im(z) ≠ 0 and |z| ≠ 1 (note that these conditions exclude a set of complex numbers of Lebesgue measure zero). Then there exists a constant C_z depending only on z such that $\|(\widetilde{H}_0 - zI)^{-1}\| \le C_z$ with probability 1 for all but finitely many n.
Proof. We will compute all the singular values of $\widetilde{H}_0 - zI$, showing that they are bounded away from zero by a constant depending on z. The proof does not use randomness and depends only on facts about the determinant and singular values and on the structure of $\widetilde{H}_0$; in fact, the proof is the same if $\frac{1}{\sqrt{\alpha}}A$ is replaced by any real symmetric matrix. To find the singular values of $\widetilde{H}_0 - zI$, we will compute the characteristic polynomial χ(w) of $(\widetilde{H}_0 - zI)(\widetilde{H}_0 - zI)^*$, using the definition of $\widetilde{H}_0$ in (2.1). Writing z = a + bi and $B = \frac{1}{\sqrt{\alpha}}A$, we have
$$(\widetilde{H}_0 - zI)(\widetilde{H}_0 - zI)^* = \begin{pmatrix} (B - zI)(B - \bar{z}I) + I & B - 2b\mathrm{i}\,I \\ B + 2b\mathrm{i}\,I & (1 + |z|^2)I \end{pmatrix}.$$
We can use the fact that if a matrix is composed of four n × n square blocks,
$$\begin{pmatrix} W & X \\ Y & Z \end{pmatrix},$$
where Y and Z commute, then its determinant equals det(WZ − XY). Because $B$ is Hermitian, it can be diagonalized with $L = \operatorname{diag}(\lambda_1, \dots, \lambda_n)$, and thus the characteristic polynomial becomes
$$\chi(w) = \det\left(\left((B - z)(B - \bar{z}) + (1-w)I\right)\left(1 + |z|^2 - w\right) - B^2 - 4b^2 I\right) = \prod_{i=1}^{n}\left(\left(q_i + 1 - w\right)\left(1 + |z|^2 - w\right) - \lambda_i^2 - 4b^2\right),$$
where $q_i = (\lambda_i - a)^2 + b^2$. The quadratic factors can then be explicitly factored, showing that each λ_i generates two singular values of $\widetilde{H}_0 - zI$, each being the positive square root of a root of
$$w^2 - \left(q_i + 2 + |z|^2\right)w + \left(q_i + 1\right)\left(1 + |z|^2\right) - \lambda_i^2 - 4b^2 = 0.$$
The proof of Lemma 3.9 is thus completed by Lemma 3.10 (stated and proved below), which shows that the smaller of the two roots is bounded from below by a positive constant depending only on z. □

Lemma 3.10. Let z = a + bi be a complex number satisfying Im(z) ≠ 0 and |z| ≠ 1. Then for any real number λ, the smaller root of the quadratic above (denoted g(λ, a, b²) in Appendix A) satisfies
$$g(\lambda, a, b^2) \ge C_z > 0,$$
where C_z is a positive real constant depending only on z.
The proof of Lemma 3.10 is given in Appendix A using elementary calculus, facts about matrices, and case analysis. Lemma 3.10 completes the proof of Lemma 3.9, and thus of Theorem 3.1.
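The closed-form singular values can be confirmed numerically. The sketch below (our own; the quadratic coefficients follow the commuting-block computation above) compares the SVD of $\widetilde{H}_0 - zI$ with the square roots of the quadratic roots.

```python
import numpy as np

rng = np.random.default_rng(8)
n, p = 120, 0.3
U = rng.random((n, n)) < p
A = np.triu(U, 1).astype(float); A = A + A.T
alpha = (n - 1) * p - 1
I, Z = np.eye(n), np.zeros((n, n))
Ht0 = np.block([[A / np.sqrt(alpha), -I], [I, Z]])

z = 0.3 + 0.7j                      # Im(z) != 0 and |z| != 1
a, b = z.real, z.imag
lam = np.linalg.eigvalsh(A / np.sqrt(alpha))

# Squared singular values of Ht0 - zI: for each lambda, the two roots of
# w^2 - (q + 2 + |z|^2) w + (q + 1)(1 + |z|^2) - lambda^2 - 4 b^2 = 0,
# with q = (lambda - a)^2 + b^2.
pred = []
for l in lam:
    q = (l - a) ** 2 + b ** 2
    roots = np.roots([1.0, -(q + 2 + abs(z) ** 2),
                      (q + 1) * (1 + abs(z) ** 2) - l ** 2 - 4 * b ** 2])
    pred.extend(np.sqrt(roots.real))
pred = np.sort(np.array(pred))

s = np.sort(np.linalg.svd(Ht0 - z * np.eye(2 * n), compute_uv=False))
print(np.max(np.abs(pred - s)))     # ~ 1e-10
```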

Perturbation theory: proving Theorem 1.6
In this section, we study the eigenvalues of H via perturbation theory. Recall from the discussion at the beginning of Section 2 that $\widetilde{H}$ in (2.1) has the same eigenvalues as H/√α. Let us begin by defining the spectral variation of matrices. Denote the eigenvalues of a matrix M by η_i(M). The spectral variation of M + E with respect to M is defined by
$$\operatorname{sv}_{M}(M + E) = \max_{j}\min_{i}\left|\eta_j(M + E) - \eta_i(M)\right|.$$
We will use the following classical perturbation result.

Theorem 4.1 (Bauer–Fike theorem). Let M be a diagonalizable m × m matrix, $M = X\,\operatorname{diag}(\eta_1, \dots, \eta_m)\,X^{-1}$. Then for any m × m matrix E,
$$\operatorname{sv}_{M}(M + E) \le \|X\|\,\|X^{-1}\|\,\|E\|.$$
Moreover, let C_i be the disk centered at η_i(M) with radius $\|X\|\|X^{-1}\|\|E\|$, and let I be a set of indices such that $\cup_{i\in I} C_i$ is disjoint from $\cup_{i\notin I} C_i$. Then the number of eigenvalues of M + E in $\cup_{i\in I} C_i$ is exactly |I|.
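A quick numerical check of the Bauer–Fike bound in this setting (our own sketch; numpy's generic eigendecomposition stands in for the explicit diagonalization of Section 2):

```python
import numpy as np

rng = np.random.default_rng(9)
n, p = 300, 0.3
U = rng.random((n, n)) < p
A = np.triu(U, 1).astype(float); A = A + A.T
D = np.diag(A.sum(axis=1))
I, Z = np.eye(n), np.zeros((n, n))
alpha = (n - 1) * p - 1

Ht  = np.block([[A / np.sqrt(alpha), (I - D) / alpha], [I, Z]])
Ht0 = np.block([[A / np.sqrt(alpha), -I],              [I, Z]])
E = Ht - Ht0

mu0, X = np.linalg.eig(Ht0)            # Ht0 = X diag(mu0) X^{-1}
radius = np.linalg.cond(X) * np.linalg.norm(E, 2)   # Bauer-Fike radius
mu = np.linalg.eigvals(Ht)
sv = max(np.min(np.abs(mu0 - w)) for w in mu)       # spectral variation
print(sv, radius)                      # sv <= radius
```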
We will bound the operator norm of E and the condition number $\|Y\|\|Y^{-1}\|$ of Y to prove Theorem 1.6. By Lemma 3.5, we know that $\|E\| \le 20\sqrt{\frac{\log n}{np}}$ almost surely for all but finitely many n. To bound the condition number of Y, we note that the square of the condition number of Y is equal to the largest eigenvalue of Y Y* divided by the smallest eigenvalue of Y Y*. Using the explicit definition of Y from (2.6), we see from the fact that the v_i are orthonormal that Y Y* is block diagonal,
$$YY^* = \operatorname{diag}\left(Y_1 Y_1^*, \dots, Y_n Y_n^*\right),$$
where the Y_i are 2 × 2 matrices of the following form:
$$Y_i = \frac{1}{\mu_{2i-1} - \mu_{2i}}\begin{pmatrix} \sqrt{1+|\mu_{2i-1}|^2} & -\mu_{2i}\sqrt{1+|\mu_{2i-1}|^2} \\ -\sqrt{1+|\mu_{2i}|^2} & \mu_{2i-1}\sqrt{1+|\mu_{2i}|^2} \end{pmatrix}.$$

Appendix A. Proof of Lemma 3.10

Recall that the squared singular values of $\widetilde{H}_0 - zI$ are the roots of the quadratic computed in the proof of Lemma 3.9, and write g(λ, a, γ) for the smaller root (where we replaced b² by γ in the quantity (A.1)). It suffices to complete the proof if we show that, for all real a and γ satisfying γ > 0 and a² + γ ≠ 1, we have g(λ, a, γ) ≥ C_{a,γ} > 0 for all real λ, where C_{a,γ} is a constant depending only on a and γ. We will prove this by considering three cases and using calculus.
Case I: λ² ≥ 4. In this case we will minimize g over γ > 0. One computes the partial derivative $\frac{\partial}{\partial\gamma} g$ explicitly as 1 minus a fraction; the denominator of that fraction is always greater than 2 since λ² ≥ 4 and γ > 0 by assumption. Thus the derivative $\frac{\partial}{\partial\gamma} g$ is positive for all γ, and hence g (viewed as a function of γ) is strictly increasing in γ, for any a and any |λ| ≥ 2. Also, the second derivative $\frac{\partial^2}{\partial\gamma^2} g$ is always strictly positive, showing that g is concave upwards as a function of γ, for any a and any |λ| ≥ 2. By the above we know that $\frac{\partial}{\partial\gamma} g(\lambda, a, \gamma/2) \ge 1 - \frac{2}{\sqrt{4+4\gamma}}$, and we also know that g(λ, a, γ) ≥ 0 for all λ, a, γ (because it is a squared singular value); thus, using the fact that the tangent line at γ/2 lies below g (since g is concave upwards), we see that
$$g(\lambda, a, \gamma) \ge \frac{\gamma}{2}\left(1 - \frac{2}{\sqrt{4+4\gamma}}\right),$$
which is a positive constant depending only on γ > 0, completing the proof in Case I.

Figure 1. The eigenvalues of H/√α defined in (1.1) and H_0/√α defined in (1.2) for a sample of G(n, p) with n = 500 and different values of p. The blue circles are the eigenvalues of H/√α and the red x's are those of H_0/√α. For comparison, the black dashed line is the unit circle. For the figures from top to bottom and from left to right, the values of p are taken to be p = 0.5, p = 0.1, p = 0.08, and p = 0.05, respectively.