1. Introduction
Random regular graphs, in which every vertex has the same degree d, are among the best-known examples of expanders: graphs with high connectivity that exhibit rapid mixing. Expanders are of particular interest in computer science, from sampling and complexity theory to the design of error-correcting codes. For an extensive review of their applications, see [Reference Hoory, Linial and Wigderson35]. What makes random regular graphs particularly interesting expanders is that they exhibit all three existing types of expansion properties: edge, vertex, and spectral.
The study of regular random graphs took off with the work of [Reference Bender7, Reference Bender and Rodney Canfield8, Reference Bollobás10], and slightly later [Reference McKay52] and [Reference Wormald64]. Most often, their expanding properties are described in terms of the existence of the spectral gap, which we define below.
Let A be the adjacency matrix of a simple graph, where $A_{ij} = 1$ if i and j are connected and zero otherwise. Denote its spectrum by $\sigma(A) = \{\lambda_1 \geq \lambda_2 \geq \ldots \}$. For a random d-regular graph, $\lambda_1 = \max_{i} |\lambda_i| = d$, but the second largest eigenvalue $\eta = \max (|\lambda_2|, |\lambda_{n}| )$ is asymptotically almost surely of much smaller order, leading to a spectral gap. Note that we will always use $\eta$ to denote the second largest eigenvalue of the adjacency matrix A. For a list of important symbols, see Appendix A.
Spectral expansion properties of a graph are, strictly speaking, defined with respect to the smallest nonzero eigenvalue of the normalised Laplacian, $\mathcal{L} = I - D^{-1/2} A D^{-1/2}$, where I is the identity and D is the diagonal matrix of vertex degrees. In the case of a d-regular graph, $\sigma(\mathcal{L})$ is a scaled and shifted version of $\sigma(A)$. Thus, a spectral gap for A translates directly into one for $\mathcal{L}$.
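As a short worked illustration of the "scaled and shifted" statement, assume G is d-regular, so that the degree matrix is $D = dI$. This is a sketch of the standard computation, not a formula quoted from elsewhere in the paper:

```latex
% Assumption: G is d-regular, so the degree matrix is D = d I.
\mathcal{L} = I - D^{-1/2} A D^{-1/2} = I - \tfrac{1}{d} A,
\qquad
\sigma(\mathcal{L}) = \{\, 1 - \lambda_i / d \;:\; \lambda_i \in \sigma(A) \,\}.
% The eigenvalue \lambda_1 = d of A maps to the eigenvalue 0 of \mathcal{L}.
```

In particular, an upper bound on the eigenvalues $\lambda_i \neq d$ of A gives a lower bound on the nonzero eigenvalues of $\mathcal{L}$.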
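The study of the second largest eigenvalue in regular graphs had a first breakthrough in the Alon–Boppana bound [Reference Alon1], which states that the second largest eigenvalue satisfies
\begin{equation*}\eta \geq 2 \sqrt{d-1} - o(1)\end{equation*}
as the number of vertices goes to infinity.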
Graphs for which the Alon–Boppana bound is attained are called Ramanujan. Friedman [Reference Friedman27] proved the conjecture of [Reference Alon1] that almost all d-regular graphs have $\eta \leq 2 \sqrt{d-1} + \epsilon$ for any $\epsilon > 0$ with high probability as the number of vertices goes to infinity. This result was simultaneously simplified and deepened in [Reference Friedman and Kohler29]. More recently, [Reference Bordenave12] gave a different proof that $\eta \leq 2 \sqrt{d-1}+\epsilon_n$ for a sequence $\epsilon_n \rightarrow 0$ as n, the number of vertices, tends to infinity; the new proof is based on the nonbacktracking operator and the Ihara–Bass identity.
1.1. Bipartite biregular model
In this paper, we prove the analog of Friedman and Bordenave’s result for bipartite, biregular random graphs. These are graphs for which the vertex set partitions into two independent sets $V_1$ and $V_2$ , such that all edges occur between the sets. In addition, all vertices in set $V_i$ have the same degree $d_i$ . See Figure 1 for a schematic of such a graph. Along the way, we also bound the smallest positive eigenvalue and the rank of the adjacency matrix.
Let $\mathcal{G}(n,m,d_1,d_2)$ be the uniform distribution on simple, bipartite, biregular random graphs. Any $G \sim \mathcal{G}(n,m,d_1,d_2)$ is sampled uniformly from the set of simple bipartite graphs with vertex set $V=V_1 \cup V_2$, with $|V_1|=n$, $|V_2|=m$ and where every vertex in $V_i$ has degree $d_i$. Note that we must have $n d_1 = m d_2 = |E|$. Without any loss of generality, we will assume $n \leq m$ and thus $d_1 \geq d_2$ when necessary. Sometimes we will write that G is a $(d_1, d_2)$-regular graph, when we want to explicitly state the degrees. Let X be the $n \times m$ matrix with entries $X_{ij}=1$ if and only if there is an edge between vertices $i \in V_1$ and $j \in V_2$. The adjacency matrix then has the block form
\begin{equation*}A = \left(\begin{array}{cc} 0 & X \\ X^* & 0 \end{array}\right). \tag{1}\end{equation*}
It is well known that $\mathcal{G}(n,m,d_1,d_2)$ is connected with high probability, as long as $d_i\geq 3$. From (1), it can be verified that all eigenvalues of A occur in pairs $\lambda$ and $-\lambda$, where $|\lambda|$ is a singular value of X, along with at least $|n - m|$ zero eigenvalues. For these reasons, the second largest eigenvalue is $\eta = \lambda_2(A) = -\lambda_{n+m-1} (A)$. Furthermore, the leading or Perron eigenvalue of A is always $\sqrt{d_1 d_2}$, matched to the left by $-\sqrt{d_1d_2}$, which reduces to the result for d-regular graphs when $d_1 = d_2$.
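The claim about the Perron eigenvalue can be checked by hand on a small example. The sketch below uses a hypothetical $3 \times 6$ biadjacency matrix X with row sums $d_1 = 4$ and column sums $d_2 = 2$ (chosen only for illustration, not taken from the paper) and verifies that $(\sqrt{d_1}\,\mathbf{1}_n, \pm\sqrt{d_2}\,\mathbf{1}_m)$ are eigenvectors of the block matrix with eigenvalues $\pm\sqrt{d_1 d_2}$. The helper name `is_eigenvector` is an assumption of this sketch.

```python
import math

def is_eigenvector(X, v1, v2, lam, tol=1e-9):
    """Check A v = lam v for A = [[0, X], [X^T, 0]] and v = (v1, v2)."""
    n, m = len(X), len(X[0])
    # Upper block of A v is X v2; lower block is X^T v1.
    top = [sum(X[i][j] * v2[j] for j in range(m)) for i in range(n)]
    bottom = [sum(X[i][j] * v1[i] for i in range(n)) for j in range(m)]
    return (all(abs(t - lam * a) < tol for t, a in zip(top, v1))
            and all(abs(b - lam * a) < tol for b, a in zip(bottom, v2)))

# Hypothetical example: row i has ones in columns 2i, ..., 2i+3 (mod 6).
n, m, d1, d2 = 3, 6, 4, 2
X = [[1 if (j - 2 * i) % m < d1 else 0 for j in range(m)] for i in range(n)]

perron = math.sqrt(d1 * d2)
v1 = [math.sqrt(d1)] * n
v2 = [math.sqrt(d2)] * m
plus_ok = is_eigenvector(X, v1, v2, perron)
minus_ok = is_eigenvector(X, v1, [-x for x in v2], -perron)
```

Only the row and column sums of X enter the computation, so the same check passes for any biregular biadjacency matrix.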
We will focus on the spectrum of the adjacency matrix. Similar to the case of the d-regular graph, in the bipartite, biregular graph, the spectrum of the normalized Laplacian is a scaled and shifted version of the spectrum of the adjacency matrix: because of the structure of the graph, $D^{-1/2} A D^{-1/2} = \frac{1}{\sqrt{d_1 d_2}} A$. Therefore, a spectral gap for A again implies that one exists for $\mathcal{L}$.
Previous work on bipartite, biregular graphs includes the work of [Reference Feng and Li25] and [Reference Li and Solé43], who proved the analog of the Alon–Boppana bound. For every $\epsilon >0$,
\begin{equation*}\eta \geq \sqrt{d_1-1} + \sqrt{d_2-1} - \epsilon \tag{2}\end{equation*}
as the number of vertices goes to infinity. This bound also follows immediately from the fact that the second largest eigenvalue cannot be asymptotically smaller than the right limit of the asymptotic support for the eigenvalue distribution, which is $\sqrt{d_1-1} + \sqrt{d_2 - 1}$ and was first computed by [Reference Godsil and Mohar32]. They found that the spectral measure $\mu(\lambda)$ has a point mass at $\lambda=0$ of size $\frac{1}{2} |d_1 - d_2| / (d_1 + d_2)$ and a continuous part whose density is supported on
\begin{equation*}|\lambda| \in \left[ \sqrt{d_1-1} - \sqrt{d_2-1},\ \sqrt{d_1-1} + \sqrt{d_2-1} \right].\end{equation*}
Graphs where $\eta$ attains the Alon–Boppana bound, equation (2), are also called Ramanujan. Complete graphs are always Ramanujan but not sparse, whereas d-regular or bipartite $(d_1, d_2)$-regular graphs are sparse. Our results show that almost every $(d_1, d_2)$-regular graph is ‘almost’ Ramanujan.
Beyond the first two eigenvalues, we should mention that [Reference Bordenave and Lelarge13] studied the limiting spectral distribution of large sparse graphs. They obtained a set of two coupled equations that can be solved for the eigenvalue distribution of any $(d_1, d_2)$-regular random graph. The solution of the coupled equations for fixed $d_1$ and $d_2$ shows convergence of the spectral distribution of a random regular bipartite graph to the Marčenko–Pastur law. This was first observed by [Reference Godsil and Mohar32]. For $d_1, d_2 \to \infty$ with $d_1/d_2$ converging to a constant, [Reference Dumitriu and Johnson24] showed that the limiting spectral distribution converges to a transformed version of the Marčenko–Pastur law. When $d_1 = d_2 = d$, this is equal to the Kesten–McKay distribution [Reference McKay51], which becomes the semicircular law as $d \to \infty$ [Reference Godsil and Mohar32, Reference Dumitriu and Johnson24]. Notably, [Reference Mizuno and Sato53] obtained the same results when they calculated the asymptotic distribution of eigenvalues for bipartite, biregular graphs of high girth. However, their results are not applicable to random bipartite biregular graphs as these asymptotically almost surely have low girth [Reference Dumitriu and Johnson24].
Our techniques borrow heavily from the results of [Reference Bordenave, Lelarge and Massoulié14] and [Reference Bordenave12], who simplified the trace method of [Reference Friedman27] by counting nonbacktracking walks built up of segments with at most one cycle, and by relating the eigenvalues of the adjacency matrix to the eigenvalues of the nonbacktracking one via the Ihara–Bass identity. The combinatorial methods we use to bound the number of such walks are similar to how [Reference Brito, Dumitriu, Ganguly, Hoffman and Tran15] counted self-avoiding walks in the context of community recovery in a regular stochastic block model.
Finally, we should mention that similar techniques have been employed by [Reference Coste20] to study the spectral gap of the Markov matrix of a random directed multigraph. The nonbacktracking operator of a bipartite biregular graph could be seen as the adjacency matrix of a directed multigraph, whose eigenvalues are a simple scaling away from the eigenvalues of the Markov matrix of the same. However, the block structure of our nonbacktracking matrix means that the corresponding multigraph is bipartite, and this makes it different from the model used in [Reference Coste20].
1.2. Configuration versus random lift model
Random lifts are a model that allows the construction of large, random graphs by repeatedly lifting the vertices of a base graph and permuting the endpoints of copied edges. See [Reference Bordenave12] for a recent overview. A number of spectral gap results have been obtained for random lift models, e.g. [Reference Friedman27, Reference Angel, Friedman and Hoory2, Reference Friedman and Kohler29, Reference Bordenave12].
Random lift models are contiguous with the configuration model in very particular cases. See Section 4.1 for a definition of the configuration model; this is a useful substitute for the uniform model and is practically equivalent. For even d, random n-lifts of a single vertex with $d/2$ self-loops are equivalent to the d-regular configuration model. For odd d, no equivalent lift construction is known or even believed to exist.
For $(d_1, d_2)$-biregular, bipartite graphs, the situation is more complicated. A celebrated result due to [Reference Marcus, Spielman and Srivastava48] showed the existence of infinite families of $(d_1, d_2)$-regular bipartite graphs that are Ramanujan, that is, with $\eta = \sqrt{d_1-1} + \sqrt{d_2 - 1}$, by taking repeated lifts of the complete bipartite graph $K_{d_1,d_2}$ on $d_1$ left and $d_2$ right vertices. If $d_1=d_2=d$, then the configuration model is contiguous to the random lift of the multigraph with two vertices and d edges connecting them. Certainly, for a biregular bipartite graph with $n/d_2=m/d_1=k$ not an integer, we cannot construct it by lifting $K_{d_1, d_2}$ as considered by [Reference Marcus, Spielman and Srivastava48]. But even for integer k, it seems likely the two models are not contiguous, for the reasons we now explain.
Suppose there were a base graph G that could be lifted to produce any (3,2)-biregular, bipartite graph. Consider another graph H which is a union of 2 complete bipartite graphs $K_{2,3}$. Then H is a (3,2)-biregular, bipartite graph and occurs in the configuration model with nonzero probability. The only G that H could be a lift of is $K_{2,3}$, because it is a disconnected union and $K_{2,3}$ itself is not a lift of any graph or multigraph (note that $2+3=5$ is prime). Therefore, G would have to be $K_{2,3}$. Figure 2 shows an example of another graph H’ with the same number of vertices as H which is (3,2)-biregular, bipartite but is not a lift of $K_{2,3}$. Now, H and H’ both occur in the configuration model with equal, nonzero probability. Therefore, we cannot construct every example of a (3,2)-biregular, bipartite graph by repeatedly lifting a single base graph G.
Since the eventual goal of any argument based on lifts that also applies to the configuration model would have to show that almost all bipartite, biregular graphs can be obtained by lifting and are sampled asymptotically uniformly from the lift model, the above considerations suggest this argument would be highly nontrivial. We in fact doubt such an argument can be made. Intuitively, in the configuration model edges occur ‘nearly independently,’ whereas for random lifts there are strong dependencies due to the fact that many edges are not allowed; see [Reference Bordenave12].
1.3. Structure of the paper
Briefly, we now lay out the method of proof that the bipartite, biregular random graph is Ramanujan. The proof outline is given in detail in Section 5.1, after some important preliminary terms and definitions given in Section 4. The bulk of our work builds to Theorem 3.1, which is actually a bound on the second eigenvalue of the nonbacktracking matrix B, as explained in Section 2. The Ramanujan bound on the second eigenvalue of A then follows as Theorem 3.2. As a side result, we find that row- and column-regular, rectangular matrices (the off-diagonal block X of the adjacency matrix in equation (1)) with aspect ratio smaller than one ($d_1 \neq d_2$) have full rank with high probability.
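To find the second eigenvalue of B, we subtract from it a matrix S that is formed from the leading eigenvectors and examine the spectral norm of the ‘almost-centered’ matrix $\bar{B} = B - S$. We then proceed to use the trace method to bound the spectral norm of the matrix $\bar{B}^\ell$ by its trace. However, since $\bar{B}$ is not positive definite, this leads us to consider
\begin{equation*}\mathbb{E} \left( \| \bar{B}^\ell \|^{2k} \right) \leq \mathbb{E} \left( \textrm{Tr} \left[ \left( \bar{B}^\ell (\bar{B}^\ell)^* \right)^k \right] \right).\end{equation*}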
On the right-hand side, the terms in $\bar{B}^\ell$ refer to circuits built up of 2k segments, each of length $\ell+1$, since an entry $B_{ef}$ is a walk on two edges. Because the degrees are bounded, it turns out that, for $\ell = O(\log (n))$, the depth-$\ell$ neighbourhoods of every vertex contain at most one cycle; they are ‘tangle-free.’ Thus, we can bound the trace by computing the expectation of the circuits that contribute, along with an upper bound on their multiplicity, taking each segment to be $\ell$-tangle-free.
Finally, to demonstrate the usefulness of the spectral gap, we highlight three applications of our bound. In Section 6, we show a community detection application. Finding communities in networks is important in areas such as social networks, bioinformatics, and neuroscience, among others. Random graphs offer tractable models to study when detection and recovery are possible.
We show here how our results lead to community detection in regular stochastic block models with arbitrary numbers of groups, using a very general theorem by [Reference Wan, Meilă, Cortes, Lawrence, Lee, Sugiyama and Garnett62]. Previously, [Reference Newman and Martin54] studied the spectral density of such models, and the community detection problem of the special case of two groups was previously studied by [Reference Brito, Dumitriu, Ganguly, Hoffman and Tran15] and [Reference Brito, Dumitriu, Ganguly, Hoffman and Tran15].
In Section 7, we examine the application to linear error-correcting codes built from sparse expander graphs. This concept was first introduced by [Reference Gallager30], who explicitly used random bipartite biregular graphs. These ‘low-density parity-check’ codes enjoyed a renaissance in the 1990s, when people realised they were well suited to modern computers. For an overview, see [Reference Richardson and Urbanke57, Reference Richardson and Urbanke58]. Our result yields an explicit lower bound on the minimum distance of such codes, i.e. the number of errors that can be corrected.
The final application, in Section 8, leads to generalised error bounds for matrix completion. Matrix completion is the problem of reconstructing a matrix from observations of a subset of entries. Heiman et al. [Reference Heiman, Schechtman and Shraibman33] gave an algorithm for reconstruction of a square matrix with low complexity as measured by a norm $\gamma_2$, which is similar to the trace norm (sum of the singular values, also called the nuclear norm or Ky Fan n-norm). The entries that are observed are at the nonzero entries of the adjacency matrix of a bipartite, biregular graph. The error of the reconstruction is bounded above by a factor which is proportional to the ratio of the leading two eigenvalues, so that a graph with larger spectral gap has a smaller generalization error. We extend their results to rectangular graphs, along the way strengthening them by a constant factor of two. The main result of the paper gives an explicit bound in terms of $d_1$ and $d_2$.
As this paper was being prepared for submission, we became aware of the work of [Reference Deshpande, Montanari, O’Donnell, Schramm and Sen22]. In their interesting paper, they use the smallest positive eigenvalue of a random bipartite lift to study convex relaxation techniques for random not-all-equal-3SAT problems. It seems that our main result addresses the configuration model version of this constraint satisfaction problem, the first open question listed at the end of [Reference Deshpande, Montanari, O’Donnell, Schramm and Sen22].
2. Nonbacktracking matrix B
Given $G \sim \mathcal{G}(n,m,d_1,d_2)$, we define the nonbacktracking operator B. This operator is a linear endomorphism of $\mathbb{R}^{|\vec{E}|}$, where $\vec{E}$ is the set of oriented edges of G and $|\vec{E}| = 2|E|$. Throughout this paper, we will use V(H), E(H), and $\vec{E}(H)$ to denote the vertices, edges, and oriented or directed edges of a graph, subgraph, or path H. For oriented edges $e=(u,v)$, where u and v are the starting and ending vertices of e, and $f=(s,t)$, define:
\begin{equation*}B_{ef} = \begin{cases} 1, & \text{if } v = s \text{ and } u \neq t; \\ 0, & \text{otherwise.} \end{cases}\end{equation*}
We order the elements of $\vec{E}$ as $\{e_1, e_2,\cdots,e_{2|E|}\}$, so that the first $|E|$ have end point in the set $V_2$. In this way, we can write
\begin{equation*}B = \left(\begin{array}{cc} 0 & B^{(12)} \\ B^{(21)} & 0 \end{array}\right)\end{equation*}
for $|E| \times |E|$ matrices $B^{(12)}$ and $B^{(21)}$ with entries equal to 0 or 1.
We are interested in the spectrum of B. Denote by $\textbf{1}_{\alpha}$ the vector with first $|E|$ coordinates equal to 1 and the last $|E|$ equal to $\alpha = \sqrt{d_1-1}/\sqrt{d_2-1}$. We can check that
\begin{equation*}B \textbf{1}_{\alpha} = \lambda \textbf{1}_{\alpha}\end{equation*}
for $\lambda=\sqrt{(d_1-1)(d_2-1)}$. By the Perron–Frobenius Theorem, we conclude that $\lambda_1=\lambda$ and the associated eigenspace has dimension one. Also, one can check that if $\lambda$ is an eigenvalue of B with eigenvector $v=(v_1,v_2)$, $v_i\in \mathbb{R}^{|E|}$, then $-\lambda$ is also an eigenvalue with eigenvector $v'=(v_1,-v_2)$. Thus, $\sigma(B)=-\sigma(B)$ and $\lambda_{2|E|}=-\lambda_1$.
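The eigenvector claim for the nonbacktracking matrix can be sanity-checked on a small deterministic graph. The following is a minimal sketch, assuming the usual definition ($B_{ef} = 1$ when f starts where e ends and f is not the reversal of e) and using the complete bipartite graph $K_{3,4}$ as a stand-in for a $(d_1, d_2) = (4, 3)$ biregular graph; the function names are hypothetical.

```python
import math

def oriented_edges(edges):
    """Oriented edges of a bipartite graph given as (i, j) pairs:
    first those ending in V2, then those ending in V1."""
    forward = [(("v", i), ("w", j)) for i, j in edges]   # end in V2
    backward = [(("w", j), ("v", i)) for i, j in edges]  # end in V1
    return forward + backward

def nonbacktracking(oriented):
    """B[e][f] = 1 when f starts where e ends and does not return to e's start."""
    size = len(oriented)
    B = [[0] * size for _ in range(size)]
    for a, (u, v) in enumerate(oriented):
        for b, (s, t) in enumerate(oriented):
            if v == s and u != t:
                B[a][b] = 1
    return B

def residual(B, vec, lam):
    """Largest entry of |B vec - lam vec|."""
    size = len(vec)
    image = [sum(B[a][b] * vec[b] for b in range(size)) for a in range(size)]
    return max(abs(x - lam * y) for x, y in zip(image, vec))

# Hypothetical example: K_{3,4}, so n = 3, m = 4, d1 = 4, d2 = 3.
n, m = 3, 4
d1, d2 = m, n
edges = [(i, j) for i in range(n) for j in range(m)]
E = len(edges)
B = nonbacktracking(oriented_edges(edges))

alpha = math.sqrt(d1 - 1) / math.sqrt(d2 - 1)
lam = math.sqrt((d1 - 1) * (d2 - 1))
res_plus = residual(B, [1.0] * E + [alpha] * E, lam)     # eigenvalue +lam
res_minus = residual(B, [1.0] * E + [-alpha] * E, -lam)  # eigenvalue -lam
```

Both residuals vanish up to rounding, which also illustrates the symmetry of the spectrum: flipping the sign of the second block of coordinates turns the eigenvector for $\lambda$ into one for $-\lambda$.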
2.1. Connecting the spectra of A and B
Understanding the spectrum of B turns out to be a challenging question. A useful result in this direction is the following theorem proved by [Reference Bass6], and subsequently in [Reference Watanabe and Fukumizu63] and [Reference Kotani and Sunada40]; see also Theorem 3.3 in [Reference Angel, Friedman and Hoory3].
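Theorem 2.1 (Ihara–Bass formula). Let $G=(V,E)$ be any finite graph and B be its nonbacktracking matrix. Then
\begin{equation*}\det(B - \lambda I) = (\lambda^2 - 1)^{|E| - |V|} \det(D - \lambda A + \lambda^2 I),\end{equation*}
where D is the diagonal matrix with $D_{vv}=d_v - 1$ and A is the adjacency matrix of G.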
We use the Ihara–Bass formula to analyse the relationship of the spectrum of B to the spectrum of A in the case of a bipartite biregular graph. It will turn out that this relationship can be completely unpacked. From Theorem 2.1, we get that
\begin{equation*}\sigma(B) = \{\pm 1\} \cup \{\lambda : \det(D - \lambda A + \lambda^2 I) = 0\}.\end{equation*}
Note that there are precisely $2(m+n)$ eigenvalues of B that are determined by A, and that $\lambda = 0$ is not in the spectrum of B, since the graph has no isolated vertices ($\det(D) \neq 0$).
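We use the special structure of G to get a more precise description of $\sigma(B)$. The matrices A and D are equal to:
\begin{equation*}A = \left(\begin{array}{cc} 0 & X \\ X^* & 0 \end{array}\right), \qquad D = \left(\begin{array}{cc} (d_1-1) I_n & 0 \\ 0 & (d_2-1) I_m \end{array}\right),\end{equation*}
where $I_k$ is the $k\times k$ identity matrix. Let $\lambda\in \sigma(B)\backslash\{-1,1\}$. Then there exists a nonzero vector v such that
\begin{equation*}(D - \lambda A + \lambda^2 I) v = 0.\end{equation*}
Writing $v=(v_1,v_2)$ with $v_1\in \mathbb{C}^n$, $v_2\in \mathbb{C}^m$, we obtain:
\begin{equation*}(d_1 - 1 + \lambda^2) v_1 = \lambda X v_2, \tag{4}\end{equation*}
\begin{equation*}(d_2 - 1 + \lambda^2) v_2 = \lambda X^* v_1. \tag{5}\end{equation*}
The above imply that, provided that the right-hand side is nonzero,
\begin{equation*}\xi^2 = \frac{(d_1 - 1 + \lambda^2)(d_2 - 1 + \lambda^2)}{\lambda^2} \tag{6}\end{equation*}
is a nonzero eigenvalue of both $XX^*$, with eigenvector $v_1$, and $X^*X$, with eigenvector $v_2$. We can rewrite equation (6) as
\begin{equation*}\lambda^4 - (\xi^2 - d_1 - d_2 + 2) \lambda^2 + (d_1-1)(d_2-1) = 0. \tag{7}\end{equation*}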
We will now detail how the eigenvalues of A (denoted $\xi$ here) map to eigenvalues of B and vice versa. Let us examine the special case $\xi = 0$. Assume $n \leq m$ for simplicity. Assume that the rank of X is r. Then X has $m-r$ independent vectors in its nullspace. Let u be one such vector. Now, if we pick $v_2 = u$, $v_1 = 0$, and $\lambda = \pm i \sqrt{d_2-1}$, equations (4) and (5) are satisfied. Hence, $\pm i \sqrt{d_2-1}$ are eigenvalues of B, both with multiplicity $m-r$.
Since the rank of X is r, it follows that the nullity of $X^*$ is $n-r$, so there are $n-r$ independent vectors w for which $X^*w = 0$. Now, note that picking $v_1 = w$, $v_2 = 0$, and $\lambda = \pm i \sqrt{d_1 -1}$, we satisfy equations (4) and (5). Thus, $\pm i \sqrt{d_1-1}$ are eigenvalues of B, both with multiplicity $n-r$.
The remaining 4r eigenvalues of B determined by A come from nonzero eigenvalues of A. For each $\xi^2$ with $\xi$ a nonzero eigenvalue of A, we will have precisely 4 complex solutions to equation (6). Since there are 2r such eigenvalues, coming in pairs $\pm \xi$ , they determine a total of 4r eigenvalues of B, and the count is complete. To summarise the discussion above, we have the following Lemma:
Lemma 2.2. Any eigenvalue of B belongs to one of the following categories:

1. $\pm 1$ are both eigenvalues with multiplicities $|E|-|V| = nd_1 - m - n$,

2. $\pm i \sqrt{d_1-1}$ are eigenvalues with multiplicities $n-r$, where r is the rank of the matrix X,

3. $\pm i \sqrt{d_2-1}$ are eigenvalues with multiplicities $m-r$, and

4. every pair of nonzero eigenvalues $(-\xi, \xi)$ of A generates exactly 4 eigenvalues of B.
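As a consistency check on the four categories, the multiplicities can be added up and compared with the dimension $2|E|$ of B. This short computation is a sketch that assumes $|V| = n + m$ and that X has r nonzero singular values, so that A has 2r nonzero eigenvalues forming r pairs:

```latex
% Multiplicities from items 1-4, using |V| = n + m:
\underbrace{2(|E| - |V|)}_{\pm 1}
+ \underbrace{2(n - r)}_{\pm i\sqrt{d_1 - 1}}
+ \underbrace{2(m - r)}_{\pm i\sqrt{d_2 - 1}}
+ \underbrace{4r}_{r \text{ pairs } \pm\xi}
= 2|E| - 2(n + m) + 2(n + m) = 2|E|.
```

The last three terms sum to $2(m+n)$, the number of eigenvalues of B determined by A.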
3. Main result
We spend the bulk of this paper in the proof of the following:
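Theorem 3.1. If B is the nonbacktracking matrix of a bipartite, biregular random graph $G \sim \mathcal{G}(n,m,d_1,d_2)$, then its second largest eigenvalue satisfies
\begin{equation*}|\lambda_2(B)| \leq ((d_1-1)(d_2-1))^{1/4} + \epsilon_n\end{equation*}
asymptotically almost surely, with $\epsilon_n \to 0$ as $n \to \infty$. Equivalently, there exists a sequence $\epsilon_n \rightarrow 0$ as $n \rightarrow \infty$ so that
\begin{equation*}\mathbb{P} \left[ |\lambda_2(B)| - ((d_1-1)(d_2-1))^{1/4} > \epsilon_n \right] \to 0 \quad \text{as } n \to \infty.\end{equation*}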
Remark. For the random lift model, Theorem 3.1 was proved by [Reference Bordenave12], which applies to random bipartite graphs only when $d_1=d_2=d$ as discussed in Section 1.2.
We combine Theorems 2.1 and 3.1 to prove our main result concerning the spectrum of A.
Theorem 3.2 (Spectral gap). Let $A=\left(\begin{array}{cc} 0 & X \\X^* & 0 \end{array}\right )$ be the adjacency matrix of a bipartite, biregular random graph $G \sim \mathcal{G}(n,m,d_1,d_2)$ . Without loss of generality, assume $d_1 \geq d_2$ or, equivalently, $n \leq m$ . Then:

i. Its second largest eigenvalue $\eta = \lambda_2 (A)$ satisfies
\begin{equation*}\eta \leq \sqrt{d_1-1}+\sqrt{d_2-1}+ \epsilon^{\prime}_n\end{equation*}
asymptotically almost surely, with $\epsilon^{\prime}_n \to 0$ as $n \to \infty$.
ii. Its smallest positive eigenvalue $\eta^+_{\textrm{min}} = \min ( \{\lambda \in \sigma(A): \lambda > 0\} )$ satisfies
\begin{equation*}\eta_{\textrm{min}}^+\geq \sqrt{d_1-1}-\sqrt{d_2-1}- \epsilon^{\prime\prime}_n\end{equation*}
asymptotically almost surely, with $\epsilon^{\prime\prime}_n\to 0$ as $n \to \infty$. (Note that this will be almost surely positive if $d_1>d_2$; no further information is gained if $d_1 = d_2$.)
iii. If $d_1 \neq d_2$, the rank of X is n with high probability.
Remark. Since the first draft of this work came out, considerable advances have been made regarding the question of singularity of random regular graphs. It was conjectured in [Reference Costello and Vu21] that, for $3\leq d\leq n-3$, the adjacency matrix of uniform d-regular graphs is not singular with high probability as n grows to infinity. For directed d-regular graphs and growing d, this is now known to be true, following the results of [Reference Cook19] and [Reference Litvak, Lytova, Tikhomirov, Tomczak-Jaegermann and Youssef45, Reference Litvak, Lytova, Tikhomirov, Tomczak-Jaegermann and Youssef46]. For constant degree d, [Reference Huang36, Reference Huang37] proved the asymptotic nonsingularity of the adjacency matrix for both undirected and directed d-regular graphs. The last case can be interpreted as singularity of the adjacency matrix of a random d-regular bipartite graph. To the best of our knowledge, Theorem 3.2(iii) is the first result concerning the rank of rectangular random matrices with $d_1$ nonzero entries in each row and $d_2$ in each column.
Remark. The analysis of the Ihara–Bass formula for Markov matrices of bipartite biregular graphs appeared before in [Reference Kempton39]. We have independently proven Lemma 2.2 and extracted from it more information than is given in [Reference Kempton39], including Theorem 3.2(iii).
Proof. Equation (7), describing those eigenvalues of B which are not $\pm 1$ and do not correspond to zero eigenvalues of A, is equivalent to
\begin{equation*}x^2 - (y - \alpha - \beta) x + \alpha \beta = 0, \tag{8}\end{equation*}
where $x = \lambda^2$, $y = \xi^2$, $\alpha = d_1 - 1$, and $\beta = d_2 -1$. A simple discriminant calculation and analysis of equation (8), keeping in mind that $y \neq 0$, leads to a number of cases in terms of y:

Case 1: $y \in ((\sqrt{\alpha} -\sqrt{\beta})^2, (\sqrt{\alpha} + \sqrt{\beta})^2)$, i.e. roughly speaking, $\xi$ is in the bulk. Then x lies on the circle of radius $\sqrt{\alpha \beta}$ and the corresponding pair of eigenvalues $\lambda$ lies on the circle of radius $(\alpha \beta)^{1/4}$.

Case 2: $y \in (0, (\sqrt{\alpha} - \sqrt{\beta})^2]$. Then x is real and negative, so $\lambda$ is purely imaginary.
In this case, one may also show that the smaller of the two possible values for x is increasing as a function of y and $x_{-} \in (-\alpha, -\sqrt{\alpha \beta}]$. The larger of the two values of x is decreasing and $x_{+} \in [-\sqrt{\alpha \beta}, - \beta)$. Correspondingly, since $\lambda^2 = x$, the absolute value of $\lambda$ in this case stays below $\alpha^{1/2} = (d_1-1)^{1/2}$, a value that is approached as $y \to 0$.

Case 3: $y \geq (\sqrt{\alpha} + \sqrt{\beta})^2$. Then both solutions $x_{\pm}$ are real, and the larger of the two is at least $\sqrt{\alpha \beta}$.
Note that equation (8) shows there is a continuous dependence between x and y, and consequently between $\xi$ and $\lambda$ . Putting these cases together with Lemma 2.2, a few things become apparent:

1. $|\xi|>\sqrt{d_1-1}+ \sqrt{d_2-1}$ means that $|\lambda|> ((d_1-1)(d_2-1))^{1/4}$.

2. $|\lambda_2| \leq ((d_1-1)(d_2-1))^{1/4} + \epsilon$ implies that all eigenvalues of A except for the largest two will be either 0, or in a small neighbourhood $[\sqrt{\alpha} -\sqrt{\beta} - \delta,\sqrt{\alpha} + \sqrt{\beta} + \delta]$ of the bulk, with $\delta$ small if $\epsilon$ is small since the dependence of $\delta$ on $\epsilon$ can be deduced from equation (8).

3. $|\lambda_2| \leq ((d_1-1)(d_2-1))^{1/4} + \epsilon$ with high probability implies that if $d_1 \neq d_2$, $r=n$ with high probability. Otherwise, we would have eigenvalues of B with absolute value $\sqrt{d_1-1}$ and this is larger than $((d_1-1)(d_2-1))^{1/4}$.
This completes the proof, with results (i) and (ii) following from point 2 and (iii) following from point 3.
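The case analysis in the proof can be explored numerically. The sketch below assumes that the quadratic relating $x = \lambda^2$ and $y = \xi^2$ is $x^2 - (y - \alpha - \beta)x + \alpha\beta = 0$ with $\alpha = d_1 - 1$ and $\beta = d_2 - 1$, which is what the Ihara–Bass determinant gives for the block matrices A and D; the function names and the degrees $(7, 3)$ are chosen for illustration only.

```python
import cmath
import math

def nb_eigenvalues_from_xi(xi, d1, d2):
    """Four values of lambda obtained from a nonzero eigenvalue xi of A."""
    alpha, beta = d1 - 1, d2 - 1
    b = xi * xi - alpha - beta
    disc = cmath.sqrt(b * b - 4 * alpha * beta)
    lams = []
    for x in ((b + disc) / 2, (b - disc) / 2):  # the two roots x = lambda^2
        root = cmath.sqrt(x)
        lams.extend([root, -root])
    return lams

def classify(xi, d1, d2):
    """Which of the three cases y = xi^2 falls into."""
    a, b = math.sqrt(d1 - 1), math.sqrt(d2 - 1)
    y = xi * xi
    if y >= (a + b) ** 2:
        return "case 3"  # real x, real lambda
    if y > (a - b) ** 2:
        return "case 1"  # lambda on the circle of radius (alpha*beta)^(1/4)
    return "case 2"      # negative x, purely imaginary lambda

d1, d2 = 7, 3
radius = ((d1 - 1) * (d2 - 1)) ** 0.25
perron = nb_eigenvalues_from_xi(math.sqrt(d1 * d2), d1, d2)
bulk = nb_eigenvalues_from_xi(3.0, d1, d2)
below = nb_eigenvalues_from_xi(0.5, d1, d2)
```

For the Perron eigenvalue $\xi = \sqrt{d_1 d_2}$ the quadratic factors as $(x - \alpha\beta)(x - 1)$, so the four values are $\pm\sqrt{(d_1-1)(d_2-1)}$ and $\pm 1$. A value of $\xi$ inside the bulk gives four points on the circle of radius $((d_1-1)(d_2-1))^{1/4}$, and a value below the bulk gives purely imaginary $\lambda$.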
In Figure 3, we depict the spectra of A and B for a sample graph $G \sim \mathcal{G}(120,280,7,3)$. Looking at the nonbacktracking spectrum, we observe the two leading eigenvalues $\pm \sqrt{(d_1 - 1 )(d_2 - 1)}$ (blue crosses) outside the circle of radius $((d_1 - 1)(d_2 - 1))^{1/4}$ along with a number of zero eigenvalues (black dots). There are also multiple purely imaginary eigenvalues which can arise from $\xi \in (0, \sqrt{d_1 - 1} - \sqrt{d_2 - 1}]$ as well as $\xi = 0$. However, due to Theorem 3.2, only the smaller of $i \sqrt{d_1 - 1}$ and $i \sqrt{d_2 - 1}$ is observed with nonnegligible probability, implying that X has rank $r = n$ with high probability (shown as blue stars). Furthermore, we observe two pairs of real eigenvalues of B which are connected to a pair of eigenvalues of A from ‘above’ the bulk, as well as two pairs of imaginary eigenvalues of B which are connected to a pair of eigenvalues of A from ‘below’ the bulk.
4. Preliminaries
We describe the standard configuration model for constructing such graphs. We then define the ‘tangle-free’ property of random graphs. Since small enough neighbourhoods are tangle-free with high probability, we only need to count tangle-free paths when we eventually employ the trace method.
4.1. The configuration model
The configuration or permutation model is a practical procedure to sample random graphs with a given degree distribution. Let us recall its definition for bipartite biregular graphs. Let $V_1=\{v_1,v_2,\dots,v_n\}$ and $V_2=\{w_1,w_2,\dots, w_m\}$ be the vertices of the graph. We define the set of half edges out of $V_1$ to be the collection of ordered pairs
\begin{equation*}\vec{E}_1 = \{(v_i, j) : 1 \leq i \leq n,\ 1 \leq j \leq d_1\},\end{equation*}
and analogously the set of half edges out of $V_2$:
\begin{equation*}\vec{E}_2 = \{(w_i, j) : 1 \leq i \leq m,\ 1 \leq j \leq d_2\};\end{equation*}
see Figure 1. Note that $|\vec{E}_1|=|\vec{E}_2|=nd_1=md_2 = |E|$. To sample a graph, we choose a random permutation $\pi$ of $[n d_1]$. We put an edge between $v_i$ and $w_j$ in G whenever
\begin{equation*}\pi((i-1)d_1 + s) = (j-1)d_2 + t\end{equation*}
for any pair of values $1\leq s\leq d_1$, $1\leq t\leq d_2$. For specific half edges $e = (v_i, j)$ and $f = (w_s, t)$, we use the notation $\pi(e) = f$ as shorthand for $\pi((i-1)d_1 + j) = (s-1)d_2 + t$ and say that “e matches to f.”
The graph obtained may not be simple, since multiple half edges may be matched between any pair of vertices. However, conditioning on a simple graph outcome, the distribution is uniform in the set of all simple bipartite biregular graphs. Furthermore, for fixed $d_1, d_2$ and $n,m \to \infty$ , the probability of getting a simple graph is bounded away from zero [Reference Bollobás11].
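The sampling procedure can be sketched in a few lines. This is a minimal illustration using 0-based indices, so half edge number a on the left belongs to vertex a // d1 and half edge number b on the right belongs to vertex b // d2; the function names and the parameters in the example are assumptions of this sketch, not part of the paper.

```python
import random
from collections import Counter

def sample_configuration_model(n, m, d1, d2, seed=None):
    """Sample a bipartite biregular multigraph as a list of (i, j) pairs,
    meaning an edge between v_i and w_j (0-based indices)."""
    if n * d1 != m * d2:
        raise ValueError("need n * d1 == m * d2")
    rng = random.Random(seed)
    num_half_edges = n * d1
    pi = list(range(num_half_edges))
    rng.shuffle(pi)  # uniform random permutation of the half edges
    # Half edge a of V1 is matched to half edge pi[a] of V2.
    return [(a // d1, pi[a] // d2) for a in range(num_half_edges)]

def is_simple(edges):
    """True when no pair of vertices is joined by more than one edge."""
    return len(set(edges)) == len(edges)

edges = sample_configuration_model(n=6, m=9, d1=3, d2=2, seed=0)
left_degrees = Counter(i for i, _ in edges)
right_degrees = Counter(j for _, j in edges)
```

Every output has the prescribed degrees by construction. To sample from the uniform model on simple graphs, one would redraw until `is_simple(edges)` holds, which corresponds to the conditioning described above.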
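Consider the random matrix $B \in \mathbb{R}^{2 |E|\times 2 |E|}$ whose first $|E|$ rows are indexed by the elements of $\vec{E}_1$ and the last $|E|$ rows are indexed by those of $\vec{E}_2$, in lexicographic order. Columns are indexed in the same way. Entry $B_{ef}$ with $e=(v_i,j) \in \vec{E}_1$ and $f=(w_s,t) \in \vec{E}_2$ is defined as
\begin{equation*}B_{ef} = \begin{cases} 1, & \text{if } \pi(e) = (w_s, t') \text{ for some } t' \neq t; \\ 0, & \text{otherwise.} \end{cases}\end{equation*}
This defines the upper half of B. We define the lower half similarly, by putting
\begin{equation*}B_{fe} = \begin{cases} 1, & \text{if } \pi^{-1}(f) = (v_i, j') \text{ for some } j' \neq j; \\ 0, & \text{otherwise.} \end{cases}\end{equation*}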
This is the same definition used in [Reference Bordenave12]. In words, it says that the directed edge given by e followed by the directed edge given by f are connected by some half edge $f' = (w_s, t')$ , and the path they form does not backtrack. This is therefore the same matrix introduced in Section 2, ordered according to the half edges. Notice that the randomness comes from the matching only.
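We consider two symmetric matrices $M = M(\pi)$ and N, indexed the same as B, and defined by:
\begin{equation*}M_{ef} = M_{fe} = \begin{cases} 1, & \text{if } e \in \vec{E}_1,\ f \in \vec{E}_2 \text{ and } \pi(e) = f; \\ 0, & \text{otherwise,} \end{cases}\end{equation*}
and
\begin{equation*}N_{ef} = \begin{cases} 1, & \text{if } e = (x, i) \text{ and } f = (x, j) \text{ for the same vertex } x \text{ and } i \neq j; \\ 0, & \text{otherwise.} \end{cases}\end{equation*}
We see that a term like $M_{eg} N_{gf}$ corresponds to M matching the directed edge e to g by $\pi$, and N taking us out of the vertex of g along the directed edge f, which is different from g. Thus, the rule of matrix multiplication means that
\begin{equation*}B = MN.\end{equation*}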
This equality will be useful in Section 5.2 when working with products of the matrix B.
4.2. Tangle-free paths
Sparse random graphs, including bipartite graphs, have the important property of being ‘tree-like’ in the neighbourhood of a typical vertex. Formally, consider a vertex $v \in V_1 \cup V_2$. For a natural number $\ell$, we define the ball of radius $\ell$ centered at v to be:
\begin{equation*}B_{\ell}(v) = \{ w \in V_1 \cup V_2 : d_G(v, w) \leq \ell \},\end{equation*}
where $d_G(\cdot ,\cdot )$ is the graph distance.
Definition 4.1. A graph G is $\ell$-tangle-free if $B_{\ell}(v)$ contains at most one cycle for any vertex v.
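The definition can be turned into a direct check on small graphs. The sketch below makes two assumptions that are not spelled out in the text: the ball is taken to be the subgraph induced on the vertices within distance $\ell$ of v, and "at most one cycle" is read as the cycle rank $|E| - |V| + 1$ of that connected subgraph being at most 1. The adjacency-list representation and function names are hypothetical.

```python
from collections import deque

def ball(adj, v, radius):
    """Vertices within graph distance `radius` of v, found by breadth-first search."""
    dist = {v: 0}
    queue = deque([v])
    while queue:
        u = queue.popleft()
        if dist[u] == radius:
            continue  # do not explore beyond the requested radius
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return set(dist)

def is_tangle_free(adj, radius):
    """True when every ball of the given radius has cycle rank at most 1."""
    for v in adj:
        nodes = ball(adj, v, radius)
        # Each undirected edge inside the ball is seen from both endpoints.
        num_edges = sum(1 for u in nodes for w in adj[u] if w in nodes) // 2
        # The ball is connected, so its cycle rank is |E| - |V| + 1.
        if num_edges - len(nodes) + 1 > 1:
            return False
    return True

# A 6-cycle has exactly one cycle; K_{3,3} has cycle rank 4.
cycle6 = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
k33 = {0: [3, 4, 5], 1: [3, 4, 5], 2: [3, 4, 5],
       3: [0, 1, 2], 4: [0, 1, 2], 5: [0, 1, 2]}
```

In this example the 6-cycle is tangle-free at every radius, while $K_{3,3}$ is tangle-free at radius 1 (each ball is a star) but not at radius 2, where the ball is the whole graph.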
The next lemma says that most bipartite biregular graphs are $\ell$-tangle-free up to logarithmic sized neighbourhoods.
Lemma 4.2. Let $G \sim \mathcal{G}(n,m,d_1,d_2)$ be a bipartite, biregular random graph. Let $\ell < \frac{1}{8} \log_d (n)$, for $d=\max\{d_1, d_2\}$. Then G is $\ell$-tangle-free with probability at least $1-n^{-1/2}$.
Proof. This is essentially the proof given in [Reference Lubetzky and Sly47], Lemma 2.1. Fix a vertex v. We will use the so-called exploration process to discover the ball $B_{\ell}(v)$. More precisely, we order the set $\vec{E}_1$ lexicographically: $(v_i,j) < (v_{i'},j')$ if $i < i'$, or if $i = i'$ and $j < j'$. The exploration process reveals $\pi$ one edge at a time, by doing the following:

A uniform element is chosen from $\vec{E}_2$ , and it is declared equal to $\pi(1)$ .

A second element is chosen uniformly, now from the set $\vec{E}_2\backslash\{\pi(1)\}$ and set equal to $\pi(2)$ .

Once we have determined $\pi(i)$ for $i\leq k$ , we set $\pi(k+1)$ equal to a uniform element sampled from the set $\vec{E}_2\backslash\{\pi(1),\pi(2),\dots,\pi(k)\}$ .
We use the final $\pi$ to output a graph as we did in the configuration model. The law of these graphs is the same. With the exploration process, we expose first the neighbours of v, then the neighbours of these vertices, and so on. This breadth-first search reveals all vertices in $B_k (v)$ before any vertices in $B_{j > k} (v)$. Note that, although our bound is for the family $\mathcal{G}(n,m,d_1,d_2)$, the neighbourhood sizes are bounded above by those of the d-regular graph with $d=\max(d_1,d_2)$.
Consider the matching of half edges attached to vertices in the ball $B_i(v)$ at depth i (thus revealing vertices at depth $i+1$ ). In this process, we match at most $m_i \leq d^{i+1}$ pairs of half edges in total. Let $\mathcal{F}_{i,k}$ be the filtration generated by matching up to the kth half edge in $B_i(v)$ , for $1 \leq k \leq m_i$ . Denote by $A_{i,k}$ the event that the kth matching creates a cycle at the current depth. For this to happen, the matched vertex must have appeared among the $k-1$ vertices already revealed at depth $i+1$ . The number of unmatched half edges is at least $nd - 2 d^{i+1}$ . We then have that:
So, we can stochastically dominate the sum
by $Z \sim \textrm{Bin} \left( d^{\ell+1},\ n^{-1} d^{\ell} \right)$ . So the probability that $B_{\ell}(v)$ is $\ell$-tangle-free has the bound:
which follows using that $\ell = c \log_d n$ with $c < 1/8$ . The lemma follows by taking a union bound over all vertices.
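Definition 4.1 can be checked directly on small examples. The sketch below is illustrative only; it assumes that the ball $B_{\ell}(v)$ is the subgraph induced by all vertices within distance $\ell$ of v, and it uses the fact that a connected graph contains at most one cycle exactly when its number of edges minus its number of vertices plus one is at most one. The names `ball_vertices` and `is_tangle_free` are hypothetical.

```python
from collections import deque


def ball_vertices(adj, v, radius):
    """Vertices at graph distance at most `radius` from v (breadth-first search)."""
    dist = {v: 0}
    queue = deque([v])
    while queue:
        u = queue.popleft()
        if dist[u] == radius:
            continue  # do not explore beyond the boundary of the ball
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return set(dist)


def is_tangle_free(edges, radius):
    """True if every ball of the given radius contains at most one cycle.

    `edges` is a list of pairs; repeated pairs count as parallel edges.
    """
    adj = {}
    for u, w in edges:
        adj.setdefault(u, []).append(w)
        adj.setdefault(w, []).append(u)
    for v in adj:
        ball = ball_vertices(adj, v, radius)
        num_edges = sum(1 for u, w in edges if u in ball and w in ball)
        # The ball is connected, so E - V + 1 counts its independent cycles.
        if num_edges - len(ball) + 1 > 1:
            return False
    return True


# Two 4-cycles glued at vertex 0 (a bipartite example).
two_squares = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 4), (4, 5), (5, 6), (6, 0)]
small_radius_ok = is_tangle_free(two_squares, 1)
large_radius_ok = is_tangle_free(two_squares, 2)
```

In this example no ball of radius 1 sees a full cycle, whereas the ball of radius 2 around vertex 0 contains both squares.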
5. Proof of Theorem 3.1
5.1. Outline
We are now prepared to explain the main result. To study the second largest eigenvalue of the non-backtracking matrix, we examine the spectral radius of the matrix obtained by subtracting off the dominant eigenspace. We use for this:
Lemma 5.1 ([Reference Bordenave, Lelarge and Massoulié14], Lemma 3). Let T and R be matrices such that $\textrm{Im}(T)\subset \textrm{Ker}(R)$ , $\textrm{Im}(T^*)\subset \textrm{Ker}(R)$ . Then all eigenvalues $\lambda$ of $T+R$ that are not eigenvalues of T satisfy:
Throughout the text, $\| \cdot \|$ is the spectral norm for matrices and the $\ell^2$ norm for vectors. Recall that the leading eigenvalues of B, in magnitude, are $\lambda_1 = \sqrt{(d_1 -1)(d_2-1)}$ and $\lambda_{2|E|} = -\lambda_1$ with corresponding eigenvectors $\textbf{1}_{\alpha}$ and $\textbf{1}_{-\alpha}$ . Applying Lemma 5.1 with $T=\frac{\lambda_1^{\ell}}{\textbf{1}_{\alpha}^* \textbf{1}_{\alpha}}(\textbf{1}_{\alpha} \textbf{1}_{\alpha}^* +(-1)^\ell \textbf{1}_{-\alpha} \textbf{1}_{-\alpha}^*)$ and $R = B^\ell - T$ , we get that
It will be important later to have a more precise description of the set $\textrm{Ker}(T)$ . It is not hard to check that
In the last line, the vectors v, w and $\textbf{1}$ are $|E|$-dimensional, and $\textbf{1}$ is the vector of all ones.
In order to use equation (10), we must bound $\| B^\ell x \|$ for large powers $\ell$ and $x \in \textrm{Ker}(T)$ . This amounts to counting the contributions of certain non-backtracking walks. We will use the tangle-free property in order to only count $\ell$-tangle-free walks. We break up $B^\ell$ into two parts in Section 5.2, an ‘almost’ centred matrix $\bar{B}^\ell$ and the remainder $\sum_j R^{\ell,j}$ , and we bound each term independently.
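The value $\lambda_1 = \sqrt{(d_1-1)(d_2-1)}$ can be sanity-checked on a small deterministic example. The sketch below is an illustration under assumptions: it uses the standard non-backtracking operator on directed edges of a simple graph (the matrix B in the text is indexed by half edges), and it takes the complete bipartite graph $K_{3,4}$ as a biregular test case. The helper names are hypothetical. Every directed edge has exactly $(d_1-1)(d_2-1)$ non-backtracking continuations of length two, so the all-ones vector is an eigenvector of the squared operator with that eigenvalue.

```python
import math


def nonbacktracking_successors(adj):
    """Map each directed edge (u, v) to the directed edges (v, w) with w != u."""
    succ = {}
    for u in adj:
        for v in adj[u]:
            succ[(u, v)] = [(v, w) for w in adj[v] if w != u]
    return succ


def two_step_counts(succ):
    """Row sums of the squared operator: two-step non-backtracking continuations."""
    return {e: sum(len(succ[f]) for f in succ[e]) for e in succ}


# Complete bipartite graph K_{n,m}: left degree d1 = m, right degree d2 = n.
n, m = 3, 4
left = [("L", i) for i in range(n)]
right = [("R", j) for j in range(m)]
adj = {u: list(right) for u in left}
adj.update({v: list(left) for v in right})
d1, d2 = m, n

counts = two_step_counts(nonbacktracking_successors(adj))
# All row sums agree, so by Perron-Frobenius this common value is the
# spectral radius of the squared operator.
lambda_1 = math.sqrt((d1 - 1) * (d2 - 1))
```

The constant row sums are specific to biregular graphs; for a random graph from $\mathcal{G}(n,m,d_1,d_2)$ the same count holds because it only depends on the degrees.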
To compute these bounds, we need to count the contributions of many different non-backtracking walks. We will use the trace technique, so only circuits which return to the starting vertex will contribute. In Section 5.3, we compute the expected contribution of products of B along such circuits, employing a result from [Reference Bordenave12].
Section 5.4 covers the combinatorial component of the proof. The total contributions $\| B^\ell x \|$ come from many non-backtracking circuits of different flavours, depending on their number of vertices, edges, cycles, etc. Each circuit is broken up into 2k segments of tangle-free walks of length $\ell$ . We need to compute not only the expectation along the circuit, but also upper-bound the number of circuits of each flavour. We introduce an injective encoding of such circuits that depends on the number of vertices, length of the circuit, and, crucially, the tree excess of the circuit. An important part of these calculations is to keep track of the imbalance between left and right vertices visited in the circuit, since this controls the powers of $d_1$ and $d_2$ in the result.
Finally, in Section 5.8, we put all of these ingredients together and use Markov’s inequality to bound each matrix norm with high probability. We find that $\| \bar{B}^\ell \|$ contributes a factor that goes as $((d_1 - 1)(d_2 - 1))^{\ell/4}$ , whereas $\| R^{\ell,j} \|$ contributes only a factor of $(d-1)^\ell/n$ , up to polylogarithmic factors in n. Thus, the main contribution to the circuit counts comes from the mean and, in fact, comes from circuits which are exactly trees traversed forwards and backwards. Interestingly, this is analogous to what happens when using the trace method on random matrices of independent entries.
In the proof, we are forced to consider paths which are tangled but built up of tangle-free components. This delicate issue was first made clear by [Reference Friedman28], who introduced the idea of tangles and a ‘selective trace.’ Bordenave et al. [Reference Bordenave, Lelarge and Massoulié14], whom we follow closely in this part of our analysis, also have a good discussion of these issues and their history. We use the fact that
and so deal with circuits built up of 2k segments which are $\ell$-tangle-free. Notice that the first segment comes from $\bar{B}^\ell$ , the second from $(\bar{B}^{\ell})^*$ , etc. Because of this, the directionality of the edges along each segment alternates. See Figure 4 for an illustration of a path which contributes for $k=2$ and $\ell=2$ . Also, while each segment is $\ell$-tangle-free, the overall circuit may be tangled.
5.2. Matrix decomposition
We start this section by defining the set of paths that will be relevant to bound the norm of $\|\bar{B}^{\ell}\|$ . We closely follow [Reference Bordenave12].
Definition 5.2. Define $\Gamma^{\ell}_{ef}$ to be the set of all non-backtracking paths of $2\ell+1$ half edges, starting at e and ending at f. A path in this set will be denoted by $\gamma= (e_1,e_2,\dots,e_{2\ell},e_{2\ell+1})$ , where $e_1 = e$ and $e_{2\ell + 1} = f$ . The non-backtracking property means that, for all $1\leq i\leq \ell$ , $e_{2i}$ and $e_{2i+1}$ share the same vertex but $e_{2i}\neq e_{2i+1}$ . Similarly, let $\Gamma^\ell = \bigcup_{e,f} \Gamma^\ell_{ef}$ .
Each path in $\Gamma^\ell_{ef}$ uses $2 \ell + 1$ half edges, corresponding to $\ell + 1$ edges in the graph. To be clear, the above definition counts all possible non-backtracking sequences of half edges. These are different from the usual non-backtracking paths and do not necessarily exist in the graph. Some of these paths might backtrack along a duplicate edge which utilizes a different half edge.
We now have
where we used equation (9) and the fact that $\Gamma^{\ell}_{ef}$ is non-backtracking, so $N_{e_{2t}e_{2t+1}}=1$ .
Recall that we will use equation (10) and Lemma 5.1 to bound $\lambda_2$ . Denote by $\bar{B}$ the matrix $\bar{B} = B - S$ , where
Note that $\bar{B}$ is an almost centred version of B, and $\textrm{Ker} ( S ) = \textrm{Ker}(T) = \textrm{span} (\textbf{1}_{\alpha}, \textbf{1}_{-\alpha} ) $ , where T is the matrix from Lemma 5.1. To apply the lemma, we wish to get an expression like equation (12) for $\bar{B}^{\ell}$ . To do so, we write:
The above matrix equation in the unknown $S'$ can be solved by simple manipulations. We get
Using again that N is identically one over the elements of the set $\Gamma^{\ell}_{ef}$ , we find a similar formula to equation (12):
where $\bar{M}=M-S'$ .
The following telescoping sum formula is a simple algebraic manipulation and appears in [Reference Massoulie50] and [Reference Bordenave, Lelarge and Massoulié14]:
Using this, with $x_s = B_{e_{2s-1} e_{2s+1}}$ and $y_s = \bar{B}_{e_{2s-1} e_{2s+1}}$ , we obtain the following relation:
This decomposition breaks the elements in $\Gamma^{\ell}_{ef}$ into two subpaths, also non-backtracking, of length j and $\ell-j$ , respectively.
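For orientation, the telescoping identity is used here in the form that is standard in the cited references, namely $\prod_{s=1}^{\ell} x_s = \prod_{s=1}^{\ell} y_s + \sum_{j=1}^{\ell} \big(\prod_{i<j} y_i\big)(x_j - y_j)\big(\prod_{i>j} x_i\big)$ . This exact form is an assumption on our part, chosen because it places the centred factors before position j and the uncentred ones after it, as in the decomposition just described. The short sketch below (with the hypothetical helper `telescoping_rhs`) checks the identity on integer inputs.

```python
import random
from math import prod


def telescoping_rhs(x, y):
    """Right-hand side of the telescoping identity for scalar sequences x and y."""
    assert len(x) == len(y)
    total = prod(y)
    for j in range(len(x)):
        # y-factors before position j, the difference at j, x-factors after j.
        total += prod(y[:j]) * (x[j] - y[j]) * prod(x[j + 1:])
    return total


rng = random.Random(0)
x = [rng.randint(-5, 5) for _ in range(6)]
y = [rng.randint(-5, 5) for _ in range(6)]
identity_holds = prod(x) == telescoping_rhs(x, y)
```

The identity holds term by term because the j-th summand equals the difference of two consecutive mixed products, so the sum collapses to $\prod x_s - \prod y_s$ .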
Definition 5.3. Let $F_{ef}^\ell \subset \Gamma^\ell_{ef}$ denote the subset of paths which are tangle-free, with $F^\ell = \bigcup_{e,f} F^\ell_{ef}$ .
We will take the parameter $\ell$ to be small enough so that the path $\gamma$ is tangle-free with high probability. Thus, the sums in equations (12) or (13) need only be over the paths $\gamma \in F_{ef}^\ell$ . However, to recover the matrices B and $\bar{B}$ by rearranging equation (14), we need to also count those tangle-free subpaths that arise from splitting tangled paths. While breaking a tangle-free path will necessarily give us two new tangle-free subpaths, the converse is not always true. This extra term generates a remainder that we define now.
Definition 5.4. Let $T^{\ell,j}_{ef}$ be the set of non-backtracking paths containing $2\ell+1$ half edges, starting at e and ending at f, such that overall the path is tangled but the first $2j-1$ , middle three, and last $2(\ell - j)+1$ half edges form tangle-free subpaths: $\gamma = (e_1, \ldots, e_{2\ell + 1}) \in T^{\ell,j}$ if and only if $\gamma' = (e_1, \ldots, e_{2j-1}) \in F^{j-1}$ , $\gamma'' = (e_{2j-1}, e_{2j}, e_{2j+1}) \in F^{1}$ , and $\gamma''' = (e_{2j+1}, \ldots, e_{2\ell + 1}) \in F^{\ell - j}$ . Set $T^{\ell,j} = \bigcup_{e,f} T^{\ell,j}_{ef}$ .
Set the remainder
Since the paths are non-backtracking, the N terms are all unity.
Adding and subtracting $\sum_{j=1}^{\ell} R^{\ell,j}_{ef}$ to equation (14) and rearranging the sums, we obtain
where the matrices $B^{(\ell)}$ and $\bar{B}^{(\ell)}$ are tangle-free versions of $B^\ell$ and $\bar{B}^\ell$ , i.e. element ef in both matrices only counts paths $\gamma \in F_{ef}^\ell$ . Multiplying equation (16) on the right by $x \in \textrm{Ker} (T)$ and using that $B^{(\ell - j)} x$ is also within $\textrm{Ker}(S)$ , since it is just the space spanned by the leading eigenvectors, we find that the middle term is identically zero. Thus, for $x \in \textrm{Ker}(T)$ ,
5.3. Expectation bounds
Our goal is to find a bound on the expectation of certain random variables which are products of $\bar{B}_{ef}$ along a circuit. To do this, we will need to bound the probabilities of different subgraphs when exploring G. This requires us to introduce the concept of consistent edges and their multiplicity.
Definition 5.5. Let $\gamma = (e_1, \ldots, e_{2k})$ be a sequence of half edges of even length, with $\vec{E}(\gamma)$ its set of half edges, and $E(\gamma) = \{ \{e_{2i-1}, e_{2i}\} \text{ for } i \in [k]\} $ its set of edges (unordered pairs, thus undirected).

The multiplicity of a half edge $e \in \vec{E}(\gamma)$ is $m_\gamma(e) = \sum_{t=1}^{2k} 1_{\{ e_t = e \}}$ .

The multiplicity of an edge $\{h_1, h_2\} \in E(\gamma)$ is $m_\gamma(\{h_1,h_2\}) = \sum_{t=1}^k 1_{\{ \{e_{2t-1}, e_{2t}\} = \{h_1, h_2\} \}}$ .

An edge $\{h_1, h_2\}$ is consistent if $m_\gamma(h_1) = m_\gamma(h_2) = m_\gamma(\{h_1, h_2\})$ .
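The three notions in Definition 5.5 are easy to compute for a concrete sequence. The sketch below is illustrative: the function name `edge_statistics` and the string labels for half edges are assumptions made for the example, and it presumes that the two half edges of each pair are distinct.

```python
from collections import Counter


def edge_statistics(gamma):
    """Half-edge multiplicities, edge multiplicities and consistency flags.

    `gamma` is a sequence (e_1, ..., e_2k); its edges are the unordered
    pairs {e_1, e_2}, {e_3, e_4}, and so on.
    """
    assert len(gamma) % 2 == 0, "the sequence must have even length"
    half_mult = Counter(gamma)
    pairs = [frozenset(gamma[i:i + 2]) for i in range(0, len(gamma), 2)]
    edge_mult = Counter(pairs)
    consistent = {
        edge: all(half_mult[h] == edge_mult[edge] for h in edge)
        for edge in edge_mult
    }
    return half_mult, edge_mult, consistent


# Edges: {a, x}, {b, y}, {x, a}, {a, z}.
gamma = ["a", "x", "b", "y", "x", "a", "a", "z"]
half_mult, edge_mult, consistent = edge_statistics(gamma)
```

In this example the half edge a appears three times, so both edges containing it are inconsistent, and $\{a, z\}$ is an inconsistent edge of multiplicity one; the edge $\{b, y\}$ is consistent.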
Lemma 5.6. Let $\gamma = (e_1, \ldots, e_{2k})$ be a sequence of half edges of even length, with M and $\bar{M}$ the matching matrix and its centred version generated by a uniform matching in the configuration model. Then for $1 \leq k \leq \sqrt{|E|}$ and $0 \leq t_0 \leq k$ we have that
where b is the number of inconsistent edges of multiplicity one occurring before $t_0$ , $\mathcal{E}_1$ is the number of consistent edges with multiplicity one occurring before $t_0$ , $\mathcal{E} = |E(\gamma)|$ , and C is a universal constant.
Proof. Recall the form of the matrices
Matrix $M_1 \in \mathbb{R}^{|E| \times |E|}$ is a random permutation matrix between $n d_1 = |E|$ and $m d_2 = |E|$ half edges. Therefore, $M_1$ is distributed exactly the same as a matching matrix of a random $|E|$-lift of a single edge, and the same holds for its centred version $M_1 - \frac{1}{|E|} \textbf{1} \textbf{1}^*$ . The only paths $\gamma$ that contribute in this bipartite setting must alternate between the bipartite sets and avoid the 0 blocks, otherwise the bound holds trivially. For one of these paths $\gamma$ assume, without loss of generality, that the path starts in set $V_1$ . Then define the transformed path $\gamma' = (e^{\prime}_1, \ldots, e^{\prime}_{2k}) = (e_1, e_2, e_4, e_3, e_5, \ldots)$ , i.e. with every other pair in $\gamma$ in reverse order. Note that
Then the Lemma holds by [Reference Bordenave12], Proposition 28.
5.4. Path counting
This section is devoted to counting the number of ways non-backtracking walks can be concatenated to obtain a circuit as in Section 5.2. We will follow closely the combinatorial analysis used in [Reference Brito, Dumitriu, Ganguly, Hoffman and Tran16]. In that paper, the authors needed a similar count for self-avoiding walks. We make the necessary adjustments to our current scenario.
Our goal is to find a reasonable bound for the number of circuits that contribute to the trace bound, equation (11), shown graphically in Figure 4. Define $\mathcal{C}_{\mathcal{V}, \mathcal{E}}^\mathcal{R}$ as those circuits which visit exactly $\mathcal{V} = |V(\gamma)|$ different vertices, $\mathcal{R} = |V(\gamma) \cap V_2|$ of them in the right set, and $\mathcal{E} = |E(\gamma)|$ different edges. Note, these are undirected edges in E(G). This is a set of circuits of length $2 k \ell$ obtained as the concatenation of 2k non-backtracking, tangle-free walks of length $\ell$ . We denote such a circuit as $\gamma= ( \gamma_1,\gamma_2,\cdots, \gamma_{2k} )$ , where each $\gamma_j$ is a length $\ell$ walk.
To bound $C_{\mathcal{V}, \mathcal{E}}^\mathcal{R} = | \mathcal{C}_{\mathcal{V}, \mathcal{E}}^\mathcal{R} |$ , we will first choose the set of vertices and order them. The circuits that contribute are indeed directed non-backtracking walks. However, by considering undirected walks along a fixed ordering of vertices, that ordering sets the orientation of the first and thus the rest of the directed edges in $\gamma$ . Thus, we are counting the directed walks which contribute to equation (11). We relabel the vertices as $1,2, \ldots, \mathcal{V}$ as they appear in $\gamma$ . Denote by $\mathcal{T}_{\gamma}$ the spanning tree of those edges leading to new vertices as induced by the path $\gamma$ . The enumeration of the vertices tells us how we traverse the circuit and thus defines $\mathcal{T}_{\gamma}$ uniquely.
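The tree $\mathcal{T}_{\gamma}$ and the tree excess $\chi = \mathcal{E} - \mathcal{V} + 1$ used later in this section can be read off a walk in a single pass. The sketch below is a simplified illustration: it assumes the walk is given as a sequence of vertices of a simple graph (the text works with half edges and allows multi-edges), and the name `discovery_tree` is hypothetical.

```python
def discovery_tree(walk):
    """Return the tree of first-discovery edges and the tree excess of a walk.

    An edge (u, v) joins the tree when stepping from u reaches a vertex v
    that has not been visited before.
    """
    seen = {walk[0]}
    tree = []
    edges = set()
    for u, v in zip(walk, walk[1:]):
        edges.add(frozenset((u, v)))  # undirected edge set of the walk
        if v not in seen:
            seen.add(v)
            tree.append((u, v))
    excess = len(edges) - len(seen) + 1
    return tree, excess


# The walk closes the triangle 0-1-2 and then discovers vertex 3.
tree, excess = discovery_tree([0, 1, 2, 0, 3])
```

Here the tree consists of the edges (0, 1), (1, 2) and (0, 3), while the step from 2 back to 0 uses the single edge outside the tree, so the excess is one.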
We encode each walk $\gamma_j$ by dividing it into sequences of subpaths of three types, which in our convention must always occur as type 1 $\to$ type 2 $\to$ type 3, although some may be empty subpaths. Each type of subpath is encoded with a number, and we use the encoding to upper bound the number of such paths that can occur. Given our current position on the circuit, i.e. the label of the current vertex, and the subtree of $\mathcal{T}_{\gamma}$ already discovered (over the whole circuit $\gamma$ not just the current walk $\gamma_j$ ), we define the types and their encodings:

Type 1: These are paths with the property that all of their edges are edges of $\mathcal{T}_{\gamma}$ and have been traversed already in the circuit. These paths can be encoded by their end vertex. Because this is a path contained in a tree, there is a unique path connecting its initial and final vertex. We use 0 if the path is empty.

Type 2: These are paths with all of their edges in $\mathcal{T}_{\gamma}$ but which are traversed for the first time in the circuit. We can encode these paths by their length, since they are traversing new edges, and we know in what order the vertices are discovered. We use 0 if the path is empty.

Type 3: These paths are simply a single edge, not belonging to $\mathcal{T}_\gamma$ , that connects the end of a path of type 1 or 2 to a vertex that has been already discovered. Given our position on the circuit, we can encode an edge by its final vertex. Again, we use 0 if the path is empty.
Now, we decompose $\gamma_j$ into an ordered sequence of triples to encode its subpaths:
where each $p_i$ characterises subpaths of type 1, $q_i$ characterises subpaths of type 2 and $r_i$ characterises subpaths of type 3. These subpaths occur in the order given by the triples. We perform this decomposition using the minimal possible number of triples.
Now, $p_i$ and $r_i$ are both numbers in $\{0,1,...,\mathcal{V} \}$ , since our cycle has $\mathcal{V}$ vertices. On the other hand, $q_i \in \{ 0,1,...,\ell \}$ since it represents the length of a subpath of a non-backtracking walk of length $\ell$ . Hence, there are $(\mathcal{V} + 1)^2 (\ell+1)$ possible triples. Next, we want to bound how many of these triples occur in $\gamma_j$ . We will use the following lemma.
Lemma 5.7. Let $(p_1,q_1,r_1) (p_2,q_2,r_2) \cdots (p_t,q_t,r_t)$ be a minimal encoding of a non-backtracking walk $\gamma_j$ , as described above. Then $r_i = 0$ can only occur in the last triple $i = t$ .
Proof. We can check this case by case. Assume that for some $i<t$ we have $(p_i,q_i,0)$ and consider the concatenation with $(p_{i+1},q_{i+1},r_{i+1})$ . First, notice that both $p_{i+1}$ and $q_{i+1}$ cannot be zero since then we will have $(p_i,q_i,0)(0,0,v^*)$ which can be written as $(p_i,q_i,v^*)$ . If $q_i\neq 0$ , then we must have $p_{i+1} \neq 0$ . Otherwise, we split a path of new edges (type 2), and the decomposition is not minimal. This implies that we visit new edges and move to edges already visited, hence we need to go through a type 3 edge, implying that $r_i \neq 0$ . Finally, if $p_i \neq 0$ and $q_i = 0$ , then we must have $p_{i+1}=0$ ; otherwise, we split a path of old edges (type 1). We also require $q_{i+1} \neq 0$ , but $(p_i,0,0)(0,q_{i+1},r_{i+1})$ is the same as $(p_i,q_{i+1},r_{i+1})$ , which contradicts the minimality condition. This covers all possibilities and finishes the proof.
Using the lemma, any encoding of a non-backtracking walk $\gamma_j$ has at most one triple with $r_i=0$ . All other triples indicate the traversing of a type 3 edge. We now give a very rough upper bound for how many of such encodings there can be. To do so, we will use the tangle-free property and slightly modify the encoding of the paths with cycles. Consider the two cases:

Case 1: Path $\gamma_j$ contains no cycle. This implies that we traverse each edge within $\gamma_j$ once. Thus, we can have at most $\chi= \mathcal{E} - \mathcal{V} + 1$ many triples with $r_i\neq 0$ . This gives a total of at most
\begin{equation*} \left((\mathcal{V}+1)^2 (\ell+1) \right)^{\chi+1}\end{equation*} many ways to encode one of these paths.
Case 2: Path $\gamma_j$ contains a cycle. Since we are dealing with non-backtracking, tangle-free walks, we enter the cycle once, loop around some number of times, and never come back. We change the encoding of such paths slightly. Let $\gamma_j^{a}$ , $\gamma_j^{b}$ , and $\gamma_j^{c}$ be the segments of the path before, during, and after the cycle. We mark the start of the cycle with $|$ and its end with $\|$ . The new encoding of the path is
\begin{equation*}(p^a_1,q^a_1,r^a_1) \cdots (p^a_{t^a},q^a_{t^a},r^a_{t^a})\, | \,(p^b_1,q^b_1,r^b_1) \cdots (p^b_{t^b},q^b_{t^b},r^b_{t^b})\, \| \,(p^c_1,q^c_1,r^c_1) \cdots (p^c_{t^c},q^c_{t^c},r^c_{t^c}),\end{equation*} where we encode the segments separately. Observe that each subpath is connected and self-avoiding. The above encoding tells us all we need to traverse $\gamma_j$ , including how many times to loop around the cycle: since the total length is $\ell$ , we can back out the number of circuits around the cycle from the lengths of $\gamma_j^{a}$ , $\gamma_j^{b}$ , and $\gamma_j^{c}$ , see Figure 5. Following the analysis made for Case 1, the subpaths $\gamma_j^{a}$ , $\gamma_j^{b}$ , $\gamma_j^{c}$ are encoded by at most $\chi + 1$ triples, but we also have at most $\ell$ choices each for our marks $|$ and $\|$ . We are left with at most \begin{equation*} \ell^2 \left((\mathcal{V}+1)^2 (\ell+1) \right)^{\chi+1}\end{equation*} ways to encode any path of this kind.
Together, these two cases mean there are fewer than $2 \ell^2 \left((\mathcal{V}+1)^2 (\ell+1) \right)^{\chi+1}$ such paths.
Now we conclude by encoding the entire circuit $\gamma = (\gamma_1, \ldots, \gamma_{2k})$ . We first choose $\mathcal{V}$ vertices, $\mathcal{R}$ in the set $V_2$ , and order them, which can occur in $(m)_\mathcal{R} (n)_{\mathcal{V}-\mathcal{R}} \leq m^\mathcal{R} n^{\mathcal{V} - \mathcal{R}}$ different ways. Finally, in the whole path $\gamma$ we are counting concatenations of 2k paths which are $\ell$-tangle-free. Therefore, we conclude with the following lemma:
Lemma 5.8. Let $\mathcal{C}_{\mathcal{V}, \mathcal{E}}^\mathcal{R}$ be the set of circuits $\gamma = (\gamma_1, \ldots, \gamma_{2k})$ of length $2 k \ell$ obtained as the concatenation of 2k non-backtracking, tangle-free walks of length $\ell$ , i.e. $\gamma_s \in F^\ell$ for all $s \in [2k]$ , which visit exactly $\mathcal{V} = |V(\gamma)|$ different vertices, $\mathcal{R} = |V(\gamma) \cap V_2|$ of them in the right set, and $\mathcal{E} = |E(\gamma)|$ different edges. If $C_{\mathcal{V}, \mathcal{E}}^\mathcal{R} = |\mathcal{C}_{\mathcal{V}, \mathcal{E}}^\mathcal{R}|$ , then
where $\chi = \mathcal{E} - \mathcal{V} + 1$ .
The circuits that contribute to the remainder term $R^{\ell, j}$ are slightly different. In this case, each length $\ell$ segment is an element of $T^{\ell, j}$ rather than $F^{\ell}$ . We have to slightly modify the previous argument for this case.
Lemma 5.9. Let $\mathcal{D}_{\mathcal{V},\mathcal{E}}^\mathcal{R}$ be the set of circuits $\gamma = (\gamma_1, \ldots, \gamma_{2k})$ of length $2k\ell$ obtained as the concatenation of 2k elements $\gamma_s \in T^{\ell, j}$ for $s = 1, \ldots, 2k$ , that visit exactly $\mathcal{V}$ vertices, $\mathcal{R}$ of which are in $V_2$ , and $\mathcal{E}$ different edges. Then for $D_{\mathcal{V},\mathcal{E}}^\mathcal{R} = | \mathcal{D}_{\mathcal{V},\mathcal{E}}^\mathcal{R} |$ , we have
Proof. Since each $\gamma_s = (e_1, \ldots, e_{2\ell + 1}) \in T^{\ell,j}$ , we have that $\gamma' = (e_1, \ldots, e_{2j-1}) \in F^{j-1}$ , $\gamma'' = (e_{2j-1}, e_{2j}, e_{2j+1}) \in F^{1}$ , and $\gamma''' = (e_{2j+1}, \ldots, e_{2\ell + 1}) \in F^{\ell - j}$ . Encoding $\gamma'$ , $\gamma''$ , and $\gamma'''$ as before, we have the generous upper bound of at most
many encodings for each $\gamma_s$ . Choosing and ordering the vertices, then concatenating 2k of these paths gives the final result.
5.5. Half edge isomorphism counting
We have constructed the circuits in $\mathcal{C}_{\mathcal{V}, \mathcal{E}}^\mathcal{R}$ and $\mathcal{D}_{\mathcal{V}, \mathcal{E}}^\mathcal{R}$ by choosing the vertices and edges that participate in them. However, the expectation bound applies to matchings of half edges in the configuration model. Since there are multiple ways to configure the half edges into such a circuit, this must be taken into account in the combinatorics.
Lemma 5.10. Let $I_{\mathcal{V}, \mathcal{E}}^\mathcal{R}$ be the number of half edge choices for the graph induced by $\gamma \in \mathcal{C}_{\mathcal{V}, \mathcal{E}}^\mathcal{R}\cup \mathcal{D}_{\mathcal{V}, \mathcal{E}}^\mathcal{R}$ . Then,
Proof. For every left vertex v, with degree $g_v$ on the graph induced by $\gamma$ , the number of choices of half edges is $d_1(d_1-1)\dots(d_1- g_v+1)\leq d_1 (d_1-1)^{g_v-1}$ . Note that the choices of half edges are independent for the left vertices. We then get that there are at most $d_1^{\mathcal{V}-\mathcal{R}} (d_1-1)^{\mathcal{E} - \mathcal{V}+\mathcal{R}}$ many choices, where we used that the sum of all the degrees on one component of a bipartite graph equals the number of edges: $\mathcal{E} = \sum_v g_v$ . Similarly, for right vertices we get $d_2^{\mathcal{R}}(d_2-1)^{(\mathcal{E}-\mathcal{R})}$ .
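The per-vertex inequality in this proof, $d(d-1)\cdots(d-g+1) \leq d(d-1)^{g-1}$ for $1 \leq g \leq d$ , can be confirmed by brute force for small degrees. The sketch below is only a numerical check, with hypothetical function names; the left-hand side is the number of ordered selections of g distinct half edges out of d.

```python
from math import perm


def half_edge_choices(d, g):
    """Ordered choices of g distinct half edges at a vertex of degree d."""
    return perm(d, g)  # d * (d - 1) * ... * (d - g + 1)


def choice_upper_bound(d, g):
    """The bound d * (d - 1)^(g - 1) used in the proof."""
    return d * (d - 1) ** (g - 1)


bound_holds = all(
    half_edge_choices(d, g) <= choice_upper_bound(d, g)
    for d in range(1, 9)
    for g in range(1, d + 1)
)
```

Equality holds for g at most two, since then every factor after the first is exactly $d - 1$ ; the bound becomes lossy only when a vertex is visited along many distinct edges.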
Corollary 5.11. We have that
5.6. Bounding the imbalance $\psi$
We focus now on the quantity defined as $\psi = \mathcal{R} - \mathcal{E} / 2 $ . Informally, $\psi$ captures the imbalance between the number of vertices on each partition of the bipartite graph visited by the circuit $\gamma$ . We show that this imbalance is not too large.
Lemma 5.12. Let $\ell < \frac{1}{32} \log_d(n)$ , then $\psi \leq 16 k^2$ with high probability.
Proof. For any subgraph H define $\psi(H) = \mathcal{R}(H)-\mathcal{E}(H)/2$ . We set $\psi=\psi(\gamma)$ . To bound this quantity, we analyse the subgraph $\gamma_{\leq i}$ , obtained by the concatenation of the first i walks in $\gamma$ , i.e. the union of the graphs induced by $\gamma_1, \ldots, \gamma_i$ . Our choice of $\ell$ implies that every neighbourhood of radius $4\ell$ is tangle-free with high probability. Hence, every non-backtracking walk $\gamma_j$ is either a path or a path with exactly one loop. It is not hard to conclude that $\psi(\gamma_j) \leq 2$ for all j and $\psi(\gamma_{\leq 1}) = \psi(\gamma_1) \leq 2$ . We now proceed inductively to add walks to our graph, one by one, as they appear on the circuit. We will upper bound the increment $\psi(\gamma_{\leq i+1}) - \psi(\gamma_{\leq i})$ by looking at how the addition of $\gamma_{i+1}$ changes the imbalance.
To analyse this, consider the intersection of $\gamma_{i+1}$ and each $\gamma_j$ , $1\leq j\leq i$ . Notice that $\psi$ may increase only if there are vertices at which the two walks split apart. We claim that there are at most two such vertices. Assume that at $v_1,v_2$ and $v_3$ the two walks split. Then there are two disjoint cycles in the union of $\gamma_{i+1}$ and $\gamma_j$ , obtained by following each first from $v_1$ to $v_2$ and then from $v_2$ to $v_3$ . But this is a contradiction, since the diameter of this union is less than $2 \ell < \frac{1}{8} \log(n)$ , which implies that their union is tangle-free. We conclude that $\psi(\gamma_{i+1} \cup \gamma_j) \leq 8$ since there are at most two splits and each split contributes at most four to the imbalance. Then $\psi(\gamma_{\leq i+1}) \leq \psi(\gamma_{\leq i}) + 8i + 2$ , which implies that $\psi(\gamma)\leq 16k^2$ , as desired.
5.7. Bounding the inconsistent edges
We will need a bound on the number of inconsistent edges of multiplicity one, which we get in the following lemma. Recall Definition 5.5, which introduced inconsistent edges.
Lemma 5.13. Let $b_\mathcal{C}$ denote the number of inconsistent edges of multiplicity one on a circuit $\gamma = (\gamma_1, \ldots, \gamma_{2k})$ consisting of 2k non-backtracking walks of length $\ell$ each. It holds that $ b_\mathcal{C} \leq 4 (k + \chi)$ , where $\chi=\mathcal{E}-\mathcal{V}+1$ .
Proof. Let $\{e,f\}$ be an inconsistent edge of multiplicity one, where e and f are its half edges. For inconsistency and without loss of generality, there must exist another edge $\{e,f'\}$ in $\gamma$ , so that $m_\gamma (e) \neq 1$ . We may assume that $\{e,f\}$ is traversed before $\{e,f'\}$ . Let v be the vertex of e and consider the two possible scenarios:

Case 1: There is no cycle containing v in $\gamma$ . Then the edge $\{e,f\}$ may only be inconsistent if v is visited at the end of one of the 2k non-backtracking walks $\gamma_i$ and $\{e,f'\}$ is at the beginning of $\gamma_{i+1}$ . Hence, in this case, we have at most two inconsistent edges of multiplicity one. This yields at most 4k such edges.

Case 2: There is a cycle passing through v. For each such cycle, there is an edge that does not belong to the tree $\mathcal{T}_{\gamma}$ (defined in Section 5.4). Furthermore, each cycle creates at most four inconsistent edges. Combining these two facts we get at most $4\chi$ such edges, and the proof follows.
Lemma 5.14. Consider a circuit $\gamma = (\gamma_1, \ldots, \gamma_{2k})$ with $\gamma_s \in T^{\ell,j}$ for $s \in [2k]$ , with $\gamma_s$ decomposed as $\gamma^{\prime}_s, \gamma^{\prime\prime}_s, \gamma^{\prime\prime\prime}_s$ as in Definition 5.4. Let $b_\mathcal{D}$ denote the number of inconsistent edges of multiplicity one in the union of segments $\bigcup_{s=1}^{2k} (\gamma^{\prime}_s \cup \gamma^{\prime\prime}_s)$ . Then $b_\mathcal{D}\leq 16k+4 \chi$ , where $\chi = \mathcal{E} - \mathcal{V} + 1$ .
Proof. The argument is similar to the above; however, now there are 4k segments in $\bar\gamma = \bigcup_{s=1}^{2k} (\gamma^{\prime}_s \cup \gamma^{\prime\prime}_s)$ , counting $\gamma^{\prime}_s$ and $\gamma^{\prime\prime}_s$ separately. As above, each of these 4k segments may yield at most 2 inconsistent edges. Furthermore, the graph induced by $\bar\gamma$ may not be connected; let C be the number of connected components. Each edge that creates a cycle may yield at most 4 inconsistent edges, and there are at most $\mathcal{E} - \mathcal{V} + C$ non-tree edges. Then, we have that
as claimed.
5.8. Bounds on the norm of $\bar{B}^{\ell}$ and $R^{\ell,j}$
All of the ingredients are gathered to bound the matrix norms.
Theorem 5.15. Let $\ell = \lfloor c\log(n) \rfloor$ where $c < \frac{1}{32}$ is a universal constant. It holds that
asymptotically almost surely.
Proof. The following holds for any natural number k, but for our proof, we will take
We have
The sum is taken over the set $\mathcal{C}$ of all circuits $\gamma$ of length $2k\ell$ , where $\gamma= ( \gamma_1, \gamma_2, \ldots, \gamma_{2k} )$ is formed by concatenation of 2k tangle-free segments $\gamma_s \in F^{\ell}$ , with the convention $e^{s+1}_1=e^s_{\ell+1}$ . Again, refer to Figure 4 for clarification.
As in Section 5.4, we will break these into circuits which visit exactly $\mathcal{V} = |V(\gamma)|$ different vertices, $\mathcal{R} = |V(\gamma) \cap V_2|$ of them in the right set, and $\mathcal{E} = |E(\gamma)|$ different edges. We define three disjoint sets of circuits:
Define the quantities
for $j = 1, 2$ and 3, so that (24) can be bounded as
We will bound each term on the right-hand side above. The reason for this division is that, by Lemma 5.6, when we have any two-path traversed exactly once, the expectation of the corresponding circuit is smaller, because the matrix $\bar{B}$ is nearly centred. We will see that the leading order terms in equation (24) will come from circuits in $\mathcal{C}_1$ . From Lemmas 5.6 and 5.8 and Corollary 5.11, we get that
We use $C, c_0, c_1, c_2, c_3, c_4$ to denote constant terms and set $\alpha = ((d_1-1)(d_2-1))^{1/4}$ . In the last line, we used Lemmas 5.12 and 5.13 to bound $\psi$ and $b_\mathcal{C}$ in terms of k and $\chi$ and remove the sum over $\mathcal{R}$ , which contains at most $\mathcal{V}$ terms. We will use equation (26) to bound each $I_j$ .
5.8.1 Bounding $I_1$
Here, $\mathcal{E}_1 = 0$ since every edge is traversed twice. We then have that $\mathcal{V} - 1 \leq \mathcal{E} \leq k\ell$ . Since $\gamma$ is connected, we have $1 \leq \mathcal{V} \leq k\ell+1$ . Thus, on the right-hand side of equation (26) we get
The second sum is upper bounded by $\sum_{\chi=0}^\infty\left( C \frac{ ((\mathcal{V}+1)^2 (\ell+1) )^{2k} }{n} \right)^\chi$ , where C is some constant, which we show next is bounded by a common constant for our choices of k and $\ell$ and all $\mathcal{V} \in [k\ell + 1]$ . To see this, it will suffice to show that $((\mathcal{V}+1)^2 (\ell+1) )^{2k}=o(n)$ . But $((\mathcal{V}+1)^2 (\ell+1) ) = O(k^2 \ell^3)$ which, for our choices of k and $\ell$ , yields
as desired.
Finally, the first summand is maximised for $\mathcal{V} = k\ell + 1$ and there are at most $k\ell + 1$ terms in that sum. Therefore, modifying the constant $c_0$ yields
5.8.2 Bounding $I_2$
Here there is at least one edge traversed exactly once, so we have $\mathcal{E} \geq \mathcal{V}$ for $\gamma\in \mathcal{C}_2$ . Taking $\mathcal{E}_1 = 0$ only increases the right-hand side of equation (26); it becomes
Notice that this last term is almost identical to the one in the bound of $I_1$ , except that now we start the second sum at $\chi=1$ , which leads to an extra factor of $O( ((\mathcal{V} + 1)^2 (\ell + 1))^{2k} / n )$ . This allows us to factor out another geometric series and proceed as we did for $I_1$ . This yields
since there are $k\ell+1$ terms in the first sum.
5.8.3 Bounding $I_3$
This set will require more delicate treatment, since circuits in $\mathcal{C}_3$ visit potentially many vertices and edges, yet we need to keep the power of $\alpha$ at most $2k\ell$ .
We first show that, in this case, $\mathcal{E}_1$ is also large. We have $\mathcal{E} \geq \mathcal{V}$ , and let $\mathcal{V} = k\ell + t$ . Define $\mathcal{E}^{\prime}_1$ as the number of edges traversed once in $\gamma$ , so that $\mathcal{E}^{\prime}_1 = b + \mathcal{E}_1$ . Since $\gamma$ has length $2k \ell$ , we deduce that $2 (\mathcal{E} - \mathcal{E}^{\prime}_1) + \mathcal{E}^{\prime}_1 \leq 2k\ell$ , which implies that $\mathcal{E}^{\prime}_1 \geq 2t$ . Finally, Lemma 5.13 yields $\mathcal{E}_1 \geq (2t - 4 (\chi + k))_+$ . Equation (26) then gives
To simplify our notation, we will write
Observe now that we can write
where $c(k,\ell)=c_3\alpha^2((2k\ell+1)^2(\ell+1))^{2k}$. To bound the double sum on the right-hand side above, we start by removing a factor of $\frac{c(k,\ell)}{n}$, which leaves
The n in the denominator is crucial to cancel the linear term in $F(k,\ell)$, keeping the upper bound for $I_3$ small. We focus on bounding the double sum. We split the sum over t into two parts.
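The split point comes from the sign of the exponent $(t-2k-2-2\chi)_+$ that appears in the two cases below. A short check, using only $\chi \geq 0$ and the usual convention $(x)_+ = \max(x,0)$:

$$t - 2k - 2 - 2\chi > 0 \iff \chi < \frac{t}{2} - k - 1.$$

If $t < 2k+2$, the right-hand side is negative, so the positive part vanishes for every $\chi \geq 0$. If $t \geq 2k+2$, the quantity $t-2k-2-2\chi$ is nonnegative exactly for $\chi \leq \lfloor t/2 - k - 1 \rfloor$, and the positive part vanishes for all larger $\chi$.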
Case 1: $t< 2k+2$. For these values of t, we have $(t-2k-2-2\chi)_+=0$, hence
where the last equality uses again the same geometric upper bound for the sum over $\chi$ .
Case 2: $t\geq 2k+2$. We split the second sum into the terms from $\chi=0$ to $N=\lfloor t/2-k-1\rfloor$ and the terms with $\chi > N$, and analyse the two parts separately. The first can be upper bounded by
The last inequality can be checked in two steps. We first factor out the power $\alpha^{4k+4}$, and then use that $\frac{c(k,\ell)}{n}\leq \frac{c_4 k \ell}{\sqrt{n}}$, which holds for large enough n, to reduce the second sum to a sum of $N+1$ equal terms. To bound the right-hand side, we once more bound from above by a geometric series of ratio less than one to get
We are left with the terms $\chi > N$. In this case, we get $(t-2k-2-2\chi)_+=0$, so
The sum over $\chi$ is of the order of $\left(\frac{c(k,\ell)}{n}\right)^{N+1}\leq \left(\frac{c(k,\ell)}{n}\right)^{t/2-k-1}$. Substituting this into the above, we are left with
for some universal constant C. After factoring out $\alpha^{4k+4}$ and changing variables in the summation, we conclude that
Using (29) together with the results for Case 1, (30), and Case 2, (31) and (32), we conclude that
5.8.4 Finishing the proof of Theorem 5.15
We have bounded the three pieces we need to prove the theorem. From (27), (28), and (33), with n sufficiently large, we get
where $c_5, c_6, C, C', C''$ are universal constants.
Take any $\epsilon > 0$ . It can be checked that $(\log n)^{20 (\log n)^{1/3} + 6}= o(n^{\epsilon})$ , and $f(n) = o(n^{\epsilon})$ as well. Let $g(n) = \exp((\log n)^{3/4})$ ; then $g(n) = o(n^\epsilon)$ but $g(n)^{2k} \gg n^\epsilon$ . We apply Markov’s inequality, so that
which is the statement of the theorem.
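The two $o(n^{\epsilon})$ claims used in the last step can be checked by taking logarithms. A sketch for fixed $\epsilon > 0$:

$$\log\left( (\log n)^{20(\log n)^{1/3}+6} \right) = \left( 20 (\log n)^{1/3} + 6 \right) \log\log n = o(\log n), \qquad \log g(n) = (\log n)^{3/4} = o(\log n).$$

Any function $h$ with $\log h(n) = o(\log n)$ satisfies $h(n) \leq n^{\epsilon/2}$ for large n, and hence $h(n) = o(n^{\epsilon})$. The form of Markov's inequality that applies to a $2k$-th moment is, for a generic nonnegative random variable $X$ and threshold $a > 0$ (both symbols are placeholders, not notation from the proof):

$$\mathbb{P}(X \geq a) = \mathbb{P}\left(X^{2k} \geq a^{2k}\right) \leq \frac{\mathbb{E}\left[X^{2k}\right]}{a^{2k}}.$$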
Theorem 5.16. Let $1\leq j\leq \ell = \lfloor c \log(n) \rfloor$ where $c < \frac{1}{32}$ is a universal constant. Then
asymptotically almost surely.
Proof. The proof is analogous to the proof of Theorem 5.15. Recall the definition of $R^{\ell,j}$ in equation (15). For any integer k, we have that
Now, the sum is over the set $\mathcal{D}$ of circuits $\gamma= (\gamma_1,\gamma_2,\dots,\gamma_{2k})$ of length $2k\ell$ formed from 2k elements of $T^{\ell,j}$ , $\gamma_s = (e^s_1, e^s_2, \ldots, e^s_{2\ell+1} )$ for