The Critical Mean-field Chayes-Machta Dynamics

The random-cluster model is a unifying framework for studying random graphs, spin systems and electrical networks that plays a fundamental role in designing efficient Markov Chain Monte Carlo (MCMC) sampling algorithms for the classical ferromagnetic Ising and Potts models. In this paper, we study a natural non-local Markov chain known as the Chayes-Machta dynamics for the mean-field case of the random-cluster model, where the underlying graph is the complete graph on $n$ vertices. The random-cluster model is parametrized by an edge probability $p$ and a cluster weight $q$. Our focus is on the critical regime: $p = p_c(q)$ and $q \in (1,2)$, where $p_c(q)$ is the threshold corresponding to the order-disorder phase transition of the model. We show that the mixing time of the Chayes-Machta dynamics is $O(\log n \cdot \log \log n)$ in this parameter regime, which reveals that the dynamics does not undergo an exponential slowdown at criticality, a surprising fact that had been predicted (but not proved) by statistical physicists. This also provides a nearly optimal bound (up to the $\log\log n$ factor) for the mixing time of the mean-field Chayes-Machta dynamics in the only regime of parameters where no non-trivial bound was previously known. Our proof consists of a multi-phased coupling argument that combines several key ingredients, including a new local limit theorem, a precise bound on the maximum of symmetric random walks with varying step sizes, and tailored estimates for critical random graphs. In addition, we derive an improved comparison inequality between the mixing time of the Chayes-Machta dynamics and that of the local Glauber dynamics on general graphs; this results in better mixing time bounds for the local dynamics in the mean-field setting.


Introduction
The random-cluster model generalizes classical random graph and spin system models, providing a unifying framework for their study [14]. It plays an indispensable role in the design of efficient Markov Chain Monte Carlo (MCMC) sampling algorithms for the ferromagnetic Ising/Potts model [31,8,20] and has become a fundamental tool in the study of phase transitions [2,12,11].
The random-cluster model is defined on a finite graph $G = (V, E)$ with an edge probability parameter $p \in (0, 1)$ and a cluster weight $q > 0$. The set of configurations of the model is the set of all subsets of edges $A \subseteq E$. The probability of each configuration is given by the Gibbs distribution:
$$\mu_{G,p,q}(A) = \frac{p^{|A|}(1-p)^{|E|-|A|}\, q^{c(A)}}{Z}, \qquad (1)$$
where $c(A)$ is the number of connected components in $(V, A)$ and $Z := Z(G, p, q)$ is the normalizing factor called the partition function.
The special case when $q = 1$ corresponds to the independent bond percolation model, where each edge of the graph appears independently with probability $p$. Independent bond percolation is also known as the Erdős-Rényi random graph model when $G$ is the complete graph.
For integer $q \ge 2$, the random-cluster model is closely related to the ferromagnetic $q$-state Potts model. Configurations in the $q$-state Potts model are the assignments of spin values $\{1, \dots, q\}$ to the vertices of $G$; the $q = 2$ case corresponds to the Ising model. A sample $A \subseteq E$ from the random-cluster distribution can be easily transformed into one for the Ising/Potts model by independently assigning a random spin from $\{1, \dots, q\}$ to each connected component of $(V, A)$. Random-cluster based sampling algorithms, which include the widely studied Swendsen-Wang dynamics [30], are an attractive alternative to Ising/Potts Markov chains since they are often efficient at "low temperatures" (large $p$). In this parameter regime, several standard Ising/Potts Markov chains are known to converge slowly.
In this paper we investigate the Chayes-Machta (CM) dynamics [10], a natural Markov chain on random-cluster configurations that converges to the random-cluster measure. The CM dynamics is a generalization to non-integer values of $q$ of the widely studied Swendsen-Wang dynamics [30]. As with all applications of the MCMC method, the primary object of study is the mixing time, i.e., the number of steps until the dynamics is close to its stationary distribution, starting from the worst possible initial configuration. We are interested in understanding how the mixing time of the CM dynamics grows as the size of the graph increases, and in particular how it relates to the phase transition of the model. Given a random-cluster configuration $(V, A_t)$, one step of the CM dynamics is defined as follows: (i) activate each connected component of $(V, A_t)$ independently with probability $1/q$; (ii) remove all edges connecting active vertices; (iii) add each edge between active vertices independently with probability $p$, leaving the rest of the configuration unchanged.
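On the complete graph, the dynamics depends only on the partition of the vertex set into connected components, so one CM step can be sketched by tracking just that partition. The following is a minimal illustrative sketch (our own naming throughout, not the paper's implementation), using a naive union-find to extract the components of the percolation sub-step.

```python
import random

def cm_step(n, components, p, q, rng):
    """One step of the CM dynamics on the complete graph K_n, tracking only
    the partition of {0, ..., n-1} into connected components.
    `components` is a list of lists of vertices."""
    # (i) Activation sub-step: each component is activated independently
    # with probability 1/q.
    active, inactive = [], []
    for comp in components:
        (active if rng.random() < 1.0 / q else inactive).append(comp)
    # (ii)+(iii) Percolation sub-step: the configuration on the active
    # vertices is replaced by a G(m, p) random graph, m = #active vertices.
    verts = [v for comp in active for v in comp]
    m = len(verts)
    parent = list(range(m))          # union-find over the active vertices
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for i in range(m):
        for j in range(i + 1, m):
            if rng.random() < p:     # each edge appears independently
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[ri] = rj
    merged = {}
    for i in range(m):
        merged.setdefault(find(i), []).append(verts[i])
    return inactive + list(merged.values())
```

Iterating `cm_step` from any initial partition yields a valid trajectory of the component-size process; with $p = \lambda/n$ the percolation sub-step is exactly a $G(m, \lambda/n)$ replacement on the active vertices.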
We call (i) the activation sub-step, and (ii) and (iii) combined the percolation sub-step. It is easy to check that this dynamics is reversible with respect to the Gibbs distribution (1) and thus converges to it [10]. For integer $q$, the CM dynamics may be viewed as a variant of the Swendsen-Wang dynamics. In the Swendsen-Wang dynamics, each connected component of $(V, A_t)$ receives a random color from $\{1, \dots, q\}$, and the edges are updated within each color class as in (ii) and (iii) above; in contrast, the CM dynamics updates the edges of exactly one color class. However, note that the Swendsen-Wang dynamics is only well-defined for integer $q$, while the CM dynamics is feasible for any real $q > 1$. Indeed, the CM dynamics was introduced precisely to allow this generalization.

The study of the interplay between phase transitions and the mixing time of Markov chains goes back to pioneering work in mathematical physics in the late 1980s. This connection for the specific case of the CM dynamics on the complete $n$-vertex graph, known as the mean-field model, has received some attention in recent years (see [7,15,18]) and is the focus of this paper. As we shall see, the mean-field case is already quite non-trivial, and has historically proven to be a useful starting point in understanding various types of dynamics on more general graphs. We note that, so far, the mean-field is the only setting in which there are tight mixing time bounds for the CM dynamics; all other known bounds are deduced indirectly via comparison with other Markov chains, thus incurring significant overhead [8,6,17,5,31,7].

The phase transition for the mean-field random-cluster model is fairly well-understood [9,25]. In this setting, it is natural to re-parameterize by setting $p = \lambda/n$; the phase transition then occurs at the critical value $\lambda = \lambda_c(q)$, where $\lambda_c(q) = q$ when $q \in (0, 2]$ and $\lambda_c(q) = 2\left(\frac{q-1}{q-2}\right)\log(q-1)$ for $q > 2$.
For $\lambda < \lambda_c(q)$ all components are of size $O(\log n)$ with high probability (w.h.p.); that is, with probability tending to 1 as $n \to \infty$. This regime is known as the disordered phase. On the other hand, for $\lambda > \lambda_c(q)$ there is a unique giant component of size $\approx \theta n$, where $\theta = \theta(\lambda, q)$; this regime of parameters is known as the ordered phase. The phase transition is thus analogous to that in $G(n, p)$ corresponding to the emergence of a giant component.
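The critical value is straightforward to evaluate. The following helper (our own naming) implements $\lambda_c(q) = q$ for $q \in (0, 2]$ and $\lambda_c(q) = 2\left(\frac{q-1}{q-2}\right)\log(q-1)$ for $q > 2$.

```python
import math

def lambda_c(q):
    """Critical value lambda_c(q) of lambda = p * n for the mean-field
    random-cluster model: q for q in (0, 2], and
    2 * ((q - 1) / (q - 2)) * log(q - 1) for q > 2."""
    if q <= 0:
        raise ValueError("cluster weight q must be positive")
    if q <= 2:
        return q
    return 2.0 * (q - 1.0) / (q - 2.0) * math.log(q - 1.0)
```

Note that the two branches agree in the limit $q \to 2^+$, since $\log(q-1) \approx q-2$ there, so $\lambda_c$ is continuous at $q = 2$.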
The phase structure of the mean-field random-cluster model, however, is more subtle and depends crucially on the second parameter $q$. In particular, when $q > 2$ the model exhibits phase coexistence at the critical threshold $\lambda = \lambda_c(q)$. Roughly speaking, this means that when $\lambda = \lambda_c(q)$, the set of configurations with all connected components of size $O(\log n)$, and the set of configurations with a unique giant component, each contribute a constant fraction of the probability mass. For $q \le 2$, on the other hand, there is no phase coexistence. These subtleties are illustrated in Figure 1.
Phase coexistence at $\lambda = \lambda_c(q)$ when $q > 2$ has significant implications for the speed of convergence of Markov chains, including the CM dynamics. The following detailed connection between the phase structure of the model and the mixing time $\tau_{\mathrm{mix}}^{\mathrm{CM}}$ of the CM dynamics was recently established in [7,4,18]. When $q > 2$, we have:
$$\tau_{\mathrm{mix}}^{\mathrm{CM}} = \begin{cases} \Theta(\log n) & \text{if } \lambda \notin [\lambda_s, \lambda_S], \\ n^{\Theta(1)} & \text{if } \lambda \in \{\lambda_s, \lambda_S\}, \\ e^{\Omega(n)} & \text{if } \lambda \in (\lambda_s, \lambda_S), \end{cases} \qquad (2)$$
where $(\lambda_s, \lambda_S)$ is the so-called metastability window. It is known that $\lambda_S = q$, but $\lambda_s$ does not have a closed form; see [7,25]; we note that $\lambda_c(q) \in (\lambda_s, \lambda_S)$ for $q > 2$.
When $q \in (1, 2]$, there is no metastability window, and the mixing time of the mean-field CM dynamics is $\Theta(\log n)$ for all $\lambda \ne \lambda_c(q)$. In view of these results, the only case remaining open is when $q \in (1, 2]$ and $\lambda = \lambda_c(q)$. Our main result shown below concerns precisely this regime, which is particularly delicate and had resisted analysis until now for reasons we explain in our proof overview.
Theorem 1.1. The mixing time of the CM dynamics on the complete $n$-vertex graph when $\lambda = \lambda_c(q) = q$ and $q \in (1, 2)$ is $O(\log n \cdot \log \log n)$.
An $\Omega(\log n)$ lower bound is known for the mixing time of the mean-field CM dynamics that holds for all $p \in (0, 1)$ and $q > 1$ [7]. Therefore, our result is tight up to the lower-order $O(\log \log n)$ factor, and in fact even better, as we explain in Remark 4. The conjectured tight bound when $\lambda = \lambda_c(q)$ and $q \in (1, 2)$ is $\Theta(\log n)$. We mention that the $\lambda = \lambda_c(q)$ and $q = 2$ case, which is quite different and not covered by Theorem 1.1, was considered earlier in [24] for the closely related Swendsen-Wang dynamics, and a tight $\Theta(n^{1/4})$ bound was established for its mixing time. The same mixing time bound is expected for the CM dynamics in this regime.
Our result establishes a striking behavior for random-cluster dynamics when $q \in (1, 2)$. Namely, there is no slowdown (exponential or power law) in this regime at the critical threshold $\lambda = \lambda_c(q)$. Note that for $q > 2$, as described in (2) above, the mixing time of the dynamics undergoes an exponential slowdown, transitioning from $\Theta(\log n)$ when $\lambda < \lambda_s$, to a power law at $\lambda = \lambda_s$, and to exponential in $n$ when $\lambda \in (\lambda_s, \lambda_S)$. The absence of a critical slowdown for $q \in (1, 2)$ was in fact predicted by the statistical physics community [16], and our result provides the first rigorous proof of this phenomenon. See Remark 1 for further comments.
Our second result concerns the local Glauber dynamics for the random-cluster model. In each step, the Glauber dynamics updates a single edge of the current configuration chosen uniformly at random; a precise definition of this Markov chain is given in Section 6. In [7], it was established that any upper bound on the mixing time $\tau_{\mathrm{mix}}^{\mathrm{CM}}$ of the CM dynamics can be translated to one for the mixing time $\tau_{\mathrm{mix}}^{\mathrm{GD}}$ of the Glauber dynamics, at the expense of a $\tilde{O}(n^4)$ factor; the $\tilde{O}$ notation hides polylogarithmic factors. In particular, it was proved in [7] that $\tau_{\mathrm{mix}}^{\mathrm{GD}} \le \tau_{\mathrm{mix}}^{\mathrm{CM}} \cdot \tilde{O}(n^4)$. We provide here an improvement of this comparison inequality.

Theorem 1.2. For all $q > 1$ and all $\lambda = O(1)$, $\tau_{\mathrm{mix}}^{\mathrm{GD}} \le \tau_{\mathrm{mix}}^{\mathrm{CM}} \cdot O(n^3 (\log n)^2)$.

To prove this theorem, we establish a general comparison inequality that holds for any graph, any $q \ge 1$ and any $p \in (0, 1)$; see Theorem 6.1 for a precise statement. When combined with the known mixing time bounds for the CM dynamics on the complete graph, Theorem 1.2 yields that the random-cluster Glauber dynamics mixes in $\tilde{O}(n^3)$ steps when $q > 2$ and $\lambda \notin (\lambda_s, \lambda_S)$, or when $q \in (1, 2)$ and $\lambda = O(1)$. In these regimes, the mixing time of the Glauber dynamics was previously known to be $\tilde{O}(n^4)$ and is conjectured to be $\tilde{O}(n^2)$; the improved comparison inequality in Theorem 1.2 gets us closer to this conjectured tight bound. We note, however, that even if one showed the conjectured optimal bound for the mixing time of the Glauber dynamics, the CM dynamics would still be faster, even taking into account the computational cost of implementing its steps.
We conclude this introduction with some brief remarks about our analysis techniques, which combine several key ingredients in a non-trivial way. Our bound on the mixing time uses the well-known technique of coupling: in order to show that the mixing time is $O(\log n \cdot \log \log n)$, it suffices to couple the evolutions of two copies of the dynamics, starting from two arbitrary configurations, in such a way that they arrive at the same configuration after $O(\log n)$ steps with probability $\Omega(1/\log \log n)$. (The moves of the two copies can be correlated any way we choose, provided that each copy, viewed in isolation, is a valid realization of the dynamics.) Because of the delicate nature of the phase transition in the random-cluster model, combined with the fact that the percolation sub-step of the CM dynamics is critical when $\lambda = q$, our coupling is somewhat elaborate and proceeds in multiple phases. The first phase consists of a burn-in period, where the two copies of the chain are run independently and the evolution of their largest components is observed until they have shrunk to their "typical" sizes. This part of the analysis is inspired by similar arguments in earlier work [7,24,15].
In the second phase, we design a coupling of the activation of the connected components of the two copies which uses: (i) a local limit theorem, which can be thought of as a stronger version of a central limit theorem; (ii) a precise understanding of the distribution of the maximum of symmetric random walks on $\mathbb{Z}$ with varying step sizes; and (iii) precise estimates for the component structure of random graphs. We develop tailored versions of these probabilistic tools for our setting and combine them to guarantee that the same number of vertices from each copy are activated in each step w.h.p. for sufficiently many steps. This phase of the coupling is the main novelty in our analysis, and allows us to quickly converge to the same configuration. We give a more detailed overview of our proof in the following section.

Proof sketch and techniques
We now give a detailed sketch of the multi-phased coupling argument for proving Theorem 1.1. We start by formally defining the notions of mixing and coupling times. Let $\Omega_{\mathrm{RC}}$ be the set of random-cluster configurations of a graph $G$; let $\mathcal{M}$ be the transition matrix of a random-cluster Markov chain with stationary distribution $\mu = \mu_{G,p,q}$, and let $\mathcal{M}^t(X_0, \cdot)$ be the distribution of the chain after $t$ steps starting from $X_0 \in \Omega_{\mathrm{RC}}$. The $\varepsilon$-mixing time of $\mathcal{M}$ is given by
$$\tau_{\mathrm{mix}}^{\mathcal{M}}(\varepsilon) := \min\Big\{t \ge 0 : \max_{X_0 \in \Omega_{\mathrm{RC}}} \|\mathcal{M}^t(X_0, \cdot) - \mu\|_{\mathrm{TV}} \le \varepsilon\Big\},$$
where $\|\cdot\|_{\mathrm{TV}}$ denotes total variation distance. In particular, the mixing time of $\mathcal{M}$ is $\tau_{\mathrm{mix}}^{\mathcal{M}} := \tau_{\mathrm{mix}}^{\mathcal{M}}(1/4)$.

A (one-step) coupling of the Markov chain $\mathcal{M}$ specifies, for every pair of states $(X_t, Y_t) \in \Omega_{\mathrm{RC}} \times \Omega_{\mathrm{RC}}$, a probability distribution over $(X_{t+1}, Y_{t+1})$ such that the processes $\{X_t\}$ and $\{Y_t\}$ are valid realizations of $\mathcal{M}$, and if $X_t = Y_t$ then $X_{t+1} = Y_{t+1}$. The coupling time, denoted $\tau_{\mathrm{coup}}$, is the minimum $t$ such that $\Pr[X_t \ne Y_t] \le 1/4$, starting from the worst possible pair of configurations in $\Omega_{\mathrm{RC}}$. It is a standard fact that $\tau_{\mathrm{mix}}^{\mathcal{M}} \le \tau_{\mathrm{coup}}$; moreover, if $\Pr[X_T = Y_T] \ge \delta$ for some coupling and some time $T$, then $\tau_{\mathrm{mix}}^{\mathcal{M}} = O(T\delta^{-1})$ (see, e.g., [22]).

We provide first a high-level description of our coupling for the CM dynamics. For this, we require the following notation. For a random-cluster configuration $X$, let $L_i(X)$ denote the size of the $i$-th largest connected component in $(V, X)$, and let $R_i(X) := \sum_{j \ge i} L_j(X)^2$; in particular, $R_1(X)$ is the sum of the squares of the sizes of all the components of $(V, X)$. Our coupling has three main phases:

1. Burn-in period: run two copies $\{X_t\}$, $\{Y_t\}$ independently, starting from a pair of arbitrary initial configurations, until $R_1(X_t) = O(n^{4/3})$ and $R_1(Y_t) = O(n^{4/3})$.

2. Coupling to the same component structure: starting from $X_0$ and $Y_0$ such that $R_1(X_0) = O(n^{4/3})$ and $R_1(Y_0) = O(n^{4/3})$, we design a two-phased coupling that reaches two configurations with the same component structure as follows:
2a. A two-step coupling after which the two configurations agree on all "large components";
2b. A coupling that after $O(\log n)$ additional steps reaches two configurations that will also have the same "small component" structure.

3. Coupling to the same configuration: starting from two configurations with the same component structure, there is a straightforward coupling that couples the two configurations in $O(\log n)$ steps w.h.p.
We proceed to describe each of these phases in detail.

The burn-in period
During the initial phase, two copies of the dynamics evolve independently. This is called a burn-in period and in our case consists of three sub-phases.
In the second and third sub-phases of the burn-in period, we use the fact that when $R_2(X_t) = O(n^{4/3})$, the number of activated vertices is well concentrated around $n/q$ (its expectation). This is used to show that the size of the largest component contracts at a constant rate for $O(\log n)$ steps until a configuration $X_t$ is reached such that $R_1(X_t) = O(n^{4/3})$. This part of the analysis is split into two sub-phases because the contraction for $L_1(X_t)$ requires a more delicate analysis when $L_1(X_t) = \Omega(n)$; this is captured in the following two lemmas.
Lemma 2.3. Let $\lambda = q$ and $q \in (1, 2)$. Suppose $R_2(X_0) = O(n^{4/3})$ and that $L_1(X_0) \le \varepsilon n$ for a sufficiently small constant $\varepsilon$. Then, with probability $\Omega(1)$, after $T = O(\log n)$ steps, $R_1(X_T) = O(n^{4/3})$.

Remark 2. Sub-steps (ii) and (iii) of the CM dynamics are equivalent to replacing the active portion of the configuration by a $G(m, \lambda/n)$ random graph, where $m$ is the number of active vertices. Since $E[m] = n/q$, one key challenge in the proofs of Lemmas 2.2 and 2.3, and in fact in the entirety of our analysis, is that the random graph $G(m, \lambda/n)$ is critical or almost critical w.h.p., since $m \cdot \lambda/n \approx 1$; consequently its structural properties are not well concentrated and cannot be maintained for the required $O(\log n)$ steps of the coupling. This is one of the key reasons why the $\lambda = \lambda_c(q) = q$ regime is quite delicate.

Coupling to the same component structure
For the second phase of the coupling, we assume that we start from a pair of configurations $X_0$, $Y_0$ such that $R_1(X_0) = O(n^{4/3})$ and $R_1(Y_0) = O(n^{4/3})$. The goal is to show that after $T = O(\log n)$ steps, with probability $\Omega(1/\log \log n)$, we reach two configurations $X_T$ and $Y_T$ with the same component structure; i.e., $L_i(X_T) = L_i(Y_T)$ for all $i \ge 1$. In particular, we prove the following.
Our coupling construction for proving Theorem 2.5 has two main sub-phases. The first is a two-step coupling after which the two configurations agree on all the components of size above a certain threshold $B_n = n^{2/3}/g(n)$, where $g(n)$ is a slowly increasing function. For convenience and definiteness we set $g(n) = \log \log \log \log n$. In the second sub-phase we take care of matching the small component structures.
We note that when the same number of vertices is activated from each copy of the chain, we can easily couple the percolation sub-step (with an arbitrary bijection between the activated vertices) and replace the configuration on the active vertices in both chains with the same random sub-graph; consequently, the component structures of the updated sub-graphs are identical. Our goal is thus to design a coupling of the activation of the components that activates the same number of vertices in both copies in every step.
In order for the initial two-step coupling to succeed, certain (additional) properties of the configurations are required. These properties are achieved with a continuation of the initial burn-in phase for a small number of $O(\log g(n))$ steps. For a random-cluster configuration $X$, let $R_{\le k}(X) := \sum_{i : L_i(X) \le k} L_i(X)^2$ and let $I(X)$ denote the number of isolated vertices of $X$. Our extension of the burn-in period is captured by the following lemma.
The proof of Lemma 2.6 is provided in Section 5.1. With these bounds on $R_{\le k}(X_t)$, $R_{\le k}(Y_t)$, $I(X_t)$ and $I(Y_t)$, we construct the two-step coupling for matching the large component structure. The construction crucially relies on a new local limit theorem (Theorem 5.1). In particular, under our assumptions, when $g(n)$ is small enough, there are few components with sizes above the threshold. Hence, we can condition on the event that all of them are activated simultaneously. The difference in the number of active vertices generated by the activation of these large components can then be "corrected" by a coupling of the activation of the smaller components; for this we use our new local limit theorem.
Specifically, our local limit theorem applies to the random variables corresponding to the number of activated vertices from the small components of each copy. We prove it using a result of Mukhin [28] and the fact that, among the small components, there are (roughly speaking) many components of many different sizes. To establish the latter we require a refinement of known random graph estimates (see Lemma 3.11).
To formally state our result we introduce some additional notation. Let $\mathcal{S}_B(X)$ be the set of connected components of $X$ with sizes greater than $B$. At step $t$, the activation of the components of two random-cluster configurations $X_t$ and $Y_t$ is done using a maximal matching $W_t$ between the components of $X_t$ and $Y_t$, with the restriction that only components of equal size are matched to each other. For an increasing positive function $g$ and each integer $j \ge 0$, define $\hat{N}_j(X_t, Y_t) := \hat{N}_j(X_t, Y_t, g)$ as the number of matched pairs in $W_t$ whose component sizes lie in the interval $I_j(g)$, where $c > 0$ is a fixed large constant (independent of $n$) appearing in the definition of these intervals.
From the first part of the lemma we obtain two configurations that agree on all of their large components, as desired, while the second part guarantees additional structural properties for the resulting configurations so that the next sub-phase of the coupling can also succeed with the required probability. The proof of Lemma 2.7 is given in Section 5.2.
In the second sub-phase, after the large components are matched, we can design a coupling that activates exactly the same number of vertices from each copy of the chain. To analyze this coupling we use a precise estimate on the distribution of the maximum of symmetric random walks over the integers (with steps of different sizes). We are first required to run the chains coupled for $O(\log g(n))$ steps, so that certain additional structural properties appear. Let $W_t(X)$ and $W_t(Y)$ be the components in the matching $W_t$ that belong to $X_t$ and $Y_t$, respectively, and let $M_t(X)$ and $M_t(Y)$ be their complements.

Lemma 2.8. Let $\lambda = q$ and $q \in (1, 2)$. Suppose $X_0$ and $Y_0$ are random-cluster configurations such that $\mathcal{S}_{B_n}(X_0) = \mathcal{S}_{B_n}(Y_0)$, that $\hat{N}_j(X_0, Y_0)$ is sufficiently large (polynomially in $g(n)$) at every scale $j \ge 1$ for which $I_j(g)$ contains sizes below the threshold, and that certain additional moment bounds hold for $X_0$, and similarly for $Y_0$.
The proof of Lemma 2.8 also uses our local limit theorem (Theorem 5.1) and is provided in Section 5.3.
The final step of our construction is a coupling of the activation of the components of size less than the threshold, so that exactly the same number of vertices is activated from each copy in each step w.h.p.

Lemma 2.9. Let $\lambda = q$, $q \in (1, 2)$ and suppose $X_0$ and $Y_0$ are random-cluster configurations satisfying the structural conditions above. Then, there exist a coupling of the CM steps and a constant $c > 0$ such that after $T = O(\log n)$ steps, $X_T$ and $Y_T$ have the same component structure with probability $\Omega\big((\log \log \log n)^{-c}\big)$.
We comment briefly on how we prove this lemma. Our starting point is two configurations with the same "large" component structure; i.e., $\mathcal{S}_{B_n}(X_0) = \mathcal{S}_{B_n}(Y_0)$. We use the maximal matching $W_0$ to couple the activation of the large components in $X_0$ and $Y_0$. The small components not matched by $W_0$ are then activated independently. This creates a discrepancy $D_0$ between the numbers of active vertices from each copy. Since $E[D_0] = 0$ and $\mathrm{Var}(D_0) = \Theta(n^{4/3} g(n)^{-1/2})$, it follows from Hoeffding's inequality that $|D_0| \le n^{2/3} g(n)^{-1/4}$ w.h.p. To fix this discrepancy, we use the small components matched by $W_0$. Specifically, under the assumptions in Lemma 2.9, we can construct a coupling of the activation of the small components so that the difference in the number of activated vertices from the small components of each copy is exactly $D_0$ with probability $\Omega(1)$. This part of the construction utilizes random walks over the integers; in particular, we use a lower bound for the maximum of such a random walk.
We need to repeat this process until the discrepancy reaches 0; this takes $O(\log n)$ steps since $D_t \approx (1 - 1/q)^t D_0$. However, there are a few complications. First, the initial assumptions on the component structure of the configurations are not preserved for this many steps w.h.p., so we need to relax the requirements as the process evolves. This is in turn possible because the discrepancy $D_t$ decreases with each step, which implies that the probability of success of the coupling increases at each step. See Section 5.4 for the detailed proof.
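The first-moment facts used here (zero-mean discrepancy with variance proportional to the sum of the squares of the independently activated component sizes) are easy to check numerically. The following toy simulation is our own illustration, not the paper's construction: it activates an identical multiset of component sizes independently in two copies, each with probability $1/q$, and measures the resulting discrepancy.

```python
import random
import statistics

def discrepancy(sizes, q, rng):
    """Discrepancy in the number of activated vertices between two copies
    when components of the given sizes (assumed identical in both copies
    for this toy check) are activated independently with probability 1/q."""
    d = 0
    for s in sizes:
        if rng.random() < 1.0 / q:  # activation in the first copy
            d += s
        if rng.random() < 1.0 / q:  # activation in the second copy
            d -= s
    return d
```

Each component of size $s$ contributes variance $2s^2 \cdot \frac{1}{q}\big(1 - \frac{1}{q}\big)$, so the discrepancy has mean 0 and standard deviation of order $\big(\sum_s s^2\big)^{1/2}$, matching the $\Theta$-bounds quoted above.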
We now indicate how these lemmas lead to a proof of Theorem 2.5 stated earlier.
Remark 3. We pause to mention that this delicate coupling for the activation of the components is not required when $\lambda = q$ and $q > 2$. In that regime, the random-cluster model is super-critical, so after the first $O(\log n)$ steps, the component structure is much simpler, with exactly one large component. On the other hand, when $\lambda = q$ and $q \in (1, 2]$ the model is critical, which, combined with the fact mentioned earlier that the percolation sub-step of the dynamics is also critical when $\lambda = q$, makes the analysis of the CM dynamics in this regime quite subtle.

Coupling to the same configuration
In the last phase of the coupling, suppose we start with two configurations $X_0$, $Y_0$ with the same component structure. We are still required to bound the number of steps until the same configuration is reached. The following lemma from [7] supplies the desired bound. Combining the results for each of the phases of the coupling, we now prove Theorem 1.1.
Proof of Theorem 1.1. By Theorem 2.4, after $T_0 = O(\log n)$ steps, with probability $\Omega(1)$, $R_1(X_{T_0}) = O(n^{4/3})$ and $R_1(Y_{T_0}) = O(n^{4/3})$. If this is the case, Theorem 2.5 and Lemma 2.10 imply that there exists a coupling of the CM steps such that, with probability $\Omega\big((\log \log n)^{-1}\big)$, after an additional $T_1 = O(\log n)$ steps, $X_{T_0 + T_1} = Y_{T_0 + T_1}$. Consequently, we obtain that $\tau_{\mathrm{mix}}^{\mathrm{CM}} = O(\log n \cdot \log \log n)$ as claimed.
Remark 4. The probability of success in Theorem 2.5, which governs the lower-order $\log \log n$ term in our mixing time bound, is controlled by our choice of the function $g(n)$ in the definition of "large components". By choosing a $g(n)$ that goes to $\infty$ more slowly, we could improve our mixing time bound to $O(\log n \cdot f(n))$, where $f(n)$ is any function that tends to infinity arbitrarily slowly. However, it seems that new ideas are required to obtain a bound of $O(\log n)$ (matching the known lower bound). In particular, the fact that $g(n) \to \infty$ is crucially used in some of our proofs. Our specific choice of $g(n)$ yields the $O(\log n \cdot \log \log n)$ bound and makes our analysis cleaner.

Random graph estimates
In this section, we compile a number of standard facts about the $G(n, p)$ random graph model which will be useful in our proofs. We use $G \sim G(n, p)$ to denote a random graph sampled from the standard $G(n, p)$ model, in which every edge appears independently with probability $p$. A $G(n, p)$ random graph is said to be sub-critical when $np < 1$. It is called super-critical when $np > 1$ and critical when $np = 1$. For a graph $G$, with a slight abuse of notation, let $L_i(G)$ denote the size of the $i$-th largest connected component in $G$, and let $R_i(G) := \sum_{j \ge i} L_j(G)^2$; note that the same notation is used for the components of a random-cluster configuration, but it will always be clear from context which case is meant.
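The qualitative difference between these regimes is easy to observe empirically. Below is a naive $O(n^2)$ sampler for the largest component of $G(n, p)$ using union-find (our own illustrative sketch; the paper does not rely on simulation).

```python
import random

def largest_component(n, p, rng):
    """Size of the largest connected component of a G(n, p) sample,
    computed with union-find over independent edge coin flips."""
    parent = list(range(n))
    size = [1] * n
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[ri] = rj
                    size[rj] += size[ri]
    return max(size[find(v)] for v in range(n))
```

At $p = c/n$ with $c < 1$ the largest component is logarithmic in $n$, at $c > 1$ it is linear (the giant component), and at $c = 1$ it is of order $n^{2/3}$, which is the critical scaling underlying the $n^{4/3}$ bounds on $R_1$ used throughout the paper.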

Consider the equation
$$x = 1 - e^{-\lambda x}, \qquad (3)$$
and let $\beta(\lambda)$ be defined as its unique positive root. Observe that $\beta$ is well-defined for $\lambda > 1$.
All the random graph facts stated so far can be either found in the literature, or follow directly from well-known results. The following lemmas are slightly more refined versions of similar results in the literature.
Lemma 3.10. Let $k_h = n^{2/3} h(n)^{-1}$, where $h : \mathbb{N} \to \mathbb{R}$ is a positive increasing function such that $h(n) = O(\log n)$. Then, for any $\delta \in (0, 1)$ there exists a constant $c = c(\delta) > 0$ such that the stated bound holds with probability at least $\delta$.

The proofs of Lemmas 3.10 and 3.11 are provided in Appendix C. Finally, the following corollary of Lemma 3.11 will also be useful. For a graph $G$, let $N_j(G, g)$ be the number of components of $G$ whose sizes are in the interval $I_j(g)$. We note that, with a slight abuse of notation, for a random-cluster configuration $X$, we also use $N_j(X, g)$ for the number of connected components of $X$ with sizes in $I_j(g)$.

The burn-in period: proofs
In this section we provide the proofs of Lemmas 2.2 and 2.3.

A drift function
Consider the mean-field random-cluster model with parameters $q \ge 1$ and $p = \lambda/n$. In this subsection, we introduce a drift function that captures the rate of decay of the size of the largest component in a configuration under steps of the CM dynamics, which will be helpful for proving Lemma 2.2; this function was first studied in [7]. Given $\alpha \in (0, 1]$, consider equation (4) and let $\phi(\alpha, \lambda, q)$ be defined as its largest positive root. We shall see that $\phi$ is not defined for all $\alpha$ and $\lambda$, since there may not be a positive root. When $\lambda$ and $q$ are clear from the context we write $\phi(\alpha) = \phi(\alpha, \lambda, q)$. Note that $\beta(\lambda)$ defined by equation (3) is the special case of (4) with $\alpha = 1$; observe that $\beta$ is only well-defined when $\lambda > 1$.

Shrinking a large component: proof of Lemma 2.2
The proof of Lemma 2.2 uses the following lemma, which follows directly from standard random graph estimates and Hoeffding's inequality. To simplify the notation, we let $\hat{L}(X) := L_1(X)/n^{2/3}$. We use $A_t(X)$ to denote the number of vertices activated in step $t$ of the CM dynamics from configuration $X$. Let $\Lambda_t$ denote the event that the largest component of the configuration is activated in step $t$.
Proof of Lemma 2.2. Let $\hat{T}$ be the first time $t$ at which $\hat{L}(X_t) \le n^{1/3}$, and let $T'$ be a large constant we choose later; we set $T := \min\{\hat{T}, T'\}$. Observe that with constant probability the largest component of the configuration is activated by the CM dynamics in every step $t \le T'$; i.e., the event $\Lambda_t$ occurs for every $t \le T'$. Let us assume this is the case and fix $t < T$.
If $A_t(X_t)$ lies in the required range, then the random graph $G(A_t(X_t), \lambda/n)$ is super-critical. Next, for a super-critical random graph, Lemma 3.3 provides a concentration bound for the size of the largest new component, provided $A_t(X_t)$ lies in this range. Since this holds regardless of $t$, Lemma 3.3 implies that, with high probability, the size of the largest new component is governed by the drift function $\phi$ defined in Section 4.1. By Lemma 4.1, we know $\phi(\alpha) > \varepsilon_1 > 0$ for a sufficiently small constant $\varepsilon_1$ (independent of $n$ and $t$). Hence, w.h.p., for sufficiently large $n$ this establishes (ii) from above. For (i), note that for $t < T$ we have $\hat{L}(X_t) > n^{1/3}$, so Lemma 4.2 applies. A union bound implies that these two events occur simultaneously w.h.p. and the result follows.

Shrinking a medium size component: proof of Lemma 2.3
In the third sub-phase of the burn-in period, we show that $L_1(X_t)$ contracts at a constant rate; the precise description of this phenomenon is captured in the following lemma.

1. There exists a constant $\rho := \rho(\lambda, q, \varepsilon) < 1$ for which the stated contraction holds.

Since Lemma 4.2 implies that $R_2(X_t) = O(n^{4/3})$ with reasonably high probability throughout the execution of this sub-phase, Lemmas 4.3 and 4.2 can be combined to derive the following more accurate contraction estimate, which will be crucial in the proof of Lemma 2.3: if $X_0$ is such that $\hat{L}(X_0) \ge (\log g(n))^8$ and $R_2(X_0) = O(n^{4/3})$, then there exists a constant for which the contraction holds. We first provide a proof of Lemma 2.3 that recursively uses the contraction estimate of Lemma 4.4.
where in the last inequality we use the assumption that $n^{1/3} \ge \hat{L}(X_t)$. For sufficiently small $\varepsilon$ and sufficiently large $n$, there exists $\rho < 1$ for which the claimed inequality holds, which concludes the proof of part 1. For part 2, note first that when the largest component is inactive, we have $L_1(X_{t+1}) \ge L_1(X_t)$; hence, it is sufficient to show that $L_1(X_{t+1}) \le L_1(X_t)$ with the desired probability.
We are now ready to prove Lemma 4.4.
where $T'$ is a large constant we choose later. Let $T := T' \wedge \hat{T}$, where the operator $\wedge$ takes the minimum of the two numbers. Define $A(t)$ as the number of steps up to time $t$ in which the largest component of the configuration is activated.
To facilitate the notation, we define the following events. (The constants $\rho$ and $c$ are those from Lemmas 4.3 and 4.2, respectively.)
Dividing equation (5) according to these events, the probability of this event can then be bounded as follows. Next, let us assume the event holds. Notice that if $T = \hat{T}$ then the proof is complete. Consequently, it suffices to show that $\hat{T} \le T'$ with probability at least $1 - g(n)^{-\Omega(1)}$. Observe that $A := A(T')$ is a binomial random variable with distribution $\mathrm{Bin}(T', 1/q)$, whose expectation is $T'/q$. If indeed $T = T'$ and $A$ is at least a constant fraction of its expectation, then the event implies that $\hat{L}$ has contracted below the required threshold, which leads to $\hat{T} \le T$. Therefore, $\hat{T} \le T'$ with the desired probability.
Proof of Claim 4.5. We first show the following inequality. Note that, by direct computation, and using the definition of $T$, the relevant quantity is at least $(\log g(n))^{48}$; hence $\log\big((\log g(n))^8\big) \le \log\big(g(n)^{1/6}\big)$. Putting all these together, the inequality follows. The proof of part (i) is inductive. The base case ($t = 0$) holds trivially. For the inductive step, the claimed bound follows from the inductive hypothesis together with (6). For part (ii) we also use induction. The base case ($t = 0$) can be checked straightforwardly. For the inductive step, the bound again follows from the inductive hypothesis and (6).

Coupling to the same component structure: proofs
In this section we provide the proofs of Lemmas 2.6, 2.7, 2.8 and 2.9.

Continuation of the burn-in phase: proof of Lemma 2.6
Recall that, for a random-cluster configuration $X$, $A(X)$ denotes the random variable corresponding to the number of vertices activated by step (i) of the CM dynamics from $X$.
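To make the dynamics concrete, the following is a minimal, illustrative simulation of one CM step at the level of component sizes (the function names and the component-size representation are ours, not the paper's; the activation probability $1/q$ and the percolation sub-step follow the description of the dynamics above):

```python
import random

def percolation_components(m, p, rng):
    """Component sizes of an Erdos-Renyi graph G(m, p), via union-find."""
    parent = list(range(m))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x
    for u in range(m):
        for v in range(u + 1, m):
            if rng.random() < p:
                ru, rv = find(u), find(v)
                if ru != rv:
                    parent[ru] = rv
    sizes = {}
    for x in range(m):
        r = find(x)
        sizes[r] = sizes.get(r, 0) + 1
    return sorted(sizes.values(), reverse=True)

def cm_step(component_sizes, p, q, rng):
    """One CM step: (i) activate each component independently with
    probability 1/q; (ii) replace the active part by a fresh G(m, p)."""
    active, inactive = 0, []
    for s in component_sizes:
        if rng.random() < 1.0 / q:
            active += s          # vertices of activated components
        else:
            inactive.append(s)   # inactive components are untouched
    return sorted(inactive + percolation_components(active, p, rng),
                  reverse=True)
```

Note that the step only depends on the configuration through its component sizes, which is why the mean-field analysis can track a few component statistics rather than the full graph.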
To establish (7)–(8), let $\mathcal{H}_1$ be the event that $A(X_t) \in [n/q - \alpha n^{2/3},\, n/q + \alpha n^{2/3}]$, where $\alpha > 0$ is a constant. By Hoeffding's inequality, for a suitable $\alpha > 0$, $\Pr[\mathcal{H}_1] \ge 1 - \frac{1}{8q^2}$, since $R_1(X_t) = O(n^{4/3})$. Let $H_t$ denote the subgraph induced on the inactivated vertices at step $t$. We can bound $E[R(H_t)]$ directly; hence, by Markov's inequality and the independence of the component activations, with probability at least $\frac{1}{4q^2}$ the activation sub-step is such that $H_t$ satisfies the required bounds. We denote this event by $\mathcal{H}_2$. It follows by a union bound that $\mathcal{H}_1$ and $\mathcal{H}_2$ happen simultaneously with probability at least $\frac{1}{8q^2}$. We assume that this is indeed the case and proceed to consider the percolation sub-step. Lemma 3.10 implies that there exists $c_1 > 0$ such that the corresponding bound holds with probability 99/100. Hence, the desired estimate follows, where the last inequality holds for a suitable constant in $(0, 1)$ and sufficiently large $n$, since $R(X_t) > n^{4/3} g(n)^{-1/2}$. On the other hand, Lemma 3.9 implies $E[R_1(X_{t+1})] = O(n^{4/3})$. By Markov's inequality, there exists $c_2$ such that the corresponding bound holds with probability 99/100 for large enough $n$. Finally, it follows from a union bound that (7) and (8) hold with the desired probability. To prove Lemma 2.7, we use a local limit theorem to construct a two-step coupling of the CM dynamics that reaches two configurations with the same large-component structure. The construction of Markov chain couplings using local limit theorems is not common (see [24] for another example), but it appears to be a powerful technique that may have other interesting applications. We next provide a brief introduction to local limit theorems.
Local limit theorem. Let $m$ be an integer. Let $a_1 \le \cdots \le a_m$ be integers and, for $i = 1, \ldots, m$, let $X_i$ be the random variable that is equal to $a_i$ with probability $p_i \in (0, 1)$ and is zero otherwise. Assume that $X_1, \ldots, X_m$ are independent random variables. Let $S = \sum_{i=1}^m X_i$, $\mu = E[S]$ and $\sigma^2 = \mathrm{Var}(S)$. We say that a local limit theorem holds for $S$ if for every integer $k \in \mathbb{Z}$:
$$\Pr[S = k] = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-(k-\mu)^2/(2\sigma^2)} + o(\sigma^{-1}).$$
We prove, under some conditions, a local limit theorem that applies to the random variables corresponding to the number of active vertices from small components. Recall that, for an increasing positive function $g$ and each integer $j \ge 0$, we defined the intervals $I_j(g)$, where $b > 0$ is a fixed large constant.
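As a quick sanity check of this definition, one can compare a Monte Carlo estimate of $\Pr[S = k]$ with the Gaussian density it should approach. The sketch below (our own illustration, with hypothetical helper names) does this in the simplest case $a_i = 1$, $p_i = 1/2$, where $S$ is binomial and the local limit theorem is classical:

```python
import math
import random

def empirical_pmf(sizes, probs, trials, rng):
    """Monte Carlo estimate of Pr[S = k] for S = sum_i a_i * Bernoulli(p_i)."""
    counts = {}
    for _ in range(trials):
        s = sum(a for a, p in zip(sizes, probs) if rng.random() < p)
        counts[s] = counts.get(s, 0) + 1
    return {k: c / trials for k, c in counts.items()}

def gaussian_density(k, mu, sigma):
    """Density of N(mu, sigma^2) at k; the local limit theorem asserts
    that Pr[S = k] is asymptotic to this value near the mean."""
    return math.exp(-((k - mu) ** 2) / (2 * sigma ** 2)) \
        / (math.sqrt(2 * math.pi) * sigma)
```

For instance, with $m = 100$, $a_i = 1$ and $p_i = 1/2$, the empirical value of $\Pr[S = 50]$ is close to $1/(\sqrt{2\pi}\cdot 5) \approx 0.0798$.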
Theorem 5.1 follows from a general local limit theorem proved in [28]; a proof is given in Appendix A. We provide next the proof of Lemma 2.7.
Proof of Lemma 2.7. First, both copies $\{X_t\}$ and $\{Y_t\}$ perform one independent CM step from the initial configurations $X_0$ and $Y_0$. We start by establishing that $X_1$ and $Y_1$ preserve the structural properties assumed for $X_0$ and $Y_0$.
By assumption $R_1(X_0) = O(n^{4/3})$, so Hoeffding's inequality implies that the number of activated vertices of $X_0$ satisfies $A(X_0) \in I := [n/q - O(n^{2/3}),\, n/q + O(n^{2/3})]$ with probability $\Omega(1)$. The percolation step is then distributed as a critical random graph, whose parameter is $O(1)$ with probability $\Omega(1)$. Conditioning on this event, from Lemma 3.2 we obtain that $I(X_1) = \Omega(n)$ w.h.p. Moreover, from Lemma 3.9 and Markov's inequality we obtain that $R_1(X_1) = O(n^{4/3})$ with probability at least 99/100, and from Lemma 3.10 that $R(X_1) = O(n^{4/3} g(n)^{-1/2})$, also with probability at least 99/100. We show next that $X_1$ and $Y_1$, in addition to preserving the structural properties of $X_0$ and $Y_0$, also have many connected components with sizes in certain carefully chosen intervals. This fact will be crucial in the design of our coupling. When $A(X_0) \in I$, by Lemmas 3.11 and 3.12 and a union bound, for all integers $j \ge 0$ such that $n^{2/3} g(n)^{-2^j} \to \infty$, we have $N(X_1, j) = \Omega(g(n)^{3 \cdot 2^{j-1}})$ w.h.p. (Recall that $N(X_1, j)$ denotes the number of connected components of $X_1$ with sizes in the interval $I_j(g)$.) We will also require a bound on the number of components with sizes in an interval $J$, whose defining constant is chosen so that $J$ does not intersect any of the intervals $I_j(g)$. Let $W_X$ (resp., $W_Y$) be the set of components of $X_1$ (resp., $Y_1$) with sizes in the interval $J$. Lemma 3.11 then implies the corresponding bounds, for some positive constants $c_1, c_2$ independent of $n$. All the bounds above also apply to the analogous quantities for $Y_1$, with the same respective probabilities. Therefore, by a union bound, all these properties hold simultaneously for both $X_1$ and $Y_1$ with probability $\Omega(1)$. We assume that this is indeed the case and proceed to describe the second step of the coupling, in which we shall use each of the established properties of $X_1$ and $Y_1$. Recall that $S_\theta(X_1)$ and $S_\theta(Y_1)$ denote the sets of connected components of $X_1$ and $Y_1$, respectively, with sizes larger than $\theta$. (Recall that $\theta = n^{2/3} g(n)^{-1}$, where $g(n) = \log\log\log\log n$.)
Since $R_1(X_1) = O(n^{4/3})$, the total number of components in $S_\theta(X_1)$ is $O(g(n)^2)$; moreover, it follows from the Cauchy–Schwarz inequality that the total number of vertices in the components of $S_\theta(X_1)$, denoted $\|S_\theta(X_1)\|$, is $O(n^{2/3} g(n))$; the same holds for $S_\theta(Y_1)$. Without loss of generality, let us assume that $\|S_\theta(X_1)\| \ge \|S_\theta(Y_1)\|$. Let $W_{\min}$ be a smallest subset of components of $W_Y$ such that the number of vertices in the union of $S_\theta(Y_1)$ and $W_{\min}$ is greater than the number of vertices in $S_\theta(X_1)$. Since every component in $W_Y$ has size at least $n^{2/3} g(n)^{-6}$ and $|W_Y| = \Omega(g(n)^9)$, the number of vertices in $W_Y$ is $\Omega(n^{2/3} g(n)^3)$, and so $W_{\min} \ne \emptyset$. In addition, the number of components in $W_{\min}$ is $O(g(n)^9)$. Let $S'_\theta(Y_1) = S_\theta(Y_1) \cup W_{\min}$ and observe that the number of components in $S'_\theta(Y_1)$ is also $O(g(n)^9)$ and that $0 \le \|S'_\theta(Y_1)\| - \|S_\theta(X_1)\| \le 2 n^{2/3} g(n)^{-6}$.
Note that $\|S_\theta(X_1)\| - \|S_\theta(Y_1)\|$ may be $\Omega(n^{2/3} g(n))$ (i.e., much larger than $\|S'_\theta(Y_1)\| - \|S_\theta(X_1)\|$). Hence, if all the components from $S_\theta(X_1)$ and $S_\theta(Y_1)$ were activated, the difference in the number of active vertices could be $\Omega(n^{2/3} g(n))$. This difference cannot be corrected by our coupling for the activation of the small components. We shall instead require that all the components from $S_\theta(X_1)$ and $S'_\theta(Y_1)$ are activated, so that the difference is $O(n^{2/3} g(n)^{-6})$ instead.
We now describe a coupling of the activation sub-step for the second step of the CM dynamics. As mentioned, our goal is to design a coupling in which the same number of vertices is activated in each copy. If indeed $A(X_1) = A(Y_1)$, then we can choose an arbitrary bijective map $\phi$ between the activated vertices of $X_1$ and the activated vertices of $Y_1$ and use $\phi$ to couple the percolation sub-step. Specifically, if $u$ and $v$ were activated in $X_1$, the state of the edge $\{u, v\}$ in $X_2$ and that of $\{\phi(u), \phi(v)\}$ in $Y_2$ would be the same. This yields a coupling of the percolation sub-step such that $X_2$ and $Y_2$ agree on the subgraph updated at time 1.
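The percolation coupling just described can be sketched as follows: assuming the two activated sets have the same size, a single set of edge coins drives both copies through a bijection, so the two updated subgraphs are isomorphic (the function names are ours):

```python
import random

def coupled_percolation(active_x, active_y, p, rng):
    """Couple the percolation sub-step of two copies whose activated
    vertex sets have the same size: fix a bijection phi and reuse one
    coin per edge pair, so the updated subgraphs are isomorphic."""
    assert len(active_x) == len(active_y)
    phi = dict(zip(active_x, active_y))  # arbitrary bijection
    edges_x, edges_y = [], []
    ax = list(active_x)
    for i in range(len(ax)):
        for j in range(i + 1, len(ax)):
            if rng.random() < p:         # one coin drives both copies
                u, v = ax[i], ax[j]
                edges_x.append((u, v))
                edges_y.append((phi[u], phi[v]))
    return edges_x, edges_y
```

Under this coupling the two copies have identical component structures on the updated part, which is exactly what the two-step argument needs.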
Suppose then that in the second CM step all the components in $S_\theta(X_1)$ and $S'_\theta(Y_1)$ are activated simultaneously. If this is the case, then the difference in the number of activated vertices is at most $2 n^{2/3} g(n)^{-6}$. We will use a local limit theorem (i.e., Theorem 5.1) to argue that there is a coupling of the activation of the remaining components of $X_1$ and $Y_1$ such that the total number of active vertices is the same in both copies with probability $\Omega(1)$. Since all the components in $S_\theta(X_1)$ and $S'_\theta(Y_1)$ are activated with probability $\exp(-O(g(n)^9))$, the overall success probability of the coupling will be $\exp(-O(g(n)^9))$. Now, let $a_1, a_2, \ldots, a_m$ be the sizes of the components of $X_1$ that are not in $S_\theta(X_1)$, in increasing order. Let $\hat{A}(X_1)$ be the random variable corresponding to the number of active vertices from these components. Observe that $\hat{A}(X_1)$ is a sum of independent random variables, where the $i$-th variable in the sum is equal to $a_i$ with probability $1/q$, and is 0 otherwise. We claim that the sequence $a_1, a_2, \ldots, a_m$ satisfies all the conditions of Theorem 5.1.

Re-contracting largest component: proof of Lemma 2.8
In Section 5.2, we designed a coupling argument to ensure that the largest components of both configurations have the same size. For this, we needed to relax our constraint on the size of the largest component of the configurations. In this section we prove Lemma 2.8, which ensures that after $O(\log g(n))$ steps the largest component of each configuration again has size $O(n^{2/3})$.
The following lemma is the core of the proof of Lemma 2.8, and it may be viewed as a generalization of the coupling from the proof of Lemma 2.7 using the local limit theorem from Section 5.2.
We recall some notation from the proof sketch. Given two random-cluster configurations $X$ and $Y$, let $M$ be a maximal matching between the components of $X$ and $Y$ that only matches components of equal size to each other. We write $M(X)$, $M(Y)$ for the matched components of $X$, $Y$, respectively, $\bar{M}(X)$, $\bar{M}(Y)$ for the complements of $M(X)$, $M(Y)$, and $D = \sum_{\mathcal{C} \in \bar{M}(X) \cup \bar{M}(Y)} |\mathcal{C}|^2$. For an increasing positive function $g$ and each integer $j \ge 1$, define $\hat{N}(X, Y, j)$ as the number of matched pairs in $M$ whose component sizes are in the interval $I_j(g)$, where $b > 0$ is a fixed large constant (independent of $n$). Let $m := |\mathcal{C}(X)| = \Theta(n)$ be the number of components of $X$, and similarly $m' := |\mathcal{C}(Y)|$. Let $\mathcal{C}_1 \le \cdots \le \mathcal{C}_m$ (resp., $\mathcal{C}'_1 \le \cdots \le \mathcal{C}'_{m'}$) be the sizes of the components in $\mathcal{C}(X)$ (resp., $\mathcal{C}(Y)$) in ascending order. For all $i \le m$, let $X_i$ be a random variable equal to $\mathcal{C}_i$ with probability $1/q$ and 0 otherwise, which corresponds to the number of activated vertices from the $i$-th component of $\mathcal{C}(X)$. Note that $X_1, \ldots, X_m$ are independent. We check that $X_1, \ldots, X_m$ satisfy all the other conditions of Theorem 5.1.
We are now ready to prove Lemma 2.8.
Proof of Lemma 2.8. Let $c_1$ be a suitable constant that we choose later. We wish to maintain the following properties for all $t \le T := c_1 \log g(n)$. By assumption, $X_0$ and $Y_0$ satisfy these properties. Suppose that $X_t$ and $Y_t$ satisfy these properties at step $t < T$. We show that there exists a one-step coupling of the CM dynamics such that $X_{t+1}$ and $Y_{t+1}$ preserve all six properties with probability $\Omega(g(n)^{-1})$.
We first give the high-level ideas of the proof. We will crucially exploit the coupling from Lemma 5.2. Assuming $A(X_t) = A(Y_t)$, properties 1 and 2 hold immediately at time $t+1$, and properties 3 and 4 can be established by a "standard" approach used throughout the paper. In addition, we reuse simple arguments from previous stages to guarantee properties 5 and 6.
Consider first the activation sub-step. By Lemma 5.2, $A(X_t) = A(Y_t)$ with probability at least $\Omega(g(n)^{-1})$. If the number of vertices subject to percolation is the same in both copies, we can couple the edge re-sampling so that the updated part of the configuration is identical in both copies. In other words, all new components created in this step are automatically contained in the component matching at time $t+1$; this includes all new components whose sizes are greater than $\theta$. Since none of the new components contributes to $D_{t+1}$, we obtain $D_{t+1} \le D_t = O(n^{4/3} g(n)^{1/2})$. Therefore, $A(X_t) = A(Y_t)$ immediately implies properties 1 and 2 at time $t+1$.
With probability $1/q$, the largest components of $X_t$ and $Y_t$ are activated simultaneously. Suppose that this is the case. By Hoeffding's inequality, for a constant $c > 0$, we have the following bound.
In addition, Lemma 3.4 and Markov's inequality imply that there exists a constant $c_2$ such that $\Pr[R_2(X_{t+1}) \le c_2 n^{4/3}] \ge \frac{99}{100}$. By Lemma 4.3, there exists $\rho < 1$, independent of $t$ and $n$, such that the contraction bound holds with probability at least 99/100. Potentially, property 6 may not hold when $L_1(X_t) < L_1(X_{t+1}) \le L_2(X_t) = O(n^{2/3})$, but in that case we stop at this point. (We will shortly argue that in this case all the desired properties are also established.) Hence, we suppose otherwise and establish properties 5 and 6 for $X_{t+1}$. Similar bounds hold for $Y_{t+1}$.

A four-phase analysis using random walk couplings: proof of Lemma 2.9
We first introduce some notation that will be useful in the proof of Lemma 2.9. Let $\mathcal{A}(X_0) = \emptyset$; given $\mathcal{A}(X_t)$, the set $\mathcal{A}(X_{t+1})$ is obtained as follows: (i) initialize $\mathcal{A}(X_{t+1}) = \mathcal{A}(X_t)$; (ii) remove from $\mathcal{A}(X_{t+1})$ every component of $\mathcal{A}(X_t)$ that is activated by the CM dynamics at time $t$; and (iii) add to $\mathcal{A}(X_{t+1})$ the largest new component (breaking ties arbitrarily).
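The update rule (i)-(iii) is simple bookkeeping; a direct transcription (with our own representation of components as vertex sets) reads:

```python
def update_tracked(tracked, activated, new_components):
    """One update of the tracked component set: (i) start from the
    previous tracked set; (ii) drop every tracked component activated
    by the dynamics this step; (iii) add the largest newly created
    component (ties broken arbitrarily).  Components are represented
    as frozensets of vertices."""
    nxt = {c for c in tracked if c not in activated}
    if new_components:
        nxt.add(max(new_components, key=len))
    return nxt
```

This makes explicit that the tracked set only changes through activation (removal) and through the single largest new component (insertion).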
Let $\mathcal{C}(X_t)$ denote the set of connected components of $X_t$, and note that $\mathcal{A}(X_t)$ is a subset of $\mathcal{C}(X_t)$; we use $|\mathcal{A}(X_t)|$ to denote the total number of vertices in the components of $\mathcal{A}(X_t)$. In the proof of Lemma 2.9 we use the following lemmas, whose proofs are given in Section 5.4.1. In particular, as mentioned, to prove Lemma 5.5 we use a precise estimate on the maximum of a random walk on $\mathbb{Z}$ with steps of different sizes (see Theorem 5.7).
Proof of Lemma 2.9. The coupling has four phases: phase 1 consists of $O(\log\log\log\log n)$ steps, phase 2 of $O(\log\log\log n)$ steps, phase 3 of $O(\log\log n)$ steps, and phase 4 of $O(\log n)$ steps. We will keep track of the random variables $R_1(X_t)$, $R_1(Y_t)$, $\mathcal{A}(X_t)$, $\mathcal{A}(Y_t)$ and $\hat{N}(X_t, Y_t)$ for a function $g$ that we shall carefully choose for each phase, and use these random variables to derive bounds on the probabilities of various events. By Lemma 5.5, for a sufficiently large constant $c > 0$, we obtain a coupling of the activations of $X_t$ and $Y_t$ such that the same number of vertices is activated in $X_t$ and $Y_t$ with the stated probability. By Lemma 5.4, $|\mathcal{A}(X_t)|$ lies in the stated interval and the associated error term is $O(1)$ with probability at least $1 - \frac{1}{16q^2}$. It follows by a union bound that $A(X_t) = A(Y_t)$ and these two events hold simultaneously with probability at least $1 - \frac{1}{8q^2}$. We call this event $\mathcal{H}_1$. Let $Z$ denote the set of inactivated components of $\mathcal{A}(X_t) \cup \mathcal{A}(Y_t)$ at step $t$, and $Z'$ the set of inactivated components of $\bar{\mathcal{A}}(X_t) \cup \bar{\mathcal{A}}(Y_t)$. By Markov's inequality and the independence of the activation of the components in $Z$ and of those in $Z'$, with probability at least $\frac{1}{4q^2}$ the activation sub-step satisfies the required bounds. We denote this event by $\mathcal{H}_2$. By a union bound, $\mathcal{H}_1$ and $\mathcal{H}_2$ happen simultaneously with probability at least $\frac{1}{8q^2}$. Suppose all these events indeed occur; then we couple the percolation step so that the components newly generated in the two copies are identical, and we claim that all of the following properties hold with at least constant probability. First, note that $D_{t+1}$ cannot increase, because under this coupling the matching can only grow when indeed $A(X_t) = A(Y_t)$. Observe that only the inactivated components of $X_t$ and $Y_t$ contribute to $D_{t+1}$, so $D_{t+1} = \sum_{\mathcal{C} \in Z'} |\mathcal{C}|^2 \le D_t$. Next, we establish properties 3 and 4.
For this, notice that the percolation step is distributed as a critical random graph. Since the percolation step is coupled, both $X_{t+1}$ and $Y_{t+1}$ contain all the newly created components, so we have $\hat{N}(t+1, g_1) = \Omega(g_1(n)^{3 \cdot 2^{j-1}})$ for all $j \ge 1$ such that $n^{2/3} g_1(n)^{-2^j} \to \infty$, and $I(X_{t+1}), I(Y_{t+1}) = \Omega(n)$, w.h.p. Finally, assuming $|\lambda| = O(1)$, by Lemma 3.9 and Markov's inequality there exists $c_2 > 0$ such that $R_1 \le c_2 n^{4/3}$ with probability at least 99/100, for large enough $c_1$. A union bound implies that all four properties hold with at least constant probability $\delta_1 > 0$. Thus, the probability that all four properties are maintained at every step of Phase 1 is at least $\delta_1^{12 \log\log\log\log n} = (\log\log\log n)^{-12 \log(1/\delta_1)}$.
If property 2 holds at the end of Phase 1, we have the stated bound. To facilitate the discussion of Phase 2, we show that the two copies of the chain satisfy one additional property at the end of Phase 1; in particular, there exists a lower bound on the number of components in a different set of intervals. We consider the last percolation step of Phase 1. Corollary 3.12 with $g_2(n) := (\log\log\log n \cdot \log\log\log\log n)^2$ implies that $\hat{N}(T_1, g_2) = \Omega(g_2(n)^{3 \cdot 2^{j-1}})$ for all $j \ge 1$ such that $n^{2/3} g_2(n)^{-2^j} \to \infty$, with high probability. Recall $\mathcal{A}(X_t)$ and $\mathcal{A}(Y_t)$, defined at the beginning of Section 5.4. In Phases 2, 3 and 4, a new element of the argument is to also control the behavior of $\mathcal{A}(X_t)$ and $\mathcal{A}(Y_t)$. We provide a general claim that will be used in the analysis of all three phases: if a function $g'$ satisfies $g' \ge g$ together with the corresponding growth condition, then the analogous bounds hold. The proof of this claim is provided later, in Section 5.4.1.
Conditioned on $A(X_t) = A(Y_t)$ holding at every activation sub-step of this phase, a bound for $D_t$ can be obtained through a first-moment method. In expectation, $D_t$ contracts by a factor of $\rho_1 = 1 - \frac{1}{q}$ at each step. Thus, we can recursively compute $E[D_t]$. It then follows from Markov's inequality that the desired bound holds with at least constant probability. Finally, for the last percolation step of this phase, Corollary 3.12 guarantees that with high probability $\hat{N}(t, g') = \Omega(g'(n)^{3 \cdot 2^{j-1}})$ for all $j \ge 1$ such that $n^{2/3} g'(n)^{-2^j} \to \infty$. The claim follows from a union bound.
An important tool used in the proof of Lemma 5.5 is the following coupling for a (lazy) symmetric random walk on $\mathbb{Z}$; its proof is given in Appendix B. For each $i$, the step $X_i$ equals $a_i$ with probability $p_i$, equals $-a_i$ with probability $p_i$, and equals 0 otherwise, and $Y_i$ has the same distribution as $X_i$. Let $S_n = \sum_{i=1}^n X_i$ and $W_n = \sum_{i=1}^n Y_i$. Then for any $K > 0$, there exist a constant $c := c(K) > 0$ and a coupling of $S_n$ and $W_n$ such that the stated estimate holds. We note that Theorem 5.7 is a generalization of the following more standard fact, which will also be useful to us. Recall that $D \le n^{4/3} h(n)$. Hence, with probability at least $1 - 4\exp(-2g(n))$, the required bounds hold. We first couple the activation of the components with sizes in $I_1$, then those in $I_2$, and so on up to $I_{j^*}$. Without loss of generality, suppose that $d_0 = A_0(X_t) - A_0(Y_t) \ge 0$. If $d_0 \le n^{2/3} g(n)^2$, we simply couple the components with sizes in $I_1$ using the matching $M$. Suppose otherwise that $d_0 > n^{2/3} g(n)^2$. Let $A_1(X_t)$ and $A_1(Y_t)$ be the random variables corresponding to the numbers of active vertices from components of $\mathcal{C}(X_t)$ and $\mathcal{C}(Y_t)$ with sizes in $I_1$, respectively. By assumption, $\hat{N}_1 \ge g(n)^3$. Hence, Theorem 5.7 implies that, for $c = c(K) > 0$, there exists a coupling of the activation of the components of $\mathcal{C}(X_t)$ and $\mathcal{C}(Y_t)$ with sizes in $I_1$ that succeeds with the stated probability, where the last inequality holds for $n$ large enough. Let $d_1 := (A_0(X_t) - A_0(Y_t)) + (A_1(X_t) - A_1(Y_t))$. If the coupling succeeds, we have $0 \le d_1 \le n^{2/3} g(n)^2$. Thus, we have shown that $d_1 \le n^{2/3} g(n)^2$ with the stated probability. Now, let $d_j$ be the difference in the number of active vertices after activating the components with sizes in $I_j$. Suppose that $d_j \le n^{2/3} g(n)^2$ for $j \le j^*$. By assumption, $\hat{N}_{j+1} \ge g(n)^{3 \cdot 2^j}$. Thus, using Theorem 5.7 again, we get that there exists a coupling of the activation of the components with sizes in $I_{j+1}$ satisfying the analogous bound. Therefore, there is a coupling of the activation of the components with sizes in $I_2, I_3, \ldots, I_{j^*}$ such that the cumulative difference remains controlled. Finally, we couple $\hat{A}(X_t)$ and $\hat{A}(Y_t)$ to correct the remaining difference $d_{j^*}$. By assumption, $I(X_t), I(Y_t) = \Omega(n)$, so $r := |\hat{I}(X_t)| = |\hat{I}(Y_t)| = \Omega(n)$. Let $V(X_t)$ and $V(Y_t)$ denote the total numbers of activated isolated vertices from $\hat{I}(X_t)$ and $\hat{I}(Y_t)$, respectively.
We activate all isolated vertices independently, so $V(X_t)$ and $V(Y_t)$ can be seen as two binomial random variables with the same parameters $r$ and $1/q$. Lemma 5.8 gives a coupling of such binomial random variables which, for any difference $d \le n^{1/3}$, corrects the remaining discrepancy with the required probability. Therefore, the overall coupling succeeds, as claimed.
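Lemma 5.8 itself is not reproduced here. As a generic illustration of how two integer-valued random variables with the same (or shifted) laws can be coupled to agree as often as possible, the following sketch implements a standard maximal coupling, whose agreement probability equals one minus the total variation distance (all names are ours, and this is an illustration of the technique rather than the lemma's specific construction):

```python
import random

def maximal_coupling(pmf_a, pmf_b, rng):
    """Sample (X, Y) with X ~ pmf_a and Y ~ pmf_b such that
    Pr[X = Y] = 1 - d_TV(pmf_a, pmf_b).  pmf_* are dicts value -> prob."""
    support = set(pmf_a) | set(pmf_b)
    overlap = {v: min(pmf_a.get(v, 0.0), pmf_b.get(v, 0.0)) for v in support}
    w = sum(overlap.values())  # total overlap mass = agreement probability

    def draw(pmf, total):
        # Inverse-CDF sampling restricted to positive-mass values.
        u = rng.random() * total
        items = [(v, pr) for v, pr in sorted(pmf.items()) if pr > 0]
        acc = 0.0
        for v, pr in items:
            acc += pr
            if u < acc:
                return v
        return items[-1][0]

    if rng.random() < w:
        x = draw(overlap, w)       # common part: the copies agree
        return x, x
    res_a = {v: pmf_a.get(v, 0.0) - overlap[v] for v in support}
    res_b = {v: pmf_b.get(v, 0.0) - overlap[v] for v in support}
    return draw(res_a, 1 - w), draw(res_b, 1 - w)
```

For two binomials with the same parameters the overlap is total and the copies always agree; the interesting case for the proof is correcting a small shift $d$ between the two copies.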

New mixing time for the Glauber dynamics via comparison
In this section, we establish a comparison inequality between the mixing times of the CM dynamics and of the heat-bath Glauber dynamics for the random-cluster model on a general graph $G = (V, E)$. The Glauber dynamics is defined as follows. Given a random-cluster configuration $X_t$, one step of this chain is given by: (i) pick an edge $e \in E$ uniformly at random; (ii) replace the current configuration by $X_t \cup \{e\}$ with probability $\frac{p}{p + q(1-p)}$ if $e$ is a "cut edge" and with probability $p$ otherwise, and by $X_t \setminus \{e\}$ in the complementary case. It is immediate from its definition that this chain is reversible with respect to $\mu = \mu_{G,p,q}$, and thus converges to it.
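A minimal sketch of one heat-bath Glauber update as defined above (the connectivity check and function names are ours; the acceptance probabilities $p/(p+q(1-p))$ for a cut edge and $p$ otherwise are the ones used in this section):

```python
import random

def connected_without(u, v, edges, e):
    """DFS check: is v reachable from u using open edges other than e?"""
    adj = {}
    for (a, b) in edges:
        if (a, b) == e or (b, a) == e:
            continue
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    stack, seen = [u], {u}
    while stack:
        x = stack.pop()
        if x == v:
            return True
        for y in adj.get(x, []):
            if y not in seen:
                seen.add(y)
                stack.append(y)
    return False

def glauber_step(open_edges, all_edges, p, q, rng):
    """One heat-bath Glauber update for the random-cluster model:
    pick a uniform edge e; include it with probability p/(p + q(1-p))
    if e is a cut edge, and with probability p otherwise."""
    e = all_edges[rng.randrange(len(all_edges))]
    rest = {f for f in open_edges if f != e}
    is_cut = not connected_without(e[0], e[1], rest, e)
    prob = p / (p + q * (1 - p)) if is_cut else p
    if rng.random() < prob:
        rest.add(e)
    return rest
```

One step touches a single edge, which is why the local Glauber dynamics is compared against the non-local CM dynamics via spectral-gap inequalities below.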
The following comparison inequality between the spectral gaps was proved in [7], where $m$ denotes the number of edges of $G$, and gap(CM), gap(GD) denote the spectral gaps of the transition matrices of the CM and Glauber dynamics, respectively. The standard connection between the spectral gap and the mixing time (see, e.g., Theorem 12.3 in [23]) then yields a mixing time comparison involving $\mu_{\min} = \min_{X \in \Omega} \mu(X)$, with $\Omega$ denoting the set of random-cluster configurations of $G$. In some cases, such as in the mean-field model with $p = \Theta(n^{-1})$, we have $\log \mu_{\min}^{-1} = \Omega(n \log n)$, and a factor of $O(n^2 (\log n)^2)$ is thus lost in the comparison. We provide here an improved version of this inequality. Theorem 6.1. For any $q > 1$ and any $p \in (0, 1)$, the mixing time of the Glauber dynamics for the random-cluster model on a graph with $n$ vertices and $m$ edges satisfies
$$T^{\mathrm{GD}}_{\mathrm{mix}} = O\Big( m n \log n \cdot \log \frac{1}{\min\{p,\, 1-p\}} \Big) \cdot T^{\mathrm{CM}}_{\mathrm{mix}}.$$
We note that in the mean-field model, where $m = \Theta(n^2)$ and we take $p = \lambda/n$ with $\lambda = O(1)$, this theorem yields $T^{\mathrm{GD}}_{\mathrm{mix}} = O(n^3 (\log n)^2) \cdot T^{\mathrm{CM}}_{\mathrm{mix}}$, which establishes Theorem 1.2 from the introduction and improves the best previously known bound for the Glauber dynamics on the complete graph by a factor of $O(n)$.
To prove Theorem 6.1 we use the following standard fact.
Note that $\mu_0$ is the minimum probability of any configuration in $\Gamma_0$. Without the additional assumptions of the theorem, the best possible bound involves a factor of $\mu_{\min} = \min_{X \in \Gamma} \mu(X)$ instead. We remark that there are related conditions under which (20) holds; we choose the condition that $P^t(X, \Gamma \setminus \Gamma_0) \le \frac{1}{16}$ for every $X$ and every $t \ge T$ for convenience.
We can now provide the proof of Theorem 6.1.
Proof of Theorem 6.1. First note that if $p = \Omega(1)$, it suffices to prove the bound directly. This follows from (19) and the fact that the partition function of the random-cluster model on $G$ satisfies the standard upper bound (see, e.g., Theorem 3.60 in [19]).
Thus, we may assume $p \le 1/100$. From (18) and the standard relationship between the spectral gap and the mixing time (see, e.g., Theorem 12.4 in [23]), we obtain the desired reduction. Let $P$ denote the transition matrix of the Glauber dynamics. In order to apply Theorem 6.2, we have to find a suitable subset of states $\Omega_0 \subseteq \Omega$ and a suitable time $T$ so that $P^t(X, \Omega \setminus \Omega_0) \le \frac{1}{16}$ for every $X \in \Omega$ and every $t \ge T$.
We let $\Omega_0 = \{X \subseteq E : |X| \le 100pm\}$ and $T = Cm \log m$ for a sufficiently large constant $C > 0$. When an edge is selected for update by the Glauber dynamics, it is set to open with probability $\frac{p}{p + q(1-p)}$ if it is a "cut edge", or with probability $p$ if it is not; recall that we say an edge is open if it is present in the random-cluster configuration. Since $p \ge \frac{p}{p + q(1-p)}$ when $q > 1$, after every edge has been updated at least once, the number of open edges in any configuration is stochastically dominated by a $\mathrm{Bin}(m, p)$ random variable (the number of edges of a $p$-percolation on $E$). By a coupon-collector argument, every edge has been updated at least once by time $T$ w.h.p., for $C$ large enough. Moreover, if all edges are indeed updated by time $T$, the number of open edges of $X_t$ at any time $t \ge T$ is at most $100pm$ with probability at least 19/20, by Markov's inequality. Therefore, the Glauber dynamics satisfies the condition of Theorem 6.2 for these choices of $T$ and $\Omega_0$.
For completeness, we also derive Theorem 5.1 from first principles (i.e. without using Mukhin's result [28]) in Appendix D.

B Proofs of random walk couplings
Another important tool in our proofs is a set of couplings based on the evolution of certain random walks. In this section we consider a (lazy) symmetric random walk $(S_t)$ on $\mathbb{Z}$ with bounded step size; the first result we present is an estimate on $S^*_n = \max\{S_1, \ldots, S_n\}$, which is based on the well-known reflection principle (see, e.g., Chapter 2.7 in [22]). We can now prove Theorem 5.7.
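Before the formal argument, the coupling idea behind Theorem 5.7 can be illustrated with the classical mirror coupling of two symmetric walks, in which one copy takes the negated increment of the other until the copies meet (this is a simplified illustration under our own assumptions, not the proof):

```python
import random

def mirror_coupled_walks(steps, rng, start_x=0, start_y=4, p=0.4):
    """Mirror coupling of two lazy symmetric random walks on Z.
    At step i the increment is +a_i w.p. p, -a_i w.p. p, 0 otherwise.
    Until the copies meet, Y takes the negation of X's increment (which
    has the same symmetric law, so both marginals are correct); once
    they meet, the walks move together and remain equal forever."""
    x, y = start_x, start_y
    met = (x == y)
    for a in steps:
        u = rng.random()
        dx = a if u < p else (-a if u < 2 * p else 0)
        dy = dx if met else -dx   # mirror until the walks meet
        x, y = x + dx, y + dy
        if x == y:
            met = True
    return x, y, met
```

With constant step size the gap between the copies changes by multiples of twice the step size, so the mirror coupling can only close gaps of matching parity; handling steps of varying sizes, as Theorem 5.7 does, requires coupling the step groups more carefully.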

By the Berry–Esséen theorem for independent (but not necessarily identically distributed) random variables (see, e.g., [3]), we get that for any $x \in \mathbb{R}$ the normal approximation holds up to the stated error, where $Z$ is a standard normal random variable and the absolute constant lies in $[0.4, 0.6]$. Notice that $\sigma \ge 2\sqrt{n}$. If $x + 8 \ge \sigma$, the theorem holds vacuously.

C Random graph estimates
In this section, we provide proofs of the lemmas which do not appear in the literature. Recall that $G \sim G\big(n, \frac{1 + \lambda n^{-1/3}}{n}\big)$, where $\lambda = \lambda(n)$ may depend on $n$. Lemmas 3.10 and 3.11 are both proved using the following precise estimates on the moments of the number of trees of a given size in $G$. We note that similar estimates can be found in the literature (see, e.g., [29,1]); a proof is included for completeness. (iii) For $i \ne j$, $\mathrm{Cov}(N_i, N_j) \le (1 + o(1)) \frac{n^{2/3}}{2\, i^{3/2} j^{3/2}}$, where $N_k$ denotes the number of tree components of size $k$.
To prove Lemma 3.10, we also use the following result.
Lemma C.2. Suppose $\lambda^3 \to \infty$ and $\lambda = o(n^{1/3})$. Then w.h.p. the largest component of $G \sim G\big(n, \frac{1+\lambda n^{-1/3}}{n}\big)$ is the only component of $G$ which contains more than one cycle. Also, w.h.p. the number of vertices contained in the unicyclic components of $G$ is less than $n^{2/3} \omega(n) \lambda^{-2}$ for any function $\omega(n) \to \infty$.
Proof. An equivalent result was established in [26] for the $G(n, m)$ model, in which exactly $m$ edges are chosen uniformly at random from the set of all $\binom{n}{2}$ possible edges (see Theorem 7 in [26]). The result follows from the asymptotic equivalence between the $G(n, m)$ and $G(n, p)$ models when $m = \binom{n}{2} p$ (see, e.g., Proposition 1.12 in [21]).
All that is left to prove is that the contribution from complex (non-tree) components is small. When $|\lambda| = O(1)$, this follows immediately from the fact that the expected number of complex components is $O(1)$ (see, e.g., Lemma 2.1 in [27]). Then, if $\mathcal{C}_h$ is the set of complex components of $G$ of size at most $h$, we can bound $E\big[\sum_{\mathcal{C} \in \mathcal{C}_h} |\mathcal{C}|^2\big]$ directly. Setting $\omega(n)$ appropriately, it follows that the claimed bound holds w.h.p.
Notice that the $X_i$'s for $i \le m$ are Bernoulli random variables. By periodicity (see Theorem 3.5.2 in [13]), $|\phi(t)|$ equals 1 only when $t$ is a multiple of $2\pi$. For $t \in [\pi/2, \pi]$, $|\phi(t)|$ is bounded away from 1, and there exists a constant $c < 1$ such that $|\phi(t)| \le c$; hence $|\phi(t)|^m \le c^m$. By choosing the cutoff $C$ to be sufficiently large, we may bound the integral over $|t| > C$ accordingly. Thus we have established (33), and the proof is complete.