Random Feedback Shift Registers, and the Limit Distribution for Largest Cycle Lengths

For a random binary noncoalescing feedback shift register of width $n$, with all $2^{2^{n-1}}$ possible feedback functions $f$ equally likely, the process of long cycle lengths, scaled by dividing by $N=2^n$, converges in distribution to the same Poisson-Dirichlet limit as holds for random permutations in $\mathcal{S}_N$, with all $N!$ possible permutations equally likely. Such behavior was conjectured by Golomb, Welch, and Goldstein in 1959.

In 1959 [17], see also Chapter VII of [16], Golomb, Welch, and Goldstein suggest that the flat random permutation in $\mathcal{S}_N$, with all $N!$ permutations $\pi$ equally likely, gives a good approximation to the cycle structure of $\pi_f$, in the sense that the cycle structure of $\pi_f$ is close to the cycle structure of $\pi$, in various aspects of distribution, such as the average length of the longest cycle. See [21], especially the section "Cellular Automata and Nonlinear Shift Registers," which includes an anecdote that Golomb used custom hardware modules in 1956 to experiment on this conjecture, and these ran about 3 million times faster than the general purpose computer on the same problem. We prove that the longest cycle part of this conjecture is true, and more, namely that $\pi$ and $\pi_f$ have the same limit distributions in the infinite-dimensional simplex $\Delta$, for the processes$^{1}$ of long cycle lengths, scaled by $N$. This does not answer other aspects of Golomb's conjecture, involving the distribution of the number of cycles, or behavior of short cycles.
There are two natural ways to view the large cycles of the random permutation $\pi_f$, which we now describe briefly. First, there is the process of largest cycle lengths: write $L_i$ for the length of the $i$th longest cycle of $\pi_f$, with $L_i := 0$ if the permutation has fewer than $i$ cycles, so that always $L_1 + L_2 + \cdots = N$, where $N = 2^n$. Write $L = L(N)$ for the process of scaled cycle lengths, $L = (L_1/N, L_2/N, \ldots)$. Second, there is the process of cycle lengths taken in age order: pick a random $n$-tuple, take $A_1$ to be the length of the cycle of $\pi_f$ containing that first $n$-tuple, then pick a random $n$-tuple from among those not on the first cycle, take $A_2$ to be the length of the cycle of $\pi_f$ containing that second $n$-tuple, and so on. Write $A = A(N) = (A_1/N, A_2/N, \ldots)$ for the process of scaled cycle lengths in age order.
For flat random permutations $\pi$ in place of $\pi_f$, the limit of $A$ is called the GEM process (after Griffiths [18], Engen [15], and McCloskey [20]); it is the distribution of $(1-U_1,\ U_1(1-U_2),\ U_1U_2(1-U_3), \ldots)$, where $U, U_1, U_2, \ldots$ are independent and uniformly distributed in $(0, 1)$. The Poisson-Dirichlet process is $(X_1, X_2, \ldots)$ where $X_i$ is the $i$th largest of $1-U_1,\ U_1(1-U_2),\ U_1U_2(1-U_3), \ldots$. This construction gives the simplest way to characterize the Poisson-Dirichlet process, PD. For flat random permutations, the limit of $L$ is PD.$^{2}$ See Section 5.1 for a review of these concepts, including more discussion of age-order and the GEM limit as used in (5). See also [3]. Formally, our result is the following:
Theorem 1. Consider the random permutation $\pi_f$ given by (3), where all $2^{2^{n-1}}$ possible $f$ in (2) are equally likely. Then, as $n \to \infty$, $L(N)$ converges in distribution to $(X_1, X_2, \ldots)$ with PD distribution.
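The stick-breaking description of GEM and PD above can be illustrated numerically. The following is a minimal sketch, not taken from the paper: the function names `gem_sample` and `pd_sample` and the truncation to a fixed number of pieces are assumptions made for illustration.

```python
import random

def gem_sample(num_pieces, rng):
    """First num_pieces coordinates of (1-U1, U1(1-U2), U1 U2 (1-U3), ...)."""
    pieces, remaining = [], 1.0
    for _ in range(num_pieces):
        u = rng.random()                      # U_i, uniform on [0, 1)
        pieces.append(remaining * (1.0 - u))  # U_1 ... U_{i-1} (1 - U_i)
        remaining *= u                        # U_1 ... U_i is still unbroken
    return pieces

def pd_sample(num_pieces, rng):
    """Truncated PD sample: the GEM pieces sorted into decreasing order."""
    return sorted(gem_sample(num_pieces, rng), reverse=True)

rng = random.Random(0)        # fixed seed, chosen only for reproducibility
gem = gem_sample(50, rng)
pd = sorted(gem, reverse=True)
```

Because only finitely many pieces are generated, the pieces sum to slightly less than 1; the deficit is the product $U_1 \cdots U_{50}$ of the uniforms used.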
Writing $\to_d$ to denote convergence in distribution, we can succinctly summarize the conclusion of Theorem 1 by writing $L(N) \to_d X := (X_1, X_2, \ldots)$.
We note some easy consequences of Theorem 1. Theorem 1 is equivalent to
$$A(N) \to_d (1-U_1,\ U_1(1-U_2),\ U_1U_2(1-U_3), \ldots) \tag{5}$$
with GEM distribution, by a soft argument involving size-biased permutations, originally given by [13]. By projecting onto the first coordinate$^{3}$, we see
$$A_1/N \to_d U. \tag{6}$$
By taking expectations, we see
$$\mathbb{E}\,A_1/N \to 1/2. \tag{7}$$
Of course, the uniform distributional limit in (6) makes no local limit claim; it is plausible that $N\,P(A_1 = i) \to 1$ holds uniformly in $n < i < N - n$. For any fixed $i > 1$, the statement $N\,P(A_1 = i) \to 1$ is false. It is true that $N\,P(A_1 = 1) = N\,P(A_1 = N) = 1$. And for any fixed $j > 0$ the statement $N\,P(A_1 = N - j) \to 1$ is false; see [10].
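The two exact identities $N\,P(A_1 = 1) = 1$ and $N\,P(A_1 = N) = 1$ can be checked by brute force for small $n$. The sketch below is an illustration under assumptions: it encodes an $n$-tuple as an integer with $y_0$ as the top bit, takes $\pi_f$ to be the shift that appends $y_0 \oplus f(y_1, \ldots, y_{n-1})$, and uses helper names (`cycle_lengths`, `prob_A1`) that are not from the paper.

```python
from fractions import Fraction
from itertools import product

def cycle_lengths(n, f):
    """Cycle lengths of pi_f; f is a tuple of 2^(n-1) bits indexed by (y_1..y_{n-1})."""
    N = 1 << n
    mask = (1 << (n - 1)) - 1
    seen = [False] * N
    lengths = []
    for start in range(N):
        if seen[start]:
            continue
        length, s = 0, start
        while not seen[s]:
            seen[s] = True
            top, rest = s >> (n - 1), s & mask   # y_0 and (y_1..y_{n-1})
            s = (rest << 1) | (top ^ f[rest])    # shift left, append new bit
            length += 1
        lengths.append(length)
    return lengths

def prob_A1(n, i):
    """Exact P(A_1 = i): average over all f of (#states on cycles of length i) / N."""
    N = 1 << n
    total, count = Fraction(0), 0
    for f in product((0, 1), repeat=1 << (n - 1)):
        on_length_i = sum(L for L in cycle_lengths(n, f) if L == i)
        total += Fraction(on_length_i, N)
        count += 1
    return total / count
```

For example, `(1 << 4) * prob_A1(4, 1)` enumerates all 256 feedback functions of width 4 and returns an exact `Fraction`.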
We work with the de Bruijn graph $D_{n-1}$, with edge set $\mathbb{F}_2^n$ and vertex set $\mathbb{F}_2^{n-1}$; edge $e = (y_0, y_1, \ldots, y_{n-1})$ goes from vertex $v = (y_0, y_1, \ldots, y_{n-2})$ to vertex $v' = (y_1, \ldots, y_{n-1})$. The graph $D_{n-1}$ is 2-in, 2-out regular, and a random feedback logic $f$ corresponds to a random resolution of all vertices; the resolution at a vertex $v$ pairs the incoming edges, $0v$ and $1v$, with the outgoing edges $v0$ and $v1$. The cycles of a random permutation $\pi_f$ correspond exactly to the edge-disjoint cycles in a random circuit decomposition of the Eulerian graph $D_{n-1}$.
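The correspondence between feedback logics and vertex resolutions can be made concrete on a small graph. This is a sketch under assumptions: edges are Python tuples of bits, `successor` implements $\pi_f$ as "shift left and append $y_0 \oplus f(y_1, \ldots, y_{n-1})$", and all names are illustrative rather than the paper's.

```python
from itertools import product
import random

def de_bruijn_edges(n):
    """Edges of D_{n-1}: the n-tuple e goes from vertex e[:-1] to vertex e[1:]."""
    return {e: (e[:-1], e[1:]) for e in product((0, 1), repeat=n)}

def successor(e, f):
    """pi_f(e): the outgoing edge that the resolution at vertex e[1:] pairs with e."""
    return e[1:] + (e[0] ^ f[e[1:]],)

n = 4
rng = random.Random(1)                       # fixed seed for reproducibility
f = {v: rng.randrange(2) for v in product((0, 1), repeat=n - 1)}
edges = de_bruijn_edges(n)

# In-degree and out-degree of every vertex of D_{n-1}.
in_degree = {v: sum(1 for _, head in edges.values() if head == v) for v in f}
out_degree = {v: sum(1 for tail, _ in edges.values() if tail == v) for v in f}
images = {successor(e, f) for e in edges}    # image of pi_f
```

Each edge's successor leaves the vertex at which the edge arrives, and the two incoming edges $0v$, $1v$ receive different successors, which is why the map is a permutation of the edge set.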

A Survey of the Proof of Theorem 1
In this section we survey the proof of Theorem 1 while omitting many necessary technicalities. It is hoped the reader will thus have a better notion of what is happening, and why, as s/he reads the later sections. We begin with the notion of relativisation. Suppose, as for example in the hypotheses of Theorem 1, that one has for each $n = 1, 2, \ldots$ a probability $P_n$ on the permutations of a set $E_n$. Let $\pi \in S(E_n)$ be one such permutation, and let $e = (e_1, \ldots, e_k)$ be a $k$-tuple of, for now, distinct elements from the domain $E_n$. Picturing the permutation $\pi$ as a collection of disjoint cycles, one sees that by ignoring all elements of $E_n$ except for the $e_i$, these latter are permuted among themselves. That is, starting with $e_i$, traverse the cycle of $\pi$ containing this element: $e_i, \pi(e_i), \pi^2(e_i), \ldots$ until after one or more steps an element $e_j$ is encountered. (It is possible for the first element so encountered to be $e_i$, which happens when the traversed cycle contains only a single member of the $k$-tuple $e$.) Since the $e_i$ are given in a definite order, the induced permutation among these elements is readily identified with an element of $S_k$, the permutations of the set $\{1, 2, \ldots, k\}$. Altogether, we have a function $\mathrm{rel}_{n,k} : S(E_n) \times (E_n)_k \to S_k$ which we call relativisation. Here, $(E_n)_k$ denotes the ordered $k$-tuples drawn from $E_n$ without replacement.
We shall prove: Suppose that for every fixed $k \ge 1$ the sequence of distributions induced on $S_k$ by the functions $\mathrm{rel}_{n,k}$ and the probability distributions $P_n$ tends to the uniform distribution. (For brevity, we say "$P_n$ has the uniform relativisation property.") Then the large cycle process associated with $P_n$ tends to Poisson-Dirichlet. The proof that the uniform relativisation property implies the Poisson-Dirichlet property appears in Section 5.3, as Lemma 4.
$^{2}$ This same Poisson-Dirichlet process also gives the distributional limit for the process of scaled bit sizes of the prime factors of an integer chosen uniformly from 1 to $x$, as $x$ goes to infinity. Here we write PD for PD(1), where, in general, GEM($\theta$) and PD($\theta$) for $\theta > 0$ are constructed using $U^{1/\theta}$ in place of $U$, and the case $\theta = 1/2$ gives the limits for the processes of sizes of largest components, in age order or strict size order, for random mappings, i.e., functions from $[n]$ to $[n]$ with all $n^n$ possibilities equally likely.
$^{3}$ Since $U$, $U_1$ and $1 - U_1$ all have the same distribution, uniform in $(0, 1)$.
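The relativisation map described above is short to write down. A minimal sketch, assuming the permutation is given as a Python dict and using 0-based positions in place of $\{1, \ldots, k\}$; the name `relativise` is illustrative.

```python
def relativise(pi, e):
    """Induced permutation on the positions of e.

    pi: dict mapping each element of the ground set to its image.
    e:  list of distinct elements.
    Returns sigma with sigma[a] = b when, starting from e[a] and iterating pi,
    the first member of e that is encountered is e[b].
    """
    position = {x: a for a, x in enumerate(e)}
    sigma = []
    for x in e:
        y = pi[x]
        while y not in position:   # skip elements outside the k-tuple
            y = pi[y]
        sigma.append(position[y])
    return tuple(sigma)

# Example permutation with cycles (0 1 2 3 4) and (5 6).
pi = {0: 1, 1: 2, 2: 3, 3: 4, 4: 0, 5: 6, 6: 5}
sigma = relativise(pi, [3, 0, 5])
```

Here 3 and 0 share a cycle and are exchanged by the induced permutation, while 5 is the only chosen element on its cycle and is therefore fixed.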
Henceforth we specialize to the particular sequence $P_n$ of interest: the sets $E_n$ are the binary $n$-tuples $\mathbb{F}_2^n$, and $P_n$ assigns equal weight to each of the $2^{N/2}$ shift register permutations $\pi_f$ (and no weight to other permutations), where $N = 2^n$. Let $S_{n,k}$ denote the Cartesian product of the set of feedback functions $f$ with the set of $k$-tuples $e$ of binary $n$-tuples. For technical reasons we define the relativisation function $\mathrm{rel}_{n,k}$ on the set $S_{n,k}$, see Definition (1) in Section 4.10. Nevertheless, pairs $(f, e)$ in which $e$ contains a repeated element may be safely ignored by the reader for now, and only the primary objective need be kept in mind: to show that as $(f, e)$ varies over $S_{n,k}$ the coverage of $S_k$ under the relativisation function $\mathrm{rel}_{n,k}$ is approximately uniform. Roughly speaking, this objective is accomplished by partitioning the set $S_{n,k}$ into blocks such that the restriction of $\mathrm{rel}_{n,k}$ to each block of the partition yields an almost uniform coverage of $S_k$. The description of these blocks involves the notion of toggle. Let $v \in \mathbb{F}_2^{n-1}$ and $f$ be a feedback function; then the toggle of the function $f$ at the point $v$ is the function $f_v$ which disagrees with $f$ only at the argument $v$: $f_v(u) = f(u)$ for $u \ne v$, and $f_v(v) = 1 - f(v)$. That is, we have toggled a single bit in the truth table of $f$. Toggling a feedback function has a predictable effect on $\mathrm{rel}_{n,k}(\pi_f, e)$. In particular, for and and, (with $a < b$), $\bar{x}v = \pi_f^j(e_b)$, and then (let the reader check by drawing a picture) $\mathrm{rel}_{n,k}(\pi_{f_v}, e) = \mathrm{rel}_{n,k}(\pi_f, e) \circ (a, b)$, where $(a, b)$ denotes a transposition in $S_k$. The blocks in our partition of $S_{n,k}$ arise as follows: given $(f, e) \in S_{n,k}$, we determine, in a way explained below, a subset $V \subseteq \mathbb{F}_2^{n-1}$ of size $m$, and define the block containing $(f, e)$ to be the $2^m$ different toggles $(f_U, e)$, $U$ ranging over subsets of $V$.
Here, $f_U$ denotes function $f$ toggled at all $v \in U$. For the block to be well defined, it must be the case that the choice of $V$ will be the same for all $f_U$ as for $f$. This necessitates the introduction of a subset $H \subseteq S_{n,k}$, the "happy event," see equation (42) in Section 4.8. It turns out that the happy event is almost all of $S_{n,k}$, $|H|/|S_{n,k}| \to 1$, and for $(f, e) \in H$ the blocks are well defined. Moreover for each such block we have an ordered sequence of transpositions ( For $m$ sufficiently large, almost all such sequences of transpositions yield $2^m$ compositions which cover $S_k$ almost uniformly. (Lemma 2 in Section 4.9 proves that for all $k$, there is an $m$ such that the distribution induced on $S_k$ is within $\varepsilon$ of uniform in total variation for all but an $\varepsilon$ fraction of possible sequences.)
Let us say something about how, given $(f, e) \in S_{n,k}$, the $m$-subset $V$ of $\mathbb{F}_2^{n-1}$ is chosen. The pair $(f, e)$ determines $k$ segments
$$e_a,\ \pi_f(e_a),\ \ldots,\ \pi_f^t(e_a) \qquad (1 \le a \le k) \tag{8}$$
in which the length $t$ is taken to be approximately $N^{3/5}$. For this length it is almost certain that not only are the initial edges $e_a$ distinct, but in fact all $k \times (t + 1)$ of the edges $\pi_f^i(e_a)$ are distinct. This feature is included in the definition (42) of event $H$. Given that $\pi_f$ acts by shifting left and bringing in one new bit on the right, each sequence (8) is equivalent to a binary sequence $e_{a,0}\, e_{a,1} \cdots e_{a,n-1} \cdots e_{a,n+t-1}$ of length $n + t$. To be considered for membership in $V$, an $(n-1)$-tuple $v^{\#}$ must appear in two of these binary sequences; that is, for some $a < b$ and some bit $x \in \{0, 1\}$, $x v^{\#} = e_{a,i} \cdots e_{a,i+n-1}$ and $\bar{x} v^{\#} = e_{b,j} \cdots e_{b,j+n-1}$.
One may ask, as $(f, e)$ varies uniformly over $S_{n,k}$, what is the probability of finding such leftmost $(n-1)$-repeats $(i, j)$ in various regions of the plane? Remarkably, such points when rescaled as $(i/N^{1/2}, j/N^{1/2})$ constitute, in the limit with respect to total variation distance, a familiar Poisson process. Thanks to this limiting behavior we can estimate not only the probability of finding $v^{\#}$'s which satisfy the above minimal constraint for $V$-membership, but also the probability of finding $m$ $v^{\#}$'s lying in a much more stringently constrained geometry, which geometry implies $(f, e) \in H$. Section 4 is devoted to proving these properties of $H$ and $V$ under the assumption that the probabilities in question can be approximated by a Poisson process.
We conclude our survey by saying how this last assumption is justified. We present in Section 3 an algorithm called sequential editing which begins with k random binary sequences (referred to as coin-toss sequences) and edits them in such a way that the result of the editing is a set of k sequences which could have been produced by choosing (f, e) ∈ S n,k and forming the k segments (8). Even more, the probability of obtaining a particular set of sequences is exactly the same, whether we choose (f, e) and form (8), or flip k(n + t) coins and perform sequential editing. (This is proven in Theorem 4 of Section 3.6).
Moreover, there is a "good event" $G$, $G \subseteq \mathbb{F}_2^{(n+t)k}$, such that when the initial coin toss sequence $C$ belongs to $G$ the leftmost $(n-1)$-repeats in the edited sequence appear in exactly the same locations $(i, j)$ as they do in $C$. Since $G$ is almost all of $\mathbb{F}_2^{(n+t)k}$ (Theorem 3 in Section 3), the study of the $(f, e)$-induced pairs is reduced to the study of leftmost $(n-1)$-repeats in $k$ random sequences. This new process is by no means easily evaluated, but fortunately it is in the realm of the Chen-Stein method as presented and extended in [6]. In such a manner the above described approximation is justified.
Looking back at this survey, it appears that the components in the proof of Theorem 1 have been described in almost the reverse of the order in which they appear in the sequel. We hope that in the end the determined reader will understand the proof forwards and backwards.
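The toggle operation from this survey can be explored numerically. Since $\pi_{f_v}$ equals $\pi_f$ composed with the transposition exchanging the edges $0v$ and $1v$, toggling a single truth-table bit changes the number of cycles by exactly one (two cycles merge, or one splits). The sketch below checks this standard fact; the integer encoding of states and the names `count_cycles` and `toggle` are assumptions for illustration.

```python
import random

def count_cycles(n, f):
    """Number of cycles of pi_f; f is a list of 2^(n-1) bits indexed by (y_1..y_{n-1})."""
    N = 1 << n
    mask = (1 << (n - 1)) - 1
    seen = [False] * N
    cycles = 0
    for start in range(N):
        if seen[start]:
            continue
        cycles += 1
        s = start
        while not seen[s]:
            seen[s] = True
            rest = s & mask
            s = (rest << 1) | ((s >> (n - 1)) ^ f[rest])
    return cycles

def toggle(f, v):
    """Return f_v: a copy of f with the truth-table bit at argument v flipped."""
    g = list(f)
    g[v] ^= 1
    return g

n = 6
rng = random.Random(2)                       # fixed seed for reproducibility
f = [rng.randrange(2) for _ in range(1 << (n - 1))]
base = count_cycles(n, f)
changes = [count_cycles(n, toggle(f, v)) - base for v in range(1 << (n - 1))]
```

Every entry of `changes` is either +1 or -1, whatever seed is used.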

Comparisons with Coin Tossing Sequences
Throughout this section these conventions will be observed: $a_i$, $b_i$, $C_i$ denote bits; $v_i$ denotes an $(n-1)$-long sequence of bits; and $e_i$ denotes an $n$-long sequence of bits. A tool used in the proof of Theorem 1 is to compare the bit sequence $b_0, b_1, \ldots, b_{n+t}$ generated by a randomly chosen feedback logic $f$ with a coin toss sequence, denoted in this section $C_0, C_1, \ldots, C_{n+t}$. A bit sequence $b_i$ generated by a feedback logic has what we refer to as the de Bruijn property: it satisfies a recursion of the form $b_{i+n} = b_i + f(b_{i+1}, \ldots, b_{i+n-1})$, with addition modulo 2. In a sequence with the de Bruijn property the $n$-long words $0v$ and $1v$ must be followed by different bits. Of course, not every coin toss sequence has the de Bruijn property. The sequential edit, defined below, of a coin toss sequence $C_i$ is obtained in a left-to-right bit-by-bit manner and adheres as closely as possible to $C_i$, changes being made only when forced by the desire to respect the de Bruijn property. On the other hand, the shotgun edit, also defined below, of a sequence $C_i$ is a naive imitation of a sequential edit. In a sense and circumstances to be made precise, by the combination of Theorems 2 and 3, with high probability, these two produce the same output.

Sequential Editing
We begin with an $(n + t)$-long bit sequence
$$C_0, C_1, \ldots, C_{n+t-1}. \tag{9}$$
The new bit sequence of the same length, $b_0, b_1, \ldots, b_{n+t-1}$, is produced by following two rules:
Rule 1: For $0 \le i < n$, set $b_i = C_i$.
Rule 2: For $i \ge 0$ determine bit $b_{i+n}$ by first asking if the feedback logic bit $f(b_{i+1}, \ldots, b_{i+n-1})$ has been previously defined; if so, set $b_{i+n}$ accordingly; otherwise, define (and remember) the feedback logic bit in such a way that $b_{i+n}$ and $C_{i+n}$ agree.
Here we give some terminology, and indexing practice. We say that the sequence $b$ is obtained from the coin toss sequence $C$ by sequential editing. Each time a $b_{i+n}$ has freedom (because the necessary feedback bit has not yet been set) we set the feedback bit so that $b_{i+n} = C_{i+n}$; but at any time the bit $b_{i+n}$ "has no choice," we assign it the forced value. Such a time $i$ is a time of a potential edit; if it turns out (by chance) that $b_{i+n}$ and $C_{i+n}$ agree, then no actual edit has taken place; if $b_{i+n}$ is forced to take the value $\bar{C}_{i+n}$, the complement of $C_{i+n}$, then an actual edit has taken place, and we label the time of this actual edit as $i$ rather than $i + n$. The sequence $b$ obtained by this process always has the de Bruijn property. In terms of the de Bruijn graph with all vertices resolved, a potential edit occurs at time $i$ when $e_i$, the edge from , the resolution of that vertex, is already known, so that the successor edge, $e_{i+1} = \pi_f(e_i)$ is determined; this is equivalent to determining $b_{i+n}$, the rightmost bit of $e_{i+1}$.
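The two editing rules can be sketched in a few lines. This is an illustration under assumptions, not the paper's own code: it takes the de Bruijn recursion to be $b_{i+n} = b_i \oplus f(b_{i+1}, \ldots, b_{i+n-1})$, takes Rule 1 to copy the first $n$ coin tosses unchanged, and stores the partially defined logic $f$ in a dict; the function names are hypothetical.

```python
import random

def sequential_edit(C, n):
    """Sequentially edit the coin toss list C; returns (b, f) with f the logic learned."""
    b = list(C[:n])                       # Rule 1 (assumed): copy the first n bits
    f = {}
    for i in range(len(C) - n):
        v = tuple(b[i + 1:i + n])         # argument of the feedback logic
        if v in f:                        # potential edit: the bit is forced
            b.append(b[i] ^ f[v])
        else:                             # free: define f(v) so that b agrees with C
            f[v] = b[i] ^ C[i + n]
            b.append(C[i + n])
    return b, f

def has_de_bruijn_property(b, n):
    """True if some feedback logic f satisfies b[i+n] = b[i] XOR f(b[i+1..i+n-1])."""
    f = {}
    for i in range(len(b) - n):
        v = tuple(b[i + 1:i + n])
        bit = b[i] ^ b[i + n]
        if f.setdefault(v, bit) != bit:
            return False
    return True

n, t = 5, 40
rng = random.Random(3)                    # fixed seed for reproducibility
C = [rng.randrange(2) for _ in range(n + t)]
b, f = sequential_edit(C, n)
```

A sequence that already has the de Bruijn property is returned unchanged, so applying `sequential_edit` to its own output is the identity.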

Shotgun Editing
Now we define a second, generally different, way to edit the coin toss sequence $C_i$ to produce a sequence $a_i$. We call this the shotgun edit. Unlike $b_i$ obtained by sequential editing, the sequence $a_i$ obtained by shotgun editing may not have the de Bruijn property.
We use $\ell(I)$ and $r(I)$ to denote the left- and right-endpoints of the interval $I$. A binary sequence has an $m$-long repeat at $(I, J)$ if $\ell(I) < \ell(J)$, $|I| = m = |J|$ and the two ordered $m$-tuples $(C_i : i \in I)$ and $(C_j : j \in J)$ are equal. We say that (9) has a leftmost$^{4}$ $m$-long repeat at $(I, J)$ if, in addition, either $\ell(I) = 0$ or $C_{\ell(I)-1} \ne C_{\ell(J)-1}$. This given, the shotgun edit of coin toss (9) is readily defined: make a list $(I_1, J_1), (I_2, J_2), \ldots$ of all the leftmost $n$-long repeats found in (9). Let $a_i = \bar{C}_i$ if $i = r(J_k)$ for some $k$, and $a_i = C_i$ otherwise.
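Shotgun editing can be sketched as follows. The definition used here is an assumption reconstructed from the surrounding proofs: a repeat at start positions $(i, j)$ is leftmost when $i = 0$ or $C_{i-1} \ne C_{j-1}$, and the shotgun edit complements the coin toss bit at the right endpoint $r(J)$ of every leftmost $n$-long repeat. Function names are illustrative.

```python
def leftmost_repeats(C, m):
    """Start positions (i, j), i < j, of all leftmost m-long repeats in the list C."""
    found = []
    last_start = len(C) - m
    for i in range(last_start + 1):
        for j in range(i + 1, last_start + 1):
            same_word = C[i:i + m] == C[j:j + m]
            if same_word and (i == 0 or C[i - 1] != C[j - 1]):
                found.append((i, j))
    return found

def shotgun_edit(C, n):
    """Complement the bit at r(J) = j + n - 1 for each leftmost n-long repeat (I, J)."""
    a = list(C)
    for _, j in leftmost_repeats(C, n):
        a[j + n - 1] = 1 - C[j + n - 1]   # always computed from C, so idempotent
    return a

# The word 0011 occurs at positions 1 and 6, preceded by different bits.
C = [0, 0, 0, 1, 1, 1, 0, 0, 1, 1]
repeats = leftmost_repeats(C, 4)
a = shotgun_edit(C, 4)
```

In this example the single leftmost 4-repeat forces one edit, at the last position of the sequence.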

Zero and First Generation Words
Let $C_0, C_1, \ldots, C_{t+n-1}$ be a coin toss sequence whose leftmost $n$-tuple repeats occur at $(I_1, J_1), (I_2, J_2), \ldots$. The zero-generation words of length $h$ are simply words of the form $C_i C_{i+1} \cdots C_{i+h-1}$. A first-generation word is a zero-generation word with exactly one bit complemented, with the index of the complemented bit required to be $r(J_k)$ for some $k$.

The Good Event G (t)
We always consider $n$ to be understood, but sometimes we will not want to emphasize the role of $t$, hence writing $G \equiv G^{(t)}$. Henceforth we shall always assume that $t$ is at most $N = 2^n$, since we are interested in cycle lengths for permutations on a set of size $N$. Let $C_0, C_1, \ldots, C_{t+n-1}$ be a length $n + t$ coin toss sequence whose leftmost $n$-repeats occur at $(I_1, J_1), (I_2, J_2), \ldots$. Then the good event $G^{(t)}$ is defined to be the conjunction of these six conditions:
(a) neither the initial $n$-long word of the coin toss sequence, nor any of its 1-offs$^{5}$, is repeated (probability of failure $O(tn/N)$);
(b) all intersections of the form $I_k \cap J_{k'}$ are empty (probability of failure $O(t^3 n/N^2 + t n^3/N)$);
(c) the sets $I_1, I_2, \ldots$ are pairwise disjoint; likewise $J_1, J_2, \ldots$ (probability of failure $O(t^2 n^2/N^2 + t^3 n/N^2 + t n^3/N)$);
(d) no first-generation word of length $n - 1$ equals a zero-generation word of length $n - 1$, or another first-generation word of length $n - 1$ (probability of failure $O(t^4 n^2/N^3 + t^3 n^3/N^2 + t n^3/N)$);
(e) for every leftmost $(n-1)$-repeat $(I, J)$ we have for all $k$ (probability of failure $O(t^3 n/N^2 + t^2 n^2/N^2 + t n^3/N)$);
(f) there is no $(2n-1)$-repeat (probability of failure $O(t^2/N^2)$).
The indicated probabilities of failure will be proven below in Theorem 3. First, though, we will prove a theorem that explains why $G$ is called the "good event." These conclusions, along with Theorem 3 in the next subsection, will provide substantial control of the prevalence of $(n-1)$-tuple repeats.
Proof of Conclusion 1. Assume, to the contrary, that the $a$ and $b$ sequences differ; let $i$ be the first position of disagreement. There are two possibilities: (1) $a_i \ne b_i$ and $b_i = C_i$; or (2) $a_i \ne b_i$ and $a_i = C_i$.
Case (1). Since $a_i \ne C_i$ we have $i = r(J_k)$ for some $k$, and there is a leftmost $n$-repeat in the $C$ sequence at $(I_k, J_k)$. But $a_j = C_j$ for $j \in I_k$ (condition (b)); and $a_j = C_j$ for $j \in J_k \setminus \{i\}$ (condition (c)). Hence $b_j = a_j = C_j$ for $j \in I_k \cup J_k \setminus \{i\}$. But $b_i = \bar{a}_i = C_i$, so in fact the $b$-sequence itself has an $n$-repeat at $(I_k, J_k)$. But the $b$-sequence has the de Bruijn property, and so the $(I_k, J_k)$ repeat can be backed up $d = \ell(I_k) > 0$ steps to reveal The word on the right side of the last equality is either a zero-generation or a first-generation word; either case contradicts condition (a).
is appearing for a second or later time, say We must have $b_\ell = C_\ell$, since no sequential editing took place at time $\ell$. (The relevant bit of the feedback logic had not yet been determined.) We know that $b_i \ne b_\ell$, else the $b$-sequence contains an $n$-repeat which, as was explained in Case (1), backs up to yield the contradictory (10). So, Because $i$ is the first point at which the $b$ and $a$ sequences disagree, Suppose (for the sake of a contradiction) that none of the $a$ bits appearing on either side of this last Equation (11) was edited by the shotgun process. Then we have We have thus discovered an $n$-long repeat in the coin toss sequence, but it might not be a leftmost $n$-long repeat. So, we look left to determine the least $m \ge 1$ such that either $\ell - n + 1 - m < 0$ (i.e., you've gone "off the board") or the run of equalities is broken: One of these two will happen for $m < n$ or else the $C$-sequence is found to contain a $2n$-repeat, contradicting assumption (f). But then we have found a leftmost $n$-repeat in the $C$-sequence beginning at $\ell - n + 1 - m + 1$ and $i - n + 1 - m + 1$; shotgun editing would consequently modify the $C$-bit at position $i - n + 1 - m + 1 + n - 1 = i - m + 1$. We have thus found that one of the $C$-bits on the right side of Equation (12), namely the one whose index is $i - m + 1$, is changed by shotgun editing, contrary to our earlier supposition that none of the $a$ bits appearing on either side of the equality (11) was edited by the shotgun process.
By condition (c), every n-long word in the a sequence either is a zero-generation word (matches exactly the corresponding C-bits) or is a first-generation word (matches the corresponding C-bits with exactly one change). Thus, at least one of the n-long words appearing in (11) is a first-generation word, and this contradicts condition (d).
Proof of Conclusion 2. We will make use of the a and b sequences being equal. Suppose we have a leftmost (n − 1) repeat in the coin toss sequence, By condition (e), none of these 2n bits (or 2n − 2 in case i = 0) can be edited by the shotgun edit. Hence, we have a leftmost (n − 1)-repeat at the same place in the a sequence, whence also the b sequence.
Altogether by (14), (15), (16) we have If $i = 0$, then the last is a leftmost $(n-1)$-repeat in the $C$ sequence, as asserted. So, to conclude, suppose for the sake of a contradiction that $i > 0$ and that $C_{i-1} = C_{\ell-1}$. Then we have an $n$-long repeat Sliding left for $m$ steps, we will encounter a leftmost $n$-repeat in the coin toss sequence with $0 \le m < n - 1$ by condition (f). But in such a case $a_{\ell+n-2-m} \ne C_{\ell+n-2-m}$ by the definition of shotgun editing. However, for $0 \le m < n - 1$ and by (16), $a_{\ell+n-2-m} = C_{\ell+n-2-m}$. The supposition that $i > 0$ and that $C_{i-1} = C_{\ell-1}$ has been contradicted, and so (17) is, indeed, a leftmost $(n-1)$-repeat as needed.

Probability
In this section we bound the probability of failure of any one of the conditions (a)-(f) appearing in Theorem 2. Let $S$ be a set of relations, each of the form $C_i = C_j$ or $C_i \ne C_j$ with $i < j$. We assume always that $S$ has at most one relation for a given $(i, j)$; that is, we don't allow both $C_i = C_j$ and $C_i \ne C_j$. What is the probability that a coin toss sequence $C$ will satisfy such a set of relations? The desired probability is $2^{-|S|}$ provided the graph associated with $S$ is cycle free. The graph we have in mind here is $(V, E)$ where $V$ is the set $0, 1, 2, \ldots$ and $E$ is the set of pairs $\{i, j\}$ such that at least one (and by convention exactly one) of the relations $C_i = C_j$ or $C_i \ne C_j$ belongs to $S$. In particular, if the graph of $S$ consists of the $n$ pairs $(i, j), (i + 1, j + 1), \ldots, (i + n - 1, j + n - 1)$ the probability is $2^{-n} = 1/N$. This is quite clear if $I = \{i, \ldots, i + n - 1\}$ and $J = \{j, \ldots, j + n - 1\}$ are disjoint, since then the underlying graph has no vertex of degree 2. It is also true if $I$ and $J$ overlap (of course $I \ne J$): every vertex of degree 2 in the graph (i.e., every element of $I \cap J$) has one larger neighbor and one smaller neighbor. But a cycle would require at least one vertex with two smaller neighbors.
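The claim that a cycle-free set of relations has probability $2^{-|S|}$ can be verified exhaustively on short sequences. A minimal sketch, assuming relations are encoded as triples `(i, j, equal)`; the helper names are illustrative.

```python
from fractions import Fraction
from itertools import product

def relation_probability(length, relations):
    """Exact probability that uniform random bits C_0..C_{length-1} satisfy all relations.

    Each relation is (i, j, equal): C_i == C_j if equal is True, C_i != C_j otherwise.
    """
    good = sum(
        1
        for C in product((0, 1), repeat=length)
        if all((C[i] == C[j]) == equal for i, j, equal in relations)
    )
    return Fraction(good, 2 ** length)

def is_cycle_free(relations):
    """Union-find test that the graph with edges {i, j} contains no cycle."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    for i, j, _ in relations:
        root_i, root_j = find(i), find(j)
        if root_i == root_j:
            return False
        parent[root_i] = root_j
    return True

# Overlapping 3-long repeat: I = {0, 1, 2}, J = {1, 2, 3}.
overlap = [(0, 1, True), (1, 2, True), (2, 3, True)]
# A triangle of relations, which is not cycle free.
triangle = [(0, 1, True), (1, 2, True), (0, 2, True)]
```

The overlapping repeat is cycle free and has probability $1/8$, while the triangle has probability $1/4$, not $2^{-3}$, because its third relation is implied by the first two.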
We will have frequent occasion below, in the proof of Theorem 3, to consider sets $S$ whose graphs are the union of two such $n$-sets of pairs. We begin with a lemma which shows that in many situations which arise in these proofs the probability in question is $1/N^2$.
Lemma 1. Let $G$ be the graph whose edges consist of two sets of pairs Then $G$ is cycle free if any one of the following three conditions holds, where we assume $i_1 < j_1$ and $i_2 < j_2$:
Proof. If $I_1 \cap I_2 = \emptyset$ then no vertex has two neighbors larger than it. If $J_1 \cap J_2 = \emptyset$ then no vertex has two neighbors smaller than it. In case (iii), all edges out of $I_1 \cup I_2$ go to $J_1 \cup J_2$, and vice-versa. A cycle, if there is one, lies within the bipartite graph whose parts are $I_1 \cup I_2$ and $J_1 \cup J_2$, and clearly the cycle must alternate edges between $(I_1, J_1)$ and $(I_2, J_2)$ types. If the cycle (of necessity even in length) uses $\ell$ edges of the first sort and $\ell$ of the second, then it has traveled $\ell \times (j_1 - i_1)$ in one direction and $\ell \times (j_2 - i_2)$ in the other. The last part of condition (iii) makes it impossible for the cycle to have returned to its starting point.

Theorem 3.
Let $G$ be the good event. Then the probability that $G$ fails is $O(t^4 n^2/N^3 + t^3 n^3/N^2 + t n^3/N)$.

Proof.
We shall show that the probability that a random coin toss sequence of length t + n fails any one of the conditions (a) through (f) in the definition of G is O(t 4 n 2 /N 3 + t 3 n 3 /N 2 + tn 3 /N ). (More explicitly, each will be shown to fail with the probability indicated in the definition of G.) We invoke the above Lemma during the proof by citing Lemma (i), Lemma (ii), and Lemma (iii).
Condition (a): [neither the initial n-long word of the coin toss sequence, nor any of its 1-offs, is repeated.] Consider first an exact repetition. There are t − 1 places where the repeated sequence can start, and by earlier remarks the probability that the second sequence repeats the first is 1/N . The same argument applies to the 1-offs of the initial pattern, and we conclude that the probability for condition (a) to fail is less than t(n + 1)/N .
Condition (b): [all intersections of the form $I_k \cap J_{k'}$ are empty.] For $k = k'$ we bound the probability of failure by $tn/N$ using the same technique as in case (a). Suppose that $I_1 \cap J_2 \ne \emptyset$. By the $k = k'$ case of the proof we may assume $J_1$ disjoint from $I_1$ and to its right; and $I_2$ disjoint from $J_2$ and to its left. If In the remaining case $I_1$ meets $I_2$, $J_1$ meets $J_2$, and $I_1$ meets $J_2$. Thus the union $I_1 \cup I_2 \cup J_1 \cup J_2$ is an interval, and a bound of $O(t n^3/N)$ results.

Condition (c): [the sets $I_1, I_2, \ldots$ are pairwise disjoint; likewise $J_1, J_2, \ldots$.]
We will prove the assertion regarding $I_1, I_2, \ldots$; the other assertion is proven in an entirely similar manner. Suppose $I_1 \cap I_2 \ne \emptyset$. We may assume $J_1 \cap J_2 \ne \emptyset$; otherwise Lemma (ii) implies an upper bound of $O(t^3 n/N^2)$. So now, both intersections $I_1 \cap I_2$ and $J_1 \cap J_2$ are nonempty. If any one of the four intersections $I_a \cap J_b$ is nonempty, then again the union Assume, for the sake of a contradiction, that $r(J_2) - r(I_2) = d = r(J_1) - r(I_1)$. Then we have $I_1 \ne I_2$ and $J_1 \ne J_2$. Without loss, let us say $I_1$ is left of

Condition (d):
[no first-generation word of length $n - 1$ equals a zero-generation word of length $n - 1$, or another first-generation word of length $n - 1$]. Suppose the first assertion is violated. Then we have, for some $i$, some $d > 0$ and some $j \in \{i, i + 1, \ldots, i + n - 2\}$, with $r(J_k) \in \{j, j + d\}$ and with $(I_k, J_k)$ a leftmost $n$-repeat. Let $I = \{i, i + 1, \ldots, i + n - 2\}$ and let $J = \{i + d, i + d + 1, \ldots, i + d + n - 2\}$. If $I$ and $I_k$ are disjoint then using Lemma (i) the probability is bounded by $O(t^3 n/N^2)$. So assume they intersect, so their union is an interval. Similarly we can assume, using Lemma (ii), that $J$ and $J_k$ also intersect so their union is another interval. If these two intervals intersect, forming another interval, we have a probability bound of for the probability. This gives an overall bound of $O(t n^3/N + t^2 n^2/N^2 + t^3 n/N^2)$.
Next, suppose the second assertion of (d) is violated. Then we have, for some $i$, some $d > 0$ and some with $j_1 = r(J_1)$, $j_2 + d = r(J_2)$, and $(I_1, J_1)$, $(I_2, J_2)$ leftmost $n$-repeats. As reasoned before we have $I \cap J$, $I_1 \cap J_1$, and $I_2 \cap J_2$ all empty with probability at least $1 - O(t n^2/N)$. It follows, from the sheer geometry of the situation, that $I \cap I_1 = \emptyset$. We may assume that $I_1 \cap I_2 = \emptyset$, since (as proven in (c) above) the probability of failure is $O(t^3 n/N^2 + t^2 n^2/N^2 + t n^3/N)$. We may assume that $I_1 \cap I = \emptyset$, for otherwise the union of $I_1$ and $I$ and $J_1$ is a connected interval, and then by reasoning as above a bound of $O(t^3 n^3/N^2)$ results. We now have all three intersections $I \cap I_1$, $I \cap I_2$, and $I_1 \cap I_2$ being empty; and by an obvious embellishment of Lemma (i) the probability of the remaining case is $O(t^4 n^2/N^3)$.
Condition (e): [for every leftmost $(n-1)$-repeat $(I, J)$ we have The probability that $I \cap J$ is not empty is $O(tn/N)$, so assume both $I_k \cap J_k$ and $I \cap J$ are empty. The probability that $I_k \cap I$ is empty is, by Lemma (i), Easily, the failure probability is at most $2t^2/N^2$.

Coin Tossing Versus Paths in the de Bruijn Graph
Theorem 4. Let $b_0, \ldots, b_{t+n-1}$ be a bit string. Then, the probability that this string arose by the sequential editing of an $(n + t)$-long coin toss sequence is the same as the probability that it arose by choosing a logic $f : V_{n-1} \to V$ and starting position $(b_0, \ldots, b_{n-1})$ each uniformly at random.
Proof. Without loss of generality we assume the given string has the de Bruijn property. (Else, the two probabilities are both zero.) First, let's compute the probability that $b$ arose by sequential editing of a $(t + n)$-long coin toss. The probability of the coin toss yielding $b_0$, . .
has not been seen before (among $(n-1)$-long words ending at a position greater than or equal to $n$), then $C_i$ must be equal to $b_i$. (And, we remember henceforward the value of ) Altogether, then, the probability that a length $t + n$ coin toss will yield a given sequence $b_0, b_1, \ldots, b_{t+n-1}$ by sequential editing is $2^{-r}$ where $r = n - 1 + \#$distinct $(n-1)$-long subwords ending at position $n - 1$ or later. Now let's compute the probability that $b$ arose by choosing a starting position and logic at random. Classify each position $i$, $0 \le i \le t + n - 1$, as Type I or Type II. The position is Type I if $i \ge n$ and the preceding $(n-1)$-long word $(b_{i-n+1}, \ldots, b_{i-1})$ is appearing for the first time in the $b$-sequence. The position is Type II otherwise: either $i < n$, or the preceding $(n-1)$-long word $(b_{i-n+1}, \ldots, b_{i-1})$ is appearing for the second or later time. It should be clear that the probability in question is The two probabilities just calculated agree.$^{6}$
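Theorem 4 can be checked exhaustively for small parameters by comparing the two distributions on bit strings. The sketch below is an illustration under assumptions: the recursion is taken to be $b_{i+n} = b_i \oplus f(b_{i+1}, \ldots, b_{i+n-1})$, Rule 1 is taken to copy the first $n$ coin tosses, and the helper names are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def sequential_edit(C, n):
    """Sequential edit of the coin toss tuple C (same rules as in Section 3.1)."""
    b, f = list(C[:n]), {}
    for i in range(len(C) - n):
        v = tuple(b[i + 1:i + n])
        if v in f:
            b.append(b[i] ^ f[v])          # forced bit
        else:
            f[v] = b[i] ^ C[i + n]         # define f(v) so that b agrees with C
            b.append(C[i + n])
    return tuple(b)

def run_register(f, start, t, n):
    """The n + t bits produced from a starting n-tuple by a fully specified logic f."""
    b = list(start)
    for i in range(t):
        b.append(b[i] ^ f[tuple(b[i + 1:i + n])])
    return tuple(b)

def normalised(counter):
    total = sum(counter.values())
    return {key: Fraction(c, total) for key, c in counter.items()}

n, t = 3, 5
by_editing = Counter(sequential_edit(C, n) for C in product((0, 1), repeat=n + t))
arguments = list(product((0, 1), repeat=n - 1))
by_register = Counter(
    run_register(dict(zip(arguments, values)), start, t, n)
    for values in product((0, 1), repeat=len(arguments))
    for start in product((0, 1), repeat=n)
)
```

With $n = 3$ and $t = 5$ there are 256 coin toss sequences and 128 pairs (logic, start); after normalisation the two dictionaries coincide.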

Notation for Paths Starting at k Random n-tuples
We now fix $k \ge 1$ and use the notation $e_1, \ldots, e_k$ to name $k$ random $n$-tuples. Collectively, these $k$ edges of Picking a random feedback $f$, and $k$ random $n$-tuples, independent of $f$, is equivalent to picking one element, uniformly at random from the space The choice of $(f, e)$ from $S_{n,k}$ determines $k$ infinite periodic sequences of edges: for $a = 1$ to $k$, $\mathrm{Seg}(f, e_a) := (e_{a,0}\, e_{a,1}\, e_{a,2} \cdots)$ where $e_{a,0} = e_a$, and for $i \ge 0$, $e_{a,i+1} = \pi_f(e_{a,i})$.
For the sake of comparison with coin tossing, we often look at such paths only up to time $t$ (this is what motivated our terminology segment): for $a = 1$ to $k$,
$$\mathrm{Seg}(f, e_a, t) = (e_{a,0}\, e_{a,1} \cdots e_{a,t}). \tag{23}$$

(k, t)-sequential Editing
Now we will define a modification of the sequential editing process that was discussed earlier in Section 3.1.
The reader should bear in mind our ultimate goal. We wish to study what happens when a feedback logic f is chosen at random; k different starting n-tuples e 1 , . . . , e k are chosen at random; and k walks of length t are generated, the first starting from e 1 and using the logic f to continue for t steps; the second starting from e 2 , etc. As in Section 3.1, we wish to generate these walks using k(n + t) coin tosses, and we would like to have an analog to Theorem 4 saying that our procedure for passing from the coin toss to the k walks perfectly simulates the process of choosing a logic and starting points at random. The reader can almost certainly envision the natural way to achieve this, but we will write out the details. The first n + t coins are used exactly as in Section 3.1: Rule 1 is applied to the first n coin tosses to yield starting point e 1 , and then Rule 2 is applied t times to get the overlapping n-tuples e 1 = e 1,0 , e 1,1 , . . . , e 1,t that form the first walk. Equivalently, this segment is spelled out by the (n + t) de Bruijn bits b 0 . . . b t+n−1 , and along the way, some feedback logic bits have been defined.
Then, for the next $n$ coin tosses, $C_i$ for $i = t + n$ to $i = t + 2n - 1$ inclusive, sequential editing is suspended; again Rule 1 is applied, to give with no new feedback logic bits learned. Then, Rule 2 is applied for the next $t$ input bits, $C_i$ for $i = t + 2n$ to $i = 2t + 2n - 1$, to create the second walk of length $t$, $\mathrm{Seg}(f, e_2, t)$, remembering of course those feedback logic bits that were learned during the creation of $\mathrm{Seg}(f, e_1, t)$, and (most likely) learning some new feedback logic bits in the process. (It might be the case that $e_2 = e_1$, or that $e_2$ appears in the first walk, in which case, we don't learn any new feedback logic bits.) If $k > 2$, we continue in a similar fashion, first suspending editing for time $n$, during which time we learn no new feedback logic bits and we form $e_a := (b_{(a-1)(t+n)}, \ldots, b_{(a-1)t+an-1}) := (C_{(a-1)(t+n)}, \ldots, C_{(a-1)t+an-1})$, then returning to Rule 2 for the next $t$ bits, to fill out $\mathrm{Seg}(f, e_a, t)$.
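The procedure with suspended editing can be sketched as follows. This is an illustration, not the paper's definition: it assumes the recursion $b_{i+n} = b_i \oplus f(b_{i+1}, \ldots, b_{i+n-1})$, treats each block of $n + t$ coin tosses separately, and shares one partially defined logic `f` across the $k$ walks; the name `kt_sequential_edit` is hypothetical.

```python
import random

def kt_sequential_edit(C, n, t, k):
    """Turn k(n+t) coin tosses into k bit segments of length n+t sharing one logic f."""
    assert len(C) == k * (n + t)
    f = {}                                   # feedback bits learned so far
    segments = []
    for a in range(k):
        block = C[a * (n + t):(a + 1) * (n + t)]
        b = list(block[:n])                  # editing suspended: copy n bits as e_a
        for i in range(t):                   # Rule 2 for the next t bits
            v = tuple(b[i + 1:i + n])
            if v in f:
                b.append(b[i] ^ f[v])        # forced by earlier walks or this one
            else:
                f[v] = b[i] ^ block[i + n]
                b.append(block[i + n])
        segments.append(b)
    return segments, f

n, t, k = 5, 12, 3
rng = random.Random(4)                       # fixed seed for reproducibility
C = [rng.randrange(2) for _ in range(k * (n + t))]
segments, f = kt_sequential_edit(C, n, t, k)
```

Every segment obeys the same learned logic `f`, which is the sense in which the $k$ walks come from a single feedback function.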
For $k, t \ge 1$ we define Q-EDIT$_{k,t}$ as given by the above procedure. It may, or should, seem intuitively obvious that Q-EDIT$_{k,t}$, applied to an input uniformly chosen from $\{0,1\}^{k(n+t)}$, induces the same distribution on the $k$ segments of length $t$ in (22) as does a uniform pick from $S_{n,k}$ and iteration of $\pi_f$ from each of $e_1, \dots, e_k$. We claim that the argument given in the proof of Theorem 4 can be adapted to show this.

3.9 The Good Event $G^{(k,t)}$ for $(k,t)$-sequential Editing

There are two different ways to produce $k$ walks each of length $t$ out of a sequence of $k(n+t)$ coin tosses. The first, with $t' = (k-1)n + kt$ playing the role of $t$, is simple sequential edit, to determine a starting $n$-tuple $e$, and one path $e_0, e_1, \dots, e_{t'}$ corresponding to $t' = (k-1)n + kt$ iterates of $\pi_f$ starting from $e$. The good event, regarding this first procedure, is really $G \equiv G^{((k-1)n+kt)}$. We can then cut the path of length $t'$ to produce $k$ paths of length $t$; see (31) for the natural notation associated with such cutting. The second procedure is to apply Q-EDIT$_{k,t}$, defined in the previous section, to produce a $k$-tuple of starting edges, $\mathbf{e}$, and $k$ segments of length $t$, as in (23). The good event regarding this second procedure, to be called $G^{(k,t)}$, is designed so that the two procedures agree. We simply take all of the demands of the good event for simple editing on $k(n+t)$ coins, and throw in additional requirements to accommodate the suspensions of editing involved in the definition of Q-EDIT$_{k,t}$. Informally, these additional requirements are that every $(n-1)$-tuple which appears at some time $j$ involved in suspension occurs at no other time $i$ in the coin toss sequence. Formally, given $n, k, t$, the bad event $B$ is defined in terms of events $M_{ij}$, where $M_{ij} = \emptyset$ if $i = j$, and the good event $G^{(k,t)}$ is then given by (27). Since a word of length $n-1$ has $n$ neighbors at Hamming distance 1 or less, $P(M_{ij}) = n/2^{n-1}$ for $i \ne j$, so that $P(B) \le (n+t)k^2 n^2 \times 2/N$, for the sake of extending Theorem 3.
We now consider the following to have been proved; it is a single theorem, to give the extensions of Theorems 2 and 3 and 4, appropriate to k, t sequential editing. Note that in the final conclusion of Theorem 5 we treat k as fixed while n, t → ∞, so that t and kt are of the same order, and we take the assumption t/ √ N → ∞ so that the three terms in the bound from Theorem 3 are covered by a single term.
Theorem 5. (i) The procedure Q-EDIT k,t , applied to a coin toss sequence chosen uniformly at random from {0, 1} k(n+t) , yields k segments of length t, (Seg(f, e 1 , t), . . . , Seg(f, e k , t)) with exactly the same distribution as obtained by a random feedback logic f and k starting n-tuples, e = (e 1 , . . . , e k ).
(ii) The good event $G^{(k,t)} \subset \{0,1\}^{k(n+t)}$, defined by (27), which ultimately involves conditions (a) through (f) from Section 3.4 applied with $t' = (k-1)n + kt$ in the role of $t$, is such that for every outcome in $G^{(k,t)}$, the bit sequence $b_0 b_1 \cdots b_{k(n+t)-1}$ (and the equivalent sequence of overlapping $n$-tuples, $e_0 e_1 \cdots e_{(k-1)n+kt}$) formed by single sequential edit agrees with the shotgun edit of the $k(n+t)$ coins, and leftmost $(n-1)$-tuple repeats have the same locations in $b_0 b_1 \cdots b_{k(n+t)-1}$ and in $(C_0, C_1, \dots, C_{k(n+t)-1})$.
(iii) Also, on the good event G (k,t) , the k segments of length t, produced by Q-EDIT k,t and notated as in (23) match exactly with e 0 · · · e t , e t+n · · · e 2t+n , . . . , e (k−1)(t+n) · · · e (k−1)n+kt , produced by cutting the output of the single sequential edit of k(n + t) coins.
We summarize: there is an exact operation, sequential editing of $n+t$ coin tosses, which achieves the exact distribution of $\mathrm{Seg}(f, e, t)$, as induced by a uniform choice of $(f, e)$ from its $2^{2^{n-1}} 2^n$ possible values, followed by starting at $e$ and taking $t$ iterates of the permutation $\pi_f$. There is a good event $G \equiv G_t$, with $P(G) \to 1$ provided that $t^3 n^3/N^2 \to 0$, for which the sequential edit agrees with the shotgun edit, and $v_i = v_j$ iff the coins have a leftmost $(n-1)$-tuple repeat at $(i, j)$. This sequential edit can be used with $k(n+t)$ in place of $n+t$, to create one long segment; there is the corresponding good event $G_{t'}$, $t' = (k-1)n + kt$. There is a second, distinct operation, Q-EDIT$_{k,t}$, for editing $k(n+t)$ coin tosses, to yield the exact distribution of $k$ segments of length $t$ under a single logic $f$ and $k$ starting $n$-tuples, $\mathbf{e} = (e_1, \dots, e_k)$; that is, the distribution of $(\mathrm{Seg}(f, e_1, t), \dots, \mathrm{Seg}(f, e_k, t))$ as induced by a uniform choice of $(f, \mathbf{e})$ from its $2^{2^{n-1}} 2^{kn}$ possible values. And there is a corresponding good event $G^{(k,t)} \subset G_{t'}$, with $P(G_{t'} \setminus G^{(k,t)}) \le 2k^2 n^2 (n+t)/N$, formed by adding the constraint that $i$ or $j \in \bigcup_{0 \le a < k} [a(n+t) - n + 2,\ a(n+t)]$ implies that there is not an $(n-1)$-tuple repeat at $(i, j)$. On the event $G^{(k,t)}$, the $k$-sequential edit agrees exactly with the cutting of $\mathrm{Seg}(f, e_1, k(n+t) - n)$.

A Cutting Example
We now illustrate some of the concepts just introduced, with an example and with Figures 1-4. Take n = 10, t = 90, k = 3. So, to generate k = 3 segments of length t = 90, we start with k(n + t) = 300 coin tosses, used to generate one segment of length k(n + t) − n = 290. When we have in mind a single segment of length t, we will use a single subscript to label the edges, so that with e = e 0 , the segment is a list of t + 1 edges Seg(f, e, t) = e 0 e 1 · · · e t .
The coin tosses, indexed from i = 0 to i = 299, are labeled C i , the de Bruijn bits formed by sequential edit are labeled b i , and the bits formed by shotgun edit are labeled a i . On the good event G, we will have We also view the segment in (28) as a list of t + 2 vertices, or as a list of t + n bits, and abuse notation by writing equality, so that Since we are particularly interested in leftmost (n − 1)-tuple repeats, we shall suppose that we are in the good event G, and the leftmost (n − 1)-tuple repeats in the coin-toss sequence are at (56,153), (120,260), and (135,175). Thanks to G occurring, we know that all 291 edges e 0 to e 290 are distinct, and the only vertex repetitions are v 56 = v 153 , v 120 = v 260 , and v 135 = v 175 . One way of indicating where these vertex repeats occur is to draw some auxiliary lines pointing to the locations, as in Figure 1. Figure 2 on the next page gives a two-dimensional ("spatial") view of the same situation.
The same k = 3 segments of length t = 90, presented as lists of vertices (which here are 9-tuples) are notated as Collectively, these k segments are given by a deterministic function of (f, e), where e = (e 1 , e 2 , . . . , e k ) names all k starting points.
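The bookkeeping of this cutting example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `cut_ranges` and `locate` are hypothetical, and the convention that segment $a$ occupies indices $(a-1)(t+n)$ through $(a-1)(t+n)+t$ of the long path is read off from the cut $e_0 \cdots e_t$, $e_{t+n} \cdots e_{2t+n}$, ... in Theorem 5(iii).

```python
def cut_ranges(n, t, k):
    """Inclusive (first, last) indices on the long path for each of the k segments."""
    return [(a * (t + n), a * (t + n) + t) for a in range(k)]


def locate(index, n, t, k):
    """Map an index on the long path to (segment number from 1, local time).

    Returns None when the index falls in a gap between consecutive segments.
    """
    for a, (first, last) in enumerate(cut_ranges(n, t, k), start=1):
        if first <= index <= last:
            return a, index - first
    return None


if __name__ == "__main__":
    n, t, k = 10, 90, 3
    print(cut_ranges(n, t, k))
    # The three repeats assumed in the example above.
    for i, j in [(56, 153), (120, 260), (135, 175)]:
        print((i, j), "->", locate(i, n, t, k), locate(j, n, t, k))
```

With $n = 10$, $t = 90$, $k = 3$, the repeat at (56,153) lands on segments 1 and 2 at local times 56 and 53, the repeat at (120,260) on segments 2 and 3, and the repeat at (135,175) twice on segment 2.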

Coloring
Imagine the k segments of length t as pieces of (directed) yarn, with k different "primary" colors. Vertices that appear only once get the primary color of the segment they come from; vertices that appear twice on the same segment might be visualized as having a more saturated version of the primary color of that segment. The interesting case occurs when a vertex appears on two different segments; such a vertex, call it v # , gets each of two primary colors -and its secondary color shows which two segments this vertex lies on; for  example imagine that the two strands are red and yellow, so that v # is colored orange. Figure 3 on page 20 and Figure 4 on the following page illustrate this coloring.

Toggling
To toggle a logic $f$ at a vertex $v$ is simply to get a new $f$ from the old, by changing the value at $v$. This is called a "cross-join step", and is studied extensively in the context of cycle joining algorithms to create a full cycle logic. Our interest in toggling is different: we have $k \ge 2$ segments induced by a logic $f$ and $k$ starting $n$-tuples, $e_1, \dots, e_k$, and we want to choose $m$ different "toggle points" in the role of $v$, to get a nice family of $2^m$ related logics. All this is done in the interest of showing that the chance that $e_1$ and $e_2$ lie on the same cycle of $\pi_f$ is approximately one-half, for large $n$, and more generally, that the chance $e_1, \dots, e_k$ all lie on the same cycle is approximately $1/k$, and even more, that the permutation $\pi_f$, relativized to $e_1, \dots, e_k$, is approximately uniformly distributed over all $k!$ permutations. This introductory paragraph is intentionally short and vague; the full details use all of Sections 3-6. Section 4.1 gives a longer attempt at introduction, including Figure 5 on page 23, showing the huge collection of candidate toggle vertices, using $k$ colors to help visualize the $k$ segments of interest.
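To make the effect of a single toggle concrete, here is a minimal Python sketch for small $n$. It assumes, as a labeled assumption since (2) and (3) are not restated here, that $\pi_f$ sends $(x_1, \dots, x_n)$ to $(x_2, \dots, x_n, x_1 \oplus f(x_2, \dots, x_n))$; the names `pi_f`, `toggle`, `cycle_count`, and `zero_logic` are hypothetical helpers, not notation from the paper.

```python
from itertools import product


def zero_logic(n):
    """The logic f that is identically 0 on all (n-1)-tuples."""
    return {v: 0 for v in product((0, 1), repeat=n - 1)}


def pi_f(f, n):
    """Assumed form of the permutation: x -> (x[1:], x[0] XOR f(x[1:]))."""
    return {x: x[1:] + (x[0] ^ f[x[1:]],) for x in product((0, 1), repeat=n)}


def toggle(f, vertices):
    """Return a copy of f with its value complemented at each listed vertex."""
    g = dict(f)
    for v in vertices:
        g[v] ^= 1
    return g


def cycle_count(perm):
    """Number of cycles of a permutation given as a dict."""
    seen, count = set(), 0
    for start in perm:
        if start in seen:
            continue
        count += 1
        x = start
        while x not in seen:
            seen.add(x)
            x = perm[x]
    return count


if __name__ == "__main__":
    n = 3
    f = zero_logic(n)
    g = toggle(f, [(0, 0)])
    print(cycle_count(pi_f(f, n)), cycle_count(pi_f(g, n)))
```

Toggling at one vertex $v$ swaps the successors of the two $n$-tuples ending in $v$, so in this model the number of cycles always changes by exactly one, either joining two cycles or splitting one.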

Big Picture Perspective: k Colored Segments, m Toggle Points
We will have $k$ segments each of length $t = N^{.6}$. The expected number of leftmost $(n-1)$-tuple repeats within a single segment is about $\binom{t}{2}/N \doteq .5 N^{.2}$. The expected number of repeats between two different given segments is about $t^2/N = N^{.2}$, so the expected number of repeats between two different segments, combined over all $\binom{k}{2}$ choices for which two segments, is about $\binom{k}{2} \times N^{.2}$. This is a huge number of repeats (each based on one vertex having a secondary coloring), and we intend to find $m$ such repeats, say at $v^\#_1, \dots, v^\#_m$, in narrowly constrained spatial positions. The goal is to show that, with high probability, for all $2^m$ choices of how to change $f$ by toggling the values of $f(v^\#_i)$ for $i \in I \subset \{1, 2, \dots, m\}$, the same $m$ vertices will be picked out by the narrow two-dimensional spatial constraints.
In this section we present further figures intended to assist the reader's intuition. We also give an algorithm which, for a given logic $f$ and starting edges $e_i$, finds $m$ vertices $v^\#_1, \dots, v^\#_m$. These points, which we call toggle points, give rise to a family of $2^m$ functions. We also define (in Section 4.10) a process called relativisation, which associates with $\pi_f$ in $S_N$ a permutation $\sigma$ in $S_k$, $k$ being fixed and $N \to \infty$. It will be shown that as $f$ varies over the $2^m$ functions in a "toggle class", the resulting $\sigma$'s cover $S_k$ almost uniformly. In Section 5, it will be shown that this uniform coverage of $S_k$ (for each fixed $k$) is a sufficient condition to prove Theorem 1.
A critical issue is that the algorithm for choosing the toggle points must be such that, if the feedback $f$ is replaced by any one of the $2^m$ functions in the toggle class, the algorithm would find the same toggle points and the same class (see Footnote 7). However this is not necessarily the case, and the probability of success for the algorithm must be estimated; this leads to the definition of an event $H$.

Figure 4: Coloring and cutting; a succinct way to visualize both. The $k$ segments of length $t$ are still shown as they appear along the single segment of length $k(t+n) - n$. We also show the $\binom{k}{2}$ $t$ by $t$ squares where matches may occur between two differently colored length $t$ segments. Note the repeat at (56,153) is a vertex colored both red and yellow, hence orange. The repeat at (120,260) is a vertex colored both yellow and blue, hence green. The vertex at (135,175) is colored yellow twice; we could show it as an extra-saturated yellow but did not. The significance of the diagonals of the small squares is explained in Section 4.3.

We show what can happen when we toggle one bit of a logic $f$. We have two segments of length $t$, which share a vertex $v^\#$. Toggling changes the value of $f$ only at $v^\#$, and gives a new logic $f^*$. Suppose that the segments under $f$ were red and yellow, and that $v^\#$ appears at position $i$ on the red segment, and position $j$ on the yellow segment. Overall, this repeat has spatial location $(i, j)$, and color orange. Exactly one such repeat was visualized in Figures 2 on page 19 and 3 on page 20; it occurred with $(i, j) = (56, 53)$. The displacement is $i - j$. We have a preferred sequence of colors (derived from the rainbow ROY G. BIV) where red comes before yellow; hence the displacement is 3, rather than $-3$, in this example.

Picking the "Earliest" Toggle with a Small Displacement
Consider the case where we have k = 2 segments, and want to find a single vertex v # via a recipe which, when applied to the segments under the toggled logic f * , still picks out the same vertex. A good recipe involves naming a small bound d on the absolute displacement |i − j| (thus staying close to the "diagonal"), and then picking the "earliest" pair (i, j) that satisfies the displacement bound. This was the key to overcoming the "fallacy" described in Footnote 7.
The specific choice of how to define earliest is somewhat arbitrary; we will take smallest $(i+j)$ as the first criterion for earliest, with ties to be broken according to smallest value of $\max(i,j)$; given that $i+j = i'+j'$, this is equivalent to taking smallest absolute displacement for the tie-break criterion. For use in the case of $k$ colors and $\binom{k}{2}$ color pairs $\alpha = (a,b)$, break further ties according to $\min(a,b)$ and then $\max(a,b)$. The logic $f$, with value at $v^\#$ complemented, gives a new logic $f^* := \mathrm{Toggle}(f, \{v^\#\})$. It is possible that changing the logic bit at $v^\#$ will cause an earlier pair to become available as the location of a match between the two segments, so that the recipe for picking the earliest small displacement match, applied to $f^*$, picks out a different vertex instead of $v^\#$. In this case, the word toggle is very misleading: the overall operation (find $v^\#$, then complement the logic at that vertex) is not an involution. Our program is to specify a displacement bound $d$ that varies with $n$, in such a way that 1) with high probability, at least one small displacement match can be found, and 2) with high probability, the vertex for the earliest small displacement match in the logic $f^* = \mathrm{Toggle}(f, \{v^\#\})$ is the same as the vertex selected for $f$. The example in Figure 8 on page 26, viewed with any $d \ge 3$, illustrates what might go wrong with respect to 2).
Recall, from Section 3, that t is the length of our segments. To get high probability in 1), a necessary and sufficient condition is that td/N → ∞.
To get high probability in 2), a necessary and sufficient condition is (33). The argument that (33) suffices is somewhat delicate, akin to a stopping time argument; it is easier to prove (see (37)) that a sufficient condition is (34), and then it will be easy to arrange for situations corresponding to pairs $(t, d)$ satisfying both (32) and (34).

Footnote 7: Consider the simplest situation, $k = 2$ and $m = 1$, where one is trying to prove (7) by showing that P($e_1$, $e_2$ lie on the same cycle) is approximately one half. Knowing that the segments starting from $e_1$ and $e_2$ have high probability of reaching a common vertex $v^\#$, and that performing a cross-join step by toggling the logic $f$ at this $v^\#$, to get a new logic $f^*$, changes whether or not $e_1$ and $e_2$ lie on the same cycle, one might consider the proof complete. The fallacy is that this procedure does not pair up $f$ with $f^*$, i.e., it need not be the case that $(f^*)^* = f$, because the procedure used to find $v^\#$ (from $f$, given $e_1, e_2$) might find a different $v$ when applied to $f^*$. Overcoming this fallacy entails the study of displacements, starting in Section 4.

[Figure caption fragment: The color scheme is intended to be purple, green, blue across the top row, orange, yellow for the middle, and red (magenta) for the bottom.]

[Figure caption fragment: ... Figure 5, are not shown. In Section 4.5 we discuss this picture, suggesting scaling for the axes, so that in each color, the picture is approximately a standard (rate 1 per unit area) two-dimensional Poisson process. The color scheme is intended to be purple, green, orange.]

[Figure caption fragment: ... Figure 4 on page 21. When all $\binom{k}{2}$ squares are superimposed, as in Figure 6 on the previous page, the spatial location becomes $(i, j) = (56, 53)$. Before the toggle, we have two segments of length $t = 90$; after the toggle, the segments have length $t \pm d$, that is, 93 and 87.]

Displacements Caused by Toggles
Suppose we have k = 3 colors, as shown in Figure 9 on the next page. There are three segments of length t = 90, with respect to f . The segment with respect to f , starting with e 1 , colored red, has v # 1 in position 6 and v # 2 in position 35 -so the red segment, of length 90, is divided into an initial red path of length 6, followed by a red path of length 29, followed by a red path of length 55.
The f segment starting with e 2 , colored yellow, has v # 1 in position 3, and v # 3 in position 75, hence it is divided into yellow paths of lengths 3, 72, 15, in that order.
The f segment starting with e 3 , colored blue, has v # 2 in position 37, and v # 3 in position 71, hence it is divided into blue paths of lengths 37, 34, 19, in that order.
Next, consider $f^* := f$, toggled at $v^\#_1$. Its segment starting from $e_1$ has length 6 red followed by length $(72+15) = 87$, for a total length of 93. Its segment starting from $e_2$ has length 3 yellow, followed by length $(29+55) = 84$, for a total length of 87.

[Figure caption fragment: ... Figure 8 by the pair of red dots for $f$. After the toggle at the orange vertex, vertex 53, along the segment that starts red and finishes yellow, is the same as vertex 55, along the segment that starts yellow and finishes red. So, in the logic $f^*$, we have two matches between the two segments: the original, at (56,53), shown by the orange dots, and a new one, at (53,55), shown by the red dots.]

If there is an orange match at $(I, J)$ for the $f$ segments, with $I > i$ and $J > j$, this match will move to $(I - d, J + d)$.
Similarly, a match between yellow, and some a not equal to red, occurring at (I, J) under the logic f , moves to (I + 3, J) or (I, J + 3) under the logic f * , according to whether a comes after or before yellow, in the list of all k colors.
This effect can be seen in Figure 9 on the preceding page: the orange dot is at (6,3) with displacement d = 3, the purple dot occurs at (35,37) under f , but at (32,37) under f * and f * * .
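The displacements in this three-color example can be checked with a short Python sketch. It is a toy model under stated assumptions: each segment is a list of vertex labels indexed by time, a toggle at a shared vertex is modeled as swapping the tails of the two segments after that vertex, and the helper names `make_segments` and `toggle_at` are hypothetical.

```python
def make_segments(t, shared):
    """Build one labelled vertex list per color, with times 0..t.

    `shared` maps a vertex name to {color: time}; all other vertices get
    the unique label (color, time).
    """
    colors = sorted({c for places in shared.values() for c in places})
    segs = {c: [(c, i) for i in range(t + 1)] for c in colors}
    for name, places in shared.items():
        for c, i in places.items():
            segs[c][i] = name
    return segs


def toggle_at(segs, a, b, vertex):
    """Model a toggle at `vertex`, shared by segments a and b, as a tail swap."""
    i, j = segs[a].index(vertex), segs[b].index(vertex)
    new = dict(segs)
    new[a] = segs[a][: i + 1] + segs[b][j + 1:]
    new[b] = segs[b][: j + 1] + segs[a][i + 1:]
    return new


if __name__ == "__main__":
    shared = {
        "v1": {"red": 6, "yellow": 3},
        "v2": {"red": 35, "blue": 37},
        "v3": {"yellow": 75, "blue": 71},
    }
    f = make_segments(90, shared)
    f_star = toggle_at(f, "red", "yellow", "v1")
    # Lengths (last time index) of the segments starting from e_1 and e_2.
    print(len(f_star["red"]) - 1, len(f_star["yellow"]) - 1)
    # v2 sat at red time 35; after the toggle it sits at time 32 on the
    # segment that starts yellow, while its blue time 37 is unchanged.
    print(f_star["yellow"].index("v2"), f_star["blue"].index("v2"))
```

In this model the segments from $e_1$ and $e_2$ acquire lengths 93 and 87, the vertex $v^\#_2$ moves from time 35 to time 32, and $v^\#_3$ moves from yellow time 75 to time 78, in line with the shifts by $d = 3$ described above.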
More cases can be seen in Figure 10 on the next page. The latter point of view is natural, since at each $(i, j)$, for each color pair $(a, b)$, $1 \le a < b \le k$, with $\doteq$ to allow a small discrepancy for the failure of the good event, P(an arrival (see Footnote 9) at $(i, j)$ in those colors) $:= P(v_{a,i} = v_{b,j} \text{ and } v_{a,i-1} \ne v_{b,j-1}) \doteq$ P(there is a leftmost $(n-1)$-tuple repeat at a specific location (see Footnote 10) in the coin tossing sequence) $= 1/N$. Hence, scaling length by $1/\sqrt{N}$, so that area is scaled by $1/N$, leads to the expected number of arrivals per unit area $= 1$.
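As a hedged illustration of why such an arrival has probability $1/N$, the following Python sketch lists leftmost $(n-1)$-tuple repeats in a bit sequence. The working definition used here is an assumption made for the sketch, since the formal definition belongs to Section 3: a pair $(i, j)$ with $i < j$ counts when the $(n-1)$-tuples starting at $i$ and $j$ agree and the preceding bits differ (or $i = 0$). The function name `leftmost_repeats` is hypothetical.

```python
def leftmost_repeats(bits, n):
    """List pairs (i, j), i < j, where the (n-1)-tuples at i and j agree
    and the match cannot be extended one step to the left.

    Assumed working definition: either i == 0 or bits[i-1] != bits[j-1].
    """
    width = n - 1
    starts = range(len(bits) - width + 1)
    found = []
    for i in starts:
        for j in starts:
            if i < j and bits[i:i + width] == bits[j:j + width]:
                if i == 0 or bits[i - 1] != bits[j - 1]:
                    found.append((i, j))
    return found


if __name__ == "__main__":
    print(leftmost_repeats([0, 0, 1, 1, 0, 1], 3))
```

For fair coins and fixed $1 \le i < j$ with non-overlapping windows, agreement of the $n-1$ bits has probability $2^{-(n-1)}$ and disagreement of the preceding bits has probability $1/2$, giving $2^{-n} = 1/N$ under this working definition.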

The
The picture in Figure 6 on page 24, viewed as occurring on a 10.556 by 10.556 square, closely resembles a (standard, rate $1\,dy\,dx$) two-dimensional Poisson process, in each secondary color pair. And overall, ignoring color, the picture resembles the rate $\binom{k}{2}\,dy\,dx$ Poisson process on the $t/\sqrt{N}$ by $t/\sqrt{N}$ square. There are additional requirements for the Poisson process, beyond having intensity $1\,dy\,dx$, namely, probabilistic independence for the counts in disjoint regions. We do have a good Poisson process approximation, for a combination of two reasons. First, the good event $G = G^{(k,t)}$ from Theorem 5 gives a high-probability coupling (since $t = N^{.6}$ entails $t^3 n^3/N^2 \to 0$) between coin tossing and the $k$ de Bruijn segments of length $t$.
Second, the Chen-Stein method, Theorem 3 of [6], gives a total-variation distance upper bound (tending to zero since $t = N^{.6}$ entails $t^3 n/N^2 \to 0$) between the process of indicators of leftmost $(n-1)$-tuple repeats for coin tossing, and a process with the same intensity, but mutually independent coordinates.

Figure 10 (panels $f$ and $f^*$): Displacements caused by a single toggle. An example with $t = 90$, and three colors, red, yellow, blue. Say the toggle is at $v^\#_2$ occurring at (red, blue) time (35, 40), similar to the purple vertex at (35, 37) in Figure 9 on page 26, but with the displacement changed from $-2$ to $-5$, for the sake of being easier to see in the two-dimensional picture. We have thrown in several more matches between two different colors, at various earlier and later times, to show the resulting two-dimensional displacements. Red vertices at times greater than 35 have their time increased by 5, and blue vertices at times greater than 40 have their time decreased by 5. Two-dimensional match locations are indicated by a solid circle for the logic $f$, and an open circle for the logic $f^*$.
We get our intuition from the Poisson process. But for our proofs, we will work directly with the discrete, dependent processes.

Quick Motivation for the Geometric Progression
We will construct choice functions in (40), based on regions, defined in (36), which in turn are based on a geometric progression in (35). Here we give some motivation for this elaborate construction.
If we search for a single toggle point, in a thin and long rectangle along the diagonal, $\{(i, j) : |i - j| \le d,\ 0 \le i, j \le t\}$, then, in the natural scale of Section 4.5 (and ignoring factors of $\sqrt{2}$ related to the 45 degree rotation, and of 2 for $\pm d$), the rectangle is $d_1$ by $w_1$, where $d_1 = d/\sqrt{N}$ and $w_1 = t/\sqrt{N}$. Condition (32) can be interpreted as meaning that the (natural scale) area, $d_1 w_1$, tends to infinity, so that with high probability, matches can be found in this rectangle; condition (33) can similarly be interpreted as meaning that $d_1 \to 0$, so that no matches will be found in the two-dimensional set, of area on the order of $d_1^2$, of points within $\ell_\infty$ distance $d_1$ of the chosen location $(i, j)$. Now in choosing $m$ toggle points, displacements caused by earlier toggles might change the search result, and we wish to make this unlikely. In more detail: as seen in Section 4.4, toggling a logic $f$ at a vertex $v^\#$ which appears on two different colors, at times $i, j$ with $|i - j| \le d$, causes displacements in the time indices of vertices occurring later on those segments, by amounts up to $d$. Our $m$ potential toggle points, $v^\#_1, \dots, v^\#_m$, are controlled so that on any segment, $v^\#_\ell$ is preceded only by toggle points from among the $v^\#_1, \dots, v^\#_{\ell-1}$. If the displacement caused by toggling at $v^\#_i$ is at most $d_i$, then in choosing $v^\#_\ell$, the accumulated displacement from previous toggles is at most $d_1 + \cdots + d_{\ell-1}$. By taking the $d_i$ in geometric progression, with large ratio $r^2$, this accumulated displacement in the search for $v^\#_\ell$ is at most order of $d_{\ell-1}$. The rectangle where we search for $v^\#_\ell$ is thin and long, $d_\ell$ by $w_\ell = r/d_\ell$; the length of its boundary is order of $w_\ell$, so the area involved in points at a distance at most $d_{\ell-1} = d_\ell/r^2$ from the boundary is order of $1/r = o(1)$. Hence with high probability, displaced indices have no effect.

The Search Regions
We divide the time interval [0, t] into m equal length pieces. On the earliest piece, with times in [0,t/m], we demand that we can find a match (i, j) with |i − j| very very very small, but no upper bound on max(i, j) other than max(i, j) < t/m. In the natural scale of Section 4.5 we are searching for matches in a very very very thin and very very long rectangle surrounding the diagonal line i = j; this rectangle has a large area. On the second piece, with times in [t/m, 2t/m], we relax the notion of thin, expanding by a large factor r 2 , relax the notion of long, dividing by the factor r 2 , thus keeping the area constant. We continue this pair of geometric progressions, so the mth region is a thin long rectangle -but still with the same area.
Here is a concrete way to accomplish the above, together with $t^3 n^3/N^2 \to 0$ and with $k$ fixed; it involves parameters $m$, $a$, and $t$. The last condition on these parameters should be understood as "in the natural scale from Section 4.5, the $t$ by $t$ rectangle is $ma$ by $ma$, and length $t/m$ for the discrete $i$ and $j$ corresponds to length $a$". Let $r := a^{1/(2m+1)}$, so that $r^{2m+1} = a$, and, ignoring the factors of $\sqrt{2}$ involved in the 45 degree rotation, take the thin long rectangles to have shapes $r^{-2m}$ by $r^{2m+1}$, $r^{-2m+2}$ by $r^{2m-1}$, $\dots$, $r^{-2}$ by $r^{3}$.
Indexing by $\ell = 1$ to $m$, the $\ell$th rectangle is $d_\ell := r^{2\ell-2m-2}$ by $w_\ell := r^{2m-2\ell+3}$ on the natural scale. Directly in terms of the discrete $i$ and $j$, we define $\mathrm{Region}_\ell$ in (36), so one checks that 1) as $\ell$ increases by 1, the thinness constraint relaxes by a factor of $r^2$, while the width constraint becomes more severe by a factor of $r^2$, so the area stays constant, 2) the first region, with $\ell = 1$, allows $i, j \in [0, t/m]$, and 3) the last region, with $\ell = m$, has $|i - j|/\sqrt{N} \le 1/r^2 = o(1)$ as $n \to \infty$.

Consider the possibility discussed in Section 4.3, where a toggle at a vertex appearing in two differently colored segments enables a match within a single segment to become, after the toggle, an earlier match between two different segments. For each $\ell = 1$ to $m$, with the $(t, d)$ in (34) given by $t = w_\ell \sqrt{N}$, $d = d_\ell \sqrt{N}$, the condition in (34) is indeed satisfied by our specific choice in (35). On the natural scale, and ignoring rotation, we are searching for a match in a $\delta = d_\ell$ by $W = w_\ell$ rectangle, thin and long, with $\delta \to 0$ and area $\delta W \to \infty$. The condition (34), on the natural scale, means that $\delta^3 W \to 0$. It implies that, with high probability, we do not find a match between two differently colored segments (at $(i, j)$ in the rectangle, with $|i - j|/\sqrt{N} < \delta$) and simultaneously a nearby match within a single segment. Here, nearby means with both indices within distance $\delta\sqrt{N}$ from $i$ or $j$. Now, the $\delta$ by $W$ rectangle can be covered by $W/\delta$ squares, each square of size $4\delta$ by $4\delta$, with each successive square being a translate, by $\delta$, of the previous square. Ignoring constant factors (see Footnote 11), the expected number of arrivals in one square is order of $\delta^2$, and the chance of two or more arrivals in that one square is order of $\delta^4$. Thus the expected number of squares with two or more arrivals is order of $(W/\delta) \times \delta^4 = \delta^3 W \to 0$.
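The pair of geometric progressions can be tabulated with a small Python sketch. It only evaluates the formulas $r = a^{1/(2m+1)}$, $d_\ell = r^{2\ell-2m-2}$, and $w_\ell = r^{2m-2\ell+3}$ on the natural scale; the function name `rectangles` and the sample values of $a$ and $m$ are illustrative assumptions.

```python
def rectangles(a, m):
    """Return r and the list of natural-scale shapes (d_l, w_l) for l = 1..m."""
    r = a ** (1.0 / (2 * m + 1))
    shapes = [
        (r ** (2 * l - 2 * m - 2), r ** (2 * m - 2 * l + 3))
        for l in range(1, m + 1)
    ]
    return r, shapes


if __name__ == "__main__":
    r, shapes = rectangles(3 ** 7, 3)  # sample values chosen so that r is about 3
    for l, (d, w) in enumerate(shapes, start=1):
        # Each rectangle has area d * w = r; d grows and w shrinks by r**2 per step.
        print(l, d, w, d * w)
```

In this tabulation every rectangle has area $r$, the first has width $w_1 = a$, and the last has thinness $d_m = r^{-2}$, so with $\delta = d_\ell$ and $W = w_\ell$ one gets $\delta^3 W = d_\ell^2\, r \le r^{-3}$.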

Definition of the Choice Functions
Write $V$ for the set of vertices in $D_{n-1}$, and write "null" for a special value, not in $V$, used to encode "undefined". Recall that we write $\mathbf{e} = (e_1, \dots, e_k)$ for the starting $n$-tuples for $k$ segments, and $S_{n,k} = \{(f, \mathbf{e})\}$ for the space in which we make a uniform choice of logic and starting edges. Also recall our notation (31) for vertices along the $k$ segments. Note that we have both $k$ segments and $k$ colors; these are different concepts, and ultimately, colors will be labeled according to the segment labels under $f$; but on the soon to be defined "happy" event $H$, finding $v^\#_i$ on two different segments of $f$ will be equivalent to finding $v^\#_i$ on two different colors. To keep track of the colors, let $A := \{(a, b) : 1 \le a < b \le k\}$ be the set of color pairs, as in (38). For $\ell = 1$ to $m$, we define $\mathrm{Candidates}_\ell$, a set of quadruples $(i, j, a, b)$, where $\mathrm{Region}_\ell$ is defined by (36). For $\ell = 1$ to $m$, we define $\mathrm{Choice}_\ell$ in (40), where the value is null if the set of candidates is empty, and otherwise is determined by picking the first $(i, j, a, b)$ in $\mathrm{Candidates}_\ell$. To be very careful, the order for first is the lex-first order on $(i + j, \max(i, j), a, b)$.

Footnote 11: such as $\binom{k}{2} + k$, for the intensity of arrivals in the superimposed process marking matches between two different colors or both within the same color, and 16, since a $4\delta$ by $4\delta$ square has area $16\delta^2$.
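The ordering used to pick the "first" candidate can be sketched in Python as follows. This is a minimal illustration, not the paper's definition (40): the candidate set is simply passed in as a list of quadruples $(i, j, a, b)$, the sketch returns the winning quadruple rather than the vertex it names, Python's `None` stands in for "null", and the function name `choice` is hypothetical.

```python
def choice(candidates):
    """Return the lex-first (i, j, a, b) under the key (i + j, max(i, j), a, b).

    Returns None, standing in for "null", when there are no candidates.
    """
    if not candidates:
        return None
    return min(candidates, key=lambda c: (c[0] + c[1], max(c[0], c[1]), c[2], c[3]))


if __name__ == "__main__":
    sample = [(5, 3, 1, 2), (4, 4, 1, 3), (3, 5, 1, 2), (2, 7, 2, 3)]
    print(choice(sample))
```

Among candidates with the same $i + j$, the smallest $\max(i, j)$ is also the smallest $|i - j|$, because $\max(i, j) = (i + j + |i - j|)/2$; so in the sample the pair $(4, 4)$ on the diagonal wins over $(5, 3)$ and $(3, 5)$.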

The Happy Event H = H(k, m, n)
We now describe a subset of $S_{n,k}$, and refer to this subset as the happy event $H$. One requirement for $(f, \mathbf{e}) \in H$ is that, for $\ell = 1$ to $m$, each of the values $\mathrm{Choice}_\ell(f, \mathbf{e}) \ne$ null. Starting with such an $(f, \mathbf{e})$, the choice functions pick out a set of $m$ distinct vertices; call them $v^\#_1, \dots, v^\#_m$, and name the set $V^\# = \{v^\#_1, \dots, v^\#_m\}$; we will use this notation in (42) below.
Given a set of vertices, $U \subset V$, we denote the logic $f$ toggled at the vertices in $U$ as $\mathrm{Toggle}(f, U)$; it is obtained from $f$ by complementing the value at each vertex of $U$. We define $H$ in (42) as the set of $(f, \mathbf{e}) \in S_{n,k}$ such that, for $\ell = 1$ to $m$, $\mathrm{Choice}_\ell(f, \mathbf{e}) \ne$ null, the same values of the choice functions are obtained when $f$ is replaced by $\mathrm{Toggle}(f, U)$ for any $U \subset V^\#$, and the segments $\mathrm{Seg}(f, e_i, t)$ collectively have $k(t+1)$ distinct edges.
Informally, $(f, \mathbf{e})$ is in the happy event iff the $k$ segments involve no $n$-repeats, and the choice recipes find $m$ potential toggle vertices, and all $2^m$ cousins $f^*$, formed by toggling at a subset of those vertices, give rise to the same $v^\#_1, \dots, v^\#_m$. The definition above creates an equivalence relation on $H$, in which all classes have size $2^m$, and all $(f^*, \mathbf{e}) \in [(f, \mathbf{e})]$ share the same sequence $v^\#_1, \dots, v^\#_m$. Using the calculations given in Section 4.6.1 one may show that for fixed $k, m$, $|H|/|S_{n,k}| \to 1$; that is, that $P(H) \to 1$ as $n \to \infty$.

Definition and Likelihood of an ε-good Schedule
Given $k$, view $A$, defined by (38), as an alphabet of size $\binom{k}{2}$. A schedule of length $m$ is a word $\alpha_1 \alpha_2 \cdots \alpha_m \in A^m$. Given a schedule of length $m$, and $m$ coin tosses $D_1, \dots, D_m$, let $\tau_i$ be the transposition $\alpha_i$ if $D_i$ is heads and the identity if $D_i$ is tails, and let $\tau = \tau(\alpha_1 \alpha_2 \cdots \alpha_m, D_1, \dots, D_m)$ be the product, with $\tau_1$ applied first. Write $\sigma$ for an arbitrary permutation in $S_k$, and consider the conditional probability of getting $\sigma$ for the value of $\tau$, given the schedule $\alpha_1 \alpha_2 \cdots \alpha_m$; these are values of the form $z/2^m$ with $z$ in $\mathbb{Z}$. The total variation distance to the uniform distribution on $S_k$ is denoted $\mathrm{Distance}(\alpha_1 \alpha_2 \cdots \alpha_m)$. Given $\varepsilon > 0$, a schedule $\alpha_1 \alpha_2 \cdots \alpha_m$ is $\varepsilon$-good if $\mathrm{Distance}(\alpha_1 \alpha_2 \cdots \alpha_m) < \varepsilon$.
Proof. There is a well-known bijection (45) between $S_k$ and the set $C_k$, where $(a\ b)$ denotes the transposition $(a\ b) \in S_k$ if $a \ne b$, and the identity map otherwise. (The corresponding algorithm, to generate uniformly distributed random permutations, is known as the "Fisher-Yates shuffle" or "Knuth shuffle".) Now consider the particular word $w$ of length $K$ over the alphabet $A$ defined in (38). If we had $m = K$ and the schedule $\alpha_1 \alpha_2 \cdots \alpha_m = w$, then $\mathrm{Distance}(\alpha_1 \alpha_2 \cdots \alpha_m) \le 1 - 2^{-K}$, because for each $\sigma$ in (45), one assignment of the coin values $(D_1, \dots, D_m)$ yields $\tau = \sigma$, via the coins for the genuine transpositions among the $(i\ c_i)$ on the right side of (45) being heads, and all other coins being tails. When the word $w$ appears $\ell$ times inside a long word $\alpha_1 \alpha_2 \cdots \alpha_m$, a standard result gives a bound on the distance that tends to 0 as $\ell$ grows. For historical interest, we note that similar results are in [11, Thm. 1, p. 23]; see also [12]. In a very long random word $\alpha_1 \alpha_2 \cdots \alpha_m$, the number of occurrences of $w$ is random, with mean and variance roughly $m K^{-K}$, so a sufficiently large $m$ guarantees that $\ell$ is sufficiently large, with high probability.
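For small $k$ and $m$, the distance of a schedule can be computed by brute force, which may help in checking intuition about $\varepsilon$-good schedules. The Python sketch below assumes that the $i$th coin decides whether the transposition $\alpha_i$ is applied (heads) or skipped (tails), with the first letter applied first; the function name `schedule_distance` is hypothetical.

```python
from fractions import Fraction
from itertools import permutations, product
from math import factorial


def schedule_distance(schedule, k):
    """Total variation distance between the law of tau and uniform on S_k.

    `schedule` is a list of pairs (a, b) with 1 <= a < b <= k; all 2**m coin
    outcomes are enumerated, so this is only feasible for small m.
    """
    m = len(schedule)
    counts = {}
    for coins in product((0, 1), repeat=m):
        images = list(range(1, k + 1))  # images[x-1] = current image of x
        for (a, b), heads in zip(schedule, coins):
            if heads:
                # Apply the transposition (a b) after the maps applied so far.
                images = [b if y == a else a if y == b else y for y in images]
        key = tuple(images)
        counts[key] = counts.get(key, 0) + 1
    uniform = Fraction(1, factorial(k))
    total = Fraction(0)
    for sigma in permutations(range(1, k + 1)):
        total += abs(Fraction(counts.get(sigma, 0), 2 ** m) - uniform)
    return total / 2


if __name__ == "__main__":
    print(schedule_distance([(1, 2), (1, 3), (2, 3)], 3))
```

For $k = 3$, the three-letter schedule (1 2)(1 3)(2 3) reaches every permutation in $S_3$ and has distance $1/6$, comfortably below the crude bound $1 - 2^{-3}$.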

Relativized Permutations
We will define "π f relativized to e 1 , . . . , e k " to be a specific permutation in S 1 ∪ · · · S k−1 ∪ S k , where S j denotes the set of all permutations on {1, 2, . . . , j}. For use in Lemma 4, we need to allow for the possibility that e 1 , . . . , e k are not k distinct n-tuples.
On the happy event $H$ from (42), for $a = 1$ to $k$, write $e'_a :=$ the final edge $e_{a,t}$ of $\mathrm{Seg}(f_0, e_a, t)$, so that, under the logic $f_0$, $\mathrm{Seg}(f_0, e_a, t)$ is a directed path (in color $a$) from its female end $e_a$ to its male end $e'_a$. Note that being in $H$ implies that the starting edges $e_1, \dots, e_k$ are distinct, and the final edges $e'_1, \dots, e'_k$ are distinct.
It is clear -from the relative timing of the appearances of the v # 1 , . . . , v # m along the segments Seg(f 0 , e a , t) -that under the logic f * , Seg(f * , e a , t) is a directed path from its female end e a to its male end e g(a) , where g ≡ g(f * ) is the permutation in S k given by , compare with (43). Finally, for (f, e) ∈ H, and of course, on each toggle class d TV (ĝ • g, uniform(S k )) = d TV (g, uniform(S k )).
With hindsight, we observe that the estimates of this section, and the previous Section 4.9, have enabled us to dodge a very difficult consideration of interlacement (of the $e_1, \dots, e_k$ and $v^\#_1, \dots, v^\#_m$); see [5] for a study of interlacement.
5 Sampling with k Starts, to Prove Poisson-Dirichlet Convergence

Background, and Notation, for Flat Random Permutations
An overall reference for the following material and history is [3]. For a random permutation in $S_k$, with all $k!$ possible permutations equally likely, for $j = 1, 2, \dots$, let $L_j \equiv L_j(k) :=$ size of the $j$-th longest cycle, with $L_j = 0$ if the permutation has fewer than $j$ cycles, so that always $L_1(k) + L_2(k) + \cdots = k$. The notation $L_j \equiv L_j(k)$ means that we consider the two notations equivalent, so that we can use either, depending on whether or not we wish to emphasize the parameter $k$. Write $\mathbf{L} \equiv \mathbf{L}(k) := (L_1(k), L_2(k), \dots)$ and $\overline{\mathbf{L}}(k) := \mathbf{L}(k)/k$, so that $\overline{L}_i \equiv \overline{L}_i(k) := L_i/k$. We use notation analogous to the above, systematically: boldface gives a process, and overline specifies normalizing, so that the sum of the components is 1.

This paragraph, summarizing the convoluted history of the limit distribution for the length of the longest cycle, begins with Dickman's 1930 study of the largest prime factor of a random integer. Dickman proved that for each fixed $u \ge 1$, $\Psi(x, x^{1/u})/x \to \rho(u)$, where $\Psi(x, y)$ counts the $y$-smooth integers from 1 to $x$. The function $\rho$ is characterized by $\rho(u) = 0$ for $u < 0$, $\rho(u) = 1$ for $0 \le u \le 1$, and for all $u$, $u\rho(u) = \int_{u-1}^{u} \rho(t)\,dt$. In modern language, writing $P^+ = P^+(x)$ for the largest prime factor of a random integer chosen from 1 to $x$, Dickman's result is (49). Later work by Goncharov (1944) and Shepp and Lloyd (1966) showed the corresponding result for random permutations, that for every fixed $u \ge 1$, $P(L_1(k) < k/u) \to \rho(u)$. In modern language this is (50). The random variable $X_1$ appearing in (49) and (50) is the first coordinate of the Poisson-Dirichlet process; the second coordinate corresponds to the second largest prime factor, or second largest cycle length, and so on. For primes, the joint limit was proved by Billingsley (1972) [9], and for permutations, the joint limit was discussed by Vershik and Shmidt (1977) and Kingman (1977). In these early studies, the Poisson-Dirichlet process appears as the limit, but not in a form easily recognizable as either (54) or (55).
A fun exercise for the reader would be to prove that the distribution of X 1 , as given by the cumulative distribution function in (49), together with the integral equation characterizing ρ, is the same as the distribution of X 1 as given by its density, which is the special case k = 1 of (54). See [2] for more on the Poisson-Dirichlet in relation to prime factorizations, and [4] for more on the Poisson-Dirichlet in relation to flat random permutations.
Returning to the process of longest cycle lengths in (48), the joint distribution is most easily understood by taking the cycles in "age order". Let

$$A_j \equiv A_j(k) := \text{size of the } j\text{-th cycle, in age order}. \tag{51}$$

Our notation convention has already told the reader that $\mathbf{A} \equiv \mathbf{A}(k) := (A_1(k), A_2(k), \cdots)$, and that $\overline{\mathbf{A}}(k) = \mathbf{A}(k)/k$. Here, the notion of age comes from canonical cycle notation: 1 is written as the start of the first (eldest) cycle, whose length is $A_1$, then the smallest $i$ not on this first cycle is the start of the second cycle, whose length is $A_2$, and so on, with $A_j := 0$ if the permutation has fewer than $j$ cycles.$^{13}$ It is easy to see that $A_1$ is uniformly distributed in $\{1, 2, \ldots, k\}$, and for each $j = 1, 2, \ldots$, if there are at least $j$ cycles, then the conditional distribution of $A_{j+1}$, given $A_1, \ldots, A_j$ with $A_1 + \cdots + A_j < k$, is uniform on $\{1, 2, \ldots, k - (A_1 + \cdots + A_j)\}$. This very easily leads to a description of the limit proportions: with $U, U_1, U_2, \ldots$ independent, uniformly distributed in $(0,1)$, as $k \to \infty$,

$$\overline{\mathbf{A}}(k) \to_d (1-U_1,\ U_1(1-U_2),\ U_1U_2(1-U_3),\ \ldots). \tag{52}$$

We write $\to_d$ to denote convergence in distribution, and we note that $U =_d 1 - U$, where $=_d$ denotes equality in distribution. The distribution of the process on the right side of (52) is named GEM, after Griffiths [18], Engen [15], and McCloskey [20]; its construction is popularly referred to as "stick breaking", although stick breaking in general allows $U$ to take any distribution on $(0,1)$, not just the uniform.
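To make the age-order convention and the stick-breaking limit concrete, here is a small Python sketch. It is illustrative only: the function names are hypothetical, and permutations are represented on $\{0, \ldots, k-1\}$ rather than $\{1, \ldots, k\}$, so "the smallest element not yet on a cycle" starts from 0. The exhaustive count over $\mathcal{S}_k$ checks, for small $k$, the claim that $A_1$ is uniform on $\{1, \ldots, k\}$.

```python
import itertools
import random
from collections import Counter


def age_ordered_cycle_lengths(perm):
    """Cycle lengths in age order; perm[i] is the image of i, on {0, ..., k-1}."""
    k = len(perm)
    seen = [False] * k
    lengths = []
    for start in range(k):  # smallest element not yet on a cycle starts the next cycle
        if seen[start]:
            continue
        length, i = 0, start
        while not seen[i]:
            seen[i] = True
            i = perm[i]
            length += 1
        lengths.append(length)
    return lengths


def first_cycle_length_counts(k):
    """Count, over all k! permutations, how often A_1 takes each value."""
    counts = Counter()
    for perm in itertools.permutations(range(k)):
        counts[age_ordered_cycle_lengths(list(perm))[0]] += 1
    return counts


def gem_sample(r, rng):
    """First r coordinates of (1-U1, U1(1-U2), U1 U2 (1-U3), ...)."""
    remaining = 1.0  # length of stick not yet broken off
    coords = []
    for _ in range(r):
        u = rng.random()
        coords.append(remaining * (1.0 - u))
        remaining *= u
    return coords


if __name__ == "__main__":
    print(age_ordered_cycle_lengths([1, 0, 2, 4, 5, 3]))  # cycles (0 1)(2)(3 4 5)
    print(first_cycle_length_counts(4))
    print(gem_sample(5, random.Random(0)))
```

Sorting the output of `gem_sample` in decreasing order gives an approximate sample of the leading Poisson-Dirichlet coordinates, up to the truncation at `r` sticks.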
Convergence of processes, as in (52) and (56), and in our Theorem 1 and Lemmas 3 and 4, is an instance of convergence for stochastic processes with values in $\mathbb{R}^\infty$, with the usual compact-open topology; as such, convergence of processes is equivalent to convergence of the finite-dimensional distributions, of the first $r$ coordinates, for each $r = 1, 2, \ldots$.
The (usual subspace) topology on $\Delta$ is the same as the metric topology from the $\ell_1$ distance,

$$d(x, y) := \sum_{i \ge 1} |x_i - y_i|.$$

We write RANK for the function on $\Delta$ which sorts, with largest first. An example shows some of the subtlety of the preceding considerations: let $e_i \in \Delta$ be the $i$-th standard basis vector, all zeros apart from a 1 in the $i$-th coordinate, and let $\mathbf{0}$ be the all zeros vector. Note that $\mathbf{0} \in [0,1]^\infty \setminus \Delta$, and in the larger space $[0,1]^\infty$, $e_n \to \mathbf{0}$. But for $i \ne j$, $d(e_i, e_j) = 2$, and the sequence $e_1, e_2, \ldots$ does not converge in $\Delta$. The closure of $\Delta$ is the compact set $\overline{\Delta} = \{(x_1, x_2, \ldots) \in [0,1]^\infty : x_1 + x_2 + \cdots \le 1\}$, and RANK is also defined$^{14}$ on $\overline{\Delta}$; note that $\mathbf{0} \in \overline{\Delta}$, and our $e_n$ example shows that RANK is not continuous on $\overline{\Delta}$. Donnelly and Joyce, [13, Proposition 4], proved that RANK is continuous on $\Delta$, observing that "... in parts of the literature some of these results seem already to have been assumed."

By definition, a random $(X_1, X_2, \ldots) \in \Delta$ is the Poisson-Dirichlet process, or has the Poisson-Dirichlet distribution$^{15}$, PD, if for each $k = 1, 2, \ldots$, the joint density of the first $k$ coordinates is given by

$$f_k(x_1, \ldots, x_k) = \frac{1}{x_1 x_2 \cdots x_k}\, \rho\!\left(\frac{1 - x_1 - \cdots - x_k}{x_k}\right) \tag{54}$$

on the region $x_1 > x_2 > \cdots > x_k > 0$ and $x_1 + \cdots + x_k < 1$, and zero elsewhere.

$^{13}$ In contrast with permutations on $\{1, 2, \ldots, N\}$, similar to (51), where age order comes from the canonical cycle notation, for shift-register permutations $\pi_f$, the oldest cycle is not the cycle containing the lex-first $n$-tuple, $00 \cdots 0$. In fact, in a random FSR, the cycle starting from $00 \cdots 0$ has exactly a one-half chance to have length 1. For permutations of a set lacking exchangeability, such as $\mathbb{F}_2^n$, the notion of age order requires auxiliary randomization: the oldest cycle is picked out by a random $n$-tuple; conditional on this cycle, with length $A_1 < N$, choose an $n$-tuple uniformly at random from the remaining $(N - A_1)$ $n$-tuples not on the first cycle, to pick out the second oldest cycle, whose length is $A_2$, and so on.
The Poisson-Dirichlet process may be constructed from the GEM process, which appeared on the right side of (52), by sorting, with

$$(X_1, X_2, \ldots) := \text{RANK}(1-U_1,\ U_1(1-U_2),\ U_1U_2(1-U_3),\ \ldots). \tag{55}$$

For the process of largest cycle lengths in a random permutation, (48), the combination of the easy-to-see limit (52), and the continuity of RANK, and the characterization (55) of the Poisson-Dirichlet distribution, proves that as $k \to \infty$,

$$\overline{\mathbf{L}}(k) \to_d (X_1, X_2, \ldots). \tag{56}$$

Our goal is to derive a new tool for proving the same PD convergence as in (56), but for non-uniform permutations, such as those arising from a random FSR. It might benefit the reader to jump ahead a little, and read the statement of Lemma 4, and then the more technical Lemma 3, which has the meat of the argument used to prove Lemma 4. We have stated Lemma 3 in a fairly general form, hoping that it may be useful in the context of other combinatorial structures, and perhaps with limits other than the Poisson-Dirichlet.

Next, for each $k \ge 1$, take an ordered sample of size $k$, with replacement, from $[N]$, with all $N^k$ possible outcomes equally likely. Such a sample picks out an ordered (by first appearance) list of blocks of $\pi$, say $\beta_1, \ldots, \beta_r$, with $r \le k$. Let $C_j \equiv C_j(N, k)$ be the number of elements of the $k$-sample landing in the block $\beta_j$, with $C_j := 0$ for $j > r$, so that $C_1 + C_2 + \cdots = k$. Let $\mathbf{C} \equiv \mathbf{C}(N, k) = (C_1, C_2, \ldots)$.
Then, if for each fixed $k$, as $N \to \infty$, we have

$$\mathbf{C}(N, k) \to_d \mathbf{A}(k), \tag{58}$$

it follows that as $N \to \infty$, $\overline{\mathbf{M}}(N) \to_d X$.
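The sampling step in the statement of Lemma 3 can be sketched in a few lines of Python. This is a minimal illustration under assumptions made here: the set partition $\pi$ of $[N]$ is encoded as a list `block_of`, indexed from 0, giving a block label for each element, and the function names are hypothetical. The output is the count vector $\mathbf{C}(N,k)$, with blocks listed in order of first appearance in the sample and trailing zeros omitted.

```python
import random


def counts_by_first_appearance(block_of, sample):
    """Counts C_1, C_2, ... of sample points per block, blocks in first-appearance order."""
    position = {}  # block label -> index j of that block in first-appearance order
    counts = []
    for x in sample:
        b = block_of[x]
        if b not in position:
            position[b] = len(counts)
            counts.append(0)
        counts[position[b]] += 1
    return counts


def sample_block_counts(block_of, k, rng):
    """Draw an ordered k-sample with replacement from {0, ..., N-1}; return its counts."""
    n_points = len(block_of)
    sample = [rng.randrange(n_points) for _ in range(k)]
    return counts_by_first_appearance(block_of, sample)


if __name__ == "__main__":
    # A partition of {0,...,5} into blocks 'a' = {0,1}, 'b' = {2}, 'c' = {3,4,5}.
    block_of = ["a", "a", "b", "c", "c", "c"]
    print(counts_by_first_appearance(block_of, [3, 0, 4, 2, 1]))  # blocks seen: c, a, b
    print(sample_block_counts(block_of, 10, random.Random(1)))
```

When $\pi$ is the cycle partition of a permutation, `block_of` would hold a cycle identifier for each element.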

Proof.
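Here is an outline of our proof. We begin with an analysis of "sampling using $k$ probes", leading to (61), which gets coordinatewise nearness, with exceptional probability $O(1/k)$, uniformly over set partitions, which are indexed by $N$. This is the crux of our proof; the remainder is similar to Donnelly and Joyce, [13, Proposition 4], on the continuity of RANK. For an overview, writing whp to mean "with high probability", and $\doteq$ to mean "approximately equals, in $\ell_1$":

$$X \doteq \text{RANK}(\overline{\mathbf{A}}(k)) \ \text{(by (57))} \ = \text{RANK}(\overline{\mathbf{C}}(N,k)) \ \text{whp (by (58))} \ = \text{RANK}(\overline{\mathbf{D}}(N,k)) \doteq \overline{\mathbf{M}}(N) \ \text{whp (by (61))}.$$

Here $D_i \equiv D_i(N, k)$ is the number of elements of the $k$-sample landing in the $i$-th largest block of $\pi$, so that $\mathbf{D}$ is a rearrangement of $\mathbf{C}$, and $p = (p_1, p_2, \ldots) := \overline{\mathbf{M}}(N)$ is the vector of block proportions. Conditional on the value of $p$, the joint distribution of $(D_1, D_2, \ldots)$ is exactly Multinomial$(k, p)$. We want to establish a form of uniformity for the convergence of $\overline{\mathbf{D}}(k)$ to $p$. The first step is to recall the usual proof that for Binomial sampling, with a sample of size $k$ and true parameter $p \in [0, 1]$, the sample mean $\hat{p}$ converges to the true parameter $p$, because the proof provides a quantitative bound. Specifically, Chebyshev's inequality gets used, with

$$\mathbb{P}(|\hat{p} - p| \ge \delta) \le \frac{\text{Var}(\hat{p})}{\delta^2} = \frac{p(1-p)}{k\delta^2} \le \frac{p}{k\delta^2}. \tag{60}$$

In particular, conditional on any value for $p$, for $i = 1, 2, \ldots$, with $p_i = \overline{M}_i = M_i(N)/N$ in the role of $p$ for (60),

$$\mathbb{P}\big(|\overline{D}_i - \overline{M}_i| \ge \delta \mid (p_1, p_2, \ldots)\big) \le \frac{p_i}{k\delta^2}.$$

Hence, taking expectation to remove the conditioning on $p$, and then using $\sum_i p_i = 1$ to analyze the union bound, we have a good event $G$ (proximity in $\ell_\infty$) whose complement has small probability:

$$G := \{|\overline{D}_i - \overline{M}_i| < \delta \text{ for all } i \ge 1\}, \qquad \mathbb{P}(G^c) \le \frac{1}{k\delta^2}. \tag{61}$$

For $x \in \Delta$, $j \ge 1$ write $S_j(x)$ for the sum of the $j$ largest coordinates of $x$. Obviously for $\omega \in G$,

$$S_j(\overline{\mathbf{M}}) \ge S_j(\overline{\mathbf{D}}) - j\delta. \tag{62}$$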
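As a numerical sanity check of the Chebyshev step (60) and the union bound behind (61), the following Python sketch computes exact binomial deviation probabilities and compares them with the bounds. It is an illustration under assumptions made here: the function names are hypothetical, and a short finite vector of block proportions stands in for $(p_1, p_2, \ldots)$.

```python
import math


def binomial_deviation_prob(k, p, delta):
    """Exact P(|D/k - p| >= delta) for D ~ Binomial(k, p)."""
    total = 0.0
    for d in range(k + 1):
        if abs(d / k - p) >= delta:
            total += math.comb(k, d) * p**d * (1.0 - p) ** (k - d)
    return total


def chebyshev_bound(k, p, delta):
    """Chebyshev bound Var(D/k) / delta^2 = p(1-p) / (k delta^2)."""
    return p * (1.0 - p) / (k * delta**2)


def union_deviation_bound(k, probs, delta):
    """Union bound on P(some coordinate deviates by >= delta), via exact marginals.

    Each marginal of Multinomial(k, probs) is Binomial(k, p_i).
    """
    return sum(binomial_deviation_prob(k, p, delta) for p in probs)


if __name__ == "__main__":
    k, delta = 200, 0.1
    probs = [0.5, 0.3, 0.2]  # hypothetical block proportions, summing to 1
    for p in probs:
        print(p, binomial_deviation_prob(k, p, delta), chebyshev_bound(k, p, delta))
    print("union:", union_deviation_bound(k, probs, delta),
          " bound 1/(k delta^2):", 1.0 / (k * delta**2))
```

The exact probabilities are typically far smaller than the Chebyshev bounds; the point of (61) is that the bound $1/(k\delta^2)$ does not depend on the partition, hence not on $N$.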
Let ε > 0 be given, and fixed for the remainder of this proof.
Let $R(j, \varepsilon) := \{y = (y_1, y_2, \ldots) \in \Delta : \text{RANK}(y) = x = (x_1, x_2, \ldots) \text{ has } x_1 + \cdots + x_j > 1 - \varepsilon\}$, the set of points in $\Delta$ where some set of $j$ coordinates sums to more than $1 - \varepsilon$. Note that $R(j, \varepsilon)$ is invariant under permutations of the coordinates, including RANK. Since $\Delta = \cup_j R(j, \varepsilon)$, and $X$ from (57) is a random element of $\Delta$, there exists $j = j(\varepsilon) \ge 1$, depending on the distribution of $X$, such that $\mathbb{P}(X \in R(j, \varepsilon)) > 1 - \varepsilon$; fix such a value for $j$. [When used in Lemma 4, where the distribution of $X$ is Poisson-Dirichlet, (55) can be used to show that the minimal such $j$ is asymptotically $\log(1/\varepsilon)$.] Using the hypothesis (57), and observing that $R(j, \varepsilon)$ is an open set, the open set part of the Portmanteau Theorem on weak convergence implies that we can pick and fix a finite $k_0$ such that for all $k \ge k_0$,

$$\mathbb{P}(\overline{\mathbf{A}}(k) \in R(j, \varepsilon)) > 1 - \varepsilon. \tag{65}$$

Using the hypothesis (57) again, we can pick and fix a finite $k_1 \ge k_0$ such that for each $k \ge k_1$, there exists a coupling (see Dudley [14], Real Analysis and Probability, Corollary 11.6.4) such that the $\ell_1$ distance has

$$\mathbb{P}\big(d(\text{RANK}(\overline{\mathbf{A}}(k)), X) \ge \varepsilon\big) < \varepsilon. \tag{66}$$
Next, intending to use (61) with $\varepsilon/j$ used in the role of $\delta$, the upper bound is $1/(k\delta^2) = j^2/(k\varepsilon^2)$. To have this upper bound be at most $\varepsilon$, and also be able to apply (66), we take $k$ to be the maximum of $k_1$ and the ceiling of $j^2/\varepsilon^3$.
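The arithmetic behind this choice of $k$ is short enough to check mechanically. In the Python sketch below, `probes_needed` and `exceptional_bound` are hypothetical names introduced for illustration, and `k1` stands for the threshold $k_1$ from (66), which is not computable in general and is simply passed in.

```python
import math


def probes_needed(j, eps, k1=1):
    """Smallest k >= k1 with j^2 / (k eps^2) <= eps, i.e. k >= j^2 / eps^3."""
    return max(k1, math.ceil(j**2 / eps**3))


def exceptional_bound(j, eps, k):
    """The bound 1/(k delta^2) from (61), evaluated at delta = eps / j."""
    delta = eps / j
    return 1.0 / (k * delta**2)


if __name__ == "__main__":
    j, eps = 3, 0.5
    k = probes_needed(j, eps)
    print(k, exceptional_bound(j, eps, k))  # the bound should not exceed eps
```

Since $j$ depends only on $\varepsilon$ and the distribution of $X$, so does this $k$ (apart from $k_1$); in particular it does not depend on $N$.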
The value $k$ has been fixed in the previous paragraph. Now, the convergence in hypothesis (58) involves the topologically discrete space $\mathbb{Z}_+^k$, so the distributional convergence can be metrized by the total variation distance; hence there exists a finite $N_0(k)$ such that for all $N \ge N_0(k)$, the total variation distance between distributions is at most $\varepsilon$, and there exists a coupling with

$$\mathbb{P}(\mathbf{C}(N, k) \ne \mathbf{A}(k)) \le \varepsilon. \tag{67}$$
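Next, observe that $\overline{\mathbf{D}}(N, k) \in R(j, \varepsilon)$ and $G$ from (61) with $\delta = \varepsilon/j$ imply that each of the $j$ indices $i$ for $\overline{\mathbf{D}}(N, k) \in R(j, \varepsilon)$ has $|\overline{M}_i - \overline{D}_i| < \delta$, so the sum of those $j$ coordinates of $\overline{\mathbf{M}}$ is at least $S_j(\overline{\mathbf{D}}) - j\delta = S_j(\overline{\mathbf{D}}) - \varepsilon > 1 - 2\varepsilon$ (as observed in (62)), and the sum of the other (outside the chosen $j$) coordinates of $\overline{\mathbf{M}}$ is at most $2\varepsilon$, while the sum of the other (outside the chosen $j$) coordinates of $\overline{\mathbf{D}}$ is at most $\varepsilon$. Hence, the $\ell_1$ distance is at most $4\varepsilon$, accounted for by $j\delta = \varepsilon$, from the $|\overline{M}_i - \overline{D}_i|$ with $i$ among the chosen $j$, plus $2\varepsilon + \varepsilon$ using $|\overline{M}_i - \overline{D}_i| \le \overline{M}_i + \overline{D}_i$ on the other coordinates, outside the chosen $j$. The result is that $d(\overline{\mathbf{M}}, \overline{\mathbf{D}}) < 4\varepsilon$. Now $\overline{\mathbf{M}} = \text{RANK}(\overline{\mathbf{M}})$ by construction, but due to sampling noise, maybe $\overline{\mathbf{D}} \ne \text{RANK}(\overline{\mathbf{D}})$. However, since RANK is a contraction, we have $d(\overline{\mathbf{M}}, \text{RANK}(\overline{\mathbf{D}})) < 4\varepsilon$.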
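The contraction property of RANK used in the last step, $d(\text{RANK}(x), \text{RANK}(y)) \le d(x, y)$ in $\ell_1$, can be illustrated on finite vectors. The Python sketch below is a numerical illustration, not a proof; the names `rank` and `l1_distance` and the example vectors are assumptions made here, and finite lists are treated as sequences padded with zeros.

```python
def rank(x):
    """RANK on a finite vector: sort the coordinates, largest first."""
    return sorted(x, reverse=True)


def l1_distance(x, y):
    """The l1 distance between finite vectors, padding the shorter with zeros."""
    n = max(len(x), len(y))
    xs = list(x) + [0.0] * (n - len(x))
    ys = list(y) + [0.0] * (n - len(y))
    return sum(abs(a - b) for a, b in zip(xs, ys))


if __name__ == "__main__":
    m_bar = [0.5, 0.3, 0.2]  # already ranked, like the block proportions
    d_bar = [0.1, 0.6, 0.3]  # sample proportions, not ranked
    print(l1_distance(m_bar, d_bar))        # distance before sorting
    print(l1_distance(m_bar, rank(d_bar)))  # no larger after sorting
    # Distinct standard basis vectors are at l1 distance 2 from each other.
    print(l1_distance([1.0, 0.0], [0.0, 1.0]))
```

Because `m_bar` is already sorted, applying `rank` to both arguments only changes `d_bar`, which matches the way the inequality is used in the text.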
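Putting it all together, for any $N \ge N_0$, the union of the exceptional events from (61) ($\overline{\mathbf{M}}$ near $\overline{\mathbf{D}}$, coordinatewise, with $\mathbb{P}(G^c) \le \varepsilon$), from (66) ($\text{RANK}(\overline{\mathbf{A}})$ near $X$), and from (65) and (67) ($\text{RANK}(\overline{\mathbf{D}})$ equals $\text{RANK}(\overline{\mathbf{A}})$, in $R(j, \varepsilon)$) has probability at most $4\varepsilon$, and outside this exceptional event, $\overline{\mathbf{M}}$ is at most $4\varepsilon$ away from $\text{RANK}(\overline{\mathbf{D}}) = \text{RANK}(\overline{\mathbf{A}})$, which in turn is at most $\varepsilon$ away from $X$. In summary, there are couplings so that for all $N \ge N_0$, $\mathbb{P}(d(\overline{\mathbf{M}}, X) > 5\varepsilon) < 4\varepsilon$.

Given $k \ge 1$, take an ordered sample of size $k$, with replacement, from $[N]$, that is, $e_1, \ldots, e_k$ with all $N^k$ possible outcomes equally likely. Let $\sigma$ be $\pi$ relativized to $e_1, \ldots, e_k$, as defined at the start of Section 4. 10.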
where X has the Poisson-Dirichlet distribution, as in (55) and (56).
Proof. Take the processes $\mathbf{A}(k)$ of cycle lengths, in age order, as given by (51), for uniform random permutations in $\mathcal{S}_k$, to serve as the random elements in the hypotheses (57) and (58) of Lemma 3. This requires using the Poisson-Dirichlet distribution for $X$ in (57). Fix $k$. Then (68) holding for each $\tau \in \mathcal{S}_k$ implies that the distribution of $\sigma$ is close, in total variation distance, to the uniform distribution on $\mathcal{S}_k$. On the event, of probability $\frac{N-1}{N} \cdots \frac{N-(k-1)}{N} \to 1$, that the $k$-sample with replacement from the population of size $N$ has $k$ distinct elements, the counts $\mathbf{C}(N, k)$ from Lemma 3 agree exactly with the cycle lengths in $\sigma$. Hence hypothesis (68) implies the hypothesis (58).

6 Putting it All Together: The Proof of Theorem 1

We now have established all the ingredients needed for our proof of Theorem 1. First, the conclusion (4) of Theorem 1 is exactly the conclusion (69) from Lemma 4.$^{16}$ To prove Theorem 1, it only remains to establish that the random FSR model (3) satisfies the hypothesis (68) of Lemma 4.
Fix k for use in (68). The uniform choice of (f, e) ∈ S n,k determines π f and the random sample e 1 , . . . , e k -for convenience in Lemma 4 we labeled the set F n 2 with the integers 1,2,. . . , N . Let an arbitrary ε > 0 be given. Fix m = m(k, ε) as per Lemma 2, so that with high probability, a random schedule of length m over the alphabet of size k 2 is ε-good.