Direct and inverse results for popular differences in trees of positive dimension

We establish analogues for trees of results relating the density of a set $E \subset \mathbb{N}$, the density of its set of popular differences, and the structure of $E$. To obtain our results, we formalise a correspondence principle of Furstenberg and Weiss which relates combinatorial data on a tree to the dynamics of a Markov process. Our main tools are Kneser-type inverse theorems for sets of return times in measure-preserving systems. In the ergodic setting we use a recent result of the first author with Bj\"orklund and Shkredov and a stability-type extension (proved jointly with Shkredov); we also prove a new result for non-ergodic systems.


Introduction
In [FW03] Furstenberg and Weiss initiated the use of dynamical methods in the study of Ramsey-theoretic questions for trees. They proved a Szemerédi-type theorem using a multiple recurrence result for a class of Markov processes (a purely combinatorial proof was later given by Pach, Solymosi, and Tardos [PST12]). More precisely, they showed that finite replicas of the full binary tree can always be found in (infinite) trees of positive growth rate. It is then a natural question to quantify the abundance of finite configurations in a tree in relation to its size, as measured by its upper Minkowski and Hausdorff dimensions.
To begin, we review the analogous question in the integer setting. Specifically, we consider the abundance of configurations in a subset E ⊂ N. Recall that the upper density and upper Banach density of E are
$$\overline{d}(E) = \limsup_{N\to\infty} \frac{|E\cap\{1,\dots,N\}|}{N}, \qquad d^*(E) = \limsup_{N-M\to\infty} \frac{|E\cap\{M+1,\dots,N\}|}{N-M}.$$
The abundance of 2-term arithmetic progressions in E can be related to the density of E in the following way. Consider the sets of popular differences of E with respect to $\overline{d}$ and $d^*$ defined by
$$\Delta_0(E) = \{n \in \mathbb{N} : \overline{d}(E \cap (E-n)) > 0\}, \qquad \Delta_0^*(E) = \{n \in \mathbb{N} : d^*(E \cap (E-n)) > 0\}.$$
Furstenberg's correspondence principle [Fur77] states that there exists a measure-preserving system (X, B, ν, S) and A ∈ B with ν(A) = $\overline{d}(E)$ such that for all integers k ≥ 1 and 0 = n_1 and n_2, …, n_k ∈ N,
$$\overline{d}\big((E-n_1)\cap\dots\cap(E-n_k)\big) \ \ge\ \nu\big(S^{-n_1}A\cap\dots\cap S^{-n_k}A\big).$$
Taking k = 2, it follows that Δ_0(E) contains the set R of return times of A. Applying the mean ergodic theorem then gives
$$\underline{d}(\Delta_0(E)) \ \ge\ \underline{d}(R) \ \ge\ \lim_{N\to\infty}\frac{1}{\nu(A)\,N}\sum_{n=1}^{N}\nu(A\cap S^{-n}A) \ \ge\ \nu(A) = \overline{d}(E), \tag{1}$$
where the lower density $\underline{d}$ is defined for E ⊂ N by
$$\underline{d}(E) = \liminf_{N\to\infty} \frac{|E\cap\{1,\dots,N\}|}{N}.$$
If in the above the upper density is replaced by the upper Banach density, then ν can further be chosen to be ergodic [Fur81, Proposition 3.9] (see [BHK05, Proposition 3.1] for an explicit proof).
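As a quick sanity check on (1) (not part of the paper's argument), the equality case can be illustrated numerically for the periodic set E = mN, whose popular-difference set is again mN. The helper functions below are ours, and all "densities" are finite-window approximations on {1, …, N}:

```python
# Finite-window illustration of inequality (1): for the periodic set
# E = mN one has Delta_0(E) = mN, so d(Delta_0(E)) = d(E) = 1/m and
# (1) holds with equality.  This is a truncation-based sketch, not a
# proof; the function names are ours.

def window_density(S, N):
    """Proportion of {1, ..., N} lying in S."""
    return sum(1 for n in range(1, N + 1) if n in S) / N

def popular_differences(E, N):
    """Window approximation of Delta_0(E) = {n >= 1 : E meets E - n}."""
    return {n for n in range(1, N + 1)
            if any(x in E and x + n in E for x in range(1, N + 1))}

m, N = 3, 600
E = {n for n in range(1, 2 * N + 1) if n % m == 0}  # E = mN, truncated

delta0 = popular_differences(E, N)

# Delta_0(E) is exactly the multiples of m, with the same density 1/m.
assert delta0 == {n for n in range(1, N + 1) if n % m == 0}
assert abs(window_density(E, N) - 1 / m) < 1e-12
assert abs(window_density(delta0, N) - 1 / m) < 1e-12
```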
Following Furstenberg and Weiss [FW03], we formulate a correspondence principle for arbitrary finite configurations in a tree and use it to obtain analogues of the inequality (1). We then analyse the case of equality in (1) and its analogues for trees using inverse theorems for the set of return times. In the ergodic situation we use a result of Björklund, the first author, and Shkredov [BFS19] and a stability-type extension proved jointly with Shkredov in Appendix A, while in the general case we prove a slightly weaker statement (Theorem 5.1). Using these we obtain inverse theorems for inequality (1): a tree for which equality holds must contain arbitrarily long "arithmetic progressions" with a fixed common difference.
1.1. Main Results. To describe our results, we first summarise the necessary definitions (see Section 2 for precise formulations). For clarity of exposition, in this introduction we restrict our attention to the case r = 2 of our results and make corresponding simplifications to the notation.
Fix an integer q ≥ 2. In this paper a tree can be visualised as a directed graph T with a distinguished vertex (the root) having no incoming edges, such that each vertex has between 1 and q outgoing edges and each non-root vertex has exactly one incoming edge. (Technically, we work with the vertices of the graph, with the partial order induced by directed paths.) The "size" of T can be quantified by its upper Minkowski and Hausdorff dimensions dim_M T and dim T, which are defined via an identification of such trees with closed subsets of [0, 1].
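For orientation (the precise definitions are in Section 2), under the normalisation by log q suggested by the identification with subsets of [0, 1], the upper Minkowski dimension of a tree admits the usual growth-rate description in terms of the level sets T(n):

```latex
\overline{\dim}_M T \;=\; \limsup_{n \to \infty} \frac{\log_q |T(n)|}{n},
\qquad T(n) = \{\, v \in T : l(v) = n \,\},
```

so that the full q-ary tree has dimension 1 and a non-branching tree has dimension 0.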
1.1.1. Tree analogues of popular difference sets. A k-term arithmetic progression (k-AP) in E ⊂ N can be viewed as an affine map {0, …, k − 1} → E, i ↦ a + ni. We consider "affine" maps satisfying certain branching conditions from configurations C ("finite trees") to trees T. If there exists such a map with "common difference" n taking the root of the configuration to v ∈ T, we say that C appears at v with parameter n. Using extensions of upper density and upper Banach density to subsets of trees, we define sets of "generic parameters" G(C) and G*(C). We also introduce certain configurations F and D which are analogues of 2-APs, and their generic parameters can be interpreted as popular differences for trees. In particular, our first result is a version of (1):

Theorem A (= Theorem 4.1 and Theorem 4.2 for r = 2). For any tree T we have d(G(F)) ≥ dim_M T and d(G*(F)) ≥ dim T.

1.1.2. Inverse theorems for sets of return times. Given the direct result Theorem A, we are interested in characterising the trees for which equality holds (or almost holds). To illustrate the ideas, we consider here the situation when equality is (almost) achieved in (1), which is the analogous question for subsets of N. Observe that the density of the set of return times of A is then close to the measure of A. It is natural to expect that in this situation the dynamics of A under S is rigid in some way, and this is indeed the case. Let (X, B, ν, S) be a measure-preserving system, and let A be a measurable set with ν(A) > 0 and set of return times R. Using a theorem of Kneser we prove the following result:

Theorem B (= Theorem 5.1). If d(R) = ν(A) > 0, then there exists an integer m ≥ 1 such that up to ν-null sets

Question 1.1. Does the assumption d(R) = ν(A) suffice to prove the conclusion of Theorem B?
If ν is ergodic then Question 1.1 has an affirmative answer, and further there is an inverse result for cases of almost equality. The following theorem is an easy corollary of results of Björklund, the first author, and Shkredov in [BFS19]:

Theorem 1.2. If ν is ergodic and 0 < d(R) < (3/2)ν(A), then there exists an integer m ≥ 1 such that R = mN and X = ⋃_{i=0}^{m−1} S^{−i} ⋃_{j=0}^{∞} S^{−jm} A up to ν-null sets.

An example in [BFS19] shows that for every β > 1 there exists a non-ergodic measure-preserving system (X, B, ν, S) and A ∈ B of arbitrarily small measure such that d(R) ≤ βν(A) and there is no m ≥ 1 such that R = mN.
1.1.3. Inverse results for popular difference sets. As a corollary of Theorem B and Furstenberg's correspondence principle we immediately obtain an inverse-type result for (1) (Proposition 1.4). If we consider Δ*_0(E) and d*(E) in place of Δ_0(E) and $\overline{d}(E)$, we can apply Theorem 1.2 to obtain a further inverse result (Proposition 1.5).

1.1.4. Inverse results for G(F) and G*(F). Propositions 1.4 and 1.5 can be interpreted as saying that (almost) equality holds in (1) for a subset E ⊂ N only if E is "similar" to the periodic set mN. In the tree setting we prove analogous results.
For every m ≥ 1, define T_{mN} to be the tree in which v ∈ T_{mN} has q outgoing edges if the directed path from the root to v has length a multiple of m, and 1 outgoing edge otherwise. The inequalities in Theorem A are equalities for T_{mN} (see Subsection 2.0.1).
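To see where the value m^{−1} in Theorems C and D comes from, note that T_{mN} branches only at levels divisible by m, so its level sets grow by a factor of q once every m levels; under the standard log_q-normalised growth-rate description of the upper Minkowski dimension this gives:

```latex
|T_{m\mathbb{N}}(n)| = q^{\lceil n/m \rceil},
\qquad
\frac{\log_q |T_{m\mathbb{N}}(n)|}{n} = \frac{\lceil n/m \rceil}{n}
\;\xrightarrow[n \to \infty]{}\; \frac{1}{m},
```

and by self-similarity the Hausdorff dimension of T_{mN} takes the same value 1/m.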
For every k ≥ 1, define the configuration V_{m,k} to be the first k levels of T_{mN}. The following two theorems are analogues of Proposition 1.4 and Proposition 1.5 respectively.
Theorem C (= Theorem 6.1 for r = 2). Let T be a tree. Assume that d(G(F)) = dim_M T > 0. Then there exists an integer m ≥ 1 such that dim_M T = m^{−1}, and d(V_{m,k}) > 0 for every k ≥ 1.
Theorem D (= Theorem 6.2 for r = 2). Let T be a tree. Assume that 0 < d(G*(F)) < (3/2) dim T. Then there exists an integer m ≥ 1 such that dim T = m^{−1}, and d*(V_{m,k}) > 0 for every k ≥ 1.
Remark 1.6. We show in Subsection 2.0.2 that Theorem D cannot be improved. Indeed, for every ε > 0 there exists a tree T_ε satisfying the corresponding density condition up to ε such that the configuration V_{m,k} does not appear at all in T_ε for some large k.
Our final result is another partial analogue of Proposition 1.5.
Organisation of the paper. After describing the combinatorial and dynamical background (Section 2) and establishing Furstenberg–Weiss correspondence principles (Section 3), in Section 4 we prove lower bounds for the densities of popular differences for trees. We then use inverse theorems for sets of return times in measure-preserving systems (Section 5 and Appendix A) to prove inverse theorems for these lower bounds (Section 6).
Acknowledgments. We thank Michael Björklund and James Parkinson. The current paper is a sequel to joint work of A.F. and I.S. with them. A.F. is grateful to Itai Benjamini for inspiring conversations on the subject during his visit to the Weizmann Institute hosted by Omri Sarig. He thanks Omri and the Weizmann Institute for their hospitality. He also thanks Haotian Wu and Cecilia González Tokman for fruitful discussions. I.S. is grateful to SMRI and the School of Mathematics and Statistics at the University of Sydney for funding his visit and for their hospitality. We also thank the anonymous referee for a thorough reading of the paper, which led to many improvements.
Combinatorial setup. Fix a finite alphabet Λ with |Λ| = q, and let Λ* = ⋃_{n=0}^{∞} Λ^n be the set of finite words over Λ, where Λ^0 is the singleton comprising the empty word ∅. Consider the partial order on Λ* defined by v ⪯ w if w is the concatenation vu of v and some u ∈ Λ*. A tree is then a nonempty subset T ⊂ Λ* closed under predecessors and having no maximal elements with respect to ⪯. We refer to elements of T as vertices (using the natural graph-theoretic terminology), and write l(v) for the level (length) of a word v and T(n) = {v ∈ T : l(v) = n} for the n-th level of T. Every tree contains ∅ (the root), and for every v ∈ T there is a tree T_v = {w ∈ Λ* : vw ∈ T}.

Remark 2.1. Trees are combinatorial realisations of closed sets in Λ^N, a symbolic analogue of [0, 1]. Given a tree T, the set {(a_i)_{i≥0} ∈ Λ^N : (a_0, …, a_n) ∈ T for all n ∈ N} is closed in Λ^N (with the product of discrete topologies on Λ), and there is an inverse map sending a closed subset A ⊂ Λ^N to the tree {v ∈ Λ* : vw ∈ A for some w ∈ Λ^N}.
This motivates several definitions we give below.
To define the Hausdorff dimension of a tree, we first define the analogue of an irredundant open cover for trees. A section of a tree T is a finite subset Π ⊂ T such that |Π ∩ {w ∈ T : w ⪯ v}| = 1 for all but finitely many v ∈ T. Define also

Example 2.2. Given E ⊂ N and 2 ≤ r ≤ q, define the tree If E is a "periodic" set (such as mN) then T^r_E is "self-similar" and dim_M T^r_E = dim T^r_E.

Elements of Λ* correspond to cylinder sets of Λ^N. By the Carathéodory extension theorem, Borel probability measures on Λ^N are in bijection with functions τ : Λ* → [0, 1] such that τ(∅) = 1 and τ(v) = Σ_{a∈Λ} τ(va) for all v ∈ Λ*. We call such functions Markov trees, since the support By abuse of notation we denote the set of Markov trees by P(Λ^N), since it is homeomorphic to the space of Borel probability measures on Λ^N with the weak-* topology.
The dimension of a Markov tree [Fur70, Definition 7] is .
Given a subset V ⊂ T we define its upper density and its upper Banach density

Remark 2.3. These definitions specialise to their integer counterparts in the degenerate case q = 1, justifying the notation. The inequality d*(V) ≥ $\overline{d}$(V) also holds for our more general definition. To see this, observe that it is enough to construct Markov trees π_N supported on T such that (the last equality follows from reindexing the sum). But the above formula defines such a Markov tree on vertices w with l(w) ≤ N, and we can choose π_N to be any consistent extension to the remaining vertices (cf. the proof of Theorem 3.4).
Both equalities follow directly from the definitions. For example, for the second equality we observe that for any τ with |τ| ⊂ T and any v ∈ |τ| we have

We use the term configuration to refer to a nonempty finite subset C ⊂ Λ* closed under predecessors (a finite tree). Terminology and notation defined above for trees are used for configurations as appropriate without comment. A configuration C is nonbranching if |C(n)| ≤ 1 for all n ∈ N and branching otherwise.
By analogy with arithmetic progressions in N, we consider "affine embeddings" of C in a tree T. More precisely, for a vertex v ∈ T and n ∈ N we say
• if w is the longest initial subword common to w_1 and w_2, then ι(w) is the longest initial subword common to ι(w_1) and ι(w_2).
Equivalently, we say the configuration C appears at v (with parameter n). Observe that trivially every configuration appears at every vertex with parameter 0.
We will be concerned with the following configurations (see Figure 1):

For every configuration C and tree T we define the sets of generic parameters

Remark 2.5. Notice that F_r appears at v ∈ T^r_E with parameter n if and only if D_{r,2} appears at v with parameter n if and only if

2.0.1. The self-similarity of T^r_{mN} implies that dim T^r_{mN} = dim_M T^r_{mN}. Also, by Remark 2.5 it follows that
2.0.2. Sharpness of Theorem 6.2. Next, we modify the construction of T^r_{mN} to obtain for every ε > 0 a tree T_ε such that the configuration V_{r,m,k} does not appear in T_ε for some k ≥ 1. By the self-similarity of T_ε and Example 2.2, we have Since Δ*_0(E) = mN, observe that by Remark 2.5 we have

Dynamical setup. Given a Markov tree τ and v ∈ |τ|, define the Markov tree τ_v by τ_v(w) = τ(vw)/τ(v) for every w ∈ Λ*. Using this we define a Markov process p : M → P(M) on the space M = Λ × P(Λ^N) of labelled Markov trees, given by p(a, τ) = Σ_{i∈Λ} τ(i) δ_{(i, τ_i)}. Here a ∈ Λ can be interpreted as labelling the root of τ ∈ P(Λ^N) with information about the past under the dynamics τ → τ_a. Since p is continuous, it induces a Markov operator P on C(M) (a positive contraction satisfying P1 = 1) defined by the formula P f(a, τ) = Σ_{i∈Λ} τ(i) f(i, τ_i). The pair (M, p) is a CP-process.
Remark 2.6. For simplicity of notation, we will frequently denote a labelled Markov tree by its underlying Markov tree. Similarly, we write p_τ = p(a, τ), since the latter is independent of a. Further, a labelled Markov tree denoted by τ_a is always assumed to have label a.
By a distribution we mean a Borel probability measure. A distribution ν on M is stationary for (M, p) if ∫_M P f dν = ∫_M f dν for all continuous f. Note that if ν is stationary, then the above formula for P extends to a well-defined operator on L^p(M, ν) for 1 ≤ p ≤ ∞, and by Jensen's inequality this extension is a Markov operator.
For i ∈ Λ, define the set B_i = {(a, τ) ∈ M : a = i} of Markov trees labelled by i. The sets B_i are clopen and partition M. Define also for 2 ≤ r ≤ q the set A_r = {τ ∈ M : |{i : p_τ(B_i) > 0}| ≥ r} of Markov trees τ such that there are at least r vertices in |τ|(1). Observe that A_r is open and dense in M, and hence is not closed for r > 1.
Define on M the information function
$$H(a, \tau) = -\sum_{i\in\Lambda} p_\tau(B_i) \log_q p_\tau(B_i),$$
where by convention 0 log_q 0 = 0. The entropy of a stationary distribution ν is then
$$H(\nu) = \int_M H \, d\nu.$$

Proposition 2.7. If ν is a stationary distribution for (M, p), then
$$\nu(A_r) \ \ge\ \frac{H(\nu) - \log_q(r-1)}{1 - \log_q(r-1)}.$$

Proof. We have H(τ) ≤ 1 everywhere, while on the complement of A_r at most r − 1 of the probabilities p_τ(B_i) are positive, so there H(τ) ≤ log_q(r − 1). Using these bounds on H(τ) and the definition of A_r,
$$H(\nu) \ \le\ \nu(A_r) + (1 - \nu(A_r)) \log_q(r-1).$$
Rearranging gives the proposition.
Endomorphic extension. It will be necessary to work with an extension of the CP-process (M, p), following [FW03].
By abuse of notation we denote by p the natural lift of p : M → P(M) to a continuous function M̃ → P(M̃). Explicitly, p_τ̃ = Σ_{a∈Λ} τ_0(a) δ_{τ̃^a}, where (τ̃^a)_i = τ̃_{i+1} for i < 0 and (τ̃^a)_0 = (τ̃_0)_a. We also denote by P the corresponding Markov operator on C(M̃). The pair (M̃, p) is said to be an endomorphic extension of (M, p).
A stationary distribution ν on M induces a stationary distribution ν̃ on M̃, and by construction ν̃ is invariant under the right shift S : (τ_i)_{i≤0} ↦ (τ_{i−1})_{i≤0} [Ho14, Definition 6.3, Remark 6.4, and Lemma 6.8]. The Koopman operator of S therefore acts on H = L²(M̃, B̃, ν̃), where B̃ is the Borel σ-algebra on M̃. Since p_τ̃({ω̃}) > 0 implies S(ω̃) = τ̃, a straightforward calculation gives

Lemma 2.8. For any f, g ∈ H we have P(f · Sg) = g P f.

Integrating with respect to ν̃ shows that P and S are adjoint operators on H, and taking f = 1 gives the formula P S = I. It follows that S^n P^n is the orthogonal projection from H onto the closed subspace

Lemma 2.9. P and S restrict to mutually inverse operators

Proof. As ν̃ is S-invariant, it follows from Lemma 2.9 that

Composing the information function H with the projection M̃ → M onto the 0-th coordinate defines H on M̃, and hence the entropy of a stationary distribution for (M̃, p) is defined as for (M, p).

The Furstenberg-Weiss correspondence principle
In [FW03] Furstenberg and Weiss associated to a tree of positive upper Minkowski dimension a stationary distribution for the CP-process (M, p), and showed that the appearance of the configurations D_{2,k} with parameter n could be deduced from the positivity of quantities defined on the dynamical system. In this section we extend their construction to arbitrary configurations, and prove an analogous correspondence principle, based on [Fur70], for trees of positive Hausdorff dimension.
3.1. Construction of configuration-detecting functions. Given a configuration C and an integer n ≥ 1, we say that a function f : M → [0, 1] detects C_n if it is positive at τ if and only if C appears at the root of |τ| with parameter n.
Remark 3.1. Alternatively we could sum over all injections γ : C(1) → Λ and define ϕ_{C_n} by We also have 0 ≤ ϕ_{C_n} ≤ 1. Indeed, since ϕ_{{∅},n} = 1 and P is positive

Starting instead with φ_{D_{r,1},n} = 1_{A_r} and φ_{C_n} = 1 for nonbranching configurations C, we can adapt the above recursion to construct an alternative family of configuration-detecting functions φ_{C_n} ≤ ϕ_{C_n} more suitable for computations. Let C(1)′ = {v ∈ C(1) : C_v is branching}. We define φ_{C_n} recursively by the formula Note that ϕ_{D_{r,1},n} ≥ 1_{A_r} = φ_{D_{r,1},n}. Similarly we have 0 ≤ φ_{C_n} ≤ 1.
As the B_i are clopen and P takes continuous functions to continuous functions, the ϕ_{C_n} are continuous. However, the φ_{C_n} are in general not continuous, since A_r is not clopen for r > 1.
If C is a configuration such that the configurations C_v are all "isomorphic" for v ∈ C(1), the above recursion can be simplified by omitting the sum over bijections β. For integers 2 ≤ r ≤ q and n ≥ 1, define (nonlinear) operators R_{r,n} on L^∞(M) by R_{r,n} f =

The following lemma is used in the proofs of the correspondence principles to account for the lack of continuity of φ_{C_n}. Since φ^δ_{C_n} is monotone in δ by the positivity of P, the monotone function α : δ ↦ ∫_M φ^δ_{C_n} dν has countably many discontinuities, so we can choose a sequence δ_j → 0 such that α is continuous at δ_j for all j.
We claim. By Urysohn's lemma there are continuous functions h_r such that 1_{A^δ_r} ≤ h_r ≤ 1_{A^{δ′}_r} for δ′ < δ. Defining h_{C_n} to be the function obtained by repeating the construction of φ_{C_n} with h_r in place of 1_{A_r}, it follows that φ^δ_{C_n} ≤ h_{C_n} ≤ φ^{δ′}_{C_n}. Continuity of α at δ implies lim sup_{k→∞} ∫_M φ^δ_{C_n} dν_k ≤ α(δ), and a similar argument with δ′ < δ proves the claim. Hence

3.2. Correspondence principle for upper density.
Theorem 3.4. For every tree T with dim_M T > 0, the CP-process (M, p) has a stationary distribution µ such that H(µ) = dim_M T,
(2) µ(A_r) ≥ (dim_M T − log_q(r − 1))/(1 − log_q(r − 1)),
and for every configuration C and every integer n ≥ 1
(3)

Proof. Let (L_k)_{k≥1} be an increasing sequence such that lim_{k→∞} log_q |T(L_k)| / L_k = dim_M T. Fix an arbitrary label a ∈ Λ, and for each k ≥ 1 let π_k be any Markov tree labelled by a such that π_k(v) = |T(L_k)|^{−1} for all v ∈ T(L_k) (note that this condition determines π_k on vertices of level at most L_k). Then any weak-* limit of the distributions is stationary, and we choose µ to be such a limit.
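Concretely, the compatibility condition τ(v) = Σ_{a∈Λ} τ(va) propagates the uniform weights on T(L_k) down to the root: for every v of level l(v) ≤ L_k the Markov tree π_k necessarily satisfies

```latex
\pi_k(v) \;=\; \frac{\bigl|\{\, w \in T(L_k) : v \preceq w \,\}\bigr|}{|T(L_k)|},
```

the proportion of level-L_k vertices of T descending from v; in particular π_k(∅) = 1.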
Since H(x) is continuous and π_k(v) = Σ_{a∈Λ} π_k(va), Recall that for every v ∈ |π_k| we have the bounds where the third equality follows from the bounds Proposition 2.7 immediately gives the inequality (2).
To prove the inequality (3), applying a change of summation variable and using the definitions of π_k and φ_{C_n} gives The conclusion follows from Lemma 3.3.

3.3. Correspondence principle for upper Banach density.
Since η is ergodic¹, the mean ergodic theorem for contractions [EFHN15, Theorem 8.6] implies By diagonalisation there exist an increasing sequence (N_k)_{k≥1} and τ ∈ D(θ) such that, for all f in a countable set of continuous functions, the limit (8) holds. Taking this set to be dense in C(M) under the uniform norm, the limit (8) holds for all continuous functions. Letting v_k ∈ |θ| be a sequence of vertices such that θ_{v_k} → τ, and passing to a subsequence of (v_k) if necessary, it follows that the sequence of measures η_k defined by and the inequality (7) follows from Lemma 3.3.
¹ Ergodicity here means being an extremal point of the compact convex set of all stationary distributions.
Remark 3.7. Composing the projection (τ_i)_{i≤0} ↦ τ_0 with a C_n-detecting function gives a map M̃ → [0, 1] which is positive at (τ_i)_{i≤0} if and only if C appears at the root of |τ_0| with parameter n.
The recursive constructions of configuration-detecting functions in Subsection 3.1 can be used to construct their lifts using the abuses of notation Observe that the inequalities (2), (3), (6), and (7) remain valid when the distributions µ and η and the configuration-detecting functions φ_{C_n} are replaced with their lifts on M̃. In the remainder of the paper we work only with (M̃, p) and use Theorems 3.4 and 3.5 for the endomorphic extension without comment.

Proof of direct theorems
Using the correspondence principles of Section 3, we bound from below the densities of the sets of popular differences for trees. We first prove such a result for the generic parameters of the configuration F_r, since the proof is relatively simple but already contains the main ideas.
Theorem 4.1. Let T be a tree. For 2 ≤ r ≤ q we have

Proof. Since P and S are adjoint, Theorem 3.4 gives. By the mean ergodic theorem, and the theorem follows from inequality (2) of Theorem 3.4. Using Theorem 3.5 in place of Theorem 3.4 in the above argument, we obtain the second inequality after taking ε → 0.

Theorem 4.1 is also immediate from the corresponding result for D_{r,2}, which we prove now.
Theorem 4.2. Let T be a tree. For 2 ≤ r ≤ q we have

Proof. We start with the proof of the first inequality. The idea is to show that G(D_{r,2}) essentially contains the set of return times of A_r, the density of which we can bound from below by the mean ergodic theorem. First observe that by Proposition 2.10 Since 1_{A_r} ∈ H^∞, by Lemma 2.9 we have P^{n−1} 1_{A_r} = S P^n 1_{A_r}. Then by Lemma 2.8 and orthogonality
ρ}, and observe that it is well-defined up to a µ-null set. Since 0 ≤ ϕ_{D_{r,1},n} ≤ 1_{A_r} ≤ 1 and 0 ≤ ρ 1_{Z_ρ} ≤ ϕ_{D_{r,1},n} ≤ 1, the positivity of both P and conditional expectation imply By Jensen's inequality and the adjointness of P and S, Combining the above with the correspondence principle Theorem 3.4, it follows that G(D_{r,2}) contains a cofinite subset of where the last inequality follows from the mean ergodic theorem and Hence Z ⊃ {τ̃ ∈ M̃ : ϕ_{D_{r,1},n}(τ̃) > 0} = A_r up to a µ-null set, so d(G(D_{r,2})) ≥ µ(A_r). The theorem then follows from inequality (2) of Theorem 3.4.
Using Theorem 3.5 in place of Theorem 3.4 in the above proofs, we obtain the second inequality after taking ε → 0.

Inverse theorems for return times
Let (X, B, ν, S) be a measure-preserving system, and let A be a measurable set with ν(A) > 0. If R = {n ∈ N : ν(A ∩ S^{−n}A) > 0} is the set of return times of A, then by the mean ergodic theorem d(R) ≥ ν(A).
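For completeness, here is the standard derivation of this bound: since ν(A ∩ S^{−n}A) ≤ ν(A)·1_R(n), the mean ergodic theorem yields

```latex
\underline{d}(R)
\;\ge\; \liminf_{N\to\infty} \frac{1}{N} \sum_{n=1}^{N} \frac{\nu(A \cap S^{-n}A)}{\nu(A)}
\;=\; \frac{1}{\nu(A)} \int_X \mathbb{E}[\mathbf{1}_A \mid \mathcal{I}]^2 \, d\nu
\;\ge\; \frac{\nu(A)^2}{\nu(A)} \;=\; \nu(A),
```

where I denotes the σ-algebra of S-invariant sets and the last inequality is Cauchy–Schwarz (or Jensen), since ∫ E[1_A | I] dν = ν(A).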
Theorem 5.1. If d(R) = ν(A) > 0, then there exists an integer m ≥ 1 such that up to ν-null sets

Proof. Let (N_k)_{k≥1} be an increasing sequence such that Rearranging and using the assumption d(R) = ν(A), it follows that Hence d^{(N_k)}(R_γ + R_γ) exists and equals d(R). Since (N_k)_{k≥1} was arbitrary, the conclusion follows.
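For the reader's convenience we recall Kneser's theorem in a common informal formulation (the precise version used below is [Bil97, Theorem 1.1]): if A, B ⊂ N satisfy d(A + B) < d(A) + d(B) (lower asymptotic densities), then there exists an integer g ≥ 1 such that A + B differs from a union of residue classes modulo g by a finite set, and

```latex
\underline{d}(A + B) \;\ge\; \underline{d}(A) + \underline{d}(B) - \frac{1}{g}.
```

Applied to R_γ + R_γ, the residue classes modulo g = m produce the set K appearing below.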
For 0 < γ < 1/2, Lemma 5.3 and Kneser's theorem [Kne53] (see also [Bil97, Theorem 1.1]) therefore imply the existence of m ≥ 1 and K ⊂ {0, …, m − 1} such that
• R_γ ⊂ K + mN,
• |K + K| = 2|K| − 1, where the operation on the left-hand side is in Z/mZ, and
Further, for all n ∈ R there exists γ > 0 small enough such that n + R_γ ⊂ R by Lemma 5.2.

Proof. By [BFS19, Section 1.5] all the theorems in [BFS19] hold for ergodic N-actions, so [BFS19, Theorem 1.3] gives the existence of m ≥ 1 such that R = mN. Therefore, the sets are mutually disjoint up to ν-null sets, and ergodicity implies that they partition X.

Inverse theorems for trees
In this section we prove inverse results for Theorems 4.1 and 4.2 (Theorems 6.1, 6.2, and 6.4) using the results of the previous section.

Theorem 6.1. If T is a tree and 2 ≤ r ≤ q with
$$d(G(F_r)) = \frac{\dim_M T - \log_q(r-1)}{1 - \log_q(r-1)} > 0,$$
then dim_M T = m^{−1}(1 − log_q(r − 1)) + log_q(r − 1) for some positive integer m. Moreover, d(V_{r,m,mk+1}) > 0 for every k ≥ 1.
Proof. Fix ε > 0 small enough, and let R_ε = {n ∈ N : η_ε(A_r ∩ S^{−n}A_r) > 0}. In the case of (11), from the proof of Theorem 4.1 we have In the case of (12), we invoke the proof of Theorem 4.2. Recall that there exist a measurable set Z such that A_r ⊂ Z modulo η_ε-null sets and an increasing chain of measurable sets (Z_ρ)_{ρ>0} with ⋃_{ρ>0} Z_ρ = Z such that for every δ > 0 we have Hence for ε and ρ small enough the assumptions of Theorem A.3 are satisfied, so there exists m ≥ 1 such that R(Z_ρ) = R_δ(Z_ρ) = mN, where R(Z_ρ) = {n ∈ N : η_ε(Z_ρ ∩ S^{−n}Z_ρ) > 0}. Since this is true for all ρ > 0 small enough and R_ε ⊂ ⋃_{ρ>0} R(Z_ρ), we conclude that for ε small enough there exists m ≥ 1 such that R_ε ⊂ mN.
This immediately implies that M = ⋃_{i=0}^{m−1} S^{−i} ⋃_{j=0}^{∞} S^{−mj} A_r up to η_ε-null sets. In both cases, for ε small the above inequalities force
$$\frac{\dim T - \log_q(r-1)}{1 - \log_q(r-1)} = m^{-1},$$
and hence, where ε′ → 0 as ε → 0. We also have
$$\dim T \ \ge\ \eta_\varepsilon(A_r) + (1 - \eta_\varepsilon(A_r)) \log_q(r-1) \ \ge\ H(\eta_\varepsilon) \ \ge\ \dim T - \varepsilon,$$
and hence the pair of inequalities m divides i and A^c_r otherwise. Then the m-periodicity of A_r and A^c_r = ⋃_{i=1}^{m−1} S^{−i}A_r under S^{−1} gives η_ε-almost-everywhere equalities Define for δ > 0 the set It follows from (13) and inequalities (14) and (15) that by choosing ε small enough we can guarantee that η_ε(A^δ) > 0. We will show the existence of δ such that the configuration V_{r,m,mk+1} appears. If H(τ) > log_q(q − 1) then τ ∈ A_q, and if H(τ) > log_q(r − 2) then τ ∈ A_{r−1}. To prove the appearance of V_{r,m,mk+1} at the root of |τ_0| it therefore suffices to give sufficiently large lower bounds on H(τ_v) for l(v) ≤ mk.

Lemma 6.3. For every δ_1, δ_2 > 0 there exists δ > 0 such that for 1 ≤ j ≤ mk + 1

Proof. We prove both statements simultaneously by induction on j. For j = 1 we have H(τ) ≥ 1 − δ for all τ ∈ A^δ by (16), so any δ < δ_2 suffices. Further, observe that H is a continuous function attaining its maximum at the τ such that p_τ(B_i) = q^{−1} for all i ∈ Λ. Hence, given δ_1 > 0, the set {τ_0(v) : τ ∈ A^δ, v ∈ |τ_0|(1)} is contained in an interval of length < δ_1 (containing q^{−1}) for δ small enough.
Assuming the lemma is true for j ≤ i < mk + 1, we prove it for j = i + 1. We first consider (b). For w ∈ |τ_0|(i) with τ ∈ A^δ the inequality (16) gives and rearranging gives By statement (a) of the induction hypothesis sup_{τ∈A^δ, w∈|τ_0|(i)} δ/τ_0(w) → 0 as δ → 0, so by taking δ small enough statement (b) is satisfied for j = i + 1. Combining statement (a) for j = i and statement (b) for j = i + 1 with the same argument as in the base case proves statement (a), noting that if m does not divide j then we consider maxima of H on A^c_r.
Proof. Given ε > 0 and A, B ∈ B of positive measure, the set of ε-transfer times from for every δ > 0, then there exists m ≥ 1 such that R_δ = mN for all sufficiently small δ.
Finally we show R_δ = mN for small δ. Indeed, since d(R_γ) > 1/(2m), by combining equation (19) with the third implication of Kneser's theorem R_γ + R_γ ⊂ K + K + mN and the fact that |K| = 1, for every l ∈ N there exists n ∈ N such that mn, m(n + l) ∈ R_γ. Therefore
$$\nu(A \cap S^{-ml}A) = \nu(S^{-mn}A \cap S^{-m(n+l)}A) \ \ge\ \nu\big((A \cap S^{-mn}A) \cap (A \cap S^{-m(n+l)}A)\big) \ \ge\ \nu(A \cap S^{-mn}A) + \nu(A \cap S^{-m(n+l)}A) - \nu(A) \ \ge\ (1 - 2\gamma)\nu(A) > \delta\nu(A)^2$$
for δ < (1 − 2γ)/ν(A), so for sufficiently small δ > 0 we have ml ∈ R_δ for all l ∈ N.

Discussion. The set of transfer times R_{A,B} has strong parallels with the difference set A − B = {a − b : a ∈ A, b ∈ B}, A, B ⊂ Z/rZ, which is one of the main objects of additive combinatorics. For example, the lower bound for d(R^ε_{A,B}) in Lemma A.2 corresponds to the simple fact that |A − B| ≥ max{|A|, |B|}. It is easy to see that the bound is tight and is attained when B − B belongs to the centraliser of A (or vice versa). It implies that A and B have some periodic structure, and it is analogous to our conclusions in Theorems 5.4 and A.3 on the structure of the dynamical system. On the other hand, if A = {0, 1} ⊆ Z/rZ for large r, then A − A = {0, 1, −1}, and hence η in Theorem A.3 must be less than 1/2. Moreover, the sets R^δ_m from Lemma A.2 which are used in the proof of Theorem A.3 can be thought of as a dynamical version of the well-known combinatorial e-transform; see, e.g., [TV06, Section 5.1]. Although it is not obvious how to define higher sumsets in the dynamical context, an analogue of the Plünnecke–Ruzsa triangle inequality for dynamical systems would be a first step towards such a theory.
A function f : M → [0, 1] detects C_n if it is positive at τ if and only if C appears at the root of |τ| with parameter n. In preparation for proving correspondence principles we construct recursively several families of configuration-detecting functions.

We first construct a family of configuration-detecting functions ϕ_{C_n}. For the simplest configuration {∅}, we can take ϕ_{{∅},n} = 1 for all n ≥ 1. Given I ⊂ Λ such that |I| = |C(1)| and a bijection β : I → C(1), the positivity of ∏_{i∈I} P(1_{B_i} P^{n−1} ϕ_{C_{β(i)},n}) at τ ∈ M is equivalent to the appearance of C at the root of |τ| with parameter n such that β(i) ∈ C(1) is mapped to iv ∈ |τ| for some v ∈ Λ^{n−1}. Summing over all choices of I and β, we define ϕ_{C_n} by the recursive formula
$$\varphi_{C_n} = \sum_{\substack{I \subset \Lambda \\ |I| = |C(1)|}} \ \sum_{\beta : I \xrightarrow{\sim} C(1)} \ \prod_{i \in I} P\big(1_{B_i} P^{n-1} \varphi_{C_{\beta(i)},n}\big).$$
If f detects (C_v)_n for (all) v ∈ C(1) and |C(1)| = r, then R_{r,n} f detects C_n. Denote by φ̄_{C_n} the C_n-detecting function obtained by applying a sequence of the above operators to the appropriate 1_{A_r}, and observe that φ_{C_n} = c φ̄_{C_n} for some integer c ≥ 1.

Example 3.2. For the configuration F_r, we have |C(1)| = r and |C(1)′| = 1. There is always a unique bijection I → C(1)′, so linearity of P gives
$$\varphi_{F_r,n} = 1_{A_r} \sum_{i\in\Lambda} P\big(1_{B_i} P^{n-1} 1_{A_r}\big) = 1_{A_r} P^n 1_{A_r},$$
since 1 = Σ_{i∈Λ} 1_{B_i}. If C(1) = C(1)′, the factor 1_{A_{|C(1)|}} is redundant in the definition of φ_{C_n}, as the function in the sum is already supported on a subset of A_{|C(1)|}. For example,

Lemma 3.3. If (ν_k)_{k≥1} is a sequence of distributions on M converging to ν in the weak-* topology, then for every configuration C and integer n ≥ 1
$$\limsup_{k\to\infty} \int_M \varphi_{C_n}\, d\nu_k \ \le\ \int_M \varphi_{C_n}\, d\nu.$$

Proof. Define for δ ∈ [0, 1] open sets A^δ_r = {τ ∈ M : |{i : p_τ(B_i) > δ}| ≥ r} ⊂ A_r, and let φ^δ_{C_n} be the function obtained by replacing 1_{A_r} with 1_{A^δ_r} in the recursive construction of φ_{C_n}. Observe that δ ≤ δ′ implies φ^{δ′}_{C_n} ≤ φ^δ_{C_n}, and lim_{δ→0} ∫_M φ^δ_{C_n} dν = ∫_M φ_{C_n} dν by the monotone convergence theorem.
Since in addition R_γ ⊂ R and d(R) = d(R_γ), it follows that n ∈ mN. Then the m sets S^{−i}A, 0 ≤ i ≤ m − 1, are disjoint (up to ν-null sets) and each of measure m^{−1}.

Theorem 5.4. If (X, B, ν, S) is ergodic and 0 < d(R) < (3/2)ν(A), then there exists an integer m ≥ 1 such that R = mN and X = ⋃_{i=0}^{m−1} S^{−i} ⋃_{j=0}^{∞} S^{−jm} A up to ν-null sets.

(3/2) η_ε(A_r) for small enough ε. By Theorem 5.4 there is a positive integer m such that R_ε = mN and M = ⋃_{i=0}^{m−1} S^{−i} ⋃_{j=0}^{∞} S^{−mj} A_r up to η_ε-null sets.

Question A.4. Assume that (X, B, ν, S) is an invertible ergodic system and that d(R_{A,B}), d(R_{A,C}), d(R_{B,C}) exist for A, B, C ∈ B. Is it true that
$$\nu(C)\, d(R_{A,B}) \ \le\ d(R_{A,C})\, d(R_{B,C})\,?$$