Seminorm control for ergodic averages with commuting transformations along pairwise dependent polynomials

Abstract. We examine multiple ergodic averages of commuting transformations with polynomial iterates in which the polynomials may be pairwise dependent. In particular, we show that such averages are controlled by the Gowers–Host–Kra seminorms whenever the system satisfies some mild ergodicity assumptions. Combining this result with the general criteria for joint ergodicity established in our earlier work, we determine a necessary and sufficient condition under which such averages are jointly ergodic, in the sense that they converge in the mean to the product of integrals, or weakly jointly ergodic, in that they converge to the product of conditional expectations. As a corollary, we deduce a special case of a conjecture by Donoso, Koutsogiannis, and Sun in a stronger form.

Here and throughout the paper, we consider the multiple ergodic averages
$$\frac{1}{N}\sum_{n=1}^{N} T_1^{p_1(n)}f_1\cdots T_\ell^{p_\ell(n)}f_\ell, \tag{1}$$
where $(X,\mathcal{X},\mu,T_1,\ldots,T_\ell)$ is a system, i.e. $T_1,\ldots,T_\ell$ are invertible commuting measure-preserving transformations acting on a Lebesgue probability space $(X,\mathcal{X},\mu)$. The polynomials $p_1,\ldots,p_\ell\in\mathbb{Z}[n]$ need not be distinct but are always assumed to have zero constant terms, and $f_1,\ldots,f_\ell\in L^\infty(\mu)$. The motivation for studying the limiting behaviour of (1) comes from the proof of the multidimensional polynomial Szemerédi theorem [3], in which the averages (1) are the central object of investigation.
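Walsh [19] proved that the averages (1) converge in $L^2(\mu)$. However, little is known about the nature of the limit except in several special cases. In this paper, we examine the following question.

Question 1. When are the polynomials $p_1,\ldots,p_\ell\in\mathbb{Z}[n]$

(i) jointly ergodic for the system $(X,\mathcal{X},\mu,T_1,\ldots,T_\ell)$, in the sense that
$$\lim_{N\to\infty}\frac{1}{N}\sum_{n=1}^{N} T_1^{p_1(n)}f_1\cdots T_\ell^{p_\ell(n)}f_\ell=\int f_1\,d\mu\cdots\int f_\ell\,d\mu \quad\text{in } L^2(\mu) \tag{2}$$
for all $f_1,\ldots,f_\ell\in L^\infty(\mu)$?

(ii) weakly jointly ergodic for the system, in the sense that
$$\lim_{N\to\infty}\frac{1}{N}\sum_{n=1}^{N} T_1^{p_1(n)}f_1\cdots T_\ell^{p_\ell(n)}f_\ell=\mathbb{E}(f_1|\mathcal{I}(T_1))\cdots\mathbb{E}(f_\ell|\mathcal{I}(T_\ell)) \quad\text{in } L^2(\mu) \tag{3}$$
for all $f_1,\ldots,f_\ell\in L^\infty(\mu)$?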
The first step in deriving the identities (2) and (3) is usually to control the $L^2(\mu)$ limit of (1) by one of the Gowers–Host–Kra seminorms constructed in [13]. This leads to the following question.
Question 2. When are the polynomials $p_1,\ldots,p_\ell\in\mathbb{Z}[n]$ good for seminorm control for the system $(X,\mathcal{X},\mu,T_1,\ldots,T_\ell)$? That is, when does there exist $s\in\mathbb{N}$ such that
$$\lim_{N\to\infty}\frac{1}{N}\sum_{n=1}^{N} T_1^{p_1(n)}f_1\cdots T_\ell^{p_\ell(n)}f_\ell=0 \quad\text{in } L^2(\mu) \tag{4}$$
holds for all functions $f_1,\ldots,f_\ell\in L^\infty(\mu)$ whenever $|||f_j|||_{s,T_j}=0$ for some $j\in\{1,\ldots,\ell\}$?

Question 1(i) was originally posed by Bergelson. It was motivated by a result of Berend and Bergelson [1] covering the case of linear polynomials. Donoso, Koutsogiannis, and Sun investigated it thoroughly in [7], and so did the three of them together with Ferré-Moragues in subsequent work [6]. In these papers they identified a set of sufficient (but not necessary) conditions under which Questions 1(i) and 2 can be answered affirmatively for general polynomials. Their conditions turned out to also be necessary when all the polynomials are equal.
In [11], we addressed Questions 1 and 2 under the assumption that the polynomials $p_1,\ldots,p_\ell$ are pairwise independent, with no extra assumption on the system. Specifically, we showed that for every family of pairwise independent polynomials $p_1,\ldots,p_\ell\in\mathbb{Z}[n]$, there exists $s\in\mathbb{N}$ such that the identity (4) holds for all systems and all functions in $L^\infty(\mu)$ under the stated seminorm assumptions. We then gave a necessary and sufficient spectral condition under which the identities (2) and (3) hold.
In this paper, we drop the assumption of pairwise independence. We are thus interested in answering Questions 1 and 2 for averages (1) in which some of the polynomials $p_1,\ldots,p_\ell$ may be pairwise dependent or even identical. An example of this is the average
$$\frac{1}{N}\sum_{n=1}^{N} T_1^{n^2}f_1\cdot T_2^{n^2}f_2\cdot T_3^{n^2+n}f_3. \tag{5}$$
The polynomials $n^2, n^2, n^2+n$ are pairwise dependent. As a result, unlike in [11], we cannot establish the seminorm control described in Question 2 for all systems. Instead, we need to identify a special property of the system that makes seminorm control possible. The needed property turns out to be the following.
Definition (Good and very good ergodicity property). Let ℓ ∈ N and p 1 , . . . , p ℓ ∈ Z[n]. We say that the system (X, X , µ, T 1 , . . . , T ℓ ) has the good ergodicity property for the polynomials p 1 , . . . , p ℓ is ergodic for all the aforementioned indices i, j and values c i , c j , then we say that the system has the very good ergodicity property for the polynomials p 1 , . . . , p ℓ .
Remark. Since we work under the standing assumption that the polynomials have zero constant terms, the equality $p_i/c_i=p_j/c_j$ holds for some nonzero $c_i,c_j\in\mathbb{Z}$ precisely when the polynomials $p_i,p_j$ are linearly dependent.
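The remark can be made concrete with a short computation. The sketch below is a hypothetical helper, not taken from the paper. It represents a polynomial with zero constant term by its coefficient list $(a_1,a_2,\ldots)$ of $a_1n+a_2n^2+\cdots$. For two nonzero polynomials it returns coprime integers $c_i,c_j$ with $p_i/c_i=p_j/c_j$ when they are linearly dependent, and `None` otherwise. Normalising $c_i>0$ and $\gcd(c_i,c_j)=1$ is our own choice. Any common nonzero multiple of the pair satisfies the same equality.

```python
from math import gcd
from typing import List, Optional, Sequence, Tuple


def _pad(p: Sequence[int], length: int) -> List[int]:
    """Pad a coefficient list (a_1, a_2, ...) with zeros up to the given length."""
    return list(p) + [0] * (length - len(p))


def dependence_constants(p: Sequence[int], q: Sequence[int]) -> Optional[Tuple[int, int]]:
    """Return coprime (c_p, c_q) with c_p > 0 and p/c_p == q/c_q, or None if independent.

    Hypothetical helper: p and q are coefficient lists of nonzero polynomials
    a_1 n + a_2 n^2 + ... with zero constant term.
    """
    length = max(len(p), len(q))
    p, q = _pad(p, length), _pad(q, length)
    if not any(p) or not any(q):
        raise ValueError("the polynomials must be nonzero")
    k = next(i for i, a in enumerate(p) if a != 0)  # first nonzero coefficient of p
    if q[k] == 0:
        return None
    g = gcd(p[k], q[k])
    c_p, c_q = p[k] // g, q[k] // g
    if c_p < 0:  # normalise the sign so that c_p > 0
        c_p, c_q = -c_p, -c_q
    # p/c_p == q/c_q  <=>  c_q * p == c_p * q coefficientwise
    if all(c_q * a == c_p * b for a, b in zip(p, q)):
        return c_p, c_q
    return None


# n^2 and n^2 + n are independent; 2n^2 and n^2 are dependent with (c_i, c_j) = (2, 1).
print(dependence_constants([0, 1], [1, 1]))  # None
print(dependence_constants([0, 2], [0, 1]))  # (2, 1)
```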
For instance, the system $(X,\mathcal{X},\mu,T_1,T_2,T_3)$ has the good ergodicity property for the families $n^2, n^2, n^2+n$ or $2n^2, 2n^2, n^2+n$ if and only if the only functions invariant under $T_1T_2^{-1}$ are those invariant under both $T_1$ and $T_2$. It has the very good ergodicity property if $T_1T_2^{-1}$ is ergodic, i.e. if only constant functions are invariant under $T_1T_2^{-1}$.

We first address Question 2 for systems with the good ergodicity property.

Theorem 1.1 (Seminorm control). Let $D,\ell\in\mathbb{N}$ and let $p_1,\ldots,p_\ell\in\mathbb{Z}[n]$ be polynomials of degrees at most $D$. Suppose that the system $(X,\mathcal{X},\mu,T_1,\ldots,T_\ell)$ has the good ergodicity property for $p_1,\ldots,p_\ell$. Then there exists $s\in\mathbb{N}$, depending only on $D$ and $\ell$, such that for all $f_1,\ldots,f_\ell\in L^\infty(\mu)$, we have
$$\lim_{N\to\infty}\frac{1}{N}\sum_{n=1}^{N} T_1^{p_1(n)}f_1\cdots T_\ell^{p_\ell(n)}f_\ell=0 \quad\text{in } L^2(\mu)$$
whenever $|||f_j|||_{s,T_j}=0$ for some $j\in\{1,\ldots,\ell\}$.
If additionally the transformations $T_1,\ldots,T_\ell$ are ergodic, we get the following result.

Corollary 1.3 (Joint ergodicity). The polynomials $p_1,\ldots,p_\ell\in\mathbb{Z}[n]$ are jointly ergodic for the system $(X,\mathcal{X},\mu,T_1,\ldots,T_\ell)$ if and only if the following two conditions hold:
(i) all the transformations $T_1,\ldots,T_\ell$ are ergodic and the system has the very good ergodicity property for the polynomials;
(ii) for eigenvalues $\alpha_j\in\mathrm{Spec}(T_j)$, $j\in\{1,\ldots,\ell\}$, we have
$$\lim_{N\to\infty}\frac{1}{N}\sum_{n=1}^{N} e(\alpha_1p_1(n)+\cdots+\alpha_\ell p_\ell(n))=0 \tag{7}$$
unless $\alpha_1=\cdots=\alpha_\ell=0$.

Theorem 1.1 and Corollary 1.3 extend Theorems 2.8 and 2.14 in [11], which cover the case of pairwise independent polynomials.

Theorem 1.2 and Corollary 1.3 can be placed in the context of the following conjecture by Donoso, Koutsogiannis, and Sun, which was motivated by earlier results of Berend and Bergelson [1]. The version presented below is a special case of [7, Conjecture 1.5]. In its statement, we say that a sequence of commuting transformations $(T_n)_{n\in\mathbb{N}}$ on a probability space $(X,\mathcal{X},\mu)$ is ergodic for $\mu$ if
$$\lim_{N\to\infty}\frac{1}{N}\sum_{n=1}^{N} T_nf=\int f\,d\mu \quad\text{in } L^2(\mu)$$
for every $f\in L^\infty(\mu)$.

Conjecture 1. Let $p_1,\ldots,p_\ell\in\mathbb{Z}[n]$ and let $(X,\mathcal{X},\mu,T_1,\ldots,T_\ell)$ be a system. Then the polynomials are jointly ergodic for the system if and only if the following two conditions hold:
(i) the sequence $(T_i^{p_i(n)}T_j^{-p_j(n)})_{n\in\mathbb{N}}$ is ergodic for $\mu$ for all distinct $i,j\in\{1,\ldots,\ell\}$;
(ii) the sequence $(T_1^{p_1(n)}\times\cdots\times T_\ell^{p_\ell(n)})_{n\in\mathbb{N}}$ is ergodic for $\mu\times\cdots\times\mu$.

Conjecture 1 thus lists the conditions that have to be checked to verify the joint ergodicity of a family of polynomials for a system. Our Theorem 1.2 is in fact stronger than Corollary 1.4 in several ways. First, Theorem 1.2 gives a criterion for weak joint ergodicity, not just for joint ergodicity. The transformations $T_1,\ldots,T_\ell$ therefore need not be ergodic for the theorem to say something meaningful. Second, to verify joint ergodicity, our good ergodicity property requires checking strictly fewer conditions than condition (i) in Conjecture 1.
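The difference in bookkeeping can be made concrete by listing the pairs each condition refers to. In the sketch below, the helper names are hypothetical and polynomials are again coefficient lists of nonzero polynomials with zero constant term. It returns all pairs $i<j$ that condition (i) of Conjecture 1 asks about, together with the subset of linearly dependent pairs, which are the pairs the good ergodicity property concerns. For the family $n^2,n^2,n^2+n$ of (5) it returns three pairs against the single pair $(1,2)$.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def linearly_dependent(p: Sequence[int], q: Sequence[int]) -> bool:
    """Nonzero coefficient lists p, q are proportional iff all 2x2 minors p_k q_l - p_l q_k vanish."""
    length = max(len(p), len(q))
    p = list(p) + [0] * (length - len(p))
    q = list(q) + [0] * (length - len(q))
    return all(p[k] * q[l] == p[l] * q[k] for k in range(length) for l in range(length))


def pairs_to_check(polys: Sequence[Sequence[int]]) -> Tuple[List[Tuple[int, int]], List[Tuple[int, int]]]:
    """Return (all pairs i < j, linearly dependent pairs), using 1-based indices as in the text."""
    all_pairs = list(combinations(range(1, len(polys) + 1), 2))
    dependent = [(i, j) for i, j in all_pairs if linearly_dependent(polys[i - 1], polys[j - 1])]
    return all_pairs, dependent


conjecture_pairs, good_pairs = pairs_to_check([[0, 1], [0, 1], [1, 1]])  # n^2, n^2, n^2 + n
print(conjecture_pairs)  # [(1, 2), (1, 3), (2, 3)]
print(good_pairs)        # [(1, 2)]
```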
For instance, for the average (5), condition (i) in Conjecture 1 requires us to check the ergodicity of the three sequences $((T_1T_2^{-1})^{n^2})_{n\in\mathbb{N}}$, $(T_1^{n^2}T_3^{-(n^2+n)})_{n\in\mathbb{N}}$, and $(T_2^{n^2}T_3^{-(n^2+n)})_{n\in\mathbb{N}}$. By contrast, the good ergodicity property for (5) holds if and only if $\mathcal{I}(T_1T_2^{-1})=\mathcal{I}(T_1)\cap\mathcal{I}(T_2)$, which is in any case a necessary condition for $((T_1T_2^{-1})^{n^2})_{n\in\mathbb{N}}$ to be ergodic. Finally, we remark that the original version of Conjecture 1 from [7] is stated for more general tuples
$$(T_1^{p_{11}(n)}\cdots T_\ell^{p_{1\ell}(n)},\ \ldots,\ T_1^{p_{m1}(n)}\cdots T_\ell^{p_{m\ell}(n)}). \tag{8}$$
An extension of our method might establish an analogue of Theorem 1.1 for such averages. However, new problems arise. In addition, the technical complexity of some of our arguments in this paper is already formidable, and it would likely grow significantly if we tackled the more complicated averages (8). We have therefore refrained from extending Theorem 1.1 to averages of tuples as in (8), and we stick instead to the simpler and arguably more natural averages (1).
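Condition (ii) of Corollary 1.3 is a statement about exponential sums. The minimal numerical sketch below evaluates $\frac1N\sum_{n=1}^N e(\alpha_1p_1(n)+\cdots+\alpha_\ell p_\ell(n))$ for a finite $N$ standing in for the limit, so it can suggest, but never prove, that (7) holds. The name `weyl_average` and the example parameters are ours.

For $p_1=p_2=n^2$, choosing $\alpha_2=-\alpha_1$ makes every term equal to 1. Hence (7) fails whenever $\alpha\in\mathrm{Spec}(T_1)$ and $-\alpha\in\mathrm{Spec}(T_2)$ for some nonzero $\alpha$. Similarly, $\alpha_3=1/2$ with $p_3=n^2+n$ gives no decay, because $n^2+n$ is always even.

```python
import cmath
from typing import Callable, Sequence

Poly = Callable[[int], int]


def weyl_average(alphas: Sequence[float], polys: Sequence[Poly], N: int) -> complex:
    """Return (1/N) * sum_{n=1}^{N} e(alpha_1 p_1(n) + ... + alpha_l p_l(n)), where e(t) = exp(2 pi i t)."""
    total = 0j
    for n in range(1, N + 1):
        # Reduce the phase mod 1 to limit rounding error for large n.
        phase = sum(a * p(n) for a, p in zip(alphas, polys)) % 1.0
        total += cmath.exp(2j * cmath.pi * phase)
    return total / N


family = [lambda n: n * n, lambda n: n * n, lambda n: n * n + n]  # n^2, n^2, n^2 + n
print(abs(weyl_average([0.3, -0.3, 0.0], family, 1000)))  # 1.0: cancellation, every term is 1
print(abs(weyl_average([0.0, 0.0, 0.5], family, 1000)))   # 1.0: n^2 + n is even, every term is 1
print(abs(weyl_average([2 ** 0.5, 0.0, 0.0], family, 10_000)))  # irrational alpha: a small modulus is expected
```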

1.2. Extensions to other averaging schemes. Our arguments can be modified to cover multivariate polynomials and averages over arbitrary Følner sequences. These modifications do not require any new ideas. However, they force us to introduce even more complicated notation and to deal with straightforward but tedious technicalities. For this reason, we omit their proofs. We start with a generalisation of Theorem 1.1.
Theorem 1.6. Let $K,\ell\in\mathbb{N}$ and let $(I_N)_{N\in\mathbb{N}}$ be a Følner sequence on $\mathbb{Z}^K$. The polynomials $p_1,\ldots,p_\ell\in\mathbb{Z}[n_1,\ldots,n_K]$ are weakly jointly ergodic for the system $(X,\mathcal{X},\mu,T_1,\ldots,T_\ell)$ along $(I_N)_{N\in\mathbb{N}}$, in the sense that
$$\lim_{N\to\infty}\Big\|\mathbb{E}_{\mathbf{n}\in I_N}T_1^{p_1(\mathbf{n})}f_1\cdots T_\ell^{p_\ell(\mathbf{n})}f_\ell-\mathbb{E}(f_1|\mathcal{I}(T_1))\cdots\mathbb{E}(f_\ell|\mathcal{I}(T_\ell))\Big\|_{L^2(\mu)}=0$$
for all $f_1,\ldots,f_\ell\in L^\infty(\mu)$, if and only if the following two conditions hold:
(i) the system has the good ergodicity property for the polynomials;
(ii) for all nonergodic eigenfunctions $\chi_j\in\mathcal{E}(T_j)$, $j\in\{1,\ldots,\ell\}$, we have
$$\lim_{N\to\infty}\Big\|\mathbb{E}_{\mathbf{n}\in I_N}T_1^{p_1(\mathbf{n})}\chi_1\cdots T_\ell^{p_\ell(\mathbf{n})}\chi_\ell-\mathbb{E}(\chi_1|\mathcal{I}(T_1))\cdots\mathbb{E}(\chi_\ell|\mathcal{I}(T_\ell))\Big\|_{L^2(\mu)}=0.$$

1.3. Outline of the article. In Section 2, we recall basic notions and results from ergodic theory, especially those related to the families of Gowers–Host–Kra and box seminorms, dual functions, and nonergodic eigenfunctions. In Section 3, we state preliminary technical lemmas used to prove our main results, most of which are variations of results from [6, 9, 11]. In Section 4, we discuss at length two simple cases of Theorem 1.1 that illustrate some of our techniques and show why the good ergodicity property is needed. We then discuss in Section 5 the formalism and the general strategy for handling longer families. In Section 6, we give more details of the various maneuvers outlined in Section 5. These maneuvers take the form of several highly technical propositions that play a crucial part in the inductive proof of Theorem 1.1.

Some of the techniques used in this paper were inspired by our earlier work [11], where we dealt with pairwise independent polynomials $p_1,\ldots,p_\ell$. The lack of pairwise independence introduces serious additional complications. Consequently, we are forced to keep track of more information about the averages (1) than in [11], particularly concerning the properties of the functions present therein and the coefficients of the polynomial iterates.
The methods developed in this paper therefore differ in a number of places from the techniques employed in [11], and the argument from [11] is most emphatically not a special case of the argument presented in the current paper.
Because we need a better grip on the averages, we require a more extensive formalism than that of [11], which makes our argument rather hard to digest on a first reading. To compensate, we have included numerous examples that illustrate the main new obstacles and ideas in the proofs. We invite the reader to go over these examples first, before delving into the details of the proofs.

Ergodic background and definitions
In this section, we present various notions from ergodic theory together with some basic results.
2.1. Basic notation. We start by explaining the basic notation used throughout the paper.
The letters $\mathbb{C},\mathbb{R},\mathbb{Z},\mathbb{N},\mathbb{N}_0$ stand for the sets of complex numbers, real numbers, integers, positive integers, and nonnegative integers, respectively. We denote by $\mathbb{T}$ the one-dimensional torus, and we often identify it with $\mathbb{R}/\mathbb{Z}$ or with $[0,1)$. We let $[N]:=\{1,\ldots,N\}$ for any $N\in\mathbb{N}$. We denote by $\mathbb{Z}[n]$ the collection of polynomials with integer coefficients.
For an element $t\in\mathbb{R}$, we let $e(t):=e^{2\pi it}$. If $a\colon\mathbb{N}^s\to\mathbb{C}$ is a bounded sequence for some $s\in\mathbb{N}$ and $A$ is a non-empty finite subset of $\mathbb{N}^s$, we let $\mathbb{E}_{n\in A}\,a(n):=\frac{1}{|A|}\sum_{n\in A}a(n)$.
We commonly use the letter $\ell$ to denote the number of transformations in our system or the number of functions in an average, while the letter $s$ usually stands for the degree of ergodic seminorms. We normally write tuples of length $\ell$ in bold, e.g. $\mathbf{b}\in\mathbb{Z}^\ell$. We underline tuples of length $s$ (or $s+1$, or $s-1$), which are typically used for averaging, e.g. $\underline{h}\in\mathbb{Z}^s$. For a vector $\mathbf{b}=(b_1,\ldots,b_\ell)\in\mathbb{Z}^\ell$ and a system $(X,\mathcal{X},\mu,T_1,\ldots,T_\ell)$, we let $T^{\mathbf{b}}:=T_1^{b_1}\cdots T_\ell^{b_\ell}$. We also let $\mathbf{e}_j$ be the unit vector in $\mathbb{Z}^\ell$ in the $j$-th direction and set $\mathbf{e}_0:=\mathbf{0}$, so that $T^{\mathbf{e}_j}=T_j$ for $j\in[\ell]$ and $T^{\mathbf{e}_0}$ is the identity transformation.

2.2. Ergodic seminorms.
We review some basic facts about two families of ergodic seminorms: the Gowers-Host-Kra seminorms and the box seminorms.
2.2.1. Gowers–Host–Kra seminorms. Given a system $(X,\mathcal{X},\mu,T)$, we will use the family of ergodic seminorms $|||\cdot|||_{s,T}$, also known as Gowers–Host–Kra seminorms, which were originally introduced in [13] for ergodic systems. A detailed exposition of their basic properties can be found in [14, Chapter 8]. These seminorms are defined inductively for $f\in L^\infty(\mu)$ as follows. For convenience, we also define $|||\cdot|||_{0,T}$, which is not a seminorm. We set $|||f|||_{0,T}:=\int f\,d\mu$, and for $s\in\mathbb{N}_0$ we let
$$|||f|||_{s+1,T}^{2^{s+1}}:=\lim_{H\to\infty}\mathbb{E}_{h\in[H]}\,|||\Delta_{T;h}f|||_{s,T}^{2^s},$$
where $\Delta_{T;h}f:=f\cdot T^h\overline{f}$ is the multiplicative derivative of $f$ with respect to $T$. The limit can be shown to exist by successive applications of the mean ergodic theorem. Moreover, for $f\in L^\infty(\mu)$ and $s\in\mathbb{N}_0$, we have $|||f|||_{s,T}\le|||f|||_{s+1,T}$ (see [13] or [14, Chapter 8]). It follows immediately from the definition that
$$|||f|||_{s,T}^{2^s}=\lim_{H_1\to\infty}\mathbb{E}_{h_1\in[H_1]}\cdots\lim_{H_s\to\infty}\mathbb{E}_{h_s\in[H_s]}\int\Delta_{T;\underline{h}}f\,d\mu, \tag{9}$$
where for $\underline{h}=(h_1,\ldots,h_s)$ we let $\Delta_{T;\underline{h}}f:=\Delta_{T;h_1}\cdots\Delta_{T;h_s}f$ be the multiplicative derivative of $f$ of degree $s$ with respect to $T$.

It can be shown that any $s'\le s$ of the iterated limits in (9) can be taken to be simultaneous limits (i.e. we average over $[H]^{s'}$ and let $H\to\infty$) without changing the value of the limit. This was originally proved in [13] using the main structural result of [13]. A more "elementary" proof can be deduced from [4, Lemma 1.12] once the convergence of the uniform Cesàro averages is known, and yet another proof can be found in [12, Lemma 1]. For $s':=s$, this gives the identity
$$|||f|||_{s,T}^{2^s}=\lim_{H\to\infty}\mathbb{E}_{\underline{h}\in[H]^s}\int\Delta_{T;\underline{h}}f\,d\mu. \tag{10}$$
Moreover, for $1\le s'\le s$, we have
$$|||f|||_{s,T}^{2^s}=\lim_{H\to\infty}\mathbb{E}_{\underline{h}\in[H]^{s'}}\,|||\Delta_{T;\underline{h}}f|||_{s-s',T}^{2^{s-s'}}.$$
The seminorms are intimately connected with a certain family of factors of the system. This was established in [13] for ergodic systems and in [14, Chapter 8, Theorem 14] for general systems. Specifically, for every $s\in\mathbb{N}_0$ there exists a factor $\mathcal{Z}_s(T)\subseteq\mathcal{X}$, known as the Host–Kra factor of degree $s$, with the following property: for $s\in\mathbb{N}$, $|||f|||_{s,T}=0$ if and only if $f$ is orthogonal to $\mathcal{Z}_{s-1}(T)$.
Equivalently, $|||\cdot|||_{s,T}$ defines a norm on the space $L^2(\mathcal{Z}_{s-1}(T))$ (for a proof, see [14, Chapter 9, Theorem 15]). Box seminorms satisfy the following Gowers–Cauchy–Schwarz inequality [12, Proposition 2] lim sup (One can replace the limsup with a limit since it is known to exist.)

We frequently bound one seminorm in terms of another. An inductive application of formula (12), or alternatively a simple application of the Gowers–Cauchy–Schwarz inequality (16), yields the following monotonicity property:
$$|||f|||_{\mathbf{b}_1,\ldots,\mathbf{b}_s}\le|||f|||_{\mathbf{b}_1,\ldots,\mathbf{b}_s,\mathbf{b}_{s+1}}. \tag{17}$$
A special case of (17) is the bound $|||f|||_{s,T}\le|||f|||_{s+1,T}$ mentioned above, valid for any $f\in L^\infty(\mu)$ and any system $(X,\mathcal{X},\mu,T)$.
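The inductive definition of the Gowers–Host–Kra seminorms is easiest to see in the simplest case. Take $s=0$ in the recursion $|||f|||_{s+1,T}^{2^{s+1}}=\lim_{H\to\infty}\mathbb{E}_{h\in[H]}|||\Delta_{T;h}f|||_{s,T}^{2^s}$, where $\Delta_{T;h}f=f\cdot T^h\overline f$ and $\mathcal{I}(T)$ is the $T$-invariant σ-algebra. The following standard mean ergodic theorem computation is included only as an illustration. It shows that $|||f|||_{1,T}=0$ exactly when $\mathbb{E}(f|\mathcal{I}(T))=0$, i.e. when $f$ is orthogonal to the $T$-invariant functions.

```latex
\begin{aligned}
|||f|||_{1,T}^{2}
  &= \lim_{H\to\infty}\mathbb{E}_{h\in[H]}\,|||\Delta_{T;h}f|||_{0,T}
   = \lim_{H\to\infty}\mathbb{E}_{h\in[H]}\int f\cdot T^{h}\overline{f}\,d\mu \\
  &= \int f\cdot\Big(\lim_{H\to\infty}\mathbb{E}_{h\in[H]}T^{h}\overline{f}\Big)\,d\mu
   = \int f\cdot\mathbb{E}(\overline{f}\,|\,\mathcal{I}(T))\,d\mu
   \qquad\text{(mean ergodic theorem, $L^2$ limit)} \\
  &= \int \mathbb{E}(f\,|\,\mathcal{I}(T))\cdot\overline{\mathbb{E}(f\,|\,\mathcal{I}(T))}\,d\mu
   = \|\mathbb{E}(f\,|\,\mathcal{I}(T))\|_{L^2(\mu)}^{2}.
\end{aligned}
```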
The following lemma allows us to compare box seminorms depending on the invariant σ-algebras of the transformations involved.
Proof. We prove this by induction on $s$. For $s=1$, we simply have where we use the fact that $\|\mathbb{E}(f|\mathcal{A})\|_{L^2(\mu)}\le\|\mathbb{E}(f|\mathcal{B})\|_{L^2(\mu)}$ whenever $\mathcal{A}\subseteq\mathcal{B}$. For $s>1$, we use the induction formula for seminorms and the result for $s=1$ to deduce that The claim follows by iterating this procedure $s-1$ more times.

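2.3. Dual functions and sequences. Let $s\in\mathbb{N}$, let $(X,\mathcal{X},\mu,T)$ be a system, and let $f\in L^\infty(\mu)$. We define
$$\mathcal{D}_{s,T}(f):=\lim_{H\to\infty}\mathbb{E}_{\underline{h}\in[H]^s}\prod_{\epsilon\in\{0,1\}^s\setminus\{\underline{0}\}}\mathcal{C}^{|\epsilon|-1}T^{\epsilon\cdot\underline{h}}f$$
(the limit exists in $L^2(\mu)$ by [13]). We call $\mathcal{D}_{s,T}(f)$ the dual function of $f$ of level $s$ with respect to $T$. The name comes from the identity
$$|||f|||_{s,T}^{2^s}=\int f\cdot\overline{\mathcal{D}_{s,T}(f)}\,d\mu.$$
One consequence of this identity is that the span of dual functions of level $s$ is dense in $L^1(\mathcal{Z}_{s-1}(T))$.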
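For $s=1$, the dual function reduces to a familiar object. The computation below assumes the definition $\mathcal{D}_{s,T}(f)=\lim_{H\to\infty}\mathbb{E}_{\underline h\in[H]^s}\prod_{\epsilon\in\{0,1\}^s\setminus\{\underline 0\}}\mathcal{C}^{|\epsilon|-1}T^{\epsilon\cdot\underline h}f$, with $\mathcal{C}$ denoting complex conjugation. It is a standard consequence of the mean ergodic theorem and is included only as an illustration. In particular, every bounded $\mathcal{I}(T)$-measurable $g$ satisfies $g=\mathcal{D}_{1,T}(g)$.

```latex
\mathcal{D}_{1,T}(f)=\lim_{H\to\infty}\mathbb{E}_{h\in[H]}T^{h}f=\mathbb{E}(f\,|\,\mathcal{I}(T)),
\qquad
\int f\cdot\overline{\mathcal{D}_{1,T}(f)}\,d\mu
  =\|\mathbb{E}(f\,|\,\mathcal{I}(T))\|_{L^2(\mu)}^{2}
  =|||f|||_{1,T}^{2}.
```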
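Let $(X,\mathcal{X},\mu,T_1,\ldots,T_\ell)$ be a system. Using the identities (15) and (18) we get the special case of which is

For $s\in\mathbb{N}$, we denote by
$$\mathfrak{D}_s:=\{(T_j^n\mathcal{D}_{s',T_j}g)_{n\in\mathbb{Z}}\colon\ g\in L^\infty(\mu)\ \text{is 1-bounded},\ j\in[\ell],\ 1\le s'\le s\}$$
the set of sequences of 1-bounded functions coming from dual functions of degree up to $s$ for the transformations $T_1,\ldots,T_\ell$. Moreover, we define $\mathfrak{D}:=\bigcup_{s\in\mathbb{N}}\mathfrak{D}_s$.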
The utility of dual functions comes from the following approximation result.
is a linear combination of finitely many dual functions of level $s$ with respect to $T$;

Proposition 2.3 will be used as follows. Suppose that the $L^2(\mu)$ limit of the average (1) does not vanish for some functions $f_1,\ldots,f_\ell\in L^\infty(\mu)$. Then we decompose $f_\ell$ as in Proposition 2.3 for sufficiently small $\varepsilon>0$ so that for some (finite) linear combinations of dual functions. Applying the triangle inequality and the pigeonhole principle, we deduce that there exists $k$ for which where $D(n)(x):=T_\ell^n\mathcal{D}_{s,T_\ell}g_k(x)$ for $n\in\mathbb{N}$ and $x\in X$. In this way, we essentially replace the term $T_\ell^{p_\ell(n)}f_\ell$ in the original average by the more structured piece $D(p_\ell(n))$.

2.4. Eigenfunctions and a criterion for weak joint ergodicity. Following [10], we define the notion of eigenfunctions that appears in the statement of Theorem 1.2.
We denote the set of nonergodic eigenfunctions with respect to $T$ by $\mathcal{E}(T)$. For ergodic systems, a nonergodic eigenfunction is either the zero function or a classical eigenfunction of unit modulus. For general systems, each function $\chi\in\mathcal{E}(T)$ satisfies $\chi(Tx)=1_E(x)\,e(\phi(x))\,\chi(x)$ for some $T$-invariant set $E\in\mathcal{X}$ and some measurable $T$-invariant function $\phi\colon X\to\mathbb{T}$.
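The last sentence can be illustrated with a concrete instance, which is our own example and not one from the paper. Consider the skew product $T(x,y)=(x,y+x)$ on $\mathbb{T}^2$ with Lebesgue measure. It is not ergodic, since the first coordinate is $T$-invariant. The function $\chi(x,y)=e(y)$ satisfies $\chi(T(x,y))=e(x)\,\chi(x,y)$, which is the relation above with $E=X$ and the $T$-invariant function $\phi(x,y)=x$. The sketch checks this relation numerically at random points. It verifies only the relation, not membership in $\mathcal{E}(T)$ according to the full definition from [10].

```python
import cmath
import random
from typing import Tuple


def e(t: float) -> complex:
    """e(t) = exp(2 pi i t)."""
    return cmath.exp(2j * cmath.pi * t)


def skew(x: float, y: float) -> Tuple[float, float]:
    """The skew product T(x, y) = (x, y + x mod 1) on the 2-torus."""
    return x, (y + x) % 1.0


def chi(x: float, y: float) -> complex:
    return e(y)


def phi(x: float, y: float) -> float:
    return x  # T-invariant, because T leaves the first coordinate unchanged


def max_relation_error(samples: int = 1000, seed: int = 0) -> float:
    """Largest deviation from chi(Tp) = e(phi(p)) chi(p) and phi(Tp) = phi(p) over random points p."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        tx, ty = skew(x, y)
        worst = max(worst,
                    abs(chi(tx, ty) - e(phi(x, y)) * chi(x, y)),
                    abs(phi(tx, ty) - phi(x, y)))
    return worst


print(max_relation_error() < 1e-12)  # True
```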
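Definition (Weak joint ergodicity). We say that a collection of sequences $a_1,\ldots,a_\ell\colon\mathbb{N}\to\mathbb{Z}$ is weakly jointly ergodic for the system $(X,\mathcal{X},\mu,T_1,\ldots,T_\ell)$ if
$$\lim_{N\to\infty}\Big\|\mathbb{E}_{n\in[N]}T_1^{a_1(n)}f_1\cdots T_\ell^{a_\ell(n)}f_\ell-\mathbb{E}(f_1|\mathcal{I}(T_1))\cdots\mathbb{E}(f_\ell|\mathcal{I}(T_\ell))\Big\|_{L^2(\mu)}=0$$
for all $f_1,\ldots,f_\ell\in L^\infty(\mu)$.

The notion of nonergodic eigenfunction is important for us because of the following criterion for weak joint ergodicity from [11].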
We will apply Theorem 2.4 in the proof of Theorem 1.2. The first condition will be satisfied thanks to the stronger result proved in Theorem 1.1.

Preliminary results
In this section, we gather auxiliary results needed in the proof of Theorem 1.1. We start with the following simple lemma from [11, Lemma 5.2], which allows us to pass from averages of sequences $a_{\underline{h}-\underline{h}'}$ to averages of sequences $a_{\underline{h}}$.
Next, we state a result that allows us to replace a function $f_m$ in the original average by a more structured averaged term $\tilde f_m$ that encodes information about the original average. This idea originates in the finitary works of Peluse and Prendiville on the polynomial Szemerédi theorem [15, 16, 17], and it has been successfully applied in the ergodic theoretic setting in [5, 9, 11]. The version below differs from earlier formulations in one respect. We additionally show that if the to-be-replaced function $f_m$ is measurable with respect to some sub-σ-algebra $\mathcal{A}$, then the same can be assumed about the function that replaces it. In our applications, $\mathcal{A}$ will always be either the full σ-algebra $\mathcal{X}$ or the invariant sub-σ-algebra of some measure-preserving transformation.

Lemma 3.2 (Introducing averaged functions). Let $a_1,\ldots,a_\ell\colon\mathbb{N}\to\mathbb{Z}$ be sequences, $(X,\mathcal{X},\mu,T_1,\ldots,T_\ell)$ be a system, and 1-bounded functions $f_1,\ldots,f_\ell\in L^\infty(\mu)$ be such that Suppose moreover that $f_m$ is $\mathcal{A}$-measurable for some sub-σ-algebra $\mathcal{A}\subseteq\mathcal{X}$. Then there exist $N_k\to\infty$ and $g_k\in L^\infty(\mu)$ with $\|g_k\|_{L^\infty(\mu)}\le1$, $k\in\mathbb{N}$, such that for $\tilde f_m$ where the limit is a weak limit, we have We set $\tilde g$ The weak compactness of $L^2(\mu)$ implies that there exists a subsequence $(N_k)_{k\in\mathbb{N}}$ of $(\tilde N_k)_{k\in\mathbb{N}}$ for which the sequence where $g_k:=\tilde g_{\tilde N_k}$, $k\in\mathbb{N}$, converges weakly to a 1-bounded function $\tilde f_m$. We observe from (20) that Taking $k\to\infty$, using the $\mathcal{A}$-measurability of the 1-bounded function $f_m$, and applying the Cauchy–Schwarz inequality, we get Hence, An application of the Cauchy–Schwarz inequality gives the result.
We now present two versions of the dual-difference interchange result that we use in our smoothing argument. The second version in the proposition below was already used in [11]. The first one is new, since the extra information it provides was not required in earlier arguments.

Proposition 3.3 (Dual-difference interchange). Let $(X,\mathcal{X},\mu,T_1,\ldots,T_\ell)$ be a system, $s,s'\in\mathbb{N}$, $\mathbf{b}_1,\ldots,\mathbf{b}_{s+1},\mathbf{c}\in\mathbb{Z}^\ell$ be vectors, $(f_{n,k})_{n,k\in\mathbb{N}}\subseteq L^\infty(\mu)$ be 1-bounded, and $f\in L^\infty(\mu)$ be defined by
$$f:=\lim_{k\to\infty}\mathbb{E}_{n\in[N_k]}f_{n,k}$$
for some $N_k\to\infty$, where the average is assumed to converge weakly.
then there exist 1-bounded functions $u_{\underline{h},\underline{h}'}$, invariant under both $T^{\mathbf{b}_{s+1}}$ and $T^{\mathbf{c}}$, for which the inequality

For the proof of Proposition 3.3, we need the following version of the Gowers–Cauchy–Schwarz inequality from [11].
For part (ii), we use the fact that $T^{\mathbf{a}},T^{\mathbf{b}}$ commute and that $f$ is $T^{\mathbf{a}}$-invariant to observe that $T^{\mathbf{a}}T^{h\mathbf{b}}f=T^{h\mathbf{b}}T^{\mathbf{a}}f=T^{h\mathbf{b}}f$. From this and the mean ergodic theorem it follows that and so

Proof of Proposition 3.3. Part (ii) follows from [11, Proposition 5.7], so we only prove (i). Letting $u_{\underline{h}}:=\mathbb{E}(\Delta_{\mathbf{b}_1,\ldots,\mathbf{b}_s;\underline{h}}\,\mathbb{E}(f|\mathcal{I}(T^{\mathbf{c}}))\,|\,\mathcal{I}(T^{\mathbf{b}_{s+1}}))$, we deduce from (21) that Using the $T^{\mathbf{c}}$-invariance of $u_{\underline{h}}$ and the properties of conditional expectations, we deduce that For $\epsilon\in\{0,1\}^s\setminus\{\underline{1}\}$, we let $f_\epsilon:=\mathcal{C}^{|\epsilon|}\mathbb{E}(f|\mathcal{I}(T^{\mathbf{c}}))$. From the previous identity and the fact that $f=\lim_{k\to\infty}\mathbb{E}_{n\in[N_k]}f_{n,k}$, with convergence in the weak sense, we deduce that For fixed $k,n,H\in\mathbb{N}$, we apply Lemma 3.4 with $f_1:=f_{n,k}$, obtaining The functions $u_{\underline{h},\underline{h}'}$ are invariant under both $T^{\mathbf{b}_{s+1}}$ and $T^{\mathbf{c}}$, since each $u_{\underline{h}}$ is and the transformations $T_1,\ldots,T_\ell$ commute. The result follows from the fact that the limsup of a sum is at most the sum of the limsups.
The proposition below enables a transition between qualitative and soft quantitative results. Its proof uses rather abstract functional analytic arguments together with the mean convergence result of Walsh [19]. If we instead use the mean convergence result of Zorin-Kranich [20], we also obtain a variant for averages over an arbitrary Følner sequence on $\mathbb{Z}^D$.

Proposition 3.6 (Soft quantitative control [11, Proposition A.1]). Let $m,\ell,s\in\mathbb{N}$ with $m\in[\ell]$, $p_1,\ldots,p_\ell\in\mathbb{Z}[n]$ be polynomials, and $(X,\mathcal{X},\mu,T_1,\ldots,T_\ell)$ be a system. Let $\mathcal{Y}_1,\ldots,\mathcal{Y}_\ell\subseteq\mathcal{X}$ be sub-σ-algebras. Suppose that for all $f_j\in L^\infty(\mathcal{Y}_j,\mu)$, $j\in[\ell]$, the seminorm $|||f_m|||_s$ controls the average , are 1-bounded and $|||f_m|||_s\le\delta$, then

Finally, we need the following PET result, which gives box seminorm control for averages with extra terms involving dual functions. It extends [6, Theorem 2.5], which did not involve dual functions. These results are proved by combining a complicated variant of Bergelson's original PET technique [2] with concatenation results of Tao and Ziegler [18].
with the following property: for every system $(X,\mathcal{X},\mu,T_1,\ldots,T_\ell)$, functions $f_1,\ldots,f_\ell\in L^\infty(\mu)$, and sequences of functions $D_1,\ldots,$

By the monotonicity property of box seminorms (17), we may (and will) always assume that $s\ge2$ in Proposition 3.7. This is necessary in order to apply Lemma 2.1 in some of our arguments in Section 7.

Two motivating examples
In this section, we prove Theorem 1.1 for the family $n^2,n^2,n^2+n$ and sketch the changes needed to handle the family $n^2,n^2,2n^2+n$. These two cases illustrate, in a simple setting, some (but not all) of the key ideas needed in the proof of Theorem 1.1. Additional complications arise for more general families. The ideas needed to overcome them will be illustrated with examples in subsequent sections.
Example 1 (Seminorm control for a monic family of length 3). Our goal is to prove the following result.
Proposition 4.1 (Seminorm control for $n^2,n^2,n^2+n$). There exists $s\in\mathbb{N}$ such that for every system $(X,\mathcal{X},\mu,$

In Corollary 4.5, we subsequently show that the same conclusion holds if we instead assume $|||f_i|||_{s,T_i}=0$ for some $i\in\{1,2\}$.
By Proposition 3.7, there exist vectors $\mathbf{b}_1,\ldots,\mathbf{b}_{s+1}\in\{\mathbf{e}_3,\mathbf{e}_3-\mathbf{e}_1,\mathbf{e}_3-\mathbf{e}_2\}$ such that

The goal is to replace inductively each vector among $\mathbf{b}_1,\ldots,\mathbf{b}_{s+1}$ in the seminorm that differs from $\mathbf{e}_3$ by $\mathbf{e}_3^{\times s'}$ for some $s'\in\mathbb{N}$. This is achieved in the following proposition.
The assumption $\mathcal{I}(T_1T_2^{-1})=\mathcal{I}(T_1)\cap\mathcal{I}(T_2)$ is crucial for the following special case, which will be invoked in the proof of the general case of Proposition 4.1.

for some $s\in\mathbb{N}$; the other cases follow similarly. By Proposition 3.7, the equality (27) holds under the assumption that $|||f_3|||_{\mathbf{e}_2^{\times s_1},(\mathbf{e}_2-\mathbf{e}_1)^{\times s_2}}=0$ for some $s_1,s_2\in\mathbb{N}_0$, which are absolute in the sense that they do not depend on the system or the functions. The assumption and so (27) holds whenever $|||f_3|||_{s,T_2}=0$ for $s=s_1+s_2$.
Proposition 4.4 follows from [11, Proposition 8.4], since the two non-dual terms in (28) involve the pairwise independent polynomials $n^2$ and $n^2+n$.
Having stated all the needed auxiliary results, we are finally in the position to prove Proposition 4.2.
Step 1 (ping): Obtaining auxiliary control by a seminorm of f 2 .

By Lemma 3.2 (applied with
The invariance property implies that Consequently, there exist a set $B\subseteq\mathbb{N}^{2s}$ of positive lower density and $\varepsilon>0$ such that Each of the averages in (29) takes the form (26). We therefore apply Propositions 4.3 and 3.6 to obtain $s_1\in\mathbb{N}$ (independent of the system and the functions) and $\delta>0$ (which depends only on $\varepsilon$ and the system, but not on the functions) such that Together with Lemma 3.1, the inductive formula for seminorms (15), and Hölder's inequality, the inequality (30) implies that We deduce that the seminorm $|||f_2|||_{\mathbf{b}_1,\ldots,\mathbf{b}_s,\mathbf{e}_2^{\times s_1}}$ controls the average (24). This seminorm control is not particularly useful as an independent result, since the vectors $\mathbf{b}_1,\ldots,\mathbf{b}_s$ may not involve the transformation $T_2$ at all. However, it is of great importance as an intermediate result in the next step of our argument.
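Invariance arguments of this kind rest on an elementary identity, sketched below. For this illustration only, we assume $\mathbf{b}_{s+1}=\mathbf{e}_3-\mathbf{e}_2$, so that the functions $u$ provided by Proposition 3.3 are invariant under $T_3T_2^{-1}$. Since $T_2$ and $T_3$ commute, such invariance lets one trade $T_3$ for $T_2$ along any iterate, in particular along $n^2+n$.

```latex
\begin{aligned}
T_3T_2^{-1}u=u
&\;\Longrightarrow\; T_3u=T_2\big(T_3T_2^{-1}u\big)=T_2u
  &&\text{(apply $T_2$; $T_2,T_3$ commute)}\\
&\;\Longrightarrow\; T_3^{k}u=T_2^{k}u\quad\text{for all }k\in\mathbb{Z}
  &&\text{(induction on $|k|$)}\\
&\;\Longrightarrow\; T_3^{n^2+n}(g\cdot u)=T_3^{n^2+n}g\cdot T_2^{n^2+n}u
  &&\text{for all } g\in L^\infty(\mu),\ n\in\mathbb{N}.
\end{aligned}
```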
Step 2 (pong): Obtaining control by a seminorm of $f_3$.

Using our assumption that (25) fails, we now replace $f_2$ by $\tilde f_2$ and deduce from Lemma 3.2 (applied again with $\mathcal{A}=\mathcal{X}$) that where $D_{\underline{h},\underline{h}'}$ is a product of $2^s$ elements of $\mathfrak{D}_{s_1}$. Once again, there exist a set $B'\subseteq\mathbb{N}^{2s}$ of positive lower density and $\varepsilon>0$ such that for $D_1,\ldots,D_{2^s}\in\mathfrak{D}_{s_1}$. By Proposition 4.4, the averages (32) are controlled by $|||g_3|||_{s',T_3}$ for some $s'\in\mathbb{N}$ depending only on $s$, and so Proposition 3.6 gives $\delta>0$ such that Together with Lemma 3.1, Hölder's inequality, and the inductive formula (15) for the seminorms, this implies that $|||f_3|||_{\mathbf{b}_1,\ldots,\mathbf{b}_s,\mathbf{e}_3^{\times s'}}>0$, which is what we claimed.
Finally, we show how we can use Proposition 4.1 to obtain control of the average (24) by seminorms of other terms.
Corollary 4.5. There exists $s\in\mathbb{N}$ such that for every system $(X,\mathcal{X},\mu,$

Proof. For some absolute $s\in\mathbb{N}$, the identity $|||f_3|||_{s,T_3}=0$ implies the vanishing of the $L^2(\mu)$ limit of (24); this follows from Propositions 3.7 and 4.1. The content of Corollary 4.5 is therefore to show control by the other terms. Suppose that We then apply Proposition 2.3 and the pigeonhole principle to find a 1-bounded dual function $\mathcal{D}_{s,T_3}g$ such that where $D(n):=T_3^n\mathcal{D}_{s,T_3}g$. By Proposition 3.7, we have $|||f_2|||_{\mathbf{e}_2^{\times s'},(\mathbf{e}_2-\mathbf{e}_1)^{\times s'}}>0$ for some absolute $s'\in\mathbb{N}$. The ergodicity assumption on $T_1T_2^{-1}$ and Lemma 2.2 imply that $|||f_2|||_{\mathbf{e}_2^{\times 2s'}}>0$, and an analogous argument gives $|||f_1|||_{\mathbf{e}_1^{\times 2s'}}>0$.

The argument in Example 1 is relatively clean because the leading coefficients of the polynomials all equal 1. When this is not the case, minor modifications are required, as explained in the next example.
Suppose that the $L^2(\mu)$ limit of (33) does not vanish, and suppose moreover that $\mathbf{b}_{s+1}=2\mathbf{e}_3-\mathbf{e}_2$. Arguing as in the proof of Proposition 4.2, we arrive at the inequality invariant. We can no longer apply the invariance property in the same way as before, because the polynomial $2n^2+n$ is not divisible by 2 (it takes odd values at odd $n$). Instead, we first split $\mathbb{N}$ into its odd and even parts and then apply the triangle inequality to deduce that Only then can we apply the ). The important point about this new tuple is that the first two polynomials are again pairwise dependent, while the last one is pairwise independent of each of the first two. Moreover, the new tuple retains the good ergodicity property.
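The arithmetic behind splitting into even and odd $n$ can be sketched as follows. This is our reconstruction of the idea, not a verbatim step from the paper. If $u$ is invariant under $T^{2\mathbf{e}_3-\mathbf{e}_2}=T_3^2T_2^{-1}$, then $T_3^2u=T_2u$, and by commutativity and induction $T_3^{2k}u=T_2^ku$ for all $k\in\mathbb{Z}$. Since $2n^2+n$ is odd for odd $n$, the iterate can be halved only after the parities are separated:

```latex
\begin{aligned}
n=2m:&\quad 2n^2+n=8m^2+2m=2(4m^2+m),
  &\quad T_3^{2n^2+n}u&=T_2^{4m^2+m}u,\\
n=2m+1:&\quad 2n^2+n=8m^2+10m+3=2(4m^2+5m+1)+1,
  &\quad T_3^{2n^2+n}u&=T_3\,T_2^{4m^2+5m+1}u.
\end{aligned}
```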

Formalism and general strategy for longer families
We now turn to deriving Theorem 1.1 for longer families. To prove it for averages
$$\mathbb{E}_{n\in[N]}\,T_1^{p_1(n)}f_1\cdots T_\ell^{p_\ell(n)}f_\ell, \tag{34}$$
we need to analyse more complicated averages of the form
$$\mathbb{E}_{n\in[N]}\,T_{\eta_1}^{\rho_1(n)}f_1\cdots T_{\eta_\ell}^{\rho_\ell(n)}f_\ell\cdot\prod_{j\in[L]}D_j(q_j(n)). \tag{35}$$
Such averages appear at the intermediate steps of the proof of Theorem 1.1, much as the averages (26) and (28) appear at the intermediate steps of the proof of Proposition 4.1, the special case of Theorem 1.1 for the family $n^2,n^2,n^2+n$. In (34) and (35), $p_1,\ldots,p_\ell,\rho_1,\ldots,\rho_\ell,q_1,\ldots,q_L$ are (not necessarily distinct) polynomials with integer coefficients and zero constant terms, and $\eta_1,\ldots,\eta_\ell\in[\ell]$. Furthermore, $(X,\mathcal{X},\mu,T_1,\ldots,T_\ell)$ is a system, $f_1,\ldots,f_\ell\in L^\infty(\mu)$ are 1-bounded functions, and $D_1,\ldots,D_L\in\mathfrak{D}$ are sequences of functions. Since $D_j(q_j(n))$ has the form $T_{\pi_j}^{q_j(n)}g_j$ for some $\pi_j\in[\ell]$ and $g_j\in L^\infty(\mu)$, the averages (35) converge in $L^2(\mu)$ by [19]. The same comment applies to all limits involving dual sequences that appear in the rest of the paper.
The purpose of this section is to introduce a formalism that helps us meaningfully discuss averages (35). This will be done in Section 5.1. Subsequently, we give in Section 5.2 an overview of the strategy used to prove Theorem 1.1. The details of various moves discussed in Section 5.2 will be presented in Section 6.
In discussing examples in this section and the next, we often use the following informal terminology. For $j\in[\ell]$, we say that the average (35) is controlled by a $T_{\eta_j}$-seminorm of $f_j$ (or that we have seminorm control of (35) by a $T_{\eta_j}$-seminorm of $f_j$) if the following holds: for all $d,L\in\mathbb{N}$, there exists $s\in\mathbb{N}$ such that whenever the $T_{\eta_j}$-seminorm of $f_j$ vanishes, the $L^2(\mu)$ limit of (35) is 0 for all sequences $D_1,\ldots,D_L\in\mathfrak{D}_d$ and all functions $f_1,\ldots,f_\ell\in L^\infty(\mu)$ satisfying some explicitly stated invariance properties. We also say informally that we have seminorm control over the average (35) if we have seminorm control by a $T_{\eta_j}$-seminorm of $f_j$ for every $j\in[\ell]$.

5.1. The formalism behind the induction scheme. We start by introducing a convenient formalism for the induction scheme in the proof of Theorem 1.1. We often associate the average (35) with the tuple
$$\mathcal{T}:=(T_{\eta_1}^{\rho_1(n)},\ldots,T_{\eta_\ell}^{\rho_\ell(n)}).$$
This tuple does not contain any information about the polynomials $q_1,\ldots,q_L$, but this information is not needed. The corresponding terms play no role in our inductive argument and can easily be disposed of using Proposition 3.7. The only thing they influence is the degree $s$ of the seminorm with which we end up controlling the average (35).
Definition (Indexing data). For an average (35) or the associated tuple $\mathcal{T}$, we let $\ell$ be their length, $d:=\max_j\deg\rho_j$ be their degree, and $\boldsymbol{\eta}$ be their indexing tuple. For $j\in[\ell]$, we set $d_j:=\deg\rho_j$. We let $K_1$ be the maximum number of pairwise independent polynomials within the family $\rho_1,\ldots,\rho_\ell$ (we set $K_1:=1$ if $\ell=1$ or if every two polynomials are pairwise dependent). We partition
$$[\ell]=I_1\cup\cdots\cup I_{K_1},$$
where $j_1,j_2$ belong to the same $I_t$ if and only if $\rho_{j_1},\rho_{j_2}$ are linearly dependent. Thus, $I_1,\ldots,I_{K_1}$ partition $[\ell]$ into index sets corresponding to families of pairwise dependent polynomials. Furthermore, we define
$$\mathcal{L}:=\{j\in[\ell]\colon d_j=d\}$$
to be the set of indices corresponding to polynomials of maximum degree. We rearrange $I_1,\ldots,I_{K_1}$ so that $\mathcal{L}=\bigcup_{t\in[K_2]}I_t$ for some $K_2\le K_1$. We also let $K_3:=|\mathcal{L}|$ be the number of maximum degree polynomials, and we note that $K_2\le K_3$. Sometimes, we write $\mathcal{L}=\mathcal{L}(\rho_1,\ldots,\rho_\ell)$, $I_t=I_t(\rho_1,\ldots,\rho_\ell)$, and $K_i=K_i(\rho_1,\ldots,\rho_\ell)$ to emphasise the dependence on a specific family of polynomials.
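The indexing data are straightforward to compute. The sketch below is a hypothetical helper, with names of our choosing. It takes the iterates $\rho_1,\ldots,\rho_\ell$ as coefficient lists of nonzero polynomials with zero constant terms and groups linearly dependent ones into cells. It then orders the cells so that those of maximum degree come first, as in the definition. Linear dependence is an equivalence relation on nonzero polynomials, so comparing against one representative per cell suffices. Indices are 1-based to match the text.

```python
from typing import Dict, List, Sequence


def degree(p: Sequence[int]) -> int:
    """Degree of a_1 n + a_2 n^2 + ... given as (a_1, a_2, ...); p must be nonzero."""
    return max(i + 1 for i, a in enumerate(p) if a != 0)


def dependent(p: Sequence[int], q: Sequence[int]) -> bool:
    """Nonzero p, q are linearly dependent iff all 2x2 minors vanish."""
    length = max(len(p), len(q))
    p = list(p) + [0] * (length - len(p))
    q = list(q) + [0] * (length - len(q))
    return all(p[k] * q[l] == p[l] * q[k] for k in range(length) for l in range(length))


def indexing_data(rhos: Sequence[Sequence[int]]) -> Dict[str, object]:
    """Compute d, the cells I_1, ..., I_{K_1}, the set L, and K_1, K_2, K_3 (1-based indices)."""
    cells: List[List[int]] = []
    for j, rho in enumerate(rhos, start=1):
        for cell in cells:
            if dependent(rhos[cell[0] - 1], rho):
                cell.append(j)
                break
        else:  # no existing cell matched: open a new one
            cells.append([j])
    d = max(degree(rho) for rho in rhos)
    # Dependent polynomials share a degree, so each cell lies inside L or is disjoint from it.
    cells.sort(key=lambda cell: degree(rhos[cell[0] - 1]) != d)  # stable sort: max-degree cells first
    max_degree = [j for j, rho in enumerate(rhos, start=1) if degree(rho) == d]
    k2 = sum(1 for cell in cells if degree(rhos[cell[0] - 1]) == d)
    return {"d": d, "cells": cells, "L": max_degree, "K1": len(cells), "K2": k2, "K3": len(max_degree)}


# The family n^2, n^2, n^2 + n:
print(indexing_data([[0, 1], [0, 1], [1, 1]]))
# {'d': 2, 'cells': [[1, 2], [3]], 'L': [1, 2, 3], 'K1': 2, 'K2': 2, 'K3': 3}
```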
What this definition captures is the following: whenever we encounter in our tuple two transformations $T_{\eta_{j_1}},T_{\eta_{j_2}}$ whose indices $\eta_{j_1},\eta_{j_2}$ lie in the same set $I_t$, the identity (37) is satisfied. For instance, the tuple (36) has the good ergodicity property along $\boldsymbol{\eta}$ precisely when . The first identity comes from comparing the pairs $(j_1,j_2)=(1,2),(1,7)$, which correspond to the occurrences of the transformations $T_1$ and $T_2$ from the cell $I_1$ ($T_1$ occurs at index 1 and $T_2$ at indices 2 and 7). The second identity comes from comparing the pairs $(j_1,j_2)=(3,4),(4,5)$, which correspond to the transformations $T_3,T_4$ from the cell $I_2$ ($T_3$ occurs at indices 3 and 5, whereas $T_4$ occurs at index 4).
The guiding principle behind our arguments is that we derive seminorm control of the average (35) by inductively applying seminorm control of an average that is "simpler" than the original one in an appropriate sense. For instance, in Proposition 4.2 and Corollary 4.5, we obtained seminorm control for the tuple $(T_1^{n^2}, T_2^{n^2}, T_3^{n^2+n})$ from Example 1 by invoking seminorm control for the following tuples:
• (…) in the ping step of the smoothing argument in Proposition 4.2;
• (…) in the pong step of the smoothing argument in Proposition 4.2 (the asterisk is introduced purely for convenience; it denotes the term replaced by a product of dual functions);
• $(T_1^{n^2}, T_2^{n^2}, *)$ in Corollary 4.5.
The relative complexity of a tuple or an average is captured by the following notion.
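We define the type of $\mathcal{T}$ to be the tuple $w = (w_1, \ldots, w_{K_2})$, where $w_t := |\{j \in \mathcal{L} : \eta_j \in I_t\}|$ counts the number of times the transformations $(T_j)_{j \in I_t}$ appear in $\mathcal{T}$ with a polynomial iterate of maximum degree.$^{6}$ We note that $|w| := w_1 + \cdots + w_{K_2} = K_3$. We say that the type $w$ is basic if it has the form $w = (K_3, 0, \ldots, 0)$.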
For instance, the tuple (36) has type $(3, 3, 0, 0)$: this is because $T_1, T_2$, corresponding to $I_1$, occur thrice, as do the transformations $T_3, T_4$ corresponding to $I_2$, while the transformations $T_5, T_7$ corresponding to $I_3$ and $I_4$ do not occur at all. We do not care about the occurrence of $T_6$ since it has a linear iterate.
$^{6}$ We note here that the type of $\mathcal{T}$ depends not just on $\eta$ and the polynomials $\rho_1, \ldots, \rho_\ell$, but also on the ordering of the sets $I_1, \ldots, I_{K_1}$. We do not record this dependence explicitly, instead fixing some ordering of $I_1, \ldots, I_{K_1}$ a priori.
It is instructive to see what happens when the polynomials $\rho_1, \ldots, \rho_\ell$ are pairwise independent. In that case, $I_t = \{j_t\}$ for every $t \in [\ell]$, and $w_t$ counts the number of times the transformation $T_{j_t}$ appears among $(T_{\eta_j})_{j \in \mathcal{L}}$, or equivalently the number of times that $T_{j_t}$ attains a polynomial iterate of maximal degree. So for pairwise independent polynomials, this notion of type recovers the concept of type from [11, Section 8.2] (up to permuting $I_1, \ldots, I_{K_1}$).
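The type can be computed mechanically once the partition and the set $\mathcal{L}$ are known. The Python sketch below (hypothetical helpers `type_of` and `is_basic`, 1-based indices) implements $w_t = |\{j \in \mathcal{L} : \eta_j \in I_t\}|$; as an assumption of the sketch, an index $j \in \mathcal{L}$ whose $\eta_j$ falls outside the first $K_2$ cells is simply ignored.

```python
def type_of(eta, classes, L, K2):
    """Type w = (w_1, ..., w_K2) of the tuple with indexing tuple eta (1-based)."""
    cell = {j: t for t, cls in enumerate(classes) for j in cls}
    w = [0] * K2
    for j in L:
        t = cell[eta[j - 1]]   # the cell I_t containing eta_j
        if t < K2:             # only the first K2 cells make up L
            w[t] += 1
    return tuple(w)


def is_basic(w):
    """A type is basic when it equals (K3, 0, ..., 0)."""
    return all(x == 0 for x in w[1:])


# Partition I_1 = {1,2,3}, I_2 = {5,6}, I_3 = {4}, I_4 = {7} with L = {1,...,6}
classes = [[1, 2, 3], [5, 6], [4], [7]]
print(type_of((1, 2, 3, 4, 5, 6, 7), classes, [1, 2, 3, 4, 5, 6], 3))  # (3, 2, 1)
print(type_of((1, 2, 2), [[1, 2], [3]], [1, 2, 3], 2))                # (3, 0): basic
```

The second call corresponds to the tuple $(T_1^{n^2}, T_2^{n^2}, T_2^{n^2+n})$ with $I_1 = \{1, 2\}$ and $I_2 = \{3\}$.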
For the set of tuples in … . For instance, we have the following chain of type inequalities: … . The first, third, fifth, sixth, and eighth inequalities follow from the condition $w'_{\kappa+1} > w_{\kappa+1} > 0$, while the second, fourth, and seventh inequalities are consequences of the condition $w'_{\kappa+1} = 0 < w_{\kappa+1}$. This is a rather atypical ordering, but it turns out to capture well which of two tuples is "simpler" than the other.
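The formal definition of the ordering is not reproduced above, so the following Python sketch rests on an assumption: it reads $w' < w$ as the existence of $\kappa$ with $w'_t = w_t$ for $t \in [\kappa]$ and either $w'_{\kappa+1} > w_{\kappa+1} > 0$ or $w'_{\kappa+1} = 0 < w_{\kappa+1}$, which are the two conditions quoted in the text. It checks a few comparisons that appear in the text and brute-forces irreflexivity, asymmetry and transitivity on small types. The names `lower` and `is_strict_partial_order` are hypothetical.

```python
from itertools import product


def lower(wp, w):
    """Return True if wp < w in the (reconstructed) ordering on types."""
    for k in range(len(w)):
        if wp[k] != w[k]:
            # k plays the role of kappa + 1: the first coordinate where the types differ
            return wp[k] > w[k] > 0 or wp[k] == 0 < w[k]
    return False  # a type is never strictly below itself


def is_strict_partial_order(types):
    """Brute-force check of irreflexivity, asymmetry and transitivity."""
    if any(lower(a, a) for a in types):
        return False
    if any(lower(a, b) and lower(b, a) for a, b in product(types, repeat=2)):
        return False
    return all(lower(a, c)
               for a, b, c in product(types, repeat=3)
               if lower(a, b) and lower(b, c))


print(lower((3, 3, 1), (3, 2, 2)), lower((3, 1), (2, 2)), lower((3, 0, 3), (3, 2, 1)))
print(is_strict_partial_order(list(product(range(3), repeat=3))))
```

The first comparison is of the kind $w'_{\kappa+1} > w_{\kappa+1} > 0$, the last of the kind $w'_{\kappa+1} = 0 < w_{\kappa+1}$.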
The motivation for this particular choice of ordering is that in the arguments to come, we will be passing from a tuple $\mathcal{T}$ of type $w$ to a tuple of type $w' < w$ in two ways. In one of them, the type $w'$ will satisfy the condition $w'_{\kappa+1} > w_{\kappa+1} > 0$, while in the other, it will satisfy the condition $w'_{\kappa+1} = 0 < w_{\kappa+1}$. Arguing this way, we arrive in finitely many steps at a tuple of a basic type $w = (K_3, 0, \ldots, 0)$, which constitutes the base case of our induction. This transition will be explained in greater detail at the end of Section 5.2 and illustrated in Example 10.
Then < defines a strict partial order on A.
Proof. It is clear that $<$ is asymmetric and irreflexive, so it remains to show that it is transitive. Suppose that $w'' < w'$ and $w' < w$, and let $\kappa_1$, …
5.2. The general strategy. In this section, we outline how to obtain seminorm control of a given tuple using seminorm control for tuples of lower type or shorter length.
Definition (Controllable and uncontrollable tuples). Let $t_w := \max\{t : w_t > 0\}$ be the last nonzero index of $w$. We call a tuple $\mathcal{T}$ of a non-basic type $w$ (or the corresponding average) controllable if there exists an index $m \in [\ell]$ such that:
• $\eta_m \in I_{t_w}$;
• for every other $i \in [\ell]$ with $\eta_i = \eta_m$, we have $\rho_i \neq \rho_m$.
If $m$ satisfies these conditions, we say that it satisfies the controllability condition; in this case, Proposition 3.7 guarantees that the average (35) is controlled by $|||f_m|||_{b_1, \ldots, b_s}$ for some nonzero vectors $b_1, \ldots, b_s$. If no such index $m$ exists, we call the tuple uncontrollable.
The previous notions of controllability are meant to capture whether Proposition 3.7 is applicable to the relevant tuples in our setting. … Defining the partition $I_1 = \{1, 2, 3, 4\}$, $I_2 = \{5, 6\}$, $I_3 = \{7, 8\}$ corresponding to the independent polynomials $n^2$, $n^2 + n$, $n^2 + 2n$ respectively, the first tuple (38) has type $(5, 3, 0)$ while the second tuple (39) has type $(6, 2, 0)$, and for both tuples we have $t_w = 2$. The first one is controllable because for the index $m = 5$, the only values $i \neq m$ such that $\eta_i = 5$ are $i = 7, 8$, corresponding to the polynomial $n^2 + 2n$, which is distinct from $n^2 + n$. The second one is uncontrollable: the only indices $m$ with $\eta_m \in I_2$ are $m = 7, 8$, both with $\eta_m = 5$ and with the same polynomial $n^2 + 2n$.
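The controllability condition can be tested directly. In the Python sketch below (hypothetical helper names, 1-based indices), polynomials are compared as coefficient tuples, so "distinct" means distinct as polynomials, and the sketch does not check that the type is non-basic. Since the tuples (38) and (39) are not reproduced here, the entries of $\eta$ at the indices 1 to 4 and 6 are hypothetical choices from $I_1$: the type $(5, 3, 0)$ forces them into $I_1$, and the conclusion does not depend on which elements are chosen.

```python
def type_of(eta, classes, L, K2):
    cell = {j: t for t, cls in enumerate(classes) for j in cls}
    w = [0] * K2
    for j in L:
        t = cell[eta[j - 1]]
        if t < K2:
            w[t] += 1
    return tuple(w)


def controllable_indices(eta, rhos, classes, L, K2):
    """Indices m satisfying the controllability condition; an empty list means uncontrollable."""
    w = type_of(eta, classes, L, K2)
    t_w = max(t for t in range(K2) if w[t] > 0)   # last nonzero index of w (0-based)
    last_cell = set(classes[t_w])
    good = []
    for m in range(1, len(eta) + 1):
        if eta[m - 1] not in last_cell:
            continue
        clash = any(rhos[i - 1] == rhos[m - 1]
                    for i in range(1, len(eta) + 1)
                    if i != m and eta[i - 1] == eta[m - 1])
        if not clash:
            good.append(m)
    return good


N2, N2_N, N2_2N = (0, 1), (1, 1), (2, 1)   # n^2, n^2 + n, n^2 + 2n as coefficient tuples
rhos = [N2] * 4 + [N2_N] * 2 + [N2_2N] * 2
classes, L, K2 = [[1, 2, 3, 4], [5, 6], [7, 8]], list(range(1, 9)), 3
eta38 = (1, 2, 3, 4, 5, 1, 5, 5)   # hypothetical entries from I_1 at indices 1-4 and 6
eta39 = (1, 2, 3, 4, 1, 1, 5, 5)   # T_5 at index 5 replaced by T_1
print(type_of(eta38, classes, L, K2), controllable_indices(eta38, rhos, classes, L, K2))
print(type_of(eta39, classes, L, K2), controllable_indices(eta39, rhos, classes, L, K2))
```

The first line reports the type $(5, 3, 0)$ with the single controlling index 5; the second reports $(6, 2, 0)$ and no controlling index.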
Our strategy for proving seminorm control works rather differently for controllable and uncontrollable tuples. Suppose first that the average (35) with a tuple $\mathcal{T}$ of a non-basic type is controllable, and that an index $m$ satisfies the controllability condition. Then Proposition 3.7 guarantees that the average (35) can be bounded in terms of averages of the form (40) for some polynomials $\rho'_1, \ldots, \rho'_\ell, q'_1, \ldots, q'_{L'} \in \mathbb{Z}[n]$, 1-bounded functions $f'_1, \ldots, f'_\ell \in L^\infty(\mu)$ and sequences of functions $\mathcal{D}'_1, \ldots, \mathcal{D}'_{L'} \in \mathfrak{D}$. Moreover, the indexing tuple $\eta' \in [\ell]^\ell$ is obtained from $\eta$ by changing $\eta_m$ into $\eta_i$ for some $i \neq m$, i.e. the passage from (35) to (40) consists in replacing $T_{\eta_m}$ at index $m$ with $T_{\eta_i}$. Importantly, the new average (40) satisfies several key properties: (i) it has a lower type than the original average, so that we can argue by induction; (ii) it retains the good ergodicity property of the original average; (iii) the functions $f'_j$ in it satisfy some invariance properties; (iv) as long as these invariance properties are satisfied, the new average (40) is controlled by the seminorm $|||f'_j|||_{s, T_{\eta'_j}}$ for each $j \in [\ell]$ and some $s \in \mathbb{N}$.
Proposition 6.1 explains how exactly we pass from averages (35) to (40) so that property (i) is satisfied, and Proposition 6.4 establishes property (ii). Proposition 6.5 then ensures that the functions $f'_j$ in (40) satisfy the required invariance properties. We note, though, that the new average (40) need not be controllable. For instance, if we take the average corresponding to the tuple (38), then in the ping step we replace $T_5^{n^2+n}$ by the same iterate of one of $T_1, T_2, T_3, T_4$. Whichever of these we pick, the resulting average is uncontrollable; for instance, the tuple (39) corresponds to replacing $T_5^{n^2+n}$ by $T_1^{n^2+n}$. Hence, controllability may not be preserved while performing the procedure outlined above.
In the pong step of the smoothing argument for (35), we deal with averages of the form (41). Crucially, each function $f''_j$ is invariant under some composition of $T_{\eta_j}$ and $T_j$. This allows us to replace (some iterate of) $T_{\eta_j}$ in (41) by (some iterate of) $T_j$, a procedure that we call flipping, and to show that an average (41) essentially equals an average of the form (42) of length $\ell - 1$. The details of how flipping is performed are presented in Proposition 6.6. An inductive application of a suitable modification of Theorem 1.1 then gives control of (42) by a $T_j$-seminorm of $f'''_j$ for each $j \neq i$, and the invariance property of $f''_j$ translates it into control of (41) by a $T_{\eta_j}$-seminorm of $f''_j$ for each $j \neq i$. A straightforward argument analogous to the one at the end of the proof of Proposition 4.2 gives control of (35) by a $T_{\eta_m}$-seminorm of $f_m$.
If the average (35) is uncontrollable, then we proceed rather differently. The previous strategy breaks down right at the start since there is no index $m$ satisfying the controllability condition. Consequently, whichever index $m$ with $\eta_m \in I_{t_w}$ we take, we cannot employ Proposition 3.7 to bound the average (35) by $|||f_m|||_{b_1, \ldots, b_{s+1}}$ for nonzero vectors $b_1, \ldots, b_{s+1}$. What we use instead is the inductive assumption that the functions $f_j$ at indices $j$ with $\eta_j \in I_{t_w}$ are invariant under a composition of (some power of) $T_{\eta_j}$ and (some power of) $T_j^{-1}$. Using this invariance property, we perform flipping once more to replace the original average (35) by a new average (43) in which $\eta''''_j = j$ whenever $\eta_j \in I_{t_w}$; the details are provided in Corollary 6.7. This new average has the good ergodicity property and is controllable. Importantly, it has a lower type, as established in Proposition 6.8. We can then obtain seminorm control of (35) by inductively invoking the seminorm control of (43), which is in turn proved by the smoothing argument for controllable averages described above.
Thus, whether the tuple $\mathcal{T}$ of a non-basic type is controllable or not, the idea is to control it by a Gowers-Host-Kra seminorm by invoking seminorm control for tuples of lower type or shorter length that naturally appear when examining $\mathcal{T}$. If the tuple $\mathcal{T}$ of type $w$ is controllable, we will invoke seminorm control for tuples of type $w' < w$ in the ping step of the smoothing argument. Specifically, the new type $w'$ is obtained from $w$ by the type operation defined in (45). In the pong step of the smoothing argument, we will use seminorm control for tuples of length $\ell - 1$. If the tuple $\mathcal{T}$ is uncontrollable, we will invoke seminorm control for tuples of type $w'$ satisfying $w'_t = w_t$ for $t \in [\kappa]$ and $w'_{\kappa+1} = 0 < w_{\kappa+1}$ for some $\kappa \in [K_2 - 1]$; the details are given in Proposition 6.8(v). The way in which we apply seminorm control for tuples of lower type motivates our somewhat unusual choice of ordering on types.
Reducing to tuples of lower type in this way and noting that tuples of length $\ell$ can have at most $(\ell + 1)^\ell$ distinct types, we arrive after finitely many steps at tuples of basic type $w = (K_3, 0, \ldots, 0)$, i.e. those in which all the transformations with iterates of maximum degree come from the same class $I_1$. Tuples of basic type will serve as the base case of our induction. For instance, the tuple $(T_1^{n^2}, T_2^{n^2}, T_2^{n^2+n})$ from Example 1 has basic type $(3, 0)$ because it only involves the transformations $T_1, T_2$, whose indices belong to the set $I_1 = \{1, 2\}$ (corresponding to the polynomial $n^2$); however, the type $(2, 1)$ of $(T_1^{n^2}, T_2^{n^2}, T_3^{n^2+n})$ is not basic because this tuple involves both the transformations $T_1, T_2$ and the transformation $T_3$, whose index comes from the set $I_2 = \{3\}$ (corresponding to the polynomial $n^2 + n$).

Further maneuvers and obstructions for longer families
Having presented the general strategy for proving Theorem 1.1 for longer families, we now discuss in detail the specific maneuvers outlined in Section 5.2. In this section, we state and prove various partial results that give substance to these moves. We also discuss a number of obstructions that appear in the process and have to be overcome before we can give a complete proof of Theorem 1.1. All of the above is illustrated with examples that will hopefully make the abstract statements in this and the next section more comprehensible to the reader. We then move on in Section 7 to prove Theorem 1.1.
The plan for this section is as follows. In Section 6.1, we discuss how to obtain a tuple of lower type in the ping step of the seminorm smoothing argument for controllable tuples. Section 6.2 exhibits the necessity of assuming that the functions appearing in the averages (40) have some invariance properties. In particular, we show through examples how these properties are essential for tackling tuples of basic type and for performing the pong step of the seminorm smoothing argument for controllable tuples. We also give details of the flipping procedure that relies on these invariance properties. Subsequently, we discuss in Section 6.3 how flipping can be used to reduce an uncontrollable tuple to a controllable tuple of lower type. Finally, we combine the details of the aforementioned moves in Section 6.4 and show how we can reach a tuple of basic type in a finite number of steps.
6.1. Reducing controllable tuples to tuples of lower type in the ping step. As explained in Section 5.2, in the ping part of the smoothing argument, we will replace the tuple $\mathcal{T}$ by a tuple $\mathcal{T}' = (T_{\eta'_j}^{\rho'_j(n)})_{j \in [\ell]}$. The new indexing tuple $\eta'$ will be defined via the operation
$$(\tau_{mi}\eta)_j := \begin{cases} \eta_i, & j = m,\\ \eta_j, & j \neq m, \end{cases}$$
for some distinct values $m, i \in \mathcal{L}$. This indexing tuple corresponds to replacing the term $T_{\eta_m}^{\rho_m(n)}$ by $T_{\eta_i}^{\rho'_m(n)}$ and all the other terms $T_{\eta_j}^{\rho_j(n)}$ by $T_{\eta_j}^{\rho'_j(n)}$. The tuple $\mathcal{T}'$ has to be chosen carefully: it must preserve the good ergodicity property and allow for seminorm control. Lastly, it must have a lower type. For this reason, we let $\mathrm{Supp}(w) := \{t \in [K_2] : w_t > 0\}$, and if there exist distinct integers $t_1, t_2 \in \mathrm{Supp}(w)$, we define the type operation $\sigma_{t_1t_2}w$ by the formula
$$(\sigma_{t_1t_2}w)_t := \begin{cases} w_t - 1, & t = t_1,\\ w_t + 1, & t = t_2,\\ w_t, & \text{otherwise}. \end{cases} \qquad (45)$$
For instance, $\sigma_{32}(3, 2, 2) = (3, 3, 1)$. As a consequence of our ordering on types, we have $\sigma_{t_1t_2}w < w$ whenever $t_2 < t_1$ (the assumption $w_{t_2} > 0$ is crucial here), so in particular $(3, 3, 1) < (3, 2, 2)$. Proposition 6.1, which we are about to state, specifies how these tuples of lower type are picked, what form they take, and what properties they enjoy. It will be used in the smoothing argument in Proposition 7.5: the tuple of lower type for which we invoke the induction hypothesis in the ping step is the one constructed in Proposition 6.1.
Proposition 6.1 (Type reduction for controllable tuples). Let $\ell \in \mathbb{N}$, let $\eta \in [\ell]^\ell$ be an indexing tuple, and let $\rho_1, \ldots, \rho_\ell \in \mathbb{Z}[n]$ be polynomials with leading coefficients $b_1, \ldots, b_\ell$. Let also $(X, \mathcal{X}, \mu, T_1, \ldots, T_\ell)$ be a system and $\mathcal{T}$ be a tuple of a non-basic type $w$ whose last nonzero index is $t_w$. Suppose that $\mathcal{T}$ is controllable. Then there exists $\lambda \in \mathbb{N}$ such that for every $r \in \{0, \ldots, \lambda - 1\}$, there exist an index $t'_w \in \mathrm{Supp}(w)$ distinct from $t_w$, an index $i \in [\ell]$ with $\eta_i \in I_{t'_w}$, and a tuple $\mathcal{T}'$ satisfying the following properties.
Proof. Let $t_w$ be the last nonzero index of the type $w$ of $\mathcal{T}$. By the controllability of the tuple, there exists $m \in [\ell]$ with $\eta_m \in I_{t_w}$ such that for every $m' \neq m$ with $\eta_{m'} = \eta_m$, the polynomials $\rho_{m'}$ and $\rho_m$ are distinct. Let $t'_w \in \mathrm{Supp}(w)$ be an index different from $t_w$ (it exists since the type $w$ is non-basic) and $i \in [\ell]$ be an index with $\eta_i \in I_{t'_w}$. We define $\eta' := \tau_{mi}\eta$, meaning that we replace $T_{\eta_m}$ by $T_{\eta_i}$ and keep the other transformations the same.
We let $\lambda \in \mathbb{N}$ be the smallest number for which $\frac{\lambda}{b_m}\rho_m \in \mathbb{Z}[n]$ (in particular, $b_m$ then divides the coefficients of the polynomial $\rho_m(\lambda n + r) - \rho_m(r)$ for every $r \in \mathbb{Z}$). We also fix an arbitrary $r \in \{0, \ldots, \lambda - 1\}$. We then define the new polynomials $\rho'_1, \ldots, \rho'_\ell$ by the formula … . The resulting tuple $\mathcal{T}'$ has the type $w' = \sigma_{t_w t'_w}w$, which is strictly smaller than $w$ because $t'_w < t_w$ (this follows from $t'_w \neq t_w$ and the assumption that $t_w$ is the last nonzero index).
Example 5 (Examples of type reduction). We show how Proposition 6.1 has been implicitly applied to the two tuples from Section 4. … The tuple $(T_1^{n^2}, T_2^{n^2}, T_3^{2n^2+n})$ presented at the end of Section 4 also has type $(2, 1)$, corresponding to the partition $I_1 = \{1, 2\}$, $I_2 = \{3\}$, and its good ergodicity property also states that $\mathcal{I}(T_1T_2^{-1}) = \mathcal{I}(T_1) \cap \mathcal{I}(T_2)$. In the ping step of the smoothing argument, we obtained (upon taking $r_0 = 1$) the new tuple $(T_1^{4n^2+4n}, T_2^{4n^2+4n}, T_2^{4n^2+5n})$ by performing the operation $\tau_{32}$. This new tuple also has the indexing tuple $(1, 2, 2)$ and basic type $(3, 0)$, and its ergodicity property is the same as for the original tuple.
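The arithmetic behind Example 5 can be replayed in a few lines. The Python sketch below (hypothetical helper names; a polynomial is a dict sending each degree $k \ge 1$ to its coefficient) computes the smallest $\lambda$ with $\frac{\lambda}{b_m}\rho_m \in \mathbb{Z}[n]$, forms $\rho_j(\lambda n + r) - \rho_j(r)$ by the binomial theorem, and implements $\tau_{mi}$ and $\sigma_{t_1t_2}$. Since the general formula for $\rho'_1, \ldots, \rho'_\ell$ in Proposition 6.1 is not reproduced here, the sketch only mimics the worked example: the shifted $m$-th polynomial is divided by $b_m$ and the others are left as they are. This is an assumption, not the general recipe.

```python
from fractions import Fraction
from math import comb


def smallest_lambda(rho):
    """Smallest lambda >= 1 with (lambda / b) * rho in Z[n], b the leading coefficient."""
    b = rho[max(rho)]
    lam = 1
    while any((lam * c) % b for c in rho.values()):
        lam += 1                       # terminates at lam = |b| at the latest
    return lam


def shift(rho, lam, r):
    """Coefficients of rho(lam*n + r) - rho(r), expanded with the binomial theorem."""
    out = {}
    for k, c in rho.items():
        for i in range(1, k + 1):      # the i = 0 terms add up to rho(r) and cancel
            out[i] = out.get(i, 0) + c * comb(k, i) * lam**i * r**(k - i)
    return {i: v for i, v in out.items() if v != 0}


def divide(rho, b):
    """Divide every coefficient by b, insisting that the result lies in Z[n]."""
    q = {k: Fraction(c, b) for k, c in rho.items()}
    assert all(v.denominator == 1 for v in q.values()), "not an integer polynomial"
    return {k: int(v) for k, v in q.items()}


def tau(eta, m, i):
    """(tau_{mi} eta)_j = eta_i if j = m, and eta_j otherwise (1-based)."""
    return tuple(eta[i - 1] if j == m else e for j, e in enumerate(eta, start=1))


def sigma(w, t1, t2):
    """(sigma_{t1 t2} w): subtract 1 at t1 and add 1 at t2 (1-based)."""
    out = list(w)
    out[t1 - 1] -= 1
    out[t2 - 1] += 1
    return tuple(out)


# Example 5: (T_1^{n^2}, T_2^{n^2}, T_3^{2n^2+n}) with m = 3, i = 2 and r_0 = 1
rhos = [{2: 1}, {2: 1}, {1: 1, 2: 2}]
m, i, r = 3, 2, 1
lam = smallest_lambda(rhos[m - 1])
new = [shift(p, lam, r) for p in rhos]
new[m - 1] = divide(new[m - 1], rhos[m - 1][max(rhos[m - 1])])
print(lam, new, tau((1, 2, 3), m, i), sigma((3, 2, 2), 3, 2))
# 2 [{1: 4, 2: 4}, {1: 4, 2: 4}, {1: 5, 2: 4}] (1, 2, 2) (3, 3, 1)
```

Reading `{1: 5, 2: 4}` as $5n + 4n^2$, the output recovers the tuple $(T_1^{4n^2+4n}, T_2^{4n^2+4n}, T_2^{4n^2+5n})$, the indexing tuple $(1, 2, 2)$ and the identity $\sigma_{32}(3, 2, 2) = (3, 3, 1)$.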
In particular, we get the following corollary of interest to us, which follows from a straightforward combination of Proposition 6.1 and Lemma 6.2.
Corollary 6.3 (Type reduction preserves descendancy). Let $\ell \in \mathbb{N}$, let $\eta \in [\ell]^\ell$ be an indexing tuple, let $p_1, \ldots, p_\ell, \rho_1, \ldots, \rho_\ell \in \mathbb{Z}[n]$ be polynomials, and let $(X, \mathcal{X}, \mu, T_1, \ldots, T_\ell)$ be a system. Suppose that $\mathcal{T}$ is a tuple of a non-basic type $w$ that is a descendant of … . Then the tuple $\mathcal{T}'$ in Proposition 6.1 … .
Proof. Suppose that $\rho_1, \ldots, \rho_\ell$ are descendants of $p_1, \ldots, p_\ell$ along $\eta$. Letting $a_j, b_j$ be the leading coefficients of $p_j, \rho_j$ respectively and $\rho'_j$ be as defined in Proposition 6.1, we have … for $j \neq m$ and … for $j = m$, where we use $\eta'_m = \eta_i$ and $\eta'_j = \eta_j$ for $j \neq m$. Hence, the polynomials $\rho'_1, \ldots, \rho'_\ell$ satisfy the condition of Lemma 6.2, implying the claim.
Descendant tuples enjoy the following important properties.
Property (i) ensures that when passing to descendants, we do not need to redefine the partition I 1 , . . . , I K 1 . Property (ii) is crucial because it shows that descendants retain the essential ergodicity properties of the original tuple.
Proof. Part (i) follows from the fact that for every $j \in [\ell]$, the polynomials $p_j$ and $\rho_j$ have the same degree, and that $p_{j_1}, p_{j_2}$ are linearly dependent if and only if $\rho_{j_1}, \rho_{j_2}$ are. We therefore move on to proving part (ii). Let $b_j$ be the leading coefficient of $\rho_j$, $a_j$ be the leading coefficient of $p_j$, and $d_j := \deg p_j = \deg \rho_j$ for every $j \in [\ell]$. To check that $(T_{\eta_j}^{\rho_j(n)})_{j \in [\ell]}$ has the good ergodicity property, we need to show that if $\eta_{j_1}, \eta_{j_2}$ are distinct elements of the same set $I_t$, then … (46), where … for $j = j_1, j_2$. By construction, $b_j = a_{\eta_j}\lambda^{d_j}$ for some $\lambda \in \mathbb{N}$, and so
$$\beta_j = \frac{a_{\eta_j}}{\gcd(a_{\eta_{j_1}}, a_{\eta_{j_2}})} =: \alpha_{\eta_j} \qquad (47)$$
for $j = j_1, j_2$. The assumption $\eta_{j_1}, \eta_{j_2} \in I_t$ for some fixed $t$ implies that $p_{\eta_{j_1}}, p_{\eta_{j_2}}$ are linearly dependent, and additionally $p_{\eta_{j_1}}/\alpha_{\eta_{j_1}} = p_{\eta_{j_2}}/\alpha_{\eta_{j_2}}$. Since $\alpha_{\eta_{j_1}}, \alpha_{\eta_{j_2}}$ are coprime, the good ergodicity property of $(T_j^{p_j(n)})_{j \in [\ell]}$ implies that … . The equality (46) follows from this and the identification (47).
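The arithmetic behind (47) can be checked numerically. The Python sketch below (hypothetical helpers `alphas` and `same_after_division`) takes two linearly dependent polynomials with leading coefficients $a, a'$, returns $\alpha = a/g$ and $\alpha' = a'/g$ with $g = \gcd(a, a')$, and confirms that $p/\alpha = p'/\alpha'$ with $\alpha, \alpha'$ coprime. It also checks that multiplying both leading coefficients by the same $\lambda^d$ leaves these reduced ratios unchanged; reading $\beta_j$ as $b_j/\gcd(b_{j_1}, b_{j_2})$ is an assumption, since its definition is not reproduced above.

```python
from math import gcd


def alphas(p, q):
    """p, q: coefficient lists (p[k] = coefficient of n**(k+1)) without trailing zeros."""
    a, a2 = p[-1], q[-1]               # leading coefficients
    g = gcd(a, a2)
    return a // g, a2 // g


def same_after_division(p, q, al, al2):
    """p / al == q / al2, checked without fractions as al2 * p == al * q."""
    return [al2 * c for c in p] == [al * c for c in q]


p, q = [2, 4], [3, 6]                  # 4n^2 + 2n and 6n^2 + 3n
al, al2 = alphas(p, q)
print(al, al2, gcd(al, al2), same_after_division(p, q, al, al2))   # 2 3 1 True

# b_j = a * lam**d for both polynomials leaves the reduced ratios unchanged
d = 2
for lam in range(1, 6):
    b, b2 = p[-1] * lam**d, q[-1] * lam**d
    assert (b // gcd(b, b2), b2 // gcd(b, b2)) == (al, al2)
```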
If we define the partition$^{7}$ $I_1 = \{1, 2, 3\}$, $I_2 = \{5, 6\}$, $I_3 = \{4\}$, $I_4 = \{7\}$, then the tuple (48) has type $w_0 = (3, 2, 1)$; we recall that the term $T_7^{n}$ plays no part in the type considerations since the polynomial $\rho_{07}(n) = n$ has a lower degree. The tuple (48) has the basic indexing tuple $\eta_0 = (1, 2, 3, 4, 5, 6, 7)$. The tuple is controllable, and 4 satisfies the controllability condition, so in the first step we replace $T_4$ (this corresponds to us wanting to first get $T_4$-seminorm control over the tuple (48)). We are then provided with an index $i_0 \in I_1 \cup I_2$ (say, $i_0 = 1$), and we get the new indexing tuple
$$\eta_1 := \tau_{m_0i_0}\eta_0 = \tau_{41}\eta_0 = (1, 2, 3, 1, 5, 6, 7).$$
The leading coefficient 2 of $2n^2 + n$ does not divide the linear coefficient, and the smallest $\lambda_0 \in \mathbb{N}$ such that 2 divides the coefficients of $\lambda_0(2n^2 + n)$ is $\lambda_0 = 2$. In performing the ping step of the seminorm smoothing argument for the tuple (48), we will want to apply the $T_1T_4^{-2}$-invariance of some function $u$ to replace $T_4^{2n^2+n}u$ by $T_1^{q(n)}u'$ for some $q \in \mathbb{Z}[n]$ and a function $u'$ related in some way to $u$. We cannot do this directly since $\frac{1}{2}(2n^2 + n) \notin \mathbb{Z}[n]$, but we can do it "piecewise" by splitting $\mathbb{N}$ into the arithmetic progressions $(2\mathbb{N} + r)_{r = 0, 1}$ and considering the two cases separately (see the sketch of the seminorm smoothing argument for $n^2, n^2, 2n^2 + n$ at the end of Section 4 for how this was done for that family). We therefore replace the original polynomials $\rho_{01}, \ldots, \rho_{07}$ by the new polynomials
$$\rho_{1j}(n) := \begin{cases} \rho_{0j}(2n + r_0) - \rho_{0j}(r_0), & j \neq 4,\\ \frac{1}{2}\big(\rho_{04}(2n + r_0) - \rho_{04}(r_0)\big), & j = 4, \end{cases}$$
for some $r_0 \in \{0, 1\}$ (the choice of $r_0$ is not ours). Assuming that $r_0 = 1$, we obtain the new tuple (49). The type of the new tuple is $w_1 = \sigma_{31}w_0 = (4, 2, 0)$, since we now have four transformations with indices coming from $I_1$ and two transformations coming from $I_2$. This type is lower than the original type $w_0$, and so we have successfully obtained a tuple of lower type.
The new tuple is controllable, with m = 5, 6 both satisfying the controllability condition.
Although we replaced the polynomials $\rho_{01}, \ldots, \rho_{07}$ by new ones, we note that for any $j_1, j_2 \in [7]$, the polynomials $\rho_{1j_1}, \rho_{1j_2}$ are linearly dependent if and only if $\rho_{0j_1}, \rho_{0j_2}$ are; moreover, if they are linearly dependent, then $\rho_{1j_1}/c_1 = \rho_{1j_2}/c_2$ if and only if $\rho_{0j_1}/c_1 = \rho_{0j_2}/c_2$ for any nonzero integers $c_1, c_2$. In addition, if $\eta_{1j_1} = \eta_{1j_2}$, then the leading coefficients of $\rho_{1j_1}$ and $\rho_{1j_2}$ are identical. These two observations ensure that the ergodicity conditions on … , which constitute the assumption that the original tuple (48) has the good ergodicity property, carry over to the new tuple (49), implying that it also enjoys the good ergodicity property. This exemplifies the claim from Proposition 6.4 that descendants of tuples with the good ergodicity property inherit this property.
The new tuple (50) has type $w_2 = \sigma_{21}w_1 = (5, 1, 0)$, which is still not basic, and so we continue the procedure one more time. The only index left in $I_2$ is 6, and it satisfies the controllability condition, so we replace $T_6$ this time. We are given an index $i_2 \in I_1$ (say, $i_2 = 1$), so that
$$\eta_3 := \tau_{61}\eta_2 = (1, 2, 3, 1, 3, 1, 7).$$
Lastly, we observe that $\eta_3|_{I_1} = \eta_3|_{\{1,2,3\}} = (1, 2, 3)$ has not changed during the procedure, i.e. while performing the type reduction procedure, we did not replace the transformations at indices from $I_1$. This is a special case of property (vi) from Proposition 6.8, which will play an important role in the proof of Proposition 7.2, a seminorm control argument for tuples of basic type.
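The bookkeeping in this example can be replayed mechanically. The Python sketch below (hypothetical helper names, 1-based indices) applies the three replacements and recomputes the types. The tuples $\eta_1 = (1, 2, 3, 1, 5, 6, 7)$ and $\eta_3 = (1, 2, 3, 1, 3, 1, 7)$ come from the text; the intermediate $\eta_2 = \tau_{53}\eta_1$ (that is, $i_1 = 3$) is inferred, since the second step replaced the transformation at index 5 and the third step the one at index 6.

```python
def type_of(eta, classes, L, K2):
    cell = {j: t for t, cls in enumerate(classes) for j in cls}
    w = [0] * K2
    for j in L:
        t = cell[eta[j - 1]]
        if t < K2:
            w[t] += 1
    return tuple(w)


def tau(eta, m, i):
    return tuple(eta[i - 1] if j == m else e for j, e in enumerate(eta, start=1))


def sigma(w, t1, t2):
    out = list(w)
    out[t1 - 1] -= 1
    out[t2 - 1] += 1
    return tuple(out)


classes = [[1, 2, 3], [5, 6], [4], [7]]   # I_1, I_2, I_3, I_4
L, K2 = [1, 2, 3, 4, 5, 6], 3             # index 7 carries the linear polynomial n
eta0 = (1, 2, 3, 4, 5, 6, 7)
eta1 = tau(eta0, 4, 1)                    # replace T_4 by T_1
eta2 = tau(eta1, 5, 3)                    # replace T_5 by T_3 (inferred step)
eta3 = tau(eta2, 6, 1)                    # replace T_6 by T_1
types = [type_of(e, classes, L, K2) for e in (eta0, eta1, eta2, eta3)]
print(eta3, types)
# the same types via the type operation: sigma_31, then sigma_21 twice
assert types[1:] == [sigma(types[0], 3, 1), sigma(types[1], 2, 1), sigma(types[2], 2, 1)]
```

The final type $(6, 0, 0)$ is basic, and $\eta_3$ agrees with $\eta_0$ on $I_1 = \{1, 2, 3\}$.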
6.2. The role of invariance properties. Proposition 6.1 ensures that the lower type tuples to which we pass in the ping step of the smoothing argument have the good ergodicity property. But this is not enough. For more complicated tuples, we also need to assume that the functions appearing in the associated average have some invariance properties; otherwise, the induction breaks. The example that we present now displays the necessity of this extra information. We sketch how, by reducing the original tuple to tuples of shorter length or lower type, we eventually arrive at averages for which we cannot obtain seminorm control unless the functions appearing in them satisfy certain invariance properties. We emphasise that our goal in this example is not to give a complete proof of seminorm control, but rather to point out the necessity of the invariance assumptions. Therefore, we assume without proof, when convenient, that we have seminorm control over certain tuples of lower type or shorter length. Suppose that the limit of (51) is nonzero. Arguing as in the proof of Proposition 4.1, we deduce that … (52). Each of the averages inside the liminf above is of the form
$$T_1^{n^2}g_1 \cdot T_2^{n^2}g_2 \cdot T_3^{n^2+n}g_3 \cdot T_2^{n^2+n}g_4 \qquad (53)$$
for 1-bounded functions $g_1, g_2, g_3, g_4 \in L^\infty(\mu)$, of which $g_4$ is $T_4T_2^{-1}$-invariant. The averages (53) are controllable, with 3 satisfying the controllability condition, and they have type $(3, 1)$, which is lower than the type $(2, 2)$ of the original average (51). Assuming inductively that we have seminorm control of the averages (53) by a $T_3$-seminorm of $f_3$,$^{8}$ we can deduce from (52) (as in the proof of Proposition 4.2) that … (54).
$^{8}$ While we only use this particular control, our inductive assumption will guarantee that we control the averages (53) by a relevant seminorm of other functions, too.
for some $\mathcal{D}_j \in \mathfrak{D}_{s_1}$. Each average in (54) takes the form … (55). Assuming inductively that averages of the form (55) are controlled by a $T_4$-seminorm of the last term, we get the desired claim $|||f_4|||_{b_1, \ldots, b_s, e_4^{\times s'}} > 0$ for some $s' \in \mathbb{N}$ using an argument similar to the one in the proof of Proposition 4.2.
We have shown how seminorm control of the original average (51) by a $T_4$-seminorm of $f_4$ follows from the seminorm control of the averages (53) and (55). We have not proved, however, that these auxiliary averages are indeed controlled by Gowers-Host-Kra seminorms, assuming instead that this follows by induction. It turns out that obtaining seminorm control of the averages (53) involves an interesting twist, in that the argument makes essential use of the assumption that the function $g_4$ is $T_4T_2^{-1}$-invariant. We sketch the steps taken in the seminorm smoothing argument for this average under the extra invariance assumption to show where this invariance property comes up and why it is necessary.
In proving seminorm control of (53), we first prove that the average is controlled by a $T_3$-seminorm of $g_3$, since $T_3$ is the only transformation with index in $I_2$. Arguing as above (using Proposition 3.7 for (53), assuming that the $L^2(\mu)$ limit of (53) is nonzero and mimicking the proof of Proposition 4.2), we deduce that … respectively. Both of them have basic type.
We show that for arbitrary $g'_1, g'_2, g'_3, g'_4$, without the aforementioned invariance assumptions, we would not be able to control the average (56) by Gowers-Host-Kra seminorms; specifically, we could not control it by a $T_2$-seminorm of $g'_4$. Conversely, this becomes possible once the invariance assumptions are imposed. Assuming for simplicity that $g'_1 = g'_2 := 1$, we have that the second average equals … (57), and so without any additional assumptions, the average (56) is in general not controlled by a $T_2$-seminorm of $g'_3$ or a $T_2$-seminorm of $g'_4$. However, the invariance assumptions on … . Using the invariance property once again, this time alongside Lemma 3.5, we get $|||g'_4|||_{s',T_4} = |||g'_4|||_{s',T_2}$, so a $T_2$-seminorm of $g'_4$ controls (57) and hence (56).
To get control over (56) by a $T_2$-seminorm of $g'_4$ without any simplifying assumptions on $g'_1, g'_2$, we have to run a more complicated argument. Combining Proposition 3.7, the ergodicity condition on $T_1T_2^{-1}$ and Lemma 2.2, we first obtain control of (56) by a $T_1$-seminorm of $g'_1$ and a $T_2$-seminorm of $g'_2$. Assuming that the $L^2(\mu)$ limit of the average (56) is positive, we use this newly established control, decompose $g'_1$ using Proposition 2.3 and apply the pigeonhole principle to show that the average … (58) has a nonvanishing limit. The invariance properties of $g'_3$ and $g'_4$ imply that the average (58) equals … (59), for which seminorm control by a $T_4$-seminorm of $g'_4$ follows by inductively invoking seminorm control for averages of length 3. The invariance property of $g'_4$ implies once again that $|||g'_4|||_{s',T_4} = |||g'_4|||_{s',T_2}$ for any $s' \in \mathbb{N}$. It follows that a $T_2$-seminorm of $g'_4$ controls (59), and hence also (58) and (56).
The invariance properties also come up in the pong step of the smoothing argument for (53). In this part, we encounter averages of the form … (60). Moreover, the function $g''_4$ is $T_4T_2^{-1}$-invariant because it is essentially a multiplicative derivative of $g'_4$.
For similar reasons as before, such averages could not be controlled by Gowers-Host-Kra seminorms for arbitrary $g''_2, g''_3, g''_4$ without the invariance assumption. However, thanks to the invariance assumption, the average (60) equals … (61), which is controlled$^{9}$ by $|||g''_4|||_{s',T_4}$ for some $s' \in \mathbb{N}$. Using the invariance property of $g''_4$ again together with Lemma 3.5, we deduce that $|||g''_4|||_{s',T_4} = |||g''_4|||_{s',T_2}$, and so a $T_2$-seminorm of $g''_4$ does control (60). An argument similar to the one at the end of the proof of Proposition 4.2 implies that a $T_2$-seminorm of $g_4$ controls (53).
The example above shows that it is crucial to keep track of the invariance properties of the functions appearing in our averages; these invariance properties turn out to be indispensable for applying Proposition 3.7 to averages of basic type, for obtaining seminorm control in the pong step of the argument, and, as we shall see later on, for handling uncontrollable averages. Using the invariance property to replace an average like (58) or (60), for which we cannot have seminorm control, by an average like (59) or (61) respectively, which is controlled by Gowers-Host-Kra seminorms, exemplifies the flipping technique that will be presented in detail in Proposition 6.6.
Recalling how we performed the ping and pong steps in Examples 1, 2, and 7, we observe that in the ping step, we pass from the average … (62) to averages … (63), where $f_{j,h,h'} := \Delta_{b_1, \ldots, b_s; h-h'}f_j$ for some vectors $b_1, \ldots, b_s \in \mathbb{Z}^\ell$. In particular, the functions $f_{j,h,h'}$ are invariant under whatever transformations the functions $f_j$ are invariant under. Moreover, the functions $u_{h,h'}$ are invariant under $T_{\eta_m}^{a_m}T_{\eta_i}^{-a_i}$, where $a_m$ and $a_i$ are the leading coefficients of $\rho_m$ and $\rho_i$, but they also retain whatever invariance property $f_m$ has. Thus, by passing from (62) to (63), we do not lose any invariance properties of the original functions, but rather gain new ones.
Similarly, in the pong step, we pass to averages … , and the functions $f_{j,h,h'}$ retain whatever invariance properties the functions $f_j$ have. Thus, the functions $(f_1, \ldots, f_\ell)$ get replaced by … (64) in the ping step and by … (65) in the pong step. We now formalise the idea that these new families of functions retain the original invariance properties and gain new ones.
Definition (Good invariance property). Let $\gamma \in \mathbb{N}$. We say that the tuple of functions $(f_1, \ldots, f_\ell)$ has the $\gamma$-invariance property along $\eta$ with respect to polynomials $p_1, \ldots, p_\ell$ with leading coefficients $a_1, \ldots, a_\ell$ if for every $j \in [\ell]$, the function $f_j$ is invariant under $(T_{\eta_j}^{a_{\eta_j}}T_j^{-a_j})^{\gamma}$. Let $I$ be a (possibly infinite) indexing set. We say that a collection $(f_{i1}, \ldots, f_{i\ell})_{i \in I}$ has the good invariance property along $\eta$ with respect to polynomials $p_1, \ldots, p_\ell$ if there exists $\gamma \in \mathbb{N}$ such that $(f_{i1}, \ldots, f_{i\ell})$ has the $\gamma$-invariance property along $\eta$ with respect to $p_1, \ldots, p_\ell$ for every $i \in I$.
If η = (1, . . . , ℓ) is the identity tuple, there is nothing to check and any collection of functions has the 1-invariance property with respect to any polynomial family. The property only becomes nontrivial when η is not the identity tuple.
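Writing a composition $T_1^{v_1} \cdots T_\ell^{v_\ell}$ of the commuting transformations as the vector $v \in \mathbb{Z}^\ell$, the $\gamma$-invariance property asks $f_j$ to be invariant under the composition with vector $\gamma(a_{\eta_j}e_{\eta_j} - a_je_j)$. The Python sketch below (hypothetical helper `invariance_vectors`) lists these vectors; for the identity tuple all of them vanish, matching the remark above. The leading coefficients used here are a hypothetical choice echoing the type-reduction example in Section 6.1, where $\eta_1 = (1, 2, 3, 1, 5, 6, 7)$ and the fourth polynomial is $2n^2 + n$.

```python
def invariance_vectors(eta, a, gamma=1):
    """For each j, the exponent vector of (T_{eta_j}^{a_{eta_j}} T_j^{-a_j})^gamma (1-based eta)."""
    ell = len(eta)
    vectors = []
    for j, e in enumerate(eta, start=1):
        v = [0] * ell
        v[e - 1] += gamma * a[e - 1]
        v[j - 1] -= gamma * a[j - 1]   # cancels the previous line when e == j
        vectors.append(tuple(v))
    return vectors


a = (1, 1, 1, 2, 1, 1, 1)   # hypothetical leading coefficients a_1, ..., a_7
print(all(v == (0,) * 7 for v in invariance_vectors(tuple(range(1, 8)), a)))  # True
print(invariance_vectors((1, 2, 3, 1, 5, 6, 7), a)[3])   # (1, 0, 0, -2, 0, 0, 0)
```

The second output encodes $T_1T_4^{-2}$, the invariance invoked in the ping step of that example.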
In our arguments, we will ensure that the functions $f_j$ in the average obtained by a sequence of reductions from the original average have the good invariance property with respect to the original polynomials $p_1, \ldots, p_\ell$, i.e. there exists $\gamma \in \mathbb{N}$ such that for every $j \in [\ell]$, the function $f_j$ is invariant under $(T_{\eta_j}^{a_{\eta_j}}T_j^{-a_j})^{\gamma}$, where $a_j$ is the leading coefficient of the polynomial $p_j$. The need to keep track of the invariance property with respect to the original polynomials is explained in Example 8 below. Before we state this example, however, we prove that the invariance properties are preserved when passing from the tuple of functions $(f_1, \ldots, f_\ell)$ to the tuples (64) and (65).
Proposition 6.5 (Propagation of invariance properties). Let $\ell, s \in \mathbb{N}$, let $\eta \in [\ell]^\ell$ be an indexing tuple, $\eta' := \tau_{mi}\eta$ for distinct $m, i \in [\ell]$ be another indexing tuple, $p_1, \ldots, p_\ell \in \mathbb{Z}[n]$ be polynomials with leading coefficients $a_1, \ldots, a_\ell$, $(X, \mathcal{X}, \mu, T_1, \ldots, T_\ell)$ be a system, and $b_1, \ldots, b_s \in \mathbb{Z}^\ell$ be vectors. Suppose that for some $\gamma \in \mathbb{N}$, the functions $(f_1, \ldots, f_\ell)$ have … $(f'_{1,h}, \ldots, f'_{\ell,h})_{h \in \mathbb{Z}^s}$ has the good invariance property along $\eta'$ with respect to $p_1, \ldots, p_\ell$.
Proof of Proposition 6.5. For $j \neq m$, the functions $f_j$ are invariant under $(T_{\eta_j}^{a_{\eta_j}}T_j^{-a_j})^{\gamma}$ for some nonzero $\gamma \in \mathbb{Z}$ independent of $h \in \mathbb{Z}^s$, and so are their translations … . The identity $\eta'_j = \eta_j$, which holds for $j \neq m$, and the fact that $f'_{j,h}$ is a product of … by noting $\eta'_m = \eta_i$ and combining the two invariance properties that these functions enjoy. Letting $\gamma' := \operatorname{lcm}(\gamma, \gamma_1\gamma_2)$, it follows that for every $h \in \mathbb{Z}^s$, the collection $(f'_{1,h}, \ldots, f'_{\ell,h})$ has the $\gamma'$-invariance property along $\eta'$ with respect to $p_1, \ldots, p_\ell$.
To obtain the desired seminorm control over the intermediate tuples encountered in Proposition 6.1, it is not sufficient to keep track of the most immediate invariance properties. This is illustrated by the example below.

Example 8 (The necessity of composed invariance properties). Consider the average
It has length 6, degree 2 and type $(2, 2, 2)$, corresponding to the partition $I_1 = \{1, 2\}$, $I_2 = \{3, 4\}$, $I_3 = \{5, 6\}$. We assume that it has the good ergodicity property, i.e. … . Suppose we want to perform the seminorm smoothing argument to obtain control of the associated average by the $T_6$-seminorm of $f_6$. We iteratively pass to averages of lower type as in Proposition 6.1 (all of which turn out to be controllable), and we show that to get seminorm control for the average of basic type at which we arrive, we need to keep track not just of the latest invariance properties that the functions in the intermediate averages enjoy, but of all the invariance properties that the functions in earlier intermediate averages enjoyed.
Step 1: Reducing to an average of basic type.
In the ping step of the seminorm smoothing argument for (66), we replace $T_6$ in the original average (66) by some $T_i$ with $i \in [4] = I_1 \cup I_2$, arriving at, say, the averages […] The new tuple (67) has type (2, 3, 1) and is controllable, with the index 5 satisfying the controllability condition. The functions inside take the form […] where the functions $u_{1,h,h'}$ are $T_6T_4^{-1}$-invariant.
To obtain seminorm control of the average (67), we need to run the seminorm smoothing argument for this tuple. We first aim to control it by a $T_5$-seminorm of $f_{15}$, since 5 satisfies the controllability condition and is the only index left in $I_3$. Following Proposition 6.1, in the ping step of the smoothing argument we replace $T_5$ in (67) by some $T_i$ with $i \in [4]$. When $i = 1$, for instance, we end up with the averages […] The functions in (71) take the form […]. In particular, the functions $f_{54}, f_{55}, f_{56}$ retain the invariance properties of $f_{44}, f_{45}, f_{46}$, and $f_{53}$ is […]
Step 2: Handling an average of basic type.
The average (71) has basic type (6, 0, 0), and so we want to control it by appropriate seminorms using Proposition 3.7. We show that without the assumption that $f_{56}$ is invariant under both $T_4T_1^{-1}$ and $T_6T_4^{-1}$, we cannot control this average by a $T_1$-seminorm of $f_{56}$. We also show that this goal can be achieved once both assumptions are in place.
We first note that Proposition 3.7, the ergodicity condition $\mathcal{I}(T_1T_2^{-1}) = \mathcal{I}(T_1) \cap \mathcal{I}(T_2)$ and Lemma 2.2 allow us to control (71) by a $T_1$-seminorm of $f_{51}$ and by a $T_2$-seminorm of $f_{52}$.$^{10}$ Suppose that the $L^2(\mu)$ limit of (71) is positive. Decomposing $f_{51}$ using Proposition 2.3 and then applying the pigeonhole principle, we deduce the existence of $\mathcal{D} \in \mathfrak{D}$ such that […] We now want to obtain seminorm control of the average in (72) by inductively invoking seminorm control for some average of length 5. To this end, we attempt to proceed as in Example 7. That is, we use the invariance of $f_{53}, f_{54}, f_{55}, f_{56}$ under […], respectively, to conclude that the average in (72) equals […] However, without any extra information, we cannot control the average (73) using a […] We know nothing about the composition $T_5T_4^{-1}$, and so without additional input, we cannot control (74) by a Gowers–Host–Kra seminorm. This is the moment when we have to use the additional $T_6T_4^{-1}$-invariance of $f_{56}$. Since $T_4 f_{56} = T_6 f_{56}$, we can replace $T_4$ in (74) by $T_6$. Then Proposition 3.7 gives us control over (74) by $|||f_{56}|||_{e_6^{\times s}, (e_6 - e_5)^{\times s}}$ for some $s \in \mathbb{N}$. The ergodicity condition on $T_6T_5^{-1}$ and the $T_6T_1^{-1}$-invariance of $f_{56}$ then give
$$|||f_{56}|||_{e_6^{\times s}, (e_6 - e_5)^{\times s}} \le |||f_{56}|||_{2s, T_6} = |||f_{56}|||_{2s, T_1},$$
and so this latter seminorm controls (72), and hence also (71).
If we want to control (72) by a $T_1$-seminorm of $f_{56}$ without the simplifying assumptions on $\mathcal{D}$ and $f_{52}, f_{53}, f_{54}$, we proceed similarly.$^{11}$ Using the $T_6T_4^{-1}$-invariance of $f_{56}$, we rewrite (73) as […] We then inductively apply seminorm control for averages of length 5 of the form (75). This controls that average, and hence also (71), by a $T_6$-seminorm of $f_{56}$. Finally, $f_{56}$ is $T_6T_1^{-1}$-invariant, since it is both $T_6T_4^{-1}$-invariant and $T_4T_1^{-1}$-invariant. This fact and Lemma 3.5 give control of (71) by a $T_1$-seminorm of $f_{56}$.
We note that the argument above would not work if we only used the "new" property of $f_{56}$ of being invariant under $T_4T_1^{-1}$ rather than the "combined" property of being invariant under $T_6T_1^{-1}$. The key point of this example is that it is not enough to keep track of the new invariance properties obtained at each stage of the ping argument while forgetting the old ones. Rather, we need to keep track of invariance under a composition of the original and the most recent transformations, which in our example are $T_6$ and $T_1$.
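The following Python sketch makes the point of Example 8 concrete in a toy model of our own choosing; it is not part of the paper's argument. We take $X = \mathbb{Z}/12\mathbb{Z}$ with $T_1, T_4, T_6$ acting as rotations by 1, 5 and 7. A function that is only $T_4T_1^{-1}$-invariant need not be $T_6T_1^{-1}$-invariant. Adding $T_6T_4^{-1}$-invariance forces $T_6T_1^{-1}$-invariance, because $T_6T_1^{-1} = (T_6T_4^{-1})(T_4T_1^{-1})$. The same extra invariance is what gives $T_4 f = T_6 f$. The dictionary `ALPHA` and the helpers `ratio`, `is_invariant` and `act` are hypothetical.

```python
# Toy model (hypothetical): X = Z/12Z with rotations T_i(x) = x + ALPHA[i] mod 12.
N = 12
ALPHA = {1: 1, 4: 5, 6: 7}


def ratio(i, j):
    """Rotation amount of T_i T_j^{-1}."""
    return (ALPHA[i] - ALPHA[j]) % N


def is_invariant(f, s):
    """True if f(x + s) == f(x) for all x in Z/NZ."""
    return all(f[(x + s) % N] == f[x] for x in range(N))


def act(f, i):
    """The function T_i f, i.e. x -> f(T_i x) = f(x + ALPHA[i])."""
    return [f[(x + ALPHA[i]) % N] for x in range(N)]


new_only = [x % 4 for x in range(N)]   # invariant under T_4 T_1^{-1} (rotation by 4)
combined = [x % 2 for x in range(N)]   # also invariant under T_6 T_4^{-1} (rotation by 2)

for g in (new_only, combined):
    assert is_invariant(g, ratio(4, 1))

# Only the function with both properties is invariant under T_6 T_1^{-1},
# since T_6 T_1^{-1} = (T_6 T_4^{-1})(T_4 T_1^{-1}).
assert not is_invariant(new_only, ratio(6, 1))
assert is_invariant(combined, ratio(6, 1))

# T_6 T_4^{-1}-invariance is exactly what allows replacing T_4 by T_6.
assert act(combined, 4) == act(combined, 6)
assert act(new_only, 4) != act(new_only, 6)
```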
Suppose that the leading coefficients of $p_1, \ldots, p_\ell$ are all 1 and that the good invariance property takes the form of $f_j$ being invariant under $T_{\eta_j}T_j^{-1}$ for all $j \in [\ell]$. Then Proposition 6.6 is straightforward, and in fact for every $N \in \mathbb{N}$, we have […] and $|||f_j|||_{s, T_{\eta_j}} = |||f_j|||_{s, T_{\eta'_j}}$ for all $j \in [\ell]$, $s \in \mathbb{N}$. The more complicated statement of Proposition 6.6 is needed only because of tedious but uninspiring technicalities that arise when the polynomials $p_1, \ldots, p_\ell$ have leading coefficients different from 1. We used the flipping technique twice in Example 7. The first use was in (57). The second was in passing from (60) to (61), where seminorm control of the latter yielded seminorm control of the former. We also used it in Example 8 to get seminorm control of (71), and we will use it again shortly to handle uncontrollable tuples.
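When all leading coefficients equal 1, the flip reduces to the identity $T_{\eta_j}^{k} f_j = T_j^{k} f_j$ for all $k \in \mathbb{Z}$. Invariance under $T_{\eta_j}T_j^{-1}$ gives $T_{\eta_j} f_j = T_j f_j$, and commutativity lets one iterate. Hence every polynomial iterate $T_{\eta_j}^{\rho(n)}$ acting on $f_j$ can be replaced by $T_j^{\rho(n)}$. The Python sketch below checks this in a hypothetical rotation model on $\mathbb{Z}/30\mathbb{Z}$. The rotation amounts, the polynomial `rho` and the helper `power_action` are our own illustrative choices.

```python
# Toy model (hypothetical): X = Z/30Z, T_eta(x) = x + 4, T_j(x) = x + 10 (mod 30).
N = 30
ALPHA_ETA, ALPHA_J = 4, 10


def rho(n):
    """A sample integer polynomial with zero constant term (hypothetical)."""
    return n * n + 3 * n


def power_action(f, alpha, k):
    """T^k f for the rotation T(x) = x + alpha, i.e. x -> f(x + k*alpha)."""
    return [f[(x + k * alpha) % N] for x in range(N)]


# f is invariant under T_eta T_j^{-1}, the rotation by 4 - 10 = -6 (mod 30).
f = [x % 6 for x in range(N)]
assert f == power_action(f, ALPHA_ETA - ALPHA_J, 1)

# Flipping: T_eta^{rho(n)} f = T_j^{rho(n)} f for every n.
flips = [power_action(f, ALPHA_ETA, rho(n)) == power_action(f, ALPHA_J, rho(n))
         for n in range(-10, 11)]
assert all(flips)
```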
Proof of Proposition 6.6. For each $j \in [\ell]$, let $a_j$ be the leading coefficient of $p_j$. Let $\gamma_j \in \mathbb{N}$ be the smallest natural number such that $f_j$ is invariant under $(T_{\eta_j}^{a_{\eta_j}} T_j^{-a_j})^{\gamma_j}$; in particular, $\gamma_j = 1$ if $\eta_j = j$. Let $\gamma \in \mathbb{N}$ be the smallest natural number such that $a_{\eta_j}\gamma_j$ divides the coefficients of $\gamma\rho_j$ for every $j \in A$. We then define […] for some $r \in \{0, \ldots, \lambda - 1\}$ to be chosen later, and we observe that $\rho'_j \in \mathbb{Z}[n]$ for every $j \in [\ell]$. By the definition of $\eta'$ and the $\gamma$-invariance of $f_1, \ldots, f_\ell$ along $\eta$, the functions $f'_1, \ldots, f'_\ell$ are $\gamma$-invariant along $\eta'$. Moreover, if $f_j = 1$, then so is $f'_j$. Lastly, they satisfy the bound
$$|||f'_j|||_{s, T_{\eta'_j}} \ll_{a_{\eta_j}, \gamma} |||f_j|||_{s, T_{\eta_j}}$$
for every $s \ge 2$ and $j \in [\ell]$. This is trivial for $j \notin A$. For $j \in A$, we use the fact that $f_j$ is a composition of $f'_j$ with a measure-preserving transformation, the invariance property of $f_j$, Lemma 3.5, and both directions of Lemma 2.1.
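The number $\gamma$ in the proof above can be computed explicitly. For nonzero $m$, $m$ divides $\gamma c$ if and only if $|m|/\gcd(m, c)$ divides $\gamma$. So $\gamma$ is the least common multiple of the numbers $|a_{\eta_j}\gamma_j| / \gcd(a_{\eta_j}\gamma_j, c)$, taken over $j \in A$ and over the coefficients $c$ of $\rho_j$. The Python sketch below computes $\gamma$ this way and compares the result with a brute-force search. The input format, pairs of $a_{\eta_j}\gamma_j$ and the coefficient list of $\rho_j$, and both function names are hypothetical.

```python
from functools import reduce
from math import gcd, lcm


def smallest_gamma(constraints):
    """Smallest gamma >= 1 such that m divides gamma * c for every (m, coeffs)
    in constraints and every c in coeffs.

    Each pair models one j in A: m = a_{eta_j} * gamma_j (nonzero) and coeffs
    are the coefficients of rho_j. Uses m | gamma*c <=> (|m| / gcd(m, c)) | gamma.
    """
    factors = [abs(m) // gcd(m, c) for m, coeffs in constraints for c in coeffs]
    return reduce(lcm, factors, 1)


def smallest_gamma_brute(constraints, limit=10_000):
    """Reference implementation by exhaustive search (for checking only)."""
    for g in range(1, limit + 1):
        if all((g * c) % m == 0 for m, coeffs in constraints for c in coeffs):
            return g
    return None


example = [(6, [4, 9]), (4, [2, 0])]    # hypothetical data: two indices j in A
print(smallest_gamma(example))           # 6
```

In the example, $6 \nmid 4\gamma$ unless $3 \mid \gamma$, and $6 \nmid 9\gamma$ unless $2 \mid \gamma$, so $\gamma = 6$.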
We emphasise that Corollary 6.7 by itself does not guarantee that the tuple it produces has a lower type than the tuple $(T_{\eta_j}^{\rho_j(n)})_{j \in [\ell]}$. However, this will be the case whenever we apply it to the tuples that appear in our inductive procedure. The crucial ingredient in achieving this type reduction is property (iv) in Proposition 6.8, enjoyed by all the tuples showing up in our arguments.
6.4. Recapitulation. We conclude this section with Proposition 6.8, which provides an inductive framework for the proof of Theorem 1.1 in the next section. Combining Proposition 6.1 and Corollary 6.7, it shows that, starting with a tuple $(T_j^{p_j(n)})_{j \in [\ell]}$ satisfying the good ergodicity property, we reach a tuple of basic type in finitely many steps. For tuples of basic type, seminorm control will follow from the arguments in Proposition 7.2. We will then use this fact and induction to go back and obtain seminorm control for the original tuple $(T_j^{p_j(n)})_{j \in [\ell]}$.
Proposition 6.8 (Inductive framework). Let $\ell \in \mathbb{N}$, let $p_1, \ldots, p_\ell \in \mathbb{Z}[n]$ be polynomials, and let $(X, \mathcal{X}, \mu, T_1, \ldots, T_\ell)$ be a system. Suppose that the tuple $(T_j^{p_j(n)})_{j \in [\ell]}$ has the good ergodicity property. Then there exist $r \in \mathbb{N}$ and a sequence of tuples […] with types $w_0, \ldots, w_r$. The number $r$ depends only on $p_1, \ldots, p_\ell$ but can be bounded purely in terms of $\ell$. The sequence satisfies […], and for $k \in \{0, \ldots, r-1\}$, the following properties hold.
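(i) If the tuple $(T_{\eta_{kj}}^{\rho_{kj}(n)})_{j \in [\ell]}$ is controllable, then the tuple $(T_{\eta_{(k+1)j}}^{\rho_{(k+1)j}(n)})_{j \in [\ell]}$ is chosen using Proposition 6.1.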
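(ii) If the tuple $(T_{\eta_{kj}}^{\rho_{kj}(n)})_{j \in [\ell]}$ is uncontrollable, then the tuple $(T_{\eta_{(k+1)j}}^{\rho_{(k+1)j}(n)})_{j \in [\ell]}$ is chosen using Corollary 6.7.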
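(v) We have $w_{k+1} < w_k$, i.e. the tuple $(T_{\eta_{(k+1)j}}^{\rho_{(k+1)j}(n)})_{j \in [\ell]}$ has a lower type than the tuple $(T_{\eta_{kj}}^{\rho_{kj}(n)})_{j \in [\ell]}$.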
(vi) The tuple $(T_{\eta_{rj}}^{\rho_{rj}(n)})_{j \in [\ell]}$ has basic type, and the restriction $\eta_r|_{I_1}$ is the identity.
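Termination in Proposition 6.8, and the bound on $r$, come down to counting types. A strictly decreasing sequence $w_0 > w_1 > \cdots > w_r$ of types has at most as many terms as there are types. As used in the proof below, a type is a vector in $\mathbb{N}_0^{K_2}$ with coordinate sum $K_3$. The Python sketch enumerates these vectors and checks that their number is the stars-and-bars count $\binom{K_3+K_2-1}{K_2-1}$. This count is at most $(K_3+1)^{K_2}$, because every coordinate lies in $\{0, \ldots, K_3\}$. The function name `types` is hypothetical, and the sketch does not model the ordering on types.

```python
from itertools import product
from math import comb


def types(K2, K3):
    """All vectors w in N_0^{K2} whose coordinates sum to K3."""
    return [w for w in product(range(K3 + 1), repeat=K2) if sum(w) == K3]


K2, K3 = 3, 6            # the types in Example 8 live in N_0^3 with coordinate sum 6
all_types = types(K2, K3)
print(len(all_types))    # 28
assert len(all_types) == comb(K3 + K2 - 1, K2 - 1) <= (K3 + 1) ** K2
assert {(2, 2, 2), (2, 3, 1), (6, 0, 0)} <= set(all_types)
```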
For the rest of the paper, we call a tuple $(T_{\eta_j}^{\rho_j(n)})_{j \in [\ell]}$ a proper descendant of the tuple $(T_j^{p_j(n)})_{j \in [\ell]}$ if it appears in one of the sequences of tuples constructed from $(T_j^{p_j(n)})_{j \in [\ell]}$ using Proposition 6.8.
Proof. Let $k \in \{0, \ldots, r-1\}$. Suppose that the tuples $(T_{\eta_{0j}}^{\rho_{0j}(n)})_{j \in [\ell]}, \ldots, (T_{\eta_{kj}}^{\rho_{kj}(n)})_{j \in [\ell]}$ have already been constructed and satisfy properties (i)–(v). If the tuple $(T_{\eta_{kj}}^{\rho_{kj}(n)})_{j \in [\ell]}$ has basic type, we halt. Otherwise, we choose the tuple $(T_{\eta_{(k+1)j}}^{\rho_{(k+1)j}(n)})_{j \in [\ell]}$ as follows. We use Proposition 6.1 if $(T_{\eta_{kj}}^{\rho_{kj}(n)})_{j \in [\ell]}$ is controllable$^{12}$ and Corollary 6.7 otherwise, so that properties (i) and (ii) are satisfied. We note from Corollaries 6.3 and 6.7 that the new tuple is a descendant of $(T_j^{p_j(n)})_{j \in [\ell]}$. By Proposition 6.4, it has the good ergodicity property, and hence property (iii) holds as well. Property (iv) holds by induction and by the way the new tuple is constructed using Proposition 6.1 or Corollary 6.7.
For property (v), we first note that if $(T_{\eta_{kj}}^{\rho_{kj}(n)})_{j \in [\ell]}$ is controllable, then $w_{k+1} < w_k$ by Proposition 6.1. Suppose instead that it is uncontrollable. Let $t_k := t_{w_k}$ be the last nonzero index of $w_k$. Corollary 6.7 gives $w_{(k+1)t_k} = 0 < w_{kt_k}$, and property (iv) gives $w_{(k+1)t} = w_{kt}$ for $1 \le t < t_k$. This point is important, so we explain it in words. When we apply Corollary 6.7, the entry $w_{(k+1)t_k}$ drops to 0, since all the transformations with indices from $I_{t_k}$ get flipped. By property (iv), the new transformations appearing in their place have indices from $I_{t_k+1}, \ldots, I_{K_2}$. Hence $w_{k+1} < w_k$ in this case as well. There are at most $(K_3+1)^{K_2}$ possible types, i.e. vectors in $\mathbb{N}_0^{K_2}$ with sum of coordinates $K_3$. Together with property (v), this implies that the sequence eventually terminates. By construction, it can only terminate at the basic type $(K_3, 0, \ldots, 0)$. Hence there exists $r < (K_3+1)^{K_2} \le (\ell+1)^\ell$ such that the tuple $(T_{\eta_{rj}}^{\rho_{rj}(n)})_{j \in [\ell]}$ […]

[…] which shows that at each step, the new tuple has a lower type than its predecessor. The sixth tuple, counting from the top, has been obtained via Corollary 6.7, since the fifth tuple is uncontrollable, as explained in Examples 4 and 9. All the other tuples have been obtained using Proposition 6.1. Each subsequent tuple is a descendant of the original tuple, and so each of them has the good ergodicity property thanks to Proposition 6.4. Lastly, the final tuple has basic type. Moreover, for its indexing tuple $\eta$, the restriction $\eta|_{I_1} = \eta|_{[4]}$ is the identity, because no substitution has taken place at the first four indices.

[…]. Then there exists $s \in \mathbb{N}$, depending only on $d, D, \ell, L$, such that for all functions $f_1, \ldots, f_\ell \in L^\infty(\mu)$ with the good invariance property along $\eta$ with respect to $p_1, \ldots, p_\ell$ and all sequences of functions […] whenever $|||f_j|||_{s, T_{\eta_j}} = 0$ for some $j \in [\ell]$.
A word of explanation is in order regarding the statement of Proposition 7.1, which involves both the polynomials $p_1, \ldots, p_\ell$ and $\rho_1, \ldots, \rho_\ell$. For our induction to work, we need the functions $f_1, \ldots, f_\ell$ to have the good invariance property with respect to the original family $p_1, \ldots, p_\ell$, not the descendant family $\rho_1, \ldots, \rho_\ell$. This is needed for several reasons:
- to prove seminorm control for averages of basic type in the proof of Proposition 7.2;
- to apply Proposition 6.6 in the pong step of Proposition 7.5;
- to derive Proposition 7.1 from Proposition 7.4 for controllable tuples;
- to invoke Corollary 6.7 for uncontrollable tuples in Proposition 7.1.

Step 2 of Example 8 also illustrates why we must keep track of the invariance property with regard to the original polynomial family.
Technically, the value $s$ in Proposition 7.1 also depends on $\eta$. However, since $\eta \in [\ell]^\ell$, there are at most $\ell^\ell$ possible tuples $\eta$, so this dependence can be removed.
[…] be a proper descendant of the tuple $(T_j^{p_j(n)})_{j \in [\ell]}$, and suppose that the tuple $(T_{\eta_j}^{\rho_j(n)})_{j \in [\ell]}$ has basic type. Then there exists $s \in \mathbb{N}$, depending only on $d, D, \ell, L$, with the following property. Consider any 1-bounded functions $f_1, \ldots, f_\ell \in L^\infty(\mu)$ with the good invariance property along $\eta$ with respect to $p_1, \ldots, p_\ell$, and any sequences of functions $\mathcal{D}_1, \ldots, \mathcal{D}_L \in \mathfrak{D}_d$. Then (80) holds whenever $|||f_j|||_{s, T_{\eta_j}} = 0$ for some $j \in [\ell]$.
A special case of Proposition 7.2 was sketched in Step 2 of Example 8, and we invite the reader to compare the abstract proof below with that argument.
Proof. We induct on the length $\ell$ of the average. If $\ell = 1$, the statement holds by Proposition 3.7. We therefore assume that $\ell > 1$, and we prove Proposition 7.2 for this $\ell$ by invoking Proposition 7.1 for averages of length $\ell - 1$. More specifically, we first show that there exists an index $m$ satisfying the controllability condition, and that we can control the average by a $T_{\eta_m}$-seminorm of $f_m$. We then replace $f_m$ by a dual function using Proposition 2.3 and the pigeonhole principle. Next, we flip the other transformations $T_{\eta_j}$ into $T_j$ using Proposition 6.6. Finally, we invoke Proposition 7.1 for averages of length $\ell - 1$ to obtain seminorm control in terms of the other functions.
For controllable tuples, Proposition 7.1 will be deduced from the following result.
Proposition 7.4 is a consequence of Proposition 3.7, followed by an iterated application of the smoothing result given below.
[…] be a proper descendant of the tuple $(T_j^{p_j(n)})_{j \in [\ell]}$. Suppose that $(T_{\eta_j}^{\rho_j(n)})_{j \in [\ell]}$ has non-basic type and is controllable, and let $m$ be an index satisfying the controllability condition. Let $s \ge 2$ and let $b_1, \ldots, b_{s+1}$ be vectors satisfying (23). Then there exists $s' \in \mathbb{N}$, depending only on $d, D, \ell, L, s$, with the following property. Consider any functions $f_1, \ldots, f_\ell \in L^\infty(\mu)$ with the good invariance property along $\eta$ with respect to $p_1, \ldots, p_\ell$, and any sequences of functions $\mathcal{D}_1, \ldots, \mathcal{D}_L \in \mathfrak{D}_d$. If $|||f_m|||_{b_1, \ldots, b_{s+1}} = 0$ implies (80), then (80) also holds under the assumption that $|||f_m|||_{b_1, \ldots, b_s, e_{\eta_m}^{\times s'}} = 0$.
We now explain the induction scheme by which we prove Propositions 7.1–7.5. Roughly speaking, the proofs proceed by induction on the length $\ell$ of the average and, for fixed $\ell$, by induction on type, with averages of basic type as the base case. More precisely, the induction scheme goes as follows: […]
(iv) For tuples $(T_{\eta_j}^{\rho_j(n)})_{j \in [\ell]}$ of length $\ell > 1$ and non-basic type $w$, we prove Proposition 7.5 only under the assumption of controllability. This proof goes by inductively invoking Proposition 7.3 in two cases: for tuples of length $\ell$ and type […] controllability condition implies that $\rho_i, \rho_\ell$ are independent, and so $T^{b_{s+1}}$ is a nonzero iterate of $T_{\eta_\ell}$. In both these cases, the result follows from the bound
$$|||f_\ell|||_{b_1, \ldots, b_s, c e_{\eta_\ell}} \ll_{|c|} |||f_\ell|||_{b_1, \ldots, b_s, e_{\eta_\ell}}$$
for any $c \neq 0$, which is a consequence of Lemma 2.1.
The last remaining case, and the most difficult one, is when $\eta_i \in I_{t'}$ with $t' \neq t$ and $b_\ell, b_i \neq 0$. In this case, the proof of Proposition 7.5 follows the two-step strategy of Example 1. We also have to handle the additional complications explained in Examples 7 and 8. In the first step, we obtain control of (35) by $|||f_i|||_{b_1, \ldots, b_s, e_{\eta_i}^{\times s_1}}$ for some $s_1 \in \mathbb{N}$ depending only on $d, D, \ell, L, s$. To do this, we apply the control by $|||f_\ell|||_{b_1, \ldots, b_{s+1}}$, given by assumption, to an appropriately defined function $\tilde{f}_\ell$ in place of $f_\ell$. In the second step, we repeat the procedure, applying the newly established control by $|||f_i|||_{b_1, \ldots, b_s, e_{\eta_i}^{\times s_1}}$ to a function $\tilde{f}_i$ in place of $f_i$. This gives the claimed result.
7.3. Proof of Proposition 7.1. We induct on the length $\ell$ of the average, and for each fixed $\ell$ we further induct on type. In the base case $\ell = 1$, Proposition 7.1 follows directly from Proposition 3.7. We therefore assume that the average has length $\ell > 1$ and type $w$. We also assume that the statement holds for averages of length $\ell - 1$, as well as for averages of length $\ell$ and type $w' < w$. If the type $w$ is basic, then Proposition 7.1 follows from Proposition 7.2, so we assume that $w$ is not basic. We argue differently depending on whether the average is controllable or not.
[…] $\mathcal{D}_j(q_j(n))\Big\|_{L^2(\mu)} > 0$ for some $\mathcal{D} \in \mathfrak{D}_s$. By Proposition 6.6, there exist 1-bounded functions $(f'_j)_{j \in [\ell]}$ with $f'_m = 1$ and polynomials $\rho'_1, \ldots, \rho'_\ell, q'_1, \ldots, q'_L \in \mathbb{Z}[n]$ such that […] We may assume without loss of generality that $s \ge 2$. Then Proposition 6.6 also implies that $|||f'_j|||_{s, T_j} \le C |||f_j|||_{s, T_{\eta_j}}$. Here the constant $C > 0$ depends only on the leading coefficients of $p_1, \ldots, p_\ell$ and on the number $\gamma$ for which $f_1, \ldots, f_\ell$ have the $\gamma$-invariance property along $\eta$. Moreover, $\rho'_1, \ldots, \rho'_\ell$ are descendants of $p_1, \ldots, p_\ell$, so by Proposition 6.4 the tuple $(T_j^{\rho'_j(n)})_{j \in [\ell], j \neq m}$ has the good ergodicity property. By the case $\ell - 1$ of Proposition 7.1, we deduce that $|||f'_j|||_{s, T_j} > 0$ for $j \neq m$, and hence $|||f_j|||_{s, T_{\eta_j}} > 0$ for $j \neq m$.