Bracket words along Hardy field sequences

We study bracket words, which are a far-reaching generalisation of Sturmian words, along Hardy field sequences, which are a far-reaching generalisation of Piatetski--Shapiro sequences $\lfloor n^c \rfloor$. We show that the sequences thus obtained are deterministic (i.e., they have sub-exponential subword complexity) and satisfy Sarnak's conjecture.


Introduction
One of the key results in a recent paper [DDM+22] by J.-M. Deshouillers, M. Drmota, A. Shubin, L. Spiegelhofer and the second-named author states that the subword complexity of $(\lfloor n^c \rfloor \bmod m)_{n=0}^{\infty}$ grows at most polynomially, which in particular shows that this sequence is deterministic. The philosophy behind this result is the following: if we take a regularly growing sequence such as $(\lfloor n^c \rfloor)_{n=0}^{\infty}$ and apply a very simple rule to it (taking the residue modulo $m$), then the resulting sequence is still quite simple (in this case it has polynomial subword complexity). In this paper we vastly generalise both main aspects of this result: we replace $(\lfloor n^c \rfloor)_{n=0}^{\infty}$ with Hardy sequences, and we replace taking the residue modulo $m$ by applying a bracket word.
Sturmian words are among the simplest and most extensively studied classes of infinite words over a finite alphabet. One of their defining properties is extremely low subword complexity. Recall that the subword complexity of an infinite word $a = (a(n))_{n=0}^{\infty}$ over a finite alphabet $\Sigma$ is the function $p_a$ which assigns to each integer $N$ the number $p_a(N)$ of words $w \in \Sigma^N$ which appear in $a$. If there exists at least one value of $N$ such that $p_a(N) \leq N$, then $a$ must be eventually periodic, in which case $p_a$ is bounded. If $a$ is a Sturmian word then $p_a(N) = N + 1$ for all $N$, which in light of the remark above is the least subword complexity possible for a word that is not eventually periodic.
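As a quick numerical illustration of these definitions (a sketch only, not part of the argument), the following Python snippet counts the distinct factors of length $N$ in a finite prefix of a word. The helper names `subword_complexity` and `fibonacci_prefix` are ours. As test input we use a periodic word and the Fibonacci word, the fixed point of the substitution $0 \to 01$, $1 \to 0$, which is a standard example of a Sturmian word. A finite prefix can only give a lower bound for $p_a(N)$; the bound is attained once the prefix is long enough.

```python
def subword_complexity(word, n):
    """Number of distinct length-n factors occurring in the finite string `word`."""
    return len({word[i:i + n] for i in range(len(word) - n + 1)})


def fibonacci_prefix(length):
    """Prefix of the Fibonacci word, obtained by iterating 0 -> 01, 1 -> 0."""
    w = "0"
    while len(w) < length:
        w = "".join("01" if ch == "0" else "0" for ch in w)
    return w[:length]


periodic = "011" * 500             # periodic word: the complexity stays bounded
sturmian = fibonacci_prefix(2000)  # Sturmian word: the complexity is N + 1

for n in range(1, 9):
    print(n, subword_complexity(periodic, n), subword_complexity(sturmian, n))
```

For the periodic word the count stabilises at the period length, whereas for the Fibonacci word it grows by exactly one at each step.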
In [AK22] B. Adamczewski and the first-named author studied a generalisation of Sturmian words obtained by considering letter-to-letter codings of finitely-valued generalised polynomials, which they dubbed bracket words. A generalised polynomial is an expression built from the usual polynomials using addition, multiplication and the integer part function. For instance, Sturmian words (up to letter-to-letter coding) take the form $a(n) = \lfloor \alpha(n+1) + \beta \rfloor - \lfloor \alpha n + \beta \rfloor$ with $\alpha \in (0,1) \setminus \mathbb{Q}$ and $\beta \in (0,1)$ (possibly with the integer part $\lfloor \cdot \rfloor$ replaced by the ceiling $\lceil \cdot \rceil$), and hence are special cases of bracket words. One of the main results of [AK22] is a polynomial bound on the subword complexity of bracket words: $p_a(N) \ll N^C$ for a constant $C$ (dependent on $a$).
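The bracket formula for Sturmian words can be evaluated exactly. The sketch below makes the illustrative choice $\alpha = (\sqrt{5}-1)/2$ and $\beta = 1/2$ (our choice, not taken from the text), for which $\lfloor \alpha n + \beta \rfloor = \lfloor (\lfloor n\sqrt{5} \rfloor - n + 1)/2 \rfloor$ for $n \geq 0$, so that integer square roots suffice and no floating-point rounding is involved. The function names are hypothetical.

```python
from math import isqrt


def floor_alpha_n_plus_half(n):
    """Exact floor(alpha*n + 1/2) for alpha = (sqrt(5) - 1)/2 and n >= 0.

    Relies on floor(n*sqrt(5)) == isqrt(5*n*n).
    """
    return (isqrt(5 * n * n) - n + 1) // 2


def sturmian(n):
    """a(n) = floor(alpha*(n+1) + beta) - floor(alpha*n + beta) with beta = 1/2."""
    return floor_alpha_n_plus_half(n + 1) - floor_alpha_n_plus_half(n)


word = "".join(str(sturmian(n)) for n in range(3000))


def factors(n):
    """Set of length-n factors of the computed prefix."""
    return {word[i:i + n] for i in range(len(word) - n + 1)}


print(word[:40])
print([len(factors(n)) for n in range(1, 9)])
```

On a prefix of this length the factor counts agree with the Sturmian value $N + 1$ for small $N$.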
In [DDM+22], J.-M. Deshouillers, M. Drmota, A. Shubin, L. Spiegelhofer and the second-named author investigated synchronising automatic sequences along Piatetski-Shapiro sequences $(\lfloor n^c \rfloor)_{n=0}^{\infty}$, where $c > 1$. A special case which plays a crucial role in the argument is when the synchronising automatic sequence is periodic, in which case they obtained a polynomial bound on the subword complexity.
As a joint extension of the two lines of investigation discussed above, we investigate bracket words along Piatetski-Shapiro sequences. In fact, we can deal with a considerably larger class of Hardy field functions with polynomial growth, which in addition to $n^c$ ($c > 1$) include logarithmic-exponential expressions such as $\alpha n^c + \alpha' n^{c'}$ or $n^c \log^{c'} n$, as well as some more complicated expressions such as $\log(n!)$. Our first result is a bound on the subword complexity.
Theorem A. Let $a = (a(n))_{n \in \mathbb{Z}}$ be a (two-sided) bracket word over the alphabet $\Sigma$ and let $f : \mathbb{R}_+ \to \mathbb{R}$ be a Hardy field function with polynomial growth. Then the subword complexity of $(a(\lfloor f(n) \rfloor))_{n=0}^{\infty}$, as a function of the length $H$, is bounded by $\exp(O(H^{\delta}))$ for some $0 < \delta < 1$.

The study of (special) automatic sequences along Piatetski-Shapiro sequences $\lfloor n^c \rfloor$ has a long history. We mention results by C. Mauduit and J. Rivat [MR95, MR05], by J.-M. Deshouillers, M. Drmota, and J. Morgenbesser [DDM12], by L. Spiegelhofer [Spi15, Spi20] and by L. Spiegelhofer and the second-named author [MS17]. Interestingly, two very different situations can occur. On the one hand, the Thue-Morse sequence along Piatetski-Shapiro sequences (for $1 < c < 3/2$) is normal; in particular, it has maximal subword complexity. On the other hand, synchronising automatic sequences along Piatetski-Shapiro sequences are very far from normal: they have subexponential subword complexity. One natural generalisation of automatic sequences is the class of morphic sequences, that is, letter-to-letter codings of fixed points of substitutions. A very prominent morphic sequence is the Fibonacci word, which is the fixed point of the substitution $0 \to 01$, $1 \to 0$. This sequence is also a Sturmian word, and many interesting morphic sequences are also Sturmian words (see for example [KMPS18]). Thus, we obtain as a very special case (one of) the first results for morphic sequences along Piatetski-Shapiro sequences.
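To get a feeling for the kind of sequence Theorem A is about, one can sample a Sturmian bracket word along $\lfloor n^{3/2} \rfloor$ and count the subwords that appear in a finite prefix. The sketch below is purely experimental. The choices $c = 3/2$, $\alpha = (\sqrt{5}-1)/2$ and $\beta = 1/2$ are ours, made so that everything can be computed exactly with integer square roots, and all function names are hypothetical. Counts obtained from a finite prefix are only lower bounds for the true subword complexity and say nothing about the asymptotic bound in the theorem.

```python
from math import isqrt


def a(m):
    """Sturmian bracket word floor(alpha*(m+1) + 1/2) - floor(alpha*m + 1/2),
    with alpha = (sqrt(5) - 1)/2, evaluated exactly for m >= 0."""
    def fl(n):
        return (isqrt(5 * n * n) - n + 1) // 2
    return fl(m + 1) - fl(m)


def piatetski_shapiro(n):
    """floor(n**(3/2)), computed exactly as isqrt(n**3)."""
    return isqrt(n ** 3)


def sampled_word(length):
    """Prefix of the word a(floor(n^(3/2))), n = 0, 1, ..., length - 1."""
    return "".join(str(a(piatetski_shapiro(n))) for n in range(length))


def count_subwords(word, h):
    """Number of distinct length-h factors in the finite string `word`."""
    return len({word[i:i + h] for i in range(len(word) - h + 1)})


w = sampled_word(20000)
for h in (2, 4, 8, 12):
    # observed number of subwords versus the trivial upper bound 2^h
    print(h, count_subwords(w, h), 2 ** h)
```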
It follows from Theorem A that the sequence $(a(\lfloor f(n) \rfloor))_{n=0}^{\infty}$ is deterministic, meaning that it has subexponential subword complexity. A conjecture of Sarnak [Sar11] asserts that each deterministic sequence should be orthogonal to the Möbius function, given by
$$\mu(n) = \begin{cases} (-1)^k & \text{if } n \text{ is the product of } k \text{ distinct primes}, \\ 0 & \text{otherwise}. \end{cases}$$
This conjecture in general is wide open. However, it has been resolved in a number of special cases [Bou13, BSZ13, DDM15, DK15, EKL16, EALdlR14, FKPLM16, GT12a, Gre12, KPL15, LS15, MR10, MR15, Mül17, Pec18, Vee16]; see also the recent survey articles [DLMR, FKPL18]. Of particular importance to the current paper is Möbius orthogonality for nilsequences [GT12a], which was recently strengthened to short intervals [MSTT22]. As we discuss later in the paper, this is closely connected to bracket words thanks to the work of Bergelson and Leibman [BL07]. Our second result is the Möbius orthogonality for bracket words along Hardy field functions.
Theorem B. Let $a = (a(n))_{n \in \mathbb{Z}}$ be a (two-sided) $\mathbb{R}$-valued bracket word and let $f : \mathbb{R}_+ \to \mathbb{R}$ be a Hardy field function with polynomial growth. Then
$$\frac{1}{N} \sum_{n=1}^{N} \mu(n)\, a(\lfloor f(n) \rfloor) \to 0 \quad \text{as } N \to \infty. \tag{1}$$

Remark 1.1. We point out that, using similar techniques, it is possible to obtain a slightly stronger result. Firstly, instead of the bracket word, we could work with a bounded generalised polynomial; in fact, each bounded generalised polynomial can be approximated in the supremum norm by finitely-valued ones, which allows for a straightforward reduction. Secondly, since all of the key ingredients in the proof of Theorem B are quantitative, one can obtain an explicit rate of convergence to 0 in (1). We leave the details to the interested reader.
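The Möbius-weighted averages $\frac{1}{N}\sum_{n \leq N} \mu(n)\, a(\lfloor f(n) \rfloor)$ of the kind considered in Theorem B are easy to compute numerically. The following sketch does so for one example; it illustrates the objects involved and is of course no evidence for the theorem. The choices $f(t) = t^{3/2}$ and the Sturmian word with $\alpha = (\sqrt{5}-1)/2$, $\beta = 1/2$ are ours, and the names `mobius_sieve` and `moebius_average` are hypothetical.

```python
from math import isqrt


def mobius_sieve(limit):
    """Return a list mu with mu[n] equal to the Moebius function of n, 1 <= n <= limit."""
    mu = [1] * (limit + 1)
    mu[0] = 0
    is_prime = [True] * (limit + 1)
    for p in range(2, limit + 1):
        if is_prime[p]:
            for m in range(p, limit + 1, p):
                if m > p:
                    is_prime[m] = False
                mu[m] = -mu[m]          # one more prime factor
            for m in range(p * p, limit + 1, p * p):
                mu[m] = 0               # divisible by a square
    return mu


def a(m):
    """Sturmian bracket word with alpha = (sqrt(5) - 1)/2, beta = 1/2 (exact, m >= 0)."""
    def fl(n):
        return (isqrt(5 * n * n) - n + 1) // 2
    return fl(m + 1) - fl(m)


def moebius_average(N, mu):
    """(1/N) * sum_{n=1}^{N} mu(n) * a(floor(n^(3/2)))."""
    return sum(mu[n] * a(isqrt(n ** 3)) for n in range(1, N + 1)) / N


mu = mobius_sieve(100000)
for N in (1000, 10000, 100000):
    print(N, moebius_average(N, mu))
```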
Theorem B is closely related to Möbius orthogonality for nilsequences, that is, sequences that can be obtained by evaluating a continuous function along an orbit of a point in a nilsystem. The connection between generalised polynomials and nilsequences was established by Bergelson and Leibman [BL07], who showed that bounded generalised polynomials can be represented by evaluating a piecewise polynomial function along an orbit in a nilsystem (see Theorem 4.2 for details).
The fact that nilsequences are orthogonal to the Möbius function was established by Green and Tao [GT12a] as a part of their program of understanding additive patterns in the primes. In fact, [GT12a] already contains an outline of the proof of Möbius orthogonality for bounded generalised polynomials, although some technical details are left out.
In order to obtain a result for a bracket word along a Hardy field function, we split the range of summation into intervals where the Hardy field function under consideration can be efficiently approximated by polynomials. We are then left with the task of establishing cancellation in each of these intervals. A key ingredient is Möbius orthogonality for nilsequences in short intervals, recently established in [MSTT22] (see Theorem 5.3). The main technical difficulty of our argument lies in extending Theorem 5.3 to piecewise constant (and hence necessarily not continuous) functions with semialgebraic pieces, which we accomplish in Section 5.2.
1.1. Plan of the paper. In Section 2 we recall some basic definitions and results about Hardy fields. Moreover, we study Taylor polynomials of functions from a Hardy field, generalising the corresponding part of [DDM+22]. This allows us to locally replace functions from a Hardy field with polynomials. Thus, we need to be able to work with polynomials with varying coefficients. To do so, in Section 3 we study parametric generalised polynomials, building on and refining results obtained in [AK22]. These tools allow us to prove Theorem A. In Section 4 we present some basics on nilmanifolds and discuss the connection to generalised polynomials. Then, in Section 5 we recall a result on Möbius orthogonality for nilsequences in short intervals. This is the final result that we need to prove Theorem B. One naturally arising difficulty is to translate the result on Möbius orthogonality from smooth functions to piecewise polynomial functions.
Notation. We use $\mathbb{N} = \{1, 2, \dots\}$ to denote the set of positive integers and $\mathbb{N}_0 = \mathbb{N} \cup \{0\}$. For $N \in \mathbb{N}$, we let $[N] = \{0, 1, \dots, N-1\}$. For a non-empty finite set $X$ and a map $f : X \to \mathbb{R}$, we use the symbol $\mathbb{E}$ borrowed from probability theory to denote the average
$$\mathbb{E}_{x \in X} f(x) = \frac{1}{|X|} \sum_{x \in X} f(x).$$

Acknowledgements. The authors wish to thank Michael Drmota for many insightful discussions, for suggesting this problem, and also for inviting the first-named author to Vienna for a visit during which this project started; and Fernando Xuancheng Shao for helpful comments on Möbius orthogonality of nilsequences.

Hardy fields
In this section we discuss functions from a Hardy field which have polynomial growth. In particular, we study how the Taylor polynomial of $f$ can be used to describe $\lfloor f(n) \rfloor$. To this end, we first gather some basic results on Hardy fields. Then we discuss the uniform distribution of polynomials modulo $\mathbb{Z}$. Finally, we study properties of Taylor polynomials and prove the main theorem of this section, namely Theorem 2.11.
2.1. Preliminaries. We start by gathering the basic facts and results on Hardy fields. For further discussion we refer e.g. to [Bos94] and [Fra09].
Let $B$ be the collection of equivalence classes of real-valued functions defined on some half-line $(c, \infty)$, where we identify two functions if they agree eventually.¹ A Hardy field $H$ is a subfield of the ring $(B, +, \cdot)$ that is closed under differentiation, meaning that $H$ is a subring of $B$ such that for each $0 \neq f \in H$, the inverse $1/f$ exists and belongs to $H$, $f$ is differentiable and $f' \in H$. We let H denote the union of all Hardy fields. If $f \in$ H is defined on $[0, \infty)$ (one can always choose such a representative of $f$) we call the sequence $(f(n))_{n=0}^{\infty}$ a Hardy sequence. We note that choosing a different representative of the same germ of a function $f$ changes the number of subwords of length $N$ of $a(\lfloor f(n) \rfloor)$ by at most an additive constant. As a consequence, the asymptotic behaviour of the subword complexity of $a(\lfloor f(n) \rfloor)$ depends only on the germ of $f$.
A logarithmic-exponential function is any real-valued function on a half-line $(c, \infty)$ that can be constructed from the identity map $t \mapsto t$ using the basic arithmetic operations $+, -, \times, \div$, the logarithmic and the exponential functions, and real constants. For example, $t^2 + 5t$, $t^3$, $e^{(\log t)^2}$ and $e^{\sqrt{\log t}} / \sqrt{t^2 + 1}$ are all logarithmic-exponential functions. Every logarithmic-exponential function belongs to H, and so do some other classical functions such as $\Gamma$, $\zeta$ or $t \mapsto \sin(1/t)$.
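For real-valued functions $f$ and $g$ on $(c, \infty)$ such that $g(t)$ is non-zero for sufficiently large $t$, we write $f(t) \prec g(t)$ if $\lim_{t \to \infty} f(t)/g(t) = 0$, $f(t) \sim g(t)$ if $\lim_{t \to \infty} f(t)/g(t)$ exists and is a non-zero real number, and $f(t) \ll g(t)$ if there exists $C > 0$ such that $|f(t)| \leq C |g(t)|$ for all large $t$. For completeness, we let $0 \sim 0$ and $0 \ll 0$.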
We state the following well-known facts as lemmas.
Lemma 2.1.Let f ∈ H be a function that is not eventually zero.Then f is eventually strictly positive or negative.If f is not eventually constant, then f is eventually strictly monotone.
Proof. Since $f$ is not eventually $0$, the multiplicative inverse $1/f$ exists; in particular, $f(t) \neq 0$ for $t$ large enough. Now, the first part follows from continuity of $f$. The second part follows directly from the first part by considering $f'$.
Lemma 2.2.Let H be a Hardy field and let f, g ∈ H. Then one of the following holds: f ≺ g, f ∼ g or f ≻ g.
Proof.If g is eventually zero, the situation is trivial, so assume that this is not the case.Since f /g is eventually monotone, the limit lim t→∞ |f (t)| / |g(t)| ∈ R ∪ {∞} exists.If the limit is infinite then f ≻ g.If the limit is zero then f ≺ g.If the limit is finite and non-zero then f ∼ g.
Definition 2.3.We say that f has polynomial growth if there exists n ∈ N such that f (t) ≺ t n .
We will make use of the following estimates for the derivatives of functions with polynomial growth.
Lemma 2.4 ([Fra09, Lem. 2.1]). Let $f \in$ H be a function with polynomial growth. Then at least one of the following holds: (i) $f(t) \prec t^{-n}$ for all $n \in \mathbb{N}$; (ii) $f(t) \to c \neq 0$ as $t \to \infty$ for some constant $c$; (iii) $f'(t) \ll f(t)/t$.

¹ The equivalence classes just defined are often called germs of functions. We choose to refer to elements of $B$ as functions instead, with the understanding that all the operations defined and statements made for elements of $B$ are considered only for sufficiently large values of $t \in \mathbb{R}$.
Lemma 2.5. Let $f \in$ H be a function such that $f(t) \prec t^{-n}$ for all $n \in \mathbb{N}$. Then also $f^{(\ell)}(t) \prec t^{-n}$ for all $\ell, n \in \mathbb{N}$.
Proof.Reasoning inductively, it is enough to consider the case where ℓ = 1.Suppose, for the sake of contradiction, that |f ′ (t)| ≫ t −n for some n ∈ N. Since f (t) → 0 as t → ∞ and since f is eventually monotone, for sufficiently large t we have contradicting the assumption on f .
Lemma 2.6.Let f ∈ H and assume that f (t) ≪ t k for some k ∈ Z. Then Proof.Reasoning inductively, it is enough to consider the case where ℓ = 1.We consider the three possibilities in Lemma 2.4.If f (t) ≺ t −n for all n ∈ N then the claim is trivially true by Lemma 2.5.If f ′ (t) ≪ f (t)/t then f ′ (t) ≪ t k−1 , as needed.Finally, suppose that f (t) → c = 0 as n → ∞.Clearly, in this case k ≥ 0. We may decompose f (t) = f (t) + c, where f (t) = f (t) − c and f (t) ≺ 1. Repeating the reasoning with f in place of f we conclude that f Remark 2.7.For each f ∈ H and each logarithmic-exponential function g, there exists a Hardy field H such that f, g ∈ H (see e.g.[Bos94]).Hence, it follows from Lemma 2.2 that for each f ∈ H there exists k 0 (f ) ∈ Z ∪ {−∞, +∞} such that, for k ∈ Z we have: . Lemma 2.6 implies that k 0 (f (ℓ) ) ≤ k 0 (f ) − ℓ (with the convention that ±∞ − ℓ = ±∞).

Uniform distribution of polynomials.
In this subsection we recall a result about the uniform distribution of polynomials modulo Z which we need for the next subsection about Taylor-polynomials.It is well-known that a polynomial distributes uniformly modulo Z if and only if at least one (non-constant) coefficient is irrational.
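The following proposition is a quantitative version of this statement. First we need to specify how we quantify the degree to which a sequence $a(n) \bmod \mathbb{Z}$ is uniformly distributed. Let $(x_1, \dots, x_N)$ be a finite sequence of real numbers. Its discrepancy is defined by
$$D_N(x_1, \dots, x_N) = \sup_{0 \leq a \leq b \leq 1} \left| \frac{\#\{1 \leq n \leq N : a \leq \{x_n\} < b\}}{N} - (b - a) \right|. \tag{2}$$
Thus, we have the necessary prerequisites to state the following proposition.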
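For experimentation, a closely related quantity is easy to compute. The sketch below evaluates the star discrepancy $D_N^*$, where the supremum is restricted to intervals $[0, b)$, using the classical formula in terms of the sorted fractional parts. This is a substitution on our part: assuming the discrepancy in the text is the usual one over all subintervals of $[0,1)$, the two are comparable via $D_N^* \leq D_N \leq 2 D_N^*$. The function name is hypothetical, and floating-point arithmetic is used, so the values are approximate in general.

```python
from math import sqrt


def star_discrepancy(xs):
    """Star discrepancy of the fractional parts of xs.

    Uses D* = max_i max(i/N - y_i, y_i - (i-1)/N) for the sorted
    fractional parts y_1 <= ... <= y_N.
    """
    pts = sorted(x % 1.0 for x in xs)
    n = len(pts)
    return max(max((i + 1) / n - y, y - i / n) for i, y in enumerate(pts))


N = 1000
irrational = [sqrt(2) * n * n for n in range(N)]  # irrational leading coefficient
rational = [n * n / 4 for n in range(N)]          # rational coefficient: only two values mod 1

print(star_discrepancy(irrational))
print(star_discrepancy(rational))
```

The polynomial with irrational leading coefficient has small discrepancy, while the rational one stays far from uniform, in line with the qualitative statement above.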
Proposition 2.8 (Proposition 5.2 in [DDM + 22]).Suppose that g : Z → R is a polynomial of degree d, which we write as This proposition is a direct consequence of Proposition 4.3 in [GT12b], who attribute this result to Weyl.

Taylor expansions.
For any germ f ∈ H we consider a representative that is defined on [1, ∞) and also call it f .Then, for any x ∈ (1, ∞) and ℓ ∈ N 0 we can consider the length-ℓ Taylor expansion of f at the point x, f (x + y) = P x,ℓ (y) + R x,ℓ (y), (3) uniformly for all x ≥ 1 and 0 ≤ y ≤ x, where the implied constant only depends on f and ℓ.
Proof.Combining (5) and Lemma 2.6 we have Assuming that x ≥ y, the two estimates are equivalent.
Lemma 2.10.Let k ∈ N and let f be a k times continuously differentiable function defined on an open interval I ⊆ R. Suppose that f (k) (t) has constant sign on I. Then f changes monotonicity on I at most k − 1 times.
Proof.If f (k) (t) is constant zero for all t ∈ I, then f is a polynomial of degree at most k − 1 and the statement is trivially true.Thus, we assume without loss of generality that f (k) (t) > 0 for all t ∈ I. Let us assume for the sake of contradiction that f changes monotonicity at least k times.Thus, f ′ has at least k zeros in I.It follows from the mean value theorem that f ′′ has at least k − 1 zeros in I. Inductively applying this reasoning shows that f (k) has at least 1 zero in I giving the desired contradiction.
Theorem 2.11.Let k, ℓ ∈ N be integers with k < ℓ and let f ∈ H be a function satisfying f (t) ≪ t k , and let P N,ℓ and R N,ℓ be given by (3)-(5).Then there exists some 0 < η < 1 (only depending on ℓ) such that for any H ∈ N, the formula (iii) e N is structured: There exists a partition of [H] into O(H η ) arithmetic progressions with step O(H η ) on which e N is constant.
(In the theorem above, the constants implicit in the O(•) notation are allowed to depend on k, ℓ and f .) Proof.We define ε = H η0 for some η 0 > 0 which only depends on ℓ and will be specified later.Let N ∈ N. Recall that by Proposition 2.9, we have k) .Thus, the values of N such that (7) is false contribute only O H O(1) different sequences e N , and we may freely assume that N is large enough that (7) holds.In this case we have e N : [H] → {−1, 0, 1}.Additionally, by Lemma 2.1 we may also assume that f (ℓ) (x) = 0 for all x ≥ N .As a consequence of (7), for each 0 Let α 0 , . . ., α ℓ−1 denote the coefficients of P N,ℓ : By Proposition 2.8, we distinguish two cases. (i In the first case, it follows that the number of h ∈ [H] such that (8) does not hold is at most 3εH.Thus, e N is sparse, i.e. it has at most 3εH ≪ H 1−η0 non-zero entries.It remains to estimate the number of the sequences e N of this type.Using a standard estimate Thus the number of distinct sequences e N is bounded by exp(O(H 1−η0/2 )), which gives the desired result as long as 1 − η 0 /2 ≤ η.
In the second case we split [H] into arithmetic progressions with common difference q ≪ ε −O ℓ (1) .This allows us to write (for 0 ≤ m < q) The defining property of q implies that max 1≤j<ℓ In particular, we can write Putting everything together, we find where In particular, Q is a polynomial of degree at most ℓ − 1 with integer coefficients and In the first case e N (qh+m) = 1 and in the second case e N (qh+m) = −1.Since r(h) is a polynomial of degree at most ℓ − 1, it changes monotonicity at most ℓ − 2 times.Since the ℓ-th derivative of r(h)+R N,ℓ (qh+m) = f (N +qh+m)−P N,ℓ (qh+m)+r(h) has constant sign, by Lemma 2.10 it changes monotonicity at most ℓ − 1 times on the interval [0, H/q].Hence, we can decompose [0, H/q] into at most 2ℓ − 2 intervals I 1 , . . ., I p on which r(h) and r(h , we can further subdivide each of the intervals I j into O(ε −O ℓ (1) ) subintervals such that for each subinterval, each of the inequalities in either true on the entire subinterval or false on the entire subinterval.As a consequence, e N is structured, i.e., e N is constant on each subinterval.Thus, we have found a decomposition of [H] into O(ε −O ℓ (1) ) arithmetic progressions on which e N is constant.We can write O(ε −O ℓ (1) ) = O(H Cη0 ) for some C = C(ℓ) > 0. Using the rough estimate H 3 for the number of arithmetic sequences contained in [H], we can bound the number of sequences e N which arise this way by It remains to choose η 0 = (C +2) −1 and η = 1−(2(C +2)) −1 to finish the proof.

Parametric generalised polynomials
In this section we discuss parametric generalised polynomials, building on and refining results obtained in [AK22]. In particular, we show that for any parametric generalised polynomial that takes values in $[M]$, we can assume that the parameters belong to $[0, 1)^J$ for some finite set $J$ (Proposition 3.5). This allows us to show a polynomial bound on the number of subwords of bracket words along polynomials of a fixed degree (Corollary 3.7). At the end of the section we give the proof of Theorem A.
Let $d \in \mathbb{N}$. Generalised polynomial maps (or GP maps for short) from $\mathbb{R}^d$ to $\mathbb{R}$ are the smallest family $G$ such that (1) all polynomial maps belong to $G$; (2) if $g, h \in G$ then also $g + h,\ g \cdot h \in G$ (with operations defined pointwise); (3) if $g \in G$ then also $\lfloor g \rfloor \in G$, where $\lfloor g \rfloor$ is defined pointwise: $\lfloor g \rfloor(x) = \lfloor g(x) \rfloor$. We note that generalised polynomial maps are also closed under the operation of taking the fractional part, given by $\{g\} = g - \lfloor g \rfloor$. For sets $\Omega \subseteq \mathbb{R}^d$ and $\Sigma \subseteq \mathbb{R}$ (e.g., $\Omega = \mathbb{Z}^d$, $\Sigma = \mathbb{Z}$), by a generalised polynomial map $g : \Omega \to \Sigma$ we mean the restriction $\tilde{g}|_{\Omega}$ to $\Omega$ of a generalised polynomial map $\tilde{g} : \mathbb{R}^d \to \mathbb{R}$ such that $\tilde{g}(\Omega) \subseteq \Sigma$. We point out that, unlike in the case of polynomials, the lift $\tilde{g}$ is not uniquely determined by $g$, unless $\Omega = \mathbb{R}^d$.
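The closure properties (1)-(3) can be mirrored directly in code. The following sketch represents a GP map as a Python callable and builds new maps using only addition, multiplication and the integer part, with the fractional part derived from these. The class `GP` and the helpers `x` and `const` are hypothetical names introduced for illustration, and exact rational arithmetic is used so that the floors are computed without rounding issues.

```python
from fractions import Fraction
from math import floor


class GP:
    """A generalised polynomial map R^d -> R, stored as a Python callable."""

    def __init__(self, fn):
        self.fn = fn

    def __call__(self, *x):
        return self.fn(*x)

    def __add__(self, other):
        return GP(lambda *x: self(*x) + other(*x))

    def __mul__(self, other):
        return GP(lambda *x: self(*x) * other(*x))

    def floor(self):
        return GP(lambda *x: floor(self(*x)))

    def frac(self):
        # {g} = g - floor(g), expressed through the closure operations above
        return self + GP(lambda *x: -1) * self.floor()


# Polynomial building blocks for d = 1, with exact rational coefficients.
x = GP(lambda t: t)


def const(c):
    return GP(lambda t: Fraction(c))


# g(n) = floor(n^2 / 3) * n + {5n / 2}
g = (const(Fraction(1, 3)) * x * x).floor() * x + (const(Fraction(5, 2)) * x).frac()
print([g(n) for n in range(6)])
```

Restricting such a callable to integer arguments gives a GP map on $\mathbb{Z}$ in the sense described above; different expressions may of course restrict to the same map.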
In [AK22], we introduced a notion of a parametric GP map Z → R with a finite index set I, which (modulo some notational conventions) is essentially the same as a GP map R I × Z → R. For instance, the formula defines a GP map Z → R (or, strictly speaking, a family of GP maps) parametrised by R2 .Formally, a parametric GP map with index set I or a GP map parametrised by Here, we will need a marginally more precise notion, where the set of parameters takes the form R I real ×Z Iint ×[0, 1) I frac rather than R I .Let I real , I int , I frac be pairwise disjoint finite sets and put I = I real ∪ I int ∪ I frac .Then a GP map parametrised by R I real × Z Iint × [0, 1) I frac is the restriction of a GP map parametrised by R I real × R Iint × R I frac (as defined above) to R I real × Z Iint × [0, 1) I frac .We note that in the case where I int = I frac = ∅, the new definition is consistent with the previous one.
In [AK22] we defined the operations of addition, multiplication and the integer part for parametric GP maps, not necessarily indexed by the same set.Roughly speaking, if I ⊆ J are finite sets then we can always think of a GP map parametrised by R I as a GP map parametrised by R J , with trivial dependence on the parameters in R J\I .Thus, if g • and h • are GP maps parametrised by R I and R J respectively, then we can think of both g • and h • as GP maps parametrised by R I∪J , which gives us a natural way to define the (pointwise) sum and product g • + h • and g • • h • .We refer to [AK22] for a formal definition.This construction directly extends to GP maps parametrised by R I real × Z Iint × [0, 1) I frac .Definition 3.1.Let g • and h • be two GP maps parametrised by R I real × Z Iint × [0, 1) I frac and R J real × Z Jint × [0, 1) J frac respectively.Then we say that h • extends g • , denoted 2 h • ❀ g • , if there exists a GP map ϕ : R In [AK22] we obtained a polynomial bound on the number of possible prefixes of a given GP map parametrised by [0, 1) I .Theorem 3.2 ([AK22, Thm.15.3]).Let g • : Z → Z be a GP map parametrised by [0, 1) I for some finite set I. Then there exists a constant C such that, as N → ∞, we have Above, the implicit constant depends only on g • .
Our next goal is to obtain a similar bound for the number of prefixes of a bounded GP map parametrised by R I .Even though we are ultimately interested in bounded GP maps, Proposition 3.4 concerning unbounded GP maps is more amenable to proof by structural induction.We will use the following induction scheme.
Proposition 3.3 ([AK22, Prop.13.9]).Let G be a family of parametric GP maps from Z to Z with index sets contained in N. Suppose that G has the following closure properties.
(i) All GP maps Z → Z belong to G.
(ii) For every g • and h (iv ) For every pair of disjoint finite sets I ⊆ N, J ⊆ N, and every sequence of parametric GP maps h Then G contains all parametric GP maps Z → Z with index sets contained in N.
Proposition 3.4.Let g • : Z → Z be a GP map parametrised by R I for a finite set I. Then there exist finite sets J, K and a GP map g • : Z → Z parametrised by where for each j ∈ J, h • : Z → Z is a GP map parametrised by [0, 1) K .Proof.
(i) If g : Z → Z is a fixed GP map (i.e., if I = ∅) then we can simply take g = g.
(ii) Suppose that the conclusion holds for g • , h • : Z → Z, and let the corresponding extensions g • and h • be given by We may freely assume that the index sets J, K, L, M are pairwise disjoint.We will show that the conclusion also holds for g • + h • and g • • h • .In the case of g • + h • it is enough to combine the sums representing g a,β and h c,δ into a single sum.In the case of g • • h • , we take Then f has the required form and (taking e j,l = a j c l ) we see that (iii) Suppose that the conclusion holds for g • and that g • ❀ g ′ • .Then the conclusion also holds for g ′ • because the relation of being an extension is transitive.(iv ) Suppose that I ⊆ N, J ⊆ N are disjoint finite sets, h (i) • are GP maps parametrised by R J which satisfy the conclusion for each for i ∈ I, and g • is the parametric GP map defined by Let the extensions of h (i) be given by (Note that we may without loss of generality assume use the same index sets L and M for each i ∈ I.) We will show that the conclusion is satisfied for g • .We observe that we have the equality This motivates us to define where ⋄ is some index that does not belong to I × J. Letting also we see that g • takes the required form and (setting φ i,l = {α i c l } and e ⋄ = 1) we have g Combining the closure properties proved above, we infer from Proposition 3.3 that the conclusion holds for all parametric GP maps.
Proposition 3.5.Let M ∈ N and let g • : Z → [M ] be a GP map parametrised by R I for a finite set I. Then there exist a GP map g • : Z → [M ] parametrised by [0, 1) J for a finite set J such that g • ❀ g • .
Proof.Let g (0) • ❀ g • be the parametric GP from Proposition 3.4, and let g Since the value of g α,β (n) is completely determined by its residue modulo M , we expect that it is enough to consider the values of a with a ∈ [M ] J .This motivates us to put Let φ : Z I → Z J and ψ : φ(α),ψ(α) .Let θ : Z I → [0, 1) J be given by θ(α) := {φ(α)/M } (with fractional part taken coordinatewise).Then Proposition 3.6.Let a = (a(n)) n∈Z be a (two-sided) bracket word over a finite alphabet Σ, and let g • : Z → Z be a GP map parametrised by R I for some finite set I. Then there exists a constant C > 0 such that, as N → ∞, we have Above, the implicit constant depends on a and g • .
Proof.Let M := |Σ|.We may freely assume that Σ = [M ], in which case a is a GP map.Thus, a • g • is a GP map parametrised by R I and taking values in [M ].By Proposition 3.5, there exists a GP map g • parametrised by [0, 1) J for a finite set J such that g • ❀ a • g • .Thus, it suffices to show that, for a certain C > 0, the number of words ( g α (n)) As a special case, we obtain a bound on the number of subsequences of bracket words along polynomials of a given degree.
Corollary 3.7.Let a = (a(n)) n∈Z be a (two-sided) bracket word over a finite alphabet Σ and let d ∈ N. Then there exists a constant C > 0 such that, as N → ∞ we have where the implied constant depends only on a and d.

Thus we are now in a position to prove Theorem A.
Proof of Theorem A. We aim to estimate the number of subwords of length H of (a(⌊f (n)⌋)) ∞ n=0 , that is, we count words of the form (a(⌊f (N )⌋), . . ., a(⌊f for N ∈ N. Since f has polynomial growth, there exists k ∈ N such that f (t) ≪ t k .We choose ℓ ≥ k + 1 and apply Theorem 2.11 to find some 0 < η < 1 such that for any H ∈ N at least one of the following holds and P N,ℓ is the Taylor polynomial of f (see (4)).We distinguish the three possible cases.Obviously (i) contributes at most O(H ℓ+1 ) different words.For (ii) we first consider a(⌊P N,ℓ (h)⌋) H−1 h=0 .By Corollary 3.7 this word is contained in a set of size O(H C ).By assumption a(⌊f (N + h)⌋) = a(⌊P N,ℓ (h)⌋) for at most O(H η ) values of h ∈ [H], which can be chosen in H O(H η ) ways For each position h with a(⌊f (N + h)⌋) = a(⌊P N,ℓ (h)⌋) we have at most |Σ| possibilities for the value of a(⌊f (N + h)⌋).In total, we can estimate the number of subwords of length H in this case (up to a constant) by In the last case (iii) we decompose [H] into O(H η ) arithmetic progressions on which e N is constant.We let these arithmetic progressions be denoted by P 1 , . . ., P s .
As there are at most H 3 arithmetic progressions contained in [H] we can bound the number of possible different decompositions by (H 3 ) O(H η ) .On every such progression there exists a polynomial q (which is either P N,ℓ , P N,ℓ + 1 or P N,ℓ − 1) such that a(⌊f (N + h)⌋) = a(⌊q(h)⌋).As a polynomial along an arithmetic progression is again a polynomial, by Corollary 3.7 we can bound the number of subwords appearing along some P j by H C .In total, we can estimate the number of subwords of length H in this case by This finishes the proof for δ = (1 + η)/2 < 1.

Nilmanifolds
In this section we recall some basic definitions and results on nilmanifolds and discuss the connection to generalised polynomials, which goes back to the work of Bergelson and Leibman [BL07].
4.1.Basic definitions.In this section, we very briefly introduce definitions and basic facts related to nilmanifolds and nilpotent dynamics.Throughout this section, we let G denote an s-step nilpotent Lie group of some dimension D. We assume that G is connected and simply connected.We also let Γ < G denote a subgroup that is discrete and cocompact, meaning that the quotient space G/Γ is compact.
A Mal'cev basis compatible with Γ and G • is a basis X = (X 1 , X 2 , . . ., X D ) of the Lie algebra g of G such that (i) for each 0 ≤ j ≤ D, the subspace h j := span (X j+1 , X j+2 , . . ., X D ) is a Lie algebra ideal in g; (ii) for each 0 (iii) Γ is the set of all products exp(t If the Lie bracket is given in coordinates by where all of the constants c (k) i,j are rationals with height at most M then we will say that the complexity of (G, Γ, G • ) is at most M .We recall that the height of a rational number a/b is max We will usually keep the the choice of the Mal'cev basis implicit, and assume that each filtered nilmanifold under consideration comes equipped with a fixed choice of Mal'cev basis.The Mal'cev basis X induces coordinate maps τ : X → [0, 1) D and τ : G → R D , such that The Mal'cev basis also induces a natural choice of a right-invariant metric on G and a metric on X.We refer to [GT12b, Def.2.2] for a precise definition.Keeping the dependence on X implicit, we will use the symbol d to denote either of those metrics.
The space X comes equipped with the Haar measure µ X , which is the unique Borel probability measure on X invariant under the action of G: µ X (gE) = µ X (E) for all measurable E ⊆ X and g ∈ G.When there is no risk of confusion, we write dx as a shorthand for dµ X (x).
A map g : Z → G is polynomial with respect to the filtration G • , denoted g ∈ poly(Z, G • ), if it takes the form where g i ∈ G i for all 0 ≤ i ≤ d (cf.[GT12b, Lem.6.7]; see also [GT12b,Def. 1.8] for an alternative definition).Although it is not immediately apparent from the definition above, polynomial sequences with respect to a given filtration form a group and are preserved under dilation.

Semialgebraic geometry.
A basic semialgebraic set S ⊆ R D is a set given by a finite number of polynomial equalities and inequalities: A semialgebraic set is a finite union of basic semialgebraic sets.In a somewhat ad hoc manner, we define the complexity of the basic semialgebraic set S given by ( 11) to be the sum polynomials appearing in its definition.(Strictly speaking, we take the infimum over all representations of S in the form (11).) We also define the complexity of a semialgebraic set represented to be the finite union of basic semiaglebraic sets S i as the sum of complexities of S i .(Again, we take the infimum over all representations (12).)Using the Mal'cev coordinates to identify the nilmanifold X with [0, 1) D , we extend the notion of a semialgebraic set to subsets of X.A map F : X → R is piecewise polynomial if there exists a partition X = r i=1 S i into semialgebraic pieces and polynomial maps Φ i : R D → R such that F (x) = Φ i (τ (x)) for each 1 ≤ i ≤ r and x ∈ S i .One can check that these notions are independent of the choice of basis, although strictly speaking we will not need this fact.4.3.Quantitative equidistribution.The Lipschitz norm of a function F : X → R is defined as In the case, where X = [0, 1] this notion is highly connected to the discrepancy of a sequence (see (2)).In fact, for δ > 0 small enough we have that (x n ) N −1 n=0 has discrepancy δ if and only if it is δ O(1) distributed.One direction follows immediately from the Koksma-Hlawka inequality and the other direction can be found for example in the proof of Proposition 5.2 in [DDM + 22].
More restrictively, $(x_n)_{n=0}^{N-1}$ is totally $\delta$-equidistributed if for each arithmetic progression $P \subseteq [N]$ of length at least $\delta N$ and each Lipschitz function $F : X \to \mathbb{R}$ we have
$$\left| \mathbb{E}_{n \in P} F(x_n) - \int_X F \, d\mu_X \right| \le \delta \|F\|_{\mathrm{Lip}}.$$

n=0 is $M$-rational and periodic with period $\le M$;
(iii) there is a group $G' < G$ with Mal'cev basis $\mathcal{X}'$ in which each element is an $M$-rational combination of elements of $\mathcal{X}$ such that $g'(n) \in G'$ for all $n \in \mathbb{Z}$, and the sequence

(i) $f$ is a GP map;
(ii) there exists a connected, simply connected nilpotent Lie group $G$, a lattice $\Gamma < G$, an element $g \in G$ and a piecewise polynomial map $F : G/\Gamma \to [0, 1)$ such that $f(n) = F(g^n \Gamma)$ for all $n \in \mathbb{Z}$;
(iii) there exists a connected, simply connected nilpotent Lie group $G$ of some dimension $D$, a lattice $\Gamma < G$, a compatible filtration $G_\bullet$, a polynomial sequence $g \in \mathrm{poly}(\mathbb{Z}, G_\bullet)$ and an index $1 \le j \le D$ such that $f(n) = \tau_j(g(n)\Gamma)$ for all $n \in \mathbb{Z}$.
Remark 4.3. Strictly speaking, [BL07] does not include the assumption that $G$ should be connected and simply connected. However, this requirement can be ensured by replacing $G$ with a larger group (cf. the "lifting argument" on [Fra09, p. 368] and also [BL07, Thm. A*]). The cost of this operation is that in (ii) one may not assume that the action of $g$ on $G/\Gamma$ is minimal, but we do not need this assumption.
In our applications, we will need to simultaneously represent maps of the form f (⌊p(n)⌋) where f is a fixed GP map and p is a polynomial which is allowed to vary.Such a representation is readily obtained from Theorem 4.2.
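Theorem 4.4. Let $f : \mathbb{Z} \to \mathbb{R}$ be a bounded GP map and let $d \in \mathbb{N}$. Then there exists a connected, simply connected nilpotent Lie group $G$, a lattice $\Gamma < G$, a filtration $G_\bullet$, and a piecewise polynomial map $F : G/\Gamma \to \mathbb{R}$ such that for each polynomial $p$ of degree at most $d$ there exists $g_p \in \mathrm{poly}(\mathbb{Z}, G_\bullet)$ with $f(\lfloor p(n) \rfloor) = F(g_p(n)\Gamma)$ for all $n \in \mathbb{Z}$.
Proof. By Theorem 4.2, there exists a nilmanifold $G^{(0)}/\Gamma^{(0)}$ together with a piecewise polynomial map $F^{(0)} : G^{(0)}/\Gamma^{(0)} \to \mathbb{R}$, and a group element
This construction guarantees that $F$ is piecewise polynomial and for all $t \in \mathbb{R}$ we have
Then $g_p$ is polynomial with respect to the filtration $G_\bullet$ given by $G_i = G_{(\lfloor i/d \rfloor)}$, where $(G_{(j)})_j$ denotes the lower central series, and we have $f(\lfloor p(n) \rfloor) = F(g_p(n)\Gamma)$ for all $n \in \mathbb{Z}$.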
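The simplest instance of such a representation lives on the circle $\mathbb{R}/\mathbb{Z}$. The sketch below checks that the Sturmian-type word $a(m) = \lfloor \alpha(m+1) + \beta \rfloor - \lfloor \alpha m + \beta \rfloor$ from the introduction equals the indicator of the semialgebraic set $S = \{x \in [0,1) : x - (1 - \alpha) \ge 0\}$ evaluated at $\{\alpha m + \beta\}$, and then reads the word along $\lfloor p(n) \rfloor$. The function names and the use of rational $\alpha$, $\beta$ (for exact arithmetic) are assumptions of this example; for rational $\alpha$ the word is periodic rather than Sturmian, but the identity being checked holds for every real $\alpha \in (0, 1)$.

```python
from fractions import Fraction
from math import floor


def sturmian_brackets(alpha, beta, m):
    """a(m) = floor(alpha*(m+1) + beta) - floor(alpha*m + beta)."""
    return floor(alpha * (m + 1) + beta) - floor(alpha * m + beta)


def frac(x):
    """Fractional part of x, a value in [0, 1)."""
    return x - floor(x)


def in_S(alpha, x):
    """Membership in S = {x in [0, 1): x - (1 - alpha) >= 0}."""
    return x - (1 - alpha) >= 0


def sturmian_torus(alpha, beta, m):
    """The same word, written as 1_S evaluated at the orbit point {alpha*m + beta}."""
    return 1 if in_S(alpha, frac(alpha * m + beta)) else 0


def along_polynomial(word, coeffs, n):
    """word(floor(p(n))) for p(t) = sum(coeffs[i] * t**i)."""
    value = sum(c * n**i for i, c in enumerate(coeffs))
    return word(floor(value))
```

Writing $\alpha m + \beta = k + \theta$ with $k \in \mathbb{Z}$ and $\theta \in [0, 1)$ shows why the two functions agree: the bracket expression equals $\lfloor \theta + \alpha \rfloor$, which is $1$ exactly when $\theta \ge 1 - \alpha$.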

Möbius orthogonality
5.1. Main result. In this section, we discuss Möbius orthogonality of bracket words along Hardy field sequences. Our main result is Theorem B, which we restate below.
Theorem 5.1. Let $a = (a(n))_{n \in \mathbb{Z}}$ be a (two-sided) $\mathbb{R}$-valued bracket word and let $f : \mathbb{R}_+ \to \mathbb{R}$ be a Hardy field function with polynomial growth. Then
$$\frac{1}{N} \sum_{n=1}^{N} \mu(n)\, a(\lfloor f(n) \rfloor) \to 0 \quad \text{as } N \to \infty.$$
As usual, we will use Taylor expansion to approximate the restriction of $f(n)$ to an interval with a polynomial sequence, and then use Theorem 2.11 to control the error term involved in computing $\lfloor f(n) \rfloor$. The sequence $a(\lfloor f(n) \rfloor)$ can then be represented on a nilmanifold by the Bergelson-Leibman machinery. As the next step, we require a suitable result on Möbius orthogonality in short intervals. In Section 5.2, we will prove the following theorem, which is closely related to [MSTT22, Thm. 1.1(i)]. Below, we let AP denote the set of all arithmetic progressions in $\mathbb{Z}$.
Theorem 5.2. Let $G$ be a connected, simply connected nilpotent Lie group, let $\Gamma < G$ be a lattice, let $G_\bullet$ be a filtration on $G$, assume that $G_\bullet$ and $\Gamma$ are compatible, and let $F : G/\Gamma \to \mathbb{R}$ be a finitely-valued piecewise polynomial map. Let $N, H$ be integers with $N^{0.626} \le H \le N$. Then
where the rate of convergence may depend on $G$, $\Gamma$, $G_\bullet$ and $F$.
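As a purely numerical illustration of the kind of average appearing in Theorem 5.1 (it plays no role in the proof), the sketch below computes $\frac{1}{N}\sum_{n \le N} \mu(n)\, a(\lfloor n^{3/2} \rfloor)$ for one specific bracket word. The choices here are assumptions made for the example: $a$ is the Sturmian word with slope $\alpha = (\sqrt{5} - 1)/2$ and $\beta = 0$, and $f(t) = t^{3/2}$. All function names are hypothetical. Integer square roots keep every floor exact.

```python
from math import isqrt


def mobius_sieve(limit):
    """Return a list mu with mu[n] equal to the Möbius function for 0 < n <= limit."""
    mu = [1] * (limit + 1)
    mu[0] = 0
    is_composite = [False] * (limit + 1)
    primes = []
    for i in range(2, limit + 1):
        if not is_composite[i]:
            primes.append(i)
            mu[i] = -1
        for p in primes:
            if i * p > limit:
                break
            is_composite[i * p] = True
            if i % p == 0:
                mu[i * p] = 0  # p**2 divides i*p
                break
            mu[i * p] = -mu[i]
    return mu


def floor_golden(m):
    """floor(alpha * m) for alpha = (sqrt(5) - 1) / 2 and an integer m >= 0."""
    return (isqrt(5 * m * m) - m) // 2


def sturmian(m):
    """a(m) = floor(alpha*(m+1)) - floor(alpha*m), a value in {0, 1}."""
    return floor_golden(m + 1) - floor_golden(m)


def hardy_floor(n):
    """floor(n**1.5), computed exactly as the integer square root of n**3."""
    return isqrt(n**3)


def mobius_average(N):
    """(1/N) * sum over 1 <= n <= N of mu(n) * a(floor(n**1.5))."""
    mu = mobius_sieve(N)
    return sum(mu[n] * sturmian(hardy_floor(n)) for n in range(1, N + 1)) / N
```

Evaluating `mobius_average(N)` for increasing `N` gives a rough feel for the cancellation that the theorem asserts in the limit; a finite computation of this kind is of course no evidence about the rate.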
Proof of Theorem 5.1 assuming Theorem 5.2. Applying a dyadic decomposition, it will suffice to show that
Fix a small $\varepsilon > 0$. We will show that, for all sufficiently large $N$ we have
Splitting the average in (16) into intervals of length $(2N)^{0.7}$, we see that (16) will follow once we show that for sufficiently large $N$ and for $H$ satisfying $N^{0.7} \le H < N$ we have
Pick an integer $k \in \mathbb{N}$ such that $f(t) \ll t^k$, and let $\ell = 10k$. By Theorem 2.11, we have
where $P_N$ is a polynomial of degree (at most) $\ell$ and one of the conditions 2.11(i)-(iii) holds. In the case (i) we have $N \ll_\varepsilon H^{10/9} \le N^{7/9}$, which implies that $N = O_\varepsilon(1)$. Assuming that $N$ is sufficiently large, we may disregard this case.
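To make the Taylor approximation step concrete, here is a worked instance under assumptions that are ours rather than the paper's: $f(t) = t^{3/2}$, $H = N^{0.7}$, and the Lagrange form of the remainder. The names $P_N$ and $e_N$ follow the proof; Theorem 2.11 is not reproduced here and may organise the error term differently.

```latex
\[
  f(N+h) = P_N(h) + e_N(h), \qquad
  P_N(h) = \sum_{i=0}^{\ell} \frac{f^{(i)}(N)}{i!}\, h^i , \qquad 0 \le h < H .
\]
% For f(t) = t^{3/2} and \ell \ge 1, the derivative f^{(\ell+1)}(t) = \pm c_\ell\, t^{1/2-\ell}
% has decreasing absolute value, where c_\ell = |(3/2)(1/2)(-1/2)\cdots(3/2-\ell)|.
\[
  |e_N(h)| \le \frac{c_\ell}{(\ell+1)!}\, N^{1/2-\ell}\, H^{\ell+1}
           = \frac{c_\ell}{(\ell+1)!}\, N^{1.2 - 0.3\ell}
  \qquad \text{when } H = N^{0.7},
\]
% so e_N(h) \to 0 uniformly in 0 \le h < H as N \to \infty, provided \ell \ge 5.
```

With the choice $\ell = 10k$ from the proof and $k = 2$ (since $t^{3/2} \ll t^2$), the exponent $1.2 - 0.3\ell$ is comfortably negative.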
In the case (ii) we have $\mathbb{E}_{h<H} |e_N(h)| < \varepsilon$, and as a consequence
By Theorem 4.4, there exists a connected and simply connected nilpotent Lie group $G$, a lattice $\Gamma < G$, a filtration $G_\bullet$ and a finitely-valued piecewise polynomial map $F : G/\Gamma \to \mathbb{Z}$ such that for each polynomial $P$ of degree at most $\ell$ there exists $g \in \mathrm{poly}(\mathbb{Z}, G_\bullet)$ such that $a(\lfloor P(h) \rfloor) = F(g(h)\Gamma)$. In particular,
By Theorem 5.2, for sufficiently large $N$ the expression in (20) is bounded by $\varepsilon$. Inserting this bound into (19) yields (17).
In the case (iii), passing to an arithmetic progression we may replace $e_N$ with a constant sequence:
To finish the argument, it suffices to apply Theorem 5.2 similarly to the previous case.

5.2. Short intervals. The remainder of this section is devoted to proving Theorem 5.2. We will derive it from closely related estimates for correlations of the Möbius function with nilsequences in short intervals. Recall that we let AP denote the set of all arithmetic progressions in $\mathbb{Z}$.
Theorem 5.3 (Corollary of Thm. 1.1(i) in [MSTT22]). Let $N, H$ be integers with $N^{0.626} \le H \le N$ and let $\delta \in (0, 1/2)$. Let $G$ be a connected, simply connected nilpotent Lie group of dimension $D$, let $\Gamma < G$ be a lattice, let $G_\bullet$ be a nilpotent filtration on $G$ of length $d$, and assume that the complexity of $(G, \Gamma, G_\bullet)$ is at most $1/\delta$. Let $F : G/\Gamma \to \mathbb{C}$ be a function with Lipschitz norm at most $1/\delta$. Then, for each $A > 0$ we have the bound

This theorem is almost the ingredient that we need, except that in our application the function $F$ is not necessarily continuous (much less Lipschitz). Instead, $F$ is a finitely-valued piecewise polynomial function, meaning that there exists a partition $G/\Gamma = \bigcup_{i=1}^{r} S_i$ into semialgebraic pieces and constants $c_i \in \mathbb{R}$ such that for each $x \in G/\Gamma$ and $1 \le i \le r$, $F(x) = c_i$ if and only if $x \in S_i$. In this case, it is enough to consider each of the level sets separately. It is clear that Theorem 5.2 will follow from the following more precise result.
Theorem 5.4. Let $N, H$ be integers with $N^{0.626} \le H \le N$ and let $\delta \in (0, 1/2)$. Let $G$ be a connected, simply connected nilpotent Lie group of dimension $D$, let $\Gamma < G$ be a lattice, let $G_\bullet$ be a nilpotent filtration on $G$ of length $d$, and assume that the complexity of $(G, \Gamma, G_\bullet)$ is at most $1/\delta$. Let $S \subseteq G/\Gamma$ be a semialgebraic set with complexity at most $E$. Then, for each $A \ge 1$ we have the bound

In the case where $(g(n)\Gamma)_n$ is highly equidistributed in $G/\Gamma$, we will derive Theorem 5.4 directly from Theorem 5.3. In fact, we will obtain a slightly stronger version, given in Proposition 5.5 below. Then, we will deduce the general case of Theorem 5.4 using the factorisation theorem from [GT12b]. In order to avoid unnecessarily obfuscating the notation, from this point onwards we will allow all implicit constants to depend on the parameters $d$, $D$ and $E$; thus, for instance, the term on the right-hand side of (24) will be more succinctly written as $(1/\delta)^{O(1)}/\log^A N$.

5.3. Equidistributed case.
Proposition 5.5. Let $N, H$ be integers with $N^{0.626} \le H \le N$ and let $\delta \in (0, 1/2)$. Let $G$ be a connected, simply connected nilpotent Lie group of dimension $D$, let $\Gamma < G$ be a lattice, let $G_\bullet$ be a nilpotent filtration on $G$ of length $d$, and assume that the complexity of $(G, \Gamma, G_\bullet)$ is at most $1/\delta$. Let $S \subseteq (\mathbb{R}/\mathbb{Z}) \times (G/\Gamma)$ be a semialgebraic set with complexity at most $E$. Then, for each $A \ge 1$ there exists $B$ such that
where δ := $1/\log^B N$ and the supremum is taken over all polynomial sequences $g$ such that $(g(h)\Gamma)_{h=0}^{H}$ is totally δ-equidistributed.
Proof. We may freely assume that $\delta \ge 1/\log^A N$, since otherwise there is nothing to prove. In particular, δ = log O(A) N and $1/\delta = O(\log^A N)$. Decomposing $S$ into a bounded number of pieces, we may assume that $S$ is a basic semialgebraic set. We will assume that $\operatorname{int} S \ne \emptyset$; the case where $\operatorname{int} S = \emptyset$ can be handled using similar methods and is somewhat simpler. Thus, $S$ takes the form
where $r = O(1)$ and $P_i$ are polynomial maps (under identification of $(\mathbb{R}/\mathbb{Z}) \times (G/\Gamma)$ with $[0, 1)^{1+D}$) with
$\deg P_i = O(1)$ for $1 \le i \le r$. Scaling, we may assume that $\|P_i\|_\infty = 1$ for all $1 \le i \le r$. Let $\tau_1$ denote Mal'cev coordinates on $(\mathbb{R}/\mathbb{Z}) \times (G/\Gamma)$, given by $\tau_1(t, x) = (t, \tau(x))$, where we identify $[0, 1)$ with $\mathbb{R}/\mathbb{Z}$ in the standard way. Furthermore, splitting $S$ further and applying a translation if necessary, we may assume that $\tau_1(S) \subseteq \left( \frac{1}{10}, \frac{9}{10} \right)^{1+D}$, implying in particular that $\tau_1$ is continuous in a neighbourhood of $S$.
Let $\eta \in (0, \delta)$ be a small positive quantity, to be specified in the course of the argument, and let $\Psi, \Psi' : \mathbb{R} \to [0, 1]$ be given by
It is routine (although tedious) to verify that $F$ and $F'$ are $1/\eta^{O(1)}$-Lipschitz (cf. [GT12b, Lem. A.4]). Directly from the definitions, we see that for each $t \in \mathbb{R}/\mathbb{Z}$ and $x \in G/\Gamma$ we have
In order to estimate either of the summands in (27)-(28), we begin by dividing the interval $[H]$ into $O(1/\alpha)$ sub-intervals with lengths between $\alpha H$ and $2\alpha H$, where
To estimate the first summand, we note that for each such sub-interval
Applying Theorem 5.3 to each sub-interval, for each constant $C \ge 1$ we obtain
Let us now consider the second summand. We have, similarly to (30),
For now, let us assume that $\alpha > \delta$, which we will verify at the end of the argument. We conclude from the fact that $(g(h)\Gamma)$
where we use $dx$ as a shorthand for $d\mu_{G/\Gamma}(x)$. Taking the weighted average of (33) over all sub-intervals, we conclude that
Applying Lemma 5.6(ii) to estimate the measure of the support of $F'_i$ for each $1 \le i \le r$ we conclude that
which allows us to simplify (34) to
Combining (32) and (37) with (27)-(28), we conclude that
Letting $C$ and $B$ be sufficiently large multiples of $A$, we conclude that
as needed. Note that choosing $B$ as a large multiple of $A$ also guarantees that $\alpha = 1/\log^{O(A)} N > \delta = 1/\log^B N$.

5.4. General case. Before we proceed with the proof of Theorem 5.2 in full generality, we will need the following technical lemma.
Lemma 5.6. Let $d, D \in \mathbb{N}$, and let $V$ denote the vector space of all polynomial maps $P : [0, 1)^D \to \mathbb{R}$ of degree at most $d$.
(i) There is a constant $C > 1$ (dependent on $d$, $D$) such that for $P \in V$ given by $P(x) = \sum_{\alpha} a_\alpha \prod_{i=1}^{D} x_i^{\alpha_i}$ we have the inequalities
(ii) For each $P \in V$ and for each $\delta \in (0, 1)$ we have
Proof. Item (i) follows from the fact that any two norms on the finite-dimensional vector space $V$ are equivalent. For item (ii) we proceed by induction with respect to $D$. Multiplying $P$ by a scalar, we may assume that $\|P\|_\infty = 1$. Suppose first that $D = 1$. We proceed by induction on $d$. If $d = 1$ then $P$ is an affine function $P(x) = ax + b$, and the claim follows easily. Assume that $d \ge 2$ and that the claim has been proved for $d - 1$. By item (i), at least one of the coefficients of $P$ has absolute value $\gg_{d,D} 1$. In fact, we may assume that this coefficient is not the constant term, since otherwise for all $x \in [0, 1)$ we would have $P(x) \in \left( \frac{99}{100} P(0), \frac{101}{100} P(0) \right)$ and hence the set in (40) would be empty for sufficiently small $\delta$. Thus, $\|P'\|_\infty \gg_{d,D} 1$. By the inductive assumption, (41)
Thus, it will suffice to show that
For each interval $I \subseteq [0, 1)$ such that $P'(x)$ has constant sign for $x \in I$ we have
Since $[0, 1)$ can be divided into $O(d)$ intervals where $P$ is monotone, (42) follows. Suppose now that $D \ge 2$ and the claim has been proved for all $D' < D$. Reasoning as above, we infer from item (i) that $P$ has a coefficient with absolute value $\gg_{d,D} 1$ other than the constant term. We may expand $P$ in the form
where $Q_i$ are polynomials in $D - 1$ variables of degree $d - i$. Changing the order of variables if necessary, we may assume that there exists $j$ with $1 \le j \le d$ such that $Q_j$ has a coefficient $\gg_{d,D} 1$, and hence $\|Q_j\|_\infty \gg_{d,D} 1$. For $k \in \mathbb{N}$, let us consider the set
The set in (43) is the disjoint union $\bigcup_{k=1}^{\infty} E_k$, so our goal is to show that (44)
Summing (47) gives (44) and finishes the argument.
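A one-variable example helps calibrate what a sublevel-set estimate of the kind used in item (ii) can look like. The display of (ii) is not reproduced above, so the following is only a sketch under the assumption that (ii) bounds the Lebesgue measure $\lambda$ of a set of the form $\{x : |P(x)| < \delta \|P\|_\infty\}$; the example polynomial is our choice.

```latex
\[
  P(x) = x^d \ \text{on } [0,1): \qquad \|P\|_\infty = 1, \qquad
  \{ x \in [0,1) : |P(x)| < \delta \} = [0, \delta^{1/d}),
\]
\[
  \lambda\bigl( \{ x \in [0,1) : |P(x)| < \delta \} \bigr) = \delta^{1/d}
  \qquad (0 < \delta < 1).
\]
```

In particular, any bound of the shape $\lambda(\{|P| < \delta \|P\|_\infty\}) \ll_{d,D} \delta^{c}$ that is uniform over $P \in V$ can only hold with $c \le 1/d$.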
Proof of Theorem 5.4. The argument is very similar to the proof of Theorem 1.1 assuming Proposition 2.1 in [GT12a]. As the first step, we apply the factorisation theorem [GT12b, Thm. 1.19] (Theorem 4.1), with $M_0 = \log N$ and a parameter $C$ to be determined in the course of the argument. We conclude that there exists an integer $M$ with $\log N \le M \ll \log^{O_C(1)} N$ such that $g$ admits a factorisation of the form
$$g = \varepsilon g' \gamma,$$
where $\varepsilon$ is $(M, H)$-smooth, $\gamma$ is $M$-rational, and $g'$ takes values in a rational subgroup $G' < G$ which admits a Mal'cev basis $\mathcal{X}'$ where each element is an $M$-rational combination of elements of $\mathcal{X}$, and $(g'(h)\Gamma)_{h=0}^{H-1}$ is totally $1/M^C$-equidistributed in $G'/(\Gamma \cap G')$ (with respect to the metric induced by $\mathcal{X}'$).
With the same reasoning as in [GT12a], we conclude that $(\gamma(h)\Gamma)_h$ is a periodic sequence with some period $q \le M$, and for each $0 \le j < q$ and $h \equiv j \bmod q$ we have $\gamma(h)\Gamma = \gamma_j \Gamma$ for some $\gamma_j \in G$ with coordinates $\tau(\gamma_j)$ that are rationals with height $\ll M^{O(1)}$. Splitting the average in (24) into sub-progressions, it will suffice to show that for each residue $0 \le j < q$ modulo $q$, and for each arithmetic progression $Q \subseteq q\mathbb{Z} + j$ with diameter at most $N/M$ we have
The key difference between our current work and the corresponding argument in [GT12a] is that $1_S$ is not continuous and hence in (49) we cannot replace $\varepsilon(h)$ with a constant and hope that the value of the average will remain approximately unchanged. Instead, we will use an argument of a more algebraic type. We note that, as a consequence of invariance of the metric on $G$ under multiplication on the right, for each $h, h' \in Q$ we have $d(\varepsilon(h) g'(h) \gamma_j, \varepsilon(h') g'(h) \gamma_j) = d(\varepsilon(h), \varepsilon(h')) = O(1)$.
For each $\gamma \in \Gamma$, the map $\iota$ is a polynomial on the semialgebraic set $\Delta \cap \iota^{-1}(\Pi\gamma)$. The estimate on the Lipschitz norm of $\iota$ implies that $\Delta$ can be partitioned into $M^{O(1)}$ semialgebraic sets with complexity $O(1)$ such that, on each of the pieces, $\iota$ is a polynomial of degree $O(1)$ (using the coordinates $\tau$ and $\sigma$). Applying the corresponding partition in (51), we see that it will suffice to show that for each semialgebraic set $T \subseteq (\mathbb{R}/\mathbb{Z}) \times (G'/\Lambda)$ with bounded complexity and for each constant $A' > 0$ we have
Bearing in mind that $M \ge \log N$, it will suffice to show that
We are now in position to apply Proposition 5.5 on $G'/\Lambda$. The complexity of $(G', \Lambda, G'_\bullet)$ is $1/\delta'$, where $\delta' = 1/M^{O(1)}$. The largest exponent $A'$ with which Proposition 5.5 is applicable to $(g'(h))_{h=0}^{H-1}$ satisfies $\log^{A'} N \gg M^{\mu C}$ for a constant $\mu \gg 1$, leading to
In order to derive (53) it is enough to let $C$ be a sufficiently large multiple of $A$.
Theorem 4.1 ([GT12b, Thm. 1.19]). Let $C > 0$ be a constant. Let $G$ be a connected, simply connected nilpotent Lie group of dimension $D$, let $\Gamma < G$ be a lattice, let $G_\bullet$ be a nilpotent filtration on $G$ of length $d$, and assume that the complexity of $(G, \Gamma, G_\bullet)$ is at most $M_0$. Then for each $N \in \mathbb{N}$ and each polynomial sequence $g \in \mathrm{poly}(\mathbb{Z}, G_\bullet)$ there exists an integer $M$ with $M_0 \le M \ll M_0^{O_{C,d,D}(1)}$ and a decomposition g