KAM via standard fixed point theorems

Thomas Alazard; Chengyang Shao

doi:10.1017/fms.2026.10207

KAM via standard fixed point theorems

Part of: Miscellaneous topics - Partial differential equations Hamiltonian and Lagrangian mechanics

Published online by Cambridge University Press: 06 May 2026

Thomas Alazard

and

Chengyang Shao

Show author details

Thomas Alazard: Affiliation:
Centre de Mathématiques Laurent Schwartz (CMLS), CNRS, École polytechnique, France; E-mail: thomas.alazard@polytechnique.edu
Chengyang Shao*: Affiliation:
Institut des Hautes Études Scientifiques (IHÉS), France
*: E-mail: shao@ihes.fr (Corresponding author)

Article contents

Abstract
Introduction
Circular map as illustrative model
Quantitative paraproduct and paralinearization estimates
Existence of invariant torus
Competing interests
Footnotes
References

Abstract

With a mere usage of well-established properties of paradifferential operators, the conjugacy equations in several model KAM problems are converted to parahomological equations solvable by standard fixed point argument. Such discovery greatly simplifies KAM proofs, renders the traditional KAM iteration steps unnecessary, and may suggest a systematic scheme of finding quasi-periodic solutions of realistic magnitude.

MSC classification

Primary: 35S50: Paradifferential operators

Secondary: 70H08: Nearly integrable Hamiltonian systems, KAM theory

Information

Type: Dynamics
Information: Forum of Mathematics, Sigma , Volume 14 , 2026 , e72

DOI: https://doi.org/10.1017/fms.2026.10207 [Opens in a new window]
Creative Commons: This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (https://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright: © The Author(s), 2026. Published by Cambridge University Press

1 Introduction

1.1 Organization of the paper

Can regularity loss caused by small denominators be overcome by a standard functional analysis argument, instead of a Newtonian/Nash-Moser scheme?

This has been a long-standing question in the field of dynamical systems, widely assumed to have a negative outcome. Surprisingly, we provide an affirmative answer in this paper: for several model KAM problems, it is possible to reduce the conjugacy equation to standard fixed point form via paralinearization techniques developed by Bony [Reference Bony11]. As a result, the “KAM steps” commonly considered essential to the problems are, in fact, unnecessary.

Several consequences follow immediately. Firstly, our new approach entirely bypasses the need for accelerated convergence techniques, a common feature in previous KAM arguments, whether Newtonian or not, used to compensate for regularity loss. It leads to significant simplification of the proof for KAM-type results. Moreover, following standard fixed-point theorems, this approach should allow substantially larger perturbations compared to earlier methods, as the cost of accelerated convergence scheme is unacceptably small magnitude of perturbation. We anticipate that the paradifferential approach to KAM theory may shed light on a systematic scheme for finding quasi-periodic motions of realistic, physical magnitude. Furthermore, a fixed point form of the equation may also yield insights about nonexistence.

After a brief review of the history of this topic, we present our paradifferential approach to KAM theory according to the following arrangement.

Section 2 is an essentially self-contained explanation of the core idea of our approach. We use the problem of circular maps studied by Arnold [Reference Arnold8] as an illustrative model. A Nash-Moser (modified Newton) type proof is briefly outlined. We then introduce the concept of parahomological equation, serving as the counterpart to the homological equation in conventional presentation of KAM theory. Formally being the paradifferential version of the latter, the regularity gain due to paradifferential calculus balances with the loss. This enables one to convert it to a fixed point equation, resembling the parainverse equation introduced by Hörmander [Reference Hörmander44]. We then provide a heuristic showing that such simplicifaction is not specific to one-dimensional problems, but rather pertains to very broad class of conjugacy problems.

In Section 3, we present all the paradifferential calculus tools necessary for our approach: boundedness of paraproduct operator, composition estimate of paraproducts, and a refined version of J.-M. Bony’s paralinearization theorem. The operator norms and magnitude of remainders are all estimated in a quantitative manner.

Section 4 collects the main results of this paper. We use the quantitative paradifferential calculus estimates developed in Section 3 to give new proofs of Hamiltonian conjugacy theorems. Instead of implementing any Newtonian algorithm, we will directly write down the parahomological equation and solve it by standard fixed point techniques. This is in contrast to most of the existing literature.

In order to state these results, we first set up the basic geometric notations. We consider $\mathbb {R}^n$ as the set of column vectors. For simplicity, we identify the space of mappings from $\mathbb {T}^n$ to $\mathbb {R}^n$ with the space of tangent vector fields on $\mathbb {T}^n$ . The variable $\theta \in \mathbb {T}^n$ will be used to denote the variable on the “abstract torus.” The phase space will be $\mathbb {T}^n\times \mathbb {R}^n$ , the variable of which is denoted as $(x,y)\in \mathbb {T}^n\times \mathbb {R}^n$ . Vector fields on the phase space will also be considered as column vectors. We use D to denote differentiation in $(x,y)$ , $\nabla $ to denote gradient with respect to $(x,y)$ (giving rise to column vectors), and $\partial $ to denote differentiation in $\theta $ or x. If u is a mapping from $\mathbb {T}^n$ to the phase space, then the differential $\partial u$ is understood as a matrix of n columns and $2n$ rows:

$$ \begin{align*}\partial u=\begin{pmatrix} \partial u^x \\ \partial u^y \end{pmatrix}\in\boldsymbol{M}_{2n\times n}. \end{align*} $$

The phase space $\mathbb {T}^n\times \mathbb {R}^n$ is endowed with standard symplectic form $\operatorname {\mathrm {d}}\!x\wedge dy$ . The corresponding symplectic structure has matrix representation

$$ \begin{align*}J=\begin{pmatrix} 0 & I_n \\ -I_n & 0 \end{pmatrix}\in\boldsymbol{M}_{2n\times 2n}. \end{align*} $$

Given a Hamiltonian function $h(x,y)$ on the phase space $\mathbb {T}^n\times \mathbb {R}^n$ , we denote the Hamiltonian vector field corresponding to h by

$$ \begin{align*}X_h:=J\nabla h =\begin{pmatrix} \nabla_y h \\ -\nabla_x h \end{pmatrix}. \end{align*} $$

Define the “flat” embedding of the torus $\mathbb {T}^n$ to the phase space by

(1.1)

$$ \begin{align} \zeta_0:\mathbb{T}^n\ni\theta\mapsto\begin{pmatrix} \theta \\ 0 \end{pmatrix}\in\mathbb{T}^n\times\mathbb{R}^n. \end{align} $$

The Hamiltonian function under consideration takes the form

(1.2)

$$ \begin{align} h(x,y):=a_0(x)+\langle a_1(x), y\rangle+\frac{1}{2}\langle Q(x)y,y\rangle+O(|y|^3), \end{align} $$

where $a_0$ is a scalar function, $a_1$ is a $\mathbb {R}^n$ -valued function, and Q is a symmetric matrix valued function. In other words, we consider Hamiltonian functions defined near the “flat” embedded torus.

We also fix a Diophantine frequency vector $\omega \in \mathbb {R}^n$ : there is a $\tau>0$ and $\gamma>0$ such that

(1.3)

$$ \begin{align} |k\cdot\omega|\geq\frac{\gamma}{|k|^\tau}, \quad \forall k\in\mathbb{Z}^n\setminus\{0\}. \end{align} $$

Standard measure theoretic arguments ensure the abundance of Diophantine vectors in the sense of measure: if $\tau>n-1$ , then the set of those $\omega \in \mathbb {R}^n$ satisfying the Diophantine condition (1.3) with $\gamma $ exhausting all positive real numbers is a set of the first Baire category and of full Lebesgue measure. We then write

$$ \begin{align*}\nabla_\omega=\sum_{j=1}^n\omega_j\partial_j \end{align*} $$

for the differentiation along the parallel vector field $\omega $ on $\mathbb {T}^n$ .

We now state the main theorems of the paper. The function spaces involved in these statements are classical continuously differentiable spaces $C^N$ , the Zygmund spaces $C^r_*$ , and the Sobolev spaces $H^s$ . See Subsection 1.3 for the definition of these function spaces.

Theorem 1.1 (Existence of Invariant Torus).

Fix $\tau>n-1$ . Let $s>2\tau +2+n/2+\varepsilon $ be a fixed index, set $r=s-n/2$ , and set $N_{s+r}$ to be the least integer such that $N_{s+r}>s+r$ . Let $\zeta _0$ , h and $\omega $ be as in (1.1)-(1.3), and suppose the mean value $\operatorname {\mathrm {Avg}} Q$ is an invertible matrix. Set

$$ \begin{align*}M_Q=\max\left(\big|(\operatorname{\mathrm{Avg}} Q)^{-1}\big|,|Q|_{L^\infty}\right). \end{align*} $$

Define “error of being invariant” and “error of being integrable” as

(1.4)

$$ \begin{align} e_0:=X_h(\zeta_0)-\nabla_\omega \zeta_0, \quad e_1:=Q-\operatorname{\mathrm{Avg}} Q. \end{align} $$

There are constants $c_0,c_1,c_2$ depending on $|h|_{C^{N_{s+r}+3}}$ and $M_Q$ with the following property. If

(1.5)

$$ \begin{align} \big\|e_0\big\|_{H^{s+2\tau+\varepsilon}} \leq \gamma^{4}c_0, \quad |e_1|_{C^r_*}\leq\gamma^{2}c_1 \end{align} $$

then there is an embedding $u:\mathbb {T}^n\mapsto \mathbb {T}^n\times \mathbb {R}^n$ of class $H^s$ , such that

$$ \begin{align*}\|u-\zeta_0\|_{H^s}\leq c_2\gamma^{-2}\|e_0\|_{H^{s+2\tau+\varepsilon}}, \end{align*} $$

and $u(\mathbb {T}^n)$ is an invariant torus for the flow of h:

(1.6)

$$ \begin{align} X_h(u)-\nabla_\omega u=0 \end{align} $$

We next state a translated existence theorem that allows variation of frequency.

Theorem 1.2 (Translated Conjugacy).

Fix $\tau>n-1$ . Let $s>2\tau +2+n/2+\varepsilon $ be a fixed index, set $r=s-n/2$ , and set $N_{s+r}$ to be the least integer such that $N_{s+r}>s+r$ . Let $\zeta _0,h,\omega $ be as in Theorem 1.1. Without any nondegeneracy assumption for Q, define

$$ \begin{align*}M_Q=|Q|_{L^\infty}. \end{align*} $$

Write $h_\xi (x,y)=h(x,y)+\xi \cdot y$ for $\xi \in \mathbb {R}^n$ . Still define “error of being invariant” and “error of being integrable” as

(1.7)

$$ \begin{align} e_0:=X_h(\zeta_0)-\nabla_\omega \zeta_0, \quad e_1:=Q-\operatorname{\mathrm{Avg}} Q. \end{align} $$

There are constants $c_0,c_1$ depending on $|h|_{C^{N_{s+r}+3}}$ and $M_Q$ with the following property. If

(1.8)

$$ \begin{align} \big\|e_0\big\|_{H^{s+2\tau+\varepsilon}} \leq \gamma^{4}c_0, \quad |e_1|_{C^r_*}\leq\gamma^{2}c_1, \end{align} $$

then there is an embedding $u:\mathbb {T}^n\mapsto \mathbb {T}^n\times \mathbb {R}^n$ of class $H^s$ and a constant vector $\xi \in \mathbb {R}^n$ , such that

$$ \begin{align*}\|u-\zeta_0\|_{H^s}\leq c_2\gamma^{-2}\|e_0\|_{H^{s+2\tau+\varepsilon}}, \end{align*} $$

and $u(\mathbb {T}^n)$ is an invariant torus for the flow of the modified Hamiltonian $h_\xi $ :

(1.9)

$$ \begin{align} X_{h_\xi}(u)-\nabla_\omega u=0 \end{align} $$

Remark 1.1. It is not hard to generalize the results to other function spaces, for example Zygmund spaces $C^r_*$ or more general Besov spaces. Though only existence results are stated, we can actually prove uniqueness and continuous dependence on parameters in both cases under more restrictive smallness assumptions. See the end of Subsection 4.3. On the other hand, although the theorems are stated for almost integrable systems, the coverage in fact is much broader under standard symplectic geometric consideration. See the discussion in Subsection 4.4.

Let us explain the reason of choosing these particular KAM type theorems for validation of our fixed point approach. Compared to standard account of KAM theorems solving a symplectic diffeomorphism that brings the Hamiltonian function to a normal form, starting from an approximate solution is computationally more convenient in practical applications to realistic systems. See for example [Reference Celletti and Chierchia20, Reference Celletti and Chierchia21], where the authors coducted computer assisted proof of KAM type results for restricted three body problems of Sun-Jupiter-asteroid with realistic physical parameters. It can be expected that a fixed point approach will yield improvements regarding these realistic physical systems, and may advance the estimate of threshold of validity.

In the early stages of KAM theory, Hamiltonian conjugacy theorems were commonly formulated as structural stability theorems of KAM normal forms. This necessitated suitable action-angle variables for the system to reduce it to a perturbative form, usually a challenging task. But if one directly searches for an invariant torus near a given “approximately invariant torus” in the phase space, such technical issue can be largely bypassed. This idea dates back was used in, for example, [Reference Celletti and Chierchia19, Reference Salamon78, Reference Salamon and Zehnder77, Reference de la Llave, González, Jorba and Villanueva59].

While the coverage of Theorem 1.1–1.2 may appear narrower than the results in [Reference de la Llave, González, Jorba and Villanueva59], it remains sufficiently broad to apply to a diverse range. Notably, Theorem 1.1 is the famous Kolmogorov invariant tori theorem (see [Reference Kolmogorov55, Reference Arnold6]) with strongest nondegeneracy condition. In fact, as shown in [Reference Berti and Bolle10], once the embedding u is given, a symplectic coordinate close to the original one can be readily constructed near the isotropic torus $u(\mathbb {T}^n)$ , under which h assumes a normal form and $u(\mathbb {T}^n)$ is flattened to $\mathbb {T}^n\times \{0\}$ . Similarly, Theorem 1.2 is in fact a special case of the “théorème de conjugaison tordue” (see Féjoz [Reference Féjoz29]), which further implies the iso-energetic KAM theorem as a corollary. The more general “théorème de conjugaison tordue” by Herman was proved in [Reference Féjoz28], which is a properly degenerate KAM theorem. It appears that the method of our paper still applies with only minor alteration. For discussion of coverage of Theorem 1.1–1.2 besides these obvious specific cases, see Subsection 4.4.

We briefly discuss the possibility of further generalizations. Theorem 1.2, with suitable adaptation and extension, may find compatibility with “KAM for PDEs,” including the Craig-Wayne-Bourgain approach to Melnikov persistency theorem. See for example [Reference Kuksin57, Reference Craig and Wayne23, Reference Bourgain15, Reference Bourgain16, Reference Bourgain17] (and [Reference Berti and Bolle10] for the applicability of this formalism to infinite dimensions). Such an alignment could potentially introduce a novel approach of finding “large magnitude” KAM solutions to Hamiltonian PDEs. To the author’s knowledge, the first paradifferential construction of periodic solutions of PDEs involving small denominators is carried out by Delort [Reference Delort24]. Generalization to quasi-periodic solutions is possible. Due to the technicality involved in these extensions, we confine ourselves proving Theorem 1.1–1.2 in this paper to keep the narration as simple as possible. Extensions to “KAM for PDEs” are discussed with full detail in the forthcoming work [Reference Alazard and Shao3].

1.2 A brief review of history

Given the extensive volume of literature associated to KAM theory and Nash-Moser techniques, it does not seem practical to provide a panoramic review within such limited space. We refer the reader to [Reference Alinhac and Gérard4, Reference Bost12, Reference Chierchia and Procesi22, Reference Féjoz28, Reference Hamilton36, Reference de la Llave60, Reference Pöschel73, Reference Wayne83] for comprehensive description of these topics. Here, we confine ourselves to a brief overview of those works directly related to the central issue of this paper; that is, whether KAM iteration can be substituded by standard fixed point argument.

KAM theory is generally recognized to originate from Kolmogorov’s 1954 report [Reference Kolmogorov55], although the problems it addressed had already interested mathematicians and physicists as early as the time of Poincaré: does the Poincaré-Lindstedt series converge in presence of small denominators? Or equivalently, do quasi-periodic solutions persist upon perturbation? Kolmogorov suggested a positive answer to this question by introducing a sequence of canonical transformations constructed via a modified Newtonian scheme. Arnold gave a detailed yet technically different proof of Kolmogorov’s theorem in [Reference Arnold6], and extended its coverage in [Reference Arnold7]. Both Kolmogorov and Arnold worked in space of analytic functions. In the meantime, Moser [Reference Moser67] was able to replace analyticity by finite differentiability in a parallel context. Hence comes the abbreviation KAM theory.

Realizing similarity between the Newtonian scheme used by Kolmogorov-Arnold and that used by Nash [Reference Nash71] to address the isometric embedding problem, Moser [Reference Moser68, Reference Moser69] then extracted this set of methods and applied it to broader classes of nonlinear problems. Related works that set stage for a general statement in graded spaces include Sergeraert [Reference Sergeraert79] and Hörmander [Reference Hörmander42, Reference Hörmander43]. In the original context of dynamical conjugacy problems, Zehnder [Reference Zehnder85, Reference Zehnder86, Reference Zehnder87] fitted them into adapted “hard implicit function theorems.” These works are commonly recognized as the origin of the Nash-Moser implicit function theorem, which soon found its strength in nonlinear analysis. See [Reference Hamilton36] for a systematic narration.

Nevertheless, from a practical point of view, if the goal is to systematically find periodic or quasi-periodic motions of realistic magnitude, then KAM/Nash-Moser type methods, based on Newtonian iteration, are far from satisfactory. In fact, even when solving for the zero of a single-variable function, the Newtonian method is well-known to be much more sensitive to initial estimate than ordinary fixed point schemes. The loss of regularity caused by small denominators for conjugacy problems significantly worsens the scenario. As Hénon observed [Reference Hénon38], the original proof by Arnold yields an infamous $10^{-300}$ upper bound for strength of perturbation, which is physically not acceptable (see also [Reference Laskar and Hénon58]):

Ainsi, ces théorèmes, bien que d’un très grand intérêt théorique, ne semblent pas pouvoir en leur état actuel être appliqués à des problèmes pratiques, où les perturbations sont toujours beaucoup plus grandes…

Thus, these theorems, although of great theoretical interest, do not seem to be able to be applied to practical problems in their current state, where the perturbations are always much larger…

However, Hénon also pointed out that numerical evidence suggests the persistence of quasi-periodic solutions even in the presence of strong perturbations. Numerical evidence, though it never serves as rigorous proof, could indicate the potential extension of KAM proofs to encompass stronger perturbations. There have been some successful research efforts aimed at quantifying the validity of KAM proofs. Notable among them are [Reference Celletti and Chierchia19, Reference Celletti and Chierchia20, Reference Celletti and Chierchia21], where the authors proved KAM stability for certain Hamiltonian systems, including restricted three body problems, with realistic physical parameter. The proof is quantitative and computer assisted to validate the Newtonian algorithm.

Apart from practical considerations, mathematicians are interested in avoiding Nash-Moser type schemes because they typically offer less insight into the nonlinear structure. For example, the original proof of local existence of the Ricci flow [Reference Hamilton37] or the mean curvature flow [Reference Gage and Hamilton33] employed the Nash-Moser method, given the highly degenerate parabolic operators arising from linearization. However, the DeTurck technique introduced in [Reference DeTurck25] elegantly resolved this degeneracy by fixing geometric gauge. Another example is the quasi-linear perturbation of wave equations. Klainerman [Reference Klainerman52] [Reference Klainerman53] proved global well-posedness for quasi-linear perturbation of wave equations in spatial dimension higher than 6. His method was a Nash-Moser type iteration involving smoothing operators that both truncates Fourier modes and long-time ranges to deal with loss of decay. But Klainerman and Ponce [Reference Klainerman and Ponce51] soon discovered the loss of decay could be compensated by working in Banach spaces of appropriately decaying weights. Later Klainerman [Reference Klainerman54] introduced the vector field method, drastically simplifying the proof of global well-posedness for wave equations. More recent development is also worth mentioning: Hintz-Vasy [Reference Hintz and Vasy41] proved the nonlinear stability of Kerr-de Sitter spacetime by a Nash-Moser inverse function theorem, which was later found by Fang [Reference Fang27] to be replaceable by suitable bootstrap argument.

As for the isometric embedding problem itself, 30 years after Nash’s original work, Günther [Reference Günther35] converted the isometric embedding problem into a standard implicit function form through careful manipulation with Laplacian. Thus the Nash-Moser method is not necessary for its original intended context. Almost simultaneously, Hörmander [Reference Hörmander44] recognized the connection between Nash-Moser methods and paradifferential calculus: they both involve dyadic decomposition of nonlinearity. Hörmander introduced parainverse operators for inverse function problems with loss of regularity. Although the alternative proof for Nash embedding theorem in [Reference Hörmander44] implies slightly weaker result than [Reference Günther35], the general method of the former not only addresses to the vague observation Nash-Moser technique can usually be replaced by elementary methods, but also has the potential of applying to other nonlinear problems.

There have also been alternative proofs of KAM type results. Eliasson’s paper [Reference Eliasson26] revisited the Poincaré–Lindstedt series, directly proving its convergence by implementing the delicate cancellations among the coefficients. Other proofs of KAM theorem include [Reference Bricmont, Gawedzki and Kupiainen18], using renormalization group acting on frequency space; [Reference Khanin, Dias and Marklof49], using a multidimensional continued fractions algorithm; [Reference Rüssmann75, Reference Pöschel74], which replaced the Newtonian iteration scheme with a “slowly converging” one; and [Reference Bounemoura and Fischler13, Reference Bounemoura and Fischler14] by rational approximation of the frequency vector. Being insightful from different aspects, these alternative proofs are all still based on various types of iteration with improved convergence.

In the recent decade, the idea of replacing modified Newtonian (Nash-Moser or KAM) iteration with paradifferential calculus regained some attention. To mention a few, there was a proof of local well-posedness for gravity-capillary water waves using Nash-Moser theorem in [Reference Ming and Zhang66]. By paralinearizing the system, however, Alazard-Burq-Zuily [Reference Alazard, Burq and Zuily1] proved the local well-posedness under significantly lower regime of regularity. In the construction of periodic solutions for dispersive differential equations on the torus, the traditional approach often employs Nash-Moser type techniques. But Delort [Reference Delort24] successfully substituted these techniques with ordinary iterations, aided by paradifferential calculus. The idea also echoed in recent advances for the study of Landau damping, the parallel of KAM theory in statistical mechanics. Originally proved to exist by a Newtonian scheme [Reference Mouhot and Villani70], the result was then proved in [Reference Bedrossian, Masmoudi and Mouhot9] using paraproducts in Gevrey spaces instead.

Regarding the necessity of such replacement, we particularly highlight the work of Herman [Reference Herman40], where the conjugacy equation for circular diffeomorphism was elegantly transformed into a fixed-point problem using Schwarz derivatives. Being surprising enough, even more is true. The technique was significantly extended by Marmi, Moussa, and Yoccoz [Reference Marmi, Moussa and Yoccoz61, Reference Marmi, Moussa and Yoccoz62] in their proof of finite codimensional structural stability for interval exchanging maps. Traditional KAM iteration faces limitations in such settings, since the obstructions of solution of the linear homological equation become more severe in function spaces of higher regularity, as demonstrated by Forni [Reference Forni30, Reference Forni31]. However, reformulating the conjugacy equation as a fixed-point problem effectively bypasses this issue, a key insight that supports the arguments in [Reference Marmi, Moussa and Yoccoz61, Reference Marmi, Moussa and Yoccoz62].

On the other hand, the Schwarz derivative technique appears to be highly specific to one-dimensional problems. The analogous problem on stability of invariant surfaces of geodesic flows on translation surfaces, proposed by Forni in [Reference Forni30], remained open until the very recent work of Forni himself [Reference Forni32]. Forni applied the method developed in this article to establish finite codimensional structural stability of geodesic flows on translation surfaces – a setting where traditional KAM iteration fails due to regularity constraintsFootnote ¹ . The crux of Forni’s proof lies in transforming the conjugacy equation into a fixed-point form via paralinearization. By resolving the open problem nearly three decades after its original introduction, such achievement might convince the reader of the merit of the paradifferential approach introduced in this paper. It is the aim of this paper to rekindle the idea of “Nash-Moser/KAM replaced by paradifferential” and develop a KAM theory from this new perspective.

1.3 Notation and convention

The notions in this subsection are standard and can be found in any textbook on harmonic analysis, for example [Reference Stein and Murphy81]. Throughout the paper, we do not distinguish functions defined on $\mathbb {T}^n$ with functions defined on $\mathbb {R}^n$ that are $2\pi $ -periodic with respect to each variable. We represent a distribution u on $\mathbb {T}^n$ as a Fourier series:

$$ \begin{align*}u=\sum_{k\in \mathbb{Z}^n} \hat{u}(k) e^{ik\cdot x}, \end{align*} $$

where the Fourier coefficient

$$ \begin{align*}\hat{u}(k)=\int_{\mathbb{T}^n}u(x)e^{-ik\cdot x}\operatorname{\mathrm{d}} \! x. \end{align*} $$

We denote $\operatorname {\mathrm {Avg}} u=\hat u(0)$ for the mean value of u if $u\in L^1$ . If $u\in L^2$ , then the series converges in $L^2$ .

We introduce the Littlewood-Paley decomposition as follows:

Definition 1.1. Fix a function $\varphi \in C^\infty _0(\mathbb {R}^n)$ , with support in an annulus $\{1/2\leq \left \vert \xi \right \vert \leq 2\}$ , so that

$$ \begin{align*}\sum_{j=1}^\infty\varphi(2^{-j}\xi)=1-\psi(\xi), \quad\operatorname{\mathrm{supp}}\psi\subset\{|\xi|\leq1\}. \end{align*} $$

The Littlewood-Paley decomposition of a distribution u on $\mathbb {T}^n$ is the defined as

(1.10)

$$ \begin{align} u=\Delta_{0}u+\sum_{j\ge1}\Delta_ju, \quad\text{where}\quad \Delta_j u=\sum_{k\in\mathbb{Z}^n}\varphi(2^{-j}k)\hat{u}(k)e^{ik\cdot x}, \quad j\geq1, \end{align} $$

while one fixes $\Delta _0u=\hat u(0)=\operatorname {\mathrm {Avg}} u$ .

The partial sum operator $S_j$ is defined as

$$ \begin{align*}S_j=\sum_{l\leq j}\Delta_l,\quad j\geq0, \end{align*} $$

while for $j\leq 0$ one just fixes $S_j=\Delta _0$ .

The summand $\Delta _ju$ in Definition 1.1 is called j’th building block. It has Fourier support contained in the dyadic annulus $\{0.5\cdot 2^j\leq |\xi |\leq 2\cdot 2^j\}$ . The speed of convergence of $S_ju$ to u reflects the regularity of u – and it is the content of Littlewood-Paley theory to study this connection.

For an index $s\in \mathbb {R}$ , the Sobolev space $H^s(\mathbb {T}^n)$ consists of those distributions u on $\mathbb {T}^n$ such that

$$ \begin{align*}\left\Vert u\right\Vert{}_{H^s}:=\left( \sum_{k\in \mathbb{Z}^n} \big(1+\left\vert k\right\vert{}^2\big)^s \big\vert \hat{u}(k) \big\vert^2\right)^{1/2} <+\infty. \end{align*} $$

The space $(H^s,\left \Vert \cdot \right \Vert {}_{H^s})$ is a Hilbert space. If $s\in \mathbb {N}$ , then

$$\begin{align*}H^s(\mathbb{T}^n)=\bigl\{u\in L^2(\mathbb{T}^n) : \forall \alpha\in \mathbb{N}^n,~|\alpha|\leqslant s,~\partial_x^\alpha u\in L^2(\mathbb{T}^n)\bigr\}, \end{align*}$$

where $\partial _x^\alpha $ is the derivative in the sense of distribution of u. Moreover, when s is not an integer, the Sobolev spaces coincide with those obtained by interpolation. If we define

$$ \begin{align*}\left\Vert u\right\Vert{}_{s}^2 \mathrel{:=} \sum_{j=0}^{\infty}2^{2js}\left\Vert \Delta_j u\right\Vert{}_{L^2}^2, \end{align*} $$

then it follows from Plancherel theorem that $\left \Vert \cdot \right \Vert {}_{H^s}$ and $\left \Vert \cdot \right \Vert {}_{s}$ are equivalent.

We will be using the Zygmund spaces (also known as Lipschitz spaces in the literature) throughout the paper. For an index $r\in \mathbb {R}$ , the Zygmund space $C^r_*$ consists of those distributions u on $\mathbb {T}^n$ such that

(1.11)

$$ \begin{align} |u|_{C^r_*}:= \sup_{j\geq0} 2^{jr}|\Delta_ju|_{L^\infty}<+\infty. \end{align} $$

Direct manipulation with series implies $H^s\subset C^{s-n/2}_*$ for any $s\in \mathbb {R}$ . For noninteger $r>0$ , the $C^r_*$ -norm is equivalent to the Hölder norm with index r, that is, the norm

$$ \begin{align*}|u|_{C^{[r]}} +\sum_{|\alpha|=[r]}\sup_{x,y\in\mathbb{T}^n}\frac{\big|\partial^{\alpha}u(x)-\partial^{\alpha}u(y)\big|}{|x-y|^{r-[r]}}. \end{align*} $$

Here $[r]$ is the integer part of r. However, when r is a natural number, the space $C^r_*$ is strictly larger than the classical space of Lipschitz continuous functions.

2 Circular map as illustrative model

In this section, we will use the circular map model studied by Arnold [Reference Arnold8] to illustrate the core idea of our paper. Although a quite complete global theory under sharp regularity and number-theoretic assumptions is available (see for example [Reference Herman39, Reference Yoccoz84, Reference Khanin and Sinai50, Reference Katznelson and Ornstein47]), we find it illuminating to revisit the perturbation problem for this model from the very beginning.

Throughout this section, we identify the circle $\mathbb {S}^1=\mathbb {R}/2\pi \mathbb {Z}$ , and write $\varrho _{\alpha }:x\mapsto x+\alpha $ for the rotation of angle $\alpha $ , where $\alpha \in (0,2\pi )$ . Addition of arguments will all be understood as modulo $2\pi $ . To avoid unnecessary confusion in notation, from now on we shall use $\eta ^\iota $ to denote the inversion of a nonlinear mapping $\eta $ (should it exist), and use $a^{-1}$ to denote the reciprocal of a number a.

2.1 Description of the problem

Let $\alpha \in (0,2\pi )$ be noncommensurable with $\pi $ . Consider the following classical problem:

○ Suppose $f:\mathbb {S}^1\mapsto \mathbb {S}^1$ is smooth and is close to 0. Is the diffeomorphism $x\mapsto x+\alpha +f(x)$ smoothly conjugate to the rotation $\varrho _{\alpha }$ ?

Of course, an integrablity condition must be posed for f. Recall the classical theorem due to Denjoy:

Theorem 2.1. For a $C^3$ orientation-preserving diffeomorphism of $\mathbb {S}^1$ to itself, if its rotation number $\alpha $ is not commensurable with $\pi $ , then it is topologically conjugate to the rotation $\varrho _{\alpha }$ .

If $x\mapsto x+\alpha +f(x)$ has rotation number different from $\alpha $ , then by Denjoy’s theorem, it definitely cannot conjugate to $\varrho _{\alpha }$ . On the other hand, Denjoy’s theorem has no implication on smoothness of the conjugation, even if the known diffeomorphism is smooth. Therefore the question is not answered by Denjoy’s theorem. We modify it as follows:

○ Suppose $f:\mathbb {S}^1\mapsto \mathbb {S}^1$ is smooth and is close to 0, such that the diffeomorphism $x\mapsto x+\alpha +f(x)$ still has rotation number $\alpha $ . Is it smoothly conjugate to the rotation $\varrho _{\alpha }$ ?

Let us write the hypothetical conjugation as $\eta (x)=x+u(x)$ , and try to solve the superficially more general conjugacy equation

$$ \begin{align*}\eta(x+\alpha)=\eta(x)+\alpha+f\circ\eta(x)-\lambda \end{align*} $$

for $\eta $ and the auxiliary real parameter $\lambda $ Footnote ² . A simple rearrangement converts this equation to

(2.1)

$$ \begin{align} \Delta_\alpha u=f\circ(\mathrm{Id}+u)-\lambda, \quad\text{where}\quad \Delta_\alpha u:=u\circ\varrho_{\alpha}-u. \end{align} $$

We may also compose the equation with $(\mathrm {Id}+u)^\iota $ , and reformulate equation (2.1) as

(2.2)

$$ \begin{align} \big[\Delta_\alpha u\big]\circ(\mathrm{Id}+u)^\iota=f-\lambda. \end{align} $$

We will discuss the technical difference between (2.1) and (2.2) in Appendix A.

To clarify the obstacles of solving the conjugacy problem, we may linearize either (2.1) or (2.2) at $(u,\lambda )=(0,0)$ along direction $(v,\mu )$ . Neglecting f as well, we obtain the linear homological equation

$$ \begin{align*}\Delta_\alpha v+\mu=h. \end{align*} $$

This linearized equation is solved via Fourier transform: normalizing $\hat v(0)=0$ , the unique solution is

$$ \begin{align*}v(x)=\sum_{k\neq0}\frac{\hat h(k)e^{ikx}}{e^{ik\alpha}-1}, \quad \mu=\operatorname{\mathrm{Avg}} h, \end{align*} $$

Consequently, in order that the conjugacy problem is solvable even at the linear level, an additional number-theoretic condition must be posed for the number $\alpha $ . We require that $\alpha /\pi $ is Diophantine of type $(\tau ,\gamma )$ : there are $\tau>0$ and $\gamma>0$ such that

(2.3)

$$ \begin{align} \left|\frac{q\alpha}{\pi}-p\right|\geq\frac{\gamma }{q^{\tau}}, \quad \forall p,q\in\mathbb{Z}\setminus\{0\}. \end{align} $$

Such numbers are abundant. Indeed, Liouville’s inequality asserts that if an algebraic number $\alpha $ is of degree D, then the inequality $|q\alpha -p|\geq c/q^{D-1}$ must hold for some c. A measure-theoretic argument asserts that with $\tau>1$ fixed, the set of $(\tau ,\gamma )$ Diophantine numbers with $\gamma $ exhausting all positive real numbers form a set of full Lebesgue measure and of first Baire category.

Obviously, if $\alpha $ satisfies the Diophantine condition (2.3), then $\Delta _\alpha ^{-1}$ is a well-defined operator mapping functions of mean zero on $\mathbb {S}^1$ to functions of mean zero, satisfying

(2.4)

$$ \begin{align} \|\Delta_\alpha^{-1}f\|_{H^s}\leq C\gamma^{-1}\|f\|_{H^{s+\tau}}, \quad \text{if }\operatorname{\mathrm{Avg}} f=0. \end{align} $$

Here C is an absolute constant independent of $\gamma $ or s. This enables one to solve the linearized equation $\Delta _\alpha v+\mu =h$ , at least for very regular right-hand-side. But due to this loss of information on regularity, a usual iterative scheme to solve (2.2) will necessarily terminate after finitely many steps.

2.2 Approximate right inverse

Since the operator $\Delta _\alpha ^{-1}$ causes a loss of regularity that cannot be compensated by any known elliptic technique, it is natural to employ modified Newtonian iterative scheme to solve the conjugacy problem (2.1). At a first glance it appears to be friendlier than (2.2), but in fact its linearized operator only admits an approximate right inverse. This brings more technicality for the application of Nash-Moser scheme. Let us describe the strategy of resolving it.

We set the unknwon $U=(u,\lambda )$ , and define a mapping

$$ \begin{align*}\mathscr{F}(f,U)=\Delta_\alpha u-f\circ(\mathrm{Id}+u)+\lambda. \end{align*} $$

The linearization of $\mathscr {F}$ at $U=(u,\lambda )$ along $V=(v,\mu )$ is

$$ \begin{align*}D_U\mathscr{F}(f,U)V=\Delta_\alpha v-f'\circ(\mathrm{Id}+u)v+\mu. \end{align*} $$

In the literature of dynamical systems, given $U=(u,\lambda )$ , the linearized equation

(2.5)

$$ \begin{align} D_U\mathscr{F}(f,U)V=h \end{align} $$

for the unknown $V=(v,\mu )$ with a given right-hand-side h is referred to as the homological equation.

In general, one only expects to solve the homological equation (2.5) approximately, since the operator $\Delta _\alpha -f'\circ (\mathrm {Id}+u)$ is hard to invert. To explain the meaning of being “approximately solvable,” we write down a simple yet crucial identity, which is a direct consequence of the definition of $\mathscr {F}$ :

(2.6)

$$ \begin{align} f'\circ(\mathrm{Id}+u)=\frac{\Delta_\alpha u'}{1+u'}-\frac{[\mathscr{F}(f,U)]'}{1+u'}. \end{align} $$

So in fact

(2.7)

$$ \begin{align} \begin{aligned} D_U\mathscr{F}(f,U)V &=\Delta_\alpha v-\frac{\Delta_\alpha u'\cdot v}{1+u'} +\frac{[\mathscr{F}(f,U)]'}{1+u'}v+\mu\\ &=(1+u'\circ\varrho_{\alpha})\Delta_\alpha\left(\frac{v}{1+u'}\right)+\frac{[\mathscr{F}(f,U)]'}{1+u'}v+\mu. \end{aligned} \end{align} $$

Thus, defining

$$ \begin{align*}\Psi(U)h=\left((1+u')\Delta_\alpha^{-1}\left[\frac{h-\mu(h)}{1+u'\circ\varrho_{\alpha}}\right],\,\mu(h)\right), \quad \text{where } \mu(h)=\frac{\mathrm{Avg}\big((1+u'\circ\varrho_{\alpha})^{-1}h\big)}{\mathrm{Avg}\big((1+u'\circ\varrho_{\alpha})^{-1}\big)}, \end{align*} $$

there holds

(2.8)

$$ \begin{align} D_U\mathscr{F}(f,U)\Psi(U)h-h =[\mathscr{F}(f,U)]'\Delta_\alpha^{-1}\left(\frac{h}{1+u'\circ\varrho_{\alpha}}-\mu(h)\right). \end{align} $$

Equality (2.8) is the property defining “approximate solvability”: the linear operator $\Psi (U)$ is an exact right inverse of $D_U\mathscr {F}(f,U)$ at a precise solution U of $\mathscr {F}(f,U)=0$ .

The loss of regularity caused by $\Delta _\alpha ^{-1}$ is usually identified as the primary difficulty for solving $\mathscr {F}(f,U)=0$ . A modified Newtonian scheme was implemented by Kolmogorov [Reference Kolmogorov55], Arnold [Reference Arnold6, Reference Arnold7, Reference Arnold8] and Moser [Reference Moser68, Reference Moser69]. Being technically different, the general idea shared by them is to define a sequence $U_k$ by inductively solving a sequence of homological equations:

(2.9)

$$ \begin{align} U_{k+1}=U_k-S_k\big[\Psi(U_k)\mathscr{F}(f,U_k)\big]. \end{align} $$

Here the operator $S_k$ is either the restriction operator to a smaller domain in case all functions involved are analytic, or a smoothing operator in case all functions are of finite differentiability. The quadratic convergence property of Newtonian scheme cancels the large constants so produced and ensures the convergence of $U_k$ to a genuine solution. Proving this convergence is where the complexity accumulates.

The issue with approximate invertibility was first noticed by Zehnder [Reference Zehnder85, Reference Zehnder86, Reference Zehnder87], who introduced adapted hard implicit function theorems (Nash-Moser type theorems) to fit such conjugacy problems into a unified formalism. The proofs of these hard implicit function theorems still rely on modified Newtonian schemes, essentially being (2.9). For the statement and detailed proof of Nash-Moser theorem with either exactly or approximately invertible linearized operator, see [Reference Zehnder85, Reference Zehnder86, Reference Zehnder87, Reference Hamilton36]. Though taking various forms on various graded spaces, all versions of Nash-Moser type theorems share the common features of restoring regularity through smoothing operators and ensuring convergence through quadratic property of Newtonian algorithm. While being universal and powerful theoretically, the practical disadvantage of this approach is that the allowed magnitude of perturbation is usually extremely small. A different approach is required if the aim is to find solutions of physically reasonable size.

2.3 Quick introduction to paradifferential calculus

Surprisingly, if paradifferential calculus is used to attack on (2.1), we can drastically simplify the proof of existence of solution. For simplicity, we do not present a panorama of paradifferential calculus in this article; instead, we list the propositions that suffice for our exposition, along with brief descriptions of the heuristics behind them. So let us first introduce the paraproduct operators, the central objects of this paper.

Definition 2.1. Fix Littlewood-Paley decomposition as in Definition 1.1. Given a distribution $a=a(x)$ on $\mathbb {T}^n$ , the paraproduct operator $T_a=\mathbf {Op}^{\mathrm {PM}}(a)$ associated to a is defined as, regardless of the meaning of convergence,

(2.10)

$$ \begin{align} T_au=\mathbf{Op}^{\mathrm{PM}}(a)u:=\sum_{j\geq0} S_{j-3}a\cdot \Delta_ju. \end{align} $$

For $j\geq 1$ , notice that the summand $S_{j-3}a\cdot \Delta _ju$ in (2.10) has its Fourier support contained in the annulus $\{0.25\cdot 2^j \leq |\xi |\leq 2.25\cdot 2^j\}$ . A standard construction is the paraproduct decomposition:

$$ \begin{align*}au=T_au+T_ua+R_{\mathrm{PM}}(a,u), \quad R_{\mathrm{PM}}(a,u)=\sum_{j,k:|j-k|<3}\Delta_ja\cdot\Delta_ku, \end{align*} $$

as long as the right-hand-side is well-defined in some suitable function space. Such decomposition separates apart high-low, low-high and high-high frequency interactions in a given multiplication $au$ , and is therefore widely used in the study of harmonic analysis and dispersive partial differential equations. We refer the reader to Section 4.4 of [Reference Grafakos34] for a comprehensive description of the role it plays in harmonic analysis. [Reference Kenig, Ponce and Vega48, Reference Kato46, Reference Staffilani80] are typical examples of application of these constructions in dispersive PDEs.

The following results on paraproduct operators are directly cited from [Reference Bony11] (see also Chapters 8-10 of [Reference Hörmander45]). They are the three fundamental ingredients of paradifferential calculus.

Proposition 2.1 (Continuity of paraproduct, rough version).

If $a\in L^\infty (\mathbb {T}^n)$ , then $T_a$ is a bounded linear operator from $H^s$ to $H^s$ for all $s\in \mathbb {R}$ , and in fact

$$ \begin{align*}\|T_a\|_{\mathcal{L}(H^s,H^s)}\lesssim_s|a|_{L^\infty}. \end{align*} $$

Proposition 2.2 (Composition of paraproducts, rough version).

If $a,b\in C_*^r$ with $r>0$ , then $T_aT_b-T_{ab}$ is a bounded linear operator from $H^s$ to $H^{s+r}$ for all $s\in \mathbb {R}$ . Furthermore, $T_aT_b-T_{ab}$ is continuously bilinear in $a,b\in C_*^r$ :

$$ \begin{align*}\big\|T_aT_b-T_{ab}\big\|_{\mathcal{L}(H^s,H^{s+r})} \lesssim_{s,r}|a|_{C^r_*}|b|_{C^r_*}. \end{align*} $$

Proposition 2.3 (Paralinearization, rough version).

Suppose $r=s-n/2>0$ , $u\in H^s(\mathbb {T}^n;\mathbb {R}^L)$ . Let $N_{s+r}$ be the least integer such that $N_{s+r}>s+r$ . Suppose $F=F(x,z)\in C^{N_{s+r}+2}(\mathbb {T}^n\times \mathbb {R}^L)$ . Then there holds the following paralinearization formula:

$$ \begin{align*}F(x,u)-F(x,0) =T_{F^{\prime}_z(x,u)}u +\mathcal{R}_{\mathrm{PL}}\big(F,u\big)\dot u \in H^s+H^{s+r}, \end{align*} $$

where $\mathcal {R}_{\mathrm {PL}}\big (F,u\big )$ is a bounded linear operator from $H^s\mapsto H^{s+r}$ , so that

$$ \begin{align*}\begin{aligned} \left\|\mathcal{R}_{\mathrm{PL}}(F,u)u\right\|_{H^{s+r}} &\leq |F|_{C^{N_{s+r}+2}}\big(1+\|u\|_{H^s}\big)\|u\|_{H^s}. \end{aligned} \end{align*} $$

If furthermore $F\in C^{N_{s+r}+2+k}$ , then $\mathcal {R}_{\mathrm {PL}}\big (F,u\big )$ has $C^k$ dependence on $u\in H^s H^{s+r}$ .

Precise statements of these results can be found in Section 3. Let us describe how Proposition 2.1–2.3 should be interpreted. The reason that Bony introduced paradifferential calculus in [Reference Bony11] is to understand microlocal regularity for solutions of nonlinear partial differential equations. For this purpose, it is necessary to try to separate out the most irregular part out of a given nonlinear expression $F(x,u)$ , which is the content of the paralinearization theorem 2.3. The terminology paralinearzation refers to the observation that the most irregular part, $T_{F^{\prime }_z(x,u)u}u$ out of $F(x,u)$ , is exactly given by the paradifferential operator corresponding to the linearization of F.

Bony observed that $T_a$ exhibits advantages over the usual product operator: the operator $M_a\colon u\mapsto au$ is bounded on all Sobolev spaces $H^s$ only if $a\in C^\infty $ , while the boundedness of $T_a$ on $H^s$ relies on the minimal regularity assumption that $a\in L^\infty $ , as indicated by Proposition 2.1. On the other hand, $T_a$ shares the same algebraic structure as the usual multiplication $M_a$ , up to smoothing remainders. While the mapping $a\mapsto M_a$ obviously is an algebra embedding from the function algebra $C^r_*$ to the operator algebra $\mathcal {L}(H^s,H^s)$ (with $s\leq r$ ), Proposition 2.2 shows that the mapping $a\mapsto T_a$ is an algebra homomorphism from the function algebra $C^r_*$ to the operator algebra $\mathcal {L}(H^s,H^s)$ modulo smoothing operators for any $s\in \mathbb {R}$ .

Therefore, Proposition 2.1–2.3 provides a procedure that enables one to study a nonlinear expression $F(x,u)$ as if it were linear: by paralinearization, one separates out the most irregular part, which obeys the same algebraic laws of the usual (linear) pseudo-differential operators.

2.4 Parahomological equation

In this subsection, we illustrate how the conjugacy equation (2.1) can be solved in an extremely simple manner, provided that the paradifferential tools from Subsection 2.3 are admitted as standard, as in the field of harmonic analysis and dispersive partial differential equations. To the best of the authors’ knowledge, the proof presented below stands as the shortest and simplest proof of KAM theorem for the circular map model available in literature.

Proof. Simple solution for (2.1).

We notice that the general Dirichlet approximate theorem implies $\tau \geq 1$ . Fix $s\geq \tau +1.5+\varepsilon $ . Suppose in a priori $u\in H^s$ , so that by Sobolev embedding, we have $u\in C^r_*$ with $r=s-0.5>1$ . Suppose f is sufficiently smooth, for example $f\in C^{N_{s+r}+2}$ , where $N_{s+r}$ is the least integer strictly greater than $s+r$ .

By Proposition 2.3, we have the paralinearization formula for $\mathscr {F}(f,U)$ :

(2.11)

$$ \begin{align} \begin{aligned} \mathscr{F}(f,U) &=\Delta_\alpha u-f-T_{f'\circ(\mathrm{Id}+u)}u -\mathcal{R}_{\mathrm{PL}}(f,u)\cdot u+\lambda\\ &=\Delta_\alpha u-T_{\Delta_\alpha u'/(1+u')}u-f +T_{[\mathscr{F}(f,U)]'/(1+u')}u -\mathcal{R}_{\mathrm{PL}}(f,u)\cdot u+\lambda\\ &=T_{(1+u'\circ\varrho_{\alpha})}\Delta_\alpha T_{1/(1+u')}u -f +T_{[\mathscr{F}(f,U)]'/(1+u')}u -\mathcal{R}_{\mathrm{PL}}(f,u)\cdot u+R_1(u)+\lambda. \end{aligned} \end{align} $$

Here we used the key identity (2.6) again. The last equality in (2.11) is valid since

$$ \begin{align*}[T_{1/(1+u')}u]\circ\varrho_{\alpha}=T_{1/(1+u'\circ\varrho_{\alpha})}(u\circ\varrho_{\alpha}). \end{align*} $$

Note that $\varrho _{\alpha }$ commutes with any Fourier multiplier, which is the reason that this equality holds. The remainder

$$ \begin{align*}\begin{aligned} R_1(u)&=\mathcal{R}_{\mathrm{CM}}(1+a,1+b_1)u+\mathcal{R}_{\mathrm{CM}}(1+a,b_2)u\\ &=\mathcal{R}_{\mathrm{CM}}(a,b_1)u+\mathcal{R}_{\mathrm{CM}}(a,b_2)u, \end{aligned} \end{align*} $$

where the last step is because that $\mathcal {R}_{\mathrm {CM}}(\cdot ,\cdot )$ is bilinear, and

$$ \begin{align*}a=u'\circ\varrho_{\alpha},\quad b_1=\frac{-u'\circ\varrho_{\alpha}}{1+u'\circ\varrho_{\alpha}},\quad b_2=-\frac{\Delta_\alpha u'}{(1+u')(1+u'\circ\varrho_{\alpha})} \quad\text{are all in }H^{s-1}\subset C_*^{\tau+\varepsilon}. \end{align*} $$

is produced when replacing $T_{ab}$ by $T_aT_b$ in view of Proposition 2.2. Formally, we obtain (2.11) simply by replacing all products with paraproducts in (2.7).

We then consider the parahomological equation for the unknown $U=(u,\lambda )\in H^s\times \mathbb {R}$ :

(2.12)

$$ \begin{align} \begin{aligned} T_{(1+u'\circ\varrho_{\alpha})}\Delta_\alpha T_{1/(1+u')}u &=f+\mathcal{R}_{\mathrm{PL}}(f,u)\cdot u-R_1(u)-\lambda. \end{aligned} \end{align} $$

Equation (2.12) collects all terms but the one linear in $\mathscr {F}(f,U)$ in (2.11). It can be directly reduced to fixed point form, which we shall refer as the parainverse equation following Hörmander [Reference Hörmander44]:

(2.13)

$$ \begin{align} \begin{aligned} u &=T_{1/(1+u')}^{-1}\Delta_\alpha^{-1}T_{(1+u'\circ\varrho_{\alpha})}^{-1}\left[f+\mathcal{R}_{\mathrm{PL}}(f,u)\cdot u-R_1(u)-\lambda\right]. \end{aligned} \end{align} $$

The equation itself uniquely determines $\lambda $ : it should balance out the mean value to make $\Delta _\alpha ^{-1}$ applicable.

By Proposition 2.2, the remainder

$$ \begin{align*}R_1(u)\in H^{s+r-1}\subset H^{s+\tau+\varepsilon}, \end{align*} $$

is continuous in $u\in H^s$ , and vanishes quadratically as $u\mapsto 0\ni H^s$ . By proposition 2.3,

$$ \begin{align*}\big\|\mathcal{R}_{\mathrm{PL}}(f,u)\cdot u\big\|_{H^{s+r}} \leq C_s|f|_{C^{N_{s+r}+2}}\|u\|_{H^s}. \end{align*} $$

Due to Proposition 2.1 and (2.4), we find that the right-hand-side of (2.13) is a well-defined continuous mapping from $H^s$ to $H^{s+\varepsilon }$ . The paradifferential remainder estimates further ensure that if ${\|u\|_{H^s}\leq \delta \ll 1}$ , then the $H^{s+\varepsilon }$ norm of the right-hand-side of (2.13) is controlled by

$$ \begin{align*}C\gamma^{-1}\left(|f|_{C^{N_{s+r}+2}}\delta+\delta^2\right). \end{align*} $$

Thus if we further require $\delta \ll \min (\gamma ,1)$ and $|f|_{C^{N_{s+r}+2}}\ll \delta $ , the right-hand-side of (2.13) will be a continuous mapping from the closed ball $\bar B_\delta (0)\subset H^s$ to itself, with range actually in $H^{s+\varepsilon }$ . By the Schauder fixed point theorem, such a mapping has a fixed point u. We thus have a solution $U=(u,\lambda )$ of the parainverse equation (2.13), hence the parahomological equation (2.12).

Given this solution U, the paralinearization formula (2.11) yields

$$ \begin{align*}\mathscr{F}(f,U) -T_{[\mathscr{F}(f,U)]'/(1+u')}u=0. \end{align*} $$

If we consider the left-hand-side as a linear operator acting on $\mathscr {F}(f,U)\in C^1$ , then by Proposition 2.1, it follows that

$$ \begin{align*}\big\|T_{[\mathscr{F}(f,U)]'/(1+u')}u\big\|_{H^{s}} \lesssim \big|\mathscr{F}(f,U)\big|_{C^1}\|u\|_{H^s}. \end{align*} $$

So if $|f|_{C^{N_{s+r}+2}}$ is sufficiently small (hence the solution u has small $H^s$ norm), a Neumann series argument forces $\mathscr {F}(f,U)=0$ .

Remark 2.1. Schauder’s fixed point theorem implies the existence of solution in a nonconstructive manner. If we require more differentiability of f, for example $f\in C^{N_{s+r}+3}$ instead of $C^{N_{s+r}+2}$ , then the remainder $\mathcal {R}_{\mathrm {PL}}(f,u)\cdot uu$ will be continuously differentiable in u. In fact, we can directly compute (see, e.g., [Reference Hörmander44]) the linearization of paralinearization remainder along an increment v as

$$ \begin{align*}\begin{aligned} D_u\left[\mathcal{R}_{\mathrm{PL}}(F,u)\cdot u\right]v &=F'(u)v-T_{F"(u)v}u-T_{F'(u)}v\\ &=T_v\left(\mathcal{R}_{\mathrm{PL}}(F',u)\cdot u\right)+\text{smoother terms}. \end{aligned} \end{align*} $$

If the proof above, if $f\in C^{N_{s+r}+3}$ , then $u\mapsto \mathcal {R}_{\mathrm {PL}}(f,u)\cdot u$ is a $C^1$ mapping from $H^s$ to $H^{s+r}$ . Given that f is close to $0$ in $C^{N_{s+r}+3}$ , a Banach fixed point argument is then applicable, since the right-hand-side of (2.13) will have Lipschitz constant $<1$ in u. The solution u is thus the limit of a Banach fixed point iteration sequence. Furthermore, if we assume $f\in C^\infty $ , then by differentiating the fixed point equation (2.1), we obtain a linear iterative sequence that converges to the derivative of u. This easily yields the additional regularity $u\in C^\infty $ .

Formally, (2.12) is the paradifferential counterpart of the linearized problem (2.5). Nevertheless, the gain of regularity through paralinearization balances with the loss, and thus renders unnecessary any convergence-improvement technique. This significantly differs from the prevailing practice in the existing KAM theory literature, with perhaps the only exception being Herman’s Schwarz derivative technique; see the discussion in Subsection 1.2.

We note that the usual KAM/Nash-Moser iteration (2.9) relies on smoothing operators that are essentially Fourier truncations, which are also used in the construction of paradifferential operators as seen in (2.1). However, smoothing operators are not exploited to its maximal potential in KAM/Nash-Moser iteration since interactions of different frequencies are not used to balance the regularity loss caused by $\Delta _\alpha ^{-1}$ . This is exactly captured by paraproduct operators and paralinearization.

Furthermore, we emphasize that paraproduct operators, together with the remainders in paralinearization, all admit very explicit expressions, essentially involving only convolutions. Consequently, the size of the perturbation f in (2.13), hence in the original conjugacy problem (2.1), does not have to be unreasonably small, as the proof of Banach or Schauder fixed point theorem indicates.

2.5 Refinement in regularity

If we take into account that $x+u(x)$ is a diffeomorphism of $\mathbb {S}^1$ , then the loss of regularity can be managed more delicately using paracomposition operator introduced by Alinhac [Reference Alinhac5].

Definition 2.2. Let $\chi :\mathbb {T}^n\mapsto \mathbb {T}^n$ be a Lipschitz diffeomorphism. The paracomposition operator $\chi ^\star $ associated to $\chi $ is defined by

$$ \begin{align*}\chi^\star F:=\sum_{j\geq0}(S_{j+N}-S_{j-N})\big((\Delta_jF)\circ\chi\big). \end{align*} $$

Here N depends only on $|\partial \chi |_{L^\infty }$ .

In fact, the paracomposition operator admits several equivalent definitions, see for example Appendix A of Chapter 2 in [Reference Taylor82] or Section 5 of [Reference Alazard and Métivier2]; but we do not aim to elaborate them within our current paper. We directly cite several results concerning regularity of paracomposition operators. The proof can be found in, for example, [Reference Alinhac5], Appendix A of Chapter 2 in [Reference Taylor82], Section 3 of [Reference Nguyen72], or Section 5 of [Reference Said76]. Roughly speaking, the paracomposition operator captures the “irregularity” of the paralinearization remainder $F(u)-T_{F'(u)}u$ .

Proposition 2.4. If $\chi :\mathbb {T}^n\mapsto \mathbb {T}^n$ is a Lipschitz diffeomorphism, then $\chi ^\star $ is a bounded linear operator from $H^s$ to $H^s$ for all $s\in \mathbb {R}$ :

$$ \begin{align*}\|\chi^\star\|_{\mathcal{L}(H^s,H^s)} \leq K_s\big(|\partial\chi|_{L^\infty},|\partial\chi^{-1}|_{L^\infty}\big). \end{align*} $$

Furthermore, given $F\in H^s(\mathbb {T}^n)$ , the mapping $\chi \mapsto \chi ^\star F$ is continuous from $W^{1,\infty }$ to $H^s$ .

Proposition 2.5. Suppose $\rho>1$ , $\chi :\mathbb {T}^n\mapsto \mathbb {T}^n$ is a $C^\rho _*$ diffeomorphism, and $F\in H^{s+1}(\mathbb {S}^1)$ with $r:=s-n/2>0$ . Then the following paralinearization formula holds:

$$ \begin{align*}F\circ\chi=\chi^\star F+T_{F'\circ\chi}\chi+\mathcal{R}_{A}(\chi)F, \end{align*} $$

where

$$ \begin{align*}\big\|\mathcal{R}_{A}(\chi)F\big\|_{H^{\rho+\min(\rho,r)}} \leq K_s\big(|\chi|_{C^\rho_*},|\chi^{-1}|_{C^1}\big)\|F\|_{H^{s+1}}. \end{align*} $$

Furthermore, given $F\in H^s(\mathbb {T}^n)$ and $\varepsilon>0$ , the mapping $\chi \mapsto \mathcal {R}_{A}(\chi )F$ is continuous from $C^{\rho }_*$ to $H^{s-\varepsilon }$ .

Remark 2.2. In fact, continuous dependence of paracomposition on the diffeomorphism $\chi $ was not explicitly proved in the literature cited above. However, one can directly read it off from the proof with a mere application of the dominated convergence theorem.

We can give a refined proof of sovability of (2.1). We still assume in a priori that $u\in H^s$ , where $s\geq \tau +1.5+\varepsilon $ , and $f\in H^{s+1}$ . We then pick $r=\rho =s-1/2$ . By Proposition 2.5, with $\chi =\mathrm {Id}+u$ , we have the paralinearization formula for $\mathscr {F}(f,U)$ :

(2.14)

$$ \begin{align} \begin{aligned} \mathscr{F}(f,U) &=\Delta_\alpha u-\chi^\star f-T_{f'\circ\chi}\chi -\mathcal{R}_A(\chi)f+\lambda\\ &=\Delta_\alpha u-T_{\Delta_\alpha u'/(1+u')}u-\chi^\star f +T_{[\mathscr{F}(f,U)]'/(1+u')}u -\mathcal{R}_A(\chi)f+\lambda\\ &=T_{(1+u'\circ\varrho_{\alpha})}\Delta_\alpha T_{1/(1+u')}u -\chi^\star f +T_{[\mathscr{F}(f,U)]'/(1+u')}u -\mathcal{R}_A(\chi)f+R_1(u)+\lambda. \end{aligned} \end{align} $$

Here the computations are almost the same as those in (2.11): the remainder $R_1(u)\in H^{s+r-1}\subset H^{s+\tau +\varepsilon }$ and vanishes bilinearly when $H^s\ni u\mapsto 0$ , while the paracomposition remainder $\mathcal {R}_A(\chi )f\in H^{2s-1}$ by Proposition 2.5.

We then consider the parahomological equation for the unknown $U=(u,\lambda )\in H^s\times \mathbb {R}$ :

(2.15)

$$ \begin{align} \begin{aligned} T_{(1+u'\circ\varrho_{\alpha})}\Delta_\alpha T_{1/(1+u')}u &=\chi^\star f+\mathcal{R}_A(\chi)f-R_1(u)-\lambda. \end{aligned} \end{align} $$

Equation (2.15) is in fact just (2.12) in a delicate form. The parainverse equation (2.13) then takes an equivalent form

(2.16)

$$ \begin{align} \begin{aligned} u &=T_{1/(1+u')}^{-1}\Delta_\alpha^{-1}T_{(1+u'\circ\varrho_{\alpha})}^{-1}\big(\chi^\star f+\mathcal{R}_A(\chi)f-R_1(u)-\lambda\big). \end{aligned} \end{align} $$

Here the $\lambda $ is still uniquely determined to balance out the mean value, making $\Delta _\alpha ^{-1}$ applicable.

Due to Proposition 2.4 and 2.5, we find that the right-hand-side of (2.16) is a well-defined continuous mapping from $H^s$ to $H^{s+\varepsilon }$ . The paradifferential remainder estimates further ensure that if $\|u\|_{H^s}\leq \delta $ and $|u'|_{L^\infty }\leq \rho $ , then the $H^{s+\varepsilon }$ norm of the right-hand-side of (2.13) is controlled by

$$ \begin{align*}K(\delta,\rho)\|f\|_{H^{s+\tau+\varepsilon}}+C\delta^2, \end{align*} $$

where K is an increasing function. Thus if $\|f\|_{H^{s+\tau +\varepsilon }}$ is sufficiently small, the right-hand-side of (2.16) will be a continuous mapping from the closed ball $\bar B(0,\delta )\subset H^s$ to itself, with compact range. By the Schauder fixed point theorem, such a mapping has a fixed point. The rest of the argument will then be identical with Subsection 2.4.

Sharing the same spirit with Subsection 2.4, the paracomposition approach just presented obviously requires less regularity for f when $\tau \leq 2.5$ , at the price of being technically more complicated. To keep our narration as simple as possible, we do not elaborate this approach within this paper. See Appendix A for more discussion.

2.6 Heuristics about generality

It turns out that the aforementioned method is valuable not only in one-dimensional scenarios. In fact, it encompasses all the features of general conjugacy problems of interest in dynamical systems. Roughly speaking, this is the paradifferential calculus version of Zehnder’s heuristics in Section 5 of [Reference Zehnder86]. We refer the reader to Zehnder’s paper for his original argument, and present here the paradifferential version.

The formalism of conjugacy problems is as follows. Given an open set $\mathfrak {B}$ in a Banach space and an infinite dimensional Lie group $\mathfrak {G}$ , suppose there is a differentiable group action (“dynamical system”)

$$ \begin{align*}\Phi:\mathfrak{B}\times\mathfrak{G}\mapsto\mathfrak{B}. \end{align*} $$

Then $\Phi $ is subjected to the algebraic relation

(2.17)

$$ \begin{align} \Phi(f,\mathrm{Id})=f, \quad \Phi(f,g_2\circ g_1)=\Phi\big(\Phi(f,g_2),g_1\big). \end{align} $$

This is simply a result of definition of group homomorphism.

Now suppose $\mathfrak {N}\subset \mathfrak {B}$ is a closed linear submanifold, usually being the normal forms of the dynamical system. The conjugacy problem then reads: is $\mathfrak {N}$ structurally stable in this dynamical system? That is, if $f\in \mathfrak {B}$ is close to $\mathfrak {N}$ , is it possible to find an element $g\in \mathfrak {G}$ close to the identity, so that $\Phi (f,g)\in \mathfrak {N}$ ?

To answer this question, introduce the unknwon $u=(\gamma ,\mathfrak {n})$ , where $\gamma $ belongs to the Lie algebra of $\mathfrak {G}$ and $\mathfrak {n}\in \mathfrak {N}$ . Suppose $\mathcal {E}$ is a local chartFootnote ³ near the identity of the group $\mathfrak {G}$ . Define a mapping

$$ \begin{align*}\mathscr{F}(f,u)=\Phi(f,\mathcal{E}(\gamma))-\mathfrak{n}. \end{align*} $$

The conjugacy problem is then equivalent to finding zeroes of $\mathscr {F}$ . Suppose that an approximate solution $u_0=(0,\mathfrak {n}_0)$ is already found, that is $\mathscr {F}(f,u_0)$ is close to 0. We then aim to look for u close to 0 such that $\mathscr {F}(f,u)=0$

Introduce the map $\chi _\gamma $ on the Lie algebra of $\mathfrak {G}$ by $\mathcal {E}(\chi _\gamma (\gamma _1))=\mathcal {E}(\gamma )\circ \mathcal {E}(\gamma _1)$ . Then (2.17) becomes

$$ \begin{align*}\Phi\big(f,\chi_\gamma(\gamma_1)\big) =\mathscr{F}\big(\Phi(f,\gamma),\gamma_1\big). \end{align*} $$

Denote by $v=(\psi ,\mathfrak {m})$ the increment of $u=(\gamma ,\mathfrak {n})$ , and set $L_\gamma v=(D\chi _\gamma (0)\psi ,\mathfrak {m})$ . Differentiating with respect to u, evaluating at $\gamma _1=0$ , we obtain

$$ \begin{align*}D_u\mathscr{F}(f,u)L_\gamma v =D_u\mathscr{F}\big(\Phi(f,\gamma),(0,\mathfrak{n})\big)v. \end{align*} $$

Thus by Taylor’s formula,

$$ \begin{align*}\begin{aligned} D_u\mathscr{F}(f,u)L_\gamma v &=D_u\mathscr{F}\big(\mathfrak{n},(0,\mathfrak{n})\big)v +\left(D_u\mathscr{F}\big(\Phi(f,\gamma),(0,\mathfrak{n})\big) -D_u\mathscr{F}\big(\mathfrak{n},(0,\mathfrak{n})\big)\right)v\\ &=D_u\mathscr{F}\big(\mathfrak{n},(0,\mathfrak{n})\big)v +\boldsymbol{B}\big(\mathscr{F}(f,u),v\big), \end{aligned} \end{align*} $$

where $\boldsymbol {B}$ vanishes bilinearly near (0,0). Equivalently,

(2.18)

$$ \begin{align} \begin{aligned} D_u\mathscr{F}(f,u)v &=D_u\mathscr{F}\big(\mathfrak{n},(0,\mathfrak{n})\big)L_\gamma^{1}v +\boldsymbol{B}\big(\mathscr{F}(f,u),L_\gamma^{1}v\big). \end{aligned} \end{align} $$

The idea of paradifferential approach to the conjugacy equation $\mathscr {F}(f,u)=0$ is then based on the algebraic identity (2.18). Suppose the Banach space $\mathfrak {B}$ and Lie group $\mathfrak {G}$ are realized as functions and (finite-dimensional) diffeomorphisms, and the mapping $\mathscr {F}(f,u)$ is a classical nonlinear differential operator acting on u, in the sense that it is a function of $(u,\partial u,\partial ^2u,\cdots )$ . Then the paralinearization theorem of Bony [Reference Bony11] should imply

(2.19)

$$ \begin{align} \begin{aligned} \mathscr{F}(f,u) &=\mathscr{F}(f,u_0)+T_{D_u\mathscr{F}(f,u)}(u-u_0) +\boldsymbol{R}_1(u-u_0)\\ &=T_{D_u\mathscr{F}\big(\mathfrak{n},(0,\mathfrak{n})\big)}T_{L_\gamma^{1}}(u-u_0) +\mathscr{F}(f,u_0)+\boldsymbol{R}_2(u-u_0) +\boldsymbol{B}\big(T_{\mathscr{F}(f,u)},T_{L_\gamma^{1}}(u-u_0)\big). \end{aligned} \end{align} $$

Here $u=(\gamma ,\mathfrak {n})\simeq (0,\mathfrak {n}_0)$ , and $\boldsymbol {R}_1,\boldsymbol {R}_2$ are smoother remainders produced by paralinearization. The somewhat abused notation $\boldsymbol {B}\big (T_{\mathscr {F}(f,u)},T_{L_\gamma ^{1}}(u-u_0)\big )$ stands for the replacement of “product” $\boldsymbol {B}\big ({\mathscr {F}(f,u)},L_\gamma ^{1}(u-u_0)\big )$ by the corresponding “paraproduct” $\boldsymbol {B}\big (T_{\mathscr {F}(f,u)},T_{L_\gamma ^{1}}(u-u_0)\big )$ , should $\boldsymbol {B}$ involve only usual functional operations.

The following step is to make sure that the parahomological equation

(2.20)

$$ \begin{align} T_{D_u\mathscr{F}\big(\mathfrak{n},(0,\mathfrak{n})\big)}T_{L_\gamma^{1}}(u-u_0) +\mathscr{F}(f,u_0)+\boldsymbol{R}_2(u-u_0)=0 \end{align} $$

is solvable. We will actually consider its equivalent fixed point form, refered as the parainverse equation:

$$ \begin{align*}(u-u_0) =-T_{L_\gamma^{1}}^{-1}T_{D_u\mathscr{F}\big(\mathfrak{n},(0,\mathfrak{n})\big)}^{-1}\big(\mathscr{F}(f,u_0)+\boldsymbol{R}_2(u-u_0)\big), \end{align*} $$

and make sure that it is solvable.

For most conjugacy problems, this really is the case, even if inverting $D_u\mathscr {F}\big (\mathfrak {n},(0,\mathfrak {n})\big )$ causes a loss of regularity due to small denominator. In fact, that loss of regularity can be compensated by the additional smoothness of the paradifferential remainder $\boldsymbol {R}_2(u-u_0)$ . Thus, if $\mathscr {F}(f,u_0)$ is sufficiently smooth, the right-hand-side of the fixed point equation defines a mapping with no loss of regularity (e.g., from $H^s$ to $H^s$ ). We note that this is the paradifferential version of Zehnder’s “finding approximate right inverse.” Consequently, assuming $\mathscr {F}(f,u_0)$ is close enough to 0, a standard fixed point argument should produce a solution $u\simeq u_0$ of (2.20).

Once a solution u to the parahomological equation (2.20) is found, the first three terms in the right-hand-side of (2.19) are then cancelled, and the paralinearization formula then becomes

$$ \begin{align*}\mathscr{F}(f,u) =\boldsymbol{B}\big(T_{\mathscr{F}(f,u)},L_\gamma^{1}(u-u_0)\big) \approx T_{\mathscr{F}(f,u)}L_\gamma^{1}(u-u_0). \end{align*} $$

Now since a paradifferential operator $T_a$ depends only on very mild regularity of the symbol a, this equation must imply $\mathscr {F}(f,u)=0$ by a Neumann series argument, provided that u is close to $u_0$ .

To conclude, solution of the parahomological equation in fact solves the original conjufacy problem.

Of course, the steps outlined above are merely heuristics. In practice, each of the assumptions they rely upon must be validated in a case-by-case fashion. Nevertheless, given that these intuitive deductions pertain to very general scenarios, it is reasonable to expect that they adequately cover several well-studied conjugacy problems. We point out, omitting most details, that the following model problems all fit into this formalism: conjugation of standard map (see, e.g., Section 2 of [Reference de la Llave60]), structural stability of parallel vector field on torus (see, e.g., [Reference Pöschel74]), existence of invariant torus of symplectic map (see, e.g., [Reference de la Llave, González, Jorba and Villanueva59]), and conjugation of Hamiltonian vector fields. We will elaborate on the last one in Section 4.

The idea of replacing a linearized equation by its paradifferential version dates back to Hörmander’s paper [Reference Hörmander44], where Hörmander proposed the terminology parainverse. Hörmander only considered nonlinear mappings with exactly invertible linearized operators, instead of approximately invertible ones. Nevertheless, it is still appropriate to attribute the core idea of the aforementioned heuristics to [Reference Hörmander44] (and [Reference Zehnder86]).

3 Quantitative paraproduct and paralinearization estimates

In this section, we present quantitative estimates concerning paraproducts and paralinearization. They are refinements of standard contents in modern harmonic analysis; see for example, [Reference Bony11, Reference Meyer64, Reference Meyer65], Chapter 9-10 of [Reference Hörmander45], or Chapter 3-4 of [Reference Métivier63]. From a harmonic analysis point of view, paradifferential operators fall into special subclasses of pseudo-differential operators of “forbidden” type (1,1), but still enjoy a symbolic calculus as usual (1,0) pseudo-differential operators do. However, to keep prerequisites minimal, we attempt to give neither comprehensive exposition of general (1,1) pseudo-differential operators nor the symbolic calculus, but confine ourselves with what is necessary.

3.1 Meyer multipliers

Before providing the quantitative paradifferential operator estimates, let us first state some auxiliary results relating to Meyer multipliers. Formally, given a sequence $\{m_j(x)\}_{j\geq 0}$ of functions on $\mathbb {T}^n$ , we call the linear operator

$$ \begin{align*}f(x)\mapsto\sum_{j\geq0} m_j(x)\cdot(\Delta_jf)(x) \end{align*} $$

the Meyer multiplier associated to $\{m_j\}_{j\geq 0}$ . Here the $\Delta _j$ ’s are the Littlewood-Paley building block operators introduced in (1.10). The fundamental theorem for such a linear operator is the following:

Proposition 3.1 (Meyer multiplier).

Let $s,r\in \mathbb {R}$ , and suppose that $s+r>0$ . Let $N_{s+r}$ be the smallest integer such that $N_{s+r}>s+r$ . Suppose $(m_j)_{j\geq 0}$ is a sequence in $L^\infty (\mathbb {T}^n)$ , such that for multi-indices $\alpha $ with $|\alpha |\leq N_{s+r}$ , there holds

$$ \begin{align*}\big|\partial^\alpha m_j\big|_{L^\infty} \leq M_\alpha 2^{j(|\alpha|-r)}. \end{align*} $$

Then the linear operator

$$ \begin{align*}f\mapsto\sum_{j=0}^\infty m_j\Delta_j f \end{align*} $$

is bounded from $H^s(\mathbb {T}^n)$ to $H^{s+r}(\mathbb {T}^n)$ . The norm of this operator is estimated by

$$ \begin{align*}C_s\sum_{\beta:|\beta|\leq N_{s+r}}M_\beta. \end{align*} $$

We need a lemma concerning Sobolev regularity of series to aid the proof of Proposition 3.1. It is of some independent interest per se.

Lemma 3.1. Let $s>0$ , let $N_s$ be the smallest integer such that $N_s>s$ . There is a constant $C_s$ with the following properties. If $\{f_k\}_{k\geq 0}$ is a sequence in $H^{N_s}(\mathbb {T}^n)$ , such that for any multi-index $\alpha $ with $|\alpha |\leq N_s$ , there holds

$$ \begin{align*}\|\partial^\alpha f_k\|_{L^2}\leq 2^{k(|\alpha|-s)}c_k,\quad (c_k)\in \ell^2(\mathbb{N}), \end{align*} $$

then $\sum _k f_k\in H^s(\mathbb {R}^n)$ and

$$ \begin{align*}\left\|\sum_{k\geq0}f_k\right\|_{H^s}^2 \leq C_s\sum_{k\geq0}c_k^2. \end{align*} $$

Proof. If $j\leq k$ , then by taking $\alpha =0$ , obviously

$$ \begin{align*}\|\Delta_jf_k\|_{L^2}\leq\|f_k\|_{L^2}\leq 2^{-ks}c_k. \end{align*} $$

If $j>k$ , then by Bernstein’s inequality,

$$ \begin{align*}\|\Delta_jf_k\|_{L^2} \leq C_s2^{-N_sj}\sum_{|\alpha|=N_s}\|\partial^\alpha f_k\|_{L^2} \leq C_s2^{-ks}2^{N_s(k-j)}c_k. \end{align*} $$

Thus, for $f=\sum _kf_k$ , we have

(3.1)

$$ \begin{align} 2^{js}\|\Delta_j f\|_{L^2} \leq C_s\sum_{k:k<j}2^{(N_s-s)(k-j)}c_k +\sum_{k:k\geq j}2^{(j-k)s}c_k. \end{align} $$

Given that $s>0$ and $N_s>s$ , the right-hand-side is the convolution of the $\ell ^1$ sequences $(2^{-(N_s-s)k}),(2^{-sk})$ with the $\ell ^2$ sequence $(c_k)$ . Therefore, the right-hand-side forms an $\ell ^2$ sequence (with index j), with norm controlled by $C_s\|(c_k)\|_{\ell ^2}$ . Thus $f\in H^s$ .

Proof of Proposition 3.1.

In fact, for $|\alpha |\leq N_{s+r}$ , each summand $m_j\Delta _j f$ satisfies

$$ \begin{align*}\begin{aligned} \big\|\partial^\alpha(m_j\Delta_jf)\big\|_{L^2} &\leq C_\alpha \sum_{\beta:\beta\leq\alpha} \big|\partial^\beta m_j\big|_{L^\infty}\big\|\partial^{\alpha-\beta}\Delta_jf\big\|_{L^2} \\ &\leq C_\alpha 2^{j(|\alpha|-r-s)}\left(\sum_{\beta:\beta\leq\alpha}M_\beta\right)2^{js}\|\Delta_jf\|_{L^2}. \end{aligned} \end{align*} $$

We can then directly apply Lemma 3.1 to conclude.

Corollary 3.1.1. If the sequence $\{f_k\}$ in Lemma 3.1 in addition satisfies the spectral condition

$$ \begin{align*}\operatorname{\mathrm{supp}}\hat{f}_k\subset\{|\xi|\geq \delta 2^k\} \end{align*} $$

for some $\gamma>0$ , then the restriction $s>0$ in Lemma 3.1 can be relaxed to any real index s:

$$ \begin{align*}\left\|\sum_{k\geq0}f_k\right\|_{H^s}^2 \leq C_{s,\delta}\sum_{k\geq0}c_k^2. \end{align*} $$

Proof. In fact, if the $f_k$ ’s satisfy the spectral condition, then the second sum in (3.1) becomes a finite one, controlled by $O(\log (\delta ^{-1}))$ . Thus the sequence $\sum _{k:k\geq j}2^{(j-k)s}c_k$ automatically forms an $\ell ^2$ sequence (with index j).

Remark 3.1. Suppose the multipliers $m_j=m_j(\lambda )$ in Proposition 3.1 depend on a parameter $\lambda $ varying in some Banach space, such that

$$ \begin{align*}\max_{\alpha:|\alpha|\leq N_{s+r}}\sup_j2^{-j(|\alpha|-r)}\big|\partial^\alpha D_\lambda m_j(\lambda)\big|_{L^\infty} <\infty. \end{align*} $$

Then we can simply repeat the proof of Proposition 3.1 to conclude the following: the operator-valued function

$$ \begin{align*}\lambda\mapsto\sum_{j=0}^\infty m_j(\lambda)\Delta_j \in\mathcal{L}(H^s,H^{s+r}) \end{align*} $$

is differentiable in $\lambda $ under the norm topology of $\mathcal {L}(H^s,H^{s+r})$ , whose differential is simply

$$ \begin{align*}\sum_{j=0}^\infty D_\lambda m_j(\lambda)\Delta_j. \end{align*} $$

3.2 Quantitative estimates for paraproduct operators

With the aid of Meyer multiplier estimates, we are now able to provide a quantitative proof of boundedness of paraproduct operators. We will first list several properties related to Littlewood-Paley characterization of the Zygmund space. The proof is quite elementary using the definition of Zygmund spaces, as in (1.11).

Proposition 3.2. Suppose $r>0$ and $f\in C^r_*$ . Then for any multi-index $\alpha $ , there holds

$$ \begin{align*}\big|\partial^\alpha S_jf\big|_{L^\infty} \lesssim_{r,\alpha} \left\{ \begin{aligned} & 2^{j(|\alpha|-r)_+}|f|_{C^r_*} & \quad |\alpha|\neq r \\ & j|f|_{C^r_*} & \quad |\alpha|=r \end{aligned} \right. \end{align*} $$

where $s_+=\max (s,0)$ . Furthremore, there holds

$$ \begin{align*}|f-S_jf|\lesssim_r 2^{-jr}|f|_{C^r_*}. \end{align*} $$

We are now at the place to state and prove the continuity of paraproduct operators, which appeared as Proposition 2.1.

Proposition 3.3. For all $a\in L^\infty (\mathbb {T}^n)$ and all $s\in \mathbb {R}$ , the paraproduct $T_a$ is a bounded linear operator from $H^s(\mathbb {T}^n)$ to $H^s(\mathbb {T}^n)$ :

$$ \begin{align*}\left\Vert T_a\right\Vert{}_{\mathcal{L}(H^s,H^s)}\leq C_s|a|_{L^\infty}. \end{align*} $$

Proof. In fact, in the defining equality (2.10), each summand has Fourier support in the annulus $\{0.25\cdot 2^j \leq |\xi |\leq 2.25\cdot 2^j\}$ . We then simply notice that the Meyer multipliers $m_j=S_{j-3}a$ satisfy

$$ \begin{align*}\big|\partial^\alpha m_j\big|_{L^\infty} \leq C_\alpha 2^{j|\alpha|}|a|_{L^\infty} \end{align*} $$

by Proposition 3.2. Thus we can repeat the proof of Corollary 3.1.1 and conclude.

The nontrivial fact about paraproduct operators is that they behave as if they were genuine multiplication operators: the multiplication $T_aT_b$ is the same as $T_{ab}$ modulo smoothing operators. Formally, this means that $a\mapsto T_a$ is an algebra homomorphism from the algebra of functions to the algebra of bounded operators (modulo smoothing ones). We provide the precise version of Proposition 2.2:

Proposition 3.4. Fix $r>0$ , and suppose $a,b\in C^r_*$ . Then

$$ \begin{align*}\mathcal{R}_{\mathrm{CM}}(a,b):=T_a\circ T_b-T_{ab} \end{align*} $$

is a bounded linear operator from $H^s(\mathbb {T}^n)$ to $H^{s+r}(\mathbb {T}^n)$ for all $s\in \mathbb {R}$ , where CM is the abbreviation for composition de paramultiplication, satisfying the tame estimate

$$ \begin{align*}\left\Vert \mathcal{R}_{\mathrm{CM}}(a,b)\right\Vert{}_{\mathcal{L}(H^s, H^{s+r})}\leq C_{s,r}\big(|a|_{L^\infty}|b|_{C^r_*}+|a|_{C^r_*}|b|_{L^\infty}\big). \end{align*} $$

In other words, $\mathcal {R}_{\mathrm {CM}}:C^r_* \times C^r_* \mapsto \mathcal {L}(H^s, H^{s+r})$ is a bounded bilinear mapping.

Proof. The proof here is exactly the same as Theorem 2.3 of [Reference Bony11]. Suppose $a,b\in C^r_*$ and $u\in H^s$ are given. We set $v=T_bu$ , $v_q=S_{q-3}b\cdot \Delta _qu$ . By Corollary 3.1.1 and the definition (1.11) of Zygmund space norm, the error

$$ \begin{align*}R_1v:=T_av-\sum_{p}S_{p-5}a\cdot \Delta_pv =\sum_{p}(\Delta_{p-4}a+\Delta_{p-3}a)\Delta_pu \end{align*} $$

satisfies $\|R_1v\|_{H^{s+r}}\leq C_{s,r}|a|_{C^r_*}|b|_{L^\infty }\|u\|_{H^{s}}$ , since each summand $(\Delta _{p-4}a+\Delta _{p-3}a)\Delta _pv$ has Fourier support contained in the annulus $\{0.25\cdot 2^p\leq |\xi |\leq 2.25\cdot 2^p\}$ . We then compute

(3.2)

$$ \begin{align} \begin{aligned} T_aT_bu-\sum_{q}(S_{q-5}a)(S_{q-3}b)\Delta_qu &= T_aT_bu-\sum_{q}\sum_{p:p\leq q-5}\Delta_pa\cdot v_q\\ &=\sum_{q}\sum_{p:p\leq q-5}\Delta_pa\cdot \big(\Delta_qv-v_q\big)+R_1T_bu\\ &=\sum_{p\geq-5}\Delta_pa\sum_{q:p+5\leq q \leq p+7} \big(\Delta_qv-v_q\big)+R_1T_bu. \end{aligned} \end{align} $$

The last step is because $\sum _q v_q=\sum _q \Delta _qv$ and $v_q$ has Fourier support contained in the annulus $\{0.25\cdot 2^q\leq |\xi |\leq 2.25\cdot 2^q\}$ . Using Corollary 3.1.1 once again, we estimate (3.2) as

(3.3)

$$ \begin{align} \left\|T_aT_bu-\sum_{q}(S_{q-5}a)(S_{q-3}b)\Delta_qu\right\|_{H^{s+r}} \leq C_{s,r}|a|_{C^r_*}|b|_{L^\infty}\|u\|_{H^s}. \end{align} $$

On the other hand, if we set

$$ \begin{align*}c_q=(S_{q-5}a)(S_{q-3}b), \end{align*} $$

then $c_q$ has Fourier support contained in the ball $|\xi |\leq 0.25\cdot 2^q$ , and

(3.4)

$$ \begin{align} \begin{aligned} |ab-c_q|_{L^\infty} &\leq|\big((\mathrm{Id}-S_{q-5})a\big)b|_{L^\infty} +|(S_{q-5}a)\big((\mathrm{Id}-S_{q-5})b\big)|_{L^\infty}\\ &\leq \sum_{p>q-5}2^{-pr}|a|_{C^r_*}|b|_{L^\infty} +|a|_{L^\infty}\sum_{p>q-5}2^{-pr}|b|_{C^r_*}\\ &\leq C_r2^{-q}\big(|a|_{C^r_*}|b|_{L^\infty}+|a|_{L^\infty}|b|_{C^r_*}\big). \end{aligned} \end{align} $$

Noticing that

$$ \begin{align*}T_{ab}u-\sum_q c_q\Delta_qu =\sum_{q}\big(S_{q-3}(ab)-c_q\big)\cdot\Delta_qu =\sum_{q}\big(S_{q-3}(ab)-ab+ab-c_q\big)\cdot\Delta_qu, \end{align*} $$

using Corollary 3.1.1 and (3.4), we find

(3.5)

$$ \begin{align} \left\|T_{ab}u-\sum_{q}c_q\cdot\Delta_qu\right\|_{H^{s+r}} \leq C_{s,r}\big(|a|_{C^r_*}|b|_{L^\infty}+|a|_{L^\infty}|b|_{C^r_*}\big)\|u\|_{H^s}. \end{align} $$

Combining (3.3) and (3.5), the proof is complete.

3.3 Bony’s paralinearization theorem

In this subsection, we present a quantitative refinement of Bony’s paralinearization theorem in [Reference Bony11], which asserts that given a nonlinear composition $F(u)$ , the paraproduct $T_{F'(u)}u$ carries most of the irregularity, and

$$ \begin{align*}F(u)-T_{F'(u)}u \end{align*} $$

is more regular than $F(u)$ . This is a far-reaching generalization of the paraproduct decomposition.

While Bony’s theorem is well-established and extensively applied in modern nonlinear analysis, we still need to manage the size of the smoothing remainder to effectively study the parahomological equation. We state the following refined version of Proposition 2.3:

Proposition 3.5. Fix $s,r>0$ , and suppose $u\in (H^s\cap C^r_*)(\mathbb {T}^n;\mathbb {R}^L)$ . Let $N_{s+r}$ be the smallest integer $>s+r$ . Suppose $F=F(x,z)\in C^{N_{s+r}+2}(\mathbb {T}^n\times \mathbb {R}^L)$ . Then there holds the following paralinearization formula:

$$ \begin{align*}\begin{aligned} F(x,u)-F(x,0) &=T_{F^{\prime}_z(x,u)}u +\mathcal{R}_{\mathrm{PL}}(F,u)\cdot u\\ &=:T_{F^{\prime}_z(x,u)}u +\mathcal{R}_{\mathrm{PL};1}(F)\cdot u +\mathcal{R}_{\mathrm{PL};2}(F,u)\cdot u \in H^s+H^{s+r}. \end{aligned} \end{align*} $$

Here $\mathcal {R}_{\mathrm {PL};1}(F):H^s\mapsto H^{s+r}$ and $\mathcal {R}_{\mathrm {PL};2}(F,u):H^s\mapsto H^{s+r}$ are bounded linear operators, so that for some increasing function $K_{r,s}$ depending only on $r,s$ ,

$$ \begin{align*}\begin{aligned} \left\|\mathcal{R}_{\mathrm{PL};1}(F)\right\|_{\mathcal{L}(H^s,H^{s+r})} &\leq C_{r,s}\big|F^{\prime}_z(x,0)-\operatorname{\mathrm{Avg}} F^{\prime}_z(x,0)\big|_{C^{r}_*}\\ \left\|\mathcal{R}_{\mathrm{PL};2}(F,u)\right\|_{\mathcal{L}(H^s,H^{s+r})} &\leq K_{r,s}\big(|u|_{C^r_*}\big)|F|_{C^{N_{s+r}+2}}|u|_{C^r_*} \end{aligned} \end{align*} $$

Moreover, the operator $\mathcal {R}_{\mathrm {PL};2}\big (F,u\big )\in \mathcal {L}(H^s,H^{s+r})$ depends continuously on $u\in H^s\cap C^r_*$ in the operator norm.

If in addition, $F=F(x,z)$ is of better regularity $C^{N_{s+r}+3}(\mathbb {T}^n\times \mathbb {R}^L)$ instead of merely $C^{N_{s+r}+2}(\mathbb {T}^n\times \mathbb {R}^L)$ , then both operators $T_{F^{\prime }_z(x,u)}\in \mathcal {L}(H^s,H^{s})$ and $\mathcal {R}_{\mathrm {PL};2}\big (F(x,\cdot ),u\big )\in \mathcal {L}(H^s,H^{s+r})$ are continuously differentiable in $u\in H^s\cap C^{r}_*$ with respect to operator norms.

Proof. The proof is a refinement of the standard telescope series argument; see for example, Chapter 10 of [Reference Hörmander45]. We write

(3.6)

$$ \begin{align} \begin{aligned} F(x,u)-F(x,0) &=F(x,S_0u)+\sum_{j=1}^\infty F(x,S_ju)-F(x,S_{j-1}u)\\ &=\left(\int_0^1F^{\prime}_z(x,\tau \Delta_0u)\operatorname{\mathrm{d}}\!\tau\right)\Delta_0u +\sum_{j=1}^\infty\left(\int_0^1 F^{\prime}_z\big(x,S_{j-1}u+\tau\Delta_j u\big)\operatorname{\mathrm{d}}\!\tau\right)\Delta_ju. \end{aligned} \end{align} $$

Defining the multipliers

$$ \begin{align*}\begin{aligned} m_j^1&=(1-S_{j-3})\big(F^{\prime}_z(x,0)-\operatorname{\mathrm{Avg}} F^{\prime}_z(x,0)\big),\quad j\geq0\\ m_0^2&=\int_0^1 \Big[F^{\prime}_z\big(x,\tau\Delta_0 u\big)-F^{\prime}_z(x,0)\Big]\operatorname{\mathrm{d}}\!\tau -S_{-3}\big(F^{\prime}_z(x,u)-F^{\prime}_z(x,0)\big),\\ m_j^2&=\int_0^1 \Big[F^{\prime}_z\big(x,S_{j-1}u+\tau\Delta_j u\big)-F^{\prime}_z(x,0)\Big]\operatorname{\mathrm{d}}\!\tau -S_{j-3}\big(F^{\prime}_z(x,u)-F^{\prime}_z(x,0)\big), \quad j\geq1 \end{aligned} \end{align*} $$

we can rewrite the telescope series (3.6) as

$$ \begin{align*}F(x,u)-F(x,0) =T_{F^{\prime}_z(x,u)}u+\sum_{j=0}^\infty m_j^1\Delta_ju +\sum_{j=0}^\infty m_j^2\Delta_ju. \end{align*} $$

Let us then just define, given $u\in H^s\cap C^r_*$ , the linear operators

$$ \begin{align*}\mathcal{R}_{\mathrm{PL};1}(F)\cdot v :=\sum_{j=0}^\infty m_j^1\Delta_jv, \quad \mathcal{R}_{\mathrm{PL};2}(F,u)\cdot v :=\sum_{j=0}^\infty m_j^2\Delta_jv. \end{align*} $$

In view of Proposition 3.1, it suffices to estimate the two sets of Meyer multipliers $\{m_j^1\}$ and $\{m_j^2\}$ .

For the multipliers $\{m_j^1\}$ , we estimate them by the simple decay property in Proposition 3.2 for the Zygmund class $C^r_*$ . Since $F^{\prime }_z(x,0)\in C^{N_{s+r}+1}$ by assumption, we obtain by Proposition 3.2 that

$$ \begin{align*}|\partial^\alpha m_j^1|_{L^\infty} \leq C_{s,\alpha} 2^{j(|\alpha|-r)}\big|F^{\prime}_z(x,0)-\operatorname{\mathrm{Avg}} F^{\prime}_z(x,0)\big|_{C^{r}}, \quad |\alpha|\leq N_{s+r}. \end{align*} $$

Proposition 3.1 then ensures the estimate for the operator $\mathcal {R}_{\mathrm {PL};1}(F)$ .

The estimate for multipliers $\{m_j^2\}$ follows by the same idea but is technically more involved. In view of Proposition 3.1, it suffices to show that

(3.7)

$$ \begin{align} |\partial^\alpha m_j^2|_{L^\infty} \leq K_{s,\alpha}\big(|u|_{C^r_*}\big) 2^{j(|\alpha|-r)}|F|_{C^{N_{s+r}+2}}|u|_{C^r_*}, \quad |\alpha|\leq N_{s+r}. \end{align} $$

To obtain (3.7) with $\alpha =0$ , we notice that for $j\geq 1$ ,

(3.8)

$$ \begin{align} \begin{aligned} m_j^2&=\int_0^1 \Big[\left(F^{\prime}_z\big(x,S_{j-1}u+\tau\Delta_j u\big)-F^{\prime}_z(x,u)\right)\Big]\operatorname{\mathrm{d}}\!\tau +(1-S_{j-3})\big(F^{\prime}_z(x,u)-F^{\prime}_z(x,0)\big) \\ &=\int_{[0,1]^2}F^{\prime\prime}_{zz}\big(x,\tau_2(S_{j-1}u+\tau_1\Delta_j u)\big)\big((1-S_{j-1})u+\tau_1\Delta_ju\big)\operatorname{\mathrm{d}}\!\tau_1\operatorname{\mathrm{d}}\!\tau_2 \\ &\quad+(1-S_{j-3})\int_0^1F^{\prime\prime}_{zz}(x,\tau u)u\operatorname{\mathrm{d}}\!\tau. \end{aligned} \end{align} $$

The first term in the right-hand-side of (3.8) is directly seen (by Proposition 3.2) to be controlled by $2^{-jr}|F|_{C^{2}}|u|_{C^r_*}$ . For the second term, we simply notice that $F\in C^{N_{s+r}+2}\subset C^{r+2}$ , so

$$ \begin{align*}\big|F^{\prime\prime}_{zz}(x,\tau u)u\big|_{C^r_*} \leq K_{s}\big(|u|_{C^r_*}\big) |F|_{C^{r+2}}|u|_{C^r_*}, \end{align*} $$

which then implies the desired estimate by implementing Proposition 3.2 once again. The estiamte for $m_0^2$ is similar.

To obtain (3.7) with $|\alpha |=N_{s+r}$ , we rewrite $m_j^2$ in a different manner: for $j\geq 1$ ,

(3.9)

$$ \begin{align} \begin{aligned} m_j^2 &=\int_{[0,1]^2}F^{\prime\prime}_{zz}\big(x,\tau_2(S_{j-1}u+\tau_1\Delta_j u)\big)\big(S_{j-1}u+\tau_1\Delta_j u\big)\operatorname{\mathrm{d}}\!\tau_1\operatorname{\mathrm{d}}\!\tau_2 -S_{j-3}\int_0^1F^{\prime\prime}_{zz}(x,\tau u)u\,\operatorname{\mathrm{d}}\!\tau. \end{aligned} \end{align} $$

We then use $\big |F^{\prime\prime }_{zz}(x,\tau u)u\big |_{C^r_*} \leq K_{s}\big (|u|_{C^r_*}\big ) |F|_{C^{r+2}}|u|_{C^r_*}$ again: for $|\alpha |=N_{s+r}$ , by Proposition 3.2, the second term in (3.9) satisfies

$$ \begin{align*}\left|\partial^\alpha S_{j-3}\int_0^1F^{\prime\prime}_{zz}(x,\tau u)u\,\operatorname{\mathrm{d}}\!\tau\right| \leq 2^{j(N_{s+r}-r)}K_{s}\big(|u|_{C^r_*}\big) |F|_{C^{r+2}}|u|_{C^r_*}. \end{align*} $$

As for the first term in (3.9), by the Faà di Bruno formula, the partial derivative

$$ \begin{align*}\partial^\alpha\Big[F^{\prime\prime}_{zz}\big(x,\tau_2(S_{j-1}u+\tau_1\Delta_j u)\big)\big(S_{j-1}u+\tau_1\Delta_j u\big)\Big] \end{align*} $$

is a linear combination of terms of the form

(3.10)

$$ \begin{align} (\tau_2D_z)^{2+I}F\big(x,\tau_2(S_{j-1}u+\tau_1\Delta_j u)\big)\cdot \left[\partial^{\beta_1}\big(S_{j-1}u+\tau_1\Delta_j u\big)\right]^{i_1} \cdots \left[\partial^{\beta_l}\big(S_{j-1}u+\tau_1\Delta_j u\big)\right]^{i_l}, \end{align} $$

where the scalar indices $i_1,\cdots ,i_l,I\leq N_{s+r}$ and multi-indices $\beta _1,\cdots ,\beta _l$ satisfy

$$ \begin{align*}\begin{aligned} N_{s+r}=i_1|\beta_1|+\cdots+i_l|\beta_l|, \quad I=i_1+\cdots+i_l. \end{aligned} \end{align*} $$

By the Littlewood-Paley characterization of Zygmund functions, such a term has an upper bound

(3.11)

$$ \begin{align} C_s|F|_{C^{N_{s+r}+2}} \prod_{p:\beta_p>r}\big(|u|_{C^r_*}2^j\big)^{i_p(|\beta_p|-r)} \prod_{p:\beta_p\leq r}\big(|u|_{C^r_*}j\big)^{i_p} \leq K_{s}\big(|u|_{C^r_*}\big) |F|_{C^{N_{s+r}+2}}|u|_{C^r_*}2^{j(N_{s+r}-r)}. \end{align} $$

Note that the expression vanishes at least linearly as $u\mapsto 0$ since at least one $i_p\neq 0$ . The estimate for $\partial ^\alpha m_0^2$ is similar.

Therefore, (3.7) holds for $|\alpha |=N_{s+r}$ . By interpolation, it remains valid for those multi-indices $\alpha $ with length between $0$ and $N_{s+r}$ . This proves the operator norm estimates of the paradifferential remainders.

To prove continuous dependence on $u\in H^s\cap C^r_*$ of the operator $\mathcal {R}_{\mathrm {PL};2}\big (F(x,\cdot ),u\big )\in \mathcal {L}(H^s,H^{s+r})$ , in view of Proposition 3.1, it suffices to show that when $u_1,u_2$ are close in $C^r_*$ , then the Meyer multipliers $m_j^2$ corresponding to $u_1$ and $u_2$ will be close to each other, that is, for $|\alpha |\leq N_{s+r}$ , the quantities

(3.12)

$$ \begin{align} 2^{-j(|\alpha|-r)}\big|\partial^\alpha\big(m_j^2(u_1)-m_j^2(u_2)\big)\big|_{L^\infty} \end{align} $$

will be close to zero. This is easily verified if we take into account the assumption $F\in C^{N_{s+r}+2}$ . For example, in the summand (3.10), the worst scenario is when all the indices $i_p\equiv 1$ , so that (3.10) becomes a “highest order derivative”:

$$ \begin{align*}(\tau_2D_z)^{2+N_{s+r}}F\big(x,\tau_2(S_{j-1}u+\tau_1\Delta_j u)\big)\cdot \left[D\big(S_{j-1}u+\tau_1\Delta_j u\big)\right]^{\otimes N_{s+r}}. \end{align*} $$

When evaluated at $u_1,u_2\in C^r_*$ that are close in the $C^r_*$ topology, we find that

$$ \begin{align*}\left[(\tau_2D_z)^{2+N_{s+r}}\Delta_{12}F\big(x,\tau_2(S_{j-1}u+\tau_1\Delta_j u)\big)\right] \end{align*} $$

will be close to zero since $D_z^{2+N_{s+r}}F(x,z)$ is continuous, while using Proposition 3.2, we estimate

$$ \begin{align*}\left|\Delta_{12}\left[D\big(S_{j-1}u+\tau_1\Delta_j u\big)\right]^{\otimes N_{s+r}}\right|{}_{L^\infty} \end{align*} $$

similarly as in (3.11), with a factor $|\Delta _{12}u|_{C^r_*}$ . In conclusion, we obtain that (3.12) is close to zero when $u_1$ is close to $u_2$ in $C^r_*$ .

Finally, let us suppose F has better regularity, namely $C^{N_{s+r}+3}(\mathbb {T}^n\times \mathbb {R}^L)$ . Continuous differentiability of the operator $\mathcal {R}_{\mathrm {PL};2}\big (F(x,\cdot ),u\big )$ in u is a direct consequence of the equality

$$ \begin{align*}(D_um_j^2)v=\int_0^1 \Big[F^{\prime\prime}_{zz}\big(x,S_{j-1}u+\tau\Delta_j u\big)(S_{j-1}+\tau\Delta_j)v\Big]\operatorname{\mathrm{d}}\!\tau -S_{j-3}\big(F^{\prime\prime}_{zz}(x,u)v\big) \end{align*} $$

and Remark 3.1.

4 Existence of invariant torus

In this section, we present a detailed proof of Theorem 1.1–1.2 together with some possible extensions.

4.1 Algebraic set-up

Let us recall that we are working with the phase space $\mathbb {T}^n\times \mathbb {R}^n$ . We are interested in Hamiltonian functions $h(x,y)$ as in (1.2) and embeddings of $\mathbb {T}^n$ into the phase space close to the “flat” one (1.1)Footnote ⁴ . The torus $\{y=0\}$ is supposed to be “approximately invariant” under the phase flow of Hamilton vector field $X_h$ , and the restriction of $X_h$ is approximately the rotation generated by a constant frequency vector field $\omega \in \mathbb {R}^n$ .

Let us define the right inverse of $\nabla _\omega $ as follows:

$$ \begin{align*}\nabla_\omega^{-1}f(\theta) :=\sum_{k\in\mathbb{Z}^n\setminus\{0\}}\frac{\hat f(k)e^{ik\cdot \theta}}{i(k\cdot\omega)}. \end{align*} $$

Clearly, when restricted to the subspace of vanishing average, the operator $\nabla _\omega ^{-1}$ is also the left inverse of $\nabla _\omega $ .

The Diophantine condition (1.3) of $\omega $ enables us to estimate the operator bound of $\nabla _\omega ^{-1}$ :

(4.1)

$$ \begin{align} \big\|\nabla_\omega ^{-1}f\big\|_{H^s} \leq \gamma^{-1}\|f\|_{H^{s+\tau}}, \quad \text{if }\hat f(0)=0. \end{align} $$

We now study the conjugacy equations (1.6) and (1.9), that is,

$$ \begin{align*}X_h(u)-\nabla_\omega u=0 \end{align*} $$

and

$$ \begin{align*}X_{h_\xi}(u)-\nabla_\omega u=0. \end{align*} $$

To employ the general heuristics discussed in Subsection 2.6, we define a mapping

(4.2)

$$ \begin{align} \mathscr{F}(h,u):=X_h(u)-\nabla_\omega u, \quad \text{whose value we denote as }E. \end{align} $$

It is the “error function” measuring whether u is indeed an invariant torus of $X_h$ or not. The problem is then reduced to looking for zero $u\simeq \zeta _0$ of $\mathscr {F}(h,u)$ under the condition of Theorem 1.1, or looking for $\xi \in \mathbb {R}^n$ and zero $u\simeq \zeta _0$ of $\mathscr {F}(h_\xi ,u)$ under the condition of Theorem 1.2.

We claim that, to solve $\mathscr {F}(h,u)=0$ , it suffices to solve a slightly “softer” equation

(4.3)

$$ \begin{align} \mathscr{F}(h,u)+\begin{pmatrix}0 \\ \mu\end{pmatrix}=0 \end{align} $$

for an auxiliary constant vector $\mu \in \mathbb {R}^n$ . We directly quote the following geometric lemma from [Reference Berti and Bolle10]:

Lemma 4.1 ([Reference Berti and Bolle10], Lemma 3).

There holds

$$ \begin{align*}\mu=\operatorname{\mathrm{Avg}}\Big((\partial u^y)^{\mathsf{T}} \mathscr{F}^x(h,u)-(\partial u^x)^{\mathsf{T}} \big(\mathscr{F}^y(h,u)-\mu\big)\Big). \end{align*} $$

In particular, $\mathscr {F}(h,u)=\begin {pmatrix}0 \\ \mu \end {pmatrix}$ necessarily implies $\mu =0$ .

In this subsection, our main task is to compute the linearization of $\mathscr {F}(h,u)$ at a given u, as in [Reference de la Llave, González, Jorba and Villanueva59]. For convenience in notation, write

(4.4)

$$ \begin{align} \begin{aligned} A[u]:=(DX_h)(u) =\begin{pmatrix} D_x\nabla_yh(u) & D_y\nabla_yh(u) \\ -D_x\nabla_xh(u) & -D_y\nabla_xh(u) \end{pmatrix} \in\boldsymbol{M}_{2n\times 2n}, \end{aligned} \end{align} $$

so that

$$ \begin{align*}D_u\mathscr{F}(h,u)v=A[u]v-\nabla_\omega v. \end{align*} $$

We also write

(4.5)

$$ \begin{align} N[u]=\big((\partial u)^{\mathsf{T}} \cdot\partial u\big)^{-1}\in\boldsymbol{M}_{n\times n}, \quad M[u]=\big( \partial u \quad (J\partial u)\cdot N[u] \big)\in\boldsymbol{M}_{2n\times2n}, \end{align} $$

provided that $\partial u^{\mathsf {T}} \cdot \partial u$ is invertible. This is true when u is $C^1$ close to $\zeta _0$ , whence $N[u]=I_n+O\big (\partial (u-\zeta _0)\big )$ . This furthermore ensures that

$$ \begin{align*}M[u]= \begin{pmatrix}I_n & \\ & -I_n\end{pmatrix}+O\big(\partial(u-\zeta_0)\big), \quad M[u]^{-1}=\begin{pmatrix}I_n & \\ & -I_n\end{pmatrix}+O\big(\partial(u-\zeta_0)\big). \end{align*} $$

We then introduce a function $L[u]$ , reflecting the “lack of being Lagrangian” for the submanifold $u(\mathbb {T}^n)\subset \mathbb {T}^n\times \mathbb {R}^n$ , denoted by

(4.6)

$$ \begin{align} L[u]=(\partial u)^{\mathsf{T}} \cdot J\cdot\partial u \in\boldsymbol{M}_{n\times n}. \end{align} $$

Then $L[u]$ is nothing but the matrix representation of the pull-back of the symplectic form on $\mathbb {T}^n\times \mathbb {R}^n$ . The submanifold $u(\mathbb {T}^n)\subset \mathbb {T}^n\times \mathbb {R}^n$ is Lagrangian if and only if $L[u]=0$ . In particular, this is the case when u does solve the conjugacy equation $\mathscr {F}(h,u)=0$ .

The linearization formula for $\mathscr {F}(h,u)$ . is summarized as the following lemmaFootnote ⁵ :

Lemma 4.2. Suppose u is close to the trivial embedding $\zeta _0$ . Let $N[u],M[u],L[u]$ be as in (4.5)(4.6). Define the $n\times n$ matrix function

(4.7)

$$ \begin{align} \begin{aligned} S[u] &=\big(I_n+(N[u]L[u])^2\big)^{-1}N[u]\cdot (\partial u)^{\mathsf{T}}\cdot\big[A[u],J\big]\cdot(\partial u)\cdot N[u]. \end{aligned} \end{align} $$

Then the linearization of $\mathscr {F}(h,u)$ with respect to u satisfies

(4.8)

$$ \begin{align} D_u\mathscr{F}(h,u)\big(M[u]v\big) =M[u]\begin{pmatrix} 0_n & S[u] \\ 0_n & 0_n \end{pmatrix}v -M[u]\nabla_\omega v +\boldsymbol{B}\big(\partial\mathscr{F}(h,u),L[u]\big)v. \end{align} $$

Here in (4.8), the matrix function $\boldsymbol {B}(\partial E,L)$ is rationalFootnote ⁶ in $\partial E$ and L, vanishing linearly as $\partial E,L\to 0$ , with coefficients being rational functions of $A[u]$ , $\partial u$ and $\partial ^2u$ . We may then rewrite (4.8) as the equivalent form

(4.9)

$$ \begin{align} D_u\mathscr{F}(h,u)v =M[u]\begin{pmatrix} 0_n & S[u] \\ 0_n & 0_n \end{pmatrix}M[u]^{-1}v -M[u]\nabla_\omega \big(M[u]^{-1}v\big) +\boldsymbol{B}\big(\partial\mathscr{F}(h,u),L[u]\big)M[u]^{-1}v, \end{align} $$

Proof. Throughout this proof, for simplicity in notation, we shall drop the $[u]$ (indicating dependence on u), write, for example, A for $A[u]$ , and denote $E:=\mathscr {F}(h,u)$ . For a given u and its increment v, we first write

$$ \begin{align*}\begin{aligned} D_u\mathscr{F}(h,u)(M[u]v) &=AMv -(\nabla_\omega M)v-M\nabla_\omega v\\ &=: \big(C_1\quad C_2\big)v-M\nabla_\omega v \end{aligned} \end{align*} $$

Here we write the matrix $AM -(\nabla _\omega M)\in \boldsymbol {M}_{2n\times 2n}$ into block form $\big (C_1\quad C_2\big )$ . The task is to determine the blocks $C_1,C_2\in \boldsymbol {M}_{2n\times n}$ .

It is fairly easy to see that $C_1=\partial E$ . In fact, we may write $M[u]$ into block form as

$$ \begin{align*}M =\big( \partial u \quad (J\partial u)N \big) =:\big(M_1\quad M_2\big). \end{align*} $$

Then obviously

(4.10)

$$ \begin{align} \begin{aligned} C_1 &=AM_1-\nabla_\omega M_1\\ &=A\partial u-\nabla_\omega\partial u =\partial\big(\mathscr{F}(h,u)\big) =\partial E. \end{aligned} \end{align} $$

The true difficulty is with the expression of $C_2=A(J\partial u)N-\nabla _\omega \big ((J\partial u)N\big )$ . We try to find $n\times n$ matrices $Q,W$ , such that

(4.11)

$$ \begin{align} C_2 =(\partial u)Q+(J\partial u)NW =M_1Q+M_2W. \end{align} $$

In solving this matrix system, we take into account that $L=(\partial u)^{\mathsf {T}} J\partial u$ is supposed to be close to 0. Multiplying (4.11) from left by $(\partial u)^{\mathsf {T}} J$ , using in sequence the Hamiltonian nature $JA=-A^{\mathsf {T}} J$ and the symplectic nature $J^2=-I_{2n}$ , we obtain

(4.12)

$$ \begin{align} \begin{aligned} LQ-W &=(\partial u)^{\mathsf{T}} JA(J\partial u)N-(\partial u)^{\mathsf{T}} J\nabla_\omega \big((J\partial u)N\big)\\ &=(\partial u)^{\mathsf{T}} A^{\mathsf{T}}(\partial u) N +(\partial u)^{\mathsf{T}}(\nabla_\omega\partial u) N +(\partial u)^{\mathsf{T}}(\partial u)\nabla_\omega N\\ &=(\partial E)^{\mathsf{T}}(\partial u)N. \end{aligned} \end{align} $$

Here in the last step we differentiated the definition $(\partial u)^{\mathsf {T}}(\partial u) N=I_n$ to deduce

$$ \begin{align*}(\partial u)^{\mathsf{T}}(\nabla_\omega\partial u) N +(\partial u)^{\mathsf{T}}(\partial u)\nabla_\omega N =-\big[\nabla_\omega(\partial u)^{\mathsf{T}}\big](\partial u)N, \end{align*} $$

and then use the last equality of (4.10).

On the other hand, multiplying (4.11) from left by $N(\partial u)^{\mathsf {T}}$ , we obtain

(4.13)

$$ \begin{align} \begin{aligned} Q+NLNW &=N(\partial u)^{\mathsf{T}} A\cdot(J\partial u)N -N(\partial u)^{\mathsf{T}}\nabla_\omega \big((J\partial u)N\big)\\ &=N(\partial u)^{\mathsf{T}} \Big[A,J\Big](\partial u)N +N(\partial u)^{\mathsf{T}} JA(\partial u)N\\ &\quad-N(\partial u)^{\mathsf{T}} J(\nabla_\omega\partial u)N -N(\partial u)^{\mathsf{T}} J(\partial u) \nabla_\omega N\\ &=N(\partial u)^{\mathsf{T}} \Big[A,J\Big](\partial u)N +N(\partial u)^{\mathsf{T}} J(\partial E) N -NL\nabla_\omega N. \end{aligned} \end{align} $$

We can then directly solve from (4.12)(4.13) the expression of $Q,W$ :

(4.14)

$$ \begin{align} \begin{aligned} Q&=S +\big(I_n+(NL)^2\big)^{-1} \Big(NLN(\partial E)^{\mathsf{T}}(\partial u)N +N(\partial u)^{\mathsf{T}} J(\partial E)N-NL\nabla_\omega N\Big)\\ W&=LQ-(\partial E)^{\mathsf{T}}(\partial u)N. \end{aligned} \end{align} $$

Therefore, we immediately see that both $W-LS$ and $Q-S$ are (matrix) rational functions of L and $\partial E$ , vanishing linearly in L and $\partial E$ , with coefficients being rational functions of $\partial u$ and $\partial ^2u$ .

Combining (4.10) and (4.11), we find that the matrix $AM-(\nabla _\omega M)$ has block-form expression

$$ \begin{align*}\begin{aligned} AM-(\nabla_\omega M) &=\big(\partial E\quad M_1S\big) +\Big(0_{2n\times n} \quad M_1(Q-S)+M_2W\Big)\\ &=M\begin{pmatrix} 0_n & S \\ 0_n & 0_n \end{pmatrix} +\Big(\partial E \quad M_1(Q-S)+M_2W\Big). \end{aligned} \end{align*} $$

Therefore, the remainder $\boldsymbol {B}(\partial E,L)=\big (\partial E \quad M_1(Q-S)+M_2W\big )$ , which, by (4.14), is a matrix rational function of L and $\partial E$ , vanishing linearly in L and $\partial E$ , with coefficients being rational functions of A, $\partial u$ and $\partial ^2u$ .

Remark 4.1. Lemma 4.2 is in fact equivalent to Lemma 20 of [Reference de la Llave, González, Jorba and Villanueva59]. It also appeared in [Reference Berti and Bolle10]. However, we are not able to find an explicit expression for $D_u\mathscr {F}(h,u)v$ in these references, which is the reason that we choose to write down our own proof. We observe that (4.9) plays the same role as (2.7).

A key feature of the Lagrangian character $L[u]$ is that it is in proportion to the error $E=\mathscr {F}(h,u)$ . In fact by Lemma 19 of [Reference de la Llave, González, Jorba and Villanueva59] or Lemma 5 of [Reference Berti and Bolle10], if we define the matrix function

$$ \begin{align*}L_1(u,E):=\partial \big((\partial u)^{\mathsf{T}}\cdot JE\big)-\left[\partial\big((\partial u)^{\mathsf{T}}\cdot JE\big)^{\mathsf{T}}\big)\right]^{\mathsf{T}}, \end{align*} $$

then there holds

(4.15)

$$ \begin{align} \nabla_\omega L[u] =L_1(u,\mathscr{F}(h,u)) \quad\text{or equivalently}\quad L[u] =\nabla_\omega ^{-1}L_1(u,\mathscr{F}(h,u)). \end{align} $$

Note that due to the presence of $\partial $ , the function $L_1(u,E)$ necessarily has average zero. In other words, the matrix function $\nabla _\omega L[u]$ is nothing but the matrix representation of the exterior differential of the 1-form represented by $(\partial u)^{\mathsf {T}}\cdot J\mathscr {F}(h,u)$ . Of course, this is a refined version of the observation that an invariant torus is necessarily a Lagrangian submanifold. Therefore, the mapping $\boldsymbol {B}\big (\mathscr {F}(h,u),L[u]\big )$ in (4.9) vanishes linearly in E if u is approximately invariant:

Corollary 4.2.1. Suppose $\omega $ satisfies the Diophantine condition (1.3), so that (4.1) holds. Suppose $u\in H^s$ for some $s>\max (\tau +2,n/2+2)$ , while $|u-\zeta _0|_{C^2}$ is sufficiently close to zero. Then for any constant shift $(0,\mu )^{\mathsf {T}}\in \mathbb {R}^{n}\times \mathbb {R}^n$ , there holds

$$ \begin{align*}\big\|\boldsymbol{B}\big(\partial\mathscr{F}(h,u),L[u]\big)\big\|_{H^{s-(\tau+2)}} \leq \gamma^{-1} K_s\big(|h|_{C^{[s]+3}},\|u\|_{H^s}\big)\big\|\mathscr{F}(h,u)+(0,\mu)^{\mathsf{T}}\big\|_{H^{s-1}}. \end{align*} $$

The function $K_s$ is increasing in both arguments and does not depend on the shift $(0,\mu )^{\mathsf {T}}\in \mathbb {R}^{n}\times \mathbb {R}^n$ .

Proof. By Lemma 4.2, we know that $\boldsymbol {B}\big (\partial \mathscr {F}(h,u),L[u]\big )$ is a rational function of $\partial \mathscr {F}(h,u)$ and $L[u]$ , vanishing linearly as the arguments $\to 0$ , with coefficients being rational functions of $A[u],\partial u,\partial ^2 u$ . Since $|u-\zeta _0|_{C^2}$ is close to 0, we find that both $\partial \mathscr {F}(h,u)$ and $L[u]$ are $C^0$ close to 0, since they only involve u and $\partial u$ . Consequently, for example, the matrix $I_n+(N[u]L[u])^2$ is $H^{s-1}$ close to the identity matrix.

Obviously, shifting $\mathscr {F}(h,u)$ by a constant addendum does not affect the value of $\boldsymbol {B}\big (\partial \mathscr {F}(h,u),L[u]\big )$ . Therefore,

$$ \begin{align*}\big\|\boldsymbol{B}\big(\partial\mathscr{F}(h,u),L[u]\big)\big\|_{H^{s-(\tau+2)}} \leq K_s\big(|h|_{C^{[s]+3}},\|u\|_{H^s}\big)\left(\big\|\mathscr{F}(h,u)+(0,\mu)^{\mathsf{T}}\big\|_{H^{s-1}} +\|L[u]\|_{H^{s-(\tau+2)}}\right). \end{align*} $$

Now we make use of (4.15). Note that for the matrix function

$$ \begin{align*}L_1(u,E)=\partial \big((\partial u)^{\mathsf{T}}\cdot JE\big)-\left[\partial\big((\partial u)^{\mathsf{T}}\cdot JE\big)^{\mathsf{T}}\big)\right]^{\mathsf{T}}, \end{align*} $$

shifting E by a constant addendum $(0,\mu )^{\mathsf {T}}$ yields the same value of $L_1(u,E)$ , by a direct computation (using $\partial _i\partial _ju=\partial _j\partial _iu$ ). Therefore, by (4.15), we have

$$ \begin{align*}\begin{aligned} \|L[u]\|_{H^{s-(\tau+2)}} &\leq \gamma^{-1} \big\|L_1\big(u,\mathscr{F}(h,u)+(0,\mu)^{\mathsf{T}}\big)\big\|_{H^{s-2}}\\ &\lesssim \gamma^{-1} K_s\big(\|u\|_{H^s}\big)\big\|\mathscr{F}(h,u)+(0,\mu)^{\mathsf{T}}\big\|_{H^{s-1}}. \end{aligned} \end{align*} $$

This concludes the proof.

4.2 Deriving parahomological equation

So far we have been mostly quoting and rewriting the algebraic findings reported in Section 8 of [Reference de la Llave, González, Jorba and Villanueva59]. However, from this point, we will proceed quite differently from [Reference de la Llave, González, Jorba and Villanueva59], or any other known reference on KAM theory, for example, [Reference Berti and Bolle10] – instead of a Newtonian algorithm involving approximate right inverse, we employ the idea of Subsection 2.6 to directly solve the parahomological equation. We may treat Theorem 1.1 and 1.2 in a unified manner, since for Theorem 1.1 it suffices to require that the parameter $\xi $ must be zero.

So we restrict to a neighbourhood of the already known approximate solution $\zeta _0$ . We notice that $\mathscr {F}(h_\xi ,u)$ is a first order nonlinear partial differential operator acting on the unknown $(u,\xi )\in H^s\times \mathbb {R}^n$ , and in fact no differential in the constant $\xi $ is involved at all. Let us recall from (1.4) and (1.7) that $e_0=\mathscr {F}(h,\zeta _0)$ . Then the paralinearization formula

(4.16)

$$ \begin{align} \begin{aligned} \mathscr{F}(h_\xi,u)+\begin{pmatrix} 0 \\ \mu\end{pmatrix} &=e_0+ \begin{pmatrix} \xi \\ \mu\end{pmatrix} +T_{D_u\mathscr{F}(h,u)}(u-\zeta_0) +\mathcal{R}_{\mathrm{PL}}(X_h,u-\zeta_0)\cdot(u-\zeta_0) \end{aligned} \end{align} $$

is valid. The expression $\boldsymbol {B}(E,L[u])$ involves only $\partial E$ , so it does not change if E is shifted by a constant vector. Thus, writing for simplicity

$$ \begin{align*}E:=\mathscr{F}(h_\xi,u)+\begin{pmatrix} 0 \\ \mu\end{pmatrix}, \end{align*} $$

we actually haveFootnote ⁷ , by Lemma 4.2, the following equality:

$$ \begin{align*}\begin{aligned} T_{D_u\mathscr{F}(h,u)}(u-\zeta_0) &= \mathbf{Op}^{\mathrm{PM}}\left[M[u]\begin{pmatrix} 0_n & T_{S[u]} \\ 0_n & 0_n \end{pmatrix}M[u]^{-1}\right](u-\zeta_0)\\ &\quad-\mathbf{Op}^{\mathrm{PM}}\Big({M[u]}\big(\nabla_\omega M[u]^{-1}\big)+\nabla_\omega \Big)(u-\zeta_0) +\mathbf{Op}^{\mathrm{PM}}\left(\boldsymbol{B}(E,L[u]) M[u]^{-1}\right)(u-\zeta_0). \end{aligned} \end{align*} $$

Thus the paralinearization formula (4.16) becomes

(4.17)

$$ \begin{align} \begin{aligned} E&= e_0+\begin{pmatrix} \xi \\ \mu\end{pmatrix} +T_{M[u]}\begin{pmatrix} 0_n & T_{S[u]} \\ 0_n & 0_n \end{pmatrix}T_{M[u]^{-1}}(u-\zeta_0) -T_{M[u]}\nabla_\omega T_{M[u]^{-1}}(u-\zeta_0)\\ &\quad +\mathbf{Op}^{\mathrm{PM}}\left(\boldsymbol{B}(E,L[u]) M[u]^{-1}\right)(u-\zeta_0) +\mathcal{R}_1[u](u-\zeta_0) +\mathcal{R}_{\mathrm{PL}}(X_h,u-\zeta_0)\cdot(u-\zeta_0). \end{aligned} \end{align} $$

Here the linear operator

(4.18)

$$ \begin{align} \begin{aligned} \mathcal{R}_1[u] &=\mathbf{Op}^{\mathrm{PM}}\left[M[u]\begin{pmatrix} 0_n & T_{S[u]} \\ 0_n & 0_n \end{pmatrix}M[u]^{-1}\right] -T_{M[u]}\begin{pmatrix} 0_n & T_{S[u]} \\ 0_n & 0_n \end{pmatrix}T_{M[u]^{-1}}\\ &\quad-\mathbf{Op}^{\mathrm{PM}}\Big[{M[u]}\big(\nabla_\omega M[u]^{-1}\big)+\nabla_\omega \Big] +T_{M[u]}\nabla_\omega T_{M[u]^{-1}} \end{aligned} \end{align} $$

is produced in view of Proposition 3.4. Note that we used the Leibniz rule $\partial T_au=T_{\partial a}u+T_a\partial u$ .

Lemma 4.3. Fix an index $s>2\tau +2+n/2+\varepsilon $ and set $r=s-n/2$ . Suppose the Hamilton function $h\in C^{N_{s+r}+3} u\in H^s(\mathbb {T}^n;\mathbb {T}^n\times \mathbb {R}^n)$ is close to $\zeta _0$ in that space. Then the operator $\mathcal {R}_1[u]$ defined by (4.18) maps $H^{t}$ to $H^{t+2\tau +\varepsilon }$ for every index $t\in \mathbb {R}$ : with $K_{t,s}$ being some increasing function,

$$ \begin{align*}\|\mathcal{R}_1[u]v\|_{H^{t+r-2}} \leq K_{t,s}\big(|h|_{C^{N_{s+r}+3}},\|u\|_{H^s}\big)\|u-\zeta_0\|_{H^s}\|v\|_{H^{t}}. \end{align*} $$

Furthermore, as such an operator, $\mathcal {R}_1[u]$ has $C^1$ dependence on $u\in H^s$ .

Proof. For simplicity of notation we write

$$ \begin{align*}a=M[u],\quad b=\begin{pmatrix} 0_n & T_{S[u]} \\ 0_n & 0_n \end{pmatrix},\quad c=M[u]^{-1}. \end{align*} $$

Then (4.18) is rewritten as

$$ \begin{align*}\begin{aligned} \mathcal{R}_1[u] &=T_{abc}-T_aT_bT_c+(T_aT_c-1)\nabla_\omega+T_aT_{\nabla_\omega c}-T_{a\nabla_\omega c}\\ &=\mathcal{R}_{\mathrm{CM}}(ab,c)+\mathcal{R}_{\mathrm{CM}}(a,b)T_c+\mathcal{R}_{\mathrm{CM}}(a,c)\nabla_\omega-\mathcal{R}_{\mathrm{CM}}(a,\nabla_\omega c). \end{aligned} \end{align*} $$

Here we used the fact that $ac$ is the identity matrix. We then recall from (4.5) that $a=\mathrm {diag}(I_n,-I_n)+O\big (\partial (u-\zeta _0)\big )$ , and $S[u]$ is given by (4.7). Therefore, we have

$$ \begin{align*}\big|S[u]\big|_{C^{r-1}_*}\leq K_{s}\big(|h|_{C^{N_{s+r}}},\|u\|_{H^s}\big), \end{align*} $$

and

$$ \begin{align*}\big|a-\text{diag}(I_n,-I_n)\big|_{C^{r-1}_*} +\big|c-\text{diag}(I_n,-I_n)\big|_{C^{r-1}_*} \leq \|u-\zeta_0\|_{H^s}. \end{align*} $$

We then use Proposition 3.4; notice that $\mathcal {R}_{\mathrm {CM}}(\cdot ,\cdot )$ does not change its value when the inputs are shifted by constants. Therefore, $\mathcal {R}_1[u]$ is a smoothing operator of order $r-2$ , vanishing linearly as $u\to \zeta _0$ . This gives the desired operatorial bound.

We can now formulate the parahomological equation in the unknown $(u,\xi ,\mu )$ , where $\xi ,\mu \in \mathbb {R}^n$ are constant parameters to be fixed. The equation reads

(4.19)

$$ \begin{align} \begin{aligned} T_{M[u]}&\begin{pmatrix} 0_n & T_{S[u]} \\ 0_n & 0_n \end{pmatrix}T_{M[u]^{-1}}(u-\zeta_0) -T_{M[u]}\nabla_\omega T_{M[u]^{-1}}(u-\zeta_0) +\begin{pmatrix} \xi \\ \mu \end{pmatrix} \\ &=-e_0 -\mathcal{R}_1[u](u-\zeta_0) -\mathcal{R}_{\mathrm{PL}}(X_h,u-\zeta_0)\cdot(u-\zeta_0) \,. \end{aligned} \end{align} $$

In other words, it aims to cancel out all terms in the right-hand-side of (4.17) but the one linear in E.

To solve the parahomological equation (4.19), we need a lemma concerning linear parahomological equations, which is essentially the simple equation $\nabla _\omega v=f$ .

Lemma 4.4. Let $\omega $ be a Diophantine frequency vector satisfying (1.3). Set

$$ \begin{align*}M_Q=\Bigg\{\begin{aligned} \max\Big(\big|(\operatorname{\mathrm{Avg}} Q)^{-1}\big|&,|Q|_{L^\infty}\Big) &\quad\mathrm{under\ the\ assumption\ of\ Theorem}\ 1.1,\\&|Q|_{L^\infty}&\mathrm{under\ the\ assumption\ of\ Theorem}\ 1.2. \end{aligned} \end{align*} $$

There are constants $\rho _1,\rho _2$ depending on $|h|_{C^3}$ and $M_Q$ with the following property. If $|e_0|_{C^1}\leq \rho _1$ , and the embedding $u:\mathbb {T}^n\mapsto \mathbb {T}^n\times \mathbb {R}^n$ is such that $|u-\zeta _0|_{C^1}\leq \rho _2$ , then the linear parahomological equation in the unknown $(v,\xi ,\mu )\in H^s\times \mathbb {R}^n\times \mathbb {R}^n$

(4.20)

$$ \begin{align} T_{M[u]}\begin{pmatrix} 0_n & T_{S[u]} \\ 0_n & 0_n \end{pmatrix}T_{M[u]^{-1}}v -T_{M[u]}\nabla_\omega T_{M[u]^{-1}}v +\begin{pmatrix} \xi \\ \mu \end{pmatrix} =f \end{align} $$

has a linear solution operator

$$ \begin{align*}\big( v^x, \, v^y, \, \xi, \, \mu \big) =\big( \mathfrak{L}^x[u]f, \, \mathfrak{L}^y[u]f, \, P^x[u]f, \, P^y[u]f \big), \end{align*} $$

satisfying the following estimates: with $K_s$ being increasing functions in the arguments,

$$ \begin{align*}\begin{aligned} \big\|\mathfrak{L}^x[u]f\big\|_{H^s} &\leq K_s\big(M_Q,|h|_{C^3}\big) \gamma^{-2}\|f\|_{H^{s+2\tau}} \\ \big\|\mathfrak{L}^y[u]f\big\|_{H^s} &\leq K_s\big(M_Q,|h|_{C^3}\big)\gamma^{-1}\|f\|_{H^{s+\tau}} \\ |P^x[u]f|+|P^y[u]f| &\leq K_s\big(M_Q,|h|_{C^3}\big)\gamma^{-1}\|f\|_{H^{s+\tau}}. \end{aligned} \end{align*} $$

Moreover, the four linear operators of concern are all continuously differentiable mappings from $u\in C^1$ to the space of linear operators (with operator norm). In particular, the solution operator $\xi =P^x[u]f$ is fixed as 0 under the assumption of Theorem 1.1.

Proof. Since $M[u]=\mathrm {diag}(I_n,-I_n)+O\big (\partial (u-\zeta _0)\big )$ , it follows that $T_{M[u]}$ and $T_{M[u]^{-1}}$ are both invertible bounded operator from $H^s$ to $H^s$ if $|u-\zeta _0|_{C^1}$ is small. Note further that for a constant $\lambda $ and a function A, there holds $T_A\lambda =(\operatorname {\mathrm {Avg}} A)\lambda $ . Thus, writing

(4.21)

$$ \begin{align} T_{M[u]}^{-1}\begin{pmatrix} \xi \\ \mu \end{pmatrix}=\begin{pmatrix} \xi_1 \\ \mu_1 \end{pmatrix}, \quad T_{M[u]}^{-1}f=f_1, \quad T_{M[u]^{-1}}v=v_1 \end{align} $$

for some constant vectors $\xi _1,\mu _1\in \mathbb {R}^n$ to be determined, the linear parahomological equation (4.20) is equivalent to

$$ \begin{align*}\begin{pmatrix} 0_n & T_{S[u]} \\ 0_n & 0_n \end{pmatrix}v_1 -\nabla_\omega v_1 +\begin{pmatrix} \xi_1 \\ \mu_1 \end{pmatrix} =f_1. \end{align*} $$

Furthermore, recalling the expression (1.2) of $h(x,y)$ and (4.7), we have

(4.22)

$$ \begin{align} S[u]=Q+\partial^2a_0+O\big(\partial(u-\zeta_0)\big), \end{align} $$

where the implicit constant depends on $|h|_{C^3}$ . Noting that

(4.23)

$$ \begin{align} e_0=\mathscr{F}(h,\zeta_0) =\begin{pmatrix} a_1-\omega \\ \partial a_0 \end{pmatrix}, \end{align} $$

it follows that $S[u]$ is $C^0$ -close to Q if $|e_0|_{C^1}$ and $|u-\zeta _0|_{C^1}$ are small.

In components of $v_1$ , the linear parahomological equation becomes

$$ \begin{align*}\begin{aligned} T_{S[u]}v_1^y-\nabla_\omega v_1^x+\xi_1&=f_1^x, \\ -\nabla_\omega v_1^y+\mu_1&=f_1^y. \end{aligned} \end{align*} $$

We discuss how to solve this system under the assumption of Theorem 1.1 and 1.2 separately. For simplicity of notation, we define

$$ \begin{align*} \mathcal{A}f:=f-\operatorname{\mathrm{Avg}} f.\\[-34pt] \end{align*} $$

Case 1: $\mathrm {Avg} Q$ is invertible

This is the nondegeneracy condition in Therorem 1.1. We fix $\xi =0$ in this case, and obtain from (4.21) the following:

(4.24)

$$ \begin{align} \begin{pmatrix} 0 \\ \mu \end{pmatrix} =\begin{pmatrix} \xi_1 \\ -\mu_1 \end{pmatrix} +O\big(\partial(u-\zeta_0)\big)\begin{pmatrix} \xi_1 \\ \mu_1 \end{pmatrix}. \end{align} $$

We first solve

(4.25)

$$ \begin{align} v_1^y=\operatorname{\mathrm{Avg}} v_1^y-\nabla_\omega ^{-1}\mathcal{A}f_1^y, \quad \mu_1=\operatorname{\mathrm{Avg}} f_1^y, \end{align} $$

where $\operatorname {\mathrm {Avg}} v_1^y$ is yet to be determined. With (4.24), we can solve $\xi _1$ , thus $\mu $ , as linear mappings of $\mu _1$ . Since the norm of $T_{M[u]}^{-1}$ from $H^s$ to $H^s$ depends only on $|u-\zeta _0|_{C^1}$ , this immediately yields

$$ \begin{align*}\big\|\mathcal{A}v_1^y\big\|_{H^s}+|\mu|+|\xi_1| \leq C_s\gamma^{-1}\|f\|_{H^{s+\tau}}. \end{align*} $$

Here we used (4.1).

We then choose $\operatorname {\mathrm {Avg}} v_1^y$ to balance the mean value of the equation along x-direction. By assumption $S[\zeta _0]$ is invertible, so by (4.22), the matrix-valued function $S[u]=Q+O(\partial e_0)+O\big (\partial (u-\zeta _0)\big )$ is still invertible if $|e_0|_{C^1}$ and $|u-\zeta _0|_{C^1}$ are small, and the implicit constant here depends on $|h|_{C^3}$ . In that case, $\big |(\operatorname {\mathrm {Avg}} S[u])^{-1}\big |\lesssim M_Q$ . We can then set

$$ \begin{align*}\operatorname{\mathrm{Avg}} v_1^y=\big(\mathrm{Avg} S[u]\big)^{-1}\operatorname{\mathrm{Avg}}\big(f_1^x-\xi_1-T_{S[u]}\mathcal{A}v_1^y\big), \end{align*} $$

and solve

$$ \begin{align*}v_1^x=\nabla_\omega ^{-1}\mathcal{A} \big(T_{S[u]}\mathcal{A}v_1^y+\xi_1-f_1^x\big). \end{align*} $$

By (4.7), $S[u]$ depends on up to second order derivative of h, so the solutions are immediately estimated as

$$ \begin{align*}\big\|v_1^x\big\|_{H^s} \leq K_s\big(M_Q,|h|_{C^3}\big) \gamma^{-2}\|f\|_{H^{s+2\tau}}. \end{align*} $$

Here we used (4.1) once again.

Case 2: Nontrivial parameter $\xi $

This case matches with Theorem 1.2, whence no requirement about nondegeneracy of Q is posed. In this case the parameter $\xi $ cannot be fixed as 0. But we still obtain from (4.21) the following:

(4.26)

$$ \begin{align} \begin{pmatrix} \xi \\ \mu \end{pmatrix} =\begin{pmatrix} \xi_1 \\ -\mu_1 \end{pmatrix} +O\big(\partial(u-\zeta_0)\big)\begin{pmatrix} \xi_1 \\ \mu_1 \end{pmatrix}. \end{align} $$

We then first solve

(4.27)

$$ \begin{align} v_1^y=-\nabla_\omega ^{-1}\mathcal{A} f_1^y, \quad \mu_1=\operatorname{\mathrm{Avg}} f_1^y, \end{align} $$

that is, the mean value of $v_1^y$ is fixed as 0. For the equation along x-direction, we solve

$$ \begin{align*}v_1^x=-\nabla_\omega ^{-1}\mathcal{A} \big(f_1^x-T_{S[u]}v_1^y\big), \quad \xi_1=\operatorname{\mathrm{Avg}} \big(f_1^x-T_{S[u]}v_1^y\big). \end{align*} $$

Using (4.1), they immediately yield the similar estimates as in Case 1.

Finally, continuous differentiability of the operators for $u\in C^1$ follows immediately from Proposition 3.3 and 3.4, that is, continuous differentiability of $T_a$ in a and $T_{ab}-T_aT_b$ in $(a,b)$ .

4.3 Proof of Theorem 1.1–1.2

Recall that we set $r=s-n/2$ , and $N_{s+r}$ to be the least integer $>s+r$ . Since we are working with a perturbative problem, we assume that, for example, $\|u-\zeta _0\|_{H^s}\leq 2$ , and the requirement of Lemma 4.4 is also fulfilled: with $\rho _1=\rho _1\big (M_Q,|h|_{C^3}\big )$ , $\rho _2=\rho _2\big (M_Q,|h|_{C^3}\big )$ as in Lemma 4.4, we have

(4.28)

$$ \begin{align} \|u-\zeta_0\|_{H^s}\leq2, \quad |e_0|_{C^1}\leq \rho_1,\quad |u-\zeta_0|_{C^1}\leq\rho_2. \end{align} $$

The parahomological equation (4.19) is then converted to the following parainverse equation:

(4.29)

$$ \begin{align} \begin{aligned} u-\zeta_0 &=-\begin{pmatrix} \mathfrak{L}^x[u]-P^x[u] \\ \mathfrak{L}^y[u]-P^y[u] \end{pmatrix}e_0\\ &\quad -\begin{pmatrix} \mathfrak{L}^x[u]-P^x[u] \\ \mathfrak{L}^y[u]-P^y[u] \end{pmatrix}\left(\mathcal{R}_1[u](u-\zeta_0) -\mathcal{R}_{\mathrm{PL}}(X_h,u-\zeta_0)\cdot(u-\zeta_0)\right)\\ &=:\mathcal{G}_1(u)+\mathcal{G}_2(u) \end{aligned} \end{align} $$

We analyze the right-hand side of (4.29). The requirements (4.28) meet the conditions of Lemma 4.4, so

(4.30)

$$ \begin{align} \begin{aligned} \|\mathcal{G}_1(u)\|_{H^{s+\varepsilon}} &\leq K_s\big(M_Q,|h|_{C^3}\big)\gamma^2\|e_0\|_{H^{s+2\tau+\varepsilon}}. \end{aligned} \end{align} $$

We turn to $\mathcal {G}_2$ . The paraproduct remainder $\mathcal {R}_1[u](u-\zeta _0)$ is estimated by Lemma 4.3:

(4.31)

$$ \begin{align} \big\|\mathcal{R}_1[u](u-\zeta_0)\big\|_{H^{s+r-2}} \leq K_s\big(|h|_{C^{N_{s+r}}}\big)\|u-\zeta_0\|_{H^s}^2, \end{align} $$

and $\mathcal {R}_1[u](u-\zeta _0)$ depends continuously on u (with respect to Sobolev norms $H^s$ and $H^{s+r-2}$ ). As for the paralinearization remainder $\mathcal {R}_{\mathrm {PL}}$ , by Proposition 3.5, recalling (4.4),

(4.32)

$$ \begin{align} \mathcal{R}_{\mathrm{PL}}(X_h,u-\zeta_0)\cdot(u-\zeta_0) =\mathcal{R}_{\mathrm{PL};1}\big(A[\zeta_0]\big)(u-\zeta_0) +\mathcal{R}_{\mathrm{PL};2}(X_h,u-\zeta_0)\cdot(u-\zeta_0). \end{align} $$

Recalling further the definition (1.2) of $h(x,y)$ , we find

$$ \begin{align*}A[\zeta_0] =\begin{pmatrix} \partial a_1(\theta) & Q(\theta) \\ \partial^2a_0(\theta) & -(\partial a_1)^{\mathsf{T}}(\theta) \end{pmatrix}. \end{align*} $$

On the other hand, recalling the definition (4.23), we find

$$ \begin{align*}\begin{aligned} \|a_1-\omega\|_{H^{s+\tau+\varepsilon}} +\|\partial a_0\|_{H^{s+\tau+\varepsilon}} &=\|e_0\|_{H^{s+2\tau+\varepsilon}}, \end{aligned} \end{align*} $$

that is, $a_0$ is almost a constant, and $a_1$ almost equals the constant vector $\omega $ . As a result, we find $A[\zeta _0]$ is almost a constant:

$$ \begin{align*}\begin{aligned} \big|A[\zeta_0]-\operatorname{\mathrm{Avg}} A[\zeta_0]\big|_{C^r_*} &\leq C_s\big(\|a_1-\omega\|_{H^{s+\tau+\varepsilon}} +\|\partial a_0\|_{H^{s+\tau+\varepsilon}} +|Q-\operatorname{\mathrm{Avg}} Q|_{C^r_*}\big)\\ &\leq C_s\big(\|e_0\|_{H^{s+2\tau+\varepsilon}}+|e_1|_{C^r_*}\big). \end{aligned} \end{align*} $$

Implementing the paralinearization theorem, that is, Proposition 3.5, we estimate (4.32) as (note that $X_h=J\nabla _{x,y}h\in C^{N_{s+r}+2}$ )

(4.33)

$$ \begin{align} \begin{aligned} \big\|\mathcal{R}_{\mathrm{PL}}&(X_h,u-\zeta_0)\cdot(u-\zeta_0)\big\|_{H^{s+r}} \\ &\leq C_s\Big(\big(\|e_0\|_{H^{s+2\tau+\varepsilon}}+|e_1|_{C^r_*}\big)\|u-\zeta_0\|_{H^s} +|h|_{C^{N_{s+r}+3}}\|u-\zeta_0\|_{H^s}^2\Big). \end{aligned} \end{align} $$

Furthermore, by Proposition 3.5, the operator $\mathcal {R}_{\mathrm {PL}}\big (X_h(\zeta _0+\cdot \,),u-\zeta _0\big )\in \mathcal {L}(H^s,H^{s+r})$ depends continuously on $u\in H^s$ .

Recall that by assumption, $s+r-2=2s-n/2-2\geq s+2\tau +\varepsilon $ . Combining (4.30)(4.31)(4.33), using Lemma 4.4, we conclude that if (4.28) holds, then

(4.34)

$$ \begin{align} \begin{aligned} \|\mathcal{G}_1(u)&\|_{H^{s+\varepsilon}} +\|\mathcal{G}_2(u)\|_{H^{s+\varepsilon}}\\ &\leq K_s\big(M_Q,|h|_{C^{N_{s+r}+3}}\big)\gamma^{-2} \Big(\|e_0\|_{H^{s+2\tau+\varepsilon}}+\big(\|e_0\|_{H^{s+2\tau+\varepsilon}}+|e_1|_{C^r_*}\big)\|u-\zeta_0\|_{H^s} +\|u-\zeta_0\|_{H^s}^2\Big). \end{aligned} \end{align} $$

By some elementary algebra with quadratic polynomials, there are decreasing functions $c_0,c_1$ and an increasing function $c_2$ , with the following property: if

$$ \begin{align*}\|e_0\|_{H^{s+2\tau+\varepsilon}} \leq\gamma^{4}c_0\big(M_Q,|h|_{C^{N_{s+r}+3}}\big), \quad |e_1|_{C^r_*} \leq \gamma^{2}c_1\big(M_Q,|h|_{C^{N_{s+r}+3}}\big), \end{align*} $$

then the positive number $\rho =\|e_0\|_{H^{s+\tau +\varepsilon }}\gamma ^{2}c_2\big (M_Q,|h|_{C^{N_{s+r}+3}}\big )$ satisfies the quadratic inequality

$$ \begin{align*}K_s\big(M_Q,|h|_{C^{N_{s+r}+3}}\big)\gamma^{-2} \Big(\|e_0\|_{H^{s+2\tau+\varepsilon}}+\big(\|e_0\|_{H^{s+2\tau+\varepsilon}}+|e_1|_{C^r_*}\big)\rho +\rho^2\Big)\leq\rho. \end{align*} $$

Here $K_s$ is the increasing function in (4.34). In other words,

$$ \begin{align*}\|\mathcal{G}_1(u)\|_{H^{s+\varepsilon}} +\|\mathcal{G}_2(u)\|_{H^{s+\varepsilon}} \leq \rho \quad\text{if }\|u-\zeta_0\|_{H^s}\leq\rho\text{ and (4.28) holds.} \end{align*} $$

In that case, the mapping $\zeta _0+\mathcal {G}_1+\mathcal {G}_2$ in the right-hand-side of (4.29) maps the closed ball $\bar B_{\rho }(\zeta _0)\subset H^s$ to itself. Furthermore it is continuous with respect to $u\in \bar B_{\rho }(\zeta _0)\subset H^s$ and the image is totally bounded (since the embedding $H^{s+\varepsilon }\subset H^s$ is compact). By the Schauder fixed point theorem, the mapping $u\mapsto \zeta _0+\mathcal {G}_1(u)+\mathcal {G}_2(u)$ has a fixed point $u\in \bar B_{\rho }(\zeta _0)$ , and in fact

(4.35)

$$ \begin{align} \|u-\zeta_0\|_{H^s} \leq c_2\big(M_Q,|h|_{C^{N_{s+r}+3}}\big)\gamma^{-2}\|e_0\|_{H^{s+2\tau+\varepsilon}}. \end{align} $$

Having found a solution u of the parahomological equation (4.19), we determine the constants $\xi ,\eta $ in terms of u via the operators $P^x[u],P^y[u]$ . We will then be able to cancel all terms but the one linear in E in the paralinearization formula (4.17). Thus the paralinearization formula (4.17) at u reduces to

(4.36)

$$ \begin{align} E=\mathbf{Op}^{\mathrm{PM}}\left(\boldsymbol{B}(E,L[u]) M[u]^{-1}\right)(u-\zeta_0). \end{align} $$

Using Corollary 4.2.1, we estimate

(4.37)

$$ \begin{align} \begin{aligned} \big\|\mathbf{Op}^{\mathrm{PM}}&\left(\boldsymbol{B}(E,L[u]) M[u]^{-1}\right)(u-\zeta_0)\big\|_{H^s}\\&\overset{\text{Prop. } 3.3}{\leq} C_s\big|\boldsymbol{B}(E,L[u]) M[u]^{-1}\big|_{L^\infty}\|u-\zeta_0\|_{H^s}\\&\overset{(4.35)}{\leq} C_sc_2\big(M_Q,|h|_{C^{N_{s+r}+3}}\big)\gamma^{-2}\|e_0\|_{H^{s+2\tau+\varepsilon}}\big\|\boldsymbol{B}(E,L[u]) M[u]^{-1}\big\|_{H^{s-(\tau+2)}}\\&\overset{\text{Cor. } 4.2.1}{\leq} C_sc_2\big(M_Q,|h|_{C^{N_{s+r}+3}}\big)\gamma^{-2}\|e_0\|_{H^{s+2\tau+\varepsilon}}\|E\|_{H^{s-1}}. \end{aligned} \end{align} $$

Note that it is legitimate to apply Corollary 4.2.1, since we already know that u is close to $\zeta _0$ , and therefore $E=\mathscr {F}(h_\xi ,u)+(0,\mu )^{\mathsf {T}}$ is close to zero. Consequently, if we further require the function $c_0\ll 1/c_2$ , then the factor

$$ \begin{align*}C_sc_2\big(M_Q,|h|_{C^{N_{s+r}+3}}\big)\gamma^{-3}\|e_0\|_{H^{s+2\tau+\varepsilon}} \end{align*} $$

in the right-hand-side of (4.37) will be $\leq 1/2$ (remembering $e_0$ is of size $O(\gamma ^{4}c_0)$ and $\gamma $ is assumed to be small). Therefore, (4.37) yields $\|E\|_{H^s}\leq (1/2)\|E\|_{H^{s-1}}$ , forcing $E=0$ .

To summarize, under the assumption of Theorem 1.1, there is a solution u of the the parahomological equation (4.19) with $\xi =0$ in (4.19), the constant vector $\mu $ is determined by u, and u satisfies

$$ \begin{align*}E=\mathscr{F}(h,u)+\begin{pmatrix} 0 \\ \mu\end{pmatrix}=0. \end{align*} $$

Under the assumption of Theorem 1.2, there is a solution u of the the parahomological equation (4.19), the parameters $\xi ,\mu $ are determined by u, and u satisfies

$$ \begin{align*}E=\mathscr{F}(h_\xi,u)+\begin{pmatrix} 0 \\ \mu\end{pmatrix}=0. \end{align*} $$

By Lemma 4.1, the proof of Theorem 1.1–1.2 is complete.

Remark 4.2. Similarly as in Subsection 2.4, the right-hand-side of the parainverse equation (4.29) becomes continuously differentiable in u if we assume higher regularity $h\in C^{N_{s+r}+4}$ . This is indicated in the general paralinearization theorem 3.5. Thus if the errors $e_0$ and $e_1$ are even smaller, the right-hand-side of (4.29) shall have Lipschitz constant less than 1, and a Banach fixed point argument will apply. In this case, the KAM conjugacy problem falls into the realm of standard implicit function theorems, which will immediately imply, for example, Whitney differentiability with respect to the frequency $\omega $ . Compared to what is in the literature, this direct proof is much simpler.

4.4 Miscellaneous

We discuss the generality of the assumption of Theorem 1.1–1.2. The idea of our discussion closely follows [Reference Berti and Bolle10]. At a first glance, the coverage of Theorem 1.1–1.2 is very limited: the approximate invariant torus around which we seek for a true one has to be the “flat” embedding $\theta \mapsto (\theta ,0)$ . However, under suitable geometric transformations, which are elementary and independent from KAM type results, it is possible to reduce quite general situations to this special case.

To be concrete, let the frequency $\omega $ satisfy (1.3). Suppose $h(x,y)$ is an arbitrary Hamiltonian function, not necessarily brought into the form (1.2). Let

$$ \begin{align*}\zeta(\theta)=\begin{pmatrix} \zeta^x(\theta) \\ \zeta^y(\theta) \end{pmatrix} \end{align*} $$

be an embedding from $\mathbb {T}^n$ to the phase space, not necessarily close to the “flat” embedding, such that $\zeta ^x:\mathbb {T}^n\mapsto \mathbb {T}^n$ is a diffeomorphism, and $\zeta (\mathbb {T}^n)$ is an approximately invariant torus with frequency $\omega $ under the flow of h: that is,

$$ \begin{align*}\mathscr{F}(h,\zeta):=X_h(\zeta)-\nabla_\omega \zeta \end{align*} $$

is close to 0. We aim to find suitable symplectic coordinate transformation that reduces this general approximate solution to the special case discussed in Theorem 1.1–1.2.

Recall (4.6): we introduce

$$ \begin{align*}L[\zeta]=(\partial \zeta)^{\mathsf{T}} \cdot J \partial \zeta\in\boldsymbol{M}_{n\times n} \end{align*} $$

to measure the “lack of isotropy” of the embedding $\zeta $ , since the pull-back of the symplectic form $\operatorname {\mathrm {d}}\!x\wedge \operatorname {\mathrm {d}}\!y$ to $\mathbb {T}^n$ via $\zeta $ has exactly the matrix representation $L[\zeta ]$ . We then have the following, as seen in (4.15):

(4.38)

$$ \begin{align} L[\zeta]=\partial\nabla_\omega ^{-1}\left[(\partial \zeta)^{\mathsf{T}}\cdot J\mathscr{F}(h,\zeta)\right] -\partial\nabla_\omega ^{-1}\left[(\partial \zeta)^{\mathsf{T}}\cdot J\mathscr{F}(h,\zeta)\right]^{\mathsf{T}}. \end{align} $$

According to Lemma 6 of [Reference Berti and Bolle10], if one sets

$$ \begin{align*}p(\theta)=\Delta^{-1}\partial\cdot L[\zeta] =\left(\Delta^{-1}\sum_{j=1}^n\partial_jL_{kj}(\theta)\right)_{1\leq k\leq n}, \end{align*} $$

then the embedded torus

$$ \begin{align*}\eta(\theta) =\begin{pmatrix} \eta^x(\theta) \\ \eta^y(\theta) \end{pmatrix} :=\begin{pmatrix} \zeta^x(\theta) \\ \zeta^y(\theta)-\big(\partial\zeta^x(\theta)\big)^{\mathsf{T}} p(\theta) \end{pmatrix} \end{align*} $$

is isotropic. From (4.38), p, hence $\eta -\zeta $ , is linear in $\mathscr {F}(h,\zeta )$ , and for $s>1+n/2$ ,

$$ \begin{align*}\|\eta-\zeta\|_{H^s}\leq C_s\gamma^{-1}\|\zeta\|_{H^{s+\tau+1}}\|\mathscr{F}(h,\zeta)\|_{H^{s+\tau}}, \end{align*} $$

where $C_s$ only depends on s. Since

$$ \begin{align*}\begin{aligned} \mathscr{F}(h,\eta) &=X_h\big(\zeta+(\eta-\zeta)\big)-\nabla_\omega \zeta -\nabla_\omega (\eta-\zeta)\\ &=\mathscr{F}(h,\zeta) +\int_0^1 (DX_h)\big(\zeta+\tau(\eta-\zeta)\big)(\eta-\zeta)\operatorname{\mathrm{d}}\!\tau -\nabla_\omega (\eta-\zeta), \end{aligned} \end{align*} $$

it then follows that

$$ \begin{align*}\|\mathscr{F}(h,\eta)\|_{H^s} \leq C_s\gamma^{-1}\big(1+|h|_{C^{N_{s+r}+3}}\big) \big(1+\|\zeta\|_{H^{s+\tau+2}}\big)\|\mathscr{F}(h,\zeta)\|_{H^{s+\tau+1}}, \end{align*} $$

where $C_s$ only depends on s and $|\omega |$ . In other words, if $\zeta (\mathbb {T}^n)$ is an approximately invariant torus for $X_h$ , then so is the isotropic torus $\eta (\mathbb {T}^n)$ , but the “error of being invariant” is less regular compared to $\zeta (\mathbb {T}^n)$ .

Since $\eta $ is an isotropic embedding, the diffeomorphism in phase space

$$ \begin{align*}G:\begin{pmatrix} x \\ y \end{pmatrix}\mapsto \begin{pmatrix} \eta^x(x) \\ \eta^y(x)+\big(\partial\eta^x(x)\big)^{\mathsf{T}} y \end{pmatrix} \end{align*} $$

is a symplectic one. The isotropic torus $\eta (\mathbb {T}^n)$ is straightened to the “flat” torus $\{y_1=0\}$ under the new symplectic coordinate $(x_1,y_1)=G^\iota (x,y)$ . If we set

$$ \begin{align*}H(x_1,y_1) :=(h\circ G)(x_1,y_1) =a_0(x_1)+\langle a_1(x_1),y_1\rangle +\frac{1}{2}\langle Q(x_1)y_1,y_1\rangle+O(|y_1|^3) \quad\text{near }y=0, \end{align*} $$

then the Hamiltonian vector field $X_h$ is changed to $X_H=(DG)^{-1}X_h\circ G$ . Furthermore, the “error of being invariant” is transformed to

$$ \begin{align*}(DG)^{-1}\mathscr{F}(h,\eta) =X_H(\zeta_0)-\begin{pmatrix}\omega \\ 0 \end{pmatrix} =\begin{pmatrix} a_1(\theta)-\omega \\ \partial a_0(\theta) \end{pmatrix}. \end{align*} $$

Thus $a_0$ is almost constant, and $a_1$ is almost equal to $\omega $ . We then set $S=\nabla _\omega ^{-1}(Q-\operatorname {\mathrm {Avg}} Q)$ , and introduce another canonical variable $(x_2,y_2)$ , defined by

$$ \begin{align*}\begin{aligned} x_1&=x_2+S(x_1)y_2\\ y_1&=y_2-\frac{1}{2}\partial S(x_1)(y_2,y_2). \end{aligned} \end{align*} $$

The symplectic diffeomorphism $\Gamma $ maps the isotropic torus $\{y_2=0\}$ to $\{y_1=0\}$ , while the Hamiltonian function

$$ \begin{align*}\begin{aligned} H(x_1,y_1) &=\left(a_0(x_1)-\operatorname{\mathrm{Avg}} a_0\right) +\langle a_1(x_1)-\omega,y_1\rangle\\ &\quad+\operatorname{\mathrm{Avg}} a_0+\langle \omega,y_2\rangle +\frac{1}{2}\langle (\operatorname{\mathrm{Avg}} Q)y_2,y_2\rangle+O(|y_2|^3). \end{aligned} \end{align*} $$

We thus find that the first three Taylor coefficients of $H(x_1,y_1)$ with respect to $y_2$ around $y_2=0$ will be close to

$$ \begin{align*}\operatorname{\mathrm{Avg}} a_0,\quad \omega,\quad \operatorname{\mathrm{Avg}} Q, \end{align*} $$

respectively. Consequently, under the new symplectic coordinate $(x_2,y_2)$ , the Hamiltonian function and approximately invariant torus are exactly in the form discussed by Theorem 1.1–1.2.

In conclusion, we have successfully reduced the general situation to the special case of Theorem 1.1 without any assumption on smallness of $\zeta ^x,\zeta ^y$ . For Hamiltonian functions depending on parameters, the argument will be exactly the same.

A “Direct method” for conjugacy problems

In this appendix, we explain why we prefer the “indirect method” to study conjugacy problems by comparing it with the “direct method,” to be specified shortly. Since this is only a complementary exposition, we will not be concerned with the choice of appropriate function spaces. The somewhat formal computation already suggests that the “direct method” is technically much more cumbersome compared to the “indirect method.”

We first sketch how the conjugacy problem for circular maps can be solved in a “direct” way. The idea seems to come from the lecture notes of Hörmander [Reference Hörmander43]. Define a nonlinear mapping

$$ \begin{align*}\mathscr{G}(u,\lambda)=\big[\Delta_\alpha u\big]\circ(\mathrm{Id}+u)^\iota+\lambda, \end{align*} $$

which appears in (2.2). Computing the linearization of $\mathscr {G}$ at $(u,\lambda )$ close to $(0,0)$ , we find

(A.1)

$$ \begin{align} \begin{aligned} D\mathscr{G}(u,\lambda)(v,\mu) &=\big[\Delta_\alpha v\big]\circ(\mathrm{Id}+u)^\iota\\ &\quad+\left(\big[\Delta_\alpha(u+tv)'\big]\circ(\mathrm{Id}+u+tv)^\iota\right)\cdot\frac{d}{dt}(\mathrm{Id}+u+tv)^\iota\Big|_{t=0} +\mu \\ &=\left[(1+u'\circ\varrho_{\alpha})\Delta_\alpha\left(\frac{v}{1+u'}\right)\right]\circ(\mathrm{Id}+u)^\iota+\mu. \end{aligned} \end{align} $$

The linearized equation $D\mathscr {G}(u,\lambda )(v,\mu )=h$ is then solved by

(A.2)

$$ \begin{align} v=(1+u')\Delta_\alpha^{-1}\left(\frac{h\circ(\mathrm{Id}+u)-\mu}{1+u'\circ \varrho_{\alpha}}\right), \end{align} $$

where $\mu $ is the unique real number making the function inside the bracket to have average 0. Thus, if we consider equation (2.2) instead of (2.1), then the linearized equation is exactly solvable.

Note that product and composition of mappings in the grading $\cap _{r\geq 0}C_*^r$ satisfy tame estimates. Using these tame estimates, we find that the solution $(v,\mu )$ of the linearized equation $D\mathscr {G}(u,\lambda )(v,\mu )=h$ satisfies the following tame estimate:

$$ \begin{align*}|v|_{C_*^r}+|\mu| \lesssim_r \big(1+|u|_{C_*^{2+1/2}}\big)|h|_{C_*^{r+s}}+|h|_{C_*^{1/2}}\big(1+|u|_{C_*^{r+s+1}}\big), \quad \text{where }s>\tau+1. \end{align*} $$

We can finally apply any Nash-Moser type theorem to conclude that, at least for very regular f with small magnitude, there is a solution u of (2.2).

In this case, the paradifferential approach is still valid to produce an alternative proof. We suppose a priori that u is sufficiently smooth. For simplicity of notation, write $W=W(u)=(\mathrm {Id}+u)^\iota $ . Let us try to decompose the nonlinear mapping

$$ \begin{align*}u\mapsto (\Delta_\alpha u)\circ W =(\Delta_\alpha u)\circ(\mathrm{Id}+u)^\iota \end{align*} $$

in the left-hand-side of (2.2) using appropriate paradifferential operators. By Proposition 2.5,

(A.3)

$$ \begin{align} (\Delta_\alpha u)\circ W =W^\star\Delta_\alpha u +T_{(\Delta_\alpha u')\circ W}W +\text{smoother remainder}. \end{align} $$

We leave the first term in (A.3) unchanged and concentrate on the paraproduct term, aiming to express $W(u)$ in terms of the paracomposition operator. To do this, we paralinearize the identity $(\mathrm {Id}+u)\circ W(u)=\mathrm {Id}$ to find

$$ \begin{align*}W(u)+W^\star u+T_{u'\circ W}W(u) =\mathrm{Id}+\text{smoother remainder}. \end{align*} $$

Solving for $W(u)$ , we thus obtain

(A.4)

$$ \begin{align} \begin{aligned} W(u)=-T_{(1+u'\circ W)^{-1}}W^\star u+\mathrm{Id}+\text{smoother remainder}. \end{aligned} \end{align} $$

We then cite a proposition on conjugation of paradifferential operators using paracomposition, which is a special case of the conjugation theorem proved in [Reference Alinhac5, Reference Nguyen72] or [Reference Said76]:

Proposition A.1. Let $r>0,\rho >1$ . Suppose $a(x,\xi )=\sum _{|\alpha |\leq m}a_\alpha (x)(i\xi )^\alpha $ is a classical symbol with $C^r_*$ coefficients on $\mathbb {T}^n$ . Suppose $\chi :\mathbb {T}^n\mapsto \mathbb {T}^n$ is a $C^\rho _*$ diffeomorphism. Then

$$ \begin{align*}\chi^\star T_a=T_{a^\star}\chi^\star+\mathcal{R}_{\mathrm{conj}}(\chi,a), \end{align*} $$

where

$$ \begin{align*}a^*(x,\xi) :=\sum_{|\beta|=0}^{[\rho]}\frac{1}{i^{|\beta|}|\beta|!}\partial_y^\beta\partial_\xi^\beta \left(a\big(\chi(x),R(x,y)^{-1}\xi\big) \left|\frac{\det\chi'(y)}{\det R(x,y)}\right|\right) \Bigg|_{y=x}, \end{align*} $$

with

$$ \begin{align*}R(x,y)=\int_0^1\chi'(tx+(1-t)y)^{\mathrm{T}}dt, \end{align*} $$

while the remainder operator maps $H^s$ to $H^{s-m+\min (r,\rho -1)})$ , and satisfies

$$ \begin{align*}\|\mathcal{R}_{\mathrm{conj}}(\chi,a)\|_{\mathcal{L}(H^s,H^{s-m+\min(r,\rho-1)})} \leq K_s\big(|\chi|_{C^\rho_*}\big)\sum_{|\alpha|\leq m}|a_\alpha|_{C^r_*}. \end{align*} $$

Thus, (A.3) is transformed to

(A.5)

$$ \begin{align} \begin{aligned} (\Delta_\alpha u)\circ W &=W^\star\Delta_\alpha u -T_{(\Delta_\alpha u')\circ W}T_{(1+u'\circ W)^{-1}}W^\star u +\text{smoother remainder}\\ &=W^\star\Delta_\alpha u -W^\star T_{(\Delta_\alpha u')/(1+u')}u +\text{smoother remainder}\\ &=W^\star T_{(1+u'\circ\varrho_{\alpha})}\Delta_\alpha T_{1/(1+u')}u +\text{smoother remainder}. \end{aligned} \end{align} $$

The original nonlinear equation (2.2) then becomes

(A.6)

$$ \begin{align} W^\star T_{(1+u'\circ\varrho_{\alpha})}\Delta_\alpha T_{1/(1+u')}u +\lambda =f+\text{smoother remainder}, \end{align} $$

which may still be named parahomological equation. (A.6) can still be converted to a fixed point type equation:

(A.7)

$$ \begin{align} u=T_{1/(1+u')}^{-1}\Delta_\alpha^{-1}T_{(1+u'\circ\varrho_{\alpha})}^{-1}(W^\star)^{-1}\big(f-\lambda+\text{smoother remainder}\big). \end{align} $$

The $\lambda $ can be uniquely determined by balancing the mean value, so that $\Delta _\alpha ^{-1}$ can be applied.

(A.5) is exactly the paradifferential version of the linearization of $\mathscr {G}(u,\lambda )$ , or to put simply, the paralinearization of $\mathscr {G}(u,\lambda )$ . Although quite parallel to the “indirect method,” we still prefer the “indirect method” instead. The reason is that the fixed point equation produced by the “indirect method” is technically less involved than the “direct method.” Even in this simple illustrative model, the fixed point equation (A.7) requires more paracomposition estimates compared to (2.13): it calls for knowledge about inversion of paracompositions, together with conjugation with paracompositions, while (2.13) relies only on the paralinearization formula.

Let us finally discuss the Hamiltonian conjugacy problem. As pointed out by Féjoz [Reference Féjoz29], given a Hamiltonian function h close to a normal form, if one aims to find a normal form H, a symplectic diffeomorphism $\Gamma $ and a frequency shift $\beta $ solving

(A.8)

$$ \begin{align} h=H\circ\Gamma+\beta\cdot y, \end{align} $$

then the linearized equation will be exactly solvable in a neighbourhood of $(\Gamma ,\beta ,H)=(\mathrm {Id},0,h_0)$ . On the other hand, if one considers instead

$$ \begin{align*}h\circ\Gamma-\beta\cdot (y\circ\Gamma)=H, \end{align*} $$

the linearized equation will only admit approximate solution. The form (A.8) then brings a lot of convenience for the Nash-Moser scheme. However, as with the circular map problem discussed earlier, the paralinearization of (A.8) is technically more complicated compared to the “indirect method” since it inevitably involves numbers of paracomposition estimates, though the general idea remains unchanged. In summary, we would prefer the “indirect method” discussed in this paper instead, where we solve the parainverse equation and conclude that it annihilates the nonlinear mapping by a Neumann series argument.

Acknowledgments

The authors would like to thank Alain Chenciner, Jacques Féjoz and Mauricio Garay for valuable discussions on classical KAM theory. The authors would also like to thank Yingdu Dong, Giovanni Forni and Nicolò Tedesco for discussions on the computational details.

Competing interests

The authors have no competing interests to declare.

Footnotes

1 The authors are grateful to Giovanni Forni for bringing this to their attention.

2 It is elementary that if $x\mapsto x+\alpha +f(x)$ has rotation number $\alpha $ , and the conjugacy equation has a solution $(\eta ,\lambda )$ , then $\lambda $ is necessarily zero. Introducing $\lambda $ makes the presentation simpler. See for example Proposition III.4.1.1 of [Reference Herman39].

3 This local chart usually is not the exponential map, since the exponential map is not surjective for an infinite dimensional Lie group. See for example, Section 42 of [Reference Kriegl and Michor56].

4 Unfortunately, we have to use $\theta \in \mathbb {T}^n$ to denote the variable on the “abstract torus” since x has already been occupied to label points in phase space.

5 The authors would like to thank Giovanni Forni and Nicolò Tedesco for pointing out a computational error in the original version of the paper and for subsequent discussions on the details of the correction.

6 We say that a matrix function $f(X)$ is rational in the argument X, if the entries of $f(X)$ are all rational functions of entries of X.

7 Our notation (2.10) is that $\mathbf {Op}^{\mathrm {PM}}(a)=T_a$ , and we prefer to use $\mathbf {Op}^{\mathrm {PM}}$ here in case the expression of a is very lengthy.

References

Alazard, T., Burq, N. and Zuily, C., ‘On the water-wave equations with surface tension’, Duke Math. J. 158(3) (2011), 413–499.10.1215/00127094-1345653CrossRef Google Scholar

Alazard, T. and Métivier, G., ‘Paralinearization of the Dirichlet to Neumann operator, and regularity of three-dimensional water waves’, Comm. Partial Differential Equations 34(12) (2009), 1632–1704.10.1080/03605300903296736CrossRef Google Scholar

Alazard, T. and Shao, C., ‘Paracomposition operators and paradifferential reducibility’, arXiv preprint arXiv:2410.17211, 2024.Google Scholar

Alinhac, S. and Gérard, P., Opérateurs pseudo-différentiels et théorème de Nash-Moser, EDP Sciences, 1991.Google Scholar

Alinhac, S., ‘Paracomposition et opérateurs paradifférentiels’, Comm. Partial Differential Equations 11(1) (1986), 87–121.10.1080/03605308608820419CrossRef Google Scholar

Arnold, V. I., ‘Proof of a theorem of A. N. Kolmogorov on the invariance of quasi-periodic motions under small perturbations of the Hamiltonian’, Russian Math. Surveys 18(5) (1963), 9–36.10.1070/RM1963v018n05ABEH004130CrossRef Google Scholar

Arnold, V. I., ‘Small divisor problems in classical and celestial mechanics’, Russian Math. Surveys 18(6) (1963), 85–192.10.1070/RM1963v018n06ABEH001143CrossRef Google Scholar

Arnold, V. I., ‘Small denominators, I: mappings of the circumference into itself’, Amer. Math. Soc. Transl. Ser. 2(46) (1965), 213–288.Google Scholar

Bedrossian, J., Masmoudi, N. and Mouhot, C., ‘Landau damping: paraproducts and Gevrey regularity’, Ann. PDE 2 (2016), 1–71.10.1007/s40818-016-0008-2CrossRef Google Scholar

Berti, M. and Bolle, P., ‘A Nash-Moser approach to KAM theory’, in Hamiltonian Partial Differential Equations and Applications (Springer, 2015), 255–284.10.1007/978-1-4939-2950-4_9CrossRef Google Scholar

Bony, J.-M., ‘Calcul symbolique et propagation des singularités pour les équations aux dérivées partielles non linéaires’, Ann. Sci. Éc. Norm. Supér. 14 (1981), 209–246.10.24033/asens.1404CrossRef Google Scholar

Bost, J.-B., ‘Tores invariants des systèmes dynamiques hamiltoniens’, Séminaire Bourbaki 1984/85, Société Mathématique de France, 1986, p. 85.Google Scholar

Bounemoura, A. and Fischler, S., ‘A Diophantine duality applied to the KAM and Nekhoroshev theorems’, Math. Z. 275(3–4) (2013), 1135–1167.10.1007/s00209-013-1174-5CrossRef Google Scholar

Bounemoura, A. and Fischler, S., ‘The classical KAM theorem for Hamiltonian systems via rational approximations’, Regul. Chaotic Dyn. 19 (2014), 251–265.10.1134/S1560354714020087CrossRef Google Scholar

Bourgain, J., ‘Construction of periodic solutions of nonlinear wave equations in higher dimension’, Geom. Funct. Anal. 5 (1995), 629–639.10.1007/BF01902055CrossRef Google Scholar

Bourgain, J., ‘On Melnikov’s persistency problem’, Math. Res. Lett. 4(4) (1997), 445–458.10.4310/MRL.1997.v4.n4.a1CrossRef Google Scholar

Bourgain, J., ‘Quasi-periodic solutions of Hamiltonian perturbations of 2D linear Schrödinger equations’, Ann. of Math. 148(2) (1998), 363–439.10.2307/121001CrossRef Google Scholar

Bricmont, J., Gawedzki, K., and Kupiainen, A., ‘KAM theorem and quantum field theory’, Comm. Math. Phys. 201 (1999), 699–727.10.1007/s002200050573CrossRef Google Scholar

Celletti, A. and Chierchia, L., ‘Construction of analytic KAM surfaces and effective stability bounds’, Comm. Math. Phys. 118(1) (1988), 119–161.10.1007/BF01218480CrossRef Google Scholar

Celletti, A. and Chierchia, L., ‘On the stability of realistic three-body problems’, Comm. Math. Phys. 186(2) (1997), 413–449.10.1007/s002200050115CrossRef Google Scholar

Celletti, A. and Chierchia, L., KAM stability and celestial mechanics, Mem. Amer. Math. Soc., Vol. 187 187.878, American Mathematical Society, 2007.Google Scholar

Chierchia, L. and Procesi, M., ‘Kolmogorov–Arnold–Moser (KAM) theory for finite and infinite dimensional systems’, in Encyclopedia of Complexity and Systems Science, Living ed. (Springer, New York, 2018).Google Scholar

Craig, W. and Wayne, C. E., ‘Newton’s method and periodic solutions of nonlinear wave equations’, Comm. Pure Appl. Math. 46(11) (1993), 1409–1498.10.1002/cpa.3160461102CrossRef Google Scholar

Delort, J.-M., ‘Periodic solutions of nonlinear Schrödinger equations: a paradifferential approach’, Anal. PDE 4(5) (2012), 639–676.10.2140/apde.2011.4.639CrossRef Google Scholar

DeTurck, D. M., ‘Deforming metrics in the direction of their Ricci tensors’, J. Differential Geom. 18(1) (1983), 157–162.10.4310/jdg/1214509286CrossRef Google Scholar

Eliasson, L. H., ‘Absolutely convergent series expansions for quasi-periodic motions’, Math. Phys. Electron. J. 4 (1996), 1–33.Google Scholar

Fang, A. J., ‘Nonlinear stability of the slowly-rotating Kerr-de Sitter family’, arXiv preprint arXiv:2112.07183, 2021.Google Scholar

Féjoz, J., ‘Démonstration du “théorème d’Arnold” sur la stabilité du système planétaire (d’après Herman)’, Ergodic Theory Dynam. Systems 24(5) (2004), 1521–1582.10.1017/S0143385704000410CrossRef Google Scholar

Féjoz, J., ‘Introduction to KAM theory, with a view to celestial mechanics’, in Variational Methods in Imaging and Geometric Control (de Gruyter, 2016).Google Scholar

Forni, G., ‘Solutions of the cohomological equation for area-preserving flows on compact surfaces of higher genus’, Ann. of Math. 146(2) (1997), 295–344.10.2307/2952464CrossRef Google Scholar

Forni, G., ‘Sobolev regularity of solutions of the cohomological equation’, Ergodic Theory Dynam. Systems 41(3) (2021), 685–789.10.1017/etds.2019.108CrossRef Google Scholar

Forni, G., ‘Finite codimension stability of invariant surfaces’, arXiv preprint arXiv:2502.00898, 2025.Google Scholar

Gage, M. and Hamilton, R. S., ‘The heat equation shrinking convex plane curves’, J. Differential Geom. 23(1) (1986), 69–96.10.4310/jdg/1214439902CrossRef Google Scholar

Grafakos, L., Modern Fourier Analysis, 2nd ed., Grad. Texts Math., vol. 250 (Springer, New York, 2009).10.1007/978-0-387-09434-2CrossRef Google Scholar

Günther, M., ‘On the perturbation problem associated to isometric embeddings of Riemannian manifolds’, Ann. Global Anal. Geom. 7 (1989), 69–77.10.1007/BF00137403CrossRef Google Scholar

Hamilton, R. S., ‘The inverse function theorem of Nash and Moser’, Bull. Amer. Math. Soc. (N.S.) 7(1) (1982), 65–222.10.1090/S0273-0979-1982-15004-2CrossRef Google Scholar

Hamilton, R. S., ‘Three-manifolds with positive Ricci curvature’, J. Differential Geom. 17(2) (1982), 255–306.10.4310/jdg/1214436922CrossRef Google Scholar

Hénon, M., ‘Exploration numérique du problème restreint. IV. Masses égales, orbites non périodiques’, Ann. Astrophys. 3.2 (1966), 49–66.Google Scholar

Herman, M. R., ‘Sur la conjugaison différentiable des difféomorphismes du cercle à des rotations’, Publ. Math. Inst. Hautes Études Sci. 49 (1979), 5–233.10.1007/BF02684798CrossRef Google Scholar

Herman, M. R., ‘Simple proofs of local conjugacy theorems for diffeomorphisms of the circle with almost every rotation number’, Bol. Soc. Brasil. Mat. 16(1) (1985), 45–83.10.1007/BF02584836CrossRef Google Scholar

Hintz, P. and Vasy, A., ‘The global non-linear stability of the Kerr–de Sitter family of black holes’, Acta Math. 220(1) (2018), 1–206.10.4310/ACTA.2018.v220.n1.a1CrossRef Google Scholar

Hörmander, L., ‘The boundary problems of physical geodesy’, Arch. Ration. Mech. Anal. 62(1) (1976), 1–52.10.1007/BF00251855CrossRef Google Scholar

Hörmander, L., Implicit function theorems, Lectures at Stanford University, summer quarter, 1977.Google Scholar

Hörmander, L., The Nash-Moser theorem and paradifferential operators, in Analysis, et cetera, Elsevier, 1990, pp. 429–449.10.1016/B978-0-12-574249-8.50024-9CrossRef Google Scholar

Hörmander, L., Lectures on Nonlinear Hyperbolic Differential Equations, Math. Appl., vol. 26, Springer, 1997.Google Scholar

Kato, T., On nonlinear Schrödinger equations, II. Hs-solutions and unconditional well-posedness, J. Anal. Math. 67 (1995), 281–306.10.1007/BF02787794CrossRef Google Scholar

Katznelson, Y. and Ornstein, D., ‘The absolute continuity of the conjugation of certain diffeomorphisms of the circle’, Ergodic Theory Dynam. Systems 9(4) (1989), 681–690.10.1017/S0143385700005289CrossRef Google Scholar

Kenig, C. E., Ponce, G., and Vega, L., ‘Well-posedness and scattering results for the generalized Korteweg-de Vries equation via the contraction principle’, Comm. Pure Appl. Math. 46(4) (1993), 527–620.10.1002/cpa.3160460405CrossRef Google Scholar

Khanin, K., Dias, J. L., and Marklof, J., ‘Multidimensional continued fractions, dynamical renormalization and KAM theory’, Comm. Math. Phys. 270(1) (2007), 197–231.10.1007/s00220-006-0125-yCrossRef Google Scholar

Khanin, K. M. and Sinai, Ya. G., ‘A new proof of M. Herman’s theorem’, Comm. Math. Phys. 112 (1987), 89–101.10.1007/BF01217681CrossRef Google Scholar

Klainerman, S. and Ponce, G., ‘Global, small amplitude solutions to nonlinear evolution equations’, Comm. Pure Appl. Math. 36(1) (1983), 133–141.10.1002/cpa.3160360106CrossRef Google Scholar

Klainerman, S., ‘Global existence for nonlinear wave equations’, Comm. Pure Appl. Math. 33(1) (1980), 43–101.10.1002/cpa.3160330104CrossRef Google Scholar

Klainerman, S., ‘Long-time behavior of solutions to nonlinear evolution equations’, Arch. Ration. Mech. Anal. 78(1) (1982), 73–98.10.1007/BF00253225CrossRef Google Scholar

Klainerman, S., ‘Uniform decay estimates and the Lorentz invariance of the classical wave equation’, Comm. Pure Appl. Math. 38(3) (1985), 321–332.10.1002/cpa.3160380305CrossRef Google Scholar

Kolmogorov, A. N., ‘On the conservation of quasi-periodic motion for a small variation of the Hamiltonian function’, Dokl. Akad. Nauk SSSR 98 (1954), 527–530.Google Scholar

Kriegl, A. and Michor, P. W., The Convenient Setting of Global Analysis, Math. Surveys Monogr., vol. 53, American Mathematical Society, 1997.10.1090/surv/053CrossRef Google Scholar

Kuksin, S. B., ‘Hamiltonian perturbations of infinite-dimensional linear systems with an imaginary spectrum’, Funct. Anal. Appl. 21 (1987), 192–205.10.1007/BF02577134CrossRef Google Scholar

Laskar, J., Hénon, Michel and the stability of the solar system, arXiv preprint arXiv:1411.4930, 2014.Google Scholar

de la Llave, R., González, A., Jorba, À., and Villanueva, J., ‘KAM theory without action-angle variables’, Nonlinearity 18(2) (2005), 855.10.1088/0951-7715/18/2/020CrossRef Google Scholar

de la Llave, R., A tutorial on KAM theory, in Smooth Ergodic Theory and Its Applications, Proc. Sympos. Pure Math., vol. 69, American Mathematical Society, 2001, pp. 175–296.10.1090/pspum/069/1858536CrossRef Google Scholar

Marmi, S., Moussa, P., and Yoccoz, J.-C., ‘The cohomological equation for Roth-type interval exchange maps’, J. Amer. Math. Soc. 18(4) (2005), 823–872.10.1090/S0894-0347-05-00490-XCrossRef Google Scholar

Marmi, S., Moussa, P., and Yoccoz, J.-C., ‘Linearization of generalized interval exchange maps’, Ann. of Math. 176(3) (2012), 1583–1646.10.4007/annals.2012.176.3.5CrossRef Google Scholar

Métivier, G., Para-differential Calculus and Applications to the Cauchy Problem for Nonlinear Systems , Edizioni della Normale, 2008.Google Scholar

Meyer, Y., Remarques sur un théorème de J.-M. Bony, Université de Paris-Sud, Département de Mathématique, 1980.Google Scholar

Meyer, Y., Régularité des solutions des équations aux dérivées partielles non linéaires, in Séminaire Bourbaki 1979/80, Lecture Notes in Math., vol. 842, Springer, 1981, pp. 293–302.Google Scholar

Ming, M. and Zhang, Z., ‘Well-posedness of the water-wave problem with surface tension’, J. Math. Pures Appl. 92(5) (2009), 429–455.10.1016/j.matpur.2009.05.005CrossRef Google Scholar

Moser, J., ‘On invariant curves of area-preserving mapping of an annulus’, Matematika, 6(5) (1962), 51–68.Google Scholar

Moser, J., ‘A rapidly convergent iteration method and non-linear partial differential equations—I’, Ann. Scuola Norm. Sup. Pisa Cl. Sci. 20(2) (1966), 265–315.Google Scholar

Moser, J., ‘A rapidly convergent iteration method and non-linear partial differential equations—II’, Ann. Scuola Norm. Sup. Pisa Cl. Sci. 20(2) (1966), 499–535.Google Scholar

Mouhot, C. and Villani, C., ‘On Landau damping’, Acta Math. 207 (2011), 29–201.10.1007/s11511-011-0068-9CrossRef Google Scholar

Nash, J., ‘The imbedding problem for Riemannian manifolds’, Ann. of Math. 63(1) (1956), 20–63.10.2307/1969989CrossRef Google Scholar

Nguyen, H., ‘A sharp Cauchy theory for the 2D gravity-capillary waves’, Ann. Inst. H. Poincaré Anal. Non Linéaire 34 (2017), 1793–1836.10.1016/j.anihpc.2016.12.007CrossRef Google Scholar

Pöschel, J., A lecture on the classical KAM theorem, arXiv preprint arXiv:0908.2234, 2009.Google Scholar

Pöschel, J., ‘KAM à la R’, Regul. Chaotic Dyn. 16 (2011), 17–23.10.1134/S1560354710520060CrossRef Google Scholar

Rüssmann, H., ‘KAM iteration with nearly infinitely small steps in dynamical systems of polynomial character’, Discrete Contin. Dyn. Syst. Ser. S 3(4) (2010), 683–718.10.3934/dcdss.2010.3.683CrossRef Google Scholar

Said, A. R., ‘On paracomposition and change of variables in paradifferential operators’, J. Pseudo-Differ. Oper. Appl. 14(2) (2023), 25.10.1007/s11868-023-00510-0CrossRef Google Scholar

Salamon, D. and Zehnder, E., ‘KAM theory in configuration space’, Comment. Math. Helv. 64(1) (1989), 84–132.10.1007/BF02564665CrossRef Google Scholar

Salamon, D., ‘The Kolmogorov-Arnold-Moser theorem’, Math. Phys. Electron. J. 10(3) (2004), 1–37.Google Scholar

Sergeraert, F., ‘Une généralisation du théorème des fonctions implicites de Nash’, C . R. Acad. Sci. Paris Sér. A 270 (1970), 861–863.Google Scholar

Staffilani, G., The Initial Value Problem for Some Dispersive Differential Equations, Ph.D. thesis, University of Chicago, 1995.Google Scholar

Stein, E. and Murphy, T., Harmonic Analysis: Real-Variable Methods, Orthogonality, and Oscillatory Integrals, Princeton Math. Ser., vol. 3 (Princeton University Press, 1993).Google Scholar

Taylor, M., Tools for PDE: Pseudodifferential Operators, Paradifferential Operators, and Layer Potentials, Math. Surveys Monogr., vol. 81 (American Mathematical Society, 2000).Google Scholar

Wayne, C. E., An introduction to KAM theory, preprint, https://math.bu.edu/people/cew/preprints/introkam.pdf, 2008.Google Scholar

Yoccoz, J.-C., ‘Conjugaison différentiable des difféomorphismes du cercle dont le nombre de rotation vérifie une condition diophantienne’, Ann. Sci. Éc. Norm. Supér. 17 (1984), 333–359.10.24033/asens.1475CrossRef Google Scholar

Zehnder, E., ‘An implicit function theorem for small divisor problems’, Bull. Amer. Math. Soc. 80(1) (1974), 174–179.10.1090/S0002-9904-1974-13407-5CrossRef Google Scholar

Zehnder, E., ‘Generalized implicit function theorem with applications to some small divisor problems, I’, Comm. Pure Appl. Math. 28 (1975), 91–140.10.1002/cpa.3160280104CrossRef Google Scholar

Zehnder, E., ‘Generalized implicit function theorem with applications to some small divisor problems, II’, Comm. Pure Appl. Math. 29 (1976), 49–111.10.1002/cpa.3160290104CrossRef Google Scholar

Article contents

KAM via standard fixed point theorems

Abstract

MSC classification

Information

1 Introduction

1.1 Organization of the paper

Theorem 1.1 (Existence of Invariant Torus).

Theorem 1.2 (Translated Conjugacy).

1.2 A brief review of history

1.3 Notation and convention

2 Circular map as illustrative model

2.1 Description of the problem

2.2 Approximate right inverse

2.3 Quick introduction to paradifferential calculus

Proposition 2.1 (Continuity of paraproduct, rough version).

Proposition 2.2 (Composition of paraproducts, rough version).

Proposition 2.3 (Paralinearization, rough version).

2.4 Parahomological equation

Proof. Simple solution for (2.1).

2.5 Refinement in regularity

2.6 Heuristics about generality

3 Quantitative paraproduct and paralinearization estimates

3.1 Meyer multipliers

Proposition 3.1 (Meyer multiplier).

Proof of Proposition 3.1.

3.2 Quantitative estimates for paraproduct operators

3.3 Bony’s paralinearization theorem

4 Existence of invariant torus

4.1 Algebraic set-up

Lemma 4.1 ([Reference Berti and Bolle10], Lemma 3).

4.2 Deriving parahomological equation

Case 1: $\mathrm {Avg} Q$ is invertible

Case 2: Nontrivial parameter $\xi $

4.3 Proof of Theorem 1.1–1.2

4.4 Miscellaneous

A “Direct method” for conjugacy problems

Acknowledgments

Competing interests

Footnotes

References

Save article to Kindle

Save article to Dropbox

Save article to Google Drive

Reply to: Submit a response

Your details

You have entered the maximum number of contributors

Conflicting interests