Confluence of left-linear higher-order rewrite theories by checking their nested critical pairs

Abstract User-defined higher-order rewrite rules are becoming a standard in proof assistants based on intuitionistic type theory. This raises the question of proving that they preserve the properties of beta-reductions for the corresponding type systems. In a series of papers, we develop techniques based on van Oostrom’s decreasing diagrams that reduce confluence proofs to the checking of various forms of critical pairs for higher-order rewrite rules extending beta-reduction on pure lambda-terms. As shown in a previous paper of the two middle authors, confluence of a terminating set of left-linear rewrite rules is obtained when their critical pairs are joinable, beta-rewrite steps being disallowed. The present paper concentrates on the case where arbitrary beta-rewrite steps are allowed for joining critical pairs. The rewrite relation used for analyzing confluence may rewrite arbitrarily many non-overlapping redexes in a single step. This relation gives rise to critical pairs that overlap both horizontally, as with parallel rewriting, but also vertically, forming chains of successive overlaps. Practical examples of use of this technique are analyzed.


Introduction
The two essential properties of a type theory, consistency and decidability of type checking, follow from three simpler ones: type preservation, strong normalization, and confluence. In dependent type theories, however, confluence is often needed to prove type preservation and strong normalization, making all three properties interdependent if termination is used in the confluence proof. This circularity can be broken in two ways: by proving all properties together within a single induction (Goguen 1994); or by proving confluence on untyped terms first, and then successively type preservation, confluence on typed terms, and strong normalization. We develop the latter way here, focusing on untyped confluence.
In a previous paper, we investigated the case of a terminating set of left-linear rules for which critical pairs can be shown joinable by using rules from R. In the same paper, we explained that allowing the use of arbitrary β-steps for joining critical pairs would require a more complex notion of critical pair, and that parallel critical pairs cannot suffice in general. The goal of this paper is therefore to address the case of possibly non-terminating left-linear rules, or of terminating left-linear rules whose critical pairs cannot be joined without using β-steps.
Because beta reductions do not terminate for pure lambda terms, and rewrite rules may have critical pairs, the only available tool for proving confluence in this case is based on van Oostrom's decreasing diagrams (van Oostrom 1994). Van Oostrom's theorem is abstract: its application to non-terminating term rewriting relations conceals many difficulties, as is stressed in (Appel et al. 2010). An essential aspect of our methodology is to exhibit the rewrite relation for which confluence results from an analysis of its critical pairs. Here, this relation is orthogonal higher-order rewriting, which pairs up nicely with beta reductions, whose confluence proof by Tait and Martin-Löf was actually based on a dedicated notion of orthogonal reductions that they called parallel. A general notion of orthogonal rewriting was introduced in (Terese 2003), called multi-step rewriting. Multi-step rewriting aims at overcoming the limitations of left-linear, critical pair free rewriting systems introduced in (Huet and Lévy 1991), also called orthogonal systems. The main idea is to package several rewriting steps together, using possibly different left-linear rules provided they do not overlap, therefore achieving orthogonality inside a given multi-step by definition of rewriting. Orthogonal rewriting, as defined here, is a variation of the same idea in which the use of a single rule is allowed in a given multi-step. It turns out that the analysis of confluence then becomes much easier, and that fewer critical pairs need to be considered. The analysis of orthogonal rewriting and the corresponding Nested Critical Pair Theorem are essential technical contributions of this paper. This result shows that critical pairs of orthogonal rewriting may need overlapping left-hand sides of rules horizontally (at parallel positions), as well as vertically (at an increasing sequence of ancestor positions). As a consequence, nested critical pairs may be infinitely many.
Our main theoretical result is then that higher-order rewriting in combination with beta reduction is confluent on untyped terms if all its nested higher-order critical pairs admit decreasing rewrite diagrams. It is, however, possible to stick to non-nested higher-order critical pairs for rules belonging to a terminating subset, as in (Ferey and Jouannaud 2019). The technique is illustrated with practical examples showing the strength and limits of the result.
Our computational model based on untyped higher-order reductions is recalled in Section 2, which contains a brief statement of our main result. Higher-order orthogonal reductions are defined in Section 3, which culminates with the nested critical peak theorem. Confluence is studied in Section 4: after recalling the notion of decreasing diagrams, algebraic properties of reductions are developed before giving the confluence theorem holding in our computational model.

Computational Model
We aim at rewriting terms of an untyped lambda calculus λF generated by three pairwise disjoint sets, a signature F of function symbols, a set X of variables, and a set Z of meta-variables.

Terms
λF is a mix of the annotated lambda-calculus and Klop's combinatory reduction systems (Klop 1980) which extends the calculus introduced in (Ferey and Jouannaud 2019) by having annotated abstractions to faithfully abstract dependently typed calculi whose confluence properties are our real target.
Terms are those of an untyped lambda calculus equipped with a binary abstraction λx:u.v, whose first argument u is an arbitrary term called annotation, and enriched with F-headed terms of the form f(u) with f ∈ F and meta-terms of the form Z[v] with Z ∈ Z. Only variables can be abstracted over. Elements of the vocabulary have arities, denoted by vertical bars as in |a|. Variables have arity zero. The grammar of terms is the following:
u, v ::= x | (u v) | λx:u.v | f(u) | Z[v], where x ∈ X, f ∈ F, Z ∈ Z, the arguments of f and Z are lists of terms, |u| = |f| in f(u), and |v| ≤ |Z| in Z[v].
As is usual, we do not duplicate parentheses, writing f(x y) for f((x y)). This is the only case where an application does not carry its own parentheses along.
λF is introduced with a unary abstraction in (Ferey and Jouannaud 2019). Our abstraction operator λx: _._ has arity 2, the first argument being the annotation and the second the body. As is always the case in typed lambda calculi, the scope of the abstraction is reduced to its second argument: renaming the variable x by a fresh variable y in λx:u.v amounts to renaming the bound occurrences of x by y in the body only. Our syntax is slightly richer than that of the λ-calculus in order to enable abstracting step-by-step derivations in dependently typed lambda calculi by derivations in λF (we know of no confluence-preserving encoding of dependently typed derivations into the untyped lambda calculus with a unary abstraction). Annotations come into play when analyzing ancestor peaks in Sections 4.5.2 and 4.5.3. The calculus without annotation being itself an abstraction of λF, all results presented here adapt straightforwardly to that calculus by forgetting annotations (in which case, λx._ becomes a unary operator, whose first argument is now its body). We use this facility unannounced in examples originating from non-dependent typed lambda calculi.
Unlike function symbols and Klop's meta-variables, meta-variables here have an arity which is not fixed, but bounded. This handy feature, used in DEDUKTI (Dowek 2016), provides a simple syntax for expressions of the form (...(X a_1) ... a_n). For example, if |Z| = 1, the two terms f(Z) and f(λx:nat.Z[x]), the latter standing for f(λx:nat.(Z x)), coexist (and are different in the absence of extensionality; DEDUKTI is not extensional). The example described in Section 2.12 shows that this allows more concise rules by using different arities for different occurrences of the same meta-variable in a rule, hence avoiding useless η-expansions.
We use the small letters f, g, ... for function symbols and x, y, ... for variables, and reserve capital letters X, Y, ... for meta-variables. When convenient, a small letter like x may denote any variable in X ∪ Z. We use the notation |_| to denote various quantities besides arities, such as the length of a list, the size of an expression or the cardinality of a set. Given a list u, u[m..n] denotes the finite sublist u_m, ..., u_n, and u[m..n]\{i_1, ..., i_p} the sublist of u[m..n] whose elements u_{i_j} for j ∈ [1..p] have been filtered out. u may be omitted, in which case it is the list of natural numbers. We use the notation {a_1, ..., a_n} for enumerated sets or multisets and identify {a} with a.
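The term syntax described in this section can be sketched as a small data type. The following Python fragment is only an illustration under our own assumptions: the class names Var, App, Lam, Fun, Meta and the helper well_formed are hypothetical and not part of the calculus. It checks that function symbols are used with their exact arity while meta-variables only respect an upper bound, and it builds the two coexisting terms f(Z) and f(λx:nat.Z[x]) for |Z| = 1.

```python
from dataclasses import dataclass
from typing import Dict, Tuple, Union


@dataclass(frozen=True)
class Var:                      # a variable x
    name: str


@dataclass(frozen=True)
class App:                      # an application (u v)
    fun: "Term"
    arg: "Term"


@dataclass(frozen=True)
class Lam:                      # λx:u.v, where u is the annotation and v the body
    var: str
    annot: "Term"
    body: "Term"


@dataclass(frozen=True)
class Fun:                      # f(u_1, ..., u_n) with n = |f|
    symbol: str
    args: Tuple["Term", ...] = ()


@dataclass(frozen=True)
class Meta:                     # Z[v_1, ..., v_k] with k <= |Z|
    name: str
    args: Tuple["Term", ...] = ()


Term = Union[Var, App, Lam, Fun, Meta]


def well_formed(t: Term, fun_arity: Dict[str, int], meta_arity: Dict[str, int]) -> bool:
    """Arities are exact for function symbols and an upper bound for meta-variables."""
    if isinstance(t, Var):
        return True
    if isinstance(t, App):
        return well_formed(t.fun, fun_arity, meta_arity) and well_formed(t.arg, fun_arity, meta_arity)
    if isinstance(t, Lam):
        return well_formed(t.annot, fun_arity, meta_arity) and well_formed(t.body, fun_arity, meta_arity)
    if isinstance(t, Fun):
        ok = fun_arity.get(t.symbol) == len(t.args)
    else:  # Meta: unknown meta-variables are rejected
        ok = len(t.args) <= meta_arity.get(t.name, -1)
    return ok and all(well_formed(a, fun_arity, meta_arity) for a in t.args)


# The example of the text, assuming |f| = 1, |nat| = 0 and |Z| = 1.
FUN_ARITY = {"f": 1, "nat": 0}
META_ARITY = {"Z": 1}
t1 = Fun("f", (Meta("Z"),))                                      # f(Z)
t2 = Fun("f", (Lam("x", Fun("nat"), Meta("Z", (Var("x"),))),))   # f(λx:nat.Z[x])
```

Both t1 and t2 pass the arity check and are distinct terms, which mirrors the absence of extensionality noted above.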

Positions
Positions in terms are words over the natural numbers, using • for concatenation, for the empty word, ≤ P for the prefix order (above), ≥ P (below) for its inverse, < P and > P for their respective strict parts, and p#q for ¬( > P ∨≤ P ) (parallel).These orders are extended to sets of positions as follows: P ≥ P Q (P > P Q, P ≤ P Q, P < P Q, respectively), where Q is a set of parallel positions, iff ∀p ∈ P ∃q ∈ Q such that p ≥ P q (p > P q, p ≤ P q, p < P q, respectively).We denote by P min the subset of minimal positions of a set of positions P, and by P| Q , where Q is a subset of parallel positions of P, the set {o : ∃q ∈ Q such that q • o ∈ P}.
Meta-variables having an arity, positions in the meta-term Z[t] are no different from positions in the term f(t). As usual, p^n denotes the concatenation of p with itself (n − 1) times.
Given a term M, we use Var(M) and MVar(M) for its sets of free variables and of meta-variables, respectively, M(p) for its symbol at position p, M|_p for its subterm at position p, M[N]_p for the term obtained by replacing M|_p by N in M, and Pos(M), VPos(M), MPos(M) for the following respective sets of positions of M: all positions, the positions of free variables, and the positions of meta-variables. A term M is pure if no variable of M is bound twice or occurs both bound and free, ground if Var(M) = ∅, closed if MVar(M) = ∅, and linear if |MPos(M)| = |MVar(M)|.
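As an illustration of these notations, positions can be modelled as tuples of natural numbers, the empty tuple being the empty word. The sketch below is a simplification of ours: terms are plain nested tuples (symbol, child_1, ..., child_n) without binders, and the function names are hypothetical. It implements the prefix order, parallelism, M|_p, M[N]_p and the minimal positions of a set.

```python
from typing import Iterable, List, Set, Tuple

Pos = Tuple[int, ...]   # a position is a word over the naturals; () is the empty word


def is_above(p: Pos, q: Pos) -> bool:
    """Prefix order: p is above (or equal to) q when p is a prefix of q."""
    return q[:len(p)] == p


def parallel(p: Pos, q: Pos) -> bool:
    """p # q: neither position is a prefix of the other."""
    return not is_above(p, q) and not is_above(q, p)


def positions(t) -> List[Pos]:
    """Pos(M) for a term written as a nested tuple (symbol, child_1, child_2, ...)."""
    out: List[Pos] = [()]
    for i, child in enumerate(t[1:], start=1):
        out += [(i,) + p for p in positions(child)]
    return out


def subterm(t, p: Pos):
    """M|_p: follow the child indices of p (children are numbered from 1)."""
    for i in p:
        t = t[i]
    return t


def replace(t, p: Pos, n):
    """M[N]_p: replace the subterm of t at position p by n."""
    if not p:
        return n
    i = p[0]
    return t[:i] + (replace(t[i], p[1:], n),) + t[i + 1:]


def minimal(ps: Iterable[Pos]) -> Set[Pos]:
    """P_min: the positions of P that have no other position of P above them."""
    ps = set(ps)
    return {p for p in ps if not any(q != p and is_above(q, p) for q in ps)}


M = ("f", ("g", ("a",)), ("b",))    # the first-order term f(g(a), b)
```

For instance, subterm(M, (1, 1)) is the constant a, and positions (1, 1) and (2,) are parallel.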
For example
-Case 2: m > n: since ar(s)
Substitutions are extended to sequences of terms and to substitutions in the natural way. We use postfix notation for the application of σ to a term t, writing tσ, or to a vector of terms t, writing tσ, or to a substitution τ, writing τσ, and call tσ (resp., tσ, τσ) the instance of t (resp., t, τ) by σ. The notation Pos(σ) will have the obvious meaning of a sequence of Dom(σ)-indexed sets of positions.
For example, let s be the term f(X[(x y), y]), where the meta-variable X has two arguments, (x y) and y, and σ be the substitution {X → λx y z.g(x, y, z), y → a}, assuming X has arity 3. Then, we get sσ = f(λz.g((x y)σ, yσ, zσ)) = f(λz.g((x a), a, z)). Let us now compare sσ with the instance by σ of the term u = f((X (x y)) y), in which X is applied successively to (x y) and y. Then uσ = f((λx y z.g(x, y, z) (x a)) a). We can see that uσ reduces to sσ in two β-steps (anticipating the next section): substitutions of meta-variables hide those reductions. This actually has positive impacts on confluence in practice, as we shall discover.
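The example above can be replayed mechanically. The following Python sketch is an illustration under simplifying assumptions of ours, not the paper's definition: annotations are dropped, terms are tagged tuples, and the functions refuse to proceed (instead of renaming) whenever a bound variable would capture a free one. The names instantiate, subst and beta_root are hypothetical. A meta-term Z[v_1, ..., v_m] whose meta-variable is mapped to λx_1...x_n.b with m ≤ n is replaced by λx_{m+1}...x_n.b{x_1 → v_1σ, ..., x_m → v_mσ}, which is how the β-steps get hidden.

```python
# Terms: ("var", x) | ("app", t, u) | ("lam", x, body) | ("fun", f, args) | ("meta", Z, args)

def free_vars(t):
    tag = t[0]
    if tag == "var":
        return {t[1]}
    if tag == "app":
        return free_vars(t[1]) | free_vars(t[2])
    if tag == "lam":
        return free_vars(t[2]) - {t[1]}
    return set().union(*(free_vars(a) for a in t[2]))      # "fun" or "meta"


def subst(t, env):
    """Simultaneous substitution of free variables; raises rather than capture."""
    tag = t[0]
    if tag == "var":
        return env.get(t[1], t)
    if tag == "app":
        return ("app", subst(t[1], env), subst(t[2], env))
    if tag == "lam":
        inner = {x: v for x, v in env.items() if x != t[1]}
        if any(t[1] in free_vars(v) for v in inner.values()):
            raise ValueError("capture: rename the bound variable first")
        return ("lam", t[1], subst(t[2], inner))
    return (tag, t[1], tuple(subst(a, env) for a in t[2]))


def instantiate(t, var_map, meta_map):
    """Apply a substitution; meta_map sends Z to (parameters, body), i.e. λ parameters . body."""
    tag = t[0]
    if tag == "var":
        return var_map.get(t[1], t)
    if tag == "app":
        return ("app", instantiate(t[1], var_map, meta_map), instantiate(t[2], var_map, meta_map))
    if tag == "lam":
        inner = {x: v for x, v in var_map.items() if x != t[1]}
        if any(t[1] in free_vars(v) for v in inner.values()):
            raise ValueError("capture: rename the bound variable first")
        return ("lam", t[1], instantiate(t[2], inner, meta_map))
    args = tuple(instantiate(a, var_map, meta_map) for a in t[2])
    if tag == "fun" or t[1] not in meta_map:
        return (tag, t[1], args)
    params, body = meta_map[t[1]]
    rest = params[len(args):]                 # parameters that receive no argument
    if len(args) > len(params) or any(x in free_vars(a) for x in rest for a in args):
        raise ValueError("arity exceeded or capture by a remaining parameter")
    result = subst(body, dict(zip(params, args)))
    for x in reversed(rest):                  # re-abstract the remaining parameters
        result = ("lam", x, result)
    return result


def beta_root(t):
    """One β-step at the root of an application (λx.b) w."""
    assert t[0] == "app" and t[1][0] == "lam"
    return subst(t[1][2], {t[1][1]: t[2]})


x, y, z, a = ("var", "x"), ("var", "y"), ("var", "z"), ("fun", "a", ())
meta_map = {"X": (("x", "y", "z"), ("fun", "g", (x, y, z)))}   # X → λx y z.g(x, y, z)
var_map = {"y": a}                                              # y → a
s = ("fun", "f", (("meta", "X", (("app", x, y), y)),))          # f(X[(x y), y])
u = ("fun", "f", (("app", ("app", ("meta", "X", ()), ("app", x, y)), y),))  # f((X (x y)) y)
s_sigma = instantiate(s, var_map, meta_map)
u_sigma = instantiate(u, var_map, meta_map)
inner = u_sigma[2][0]                                           # ((λx y z.g(x, y, z)) (x a)) a
after_two_steps = beta_root(("app", beta_root(inner[1]), inner[2]))
```

Here s_sigma is f(λz.g((x a), a, z)), whereas u_sigma still contains two β-redexes; contracting them yields the argument of f in s_sigma.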

Splitting terms
Definition 1. Given a term u, a set P = {p_i}_{i=1}^{n} of parallel positions in u such that ∀i ∀q <_P p_i : u(q) ∉ Z is called a set of splitting positions of u.
The term obtained by splitting u along P is u_P = u[Z_1[x_1]]_{p_1} ... [Z_n[x_n]]_{p_n} (cutting out below P) and its associated substitution is u^P = {Z_i → λx_i.u|_{p_i}}_{i=1}^{n} (cutting out above P), where, ∀i ∈ [1, n], x_i is the list of all variables of u|_{p_i} bound in u on the path from p_i to its root, and Z_i is a fresh meta-variable of arity |x_i|.
The definition of substitution for meta-variables ensures that u_P u^P = u, which justifies writing u = u[u|_P]_P as a familiar shorthand.
Splitting allows us to rewrite independently above and below the set of positions P, the abstractions introduced by u^P protecting in the subterms u|_P the occurrences of variables bound above P.

Reductions
Given a binary relation −→ on terms, called rewriting, we use: ←− for its inverse; ←→ for its closure by symmetry; −→ −→ for its closure by reflexivity and transitivity; and ←→ ←→ for its closure by reflexivity, symmetry and transitivity (also called convertibility). Rewriting terms extends to substitutions as expected.
A term s is in normal form if there is no t such that s −→ t. If it is not, we define a (not necessarily unique) normal form for s as a term t in normal form such that s −→ −→ t.
A rewriting relation is terminating if all its reduction sequences t_0 −→ t_1 −→ ... −→ t_n are finite. Termination guarantees the existence of normal forms for every term.
A peak (resp., local peak) is a triple of terms such that s ←− ←− u −→ −→ t (resp., s ←− u −→ t). Two terms s, t are joinable if s −→ −→ u ←− ←− t for some u. Confluence is the property that every two convertible terms are joinable. Confluence guarantees the uniqueness of normal forms for every term.
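These abstract notions can be checked exhaustively on a finite relation. The sketch below assumes a rewriting relation given as a Python dictionary from objects to lists of one-step successors (a toy setting of ours, with hypothetical function names); confluence is tested through peaks, which is equivalent to the formulation through convertibility.

```python
from collections import deque


def reachable(step, s):
    """All t such that s rewrites to t in zero or more steps."""
    seen, todo = {s}, deque([s])
    while todo:
        u = todo.popleft()
        for v in step.get(u, ()):
            if v not in seen:
                seen.add(v)
                todo.append(v)
    return seen


def normal_forms(step, s):
    """Normal forms of s: reachable objects that have no successor."""
    return {t for t in reachable(step, s) if not step.get(t)}


def joinable(step, s, t):
    """s and t are joinable when they have a common reduct."""
    return bool(reachable(step, s) & reachable(step, t))


def confluent(step):
    """Every peak s <-* u ->* t is joinable (finite relations only)."""
    objects = set(step) | {v for vs in step.values() for v in vs}
    return all(joinable(step, s, t)
               for u in objects
               for s in reachable(step, u)
               for t in reachable(step, u))


diamond = {"a": ["b", "c"], "b": ["d"], "c": ["d"]}   # confluent and terminating
fork = {"a": ["b", "c"]}                              # two distinct normal forms for a
loop = {"a": ["b"], "b": ["a", "c"]}                  # confluent but not terminating
```

The relation loop illustrates that confluence does not require termination: a and b rewrite to each other forever, yet every peak can be closed on c.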
Arrow signs used for rewriting may be decorated, below by a name, and above by positions at which rewriting takes place, as in s −→^p_i t (rewriting s at position p with rule i), or by a property that these positions satisfy.
Two different kinds of reductions coexist in λF: functional and higher-order reductions. Both are meant to operate on closed terms. However, rewriting open terms will sometimes be needed, in which case rewriting is intended to rewrite all their closed instances at once.

Functional reductions
Functional reduction is the relation on terms generated by the rule (λx:u.v w) −→_{β_α} v{x → w}. The usually omitted α-index stresses that renaming bound variables, called α-conversion, is built-in. The argument u, which plays no effective rôle here, will often be omitted as well.
As is customary (Miller 1991), the particular case for which w is a variable is denoted by β 0 .Note that instantiating a β 0 -step may yield a full β-step.For example, s = (λx : u.(λy : . This is our main motivation for using Klop's notion of substitution for meta-variables, among whose numerous benefits is the elimination of β 0 -steps that are now hidden under the carpet.
We will also use a particular case of extensionality, for meta-variables only, called Mη. The rôle of Mη is to identify two meta-terms having a different number of subterms, as is made possible by our notion of meta-variable with maximum arity. Clearly, Mη is not a sound rule of the annotated λ-calculus, since it equates λz:a.X[v, z] with λz:b.X[v, z] for arbitrary annotations a and b. Identification of such meta-terms will be important later to join critical pairs; this will be our only use of Mη. In this context, the (sound) property needed is the following:
Lemma 2. Let s, t be terms such that s|_p = λz:a.X[v, z] and t = s[X[v]]_p, and σ a closed substitution of the form X → λxz:a.v (omitting annotations for x). Then, sσ = tσ.
Proof. We get: λz : u.X[v, z] showing that the annotation does not play a rôle on appropriate closed instances.
One particular case where σ must be of the above form, hence allowing the use of Lemma 2, is when there is a single annotation possible, which encodes the calculus without annotations. This is the case of the example described in Section 2.12 whose critical pairs are computed in Section 4.3.
Otherwise, it must be proved that σ must be of the above form, hence possibly requiring some typing argument.

Patterns
Higher-order reductions result from rules whose left-hand sides are higher-order patterns in Miller's or Nipkow's sense (Mayr and Nipkow 1998), although they need not be typed here:
Definition 3 (Pattern). A pre-redex of arity n in a term L is an unapplied meta-term Z[x] whose arguments x are n pairwise distinct variables. A pre-pattern is a β-normal term all of whose meta-variables occur in pre-redexes. A pattern is a ground pre-pattern which is neither a pre-redex nor an abstraction, that is, is not headed by a meta-variable or λ.
We assume, as does Nipkow, that patterns are β-normal, which allows us to eliminate critical pairs of users' rules with the β-rule.Note also that patterns are ground: free variables are not needed, one can use meta-variables of arity zero instead.Pre-patterns pop up naturally in pattern matching and unification, since patterns must be deconstructed.
Erasing types from one of Nipkow's patterns yields a pattern in our sense, since his pre-redexes, being of base type, cannot be applied. This restriction is important for matching and unification of patterns (Ferey and Jouannaud 2019).
Observe that pre-redexes in pre-patterns occur at parallel positions, whose set plays a key rôle:
Definition 4 (Fringe). The fringe F_L of a pre-pattern L is the set of parallel positions of its pre-redexes. We denote by FPos(L) = {p ∈ Pos(L) : p ≱_P F_L} the set of functional positions of L.
For convenience, we define )) is a pattern. Its pre-redexes are the subterms X[x, y, z] and X[x, y]. Its fringe is the set [x, y, z])) (a X)) is also a pattern, and its fringe is the set ), and f(X Y) are not patterns.
Note that the set of functional positions coincides with its first-order version, and that patterns have a nonempty set of functional positions. Since patterns are ground terms, for all pre-redexes Z[x] = L|_p at position p ∈ F_L in the pattern L, the variables x are all locally bound above p in L.

Higher-order matching and unification of patterns
Given a term u and a pattern L, the search for a substitution σ such that Lσ = u, called a match of L = u, is a matching problem. Since L is ground, the domain of σ is a set of meta-variables, and therefore, matching reduces to the textual replacement of the meta-variables in Dom(L) by their value followed by some β_0-steps: matching a term against a pattern is called higher-order pattern matching. An algorithm is given in (Ferey and Jouannaud 2019) for the syntax adopted here.
Given now two patterns (or pre-patterns) L, G, the search for a substitution σ such that Lσ = Gσ, called a solution of L = G, is a unification problem. Again, the terms obtained by textual replacement of the meta-variables in Dom(L) and Dom(G) by their value cannot be exactly equal since β_0-steps need to be performed: unification of patterns is called higher-order unification. An algorithm is given in (Ferey and Jouannaud 2019) which computes a most general higher-order unifier, that is, a substitution θ of which any solution σ is an instance: σ = θτ for some τ (up to variable renaming). Again, Lσ, Gσ, Lθτ, and Gθτ are all equal (up to variable renaming), β_0-equality steps being hidden.
By incorporating β_0-steps into the substitution calculus, we got rid of them. They will not show up anywhere, hence eliminating a technical burden of a previous definition of untyped higher-order rewriting (Assaf et al. 2018). Matching and unification of simply typed patterns is due to Miller (Miller 1991), see also (Nipkow 1991).

Higher-order reductions
Definition 5 (Rule). A (higher-order) rule is a triple i: L → R, where i is its (possibly omitted) name, the left-hand side L is a pattern, and the right-hand side R is a ground term such that MVar(R) ⊆ MVar(L). The rule is left-linear if L is linear.
So, rules are pairs of (specific) ground terms, and their left-hand sides must be headed by a function symbol or an application, but cannot be β-reducible. Both terms may have meta-variables, but do not admit free variables. This allows us to clearly separate the object language (which has no meta-variables), from the meta-language (which has meta-variables). Rules, critical pairs, and splittings belong to the meta-language.
Definition 6 (Higher-order untyped rewriting). Given a term u, a position p ∈ Pos(u)
Let us now make our splitting notations fully explicit. Whenever u −→^p_i v, we have by definition: u_p = u[X[x]]_p and u^p = {X → λx.u|_p}, with x the variables bound above p in u and X a fresh meta-variable of arity |x|.
Note the simplicity of this definition of higher-order rewriting, which is exactly the same as the definition of rewriting for first-order terms. In sharp contrast with Nipkow, we observe that we do not need matching explicitly modulo β_0, since the corresponding β_0-steps are now hidden in Klop's definition of substitution for meta-variables. Besides, we do not assume that u, or v, is β-normal (or even β_0-normal), entirely or up to position p. Two reasons prevent it: first, β-normal forms may not exist; and second, the techniques we use rely on monotonicity and stability properties, which would not be satisfied were normalization steps used in the definition.
Example 3 (Lambda calculus).The trivial encoding, as a higher-order rule in our language, of the beta-rule is (λx does not work: the left-hand side is not a pattern, since it is not β-normal.The seemingly better encoding (U V) → U[V], whose left-hand side is β-normal, does not work either, since the pre-redex U is applied, which is also forbidden.We must therefore, as is usual, encode application as a binary operator @.The beta-rule then becomes @(U, V) → U[V], using the facility that U has an arity at most 1 (and not equal to 1).We can now notice that this rule does not overlap itself except trivially, hence has no critical pairs by itself.
This example shows that the beta rule can be encoded, in order to study its properties as a higher-order rule, by extending the language with the symbol @. Using instead the encoding β : (λx : W.U[x] V) → U[V], its left-hand side is only missing the property that patterns are beta-normal. But few properties of higher-order rewriting require that left-hand sides of higher-order rules are beta-normal. To avoid unnecessary repetitions, we use this remark in the sequel by explicitly mentioning which are properties of R ∪ β and which are properties of R alone.

Basic properties of higher-order rewriting
All coming properties are true of rules in R ∪ β.
Lemma 7 (Splitting). Let s −→^q_{L→R} t and K ⊆ Pos(s) such that q ∈ K. Then, s^K −→_{L→R} σ and t = s_K σ.
Lemma 8 (Monotonicity). Let s −→^p t and u a term such that q ∈ Pos(u). Then, u[s]_q −→^{q•p} u[t]_q.
By u[s]_q, we of course mean u[X[x]]_q{X → λx.s}, omitting the annotations of the bound variables that are here useless since they disappear by instantiation, where x is the set of variables in s which are bound in u.

Rewrite theories and their confluence
Rewrite theories are used in various type systems, in particular in DEDUKTI, to define the conversion rule of the calculus, which is, as is customary, untyped.
Definition 11. A λF-rewrite theory is a pair (F, R) made of a user's signature F and a set R of higher-order rewrite rules on that signature, defining the rewrite relation −→_{λF}.
The problem we consider is whether a left-linear λF-rewrite theory is confluent on closed terms and how to show its confluence by inspecting critical pairs of some sort.
We give two successive answers to this question. The first one is a reminder, the second one is new and will be developed in the subsequent sections. A left-linear rewrite theory defines a confluent rewrite relation −→_{λF} in the following cases:
(1) R is terminating, and its higher-order critical pairs are joinable with R-steps (Ferey and Jouannaud 2019);
(2) nested higher-order critical pairs of R are joinable by decreasing diagrams using −→_{λF}-steps.
In both cases, =_{α∪Mη} equational steps may be needed at the bottom of the joinability diagrams.
Nested critical pairs are obtained by overlapping left-hand sides of rules horizontally (as in parallel critical pairs), as well as vertically, see Definition 31 and Lemma 32.Our particular use here of van Oostrom's decreasing diagrams is introduced in Section 4.1.
The reader must realize that, although dependently typed rules may be terminating -a standard requirement in type theory -their untyped version may be non-terminating.Further, the first confluence criterion forbids β-steps for joining critical pairs, a real obstacle in practice.Finally, we will see that we can take advantage of terminating subsets of R, hence subsuming (Ferey and Jouannaud 2019).
We may wonder why there is no answer based on parallel critical pairs. There is indeed an uninteresting one (Férey 2021): first, the right-hand side of a rewrite rule may not contain a subterm of the form X[t] such that some meta-variable Y occurs in t, a test that the encoding of the β rule @(X, Y) → X[Y] fails; second, the critical pairs must satisfy the so-called "Variable condition" which imposes a strong constraint on how a critical pair can be joined; third, it is often the case that nested higher-order critical pairs reduce to parallel critical pairs (or even critical pairs), in which case the "Variable condition" need not be checked, since our result applies then as well.

The theory of global states
An important example of higher-order system that will illustrate our results is Plotkin and Power's theory of global states for a single location (Plotkin and Power 2003). It is described by two types (given for the user's understanding; they are of no use here), Val for values and A for states, a unary operation lk for looking up a state, a binary operation ud for updating a state, and five higher-order rules which satisfy our format. The term lk(λv.t) looks up the state, binds its value to v, and continues with t, while ud(v, t) updates the state to v, and continues with t. In the rules below, we use U, V, W (resp. X, resp. Y) for meta-variables of arity 0 (resp. 1, resp. 2).
Our presentation is a simplification of Hamana's (Hamana 2017), making use of the property that meta-variables may have a bounded arity.This rewrite theory is proved confluent when terms are typed with weak polymorphism by Hamana, and in a sorted framework in (Ferey and Jouannaud 2019), for which arguments of type Val are instantiated by constants so as to guarantee termination.This example illustrates the gain in using nested critical pairs when the rewrite theory is non-terminating: our result applies for any instantiation of arguments of type Val.

Encoding and decoding
Another example of higher-order system, which illustrates the kind of applications targeted in DEDUKTI, is obtained by shallow encoding (and decoding) of terms of some given functional language; we choose here a lambda calculus reduced to its application operator @. Since the encoding is shallow, the binder of the encoded calculus is just that of the encoding calculus, here λF. It is described by two types (given for the user's understanding; they are of no use here), Term for λ-terms and Code for encoded λ-terms. Codes are built using two constructors, @ and . Two unary operations ⇓ and ⇑ allow us to encode a term and decode a code, respectively. The following six higher-order rules satisfy our format:
Again, we will show that this rewrite theory is confluent by inspecting its nested critical pairs, some of which will be joinable by using beta rewrite steps, therefore illustrating the other advantage of the present method.

Orthogonal Rewriting
Since beta-reductions do not terminate on untyped terms, van Oostrom's technique relying on the existence of a decreasing diagram for each local peak will be our main tool for analyzing confluence of λF. Its use requires labeling all rewrite steps (van Oostrom 1994); we shall see later how.
Using van Oostrom's technique is made difficult by the presence of rewrite rules whose right-hand sides are non-linear, because non-linearities make it impossible to have decreasing diagrams for the so-called ancestor peaks. The solution relies on the use of a new relation whose confluence implies that of λF, so that redexes duplicated by non-linear right-hand sides can be reduced in a single rewrite step. Then, because β-reductions can stack up redexes that were previously at parallel positions, we need to define a notion of simultaneous reduction of several non-overlapping redexes in a term. For instance, given the rewrite rule f(g(f(x))) → x, we can rewrite simultaneously the two redexes f(g(f(c))) and f(g(f(d))) in the term m(f(g(f(c))), f(g(f(d)))) and get m(c, d). We can also rewrite simultaneously, in the term f(g(f(f(g(f(c)))))), the redexes at the root position and at position 1 • 1 • 1, and get c. But, because the rule has a critical pair, the term f(g(f(g(f(c))))) contains two overlapping redexes, at the root position and at position 1 • 1, which cannot be both reduced at the same time.
When two redexes do not overlap, their positions are called orthogonal: the examples above show us that two redexes are orthogonal in a term u iff u can be split at a position p, yielding the term u_p and the substitution u^p, with one redex in u_p and the other one in u^p. Splitting u along a set of parallel positions P ensures that the redexes in u_P and those in u^P do not interact. Since the rules are left-linear, these redexes can then be reduced simultaneously.
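The three terms discussed above for the rule f(g(f(x))) → x can be examined with a few lines of Python. This is a first-order sketch of ours, specialised to that single rule with f and g unary; the helper names are hypothetical and the code is not the definition of orthogonal rewriting given later. It locates the redexes, tests whether two of them overlap, and contracts a set of non-overlapping redexes together, innermost first.

```python
FPOS = {(), (1,), (1, 1)}   # positions of f, g, f in the left-hand side f(g(f(x)))
HOLE = (1, 1, 1)            # position of the variable x in the left-hand side


def subterm(t, p):
    for i in p:
        t = t[i]
    return t


def replace(t, p, n):
    if not p:
        return n
    i = p[0]
    return t[:i] + (replace(t[i], p[1:], n),) + t[i + 1:]


def is_redex(t):
    """Does t match f(g(f(_)))?  Assumes f and g are always used as unary symbols."""
    return t[0] == "f" and t[1][0] == "g" and t[1][1][0] == "f"


def redex_positions(t, at=()):
    found = [at] if is_redex(t) else []
    for i, child in enumerate(t[1:], start=1):
        found += redex_positions(child, at + (i,))
    return found


def overlap(p, q):
    """Two distinct redexes overlap when one lies at a function position of the other."""
    if p == q:
        return False
    lo, hi = (p, q) if len(p) <= len(q) else (q, p)
    return hi[:len(lo)] == lo and hi[len(lo):] in FPOS


def rewrite_together(t, ps):
    """Contract all redexes at positions ps, which must be pairwise non-overlapping."""
    ps = list(ps)
    if any(overlap(p, q) for p in ps for q in ps):
        raise ValueError("overlapping redexes cannot be contracted in one step")
    for p in sorted(ps, key=len, reverse=True):      # innermost redexes first
        t = replace(t, p, subterm(t, p + HOLE))
    return t


def f(t):
    return ("f", t)


def g(t):
    return ("g", t)


c, d = ("c",), ("d",)
t1 = ("m", f(g(f(c))), f(g(f(d))))   # redexes at parallel positions 1 and 2
t2 = f(g(f(f(g(f(c))))))             # stacked redexes at the root and at 1.1.1
t3 = f(g(f(g(f(c)))))                # overlapping redexes at the root and at 1.1
```

On t1 and t2 the two redexes can be contracted together, giving m(c, d) and c respectively, while rewrite_together rejects the two redexes of t3.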
The idea of orthogonal rewriting appears in the literature under at least two different other names, parallel reductions and multi-step rewriting. Parallel reductions were introduced by Tait and Martin-Löf to show confluence of the pure λ-calculus. Van Oostrom's multi-step rewriting generalizes this construction for both concrete and abstract rewriting relations. These generalizations are extensively studied in (Terese 2003), where they are used for analyzing orthogonal rewrite relations, as well as, more generally, orthogonal rewrite steps of non-orthogonal rewrite relations, whether operating on first-order terms, higher-order terms or term-graphs. Note that the notion of orthogonality of steps is trivial in left-linear critical pair free rewrite systems, like the λ-calculus: the absence of critical pairs implies that any two steps are orthogonal. We refer to (Terese 2003) for a comprehensive survey of the literature on this subject.
Our coming definition of orthogonal rewriting ensures orthogonality of steps by splitting terms, and records its construction in a label that generalizes the notion of position of a single rewrite step.This makes sense because we define orthogonal rewriting of a given rewrite rule, not of a given rewrite system as with multi-step rewriting -we would then need to record pairs made of a rule name and the positions at which that rule applies.A major advantage of our definition is that it eases the critical pair analysis.A potential disadvantage is that some rewrite theories might be proved confluent by using critical multi-pairs (whatever they are) but not with nested critical pairs, the kind of critical pairs associated with orthogonal rewriting that we are going to define soon.

Product of positions
We introduce here an operation on positions that belongs to the folklore of term rewriting although it is never used explicitly to our knowledge.It will play a key rôle for defining orthogonal rewriting.
Definition 12. Let u be a term, K a set of splitting positions in u, P a set of positions in u_K such that ∀p ∈ P ∀k ∈ K : p ≥_P k, and
To ensure Lemma 13, Q_k, for k ∈ K, is a set of positions in u|_k, not in u^k as could have been expected: the abstractions in u^k that disappear by instantiation must be eliminated for the product to return a set of positions in u. But it will be convenient in the following to consider Q_k as a set of positions in u^k as well. The reader will pardon this abuse that aims at simplifying the notations.
In the sequel, we are often given a term s and a closed substitution σ. Then, the set of positions of meta-variables in s is a splitting set for the term u = sσ. Given P and Q satisfying the conditions of Definition 12, we then often write . The symbols at positions of P, Q, and P ⊗ K Q appear in blue in s, σ, and sσ, respectively.
The product can be used to define sets of positions implicitly, using the following property: Lemma 14. Given a term u, let K be a set of splitting positions in u and O ⊆ Pos(u). Then, there exist unique sets of positions P, Q such that P={p∈P : p ≥ P K} and O=P ⊗ K Q.

Orthogonal positions
Rules in this section belong to R ∪ β.
Rewriting takes place at a given position p in a term. Parallel rewriting takes place at a set P of parallel positions. Orthogonal rewriting will take place at a set O of orthogonal positions in a term u, some of which may lie on the same path from the root of u. Not every set of positions is orthogonal, of course, and indeed orthogonality of a set of positions O must depend upon the rule which is used to rewrite u: there must be enough room so that the left-hand sides at two different positions do not overlap.
Definition 15. Given a set of positions P and a rule i : L → R, we say that the position q satisfies the room condition for rule i if ∀p ∈ P: q = p ∨ q ∉ p • F Pos(L), a property denoted by RC(q, P, i). We will also use RC(Q, P, i) for ∀q ∈ Q : RC(q, P, i).
Note that the case q ∈ P is possible and will indeed often occur in the sequel in case P = Q. Note also that the room condition RC(q, P, i) is equivalent to the more verbose condition ∀p ∈ P : q#p or p ≥ P q or q ≥ P p • F L . We shall use whichever one is more convenient.
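The verbose form of the room condition can be checked mechanically once positions are encoded concretely. The following is only a minimal sketch under assumptions that are not part of the paper: positions are Python tuples of 1-based child indices, `()` stands for the root position, and `fpos_L` is the (hypothetical) set of functional positions of the left-hand side L, supplied by hand. The names `is_prefix`, `room_condition` and `room_condition_set` are illustrative.

```python
# Assumption: a position is a tuple of 1-based child indices; () is the root.

def is_prefix(p, q):
    """True if q lies at or below p."""
    return q[:len(p)] == p

def room_condition(q, P, fpos_L):
    """RC(q, P, i): q never falls strictly inside a left-hand side rooted in P.

    For every p in P, either q = p, or q is parallel to p, or q is above p,
    or q is at or below the fringe of L placed at p.
    """
    for p in P:
        if q == p:
            continue
        # q is below p and its relative position is a functional position of L
        if is_prefix(p, q) and q[len(p):] in fpos_L:
            return False
    return True

def room_condition_set(Q, P, fpos_L):
    """RC(Q, P, i): every q in Q satisfies RC(q, P, i)."""
    return all(room_condition(q, P, fpos_L) for q in Q)

# Functional positions of L = ud(U, ud(V, W)), the left-hand side of rule uu:
# the root and the inner ud at position 2.
FPOS_UU = {(), (2,)}
```

With this encoding, the set {(), (2, 2)} satisfies the room condition with respect to itself for rule uu, whereas {(), (2,)} does not, since position 2 is a functional position of the left-hand side placed at the root.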

Definition 16. A set O ⊆ Pos(s) is a set of orthogonal positions for term s wrt rule i : L → R iff
(1) redex condition: ∀p ∈ O, there exists a substitution σ such that s| p = Lσ ; (2) room condition: RC(O, O, i).
The following simple property will often be used unannounced: Lemma 17. Suppose O is a set of orthogonal positions for s wrt rule i : L → R. Then, either O is a set of parallel positions, or there exists a splitting set K of s such that O = P ⊗ K Q and P, Q are sets of orthogonal positions wrt i for s K and s K , respectively.
Proof. Any set of parallel positions is a set of orthogonal positions, since it trivially satisfies conditions (1) and (2) of Definition 16. Assume now that O is not a set of parallel positions, hence is nonempty. Let K = {q} be a singleton set containing a maximal position q of O. By Lemma 14, O = P ⊗ K Q, where P ⊆ Pos(s K ) and Q = { } is a set of orthogonal positions for s K . Since q satisfies RC(q, O, i) by assumption, property (1) of a set of orthogonal positions holds for P and s K by Lemma 10, and property (2) holds because P ⊂ O and O is orthogonal by assumption.
Note that we use here our convention that Q denotes a set of positions in both s| K and s K , even if it is not formally true.In the sequel, we refrain from mentioning this abuse when using it.
Lemma 17 will often be used with a subset of parallel positions of O for splitting set K, which satisfies the room condition automatically.

Definition of orthogonal rewriting
Rules in this section belong to R ∪ β.
(extending orthogonal rewriting from terms to substitutions in the natural way.) Lemma 17 ensures that the recursive calls make sense, while Lemma 13 ensures that P ⊗ K Q, also written P ⊗ s K Q, is a subset of Pos(s) as expected (using here our convention for Q).
The meta-variables introduced by splitting s along K are eliminated from vτ by instantiation, hence V ar(t) ⊆ V ar(s) and MV ar(t) ⊆ MV ar(s). In particular, if s is closed, then t is closed too. This is a key condition to ensure that orthogonal rewriting does not depend upon the choice of a particular splitting set K, as we shall soon verify.
The abstractions introduced in s K by splitting play a crucial rôle here by allowing us to restore the link between a variable x abstracted in s at a position above K, hence in s K , and its instances in s| K , ensuring that s = s K s K .
Notice that minimal positions in O originate either from P or from Q, instead of only from P. This choice aims at generality and will ease the study of critical pairs in Section 4.5.
Orthogonal rewriting reduces to the identity if O is an empty set of positions, to rewriting if O is a singleton set of positions, and to parallel rewriting if O is a set of parallel positions. In the general case, our definition coincides with Tait's in case i = β, and both are indeed expressed in a very similar way. In our definition, the splitting set comes first. Instead, Tait assumes that s = uθ, and rewrites independently in u and θ as we do, the splitting set remaining implicit. Making the splitting set explicit becomes an advantage when it comes to generalizing the Critical Pair Lemma.

Monotonicity and stability properties
These properties are of course inherited from higher-order reductions. They hold for R ∪ β. Proof. Note that P| p is a set of orthogonal positions when p ∈ P, hence the statement makes sense. The proof is by induction on the definition of orthogonal higher-order rewriting. If ∈ P, then p = and the result holds. If u P =⇒ i v, then the result follows from the definition of parallel rewriting.
Proof. If P is a set of parallel positions, the property follows directly from Lemma 9. Otherwise, This proof shows that redexes can be linearized using a top-down strategy: a redex at some position p is always reduced before another redex at a position p • q. We could of course base the proof on the other reduction u = sσ which would yield a bottom-up strategy. Using any other strategy would be possible but require commutation properties that we do not intend to develop here. The next section will provide another way to construct an arbitrary linearization strategy.
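The bottom-up linearization mentioned above can be illustrated on first-order terms. The sketch below is a deliberate simplification and not the paper's construction: there are no binders, no β-steps and no splitting; terms are tuples `(symbol, child1, ...)`, meta-variables of a rule are plain strings, and positions are tuples of 1-based child indices. All function names are hypothetical. Innermost positions are rewritten first, so that the positions still to be rewritten remain valid.

```python
def match(pattern, term, subst):
    """First-order matching of a left-linear pattern (meta-variables are strings)."""
    if isinstance(pattern, str):
        subst[pattern] = term
        return True
    if pattern[0] != term[0] or len(pattern) != len(term):
        return False
    return all(match(p, t, subst) for p, t in zip(pattern[1:], term[1:]))

def instantiate(term, subst):
    """Replace each meta-variable of term by its value in subst."""
    if isinstance(term, str):
        return subst[term]
    return (term[0],) + tuple(instantiate(a, subst) for a in term[1:])

def subterm(term, pos):
    for i in pos:
        term = term[i]          # index 0 holds the symbol, children start at 1
    return term

def replace(term, pos, new):
    if not pos:
        return new
    i = pos[0]
    return term[:i] + (replace(term[i], pos[1:], new),) + term[i + 1:]

def rewrite_at(term, pos, rule):
    lhs, rhs = rule
    subst = {}
    if not match(lhs, subterm(term, pos), subst):
        raise ValueError(f"no redex at position {pos}")
    return replace(term, pos, instantiate(rhs, subst))

def rewrite_bottom_up(term, positions, rule):
    """Linearize a step at a set of positions, deepest positions first."""
    for pos in sorted(positions, key=len, reverse=True):
        term = rewrite_at(term, pos, rule)
    return term

# Rule uu: ud(U, ud(V, W)) -> ud(V, W)
UU = (("ud", "U", ("ud", "V", "W")), ("ud", "V", "W"))

def ud(x, y):
    return ("ud", x, y)

a, b, c, d, e = (("a",), ("b",), ("c",), ("d",), ("e",))
t = ud(a, ud(b, ud(c, ud(d, e))))
result = rewrite_bottom_up(t, {(), (2, 2)}, UU)
```

Here the redex at position 2•2 lies below the fringe of the redex at the root, so rewriting it first leaves the root redex intact. A top-down linearization would instead have to track how the lower positions move when the upper redex is contracted.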

Splitting
All properties in this important section hold for R ∪ β.
Definition 19 is very flexible in the way the input term can be split, minimal rewriting positions taking place above, and/or below, and/or in parallel with the set of splitting positions. This design choice has an important consequence: our product construction is both horizontal and vertical in the way a set of orthogonal positions can be extended with another by making their product.
We show here that any orthogonal rewrite step s P ⊗ =⇒ i t can actually be defined via a canonical splitting, for which the minimal rewriting positions are all (strictly) above the splitting set, all of whose elements are rewrite positions themselves:  Assuming O = P min is not enough for uniqueness of the canonical splitting set.On the other hand, assuming P = (P \ P min ) min would imply O = P min , hence give an equivalent definition.
Lemma 27 (Canonical splitting).Let P be a set of orthogonal positions such that u P ⊗ =⇒ i u , and K its canonical splitting such that P = P min ⊗ K P .Then, u Proof.By induction on |u|.The result holds if P is a set of parallel positions, hence P = P min , taking K = ∅ and P = ∅.Q may contain positions that are minimal in P.
) min , and Q 3 #O min .Note that P min =O min ∪(Q 3 ) min .Since metavariables must occur linearly in s, we split σ as its restrictions σ 1 , σ 2 , and σ 3 to the meta-variables of s which occur above (Q 1 ) min , (Q 2 ) min , and (Q 3 ) min , respectively.Using now successively Lemma 21 for σ 2 and the induction hypothesis for σ 3 , we get σ 2 An important direct consequence of canonical splitting is that the outcome of an orthogonal rewrite step is entirely determined by its input term and set of orthogonal positions: The product notation has shown itself to be very convenient: we will use it systematically, in particular when P is empty, writing then sσ will serve as a type-checking device to control complex rewriting calculations.

Critical peaks
Since patterns are β-normal and left-linear, a β-redex cannot overlap an R-redex at a position above its fringe. This is no surprise: it is the only purpose of the assumption that patterns are β-normal. Further, β-redexes cannot overlap themselves either, except trivially.
We are therefore interested in overlaps between the rules of R only, throughout the four coming sub-sections which deal successively with the definition of critical peaks, their calculation, their main property, and an example.
Our formulation of the definition of orthogonal rewriting has one main purpose: ease the definition of critical pairs and the proof of the associated critical peak property.Generating the minimal nested critical peaks that characterize the confluence of orthogonal rewriting requires computing the overlaps of two orthogonal rewriting steps issuing from a term.Such peaks are defined by two different rules, each left-hand side overlapping alternatively on the other at a set of parallel positions, but not between themselves so that there are two different orthogonal steps issuing from the same term.These overlaps form both horizontal chains when one left-hand side overlaps the other at several parallel positions, and vertical chains when there is an alternation of overlaps between the two left-hand sides.
As usual, overlapping a left-hand side of rule G at a subterm of another L| o frees the variables of L| o that are bound in L above o. Then, the meta-variables of G need to depend upon those variables, which may require increasing their arity. This is done with an operation called lifting (Ferey and Jouannaud 2019), introduced first in this context by Nipkow in a slightly different form (Nipkow 1991).
Definition 29 (Lifting).Given a term L and a list x of pairwise different variables such that V ar(L) ∩ x = ∅, we call lifting of L with respect to x, denoted by L↑ x , the term Lσ x L , where Lifting increases by x the list of arguments of all meta-variables occurring in L, hence their arity by |x|, requiring changing their names to fresh ones.Note that lifting preserves pre-patterns.
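The effect of lifting on meta-variables, as described in the sentence closing Definition 29, can be sketched on a small term datatype. This is only an illustration under assumptions of ours: terms are tagged tuples (`"fun"`, `"meta"`, `"var"`, `"lam"`), the new arguments are placed before the existing ones (the argument order is a guess), and fresh names are produced by a caller-supplied function `fresh`. None of these names come from the paper.

```python
def lift(term, xs, fresh):
    """Add the variables xs as extra arguments to every meta-variable of term.

    term  : ("fun", name, args) | ("meta", name, args) | ("var", name)
            | ("lam", binder, body)
    xs    : list of variable names, assumed disjoint from the variables of term
    fresh : function mapping a meta-variable name to a fresh name (assumption)
    """
    kind = term[0]
    if kind == "var":
        return term
    if kind == "lam":
        return ("lam", term[1], lift(term[2], xs, fresh))
    if kind == "fun":
        return ("fun", term[1], tuple(lift(a, xs, fresh) for a in term[2]))
    if kind == "meta":
        extra = tuple(("var", x) for x in xs)
        old = tuple(lift(a, xs, fresh) for a in term[2])
        return ("meta", fresh(term[1]), extra + old)
    raise ValueError(f"unknown term kind: {kind}")

# Left-hand side of rule lu: lk(λv.ud(v, X[v]))
LU_LHS = ("fun", "lk", (
    ("lam", "v", ("fun", "ud", (
        ("var", "v"),
        ("meta", "X", (("var", "v"),)),
    ))),
))

lifted = lift(LU_LHS, ["x"], lambda name: name + "'")
```

In this sketch, lifting lk(λv.ud(v, X[v])) with respect to x yields lk(λv.ud(v, X'[x, v])): the arity of the single meta-variable grows by one and its name is replaced by a fresh one.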
Example 6. Lifting the left-hand side L = ud(V, lk(X)) of rule (ul) with respect to the variable x gives the term An important property of lifting is the following: x is the vector of variables bound above p in u.Proof.By definition of rewriting, u| p = Lσ .Let L↑ x = Lτ x L .We define the substitution θ such that Definition 31 (Nested overlaps).Given two rules k:L → R and l:G → D in R, a substitution σ and a pure term u such that u = α Lσ , two sets P={p 0 = } {p i } i∈I and Q={q j } j∈J of positions in F Pos(u), two sets {V i } i∈I and {W j } j∈J such that V i and W j are the lists of (pairwise different) variables bound in u from positions p i and q j , respectively, to the root, and two sets {L i } i∈I and {G j } j∈J of renamings of L, G, respectively, that share no meta-variables between themselves nor with L = L 0 , then u, k, P, l, Q, σ is a nested overlap of G onto L at positions P, Q iff: (i) σ satisfies the equality i∈I u| The nested overlap is critical if σ is a most general higher-order unifier.
The particular case of critical nested overlap for which P = { } and Q is empty is said to be trivial.

The set of non-trivial critical nested overlaps of rule l onto rule k is denoted by C no(k, l).
The particular case of non-trivial critical nested overlap for which P = { } and Q is a singleton set is called a critical overlap.Its set is denoted by C o(k, l).
The particular case of non-trivial critical nested overlap for which P = { } and Q is a nonempty set of parallel positions is called a critical parallel overlap.Its set is denoted by C po(k, l).
The particular case of nested overlap for which P\ is a nonempty set of pairwise parallel positions and Q is a singleton is called a critical 1-nested overlap.Its set is denoted by C 1no(k, l).
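The particular cases listed above are distinguished only by the shape of the sets P and Q. The following sketch classifies a pair (P, Q) accordingly; it does not test criticality or conditions (i) to (iii), and it assumes, beyond the source, that positions are tuples of 1-based child indices with `()` for the root. The function names and the returned strings are illustrative.

```python
ROOT = ()

def parallel(p, q):
    """Two positions are parallel when neither is a prefix of the other."""
    n = min(len(p), len(q))
    return p[:n] != q[:n]

def pairwise_parallel(positions):
    ps = list(positions)
    return all(parallel(ps[a], ps[b])
               for a in range(len(ps)) for b in range(a + 1, len(ps)))

def classify_overlap(P, Q):
    """Return the most specific shape among the particular cases of the text."""
    P, Q = set(P), set(Q)
    if ROOT not in P:
        raise ValueError("P must contain the root position")
    inner = P - {ROOT}
    if not inner:
        if not Q:
            return "trivial"
        if len(Q) == 1:
            return "critical overlap"
        if pairwise_parallel(Q):
            return "critical parallel overlap"
        return "nested overlap"
    if len(Q) == 1 and pairwise_parallel(inner):
        return "1-nested overlap"
    return "nested overlap"
```

For instance, P = {root, 2•1•1} together with Q = {2} has the shape of a 1-nested overlap, while P = {root} with Q = {1, 2} has the shape of a parallel overlap.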
Condition (i) does not make visible the fact that matching is not syntactic, but modulo β 0 instead, since β 0 -steps are buried inside the definition of substitution for pre-redexes. It says that σ , hence Lσ , is entirely defined by condition (i), and that the subterms of u at other positions in P are k-redexes, while those at positions in Q are l-redexes. It follows that σ could be omitted from the tuple u, k, P, l, Q, σ . Condition (ii) says that l-redexes overlap an above k-redex but no other l-redex. Condition (iii) for k-redexes but the topmost one is similar. It follows that P, Q are both sets of orthogonal positions for rules k and l, respectively, a property that is of course expected.
When P and Q are singleton sets, hence P = { }, u is a usual higher-order overlap between the two rules. In general, the positions p i and q j keep increasing because of conditions (ii, iii), and the sets V i and W j of bound variables keep increasing as well, requiring new fresh meta-variables for each copy of L i and G j . This may occur in practice, as shown in Example 7.
Trivial critical peaks are not local peaks, since the rewrite takes place on a single side. They are used to establish the base case of the induction showing that critical peaks can be computed.
One may wonder why we call these critical peaks nested rather than orthogonal. First, orthogonality refers explicitly to the absence of critical pairs, so an orthogonal critical pair would be somewhat self-contradictory. Another reason is that there is a single rule left-hand side sitting at the top of a seed. Therefore, all redexes occurring in a seed are nested inside that left-hand side's instance, whether they extend the seed construction vertically or horizontally.
Non-trivial nested critical overlaps give rise to critical local peaks: The triple (v, u, w) is called a nested critical peak of rule l onto rule k at positions P, Q while the pair (v, w) is the corresponding nested critical pair.
Proof. By (i) u| p i = L i ↑ V i σ , where L i ↑ V i is an instance of L by the definition of lifting. Since substitutions compose, u| p i is a k-redex. Similarly, u| q j is an l-redex. Therefore, P and Q are sets of orthogonal positions for rule (k) by (iii) and for rule (l) by (ii), respectively. The result follows then from the definition of orthogonal rewriting.
The proof shows that lifting substitutions must be applied to the right-hand sides of rules.Note also that this whole section appears as a genuine generalization of the usual critical pair theory or parallel critical pair theory, apart from the calculation of critical pairs which comes next.

Calculation of nested overlaps
The previous definition of a nested overlap does not allow us to compute σ , hence the critical nested overlaps, since the positions p k and q l are positions in Lσ , σ being yet unknown. An algorithm computing these overlaps must therefore proceed by successive unifications, possibly alternating between the two left-hand sides. Computing these overlaps requires then some bookkeeping, both in terms of substitutions and overlapping positions, in order to avoid self-overlaps of L and G. This is achieved by the next definition:
Definition 33 (Seeds). Given two rules k:L → R, l:G → D from R, the set pS k l of (k,l)-pre-seeds is the smallest set of tuples (s, σ , P, Q), where P and Q are lists of positions in F Pos(sσ ) of L-redexes and G-redexes, respectively, such that
(i) pS k l contains the trivial pre-seed L, ∅, { }, { } ;
(ii) pS k l is closed under nested overlapping: given s, σ , P, Q ∈ pS k l , two lists of parallel positions {p i ∈ F Pos(sσ ) : p i ≥ P P • F L } i∈I and {q j ∈ F Pos(sσ ) : are not both empty and whose elements are pairwise incomparable, renamings {L i } i∈I of L and {G j } j∈J of G such that ∀i, j, V ar(L i ), V ar(G j ), and V ar(sσ ) are pairwise disjoint sets, and τ a most general unifier of the equation i∈I (sσ )| p i = L i ↑ V i ∧ j∈J (sσ )| q j = G j ↑ W j , where V i and W j denote the variables bound in sσ above p i and q j , respectively, the non-trivial pre-seed sσ , τ , P ∪ {p i } i∈I , Q ∪ {q j } j∈J belongs to pS k l .
We call seed a triple uτ , P, Q where u, τ , P, Q is a pre-seed. The term uτ is the nested overlap, P contains the positions in uτ of the k-redexes and Q those of the l-redexes. A seed is trivial if Q=∅. The set of seeds is denoted by S k l .
In the recursive definition of pre-seeds, the overlapping substitution σ obtained tells us where to overlap next, while the maximal positions in P and Q of these overlaps tell us where not to overlap L and G, respectively. In particular, the initial overlap is impossible with L, unless k=l, but is possible with G. Subsequent overlaps may involve both L and G.
Since L, G are left-linear, higher-order unification of a lifted copy of G with some subterm of L (or vice-versa) does not instantiate these terms beyond their boundaries. It follows that each redex instance of L (resp., G) must overlap some left-hand side G (resp., L) obtained at the previous run. This remark is a property of the definition when the rules are left-linear; building it into the definition is unnecessary.
Alternating overlaps with L and with G would eliminate some redundancies at the price of storing the run parity in the tuple. In practice, however, that would of course be the natural strategy for computing them.
Example 7 (Rules lu and ul of the theory of global states).These two rules, like their well-chosen names, overlap themselves ad libitum because the head function symbol of each left-hand side is heading a strict subterm of the other which will be part of the substitution when unifying the rules.
The computation is represented in Figure 2. The color red is used for the left-hand side of (ul), and therefore of its subterms, while blue is used for the left-hand side of (lu). We consider the case where ul stands at the top. We adopt a renaming schema based on the value of a counter used to index the meta-variables and bound variables of each rule in turn. The starting value of the counter, used to index the meta-variables of the left-hand side of rule standing at the top, is zero. The initial rule (ul) is therefore ud(V 0 , lk(Y 0 )) → ud(V 0 , Y 0 [V 0 ]). Each time a new equation is used, the counter is increased by one. So, the first use of rule (lu) is lk(λv 1 .ud(v 1 , X 1 [v 1 ])) → lk(X 1 ). Overlapping positions appear then on the tree in violet, with a bullet in the exponent of the function symbol. The first overlap takes place at position 2. Note that both rules are represented in the figure; that is why the first occurrence of lk has one blue successor with a blue link, and one red successor with a red link. The equation generated is therefore lk(Y 0 ) = lk(λv 1 .ud(v 1 , X 1 [v 1 ])). It is displayed on the figure to the right of the first overlapping position. The substitution obtained from this equation is figured as an equality between a pre-redex and its value under the substitution. For Y 0 , we get λv 1 .ud(v 1 , X 1 [v 1 ]) = Y 0 . This value will of course change as new overlaps take place successively, until the whole tree of Figure 2 is obtained. We have again fully represented the two rules and the equation generated from the second overlap at position 2•1•1.
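The counter-based renaming schema used in this example can be sketched as follows. The encoding is an assumption of ours, not the paper's: terms are tagged tuples, the index is appended to the names of meta-variables and bound variables, and the rule (ul) is written with the meta-variable Y as in the example. The names `index_term`, `index_rule` and `show` are hypothetical.

```python
def index_term(term, n):
    """Append the counter value n to every meta-variable and variable name."""
    kind = term[0]
    if kind == "var":
        return ("var", f"{term[1]}{n}")
    if kind == "lam":
        return ("lam", f"{term[1]}{n}", index_term(term[2], n))
    if kind in ("fun", "meta"):
        name = term[1] if kind == "fun" else f"{term[1]}{n}"
        return (kind, name, tuple(index_term(a, n) for a in term[2]))
    raise ValueError(f"unknown term kind: {kind}")

def index_rule(rule, n):
    lhs, rhs = rule
    return index_term(lhs, n), index_term(rhs, n)

def show(term):
    """Print a term in the notation of the text: f(...), X[...], λv.body."""
    kind = term[0]
    if kind == "var":
        return term[1]
    if kind == "lam":
        return f"λ{term[1]}.{show(term[2])}"
    args = ", ".join(show(a) for a in term[2])
    if kind == "fun":
        return f"{term[1]}({args})" if term[2] else term[1]
    return f"{term[1]}[{args}]" if term[2] else term[1]

def fun(name, *args): return ("fun", name, args)
def meta(name, *args): return ("meta", name, args)
def var(name): return ("var", name)
def lam(v, body): return ("lam", v, body)

# (ul), written with Y as in the example, and (lu)
UL = (fun("ud", meta("V"), fun("lk", meta("Y"))),
      fun("ud", meta("V"), meta("Y", meta("V"))))
LU = (fun("lk", lam("v", fun("ud", var("v"), meta("X", var("v"))))),
      fun("lk", meta("X")))

top = index_rule(UL, 0)      # counter starts at zero for the rule at the top
first = index_rule(LU, 1)    # counter increased by one for the next equation
```

Printing both sides of `top` and `first` with `show` gives the indexed copies of the two rules quoted above, up to spacing. Since the rules are closed, indexing all variable names affects bound variables only.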
Let us now move to the seeds calculation. The initial pre-seed in pS ul lu is ud(V 0 , lk(Y 0 )), ∅, { }, {} . The only possible overlap with ul requires Q 1 ={2}, a position above the fringe of ud(V 0 , lk(Y 0 )), and requires no lifting since there is no abstraction above position 2 in ud(V 0 , lk(Y 0 )). The equation obtained is displayed on the figure. Unification yields ) is added to pS ul lu , and the computation proceeds with ud(V 0 , lk(λv )), whose solutions appear on the figure.
Following the picture, the reader can now easily continue the computation.
We finally get four infinite families of seeds, depending upon which left-hand side of rule stands at the top, and which one stands at the bottom. ud(V 0 , lk(λv 1 .ud(v 1 , lk(λv 3 .ud . . .lk(λv 2n+1 .ud(v 2n+1 , X 2n+1 [v]))...)))), The seed represented in the figure belongs to the second set. In all these nested overlaps, the set of sets of overlapping positions of (ul), or of (lu), is actually a set of singleton sets of overlapping positions. We have therefore identified both sets without ambiguity with sets of positions. Note also that our counter for indexing the variables and meta-variables is initialized to 0 for the first two seeds, and to 1 for the last two. This explains that we could use the same vector v of bound variables for all four seeds and that the last two nested overlappings are subterms of the first two.
We now show that the sets of seeds and of critical nested overlappings are in one-to-one correspondence (up to renaming of bound variables). This statement includes trivial critical nested overlappings and trivial seeds in order to facilitate its proof. Theorem 34. Let R be a higher-order rewriting system such that k, l ∈ R. Then, s, k, P, l, Q, Of course, membership should be understood modulo the renaming of bound variables and meta-variables.
Proof. Note first that trivial critical nested overlappings and trivial seeds correspond. (l, k). It suffices to show that C no(l, k) is "closed under nested overlapping". By definition of pre-seeds, let s, σ , P, Q ∈ pS k l , {p i } i and {q j } j be sets of positions, {L i } i and {G j } j be renamings of L, G, and τ a substitution satisfying the condition (ii) of Definition 33. Using the induction hypothesis and the above conditions (ii), it is easy to verify that sσ , k, P, l, Q, σ ∈ C no(l, k).
l , the converse statement.Let s, σ , k, P, l, Q ∈ pC no(l, k), and consider the sets P , Q of positions in P, Q which are maximal in P ∪ Q.Let P \ P = {p i } i and Q \ Q = {q j } j .By condition (i) and the fact that σ is minimal, σ = θτ and s = tτ , where θ is the most general unifier of the equation t=Lθ By condition (i) and closure under nested overlappings t, τ , P, Q ∈ pS k l , hence tτ = s, P, Q ∈ S k l and we are done.
We could of course define the critical pairs themselves recursively in the same way we have defined the nested overlaps in Definition 33.It is, however, equivalent to rewrite the overlaps with the appropriate lifting of the right-hand sides of rules, which means that each rule needs to be lifted with lifting substitutions, implying that each rule needs infinitely many lifted copies in general.

Critical peak property
We conclude our study of orthogonal rewriting with the nested critical peak (or critical pair) property: This statement is pictured in Figure 3. The substitution θ is obtained by splitting r as uθ so that u contains a critical overlap. The substitution γ expresses the property that u is an instance of the most general critical peak (v , u , w ), hence u=u γ . These substitutions play different rôles: the substitution θ is rewritten while the substitution γ is not, which is why they are kept separate. Proof. By assumption on Q, the two orthogonal rewrites from r overlap. Let O={O i } i∈I and O ={O j } j∈J be the maximal subsets of P and Q, respectively, that satisfy conditions (ii, iii) of Definition 31. Let now N = ((P∪Q)\(O∪O )) min , u = r N and θ = r N , hence r = uθ.
We first show that there exists a substitution δ such that the tuple u, k, { } ∪ O, l, O , δ is a nested overlap, hence satisfies condition (i) of the same definition. Since { } ∪ P and Q are sets of orthogonal positions, so are their respective subsets { } ∪ O and O . By definition of orthogonal rewriting and maximality of O, O , the positions n∈N do not belong to any set of the form p • {o < P F L ∪ F G } for some p and o, unless p = n, or q • {o < P F L ∪ F G } for some q and o, unless q = n. Hence, { } ∪ O and O are sets of orthogonal positions in u, and therefore, by linearity of L and G, the terms in {u| p : p ∈ { } ∪ O} are k-redexes while the terms {u| q ∈ O } are l-redexes, hence are instances of L and G, respectively. Let O = {o i } i∈I and O = {o j } j∈J , {L i } i∈I and {G j } j∈J be renamings of L and G, respectively, that share no meta-variable between themselves nor with L. Then, u = Lσ , u| p i = L i σ i and u| j = G j τ j . By Lemma 30, u| p i = L i ↑ V i δ i , where V i is the vector of variables bound above p i in u, hence in r, and u| q j = G j ↑ W j δ j , where W j is the vector of variables bound above q j in u , hence in r. Note that lifting is used here to ensure that the free variables coincide in u, Lδ (there are no free variables), L i ↑ V i δ i (whose set of free variables is V i ) and G j ↑ W j δ j (whose set of free variables is W j ). Since the respective domains of these substitutions are pairwise disjoint, their union δ satisfies condition (i).
We can now exhibit the critical nested overlap. Let ξ be the most general substitution satisfying (i), hence δ = ξγ for some γ , u = Lξ and u = u γ . Then u , k, { }∪O, l, O , ξ is a critical nested overlap. By Lemma 32, s Of course, this result has particular cases: one-nested critical overlaps give rise to one-nested critical peaks (or critical pairs), parallel critical overlaps give rise to parallel critical peaks (or critical pairs), and critical overlaps give rise to critical peaks (or critical pairs). The latter case is sometimes dubbed plain to distinguish it from the others.

Nested critical pairs of the theory of global states
Using our example of global states, we illustrate the computation of two nested critical peaks and of one of the infinite families of nested critical peaks.
And now, the first infinite family of seeds described in Example 7:

Confluence in λF
Our goal here is to state and prove our main result, namely, the Church-Rosser property for a rewrite theory in λF , under the assumption that its nested critical peaks have decreasing diagrams.
Since beta-reductions do not terminate on untyped terms, and higher-order reductions may be non-terminating as well, we shall use van Oostrom's technique relying on the existence of a decreasing rewrite diagram for each local peak (van Oostrom 1994), decreasingness being defined wrt a partial quasi-order labeling the rewrite steps whose strict part must be well-founded.
The structure of this section is as follows: first, we describe the rewrite relation we use for proving confluence, and the corresponding labeling schema; second, we recall the notion of decreasing diagram and state the confluence theorem; third, we apply the result to the theory of global states; fourth, we apply it to the theory of encoding and decoding; lastly, comes the proof of the result.

Labeled rewrites in λF
We now consider the Church-Rosser property of the rewrite relation used in λF on untyped terms. As usual, it is essential to choose carefully the relation to work with. For the method to be sound, it must contain rewriting and define the same convertibility relation as the one generated by the set of rules. Higher-order rewriting will suffice for those rules that define a terminating rewrite relation. For those that do not, non-linearities make it difficult to get decreasing diagrams for ancestor peaks since there can only be one facing step on each side of the conversion. The way out is to impose left-linearity and use some form of parallel rewriting to handle non-right-linearities. But parallel higher-order rewrites taking place below a beta-step may now become both duplicated and nested, making orthogonal higher-order rewriting necessary to get decreasing diagrams. Functional steps will be considered as particular steps requiring the use of orthogonal rewriting.
We now assume a subset S ⊆ R of small rules defining a terminating relation on untyped terms. All other rules are big rules. β-rewrite steps will be smallest among the big steps, so that they can be neglected when needed, while small steps will be smaller than big steps. The definition of S will therefore result from a compromise between two constraints: termination which allows a rule to be small, and the possible need of β-rewrites to join its critical pairs, which may force it to be big. Note that functional steps cannot be smaller than small steps: if they were, ancestor peaks having a small step below a functional step could not be made decreasing.
The rewrite steps to be considered in an arbitrary conversion, for which all local peaks must be replaced by decreasing diagrams, are therefore of one of the three following forms: For uniformity, we use the notation u p −→ α v for an = α -step taking place in the subterm u| p .
Furthermore, we sometimes allow ourselves to abbreviate a sequence

Main assumptions:
• R is a set of rules whose left-hand sides are linear patterns;
• S is a subset of rules of R, whose rewrite relation is terminating;
• is a quasi-order on rule names whose strict part is well-founded and equivalence is ≡;
We now label a step s p −→ i t or s P ⊗ =⇒ i t by a pair of the following form: Labels are compared lexicographically: our quasi-order on labels is therefore ( , Since the main order is the one on rule names, we will take the liberty to use as our quasi-order on labels and on rule names at the same time. The reader will easily disambiguate when needed. The strict part of this order is well-founded by construction.
Because of its definition, the label of a rewrite step would not need to appear in our rewriting notation, unless in specific occasions where it will replace the rule name.In order to disambiguate between these two situations, we will use small letters for rule names, say i, and the corresponding capital letter I for a label whose first argument is the rule name i and the second some term s.
Key properties of the order on labels are monotonicity and stability, for which it will be convenient to define the following notation: given a context u[_] p , a substitution σ and a label I = i, a , u[I] p σ is the label i, u[a] p σ , where u[a] p σ = _ if a = _ and the term u[s] p σ if a is the term s.The notation extends of course to the case where there are multiple holes in u, as in u[I] P σ , all labels in I using rule i.This will be used later with P being the positions of the meta-variables of u.
Lemma 36. Let K = k, a and L = l, b be the labels of two rewrite steps such that K L, u[_] o a context, and σ a substitution. Then, K by monotonicity and stability of higher-order rewriting, and the result follows.

The confluence theorem
Van Oostrom's decreasing diagrams can have a very general form. We shall, however, stick to a more convenient one for the confluence proof: Definition 37 (DRDs). Given a pair of labels (I, J), a decreasing reduction (DR) from u to v wrt , where labels in K labeling the side steps are strictly smaller than I and labels in L labeling the middle steps are strictly smaller than either I or J. A decreasing reduction whose facing step from s to t is empty (P = ∅) is called simple.
We often leave the labels implicit, writing then u −→ −→ Our formulation of decreasing reductions includes α-steps in both sequences K and L, that is why they do not need to appear explicitly at the bottom.On the other hand, η-expansions (or reductions) for meta-variables need to appear only at the bottom since they are absorbed by matching in decreasing reductions.These expansions can be ignored since they vanish by taking instantiations (Lemma 2).We will only find them in decreasing diagrams for critical pairs.
There is a single orthogonal step in a decreasing reduction, the other steps being plain higher-order rewrite steps. The reason is that any orthogonal step whose label is strictly less than I or J in a decreasing reduction wrt (I, J) can be expanded into a rewrite derivation by Lemma 25, decreasingness being then preserved. On the other hand, it is in general not possible to expand the orthogonal j-step without violating the decreasing diagram condition: there may be at most one (facing) step labeled by J. Collecting many j-steps into a single orthogonal j-step is the very reason for introducing orthogonal rewriting.
DRDs have better properties than arbitrary decreasing diagrams that ease the confluence proof in many (nonessential) ways.In practice, searching for DRDs is easier than searching for arbitrary decreasing diagrams; this is another reason for considering them.
We can now state the main result of the paper:
Of course, some nested critical peaks of R \ S may be one-nested critical peaks, or parallel critical peaks, or even plain critical peaks.
Before developing the proof, which spans Section 4.5, we illustrate its use with two examples, the first having infinitely many nested critical peaks, and the second requiring β-steps to join some plain critical peaks.

Confluence of the theory of global states
We now illustrate the confluence theorem with our example of global states. After recalling the rules, we present the critical pair computations inside individual boxes: first the nested critical pairs that happen to be usual critical pairs, then those that generate infinite families. For usual critical pairs, the upper middle of each box shows the two rules whose superposition appears inside braces. The upper rule is displayed in red and the lower one in blue, with the proviso that lifted variables appear in red inside a blue rule. Next come the unifier, the colored right-hand sides, the reduced right-hand sides, and finally the decreasing diagram itself. Colored rule names label the arrows issuing from the critical overlap. The decreasing reductions are in black, including the rule names, except for the facing steps, which are in red or blue (remember that simple reductions cannot contain a facing step). The presentation of the infinite families is slightly different but follows the same pattern.
Let us first recall the rules:
ll : lk(λw.lk(Y[w])) → lk(λv.Y[v, v])
uu : ud(U, ud(V, W)) → ud(V, W)
ul : ud(V, lk(X)) → ud(V, X[V])
lu : lk(λv.ud(v, X[v])) → lk(X)
l : lk(λv.U) → U
As shown in (Ferey and Jouannaud 2019), these rules do not terminate, which is why their confluence was proved there for a subset of the whole set of terms, namely by assuming that the first argument of ud belongs to a set of constant symbols (meant to correspond to the semantic values). We do not make such an assumption here, but use instead the easy-to-prove property that the subset S = {uu, ll} of linear rules defines a terminating rewrite relation, hence can be used as our set of small rules. Its confluence follows from the joinability of its critical pairs (Knuth and Bendix 1970), which we check now. The overlap of the first rule upon itself is just a usual first-order overlap. The second, however, requires lifting since the overlapping position is below an abstraction.
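The first-order overlap of uu with itself can be checked mechanically. The sketch below is an illustration only, not the procedure used in the paper: terms are encoded as nested Python tuples, the meta-variables U, V, V2, W are treated as constants, and the helper names `rewrite_uu_at_root` and `normalize_uu` are hypothetical.

```python
def rewrite_uu_at_root(t):
    """Apply uu : ud(U, ud(V, W)) -> ud(V, W) at the root, or return None."""
    if isinstance(t, tuple) and len(t) == 3 and t[0] == "ud":
        inner = t[2]
        if isinstance(inner, tuple) and len(inner) == 3 and inner[0] == "ud":
            return inner
    return None


def normalize_uu(t):
    """Innermost normalization with the single rule uu.

    This terminates because every uu-step strictly shrinks the term.
    """
    if isinstance(t, tuple):
        t = (t[0],) + tuple(normalize_uu(a) for a in t[1:])
        reduct = rewrite_uu_at_root(t)
        if reduct is not None:
            return normalize_uu(reduct)
    return t


# Overlap of uu with itself at its second argument: ud(U, ud(V, ud(V2, W))).
peak = ("ud", "U", ("ud", "V", ("ud", "V2", "W")))
left = rewrite_uu_at_root(peak)                        # step at the root
right = ("ud", "U", rewrite_uu_at_root(peak[2]))       # step below the root

# Both sides of the critical pair reach the same uu-normal form.
joinable = normalize_uu(left) == normalize_uu(right)
```

Here `left` is ud(V, ud(V2, W)) and `right` is ud(U, ud(V2, W)); one further uu-step on each side yields ud(V2, W). The overlap of ll with itself is higher-order and is not covered by this first-order sketch.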
[Critical pair diagram] which is a decreasing rewrite diagram with a single facing step for rule lu
[Critical pair diagram] which is a trivial decreasing rewrite diagram
We continue with the checking of 1-nested critical pairs originating from an overlapping of a small rule at a position of one of the big rules. All of them happen indeed to be critical pairs. Note that the next critical pair, obtained by overlapping (l) with (ll), requires the use of the Drop unification rule (Ferey and Jouannaud 2019), which generates a fresh meta-variable Z of arity one, in order to eliminate the dependency of X_1 upon its second argument w:
[Critical pair diagram] which is a trivial decreasing rewrite diagram
[Critical pair diagram] which is a decreasing rewrite diagram with a single facing step for rule ul
[Critical pair diagram] which is a trivial decreasing rewrite diagram
We are left with the nested critical pairs between big rules. As we have seen, there are four infinite families of critical overlaps between ul and lu that we have computed already at Example 7. Since each family contains infinitely many critical pairs, we have to show that they all enjoy a decreasing rewrite diagram.
[Diagrams for the infinite families] which is decreasing wrt rule labeling
We have therefore shown that the theory of global states preserves the confluence of β-reductions on all untyped terms, hence improving over (Ferey and Jouannaud 2019), by using the order on rules' labels inherited from the decision to have ll and uu as small rules. This example could actually not be shown confluent with an empty set of small rules, as can be checked by the reader, because, in that case, there would be additional infinite families whose joinability diagram would require {uu, ll} > {ul, lu} to be decreasing, while the existing infinite families require just the contrary.
An important remark here is that checking nested critical pairs, whether done automatically or, for the infinite families, by the user, proceeds by accumulating ordering constraints on the rule names used for labeling the rewrite steps, and checking them for satisfiability, usually by rewriting the constraint into some disjunctive normal form. As is well known, such a technique is modular: when new rules come in, new nested critical pairs are computed and new constraints are added to the previous normal form; the whole set is then normalized again, and satisfiability or unsatisfiability is concluded.
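The incremental constraint handling described above can be sketched as follows. This is a minimal illustration under assumptions of our own: each diagram contributes a disjunction of alternatives, each alternative is a set of strict constraints (a, b) read "a is bigger than b", and a conjunction is satisfiable exactly when its constraint graph is acyclic. The function names and the sample constraints are hypothetical and are not taken from the diagrams of the paper.

```python
from itertools import product


def is_acyclic(constraints):
    """True iff the strict constraints (a, b), read a > b, admit a strict order."""
    graph = {}
    for a, b in constraints:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set())
    state = {}  # node -> "open" (on the DFS stack) or "done"

    def visit(n):
        if state.get(n) == "done":
            return True
        if state.get(n) == "open":
            return False  # back edge: a > ... > a
        state[n] = "open"
        ok = all(visit(m) for m in graph[n])
        state[n] = "done"
        return ok

    return all(visit(n) for n in graph)


def add_diagram(dnf, alternatives):
    """Conjoin a new disjunction to a DNF and drop unsatisfiable conjuncts."""
    combined = {c | alt for c, alt in product(dnf, alternatives)}
    return {c for c in combined if is_acyclic(c)}


# Start from the trivially true constraint: one empty conjunct.
dnf = {frozenset()}
# Hypothetical diagram 1: needs ul > uu and lu > uu.
dnf = add_diagram(dnf, [frozenset({("ul", "uu"), ("lu", "uu")})])
# Hypothetical diagram 2: needs ul > ll or lu > ll.
dnf = add_diagram(dnf, [frozenset({("ul", "ll")}), frozenset({("lu", "ll")})])
# Hypothetical diagram 3: needs uu > ul and uu > lu, contradicting diagram 1.
contradictory = add_diagram(dnf, [frozenset({("uu", "ul"), ("uu", "lu")})])
```

The constraint set is satisfiable as long as the normal form is non-empty; adding rules only calls `add_diagram` again on the existing normal form, which reflects the modularity noted in the text.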

Confluence of encoding and decoding
We now illustrate the confluence theorem with our example of encoding and decoding terms, and show that all its nested critical pairs have decreasing diagrams. Just as with the previous example, we first recall the rules before computing and checking first the nested critical pairs that happen to be usual critical pairs, and then those that generate infinite families.
Let us first recall the rules:
The subset of rules {de, el, dl, ae, ad} can be seen as a set of first-order rules, λx being a unary operator and bound occurrences of x in the scope of λx being constants, since no meta-variable is applied in the right-hand side of the rules. We show that the rules are strongly normalizing by using the following (non-erasing) polynomial interpretation over N+:
We now prove that each rule strictly decreases this interpretation. The check is trivial for de, el and dl. For ae, we need to verify that which holds true. Naturally, neither al nor β decreases this interpretation since each of these rules is non-terminating on its own. Choosing all five rules as small rules would actually not work: two critical peaks require using β-rewrites in their joining diagrams, which prevents us from keeping el and ad among the small rules. The set of small rules is therefore reduced to {de, ae, dl}. The remaining rules are then ordered as follows: {el, ad} > al > β. There are then no nested critical pairs, no one-nested critical pairs, and four critical pairs that we consider in turn:
[Critical pair diagram] which is a decreasing diagram since el is big and the others small
[Critical pair diagram] which is a decreasing diagram since ad is bigger than all others
[Critical pair diagram] which is a decreasing diagram since el is bigger than all others
[Critical pair diagram] which is a decreasing diagram since ad is bigger than all others
Note here that using multi-step instead of orthogonal rewriting would produce one more critical pair, actually a one-nested critical pair obtained by overlapping (de) onto (ad) and (el) onto (de). This would correspond to a multi-step using both (ad) and (el) connected by a rewrite step using (dl). Thanks to our definition of orthogonal rewriting, this extra critical pair does not need to be considered.
We conclude that the encoding and decoding specification defines a confluent system together with β.

The confluence proof
Using the generalization of van Oostrom's theorem by Jouannaud and Liu (Jouannaud and Liu 2012), we need to show the existence of decreasing diagrams for all local peaks in turn. We first develop some properties of decreasing reductions.
We will actually need a more general form of monotonicity for orthogonal reductions, for which the linear context u has multiple holes defined as the positions of the meta-variables in Dom(σ). The proof is by induction on the number of meta-variables in u.
Lemma 40 (Multi-monotonicity). Given a linear term u whose meta-variable X occurs at a set P of parallel positions, a substitution σ of domain MVar(u), and labels I = ⟨i, a⟩ and J
Proof. By Lemma 39 applied repeatedly to Xσ for s instantiated by the identity substitution.
We now consider gluing together two decreasing reductions.
τ be two decreasing derivations originating from a term u and a substitution σ, respectively.
Proof. Follows from stability properties of reductions as above, from their monotonicity properties, and from the definition of orthogonal reductions.

Decreasing diagrams for free
Before starting to build decreasing diagrams for all local peaks, we need to generalize some standard commutation properties of plain rewriting to the case of higher-order orthogonal reductions. These algebraic properties of reductions can be seen as decreasing diagrams for free. They will of course play an important rôle in the confluence proof.
Plain first-order rewriting enjoys two properties implying that disjoint and ancestor local rewriting peaks are always joinable by decreasing reductions, called the disjoint peak (DP) property and the linear ancestor peak (LAP) property for left-linear rewrite rules (Huet 1980). (DP) is true of all monotonic relations, and (LAP) holds for our definition of higher-order rewriting, as we have seen. It is important to realize that our definition of higher-order rewriting has been designed with this objective in mind: neither (DP) nor (LAP) is true of Nipkow's definition. As expected, these properties extend to orthogonal higher-order rewrites:
Lemma 42 (DP). Let Q#P be sets of orthogonal positions wrt rule i. Then, orthogonal i-rewrite steps at P and Q commute.
Note that P (resp., Q) are singleton sets when rewriting takes place at a single position.
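The disjoint peak property can be illustrated in the plain first-order case, where P and Q are singleton sets. The sketch below is not the orthogonal higher-order setting of Lemma 42: it only rewrites with the first-order rule uu from the global states example at two parallel positions, in both orders, and compares the results. The tuple encoding of terms, the symbol `pair` and the helper names are assumptions made for the example.

```python
def uu_root(t):
    """Apply uu : ud(U, ud(V, W)) -> ud(V, W) at the root, or return None."""
    if isinstance(t, tuple) and len(t) == 3 and t[0] == "ud":
        inner = t[2]
        if isinstance(inner, tuple) and len(inner) == 3 and inner[0] == "ud":
            return inner
    return None


def rewrite_at(t, pos):
    """Rewrite t with uu at position pos, a tuple of 1-based argument indices."""
    if not pos:
        result = uu_root(t)
        if result is None:
            raise ValueError("no uu-redex at this position")
        return result
    i = pos[0]
    # Index 0 of the tuple holds the function symbol, so argument i sits at t[i].
    return t[:i] + (rewrite_at(t[i], pos[1:]),) + t[i + 1:]


def rewrite_all(t, positions):
    """Rewrite successively at each position of the given list."""
    for p in positions:
        t = rewrite_at(t, p)
    return t


# Two uu-redexes at the parallel positions 1 and 2 of a hypothetical term.
term = ("pair", ("ud", "a", ("ud", "b", "c")), ("ud", "d", ("ud", "e", "f")))
p, q = (1,), (2,)
p_then_q = rewrite_all(term, [p, q])
q_then_p = rewrite_all(term, [q, p])
```

Both orders yield pair(ud(b, c), ud(e, f)), which is the commutation expected for rewrite steps at disjoint positions.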

Lemma 43 (Commutation
This straightforward property is required in the case where a rewrite relation is modulo a theory, here modulo = α (Jouannaud and Liu 2012).
In the case where j ∈ S, the orthogonal step from σ(X) to τ(X) is indeed a plain rewrite step since small rules use plain rewriting instead of orthogonal rewriting, but seen here as an orthogonal step at the singleton set Q_X, while the orthogonal steps in the conclusion become parallel steps. This allows us to avoid writing specific lemmas for the case of small rules.
Proof. By stability Lemma 9 for the step from uτ to vτ and by definition of orthogonal rewriting for the step from vσ to vτ.

Decreasing diagrams for arbitrary higher-order local peaks
This is the difficult case of local peaks, whose proof is based on Theorem 35. It also requires lifting decreasing reductions from critical peaks to overlapping peaks, as well as gluing together decreasing derivations obtained for a term and its substitution. We do not distinguish here ancestor peaks from overlapping peaks, a typology that is not necessary for the case of orthogonal rewriting. Nor do we distinguish the kind of rule that is used: functional, small, or big.
Lemma 45.Higher-order local orthogonal peaks have decreasing diagrams provided higher-order nested critical peaks have decreasing rewrite diagrams.
Of course, nested critical peaks between small rules are just usual critical peaks, while those between small and big rules are either usual critical peaks or 1-nested critical peaks if the big rule applies above (non-strictly) the small rule.
We can now glue these DRDs together and obtain a DRD for the original local peak by gluing Lemma 41.
This terminates the proof of our main result. Note that we have not singled out either β or small rules in the proof. The distinction is made implicitly: the case |O ∪ O′| > 1 is impossible if i = β; if i ∈ S, only critical pairs are possible if j ∈ S, and parallel critical pairs if j ∈ R; if i ∈ R and j ∈ S, 1-nested critical pairs are also possible.

Conclusion
Van Oostrom's decreasing diagrams technique characterizes confluence of rewriting on an abstract set. It is well-known that its application to term rewriting is difficult, although many techniques were elaborated during the last years that successfully solved many open problems (Felgenhauer 2013; Felgenhauer et al. 2015; Felgenhauer and van Oostrom 2013; Jouannaud and Liu 2012; Liu and Jouannaud 2014; Liu et al. 2015; Zankl et al. 2015). For example, Felgenhauer proved that the existence of decreasing diagrams for parallel critical pairs ensures the confluence of non-terminating left-linear first-order rewriting systems (Felgenhauer and van Oostrom 2013).
Our main result, Theorem 38, shows that van Oostrom's method applies to a quite complex new situation: higher-order rewriting of untyped terms with left-linear rules having non-trivial higher-order critical pairs. Compared to Felgenhauer's, there are two new difficulties: rewrites are higher-order, and untyped β-reductions are present. Compared to (Ferey and Jouannaud 2019), we do not need the assumption that the set of higher-order rewrite rules defines a terminating relation, but can use the termination property of any subset of the rules, which has proved very useful in examples. It is actually worth noting that if S = R, then nested critical pairs disappear, as do one-nested critical pairs. The result then reduces to the main theorem of (Ferey and Jouannaud 2019), which is therefore subsumed by our new result. We thus have a new, powerful and flexible tool to prove the confluence on untyped terms of λ-calculi augmented by left-linear higher-order rewrite rules.
A main technical tool used here is the theory of orthogonal rewriting, which appears to be intimately related to van Oostrom's multi-step rewriting (Terese 2003). One difference is that an orthogonal step uses different instances of a single rule at all orthogonal rewriting positions, whereas a multi-step may use instances of different rules. A main novel aspect of our definition is the associated notion of a nested critical pair and the corresponding critical pair property, which allows us to check confluence of non-terminating higher-order definitions whose critical pairs are not development closed, as they are in (van Oostrom 1997). Nested critical pairs could of course be defined for multi-step rewriting by adapting our definitions, but this would result in an exponential blow-up of their number, a strong argument in favor of orthogonal rewriting.
Higher-order rewriting definitions have been studied extensively in the past years because they allow encoding program constructs in type theories such as Agda, DEDUKTI and Coq, which all now allow user-defined higher-order rules. They are also used in tools that target checking their confluence (Hamana 2017) or their termination (Kop 2020). The examples we have carried out here, Plotkin-Power's theory of global states and an encoding of lambda calculus used in DEDUKTI, illustrate the strength of orthogonal rewriting, which requires using nested critical pairs, as well as the use of small terminating rules, which only require plain critical pairs. Many other encoding examples appear in (Férey 2021).
Our definition of higher-order rewriting for untyped λ-terms is taken from (Ferey and Jouannaud 2019). As is usual, rules have patterns for their left-hand sides, and pattern-matching and unification are higher-order, that is, in the context of patterns, modulo β0. What is crucial in this setting is the use of Klop's notion of substitution for meta-variables with arities, which makes higher-order rewriting actually look first-order. The practical impact of this trick on technical developments is very impressive.
One may wonder whether it would be important to develop an abstract theory of orthogonal rewriting. This is indeed questionable. Orthogonal rewriting is needed in the presence of β-reduction and of higher-order rules for three concurrent reasons that do not occur together otherwise: first, β-reduction does not terminate on untyped terms; second, higher-order rules generate β-redexes, which requires having maximal labels for the higher-order steps; third, β-rewrites duplicate and stack their subexpressions, which requires using orthogonal higher-order rewriting steps to pack many steps into one so as to meet the decreasing diagram condition for local peaks made of a β-redex sitting above a higher-order redex. The only other cases we can think of where these circumstances would be met are obtained by relaxing the constraints on the rewrite system added to untyped β-reduction, for example, by allowing non-left-linear rules.
v and K ⊆ Pos(u) such that ∀k ∈ K : k≥ P p•F L .w, and v = wu K .
-rewrite theory (F , R) is left-linear if all rules in R are left-linear.
and t = uθ. Since K is a set of parallel positions in Pos(s), splitting sσ yields sσ K = s K σ and sσ K = s K σ. By induction hypothesis, s K σ splitting uses fresh meta-variables, Dom(θ) ∩ Dom(σ) = ∅. Hence, (uσ)θσ = uθσ = tσ. Definition of orthogonal rewriting yields the result.
By induction on the definition of orthogonal rewriting. If u O =⇒ i v, the result is clear. Otherwise, O = P ⊗ K Q, u = sσ with s = u K and σ = u K , v. We conclude by two inductions.

Figure 2. Computation of a nested overlap.
left-over rewrites applying to the substitution θ.By Lemma 14, P= O⊗ N P and Q=O ⊗ N Q .Since P and Q are sets of orthogonal positions for r, P and Q , their respective subsets below N, are sets of orthogonal positions for θ.Hence, σ two local orthogonal peaks, one from u and one from θ, that can be merged by definition of orthogonal rewriting.Since P = ({ } ∪ O) ⊗ u P and Q = O ⊗ u Q , we get the peak r = uγ , we finally conclude that s = vσ and t = wτ by Lemma 28.
i } i is a set of parallel positions.
A decreasing rewrite diagram (DRD) is a pair of decreasing reductions from v to v′ wrt (I, J) and from w to w′ wrt (J, I), such that v′ =_Mη w′. Note that any joinability diagram −→ −→ S =_Mη ←− ←− S using small rules is indeed a decreasing rewrite diagram whose decreasing reductions are simple since small rules are terminating. There are other situations where decreasing reductions become simple. If i ∈ R \ S and j ∈ S, then any decreasing reduction with respect to (I, J) reduces to a simple decreasing reduction of the form u −→ −→ , hence all steps in the sequence are strictly smaller than I. Two other cases are the following: i = β and j ∈ S; and i ∈ R \ S and j = β.
4.5.1 Properties of decreasing reductions
Lemma 39 (Stability and monotonicity). Let I = ⟨i, a⟩, J = ⟨j, b⟩ be labels, s −→ −→ with respect to (I, J), u[]p a context and σ a substitution. Then u
orthogonal higher-order local peak with i, j ∈ R ∪ β. The proof proceeds by induction on the size of u.
Assume first that the root position Λ does not belong to P ∪ Q. Then, we can cut u along K = (P ∪ Q)min, apply induction to u K yielding some decreasing diagram, and compose back that diagram with u K by Lemma 40, yielding a decreasing diagram as expected.
Assume now that Λ ∈ P ∪ Q and, wlog, Λ ∈ P. Let O ∪ O′ be the maximal subset of P ∪ Q s.t. (i, O, j, O′) defines a nested critical overlap at the root. There are two cases depending upon the size of O ∪ O′: the first, non-overlapping case, shown in Figure 4, and the second, overlapping case, shown in Figure 5.
(1) O ∪ O′ = {Λ}. We then split u along K = ((P \ {Λ}) ∪ Q)min. Let therefore r = u K , γ = u K , and rγ = u. By Lemma 14, P = {Λ} ⊗K P′ and Q = ∅ ⊗K Q′. By Lemma 28 applied twice, We now decompose the step from u to v: by Definition 19, r −→ decreasing derivation. We are done if P′ = ∅, otherwise we get smaller peaks σ θ. By monotonicity Lemma 40, we get a decreasing derivation from sσ to sθ. By gluing Lemma 41, we get a decreasing derivation from rτ to sθ. We have therefore got a DRD for the peak v = sσ ⇐= ⊗
(2) |O ∪ O′| > 1, hence Λ ∈ O. We then split u according to Theorem 35, hence there exist u′, v′, w′, r, s, t, θ, γ, σ, τ, O, O′, P′, Q′ such that (i) u = u′θ, v = v′σ, w = w′τ, u′ = rγ, v′ = sγ, w′ = tγ; (ii) s O iv) P = O ⊗u P′, Q = O′ ⊗u Q′. By assumption, there is a DRD for the nested critical pair ⟨s, t⟩, hence there exists some term m such that s −→ −→ <i ⊗ =⇒ −→ −→ <i,j m and t −→ −→ <j ⊗ =⇒ −→ −→ <i,j m. It follows by stability Lemma 39 that sγ −→ −→
• the non-trivial nested critical peaks of R \ S have rewrite diagrams decreasing wrt rule labeling;
• the non-trivial plain critical pairs of S are joinable;
• the parallel critical overlaps of R \ S onto S not at the root have rewrite diagrams decreasing wrt rule labeling composed lexicographically with small rules rewriting;
• the plain critical peaks and one-nested critical peaks of S onto R \ S have rewrite diagrams decreasing wrt rule labeling composed lexicographically with small rules rewriting.