Higher-order pattern generalization modulo equational theories

Abstract We consider anti-unification for simply typed lambda terms in theories defined by associativity, commutativity, identity (unit element) axioms and their combinations and develop a sound and complete algorithm which takes two lambda terms and computes their equational generalizations in the form of higher-order patterns. The problem is finitary: the minimal complete set of such generalizations contains finitely many elements. We define the notion of optimal solution and investigate special restrictions of the problem for which the optimal solution can be computed in linear or polynomial time.


Introduction
Anti-unification algorithms aim at computing generalizations for given terms. A generalization of t and s is a term r such that t and s are substitution instances of r. Interesting generalizations are those that are least general (lggs). However, it is not always possible to have a unique least general generalization. In these cases, the task is either to compute a minimal complete set of generalizations or to impose restrictions so that uniqueness is guaranteed.
Anti-unification, as considered in this paper, uses both of these ideas. The theory is simply typed lambda calculus, where some function symbols may be associative, commutative, have an associated unit element, or have any combination of these equational properties. Anti-unification for first-order terms containing function symbols obeying these properties is finitary, and the corresponding modular generalization algorithms have been proposed in Alpuente et al. (2014), also in the presence of ordered sorts. Anti-unification for simply typed lambda terms can be restricted to compute generalizations in the form of Miller's patterns (Miller, 1991), which makes it unitary, and the single least general generalization can be computed in linear time by the algorithm proposed in Baumgartner et al. (2017). These two approaches combine nicely with each other when one wants to develop a higher-order equational anti-unification algorithm. In this paper, we present higher-order pattern anti-unification for terms containing function symbols whose equational axioms may include associativity, commutativity, identity (unit element), and their combinations. Basically, we extend the syntactic generalization rules from Baumgartner et al. (2017) by equational decomposition rules inspired by those from Alpuente et al. (2014). The existence of the unit element introduces some complications due to the fact that the corresponding equational classes are infinite. To avoid them and still have a complete algorithm, we concentrate on computing linear generalizations for theories involving unit elements.
Two terms s and t are equal modulo an equational theory, written s =_E t, if they are equivalent modulo E ⊆ {A, C, U}. For example, f(a, f(b, c)) =_A f(f(a, b), c) when f is associative. Under this notion of equality, we can say that a substitution σ1 is more general than σ2 modulo an equational theory E ⊆ {A, C, U}, written σ1 ⪯_E σ2, if there exists ϑ such that Xσ1ϑ =_E Xσ2 for all X ∈ Dom(σ1) ∪ Dom(σ2). The relations ≺_E and ≃_E and their term extensions are defined accordingly. From this point on, we will use the ordering relations modulo an equational theory when discussing generalization.
A term t is called a generalization or an anti-instance modulo an equational theory E of two terms t1 and t2 if t ⪯_E t1 and t ⪯_E t2. (We will refer to such objects as E-generalizations.) It is a higher-order pattern generalization if, additionally, t is a higher-order pattern. It is the least general generalization (lgg, for short), aka a most specific anti-instance, of t1 and t2 if there is no generalization s of t1 and t2 which satisfies t ≺_E s.
An AUP (anti-unification problem) is a triple X(x⃗) : t ≜ s where λx⃗.X(x⃗), λx⃗.t, and λx⃗.s are terms of the same type, λx⃗.t and λx⃗.s are in η-long β-normal form, and X does not occur in t and s.
The variable X is called a generalization variable. The term X(x⃗) is called the generalization term.
The variables that belong to x⃗, as well as bound variables, are written with lowercase letters x, y, z, .... Originally free variables, including the generalization variables, are written with capital letters X, Y, Z, .... This notation intuitively corresponds to the usual convention of syntactically distinguishing bound and free variables. The size of a set of AUPs is defined as |{X1(x⃗1) : t1 ≜ s1, ..., Xn(x⃗n) : tn ≜ sn}| = ∑_{i=1}^{n} (|ti| + |si|). Notice that the size of Xi(x⃗i) is not counted. An anti-unifier of an AUP X(x⃗) : t ≜ s is a substitution σ such that Dom(σ) = {X} and λx⃗.X(x⃗)σ is a term which generalizes both λx⃗.t and λx⃗.s. An anti-unifier σ of X(x⃗) : t ≜ s is least general (or most specific) modulo an equational theory E if there is no anti-unifier ϑ of the same problem that satisfies σ ≺_E ϑ. Obviously, if σ is a least general anti-unifier of an AUP X(x⃗) : t ≜ s, then λx⃗.X(x⃗)σ is an lgg of λx⃗.t and λx⃗.s.
Here, we consider the following variant of the higher-order equational anti-unification problem:
Given: Terms t and s of the same type in η-long β-normal form and an equational theory E ⊆ {A, C, U}.
Find: A higher-order pattern generalization r of t and s modulo E.
Essentially, we are looking for an r which is least general among all higher-order patterns which generalize t and s (modulo E). There can still exist a term which is less general than r, generalizes both t and s, but is not a higher-order pattern. In Baumgartner et al. (2017), there is an instance of this for syntactic anti-unification: if t = λx, y.f(h(x, x, y), h(x, y, y)) and s = λx, y.f(g(x, x, y), g(x, y, y)), then r = λx, y.f(Y1(x, y), Y2(x, y)) is a higher-order pattern which is an lgg of t and s. However, the term λx, y.f(Z(x, x, y), Z(x, y, y)), which is not a higher-order pattern, is less general than r and generalizes t and s.
Another important distinguishing feature of higher-order pattern generalization modulo E is that there may be more than one least general pattern generalization (lgpg) for a given pair of terms. In the syntactic case, there is a unique lgpg. The main contribution of this paper is to find conditions on the AUPs under which there is a unique lgpg for equational cases and introduce weaker optimality conditions which allow one to greedily search the space for a less general generalization compared to the syntactic one. We formalize these concepts in the following sections.

Higher-Order Pattern Generalization in the Empty Theory
Below we assume that for AUPs of the form X(x⃗) : t ≜ s, the term λx⃗.X(x⃗) is a higher-order pattern. We now introduce the rules of the higher-order pattern generalization algorithm from Baumgartner et al. (2017), which works for E = ∅. It produces syntactic higher-order pattern generalizations in linear time and will play a key role in the optimality conditions introduced in later sections.
These rules work on triples A; S; σ, which are called states. Here, A is a set of AUPs of the form {X1(x⃗1) : t1 ≜ s1, ..., Xn(x⃗n) : tn ≜ sn} that are pending to be anti-unified, S is a set of already solved AUPs (the store), and σ is a substitution (computed so far) mapping variables to patterns. The symbol ⊎ denotes disjoint union.

Dec: Decomposition
{X(x⃗) : h(t1, ..., tm) ≜ h(s1, ..., sm)} ⊎ A; S; σ =⇒ {Y1(x⃗) : t1 ≜ s1, ..., Ym(x⃗) : tm ≜ sm} ∪ A; S; σ{X ↦ λx⃗.h(Y1(x⃗), ..., Ym(x⃗))},
where h is a free constant or h ∈ x⃗, and Y1, ..., Ym are fresh variables of the appropriate types.

Abs: Abstraction Rule
{X(x⃗) : λy.t ≜ λz.s} ⊎ A; S; σ =⇒ {X'(x⃗, y) : t ≜ s{z ↦ y}} ∪ A; S; σ{X ↦ λx⃗, y.X'(x⃗, y)},
where X' is a fresh variable of the appropriate type.

Sol: Solve Rule
{X(x⃗) : t ≜ s} ⊎ A; S; σ =⇒ A; S ∪ {Y(y⃗) : t ≜ s}; σ{X ↦ λx⃗.Y(y⃗)},
where t and s are of a base type, and head(t) ≠ head(s) or head(t) = head(s) = Z ∉ x⃗ (i.e., Z is a free variable). The sequence y⃗ is a subsequence of x⃗ consisting of the variables that appear freely in t or in s, and Y is a fresh variable of the appropriate type.
Although it is not necessary for this version of Solve, we can impose an extra condition on its application, requiring that U ∉ Ax(head(t)) ∪ Ax(head(s)). This condition will become useful later, when we consider theories with the unit element.

Mer: Merge Rule
A; S ⊎ {X(x⃗) : t1 ≜ s1, Y(y⃗) : t2 ≜ s2}; σ =⇒ A; S ∪ {X(x⃗) : t1 ≜ s1}; σ{Y ↦ λy⃗.X(x⃗π)},
where π : {x⃗} → {y⃗} is a bijection, extended as a substitution, with t1π = t2 and s1π = s2. Note that for the equational theories we consider later, we would use =_E instead of =.
We will refer to these generalization rules as G_base. To compute generalizations for two simply typed lambda terms t and s in η-long β-normal form, the algorithm from Baumgartner et al. (2017) starts with the initial state {X : t ≜ s}; ∅; Id, where X is a fresh variable, and applies these rules as long as possible. The computed result is the instance of X under the final substitution. It is the syntactic least general higher-order pattern generalization of t and s and is computed in linear time in the size of the input.
One may notice that an AUP of the form X(x⃗) : Z(s1, ..., sm) ≜ Z(t1, ..., tm), where Z is a free variable, is transformed by Sol rather than by the Dec rule. This is because applying decomposition may result in a generalization which is not a higher-order pattern. A simple example is the AUP X : λx.Z(x, a) ≜ λx.Z(x, a). The algorithm returns the pattern λx.Y(x) as its generalization, while the application of Dec would lead to the generalization λx.Z(x, a), which is not a pattern. However, when an AUP has the form X(x⃗) : c ≜ c, where c is a constant or one of the variables in x⃗, we apply the decomposition rule, that is, {X : c ≜ c}; ∅; Id =⇒_Dec ∅; ∅; {X ↦ c}.
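To make the rule-based description concrete, here is a minimal sketch of the syntactic core in code. It is restricted to first-order terms (no binders, no types, no pattern condition) and uses our own term encoding; it illustrates the interplay of Dec, Sol, and Mer, and is not the algorithm of Baumgartner et al. (2017) itself:

```python
# Minimal first-order sketch of the syntactic core: Dec (equal heads),
# Sol (head clash), and Mer (repeated disagreement pairs share a variable).
from itertools import count

def anti_unify(t, s, store=None, fresh=None):
    """Terms are constants like 'a' or tuples ('f', arg1, ..., argn).
    Returns a generalization of t and s; 'store' memoizes solved
    disagreement pairs so that equal pairs reuse one variable (Merge)."""
    if store is None:
        store, fresh = {}, count()
    # Dec: equal heads and arities -- decompose argument-wise.
    if isinstance(t, tuple) and isinstance(s, tuple) and \
       t[0] == s[0] and len(t) == len(s):
        return (t[0],) + tuple(anti_unify(ti, si, store, fresh)
                               for ti, si in zip(t[1:], s[1:]))
    if t == s:          # identical constants
        return t
    # Sol + Mer: head clash -- generalize by a variable, shared per pair.
    if (t, s) not in store:
        store[(t, s)] = f"X{next(fresh)}"
    return store[(t, s)]

# f(h(a, a), h(a, b)) vs. f(g(a, a), g(a, b))
t = ('f', ('h', 'a', 'a'), ('h', 'a', 'b'))
s = ('f', ('g', 'a', 'a'), ('g', 'a', 'b'))
print(anti_unify(t, s))  # ('f', 'X0', 'X1')
```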
We will use this linear time procedure in the following sections to obtain "optimal" least general higher-order pattern generalizations of terms modulo an equational theory. These optimal generalizations depend on the generalizations the syntactic algorithm produces. When we need to check more than one decomposition of a given AUP in order to compute the optimal generalizations modulo an equational theory, we compute the optimal generalization for each decomposition path and then compare the results. The details are explained below.

Equational Decomposition Rules: A-, C-, and AC-Theories
In this section, we discuss an extension of the basic rules concerning higher-order pattern generalization by decomposition rules for A-, C-, and AC-theories. Here, we consider the general, unrestricted case. The theory with the unit element is considered separately in the next section. Efficient special restrictions are discussed in the subsequent section.

Associative decomposition rules
We start with the decomposition rules for associative generalization:
Dec-A-L: Associative Decomposition Left
{X(x⃗) : f(t1, ..., tn) ≜ f(s1, ..., sm)} ⊎ A; S; σ =⇒ {Y1(x⃗) : f(t1, ..., tk) ≜ s1, Y2(x⃗) : f(tk+1, ..., tn) ≜ f(s2, ..., sm)} ∪ A; S; σ{X ↦ λx⃗.f(Y1(x⃗), Y2(x⃗))},
where Ax(f) = {A}, n, m ≥ 2, 1 ≤ k ≤ n − 1, and Y1 and Y2 are fresh variables of appropriate types.
Dec-A-R: Associative Decomposition Right
{X(x⃗) : f(t1, ..., tn) ≜ f(s1, ..., sm)} ⊎ A; S; σ =⇒ {Y1(x⃗) : t1 ≜ f(s1, ..., sk), Y2(x⃗) : f(t2, ..., tn) ≜ f(sk+1, ..., sm)} ∪ A; S; σ{X ↦ λx⃗.f(Y1(x⃗), Y2(x⃗))},
where Ax(f) = {A}, n, m ≥ 2, 1 ≤ k ≤ m − 1, and Y1 and Y2 are fresh variables of appropriate types.
In these rules, a singleton f(t) stands for t itself: the rules work on flattened terms.
We refer to the extension of G_base by the above associativity rules as G_A and extend the termination, soundness, and completeness results for G_base to G_A. To illustrate the use of this extension, let us consider the following example where Ax(f) = {A}:
Example 2. Let t = λx.λy.f(x, x, y, y) and s = λz.λv.f(z, v, v) be in flattened form. The initial state is {X : t ≜ s}; ∅; Id. First, we apply the abstraction rule twice, which leaves the AUP X'(x, y) : f(x, x, y, y) ≜ f(x, y, y) (with the bound variables of s renamed accordingly). From here, we can continue in multiple ways, applying Dec-A-L or Dec-A-R, each of them at various positions. Assume that we use Dec-A-L at position 2, that is, with the index k set to 2 (note that we flatten nested f's also in the substitutions). The derivation stops with the computed answer λx.λy.f(Y(x), y, y). Now assume that Dec-A-L was used not at position 2, but at position 1. This leads to the computation of another lgg, which shows that for the associative case there exists more than one lgg. (In contrast, higher-order pattern anti-unification in the free theory from Baumgartner et al. (2017) always results in a unique lgg.) Here, there are also multiple ways to proceed. We show one of them, applying the Dec-A-L rule at position 2, which yields another lgg, λx.λy.f(x, Y(x, y), y).
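To see the branching these rules introduce, the following sketch (our own encoding, assuming the rule shapes displayed above) enumerates the successor AUP pairs of all Dec-A-L and Dec-A-R choices on flattened argument lists:

```python
# Enumerate the successor AUP pairs produced by Dec-A-L and Dec-A-R on a
# flattened associative AUP f(t1,...,tn) vs. f(s1,...,sm). A singleton
# segment stands for the term itself; longer segments stand for the
# flattened f-term over them.
def a_decompositions(ts, ss):
    out = []
    # Dec-A-L: split the left argument list at k (1 <= k <= n-1); the
    # prefix f(t1..tk) is paired with s1, the rest with f(s2..sm).
    for k in range(1, len(ts)):
        out.append(("Dec-A-L", k, (ts[:k], ss[:1]), (ts[k:], ss[1:])))
    # Dec-A-R: split the right argument list at k (1 <= k <= m-1).
    for k in range(1, len(ss)):
        out.append(("Dec-A-R", k, (ts[:1], ss[:k]), (ts[1:], ss[k:])))
    return out

# Example 2: f(x, x, y, y) vs. f(x, y, y) (bound variables renamed apart)
for rule, k, p1, p2 in a_decompositions(list("xxyy"), list("xyy")):
    print(rule, k, p1, p2)
```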
Theorem 1 (Termination). The set of transformations G A is terminating.
Proof. Termination follows from the fact that G_base terminates (Baumgartner et al. 2017) and that the rules Dec-A-L and Dec-A-R can be applied only finitely many times, since each application strictly decreases the size of the AUPs it produces.
Theorem 2 (Soundness). If {X : t ≜ s}; ∅; Id =⇒* ∅; S; σ is a transformation sequence of G_A, then Xσ is a higher-order pattern in η-long β-normal form, Xσ ⪯_A t, and Xσ ⪯_A s.
Proof. It was shown in Baumgartner et al. (2017) that G_base is sound and always results in a higher-order pattern. The associative decomposition rules replace free variables with higher-order patterns in substitutions. Composition of pattern substitutions is again a pattern substitution. Therefore, the associative generalization algorithm also returns higher-order patterns. We prove the second part of the theorem by induction on the number of arguments of associative function constants appearing in t ≜ s. As a base case, assume that all occurrences of associative constants in t ≜ s have two arguments. Then the rules Dec-A-L and Dec-A-R are equivalent to the Dec rule. As the induction hypothesis (IH), assume soundness holds when all occurrences of associative constants in t ≜ s have ≤ n arguments. We show that it holds for n + 1. Let t ≜ s be of the form f(t1, ..., tm) ≜ f(s1, ..., sk) with max{m, k} ≤ n + 1, and let the associative constants occurring in t1, ..., tm, s1, ..., sk have at most n arguments. Any application of Dec-A-L or Dec-A-R will produce two AUPs for which the IH holds, and thus the theorem holds. We can extend this argument to an arbitrary number of associative constants with n + 1 arguments by another induction.
Theorem 3 (Completeness). Let λx⃗.t1 and λx⃗.t2 be higher-order terms and λx⃗.s be a higher-order pattern such that λx⃗.s is a generalization of both λx⃗.t1 and λx⃗.t2 modulo associativity. Then there exists a transformation sequence {X : t1 ≜ t2}; ∅; Id =⇒* ∅; S; σ of G_A such that λx⃗.s ⪯_A λx⃗.Xσ.
Proof. We can reason similarly to the previous proof. It was shown in Baumgartner et al. (2017) that G_base is complete. As a base case, assume that all occurrences of associative function constants in t1 ≜ t2 have two arguments. Then the rules Dec-A-L and Dec-A-R are equivalent to the Dec rule and completeness holds. When we have n + 1 arguments, there are n ways to group the arguments associatively, and the decomposition rules Dec-A-L and Dec-A-R allow one to consider all groupings.
If we wish to compute the complete set of lggs, we simply exhaust all possible applications of the above rules. However, for most applications, an "optimal" generalization is sufficient. We postpone this discussion until later sections.

Commutative decomposition rules
The decomposition rule for commutative symbols is also intuitive:
Dec-C: Commutative Decomposition
{X(x⃗) : f(t1, t2) ≜ f(s1, s2)} ⊎ A; S; σ =⇒ {Y1(x⃗) : t1 ≜ s_i, Y2(x⃗) : t2 ≜ s_{3−i}} ∪ A; S; σ{X ↦ λx⃗.f(Y1(x⃗), Y2(x⃗))},
where Ax(f) = {C}, i ∈ {1, 2}, and Y1 and Y2 are fresh variables of appropriate types.
We refer to the extension of G_base by the commutativity rule as G_C. To illustrate its use, let us consider the following example where Ax(f) = {C}:
Example 3. Let t = λx.λy.f(g(x, y), g(y, x)) and s = λz.λw.f(w, g(z, z)). The initial state is {X : t ≜ s}; ∅; Id. After applying the Abs rule twice, we reach a state from which there are two ways to proceed: applying Dec-C with i = 1 and with i = 2.

Associative-commutative decomposition rules
Unlike commutativity, which considers a fixed number of terms, and associativity, which enforces an ordering on terms, AC constants allow an arbitrary number of arguments with no fixed ordering on the terms. The corresponding decomposition rules take this into account:
Dec-AC-L: Associative-Commutative Decomposition Left
{X(x⃗) : f(t1, ..., tn) ≜ f(s1, ..., sm)} ⊎ A; S; σ =⇒ {Y1(x⃗) : f(t_{i_1}, ..., t_{i_l}) ≜ s_k, Y2(x⃗) : f(t_{i_{l+1}}, ..., t_{i_n}) ≜ f(s1, ..., s_{k−1}, s_{k+1}, ..., sm)} ∪ A; S; σ{X ↦ λx⃗.f(Y1(x⃗), Y2(x⃗))},
where Ax(f) = {A, C}, {i1, ..., in} ≡ {1, ..., n}, l ∈ {1, ..., n − 1}, k ∈ {1, ..., m}, n, m ≥ 2, and Y1 and Y2 are fresh variables of appropriate types.
Dec-AC-R: Associative-Commutative Decomposition Right
{X(x⃗) : f(t1, ..., tn) ≜ f(s1, ..., sm)} ⊎ A; S; σ =⇒ {Y1(x⃗) : t_k ≜ f(s_{j_1}, ..., s_{j_l}), Y2(x⃗) : f(t1, ..., t_{k−1}, t_{k+1}, ..., tn) ≜ f(s_{j_{l+1}}, ..., s_{j_m})} ∪ A; S; σ{X ↦ λx⃗.f(Y1(x⃗), Y2(x⃗))},
where Ax(f) = {A, C}, {j1, ..., jm} ≡ {1, ..., m}, l ∈ {1, ..., m − 1}, k ∈ {1, ..., n}, n, m ≥ 2, and Y1 and Y2 are fresh variables of appropriate types.
We refer to the extension of G_base by the AC decomposition rules as G_AC. To illustrate its use, let us consider the following example where Ax(f) = {A, C}:
Example 4. Let t = λx.λy.f(x, x, y) and s = λz.λv.f(v, v, z). Starting from the initial state {X : t ≜ s}; ∅; Id, after two applications of Abs, we reach the state {X'(x, y) : f(x, x, y) ≜ f(y, y, x)}; ∅; {X ↦ λx.λy.X'(x, y)}, which can be further transformed in multiple ways. We present two derivations. The first one starts with Dec-AC-L, where k = 3 (the position) and l = 1 (the subsequence length); the obtained lgg is λx.λy.f(x, y, Z(x, y)). The second derivation starts with a different Dec-AC-L choice; the second lgg is λx.λy.f(Y(x, y), Y(x, y), Y(y, x)).
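Analogously, one can enumerate the Dec-AC-L choices; the sketch below (our own encoding, assuming the rule shape displayed above; Dec-AC-R is symmetric and omitted) shows how the number of alternatives grows with subsets and positions:

```python
from itertools import combinations

def ac_decompositions(ts, ss):
    """Successor AUP pairs for Dec-AC-L on f(t1..tn) vs. f(s1..sm):
    choose a sublist of the t-arguments (order is irrelevant under AC)
    to anti-unify with one chosen s-argument; the remaining arguments
    on both sides form the second AUP."""
    out = []
    n = len(ts)
    for l in range(1, n):                       # subsequence length
        for idx in combinations(range(n), l):   # which t-arguments
            rest = [ts[i] for i in range(n) if i not in idx]
            for k in range(len(ss)):            # which s-argument
                out.append((([ts[i] for i in idx], [ss[k]]),
                            (rest, ss[:k] + ss[k + 1:])))
    return out

# Example 4: f(x, x, y) vs. f(y, y, x)
for p1, p2 in ac_decompositions(list("xxy"), list("yyx")):
    print(p1, p2)
```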
Again, termination, soundness, and completeness are easily extended to this case.

Generalization modulo U
A peculiarity of theories with unit elements is that terms with different heads may have nontrivial least general generalizations. For instance, the lgg of λx.f(a, x) and λx.a is λx.f(a, X(x)), if f has a unit element. (Otherwise, λx.X(x) would have been the lgg.) In order not to miss such generalizations, we should not use the Solve rule for the AUP X(x⃗) : t ≜ s if the head of t or of s is a constant f such that U ∈ Ax(f). Instead, the following expansion rules should be applied:

Exp-U-L: Expansion for Unit, Left
{X(x⃗) : t ≜ s} ⊎ A; S; σ =⇒ {X(x⃗) : f(t1, t2) ≜ s} ∪ A; S; σ,
where head(t) ≠ head(s) = f, U ∈ Ax(f), ε_f is the unit element of f, and f(t1, t2) ∈ {f(t, ε_f), f(ε_f, t)}.
Exp-U-R: Expansion for Unit, Right
{X(x⃗) : t ≜ s} ⊎ A; S; σ =⇒ {X(x⃗) : t ≜ f(s1, s2)} ∪ A; S; σ,
where head(s) ≠ head(t) = f, U ∈ Ax(f), ε_f is the unit element of f, and f(s1, s2) ∈ {f(s, ε_f), f(ε_f, s)}.
Extending the base algorithm with these rules (and modifying Solve as described above) gives an algorithm whose soundness is straightforward. Termination is also easy to see, because the expansion rules are to be followed by decomposition, and the problem becomes strictly smaller than it was before the expansion. However, it turns out that the algorithm is not complete, as the following example shows:
Example 5. Let Ax(f) = {U}, with ε_f the unit element of f, and consider terms t and s from the first-order fragment of the term language for which the algorithm computes the generalization f(g(Y1, Y2), Y3). With another term choice f(ε_f, g(a, a)) in Exp-U-L, we would get a derivation of a more general generalization f(X1, X2). However, even f(g(Y1, Y2), Y3) is not least general: it is strictly more general than f(g(f(Z1, Z2), Z1), Z3). To get convinced that the latter is indeed a generalization of t and s, one can exhibit the corresponding instantiating substitutions.
The problem highlighted in Example 5 is related to the fact that from two U-equigeneral terms Y and f(Z1, Z2), sometimes we have to choose one in the generalization and sometimes the other, depending on which variable can occur more than once in the generalization. However, if we compute linear generalizations (i.e., no variable occurring more than once and, hence, the Merge rule not applied), then there is no need to consider f(Z1, Z2) as an alternative to Y (as a generalization). Notice that in Example 5, Z1 has multiple occurrences, allowing one to add additional occurrences of f. While we do not go into detail in this paper concerning how to handle unit in the nonlinear case, we conjecture that a terminating complete algorithm is possible using a similar framework as was introduced in Cerna and Kutsia (2019b), where a terminating and complete algorithm was provided for idempotent generalization. In the linear case, the expansion rules cover all alternatives to "repair" head disagreement between two terms where the head of one of those terms has a unit element. Therefore, we call the algorithm G_U-lin.
For the general case, one might hope to take advantage of the unit element and generalize even arbitrary head-different terms. The following rule would deal with all such possibilities:

DH-U: Terms with Different Heads in the Unit Element Theory
{X(x⃗) : t ≜ s} ⊎ A; S; σ =⇒ {Y1(x⃗) : t ≜ ε_f, Y2(x⃗) : ε_f ≜ s} ∪ A; S; σ{X ↦ λx⃗.f(Y1(x⃗), Y2(x⃗))},
where head(t) ≠ head(s), U ∈ Ax(f), ε_f is the unit element of f, and Y1 and Y2 are fresh variables of appropriate types.
Extending G_U-lin with DH-U, we would be able to compute the lgg for the terms in Example 5, continuing the derivation that stopped there. While the use of DH-U can help to find lggs in the general case, as in this example, it has a serious drawback: if we have more than one function constant with a unit element, it will generate an infinite branch in the derivation tree. Along the branch, we will have generalizations, but all of them are U-equigeneral. In fact, one can notice that by the repeated application of DH-U with more than one unit element, the same AUPs (with fresh generalization variables) are generated over and over again. Therefore, with a simple loop check, or by bounding the derivation depth based on the size/depth of the original problem, we can obtain a terminating algorithm for the general case as well.
Nevertheless, to avoid such unpleasant consequences of using the DH-U rule, in the rest of the paper we restrict ourselves to the algorithm G_U-lin when the equational axioms involve U. Hence, we will be interested in computing linear generalizations for those theories.
Theorem 4 (Completeness of G_U-lin). Let t1 and t2 be terms of the same type and let s' be a linear generalization of t1 and t2 modulo U. Then G_U-lin computes a generalization s* of t1 and t2 such that s' ⪯_U s*.
Proof. We proceed by induction on n = depth(t1) + depth(t2).
Case 1: n = 2, that is, t1 and t2 are constants.
(a) If s' has depth 1, then s' is either a constant or a variable. In the former case, s' = t1 = t2; in the latter case, s' = X. Both t1 (when t1 = t2, by Dec) and X (by Sol) are computed by G_U-lin.
(b) Now assume as the IH that for every generalization s' of t1 and t2 of depth at most k, either s' ⪯ t1 and t1 = t2, or s' ⪯ X and t1 ≠ t2. We show that this holds for a generalization s' of depth k + 1. Let head(s') = f. Our assumptions imply that U ∈ Ax(f), because both t1 and t2 are of depth 1. Thus, s' = f(s'1, s'2). By the definition of a generalization, there must exist two substitutions σ1 and σ2 such that s'σ1 = t1 and s'σ2 = t2. If s'1σ1 = s'1σ2 = ε_f (resp. if s'2σ1 = s'2σ2 = ε_f), then s'2 (resp. s'1) is, by the IH, more general than t1 when t1 = t2, or more general than X when t1 ≠ t2. This implies, by the linearity assumption, that there exists a substitution ϑ such that s'2ϑ = s'2 and s'1ϑ = ε_f. Thus, s'ϑ =_U s'2, that is, s' ≺ s'2.
However, if s'2σ1 = ε_f and s'1σ2 = ε_f, or vice versa, then additional observations are required. We assume without loss of generality the former case.
If t1 = t2, then both s'1 and s'2 are generalizations of t1 ≜ t2, and by the IH, s'1 ⪯ t1 and s'2 ⪯ t1. If t1 ≠ t2, then we need to make a distinction: b1. If neither t1 nor t2 is the unit of function constants f_{t1} and f_{t2}, respectively, which may appear in s', then there exists a variable Y occurring in s'1 such that Yσ1 = t1 and a variable Y' occurring in s'2 such that Y'σ2 = t2. By the linearity of s', this implies that there exist two substitutions σ'1 and σ'2 which coincide everywhere with σ1 and σ2 except on Y and Y', respectively. That is, Yσ'1 = t2 and Y'σ'2 = t1. This implies that both s'1 and s'2 are generalizations of t1 ≜ t2 of depth ≤ k. Thus, s'1 ⪯ X and s'2 ⪯ X. b2. If either t1 or t2 is the unit of the function constants f_{t1} and f_{t2}, respectively, which may appear in s', then additional observations are necessary. If neither t1 nor t2 occurs in s', then we have the same situation as in b1. Otherwise, if f_{t1} occurs in s'1 (respectively, f_{t2} in s'2), then it must occur as the head symbol of a term with t1 as a subterm, because s'1σ2 = ε_f. This implies that there must be a variable Y in s'1 which σ1 maps to t1. The same can be said concerning s'2, t2, and σ2. We can construct a new substitution which coincides with σ1 (resp. with σ2) everywhere but on the variable Y (resp. Y'), which it maps to t2 (resp. to t1). This means that s'1 and s'2 are generalizations of t1 ≜ t2, and by the IH, s'1 ⪯ X and s'2 ⪯ X. This completes Case 1.
Case 2: n > 2.
(a) Let us assume that t1 = λy.t'1 and t2 = λz.t'2. Then it must be the case that s' = λy.s'' where s'' is a generalization of the AUP X(y) : t'1 ≜ t'2. Note that depth(t'1) + depth(t'2) < n, and thus, by the IH, there exists a generalization s* computable using the rules of G_U-lin such that s'' ⪯ s*. Thus, a generalization for the AUP t1 ≜ t2 may be computed using G_U-lin by first applying the Abs rule to t1 ≜ t2 and then computing s*. Thus, λy.s* is a generalization of t1 ≜ t2.
(b) Let us assume that t1 = f(w1, ..., wm) and t2 = f(r1, ..., rm), such that U ∉ Ax(f). Then, by applying the Dec rule to the AUP X : t1 ≜ t2, we get m AUPs X1 : w1 ≜ r1, ..., Xm : wm ≜ rm, each of which has a smaller depth sum. Thus, by the IH, for each generalization s'' generalizing Xi : wi ≜ ri, there exists a generalization s*_i, computed using G_U-lin, such that s'' ⪯ s*_i. Now let S*_i be the set of all such generalizations computed using G_U-lin. We may now define the set of generalizations S* = {f(s*_1, ..., s*_m) | s*_i ∈ S*_i for 1 ≤ i ≤ m}. Note that each term in S* is a generalization of X : t1 ≜ t2 computed using G_U-lin. Thus, any generalization s' of X : t1 ≜ t2 such that head(s') = f is more general than some generalization in S*, and we need only consider generalizations s' such that head(s') ≠ f. This implies that U ∈ Ax(head(s')). If s' does not contain f, then s' ⪯ X. Thus, let us assume that s' = g(s'1, s'2) where U ∈ Ax(g) and, without loss of generality, head(s'1) = f. This implies that s'2 ⪯ ε_g (note that s' is linear) and thus s' ⪯ s'1. This reduction can be performed inductively, thus showing that for any generalization s' with head(s') ≠ f, there exists s* ∈ S* such that s' ⪯ s*.
(c) Let us assume that t1 = f(w1, w2) and t2 = f(r1, r2), such that U ∈ Ax(f). Then, we can proceed in a similar fashion as in case (b) by constructing S*. Thus, any generalization s' of X : t1 ≜ t2 such that head(s') = f and s' = f(d1, d2), where d1 is a generalization of w1 ≜ r1 and d2 a generalization of w2 ≜ r2, is more general than some generalization in S*. When U ∈ Ax(head(s')) and some generalization s'' is a subterm of s' such that there exists s* ∈ S* with s'' ⪯ s*, a similar approach can be taken as in the second half of case (b).
(d) Let us assume that t1 = f(w1, ..., wm) and t2 = g(r1, ..., rm), where either U ∈ Ax(f) or U ∈ Ax(g), or both. By a single application of Exp-U-L or Exp-U-R, this case can be reduced to case (c).

Linear generalization modulo AU
When an associative function constant has a unit element, we cannot simply combine the associative decomposition and unit expansion rules. Such a combination would generalize, for instance, f(a, b) and f(b, a) by f(X, Y), but the lggs are f(X, a, Y) and f(X, b, Y). The problem is related to the fact that by A-decomposition (by the rules Dec-A-L and Dec-A-R), we cannot obtain AUPs which retain the first argument of a term on one side and an arbitrary term from the arguments on the other side.
The problem can be solved by special rules for AU-decomposition, which are used for those f's for which Ax(f) = {A, U}. However, for termination, we should make sure that they do not generate trivial AUPs of the form Y(x⃗) : ε_f ≜ ε_f. This is what the condition about nontriviality of new AUPs requires in the conditions below:
Dec-AU-L: Associative-Unit Decomposition Left
{X(x⃗) : f(t1, ..., tn) ≜ f(s1, ..., sm)} ⊎ A; S; σ =⇒ {Y1(x⃗) : f(t1, ..., tk) ≜ s1, Y2(x⃗) : f(tk+1, ..., tn) ≜ f(s2, ..., sm)} ∪ A; S; σ{X ↦ λx⃗.f(Y1(x⃗), Y2(x⃗))},
where Ax(f) = {A, U}, n, m ≥ 2, 0 ≤ k ≤ n, Y1 and Y2 are fresh variables of appropriate types, an empty argument segment stands for the unit element (f(t1, ..., t0) = f(tn+1, ..., tn) = ε_f), and the new AUPs are not trivial.
Dec-AU-R: Associative-Unit Decomposition Right
{X(x⃗) : f(t1, ..., tn) ≜ f(s1, ..., sm)} ⊎ A; S; σ =⇒ {Y1(x⃗) : t1 ≜ f(s1, ..., sk), Y2(x⃗) : f(t2, ..., tn) ≜ f(sk+1, ..., sm)} ∪ A; S; σ{X ↦ λx⃗.f(Y1(x⃗), Y2(x⃗))},
where Ax(f) = {A, U}, n, m ≥ 2, 0 ≤ k ≤ m, Y1 and Y2 are fresh variables of appropriate types, f(s1, ..., s0) = f(sm+1, ..., sm) = ε_f, and the new AUPs are not trivial. Note the difference from Dec-A-L and Dec-A-R: k is allowed to reach the boundaries; it can become 0, n, or m.
Soundness of the AU-decomposition rules is easy to see. As for termination, we may require that an application of a unit expansion rule is immediately followed by the application of an AU-decomposition rule. Since the latter does not generate trivial AUPs, the sizes of the new AUPs will be smaller than the one to which the unit expansion rule was applied, which implies termination.
Note that if we did not put the nontriviality restriction in the AU-decomposition rules, we could get an infinite derivation, repeatedly expanding with the unit element and splitting off trivial AUPs. Hence, to compute linear generalizations modulo AU, we extend G_U-lin by the AU-decomposition rules. We call this algorithm G_AU-lin. With the AU-decomposition rules, we obtain all possible decompositions. The unit expansion rules allow one to transform AUPs with mismatching head symbols into AUPs with matching head symbols when at least one of the head symbols is an AU-symbol. Therefore, by G_AU-lin, we will never miss an existing linear lgg of two terms. To illustrate the use of this extension, let us consider the following example where Ax(f) = {A, U}:
Example 6. Let t = λx.λy.f(x, x, y, y) and s = λz.λv.f(z, v, v). Starting from the initial state {X : t ≜ s}; ∅; Id, the algorithm derives the lgg λx.λy.f(x, Y(x, y), y, y). Notice that the AU-lin generalization λx.λy.f(x, Y(x, y), y, y) computed here is less general than the A-generalizations λx.λy.f(Y(x), y, y) and λx.λy.f(x, Y(x, y), y) computed in Example 2.

Linear generalization modulo CU
Generalization in a commutative theory with a unit element is simpler than the AU-generalization described above. The reason is in the Dec-C rule, which generates new AUPs from the arguments of the given AUP, removing the head symbol. The effect of its combination with the unit expansion rules is to anti-unify one argument from one side with the term on the other side, while the other argument is anti-unified with the unit element. These are exactly all the alternatives CU-generalization should consider. For a (linear) CU-generalization algorithm, we can add to G_U-lin the counterpart of the Dec-C rule, which is applied when Ax(f) = {C, U}, obtaining the algorithm G_CU-lin. Its soundness, termination, and completeness properties are straightforward. To illustrate the use of this extension, let us consider the following example where Ax(f) = {C, U}:
Example 7. Let t = λx.λy.f(g(x, y), g(y, x)) and s = λz.λv.g(z, z). Starting from the initial state {X : t ≜ s}; ∅; Id, the algorithm derives the lgg λx.λy.f(g(x, Y(x, y)), Z(x, y)).

Linear generalization modulo ACU
ACU-lin-generalization is characterized by the properties of both AU-lin- and CU-lin-generalizations. From AU-lin, it should inherit the condition that new AUPs are not trivial. It is similar to CU-lin in that the original decomposition does not need to be changed: since the order of arguments is not fixed, there is no problem in reaching the boundaries in the combination with the unit expansion rules (which was problematic in the A case). Therefore, the ACU-decomposition rules have the following form:
Dec-ACU-L: Associative-Commutative-Unit Decomposition Left
{X(x⃗) : f(t1, ..., tn) ≜ f(s1, ..., sm)} ⊎ A; S; σ =⇒ {Y1(x⃗) : f(t_{i_1}, ..., t_{i_l}) ≜ s_k, Y2(x⃗) : f(t_{i_{l+1}}, ..., t_{i_n}) ≜ f(s1, ..., s_{k−1}, s_{k+1}, ..., sm)} ∪ A; S; σ{X ↦ λx⃗.f(Y1(x⃗), Y2(x⃗))},
where Ax(f) = {A, C, U}, {i1, ..., in} ≡ {1, ..., n}, l ∈ {1, ..., n − 1}, k ∈ {1, ..., m}, n, m ≥ 2, Y1 and Y2 are fresh variables of appropriate types, and the new AUPs are not trivial.
Dec-ACU-R: Associative-Commutative-Unit Decomposition Right
{X(x⃗) : f(t1, ..., tn) ≜ f(s1, ..., sm)} ⊎ A; S; σ =⇒ {Y1(x⃗) : t_k ≜ f(s_{j_1}, ..., s_{j_l}), Y2(x⃗) : f(t1, ..., t_{k−1}, t_{k+1}, ..., tn) ≜ f(s_{j_{l+1}}, ..., s_{j_m})} ∪ A; S; σ{X ↦ λx⃗.f(Y1(x⃗), Y2(x⃗))},
where Ax(f) = {A, C, U}, {j1, ..., jm} ≡ {1, ..., m}, l ∈ {1, ..., m − 1}, k ∈ {1, ..., n}, n, m ≥ 2, Y1 and Y2 are fresh variables of appropriate types, and the new AUPs are not trivial.
Extending G_U-lin with Dec-ACU-L and Dec-ACU-R, we obtain an algorithm for linear ACU-generalization which we call G_ACU-lin. Its soundness is straightforward. Arguments for termination are similar to those for AU. For completeness, note that we essentially consider all decompositions with all permutations of arguments under ACU symbols, and the unit expansion rules introduce a unit element allowing comparison between the unit element and all terms occurring as arguments of an ACU symbol. Therefore, no linear lgg will be missed. To illustrate the use of this extension, let us consider the following example where Ax(f) = {A, C, U}:
Example 8. Let t = λx.λy.λz.f(x, z, y) and s = λx'.λy'.λz'.f(y', z', x', g(x')). Starting from the initial state {X : t ≜ s}; ∅; Id, the algorithm derives the lgg λx.λy.λz.f(x, y, z, Y(x, y, z)).

Combining different theories
Finally, we consider the general case, when different function constants satisfy associativity and/or commutativity and/or identity axioms. As in Alpuente et al. (2014), we can use all the rules above together. (In the presence of unit elements, we restrict ourselves to computing linear generalizations.) All rules, except DH-U, are local in the sense of Alpuente et al. (2014): they are local to the top function constant of the AUP they are acting upon, irrespective of what other constants and what other axioms may be present in the given alphabet and the equational theory. Such locality means that the rules are modular, and they do not change when new A and/or C and/or U constants are introduced.
It should also be mentioned that minimality was not our goal in the algorithms considered above. The obtained complete set of generalizations is not necessarily minimal; it can be minimized later. Also, the rules have not been optimized, and in general, the nondeterminism can be high. In the rest of the paper, we will try to reduce it, aiming at computing certain kinds of optimal solutions.

Toward Special Restrictions
This section is devoted to computing a special kind of "optimal" generalizations, which can be done more efficiently than in the general, unrestricted cases considered in the previous sections.
The idea is the following: the equational decomposition rules introduce branching in the search space. Each branch can be developed in linear time, but there can be too many of them. However, if the branching factor is bounded, we can choose one of the alternative states (produced by decomposition) based on some "optimality" criterion and develop only that branch. Such a greedy approach yields one "optimal" generalization.
While restricting the input terms enough will guarantee the production of generalizations in linear time, such generalizations are not necessarily useful, meaningful, or relevant. For example, we could allow function symbols with an equational theory but only consider terms which do not need the equational decomposition rules. In order to guarantee relevance, we want to find the least restrictive conditions on the input terms which have guaranteed complexity bounds and produce generalizations that are less general than the syntactic counterpart. In this sense, the restrictions given in the following sections are useful, meaningful, and relevant.
In order to have a "reasonable" computational complexity, we should be able to choose such an optimal state from "reasonably" many alternatives in "reasonable" time. Toward this goal, we start by introducing the concept of E-refined generalizations. They will be our main target to compute.
Definition 1 (E-refined generalization). Given two terms t and s and their E-generalizations r and r', we say that r is at least as good as r' with respect to E if either r' ⪯_E r or they are not comparable with respect to ⪯_E.
An E-generalization r of t and s is called their E-refined generalization iff r is at least as good (with respect to E) as a syntactic lgg of t and s.
In our equational theories, to obtain a syntactic generalization of two terms, we assume that all occurrences of associative symbols in the terms are associated to the right.

Example 9.
Let t = f(a, b, c, c, a) and s = f(d, d, b, b, c, a) be two terms, where f is associative. Their syntactic lgg (with f being right associated) is r = f(x, f(y, f(z, f(z, u)))), where x is a generalization of a and d, y of b and d, z of c and b, and u of a and f(c, a).
An A-refined generalization of t and s is r' = f(x', b, y', c, a), where x' generalizes a and f(d, d), and y' generalizes c and b. Note that r' is as good as r: they are incomparable with respect to ⪯_A.
We design the following general procedure to compute E-refined generalizations:
Refine: A procedure to compute E-refined generalizations
1: Let A; S; σ be a state containing an AUP P upon which equational decomposition rules can be applied in n different ways.
2: From A; S; σ, generate n new states A1; S1; σ1 through An; Sn; σn, which result from the various ways equational decomposition rules may be applied to P.
3: Find an lgg, denoted by l_i, for each A_i; S_i; σ_i using G_base.
4: Select the least general l_i (or one by some heuristic when multiple generalizations are least general) and choose A_i; S_i; σ_i as the successor state of A; S; σ.
5: If A_i; S_i; σ_i contains an AUP which may be equationally decomposed, then go back to step 1. Otherwise, apply a rule from G_base if possible, and repeat this step. If no rule applies, then exit.
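The following sketch renders the Refine loop in code. It is schematic: the five helper functions passed as parameters are hypothetical placeholders for the machinery described above (selecting a decomposable AUP, generating the successor states, running G_base to completion, comparing generality, and applying one base rule):

```python
def refine(state, pick_eq_aup, eq_successors, syntactic_lgg,
           generality_key, base_step):
    """Greedy Refine loop. pick_eq_aup returns an AUP of the state to
    which equational decomposition applies (or None); eq_successors
    returns the n successor states of step 2; syntactic_lgg develops a
    state with G_base only (step 3); generality_key orders the resulting
    lggs (step 4, smaller means less general); base_step applies one
    G_base rule or returns None."""
    while True:
        p = pick_eq_aup(state)
        if p is not None:                                 # steps 1-2
            succs = eq_successors(state, p)
            # steps 3-4: develop each branch with G_base and keep the
            # branch whose syntactic lgg is least general
            state = min(succs,
                        key=lambda st: generality_key(syntactic_lgg(st)))
            continue                                      # step 5: loop
        nxt = base_step(state)            # no equational AUP remains
        if nxt is None:
            return state                  # no rule applies: exit
        state = nxt
```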
Note that every syntactic generalization is also an E-refined generalization. Therefore, to guarantee that Refine indeed computes E-refined generalizations, we need to make sure that in step 2 the equational decomposition rules produce results among which at least one is as good as the result of syntactic decomposition. Another consequence of the definition of E-refined generalizations is that every element of the minimal complete set of E-generalizations (in our equational theories) of two terms t and s is an E-refined generalization of t and s. However, there might exist E-refined generalizations which do not belong to the minimal complete set of generalizations.
Example 10. Let us consider a simple first-order example of this property, namely, the AUP f(a, f(b, c)) ≜ f(c, f(a, b)), where Ax(f) = {A, C}. Note that the minimal complete set of AC-generalizations for this AUP contains one element, f(a, f(b, c)). Its syntactic generalization is f(x, f(y, z)), where x, y, and z are fresh variables. An example of an AC-refined generalization which is not in the minimal complete set of AC-generalizations is f(a, f(y, z)).
Looking back at the description of the Refine procedure, we can say that at each branching point, we aim at choosing the alternative that would lead to "the best" E-refined generalization. To limit the number of choices, we will need to identify restrictions on equational AUPs which would ensure a constant decomposition branching factor.
The concept of E-refined generalizations allows us to compute better generalizations than the base procedure would, without concerning ourselves with certain difficult-to-handle decompositions. We will outline what we mean by "difficult" in later sections. Some of these difficult decompositions can be handled by finding alignments between two sequences of symbols. These sequences are usually extracted as sequences of root symbols from the given AUPs, where the root of a term is defined as root(λx1, ..., xk.t) = λx1, ..., xk if k > 0, and root(λx1, ..., xk.t) = head(t) if k = 0. Note that we treat λx1, ..., xk as one symbol, and it is identified with any lambda prefix of k variables, for example, λx1, ..., xk = λy1, ..., yk.

Definition 2 (Pair of argument root sequences). The argument root sequence of a term h(t1, ..., tn) is the sequence of root symbols of its arguments, (root(t1), ..., root(tn)). This notion extends to AUPs: the pair of argument root sequences of an AUP X(x⃗) : t ≜ s, denoted pars(t, s), is the pair of argument root sequences of the terms t and s.
Definition 3 (Alignment, rigidity function). Let w1 and w2 be sequences of symbols. Then, the sequence a1[i1, j1] · · · an[in, jn], for n ≥ 0, is an alignment of w1 and w2 if
- the i's and j's are integers such that 0 < i1 < · · · < in ≤ |w1| and 0 < j1 < · · · < jn ≤ |w2|, and
- ak = w1|ik = w2|jk for all 1 ≤ k ≤ n.
The set of all alignments will be denoted by A. A rigidity function R is a function that returns, for every pair of sequences of symbols w 1 and w 2 , a set of alignments of w 1 and w 2 .
The main intuition behind the use of rigidity functions for generalization is to capture the structure (modulo a given rigidity property) of as many nonvariable terms as possible.
Example 11. Let us consider the two sequences of symbols (b, a, b, a) and (a, b, c, b, a). We can define a rigidity function that returns, for instance, the set of all LCSs, or the set of all singleton alignments, or something more specific: the rigidity function R_LCS for the given two symbol sequences selects the LCS alignment which is lexicographically smallest (with respect to the positions) among all such alignments. For example, b[1, 2]b[3, 4]a[4, 5] is lexicographically smaller than a[2, 1]b[3, 2]a[4, 5] because (1, 2) is lexicographically smaller than (2, 1).
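A brute-force rendering of R_LCS may look as follows. The function names are ours, and the exponential enumeration of alignments is for illustration only; an actual implementation would use dynamic programming:

```python
def alignments(w1, w2):
    """All alignments of w1 and w2 (Definition 3): sequences of position
    pairs, strictly increasing on both sides, whose symbols agree.
    Exponential brute force, for illustration only."""
    out = [()]
    def extend(prefix, i0, j0):
        for i in range(i0, len(w1)):
            for j in range(j0, len(w2)):
                if w1[i] == w2[j]:
                    a = prefix + ((w1[i], i + 1, j + 1),)  # 1-based positions
                    out.append(a)
                    extend(a, i + 1, j + 1)
    extend((), 0, 0)
    return out

def r_lcs(w1, w2):
    """R_LCS: among the longest alignments (LCSs), return the one that is
    lexicographically smallest with respect to its position pairs."""
    als = alignments(w1, w2)
    k = max(len(a) for a in als)
    longest = [a for a in als if len(a) == k]
    return min(longest, key=lambda a: [(i, j) for (_, i, j) in a])

print(r_lcs(("b", "a", "b", "a"), ("a", "b", "c", "b", "a")))
# -> (('b', 1, 2), ('b', 3, 4), ('a', 4, 5)), i.e., b[1,2]b[3,4]a[4,5]
```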
The function R_LCS can likewise be applied to pairs such as ((c, b, a, b, a), (a, b, c, b, a)), ((a, b, c, b, a), (c, b, a, b, a)), and ((b, a, b, a), (a, b, c, b, a)).
There is a subset of AUPs, referred to as 1-determined AUPs, which have interesting E-refined generalizations computable in linear time. The number 1 means that the equational decomposition can be done in only one possible way. Hence, there is no branching in equational anti-unification rule applications (n = 1 in step 2 of Refine). The more general k-determined AUPs allow a bounded number of possible choices, that is, k choices, whenever equational decomposition may be applied (n ≤ k in step 2 of Refine). Even for 2-determined AUPs, computing the set of lggs is of exponential complexity. Therefore, we introduce the notion of (R, C, G)-optimal generalizations, where R is a rigidity function, C is a choice function picking one of the available decompositions, and G is the particular algorithm for which we are defining optimality. Under such optimality conditions, we will see later that we are able to compute an E-refined generalization in quadratic time for (uniformly) k-determined AUPs and in cubic time for arbitrary AUPs with associative constants.
The equational decomposition rules above are too nondeterministic, and the computed set of generalizations has to be minimized to obtain minimal complete sets of generalizations. However, even if we performed more guided decompositions, obtaining, for example, terms with the same root in new AUPs (as in Kutsia et al. (2014)), there would still be alternatives. For instance, consider an AUP f(t1, ..., ti, ..., tj, ..., tn) ≜ f(s1, ..., si, ..., sj, ..., sm), where f is associative. Now let root(ti) = root(sj), root(si) = root(tj), and for every other term comparison whose index is ≤ j, the root symbols are not equivalent. An example of such a situation, as two sequences of root symbols, is (c, c, c, a, c, b, c, c, c) and (d, d, d, b, d, a, d, d, d). This situation results in two singleton alignments, a[4, 6] and b[6, 4]. Note that any application of associative decomposition will have to choose between these alignments; that is, choosing one gives two AUPs in which the other alignment does not appear. It is not clear from the available information which choice will lead to a less general generalization. While this situation illustrates what happens when there are two alignments to choose from, it easily generalizes to k different possible alignments: for example, (c, c, c, a, c, b, c, c, c) and (d, d, d, b, a, a, d, d, d), which contain the singleton alignments a[4, 5], a[4, 6], and b[6, 4].
Definition 4 (Maximal alignment). An alignment a = a1[i1, j1] · · · an[in, jn] is called an extension of an alignment b if b is obtained from a by removing some letters a_{k_1}[i_{k_1}, j_{k_1}], ..., a_{k_r}[i_{k_r}, j_{k_r}], {k1, ..., kr} ⊆ {1, ..., n}. It is a proper extension if r > 0.
An alignment a is a maximal alignment of two symbol sequences w 1 and w 2 , if no proper extension of a is an alignment of w 1 and w 2 . The set of maximal nonempty alignments of w 1 and w 2 is denoted by max-ne-align(w 1 , w 2 ).
Example 12. The sequences (a, b, a) and (c, a, b, c) have two maximal alignments: a[1, 2]b[2, 3] and a[3, 2]. The first one is even the LCS of the given sequences, while the second one is not.
Definition 5 (k-determined sequence pair). A pair of sequences of symbols w1, w2 is called k-determined if max-ne-align(w1, w2) contains at most k elements.
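The following sketch (again brute force, with our own helper names) enumerates the maximal nonempty alignments and checks k-determinedness on the data of Example 12:

```python
def alignments(w1, w2):
    # as in the previous sketch: all alignments, built left to right
    out = [()]
    def extend(prefix, i0, j0):
        for i in range(i0, len(w1)):
            for j in range(j0, len(w2)):
                if w1[i] == w2[j]:
                    a = prefix + ((w1[i], i + 1, j + 1),)
                    out.append(a)
                    extend(a, i + 1, j + 1)
    extend((), 0, 0)
    return out

def is_extension(big, small):
    """big extends small if small is a subsequence of big (Definition 4)."""
    it = iter(big)
    return all(x in it for x in small)

def max_ne_align(w1, w2):
    """All maximal nonempty alignments: those with no proper extension."""
    als = [a for a in alignments(w1, w2) if a]
    return [a for a in als
            if not any(len(b) > len(a) and is_extension(b, a) for b in als)]

def k_determined(w1, w2, k):
    return len(max_ne_align(w1, w2)) <= k

# Example 12: exactly two maximal alignments, a[1,2]b[2,3] and a[3,2]
print(max_ne_align(("a", "b", "a"), ("c", "a", "b", "c")))
print(k_determined(("a", "b", "a"), ("c", "a", "b", "c"), 2))  # True
```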
Obviously, w1, w2 is k-determined iff w2, w1 is k-determined. The definition implies that any k-determined pair of sequences is also n-determined for any n ≥ k.
Definition 6 (Uniform and max-uniform alignments). Let a = a1[i1, j1] · · · an[in, jn] be an alignment of two symbol sequences w1 = (l1, ..., lm) and w2 = (r1, ..., rk). We say that a is a uniform alignment of w1 and w2 if the following conditions are satisfied: (1) i1 = 1 iff j1 = 1; (2) in = m iff jn = k; (3) for all 1 ≤ p < n, i_{p+1} = i_p + 1 iff j_{p+1} = j_p + 1. A uniform alignment a of w1 and w2 is their maximal uniform alignment (shortly, max-uniform alignment) if no proper extension of a is a uniform alignment of w1 and w2. The set of all nonempty max-uniform alignments of w1 and w2 is denoted by max-unif-ne-align(w1, w2).
The first condition of uniformity forbids the first element of one sequence from being aligned with a non-first element of the other sequence. The second condition is dual to the first one, putting a similar requirement on the last elements of the given sequences. The third condition guarantees that consecutive elements of w1 are aligned with consecutive elements of w2 and vice versa.
The empty alignment is always uniform. It is the trivial uniform alignment.
Example 14. The sequences (a, b, a) and (c, a, b, c) have exactly one max-uniform alignment, b[2, 3]: the alignment a[1, 2]b[2, 3] violates condition (1), and a[3, 2] violates condition (2).
Definition 7 (Uniformly k-determined sequence pair). A pair of sequences of symbols w1, w2 is called uniformly k-determined if max-unif-ne-align(w1, w2) contains at most k elements.
Example 15. We illustrate uniformly k-determined pairs with the examples below and compare them with the k-determined pairs from Example 13:
1. (a, b, c), (a, d, b, c) is uniformly 2-determined (1-determined in Example 13), because max-unif-ne-align((a, b, c), (a, d, b, c)) = {a[1, 1]c[3, 4], b[2, 3]c[3, 4]} contains two elements.
2. (a, b, a), (c, a, b, c) is uniformly 1-determined (2-determined in Example 13), because max-unif-ne-align((a, b, a), (c, a, b, c)) = {b[2, 3]} contains one element.
3. (a, c, c, b, a, c), (a, d, b, a, c) is uniformly 1-determined (3-determined in Example 13), because max-unif-ne-align((a, c, c, b, a, c), (a, d, b, a, c)) = {a[1, 1]b[4, 3]a[5, 4]c[6, 5]} contains one element.
We will also need orderless counterparts of Definitions 4-7. They deal with alignments in which the order of symbols does not matter and which, thus, can be considered as multisets.
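A direct rendering of the three uniformity conditions of Definition 6 as a predicate (our own encoding of alignments as tuples of (symbol, i, j) triples) may look as follows:

```python
def is_uniform(a, w1, w2):
    """Uniformity check for an alignment a = ((sym, i, j), ...) of w1
    and w2 (Definition 6): first elements, last elements, and
    consecutive elements must be aligned 'in step' on both sides."""
    if not a:
        return True                        # the trivial uniform alignment
    first, last = a[0], a[-1]
    if (first[1] == 1) != (first[2] == 1):            # condition (1)
        return False
    if (last[1] == len(w1)) != (last[2] == len(w2)):  # condition (2)
        return False
    for (_, i, j), (_, i2, j2) in zip(a, a[1:]):      # condition (3)
        if (i2 == i + 1) != (j2 == j + 1):
            return False
    return True

w1, w2 = ("a", "b", "a"), ("c", "a", "b", "c")
# a[1,2]b[2,3] violates condition (1): i1 = 1 but j1 = 2
print(is_uniform((("a", 1, 2), ("b", 2, 3)), w1, w2))  # False
# b[2,3] is uniform (and is the only max-uniform alignment, Example 14)
print(is_uniform((("b", 2, 3),), w1, w2))              # True
```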
Definition 8 (Orderless alignment, orderless rigidity function). Let w1 and w2 be sequences of symbols. Then the multiset {{a1[i1, j1], ..., an[in, jn]}}, for n ≥ 0, is an orderless alignment of w1 and w2 if
- the i's and j's are integers such that 0 < ik ≤ |w1| and 0 < jk ≤ |w2| for all 1 ≤ k ≤ n,
- ik ≠ il and jk ≠ jl for all k ≠ l, and
- ak = w1|ik = w2|jk for all 1 ≤ k ≤ n.
An orderless rigidity function R_O returns, for every pair of sequences of symbols, a set of their orderless alignments.
Example 16. Let w1 = (a, b, a) and w2 = (c, a, b, c). Then, for instance, {{a[1, 2], b[2, 3]}}, {{a[3, 2], b[2, 3]}}, and {{a[1, 2]}} are orderless alignments of w1 and w2; note that the second one is not an alignment in the sense of Definition 3.
The notions of alignment extension and maximal alignment easily extend to orderless alignments:
Definition 9 (Maximal orderless alignment). Let a and b be two orderless alignments. We say that a is an extension of b if b is obtained from a by removing some elements, and a proper extension if at least one element is removed. An orderless alignment a is a maximal orderless alignment of two symbol sequences w1 and w2 if no proper extension of a is an orderless alignment of w1 and w2. The set of all maximal nonempty orderless alignments of w1 and w2 is denoted by oless-max-ne-align(w1, w2).
Obviously, maximal orderless alignments of w 1 and w 2 consist of symbols from the intersection of multisets of symbols from w 1 and w 2 .
Definition 10 (O-k-determined sequence pair). A pair of sequences of symbols w1, w2 is called O-k-determined if oless-max-ne-align(w1, w2) contains at most k elements. Example 17 illustrates the notion of O-k-determinedness; note that the first six pairs it considers were also used in Example 13.

Definition 11 (Uniform and max-uniform orderless alignments). Uniform orderless alignments of w1 and w2 are defined by imposing on orderless alignments conditions analogous to those of Definition 6. A uniform orderless alignment a of w1 and w2 is their maximal uniform orderless alignment (shortly, max-uniform orderless alignment) if no proper extension of a is a uniform orderless alignment of w1 and w2. The set of all max-uniform nonempty orderless alignments of w1 and w2 is denoted by oless-max-unif-ne-align(w1, w2).
Similarly to uniform alignments, uniform orderless alignments prevent "misalignments" of the type empty-vs-nonempty sequences in w 1 and w 2 .
Definition 12 (Uniformly O-k-determined sequence pair). A pair of sequences of symbols w 1 , w 2 is called uniformly O-k-determined, if oless-max-unif-ne-align(w 1 , w 2 ) contains at most k elements.
Example 18. All sequence pairs that were O-k-determined in Example 17 (except the first pair) are also uniformly O-k-determined for the same k. As for the first pair, oless-max-unif-ne-align((a, b, c), (a, d, b, c)) has to be computed directly from the definitions.
Next, from symbol sequence pairs, we move to term pairs and define the notions of k-determinedness and uniform k-determinedness for them. Note that these definitions accommodate the corresponding orderless cases as well.

Definition 13 (k-determined and uniformly k-determined term pair). A pair of terms t, s of the same type is (uniformly) k-determined iff either
- head(t) ≠ head(s), or
- head(t) = head(s) and Ax(head(t)) = ∅, or
- head(t) = head(s) = f, ∅ ≠ Ax(f) ⊆ {A, U}, and pars(t, s) is (uniformly) k-determined, or
- head(t) = head(s) = f, C ∈ Ax(f), and pars(t, s) is (uniformly) O-k-determined.
Finally, we formulate the main definition of this section, defining two notions: total k-determined and total uniformly k-determined term pairs. They will play an important role in characterizing special cases of equational higher-order generalization.
Definition 14 (Total (uniformly) k-determined term pair). A pair of terms t, s of the same type is total (uniformly) k-determined iff it is (uniformly) k-determined and this property holds recursively for the term pairs induced by the alignments considered in Definition 13: via max-ne-align (resp. max-unif-ne-align) of pars(t, s) when Ax(head(t)) ⊆ {A, U}, and via oless-max-ne-align (resp. oless-max-unif-ne-align) of pars(t, s) when C ∈ Ax(head(t)).
We say that an AUP X(x⃗) : t ≜ s is total k-determined (resp. total uniformly k-determined) if the term pair t, s is total k-determined (resp. total uniformly k-determined).
As one can see from Definition 14, the first item concerns the four theories without commutativity (∅, A, U, AU), and the second one the other four theories with commutativity (C, AC, CU, ACU).

Proposition 1. For a given constant k, the complexity of checking whether the term pair t, s is (uniformly) k-determined is O(n^2), and whether it is total (uniformly) k-determined is O(k^n n^2), where n is the maximum of the lengths of t and s.
Proof. Checking whether a pair of terms t, s is k-determined requires computing the set max-ne-align(pars(t, s)), which in the worst case requires O(n^2) time. When checking total (uniform) k-determinedness, we need to repeat this computation recursively over the term pairs resulting from the alignments of max-ne-align(pars(t, s)). If we assume that the maximum depth of t and s is also n, the resulting complexity is O(k^n n^2).

Associative and Associative-Unit Generalization: Special Restrictions and Optimality

Below, we introduce a special restriction of associative and associative-unit generalization based on the concepts introduced in the previous section. Furthermore, we introduce so-called preferred choice functions, which allow us to circumvent parts of the given AUP for which computation of a generalization is expensive and can be avoided in the search for a generalization which is at least as good as the syntactic one.

1-determined associative and associative-unit generalization
We start with defining a strategy of applying associative decomposition rules guided by a given maximal alignment. Assume that we are given the state State = A; S; σ, where A = {X(x⃗) : f(t1, ..., tn) ≜ f(s1, ..., sm)} ⊎ A' for an associative (resp. associative-unit) f, and a max-uniform alignment (resp. a maximal alignment) of pars(f(t1, ..., tn) ≜ f(s1, ..., sm)) has the form a = g1[i1, j1] · · · gk[ik, jk].
Let us denote X(x⃗) : f(t1, ..., tn) ≜ f(s1, ..., sm) by P. Recall that it is in flattened form. For a number l < min(i1, j1), define a − l as the alignment g1[i1 − l, j1 − l] · · · gk[ik − l, jk − l]. The strategy of eliminating the first alignment element g1[i1, j1] from a is defined below. The Y's and Z's are fresh variables of appropriate types. For simplicity, we show only P and its successors.
Case 1: i1 = j1 ≥ 1. The prefixes t1, ..., t_{i1−1} and s1, ..., s_{j1−1} are split off pairwise, and the aligned arguments t_{i1} and s_{j1} are paired by a final decomposition step.
Case 2: i1 > 1, j1 > 1, i1 < j1. The prefixes are split off pairwise, except that the last element of the shorter prefix, t_{i1−1}, is grouped against f(s_{i1−1}, ..., s_{j1−1}); then the aligned arguments are paired as before.
Case 3: i1 > 1, j1 > 1, i1 > j1. Symmetric to Case 2: s_{j1−1} is grouped against f(t_{j1−1}, ..., t_{i1−1}).
When f is AU and a is a uniform alignment, the process is similar, but we may have one extra case, when min(i1, j1) = 1 and max(i1, j1) > 1. In this case, the applied rule is Dec-AU-L or Dec-AU-R, introducing an AUP with ε_f on one of its sides.
Hence, after these transformations, we get a new state State1 containing a new problem P1 = Z1(x⃗) : f(t_{i1+1}, ..., tn) ≜ f(s_{j1+1}, ..., sm). We also obtain a new alignment a1 = a − min(i1, j1). Repeating the same process k − 1 times, we end up with the state State_k containing a new problem P_k = Z_k(x⃗) : f(t_{ik+1}, ..., tn) ≜ f(s_{jk+1}, ..., sm). The alignment is now empty. Therefore, we can decompose P_k as follows:
- If n − i_k = m − j_k, then apply Dec n − i_k times.
- If n − i_k < m − j_k, then apply Dec n − i_k − 1 times, followed by an application of Dec-A-R.
- If n − i_k > m − j_k, then apply Dec m − j_k − 1 times, followed by an application of Dec-A-L.
(When f is an AU-symbol, P_k may have ε_f on one of its sides. In this case, no decomposition applies.) We end up with the state State_{k+1}, which is the result of the total decomposition of P. In State_{k+1}, we got rid of at least k occurrences of f compared to State. Moreover, since a was a max-uniform or maximal alignment, those AUPs in State_{k+1} which do not correspond to any element of a do not have the same root. In the associative case, the only rule that applies to them is Solve. In the associative-unit case, besides Solve, the unit expansion or DH-U rules may also apply, but since we aim at computing E-refined generalizations, it is enough to transform them by Solve. It means that we can apply a sequence of Solve rules to State_{k+1}, which keeps in A only those AUPs whose root was one of the g's from a. The other AUPs move to the store. Let the obtained state be State_{k+2} = A_{k+2}; S_{k+2}; σ_{k+2}.
Take Y(x⃗) : t ≜ s ∈ A_{k+2}. If t ≠ ε_f, then t was a subterm of f(t1, ..., tn). The same is true for s and f(s1, ..., sm). Let P − A_{k+2} be the AUP obtained from P by replacing all such subterms (i.e., the nonunit sides of AUPs from A_{k+2}) by some constants. Let us call this operation the subtraction from P of the AUPs in A_{k+2}.
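To make the total decomposition concrete, the following sketch (our own first-order helper, ignoring unit elements) splits two flattened argument lists along a given alignment. With the data of Example 9 and the maximal alignment b[2, 3]c[4, 5]a[5, 6], it produces exactly the argument pairings of the A-refined generalization f(x', b, y', c, a) from that example:

```python
def split_by_alignment(ts, ss, alignment):
    """Pair up segments of two flattened argument lists according to an
    alignment ((g, i, j), ...) with 1-based positions: aligned arguments
    are matched directly, and the gaps between them become AUPs whose
    roots disagree (so they can only be solved)."""
    pairs, pi, pj = [], 0, 0
    for _, i, j in alignment:
        if ts[pi:i - 1] or ss[pj:j - 1]:          # gap before the match
            pairs.append((ts[pi:i - 1], ss[pj:j - 1]))
        pairs.append(([ts[i - 1]], [ss[j - 1]]))  # aligned arguments
        pi, pj = i, j
    if ts[pi:] or ss[pj:]:                        # trailing gap
        pairs.append((ts[pi:], ss[pj:]))
    return pairs

# Example 9: f(a, b, c, c, a) vs. f(d, d, b, b, c, a)
print(split_by_alignment(list("abcca"), list("ddbbca"),
                         (("b", 2, 3), ("c", 4, 5), ("a", 5, 6))))
# [(['a'], ['d', 'd']), (['b'], ['b']), (['c'], ['b']),
#  (['c'], ['c']), (['a'], ['a'])]
```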
Theorem 5. Given a state A; S; σ, where all AUPs in A are total uniformly 1-determined (resp. total 1-determined) associative (resp. associative-unit) generalization problems and the size of A is n, we can reach ∅; S'; σ'
- in time O(n), if all the max-uniform (resp. maximal) nonempty alignments are given;
- in time O(n^3), if the max-uniform (resp. maximal) nonempty alignments are to be computed.
Proof. First, consider the associative case and assume all max-uniform alignments are given. Since the problem is total uniformly 1-determined, all those max-unif-ne-align sets are singletons. For each AUP P in A, we have the following:
- If the root of P is not associative, then using the Dec, Abs, and Solve rules, we either eventually eliminate all the successor problems of P from A in O(|P|) steps or reach a new problem P' with an associative head in O(|P − {P'}|) steps, where P − {P'} is obtained by subtracting the AUP P' from P.
- If the root of P is associative, then by the decomposition procedure outlined above, in linearly many steps in the size of P, we either eliminate all successor problems of P from A in O(|P|) steps or reach a new problem P' with an associative head in O(|P − {P'}|) steps.
It implies that eventually eliminating all AUPs that originate from P takes O(|P|) time. Therefore, eliminating all AUPs from A needs O(n) time, where n is the size of A. Now assume the max-uniform alignments are to be computed. We do this each time we encounter an AUP with an associative head. For each such AUP, there is at most one max-uniform nonempty alignment. It can be computed in time quadratic in the size of the AUP by dynamic programming. Since the number of steps where we need to apply these computations is bounded linearly in n, we obtain O(n^3) running time in this case.
The associative-unit case can be proved analogously.
Based on Theorem 5, we obtain the following theorems:
Theorem 6. A higher-order {A}-refined (resp. {A, U}-refined) pattern generalization for a total uniformly 1-determined (resp. total 1-determined) AUP, where all max-uniform (resp. maximal) alignments are given, can be computed in time O(n), where n is the size of the AUP.
Proof. We use the construction outlined in the proof of Theorem 5, by which we can reach the state ∅; S; σ from the initial one in O(n) time. From ∅; S; σ to the final answer, we proceed as in Baumgartner et al. (2017), which gives overall O(n) runtime complexity. The obtained generalization is a refined generalization because the involved alignments are maximal, which ensures that all nonvariable subterms (i.e., those that are not η-equivalent to generalization variables) of the syntactic lgg occur also in the generalization we compute. The only case to be discussed is when our generalization contains no other nonvariable subterms. In that case, those AUPs that are generalized by a variable in the lgg end up in the store of our derivation, and then the Merge rule makes sure that the computed generalization is as good as the syntactic lgg. Note that for {A, U}-refined generalizations, we do not require linearity. Although we do not use the Exp-U-L, Exp-U-R, and DH-U rules for them (the syntactic lgg would anyway use variables to generalize the AUPs to which those rules apply), the merging rule is not forbidden.

Theorem 7. A higher-order {A}-refined (resp. {A, U}-refined) pattern generalization for a total uniformly 1-determined (resp. total 1-determined) AUP can be computed in time O(n^3), where n is the size of the AUP.
Proof. By Theorem 5, reaching ∅; S; σ from the initial state requires O(n^3) time. From ∅; S; σ, getting to the final answer needs linear time by the algorithm from Baumgartner et al. (2017), which gives the total O(n^3) running time. Proving the refinement part is similar to Theorem 6.
In the next section, we consider AUPs which are uniformly k-determined for k > 1 but not uniformly (k − 1)-determined. This will require a new concept of optimality based on a choice function greedily applied during decomposition.

Choice functions and optimality
In this section, we introduce procedures and optimality conditions for total uniformly k-determined AUPs where k > 1, that is, AUPs where there are at most k ways to apply equational decomposition.
If we were to compute the set of E-refined generalizations for a total uniformly k-determined AUP by testing every decomposition, then even for k = 2 the search space would be too large to handle efficiently. However, we can find an (R, C, G)-optimal E-refined generalization (precisely defined below) in quadratic time, where R is a rigidity function, C is an R-choice function, and G is a set of state transformation rules. Essentially, (R, C, G)-optimality means that the R-choice function chooses the "right" computation path via G, based on the rigidity function R. The effect is that we reduce the problem for total uniformly k-determined AUPs to the case of total uniformly 1-determined AUPs, with the additional cost of computing the choice function at each step. We will provide a choice function with linear time complexity, based on the procedure G_base.
In this paper, we use a specific type of choice function, defined as follows.

Definition 20 ((R, C, G)-optimal generalization). Let A be {X : t ≜ s}, let R be a rigidity function, C an R-choice function, and G ⊇ G_base a set of state transformation rules. We say that a generalization r of the terms t and s is an (R, C, G)-optimal generalization if r = Xσ, where σ results from a derivation A; ∅; Id =⇒* ∅; S; σ using the rules of G, in which every decomposition is either syntactic or is performed with respect to the alignment computed by the choice function C.
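The following Python skeleton is a hypothetical rendering of the control structure behind this definition: the derivation proceeds greedily, consulting the rigidity function for candidate alignments and the choice function whenever there is more than one. The rule applications are passed in as callables, since the actual rules are the ones defined in the paper; all names here are illustrative, not part of the formal development.

```python
def greedy_derivation(aups, decompose, rigidity, choose):
    """One derivation A; {}; Id ==>* {}; S; sigma without backtracking:
    every equational decomposition uses the alignment selected by the
    R-choice function."""
    store, subst = [], {}
    while aups:
        p = aups.pop()
        alignments = rigidity(p)          # at most k candidate alignments
        if not alignments:                # no equational step applies: solve p
            store.append(p)
            continue
        # the R-choice function picks the preferred alignment
        a = alignments[0] if len(alignments) == 1 else choose(p, alignments)
        aups.extend(decompose(p, a, subst))
    return store, subst

# Trivial usage with stub rules: every AUP is moved to the store unchanged.
print(greedy_derivation([("X", "t", "s")],
                        decompose=lambda p, a, s: [],
                        rigidity=lambda p: [],
                        choose=None))   # -> ([('X', 't', 's')], {})
```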
All the definitions in this section can be used with uniform and orderless alignments as well. It is straightforward to obtain the corresponding variants and, therefore, we have not spelled them out explicitly.
In the following subsection, we show how the above definitions can lead to a more general result (compared to the one in the previous section) concerning associative generalization.

k-determined associative and associative-unit generalization
First, we generalize Theorem 5 from 1-determined to k-determined AUPs.
Theorem 8. Given a state A; S; σ, where all AUPs in A are total uniformly k-determined (resp. total k-determined) associative (resp. associative-unit) generalization problems with k > 1 and the size of A is n, we can reach ∅; S'; σ'
- in time O(n^2), if all the max-uniform (resp. maximal) nonempty alignments are given;
- in time O(n^3), if the max-uniform (resp. maximal) nonempty alignments are to be computed.
Proof. Case 1, alignments are given: The difference with Theorem 5 is that in the k-determined case, when decomposing an AUP with an associative head, we have to select the preferred alignment among the at most k given alternatives. The choice requires running the linear-time algorithm G_base k times. Hence, each choice is made in linear time (in contrast to the 1-determined case, where no choice was needed). Consequently, we get quadratic running time. This reasoning applies to both the uniform and the general case.
Case 2, alignments are to be computed: When an AUP with an associative head is decomposed, we first need to compute the alignments and then choose the preferred one among them. The running time of these two consecutive operations is dominated by the alignment computation, which is quadratic (at most k alignments, each computed in quadratic time), in contrast to the linear-time choice in the second step. Hence, we get cubic running time. This reasoning applies to both the uniform and the general case.
Our rigidity function R_A takes a pair of symbol sequences and returns the set of their max-uniform nonempty alignments: R_A(w_1, w_2) := max-unif-ne-align(w_1, w_2).
The preferred (R_A, G_A)-choice function uses the linear-time procedure G_base to choose the preferred alignment a_min among the possible candidates. Notice that we use associative decomposition in the derivation {P}; ∅; Id =⇒*_{G_A} dec(P, a); S'; σ' and syntactic decomposition in the derivation dec(P, a); S'; σ' =⇒*_{G_base} ∅; S; σ_a. When we move to the AU case, we remove the uniformity restriction and consider a different rigidity function: R_AU(w_1, w_2) := max-ne-align(w_1, w_2). We denote by G_AU the algorithm obtained by extending G_base with the Dec-AU-L and Dec-AU-R rules.

Theorem 9. An (R_A, PC_{(R_A,G_A)}, G_A)-optimal higher-order {A}-refined pattern generalization for a total uniformly k-determined AUP, k > 1, when all the alignments are given, can be computed in time O(n^2), where n is the size of the AUP.
An (R_AU, PC_{(R_AU,G_AU)}, G_AU)-optimal higher-order {A, U}-refined pattern generalization for a total k-determined AUP, k > 1, when all the alignments are given, can be computed in time O(n^2), where n is the size of the AUP.
Theorem 10. An (R_A, PC_{(R_A,G_A)}, G_A)-optimal higher-order {A}-refined pattern generalization for a total uniformly k-determined AUP, k > 1, can be computed in time O(n^3), where n is the size of the AUP.
An (R_AU, PC_{(R_AU,G_AU)}, G_AU)-optimal higher-order {A, U}-refined pattern generalization for a total k-determined AUP, k > 1, can be computed in time O(n^3), where n is the size of the AUP.
Note that we can achieve the same results even if the AUP is not total (uniformly) k-determined, provided that our rigidity functions compute at most k alignments. For instance, we can define R_A(w_1, w_2) as a set consisting of at most k max-uniform nonempty alignments of w_1 and w_2, which we denote by max-unif-ne-align_k: R_A(w_1, w_2) := max-unif-ne-align_k(w_1, w_2). Similarly, we can define R_AU(w_1, w_2) := max-ne-align_k(w_1, w_2). Then both Theorem 9 and Theorem 10 hold for such rigidity functions without requiring that the AUPs be k-determined. Moreover, if we take k = 1, then we obtain counterparts of Theorem 6 and Theorem 7 without requiring that the AUPs there be total (uniformly) 1-determined.
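A bounded rigidity function of this kind could be sketched as follows: enumerate maximal-length alignments by a standard LCS backtrack, but stop as soon as k distinct ones have been collected. This is an assumption-laden toy of our own (uniformity is again ignored, and `best` memoizes suffix LCS lengths); it only illustrates how the k-bound keeps the search space fixed.

```python
from functools import lru_cache

def max_alignments_k(w1, w2, k):
    """Collect at most k distinct maximal (longest) nonempty alignments."""
    n, m = len(w1), len(w2)

    @lru_cache(maxsize=None)
    def best(i, j):  # length of a longest alignment of w1[i:] and w2[j:]
        if i == n or j == m:
            return 0
        r = max(best(i + 1, j), best(i, j + 1))
        if w1[i] == w2[j]:
            r = max(r, 1 + best(i + 1, j + 1))
        return r

    out = []

    def walk(i, j, acc):
        if len(out) >= k:
            return                      # the k-bound: stop enumerating
        if best(i, j) == 0:
            if acc and acc not in out:  # keep nonempty, distinct alignments
                out.append(acc)
            return
        if w1[i] == w2[j] and best(i, j) == 1 + best(i + 1, j + 1):
            walk(i + 1, j + 1, acc + [(w1[i], i + 1, j + 1)])
        if best(i + 1, j) == best(i, j):
            walk(i + 1, j, acc)
        if best(i, j + 1) == best(i, j):
            walk(i, j + 1, acc)

    walk(0, 0, [])
    return out

print(max_alignments_k("fgh", "hgf", 2))  # -> [[('h', 3, 1)], [('g', 2, 2)]]
```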

Commutative and commutative-unit case
The C-theory is the simplest one among our (nonfree) equational theories. Commutative functions have two arguments whose order does not matter, which implies that the notions of determinedness and uniform determinedness coincide for AUPs with commutative symbols.
For total 1-determined AUPs, we have a linear-time result for computing {C}- and {C, U}-refined generalizations. It makes no difference whether the alignments are given or not: obtaining them takes only constant time at each commutative decomposition step.
Theorem 11. A higher-order {C}-refined (and {C, U}-refined) pattern generalization for a total uniformly 1-determined (hence also total 1-determined) AUP can be computed in linear time.
Proof. The linear running time follows from the linearity of the syntactic higher-order pattern generalization algorithm.
The obtained generalization is a refined generalization because the alignments guarantee that all subterms of the syntactic lgg that are not η-equivalent to generalization variables also occur in the generalization we compute. When our generalization contains no other nonvariable subterms than the syntactic lgg, the AUPs that are generalized by a variable in the lgg end up in the store of our derivation, and then the Merge rule makes sure that the computed generalization is as good as the syntactic lgg. For {C, U}-refined generalizations, we do not require linearity: although we do not use the Exp-U-L, Exp-U-R, and DH-U rules for them (the syntactic lgg would anyway use variables to generalize the AUPs to which those rules apply), the merging rule is not forbidden.

When the determinedness restriction is lifted, the preferred choice function for commutative AUPs has to choose among at most two orderless alignments. Hence, unrestricted AUPs with commutative functions behave exactly like total 2-determined AUPs with commutative functions. Taking this relation into account, we get the following theorem.

Theorem 12. An (R_C, C_{(R_C,G_C)}, G_C)-optimal higher-order {C}-refined (resp. {C, U}-refined) pattern generalization for any AUP can be computed in O(n^2) time, where n is the size of the AUP.
Proof. Note that we use the same rigidity and choice functions for both {C}- and {C, U}-refined generalizations. Deciding which of the two alignments at a C-decomposition step leads to the better generalization takes linear time. Combining this with the statement of Theorem 11, we get the overall quadratic running time.
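As an illustration of this choice step, the following self-contained Python toy works on first-order terms encoded as nested tuples and scores the two possible argument pairings of a commutative head by the size of the syntactic lgg skeleton each would yield; the larger score wins. The encoding and the scoring function are our own simplifications, standing in for the linear-time check on lambda terms.

```python
def score(t, s):
    """Size of the syntactic lgg skeleton of t and s (bigger = less general)."""
    if isinstance(t, tuple) and isinstance(s, tuple) and t[0] == s[0] and len(t) == len(s):
        return 1 + sum(score(a, b) for a, b in zip(t[1:], s[1:]))
    return 1 if t == s else 0

def commutative_choice(t, s):
    """For f(t1,t2) vs f(s1,s2) with f commutative, pick the argument
    pairing (one of at most two orderless alignments) with the higher score."""
    (_, t1, t2), (_, s1, s2) = t, s
    straight = score(t1, s1) + score(t2, s2)
    swapped = score(t1, s2) + score(t2, s1)
    return [(t1, s1), (t2, s2)] if straight >= swapped else [(t1, s2), (t2, s1)]

t = ('f', ('g', 'a'), 'b')
s = ('f', 'b', ('g', 'c'))
print(commutative_choice(t, s))  # pairs g(a) with g(c) and b with b
```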
Associative-commutative and associative-commutative-unit case

Consider an AUP P with an associative-commutative head f, say P = Z(x⃗) : f(t_1, …, t_n) ≜ f(s_1, …, s_m), and let o be an orderless alignment of its argument sequences. The strategy for eliminating the first alignment element g_1[i_1, j_1] from o is as follows. It is much simpler than what we did for the associative and associative-unit cases. The Y's and Z's below are fresh variables of appropriate types. For simplicity, we show only P and its successors.
The element g_1[i_1, j_1] is eliminated by replacing P with the AUP Y_1(x⃗) : t_{i_1} ≜ s_{j_1} and the problem Z_1(x⃗) : f(t_1, …, t_{i_1−1}, t_{i_1+1}, …, t_n) ≜ f(s_1, …, s_{j_1−1}, s_{j_1+1}, …, s_m); this gives a new state State_1. We also obtain a new alignment o_1 = o − g_1[i_1, j_1]. Repeating the same process k − 1 more times, we remove t_{i_1}, …, t_{i_k} from f(t_1, …, t_n) and s_{j_1}, …, s_{j_k} from f(s_1, …, s_m). When o is a max-uniform alignment, there are two possibilities: either the terms f(t_1, …, t_n) and f(s_1, …, s_m) get completely eliminated, or there are some "leftovers" on both sides. In the latter case, the state State_k obtained after these k elimination steps contains a new problem P_k = Z_k(x⃗) : t' ≜ s', where t' and s' are those leftover terms, and the alignment is now empty. Assume t' = f(t'_1, …, t'_{n'}) and s' = f(s'_1, …, s'_{m'}), and assume without loss of generality that n' ≤ m'. Then we can decompose P_k by applying the Dec-AC-R rule n' − 2 times, each time removing the first argument from each side, and then applying it once more to obtain two AUPs of the form t'_{n'−1} ≜ s'_{n'−1} and t'_{n'} ≜ f(s'_{n'}, …, s'_{m'}).
(When f is an ACU symbol, P_k may have the unit element of f on one of its sides; in this case, no decomposition applies.) We end up with the state State_{k+1}, which is the result of the total decomposition of P. In State_{k+1}, we have eliminated at least k occurrences of f compared to the initial state. Moreover, since o was a max-uniform or maximal alignment, those AUPs in State_{k+1} that do not correspond to any element of o do not have the same root on both sides. In the AC case, the only rule that applies to them is Solve. In the ACU case, besides Solve, the unit expansion or DH-U rules may also apply, but since we aim at computing E-refined generalizations, it is enough to transform them by Solve. This means that we can apply a sequence of Solve rules to State_{k+1} that keeps in A only those AUPs whose root is one of the g's from o; the other AUPs move to the store. Let the obtained state be State_{k+2} = A_{k+2}; S_{k+2}; σ_{k+2}.
Take Y(x⃗) : t* ≜ s* ∈ A_{k+2}. If t* is not the unit element of f, then t* was a subterm of f(t_1, …, t_n); the same holds for s* and f(s_1, …, s_m). Let P − A_{k+2} be the AUP obtained from P by replacing all such subterms (i.e., the nonunit sides of the AUPs from A_{k+2}) by some constants. We call this operation the subtraction of the AUPs in A_{k+2} from P.
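The elimination steps just described can be mimicked on flattened argument lists. The sketch below is a simplified stand-in of our own: each alignment element g[i, j] (with 1-based positions) spawns one AUP between the i-th left argument and the j-th right argument, and the unaligned arguments form the leftover problem; the fresh generalization variables introduced by the actual rules are omitted.

```python
def eliminate(ts, ss, alignment):
    """Apply an orderless alignment (list of (g, i, j) triples) to the
    flattened argument lists ts and ss of an AC head."""
    aups = [(ts[i - 1], ss[j - 1]) for (_, i, j) in alignment]
    used_i = {i for (_, i, _) in alignment}
    used_j = {j for (_, _, j) in alignment}
    left_t = [t for p, t in enumerate(ts, 1) if p not in used_i]
    left_s = [s for p, s in enumerate(ss, 1) if p not in used_j]
    return aups, (left_t, left_s)

ts = ['g(a)', 'b', 'h(c)', 'g(d)']
ss = ['h(e)', 'g(a)']
print(eliminate(ts, ss, [('g', 1, 2), ('h', 3, 1)]))
# -> ([('g(a)', 'g(a)'), ('h(c)', 'h(e)')], (['b', 'g(d)'], []))
```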
Theorem 13. Given a state A; S; σ, where all AUPs in A are total uniformly 1-determined (resp. total 1-determined) associative-commutative (resp. associative-commutative-unit) generalization problems and the size of A is n, we can reach ∅; S'; σ'
- in time O(n), if all the max-uniform (resp. maximal) nonempty alignments are given;
- in time O(n^3), if the max-uniform (resp. maximal) nonempty alignments are to be computed.
Proof. First, consider the AC case and assume that all max-uniform alignments are given. Since the problem is total uniformly 1-determined, all the oless-max-unif-ne-align sets are singletons. For each AUP P in A, we have the following:
- If the root of P is not AC, then, using the Dec, Abs, and Solve rules, we either eliminate all successor problems of P from A in O(|P|) steps, or reach a new problem P' with an AC head in O(|P − {P'}|) steps, where P − {P'} is obtained by subtracting the AUP P' from P.
- If the root of P is AC, then, by the decomposition procedure outlined above, we either eliminate all successor problems of P from A in O(|P|) steps, or reach a new problem P' with an AC head in O(|P − {P'}|) steps, where P − {P'} is obtained by subtracting the AUP P' from P.
This implies that eliminating all AUPs that originate from P takes O(|P|) time: given an alignment element g_l[i_l, j_l], extracting the i_l-th and j_l-th subterms from the given AUP can be done in constant time. Therefore, eliminating all AUPs from A takes O(n) time, where n is the size of A. Now assume that the max-uniform alignments have to be computed. We do this each time we encounter an AUP with an AC head. For each such AUP, there is at most one max-uniform nonempty alignment, and it can be computed in time quadratic in the size of the AUP. Since the number of steps at which these computations are needed is bounded linearly in n, we obtain O(n^3) running time in this case.
The ACU case can be proved analogously.
Based on Theorem 13, we obtain the following theorems.

Theorem 14. A higher-order {A, C}-refined (resp. {A, C, U}-refined) pattern generalization for a total uniformly 1-determined (resp. total 1-determined) AUP, where all max-uniform (resp. maximal) orderless alignments are given, can be computed in time O(n), where n is the size of the AUP.
Proof. Similar to the proof of Theorem 6, using Theorem 13.
Theorem 15. A higher-order {A, C}-refined (resp. {A, C, U}-refined) pattern generalization for a total uniformly 1-determined (resp. total 1-determined) AUP can be computed in time O(n^3), where n is the size of the AUP.
Proof. Similar to the proof of Theorem 7, using Theorem 13.

k-determined associative-commutative and associative-commutative-unit generalization
We generalize Theorem 13 from 1-determined to k-determined AUPs. The following theorem can be proved similarly to Theorem 8.
Theorem 16. Given a state A; S; σ, where all AUPs in A are total uniformly k-determined (resp. total k-determined) associative-commutative (resp. associative-commutative-unit) generalization problems with k > 1 and the size of A is n, we can reach ∅; S'; σ'
- in time O(n^2), if all the max-uniform (resp. maximal) nonempty orderless alignments are given;
- in time O(n^3), if the max-uniform (resp. maximal) nonempty orderless alignments are to be computed.
For the AC case, we use the rigidity function R_AC(w_1, w_2) := oless-max-unif-ne-align(w_1, w_2). The preferred (R_AC, G_AC)-choice function uses the linear-time procedure G_base to choose the preferred alignment o_min among the possible candidates. We use associative-commutative decomposition in the derivation {P}; ∅; Id =⇒*_{G_AC} dec(P, o); S'; σ' and syntactic decomposition in the derivation dec(P, o); S'; σ' =⇒*_{G_base} ∅; S; σ_o. For the ACU case, we remove the uniformity restriction and consider a different rigidity function: R_ACU(w_1, w_2) := oless-max-ne-align(w_1, w_2). The corresponding choice function is the (R_ACU, G_ACU)-choice function, where the algorithm G_ACU is obtained by extending G_base with the Dec-ACU-L and Dec-ACU-R rules.
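Since argument order is immaterial under AC, a maximal orderless alignment can be read off from a multiset intersection of the head symbols on the two sides. The following self-contained Python toy does this for single-character head symbols with 1-based positions; it illustrates only the orderless notion, not oless-max-unif-ne-align itself, and the helper names are our own.

```python
from collections import Counter

def oless_max_alignment(w1, w2):
    """Pair up equal head symbols regardless of position: a maximal
    orderless alignment as a list of (symbol, i, j) triples."""
    common = Counter(w1) & Counter(w2)   # heads occurring on both sides
    pos2 = {}                            # positions of each head in w2
    for j, g in enumerate(w2, 1):
        pos2.setdefault(g, []).append(j)
    align, used = [], Counter()
    for i, g in enumerate(w1, 1):
        if used[g] < common[g]:
            align.append((g, i, pos2[g][used[g]]))
            used[g] += 1
    return align

print(oless_max_alignment("fgfh", "hff"))
# -> [('f', 1, 2), ('f', 3, 3), ('h', 4, 1)]
```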
Then we get the following counterparts of Theorems 9 and 10.

Theorem 17. An (R_AC, PC_{(R_AC,G_AC)}, G_AC)-optimal higher-order {A, C}-refined pattern generalization for a total uniformly k-determined AUP with k > 1, when all the orderless alignments are given, can be computed in time O(n^2), where n is the size of the AUP. An (R_ACU, PC_{(R_ACU,G_ACU)}, G_ACU)-optimal higher-order {A, C, U}-refined pattern generalization for a total k-determined AUP with k > 1, when all the orderless alignments are given, can be computed in time O(n^2), where n is the size of the AUP.
Theorem 18. An (R_AC, PC_{(R_AC,G_AC)}, G_AC)-optimal higher-order {A, C}-refined pattern generalization for a total uniformly k-determined AUP with k > 1 can be computed in time O(n^3), where n is the size of the AUP.
An (R_ACU, PC_{(R_ACU,G_ACU)}, G_ACU)-optimal higher-order {A, C, U}-refined pattern generalization for a total k-determined AUP with k > 1 can be computed in time O(n^3), where n is the size of the AUP.
We can achieve the same results even if the AUP is not total (uniformly) k-determined, that is, by using rigidity functions that compute at most k orderless alignments. For instance, we can define R_AC(w_1, w_2) as a set consisting of at most k max-uniform nonempty orderless alignments of w_1 and w_2, which we denote by oless-max-unif-ne-align_k: R_AC(w_1, w_2) := oless-max-unif-ne-align_k(w_1, w_2). Similarly, we can define R_ACU(w_1, w_2) := oless-max-ne-align_k(w_1, w_2). Then both Theorems 17 and 18 hold for such rigidity functions without requiring that the AUPs be k-determined. Moreover, if we take k = 1, then we obtain counterparts of Theorems 14 and 15 without requiring that the AUPs there be total (uniformly) 1-determined.

Conclusion
The higher-order equational anti-unification algorithm presented in this paper combines higher-order syntactic anti-unification rules with decomposition rules for associative, commutative, and associative-commutative function symbols, and with expansion rules for unit elements. This gives a modular algorithm, which can be used without any adaptation for problems containing symbols from different theories.
Higher-order pattern A-, C-, U-, AU-, CU-, AC-, and ACU-anti-unification are finitary (if only linear generalizations are computed when U is involved). In the presence of U, one needs to take special care to guarantee a terminating algorithm; this is a subject for future work. In practice, it is often desirable to compute only one answer, the best one with respect to some predefined criterion. We defined such an optimality criterion, which essentially requires that an optimal equational solution be at least as good as the syntactic lgg. We then identified problem forms for which optimal solutions can be computed fast (in linear or polynomial time) by a greedy approach. The results are summarized in Table 1. They remain the same if we lift the determinedness restriction from the input but make the rigidity function return a number of alignments bounded by a predefined constant k ≥ 1.