External Behavior of a Logic Program and Verification of Refactoring

Refactoring is modifying a program without changing its external behavior. In this paper, we make the concept of external behavior precise for a simple answer set programming language. Then we describe a proof assistant for the task of verifying that refactoring a program in that language is performed correctly.


Introduction
This paper is about the process of refactoring in the context of answer set programming (ASP), that is, about modifying an ASP program without changing its external behavior. Examples of refactoring logic programs can be found in papers by Serebrenik and Demoen (2003), Gebser et al. (2011, Section 3.1) and Buddenhagen and Lierler (2015, Section 3). In this paper we propose, for a simple ASP language, a precise definition of external behavior and a method for verifying that two programs exhibit the same external behavior.
Refactoring a program usually involves a series of small changes that improve its structure or performance. The example below shows that, in ASP, refactoring may also serve another purpose: to transform a program that a grounder classifies as unsafe into an equivalent program that it is able to ground. The program
    composite(I*J) :- I > 1, J > 1.
    prime(I) :- I = a..b, not composite(I).
defines the set of primes in the interval {a, ..., b}, assuming that a > 1. The grounder GRINGO (Gebser et al. 2019) tells us that the program is unsafe. A safe program defining the same set can be obtained by replacing the first rule with
    composite(I*J) :- I = 2..b, J = 2..b.
This is an example of refactoring, because the extent of prime/1 did not change.
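As an informal illustration, the meaning of the safe version can be mirrored in Python with two set comprehensions. The function name primes_safe is hypothetical, and the sketch assumes that a and b are replaced by integers; it imitates what the two rules define and says nothing about how a grounder or solver processes them.

```python
def primes_safe(a, b):
    """Mirror the two rules of the safe program for integer values of a and b."""
    # composite(I*J) :- I = 2..b, J = 2..b.
    composite = {i * j for i in range(2, b + 1) for j in range(2, b + 1)}
    # prime(I) :- I = a..b, not composite(I).
    return {i for i in range(a, b + 1) if i not in composite}


print(sorted(primes_safe(10, 15)))  # [11, 13]
```

The restriction a > 1 matters here as well: for a = 1 the sketch would report 1 as prime, because 1 is not a product of two numbers greater than 1.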
We can also refactor the program to improve its performance using the fact that every composite number in {a, ..., b} has a divisor in the interval {2, ..., ⌊√b⌋}. The optimized version relies on an auxiliary predicate sqrt_b/1.
In the Abstract Gringo language (Gebser et al. 2015), a program is defined as a set of rules, so that a program includes neither directives nor comments. Under this narrow definition, the program itself does not tell us which predicate symbols are meant to represent the output, and which symbols are auxiliary. But this difference is essential, because changing auxiliary predicates does not indicate a mistake in the process of refactoring.
Furthermore, the rules of a program do not show what kind of input is supposed to be provided for it. Generally, an input for an ASP program can be specified in two ways. First, some symbolic constants, such as a and b in the programs above, may be meant to serve as placeholders for elements of the input. Second, some predicate symbols occurring in the program may occur in the bodies of rules only, not in the heads. The extents of such predicates may be specified as part of input when we run the program. Some inputs may not conform to the programmer's assumptions about the intended use of the program. For instance, when we run the prime number programs above, the placeholders a and b are expected to be replaced by integers; the cases when they are replaced by symbolic constants are not related to external behavior if the programs are used as intended.
To sum up, what we consider external behavior of a set of rules depends on how these rules are meant to be used. In Sections 3-5, we make this idea precise for the subset of Abstract Gringo called mini-GRINGO (Fandinno et al. 2020, Section 2; Lifschitz 2022, Sections 2, 3). After that, we describe the proof assistant ANTHEM-P2P,¹ which uses the theorem prover VAMPIRE (Kovács and Voronkov 2013) to verify that two mini-GRINGO programs have the same external behavior. This proof assistant is built on top of the system ANTHEM (Fandinno et al. 2020), whose focus is on the related and yet different task of confirming that an ASP program adheres to its specification. The prime number programs above are used as a running example. To make the paper more self-contained, we have reviewed some background material in Appendices A-C.

On the syntax of mini-GRINGO
There are minor syntactic differences between mini-GRINGO and the input language of the grounder GRINGO, explained by the fact that the former is designed for theoretical studies, and the latter for actual programming. For example, the definition of sqrt_b/1 in the introduction looks slightly different when rewritten in the syntax of mini-GRINGO. Overlined symbols, such as 1̅, are "numerals": syntactic objects representing integers. In examples of rules and programs, we will freely switch between the two styles.
In mini-GRINGO, precomputed terms are numerals, symbolic constants, and the symbols inf, sup. We assume that a total order on precomputed terms is chosen, such that inf is its least element, sup is its greatest element, and, for all integers m and n, m̅ < n̅ iff m < n. A precomputed atom is an expression of the form p(t), where p is a symbolic constant and t is a tuple of precomputed terms. A predicate symbol is a pair p/n, where p is a symbolic constant and n is a nonnegative integer. About a rule or another syntactic expression we say that it contains p/n if it contains an atom of the form p(t_1, ..., t_n).
¹ Available at https://github.com/ZachJHansen/anthem-p2p.

External behavior
Definition 1 A user guide is a quadruple
    (PH, In, Out, Dom)    (1)
where
• PH is a finite set of symbolic constants, called placeholders,
• In and Out are disjoint finite sets of predicate symbols, called input symbols and output symbols, and
• Dom is a set such that each of its elements is a pair (v, I), where (i) v is a function that maps elements of PH to precomputed terms that do not belong to PH, and (ii) I is a subset of the set of precomputed atoms that contain an input symbol and do not contain placeholders.
The set Dom is the domain of the user guide, and pairs (v, I) satisfying conditions (i) and (ii) are called inputs. An input (v, I) represents a way to choose the values of placeholders and the extents of input predicates: for every placeholder c, specify v(c) as its value, and add the atoms I to the rules of the program as facts. If Π is a mini-GRINGO program then v(Π) stands for the program obtained from Π by replacing every occurrence of every constant c in the domain of v by v(c). Using this notation, we can say that choosing (v, I) as input for Π amounts to replacing Π by the program v(Π) ∪ I.
To use a program in accordance with user guide (1) means to run it for inputs that belong to Dom. The inputs that do not belong to Dom are not related to the external behavior of the program when it is used as intended.
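The components of a user guide can be pictured as a small data structure. The Python sketch below is only an illustration under our own assumptions: the class UserGuide, its field names, the encoding of numerals as int and symbolic constants as str, and the representation of Dom by a membership test are all hypothetical and are not taken from ANTHEM-P2P.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, Tuple, Union

Term = Union[int, str]                 # numerals as int, symbolic constants as str
Atom = Tuple[str, Tuple[Term, ...]]    # ("living", ("c",)) stands for living(c)
Symbol = Tuple[str, int]               # ("living", 1) stands for living/1


@dataclass(frozen=True)
class UserGuide:
    placeholders: FrozenSet[str]       # PH
    inputs: FrozenSet[Symbol]          # In
    outputs: FrozenSet[Symbol]         # Out
    in_domain: Callable[[Dict[str, Term], FrozenSet[Atom]], bool]  # test for Dom

    def is_input(self, v, facts):
        """Check conditions (i) and (ii) from the definition of an input."""
        # (i) v maps each placeholder to a term that is not a placeholder
        ok_v = set(v) == set(self.placeholders) and all(
            t not in self.placeholders for t in v.values())
        # (ii) every fact contains an input symbol and no placeholders
        ok_facts = all(
            (name, len(args)) in self.inputs
            and not any(t in self.placeholders for t in args)
            for name, args in facts)
        return ok_v and ok_facts


# A user guide with placeholders a, b, no input symbols, output prime/1,
# and a domain that requires both placeholders to be numerals.
ug = UserGuide(
    placeholders=frozenset({"a", "b"}),
    inputs=frozenset(),
    outputs=frozenset({("prime", 1)}),
    in_domain=lambda v, facts: isinstance(v["a"], int) and isinstance(v["b"], int),
)
```

With this encoding, replacing a by a symbolic constant c still yields an input in the sense of conditions (i) and (ii), but one that lies outside the domain.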
Example 1 The intended use of the programs discussed in the introduction can be described by user guide (1) with PH = {a, b}, In = ∅, Out = {prime/1}, and with the domain consisting of the inputs (v, ∅) such that v(a), v(b) are numerals. (We could choose also to include the condition v(b) ≥ v(a) > 1.) This user guide will be denoted by UG_p.
Example 2 We would like to describe the meaning of the word orphan by a logic program (Gelfond and Kahl 2014, Section 4.1.2). The intended use of such a program can be described by user guide (1) with PH = ∅, In = {father/2, mother/2, living/1}, Out = {orphan/1}, and with the domain consisting of all inputs. We will denote this user guide by UG_o. In the next two sections, we examine two possible definitions of orphan/1 and consider the question of their equivalence with respect to UG_o.
User guides are closely related to lp-functions (Gelfond 2002, Section 2), and also to io-programs (Fandinno et al. 2020, Section 5), reviewed in Appendix C.
An output atom of a user guide UG is a precomputed atom that contains an output symbol of UG.
Definition 2 Let (v, I) be an input in the domain of a user guide UG, and let Π be a mini-GRINGO program such that the heads of its rules do not contain input symbols of UG. The external behavior of Π for the user guide UG and the input (v, I) is the collection of all sets that can be represented as the intersection of a stable model of v(Π) ∪ I with the set of output atoms of UG.
Example 1, continued If Π is one of the three prime number programs from the introduction, and (v, I) is an input in the domain of UG_p, then the program v(Π) ∪ I is v(Π), and it has a unique stable model. If v is defined by the conditions v(a) = 10, v(b) = 15, then that stable model includes the atoms prime(11), prime(13), and some atoms containing composite/1. The external behavior of each of the programs for this input is {{prime(11), prime(13)}}. For the safe and optimized versions, this external behavior can be calculated by instructing CLINGO to find all answers for the file obtained from the program by appending the directives [...]
    orphan(X) :- living(X), not parent_living(X).
[...] and the directive #show orphan/1.
In the special case when UG has neither placeholders nor input symbols, and its set of output symbols includes all predicate symbols occurring in Π, the external behavior of Π with respect to UG and (∅, ∅) is the set of stable models of Π. In this sense, the concept of external behavior is a generalization of the stable model semantics.
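The projection step in Definition 2 can be sketched in Python as follows. The function external_behavior and the encoding of atoms as (name, arguments) pairs are hypothetical; the stable model is written out by hand for the safe prime number program with v(a) = 10 and v(b) = 15, whereas in practice it would be produced by an ASP solver.

```python
def external_behavior(stable_models, output_symbols):
    """Intersect every stable model with the set of output atoms."""
    return {
        frozenset(
            (name, args) for (name, args) in model
            if (name, len(args)) in output_symbols
        )
        for model in stable_models
    }


# Hand-written stand-in for the unique stable model of the safe program.
composite = {("composite", (i * j,)) for i in range(2, 16) for j in range(2, 16)}
prime = {("prime", (11,)), ("prime", (13,))}
model = frozenset(composite | prime)

behavior = external_behavior([model], {("prime", 1)})
print(behavior == {frozenset(prime)})  # True
```

If every predicate symbol of the program is declared an output symbol, the projection leaves each stable model unchanged, which corresponds to the special case discussed above.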

Equivalence
Definition 3 Let UG be a user guide, and let Π_1, Π_2 be mini-GRINGO programs such that the heads of their rules do not contain input symbols of UG. We say that Π_1 is equivalent to Π_2 with respect to UG if, for every input (v, I) in the domain of UG, the external behavior of Π_1 for UG and (v, I) is the same as the external behavior of Π_2.
Example 1, continued The three programs from the introduction are equivalent to each other with respect to UG_p. As discussed in Section 6, this claim can be verified using the automated reasoning tools ANTHEM-P2P and VAMPIRE.
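Before attempting a proof, a claim of this kind can be spot-checked on a finite range of inputs. The Python sketch below compares the safe formulation with a square-root-bounded one. It is a test, not a proof, and the function names as well as our reading of the optimization (one factor ranging over 2..⌊√b⌋ only) are assumptions made for illustration.

```python
from math import isqrt


def primes_safe(a, b):
    # composite(I*J) :- I = 2..b, J = 2..b.
    composite = {i * j for i in range(2, b + 1) for j in range(2, b + 1)}
    return {i for i in range(a, b + 1) if i not in composite}


def primes_sqrt(a, b):
    # Assumed optimization: the first factor ranges over 2..floor(sqrt(b)).
    root = isqrt(b)
    composite = {i * j for i in range(2, root + 1) for j in range(2, b + 1)}
    return {i for i in range(a, b + 1) if i not in composite}


# Compare the two versions for all integer inputs with 1 < a <= b < 60.
mismatches = [(a, b) for b in range(2, 60) for a in range(2, b + 1)
              if primes_safe(a, b) != primes_sqrt(a, b)]
print(mismatches)  # []
```

An empty list of mismatches only says that no counterexample exists in the tested range; establishing equivalence for all inputs in the domain requires a proof.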
Example 2, continued Perhaps surprisingly, the one-rule program
    orphan(X) :- living(X), father(Y,X), mother(Z,X), not living(Y), not living(Z).    (5)
is not equivalent to (2) with respect to UG_o. Indeed, the external behavior of this program with respect to UG_o and input (3) is {∅}, which is different from (4). We will see that ANTHEM-P2P can help us clarify the relationship between programs (2) and (5).
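The way two definitions of orphan/1 can disagree is easy to reproduce by hand. The Python sketch below evaluates rule (5) and a two-step definition on a sample input of our own choosing. It assumes that parent_living(X) holds when some father or mother of X is living; this reading, the function names, and the sample input are assumptions made for illustration. Since negation is not used recursively in either definition, each of them determines a single set of orphans for every input.

```python
def orphans_two_step(living, father, mother):
    # Assumed auxiliary definition: X has a living father or a living mother.
    parent_living = {x for (y, x) in father | mother if y in living}
    # orphan(X) :- living(X), not parent_living(X).
    return {x for x in living if x not in parent_living}


def orphans_one_rule(living, father, mother):
    # orphan(X) :- living(X), father(Y,X), mother(Z,X),
    #              not living(Y), not living(Z).
    return {
        x
        for x in living
        for (y, child_f) in father
        if child_f == x and y not in living
        for (z, child_m) in mother
        if child_m == x and z not in living
    }


# Sample input: c is living, and no father or mother of c is recorded.
living, father, mother = {"c"}, set(), set()
print(orphans_two_step(living, father, mother))  # {'c'}
print(orphans_one_rule(living, father, mother))  # set()
```

The one-rule definition requires a recorded father and a recorded mother, so the two definitions differ on inputs where a parent is missing from the data.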
We understand refactoring a mini-GRINGO program with respect to a user guide UG as replacing it by a program that is equivalent to it with respect to UG.
This equivalence relation is essentially an example of relativized uniform equivalence with projection (Oetsch and Tompits 2008), except that the language discussed by Oetsch and Tompits includes neither arithmetic operations nor placeholders. It is uniform equivalence, because the programs are extended by adding facts, rather than more complex rules; relativized, because these facts I are assumed to be atoms containing input symbols, not arbitrary atoms; with projection, because we look at the output atoms in the stable model, not the entire model.

Formal notation for user guides
To design software for verifying the equivalence of programs with respect to a user guide, we need to represent user guides in formal notation. The format that we chose for user guide files is similar to the format of specification files, defined by Fandinno et al. (2020) within their work on the system ANTHEM. Placeholders and input symbols are represented by input statements, for instance:
    input: n.
    input: living/1, father/2, mother/2.
Output symbols are represented by output statements, such as the statement output: orphan/1 in the user guide file for UG_o. There can be several statements of both kinds in a user guide file, in any order.
The question of representing the domain Dom by a string of characters is more difficult, because the domain is a set of inputs, which is generally infinite.Our approach is to define "assumptions" as sentences of an appropriate first-order language, and characterize the domain by a list of assumptions; an input belongs to the domain iff it satisfies all assumptions on that list.
For any set P of predicate symbols, by σ_0(P) we denote the subsignature of the two-sorted signature σ_0, described in Appendix A, in which the set of predicate symbols is limited to the comparison symbols and the symbols from P. In this paper, an assumption is a sentence over the signature σ_0(In). Besides input and output statements, a user guide file may include one or more statements consisting of the word assume followed by an assumption.
To use assumptions as conditions on an input, we need to relate inputs to interpretations in the sense of first-order logic. If v is a function that maps elements of some set PH of symbolic constants to precomputed terms that do not belong to PH, and I is a subset of the set of precomputed atoms that contain a predicate symbol from P, then there exists a unique interpretation I of σ_0(P) that satisfies conditions (a)-(g), which are listed after the Conclusion. We will denote that interpretation by I(v, I). The domain of the user guide defined by a set of assumptions is the set of inputs (v, I) such that the interpretation I(v, I) of σ_0(In) satisfies all assumptions in that set.
Example 2, continued The user guide UG_o can be described by the statements
    input: living/1, father/2, mother/2.
    output: orphan/1.    (6)
The absence of assume statements here shows that the domain is the set of all inputs.
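To make the file format concrete, here is a small Python sketch of a reader for such statements. It is not the parser used by ANTHEM-P2P: the function parse_user_guide, the dictionary layout, and the restriction to one statement per line are our own simplifying assumptions, and assumptions are kept as uninterpreted strings.

```python
def parse_user_guide(text):
    """Parse input/output/assume statements, one statement per line."""
    guide = {"placeholders": {}, "inputs": [], "outputs": [], "assumptions": []}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        keyword, sep, body = line.partition(":")
        keyword, body = keyword.strip(), body.strip()
        if not sep or not body.endswith("."):
            raise ValueError(f"malformed statement: {raw!r}")
        body = body[:-1].strip()              # drop the final period
        if keyword == "assume":
            guide["assumptions"].append(body)
        elif keyword in ("input", "output"):
            for item in (part.strip() for part in body.split(",")):
                if "/" in item:               # predicate symbol p/n
                    name, arity = item.split("/")
                    key = "inputs" if keyword == "input" else "outputs"
                    guide[key].append((name.strip(), int(arity)))
                elif keyword == "input":      # placeholder, optionally "c -> sort"
                    name, _, sort = item.partition("->")
                    guide["placeholders"][name.strip()] = sort.strip() or None
                else:
                    raise ValueError(f"output statement expects p/n: {item!r}")
        else:
            raise ValueError(f"unknown statement: {raw!r}")
    return guide


ug_o = parse_user_guide("""
input: living/1, father/2, mother/2.
output: orphan/1.
""")
print(ug_o["inputs"])   # [('living', 1), ('father', 2), ('mother', 2)]
print(ug_o["outputs"])  # [('orphan', 1)]
```

An empty list of assumptions in the result corresponds to a domain that consists of all inputs.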

6 Functionality of ANTHEM-P2P
The proof assistant ANTHEM-P2P uses the theorem prover VAMPIRE to verify that two mini-GRINGO programs have the same external behavior with respect to a given user guide. We can verify, for instance, that the first two versions of the prime number program from the introduction are equivalent with respect to the user guide UG_p by running ANTHEM-P2P on three files: the unsafe program
    composite(I*J) :- I > 1, J > 1.
    prime(I) :- I = a..b, not composite(I).    (7)
the safe program
    composite(I*J) :- I = 2..b, J = 2..b.
    prime(I) :- I = a..b, not composite(I).    (8)
and a user guide file (9) describing UG_p.
The system ANTHEM-P2P transforms the task of verifying equivalence with respect to a user guide (1) into the problem of verifying the provability of a formula in a first-order theory over the signature σ_0(In ∪ Out), and submits that problem to VAMPIRE; see Sections 7-9 for details.
The user can help VAMPIRE organize search more efficiently by supplying ANTHEM-P2P with "helper" files. Such a file may instruct VAMPIRE to prove a series of lemmas before trying to prove the goal formula. A helper file can also suggest instances of the induction schema that may be useful for the job at hand. This kind of help is needed, for instance, for verifying the equivalence of the optimized prime number program to the other two.
The use of ANTHEM-P2P for proving equivalence of programs is, generally, an interactive process. If VAMPIRE does not prove the goal formula in the allotted time then one of the options is to provide more lemmas and run ANTHEM-P2P again. Alternatively, the user can look for a counterexample that refutes the equivalence claim, as in Example 2 above.
Sometimes, ANTHEM-P2P can help us clarify the source of a puzzling discrepancy between two versions of a program if we run it in the presence of additional assume statements. If adding an assumption to the user guide makes the programs equivalent then it is possible that perceiving that assumption as self-evident is the reason why the discrepancy is puzzling. For instance, we can observe that the ANTHEM-P2P/VAMPIRE combination proves the equivalence of program (2) to program (5) if we extend user guide (6) by two existence and uniqueness assumptions:
    assume: forall X exists Y forall Z (father(Z,X) <-> Y=Z).
    assume: forall X exists Y forall Z (mother(Z,X) <-> Y=Z).
The limitations of the ANTHEM-P2P algorithm are inherited from the limitations of ANTHEM and can be described as follows. The predicate dependency graph of a mini-GRINGO program Π (Fandinno et al. 2020, Section 6.3) is the directed graph that
• has the predicate symbols contained in Π as its vertices, and
• has an edge from p/n to q/m if some rule of Π contains p/n in the head and q/m in the body.
The edge from p/n to q/m is positive if there is a rule R in Π such that p/n is contained in the head of R, and q/m is contained in an atom in the body of R that is not in the scope of negation. For example, the predicate dependency graph of program (2) has 6 edges; all of them except for the edge from orphan/1 to parent_living/1 are positive. We say that Π is tight if this graph has no cycles consisting of positive edges. A vertex p/n of the graph is private for a user guide UG if it is neither an input symbol nor an output symbol of UG. We say that Π uses private recursion for UG if
• the predicate dependency graph of Π has a cycle such that every vertex in it is a private symbol, or
• Π includes a choice rule with the head containing a private symbol.
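Both conditions are simple graph checks. The Python sketch below illustrates them under our own assumptions: each rule is abstracted to a triple (head symbols, symbols occurring positively in the body, symbols occurring under negation), the heads of choice rules are passed separately, and all function names are hypothetical. It is not the implementation used in ANTHEM or ANTHEM-P2P.

```python
def dependency_graph(rules):
    """rules: iterable of (head_symbols, positive_body, negative_body) triples."""
    vertices, edges, positive = set(), set(), set()
    for heads, pos, neg in rules:
        vertices |= set(heads) | set(pos) | set(neg)
        for p in heads:
            for q in pos:
                edges.add((p, q))
                positive.add((p, q))
            for q in neg:
                edges.add((p, q))
    return vertices, edges, positive


def has_cycle(vertices, edges):
    """Depth-first search for a cycle in the subgraph induced by `vertices`."""
    successors = {v: [] for v in vertices}
    for p, q in edges:
        if p in successors and q in successors:
            successors[p].append(q)
    state = {}  # vertex -> "open" while on the DFS stack, "done" afterwards

    def visit(v):
        state[v] = "open"
        for w in successors[v]:
            if state.get(w) == "open" or (w not in state and visit(w)):
                return True
        state[v] = "done"
        return False

    return any(v not in state and visit(v) for v in successors)


def is_tight(rules):
    vertices, _, positive = dependency_graph(rules)
    return not has_cycle(vertices, positive)


def uses_private_recursion(rules, public, choice_heads=()):
    vertices, edges, _ = dependency_graph(rules)
    private = vertices - set(public)
    return has_cycle(private, edges) or any(p in private for p in choice_heads)


# The safe prime number program: composite/1 has no body atoms,
# and prime/1 depends on composite/1 through negation.
safe = [({"composite/1"}, set(), set()),
        ({"prime/1"}, set(), {"composite/1"})]
print(is_tight(safe))                             # True
print(uses_private_recursion(safe, {"prime/1"}))  # False
```

Restricting the search to the private vertices reflects the requirement that every vertex of the offending cycle be private; a cycle that passes through an input or output symbol does not count.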
As discussed in the next two sections, the applicability of the algorithm implemented in ANTHEM-P2P to a pair of mini-GRINGO programs and a user guide UG is guaranteed whenever the programs are tight and do not use private recursion with respect to UG. We expect that it will be possible to replace the tightness requirement by a significantly weaker condition using the ideas of a recent paper on "locally tight" programs (Fandinno and Lifschitz 2021); this is a topic for future work.
The theorem stated below relates equivalence of tight programs to the satisfaction relation of second-order logic. Its statement refers to the concept of second-order completion, reviewed in Appendix B, and also to the concept of standard interpretation, defined as follows. An interpretation I of σ_0(P) is standard for a set PH of symbolic constants if it satisfies conditions (a), (b), (d), (e), (g) from Section 5 and the condition
(c′) I interprets every symbolic constant in PH as a term that does not belong to PH.
Theorem Let UG be a user guide (PH, In, Out, Dom) such that its domain is described by a finite set of assumptions, and let Asm be the conjunction of these assumptions. For any tight mini-GRINGO programs Π_1, Π_2 such that the heads of their rules do not contain the input symbols of UG, Π_1 is equivalent to Π_2 with respect to UG iff the sentence
    Asm → (COMP(Π_1, In, Out) ↔ COMP(Π_2, In, Out))    (10)
is satisfied by all interpretations of the signature σ_0(In ∪ Out) that are standard for PH.
This theorem shows that the equivalence of tight programs may be established by choosing a first-order theory T over the signature σ_0(In ∪ Out) such that its axioms are satisfied by all interpretations that are standard for PH, and then exhibiting a derivation of formula (10) from the axioms of T in classical second-order logic. For programs that do not use private recursion, the problem of constructing such a derivation can be reduced to proof search in first-order logic (see Section 8 below), for which many automated reasoning tools are available. This is the core of the procedure used by ANTHEM-P2P.
The proof of the theorem, including the lemma below, uses terminology related to io-programs, which is reviewed in Appendix C.
Lemma Let Π be a mini-GRINGO program such that the heads of its rules do not contain input symbols of a user guide (PH, In, Out, Dom). For any input (v, I), a set J of output atoms is an element of the external behavior of Π for (PH, In, Out, Dom) and (v, I) iff I ∪ J is an io-model of the io-program (Π, PH, In, Out) for (v, I).
Proof For every set J of output atoms, the conditions
• J is the set of all output atoms in some stable model M of v(Π) ∪ I;
• I ∪ J is the set of all public atoms in some stable model M of v(Π) ∪ I
are equivalent to each other. Indeed, since the heads of rules of v(Π) do not contain input atoms, the set of input atoms in M is I.

Proof of the Theorem The condition
    Π_1 is equivalent to Π_2 with respect to UG    (11)
means that for any input (v, I) such that I(v, I) |= Asm and any set J of output atoms,
    J is an element of the external behavior of Π_1 for UG and (v, I) iff J is an element of the external behavior of Π_2 for UG and (v, I).    (12)
By the lemma, condition (12) can be reformulated as follows:
    I ∪ J is an io-model of (Π_1, PH, In, Out) for (v, I) iff I ∪ J is an io-model of (Π_2, PH, In, Out) for (v, I).
By the theorem quoted at the end of Appendix C, this can be further reformulated as
    I(v, I ∪ J) satisfies COMP(Π_1, In, Out) iff I(v, I ∪ J) satisfies COMP(Π_2, In, Out).    (13)
Hence condition (11) is equivalent to requiring that (13) hold for all inputs (v, I) such that I(v, I) |= Asm and all sets J of output atoms. Since assumptions do not contain output symbols, I(v, I) |= Asm iff I(v, I ∪ J) |= Asm. Consequently, (11) is equivalent to asserting that implication (10) is satisfied by I(v, I ∪ J) for all inputs (v, I) and all sets J of output atoms. It remains to observe that an interpretation of the signature σ_0(In ∪ Out) can be represented in the form I(v, I ∪ J) if and only if it is standard for PH.

8 Reduction to first-order logic
If Π_1 and Π_2 do not use private recursion then the reference to second-order consequences of the axioms of T in Section 7 can be eliminated in the following way. Represent the formula COMP(Π_1, In, Out) in the form
    ∃P (⋀_i F_i(P) ∧ F′(P))
where P is a list of distinct predicate variables corresponding to the private symbols p_1, p_2, ... of Π_1, and F_i(P) is the formula obtained from the completed definition of p_i in Π_1 by replacing each of p_1, p_2, ... by the corresponding member of P. (Thus the conjunctive members of F′(P) correspond to the completed definitions of the output symbols and to the constraints of Π_1.) Similarly, write COMP(Π_2, In, Out) as
    ∃Q (⋀_j G_j(Q) ∧ G′(Q))    (14)
where Q is a list of distinct predicate variables corresponding to the private symbols q_1, q_2, ... of Π_2, and the formulas G_j(Q) are obtained from the completed definitions of these symbols in Π_2 by replacing them with corresponding variables. Take one half of condition (10):
    Asm → (∃P (⋀_i F_i(P) ∧ F′(P)) → ∃Q (⋀_j G_j(Q) ∧ G′(Q))).    (15)
Since Π_2 does not use private recursion, formula (14) is equivalent to
    ∀Q (⋀_j G_j(Q) → G′(Q))
(Fandinno et al. 2020, Theorem 3). It follows that formula (15) is equivalent to
    Asm → (∃P (⋀_i F_i(P) ∧ F′(P)) → ∀Q (⋀_j G_j(Q) → G′(Q)))
and consequently to
    ∀P Q (Asm ∧ ⋀_i F_i(P) ∧ ⋀_j G_j(Q) ∧ F′(P) → G′(Q))    (16)
(with the bound variables in P, Q renamed, if necessary, to ensure that they are pairwise disjoint).
Similarly, the second half of condition (10) is equivalent to the formula obtained from (16) by swapping F′(P) with G′(Q). Thus (10) can be rewritten as
    ∀P Q (Asm ∧ ⋀_i F_i(P) ∧ ⋀_j G_j(Q) → (F′(P) ↔ G′(Q))).
Finally, observe that this formula is entailed by the axioms of T if and only if the axioms entail the first-order formula
    Asm ∧ ⋀_i F_i(p) ∧ ⋀_j G_j(q) → (F′(p) ↔ G′(q))    (17)
where p, q are lists of fresh predicate constants. We return to this formula in the description of the design of ANTHEM-P2P below. Note that its subformulas F_i(p), G_j(q), F′(p), G′(q) are parts of the first-order completion formulas of Π_1 and Π_2, modified by replacing their private symbols p_1, p_2, ..., q_1, q_2, ... by members of the lists p and q.

9 Design of ANTHEM-P2P
The system ANTHEM-P2P is a Python program that operates by converting a claim about the equivalence of two mini-GRINGO programs into an input for ANTHEM. The system ANTHEM verifies the correctness of an io-program with respect to a formal specification. The file describing a specification includes lists of placeholders, input symbols, output symbols, and assumptions, and also a list of "specs" that describe the intended behavior of the future program by sentences over the signature σ_0(In ∪ Out).
Given programs Π_1 and Π_2 and a user guide (PH, In, Out, Dom) with the domain described by assumptions Asm, ANTHEM-P2P constructs a specification Sp, described by clauses (i)-(v) that are listed after the Conclusion. Then ANTHEM-P2P instructs ANTHEM to prove the claim that the io-program (Π_2, PH, In, Out) implements Sp. Providing ANTHEM with such an instruction makes it look for a derivation of a formula (18) from the axioms of T by invoking the theorem prover VAMPIRE (Fandinno et al. 2020, Section 6.4). This formula is equivalent to (17). Thus instructing ANTHEM to verify that the io-program (Π_2, PH, In, Out) implements the specification Sp amounts to verifying the provability of formula (17) in T.
As an example, consider the operation of the ANTHEM-P2P algorithm on programs (7) and (8) and user guide (9). In each of the programs, the only private predicate is composite/1; it corresponds to both p_1 and q_1 in the notation of Section 8. The symbols composite_1/1 and composite_2/1, generated by ANTHEM-P2P, play the parts of p and q in formula (17). The file describing the specification Sp is obtained in this case from user guide (9) by adding three statements. First, in accordance with clause (ii) in the description of Sp, ANTHEM-P2P adds the statement
    input: composite_1/1.
Once Sp is generated, ANTHEM calls VAMPIRE to prove formula (18) in the theory T, first by deriving the specs F′(p) from the antecedent of (18) and G′(q) ("verification of specification from translated program"), and then by deriving G′(q) from the antecedent of (18) and the specs F′(p) ("verification of translated program from specification"). In this example, the runtime of VAMPIRE will be significantly reduced (a few seconds instead of a few minutes) if we instruct it to start by proving two lemmas:
    lemma: forall I, J (I > 1 and J > 1 -> I < I*J).
    lemma: forall X (prime(X) -> exists N1 (not composite_1(N1) and exists N2, N3 (N2 = a and N3 = b and N2 <= N1 and N1 <= N3) and X = N1)).
The ANTHEM-P2P website allows users to experiment with the system in their web browser. The proof search is conducted on a University of Nebraska Omaha server (Oracle Linux 8, 4 Intel(R) Xeon(R) Gold 6248 CPUs, 4 GB RAM) subject to a 10-minute timeout. For smaller problems, this is the recommended introduction to the system.

Conclusion
This paper contributes to the theory of logic programming by defining user guides, external behaviors, and equivalence with respect to a user guide. The theorem proved in Section 7 relates equivalence of tight programs to program completion.
The problem of checking equivalence between programs arises in many areas of computer science. For example, verifying the correctness of the translation performed by an optimizing compiler is a problem of this kind. What is special about the verification of refactoring is that it involves a pair of similar programs written in the same programming language. MEDIATOR (Wang et al. 2018) is a tool that uses an SMT solver for the verification of database refactoring.
The proof assistant ANTHEM-P2P can be used for verifying the correctness of refactoring an ASP program, and also for comparing alternative solutions to the same programming problem (for instance, in classroom teaching and in ASP programming contests). To make this tool more versatile, we plan to make it applicable to programs with aggregates, along the lines of recent publications (Fandinno et al. 2022; Lifschitz 2022).
Conditions (a)-(g) that determine the interpretation I(v, I) of σ_0(P) in Section 5:
(a) the domain of the sort general in I is the set of all precomputed terms;
(b) the domain of the sort integer in I is the set of all numerals;
(c) I interprets every symbolic constant c in PH as v(c);
(d) I interprets every precomputed term t that does not belong to PH as t;
(e) I interprets the symbols for arithmetic operations as usual in arithmetic;
(f) if p/n is a predicate constant from P, and c is an n-tuple of precomputed terms, then I interprets p(c) as true iff p(c) ∈ I;
(g) I interprets the comparison symbols as in the definition of mini-GRINGO.

Example 1, continued The user guide UG_p can be described by the statements
    input: a, b.
    assume: exists N (a = N) and exists N (b = N).
    output: prime/1.
The first two lines can be written more concisely as
    input: a -> integer, b -> integer.
The specification Sp constructed by ANTHEM-P2P in Section 9 is defined as follows:
(i) the placeholders of Sp are the placeholders PH of the given user guide;
(ii) the input symbols of Sp are the input symbols In of the user guide and the predicate symbols p corresponding to the private symbols p_1, p_2, ... of the program Π_1;
(iii) the output symbols of Sp are the output symbols Out of the user guide;
(iv) the assumptions of Sp are the assumptions Asm of the user guide and the modified completed definitions F_i(p) of the private symbols of Π_1;
(v) the specs of Sp are the remaining conjunctive terms F′(p) of the modified first-order completion formula of Π_1.