Determining Action Reversibility in STRIPS Using Answer Set and Epistemic Logic Programming

In the context of planning and reasoning about actions and change, we call an action reversible when its effects can be reverted by applying other actions, returning to the original state. Renewed interest in this area has led to several results in the context of the PDDL language, widely used for describing planning tasks. In this paper, we propose several solutions to the computational problem of deciding the reversibility of an action. In particular, we leverage an existing translation from PDDL to Answer Set Programming (ASP), and then use several different encodings to tackle the problem of action reversibility for the STRIPS fragment of PDDL. For these, we use ASP, as well as Epistemic Logic Programming (ELP), an extension of ASP with epistemic operators, and compare and contrast their strengths and weaknesses. Under consideration for acceptance in TPLP.


Introduction
Traditionally, the field of Automated Planning deals with the problem of generating a sequence of actions (a plan) that transforms an initial state of the environment into some goal state (see, for instance, Traverso, 2004, 2016). Actions, in plain words, are modifiers of the environment. One interesting question is whether the effects of an action are reversible (by other actions), or in other words, whether the action's effects can be undone. Notions of reversibility have previously been investigated, most notably by (Eiter, Erdem, and Faber, 2008) and by (Daum, Torralba, Hoffmann, Haslum, and Weber, 2016).
Studying action reversibility is important for several reasons. Intuitively, actions whose effects cannot be reversed might lead to dead-end states from which the goal state is no longer reachable. Early detection of a dead-end state is beneficial in a plan generation process, as shown by (Lipovetzky, Muise, and Geffner, 2016). Reasoning in more complex structures such as Agent Planning Programs (De Giacomo, Gerevini, Patrizi, Saetti, and Sardiña, 2016), which represent networks of planning tasks where a goal state of one task is an initial state of another task, is even more prone to dead-ends, as shown by (Chrpa, Lipovetzky, and Sardiña, 2017). Concerning non-deterministic planning, for instance Fully Observable Non-Deterministic (FOND) Planning, where actions have non-deterministic effects, determining reversibility or irreversibility of each set of effects of an action can contribute to early dead-end detection, or to generalising recovery from undesirable action effects, which is important for efficient computation of strong (cyclic) plans, cf. (Camacho, Muise, and McIlraith, 2016). Concerning online planning, we can observe that applying reversible actions is safe and hence we might not need to explicitly provide the information about safe states of the environment (Cserna, Doyle, Ramsdell, and Ruml, 2018). Another, less obvious, benefit of action reversibility lies in plan optimization. If the effects of an action are later reversed by a sequence of other actions in a plan, these actions might be removed from the plan, potentially shortening it significantly. It has been shown by (Chrpa, McCluskey, and Osborne, 2012) that, under given circumstances, pairs of inverse actions, which are a special case of action reversibility, can be removed from plans.
Morak et al. (2020) introduced a general framework for action reversibility that offers a broad definition of the term, and generalises many of the already proposed notions of reversibility, like "undoability" proposed by (Daum et al., 2016), or the concept of "reverse plans" as introduced by (Eiter et al., 2008). The concept of reversibility of Morak et al. (2020) directly incorporates the set of states in which a given action should be reversible. We call these notions S-reversibility and ϕ-reversibility, where the set S contains states, and the formula ϕ describes a set of states in terms of propositional logic. These notions are then further refined to universal reversibility (referring to the set of all states) and to reversibility in some planning task Π (referring to the set of all states reachable from the initial state specified in Π). These last two versions match the ones proposed by (Daum et al., 2016). Furthermore, our notions can be further restricted to require that some action is reversible by a single "reverse plan" that is independent of the state for which the action is reversible. For single actions, this matches the concept of the same name proposed by (Eiter et al., 2008).
The complexity analysis of Morak et al. (2020) indicates that some of these problems can be addressed by means of Answer Set Programming (ASP), but also by means of Epistemic Logic Programs (ELPs). In this paper, we leverage the translations implemented in plasp (Dimopoulos, Gebser, Lühne, Romero, and Schaub, 2019), and produce ASP and ELP encodings to effectively solve some of the reversibility problems on PDDL domains, restricted, for now, to the STRIPS fragment (Fikes and Nilsson, 1971). The encodings differ considerably in their generality and extensibility, and we discuss their advantages and disadvantages. We also present preliminary experiments that compare the various encodings, highlighting a trade-off between extensibility and efficiency.

Background
STRIPS Planning.
Let $F$ be a set of facts, that is, propositional variables describing the environment, which can either be true or false. Then, a subset $s \subseteq F$ is called a state, which intuitively represents a set of facts considered to be true. An action is a tuple $a = \langle pre(a), add(a), del(a) \rangle$, where $pre(a) \subseteq F$ is the set of preconditions of $a$, and $add(a) \subseteq F$ and $del(a) \subseteq F$ are the add and delete effects of $a$, respectively. W.l.o.g., we assume actions to be well-formed, that is, $add(a) \cap del(a) = \emptyset$ and $pre(a) \cap add(a) = \emptyset$. An action $a$ is applicable in a state $s$ iff $pre(a) \subseteq s$. The result of applying an action $a$ in a state $s$, given that $a$ is applicable in $s$, is the state $a[s] = (s \setminus del(a)) \cup add(a)$. A sequence of actions $\pi = \langle a_1, \ldots, a_n \rangle$ is applicable in a state $s_0$ iff there is a sequence of states $\langle s_1, \ldots, s_n \rangle$ such that, for $0 < i \leq n$, it holds that $a_i$ is applicable in $s_{i-1}$ and $a_i[s_{i-1}] = s_i$. Applying the action sequence $\pi$ on $s_0$ is denoted $\pi[s_0]$, with $\pi[s_0] = s_n$. The length of action sequence $\pi$ is denoted $|\pi|$.
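The definitions above translate almost literally into code. The following is a minimal Python sketch, not part of the encodings discussed in this paper; the names Action, make_action, applicable, apply_action and apply_sequence are illustrative assumptions, and the field delete stands for del(a) because del is a Python keyword.

```python
from typing import FrozenSet, Iterable, NamedTuple, Optional, Sequence

State = FrozenSet[str]


class Action(NamedTuple):
    """A STRIPS action <pre, add, del> (hypothetical helper type)."""
    name: str
    pre: FrozenSet[str]
    add: FrozenSet[str]
    delete: FrozenSet[str]


def make_action(name: str, pre: Iterable[str] = (), add: Iterable[str] = (),
                delete: Iterable[str] = ()) -> Action:
    a = Action(name, frozenset(pre), frozenset(add), frozenset(delete))
    # Well-formedness as assumed in the text: add/del and pre/add are disjoint.
    assert not (a.add & a.delete) and not (a.pre & a.add)
    return a


def applicable(a: Action, s: State) -> bool:
    # a is applicable in s iff pre(a) is a subset of s.
    return a.pre <= s


def apply_action(a: Action, s: State) -> State:
    # a[s] = (s \ del(a)) U add(a), defined only if a is applicable in s.
    if not applicable(a, s):
        raise ValueError(f"{a.name} is not applicable")
    return (s - a.delete) | a.add


def apply_sequence(pi: Sequence[Action], s: State) -> Optional[State]:
    # Returns pi[s], or None if some action in pi is not applicable.
    for a in pi:
        if not applicable(a, s):
            return None
        s = apply_action(a, s)
    return s
```

Returning None for a non-applicable sequence is a design choice of this sketch; it lets callers distinguish "not applicable" from "leads to the empty state".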
A STRIPS planning task $\Pi = \langle F, A, s_0, G \rangle$ is a four-element tuple consisting of a set of facts $F = \{f_1, \ldots, f_n\}$, a set of actions $A = \{a_1, \ldots, a_m\}$, an initial state $s_0 \subseteq F$, and a goal $G \subseteq F$. A state $s \subseteq F$ is a goal state (for $\Pi$) iff $G \subseteq s$. An action sequence $\pi$ is called a plan iff $\pi[s_0] \supseteq G$. We further define several relevant notions w.r.t. a planning task $\Pi$. A state $s$ is reachable from state $s'$ iff there exists an applicable action sequence $\pi$ such that $\pi[s'] = s$. A state $s \in 2^F$ is simply called reachable iff it is reachable from the initial state $s_0$. The set of all reachable states in $\Pi$ is denoted by $R_\Pi$. An action $a$ is reachable iff there is some state $s \in R_\Pi$ such that $a$ is applicable in $s$.
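For small tasks, the set of reachable states and the reachable actions can be computed by explicit breadth-first search. The sketch below is only an illustration under that assumption (the state space is exponential in the number of facts in general); the function names are hypothetical.

```python
from collections import deque
from typing import FrozenSet, Iterable, NamedTuple, Set

State = FrozenSet[str]


class Action(NamedTuple):
    name: str
    pre: FrozenSet[str]
    add: FrozenSet[str]
    delete: FrozenSet[str]


def reachable_states(actions: Iterable[Action], s0: State) -> Set[State]:
    """Explicitly enumerates all states reachable from s0 (BFS)."""
    actions = list(actions)
    seen = {s0}
    queue = deque([s0])
    while queue:
        s = queue.popleft()
        for a in actions:
            if a.pre <= s:  # a is applicable in s
                t = (s - a.delete) | a.add
                if t not in seen:
                    seen.add(t)
                    queue.append(t)
    return seen


def reachable_actions(actions: Iterable[Action], s0: State) -> Set[str]:
    """Names of actions that are applicable in at least one reachable state."""
    actions = list(actions)
    states = reachable_states(actions, s0)
    return {a.name for a in actions if any(a.pre <= s for s in states)}
```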
Deciding whether a STRIPS planning task has a plan is known to be PSPACE-complete in general, and NP-complete if the length of the plan is polynomially bounded (Bylander, 1994).

Epistemic Logic Programs (ELPs) and Answer Set Programming (ASP).
We assume the reader is familiar with ELPs and will only give a very brief overview of the core language. For more information, we refer to the original paper proposing ELPs (Gelfond, 1991), therein named Epistemic Specifications, whose semantics we will use in the present paper.
An ELP is a set of rules of the form $a_1 \lor \cdots \lor a_k \leftarrow \ell_1, \ldots, \ell_m$. In these rules, all $a_i$ are atoms of the form $p(t_1, \ldots, t_n)$, where $p$ is a predicate name, and $t_1, \ldots, t_n$ are terms, that is, either variables or constants. Each $\ell$ is either an objective or subjective literal, where objective literals are of the form $a$ or $\neg a$ ($a$ being an atom), and subjective literals are of the form $\mathbf{K}\,l$ or $\neg\mathbf{K}\,l$, where $l$ is an objective literal. Note that often the operator $\mathbf{M}$ is also used, which we will simply treat as a shorthand for $\neg\mathbf{K}\neg$.
The domain of constants in an ELP P is given implicitly by the set of all constants that appear in it. Generally, before evaluating an ELP, variables are removed by a process called grounding, that is, for every rule, the variables are replaced by all possible combinations of constants, and the resulting ground copies of the rule are added to the program ground(P). In practice, state-of-the-art systems implement several optimizations that try to minimize the size of the grounding.
The result of a (ground) ELP $P$ is calculated as follows (Gelfond, 1991). An interpretation $I$ is a set of ground atoms appearing in $P$. A set of interpretations $\mathcal{I}$ satisfies a subjective literal $\mathbf{K}\,l$ (denoted $\mathcal{I} \models \mathbf{K}\,l$) iff the objective literal $l$ is satisfied in all interpretations in $\mathcal{I}$. The epistemic reduct $P^{\mathcal{I}}$ of $P$ w.r.t. $\mathcal{I}$ is obtained from $P$ by replacing each subjective literal $\ell$ with $\top$ if $\mathcal{I} \models \ell$, and with $\bot$ otherwise. $P^{\mathcal{I}}$, therefore, is an ASP program, that is, a program without subjective literals. The solutions to an ELP $P$ are called world views. A set of interpretations $\mathcal{I}$ is a world view of $P$ iff $\mathcal{I} = AS(P^{\mathcal{I}})$, where $AS(P^{\mathcal{I}})$ denotes the set of stable models (or answer sets) of the logic program $P^{\mathcal{I}}$ according to the semantics of answer set programming (Gelfond and Lifschitz, 1991). Checking whether a world view exists for an ELP is known to be $\Sigma^P_3$-complete in general, as shown by (Truszczynski, 2011).
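To make the world-view condition concrete, the following brute-force Python sketch enumerates world views of tiny ground programs. It is an illustration only and rests on several simplifying assumptions that are not taken from the paper: rules have at most one head atom (None marks a constraint), objective negation is read as default negation ("not"), subjective literals are restricted to K a and ¬K a for atoms a (written "K" and "notK"), and world views are required to be non-empty. Instead of enumerating sets of interpretations, it guesses a truth value for every subjective literal, builds the corresponding reduct, and keeps the guess if the answer sets of the reduct agree with it.

```python
from itertools import chain, combinations, product


def powerset(xs):
    xs = sorted(xs)
    return chain.from_iterable(combinations(xs, r) for r in range(len(xs) + 1))


def answer_sets(rules, atoms):
    """Brute-force stable models of a ground program without subjective literals."""
    found = []
    for cand in map(frozenset, powerset(atoms)):
        # Gelfond-Lifschitz reduct w.r.t. cand: drop blocked rules and 'not' literals.
        reduct = [(head, [a for kind, a in body if kind == "pos"])
                  for head, body in rules
                  if all(a not in cand for kind, a in body if kind == "not")]
        # Least model of the remaining positive rules.
        model = set()
        changed = True
        while changed:
            changed = False
            for head, pos in reduct:
                if head is not None and head not in model and all(a in model for a in pos):
                    model.add(head)
                    changed = True
        violated = any(head is None and all(a in model for a in pos)
                       for head, pos in reduct)
        if model == cand and not violated:
            found.append(cand)
    return found


def world_views(rules, atoms):
    """All non-empty sets of interpretations I with I = AS(P^I)."""
    subjective = sorted({lit for _, body in rules for lit in body
                         if lit[0] in ("K", "notK")})
    views = []
    for values in product([True, False], repeat=len(subjective)):
        guess = dict(zip(subjective, values))
        # Epistemic reduct: true subjective literals vanish, false ones kill the rule.
        reduct = [(head, [lit for lit in body if lit not in guess])
                  for head, body in rules
                  if all(guess[lit] for lit in body if lit in guess)]
        models = answer_sets(reduct, atoms)
        if not models:
            continue  # assumption of this sketch: world views are non-empty

        def holds(lit):
            kind, atom = lit
            known = all(atom in m for m in models)
            return known if kind == "K" else not known

        if all(holds(lit) == guess[lit] for lit in subjective):
            views.append(frozenset(models))
    return views
```

For example, the program consisting of the three rules a ← not b, b ← not a, and c ← ¬K a has, under these assumptions, a single world view containing the two answer sets {a, c} and {b, c}.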

Reversibility of Actions
In this section, we focus on the notion of uniform reversibility, which is a subclass of action reversibility as explained in detail by Morak et al. (2020). Intuitively, we call an action reversible if there is a way to undo all the effects that this action caused, and we call an action uniformly reversible if its effects can be undone by a single sequence of actions irrespective of the state where the action was applied. While this intuition is fairly straightforward, when formally defining this concept, we also need to take several other factors into account; in particular, the set of possible states where an action is considered plays an important role (Morak et al., 2020).

Definition 1
Let $F$ be a set of facts, $A$ be a set of actions, $S \subseteq 2^F$ be a set of states, and $a \in A$ be an action. We call $a$ uniformly $S$-reversible iff there exists a sequence of actions $\pi = \langle a_1, \ldots, a_n \rangle \in A^n$ such that for each $s \in S$ wherein $a$ is applicable it holds that $\pi$ is applicable in $a[s]$ and $\pi[a[s]] = s$.
The notion of uniform reversibility in the most general sense does not depend on a concrete STRIPS planning task, but only on a set of possible actions and states w.r.t. a set of facts. Note that the set of states S is an explicit part of the notion of uniform S-reversibility.
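Definition 1 can be checked directly when the set S is small enough to be listed explicitly. The following Python sketch (an illustration with hypothetical names, not one of the encodings of this paper) tests a given candidate π against every state in S and searches for a reverse plan by enumerating all action sequences up to a length bound.

```python
from itertools import product
from typing import FrozenSet, Iterable, List, NamedTuple, Optional, Sequence

State = FrozenSet[str]


class Action(NamedTuple):
    name: str
    pre: FrozenSet[str]
    add: FrozenSet[str]
    delete: FrozenSet[str]


def run(pi: Sequence[Action], s: State) -> Optional[State]:
    """Returns pi[s], or None if pi is not applicable in s."""
    for a in pi:
        if not a.pre <= s:
            return None
        s = (s - a.delete) | a.add
    return s


def is_reverse_plan(a: Action, pi: Sequence[Action], states: Iterable[State]) -> bool:
    """True iff pi undoes a in every state of 'states' in which a is applicable."""
    for s in states:
        if a.pre <= s and run([a, *pi], s) != s:
            return False
    return True


def find_reverse_plan(a: Action, actions: Sequence[Action],
                      states: Iterable[State], max_len: int) -> Optional[List[Action]]:
    """Shortest S-reverse plan for a of length at most max_len, if any."""
    states = list(states)
    for n in range(max_len + 1):
        for pi in product(actions, repeat=n):
            if is_reverse_plan(a, pi, states):
                return list(pi)
    return None
```

With an action set-g (precondition f, adds g) and an action unset-g (precondition g, deletes g), set-g is uniformly S-reversible for S = {{f}} but not for S = {{f}, {f, g}}: after applying set-g both states collapse to {f, g}, so no single sequence can restore both. This illustrates why S is an explicit part of the notion.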
Based on this general notion, it is then possible to define several concrete sets of states $S$ that are useful when deciding whether an action is reversible. For instance, $S$ could be defined via a propositional formula over the facts in $F$. Alternatively, we can consider the set of all possible states ($2^F$), which gives us a notion of universal reversibility that applies to all possible planning tasks that share the same set of facts and actions (i.e., the tasks that differ only in the initial state or goals). Finally, we can turn our attention to a specific STRIPS instance and ask whether a certain action is uniformly reversible for all states reachable from the initial state.

Definition 2
Let $F$, $A$, $S$, and $a$ be as in Definition 1. We call the action $a$
1. uniformly ϕ-reversible iff $a$ is uniformly $S$-reversible in the set $S$ of models of the propositional formula ϕ over $F$;
2. uniformly reversible in Π iff $a$ is uniformly $R_\Pi$-reversible for some STRIPS planning task Π; and
3. universally uniformly reversible, or, simply, uniformly reversible, iff $a$ is uniformly $2^F$-reversible.
Given the above definitions, we can already observe some interrelationships. In particular, universal uniform reversibility (that is, uniform reversibility in the set of all possible states) is obviously the strongest notion, implying all the other, weaker notions. It may be particularly important when one wants to establish uniform reversibility irrespective of the concrete STRIPS instance. On the other hand, ϕ-reversibility may be of particular interest when ϕ encodes the natural domain constraints for a given planning task.
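The three notions of Definition 2 differ only in how the set S is obtained. The sketch below builds each of these sets explicitly for small examples; it is illustrative only, with hypothetical names, and ϕ is represented as a Python predicate over states, which is an assumption of this sketch rather than a representation used in the paper.

```python
from collections import deque
from itertools import combinations
from typing import Callable, FrozenSet, Iterable, List, NamedTuple, Sequence, Set

State = FrozenSet[str]


class Action(NamedTuple):
    name: str
    pre: FrozenSet[str]
    add: FrozenSet[str]
    delete: FrozenSet[str]


def all_states(facts: Iterable[str]) -> List[State]:
    """The set 2^F, used for universal uniform reversibility."""
    facts = sorted(facts)
    return [frozenset(c) for r in range(len(facts) + 1)
            for c in combinations(facts, r)]


def models_of(phi: Callable[[State], bool], facts: Iterable[str]) -> List[State]:
    """The models of phi over F, used for uniform phi-reversibility."""
    return [s for s in all_states(facts) if phi(s)]


def reachable_states(actions: Sequence[Action], s0: State) -> Set[State]:
    """The set R_Pi, used for uniform reversibility in a task Pi."""
    seen, queue = {s0}, deque([s0])
    while queue:
        s = queue.popleft()
        for a in actions:
            if a.pre <= s:
                t = (s - a.delete) | a.add
                if t not in seen:
                    seen.add(t)
                    queue.append(t)
    return seen


def is_reverse_plan(a: Action, pi: Sequence[Action], states: Iterable[State]) -> bool:
    """True iff pi undoes a in every listed state in which a is applicable."""
    for s in states:
        if not a.pre <= s:
            continue
        t = (s - a.delete) | a.add
        for b in pi:
            if not b.pre <= t:
                return False
            t = (t - b.delete) | b.add
        if t != s:
            return False
    return True
```

Since every set S considered here is a subset of 2^F, a plan accepted for all_states(F) is accepted for any other choice of S as well, which mirrors the observation that the universal notion implies the weaker ones.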
The notion of uniform reversibility naturally gives rise to the notion of the reverse plan. We say that some action a has an (S-)reverse plan π iff a is uniformly (S-)reversible using the sequence of actions π. It is interesting to note that this definition of the reverse plan based on uniform reversibility now coincides with the same notion as defined by (Eiter et al., 2008). Note, however, that in that paper the authors use a much more general planning language.
Even if the length of the reverse plan is polynomially bounded, the problem of deciding whether an action is uniformly (ϕ-)reversible is intractable. In particular, deciding whether an action is universally uniformly reversible (resp. uniformly ϕ-reversible) by a polynomial-length reverse plan is NP-complete (resp. in $\Sigma^P_2$) (Morak et al., 2020).

Methods
After reviewing the relevant features of plasp, described by (Dimopoulos et al., 2019), in Section 4.1, we present our encodings for determining reversibility in Section 4.2.

The plasp Format
The system plasp, described by (Dimopoulos et al., 2019), transforms PDDL domains and problems into facts. Together with suitable programs, plans can then be computed by ASP solvers, and hence also by ELP solvers, since ELPs are a superset of ASP programs. Given a STRIPS domain with facts $F$ and actions $A$, the following relevant facts and rules will be created by plasp:
• variable(variable("f")). for all f ∈ F
• action(action("a")). for all a ∈ A
• precondition(action("a"),variable("f"),value(variable("f"),true)) :- action(action("a")).

Reversibility Encodings using ASP and ELPs
In this section, we present our ASP and ELP encodings for checking whether, in a given domain, there is an action that is uniformly reversible. As we have seen in Section 4.1, the plasp tool is able to rewrite STRIPS domains into ASP rules even when no concrete planning instance for that domain is given. We will present two encodings, one for (universal) uniform reversibility, and one that can be used for uniform ϕ-reversibility. Note that universal uniform reversibility is computationally easier than uniform ϕ-reversibility (under standard complexity-theoretic assumptions). For a given action (and polynomial-length reverse plans), the former can be decided in NP, while the latter is harder (Theorems 18 and 20 in Morak et al. (2020)). We will hence start with the encoding for the former problem, which follows a standard guess-and-check pattern.

Universal Uniform Reversibility
The encodings are based on sequential-horizon.lp in the plasp distribution.
ELP Encoding. As a "database", the encoding takes the output of plasp's translate action (for details, see Dimopoulos et al. (2019)). The problem can be solved in NP due to the following Observation (*): in any (universal) reverse plan for some action a, it is sufficient to consider only the set of facts that appear in the precondition of a. If any action in a candidate reverse plan π for a (resp. a itself) contains any other fact than those in pre(a), then π cannot be a reverse plan for a (resp. a is not uniformly reversible); see Theorem 18 in Morak et al. (2020) or Theorem 3 in (Chrpa, Faber, and Morak, 2021). With this observation in mind, we can now describe the (core parts of) our encodings. We start with our ELP encoding and will explain later how to modify it to obtain a plain ASP encoding. Note that here the epistemic operators are used in the same way as choice rules are used in ASP. We did this in order to understand the computational overhead of using ELP rather than ASP, but also in preparation for the uniform ϕ-reversibility encoding.
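Observation (*) suggests a very cheap test for a given candidate plan: reject it if a or any action in π mentions a fact outside pre(a), and otherwise simulate a followed by π from the single state pre(a). The Python sketch below illustrates this reading of the observation; it is not the ELP or ASP encoding itself, and the function names are hypothetical.

```python
from typing import FrozenSet, NamedTuple, Sequence


class Action(NamedTuple):
    name: str
    pre: FrozenSet[str]
    add: FrozenSet[str]
    delete: FrozenSet[str]


def mentioned_facts(a: Action) -> FrozenSet[str]:
    return a.pre | a.add | a.delete


def is_universal_reverse_plan(a: Action, pi: Sequence[Action]) -> bool:
    """Checks pi as a universal reverse plan for a, following Observation (*)."""
    relevant = a.pre
    # Neither a nor any action of pi may mention a fact outside pre(a).
    if not all(mentioned_facts(b) <= relevant for b in [a, *pi]):
        return False
    # It then suffices to simulate from the single start state pre(a).
    s = relevant
    for b in [a, *pi]:
        if not b.pre <= s:
            return False
        s = (s - b.delete) | b.add
    return s == relevant
```

Checking a single start state is what keeps the universal case in NP: a polynomial-length candidate plan can be verified in polynomial time, without iterating over all of 2^F.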
The ELP encoding makes use of the following main predicates (in addition to several auxiliary predicates, as well as those imported from plasp):
• chosen/1 encodes the action to be tested for reversibility.
• holds/3 encodes that some fact (or variable, as they are called in plasp parlance) is set to a certain value at a given time step.
• occurs/2 encodes the candidate reverse plan, saying which action occurs at which time step.
With the intuitive meaning of the predicates defined, we first choose a single action from the available actions and set the initial state as the facts in the precondition of the chosen action. In the rules implementing this step, the first two lines partition the actions into chosen and unchosen ones; since it is a "modal guess," there will be one world view for each partition. The third line makes sure that there is at most one chosen action, and lines 4 and 5 enforce at least one chosen action. The last rule says, in line with Observation (*) above, that only those variables in the precondition are relevant to check for a reverse plan.
These rules set the stage for the inherent planning problem to be solved to find a reverse plan. In fact, from the initial state guessed above, we need to find a plan π that starts with action a (the chosen action), such that after executing π we end up in the initial state again. Such a plan is a (universal) reverse plan. This idea is encoded by the next group of rules. They guess a potential plan π using the same technique as above, and then execute the plan on the initial state (changing facts if this is caused by the application of an action, and keeping the same facts if they were not modified).
Finally, we simply need to check that the plan is (a) executable, and (b) leads from the initial state back to the initial state. This is done with a final group of constraints. The first of them checks that the actions in the candidate plan are actually applicable. The next two check that the actions do not contain any facts other than those that are relevant (cf. Observation (*) above). Finally, the last three rules make sure that at the maximum time point (i.e. the one given by the externally defined constant "horizon") the initial state and the resulting state of plan π are the same.
It is not difficult to verify that any world view of the above ELP (combined with the plasp translation of a STRIPS problem domain) will yield a plan π (encoded by the occurs predicate) that contains the sequence of actions $a, a_1, \ldots, a_n$, where $a_1, \ldots, a_n$ is a (universal) reverse plan for the action $a$ (each world view consists of precisely one answer set). Note that our encoding yields reverse plans whose length is exactly the value of the "horizon" constant. One could, for instance, employ an iterative deepening approach for determining the shortest reverse plans in case the plan length is not known or fixed. This completes our ELP encoding for the problem of deciding universal uniform reversibility.
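The iterative deepening idea mentioned above can be sketched as an outer loop over increasing horizons. The Python code below is a stand-in that enumerates candidate plans explicitly, whereas in practice each iteration would be a solver call with the horizon constant set accordingly; the function name and the max_horizon bound are assumptions of this sketch. It relies on Observation (*): only actions mentioning facts from pre(a) are candidates, and simulation starts from the single state pre(a).

```python
from itertools import product
from typing import FrozenSet, List, NamedTuple, Optional, Sequence, Tuple


class Action(NamedTuple):
    name: str
    pre: FrozenSet[str]
    add: FrozenSet[str]
    delete: FrozenSet[str]


def shortest_universal_reverse_plan(
        a: Action, actions: Sequence[Action],
        max_horizon: int) -> Optional[Tuple[int, List[Action]]]:
    """Tries horizons 0, 1, ..., max_horizon and returns the first hit."""
    relevant = a.pre
    if not (a.add | a.delete) <= relevant:
        return None  # by Observation (*), a is not uniformly reversible
    # Only actions that mention nothing outside pre(a) can occur in a reverse plan.
    candidates = [b for b in actions if (b.pre | b.add | b.delete) <= relevant]
    for horizon in range(max_horizon + 1):
        for pi in product(candidates, repeat=horizon):
            s = (relevant - a.delete) | a.add  # state after applying a in pre(a)
            for b in pi:
                if not b.pre <= s:
                    break  # pi is not applicable
                s = (s - b.delete) | b.add
            else:
                if s == relevant:
                    return horizon, list(pi)
    return None
```

Because horizons are tried in increasing order, the first plan found is a shortest one; a return value of None only means that no reverse plan exists within the given bound.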
We can show that the encoding indeed leads to the correct result:

Theorem 4
Given a STRIPS planning task $\Pi = \langle F, A, s_0, G \rangle$, the ELP encoding in this section, when applied to $\Pi$, produces exactly one world view for each universally uniformly reversible action $a \in A$ and reverse plan $\pi$ of length horizon for $a$.

Proof (Sketch).
We will show that, for each such action $a$ and reverse plan $\pi$, there exists exactly one world view $\mathcal{I}$, such that every answer set in $\mathcal{I}$ contains the facts chosen(a) and occurs(a', i) for each action $a' \in \pi$, where $a'$ is the $(i-1)$-th action in $\pi$. This follows by construction: The rules deriving the chosen and occurs predicates, together with the constraints that follow, ensure that there is exactly one world view candidate per choosable action and candidate reverse plan of length horizon. Because of Theorem 18 in Morak et al. (2020), for universal uniform reversibility we only need to check a single starting state, and hence each world view candidate $\mathcal{I}$ has at most one answer set $M$, i.e. $\mathcal{I} = \{M\}$.
The rules deriving the holds and caused predicates then execute action a and the reverse plan, keeping track of which value each variable has after each step (represented by time points T). Finally, M is eliminated as an answer set in case where some action a ′ in the reverse plan is not applicable or if a ′ "touches" a variable that does not occur in the precondition of the chosen action a (encoded in the predicate relevant). The latter check is, again, correct because of Observation (*). The final three rules ensure that, in any world view, no answer set can contain the fact noreversal, which is true if and only if some variable in the initial state (time point 0) has a different value from the final state (time point horizon + 1).
Hence, in any remaining world view I = {M}, M contains precisely a chosen action a, a reverse plan π of length horizon inside the occurs predicate, and the intermediate states at each time step after the successful and valid application of action a or actions from π, starting at some initial state that equals the final state. But this is precisely a reverse plan for a of length horizon, as desired.
ASP Encoding. Now, to see how the same thing can be achieved using ASP, we can modify the encoding above as follows, yielding an encoding that guarantees that every answer set represents a possible uniform reverse plan. Firstly, in order to choose the action to reverse, the first five rules of the ELP encoding can be replaced by a simple choice rule. Similarly, the rules that choose, for each time step, an action (via the occurs predicate), can be replaced with a choice rule as follows:
1 {occurs(A, T) : action(action(A))} 1 :- time(T), T > 1.
Finally, the check that no reversal exists (represented by the noreversal atom in the ELP encoding) can be encoded in ASP using simple constraints:
:- holds(V, Val, 0), not holds(V, Val, horizon+1).
:- holds(V, Val, horizon+1), not holds(V, Val, 0).
This completes the ASP encoding, which now does not contain any subjective literals. It can be seen that, whereas the ELP encoding generates one world view per uniform reverse plan (by doing all the guesses via subjective literals), the ASP encoding will generate one answer set per such plan:

Theorem 5
Given a STRIPS planning task $\Pi = \langle F, A, s_0, G \rangle$, the ASP encoding in this section, when applied to $\Pi$, produces exactly one answer set for each universally uniformly reversible action $a \in A$ and reverse plan $\pi$ of length horizon for $a$.
Proof (Idea). The proof proceeds in a similar fashion to the proof of Theorem 4. In particular, now the actions are not guessed via a world view, but directly inside the answer set, via the appropriate choice rules. Hence, candidate answer sets contain all combinations of chosen actions and reverse plan candidates. Via the constraints, any answer set where the actions in the reverse plan do not satisfy the conditions of Observation (*), or where they do not lead back to the original state, is eliminated, leaving only answer sets that contain chosen actions together with valid reverse plans for them, as desired.
Comparison. The ASP encoding is a fairly straightforward guess-and-check program, as the underlying problem of deciding universal uniform action reversibility is only NP-complete (Morak et al., 2020). In this case, it could be argued that the choice rules employed there are a more natural encoding than guessing via the modal operators of ELPs. However, in terms of contrasting the expressiveness of the two languages, we feel that it is still interesting to see how "simple" NP-complete problems can be encoded using the modal operators of ELPs, as this may lead to further improvements of the modelling capabilities of the ELP language in the future. It also stands to reason that, in the future, ELP solvers should aim to provide some syntactic sugar for these modal operators for guess-and-check programs, similar to how choice rules are provided by modern ASP solvers.

Other Forms of Uniform Reversibility
ELP Encoding. Using a similar guess-and-check idea as in the previous encodings, we can also check for uniform reversibility for a specified set of states (that is, uniform S-reversibility). Generally, the set S of relevant states is encoded in some compact form, and our encoding therefore, intentionally, does not assume anything about this representation, but leaves the precise checking of the set S open for implementations of a concrete use case. The predicates used in this more advanced encoding are similar to the ones used in the encodings for the universal case above, and hence we will not list them here again. However, in order to encode the for-all-states check (i.e. the check that the candidate reverse plan works in all states inside the set S), we now need our world views to contain multiple answer sets: one for each state in S. We again start off with the ELP encoding. This time, however, we will see afterwards that there is no easy modification to immediately obtain an ASP encoding; rather, the two differ substantially.
The ELP encoding starts off much like the previous one. Note that we no longer need to keep track of any set of "relevant" facts, since we now need to consider all the facts that appear inside the actions and in the set S of states. However, we need to open up several answer sets, one for each state. This is done by guessing a truth value for each fact at time step 0. Recall that contains is part of the plasp output, encoding all possible values for a given variable.
Next, we again guess and execute a plan, keeping track of whether the actions could be applied at each particular time step. Again, the rules choose a candidate reverse plan π, starting with the action-to-be-checked a, as before. Furthermore, we check applicability: π should be applicable (i.e. at each time step, the relevant action must have been applied, encoded by the third block of rules), and furthermore, only modified facts (i.e. those affected by an action) can change their truth values from time step to time step. Finally, we again need to make sure that the guessed plan actually returns us to the original state at time step 0. This concludes the main part of our ELP encoding.
In its current form, the encoding given above produces exactly the same results as the first encoding given in this section; that is, it checks for universal uniform reversibility. However, the second encoding can be easily modified in order to check uniform S-reversibility. Simply add a rule of the following form to it:
:- < check guessed state against set S >
This rule should fire precisely when the current guess (that is, the currently considered starting state) does not belong to the set S. This can of course be generalized easily. For example, if set S is given as a formula ϕ, then the rule should check whether the current guess conforms to formula ϕ (i.e., encodes a model of ϕ). Other compact representations of S can be similarly checked at this point.
Hence, we have a flexible encoding for uniform S-reversibility that is easy to extend with various forms of representations of set S.
ASP Encoding. We now turn to the ASP encoding. As we will see, it is substantially more involved than the ELP encoding, since we need to apply an encoding technique called saturation, cf. (Eiter and Gottlob, 1995), allowing us to express a form of universal quantification. We can start off the same as last time, that is, with a choice rule. We note the first difference compared to the ELP encoding: we need to keep track of all STRIPS facts that are potentially affected by an action. We assume that a predicate opposites/2 exists that holds, in both possible orders, the values "true" and "false". This will later be used to find the opposite value of some STRIPS fact at a particular time step.
Next, we again guess and execute a plan. Note that we use the predicate affected here to encode inertia for those facts that are not affected by the applied action. From here on, we see a major difference to the ELP encoding. We need to set up our goal conditions and then encode the universal check for all states of set S. First we check that π is applicable (i.e. at each time step, the relevant action must have been applied), and furthermore, that the state at the beginning is equal to the state at the end.
same(V) :- holds(V, Val, 0), holds(V, Val, horizon + 1).
samestate :- same(V) : variable(variable(V)).
planvalid :- applied(horizon + 1).
reversePlan :- samestate, planvalid.
Finally, we need to specify that for all the states specified in the set S the candidate reverse plan must work. As stated above, this is done using the technique of saturation (Eiter and Gottlob, 1995), allowing us to express a form of universal quantifier that, in our case, checks that, for every state in the set S, we return to the original state after applying the chosen action and the reverse plan. We encourage the reader to refer to the relevant publication for more details on the "inner workings" of this encoding technique. Again, as is, the ASP encoding above checks universal uniform reversibility. However, it can again be easily modified in order to check uniform S-reversibility. Simply add a rule of the following form to it, analogously to what we had for the ELP encoding:
reversePlan :- < check guessed state against set S >
This completes the overview of our ELP and ASP encodings for uniform reversibility.
Comparison. Looking at the structure of the ELP and ASP encodings, it is not difficult to see that they share a certain common structure. This is not surprising, since the underlying language is the same. However, it can also be observed that the technique of saturation, which is required (in terms of expressive power) to encode uniform S-reversibility in ASP, is somewhat non-intuitive, as it is not immediately clear what the semantics of this construction are. By contrast, the modal operators provided by ELPs make this much more readable and declarative.

Experiments
We have conducted preliminary experiments with artificially constructed domains. The domains are as follows:

(define (domain rev-i)
  (:requirements :strips)
  (:predicates (f0) ... (fi))
  (:action del-all
    :precondition (and (f0) ... (fi))
    :effect (and (not (f0)) ... (not (fi))))
  (:action add-f0 :effect (f0))
  ...

The action del-all has a universal uniform reverse plan ⟨add-f0, ..., add-fi⟩. We have generated instances from i = 1 to i = 6 and from i = 10 to i = 200 with step 10. We have analyzed runtime and memory consumption of two problems: (a) finding the unique reverse plan of size i (by setting the constant horizon to i) and proving that no other reverse plan exists, and (b) showing that no reverse plan of length i-1 exists (by setting the constant horizon to i-1). We compare the four encodings described in Section 4.2, and refer to the first two as the simple ELP/ASP encoding and to the second two as the general ELP/ASP encoding.
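Benchmark domains of this shape are easy to generate programmatically. The following Python sketch produces the PDDL text for a given i; it assumes that the part of the domain elided by "..." above consists only of the remaining actions add-f1, ..., add-fi, each built like add-f0, which is an assumption of this sketch and not confirmed by the listing. The function name rev_domain is hypothetical.

```python
def rev_domain(i: int) -> str:
    """PDDL text for the domain rev-i with facts f0, ..., fi (sketch)."""
    facts = [f"f{j}" for j in range(i + 1)]
    preds = " ".join(f"({f})" for f in facts)
    dels = " ".join(f"(not ({f}))" for f in facts)
    lines = [
        f"(define (domain rev-{i})",
        "  (:requirements :strips)",
        f"  (:predicates {preds})",
        "  (:action del-all",
        f"    :precondition (and {preds})",
        f"    :effect (and {dels}))",
    ]
    # Assumed shape of the elided actions: one add-fj per fact fj.
    lines += [f"  (:action add-{f} :effect ({f}))" for f in facts]
    lines.append(")")
    return "\n".join(lines)


if __name__ == "__main__":
    print(rev_domain(2))
```

The resulting text can be written to a file and passed to a PDDL-to-ASP translator such as plasp; the exact command line is not shown here because it is not given in this section.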
The results for problem (a) are plotted in Figure 1. The general ELP encoding exceeded the time limit already at the problem with seven facts, while the simple ELP encoding could solve all problems with up to 150 facts within the time limit. The general and simple ASP encodings perform better than their ELP counterparts, but the simple ELP encoding performed much better than the saturation-based general ASP encoding, even though ELP solvers are in their infancy compared to the heavily optimized ASP solving systems. The memory consumption increased with i for all encodings, proportional to the computation time.
The results for problem (b) are plotted in Figure 2. Interestingly, compared to (a), all the encodings performed significantly better. While the general encodings still hit the time limit for seven facts (ELP) and 50 facts (ASP), the simple encodings were able to solve all the instances up to our maximum of i = 250 (the figure stops at i = 150), but at the expense of increasing memory usage. In total, the general encodings, for both ASP and ELP, scale worse, as expected, since the ELP solver needs to evaluate all answer sets inside each possible world view, and the ASP solver needs to compute the result of the saturation check. However, for the simple encodings, especially the task of testing for non-reversibility performed surprisingly well for both ASP and ELP. From all of our results, however, we can see that ELP solving still severely trails, in terms of performance, encodings for plain ASP. This was somewhat expected, since ELP solvers are nowhere near as optimized as modern ASP systems. We hope, however, that our results encourage further improvements in the area of ELP solvers, since matching the ASP results, at least in this particular benchmark set, does not seem completely out of reach.

Conclusions
In this paper, we have given a review of several notions of action reversibility in STRIPS planning, as originally presented by . We then proceeded, on the basis of the PDDL-to-ASP translation tool plasp, described by (Dimopoulos et al., 2019), to present two ELP encodings and two ASP encodings to solve the task of universal uniform reversibility of STRIPS actions, given a corresponding planning domain. When given to an appropriate solving system, these encodings, combined with the ASP translation of STRIPS planning domains produced by plasp, then yield a set of world views (for ELP) or answer sets (for ASP), each one representing a (universal) reverse plan for each action in the domain, for which such a reverse plan could be found.
The four encodings use two different approaches. The first two, simpler, encodings make use of a shortcut that allows them to focus only on those facts that appear in the precondition of the action to check for reversibility, as described by . The second two encodings make use of the power of world views containing multiple answer sets in ELP, and of the saturation encoding technique of Eiter and Gottlob (1995) in ASP, respectively, which allows for encoding universal quantifiers. These two encodings try to directly represent the original definition of uniform reversibility: for an action to be uniformly reversible, there must exist a plan, and this plan must revert the action in all possible starting states (where it is applicable). Hence, the two general encodings are more flexible insofar as they also allow for the checking of non-universal uniform reversibility (e.g. to check for uniform ϕ-reversibility, where the starting states are given via some formula ϕ).
In order to compare the four encodings, we performed some benchmarks on artificially generated instances by checking whether there is an action that is universally uniformly reversible. For the ELP and ASP communities, it will not come as a surprise that the ELP encodings perform worse than the ASP encodings. We see this as a call to action to further optimize and improve ELP solvers. From our experiments, it seems that the performance of ASP solvers, while significantly better, is not out of reach for ELP systems.
For future work, we intend to optimize our ELP encodings further, and test them with other established ELP solvers. There are several competing ELP semantics, and several solvers are available. It would also be interesting to see how the encodings perform when compared to a procedural implementation of the algorithms proposed for reversibility checking by . We would also like to compare our approach to the existing tools RevPlan (implementing techniques of (Eiter et al., 2008)) and undoability (implementing techniques of (Daum et al., 2016)). Furthermore, we aim to explore how our techniques can be extended to planning languages more expressive than STRIPS. We envision various avenues for that: one is to deal with "lifted representations" (going beyond propositional atoms); another is to allow for non-deterministic action effects or exogenous events, for which ASP and ELP seem to be well-suited.
We can check that the only world view with an answer set, in which noreversal is not derived, is the one in which occurs("del-all",1), occurs("add-f0",2), occurs("add-f1",3) hold. Indeed, del-all is the only universally uniformly reversible action, and its only reverse plan of length 2 is add-f0, add-f1.
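The claim above can be cross-checked by brute force, independently of any ASP or ELP solver. The following Python sketch is not one of the encodings discussed in this paper; it simply enumerates all states and all candidate plans for the two-fact example. The names FACTS, ACTIONS, apply_action, reverts_everywhere and reverse_plans are hypothetical, and the precondition f0 on add-f1 is an assumption about the part of the domain that is elided in the listing, chosen because it makes the reverse plan unique.

```python
from itertools import product

FACTS = ("f0", "f1")

# Each action maps to (precondition, add effects, delete effects).
# ASSUMPTION: add-f1 requiring f0 is a guess at the elided part of the domain.
ACTIONS = {
    "del-all": (frozenset(FACTS), frozenset(), frozenset(FACTS)),
    "add-f0": (frozenset(), frozenset({"f0"}), frozenset()),
    "add-f1": (frozenset({"f0"}), frozenset({"f1"}), frozenset()),
}


def all_states():
    """Every subset of FACTS, i.e. every possible state."""
    return [
        frozenset(f for f, bit in zip(FACTS, bits) if bit)
        for bits in product((False, True), repeat=len(FACTS))
    ]


def apply_action(state, name):
    """Return the successor state, or None if the action is inapplicable."""
    pre, add, delete = ACTIONS[name]
    if not pre <= state:
        return None
    return (state - delete) | add


def reverts_everywhere(action, plan):
    """True if `plan` undoes `action` in every state where `action` applies."""
    for start in all_states():
        state = apply_action(start, action)
        if state is None:
            continue  # action not applicable here, so nothing to revert
        for step in plan:
            state = apply_action(state, step)
            if state is None:
                return False  # the plan itself is inapplicable
        if state != start:
            return False  # the plan does not return to the starting state
    return True


def reverse_plans(action, horizon):
    """All action sequences of exactly `horizon` steps that revert `action`."""
    return [
        plan
        for plan in product(ACTIONS, repeat=horizon)
        if reverts_everywhere(action, plan)
    ]
```

Under these assumptions, calling reverse_plans("del-all", 2) yields the single plan add-f0, add-f1, while add-f0 and add-f1 have no reverse plan of length 2. The enumeration is exponential in both the number of facts and the horizon, so it is only meant as a sanity check on tiny instances.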
The simple ASP encoding works in a very similar way. Since the simple ELP encoding has at most one answer set per world view, we simply turn the "epistemic guesses" into "standard guesses", so instead of an answer set encapsulated in a world view, it is just an answer set. Here, too, one answer set exists for the example, in which occurs("del-all",1), occurs("add-f0",2), occurs("add-f1",3) hold.
Concerning the general ELP encoding, world views similar to those above are created. In that encoding, however, multiple answer sets can exist in a world view: for each variable not in the precondition of the chosen action, there will be answer sets in which the variable is true, and answer sets in which it is false. So, any world view in which chosen("del-all"), occurs("del-all",1), holds("f0",true,0), holds("f1",true,0) hold will still have at most a single answer set, as all variables occur in the precondition of del-all. It is easy to see that the reverse plan is then in a single-answer-set world view similar to the one in the simple ELP encoding.
Let us have a look at the world view containing occurs("add-f0",1), occurs("del-all",2), occurs("del-all",3). For this, inapplicable will be derived because the preconditions for occurs("del-all",3) are not met in any of the answer sets, and also because the precondition for occurs("del-all",2) is not met in those answer sets in which holds("f1",false,0) is true. Therefore, the constraint :- not &k{~inapplicable}. is violated for this world view.
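The reasoning in the last two paragraphs can be replayed with a small Python sketch. It mimics only the intuition behind the answer sets of a world view (one per completion of the facts outside the chosen action's precondition) and is not the ELP encoding itself. The names completions and failed_steps are hypothetical. Only del-all and add-f0 are modelled, exactly as given in the domain listing, and we assume that a step whose precondition fails leaves the state unchanged, which the text does not specify.

```python
from itertools import product

FACTS = ("f0", "f1")

# Only the two actions spelled out in the domain listing are modelled here.
PRE = {"del-all": {"f0", "f1"}, "add-f0": set()}
ADD = {"del-all": set(), "add-f0": {"f0"}}
DEL = {"del-all": {"f0", "f1"}, "add-f0": set()}


def completions(action):
    """Initial states: precondition facts are true, all other facts are free."""
    free = [f for f in FACTS if f not in PRE[action]]
    for bits in product((False, True), repeat=len(free)):
        yield frozenset(PRE[action]) | {f for f, b in zip(free, bits) if b}


def failed_steps(initial, sequence):
    """Return the 1-based steps whose precondition is not met.

    ASSUMPTION: a failed step leaves the state unchanged.
    """
    state, failed = set(initial), []
    for t, name in enumerate(sequence, start=1):
        if PRE[name] <= state:
            state = (state - DEL[name]) | ADD[name]
        else:
            failed.append(t)
    return failed


if __name__ == "__main__":
    sequence = ["add-f0", "del-all", "del-all"]
    for initial in completions("add-f0"):
        print(sorted(initial), failed_steps(initial, sequence))
```

In this sketch, del-all has a single completion, whereas add-f0 has four. For the sequence add-f0, del-all, del-all, step 3 fails in all four completions, and step 2 fails exactly in those where f1 is initially false, matching the two reasons given for deriving inapplicable.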
The general ASP encoding works in a rather different way. Here, one candidate answer set will be created for each action to be reversed, one completion of the initial state and one candidate reverse plan.