Advancing Lazy-Grounding ASP Solving Techniques -- Restarts, Phase Saving, Heuristics, and More

Answer-Set Programming (ASP) is a powerful and expressive knowledge representation paradigm with a significant number of applications in logic-based AI. The traditional ground-and-solve approach, however, requires ASP programs to be grounded upfront and thus suffers from the so-called grounding bottleneck (i.e., ASP programs easily exhaust all available memory and thus become unsolvable). As a remedy, lazy-grounding ASP solvers have been developed, but many state-of-the-art techniques for grounded ASP solving have not been available to them yet. In this work we present, for the first time, adaptations to the lazy-grounding setting of many important techniques, like restarts, phase saving, domain-independent heuristics, and learned-clause deletion. Furthermore, we investigate their effects: in general we observe a large improvement in solving capabilities, but we also uncover negative effects in certain cases, indicating the need for portfolio solving as known from other solvers. Under consideration for acceptance in TPLP.


Introduction
Answer-Set Programming is employed in many application areas (Falkner et al. 2018) because ASP offers a rich first-order declarative knowledge representation language, and powerful reasoning systems are available. For hard, practical configuration problems such as the Partner Units Problem (Aschinger et al. 2011; Teppan 2017), for example, ASP was applied successfully off-the-shelf. However, there are practical problem instances in configuration, scheduling, and planning where pure ASP systems based on the traditional ground-and-solve approach cannot compute solutions because of excessive main-memory consumption in the grounding phase, which is frequently superlinear in the size of the input (Falkner et al. 2018).
One way to tackle the grounding issue is to ground lazily only those parts of a first-order theory which are actually needed to solve the problem at hand. This lazy grounding is a bottom-up procedure that interleaves grounding and solving in such a way that parts of the grounding are constructed only when the solver needs them. A number of lazy-grounding ASP solvers exist: GASP (Palù et al. 2009), Omiga (Dao-Tran et al. 2012), ASPeRiX (Lefèvre et al. 2017), and the recently introduced ALPHA (Weinzierl 2017). Only the latter integrates lazy grounding with a conflict-driven clause-learning (CDCL) solver, hence it currently is the most efficient lazy-grounding system for ASP solving.
Nevertheless, ALPHA only realizes a subset (cf. Weinzierl 2017; Leutgeb and Weinzierl 2017) of the techniques usually employed in ground-and-solve ASP systems, whose efficiency is largely due to their use of a wide range of CDCL techniques for efficient SAT solving (Gebser et al. 2012; Alviano et al. 2017). Thus the search performance of ALPHA is significantly worse than that of ground-and-solve systems on problems where grounding itself is not an issue. Lazy grounding at its core imposes some specific restrictions (e.g., guessing on all atoms is not allowed) that are of no concern for the techniques employed in ground-and-solve systems. Hence, one cannot simply add lazy grounding on top of the existing solving techniques. Quite the contrary, each technique from ground-and-solve systems must be checked individually for suitability to the lazy-grounding setting. In particular, restarts, phase saving, domain-independent heuristics, and learned-clause deletion, which are crucial methods in grounded ASP solving to deal with hard problem instances, are not available to lazy-grounding solvers.
In this work we show how these methods must be enhanced to enable efficient search for answer sets based on lazy-grounding ASP solving. Our contributions are as follows.
• An investigation into the techniques of restarts, phase saving, domain-independent heuristics, and learned-clause deletion, determining their compatibility with lazy grounding.
• Enhancements of these methods to fit the lazy-grounding setting, an investigation of their effects, and novel adaptations to work around issues specific to lazy grounding.
• Specifically, the introduction of domain-independent VSIDS-like heuristics that use atom-dependency information to assign atom-activity scores to ground rules.
• An integration of the enhanced methods in the latest version of ALPHA.
• An evaluation of the above methods on worst-case scenarios for lazy grounding (and ALPHA), where grounding is easy but problem solving is challenging. Our evaluations show significant runtime improvements, i.e., by up to a factor of three. Furthermore, our experiments also indicate that the novel techniques introduce no obstacle for solving instances that are hard to ground.

This paper starts with an introduction of the basics of ASP in Section 2. In Section 3 we recap the principles of several state-of-the-art techniques for grounded ASP solving and show their enhancements and the adaptations required for lazy-grounding systems. The runtime improvements of these new techniques are exemplified in Section 4. Section 5 discusses related work and Section 6 concludes.

Preliminaries
Let $C$ be a finite set of constants, $V$ be a set of variables, and $P$ be a finite set of predicates. An atom is an expression $p(t_1, \ldots, t_n)$ where $p$ is an $n$-ary predicate and $t_1, \ldots, t_n \in C \cup V$ are terms, and a literal is either an atom $a$ or its default negation $\mathit{not}\ a$. An ASP program $P$ is a finite set of (normal) rules of the form

$$h \leftarrow b_1, \ldots, b_m, \mathit{not}\ b_{m+1}, \ldots, \mathit{not}\ b_n.$$

where $h$ and $b_1, \ldots, b_m$ are positive literals (i.e., atoms) and $\mathit{not}\ b_{m+1}, \ldots, \mathit{not}\ b_n$ are negative literals. Given a rule $r$, we denote by $H(r) = \{h\}$, $B(r) = \{b_1, \ldots, b_m, \mathit{not}\ b_{m+1}, \ldots, \mathit{not}\ b_n\}$, $B^+(r) = \{b_1, \ldots, b_m\}$, and $B^-(r) = \{b_{m+1}, \ldots, b_n\}$ the head, the body, the positive body, and the negative body of $r$, respectively. If $H(r) = \emptyset$, then $r$ is called a constraint, and a fact if $B(r) = \emptyset$.
Given a literal $l$, a set of literals $L$, or a rule $r$, we denote by $\mathit{vars}(l)$, $\mathit{vars}(L)$, or $\mathit{vars}(r)$ the set of variables occurring in $l$, $L$, or $r$, respectively. A literal $l$ or rule $r$ is ground if $\mathit{vars}(l) = \emptyset$ or $\mathit{vars}(r) = \emptyset$, respectively. The set of all ground atoms is denoted by $\mathit{At}_{\mathit{grd}}$. A program $P$ is ground if all its rules $r \in P$ are ground.
An interpretation $I \subseteq \mathit{At}_{\mathit{grd}}$ satisfies a ground rule $r$, denoted $I \models r$, if $B^+(r) \subseteq I \wedge B^-(r) \cap I = \emptyset$ implies $H(r) \subseteq I$ and $H(r) \neq \emptyset$. $I$ is an answer set of a ground program $P$ if $I$ is the subset-minimal model of $P^I$, where $P^I = \{ r \in P \mid B^+(r) \subseteq I,\ B^-(r) \cap I = \emptyset \}$ is the so-called FLP reduct, the set of rules whose body is satisfied by $I$ (Faber et al. 2011). A (partial) assignment $A$ is a set of signed atoms, where $A^+$ denotes the atoms assigned a positive truth value and $A^-$ those assigned a negative truth value in $A$. Given an atom $\mathit{at}$, the result of applying a substitution $\sigma : V \rightarrow C$ to $\mathit{at}$ is denoted by $\mathit{at}\sigma$; this is extended in the usual way to rules $r$, i.e., $r\sigma$ for a rule of the above form is $h\sigma \leftarrow b_1\sigma, \ldots, b_m\sigma, \mathit{not}\ b_{m+1}\sigma, \ldots, \mathit{not}\ b_n\sigma$. The grounding of a rule is given by $\mathit{grd}(r) = \{ r\sigma \mid \sigma \text{ is a substitution for all } v \in \mathit{vars}(r) \}$ and the grounding $\mathit{grd}(P)$ of a program $P$ is given by $\mathit{grd}(P) = \bigcup_{r \in P} \mathit{grd}(r)$. The answer sets of a non-ground program $P$ are given by the answer sets of $\mathit{grd}(P)$.
Lazy grounding is an approach to tackle the grounding bottleneck inherent in traditional ground-and-solve systems, which makes programs whose grounding exceeds the available memory unsolvable. We describe lazy grounding only briefly here and refer to Weinzierl (2017) and Leutgeb and Weinzierl (2017) for a detailed account of the lazy-grounding ASP system ALPHA. Computing all answer sets such that grd(P) is constructed lazily is typically done by a loop composed of two phases: given a partial assignment (initially empty), first ground those rules that potentially fire under the current assignment, then expand the current assignment (using propagation and guessing). If the loop reaches a fixpoint, i.e., no more rules potentially fire and nothing is left to propagate or guess on, and no constraints are violated, then the current assignment is an answer set.
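To make the interplay of the two phases concrete, the following runnable Python sketch implements the loop for a negation-free toy program: ground rule instances are created only once their positive body atoms have actually been derived, and the loop stops at the fixpoint. It deliberately omits guessing, negation, and conflict handling (which ALPHA provides via its CDCL machinery), and all names are illustrative rather than ALPHA's actual API.

```python
# Runnable toy illustrating the grounding phase of the loop above for a
# negation-free program: ground rule instances are created only once their
# positive body atoms have actually been derived.

# A rule is (head, body); atoms are tuples like ("edge", "a", "b") and
# uppercase strings denote variables.
RULES = [(("reach", "Y"), [("reach", "X"), ("edge", "X", "Y")])]
FACTS = {("reach", "a"), ("edge", "a", "b"), ("edge", "b", "c")}

def is_var(term):
    return isinstance(term, str) and term[:1].isupper()

def match(pattern, atom, subst):
    """Extend substitution `subst` so that pattern matches atom, or None."""
    if len(pattern) != len(atom) or pattern[0] != atom[0]:
        return None
    result = dict(subst)
    for p, a in zip(pattern[1:], atom[1:]):
        if is_var(p):
            if result.setdefault(p, a) != a:
                return None
        elif p != a:
            return None
    return result

def lazily_ground(rules, derived):
    """Yield heads of ground instances whose body holds in `derived`."""
    for head, body in rules:
        substs = [{}]
        for lit in body:
            substs = [s2 for s in substs for atom in derived
                      if (s2 := match(lit, atom, s)) is not None]
        for s in substs:
            yield tuple(s.get(t, t) for t in head)

derived = set(FACTS)
while True:  # fixpoint loop: ground what may fire, then derive the heads
    new = set(lazily_ground(RULES, derived)) - derived
    if not new:
        break
    derived |= new
print(sorted(a for a in derived if a[0] == "reach"))
# -> [('reach', 'a'), ('reach', 'b'), ('reach', 'c')]
```

Even this toy version exhibits the key property of lazy grounding: rule instances over vertices that are never reached are never created.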
One important difference to ground-and-solve is that a lazy-grounding solver does not guess on each atom whether it is true or false, but it guesses about ground instances of rules whether they fire or not. This is correct due to the underlying solving mechanisms based on computation sequences and avoids generating completion nogoods (cf. Clark 1977; Gebser et al. 2012) in most cases. This has the advantage that less space is occupied by completion nogoods, and the drawback that the solver lacks information, as in the following example: if the solver knows that, say, p(13) must be true, in lazy grounding it generally does not know whether there are rules that can derive p(13) but have not yet been grounded, and therefore it cannot conclude that one of these rules must fire.

State-of-the-Art Solving Techniques
In the following we discuss several important state-of-the-art techniques for ASP solving and investigate their adaptation to the lazy-grounding setting. Since many of the techniques for efficient ASP solving originate in SAT solving, there are many similarities to SAT techniques. As mentioned in the previous section, lazy-grounding ASP solving imposes additional restrictions that are considered neither in SAT solving nor in the traditional approach to ASP solving, like the fact that not all ground atoms are known from the beginning. Hence one cannot just put lazy grounding on top of the existing technologies, but each technique must be individually checked for compatibility with the lazy-grounding approach. Bomanson et al. (2019), for example, uncovered that supporting aggregates in a lazy-grounding ASP solver requires a sequential enumeration of all ground terms that will appear during the run of the solver. In the ground-and-solve approach this enumeration is trivially found by simply looking at the full grounding of the input program, while in lazy grounding the solver must provide special facilities to enable an efficient enumeration that does not rely on knowing all ground rules in advance. Similarly, while virtually all ground-and-solve systems use Clark's completion to represent rules by clauses (or nogoods), in the lazy-grounding approach the full completion cannot be obtained in advance without grounding the input program fully. Bogaerts and Weinzierl (2018) have developed on-demand computation of justifications in order to get around the issues of missing knowledge from Clark's completion. Luckily, the techniques we investigate here turned out to be rather well-behaved, requiring fewer adaptations. Nevertheless, there were still some surprising challenges to overcome.

Restarts
Restarts originate in the observation of SAT solvers exhibiting heavy-tailed behaviour, i.e., when solving a set of SAT instances, the majority of the time is consumed by a relatively small number of instances that often time out, while the majority of instances are solved in relatively little time. For those instances with a long runtime, the search seemingly gets stuck in some part of the search space, and restarting the search in a new part of the search space helps (Gomes et al. 2000). Upon restarting, all decisions of the solver are undone, i.e., a restart is a backjump to decision level 0. Importantly, restarts do not discard learned clauses and they do not reset the search heuristics, i.e., atoms that were highly active (cf. Section 3.3 for details) before the restart are still considered highly active by the heuristics after the restart.
Even though restarts give the solver the possibility to move to a completely different part of the search space, practical experience showed that this is not optimal: the search is likely to get stuck there again, while all recent knowledge (e.g., learned clauses) is mostly useless in the new search area. Therefore, restarts are even more effective when combined with phase saving (cf. Section 3.2), which leads the solver back into the same area of the search space as before. This has the important effect that learned clauses remain useful while the restart effectively re-orders the sequence in which atoms are guessed: after a restart, highly active atoms are chosen first, which makes conflicts arise much earlier than before the restart. Intuitively, the binary search (sub-)tree with conflicts at each of its leaves becomes much more shallow as it contains fewer irrelevant choices. Such restarts are very effective in uncovering implicit information about the problem instance, and restarts in rapid succession significantly improve solving efficiency.
There are two principled ways for solvers to restart: static restart sequences trigger a restart after a fixed number of conflicts, while adaptive (or dynamic) restarts trigger a restart whenever the solver detects that it is not learning useful clauses. Static restart sequences often use the so-called Luby sequence (cf. Luby et al. 1993), while adaptive restarts (cf. Audemard and Simon 2012) issue a restart not in a fixed sequence depending on the number of conflicts encountered by the solver, but measure the quality of learned clauses to decide whether a restart is appropriate. Adaptive restarts use the LBD (Literals Blocks Distance) measure for the quality of learned clauses (cf. Audemard and Simon 2009), which intuitively counts the number of distinct decision levels of the literals appearing in the clause at the time the clause is learned. The lower the LBD value, the better the clause will likely perform in the remainder of the search. Adaptive restarts then compare an average of the LBD values of recently learned clauses with the average over the entire search up to that point, and issue a restart if the recently learned clauses have a significantly worse LBD than the average of the whole run. Computing these moving averages was later improved by Biere and Fröhlich (2018), who use exponential moving averages, which can be evaluated without queuing all recently learned LBD values.
Restarts for lazy grounding. ALPHA now combines both restart strategies: adaptive restarts (which usually trigger a restart quickly) and a static restart sequence that allows exponentially increasing runs in which no restart is triggered. The static restart sequence is a Luby sequence, computed efficiently by reluctant doubling (proposed by Donald E. Knuth in his SAT'12 talk), which is now the state of the art in most SAT solvers. For adaptive restarts, ALPHA uses the exponential moving averages on LBD values as described above. Note that ALPHA currently does not update LBD values of clauses after learning, as some SAT solvers do.
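The following sketch, in Python for illustration, shows both ingredients: Knuth's reluctant-doubling generator for the Luby sequence, and an EMA-based adaptive restart test in the spirit of Audemard and Simon. The smoothing constants and the margin are typical values from the SAT literature, not necessarily ALPHA's exact parameters.

```python
def reluctant_doubling():
    """Knuth's reluctant doubling: the yielded values form the Luby
    sequence 1, 1, 2, 1, 1, 2, 4, ... used for static restart intervals."""
    u, v = 1, 1
    while True:
        yield v
        u, v = (u + 1, 1) if (u & -u) == v else (u, 2 * v)

class AdaptiveRestarts:
    """EMA-based adaptive restarts (cf. Audemard and Simon 2012): restart
    when recently learned clauses have significantly worse (higher) LBD
    than the long-term average."""
    def __init__(self, fast_alpha=1 / 50, slow_alpha=1 / 50000, margin=1.25):
        self.fast_alpha, self.slow_alpha = fast_alpha, slow_alpha
        self.margin = margin
        self.fast = self.slow = 0.0
        self.conflicts = 0

    def on_conflict(self, lbd):
        self.conflicts += 1
        self.fast += self.fast_alpha * (lbd - self.fast)  # short-term EMA
        self.slow += self.slow_alpha * (lbd - self.slow)  # long-term EMA

    def should_restart(self):
        # Require a short warm-up before the averages are meaningful.
        return self.conflicts > 100 and self.fast > self.margin * self.slow

luby = reluctant_doubling()
print([next(luby) for _ in range(8)])  # -> [1, 1, 2, 1, 1, 2, 4, 1]
```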
Since ALPHA follows the original computation sequence (cf. Lefèvre et al. 2017) for lazy-grounding answer-set computation, picking atoms for guessing is restricted. First, only atoms that represent the body of a rule with negation can be valid choice points, and second, of these valid choice points only those where the positive body of the rule is already derived may be picked for guessing (cf. Weinzierl (2017) for details). This severely restricts the order in which atoms are chosen by the solver, i.e., it may forbid the solver from branching on the most active atom(s). As a consequence, restarts are less effective for the current lazy-grounding approach than they are for ground-and-solve ASP systems.
Luckily, for many search problems this negative effect does not manifest, namely for those where all potential choices are available to the solver right from the beginning. An example of such a problem is graph colouring, where each choice point colours a vertex of the input graph. After a restart, all choice points are valid, hence a restart indeed re-orders atoms as in the ground-and-solve case.
For problems where guesses are "stacked", however, the negative effects of restarting are visible. Examples of that are planning problems where a choice on the second action may become valid only after the first action was chosen. Clever reformulation of the problem may avoid that issue, but this is beyond the scope of this work.

Phase Saving
Phase saving (or progress saving) is a technique that focuses the search on a specific part of the search space even after backjumping or restarts. Phase saving means saving the last assigned value of each atom; whenever a choice is to be made on an atom, its last assigned value is taken (cf. Pipatsrisawat and Darwiche 2007). It does not matter whether the last value was assigned due to a choice or to propagation. The effect of phase saving is that after backjumping or restarting, the search again approaches the same point in the search space (i.e., the same candidate answer set). The effect of phase saving alone seems to be less significant, but in conjunction with restarts it has a tremendous impact on performance (cf. Elffers et al. 2018). It effectively makes the solver approach the same point in the search space from another direction, and leads to the new perspective of a solver as "clause generating machinery". With the combination of phase saving and restarts, the uncovered clauses are small in size and pertain to a small portion of the search space, hence they are much more focused than without those techniques.
Phase saving for lazy grounding. Phase saving can be adapted to lazy grounding by adding an array that keeps, for each known ground atom, its last assigned truth value. Specific to lazy grounding, this array needs to grow in size during the run of the solver as a lazy-grounding solver uncovers ground atoms step by step. We observed that the initial value for phases has a significant impact on whether an instance can be solved or not. Note that if an atom is guessed whose truth value already is must-be-true, then the phase is not considered but true is chosen directly, as otherwise a conflict would arise immediately.
Our experiments include several settings for the initial phase: all false, all true, and random. Which one is preferable depends on the instance, and it seems unlikely that a single setting is always best; the experiments indicate, however, that all true is slightly more favourable overall. The initially-all-false setting corresponds to MiniSat's default, while the initially-all-true setting effectively corresponds to what CLINGO is doing. Note that CLINGO actually uses true only for atoms representing rule bodies, but since ALPHA only guesses on atoms representing rule bodies, this coincides with the all true setting.
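A minimal sketch of such a growing phase store, with the must-be-true override and the three initial-phase policies, could look as follows (names and interface are illustrative, not ALPHA's actual implementation):

```python
import random

MUST_BE_TRUE = "MBT"   # ALPHA's third truth value besides true and false

class PhaseStore:
    """Grow-on-demand store of saved phases; atoms are dense integer ids."""
    def __init__(self, initial_phase="all_true"):
        self.initial_phase = initial_phase
        self.phases = []

    def _default(self):
        if self.initial_phase == "random":
            return random.choice([True, False])
        return self.initial_phase == "all_true"

    def _grow_to(self, atom):
        # New ground atoms are uncovered lazily during solving, so the
        # array must be able to grow at any time.
        while len(self.phases) <= atom:
            self.phases.append(self._default())

    def save(self, atom, value):
        # Called on every assignment, whether by choice or by propagation.
        self._grow_to(atom)
        self.phases[atom] = value

    def choose_sign(self, atom, current_truth=None):
        # An atom already assigned must-be-true is guessed true directly;
        # guessing false would cause an immediate conflict.
        if current_truth == MUST_BE_TRUE:
            return True
        self._grow_to(atom)
        return self.phases[atom]
```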

Domain-Independent Heuristics
Heuristics for answer-set solving can roughly be classified as follows: domain-independent heuristics do not take the nature of the problem at hand into account, whereas domain-specific heuristics have to be tailored to a specific problem. Domain-specific heuristics are covered by prior work; accordingly, we focus on domain-independent heuristics here. VSIDS (Variable State Independent Decaying Sum) (Moskewicz et al. 2001) and BerkMin (Goldberg and Novikov 2002) are prominent domain-independent heuristics originally developed for SAT but also successfully employed for ASP solving (in CLASP (Gebser et al. 2012) and WASP).
They assign a so-called activity to every atom, counting the number of times a clause containing this atom contributed to a conflict. The activity of each atom is periodically divided by a constant (i.e., it is "decayed") to reduce the influence of conflicts further in the past. When asked for an atom, the heuristic chooses the most active one. Further counters are maintained to choose which truth value to assign. BerkMin additionally organizes the set of conflict clauses as a chronologically ordered stack, thereby preferring atoms from recent conflicts. This accounts for the fact that the set of atoms responsible for conflicts may change very quickly.
Atom activities are typically initialized by MOMs (Maximum Occurrences in clauses of Minimum size) (Gebser et al. 2013; Pretolani 1993). A MOMs score for an atom estimates the extent to which other atoms are affected when this atom is assigned. For each atom, the MOMs score is a function of the number of nogoods containing the atom in a positive literal and the number of nogoods containing the atom in a negative literal.
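The literature contains several variants of the MOMs score; one classic formulation (not necessarily the exact function used by CLINGO or ALPHA) scores an atom $a$ as

$$\mathit{moms}(a) = \big(f^*(a) + f^*(\lnot a)\big) \cdot 2^k + f^*(a) \cdot f^*(\lnot a),$$

where $f^*(a)$ denotes the number of occurrences of $a$ in the smallest nogoods and $k$ is a tuning constant. Atoms that occur frequently, and with both signs, in short nogoods thus receive the highest scores.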
A direct application of BerkMin or VSIDS to a lazy-grounding ASP solver like ALPHA is challenging, because such a solver differs in many important ways from a solver adhering to the classical ground-and-solve paradigm. One major difference is that not all ground rules, and consequently not all ground literals and atoms, are known at any given time to a lazy-grounding solver. Because of this, a heuristic applied to lazy grounding can only incorporate atoms that are already known to the solver.
Another major difference lies in the solving mechanism: while a traditional ASP solver can choose any atom to guess on, ALPHA is restricted to atoms representing rule bodies. In other words, ALPHA only guesses whether a certain rule fires or not, but it does not guess whether an atom in a rule's head or body is true. A direct application of BerkMin or VSIDS to ALPHA would therefore suffer from the fact that choice points comprise only a small portion of all literals occurring in clauses (or nogoods) and therefore do not influence activity and sign counters as much as other atoms.
Heuristics for lazy grounding. Domain-independent heuristics for lazy-grounding ASP solving were first studied by Taupe et al. (2017). The so-called class of dependency-driven heuristics has proven particularly useful and has been further improved since. The basic idea is as follows: since the truth of an ordinary atom b cannot be guessed directly, find all choice points c (i.e., atoms representing rule bodies) that have an influence on the truth of b, and whenever the activity of b is to be increased, increase the activity of c instead. Thereby, choices are not made directly on highly active atoms but on atoms that influence highly active atoms, i.e., the solver is focused on (ordinary) active atoms and chooses their truth values indirectly. For an atom b there are two ways in which guessing alone may influence its value: either by firing a rule r with H(r) = {b} or by firing a rule r′ with b ∈ B−(r′), since the firing of a rule makes all atoms in its negative body false and its head true.

ALPHA's dependency-driven VSIDS implementation maintains choice points in a heap data structure, which enables efficient access to the choice point with the highest activity. After choosing an applicable choice point, the sign is chosen by phase saving, unless the atom is already assigned must-be-true, in which case true is chosen. At every conflict, the activity of each atom b encountered in the (CDCL-style) conflict analysis is increased as follows: if b represents a choice point, its own activity is increased; if b is an ordinary atom, the activity of all known choice points that influence b is increased. The activity increment (initially 1) is divided by 0.92 after every conflict, i.e., the increment grows with every conflict. This is a state-of-the-art way of realizing the decay of activities: the relative order of activities stays the same as with explicit decaying, but only the most recent increment needs to be adapted instead of decaying the activity values of all atoms. Internally, atom activities are stored as double-precision floating-point values, and whenever the activity of an atom exceeds $10^{100}$, all activities are normalized (divided by $10^{100}$). The increment is also normalized.
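The following Python sketch summarizes this bookkeeping: activity bumps are redirected to influencing choice points, decay is realized by growing the increment, activities are normalized when they exceed $10^{100}$, and the most active applicable choice point is retrieved from a heap. It is a simplified illustration under these assumptions, not ALPHA's actual (Java) implementation.

```python
import heapq

class DependencyDrivenVSIDS:
    DECAY = 0.92
    NORMALIZATION_THRESHOLD = 1e100

    def __init__(self, influencers):
        # influencers[b]: choice points (rule-body atoms) whose guess can
        # make b true (head of a rule) or false (negative body of a rule).
        self.influencers = influencers
        self.activity = {}     # choice point -> activity
        self.increment = 1.0
        self.heap = []         # max-heap via negated activities

    def _bump(self, choice_point):
        value = self.activity.get(choice_point, 0.0) + self.increment
        self.activity[choice_point] = value
        heapq.heappush(self.heap, (-value, choice_point))
        if value > self.NORMALIZATION_THRESHOLD:
            for cp in self.activity:   # normalize all activities ...
                self.activity[cp] /= self.NORMALIZATION_THRESHOLD
            self.increment /= self.NORMALIZATION_THRESHOLD  # ... and the increment
            self.heap = [(-v, cp) for cp, v in self.activity.items()]
            heapq.heapify(self.heap)

    def on_conflict(self, atoms_in_conflict_analysis, is_choice_point):
        for b in atoms_in_conflict_analysis:
            if is_choice_point(b):
                self._bump(b)
            else:  # redirect the bump to the choice points influencing b
                for cp in self.influencers.get(b, ()):
                    self._bump(cp)
        # Growing the increment implicitly decays all earlier activities.
        self.increment /= self.DECAY

    def pop_most_active(self, is_applicable):
        skipped, chosen = [], None
        while self.heap:
            neg_a, cp = heapq.heappop(self.heap)
            if -neg_a != self.activity.get(cp):
                continue               # stale heap entry, discard
            if is_applicable(cp):
                chosen = cp
                break
            skipped.append((neg_a, cp))
        for entry in skipped:          # restore inapplicable choice points
            heapq.heappush(self.heap, entry)
        return chosen
```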
ALPHA's dependency-driven MOMs implementation, used to initialize atom activities, is inspired by CLINGO's implementation and also exploits dependencies as described above. When a new nogood is produced by the grounder, the activities of all choice points that influence one of the literals in the new nogood are updated to their current MOMs values.

Learned-clause Deletion
Conflict-driven learning, usually considered the most important technique in SAT solving (and ASP solving), leads to many additional clauses being learned during search. Since each learned clause must be stored, the clause database grows significantly during the runtime of a solver (on the order of thousands of new clauses per second). However, the more clauses the clause database contains, the more time propagation requires, hence propagation speed decreases as more clauses are present. This holds true even with efficient propagation techniques like two-watched literals or its adaptation to the lazy-grounding setting (cf. Leutgeb and Weinzierl 2017). Therefore, the learned-clause database is regularly cleaned (Eén and Sörensson 2003).
Some learned clauses must be excluded from deletion, namely those that are locked, i.e., clauses that imply one of the currently assigned literals. Since each learned clause helps to identify portions of the search space where no solution (or answer set) can be found, deleting the wrong clauses may enlarge the search space to consider, as the solver has to re-evaluate portions of the search space that would otherwise be excluded by a learned clause. There are several ways to identify clauses that appear unimportant. The first is an activity counter that is incremented whenever a clause occurs in some conflict analysis, i.e., contributes to a conflict; clauses with low activity are deleted first, as they do not contribute much to the overall search performance. The second is to use the LBD measure of a clause's quality (Audemard and Simon 2009); again, clauses with a poor LBD value (i.e., whose LBD is high) are deleted first. A combination of both is also common, where activity is used to identify clauses for removal, but clauses with an exceptionally good LBD value are kept regardless.
Learned-clause deletion for lazy grounding. The technique of learned-clause (or nogood) deletion requires no special adaptations to fit the lazy-grounding setting, and we observed no particular effects when realizing it in ALPHA. The implementation in ALPHA generally mimics the default behaviour of CLINGO: clause-database cleaning is first run after 2000 conflicts, and that threshold increases by 100 with each cleaning cycle. The whole sequence is reset after 20 cycles.
At each cleaning, half of the clauses are scheduled for removal. For that, the average activity of the learned clauses is computed and 1.5 times the average is taken as the threshold for removal, i.e., clauses with less than 1.5 times the average activity are removed unless they are locked. Locked clauses are never removed, and as soon as half of the clause database has been removed the process stops, keeping any remaining clauses even if their activity is below the threshold. Note that this does not guarantee that half of the clauses are actually removed, but it is a sufficiently good and efficiently computable approximation. Also note that clauses with a very good LBD value (≤ 2) are never removed and are not considered in the cleaning. The sketch below summarizes this policy.
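The following is a sketch of this cleaning policy together with one plausible reading of the scheduling described above; it is illustrative, not ALPHA's exact code.

```python
from dataclasses import dataclass

@dataclass
class LearnedClause:
    activity: float
    lbd: int
    locked: bool

def clean_learned_clauses(learned):
    """Remove up to half of the learned clauses; keep locked clauses and
    clauses with very good LBD (<= 2) unconditionally."""
    keep_always = [c for c in learned if c.lbd <= 2]
    candidates = [c for c in learned if c.lbd > 2]
    avg = sum(c.activity for c in candidates) / max(len(candidates), 1)
    threshold = 1.5 * avg
    target = len(candidates) // 2      # stop once half have been removed

    kept, removed = list(keep_always), 0
    for c in sorted(candidates, key=lambda c: c.activity):
        if removed < target and not c.locked and c.activity < threshold:
            removed += 1               # clause is deleted
        else:
            kept.append(c)
    return kept

class CleaningSchedule:
    """First cleaning after 2000 conflicts; the interval grows by 100 per
    cleaning cycle, and the whole sequence is reset after 20 cycles."""
    def __init__(self):
        self.cycle, self.interval, self.since_last = 0, 2000, 0

    def on_conflict(self):
        self.since_last += 1
        if self.since_last < self.interval:
            return False
        self.since_last = 0
        self.cycle += 1
        if self.cycle >= 20:
            self.cycle, self.interval = 0, 2000
        else:
            self.interval += 100
        return True                    # time to clean now
```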

Experimental Results
To assess the impact of the newly adapted techniques in the lazy-grounding setting, we evaluated them on six benchmark problems: Graph Colouring, House Reconfiguration Problem (HRP), Stable Marriage, Partner Units Polynomial (PUP), Non-Partition-Removal-Colouring (NPRC), and the evaluation of nondeterministic L-Systems (Lindenmayer Systems).
Experimental Setup. Experiments were run on a cluster of machines, each with two Intel Xeon CPU E5-2650 v4 @ 2.20GHz with 12 cores each, 252 GB of memory, and Ubuntu 16.04.1 LTS Linux. Benchmarks were scheduled with the ABC Benchmarking System (Redl 2016) together with HTCondor. Time and memory consumption was measured by PYRUNLIM, which was also used to limit wall time to 5 minutes per instance and swapping to 0. ALPHA was used in several configurations to compare the impact of different solving techniques. In every configuration, constraints were grounded permissively and rules were grounded strictly, as suggested in previous work. When using CLINGO 5.3.0, some techniques not yet supported by ALPHA were switched off to improve comparability. For additional comparisons, OMIGA (Dao-Tran et al. 2012) was used in learning mode (Weinzierl 2013), and ASPERIX 0.2 (Lefèvre et al. 2017) was used with command-line argument -N 1000000. Moreover, LAZY WASP (Cuteri et al. 2019), a recent ground-and-solve system that incorporates powerful partial-evaluation techniques, was used in its default configuration. Since LAZY WASP requires a manual splitting of programs into a lazily evaluated part and a part that is evaluated with ground-and-solve techniques, the splitting was done such that a maximal part is evaluated lazily. All systems were instructed to search for 10 answer sets. The whole set of benchmarks was run three times; we report median solving times per instance in the discussion of the results below.
Encodings and Instances. The encodings for Graph Colouring and Stable Marriage were taken from the Fourth Answer Set Programming Competition, the former without modifications, the latter with a choice rule replacing the equivalent disjunctive rule of the original. The encoding for HRP was obtained from Friedrich et al. (2011) and adapted to conform to the input language of ALPHA. The encoding for PUP was taken from the Third Answer Set Programming Competition (Calimeri et al. 2014); choice rules were used instead of disjunction. The encoding and all 110 instances for NPRC were taken from Bogaerts and Weinzierl (2018). For Graph Colouring and PUP, all instances from the ASP Competitions (Calimeri et al. 2014; Calimeri et al. 2016) were used (60 for Graph Colouring, 65 for PUP). For Stable Marriage, the 341 random instances generated in previous work were used again. For HRP, the 47 instances generated for the ASP Challenge 2019 were used, which include instances of different problem classes and of varying difficulty and size.

The evaluation of nondeterministic L-Systems is a novel benchmark. L-Systems (or Lindenmayer Systems) are types of formal grammars in which production rules expanding symbols into larger sequences of symbols are applied to an initial starting word in parallel. The generated words can be visualised using suitable drawing functions, resulting in fractal structures such as the Cantor set or a fractal tree; each iteration of the evaluation of an L-System then typically yields one level of the resulting fractal structure. Our benchmark set comprises 39 instances of L-Systems. Some of them are deterministic, some nondeterministic (i.e., multiple different production rules may be applied to the same symbol). For the latter, additional constraints enforce global conditions on the generated words. Since words often grow exponentially with the number of iteration steps, instances only compute few steps (4 to 20) of the given L-System.
All encodings and instances as well as binaries of the ALPHA version used for the experiments are available on our website.

Results and Discussion. Figures 1 to 6 show cactus plots of the time consumed to solve each of the six benchmark problems. They have been created in the usual way, i.e., the x axis gives the number of instances solved within the real (i.e., wall-clock) time given on the y axis; note that the y axis shows time accumulated over all solved instances. Solving time per instance is the median across three solver runs. We compare the runtimes of CLINGO, LAZY WASP, OMIGA, and ASPERIX to those of various ALPHA configurations. The baseline configuration of ALPHA is its latest implementation before the introduction of the solving techniques presented in this paper, with permissive lazy grounding of constraints and strict lazy grounding of rules as described above. This configuration was included to study the accumulated effect of various sets of newly introduced solving techniques, which constitute the other four configurations: each of those employs our dependency-driven form of VSIDS together with phase saving, where the default phase is true in two configurations and false in the other two, and restarts are switched on in two configurations and off in the others.

Figure 1 shows that, for the first time, ALPHA is able to solve several hard instances from the ASP competitions (here for the Graph Colouring problem). This is a breakthrough, since those instances are hand-picked to exercise the search techniques of ground-and-solve systems, even though CLINGO and LAZY WASP still outperform ALPHA. All configurations employing the additional solving techniques outperform ALPHA's baseline. The best configuration even outperforms the baseline by a factor of three, allowing it to solve 12 instead of the previous 4 instances. Restarts appear to be a particularly useful improvement for this benchmark, which is in line with our observation that restarts perform well if choices are not "stacked" on each other, as is the case here.
As can be seen in Fig. 3, ALPHA also profits from the new solving techniques when solving HRP. All novel configurations clearly outperform the baseline, solving more instances than the baseline. None of the various settings, however, clearly performs better than the others for these HRP instances.
On Stable Marriage (Fig. 2) no improvement can be observed; in fact, the baseline performs best. At the moment we are not sure why this is the case, but CLINGO's effortless performance indicates that some other technique may still be missing in the lazy-grounding setting. This is also underscored by LAZY WASP's performance, which is similar to CLINGO's, since both employ similar ground-and-solve techniques.
Many more PUP instances can be solved when employing the new solving techniques including restarts, even though configurations with restarts consume more time on easier instances than some configurations without them (Fig. 4).
Note that all of the above problems are easy to ground, hence lazy grounding is not necessary for them. We picked them nevertheless to demonstrate that the search performance of lazy grounding is improving even on problems where lazy grounding per se does not improve performance. In fact, these problems present a worst-case scenario for lazy grounding: compared to ground-and-solve systems, a lazy-grounding system only lacks information here.
The fifth problem, NPRC, is one where grounding itself is also an issue. As shown in Fig. 5, ALPHA clearly outperforms CLINGO on this problem. With regard to the novel techniques in ALPHA, there is some variance but no clear improvement over the baseline; on the other hand, this indicates that the novel techniques help to solve hard search problems while introducing no obstacles for solving hard-to-ground instances. The LAZY WASP system also performs significantly better than CLINGO and comparably to ALPHA. We also noted that LAZY WASP's runtime varies wildly even when run repeatedly on the same instance; we currently have no explanation for this behaviour and suspect it might be due to some randomization.

[Legend for Figs. 1 to 6: naive; dependency-driven VSIDS with phase-saving (alltrue); dependency-driven VSIDS with phase-saving (allfalse); dependency-driven VSIDS with phase-saving (alltrue) plus restarts; dependency-driven VSIDS with phase-saving (allfalse) plus restarts; clingo; asperix; omiga; lazy wasp.]

Figure 6 shows the results for evaluating nondeterministic L-Systems. This benchmark is grounding-intense, so CLINGO can only solve the easier instances, and the partial-evaluation techniques of LAZY WASP have no positive effect. ALPHA is able to solve most instances, and there is a clear distinction between the configurations with initial phase true and those with false, as the latter can only solve the simplest instances. Whether restarting is enabled or not seems to make little difference. Both the baseline and the configuration with restarts and dependency-driven VSIDS solve the same number of instances, though the latter needs a bit more time. No line for ASPERIX is visible, although it is able to solve the smallest instance.

In all figures, only few data points can be seen for OMIGA and ASPERIX, because those systems could only solve very few instances. Furthermore, HRP was not run with OMIGA and ASPERIX because of the restricted input languages of these systems, and OMIGA produced several exceptions when trying to solve Stable Marriage instances.
Overall, adapting restarts, phase saving, dependency-driven VSIDS and learned-clause deletion to the lazy-grounding setting is a significant improvement for lazy-grounding ASP solving. It improves search performance on hard problems, sometimes dramatically, and still allows the grounding bottleneck to be avoided.

Related Work
There are several approaches to tackle the grounding bottleneck of ASP. The grounders of ground-and-solve systems have, for a long time, been trying to minimize the size of the resulting ground program, which gave rise to intelligent grounding techniques (cf. Leone et al. 2006; Calimeri et al. 2017).
A more recent attempt to circumvent the grounding bottleneck is to extend ASP with specific problem solvers (e.g., temporal (Cabalar et al. 2019) or difference-logic (Abels et al. 2019) reasoners) and then manually reformulate part of the original problem in the added formalism. Besides the need to develop and integrate the specific problem solvers, this requires users of ASP to be knowledgeable in another (unrelated) formalism to solve their problems.
Another approach aims to tackle the grounding issue by grounding only those parts of a first-order theory which are actually needed to solve the problem at hand. Several techniques follow this general idea. Incremental grounding, which works for planning and related types of problems, introduces time steps on the fly when the solver notices that no solution exists in the given time window. Partial compilation techniques (Cuteri et al. 2019) are a recent approach, where a stratifiable part of the program is automatically turned into a lazy propagator. This successfully addresses the grounding bottleneck for ASP programs with a certain structure, as also shown by our experiments; it currently requires the user to manually identify the program part that can be turned into a lazy propagator, however. Also top-down lazy-model generation (De Cat et al. 2015) and top-down stable model generation techniques (Marple et al. 2012; Marple et al. 2017) exist. The former, however, does not work on ASP but on the related formalism of FO(ID), while the latter, to the best of our knowledge, does not achieve good solving efficiency.
The first bottom-up lazy-grounding systems available were GASP (Palù et al. 2009) and ASPeRiX (Lefèvre et al. 2017). The OMIGA solver (Dao-Tran et al. 2012) uses a Rete network for efficient grounding and propagation, but, like its predecessors, does not provide efficient search.
Part of the previous work on ALPHA focused on the formulation and integration of domain-specific heuristics to solve large-scale instances where such heuristics are known. The introduction of domain-independent state-of-the-art techniques employed for grounded ASP solving, however, was left open until now.

Conclusions and Future Work
Lazy-grounding ASP solvers must address the grounding bottleneck while providing problem-solving techniques that allow the solution of hard problem instances. Problem-solving techniques which proved successful for grounded ASP programs cannot be directly transferred to lazy-grounding solvers. In this paper we reviewed several such techniques, namely restarts, phase saving, domain-independent heuristics, and learned-clause deletion, and presented enhancements and adaptations such that these techniques are applicable in lazy-grounding ASP solvers.
Experimental analysis on the ALPHA solver showed significant improvements (up to a factor of three) on some hard instances, while for other problems the additional techniques have no negative effect. Similarly to other solvers, ALPHA now comes with a range of search options, and there does not seem to be a setting that is always preferable. Hence portfolio solving might improve efficiency further.
As regards future work, we want to investigate further improvements to current solving techniques, like blocking restarts in certain cases. Furthermore, integrating external atoms similar to those by Eiter et al. (2018) is another goal.