Efficient Knowledge Compilation Beyond Weighted Model Counting

Abstract
Quantitative extensions of logic programming often require the solution of so-called second level inference tasks, that is, problems that involve a third operation, such as maximization or normalization, on top of addition and multiplication, and thus go beyond the well-known weighted or algebraic model counting setting of probabilistic logic programming under the distribution semantics. We introduce Second Level Algebraic Model Counting (2AMC) as a generic framework for these kinds of problems. As 2AMC is to (algebraic) model counting what forall-exists-SAT is to propositional satisfiability, it is notoriously hard to solve. First level techniques based on Knowledge Compilation (KC) have been adapted for specific 2AMC instances by imposing variable order constraints on the resulting circuit. However, those constraints can severely increase the circuit size and thus decrease the efficiency of such approaches. We show that we can exploit the logical structure of a 2AMC problem to omit parts of these constraints, thus limiting the negative effect. Furthermore, we introduce and implement a strategy to generate a sufficient set of constraints statically, with a priori guarantees for the performance of KC. Our empirical evaluation on several benchmarks and tasks confirms that our theoretical results can translate into more efficient solving in practice.

Algebraic Prolog (Kimmig et al. 2011), algebraic model counting (AMC) (Kimmig et al. 2017) and algebraic answer set counting (Eiter et al. 2021) define general frameworks based on semirings to express and solve quantitative first level problems, which compute one aggregate over all models, e.g., counting the number of models, or summing or maximizing values associated with them, such as probabilities or utilities. Kimmig et al. (2017) showed that we can solve first level problems by compiling the logic program into a tractable circuit representation, on which an AMC task can be evaluated in polynomial time if the semiring operations have constant cost.
However, many interesting tasks require two kinds of aggregation, and thus are second level problems that go beyond AMC. Examples include Maximum A Posteriori (MAP) inference in probabilistic programs, which involves maximizing over some variables while summing over others, inference in SLASH and SMProbLog, and optimization tasks in decision-theoretic or constrained probabilistic programming languages such as DTProbLog (Van den Broeck et al. 2010; Derkinderen and De Raedt 2020) and SC-ProbLog (Latour et al. 2017).
While second level problems stay hard on general tractable circuit representations, DTProbLog and SC-ProbLog are known to be polynomial time on X-constrained SDDs. The key idea is to ensure that all variables of the outer aggregation appear before those of the inner one in the circuit so that we can perform both aggregations sequentially. This, however, comes at a high cost: circuits respecting this constraint may be exponentially larger than non-constrained ones, resulting in significantly slower inference. Additionally, certain optimization techniques used in knowledge compilation, such as unit propagation, may cause constraint violations.
In this paper, we generalize the AMC approach to second level problems, and show that we can often weaken the order constraint using definability: A variable Y is defined by a set of variables X and a propositional theory T if the value of Y is functionally determined by the assignment to X in every satisfying assignment to T. For example, a is defined by b in the theory {a ↔ b}. Informally, if a variable participating in the inner aggregation becomes defined by the variables of the outer aggregation at any point during compilation, we can move that variable to the outer aggregation. This can allow for exponentially smaller circuits and consequently faster evaluation, and additionally justifies the use of unit propagation.
Our main contributions are as follows:
• We introduce second level algebraic model counting (2AMC), a semiring-based unifying framework for second level quantitative problems, and show that MAP, DTProbLog and SMProbLog inference are 2AMC tasks.
• We weaken X-firstness, the constraint that the variables in X need to occur first, to X-firstness modulo definability and show that this is sufficient for solving 2AMC tasks under weak additional restrictions.
• We lift methods for generating good variable orders statically from tree decompositions to the constrained setting.
• We implement our contributions in the algebraic answer set counter aspmc (Eiter et al. 2021) and the probabilistic reasoning engine ProbLog2 (Fierens et al. 2015).
• We evaluate our contributions on a range of benchmarks, demonstrating that drastically smaller circuits can be possible, and that our general tools are competitive with state of the art implementations of specific second level tasks in logic programming.

Preliminaries
We consider propositional theories T over variables X. For a set of variables Y, we denote by lit(Y) the set of literals over Y, by int(Y) the set of assignments y to Y and by mod(T) those assignments that satisfy T. Here, an assignment is a subset of lit(Y) that contains exactly one of a and ¬a for each variable a ∈ Y. Given a partial assignment y ∈ int(Y) for a theory T over X, we denote by T | y the theory over X \ Y obtained by conditioning T on y. We use |= for the usual entailment relation of propositional logic.
Algebraic Model Counting (AMC) is a general framework for quantitative reasoning over models that generalizes weighted model counting to the semiring setting (Kimmig et al. 2017).

Some examples of well-known commutative semirings are
• P = ([0, 1], +, •, 0, 1), the probability semiring,
• S_{max,•} = (ℝ≥0, max, •, 0, 1), the max-times semiring,
• S_{max,+} = (ℝ ∪ {−∞}, max, +, −∞, 0), the max-plus semiring,
• EU = ({(p, eu) | p ∈ [0, 1], eu ∈ ℝ}, +, ⊗, (0, 0), (1, 0)), the expected utility semiring, where addition is coordinate-wise and (p_1, eu_1) ⊗ (p_2, eu_2) = (p_1 • p_2, p_1 • eu_2 + p_2 • eu_1).
An AMC instance A = (T, S, α) consists of a theory T over variables X, a commutative semiring S and a labeling function α : lit(X) → S. The value of A is
\[ AMC(A) = \bigoplus_{M \in mod(T)} \bigotimes_{l \in M} \alpha(l). \]
A prominent example of AMC is inference in probabilistic logic programming, which uses the probability semiring P and a theory for a probabilistic logic program (PLP). A PLP L = F ∪ R is a set F of probabilistic facts p :: f and a set R of rules h ← b_1, …, b_n, not c_1, …, not c_m such that each assignment (world) f to F leads to exactly one model. By abuse of notation, we use L also for a propositional theory with the same models. We denote the Herbrand base, i.e., the set of all ground atoms in L, by H. The success probability of a query q is SUCC(q) = Σ_{h ∈ mod(L | q)} P(h), the sum of the probabilities P(h) = Π_{l ∈ h} α(l) of the models h where q succeeds. Here, α(f) = p and α(¬f) = 1 − p for p :: f ∈ F, and α(l) = 1 otherwise.
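As a minimal sketch of the AMC value, the following brute-force evaluator enumerates all assignments, keeps the models, and aggregates the products of literal labels in a given semiring. The `Semiring` container, the function `amc` and the toy theory x ∨ y with its probabilities are illustrative assumptions, not part of any of the cited systems; enumeration is exponential and only meant to make the definition concrete.

```python
from dataclasses import dataclass
from itertools import product
from typing import Any, Callable


@dataclass(frozen=True)
class Semiring:
    plus: Callable[[Any, Any], Any]
    times: Callable[[Any, Any], Any]
    zero: Any  # neutral element of plus
    one: Any   # neutral element of times


PROB = Semiring(lambda x, y: x + y, lambda x, y: x * y, 0.0, 1.0)
MAX_TIMES = Semiring(max, lambda x, y: x * y, 0.0, 1.0)
MAX_PLUS = Semiring(max, lambda x, y: x + y, float("-inf"), 0.0)
# Expected utility semiring on pairs (p, eu); addition is coordinate-wise.
EU = Semiring(
    lambda x, y: (x[0] + y[0], x[1] + y[1]),
    lambda x, y: (x[0] * y[0], x[0] * y[1] + y[0] * x[1]),
    (0.0, 0.0),
    (1.0, 0.0),
)


def amc(variables, is_model, semiring, label):
    """Brute-force AMC: sum over all models of the product of literal labels."""
    total = semiring.zero
    for values in product([True, False], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        if not is_model(assignment):
            continue
        weight = semiring.one
        for var, value in assignment.items():
            weight = semiring.times(weight, label(var, value))
        total = semiring.plus(total, weight)
    return total


# Hypothetical toy instance: theory "x or y" with P(x) = 0.5 and P(y) = 0.25.
toy_probability = {"x": 0.5, "y": 0.25}


def toy_label(var, value):
    return toy_probability[var] if value else 1.0 - toy_probability[var]


def toy_theory(assignment):
    return assignment["x"] or assignment["y"]


probability_of_theory = amc(["x", "y"], toy_theory, PROB, toy_label)
best_model_weight = amc(["x", "y"], toy_theory, MAX_TIMES, toy_label)
```

Swapping the semiring changes the task without touching the evaluator: with `PROB` the result is the probability of the theory, with `MAX_TIMES` it is the weight of the most likely model.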

Example 1 (Running Example)
We use the following probabilistic logic program L_ex throughout the paper.
0.4 :: a
0.6 :: b
c ← a
d ← b
Its four worlds are f_1 = {a, b}, f_2 = {a, ¬b}, f_3 = {¬a, b}, f_4 = {¬a, ¬b}, and the probabilities of their models h_1, h_2, h_3, h_4 are P(h_1) = 0.24, P(h_2) = 0.16, P(h_3) = 0.36 and P(h_4) = 0.24. The query c succeeds in the models h_1 and h_2, thus SUCC(c) = P(h_1) + P(h_2) = 0.4.
Kimmig et al. (2017) showed that Knowledge Compilation (KC) to sd-DNNFs solves any AMC problem. sd-DNNFs are special negation normal forms (NNFs). An NNF (Darwiche 2004) is a rooted directed acyclic graph in which each leaf node is labeled with a literal, true or false, and each internal node is labeled with a conjunction ∧ or disjunction ∨. For any node n in an NNF graph, Vars(n) denotes all variables in the subgraph rooted at n. By abuse of notation, we use n also to refer to the formula represented by the graph n. sd-DNNFs are NNFs that satisfy the following additional properties:
• Decomposability (D): Vars(n_i) ∩ Vars(n_j) = ∅ for any two children n_i and n_j of an and-node.
• Determinism (d): n_i ∧ n_j is logically inconsistent for any two children n_i and n_j of an or-node.
• Smoothness (s): Vars(n_i) = Vars(n_j) for any two children n_i and n_j of an or-node.
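To illustrate why sd-DNNFs make AMC easy, the sketch below evaluates a small hand-written sd-DNNF for L_ex in the probability semiring: and-nodes multiply, or-nodes add, leaves return literal labels. The tuple encoding of nodes and the helper names are assumptions made for this example, and the query c is handled by labeling ¬c with 0, which removes the models where c fails.

```python
import math


# Hypothetical node encoding: ("lit", var, positive), ("and", [...]), ("or", [...]).
def lit(var, positive=True):
    return ("lit", var, positive)


def equiv(x, y):
    """Deterministic and smooth sub-circuit for x <-> y."""
    return ("or", [("and", [lit(x), lit(y)]),
                   ("and", [lit(x, False), lit(y, False)])])


# L_ex as (c <-> a) and (d <-> b); the two conjuncts share no variables.
L_EX = ("and", [equiv("a", "c"), equiv("b", "d")])


def evaluate(node, label):
    """Bottom-up evaluation in the probability semiring."""
    if node[0] == "lit":
        return label(node[1], node[2])
    values = [evaluate(child, label) for child in node[1]]
    return math.prod(values) if node[0] == "and" else sum(values)


PROBABILITY = {"a": 0.4, "b": 0.6}


def label_for_query_c(var, positive):
    if var in PROBABILITY:
        return PROBABILITY[var] if positive else 1.0 - PROBABILITY[var]
    if var == "c":  # condition on the query: models with ¬c get weight 0
        return 1.0 if positive else 0.0
    return 1.0


succ_c = evaluate(L_EX, label_for_query_c)
```

The evaluation visits every node once, so it is linear in the circuit size; the result agrees with SUCC(c) = 0.4 from the example.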
In order to solve second level problems, we need Constrained KC (CKC), i.e., an additional property on NNFs, apart from s, d and D, that restricts the order in which variables occur.
Definition 2 (X-Firstness) Given an NNF n on variables partitioned into X, Y, we say an internal node n_i of n is pure if Vars(n_i) ⊆ X or Vars(n_i) ⊆ Y and mixed otherwise. n is an X-first NNF if, for each of its and-nodes n_i, either all children of n_i are pure nodes, or one child is mixed and all other children n_j of n_i are pure with Vars(n_j) ⊆ X. Note that X-first NNFs contain a node n_x equivalent to n | x for each x ∈ int(X).
Example 2 (cont.) The NNFs in Figure 1 are both sd-DNNFs and equivalent to L_ex. The left sd-DNNF is furthermore an {a, b}-first sd-DNNF, whereas the right is not.
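The X-firstness condition can be checked mechanically. The following sketch uses a hypothetical tuple encoding of NNFs and two hand-written circuits for L_ex that are in the spirit of, but not copied from, Figure 1: one decides a and b before c and d, the other is the conjunction of two equivalences. Leaves are treated as pure.

```python
def lit(var, positive=True):
    return ("lit", var, positive)


def variables(node):
    if node[0] == "lit":
        return frozenset([node[1]])
    return frozenset().union(*(variables(child) for child in node[1]))


def is_x_first(node, xs):
    """Check Definition 2 for the NNF rooted at `node`; `xs` is the set X."""
    if node[0] == "lit":
        return True
    children = node[1]
    if not all(is_x_first(child, xs) for child in children):
        return False
    if node[0] == "or":
        return True  # the condition only restricts and-nodes

    def pure(child):
        vs = variables(child)
        return vs <= xs or vs.isdisjoint(xs)  # within X, or within Y

    mixed = [child for child in children if not pure(child)]
    if not mixed:
        return True
    others = [child for child in children if pure(child)]
    return len(mixed) == 1 and all(variables(child) <= xs for child in others)


def equiv(x, y):
    return ("or", [("and", [lit(x), lit(y)]),
                   ("and", [lit(x, False), lit(y, False)])])


def branch_b(a_value):
    # After a is decided, decide b, then set the inner atoms c and d.
    return ("or", [("and", [lit("b", b), ("and", [lit("c", a_value), lit("d", b)])])
                   for b in (True, False)])


DECIDE_AB_FIRST = ("or", [("and", [lit("a", a), branch_b(a)]) for a in (True, False)])
PAIRWISE = ("and", [equiv("a", "c"), equiv("b", "d")])

X = frozenset({"a", "b"})
first_ok = is_x_first(DECIDE_AB_FIRST, X)
pairwise_ok = is_x_first(PAIRWISE, X)
```

The pairwise circuit fails the check because its root and-node has two mixed children, each combining one variable from X with one from Y.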

Second Level Algebraic Model Counting
We now introduce Second Level Algebraic Model Counting (2AMC), a generalization of AMC that provides a unified view on second level problems.
Definition 3 (2AMC) A 2AMC instance is a tuple A = (T, X_I, X_O, α_I, α_O, S_I, S_O, t), where
• T is a propositional theory,
• (X_I, X_O) is a partition of the variables in T,
• S_j = (S_j, ⊕_j, ⊗_j, e_{⊕_j}, e_{⊗_j}) for j ∈ {I, O} is a commutative semiring,
• α_j : lit(X_j) → S_j for j ∈ {I, O} is a labeling function for literals, and
• t : S_I → S_O is a transformation function.
The value of A, denoted by 2AMC(A), is defined as
\[ 2AMC(A) = \bigoplus^{O}_{\mathbf{x}_O \in int(X_O)} \Big( \bigotimes^{O}_{a \in \mathbf{x}_O} \alpha_O(a) \Big) \otimes_O \, t\Big( \bigoplus^{I}_{\mathbf{x}_I \in mod(T | \mathbf{x}_O)} \bigotimes^{I}_{b \in \mathbf{x}_I} \alpha_I(b) \Big). \]
An AMC instance is a 2AMC instance, where X_O = ∅ and t is the identity function, meaning we only sum up weights over S_I. Intuitively, the idea behind 2AMC is that we solve an inner AMC instance over the variables in X_I for each assignment to X_O. Then we apply the transformation function to the result, thus replacing the inner summation by a corresponding element from the outer semiring, and solve a second outer AMC instance over the variables in X_O.
Example 3 (cont.) Consider the question whether it is more likely for c to be true or false in L_ex from Example 1, i.e., we want to find arg max_{c ∈ int({c})} SUCC(c). To keep notation simple, we consider max rather than arg max (see the discussion below Def. 5 for full details). Denoting the label of literal l by α(l) as in the definition of SUCC, the task corresponds to
\[ \max_{\mathbf{c} \in int(\{c\})} \sum_{h \in mod(L_{ex} | \mathbf{c})} \prod_{l \in h} \alpha(l). \]
We thus have a 2AMC task with outer variables {c}, inner variables {a, b, d} and the probability semiring and max-times semiring as inner and outer semiring, respectively, and both kinds of labels given by α. The formal definition of this 2AMC instance is A_ex = (L_ex, {a, b, d}, {c}, α_I, α_O, P, S_{max,•}, id), where α_I(l) is the probability of l if l ∈ lit({a, b}) and 1 if l ∈ lit({d}), and α_O(l) = 1 for l ∈ lit({c}). We can further evaluate the value as follows:
\[ \max\big(P(h_1) + P(h_2),\; P(h_3) + P(h_4)\big) = \max(0.24 + 0.16,\; 0.36 + 0.24) = \max(0.4,\; 0.6) = 0.6, \]
i.e., the most likely value is 0.6 and corresponds to ¬c.
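The two-level structure of this example can be made concrete with a brute-force evaluator that follows the intuition given above: for each outer assignment, solve the inner AMC instance, transform the result, and aggregate in the outer semiring. The function and variable names below are assumptions for illustration only, and the enumeration is exponential in the number of variables.

```python
from collections import namedtuple
from itertools import product

Semiring = namedtuple("Semiring", "plus times zero one")
PROB = Semiring(lambda x, y: x + y, lambda x, y: x * y, 0.0, 1.0)
MAX_TIMES = Semiring(max, lambda x, y: x * y, 0.0, 1.0)


def assignments(variables):
    for values in product([True, False], repeat=len(variables)):
        yield dict(zip(variables, values))


def weight(assignment, alpha, semiring):
    result = semiring.one
    for var, value in assignment.items():
        result = semiring.times(result, alpha(var, value))
    return result


def two_amc(theory, inner_vars, outer_vars, alpha_i, alpha_o, s_i, s_o, t):
    """Brute-force 2AMC: an inner AMC per outer assignment, then an outer AMC."""
    total = s_o.zero
    for x_o in assignments(outer_vars):
        inner = s_i.zero
        for x_i in assignments(inner_vars):
            if theory({**x_o, **x_i}):
                inner = s_i.plus(inner, weight(x_i, alpha_i, s_i))
        total = s_o.plus(total, s_o.times(weight(x_o, alpha_o, s_o), t(inner)))
    return total


def l_ex(m):
    # Models of the running example: c <-> a and d <-> b.
    return m["c"] == m["a"] and m["d"] == m["b"]


PROBABILITY = {"a": 0.4, "b": 0.6}


def alpha_i(var, value):
    if var in PROBABILITY:
        return PROBABILITY[var] if value else 1.0 - PROBABILITY[var]
    return 1.0


def alpha_o(var, value):
    return 1.0


value = two_amc(l_ex, ["a", "b", "d"], ["c"], alpha_i, alpha_o,
                PROB, MAX_TIMES, lambda x: x)
```

For A_ex the inner sums are 0.4 for c and 0.6 for ¬c, so the outer maximization returns 0.6 up to floating-point rounding.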
Before we further illustrate this formalism with tasks from the literature, we prove that 2AMC can be solved in polynomial time on X_O-first sd-DNNFs. A similar result is already known for DTProbLog (Derkinderen and De Raedt 2020) and SC-ProbLog (Latour et al. 2017).
Theorem 4 (Tractable 2AMC with X_O-first sd-DNNFs) Let A = (T, X_I, X_O, α_I, α_O, S_I, S_O, t) be a 2AMC instance, where T is an X_O-first sd-DNNF. Then, we can compute 2AMC(A) in polynomial time in the size of T assuming constant time semiring operations.

Proof (Sketch)
Consider a subgraph n of T with exactly one outgoing edge for each or-node and all outgoing edges for each and-node. As T is X_O-first and smooth, there is a node n′ in n such that Vars(n′) = X_I, i.e., exactly the outer variables occur above n′ (see also the lowest and-nodes of the left NNF in Figure 1). Thus, n′ is equivalent to T | x_O for some assignment x_O to the outer variables, for which n′ computes the value of the inner AMC instance. As evaluation sums over all these subgraphs, it obtains the correct result.
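The argument above suggests a single bottom-up pass: evaluate nodes over inner variables in S_I, apply t at the boundary where inner sub-circuits meet outer variables, and continue in S_O. The sketch below is one possible reading of that procedure under the assumption that the circuit is smooth and X_O-first; the node encoding, the helper names and the hand-written {c}-first circuit for A_ex are illustrative assumptions.

```python
from collections import namedtuple
from functools import reduce

Semiring = namedtuple("Semiring", "plus times zero one")
PROB = Semiring(lambda x, y: x + y, lambda x, y: x * y, 0.0, 1.0)
MAX_TIMES = Semiring(max, lambda x, y: x * y, 0.0, 1.0)


def lit(var, positive=True):
    return ("lit", var, positive)


def evaluate(node, outer_vars, alpha_i, alpha_o, s_i, s_o, t):
    """Return (in_outer_semiring, value); assumes a smooth X_O-first circuit."""
    if node[0] == "lit":
        _, var, positive = node
        if var in outer_vars:
            return True, alpha_o(var, positive)
        return False, alpha_i(var, positive)
    results = [evaluate(child, outer_vars, alpha_i, alpha_o, s_i, s_o, t)
               for child in node[1]]
    outer = [v for is_outer, v in results if is_outer]
    inner = [v for is_outer, v in results if not is_outer]
    if node[0] == "or":
        if outer and inner:
            raise ValueError("or-node mixes levels; circuit is not smooth")
        if inner:
            return False, reduce(s_i.plus, inner, s_i.zero)
        return True, reduce(s_o.plus, outer, s_o.zero)
    if not outer:  # and-node over inner variables only
        return False, reduce(s_i.times, inner, s_i.one)
    value = reduce(s_o.times, outer, s_o.one)
    if inner:  # boundary: the inner children together hold the inner AMC value
        value = s_o.times(value, t(reduce(s_i.times, inner, s_i.one)))
    return True, value


def two_amc_on_circuit(root, outer_vars, alpha_i, alpha_o, s_i, s_o, t):
    is_outer, value = evaluate(root, outer_vars, alpha_i, alpha_o, s_i, s_o, t)
    return value if is_outer else t(value)


def inner_part(a_value):
    b_and_d = ("or", [("and", [lit("b", b), lit("d", b)]) for b in (True, False)])
    return ("and", [lit("a", a_value), b_and_d])


# Hypothetical {c}-first sd-DNNF for L_ex: decide c, then the inner variables.
C_FIRST = ("or", [("and", [lit("c", c), inner_part(c)]) for c in (True, False)])

PROBABILITY = {"a": 0.4, "b": 0.6}


def alpha_i(var, positive):
    if var in PROBABILITY:
        return PROBABILITY[var] if positive else 1.0 - PROBABILITY[var]
    return 1.0


result = two_amc_on_circuit(C_FIRST, {"c"}, alpha_i, lambda v, p: 1.0,
                            PROB, MAX_TIMES, lambda x: x)
```

Each node is visited once, which matches the polynomial bound; on circuits that violate X_O-firstness the boundary step would transform partial inner values and is not guaranteed to be correct.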
We illustrate 2AMC with three tasks from quantitative logic programming.

Maximum a Posteriori Inference
A typical second level probabilistic inference task is maximum a posteriori inference, which involves maximizing over one set of variables while summing over another, as in Example 3.
Definition 5 (The MAP task) Given a probabilistic logic program L, a conjunction e of observed literals for the set of evidence atoms E, and a set of ground query atoms Q, find the most probable assignment q to Q given the evidence e, with R = H \ (Q ∪ E):
\[ MAP(Q | e) = \arg\max_{\mathbf{q}} P(Q = \mathbf{q} \mid e) = \arg\max_{\mathbf{q}} \sum_{\mathbf{r}} P(Q = \mathbf{q}, e, R = \mathbf{r}). \]
MAP as 2AMC. Solving MAP requires (1) summing probabilities over the truth values of the atoms in R (with fixed truth values for atoms in E ∪ Q), and (2) determining truth values of the atoms in Q that maximize this inner sum. Thus, we have X_O = Q and X_I = E ∪ R. The inner problem corresponds to usual probabilistic inference, i.e., S_I = P and α_I assigns 1 to the literals in e and 0 to their negations, p and 1 − p to the positive and negative literals for probabilistic facts p :: f that are not part of E, and 1 to both literals for all other variables in X_I.
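We choose S_O = (ℝ≥0 × 2^{lit(Q)}, ⊕, ⊗, (0, ∅), (1, ∅)) as the max-times semiring combined with the subsets of the query literals lit(Q) to remember the assignment that was used. Here,
\[ (r_1, S_1) \otimes (r_2, S_2) = (r_1 \cdot r_2,\; S_1 \cup S_2), \qquad (r_1, S_1) \oplus (r_2, S_2) = \begin{cases} (r_1, S_1) & \text{if } r_1 > r_2 \text{ or } (r_1 = r_2 \text{ and } S_1 > S_2), \\ (r_2, S_2) & \text{otherwise,} \end{cases} \]
where > is some arbitrary but fixed total order on 2^{lit(Q)}.
[...] where the label α(l) of literal l is as defined for SUCC. The inner problem is solved by the expected utility semiring S_I = EU, with α_I mapping literal l to (p_l, p_l • u(l)) if l is a probabilistic literal with probability p_l, and to (1, u(l)) otherwise.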
The basis for solving the outer problem is the max-plus semiring S_{max,+}, with α_O the utility function u, and the transformation function t((p, pu)) = pu if p ≠ 0 and t((0, pu)) = −∞. This is extended to argmax using the same idea as for MAP.

Probabilistic Inference with Stable Models
A more recent second level probabilistic task is probabilistic inference with stable model semantics (Totis et al. 2021; Skryagin et al. 2021). Inference of success probabilities reduces to a variant of weighted model counting, where the weight of a (stable) model of a program L = R ∪ F is normalized with the number of models sharing the same assignment f to the probabilistic facts F:
\[ SUCC_{sm}(q) = \sum_{\mathbf{f} \in int(F)} \frac{|mod(L | \mathbf{f} \cup \{q\})|}{|mod(L | \mathbf{f})|} \prod_{l \in \mathbf{f}} \alpha(l), \]
where the label α(l) of literal l is as defined for SUCC.
Example 5 (cont.) For L_ex, each assignment f introduces exactly one stable model, i.e., SUCC_sm equals SUCC. In the extended program L_sm = L_ex ∪ {e ← not f. f ← not e.}, however, all assignments have two stable models, one where e is added and one where f is added. Therefore, for each assignment f to the probabilistic facts,
\[ \frac{|mod(L_{sm} | \mathbf{f} \cup \{e\})|}{|mod(L_{sm} | \mathbf{f})|} = \frac{1}{2}. \]
Thus SUCC_sm(e) = 0.5.
SUCC_sm as 2AMC. Computing SUCC_sm requires (1) counting the number of models for a given total choice and those that satisfy the query q, and (2) summing the normalized probabilities. Thus, we have X_O = F and X_I = H \ F. S_I is the semiring over pairs of natural numbers, S_I = (ℕ², +, •, (0, 0), (1, 1)), where operations are component-wise. α_I maps ¬q to (0, 1) and all other literals to (1, 1). The first component thus only counts the models where q is true (|mod(L | f ∪ {q})|), whereas the second component counts all models (|mod(L | f)|). The transformation function is given by t((n_1, n_2)) = n_1 / n_2. The outer problem then corresponds to usual probabilistic inference, i.e., S_O = P, and α_O assigns p and 1 − p to the positive and negative literals for probabilistic facts p :: f, respectively, and 1 to all other literals.
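As a small sanity check of this encoding, the sketch below computes SUCC_sm for the extended program L_sm by brute force. It enumerates total choices, finds the stable models of the resulting ground program via the reduct, forms the inner pair (n_1, n_2), applies t((n_1, n_2)) = n_1 / n_2 and sums in the probability semiring. The rule representation and function names are assumptions for illustration; total choices without any stable model are skipped, which is an assumption of this sketch and not a statement about SMProbLog.

```python
from fractions import Fraction
from itertools import combinations, product


def least_model(definite_rules):
    """Least model of a definite program given as (head, positive_body) pairs."""
    model, changed = set(), True
    while changed:
        changed = False
        for head, body in definite_rules:
            if head not in model and body <= model:
                model.add(head)
                changed = True
    return model


def stable_models(rules, atoms):
    """Enumerate stable models of a ground normal program via the reduct."""
    atoms = sorted(atoms)
    for size in range(len(atoms) + 1):
        for candidate in combinations(atoms, size):
            m = set(candidate)
            reduct = [(h, set(pos)) for h, pos, neg in rules if not set(neg) & m]
            if least_model(reduct) == m:
                yield m


def succ_sm(facts, rules, query):
    names = sorted(facts)
    atoms = set(names) | {h for h, _, _ in rules}
    total = Fraction(0)
    for choice in product([True, False], repeat=len(names)):
        probability = Fraction(1)
        program = list(rules)
        for name, chosen in zip(names, choice):
            probability *= facts[name] if chosen else 1 - facts[name]
            if chosen:
                program.append((name, (), ()))
        # Inner semiring value (n1, n2): models with the query, all models.
        n1 = n2 = 0
        for model in stable_models(program, atoms):
            n1 += query in model
            n2 += 1
        if n2:  # assumption: skip total choices without stable models
            total += probability * Fraction(n1, n2)  # t((n1, n2)) = n1 / n2
    return total


FACTS = {"a": Fraction(2, 5), "b": Fraction(3, 5)}
# Rules as (head, positive body, negative body).
L_SM = [("c", ("a",), ()), ("d", ("b",), ()),
        ("e", (), ("f",)), ("f", (), ("e",))]

succ_e = succ_sm(FACTS, L_SM, "e")
succ_c = succ_sm(FACTS, L_SM, "c")
```

Exact fractions avoid rounding issues; the query e holds in one of the two stable models of every total choice, so its probability is 1/2, while c keeps its probability 2/5 from L_ex.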

Weakening X-Firstness
While any 2AMC problem can be solved in polynomial time on an X_O-first sd-DNNF representing the logical theory T, such an sd-DNNF can be much bigger than the smallest (ordinary) sd-DNNF for T, as the X-firstness may severely restrict the order in which variables are decided. In the following, we show that for a wide class of transformation functions, we can exploit the logical structure of the theory to relax the X_O-first property.
Recall that a 2AMC task includes an AMC task for every assignment x_O to the outer variables, which sums over all assignments x_I to the inner variables that extend x_O to a model of the theory. Consider the CNF T = {y ∨ ¬x, ¬y ∨ x} and let X_O = {y} and X_I = {x}. The value of the outer variable y already determines the value of the inner variable x. Distributivity allows us to pull x out of every inner sum, as each such sum only involves one of the literals for x. If it does not matter whether we first apply the transformation function and then multiply or the other way around, we have a choice between keeping x in the inner semiring, or pushing its transformed version to the outer semiring. Thus, we can decide between an X_O-first or an X_O ∪ {x}-first sd-DNNF. Naturally, the more such variables we have, the more freedom we gain.
This situation might also occur after we have already decided some of the variables. Consider the CNF T = {z ∨ y ∨ ¬x, z ∨ ¬y ∨ x} and let X_O = {z, y} and X_I = {x}. If we set z to true, both y and x can take any value, therefore the value of x is not determined by z and y. However, if we set z to false, we are in the same situation as above and can move x to X_O on a local level.
We formalize this, starting with definability to capture when a variable is determined by others.
Definition 7 (Definability (Lagniez et al. 2016)) A variable a is defined by a set of variables X with respect to a theory T if for every assignment x of X it holds that x ∪ T |= a or x ∪ T |= ¬a. We denote the set of variables that are not in X and defined by X with respect to T by D(T, X).
Example 6 (cont.) In L_ex the atoms c and d are defined by {a, b} since c holds iff a holds, and d holds iff b holds.
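Definability can be checked by enumeration on small theories, which makes the examples of this section easy to reproduce. The sketch below is a brute-force reading of Definition 7 (practical tools use an NP-oracle instead); the function names and the encoding of theories as Python predicates are assumptions for illustration.

```python
from itertools import product


def is_defined(var, xs, variables, theory):
    """Definition 7 by enumeration: every assignment to xs that extends to a
    model admits exactly one value of var."""
    seen = {}
    for values in product([True, False], repeat=len(variables)):
        model = dict(zip(variables, values))
        if not theory(model):
            continue
        key = tuple(model[x] for x in xs)
        if seen.setdefault(key, model[var]) != model[var]:
            return False
    return True


def defined_set(xs, variables, theory):
    """D(T, X): the variables outside X that are defined by X."""
    return {v for v in variables
            if v not in xs and is_defined(v, xs, variables, theory)}


def l_ex(m):
    return m["c"] == m["a"] and m["d"] == m["b"]


def t_zyx(m):
    # The CNF {z ∨ y ∨ ¬x, z ∨ ¬y ∨ x}.
    return (m["z"] or m["y"] or not m["x"]) and (m["z"] or not m["y"] or m["x"])


def t_zyx_z_false(m):
    # The same CNF conditioned on z = false.
    return t_zyx({**m, "z": False})


d_lex = defined_set(["a", "b"], ["a", "b", "c", "d"], l_ex)
d_global = defined_set(["z", "y"], ["z", "y", "x"], t_zyx)
d_local = defined_set(["y"], ["y", "x"], t_zyx_z_false)
```

The last two results show the local effect discussed above: x is not defined by {z, y} in the full theory, but becomes defined by y once z is set to false.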
Definition 8 (X-Firstness Modulo Definability) Given an NNF n on variables partitioned into X, Y, we say an internal node n_i of n is pure modulo definability if Vars(n_i) ⊆ X ∪ D(n_i, X) or Vars(n_i) ⊆ Y and mixed modulo definability otherwise. n is an X-first NNF modulo definability, X/D-first NNF for short, if for each of its and-nodes n_i either all children of n_i are pure modulo definability, or one child of n_i is mixed modulo definability and all other children n_j of n_i are pure modulo definability and Vars(n_j) ⊆ X ∪ D(n_j, X).
The intuition here is that we can decide variables from Y earlier, if they are defined by the variables in X in terms of the theory T conditioned on the decisions we have already made. Thus, we only decide the variables in X first modulo definability.
Example 7 (cont.) In Figure 1, the left NNF is an {a, b}-first NNF and therefore also an {a, b}/D-first NNF. The right NNF is not an {a, b}-first NNF but an {a, b}/D-first NNF since D(L_ex, {a, b}) contains c.
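A brute-force check of X/D-firstness can be sketched by combining the structural test with definability computed per node. The sketch assumes that the condition on the remaining children of an and-node mirrors Definition 2, i.e., Vars(n_j) ⊆ X ∪ D(n_j, X); the node encoding and both example circuits are hypothetical and only in the spirit of Figure 1.

```python
from itertools import product


def lit(var, positive=True):
    return ("lit", var, positive)


def variables(node):
    if node[0] == "lit":
        return frozenset([node[1]])
    return frozenset().union(*(variables(child) for child in node[1]))


def holds(node, m):
    if node[0] == "lit":
        return m[node[1]] == node[2]
    results = (holds(child, m) for child in node[1])
    return all(results) if node[0] == "and" else any(results)


def defined_vars(node, xs):
    """D(n, X): variables of n outside X that X determines in all models of n."""
    vs = sorted(variables(node))
    keys = [v for v in vs if v in xs]
    seen, defined = {}, {v for v in vs if v not in xs}
    for values in product([True, False], repeat=len(vs)):
        m = dict(zip(vs, values))
        if holds(node, m):
            first = seen.setdefault(tuple(m[k] for k in keys), m)
            defined = {v for v in defined if first[v] == m[v]}
    return defined


def within_x_or_defined(node, xs):
    return variables(node) <= xs | defined_vars(node, xs)


def is_xd_first(node, xs):
    if node[0] == "lit":
        return True
    children = node[1]
    if not all(is_xd_first(child, xs) for child in children):
        return False
    if node[0] == "or":
        return True
    pure = [within_x_or_defined(c, xs) or variables(c).isdisjoint(xs)
            for c in children]
    if all(pure):
        return True
    # Assumed condition: one mixed child, the rest within X modulo definability.
    return pure.count(False) == 1 and all(
        within_x_or_defined(c, xs) for c, p in zip(children, pure) if p)


def equiv(x, y):
    return ("or", [("and", [lit(x), lit(y)]),
                   ("and", [lit(x, False), lit(y, False)])])


def free(x, y):
    # Models (x, y), (x, ¬y), (¬x, y): y is not defined by x.
    return ("or", [("and", [lit(x), ("or", [lit(y), lit(y, False)])]),
                   ("and", [lit(x, False), lit(y)])])


X = frozenset({"a", "b"})
pairwise_ok = is_xd_first(("and", [equiv("a", "c"), equiv("b", "d")]), X)
two_mixed_ok = is_xd_first(("and", [free("a", "c"), free("b", "d")]), X)
```

The conjunction of two equivalences passes because c and d are defined by a and b inside their sub-circuits, whereas the second circuit has two children that stay mixed even modulo definability.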
The following lemma generalizes this example from 2 to n pairs of equivalent variables.
Lemma 9 Let T = {X_i ↔ Y_i | i = 1, …, n}, X = {X_1, …, X_n} and D = {Y_1, …, Y_n}, then the size of the smallest X-first sd-DNNF for T is exponential in n and the size of the smallest X/D-first sd-DNNF for T is linear in n.

Proof (Sketch)
Since D(T, X) = {Y_1, …, Y_n}, every sd-DNNF for T is an X/D-first sd-DNNF. As T has treewidth 2, there exists an sd-DNNF of linear size. On the other hand, an X-first sd-DNNF must contain a node that is equivalent to T | x for each of the 2^|X| assignments x ∈ int(X), and these conditioned theories are pairwise non-equivalent.
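The counting argument in this proof can be checked for small n. Assuming the theory consists of n pairs of equivalent variables X_i ↔ Y_i, as stated before the lemma, the sketch below computes the models of T | x for every assignment x and counts how many distinct conditioned theories arise. The function names are hypothetical.

```python
from itertools import product


def residual(xs):
    """Models of T | x for T = {X_i <-> Y_i}: the Y-assignments that copy x."""
    n = len(xs)
    return frozenset(ys for ys in product([True, False], repeat=n)
                     if all(x == y for x, y in zip(xs, ys)))


def distinct_residuals(n):
    """Number of pairwise non-equivalent theories T | x over all x in int(X)."""
    return len({residual(xs) for xs in product([True, False], repeat=n)})


counts = [distinct_residuals(n) for n in range(1, 6)]
```

Every assignment to X leaves a different single model over Y, so the count doubles with each added pair, which is the source of the exponential lower bound for X-first circuits.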
We see that X/D-first sd-DNNFs can be much smaller than X-first sd-DNNFs, even on very simple propositional theories. It remains to show that we maintain tractability. As in the beginning of this section, we want to regard defined inner variables as outer variables. For this to work it must not matter whether we first multiply and then apply the transform t or the other way around, i.e., t must be a homomorphism for the multiplications of the semirings.
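Definition 10 (Monoid Homomorphism, Generated Monoid) A monoid M = (M, ⊙, e_⊙) consists of a set M, an associative binary operation ⊙ on M and a neutral element e_⊙ ∈ M. A monoid homomorphism from M_1 = (M_1, ⊙_1, e_{⊙_1}) to M_2 = (M_2, ⊙_2, e_{⊙_2}) is a function f : M_1 → M_2 such that f(m ⊙_1 m′) = f(m) ⊙_2 f(m′) for all m, m′ ∈ M_1 and f(e_{⊙_1}) = e_{⊙_2}. Furthermore, for a subset M′ ⊆ M of a monoid M = (M, ⊙, e_⊙), the monoid generated by M′, denoted ⟨M′⟩_M, is M* = (M*, ⊙, e_⊙), where M′ ⊆ M* and M* is the subset-minimal set such that M* is a monoid.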
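Example 8 (cont.) Consider again the 2AMC instance A_ex from Example 3. Since a is defined in terms of c, we want to argue that the following equality holds, allowing us to see a as an outer variable:
\[ \max_{\mathbf{c} \in int(\{c\})} id\Big( \sum_{\mathbf{x} \in mod(L_{ex} | \mathbf{c})} \prod_{l \in \mathbf{x}} \alpha_I(l) \Big) = \max_{\mathbf{y} \in int(\{a, c\})} id\big(\alpha_I(l_a)\big) \cdot id\Big( \sum_{\mathbf{x} \in mod(L_{ex} | \mathbf{y})} \prod_{l \in \mathbf{x}} \alpha_I(l) \Big), \]
where l_a denotes the literal of a in y. Here, this is easy to see since id is a homomorphism between the monoid ([0, 1], •) of the inner probability semiring and the monoid (ℝ≥0, •) of the outer max-times semiring, as id : [0, 1] → ℝ≥0, for any p, q ∈ [0, 1], id(p • q) = p • q = id(p) • id(q), and id(1) = 1.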
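The effect of treating a as an outer variable can be verified numerically on A_ex. The sketch below compares the value obtained with a summed in the inner semiring against the value obtained when the label of a is pulled out and maximized together with c; since the transformation is the identity, it is omitted in the code. All names are assumptions for illustration.

```python
from itertools import product

PROBABILITY = {"a": 0.4, "b": 0.6}


def alpha(var, value):
    if var in PROBABILITY:
        return PROBABILITY[var] if value else 1.0 - PROBABILITY[var]
    return 1.0


def l_ex(m):
    return m["c"] == m["a"] and m["d"] == m["b"]


def inner_sum(fixed, free):
    """Sum, over assignments to `free` that extend `fixed` to a model,
    of the product of the labels of the `free` literals."""
    total = 0.0
    for values in product([True, False], repeat=len(free)):
        x = dict(zip(free, values))
        if l_ex({**fixed, **x}):
            w = 1.0
            for var, value in x.items():
                w *= alpha(var, value)
            total += w
    return total


# a is an inner variable: maximize over c only.
a_inner = max(inner_sum({"c": c}, ["a", "b", "d"]) for c in (True, False))

# a is treated as an outer variable: its label is multiplied outside the sum.
a_outer = max(alpha("a", a) * inner_sum({"a": a, "c": c}, ["b", "d"])
              for a in (True, False) for c in (True, False))
```

Both values agree because, for every value of c, only one value of a extends to a model; the inconsistent combinations contribute an empty inner sum of 0.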
In general, instead of applying the transform to a sum of products of literal labels for a set of variables, we want to apply it independently to (1) the literal labels of a defined variable and (2) the inner sum restricted to the remaining variables. It is therefore sufficient if the equality is valid for the monoid generated by the values we encounter in these situations, rather than for all values from the inner monoid's domain. As we will illustrate for MEU at the end of this section, the transform of some 2AMC tasks only satisfies this weaker but sufficient condition. The following definition captures this idea, where the two subsets of O(A) correspond to the values observed in cases (1) and (2), respectively:
With this in mind, we can state our main result.

Proof (Sketch)
The proof of this theorem exploits (a) distributivity and the form of the transformation function to move defined variables to the outer semiring as outlined at the start of this section, and (b) the fact that when we decide an outer variable, then we get two new 2AMC instances, where the theory T is conditioned on the truth of the decided variable.On these new instances, we can also use definability in the same fashion as before.
Checking which variables are defined for each partial assignment x is not feasible, as checking definability is co-NP-complete and there are more than 2^|X_O| partial assignments. Nevertheless, this result does have implications for constrained KC in practice.
Firstly, we can check which variables are defined by X_O in terms of the whole theory T, and use this to generate a variable order for compilation that leads to an X/D-first sd-DNNF with a priori guarantees on its size. We discuss this in Section 5.
Secondly, as entailment is a special case of definability, Theorem 12 justifies the use of unit propagation during compilation, which dynamically adapts the variable order when variables are entailed by the already decided ones, and thus may violate X-firstness.
We still need to verify that our three example tasks satisfy the preconditions of Theorem 12, i.e., their transformation functions are monoid homomorphisms on the observable values. For MAP and SUCC_sm this is easy to prove, as the transformation is already a homomorphism on all values. For MEU, however, the restriction to the monoid generated by the observable values is crucial, as the transformation function is only a homomorphism for tuples (p, pu) with p ∈ {0, 1}, which may not be the case in general. In the DTProbLog setting, however, assignments to decision facts and probabilistic facts are independent, and every assignment to both sets extends to a single model of the theory. Together with the (1, u)-labels of the remaining atoms, this ensures that the result of the inner sum is of the form (1, x), and MEU thus meets the criteria.
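The restriction for MEU can be seen on concrete values. The following sketch implements the multiplication of the expected utility semiring and the transformation t into the max-plus semiring, whose multiplication is ordinary addition, and tests the homomorphism condition t(x ⊗ y) = t(x) + t(y) on a few pairs. The function names are hypothetical.

```python
NEG_INF = float("-inf")


def eu_times(x, y):
    """Multiplication of the expected utility semiring on pairs (p, eu)."""
    return (x[0] * y[0], x[0] * y[1] + y[0] * x[1])


def t(x):
    """Transformation into max-plus: keep the utility unless the probability is 0."""
    p, eu = x
    return eu if p != 0 else NEG_INF


def respects_product(x, y):
    """Homomorphism condition; the outer multiplication of max-plus is +."""
    return t(eu_times(x, y)) == t(x) + t(y)


unit_preserved = t((1.0, 0.0)) == 0.0
holds_for_p_one = respects_product((1.0, 2.0), (1.0, 3.0))
holds_for_p_zero = respects_product((0.0, 0.0), (1.0, 3.0))
fails_for_p_half = not respects_product((0.5, 1.0), (0.5, 1.0))
```

With probabilities in {0, 1} the condition holds, while two values with probability 0.5 already violate it: the product has expected utility 1.0, but the sum of the transformed values is 2.0.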

Implementation
The general pipeline of PLP solvers takes a program, grounds it and optionally simplifies it or breaks its cycles. Using standard KC tools, this program is then compiled into a tractable circuit either directly or via conversion to CNF. We extended this pipeline in aspmc (Eiter et al. 2021) and SMProbLog (Totis et al. 2021) to compile programs, via CNF, into X/D-first circuits.
To obtain X/D-first circuits we need to specify a variable order in which all variables in X are decided first modulo definedness. Preferably, the chosen order should also result in efficient compilation and thus a small circuit. Korhonen and Järvisalo (2021) have shown how to generate variable orders from tree decompositions of the primal graph of the CNF that result in (unconstrained) sd-DNNFs whose size is bounded by the width of the decomposition. We adapt this result to our constrained setting, where we need to ensure that the tree decomposition satisfies an additional property that allows compilation to essentially consider the non-defined inner variables and the outer variables independently. We first define the necessary concepts.
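Definition 13 (Primal Graph, Tree Decomposition) Let T be a CNF over variables X. The primal graph of T, denoted by PRIM(T), is defined as the graph with vertex set X that has an edge between two variables if and only if they occur together in some clause of T. A tree decomposition (TD) for a graph G is a pair (T, χ), where T is a tree and χ is a labeling of V(T) by subsets of V(G) such that (i) for each vertex v ∈ V(G) there is a node t ∈ V(T) with v ∈ χ(t), (ii) for each edge {u, v} ∈ E(G) there is a node t ∈ V(T) with {u, v} ⊆ χ(t), and (iii) for each vertex v ∈ V(G) the nodes t with v ∈ χ(t) form a connected subtree of T. The width of (T, χ) is max_{t ∈ V(T)} |χ(t)| − 1.
Next, we show that restricted TDs allow X/D-first compilation with performance guarantees: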

Proof (Sketch).
The performance guarantee is due to Korhonen and Järvisalo (2021) and holds when we decide the variables in the order they occur in the TD starting from the root. X/D-firstness can be guaranteed by taking t* as the root of the TD and, thus, first deciding all variables in χ(t*). From condition (2) it follows that afterwards the CNF has decomposed into separate components, which either only use variables from X ∪ D or use no variables from X. Thus, their compilation only leads to pure NNFs.
To find a TD of small width for which the lemma applies, we proceed as follows. Given a CNF T and partition X_I, X_O of the variables in inner and outer variables, we first compute D(T, X_O) using an NP-oracle, which is possible as shown by Lagniez et al. (2016). As the width of a suitable TD is at least the size of the separator minus one, we first approximate a minimum size separator S ⊆ X ∪ D(T, X) using clingo (Gebser et al. 2014) with a timeout of 30 seconds. To ensure that the TD contains a node t′ with S ⊆ χ(t′), we add the clique over the vertices in S to the primal graph, i.e., we generate a tree decomposition (T, χ) of PRIM(T) ∪ Clique(S). For this, we use flow-cutter (Dell et al. 2017) with a timeout of 5 seconds. The resulting TD either already contains a node with S = χ(t′) and thus satisfies the preconditions of Lemma 14, or can be modified locally to one that does by splitting the node with S ⊂ χ(t′).
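The clique trick in this procedure can be illustrated without the external tools. The sketch below builds the primal graph of a CNF given as DIMACS-style integer clauses, adds the clique over a separator S, and derives the bags of a tree decomposition with a simple min-degree elimination heuristic. The heuristic is a stand-in chosen for this example and is not flow-cutter; the separator is given by hand rather than computed with clingo, and all function names are hypothetical.

```python
from itertools import combinations


def primal_graph(cnf):
    """Vertices are variables; two variables are adjacent iff they share a clause."""
    vertices = {abs(l) for clause in cnf for l in clause}
    edges = {frozenset(pair)
             for clause in cnf
             for pair in combinations(sorted({abs(l) for l in clause}), 2)}
    return vertices, edges


def add_clique(edges, separator):
    """PRIM(T) ∪ Clique(S): connect all pairs of separator vertices."""
    return edges | {frozenset(pair) for pair in combinations(sorted(separator), 2)}


def min_degree_bags(vertices, edges):
    """Bags of a tree decomposition obtained by min-degree elimination."""
    adj = {v: set() for v in vertices}
    for edge in edges:
        u, v = tuple(edge)
        adj[u].add(v)
        adj[v].add(u)
    bags = []
    while adj:
        v = min(adj, key=lambda u: (len(adj[u]), u))
        neighbours = adj.pop(v)
        bags.append({v} | neighbours)
        for u in neighbours:  # make the remaining neighbours a clique
            adj[u].discard(v)
            adj[u] |= neighbours - {u}
    return bags


def width(bags):
    return max(len(bag) for bag in bags) - 1


# Hypothetical CNF whose primal graph is the path 1 - 2 - 3 - 4 - 5.
CNF = [(1, -2), (2, -3), (3, -4), (4, -5)]
S = {2, 4}

vertices, edges = primal_graph(CNF)
plain_bags = min_degree_bags(vertices, edges)
clique_bags = min_degree_bags(vertices, add_clique(edges, S))
separator_covered = any(S <= bag for bag in clique_bags)
```

Every clique of a graph is contained in some bag of an elimination-based decomposition, because the first eliminated clique vertex still has all other clique vertices as neighbours. On this example, forcing S into one bag raises the width from 1 to 2, which reflects the cost of the constraint.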
aspmc: We use the above strategy to generate a variable order, which is then given to c2d (Darwiche 2004) or miniC2D (Oztok and Darwiche 2015) together with the CNF to compile an X/D-first circuit, on which we evaluate the 2AMC task. We stress that c2d, contrary to miniC2D, always uses unit propagation during compilation, and thus can only be used due to Theorem 12.
Currently, aspmc supports DTProbLog, MAP and SMProbLog programs as inputs.
smProbLog: The SMProbLog implementation of Totis et al. (2021) uses DSHARP (Aziz et al. 2015) to immediately compile the logic program to a d-DNNF, which prevents a direct application of our new techniques. Our adapted version obtains a CNF T and variable ordering as in aspmc, which it then compiles to SDD (Darwiche 2011) using the PySDD library as in standard ProbLog. SDDs can be seen as a special case of sd-DNNFs. The main difference is that for each branch the variables are decided in the same order.

Experimental Evaluation
Our experimental evaluation addresses the following questions:
Q1: How does exploiting definedness influence the efficiency of 2AMC solving?
Q2: How does aspmc compare to task-specific solvers from the PLP literature?
Q3: How does our second level approach compare to the first level approach when definedness reduces 2AMC to AMC?

General Setup
To answer these questions, we consider logic programs from the literature on the three example 2AMC tasks.To eliminate differences on how different solvers handle n-ary random choices (known as annotated disjunctions) and 0/1-probabilities, we normalize probabilistic programs to contain only probabilistic facts and normal clauses, and replace all probabilities by values chosen uniformly at random from 0.1, 0.2, ..., 0.8, 0.9.
For MAP, we use the growing head, growing negated body, blood and graph examples of Bellodi et al. (2020), with a subset of the probabilistic facts of uniformly random size as query atoms and the given evidence. For MEU, we use the Bayesian networks provided by Derkinderen and De Raedt (2020) as well as the viral marketing example from Van den Broeck et al. (2010) on randomly generated power law graphs that are known to resemble social networks (Barabási and Bonabeau 2003). For SUCC_sm, we use an example from Totis et al. (2021) that introduces non-probabilistic choices into the well-known smokers example (Fierens et al. 2015).
Besides this basic set, we also use the original smokers example, where SUCC_sm reduces to SUCC, for Q3. For Q1, we use the grid graphs of Fierens et al. (2015) as an additional MAP benchmark. Here, we control definedness by choosing the MAP queries as the probabilistic facts for the edges that are reachable in k steps from the top left corner, for all possible values of k, and use the existence of a path from the top left to the bottom right corner as evidence.
We compare the following systems:
• aspmc with c2d as default knowledge compiler
• ProbLog in different versions: ProbLog2 (version 2.1.0.42) in MAP and default (SUCC) mode, the implementation of Derkinderen and De Raedt (2020) for MEU, and our implementation of SMProbLog for SUCC_sm
• PITA (Bellodi et al. (2020), version 4.5, included in SWI-Prolog) for MAP and MEU
• clingo (Gebser et al. (2014), version 5.5.0.post3) as an indicator of how an enumeration based approach not using knowledge compilation might perform. Note that this approach does not actually compute the 2AMC value of the formula, but only enumerates models.
We limit each individual run of a system to 300 seconds and 4 GB of memory. When plotting running time per instance for a specific solver, we always sort instances by ascending time for that solver, using 300 seconds for any instance that did not finish successfully within these bounds.
The instances, results and benchmarking scripts are available at github.com/raki123/CC.

Results
Q1: How does exploiting definedness influence the efficiency of 2AMC solving? To answer the first question, we use all proper 2AMC benchmarks, and focus on the 2AMC task itself, i.e., we start from the labeled CNF corresponding to the instance. We consider four different settings obtained by independently varying two dimensions: constraining compilation to either X-first or X/D-first, and compiling to either sd-DNNF using c2d or to SDD using miniC2D.
On the left of Figure 2, we plot the width of the tree decompositions the solver uses to determine the variable order in the X-first or X/D-first case, respectively. Recall from Lemma 14 that this width appears in the exponent of the compilation time bound. The optimal tree decomposition's width in the X/D-first case is at most that of the X-first case, and would thus result in points on or below the black diagonal only. In practice, we observe many points close to the diagonal, with two notable exceptions. MAP instances with high width tend to be slightly above the diagonal, whereas MAP grids are mostly clearly below the diagonal. These results can be explained by the shape of the problems and the fact that we only approximate the optimal decomposition, as this is a hard task. We note that for many of the benchmarks, the number of variables defined in terms of the outer variables (decision variables for MEU, query variables for MAP, probabilistic facts for SUCC_sm) is limited. The exceptions are the MAP grids, where the choice of queries entails definedness.
We plot the same data restricted to the instances solved within the time limit on the right of Figure 2, along with summary statistics on the number of instances solved for three ranges of X/D-width in the caption. We observe that almost no instances with X/D-width above 40 are solved. At the same time, almost all instances with X/D-width below 20 are solved, including many cases with X-width above 40, where we thus see a clear benefit from exploiting definedness.
In Figure 3, we plot the running times per instance for the different solvers.Given the width results, we distinguish between MAP grids and the remaining cases.On the grids, taking into account definedness results in clear performance gains when compiling SDDs (using miniC2D).On the other hand, compiling sd-DNNFs with c2d shows only a marginal difference between X and X/D variable orders.The reason is that c2d implicitly exploits definedness even when given the X-first order through its use of unit propagation.SDD compilation, on the other hand, cannot deviate from the given order, and only benefits from definedness if it is reflected in the variable order.On the other benchmarks, with fewer defined variables, the variable order has little effect within the same circuit class, but sd-DNNFs outperform SDDs, likely because unit propagation can also exploit context-dependent definedness.In the following we thus use c2d.
Q2: How does aspmc compare to task-specific solvers from the PLP literature? We first consider the efficiency of the whole pipeline from instance to solution on the MAP and MEU tasks, which are addressed by both ProbLog and PITA. From the plots of running times in Figure 4, we observe that all solvers outperform clingo's model enumeration. ProbLog is slower than both aspmc and PITA except on the MAP graphs. Among aspmc and PITA, aspmc outperforms PITA on MEU, and vice versa on MAP. This can be explained by the different overall approaches to knowledge compilation taken by the solvers: aspmc always compiles a CNF encoding of the whole ground program, including all ground atoms; ProbLog compiles a similar encoding, but restricted to the part of the formula that is relevant to the task at hand; and PITA directly compiles a circuit for the truth of atoms of interest in terms of the relevant ground choice atoms. As MAP queries are limited to probabilistic facts, this allows PITA to compile a circuit for the truth of the evidence in terms of relevant probabilistic facts only, which, especially in the graph setting, can be significantly smaller than a complete encoding. For MEU, PITA needs one circuit per atom with a utility, and these circuits additionally need to be combined, putting PITA at a disadvantage.
For SUCC sm , we plot running times on the modified smokers setting for the clingo baseline, aspmc, and three variants of the dedicated ProbLog implementation, namely the original implementation of Totis et al. (2021) that compiles to d-DNNF as well as our modified implementation compiling to X-first and X/D-first SDDs. These problems appear to be hard in general, but we observe a clear benefit from the constrained compilation enabled in our approach.
Q3: How does our second level approach compare to the first level approach when definedness reduces 2AMC to AMC? On the regular smokers benchmark, SMProbLog and ProbLog semantics coincide, i.e., all inner variables of the 2AMC task are defined, and it thus reduces to AMC. In Figure 5, we plot running times for the AMC and 2AMC variants of both aspmc and ProbLog. For ProbLog, there is a clear gap between the two approaches, which is at least in part due to the fact that ProbLog only compiles the relevant part of the program, whereas SMProbLog compiles the full theory. aspmc outperforms ProbLog on the harder instances, with limited overhead for the second level task.

Conclusion
2AMC is a hard problem, even harder than #SAT or AMC in general, as it imposes significant constraints on variable orders in KC. Our theoretical results show that these constraints can be weakened by exploiting definedness of variables. In practice, this allows us (i) to introduce a strategy to construct variable orders for compilation into X/D-first sd-DNNFs with a priori guarantees on complexity, and (ii) to use unit propagation to decide literals earlier than specified by the variable order during compilation. Our experimental evaluation shows that (ii) generally improves the performance and (i) can boost it when many variables are defined. Furthermore, we see that compilation usually performs much better than an enumeration-based approach to solving 2AMC. Last but not least, our extensions of aspmc and SMProbLog are competitive with PITA and ProbLog, the state-of-the-art solvers for MAP, MEU and SUCC sm inference for logic programs, and even exhibit improved performance on MEU and SUCC sm .
• t is a monoid homomorphism from the monoid M = O(A) (R I ,⊗ I ,e ⊗ I ) generated by the observable values to (R O , ⊗ O , e ⊗ O ).

Proof (Sketch).
Let n be the root of T . We know that the variables in D(n, X O ) are defined by X O in terms of T . For d ∈ D(n, X O ) and x O ∈ int(X O ), we denote by d | x O the literal of d that must be included in x I in order for x I ∪ x O to be a satisfying assignment (if x O can be extended to a satisfying assignment, otherwise choose an arbitrary but fixed value).
Recall that the value of A is defined as

$$
\bigoplus^{O}_{x_O \in \mathrm{int}(X_O)} \; \bigotimes^{O}_{x \in x_O} \alpha_O(x) \otimes^{O} t\Big( \bigoplus^{I}_{x_I \in \mathrm{int}(X_I),\, x_I \cup x_O \models T} \; \bigotimes^{I}_{y \in x_I} \alpha_I(y) \Big).
$$

Since the inner sum only takes interpretations that satisfy T , we do not need to take the sum over both values of a defined variable d but can restrict ourselves to the value d | x O determined by x O .
Now, if d occurs before some variable x ∈ X in some node n of T , then n is not an X-first NNF. However, it is an X O ∪ {d}-first sd-DNNF. On this NNF we can solve the 2AMC-instance B in polynomial time according to Theorem 3. By induction on the number of variables that occur before x, the claim follows.
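The step of moving a defined variable from the inner to the outer variables can be checked by brute force on a toy instance. The Python sketch below is an illustration under stated assumptions, not the compilation-based procedure: it assumes the inner semiring is (+, ×) over non-negative reals, the outer semiring is (max, ×), and t is the identity (which preserves multiplication, 0 and 1). The theory d ↔ a, all weights, and the names `two_amc`, `value_a` and `value_b` are made up for this example.

```python
import math
from itertools import product


def two_amc(theory, inner_vars, outer_vars, alpha_inner, alpha_outer, t):
    """Brute-force 2AMC: inner semiring (+, *), outer semiring (max, *)."""
    best = 0.0  # neutral element of max over non-negative reals
    for outer_vals in product([True, False], repeat=len(outer_vars)):
        x_o = dict(zip(outer_vars, outer_vals))
        outer_weight = math.prod(alpha_outer[lit] for lit in x_o.items())
        inner_sum = 0.0  # stays 0 if x_o has no satisfying extension
        for inner_vals in product([True, False], repeat=len(inner_vars)):
            x_i = dict(zip(inner_vars, inner_vals))
            if theory({**x_o, **x_i}):
                inner_sum += math.prod(alpha_inner[lit] for lit in x_i.items())
        best = max(best, outer_weight * t(inner_sum))
    return best


def theory(model):
    """Toy theory (assumption): d is defined by the outer variable a."""
    return model["d"] == model["a"]


def identity(r):
    """Transformation t, assumed to be the identity for this example."""
    return r


alpha_inner = {("b", True): 0.6, ("b", False): 0.4,
               ("d", True): 0.5, ("d", False): 0.9}
alpha_outer = {("a", True): 0.4, ("a", False): 0.6}

# Instance A: d is an inner variable.
value_a = two_amc(theory, ["b", "d"], ["a"], alpha_inner, alpha_outer, identity)

# Instance B: d is moved to the outer variables with weight t(alpha_I(d)).
beta_inner = {lit: w for lit, w in alpha_inner.items() if lit[0] != "d"}
beta_outer = dict(alpha_outer)
beta_outer.update({lit: identity(w) for lit, w in alpha_inner.items() if lit[0] == "d"})
value_b = two_amc(theory, ["b"], ["a", "d"], beta_inner, beta_outer, identity)

assert math.isclose(value_a, value_b)
```

Outer assignments to (a, d) that violate d ↔ a contribute the neutral element 0 of max, which is why enumerating both values of d in the outer sum does not change the result.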

Proof
The performance guarantee is due to Korhonen and Järvisalo (2021) and holds when we decide the variables in the order they occur in the TD starting from the root. X/D-firstness can be guaranteed by taking t * as the root of the TD and, thus, first deciding all variables in χ(t * ) = {S 1 , . . ., S n }. From condition (2) it follows that afterwards the CNF has decomposed into separate components, which either only use variables from X ∪ D or use no variables from X. Thus, their compilation only leads to pure NNFs.
The variable order can be inspected schematically in Figure A 1. Here, v I is the remaining variable order for the inner variables and v O is the remaining variable order for the outer (and possibly defined) variables. The split signifies that the CNF decomposes into different components and that we can consider the respective variable orders for them independently.

A.2 Omitted Lemmas Showing Homomorphism Property
Here, we give proofs for the fact that the transformation functions of the different 2AMC problems are homomorphisms.
Note that if n 2 is zero then also n 1 is zero, so no other division by zero is avoided by h.
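For the MAP transformation t(p) = (p, ∅) given in the main text, the homomorphism property can be spot-checked numerically. The Python sketch below is only a sanity check on sample values, not a proof. It assumes that the outer multiplication combines pairs as (p1 · p2, S1 ∪ S2) with neutral element (1, ∅), which is not spelled out in this excerpt; the names `otimes_outer` and `is_monoid_homomorphism` are hypothetical.

```python
import math


def t(p):
    """MAP transformation from the main text: t(p) = (p, empty set)."""
    return (p, frozenset())


def otimes_outer(u, v):
    """Assumed outer multiplication: multiply values, union the literal sets."""
    return (u[0] * v[0], u[1] | v[1])


E_INNER = 1.0                 # neutral element of inner multiplication
E_OUTER = (1.0, frozenset())  # assumed neutral element of outer multiplication


def is_monoid_homomorphism(samples):
    """Check t(e) = e and t(p * q) = t(p) (x) t(q) on the given sample values."""
    if t(E_INNER) != E_OUTER:
        return False
    for p in samples:
        for q in samples:
            lhs = t(p * q)
            rhs = otimes_outer(t(p), t(q))
            if not math.isclose(lhs[0], rhs[0]) or lhs[1] != rhs[1]:
                return False
    return True


print(is_monoid_homomorphism([0.0, 0.25, 0.4, 0.6, 1.0]))
```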

Fig. 1: Two sd-DNNFs for L ex. Mixed nodes for partition {a, b}, {c, d} are circled in red, and pure nodes are boxed in blue or green when they are from {a, b} or {c, d}, respectively.
I (a) • α I (b) = max{α I (a)α I (b) + α I (a)α I (¬b), α I (¬a)α I (b) + α I (¬a)α I (¬b)} = max{0.4 • 0.6 + 0.4 • 0.4, 0.6 • 0.6 + 0.6 • 0.4} = max{0.4, 0.6} = 0.6

for probabilistic fact p :: f , and α O (l) = (1, {l}) otherwise. The transformation function is the function t(p) = (p, ∅).

Maximizing Expected Utility
Another second level probabilistic task is maximum expected utility (Van den Broeck et al. 2010; Derkinderen and De Raedt 2020), which introduces an additional set of variables D whose truth values can be chosen arbitrarily by a strategy σ(D) = d, d ∈ int(D), and are governed neither by probabilities nor by logical rules. A utility function u maps each literal l to a reward u(l) ∈ R for l being true.

Definition 6 (Maximum Expected Utility (MEU) Task)
Given: A program L = F ∪ R ∪ D with a utility function u.
Find: A strategy σ * that maximizes the expected utility:

Example 4 (cont.)
Consider the program L EU obtained from L ex by replacing 0.4 :: a by a decision variable ? :: a, with u(c) = 40, u(¬d) = 20 and u(l) = 0 for all other literals. Setting a to true, we have models {a, b, c, d} with probability 0.6 and utility 40 and {a, c} with probability 0.4 and utility 60, and thus expected utility 0.6 • 40 + 0.4 • 60 = 48. Similarly, we have expected utility 0.6 • 0 + 0.4 • 20 = 8 for setting a to false. Thus, the MEU strategy sets a to true.

MEU as 2AMC. Solving MEU involves (1) summing expected utilities of models over the non-decision variables (with fixed truth values for D), and (2) determining truth values of the atoms in D that maximize this inner sum. Thus, we have X O = D and X I = H \ D.
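The arithmetic of this example can be replayed with a few lines of Python. The sketch below is a plain enumeration over the two strategies, not the 2AMC encoding; it takes the (probability, utility) pairs per strategy directly from the example, and the names `outcomes`, `expected_utility` and `best_strategy` are introduced here for illustration.

```python
# Outcomes per truth value of the decision variable a, as (probability, utility)
# pairs taken from Example 4.
outcomes = {
    True: [(0.6, 40), (0.4, 60)],   # models {a, b, c, d} and {a, c}
    False: [(0.6, 0), (0.4, 20)],
}


def expected_utility(pairs):
    """Inner step: sum probability-weighted utilities for a fixed strategy."""
    return sum(p * u for p, u in pairs)


def best_strategy(table):
    """Outer step: pick the decision value with maximal expected utility."""
    return max(table, key=lambda choice: expected_utility(table[choice]))


for choice, pairs in outcomes.items():
    print(f"a = {choice}: expected utility {expected_utility(pairs):.1f}")
print("MEU strategy sets a to", best_strategy(outcomes))
```

The two functions mirror the split into an inner sum over the non-decision variables and an outer maximization over D.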

Theorem 12 (Tractable AMC with X/D-first sd-DNNFs)
The value of a 2AMC instance A = (T , X I , X O , α I , α O , S I , S O , t) can be computed in polynomial time, assuming constant time semiring operations, under the following conditions:
• T is an X/D-first sd-DNNF
• t is a homomorphism from the monoid O(A) (R I ,⊗ I ,e ⊗ I ) generated by the observable values to (R O , ⊗ O , e ⊗ O ).

Fig. 2: Q1: Comparison of tree decomposition width for X-first and X/D-first variable order, across all 2AMC instances (left) and for solved 2AMC instances only (right). We solve 822 of the 825 instances with X/D-width at most 20, 118 of the 219 instances with X/D-width between 21 and 40, and 7 of the 353 instances with X/D-width above 40.

Fig. 3: Q1: Running times per instance for different configurations on MAP grids (left) and all other 2AMC instances (right).

Fig. 4: Q2: Running times of different solvers on MAP problem sets (top), indicated above each plot, MEU problems (bottom left) and SUCC SM (bottom right).

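$$
\begin{aligned}
\bigoplus^{I}_{x_I \in \mathrm{int}(X_I),\, x_I \cup x_O \models T} \; \bigotimes^{I}_{y \in x_I} \alpha_I(y)
&= \bigoplus^{I}_{x_I \in \mathrm{int}(X_I \setminus \{d\}),\, x_I \cup x_O \models T} \alpha_I(d \mid x_O) \otimes^{I} \bigotimes^{I}_{y \in x_I} \alpha_I(y) \\
&= \alpha_I(d \mid x_O) \otimes^{I} \bigoplus^{I}_{x_I \in \mathrm{int}(X_I \setminus \{d\}),\, x_I \cup x_O \models T} \; \bigotimes^{I}_{y \in x_I} \alpha_I(y)
\end{aligned}
$$

Next, we can plug these equalities into the expression for the value of A and use that t is a homomorphism.

$$
\begin{aligned}
&\bigoplus^{O}_{x_O \in \mathrm{int}(X_O)} \; \bigotimes^{O}_{x \in x_O} \alpha_O(x) \otimes^{O} t\Big( \bigoplus^{I}_{x_I \in \mathrm{int}(X_I),\, x_I \cup x_O \models T} \; \bigotimes^{I}_{y \in x_I} \alpha_I(y) \Big) \\
&= \bigoplus^{O}_{x_O \in \mathrm{int}(X_O)} \; \bigotimes^{O}_{x \in x_O} \alpha_O(x) \otimes^{O} t\Big( \alpha_I(d \mid x_O) \otimes^{I} \bigoplus^{I}_{x_I \in \mathrm{int}(X_I \setminus \{d\}),\, x_I \cup x_O \models T} \; \bigotimes^{I}_{y \in x_I} \alpha_I(y) \Big) \\
&= \bigoplus^{O}_{x_O \in \mathrm{int}(X_O)} \; \bigotimes^{O}_{x \in x_O} \alpha_O(x) \otimes^{O} t\big(\alpha_I(d \mid x_O)\big) \otimes^{O} t\Big( \bigoplus^{I}_{x_I \in \mathrm{int}(X_I \setminus \{d\}),\, x_I \cup x_O \models T} \; \bigotimes^{I}_{y \in x_I} \alpha_I(y) \Big)
\end{aligned}
$$

Due to the fact that t satisfies t(e ⊗ I ) = e ⊗ O and t(e ⊕ I ) = e ⊕ O , whenever the assignment x O cannot be extended to a satisfying assignment of T , the weight for the given assignment will be e ⊕ O . Thus, we can again use the fact that the variable d is defined and sum over both of its values in the outer sum, resulting in

$$
\begin{aligned}
&\bigoplus^{O}_{(x_O, \mathbf{d}) \in \mathrm{int}(X_O \cup \{d\})} \; \bigotimes^{O}_{x \in x_O} \alpha_O(x) \otimes^{O} t\Big( \bigotimes^{I}_{d \in \mathbf{d}} \alpha_I(d) \Big) \otimes^{O} t\Big( \bigoplus^{I}_{x_I \in \mathrm{int}(X_I \setminus \{d\}),\, x_I \cup x_O \models T} \; \bigotimes^{I}_{y \in x_I} \alpha_I(y) \Big) \\
&= \bigoplus^{O}_{(x_O, \mathbf{d}) \in \mathrm{int}(X_O \cup \{d\})} \; \bigotimes^{O}_{x \in x_O} \alpha_O(x) \otimes^{O} \bigotimes^{O}_{d \in \mathbf{d}} t\big(\alpha_I(d)\big) \otimes^{O} t\Big( \bigoplus^{I}_{x_I \in \mathrm{int}(X_I \setminus \{d\}),\, x_I \cup x_O \models T} \; \bigotimes^{I}_{y \in x_I} \alpha_I(y) \Big)
\end{aligned}
$$

We observe that this expression is equal to the value of another 2AMC instance B = (T, X I \ {d}, X O ∪ {d}, β I , β O , R I , R O ), where β I (x) = α I (x) for x ∈ lit(X I \ {d})

Fig. A 1: Schematic variable order constructed in the proof of Lemma 14.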