Probabilistic Answer Set Programming with Discrete and Continuous Random Variables

DAMIANO AZZOLINI; FABRIZIO RIGUZZI

doi:10.1017/S1471068424000437

Probabilistic Answer Set Programming with Discrete and Continuous Random Variables

Published online by Cambridge University Press: 08 November 2024

DAMIANO AZZOLINI

and

FABRIZIO RIGUZZI

Show author details

DAMIANO AZZOLINI: Affiliation:
Department of Environmental and Prevention Sciences, University of Ferrara, Ferrara, Italy (e-mail: damiano.azzolini@unife.it)
FABRIZIO RIGUZZI: Affiliation:
Department of Mathematics and Computer Science, University of Ferrara, Ferrara, Italy (e-mail: fabrizio.riguzzi@unife.it)

Article contents

Abstract
Introduction
Background
3 Hybrid probabilistic answer set programming
4 Algorithms
5 Experiments
6 Related Work
Conclusions
Competing interests
Footnotes
References

Rights & Permissions

Abstract

Probabilistic Answer Set Programming under the credal semantics extends Answer Set Programming with probabilistic facts that represent uncertain information. The probabilistic facts are discrete with Bernoulli distributions. However, several real-world scenarios require a combination of both discrete and continuous random variables. In this paper, we extend the PASP framework to support continuous random variables and propose Hybrid Probabilistic Answer Set Programming. Moreover, we discuss, implement, and assess the performance of two exact algorithms based on projected answer set enumeration and knowledge compilation and two approximate algorithms based on sampling. Empirical results, also in line with known theoretical results, show that exact inference is feasible only for small instances, but knowledge compilation has a huge positive impact on performance. Sampling allows handling larger instances but sometimes requires an increasing amount of memory.

Keywords

hybrid probabilistic answer set programming statistical relational artificial intelligence credal semantics algebraic model counting exact and approximate inference

Information

Type: Original Article
Information: Theory and Practice of Logic Programming , Volume 25 , Issue 1 , January 2025 , pp. 1 - 32

DOI: https://doi.org/10.1017/S1471068424000437 [Opens in a new window]
Creative Commons: This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright: © The Author(s), 2024. Published by Cambridge University Press

1 Introduction

Almost 30 years ago (Sato Reference Sato1995), Probabilistic Logic Programming (PLP) was proposed for managing uncertain data in Logic Programming. Most of the frameworks, such as PRISM (Sato Reference Sato1995) and ProbLog (De Raedt et al. Reference De RAEDT, Kimmig and Toivonen2007), attach a discrete distribution, typically Bernoulli or Categorical, to facts in the program and compute the success probability of a query, that is, a conjunction of ground atoms. Recently, several extensions have been proposed for dealing with continuous distributions as well (Gutmann et al. Reference Gutmann, Jaeger and De Raedt2011a, Reference Gutmann, Thon, Kimmig, Bruynooghe and De RAEDT2011b; Azzolini et al. Reference Azzolini, Riguzzi and Lamma2021), which greatly expand the possible application scenarios.

Answer Set Programming (ASP) (Brewka et al. Reference Brewka, Eiter and Trusczynski2011) is a powerful formalism for representing complex combinatorial domains. Most of the research in this field focuses on deterministic programs. There are three notable exceptions: LPMLN (Lee and Wang Reference Lee, Wang, Baral, Delgrande and Wolter2016), P-log (Baral et al. Reference Baral, Gelfond and Rushton2009), and Probabilistic Answer Set Programming (PASP) under the credal semantics (CS) (Cozman and Mauá Reference Cozman and Maua2016). The first assigns weights to rules, while the last two attach probability values to atoms. However, all three allow discrete distributions only.

In this paper, we extend PASP under the CS to Hybrid Probabilistic Answer Set Programming (HPASP) by adding the possibility of expressing continuous distributions. Our approach is based on a translation of the hybrid probabilistic answer set program into a regular probabilistic answer set program (with discrete random variables only) via a process that we call discretization. In this way, we can adapt and leverage already existing tools that perform inference in probabilistic answer set programs under the CS with discrete variables only. We implemented the discretization process on top of two exact solvers, based on projected answer set enumeration and knowledge compilation, respectively, and tested them on a set of five benchmarks with different configurations. Furthermore, we also implemented and tested two approximate algorithms based on sampling on the original program (with both discrete and continuous facts) and on the discretized program (with only discrete facts). Our experiments show that knowledge compilation has a huge impact in reducing the execution time and thus in increasing the scalability of the inference task. However, larger instances require approximate algorithms that perform well in almost all the tests but require, in particular, the one based on sampling the discretized program, a substantial amount of memory.

The paper is structured as follows: Section 2 introduces the needed background knowledge. Section 3 illustrates HPASP, and Section 4 presents the exact and approximate algorithms for performing inference, whose performance is tested in Section 5. Section 6 compares our proposal with related work, and Section 7 concludes the paper.

2 Background

In this section, we review the main concepts used in the paper.

2.1 Answer set programming

In this paper, we consider ASP (Brewka et al. Reference Brewka, Eiter and Trusczynski2011). An answer set program contains disjunctive rules of the form $ h_1 ; \dots ; h_m \,{:\!-}\, \, b_1, \dots, b_n.$ , where the disjunction of atoms $h_i$ s is called head and the conjunction of literals $b_i$ s is called body. Rules with no atoms in the head are called constraints, while rules with no literals in the body and a single atom in the head are called facts. Choice rules are a particular type of rules where a single head is enclosed in curly braces as in $\{a\} \,{:\!-}\, b_1, \dots, b_n$ , whose meaning is that $a$ may or may not hold if the body is true. They are syntactic sugar for $a;not\_a \,{:\!-}\, b_1,\dots, b_n$ where $not\_a$ is an atom for a new predicate not appearing elsewhere in the program. We allow aggregates (Alviano and Faber Reference Alviano and Faber2018) in the body, one of the key features of ASP that allows representing expressive relations between the objects of the domain of interest. An example of a rule $r0$ containing an aggregate is $v(A)\,{:\!-}\, \#count\{X:b(X)\}=A$ . Here, variable $A$ is unified with the integer resulting from the evaluation of the aggregate $\#count\{X:b(X)\}$ , which counts the elements $X$ such that $b(X)$ is true.

The semantics of ASP is based on the concept of a stable model (Gelfond and Lifschitz Reference Gelfond and Lifschitz1988). Every answer set program has zero or more stable models, also called answer sets. An interpretation $I$ for a program $P$ is a subset of its Herbrand base. The reduct of a program $P$ with respect to an interpretation $I$ is the set of rules of the grounding of $P$ with the body true in $I$ . An interpretation $I$ is a stable model or, equivalently, an answer set, of $P$ if it is a minimal model (under set inclusion) of the reduct of $P$ w.r.t. $I$ . We denote with $AS(P)$ the set of answer sets of an answer set program $P$ and with $|AS(P)|$ its cardinality. If $AS(P) = \emptyset$ we say that $P$ is unsatisfiable. If we consider rule $r0$ and two additional rules, $\{b(0)\}$ and $b(1)$ , the resulting program has 2 answer sets: $\{\{b(1), b(0), v(2)\},\{b(1), v(1)\}\}$ . Finally, we will also use the concept of projective solutions of an answer set program $P$ onto a set of atoms $B$ (defined as in (Gebser et al. Reference Gebser, Kaufmann, Schaub, Van Hoeve and Hooker2009)): $AS_{B}(P) = \{A \cap B \mid A \in AS(P)\}$ .

2.2 Probabilistic answer set programming

Uncertainty in Logic Programming can be represented with discrete Boolean probabilistic facts of the form $\Pi ::a$ where $\Pi \in [0,1]$ and $a$ is an atom that does not appear in the head of rules. These are considered independent: this assumption may seem restrictive, but, in practice, the same expressivity of Bayesian networks can be achieved by means of rules and extra atoms (Riguzzi Reference Riguzzi2022). Every probabilistic fact corresponds to a Boolean random variable. With probabilistic facts, a normal logic program becomes a probabilistic logic program. One of the most widely adopted semantics in this context is the Distribution Semantics (DS) (Sato Reference Sato1995). Under the DS, each of the $n$ probabilistic facts can be included or not in a world $w$ , generating $2^n$ worlds. Every world is a normal logic program. The DS requires that every world has a unique model. The probability of a world $w$ is defined as $ P(w) = \prod _{f_i \in w} \Pi _i \cdot \prod _{f_i \not \in w} (1 - \Pi _i).$

If we extend an answer set program with probabilistic facts, we obtain a probabilistic answer set program that we interpret under the CS (Cozman and Mauá Reference Cozman and Maua2016, Reference Cozman and Maua2020). In the following, when we write “probabilistic answer set program,” we assume that the program follows the CS. Similarly to the DS, the CS defines a probability distribution over the worlds. However, every world (which is an answer set program in this case) may have 0 or more stable models. A query $q$ is a conjunction of ground literals. The probability $P(q)$ of a query $q$ lies in the range $[\underline{P}(q), \overline{P}(q)]$ where

(1)

\begin{equation} \begin{split} \underline{P}(q) &= \sum _{w_i \, \text{such that} \, \forall m \in AS(w_i), \, m \models q} P(w_i), \\[5pt] \overline{P}(q) &= \sum _{w_i \, \text{such that} \, \exists m \in AS(w_i), \, m \models q} P(w_i). \end{split} \end{equation}

That is, the lower probability is given by the sum of the probabilities of the worlds where the query is true in every answer, while the upper probability is given by the sum of the probabilities of the worlds where the query is true in at least one answer set. Here, as usual, we assume that all the probabilistic facts are independent and that they cannot appear as heads of rules. In other words, the lower (upper) probability is the sum of the probabilities of the worlds from which the query is a cautious (brave) consequence. Conditional inference means computing upper and lower bounds for the probability of a query $q$ given evidence $e$ , which is usually given as a conjunction of ground literals. The formulas are by Cozman and Mauá (Reference Cozman and Maua2017):

(2)

\begin{equation} \begin{split} \underline{P}(q \mid e) &= \frac{\underline{P}(q,e)}{\underline{P}(q,e) + \overline{P}(\lnot q,e)} \\[5pt] \overline{P}(q \mid e) &= \frac{\overline{P}(q,e)}{\overline{P}(q,e) + \underline{P}(\lnot q,e)} \end{split} \end{equation}

Conditional probabilities are undefined if the denominator is 0.

Example 1. The following probabilistic answer set program has two probabilistic facts, $0.3::a$ and $0.4::b$ :

There are $2^2 = 4$ worlds: $w_0 = \{not\, a, not\, b\}$ , $w_1 = \{not\, a, b\}$ , $w_2 = \{a, not\, b\}$ , and $w_3 = \{a,b\}$ (with $not\, f$ we mean that the fact $f$ is absent, not selected), whose probability and answer sets are shown in Table 1. For example, the answer sets for $w_2$ are computed from the following program: $a.\, q0 ; q1 \,{:\!-}\, a.\, q0 \,{:\!-}\, b.$ If we consider the query $q0$ , it is true in all the answer sets of $w_1$ and $w_3$ and in one of those of $w_2$ ; thus, we get $[P(w_1) + P(w_3), P(w_1) + P(w_2) + P(w_3)] = [0.4,0.58]$ as probability bounds. $\square$

Table 1.

Worlds, probabilities, and answer sets for Example1

DS and CS can be given an alternative but equivalent definition based on sampling in the following way. We repeatedly sample worlds by sampling every probabilistic fact obtaining a normal logic program or an answer set program. Then, we compute its model(s), and we verify whether the query is true in it (them). The probability of the query under DS is the fraction of worlds in whose model the query is true as the number of sampled worlds tends to infinity. For CS, the lower probability is the fraction of worlds where the query is true in all models, and the upper probability is the fraction of worlds where the query is true in at least one model. We call this the sampling semantics, and it is equivalent to the DS and CS definitions by the law of large numbers.

The complexity of the CS has been thoroughly studied by Mauá and Cozman (Reference Maua and Cozman2020). In particular, they focus on analyzing the cautious reasoning (CR) problem: given a PASP $P$ , a query $q$ , evidence $e$ , and a value $\gamma \in \mathbb{R}$ , the result is positive if $\underline{P}(q \mid e) \gt \gamma$ , negative otherwise (or if $\underline{P}(e) = 0$ ). A summary of their results is shown in Table 2. For instance, in PASP with stratified negation and disjunctions in the head, the complexity of the CR problem is in the class $PP^{\Sigma _2^p}$ , where $PP$ is the class of the problems that can be solved in polynomial time by a probabilistic Turing machine (Gill Reference Gill1977). The complexity is even higher if aggregates are allowed.

Table 2.

Tight complexity bounds of the CR problem in PASP from Mauá and Cozman (Reference Maua and Cozman2020). Not stratified denotes programs with stratified negations and $\vee$ disjunction in the head

2.3 ProbLog and hybrid probLog

A ProbLog (De Raedt et al. Reference De RAEDT, Kimmig and Toivonen2007) program is composed by a set of Boolean probabilistic facts as described in Section 2.2 and a set of definite logical rules, and it is interpreted under the DS. The probability of a query $q$ is computed as the sum of the probabilities of the worlds where the query is true: every world has a unique model, so it is a sharp probability value.

Gutmann et al. (Reference Gutmann, Jaeger and De Raedt2011a) proposed Hybrid ProbLog, an extension of ProbLog with continuous facts of the form

\begin{equation*} (X,\phi )::b \end{equation*}

where $b$ is an atom, $X$ is a variable appearing in $b$ , and $\phi$ is a special atom indicating the continuous distribution followed by $X$ . An example of continuous fact is $(X,gaussian(0,1))::a(X)$ , stating that the variable $X$ in $a(X)$ follows a Gaussian distribution with mean 0 and variance 1. A Hybrid ProbLog program $P$ is a pair $(R,T)$ where $R$ is a set of rules and $T = T^c \cup T^d$ is a finite set of continuous ( $T^c$ ) and discrete ( $T^d$ ) probabilistic facts. The value of a continuous random variable $X$ can be compared only with constants through special predicates: $below(X,c_0)$ and $above(X,c_0)$ , with $c_0 \in \mathbb{R}$ , which succeed if the value of $X$ is, respectively, less than or greater than $c_0$ , and $between(X,c_0,c_1)$ , with $c_0, c_1 \in \mathbb{R}$ , $c_0 \lt c_1$ , which succeeds if $X \in [c_0,c_1]$ .

The semantics of Hybrid ProbLog is given in Gutmann et al. Reference Gutmann, Jaeger and De Raedt(2011a) a proof theoretic way. A vector of samples, one for each continuous variable, defines a so-called continuous subprogram, so the joint distribution of the continuous random variables defines a joint distribution (with joint density $f(\mathbf{x})$ ) over continuous subprograms. An interval $I \in \mathbb{R}^n$ , where $n$ is the number of continuous facts, is defined as the cartesian product of an interval for each continuous random variable, and the probability $P(\mathbf{X} \in I)$ can be computed by integrating $f(\mathbf{x})$ over $I$ , that is,

\begin{equation*} P(\mathbf {X} \in I) = \int _{I} f(\mathbf {x}) \, d\mathbf {x}. \end{equation*}

Given a query $q$ , an interval $I$ is called admissible if, for any $\mathbf{x}$ and $\mathbf{y}$ in $I$ and for any truth value of the probabilistic facts, the truth value of the $q$ evaluated in the program obtained by assuming $\mathbf{X}$ takes value $\mathbf{x}$ and $\mathbf{y}$ is the same. In other words, inside an admissible interval, the truth value of a query is not influenced by the values of the continuous random variables; that is, it is always true or false once the value of the discrete facts is fixed. For instance, if we have the continuous fact $(X,gaussian(0,1))::a(X)$ and a rule $q \,{:\!-}\, a(X), between(X,0,2)$ , the interval $[0,3]$ is not admissible for the query $q$ , while $[0.5,1.5]$ is (since for any value of $X \in [0.5,1.5]$ , $between(X,0,2)$ is always true). A partition is a set of intervals, and it is called admissible if every interval is admissible. The probability of a query is defined by extracting its proofs together with the probabilistic facts, continuous facts, and comparison predicates used in the proofs. These proofs are finitely many since the programs do not include function symbols and generalize the infinitely many proofs of a hybrid program (since there can be infinitely many values for continuous variables). Gutmann et al. (Reference Gutmann, Jaeger and De Raedt2011a) proved that an admissible partition exists for each query having a finite number of proofs and the probability of a query does not depend on the admissible partition chosen. They also propose an inference algorithm that first computes all the proofs for a query. Then, according to the continuous facts and comparison predicates involved, it identifies the needed partitions. To avoid counting multiple times the same contribution, the proofs are made disjoint. Lastly, the disjoint proofs are converted into a binary decision diagram (BDD) from which the probability can be computed by traversing it bottom up (i.e., with the algorithm adopted in ProbLog (De Raedt et al. Reference De RAEDT, Kimmig and Toivonen2007)).

Let us clarify it with a simple example.

Example 2. The following Hybrid ProbLog program has one discrete fact and one continuous fact.

Consider the query $q$ . It has two proofs: $a$ and $b(X), above(X,0.3)$ . The probability of $q$ is computed as $P(a) \cdot P(X \gt 0) + P(a) \cdot (1 - P(X \gt 0)) + (1 - P(a)) \cdot P(X \gt 0) = 0.4\cdot 0.5 + 0.4 \cdot 0.5 + 0.6 \cdot 0.5 = 0.7$ . $\square$

The Hybrid ProbLog framework imposes some restrictions on the continuous random variables and how they can be used in programs, namely: (i) comparison predicates must compare variables with numeric constants, (ii) arithmetic expressions involving continuous random variables are not allowed, and (iii) terms inside random variables definitions can be used only inside comparison predicates. These restrictions allow the algorithm for the computation of the probability to be relatively simple.

3 Hybrid probabilistic answer set programming

Probabilistic logic programs that combine both discrete and continuous random variables are usually named hybrid.Footnote ¹ In this paper, we adopt the same adjective to describe probabilistic answer set programs with both discrete and continuous random variables, thus introducing hybrid probabilistic answer set programs. We use the acronym (H)PASP to indicate both (Hybrid) Probabilistic Answer Set Programming and (hybrid) probabilistic answer set program; the meaning will be clear from the context.

Without loss of generality, we consider only ground discrete and continuous probabilistic facts. With the syntax

where $f$ is ground, we indicate a continuous random variable $f$ that follows the distribution specified by $distribution$ . For example, to state that $a$ is a continuous random variable that follows a Gaussian distribution with mean 2 and variance 1, we write

Definition 1. A hybrid probabilistic answer set program is a triple $(D,C,R)$ , where $D$ is a finite set of ground independent probabilistic facts, $C$ is a finite set of continuous random variables definitions, and $R$ is a set of rules with none of the atoms in $D$ or $C$ in the head. $\square$

As in Hybrid ProbLog, we reserve some special predicates (that we call comparison predicates) to compare the value of the random variables with numeric ( $\in \mathbb{R}$ ) constants: $\mathit{above}(\mathit{Var},\mathit{value})$ , which is true if the value for the variable $\mathit{Var}$ is greater than the numeric value $\mathit{value}$ ( $\mathit{Var} \gt \mathit{value}$ ); $\mathit{below}(\mathit{Var},\mathit{value})$ , which is true if the value for the variable $\mathit{Var}$ is less than the numeric value $\mathit{value}$ ( $\mathit{Var} \lt \mathit{value}$ ); $\mathit{between}(\mathit{Var},l,u)$ , which is true if the value for the variable $\mathit{Var}$ is between the range defined by the numeric values $l$ and $u$ (i.e., $\mathit{Var} \gt l$ and $\mathit{Var} \lt u$ ); and $\mathit{outside}(\mathit{Var},l,u)$ , which is true if the value for the variable $\mathit{Var}$ is outside the range defined by the numeric values $l$ and $u$ (i.e., $\mathit{Var} \lt l$ or $\mathit{Var} \gt u$ ). Given a value for a random variable, we can evaluate the atoms for the comparison predicates. For example, if variable $a$ has value 1.2, $\mathit{below}(a,1.8)$ is true, $\mathit{above}(a,1.8)$ is false, $\mathit{between}(a,1.1,1.8)$ is true, and $\mathit{outside}(a,1.1,1.8)$ is false. In our framework, we consider the same restrictions of Gutmann et al. Reference Gutmann, Jaeger and De Raedt(2011a). Note that $outside/3$ literals are not allowed by Hybrid ProbLog.

We can extend the sampling semantics for the CS to hybrid probabilistic answer set programs: we sample every probabilistic fact and a value for every continuous variable, obtaining sample $s=(d,c)$ , where $d$ is the sample for discrete facts and $c$ is the sample for continuous random variables. From $s$ , we build a hybrid world $w=(P,c)$ where $P$ is obtained by adding to $R$ each fact $f_i$ for which $p_i::f_i \in D$ and $d_i=1$ in vector $d$ . We then ground the rules and check the sampled values for the continuous random variables against every comparison predicate that contains the variable: (i) if the comparison is false, we remove the grounding of the rule, or (ii) if the comparison is true, we remove the comparison predicate from the body of the ground rule. We are left with an answer set program $w^{'}$ , and we can compute its stable models. The lower and upper probability of a query $q$ are given, as in the sampling semantics for CS, as the fraction of worlds where the query is true in every answer set and in at least answer set, respectively, as the number of samples goes to infinity. That is,

(3)

\begin{align} \underline{P}(q) &= \lim _{n \to \infty } \frac{\sum _{i = 1}^n \unicode{x1D7D9}(\forall m \in AS(w_{i}^{'}), m \models q)}{n} \nonumber \\[5pt] \overline{P}(q) &= \lim _{n \to \infty } \frac{\sum _{i = 1}^n \unicode{x1D7D9}(\exists m \in AS(w_{i}^{'}), m \models q)}{n} \end{align}

where $\unicode{x1D7D9}$ is the indicator function, returning 1 if its argument is true, 0 otherwise, and $AS(w_{i}^{'})$ is the set of answer sets of the program $w_{i}^{'}$ obtained from the $i$ -th sample. We call this the hybrid sampling semantics.

To better specify the syntax, let us introduce an example.

Example 3. Consider a medical domain where we are interested in computing the probability of stroke given the values of the blood pressure of some individuals.

For ease of explanation and conciseness, we use the syntax $a(l..u)$ with $l, u \in \mathbb{N}$ , $l \lt u$ , to denote a set of atoms $a(l), a(l+1), \dots, a(u)$ . The program considers four people indexed with 1, 2, 3, and 4. The first two lines introduce eight discrete probabilistic facts, $pred\_d(i)$ , $i \in \{1,\dots, 4\}$ (pred is short for predisposition), which are true with probability 0.4, and $pred\_s(i)$ , $i \in \{1,\dots, 4\}$ , which are true with probability 0.6, and eight continuous random variables ( $d(1), \dots, d(4), s(1), \dots, s(4)$ , where $d$ stands for diastolic and $s$ for systolic), where the first four follow a gamma distribution with shape 70 and rate 1 and while the remaining follow a gamma distribution with shape 120 and rate 1. $prob\_d/1$ indicates a problem from the diastolic pressure, and it is true if there is a problem coming from the diastolic pressure if its value is below 60 or above 80. Similarly for $prob\_s/1$ . In general, a person has a blood pressure problem ( $prob/1$ ) if they have either a diastolic or systolic pressure problem, and there is a predisposition for it (either $pred\_d$ or $pred\_s$ ). A person having a blood pressure problem can have a stroke or not. The constraint states that at least 40% of people that have a pressure problem also have a stroke. Finally, we may be interested in computing the probability that more than one person has a stroke ( $high\_number\_strokes/0$ ). $\square$

The comparison predicates subdivide the domain of a random variable into disjoint and exhaustive intervals $I_1 \cup I_2 \cup \dots \cup I_m$ . The extremes of the disjoint intervals are obtained by selecting all constants appearing in comparison predicates for continuous random variables, removing the duplicates, and ordering them in increasing order. In this way, we obtain a list of values $[b_1\dots, b_{m+1}]$ where $b_1=-\infty$ and $b_{m+1}=+\infty$ such that $I_k=[b_k,b_{k+1}]$ for $k=1,\dots, m$ . This process is described in Algorithm2 of Gutmann et al. Reference Gutmann, Jaeger and De Raedt(2011a).

Example 4. Consider the following simple program:

There are 3 intervals to consider: $I^a_1 = ]-\infty, 0.5]$ , $I^a_2 = [0.5,0.7]$ , and $I^a_3 = [0.7,+\infty [$ . $\square$

We transform a HPASP $P_c$ into a PASP $P_c^d$ via a process that we call discretization. The probability of a query $q$ in $P_c$ is the same as the probability asked in $P_c^d$ . Thus, using discretization, inference in HPASP is performed using inference in PASP. We now show how to generate the discretized program, and then we prove the correctness of the transformation.

We need to make the rules involving comparison predicates considering more than one interval disjoint, to avoid counting multiple times the same contribution. To do so, we can first convert all the comparison predicates $\mathit{between}/3$ and $\mathit{outside}/3$ into a combination of $\mathit{below}/2$ and $\mathit{above}/2$ . That is, every rule,

\begin{equation*} h\,{:\!-}\, \, l_1,\dots, \mathit {between}(a,lb,ub),\dots, l_n. \end{equation*}

is converted into

\begin{equation*} h\,{:\!-}\, \, l_1,\dots, \mathit {above}(a,lb), \mathit {below}(a,ub),\dots, l_n. \end{equation*}

The conversion of $\mathit{outside}(a,lb,ub)$ requires introducing two rules. That is,

\begin{equation*} h\,{:\!-}\, \, l_1, \dots, \mathit {outside}(a,lb,ub), \dots, l_n. \end{equation*}

requires generating two rules

\begin{align*} \begin{split} h_a&\,{:\!-}\, \, l_1, \dots, \mathit{above}(a,ub), \dots, l_n. \\[5pt] h_b&\,{:\!-}\, \, l_1, \dots, \mathit{below}(a,lb), \dots, l_n. \end{split} \end{align*}

If there are multiple comparison predicates in the body of a rule, the conversion is applied multiple times, until we get rules with only ${above}/2$ and ${below}/2$ predicates. After this, for every continuous variable $f_j$ with $m$ associated intervals, we introduce a set of clauses (De Raedt et al. Reference De RAEDT, Demoen, Fierens, Gutmann, Janssens, Kimmig, Landwehr, Mantadelis, Meert, Rocha, Costa, Thon and Vennekens2008; Gutmann et al. Reference Gutmann, Jaeger and De Raedt2011a)

(4)

\begin{align} \begin{split} &h_1^j \,{:\!-}\, \, f_{j1}. \\[5pt] &h_2^j \,{:\!-}\, \, not\, f_{j1}, f_{j2}.\\[5pt] &\dots \\[5pt] &h_{m-1}^j \,{:\!-}\, \, not\, f_{j1}, not \, f_{j2}, \dots, f_{jm-1}. \\[5pt] &h_m^j \,{:\!-}\, \, not\, f_{j1}, not \, f_{j2}, \dots, not \, f_{jm-1}. \end{split} \end{align}

where each $h_i^j$ is a propositional atom that is true if the continuous variable $f_j$ takes value in the interval $I_i^{f_j}$ and each $f_{jk}$ is a fresh probabilistic fact with probability $\pi _{jk}$ for $k=1,\dots, m-1$ computed as

(5)

\begin{equation} \pi _{jk} ={\frac{P(b_k \leq f_j \leq b_{k+1})}{1 - P(f_j \leq b_k)}} = \frac{\int _{b_k}^{b_{k+1}} p_j(x_j) \, dx_j}{1 - \int _{-\infty }^{b_k} p_j(x_j) \, dx_j} \end{equation}

where $p_j(x_j)$ is the probability density function of the continuous random variable $f_j$ and $b_k$ for $k=1,\dots, m$ are the bounds of the intervals. After this step, we identify the comparison atoms that are true in more than one interval. A clause containing a comparison atom $below(f_j,b_{k+1})$ is replicated $k$ times, once for each of the intervals $I_l$ with $l=1,\dots, k$ . The comparison atom of the $k$ -th replicated clause is replaced by $h_k^j$ . Similarly for $above/2$ . If a clause contains comparison atoms on multiple variables, this process is repeated multiple times. That is, if a clause contains $n_c$ comparison atoms on the variables $v_1,\dots, v_n$ that are true in $k_1,\dots, k_n$ intervals, we get $k_1 \times \dots \times k_n$ clauses. Note that the complexity of inference is dominated by the number of probabilistic facts, rather than the number of clauses.

Algorithm 1.

Function Discretize: discretization of a hybrid probabilistic answer set program with rules R, continuous probabilistic facts C and discrete probabilistic factsD.

Algorithm1 depicts the pseudocode for the discretization process applied to a HPASP with ground rules $R$ , continuous random variable definitions $C$ , and discrete probabilistic facts $D$ . It is composed of three main functions, ConvertBetweenOutside, HandleIntervals, and HandleComparisonAtoms that convert the between and outside comparison predicates, compute the number of intervals and the new discrete probabilistic facts, and manage the comparison atoms, respectively. Functions ConvertBetween and ConvertOutside convert, respectively, the comparison predicates $between/3$ and $outside/3$ into combinations of $above/2$ and $below/2$ . Function ComputeIntervals( $c$ ) computes the intervals for the continuous random variable $c$ . Function ComputeProbability( $c,i$ ) computes the probability for the $i$ -th probabilistic fact for the continuous random variable $c$ . Function GetComparisonAtoms( $R$ ) returns all the comparison atoms in the rules $R$ of the program. Function ComputeIntervalsComparison( $c$ ) computes the number of intervals involved in the comparison atom $c$ . Function Replace( $a,h_i$ ) replaces the comparison atom $a$ with the corresponding discrete fact $h_i$ representing the $i$ -th interval.

Theorem 1. Consider an hybrid probabilistic answer set program $P$ . Let $n_r$ be the number of rules, $n_c$ be the number of continuous facts, $n_k$ be the number of comparison atoms in the program, $n_i$ be the maximum number of intervals for a continuous fact, $o$ be the maximum number of $outside/3$ atoms in the body of a rule, and $i_c$ be the maximum number of intervals where a comparison atom is true. Then, Algorithm 1 requires $O(n_r^o + n_c \cdot n_i + n_k \cdot n_r \cdot i_c)$ time.

Proof. Let us analyze the three functions used in Discretize. For function ConvertBetweenOutside, the body of the loop at lines 10–13 is executed $n_r$ times. The while loop at lines 14–17 is executed $n_r^o$ times. Thus, function ConvertBetweenOutside has complexity $O(n_r^o)$ . Function HandleIntervals has complexity $O(n_c \cdot n_i)$ . In function HandleComparisonAtoms, the loop at lines 35–49 is executed $n_k$ times. For each of them, the loop at lines 38–47 is executed $n_r$ times. Lastly, the innermost loop at lines 40–43 is executed at most $i_c$ times. So, function HandleComparisonAtoms has complexity $O(n_k \cdot n_r \cdot i_c)$ . Overall, function Discretize has complexity $O(n_r^o + n_c \cdot n_i + n_k \cdot n_r \cdot i_c)$ .

Note that the complexity of inference in probabilistic answer set programs is very high (see Table 2), so the discretization process has a small impact on the overall process.

We can perform conditional inference using the formulas for discrete programs (equation (2)): if the evidence is on a discrete variable, we can directly apply that formula to the discretized program. If the evidence is on a continuous variable, say, $e = (X \gt k)$ with $k \in \mathbb{R}$ , we first need to create a discretized version of the program by also taking into account this constraint.

To clarify, in Example4, we have a variable $a$ with two numerical constraints on it: $\mathit{below}(a,0.5)$ and $\mathit{below}(a,0.7)$ . The first interval, $I_1^a$ , is $]-\infty, 0.5]$ , the second, $I_2^a$ , is $[0.5,0.7]$ , and the third $[0.7,\infty [$ . After inserting the new probabilistic facts, $f_{a1}$ for $I_1^a$ and $f_{a2}$ for $I_2^a$ , we add two more rules $h_1^a \,{:\!-}\, f_{a1}.$ and $h_2^a \,{:\!-}\, not \, f_{a1}, f_{a2}.$ , we replicate the clauses with comparison predicates spanning more than one interval and replace the comparison predicates with the appropriate $h_i^a$ atom, to make them disjoint, as previously above. For example, the comparison atom $\mathit{below}(a,0.7)$ of Example4 is true in the intervals $I_1^a$ and $I_2^a$ . The clause containing it is duplicated, and in the first copy, we replace $\mathit{below}(a,0.7)$ with $h_1^a$ , while in the second with $h_2^a$ . The other comparison atom, $\mathit{below}(a,0.5)$ , is true in only one interval, so it is sufficient to replace it with $h_1^a$ . This process is repeated for every continuous random variable. Overall, we obtain the program shown in Example5.

Example 5. By applying the discretization process to Example 4, we get that $\pi _{a1} = 0.6915$ and $\pi _{a2} = \frac{0.066}{1 - 0.6915} = 0.2139$ . The program becomes:

We can now compute the probability of the query, for example, $q0$ : $[\underline{P}(q0),\overline{P}(q0)] = [0.303,0.718]$ . The worlds are shown in Table 3. Similarly, for the program of Example3, we get $P(high\_number\_strokes) = [0.256, 0.331]$ . $\square$

Table 3.

Worlds and probabilities for Example4. $f_1$ and $f_2$ are the two probabilistic facts obtained by the discretization process of the continuous probabilistic fact $a$ into three intervals. The column LP/UP indicates whether the world contributes only to the upper probability (UP) or also to the lower probability (LP + UP)

We now show that the lower and upper probability of a query from the discretized program is the same as that from the hybrid sampling semantics.

Theorem 2. Given a query $q$ , a hybrid program $P$ , and its discretized version $P^d$ , the lower and upper probability of $q$ computed in the hybrid program ( $\underline{P}(q)$ and $\overline{P}(q)$ ) coincide with the lower and upper probability computed in its discretized version ( $\underline{P}^{P^d}(q)$ and $\overline{P}^{P^d}(q)$ ), that is,

(6)

\begin{equation} \begin{split} \underline{P}(q) &= \underline{P}^{P^d}(q) \\[5pt] \overline{P}(q) &= \overline{P}^{P^d}(q) \end{split} \end{equation}

Proof. Given a hybrid world $w$ and a clause $c$ that contains at least one comparison atom in the grounding of $P$ , call $g_1,\dots, g_n$ the set of clauses in the grounding of $P^d$ generated from $c$ . There are two cases. The first is that the samples for the continuous random variables do not satisfy the constraints in the body of $c$ . In this case, all the $g_i$ clauses will have a false body. In the second case, there will be a single clause $g_i$ , where all the $h^j_i$ atoms in the body are true. These atoms can be removed, obtaining an answer set program $w^{'}$ that is the same as the one that would be obtained in the hybrid sampling semantics. It remains to prove that the resulting probability distributions over the discretized worlds are the same. We show that the probability of obtaining a program $w^{'}$ in the hybrid sampling semantics and in the sampling semantics from the discretized program is the same. We need to prove that the probability of a continuous random variable $f_j$ of falling into the interval $I_k^{f_j}$ is the probability that in a world sampled from the discretized program atom $h_k^j$ is true. The latter probability depends only on the probabilities of the $f_{jl}$ facts. Thus it can be computed by observing that the $f_{jl}$ facts are mutually independent: since the part of the program defining $h_k^j$ is a stratified normal program, the lower and upper probabilities of $h_k^j$ coincide and, if $-\infty, b_2,\dots, b_{m-1},+\infty$ are the different bounds for the intervals they can be computed as:

(7)

\begin{equation} \begin{split} P(h_k^j) &= P(not\, f_{j1}) \cdot P(not\, f_{j2}) \cdot \dots \cdot P(not f_{jk-1}) \cdot P(f_{jk}) \\[5pt] & = \left (1 - \int _{-\infty }^{b_2} p_j(f_j)\, df_j\right ) \cdot \frac{1 - \int _{b_2}^{b_3} p_j(f_j) \, df_j}{1 - \int _{-\infty }^{b_2} p_j(f_j) \, df_j} \cdot \dots \cdot \frac{\int _{b_k}^{b_{k+1}} p_j(f_j) \, df_j}{1 - \int _{-\infty }^{b_k} p_j(f_j) \, df_j}\\[5pt] & = \left (1 - \int _{-\infty }^{b_k} p_j(f_j) \, df_j\right ) \cdot \frac{\int _{b_k}^{b_{k+1}} p_j(f_j) \, df_j}{1 - \int _{-\infty }^{b_k} p_j(f_j) \, df_j } \\[5pt] & = \int _{b_k}^{b_{k+1}} p_j(f_j) \, df_j \end{split} \end{equation}

With our framework, it is possible to answer queries in programs where variables follow a mixture of distributions. For instance, in the following program

we can compute the probability of the query $q0$ ( $[\underline{P}(q0),\overline{P}(q0)] = [0.923,0.923]$ ). Here, if $c$ is true, we consider a Gaussian distribution with mean 10 and variance 3, if $c$ is false, a Gaussian distribution with mean 9 and variance 2. In other words, we consider a variable that follows the first Gaussian distribution if $c$ is true and the second Gaussian distribution if $c$ is false.

One of the key features of ASP is the possibility of adding logical constraints that cut some of the possible answer sets. However, when we consider a PASP (also obtained via discretization), constraints may cause a loss of probability mass due to unsatisfiable worlds since these contribute neither to the lower nor to the upper bound (equation (1)). There can be constraints involving only discrete probabilistic facts, constraints involving only comparison atoms (thus, continuous random variables), and constraints involving both. Let us discuss this last and more general case with an example.

Example 6. Consider the program of Example 4 with the additional rule (constraint) $\,{:\!-}\, b, below(a, 0.2)$ . When $b$ is true, the value of $a$ cannot be less than 0.2. This results in a loss of probability mass in the discretized program. The introduction of the constraint requires a more granular partition and an additional interval: $I^a_1 = ]-\infty, 0.2]$ , $I^a_2 = [0.2,0.5]$ , $I^a_3 = [0.5,0.7]$ , and $I^a_4 = [0.7,+\infty [$ . Note that the probabilities of the discretized facts will be different from the ones of Example 4. $\square$

When every world is satisfiable, we have that (Cozman and Mauá Reference Cozman and Maua2020):

(8)

\begin{equation} \underline{P}(q)+\overline{P}(not\, q) = 1 \end{equation}

When at least one world is unsatisfiable, equation (8) does not hold anymore, but we have $\underline{P}(q)+\overline{P}(not\, q) + P(inc)=1$ , where $P(inc)$ is the probability of the unsatisfiable worlds. So we can still use the semantics but we need to provide the user also with $P(inc)$ beside $\underline{P}(q)$ and $\overline{P}(q)$ . If we want to enforce equation (8), we can resort to normalization: we divide both the lower and the upper probability bounds of a query by the probability of the satisfiable worlds. That is, call

(9)

\begin{equation} Z = \sum _{w_i \mid |AS(w_i)| \gt 0} P(w_i) \end{equation}

then $P^n(q) = [\underline{P}^n(q),\overline{P}^n(q)]$ with $\underline{P}^n(q) = \underline{P}(q) / Z, \, \overline{P}^n(q) = \overline{P}(q) / Z$ . This approach has the disadvantage that if $Z$ is 0 the semantics is not defined. In Example6, the normalizing factor $Z$ is 0.7683; thus, the probability of $q0$ now is $P(q0) = [0.093,0.633]$ . Note that the new bounds are not simply the bounds of Example5 divided by $Z$ since the constraint splits the domain even further, and the probabilities associated with the probabilistic facts obtained via discretization change.

4 Algorithms

In this section, we describe two exact and two approximate algorithms for performing inference in HPASP.

4.1 Exact inference

4.1.1 Modifying the PASTA solver

We first modified the PASTA solver (Azzolini et al. Reference Azzolini, Bellodi, Riguzzi, Gottlob, Inclezan and Maratea2022),Footnote ² by implementing the conversion of the hybrid program into a regular probabilistic answer set program (Algorithm1). PASTA performs inference on PASP via projected answer set enumeration. In a nutshell, to compute the probability of a query (without loss of generality assuming here it is an atom), the algorithm converts each probabilistic fact into a choice rule. Then, it computes the answer sets projected on the atoms for the probabilistic facts and for the query. In this way, PASTA is able to identify the answer set pertaining to each world. For each world, there can be three possible cases: (i) a single projected answer set with the query in it, denoting that the query is true in every answer set, so this world contributes to both the lower and upper probability; (ii) a single answer set without the query in it: this world does not contribute to any probability; and (iii) two answer sets, one with the query and one without: the world contributes only to the upper probability. There is a fourth implicit and possible case: if a world is not present, this means that the ASP obtained from the PASP by fixing the selected probabilistic facts is unsatisfiable. Thus, in this case, we need to consider the normalization factor of equation (9). PASTA already handles this by keeping track of the sum of the probabilities of the computed worlds. The number of generated answer sets depends on the number of Boolean probabilistic facts and on the number of intervals for the continuous random variables, since every interval requires the introduction of a new Boolean probabilistic fact. Overall, if there are $d$ discrete probabilistic facts and $c$ continuous random variables, the total number of probabilistic facts (after the conversion of the intervals) becomes $T = d + \sum _{i = 1}^{c}(k_i - 1)$ , where $k_i$ is the number of intervals for the $i$ -th continuous fact. Finally, the total number of generated answer sets is bounded above by $2^{T+1}$ , due to the projection on the probabilistic facts. Clearly, generating an exponential number of answer sets is intractable, except for trivial domains.

Example 7 (Inference with PASTA.) Consider the discretized program described in Example 5. It is converted into

It has 10 answer sets projected on the atoms $q0/0$ and $a0/0$ , $a1/0$ , and $b/0$ : $AS_{1} = \{\}$ , $AS_{2} = \{a1\}$ , $AS_{3} = \{a2\}$ , $AS_{4} = \{a2,q0\}$ , $AS_{5} = \{b\}$ , $AS_{6} = \{a1,a2\}$ , $AS_{7} = \{a1,b\}$ , $AS_{8} = \{a2,b,q0\}$ , $AS_{9} = \{a1,a2,q0\}$ , and $AS_{10} = \{a1,a2,b,q0\}$ . For example, the world where only $a2$ is true is represented by the answer sets $AS_3$ and $AS_4$ . The query $q0$ is present only in one of the two, so this world contributes only to the upper probability. The world where $a1$ and $a2$ are true and $b$ is false is represented only by the answer set $AS_9$ : the query is true in it so this world contributes to both the lower and upper probability. The world where only $a1$ is true is represented by $AS_2$ , $q0$ is not present in it so it does not contribute to the probability. By applying similar considerations for all the worlds, it is possible to compute the probability of the query $q0$ . $\square$

4.1.2 Modifying the aspcs solver

We also added the conversion from HPASP to PASP to the aspcs solver (Azzolini and Riguzzi Reference Azzolini, Riguzzi, Basili, Lembo, Limongelli and Orlandini2023a), built on top of the aspmc solver (Eiter et al. Reference Eiter, Hecher, Kiesel, Bienvenu, Lakemeyer and Erdem2021; Kiesel et al. Reference Kiesel, Totis and Kimmig2022), which performs inference on Second Level Algebraic Model Counting (2AMC) problems, an extension of AMC (Kimmig et al. Reference Kimmig, Van Den Broeck and De Raedt2017). Some of them are MAP inference (Shterionov et al. Reference Shterionov, Renkens, Vlasselaer, Kimmig, Meert and Janssens2015), decision theory inference in PLP (Van den Broeck et al. Reference Van Den Broeck, Thon, Van Otterlo and De Raedt2010), and probabilistic inference under the smProbLog semantics (Totis et al. Reference Totis, De Raedt and Kimmig2023). More formally, let $X_{in}$ and $X_{out}$ be a partition of the variables in a propositional theory $\Pi$ . Consider two commutative semirings $\mathcal{R}_{in} = (R^{i},\oplus ^i,\otimes ^i,e_{\oplus }^i,e_{\otimes }^i)$ and $\mathcal{R}_{out} = (R^{o},\oplus ^o,\otimes ^o,e_{\oplus }^o,e_{\otimes }^o)$ , connected by a transformation function $f : \mathcal{R}^{i} \to \mathcal{R}^{o}$ and two weight functions, $w_{in}$ and $w_{out}$ , which associate each literal to a weight (i.e., a real number). 2AMC requires computing:

(10)

\begin{align} \begin{split} 2AMC(A) =& \bigoplus \nolimits _{I_{out} \in \sigma (X_{out})}^{o} \bigotimes \nolimits ^{o}_{a \in I_{out}} w_{out}(a) \otimes ^{o} \\[5pt] & f( \bigoplus \nolimits _{I_{in} \in \delta (\Pi \mid I_{out})}^{i} \bigotimes \nolimits ^{i}_{b \in I_{in}} w_{in}(b) ) \end{split} \end{align}

where $\sigma (X_{out})$ are the set of possible assignments to $X_{out}$ and $\delta (\Pi \mid I_{out})$ are the set of possible assignments to the variables of $\Pi$ that satisfy $I_{out}$ . We can cast inference under the CS as a 2AMC problem (Azzolini and Riguzzi Reference Azzolini, Riguzzi, Basili, Lembo, Limongelli and Orlandini2023a) by considering as inner semiring $\mathcal{R}_{in} = (\mathbb{N}^2, +, \cdot, (0,0), (1,1))$ , where $+$ and $\cdot$ are component-wise and $w_{in}$ is a function associating $not\, q$ to $(0,1)$ and all the other literals to $(1,1)$ , where $q$ is the query. The first component $n_u$ of the elements $(n_u,n_l)$ of the semiring counts the models where the query is true, while the second component $n_l$ counts all the models. The transformation function $f(n_u,n_l)$ returns a pair of values $(f_u,f_l)$ such that $f_l = 1$ if $n_l = n_u$ , 0 otherwise, and $f_u = 1$ if $n_u \gt 0$ , 0 otherwise. The outer semiring $\mathcal{R}_{out} = ([0, 1]^2, +, \cdot, (0,0), (1,1))$ is a double probability semiring, where there is a separate probability semiring for each component. $w_{out}$ associates $a$ to $(p,p)$ and $not\ a$ to $(1-p,1-p)$ for every probabilistic fact $p :: a$ , while it associates all the remaining literals to $(1,1)$ . aspmc, and so aspcs, solves 2AMC using knowledge compilation (Darwiche and Marquis Reference Darwiche and Marquis2002) targeting NNF circuits (Darwiche Reference Darwiche2004) where the order of the variables is guided by the treewidth of the program (Eiter et al. Reference Eiter, Hecher, Kiesel, Bienvenu, Lakemeyer and Erdem2021). Differently from PASTA, aspcs does not support aggregates, but we can convert rules and constraints containing them into new rules and constraints without aggregates using standard techniques (Brewka et al. Reference Brewka, Eiter and Trusczynski2011).

4.2 Approximate inference

Approximate inference can be performed by using the definition of the sampling semantics and returning the results after a finite number of samples, similar to what is done for programs under the DS (Kimmig et al. Reference Kimmig, Demoen, De Raedt, Costa and Rocha2011; Riguzzi Reference Riguzzi2013). It can be performed both on the discretized program and directly on the hybrid program.

4.2.1 Sampling the discretized program

In the discretized program, we can speed up the computation by storing the sampled worlds to avoid calling again the answer set solver in case a world has been already sampled. However, the number of discrete probabilistic facts obtained via the conversion is heavily dependent on the types of constraints in the program.

4.2.2 Sampling the hybrid program

Sampling the hybrid program has the advantage that it allows general numerical constraints, provided they involve only continuous random variables and constants. In this way, we directly sample the continuous random variables and directly test them against the constraints that can be composed of multiple variables and complex expressions. In fact, when constraints among random variables are considered, it is difficult to discretize the domain. Even if the samples would be always different (since the values are floating point numbers), also here we can perform caching, as in the discretized case: the constraints, when evaluated, are still associated with a Boolean value (true or false). So, we can store the values of the evaluations of each constraint: if a particular configuration has already been encountered, we retrieve its contribution to the probability, rather than calling the ASP solver.

4.2.3 Approximate algorithm description

Algorithm2 illustrates the pseudocode for the sampling procedure: for the sampling on the discretized program, the algorithm discretizes the program by calling Discretize (Algorithm1). Then, for a number of samples $s$ , it samples a world by including or not every probabilistic fact into the program (function SampleWorld) according to the probability values, computes its answer sets with function ComputeAnswerSets (recall that a world is an answer set program), checks whether the query $q$ is true in every answer set (function QueryInEveryAnswerSet) or in at least one answer set (function QueryInAtLeastOneAnswerSet), and updates the lower and upper probability bounds accordingly. At the end of the $s$ iterations, it returns the ratio between the number of samples contributing to the lower and upper probability and the number of samples taken. The procedure is analogous (but without discretization) in the case of the sampling on the original program. A world is sampled with function SampleVariablesAndTestConstraints: it takes a sample for each continuous random variable, tests that value against each constraint specified in the program, and removes it if the test succeeds; otherwise, it removes the whole rule containing it. The remaining part of the algorithm is the same.

We now present two results regarding the complexity of the sampling algorithm. The first provides a bound on the number of samples needed to obtain an estimate of the upper (or lower) probability of a query within a certain absolute error. The second result provides a bound on the number of samples needed to obtain an estimate within a certain relative error.

Theorem 3 (Absolute Error.) Let $q$ be a query in a hybrid probabilistic answer set program $P$ whose exact lower (upper) probability of success is $p$ . Suppose the sampling algorithm takes $s$ samples, $k$ of which are successful, and returns an estimate $\hat{p} = \frac{k}{s}$ of the lower (upper) probability of success. Let $\epsilon$ and $\delta$ be two numbers in $[0,1]$ . Then, the probability that $\hat{p}$ is within $\epsilon$ of $p$ is at least $1-\delta$ , that is,

\begin{equation*}P(p-\epsilon \leq \hat {p} \leq p+\epsilon ) \geq 1 - \delta \end{equation*}

\begin{equation*} s\geq \frac {\epsilon +\frac {1}{2}}{\epsilon ^2 \delta }. \end{equation*}

Algorithm 2.

Function Sampling: computation of the probability of a query q by taking s samples in a hybrid probabilistic answer set program P with rules R, continuous probabilistic facts C, and discrete probabilistic factsD. Variable type indicates whether the sampling of the discretized program or the original program is selected.

Proof. We must prove that

(11)

\begin{equation} P(p-\epsilon \leq \hat{p}\leq p+\epsilon )\geq 1 - \delta \end{equation}

or, equivalently, that

(12)

\begin{equation} P(sp-s\epsilon \leq k\leq sp+s\epsilon ) \geq 1 - \delta . \end{equation}

Since $k$ is a binomially distributed random variable with number of trials $s$ and probability of success $p$ , we have that (Feller Reference Feller1968):

(13)

\begin{equation} P(k\geq r_1) \leq \frac{r_1(1-p)}{(r_1-sp)^2} \end{equation}

if $r_1\geq sp$ . Moreover

(14)

\begin{equation} P(k\leq r_2) \leq \frac{(s-r_2)p}{(sp-r_2)^2} \end{equation}

if $r_2\leq sp$ . Since $P(k \geq r_1) = 1-P(k \lt r_1)$ , from equation (13), we have

(15)

\begin{eqnarray} 1-P(k\lt r_1)&\leq & \frac{r_1(1-p)}{(r_1-sp)^2}\nonumber \\[5pt] P(k\lt r_1)&\geq & 1-\frac{r_1(1-p)}{(r_1-sp)^2} \end{eqnarray}

if $r_1\geq sp$ .

In our case, we have $r_1 = sp+s\epsilon$ (since $r_1 \geq sp$ ) and $r_2 = sp-s\epsilon$ (since $r_2 \leq sp$ ). So

\begin{eqnarray*} && P(p-\epsilon \leq \hat{p}\leq p+\epsilon ) =\\[5pt] & = & P(r_2 \leq k \leq r_1) \\[5pt] & = &P(k\leq r_1)-P(k\lt r_2) \\[5pt] &\geq &1-\frac{r_1(1-p)}{(r_1-sp)^2}-\frac{(s-r_2)p}{(sp-r_2)^2} \qquad \text{(Eq.\,15 and\,14 and since $P(x \leq v) \geq P(x \lt v)$)} \\[5pt] &=&1-\frac{(sp+s\epsilon )(1-p)}{(s\epsilon )^2}-\frac{(s-sp+s\epsilon )p}{(s\epsilon )^2} \quad \text{(by replacing the values of $r_1$ and $r_2$)} \\[5pt] &=&\frac{s^2\epsilon ^2-sp-s\epsilon +sp^2+sp\epsilon -sp+sp^2-sp\epsilon }{s^2\epsilon ^2} \quad \text{(by expanding)} \\[5pt] &=&\frac{s\epsilon ^2-2p-\epsilon +2p^2}{s\epsilon ^2} \quad \text{(by collecting $s$ and simplifying)}\\[5pt] \end{eqnarray*}

and

(16)

\begin{eqnarray} \frac{s\epsilon ^2-2p-\epsilon +2p^2}{s\epsilon ^2} & \geq & 1-\delta \nonumber \\[5pt] s\epsilon ^2-2p-\epsilon +2p^2 & \geq & s\epsilon ^2- s\epsilon ^2 \delta \nonumber \\[5pt] s\epsilon ^2\delta & \geq & 2p + \epsilon - 2p^2\nonumber \\[5pt] s & \geq & \frac{2p + \epsilon - 2p^2}{\epsilon ^2\delta } = \frac{2p(1-p) + \epsilon }{\epsilon ^2\delta }. \end{eqnarray}

Since $0\leq 2p (1-p) \leq \frac{1}{2}$ with $p \in [0,1]$ , then equation (16) is implied by

\begin{eqnarray*} s & \geq & \frac{\epsilon +\frac{1}{2}}{\epsilon ^2 \delta } \end{eqnarray*}

Theorem 4 (Relative Error.) Let $q$ be a query in a hybrid probabilistic answer set program $P$ whose exact lower (upper) probability of success is $p$ . Suppose the sampling algorithm takes $s$ samples, $k$ of which are successful, and returns an estimate $\hat{p} = \frac{k}{s}$ of the lower (upper) probability of success. Let $\epsilon$ and $\delta$ be two numbers in $[0,1]$ . Then, the probability that the error between $\hat{p}$ and $p$ is smaller than $p\epsilon$ is at least $1-\delta ^p$ , that is,

\begin{equation*}P(|p-\hat {p}| \leq p\epsilon ) \geq 1 - \delta ^p\end{equation*}

\begin{equation*} s\geq \frac {3}{\epsilon ^2}\ln (\frac {1}{\delta }). \end{equation*}

Proof. According to Chernoff’s bound (Mitzenmacher and Upfal Reference Mitzenmacher and Upfal2017, Corollary 4.6), we have that

\begin{equation*} P( |ps-k| \ge ps\epsilon ) \leq 2e^{-\frac {\epsilon ^2ps}{3}}. \end{equation*}

\begin{equation*} P( |p - \hat {p}| \geq p\epsilon ) \le 2e^{-\frac {\epsilon ^2ps}{3}} \end{equation*}

and

\begin{equation*} P( |p - \hat {p}| \le p\epsilon ) \geq 1-2e^{-\frac {\epsilon ^2ps}{3}}. \end{equation*}

Then,

\begin{eqnarray*} 1-2e^{-\frac{\epsilon ^2ps}{3}} & \geq & 1-\delta ^p\\[2pt] 2e^{-\frac{\epsilon ^2ps}{3}} & \leq & \delta ^p\\[2pt] \ln (2) -\frac{\epsilon ^2ps}{3} & \leq & p \ln (\delta ) \\[2pt] -\frac{\epsilon ^2ps}{3} & \leq & p \ln (\delta ) - \ln (2) \leq p \ln (\delta ) \\[2pt] s & \geq & \frac{3}{\epsilon ^2}\ln (\frac{1}{\delta }). \end{eqnarray*}

Please note that in Theorem4, we exponentiate $\delta$ to $p$ . This is needed to avoid the appearance of $p$ in the bound since $p$ is unknown. When $p$ is 0, we have no guarantees on the error. When $p$ is 1, the confidence is $1-\delta$ . For $p$ growing from 0–1, the bound provides increasing confidence. In fact, approximating small probabilities is difficult due to the low success probability of the sample.

5 Experiments

We ran experiments on a computer with 8 GB of RAM and a time limit of 8 h (28800 s). We generated five synthetic datasets, $t1$ , $t2$ , $t3$ , $t4$ , and $t5$ , where every world of all the discretized versions of the programs has at least one answer set. We use the SciPy library (Virtanen et al. Reference Virtanen, Gommers, Oliphant, Haberland, Reddy, Cournpeau, Burovski, Peterson, Weckesser, Bright, Van DER WALT, Brett, Wilson, Millman, Mayorov, Nelson, Jones, Kern, Larson, Carey, Polat, Feng, Moore, Vanderplas, Laxalde, Perktold, Cimrman, Henriksen, Quintero, Harris, Archibald, Riberio, Pedregosa and Van MULBERGT2020) to sample continuous random variables. The following code snippets show the PASTA-backed programs; the only difference with the ones for aspcs is in the negation symbol, $not$ for the former and $\backslash +$ for the latter. For all the datasets, we compare the exact algorithms and the approximate algorithms based on sampling, with an increasing number of samples. We record the time required to parse the program and to convert the HPASP into a PASP together with the inference time. The time for the first two tasks is negligible with respect to the inference time.

5.1 Dataset t1

In $t1$ we consider instances of increasing size of the program shown in Example4. Every instance of size $n$ has $n/2$ discrete probabilistic facts $d_i$ , $n/2$ continuous random variables $c_i$ with a Gaussian distribution with mean 0 and variance 1, $n/2$ pair of rules $q0 \,{:\!-}\, below(c_i, 0.5), not \, q1$ and $q1 \,{:\!-}\, below(c_i, 0.5), not \, q0$ , $i = \{1,\dots, n/2\}$ , and $n/2$ rules $q0 \,{:\!-}\, below(c_i, 0.7), d_i$ , one for each $i$ for $i=\{1,\dots, n/2\}$ . The query is $q0$ . The goal of this experiment is to analyze the behaviour of the algorithm by increasing the number of discrete and continuous random variables. The instance of size 2 is:

5.2 Dataset t2

In $t2$ we consider a variation of $t1$ , where we fix the number $k$ of discrete probabilistic facts to 2, 5, 8, and 10, and increase the number of continuous random variables starting from 1. Every instance of size $n$ contains $k$ discrete probabilistic facts $d_i$ , $i \in \{0,\dots, k-1\}$ , $n$ continuous random variables $c_i$ , $i \in \{1,\dots, n\}$ with a Gaussian distribution with mean 0 and variance 1, $n$ pair of rules $q0 \,{:\!-}\, below(c_i, 0.5), not \, q1$ and $q1 \,{:\!-}\, below(c_i, 0.5), not \, q0$ , $i = \{1,\dots, n\}$ , and $n$ rules $q0 \,{:\!-}\, below(c_i, 0.7), d_{(i-1) \% k}$ , one for each $i$ for $i=\{1,\dots, n\}$ . The query is $q0$ . Here, when the instance size $n$ is less than the number of discrete probabilistic facts $k$ , $k - n$ probabilistic facts are not relevant for the computation of the probability of the query. The instance of size 2 with $k = 2$ is:

5.3 Dataset t3

In $t3$ , we consider again a variation of $t1$ where we fix the number $k$ of continuous random variables to 2, 5, 8, and 10, and increase the number of discrete probabilistic facts starting from 1. Every instance of size $n$ contains $k$ continuous probabilistic facts $c_i$ , $i \in \{0,\dots, k-1\}$ , with a Gaussian distribution with mean 0 and variance 1, $n$ discrete probabilistic facts $d_i$ , $i \in \{1,\dots, n\}$ , $k$ pair of rules $q0 \,{:\!-}\, below(c_i, 0.5), not \, q1$ and $q1 \,{:\!-}\, below(c_i, 0.5), not \, q0$ , $i = \{1,\dots, n\}$ , and $n$ rules $q0 \,{:\!-}\, below(c_{(i-1) \% k}, 0.7), d_i$ , one for each $i$ for $i=\{1,\dots, n\}$ . The query is $q0$ . The instance of size 3 with $k = 2$ is:

5.4 Dataset t4

In $t4$ , we consider programs with one discrete probabilistic fact $d$ and one continuous random variable $a$ that follows a Gaussian distribution with mean 0 and standard deviation 10. An instance of size $n$ has $n$ pairs of rules $q0 \,{:\!-}\, between(a,lb_i,ub_i), not \, q1$ and $q1 \,{:\!-}\, between(a,lb_i,ub_i), not \, q0$ where $lb_i$ and $ub_i$ are randomly generated (with, $lb_i \lt ub_i$ ), for $j = 1,\dots, n$ , and $n$ rules of the form $q0 \,{:\!-}\, d, between(a,LB_j,UB_j)$ , where the generation of the $LB_j$ and $UB_j$ follows the same process of the previous rule. We set the minimum value of $lb_i$ and $LB_j$ to -30 and, for both rules, the lower and upper bounds for the $\mathit{between}/2$ comparison predicate are uniformly sampled in the range $[u_{i-1}, u_{i-1} + 60/n]$ , where $u_{i-1}$ is the upper bound of the previous rule. The query is $q0$ . Here, the goal is to test the algorithm with an increasing number of intervals to consider. An example of an instance of size 2 is:

5.5 Dataset t5

Lastly, in $t5$ we consider programs of the form shown in Example3: the instance of index $n$ has $n$ people involved, $n$ discrete probabilistic facts, and $n$ $d/1$ and $s/1$ continuous random variables. The remaining part of the program is the same. Example3 shows the instance of index 4 (four people involved). The query is $high\_number\_strokes$ .

Fig 1.

Results for $t1$ , $t4$ , and $t5$ (left) and results for aspcs applied to $t3$ (right) with two and five continuous facts.

Fig 2.

Results for the experiment $t2$ with a fixed number of discrete facts and an increasing number of continuous variables.

Fig 3.

Results for the experiment $t3$ with a fixed number of continuous random variables and an increasing number of discrete facts. The x axis of the left plot is cut at size 20, to keep the results readable.

Table 4.

Results for the approximate algorithms based on sampling. The columns contain, from the left, the name of the dataset, the instance index, and the average time required to take $10^2$ , $10^3$ , $10^4$ , $10^5$ , and $10^6$ samples on the original and converted program. O.O.M. and T.O. stand, respectively, for out of memory and timeout

Table 5.

Standard deviations for the results listed in Table 4. A dash denotes that there are no results for that particular instance due to either a memory error or a time limit

Table 6.

Largest solvable instances of $t4$ by sampling the converted program when reducing the available memory. The columns contain, from the left, the maximum amount of memory, the largest solvable instance together with the number of probabilistic facts (# p.f.) and rules (# rules) obtained via the conversion, and the maximum number of samples that can be taken (max. # samples). Note that we increase the number of samples by starting from $10^2$ and iteratively multiplying the number by 10, up to $10^6$ , so in the last column, we report a range: this means that we get a memory error with the upper bound while we can take the number of samples in the lower bound. Thus, the maximum values of samples lie in the specified range

5.6 Exact inference results

The goal of benchmarking the exact algorithms is threefold: (i) identifying how the number of continuous and discrete probabilistic facts influences the execution time, (ii) assessing the impact on the execution time of an increasing number of intervals, and (iii) comparing knowledge compilation with projected answer set enumeration. Figure 1(a) shows the results of exact inference for the algorithms backed by PASTA (dashed lines) and aspcs (straight lines) on $t1$ , $t4$ , and $t5$ . For $t5$ PASTA is able to solve up to instance 4, while aspcs can solve up to instance 9 (in a few seconds). Similarly for $t1$ , where aspcs doubles the maximum sizes solvable by PASTA. The difference is even more marked for $t4$ : here, PASTA solves up to instance 11 while aspcs up to instance 35. Figures 2 and 3 show the results for $t2$ and $t3$ . Here, PASTA cannot solve instances with around more than 20 probabilistic facts and continuous random variables combined. The performance of aspcs are clearly superior, in particular for $t3$ with 2 and 5 continuous random variables (Figure 3(a)). Note that this is the only plot where we cut the x axis at instance 20, to keep the results readable. To better investigate the behavior of aspcs in this case, we keep increasing the number of discrete facts until we get a memory error, while the number of continuous variables is fixed to 2 and 5. Figure 1(b) shows the results: with 2 continuous variables, we can solve up to instance 110, while with 5 continuous variables, we can only solve up to 75. In general, the results of exact inference are in accordance with the theoretical complexity results. However, as also empirically shown by Azzolini and Riguzzi (Reference Azzolini, Riguzzi, Basili, Lembo, Limongelli and Orlandini2023a), knowledge compilation has a huge impact on the execution times. Overall, as expected, PASTA is slower in all the tests, being based on (projected) answer set enumeration. For all, both PASTA (exact algorithm) and aspcs stop for lack of memory.

5.7 Approximate inference results

The goal of the experiments run with approximate algorithms is: (i) analyzing the impact on the execution time of the number of samples taken, (ii) comparing the approach based on sampling the original program against the one based on sampling the converted program in terms of execution times, and (iii) assessing the memory requirements. Table 4 shows the averages on five runs of the execution time of the approximate algorithm applied to both the discretized and original program with $10^2$ , $10^3$ , $10^4$ , $10^5$ , and $10^6$ samples, for each of the five datasets. Standard deviations are listed in Table 5. For $t2$ and $t3$ , we consider programs with the same number of continuous variables and discrete probabilistic facts. In four of the five tests (all except $t4$ ), sampling the original program is slower than sampling the converted program, sometimes by a significant amount. This is probably due to the fact that sampling a continuous distribution is slower than sampling a Boolean random variable. For example, in instance 100 of $t3$ , sampling the original program is over 12 times slower than sampling the converted program. However, the discretization process increases the number of probabilistic facts and so the required memory: for $t4$ , from the instance 70, taking even 100 samples requires more than 8 GB of RAM. To better assess the memory requirements, we repeat the experiments with the approximate algorithms with 6, 4, 2, and 1 GB of maximum available memory. For all, taking up to $10^5$ samples in the original program is feasible also with only 1 GB of memory. The same considerations hold for sampling the converted program, except for $t4$ . Table 6 shows the largest solvable instances together with the number of probabilistic facts, rules (for the converted program), and samples for each instance: for example, with 1 GB of memory, it is possible to take only up to $10^4$ samples in the instance of size 40. For this dataset, the increasing number of $between/3$ predicates requires an increasing number of rules and probabilistic facts to be included in the program during the conversion, to properly handle all the intervals: the instance of size 70 has 142 probabilistic facts and more than 30,000 rules, which make the generation of the answer sets very expensive and sampling it with only 8 GB of memory is not feasible.

6 Related Work

Probabilistic logic programs with discrete and continuous random variables have been the subject of various works. Gutmann et al. (Reference Gutmann, Jaeger and De Raedt2011a) proposed Hybrid ProbLog, an extension of ProbLog (De Raedt et al. Reference De RAEDT, Kimmig and Toivonen2007) to support continuous distributions. There are several differences with Hybrid ProbLog: first, Hybrid ProbLog focuses on PLP while our approach on PASP. Thus, the syntax and semantics are different. For the syntax, in PASP, we can use rich constructs such as aggregates that greatly increase the expressivity of the programs. For the semantics, at high level, PLP requires that every world (i.e., combination of probabilistic facts) has exactly one model while PASP does not. Another difference with Hybrid ProbLog is in the discretization process: Hybrid ProbLog discretizes the proofs of a program, while we directly discretize the program. Moreover, their inference algorithm is based on the construction of a compact representation of the program via BDDs, while we use both ASP solvers with projective solutions and knowledge compilation targeting NNF.

Distributional Clauses (Gutmann et al. Reference Gutmann, Thon, Kimmig, Bruynooghe and De RAEDT2011b) and Extended PRISM (Islam et al. Reference Islam, Ramakrishnan and Ramakrishnan2012) are two other proposals to handle both discrete and continuous random variables. The semantics of the former is based on a stochastic extension of the immediate consequence $T_p$ operator, while the latter considers the least model semantics of constraint logic programs (Jaffar and Maher Reference Jaffar and Maher1994) and extends the PRISM framework (Sato Reference Sato1995). Michels et al. (Reference Michels, Hommersom, Lucas and Velikova2015) introduced Probabilistic Constraint Logic Programming whose semantics is based on an extension of Sato’s DS (Sato Reference Sato1995). Azzolini et al. (Reference Azzolini, Riguzzi and Lamma2021) proposed a semantics for hybrid probabilistic logic programs that allows a denumerable number of random variables. In general, all the previously discussed approaches only support normal clauses, do not adopt some of the ASP constructs, such as aggregates and constraints, and require that the worlds have a single model. Some languages allow handling uncertainty in ASP, such as LPMLN (Lee and Wang Reference Lee, Wang, Baral, Delgrande and Wolter2016), P-log (Baral et al. Reference Baral, Gelfond and Rushton2009), PrASP (Nickles and Mileo Reference Nickles, Mileo, Riguzzi and Vennekens2015), and differentiable SAT/ASP (Nickles Reference Nickels, Bellodi and Schrijvers2018) but none of these consider continuous distributions. PASOCS (Tuckey et al. Reference Tuckey, Russo and Broda2021) is a system for performing inference in probabilistic answer set programs under the CS, but it does not allow worlds without answer sets and continuous variables while plingo (Hahn et al. Reference Hahn, Janhunen, Kaminski, Romero, Ruhling and Schaub2022) considers the LPMLN, P-log, and ProbLog semantics (the relationship among these has been discussed in (Lee and Yang Reference Lee, Yang, Singh and Markovitch2017)). The credal least undefined semantics (Rocha and Gagliardi Cozman Reference Rocha, Gagliardi Cozman, Kern-Isberner, Lakemeyer and Meyer2022) and the smProbLog semantics (Totis et al. Reference Totis, De Raedt and Kimmig2023) handle unsatisfiable worlds, but by considering three-valued semantics and do not allow continuous random variables.

There is a large body of work on inference in Probabilistic Programming (PP) (Pfeffer Reference Pfeffer2016; Gehr et al. Reference Gehr, Misailovic, Vechev, Chaudhuri and Farzan2016; Tran et al. Reference Tran, Hoffman, Saurous, Brevdo, Murphy and Blei2017; van de Meent et al. Reference Van De Meent, Paige, Yang and Wood2021) with both discrete and continuous random variables, with several available tools (Tran et al. Reference Tran, Kucukelbir, Dieng, Rudolph, Liang and Blei2016; Bingham et al. Reference Bingham, Chen, Jankowiak, Obermeyer, Pradhan, Karaletsos, Singh, Szerlip, Horsfall and Goodman2018; Phan et al. Reference Phan, Pradhan and Jankowiak2019). PLP and PASP adopt a declarative approach to describe a domain, so they are particularly suitable for describing relational domains. Translating a PLP/PASP into PP is possible but would result in a much longer and less understandable program.

7 Conclusions

In this paper, we propose HPASP, an extension of PASP under the CS that allows both discrete and continuous random variables. We restrict the types of possible numerical constraints and, to perform exact inference, we convert the program containing both discrete probabilistic facts and continuous random variables into a program containing only discrete probabilistic facts, similarly to (Gutmann et al. Reference Gutmann, Jaeger and De Raedt2011a). We leverage two existing tools for exact inference, one based on projected answer set enumeration and one based on knowledge compilation. We also consider approximate inference by sampling either the discretized or the original program. We tested the four algorithms on different datasets. The results show that the exact algorithm based on projected answer set enumeration is feasible only for small instances while the one based on knowledge compilation can scale to larger programs. Approximate algorithms can handle larger instances and sampling the discretized program is often faster than sampling the original program. However, this has a cost in terms of required memory, since the discretization process adds a consistent number of rules and probabilistic facts. In the future, we plan to extend our framework to also consider comparisons involving more than one continuous random variable and general expressions, as well as considering lifted inference approaches (Azzolini and Riguzzi Reference Azzolini and Riguzzi2023b) and handle the inference problem with approximate answer set counting (Kabir et al. Reference Kabir, Everado, Shukla, Hecher, Fichte and Meel2022).

Acknowledgments

This work has been partially supported by Spoke 1 “FutureHPC & BigData” of the Italian Research Center on High-Performance Computing, Big Data and Quantum Computing funded by MUR Missione 4 – Next Generation EU (NGEU) and by Partenariato Esteso PE00000013, “FAIR – Future Artificial Intelligence Research,” Spoke 8 “Pervasive AI,” funded by MUR through PNRR – M4C2 – Investimento 1.3 (Decreto Direttoriale MUR n. 341 of March 15, 2022) under the NGEU. Both authors are members of the Gruppo Nazionale Calcolo Scientifico – Istituto Nazionale di Alta Matematica (GNCS-INdAM). We acknowledge the CINECA award under the ISCRA initiative for the availability of high-performance computing resources and support.

Competing interests

The authors declare none.

Footnotes

¹ In ASP terminology, the word “hybrid” is usually adopted to describe an extension of ASP, such as the one by Janhunen et al. Reference Janhunen, Kaminski, Ostrowski, Schellhorn, Wanko and Schaub(2017). Here we use the word hybrid exclusively to denote the presence of discrete and continuous random variables.

² Code and datasets are available on GitHub at https://github.com/damianoazzolini/pasta and on zenodo at https://doi.org/10.5281/zenodo.11653976.

References

Alviano, M. and Faber, W. 2018. Aggregates in answer set programming. KI-Künstliche Intelligenz 32, 2, 119–124.CrossRef Google Scholar

Azzolini, D., Bellodi, E. and Riguzzi, F. 2022. Statistical statements in probabilistic logic programming. In Logic Programming and Nonmonotonic Reasoning, Gottlob, G., Inclezan, D. and Maratea, M., Eds. Cham, Springer International Publishing, 43–55.CrossRef Google Scholar

Azzolini, D. and Riguzzi, F. 2023a. Inference in probabilistic answer set programming under the credal semantics. In AIxIA. 2023 - Advances in Artificial Intelligence, Basili, R., Lembo, D., Limongelli, C. and Orlandini, A., Eds. vol. 14318. Lecture Notes in Artificial Intelligence. Heidelberg, Germany, Springer, 367–380.Google Scholar

Azzolini, D. and Riguzzi, F. 2023b. Lifted inference for statistical statements in probabilistic answer set programming. International Journal of Approximate Reasoning 163, 109040.CrossRef Google Scholar

Azzolini, D., Riguzzi, F. and Lamma, E. 2021. A semantics for hybrid probabilistic logic programs with function symbols. Artificial Intelligence, 294, 103452.CrossRef Google Scholar

Baral, C., Gelfond, M. and Rushton, N. 2009. Probabilistic reasoning with answer sets. Theory and Practice of Logic Programming 9, 1, 57–144.CrossRef Google Scholar

Bingham, E., Chen, J. P., Jankowiak, M., Obermeyer, F., Pradhan, N., Karaletsos, T., Singh, R., Szerlip, P., Horsfall, P. and Goodman, N. D. 2018. Pyro: Deep universal probabilistic programming. Journal of Machine Learning Research 20, 1, 973–978.Google Scholar

Brewka, G., Eiter, T. and Trusczynski, M. 2011. Answer set programming at a glance. Communications of the ACM 54, 12, 92–103.CrossRef Google Scholar

Cozman, F. G. and Maua, D. D. 2016. The Structure and Complexity of Credal Semantics, vol. 1661. CEUR Workshop Proceedings, 3–14.CEUR-WS.org.Google Scholar

Cozman, F. G. and Maua, D. D. 2017. On the semantics and complexity of probabilistic logic programs. Journal of Artificial Intelligence Research, 60, 221–262.Google Scholar

Cozman, F. G. and Maua, D. D. (2020). The joy of probabilistic answer set programming: Semantics, complexity, expressivity, inference. International Journal of Approximate Reasoning, 125, 218–239.CrossRef Google Scholar

Darwiche, A. 2004. New Advances in Compiling CNF into Decomposable Negation Normal Form. IOS Press, 328–332.Google Scholar

Darwiche, A. and Marquis, P. 2002. A knowledge compilation map. Journal of Artificial Intelligence Research, 17, 229–264.Google Scholar

De RAEDT, L., Demoen, B., Fierens, D., Gutmann, B., Janssens, G., Kimmig, A., Landwehr, N., Mantadelis, T., Meert, W., Rocha, R., Costa, V., Thon, I. and Vennekens, J. 2008. Towards digesting the alphabet-soup of statistical relational learning. In NIPS. 2008 Workshop on Probabilistic Programming.Google Scholar

De RAEDT, L., Kimmig, A. and Toivonen, H. 2007. ProbLog: A probabilistic Prolog and its application in link discovery. In IJCAI‘07: Proceedings of the 20th international joint conference on Artifical intelligence, vol. 7, AAAI Press, 2462–2467.Google Scholar

Eiter, T., Hecher, M. and Kiesel, R. 2021. Treewidth-aware cycle breaking for algebraic answer set counting. In Bienvenu, M., Lakemeyer, G. and Erdem, E., Eds. Proceedings of the 18th International Conference on Principles of Knowledge Representation and Reasoning, KR 2021, 269–279.Google Scholar

Feller, W. 1968. An Introduction to Probability Theory and Its Applications: Vol. 1.Wiley Series in Probability and Mathematical Statistics. John Wiley & sons.Google Scholar

Gebser, M., Kaufmann, B. and Schaub, T. 2009.Solution enumeration for projected boolean search problems. In Integration of AI and OR Techniques in Constraint Programming for Combinatorial Optimization Problems, Van Hoeve, W.-J. and Hooker, J. N., Eds. Berlin, Heidelberg. Springer Berlin Heidelberg, 71–86.CrossRef Google Scholar

Gehr, T., Misailovic, S. and Vechev, M. 2016. PSI: Exact symbolic inference for probabilistic programs. In 28th International Conference on Computer Aided Verification (CAV 2016), Part I, vol. 9779, Chaudhuri, S. and Farzan, A., Eds. 62–83.CrossRef Google Scholar

Gelfond, M. and Lifschitz, V. 1988, The stable model semantics for logic programming. In Proceedings of International Logic Programming Conference and Symposium, vol. 88, USA, 1070–1080.Google Scholar

Gill, J. 1977. Computational complexity of probabilistic turing machines. SIAM Journal on Computing 6, 4, 675–695.CrossRef Google Scholar

Gutmann, B., Jaeger, M. and De Raedt, L. 2011a. Extending problog with continuous distributions. In Inductive Logic Programming, ILP 2010, vol. 6489, 76–91.CrossRef Google Scholar

Gutmann, B., Thon, I., Kimmig, A., Bruynooghe, M. and De RAEDT, L. 2011b. The magic of logical inference in probabilistic programming. Theory and Practice of Logic Programming 11, 4-5, 663–680.CrossRef Google Scholar

Hahn, S., Janhunen, T., Kaminski, R., Romero, J., Ruhling, N. and Schaub, T. 2022. plingo: A system for probabilistic reasoning in clingo based on LPMLN. In Rules and Reasoning, G. Governatori and A.-Y. Turhan, Eds. Cham: Springer International Publishing, 54–62.CrossRef Google Scholar

Islam, M. A., Ramakrishnan, C. and Ramakrishnan, I. 2012. Inference in probabilistic logic programs with continuous random variables. Theory and Practice of Logic Programming 12, 4-5, 505–523.CrossRef Google Scholar

Jaffar, J. and Maher, M. J. (1994). Constraint logic programming: A survey. The Journal of Logic Programming 19-20,503–581.CrossRef Google Scholar

Janhunen, T., Kaminski, R., Ostrowski, M., Schellhorn, S., Wanko, P. and Schaub, T. 2017. Clingo goes linear constraints over reals and integers. Theory and Practice of Logic Programming 17, 5-6, 872–888.CrossRef Google Scholar

Kabir, M., Everado, F., Shukla, A., Hecher, M., Fichte, J. K. and Meel, K. S. 2022. ApproxASP - A scalable approximate answer set counter. In AAAI Conference on Artificial Intelligence.CrossRef Google Scholar

Kiesel, R., Totis, P. and Kimmig, A. 2022. Efficient knowledge compilation beyond weighted model counting. Theory and Practice of Logic Programming 22, 4, 505–522.CrossRef Google Scholar

Kimmig, A., Demoen, B., De Raedt, L., Costa, V. S. and Rocha, R. 2011. On the implementation of the probabilistic logic programming language robLog. Theory and Practice of Logic Programming 11, 2-3, 235–262.CrossRef Google Scholar

Kimmig, A., Van Den Broeck, G. and De Raedt, L. 2017. Algebraic model counting. Journal of Applied Logic 22, 46–62.CrossRef Google Scholar

Lee, J. and Wang, Y. 2016. Weighted rules under the stable model semantics. In Baral, C., Delgrande, J. P. and Wolter, F., Eds. Proceedings of the Fifteenth International Conference on Principles of Knowledge Representation and Reasoning. AAAI Press, 145–154.Google Scholar

Lee, J. and Yang, Z. 2017. LPMLN, weak constraints, and P-log. In Singh, S. and Markovitch, S., Eds. Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, February 4-9, 2017. AAAI Press, San Francisco, California, USA, 1170–1177.Google Scholar

Maua, D. D. and Cozman, F. G. 2020. Complexity results for probabilistic answer set programming. International Journal of Approximate Reasoning 118, 133–154.CrossRef Google Scholar

Michels, S., Hommersom, A., Lucas, P. J. F. and Velikova, M. 2015. A new probabilistic constraint logic programming language based on a generalised distribution semantics. Artificial Intelligence 228, 1–44.CrossRef Google Scholar

Mitzenmacher, M. and Upfal, E. 2017. Probability and Computing: Randomization and Probabilistic Techniques in Algorithms and Data Analysis. Cambridge University Press.Google Scholar

Nickels, M. 2018. Differentiable SAT/ASP. In Bellodi, E. and Schrijvers, T., Eds. Proceedings of the 5th International Workshop on Probabilistic Logic Programming, PLP. 2018, co-located with the 28th International Conference on Inductive Logic Programming (ILP 2018), Ferrara, Italy, 1 Sept. 2018, vol. 2219 of CEUR Workshop Proceedings, 62–74.CEUR-WS.org.Google Scholar

Nickles, M. and Mileo, A. 2015. A hybrid approach to inference in probabilistic non-monotonic logic programming. In Riguzzi, F. and Vennekens, J., Eds. Proceedings of the 2nd International Workshop on Probabilistic Logic Programming co-located with 31st International Conference on Logic Programming (ICLP 2015), Cork, Ireland, 31st Aug. 2015, vol. 1413 of CEUR Workshop Proceedings, 57–68. CEUR-WS.org.Google Scholar

Pfeffer, A. 2016. Practical Probabilistic Programming. Manning Publications.Google Scholar

Phan, D., Pradhan, N. and Jankowiak, M. 2019. Composable effects for flexible and accelerated probabilistic programming in NumPyro. arXiv preprint arXiv:1912.11554.Google Scholar

Riguzzi, F. 2013. MCINTYRE: A Monte Carlo system for probabilistic logic programming. Fundamenta Informaticae 124, 4, 521–541.CrossRef Google Scholar

Riguzzi, F. 2022. Foundations of probabilistic logic programming languages, semantics, inference and learning, 2nd edn. Gistrup, Denmark: River Publishers.CrossRef Google Scholar

Rocha, V. H. N. and Gagliardi Cozman, F. 2022. A credal least undefined stable semantics for probabilistic logic programs and probabilistic argumentation. In Kern-Isberner, G., Lakemeyer, G. and Meyer, T., Eds. Proceedings of the 19th International Conference on Principles of Knowledge Representation and Reasoning, KR 2022, 309–319.Google Scholar

Sato, T. 1995. A statistical learning method for logic programs with distribution semantics. In L. Sterling, Ed. Logic Programming, Proceedings of the Twelfth International Conference on Logic Programming, Tokyo, Japan, June 13-16, 1995, MIT Press, 715–729.Google Scholar

Shterionov, D. S., Renkens, J., Vlasselaer, J., Kimmig, A., Meert, W. and Janssens, G. 2015. The most probable explanation for probabilistic logic programs with annotated disjunctions. In Inductive Logic Programming, Berlin, Heidelberg, 139–153.CrossRef Google Scholar

Totis, P., De Raedt, L. and Kimmig, A. 2023. mProbLog: Stable model semantics in problog for probabilistic argumentation.Theory and Practice of Logic Programming, 1–50. doi: 10.1017/S147106842300008XGoogle Scholar

Tran, D., Hoffman, M. D., Saurous, R. A., Brevdo, E., Murphy, K. and Blei, D. M. 2017. Deep probabilistic programming. CoRR, abs/1701.03757.Google Scholar

Tran, D., Kucukelbir, A., Dieng, A. B., Rudolph, M., Liang, D. and Blei, D. M. 2016. Edward: A library for probabilistic modeling, inference, and criticism. arXiv preprint arXiv: 1610.09787.Google Scholar

Tuckey, D., Russo, A. and Broda, K. 2021. PASOCS: A parallel approximate solver for probabilistic logic programs under the credal semantics, arXiv, abs/2105.10908Google Scholar

Van De Meent, J.-W., Paige, B., Yang, H. and Wood, F. 2021. An introduction to probabilistic programming.Google Scholar

Van Den Broeck, G., Thon, I., Van Otterlo, M. and De Raedt, L. 2010. DTProbLog: A decision-theoretic probabilistic Prolog, 1217–1222.Google Scholar

Virtanen, P., Gommers, R., Oliphant, T. E., Haberland, M., Reddy, T., Cournpeau, D., Burovski, E., Peterson, P., Weckesser, W., Bright, J., Van DER WALT, S. J., Brett, M., Wilson, J., Millman, K. J., Mayorov, N., Nelson, A. R. J., Jones, E., Kern, R., Larson, E., Carey, C. J., Polat, İ., Feng, Y., Moore, E. W., Vanderplas, J., Laxalde, D., Perktold, J., Cimrman, R., Henriksen, I., Quintero, E. A., Harris, C. R., Archibald, A. M., Riberio, A. H., Pedregosa, F., Van MULBERGT, P. and SciPy 1.0 Contributors 2020. SciPy 1.0: fundamental algorithms for scientific computing in Python. Nature Methods 17, 3, 261–272.CrossRef Google Scholar PubMed

Table 1. Worlds, probabilities, and answer sets for Example1

Table 2. Tight complexity bounds of the CR problem in PASP from Mauá and Cozman (2020). Not stratified denotes programs with stratified negations and$\vee$disjunction in the head

Algorithm 1. Function Discretize: discretization of a hybrid probabilistic answer set program with rules R, continuous probabilistic facts C and discrete probabilistic factsD.

Table 3. Worlds and probabilities for Example4.$f_1$and$f_2$are the two probabilistic facts obtained by the discretization process of the continuous probabilistic fact$a$into three intervals. The column LP/UP indicates whether the world contributes only to the upper probability (UP) or also to the lower probability (LP + UP)

Algorithm 2. Function Sampling: computation of the probability of a query q by taking s samples in a hybrid probabilistic answer set program P with rules R, continuous probabilistic facts C, and discrete probabilistic factsD. Variable type indicates whether the sampling of the discretized program or the original program is selected.

Fig 1. Results for $t1$, $t4$, and $t5$ (left) and results for aspcs applied to $t3$ (right) with two and five continuous facts.

Fig 2. Results for the experiment $t2$ with a fixed number of discrete facts and an increasing number of continuous variables.

Fig 3. Results for the experiment $t3$ with a fixed number of continuous random variables and an increasing number of discrete facts. The x axis of the left plot is cut at size 20, to keep the results readable.

Table 4. Results for the approximate algorithms based on sampling. The columns contain, from the left, the name of the dataset, the instance index, and the average time required to take$10^2$, $10^3$, $10^4$, $10^5$, and$10^6$samples on the original and converted program. O.O.M. and T.O. stand, respectively, for out of memory and timeout

Table 5. Standard deviations for the results listed in Table 4. A dash denotes that there are no results for that particular instance due to either a memory error or a time limit

Table 6. Largest solvable instances of$t4$by sampling the converted program when reducing the available memory. The columns contain, from the left, the maximum amount of memory, the largest solvable instance together with the number of probabilistic facts (# p.f.) and rules (# rules) obtained via the conversion, and the maximum number of samples that can be taken (max. # samples). Note that we increase the number of samples by starting from$10^2$and iteratively multiplying the number by 10, up to$10^6$, so in the last column, we report a range: this means that we get a memory error with the upper bound while we can take the number of samples in the lower bound. Thus, the maximum values of samples lie in the specified range

Article contents

Probabilistic Answer Set Programming with Discrete and Continuous Random Variables

Abstract

Keywords

Information

1 Introduction

2 Background

2.1 Answer set programming

2.2 Probabilistic answer set programming

2.3 ProbLog and hybrid probLog

3 Hybrid probabilistic answer set programming

4 Algorithms

4.1 Exact inference

4.1.1 Modifying the PASTA solver

4.1.2 Modifying the aspcs solver

4.2 Approximate inference

4.2.1 Sampling the discretized program

4.2.2 Sampling the hybrid program

4.2.3 Approximate algorithm description

5 Experiments

5.1 Dataset t1

5.2 Dataset t2

5.3 Dataset t3

5.4 Dataset t4

5.5 Dataset t5

5.6 Exact inference results

5.7 Approximate inference results

6 Related Work

7 Conclusions

Acknowledgments

Competing interests

Footnotes

References

Save article to Kindle

Save article to Dropbox

Save article to Google Drive

Reply to: Submit a response

Your details

You have entered the maximum number of contributors

Conflicting interests