Implementing a Library for Probabilistic Programming using Non-strict Non-determinism

This paper presents PFLP, a library for probabilistic programming in the functional logic programming language Curry. It demonstrates how the concepts of a functional logic programming language support the implementation of a library for probabilistic programming. In fact, the paradigms of functional logic and probabilistic programming are closely connected. That is, language characteristics from one area exist in the other and vice versa. For example, the concepts of non-deterministic choice and call-time choice as known from functional logic programming are related to and coincide with stochastic memoization and probabilistic choice in probabilistic programming, respectively. We will further see that an implementation based on the concepts of functional logic programming can have benefits with respect to performance compared to a standard list-based implementation and can even compete with full-blown probabilistic programming languages, which we illustrate by several benchmarks. Under consideration in Theory and Practice of Logic Programming (TPLP).


Introduction
The probabilistic programming paradigm allows the succinct definition of probabilistic processes and other applications based on probability distributions, for example, Bayesian networks as used in machine learning. A Bayesian network is a visual, graph-based representation for a set of random variables and their dependencies. One of the hello world-examples of Bayesian networks is the influence of rain and a sprinkler on wet grass. Figure 1 shows an instance of this example, the concrete probabilities differ between publications. A node in the graph represents a random variable, a directed edge between two nodes represents a conditional dependency. Each node is annotated with a probability function represented as a table. The input values are on the left side of the table and the right side of the table describes the possible output and the corresponding probability. The input values of the function correspond to the incoming edges of that node. For example, the node for sprinkler depends on rain, thus, the sprinkler node has an incoming edge that originates from the rain node. The input parameter rain appears directly in the table that describes the probability function for sprinkler. For the example in Figure 1 the interpretation of the graph reads as follows: it rains with a probability of 20 %; depending on the rain, the probability for an activated sprinkler is 40 % and 1 %, respectively; depending on both these factors, the grass can be observed as wet with a probability of 0 %, 80 %, 90 % or 99 %. The network can answer the following exemplary questions.
• What is the probability that it is raining?
• What is the probability that the grass is wet, given that it is raining?
• What is the probability that the sprinkler is on, given that the grass is wet?
The general idea of probabilistic programming has been quite successful. There are a variety of probabilistic programming languages supporting all kinds of programming paradigms. For example, the programming languages Church (Goodman et al. 2008) and Anglican  are based on the functional programming language Scheme, ProbLog (Kimmig et al. 2011) is an extension of the logic programming language Prolog, Probabilistic C (Paige and Wood 2014) is based on the imperative language C, and WebPPL (Goodman and Stuhlmüller 2014), the successor of Church, is embedded in a functional subset of JavaScript. Besides full-blown languages there are also embedded domain-specific languages that implement probabilistic programming as a library. For example, FACTORIE (McCallum et al. 2009) is a library for the hybrid programming language Scala, and Erwig and Kollmansberger (Erwig and Kollmansberger 2006) present a library for the functional programming language Haskell. We recommend the survey by Gordon et al. (Gordon et al. 2014) about the current state of probabilistic programming for further information.
This paper presents PFLP, a library providing a domain-specific language for probabilistic programming in the functional logic programming language Curry (Antoy and Hanus 2010). PFLP makes heavy use of functional logic programming concepts and shows that this paradigm is well-suited for implementing a library for probabilistic programming. In fact, there is a close connection between probabilistic programming and functional logic programming. For example, non-deterministic choice and probabilistic choice are similar concepts. Furthermore, the concept of call-time choice as known from functional logic programming coincides with (stochastic) memoization (De Raedt and Kimmig 2013) in the area of probabilistic programming. We are not the first to observe this close connection between functional logic programming and probabilistic programming. For example, Fischer et al. (Fischer et al. 2009) present a library for modeling functional logic programs in the functional language Haskell. As they state, by extending their approach to weighted non-determinism we can model a probabilistic programming language. Besides a lightweight implementation of a library for probabilistic programming in a functional logic programming language, this paper makes the following contributions.
• We investigate the interplay of probabilistic programming with the features of a functional logic programming language. For example, we show how call-time choice and nondeterminism interplay with probabilistic choice. • We discuss how we utilize functional logic features to improve the implementation of probabilistic combinators. • We present an implementation of probability distributions using non-determinism in combination with non-strict probabilistic combinators that is more efficient than an implementation using lists. • We illustrate that the combination of non-determinism and non-strictness with respect to distributions has to be handled with care. More precisely, it is important to enforce a certain degree of strictness in order to guarantee correct results. • In contrast to the conference version of the paper (Dylus et al. 2018) we discuss the usage of partial functions in combination with library functions in more detail, reason about laws for two operations of the library, and present performance comparisons between our library, ProbLog and WebPPL. • Finally, this paper aims at fostering the exchange between the community of probabilistic programming and of functional logic programming. That is, while the connection exists for a long time, there has not been much exchange between the communities. We would like to take this paper as a starting point to bring these paradigms closer together. Thus, this paper introduces the concepts of both, the functional logic and probabilistic programming, paradigms.
Please note that the current state of our library cannot compete against full-blown probabilistic languages or mature libraries for probabilistic programming in terms of features, e.g., the library does not provide any sampling mechanisms. Nevertheless, the library is a good showcase for languages with built-in non-determinism, because the functional logic approach can be superior to the functional approach using lists. Furthermore, we want to emphasize that this paper uses non-determinism as an implementation technique to develop a library for probabilistic programming. That is, we are not mainly concerned with the interaction of non-determinism and probabilities as, for example, discussed in the work of Varacca and Winskel (Varacca and Winskel 2006) and multiple others. The library we develop in this paper does not combine both effects, but provides combinators for probabilistic programming by leveraging Curry's built-in non-strict non-determinism.
In this section we discuss the core of the PFLP library 1 . The implementation is based on a Haskell library for probabilistic programming presented by Erwig and Kollmansberger (Erwig and Kollmansberger 2006). We will not present the whole PFLP library, but only core functions. The paper at hand is a literate Curry file. We use the Curry compiler KiCS2 2 by Braßel et al. (Braßel et al. 2011) for all code examples.

Modeling Distributions
One key ingredient of probabilistic programming is the definition of distributions. A distribution consists of pairs of elementary events and their probability. We model probabilities as Float and distributions as a combination of an elementary event and the corresponding probability. 3 type Probability = Float data Dist a = Dist a Probability In a functional language like Haskell, the canonical way to define distributions uses lists. Here, we use Curry's built-in non-determinism as an alternative for lists to model distributions with more than one event-probability pair. As an example, we define a fair coin, where True represents heads and False represents tails, as follows. 4 coin :: Dist Bool coin = (Dist True 1 2 ) ? (Dist False 1 2 ) In Curry the (?)-operator non-deterministically chooses between two given arguments. Nondeterminism is not reflected in the type system, that is, a non-deterministic choice has type a → a → a. Such non-deterministic computations introduced by (?) describe two individual computation branches; one for the left argument and one for the right argument of (?).
We could also define coin in Prolog-style by giving two rules for coin. Both implementations can be used interchangeably since the (?)-operator is defined in the Prolog-style using two rules as well.
Printing an expression in the REPL 5 evaluates the non-deterministic computations, thus, yields one result for each branch as shown in the following examples. 1 We provide the code for the library at https://github.com/finnteegen/pflp. 2 We use version 0.6.0 of KiCS2 and the source is found at https://www-ps.informatik.uni-kiel.de/kics2/. 3 The polymorph data type Dist is parameterized over a type variable a. It has a single constructor also named Dist that is of type a → Probability → Dist a. The constructor Dist in Curry corresponds to a binary functor in Prolog. 4 Here and in the following we write probabilities as fractions for readability. 5 We visualize the interactions with the REPL using> as prompt.
The REPL computes the values using a breadth-first-search strategy to visualize the results. Due to the search strategy, we observe different outputs when changing the order of arguments to (?). However, because Curry's semantics is set-based (Christiansen et al. 2011) the order of the results does not matter.
It is cumbersome to define distributions explicitly as in the case of coin. Hence, we define helper functions for constructing distributions. 6 Given a list of events and probabilities, enum creates a distribution by folding these pairs non-deterministically with a helper function member. In Curry the constant failed is a silent failure that behaves as neutral element with respect to (?). That is, the expression True ? failed is equivalent to True. Hence, the function member takes a list and yields a non-deterministic choice of all elements of the list.
As a shortcut, we define a function that yields a uniform distribution given a list of events as well as a function certainly, which yields a distribution with a single event of probability one. The function repeat yields a list that contains the given value infinitely often. Because of Curry's laziness, it is sufficient if one of the arguments of enum is a finite list because zipWith stops when one of its arguments is empty. We can then refactor the definition of coin using uniform as follows.
coin :: Dist Bool coin = uniform [True, False] In general, the library hides the constructor Dist, that is, the user has to define distributions by using the combinators provided by the library.
The library provides additional functions to combine and manipulate distributions. In order to work with dependent distributions, the operator (> > >=) applies a function, which yields a distribution, to each event of a given distribution and multiplies the corresponding probabilities. 8 Intuitively, we have to apply the function f to each event of the distribution d and combine the resulting distributions into a single distribution. In a Haskell implementation, we would use a list comprehension to define this function. In the Curry implementation, we model distributions as non-deterministic computations, thus, the above rule describes the behavior of the function for an arbitrary pair of the first distribution and an arbitrary pair of the second distribution, that is, the result of f .
Using the operator (> > >=) we can, for example, define a distribution that models flipping two coins. The events of this distribution are pairs whose first component is the result of the first coin flip and whose second component is the result of the second coin flip.
In contrast to the example independentCoins we can also use the operator (> > >=) to combine two distributions where we choose the second distribution on basis of the result of the first. For example, we can define a distribution that models flipping two coins, but in this case we only flip a second coin if the first coin yields heads. The implementation of (> > >=) via let-bindings seems a bit tedious, however, it is important that we define (> > >=) as it is. The canonical implementation performs pattern matching on the first argument but uses a let-binding for the result of f . That is, it is strict in the first argument but non-strict in the application of f , the second argument. For now it is sufficient to note -and keep in mind -that there is a difference between pattern matching and using let-bindings. In order to understand this difference, let us consider the following implementation of fromJustToList and an alternative implementation fromJustToListLet. 9 fromJustToList :: The second implementation, fromJustToListLet, is less strict, because it yields a list constructor, (:), without evaluating its argument first. That is, we can observe the difference when passing failed and checking if the resulting list is empty or not. be evaluated and, thus, the function null propagates failed as result. In contrast, the definition of fromJustToListLet postpones the evaluation of its argument to the right-hand side, i.e., the argument needs to be evaluated only if the computation demands the value x explicitly. The function null does not demand the evaluation of x, because it only checks the surrounding list constructor.
The same strictness property as for fromJustToList holds for a definition via explicit pattern matching using case ... of. In particular, pattern matching of the left-hand side of a rule desugars to case expressions on the right-hand side.
> null (fromJustToListCase failed) failed We discuss the implementation of (> > >=) in more detail later. For now, it is sufficient to keep in mind that (> > >=) yields a Dist-constructor without evaluating any of its arguments. In contrast, a definition using pattern matching or a case expression needs to evaluate its argument first, thus, is more strict.
For independent distributions we provide the function joinWith that combines two distributions with respect to a given function. We implement joinWith by means of (> > >=).
In a monadic setting this function is sometimes called liftM2. Here, we use the same nomenclature as Erwig and Kollmansberger (Erwig and Kollmansberger 2006).
As an example we define a function that flips a coin n times.
flipCoin :: When we run the example of flipping two coins in the REPL of KiCS2, we get four events.
In the example above, coin is non-deterministic, namely, coin = (Dist True 1 2 ) ? (Dist False 1 2 ). Applying joinWith to coin and coin combines all possible results of two coin tosses.

Querying Distributions
With a handful of building blocks to define distributions available, we now want to query the distribution, that is, calculate the probability of certain events. We provide an operator (??)::(a → Bool) → Dist a → Probability -which we will define shortly -to extract the probability of an event. The event is specified as a predicate, passed as first argument. The operator filters events that satisfy the given predicate and computes the sum of the probabilities of the remaining elementary events. We implement this kind of filter function on distributions in Curry.
The implementation of filterDist is a partial identity on the event-probability pairs. Every event that satisfies the predicate is part of the resulting distribution. The function fails for eventprobability pairs that do not satisfy the predicate. Querying a distribution, i.e., summing up all probabilities that satisfy a predicate, is a more advanced task in the functional logic approach. Remember that we represent a distribution by chaining all event-probability pairs with (?), thus, constructing non-deterministic computations. These non-deterministic computations introduce individual branches of computations that cannot interact with each other. In order to compute the total probability of a distribution, we have to merge these distinct branches. Such a merge is possible by the encapsulation of non-deterministic computations. Similar to the findall construct of the logic language Prolog, in Curry we encapsulate a non-deterministic computation by using a primitive called allValues 10 . The function allValues :: a → {a} operates on a polymorphic -and potentially non-deterministic -value and yields a multiset of all non-deterministic values. In order to work with encapsulated values, Curry provides the function foldValues :: (a → a → a) → a → {a} → a to fold the resulting multiset.
We do not discuss the implementation details behind allValues here. It is sufficient to know that, as a library developer, we can employ this function to encapsulate non-deterministic values and use these values in further computations. However, due to non-transparent behavior in combination with sharing as discussed by Brael et al. (Braßel et al. 2004), a user of the library should not use allValues at all. In a nutshell, inner-most and outer-most evaluation strategies may cause different results when combining sharing and encapsulation.
With this encapsulation mechanism at hand, we can define the query operator (??) as follows.
prob :: First we filter the elementary events by some predicate and project to the probabilities only. Afterwards we encapsulate the remaining probabilities and sum them up. As an example for the use of (??), we may flip four coins and calculate the probability of at least two heads -that is, the list contains at least two True values.
In order to check the result, we calculate the probability by hand. Since there are more events that satisfy the predicate than events that do not, we sum up the probabilities of the events that do not satisfy the predicate and calculate the complementary probability. There is one event where all coins show tails and four events where one of the coins shows heads and all other show tails.

Library Pragmatics
Up to now, we have discussed a simple library for probabilistic programming that uses nondeterminism to represent distributions. In this section we will see that we can highly benefit from Curry-like non-determinism with respect to performance when we compare PFLP's implementation with a list-based implementation. More precisely, when we query a distribution with a predicate that does not evaluate its argument completely, we can possibly prune large parts of the search space. Before we discuss the details of the combination of non-strictness and nondeterminism, we discuss aspects of sharing non-deterministic choices. Finally, we discuss details about the implementation of (> > >=) and why PFLP does not allow non-deterministic events within distributions.

Call-time Choice vs. Run-time Choice
By default Curry uses call-time choice, that is, variables denote single deterministic choices. When we bind a variable to a non-deterministic computation, one value is chosen and all occurrences of the variable denote the same deterministic choice. Often call-time choice is what you are looking for. For example, this slightly modified definition of filterDist makes use of call-time choice.
Due to pattern matching via let-binding the variable d on the right-hand side corresponds to a single deterministic choice for the input distribution, namely, the one that satisfies the predicate and not the non-deterministic computation that was initially passed as second argument to filterDist. Almost as often run-time choice is what you are looking for and call-time choice gets in your way; probabilistic programming is no exception. For example, let us reconsider flipping a coin n times. We parametrize the function flipCoin over the given distribution and define the following generalized function.
replicateDist :: Here, we again use guard syntax in order to distinguish several cases depending on the Boolean expression. When we use this function to flip a coin twice, the result is not what we intended.
Because replicateDist shares the variable d, we only perform a choice once and replicate deterministic choices. In contrast, top-level nullary functions like coin are evaluated every time, thus, exhibit run-time choice, which is the reason why the previously shown flipCoin behaves properly.
In order to implement replicateDist correctly, we have to enforce run-time choice. We introduce the following type synonym and function to model and work with values with run-time choice behavior.
We can now use the type RT to hide the non-determinism on the right-hand side of a function arrow. This way, pick explicitly triggers the evaluation of rt, performing a new choice for every element of the result list.
replicateDist :: In order to use replicateDist with coin, we have to construct a value of type RT (Dist Bool). However, we cannot provide a function to construct a value of type RT that behaves as intended. Such a function would share a deterministic choice and non-deterministically yield two functions, instead of one function that yields a non-deterministic computation. The only way to construct a value of type RT is to explicitly use a lambda abstraction.
Instead of relying on call-time choice as default behavior, we could model Dist as a function and make run-time choice the default in PFLP. In this case, to get call-time choice we would have to use a special construct provided by the library -as it is the case in many probabilistic programming libraries, e.g., mem in WebPPL (Goodman and Stuhlmüller 2014).
On the other hand, ProbLog uses a similar concept to call-time choice, namely, stochastic memoization, which reuses already computed results. That is, predicates that are associated with probabilities become part of the memoized result. If a fair coin flip, for example, already resulted in True, then the probability of all further coin flips that also result in True have probability 1. Due to stochastic memoization the coin is not flipped a second time, but is identified as the same coin as before. Thus, stochastic memoization as used in ProbLog is similar to the extension of tabling in Prolog systems, but adapted to the setting of probabilistic programming that extends predicates with probabilities. Similar to our usage of RT to mimic run-time choice in Curry, we can use a so-called trial identifier, which is basically an additional argument, to circumvent memoization for a predicate like coin in ProbLog. The difference to RT is that the trial identifier needs to be different for each call to the predicate in order to force re-evaluation.
In the end, we have decided to go with the current modeling based on call-time choice, because the alternative would work against the spirit of the Curry programming language. There is a long history of discussions about the pros and cons of call-time choice and run-time choice. It is common knowledge in probabilistic programming (De Raedt and Kimmig 2013) that memoization -that is, call-time choice -has to be avoided in order to model stochastic automata or probabilistic grammars. Similarly, Antoy (Antoy 2005) observes that you need run-time choice to elegantly model regular expressions in the context of functional logic programming languages. Then again, probabilistic languages need a concept like memoization in order to use a single value drawn from a distribution multiple times.

Combination of Non-strictness and Non-determinism
This subsection illustrates the benefits from the combination of non-strictness and non-determinism with respect to performance. More precisely, in a setting that uses Curry-like nondeterminism, non-strictness can prevent non-determinism from being "spawned". Let us consider calculating the probability for throwing only sixes when throwing n dice. First we define a uniform die as follows.
data Side = One | Two | Three | Four | Five | Six die :: Dist Side die = uniform [One, Two, Three, Four, Five, Six] We define the following query by means of the combinators introduced so far. The function all simply checks that all elements of a list satisfy a given predicate; it is defined by means of the Boolean conjunction (∧).
allSix :: Int → Probability allSix n = (all (≡ Six)) ?? (replicateDist n (λ () → die)) Table 1 compares running times 11 of this query for different numbers of dice. The row labeled "Curry ND" lists the running times for an implementation that uses the operator (> > >=). The row "Curry List" shows the numbers for a list-based implementation in Curry, which is a literal translation of the library by Erwig and Kollmansberger. The row labeled "Curry ND!" uses an operator (> > >=!) instead, which we will discuss shortly. Finally, we compare our implementation to the original list-based implementation, which the row labeled "Haskell List" refers to. The table states the running times in milliseconds of a compiled executable for each benchmark as a mean of three runs. Cells marked with "-" take more than one minute.
Obviously, the example above is a little contrived. While the query is exponential in both list versions, it is linear in the non-deterministic setting 12 . To illustrate the behavior of the example above, we consider the following application for an arbitrary distribution dist of type  filterDist (all (≡ Six)) (joinWith (:) (Dist One 1 6 ) dist) This application yields an empty distribution without evaluating the distribution dist. The interesting point here is that joinWith yields a Dist-constructor without inspecting its arguments. When we demand the event of the resulting Dist, joinWith has to evaluate only its first argument to see that the predicate all (≡ Six) yields False. The evaluation of the expression fails without inspecting the second argument of joinWith. Figure 2 illustrates the evaluation in more detail.
In case of the example allSix, all non-deterministic branches that contain a value different from Six fail fast due to the non-strictness. Thus, the number of evaluation steps is linear in the number of rolled dice.

Figure 3. Simplified evaluation illustrating strict non-determinism
We can only benefit from the combination of non-strictness and non-determinism if we define (> > >=) with care. Let us take a look at a strict variant of (> > >=) and discuss its consequences.
This implementation is strict in its first argument as well as in the result of the function application. When we use (> > >=!) to implement the allSix example, we lose the benefit of Curry-like non-determinism. The row in Table 1 labeled "Curry ND!" shows the running times when using (> > >=!) instead of (> > >=). As (> > >=!) is strict, the function joinWith has to evaluate both its arguments to yield a result. Figure 3 shows how the formerly unneeded distribution dist now has to be evaluated in order to yield a value. More precisely, using (> > >=!) causes a complete evaluation of dist. Please note that an implementation that is similar to (> > >=) is not possible in a list-based implementation. The following definition of concatMap is usually used to define the bind operator for lists.
The strict behavior follows from the definition via pattern matching on the list argument. In contrast to (> > >=!) there is, however, no other implementation that is less strict. The pattern matching is inevitable due to the two possible constructors, [ ] and (:), for lists. As a consequence, a list-based implementation has to traverse the entire distribution before we can evaluate the predicate all (≡ Six). The consequence is that the running times of "Haskell List" in Table 1 cannot compete with "Curry ND" when the number of dice increases.
Intuitively, we expect similar running times for "Curry ND!" and "Curry List" as the bind operator for lists has to evaluate its second argument as well -similar to (> > >=!). However, the observed running times do not have the expected resemblance. "Curry ND!" heavily relies on non-deterministic computations, which causes significant overhead for KiCS2. We do not investigate these differences here but propose it as a direction for future research.
Obviously, turning an exponential problem into a linear one is like getting only sixes when throwing dice. In most cases we are not that lucky. For example, consider the following query for throwing n dice that are either five or six.
allFiveOrSix :: Int → Probability allFiveOrSix n = (all (λ s → s ≡ Five ∨ s ≡ Six)) ?? (replicateDist n (λ () → die))  Table 2. Overview of running times of the query allFiveOrSix n in all implementations. Nevertheless, the running time of the non-strict, non-deterministic implementation is much better because we only have to consider two sides -six and five -while we have to consider all sides in the list implementations and the non-deterministic, strict implementation. That is, while the base of the complexity is two in the case of the non-deterministic, non-strict implementation, it is six in all the other cases. As we have observed in the other examples before, we get an overhead in the case of the strict non-determinism compared to the list implementation due to the heavy usage of non-deterministic computations.

Definition of the Bind Operator
In this subsection we discuss our design choices concerning the implementation of the bind operator. We illustrate that we have to be careful about non-strictness, because we do not want to lose non-deterministic results. First, we revisit the definition of (> > >=) introduced in Section 2.
We can observe two facts about this definition. First, the definition yields a Dist-constructor without matching any argument. Second, if neither the event nor the probability of the final distribution is evaluated, the application of the function f is not evaluated either.
We can observe these properties with some exemplary usages of (> > >=). As a reference, we see that pattern matching the Dist-constructor of coin triggers the non-determinism and yields two results.

> (λ (Dist x p) → True) coin True True
In contrast, distributions resulting from an application of (> > >=) behave differently. This time, pattern matching on the Dist-constructor does not trigger any non-determinism.
We observe that the last two examples yield a single result, because the (> > >=)-operator changes the position of the non-determinism. That is, the non-determinism does not reside at the same level as the Dist-constructor, but in the arguments of Dist. Therefore, we have to be sure to trigger all non-determinism when we query distributions. Not evaluating non-determinism might lead to false results when we sum up probabilities. Hence, non-strictness is a crucial property for positive pruning effects, but has to be used carefully.
Consider the following example usage of (> > >=), which is an inlined version of joinWith applied to the Boolean conjunction (∧).
We lose one expected result from the distribution, because (∧) is non-strict in its second argument in case the first argument is False. When the first coin evaluates to False, (> > >=) ignores the second coin and yields False straightaway. In this case, the non-determinism of the second coin is not triggered and we get only three instead of four results. The non-strictness of (∧) has no consequences when using (> > >=!), because the operator evaluates both arguments and, thus, triggers the non-determinism. In the case of projecting to the event we do not care about the missing result. However, when we sum up probabilities, we do not want events to get lost.
When we compute the total probability of a distribution, the result should always be 1.0. However, the query above has only three results and every event has a probability of 0.25, resulting in a total probability of 0.75. Here is the good news. While events can get lost when passing non-strict functions to (> > >=), probabilities never get lost. For example, consider the following application.
Since multiplication is strict, if we demand the resulting probability, the operator (> > >=) has to evaluate the Dist-constructor and its probability. That is, no values get lost if we evaluate the resulting probability. Fortunately, the query operation (??) calculates the total probability of the filtered distributions, thus, evaluates the probability as the following example shows.
All in all, in order to benefit from non-strictness, all operations provided by the library have to use the right amount of strictness, not too much and not too little. For this reason PFLP does not provide the Dist-constructor nor the corresponding projection functions to the user. With this restriction, the library guarantees that no relevant probabilities get lost.

Non-deterministic Events
We assume that all events passed to library functions are deterministic, that is, the library does not support non-deterministic events within distributions. In order to illustrate why this restriction is crucial, we consider an example that breaks this rule.
Curry provides free variables, that is, expressions that non-deterministically evaluate to every possible value of its type. When we revisit the definition of a die, we might be tempted to use a free variable instead of explicitly enumerating all values of type Side.
We can define a free variable of type Side as follows. ] We just use a free variable -the constant side -and pass the probability of each event as second parameter. Now, let us consider the following query.  Figure 4. Evaluation of a distribution that contains a free variable that is not demanded constructors of Side. Then, the definition of (??) directly projects to the probability of die2 and throws away all non-determinism left in Dist side 1 6 . Therefore, we lose probabilities we would like to sum up.
As a consequence for PFLP, non-deterministic events within a distribution are not allowed. If users of the library stick to this rule, it is not possible to misuse the operations and lose nondeterministic results due to non-strictness.
However, one approach to overcome this issue when using enum is to use an alternative stricter implementation. That is, we could easily adapt the strictness behavior of enum in order to allow a more declarative definition of distributions using free variables without affecting the overall advantage leveraged by non-strict functions.

Partial Functions
Besides not using non-determinism for events, users have to follow another restriction. When using the bind operator (> > >=), the second argument is a function of type a → Dist b, that is, constructs a new distribution. As we have discussed before distributions need to sum up to a probability of 1.0, and the distributions we create via (> > >=) are no exception. This restriction is violated if we use partial functions as second argument of (> > >=). Recall the definition coin that describes a uniform distribution of type Bool, and consider the function partialPattern that depends on coin, but maps False to failed. partialPattern :: Dist Bool partialPattern = coin > > >= (λ b → if b then certainly True else failed) Due to the partial pattern matching in partialPattern, the resulting distribution does not sum up to 1.0 anymore, thus, violates the rule for a valid distribution. By performing a query with the predicate const True we can observe this property.
> (const True) ?? partialPattern 0.5 We only allow to filter distributions when a probability is computed using (??), but not in any other situation. In the current implementation this restriction on functions when using (> > >=) is neither statically nor dynamically enforced, but a coding convention that users should keep in mind and follow when working with the library.

Monad Laws
When we comply with the restrictions we have discussed above, the operators (> > >=) and certainly allow us to formulate probabilistic programs as one would expect. However, there is one obvious question that we did not answer yet. We did not check whether the operator (> > >=) together with certainly actually forms a monad as the name of the operator suggests. That is, we have to check whether the following three laws hold for all distributions d and all values x, f , and g of appropriate types.
In the previous subsection we have already observed that the first equality does not hold in general. For example, we have seen that there is a context that is able to distinguish the left-hand from the right-hand side. For instance, while the expression (λ (Dist x p) → True) coin yields True twice, the expression (λ (Dist x p) → True) (coin > > >= certainly) yields True only once. As most Curry semantics are based on sets -and not on multisets, the two sides of the equality would be the same. Notwithstanding, in a multiset semantics the user could still not observe the difference between the two expressions because he does not have access to the Dist-constructor. The user cannot pattern match on a Dist-constructor, but only use the combinator (??) to inspect a distribution.
In order to discuss the validity of the monad laws more rigorously, we apply equational reasoning to check whether the monad laws might fail.
The first monad law Let d :: Dist τ then we reason as follows about the first monad law. That is, the left-hand side is more defined than the right-hand side if d = failed.
Because the user cannot access the Dist-constructor she cannot observe this difference. The user can only compare two distributions by using the querying operator (??). Therefore, in the following we will show that the monad laws hold if we consider a context of the form pred ?? d where pred is an arbitrary predicate. Recall that we defined the operator (??) as follows.
(??) :: (a → Bool) → Dist a → Probability (??) pred d = foldValues (+.) 0.0 (allValues (prob (filterDist pred d))) Fortunately, the monad laws already hold if we consider the context filterDist pred for an arbitrary predicate pred :: a → Bool. Therefore we will show that the following equalities hold for all distributions d, and all values x, pred, f , and g of appropriate types.
In the following we will first show that equation (1) holds. We reason as follows for all distributions d :: Dist τ and predicates pred :: τ → Bool.
The (> > >=)-operator defers the pattern matching to the right-hand side via a let-expression. This so-called lazy pattern matching causes the monad laws to not hold without any context. However, because filterDist introduces a lazy pattern matching via a let-expression as well, observing two distributions via filterDist hides the difference between the two sides of the equation.
The second monad law For the second monad law (2) we reason as follows for all x :: τ 1 , and all f :: τ 1 → Dist τ 2 .
Here we observe the same restrictions as before, for example, if f yields failed for any argument x the equality does not hold. Once again, we consider the context filterDist pred for all pred :: τ 2 → Bool to reason that the user cannot observe the difference.
Fortunately, the second monad law holds as well in the context of filterDist.
The third monad law We do not discuss the third monad law (3) in detail here, as it holds without restrictions -even without the additional context using filterDist. All in all, certainly and (> > >=) form a valid monad from the user's point of view.

Applications and Evaluation
After presenting the basic combinators of the library and motivating the advantages of modeling distributions using non-determinism, we will implement some exemplary applications. We reimplement examples that have been characterized as challenging for probabilistic logic programming by Nampally and Ramakrishnan (Nampally and Ramakrishnan 2015), who use the examples to discuss the expressiveness of probabilistic logic programming and its cost with respect to performance. The examples focus on properties of random strings and their probabilities. Furthermore, we show benchmarks of these examples and compare them with the probabilistic languages ProbLog (Kimmig et al. 2011) and WebPPL (Goodman and Stuhlmüller 2014). These comparisons confirm the advantages of non-strict non-determinism with respect to performance.

Random Strings
In order to compare our library with other approaches for probabilistic programming we reimplement two examples about random strings that have also been implemented in ProbLog. This implementation can be found online. 13 We generate random strings of a fixed length over the alphabet {a, b} and calculate the probability that this string a) is a palindrome and b) contains the subsequence bb.
First we define a distribution that picks a character uniformly from the alphabet {a, b}.
pickChar :: Dist Char pickChar = uniform ['a', 'b'] Based on pickChar we define a distribution that generates a random string of length n, that is, picks a random char n times. We reuse replicateDist to define this distribution.
randomString :: Int → Dist String randomString n = replicateDist n (λ () → pickChar) In order to compute the probability that a random string is a palindrome and contains a subsequence bb, respectively, we define predicates that test these properties for a given string. A string is a palindrome, if it reads the same forwards and backwards. The following predicate, thus, checks if the reverse of a given string is equal to the original string.
palindrome :: String → Bool palindrome str = str ≡ reverse str The predicate that checks if a string contains two consecutive bs can be easily defined via pattern matching and recursion.
consecutiveBs :: String → Bool consecutiveBs str = case str of Now we are ready to perform some queries. What is the probability that a random string of length 5 is a palindrome?
> palindrome ?? (randomString 5) 0.25 What is the probability that a random string of length 10 contains two consecutive bs?
> consecutiveBs ?? (randomString 10) 0.859375 In general the above definitions of palindrome and consecutiveBs are quite naive and, thus, inefficient because all strings of the given length have to be enumerated explicitly. Due to the inefficiency, the ProbLog homepage introduces a more efficient version for both problems. In the following, we will discuss the alternative implementation to compute the probability for a palindrome only. This more efficient version has arguments for the index of the front and back position, picks characters for both ends and then moves the position towards the middle. That is, instead of naively generating the whole string of length n, this version checks each pair of front and back position first and fails straightaway, if they do not match. If the characters do match, the approach continues by moving both indices towards each other. In Curry an implementation of this idea looks as follows.
palindromeEfficient :: Int → Dist (Bool, String) palindromeEfficient n = palindrome ′ 1 n palindrome ′ :: The interesting insight here is that, thanks to the combination of non-determinism and nonstrictness, the evaluation of the first query based on palindrome behaves similar to the efficient variant in ProbLog. At first, it seems that the query performs poorly, because the predicate palindrome needs to evaluate the whole list due to the usage of reverse. The good news is, however, that the non-determinism is only spawned if we evaluate the elements of that list, and the elements still evaluate non-strictly, when explicitly triggered by (≡). More precisely, because of the combination of reverse and (≡), the evaluation starts by checking the first and last characters of a string and only continues to check more characters, and spawn more non-determinism, if they match. If these characters do not match, the evaluation fails directly and does not need to check any more characters. In a nutshell, when using PFLP, we get a version competitive with efficient implementations although we used a naive generate and test approach.

Performance Comparisons with Other Languages
Up to now, the only performance comparisons we discussed were for different implementations of our library in Curry and Haskell. These comparisons showed the advantage of using non-strict non-determinism concepts for the implementation of the library. Next we want to take a look at the comparison with the full-blown probabilistic programming languages ProbLog and WebPPL. ProbLog is a probabilistic extension of Prolog that is implemented in Python. WebPPL is the successor of Church; in contrast to Church it is not implemented in Scheme but in JavaScript. In order to try to measure the execution of the programs only, we precompiled the executable for the Curry programs. As Python is an interpreted language, a similar preparation was not available for ProbLog. However, we used ProbLog as a library in order to call the Python 14 interpreter directly. ProbLog is mainly implemented in Python, which allows users to import ProbLog as a Python package. 15 For WebPPL, we used node.js 16 to run the JavaScript program as a terminal application. All of the following running times are the mean of 1000 runs as calculated by the Haskell tool bench 17 that we use to run the benchmarks.
We compare the running times based on the two examples we already discussed: the dice rolling example presented in Subsection 3.2 and the palindrome example from the previous subsection.
Dice Rolling As discussed before, non-strict non-determinism performs pretty well for the dice rolling example, as a great deal of the search space is pruned early. Figure 5 shows an impressive advantage of our library in comparison with ProbLog and WebPPL. The x-axis represents the number of rolled dice and we present the time in milliseconds in logarithmic scale on the y-axis.
In order to demonstrate that our library outperforms ProbLog and WebPPL by several orders of magnitude for this example, we also run the Curry implementation for bigger values of n that eventually had the same running time as the last tested value for the other languages. The right part of Figure 5 shows the running times for 25 to 5000 dice. We can see that our library can compute the probability for getting only sixes for 2500 dice in roughly the same time as ProbLog for 5 dice. The running times for WebPPL seem very bad in the beginning, but after a few throws it becomes obvious that there is a constant overhead. In fact, Nogatz et al. (Nogatz et al. 2018) observe and discuss this overhead as well. Nevertheless, whereas WebPPL computes the probability for 9 dice, our library can compute the probability for 2500 dice in roughly the same time.
Palindrome In order to back up the results of the previous example, Figure 6 shows benchmarks for implementations of the naive and the efficient versions in Curry, ProbLog and WebPPL. The x-axis represents the length of the generated palindrome and, once again, we present the time in milliseconds in logarithmic scale on the y-axis. The figure uses dashed bars for the efficient version of the algorithm and a solid filling for the naive algorithm. The naive algorithm scales pretty bad in ProbLog and WebPPL. The Curry version is still applicable up to 30 as its running time is similar to all three efficient versions. Overall, the efficient versions all perform in a similar time range, but WebPPL shows a slight performance advantage for an increasing length of the string. More precisely, the efficient WebPPL implementation performs a query for strings of length 50 in the same time as the efficient Curry and ProbLog perform a query for strings of length 40. That is, the efficient WebPPL implementation outperforms the other implementations by roughly two orders of magnitude.

Related and Future Work
The approach of this paper is based on the work by Erwig and Kollmansberger (Erwig and Kollmansberger 2006), who introduce a Haskell library that represents distributions as lists of eventprobability pairs. Their library also provides a simple sampling mechanism to perform inference on distributions. Inference algorithms come into play because common examples in probabilistic programming have an exponential growth and it is not feasible to compute the whole distribution. Similarly,Ścibior et al. (Ścibior et al. 2015) present a more efficient implementation using a DSL in Haskell. They represent distributions as a free monad and inference algorithms as an interpretation of the monadic structure. Thanks to this interpretation, the approach is competitive to full-blown probabilistic programming languages with respect to performance. PFLP provides functions to sample from distributions as well. However, in this work we focus on modeling distributions and do not discuss any sampling mechanism. In particular, as future work we plan to investigate whether we can benefit from the improved performance as presented here in the case of sampling. Furthermore, a more detailed investigation of the performance of non-determinism in comparison to a list model is a topic for another paper.
The benefit with respect to the combination of non-strictness and non-determinism is similar to the benefit of property-based testing using Curry-like non-determinism in Haskell (Runciman et al. 2008) and Curry (Christiansen and Fischer 2008). In property-based testing, sometimes we want to generate test cases that satisfy a precondition. With Curry-like non-determinism the precondition can prune the search space early, while a list-based implementation has to generate all test cases and filter them afterwards. Both applications, probabilistic programming and propertybased testing, are examples, where built-in non-determinism outperforms list-based approaches as introduced by Wadler (Wadler 1985). In comparison to property-based testing, here, we observe that we can even add a kind of monadic layer on top of the non-determinism that computes additional information and still preserve the demand-driven behavior. However, the additional information has to be evaluated strictly -as it is the case for probabilities, otherwise we might lose non-deterministic results.
There are other more elaborated approaches to implement a library for probabilistic programming. For example, Kiselyov and Shan (Kiselyov and Shan 2009) extend their library for probabilistic programming in OCaml with a construct for lazy evaluation to achieve similar behavior with respect to efficiency. However, they use lazy evaluation for a concrete application based on importance sampling. Due to the combination of non-strictness and non-determinism, we can efficiently calculate the total probability of the resulting distribution without utilizing sampling.
As future work, we see a high potential for performance improvements for the Curry compiler KiCS2. PFLP serves as a starting point for further studies of functional logic features in practical applications. For example, we would expect the running times of the strict implementation based on non-determinism to be approximately as efficient as a list-based implementation. However, as the numbers in Section 3 show, the list approach is considerably faster.
The library's design does not support the use of non-determinism in events or probabilities of a distribution. In case of deeper non-determinism, we have to be careful to trigger all nondeterminism when querying a distribution as shown in Section 3. Hence, the extension of the library with an interface using non-determinism on the user's side is an idea worth studying.
Last but not least, we see an opportunity to apply ideas and solutions of the functional logic paradigm in probabilistic programming. For instance, Christiansen et al. (Christiansen et al. 2010) investigate free theorems for functional logic programs. As their work considers nondeterminism and sharing, adapting it to probabilistic programming should be easy. As another example, Braßel (Braßel 2009) presents a debugger for Curry that works well with non-determinism. Hence, it should be possible to reuse these ideas in the setting of probabilistic programming as well.

Conclusion
We have implemented a simple library for probabilistic programming in a functional logic programming language, namely Curry. Such a library proves to be a good fit for a functional logic language, because both paradigms share similar features. While other libraries need to reimplement features specific to probabilistic programming, we solely rely on core features of functional logic languages.
The key idea of the library is to use non-determinism to model distributions. We discussed design choices as well as the disadvantages and advantages that result from this approach. In the end, the library provides non-strict probabilistic combinators in order to avoid spawning unnecessary non-deterministic computations. These non-strict combinators have benefits in terms of performance due to early pruning. Using combinators that are too strict leads to a loss of these performance benefits. Fortunately, the user does not have to worry about using the right amount of strictness as long as she only uses the provided combinators.
There are, however, two restrictions the user has to follow when using the library. If the user does not follow these restrictions, a program may behave unexpectedly. Events may not be nondeterministic and the second argument of (> > >=)-operator may not be partial. Notwithstanding, we want to emphasize that the restrictions do not affect expressibility. In fact, a programming language like ProbLog shows similar behavior when mixing non-determinism and probabilities as our implementation.
Last but not least, we showed that the library obeys the expected monad laws with respect to observational behavior and reimplemented examples from the probabilistic programming literature to compare the performance of our library with other existing languages.