Level-p-complexity of Boolean functions using thinning, memoization, and polynomials

Abstract. This paper describes a purely functional library for computing the level-p-complexity of Boolean functions, and applies it to two-level iterated majority. Boolean functions are simply functions from n bits to one bit; they can describe digital circuits, voting systems, etc. An example of a Boolean function is majority, which returns the value that has majority among the n input bits for odd n. The complexity of a Boolean function f measures the cost of evaluating it: how many bits of the input are needed to be certain about the result of f. There are many competing complexity measures, but we focus on level-p-complexity, a function of the probability p that a bit is 1. The level-p-complexity $D_p(f)$ is the minimum expected cost when the input bits are independent and identically distributed with Bernoulli(p) distribution. We specify the problem as the minimum expected cost over all possible decision trees, which translates directly to a clearly correct, but very inefficient, implementation. The library uses thinning and memoization for efficiency, and type classes for separation of concerns. The complexity is represented using (sets of) polynomials, and the order relation used for thinning is implemented using polynomial factorization and root counting. Finally, we compute the complexity for two-level iterated majority and improve on an earlier result by J. Jansson.


Introduction
Imagine a voting system with yes/no options, for example direct democracy, indirect democracy, or dictatorship. How much information about the votes do we need before we can conclude the outcome of the election? For dictatorship, we only need the vote of the dictator, as he or she has all the power, but for a democratic majority we need at least half the votes. Depending on the order in which we find out what the votes are, we might need all of them before we can conclude the result. More generally, this question is about the complexity of Boolean functions, which is the application area of this paper.
Boolean functions are widespread in mathematics and computer science and can describe yes-no voting systems, hardware circuits, and predicates (O'Donnell, 2014). We are interested in the cost of evaluating Boolean functions, and we use binary decision trees to describe the evaluation order of Boolean functions. The depth of the decision tree corresponds to the number of votes needed to know the outcome for certain. This is called deterministic complexity. Another well-known notion is randomized complexity, and the randomized complexity bounds of iterated majority have been studied in Landau et al. (2006), Leonardos (2013) and Magniez et al. (2016). Iterated majority on two levels corresponds to a two-level indirect voting system of the kind mentioned above. We are particularly interested in this function due to its symmetry and simplicity, but its complexity is still non-trivial.
Diving into the literature on the complexity of Boolean functions, we find many different measures. Relevant concepts are certificate complexity, degree of a Boolean function, and communication complexity (Buhrman and De Wolf, 2002). Complexity measures related specifically to circuits are circuit complexity and additive and multiplicative complexity (Wegener, 1987). Considering Boolean computation in practice, we have combinational complexity, which is the length of the shortest Boolean chain computing the function (Knuth, 2012). Thus, there are many competing complexity measures, but we focus on level-p-complexity, a function of the probability p that a bit is 1 (Garban and Steif, 2014). We assume that the bits are independent and identically Bernoulli-distributed with parameter p ∈ [0, 1]. Then, for each Boolean function f and probability p, we get the level-p-complexity by minimizing the expected cost over all decision trees. The level-p-complexity is a piecewise polynomial function of p and has many interesting properties (Jansson, 2022).

Contributions
This paper presents a purely functional library for computing the level-p-complexity of Boolean functions in general, and of $maj_3^2$ in particular. The level-p-complexity of $maj_3^2$ was conjectured in Jansson (2022), but could not be proven because it was hard to generate all possible decision trees. This paper fills that gap, by showing that the conjecture is false and by computing the true level-p-complexity of $maj_3^2$. The strength of our implementation is that it can calculate the level-p-complexity of Boolean functions quickly and correctly, compared to tedious calculations by hand. Our specification uses exhaustive search and considers all possible candidates (decision trees). Some partial candidates dominate (many) others, which may then be discarded. Thinning (Bird and Gibbons, 2020) is an algorithmic design technique which maintains a small set of partial candidates which provably dominate all other candidates. We hope that one contribution of this paper is an interesting example of how a combination of algorithmic techniques can be used to make the intractable tractable. The code in this paper is available on GitHub and uses packages from Jansson et al. (2022). The implementation is in Haskell but should work also in other languages, and parts of it have been reproduced in Agda to check some of the stronger invariants. The choice of Haskell for the implementation is due to its strong compiler and the availability of libraries for BDDs, memoization, and polynomials.

Motivation
To give the flavour of the end result we start with two examples which will be explained in detail later: the level-p-complexity of the 2-level iterated majority $maj_3^2$ and of a 5-bit function we call $sim_5$, defined in Fig. 1.1. The level-p-complexity is a piecewise polynomial function of the probability p, and $sim_5$ is the smallest-arity Boolean function we have found which has more than one polynomial piece contributing to the complexity. Polynomials are represented by their coefficients: for example, P [5, −8, 8] represents $5 − 8x + 8x^2$. The function genAlgThinMemo uses thinning and memoization to generate a set of minimal cost polynomials. The graph in Fig. 1.1 shows that different polynomials dominate in different intervals. The polynomial $P_1$ is best near the end-points, but $P_4$ is best near p = 1/2 (despite being really bad near the end-points). The level-p-complexity is the piecewise polynomial minimum, a combination of $P_1$ and $P_4$. This computation can be done by exhaustive search over the 54192 different decision trees and the 39 resulting polynomials, but for more complex Boolean functions the doubly exponential growth makes that impractical.
For our running example, $maj_3^2$, a crude estimate indicates we would have $10^{111}$ decision trees to search and very many polynomials. Thus the computation would be intractable if it were not for the combination of thinning, memoization, and symbolic comparison of polynomials. Thanks to symmetries in the problem there turns out to be just one dominating polynomial: P [4, 4, 6, 9, −61, 23, 67, −64, 16]. The graph, shown later in Fig. 4.2, shows that only 4 bits are needed in the limiting cases of p = 0 or 1, and that just over 6 bits are needed at the maximum at p = 1/2.

Background
To explain what level-p-complexity of Boolean functions means, we introduce some background about Boolean functions, decision trees, cost, and complexity. The Boolean input type B could be {False, True}, {F, T} or {0, 1}, and from now on we use 0 for false and 1 for true in our notation. In the running text we write e : t for "e has type t", which in the quoted Haskell code is written e :: t.

Boolean functions
A Boolean function $f : B^n \to B$ is a function from n Boolean inputs to one Boolean output. We sometimes write BoolFun n for the type $B^n \to B$. The easiest examples of Boolean functions are the functions $const_n\,b$ which ignore the n input bits and return b. The usual logical gates like $and_n$ and $or_n$ are very common Boolean functions. Another example is the dictator function (also known as first projection), which is defined as $dict_{n+1}\,[x_0, \ldots, x_n] = x_0$ when the dictator is bit 0.
A naive representation of a Boolean function could be a pair of an arity and a function f : [B] → B, but that turns out to be inefficient when we want to compare and tabulate them (see Section 3.3). Instead we use Binary Decision Diagrams (BDDs) (Bryant, 1986) as implemented in Masahiro Sakai's excellent Hackage package. The package reimplements all the usual Boolean operations on what are semantically expressions in n Boolean variables. BDDs are an efficient way of representing Boolean functions, and they can be used for testing, verification and complexity analysis. For readability, we will present Boolean functions in the naive representation, but the actual code uses the type BDD a from the BDD package (where a keeps track of variable ordering). Note that we only use BDDs to represent our Boolean functions, not our decision trees.
In the complexity computation, we only need two operations on Boolean functions, which we capture in the following type class interface:

    class BoFun bf where
      isConst :: bf -> Maybe B
      setBit  :: Index -> B -> bf -> bf

The use of a type class here means we keep the interface to the BDD implementation minimal, which makes proofs easier and gives better feedback from the type system. The first method, isConst f, returns Just b iff the function f is constant and always returns b : B. The second method, setBit i b f, restricts a Boolean function (on n + 1 bits) by setting its ith bit to b. The result is a "subfunction" on the remaining n bits, abbreviated $f_i^b$, and illustrated in Fig. 2.1. As an example, for the function $and_2$ we have that setBit i 0 $and_2$ = $const_1$ 0 and setBit i 1 $and_2$ = id. For $and_2$ we get the same result for i = 0 or 1, but for the dictator function it depends on whether we pick the dictator index (0) or not. We get setBit 0 b $dict_{n+1}$ = $const_n$ b, because the result is dictated by bit 0. Otherwise, we get setBit (i + 1) b $dict_{n+1}$ = $dict_n$, irrespective of the value of b, since only the value of the dictator bit matters. This behaviour is shown in Fig. 2.2.

Fig. 2.1: The tree of subfunctions of a Boolean function f. This tree structure is also the call-graph for our generation of decision trees. Note that this tree structure is related to, but not the same as, the decision trees.

Fig. 2.2: The tree of subfunctions of the $dict_{n+1}$ function.
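To make the interface concrete, here is a minimal sketch of a BoFun instance for the naive pair representation from Section 2.1. The name NaiveFun and the synonyms B and Index are illustrative assumptions; the library's actual instance is for BDDs.

    type B     = Bool   -- we write 0/1 for False/True in the text
    type Index = Int

    -- Naive representation: an arity n and a semantic function on bit lists.
    data NaiveFun = NaiveFun Int ([B] -> B)

    instance BoFun NaiveFun where
      -- Decide constancy by brute force over all 2^n inputs.
      isConst (NaiveFun n f)
        | and outs     = Just True
        | all not outs = Just False
        | otherwise    = Nothing
        where outs = map f (sequence (replicate n [False, True]))
      -- Fix bit i to b: reinsert b at position i before applying f.
      setBit i b (NaiveFun n f) =
        NaiveFun (n - 1) (\xs -> f (take i xs ++ [b] ++ drop i xs))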

Decision trees
Consider a decision tree that picks the n bits of a Boolean function f in a deterministic order depending on the values of the bits picked further up the tree. Decision trees are referred to as algorithms in (Landau et al., 2006; Garban and Steif, 2014; Jansson, 2022). Given a Boolean function f, a decision tree t describes one way to evaluate the function f. The Haskell datatype is as follows:

    data DecTree = Res B | Pick Index DecTree DecTree
      deriving (Eq, Ord, Show)

Part of the "rules of the game" in the mathematical literature is that you must return a Result if the function is constant, and you may only Pick an index once. We can capture most of these rules with a type family version of the DecTree datatype, expressed in Agda syntax. Here we use two type indices: t : DecTree n f is a decision tree for the Boolean function f, of arity n. The Res constructor may only be used for constant functions (but for any arity), while Pick i takes two subtrees for Boolean functions of arity n to a tree of arity suc n = n + 1. A sketch, where the side condition that f is constantly b is represented by an assumed predicate IsConst:

    data DecTree : (n : ℕ) → BoolFun n → Set where
      Res  : {n : ℕ} {f : BoolFun n} (b : B) → IsConst b f → DecTree n f
      Pick : {n : ℕ} {f : BoolFun (suc n)} (i : Index (suc n))
           → DecTree n (setBit i 0 f) → DecTree n (setBit i 1 f)
           → DecTree (suc n) f

Note that the dependently typed version of setBit clearly indicates that the resulting function g = setBit i b f : BoolFun n has arity one less than that of f : BoolFun (suc n). This helps maintain the invariant that each input bit may only be picked once. We use the Haskell versions, but the Agda versions capture the invariants better. We can use these rules backwards to generate all possible decision trees for a given function. If the function is constant, returning b : B, we immediately know that the only decision tree allowed is Res b. If it is not constant, we pick any index i, any decision tree $t_0$ for the subfunction setBit i 0 f, and any $t_1$ for the subfunction setBit i 1 f, recursively. We get back to this in Section 3.1 after some preparation.
Note that we do not use binary decision diagrams (BDDs) to represent our decision trees. An example of a decision tree for the majority function $maj_3$ on three bits is defined by the expression ex1, visualised in Fig. 2.3.
We will define several functions as folds over DecTree, and to do that we introduce a type class TreeAlg (for "Tree Algebra") which collects the two methods res and pic, which are then used in the fold to replace the constructors Res and Pick:

    class TreeAlg a where
      res :: B -> a
      pic :: Index -> a -> a -> a

The TreeAlg class is used to define our decision trees but also for several other purposes. (In the implementation we additionally require some total order on a to enable efficient set computations.) We see that our decision tree type is the initial algebra of TreeAlg, and we can reimplement a generic version of ex1 which can be instantiated at any TreeAlg instance:

    ex1 = pic 0 (pic 2 (res 0) (pic 1 (res 0) (res 1)))
                (pic 1 (pic 2 (res 0) (res 1)) (res 1))
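For instance, the decision tree type itself is a TreeAlg instance simply by using its constructors (a minimal sketch, reflecting that DecTree is the initial algebra):

    instance TreeAlg DecTree where
      res = Res
      pic = Pick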

Expected Cost
For a function f and a specific input xs : $B^n$, the cost of evaluating f according to a decision tree t is the length of the path from root to leaf dictated by the bits in xs. We then let the bits be independent and identically distributed, each being 1 with probability p ∈ [0, 1], and compute the expected cost (averaging over all $2^n$ inputs). Expected cost can be implemented as an instance of TreeAlg. Note that the expected cost of any decision tree for a Boolean function of n bits will always be a polynomial. We represent polynomials as lists of coefficients:

    newtype Poly a = P [a]

Here zero = P [ ] and one = P [1] represent const 0 and const 1 respectively, while xP = P [0, 1] is "the polynomial x". For pickPoly $q_0$ $q_1$ we first have to pick one bit, and then if this bit is 0 (with probability $P(x_i = 0) = 1 − p$) we get $q_0$, which is the polynomial for this case. If the bit is instead 1 (with probability $P(x_i = 1) = p$) we get $q_1$. The expected cost of the decision tree ex1 is $2 + 2p − 2p^2$. From now on we will use Haskell's overloading to write 0 and 1 for zero and one even when working with polynomials.
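The following is a minimal, self-contained sketch of the expected-cost algebra just described, specialised to Rational coefficients; the helper names addP, scaleP and shiftP are ours, and the library's Ring-class-based code differs in detail.

    newtype Poly = P [Rational] deriving (Eq, Ord, Show)

    -- Coefficient-wise addition (the longer list's tail is kept).
    addP :: Poly -> Poly -> Poly
    addP (P as) (P bs) = P (go as bs)
      where go (a:as') (b:bs') = (a + b) : go as' bs'
            go as'     []      = as'
            go []      bs'     = bs'

    scaleP :: Rational -> Poly -> Poly      -- multiply by a constant
    scaleP c (P cs) = P (map (c *) cs)

    shiftP :: Poly -> Poly                  -- multiply by x
    shiftP (P cs) = P (0 : cs)

    -- pickPoly q0 q1 = 1 + (1 - x)*q0 + x*q1 = 1 + q0 + x*(q1 - q0)
    pickPoly :: Poly -> Poly -> Poly
    pickPoly q0 q1 = P [1] `addP` q0 `addP` shiftP (q1 `addP` scaleP (-1) q0)

    -- Expected cost as a tree algebra (class and synonyms as above):
    -- a leaf costs nothing, and the picked index does not affect the cost.
    instance TreeAlg Poly where
      res _ = P []
      pic _ = pickPoly

Folding ex1 with this algebra reproduces the expected cost stated above: the result is P [2, 2, −2], that is, $2 + 2p − 2p^2$.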

Complexity
Now that we have introduced expected cost, we can introduce the level-p-complexity $D_p(f)$ as the pointwise minimum of the expected cost over all of f's decision trees:

    $D_p(f)$ = minimum { evalP (expCost t) p | t a decision tree for f }

where the generation of decision trees is explained in Section 3.1. When minimizing we do not necessarily get a polynomial, but a piecewise polynomial function. For simplicity we represent a piecewise polynomial function as a set of polynomials:

    type PPoly a = Set (Poly a)

    evalPP :: (Ring a, Ord a) => PPoly a -> (a -> a)
    evalPP qs p = minimum (map (λq → evalP q p) qs)

This representation would be inefficient if the set were big, but as a specification it works fine, and we will later use thinning to keep the set small (see Sections 3.2 and 3.4). We say that one polynomial q is "uniformly worse" than another polynomial p when p x ⩽ q x for all 0 ⩽ x ⩽ 1 and p x < q x for some 0 < x < 1. For some polynomials we cannot determine which is worse; see Fig. 1.1, where four polynomials all intersect. In this case, they are incomparable.
When computing the level-p-complexity it would be possible to take both f and the probability p as arguments and return the smallest expected cost for that probability, but we prefer to just take f as an argument and compute a piecewise polynomial function representation.In this way we can analyse the result symbolically to find minima, maxima, number of polynomial pieces, etc.

Examples of Boolean functions and their costs
Now that we have introduced expected cost and level-p-complexity, we give a few examples of Boolean functions and their costs to give a feeling of how the computations work. The impatient reader can skip forward to Section 3. As mentioned earlier (in Section 2.1), we present the Boolean functions as Haskell functions for readability, but every example has a BDD counterpart.
For the constant functions ($const_n$ b), there is just one legal decision tree t = Res b, and thus expCost t = 0, which gives $D_p(const_n\,b) = 0$. For the dictator function, there are many decision trees, but as we can see in Fig. 2.2, picking bit 0 first is optimal and gets us to the constant case just covered. Thus the optimal tree is optTree = Pick 0 (Res 0) (Res 1), and we can compute the expected cost as follows: expCost optTree = 1 + (1 − p) · 0 + p · 0 = 1, so that $D_p(dict_{n+1}) = 1$.
The parity function $par_n$ returns 1 iff the number of 1-bits among its n inputs is odd. In this case all bits have to be picked to determine the parity, regardless of input. We prove that for all decision trees t of $par_n$ or ¬ $par_n$ we have that expCost t = n, using induction over n. For the base case n = 0 we have that $par_0 = const_0\,0$ and ¬ $par_0 = const_0\,1$, so that expCost t = 0 for all decision trees t, as shown above. For the induction step we assume that for all decision trees t of $par_n$ or ¬ $par_n$ we have that expCost t = n, and show that for all decision trees t of $par_{n+1}$ or ¬ $par_{n+1}$ we have that expCost t = n + 1. Any decision tree for $par_{n+1}$ or ¬ $par_{n+1}$ is of the form Pick i $t_0$ $t_1$ where $t_0$ and $t_1$ are decision trees for $par_n$ or ¬ $par_n$, as seen in Fig. 2.4.

To calculate the expected cost we get

    expCost (Pick i t0 t1) = 1 + (1 − p) · expCost t0 + p · expCost t1
                           = 1 + (1 − p) · n + p · n
                           = n + 1

Thus the induction proof is complete, and as expCost t = n for all decision trees, the minimum is also n, thus $D_p(par_n) = n$. Comparing Fig. 2.2 with Fig. 2.4, we see that the minimum depth of the dictator tree is 1, while the minimum depth of the parity tree is n. The parity function and the constant function are interesting extreme cases of Boolean functions, as they have the highest and lowest possible level-p-complexity: n and 0. Either all bits have to be picked to determine the parity, or none of them need to be picked to determine the constant function.

Fig. 2.4: The recursive structure of the parity function ($par_n$). The pattern repeats all the way down to $par_0 = const_0\,0$.
We now introduce the Boolean function same, which checks if all bits are equal:

    same :: Bⁿ → B
    same bs = and bs ∨ ¬ (or bs)

Using same we construct the example $sim_5$ from the introduction. We first split the bits into two groups, one with the first three bits and one with the last two bits. On the first group, called as, we check if the bits are not all the same, and on the second group, called cs, we check if the bits are all the same. The point of this function is to illustrate a special case where the best decision tree depends on p, so that the level-p-complexity consists of more than one polynomial piece. This computation is shown in Section 4.1. One of the major goals of this paper was to calculate the level-p-complexity of the 9-bit iterated majority, called $maj_3^2$. When extending the majority function to $maj_3^2$, we use $maj_3$ inside $maj_3$:

    maj2_3 :: B⁹ → B
    maj2_3 bs = maj3 [maj3 bs0, maj3 bs1, maj3 bs2]
      where (bs0, rest) = splitAt 3 bs
            (bs1, bs2)  = splitAt 3 rest

where $maj_n$ (for odd n) returns 1 iff more than half of its n input bits are 1. It is hard to calculate $D_p(maj_3^2)$ by hand because there are very many different decision trees, and this motivated our Haskell implementation.
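A small runnable sketch of these definitions on Bool lists (our own simplified forms; the library versions work on BDDs):

    -- Majority of an odd number of bits: more than half are True.
    maj :: [Bool] -> Bool
    maj bs = 2 * length (filter id bs) > length bs

    -- Two-level iterated majority on 9 bits.
    maj2_3 :: [Bool] -> Bool
    maj2_3 bs = maj [maj bs0, maj bs1, maj bs2]
      where (bs0, rest) = splitAt 3 bs
            (bs1, bs2)  = splitAt 3 rest

    -- All bits equal (the function 'same' above).
    same :: [Bool] -> Bool
    same bs = and bs || not (or bs)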

Computing the level-p-complexity
In this section we explain how to compute the level-p-complexity of a Boolean function f by recursively "generating all candidates" followed by "picking the best one(s)". The naive approach would be to generate all decision trees of f and then minimize, but already for the 9-bit function $maj_3^2$ that is intractable. To reduce the number of polynomials we use the algorithmic design technique of thinning. We compare polynomials using Yun's algorithm and Descartes rule of signs. Further, since the same subfunctions often appear in many different nodes, we can save a significant amount of computation time using memoization.
The top-level complexity computation (from Section 2.4) can be simplified a bit:

      $D_p(f)$ at probability p
    =   { fuse expCost into the tree algebra generation }
      minimum (map (λq → evalP q p) (genAlg n f))
    =   { let best p = minimum • map (λq → evalP q p) }
      best p (genAlg n f)

and we start by explaining genAlg n. The decision trees of a function f can be described in terms of the decision trees for the immediate subfunctions ($f_i^b$ = setBit i b f) for different i : Index and b : B. In fact, we can immediately generate elements of any tree algebra, not only decision trees, by using res and pic instead of Res and Pick. (That is used in the "fuse" step of the calculation above.) When we explain the algorithm we write "decision tree" to make it feel more concrete, but in the end we will mostly use it to directly compute expected cost polynomials.

Generating decision trees and other tree algebras
The complexity computation starts from a Boolean function f : BoolFun n and generates all decision trees for it. There are two top-level cases: either the function f is constant (and returns b : B), in which case there is only one decision tree: res b; or the function f still depends on some of the input bits (and thus the arity is at least 1). In the latter case, for each index i : Index we can generate two subfunctions $f_i^0$ = setBit i 0 f and $f_i^1$ = setBit i 1 f. We then recursively generate a decision tree $t_0$ for $f_i^0$ and $t_1$ for $f_i^1$ and combine them to a bigger decision tree using pic i $t_0$ $t_1$. This is done for all combinations of i, $t_0$, and $t_1$ in a set comprehension. To make it easier to later extend the definition (for thinning and memoization) we make the recursive step explicit, as sketched below.
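A hedged sketch of this generator; the name genAlgStep is suggested by genAlgStepThin (named later, in the memoization section), and the details may differ from the library:

    import Data.Set (Set)
    import qualified Data.Set as Set

    -- Generate (the tree-algebra images of) all decision trees of an
    -- n-bit function, with the recursion factored out so the same step
    -- can later be wrapped with thinning and memoization.
    genAlg :: (BoFun bf, TreeAlg a, Ord a) => Int -> bf -> Set a
    genAlg = genAlgStep genAlg

    genAlgStep :: (BoFun bf, TreeAlg a, Ord a)
               => (Int -> bf -> Set a) -> (Int -> bf -> Set a)
    genAlgStep recur n f = case isConst f of
      Just b  -> Set.singleton (res b)      -- constant: only res b
      Nothing -> Set.fromList               -- otherwise: all picks
        [ pic i t0 t1
        | i  <- [0 .. n - 1]                -- assuming Index = Int
        , t0 <- Set.toList (recur (n - 1) (setBit i False f))
        , t1 <- Set.toList (recur (n - 1) (setBit i True  f)) ]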
We would like to enumerate the cost polynomials of all the decision trees of a particular Boolean function (n = 9, f = $maj_3^2$ is our main goal). Without taking symmetries into account there are 2 × n immediate subfunctions $f_i^b$, and if $T_g$ is the cardinality of the enumeration for a subfunction g, we have that

    $T_f = \sum_{i} T_{f_i^0} \cdot T_{f_i^1}$

for non-constant f (and $T_f = 1$ when f is constant). These numbers can be really big if we count all decision trees, but if we only care about their cost polynomials, many decision trees collapse to the same polynomial, making the counts more manageable (but still possibly really big). Even the total number of subfunctions encountered (the number of recursive calls) can be quite big. If all the 2 × n immediate subfunctions were different, and if all of them would generate 2 × (n − 1) different subfunctions in turn, the number of subfunctions would be $2^n \times n!$. But in practice many subfunctions will be the same. When computing the polynomials for the 9-bit function $maj_3^2$, for example, only 215 distinct subfunctions are encountered.
As a smaller example, for the 3-bit majority function $maj_3$, choosing i = 0, 1, or 2 gives exactly the same subfunctions. Figure 3.1 illustrates a simplified call graph of genAlg 3 $maj_3$ and the results (the expected cost polynomials) for the different subfunctions. In this case all the sets are singletons, but that is very unusual for more realistic Boolean functions. It would take too long to compute all polynomials for the 9-bit function $maj_3^2$, but there are 21 distinct 7-bit subfunctions, and the first one of them already has 18021 polynomials. Thus we can expect billions of polynomials for $maj_3^2$, and this means we need to look at ways of keeping only the most promising candidates at each level. This leads us to the algorithmic design technique of thinning.
Fig. 3.1: A simplified computation tree of genAlg 3 $maj_3$. In each node, f → ps shows the input f and output ps = genAlg n f of each local call. As all the functions involved are "symmetric" in the index (setBit i b f ≈ setBit j b f for all i and j) we only show edges for 0 and 1 from each level.

Thinning
The general shape of the specification has two phases: "generate all candidates" followed by "pick the best one(s)". The first phase is recursive, and we would like to push as much as possible of "pick the best" into the recursive computation. In the extreme case of a greedy algorithm, we can thin the intermediate sets all the way down to singletons, but even if the sets are a bit bigger than that we can still reduce the computation cost significantly. A good (but abstract) reference for thinning is the Algebra of Programming book (Bird and de Moor, 1997, Chapter 8), and more concrete references are the corresponding developments in Agda (Mu et al., 2009) and Haskell (Bird and Gibbons, 2020). In this subsection the main focus is on specification and correctness, with Agda-like syntax for the logic part.
The "pick the best" phase is best p = minimum • map (λq → evalP q p) of type Set (Poly r) → r for some ring of scalars r (usually rational numbers).In this context it is clear that in the generation phase we can throw away any polynomial which is "uniformly worse" than some other polynomial and this is what we want to use thinning for.We are looking for some "smallest" polynomials, but we only have a preorder, not a total order, which means that we may need to keep a set of incomparable candidates (elements x ̸ y for which neither x ≺ y nor y ≺ x).We first describe the general setting and move to the specifics of our polynomials later.
We start from a strict preorder (≺) : a → a → Prop (an irreflexive and transitive relation). You can think of Prop as B, because we only work with decidable relations and finite sets in this application. As we are looking for minima, we say that y dominates x if y ≺ x.

We lift the order relation to sets in two steps. First, ys ≺ x means that ys dominates x, that is, some element in ys is smaller than x. If this holds, there is no need to add x to ys, because we already have at least one better element in ys. Then ys ≺ xs means that ys dominates all of xs.

Finally, we combine subset and domination into the thinning relation:

    Thin ys xs = (ys ⊆ xs) ∧ (ys ≺ (xs \\ ys))

We will use this relation in the specification of our efficient computation to ensure that the small set of polynomials computed still "dominates" the big set of all the polynomials generated by genAlg n f. But first we introduce the helper function thin : Set a → Set a which aims at removing some elements, while still keeping the minima in the set. Later we will use the function genAlgT n f, specified similarly to genAlg n f but using the helper function thin. It has to refine the relation Thin, which means that if ys = thin xs then ys must be a subset of xs (ys ⊆ xs) and ys must dominate the rest of xs (ys ≺ (xs \\ ys)). A trivial (but useless) implementation would be thin = id, and any implementation which removes some "dominated" elements could be helpful.
The best we can hope for is that thin gives us a set of only incomparable elements.
If thin compares all pairs of elements, it can compute a smallest thinning. In general that may not be needed (and a linear-time greedy approximation is good enough), but in some settings almost any algorithmic cost which can reduce the intermediate sets will pay off. We collect the comparison and thinning functions in the type class Thinnable, which looks roughly as follows:

    class Ord a => Thinnable a where
      cmp  :: a -> a -> Maybe Ordering
      thin :: Set a -> Set a

The greedy thin starts from an empty set and considers one element x at a time. If the set ys collected thus far already dominates x, it is returned unchanged; otherwise x is inserted. The optimal version also removes from ys all elements dominated by x.
It is easy to prove that thin implements the specification Thin.
The method cmp is a more informative version of (≺): it returns Just LT , Just EQ, or Just GT if the first element is smaller, equal, or greater than the second, respectively, or Nothing if they are incomparable.
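A hedged sketch of the insertion step just described; the helper names dominates, thinStep and thinGreedy are ours:

    import Data.Set (Set)
    import qualified Data.Set as Set

    -- y `dominates` x when y is at least as good (y ⪯ x in the preorder).
    dominates :: Thinnable a => a -> a -> Bool
    dominates y x = maybe False (/= GT) (cmp y x)

    -- Insert x unless some collected element already dominates it;
    -- when inserting, also drop the elements that x dominates
    -- (the "optimal" version mentioned above).
    thinStep :: Thinnable a => Set a -> a -> Set a
    thinStep ys x
      | any (`dominates` x) (Set.toList ys) = ys
      | otherwise = Set.insert x (Set.filter (not . dominates x) ys)

    thinGreedy :: Thinnable a => Set a -> Set a
    thinGreedy = foldl thinStep Set.empty . Set.toList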
Our use of thinning. Now we have what we need to specify when an efficient genAlgT n f computation is correct. Our specification (spec n f) states a relation between a (very big) set xs = genAlg n f and a smaller set ys = genAlgT n f which we get by applying thinning at each recursive step. We want to prove that ys ⊆ xs and ys ≺ (xs \\ ys), because then we know we have kept all the candidates for minimality.
We can first take care of the simplest case (for any n). If the function f is constant (returning some b : B), both xs and ys will be the singleton set containing res b. Thus both properties trivially hold.
We then proceed by induction on n to prove $S_n$ = ∀ f : BoolFun n. spec n f. In the base case n = 0 the function is necessarily constant, and we have already covered that above. In the inductive step, assume the induction hypothesis IH = $S_n$ and prove $S_{n+1}$ for a function f : BoolFun (n + 1). We have already covered the constant function case, so we focus on the main recursive clause of the definitions of genAlg (n+1) f and genAlgT (n+1) f when the fixpoint definitions have been expanded. All subfunctions $f_i^b$ : BoolFun n used in the recursive calls satisfy the induction hypothesis: spec n $f_i^b$. If we name the sets involved in these hypotheses $xs_i^b$ and $ys_i^b$, we can thus assume $ys_i^b ⊆ xs_i^b$ and $ys_i^b ≺ (xs_i^b \\ ys_i^b)$.

First, the subset property: we want to prove that genAlgT (n+1) f ⊆ genAlg (n+1) f, or equivalently, ∀ y. (y ∈ genAlgT (n+1) f) ⇒ (y ∈ genAlg (n+1) f). Let y ∈ genAlgT (n+1) f. We know from the specification of thin and the definition of genAlgT (n+1) f that y = pic i $y_0$ $y_1$ for some $y_0 ∈ ys_i^0$ and $y_1 ∈ ys_i^1$. The subset part of the induction hypothesis gives us that $y_0 ∈ xs_i^0$ and $y_1 ∈ xs_i^1$. Thus we can see from the definition of genAlg (n+1) f that y ∈ genAlg (n+1) f.

Now, for the "domination" property we need to show that ∀ x ∈ xs \\ ys. ys ≺ x, where xs = genAlg (n+1) f and ys = genAlgT (n+1) f. Let x ∈ xs \\ ys. Given the definition of xs, it must be of the form x = pic i $x_0$ $x_1$ where $x_0 ∈ xs_i^0$ and $x_1 ∈ xs_i^1$. The (second part of the) induction hypothesis provides the existence of $y_b ∈ ys_i^b$ such that $y_b ≺ x_b$. From these $y_b$ we can build y′ = pic i $y_0$ $y_1$ as a candidate element to "dominate" x.
We can now show that y′ ≺ x by polynomial algebra: y′ = pic i $y_0$ $y_1$ = 1 + (1 − xP) · $y_0$ + xP · $y_1$ ≺ 1 + (1 − xP) · $x_0$ + xP · $x_1$ = pic i $x_0$ $x_1$ = x, where the middle step uses $y_b ≺ x_b$ and the monotonicity properties of (≺) listed at the end of this subsection. We are not quite done, because y′ may not be in ys. It is clear from the definition of genAlgT (n+1) f that y′ is in the set ys′ sent to thin, but it may be "thinned away". But either y′ ∈ ys = thin ys′, in which case we take the final y = y′, or there exists another y ∈ ys such that y ≺ y′, and then we get y ≺ x by transitivity.
To sum up, we have now proved that we can push a powerful thin step into the recursive enumeration of all cost polynomials in such a way that any minimum is guaranteed to reside in the much smaller set of polynomials thus computed.
The specific properties we need from (≺), in addition to the general requirements for thinning, are that (pos+) and (pos×) are monotonic (for polynomials 0 ≺ pos), and that $q_0 ≺ q_1$ implies evalP $q_0$ p ⩽ evalP $q_1$ p for all 0 ⩽ p ⩽ 1.

Memoization
The call graph of genAlgT n f is the same as the call graph of genAlg n f and, as mentioned above, it can be exponentially big. Thus, even though thinning makes the intermediate sets exponentially smaller, we still have one source of exponential computational complexity to tackle. Fortunately, the same subfunctions often appear in many different nodes, and this means we can save a significant amount of computation time using memoization.
The classical example of memoization is the Fibonacci function. Naively computing fib (n + 2) = fib (n + 1) + fib n leads to exponential growth in the number of function calls. But if we fill in a table indexed by n with already computed results, we can compute fib n in linear time.
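For example, with the MemoTrie library mentioned below, the memoized Fibonacci can be written as follows (a standard pattern, not code from this paper's library):

    import Data.MemoTrie (memo)

    -- The recursive calls go through the memoized 'fib', so each value
    -- is computed at most once.
    fib :: Int -> Integer
    fib = memo fib'
      where fib' 0 = 0
            fib' 1 = 1
            fib' n = fib (n - 1) + fib (n - 2)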
Similarly, here we "just" need to tabulate the results of the calls to genAlg n f so as to avoid recomputation. The challenge is that the input we need to tabulate is now a Boolean function, which is not as nicely structured as a natural number index. Fortunately, thanks to Hinze (2000), Elliott, and others, we have generic trie-based memo functions only a Hackage library away. The MemoTrie library provides the Memoizable class and suitable instances and helper functions for most types. We only need to provide a Memoizable instance for BDDs, and we do this using inSig and outSig from the BDD package (decision-diagrams). They expose the top-level structure of a BDD: Sig bf is isomorphic to Either B (Index, bf, bf) where bf = BDDFun. We define our top-level function genAlgThinMemo by applying memoization to genAlgT n (or, more specifically, to genAlgStepThin).
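A rough sketch of how the pieces could fit together; genAlgStepThin is named above, but the memoizing fixpoint helper memoize2 is our assumption and the library's actual definition may differ:

    -- Memoize the open-recursive step: the recursive calls go through
    -- the memo table 'fixed', built from the Memoizable instance for
    -- BDDFun described above.
    genAlgThinMemo :: Int -> BDDFun -> Set (Poly Rational)
    genAlgThinMemo = fixed
      where fixed = memoize2 (genAlgStepThin fixed)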

Comparing polynomials
As argued in Section 3.2, the key to an efficient computation of the best cost polynomials is to compare polynomials as soon as possible and throw away those which are "uniformly worse". The specification of p ≺ q is: p x ⩽ q x for all 0 ⩽ x ⩽ 1, and p x < q x for some 0 < x < 1. Note that (≺) is a strict preorder: if the polynomials cross, neither is "uniformly worse" and we keep both. A simple example of two incomparable polynomials is xP and 1 − xP, which cross at p = 1/2.
If we have two polynomials p and q, we want to know if p ⩽ q for all inputs in the interval [0, 1]. Equivalently, we need to check if 0 ⩽ q − p in that interval. As the difference is also a polynomial, we can focus our attention on locating polynomial roots in the unit interval.

Fig. 3.2: To compare two polynomials p and q we use root counting for q − p; these are the three main cases to consider.
If there are no roots in the unit interval (Fig. 3.2a), the polynomial stays on "one side of zero" and we just need to check the sign of the polynomial at any point. If there is at least one single-root (Fig. 3.2b), the original polynomials cross and we return Nothing. Similarly for triple-roots, or roots of any odd order. Finally, if the polynomial only has roots of even order (some double-roots, or quadruple-roots, etc., as in Fig. 3.2c) the polynomial stays on one side of zero, and we can check a few points to see which side that is. (If the number of distinct roots is r we check up to r + 1 points, to make sure at least one of them is non-zero and thus tells us on which side of zero the polynomial lies.) To compare polynomials, we thus need to implement the root-counting functions numRoots and numRoots′:

    numRoots  :: Poly a -> Int    -- number of distinct roots in (0, 1)
    numRoots' :: Poly a -> [Int]  -- their multiplicities

We will not provide all the code here, because that would take us too far from the main topic of the paper, but we will illustrate the main algorithms and concepts for root-counting in Section 3.5. The second function computes real root multiplicities: numRoots′ p = [1, 3] means p has one single and one triple root in the open interval (0, 1). From this we get that p has 2 = length [1, 3] distinct real roots, and 4 = sum [1, 3] real roots if we count multiplicities.
Using the root-counting functions, the top-level of the polynomial partial order implementation looks roughly as follows.
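This is a compiling sketch under stated assumptions: Poly is taken to be a plain coefficient list, subP and the sample point are our own helpers, and the root counting (left abstract here) is the subject of Section 3.5. Only the name cmpPoly itself is taken from the paper (Section 4).

    type Poly = [Rational]

    evalP :: Poly -> Rational -> Rational
    evalP cs x = foldr (\c acc -> c + x * acc) 0 cs    -- Horner evaluation

    subP :: Poly -> Poly -> Poly
    subP as bs = take n (zipWith (-) (as ++ repeat 0) (bs ++ repeat 0))
      where n = max (length as) (length bs)

    -- Multiplicities of the roots in the open interval (0, 1);
    -- implemented via Yun's algorithm and Descartes rule of signs
    -- (Section 3.5), abstract in this sketch.
    numRoots' :: Poly -> [Int]
    numRoots' = error "see Section 3.5"

    cmpPoly :: Poly -> Poly -> Maybe Ordering
    cmpPoly p q
      | all (== 0) d         = Just EQ   -- equal as polynomials
      | any odd (numRoots' d) = Nothing  -- odd-order root: p and q cross
      | s > 0                = Just LT   -- q - p >= 0 on [0,1]: p is better
      | s < 0                = Just GT
      | otherwise            = Nothing   -- sample hit a root; a robust
                                         -- version probes more points
      where d = subP q p
            s = evalP d (1 / 2)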

Isolating real roots and Descartes rule of signs
This section explains how to do root-counting by combining Yun's algorithm and Descartes rule of signs. As explained in Section 3.4, root-counting is the key to implementing the comparison needed for thinning. First out is Yun's algorithm (Yun, 1976) for square-free factorisation: given a polynomial p it computes a list of polynomial factors $p_i$, each of which only has single-roots, and such that $p = C \prod_i p_i^i$. Note the exponent i: the factor $p_2$, for example, appears squared in p. If p only has single-roots, the list from Yun's algorithm has just one element, $p_1$, but in any case we get a finite list of polynomials, each of which is "square-free".
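The other ingredient is Descartes rule of signs, a standard result: the number of positive real roots of a polynomial equals the number of sign changes in its coefficient sequence minus a non-negative even number, so zero sign changes means no positive root, and one sign change means exactly one. A small sketch of the sign-change count (our own helper; counting roots in (0, 1) additionally requires a change of variables mapping that interval to the positive half-line):

    -- Count sign alternations among the non-zero coefficients:
    -- a negative product of consecutive coefficients is a sign change.
    signChanges :: [Rational] -> Int
    signChanges cs = length (filter (< 0) (zipWith (*) nz (tail nz)))
      where nz = filter (/= 0) cs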

Level-p-complexity for maj 2 3
When running genAlgThinMemo 9 $maj_3^2$ we get {P [4, 4, 6, 9, −61, 23, 67, −64, 16]}, which means that the expected cost $P^*$ of the best decision tree $T^*$ is

    $P^*(p) = 4 + 4p + 6p^2 + 9p^3 − 61p^4 + 23p^5 + 67p^6 − 64p^7 + 16p^8$

This can be compared to the decision tree (which we call $T_t$) conjectured in (Jansson, 2022) to be the best. Its expected cost $P_t$ is slightly higher (thus worse): $P_t(p) = P^*(p) + p^2(1 − p)^2(1 − p + p^2)$. The expected costs of the decision trees $T^*$ and $T_t$ can be seen in Fig. 4.2. Comparing the two polynomials using cmpPoly $P^*$ $P_t$ shows that the new one has strictly lower expected cost than the one from the thesis. The difference, which factors to exactly $p^2(1 − p)^2(1 − p + p^2)$, is illustrated in Fig. 4.3, and we note that it is non-negative in the whole interval. The value of both polynomials at the endpoints is 4, and the maximum of $P^*$ is ≈ 6.14, compared to the maximum of $P_t$ which is ≈ 6.19. The conjecture in (Jansson, 2022) is thus false, and the correct formula for the level-p-complexity of $maj_3^2$ is $P^*$. At the time of publication of (Jansson, 2022) it was believed that sifting through all the possible decision trees would be intractable. Fortunately, using a combination of thinning, memoization, and exact comparison of polynomials, it is now possible to compute the correct complexity in less than a second on the author's laptop.
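As a quick sanity check of the stated maximum, Horner evaluation of the coefficient list at p = 1/2 gives exactly 1572/256 ≈ 6.14 (a small calculation, not part of the library):

    evalP :: [Rational] -> Rational -> Rational
    evalP cs x = foldr (\c acc -> c + x * acc) 0 cs

    pStar :: [Rational]
    pStar = [4, 4, 6, 9, -61, 23, 67, -64, 16]

    -- evalP pStar (1/2) == 1572 % 256  (= 6.140625)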

Conclusions
This paper describes a Haskell library for computing the level-p-complexity of Boolean functions, and applies it to two-level iterated majority ($maj_3^2$). The problem specification is straightforward: generate all possible decision trees, compute their expected cost polynomials, and select the best ones. The implementation is more of a challenge because of two sources of exponential computational cost: the exponential growth of the set of decision trees, and the exponential growth of the recursive call graph (the collection of subfunctions). The library uses thinning to tackle the first source of inefficiency and memoization to handle the second. In combination with efficient data structures (binary decision diagrams for the Boolean function input, sets of polynomials for the output) this enables computing the level-p-complexity of our target example $maj_3^2$ in less than a second. From the mathematical point of view, the strength of the methods used in this paper is that we get a correct result for something which is very hard to calculate by hand. From a computer science point of view, the paper is an instructive example of how a combination of algorithmic and symbolic tools can tame a doubly exponential computational cost.
The library uses type classes for separation of concerns: the actual implementation type for Boolean functions (the input) is abstracted over by the BoFun class, and the corresponding type for the output is modelled by the TreeAlg class. We also use our own class Thinnable for thinning (and preorders), and the Memoizable class from Hackage. This means that our main function has a type of roughly the following form:

    genAlgThinMemo :: (BoFun bf, Memoizable bf, TreeAlg a, Thinnable a)
                   => Int -> bf -> Set a

All the Haskell code is available on GitHub, and parts of it have been reproduced in Agda to check some of the stronger invariants. One direction of future work is to complete the Agda formalisation so that we can provide a formally verified library, perhaps helped by Swierstra (2022) and van der Rest and Swierstra (2022).
The polynomials we compute are all incomparable in the preorder, and together with the thinning relation this means that we actually compute what is called a Pareto front in economics: a set of solutions where no objective can be improved without sacrificing at least one other objective. It would be interesting to explore this in more detail, and to see what the overlap is between thinning as an algorithm design method and different concepts of optimality from economics.
The computed level-p-complexity of $maj_3^2$ is better than the result conjectured in (Jansson, 2022), and the library allows easy exploration of other Boolean functions. With the current library, the level-p-complexity of iterated majority on 3 levels (27 bits) is out of reach, but together with Christian Sattler and Liam Hughes we are exploring a version specialised to "iterated threshold functions" which can handle this case (see the code in the GitHub repository).
Fig. 4.3: The difference between the expected costs of $T_t$ and $T^*$.