Heterogeneous binary random-access lists

Introduction
Writing an evaluator for the simply typed lambda calculus is a classic example of a dependently typed program that appears in numerous tutorials (McBride, 2004; Norell, 2009, 2013; Abel, 2016). The central idea is to represent the well-typed lambda terms over some universe U using an inductive family (Figure 1). Before writing the evaluator for such terms, we need to define a type of environments, capturing the values associated with the free variables in a term. This is typically done using a heterogeneous list, indexed by a list of the free variables' types:
    data Env : Ctx → Set where
      Nil  : Env Nil
      Cons : Val u → Env ctx → Env (u :: ctx)
    lookup : Env ctx → Ref ctx u → Val u
    lookup (Cons x env) Top       = x
    lookup (Cons x env) (Pop ref) = lookup env ref
When writing the evaluator, the type indices ensure that we can reuse the host language's constructs for lambda and application, rather than having to define substitution and β-reduction ourselves. Only the case for variables requires any real work, namely looking up the value of the variable in the environment:
    eval : Term ctx u → Env ctx → Val u
    eval (App t₁ t₂) env = (eval t₁ env) (eval t₂ env)
    eval (Lam body)  env = λ x → eval body (Cons x env)
    eval (Var i)     env = lookup env i
This evaluator, however, is not particularly efficient. In particular, the environment is represented as a heterogeneous list of values with linear time lookup. This technique of using indexed data types to represent well-typed expressions has trickled down into numerous industrial applications built using Haskell, including the Crucible symbolic simulator developed by Galois and the Accelerate library for generating GPU code (Chakravarty et al., 2011); see also Christiansen et al. (2019).
This pearl explores how to define a heterogeneous data structure that provides a lookup function with logarithmic complexity. The key challenge is to choose indices judiciously, thereby avoiding the need for additional lemmas or type coercions to ensure type safety. Doing so will enable us to define a more efficient evaluator that is total, equally simple and easily verified to produce the same results as the evaluator above.
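For reference, the linear-time evaluator described in this introduction can be assembled into one self-contained file. The sketch below assumes a small universe U (naturals and function types) standing in for Figure 1, which is not reproduced here; the Ref and Term declarations are likewise a plausible reading of that figure rather than a verbatim copy, and the names example and test are illustrative.

```agda
open import Agda.Builtin.Nat using (Nat)
open import Agda.Builtin.Equality using (_≡_; refl)

-- ASSUMPTION: a stand-in for the universe of Figure 1.
data U : Set where
  nat : U
  _⇒_ : U → U → U

Val : U → Set
Val nat     = Nat
Val (u ⇒ v) = Val u → Val v

infixr 5 _::_
data Ctx : Set where
  Nil  : Ctx
  _::_ : U → Ctx → Ctx

variable
  u v : U
  ctx : Ctx

-- Typed de Bruijn references into a context.
data Ref : Ctx → U → Set where
  Top : Ref (u :: ctx) u
  Pop : Ref ctx u → Ref (v :: ctx) u

data Term : Ctx → U → Set where
  App : Term ctx (u ⇒ v) → Term ctx u → Term ctx v
  Lam : Term (u :: ctx) v → Term ctx (u ⇒ v)
  Var : Ref ctx u → Term ctx u

-- Heterogeneous list of values, indexed by the context.
data Env : Ctx → Set where
  Nil  : Env Nil
  Cons : Val u → Env ctx → Env (u :: ctx)

-- Linear-time lookup: one step per Pop.
lookup : Env ctx → Ref ctx u → Val u
lookup (Cons x env) Top       = x
lookup (Cons x env) (Pop ref) = lookup env ref

eval : Term ctx u → Env ctx → Val u
eval (App t₁ t₂) env = (eval t₁ env) (eval t₂ env)
eval (Lam body)  env = λ x → eval body (Cons x env)
eval (Var i)     env = lookup env i

-- (λ x → x) applied to the single free variable.
example : Term (nat :: Nil) nat
example = App (Lam (Var Top)) (Var Top)

-- Checked by the type checker: evaluation returns the value bound in the environment.
test : eval example (Cons 3 Nil) ≡ 3
test = refl
```

The final equation is verified by normalisation alone, which is one way to see that the evaluator reuses Agda's own functions for Lam and App.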

Binary random-access lists
Before trying to define an efficient data structure storing heterogeneous values, we will first consider the simpler homogeneous case. In this section, we will start by writing an Agda implementation of homogeneous binary random-access lists (Okasaki, 1999). We will then define a heterogeneous version, as required by our evaluator, indexed by the homogeneous version, much as the heterogeneous environments Env are indexed by a (homogeneous) list of types. The key challenge of implementing such data structures in Agda will be to choose data types that ensure all our definitions are total.
To achieve logarithmic lookup times, we need to shift from linear lists to binary trees. If we assume that we only have to store 2ⁿ elements, we could use a perfect binary tree of depth n:
    data Tree (a : Set) : ℕ → Set where
      Leaf : a → Tree a Zero
      Node : Tree a n → Tree a n → Tree a (Succ n)
To define a lookup function, we need to consider how to designate a position in the tree. One way to do so is using a path of n steps, providing directions at every internal node:
    data Path : ℕ → Set where
      Here  : Path Zero
      Left  : Path n → Path (Succ n)
      Right : Path n → Path (Succ n)
    lookup : Tree a n → Path n → a
    lookup (Node t₁ t₂) (Left p)  = lookup t₁ p
    lookup (Node t₁ t₂) (Right p) = lookup t₂ p
    lookup (Leaf x)     Here      = x
Note that the index n is shared by both the depth of the tree and the length of the path, ensuring that our lookup function is total: we do not need to provide cases for the Node-Here, Leaf-Left or Leaf-Right constructor combinations. Throughout this paper, code in each section is in a separate module, allowing function names such as lookup to be reused liberally. We will use qualified names only when necessary.
Although our lookup function is now logarithmic in the number of elements, we can only store a fixed number of elements in this tree. In particular, there is no way to add new elements, as is required by our interpreter. Furthermore, we may want to store a number of elements that is not equal to a power of two. Fortunately, any natural number can be written as a sum of powers of two, and we can use this insight to define a better data structure.
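As a small check of the Tree and Path definitions above, the following sketch stores four numbers in a tree of depth two and retrieves the third. The element values and the names tree and test are illustrative only; the built-in Nat is used for the elements so that numeric literals are available, while the depth index uses the Zero/Succ naturals of the text.

```agda
open import Agda.Builtin.Nat using (Nat)
open import Agda.Builtin.Equality using (_≡_; refl)

data ℕ : Set where
  Zero : ℕ
  Succ : ℕ → ℕ

variable
  a : Set
  n : ℕ

-- Perfect binary tree: the index is the depth.
data Tree (a : Set) : ℕ → Set where
  Leaf : a → Tree a Zero
  Node : Tree a n → Tree a n → Tree a (Succ n)

-- A path has exactly as many steps as the tree is deep.
data Path : ℕ → Set where
  Here  : Path Zero
  Left  : Path n → Path (Succ n)
  Right : Path n → Path (Succ n)

lookup : Tree a n → Path n → a
lookup (Node t₁ t₂) (Left p)  = lookup t₁ p
lookup (Node t₁ t₂) (Right p) = lookup t₂ p
lookup (Leaf x)     Here      = x

-- Four elements in a tree of depth two.
tree : Tree Nat (Succ (Succ Zero))
tree = Node (Node (Leaf 10) (Leaf 20)) (Node (Leaf 30) (Leaf 40))

-- Go right, then left: the third element.
test : lookup tree (Right (Left Here)) ≡ 30
test = refl
```

A path such as Left Here would be rejected for this tree by the type checker, since its length (one) does not match the depth (two).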

Binary arithmetic
Before doing so, however, we will need two auxiliary definitions: a data type Bin representing little-endian binary numbers and a function bsucc that computes the successor of a binary number. Note that the Bin type provides different representations of the same number: for instance, End and Zero End both represent zero. This will not be a problem in our setting, because by construction our definitions will always avoid unnecessary trailing zeros.
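The two auxiliary definitions can be sketched as follows. This is a reconstruction rather than the author's verbatim code: the constructor names End, Zero and One are taken from the surrounding text, and the clauses of bsucc are the ones that the type of consTree (given later) forces. The names five and test are illustrative.

```agda
open import Agda.Builtin.Equality using (_≡_; refl)

-- Little-endian binary numbers: the least significant digit comes first.
data Bin : Set where
  End  : Bin
  One  : Bin → Bin
  Zero : Bin → Bin

-- Successor: flip leading Ones to Zeros until a Zero (or the end) is found.
bsucc : Bin → Bin
bsucc End      = One End
bsucc (One b)  = Zero (bsucc b)
bsucc (Zero b) = One b

-- Counting up from End five times.
five : Bin
five = bsucc (bsucc (bsucc (bsucc (bsucc End))))

-- 5 = 1·1 + 0·2 + 1·4, written least significant digit first.
test : five ≡ One (Zero (One End))
test = refl
```

Since bsucc never produces a number ending in Zero End when started from End, numbers built this way have no redundant trailing zeros.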

Binary random-access lists
We now turn our attention to defining a suitable structure for storing an arbitrary number of elements. The key insight used by Okasaki's binary random-access lists is that the binary representation of the number of elements we wish to store determines how to organise these elements over a series of perfectly balanced binary trees. For example, we can store seven elements in three perfect trees of increasing depth (holding one, two and four elements), as illustrated above.
To store fewer elements, we can leave out any of these trees. For example, we might use the first and last trees to store five elements. The binary representation of the number of elements determines which trees must be present and which trees must be omitted.
We can make this precise in the following data type for binary random-access lists:
    data RAL (a : Set) : (n : ℕ) → Bin → Set where
      Nil   : RAL a n End
      Cons₁ : Tree a n → RAL a (Succ n) b → RAL a n (One b)
      Cons₀ : RAL a (Succ n) b → RAL a n (Zero b)
A value of type RAL a n b consists of a series of perfectly balanced binary trees of increasing depth. The Nil constructor corresponds to an empty list of trees; the other constructors extend the current binary number with a One or Zero, respectively. In the former case, we also have a tree of depth n; in either case, we increment the depth of the trees in the remainder of the binary random-access list.
It is worth highlighting the choice of indices here. These binary random-access lists are indexed by the current depth, n, and the binary representation of the number of elements they store. In contrast to the familiar example of vectors, the index n counts up rather than down. The depth n will be Zero initially, but is incremented in both Cons nodes. The binary number used as an index both completely determines the constructors used and counts the number of elements stored.
How do we designate a position in such a binary random-access list? We combine the linear references from the introduction and the paths from the previous section into a type of positions, Pos n b, with three constructors. The two constructors There₀ and There₁ are used to designate a position further down the list of trees. Once we hit the desired tree, the Here constructor stores a path of suitable length, which we can use to navigate to a leaf in the tree at the head of our binary random-access list. Using both these data types, we can define a total lookup function:
    lookup : RAL a n b → Pos n b → a
    lookup (Cons₁ t ral) (Here path) = Tree.lookup t path
    lookup (Cons₀ ral)   (There₀ i)  = lookup ral i
    lookup (Cons₁ _ ral) (There₁ i)  = lookup ral i
Crucially, as the binary random-access list and position share the same depth n and binary number b, we can rule out having to search the empty binary random-access list. This proves that each value of type Pos n b is guaranteed to designate a unique value in the binary random-access list of type RAL a n b.
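The type of positions is described above only in prose, so the sketch below proposes one definition of Pos that fits the stated type of lookup. It is an assumption, not the author's code: the constructor types are inferred from how lookup uses them. The tree lookup is renamed lookupTree here (the text uses the qualified name Tree.lookup), and five, pos and test are illustrative names.

```agda
open import Agda.Builtin.Nat using (Nat)
open import Agda.Builtin.Equality using (_≡_; refl)

data ℕ : Set where
  Zero : ℕ
  Succ : ℕ → ℕ

data Bin : Set where
  End  : Bin
  One  : Bin → Bin
  Zero : Bin → Bin

variable
  a : Set
  n : ℕ
  b : Bin

data Tree (a : Set) : ℕ → Set where
  Leaf : a → Tree a Zero
  Node : Tree a n → Tree a n → Tree a (Succ n)

data Path : ℕ → Set where
  Here  : Path Zero
  Left  : Path n → Path (Succ n)
  Right : Path n → Path (Succ n)

lookupTree : Tree a n → Path n → a
lookupTree (Node t₁ t₂) (Left p)  = lookupTree t₁ p
lookupTree (Node t₁ t₂) (Right p) = lookupTree t₂ p
lookupTree (Leaf x)     Here      = x

data RAL (a : Set) : ℕ → Bin → Set where
  Nil   : RAL a n End
  Cons₁ : Tree a n → RAL a (Succ n) b → RAL a n (One b)
  Cons₀ : RAL a (Succ n) b → RAL a n (Zero b)

-- ASSUMPTION: reconstructed from the type and clauses of lookup.
data Pos : ℕ → Bin → Set where
  Here   : Path n → Pos n (One b)          -- the element lives in the head tree
  There₀ : Pos (Succ n) b → Pos n (Zero b) -- skip a missing tree
  There₁ : Pos (Succ n) b → Pos n (One b)  -- skip a present tree

lookup : RAL a n b → Pos n b → a
lookup (Cons₁ t ral) (Here path) = lookupTree t path
lookup (Cons₀ ral)   (There₀ i)  = lookup ral i
lookup (Cons₁ _ ral) (There₁ i)  = lookup ral i

-- Five elements: trees of depth 0 and 2, no tree of depth 1.
five : RAL Nat Zero (One (Zero (One End)))
five = Cons₁ (Leaf 1)
         (Cons₀
           (Cons₁ (Node (Node (Leaf 2) (Leaf 3)) (Node (Leaf 4) (Leaf 5))) Nil))

-- Skip the first tree, skip the missing one, then go right and left.
pos : Pos Zero (One (Zero (One End)))
pos = There₁ (There₀ (Here (Right (Left Here))))

test : lookup five pos ≡ 4
test = refl
```

Because Pos shares both indices with RAL, no position can be constructed for the empty list, which is why lookup needs no clause for Nil.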
In contrast to perfectly balanced binary trees, we can add a single element to a binary random-access list. To do so, we begin by defining the more general consTree function that adds a tree of depth n to a binary random-access list:
    consTree : Tree a n → RAL a n b → RAL a n (bsucc b)
    consTree t Nil          = Cons₁ t Nil
    consTree t (Cons₁ t' r) = Cons₀ (consTree (Node t t') r)
    consTree t (Cons₀ r)    = Cons₁ t r
As its type suggests, this function closely follows the successor operation on binary numbers. It searches for the first occurrence of a Cons₀ constructor, accumulating any subtrees found in a Cons₁ constructor along the way. Note that the number n, counting both the depth of the tree being inserted and the position in the list of trees being traversed, increases in the recursive call in the Cons₁ branch. Finally, the cons function adds a single element to a binary random-access list by calling consTree with a leaf storing the element to be inserted. Note that cons requires the depth of the binary random-access list to be Zero, as it calls consTree with a leaf of depth Zero as argument; this depth increases as the consTree function recurses over the binary random-access list.
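The cons function is only described in prose above; a definition matching that description, together with a worked example of the carry behaviour of consTree, might look as follows. The definitions of Bin and bsucc are reconstructions (see the assumptions in the comments), and three and test are illustrative names.

```agda
open import Agda.Builtin.Nat using (Nat)
open import Agda.Builtin.Equality using (_≡_; refl)

data ℕ : Set where
  Zero : ℕ
  Succ : ℕ → ℕ

-- ASSUMPTION: Bin and bsucc reconstructed from the surrounding text.
data Bin : Set where
  End  : Bin
  One  : Bin → Bin
  Zero : Bin → Bin

bsucc : Bin → Bin
bsucc End      = One End
bsucc (One b)  = Zero (bsucc b)
bsucc (Zero b) = One b

variable
  a : Set
  n : ℕ
  b : Bin

data Tree (a : Set) : ℕ → Set where
  Leaf : a → Tree a Zero
  Node : Tree a n → Tree a n → Tree a (Succ n)

data RAL (a : Set) : ℕ → Bin → Set where
  Nil   : RAL a n End
  Cons₁ : Tree a n → RAL a (Succ n) b → RAL a n (One b)
  Cons₀ : RAL a (Succ n) b → RAL a n (Zero b)

-- Mirrors bsucc: a Cons₁ is a carry, so the two trees are merged and pushed on.
consTree : Tree a n → RAL a n b → RAL a n (bsucc b)
consTree t Nil          = Cons₁ t Nil
consTree t (Cons₁ t' r) = Cons₀ (consTree (Node t t') r)
consTree t (Cons₀ r)    = Cons₁ t r

-- ASSUMPTION: the definition of cons, following its description in the text.
cons : a → RAL a Zero b → RAL a Zero (bsucc b)
cons x r = consTree (Leaf x) r

three : RAL Nat Zero (One (One End))
three = cons 1 (cons 2 (cons 3 Nil))

-- Inserting 2 merged it with 3 into a tree of depth one; inserting 1 filled the free slot.
test : three ≡ Cons₁ (Leaf 1) (Cons₁ (Node (Leaf 2) (Leaf 3)) Nil)
test = refl
```

The type of three already records that three elements are stored, because its binary index is computed by bsucc as the elements are added.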
Although we have an extensible data structure that supports logarithmic lookup time, we can only store elements of a single type. Using these binary random-access lists, however, we can define a heterogeneous alternative.

Heterogeneous binary random-access lists
In this section, we will show how to use our previous definitions to define a binary random-access list storing values of different types. For every data type definition in the previous sections, we will give a heterogeneous version indexed by a (homogeneous) structure storing type information. For example, we can define a heterogeneous perfect binary tree as follows:
    data HTree : Tree U n → Set where
      HLeaf : Val u → HTree (Leaf u)
      HNode : HTree us → HTree vs → HTree (Node us vs)
Just as the environments from the introduction were indexed by a list of types, we can index these heterogeneous trees by a tree of types, determining the types of the values stored in the leaves. The function Val : U → Set, defined in Figure 1, maps the codes from the universe U to the corresponding types. While not strictly necessary, we will prefix the constructors of heterogeneous types with a capital 'H' to distinguish them from their homogeneous counterparts.
Next, we introduce a heterogeneous version of our Path data type. A value of type HPath t u corresponds to a path in the tree t : Tree U n, ending at a leaf storing u. The constructors closely follow the constructors for the Path data type; it is only the type indices that have changed:
    data HPath : Tree U n → U → Set where
      HHere  : HPath (Leaf u) u
      HLeft  : HPath us u → HPath (Node us vs) u
      HRight : HPath vs u → HPath (Node us vs) u
We can define a lookup function with the following type by induction over the path:
    lookup : HTree us → HPath us u → Val u
The definition is, up to constructor names, identical to the one we have seen previously; the only difference is in the type signature, as the type of the value that is returned may vary depending on the position in the tree.
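The heterogeneous tree lookup is given above only by its type. A sketch of the full definition follows, assuming a hypothetical two-code universe (nat and bool) in place of the one from Figure 1; the implicit arguments are written out explicitly, and hlookup, types, values, test₁ and test₂ are illustrative names.

```agda
open import Agda.Builtin.Nat using (Nat)
open import Agda.Builtin.Bool using (Bool; true; false)
open import Agda.Builtin.Equality using (_≡_; refl)

data ℕ : Set where
  Zero : ℕ
  Succ : ℕ → ℕ

-- ASSUMPTION: a hypothetical universe standing in for Figure 1.
data U : Set where
  nat bool : U

Val : U → Set
Val nat  = Nat
Val bool = Bool

variable
  n : ℕ
  u : U

data Tree (a : Set) : ℕ → Set where
  Leaf : a → Tree a Zero
  Node : Tree a n → Tree a n → Tree a (Succ n)

-- A tree of values whose types are given by a tree of codes.
data HTree : {n : ℕ} → Tree U n → Set where
  HLeaf : Val u → HTree (Leaf u)
  HNode : {us vs : Tree U n} → HTree us → HTree vs → HTree (Node us vs)

-- A path that also records the code found at the leaf it ends in.
data HPath : {n : ℕ} → Tree U n → U → Set where
  HHere  : HPath (Leaf u) u
  HLeft  : {us vs : Tree U n} → HPath us u → HPath (Node us vs) u
  HRight : {us vs : Tree U n} → HPath vs u → HPath (Node us vs) u

-- Same clauses as the homogeneous lookup; only the types differ.
hlookup : {us : Tree U n} → HTree us → HPath us u → Val u
hlookup (HNode l r) (HLeft p)  = hlookup l p
hlookup (HNode l r) (HRight p) = hlookup r p
hlookup (HLeaf x)   HHere      = x

types : Tree U (Succ Zero)
types = Node (Leaf nat) (Leaf bool)

values : HTree types
values = HNode (HLeaf 42) (HLeaf true)

-- The result type depends on the path taken.
test₁ : hlookup values (HLeft HHere) ≡ 42
test₁ = refl

test₂ : hlookup values (HRight HHere) ≡ true
test₂ = refl
```

The two tests return a Nat and a Bool respectively from the same tree, which is exactly the variation in return type that the HPath index tracks.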
We can now define a heterogeneous version of our binary random-access lists, indexed by its homogeneous counterpart:
    data HRAL : RAL U n b → Set where
      HNil   : HRAL Nil
      HCons₁ : HTree t → HRAL ral → HRAL (Cons₁ t ral)
      HCons₀ : HRAL ral → HRAL (Cons₀ ral)
Once again, the constructors closely follow the constructors of binary random-access lists, RAL. The only key difference lies in the choice of indices. The original RAL data type was indexed by a binary number that determined its structure; here the structure is determined by a binary random-access list of types.
Similarly, we also revisit the type of positions in a binary random-access list:
    data HPos : RAL U n b → U → Set where
      HHere   : HPath t u → HPos (Cons₁ t ral) u
      HThere₀ : HPos ral u → HPos (Cons₀ ral) u
      HThere₁ : HPos ral u → HPos (Cons₁ t ral) u
The lookup function on heterogeneous random-access lists follows the same structure as its homogeneous counterpart. Types aside, the only real difference is the call to the heterogeneous lookup function on binary trees, rather than the homogeneous version we saw previously. Finally, the definitions of cons and consTree are readily adapted to the heterogeneous setting:
    consTree : HTree t → HRAL ral → HRAL (RAL.consTree t ral)
    consTree t HNil             = HCons₁ t HNil
    consTree t (HCons₁ t' hral) = HCons₀ (consTree (HNode t t') hral)
    consTree t (HCons₀ hral)    = HCons₁ t hral
    cons : (x : Val u) → HRAL ral → HRAL (RAL.cons u ral)
    cons x r = consTree (HLeaf x) r
Again, the only interesting change here is in the type signature. The result of the cons function uses the cons operation on homogeneous binary random-access lists defined in the previous section. Rather than incrementing the binary number counting the number of elements as we did in the homogeneous case, we extend the index binary random-access list with the type of the new element.
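The lookup function on heterogeneous binary random-access lists is described above but not shown. The sketch below gives one definition that follows the description, again assuming a hypothetical two-code universe; implicit arguments are spelled out so that the file stands alone, and lookupTree, types, env and test are illustrative names.

```agda
open import Agda.Builtin.Nat using (Nat)
open import Agda.Builtin.Bool using (Bool; true; false)
open import Agda.Builtin.Equality using (_≡_; refl)

data ℕ : Set where
  Zero : ℕ
  Succ : ℕ → ℕ

data Bin : Set where
  End  : Bin
  One  : Bin → Bin
  Zero : Bin → Bin

-- ASSUMPTION: a hypothetical universe standing in for Figure 1.
data U : Set where
  nat bool : U

Val : U → Set
Val nat  = Nat
Val bool = Bool

variable
  n : ℕ
  b : Bin
  u : U

data Tree (a : Set) : ℕ → Set where
  Leaf : a → Tree a Zero
  Node : Tree a n → Tree a n → Tree a (Succ n)

data RAL (a : Set) : ℕ → Bin → Set where
  Nil   : RAL a n End
  Cons₁ : Tree a n → RAL a (Succ n) b → RAL a n (One b)
  Cons₀ : RAL a (Succ n) b → RAL a n (Zero b)

data HTree : {n : ℕ} → Tree U n → Set where
  HLeaf : Val u → HTree (Leaf u)
  HNode : {us vs : Tree U n} → HTree us → HTree vs → HTree (Node us vs)

data HPath : {n : ℕ} → Tree U n → U → Set where
  HHere  : HPath (Leaf u) u
  HLeft  : {us vs : Tree U n} → HPath us u → HPath (Node us vs) u
  HRight : {us vs : Tree U n} → HPath vs u → HPath (Node us vs) u

lookupTree : {us : Tree U n} → HTree us → HPath us u → Val u
lookupTree (HNode l r) (HLeft p)  = lookupTree l p
lookupTree (HNode l r) (HRight p) = lookupTree r p
lookupTree (HLeaf x)   HHere      = x

data HRAL : {n : ℕ} {b : Bin} → RAL U n b → Set where
  HNil   : HRAL {n} {End} Nil
  HCons₁ : {t : Tree U n} {ral : RAL U (Succ n) b}
         → HTree t → HRAL ral → HRAL (Cons₁ t ral)
  HCons₀ : {ral : RAL U (Succ n) b} → HRAL ral → HRAL (Cons₀ ral)

data HPos : {n : ℕ} {b : Bin} → RAL U n b → U → Set where
  HHere   : {t : Tree U n} {ral : RAL U (Succ n) b}
          → HPath t u → HPos (Cons₁ t ral) u
  HThere₀ : {ral : RAL U (Succ n) b} → HPos ral u → HPos (Cons₀ ral) u
  HThere₁ : {t : Tree U n} {ral : RAL U (Succ n) b}
          → HPos ral u → HPos (Cons₁ t ral) u

-- ASSUMPTION: a definition consistent with the prose description.
lookup : {ral : RAL U n b} → HRAL ral → HPos ral u → Val u
lookup (HCons₁ t hral) (HHere path) = lookupTree t path
lookup (HCons₀ hral)   (HThere₀ p)  = lookup hral p
lookup (HCons₁ _ hral) (HThere₁ p)  = lookup hral p

types : RAL U Zero (One (One End))
types = Cons₁ (Leaf nat) (Cons₁ (Node (Leaf bool) (Leaf nat)) Nil)

env : HRAL types
env = HCons₁ (HLeaf 1) (HCons₁ (HNode (HLeaf true) (HLeaf 2)) HNil)

-- The second element has code bool, so the lookup returns a Bool.
test : lookup env (HThere₁ (HHere (HLeft HHere))) ≡ true
test = refl
```

As in the homogeneous case, no clause for HNil is needed, because no HPos can be indexed by the empty list of types.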

An alternative evaluator
Finally, we can write a variation of our original evaluator. We begin by defining two functions. The first calculates the length of a (linear) context represented as a binary number; the second converts a context to a binary random-access list:
    lengthBin : Ctx → Bin
    lengthBin Nil        = End
    lengthBin (u :: ctx) = bsucc (lengthBin ctx)
    makeRAL : (ctx : Ctx) → RAL.RAL U Zero (lengthBin ctx)
    makeRAL Nil        = RAL.Nil
    makeRAL (u :: ctx) = RAL.cons u (makeRAL ctx)
The random-access list produced in this fashion has type RAL U Zero (lengthBin ctx). As we are constructing a binary random-access list by repeatedly calling the cons function, the depth index must be Zero; (the binary representation of) the number of elements is computed using the lengthBin function.
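To see what makeRAL produces for a concrete context, the following self-contained sketch converts a three-element context and checks the resulting shape. It assumes a hypothetical two-code universe, reconstructed definitions of Bin, bsucc and cons, and unqualified constructor names (relying on Agda's type-directed resolution of overloaded constructors instead of the qualified RAL.Nil used in the text).

```agda
open import Agda.Builtin.Equality using (_≡_; refl)

data ℕ : Set where
  Zero : ℕ
  Succ : ℕ → ℕ

-- ASSUMPTION: Bin and bsucc reconstructed from the surrounding text.
data Bin : Set where
  End  : Bin
  One  : Bin → Bin
  Zero : Bin → Bin

bsucc : Bin → Bin
bsucc End      = One End
bsucc (One b)  = Zero (bsucc b)
bsucc (Zero b) = One b

-- ASSUMPTION: a hypothetical universe of codes.
data U : Set where
  nat bool : U

infixr 5 _::_
data Ctx : Set where
  Nil  : Ctx
  _::_ : U → Ctx → Ctx

variable
  a : Set
  n : ℕ
  b : Bin

data Tree (a : Set) : ℕ → Set where
  Leaf : a → Tree a Zero
  Node : Tree a n → Tree a n → Tree a (Succ n)

data RAL (a : Set) : ℕ → Bin → Set where
  Nil   : RAL a n End
  Cons₁ : Tree a n → RAL a (Succ n) b → RAL a n (One b)
  Cons₀ : RAL a (Succ n) b → RAL a n (Zero b)

consTree : Tree a n → RAL a n b → RAL a n (bsucc b)
consTree t Nil          = Cons₁ t Nil
consTree t (Cons₁ t' r) = Cons₀ (consTree (Node t t') r)
consTree t (Cons₀ r)    = Cons₁ t r

cons : a → RAL a Zero b → RAL a Zero (bsucc b)
cons x r = consTree (Leaf x) r

-- The length of a context, as a binary number.
lengthBin : Ctx → Bin
lengthBin Nil        = End
lengthBin (u :: ctx) = bsucc (lengthBin ctx)

-- The context itself, reorganised as a random-access list of codes.
makeRAL : (ctx : Ctx) → RAL U Zero (lengthBin ctx)
makeRAL Nil        = Nil
makeRAL (u :: ctx) = cons u (makeRAL ctx)

-- Three codes end up in a leaf followed by a tree of depth one.
test : makeRAL (nat :: bool :: nat :: Nil)
     ≡ Cons₁ (Leaf nat) (Cons₁ (Node (Leaf bool) (Leaf nat)) Nil)
test = refl
```

The clause for u :: ctx type checks without any coercion because lengthBin (u :: ctx) unfolds to bsucc (lengthBin ctx), which is exactly the index that cons returns.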
Next, we would like to convert the linear references from the introduction to the positions in a binary random-access list. This amounts to defining a function, toHPos, with the help of two auxiliary functions, top and pop. With references converted to positions, the evaluator can be rewritten to use a heterogeneous binary random-access list as its environment (the evaluator called evalHPos below). Crucially, the definition does not require type coercions or any additional proofs to type check. The type indices of the Term data type ensure we can still safely use Agda's lambda abstraction and application; the cons operations on homogeneous and heterogeneous random-access lists used in the types and values, respectively, share the same structure.
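The conversion from references to positions is not shown above. One way it could be defined is sketched below; this is an assumption, not the author's code. Only the names top, pop and toHPos and their types are suggested by the text; the helpers topTree and popTree, the hypothetical universe, and the explicitly supplied implicit arguments are introduced here for illustration.

```agda
open import Agda.Builtin.Equality using (_≡_; refl)

data ℕ : Set where
  Zero : ℕ
  Succ : ℕ → ℕ

data Bin : Set where
  End  : Bin
  One  : Bin → Bin
  Zero : Bin → Bin

bsucc : Bin → Bin
bsucc End      = One End
bsucc (One b)  = Zero (bsucc b)
bsucc (Zero b) = One b

data U : Set where            -- ASSUMPTION: hypothetical universe of codes
  nat bool : U

infixr 5 _::_
data Ctx : Set where
  Nil  : Ctx
  _::_ : U → Ctx → Ctx

data Ref : Ctx → U → Set where
  Top : ∀ {u ctx} → Ref (u :: ctx) u
  Pop : ∀ {u v ctx} → Ref ctx u → Ref (v :: ctx) u

data Tree (a : Set) : ℕ → Set where
  Leaf : a → Tree a Zero
  Node : ∀ {n} → Tree a n → Tree a n → Tree a (Succ n)

data RAL (a : Set) : ℕ → Bin → Set where
  Nil   : ∀ {n} → RAL a n End
  Cons₁ : ∀ {n b} → Tree a n → RAL a (Succ n) b → RAL a n (One b)
  Cons₀ : ∀ {n b} → RAL a (Succ n) b → RAL a n (Zero b)

consTree : ∀ {a n b} → Tree a n → RAL a n b → RAL a n (bsucc b)
consTree t Nil          = Cons₁ t Nil
consTree t (Cons₁ t' r) = Cons₀ (consTree (Node t t') r)
consTree t (Cons₀ r)    = Cons₁ t r

cons : ∀ {a b} → a → RAL a Zero b → RAL a Zero (bsucc b)
cons x r = consTree (Leaf x) r

lengthBin : Ctx → Bin
lengthBin Nil        = End
lengthBin (u :: ctx) = bsucc (lengthBin ctx)

makeRAL : (ctx : Ctx) → RAL U Zero (lengthBin ctx)
makeRAL Nil        = Nil
makeRAL (u :: ctx) = cons u (makeRAL ctx)

data HPath : ∀ {n} → Tree U n → U → Set where
  HHere  : ∀ {u} → HPath (Leaf u) u
  HLeft  : ∀ {n u} {us vs : Tree U n} → HPath us u → HPath (Node us vs) u
  HRight : ∀ {n u} {us vs : Tree U n} → HPath vs u → HPath (Node us vs) u

data HPos : ∀ {n b} → RAL U n b → U → Set where
  HHere   : ∀ {n b u} {t : Tree U n} {ral : RAL U (Succ n) b}
          → HPath t u → HPos (Cons₁ t ral) u
  HThere₀ : ∀ {n b u} {ral : RAL U (Succ n) b} → HPos ral u → HPos (Cons₀ ral) u
  HThere₁ : ∀ {n b u} {t : Tree U n} {ral : RAL U (Succ n) b}
          → HPos ral u → HPos (Cons₁ t ral) u

-- Hypothetical helper: where a path into t ends up after consTree t ral.
topTree : ∀ {n b u} {t : Tree U n} (ral : RAL U n b)
        → HPath t u → HPos (consTree t ral) u
topTree Nil p = HHere p
topTree {t = t} (Cons₁ t' r) p = HThere₀ (topTree {t = Node t t'} r (HLeft p))
topTree (Cons₀ r) p = HHere p

-- Hypothetical helper: where an existing position ends up after consTree t ral.
popTree : ∀ {n b u} (t : Tree U n) (ral : RAL U n b)
        → HPos ral u → HPos (consTree t ral) u
popTree t Nil ()
popTree t (Cons₁ t' r) (HHere p)   = HThere₀ (topTree {t = Node t t'} r (HRight p))
popTree t (Cons₁ t' r) (HThere₁ p) = HThere₀ (popTree (Node t t') r p)
popTree t (Cons₀ r)    (HThere₀ p) = HThere₁ p

top : ∀ {b u} {ral : RAL U Zero b} → HPos (cons u ral) u
top {u = u} {ral = ral} = topTree {t = Leaf u} ral HHere

pop : ∀ {b u v} {ral : RAL U Zero b} → HPos ral u → HPos (cons v ral) u
pop {v = v} {ral = ral} p = popTree (Leaf v) ral p

toHPos : ∀ {ctx u} → Ref ctx u → HPos (makeRAL ctx) u
toHPos {ctx = _ :: ctx} Top       = top {ral = makeRAL ctx}
toHPos {ctx = v :: ctx} (Pop ref) = pop {v = v} {ral = makeRAL ctx} (toHPos ref)

-- The second variable of a three-element context lands in the depth-one tree.
test : toHPos {ctx = nat :: bool :: nat :: Nil} (Pop Top)
     ≡ HThere₁ (HHere (HLeft HHere))
test = refl
```

In this sketch the implicit arguments to top and pop are passed by hand, since consTree is not injective and the type checker may not be able to recover them from the expected type alone. Note also that toHPos as written walks the reference linearly; it converts a representation rather than speeding up a lookup.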
Can we prove these two evaluators are equal? To relate them, we need to convert linear environments that the previous evaluator used to a heterogeneous binary random-access list:
    toHRAL : Env ctx → HRAL (makeRAL ctx)
    toHRAL Nil          = HNil
    toHRAL (Cons x env) = cons x (toHRAL env)
Next, we can show that our conversions in representation, toHRAL and toHPos, respect the lookup operation by proving an equality, lookupLemma: looking up a reference in a linear environment yields the same value as looking up the converted position in the converted environment. The proof relies on a pair of auxiliary lemmas, relating the top and pop functions to the lookup of our heterogeneous binary random-access lists:
    lookupPop : (x : Val v) (env : HRAL ral) (p : HPos ral u) → lookup env p ≡ lookup (cons x env) (pop p)
    lookupTop : (x : Val u) (env : HRAL ral) → x ≡ lookup (cons x env) top
Furthermore, we can map one choice of variable representation to another by defining a function mapTerm. Finally, we can prove that, assuming functional extensionality, our two evaluators produce identical results:
    correct : (t : Term Ref ctx u) (env : Env ctx) → evalRef t env ≡ evalHPos (mapTerm toHPos t) (toHRAL env)
The proof itself, using our lookupLemma, is only three lines long.

Discussion
Although the code presented here is written in Agda, it can be converted to Haskell using various modern language features, including data kinds (Yorgey et al., 2012), generalised algebraic data types (Vytiniotis et al., 2006), type families (Eisenberg et al., 2014) and multi-parameter type classes with functional dependencies (Jones, 2000). Very few of the definitions presented here rely on computations appearing in types; as a result, the translation is mostly straightforward. The most problematic parts are the top and pop functions from Section 4, whose definition is complicated by the cons function that appears on the type level. Using type classes rather than type families, we can express the desired computations more easily as a relation, rather than as a (total) function as we did in Agda in this paper. This Haskell version opens the door to more realistic performance studies. While beyond the scope of this pearl, we expect heterogeneous binary random-access lists to be quite a bit slower than Haskell's highly optimised Data.Map library. Nonetheless, we would hope that the performance improvement over heterogeneous lists is sufficient to address the bottlenecks mentioned in the introduction.
Finally, there is clearly a more general pattern at play here: creating a heterogeneous data structure by indexing it by its homogeneous counterpart. Using ornaments (Dagand, 2013; McBride, 2010, unpublished), we might be able to give a generic account of this construction. In particular, the work by Ko & Gibbons (2016) on binomial heaps would be a valuable starting point for such exploration.