Relational cost analysis in a functional-imperative setting

Abstract
Relational cost analysis aims at formally establishing bounds on the difference in the evaluation costs of two programs. As a particular case, one can also use relational cost analysis to establish bounds on the difference in the evaluation cost of the same program on two different inputs. One way to perform relational cost analysis is to use a relational type-and-effect system that supports reasoning about relations between two executions of two programs. Building on this basic idea, we present a type-and-effect system, called ARel, for reasoning about the relative cost (the difference in the evaluation cost) of array-manipulating, higher order functional-imperative programs. The key ingredient of our approach is a new lightweight type refinement discipline that we use to track relations (differences) between two mutable arrays. This discipline combined with Hoare-style triples built into the types allows us to express and establish precise relative costs of several interesting programs that imperatively update their data. We have implemented ARel using ideas from bidirectional type checking.


Introduction
Standard cost analysis aims at statically establishing an upper or a lower bound on the evaluation cost of a program. The evaluation cost is usually measured in abstract units, for example, the number of reduction steps in an operational semantics, the number of recursive calls made by the program, the maximum number of abstract units of memory used during the program's evaluation, etc. Cost analysis has been developed using a variety of techniques such as type systems (Grobauer, 2001; Danielsson, 2008; Dal Lago & Gaboardi, 2011; Hoffmann et al., 2012b; Avanzini & Dal Lago, 2017), term rewriting and abstract interpretation (Hermenegildo et al., 2005; Sinn et al., 2014; Brockschmidt et al., 2014), and Hoare logics (Atkey, 2010; Carbonneaux et al., 2015; Charguéraud & Pottier, 2015).
Relational cost analysis, the focus of this paper, is a more recently developed problem that aims at statically establishing an upper bound on the difference in the evaluation costs of two related programs or two runs of the same program with different inputs (Çiçek et al., 2017; Ngo et al., 2017; Radicek et al., 2018). This difference is called the relative cost of the two programs or runs. Relational cost analysis has many applications: It can show that an optimized program is not slower than the original program on stipulated inputs; in cryptography, it can show that an algorithm's run time is independent of secret inputs, and hence that there are no leaks on the timing side channel; in algorithmic analysis, it can help understand the sensitivity of an algorithm's cost to input changes, which can be useful for resource allocation.
There are two reasons for examining relational cost analysis as a separate problem, as opposed to performing standard unary cost analysis separately on the two programs and taking a difference of the established costs. First, in many cases, relational cost analysis is easier than unary cost analysis. For example, consider a programmer who would like to update a piece of code of a distributed system, where the cost is local memory and this resource is limited. A unary cost analysis of the overall system may be impractical, while it may be easy to analyze the local difference in memory consumption between the original piece of code and the updated one. Second, in many cases, a direct relational cost analysis may be more precise than the difference of two unary analyses, since the relational analysis can exploit relations between intermediate values in the programs that the unary analyses cannot. As an example, the relative cost of two runs of merge sort on lists of length n that differ in at most k positions is in O(n · (1 + log(k))). This relative cost can be established by a relational analysis as shown by Çiçek et al. (2017), but two separate unary analyses can only establish the coarser relative cost O(n · log(n)).
Hitherto, the literature on relational cost analysis has been limited to functional languages. However, many practical programs are stateful and use destructive updates, which are more difficult to reason about. Consequently, our goal in this work is to develop relational cost analysis for functional languages with mutable state (i.e., for functional-imperative programs).
To this end, we propose a refinement type-and-effect system, ARel, for relational cost analysis in a functional, higher order language with mutable state. The first question we must decide on is what kind of state to consider. One option could be to work with standard references as found in many functional languages like ML. However, from the perspective of cost analysis it is often more interesting to consider programs that operate on entire data structures (e.g., a sorting algorithm), not just on individual references. Consequently, we consider mutable arrays, the standard data structure available in almost all imperative languages. This makes our type system more complicated than it would be with standard references but allows us to verify more interesting examples.
Second, we must decide how to treat state in our functional language. Broadly, we have two choices: State could be a pervasive effect as in ML, or it could be confined to a monad as in Haskell, which limits side effects to only those sub-computations that actually access the heap. In ARel, we choose the latter option since this separates the pure and impure (state-affecting) parts of the language at the level of types and reduces the complexity of our typing rules.
The primary typing judgment of ARel, $\vdash t_1 \ominus t_2 \lesssim r : \tau$, states that the programs $t_1$ and $t_2$ are related at type τ, which can specify relational properties of their results, and importantly, that their relative cost (the cost of $t_1$ minus the cost of $t_2$) is upper bounded by r. To reason about array-manipulating programs in the relational setting, we also need to express relations between corresponding arrays across the two runs of the two programs. For this reason, our monadic type (the type of impure computations that can access state) has a refinement that specifies how arrays are related across the two runs before and after a heap-accessing computation. Specifically, our monadic type has the form diff(r) {P} ∃γ.τ {Q}. This type represents a pair of computations, which when starting from arrays related by the relational precondition P, end with arrays related by the relational postcondition Q, return values related at τ, newly generated arrays referred to by static names γ, and have relative cost at most r. This design is inspired by relational Hoare logics (Benton, 2004; Nanevski et al., 2013), but there are two key differences: (1) Our pre- and postconditions are minimal: they only specify the indices at which a pair of corresponding arrays differ across the two runs, not full functional properties. This suffices for relational cost analysis of many programs and simplifies our metatheory and, importantly, the implementation. (2) Our monadic types carry a relative cost, and the monad's constructs combine costs.
Additionally, ARel supports establishing lower and upper bounds on the cost of a single expression, and falling back to such unary analysis in the middle of a proof of relative cost. Improving over previous type-and-effect systems for relational cost analysis, ARel permits combinations of these two kinds of reasoning in the definition of recursive functions. Specifically, ARel provides typing rules for the fix-point operator that allow one to simultaneously reason about the cost in the unary and relational setting. This is useful for the analysis of several programs that we show later.
To prove that our type system is sound, we develop a logical relations model of our types. This model combines unary and binary logical relations and it supports two different effects, cost and state, that are structurally dissimilar. For the state aspect, we build on step-indexed Kripke logical relations (Ahmed, 2004; Ahmed et al., 2009). Specifically, our logical relations are indexed by a "step", a standard device for inductive proofs that counts how many steps of computation the logical relation is good for (Ahmed, 2006; Appel & McAllester, 2001). Owing to the simplicity of our pre- and postconditions, we do not need state-dependent worlds as in some other work (Neis et al., 2011; Turon et al., 2013).
To show the effectiveness of our approach, we implement a bidirectional type checker for ARel. Thanks to the simplified form of our pre- and postconditions, we can solve the constraints generated by the type checker using SMT solvers. The type checker also uses a limited number of heuristics to address some of the non-determinism coming from the relational reasoning and the array operations. In order to evaluate the performance of our implementation, we consider a broad set of examples showcasing different challenges for relational cost analysis in programs manipulating arrays.
Our overarching contribution lies in extending relational cost analysis to higher order functional-imperative programs. Our specific contributions are as follows:
• ARel, a type system for relational cost analysis of functional-imperative programs with mutable arrays.
• A design for lightweight (relational) refinements of array-based computations.
• A soundness proof for our type system via a new step-indexed logical relation.
• An implementation of ARel, based on bidirectional type checking, which we use to type check several functional-imperative examples.
Improvement with respect to the conference version. This paper is an extended version of a paper published at the ICFP 2019 conference. The main additions to the conference version are as follows:
• A comprehensive presentation of the logical relations (Section 4) used to prove soundness, along with a full definition of the type interpretation. Additionally, representative cases of the proof of the fundamental theorem (Section 4.3) are included.
• Two new examples (Section 5). The first one is Mergesort. This example further illustrates the expressiveness of ARel. The second example is Loop unswitching, a common technique used in compiler optimization. This example aims at giving our readers insights about how ARel handles two programs that are not structurally similar.
• A comprehensive presentation of the algorithmic version of ARel (Section 6). This section briefly introduces bidirectional type checking along with the difficulties of directly applying this technique to ARel. This motivates the introduction of a core language, ARelCore, as a theoretical midway point for the algorithmization of ARel (Section 6.1). The concrete algorithmic type system BiARel works on the core language (Section 6.3). This section also shows the soundness and completeness of ARelCore with respect to BiARel and the soundness and completeness of the elaboration from ARel to ARelCore (Section 6.2).
• An annotated example mapi used to show how the type checker works and the extent of the required annotations (Section 7.3). This section also includes an in-depth discussion of the limitations of our implementation and directions for improvement.
• The code of the type checker as well as the examples used in Section 7 have been released publicly at https://github.com/haddyclipk/ICFP2019_BiArel. An appendix with full proofs is also available in the repository.

ARel through examples
In this section, we illustrate the key ideas behind ARel through two simple examples.
Inplace Map. Consider the following imperative map function, named mapi, taking as input a pure function f , a mutable array a, an index k, and the array's length n. For all i ∈ [k, n], the function replaces the current value in the ith cell of a with f (a[i]), thus performing a destructive update.
    fix mapi (f).λa.λk.λn.
      if k ≤ n then
        let {x} = read a k in
        let {_} = updt a k (f x) in
        mapi f a (k + 1) n
      else return ()

The expression read a k returns the element at index k in the array a, and updt a k v updates the index k in a to v. Our language uses a state monad to isolate all side effects like array reads and updates, so read a k and updt a k v are actually expressions of monadic types, also called computations. The construct let {x} = t₁ in t₂ is monadic sequencing, often called "bind". Consider the problem of establishing an upper bound on the relative cost of two runs of mapi that use the same function f but two different arrays a. Intuitively, the relative cost should be upper bounded by the product of the maximal variation in the cost of the function f (across inputs) and the number of indices in the range [k, n] at which the two arrays differ.
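The intuition about the relative cost of mapi can be checked on concrete data. The following Python sketch is only an illustration and is not ARel code: it replaces the recursion by a loop, treats n as the last index to visit, and uses a hypothetical cost model cost_of in which the cost of f varies by at most r = 2 across inputs.

```python
def mapi(f, a, k, n, cost_of):
    """In-place map over a[k..n] (inclusive); returns the total cost charged for f.

    cost_of(x) is an assumed per-call cost model for f, not part of ARel.
    """
    total = 0
    while k <= n:
        x = a[k]              # read a k
        total += cost_of(x)   # cost of evaluating f x
        a[k] = f(x)           # updt a k (f x)
        k += 1
    return total


def f(x):
    return x * x


def cost_of(x):
    # Assumption: f costs 1 on even inputs and 3 on odd inputs,
    # so its cost varies by at most r = 2 across inputs.
    return 1 if x % 2 == 0 else 3


r = 2
a1 = [4, 7, 2, 9, 6]
a2 = [4, 8, 2, 9, 6]
k, n = 1, 4

# Indices in [k, n] at which the two input arrays differ.
beta = {i for i in range(k, n + 1) if a1[i] != a2[i]}

cost1 = mapi(f, a1, k, n, cost_of)
cost2 = mapi(f, a2, k, n, cost_of)
relative_cost = cost1 - cost2

# Indices at which the arrays differ after both runs.
diff_after = {i for i in range(len(a1)) if a1[i] != a2[i]}

print(relative_cost, len(beta) * r, diff_after <= beta)
```

In this run the arrays differ only at index 1, so the relative cost stays within one variation r of f, and the arrays still differ only inside beta afterwards.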
To support reasoning about two runs as in this example, ARel supports relational types that ascribe a pair of related values or related expressions in the two runs. Relational types are written τ . In general, when we say x : τ , we mean that the variable x may be bound to two different values in the two runs, but these two values will be related by the type τ . Specifically, x : τ 1 → τ 2 means that x can be bound to two different functions f 1 , f 2 in the two runs, satisfying the property that for any two arguments v 1 , v 2 of relational type τ 1 , the two expressions f 1 v 1 , f 2 v 2 have relational type τ 2 . Naturally, ARel also supports unary types, denoted A, that ascribe only one value or expression in a single run, but we will have no occasion to use unary types in this example, so we postpone their discussion.
To establish the relative cost of mapi, we first need a way to represent that the same function f will be given to mapi in both runs. To this end, ARel offers the type annotation □. The type □τ relates expressions in two runs that are (syntactically) equal and are additionally related at the relational type τ. Note that □ is a relational refinement: It refines the relation defined by the underlying type τ. Specifically, the relational typing assumption f : □(τ₁ → τ₂) means that, in the two runs, f will be bound to two copies of the same function, say f, that given arguments v₁, v₂ related at type τ₁, give expressions f v₁ and f v₂ related at type τ₂. In our example, if the array's elements have type τ, the type of f would be □(τ → τ).
Next, we need to represent the maximum possible variation in the cost of applying f . The possible variation in the cost can be seen as an effect, and the cost of applying a function can be seen as the effect associated with the body of the function, in particular. Hence, as is common in effect systems (Nielson & Nielson, 1999), we can record the possible variation in cost by means of a refinement of the function type. ARel offers a refinement of this kind.
We write $\tau_1 \xrightarrow{\mathrm{diff}(r)} \tau_2$ to represent two functions of relational type τ₁ → τ₂, the relative cost of whose bodies is upper bounded by r. Accordingly, if f's cost can vary by r, its type can be further refined to $\square(\tau \xrightarrow{\mathrm{diff}(r)} \tau)$.
Next, we need a way to specify where the arrays given as inputs to mapi in the two runs differ. There are various design choices for supporting this. One obvious but problematic option would be to refine the type of an array itself, to specify where the two ascribed arrays differ across two runs. However, this design quickly runs into an issue: An update on the arrays might be different in the two runs, so it might change the arrays' type. This would be highly unsatisfactory since we don't expect the type of an array to change due to an update; in particular, this design would not satisfy (semantic or syntactic) type preservation.
Consequently, we use a different approach inspired by relational Hoare logics: We provide a relational refinement type diff(r) {P} ∃γ.τ {Q} for monadic expressions that manipulate arrays. The number r is an upper bound on the relative cost of the computations in two runs, similar to the one we have in function types, and τ is the relational type relating the pair of pure values returned by the computations. The precondition P specifies for each pair of related arrays in scope where (at which indices) these corresponding arrays are allowed to differ before the execution of the computations, while the postcondition Q specifies where these arrays may differ after the completion of the computations. More specifically, P and Q are lists of annotations of the form γ → β, where γ is a static name identifying a pair of related arrays and β is a set of indices where this pair of arrays may differ in the two runs. In other words, at any index not in β, the corresponding arrays must be the same in the two runs. Note that even at indices in β, the corresponding values must be related at τ, but our type system includes types that do not force equality of the related values. One such type is U(A, B) that only insists that the left and right values have (unary) types A and B, without requiring any other relation between them. (The existentially quantified γ in diff(r) {P} ∃γ.τ {Q} is the list of static names of arrays that are allocated during the computation.) To be more concrete, let us consider an example: If x : □τ (i.e., x is the same in two runs) and b represents a pair of arrays associated with the static name γ, then two equal update operations (updt b 5 x) can be given the relational monadic type diff(0) {γ → β} ∃_.unit {γ → (β \ {5})}, for any β.
This type means that if the two corresponding arrays b differ at most in the set of indices β before updt b 5 x executes, then afterwards the arrays can still differ in the indices β except at the index 5, which has been overwritten by the same value x (indicated by the box type □τ). If we replace the assumption x : □τ with x : τ, so that x may differ in the two runs, then the type of updt b 5 x relative to itself would be diff(0) {γ → β} ∃_.unit {γ → (β ∪ {5})}, indicating that the arrays may differ at index 5 after the update (even if they did not differ at that index before the update).
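The way an update transforms the difference set β can be mimicked outside the type system. The following Python sketch is an illustration under stated assumptions: the helper names updt_postcondition and differing are hypothetical, and a boolean flag stands in for the distinction between a boxed value (same in both runs) and an unboxed one (possibly different).

```python
def updt_postcondition(beta, index, value_is_boxed):
    """Indices where two related arrays may differ after both runs write at `index`.

    value_is_boxed=True models a written value that is the same in both runs;
    value_is_boxed=False models a written value that may differ across runs.
    """
    if value_is_boxed:
        return beta - {index}
    return beta | {index}


def differing(b1, b2):
    """Indices at which two concrete arrays actually differ."""
    return {i for i in range(len(b1)) if b1[i] != b2[i]}


# Two runs whose arrays differ at indices 2, 5, and 7.
b1 = [0, 0, 1, 0, 0, 1, 0, 1]
b2 = [0, 0, 2, 0, 0, 2, 0, 2]
beta = differing(b1, b2)

# Same value written at index 5 in both runs: index 5 leaves the set.
b1[5] = 9
b2[5] = 9
post_equal = updt_postcondition(beta, 5, True)

# Possibly different values written at index 3: index 3 joins the set.
b1[3] = 1
b2[3] = 4
post_unequal = updt_postcondition(post_equal, 3, False)

print(sorted(post_equal), sorted(post_unequal))
```

The computed set is an over-approximation by design: the arrays are only required to agree outside it, which is why the actual set of differing indices is checked for inclusion rather than equality.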
In order to make this reasoning formal, we need a way to tie the static names γ appearing in computation types to specific arrays. To this end, we refine the type of arrays to include γ. In fact, we also refine the type of arrays to track the length of the array. This doubly refined type is written Array_γ[l] τ: a pair of arrays of length l each, identified statically by the name γ, and carrying elements related pointwise at type τ. Finally, we refine integers very precisely: The type int[n] is the singleton type containing only the integer n in both runs. The n in the type is a static representation of the runtime value the type ascribes.
With all these components we can now represent the relative cost of mapi by the following judgment:
$\vdash \mathit{mapi} \ominus \mathit{mapi} \lesssim 0 : \forall r, k, n, \gamma, \beta.\ \square(\tau \xrightarrow{\mathrm{diff}(r)} \tau) \to \mathrm{Array}_{\gamma}[n]\,\tau \to \mathrm{int}[k] \to \mathrm{int}[n] \to \mathrm{diff}(|\beta \cap [k,n]| * r)\ \{\gamma \to \beta\}\ \exists\_.\mathrm{unit}\ \{\gamma \to \beta\}$
This typing means that mapi relates to itself in the following way. Consider two runs of mapi with the same function f of relative cost r (type $\square(\tau \xrightarrow{\mathrm{diff}(r)} \tau)$), two arrays of static length n, statically named γ (type Array_γ[n] τ), two indices, both k (type int[k]), and two lengths, both n (type int[n]). Then, the two runs return computations with the following relational property: If the two arrays differ at most at indices β before they are passed to mapi, then they differ at most at the same positions after the computations, and the relative cost of the two computations is upper bounded by |β ∩ [k, n]| * r, that is, the number of positions in the range [k, n] at which the arrays may differ times r. This is exactly the expected relative cost because at positions where the arrays are equal, f will have the same cost in the two runs (we are assuming language-level determinism here). Note that the variables r, k, n, γ, and β are universally quantified in the type above. Also, note how γ links the input array to the β in the pre- and postcondition of the computation type.
As is usual in effect systems, when we apply mapi, we have two kinds of costs. For example, suppose that we provide arguments f, a, k, and n. Then, we have some cost D such that:
$\vdash \mathit{mapi}\ f\ a\ k\ n \ominus \mathit{mapi}\ f\ a\ k\ n \lesssim D : \mathrm{diff}(|\beta \cap [k,n]| * r)\ \{\gamma \to \beta\}\ \exists\_.\mathrm{unit}\ \{\gamma \to \beta\}$
Here, we can think of D as a bound on the relative cost of the computation before we get to the array evaluation, while |β ∩ [k, n]| * r is a bound on the relative cost of the part of the computation involving arrays.
Consider now a slightly different situation where different functions f may be passed to mapi in the two runs. Suppose that the relative cost of the bodies of the two fs passed is upper bounded by r, that is, f has the type $\tau \xrightarrow{\mathrm{diff}(r)} \tau$ (without the prefix □). In this case, the relative cost of the two runs of mapi can only be upper bounded by |[k, n]| * r, since even at indices where the arrays agree, the cost of applying the two different fs may differ by as much as r. Moreover, the final arrays may differ in all positions in the range [k, n]. This is formalized in the following, second relational type for mapi.
$\vdash \mathit{mapi} \ominus \mathit{mapi} \lesssim 0 : \forall r, k, n, \gamma, \beta.\ (\tau \xrightarrow{\mathrm{diff}(r)} \tau) \to \mathrm{Array}_{\gamma}[n]\,\tau \to \mathrm{int}[k] \to \mathrm{int}[n] \to \mathrm{diff}(|[k,n]| * r)\ \{\gamma \to \beta\}\ \exists\_.\mathrm{unit}\ \{\gamma \to \beta \cup [k,n]\}$
BooleanOr. Next, we describe how high-level reasoning about relative cost is internalized in the typing. ARel supports two kinds of typing modes: relational typing as shown in the imperative map example above, and unary typing which supports traditional (unary) min- and max-cost analysis for a single run of a program. We will introduce these modes formally in the next section, but here we want to show with the following example how they can be meaningfully combined via an extended fix rule r-fix-ext.
    fix BoolOr (a).λk.λn.
      if k < n then
        let {x} = read a k in
        if x then return true else BoolOr a (k + 1) n
      else return false

This function, given as input an array of booleans a, an index k, and the array's length n, says whether there exists an element in a with index ≥ k and value true.
Given two arbitrary arrays a in two runs, a simple upper bound on the relative cost of BoolOr is (n − k) * c where c is the cost of one iteration. This is because in one run we can find an element with value true in position k, and so the computation can return immediately, while in the other run, we may not find any such element, and would need to visit every element of the array with index at least k. This kind of high-level reasoning corresponds to a worst-case, best-case analysis of the two individual runs. ARel supports this kind of reasoning by providing worst-case, best-case (unary) cost analysis in unary mode, together with a rule r-switch, presented formally in Section 3, that allows us to derive a relational typing from two unary typings, with relative cost equal to the difference between the max and the min costs of the unary typings.
However, this kind of reasoning does not account for the case where the two input arrays have a meaningful relation, for example, they may be equal in some positions. In such cases, a better upper bound on the relative cost would be expressed in terms of the first index i (if any) where the two arrays differ. That is, we could have the upper bound (n − i) * c. Showing this upper bound in a formal way is more involved. We first need to proceed by case analysis on whether the element x we are reading at each step is the same in the two runs or not. Case analysis in ARel is provided by the rule r-split, presented in Section 3. Using this rule we can consider the two cases separately in typing the subexpression: if x then return true else BoolOr a (k + 1) n.
If x is the same in the two runs, there is no difference in cost because we either return true in both runs, or we perform the recursive call in both runs. In case the two x's differ, we must switch to the unary analysis of the two individual runs since in one run we will return immediately while in the other we will make a recursive call, so there is no way to continue reasoning relationally. Hence, in order to derive the required upper bound on the overall relative cost we need to have information about the unary type of BoolOr. However, since we started by trying to type the body of BoolOr relationally, the standard fixpoint rule only allows us to assume its relational type.
One solution to this impasse is to automatically transform relational types of variables in context to unary types when switching from relational to unary reasoning. This approach was adopted by Çiçek et al. (2017) for analyzing pure functional programs but it provides only trivial lower and upper bounds (0 and ∞) on the costs of function variables in the context during the unary analysis. In our example here, this approach yields the trivial upper bound ∞, which is not what we want.
To allow for more precise analysis, ARel includes a new rule r-fix-ext which we introduce formally in Section 3. This rule allows us to assume unary typing of two recursive functions, when typing their bodies relationally. With this rule, we can use the (assumed) relational type of BoolOr and its unary type in typing the subexpression BoolOr a (k + 1) n. With this, we can conclude the inductive step and assign the precise relative cost (n − i) * c to BoolOr.
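The two bounds discussed for BoolOr can be compared on a concrete pair of runs. The sketch below is an illustration only, not ARel code: it uses a loop instead of recursion and an assumed cost model that charges c for every loop iteration, including the one that finds true.

```python
def bool_or(a, k, n, c=1):
    """Return (result, cost): is there a True at an index in [k, n)?

    Assumed cost model: each loop iteration costs c.
    """
    cost = 0
    while k < n:
        cost += c
        if a[k]:
            return True, cost
        k += 1
    return False, cost


c = 1
k, n = 0, 6
a1 = [False, False, False, False, False, False]
a2 = [False, False, True, False, False, False]

# First index at which the two input arrays differ.
i = next(j for j in range(k, n) if a1[j] != a2[j])

res1, cost1 = bool_or(a1, k, n, c)
res2, cost2 = bool_or(a2, k, n, c)
relative_cost = cost1 - cost2

naive_bound = (n - k) * c      # worst case minus best case, ignoring any relation
refined_bound = (n - i) * c    # uses the fact that the arrays agree before index i

print(relative_cost, refined_bound, naive_bound)
```

Both runs behave identically on the common prefix, so the part of the cost incurred before index i cancels out; only the suffix starting at i can contribute to the relative cost.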

ARel formally
In this section, we present the syntax, semantics, and the type system of ARel. We can think about ARel as composed of two parts, a pure part, inspired by Çiçek et al. (2017), which allows one to reason about the difference in the execution costs of two pure programs and an impure part, which allows one to reason about the difference in the execution costs of two programs involving array operations. In our paper, we will mostly focus on the impure part, giving details of the pure part as needed.

Syntax
We summarize ARel's syntax in Figure 1. The term language underlying ARel is a simply typed λ-calculus with recursion and constructs for mutable arrays. Most of the pure constructs (the ones in black in Figure 1) are similar to those found in a standard pure functional language. We have variables x, natural numbers n, real numbers r, unit (), lambda abstraction λx.t, recursion fix f(x).t, and application t₁ t₂. We have the introduction and elimination constructs for product, sum, and inductive list types. Additionally, we have some constructs to deal with type-level information. We have term constructs Λ.t and t [], and pack t and unpack t₁ as x in t₂, corresponding to the introduction and elimination of universal and existential types over index terms, respectively, and a term construct celim t, which is used to eliminate type-level constraints. We discuss these constructs further when we introduce types.
The impure part at the term level consists of constructs to deal with arrays, which we highlight with blue underlines in Figure 1. We have constructs for allocating arrays (alloc t 1 t 2 , where t 1 specifies the number of array cells to be allocated, and t 2 the initial value to be stored in each array cell), for reading from arrays (read t 1 t 2 , where t 1 specifies the array to read from, and t 2 the position in the array to read from), and for updating arrays (updt t 1 t 2 t 3 , where t 1 specifies the array to be updated, t 2 the position in the array to be updated, and t 3 the value to be used for the update). All imperative (array-manipulating) constructs are confined to a monad. The constructs return t and let {x} = t 1 in t 2 are the usual return and bind of the monad. We do not distinguish between impure expressions and pure expressions at the syntactic level; this distinction is enforced by types. Impure expressions (expressions of monadic types) are values, but can be forced using a special forcing semantics that we describe below. Finally, arrays are referenced through locations, l ∈ Loc, where Loc is a fixed set of heap locations. Although locations do not appear in programs, they are needed for the evaluation, so they are included in the syntax.
Types can contain index terms. We use iVar to denote the set of index term variables, and iLoc to denote the set of index term variables that refer to locations statically. These static identifiers for locations are denoted γ and belong to a specific sort written L. We discuss index terms in detail in Section 3.3.

Operational semantics
We define a cost-annotated, big-step operational semantics for our language. Part of this semantics is based on manipulation of heaps, also described in Figure 1. We represent heaps as mappings H = [l₁ → z₁, ..., lₙ → zₙ] from memory locations to concrete arrays z = [v₁, ..., vₙ]. The notation H(l)[n] = v expresses that the value v is stored in the heap H in the array pointed to by the location l at the index n. The notation H(l)[n] ← v represents an update to the heap H: The array pointed to by l in H is updated with the value v at index n. The notation H₁ ⊎ H₂, in the spirit of separation logic, denotes a disjoint union of the heaps H₁ and H₂. We have two kinds of evaluation judgments: pure evaluation $t \Downarrow^{c,k} v$ states that the (pure) expression t evaluates to the value v with cost c, using k steps, while forcing evaluation $t; H \Downarrow_f^{c,k} v; H'$ states that the impure expression t can be forced in the heap H to the value v and to the updated heap H′ with cost c, consuming k steps. We give a selection of the evaluation rules in Figure 2. We include all the rules for the impure part and a selection for the pure part; all the other rules can be found in the Appendix.
Steps k are a proof artifact, needed only in our soundness proof that relies on a step-indexed logical relation (Section 4). We count a unit step for every elimination and monadic construct. Readers may ignore steps for now. The costs c are what we seek to upper bound (relatively) using our type system. At every elimination form or monadic construct, the semantics adds a construct-dependent cost. For example, the cost c_app appearing in the rules stands for the cost of the function application operation. By changing these costs and setting some of them to 0, we can get different cost models. In other words, our type system is parametric in the costs of individual constructs.
The pure evaluation rules are mostly standard. They track how the cost and the steps change when a pure expression evaluates. The rule e-val says that a value v evaluates to itself with no cost and in zero steps. The rules e-inl and e-casel describe the evaluation of the introduction and elimination terms for the sum type (we only show the rule for inl, as the rule for inr is similar), where in addition we record the cost c_case for the case elimination, and we increment the steps. Rules e-app and e-fix are similar but cover the two application cases; the latter handles recursion.
The forcing evaluation rules are used to evaluate impure (monadic) expressions manipulating heaps (arrays). The rule f-ret forces the evaluation of an expression return t, representing the unit of the monad, by evaluating the underlying pure expression t using the pure evaluation semantics. The cost consists of the cost of the pure evaluation of t and the constant cost c_ret for the monadic return. As one would expect, the unit of the monad wraps a pure expression into a monadic computation, and it accounts for the cost of this operation by means of the cost c_ret. The rule f-bind combines pure and forcing evaluations in order to fully evaluate a monadic let. This rule shows how an impure computation (involving arrays) is evaluated in our semantics. We first evaluate an expression t₁ to a value v using the pure evaluation semantics. Then, we force evaluate this value v to a value v₁ of the underlying type. We can then perform the substitution and evaluate the resulting expression t₂[v₁/x] to a value v₂ using the pure evaluation semantics. The resulting value v₂ is then force evaluated to another value v₃, which is also the result of the overall let expression. The heap also changes accordingly. The cost c_let accounts for additional costs associated with the bind operation itself.
The rule f-read forces the evaluation of a read expression in the heap H by first evaluating the heap location l from which to read, then the index of the element n to read, and then returning the value stored in l at index n. The rule f-updt forces the evaluation of an update expression in a similar way; it returns a unit value. Finally, the rule f-alloc forces the evaluation of an alloc expression by creating a new array with the length specified by the first argument and initial values specified by the second argument, and by allocating it in the heap at a new location l, which is returned.
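To make the cost bookkeeping of the forcing rules concrete, here is a minimal Python sketch of a cost-annotated state monad. It is an illustration under several assumptions rather than the semantics of Figure 2: the construct costs C_RET, C_LET, C_READ, C_UPDT, and C_ALLOC are hypothetical constants, the heap is a dictionary mutated in place instead of being threaded through the judgment, arguments are assumed to be already evaluated (so pure evaluation costs are omitted), and steps are not tracked.

```python
# Hypothetical construct costs; changing them yields a different cost model.
C_RET, C_LET, C_READ, C_UPDT, C_ALLOC = 1, 1, 1, 1, 1


def ret(v):
    """Monadic return: yields v and charges C_RET."""
    return lambda heap: (v, C_RET)


def bind(m, f):
    """Monadic let: force m, pass its value to f, force the result, add C_LET."""
    def run(heap):
        v1, c1 = m(heap)
        v2, c2 = f(v1)(heap)
        return v2, c1 + c2 + C_LET
    return run


def read(loc, n):
    """Return the value stored at index n of the array at location loc."""
    return lambda heap: (heap[loc][n], C_READ)


def updt(loc, n, v):
    """Overwrite index n of the array at location loc with v; returns unit."""
    def run(heap):
        heap[loc][n] = v
        return (), C_UPDT
    return run


def alloc(length, init):
    """Allocate a new array at a fresh location and return that location."""
    def run(heap):
        loc = len(heap)  # fresh because locations are never removed here
        heap[loc] = [init] * length
        return loc, C_ALLOC
    return run


# let {l} = alloc 3 0 in let {_} = updt l 1 42 in read l 1
prog = bind(alloc(3, 0),
            lambda l: bind(updt(l, 1, 42),
                           lambda _: read(l, 1)))

heap = {}
value, cost = prog(heap)
print(value, cost, heap)
```

Setting some of the constants to 0 changes the reported cost without touching the program, which mirrors the remark that the system is parametric in the costs of individual constructs.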
It is worth noticing that, in the forcing evaluation rules, all the subexpressions evaluate using the pure semantics to values of base types, since in our language we only allow arrays of base types.
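The forcing rules described above can be illustrated with a small interpreter. This is only a sketch under stated assumptions: the class names (Ret, Bind, Read, Updt, Alloc), the function force, and the unit cost constants are hypothetical and not part of ARel; pure evaluation is treated as free, and the continuation of a bind is modelled as a Python function.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical constants standing in for c_ret, c_let, c_read, c_updt, c_alloc.
C_RET, C_LET, C_READ, C_UPDT, C_ALLOC = 1, 1, 1, 1, 1

Heap = Dict[int, List[int]]  # location -> array of base-type values


@dataclass
class Ret:
    value: object


@dataclass
class Bind:
    first: object                      # a monadic computation
    rest: Callable[[object], object]   # continuation returning a computation


@dataclass
class Read:
    loc: int
    idx: int


@dataclass
class Updt:
    loc: int
    idx: int
    value: int


@dataclass
class Alloc:
    length: int
    init: int


def force(m, heap: Heap) -> Tuple[object, Heap, int]:
    """Force a computation and return (result, new heap, cost)."""
    if isinstance(m, Ret):
        return m.value, heap, C_RET
    if isinstance(m, Bind):
        v1, h1, c1 = force(m.first, heap)
        v2, h2, c2 = force(m.rest(v1), h1)
        return v2, h2, c1 + c2 + C_LET
    if isinstance(m, Read):
        return heap[m.loc][m.idx], heap, C_READ
    if isinstance(m, Updt):
        arr = list(heap[m.loc])
        arr[m.idx] = m.value
        return (), {**heap, m.loc: arr}, C_UPDT
    if isinstance(m, Alloc):
        loc = max(heap, default=-1) + 1  # fresh location
        return loc, {**heap, loc: [m.init] * m.length}, C_ALLOC
    raise TypeError(f"not a computation: {m!r}")


# Allocate an array of length 3, write 7 at index 1, then read it back.
prog = Bind(Alloc(3, 0), lambda l: Bind(Updt(l, 1, 7), lambda _: Read(l, 1)))
result, final_heap, cost = force(prog, {})
```

The heap is threaded functionally, so the input heap is never mutated; this mirrors the way the rules relate an initial heap to a final heap.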

Index terms and constraints
In the spirit of DML (Xi & Pfenning, 1999), types are indexed using static index terms that are defined in Figure 1. Index terms include natural numbers and real numbers, which we use to express size and cost information, respectively. We equip index terms with several operations including ceiling, floor, log, min, and max. Moreover, we have special index terms denoting (potentially infinite) sets of natural numbers, representing sets of array indexes, and operations over them (we identify these with blue underlines in Figure 1). We denote elements of this class with the letter β. These sets can be used to represent at the type level different information on arrays. In relational types, they represent where two related arrays may differ (as explained earlier), while in unary types, they represent the write permissions for the array. We will return to this point later after we explain the types. We can explicitly form a set through an indexed set comprehension of the form {I i } i∈K , where K ⊆ N, and we can take the union β 1 ∪ β 2 or the difference β 1 \ β 2 of two sets β 1 , β 2 . We use index terms also in the relational type for lists to specify the number of values that differ pointwise in the lists across two runs. We denote index terms with this specific meaning with the metavariable α.
We consider only well-sorted index terms. To this end, we have a sorting judgment of the form Δ ⊢ I :: S, where Δ is a sort environment, assigning sorts to index variables, and S is a sort. Our language has five sorts: N for natural numbers, used for sizes of arrays, lists, and other data structures; R for real numbers, used to express costs; B for booleans; P for sets of natural numbers as just described; and L for static names of arrays (sorting rules can be found in the Appendix). As a convention, we use L, U to represent the unary minimum and maximum costs, and D to denote a maximum relational cost (L, U, and D are always of sort R). Index terms can also appear in constraints C, which express equalities and inequalities over index terms and are used to represent conditional typing.

Unary and relational types
In ARel, we have two typing modes: unary and relational. This separation is also reflected at the type level where we have two different type grammars: unary types A and relational types τ .
Unary types describe values (expressions) in a single run. They use index terms to represent size information, as in the case of the type list[I] A where I represents the size of the list, and costs, as in the case of the type exec(L,U) A −→ A where L and U represent lower and upper bounds on the cost of the body of the function being typed. The cost can also be seen as an effect. We also have other basic types, as well as types for products and sums. Index terms are also used for size in basic types like integers, booleans, etc., for costs in universal quantifications, and in constraints.
Besides the pure types we just discussed, we have a type for arrays and a type for impure computations (with blue underlines in Figure 1). The type Array γ [I] A is the type of arrays of length I containing objects of type A. We limit A to base types like int, bool, etc., to simplify our technical development. In particular, we do not support arrays of arrays here. The annotation γ associates a static name to the array that is typed. This static name can be used to refer to the array in other types.
Impure expressions are typed with monadic types. In our case, a monadic unary type is a cost-annotated Hoare-triple type of the shape exec(L,U) {P} ∃ γ .A {Q}, which is inspired by Hoare Type Theory (Nanevski et al., 2008). Assertions P, Q are sets {γ 1 → β 1 , . . . , γ n → β n } assigning to each static location γ i a set of natural numbers β i describing (write) permissions. The idea is that the array named γ i can be written only at indices in β i (although it may be read anywhere). The domain of Q may be larger than the domain of P, to account for newly allocated arrays. The index terms L and U are lower and upper bounds on the execution cost of the (forcing) evaluation of the typed expression.
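A dynamic analogue of the permission discipline can be sketched as follows. ARel checks these conditions statically in the type system; the runtime check, the function name checked_update, and the dictionary encoding of assertions are assumptions made only for illustration.

```python
from typing import Dict, List, Set

# An assertion maps each static name gamma to its set beta of writable indices.
Permissions = Dict[str, Set[int]]


def checked_update(arrays: Dict[str, List[int]], perms: Permissions,
                   name: str, idx: int, value: int) -> None:
    """Write arrays[name][idx] only if the assertion grants write permission."""
    if not 0 <= idx < len(arrays[name]):
        raise IndexError("index out of array bounds")
    if idx not in perms.get(name, set()):
        raise PermissionError(f"no write permission for {name} at index {idx}")
    arrays[name][idx] = value


arrays = {"g": [0, 0, 0]}
perms = {"g": {0, 2}}                      # beta = {0, 2}
checked_update(arrays, perms, "g", 2, 9)   # allowed, since 2 is in beta
```

Reads need no entry in the assertion, which matches the remark that an array may be read anywhere.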
Additionally, index terms are used in constraints which can appear in types of the shape C ⊃ A, which reads as "the constraint C implies A", and of the shape C&A, which reads as "A and the constraint C is true". These constraints support conditional typing and they are quite useful to restrict the properties of arrays. For example, the type I > 0 & Array γ [I] A ascribes non-empty arrays, and the constraint C in the type C ⊃ A can be used to restrict array index bounds, as we will see in the examples in Section 5.
Relational types ascribe pairs of expressions, and as we will see in Section 4, they are actually interpreted as sets of pairs of values in our model. In relational types, index terms carry not just size information but also information about the relation between the two expressions, between their inputs, and between their outputs. The type list α [I] τ ascribes a pair of lists, each of length I, whose elements are pointwise related at the type τ . Importantly, the relational refinement α specifies an upper bound on the number of positions at which the corresponding elements may differ. In other words, the two lists must have equal elements in at least I − α positions, even if τ allows them to be unrelated. The type int[I] represents pairs of integers both of which are equal to I. In arrow types diff (D) τ −→ τ , the index term D represents an upper bound on the relative cost of the execution of the underlying pair of functions on two inputs related at τ .
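The meaning of the refinement α on relational list types can be made concrete with a small check. This is a sketch only: the function names are hypothetical, and the element type τ is ignored, so only the length and the bound on differing positions are tested.

```python
from typing import Sequence


def differing_positions(xs: Sequence, ys: Sequence) -> int:
    """Count the positions at which two equally long lists differ."""
    if len(xs) != len(ys):
        raise ValueError("related lists must have the same length")
    return sum(1 for x, y in zip(xs, ys) if x != y)


def related_as_list(xs: Sequence, ys: Sequence, size: int, alpha: int) -> bool:
    """Both lists have length `size` and differ in at most `alpha` positions."""
    return len(xs) == len(ys) == size and differing_positions(xs, ys) <= alpha


left = [1, 2, 3, 4]
right = [1, 9, 3, 8]   # differs from `left` at positions 1 and 3
```

Because α is an upper bound, a pair of lists accepted for some α is also accepted for any larger α.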
Given a pair of unary types A 1 , A 2 , the relational type U(A 1 , A 2 ) represents arbitrary pairs of expressions of types A 1 , A 2 , respectively. This offers a way to trivially relate two "unrelated" values. As explained in Section 2, we also have a comonadic relational type □τ , which represents pairs of expressions of type τ that are syntactically equal, that is, pairs whose left and right components are equal. The corresponding subtyping rules s-r-T and s-r-D are shown in Figure 9.
The relational type Array γ [I] τ is similar to the unary array type but it represents two arrays, each of length I, containing values related at τ pointwise. The static name γ is the name for both arrays. As we will see in Section 4, our logical relation relates γ to two arrays in two different heaps. Related impure computations, illustrated in the imperative map example of Section 2, are typed using a relational cost-annotated monadic type diff (D) {P} ∃ γ .τ {Q}. This type has the same shape as the unary monadic type exec(L,U) {P} ∃ γ .A {Q}, but it means something different. In the relational type, the pre- and postconditions P, Q of the form {γ 1 → β 1 , . . . , γ n → β n } have a relational interpretation, namely, that (for all i) the two arrays named γ i must carry equal values at all positions not in β i (and the values must be related at τ ). At positions in β i , the values must still be related at τ , but they need not be equal (unless τ forces this, e.g., with a prefix □). D is an upper bound on the relative cost of forcing the evaluation of the two impure expressions.
As usual, we consider only types that are well-formed. We have well-formedness judgments Σ; Δ; Φ ⊢ A wf for unary types, and Σ; Δ; Φ ⊢ τ wf for relational types. Here, Σ is a location environment listing the locations that can appear in the rest of the judgment, Δ is a sort environment listing all free index variables, and Φ is a constraint environment to support conditional typing. Well-formedness rules are provided in the Appendix.

Unary and relational typing
Unary Typing Judgment. ARel's unary typing uses the judgment form Σ; Δ; Φ; Ω ⊢^U_L t : A, where t is an expression, Σ is a location environment, Δ is a sort environment listing all the free index variables as mentioned before, Φ is a constraint environment, Ω is a unary type environment assigning unary types to variables, A is a unary type, and L and U are index terms representing a lower bound and an upper bound on the cost of evaluating t, respectively. We give a selection of the typing rules for deriving unary typing judgments in Figures 3 (for the pure constructs) and 4 (for the impure part).
We first show the rules for pure expressions in Figure 3. These rules are similar to the ones proposed by Çiçek et al. (2017). The main difference is that our rules have one more environment, Σ, used to store the locations in the heap. Rules u-int and u-var are relatively self-explanatory. Rules u-fix and u-app are similar to the ones available in classical effect systems, where the lower bound and upper bound of the cost of the (recursive) function body are recorded in the function type. The cost of application also includes the cost of executing the function body. Rules u-inl and u-inr are dual. Notice that the introduction of the sum type A 1 + A 2 requires well-formedness for the type that is introduced in the sum. Rule u-case for eliminating the sum type requires the same upper and lower bound in both branches. Rules u-nil and u-cons are similar and specify sizes of lists. Finally, rule u-sub internalizes subtyping and weakening of the upper and lower bounds.
The rules for the unary typing of impure expressions are presented in Figure 4. Since costs of operations like reading and writing memory are variable on most architectures, these rules rely on given upper and lower bounds on the cost of each operation. For example, L read and U read denote the minimum and maximum cost of reading a heap location, respectively. The costs U let , U alloc , L read , L updt are similar. Rules u-ret and u-bind type the unit and the bind of the monad, respectively. They combine the different costs and assertions in the monadic type, using a style similar to separation logic. For example, the assertion P 1 ⋆ P 2 corresponds to disjoint parts of the heap. The rule for allocations, u-alloc, introduces a new static location γ and creates a new monadic type whose postcondition assigns to γ all the natural numbers (N), indicating that the continuation has the permission to write all positions of the array. Additionally, like all other rules, this rule also adds a cost accounting for the forcing of the allocation. Finally, note that the upper and lower bounds on the judgment are 0. This is because alloc t 1 t 2 is a value. Cost arises only when the impure expression alloc t 1 t 2 is forced; this is accounted for in the cost annotations of the monadic type. The rule for reading, u-read, merely checks that the index being read is within the array bounds. The rule for updating, u-updt, performs a similar check but, in addition, requires that the updated index is contained in the permissions available for the array in the precondition.

Relational Typing Judgment. ARel's relational typing uses the judgment form Σ; Δ; Φ; Γ ⊢ t 1 ⊖ t 2 ≲ D : τ .
Here, t 1 and t 2 are two expressions, Σ, Δ, and Φ are environments similar to the ones used by unary typing judgments, Γ is a relational type environment assigning relational types to variables, τ is a relational type for t 1 , t 2 , and D is an index term representing an upper bound on the relative cost of evaluating t 1 and t 2 , that is, cost(t 1 ) − cost(t 2 ). The fact that the relational judgment relates two programs naturally leads to two kinds of relational typing rules: synchronous rules that relate two structurally similar programs, and asynchronous rules for arbitrary programs. We first present a selection of the pure typing rules, both synchronous (Figure 5) and asynchronous (Figure 6). These rules are again inspired by the work of Çiçek et al. (2017). Then, we present a selection of impure typing rules, which support relational cost analysis for arrays. Again, we distinguish synchronous rules (Figure 7) from asynchronous rules (Figure 8). A complete set of typing rules can be found in the Appendix.
Pure Synchronous Rules. We present selected synchronous rules for pure expressions in Figure 5. The rule r-int relates two copies of the same integer n at the singleton type int[n]. The rules r-var, r-fix, and r-app are the relational counterparts of the rules for variables, function abstraction, and application we saw in the unary part. The main difference is that we give an upper bound on the relative cost, rather than lower and upper bounds on the execution cost of a single expression. The rules r-inl, r-inr, and r-case type the introduction and elimination of the relational sum type. In the elimination rule r-case, notice that we require the relative cost D 2 of the two branches to be the same. Rule r-split allows reasoning by cases on any constraint in the constraint environment. Rule r-sub is the relational version of the rule u-sub. It allows weakening the upper bound on the relative cost D as well as subtyping.
Rule r-nc is the introduction rule for □-ed types. Briefly, t can be related to itself at the type □τ when t relates to itself at type τ and, additionally, all variables in the context morally have □-ed types. The latter ensures that variables can only be substituted by equal terms. In this case, the relative cost is 0. This rule allows us to assume that given a pair of functions whose type is refined with □, if we apply them to the same argument, we have two executions following the same path, and at the same cost. We discussed this intuition behind the rule r-nc in Section 2 when we looked at the first relational type of mapi.
Rule r-fix-ext types fixpoint expressions relationally. This rule also requires unary typing for the two functions, which are established in separate premises. We require these additional premises so that we can use the information provided by the unary typing to establish the relative cost. In other words, this rule introduces a weak form of intersection types in the environment which can be used in combination with the rule r-switch ( Figure 6) to give precise bounds on relative cost.
Pure Asynchronous Rules. We present a selection of the pure asynchronous rules in Figure 6. Rule r-switch allows switching from relational reasoning about t 1 and t 2 in the conclusion, to unary reasoning about the two terms independently in the premises. Notice that the relational type in the conclusion is the embedding of the two unary types without any meaningful relation (U(A 1 , A 2 )). The rule uses an erasure map | | i from relational environments to unary environments (i = 1 for left and i = 2 for the right), whose definition can be found in the Appendix. Importantly, the relative cost in the conclusion is the difference of the unary costs in the premises.
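As a worked reading of r-switch, suppose the two unary premises give cost bounds [L_1, U_1] for t 1 and [L_2, U_2] for t 2 . The subscripted names and the numbers below are illustrative assumptions, not taken from the rule as printed in Figure 6; the inequality itself follows directly from the meaning of the unary bounds.

```latex
% Unary premises: L_1 <= cost(t_1) <= U_1 and L_2 <= cost(t_2) <= U_2.
\[
  \mathrm{cost}(t_1) - \mathrm{cost}(t_2) \;\le\; U_1 - L_2
\]
% Example: if U_1 = 10 and L_2 = 4, the relative cost is bounded by 6,
% whatever the actual costs within the unary bounds are.
```

The bound is tight only when the unary bounds are; this is why the synchronous rules are preferable whenever the two programs are structurally similar.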
Rule r-lt-e relates a pure let binding expression to an arbitrary expression. In this rule, we use the metavariable c lt to denote the cost of a pure let construct. (This is different from the cost c let of the monadic bind, which we discussed earlier.) Notice that one of the assumptions in this rule, the one for the expression t 1 , is a unary typing judgment. This is needed in order to provide guarantees on the typability of t 1 and to provide the cost of evaluating it, which is used in the bound on the relative cost in the conclusion of the rule.
The rule r-e-lt is dual to r-lt-e: it relates an arbitrary expression with a standard let. Notice that while the rule r-lt-e uses the upper bound on the unary cost of t 1 , the rule r-e-lt uses the lower bound.
Rule r-app-e relates a function application to an arbitrary expression, while rule r-case-e relates a case expression with an arbitrary expression and r-e-case does the opposite. Also in these rules, we use some unary typing assumptions to guarantee typability and to provide unary costs that are used in giving upper bounds on the relative costs.

Impure Synchronous Rules. Figure 7 shows a selection of relational synchronous typing rules pertaining to monadic constructs and arrays. Rules r-ret and r-bind relationally type the return and bind of our monad. The rules introduce the trivial relational Hoare-triple and combine two relational Hoare triples by sequencing, respectively. In particular, the rule r-bind uses the style of separation logic.
For each operation on arrays, we have two rules, one that is general and the other that works under some assumption about equality of arguments in the two runs. Consider, for example, the rules r-alloc and r-allocb for relationally typing the alloc construct. The rules are similar, for example, both create a new static name γ for the two allocated arrays, and both account for relative costs very similarly. However, r-allocb applies only when the expressions initializing the two arrays are related at a □-ed type (the second premise). As a result, it is guaranteed that the arrays allocated in the two runs will have equal values in all positions. This is reflected in the assertion γ → ∅ in the postcondition of the monadic type in the rule, which says that there are no locations where the newly allocated arrays (named γ ) can differ. In contrast, the rule r-alloc does not require the initializing expressions to be related at a □-ed type, but it has γ → N in the postcondition, meaning that the two arrays may differ anywhere after the allocation.
A similar difference exists between the rules r-read and r-readb for relationally typing the construct read. In r-readb, the read index I must not be in the β of the array being read in the precondition; as a result, the values read must be equal in the two runs. Hence, the resulting type has a □ on it. r-read is similar, but, here, there is no requirement that I is not in the β, so two different values may be read, and there is no □ on the result type.
The rules r-updt and r-updtb for updt follow the principle of alloc: In r-updtb, the values being written in the two runs are known to be equal (via a □-ed type), so the index I that is updated is removed from β in the postcondition. This is not the case in r-updt, where it must be added to β, since the two values at index I might differ after the update. In all these rules, premises of the form Δ; Φ |= C, where C is an inequality between index terms such as an array-bounds check, denote constraint entailment, which means that for any substitution of all index variables in the index environment Δ, if all constraints in Φ hold, then the constraint C holds. We omit the rules for deriving this judgment since they are standard. Finally, note that all monadic rules "propagate" relative costs from the premises to the monadic types. This is similar to the unary rules; the difference is that the costs propagated here are relative, whereas the unary type system propagates unary lower and upper bounds.
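The way these rules track the set β can be sketched as plain set operations. The function names are hypothetical, and one simplification is assumed: the rule r-alloc assigns all of N to the new name, whereas the sketch uses the finite set of valid indices as a stand-in.

```python
from typing import Set


def after_alloc(length: int, init_equal: bool) -> Set[int]:
    """beta for a freshly allocated pair of arrays (r-allocb vs. r-alloc)."""
    # Stand-in for N: every valid index of the new arrays.
    return set() if init_equal else set(range(length))


def after_updt(beta: Set[int], idx: int, values_equal: bool) -> Set[int]:
    """beta after writing index idx in both runs (r-updtb vs. r-updt)."""
    return beta - {idx} if values_equal else beta | {idx}


def read_is_equal(beta: Set[int], idx: int) -> bool:
    """r-readb applies, and the result is known equal, when idx is outside beta."""
    return idx not in beta


beta0 = after_alloc(4, init_equal=False)            # arrays may differ anywhere
beta1 = after_updt(beta0, 1, values_equal=True)     # index 1 now known equal
beta2 = after_updt(beta1, 1, values_equal=False)    # index 1 may differ again
```

Writing equal values everywhere, one index at a time, shrinks β to the empty set, after which every read can be typed with the more precise rule.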
It is worth emphasizing that the set of γ s in any pre- or postcondition must be written down explicitly, that is, we have not introduced sophisticated constructors (like set comprehension) for pre- and postconditions. This means that we cannot meaningfully specify monadic computations that allocate a data-dependent number of arrays. This has not been a problem for our examples, and we believe an extension to lift this restriction is possible by adding language-level constructors and elimination rules for assigning γ . While this change would make our approach more flexible and more expressive, it would also put an additional burden on the programmer.
Impure Asynchronous Rules. Figure 8 shows the two asynchronous rules r-bind-e and r-e-bind, relating a monadic binding construct and an arbitrary expression. We explain only the rule r-bind-e, which relates the monadic binding construct let {x} = t 1 in t 2 to an arbitrary expression t′ 2 (the rule r-e-bind is its dual and it can be understood similarly). The first premise of the rule r-bind-e requires unary typing for the monadic expression t 1 . This typing has two kinds of costs: the lower bound L 1 and upper bound U 1 for the unary execution cost of t 1 , and the lower bound L and upper bound U for the execution cost of the resulting monadic computation. The second premise requires a unary typing for the monadic expression t′ 2 . This gives us an upper bound U 2 on the cost of evaluating this expression. The premise dom(P) = dom(P 1 ) requires that the execution of the computation resulting from the expression t 1 can only affect arrays that appear in both P 1 and P. Finally, the last premise requires relating the subexpression t 2 to t′ 2 with the relative cost upper bound D 2 under the assumption that the values substituted for the variable x are related at the type U(A 1 , A 1 ). Notice that this is the weakest requirement in terms of types that we can have.
Additionally, this typing judgment also gives us the upper bound D on the relative cost for executing the two computations resulting from evaluating the two expressions t 2 and t′ 2 .
To put the information of the unary and relational typing together, we use as the precondition in this premise the pointwise union of P and P 1 , that is, set union lifted pointwise to partial maps (preconditions P are partial maps from locations γ to sets of indices). The precondition in the unary monadic type of t 1 in the first premise provides the indices where the computation associated with t 1 has write permission. So, intuitively, P 1 provides the indices that may be overwritten when executing t 1 . Hence, we want to use this information when relating expressions t 2 and t′ 2 because t 1 is supposed to be executed by then. The conclusion of the rule uses all the cost information we discussed to compute an upper bound on the relative cost of the two expressions, where, as usual, we use the metavariable c let to denote the cost of the monadic binding construct. The bounds on the relative cost here deserve some discussion. Following the definition, and observing that monadic let is a value, we have that the relative cost of the two expressions is bounded by −L 2 . However, we also want a bound on the relative cost of forcing the evaluation of the two expressions, since this must be recorded in the monadic type. The upper bound is U 1 + U + (D 2 + U 2 ) + D + c let , where U 1 + U upper bounds the cost of forcing the evaluation of t 1 , D 2 upper bounds the difference in cost of evaluating t 2 and t′ 2 to values, U 2 upper bounds the cost of evaluating t′ 2 to its value, and D upper bounds the difference in the costs of forcing the evaluation of the values obtained by evaluating t 2 and t′ 2 , respectively.
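One way to read the forcing bound in r-bind-e is as the following chain of inequalities. The names c_1, c_2, c_2', f_1, f_2, f_2' are hypothetical and introduced only for this sketch, and the derivation assumes that the right-hand side contributes only the cost of forcing the value of t′ 2 .

```latex
% Hypothetical names:
%   c_1, c_2, c_2'  pure evaluation costs of t_1, t_2[v/x], and t_2'
%   f_1, f_2, f_2'  costs of forcing the resulting monadic values
\begin{align*}
  \text{left forcing cost}  &= c_1 + f_1 + c_2 + f_2 + c_{let} \\
  \text{right forcing cost} &= f_2' \\
  \text{difference}         &= c_1 + f_1 + c_2 + (f_2 - f_2') + c_{let} \\
                            &\le U_1 + U + (D_2 + U_2) + D + c_{let}
\end{align*}
% using c_1 <= U_1, f_1 <= U, f_2 - f_2' <= D, and
% c_2 = (c_2 - c_2') + c_2' <= D_2 + U_2.
```

The term D 2 + U 2 appears because the relational premise bounds only the difference c_2 − c_2', so the absolute cost c_2 has to be recovered through the unary upper bound on t′ 2 .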
One can also design similar asynchronous rules for the other monadic constructs. However, the syntactic forms of the other constructs considerably constrain their asynchronous typing rules, making the scope of application of such rules rather narrow. For this reason, we do not commit to the design of such rules here.
Subtyping. Subtyping is important in ARel. It serves several purposes. First, as in all refinement type systems, subtyping equates types up to constraints, for example, it allows replacing int[2 + i] with int[5] under the constraint i = 3. Second, specific to cost analysis, subtyping allows weakening costs, for example, the relational type diff (D) τ 1 −→ τ 2 is a subtype of diff (D′) τ 1 −→ τ 2 when D ≤ D′, since the D on the arrow is an upper bound on relative cost. Third, subtyping allows "massaging" of the modalities □ and U, for example, □τ can be subtyped to τ . Finally, specific to the monadic types, subtyping allows weakening of pre- and postconditions in monadic types. The first three purposes of subtyping in ARel are relatively standard (e.g., see Çiçek et al., 2017) and we will only introduce them briefly. We describe the last use here at length. The unary and relational subtyping judgments have the forms Σ; Δ; Φ |= A 1 ⊑ A 2 and Σ; Δ; Φ |= τ 1 ⊑ τ 2 , respectively. Figure 9 shows selected subtyping rules. The notation P ⊆ P′ denotes inclusion of assertions, taken pointwise on the index sets assigned to each static name. The rules s-u-arrow and s-r-arrow subtype unary and relational function types, respectively. The rule s-r-w allows weakening from the relational type τ to its weak version U(|τ | 1 , |τ | 2 ), where | · | i is used to convert from a relational type to a unary type. When i = 1, this construct projects the left side of the relational type; when i = 2 it projects its right side (see the Appendix for the definitions of these projections). The rules s-r-list and s-r-array subtype relational list and array types, respectively. They impose requirements on lengths. Rule s-r-ua allows weakening the relational type U(A 1 , A 2 ) to U(A′ 1 , A′ 2 ) if we know that A 1 and A 2 are subtypes of A′ 1 and A′ 2 , respectively. As we have seen before, the modality □ applied to a relational type τ requires the two components related by the type τ to be identical. In fact, □ has a comonadic flavor and it follows the standard comonadic rules s-r-T and s-r-D.
Rule s-r-bd can be read as follows: When two equal functions (of type □(τ 1 −→ τ 2 )) are given equal arguments (of type □τ 1 ), the results are equal and the relative cost is 0. We used this rule implicitly in the mapi example in Section 2.
Next, we present the subtyping rules for monadic types. The first rule, s-um, allows subtyping on the unary monadic type. It says that we can subtype by weakening the costs, adding more (write) permissions to the precondition and removing permissions from the postcondition, as manifest in the premises P ⊆ P′ and Q′ ⊆ Q. Rule s-rm similarly allows subtyping on the relational monadic type. This rule says that we can subtype by weakening the relative cost, making the precondition more precise and the postcondition less precise, where P is more precise than P′ when P tells us more about which values are equal. In particular, γ → β is more precise than γ → β′ when β ⊆ β′. This is why the premises of s-rm check P′ ⊆ P and Q ⊆ Q′. Note that the checks on P, P′ and Q, Q′ are dual in the two rules. This is because the meanings of the pre- and postconditions in the unary and relational monadic types are completely different. Finally, rule s-rum allows subtyping from the modality U applied to two unary monadic types to a single relational monadic type. This rule is best read as follows: If we have two computations that modify an array (associated with the static name γ i ) at positions in T i and T′ i , respectively (left side of ⊑), then running them on two arrays that agree at all positions outside the set β will result in two arrays that agree at all positions outside the set β ∪ T i ∪ T′ i (right side of ⊑). This is because the indices in the set T i ∪ T′ i may be overwritten during at least one of the two executions.
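The reading of s-rum can be checked on a concrete pair of runs. This is an illustrative sketch only: the helper names and the sample data are made up, and each unary computation is modelled simply as a map from written indices to written values.

```python
from typing import Dict, List, Set


def run_writes(array: List[int], writes: Dict[int, int]) -> List[int]:
    """Apply a computation, modelled as index -> value writes, to a copy."""
    out = list(array)
    for idx, value in writes.items():
        out[idx] = value
    return out


def disagreement(a: List[int], b: List[int]) -> Set[int]:
    """Indices at which two equally long arrays differ."""
    return {i for i, (x, y) in enumerate(zip(a, b)) if x != y}


left = [1, 2, 3, 4, 5]
right = [1, 2, 0, 4, 5]            # the inputs agree outside beta = {2}
beta = {2}

writes_left = {0: 10}              # left run writes at T  = {0}
writes_right = {0: 10, 4: 20}      # right run writes at T' = {0, 4}

out_left = run_writes(left, writes_left)
out_right = run_writes(right, writes_right)
allowed = beta | set(writes_left) | set(writes_right)
```

The outputs may agree at some indices in the allowed set, as they do at index 0 here; the postcondition only over-approximates where the arrays can differ.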

Logical relations
To prove the soundness of ARel, we build a step-indexed logical relation for its types. We give two interpretations, one unary and one relational, that interact at the type U(A 1 , A 2 ).

Unary interpretation
The value interpretation A g,k of a unary type A is, as usual, a set of values. Also, as usual, this interpretation is indexed by a "step-index" k ∈ N, which is merely a proof device for induction (Appel & McAllester, 2001;Ahmed, 2006). The step-index counts the "steps" in our operational semantics. Importantly, the interpretation is also indexed by a world g mapping from static names γ to triples (l, n, A) specifying the location, the length of the array, and the syntactic type of the elements of the array named γ . Technically, we are defining a Kripke logical relation, and the world g is a so-called Kripke world (Neis et al., 2011;Turon et al., 2013). We give the clauses defining the value interpretation of unary types in Figure 10.
The value interpretations of standard type constructors like pairs and functions are also standard. The value interpretation of an array type, Array γ [I] A g,k , is a set of locations. A location l is in this set if g(γ ) is (l, I, A), that is, the length and element type for γ in the world g match those in the array type and the location l corresponds to γ .
The value interpretation of monadic types relies on a heap relation H |= g,k P, which is defined in Figure 12. This heap relation means that the assertion P, which could be a pre- or postcondition from a unary monadic type, holds for the heap H at world g at step k. The value interpretation of the monadic type {P} ∃ γ .A {Q} at g, k is a set of monadic values v that, when forced in a heap H validating the precondition P, yield a heap H 1 validating the postcondition Q. Additionally, the interpretation only allows those computations v that update arrays at locations for which the precondition P asserts permissions. We note that the interpretation quantifies universally over worlds g 1 that extend g (g 1 ⊇ g) and step-indices k 1 less than or equal to k. This is standard in step-indexed Kripke logical relations and makes the interpretation "monotonic", that is, closed under larger worlds and smaller step indices (Lemma 1).
Next, we extend the value interpretation to an expression interpretation. Compared to the value interpretation, the expression interpretation, which we identify by the superscript e, accounts for lower and upper bound costs L and U. The expression interpretation requires that if the expression t evaluates to a value v in k′ steps, within the step index k, and with cost c, then the cost satisfies the constraint L ≤ c ≤ U and the resulting value v is in the value interpretation of the type A with the remaining step index k − k′. Next, we extend our interpretation to open terms. For this, we first extend our value interpretation to unary contexts: Given a substitution σ , we say that σ is in the interpretation of a unary type environment (at world g and step index k) if σ maps every variable in the environment to a value in the interpretation of its unary type. We write σ t to denote the application of the substitution σ to the term t. With this, we can extend our interpretation to typed open terms, that is, typing judgments, as in the statement of the fundamental theorem (Theorem 2).

Relational interpretation
Next, we interpret relational types. Figure 11 shows the value and expression interpretations of relational types, and the interpretation of relational contexts. As in the case of the unary interpretation, we use Kripke worlds. Relational Kripke worlds are denoted G and they map static array names γ to 4-tuples (l 1 , l 2 , n, τ ). If G(γ ) = (l 1 , l 2 , n, τ ), then l 1 , l 2 are the locations where the arrays statically named γ are stored in the two runs, n is the length of each of these two arrays, and τ is the type at whose relational interpretation the two arrays' corresponding elements should be related.
The value interpretation of a relational type τ is written τ G,k . It is a set of pairs of related values. Most of the clauses of this definition are straightforward. Somewhat unusually, the interpretation of a function type contains a pair of (recursive) functions that satisfies not one but two conditions: (i) Given related values as arguments, the functions return related results and (ii) Each of the two functions, when given an argument in the (unary) interpretation of the argument type's unary projection, returns a result in the (unary) interpretation of the result type's unary projection. The first condition is standard. The second condition is needed to make our relational-to-unary projections of types sound.
To define the interpretation of the relational monadic types we need a relational heap relation (H 1 , H 2 ) |= G,k P, defined in Figure 12. The relation says when the heaps H 1 , H 2 from two runs satisfy a relational assertion P, which could be a pre- or postcondition from a relational monadic type. Intuitively, this relation holds when for every γ → β in P, G(γ ) is also defined, and if G(γ ) = (l 1 , l 2 , n, τ ), then for every index up to n, the two arrays l 1 , l 2 in heaps H 1 , H 2 , respectively, have elements within the relational interpretation of τ , and every index where the two elements differ is in β. This formalizes the intuitive meaning of β from earlier sections.
Note that the condition on elements differing is a one-way implication: We do not insist that at every index in β, the two elements necessarily differ. In fact, depending on τ , in some cases, even elements at indices in β might be forced to be equal. For example, when τ is int[m] (for some m) or even ∃i.int [i], this forces corresponding elements to be equal at all indices since the relational interpretation of int[m] is the singleton {(m, m)}. However, when τ = U(A 1 , A 2 ), elements at indices in β can be arbitrary values of types A 1 , A 2 since the relational interpretation of U(A 1 , A 2 ) is morally A 1 × A 2 .
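The relational heap relation can be approximated by an executable check. This is a simplified sketch, not the definition in Figure 12: the step index is omitted, the world maps each name to a triple (l1, l2, n), and the element type τ is modelled by a hypothetical predicate `related` passed in by the caller.

```python
from typing import Callable, Dict, List, Set, Tuple

Heap = Dict[int, List[int]]
World = Dict[str, Tuple[int, int, int]]   # name -> (l1, l2, length)


def heaps_satisfy(P: Dict[str, Set[int]], G: World, h1: Heap, h2: Heap,
                  related: Callable[[int, int], bool]) -> bool:
    """Check a relational assertion P against two heaps (step index omitted)."""
    for name, beta in P.items():
        if name not in G:
            return False
        l1, l2, n = G[name]
        a1, a2 = h1[l1], h2[l2]
        if len(a1) != n or len(a2) != n:
            return False
        for i in range(n):
            if not related(a1[i], a2[i]):
                return False          # elements must be related at tau
            if a1[i] != a2[i] and i not in beta:
                return False          # a difference outside beta is forbidden
    return True


G = {"g": (0, 5, 3)}
h1 = {0: [1, 2, 3]}
h2 = {5: [1, 9, 3]}                   # differs from h1's array at index 1 only
anything = lambda x, y: True          # stands in for U(int, int)
same = lambda x, y: x == y            # stands in for a type forcing equality
```

The check never requires a difference at an index in β, which is the one-way implication discussed above.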
Like the unary heap relation, the relational heap relation is well-founded by induction on the step index. The interpretation of the relational monadic type {P} ∃γ.τ {Q} (Figure 11) is the set of pairs of values (v_1, v_2) that, when forced starting from heaps H_1, H_2 satisfying the precondition P, result in heaps H'_1, H'_2 satisfying the postcondition Q. The relative cost of forcing must be upper bounded by the relative cost bound D carried by the monadic type.
Next, we extend the value interpretation of relational types to pairs of expressions. The interpretation simply says that two expressions e_1, e_2 are in the interpretation of type τ if they both evaluate to some values v_1, v_2 and those values are in the value interpretation of τ. Additionally, the costs c_1, c_2 of these two evaluations must satisfy c_1 − c_2 ≤ D. Note that the step index counts steps of only the left evaluation, not both. We could, alternatively, have set up the interpretation to count steps of only the right evaluation, or of both. Finally, we extend our value interpretation to relational contexts (as in the unary case). The statement of our fundamental theorem (Theorem 4.3) contains the interpretation of typed open terms, that is, of the relational typing judgment.
Note. For readers familiar with Kripke logical relations, we note that our worlds g and G are not step-indexed (only our logical relations are step-indexed). This is unlike some prior work (Neis et al., 2011;Turon et al., 2013). We do not need step-indexed worlds since we include syntactic types, A or τ , for mutable locations (arrays) in the worlds. This suffices for our purposes because our language only considers arrays whose elements are of base type like int, bool, etc.

Fundamental theorem
In this section, we prove a standard theorem, called a fundamental theorem, for each of our two interpretations, unary and relational. We write ⊢ δ : Δ to mean that δ is a well-sorted substitution for the index variables in the domain of Δ, and ⊨ δΦ to denote that δ satisfies the constraint environment Φ. As a preliminary step, we show a monotonicity lemma, which is useful in the proofs of our fundamental theorems.
Lemma 1 (Monotonicity). Points (1) and (3), which concern the interpretation of relational types, are proved simultaneously by induction on τ. Similarly, points (2) and (4), which concern the interpretation of unary types, are proved simultaneously by induction on the unary type A. Points (5) and (6), about the contexts, follow directly from points (1) and (2).

If k ≤ k and G
The monotonicity lemma says that all our interpretations are monotone with respect to larger worlds and smaller step indices. Take the value interpretation of a relational type as an instance. If a pair of values (v_1, v_2) is in this interpretation for some step index k in some world G, then (v_1, v_2) is also in the value interpretation of the same type for any smaller step index k' and any larger world G'.
Next, we present the fundamental theorem for ARel's unary typing. The theorem states the following. Suppose we have an expression t that is well-typed at a unary type A in contexts Δ, Φ and Ω, with cost lower and upper bounds L and U. If we close everything using a substitution δ for the index variables in Δ (satisfying Φ) and a substitution σ for the term variables satisfying the (context) interpretation of δΩ (at world g and step index k), then the closed term σ t is in the interpretation of the closed type δA with bounds L and U (at world g and step index k).
Proof. The proof is by induction on the derivation of the judgment Σ; Δ; Φ; Ω ⊢_L^U t : A. We present some of the most relevant cases.

Case u-read. The monadic type in the conclusion of the rule carries the cost annotation exec(L_1 + L_2 + L_read, U_1 + U_2 + U_read).
By assumption we have δ : , δ and σ ∈ δ g,k . We need to show: k,(0,0) Since read (σ t 1 ) (σ t 2 ) is a value, and its evaluation incurs no cost, it is sufficient to show: By the definition of forcing evaluation we have: So, unfolding the definition of interpretation, for k ≤ k, g ⊇ g, and an arbitrary heap H such that H g ,k P and such that k 1 + k 2 + 1 < k ≤ k we must show: By induction hypothesis, from the first premise and Lemma 1 we have: σ t 1 ∈ Array γ [δ I] (δA) e g ,k ,(δ L 1 ,δ U 1 ) From this we have: which in turn tells us: g (γ ) = (l, δA, δ I).
By induction hypothesis, from the premise ; ; ; U 2 L 2 t 2 : int[I ] and Lemma 1 we have: From this we have that: n 2 ∈ int[δ I ] g ,k −k 2 , which in turn tells us that Now to conclude we can proceed as follows: 1. the inequality δ L 1 + δ L 2 + δ L read ≤ c 1 + c 2 + c read ≤ δ U 1 + δ U 2 + δ U read can be shown using the fact that L read ≤ c read < U read and the fact that in our language L read and U read are constants , which means δ L read = L read and δ U read = U read . 2. H g ,k −(k 1 +k 2 +1) P is proved by unfolding the definition of our assumption H g ,k P. 3. For v ∈ δA g ,k −(k 1 +k 2 +1) , we know that from the heap relation H g ,k P and the premise γ ∈ dom(P), we have ∀i ≤ n, (H 1 (l)(i)) ∈ A g ,k −1 , which in turn tells us v ∈ A g ,k −1 . We can then show our goal by using Lemma 1. exec(L 1 +L 2 +L 3 +L updt ,U 1 +U 2 +U 3 +U updt ) {P γ → β} ∃_ : unit {P γ → β}

(); H(l)[n] ← v f-updt
So, unfolding the definition of the interpretation, for k ≤ k, g ⊇ g, for an arbitrary heap H such that H g ,k P γ → β, and k 1 + k 2 + k 3 + 1 < k , we must show: By induction hypothesis, from the first premise ; ; ; U 1 L 1 t 1 : Array γ [I] A and Lemma 1, we have: From this we have: By induction hypothesis, from the second premise ; ; U 2 L 2 t 2 : int[I ] and Lemma 1, we conclude: From this we have that n ∈ int[δ I] g ,k −k 2 , which in turn tells us that By induction hypothesis, from the third premise ; ; ; U 3 L 3 t 3 : A and Lemma 1, we have: To conclude, we proceed as follows: 1. the inequality δ L 1 + δ L 2 + δ L 3 + δ L updt ≤ c 1 + c 2 + c 3 + c updt ≤ δ U 1 + δ U 2 + δ U 3 + δ U updt can be shown using the fact that L updt ≤ c updt < U updt and L updt and U updt are constants, which means δ L updt = L updt and δ U updt = U updt .

2. (H(l)[n] ← v)(l)[i] ∈ ⟦δA⟧_{g', k'−(k_3+1)} follows from our previous conclusion v ∈ ⟦δA⟧_{g', k'−k_3}, using Lemma 1.
3. v ∈ ⟦unit⟧_{g', k'−(k_1+k_2+k_3+1)} holds by the definition of the interpretation of the unit type.
4. ∃n.P γ → β = {γ_1 → T_1, . . . , γ_n → T_n} ∧ ∀i ∈

Similarly, we have a fundamental theorem for the relational typing. The theorem says the following. Suppose we have a pair of expressions (t_1, t_2) that is well-typed at the relational type τ in the contexts Δ, Φ and Γ, with upper bound D on the relative cost. Suppose we close Δ with a substitution δ for the index variables that also satisfies the constraint environment Φ, and close Γ with a pair of closing substitutions (σ_1, σ_2) satisfying the interpretation of the relational context δΓ. Then, the pair of closed terms (σ_1 t_1, σ_2 t_2) is in the expression interpretation of the closed relational type δτ with the cost upper bound D.
Proof. The proof is by induction on the derivation of the judgment Σ; Δ; Φ; Γ ⊢ t_1 ⊖ t_2 ≲ D : τ.
is proved by the previous conclusions.

r-e-bind
By assumption we have δ : , δ and (σ 1 , σ 2 ) ∈ δ G,k , we need to show: By the definition of the evaluation of σ 1 t 2 of the form σ 1 t 2 ⇓ c 1 ,k 1 v 1 , and the evaluation of the monadic bind of the form, we have: So, unfolding the definition of the expression interpretation, we must show: By Theorem 2, from the second premise ; ; ; | | 1 U 2 L 2 t 2 : exec(L ,U ) {P 2 } ∃ γ 1 : A 1 {Q 2 }, instantiated with σ 1 ∈ δ| | 1 |G| 1 ,k inferred from (σ 1 , σ 2 ) ∈ δ G,k , we have: We then need to show: By the definition of the forcing evaluation we have: So, unfolding the definition of the interpretation, for G ⊇ G, k ≤ k − k 1 , arbitrary heaps H, H such that H, H G ,k P and k 2 < k ≤ k − k 1 , we must show: By induction hypothesis using Theorem 2, from the first premise, we have: which in turn tells us the following when we define g 2 = |G| 2 : From this, we can conclude: By induction hypothesis, from the third premise instantiated with ( From this we have: {P ∪ P 1 } ∃ γ 1 .δτ {Q}) G 2 ,k−k 1 and have: (H, H 1 ) G ,k P P 1 by using the definition of the heap relation and the premise dom(P) = dom(P 1 ). We choose G 2 = G and then have: Now to conclude we can proceed as follows: 1. (H 2 , H 2 ) G ,k −k 2 Q is already proved by the previous conclusions. 2. (v 2 , v 3 ) ∈ δτ G ,k −k 2 is already proved by the previous conclusions.
Once we have the fundamental theorems for both the unary and the relational typing, the soundness of our system follows easily.
Proof. Statement (1) follows directly from the fundamental theorem for unary typing. Statement (2) follows directly from the fundamental theorem for relational typing.

More examples
We discuss here five more examples demonstrating how we perform relational cost analysis on programs with arrays. To improve readability, we omit some annotations and use syntactic sugar. For example, we abbreviate let {x} = t_1 in t_2 to x ← t_1; t_2, and even to t_1; t_2 when x does not appear in t_2. We also shorten the type U(A, A) to U(A) and use if t then t_1 else t_2 as syntactic sugar for case(t, x.t_1, y.t_2) when x and y do not appear in t_1 and t_2, respectively.
In some of the examples, such as the Cooley-Tukey FFT Algorithm below, we add a dummy first argument of type unit to a recursive function. This allows us to make the recursive function polymorphic in index terms.

Cooley-Tukey FFT algorithm
The Fast Fourier Transform (FFT) is an algorithm that computes the discrete Fourier transform of a sequence of numbers. The Cooley-Tukey algorithm is a commonly used FFT algorithm that uses divide-and-conquer to split the sequence (Cooley & Tukey, 1965). Here, we perform a relational cost analysis of an imperative implementation of the algorithm that represents the sequence as an array and uses in-place updates. The objective of the analysis is to prove that the implementation is constant time: any two runs with arbitrary inputs take the same amount of time, assuming that array read/write operations and primitive numerical operations like addition and multiplication are constant time. Our implementation is shown in Figure 13.
The recursive function FFT uses divide-and-conquer. The variable x is the input array, y is another array used for temporary storage, m is the length of the range of the array to be transformed, and p is the starting index of the range of the array to be split in the recursive call. This function uses a helper function separate to relocate elements in even positions to the lower half of the array x and elements in odd positions to the upper half of the array, using y as a scratchpad. The function separate in turn uses two helper functions: the function sp does the separation work using temporary storage, and the function cp copies the separated part from the temporary storage back to the original array. We omit the code of sp and cp here; this code can be found in the Appendix. Another helper function, loop, simulates a for-loop in which the input array x is updated to actually perform the Fourier transform.
Intuitively, this example is constant time (for arrays of fixed length) because the sequence of array accesses depends only on the array length, not on the array contents. Formally, every function in this example relates to itself with relative cost 0. For instance, separate and loop can each be given a relational type with relative cost 0, that is, a judgment of the form separate ⊖ separate ≲ 0 : τ. The constraint P + M < N in these types ensures that we never index the array past its end. The function FFT can likewise be given a relational type with relative cost 0.
The typing derivations for the above typing judgments are straightforward, so we omit them here. We note that a different way to achieve the same result is to first compute precise lower and upper bounds on the unary cost of FFT and then show that they are, in fact, equal. This works, but is much more difficult because computing the precise unary cost of FFT is quite involved. One first has to establish the precise cost of the auxiliary functions, which requires giving precise unary types to separate and loop. Once these unary costs are available, we can conclude that the function FFT has the same minimum and maximum cost, 8 * M * log(M), and is thus constant time (using the rule r-switch). While both the unary and the relational reasoning can show that this example is constant time, the relational reasoning is much easier in this case, since the relative cost is 0 everywhere, whereas the unary reasoning requires a precise computation of the actual complexity of all functions.
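The constant-time claim can be observed experimentally with a small Python sketch of an in-place radix-2 Cooley-Tukey transform over an instrumented array. This is not the code of Figure 13: the class `CountingArray`, the helper `run`, the inlined butterfly loop and the cost model (one unit per array read or write, power-of-two lengths) are assumptions made for illustration, so the absolute operation count differs from the bound quoted in the text.

```python
import cmath


class CountingArray:
    """Array that counts every read and write (hypothetical cost model)."""

    def __init__(self, data):
        self.data = list(data)
        self.ops = 0

    def read(self, i):
        self.ops += 1
        return self.data[i]

    def write(self, i, v):
        self.ops += 1
        self.data[i] = v


def separate(x, y, m, p):
    """Move even-position elements of x[p:p+m] to the lower half, odd ones to the upper half."""
    half = m // 2
    for j in range(half):                      # save odd elements in the scratch array
        y.write(j, x.read(p + 2 * j + 1))
    for j in range(half):                      # compact even elements in place
        x.write(p + j, x.read(p + 2 * j))
    for j in range(half):                      # copy odd elements back
        x.write(p + half + j, y.read(j))


def fft(x, y, m, p):
    """In-place FFT of the range x[p:p+m]; m is assumed to be a power of two."""
    if m >= 2:
        separate(x, y, m, p)
        half = m // 2
        fft(x, y, half, p)
        fft(x, y, half, p + half)
        for k in range(half):                  # butterfly step
            e = x.read(p + k)
            o = x.read(p + half + k)
            w = cmath.exp(-2j * cmath.pi * k / m)
            x.write(p + k, e + w * o)
            x.write(p + half + k, e - w * o)


def run(values):
    """Transform a copy of values and return (result, total array operations)."""
    x = CountingArray(values)
    y = CountingArray([0] * len(values))
    fft(x, y, len(values), 0)
    return x.data, x.ops + y.ops


_, ops_a = run([1, 2, 3, 4, 5, 6, 7, 8])
_, ops_b = run([0, -3, 9, 9, 1, 0, 0, 2])
print(ops_a == ops_b)  # the access pattern depends only on the length
```

No branch in the sketch inspects array contents, which is the informal reason why the two runs perform the same number of operations.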

Naive string search
We show how a combination of unary and relational reasoning can give a precise relative cost to a classic array algorithm: substring search.
We represent strings as arrays of integers (storing the ASCII code of each character). In Figure 14, the function NSS takes as input a "long" string s and a "short" string w in the form of arrays, the lengths l_s and l_w of these arrays, and an array p of length l_s (we call this the result array). This function iteratively searches for the substring w at each position in s and records in p whether the substring is found at that position (1) or not (0). To do this, NSS uses the helper function search, which is also shown in Figure 14.
The function search has the same inputs as NSS, plus an additional index i that iterates over the positions of w. The two conditionals check whether search is in its final step (i + 1 == l_w), and whether the two corresponding characters in s and w coincide. When the two characters differ, p is updated with 0. When the two conditionals are satisfied at the same time, p is updated with 1.
Intuitively, search runs fastest when the first character of w does not appear in s. It runs slowest when the suffix of w starting at index i occurs in s at offset m + i. The difference between these two costs is a bound on the relative cost of search. Consider two runs of search on the same string s and the same index i, where the two ws agree on some prefix. The runs behave identically until we reach an index where the two ws differ for the first time. We can use this index to give a better bound on the cost of search relative to itself. To write this bound, we need to express the first index in the range [i, l_w] where the two ws differ. In ARel, the index term MIN(β_2 ∩ [I, ∞)) represents this index (assuming β_2 is the relational precondition of w and I is the static index refinement for i's size). Then, search incurs a nontrivial relative cost only after this index is reached.

Using this idea, we can give search a relational type in which P = {γ_1 → ∅, γ_2 → β_2}, R is the static size of l_w, and r is the (constant) cost of two read operations. The constraint I < R guarantees that the search will not exceed the length of the array w, and the constraint R < N guarantees that w is shorter than s. The other constraint, M + I < N, guarantees that the search will not exceed the length of the array s. The postcondition modifies only the set associated with γ_3, from β_3 to β_3 ∪ {M}, representing that only the result array p will be overwritten. To account for the case where w is the same in the two executions, we also add a lower bound R − 1 to the cost. The relative cost we establish here is more precise than the cost (R − 1 − I) * r that we would achieve with a non-relational analysis.

We stress here that to obtain this relative cost, the rule r-fix-ext is essential. At a high level, typing proceeds by case analysis on I ∈ β_2. When I ∉ β_2, we can proceed relationally with relative cost 0 in the recursive call. When I ∈ β_2, the control flows may differ in the two runs and we need to switch to unary reasoning via the rule r-switch. To obtain our bound using unary worst- and best-case analysis, we need the precise unary type of search, which is available in the context only due to the rule r-fix-ext. The details of this proof are in our appendix. Using the above type of search, we can also obtain a tight bound on the cost of NSS relative to itself, which is simply the number of times search is called (N − M − R) multiplied by the relative cost of search.
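The effect of the first differing index on the relative cost can be checked on a simplified Python model of search. The sketch assumes a cost of two reads per character comparison and ignores all other costs; `search_cost`, `first_difference` and `relative_bound` are hypothetical names, and the bound (R − 1 − d) * r, with d the first index at or after i where the two ws differ, is the bound of this simplified model rather than the exact index term of the relational type.

```python
READS_PER_STEP = 2  # r: one read of s and one read of w per comparison


def search_cost(s, w, m, i):
    """Cost of matching w[i:] against s at offset m, stopping at the first mismatch."""
    cost = 0
    while True:
        cost += READS_PER_STEP
        if s[m + i] != w[i] or i + 1 == len(w):
            return cost
        i += 1


def first_difference(w1, w2, start):
    """First index >= start where w1 and w2 differ, or None if they agree."""
    for j in range(start, len(w1)):
        if w1[j] != w2[j]:
            return j
    return None


def relative_bound(w1, w2, i):
    """Bound on |cost1 - cost2| for two runs on the same s, m and i (model-specific)."""
    d = first_difference(w1, w2, i)
    if d is None:
        return 0
    return (len(w1) - 1 - d) * READS_PER_STEP


s = "aaaaaaaa"
w1, w2 = "aaab", "aaba"
c1, c2 = search_cost(s, w1, 0, 0), search_cost(s, w2, 0, 0)
print(abs(c1 - c2) <= relative_bound(w1, w2, 0))
print(relative_bound(w1, w2, 0) <= (len(w1) - 1 - 0) * READS_PER_STEP)  # unary-style bound
```

The two runs are indistinguishable before index d, so only the comparisons from d onwards can contribute to the difference; this is why the relational bound never exceeds the unary-style bound (R − 1 − I) * r.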

Mergesort
As our next example, we consider an imperative version of mergesort. Similar to what we did for the mapi example in Section 2, we present two relational types for mergesort, corresponding to two different assumptions on the inputs. Consider the function msort in Figure 15, which sorts the elements of an array a from index l to u, using another array b as a buffer. We use an auxiliary function merge that merges the two sorted partitions of the array. This function is defined in Figure 15. The function merge takes as input the array a, the buffer array b, the starting index l and the ending index u, and additionally a "midpoint" index m at which the array range is divided. It uses two helper functions: the function merge_lp, which implements the standard merging process recursively, and the function copy, which copies the merged buffer array back to the working array. We omit the code of the function copy here (it can be found in the Appendix) and show just the code of the function merge_lp. The argument k is the position at which to store the merged element in the buffer array b. The four arguments l_s, l_e, r_s, and r_e separate the array a into two portions: the left one is specified by l_s and l_e, its starting and ending indices, whereas the right one is specified by r_s and r_e. The idea underlying this function is to check, using conditionals, whether there is an element available in the left portion and in the right portion, then to compare two elements, one from each side, and to store the smaller one in the buffer array at position k. Then this position is updated, and the process is repeated recursively.
The first relational type we want to show for msort assumes that the two arrays, which are input to the two runs, are equal in the range [l, u]. Intuitively, the fact that we have the same array elements in the range [l, u] implies that we have the same execution path, and so we can prove that the relative cost is 0.
The assumption that the two arrays from the two runs are equal in the range [l, u] is represented by the constraint (β ∩ [l, u] = ∅), where β is the set containing indices at which the two arrays can differ. Using this assumption, we can derive relational types asserting relative cost 0 for both merge lp and copy, and thus for merge. We show the types of merge lp and merge below.
Notice that, in these types, we also restrict the value of u to be between the starting index l and the length of the array a. Moreover, notice that the precondition and the postcondition in merge are equal. This follows again from the assumption we have for this typing: since the two arrays coincide in the range [l, u] before the sort, they will also coincide after the sort, so no new element will be added to β.

Next, we show a more general relational type for msort that does not assume that the two input arrays are equal in the range [l, u]. We start with a general type for the function merge_lp. Without the assumption, we may have different execution paths in two runs of this function. We use the rule r-switch to switch to the unary analysis to get the relative cost and the rule R-X to use the subtyping rule r-rum for the proper monadic type. Using this approach, we get a relational type for merge_lp that is quantified as follows: merge_lp ⊖ merge_lp ≲ 0 : ∀K, l_s, l_e, r_s, r_e, β, γ, γ', N. So, we have that the relative cost of two executions of merge_lp is max(l_e − l_s, r_e − r_s). (The function copy does not contain branches and so it does not introduce a nontrivial relative cost.)

Using the relational type described above, we can derive a relational type for merge and, from this, a relational type for msort whose relative cost is expressed with a recurrence relation Q and H = log_2(n). To prove that msort has this relative cost, we need some algebraic properties of the recurrence relation Q, which we postpone to the appendix. While the cost looks complicated, we prove that it is in O(n · (1 + log_2(|β ∩ [l, u]|))). This is consistent with a previous relational analysis of a non-imperative variant of msort (Çiçek et al., 2017).
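The shape of the relative cost of merging can be illustrated with a Python sketch that counts element comparisons in a standard merge loop. The function `merge_comparisons` is a hypothetical stand-in for merge_lp, and counting only comparisons is an assumption of this sketch. For two non-empty sorted portions of lengths p and q, a standard merge performs between min(p, q) and p + q − 1 comparisons, so two runs can differ by at most max(p, q) − 1; if the index ranges are inclusive, this coincides with max(l_e − l_s, r_e − r_s).

```python
def merge_comparisons(left, right):
    """Merge two sorted lists and count the element comparisons performed."""
    i = j = comparisons = 0
    merged = []
    while i < len(left) and j < len(right):
        comparisons += 1
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])    # at most one of these two tails is non-empty
    merged.extend(right[j:])
    return merged, comparisons


_, best = merge_comparisons([1, 2, 3], [4, 5, 6])    # left portion exhausted first
_, worst = merge_comparisons([1, 3, 5], [2, 4, 6])   # portions interleave fully
print(worst - best <= max(3, 3) - 1)
```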

Inplace insertion sort
Our next example, inplace insertion sort, implements the insertion sort algorithm without any temporary array. This is an involved example which combines most of the ideas we discussed in other examples, in order to derive a precise relative cost. The relative cost is complex but we can show that under reasonable assumptions, ARel provides a more precise relative cost than a unary analysis. The algorithm is shown in Figure 16. The function ISort sorts the input array s in the range [i, l s ]. Intuitively, we observe that the cost of ISort relative to itself should be the sum of the possible cost variation of every recursive call, which is mainly determined by the auxiliary function insert shown in Figure 16.
The recursive function insert implements the standard operation of inserting an element into an array by finding the right position x to insert the element at and shifting elements behind x in the array backward before updating the value at index x to a. The input arguments x and i specify the range in the array to insert into.
The function insert uses a helper function shift, which performs the shift operation. It is not hard to observe that the auxiliary function shift, which shifts the elements in the range [idx, i] backward by one index, uses one read operation and one update operation at every index. Finding the right position needs only one read operation. The unary cost of ISort is maximal when the input array is initially sorted in descending order. In contrast, the unary cost is minimal when the input array is initially sorted in ascending order. Assuming that read and update operations incur a unit cost each, insert can be given a precise unary type. With this unary type of insert in hand, we can obtain the relative cost of insert by switching to unary reasoning and then taking the difference. An interesting observation is that if the input arrays of the two runs coincide in the insertion range [X, I] and the elements "a" being inserted also agree, then insert's cost relative to itself is 0. In the corresponding relational type, the constraint β_1 ∩ [X, I] = ∅ describes this assumption.
This observation can be used in typing ISort: for every I, we split cases on whether β_1 ∩ [0, I] = ∅ or not (using the rule r-split). While β_1 ∩ [0, I] = ∅, we proceed relationally (with 0 relative cost). Once β_1 ∩ [0, I] ≠ ∅, we switch to unary reasoning using the rule r-switch, since control flow may differ in the two runs. We need the rule r-fix-ext to allow us to switch to unary typing when the control flow actually differs at some point. Using this idea, we obtain a very precise relational cost for ISort, expressed using the index term k = max(I, min(MIN(β_1), N)), which represents the first index where the two arrays differ. The relative cost (N * (N + 1) − k * (k + 1)) / 2 is the sum of all the relative costs generated in the recursive calls corresponding to indices in the range [k, N]. Recursive calls up to index k incur 0 relative cost, as noted above. More details are provided in the appendix. We note that the cost obtained here is more precise than the relative cost that can be obtained using unary reasoning alone.
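The observation that insertions into a common prefix cost the same in both runs can be checked with a Python sketch of a textbook insertion sort that records the cost of each insertion step. This is not the ISort of Figure 16: the function `isort_step_costs`, the iteration order and the unit cost per read and per write are assumptions of this sketch.

```python
def isort_step_costs(arr):
    """Sort a copy of arr by insertion; return (sorted list, cost of each insertion step)."""
    a = list(arr)
    costs = []
    for j in range(1, len(a)):
        cost = 1                 # read the element to insert
        v = a[j]
        x = j
        while x > 0:
            cost += 1            # read the candidate neighbour a[x - 1]
            if a[x - 1] <= v:
                break
            a[x] = a[x - 1]      # shift one position backward
            cost += 1
            x -= 1
        a[x] = v                 # final write of the inserted element
        cost += 1
        costs.append(cost)
    return a, costs


run1 = [5, 2, 9, 1, 7]
run2 = [5, 2, 9, 8, 0]           # first differs from run1 at index k = 3
_, costs1 = isort_step_costs(run1)
_, costs2 = isort_step_costs(run2)
k = 3
print(costs1[:k - 1] == costs2[:k - 1])  # steps for indices below k cost the same
```

Step j only inspects the elements at indices 0 to j, so every step below the first differing index behaves identically in the two runs; only the later steps can contribute to the relative cost.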

Loop unswitching
In this example, we examine a classical technique from compiler optimization, loop unswitching. We show how ARel can provide a more precise relative cost than the standard worst-case/best-case analysis when dealing with two programs that are not structurally similar.
In Figure 17, we present a function loop that runs a simple loop over an input array A, from index m to the end of the array n (of type int[N]). Inside the loop, we have an if statement whose then branch reads the value in the array at index m, performs some pure computation f h on the value h just read, and then stores the result back to the array before moving on to the next iteration. For simplicity, the else branch does nothing interesting.
This program can be optimized by pulling out the if conditional from the function body, as we can see in the right part of Figure 17. We call the optimized program loopOp. Suppose that the original program loop and the optimized program loopOp operate on the same array A, run the same pure computation f inside the loop, and share the boolean input b in two runs. Intuitively, the relative cost of loop and loopOp is upper bounded by N (the size of the array) when we count one unit cost for elimination forms because loop checks b at each iteration, while the optimized one checks b only once. Using unary analysis, we can obtain the following unary types.
λf .λb.loop : In both types above loop, both the precondition and the postcondition map γ 1 to N, which is consistent with our assumption that the array will be updated during the execution of either loop or loopOp. With the help of the subtyping rule s-rum, we obtain the cost of loop relative to loopOp, which is 4 * (N − M). When we start the loop from the beginning, which means M = 0, then the relative cost is bounded by 4 * N, which is higher than the N we expected.
We can do better if we use the relational analysis and, in particular, the relational asynchronous rule r-e-case in Figure 6 on p. 20 (recall that if t then t 1 else t 2 is syntactic sugar for the construct case (t, x.t 1 , y.t 2 )). To be precise, we can derive a simplified asynchronous rule r-e-if from r-e-case.
This asynchronous rule allows us to relationally type loop relative to the inner recursive function loop' inside loopOp. When we compare the bodies of loop and loop', and type the "then" branch of the first if conditional (if m < n . . . ), we come across structurally dissimilar pieces of code: the piece of code inside the box in loop and the piece of code inside the box in loop'. Here, we use the r-e-if rule above. We want to avoid comparing the "else" branch return () in loop with the boxed part of loop' when we know that the condition b is the same in the two runs. To this end, we refine our unary boolean type bool to bool[B] and our relational boolean type bool_r to bool_r[B], where B ∈ {true, false}. The erasure operation is extended to the refined relational type accordingly. With this, we can relationally type the two programs, λf.λb.loop and λf.λb.loopOp, with a precise relative cost. The negative cost −1 in the type comes from checking b at the beginning in the optimized version. The relative cost embedded in the monadic type reflects the difference between the boxed code in loop and the boxed code in loop' for every iteration.
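The source of the relative cost between the two programs can be made concrete with a Python sketch that counts only the tests of the flag b. The functions `loop` and `loop_op` are hypothetical renderings of the two programs of Figure 17 with an explicit counter; all other costs are ignored, which is an assumption of this sketch.

```python
def loop(arr, f, b, m=0):
    """Original program: tests b in every iteration. Returns (array, tests of b)."""
    a = list(arr)
    checks = 0
    for idx in range(m, len(a)):
        checks += 1              # b is tested once per iteration
        if b:
            a[idx] = f(a[idx])
    return a, checks


def loop_op(arr, f, b, m=0):
    """Unswitched program: tests b once, before the loop. Returns (array, tests of b)."""
    a = list(arr)
    checks = 1                   # single test of b, hoisted out of the loop
    if b:
        for idx in range(m, len(a)):
            a[idx] = f(a[idx])
    return a, checks


data = [1, 2, 3, 4, 5]
res1, checks1 = loop(data, lambda h: h * 2, True)
res2, checks2 = loop_op(data, lambda h: h * 2, True)
print(res1 == res2)                     # same final array
print(checks1 - checks2 == len(data) - 1)  # one saved test per iteration, minus the hoisted one
```

In this model the difference is N − M − 1 tests when the loop starts at index M, which mirrors the −1 contributed by the single hoisted test in the optimized version.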

Bidirectional type checking
In this section, we discuss the implementation of ARel. Implementing ARel naively results in an immediate challenge: the relational aspects of ARel create non-determinism in the type system. This non-determinism has two sources. First, some of the rules are not syntax-directed. For example, in the split rule r-split in Figure 5, we can choose any constraint to split on (and this is an infinite choice); we can apply the switch rule r-switch in Figure 6 anywhere; in the rule r-fix-ext shown in Figure 7, we have to guess the unary types of the functions (again an infinite choice); in addition, there are two rules for every array operator, and the choice between them is non-deterministic because □τ is a subtype of τ. Second, the existence of the □ modality and its interaction with other types make the algorithmization of relational subtyping quite tricky. To be concrete, it is challenging to define relational subtyping algorithmically while preserving transitivity of the subtyping relation.
To address these challenges, inspired by the work of Çiçek et al. (2019), we introduce a two-step method to algorithmize ARel. We first introduce an intermediate core language denoted ARelCore corresponding to an annotated, syntax-directed, version of ARel. An elaboration of ARel to ARelCore eliminates the non-determinism of ARel in two steps. First, it adds annotations on term constructors to resolve non-syntax-directed typing rules. Second, it eliminates relational subtyping in the core language to avoid its corresponding non-determinism. The elaboration replaces relational subtyping with explicit coercions in the core language. The language ARelCore is sound and complete with respect to ARel, modulo finding the right elaboration. We then develop bidirectional type checking for ARelCore as the main component of our implementation, and heuristics for elaboration.
This section is organized as follows. We first present the ARelCore language through its syntax and typing rules. Then, we demonstrate the elaboration from ARel to ARelCore by showing some of the elaboration rules. We also justify its soundness and completeness. Then, after the introduction of ARelCore and the elaboration, we focus on the algorithmization and discuss how to construct a bidirectional type system with respect to ARelCore.

ARelCore
The difficulties encountered when trying to algorithmize ARel disappear when we instead algorithmize a core language suitable for bidirectional type checking. This core language can be seen as a theoretical medium for algorithmization, which is sound and complete with respect to our declarative type system ARel.

Syntax
The syntax of ARelCore is an extension of the syntax of ARel with the annotations and corresponding term constructs shown in Figure 18. These annotations and new constructs make ARelCore's type system syntax-directed. For instance, the term construct split t with C can be used to mark a use of the rule r-split, which splits on the constraint C in the type checking of t. The use of the rule r-fix-ext of Figure 5 is indicated by the construct FIXEXT f(x).t with A, which provides the unary type A of fix f(x).t. Similarly, the constructs NC t and switch t specify the use of the rule r-nc, which introduces the □ modality, and of the switch rule r-switch, respectively. The construct contra t allows us to assign an arbitrary type to the expression t if there is a contradiction in our constraint environment. In addition, we introduce two variants of each array operation; for example, alloc and alloc_b correspond to the two rules r-alloc and r-allocb, respectively.

Typing rules of ARelCore
The main purpose of ARelCore is to provide a syntax-directed type system that we can use as the target for a translation of ARel programs. Like ARel, ARelCore has two typing judgments. We have the unary judgment Σ; Δ; Φ_a; Ω ⊢_L^U t :_c A, which assigns to a single term t a type A, and bounds L and U on t's evaluation cost, under the environment Σ; Δ; Φ_a; Ω. We also have the relational typing judgment Σ; Δ; Φ_a; Γ ⊢ t_1 ⊖ t_2 ≲ D :_c τ, which says that two terms t_1 and t_2 are related at type τ, and that their relative cost is bounded by D, under the context Γ.

Fig. 19. Selected typing rules of ARelCore.

We present in Figure 19 a selection of the typing rules for ARelCore. We select rules for terms that may incur non-determinism in ARel, and we show how the non-determinism can be resolved at the syntactic level in ARelCore. As an example, to resolve the non-determinism caused by array-based operations, we now have two different constructors for each operation and two different rules for them. For instance, for the read operation, we have a pair of terms read t t and read_b t t, which correspond to the two typing rules c-r-read and c-r-readb, respectively. These typing rules of ARelCore are similar to their counterparts in ARel. Consider the rule c-r-read. It is easy to see that it has a structure similar to that of the rule r-read in Figure 7. For example, the same premise Δ; Φ_a ⊨ I' ≤ I guarantees that the access stays within the array bounds. Such similarity can also be found between the rule c-r-readb and the rule r-readb, and between other pairs of array-related rules.

Subtyping is managed in a different way. For unary subtyping, we have a subsumption rule whose form is similar to that of its counterpart in ARel. This is mainly because the unary subtyping relation allows a direct algorithmization. On the other hand, relational subtyping in ARelCore is limited to type equivalence, as in the rule c-r-≡. ARel's subtyping is then simulated using explicit coercion functions. The main reason for this approach is that the modalities □ and U prevent easy algorithmization. We show that every relational subtyping can be simulated by applying a coercion expression in ARelCore.
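Lemma 4 (Simulation of Binary Subtyping in ARelCore). If Δ; Φ_a ⊨ τ ⊑ τ′ then there exists a term t in ARelCore, which we also denote coerce_{τ,τ′}, such that t is related to itself, in the empty typing context, at the function type from τ to τ′ with relative cost 0.
Proof. By induction on the given subtyping derivation.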

Elaboration
Now that we have ARelCore, we want to show that we can elaborate any well-typed program from ARel into a well-typed program in ARelCore. The unary elaboration judgment Σ; Δ; Φ_a; Ω ⊢_L^U t ⇝ t* : A represents the translation of a term t in ARel to its counterpart t* in ARelCore. Both terms have the unary type A and bounds L, U in the context Σ; Δ; Φ_a; Ω. In the same vein, the relational elaboration judgment has the shape Σ; Δ; Φ_a; Γ ⊢ t_1 ⊖ t_2 ⇝ t*_1 ⊖ t*_2 ≲ D : τ. It represents the translation of a pair of terms (t_1, t_2) to (t*_1, t*_2). Both pairs are typed at τ in the given contexts, and both pairs have the same relative cost bound D. We show a selection of the unary and relational elaboration rules in Figure 20. In the unary elaboration subsumption rule e-u-sub and the relational rule e-r-sub, we see the two different approaches to subtyping discussed previously. The rule e-u-sub preserves the unary subtyping relation during the translation. As mentioned before, ARelCore eliminates relational subtyping using coercions, which are provided by Lemma 4. In the rule e-r-sub, the ARelCore term t is such a coercion. Elaboration rules for array operations have structures quite similar to those of the rules in ARel, except that they map terms to their annotated versions.
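Elaboration is sound and complete in the sense of the following theorems.
Theorem (Soundness of elaboration). If Σ; Δ; Φ_a; Γ ⊢ t_1 ⊖ t_2 ⇝ t*_1 ⊖ t*_2 ≲ D : τ then Σ; Δ; Φ_a; Γ ⊢ t*_1 ⊖ t*_2 ≲ D :_c τ in ARelCore, and similarly for the unary elaboration judgment.
Proof. By simultaneous induction on the given elaboration derivations.
Theorem (Completeness of elaboration). If Σ; Δ; Φ_a; Γ ⊢ t_1 ⊖ t_2 ≲ D : τ in ARel then ∃ t*_1 and ∃ t*_2 such that Σ; Δ; Φ_a; Γ ⊢ t_1 ⊖ t_2 ⇝ t*_1 ⊖ t*_2 ≲ D : τ, and similarly for the unary typing judgment.
Proof. By simultaneous induction on the given typing derivations.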

Algorithmization
Next, we algorithmize ARelCore's type system. Here, we face the usual challenge of algorithmizing any type system: the need to either annotate or infer the types of bound variables. The problem is more nuanced than it would be in a simply typed or even a refinement type calculus, since we must also deal with cost bounds in function and monadic types. To address this challenge, we rely on bidirectional type checking or local type inference (Pierce & Turner, 2000), where type annotations must be provided only at explicit beta-redexes and at the top level, and everything else can be inferred. Our algorithmic system, which we call BiARel, inherits ARelCore's syntax and adds two annotated constructs (t : τ, D) and (t : A, L, U), as shown in Figure 21. These are required for bidirectional type checking. For background, bidirectional type checking is a widely used method to implement type systems. The main idea is to split the typing judgment into two judgments corresponding to two modes: one for checking types and the other for synthesizing types. The combination of these two modes reduces the number of annotations needed to guide a type checker. Rules of a bidirectional type system usually resemble those in standard declarative type systems. This simplifies the proof of soundness and completeness of the bidirectional type system relative to the declarative type system. Our bidirectional type system for ARelCore is similar to the one for RelCost (Çiçek et al., 2019), but extended in a nontrivial way to support array-related operations and our extended fixpoint operator.
The core language has two typing judgments, one unary and one relational, whereas BiARel has four: two relational and two unary judgments. The relational typing judgment of ARelCore splits into two relational judgments in its bidirectional version, one for "checking mode" and one for "inference mode". The relational checking judgment has the following form: Σ; Δ; ψ_a; Φ_a; Γ ⊢ t_1 ⊖ t_2 ↓ τ, D ⇒ Φ. Given the location environment Σ, the index variable environment Δ, the existential variable context ψ_a, the current constraint environment Φ_a, the relational typing context Γ, and terms t_1 and t_2, we check against the relational type τ and the relative cost D, and we generate the constraint Φ, which must be discharged separately. In contrast, the relational inference judgment has the following form: Σ; Δ; ψ_a; Φ_a; Γ ⊢ t_1 ⊖ t_2 ↑ τ ⇒ [ψ], D, Φ. Here, we synthesize the relational type τ and the relative cost D, and we generate the constraint Φ with all the newly generated (existential) variables in ψ. Similarly, we have two judgments for the unary case. The unary checking judgment has the form Σ; Δ; ψ_a; Φ_a; Ω ⊢ t ↓ A, L, U ⇒ Φ, while the unary inference judgment has the form Σ; Δ; ψ_a; Φ_a; Ω ⊢ t ↑ A ⇒ [ψ], L, U, Φ. Both these judgments can be understood in a way similar to their relational counterparts. In all the judgments, we write all the outputs (inferred components) in red boxes and inputs in black. We can think of our unary checking judgment as a mutually recursive function check(t, A, L, U), whose inputs include a term t, a type A, and the bounds L and U. The generated output is a constraint Φ, which holds exactly when t indeed has type A with execution cost bounded by L and U in our semantics. Likewise, the unary inference judgment behaves like a function whose input is a term t. The outputs are the inferred type A, the constraint Φ, the bounds U and L, and an index variable environment ψ that tracks all the index variables newly generated during inference. Here, the type and bounds hold iff ∃ψ.Φ holds. Notice that algorithmic typing judgments have one more input context ψ_a, which records previously eliminated existential variables.
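The reading of the two modes as mutually recursive functions can be illustrated with a minimal Python sketch. It is a toy, not BiARel: the term language (`Lit`, `Var`, `Lam`, `App`, `Anno`), the cost model (one unit per application plus the cost recorded on the arrow), the use of an upper bound only, and the representation of constraints as pairs `(lhs, rhs)` meaning `lhs <= rhs` are all assumptions made for illustration.

```python
from dataclasses import dataclass

INT = "int"

@dataclass(frozen=True)
class Arrow:           # function type dom -> cod; `cost` bounds the body's cost
    dom: object
    cod: object
    cost: int

@dataclass(frozen=True)
class Lit:
    value: int

@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class Lam:             # unannotated lambda: checkable, not inferable
    param: str
    body: object

@dataclass(frozen=True)
class App:
    fn: object
    arg: object

@dataclass(frozen=True)
class Anno:            # (t : A, U): user-supplied type and upper bound
    term: object
    typ: object
    bound: int

def infer(ctx, t):
    """Inference mode: term in, (type, cost, constraints) out."""
    if isinstance(t, Lit):
        return INT, 0, []
    if isinstance(t, Var):
        return ctx[t.name], 0, []
    if isinstance(t, Anno):
        # Switch to checking mode using the user's annotation.
        return t.typ, t.bound, check(ctx, t.term, t.typ, t.bound)
    if isinstance(t, App):
        fn_type, fn_cost, phi_fn = infer(ctx, t.fn)
        if not isinstance(fn_type, Arrow):
            raise TypeError("applying a non-function")
        arg_type, arg_cost, phi_arg = infer(ctx, t.arg)
        if arg_type != fn_type.dom:
            raise TypeError("argument type mismatch")
        cost = fn_cost + arg_cost + 1 + fn_type.cost
        return fn_type.cod, cost, phi_fn + phi_arg
    raise TypeError("cannot infer a type; add an annotation")

def check(ctx, t, typ, bound):
    """Checking mode: term, type and bound in, constraints out."""
    if isinstance(t, Lam):
        if not isinstance(typ, Arrow):
            raise TypeError("lambda checked against a non-function type")
        body_ctx = {**ctx, t.param: typ.dom}
        phi_body = check(body_ctx, t.body, typ.cod, typ.cost)
        return phi_body + [(0, bound)]      # a lambda is a value: cost 0
    # Switch to inference mode, then compare types by equality.
    inferred, cost, phi = infer(ctx, t)
    if inferred != typ:
        raise TypeError("type mismatch")
    return phi + [(cost, bound)]

def holds(constraints):
    """Discharge the generated constraints separately from type checking."""
    return all(lhs <= rhs for lhs, rhs in constraints)
```

For instance, with `ident = Anno(Lam("x", Var("x")), Arrow(INT, INT, 0), 0)`, calling `infer({}, App(ident, Lit(3)))` synthesizes type `INT` with cost 1, and `check({}, App(ident, Lit(3)), INT, 0)` produces a constraint list that `holds` rejects because the bound 0 is too small.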
We show selected algorithmic typing judgments in Figure 22 to explain how we handle ARel's non-determinism. The switch rule (r-switch) exists in both checking and inference modes. Both algorithmic rules relate the annotated terms switch t_1 and switch t_2 at the type U (A_1, A_2) and generate the final constraint based on the constraints from the subterms t_1 and t_2 obtained in unary mode. The relative cost D is the difference between the maximal unary cost of t_1 (U_1) and the minimal unary cost of t_2 (L_2). In the checking rule, alg-r-switch↓, this is forced in the output constraint. The split rule (r-split) exists only in checking mode (alg-r-split↓). The terms split t_1 with C and split t_2 with C determine that this rule must be applied, splitting on the constraint C. The final output constraint C → Φ_1 ∧ ¬C → Φ_2 performs the same case analysis on C.
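As a small illustration of the two constraints just described, the following Python sketch models constraints as predicates over an index-variable environment. The function names are hypothetical, and modelling the switch condition as the inequality U_1 − L_2 ≤ D is an assumption; the text only says that the relationship is forced in the output constraint.

```python
def switch_constraint(U1, L2, D):
    """Relative cost of switch: max cost of the left run minus min cost of
    the right run, compared against the bound D being checked (assumed <=)."""
    return U1 - L2 <= D

def split_constraint(C, phi1, phi2):
    """Build the predicate (C -> phi1) and (not C -> phi2)."""
    def combined(env):
        return ((not C(env)) or phi1(env)) and (C(env) or phi2(env))
    return combined

# Hypothetical example: split on n <= 10.
C = lambda env: env["n"] <= 10
phi1 = lambda env: env["n"] + 5 <= 15    # only valid when C holds
phi2 = lambda env: env["n"] >= 11        # only valid when C fails
combined = split_constraint(C, phi1, phi2)

# Neither branch constraint is valid on its own, but the case split is.
all_hold = all(combined({"n": n}) for n in range(50))
phi1_alone = all(phi1({"n": n}) for n in range(50))
```

The point of the design is visible in the example: each branch is checked only under the assumption that selects it, so constraints that fail in general can still be discharged.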
The algorithmic counterpart of the rule r-fix-ext in checking mode, alg-fixext↓, relates the annotated terms FIXEXT f(x).t with A_1 and FIXEXT f(x).t′ with A_2 and checks the subterms fix f(x).t and fix f(x).t′ at the unary types A_1 and A_2, respectively. The final constraint combines the constraints generated by the unary checking of the two subterms with those generated by the relational checking of the two function bodies.
The rule alg-r-↑↓ provides the possibility of transferring from inference mode to checking mode, while the rule alg-r-anno-↑ allows the opposite transfer. Notice that we check the equivalence of two types in the rule alg-r-↑↓. In most bidirectional type systems, one would check subtyping here but, as explained earlier, ARelCore only has type equivalence. We emphasize the presence of the annotated term (t : τ, D) in the rule alg-r-anno-↑. This annotated term allows the user to provide the type τ and the effect (relative cost D) to be checked in checking mode. It helps when the bidirectional type checker has difficulty inferring the type of a term t, by replacing the inference problem with an easier task, namely, checking a user-provided type. The unary annotated term (t : A, L, U) is useful in a similar way.
Next, we discuss selected rules for array operations. These operations constitute the main challenge in our bidirectional type system relative to prior work. We show a selection of bidirectional rules for array operations in Figure 23. As mentioned, to resolve the non-determinism between the □-ed and non-□-ed rules for each array operation, we use distinct expressions, for example, alloc t_1 t_2 versus its □-annotated counterpart. Notice that the conclusion of every array operation is typed in checking mode. The two allocation rules alg-r-alc-↓ and alg-r-alcB-↓ check the first arguments t_1 and t′_1 against the relational type int[I] and relative cost D_1, then check the second arguments t_2 and t′_2 against the relational type τ (or □τ) and relative cost D_2. The final constraint Φ_r = ∃D_1 :: ℝ.(Φ_1 ∧ ∃D_2 :: ℝ.(Φ_2 ∧ D_1 + D_2 = D)) requires that there exist D_1 and D_2 such that Φ_1 and Φ_2 hold and that D_1 + D_2 equals the given cost D.
The algorithmic typing rules for read and updt have other interesting aspects. These rules are in checking mode but the types of the first two arguments are inferred, not checked. This is because, although we know that the first argument of read t_1 t_2 or updt t_1 t_2 t_3 must be an array and the second argument must be a number, we do not know the size of the array or the size (refinement index) of the number. Hence, we must infer this information. Additionally, these rules check the pre- and postconditions. As an example, the condition ¬(I ∈ β) is checked in the rule alg-r-readB-↓ to guarantee that we indeed read the same value on the two sides. Similarly, in the rules alg-r-updt-↓ and alg-r-updtB-↓, the set β′ in the postcondition, representing the differences between the two arrays, must be the same as the set β in the precondition except for the index I, which has been updated. For this, in the rule alg-r-updt-↓ we check that β′ = β ∪ {I}, while in the rule alg-r-updtB-↓ we check that β′ = β \ {I}, consistent with the corresponding declarative typing rules of ARel.
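To make the role of β concrete, here is a small Python sketch that runs two copies of an array side by side and maintains β as an over-approximation of the indices at which they differ. The helper names are hypothetical, and the sketch works on concrete values rather than on types; it only mirrors the set updates β ∪ {I} and β \ {I} described above.

```python
def diff_indices(a1, a2):
    """Indices where the two runs' arrays actually differ."""
    return {k for k in range(len(a1)) if a1[k] != a2[k]}

def read_is_boxed(beta, i):
    """The boxed read applies when i is not in beta: both runs read equal values."""
    return i not in beta

def updt(a1, a2, beta, i, v1, v2):
    """Non-boxed update: written values may differ, so i joins beta."""
    b1, b2 = list(a1), list(a2)
    b1[i], b2[i] = v1, v2
    return b1, b2, beta | {i}

def updt_boxed(a1, a2, beta, i, v):
    """Boxed update: the same value is written in both runs, so i leaves beta."""
    b1, b2 = list(a1), list(a2)
    b1[i] = b2[i] = v
    return b1, b2, beta - {i}

# Two runs whose arrays differ only at index 1.
a1, a2, beta = [1, 2, 3, 4], [1, 9, 3, 4], {1}
x1, x2, beta_boxed = updt_boxed(a1, a2, beta, 1, 5)     # beta becomes empty
y1, y2, beta_plain = updt(a1, a2, beta, 2, 7, 8)        # beta becomes {1, 2}
z1, z2, beta_over = updt(a1, a2, beta, 3, 0, 0)         # beta becomes {1, 3}
```

In the last call the two runs happen to write the same value, so the actual difference set stays {1} while β grows to {1, 3}. This is the sense in which β is a sound over-approximation: the invariant maintained is that the actual difference set is contained in β.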
Finally, we show the soundness and completeness of BiARel with respect to ARelCore. Soundness says that if the constraints in the output of a provable BiARel typing judgment for a term t are satisfiable, then a corresponding typing judgment for the type-erased term |t| is provable in ARelCore. Completeness is the converse: if a term t has a typing derivation in ARelCore, then an annotation of t has a typing derivation in BiARel and the constraints in the output of this derivation are satisfiable. In the following theorem, the function FIV(·) returns the free index variables in its argument.
Δ ▷ θ_a : ψ_a means that θ_a is a valid substitution for ψ_a under the index variable environment Δ. This substitution is used in the theorem; for example, free index variables from ψ_a in constraints are substituted using θ_a, written Φ[θ_a]. We also define the type erasure operation |t|, which erases (t : A, L, U) and (t : τ, D) to t.
One heuristic that our type checker implements is to automatically determine whether to apply the □-ed rule or the non-□-ed rule, rather than forcing the programmer to make this choice by providing an annotation on every array operation. Our heuristic applies the □-ed rule first and tries to solve the generated constraints (with an SMT solver, as explained later). If the constraints cannot be solved, the heuristic tries the non-□-ed rule. For example, when processing a read operation we always try the alg-r-readB-↓ rule first. The generated constraint is ¬(I ∈ β). We pass this constraint to the SMT solver and, if the solver discharges it, we simply continue. If the solver fails to discharge it, we backtrack and try the alg-r-read-↓ rule.
Another heuristic is that we switch from relational to unary reasoning only when absolutely necessary. There are three cases when this happens: a) the unary type is explicitly mentioned with the construct FIXEXT t with A 1 , b) the switch term switch t is used, and c) no other relational rules apply.
These heuristics suffice for our examples and reduce our annotation burden at the cost of some extra type checking time.
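The first heuristic can be sketched in Python as follows. The solver here is a stand-in: `proves` simply evaluates a constraint in every environment of a finite list, whereas the implementation calls an SMT solver and also treats a timeout as failure. The function names and the environment encoding (a dict with the index under a variable name and the set under the key "beta") are assumptions for illustration.

```python
def proves(constraint, models):
    """Stand-in for a validity query: true iff the constraint holds in every model."""
    return all(constraint(m) for m in models)

def choose_read_rule(index_var, models):
    """Try the boxed read rule first; fall back to the non-boxed rule."""
    def boxed_premise(m):
        # Premise of the boxed rule: the index read is not in beta.
        return m[index_var] not in m["beta"]
    if proves(boxed_premise, models):
        return "alg-r-readB"
    return "alg-r-read"       # backtrack: the boxed premise was not established

# Hypothetical environments: in the first list the index is never in beta.
same_values = [{"i": 0, "beta": {2}}, {"i": 1, "beta": {2, 3}}]
may_differ = [{"i": 0, "beta": {2}}, {"i": 2, "beta": {2}}]
rule_a = choose_read_rule("i", same_values)
rule_b = choose_read_rule("i", may_differ)
```

Trying the more precise rule first keeps annotations out of the program, at the price of one extra solver call (or timeout) whenever the fallback is needed.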

Constraint solving
The primary difficulty in our implementation (and the most time-consuming step in type checking) is solving the constraints that the bidirectional type system generates. For this, we rely on an SMT solver. Specifically, we use Alt-Ergo (Bobot et al., 2013) through the Why3 frontend (Filliâtre & Paskevich, 2013). A fundamental difficulty here is that the SMT solver struggles with constraints that have too many existential quantifiers. To alleviate this concern, we rely on a solution proposed in the implementation of RelCost (Çiçek et al., 2019): we implement a simple algorithm that generates candidate substitutions for existentially quantified variables by examining equality and inequality constraints that mention the variables. From a simple inspection of the algorithmic rules, we can see that generated constraints contain inequalities on index terms such as I′ ≤ I to check the array bound (for instance, in the rule alg-r-read-↓), and equalities such as D_1 + D_2 = D to show that the inferred relative cost matches the relative cost we want to check. The simple algorithm of Çiçek et al. (2019) works remarkably well on these kinds of constraints.
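The idea of generating candidate substitutions can be sketched in Python. This is not the algorithm of Çiçek et al. (2019), whose details are not given here; it is a minimal illustration under stated assumptions: constraints are lists of atoms `(op, lhs, rhs)` with `op` in `"="` or `"<="`, terms are integers, variable names, or sums `("+", a, b)`, and a candidate is accepted by evaluating the atoms in one concrete environment rather than by calling a solver.

```python
from itertools import product

def mentions(term, names):
    """Does the term contain any variable from `names`?"""
    if isinstance(term, tuple):
        return any(mentions(arg, names) for arg in term[1:])
    return term in names

def candidates(var, atoms, evars):
    """Terms that sit opposite `var` in an (in)equality and are free of existentials."""
    found = []
    for _op, lhs, rhs in atoms:
        for this, other in ((lhs, rhs), (rhs, lhs)):
            if this == var and not mentions(other, evars) and other not in found:
                found.append(other)
    return found

def evaluate(term, subst, env):
    if isinstance(term, int):
        return term
    if isinstance(term, tuple):                      # ("+", a, b)
        return evaluate(term[1], subst, env) + evaluate(term[2], subst, env)
    if term in subst:                                # existential variable
        return evaluate(subst[term], subst, env)
    return env[term]                                 # universal variable

def atom_holds(atom, subst, env):
    op, lhs, rhs = atom
    left, right = evaluate(lhs, subst, env), evaluate(rhs, subst, env)
    return left == right if op == "=" else left <= right

def solve_exists(evars, atoms, env):
    """Try every combination of candidates; return the first that satisfies all atoms."""
    options = [candidates(v, atoms, set(evars)) for v in evars]
    for choice in product(*options):
        subst = dict(zip(evars, choice))
        if all(atom_holds(a, subst, env) for a in atoms):
            return subst
    return None

# exists D1, D2.  D1 = 0  and  D2 = n  and  D1 + D2 <= D
atoms = [("=", "D1", 0), ("=", "D2", "n"), ("<=", ("+", "D1", "D2"), "D")]
found = solve_exists(["D1", "D2"], atoms, {"n": 3, "D": 5})
missing = solve_exists(["D1", "D2"], atoms, {"n": 3, "D": 2})
```

Once a substitution is found, the existential quantifiers disappear and the remaining quantifier-free constraint is the kind of query on which SMT solvers tend to behave well.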
A new challenge for ARel is how to represent and solve constraints involving the sets of integers β. There are three kinds of constraints involving these sets: (1) equalities of two sets β = β′, which are generated when a rule checks that the precondition and postcondition are the same (e.g., in the rule alg-r-read-↓); (2) containments between sets such as β ⊆ β′, which are generated when the postcondition is updated (e.g., alg-r-updt-↓); (3) index inclusions I ∈ β, which are generated when certain indexes appear or disappear after the execution of the computation (e.g., alg-r-updtB-↓). Index inclusions are also used by our heuristic to decide whether to use □-ed rules or not. To express these constraints, we rely on the library for set theory from Why3. Its operations for membership, equality, inclusion, empty set, union, intersection, and difference are enough to solve our constraints, and work well in our experience.
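The three kinds of set constraints can be exercised with a brute-force Python sketch that enumerates all subsets of a small universe. This stands in for the SMT solver and the Why3 set theory only for illustration: it uses Python's built-in sets, all names are hypothetical, and validity over a bounded universe is of course weaker than a proof.

```python
from itertools import chain, combinations

def subsets(universe):
    """All subsets of a finite universe, as frozensets."""
    items = list(universe)
    picks = chain.from_iterable(combinations(items, r) for r in range(len(items) + 1))
    return [frozenset(p) for p in picks]

def valid(claim, universe):
    """Check claim(beta, beta2, i) for every pair of subsets and every index."""
    sets = subsets(universe)
    return all(claim(b, b2, i) for b in sets for b2 in sets for i in universe)

def implies(p, q):
    return (not p) or q

universe = range(3)

# Non-boxed update: beta2 = beta ∪ {i} entails beta ⊆ beta2 and i ∈ beta2.
updt_ok = valid(lambda b, b2, i: implies(b2 == b | {i}, b <= b2 and i in b2), universe)

# Boxed update: beta2 = beta \ {i} entails i ∉ beta2 and beta2 ⊆ beta.
updtb_ok = valid(lambda b, b2, i: implies(b2 == b - {i}, i not in b2 and b2 <= b), universe)

# A non-theorem for contrast: containment does not entail equality.
bogus = valid(lambda b, b2, i: implies(b <= b2, b == b2), universe)
```

Only membership, equality, inclusion, union, and difference are needed for these checks, which matches the small set of operations the text says suffices in practice.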

Type checking example
We illustrate our implementation of type checking by walking the reader through the type checking of the following annotated version of the mapi function from Section 1.
First, we introduce two primitive functions for ≤ and plusOne. We use a trivial expression contra to simplify the actual implementations of these primitives. For ≤, contra appears in the term (contra, int → int → bool, 0), which also includes the relational type and relative cost of ≤. We omit the relational costs on the arrows here because they are 0, which is also the default cost in our concrete syntax. (In the experimental results below, we do not count these functions as annotations, since they are just primitive functions that should be inserted automatically by the compiler.) Unlike the mapi function in Section 1, our mapi example uses the annotated term SPLIT{t} with C to specify the use of the split rule r-split in Figure 5. This eliminates non-determinism and guides the type checking. We also have the annotated terms (read a k, τ_1, 0 ; τ′_1, 0) and (updt a k (f x), τ_2, 0 ; τ′_2, 0), which differ from our standard annotated terms because they contain two relational types and two relative cost upper bounds. As a reminder, the annotated terms provide the type and effect that the type checker needs.
When we want to provide the relational type and relative cost for terms related to array operations, we need to provide two types and two relative costs, one for the □-ed rule and the other for the non-□-ed rule, as required by our heuristics in Section 7.1. As an example, consider the expression (updt a k (f x), τ_2, 0 ; τ′_2, 0). In this expression, the first type, (0) {g → b} ∃g.unit {g → b \ {i}}, is the relational type we want our type checker to use for the expression updt a k (f x) when it tries the rule alg-r-updtB-↓. The second type, {g → b} ∃g.unit {g → b ∪ {i}}, is the type for the non-□-ed rule alg-r-updt-↓, which is tried only if the rule alg-r-updtB-↓ generates an unsatisfiable constraint.
Overall, this example uses three annotations, which are shown in boxes in the code above.

Experiments
For each example, Table 1 reports the number of type annotations that are needed (#TYP), the number of annotations needed to disambiguate rules (#ESF), the time needed for type checking (TC), the time needed for solving the constraints that arise as premises during type checking (TC-SMT), and the time needed for solving the final constraint, which is the output of the type checking (TF-SMT). Our experiments were performed on a 3.1 GHz Intel Core i5 processor with 8 GB of RAM. The programs mapi(1), mapi(2), boolOr, FFT, NSS, and ISort are implementations of the corresponding examples discussed in Sections 2 and 5. The example mapi(1) is mapi where we do not assume that the input functions are equal in the two runs. In mapi(2), we assume that the input functions are equal in the two runs (we presented the full annotated code for this example in Section 7.3). For FFT, which uses the auxiliary functions separate and loop, we report statistics for the whole program and individually for each auxiliary function. The program ISort uses helper functions insert and shift. These are also shown separately. The programs merge(1) and merge(2) are the two typings of imperative merge discussed in Section 5.
The function SAM (square-and-multiply) computes a positive power of a number represented as an array of bits, while comp checks the equality of two passwords represented as arrays of bits. These last two examples are array-based implementations of similar list-based implementations presented in Çiçek et al. (2017). More details of these examples are in the Appendix.
The results in Table 1 show that ARel can be used to reason about the relative cost of functional-imperative programs. Unsurprisingly, examples combining relational and unary reasoning (using rules r-fix-ext and r-switch) such as boolOr, NSS and ISort need more annotations and need more time for both type checking and SMT solving. In some examples like ISort, the time taken for solving constraints in the premises of the rules (TC-SMT) is very high. This is because of the heuristic we described at the beginning of this section, where we try □-ed rules before non-□-ed rules. The SMT solver first tries to prove that the □-ed rule can be applied, but in some cases it times out. This timeout period is counted in TC-SMT. It is set to 1 s in all examples, except ISort and insert, where we need 2 s. TF-SMT, the time taken to check the final output constraint, is also high for some examples like ISort, but this is due to the complexity of the constraint.

Limitations and future directions
One obvious limitation of our current prototype is efficiency, as we saw, for example, for NSS and ISort. Type checking slows down for two reasons: (1) The heuristic that determines whether to apply a □-ed rule to array-based operations has to wait for the SMT solver to time out in some cases. The time Alt-Ergo (our SMT solver) takes to solve constraints varies considerably across examples. For examples with many array-based operations, the problem is exacerbated. Unfortunately, we use Why3 to connect to Alt-Ergo and have to set a large timeout to guarantee enough time for Alt-Ergo to deal with the constraint on all connections, which accounts for the unnecessary time consumption. (2) The complexity of the final constraint grows with the number of array-based operations. This complexity translates to longer SMT-solving times.
Another limitation of our implementation is that some annotations are still needed (despite our heuristics). We saw this in the example of Section 7.3.
We plan to improve our prototype by improving our heuristics and the constraint solving process. We would like to find a way to decrease connection times and constraint solving time, and to make our backtracking more efficient. We also plan to investigate the use of other SMT solvers in order to improve efficiency further.

Related work
A lot of prior work has studied static cost analysis. We discuss some of this work here. Reistad & Gifford (1994) present a type-and-effect system for cost analysis where, like ARel, the cost can depend on the size of the input. Danielsson (2008) uses a cost-annotated monad similar in spirit to the one we use here. Dal Lago & Gaboardi (2011) present a linear dependent type system using index terms to analyze time complexity. Hoffmann et al. (2012a) present an automated amortized cost analysis for programs with complex data structures such as matrices. Wang et al. (2017) develop a type system for cost analysis with time complexity annotations in types. Recurrence extraction analyzes the cost by extracting recurrences which express the run time cost in terms of sizes of inputs, under either call-by-value, call-by-name, or call-by-push-value evaluation strategies (Danner et al., 2015; Kavvos et al., 2019; Cutler et al., 2020). However, none of these systems consider relational costs. Charguéraud & Pottier (2015) present an amortized resource analysis based on an extension of separation logic with time credits. Our use of triples and separation-based management of array references is similar to theirs. However, their technique is based on separation logic, while ours is based on a type-and-effect system. Moreover, they consider only unary reasoning while we are interested primarily in relational reasoning. Lichtman & Hoffmann (2017) present an amortized resource analysis for arrays and references. Their technique represents the available "potential" before and after a computation, similar to our triples. Again, they focus only on unary cost analysis and consider mostly first-order programs and linear potentials.
Outside of cost analysis, a lot of work has considered relational verification techniques for other applications. Lahiri et al. (2010) present a differential static analysis to find code defects by looking at two pieces of code relationally. Probabilistic relational verification has seen many applications in cryptography (Barthe et al., 2014) and differential privacy (Gaboardi et al., 2013; Barthe et al., 2015). Barthe et al. (2015) propose HOARe², which uses relational refinements to reason about differential privacy and other probabilistic relational properties. ARel also relies on relational refinements to reason about pairs of arrays via assertions P, Q in our monadic types. The difference is that we choose lightweight assertions and use them to reason only about differences between arrays. In contrast, HOARe² uses arbitrary relational refinement types, which are more expressive, but which also require many more annotations.
The indexed types used by Gaboardi et al. (2013) are similar in spirit to ours. Their indices cover the size of the data types as ours do, but they also track the sensitivity, which is useful for differential privacy. Our indices instead focus more on effects and differences of arrays. Zhang et al. (2015) introduce dependent labels into the type of SecVerilog, an extension of Verilog with information flow control. The use of a lightweight invariant on variables and security levels in SecVerilog is similar to our use of β, which is also an invariant on static location variables. Unno et al. (2017) present an automated approach to verification based on induction for Horn clauses, which can also be used for relational verification. Benton et al. (2014, 2016) introduce abstract effects to reason about abstract locations. This is conceptually similar to the way our preconditions and postconditions allow us to reason about different independent locations. Our work is directly inspired by RelCost (Çiçek et al., 2017) and DuCostIt (Çiçek et al., 2016). These are refinement type-and-effect systems for pure functional languages without mutable state. RelCost supports relational cost analysis of pure programs. In contrast, ARel supports imperative arrays. The difference is substantial: besides significant changes to the model, the type system has to be enriched with Hoare-like triples, whose design is a key contribution of our work. RelCost has an implementation via an SMT back-end (Çiçek et al., 2019); we extend this approach with imperative features and support for sets of indices (our βs). Ngo et al. (2017) combine information flow and amortized resource analysis to guarantee constant-resource implementations. Their type system allows relational reasoning about resources through precise unary analysis. Their focus is on first-order functional programs and on the constant time guarantee, while we want to support functional-imperative programs and more general relative costs. Radicek et al. (2018) add a cost monad to a relational refinement type system, where refinements reason about relational cost, for programs without state. This system is expressive: it supports a combination of cost analysis with value-sensitivity and full functional specifications (RelCost can also be embedded in it). However, it requires a framework for full functional verification. Our approach is complementary in that we use lighter refinements that are easier to implement, but do not support full functional verification.

Conclusion
We have presented ARel, a relational type-and-effect system for reasoning about the relative cost of two functional-imperative programs with mutable arrays. Our key contribution is a set of lightweight relational refinements allowing one to establish different relations between pairs of state-affecting computations, including upper bounds on cost difference. We have discussed how ARel is implemented and used ARel to reason about the relational cost of several nontrivial examples.
ARel currently supports arrays whose elements are of base types, due to our choice of the lightweight monadic types for the array-based operations. Support for more complicated but common data types such as matrices (arrays of arrays) is something we would like to develop in the future. Other limitations come from the current implementation, as discussed in Section 7.5. Another possible direction for future work is to add other imperative data structures besides arrays.