1 Introduction
The tail recursion modulo cons (TRMc) transformation can rewrite functions that are not quite tail-recursive into a tail-recursive form that can be executed efficiently. This transformation was already described in the early 1970s by Risch (Reference Risch1973) and Friedman & Wise (Reference Friedman and Wise1975), and was more recently studied by Bour et al. (Reference Bour, Clément and Scherer2021) in the context of OCaml. A prototypical example of a function that can be transformed this way is map, which applies a function to every element of a list:

We can see that the recursive call to map is behind a constructor, and thus map as written is not tail-recursive and uses stack space linear in the length of the list. Of course, it is well known that we can rewrite map by hand into a tail-recursive form by using an extra accumulating argument, but this comes at the cost of losing the simplicity of the original definition.
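As a sketch, the two versions can be modeled in Python (illustrative only: the function names are ours, and since Python lacks tail-call elimination the tail call would in practice become a loop):

```python
def map_rec(xs, f):
    # Direct-style map: the recursive call sits under the list
    # constructor, so the function is not tail-recursive and uses
    # stack space linear in the length of the list.
    if not xs:
        return []
    return [f(xs[0])] + map_rec(xs[1:], f)

def map_acc(xs, f, acc=None):
    # Manual tail-recursive variant with an accumulator argument:
    # cons each result onto the accumulator and reverse once at the
    # end -- a cost the TRMc transformation avoids by updating the
    # result list in place.
    acc = [] if acc is None else acc
    if not xs:
        return acc[::-1]
    return map_acc(xs[1:], f, [f(xs[0])] + acc)
```

In a language with guaranteed tail calls, the accumulator version runs in constant stack space but still pays for the final reversal.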
The TRMc transformation can automatically transform a function like map into a tail-recursive variant, and moreover improves on the efficiency of the manual version by using in-place updates on the accumulator argument. In the following sections, we formalize our calculus and calculate a general tail recursion modulo contexts algorithm (Section 3) that we then instantiate to various use cases (Sections 4 and 5). In particular, we study the efficient modulo cons instantiation (Section 6), its extension to nonlinear control (Section 7), and the user-facing first-class constructor context feature (Section 8), and we finally conclude with benchmarks (Section 9) and related work. Readers may choose to read this article selectively in the following order:

1.1 Contributions
This paper is the extended version of Leijen & Lorenzen (Reference Leijen and Lorenzen2023). Previous work (Risch, Reference Risch1973; Friedman & Wise, Reference Friedman and Wise1975; Bour et al., Reference Bour, Clément and Scherer2021) gives TRMc algorithms, but all fall short of showing why these are correct, or of providing insight into what other transformations may be possible. In this article, we generalize tail recursion modulo cons (TRMc) to modulo context (TRMC) and try to bring the general principles out of the shadows of particular implementations and into the light of equational reasoning.
• Inspired by the elegance of program calculation as pioneered by Bird (Reference Bird1984), Gibbons (Reference Gibbons2022), Hutton (Reference Hutton2021), Meertens (Reference Meertens1986), and many others, we take an equational approach where we calculate a general tail recursion modulo context transformation from its specification and two general context laws. The resulting generic algorithm is concise and independent of any particular instantiation of the abstract contexts as long as their operations satisfy the context laws (Section 3).
• We can instantiate the algorithm by providing an implementation of application and composition on abstract contexts and show that these satisfy the context laws. In Section 4, we provide known instantiations of TRMC, namely modulo evaluation contexts (CPS) and modulo associative operations, and show that those instances satisfy the context laws. We then proceed to show various instantiations not so commonly associated with TRMC that arise naturally in our generic approach, namely modulo defunctionalized evaluation contexts, modulo monoids, modulo semirings, and modulo exponents.
• In Section 6, we turn to the most important instance in practice, modulo cons. We show how we can instantiate our operations to the hole calculus of Minamide (Reference Minamide1998), and that this satisfies the context laws and the imposed linear typing discipline. This gives us an elegant and sound in-place updating characterization of TRMc where the in-place update is hidden behind a purely functional (linear) interface.
• This is still somewhat unsatisfying, as it does not provide insight into the actual in-place mutation: such an implementation is only alluded to in prose (Minamide, Reference Minamide1998). We proceed by giving a second instantiation of modulo cons where we target the heap semantics of Xie et al. (Reference Xie and Leijen2021), which lets us reason explicitly about the heap and in-place mutation. Just as we calculated the generic TRMC translation from its specification, we again calculate the efficient in-place updating versions of context application and composition from the abstract context laws. These calculated reductions are exactly the implementation used in our Koka compiler.
• A well-known problem with the modulo cons transformation is that the efficient in-place mutating implementation fails if the semantics is extended with non-local control operations (Danvy & Filinski, Reference Danvy and Filinski1990; Sitaram & Felleisen, Reference Sitaram and Felleisen1990; Shan, Reference Shan2007), or general algebraic effect handlers (Plotkin & Power, Reference Plotkin and Power2003; Plotkin & Pretnar, Reference Plotkin and Pretnar2009), where one can resume more than once. This is particularly troublesome for a language like Koka, which relies foundationally on algebraic effect handlers (Leijen, Reference Leijen2017; Xie & Leijen, Reference Xie and Leijen2021). In Section 7, we show two novel solutions to this. The general approach generates two versions of each TRMc translation and chooses the appropriate version at runtime depending on whether nonlinear control is possible. This duplicates code, though, and may be too pessimistic: the slow version may be used even when no nonlinear control actually occurs. Suggested by our heap semantics, we can do better: in the hybrid approach we rely on precise reference counts (Xie et al., Reference Xie and Leijen2021), together with runtime support for context paths. This way we can efficiently detect at runtime whether a context is unique, and fall back to copying only if required due to nonlinear control.
• We have fully implemented the hybrid TRMc approach in the Koka compiler, and our benchmarks show that this approach can be very efficient. We measure various variants of modulo cons recursive functions, and for linear control the TRMc-transformed version is always faster than alternative approaches (Section 9).
In this version, we make the following contributions over the conference version:
• We extend the TRMC algorithm to ensure that (when instantiated to general evaluation contexts) it can optimize all recursive calls that are not under a lambda (Section 3). In contrast, the algorithm presented in the conference paper could only achieve this if the source program was in A-normal form (Flanagan et al., Reference Flanagan, Sabry, Duba and Felleisen1993). Our new algorithm extends the previous algorithm to perform the necessary A-normalizations on-demand.
• We describe a method for composing context instantiations (Section 5). This is especially useful for programs where fast instantiations (like semiring contexts) are not quite good enough to make the program tail-recursive. In that case, we can use the fast instantiation where it applies and use a slower instantiation like defunctionalized contexts for the rest. We use this insight to derive a tail-recursive evaluator for an arithmetic expression evaluator on fields.
• We include a detailed description of Koka’s implementation of constructor contexts. We discuss a snippet of the assembly code generated by the Koka compiler and explain the optimizations that make the implementation efficient (Section 7.3). Furthermore, we describe in detail another implementation strategy, proposed by Lorenzen et al. (Reference Lorenzen, Leijen, Swierstra and Lindley2024), which does not rely on reference counting, and contrast it with our implementation.
• Constructor contexts were special in the conference version of this paper, since they were the only contexts for which the transformation could not be done manually. However, Lorenzen et al. (Reference Lorenzen, Leijen, Swierstra and Lindley2024) show that the hybrid approach can be used to make constructor contexts first-class values. This gives programmers the ability to make their functions tail-recursive manually. We include several practical examples of programming with first-class constructor contexts, where it would be hard to achieve a tail-recursive version fully automatically, but a manual solution is evident.
The new content in this version supersedes several sections of the conference paper. We no longer include “Improving Constructor Contexts” (which is now covered by the extended algorithm in Section 3), “Modulo Cons Products” (which can be achieved more easily using first-class constructor contexts), and “Fall Back to General Evaluation Contexts” (which is less efficient than the implementation proposed in Section 7.4).
2 An overview of tail recursion modulo cons
As shown in the introduction, the prototypical example of a function that can be transformed by TRMc is the map function. One way to rewrite the map function manually to be tail-recursive is to use CPS, where we add a continuation parameter ${\textit{k}}$:

where we have to evaluate f(x) before allocating the closure, since f may have an observable (side) effect. The function id is the identity function, and apply and compose are regular function application and composition:
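A Python sketch of this CPS-style map (the names are ours, and apply and compose are inlined as plain calls and lambdas):

```python
def map_cps(xs, f, k=lambda r: r):
    # k is the continuation parameter; the initial continuation is id.
    if not xs:
        return k([])
    y = f(xs[0])   # evaluate f(x) before allocating the closure,
                   # since f may have an observable (side) effect
    return map_cps(xs[1:], f, lambda r: k([y] + r))
```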
All our examples use the Koka language (Leijen, Reference Leijen2021) since it has a full implementation of TRMc using the design in this paper, including support for nonlinear control (which cannot be handled by previous TRMc techniques).
We would like to stress though that the described techniques are not restricted to Koka as such and apply generally to any strict programming language (and particular instances can already be found in various compilers, including GCC, see Section 4.6). Some techniques, like the hybrid approach in Section 7.2, may require particular runtime support (like precise reference counts), but this is again independent of the particular language.
2.1 Continuation style TRMc
Our new tail-recursive version of map may not consume any extra stack space, but it achieves this at the cost of allocating many intermediate closures in the heap, each of which in turn allocates a node for the final result list. The TRMc translation is based on the insight that for many contexts around a tail-recursive call, we can often use more efficient implementations than function composition. In this paper, we are going to abstract over particular constructor contexts and instead represent abstract program contexts with three operations. First, a context expression creates such contexts, which can contain a single hole. The context type is parameterized by the type of the hole, which for our purposes must match the result type as well. Furthermore, we can compose and apply these abstract contexts as:
Our general TRMC translation can convert a function like map automatically to a tail-recursive version by recognizing that each recursive invocation to map is under a constant constructor context (Section 6), leading to:

This is essentially equivalent to our manually translated CPS-style map function, where we replaced function application and composition with context application and context composition, and the identity function with the empty context. Thus, an obvious way to give semantics to our abstract contexts is to represent them as functions, where a context expression is interpreted as a function with a single parameter for the hole. Context application and composition then map directly onto function application and composition:
Of course, using such semantics is equivalent to our original manual implementation and does not improve efficiency.
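A Python sketch of this function-backed semantics (the names ctx, app, comp, and empty are ours):

```python
# Abstract contexts modeled as functions, with one parameter for the hole.
def ctx(fill):
    return fill                      # ctx E  is  lambda hole: E[hole]

def app(k, v):
    return k(v)                      # application is function application

def comp(k1, k2):
    return lambda v: k1(k2(v))       # composition is function composition

empty = ctx(lambda hole: hole)       # the empty context is the identity
```

For example, `app(comp(ctx(lambda h: [1] + h), ctx(lambda h: [2] + h)), [3])` yields `[1, 2, 3]`.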
2.2 Linear continuation style
The insight of Risch (Reference Risch1973) and Friedman & Wise (Reference Friedman and Wise1975) that leads to increased efficiency is the observation that the transformation always uses the abstract context ${\textit{k}}$ in a linear way, so we can implement composition and application by updating the context holes in place. Following the implementation strategy of Minamide (Reference Minamide1998) for their hole calculus, we can represent our abstract contexts as a Minamide tuple, with a res field pointing to the final result object and a hole field pointing directly at the field containing the hole inside the result object. Assuming an assignment primitive, we can then implement composition and application efficiently as:

where the empty context is represented specially (since we do not yet have an address for the hole field). If we inline these definitions in the mapk function, we can see that we end up with a very efficient implementation where each new cell is directly appended to the partially built final result list. In our actual implementation, we optimize a bit more by defining the context type as a value type with a single constructor, where we represent the empty case by a hole field containing a null pointer. Such a tuple is passed at runtime in two registers and leads to efficient code where the check in the app function, for example, just zero-compares a register (see Section 7.3). Section 9 shows detailed performance figures confirming that the TRMc transformation always outperforms alternative implementations (for linear control flow).
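A Python model of this representation (a sketch with invented names: comp here is specialized to appending one fresh cell, which is exactly the composition that arises after inlining in mapk, and None plays the role of Nil):

```python
class Cons:
    """A mutable cons cell; `tail is None` marks the hole (or Nil)."""
    __slots__ = ("head", "tail")
    def __init__(self, head, tail=None):
        self.head, self.tail = head, tail

EMPTY = (None, None)   # empty context: no result and no hole address yet

def comp(k, cell):
    # Append a fresh cell (whose tail field is the hole) to context k,
    # updating the old hole in place.
    res, hole = k
    if hole is None:
        return (cell, cell)
    hole.tail = cell
    return (res, cell)

def app(k, v):
    # Plug v into the hole and return the completed result.
    res, hole = k
    if hole is None:
        return v
    hole.tail = v
    return res

def map_trmc(xs, f):
    k = EMPTY
    for x in xs:
        k = comp(k, Cons(f(x)))
    return app(k, None)            # None plays the role of Nil

def to_list(c):
    out = []
    while c is not None:
        out.append(c.head)
        c = c.tail
    return out
```

Each iteration mutates only the previous hole field, so the final list is built front-to-back without closures or a final reversal.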
2.3 First-class constructor contexts
The constructor context can be implemented using in-place updates since the TRMC conversion guarantees it is used linearly. However, it turns out this also makes the conversion invalid if the host language has advanced non-local control primitives or general algebraic effect handlers (as in Koka). For example, for the map function, the passed-in function f might resume multiple times. We describe this issue in detail in Section 7.1. As we show in this paper, we can compile constructor contexts in such a way that we are able to detect if a context is used nonlinearly and, in such a case, fall back to copying the context path at runtime to maintain the correct semantics (Section 7.2). This still gives us the in-place update efficiency in the usual case but now also handles nonlinear usage.
Furthermore, this fully encapsulates the imperative implementation of constructor contexts behind a purely functional interface, and we can expose these contexts as first-class values in the language. This allows us to write the result of the TRMc transformation manually. In Koka, first-class constructor contexts are written as a context expression with a single hole denoted by an underscore _ (Lorenzen et al., Reference Lorenzen, Leijen, Swierstra and Lindley2024). For example, we can write a list constructor context or a binary tree constructor context, and compose and apply such contexts with dedicated operators. Using these context expressions, we can directly implement the TRMC-transformed map function in Koka:

These constructor contexts are quite efficient (see Lorenzen et al., Reference Lorenzen, Leijen, Swierstra and Lindley2024 and Section 7.3), and the Koka compiler implements the automatic TRMc conversion as a source-to-source transformation using the first-class constructor contexts directly. We give further examples of programming with first-class constructor contexts in Section 8.
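The user-facing interface can be modeled in Python as follows (a purely functional sketch with our own names; Koka's real contexts update the hole in place and merely behave as if they were functional):

```python
class CCtx:
    """Model of a first-class constructor context with one hole."""
    def __init__(self, fill=lambda hole: hole):
        self.fill = fill
    def comp(self, other):
        # models Koka's context composition operator
        return CCtx(lambda hole: self.fill(other.fill(hole)))
    def apply(self, value):
        # models Koka's context application operator
        return self.fill(value)

def cons_ctx(y):
    # models a list constructor context with the hole in the tail
    return CCtx(lambda hole: [y] + hole)

def map_manual(xs, f):
    # the manually TRMc-transformed map, written with first-class contexts
    acc = CCtx()
    for x in xs:
        acc = acc.comp(cons_ctx(f(x)))
    return acc.apply([])
```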
3 Calculating tail recursion modulo context
In order to reason precisely about our transformation, we define a small calculus in Figure 1. The calculus is mostly standard, with expressions ${\textit{e}}$ consisting of values ${\textit{v}}$, applications ${\textit{e}_{1}~\textit{e}_{2}}$, let-bindings, and pattern matches. We assume well-typed programs that cannot go wrong, and where pattern matches are always complete and cannot get stuck. Since we reason in particular over recursive definitions, we add a special environment ${\textit{F}}$ of named recursive functions ${\textit{f}}$. We could have encoded this using a ${\mathsf{fix}}$ combinator, but using explicitly named definitions is more convenient for our purposes.

Fig. 1. Syntax and operational semantics.
Following the approach of Wright & Felleisen (Reference Wright and Felleisen1994), we define applicative-order evaluation contexts ${\mathsf{E}}$. Generally, contexts are expressions with one subexpression denoted as a hole ${\square{}}$. We write ${\mathsf{E}[\textit{v}]}$ for the substitution ${\mathsf{E}[\square{}~:={}~\textit{v}]}$ (which binds tighter than function application). The definition of ${\mathsf{E}}$ ensures a single reduction order where we never evaluate under a lambda. Dually, we define tail contexts ${\mathsf{T}}$ (Abelson et al., Reference Abelson, Kent Dybvig, Haynes, Rozas, Adams, Friedman and Kohlbecker1998). They describe the last term to be evaluated in an expression. Unlike evaluation contexts, there can be several holes in a tail context (say, in different branches of a match statement), and the substitution ${\mathsf{T}[\square{}~:={}~\textit{v}]}$ is assumed to be capture-avoiding. We also define expression contexts ${\mathsf{X}}$, which match any expression that is not under a lambda.
The operational semantics can now be given using small-step reduction rules of the form ${\textit{e}_{1}~{\longrightarrow}~\textit{e}_{2}}$, together with the ${(\textit{step})}$ rule to reduce in any evaluation context: ${\mathsf{E}[\textit{e}_{1}]~{\longmapsto}~\mathsf{E}[\textit{e}_{2}]}$ (in essence, an ${\mathsf{E}}$ context is an abstraction of the program stack and registers). We write ${{\longmapsto^{\!*}}}$ for the reflexive and transitive closure of the ${{\longmapsto}}$ reduction relation. The small-step operational rules are standard, except for the ${(\textit{fun})}$ rule, which assumes a global ${\textit{F}}$ environment of recursive function definitions.
When ${\textit{e}~{\longmapsto^{\!*}}~\textit{v}}$, we call ${\textit{e}}$ terminating (also called valuable; Harper, Reference Harper2012). When an evaluation does not terminate, we write ${\textit{e}~\!\!\Uparrow{}}$. For closed terms, we write ${\textit{e}_{1}~\cong{}~\textit{e}_{2}}$ if ${\textit{e}_{1}}$ and ${\textit{e}_{2}}$ are extensionally equivalent: either ${\textit{e}_{1}~{\longmapsto^{\!*}}~\textit{v}}$ and ${\textit{e}_{2}~{\longmapsto^{\!*}}~\textit{v}}$, or both ${\textit{e}_{1}~\!\!\Uparrow{}}$ and ${\textit{e}_{2}~\!\!\Uparrow{}}$. For open terms, we write ${\textit{e}_{1}~\cong{}~\textit{e}_{2}}$ if ${\sigma(\textit{e}_{1})~\cong{}~\sigma(\textit{e}_{2})}$ for all substitutions ${\sigma}$ of free variables by values.
During reasoning, we sometimes use the following equalities:

Since the hole in an evaluation context marks the first term to be evaluated, rewriting along these equalities does not change the evaluation order. Rewrites using the last equality may lead to code duplication, since they push the evaluation context into each branch (but this can be avoided by inserting join points; Maurer et al., Reference Maurer, Downen, Ariola and Jones2017).
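Reconstructed from the surrounding prose (let-bindings commute with evaluation contexts, and the last equality pushes the context into each branch of a match), the equalities have roughly the following shape:

```latex
\begin{align*}
\mathsf{E}[\mathsf{let}~\textit{x}~=~\textit{e}_{1}~\mathsf{in}~\textit{e}_{2}]
  &~\cong{}~ \mathsf{let}~\textit{x}~=~\textit{e}_{1}~\mathsf{in}~\mathsf{E}[\textit{e}_{2}] \\
\mathsf{E}[\mathsf{match}~\textit{e}~\{\,\overline{\textit{p}_{i} \to \textit{e}_{i}}\,\}]
  &~\cong{}~ \mathsf{match}~\textit{e}~\{\,\overline{\textit{p}_{i} \to \mathsf{E}[\textit{e}_{i}]}\,\}
\end{align*}
```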
It is straightforward to show that these equalities hold; for example, if ${\textit{e}_{2}}$ is terminating:

(and if ${\textit{e}_{2}~\!\!\Uparrow{}}$, then both sides do not terminate).
3.1 Abstract contexts
Before we start calculating our general TRMC transformation, we first define abstract contexts as an abstract type ${\textit{ctx}~\tau{}}$ in our calculus. There are three context operations: creation (${\mathsf{ctx}}$), application (${\mathsf{app}}$), and composition (${({\bullet})}$). These are not available to the user but are only generated as the target calculus of our TRMC translation. We extend the calculus as follows:
where we assume that the abstract context operations are always terminating. In order to reason about contexts as an abstract type, we assume two context laws. The first one relates the application with the construction of a context:
The second law states that composition of contexts is equivalent to a composition of applications:
When we instantiate to a particular implementation context, we need to show that the context laws are satisfied. In such a case, we only need to show this for terminating expressions ${\textit{e}}$, since if ${\textit{e}~\!\!\Uparrow{}}$, the laws hold by definition. In particular, for ${(\textit{appctx})}$ it follows directly that ${\mathsf{app}~\textit{k}~\textit{e}~\!\!\Uparrow{}}$ and ${\mathsf{E}[\textit{e}]~\!\!\Uparrow{}}$. Of particular note is that the latter only holds for ${\mathsf{E}}$ contexts, since they guarantee that the hole is first in the evaluation order; this is one reason why evaluation contexts are the maximal contexts possible for our TRMC translation. Similarly, for ${(\textit{appcomp})}$ it follows directly that ${(\textit{app}~(\textit{k}_{1}~{\bullet}~\textit{k}_{2})~\textit{e})~\!\!\Uparrow{}}$ and ${\mathsf{app}~\textit{k}_{1}~(\mathsf{app}~\textit{k}_{2}~\textit{e})~\!\!\Uparrow{}}$.
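As a sanity check, both laws can be tested on the simple function-backed context model of Section 2 (a Python sketch with our own names; this checks the laws on sample data only, not in general):

```python
# Function-backed contexts: check (appctx) and (appcomp) on samples.
ctx = lambda E: E
app = lambda k, v: k(v)
comp = lambda k1, k2: (lambda v: k1(k2(v)))

E = lambda hole: [0] + hole          # a sample evaluation context
k1 = ctx(lambda hole: [1] + hole)
k2 = ctx(lambda hole: [2] + hole)
e = [3]

# (appctx):  app (ctx E) e  ==  E[e]
appctx_holds = app(ctx(E), e) == E(e)
# (appcomp): app (k1 . k2) e  ==  app k1 (app k2 e)
appcomp_holds = app(comp(k1, k2), e) == app(k1, app(k2, e))
```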
3.2 Calculating a general tail-recursion-modulo-contexts algorithm
In this section, we are going to calculate a general TRMC translation algorithm from its specification. The algorithm is calculated assuming an abstract context where the context laws hold. Eventually, the algorithm needs to be instantiated in the compiler to particular contexts (like constructor contexts), with a particular implementation of context application and composition. We show many such instantiations in Sections 4 and 6.
For clarity, we use single-parameter functions for proofs and derivations (but of course the results extend straightforwardly to multiple parameters). Now consider a function ${\textit{f}~\textit{x}~=~\textit{e}_\textit{f}}$ with its TRMC-transformed version denoted as ${\textit{f}'}$:
Our goal is to calculate the static TRMC transformation algorithm ${{[\![}\_{]\!]}_{\textit{f},\textit{k}}}$ from its specification, where ${\textit{f}}$ is the function we intend to transform and ${\textit{k}}$ is a fresh variable representing the context. The first question is then how we should even specify the intended behavior of such a function.
We can follow the standard approach for reasoning about continuation passing style (CPS) here. For example, Gibbons (Reference Gibbons2022) calculates the CPS version of the factorial function, called ${\textit{fact}'}$, from its specification ${\textit{k}~(\textit{fact}~\textit{n})~\cong{}~\textit{fact}'~\textit{n}~\textit{k}}$; similarly, Hutton (Reference Hutton2021) calculates the CPS version of an evaluator from its specification ${\mathsf{exec}~\textit{k}~(\textit{eval}~\textit{e})~\cong{}~\textit{eval}'~\textit{e}~\textit{k}}$. Following that approach, we use ${\;\mathsf{app}~\textit{k}~(\textit{f}~\textit{e})~\cong{}~\textit{f}'~\textit{e}~\textit{k}\;}$ (a) as our initial specification. This seems a good start since it implies:

and we can thus replace any application ${\textit{f}~\textit{e}}$ in the program with an application of the TRMC-translated ${\textit{f}'}$ instead, as ${\textit{f}'~\textit{e}~(\mathsf{ctx}~\square{})}$.
Unfortunately, the specification is not yet specific enough to calculate with, as it does not include the translation function ${{[\![}\_{]\!]}_{\textit{f},\textit{k}}}$ itself, which limits what we can derive. Can we change this? Let’s start by deriving how we can satisfy our initial specification (a):

This suggests a more general specification, ${\;\mathsf{app}~\textit{k}~\textit{e}~\cong{}~{[\![}\textit{e}{]\!]}_{\textit{f},\textit{k}}\;}$ (b) (for any ${\textit{e}}$), which both implies our original specification and now also includes the translation function. The improved specification directly gives us a trivial solution for the translation:
That is not quite what we need for general TRMC, though, since this does not translate any tail calls modulo a context in ${\textit{e}}$ at all. However, we can be more specific by matching on the shape of ${\textit{e}}$. In particular, we can match on general tail-modulo-context calls in the expression ${\textit{e}}$ if it has the shape ${\mathsf{E}[\textit{f}~\textit{e}_{1}]}$. We can then calculate:

Effectively, we replace all direct tail-recursive calls in the original function ${\textit{f}}$ with tail-recursive calls in our translated function ${\textit{f}'}$ by just extending the continuation parameter ${\textit{k}}$. Together with our earlier equation, we now have an initial specification of our translation function:
Note that the equations overlap: for a particular instance of the algorithm, we generally constrain the ${(\textit{tail})}$ rule to only apply for certain contexts ${\mathsf{E}}$ restricted by some particular ${(\star)}$ condition (e.g., constructor contexts), falling back to ${(\textit{base})}$ otherwise. Similarly, the ${(\textit{tail})}$ case allows a choice of where to apply the tail call for expressions like ${\textit{f}~(\textit{f}~\textit{e})}$, and a particular instantiation of ${(\star)}$ should disambiguate for an actual algorithm. By default, we assume that any instantiation matches on the innermost application of ${\textit{f}}$ (for reasons discussed in Section 4.2).
This is still a bit constrained, as these equations do not consider any evaluation contexts ${\mathsf{E}}$ where the recursive call is under a ${\mathsf{let}}$ or ${\mathsf{match}}$ expression. We can again match on these specific forms of ${\textit{e}}$. For example, ${\mathsf{let}~\textit{x}~=~\textit{e}_{0}~\mathsf{in}~\textit{e}_{1}}$ where ${\textit{e}_{0}~{\neq}~\mathsf{E}[\textit{f}~\textit{e}']}$ (so it does not overlap with ${\mathsf{E}}$ contexts):

Unfortunately, this rule is still too restrictive in general, as it does not apply when the let statement is itself under a context ${\mathsf{E}}$. For example, we might encounter an expression like:
Here, the recursive call is under the let-binding of ${\textit{y}}$ (as ${\mathsf{E}[\mathsf{let}~\textit{y}~=~\textit{e}_{0}~\mathsf{in}~\textit{f}~\textit{x}~\textit{y}]}$), and the ${\textit{y}~=~\textit{e}_{0}}$ binding prevents the recursive call ${\textit{f}~\textit{x}~\textit{y}}$ from being the focus of the evaluation context. This situation occurs whenever an expression is not in A-normal form (Flanagan et al., Reference Flanagan, Sabry, Duba and Felleisen1993), and such expressions cannot be optimized by the rules outlined so far (nor by the rules as presented in earlier work; Leijen & Lorenzen, Reference Leijen and Lorenzen2023). Instead, we need to consider the general case where the let-binding appears under a context ${\mathsf{E}}$. Assuming that the variables bound in let-bindings and matches are fresh, we can calculate:

Effectively, we have lifted the let-binding out of the evaluation context. We can do the same for matches:

This form of specification essentially performs A-normalization whenever necessary, creating further opportunities to match on tail-recursive calls. Our presentation follows the approach of Maurer et al. (Reference Maurer, Downen, Ariola and Jones2017), who describe the positions in a term that occur last in the evaluation order as tail contexts ${\mathsf{T}}$. They show that A-normalization can be achieved by commuting the ${\mathsf{E}}$ and ${\mathsf{T}}$ contexts whenever possible. This is exactly the approach taken here, where we commute single let-bindings and matches under ${\mathsf{E}}$ contexts to the front of the term. A potential drawback of the ${\mathsf{match}}$ normalization is that it duplicates the evaluation context ${\mathsf{E}}$ in each of the branches. Maurer et al. (Reference Maurer, Downen, Ariola and Jones2017) also show how join points can be used to avoid code duplication in such cases.
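Following the prose, the let and match cases commute under ${\mathsf{E}}$ roughly as follows (each subject to the ${\textit{e}_{0}~\neq \mathsf{X}[\textit{f}~\textit{e}']}$ side condition):

```latex
\begin{align*}
{[\![}\,\mathsf{E}[\mathsf{let}~\textit{x}~=~\textit{e}_{0}~\mathsf{in}~\textit{e}_{1}]\,{]\!]}_{\textit{f},\textit{k}}
  &= \mathsf{let}~\textit{x}~=~\textit{e}_{0}~\mathsf{in}~{[\![}\,\mathsf{E}[\textit{e}_{1}]\,{]\!]}_{\textit{f},\textit{k}} \\
{[\![}\,\mathsf{E}[\mathsf{match}~\textit{e}_{0}~\{\,\overline{\textit{p}_{i} \to \textit{e}_{i}}\,\}]\,{]\!]}_{\textit{f},\textit{k}}
  &= \mathsf{match}~\textit{e}_{0}~\{\,\overline{\textit{p}_{i} \to {[\![}\,\mathsf{E}[\textit{e}_{i}]\,{]\!]}_{\textit{f},\textit{k}}}\,\}
\end{align*}
```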
This leaves one last expression form to consider: the application of a function to an argument. Using the intuition of commuting tail contexts, we might define ${{[\![}~\mathsf{E}[\textit{e}_{0}~\textit{e}_{1}]~{]\!]}_{\textit{f},\textit{k}}~=~\textit{e}_{0}~{[\![}~\mathsf{E}[\textit{e}_{1}]~{]\!]}_{\textit{f},\textit{k}}}$. However, while ${\textit{e}_{0}}$ can now be evaluated early, the application itself depends on the result of our transformation. Thus, we need to be a bit more careful and instead calculate:

In this general form, we need to strengthen our requirement that ${\textit{e}_{0}~{\neq}~\mathsf{E}[\textit{f}~\textit{e}']}$ to ensure that it does not overlap with our newly calculated rules. We write ${\textit{e}_{0}~\neq \mathsf{X}[\textit{f}~\textit{e}']}$ to mean that ${\textit{e}_{0}}$ cannot have a recursive call under an expression context ${\mathsf{X}}$. The expression context ${\mathsf{X}[\textit{e}]}$ matches all possible expressions that contain ${\textit{e}}$, unless ${\textit{e}}$ occurs in ${\mathsf{X}[\textit{e}]}$ exclusively under lambdas.
3.3 The tail-recursion-modulo-contexts algorithm
Figure 2 shows all five of the calculated equations for our generic tail recursion modulo contexts transformation (extended to multiple parameters). We can instantiate this algorithm by defining the context type
${\textit{ctx}~\alpha{}}$
, the context construction (
${\mathsf{ctx}}$
), composition
${({\bullet})}$
, and application (
${\mathsf{app}}$
) operations, and finally the
${(\star)}$
condition constrains the allowed context
${\mathsf{E}}$
to fit the particular context type.

Fig. 2. Calculated algorithm for general selective tail recursion modulo context transformation. It is parameterized by the ($\star$) condition, the composition ($\bullet$), and application (app) operations.
As remarked in Section 3.2, our equationally derived algorithm relied on a (single) non-inductive step, but we can still show formally that the algorithm is indeed sound:
Theorem 1. (The TRMC translation is sound)
Let ${\textit{f}~\textit{x}~=~\textit{e}_\textit{f}}$ and ${\textit{f}'~\textit{x}~\textit{k}~=~{[\![}\textit{e}_\textit{f}{]\!]}_{\textit{f},\textit{k}}}$; then ${\mathsf{app}~\textit{k}~(\textit{f}~\textit{x})~\cong{}~\textit{f}'~\textit{x}~\textit{k}}$.
See Appendix B.1 for the proof. Thanks to the A-normalization, we can also show that the TRMC algorithm exhaustively optimizes recursive calls:
Theorem 2. (Matching all recursive calls)
For any transformed expression ${\textit{e}'~=~{[\![}\textit{e}{]\!]}_{\textit{f},\textit{k}}}$ with ${(\star)}$ unconstrained, we have ${\textit{e}'~\neq \mathsf{X}[\textit{f}~\textit{e}_{0}]}$.
There are two kinds of recursive calls that cannot be optimized by our algorithm. First, we do not optimize recursive calls inside lambdas; this is a necessary restriction, since it is impossible in general to push the accumulated context ${\textit{k}}$ under a lambda. Second, our algorithm only optimizes the first recursive call(s) in the evaluation order. If those are followed by further recursive calls, the evaluation context ${\mathsf{E}}$ stored as ${\mathsf{ctx}~\mathsf{E}}$ in the ${(\textit{tail})}$ rule may still contain unoptimized recursive calls. We revisit this problem in Section 4.2. To see our algorithm in action, let’s consider the ${\textit{map}}$ function again:
When translating this function, we first use the ${(\textit{tmatch})}$ rule with ${\mathsf{E}~=~\square{}}$ to descend into the branches of the match. In the ${\textit{Nil}}$ branch, the ${(\textit{base})}$ rule applies. In the ${\textit{Cons}}$ branch, we use the ${(\textit{tapp})}$ rule (again with ${\mathsf{E}~=~\square{}}$) to bind the call to ${\textit{f}~\textit{x}}$. We then use the ${(\textit{tail})}$ rule to optimize the recursive call to map xx f:
However, for constructor contexts (Section 6), it is useful to keep the ${\textit{Cons}}$ constructor in the context passed to ${\textit{map}}$. In our practical implementation, we therefore modify the ${(\textit{tapp})}$ rule slightly to extract the arguments instead of the entire partially applied function. Our final transformation for ${\textit{map}}$ is then:
4 Instantiations of the general TRMC transformation
With the general TRMC transformation in hand, we discuss various instantiations in this section. In the next section, we look at the update-in-place modulo cons (TRMc) instantiation in detail.
4.1 Modulo evaluation contexts
If we use
${\textit{true}}$
for the
${(\star)}$
condition, we can translate any function that is tail recursive modulo an evaluation context. Representing our abstract context directly as an
${\mathsf{E}}$
context is usually not possible though as
${\mathsf{E}}$
contexts generally contain code. The usual way to represent an arbitrary evaluation context
${\mathsf{E}}$
is simply as a (continuation) function
${\lambda \textit{x}.~\mathsf{E}[\textit{x}]}$
with a context type
${\textit{ctx}~\alpha{}~=~\alpha{}~{\rightarrow}~\alpha{}}$
:
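As a concrete illustration, the three context operations can be realized as follows (a Python sketch with our own names `ctx_empty`, `ctx_comp`, `ctx_app`; the paper's examples are written in Koka):

```python
# A context represented as a continuation function: ctx [] is the identity,
# composition is function composition, and application is function application.
def ctx_empty(x):           # ctx (hole)
    return x

def ctx_comp(k1, k2):       # k1 composed with k2
    return lambda x: k1(k2(x))

def ctx_app(k, x):          # app k x
    return k(x)
```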

This is an intuitive definition where
${\mathsf{ctx}~\square{}}$
corresponds to the identity function and context composition to function composition. If we apply the TRMC translation, we are essentially performing a selective CPS translation where the context
${\mathsf{E}}$
is represented as the continuation function. We can verify that the context laws hold for this instantiation (where we can assume
${\textit{e}}$
is terminating):

As a concrete example, let’s apply the modulo evaluation context to the
${\textit{map}}$
function:
which translates to:
and which the compiler can further simplify into:
where we derived exactly the standard CPS-style version of map as shown in Section 2. A general evaluation context transformation creates more opportunities for tail-recursive calls, but this also happens at the cost of potentially heap-allocating continuation closures. As such, it is not common for strict languages to use this instantiation. The exception would be languages like Scheme that always guarantee tail calls, but in that case the modulo evaluation contexts instantiation is already subsumed by general CPS conversion.
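To make the derived result concrete, a Python sketch of the CPS-style map could look as follows (the name `map_cps` and the default identity continuation are our own; the paper's version is in Koka):

```python
def map_cps(f, xs, k=lambda r: r):
    # Tail recursion modulo evaluation context: the context is the
    # continuation k, and each pending Cons frame becomes a closure.
    if not xs:
        return k([])
    y = f(xs[0])
    return map_cps(f, xs[1:], lambda ys: k([y] + ys))
```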
4.2 Nested translation of modulo evaluation contexts
The current instantiation is already very general as it applies to any
${\mathsf{E}}$
context, but we can do a little better. While the innermost non-tail call
${\mathsf{E}[\textit{f}~\textit{e}]}$
becomes
${\textit{f}'~\textit{e}~(\textit{k}~{\bullet}~\mathsf{ctx}~\mathsf{E})}$
, the context
${\mathsf{E}}$
may contain itself further recursive calls to
${\textit{f}}$
. Since
${\textit{k}}$
is just a variable this allocates a closure for each composition
${({\bullet})}$
and invokes every nested call
${\textit{f}~\textit{e}}$
with an empty context as
${\textit{f}'~\textit{e}~(\mathsf{ctx}~\square{})}$
before composing with
${\textit{k}}$
. This is not ideal, and in the classic CPS translation this is avoided by passing
${\textit{k}}$
itself into the closure for
${\mathsf{ctx}~\mathsf{E}}$
directly. Fortunately, we can achieve the same by specializing the compose function using the specification (b):

That is, in the compiler, instead of generating
${\textit{k}~{\bullet}~(\mathsf{ctx}~\mathsf{E})}$
, we invoke the TRMC translation recursively in the
${(\textit{tail})}$
case and generate
${\lambda \textit{x}.~{[\![}\mathsf{E}[\textit{x}]{}{]\!]}_{\textit{f},\textit{k}}}$
instead. This avoids the allocation of function composition closures and directly passes the continuation
${\textit{k}}$
to any nested recursive calls.
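The effect of the nested translation can be seen on a function with two recursive calls, such as summing a binary tree (a hedged Python sketch; the tuple encoding of trees and the name `tsum` are ours):

```python
# Trees as nested tuples: ("leaf", n) or ("node", left, right).
def tsum(t, k=lambda v: v):
    if t[0] == "leaf":
        return k(t[1])
    _, l, r = t
    # Nested translation: k is passed directly into the inner closure,
    # instead of composing k with a freshly started empty context.
    return tsum(l, lambda v: tsum(r, lambda w: k(v + w)))
```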
4.3 Modulo defunctionalized evaluation contexts
In order to better understand the shapes that evaluation contexts can take, we want to consider the defunctionalization (Reynolds, Reference Reynolds1972; Danvy & Nielsen, Reference Danvy and Nielsen2001) of the general evaluation context transformation. It turns out that this yields an interesting context in its own right. First, we observe that in any recursive function the evaluation context can only take a finite number of shapes depending on the number of recursive calls. We write this as:
We define an accumulator datatype by creating a constructor
${\textsf{H}}$
for the
${\square{}}$
context and for each
${\mathsf{E}_\textit{i}}$
a constructor
${\mathsf{A}_\textit{i}}$
that carries the free variables of
${\mathsf{E}_\textit{i}}$
. The compiler then generates an
${\mathsf{app}}$
function where we interpret
${\mathsf{A}_\textit{i}}$
by evaluating
${\mathsf{E}_\textit{i}}$
with the stored free variables:

Just as we saw in Section 4.2, we need to use the translated evaluation context in the definition of
${\mathsf{app}}$
to translate nested calls. The context laws now follow by induction – see Appendix B.2 in the supplement for the derivations.
Applying this instantiation to the map function, we obtain:
In the
${\textit{Cons}}$
branch, we have inlined
${\textit{k}~{\bullet}~(\mathsf{A}_{1}~\textit{y}~\textsf{H})}$
. The
${\textit{app}}$
function interprets
${\mathsf{A}_{1}}$
by calling itself recursively on the stored evaluation context:
As we can see, using the modulo defunctionalized evaluation context translation, we derived exactly the accumulator version of the map function that reverses the accumulated list in the end (where app is reverse)! In particular, for the special case where all evaluation contexts are constructor contexts
${\textit{C}^\textit{m}~\textit{x}_1~{\dots}~(\textit{f}~{\dots})~{\dots}~\textit{x}_\textit{m}}$
(as is the case for map), the accumulator datatype stores a path into the data structure we are building and thus essentially becomes a zipper structure (Huet, Reference Huet1997).
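A Python sketch of this defunctionalized instantiation is shown below (we encode the constructors H and A1 as `None` and pairs, and write the tail-recursive calls as loops; these encoding choices are ours):

```python
H = None  # the empty context

def app(k, ys):
    # Interpret the accumulated A1(y, k') frames: exactly a reverse-onto.
    while k is not None:
        y, k = k
        ys = [y] + ys
    return ys

def map_defun(f, xs):
    k = H
    for x in xs:          # the tail-recursive traversal, as a loop
        k = (f(x), k)     # extend the context with an A1(f(x), _) frame
    return app(k, [])
```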
This defunctionalized approach might resemble general closure conversion at first (Appel, Reference Appel1991): In both approaches, we store the free variables in a datatype. However, in closure conversion the datatype typically also contains a machine code pointer and one jumps to the code by calling this pointer, while in our case we match on the specialized constructors (similar to the approach of Tolmach & Oliva, Reference Tolmach and Oliva1998).
4.3.1 Reuse
As the defunctionalization makes the evaluation context explicit, we can optimize it further. As Sobel & Friedman (Reference Sobel and Friedman1998) note, the defunctionalized closure is only applied once and we can reuse its memory for other allocations. This can happen automatically in languages with reuse analysis such as Koka (Lorenzen & Leijen, Reference Lorenzen and Leijen2022), Lean (Ullrich & de Moura, Reference Ullrich and de Moura2019), or OPAL (Didrich et al., Reference Didrich, Fett, Gerke, Grieskamp and Pepper1994). In particular, in the app function, the match:
can reuse the
${\mathsf{A}_{1}}$
in-place to allocate the
${\textit{Cons}}$
node if the
${\mathsf{A}_{1}}$
is unique at runtime. In our case, the context is actually always unique (we show this formally in Section 6.1), and the
${\mathsf{A}_{1}}$
nodes are always reused! Even better, if the initial list is unique, we also reuse the initial
${\textit{Cons}}$
cell for the
${\mathsf{A}_{1}}$
accumulator itself in
${\textit{map}'}$
and no allocation takes place at all – the program is functional but in-place (Xie et al., Reference Xie and Leijen2021; Lorenzen et al., Reference Lorenzen, Leijen and Swierstra2023).
4.4 Modulo associative operator contexts
In the previous instantiations, we considered general evaluation contexts. However, we can often derive more efficient instantiations by considering more restricted contexts. A particularly nice example are monoidal contexts. For any monoid with an associative operator
${{\odot}~:{}~\tau{}~{\rightarrow}\tau{}~{\rightarrow}~\tau{}}$
and a unit value
${\textit{unit}~:{}~\tau{}}$
, we can define a restricted operator context as:
For a concrete example, consider the
${\textit{length}}$
function defined as:
which applies for integer addition (
${{\odot}~=~+,~\textit{unit}~=~0}$
). The idea is now to define a compile time fold function
${(|\_|)}$
over a context
${\mathsf{A}}$
to always reduce the context to a single element of type
${\tau{}}$
:

We can now instantiate the abstract contexts by defining the
${(\star)}$
condition to constrain the
${\mathsf{E}}$
context to
${\mathsf{A}}$
, and the context type to
${\textit{ctx}~\tau{}~=~\tau{}}$
, where we use the fold operation to represent contexts always as a single element of type
${\tau{}}$
:

The context laws hold for this definition. For composition, we can derive:

and for context application we have:

We proceed by induction over
${\mathsf{A}}$
.

Common instantiations include integer addition (
${{\odot}~=~+,~\textit{unit}~=~0}$
) and integer multiplication (
${{\odot}~=~\times,~\textit{unit}~=~1}$
). The TRMC algorithm with
${\mathsf{A}}$
contexts instantiated with integer addition translates the previous
${\textit{length}}$
function to the following tail-recursive version:
The intention is that the fold function is performed by the compiler, and the compiler can simplify this further as:
such that we end up with:
This time we derived exactly the text book accumulator version of
${\textit{length}}$
, with the fold performed at compile time.
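The resulting accumulator version can be sketched as follows (a Python rendering with the tail call written as a loop; the name `length` shadows nothing essential here):

```python
def length(xs):
    acc = 0             # the empty context folds to the unit 0
    for _ in xs:        # the tail-recursive call becomes a loop
        acc = acc + 1   # composing with the context 1 + _ folds to acc + 1
    return acc
```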
4.4.1 Using right biased contexts
Our defined context only allows the recursive call on the left, but we can also define a right biased context:
with the fold defined as:

We can now compose in the opposite order:

We can again show that the context laws hold for this definition (see Appendix B.3 in the supplement). As an example, we can instantiate
${{\odot}}$
as list append
${{+\!\!+}}$
with the empty list as the unit element to transform the
${\textit{reverse}}$
function:
First, our TRMC algorithm transforms it into:
and with our instantiated context, this simplifies to:
Using right-biased contexts, we derived the text book accumulator version of reverse. This shows that our general TRMC algorithm can be instantiated to eliminate append calls automatically as first proposed by Hughes (Reference Hughes1986) and Wadler (Reference Wadler1987).
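The derived accumulator version of reverse can be sketched as (Python, with the tail call written as a loop):

```python
def reverse(xs):
    acc = []             # unit of the append monoid
    for x in xs:
        acc = [x] + acc  # right-biased: the context becomes [x] ++ acc
    return acc
```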
4.5 Modulo monoid contexts
To handle general monoids, we need to consider recursive calls on both sides of the associative operation:
This context
${\mathsf{A}}$
expresses arbitrarily nested applications of
${{\odot}}$
. As monoid operations may not be commutative, we cannot use a single element to represent the context. Instead, we need to use a product context where we accumulate the left and right context separately:

which we compose as:

We can again show that the context laws hold for this definition (see Appendix B.4 in the supplement).
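For illustration, consider a hypothetical function `wrap` with a recursive call between two operands of a (non-commutative) monoid, here string concatenation: `wrap(n) = "(" + wrap(n-1) + ")"` with `wrap(0) = "."`. The product context folds into a pair `(l, r)`, applied as `l + e + r` (a Python sketch; the example and names are ours):

```python
def wrap(n):
    # The monoid context folds into the pair (l, r), composed separately
    # on the left and on the right of the hole.
    l, r = "", ""                # empty context = (unit, unit)
    while n > 0:                 # the tail-recursive call as a loop
        l, r = l + "(", ")" + r  # compose (l, r) with ("(", ")")
        n -= 1
    return l + "." + r           # apply the context to the base case
```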
4.6 Modulo semiring contexts
We can also combine the associative operators of two monoids, as long as one distributes over the other. This is the case for semirings in particular (although we do not need commutativity of
${+}$
). Semiring contexts are relatively common in practice. For example, consider the following hashing function for a list of integers as shown by Bloch (Reference Bloch2008):
Implementing modulo semiring contexts in a compiler may be worthwhile as deriving a tail-recursive version manually for such contexts is not always straightforward (and the interested reader may want to pause here and try to rewrite the
${\textit{hash}}$
function in a tail-recursive way before reading on).
We can define a general context for semirings as:
For simplicity, we assume we have a commutative semiring where both addition and multiplication commute. This allows us to use again a product representation at runtime where we accumulate the additions and multiplications separately (and without commutativity we need a quadruple instead). In the definition of the fold, we take into account that the multiplication distributes over the addition:

Finally, to compose the contexts we need to use distributivity again. Note how the
${(\textit{scomp})}$
rule mirrors the definition of
${(|\mathsf{A}|)}$
above:

We can show the context laws hold for these definitions:

and

We proceed by induction over
${\mathsf{A}}$
(where we compress some cases for brevity):

When we apply this to the hash function, we derive the tail-recursive version as:
which further simplifies to:
The final definition may not be quite so obvious, and we argue that the modulo semiring instantiation may be a nice addition to any optimizing compiler. Indeed, it turns out that GCC implements this optimization (Dvořák, Reference Dvořák2004) for integers and floating point numbers (if -ffast-math is enabled to allow the assumption of associativity). This implementation specifically creates two local accumulators for addition and multiplication and uses a direct while loop to compile the tail-recursive calls.
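Assuming the hash function has the recursive shape `hash([]) = 1` and `hash(x : xx) = x + 31 * hash(xx)` (our reading of Bloch's example), the semiring context `a + m * _` folds into the pair `(a, m)`, and the derived tail-recursive version can be sketched as:

```python
def hash_list(xs):
    a, m = 0, 1                   # empty context = (0, 1)
    for x in xs:                  # the tail-recursive call as a loop
        a, m = a + m * x, m * 31  # compose with (x + 31 * _) via distributivity
    return a + m                  # apply to the base case: a + m * 1
```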
4.7 Modulo exponent contexts
As a final example of an efficient representation of contexts, we consider exponent contexts that consist of a sequence of calls to a function
${\textit{g}}$
:
If we use a defunctionalized evaluation context from Section 4.3, we derive a datatype that is isomorphic to the peano-encoded natural numbers: the continuation counts how often we still have to apply
${\textit{g}}$
. As such, we can represent it more efficiently by an integer, where we fold an evaluation context into a count:

We can define the primitive operations as:

where
${\mathsf{app}~\textit{k}~\textit{e}}$
applies the function
${\textit{g}}$
to its argument
${\textit{k}}$
times. See Appendix B.5 in the supplement for the derivations that show the context laws hold for this definition.
Note that if
${\textit{g}}$
is the enclosing function
${\textit{f}}$
, then the
${(\textit{xapp})}$
specification is not tail-recursive. In that case, we can again use specification (b) to replace
${\mathsf{app}~\textit{k}~(\textit{g}~\textit{e})}$
by
${{[\![}\textit{g}~\textit{e}{]\!]}_{\textit{f},\textit{k}}}$
at compile time (as shown in Section 4.2). A nice example of such an exponent context is given by Wand (Reference Wand1980) who considers McCarthy’s 91-function:
Using the exponent context with the recursive
${(\textit{xapp})}$
, we obtain a mutually tail-recursive version:
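The resulting program can be sketched in Python with the mutual tail calls fused into a single loop, where the integer `k` counts the pending outer applications (the loop form is our own rendering):

```python
def mc91(n):
    # Exponent context: k counts how many applications of mc91 remain.
    k = 0
    while True:
        if n > 100:
            if k == 0:
                return n - 10      # app 0 e = e
            n, k = n - 10, k - 1   # app k e: unfold one pending call
        else:
            n, k = n + 11, k + 1   # mc91(mc91(n+11)): one more pending call
```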
5 Context composition
While the contexts we have defined so far are useful when they apply, they can fall short if they only match some of the recursive calls. This makes them fragile when a new recursive call is added to a function, as the context may no longer apply. In this section, we remove this restriction by showing how fast but restricted contexts can be composed with slower, more general ones. This is not implemented in Koka though.
5.1 A basic expression evaluator
To motivate the composition of contexts, we consider a basic arithmetic expression evaluator in the style of Hutton (Reference Hutton2021):

The + suggests the use of a monoid context. However, this does not apply directly, since we have two recursive calls to eval instead of just one. The best we can do is to ignore the first recursive call and treat it as a regular value. Then we would obtain:

However, we have not quite achieved a tail-recursive version yet. Like Hutton (Reference Hutton2021), we can achieve this by using defunctionalized evaluation contexts:

This version is now tail-recursive, but it is also more complex than the original version and involves the allocation of
and
constructors. In particular, the
constructor seems superfluous, as it corresponds to the context app(k,addacc + eval(e2)), which we optimized using the monoid contexts earlier. In this section, we want to combine the two approaches to obtain a more efficient version, where we use both an accumulator and a monoid context:

This version has the best of both worlds: it is fully tail-recursive and only needs to allocate a defunctionalized continuation for the left recursive call (where we need to keep track of the expression e2), while the right recursive call is efficiently handled by the monoid context.
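The combined evaluator can be sketched as follows (Python, with expressions as tuples `("num", n)` or `("add", e1, e2)` and the defunctionalized context kept as a plain list of deferred right operands; these encodings are ours):

```python
def eval_expr(e):
    acc = 0   # monoid context: the running sum
    k = []    # defunctionalized context: right operands still to evaluate
    while True:
        if e[0] == "add":
            k.append(e[2])   # defer e2, descend into the left operand
            e = e[1]
        else:
            acc += e[1]      # apply the monoid context to the number
            if not k:
                return acc
            e = k.pop()      # resume a deferred right operand
```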
5.2 Swapping contexts
To achieve this transformation in general, we need to be able to compose two contexts. For two contexts
${\mathsf{E}_{1}}$
and
${\mathsf{E}_{2}}$
, we can define their product context, which consists of tuples of the two contexts. We can apply a product context to an expression by applying each context in turn:
But how would we compose two product contexts? We would like to turn a composition of tuples into a tuple of compositions as
${(\textit{l}_{1},~\textit{r}_{1})~{\bullet}~(\textit{l}_{2},~\textit{r}_{2})~=~(\textit{l}_{1}~{\bullet}~\textit{l}_{2},~\textit{r}_{1}~{\bullet}~\textit{r}_{2})}$
. We can try to calculate this directly:

… but now we are stuck. Here,
${\textit{l}_{1},\textit{l}_{2}}$
belong to the context
${\mathsf{E}_{1}}$
and
${\textit{r}_{1},\textit{r}_{2}}$
to
${\mathsf{E}_{2}}$
. In order to make progress, we have to swap the inner contexts
${\textit{r}_{1}}$
and
${\textit{l}_{2}}$
. But this is not always going to be possible! Instead, we need to parameterize the product context with a swap operation:
If the contexts are connected in this sense, we can continue to calculate their composition:

This gives us a definition for product contexts: we can fold any context
${\mathsf{E}~=~\mathsf{E}_{1}{\mid}{}\mathsf{E}_{2}}$
by composing the folds of
${\mathsf{E}_{1}}$
and
${\mathsf{E}_{2}}$
:


With this definition in hand, we can now derive several contexts from the previous section from the more basic contexts we defined earlier.
5.2.1 Modulo monoid contexts
We motivated the Modulo Monoid Contexts in Section 4.5 as the composition of a left-biased and a right-biased context. In fact, we can now derive this context as the product context of a left-biased and right-biased contexts, with
${\textit{swap}(\textit{l},~\textit{r})~=~(\textit{r},~\textit{l})}$
. This follows the swap law since:
\begin{align*}\quad &~\mathsf{app}~\textit{l}~(\mathsf{app}~\textit{r}~\textit{e})\\=~\;\;~&~\textit{l}~{\odot}~(\textit{e}~{\odot}~\textit{r})\\=~\;\;~&~(\textit{l}~{\odot}~\textit{e})~{\odot}~\textit{r}\\=~\;\;~&~\mathsf{app}~\textit{r}~(\mathsf{app}~\textit{l}~\textit{e})\end{align*}
With this, we get exactly the previous definition of
${(\textit{acomp})}$
of Modulo Monoid Contexts.
5.2.2 Modulo semiring contexts
Similarly, we can derive the semiring context (Section 4.6) as the composition of two left-biased contexts for its addition (
${\textit{l}}$
) and multiplication (
${\textit{r}}$
). Here, the swap operation is given by
${\textit{swap}(\textit{r},~\textit{l})~=~(\textit{r}~*~\textit{l},~\textit{r})}$
:
\begin{align*}\quad &~\mathsf{app}_*~\textit{r}~(\mathsf{app}_+~\textit{l}~\textit{e})\\=~\;\;~&~\textit{r}~*~(\textit{l}~+~\textit{e})\\=~\;\;~&~(\textit{r}~*~\textit{l})~+~(\textit{r}~*~\textit{e})\\=~\;\;~&~\mathsf{app}_+~(\textit{r}~*~\textit{l})~(\mathsf{app}_*~\textit{r}~\textit{e})\end{align*}
With this, we get exactly the previous definition of
${(\textit{scomp})}$
and our new context corresponds to the semiring contexts we defined earlier. For this definition to work, it is important though that the left-biased context for the addition is in the first component of the tuple with multiplication in the second. That allows us to define a swap operation that uses the distributivity of the semiring to swap the contexts. We could not define a swap operation if multiplication is in the first component, since this would require us to move an addition under a multiplication, which is only possible if the semiring has multiplicative inverses.
5.3 Composing (defunctionalized) evaluation contexts
(Defunctionalized) evaluation contexts are the only contexts introduced in the last section that can reliably make all recursive calls tail-recursive. For this reason, they are particularly attractive for composition with other contexts that lead to faster code in practice but only apply in more limited cases. Thankfully, this is easily possible, since we can swap an arbitrary context
${\textit{r}}$
with a general evaluation context
${\textit{l}}$
by storing it in a closure:
We can verify that this definition fulfills the swap law:

The same approach can also be used for defunctionalized evaluation contexts. Analogous to creating a fresh closure, we could create a special constructor to store an application of the other context. However, to avoid allocations and to enable a nested translation (Section 4.2), we integrate the restricted context into the constructors.
We define the extended accumulator datatype by creating a constructor
${\textsf{H}}$
for the
${\square{}}$
context and for each
${\mathsf{E}_\textit{i}}$
a constructor
${\mathsf{A}_\textit{i}}$
that carries the free variables of
${\mathsf{E}_\textit{i}}$
and the inner context
${\textit{k}'}$
. The compiler then generates an
${\mathsf{app}}$
function where we interpret
${\mathsf{A}_\textit{i}}$
by evaluating
${\mathsf{E}_\textit{i}}$
with the stored free variables:

Then we can define the swap operation as:
This definition again fulfills the swap law. Case
${\textsf{H}}$
:

Case
${\mathsf{A}_\textit{i}~\textit{x}_1~{\dots}~\textit{x}_\textit{m}~\textit{k}'~\textit{k}_{2}}$
:

5.4 Extending the expression evaluator
We can use this insight to derive a tail-recursive expression evaluator which supports multiplication as well, where we compose a defunctionalized evaluation context with a semiring context. First, we add a new constructor
(e1,e2) to our expression datatype which encodes the multiplication eval(e1) * eval(e2). We then create a datatype accum which stores the defunctionalized evaluation contexts when descending into the first expression e1. These constructors contain both the second expression e2 and the semiring context (a, m). When descending into e1, we store the current semiring context in the constructor and continue with the semiring context
${\mathsf{ctx}~\square{}~=~(0,~1)}$
:

This calculation directly follows the recipe for composing with defunctionalized evaluation contexts and can thus be derived algorithmically. Our full implementation becomes:

In contrast, a translation using just defunctionalized evaluation contexts would require two more constructors
(just as in the basic example). Unfortunately though, now that we store the semiring context in accum, our constructors carry a few more elements than the constructors of expr. In a language like Koka, which can reuse constructors of equal size (Xie et al., Reference Xie and Leijen2021; Lorenzen & Leijen, Reference Lorenzen and Leijen2022), it would be preferable to obtain constructors of the same size as expr, since we could then hope to avoid the allocation of
and
by reusing the memory of
and
. This is often possible when using non-composed defunctionalized evaluation contexts and Lorenzen et al. (Reference Lorenzen, Leijen and Swierstra2023) show that it is guaranteed to work if the original function has the shape of a map or fold (like eval). Alas, the same is not true for composed contexts, since we need to store the additional semiring context. However, using composed contexts like here can still avoid allocations in languages that lack reuse analysis.
Finally, we can reduce the number of elements stored in the constructors and obtain a more natural version of the evaluator by using the distributivity law to push the semiring context into the expression. For the
case, we calculate:

At this point, the recursive call to
${\mathsf{eval}~\textit{e}_{1}}$
is under a semiring context
${({\textit{a}},\textit{m})}$
and an evaluation context
${\square{}~+~\textit{m}~*~\mathsf{eval}~\textit{e}_{2}}$
. We thus have to store an extra m in our
constructor:

It turns out that in this version, the semiring context stored in the accumulated datatype is always going to be (0,1), so we can simplify the definition by omitting it. We thus obtain the implementation:

This version is slightly more efficient than the previous one, but the constructors are still too big for reuse analysis to apply. Furthermore, it is unclear whether we can derive this algorithmically as well. Instead, we will stick with the more general version that can be derived directly from the composition of contexts and extend it to derive an evaluator that can also handle subtraction and division.
To extend the expression evaluator to support division, we might try to add a new context for division and show how to compose it with the semiring context. However, this is not straightforward, since
${({\textit{a}}~+~\square{})^{-1}}$
cannot be simplified to
${{\textit{a}}'~+~\square{}^{-1}}$
for any other
${{\textit{a}}'}$
: the inverse of the sum depends on the
${\square{}}$
, which is not yet known, and there is no general rule for exchanging the inverse operation with addition. Instead, we need to use an idea from the theory of continued fractions.
5.5 Aside: Continued fractions
Continued fractions are a representation of the rational (or real) numbers that arises from the Euclidean algorithm. They consist of a sequence of nested additions and fractions with numerator 1. For example, we can calculate the continued fraction of 4.24 as:
\begin{align*}4.24~&=~4~+~\frac{24}{100}\\~&=~4~+~\cfrac{1}{\frac{100}{24}}~=~4~+~\cfrac{1}{4~+~\cfrac{4}{24}}~=~4~+~\cfrac{1}{4~+~\cfrac{1}{6}}~=~4~+~\cfrac{1}{4~+~\cfrac{1}{\cfrac{6}{1}}}~=~4~+~\cfrac{1}{4~+~\cfrac{1}{5~+~\cfrac{1}{1}}}\end{align*}
We can write such a (long-form) continued fraction (with the final ‘1’ left implicit) as
${[4,~4,~5]}$
. Then we can compute its floating point representation with a simple recursive algorithm:

This algorithm is not tail-recursive, and it might be quite difficult to make tail-recursive without further insight (and without resorting to general evaluation contexts). However, it is well known that continued fractions can be calculated by their convergents, which is a sequence
${\textit{h}_\textit{n},~\textit{k}_\textit{n}}$
with
${\mathsf{frac}([{\textit{a}}_0,~{\dots},~{\textit{a}}_\textit{n}])~=~\textit{h}_\textit{n}~/~\textit{k}_\textit{n}}$
. The convergents start with
${\textit{h}_{-2}~=~0}$
,
${\textit{h}_{-1}~=~1}$
,
${\textit{k}_{-2}~=~1}$
, and
${\textit{k}_{-1}~=~0}$
and are further calculated by:
This gives us a tail-recursive algorithm to calculate the continued fraction (where we write h1 for
${\textit{h}_{\textit{n}-1}}$
, h2 for
${\textit{h}_{\textit{n}-2}}$
and equivalent for k1 and k2):
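Both versions can be sketched in Python (using exact rationals to sidestep rounding; the names `frac` and `frac_tr` are ours, and the final implicit '1' is accounted for at the end):

```python
from fractions import Fraction

def frac(a):
    # Direct recursion (not tail-recursive); frac([]) = 1 is the implicit '1'.
    if not a:
        return Fraction(1)
    return a[0] + 1 / frac(a[1:])

def frac_tr(a):
    # Tail-recursive via convergents: h(-1)=1, h(-2)=0, k(-1)=0, k(-2)=1.
    h1, h2, k1, k2 = 1, 0, 0, 1
    for an in a:
        h1, h2 = an * h1 + h2, h1
        k1, k2 = an * k1 + k2, k1
    return Fraction(h1 + h2, k1 + k2)  # fold in the implicit final '1'
```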

It turns out that we can use the same idea to define a general context that applies to arbitrary sequences of addition, multiplication, and inverses.
5.6 Modulo fields context
Using the insight from continued fractions, we can define general field contexts that support not only addition and multiplication but also additive and multiplicative inverses. We define the context
${\textit{F}}$
as:
It turns out that we use the same convergent representation as for continued fractions, where we keep four numbers:
and define the fold operation as:

We can apply a field context to an expression by substituting the expression for
${\textit{x}}$
. Similarly, we can compose two field contexts by substituting the second context into the first context and simplifying the expression. Our context is defined as:

and the context laws hold.
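One plausible concrete realization of these four numbers is the homographic (Möbius) form `(p, q, r, s)` standing for `x -> (p*x + q) / (r*x + s)`, which is closed under composition of additions, multiplications, and inverses (a Python sketch under this assumption; the paper's exact representation may differ in detail):

```python
from fractions import Fraction

EMPTY = (1, 0, 0, 1)              # x itself

def add(a): return (1, a, 0, 1)   # a + x
def mul(m): return (m, 0, 0, 1)   # m * x
INV       = (0, 1, 1, 0)          # 1 / x

def compose(c1, c2):
    # app(compose(c1, c2), x) == app(c1, app(c2, x)): a 2x2 matrix product.
    p1, q1, r1, s1 = c1
    p2, q2, r2, s2 = c2
    return (p1*p2 + q1*r2, p1*q2 + q1*s2,
            r1*p2 + s1*r2, r1*q2 + s1*s2)

def app(c, x):
    p, q, r, s = c
    return Fraction(p * x + q, r * x + s)
```

For example, the continued fraction context `4 + 1/(4 + 1/(5 + x))` is `compose(add(4), compose(INV, compose(add(4), compose(INV, add(5)))))`, and applying it to `1` yields `106/25 = 4.24`.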
5.7 An advanced expression evaluator
Using the field contexts, we can extend our expression evaluator to support arbitrary field operations. Our implementation arises directly from the obvious expression evaluator which folds the expression into a rational number:

We can directly define the field context as a datatype, where we define empty(), add(a), mul(a), and inv() to correspond to the fold operations:

Then we use the TRMC algorithm with the composition of defunctionalized contexts and field contexts to obtain a tail-recursive version that uses a field context for the field operations and a defunctionalized context for the recursive calls that leave an expression to be evaluated:

The final derived program is actually quite sophisticated and fully tail-recursive. We believe that deriving this algorithm manually would be nontrivial. Moreover, it only allocates a small amount of memory while descending the left-spine. In contrast, a simple application of defunctionalized contexts without field contexts would require us to allocate a constructor even in the
cases, which would be less efficient.
6 Modulo constructor contexts
As shown in the introduction, the most interesting instantiation is of course the modulo cons transformation on constructor contexts. We can define a constant constructor context
${\mathsf{K}}$
as:
We define the
${(\star)}$
condition in the TRMC translation to restrict the context
${\mathsf{E}}$
to
${\mathsf{K}}$
contexts only. A possible way to define the contexts is to directly use
${\mathsf{K}}$
as a runtime context:

Similar to general evaluation contexts (Section 4.1), the context laws hold trivially for such definition (Appendix B.6 in the supplement) – and just as with general evaluation contexts, the
${\textit{map}}$
function translates to:

Even though this is a valid instantiation, it does not yet imply that this can be efficient. In particular, the composition
${\mathsf{K}_{1}[\mathsf{K}_{2}]}$
could allocate a fresh context every time, and it may be difficult to implement such substitution efficiently at runtime as it needs to copy
${\mathsf{K}_{1}}$
along the path to the hole. What we are looking for instead is an in-place updating instantiation that can compose in constant time.
6.1 Minamide
Minamide (Reference Minamide1998) presents a “hole calculus” that can directly express our contexts in a functional way but also allows an efficient in-place updating implementation. Using the hole calculus as our target calculus, we can instantiate the translation function using Minamide’s system. We define the context type as a “hole function” (
${{\hat{\lambda}} \textit{x}.~\textit{e}}$
), where
${\textit{ctx}~\alpha{}}$
${\equiv{}}$
${\textit{hfun}~\alpha{}~\alpha{}}$
, and instantiate the context operations to use the primitives as given by Minamide (Reference Minamide1998):

Satisfyingly, our primitives turn out to map directly to the hole calculus primitives. The reduction rules for
${\textit{happ}}$
and
${\textit{hcomp}}$
specialized to our calculus are (Minamide, Reference Minamide1998, fig. 5):

This means that for any context
${\textit{k}}$
, we have
${\textit{k}~\cong{}~{\hat{\lambda}} \textit{x}.~\mathsf{K}[\textit{x}]}$
(1). We can now show that our context laws are satisfied for this system, with composition:

and application:

The hole calculus is restricted by a linear-type discipline where the contexts
${\textit{ctx}~\alpha{}}$
${\equiv{}}$
${\textit{hfun}~\alpha{}~\alpha{}}$
have a linear type. This is what enables an efficient in-place update implementation while still having a pure functional interface. For our needs, we need to check separately that the translation ensures that all uses of a context
${\textit{k}}$
are indeed linear. Type judgments in Minamide’s system (Minamide, Reference Minamide1998, fig. 4) are denoted as
where
${\Gamma{}}$
is the normal type environment, and
${\textit{H}}$
is the environment for linear bindings, containing at most one linear value. The type environment
${\Gamma{}}$
can itself contain linear values with a linear type (like
${\textit{hfun}}$
) but can only pass those linearly to a single premise. The environment restricted to nonlinear values is denoted as
${\Gamma{}|_{\mathsf{N}}}$
. We can now show that our translation can indeed be typed under the linear type discipline:
Theorem 3. (TRMC uses contexts linearly)
If
and
${\textit{k}}$
fresh then
.
To show this, we need a variant of the general replacement lemma (Hindley & Seldin, Reference Hindley and Seldin1986, Lemma 11.18; Wright & Felleisen, Reference Wright and Felleisen1994, Lemma 4.2) to reason about linear substitution in an evaluation context:
Lemma 1. (Linear replacement)
If
for a constructor context
${\mathsf{K}}$
then there is a sub-deduction
at the hole and
.
Interestingly, this lemma requires constructor contexts, and we would not be able to derive the lemma for general contexts as the linear type environment is not propagated through applications. The proofs can be found in Appendix B.7 in the supplement, which also contains the full type rules adapted to our calculus.
6.2 In-place update
The instantiation with Minamide's system uses fast in-place updates and is proven sound, but it is still a bit unsatisfactory, as how such in-place mutation is done (or why it is safe) is only described informally. In Minamide's system, a suggested implementation for a context is as a tuple
${\langle\mkern1mu{}\mathsf{K},\textit{x}{@}\textit{i}\mkern2mu\rangle{}}$
where
${\mathsf{K}}$
is (a pointer to) a context and
${\textit{x}{@}\textit{i}}$
is the address of the hole as the
${\textit{i}{\textit{th}}}$
field of object
${\textit{x}}$
(in
${\mathsf{K}}$
). The empty tuple
${\langle\mkern1mu{}\mkern2mu\rangle{}}$
is used for an empty context (
${\square{}}$
). Composition and application directly update the hole pointed to by
${\textit{x}{@}\textit{i}}$
by overwriting the hole with the child context or value.
In contrast, Bour et al. (Reference Bour, Clément and Scherer2021) show a TRMC translation for OCaml that uses destination passing style which makes it more explicit how the in-place update of the hole works. In particular, the general construct
${\textit{x}.\textit{i}~:={}~\textit{v}}$
overwrites the
${\textit{i}{\textit{th}}}$
field of any object
${\textit{x}}$
with
${\textit{v}}$
. To gain more insight into why in-place update is possible and correct, we are going to use the explicit heap semantics of Perceus (Xie et al., Reference Xie and Leijen2021; Lorenzen & Leijen, Reference Lorenzen and Leijen2022). In this semantics, the heap is explicit and all objects are explicitly reference counted. Using the Perceus derivation rules, we can soundly translate our current calculus to the Perceus target calculus, where the reference counting instructions (
${\mathsf{dup}}$
and
${\mathsf{drop}}$
) are derived automatically by the derivation rules (Xie et al., Reference Xie and Leijen2021, fig. 5). The Perceus heap semantics reduces the derived expressions using reduction steps of the form
${\textit{H}{\mid}{}\textit{e}_{1}~\;~{\longmapsto}_\textit{r}~\;~\textit{H}'{\mid}{}\textit{e}_{2}}$
, which reduces a heap
${\textit{H}}$
and an expression
${\textit{e}_{1}}$
to a new heap
${\textit{H}'}$
and expression
${\textit{e}_{2}}$
(Xie et al., Reference Xie and Leijen2021, fig. 7). The heap
${\textit{H}}$
maps objects
${\textit{x}}$
with a reference count
${\textit{n}\geqslant{}1}$
to values, denoted as
${\textit{x} \mapsto^{\textit{n}} \textit{v}}$
. In this system, we can express in-place updates directly, and it turns out we can even calculate the in-place updating reduction rules for
${\mathsf{comp}}$
and
${\mathsf{app}}$
from the context laws. Before we do that, though, we first need to establish some terminology and look carefully at what “in-place update” actually means.
6.2.1 The essence of in-place update
Let’s consider a generic copy function,
${(\textit{x}.\textit{i}~\mathsf{as}~\textit{y})}$
, that changes the
${\textit{i}{\textit{th}}}$
field of an object
${\textit{x}}$
to
${\textit{y}}$
, for any generic constructor
${\textit{C}}$
:
When we apply the Perceus algorithm (Xie et al., Reference Xie and Leijen2021), we need to insert a single drop:
In the special case that
${\textit{x}}$
is unique at runtime (i.e., the reference count of
${\textit{x}}$
is 1), we can now derive the following:

And this is the essence of in-place mutation: when an object is unique, an in-place update corresponds to allocating a fresh copy, discarding the original (due to the uniqueness of
${\textit{x}}$
), and
${\alpha{}}$
-renaming to reuse the original “address”.
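To see this derivation concretely, here is a minimal Python sketch (hypothetical; not Koka or Perceus itself, and all names are illustrative) of a reference-counted heap cell. The functional update `set_field` normally allocates a fresh copy and drops the original, but when the original is unique, the copy, the drop, and the α-renaming fuse into a single in-place mutation:

```python
# A minimal model of why a functional field update on a *unique* object
# may be implemented as an in-place mutation (hypothetical sketch).

class Cell:
    """A heap object with an explicit reference count."""
    def __init__(self, fields):
        self.rc = 1
        self.fields = list(fields)

def dup(x):
    x.rc += 1
    return x

def drop(x):
    x.rc -= 1

def set_field(x, i, y):
    """Functional update (x.i as y): copy x, then drop the original."""
    if x.rc == 1:
        # x is unique: dropping it would free it, so we can reuse its
        # memory and overwrite the field in place (x.i := y).
        x.fields[i] = y
        return x
    fresh = Cell(x.fields)
    fresh.fields[i] = y
    drop(x)
    return fresh

# A unique object is updated in place (same identity):
a = Cell([1, 2])
b = set_field(a, 0, 42)
assert b is a and b.fields == [42, 2]

# A shared object gets a fresh copy instead, leaving the original intact:
c = Cell([1, 2])
dup(c)                       # c now has reference count 2
d = set_field(c, 0, 42)
assert d is not c and d.fields == [42, 2] and c.fields == [1, 2]
```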
We will write
${(\textit{x}.\textit{i}~:={}~\textit{z})}$
for
${(\textit{x}.\textit{i}~\mathsf{as}~\textit{z})}$
in the special case of updating a field in a unique constructor, where we can derive the following reduction rule:
and in the case the field is a
${{{\square}}}$
, we can further refine this to:
For convenience, we will from now on use the notation
${\textit{C}~{\dots}~\textit{x}_\textit{i}~{\dots}}$
, and
${\textit{C}~{\dots}~{{\square}}_\textit{i}~{\dots}}$
to denote the
${\textit{i}{\textit{th}}}$
field in a constructor if there is no ambiguity.
6.2.2 Linear chains
We need a bit more generality to express hole updates in contexts. In particular, we will see that all objects along the path from the top of the context to the hole are unique by construction. We call such a unique path a linear chain, denoted as
${[\textit{H}]^\textit{n}_\textit{x}}$
:
where for all
${\textit{x}_\textit{i}\in (\mathsf{dom}(\textit{H})~-~\{\textit{x}\})}$
, we have
${\textit{x}_\textit{i}\in \mathsf{fv}(\textit{v}_{\textit{i}-1})}$
(and therefore for all
${\textit{y}\in \mathsf{dom}(\textit{H})}$
we have
${\mathsf{reachable}(\textit{H},\textit{x})}$
). Since the objects in
${\textit{H}}$
besides
${\textit{x}}$
are all unique and not reachable otherwise, we also say that
${\textit{x}}$
dominates
${\textit{H}}$
. When the dominator is also unique, we call it a unique linear chain (of the form
${[\textit{H}]^1_\textit{x}}$
). We can define linear chains inductively as well since a single object always forms a linear chain:
and we can always extend with a unique linear chain:
Using
${(\textit{linearcons}}$
) we can derive that we can append a unique linear chain as well:
6.2.3 Contexts as a linear chain
To simplify the proofs, we assume in this subsection that all fields in
${\mathsf{K}}$
contexts are variables:
since we can always arrange any
${\mathsf{K}}$
to have this form by let-binding the values
${\textit{v}}$
. It turns out that a constructor context then always evaluates to a unique linear chain:
Lemma 2. (Contexts evaluate to unique linear chains)
For any
${\mathsf{K}}$
, we have
${\textit{H}{\mid}{}\mathsf{K}[\textit{C}~{\dots}~\square{}_\textit{i}~{\dots}]~~{{\longrightarrow_{\textsf{r}}^*}}~~\textit{H},~[\textit{H}',\textit{y} \mapsto^{1} \textit{C}~{\dots}~{{\square}}_\textit{i}~{\dots}]^1_\textit{x}{\mid}{}\textit{x}}$
.
We can show this by induction on the shape of
${\mathsf{K}}$
(Appendix B.8 in the supplement).
6.2.4 Calculating the fold
Following Minamide’s approach, we are going to denote our contexts as a tuple
${\langle\mkern1mu{}\textit{x},\textit{y}{@}\textit{i}\mkern2mu\rangle{}}$
where
${\textit{x}}$
is (a pointer to) a constructor context and
${\textit{y}{@}\textit{i}}$
is the address of the hole as the
${\textit{i}{\textit{th}}}$
field of object
${\textit{y}}$
. We define
${\mathsf{ctx}~\mathsf{K}~=~(|\mathsf{K}|)}$
. For an empty context, we use an empty tuple (
${(|\square{}|)~=~\langle\mkern1mu{}\mkern2mu\rangle{}}$
), but otherwise we can specify the fold as:
where we use the notation
${{}[\textit{x}]}$
to denote the last object of the linear chain formed by
${\mathsf{K}}$
(Lemma 2). We can now calculate the definition of
${(|\_|)}$
from its specification (see Appendix B.9 in the supplement), where we obtain the following definition for
${(|\_|)}$
:

This builds up the context using
${\mathsf{let}}$
bindings, while propagating the address of the hole. As before, the intention is that the compiler expands the fold statically. For example, the
${\textit{map}}$
function translates to:

where
${\textit{z}{@}2}$
correctly denotes the address of the hole field in the context.
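As a concrete illustration, the following Python sketch (with hypothetical names; the Koka compiler instead expands the fold statically into let bindings) builds the Minamide tuple ⟨x, y@i⟩ for a list context like Cons(v1, Cons(v2, □)), allocating the cells top-down while propagating the address of the hole:

```python
# Hypothetical sketch of the fold (|K|) for list-shaped contexts:
# returns the tuple (root, holder, field) where holder.field is the hole.

class Cons:
    def __init__(self, head, tail):
        self.head, self.tail = head, tail

def fold_ctx(values):
    """Build the context Cons(v1, Cons(v2, ... hole))."""
    assert values, "the empty context is represented by the empty tuple"
    root = last = Cons(values[0], None)   # None stands for the hole
    for v in values[1:]:
        cell = Cons(v, None)
        last.tail = cell                  # let-bind the next cell
        last = cell
    return (root, last, 'tail')           # the hole is the tail of `last`

# Plugging a value into the hole at the recorded address completes the list:
root, holder, field = fold_ctx([1, 2])
setattr(holder, field, Cons(3, None))
assert (root.head, root.tail.head, root.tail.tail.head) == (1, 2, 3)
```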
6.2.5 Updating a context
Before we can define in-place application, we need an in-place substitution operation
${\mathsf{subst}~\langle\mkern1mu{}\textit{x},\textit{y}{@}\textit{i}\mkern2mu\rangle{}~\textit{z}}$
that substitutes
${\textit{z}}$
at the hole (at
${\textit{y}{@}\textit{i}}$
) in the context
${\textit{x}}$
. Note that in our representation of a context as a tuple
${\langle\mkern1mu{}\textit{x},\textit{y}{@}\textit{i}\mkern2mu\rangle{}}$
we treat
${\textit{y}{@}\textit{i}}$
purely as an address and do not reference count
${\textit{y}}$
as such. The
${\textit{y}}$
part is a “weak” pointer, and we cannot use it directly without also having a “real” reference. This means that if we want to define an in-place substitution, we cannot define it directly as
${\textit{y}.\textit{i}~:={}~\textit{z}}$
(since we have no real reference to
${\textit{y}}$
). Instead, we are going to calculate an in-place updating substitution from its specification:
We do this by induction on the shape of the linear chain. For the singleton case, we have:

and for the extension we have:

This leads to the following inductive definition of
${\mathsf{subst}}$
:

That is, to update the last element of the chain in-place, we need to traverse down while separating the links, such that when we reach the final element it has a unique reference count and can be updated in-place. We then traverse back up, fixing up all the links again. Of course, we would not actually use this implementation in practice – the derivation here just shows that the substitution specification is sound, and we can thus implement the
${(\textit{subspec})}$
reduction by instead using the tuple address
${\textit{y}{@}\textit{i}}$
directly to update the hole in-place. In essence, due to the uniqueness of the elements in the chain, the
${\textit{y}}$
is uniquely reachable through
${\textit{x}}$
, and thus it is safe to use it directly in this case.
6.2.6 Calculating application and composition
With the specification for fold and in-place substitution, we can use the context laws to calculate the in-place updating version of application and composition. Starting with application, we can calculate (for
${\mathsf{K}~{\neq}~\square{}}$
):

And thus we define application directly in terms of in-place substitution as:
We arrived exactly at the “obvious” implementation where the hole inside a unique context is updated in-place in constant time. This also corresponds to the informal implementation given in Section 2.3. For composition, it turns out we can define it in terms of applications:
where the derivation is in Appendix B.10 in the supplement. Again we arrived at the efficient translation where the hole in the first unique context is updated in-place (and in constant time) with a pointer to the second context. The full rules for application and composition are (with the derivations for the empty contexts in Appendix B.10 in the supplement):

Note that
${(\textit{ucompr})}$
is not really needed since by construction our translation never generates empty contexts for the second argument. The rules also correspond to the informal implementation given in Section 2.3 where
was used to represent the empty tuple.
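The application and composition rules can be played through in a short Python sketch (hypothetical names; this models only the unique case where the in-place rules apply): composition overwrites the hole with the child context in constant time, application plugs in the final value, and the TRMC-translated map becomes a plain loop over the list:

```python
# A hypothetical model of in-place context application and composition,
# with contexts as Minamide tuples (root, holder, field).

class Cons:
    def __init__(self, head, tail):
        self.head, self.tail = head, tail

EMPTY = (None, None, None)              # the empty context

def comp(ctx, node):
    """Composition: overwrite the hole with the child context, in place."""
    root, holder, field = ctx
    if root is None:                    # composing with the empty context
        return (node, node, 'tail')
    setattr(holder, field, node)        # constant-time hole update
    return (root, node, 'tail')         # the hole moves into `node`

def app(ctx, value):
    """Application: plug the final value into the hole."""
    root, holder, field = ctx
    if root is None:
        return value
    setattr(holder, field, value)
    return root

def map_trmc(f, xs):
    """The TRMC-translated map: the tail call becomes a loop."""
    acc = EMPTY
    while xs is not None:
        acc = comp(acc, Cons(f(xs.head), None))  # acc ++ ctx Cons(f(x), hole)
        xs = xs.tail
    return app(acc, None)               # plug Nil (None) into the hole

def from_list(l):
    out = None
    for x in reversed(l):
        out = Cons(x, out)
    return out

def to_list(xs):
    out = []
    while xs is not None:
        out.append(xs.head)
        xs = xs.tail
    return out

assert to_list(map_trmc(lambda x: x + 1, from_list([1, 2, 3]))) == [2, 3, 4]
assert map_trmc(lambda x: x, None) is None
```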
With these definitions, we still need to show that we can be efficient and that we never get stuck. For efficiency, we need to show that a context
${\langle\mkern1mu{}\textit{x},\textit{y}@{\textit{i}}\mkern2mu\rangle{}}$
is always a linear chain so we don’t have to check that at runtime in
${(\textit{subspec})}$
. This follows by construction since any initial context
${\mathsf{ctx}~\mathsf{K}}$
is a linear chain (Lemma 2), as is any composition
${(\textit{ucomp})}$
. Second, the reference count of the dominator should always be 1 or otherwise
${(\textit{subspec})}$
may not apply – that is, contexts should be used linearly. This follows indirectly from Lemma 4 where we show that our translation adheres to Minamide’s linear-type discipline. A more direct approach would be to show that Perceus never derives a
${\mathsf{dup}}$
operation for a context
${\textit{k}}$
in our translation. However, we refrain from doing so here, as it turns out that with general algebraic effect handlers, the linearity of a context may no longer be guaranteed!
7 Modulo first-class constructor contexts
In Koka, constructor contexts are first-class values in the language (Lorenzen et al., Reference Lorenzen, Leijen, Swierstra and Lindley2024). A constructor context can be used more than once, as for example in the expression
, where the context c is shared and it evaluates correctly to
. This abstraction can safely encapsulate the limited form of mutation necessary to implement a Minamide tuple, while still having a purely functional interface that does not rely on linear types.
As we will show in Section 7.2, a
${\mathsf{K}}$
expression is compiled such that each constructor context has at runtime a representation of its linear chain (the context path). The Koka compiler compiles a context like
internally into a Minamide tuple:
where each constructor along the context path is annotated with a child index (1 and 3) leading from the root down to the hole.
When we compose or apply a context, we have to determine whether the context is shared. If the contexts happen to be used linearly, then all operations execute in constant time, just as in Minamide’s approach. But if the context is shared, we will have to copy it along the context path. This gives us a fully functional semantics, and any subsequent substitutions on the same context work correctly (but will take linear time in the length of the context path).
The ability to copy contexts if necessary is useful for programmers and we discuss some examples in Section 8. But perhaps surprisingly, it is also important for the TRMC transformation itself as we show next.
7.1 Nonlinear control
A long-standing issue with the TRMc transformation is that it is unsound in the presence of non-local control operations like
${\textit{call}}$
/
${\textit{cc}}$
,
${\textit{shift}}$
/
${\textit{reset}}$
(Danvy & Filinski, Reference Danvy and Filinski1990; Sitaram & Felleisen, Reference Sitaram and Felleisen1990; Shan, Reference Shan2007), or in general with algebraic effect handlers (Plotkin & Power, Reference Plotkin and Power2003; Plotkin & Pretnar, Reference Plotkin and Pretnar2009), whenever a continuation or handler resumption can be invoked more than once. Note that if only single-shot continuations or resumptions are allowed (as in OCaml (Dolan et al., Reference Dolan, White, Sivaramakrishnan, Yallop and Madhavapeddy2015), for example), the control flow is still always linear and the TRMc transformation is still sound. Since the Koka language relies foundationally on general effect handlers (Leijen, Reference Leijen2017, Reference Leijen2021; Xie & Leijen, Reference Xie and Leijen2021), we need to tackle this problem. Algebraic effect handlers extend the syntax with a handle expression,
${\mathsf{handle}~\textit{h}~\textit{e}}$
, and operations,
${\mathsf{op}}$
, that are handled by a handler
${\textit{h}}$
. There are two more reduction rules (Leijen, Reference Leijen2014):

That is, when an operation is invoked it yields all the way up to the innermost handler for that operation and continues from there with the operation clause. Besides the operation argument, it also receives a resumption
${\textit{resume}}$
that allows the operation to return to the original call site with a result
${\textit{y}}$
. The culprit here is that the resumption captures the delimited evaluation context
${\mathsf{E}}$
in a lambda expression, and this can violate linearity assumptions. In particular, if we regard a TRMC context
${\textit{k}}$
as a linear value (as in Minamide), then such
${\textit{k}}$
may be in the context
${\mathsf{E}}$
of the
${(\textit{handle})}$
rule and captured in a nonlinear lambda. Whenever the operation clause calls the resumption more than once, any captured linear values may be used more than once!
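This failure mode is easy to reproduce in a few lines of Python (a deliberately simplified model, not Koka's actual effect handlers; all names are hypothetical): a "resumption" that closes over a context tuple and fills its hole in place returns aliased results when invoked twice, silently corrupting the first result:

```python
# A hypothetical model of a captured resumption that closes over a
# supposedly linear context <root, holder> and fills the hole in place.

class Cons:
    def __init__(self, head, tail):
        self.head, self.tail = head, tail

root = Cons(1, None)             # the context Cons(1, hole)
ctx = (root, root)               # Minamide-style tuple: hole is root.tail

def resume(v):
    """The captured continuation: apply the context to v, in place."""
    r, holder = ctx
    holder.tail = v              # in-place hole update -- assumes linearity!
    return r

first = resume(Cons(2, None))    # ok: the list [1, 2]
second = resume(Cons(3, None))   # resumed a second time...
assert second is first           # both results alias the same cells,
assert first.tail.head == 3      # and the first result is now corrupted
```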
A nice practical example of this occurs in the well-known Knapsack problem as described by Wu et al. (Reference Wu, Schrijvers and Hinze2014), where they use multiple resumptions to implement a non-determinism handler:

An effectful function type in Koka has three arguments, where the type
denotes a function from type
with a potential (side) effect
. The select function picks an element from a list using the operations of the
effect. The knapsack function picks items from a list of item weights vs that together do not exceed the capacity w (of the knapsack). Since it calls select it has the
effect and additionally it has a divergence effect. We can now provide an effect handler that systematically explores all solutions using multiple resumptions:

That is, the solutions handler implements the flip function by resuming twice and appending the results. Even though knapsack returns a single solution as a list, the test function returns a list of all possible solution lists
. The knapsack function is in the modulo cons fragment and gets translated to a tail-recursive version by our translation into:

Instead of having a runtime that captures evaluation contexts
${\mathsf{E}}$
directly, Koka usually uses an explicit monadic transformation to translate effectful computations into pure lambda calculus. The effect handling is then implemented explicitly using a generic multi-prompt control monad
(Xie & Leijen, Reference Xie and Leijen2020, Reference Xie and Leijen2021). This transforms our knapsack function into:

Every computation in the effect monad either returns with a result
or is yielding up to a handler
. As described by Xie & Leijen (Reference Xie and Leijen2021), the Koka compiler backend implements this monad as a primitive and can generate efficient C code without needing to allocate closures in the fast non-yielding path.
In our example, we inlined the monadic bind operation where the result select(vs) is explicitly matched. We see that in the
case, the continuation expression (namely
) is now explicitly captured under a lambda expression – including the supposedly linear context k! This is how we can end up at runtime with a context that is shared (with a reference count > 1) and where the rule
${(\textit{ucomp})}$
should not be applied.
7.2 Dynamic copying via reference counting
Our context composition is defined in terms of context application, which in turn relies on the in-place substitution (Section 6.2.5):
This is the operation that eventually fails if the runtime context
${\textit{x}}$
is not unique. In Section 6.2.5, the substitution operation was calculated to recursively visit the full linear chain of the context. This suggests a solution for any non-unique context: we can actually traverse the context at runtime and create a fresh copy instead.
It is not immediately clear, though, how to implement such an operation at runtime: the linear chains up to now are just a proof technique, and we cannot actually visit the elements of the chain at runtime as we do not know which field in a chain element points to the next element. What we need to do is to explicitly annotate each constructor
${\textit{C}^\textit{k}}$
(of arity
${\textit{k}}$
) in a context also with an index
${\textit{i}}$
corresponding to the field that points to the next element, as
${\textit{C}^\textit{k}_\textit{i}}$
. It turns out we can do this efficiently while constructing the context – and we can do it systematically just by modifying our fold function to keep track of this context path at construction:

With such indices present at runtime, we can define non-unique substitution as:
where
${\mathsf{append}}$
follows the context path at runtime copying each element as we go and eventually appending
${\textit{z}}$
at the hole:
We can show the context laws still hold for these definitions (see Appendix B.11 in the supplement). The append operation in particular can be implemented efficiently at runtime using a fast loop that updates the previous element at each iteration (essentially using manual TRMC!). In the Koka runtime system, it happens to be the case that there is already an 8-bit field index in the header of each object that is used for stackless freeing. We can thus use that field for context paths, since if a context is freed it is fine to discard the context path anyway. The runtime cost of the hybrid technique is mostly due to an extra uniqueness check needed when doing context composition to see if we can safely substitute in-place (see Section 7.3). As we see in the benchmark section, this turns out to be quite fast in practice. Moreover, the Koka compiler uses static type information when possible to avoid this check if a function is guaranteed to be used only with a linear effect type.
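The copying append can be sketched in Python (a hypothetical model; Koka stores the path index in the object header rather than in an ordinary field): each node records which of its fields continues the context path, and append copies exactly the nodes on that path, writing the new value at the hole:

```python
# Each constructor C^k_i is modeled as a node that remembers which of its
# fields (cpath) points to the next element on the context path; None
# marks the hole at the end of the path (hypothetical sketch).

class Node:
    def __init__(self, fields, cpath):
        self.fields = list(fields)
        self.cpath = cpath

def append(x, z):
    """Copy x along its context path and place z at the hole."""
    if x is None:                 # reached the hole
        return z
    fresh = Node(x.fields, x.cpath)
    fresh.fields[x.cpath] = append(x.fields[x.cpath], z)
    return fresh

# A shared context Cons(1, Cons(2, hole)) applied twice stays correct,
# because each application copies the two nodes on the path:
ctx = Node([1, Node([2, None], 1)], 1)
a = append(ctx, Node([3, None], 1))
b = append(ctx, Node([4, None], 1))
assert a.fields[1].fields[1].fields[0] == 3
assert b.fields[1].fields[1].fields[0] == 4   # first result unaffected
```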
7.3 Efficient code generation
As an example of the code generation of our TRMC scheme, we consider the map function from our benchmarks in Section 9. The map function is specialized by the compiler for the increment function, and after the TRMC transformation we have:

Here the empty context
is the Minamide tuple as a value type containing the final result and hole address, something like
(invalid, null). For efficiency, we represent the empty tuple with a null address for the hole. The single-cell context
is represented by the Minamide tuple
. Eventually, the
value type is passed in registers (x19 and x21), and the generated code for arm64 becomes:

Note in particular how the header for the
node in the context is set as
where, from left-to-right, we initialize the tag
, the context path field (
), and the total number of fields (also
). As such, maintaining context paths comes for free since it is done as part of header initialization. Also we see the reuse of Perceus reference counting (Xie et al., Reference Xie and Leijen2021; Lorenzen & Leijen, Reference Lorenzen and Leijen2022) in action, where the
node that is matched (in x0) is reused for the context
node (also in x0). Since the effect inferred for the specialized map function is total, the check for uniqueness of the context is removed, as the context is guaranteed to be used linearly.
7.4 Dynamic copying without reference counting
Lorenzen et al. (Reference Lorenzen, Leijen, Swierstra and Lindley2024) show that it is possible to support first-class constructor contexts even in languages without precise reference counts. Their proposed implementation (also suggested by Gabriel Scherer) uses a special distinguished value for a runtime hole
${{{\square}}}$
that is never used by any other object. A substitution now first checks the value at the hole: if it is a
${{{\square}}}$
value, the hole is substituted for the first time and we just overwrite the hole in-place (in constant time). However, any subsequent substitution on the same context will find some object instead of
${{{\square}}}$
. At this point, we first dynamically copy the context path (in linear time) and then update the copy in-place.
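A Python sketch of this scheme (hypothetical, loosely mirroring the C implementation of Figure 3): the hole field initially contains the distinguished HOLE value; the first application overwrites it in place in constant time, and any later application first copies the nodes along the context path:

```python
HOLE = object()                   # the distinguished runtime hole value

class Node:
    def __init__(self, fields, cpath):
        self.fields = list(fields)
        self.cpath = cpath        # index of the field on the context path

def copy_path(x, holder):
    """Copy the nodes from the root down to the old hole; everything not
    on the context path stays shared. Returns the new root and hole node."""
    fresh = Node(x.fields, x.cpath)
    if x is holder:
        fresh.fields[x.cpath] = HOLE          # the copy gets a fresh hole
        return fresh, fresh
    child, last = copy_path(x.fields[x.cpath], holder)
    fresh.fields[x.cpath] = child
    return fresh, last

def apply_ctx(ctx, z):
    """First fill is in place; later fills copy the context path first."""
    root, holder = ctx
    if holder.fields[holder.cpath] is not HOLE:   # hole already overwritten?
        root, holder = copy_path(root, holder)
    holder.fields[holder.cpath] = z
    return root

# A shared context Cons(1, Cons(2, HOLE)) applied twice:
hole_node = Node([2, HOLE], 1)
ctx = (Node([1, hole_node], 1), hole_node)
a = apply_ctx(ctx, 'x')           # in place, constant time
b = apply_ctx(ctx, 'y')           # hole already filled: copies the path
assert a.fields[1].fields[1] == 'x'
assert b.fields[1].fields[1] == 'y'
assert a is not b
```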

The illustration above (due to Lorenzen et al., Reference Lorenzen, Leijen, Swierstra and Lindley2024) shows a more complex example of a shared tree context that is applied to two separate nodes. The runtime context path is denoted here by bold edges. The intermediate state is interesting: it is a valid tree, but part of the tree is also shared with the remaining context, where the hole now points to a regular node. When that context is applied, only the context path (nodes 5 and 2) is copied, while all other nodes stay shared (in this case, only node 1).
However, it turns out that this simple approach is not sound without further restrictions. For general first-class contexts, the second context can be arbitrary (instead of always a constant
in the TRMC case), and so the context composition operation c1 ++ c2 needs an extra check in order to avoid creating cycles: we check whether c2 has an already overwritten hole, or whether the hole in c2 is at the same address as in c1. In either case, c2 is copied along the context path.
Figure 3 shows a partial C implementation of constructor contexts in a runtime for languages without precise reference counting. We assume that HOLE is the distinguished value for unfilled holes (
${{{\square}}}$
). When we compose two contexts, we need to ensure we can handle shared contexts as well where we copy a context along the context path if needed (using ctx_copy). In the application and composition functions, the check
sees if the hole in c1 is already overwritten (where *c1.hole != HOLE). In that case, we copy c1 along the context path as shown in Section 7.2 to maintain referential transparency.

Fig. 3. Implementing constructor composition and application in the runtime system (for languages without precise reference counts).
However, in the composition operation we also need to do a similar check for c2 as well in order to avoid cycles: the second check
checks if c2 has an already overwritten hole, but also if the hole in c2 is the same as in c1. In either case, c2 is copied along the context path. Effectively, both checks ensure that the new context that is returned always ends with a single fresh HOLE. Let’s consider some examples of shared contexts. A basic example is a simple shared context, as in:
which evaluates to
. Here, during the second application, check
ensures the shared context c is copied such that the list
stays unaffected.
A more tricky example is composing a context with itself:
which evaluates to
. The check
here copies the appended c (since c1.hole == c2.hole). In this example, the potential for a cycle is immediate, but generally it can be obscured with a shared context inside another. Consider:

which evaluates to
. The check
again copies the appended c2 in c ++ c2 (since *c2.hole != HOLE).
Note that the (B) check in composition is sufficient to avoid cycles. In order to create a cycle in the context path, either c1 must be in the context path of c2 (I), or c2 must be in the context path of c1 (II). For case (I), if c1 is at the end of c2, then their holes are at the same address where c1.hole == c2.hole. Otherwise, if c1 is not at the end, then *c1.hole != HOLE and we have copied c1 already due to check (A). For case (II) the argument is similar: if c2 is at the end of c1 we again have c1.hole == c2.hole, and otherwise *c2.hole != HOLE.
The implementation using precise reference counting is not very different from the one without reference counting. The main difference is in the checks (A) and (B), which become:

This is the implementation that is used in the Koka runtime system. The (B) check here is required to maintain the invariant that context paths always form unique chains (Section 7.2). From this property, it follows directly that no cycles can occur in the context path.
7.5 Runtime behavior
Interestingly, the two implementations, with or without precise reference counting, do differ in their runtime performance characteristics, which are dual to each other in terms of space and time.
7.5.1 Time
The implementation without reference counting only copies on demand when the hole is already filled, whereas our earlier implementation with reference counts copies whenever the context is found to be not unique upon filling the hole. The latter can be a problem if the context is later discarded without being used. Consider the knapsack program, which in its last iteration may call itself on a one-element list [x] with
. For this special case, the code reduces to:
This computation is run twice, where the first run successfully returns k ++.
but the second run fails. The reference counting-based implementation has to copy k in the first run, since its reference count is not one (due to k being captured for the second run). In contrast, assuming that the hole in k is not yet filled, the new implementation can simply fill the hole of k with
in the first run without copying. Since k is discarded in the second run, no copying is needed at all. We will come back to this point in Section 9, where we see that the reference counting implementation in Koka does not perform well in a backtracking search, presumably due to this issue.
7.5.2 Space
The implementation without precise reference counts can use more space though than the one based on reference counting. This can occur when a context accidentally holds on to values that have been written into its hole. Consider an earlier state of the knapsack program, where it may process a list
. Then we can simplify the code to:
Following the flip(), we first try to use v as our element. But since v > w, this computation fails and we backtrack. However, our new algorithm may have written
into the hole of k. This value is now garbage, but this may not be obvious to a garbage collector or reference counting scheme, since k is still live. Only when backtracking to the second run do we copy k and discard the old value.
In contrast, the implementation based on reference counting would have copied (and discarded) k in the first run already. Unlike the new implementation, it is garbage-free (Xie et al., Reference Xie and Leijen2021) and guarantees that no space is used for values that are no longer needed. For this reason, we prefer the implementation via reference counting in Koka, using the other implementation for GC-based languages.
8 Programming with first-class constructor contexts
First-class constructor contexts turn out to be a powerful feature, and they allow us to write many programs by hand that would be hard to generate automatically from a general TRMC transformation. In this section, we explore some of these programs, all of which can be written in Koka.
8.1 Modulo cons products
The partition function calls a predicate on each element of a list and appends it to one of two piles depending on the result:

The recursive call to partition is followed by a pattern match on the resulting tuple, an if-statement, and finally the constructor application. This does not fit the TRMc transformation directly, but it also might not seem too different – and indeed this function was suggested as a fruitful target for an expanded TRMC translation both by Bour et al. (Reference Bour, Clément and Scherer2021) and the conference version of this paper.
However, in order to make this function tail-recursive, the p(x) call would have to be moved before the recursive call. That can be done by a compiler if p is pure, but what if p may perform side effects? Thus, even an extended TRMc transformation could only apply if the user first rewrote their code to:

The conference version of this paper describes a transformation that recognizes that the pattern match on the returned tuple is mirrored in the creation of a new tuple and looks for constructor contexts inside the created tuple.
However, it may not be worth implementing such a specific transformation, as we can easily rewrite the function manually using two explicit first-class constructor contexts for yes and no:

The resulting code is not just more efficient, but arguably even clearer than the original version – and certainly clearer than the version with an explicit ok variable. For this reason, we now recommend that programmers use first-class constructor contexts directly for examples like this.
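To illustrate this style outside Koka, here is a hypothetical Python version of partition that threads two single-hole contexts (each a (root, holder) pair whose holder.tail is the hole), extending whichever one the predicate selects in constant time and plugging the empty list into both holes at the end:

```python
# Hypothetical sketch: partition with two accumulating constructor
# contexts over a linked list.

class Cons:
    def __init__(self, head, tail):
        self.head, self.tail = head, tail

def ctx_empty():
    return (None, None)

def ctx_extend(ctx, value):
    """ctx ++ ctx Cons(value, hole): constant-time in-place extension."""
    root, holder = ctx
    cell = Cons(value, None)
    if root is None:
        return (cell, cell)
    holder.tail = cell
    return (root, cell)

def ctx_apply(ctx, tail):
    root, holder = ctx
    if root is None:
        return tail
    holder.tail = tail
    return root

def partition(pred, xs):
    yes, no = ctx_empty(), ctx_empty()
    while xs is not None:                  # tail recursion as a loop
        if pred(xs.head):
            yes = ctx_extend(yes, xs.head)
        else:
            no = ctx_extend(no, xs.head)
        xs = xs.tail
    return ctx_apply(yes, None), ctx_apply(no, None)

def from_list(l):
    out = None
    for x in reversed(l):
        out = Cons(x, out)
    return out

def to_list(xs):
    out = []
    while xs is not None:
        out.append(xs.head)
        xs = xs.tail
    return out

evens, odds = partition(lambda x: x % 2 == 0, from_list([1, 2, 3, 4, 5]))
assert to_list(evens) == [2, 4] and to_list(odds) == [1, 3, 5]
```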
8.2 Difference lists
Another example of future work described by Bour et al. (Reference Bour, Clément and Scherer2021) is the flatten function. This function calls itself recursively and passes the result to the append function on lists:

While append is tail-recursive modulo cons, flatten is not. However, append is just a sequence of constructor applications ending in the second argument, and we can easily rewrite it using a first-class constructor context returned from append (i.e., a difference list):

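The difference-list idea itself can be sketched with closures in Python (a hypothetical model using Python lists in place of Koka's constructor contexts): each inner list becomes a function from a tail to the appended result, and flatten composes these functions before applying the composition to the empty list:

```python
# Hypothetical sketch: a difference list is a function `tail -> xs ++ tail`,
# so appending is function composition and never re-copies earlier parts.

def dlist(xs):
    return lambda tail: xs + tail          # append xs onto any tail

def flatten(xss):
    acc = lambda tail: tail                # the empty difference list
    for xs in xss:
        # compose: first append everything so far, then this inner list
        acc = (lambda f, g: lambda tail: f(g(tail)))(acc, dlist(xs))
    return acc([])                         # plug in the empty list

assert flatten([[1, 2], [3], [], [4, 5]]) == [1, 2, 3, 4, 5]
```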
8.3 Composing constructor contexts
Another example which illustrates the usefulness of first-class contexts that can be stored in data structures is the composition of constructor contexts with defunctionalized evaluation contexts. While constructor contexts naturally apply to a map over a list, they do not apply directly to a map over trees:

Here, the first recursive call to tmap is not in a constructor context and thus the TRMc transformation alone is not enough to make this tail-recursive. However, instead of resorting to full defunctionalized evaluation contexts, we can use them only for descending into the left child and keep using constructor contexts to descend into the right branch:

This function immediately follows from the technique described in Section 5.3. It extends the acc accumulator whenever it goes into the left subtree and extends the top accumulator whenever it goes into the right subtree. While a version using only defunctionalized evaluation contexts corresponds to pointer reversal (Schorr & Waite, Reference Schorr and Waite1967), this version reverses only the pointers going to the right child, but leaves the pointers to the left child intact.
8.4 Polymorphic recursion
In this paper, we have limited ourselves to recursive functions where each recursive call has the same return type. However, there are some functions where the recursive call might have a different return type due to polymorphic recursion. For example, Okasaki (Reference Okasaki1999) presents the following random access list:

Here the recursive call instantiates the recursive type at a different instance, so the hole in the context has a different type than the final result. It turns out that for polymorphically recursive code, performing the translation can lead to code that is not typeable in System F. This issue is well known for defunctionalized evaluation contexts, where GADTs are required to regain typability (Pottier & Gauthier, Reference Pottier and Gauthier2004). Analogously, we give two type parameters to first-class constructor contexts, where a corresponds to the type of the root and b to the type of the hole. Our primitive operations have the general types:

It turns out that this encapsulates the necessary type information to type the result of the translation for polymorphic recursion. Even though Koka has an intermediate core representation based on System F, the application and composition functions are primitives and Koka transforms the above function without problems. Our cons function is translated to:

8.5 Multiple holes
A limitation of our current approach is that we do not support multiple holes in a constructor context. For example, consider the following function which builds a perfectly balanced binary tree of height n:

This function cannot directly be made tail-recursive with our current approach. The issue is that the value t occurs twice in the constructor. This means that we would need our context to have two holes instead of one. But the compact representation of our context paths makes it impossible to fork the path, which means that we can only support copying for single-hole contexts. When constructor contexts are only used linearly and never copied, it is possible to support multiple holes directly (Bagrel, Reference Bagrel2024).
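For concreteness, the problematic function can be written as the following Python sketch (our encoding: None for Leaf, a tuple (left, value, right) for Node). The recursive result t is used twice, which is exactly why a single-hole context cannot express the pending work.

```python
def make(n):
    """Build a perfectly balanced binary tree of height n."""
    if n == 0:
        return None
    t = make(n - 1)        # not a tail call: its result must fill two holes
    return (t, n, t)       # the shared t keeps memory linear in n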
8.6 TRMC as a source-to-source transformation
Since the introduction of first-class constructor contexts to the language, the TRMc transformation has become source-to-source. This means that there is no longer a strong reason for compiler writers to support TRMc directly, as long as first-class contexts are supported, since these allow programmers to recreate the effects of the TRMc transformation manually.
Naturally, this raises the question of which parts of the transformation a compiler should handle automatically. In Koka, we currently support the TRMc transformation to ensure that the naive version of map is automatically optimized. However, we do not implement all extensions that are possible. For example, we have not implemented the extension proposed in Section 7.1 of the conference version of this paper since first-class constructor contexts make it easily possible to derive it manually. Furthermore, we do not implement an automatic translation for other kinds of contexts like the ones proposed in Section 4.
One choice for future compiler writers would be to automatically apply exactly those contexts for which benchmarks indicate that they always improve performance. This seems to be the case for constructor contexts and semiring contexts (which are also supported by GCC, see Section 4.4). However, traditional CPS and defunctionalized contexts tend to decrease performance. It can be valuable to use contexts that decrease performance since they still protect against stack overflows, but this should only be done for those functions that actually present a risk of causing a stack overflow.
Another option would be to let users choose which functions should be transformed and with which context. Possible contexts might either be baked into the language or even declared directly by users. In Section 5, we describe how several contexts can be composed, so that the compiler can produce tail-recursive functions where some of the calls are eliminated using more performant contexts than others.
Finally, another sensible option would be to perform no TRMC pass in the compiler at all and instead leave the transformation completely to programmers. We believe that the implementations using an explicit accumulator presented in this section are often similar in clarity to direct-style implementations and it would not be infeasible to simply perform the TRMC transformation by hand where required. In return, the language enjoys simpler semantics and there is a smaller implementation burden in the compiler.
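A manual accumulator-style map in the spirit of this discussion can be sketched in Python over cons cells (the encoding and names are ours; in Koka, Perceus reuse can perform the final reversal in place, which plain Python cannot):

```python
def map_acc(f, xs):
    """Tail-recursive map over cons cells (pairs), via an explicit accumulator."""
    acc = None
    while xs is not None:      # loop = tail recursion: accumulate in reverse
        x, xs = xs
        acc = (f(x), acc)
    res = None
    while acc is not None:     # reverse the accumulator at the end
        x, acc = acc
        res = (x, res)
    return res
```

This is the kind of definition a programmer would write by hand if the compiler performed no TRMC pass at all.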
9 Benchmarks
The Koka compiler has a full implementation of the TRMC algorithm as described in this paper for constructor contexts (since v2.0.3, Aug 2020). We measure the impact of TRMC relative to other variants on various tests: the standard map function over a list (map), mapping over a balanced binary tree (tmap), balanced insertion in a red-black tree (rbtree), and finally the knapsack problem as shown in Section 7. Each test program scales the repetitions to process the same number of total elements (100 000 000) for each test size. The map test repeatedly maps the increment function over a shared list of numbers from 1 to N and sums the result list. This means that the map function repeatedly copies the original list and Perceus cannot apply reuse here (Lorenzen & Leijen, Reference Lorenzen and Leijen2022). For example, the test for the standard (and TRMC) map function in Koka is written as:
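The Koka source is not reproduced here; purely as an illustration, the shape of the benchmark can be sketched in Python (parameter names ours): a shared input list, with the repetition count scaled so that every test size processes the same number of total elements.

```python
def bench_map(n, total=100_000_000):
    """Sum the result of mapping increment over a shared list of size n,
    repeated total // n times so that `total` elements are processed."""
    xs = list(range(1, n + 1))        # shared input: map cannot reuse its cells
    checksum = 0
    for _ in range(total // n):       # scale repetitions to `total` elements
        ys = [x + 1 for x in xs]      # the map under test builds a fresh list
        checksum += sum(ys)
    return checksum
```

Because xs stays live across iterations, each map must allocate a fresh result list, which is the property the benchmark is designed to exercise.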

For each test, we measured five different variants:
• trmc: the TRMC version, which is exactly like the standard (std) version.
• std: the standard non-tail-recursive version. This is the same source as the trmc version but compiled with the --fno-trmc flag.
• acc: the accumulator-style definition where the accumulated result list (or tree visitor) is reversed at the end.
• acc (no reuse): the accumulator-style version but with Perceus reuse disabled for the accumulator. The performance of this variant may be more indicative for systems without reuse. Accumulator reuse is important as it allows the accumulated result to be reversed “in place”.
• cps: the CPS-style version with an explicit continuation function. This allocates a closure for every element that eventually allocates the result element for the final result. Perceus does not reuse the memory underlying closures.
The benchmark results are shown in Figure 4. For the map function, we see that our TRMC translation is always faster than the alternatives for any size list. For a tree map (tmap), this is also the case, except for one-element trees where the standard tmap is slightly faster (6%). However, when we consider a slightly more realistic example of balanced insertion into a tree, TRMC is again as fast or faster in all cases. The rbtree benchmark is interesting as, during traversal down to the insertion point, there are two recursive cases where TRMC applies, but also two recursive cases where TRMC does not apply. Here, we see that it still helps to apply TRMC where possible, as looping is apparently faster than a recursive call in this benchmark.

Fig. 4. Benchmarks on Ubuntu 20.04 (AMD 5950x), Koka v2.4.1-dev. The benchmarks are map over a list (map), map over a tree (tmap), balanced red-black tree insertion (rbtree), and the knapsack program that uses nonlinear control flow. Each workload is scaled to process the same number of total elements (usually 100 000 000). The tested variants are TRMC (trmc), the standard non-tail-recursive style (std), accumulator style (acc), accumulator style without Perceus reuse (acc (no reuse)), and finally CPS style (cps).
Finally, knapsack implements the example from Section 7 with a backtracking effect. Unfortunately, the TRMC variant, which uses the hybrid approach to copy the context on demand, is slower than the alternatives, though not by much – about 25% at worst. The reason is that there is less sharing. In the accumulator version, at each choice point the current accumulated result is shared between the choices, building a tree of choices. At the end, many of these choices are simply discarded (as the knapsack is too full), and a result list is constructed (as a copy) only for valid solutions. For the hybrid trmc approach, however, we copy the context on demand at each choice point, and when we reach a point where the knapsack is too full the entire result is discarded, keeping only valid solutions. As such, the trmc variant copies more than the other approaches, depending on how many of the generated solutions are eventually kept. Still, in Koka we prefer the hybrid approach to avoid code duplication.
10 Related work
Tail recursion modulo cons was a known technique in the LISP community as early as the 1970s. Risch (Reference Risch1973) describes the TRMc transformation in the context of the REMREC system, which also implemented the modulo associative operators instantiation described in Section 4.4. A more precise description of the TRMc transformation was given by Friedman & Wise (Reference Friedman and Wise1975). More recently, Bour et al. (Reference Bour, Clément and Scherer2021) describe an implementation for OCaml which also explores various language design issues with TRMc. The implementation is based on destination passing style where the result is always directly written into the destination hole. This entails generating an initial unrolling of each function. For example, the map function is translated (in pseudo code) as:
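The OCaml pseudo code is not reproduced here; as an illustration of the same destination-passing-style split, here is a Python sketch with mutable two-field lists standing in for cons cells so that the destination hole can be overwritten in place (names ours; a while loop plays the role of the tail-recursive map_dps):

```python
def map_dps(f, xs, dst):
    """Write the mapped tail of xs into dst[1], the hole of the last cell."""
    while xs is not None:
        x, xs = xs
        cell = [f(x), None]    # allocate Cons(f(x), <hole>)
        dst[1] = cell          # fill the previous destination hole
        dst = cell             # the new hole is this cell's tail
    dst[1] = None              # close the list

def map_(f, xs):
    if xs is None:
        return None
    x, xs = xs
    cell = [f(x), None]        # the initial unrolling allocates the first cell
    map_dps(f, xs, cell)       # then iterate with only the destination argument
    return cell
```

Note how map_dps returns nothing and is called only for its side effect on the destination, matching the description above.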

This can potentially be more efficient since there is only one extra argument for the destination address (instead of our representation as a Minamide tuple of the final result with the hole address), but it comes at the price of duplicating code. Note that the map_dps function returns just a unit value and is only called for its side effect. As such it seems quite different from our general TRMC based on context composition and application. However, the destination passing style may still be reconciled with our approach: with a Minamide tuple the first iteration always uses an “empty” tuple, while every subsequent iteration has a tuple with the fixed final result as its first element, where only the hole address (i.e., the destination) changes. Destination passing style uses this observation to specialize for each case, doing one unrolling for the first iteration (with the empty tuple), and then iterating with only the second hole address as the destination. The algorithm rules by Bour et al. (Reference Bour, Clément and Scherer2021) directly generate a destination passing style program. For example, the core translation rule for a constructor with a hole is:

Here a single rule does various transformations that we treat as orthogonal, such as folding, extraction, instantiation of composition, and the actual TRMc transformation. Allain et al. (Reference Allain, Bour, Clément, Pottier and Scherer2025) expand on the design and give a formal proof of correctness using separation logic. In logic languages, difference lists (Clark & Tärnlund, Reference Clark and Tärnlund1977) can be used to encode a form of TRMc: difference lists are usually presented as a pair ${(\textit{L},\mathsf{X})}$ where ${\mathsf{X}}$ is a logic variable which is the last element of the list ${\textit{L}}$. With in-place update of the unification variable ${\mathsf{X}}$, one can thus append to ${\textit{L}}$ in constant time – quite similar to our constructor contexts. This is also done in the experimental Ozma backend of the Scala language (Doeraene & Van Roy, Reference Doeraene and Van Roy2013). Engels (Reference Engels2022) describes an implementation of TRMc for the Elm language that can also tail-optimize calls to the right of a list append by keeping the last cell of the right-appended list as a context. Pottier & Protzenko (Reference Pottier and Protzenko2013) implement a type system inspired by separation logic, which allows the user to implement a safe version of in-place updating TRMc through a mutable intermediate datatype. Laziness works similarly to TRMc for the functions we consider: recursive calls guarded by a constructor are thunked and incremental forcing can happen without using the stack. The listless machine (Wadler, Reference Wadler1984) is an elegant model for this behavior.
Hughes (Reference Hughes1986) considers the function reverse and shows how the fast version can be derived from the naive version by defining a new representation of lists as a composition of partially applied append functions (which are sometimes also called difference lists). His function rep(xs) (defined as the partial application of append to xs) creates such an abstract list and is equal to our ${\mathsf{ctx}}$ when instantiated to append functions and list contexts (Section 4.1). Similarly, his abs(f) function (defined as f []) corresponds to our ${\mathsf{app}~\textit{k}~[]}$ in that case, and finally, the correctness condition would correspond to our ${(\textit{appctx})}$ law. The idea of calculating programs from a specification has a long history, and we refer the reader to early work by Bird (Reference Bird1984), Wand (Reference Wand1980), and Meertens (Reference Meertens1986), and more recent work by Gibbons (Reference Gibbons2022) and Hutton (Reference Hutton2021).
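As an illustration, Hughes' rep and abs can be rendered in Python, with function composition playing the role of context composition; the compose and reverse helpers are ours, showing the derived fast reverse over abstract lists.

```python
def rep(xs):
    return lambda ys: xs + ys      # rep(xs): the partial application "append xs"

def abs_(f):
    return f([])                   # abs(f) = f [] recovers the plain list

def compose(f, g):
    return lambda ys: f(g(ys))     # context composition is function composition

def reverse(xs):
    """Fast reverse: build an abstract list by composing prepends, apply once."""
    k = rep([])                    # the empty abstract list
    for x in xs:
        k = compose(rep([x]), k)   # abstractly: [x] ++ (reverse of prefix)
    return abs_(k)
```

Each step composes constant-size append functions, so the single abs_ at the end runs in time linear in the list length.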
Defunctionalization (Reynolds, Reference Reynolds1972; Danvy & Nielsen, Reference Danvy and Nielsen2001) has often been used to eliminate all higher-order calls and obtain a first-order version of a program. Wand & Friedman (Reference Wand and Friedman1978) describe a defunctionalization algorithm in the context of LISP. Minamide et al. (Reference Minamide, Morrisett and Harper1996) introduce special primitives pack and open (that correspond roughly to our ${\mathsf{ctx}}$ and ${\mathsf{app}}$) and describe a type system for correct usage. Bell et al. (Reference Bell, Bellegarde and Hook1997) and Tolmach & Oliva (Reference Tolmach and Oliva1998) perform the conversion automatically at compile-time. Danvy & Nielsen (Reference Danvy and Nielsen2001) propose to apply defunctionalization only to the closures of self-recursive calls, which should produce equal results as our approach in Section 4.3. However, they do not give an algorithm for this and the technique has so far mainly been used manually (Danvy & Goldberg, Reference Danvy and Goldberg2002; Gibbons, Reference Gibbons2022).
An early implementation of TRMc in a typed language was in the OPAL compiler (Didrich et al., Reference Didrich, Fett, Gerke, Grieskamp and Pepper1994). Similar to Bour et al. (Reference Bour, Clément and Scherer2021), they also used destination passing style compilation with an extra destination argument where the final result is written to. Like Koka and Lean, OPAL also managed memory using reference counting and could reuse matched constructors (Schulte & Grieskamp, Reference Schulte and Grieskamp1992). Reuse combines well with TRMc, and in recent work Lorenzen & Leijen (Reference Lorenzen and Leijen2022) show how this can be used to speed up balanced insertion into red-black trees using the functional but in-place (FBIP) technique. Sobel & Friedman (Reference Sobel and Friedman1998) propose to reuse the closures of a CPS-transformed program for newly allocated constructors and show that this approach succeeds for all anamorphisms. However, reuse based on dynamic reference counts can improve upon this by, for example, also reusing the original data for the accumulator (and it generalizes to nonlinear control).
We are using the linearity of the Perceus heap semantics (Xie et al., Reference Xie and Leijen2021; Lorenzen & Leijen, Reference Lorenzen and Leijen2022) to reason about linear chains and the essence of in-place updates. In our case, these linear chains are used to reason about the shape of a separate part of the heap. This suggests that separation logic (Reynolds, Reference Reynolds2002) could also be used effectively for such proofs. For example, Moine et al. (Reference Moine, Charguéraud and Pottier2023) use separation logic to reason about space usage under garbage collection.
11 Conclusion and future work
In this paper, we explored tail recursion modulo context and tried to bring the general principles out of the shadows of specific algorithms and into the light of equational reasoning. We have a full implementation of the modulo cons instantiation and look forward to exploring future extensions to other instantiations as described in this paper.
Acknowledgments
We thank the anonymous reviewers of POPL 2023 and JFP for their helpful feedback. Gabriel Scherer and Jeremy Gibbons provided feedback on earlier drafts of this paper.
Conflicts of Interest
None.
A Further Benchmarks
Figure A1 shows the results of the map benchmark. This time we also include the results for OCaml 4.14.0, which has support for TRMc (Bour et al., Reference Bour, Clément and Scherer2021) using the [@tail_mod_cons] attribute. For example, the TRMc map function is expressed as:

Fig. A1. Benchmarks on Ubuntu 20.04 (AMD 5950x), Koka v2.4.1-dev, OCaml 4.14.0. The benchmark repeatedly maps the increment function over a list of a given size and sums the result list. Each workload is scaled to process the same number of total elements (100 000 000). The tested variants of $\texttt{map}$ are TRMC (trmc), accumulator style (acc), the standard non-tail-recursive style (std), and finally CPS style (cps).

Comparing across systems is always difficult since there are many different aspects, in particular the different memory management of the two systems: Koka uses Perceus-style reference counting (Xie et al., Reference Xie and Leijen2021), while OCaml uses generational garbage collection, with a copying collector for the minor generation and a mark-sweep collector for the major heap (Doligez & Leroy, Reference Doligez and Leroy1993).
The results at least indicate that our approach, using Minamide-style tuples of the final result object and a hole address, is competitive with the OCaml approach based on direct destination passing style. For our translation, the trmc variant is always as fast as or faster than the alternatives, but unfortunately this is not (yet) the case in OCaml, where it requires larger lists to become faster than the standard recursion.
OCaml is also faster for lists of size 10, where std is about 25% faster than Koka’s trmc. We believe this is in particular due to memory management. For this micro benchmark, such small lists always fit in the minor heap with very fast bump allocation. Since in the benchmark the result is always immediately discarded, no live data needs to be traced in the minor heap for GC – perfect! In contrast, Koka uses regular malloc/free with reference counting with the associated overheads. However, once the workload increases with larger lists, the overhead of garbage collection and copying to the major heap becomes larger, and in such situations Koka becomes (significantly) faster. Also, the time to process the 100 M elements stays relatively stable for Koka (around 0.45 s) no matter the size of the lists, while with GC we see that processing larger lists takes much longer.
B Proofs
B.1 The TRMC Algorithm is Sound
In Section 3.2, we used equational reasoning to derive the TRMC algorithm, which makes it sound by construction. However, since we use an assumed specification (a), we generally need to make sure we only use this inductively on smaller terms (Hutton, Reference Hutton2021; Gibbons, Reference Gibbons2022). However, we use the specification on the original term in the directly tail-recursive case, where we translate a tail-recursive expression ${\textit{f}~\textit{e}}$ to its corresponding tail-recursive call in the translation ${\textit{f}'~\textit{e}~\textit{k}}$. Intuitively, this is correct since we map each original recursive call to a recursive call in the translation, but technically it is not inductive. While it is a bit beyond the scope of this paper, we can show formally that such a recursive reasoning step is actually sound in general. In particular, we can use a slightly more powerful notion of equality. We base our development on the step-indexed logical relation of Appel & McAllester (Reference Appel and McAllester2001).
Definition 1. For closed terms ${\textit{e}_{1}}$ and ${\textit{e}_{2}}$, we write ${\textit{F}~\vDash{}~\textit{e}_{1}~\leqslant{}_\textit{i}~\textit{e}_{2}}$ if for all ${0~<~\textit{j}~<~\textit{i}}$ we have that ${\textit{e}_{1}~{\longmapsto}^\textit{j}~\textit{v}}$ implies ${\textit{e}_{2}~{\longmapsto^{\!*}}~\textit{v}}$ under the global environment of top-level functions ${\textit{F}}$. We write ${\textit{F}~\vDash{}~\textit{e}_{1}~\cong{}_\textit{i}~\textit{e}_{2}}$ if ${\textit{F}~\vDash{}~\textit{e}_{1}~\leqslant{}_\textit{i}~\textit{e}_{2}}$ and ${\textit{F}~\vDash{}~\textit{e}_{2}~\leqslant{}_\textit{i}~\textit{e}_{1}}$.
The above definition is similar to the ${\textit{e}~\leqslant{}_\textit{k}~\textit{f}~:{}~\tau}$ relation of Appel & McAllester (Reference Appel and McAllester2001) (Section 3), but simplified in two ways: we do not consider a typing relation ${:{}~\tau}$, and we require that the values produced by ${\textit{e}_{1}}$ and ${\textit{e}_{2}}$ are the same instead of merely being equivalent for ${\textit{i}~-~\textit{j}}$ steps. The latter choice implies that this notion of equality is not congruent under lambdas and we cannot rewrite under lambdas (which is the case for our development). We use the subscript ${\textit{i}}$ instead of ${\textit{k}}$ to avoid confusion with our notation for continuations. We omit the environment ${\textit{F}}$ where it is clear from the context.
Definition 2. For open terms ${\textit{e}_{1}}$ and ${\textit{e}_{2}}$ with free variables ${\Gamma{}}$, we write ${\textit{F}~\vDash{}~\textit{e}_{1}~\leqslant{}_\textit{i}~\textit{e}_{2}}$ if for all substitutions ${\sigma}$ mapping ${\Gamma{}}$ to values, we have ${\textit{F}~\vDash{}~\sigma(\textit{e}_{1})~\leqslant{}_\textit{i}~\sigma(\textit{e}_{2})}$, and similarly for ${\textit{F}~\vDash{}~\textit{e}_{1}~\cong{}_\textit{i}~\textit{e}_{2}}$. We write ${\textit{F}~\vDash{}~\textit{e}_{1}~\leqslant{}~\textit{e}_{2}}$ if ${\textit{F}~\vDash{}~\textit{e}_{1}~\leqslant{}_\textit{i}~\textit{e}_{2}}$ for all ${\textit{i}}$, and similarly for ${\textit{F}~\vDash{}~\textit{e}_{1}~\cong{}~\textit{e}_{2}}$.
This definition is similar to the ${\Gamma{}~\vDash{}_\textit{k}~\textit{e}~\leqslant{}~\textit{f}~:{}~\tau}$ relation but simplified to use the same substitution ${\sigma}$ for both ${\textit{e}_{1}}$ and ${\textit{e}_{2}}$ instead of substitutions ${\sigma_1,~\sigma_2}$ whose values are equivalent for ${\textit{i}}$ steps. The main induction principle we need is the following:
Informally, this lemma states that we can show ${\mathsf{E}[\textit{f}~\textit{x}]~\leqslant{}~\textit{e}'}$ by unfolding ${\textit{f}}$ exactly one step and then rewriting using the inequality we aim to prove. The key insight that makes this sound is that we quantify over all possible implementations of ${\textit{f}}$. The “free theorem” of ${\textit{f}}$ then implies that our assumption can only access the inequality and not any other properties of ${\textit{f}}$.
Lemma 3. (Unfolding Lemma)
Let ${\textit{f}}$ be a function, ${\mathsf{E}}$ an evaluation context, and ${\textit{e},~\textit{e}'}$ expressions where ${\mathsf{E},\textit{e}'}$ do not mention ${\textit{f}}$. If for any implementation of ${\textit{f}}$ with ${\mathsf{E}[\textit{f}~\textit{x}]~\leqslant{}~\textit{e}'}$ we have ${\mathsf{E}[\textit{e}]~\leqslant{}~\textit{e}'}$, then for ${\textit{f}~\textit{x}~=~\textit{e}}$, we have ${\mathsf{E}[\textit{f}~\textit{x}]~\leqslant{}~\textit{e}'}$.
Proof. Construct a sequence of functions ${\textit{f}_\textit{i}}$ as follows: ${\textit{f}_0~\textit{x}~=~\textit{f}_0~\textit{x}}$ and ${\textit{f}_{\textit{i}+1}~\textit{x}~=~\textit{e}[\textit{f}_\textit{i}/\textit{f}]}$.
• We show that ${\textit{f}_\textit{j}~\textit{x}~\leqslant{}~\textit{f}_{\textit{j}~+~1}~\textit{x}}$ for all ${\textit{j}}$ by induction on ${\textit{j}}$. Case ${\textit{j}~=~0}$: obvious, since ${\textit{f}_0~\textit{x}}$ diverges. Case ${\textit{j}~=~\textit{j}'~+~1}$: by induction on ${\textit{e}}$. Case ${\textit{e}~=~\textit{f}~\textit{v}}$: then ${\textit{f}_{\textit{j}'}~\textit{v}~\leqslant{}~\textit{f}_{\textit{j}}~\textit{v}~\leqslant{}~\textit{f}_{\textit{j}~+~1}~\textit{v}}$. All other cases follow from the inductive hypothesis since ${\textit{f}_{\textit{j}'}~\textit{x}}$ and ${\textit{f}_\textit{j}~\textit{x}}$ only differ in the recursive calls.
• We show that ${\mathsf{E}[\textit{f}_\textit{i}~\textit{x}]~\leqslant{}~\textit{e}'}$ for all ${\textit{i}}$. We have ${\textit{f}_0~\textit{x}~\leqslant{}~\textit{e}'}$ since ${\textit{f}_0}$ diverges on all inputs, and hence ${\mathsf{E}[\textit{f}_0~\textit{x}]~\leqslant{}~\textit{e}'}$, since the hole in an ${\mathsf{E}}$ context is always evaluated. Assume that ${\mathsf{E}[\textit{f}_\textit{i}~\textit{x}]~\leqslant{}~\textit{e}'}$. Then for ${\textit{f}~=~\textit{f}_\textit{i}}$, the assumption gives us ${\mathsf{E}[\textit{e}]~\leqslant{}~\textit{e}'}$ and thus ${\mathsf{E}[\textit{f}_{\textit{i}+1}~\textit{x}]~\leqslant{}~\textit{e}'}$.
To finish the proof, we thus only have to show that ${\mathsf{E}[\textit{f}~\textit{x}]~\leqslant{}~\mathsf{E}[\textit{f}_\textit{j}~\textit{x}]}$ for some ${\textit{j}}$. Assume that ${\mathsf{E}[\textit{f}~\textit{x}]}$ converges in ${\textit{i}}$ steps to ${\textit{v}}$. Then ${\textit{f}~\textit{x}}$ converges in ${\textit{j}~\leqslant{}~\textit{i}}$ steps to some value ${\textit{w}}$. We show that ${\textit{f}_\textit{j}~\textit{x}}$ also converges to ${\textit{w}}$ by induction on ${\textit{j}}$, which implies that ${\mathsf{E}[\textit{f}_\textit{j}~\textit{x}]}$ converges to ${\textit{v}}$.
• Case ${\textit{j}~=~0}$: since ${\textit{f}~\textit{x}}$ has to take at least one step, it does not yield a value.
• Case ${\textit{j}~=~\textit{j}'~+~1}$: by the induction hypothesis, we have ${\textit{f}~\textit{x}~\leqslant{}_{\textit{j}'}~\textit{f}_{\textit{j}'}~\textit{x}}$. We show that for ${\textit{f}~\textit{x}~=~\textit{e}}$, we have ${\textit{f}~\textit{x}~\leqslant{}_\textit{j}~\textit{f}_{\textit{j}}~\textit{x}}$ by induction on ${\textit{e}}$. Case ${\textit{e}~=~\textit{f}~\textit{v}}$: then ${\textit{f}~\textit{v}~\leqslant{}_{\textit{j}'}~\textit{f}_{\textit{j}'}~\textit{v}~\leqslant{}~\textit{f}_{\textit{j}}~\textit{v}}$. All other cases follow from the inductive hypothesis since ${\textit{f}~\textit{x}}$ and ${\textit{f}_\textit{j}~\textit{x}}$ only differ in the recursive calls.
Our proof of the unfolding lemma is in the style of Scott induction, where we create a chain of approximations ${\textit{f}_\textit{i}}$ to the least fixpoint ${\textit{f}}$. An alternative approach would be to define contextual equality of terms using weakest preconditions (Turon et al., Reference Turon, Dreyer and Birkedal2013; Krogh-Jespersen et al., Reference Krogh-Jespersen, Svendsen and Birkedal2017; Timany et al., Reference Timany, Krebbers, Dreyer and Birkedal2024, Section 8) and to replace the approximation functions by Löb induction (Appel et al., Reference Appel, Mellies, Richards and Vouillon2007; Dreyer et al., Reference Dreyer, Ahmed and Birkedal2009) in the weakest-precondition relation.
Proof of Theorem 1:
Let ${\textit{f}~\textit{x}~=~\textit{e}_\textit{f}}$ and ${\textit{f}'~\textit{x}~\textit{k}~=~{[\![}\textit{e}_\textit{f}{]\!]}_{\textit{f},\textit{k}}}$; then ${\textit{app}~\textit{k}~(\textit{f}~\textit{x})~\cong{}~\textit{f}'~\textit{x}~\textit{k}}$.
Proof. We focus on the ${(\textit{tail})}$ case of the translation, while ${(\textit{tlet})}$, ${(\textit{tmatch})}$, ${(\textit{tapp})}$, and ${(\textit{base})}$ can be handled as in the text of the paper. We show ${\textit{f}'~\textit{x}~\textit{k}~=~{[\![}\textit{e}_\textit{f}{]\!]}_{\textit{f},\textit{k}}~\leqslant{}~\textit{app}~\textit{k}~(\textit{f}~\textit{x})}$ by the unfolding lemma with ${\mathsf{E}~=~\square{},~\textit{e}'~=~\textit{app}~\textit{k}~(\textit{f}~\textit{x})}$. That is, we assume that ${\textit{f}'~\textit{x}~\textit{k}~\leqslant{}~\textit{app}~\textit{k}~(\textit{f}~\textit{x})}$ and show that this implies ${{[\![}\textit{e}_\textit{f}{]\!]}_{\textit{f},\textit{k}}~\leqslant{}~\textit{app}~\textit{k}~(\textit{f}~\textit{x})}$:

We show ${\textit{app}~\textit{k}~(\textit{f}~\textit{x})~=~\mathsf{app}~\textit{k}~(\textit{e}_\textit{f})~\leqslant{}~\textit{f}'~\textit{x}~\textit{k}}$ by the unfolding lemma with ${\mathsf{E}~=~\textit{app}~\textit{k}~\square{},~\textit{e}'~=~\textit{f}'~\textit{x}~\textit{k}}$. That is, we assume that ${\textit{app}~\textit{k}~(\textit{f}~\textit{x})~\leqslant{}~\textit{f}'~\textit{x}~\textit{k}}$ and show that this implies ${\mathsf{app}~\textit{k}~(\textit{e}_\textit{f})~\leqslant{}~\textit{f}'~\textit{x}~\textit{k}}$:

B.2 Context Laws for Defunctionalized Contexts

and case ${\textit{k}_{2}~=~\mathsf{A}_\textit{i}~\textit{x}_1~{\dots}~\textit{x}_\textit{m}~\textit{k}_{3}}$:

For application we have:

B.3 Context Laws for Right-biased-contexts

and for context application we have:

We proceed by induction over ${\mathsf{A}}$.

B.4 General Monoid Contexts

and

We proceed by induction over ${\mathsf{A}}$: case ${\mathsf{A}~=~\square{}}$:

and ${\mathsf{A}~=~\textit{v}~{\odot}~\mathsf{A}'}$:

and ${\mathsf{A}~=~\mathsf{A}'~{\odot}~\textit{v}}$:

B.5 Context Laws for Exponent Contexts
We prove the composition law by induction on ${\textit{k}_{2}}$:

and

Application can be derived as:
We proceed by induction over ${\mathsf{A}}$: case ${\mathsf{A}~=~\square{}}$:

and ${\mathsf{A}~=~\textit{g}~\mathsf{A}'}$:

B.6 Constructor Contexts
Composition:

and application:

B.7 Constructor Contexts and Minamide

Fig. B1. Minamide’s type system adapted to our language
The hole calculus is restricted by a linear-type discipline where the contexts ${\textit{ctx}~\alpha{}}$ ${\equiv{}}$ ${\textit{hfun}~\alpha{}~\alpha{}}$ have a linear type. This is what enables an efficient in-place update implementation while still having a pure functional interface. For our needs, we need to check separately that the translation ensures that all uses of a context ${\textit{k}}$ are indeed linear. Type judgments in Minamide’s system (Minamide, Reference Minamide1998, fig. 4) are denoted as
where ${\Gamma{}}$ is the normal type environment and ${\textit{H}}$ is the linear one containing at most one linear value. The type environment ${\Gamma{}}$ can still contain linear values with a linear type but can only pass those to one of the premises. The environment restricted to nonlinear values is denoted as ${\Gamma{}|_{\mathsf{N}}}$. We can now show that our translation can be typed in Minamide’s system:
Lemma 4. (TRMC uses contexts linearly)
If
and
${\textit{k}}$
fresh then
.
To show this, we need a variant of the general replacement lemma (Hindley & Seldin, Reference Hindley and Seldin1986, Lemma 11.18; Wright & Felleisen, Reference Wright and Felleisen1994, Lemma 4.2) to reason about linear substitution in an evaluation context:
Lemma 5. (Linear replacement)
If
for a constructor context
${\mathsf{K}}$
, then there is a sub-deduction
at the hole and
.
Proof. By induction over the constructor context ${\mathsf{K}}$.
Case ${\square{}}$.

Case ${\textit{C}^\textit{k}~\textit{w}_1~{\dots}~\mathsf{K}'~{\dots}~\textit{w}_\textit{k}}$.

Again we see that our maximal context is an evaluation context, as we would not be able to derive the lemma for contexts under lambdas, for example (as the linear-type environment is not propagated under lambdas).
Proof. (Of Theorem 4) By the
and
rules we obtain:
By the
and
rules, we need to derive:

In particular, we have ${\Gamma_{1}~\subseteq \Gamma_{2}}$. We proceed by induction over the translation function while maintaining the inductive property.
Case ${(\textit{base})}$.

Case ${(\textit{tail})}$, ${\textit{e}~=~\mathsf{K}[\textit{f}~\textit{e}_{1}~{\dots}~\textit{e}_\textit{n}]}$.

Case ${(\textit{let})}$, ${\textit{e}~=~\mathsf{let}~\textit{x}~=~\textit{e}_{1}~\mathsf{in}~\textit{e}_{2}}$.


B.8 Contexts Form Linear Chains
Proof. (Of Lemma 2) By induction on the shape of ${\mathsf{K}}$:
Case ${\textit{C}~{\dots}~\square{}_\textit{i}~{\dots}}$:

Case ${\textit{C}~{\dots}~\mathsf{K}'[\textit{C}'~{\dots}~\square{}_\textit{i}~{\dots}]~{\dots}}$

B.9 Deriving Constructor Context Fold
Given the specification:
we can calculate the fold using induction over the shape of ${\mathsf{K}}$. In the case that ${\mathsf{K}~=~\square{}}$, we derive:

B.10 Deriving Constructor Context Composition
We can calculate for ${\mathsf{K}_{1},\mathsf{K}_{2}~{\neq}~\square{}}$:

B.11 Soundness of the Hybrid Approach
We need to show that the context laws still hold for the hybrid approach. At runtime, a context ${\mathsf{K}}$ is always a linear chain resulting from the fold or composition. We write ${\textit{H}{\mid}{}\hat{\mathsf{K}}}$ for a non-empty context ${[\textit{H}',\textit{y} \mapsto^{\textit{m}} \textit{C}~{\dots}~{{\square}}_\textit{i}~{\dots}]^\textit{n}_\textit{x}{\mid}{}\langle\mkern1mu{}\textit{x},\textit{y}{@}\textit{i}\mkern2mu\rangle{}}$ if we have ${\textit{H}_{0}{\mid}{}(|\mathsf{K}|)~\cong{}~\textit{H}_{0},~[\textit{H}',\textit{y} \mapsto^{\textit{m}} \textit{C}~{\dots}~{{\square}}_\textit{i}~{\dots}]^1_\textit{x}{\mid}{}\langle\mkern1mu{}\textit{x},\textit{y}{@}\textit{i}\mkern2mu\rangle{}}$. Application:
