Back to Futures

Common approaches to concurrent programming begin with languages whose semantics are naturally sequential and add new constructs that provide limited access to concurrency, as exemplified by futures. This approach has been quite successful, but often does not provide a satisfactory theoretical backing for the concurrency constructs, and it can be difficult to give a good semantics that allows a programmer to use more than one of these constructs at a time. We take a different approach, starting with a concurrent language based on a Curry-Howard interpretation of adjoint logic, to which we add three atomic primitives that allow us to encode sequential composition and various forms of synchronization. The resulting language is highly expressive, allowing us to encode futures, fork/join parallelism, and monadic concurrency in the same framework. Notably, since our language is based on adjoint logic, we are able to give a formal account of linear futures, which have been used in complexity analysis by Blelloch and Reid-Miller. The uniformity of this approach means that we can similarly work with many of the other concurrency primitives in a linear fashion, and that we can mix several of these forms of concurrency in the same program to serve different purposes.


Introduction
Concurrency has been a very useful tool in increasing performance of computations and in enabling distributed computation, and consequently, there are a wide variety of different approaches to programming languages for concurrency. A common pattern is to begin with a sequential language and add some form of concurrency primitive, ranging from threading libraries such as pthreads to monads to encapsulate concurrent computation, as in SILL [33,32,16], to futures [17]. Many of these approaches have seen great practical success, and yet from a theoretical perspective, they are often unsatisfying, with the concurrent portion of the language being attached to the sequential base language in a somewhat ad hoc manner, rather than having a coherent theoretical backing for the language as a whole.
In order to give a more uniform approach to concurrency, we take the opposite approach and begin with a language, Seax, whose semantics are naturally concurrent. With a minor addition to Seax, we are able to force synchronization, allowing us to encode sequentiality. In the resulting language, we can model many different concurrency primitives, including futures, fork/join, and concurrency monads. Moreover, as all of these constructs are encoded in the same language, we can freely work with any mixture and retain the same underlying semantics and theoretical underpinnings.
Two lines of prior research meet in the development of Seax. The first involves a new presentation of intuitionistic logic, called the semi-axiomatic sequent calculus (Sax) [10], which combines features from Hilbert's axiomatic form [18] and Gentzen's sequent calculus [14]. Cut reduction in the semi-axiomatic sequent calculus can be put into correspondence with asynchronous communication, either via message passing [28] or via shared memory [10]. We build on the latter, extending it in three major ways to get Seax. First, we extend from intuitionistic logic to a semi-axiomatic presentation of adjoint logic [30,23,24,28], the second major line of research leading to Seax. This gives us a richer set of connectives as well as the ability to work with linear and other substructural types. Second, we add equirecursive types and recursively defined processes, allowing for a broader range of programs, at the expense of termination, as usual. Third, we add three new atomic write constructs that write a value and its tag in one step. This minor addition enables us to encode both some forms of synchronization and sequential composition of processes, despite the naturally concurrent semantics of Seax.
This resulting language is highly expressive. Using these features, we are able to model functional programming with a semantics in destination-passing style that makes memory explicit [34,22,4,31], allowing us to write programs in more familiar functional syntax which can then be expanded into Seax. We can also encode various forms of concurrency primitives, such as fork/join parallelism [7] implemented by parallel pairs, futures [17], and a concurrency monad in the style of SILL [33,32,16] (which combines sequential functional with concurrent session-typed programming). As an almost immediate consequence of our reconstruction of futures, we obtain a clean and principled subsystem of linear futures, already anticipated and used in parallel complexity analysis by Blelloch and Reid-Miller [2] without being rigorously developed.
The principal contributions of this paper are:
1. the language Seax, which has a concurrent write-once shared-memory semantics for programs based on a computational interpretation of adjoint logic;
2. a model of sequential computation using an extension of this semantics with limited atomic writes;
3. a reconstruction of fork/join parallelism;
4. a reconstruction of futures, including a rigorous definition of linear futures;
5. a reconstruction of a concurrency monad which combines functional programming with session-typed concurrency as an instance of the adjoint framework;
6. the uniform nature of these reconstructions, which allows us to work with any of these concurrency primitives and more all within the same language.
We begin by introducing the type system and syntax for Seax, along with some background on adjoint logic (section 2), followed by its semantics, which are naturally concurrent (section 3). At this point, we are able to look at some examples of programs in Seax. Next, we make the critical addition of sequentiality (section 4), examining both what changes we need to make to Seax to encode sequentiality and how we go about that encoding. Using our encoding of sequentiality, we can build a reconstruction of a standard functional language's lambda terms (section 5), which both serves as a simple example of a reconstruction and illustrates that we need not restrict ourselves to the relatively low-level syntax of Seax when writing programs. Following this, we examine and reconstruct several concurrency primitives, beginning with futures (section 6), before moving on to parallel pairs (an implementation of fork/join, in section 7) and a concurrency monad that borrows heavily from SILL (section 8). We conclude with a brief discussion of our results and future work.

Seax: Types and Syntax
The type system and language we present here, which we will use throughout this paper, are based on adjoint logic [30,23,24,28,29] starting with a Curry-Howard interpretation, which we then leave behind by adding recursion, allowing a richer collection of programs. Most of the details of adjoint logic are not relevant here, and so we provide a brief overview of those that are, focusing on how they relate to our language.
In adjoint logic, propositions are stratified into distinct layers, each identified by a mode. For each mode m there is a set σ(m) ⊆ {W, C} of structural properties satisfied by antecedents of mode m in a sequent. Here, W stands for weakening and C for contraction. For simplicity, we always assume exchange is possible. In addition, any instance of adjoint logic specifies a preorder m ≥ k between modes, expressing that the proof of a proposition $A_k$ of mode k may depend on assumptions $B_m$. In order for cut elimination (which forms a basis for our semantics) to hold, this ordering must be compatible with the structural properties: if m ≥ k then σ(m) ⊇ σ(k). Sequents then have the form $\Gamma \vdash A_k$ where, critically, each antecedent $B_m$ in Γ satisfies m ≥ k. We express this concisely as Γ ≥ k.
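The compatibility condition and the side condition Γ ≥ k can be checked mechanically. The following is a minimal sketch, assuming a hypothetical three-mode instance (the mode names `U`, `S`, `L` and the table `GEQ` are illustrative and do not come from the paper):

```python
# Structural properties: "W" = weakening, "C" = contraction.
SIGMA = {
    "U": frozenset({"W", "C"}),  # unrestricted (hypothetical mode name)
    "S": frozenset({"C"}),       # strict
    "L": frozenset(),            # linear
}

# The preorder m >= k, listed as pairs (m, k), reflexive pairs included.
GEQ = {
    ("U", "U"), ("S", "S"), ("L", "L"),
    ("U", "S"), ("U", "L"), ("S", "L"),
}


def preorder_is_compatible(sigma, pairs):
    """Check that m >= k implies sigma(m) is a superset of sigma(k)."""
    return all(sigma[m] >= sigma[k] for (m, k) in pairs)


def context_geq(gamma, k, pairs=GEQ):
    """Check Gamma >= k, where gamma maps each antecedent to its mode."""
    return all((m, k) in pairs for m in gamma.values())


compatible = preorder_is_compatible(SIGMA, GEQ)
broken = preorder_is_compatible(SIGMA, GEQ | {("L", "U")})
```

Adding the pair `("L", "U")` breaks compatibility, because a linear mode would then sit above a mode that admits weakening and contraction.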
We can go back and forth between the layers using shifts $\uparrow_k^m A_k$ (up from k to m, requiring m ≥ k) and $\downarrow^r_m A_r$ (down from r to m, requiring r ≥ m). A given pair $\uparrow_k^m$ and $\downarrow^m_k$ forms an adjunction, justifying the name adjoint logic. Our types are the propositions of adjoint logic, augmented with general equirecursive types formed via mutually recursive type definitions in a global signature. Most of the basic types are familiar as propositions from intuitionistic linear logic [15], or as session types [19,20,13], tagged with subscripts for modes. The only new types are the shifts $\downarrow^r_m A_r$ and $\uparrow_k^m A_k$. We do, however, change ⊕ and & slightly, using an n-ary form rather than the standard binary form. These are, of course, equivalent, but the n-ary form allows for more natural programming. The grammar of types (as well as processes) can be found in fig. 1.

[Fig. 1: grammar of types and processes. Only the process productions are recoverable:]

Processes P, Q ::= $x_m$ ← P ; Q    (cut: allocate x, spawn P, continue as Q)
  | $x_m$ ← $y_m$    (id: copy or move from y to x)
  | $x_m$.V    (write V to x, or read K from x and pass it V)
  | case $x_m$ K    (write K to x, or read V from x and pass it to K)

Note that while our grammar includes mode subscripts on types, type constructors, and variables, we will often omit them when they are clear from context. The typing judgment for processes has the form

$$(x_1 : A^1_{m_1}), \ldots, (x_n : A^n_{m_n}) \vdash P :: (x : A_k)$$

where P is a process expression and we require that each $m_i \geq k$. Given such a judgment, we say that P provides or writes x, and uses or reads $x_1, \ldots, x_n$. The rules for this judgment can be found in fig. 2, where we have elided a fixed signature Σ with type and process definitions explained later in this section. As usual, we require each of the $x_i$ and x to be distinct and allow silent renaming of bound variables in process expressions.

[Fig. 2: typing rules. Only the cut and id rules are recoverable:]

$$\frac{(\Gamma_C, \Delta \geq m \geq r) \quad \Gamma_C, \Delta \vdash P :: (x : A_m) \quad \Gamma_C, \Delta', x : A_m \vdash Q :: (z : C_r)}{\Gamma_C, \Delta, \Delta' \vdash (x \leftarrow P ; Q) :: (z : C_r)}\ \mathsf{cut} \qquad \frac{}{\Gamma_W, y : A_m \vdash x \leftarrow y :: (x : A_m)}\ \mathsf{id}$$

In this formulation, contraction and weakening remain implicit. Handling contraction leads us to two versions of each of the ⊕, 1, ⊗ left rules, depending on whether the principal formula of the rule can be used again or not. The subscript α on each of these rules may be either 0 or 1, and indicates whether the principal formula of the rule is preserved in the context. The α = 0 version of each rule is the standard linear form, while the α = 1 version, which requires that the mode m of the principal formula satisfies C ∈ σ(m), keeps a copy of the principal formula. Note that if C ∈ σ(m), we are still allowed to use the α = 0 version of the rule. Moreover, we write $\Gamma_C$, $\Gamma_W$ for contexts of variables all of which allow contraction or weakening, respectively.
This allows us to freely drop weakenable variables when we reach initial rules, or to duplicate contractable variables to both parent and child when spawning a new process in the cut rule. Note that there is no explicit rule for (possibly recursively defined) type variables t, since they can be silently replaced by their definitions. Equality between types and type-checking can both easily be seen to be decidable using a combination of standard techniques for substructural type systems [3] and subtyping for equirecursive session types [12], which relies on a coinductive interpretation of the types, but not on their linearity, and so can be adapted to the adjoint setting. Some experience with a closely related algorithm [9] for type equality and type checking suggests that this is practical.
We now go on to briefly examine the terms and loosely describe their meanings from the perspective of a shared-memory semantics. We will make this more precise in sections 3 and 4, where we develop the dynamics of such a shared-memory semantics.
Both the grammar and the typing rules show that we have five primary constructs for processes, which then break down further into specific cases.
The first two process constructs are type-agnostic. The cut rule, with term x ← P ; Q, allocates a new memory cell x, spawns a new process P which may write to x, and continues as Q which may read from x. The new cell x thus serves as the point of communication between the new process P and the continuing Q. The id rule, with term $x_m \leftarrow y_m$, copies the contents of cell $y_m$ into cell $x_m$. If C ∉ σ(m), we can think of this instead as moving the contents of cell $y_m$ into cell $x_m$ and freeing $y_m$.
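The copy-versus-move reading of the id rule can be illustrated with a small sketch. It assumes, purely for illustration, that filled cells live in a Python dictionary keyed by address and that `sigma_m` is the set of structural properties of the cells' mode; the function name `forward` is hypothetical.

```python
def forward(cells, x, y, sigma_m):
    """Execute x <- y on a dict of filled cells.

    sigma_m is the set of structural properties of the mode of x and y.
    """
    cells[x] = cells[y]      # copy the contents of y into x
    if "C" not in sigma_m:   # without contraction y cannot be read again,
        del cells[y]         # so the copy is really a move and y is freed


linear = {"y": ("b0", "c1")}
forward(linear, "x", "y", sigma_m=set())

shared = {"y": ("b0", "c1")}
forward(shared, "x", "y", sigma_m={"W", "C"})
```

After the calls, `linear` holds only `x`, while `shared` holds both `x` and `y` with the same contents.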
The next two constructs, x.V and case x K, come in pairs that perform communication, one pair for each type. A process of one of these forms will either write to or read from x, depending on whether the variable is in the succedent (write) or antecedent (read).
A write is straightforward and stores either the value V or the continuation K into the cell x, while a read pulls a continuation K′ or a value V′ from the cell, and combines either K′ and V (in the case of x.V) or K and V′ (case x K). The symmetry of this, in which continuations and values are both eligible to be written to memory and read from memory, comes from the duality between ⊕ and &, between ⊗ and ⊸, and between ↓ and ↑. We see this in the typing rules, where, for instance, $\oplus R_0$ and $\& L_0$ have the same process term, swapping only the roles of each variable between read and write. As cells may contain either values V or continuations K, it will be useful to have a way to refer to this class of expression:

Cell contents W ::= V | K

The final construct allows for calling named processes, which we use for recursion. As is customary in session types, we use equirecursive types, collected in a signature Σ in which we also collect recursive process definitions and their types. For each type definition t = A, the type A must be contractive so that we can treat types equirecursively with a straightforward coinductive definition and an efficient algorithm for type equality [12].
A named process p is declared as $B^1_{m_1}, \ldots, B^n_{m_n} \vdash p :: A_k$, which means it requires arguments of types $B^i_{m_i}$ (in that order) and provides a result of type $A_k$. For ease of readability, we may sometimes write in variable names as well, but they are actually only needed for the corresponding definitions $x \leftarrow p\ y_1, \ldots, y_n = P$. Operationally, a call z ← p w expands to its definition with a substitution [w/y, z/x]P, replacing variables by addresses.
We can then formally define signatures as follows, allowing definitions of types, declarations of processes, and definitions of processes:

Signatures Σ ::= · | Σ, t = A | Σ, $B_m \vdash p :: A_k$ | Σ, x ← p y = P

For valid signatures we require that each declaration $B_m \vdash p :: A_k$ has a corresponding definition x ← p y = P with $y : B_m \vdash P :: (x : A_k)$. This means that all type and process definitions can be mutually recursive.
In the remainder of this paper we assume that we have a fixed valid signature Σ, so we annotate neither the typing judgment nor the computation rules with an explicit signature.

Concurrent Semantics
We will now present a concurrent shared-memory semantics for Seax, using multiset rewriting rules [5]. The state of a running program is a multiset of semantic objects, which we refer to as a process configuration. We have three distinct types of semantic objects: thread($c_m$, P), a thread executing P with destination $c_m$; cell($c_m$, _), a cell $c_m$ that has been allocated but not yet written; and $!_m$cell($c_m$, W), a cell $c_m$ containing W.

Here, we prefix a semantic object with $!_m$ to indicate that it is persistent when C ∈ σ(m), and ephemeral otherwise. Note that empty cells are always ephemeral, so that we can modify them by writing to them, while filled cells may be persistent, as each cell has exactly one writer, which will terminate on writing. We maintain the invariant that in a configuration either thread($c_m$, P) appears together with cell($c_m$, _), or we have just $!_m$cell($c_m$, W), as well as that if two semantic objects provide the same address $c_m$, then they are exactly a thread($c_m$, P), cell($c_m$, _) pair. While this invariant can be made slightly cleaner by removing the cell($c_m$, _) objects, this leads to an interpretation where cells are allocated lazily just before they are written. While this has some advantages, it is unclear how to inform the thread which will eventually read from the new cell where said cell can be found, and so, in the interest of having a realistically implementable semantics, we just allocate an empty cell on spawning a new thread, allowing the parent thread to see the location of that cell.
We can then define configurations with the following grammar (and the additional constraint of our invariant):

Configurations C ::= · | thread(c, P) | cell(c, _) | $!_m$cell(c, W) | $C_1, C_2$

We think of the join $C_1, C_2$ of two configurations as a commutative and associative operation so that this grammar defines a multiset rather than a list or tree.
A multiset rewriting rule takes the collection of objects on the left-hand side of the rule, consumes them (if they are ephemeral), and then adds in the objects on the right-hand side of the rule. Rules may be applied to any subconfiguration, leaving the remainder of the configuration unchanged. This yields a naturally nondeterministic semantics, but we will see that the semantics are nonetheless confluent (Theorem 3). Additionally, while our configurations are not ordered, we will adopt the convention that the writer of an address appears to the left of any readers of that address.
Our semantic rules are based on a few key ideas:
1. Variables represent addresses in shared memory.
2. Cut/spawn is the only way to allocate a new cell.
3. Identity/forward will move or copy data between cells.
4. A process thread(c, P) will (eventually) write to the cell at address c and then terminate.
5. A process thread(d, Q) that is trying to read from c ≠ d will wait until the cell with address c is available (i.e. its contents are no longer _), perform the read, and then continue.
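These ideas can be made concrete with a small simulation. The sketch below is an illustration under simplifying assumptions, not the paper's dynamics: threads are Python generators that yield the address they want to read, cells hold plain Python values instead of values V and continuations K, and the names `Config`, `producer`, and `main` are hypothetical.

```python
from collections import deque

EMPTY = object()  # contents of an allocated but unwritten cell


class Config:
    def __init__(self):
        self.cells = {}       # address -> contents
        self.ready = deque()  # (thread, value to resume it with)
        self.counter = 0

    def cut(self, body):
        """Allocate a fresh cell and spawn a thread that will write it."""
        addr = f"c{self.counter}"
        self.counter += 1
        self.cells[addr] = EMPTY
        self.ready.append((body(self, addr), None))
        return addr

    def write(self, addr, contents):
        assert self.cells[addr] is EMPTY, "cells are write-once"
        self.cells[addr] = contents

    def run(self):
        blocked = []  # (thread, address it is waiting to read)
        while self.ready or blocked:
            still_blocked = []
            for thread, addr in blocked:
                if self.cells[addr] is EMPTY:
                    still_blocked.append((thread, addr))
                else:  # the cell was filled, so the read can happen
                    self.ready.append((thread, self.cells[addr]))
            blocked = still_blocked
            if not self.ready:
                raise RuntimeError("every remaining thread is blocked")
            thread, value = self.ready.popleft()
            try:
                blocked.append((thread, thread.send(value)))
            except StopIteration:
                pass  # the thread wrote its cell and terminated


def producer(cfg, x):
    cfg.write(x, 41)
    yield from ()  # performs no reads; the yield makes this a generator


def main(cfg, out):
    x = cfg.cut(producer)  # x <- producer ; continue below
    v = yield x            # block until x has been written
    cfg.write(out, v + 1)


cfg = Config()
out = cfg.cut(main)
cfg.run()
```

In this run `main` is scheduled first, allocates `x`, and blocks on it; `producer` then fills `x` and terminates, after which `main` resumes and writes its own destination.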
The counterintuitive part of this interpretation (when using a message-passing semantics as a point of reference) is that a process providing c : A ⊸ B does not read a value from shared memory. Instead, it writes a continuation to memory and terminates. Conversely, a client of such a channel does not write a value to shared memory. Instead, it continues by jumping to the continuation. This ability to write continuations to memory is a major feature distinguishing this from a message-passing semantics where potentially large closures would have to be captured, serialized, and deserialized, the cost of which is difficult to control [26].
The final piece that we need to present the semantics is a key operation, namely that of passing a value V to a continuation K to get a new process P. This operation is defined as follows:

When any of these reductions is applied, either the value or the continuation has been read from a cell while the other is a part of the executing process. With this notation, we can give a concise set of rules for the concurrent dynamics. We present these rules in fig. 3.
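To give some intuition for passing a value to a continuation, here is a minimal sketch. The encoding is an assumption made for illustration: values are tagged tuples (a label with an address, a pair of addresses, a unit, or a shift carrying an address), a continuation for a choice type is a dictionary of branches, and the other continuations are plain functions. It does not reproduce the paper's formal definition.

```python
def pass_value(value, cont):
    """Combine a value with a matching continuation to get the next process."""
    kind = value[0]
    if kind == "label":        # choice: select the branch named by the label
        _, label, addr = value
        return cont[label](addr)
    if kind == "pair":         # pair: bind both components
        _, first, second = value
        return cont(first, second)
    if kind == "unit":         # unit: nothing to bind
        return cont()
    if kind == "shift":        # shift: bind the shifted address
        _, addr = value
        return cont(addr)
    raise ValueError(f"unknown value form: {kind}")


branches = {
    "b0": lambda y: ("saw b0, tail at", y),
    "b1": lambda y: ("saw b1, tail at", y),
}
picked = pass_value(("label", "b1", "c1"), branches)
paired = pass_value(("pair", "c2", "c3"), lambda w, y: (y, w))
```

The same function serves both directions described in the text: it does not matter whether the value or the continuation was the part read from memory.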

Fig. 3. Concurrent dynamic rules
These rules match well with our intuitions from before. In the cut rule, we allocate a new empty cell a, spawn a new thread to execute P, and continue executing Q, just as we described informally in section 2. Similarly, in the id rule, we either move or copy (depending on whether C ∈ σ(m)) the contents of cell c into cell d and terminate. The rules that write values to cells are exactly the right rules for positive types (⊕, ⊗, 1, ↓), while the right rules for negative types (&, ⊸, ↑) write continuations to cells instead. Dually, to read from a cell of positive type, we must have a continuation to pass the stored value to, while to read from a cell of negative type, we need a value to pass to the stored continuation.

Results
We have standard results for this system: forms of progress and preservation, and a confluence result. To discuss progress and preservation, we must first extend our notion of typing for process terms to configurations. Configurations are typed with the judgment Γ ⊢ C :: ∆, which means that configuration C may read from the addresses in Γ and write to the addresses in ∆. We can then give the following set of rules for typing configurations, which make use of the typing judgment $\Gamma \vdash P :: (c : A_m)$ for process terms in the base cases.
Note that our invariants on configurations mean that there is no need to separately type the objects thread(c, P) and cell(c, _), as they can only occur together. Additionally, while our configurations are multisets, and therefore not inherently ordered, observe that the typing derivation for a configuration induces an order on the configuration, something which is quite useful in proving progress.

Our preservation theorem differs slightly from the standard, in that it allows the collection of typed channels ∆ offered by a configuration C to grow after a step, as steps may introduce new persistent memory cells.
Proof. By cases on the transition relation for configurations, applying repeated inversions to the typing judgment on C to obtain the necessary information to assemble a typing derivation for C ′ . This requires some straightforward lemmas expressing that non-interfering processes and cells can be exchanged in a typing derivation.
Progress is entirely standard, with configurations consisting entirely of filled cells taking the role that values play in a functional language.
Theorem 2 (Progress). If · ⊢ C :: ∆ then either C can take a step, or C consists entirely of filled cells.

Proof. We first re-associate all applications of the typing rule for joining configurations to the left. Then we perform an induction over the structure of the resulting derivation, distinguishing cases for the rightmost cell or thread and potentially applying the induction hypothesis on the configuration to its left. This structure, together with inversion on the typing of the cell or thread, yields the theorem.
In addition to these essential properties, we also have a confluence result, for which we need to define a weak notion of equivalence on configurations. We say $C_1 \sim C_2$ if there is a renaming ρ of addresses such that $\rho C_1 = C_2$. We can then establish the following version of the diamond property:

Proof. The proof is straightforward by cases. There are no critical pairs involving ephemeral (that is, non-persistent) objects in the left-hand sides of transition rules.

Examples
We present here a few examples of concurrent programs, illustrating various aspects of our language.
Example: Binary Numbers. As a first simple example we consider binary numbers, defined as a type bin at mode m:

$$\mathsf{bin}_m = \oplus_m\{\mathsf{b0} : \mathsf{bin}_m,\ \mathsf{b1} : \mathsf{bin}_m,\ \mathsf{e} : 1_m\}$$

The structural properties of mode m are arbitrary for our examples. For concreteness, assume that m is linear, that is, σ(m) = {}. Unless multiple modes are involved, we will henceforth omit the mode m. As an example, the number $6 = (110)_2$ would be represented by a sequence of labels e, b1, b1, b0, chained together in a linked list. The first cell in the list would contain the bit b0. It has some address $c_0$, and also contains an address $c_1$ pointing to the next cell in the list. Writing out the whole sequence as a configuration we have

$$\mathsf{cell}(c_4, \langle\,\rangle),\ \mathsf{cell}(c_3, \mathsf{e}(c_4)),\ \mathsf{cell}(c_2, \mathsf{b1}(c_3)),\ \mathsf{cell}(c_1, \mathsf{b1}(c_2)),\ \mathsf{cell}(c_0, \mathsf{b0}(c_1))$$

Example: Computing with Binary Numbers. We implement a recursive process succ that reads the bits of a binary number n starting at address y and writes the bits for the binary number n + 1 starting at x. This process may block until the input cell (referenced as y) has been written to; the output cells are allocated one by one as needed. Since we assumed the mode m is linear, the cells read by the succ process will be deallocated.
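The representation and the successor computation can be sketched in ordinary sequential Python. This is an illustration under assumptions, not the Seax program: cells live in a dictionary, `succ` returns the address of its result instead of receiving a destination, and linearity is modeled by removing each cell as it is read. The names `Heap`, `encode`, and `decode` are hypothetical helpers.

```python
import itertools


class Heap:
    def __init__(self):
        self.cells = {}
        self._ids = itertools.count()

    def alloc(self, contents):
        addr = f"c{next(self._ids)}"
        self.cells[addr] = contents
        return addr


def encode(n, heap):
    """Store n least significant bit first; return the head address."""
    addr = heap.alloc(("unit",))
    addr = heap.alloc(("e", addr))
    for bit in (bin(n)[2:] if n > 0 else ""):  # most significant bit first
        addr = heap.alloc(("b1" if bit == "1" else "b0", addr))
    return addr


def decode(addr, heap):
    value, weight = 0, 1
    while True:
        tag, tail = heap.cells[addr]
        if tag == "e":
            return value
        if tag == "b1":
            value += weight
        weight *= 2
        addr = tail


def succ(y, heap):
    tag, tail = heap.cells.pop(y)  # linear read: the cell is consumed
    if tag == "b0":
        return heap.alloc(("b1", tail))              # no carry, reuse the tail
    if tag == "b1":
        return heap.alloc(("b0", succ(tail, heap)))  # carry into the tail
    if tag == "e":
        return heap.alloc(("b1", heap.alloc(("e", tail))))
    raise ValueError(tag)


heap = Heap()
six = encode(6, heap)
seven = succ(six, heap)
seven_value = decode(seven, heap)
eight = succ(seven, heap)
eight_value = decode(eight, heap)
```

After the first call the cell that held the head of 6 is gone from the heap, mirroring the deallocation of cells read at a linear mode.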
In this example and others we find certain repeating patterns. Abbreviating these makes the code easier to read and also more compact to write. As a first simplification, we can use the following shortcuts:

With these, the code for successor becomes

The second pattern we notice is sequences of allocations followed by immediate (single) uses of the new address. We can collapse these by a kind of specialized substitution. We describe the inverse, namely how the abbreviated notation is elaborated into the language primitives.
Value Sequence V: At positive types (⊕, ⊗, 1, ↓), which write to the variable x with x.V, we define:

In each case, and similar definitions below, x′ is a fresh variable. Using these abbreviations in our example, we can shorten it further.
For negative types (&, ⊸, ↑) the expansion is symmetric, swapping the left- and right-hand sides of the cut. This is because these constructs read a continuation from memory at x and pass it a value.
Similarly, we can decompose a continuation matching against a value sequence (V ⇒ P). For simplicity, we assume here that the labels for each branch of a pattern match for internal (⊕) or external (&) choice are distinct; a generalization to nested patterns is conceptually straightforward but syntactically somewhat complex so we do not specify it formally.
For example, we can rewrite the successor program one more time to express that y ′ in the last case must actually contain the unit element and match against it as well as construct it on the right-hand side.
We have to remember, however, that intermediate matches and allocations still take place and the last two programs are not equivalent in case the process with destination y ′ does not terminate.
To implement plus2 we can just compose succ with itself.
(z : bin) ⊢ plus2 :: (x : bin)

In our concurrent semantics, the two successor processes form a concurrently executing pipeline: the first reads the initial number from memory, bit by bit, and then writes a new number (again, bit by bit) to memory for the second successor process to read.
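The pipelining behavior can be approximated with Python generators. This is only an analogy and rests on assumptions: generators are demand-driven, not truly concurrent, and a number is modeled as a stream of labels ending in `e` (the trailing unit cell is omitted). The names `succ_stream` and `six` are hypothetical.

```python
def succ_stream(bits):
    """Successor on a stream of labels, least significant bit first."""
    bits = iter(bits)
    for tag in bits:
        if tag == "b0":
            yield "b1"
            yield from bits  # no carry: forward the remaining bits unchanged
            return
        if tag == "b1":
            yield "b0"       # emit this bit and carry into the next one
        else:                # tag == "e": the number has ended
            yield "b1"
            yield "e"
            return


def plus2(bits):
    return succ_stream(succ_stream(bits))


consumed = []


def six():
    for tag in ["b0", "b1", "b1", "e"]:
        consumed.append(tag)  # record how far the input has been read
        yield tag


out = plus2(six())
first = next(out)
consumed_for_first = list(consumed)
rest = list(out)
```

The second stage produces its first output bit after the first stage has read only one input bit, which is the sense in which the two successors overlap.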
Example: MapReduce. As a second example we consider mapReduce applied to a tree. We have a neutral element z (which stands in for every leaf) and a process f to be applied at every node to reduce the whole tree to a single value. This exhibits a high degree of parallelism, since the operations on the left and right subtree can be done independently. We abstract over the type of element A and the result B at the meta-level, so that $\mathsf{tree}_A$ is a family of types, and $\mathit{mapReduce}_{AB}$ is a family of processes, indexed by A and B.
Since mapReduce applies reduction at every node in the tree, it is linear in the tree. On the other hand, the neutral element z is used for every leaf, and the associative operation f for every node, so z requires at least contraction (there must be at least one leaf) and f both weakening and contraction (there may be arbitrarily many nodes). Therefore we use three modes: the linear mode m for the tree and the result of mapReduce, a strict mode s for the neutral element z, and an unrestricted mode u for the operation applied at each node.
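A rough Python analogue of this example follows. It makes several assumptions that are not from the paper: a leaf is `None`, a node is a triple `(left, element, right)`, `f` takes the left result, the element, and the right result, and each left subtree is reduced in its own thread.

```python
import threading


def map_reduce(tree, z, f):
    """tree is None for a leaf, or a triple (left, element, right)."""
    if tree is None:
        return z                      # z stands in for every leaf
    left, x, right = tree
    box = {}

    def reduce_left():
        box["left"] = map_reduce(left, z, f)

    worker = threading.Thread(target=reduce_left)
    worker.start()                    # the left subtree runs in its own thread
    right_result = map_reduce(right, z, f)
    worker.join()
    return f(box["left"], x, right_result)   # f is used once per node


tree = ((None, 1, None), 2, (None, 3, None))
total = map_reduce(tree, 0, lambda l, x, r: l + x + r)
in_order = map_reduce(tree, [], lambda l, x, r: l + [x] + r)
```

Here `z` is used once per leaf and `f` once per node, while the tree itself is traversed exactly once, matching the usage pattern that motivates the three modes.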
Example: λ-Calculus. As a third example we show an encoding of the λ-calculus using higher-order abstract syntax and parallel evaluation. We specify, at an arbitrary mode m:

An interesting property of this representation is that if we pick m to be linear, we obtain the linear λ-calculus [25], if we pick m to be strict (σ(m) = {C}) we obtain Church and Rosser's original λI calculus [6], and if we set σ(m) = {W, C} we obtain the "usual" λ-calculus [1]. Evaluation (that is, parallel reduction to a weak head-normal form) is specified by the following process, no matter which version of the λ-calculus we consider.

In this code, $v_2$ acts like a future: we spawn the evaluation of $e_2$ with the promise to place the result in $v_2$. In our dynamics, we allocate a new cell for $v_2$, as yet unfilled. When we pass $v_2$ to f in $f.\langle v_2, e_1' \rangle$, the process eval $e_2$ may still be computing and we will not block until we eventually try to read from $v_2$ (which may or may not happen).
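The future-like behavior of $v_2$ can be imitated with Python's standard `concurrent.futures` module. This is a sketch of the idea only, not of the interpreter: `apply_with_future` is a hypothetical helper, and the function argument `f` stands in for the body that may or may not read its argument.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def apply_with_future(f, compute_arg, pool):
    v2 = pool.submit(compute_arg)  # allocate v2 and spawn its writer
    return f(v2)                   # f blocks only if it calls v2.result()


gate = threading.Event()
with ThreadPoolExecutor(max_workers=2) as pool:
    # The argument cannot finish until the gate opens, yet a function that
    # never reads its argument returns without waiting for it.
    unused = apply_with_future(lambda v: "no read", gate.wait, pool)
    gate.set()
    used = apply_with_future(lambda v: v.result() + 1, lambda: 41, pool)
```

The first call returns while its argument is still being computed; the second blocks inside `v.result()` until the argument is available.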

Sequential Semantics
While our concurrent semantics is quite expressive and allows for a great deal of parallelism, in a real-world setting, the overhead of spawning a new thread can make it inefficient to do so unless the work that thread does is substantial. Moreover, many of the patterns of concurrent computation that we would like to model involve adding some limited access to concurrency in a largely sequential language. We can address both of these issues with the concurrent semantics by adding a construct to enforce sequentiality. Here, we will take as our definition of sequentiality that only one thread (the active thread) is able to take a step at a time, with all other threads being blocked.
The key idea in enforcing sequentiality is to observe that only the cut/spawn rule turns a single thread into two. When we apply the cut/spawn rule to the term x ← P ; Q, P and Q are executed concurrently. One obvious way (we discuss another later in this section) to enforce sequentiality is to introduce a sequential cut construct x ⇐ P ; Q that ensures that P runs to completion, writing its result into x, before Q can continue. We do not believe that we can ensure this using our existing (concurrent) semantics. However, with a small addition to the language and semantics, we are able to define a sequential cut as syntactic sugar for a Seax term that does enforce this.
Example Revisited: A Sequential Successor. Before we move to the formal definition that enforces sequentiality, we reconsider the successor example on binary numbers in its most explicit form. We make all cuts sequential.
$$\mathsf{bin}_m = \oplus_m\{\mathsf{b0} : \mathsf{bin}_m,\ \mathsf{b1} : \mathsf{bin}_m,\ \mathsf{e} : 1_m\}$$

(y : bin) ⊢ succ :: (x : bin)

This now behaves like a typical sequential implementation of a successor function, but in destination-passing style [34,22,4,31]. When there is a carry (manifest as a recursive call to succ), the output bit zero will not be written until the effect of the carry has been fully computed.
To implement sequential cut, we will take advantage of the fact that a shift from a mode m to itself does not affect provability, but does force synchronization. If $x : A_m$, we would like to define

$$x \Leftarrow P ; Q \;\triangleq\; x' \leftarrow P' ; \mathsf{case}\ x'\ (\mathsf{shift}(x) \Rightarrow Q)$$

where $x' : \downarrow^m_m A_m$, and (informally) P′ behaves like P, except that wherever P would write to x, P′ writes simultaneously to x and x′. Setting aside for now a formal definition of P′, we see that Q is blocked until x′ has been written to, and so as long as P′ writes to x no later than it writes to x′, this ensures that x is written to before Q can continue.
We now see that in order to define P′ from P, we need some way to ensure that x is written to no later than x′. The simplest way to do this is to add a form of atomic write which writes to two cells simultaneously. We define three new constructs for these atomic writes, shown here along with the non-atomic processes that they imitate. We do not show typing rules here, but each atomic write can be typed in the same way as its non-atomic equivalent.

[Table: each atomic write shown alongside its non-atomic equivalent.]

Each atomic write simply evaluates in a single step to the configuration where both x and x′ have been written to, much as if the non-atomic equivalent had taken three steps: first for the cut, second to write to x, and third to write to x′. This intuition is formalized in the following transition rules:

Note that the rule for the identity case is different from the other two: it requires the cell $y_k$ to have been written to in order to continue. This is because the x ← y construct reads from y and writes to x, so if we wish to write to x and x′ atomically, we must also perform the read from y.

Now, to obtain P′ from P, we define a substitution operation [x′.shift(x)//x] that replaces writes to x with atomic writes to x and x′ as follows:

Extending [x′.shift(x)//x] compositionally over our other language constructs, we can define P′ = P[x′.shift(x)//x], and so

$$x \Leftarrow P ; Q \;\triangleq\; x' \leftarrow P[x'.\mathsf{shift}(x)/\!/x] ; \mathsf{case}\ x'\ (\mathsf{shift}(x) \Rightarrow Q)$$

We now can use the sequential cut to enforce an order on computation. Of particular interest is the case where we restrict our language so that all cuts are sequential. This gives us a fully sequential language, where we indeed have that only one thread is active at a time. We will make extensive use of this ability to give a fully sequential language, and in sections 6 to 8, we will add back limited access to concurrency to such a sequential language in order to reconstruct various patterns of concurrent computation.
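The synchronization achieved by the sequential cut can be imitated with threads and write-once cells. The sketch below is an approximation under stated assumptions: `Cell` and `seq_cut` are hypothetical names, and the atomic write is simulated by filling x before x′ becomes visible, which gives a reader of x′ the same guarantee in this setting.

```python
import threading


class Cell:
    """A write-once cell; reads block until it has been written."""

    def __init__(self):
        self._filled = threading.Event()
        self._contents = None

    def write(self, contents):
        assert not self._filled.is_set(), "cells are write-once"
        self._contents = contents
        self._filled.set()

    def read(self):
        self._filled.wait()
        return self._contents


def seq_cut(p, q):
    x, x_shift = Cell(), Cell()

    def write_both(value):
        # Stand-in for the atomic write: x is filled before x' becomes
        # visible, so a reader woken by x' always finds x written.
        x.write(value)
        x_shift.write(x)

    threading.Thread(target=p, args=(write_both,)).start()
    return q(x_shift.read())  # Q waits on x', then receives the address x


log = []


def p(write):
    log.append("P runs")
    write(41)


def q(x):
    log.append("Q runs")
    return x.read() + 1


result = seq_cut(p, q)
```

Because `q` is only called after `x_shift` has been read, the log always records P before Q, even though P runs in a separate thread.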
There are a few properties of the operation [x ′ .shift(x)/ /x] and the sequential cut that we will make use of in our embeddings. Essentially, we would like to know that P [x ′ .shift(x)/ /x] has similar behavior from a typing perspective to P , and that a sequential cut can be typed in a similar manner to a standard concurrent cut. We make this precise with the following lemmas: Lemma 1 follows from a simple induction on the structure of P , and lemma 2 can be proven by deriving the seqcut rule using lemma 1.
In an earlier version of this paper, we developed a separate set of sequential semantics which is bisimilar to the presentation we give here in terms of sequential cuts. However, by embedding the sequential cut into the concurrent semantics as syntactic sugar, we are able to drastically reduce the conceptual and technical overhead needed to look at interactions between the two different frameworks, and simplify our encodings of various concurrency patterns.
Example Revisited: λ-Calculus. If we make all cuts in the λ-calculus interpreter sequential, we obtain a call-by-value semantics. In particular, it may no longer compute a weak head-normal form even if it exists.
Call-by-name. As mentioned at the beginning of this section, there are multiple approaches to enforcing that only one thread is active at a time. We can think of the sequential cut defined in section 4 as a form of call-by-value -P is fully evaluated before Q can continue. Here, we will define a different sequential cut x ⇐ N P ; Q, which will behave more like call-by-name, delaying execution of P until Q attempts to read from x. Interestingly, this construct avoids the need for atomic write operations! We nevertheless prefer the "call-by-value" form of sequentiality as our default, as it aligns better with Halstead's approach to futures, which were defined in a call-by-value language, and also avoids recomputing P if x is used multiple times in Q.
As before, we take advantage of shifts for synchronization, here using an upwards shift rather than a downwards one. If x : A m , we would like to define where x ′ : ↑ m m A m and Q ′ behaves as Q, except that where Q would read from x, Q ′ first reads from x ′ and then from x. We can formalize the operation that takes Q to Q ′ in a similar manner to [x ′ .shift(x)/ /x]. We will call this operation Note that unlike in our "call-by-value" sequential cut, where we needed to write to two cells atomically, here, the order of reads is enforced because x ′ .shift(x) will execute the stored continuation shift(x) ⇒ P , which finishes by writing to x. As such, we are guaranteed that Q ′ is paused waiting to read from x until P finishes executing. Moreover, P is paused within a continuation until Q ′ reads from x ′ , after which it immediately blocks on x, so we maintain only one active thread as desired. While we will not make much use of this form of sequentiality, we find it interesting that it is so simply encoded, and that the encoding is so similar to that of call-by-value cuts. Both constructions are also quite natural -the main decision that we make is whether to pause P or Q inside a continuation. From this, the rest of the construction follows, as there are two natural places to wake up the paused process -as early as possible or as late as possible. If we wake the paused process P immediately after the cut, as in the result is a concurrent cut with the extra overhead of the shift. Our sequential cuts are the result of waking the paused process as late as possible -once there is no more work to be done in P in the call-by-value cut, and once Q starts to actually depend on the result of P in the call-by-name cut.
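The difference between the two sequential cuts can be seen in a small single-threaded Python sketch. This is only an analogy: closures stand in for stored continuations, the names by_value_cut, by_name_cut, and run are hypothetical, and we assume that every read of x in the by-name cut re-runs P, which is how we read the earlier remark about recomputation.

```python
def by_value_cut(p, q):
    """Call-by-value style: P runs to completion before Q starts."""
    value = p()
    return q(lambda: value)


def by_name_cut(p, q):
    """Call-by-name style: P is paused until Q reads x (assumed to re-run P)."""
    return q(lambda: p())


def run(cut):
    """Run the same P and Q under the given cut and record the order of events."""
    trace = []

    def p():
        trace.append("P")
        return 3

    def q(read_x):
        trace.append("Q")
        return read_x() + read_x()  # Q uses x twice

    return cut(p, q), trace


value_result, value_trace = run(by_value_cut)
name_result, name_trace = run(by_name_cut)
```

Both cuts compute the same result, but the traces differ: the by-value cut runs P once before Q, while the by-name cut starts Q first and runs P only when x is read.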
λ-Calculus Example Revisited. We can achieve a sequential interpreter for the λ-calculus with a single use of a by-name cut. This interpreter is then complete: if a weak head-normal form exists, it will compute it. We also recall that this property holds no matter which structural properties we allow for the λ-calculus (e.g., purely linear if the mode allows neither weakening nor contraction, or the λI-calculus if the mode only allows contraction).

Functions
Rather than presenting an embedding or translation of a full (sequential) functional language into our system, we will focus on the case of functions. There is a standard translation of natural deduction to sequent calculus taking introduction rules to right rules, and constructing elimination rules from cut and left rules. We base our embedding of functions into our language on this translation. By following a similar process with other types, one can similarly embed other functional constructs, such as products and sums.

We will embed functions into an instance of Seax with a single mode m. For this example, we specify σ(m) = {W, C} in order to model a typical functional language, but we could, for instance, take σ(m) = {} to model the linear λ-calculus. We also restrict the language at mode m to only have sequential cuts, which will allow us to better model a sequential language. Note that while we only specify one mode here, we could work within a larger mode structure, as long as it contains a suitable mode m at which to implement functions, namely, one with the appropriate structural properties, and where we have the restriction of only having sequential cuts. It is this modularity that allows us to freely combine the various reconstructions presented here and in the following sections. As we are only working within a single mode in this section, we will generally omit mode subscripts, but everything is implicitly at mode m.

Now, to add functions to this language, we begin by adding a new type A → B and two new constructs: a constructor and a destructor for this type. The constructor, z.(λx.P ⋆ ), writes a λ-abstraction to destination z. Here, we write P ⋆ to denote that the process expression P denotes its destination by ⋆. We will write P y for P [y/⋆]. The use of ⋆ makes this closer to the functional style, where the location that the result is returned to is not made explicit. The destructor, P ⋆ (Q ⋆ ), applies the function P ⋆ to Q ⋆ .
These can be typed using variants of the standard → I and → E rules labeled with channels: In order to avoid having to augment our language each time we wish to add a new feature, we will show that these new constructs can be treated as syntactic sugar for terms already in the language, and, moreover, that those terms behave as we would expect of functions and function applications.
We take the following definitions for the new type and terms:

These definitions are type-correct, as shown by the following theorem:

Theorem 4. If we expand all new constructs using their definitions, then the typing rules → I and → E above are admissible.
We can prove this by deriving the typing rules, using lemma 2 in a few places. Now that we have established that we can expand this syntactic sugar for functions in a type-correct manner, we examine the evaluation behavior of these terms. First, we consider the lambda abstraction z.(λx.P ⋆ ) and its expansion case z ( x, y ⇒ P y ). A lambda abstraction should already be a value, and so we might expect that it can be written to memory immediately. Indeed, in the expansion, we immediately write the continuation ( x, y ⇒ P y ), which serves as the analogue for (λx.P ⋆ ). This term thus behaves as expected.
We expect that when applying a function P ⋆ to an argument Q ⋆ , we first reduce P ⋆ to a value, then reduce Q ⋆ to a value, and then apply the value of P ⋆ to the value of Q ⋆ , generally by substitution. In the term f ⇐ P f ; x ⇐ Q x ; f. x, y , we see exactly this behavior. We first evaluate P f into f , then Q x into x, and then apply the continuation stored in f to the pair x, y . The addition of y allows us to specify the destination for the result of the function application, as in the destination-passing style [34,22,4,31] of semantics for functional languages.
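The evaluation order just described can be imitated in a few lines of Python. This is a sketch under our own assumptions, not the Seax encoding itself: a dictionary plays the role of memory, plain sequential calls play the role of sequential cuts, and the helpers fresh, const, add_one, lam, and app are hypothetical names introduced for illustration.

```python
import itertools

_counter = itertools.count()


def fresh(prefix):
    """Allocate an unused cell name (hypothetical helper)."""
    return f"{prefix}{next(_counter)}"


def const(value):
    """A process that writes `value` to its destination."""
    def run(store, dest):
        store[dest] = value
    return run


def add_one(x):
    """Function body: read cell x and write its successor to the destination."""
    def run(store, dest):
        store[dest] = store[x] + 1
    return run


def lam(body):
    """Abstraction: store a continuation expecting an argument cell and a destination."""
    def run(store, z):
        store[z] = lambda x, y: body(x)(store, y)
    return run


def app(p, q):
    """Application in destination-passing style."""
    def run(store, y):
        f = fresh("f")
        p(store, f)      # first evaluate the function into cell f
        x = fresh("x")
        q(store, x)      # then evaluate the argument into cell x
        store[f](x, y)   # finally pass the pair of x and the destination y to f
    return run


store = {}
app(lam(add_one), const(41))(store, "result")
```

Writing the abstraction to memory does no work beyond storing the continuation, and the destination y is supplied only at the point of application.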

Futures
Futures [17] are a classic example of a primitive to introduce concurrency into a sequential language. In the usual presentation, we add to a (sequential) functional language the ability to create a future that immediately returns a promise and spawns a concurrent computation. Touching a promise by trying to access its value blocks until that value has been computed. Futures have been a popular mechanism for parallel execution in both statically and dynamically typed languages, and they are also used to encapsulate various communication primitives.
The development of a sequential cut in section 4 provides us with ways to model or reconstruct concurrency primitives, and futures are a surprisingly simple example of this. Starting with a language that only allows sequential cuts, we would like to add a new construct that serves to create a future, as we added functions to the base language in section 5. In this case, however, we already have a construct that behaves exactly as desired. The concurrent cut x ← P ; Q spawns a new process P , and executes P and Q concurrently. When Q tries to read from x, it will block until P has computed a result W and written it to x. If we wish to add an explicit synchronization point, we can do so with minimal overhead by making use of identity to read from x. For instance, the process z ⇐ (z ← x) ; Q will first copy or move the contents of cell x to cell z, and then run Q. As such, it delays the execution of Q until x has been written to, even if Q does not need to look at the value of x until later. This is analogous to the touch construct of some approaches to futures.
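As a rough illustration of this reading of futures, the Python sketch below treats a concurrent cut as spawning a thread that fills a write-once cell, with reads blocking until the cell is written. The names Cell, conc_cut, and touch are hypothetical, and Python threads are only an approximation of the process semantics.

```python
import threading


class Cell:
    """A write-once cell; reads block until the cell has been written."""

    def __init__(self):
        self._written = threading.Event()
        self._value = None

    def write(self, value):
        assert not self._written.is_set(), "cell written twice"
        self._value = value
        self._written.set()

    def read(self):
        self._written.wait()
        return self._value


def conc_cut(p, q):
    """Concurrent cut: spawn p with destination x and continue as q immediately."""
    x = Cell()
    threading.Thread(target=lambda: x.write(p()), daemon=True).start()
    return q(x)


def touch(x):
    """Explicit synchronization via identity: copy x into a fresh cell z."""
    z = Cell()
    z.write(x.read())  # blocks until x has been written
    return z


def client(x):
    local = sum(range(10))   # work that does not depend on x
    return local + x.read()  # implicit synchronization on first use


total = conc_cut(lambda: 6 * 7, client)
synced = conc_cut(lambda: "done", lambda x: touch(x).read())
```

The client never mentions futures: it simply reads x, and whether that read blocks depends only on whether the producer has finished.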
In other words, in this language, futures, rather than being a construct that we need to add and examine carefully, are in fact the default. This is, in a sense, opposite to the standard approach, where sequentiality is the norm and a new construct is needed to handle futures. By instead adding sequential cut to our otherwise concurrent language, we get the same expressive power, being able to specify whenever we spawn a new computation whether it should be run concurrently with or sequentially before the continuation process.
This approach to futures, much like those in Halstead's Multilisp, does not distinguish futures at the type level and does not require an explicit touch construct for synchronization, although we can add synchronization points as shown. It is possible to provide an encoding of futures with a distinct type, as they are used in many more modern languages (see appendix A), but we find the form presented here more natural, as it allows a great deal of flexibility to the programmer -a process using a variable x does not know and need not care whether the value of x is computed concurrently or not.
One interesting result that arises from this approach to futures, and in particular from the fact that this approach works at any mode m, regardless of what σ(m) is, is that by considering the case where σ(m) = {}, we recover a definition of linear futures, which must be used exactly once. This is limited in that the base language at mode m will also be linear, along with its futures. However, we are not restricted to working with one mode. For instance, we may take a mode S with σ(S) = {}, which allows for programming linearly with futures, and a mode S * with σ(S * ) = {W, C} and S < S * , which allows for standard functional programming. The shifts between the linear and non-linear modes allow both types of futures to be used in the same program, embedding the linear language (including its futures) into the non-linear language via the monad ↑ S * S ↓ S * S . Uses for linear futures (without a full formalization) in the efficient expression of certain parallel algorithms have already been explored in prior work [2], but to our knowledge, no formalization of linear futures has yet been given.
Binary Numbers Revisited. The program for plus2 presented in section 3.2 is a classic example of a (rather short-lived) pipeline set up with futures. For this to exhibit the expected parallelism, the individual succ process should also be concurrent in its recursive call.
    (z : bin) ⊢ plus2 :: (x : bin)
    x ← plus2 z =
      y ← succ z ;
      x ← succ y

Simple variations (for example, setting up a Boolean circuit on bit streams) follow the same pattern of composition using futures.
mapReduce Revisited. As a use of futures, consider making all cuts in mapReduce sequential except those representing a recursive call: In this program, the computation at each node is sequential, but the two recursive calls to mapReduce are spawned as futures. We synchronize on these futures when they are needed in the computation of f .
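The shape of such a program can be sketched in Python as follows. This is not the Seax code: the tree representation, the class Spawned, and the leaf function h are assumptions made for illustration, and only the name f for the combining function is taken from the text.

```python
import threading
from dataclasses import dataclass
from typing import Any


@dataclass
class Leaf:
    value: Any


@dataclass
class Node:
    left: Any
    right: Any


class Spawned:
    """A future: starts computing at creation, and read() blocks until done."""

    def __init__(self, compute):
        self._result = None
        self._thread = threading.Thread(target=self._run, args=(compute,))
        self._thread.start()

    def _run(self, compute):
        self._result = compute()

    def read(self):
        self._thread.join()
        return self._result


def map_reduce(f, h, tree):
    """Apply h at the leaves and combine results with f at the nodes."""
    if isinstance(tree, Leaf):
        return h(tree.value)
    # Only the two recursive calls are spawned as futures.
    left = Spawned(lambda: map_reduce(f, h, tree.left))
    right = Spawned(lambda: map_reduce(f, h, tree.right))
    # Synchronize when the results are needed in the computation of f.
    return f(left.read(), right.read())


tree = Node(Node(Leaf(1), Leaf(2)), Node(Leaf(3), Leaf(4)))
total = map_reduce(lambda a, b: a + b, lambda v: v * v, tree)
```

Here the work at each node is sequential, while the two subtrees are processed concurrently.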

Fork/Join Parallelism
While futures allow us a great deal of freedom in writing concurrent programs with fine-grained control, sometimes it is useful to have a more restrictive concurrency primitive, either for implementation reasons or for reasoning about the program. Fork/join parallelism is a simple, yet practically highly successful paradigm, allowing multiple independent threads to run in parallel, and then collecting the results together after those threads are finished, using a join construct. Many slightly different treatments of fork/join exist. Here, we will take as the primitive construct a parallel pair P ⋆ | Q ⋆ , which runs P ⋆ and Q ⋆ in parallel, and then stores the pair of results. Joining the computation then occurs when the pair is read from, which requires both P ⋆ and Q ⋆ to have terminated. As with our reconstruction of functions in section 5, we will use a single mode m which may have arbitrary structural properties, but only allows sequential cuts.
As we are working with only a single mode, we will generally omit the subscripts that indicate mode, writing A rather than A m . We introduce a new type A m B m of parallel pairs and new terms to create and read from such pairs. We present these terms in the following typing rules: As in section 5 we can reconstruct these types and terms in Seax already. Here, we define: This definition respects the typing as prescribed by the R and L rules.
Theorem 5. If we expand all new constructs using their definitions, then the R and L rules above are admissible.
This theorem follows quite straightforwardly from lemma 1.
The evaluation behavior of these parallel pairs is quite simple -we first observe that, as the derivation of L in the theorem above suggests, the reader of a parallel pair behaves exactly as the reader of an ordinary pair. The only difference, then, is in the synchronization behavior of the writer of the pair. Examining the term we see that it spawns two new threads, which run concurrently with the original thread. The new threads execute P ⋆ [x ′ .shift(x)/ /⋆] and Q ⋆ [y ′ .shift(y)/ /⋆] with destinations x ′ and y ′ , respectively, while the original thread waits first on x ′ , then on y ′ , before writing the pair x, y to z. Because the new threads will write to x and x ′ atomically, and similarly for y and y ′ , by the time x, y is written to z, x and y must have already been written to. However, because both cuts in this term are concurrent cuts, P ⋆ and Q ⋆ run concurrently, as we expect from a parallel pair.
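A parallel pair with this synchronization behavior can be approximated in Python as below. The sketch is ours and makes simplifying assumptions: par_pair is a hypothetical name, and joining the two worker threads plays the role of waiting on x′ and then on y′. The barrier in the usage example is there only to show that the two components really do run concurrently.

```python
import threading


def par_pair(p, q):
    """Run p and q in parallel and return the pair of their results."""
    results = {}

    def worker(name, compute):
        results[name] = compute()

    left = threading.Thread(target=worker, args=("x", p))
    right = threading.Thread(target=worker, args=("y", q))
    left.start()
    right.start()
    left.join()   # wait for the first component
    right.join()  # then for the second
    # Both components are available before the pair is written.
    return (results["x"], results["y"])


# Each side waits for the other at the barrier, so this only
# terminates normally if p and q are running at the same time.
barrier = threading.Barrier(2, timeout=5)


def p():
    barrier.wait()
    return 2 + 3


def q():
    barrier.wait()
    return "ab".upper()


pair = par_pair(p, q)
```

A reader of the resulting pair needs no synchronization of its own, since the writer has already waited for both components.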
mapReduce Revisited. We can use the fork/join pattern in the implementation of mapReduce so that we first synchronize on the results returned from the two recursive calls before we call f on them.

Monadic Concurrency
For a different type of concurrency primitive, we look at a monad for concurrency, taking some inspiration from SILL [33,32,16], which makes use of a contextual monad to embed the concurrency primitives of linear session types into a functional language. This allows us to have both a fully-featured sequential functional language and a fully-featured concurrent linear language, with the concurrent layer able to refer to variables in the sequential layer, but not the other way around.
To construct this concurrency monad, we will use two modes N and S with N < S. Intuitively, the linear concurrent portion of the language is at mode N, while the functional portion is at mode S. As is common in functional languages, S allows weakening and contraction (σ(S) = {W, C}), but only permits sequential cuts (by which we mean that any cut whose principal formula is at mode S must be a sequential cut) so that it models a sequential functional language. By contrast, N allows concurrent cuts, but is linear (σ(N) = {}). We will write A S and A N for sequential and concurrent types, respectively.
We will borrow notation from SILL, using the type {A N } for the monad, and types A S ∧B N and A S ⊃ B N to send and receive functional values in the concurrent layer, respectively. The type {A N } has as values process expressions {P ⋆ } such that P ⋆ :: (⋆ : A N ). These process expressions can be constructed and passed around in the functional layer. In order to actually execute these processes, however, we need to use a bind construct {c N } ← Q ⋆ in the functional layer, which will evaluate Q ⋆ into an encapsulated process expression {P ⋆ } and then run P ⋆ , storing its result in c N . We can add {·} to our language with the typing rules below. Here, Γ S indicates that all assumptions in Γ are at mode S:

Since they live in the session-typed layer, the ∧ and ⊃ constructs fit more straightforwardly into our language. We will focus on the type A S ∧ B N , but A S ⊃ B N can be handled similarly. A process of type A S ∧ B N should write a pair of a functional value with type A S and a concurrent value with type B N . These terms and their typing rules are shown below:

To bring these new constructs into the base language, we define

These definitions give us the usual type-correctness theorem:

Theorem 6. If we expand all new constructs using their definitions, then the typing rules for {·} and ∧ are admissible.
As with the previous sections, it is not enough to know that these definitions are well-typed -we would also like to verify that they have the behavior we expect for a concurrency monad. In both cases, this is relatively straightforward. Examining the term we see that this writes a continuation into memory, containing the process P x . A reference to this continuation can then be passed around freely, until it is executed using the bind construct: This construct first evaluates P y with destination y S , to get a stored process, and then executes that stored process with destination c N . The ∧ construct is even simpler. Writing a functional value using the term sends both a shift (bringing the functional value into the concurrent layer) and the pair x N , y N of the continuation y N and the shift-encapsulated value x N . Reading such a value using the term just does the opposite -we read the pair out of memory, peel the shift off of the functional value v S to return it to the sequential, functional layer, and continue with the process P z , which may make use of both v S and the continuation y N . These terms therefore capture the general behavior of a monad used to encapsulate concurrency inside a functional language. The details of the monad we present here are different from that of SILL's (contextual) monad, despite our use of similar notation, but the essential idea is the same.
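The distinction between constructing an encapsulated process and running it can be illustrated with a short Python sketch. It is a loose analogy under our own assumptions: Cell, Suspended, and bind are hypothetical names, and we assume that the bound process runs in its own thread, as a stand-in for execution in the concurrent layer.

```python
import threading


class Cell:
    """A write-once cell; reads block until the cell has been written."""

    def __init__(self):
        self._written = threading.Event()
        self._value = None

    def write(self, value):
        assert not self._written.is_set(), "cell written twice"
        self._value = value
        self._written.set()

    def read(self):
        self._written.wait()
        return self._value


class Suspended:
    """An encapsulated process: a stored value that is not yet running."""

    def __init__(self, process):
        self.process = process  # process(dest) writes its result to dest


def bind(q, c):
    """Evaluate q to a Suspended value, then run it with destination c."""
    suspended = q()  # sequential evaluation in the functional layer
    worker = threading.Thread(target=suspended.process, args=(c,))
    worker.start()   # execution in the concurrent layer (assumed threaded)
    return worker


log = []


def make_process():
    def process(dest):
        log.append("running")
        dest.write(99)
    return Suspended(process)


stored = make_process()      # constructing the value runs nothing
ran_before_bind = list(log)
c = Cell()
bind(lambda: stored, c)
value = c.read()
```

The stored process can be passed around freely as an ordinary value; only bind causes it to execute.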
Example: A Concurrent Counter. We continue our example of binary numbers, this time supposing that the mode m = S, that is, our numbers and the successor function on them are sequential and allow weakening and contraction. counter represents a concurrently running process that can receive inc and val messages to increment or retrieve the counter value, respectively.

Conclusion
We have presented a concurrent shared-memory semantics based on a semiaxiomatic [10] presentation of adjoint logic [30,23,24,28], for which we have usual variants of progress and preservation, as well as confluence. We then demonstrate that by adding a limited form of atomic writes, we can model sequential computation. Taking advantage of this, we reconstruct several patterns that provide limited access to concurrency in a sequential language, such as fork/join, futures, and monadic concurrency in the style of SILL. The uniform nature of these reconstructions means that they are all mutually compatible, and so we can freely work with any set of these concurrency primitives within the same language.
There are several potential directions that future work in this space could take. In our reconstruction of futures, we incidentally also provide a definition of linear futures, which have been used in designing pipelines [2], but to our knowledge have not been examined formally or implemented. One item of future work, then, would be to further explore linear futures, now aided by a formal definition which is also amenable to implementation. We also believe that it would be interesting to explore an implementation of our language as a whole, and to investigate what other concurrency patterns arise naturally when working in it. Additionally, the stratification of the language into layers connected with adjoint operators strongly suggests that some properties of a language instance as a whole can be obtained modularly from properties of the sublanguages at each mode. Although based on different primitives, research on monads and comonads to capture effects and coeffects, respectively [8,11], also points in this direction. In particular, we would like to explore a modular theory of (observational) equivalence using this approach. Some work on observational equivalence in a substructural setting already exists [21], but it works in a message-passing setting and does not seem to translate directly to the shared-memory setting of Seax.

A Typed Futures
The futures that we discuss in section 6 behave much like Halstead's original futures in Multilisp, which, rather than being distinguished at the type level, are purely operational. One side effect of this is that while we can explicitly synchronize these futures, we can also make use of implicit synchronization, where accessing the value of the future blocks until it has been computed, without the need for a touch construct.
Here, we will look at a different encoding of futures, which distinguishes futures at the type level, as they have often been presented since. As in section 5, we will work with a single mode m, in which we will only allow sequential cuts, and which may have any set σ(m) of structural properties. To the base language, we add the following new types and process terms for futures:

Types A ::= . . . | fut A
Processes P ::= . . . | x. P ⋆ | touch y ( z ⇒ P )

We type these new constructs as:

We then reconstruct this in Seax by defining

This is not the only possible reconstruction, but we use it because it is the simplest one that we have found. The first property to verify is that these definitions are type-correct:

Theorem 7. If we expand all new constructs using their definitions, then the rules futL and futR are admissible.
Proof. By examining typing derivations for these processes, we see that these rules can be derived as follows:

Note that we omit mode conditions on cut because within a single mode m, they are necessarily satisfied. Now, we examine the computational behavior of these terms to demonstrate that they behave as futures. The type ↓ m m A m , much like in section 4 where we used it to model sequentiality, adds an extra synchronization point. Here, we shift twice, giving ↓ m m ↓ m m A m , to introduce two synchronization points. The first is that enforced by our restriction to only allow sequential cuts in this language (outside of futures), while the second will become the touch construct. We will see both of these when we examine each process term.
We begin by examining the constructor for futures. Intuitively, when creating a future, we would like to spawn a new thread to evaluate P ⋆ with new destination z m , and immediately write the promise of z m (represented by a hypothetical new value z m ) into x m , so that any process waiting on x m can immediately proceed. The term behaves almost exactly as expected. Rather than spawning P ⋆ with destination z m , we spawn P ⋆ [z ′ m .shift(z m )//⋆], which will write the result of P ⋆ to z m , and a synchronizing shift to z ′ m . Concurrently, we write the value shift(z ′ m ) to x m , allowing the client of x m to resume execution, even if x m was created by a sequential cut. This value shift(z ′ m ) is the first half of the promise z m , and the second half, shift(z m ), will be written to z ′ m when P finishes executing.

If, while P continues to execute, we touch x m , we would expect to block until the promise z m has been fulfilled by P having written to z m . Again, we see exactly this behavior from the term touch x m ( z m ⇒ Q), which expands to case x m (shift(z ′ m ) ⇒ case z ′ m (shift(z m ) ⇒ Q)).
This process will successfully read shift(z ′ m ) from x m , but will block trying to read from z ′ m until z ′ m is written to. Since z m and z ′ m are written to at the same time, we block until z m is written to, at which point the promise is fulfilled. Once a result W has been written to z m and (simultaneously) shift(z m ) has been written to z ′ m , this process can continue, reading both z ′ m and z m , and continuing as Q. Again, this is the behavior we expect a touch construct to have.
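The two synchronization points can be traced in a Python sketch of this encoding. As before, this is an illustration under our own assumptions: Store, future, touch, and the generated cell names are hypothetical, threads stand in for processes, and a condition variable stands in for the atomic write.

```python
import itertools
import threading


class Store:
    """Shared memory of write-once cells, addressed by name (hypothetical)."""

    def __init__(self):
        self._cond = threading.Condition()
        self._cells = {}

    def write(self, bindings):
        """Write every cell in `bindings` in one atomic step."""
        with self._cond:
            for name, value in bindings.items():
                assert name not in self._cells, f"{name} written twice"
                self._cells[name] = value
            self._cond.notify_all()

    def read(self, name):
        """Block until `name` has been written, then return its value."""
        with self._cond:
            self._cond.wait_for(lambda: name in self._cells)
            return self._cells[name]

    def is_written(self, name):
        with self._cond:
            return name in self._cells


_names = itertools.count()


def future(store, x, p):
    """Create a future in cell x that computes p concurrently."""
    n = next(_names)
    z, z_shift = f"z{n}", f"z{n}'"

    def run():
        value = p()
        # Second half of the promise, written atomically with the result.
        store.write({z: value, z_shift: ("shift", z)})

    threading.Thread(target=run, daemon=True).start()
    # First half of the promise: available to the client right away.
    store.write({x: ("shift", z_shift)})


def touch(store, x, q):
    """Block until the future in x is fulfilled, then continue as q."""
    _, z_shift = store.read(x)   # succeeds immediately
    _, z = store.read(z_shift)   # blocks until p has finished
    return q(store.read(z))


store = Store()
gate = threading.Event()


def slow():
    gate.wait()  # hold the computation until released
    return 10


future(store, "x", slow)
_, pending = store.read("x")              # the promise is readable at once
written_early = store.is_written(pending)
gate.set()
result = touch(store, "x", lambda v: v + 1)
```

Reading x never blocks on the computation itself; only touch does, which is what makes every synchronization point explicit in this encoding.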
This approach does effectively model a form of typed future, which ensures that all synchronization is explicit, but comes at the cost of overhead from the additional shifts. Both this and the simpler futures that we describe in section 6 have their uses, but we believe that the futures native to Seax are more intuitive in general.

B Proofs of Type Correctness
In sections 5, 7 and 8, we present type-correctness theorems for our reconstructions of various concurrency primitives, but omit the details of the proofs. Here, we present those details.
Functions. We derive the typing rules as follows, making use of lemma 2 to use the admissible seqcut rule. We omit the conditions on modes for cut, as we only have one mode:

Fork/Join. Due to the length of the process term that defines z. P ⋆ | Q ⋆ , we elide portions of it throughout the derivation below, and we will write P ′ for P ⋆ [x ′ .shift(x)//⋆], and similarly Q ′ for Q ⋆ [y ′ .shift(y)//⋆]. With these abbreviations, we have the following derivation for the R rule, where the dashed inferences are made via lemma 1. The left rule is much more straightforward, since this encoding makes the writer of the pair rather than the reader responsible for synchronization.

Note that unlike the rules for {·} or for many of the constructs in previous sections, those for ∧ are not only admissible: they are derivable.