The Geometry of Bayesian Programming

We give a geometry of interaction model for a typed lambda-calculus endowed with operators for sampling from a continuous uniform distribution and soft conditioning, namely a paradigmatic calculus for higher-order Bayesian programming. The model is based on the category of measurable spaces and partial measurable functions, and is proved adequate with respect to both a distribution-based and a sampling-based operational semantics.

• Flexibility. The model we present is quite flexible, in the sense of being able to reflect the operational behaviour of programs as captured by both the distribution-based and the sampling-based semantics.
• Intuitiveness. GoI visualises the structure of programs in terms of graphs, from which dependencies between subprograms can be analysed. Adequacy of our model provides a diagrammatic reasoning principle for observational equivalence of PCFSS programs.
This paper's contributions, besides the model's definition, are two adequacy results which precisely relate our GoI model to the operational semantics, as expressed (following [28]) in both the distribution and sampling styles. As a corollary of our adequacy results, we show that the distribution induced by the sampling-based operational semantics coincides with the one given by the distribution-based operational semantics.

Turning Measurable Spaces into a GoI Model
Before entering into the details of our model, it is worthwhile to give some hints about how the proposed model is obtained, and why it differs from similar GoI models from the literature.
The thread of work the proposed model stems from is the one of so-called memoryful geometry of interaction [29,30]. The underlying idea of this paper is precisely the same: program execution is modelled as an interaction between the program and its environment, and memoisation takes place inside the program as a result of the interaction.
In the previous work on memoryful GoI by the second author with Hasuo and Muroya, the goal consisted in modelling a λ-calculus with algebraic effects. Starting from a monad together with some algebraic effects, they gave an adequate GoI model for such a calculus, which is applicable to a wide range of algebraic effects. In principle, then, their recipe could be applicable to PCFSS, since the sampling-based operational semantics enables us to see scoring and sampling as algebraic effects acting on global states. However, that recipe would not work for PCFSS, since the category Meas of measurable spaces is not cartesian closed, and we thus cannot define a state monad by way of the exponential S ⇒ S × (−).
In this paper, we sidestep this issue by a series of translations, to be described in Section 4 below. Instead of looking for a state monad on Meas, we embed Meas into the category Mealy of Int-objects and Mealy machines (Section 5) and use a state monad on this category. This is doable because Mealy is a compact closed category given by the Int-construction [27]. The use of such compact closed categories (or, more generally, of traced monoidal categories) is the way GoI models capture higher-order functions.

Outline
The rest of the paper is organised as follows. After giving some necessary measure-theoretic preliminaries in Section 2 below, we introduce in Section 3 the language PCFSS, together with the two kinds of operational semantics we were referring to above. In Section 4, we introduce our GoI model informally, while in Section 5 a more rigorous treatment of the involved concepts is given, together with the adequacy results. We discuss in Section 10 an alternative way of giving a GoI semantics to PCFSS based on s-finite kernels, and we conclude in Section 12.

Measure-Theoretic Preliminaries
We recall some basic notions in measure theory that will be needed in the following. We also fix some useful notations. For more about measure theory, see standard text books such as [31].
A σ-algebra on a set X is a family Σ consisting of subsets of X such that ∅ ∈ Σ; if A ∈ Σ, then the complement X \ A is in Σ; and for any family {A_n ∈ Σ}_{n∈N}, the intersection ∩_{n∈N} A_n is in Σ. A measurable space X is a set |X| equipped with a σ-algebra Σ_X on |X|. We often confuse a measurable space X with its underlying set |X|. For example, we simply write x ∈ X instead of x ∈ |X|. For measurable spaces X and Y, we say that a partial function f : X → Y (in this paper, we use → for both partial functions and total functions) is measurable when for all A ∈ Σ_Y, the inverse image {x ∈ X : f(x) is defined and belongs to A} is in Σ_X. A measurable function from X to Y is a totally defined partial measurable function. A (partial) measurable function f : X → Y is invertible when there is a measurable function g : Y → X such that g ∘ f and f ∘ g are identities. In this case, we say that f is an isomorphism from X to Y and that X is isomorphic to Y. We denote the singleton set {*} by 1, and we regard the latter as a measurable space by endowing it with the trivial σ-algebra. We also regard the empty set ∅ as a measurable space in the obvious way. In this paper, N denotes the measurable space of all non-negative integers equipped with the σ-algebra consisting of all subsets of N, and R denotes the measurable space of all real numbers equipped with the σ-algebra consisting of Borel sets, that is, the least σ-algebra that contains all open subsets of R. By the definition of Σ_R, a function f : R → R is measurable whenever f⁻¹(U) ∈ Σ_R for all open subsets U ⊆ R. Therefore, all continuous functions on R are measurable.
When Y is a subset of the underlying set of a measurable space X, we can equip Y with the σ-algebra Σ_Y = {A ∩ Y : A ∈ Σ_X}. This way, we regard the unit interval [0, 1] and the set of all non-negative real numbers as measurable spaces. For measurable spaces X and Y, we define the product measurable space X × Y and the coproduct measurable space X + Y on the underlying sets |X| × |Y| and |X| + |Y|, where the underlying σ-algebras are: Σ_{X×Y} is the least σ-algebra such that A × B ∈ Σ_{X×Y} for all A ∈ Σ_X and B ∈ Σ_Y, and Σ_{X+Y} = {A + B : A ∈ Σ_X and B ∈ Σ_Y}. We assume that × has higher precedence than +, i.e., we write X + Y × Z for X + (Y × Z). In this paper, we always regard the finite power R^n as a product measurable space. It is well-known that the σ-algebra Σ_{R^n} is the set of all Borel sets, i.e., Σ_{R^n} is the least σ-algebra that contains all open subsets of R^n. Partial measurable functions are closed under composition, products and coproducts. Let X be a measurable space. A measure µ on X is a function from Σ_X to [0, ∞], the set of all non-negative real numbers extended with ∞, such that • µ(∅) = 0; and • for any mutually disjoint family {A_n ∈ Σ_X}_{n∈N}, we have ∑_{n∈N} µ(A_n) = µ(∪_{n∈N} A_n). We say that a measure µ on X is finite when µ(X) < ∞ and that it is σ-finite if X = ∪_{n∈N} X_n for some family {X_n ∈ Σ_X}_{n∈N} satisfying µ(X_n) < ∞. For a measurable space X, we write ∅_X for the measure on X given by ∅_X(A) = 0 for all A ∈ Σ_X. If µ is a measure on a measurable space X, then for any non-negative real number a, the function (aµ)(A) = a(µ(A)) is also a measure on X. The Borel measure µ_Borel on R^n is the unique measure that satisfies µ_Borel([a_1, b_1] × · · · × [a_n, b_n]) = ∏_{1≤i≤n} |a_i − b_i|.
We define the Borel measure µ_Borel on 1 by µ_Borel(1) = 1. For a measurable function f : R^n → R and a measurable subset X ⊆ R^n, we write ∫_X f(x) dx for the integral of f with respect to the Borel measure restricted to X. For a measurable space X and an element x ∈ X, the Dirac measure δ_x on X is given by δ_x(A) = [x ∈ A]. The square bracket notation on the right-hand side is called Iverson's bracket. In general, for a proposition P, we have [P] = 1 when P is true and [P] = 0 when P is false.
Proposition 2.1. For all σ-finite measures µ on a measurable space X and ν on a measurable space Y, there is a unique measure µ × ν on X × Y such that (µ × ν)(A × B) = µ(A)ν(B) for all A ∈ Σ_X and B ∈ Σ_Y.
The measure µ × ν is called the product measure of µ and ν. For example, the Borel measure on R^2 is the product measure of the Borel measure on R with itself.
Finally, let us recall the notion of a kernel, which is a well-known concept in the theory of stochastic processes. For measurable spaces X and Y, a kernel from X to Y is a function k : X × Σ_Y → [0, ∞] such that for any x ∈ X, the function k(x, −) is a measure on Y, and for any A ∈ Σ_Y, the function k(−, A) is measurable. Notions of finite and σ-finite kernels can be naturally given, following the eponymous constraints on measures. Those kernels which can be expressed as the sum of countably many finite kernels are said to be s-finite [32]. We use kernels to give semantics for our probabilistic programming language, to be defined in the next section.
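A kernel can be sketched in the same predicate-based encoding (again our own illustration, with hypothetical names): a kernel is a two-argument function sending a point and a "measurable set" to a mass, and an s-finite kernel is a countable sum of finite ones.

```python
# A toy encoding of kernels: k(x, A) with A a predicate. Names are ours.

def deterministic_kernel(f):
    """The kernel induced by a measurable function f: k(x, A) = [f(x) in A],
    i.e. each x is sent to the Dirac measure at f(x)."""
    return lambda x, A: 1.0 if A(f(x)) else 0.0

def sum_kernels(ks):
    """Pointwise sum of a (here finite) list of kernels; an s-finite kernel
    is such a sum of countably many finite kernels."""
    return lambda x, A: sum(k(x, A) for k in ks)

square = deterministic_kernel(lambda x: x * x)
```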

Syntax and Type System
Our language PCFSS for higher-order Bayesian programming can be seen as Plotkin's PCF endowed with real numbers, measurable functions, sampling from the uniform distribution on the unit interval, and soft conditioning. Here, x varies over a countably infinite set of variable symbols, and a varies over the set R of all real numbers. Each function identifier F is associated with a measurable function fun_F from R^{|F|} to R. For terms M and N, we write M{N/x} for the capture-avoiding substitution of N for x in M. Terms in PCFSS are restricted to A-normal forms, in order to make some of the arguments on our semantics simpler. This restriction is harmless for the language's expressive power, thanks to the presence of let-bindings. For example, term application M N can be defined to be let x be M in let y be N in x y.
The term constructor score and the constant sample enable probabilistic programming in PCFSS. Evaluation of score(r_a) has the effect of multiplying the weight of the current probabilistic branch by |a|, this way enabling a form of soft conditioning. The constant sample generates a real number randomly drawn from the uniform distribution on [0, 1]. One sampling mechanism is sufficient because we can model sampling from other standard distributions by composing sample with measurable functions [33].
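One standard way to realise such compositions is the inverse-CDF transform. The following Python sketch (function names are ours, not the paper's) turns a uniform draw into an exponential or a Bernoulli one by post-composing with a measurable function:

```python
import math

def exponential_from_uniform(u, rate=1.0):
    """Inverse-CDF transform: if u is drawn uniformly from [0, 1), then
    -log(1 - u) / rate follows the exponential distribution with the
    given rate."""
    return -math.log(1.0 - u) / rate

def bernoulli_from_uniform(u, p=0.5):
    """A coin flip with bias p as a measurable function of a uniform draw."""
    return 1 if u < p else 0
```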
Terms can be typed in a natural way. A context ∆ is a finite sequence consisting of pairs of a variable and a type such that every variable appears in ∆ at most once. A type judgement is a triple ∆ ⊢ M : A consisting of a context ∆, a term M and a type A. We say that a type judgement is derivable when it can be derived by means of the typing rules in Figure 1. Here, the type of sample is Real, and the type of score(V) is Unit, because sample returns a real number, and the purpose of scoring is its side effect.
In the sequel, we only consider derivable type judgements and typable closed terms, that is, closed terms M such that ⊢ M : A is derivable for some type A.

Distribution-Based Operational Semantics
We define the distribution-based operational semantics following [28], where, however, a σ-algebra on the set of terms is necessary so as to define the evaluation results of terms as distributions (i.e., measures) over values. In this paper, we only consider the evaluation of terms of type Real and avoid introducing σ-algebras on sets of closed terms, thus greatly simplifying the overall development.
The distribution-based operational semantics is a function that sends a closed term M : Real to a measure µ on R. Because of the presence of score, the measure may not be a probability measure, i.e., µ(R) may be larger than 1, but the idea of the distribution-based operational semantics is precisely that of associating each closed term of type Real with a measure over R.
As is common in call-by-value programming languages, evaluation is defined by way of evaluation contexts. The distribution-based operational semantics of PCFSS is a family of binary relations {⇒_n}_{n∈N} between closed terms of type Real and measures on R, inductively defined by the evaluation rules in Figure 2, where the evaluation rule for score is inspired by the one in [32]. The binary relation →_red in the precondition of the third rule in Figure 2 is called deterministic reduction and is defined as a relation on closed terms. The last evaluation rule in Figure 2 makes sense because k in the precondition is a kernel from [0, 1] to R:

Lemma 3.1. For any n ∈ N and for any term x_1 : Real, . . . , x_m : Real ⊢ M : Real, there is a finite kernel k from R^m to R such that for any u = (a_1, . . . , a_m) ∈ R^m and for any measure µ on R, if M{r_{a_1}/x_1, . . . , r_{a_m}/x_m} ⇒_n µ, then µ = k(u, −).
Proof. Let ∆ be a context of the form x_1 : Real, . . . , x_m : Real. In this proof, for a finite sequence u = (a_1, . . . , a_m) ∈ R^m and for a term ∆ ⊢ M : A, we write M{r_u/∆} for M{r_{a_1}/x_1, . . . , r_{a_m}/x_m}. We prove the statement by induction on n ∈ N. (Base case) Let k be the kernel from R^m to R given by k(u, A) = 0.
Then for any u = (a 1 , . . . , a m ) ∈ R m , (Induction step) We define a redex R by We note that V, W in the above BNF can be variables. By induction on the size of type derivation, we can show that every term ∆ ⊢ M : A is either a value or of the form E[R] for some evaluation context E[−] and some redex R. Given a term ∆ ⊢ M : A where ∆ = x 1 : Real, . . . , x m : Real, we prove the induction step by case analysis.
• If ∆ ⊢ M : Real is a value, then M is either a variable x i or a constant r a . When M is a variable When M is a constant r a , we have , then by induction hypothesis, there is a kernel from R m+1 to R such that for any u ∈ R m+1 , We define a kernel h from R m to R by This is a kernel because if f : R × · · · × R → R is a non-negative measurable function, then for some i ∈ {1, 2, . . . , m}, then by induction hypothesis, there is a kernel k from R m to R such that for any u ∈ R m , We define a kernel h : R m to R by Then, for any u = (a 1 , . . . , a m ) ∈ R m , for some a ∈ R, then by induction hypothesis, there is a kernel k from R m to R such that for any u ∈ R m , We define a kernel h : Then, for any u = (a 1 , . . . , a m ) ∈ R m , , then by induction hypothesis, there is a kernel k from R m to R such that for all u ∈ R m , Hence, Hence, , then V i is equal to either a variable or a constant r a . For simplicity, we suppose that |F| = 2 and V 1 = x i and V 2 = r a . By induction hypothesis, there is a kernel from R m+1 to R such that for all u ∈ R m+1 , We define a kernel h from R m to R by h((a 1 , . . . , a m ), A) = k((a 1 , . . . , a m , fun F (a i , a)), A).
Then, for any u = (a 1 , . . . , a m ) ∈ R m , • If ∆ ⊢ M : Real is of the form let x be V in N, then by induction hypothesis, there is a kernel k from R m to R such that for all u ∈ R m , Hence, . . , m}, then by induction hypothesis, there are kernels k and k ′ from R m to R such that for any u ∈ R m , We define a kernel h from R m to R by where u = (a 1 , . . . , a n ).
Then, for any u ∈ R^m, • If ∆ ⊢ M : Real is of the form E[ifz(r_0, N, L)], then by induction hypothesis, there is a kernel k from R^m to R such that for any u ∈ R^m, Hence, • If ∆ ⊢ M : Real is of the form E[ifz(r_a, N, L)] for some real number a ≠ 0, then by induction hypothesis, there is a kernel k from R^m to R such that Hence,

The step-indexed distribution-based operational semantics approximates the evaluation of closed terms by restricting the number of reduction steps. Thus, the limit of the step-indexed distribution-based operational semantics represents the "true" result of evaluating the underlying term. The binary relation ⇒_∞ is a function from the set of closed terms of type Real to the set of measures on R. This follows from Lemma 3.1 and from the fact that the family of measures {µ_n}_{n∈N} on R such that M ⇒_n µ_n forms an ascending chain µ_0 ≤ µ_1 ≤ · · · with respect to the pointwise order. Moreover, it can be proved that for any x_1 : Real, . . . , x_m : Real ⊢ M : Real, the function k given by M{r_{a_1}/x_1, . . . , r_{a_m}/x_m} ⇒_∞ k((a_1, . . . , a_m), −) is an s-finite kernel.

Sampling-Based Operational Semantics
PCFSS can be endowed with another form of operational semantics, closer in spirit to inference algorithms, called the sampling-based operational semantics. The way we formulate it is deeply inspired by the one in [28].
The idea behind the sampling-based operational semantics is to describe the evaluation result of each probabilistic branch independently. We specify each probabilistic branch by two parameters: one is a sequence of random draws, which will be consumed by sample; the other is a likelihood value called the weight, which will be modified by score. Below, we write ε for the empty sequence. For a real number a and a finite sequence u consisting of real numbers, we write a :: u for the finite sequence obtained by putting a at the head of u. In Figure 3, we give the evaluation rules of the sampling-based operational semantics, where →_red is the deterministic reduction relation introduced in the previous section. We denote the reflexive transitive closure of → by →*. Intuitively, (M, 1, u) →* (r_a, b, ε) means that by evaluating M, we get the real number a with weight b, consuming all the random draws in u.
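The behaviour of configurations (M, w, u) can be mimicked by a toy big-step evaluator. The Python sketch below covers only a minimal first-order fragment of our own devising (terms as tagged tuples, let-bodies as Python functions); it is an illustration of the weight/trace discipline, not the paper's formal semantics:

```python
def eval_conf(term, weight, trace):
    """Big-step sampling-based evaluation of a tiny term fragment.
    A configuration is (term, weight, trace): sample consumes the head
    of the trace, score multiplies the weight by |a|."""
    kind = term[0]
    if kind == 'const':                  # constants evaluate to themselves
        return term[1], weight, trace
    if kind == 'sample':                 # pop one random draw from the trace
        return trace[0], weight, trace[1:]
    if kind == 'score':                  # multiply the current weight
        a, weight, trace = eval_conf(term[1], weight, trace)
        return 'skip', weight * abs(a), trace
    if kind == 'let':                    # let x be M in N, with N a function of x
        a, weight, trace = eval_conf(term[1], weight, trace)
        return eval_conf(term[2](a), weight, trace)
    raise ValueError(kind)

# M = let x be sample in score(x)
M = ('let', ('sample',), lambda x: ('score', ('const', x)))
```

Evaluating `M` with weight 1 and trace (0.25,) yields ('skip', 0.25, ()), mirroring (M, 1, 0.25 :: ε) →* (skip, 0.25, ε).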

Towards Mealy Machine Semantics
In this section, we give some intuitions about our GoI model, which we also call Mealy machine semantics. Giving Mealy machine semantics for PCFSS requires translating PCFSS into the linear λ-calculus. This is because GoI is a semantics for linear logic, and is thus tailored for calculi in which terms are treated as resources. Schematically, Mealy machine semantics for PCFSS translates terms in PCFSS into Mealy machines in the following way.
In Section 4.1, we explain the first three steps. The last step deserves to be explained in more detail, which we do in Section 4.2. For the sake of simplicity, we ignore the translation of conditional branching and the fixed point operator.

Moggi's Translation
In the first step, we translate PCFSS into an extension of Moggi's meta-language via Moggi's translation [34]. Here, in order to translate scoring and sampling in PCFSS, we equip Moggi's meta-language with base types Unit and Real and with the following terms, where T is the monad of Moggi's meta-language. Any type A of PCFSS is translated into the type A^♯ defined as follows: Terms sample and score(−) in PCFSS are translated into sample and score(−) in Moggi's meta-language, respectively. See [34] for more details about Moggi's translation.

Girard Translation
We next translate the extended Moggi's meta-language into an extension of the linear λ-calculus, by way of the so-called Girard translation [35]. Types are given as follows, where Unit, Real and State are base types, and terms are generated by the standard term constructors of the linear λ-calculus, plus the following rules (as customary in linear logic, A ⊸ B is an abbreviation of A^⊥ ℘ B). These typing rules are derived from the following translation (−)^♭ of types of the extended Moggi's meta-language into types of the extended linear λ-calculus: The definition of (T A)^♭ is motivated by the following categorical observation: let L be the syntactic category of the extended linear λ-calculus, which is a symmetric monoidal closed category endowed with a comonad ! : L → L satisfying certain coherence conditions (see e.g. [36]), and let L_! be the coKleisli category of the comonad !. Then, by composing the adjunction between L and L_! with the state monad State ⊸ State ⊗ (−) on L, we obtain a monad on L_! which sends an object A ∈ L_! to State ⊸ State ⊗ !A. This use of the state monad is motivated by the sampling-based operational semantics: we can regard PCFSS as a call-by-value λ-calculus with global states consisting of pairs of a non-negative real number and a finite sequence of real numbers, and we can regard score and sample as effectful operations interacting with those states.
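The state monad just described can be sketched in Python, with a state being a pair of a weight and a trace of random draws, as suggested by the sampling-based semantics. The names `ret`, `bind`, `sample_op` and `score_op` are ours; this is an analogy for intuition, not the categorical construction itself:

```python
# State = (weight, trace). A computation of type T A is modelled as a
# function State -> (State, A). All names here are our own.

def ret(a):
    """Return: leave the state untouched."""
    return lambda state: (state, a)

def bind(m, f):
    """Sequencing: run m, then feed its result to f."""
    def run(state):
        state1, a = m(state)
        return f(a)(state1)
    return run

def sample_op():
    """Pop the head of the trace (partial: undefined on an empty trace)."""
    return lambda state: ((state[0], state[1][1:]), state[1][0])

def score_op(a):
    """Multiply the weight by |a| and return the unit value."""
    return lambda state: ((state[0] * abs(a), state[1]), 'skip')

# let x be sample in score(x)
prog = bind(sample_op(), lambda x: score_op(x))
```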

The Third Step
We translate terms in the extended linear λ-calculus into (an extension of) proof structures [37], which are graphical presentations of type derivation trees of linear λ-terms. We can also understand proof structures as string diagrams for compact closed categories [38]. Operators of the pure linear λ-calculus can be translated as usual [37]. For example, type derivation trees are translated as shown below, respectively, where nodes labelled with M and N are the proof structures associated to the type derivations of M and N. Terms of the form r_a, sample(M) and score require new kinds of nodes. This is not a direct adaptation of the typing rules for score and sample in the linear λ-calculus, but the correspondence can be recovered by way of multiplicatives:

From Proof Structures to Mealy Machines
The series of translations from PCFSS to proof structures is agnostic as to the computational meaning of score and sample in PCFSS, because the score and sample introduced in these translations are just constant symbols. In other words, the translation from PCFSS to the extended proof structures is not sound with respect to either form of operational semantics for PCFSS.
In the last translation step, we assign proof structures a computational meaning, respecting the operational semantics of the underlying PCFSS term. We do this by associating proof structures with Mealy machines. A Mealy machine is an input/output-machine whose evolution may depend on its current state. In this paper, for the sake of supporting intuition and of enabling graphical reasoning, we depict a Mealy machine M as a node with some input/output-ports: For example, the thick arrow in the middle diagram indicates that if the current state is s and the given input is x, then the Mealy machine outputs y and changes its state to t. In the GoI jargon, data traveling along edges of proof structures are often called tokens.
For the standard proof structures, we can follow [39], where Mealy machines associated with proof structures are built up from Mealy machines associated to each node. For example, the following nodes are both associated with a one-state Mealy machine that behaves in the following manner: the Mealy machine forwards each input from the left hand side to the right hand side, endowing it with a tag that tells where the token came from, and it handles inputs from the right hand side in the reverse way. Soundness of Mealy machine semantics states that if two (pure) linear λ-terms are β-equivalent, then the behaviours of the Mealy machines associated to these terms are the same. As an example, let us consider a β-reduction step (λx^A. x) y → y. The proof structure associated to (λx^A. x) y is the graph on the left hand side, and the arrow on the right hand side illustrates a trace of a run of this Mealy machine for an input a from the right edge. This Mealy machine forwards any input from the right hand side to the left hand side as indicated by the thick arrow, and it also forwards any input from the left hand side to the right hand side. Hence, the behaviour of this Mealy machine is equivalent to the behaviour of the trivial Mealy machine which is the interpretation of y : A ⊢ y : A. This is in fact a symptom of a general phenomenon: Mealy machine semantics for the linear λ-calculus captures β-reduction.
But how can we extend this Mealy machine semantics to score and sample? Here, we borrow an idea from game semantics [40], which models computation in terms of interaction between programs and environments. For scoring and sampling, we can infer how they interact with the environment from the sampling-based operational semantics. For scoring, we associate score with a one-state Mealy machine that has the following transitions, where u is a finite sequence of real numbers and a, b are real numbers such that a ≥ 0. We can read these transitions as follows: for each "configuration" (−, a, u), the Mealy machine sends a query (a, u) to the environment in order to know the value of its argument, and if the environment answers that the value is b, i.e., if the Mealy machine receives (a, b :: u), then it outputs (|b| a, u), which is the evaluation result of (score(r_b), a, u).
For sampling, we associate sample with a Mealy machine that has the following transitions, where u is a finite sequence of real numbers and a, b are real numbers such that a ≥ 0. The first transition means that in the initial state *, given a "configuration" (−, a, b :: u), the Mealy machine pops the first element of b :: u and memorises the value b by changing its state from * to b. After this transition, for any query (a, u) asking the result of sampling, it answers the value memorised in the first transition. For example, the Mealy machine that is the denotation of the term M = let x be sample in score(x) behaves as follows: Our adequacy theorem says that the evaluation result of a term coincides with the execution result of the associated Mealy machine. In fact, in this case, the outcome (|b| a, u) of the above Mealy machine is equal to the evaluation result of (M, a, b :: u), that is, (M, a, b :: u) →* (skip, |b| a, u). In this interaction process, the memoisation mechanism of the sa-node is necessary: otherwise, the sa-node could not tell the sc-node that the result of sampling is b.
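The interaction between the sc-node and the sa-node can be simulated directly. The Python classes below are an informal illustration of the transitions just described (class and method names are ours, and the query/answer protocol is compressed into a method call), not the formal token-machine model:

```python
class SampleNode:
    """The sa-node: on its first activation it pops the head of the trace
    and memoises it; afterwards it answers every query with the
    memoised value."""
    def __init__(self):
        self.state = None                     # the state '*' before sampling
    def step(self, a, u):
        if self.state is None:
            self.state, u = u[0], u[1:]       # pop b from b :: u and memoise it
        return self.state, a, u               # answer the memoised value

class ScoreNode:
    """The sc-node: queries its argument for a value b, then outputs the
    configuration (|b| a, u)."""
    def __init__(self, argument):
        self.argument = argument
    def step(self, a, u):
        b, a, u = self.argument.step(a, u)    # send the query (a, u), get b
        return abs(b) * a, u                  # the evaluation result (|b| a, u)

# Denotation of M = let x be sample in score(x), informally.
machine = ScoreNode(SampleNode())
```

On input (2.0, (0.5,)) the composite machine outputs (1.0, ()), matching (M, 2, 0.5 :: ε) →* (skip, 1, ε); querying a SampleNode twice returns the same memoised draw.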
Remark 4.1. Two notions of state (the one coming from the state monad and the one of the Mealy machine itself) are used for different purposes here: the first notion is needed to model the call-by-value evaluation strategy, where we need to store intermediate effects that are invoked during the evaluation. The second notion of state is needed to model sampling. More concretely, each Mealy machine for sampling needs to remember the already sampled values in the current probabilistic branch.

Mealy Machines and their Compositions
After having described Mealy machine semantics briefly and informally, it is now time to get more formal. In this section, we introduce the notion of a Mealy machine and some constructions on Mealy machines. We also introduce a way of diagrammatically presenting Mealy machines which is behaviourally sound.

Mealy Machines, Formally
In this paper, we call a pair of measurable spaces an Int-object. We use sans-serif capital letters X, Y, Z, . . . to denote Int-objects, and we denote the positive/negative part of an Int-object by the same italic letter superscripted by +/−. For example, X denotes an Int-object (X^+, X^−) consisting of two measurable spaces X^+ and X^−. The name "Int-object" comes from the so-called Int-construction [26]. Definition 5.1 and the definition of monoidal products in Section 5.4 are also motivated by the Int-construction.
Definition 5.1. A Mealy machine M from X to Y, written M : X ⊸ Y, consists of:
• a measurable space S_M of states;
• an element s_M ∈ S_M called the initial state of M;
• a partial measurable function τ_M : (X^+ + Y^−) × S_M → (Y^+ + X^−) × S_M called the transition function of M.
The transition function τ_M of a Mealy machine M describes a mapping between inputs and outputs which can also alter the underlying state. For x ∈ X^+ + Y^− and s ∈ S_M, τ_M(x, s) = (y, t) means that when the current state of M is s, given an input x, there is an output y and the next state is t.
Readers may wonder why X^− appears in the target and Y^− appears in the source of the transition function of a Mealy machine from X to Y. In short, this is because we are interested in Mealy machines that handle bidirectional computation. The diagrammatic presentation of Mealy machines clarifies the meaning of "bidirectional." Let M : X ⊸ Y be a Mealy machine. In this paper, we depict M as a node whose edges are labelled by X and Y. Intuitively, each label on an edge indicates the type of data traveling along the edge. Namely, on the X-edge (on the Y-edge), elements in X^+ (in Y^+) go from left to right, and elements in X^− (in Y^−) go from right to left. For example, we depict transitions as arrows on the diagram (recall that the white/black bullet indicates the left/right part of the disjoint sum). The expressions s_0/s_1 and s_0/s_2 on the Mealy machine M stand for transitions of states. We omit state transitions when we can infer them. We will give some Mealy machines whose state spaces are trivial, namely 1. We call such a Mealy machine a token machine. Our usage of the term token machine is along the lines of that in other papers on GoI, such as [41,39]. Since we can identify the transition function of a token machine M : X ⊸ Y with a partial measurable function from X^+ + Y^− to Y^+ + X^−, giving a partial measurable function of this type is enough to specify a token machine.
Convention 5.1. We define a token machine M : X ⊸ Y by giving a partial measurable function from X^+ + Y^− to Y^+ + X^−, and we also call this partial measurable function the transition function of M. Abusing notation, we write τ_M for this transition function.
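As an illustration (our own toy encoding, not the paper's formalism), a token machine can be represented as a Python function on tagged tokens: ('l', x) is a token in X^+ entering on the left, ('r', y) a token in Y^− entering on the right, and the output tag records on which side the token exits.

```python
# Toy encoding of token machines: a function on tagged tokens.
# Input  ('l', x): an element of X+ arriving on the left.
# Input  ('r', y): an element of Y- arriving on the right.
# Output ('r', y): an element of Y+ leaving on the right.
# Output ('l', x): an element of X- leaving on the left.

def identity_machine(side, value):
    """The token machine id_X: forward every token to the opposite side."""
    return ('r', value) if side == 'l' else ('l', value)

def lifted(f, g):
    """A token machine applying f on left-to-right tokens and g on
    right-to-left tokens (both assumed measurable)."""
    return lambda side, value: ('r', f(value)) if side == 'l' else ('l', g(value))
```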

Behavioural Equivalence
We are now ready to give an equivalence relation between Mealy machines which identifies machines behaving the same way. Identifying Mealy machines in terms of their behaviour is important in order to reason about compositions of Mealy machines in the remainder of this paper. Here, we are inspired by the notion of behavioural equivalence from the coalgebraic theory of transition systems [42].
Let M and N be Mealy machines from X to Y. We write M ≲_{X,Y} N when there is a measurable function f : S_M → S_N preserving the initial state and compatible with the transition functions. The definition means that if we have M ≲_{X,Y} N, then no observer can distinguish between M and N from their input/output behaviour, although their internal structures can be quite different. We define an equivalence relation ≃_{X,Y} to be the reflexive symmetric transitive closure of ≲_{X,Y}. Below, if we can infer the subscripts X, Y from the context, we write ≃ instead of ≃_{X,Y}. Proposition 5.1. The set of equivalence classes for ≃_{X,Y} with ≤ is a pointed ω-cpo.
We can characterize the interpretation of the fixed point operator in PCFSS in terms of least fixed points; see [43]. We give a proof of Proposition 5.1 in Section 5.3.

Proof of Proposition 5.1
For partially defined expressions E and E′, we write E ≈ E′ when E is defined if and only if E′ is defined, and if both expressions are defined, then they are the same. For a measurable space X, we write LX for the measurable space of all finite sequences over X endowed with the following σ-algebra: We write ε for the empty sequence. For a ∈ X and u ∈ LX, we denote the list obtained by putting a at the head of u by a :: u.
Let M : X ⊸ Y be a Mealy machine. We write Z for X + + Y − and W for Y + + X − . Then, the transition function of M is of the form We define partial measurable functions α M : LZ → S M and β M : Below, for x ∈ W × S M , we write fst(x) for the first entry of x, and we write snd(x) for the second entry of x. By the definition of α M and β M , we have For a Mealy machine M : Here, the σ-algebra of S M @ is the one induced by Σ 1+S M via the obvious bijection between 1 + S M and {⋄} ∪ S M .
Proof. It is straightforward to check that the embedding e : We show that for any (z, u) ∈ Z × LZ, Hence, each equivalence class [M] of behavioural equivalence is represented by M #, and M # is independent of the choice of M. We extend this correspondence to the order-theoretic structure of Mealy machines.
Proof. By induction on the size of u ∈ LZ, we can show that if α M (u) is defined, then α N (u) is defined and they are the same. Then τ M # ≤ τ N # follows from the definition of (−) # .
Theorem 5.2. The set of equivalence classes of Mealy machines from X to Y with the partial order ≤ is an ω-cpo.

Proof. Let [N] be an upper bound of an ω-chain
We define a Mealy machine L :

Constructions on Mealy Machines
It is now time to give some constructions which are the basic building blocks of our Mealy machine semantics. This section consists of three parts. The first part (from Section 5.4.2 to Section 5.4.5) is related to the linear λ-calculus and serves to model the purely functional features of PCFSS, such as λ-abstraction and function application. In the second part (Section 5.4.6 and Section 5.4.7), we give Mealy machines modelling real numbers and measurable functions. In the last part (from Section 5.4.9 to Section 5.4.11), we introduce a state monad and associate the monad with Mealy machines modelling score and sample.

Composition
Let X, Y and Z be Int-objects, and let M : X ⊸ Y and N : Y ⊸ Z be Mealy machines. The composite N ∘ M : X ⊸ Z is obtained by letting M and N exchange tokens over the shared interface Y, and the above join is taken with respect to the inclusion order between graph relations. The above join is measurable because measurable sets are closed under countable joins. It is tedious but doable to check that the above join always exists and that the composition is compatible with behavioural equivalence and satisfies associativity modulo behavioural equivalence. We define a Mealy machine id_X : X ⊸ X by τ_{id_X} = id_{X^+ + X^−}. This is the unit of the composition modulo behavioural equivalence.
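The token-exchange reading of composition can be sketched in the same toy encoding of token machines as functions on tagged tokens used earlier ('l'/'r' mark the side on which a token enters or exits; all names are ours). The composite bounces the token across the shared Y interface until it exits at X^− or Z^+, which mirrors how GoI composition iterates the two machines:

```python
def compose(tm_m, tm_n):
    """Compose token machines M : X -o Y and N : Y -o Z by passing the
    token back and forth over the shared Y interface until it exits."""
    def tm(side, value):
        # Left inputs (X+) enter M; right inputs (Z-) enter N.
        where, token = ('m', ('l', value)) if side == 'l' else ('n', ('r', value))
        while True:
            if where == 'm':
                out = tm_m(*token)
                if out[0] == 'l':                   # token exits at X-
                    return out
                where, token = 'n', ('l', out[1])   # a Y+ token enters N
            else:
                out = tm_n(*token)
                if out[0] == 'r':                   # token exits at Z+
                    return out
                where, token = 'm', ('r', out[1])   # a Y- token enters M

    return tm

identity = lambda side, v: ('r', v) if side == 'l' else ('l', v)
inc = lambda side, v: ('r', v + 1) if side == 'l' else ('l', v - 1)
```

For instance, composing `inc` with itself sends a left input 0 to a right output 2, while composing `identity` with anything leaves its behaviour unchanged, reflecting that id_X is the unit of composition.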

Monoidal Products
Monoidal Products of Int-objects We introduce monoidal products of Int-objects and their diagrammatic presentation. For Int-objects X and Y, we define an Int-object X ⊗ Y by We define an Int-object I to be (∅, ∅). We write X ⊗ Y ⊗ · · · for X ⊗ (Y ⊗ · · · ). Let X 1 , . . . , X n , Y 1 , . . . , Y m be Int-objects. We depict a Mealy machine M from X 1 ⊗ · · · ⊗ X n to Y 1 ⊗ · · · ⊗ Y m as a node with edges labeled by X 1 , . . . , X n on the left hand side and edges labeled by Y 1 , . . . , Y m on the right hand side:   , (•, x)), s) = ((•, (•, · · · (•, y ′ ))), t ′ ), and s, t, t ′ ∈ S M as follows: We note that there are several ways to present a Mealy machine M : Monoidal Product of Mealy Machines We give monoidal products of Mealy machines. For Mealy machines M : X ⊸ Z and N : Y ⊸ W, we define a Mealy machine M ⊗ N : s N ) and τ M⊗N is given by It is not difficult to check that the monoidal product is compatible with behavioural equivalence. We depict M ⊗ N : (X ⊗ Y) ⊸ (Z ⊗ W) as follows: Convention 5.2. We make the following identifications: • We identify X⊗(Y⊗Z) with (X⊗Y)⊗Z by the canonical isomorphism X +(Y +Z) ∼ = (X +Y )+Z.
• We identify I ⊗ X and X ⊗ I with X by the unit laws X + + ∅ ∼ = X + and ∅ + X − ∼ = X − .

Axiom Link and Cut Link
For an Int-object X, we define X ⊥ to be (X − , X + ), and we define token machines unit X : I ⊸ X ⊗ X ⊥ , counit X : X ⊥ ⊗ X ⊸ I by τ unit X = id X + +X − and τ counit X = id X − +X + . We depict them by single edges respectively. This is compatible with the behaviour of these Mealy machines: if we give an input to an edge, then we get the same value from the other end of the edge. For example, for any x ∈ X + , we have

Symmetry
Let X and Y be Int-objects. We define a token machine sym X,Y : X ⊗ Y ⊸ Y ⊗ X by letting its transition function be the canonical isomorphism We depict sym X,Y by a crossing: As the arrows in the right hand side indicate, given an input from an edge on one side, sym X,Y outputs the same value to the corresponding edge on the other side.

A Modal Operator
We give a constructor on Mealy machines that corresponds to the resource modality in linear logic. For an Int-object X, we define an Int-object !X by We can informally regard !X as a countable monoidal power n∈N X ≈ X ⊗ X ⊗ · · · . Following this intuition, we extend the action of !(−) to Mealy machines. Let M : X ⊸ Y be a Mealy machine. We define a Mealy machine !M : !X ⊸ !Y by: the state space of !M is defined to be |M| N associated with the least σ-algebra such that for all A 1 , A 2 , . . . ∈ Σ M , the initial state s !M is (s M , s M , . . .); the transition function τ !M is the unique partial measurable function satisfying . In other words, given an input whose first entry is n, the nth copy of M handles the input, and there is no side effect on the other copies of M.
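The "countable copies, only the n-th copy moves" reading of !M can be sketched executably; the dictionary of per-address states lazily stands in for the infinite state space |M| N with initial state (s M , s M , . . .), and all function names are inventions of this sketch.

```python
def bang(step, init_state):
    """!M on tokens (n, x): only the n-th copy of M makes a transition.
    The dictionary lazily represents the constant sequence (s_M, s_M, ...)."""
    def bang_step(token, states):
        n, x = token                       # the address n selects a copy
        y, s2 = step(x, states.get(n, init_state))
        new_states = dict(states)          # all other copies are untouched
        new_states[n] = s2
        return (n, y), new_states          # the answer carries the address
    return bang_step
```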
These stateless Mealy machines dg X and con X behave as follows: for all n, m ∈ N, Weakening We define a token machine w X : X ⊸ I by τ w X = the empty partial function.
Because the identity is the only Mealy machine from I to I (up to behavioural equivalence), we see that for any Mealy machine M : I ⊸ X, This behavioural equivalence means that we can remove w X • M from any diagram.

Real Numbers
We define an Int-object R to be (S, S) where S is the measurable space of all finite sequences of real numbers endowed with the following σ-algebra For a ∈ R, we define a token machine r a : I ⊸ R by The transition means that, given a query u from the environment, r a answers with its value a by appending a to u. We will use u as a stack; see Section 5.4.7 and Section 5.4.10.

Measurable Functions
We associate a measurable function f : R n → R with a token machine fn f : R ⊗n ⊸ R. For simplicity, we define fn f for n = 1 and n = 2. When n = 1, the transition function τ fn f : S + S → S + S is given by We explain how fn f simulates f by describing the execution of fn f • r a for a real number a ∈ R. As in the following diagram, given an input u ∈ S from the right R-edge, fn f sends u to the left R-edge in order to obtain the value of its argument. The return value to fn f from r a is a :: u, by which fn f sees that its argument is a. Then, fn f outputs f (a) :: u. As a whole, the following Mealy machine is behaviourally equivalent to r f (a) . As in the following diagram, given an input u ∈ S from the right R-edge, fn f first sends u to the lower R-edge in the left hand side in order to obtain the value of its first argument. The return value to fn f from r a is a :: u. Next, fn f sends a :: u to the upper R-edge in the left hand side. Then r b returns b :: a :: u. Now, fn f sees that its first argument is a and its second argument is b. Finally, fn f outputs f (a, b) :: u. In the general case, f may have more arguments, and fn f sequentially sends queries to its arguments, storing partial information about them in finite sequences of real numbers.
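The stack-based query protocol of fn f can be sketched executably: here a token machine is just a function on finite sequences (tuples) of reals, `r`, `fn` and `fn2` are hypothetical names for r a and the unary/binary fn f , and wiring is function application.

```python
def r(a):
    """Token machine r_a : answer a query u with a :: u."""
    return lambda u: (a,) + u

def fn(f, arg):
    """Unary fn_f wired to `arg` on its left R-edge."""
    def step(u):
        a, *rest = arg(u)                 # query the argument: get a :: u
        return (f(a),) + tuple(rest)      # answer f(a) :: u
    return step

def fn2(f, arg1, arg2):
    """Binary fn_f: query the first argument, then the second on top of it."""
    def step(u):
        v = arg1(u)                       # a :: u
        b, a, *rest = arg2(v)             # b :: a :: u
        return (f(a, b),) + tuple(rest)   # answer f(a, b) :: u
    return step
```

As the text says, `fn(f, r(a))` answers queries exactly like `r(f(a))`: the behavioural equivalence fn f • r a ≈ r f (a) .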

Conditional Branching
For an Int-object X such that X − is a measurable subspace of S, we define cd X to be a token machine whose transition function is given by For a real number a ∈ R and Mealy machines M, N : I ⊸ X, we describe the execution of cd X • (r a ⊗ M ⊗ N). Given an input u ∈ X − , cd X checks whether a is zero or not by sending u to the R-edge. There are two cases: (i) if a is 0, then r a returns 0 :: u, and cd X forwards u to the middle X-edge; (ii) if a is not 0, say 1, then r a returns 1 :: u, and cd X forwards u to the upper X-edge:
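The dispatch performed by cd X can be sketched in the same style; which of M, N receives the query in the zero case is an assumption of this sketch (the paper settles it by the diagram), and all names are inventions.

```python
def r(a):
    return lambda u: (a,) + u             # token machine r_a, as before

def cd(guard, branch_zero, branch_nonzero):
    """cd_X composed with (r_a (x) M (x) N): ask the guard for its value,
    then forward the original query u to one of the two X-edges."""
    def step(u):
        a, *rest = guard(u)               # receive a :: u from the R-edge
        chosen = branch_zero if a == 0 else branch_nonzero
        return chosen(tuple(rest))        # forward u unchanged
    return step
```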

A State Monad
Let T be the subspace of S consisting of all finite sequences of real numbers in R [0,1] . Recall that R ≥0 × T is "the set of states" in sampling-based operational semantics and our idea is to model score and sample by a state monad. In this section, we give a state monad that we use in our Mealy machine semantics. We define Int-objects S 0 and S by Then S ⊗ (−) is a state monad (on Mealy) because for any Int-object X, we have S ⊗ X = ((S 0 ⊗ X) ⊥ ⊗ S 0 ) ⊥ . The unit and the multiplication of this monad are: where e = unit S0 and m = S 0 ⊗ counit S0 ⊗ S ⊥ 0 . Note that S is equal to S 0 ⊗ S ⊥ 0 . We can depict the unit and the multiplication as follows:

Scoring
We define sc to be a token machine from R to S whose transition function τ sc : S + R ≥0 × T → R ≥0 × T + S is given by .
As we explained in Section 4.2, the Mealy machine sa simulates the evaluation rule for configurations (sample, a, b :: u). Namely, sa pops b from the trace, and then answers each query (n, u) with the information that the result of sampling is b.
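How sc and sa act on the state (weight, trace) of the sampling-based semantics can be sketched as two pure functions; that scoring multiplies the weight by |a| is an assumption matching the description of Sc in Section 10, and all names are inventions of this sketch.

```python
def sc_step(a, weight, trace):
    """score(a): multiply the weight by |a|; the trace is untouched."""
    return abs(a) * weight, trace

def sa_step(query, weight, trace):
    """sample: pop the pre-drawn value b from the trace and answer the
    query (n, u) with (n, b :: u); the weight is untouched."""
    b, rest = trace[0], trace[1:]
    n, u = query
    return (n, (b,) + u), weight, rest
```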

Diagrammatic Reasoning
We now give a brief remark on the diagrammatic presentation of Mealy machines. The diagrammatic presentation of a Mealy machine is not only an intuitive explanation, but also a tool for rigorous reasoning about behavioural equivalence. This follows from a categorical observation. Let Mealy be the category of Int-objects and behavioural equivalence classes of Mealy machines, where composition is induced by the composition of Mealy machines. We will give a proof of the following proposition in the next section.
Proposition 5.7. The category Mealy is a compact closed category. The dual of an Int-object X is X ⊥ . The unit and the counit arrows are unit X and counit X .
Therefore, as a consequence of the coherence theorem for compact closed categories [44,38], we see that graph isomorphism preserves behavioural equivalence.
Proposition 5.8. If two Mealy machines have the same diagrammatic presentation modulo some rearrangement of edges and nodes, then they are behaviourally equivalent.

5.6
Proof of Proposition 5.7

The Category of Partial Measurable Functions
For some basic categorical notions, see standard textbooks such as [45]. We define pMeas to be the category of measurable spaces and partial measurable functions. In pMeas, the empty space ∅ is the initial object, and the coproduct space X + Y is the coproduct of X and Y in the categorical sense. We write inl and inr for the left/right injections. For partial measurable functions f : X → Y and g : Z → Y , we define [f, g] : X + Z → Y to be the cotupling of f and g. For partial measurable functions f : X → Y and g : Z → W , we define a partial measurable function f + g : X + Z → Y + W by: (f + g)(inl(x)) = inl(y) if f (x) is defined and is equal to y, and is undefined otherwise; (f + g)(inr(z)) = inr(w) if g(z) is defined and is equal to w, and is undefined otherwise. Similarly, f × g : X × Z → Y × W sends (x, z) to (y, w) if f (x) is defined and is equal to y and g(z) is defined and is equal to w, and is undefined otherwise.
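On underlying sets, these coproduct constructions can be sketched with partiality modelled as `None`; the tags `'L'`/`'R'` and all function names are inventions of this sketch.

```python
def inl(x): return ('L', x)               # left injection into X + Z
def inr(x): return ('R', x)               # right injection

def cotuple(f, g):
    """[f, g] : X + Z -> Y: dispatch on the tag."""
    return lambda t: f(t[1]) if t[0] == 'L' else g(t[1])

def plus(f, g):
    """f + g : X + Z -> Y + W, defined exactly when the selected
    component is defined (None models undefinedness)."""
    def h(t):
        if t[0] == 'L':
            y = f(t[1])
            return None if y is None else inl(y)
        w = g(t[1])
        return None if w is None else inr(w)
    return h
```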
The notion of trace introduced by Joyal, Street and Verity [26] plays an important role in this section.
Definition 5.3. Let (C, I, ⊗) be a symmetric monoidal category. A trace operator on (C, I, ⊗) is a family tr Z X,Y X,Y,Z∈C satisfying the following axioms: ).
• (Sliding) For all C-arrows f : • (Vanishing I) For all C-arrows f : X ⊗ I → Y ⊗ I, A symmetric monoidal category (C, I, ⊗) endowed with a trace operator tr is called a traced symmetric monoidal category.
We give a trace operator on (pMeas, ∅, +). The symmetric monoidal category (pMeas, ∅, +) is enriched over ωCppo, the cartesian category of pointed ω-cpos and continuous functions. The partial order on a hom-set pMeas(X, Y ) is given by: f ≤ g if and only if, whenever f (x) is defined, g(x) is also defined and f (x) = g(x).
The least arrow ⊥ X,Y : X → Y is the empty partial measurable function. The ωCppo-enrichment induces an iterator iter X,Y : pMeas(X, Y + X) → pMeas(X, Y ) given by The operator iter induces another operator given by Concretely, for a partial measurable function f :

Proposition 5.9. The family of operators tr Z X,Y X,Y,Z∈pMeas is a trace operator on the symmetric monoidal category (pMeas, ∅, +). Furthermore, the trace operator is uniform [46] with respect to partial measurable functions: for all partial measurable functions f :
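The trace operator just described — iterate f, feeding the Z summand back until the token lands in Y — admits a direct sketch on underlying sets, with partiality again as `None`; the tags `'L'`/`'R'` are inventions of this sketch.

```python
def trace(f):
    """tr(f) for f : X + Z -> Y + Z, with tags 'L' (X or Y) and 'R' (Z):
    iterate f, feeding the Z component back, until a token exits in Y.
    If f is undefined (None) along the way, so is the traced function."""
    def g(x):
        t = f(('L', x))                   # enter through X
        while t is not None and t[0] == 'R':
            t = f(t)                      # feed the Z summand back
        return None if t is None else t[1]
    return g
```

This is the least fixed point of the feedback equation, matching the iter construction above: each pass of the while loop is one step of the ascending chain.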

Proof. It is straightforward to adapt the argument in [47, Section A].
We will use the next proposition to construct a trace operator for Mealy machines.
Proposition 5.10. For any partial measurable function f : where we identify w with the arrow from 1 = { * } to W that sends * to w. Because By dinaturality, we obtain Since this is true for any w ∈ W , we see that W × tr Z X,Y (f ) is equal to

The Category of Mealy Machines
Definition 5.4. We define a category Mealy by: • objects are Int-objects X; and • arrows f : X ⊸ Y are behavioural equivalence classes of Mealy machines from X to Y. We denote by Mealy + the wide subcategory of Mealy consisting of Int-objects X such that X − = ∅.
Intuitively, while arrows in Mealy are bidirectional Mealy machines, arrows in Mealy + are "one-way" Mealy machines. We consider the wide subcategory Mealy + because the categorical structure of Mealy + is easier to describe than that of Mealy, and the categorical structure of Mealy is induced by that of Mealy + .
The identity arrow and the composition of Mealy + are given by the identity Mealy machine [id X ] and the composition of Mealy machines: A concrete description of the composition of Mealy machines between Mealy + -objects is easy: for Mealy + -objects X, Y and Z, and for Mealy machines M : X ⊸ Y and N : Y ⊸ Z, the composition N • M : X ⊸ Z consists of: Proof. It is easy to see that objects in Mealy + are closed under the monoidal product of Int-objects. Thanks to the simplicity of the composition of Mealy + -arrows, we can easily check that the monoidal product of Mealy machines between Mealy + -objects is compatible with behavioural equivalence and that (Mealy + , I, ⊗) is a symmetric monoidal category.
Proposition 5.12. The family of operators {Tr Z X,Y } X,Y,Z∈Mealy + is a trace operator on the symmetric monoidal category (Mealy + , I, ⊗).
Proof. Well-definedness of Tr Z X,Y (−) follows from uniformity of the trace operator on pMeas. Sliding, vanishing I, vanishing II, superposing and yanking for Tr follow from that of tr. Dinaturality for Tr follows from dinaturality of tr and Proposition 5.10.
We recall the notions of Int-construction [26] and compact closed category.
Definition 5.5 (Int-construction). Let (C, I, ⊗, tr) be a traced symmetric monoidal category. We define a category Int(C) by: • objects are pairs (X + , X − ) of C-objects; The identity on (X + , X − ) is given by the identity on X + ⊗ X − , and the composition of Int(C)- Here, we omit some coherence isomorphisms.
Definition 5.6. A compact closed category is a symmetric monoidal category (C, I, ⊗) with a function (−) ⊥ : obj(C) → obj(C) and families of C-arrows For X ∈ C, the object X ⊥ is called the dual object of X.

Theorem 5.3 ([26]). The category Int(C) is a compact closed category. The unit and the monoidal product are given by The dual object of (X + , X − ) is (X − , X + ). The unit arrow η (X + ,X − ) and the counit arrow ε (X + ,X − ) are given by

Corollary 5.1. The category Mealy is a compact closed category. The monoidal structure is given by (I, ⊗), and the unit and the counit are given by unit X and counit X respectively.
Proof. It is straightforward to check that Mealy is isomorphic to Int(Mealy + ), and the compact closed structure is given by data provided in Section 5.

Mealy Machine Semantics for PCFSS
We interpret a type A as the Int-object A given by We define the interpretation of contexts by When ∆ is the empty sequence, we define ∆ to be I. For interpreting conditional branching, we use the following proposition.

Proof. We first define an embedding from A − to S by induction on A. We note that for any type A, we have A + = A − . The base cases are easy. For the induction step,

The statement follows from
inductively defined by diagrams in Figure 4. In these definitions, when we can infer ∆ and A, we simply write M and V for ∆ ⊢ M : A and ∆ ⊢ V : A respectively, and we often apply Convention 5.3 to these Mealy machines. Extracting precise definitions from these diagrams would be easy.

Adequacy Theorems
Finally, we give our main results. In the proofs of our adequacy theorems, we use logical relations, diagrammatic reasoning on Mealy machines (Proposition 5.8), the domain-theoretic structure of Mealy machines (Proposition 5.1), and the Fubini–Tonelli theorem.

Distribution-Based Operational Semantics
Then we define a measure O(M) on R by: It follows from our adequacy theorems that the sampling-based operational semantics induces the distribution-based operational semantics.
A result analogous to Corollary 7.2 has already been proved by way of a purely operational (and quite laborious) argument in an untyped setting where score is not available in its full generality [28]. Here, it is just an easy corollary of our adequacy theorems. Proof. By case analysis. For the case of recursion, see Proposition 9.2 in Section 9.

Proof of Adequacy Theorems
We first prove soundness. Proof. By induction on the length of → * . (Base case) Easy. (Induction step) By case analysis on the first evaluation step of (M, a, u) → * (b, a ′ , ε). It remains to prove that o(M)(a, u) = (a ′ , b) implies that (M, a, u) → * (b, a ′ , ε). We use logical relations. We define a binary relation O between closed terms of type Real and Mealy machines from I to S ⊗ !R by i.e., if we have the following transitions: We then inductively define binary relations We list some properties of the logical relations. Proof. We can check these items by unfolding the definition of O and the logical relations.
• For any term ∆ ⊢ M : A and for any (V i , N i ) ∈ R Ai for i = 1, 2, . . . , n, we have • For any value ∆ ⊢ V : A and for any (V i , N i ) ∈ R Ai for i = 1, 2, . . . , n, we have Proof • When M = sample, for any (E, then by the definition of sa, we see that u = c :: u ′ for some c ∈ R [0,1] and u ′ ∈ T such that , a, u) → * (b, a ′ , ε).
• When M = fix A,B (f, x, N), for simplicity, we suppose that M is a closed term. By induction hypothesis, we can check that

Approximation Lemma
Let M : !X ⊸ X be a Mealy machine. In this section, we give a way to calculate a Mealy machine M † : I → !X given by whenever n ∈ α. When n ∉ α, there is no output from ! α M. We can think of ! α M as a "restriction" of !M to α. In fact, !M is equal to ! N M.
We are interested in restrictions of ! to subsets α n , β n ⊆ N inductively given by The definition of α n and β n are motivated by the following lemma.
Lemma 9.1. For any n ∈ N and for any M : X ⊸ Y, we have By means of ! α , we can also parametrize the operator (−) † . For α ⊆ N, and for M : !X ⊸ X, we define M †,α : I → !X by It is easy to see that M † is equal to M †,N . For all n ∈ N, we have Hence, we have an ascending chain Proof. It follows from the definition of ! αn that we have Hence, by the definition of the composition and the monoidal product, we have for all n ∈ N, and

10 How About S-Finite Kernels?
The reader experienced with the semantics of probabilistic programming languages has probably already wondered whether a GoI model for PCFSS could be given out of s-finite kernels instead of measurable functions, following Staton's work on the semantics of a first-order probabilistic programming language [32]. The answer is indeed positive: the kind of construction we presented in Section 5 can in fact be adapted to the category of measurable spaces and s-finite kernels. The latter, being traced monoidal, has all the necessary structure one needs [27]. What one obtains proceeding this way is indeed a GoI model, but one adequate only for the distribution-based operational semantics.
The interpretation of any program in this alternative GoI model can be seen as structurally identical to the one from Section 5, once the sample and score operators are interpreted as usual, namely as the s-finite kernels which actually perform sampling and scoring internally. Below, we first recall the definition of s-finite kernels, then we introduce Mealy machines whose transitions are described in terms of s-finite kernels, together with some basic such machines. Finally, we give an adequate GoI model for the distribution-based operational semantics.
Being adequate for the distribution-based semantics directly (and not by way of integration as in Theorem 7.2) has the pleasant consequence of validating a number of useful program transformations, in particular the commutation of sampling and scoring effects; see [28] for a thorough discussion of this topic, and of how s-finite kernels are a particularly nice way of achieving commutativity in the presence of scoring.

S-finite Kernels
Let k : X Y be a kernel. We say that k is finite when there is a real number c > 0 such that for all x ∈ X and A ∈ Σ Y , we have k(x, A) < c. An s-finite kernel is a kernel k : X Y such that there is a countable family {k n : X Y } of finite kernels such that k(x, A) = Σ n∈N k n (x, A) for all x ∈ X and A ∈ Σ Y . It is easy to see that s-finite kernels are closed under pointwise addition. We write Σ i∈I k i : X Y for the pointwise addition of s-finite kernels k i : X Y . A (sub)probability kernel is a kernel k : X Y such that k(x, −) is a (sub)probability measure on Y for all x ∈ X. Every (sub)probability kernel is a finite kernel.
Every measurable function f : X → Y gives rise to a probability kernel f̂ : X Y given by We denote the probability kernel induced by the identity measurable function by id X : X X. Concretely, this is given by id X We recall two constructions of s-finite kernels.
• (Composition) For s-finite kernels k : X Y and h : Y Z, we define an s-finite kernel The composition of s-finite kernels is associative and satisfies the unit laws, namely, we have k • id X = k and id Y • k = k.
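On finite discrete spaces the definitions above collapse to sums, which gives a small executable sketch: a kernel is a function of a point and a set, and composition is the discrete instance of the integral defining kernel composition. The names `dirac` and `compose_kernels` are inventions of this sketch.

```python
def dirac(f):
    """The probability kernel induced by a function f: k(x, A) = [f(x) in A]."""
    return lambda x, A: 1.0 if f(x) in A else 0.0

def compose_kernels(h, k, ys):
    """(h . k)(x, A) = sum over y of k(x, {y}) * h(y, A), the discrete
    instance of the integral defining kernel composition; `ys` enumerates
    the (finite) middle space."""
    return lambda x, A: sum(k(x, {y}) * h(y, A) for y in ys)
```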
• (Tensor product) For s-finite kernels k : X Y and h : Z W , we define an s-finite kernel k ⊗ h : X × Z Y × W to be the unique s-finite kernel such that for all (x, z) ∈ X × Z and for all A ∈ Σ Y and B ∈ Σ W , The tensor product and the coproduct of s-finite kernels are functorial. This means that these constructions are compatible with composition and preserve identities. The following proposition summarizes the categorical status of these structures.
Proposition 10.1. The category of measurable spaces and s-finite kernels with ⊗ forms a symmetric monoidal category where the unit object is 1. The object ∅ is the zero object, and X + Y forms the coproduct of X and Y , where inl X,Y : X → X + Y and inr X,Y : Y → X + Y are the first and the second injections. Furthermore, the monoidal product distributes over the coproducts. Namely, the canonical s-finite kernel dst X,Y,Z : is a natural isomorphism.
Proof. For associativity of the composition, see [32,Lemma 3], and for functoriality of ⊗, see [32,Proposition 5]. It is not difficult to check that the category of measurable spaces and s-finite kernels associated with ⊗ and 1 forms a symmetric monoidal category. For s-finite kernels f : X Z and g : Y Z, the cotupling [f, g] : X + Y Z is given by It follows from universality of coproducts that dst X,Y,Z is a natural isomorphism.
For s-finite kernels k : X Y and h : Z W , we define an s-finite kernel k ⊕ h : X + Z Y + W by This is the unique s-finite kernel satisfying

Probabilistic Mealy Machine

A probabilistic Mealy machine M from X to Y consists of: • a measurable space S M of states; • an element s M ∈ S M called the initial state of M; • an s-finite kernel τ M : When M is a probabilistic Mealy machine from X to Y, we write M : X Y.
We can regard a Mealy machine M : X ⊸ Y as a probabilistic Mealy machine from X to Y by identifying the transition function τ M : In the sequel, we identify Mealy machines (and token machines) with the corresponding probabilistic Mealy machines. Let X 1 , . . . , X n and Y 1 , . . . , Y m be Int-objects. Just like Mealy machines, we depict a probabilistic Mealy machine M from X 1 ⊗ · · · ⊗ X n to Y 1 ⊗ · · · ⊗ Y m as a box with edges labeled by X 1 , . . . , X n on the left hand side and edges labeled by Y 1 , . . . , Y m on the right hand side:

Behavioral Equivalence
We give an equivalence relation between probabilistic Mealy machines so as to identify probabilistic Mealy machines that behave in the same way. Let X and Y be Int-objects, and let M and N be probabilistic Mealy machines from X to Y. We write M ∼ X,Y N when there is a measurable function f : S M → S N such that f (s M ) = s N and the following diagram commutes: We define an equivalence relation ≃ X,Y to be the symmetric transitive closure of ∼ X,Y . A probabilistic Mealy machine M : X Y is behaviorally equivalent to N : X Y when we have M ≃ X,Y N. When the subscripts of ≃ X,Y can be inferred, we omit them. We say that a measurable function f : S M → S N realizes a behavioral equivalence M ≃ N (or realizes M ∼ N) when M ∼ N is witnessed by f .

Construction of probabilistic Mealy Machines
We introduce probabilistic Mealy machines and their constructions, which are the building blocks of our denotational semantics. Most of them are adaptations of the Mealy machines in Section 5.4, and we just give their formal definitions.

Composition/Cut
namely, the s-finite kernels k A,B,C,D satisfy Here, the horizontal arrows consist of the injection from A into X + + Y − + Y + + Z − followed by distributivity and symmetry. For example, when A = X + + Z − , the upper horizontal arrow is given by The joins in the definition of the composition of probabilistic Mealy machines are taken with respect to the pointwise order. We can check that the composition of probabilistic Mealy machines is compatible with behavioural equivalence and that Int-objects and the composition of probabilistic Mealy machines form a category where the identity on an Int-object X is id X : X X (regarded as a probabilistic Mealy machine).

Monoidal Products
We give monoidal products of probabilistic Mealy machines. For probabilistic Mealy machines M : X Z and N : Y W, we define a probabilistic Mealy machine M ⊗ N : s N ) and τ M⊗N is given by It is not difficult to check that the monoidal product is compatible with behavioural equivalence. We depict M ⊗ N : (X ⊗ Y) (Z ⊗ W) as follows: For unit X , counit X and sym X,Y , we adopt the same diagrammatic presentation.

A Modal Operator
Let M : X Y be a probabilistic Mealy machine. We define a probabilistic Mealy machine !M : !X !Y by: the state space of !M is defined to be |M| N associated with the least σ-algebra such that for all A 1 , A 2 , . . . ∈ Σ M , the initial state s !M is (s M , s M , . . .); the transition function τ !M is the unique s-finite kernel satisfying

Diagrammatic Reasoning on Probabilistic Mealy Machines
Diagrammatic reasoning is valid also for probabilistic Mealy machines.
Proposition 10.2. The category pMealy of Int-objects and probabilistic Mealy machines (modulo behavioural equivalence) is a compact closed category. The dual of an Int-object X is X ⊥ . The unit and the counit arrows are unit X and counit X .

Proposition 10.3. If two probabilistic Mealy machines have the same diagrammatic presentation modulo some rearrangement of edges and nodes, then they are behaviourally equivalent.
We can check Proposition 10.2 by replacing the category of partial measurable functions by the category of s-finite kernels in Section 5.6.

A State Monad
We define an Int-object J by (1, 1) and define an Int-object J 0 by (1, ∅). Then J ⊗ (−) is a state monad (on pMealy), whose unit and multiplication are given by: where j = unit J0 and n = J 0 ⊗ counit J0 ⊗ J ⊥ 0 .

Scoring
We construct a probabilistic Mealy machine Sc : R J as follows: composed with r a , it sends the query ε to r a , receives a :: ε in return, and then outputs * with weight |a|, thereby simulating the scoring of score(r a ).

Sampling
We define a probabilistic Mealy machine Sa whose state space is 1 + R [0,1] , whose initial state s Sa is * , and whose transition function The probabilistic Mealy machine behaves as follows: • In the initial state * , given * from the J-edge, Sa draws a real number from the uniform distribution and stores it: For example, the probability of the state being a real number in [0, 0.3] after this transition is 0.3.
• After this transition, Sa returns (n, a :: u) to each "query" (n, u): Then we define a measure obs(M) on R to be t M 1 • t M 0 ( * , −). Intuitively, obs(M) is a measure that describes the distribution of real numbers obtained by the following process: • We first input * to the J-wire of M.
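The two-phase behaviour of Sa — draw once, then answer every query with the stored value — can be sketched as follows; the injected `rng` is a stand-in for the uniform distribution on [0, 1], and all names are inventions of this sketch.

```python
import random

def make_Sa(rng=random.random):
    """Sa: on the first * from the J-edge, draw a value from U[0,1] and
    store it; afterwards answer each query (n, u) with (n, a :: u)."""
    state = {'a': None}
    def step(token):
        if state['a'] is None:
            state['a'] = rng()            # the single probabilistic step
            return '*'                    # acknowledge on the J-edge
        n, u = token
        return (n, (state['a'],) + u)
    return step
```

Passing a deterministic `rng` makes the behaviour reproducible, which is how one would test the query phase in isolation.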
• If M outputs * to the J-wire, then we input (0, ε) to the !R-wire of M.
It remains to prove that M ⇒ ∞ µ implies obs( M d ) ≤ µ. We use logical relations. We define a binary relation O d between closed terms of type Real and probabilistic Mealy machines from I to We then inductively define binary relations We list some properties of the logical relations.
Lemma 11.3. Let A be a type.
• When M = sample, let (E, E) be a pair in R ⊤ Real . We write N for (n ⊗ !R) • (J ⊗ E) • Sa. By the definition of t 0 and t 1 , we have (t E•ra k(a, A) da.
• When M = fix A,B (f, x, N), for simplicity, we suppose that M is a closed term. By induction hypothesis, we can check that Proof. By soundness, we have µ ≤ obs( M d ). On the other hand, because ([−], j ⊗ id !R ) is an element of S ⊤ Real , we obtain the other inequality by Lemma 11.4.

Induction step on recursion
Our Goal: Approximation Lemma Let M : !X X be a Mealy machine. In this section, we show that a Mealy machine M † : I → !X given by and τ ! α M is the unique s-finite kernel such that the following diagrams commute: • for any n ∈ α, • for any n ∈ α, Let α n , β n ⊆ N be given by α 0 = ∅, β n = { i, j : i ∈ α n and j ∈ N}, α n+1 = {2i : i ∈ N} ∪ {2i + 1 : i ∈ β n }.
The definitions of α n and β n are motivated by the following lemma.
Lemma 11.5. For any n ∈ N and for any M : X Y, we have By means of ! α , we also parametrize the operator (−) † . For α ⊆ N, and for M : !X X, we define M †,α : I → !X by Lemma 11.7. There is a family of measurable functions φ X : (X N ) N → (X N ) N such that the following diagram commutes: where u X : X N → X N × (X N ) N is a measurable isomorphism given by u X (x n ) n∈N = ((x 2n ) n∈N , ((x 2 m0,m1 +1 ) m0∈N ) m1∈N ).
Proof. Almost equivalent to the proof of Lemma 11.4. , (•, (n, a))), s), A) = 0. By Corollary 11.2 and by the definition of the composition of probabilistic Mealy machines, we see that if (1) then let x be M in let y be N in L d = let y be N in let x be M in L d where k and h are s-finite kernels given by restricting the domain and the codomain of τ M d and τ N d respectively. Because of commutativity for s-finite kernels [32], the equality (1) is true. Hence, (2) holds. Then by adequacy, we see that let x be M in let y be N in L is observationally equivalent to let y be N in let x be M in L.

Conclusion
We introduced a denotational semantics for PCFSS, a higher-order functional language with sampling from a uniform continuous distribution and scoring. Following [28], we considered two operational semantics, namely a distribution-based operational semantics, which associates terms with distributions over real numbers, and a sampling-based operational semantics, which associates each term with a weight along every probabilistic branch. Our main results are adequacy theorems for both kinds of operational semantics, and it follows from these theorems that the sampling-based operational semantics is essentially equivalent to the distribution-based operational semantics. Another consequence of the adequacy theorems is the possibility of diagrammatic reasoning about observational equivalence of programs. It follows from the observation in Section 5.5 and the adequacy theorems that diagrammatic equivalence of the denotations of terms implies observational equivalence. It would be interesting to explore possible connections between our work and other work on diagrammatic reasoning for probabilistic computation, such as [48,49]. At this point, our language does not support a normalisation mechanism as a first-class operator, and we doubt that our semantics can be extended to capture such a mechanism. However, capturing sampling algorithms such as the Metropolis-Hastings algorithm [50,51], which consist of a number of interactions between programs and their environment, seems plausible. Exploring the relationships between "idealised" normalisation mechanisms and such "approximating" normalisation mechanisms from the point of view of GoI is an interesting topic for future work.