Solving high-dimensional optimal stopping problems using deep learning

Many financial derivatives traded today, such as American or Bermudan options, are of early exercise type. Often the pricing of early exercise options gives rise to high-dimensional optimal stopping problems, since the dimension corresponds to the number of underlying assets. High-dimensional optimal stopping problems are, however, notoriously difficult to solve due to the well-known curse of dimensionality. In this work, we propose an algorithm for solving such problems, which is based on deep learning and computes, in the context of early exercise option pricing, both approximations of an optimal exercise strategy and the price of the considered option. The proposed algorithm can also be applied to optimal stopping problems that arise in other areas where the underlying stochastic process can be efficiently simulated. We present numerical results for a large number of example problems, which include the pricing of many high-dimensional American and Bermudan options, such as Bermudan max-call options in up to 5000 dimensions. Most of the obtained results are compared to reference values computed by exploiting the specific problem design or, where available, to reference values from the literature. These numerical results suggest that the proposed algorithm is highly effective in the case of many underlyings, in terms of both accuracy and speed.

In this work, we propose an algorithm for solving general, possibly high-dimensional optimal stopping problems; cf. Framework 3.2 in Subsection 3.2. It is similar in spirit to the algorithm introduced in [9]. The proposed algorithm is based on deep learning and computes both approximations of an optimal stopping strategy and the optimal expected pay-off associated with the considered optimal stopping problem. In the context of pricing early exercise options, these correspond to approximations of an optimal exercise strategy and the price of the considered option, respectively. The derivation and implementation of the proposed algorithm consist essentially of the following three steps.
(I) A neural network architecture for, in an appropriate sense, 'randomised' stopping times (cf. (31) in Subsection 2.4) is established in such a way that varying the neural network parameters leads to different randomised stopping times being expressed. This neural network architecture is used to replace the supremum of the expected pay-off over suitable stopping times (which constitutes the generic optimal stopping problem) by the supremum of a suitable objective function over neural network parameters (cf. (38)-(39) in Subsection 2.5).
(II) A stochastic gradient ascent-type optimisation algorithm is employed to compute neural network parameters that approximately maximise the objective function (cf. Subsection 2.6).
(III) From these neural network parameters and the corresponding randomised stopping time, a true stopping time is constructed which serves as the approximation of an optimal stopping strategy (cf. (44) and (46) in Subsection 2.7). In addition, an approximation of the optimal expected pay-off is obtained by computing a Monte Carlo approximation of the expected pay-off under this approximately optimal stopping strategy (cf. (45) in Subsection 2.7).
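Steps (I)-(III) can be sketched compactly. The following toy computation (all parameters, the pay-off function, and the path model are illustrative stand-ins chosen by us; random numbers take the place of trained neural networks) shows one natural way, consistent with the description in (I), to turn 'soft' stopping decisions u_n ∈ (0, 1) into randomised stopping factors and to evaluate the resulting Monte Carlo objective:

```python
import numpy as np

rng = np.random.default_rng(42)

# Illustrative stand-ins (not the paper's setting): J paths of a toy price
# process X on N + 1 time points and a discounted put-type pay-off
# g(t, x) = exp(-r t) * max(K - x, 0).
N, J, T, r, K = 10, 5000, 1.0, 0.05, 100.0
t = np.linspace(0.0, T, N + 1)
dW = rng.normal(0.0, np.sqrt(T / N), (J, N))
logX = np.concatenate([np.zeros((J, 1)), np.cumsum(-0.02 * T / N + 0.2 * dW, axis=1)], axis=1)
X = K * np.exp(logX)
payoff = np.exp(-r * t) * np.maximum(K - X, 0.0)

# 'Soft' stopping decisions u_n in (0, 1) (random placeholders standing in
# for trained neural networks); the terminal decision is forced to 1.
u = rng.uniform(0.05, 0.95, (J, N + 1))
u[:, N] = 1.0

# Randomised stopping factors U_n = u_n * prod_{k < n} (1 - u_k); along
# each path they are non-negative and sum to 1.
one_minus = np.concatenate([np.ones((J, 1)), np.cumprod(1.0 - u[:, :-1], axis=1)], axis=1)
U = u * one_minus

# Monte Carlo objective: average over paths of sum_n g(t_n, X_n) * U_n.
objective = float(np.mean(np.sum(payoff * U, axis=1)))
```

In the algorithm itself, the decisions u_n are outputs of neural networks evaluated along the simulated paths, and a stochastic gradient ascent step as in (II) adjusts the network parameters so as to increase this objective.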
It follows from (III) that the proposed algorithm computes a low-biased approximation of the optimal expected pay-off (cf. (48) in Subsection 2.7). Yet a large number of numerical experiments in which a reference value is available (cf. Section 4) show that the bias appears to become small quickly during training and that a very satisfying accuracy can be achieved in a short computation time, even in high dimensions (cf. the end of this introduction for a brief overview of the numerical computations that were performed). Moreover, in (I) we resort to randomised stopping times in order to circumvent the discrete nature of stopping times that attain only finitely many different values. As a result, it is possible in (II) to tackle the arising optimisation problem with a stochastic gradient ascent-type algorithm. Furthermore, while the focus in this article lies on American and Bermudan option pricing, the proposed algorithm can also be applied to optimal stopping problems that arise in other areas where the underlying stochastic process can be efficiently simulated. Apart from this, we only rely on the assumption that the stochastic process to be optimally stopped is a Markov process (cf. Subsection 2.4). This assumption is, however, not a substantial restriction since, on the one hand, it is automatically fulfilled in many relevant problems and, on the other hand, a discrete stochastic process that is not a Markov process can be replaced by a Markov process of higher dimension that aggregates all necessary information (cf., e.g., [9, Subsection 4.3] and Subsection 4.4.4 below).
Next we compare our algorithm to the one introduced in [9]. The latter splits the original problem into smaller optimal stopping problems, one at each time step at which stopping is permitted, each of which decides between stopping at that point in time and stopping later (cf. [9, (4) in Subsection 2.1]). Starting at maturity, these auxiliary problems are solved recursively backwards until the initial time is reached. Thereby, in every new step, neural network parameters are learned for an objective function that depends, in particular, on the parameters found in the previous steps (cf. [9, Subsection 2.3]). In contrast, in (I) a single objective function is designed. This objective function allows us to search in (II) for neural network parameters that maximise the expected pay-off simultaneously over (randomised) stopping times which may decide to stop at any of the admissible points in time. Therefore, the algorithm proposed here does not rely on a recursion over the different time points. In addition, the construction of the final approximation of an optimal stopping strategy in (III) differs from the corresponding construction in [9]. We refer to Subsection 4.3.2.1 for a comparison between the two algorithms with respect to performance.
The remainder of this article is organised as follows. In Section 2, we present the main ideas from which the proposed algorithm is derived. More specifically, in Subsection 2.1 we illustrate how an optimal stopping problem in the context of American option pricing is typically formulated. Thereafter, a replacement of this continuous-time problem by a corresponding discrete-time optimal stopping problem is discussed by means of an example in Subsection 2.2. Subsection 2.3 is devoted to the statement and proof of an elementary but crucial result about factorising general discrete stopping times in terms of compositions of measurable functions (cf. Lemma 2.2), which lies at the heart of the neural network architecture we propose in Subsection 2.4 to approximate general discrete stopping times. This construction, in turn, is exploited in Subsection 2.5 to transform the discrete-time optimal stopping problem from Subsection 2.2 into the search for a maximum of a suitable objective function (cf. (I) above). In Subsection 2.6, we suggest employing stochastic gradient ascent-type optimisation algorithms to find approximate maximum points of the objective function (cf. (II) above). As a last step, we explain in Subsection 2.7 how we calculate final approximations of both the American option price and an optimal exercise strategy (cf. (III) above). In Section 3, we introduce the proposed algorithm in a concise way, first in a special case for the sake of clarity (cf. Subsection 3.1) and then in more generality so that, in particular, a rigorous description of our implementations is fully covered (cf. Subsections 3.2-3.3). Following this, in Section 4 a few theoretical results are first presented (cf. Subsection 4.1), which are used to design numerical example problems and to provide reference values. Thereafter, we describe in detail a large number of example problems, on which our proposed algorithm was tested, and present numerical results for each of these problems.
In particular, the examples include the optimal stopping of Brownian motions (cf.

Main ideas of the proposed algorithm
In this section, we outline the main ideas that lead to the formulation of the proposed algorithm in Subsections 3.1-3.2 by considering the example of pricing an American option. The proposed algorithm in Framework 3.2 in Subsection 3.2 is, however, general enough to also be applied to optimal stopping problems where there are no specific assumptions on the dynamics of the underlying stochastic process, as long as it can be cheaply simulated (cf. Subsection 3.3). Furthermore, often in practice and, in particular, in the case of Bermudan option pricing (cf. many of the examples in Section 4), the optimal stopping problem of interest is not a continuous-time problem but is already formulated in discrete time. In such a situation, there is no need for a time discretisation, as described in Subsection 2.2 below, and the proposed algorithm in Framework 3.2 can be applied directly.

The American option pricing problem
Let T ∈ (0, ∞), d ∈ N = {1, 2, 3, . . .}, let (Ω, F, P) be a probability space with a filtration F = (F_t)_{t∈[0,T]} that satisfies the usual conditions (cf., e.g., [59, Definition 2.25 in Section 1.2]), let ξ : Ω → R^d be an F_0/B(R^d)-measurable function, let W : [0, T] × Ω → R^d be a standard (Ω, F, P, F)-Brownian motion with continuous sample paths, let μ : R^d → R^d and σ : R^d → R^{d×d} be Lipschitz continuous functions, let X : [0, T] × Ω → R^d be an F-adapted continuous solution process of the stochastic differential equation

dX_t = μ(X_t) dt + σ(X_t) dW_t,    X_0 = ξ,    t ∈ [0, T],    (1)

let F^X = (F^X_t)_{t∈[0,T]} be the filtration generated by X, and let g : [0, T] × R^d → R be a continuous and at most polynomially growing function. We think of X as a model for the price processes of d underlyings (say, d stock prices) under the risk-neutral pricing measure P (cf., e.g., Kallsen [58]) and we are interested in approximatively pricing the American option on the process (X_t)_{t∈[0,T]} with the discounted pay-off function g; that is, we intend to compute the real number

sup{ E[g(τ, X_τ)] : τ : Ω → [0, T] is an F^X-stopping time }.    (2)

In addition to the price of the American option in the model (1), there is also a high demand from the financial engineering industry to compute an approximately optimal exercise strategy, that is, to compute a stopping time which approximately reaches the supremum in (2). In a very simple example of (1)-(2), we can think of an American put option in the one-dimensional Black-Scholes model, in which there are an interest rate r ∈ R, a dividend yield δ ∈ [0, ∞), a volatility β ∈ (0, ∞), and a strike price K ∈ (0, ∞) such that it holds for all t ∈ [0, T], x ∈ R that μ(x) = (r − δ)x, σ(x) = βx, and g(t, x) = e^{−rt} max{K − x, 0}.

Stochastic gradient ascent optimisation algorithms
Local/global maxima of the objective function (39) can be approximately reached by maximising the expectation of the random objective function by means of a stochastic gradient ascent-type optimisation algorithm. This yields a sequence of random parameter vectors along which we expect the values of the objective function (39) to increase. More formally, applying, under suitable hypotheses, stochastic gradient ascent-type optimisation algorithms to (39) results in random approximations Θ_m, m ∈ {0, 1, 2, . . .}, of the local/global maximum points of the objective function (39), where m ∈ {0, 1, 2, . . .} is the number of steps of the employed stochastic gradient ascent-type optimisation algorithm.
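To illustrate the type of iteration meant here, the following sketch applies stochastic gradient ascent to a toy objective (unrelated to the objective function (39); all parameters are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stochastic gradient ascent: maximise theta -> E[-(theta - X)^2] for
# X ~ N(3, 1), whose unique maximum point is theta = E[X] = 3.
theta = 0.0
for m in range(1, 10001):
    x = rng.normal(3.0, 1.0)      # fresh sample simulated in step m
    grad = 2.0 * (x - theta)      # gradient of the random objective -(theta - x)^2
    theta += grad / (2.0 * m)     # ascent step with learning rate 1/(2m)
```

Each step uses the gradient of the random objective function evaluated at a freshly simulated sample, so the expectation itself never has to be computed exactly.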

Price and optimal exercise time for American-style options
The approximation algorithm sketched in Subsection 2.6 above allows us to approximatively compute both the price and an optimal exercise strategy for the American option (cf. Subsection 2.1). Let M ∈ N and consider a realisation Θ_M ∈ R^ν of the random variable Θ_M : Ω → R^ν. Then for sufficiently large N, ν, M ∈ N a candidate for a suitable approximation of the American option price is the real number in (42) and a candidate for a suitable approximation of an optimal exercise strategy for the American option is the function in (43). Note, however, that in general the function in (43) does not take values in {0, 1, . . . , N} and hence is not a proper stopping time. Similarly, note that in general it is not clear whether there exists an exercise strategy such that the number in (42) is equal to the expected discounted pay-off under this exercise strategy. For these reasons, we suggest other candidates for suitable approximations of the price and an optimal exercise strategy for the American option. More specifically, for every θ ∈ R^ν let τ_θ : Ω → {0, 1, . . . , N} be the F-stopping time given by (44) (cf. (30) above). Then for sufficiently large N, ν, M ∈ N we use a suitable Monte Carlo approximation of the real number in (45) as a suitable implementable approximation of the price of the American option (cf. (2) in Subsection 2.1 above and (59) in Subsection 3.1 below) and we use the random variable in (46) as a suitable implementable approximation of an optimal exercise strategy for the American option. Note that (30) ensures that the representation in (47) holds. This shows that the exercise strategy τ_{Θ_M} : Ω → {0, 1, . . . , N} exercises at the first time index n ∈ {0, 1, . . . , N} for which the approximate stopping time factor associated with the mesh point t_n is at least as large as the combined approximate stopping time factors associated with all later mesh points. Finally, observe that the inequality in (48) holds. This implies that Monte Carlo approximations of the number in (45) typically are low-biased approximations of the American option price (2).
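The exercise rule described above (stop at the first time index whose approximate stopping time factor is at least as large as the combined factors of all later mesh points) can be sketched as follows; the function name and the example factors are ours:

```python
import numpy as np

def hard_stopping_index(U):
    """Return the first index n whose stopping time factor U_n is at least
    as large as the sum of the factors of all later indices. Because the
    factors are non-negative, the condition always holds at the last index,
    so the rule is well defined."""
    U = np.asarray(U, dtype=float)
    tail = np.concatenate([np.cumsum(U[::-1])[::-1][1:], [0.0]])  # sums over later indices
    return int(np.argmax(U >= tail))

# example: the factors concentrate on index 1, so the rule stops there
tau = hard_stopping_index([0.1, 0.5, 0.3, 0.1])
```

In contrast to a randomised stopping time, the index returned here is a proper stopping decision: it depends only on the factors observed up to (and including) the current time once these have been expressed through the Markovian network inputs.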

Details of the proposed algorithm 3.1 Formulation of the proposed algorithm in a special case
In this subsection, we describe the proposed algorithm in the specific situation where the objective is to solve the American option pricing problem described in Subsection 2.1, where batch normalisation (cf. Ioffe & Szegedy [53]) is not employed in the proposed algorithm, and where the plain vanilla stochastic gradient ascent method with a constant learning rate γ ∈ (0, ∞) and without mini-batches is the employed stochastic approximation algorithm. The general framework, which includes the setting in this subsection as a special case, can be found in Subsection 3.2 below.

Framework 3.1. For every n ∈ {0, 1, . . . , N}, θ ∈ R^ν let U_{n,θ} : (R^{d+1})^{n+1} → (0, 1) be the function which maps z_0, z_1, . . . , z_n ∈ R^{d+1} to the associated stopping time factor, for every m ∈ N let φ_m : R^ν × Ω → R be the random objective function, let Θ : N_0 × Ω → R^ν be the stochastic process given by the stochastic gradient ascent recursion in (57), and for every j ∈ N, θ ∈ R^ν let τ_{j,θ} : Ω → {0, 1, . . . , N} be the random variable describing the associated stopping time.

Consider the setting of Framework 3.1, assume that μ and σ are globally Lipschitz continuous, and assume that g is continuous and at most polynomially growing. In the case of sufficiently large N, M, J ∈ N and sufficiently small γ ∈ (0, ∞), we then think of the random number in (59) as an approximation of the price of the American option with the discounted pay-off function g, and for every j ∈ N we think of the random variable τ_{j,Θ_M} as an approximation of an optimal exercise strategy associated with the underlying time-discrete path (X^{M+j}_n)_{n∈{0,1,...,N}} (cf. Subsection 2.1 above and Section 4 below).

Formulation of the proposed algorithm in the general case
In this subsection, we extend the framework in Subsection 3.1 above and describe the proposed algorithm in the general case.
Framework 3.2. For every n ∈ {0, 1, . . . , N}, θ ∈ R^ν, s ∈ R^ς let u^{θ,s}_n : R^{d+1} → (0, 1) be a function, for every n ∈ {0, 1, . . . , N}, θ ∈ R^ν, s ∈ R^ς let U^{θ,s}_n : (R^{d+1})^{n+1} → (0, 1) be the function which maps z_0, z_1, . . . , z_n ∈ R^{d+1} to the associated stopping time factor, for every j ∈ N, θ ∈ R^ν, s ∈ R^ς let τ^{j,θ,s} : Ω → {0, 1, . . . , N} be the random variable describing the associated stopping time, and let P : Ω → R be the random variable given by (67).

Consider the setting of Framework 3.2. Under suitable further assumptions, in the case of sufficiently large N, M, ν, J_0 ∈ N, we think of the random number P as an approximation of the price of the American option with the discounted pay-off function g, and for every j ∈ N we think of the random variable τ^{j,Θ_M,S_M} as an approximation of an optimal exercise strategy associated with the underlying time-discrete path (X^{0,j}_n)_{n∈{0,1,...,N}} (cf. Subsection 2.1 above and Section 4 below).

Comments on the proposed algorithm
Note that the lack in Framework 3.2 of any assumptions on the dynamics of the stochastic process (X^{0,1}_n)_{n∈{0,1,...,N}} allows us to approximatively compute the optimal expected pay-off as well as an optimal exercise strategy for very general optimal stopping problems in which, in particular, the stochastic process under consideration is not necessarily related to the solution of a stochastic differential equation. We only require that (X^{0,1}_n)_{n∈{0,1,...,N}} can be simulated efficiently and, formally, we still rely on the Markov assumption (cf. Subsection 2.4 above). In addition, observe that the choice of the functions u^{θ,s}_N : R^{d+1} → (0, 1), s ∈ R^ς, θ ∈ R^ν, has no influence on the proposed algorithm (cf. (61)). Furthermore, the dynamics in (65) associated with the stochastic processes (Ξ_m)_{m∈N_0} and (Θ_m)_{m∈N_0} allow us to incorporate into the algorithm in Subsection 3.2 different stochastic approximation algorithms such as
• plain vanilla stochastic gradient ascent with or without mini-batches (cf. (57) in Subsection 3.1 above),
• stochastic approximation algorithms combined with batch normalisation (cf. Ioffe & Szegedy [53] and the beginning of Section 4 below), in which case we think of (S_m)_{m∈N_0} as a bookkeeping process keeping track of approximatively calculated means and standard deviations as well as of the number of steps m ∈ N_0 of the employed stochastic approximation algorithm, and
• adaptive moment estimation (Adam; cf. Kingma & Ba [60]) with varying learning rates and with mini-batches (cf. Subsection 4.2 below for a precise description).
In the example in Subsection 4.4.4 below, the initial value X^{0,1}_0 is random. Therefore, we there use N fully connected feedforward neural networks to model the functions u^{θ,s}_0, . . . , u^{θ,s}_{N−1} : R^{d+1} → (0, 1). In all other examples, the initial value X^{0,1}_0 is deterministic, so that the stopping decision at time 0 is deterministic as well. In these cases, we model only the functions u^{θ,s}_1, . . . , u^{θ,s}_{N−1} : R^{d+1} → (0, 1) by neural networks. Then it can be decided whether it is better to stop at time 0 or not by comparing the deterministic pay-off g(0, X^{0,1}_0) to a standard Monte Carlo estimate of the expected pay-off generated by the stopping strategy given by u^{θ,s}_0 = 0 and the functions u^{θ,s}_1, . . . , u^{θ,s}_{N−1} : R^{d+1} → (0, 1); cf. [9, Remark 6 in Subsection 2.3]. The standard network architecture we use in this paper consists of a (d+1)-dimensional input layer, two (d+50)-dimensional hidden layers, and a one-dimensional output layer. As non-linear activation functions just in front of the hidden layers, we employ the multidimensional version of the rectifier function R ∋ x ↦ max{x, 0} ∈ [0, ∞), whereas just in front of the output layer we employ the standard logistic function R ∋ x ↦ exp(x)/(exp(x) + 1) ∈ (0, 1). In addition, batch normalisation (cf. Ioffe & Szegedy [53]) is applied just before the first linear transformation, just before each of the non-linear activation functions in front of the hidden layers, and just before the non-linear activation function in front of the output layer. We use Xavier initialisation (cf. Glorot & Bengio [45]) to initialise all weights in the neural networks.
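In plain NumPy, a forward pass through this architecture can be sketched as follows (a simplification of our TensorFlow implementation: the batch normalisations omit their learned scale and shift parameters, and the weights are freshly initialised rather than trained):

```python
import numpy as np

rng = np.random.default_rng(0)

def xavier(fan_in, fan_out):
    # Xavier/Glorot uniform initialisation (cf. Glorot & Bengio)
    bound = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-bound, bound, size=(fan_in, fan_out))

def batch_norm(z, eps=1e-5):
    # batch normalisation without the learned scale and shift (simplification)
    return (z - z.mean(axis=0)) / np.sqrt(z.var(axis=0) + eps)

def stopping_factor_net(x, d):
    """Forward pass of the standard architecture: (d+1)-dimensional input,
    two (d+50)-dimensional hidden layers with ReLU, one logistic output in
    (0, 1); batch normalisation before the first linear transformation and
    before every activation. Biases are omitted because the normalisations
    recentre each layer anyway."""
    W1, W2, W3 = xavier(d + 1, d + 50), xavier(d + 50, d + 50), xavier(d + 50, 1)
    z = batch_norm(x)
    z = np.maximum(batch_norm(z @ W1), 0.0)   # first hidden layer
    z = np.maximum(batch_norm(z @ W2), 0.0)   # second hidden layer
    z = batch_norm(z @ W3)
    return 1.0 / (1.0 + np.exp(-z))           # logistic output

u_values = stopping_factor_net(rng.normal(size=(64, 6)), d=5)  # a batch of 64 inputs
```

The output can be read as the 'soft' stopping decision u^{θ,s}_n evaluated on a batch of inputs (t_n, X_n).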
Two hidden layers work well in all our examples. However, the examples in Subsection 4.3 have an underlying one-dimensional structure, and as a consequence, fewer hidden layers yield equally good results; see Tables 2-3 below. On the other hand, the examples in Subsection 4.4 are more complex. In particular, it can be seen from Table 11 that for the max-call option in Subsection 4.4.1.1, two hidden layers give better results than zero or one hidden layer, but more than two hidden layers do not improve the results.
All examples presented below were implemented in Python. The corresponding Python source codes (cf. Section 5) were run, unless stated otherwise (cf. Subsection 4.4.1.2 as well as the last sentence in Subsection 4.4.1.3 below), in single precision (float32) on an NVIDIA GeForce RTX 2080 Ti GPU. The underlying system consisted of an AMD Ryzen 9 3950X CPU with 64 GB DDR4 memory running TensorFlow 2.1 on Ubuntu 19.10. We would like to point out that no special emphasis was put on optimising computation speed. In many cases, some of the algorithm parameters could be adjusted to obtain similarly accurate results in a shorter runtime.

Theoretical considerations
Before we present the optimal stopping problem examples on which we tested the algorithm of Framework 3.2 (cf. Subsections 4.3-4.4 below), we recall a few theoretical results, which are used to design some of these examples, determine reference values, and provide further insights.

Option prices in the Black-Scholes model
The elementary and well-known result in Lemma 4.1 below specifies the distributions of linear combinations of independent and identically distributed centred Gaussian random variables which take values in a separable normed R-vector space.
The next elementary and well-known corollary follows directly from Lemma 4.1.
The next elementary result, Proposition 4.3, states that the distribution of a product of multiple correlated geometric Brownian motions is equal to the distribution of a single particular geometric Brownian motion.
let X : [0, T] × Ω → R be an F^{(1)}-adapted stochastic process with continuous sample paths, let Y : [0, T] × Ω → R be an F^{(2)}-adapted stochastic process with continuous sample paths, and assume that for all t ∈ [0, T] it holds P-a.s. that (ii) it holds that P and G are continuous functions, and

Proof of Proposition 4.3. Throughout this proof let γ = (γ_1, . . . , γ_d) ∈ R^d be the vector Observe that for all i ∈ {1, . . . , d}, t ∈ [0, T] it holds P-a.s. that In addition, note that for all i ∈ {1, . . . , d}, t ∈ [0, T] it holds P-a.s. that Itô's formula hence shows that for all i ∈ {1, . . . , d}, t ∈ [0, T] it holds P-a.s. that Combining this and (78) establishes (i). In the next step note that (ii) is clear. It thus remains to prove (iii). For this observe that (i) establishes that for all t ∈ [0, T] it holds P-a.s. that Continuity hence implies that it holds P-a.s. that Moreover, note that (i) shows that for all t ∈ [0, T] it holds P-a.s. that This and continuity establish that it holds P-a.s. that Furthermore, observe that Corollary 4.2 ensures that The proof of Proposition 4.3 is thus complete.
In the next result, Lemma 4.4, we recall the well-known formula for the price of a European call option on a single stock in the Black-Scholes model (cf., e.g., Øksendal [78,Corollary 12.3.8]).
Lemma 4.4. Let T ∈ (0, ∞), r ∈ R, δ ∈ [0, ∞), β ∈ (0, ∞), ξ ∈ (0, ∞), let Φ : R → R be the function which satisfies for all x ∈ R that Φ(x) = (2π)^{−1/2} ∫_{−∞}^{x} e^{−y²/2} dy, let (Ω, F, P) be a probability space with a filtration F = (F_t)_{t∈[0,T]} that satisfies the usual conditions, let W : [0, T] × Ω → R be a standard (Ω, F, P, F)-Brownian motion with continuous sample paths, and let X : [0, T] × Ω → R be an F-adapted stochastic process with continuous sample paths which satisfies that for all t ∈ [0, T] it holds P-a.s. that X_t = ξ exp((r − δ − β²/2) t + β W_t). Then it holds for all K ∈ (0, ∞) that

e^{−rT} E[max{X_T − K, 0}] = e^{−δT} ξ Φ(d_+) − e^{−rT} K Φ(d_−),

where d_± = (ln(ξ/K) + (r − δ ± β²/2) T)/(β √T).
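The pricing formula of Lemma 4.4 can be evaluated numerically as follows (the function and variable names are ours):

```python
import math

def bs_call(xi, K, r, delta, beta, T):
    """Discounted expectation exp(-r T) E[max{X_T - K, 0}] for
    X_T = xi * exp((r - delta - beta^2/2) T + beta W_T), i.e. the
    Black-Scholes price of a European call with dividend yield delta."""
    Phi = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))  # standard normal CDF
    d_plus = (math.log(xi / K) + (r - delta + 0.5 * beta ** 2) * T) / (beta * math.sqrt(T))
    d_minus = d_plus - beta * math.sqrt(T)
    return math.exp(-delta * T) * xi * Phi(d_plus) - math.exp(-r * T) * K * Phi(d_minus)
```

For instance, bs_call(100, 100, 0.05, 0.0, 0.2, 1.0) returns approximately 10.45, the textbook value for these parameters.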

Approximating American options with Bermudan options
In our numerical simulations, we price Bermudan options with a finite number of exercise times rather than American options, which in theory can be exercised at any of infinitely many time points before maturity. However, the following result shows that the prices of American options can be approximated by the prices of Bermudan options with equidistant exercise times, provided that the number of exercise times is sufficiently large.

Setting
Framework 4.6. Assume Framework 3.2, let ζ_1 = 0.9, assume that ς = 2ν and Ξ_0 = 0, assume for all n ∈ {0, 1, . . . , N} that t_n = nT/N, and assume for all m ∈ N, x = (x_1, . . . , x_ν), y = (y_1, . . . , y_ν), η = (η_1, . . . , η_ν) ∈ R^ν that the functions ψ_m are given by the update rules in (97). Equations (97) describe the employed Adam optimiser (cf. Kingma & Ba [60]). In this setting, we think
• of T as the maturity,
• of d as the dimension of the associated optimal stopping problem,
• of N as the time discretisation parameter employed,
• of M as the total number of training steps employed in the Adam optimiser,
• of g as the discounted pay-off function,
• of {t_0, t_1, . . . , t_N} as the discrete time grid employed,
• of J_0 as the number of Monte Carlo samples employed in the final integration for the price approximation,
• of (J_m)_{m∈N} as the sequence of batch sizes employed in the Adam optimiser,
• of ζ_1 as the momentum decay factor, of ζ_2 as the second momentum decay factor, and of ε as the regularising factor employed in the Adam optimiser,
• of (γ_m)_{m∈N} as the sequence of learning rates employed in the Adam optimiser,
• and, where applicable, of X as a continuous-time model for d underlying stock prices with initial prices ξ, drift coefficient function μ, and diffusion coefficient function σ.
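For concreteness, a single Adam step in ascent form can be sketched as follows (scalar version; the function signature is ours, with ζ_1, ζ_2, and ε playing the roles described above, and the toy objective in the usage lines is unrelated to the option pricing problems):

```python
import math

def adam_ascent_step(theta, grad, m_avg, v_avg, step, gamma,
                     zeta1=0.9, zeta2=0.999, eps=1e-8):
    """One step of adaptive moment estimation in ascent form: the parameter
    moves in the direction of the (estimated) gradient of the objective."""
    m_avg = zeta1 * m_avg + (1.0 - zeta1) * grad         # first-moment estimate
    v_avg = zeta2 * v_avg + (1.0 - zeta2) * grad ** 2    # second-moment estimate
    m_hat = m_avg / (1.0 - zeta1 ** step)                # bias corrections
    v_hat = v_avg / (1.0 - zeta2 ** step)
    return theta + gamma * m_hat / (math.sqrt(v_hat) + eps), m_avg, v_avg

# illustrative use: maximise theta -> -(theta - 3)^2 starting from theta = 0
theta, m_avg, v_avg = 0.0, 0.0, 0.0
for step in range(1, 3001):
    grad = -2.0 * (theta - 3.0)
    theta, m_avg, v_avg = adam_ascent_step(theta, grad, m_avg, v_avg, step, gamma=0.01)
```

In the algorithm, the scalar theta is replaced by the parameter vector Θ_m ∈ R^ν, the moment estimates are stored in the bookkeeping process Ξ_m ∈ R^{2ν} = R^ς, and the gradient is the mini-batch gradient of the objective function.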
Moreover, note that for every m ∈ N 0 , j ∈ N the stochastic processes W m,j,(1) = (W

Examples with known one-dimensional representation
In this subsection, we test the algorithm of Framework 3.2 on different d-dimensional optimal stopping problems that can be represented as one-dimensional optimal stopping problems. This representation allows us to employ a numerical method for the one-dimensional optimal stopping problem to compute reference values for the original d-dimensional optimal stopping problem. We refer to Subsection 4.4 below for more challenging examples where a one-dimensional representation is not known.

A Bermudan put-type example with three exercise opportunities
In this subsection, we test the algorithm of Framework 3.2 on the example of optimally stopping a correlated Brownian motion under a put option inspired pay-off function with three possible exercise dates. Among other things, we examine the performance of the algorithm for different numbers of hidden layers of the employed neural networks. Assume Framework 4.6. The random variable P given in (67) then provides approximations of the real number

sup{ E[g(τ, S W^{0,1}_τ)] : τ : Ω → {t_0, t_1, t_2} is an (F_t)_{t∈{t_0,t_1,t_2}}-stopping time }.    (100)
The numbers in Table 1 were obtained with our standard network architecture with two hidden layers. The table shows approximations of the mean of P, of the standard deviation of P, and of the relative L^1-approximation error associated with P, the uncorrected sample standard deviation of the relative approximation error associated with P, and the average runtime in seconds needed for calculating one realisation of P for d ∈ {1, 5, 10, 50, 100, 500, 1000}. For each case, the calculations of the results in Tables 1-3 are based on 10 independent realisations of P, which were obtained from an implementation in Python. Furthermore, in the approximative calculations of the relative approximation error associated with P, the exact number (100) was replaced, independently of the dimension d, by the real number in (101) (cf. Corollary 4.2), which, in turn, was replaced by the value 7.894. The latter was computed in Matlab R2017b using the binomial tree method implemented in Matlab's function optstockbycrr with 20,000 nodes. Note that (101) corresponds to the price of a Bermudan put option on a single stock in the Black-Scholes model with initial stock price χ, interest rate r, volatility β, strike price K, maturity T, and N possible exercise dates.
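Binomial trees of the kind used for such reference values (Matlab's optstockbycrr implements one) can be sketched in a few lines; the parameters in the usage lines below are illustrative and not those of this example:

```python
import math

def crr_put(S0, K, r, sigma, T, n, exercise_steps):
    """Cox-Ross-Rubinstein binomial tree for a put option. Early exercise
    is allowed exactly at the time indices in exercise_steps (the option
    always pays off at maturity); n is the number of tree steps."""
    dt = T / n
    u = math.exp(sigma * math.sqrt(dt))
    d = 1.0 / u
    q = (math.exp(r * dt) - d) / (u - d)      # risk-neutral up-probability
    disc = math.exp(-r * dt)
    # option values at maturity (node j = number of up-moves)
    V = [max(K - S0 * u ** j * d ** (n - j), 0.0) for j in range(n + 1)]
    for i in range(n - 1, -1, -1):            # backward induction
        V = [disc * (q * V[j + 1] + (1.0 - q) * V[j]) for j in range(i + 1)]
        if i in exercise_steps:               # early exercise at this date
            V = [max(V[j], K - S0 * u ** j * d ** (i - j)) for j in range(i + 1)]
    return V[0]

# illustrative parameters: a Bermudan put exercisable at three interior
# dates, bracketed by its European and American counterparts on the same tree
european = crr_put(100.0, 95.0, 0.01, 0.25, 1.0, 500, set())
bermudan = crr_put(100.0, 95.0, 0.01, 0.25, 1.0, 500, {125, 250, 375})
american = crr_put(100.0, 95.0, 0.01, 0.25, 1.0, 500, set(range(500)))
```

Varying the set of exercisable time indices covers European, Bermudan, and American exercise rights with the same backward induction, which is how the one-dimensional reference values in this section are obtained.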
Due to the underlying one-dimensional structure, all examples in Subsection 4.3 admit an optimal stopping rule which, at each possible exercise date t_n, checks whether the current pay-off is above a threshold c_n ∈ R. Therefore, we also apply our algorithm to the example in Subsection 4.3.1.1 using networks with one input neuron and no hidden layers. This corresponds to learning the thresholds c_n from simulated pay-offs with one-dimensional logistic regressions. We used the same number of simulations as in Table 1 and batch normalisation before the first linear transformation but no batch normalisation before the logistic function. As can be seen from Table 2, the results have the same accuracy as the ones of Table 1 and, in addition, the computation times are shorter. However, as we will see in Subsection 4.4, it cannot be hoped that good results can be obtained with a simplified network architecture if the stopping problem is more complex. Table 3 shows approximations of (100) for d = 10 obtained with networks with (d + 1)-dimensional input layers and different numbers of hidden layers. Again, it can be seen that in this example hidden layers do not improve the accuracy of the results.
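The correspondence between no-hidden-layer networks and one-dimensional logistic regressions can be illustrated on synthetic data (everything below is an illustrative toy, not the example of Subsection 4.3.1.1): a logistic fit σ(a·p + b) to simulated pay-offs p recovers a stopping threshold as −b/a:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data with a known optimal threshold of 2: stopping is labelled
# optimal exactly when the simulated pay-off p exceeds 2.
p = rng.uniform(0.0, 4.0, 4000)              # simulated pay-offs
y = (p >= 2.0).astype(float)                 # 1 = stopping is optimal

a, b, lr = 0.0, 0.0, 0.5
for _ in range(5000):                        # gradient ascent on the log-likelihood
    s = 1.0 / (1.0 + np.exp(-np.clip(a * p + b, -50.0, 50.0)))
    a += lr * float(np.mean((y - s) * p))
    b += lr * float(np.mean(y - s))

c = -b / a                                   # learned threshold estimate
```

The decision boundary c plays the role of the threshold c_n, and thresholding the logistic output at 1/2 reproduces the stop/continue decision of a one-input network without hidden layers.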

An American put-type example
In this subsection, we test the algorithm of Framework 3.2 on the example of optimally stopping a standard Brownian motion under a put option inspired pay-off function. Assume Framework 4.6. The random variable P given in (67) then provides approximations of the real number in (103). We report approximations of the mean of P, of the standard deviation of P, and of the relative L^1-approximation error associated with P, the uncorrected sample standard deviation of the relative approximation error associated with P, and the average runtime in seconds needed for calculating one realisation of P for d ∈ {1, 5, 10, 50, 100, 500, 1000} in Table 4. For each case, the calculations of the results in Table 4 are based on 10 independent realisations of P, which were obtained from an implementation in Python. Furthermore, in the approximative calculations of the relative approximation error associated with P, the exact number (103) was replaced, independently of the dimension d, by the real number in (104).

An American geometric average put-type example

In this subsection, we test the algorithm of Framework 3.2 on the example of pricing an American geometric average put-type option on correlated stocks in the Black-Scholes model. Assume Framework 4.6, let β = (β_1, . . . , β_d) ∈ R^d, ρ, δ̃, β̃, δ_1, δ_2, . . . , δ_d ∈ R, r = 0.6, K = 95, ξ̃ = 100 satisfy suitable conditions for all i ∈ {1, . . . , d}, and let X : [0, T] × Ω → R be an F-adapted stochastic process with continuous sample paths which satisfies the associated dynamics P-a.s. for all t ∈ [0, T]. The random variable P given in (67) then provides approximations of the price (108). In Table 5, we show approximations of the mean of P, of the standard deviation of P, and of the relative L^1-approximation error associated with P, the uncorrected sample standard deviation of the relative approximation error associated with P, and the average runtime in seconds needed for calculating one realisation of P for d ∈ {40, 80, 120, 160, 200}. For each case, the calculations of the results in Table 5 are based on 10 independent realisations of P, which were obtained from an implementation in Python. Furthermore, in the approximative calculations of the relative approximation error associated with P, the exact value of the price (108) was replaced, independently of the dimension d, by the real number in (109) (cf. Proposition 4.3), which, in turn, was replaced by the value 6.545. The latter was calculated using the binomial tree method on Smirnov's website [87] with 20,000 nodes. Note that (109) corresponds to the price of an American put option on a single stock in the Black-Scholes model with initial stock price ξ̃, interest rate r, dividend yield δ̃, volatility β̃, strike price K, and maturity T.

Table 6: Numerical simulations of the algorithm from [9] for pricing the American geometric average put-type option from the example in Subsection 4.3.2.1. In the approximative calculations of the relative approximation errors, the exact value of the price (108) was again replaced by the value 6.545.
In addition, we computed approximations of the price (108) using the algorithm introduced in [9], where the random variable L̂ from [9, Subsection 3.1] plays a role analogous to that of P. In order to maximise comparability, the hyperparameters and neural network architectures employed for the algorithm from [9] were chosen to be identical to the corresponding ones used for computing realisations of P. Table 6 shows approximations of the mean of L̂ (cf. [9, Subsection 3.1]), of the standard deviation of L̂, and of the relative L^1-approximation error associated with L̂, the uncorrected sample standard deviation of the relative approximation error associated with L̂, and the average runtime in seconds needed for calculating one realisation of L̂ for d ∈ {40, 80, 120, 160, 200}. For each case, the calculations of the results in Table 6 are based on 10 independent realisations of L̂, which were obtained from an implementation in Python. In the approximative calculations of the relative approximation error associated with L̂, the exact value of the price (108) again was replaced, independently of the dimension d, by the value 6.545. Comparing Table 5 with Table 6, we note that in the present cases the algorithm of Framework 3.2 and the algorithm from [9] exhibit very similar performance in terms of both accuracy and speed, with a slight runtime advantage for the algorithm of Framework 3.2.

An American geometric average call-type example
In this subsection, we test the algorithm of Framework 3.2 on the example of pricing an American geometric average call-type option on up to 200 correlated stocks in the Black-Scholes model. This example is taken from Sirignano & Spiliopoulos [86, Subsection 4.3].
Assume Framework 4.6, let r = 0%, δ = 0.02 = 2%, β = 0.25 = 25%, K = ξ = 1, let Y : [0, T] × Ω → R be an F-adapted stochastic process with continuous sample paths which satisfies that for all t ∈ [0, T] it holds P-a.s. that and that The random variable P given in (67) provides approximations of the price (113). We show approximations of the mean of P, of the standard deviation of P, and of the relative L¹-approximation error associated with P, the uncorrected sample standard deviation of the relative approximation error associated with P, and the average runtime in seconds needed for calculating one realisation of P for d ∈ {3, 20, 100, 200} in Table 7. The approximative calculations of the mean of P, of the standard deviation of P, and of the relative L¹-approximation error associated with P, the computations of the uncorrected sample standard deviation of the relative approximation error associated with P as well as the computations of the average runtime for calculating one realisation of P in Table 7 are each based on 10 independent realisations of P, which were obtained from an implementation in Python. Furthermore, in the approximative calculations of the relative approximation error associated with P, the exact value of the price (113) was replaced by the number (114) (cf. Proposition 4.3), which was approximatively calculated using the binomial tree method on Smirnov's website [87] with 20,000 nodes. Note that (114) corresponds to the price of an American call option on a single stock in the Black-Scholes model with initial stock price ξ, interest rate r, dividend yield δ, volatility β̃, strike price K, and maturity T.
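The reason a one-dimensional reference price is available here (cf. Proposition 4.3) is that the geometric average of geometric Brownian motions is itself a one-dimensional geometric Brownian motion with adjusted parameters. The following sketch computes the effective one-dimensional volatility and dividend yield in the illustrative special case of a common volatility, a common dividend yield, and a constant pairwise correlation; the concrete correlation structure of this example is not reproduced here:

```python
import math

def effective_1d_parameters(d, beta, delta, rho=0.0):
    """Effective Black-Scholes parameters of the geometric average of d
    geometric Brownian motions with common volatility `beta`, common dividend
    yield `delta`, and constant pairwise correlation `rho` (illustrative
    special case of the reduction behind Proposition 4.3)."""
    # Var((beta/d) * sum_i W^i_t) = beta^2 * (1 + (d - 1) * rho) / d * t
    beta_eff = beta * math.sqrt((1.0 + (d - 1) * rho) / d)
    # matching the drift of the geometric average with a 1-d GBM under the
    # same interest rate forces an adjusted dividend yield
    delta_eff = delta + 0.5 * (beta**2 - beta_eff**2)
    return beta_eff, delta_eff
```

For d = 1 the reduction is the identity; for many independent stocks (rho = 0) the effective volatility shrinks like beta / sqrt(d), while the effective dividend yield grows accordingly.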

Another American geometric average call-type example
In this subsection, we test the algorithm of Framework 3.2 on the example of pricing an American geometric average call-type option on up to 400 distinguishable stocks in the Black-Scholes model. Assume Framework 4.6, assume that d ∈ {40, 80, 120, . . .}, let β = (β_1, . . . , β_d) ∈ R^d, α_1, . . . , α_d ∈ R, r, β̃ ∈ (0, ∞), K = 95, ξ = 100 satisfy for all i ∈ {1, . . . , d} that and let Y : [0, T] × Ω → R be an F-adapted stochastic process with continuous sample paths which satisfies that for all t ∈ [0, T] it holds P-a.s. that and that The random variable P given in (67) provides approximations of the price

sup{ E[g(τ, X_τ)] : τ : Ω → [0, T] is an F-stopping time }.   (118)

In Table 8, we show approximations of the mean of P, of the standard deviation of P, of the real number

E[e^{−rT} max{Y_T − K, 0}],   (119)

and of the relative L¹-approximation error associated with P, the uncorrected sample standard deviation of the relative approximation error associated with P, and the average runtime in seconds needed for calculating one realisation of P for d ∈ {40, 80, 120, 160, 200, 400}. The approximative calculations of the mean of P, of the standard deviation of P, and of the relative L¹-approximation error associated with P, the computations of the uncorrected sample standard deviation of the relative approximation error associated with P as well as the computations of the average runtime for calculating one realisation of P in Table 8 are each based on 10 independent realisations of P, which were obtained from an implementation in Python. Moreover, in the approximative calculations of the relative approximation error associated with P, the exact value of the price (118) was replaced by the real number

sup{ E[e^{−rτ} max{Y_τ − K, 0}] : τ : Ω → [0, T] is an F-stopping time }   (120)

(cf. Proposition 4.3). It is well known (cf., e.g., Shreve [84, Corollary 8.5.3]) that the number (120) is equal to the number (119), which was approximatively computed in Matlab R2017b using Lemma 4.4 above.
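The reference value (119) is a plain European call price in a one-dimensional Black-Scholes model and can therefore be evaluated in closed form. A minimal Python analogue of the Matlab computation via Lemma 4.4 (the standard Black-Scholes call formula without dividends; the parameter values in the usage line are illustrative) might look as follows:

```python
import math

def bs_european_call(s0, strike, r, sigma, maturity):
    """Closed-form Black-Scholes price of a European call without dividends
    (a sketch of the kind of computation behind the reference value (119))."""
    sqrt_t = math.sqrt(maturity)
    d1 = (math.log(s0 / strike) + (r + 0.5 * sigma**2) * maturity) / (sigma * sqrt_t)
    d2 = d1 - sigma * sqrt_t
    # standard normal cdf via the error function
    cdf = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    return s0 * cdf(d1) - strike * math.exp(-r * maturity) * cdf(d2)
```

For instance, `bs_european_call(100.0, 95.0, r, sigma, maturity)` evaluates the formula for the initial price and strike of this example once the effective rate and volatility are supplied.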
Note that (120) corresponds to the price of an American call option on a single stock in the Black-Scholes model with initial stock price ξ, interest rate r, volatility β̃, strike price K, and maturity T, while (119) corresponds to the price of a European call option on a single stock in the Black-Scholes model with initial stock price ξ, interest rate r, volatility β̃, strike price K, and maturity T.

A Bermudan max-call benchmark example
Assume Framework 4.6, let H ∈ N_0, r = 0.05 = 5%, δ = 0.1 = 10%, β = 0.2 = 20%, K = 100, let F = (F_t)_{t∈[0,T]} be the filtration generated by X, assume that each of the employed neural networks has H hidden layers, and assume for all m, j ∈ N, n ∈ {0, 1, . . . , N},
In Table 9, we show approximations of the mean and of the standard deviation of P, binomial approximations as well as 95% confidence intervals for the price (123) according to Andersen & Broadie [3, Table 3 in Subsection 5.3] (where available), and the average runtime in seconds needed for calculating one realisation of P for (d, ξ_1) ∈ {2, 3, 5} × {90, 100, 110} and H = 2. The approximative calculations of the mean and of the standard deviation of P as well as the computations of the average runtime for calculating one realisation of P in Tables 9-11 are each based on 10 independent realisations of P, which were obtained from an implementation in Python (cf. Python code 2 in Subsection 5.2 below). In Table 10, we list approximations of the mean and of the standard deviation of P and the average runtime in seconds needed for calculating one realisation of P for (d, ξ_1) ∈ {10, 20, 30, 50, 100, 200, 500} × {90, 100, 110} and H = 2.
To see the impact of the number of hidden layers used in the neural networks, we additionally report in Table 11 approximation results for d = 5, ξ_1 = 100, and H ∈ {0, 1, 2, 3, 4, 5}. We used the same number of simulations as in Tables 9-10 but, due to the higher number of hidden layers, chose M = 5000 and, for all m ∈ N, γ_m = 5[10^(−2) 1_[1,1000](m) + 10^(−3) 1_(1000,3000](m) + 10^(−4) 1_(3000,∞)(m)]. It can be seen that, in this example, two hidden layers yield better results than zero or one hidden layer, while more than two hidden layers do not lead to a further improvement.
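The learning-rate schedule above is piecewise constant in the training step m. As a small sanity check, the same schedule can be written as a plain function (the function name is illustrative):

```python
def gamma(m):
    """Learning rate gamma_m = 5 * (1e-2 * 1_[1,1000](m)
    + 1e-3 * 1_(1000,3000](m) + 1e-4 * 1_(3000,inf)(m)) for training step m >= 1."""
    if 1 <= m <= 1000:
        return 5e-2
    if 1000 < m <= 3000:
        return 5e-3
    return 5e-4
```

In a training loop, `gamma(m)` would be passed to the optimiser as the step size for the m-th gradient step.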

A high-dimensional Bermudan max-call benchmark example
In this subsection, we test the algorithm of Framework 3.2 on the example of pricing the Bermudan max-call option from the example in Subsection 4.4.1.1 in a case with 5000 underlying stocks. All Python source codes corresponding to this example were run in single precision (float32) on an NVIDIA Tesla P100 GPU. Assume the setting of the example in Subsection 4.4.1.1 with d = 5000 and that g(s, x) = e^{−rs} max{max{x_1, . . . , x_d} − K, 0}.
For sufficiently large M ∈ N, the random variable P provides approximations of the price

sup{ E[g(τ, X_τ)] : τ : Ω → [0, T] is an F-stopping time }.   (126)

In Table 12, we show a realisation of P, a 95% confidence interval for the corresponding realisation of the random variable (127), the corresponding realisation of the relative approximation error associated with P, and the runtime in seconds needed for calculating the realisation of P for M ∈ {0, 250, 500, . . . , 2000} ∪ {6000}. In addition, Figure 1 depicts a realisation of the relative approximation error associated with P against M ∈ {0, 10, 20, . . . , 2000}. For each case, the 95% confidence interval for the realisation of the random variable (127) in Table 12 was computed based on the corresponding realisation of P, the corresponding sample standard deviation, and the 0.975 quantile of the standard normal distribution (cf., e.g., [9, Subsection 3.3]). Moreover, in the approximative calculations of the realisation of the relative approximation error associated with P in Table 12 and Figure 1, the exact value of the price (126) was replaced by the value 165.430, which corresponds to a realisation of P with M = 6000 (cf. Table 12).
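The confidence intervals just described are built from the sample mean, the sample standard deviation, and the 0.975 quantile of the standard normal distribution. A minimal version of this construction (assuming i.i.d. Monte Carlo samples; the function name is illustrative) is:

```python
import math

def confidence_interval_95(samples):
    """Two-sided asymptotic 95% confidence interval for the mean of i.i.d.
    Monte Carlo samples, using the corrected sample standard deviation and
    the 0.975 standard-normal quantile."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((x - mean) ** 2 for x in samples) / (n - 1)  # corrected sample variance
    half_width = 1.959963984540054 * math.sqrt(var / n)    # 0.975 normal quantile
    return mean - half_width, mean + half_width
```

The function returns the pair (lower bound, upper bound); the interval is symmetric around the sample mean.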
[Table columns: Dimension d | Maturity T | Strike price K | Mean of P | Standard deviation of P | European price (131) | Price in [5] | Average runtime in sec. for one realisation of P]

The random variable P given in (67) provides the approximations reported in Table 14. The calculations of the results in Table 14, including the average runtime for calculating one realisation of P, are each based on 10 independent realisations of P, which were obtained from an implementation in Python (cf. Python code 4 in Subsection 5.4 below).

A put basket option in Dupire's local volatility model
In this subsection, we test the algorithm of Framework 3.2 on the example of pricing an American put basket option on five stocks in Dupire's local volatility model. This example is taken from Labart & Lelong [70, Subsection 6.3] with the modification that we also consider the case where the underlying stocks do not pay any dividends. Assume Framework 4.6, let L = 10, r = 0.05 = 5%, δ ∈ {0%, 10%}, K = 100, assume for all i ∈ {1, . . . , d}, x ∈ R^d that ξ_i = 100 and μ(x) = (r − δ)x, let β : and σ(t, x) = diag(β(t, x_1), β(t, x_2), . . . , β(t, x_d)), and let S = (S^(1), . . . , S^(d)) : [0, T] × Ω → R^d be an F-adapted stochastic process with continuous sample paths which satisfies that for all t ∈ [0, T] it holds P-a.s. that The random variable P given in (67) provides approximations of a quantity which, in turn, is an approximation of the price (141). In Table 15, we show approximations of the mean and of the standard deviation of P and the average runtime in seconds needed for calculating one realisation of P for (δ, N) ∈ {0%, 10%} × {5, 10, 50, 100}. For each case, the calculations of the results in Table 15 are based on 10 independent realisations of P, which were obtained from an implementation in Python (cf. Python code 5 in Subsection 5.5 below). According to [70, Subsection 6.3], the value 6.30 is an approximation of the price (141) for δ = 10%. Furthermore, the European put basket option price E[g(T, Y^{0,1}_T)] corresponding to (141) was approximatively calculated using a Monte Carlo approximation based on 10^10 realisations of the random variable Ω ∋ ω ↦ g(T, Y^{0,1}_T(ω)) ∈ R (cf. Python code 5 in Subsection 5.5 below), which resulted in the value 1.741 in the case δ = 0% and in the value 6.304 in the case δ = 10%.
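The stock dynamics in this example are of local volatility type, with drift μ(x) = (r − δ)x and a diagonal diffusion matrix σ(t, x) = diag(β(t, x_1), . . . , β(t, x_d)). A bare-bones Euler-Maruyama sketch of such dynamics follows; here `local_vol` is a placeholder for the function β (whose concrete Dupire-type form from [70] is not reproduced in this text), the Brownian drivers are taken independent for simplicity, and the put-on-the-average payoff is an illustrative choice:

```python
import math
import random

def euler_basket(d, s0, r, delta, maturity, n_steps, local_vol, rng):
    """Euler-Maruyama discretisation of dS^i_t = (r - delta) S^i_t dt
    + local_vol(t, S^i_t) dW^i_t with independent Brownian drivers.
    `local_vol` stands in for the (unspecified here) function beta."""
    dt = maturity / n_steps
    s = [float(s0)] * d
    for k in range(n_steps):
        t = k * dt
        s = [si + (r - delta) * si * dt
             + local_vol(t, si) * math.sqrt(dt) * rng.gauss(0.0, 1.0)
             for si in s]
    return s

def put_basket_payoff(s, strike):
    """Payoff of a put on the arithmetic average of the basket (illustrative)."""
    return max(strike - sum(s) / len(s), 0.0)
```

With `local_vol = lambda t, x: 0.0` the scheme degenerates to the deterministic drift, which is a convenient correctness check before plugging in a genuine local volatility surface.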

A path-dependent financial derivative
In this subsection, we test the algorithm of Framework 3.2 on the example of pricing a specific path-dependent financial derivative contingent on prices of a single underlying stock in the Black-Scholes model, which is formulated as a 100-dimensional optimal stopping problem. This example is taken from Tsitsiklis & Van Roy [90, Section IV] with the modification that we consider a finite instead of an infinite time horizon.
Assume Framework 4.6, let r = 0.0004 = 0.04%, β = 0.02 = 2%, let W^{m,j} : [0, ∞) × Ω → R, j ∈ N, m ∈ N_0, be independent P-standard Brownian motions with continuous sample paths, and let S^{m,j} : [−100, ∞) × Ω → R, j ∈ N, m ∈ N_0, and Y^{m,j} : N_0 × Ω → R^{100}, j ∈ N, m ∈ N_0, be the stochastic processes which satisfy for all m, n ∈ N_0, j ∈ N, t ∈ [−100, ∞) that S^{m,j}_t = exp((r − (1/2)β²)(t + 100) + β W^{m,j}_{t+100}) ξ_1. In Table 16, we show approximations of the mean and of the standard deviation of P and the average runtime in seconds needed for calculating one realisation of P for T ∈ {100, 150, 200, 250, 1000}. For each case, the calculations of the results in Table 16 are based on 10 independent realisations of P, which were obtained from an implementation in Python (cf. Python code 6 in Subsection 5.6 below). Note that in this example time is measured in days and that, roughly speaking, (144) corresponds to the price of a financial derivative which, if the holder decides to exercise, pays off the amount given by the ratio between the current underlying stock price and the underlying stock price 100 days ago (cf. [90, Section IV]). The price (145) corresponds to the price (144) in the case of an infinite time horizon. Since the mean of P is a lower bound for the price (144), which, in turn, is a lower bound for the price (145), a higher value indicates a better approximation of the price (145). In addition, observe that the price (144) is non-decreasing in T. While in our numerical simulations the approximate value of the mean of P is less than or equal to 1.282 for comparatively small time horizons, i.e., for T ≤ 150, it is already higher for slightly larger time horizons, i.e., for T ≥ 200 (cf. Table 16).
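The exercise payoff of this path-dependent derivative at a (daily) exercise time n is the ratio between the current stock price and the stock price 100 days earlier. The following small simulation sketch generates one Black-Scholes log-price path on a daily grid, starting 100 days before time 0, and returns that ratio for every day up to the horizon (time measured in days, as in the example; discounting is omitted and the function name is illustrative):

```python
import math
import random

def ratio_payoffs(r, beta, horizon_days, rng):
    """Simulate one Black-Scholes log-price path on a daily grid, starting
    100 days before time 0, and return the undiscounted exercise payoff
    S_n / S_{n-100} for each day n in {0, 1, ..., horizon_days}."""
    # daily risk-neutral log-increments: (r - beta^2 / 2) + beta * N(0, 1)
    log_s = [0.0]
    for _ in range(horizon_days + 100):
        log_s.append(log_s[-1] + (r - 0.5 * beta**2) + beta * rng.gauss(0.0, 1.0))
    # index n + 100 is day n, since the path starts at day -100
    return [math.exp(log_s[n + 100] - log_s[n]) for n in range(horizon_days + 1)]
```

Note that the initial stock price cancels out of the ratio, which is why the path can be simulated with log S_{-100} = 0; an optimal stopping rule for this derivative only needs the 100 most recent daily ratios, which is what makes the problem 100-dimensional.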