
Exact simulation for multivariate Itô diffusions

Published online by Cambridge University Press:  03 December 2020

Jose Blanchet*
Affiliation:
Stanford University
Fan Zhang*
Affiliation:
Stanford University
*Postal address: Huang Engineering Center, 475 Via Ortega, Stanford, CA 94305, United States.

Abstract

We provide the first generic exact simulation algorithm for multivariate diffusions. Current exact sampling algorithms for diffusions require the existence of a transformation which can be used to reduce the sampling problem to the case of a constant diffusion matrix and a drift which is the gradient of some function. Such a transformation, called the Lamperti transformation, can be applied in general only in one dimension. So, completely different ideas are required for the exact sampling of generic multivariate diffusions. The development of these ideas is the main contribution of this paper. Our strategy combines techniques borrowed from the theory of rough paths, on the one hand, and multilevel Monte Carlo on the other.

Type
Original Article
Copyright
© Applied Probability Trust 2020

1. Introduction

Consider a probability space $(\Omega,\mathcal{F},\mathbb{P})$ and an Itô stochastic differential equation (SDE)

(1) \begin{equation}dX(t)=\mu (X(t))dt+\sigma (X(t))dW(t),\text{ }X(0)=x_0,\end{equation}

where $W(\!\cdot\!)$ is a $d^{\prime }$-dimensional Brownian motion under $\mathbb{P}$, and $\mu (\!\cdot\!) = (\mu_i(\!\cdot\!))_d\,{:}\,\mathbb{R}^{d}\rightarrow\mathbb{R}^{d}$ and $\sigma (\!\cdot\!) = (\sigma_{ij}(\!\cdot\!))_{d \times d^{\prime }}\,{:}\,\mathbb{R}^{d}\rightarrow \mathbb{R}^{d\times d^{\prime }}$ satisfy suitable regularity conditions. For instance, in order for (1) to have a strong solution, it is sufficient to assume that both $\mu \left( \cdot \right) $ and $\sigma \left( \cdot \right) $ are uniformly Lipschitz.

Under additional regularity conditions on $\mu \left( \cdot \right) $ and $\sigma \left( \cdot \right) $, this paper provides the first Monte Carlo simulation algorithm which allows sampling of any discrete skeleton $X\left(t_{1}\right)\!,...,X\left( t_{m}\right) $ exactly, without any bias.

The precise regularity conditions that we impose on $\mu \left( \cdot\right) $ and $\sigma \left( \cdot \right) $ are stated in Section 3. In particular, it is sufficient for the validity of our Monte Carlo method to assume $\mu \left( \cdot \right) $ and $\sigma \left( \cdot\right) $ to be three times continuously differentiable, both with Lipschitz continuous derivatives of order three. In addition, we must assume that $\sigma \left( \cdot \right) $ is uniformly elliptic.

Exact simulation of SDEs has generated a substantial amount of interest in the applied probability and Monte Carlo simulation communities. The landmark paper [Reference Beskos and Roberts5] introduced what has become the standard procedure for the design of generic exact simulation algorithms for diffusions. The authors of [Reference Beskos and Roberts5] propose a clever acceptance-rejection sampler which uses Brownian motion as a proposal distribution. The authors of [Reference Chen and Huang7] apply a localization technique which eliminates certain boundedness assumptions originally present in [Reference Beskos and Roberts5]; see also [Reference Beskos, Papaspiliopoulos and Roberts4] for the use of retrospective simulation ideas to dispense with boundedness assumptions.

The fundamental assumption underlying the work of [Reference Beskos and Roberts5] and its extensions is that the underlying (target) process has a constant diffusion coefficient; i.e., $\sigma \left( x\right) =\sigma $ for every x. Beskos and Roberts [Reference Beskos and Roberts5] note that in the case $d=1$, owing to Lamperti’s transformation, the constant diffusion coefficient assumption comes basically at no cost in generality.

Unfortunately, however, Lamperti’s transformation is only generally applicable in one dimension. In fact, [Reference Ait-Sahalia1] characterizes the multidimensional diffusions for which Lamperti’s transformation can be successfully applied, and these models are very restrictive.

Moreover, even if Lamperti’s transformation is applicable in a multidimensional setting, another implicit assumption in the application of the Beskos and Roberts acceptance-rejection procedure is that the drift coefficient $\mu \left( \cdot \right) $ is the gradient of some function (i.e. $\mu \left( x\right) =\nabla v\left( x\right) $ for some $v\left( \cdot \right) $). This assumption, once again, comes at virtually no cost in generality in the one-dimensional setting, but it may be very restrictive in the multidimensional case.

Because of these limitations, a generic algorithm for exact simulation of multidimensional diffusions, even under the regularity conditions that we impose here, requires a completely different set of ideas.

The contribution in this paper is therefore not only the production of such a generic exact simulation algorithm, but also the development of the ideas that are behind its construction. In Section 2 we introduce an algorithm which assumes a constant diffusion coefficient, but removes the assumption of the drift coefficient being the gradient of some function. In Section 3, we eventually remove the requirement of a constant diffusion matrix and propose an algorithm applicable to general diffusions. The algorithms in Sections 2 and 3 are different in nature. However, they share some common elements, such as the use of so-called Tolerance-Enforced Simulation (TES) techniques based on rough path estimates. Even though the algorithm in Section 3 is more general, we believe that there is significant value in developing the algorithm in Section 2, for two reasons. The first one is pedagogical: the algorithm in Section 2 is easier to understand, while building on a key idea, which involves localizing essential quantities within specific compact domains with probability one. The second reason is that the algorithm in Section 2, being simpler, may be subject to potential improvement methodologies to be pursued in future research.

Potential improvements are particularly interesting directions, especially given that, unfortunately, the algorithms that we present have infinite expected termination time. We recognize that this issue should be resolved for the algorithms to be widely used in practice, and we discuss the elements which lead to infinite expected running time in Section 4. There are basically two types of elements that affect the running time of the algorithm in Section 3; one of them has to do with the types of issues that arise when trying to fully remove the bias in TES and related approximations, and the other has to do with the use of a density approximation coupled with Bernoulli factories. In contrast, the algorithm in Section 2 is only affected by removing the bias in TES-type approximations. We must stress, however, that the present paper shows for the first time that it is possible to perform exact sampling of multidimensional diffusions in substantial generality, and, in doing so, it provides a conceptual framework different from the prevailing use of Lamperti transformation, which is the only available generic approach for producing exact sampling of diffusions.

Now, despite the algorithm’s practical limitations, it is vital to recognize the advantages that unbiased samplers have over biased samplers in the context of a massive parallel computing environment, because it is straightforward to implement a parallel procedure to reduce the estimation error of an unbiased sampler.

Recently, there have been several unbiased estimation procedures which have been proposed for expectations of the form $\alpha =\mathbb{E}\!\left(\,f\left(X\left( t\right) \right) \right) $, assuming $\text{Var}\!\left(\,f\left(X\left( t\right) \right) \right) <\infty $. For example, the work of [Reference Rhee and Glynn21] shows that if $f\left( \cdot \right) $ is twice continuously differentiable (with Lipschitz derivatives) and if there exists a discretization scheme which can be implemented with a strong convergence error of order 1, then it is possible to construct an unbiased estimator for $\alpha $ with finite variance and finite expected termination time. The work of [Reference Giles and Szpruch12] shows that such a discretization scheme can be developed if $\mu \left( \cdot \right) $ and $\sigma \left( \cdot\right) $ are sufficiently smooth under certain boundedness assumptions. The paper [Reference Henry-Labordère, Tan and Touzi13] also develops an unbiased estimator for $\alpha $ using a regime-switching technique. Our work here is somewhat related to this line of research, but an important difference is that we actually generate $X\left( t\right) $ exactly, while all of the existing algorithms which apply in multidimensional diffusion settings generate Z such that $\mathbb{E}\left( Z\right) =\alpha$. So, for example, if $f\left( \cdot \right) $ is positive, one cannot guarantee that Z is positive using the type of samplers suggested in [Reference Rhee and Glynn21]. However, by sampling $X\left( t\right) $ directly, one maintains the positivity of the estimator.

Another instance in which direct exact samplers are useful arises in the context of stochastic optimization. For instance, consider the case in which one is interested in optimizing a convex function of the form $g\left(\theta \right) =\mathbb{E}\left( h\left( X\left( t\right)\!,\theta \right)\right) $, where $h\left( x,\cdot \right) $ is differentiable. In this case, one can naturally construct an estimator $Z\left( \theta \right) $ such that $g\left( \theta \right) =\mathbb{E}\left( Z\left( \theta \right) \right) $ using the results in [Reference Rhee and Glynn21] and optimize the mapping $\theta\rightarrow $ $n^{-1}\sum_{i=1}^{n}Z_{i}\left( \theta \right) $, which unfortunately will typically not be convex. So, having access to a direct procedure to sample $X\left( t\right) $ in this setting is particularly convenient, as convexity is preserved.

The rest of the paper is organized as follows. In Section 2, we consider the case of multidimensional diffusions with a constant diffusion coefficient and a Lipschitz continuous (suitably smooth) drift. The general case is discussed in Section 3. Our development uses localization ideas which are introduced in Section 2, as well as some basic estimates of the transition density of the underlying diffusion (e.g. Lipschitz continuity), which are developed in Appendix A. As mentioned before, we discuss the bottlenecks in the expected running time of the algorithm in Section 4.

2. Exact simulation of SDEs with identity diffusion coefficient

In the case that Lamperti’s transformation is applicable, the SDE of interest is reducible to another SDE whose diffusion matrix is the identity. As a result, it suffices to consider simulating the following SDE:

(2) \begin{equation}dX(t)=\mu (X(t))dt+dW(t),\quad \quad X(0)=x_{0},\end{equation}

where $W=\left\{ W(t)=(W_{1}(t),\cdots ,W_{d}(t))\,{:}\,0\leq t<\infty \right\} $ is a d-dimensional Brownian motion. In this section we concentrate on the identity diffusion case (2), but the development can be immediately extended to the case of a constant diffusion matrix. However, throughout this section we must impose the following assumptions.

Assumption 1. The SDE (2) has a strong solution.

Assumption 2. The drift coefficient $\mu(\!\cdot\!)$ is three times continuously differentiable.

Assumption 2 is the requirement of TES, the theoretical foundation of our algorithm, which we shall introduce later.

Let us introduce some notation first. For any set G and $x\in \mathbb{R}^{d}$, we use $d(x,G)=\inf \left\{ \Vert x-y\Vert _{2}\,{:}\,y\in G\right\} $ to denote the distance between x and G; $\accentset{\,\,\circ}{G}$ denotes the interior of G; $\partial G$ denotes the boundary of G; $G^{c}$ denotes the complement of G.

Consider a probability space $(\Omega ,\mathcal{F},\tilde{\mathbb{P}})$ endowed with a filtration $\left\{ \mathcal{F}_{t}\,{:}\,0\leq t\leq 1\right\} $, and supporting a d-dimensional Brownian motion

\begin{equation*}X(t)=(X_{1}(t),{\cdots}\,,X_{d}(t)), \qquad 0\leq t\leq 1.\end{equation*}

Let $\left\{ L(t)\,{:}\,0\leq t\leq 1\right\} $ be a $\tilde{\mathbb{P}}$-local martingale defined as

(3) \begin{equation}L(t)=\exp \left( \int_{0}^{t}\mu ^{T}(X(s))dX(s)-\frac{1}{2}\int_{0}^{t}\Vert \mu (X(s))\Vert _{2}^{2}ds\right)\!,\end{equation}

where $\mu ^{T}(\!\cdot\!)$ denotes the transpose of the column vector $\mu(\!\cdot\!)$. Under Assumption 1, $L(\!\cdot\!)$ is a $\tilde{\mathbb{P}}$-martingale; see Corollary 3.5.16 of [Reference Karatzas and Shreve15].

In this case we can define a probability measure $\mathbb{P}$ through

\begin{equation*}\mathbb{P}(A)=\mathbb{E}^{\tilde{\mathbb{P}}}\left[ I(A)L(1)\right]\qquad \forall A\in \mathcal{F},\end{equation*}

where I(A) denotes the indicator function of the set A and $\mathbb{E}^{\tilde{\mathbb{P}}}\left( \cdot \right) $ is the expectation operator under $\tilde{\mathbb{P}}$.

Let

\begin{equation*}W(t)=(W_{1}(t),\cdots ,W_{d}(t)), \qquad 0\leq t\leq 1,\end{equation*}

be a d-dimensional process defined by

(4) \begin{equation}W(t)=X(t)-\int_{0}^{t}\mu (X(s))ds, \qquad 0\leq t\leq 1.\end{equation}

The following theorem provides the distribution of $W(\!\cdot\!)$.

Theorem 1 (Girsanov’s theorem). If Assumption 1 is satisfied, then the process $W(\!\cdot\!)$ is a d-dimensional Brownian motion on the probability space $(\Omega, \mathcal{F},\mathbb{P}).$

Proof. See, for instance, Theorem 3.5.1 of [Reference Karatzas and Shreve15].

It is readily apparent from (4) that $X(\!\cdot\!)$ is a weak solution to the SDE (2) on the probability space $(\Omega ,\mathcal{F},\mathbb{P})$. The exact simulation problem becomes sampling $X(1) $ under the measure $\mathbb{P}$. Since X(1) follows a normal distribution under the measure $\tilde{\mathbb{P}}$, we can attempt to use acceptance-rejection to sample X(1). A direct application of acceptance-rejection may proceed by using the $\tilde{\mathbb{P}}$ distribution of X(1) (which is simply a normal distribution) as the proposal, which, if acceptance-rejection is applicable, would then be accepted with probability proportional to L(1). However, there are two obstacles when trying to apply such a direct acceptance-rejection approach. First, the presence of the general stochastic integral appearing in the definition of $L( 1) $ makes the likelihood ratio difficult to compute directly. Second, a direct application of acceptance-rejection requires the likelihood ratio, L(1), to be bounded, which is unfortunately violated.

In order to deal with the first obstacle, we note that it is really not necessary to accurately evaluate the likelihood ratio. In the standard procedure of acceptance-rejection, the likelihood ratio is only used for comparison with an independent uniform random variable. Thus, to address the first obstacle, we can approximate the likelihood ratio with a deterministic error bound, and keep refining until we can decide to either accept or reject the proposal. It turns out, as we shall see in Corollary 1, that the same approximation technique can actually be used to localize $L\left( 1\right)$ and also resolve the second obstacle. Then, we will sample the distribution of X(1) conditional on the localization of $L\left(1\right)$ using acceptance-rejection, where the rejection scheme is suggested by Lemma 1.

The theoretical foundation for such an approximation and refinement strategy is given by Tolerance-Enforced Simulation (TES), which is presented in Theorem 2.

Theorem 2 (Tolerance-Enforced Simulation). Consider a probability space $(\Omega ,\mathcal{F},\mathbb{P})$ and the following SDE:

(5) \begin{equation}dY(t)=\alpha (Y(t))dt+\nu (Y(t))dW(t),\quad Y(0)=y_{0},\end{equation}

where $\alpha (\!\cdot\!)=(\alpha _{i}(\!\cdot\!))_{d}\,{:}\,\mathbb{R}^{d}\rightarrow\mathbb{R}^{d}$, $\nu (\!\cdot\!)=(\nu _{ij}(\!\cdot\!))_{d\times d^{\prime }}\,{:}\,\mathbb{R}^{d}\rightarrow \mathbb{R}^{d\times d^{\prime }}$, and $W(\!\cdot\!)$ is a d’-dimensional Brownian motion. Suppose that $\alpha (\!\cdot\!)$ is continuously differentiable and that $\nu (\!\cdot\!)$ is three times continuously differentiable. Then, given any deterministic $\varepsilon >0$, there is an explicit Monte Carlo procedure that allows us to simulate a piecewise constant process $Y_{\varepsilon }(\!\cdot\!)$ such that

\begin{equation*}\sup_{t\in \lbrack 0,1]}\Vert Y_{\varepsilon }(t)-Y(t)\Vert _{2}\leq\varepsilon\end{equation*}

with probability one. Furthermore, for any $m>1$ and $0<\varepsilon_{m}<\dots <\varepsilon _{1}<1$, we can simulate $Y_{\varepsilon _{m}}$ conditional on $Y_{\varepsilon _{1}},\dots ,Y_{\varepsilon _{m-1}}$.

Proof. See Theorem 2.1, Theorem 2.2, and Section 2.1 of [Reference Blanchet, Chen and Dong6], where a detailed TES procedure is also provided.

Remark 1. TES is based on the Lévy–Ciesielski construction of the driving Brownian motion $W(\!\cdot\!)$ up to a random level. Consequently, W(1) is obtained for free when we run TES in which a skeleton of the driving Brownian motion $W(\!\cdot\!)$ is simulated. In particular, for any $m>1$ and $0<\varepsilon_{m}<\dots <\varepsilon _{1}<1$, we can simulate $Y_{\varepsilon _{m}}$ conditional on $Y_{\varepsilon _{1}},\dots ,Y_{\varepsilon _{m-1}}$ and W(1).

As a straightforward consequence of Theorem 2, we develop a localization procedure for SDEs in Corollary 1. Before moving forward to state the result, we define some notation that will be used therein.

Definition 1. A family of (Borel measurable) sets $\mathcal{G}=\{G_{i}\subset \mathbb{R}^{d}\,{:}\,i\in \mathbb{N}\}$ is said to be a countable continuous partition for a d-dimensional random vector Y if and only if the following hold:

  1. The sets in $\mathcal{G}$ are mutually disjoint; that is, $G_{i}\cap G_{j}= \emptyset$ for $i\neq j$.

  2. Y is concentrated on $\mathcal{G}$; that is, $\mathbb{P}(Y\in \cup_{i\in \mathbb{N}}G_{i})=1$.

  3. $\mathbb{P}(Y\in \partial G_{i})=0$ for all $i\in \mathbb{N}$.

In addition, a function $\Xi_{\mathcal{G}}(x)\,{:}\,\text{supp}(Y)\rightarrow\mathbb{N}$ is defined such that $\Xi_{\mathcal{G}}(x) = i$ if and only if $x\in G_i$.

Corollary 1. In the setting of Theorem 2, let $\mathcal{G}=\{G_{i}\,{:}\,i\in \mathbb{N}\}$ be a countable continuous partition for Y(1); then there is an algorithm for simulating $\Xi _{\mathcal{G}}(Y(1))$ that terminates in finite time with probability one. In particular, for any set G such that $\mathbb{P}(Y(1)\in \partial G)=0$, there is an algorithm for simulating $I(Y(1)\in G)$.

Proof. Notice that $\mathbb{P}(Y(1)\in \partial G_i) = 0$ for every i, so $Y(1)\in\bigcup_{i\in \mathbb{N}} \accentset{\,\,\circ}{G}_i$ holds almost surely. Recall from Theorem 2 that $\|Y_{\varepsilon}(1)-Y(1)\|_2\leq\varepsilon$ almost surely, which implies that

\begin{equation*}\mathbb{P}\left(\{\omega\in \Omega\,{:}\,Y(1)\in \accentset{\,\,\circ}{G}_i\}\right) = \mathbb{P}\left(\bigcup_{\varepsilon>0} \{\omega\in \Omega\,{:}\,d(Y_{\varepsilon}(1),G_i^c)>\varepsilon \}\right)\!.\end{equation*}

Thus, we pick $\varepsilon \in (0,1)$ and apply TES to simulate the approximation process $Y_{\varepsilon }(1)$. If

\begin{equation*}d\Big(Y_{\varepsilon }(1),G_{\Xi _{\mathcal{G}}(Y_{\varepsilon}(1))}^{c}\Big)>\varepsilon,\end{equation*}

then $\Xi _{\mathcal{G}}(Y(1))=\Xi _{\mathcal{G}}(Y_{\varepsilon }(1))$, which terminates the algorithm. Otherwise we keep refining the approximation of TES, by setting $\varepsilon \leftarrow \varepsilon /2$, until $d(Y_{\varepsilon }(1),G_{\Xi _{\mathcal{G}}(Y_{\varepsilon}(1))}^{c})>\varepsilon$. The algorithm will ultimately terminate since

\begin{equation*}\mathbb{P}\left( \bigcup_{i\in \mathbb{N}}\bigcup_{\varepsilon >0}\{\omega\in \Omega \,{:}\,d(Y_{\varepsilon }(1),G_{i}^{c})>\varepsilon \}\right) =1.\end{equation*}

The procedure for simulation of $I(Y(1)\in G)$ is just a particular case, obtained by setting $\mathcal{G}=\{G,G^{c}\}$. The details of the algorithm are given in Algorithm 1.
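For concreteness, the refinement loop above can be sketched in Python as follows. Here `tes_refine`, `region_index`, and `dist_to_complement` are hypothetical placeholders for the TES oracle of Theorem 2, the map $\Xi_{\mathcal{G}}$, and the distance $d(\,\cdot\,,G_i^c)$; the sketch illustrates the loop and is not a reproduction of Algorithm 1.

```python
def localize(tes_refine, region_index, dist_to_complement, eps0=0.5):
    """Sketch of the localization loop of Corollary 1.

    tes_refine(eps, state) -> (y_eps, state): hypothetical TES oracle returning
        an approximation Y_eps(1) with ||Y_eps(1) - Y(1)||_2 <= eps, refined
        conditionally on the previously simulated levels stored in `state`.
    region_index(y) -> i: the index Xi_G(y) of the partition cell containing y.
    dist_to_complement(y, i) -> d(y, G_i^c).
    """
    eps, state = eps0, None
    while True:
        y_eps, state = tes_refine(eps, state)
        i = region_index(y_eps)
        if dist_to_complement(y_eps, i) > eps:
            return i      # Xi_G(Y(1)) has been identified without bias
        eps /= 2.0        # otherwise keep refining the TES approximation
```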

The algorithm for simulating X(1) is performed in a two-stage fashion. At the first stage, the likelihood ratio L(1) is localized with the help of Corollary 1. (The efficiency of the algorithm may be slightly improved if we localize X(1) and L(1) simultaneously at the first step, then apply acceptance-rejection based on localization. However, this does not solve the problem of the infinite expected running time.) Then, at the second stage, X(1) is sampled conditional on the result of localization.

We now illustrate how to localize L(1) in detail. In order to write the dynamics of Y(1) in standard form as in (5), we consider the SDE of $(L(\!\cdot\!),X(\!\cdot\!))$ under the measure $\mathbb{P}$ as follows:

(6) \begin{align}\begin{cases}dL(t) = L(t)\|\mu(X(t))\|_2^2dt + L(t)\mu^{T}(X(t))dW(t), \\[3pt] dX(t) = \mu(X(t))dt + dW(t).\end{cases}\end{align}

Let $\mathcal{G} = \{G_i = [i,i+1)\times \mathbb{R}^d\,{:}\, i\in \mathbb{N} \}$ in the rest of this section. As (3) guarantees that $L(1)$ is nonnegative, it follows immediately that $\mathcal{G}$ is a countable continuous partition for $(L(1),X(1))$. Therefore, Algorithm 1 is directly applicable to sample $\Xi_\mathcal{G}((L(1),X(1)))$ using the SDE (6). Without loss of generality, we assume $\Xi_\mathcal{G}((L(1),X(1))) = i$ in the rest of this section. It remains to sample X(1) conditional on $\Xi_\mathcal{G}((L(1),X(1))) = i$ under the probability measure $\mathbb{P}$.

The following lemma provides an alternative expression of the conditional distribution of X(1), which facilitates the simulation of X(1) conditional on localization.

Lemma 1. Let $U\sim\textrm{Unif}(0,1)$ be independent of everything else under the probability measure $\tilde{\mathbb{P}}$; then we have

\begin{align*}\mathbb{P}\Big(X(1)\in dx\Big|\Xi_\mathcal{G}\big((L(1),X(1))\big) = i\Big) = \tilde{\mathbb{P}}\Big(X(1)\in dx\Big|\!\max\big(i,(i+1)U\big)<L(1)<i+1\Big).\end{align*}

Proof. From the definition of conditional probability,

\begin{equation*}\mathbb{P}\Big(X(1)\in dx\Big|\Xi _{\mathcal{G}}\big((L(1),X(1))\big)=i\Big)=\frac{\mathbb{P}\Big(X(1)\in dx;\ \Xi_{\mathcal{G}}\big((L(1),X(1))\big)=i\Big)}{\mathbb{P}\Big(\Xi_{\mathcal{G}}\big((L(1),X(1))\big)=i\Big)}.\end{equation*}

Recall that $d\mathbb{P}=L(1)\,d\tilde{\mathbb{P}}$, so we have

\begin{equation*}\mathbb{P}\Big(X(1)\in dx\Big|\Xi_{\mathcal{G}}\big((L(1),X(1))\big)=i\Big)=\frac{\mathbb{E}^{\tilde{\mathbb{P}}}\Big[L(1)I\big(X(1)\in dx;\ \Xi_{\mathcal{G}}\big((L(1),X(1))\big)=i\big)\Big]}{\mathbb{P}\Big(\Xi_{\mathcal{G}}\big((L(1),X(1))\big)=i\Big)}.\end{equation*}

Since on the event $\Xi _{\mathcal{G}}\big((L(1),X(1))\big)=i$,

\begin{equation*}i\leq L(1)<i+1,\end{equation*}

we can rewrite the expectation into a probability by introducing $U\sim\text{Unif}(0,1)$, as follows:

\begin{align*}& \mathbb{E}^{\tilde{\mathbb{P}}}\Big[L(1)I(X(1)\in dx;\ \Xi _{\mathcal{G}}\big((L(1),X(1))\big)=i)\Big] \\[2pt] &\quad = (i+1)\tilde{\mathbb{P}}\Big(X(1)\in dx;\ \Xi _{\mathcal{G}}\big((L(1),X(1))\big)=i;\ (i+1)U<L(1)\Big) \\[2pt] &\quad = (i+1)\tilde{\mathbb{P}}\Big(X(1)\in dx;\ \max \big(i,(i+1)U\big)<L(1)<i+1\Big).\end{align*}

By substitution, it follows easily that

\begin{align*}\mathbb{P}\Big(X(1)\in dx\Big|\Xi _{\mathcal{G}} ((L(1),X(1)))=i\Big)&=\frac{(i+1)\tilde{\mathbb{P}}\Big(\!\max \big(i,(i+1)U\big)<L(1)<i+1\Big)}{\mathbb{P}\Big(\Xi _{\mathcal{G}}\big((L(1),X(1))\big)=i\Big)} \\[2pt] & \quad\times \tilde{\mathbb{P}}\Big(X(1)\in dx\Big|\!\max \big(i,(i+1)U\big)<L(1)<i+1\Big).\end{align*}

It remains to prove that

\begin{equation*}(i+1)\tilde{\mathbb{P}}\Big(\!\max \big(i,(i+1)U\big)<L(1)<i+1\Big)=\mathbb{P}\Big(\Xi _{\mathcal{G}}\big((L(1),X(1))\big)=i\Big).\end{equation*}

By a similar argument we can deduce that

\begin{align*}& \mathbb{P}\Big(\Xi_{\mathcal{G}}\big((L(1),X(1))\big)=i\Big) \\[2pt] &\quad = \mathbb{E}^{\tilde{\mathbb{P}}}\Big[L(1)I\big(\Xi_{\mathcal{G}}\big((L(1),X(1))\big)=i\big)\Big]\\[2pt] &\quad = (i+1)\tilde{\mathbb{P}}\Big(\Xi_{\mathcal{G}}\big((L(1),X(1))\big)=i;\ (i+1)U<L(1)\Big) \\[2pt] &\quad = (i+1)\tilde{\mathbb{P}}\Big(\!\max \big(i,(i+1)U\big)<L(1)<i+1\Big),\end{align*}

which ends the proof.

As a direct implication of Lemma 1, in order to obtain a sample of $X\left( 1\right) $ under $\mathbb{P}$, given $\Xi_{\mathcal{G}}\big((L(1),X(1))\big)=i$, we can simply simulate X(1) conditional on $\max (i,(i+1)U)<L(1)<i+1$ under the probability measure $\tilde{\mathbb{P}}$. In order to do this sampling under $\tilde{\mathbb{P}}$, we can sample U first and denote the value by u. Then, observing that $X(\!\cdot\!)$ is the driving Brownian motion under the probability measure $\tilde{\mathbb{P}}$, Algorithm 1 is applied to the SDE

(7) \begin{equation}dL(t)=L(t)\mu ^{T}(X(t))dX(t)\end{equation}

to simulate the indicator function $I(\!\max (i,(i+1)u)<L(1)<i+1)$. In addition, according to Remark 1, when TES is employed in Algorithm 1, a sample of X(1) is also produced simultaneously. Thereafter, the value of X(1) is accepted if and only if $I(\!\max (i,(i+1)u)<L(1)<i+1)=1$; otherwise we repeat the procedure in this paragraph, but we fix the parameter i, because $\Xi _{\mathcal{G}}((L(1),X(1)))=i$ has already been sampled under the correct distribution $\mathbb{P}$. The output of the algorithm, once the value $X\left( 1\right)$ is finally accepted, follows the distribution of X(1) under $\mathbb{P}$ without any bias.
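The two-stage procedure just described can be sketched as follows. The routines `localize_L` and `indicator_and_X` are hypothetical stand-ins for the subroutines built from Algorithm 1 and Remark 1; the sketch is meant only to fix ideas and is not a reproduction of Algorithm 2.

```python
import random

def exact_sample_identity_diffusion(localize_L, indicator_and_X):
    """Sketch of the two-stage sampler of Section 2.

    localize_L() -> i: hypothetical routine returning Xi_G((L(1), X(1))),
        i.e. the integer i with L(1) in [i, i+1), obtained by applying
        Algorithm 1 to the SDE (6) under P.
    indicator_and_X(i, u) -> (accept, x1): hypothetical TES-based routine for
        the SDE (7) under P~ returning the indicator
        I(max(i, (i+1)*u) < L(1) < i+1) together with the sample X(1)
        produced by the same TES run (Remark 1).
    """
    i = localize_L()                       # stage 1: localize L(1)
    while True:                            # stage 2: accept/reject given the cell
        u = random.random()                # U ~ Unif(0, 1)
        accept, x1 = indicator_and_X(i, u)
        if accept:
            return x1                      # distributed as X(1) under P
```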

We summarize the discussion in this section in the following theorem.

Theorem 3. If Assumptions 1 and 2 are satisfied, then there is an exact simulation algorithm for X(1) that terminates with probability one; see Algorithm 2.

3. Exact simulation for general SDEs

In this section, we will develop an exact simulation algorithm for the SDE (1). We shall fix $X\left( 0\right) =x_{0}$, and the dependence of $x_{0}$ on some objects (such as the transition density of $X\left( 1\right)$) will be omitted.

We are still going to construct an exact simulation algorithm based on acceptance-rejection in this section. However, for SDEs with non-constant diffusion matrix, applying Girsanov’s theorem no longer provides a Brownian-type proposal distribution for acceptance-rejection, so we will construct an acceptance-rejection algorithm based on the density of X(1).

Throughout the rest of this section, we shall assume the following assumptions and conditions.

Assumption 3. The drift coefficient $\mu (\!\cdot\!)$ is continuously differentiable, and the diffusion coefficient $\sigma (\!\cdot\!)$ is three times continuously differentiable. Moreover, a strong solution to the SDE (1) exists.

Condition 1. The probability distribution of X(1) is absolutely continuous with respect to the Lebesgue measure. In other words, $X(1) $ has a density function, denoted by $p(\!\cdot\!)$, with respect to the Lebesgue measure.

Condition 2. For any relatively compact set S, the density $p(\!\cdot\!)$ is Lipschitz continuous with Lipschitz constant $C_{S}$; i.e.

\begin{equation*}|p(x)-p(y)|\leq C_{S}|x-y|\quad \quad \forall x,y\in S.\end{equation*}

Condition 3. For any relatively compact set S, there exists a constant $\delta_S > 0$ such that

\begin{equation*}p(x)\geq \delta_S \quad\quad \forall x\in S.\end{equation*}

As we have seen in the previous section, Assumption 3 is required for the applicability of the TES result introduced in Theorem 2, which enables us to strongly approximate X(1). Condition 1 will eventually be used to apply the acceptance-rejection technique using an absolutely continuous (with respect to the Lebesgue measure) proposal distribution. Conditions 2 and 3, as we shall see, will allow us to control the bound of the likelihood ratio when applying acceptance-rejection.

It is important to ensure that the constants $C_{S}$ and $\delta _{S}$ are explicitly computable in terms of $\mu \left( \cdot \right) $ and $\sigma\left( \cdot \right) $ only, but we should also emphasize that we are not assuming that the density $p({\cdot})$ is known.

There are many ways in which the computability of $C_{S}$ and $\delta _{S}$ can be enforced. For instance, in Appendix A we discuss a set of assumptions involving classical estimators of the fundamental solutions of parabolic equations, which we review in order to compute $C_{S}$ and $\delta _{S}$ explicitly.

The standard use of the acceptance-rejection algorithm requires knowing the density p(x), which seems hopeless for the general SDE problem that we study. An alternative approach is constructing a nonnegative, bounded, and unbiased estimator of p(x). While the density p(x) is unknown, an unbiased estimator of p(x), denoted by $\Lambda_N(x)$ in Section 3.1, can be constructed by means of a local approximation of the density. However, the unbiased estimator $\Lambda_N(x)$ may be negative, so it cannot be directly used in acceptance-rejection. To remedy this problem, in Lemma 3 we construct a nonnegative and unbiased estimator $\Lambda^{+}_N(x)$ of p(x) using a random walk and a suitable Bernoulli factory. However, the estimator $\Lambda^{+}_N(x)$ is unbounded, so we propose to sample enough information about the SDEs (the ancillary variable $N^{\prime}$) so that the estimator $\Lambda^{+}_N(x)$ conditional on the sampled information is locally bounded. Consequently, conditional on the localization of X(1) and the additional information $N=N'$, the estimator $\Lambda^{+}_N(x)$ is bounded and nonnegative.

We now state the outline of our exact simulation algorithm. First of all, we apply a localization technique on the countable continuous partition $\mathcal{G}_{\text{loc}}$ defined as

\begin{equation*}\mathcal{G}_{\text{loc}}=\{[i_{1},i_{1}+1)\times \cdots \times \lbrack i_{d},i_{d}+1)\,{:}\,(i_{1},\dots ,i_{d})\in \mathbb{Z}^{d}\}.\end{equation*}

Since $\mathcal{G}_{\text{loc}}$ has countably many components, we can enumerate $\mathcal{G}_{\text{loc}}$ and rewrite it in the form $\mathcal{G}_{\text{loc}}=\{G_{i}\,{:}\,i\in \mathbb{N}\}$, where each $G_{i}$ is a unit hypercube. Obviously, Algorithm 1 is applicable to X(1) with respect to the countable continuous partition $\mathcal{G}_{\text{loc}}$.
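For illustration, the map $\Xi_{\mathcal{G}_{\text{loc}}}$ can be realized as follows; the integer corner of the hypercube is returned directly, and any fixed bijection between $\mathbb{Z}^{d}$ and $\mathbb{N}$ then gives the enumeration used above (a minimal sketch, not part of the paper's algorithms).

```python
import math

def hypercube_corner(x):
    """Return the integer corner (i_1, ..., i_d) of the unit hypercube
    [i_1, i_1+1) x ... x [i_d, i_d+1) of G_loc that contains x."""
    return tuple(math.floor(c) for c in x)

# Example: the point (0.3, -1.2) lies in the cell [0, 1) x [-2, -1).
assert hypercube_corner((0.3, -1.2)) == (0, -2)
```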

Then, we introduce an ancillary random variable $N^{\prime }$ coupled with $X(1)$, and simulate $(N^{\prime }|X(1)\in G_{i})$. As we shall see, the random variable $N^{\prime }$ will play an important role after we introduce a suitable family of random variables whose expectations converge to the density of $X\left( 1\right) $ at a given point. In the end, we will be able to sample X(1) conditional on $N^{\prime }$ and $X(1)\in G_{i}$, using the estimator $\Lambda^{+}_N(x)$, which is bounded and nonnegative for $x\in G_{i}$ and $N=N^{\prime }$.

The following theorem provides the main contribution of this paper.

Theorem 4. If Assumption 3 and Conditions 1–3 are satisfied, then there is an algorithm for exactly simulating X(1) which terminates in finite time with probability one; see Algorithm 3.

The rest of this section is organized as follows. Section 3.1 applies a technique borrowed from multilevel Monte Carlo to construct the unbiased density estimator and the ancillary random variable $N^{\prime }$. Section 3.2 explains how to sample $N^{\prime }$ using acceptance-rejection and a suitable Bernoulli factory [Reference Nacu and Peres19, Reference Łatuszyńki, Kosmidis, Papaspiliopoulos and Roberts17, Reference Huber14] conditional on localization. Section 3.3 demonstrates how to sample $X(1)$ conditional on $N^{\prime }$, once again using a suitable localization.

3.1. A multilevel representation of the density

In this section, we borrow an idea from multilevel Monte Carlo [Reference Giles11] to construct an unbiased estimator for $p(\!\cdot\!)$, and we also introduce the ancillary random variable $N^{\prime }$.

In order to illustrate our idea, first we need to introduce some notation. For any x in $G_{i}$, we define $\{B_{r_{n}}(x)\,{:}\,n\geq 1\}$ as a sequence of open balls centered at x, whose radii $\{r_{n}\,{:}\,n\geq 1\}$ form a decreasing sequence with $r_{n}\rightarrow 0$ as $n\rightarrow \infty $.

Let V(r) denote the volume of a d-dimensional ball with radius r (i.e. the volume of $B_{r}\left( 0\right) $). We define $\overline{p_{n}}(x)$ to be the average density over the ball $B_{r_{n}}(x)$, i.e.

\begin{equation*}\overline{p_{n}}(x)=[V(r_{n})]^{-1}\int_{B_{r_{n}}(x)}p(y)dy.\end{equation*}

Let $\hat{p}_{n}(x)$ denote a nonnegative unbiased estimator for $\overline{p_{n}}(x)$, i.e.

\begin{equation*}\hat{p}_{n}(x)=[V(r_{n})]^{-1}\times I(X(1)\in B_{r_{n}}(x))\end{equation*}

for $n\geq 1$, where $\hat{p}_{n}(x)$ is defined using the same realization X(1) for all n and x. We define $\hat{p}_{0}(x)\coloneqq 0$ and $\overline{p_{0}}\coloneqq 0$ for notational simplicity. It follows immediately that $\mathbb{E}[\hat{p}_{n}(x)]=\overline{p_{n}}(x)$ for $n\geq 0.$

The density p(x) is first decomposed into an infinite telescoping sum,

\begin{equation*}p(x)=\sum_{n=0}^{\infty }\left( \overline{p_{n+1}}(x)-\overline{p_{n}}(x)\right)\!.\end{equation*}

Then we introduce a randomization technique inspired by randomized multilevel Monte Carlo (see [Reference McLeish18, Reference Rhee and Glynn22]). The density p(x) can be decomposed as the expectation of an infinite sum of estimators which is truncated at a finite but random level, in such a way that the expectation is unchanged. The idea is to introduce an integer-valued random variable N, which is independent of everything else. Then p(x) can be expressed as

\begin{align*}p(x)& =\sum_{n=0}^{\infty }\left( \overline{p_{n+1}}(x)-\overline{p_{n}}(x)\right) \\[3pt]& =\sum_{n=0}^{\infty }\sum_{k=0}^{\infty }\frac{\left( \overline{p_{n+1}}(x)-\overline{p_{n}}(x)\right) }{\mathbb{P}(N\geq n)}\mathbb{P}(N=k)I(n\leq k) \\[3pt]& =\sum_{k=0}^{\infty }\sum_{n=0}^{\infty }\frac{\left( \overline{p_{n+1}}(x)-\overline{p_{n}}(x)\right) }{\mathbb{P}(N\geq n)}\mathbb{P}(N=k)I(n\leq k) \\[3pt]& =\sum_{k=0}^{\infty }\sum_{n=0}^{k}\frac{\left( \overline{p_{n+1}}(x)-\overline{p_{n}}(x)\right) }{\mathbb{P}(N\geq n)}\mathbb{P}(N=k) \\[3pt]& =\mathbb{E}\left[ \sum_{n=0}^{N}\frac{\left( \overline{p_{n+1}}(x)-\overline{p_{n}}(x)\right) }{\mathbb{P}(N\geq n)}\right] ,\end{align*}

where the third equality follows from Fubini’s theorem, which can be justified if

\begin{align*}\sum_{n=0}^{\infty }\sum_{k=0}^{\infty }\frac{\left\vert \overline{p_{n+1}}(x)-\overline{p_{n}}(x)\right\vert }{\mathbb{P}(N\geq n)}\mathbb{P}(N=k)I(n\leq k)&=\sum_{n=0}^{\infty }\left\vert \overline{p_{n+1}}(x)-\overline{p_{n}}(x)\right\vert\\[3pt]&\leq \overline{p_{1}}(x)+2C_{G_{i,r_1}}\sum_{n=1}^{\infty}r_{n}\\[3pt]&<\infty .\end{align*}

We will show $\sum_{n=1}^{\infty }r_{n}<\infty $ in the sequel. Moreover, by the tower property we have

\begin{align*}\mathbb{E}\left[ \sum_{n=0}^{N}\frac{\left( \overline{p_{n+1}}(x)-\overline{p_{n}}(x)\right) }{\mathbb{P}(N\geq n)}\right] & =\mathbb{E}\left[\sum_{n=0}^{N}\frac{\mathbb{E}[\hat{p}_{n+1}(x)-\hat{p}_{n}(x)|N]}{\mathbb{P}(N\geq n)}\right] \\[3pt]& =\mathbb{E}\left[ \mathbb{E}\left[ \left.\sum_{n=0}^{N} \frac{\hat{p}_{n+1}(x)-{\hat{p}_{n}}(x)}{\mathbb{P}(N\geq n)}\right\vert N\right] \right]\\[3pt]& =\mathbb{E}\left[ \sum_{n=0}^{N}\frac{\hat{p}_{n+1}(x)-{\hat{p}_{n}}(x)}{\mathbb{P}(N\geq n)}\right] .\end{align*}

Therefore, if we define

\begin{equation*}\Lambda _{n}(x)=\sum_{k=0}^{n}\frac{\hat{p}_{k+1}(x)-\hat{p}_{k}(x)}{\mathbb{P}(N\geq k)}\quad \mbox{for } n\geq 0,\end{equation*}

it follows easily that

(8) \begin{equation}p(x)=\mathbb{E}\left[ \Lambda _{N}(x)\right] .\end{equation}
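The estimator $\Lambda_n(x)$ can be written down directly once the indicators $I(X(1)\in B_{r_k}(x))$ are available; in the paper these indicators are exactly simulable via Corollary 1. The sketch below assumes a hypothetical oracle `ball_level` delivering this information for one realization of $X(1)$, and leaves the radii and the law of N generic (they are fixed in Lemma 2 below).

```python
import math

def ball_volume(r, d):
    """Volume V(r) of a d-dimensional Euclidean ball of radius r."""
    return math.pi ** (d / 2) * r ** d / math.gamma(d / 2 + 1)

def lambda_n(x, n, radii, d, ball_level, tail_prob):
    """One draw of Lambda_n(x) from Section 3.1.

    ball_level(x, radii) -> largest k with X(1) in B_{r_k}(x) (0 if X(1) lies
        outside B_{r_1}(x)), computed from ONE realization of X(1); this is
        the hypothetical oracle standing in for Corollary 1.
    radii[k-1] = r_k, decreasing, with len(radii) >= n + 1.
    tail_prob(k) = P(N >= k), with tail_prob(0) = 1.
    """
    level = ball_level(x, radii)          # shared by every level k
    def p_hat(k):                         # \hat p_k(x); \hat p_0(x) := 0
        if k == 0:
            return 0.0
        return 1.0 / ball_volume(radii[k - 1], d) if level >= k else 0.0
    return sum((p_hat(k + 1) - p_hat(k)) / tail_prob(k) for k in range(n + 1))

# With N independent of X(1), E[lambda_n(x, N, ...)] = p(x), which is (8).
```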

We are now interested in obtaining bounds for $\Lambda _{n}(x)$ and its expectation for $x\in G_{i}$. To this end we first define $G_{i,r_{1}}$ as the $r_{1}$-neighborhood of the set $G_{i}$, which consists of all points at a distance less than $r_{1}$ from $G_{i}$; i.e.

\begin{equation*}G_{i,r_{1}}=\bigcup_{x\in G_{i}}B_{r_{1}}(x).\end{equation*}

It is not hard to observe that $G_{i,r_{1}}$ is a relatively compact set, to which Conditions 1–3 are applicable. In the following lemma, we will demonstrate that under these conditions, one can judiciously pick the distribution of N and the radii $\{r_{n}\,{:}\,n\geq 1\}$ in order to establish explicit bounds for $\Lambda_{n}(x) $ and $\mathbb{E}[\Lambda _{n}(x)]$, respectively.

Lemma 2. Suppose that $x\in G_i$ and Conditions 1–3 are satisfied. If we pick

\begin{equation*}r_n = \big(3\delta_{G_{i,r_1}}\big)/\big(2\pi^2n^3C_{G_{i,r_1}}\big) \quad\mbox{and}\quad\mathbb{P}(N = n) = 1/[n(n+1)]\end{equation*}

for $n\geq 1$, then we have

(9) \begin{equation}\delta_{G_{i,r_1}}/2\leq \mathbb{E}[\Lambda_{n}(x)] \leq [V(r_{1})]^{-1} +\delta_{G_{i,r_1}}/2\end{equation}

and

(10) \begin{equation}|\Lambda_n(x)|\leq[V(r_1)]^{-1} +\sum_{k=1}^n\left(k[V(r_{k+1})]^{-1}+k[V(r_k)]^{-1}\right)\eqqcolon m_{n}.\end{equation}

Proof. Let us construct the lower bound of $\mathbb{E}[\Lambda_{n}(x)]$ first. By the triangle inequality,

\begin{equation*}\mathbb{E}[\Lambda_n(x)]\geq \mathbb{E}[\Lambda_{0}(x)] - \sum_{k=1}^n\left|\mathbb{E}[\Lambda_{k}(x)-\Lambda_{k-1}(x)]\right|.\end{equation*}

On the one hand, from the definition of $\Lambda_0(x)$ and Condition 3, we can conclude that

\begin{equation*}\mathbb{E}[\Lambda_{0}(x)] = \mathbb{E}[\hat{p}_{1}(x)] = \overline{p_1} (x)\geq \delta_{G_{i,r_1}}.\end{equation*}

On the other hand, since each $\hat{p}_{k}(x)$ is an unbiased estimator of $\overline{p_{k}}(x)$,

\begin{align*} \left|\mathbb{E}[\Lambda_{k}(x)-\Lambda_{k-1}(x)]\right| &= \left\vert\frac{\mathbb{E}[\hat{p}_{k+1}(x)]-\mathbb{E}[\hat{p}_{k}(x)]}{\mathbb{P}(N\geq k)}\right\vert\\[3pt] &=(\mathbb{P}(N\geq k))^{-1} |\overline{p_{k+1}}(x) - \overline{p_{k}}(x)|.\end{align*}

Then, from Condition 2 we have

(11) \begin{equation}|p(x) -p(y)|\leq C_{G_{i,r_1}}|x-y|\quad \mbox{for}\quad x,y\in G_{i,r_1}.\end{equation}

Recall that $\overline{p_{k}}(x)$ is the average density over the ball $B_{r_{k}}(x)$ and $B_{r_{k}}(x)\subseteq B_{r_{1}}(x)\subseteq G_{i,r_1}$ for $x\in G_i$. It then follows from (11) that

\begin{equation*} |\overline{p_{k+1}}(x) - \overline{p_{k}}(x)|\leq C_{G_{i,r_1}} \mathrm{diam}\big(B_{r_k}(x)\big) = 2C_{G_{i,r_1}}r_k.\end{equation*}

Consequently,

\begin{equation*}\sum_{k=1}^n\left|\mathbb{E}[\Lambda_{k}(x)-\Lambda_{k-1}(x)]\right| \leq\sum_{k=1}^n 2C_{G_{i,r_{1}}}(\mathbb{P}(N\geq k))^{-1}r_k =\sum_{k=1}^n\frac{3\delta_{G_{i,r_1}}}{\pi^{2}k^{2}}\leq\delta_{G_{i,r_1}}/2,\end{equation*}

where the equality uses $\mathbb{P}(N\geq k)=1/k$ and the choice of $r_{k}$ made in the statement of the lemma.

Combining the above inequality with $\mathbb{E}[\Lambda_{0}(x)]\geq \delta_{G_{i,r_1}}$ yields

\begin{equation*}\mathbb{E}[\Lambda_n(x)]\geq \delta_{G_{i,r_1}}/2.\end{equation*}

Similarly, observing that

\[\mathbb{E}[\Lambda_{0}(x)] = \mathbb{E}[\hat{p}_1(x)] = [V(r_{1})]^{-1}\times \mathbb{P}(X(1)\in B_{r_{1}}(x))\leq [V(r_{1})]^{-1},\]

for the upper bound we have

\begin{equation*}\mathbb{E}[\Lambda_{n}(x)]\leq \mathbb{E}[\Lambda_{0}(x)] + \sum_{k=1}^n\left|\mathbb{E}[\Lambda_{k}(x)-\Lambda_{k-1}(x)]\right| \leq[V(r_{1})]^{-1}+\delta_{G_{i,r_1}}/2.\end{equation*}

We can also derive an upper bound of $|\Lambda_n(x)|$:

\begin{align*}|\Lambda_n(x)|&\leq \sum_{k=0}^n(\mathbb{P}(N\geq k))^{-1}\left|\hat{p}_{k+1}(x)-\hat{p}_{k}(x)\right| \\&\leq [V(r_1)]^{-1} +\sum_{k=1}^n\left(k[V(r_{k+1})]^{-1}+k[V(r_k)]^{-1}\right)\eqqcolon m_{n}<\infty,\end{align*}

which ends the proof.

In the rest of this paper, we will adopt the value of $r_{n}$ and the distribution of N in Lemma 2.
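Under the choices made in Lemma 2, all of the constants needed by the later algorithms ($r_n$, $\mathbb{P}(N=n)$, and the envelopes $m_n$ of (10)) are explicitly computable. The following Python sketch records these formulas; the Lipschitz constant $C_{G_{i,r_1}}$ and the lower bound $\delta_{G_{i,r_1}}$ are assumed to have been computed beforehand, for instance along the lines of Appendix A.

```python
import math

def ball_volume(r, d):
    """Volume V(r) of a d-dimensional Euclidean ball of radius r."""
    return math.pi ** (d / 2) * r ** d / math.gamma(d / 2 + 1)

def lemma2_constants(C, delta, d, n_max):
    """Radii r_1..r_{n_max+1}, probabilities P(N = n) for n = 1..n_max, and
    envelopes m_0..m_{n_max} of (10), for given C = C_{G_{i,r_1}} and
    delta = delta_{G_{i,r_1}}."""
    radii = [3.0 * delta / (2.0 * math.pi ** 2 * n ** 3 * C)
             for n in range(1, n_max + 2)]              # radii[n-1] = r_n
    p_N = [1.0 / (n * (n + 1)) for n in range(1, n_max + 1)]
    m, m_n = [], 1.0 / ball_volume(radii[0], d)         # m_0 = [V(r_1)]^{-1}
    m.append(m_n)
    for k in range(1, n_max + 1):
        m_n += k / ball_volume(radii[k], d) + k / ball_volume(radii[k - 1], d)
        m.append(m_n)                                   # m[k] = m_k
    return radii, p_N, m
```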

Even though we have constructed an unbiased estimator $\Lambda _{N}(x)$ for $p(x)$, acceptance-rejection is not applicable because $\Lambda _{N}(x)$ may be negative and unbounded. In order to apply acceptance-rejection, we need a nonnegative unbiased estimator for p(x), which will be constructed in Lemma 3. The idea of the construction is borrowed from [Reference Fearnhead, Papaspiliopoulos, Roberts and Stuart8]. Let $\{\Lambda _{n,k}(x)\,{:}\,k\geq 1\}$ be independent and identically distributed copies of $\Lambda _{n}(x)$. We then define

\begin{equation*}S_{n,k}(x)\coloneqq \Lambda _{n,1}(x)+\dots +\Lambda _{n,k}(x)\end{equation*}

and

\begin{equation*}\tau _{n}(x)\coloneqq \inf \{k\geq 1\,{:}\,S_{n,k}(x)\geq 0\}.\end{equation*}

By Wald’s first equation,

(12) \begin{equation}\mathbb{E}[\Lambda _{n}(x)]=\mathbb{E}\left[ S_{n,\tau _{n}(x)}(x)\right] /\mathbb{E}[\tau _{n}(x)].\end{equation}

Note that $S_{n,\tau _{n}(x)}(x)\geq 0$, but now we have an additional contribution $1/\mathbb{E}[\tau _{n}(x)]$, which can be interpreted as a probability. In order to sample a Bernoulli random variable with such a probability, we will need the following result, which we refer to as the Bernoulli factory.

Theorem 5 (Bernoulli factory [Reference Nacu and Peres19, Reference Huber14]). Assume that $\epsilon \in (0,1/2]$ and $\alpha >0$ are two known constants and that we have an oracle that outputs independent and identically distributed Bernoullis with parameter $p\in (0,\left( 1-\epsilon \right) /\alpha ]$. Then there is an algorithm which takes the output of the oracle and produces a Bernoulli random variable with parameter $f(p)=\min \left( \alpha p,1-\epsilon \right) =\alpha p$. Moreover, if $\bar{N}\left( \alpha,\epsilon \right) $ is the number of Bernoulli$\left( p\right) $ random variables required to output Bernoulli$\left(\,f\left( p\right) \right) $, then $0.004\cdot \alpha /\epsilon \leq \mathbb{E}\left( \bar{N}\left( \alpha,\epsilon \right) \right) \leq 10\cdot \alpha /\epsilon $.

We can now explain how to construct $\Lambda _{n}^{+}(x)\geq 0$ such that $\mathbb{E}[\Lambda _{n}^{+}(x)]=\mathbb{E}[\Lambda _{n}(x)]$.

Lemma 3. There exists a family of random variables $\{\Lambda_{n}^{+}(x)\,{:}\,n\in \mathbb{N},x\in G_{i}\}$ such that the following properties hold:

  1. $0\leq \Lambda _{n}^{+}(x)\leq m_{n}$.

  2. $\mathbb{E}[\Lambda _{n}^{+}(x)]=\mathbb{E}[\Lambda _{n}(x)]$.

  3. Given n and x, there is an algorithm for simulating $\Lambda_{n}^{+}(x)$.

Proof. Let $\bar{\Gamma}_{n}(x)$ be a Bernoulli random variable with parameter $\left( \mathbb{E}[\tau _{n}(x)]\right) ^{-1}$, which is independent of everything else. It follows that

\begin{equation*}\mathbb{E}[\Lambda _{n}(x)]=\mathbb{E}\left[ \bar{\Gamma}_{n}(x)S_{n,\tau_{n}(x)}(x)\right] .\end{equation*}

We write $\Lambda _{n}^{+}(x)\coloneqq \bar{\Gamma}_{n}(x)S_{n,\tau _{n}(x)}(x)$. Property 1 follows from the facts that $0\leq S_{n,\tau _{n}(x)}(x)\leq\Lambda _{n,\tau _{n}(x)}(x)\leq m_{n}$ and that $0\leq \bar{\Gamma}_{n}(x)\leq 1$. Property 2 is justified directly by (12). To show that $\Lambda _{n}^{+}(x)$ can be simulated, we just need to provide an algorithm for simulating $\bar{\Gamma}_{n}(x)$.

Recall that $\mathbb{E}[\Lambda _{n}(x)]\geq \delta _{G_{i,r_{1}}}/2$, so we have

\begin{equation*}\mathbb{E}[\tau _{n}(x)]= \frac{\mathbb{E}[S_{n,\tau _{n}(x)}(x)]}{\mathbb{E}[\Lambda _{n}(x)]}\leq 2\delta _{G_{i,r_{1}}}^{-1}m_{n}.\end{equation*}

Consider Wald’s second equation

\begin{equation*}\mathbb{E}\left[ \left( S_{n,\tau _{n}(x)}(x)-\mathbb{E}(\Lambda_{n}(x))\tau _{n}(x)\right) ^{2}\right] =\text{Var}(\Lambda _{n}(x))\mathbb{E}[\tau _{n}(x)],\end{equation*}

which implies

(13) \begin{align}\mathbb{E}\left[ (\tau _{n}(x))^{2}\right] & \leq \frac{2\mathbb{E}\left[\tau _{n}(x)S_{n,\tau _{n}(x)}(x)\right]\mathbb{E}\left[ \Lambda _{n}(x)\right]+\text{Var}(\Lambda _{n}(x))\mathbb{E}\left[\tau_{n}(x)\right] }{\mathbb{E}\left[ \Lambda _{n}(x)\right]^{2}}\notag \\[3pt]& \leq \frac{(2m_{n}^{2}+m_{n}^{2})\mathbb{E}[\tau _{n}(x)]}{(\delta_{G_{i,r_{1}}}/2)^{2}} \notag\\[3pt]& \leq \frac{3m_{n}^{3}}{(\delta _{G_{i,r_{1}}}/2)^{3}}\eqqcolon m_{\tau,n}.\end{align}

Now we shall proceed to simulate the random variable $\bar{\Gamma}_{n}(x)$. Consider a random variable $T_{n}(x)$ with distribution

\begin{equation*}\mathbb{P}(T_{n}(x)=k)=\mathbb{P}(\tau _{n}(x)\geq k)/\mathbb{E}[\tau_{n}(x)]\quad \text{for}\quad k\geq 1.\end{equation*}

Since $I(T_{n}(x)=1)$ is the desired Bernoulli random variable with parameter $(\mathbb{E}[\tau _{n}(x)])^{-1}$, it then suffices to simulate $T_{n}(x)$. Towards this end, we apply acceptance-rejection again using another random variable $T^{\prime }$ as proposal, whose distribution is

\begin{equation*}\mathbb{P}(T^{\prime }=k)=\frac{6}{\pi ^{2}k^{2}}\quad \text{for}\quad k\geq1.\end{equation*}

Since $\tau_n(x)$ is nonnegative, Markov’s inequality asserts that

(14) \begin{align} \mathbb{P}(\tau _{n}(x)\geq k) = \mathbb{P}\left(\left(\tau _{n}(x)\right)^2\geq k^2\right)\leq k^{-2}\mathbb{E}\big[\tau _{n}(x)^{2}\big]\leq k^{-2}m_{\tau,n},\end{align}

where the last inequality follows from (13). Also, from the definition of $\tau_{n}(x)$ we know that $\mathbb{E}[\tau_n(x)]\geq 1$. Consequently, the likelihood ratio between $T_{n}(x)$ and $T^{\prime }$ is given by

\begin{equation*}\frac{\mathbb{P}(T_{n}(x)=k)}{\mathbb{P}(T^{\prime }=k)}=\frac{\pi ^{2}k^{2}\mathbb{P}(\tau _{n}(x)\geq k)}{6\mathbb{E}[\tau _{n}(x)]}\leq \frac{\pi^{2}}{6}m_{\tau ,n}.\end{equation*}

From the above inequality we see that the likelihood ratio is bounded, so the acceptance-rejection procedure is applicable.

Conditional on the proposal $T' = k$, we then introduce a new Bernoulli random variable $\tilde{\Gamma}_{n,k}(x)$ to decide whether or not the proposal is accepted as $T_n(x)$. The distribution of $\tilde{\Gamma}_{n,k}(x)$ is defined as

\begin{equation*}\mathbb{P}\big(\tilde{\Gamma}_{n,k}(x)=1\big)=1-\mathbb{P}\big(\tilde{\Gamma}_{n,k}(x)=0\big)=\frac{k^{2}}{2m_{\tau ,n}}\mathbb{P}(\tau _{n}(x)\geq k).\end{equation*}

Hence it follows from (14) that

(15) \begin{equation}\mathbb{P}\big(\tilde{\Gamma}_{n,k}(x)=1\big)\leq 1/2.\end{equation}

Observe that the indicator $I(\tau _{n}(x)\geq k)$ is simulable, but its distribution is not explicitly accessible, so it is natural to sample $\tilde{\Gamma}_{n,k}(x)$ via the Bernoulli factory introduced in Theorem 5. By (15), the function $f(\!\cdot\!)$ involved in the Bernoulli factory is a linear function, as follows:

\begin{equation*}f(p)=\min\left(\frac{k^{2}}{2m_{\tau ,n}}p,\frac{1}{2}\right) =\frac{k^{2}}{2m_{\tau ,n}}p.\end{equation*}

We summarize the procedure for simulating $\Lambda _{n}^{+}(x)$ in Algorithm 4.
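The construction in the proof of Lemma 3 can be sketched as follows. The routines `sample_Lambda` (one independent copy of $\Lambda_n(x)$) and `linear_bernoulli_factory` (an oracle realizing Theorem 5 for $f(p)=\min(\alpha p, 1/2)$) are hypothetical placeholders; the sketch illustrates the steps and is not a reproduction of Algorithm 4.

```python
import math
import random

def sample_T_prime():
    """Proposal T' with P(T' = k) = 6 / (pi^2 k^2), k >= 1, by inversion."""
    u, k, cdf = random.random(), 0, 0.0
    while cdf <= u:
        k += 1
        cdf += 6.0 / (math.pi ** 2 * k ** 2)
    return k

def tau_at_least(sample_Lambda, k):
    """One fresh coin I(tau_n(x) >= k): the walk stays negative for k-1 steps."""
    s = 0.0
    for _ in range(k - 1):
        s += sample_Lambda()
        if s >= 0.0:
            return 0
    return 1

def lambda_plus(sample_Lambda, m_n, delta, linear_bernoulli_factory):
    """Sketch of the construction of Lambda_n^+(x) in Lemma 3.

    sample_Lambda() -> one independent copy of Lambda_n(x); m_n is the
    envelope of (10); delta is delta_{G_{i,r_1}};
    linear_bernoulli_factory(coin, alpha) is an assumed oracle realizing
    Theorem 5 for f(p) = min(alpha * p, 1/2).
    """
    m_tau = 3.0 * m_n ** 3 / (delta / 2.0) ** 3        # the bound (13)
    # Gamma_bar ~ Bernoulli(1 / E[tau_n(x)]) = I(T_n(x) = 1), where T_n(x)
    # is drawn by acceptance-rejection from the proposal T'.
    while True:
        k = sample_T_prime()
        coin = lambda: tau_at_least(sample_Lambda, k)   # i.i.d. coins for the factory
        if linear_bernoulli_factory(coin, k ** 2 / (2.0 * m_tau)) == 1:
            gamma_bar = 1 if k == 1 else 0
            break
    # Independent random walk stopped at tau_n(x).
    s = sample_Lambda()
    while s < 0.0:
        s += sample_Lambda()
    return gamma_bar * s                                # Lambda_n^+(x) in [0, m_n]
```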

We now introduce an ancillary random variable $N^{\prime }$ coupled with $X(1)$ in the following way:

(16) \begin{equation}\mathbb{P}(N^{\prime }= n|X(1) \in dx) \propto \mathbb{P}(N=n)\times \mathbb{E}[\Lambda^{+}_n(x)].\end{equation}

Assuming $(N^{\prime }|X(1)\in G_{i})$ can be simulated, $(X(1)|N^{\prime},X(1)\in G_{i})$ can easily be simulated by acceptance-rejection as well, thanks to the convenient density representation given by (18). The algorithm for sampling $(N^{\prime }|X(1)\in G_{i})$ will be explained in the next section.

3.2. Conditional sampling of $N^{\prime }$

In this section we will focus on the procedure for simulating $N^{\prime }$ conditional on $X(1)\in G_i$.

First we derive from (16) the probability mass function of $(N^{\prime }|X(1)\in G_i)$, namely

\begin{equation*}\mathbb{P}(N^{\prime }= n|X(1)\in G_i) = \frac{\mathbb{P}(N=n)}{\mathbb{P}(X(1)\in G_i)} \times\int_{G_i} \mathbb{E}[\Lambda^+_n(x)]dx.\end{equation*}

From the upper bound on $\mathbb{E}[\Lambda_n(x)]$ given by Lemma 2 (recall that $\mathbb{E}[\Lambda^+_n(x)]=\mathbb{E}[\Lambda_n(x)]$ and that $G_i$ has unit volume), we have the following inequality:

\begin{equation*}\mathbb{P}(N^{\prime }= n|X(1)\in G_i) \leq \frac{\mathbb{P}(N=n)}{\mathbb{P}(X(1)\in G_i)}\times \left([V(r_1)]^{-1} + \delta_{G_{i,r_1}}/2\right)\!.\end{equation*}

Simulation of $(N^{\prime }|X(1)\in G_{i})$ can be achieved by acceptance-rejection. Consider a Bernoulli random variable $\Gamma _{n}(x)$ defined as

(17) \begin{equation}\mathbb{P}(\Gamma _{n}(x)=1)=1-\mathbb{P}(\Gamma _{n}(x)=0)=\frac{1}{2}\left( [V(r_1)]^{-1}+\delta _{G_{i,r_{1}}}/2\right) ^{-1}\mathbb{E}[\Lambda_{n}^{+}(x)]\leq \frac{1}{2}.\end{equation}

Then the outline of the acceptance-rejection algorithm for simulating $N^{\prime }$ would be as follows:

Step 1. Sample n from random variable N.

Step 2. Sample x from uniform distribution $U_{G_i}\sim\text{Unif}(G_i)$.

Step 3. Simulate $\Gamma_{n}(x)$. If $\Gamma_{n}(x) = 0$, go to Step 1. Otherwise accept n as a sample of $N^{\prime }$.

The only difficult step in the above procedure is Step 3, namely, simulating $\Gamma _{n}(x)$, which we will discuss now.

Lemma 3 implies that $0\leq \Lambda _{n}^{+}(x)\leq m_{n}$. Let $U\sim \text{Unif}(0,m_{n})$, which is independent of everything else; then $I(U\leq \Lambda _{n}^{+}(x))$ is a Bernoulli random variable with parameter $(m_{n})^{-1}\mathbb{E}[\Lambda _{n}^{+}(x)]$. By (17), the Bernoulli factory given in Theorem 5 can be applied to simulate $\Gamma _{n}(x)$, using $I(U\leq\Lambda _{n}^{+}(x))$ as input and using the function

\begin{equation*}f(p)=\min\left(\frac{m_{n}}{2}\left( [V(r_1)]^{-1}+\delta _{G_{i,r_{1}}}/2\right) ^{-1}p,\frac{1}{2}\right)= \frac{m_{n}}{2}\left( [V(r_1)]^{-1}+\delta _{G_{i,r_{1}}}/2\right)^{-1}p.\end{equation*}

To conclude, we synthesize the complete steps for simulating $N^{\prime }$ in the following algorithm.
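The three steps above can also be sketched as follows; this is an illustration under assumed oracles (`G_i_sampler`, `m_of`, `lambda_plus_of`, and `linear_bernoulli_factory` are hypothetical placeholders), not a reproduction of Algorithm 5.

```python
import math
import random

def sample_N():
    """N with P(N = n) = 1/(n(n+1)), n >= 1, via P(N >= k) = 1/k (inversion)."""
    u = 1.0 - random.random()              # uniform on (0, 1]
    return int(math.floor(1.0 / u))

def sample_N_prime(G_i_sampler, v_r1_inv, delta, m_of, lambda_plus_of,
                   linear_bernoulli_factory):
    """Sketch of the acceptance-rejection sampler of (N' | X(1) in G_i).

    G_i_sampler() -> a uniform draw from the unit hypercube G_i;
    v_r1_inv = [V(r_1)]^{-1}; delta = delta_{G_{i,r_1}};
    m_of(n) -> the envelope m_n of (10);
    lambda_plus_of(n, x) -> one draw of Lambda_n^+(x) (Lemma 3);
    linear_bernoulli_factory(coin, alpha) -> Bernoulli(min(alpha * p, 1/2)),
        an assumed oracle for Theorem 5.
    """
    bound = v_r1_inv + delta / 2.0
    while True:
        n = sample_N()                                  # Step 1
        x = G_i_sampler()                               # Step 2
        m_n = m_of(n)
        # Step 3: Gamma_n(x) ~ Bernoulli(E[Lambda_n^+(x)] / (2 * bound)),
        # built from the simulable coin I(U <= Lambda_n^+(x)), U ~ Unif(0, m_n).
        coin = lambda: 1 if random.uniform(0.0, m_n) <= lambda_plus_of(n, x) else 0
        if linear_bernoulli_factory(coin, m_n / (2.0 * bound)) == 1:
            return n
```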

3.3. Sampling of $(X(1)\,|\,N^{\prime },X(1)\in G_{i})$

In this section, we will focus on sampling $(X(1)|N^{\prime },X(1)\in G_i)$.

Without loss of generality, let us assume $N^{\prime }=n$ throughout the rest of this section. According to (16) and Lemma 3, the conditional distribution of X(1) can be written as

(18) \begin{equation}\mathbb{P}(X(1)\in dx|N^{\prime }=n,X(1)\in G_{i})=C_{n,G_{i}}\,\mathbb{E}[\Lambda _{n}^{+}(x)]\,dx,\end{equation}

where $C_{n,G_{i}}$ is a constant independent of x. Once again, as we shall see, $(X(1)|N^{\prime }=n,X(1)\in G_{i})$ can be simulated by acceptance-rejection. We use the uniform distribution $U_{G_{i}}$ as the proposal distribution, and accept the proposal with probability $m_{n}^{-1}\times \mathbb{E}[\Lambda_{n}^{+}(x)]$, so we can accept if and only if $I\left( U\leq \Lambda_{n}^{+}(x)\right) =1$, where $U\sim \text{Unif}(0,m_n)$ is independent of everything else. The output of the acceptance-rejection follows the desired distribution. The explicit procedure for simulating $(X(1)|N^{\prime },X(1)\in G_{i})$ is given in Algorithm 6.
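A minimal sketch of this last acceptance-rejection step (cf. Algorithm 6) reads as follows; `G_i_sampler` and `lambda_plus_of` are the same hypothetical oracles used in the sketch of Section 3.2.

```python
import random

def sample_X1_given_N_prime(n, m_n, G_i_sampler, lambda_plus_of):
    """Sketch of the sampler for (X(1) | N' = n, X(1) in G_i) of Section 3.3.

    The uniform proposal x on G_i is accepted with probability
    E[Lambda_n^+(x)] / m_n, realized through the coin I(U <= Lambda_n^+(x))
    with U ~ Unif(0, m_n)."""
    while True:
        x = G_i_sampler()
        if random.uniform(0.0, m_n) <= lambda_plus_of(n, x):
            return x
```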

4. Conclusion

The main contribution of this paper is the construction of the first generic exact simulation algorithm for multidimensional diffusions. The algorithm extensively uses several localization ideas and TES techniques. But it also combines ideas from multilevel Monte Carlo in order to construct a sequence of random variables which ultimately provides an unbiased estimator for the underlying transition density.

Although the overall construction can be implemented with a finite termination time almost surely, the expected running time is infinite. Thus, the contribution of the paper is of a theoretical nature, showing that it is possible to perform exact sampling of multivariate diffusions without applying Lamperti’s transformation. However, more research is needed to investigate whether the algorithm can be modified to be implemented in finite expected time, perhaps under additional assumptions. Alternatively, perhaps some controlled bias can be introduced while preserving features such as positivity and convexity in the applications discussed at the end of the introduction. To this end, we conclude with a discussion of the elements in the algorithm which are behind the infinite expected termination time.

There are three basic problems that cause the algorithm to have infinite expected termination time. Two of them can be appreciated already from the constant diffusion discussion and involve the use of the localization techniques that we have introduced. The third problem has to do with the application of the Bernoulli factory.

  1. Problem 1: The first problem arises when trying to sample a Bernoulli of the form $I\left( X\left( 1\right) \in G\right) $. Given $\epsilon _{n}>0$, sampling $X_{\epsilon _{n}}\left( 1\right) $ such that $\left\Vert X_{\epsilon _{n}}\left( 1\right) -X\left( 1\right)\right\Vert \leq \epsilon _{n}$ takes an $O(\epsilon _{n}^{-\left( 2+\delta\right) })$ computational cost for any $\delta >0$. So, if G is a unit hypercube in d dimensions, using the density estimates for $X\left(1\right) $ we obtain

    \begin{equation*}\mathbb{P}\left( d\left( X\left( 1\right)\!,\partial G_{i}\right) \leq\varepsilon \right) \geq c_{G}\varepsilon\end{equation*}
    for some $c_{G}>0$. Therefore, if $N_{0}$ is the computational cost required to sample $I\left( X\left( 1\right) \in G\right) $, we have that for some $\delta _{0}>0$,
    \begin{equation*}\mathbb{P}\left(N_{0}>\frac{1}{\varepsilon ^{2+\delta}}\right) \geq \mathbb{P}\left( d\left(X\left( 1\right)\!,\partial G_{i}\right) \leq \delta_{0}\varepsilon \right) \geq c_{G}\delta _{0}\varepsilon .\end{equation*}
    Therefore,
    \begin{equation*}\mathbb{P}\left( N_{0}>x\right) \geq c_{G}\delta _{0}\frac{1}{x^{1/(2+\delta)}},\end{equation*}
    which yields that $\mathbb{E}\left( N_{0}\right) =\infty $.
  2. Problem 2: The second problem arises in the acceptance-rejection step applied in Lemma 1, which requires sampling $X\left( 1\right) $ under $\tilde{\mathbb{P}}$ conditional on $\max(i,(i+1)U)<L(1)<i+1$. Directly sampling from this conditional law might be inefficient if i is large. However, this problem can be mitigated using rare-event simulation techniques, which might be available in the presence of additional structure on the drift, because under $\tilde{\mathbb{P}}\left(\cdot \right) $, $X\left( \cdot \right) $ follows a Brownian motion.

  3. Problem 3: This arises because, as indicated in Theorem 5, the computational complexity of the Bernoulli factory of a linear function of the form $f\left( p\right) =\min \left( \alpha p,1-\epsilon \right) $ scales with order $O\left( \alpha /\epsilon \right) $. In our case, we are able to select $\epsilon =1/2$, and we invoke the Bernoulli factory in Algorithms 4 and 5. In Algorithm 4, $\alpha =O\left( k^{2}\right) $, given $T^{\prime }=k$ and $\mathbb{E}\left( T^{\prime }\right) =\infty $. In Algorithm 5, $\alpha =O\left( m_{n}\right) $, given $N=n$. Although the bound which is used to define $m_{n}$ in Lemma 3 is far from optimal, in its current form, $m_{n}=O\left(n^{3d+2}\right) $, we have that $\mathbb{E}\left( N\right) =\infty $.

Appendix A. Transition density estimates

In this appendix, we discuss assumptions which are sufficient for Conditions 1, 2, and 3 to hold. In addition, we give explicit procedures for computing the constants which appear in these conditions.

Consider a matrix-valued function $a(\!\cdot\!)=(a_{ij}(\!\cdot\!))_{d\times d}\,{:}\,\mathbb{R}^{d}\mapsto \mathbb{R}^{d\times d}$ defined as

\begin{equation*}a_{ij}(x)\coloneqq \sum_{k=1}^{d^{\prime }}\sigma _{ik}(x)\sigma _{jk}(x)\quad \text{for}\quad 1\leq i,j\leq d.\end{equation*}

Assumption 4. Every component of $\mu $ and a is three times continuously differentiable. Moreover, there exists a constant M such that $\Vert \mu ^{(i)}\Vert _{\infty }\leq M$ and $\Vert a^{(i)}\Vert _{\infty}\leq M$ for $i=0,1,2,3$.

Assumption 5. There exist constants $0<\lambda_{\downarrow}<\lambda_{\uparrow}<\infty$ such that for all $x\in\mathbb{R}^d$ and $\xi = (\xi_i)_{d}\in \mathbb{R}^d$, we have

\begin{equation*}\lambda_{\downarrow}\|\xi\|_{2}\leq \sqrt{\xi^{T}a(x)\xi} \leq\lambda_{\uparrow}\|\xi\|_{2}.\end{equation*}

Under Assumptions 4 and 5, it is proved in [Reference Friedman10] that the SDE (1) possesses a transition density denoted by $p(x,t;\ y,\tau )$, which satisfies

\begin{equation*}\mathbb{P}(X(t)\in dx|X(\tau )=y)=p(x,t;\ y,\tau )dx\end{equation*}

for $\tau <t$. Therefore, Condition 1 holds under Assumptions 4 and 5.

In the following proposition, we will establish Condition 2 via the Kolmogorov forward equation.

Proposition 1. Suppose Assumptions 4 and 5 are satisfied. Then for any relatively compact set S, the density $p(\!\cdot\!)=p(\cdot ,T;\ x_{0},0)$ is Lipschitz continuous with Lipschitz constant $C_{S}$; i.e.

\begin{equation*}|p(x)-p(y)|\leq C_{S}\Vert x-y\Vert _{2}\quad \quad \forall x,y\in S.\end{equation*}

Furthermore, $C_{S}$ can be computed by Algorithm 7.

Proof. Our methodology is closely related to the parametrix method introduced in [Reference Friedman10]. Following the same scheme, we focus on explicit computation of the constants.

Under Assumptions 4 and 5, $p(\cdot,\cdot;\ y,\tau)$ is a solution of the Kolmogorov forward equation:

(19) \begin{equation}\frac{\partial}{\partial t}p(x,t;\ y,\tau) = -\sum_{i =1}^d \frac{\partial}{\partial x_i}[\mu_i(x)p(x,t;\ y,\tau)] +\frac 1 2 \sum_{i=1}^d\sum_{j=1}^d\frac{\partial^2}{ \partial x_i\partial x_j}[a_{ij}(x)p(x,t;\ y,\tau)].\end{equation}

We shall rewrite (19) into its non-divergence form as

\begin{equation*}\mathcal{L}_{f}p\coloneqq \sum_{i,j=1}^{d}a_{ij}(x)\frac{\partial ^{2}p(x,t;\ y,\tau )}{\partial x_{i}\partial x_{j}}+\sum_{i=1}^{d}b_{i}(x)\frac{\partial p(x,t;\ y,\tau )}{\partial x_{i}}+c(x)p(x,t;\ y,\tau )-\frac{\partial p(x,t;\ y,\tau )}{\partial t}=0,\end{equation*}

where

\begin{equation*}b_{i}(x)\coloneqq \sum_{j=1}^{d}\frac{\partial a_{ij}(x)}{\partial x_{j}}-\mu_{i}(x), c(x)\coloneqq \frac{1}{2}\sum_{i=1}^{d}\sum_{j=1}^{d}\frac{\partial^{2}a_{ij}(x)}{\partial x_{i}\partial x_{j}}-\sum_{i=1}^{d}\frac{\partial\mu _{i}(x)}{\partial x_{i}},\end{equation*}

and $\mathcal{L}_{f}$ is a uniform parabolic operator. By Assumption 4, it follows that

(20) \begin{equation}\Vert b(x)\Vert _{\infty }\leq (d+1)M, \quad\quad|c(x)|\leq (0.5d^{2}+d)M.\end{equation}
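As a quick illustration of how the coefficients b and c can be evaluated in practice, the sketch below (illustration only; it uses central finite differences and a hypothetical $\mu$ and a, with function names of our choosing, rather than anything prescribed by the paper) computes $b(x)$ and $c(x)$ directly from the formulas above:

```python
import numpy as np

def partial(f, x, k, h=1e-4):
    """Central finite-difference approximation of d f / d x_k at x."""
    e = np.zeros_like(x, dtype=float)
    e[k] = h
    return (f(x + e) - f(x - e)) / (2.0 * h)

def b_and_c(mu, a, x, h=1e-4):
    """Numerically evaluate b_i(x) and c(x) as defined above, for user-supplied
    mu: R^d -> R^d and a: R^d -> R^{d x d}."""
    d = x.size
    b = np.array([sum(partial(lambda z: a(z)[i, j], x, j, h) for j in range(d))
                  - mu(x)[i] for i in range(d)])
    c = 0.5 * sum(partial(lambda z: partial(lambda w: a(w)[i, j], z, j, h), x, i, h)
                  for i in range(d) for j in range(d)) \
        - sum(partial(lambda z: mu(z)[i], x, i, h) for i in range(d))
    return b, c

# Hypothetical example (for illustration only).
mu_ex = lambda x: np.array([np.sin(x[0]), np.cos(x[1])])
a_ex = lambda x: np.array([[2.0 + np.sin(x[0]), 0.3],
                           [0.3, 2.0 + np.cos(x[1])]])
print(b_and_c(mu_ex, a_ex, np.array([0.5, -0.2])))
```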

We denote by $a^{-1}(x)=(a_{ij}^{-1}(x))_{d\times d}$ the inverse matrix of $(a_{ij}(x))_{d\times d}$, and define

\begin{equation*}\theta (x,\xi )\coloneqq \sum_{i,j=1}^{d}a_{ij}^{-1}(\xi )(x_{i}-\xi _{i})(x_{j}-\xi_{j}).\end{equation*}

Assumption 5 implies that for all $x,\xi \in\mathbb{R}^{d}$,

\begin{equation*}\lambda _{\uparrow }^{-1}\Vert \xi \Vert _{2}\leq \sqrt{\xi ^{T}a^{-1}(x)\xi}\leq \lambda _{\downarrow }^{-1}\Vert \xi \Vert _{2}.\end{equation*}

Following the idea of the parametrix method, we also define a partial differential equation with constant coefficients, namely,

\begin{equation*}\mathcal{L}_{0}^{y}u\coloneqq \sum_{i,j=1}^{d}a_{ij}(y)\frac{\partial ^{2}u(x,t)}{\partial x_{i}\partial x_{j}}-\frac{\partial u(x,t)}{\partial t}=0.\end{equation*}

The fundamental solution of the equation $\mathcal{L}_{0}^{y}u=0$ is given by

\begin{equation*}Z(x,t;\ \xi ,\tau )=C_{Z}(\xi )w(x,t;\ \xi ,\tau ),\end{equation*}

where

\begin{equation*}C_{Z}(\xi )\coloneqq (2\sqrt{\pi })^{-d}[\det (a_{ij}(\xi ))]^{1/2}\leq (2\sqrt{\pi })^{-d}\lambda _{\uparrow }^{d/2}\eqqcolon C_{0},\end{equation*}
\begin{equation*}w(x,t;\ \xi ,\tau )\coloneqq (t-\tau )^{-d/2}\exp \left( -\frac{\theta (x,\xi )}{4(t-\tau )}\right)\!.\end{equation*}
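For concreteness, this fundamental solution is straightforward to evaluate; the sketch below simply transcribes the displayed formulas for $\theta$, $C_{Z}$, and w (it assumes a user-supplied function `a` returning the matrix $(a_{ij}(\xi))$, and the function name is ours):

```python
import numpy as np

def Z(x, t, xi, tau, a):
    """Transcription of the displayed formulas: Z(x,t; xi,tau) = C_Z(xi) * w(x,t; xi,tau),
    with theta(x, xi) = (x - xi)^T a(xi)^{-1} (x - xi)."""
    d = x.size
    a_xi = a(xi)
    diff = x - xi
    theta = diff @ np.linalg.solve(a_xi, diff)
    c_z = (2.0 * np.sqrt(np.pi)) ** (-d) * np.sqrt(np.linalg.det(a_xi))
    w = (t - tau) ** (-d / 2.0) * np.exp(-theta / (4.0 * (t - tau)))
    return c_z * w
```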

According to Theorem 1.3 and Theorem 1.10 in [Reference Friedman10], $p(x,t;\ \xi ,\tau )$ can be represented by the parametrix method as

\begin{equation*}p(x,t;\ \xi ,\tau )=Z(x,t;\ \xi ,\tau )+\int_{\tau }^{t}\int_{\mathbb{R}^{d}}Z(x,t;\ y,s)\Phi (y,s;\ \xi ,\tau )dyds,\end{equation*}

where

\begin{align*}\Phi (x,t;\ \xi ,\tau )& \coloneqq \sum_{k =1}^{\infty }(\mathcal{L}_{f}Z)_{k}(x,t;\ \xi ,\tau ), \\[3pt](\mathcal{L}_{f}Z)_{1}& \coloneqq \mathcal{L}_{f}Z, \\[3pt](\mathcal{L}_{f}Z)_{k +1}& \coloneqq \int_{\tau }^{t}\int_{\mathbb{R}^{d}}\mathcal{L}_{f}Z(x,t;\ y,s)(\mathcal{L}_{f}Z)_{k }(y,s;\ \xi ,\tau )dyds\end{align*}

for $k \geq 1$. Furthermore, the partial derivatives of the fundamental solution admit the following expression:

(21) \begin{equation}\frac{\partial }{\partial x_{i}}p(x,t;\ \xi ,\tau )=\frac{\partial }{\partial x_{i}}Z(x,t;\ \xi ,\tau )+\int_{\tau }^{t}\int_{\mathbb{R}^{d}}\frac{\partial}{\partial x_{i}}Z(x,t;\ y,s)\Phi (y,s;\ \xi ,\tau )dyds.\end{equation}

Let us pick $\epsilon \in (0,1)$; then we can derive a bound for Z as

\begin{equation*}|Z(x,t;\ \xi ,\tau )|\leq C_{0}\times (t-\tau )^{-d/2}\exp \left( -\frac{(1-\epsilon )\Vert x-\xi \Vert _{2}^{2}}{4\lambda _{\downarrow }(t-\tau )}\right)\!.\end{equation*}

For the bound of $\partial Z(x,t;\ \xi ,\tau )/\partial x_{i}$, note that

\begin{equation*}\left\vert \frac{\partial Z(x,t;\ \xi ,\tau )}{\partial x_{i}}\right\vert=[4(t-\tau )]^{-1}\left\vert \frac{\partial \theta (x,\xi )}{\partial x_{i}}\right\vert C_{Z}(\xi )w(x,t;\ \xi ,\tau ),\end{equation*}

and that

\begin{equation*}\left\vert \frac{\partial \theta (x,\xi )}{\partial x_{i}}\right\vert=\left\vert 2\sum_{j=1}^{d}a_{ij}^{-1}(\xi )(x_{j}-\xi _{j})\right\vert \leq2\lambda _{\uparrow }^{-1}\Vert x-\xi \Vert _{2}.\end{equation*}

Combining the definition of $C_Z(\!\cdot\!)$ and the above two equations implies that

\begin{align*}\left\vert \frac{\partial Z(x,t;\ \xi ,\tau )}{\partial x_{i}}\right\vert &\leq \frac{\lambda _{\uparrow }^{-1}}{2}C_{0}|x-\xi |(t-\tau )^{-\frac{d+1}{2}}(\theta (x,\xi ))^{-1/2}\left[ \frac{\theta (x,\xi )}{t-\tau }\right]^{1/2} \\[3pt]&\quad \times \exp \left( -\frac{\epsilon \theta (x,\xi )}{4(t-\tau )}\right)\exp \left( -\frac{(1-\epsilon )\theta (x,\xi )}{4(t-\tau )}\right)\!.\end{align*}

Applying the inequalities

\begin{equation*}\left[ \frac{\theta (x,\xi )}{t-\tau }\right] ^{1/2}\exp \left( -\frac{\epsilon \theta (x,\xi )}{4(t-\tau )}\right) \leq \sup_{x\in \lbrack 0,+\infty )}x^{\frac{1}{2}}e^{-\frac{\epsilon x}{4}}=\left( \frac{2}{\epsilon e}\right) ^{1/2}\end{equation*}

and

\begin{equation*}|x-\xi |(\theta (x,\xi ))^{-1/2}\leq \lambda _{\downarrow }^{1/2},\end{equation*}

we obtain

\begin{equation*}\left\vert \frac{\partial Z(x,t;\ \xi ,\tau )}{\partial x_{i}}\right\vert \leq\frac{C_{1}}{(t-\tau )^{\frac{d+1}{2}}}\exp \left( -\frac{(1-\epsilon )\Vert x-\xi \Vert _{2}^{2}}{4\lambda _{\downarrow }(t-\tau )}\right)\end{equation*}

by setting

\begin{equation*}C_{1}\coloneqq (2\epsilon e)^{-1/2}\lambda _{\downarrow }^{1/2}\lambda _{\uparrow}^{-1}C_{0}.\end{equation*}

Similarly, we can derive bounds for $\partial ^{2}Z(x,t;\ \xi ,\tau)/(\partial x_{i}\partial x_{j})$ and $\partial ^{2}Z(x,t;\ \xi ,\tau )/(\partial x_{i}^{2})$. For $i\neq j$,

\begin{align*}\left\vert \frac{\partial ^{2}Z(x,t;\ \xi ,\tau )}{\partial x_{i}\partial x_{j}}\right\vert & \leq \frac{C_{2}}{(t-\tau )^{\frac{d+1}{2}}|x-\xi |}\exp\left( -\frac{(1-\epsilon )\Vert x-\xi \Vert _{2}^{2}}{4\lambda _{\downarrow}(t-\tau )}\right)\!, \\[3pt]\left\vert \frac{\partial ^{2}Z(x,t;\ \xi ,\tau )}{\partial x_{i}^{2}}\right\vert & \leq \frac{C_{3}}{(t-\tau )^{\frac{d+1}{2}}|x-\xi |}\exp\left( -\frac{(1-\epsilon )\Vert x-\xi \Vert _{2}^{2}}{4\lambda _{\downarrow}(t-\tau )}\right)\!,\end{align*}

where

\begin{align*}C_{2}& \coloneqq C_{0}\left( \frac{4\lambda _{\downarrow }}{e\epsilon \lambda_{\uparrow }}\right) ^{2}, \\[3pt]C_{3}& \coloneqq C_{0}\left( \frac{4\lambda _{\downarrow }}{e\epsilon \lambda_{\uparrow }}\right) ^{2}+C_{0}\frac{M}{4}\left( \frac{2\lambda _{\downarrow}}{\epsilon e}\right) ^{\frac{1}{2}}.\end{align*}

By definition of $Z(\!\cdot\!)$ we can observe that

\begin{align*}\mathcal{L}_{f}Z(x,t;\ \xi ,\tau )& =\sum_{i,j=1}^{d}[a_{ij}(x)-a_{ij}(\xi )]\frac{\partial ^{2}Z(x,t;\ \xi ,\tau )}{\partial x_{i}\partial x_{j}} \\[3pt]&\quad +\sum_{i=1}^{d}b_{i}(x)\frac{\partial Z(x,t;\ \xi ,\tau )}{\partial x_{i}}+c(x)Z(x,t;\ \xi ,\tau ).\end{align*}

Suppose $0\leq t-\tau \leq T$ in the sequel. By considering the upper bounds of the partial derivatives of Z, as well as (20) and Assumption 4, we obtain

(22) \begin{equation}|\mathcal{L}_{f}Z(x,t;\ \xi ,\tau )|\leq \frac{C_{4}}{(t-\tau )^{\frac{d+1}{2}}}\exp \left( -\frac{(1-\epsilon )\Vert x-\xi \Vert _{2}^{2}}{4\lambda_{\downarrow }(t-\tau )}\right)\!,\end{equation}

where

\begin{equation*}C_{4}\coloneqq dMC_{3}+d(d-1)MC_{2}+d(d+1)MC_{1}+T^{\frac{1}{2}}(0.5d^{2}+d)MC_{0}.\end{equation*}

Now, in order to find a bound for $\Phi (x,t;\ \xi ,\tau )$, we need to introduce a technical lemma.

Lemma 4 (Lemma 1.3 of [Reference Friedman10]). If $\beta $ and $\gamma $ are two constants in $(-\infty ,\frac{d}{2}+1)$, then

\begin{align*}& \int_{\tau }^{t}\int_{\mathbb{R}^{d}}(t-s)^{-\beta }\exp \left( -\frac{h\Vert x-y\Vert _{2}^{2}}{4(t-s)}\right) (s-\tau )^{-\gamma }\exp \left( -\frac{h\Vert y-\xi \Vert _{2}^{2}}{4(s-\tau )}\right) dyds \\[3pt]&\quad =\left( \frac{4\pi }{h}\right) ^{\frac{d}{2}}\text{Beta}\left( \frac{d}{2}-\beta +1,\frac{d}{2}-\gamma +1\right) (t-\tau )^{\frac{d}{2}+1-\beta-\gamma }\exp \left( -\frac{h\Vert x-\xi \Vert _{2}^{2}}{4(t-\tau )}\right)\!,\end{align*}

where Beta$(\!\cdot\!)$ is the beta function.
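For the reader's convenience, we sketch why this identity holds; the computation is standard and is included here only as a reading aid. The spatial integral is a convolution of two Gaussian kernels, and the time integral reduces to a Beta integral:

\begin{align*}\int_{\mathbb{R}^{d}}\exp \left( -\frac{h\Vert x-y\Vert _{2}^{2}}{4(t-s)}\right) \exp \left( -\frac{h\Vert y-\xi \Vert _{2}^{2}}{4(s-\tau )}\right) dy&=\left( \frac{4\pi }{h}\right) ^{\frac{d}{2}}\left[ \frac{(t-s)(s-\tau )}{t-\tau }\right] ^{\frac{d}{2}}\exp \left( -\frac{h\Vert x-\xi \Vert _{2}^{2}}{4(t-\tau )}\right)\!, \\[3pt]\int_{\tau }^{t}(t-s)^{\frac{d}{2}-\beta }(s-\tau )^{\frac{d}{2}-\gamma }ds&=\text{Beta}\left( \frac{d}{2}-\beta +1,\frac{d}{2}-\gamma +1\right) (t-\tau )^{d+1-\beta -\gamma }.\end{align*}

Multiplying the first display by $(t-s)^{-\beta }(s-\tau )^{-\gamma }$, integrating in s over $(\tau ,t)$, and applying the second display gives the stated formula.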

From (22) and Lemma 4, we can derive

\begin{align*}| (\mathcal{L}_{f}Z)_2(x,t;\ \xi,\tau)|&\leq \int_\tau^t\int_{\mathbb{R}^d} |\mathcal{L}_{f}Z(x,t;\ y,s)||\mathcal{L}_{f}Z(y,s;\ \xi,\tau)|dyds \\[3pt]&\leq \frac{C_5C_6^2}{1!}(t-\tau)^{1-\frac{d}{2}}\exp\left(-\frac{(1-\epsilon)\|x-\xi\|_{2}^2}{4\lambda_{\downarrow}(t-\tau)}\right)\!,\end{align*}

where

\begin{align*}C_5\coloneqq \left(\frac{4\pi\lambda_{\downarrow}}{1-\epsilon}\right)^{-\frac d 2},&& C_6\coloneqq C_4\left(\frac{4\pi\lambda_{\downarrow}}{1-\epsilon}\right)^{\frac d2}.\end{align*}

By induction we can show that, for any positive integer m,

\begin{align*}| (\mathcal{L}_{f}Z)_m(x,t;\ \xi,\tau)|\leq \frac{C_5C_6^m}{(m-1)!}(t-\tau)^{m-\frac{d}{2}-1}\exp\left(-\frac{(1-\epsilon)\|x-\xi\|_{2}^2}{4\lambda_{\downarrow}(t-\tau)}\right)\!.\end{align*}

It turns out that

\begin{align*}|\Phi(x,t;\ \xi,\tau)|&\leq\sum_{m=1}^\infty| (\mathcal{L}_{f}Z)_m(x,t;\ \xi,\tau)|\notag \\[3pt]&\leq \frac{C_7}{(t-\tau)^{\frac{d}{2}}}\exp\left(-\frac{ (1-\epsilon)\|x-\xi\|_{2}^2}{4\lambda_{\downarrow}(t-\tau)}\right)\!,\end{align*}

where

\begin{align*}C_7\coloneqq \sum_{m=1}^\infty\frac{C_5C_6^m}{(m-1)!} T^{m-1} =C_{5}C_{6}\exp(C_{6}T).\end{align*}

Recalling (21), we can apply Lemma 4 again and conclude that

\begin{align*}\left|\frac{\partial}{\partial x_i}p(x,t;\ \xi,\tau)\right| &\leq \left|\frac{\partial}{\partial x_i}Z(x,t;\ \xi,\tau)\right|+\int_\tau^t\int_{\mathbb{R}^d}\left|\frac{\partial}{\partial x_i}Z(x,t;\ y,s)\Phi(y,s;\ \xi,\tau)\right|dyds\notag \\[3pt]&\leq \left[\frac{C_1}{(t-\tau)^\frac{d+1}{2}}+\frac{C_8}{(t-\tau)^\frac{d-1}{2}}\right]\exp\left(-\frac{(1-\epsilon)\|x-\xi\|_{2}^2}{4\lambda_{\downarrow}(t-\tau)}\right)\!,\end{align*}

where

\begin{align*}C_8\coloneqq 2C_1C_7\left(\frac{4\pi\lambda_{\downarrow}}{1-\epsilon}\right)^{\frac d 2}.\end{align*}

We thus obtain an upper bound for $\|\nabla_x p(x,t;\ \xi,\tau)\|_2$ by noting that

\begin{equation*}\|\nabla_x p(x,t;\ \xi,\tau)\|_2\leq d\max_{1\leq i\leq d}\left|\frac{\partial}{\partial x_i}p(x,t;\ \xi,\tau)\right|.\end{equation*}

Therefore, for all $x,y \in S$ we have

\begin{equation*}|p(x)-p(y)|\leq C_S\|x-y\|_2,\end{equation*}

where

\begin{align*}C_S = \left[\frac{dC_1}{T^\frac{d+1}{2}}+\frac{dC_8}{T^\frac{d-1}{2}}\right]\exp\left(-\frac{(1-\epsilon)\inf_{\bar{x}\in S}\|\bar{x}-x_0\|_2^2}{4\lambda_{\downarrow}T}\right)\!.\end{align*}
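The proof therefore yields an explicit, if conservative, recipe for $C_{S}$. The sketch below is not the paper's Algorithm 7; it is simply a transcription of the constants $C_{0},\dots ,C_{8}$ and $C_{S}$ defined in this proof, with inputs d, M, $\lambda _{\downarrow }$, $\lambda _{\uparrow }$, T, $\epsilon \in (0,1)$, and $\inf_{\bar{x}\in S}\Vert \bar{x}-x_{0}\Vert _{2}$ (the function and argument names are ours, and the values in the example call are arbitrary):

```python
import math

def lipschitz_constant(d, M, lam_dn, lam_up, T, eps, dist_inf):
    """Transcription of the constants C_0, ..., C_8 and C_S from the proof of
    Proposition 1; dist_inf stands for inf_{x in S} ||x - x_0||_2."""
    C0 = (2.0 * math.sqrt(math.pi)) ** (-d) * lam_up ** (d / 2.0)
    C1 = (2.0 * eps * math.e) ** (-0.5) * lam_dn ** 0.5 / lam_up * C0
    C2 = C0 * (4.0 * lam_dn / (math.e * eps * lam_up)) ** 2
    C3 = C2 + C0 * (M / 4.0) * (2.0 * lam_dn / (eps * math.e)) ** 0.5
    C4 = (d * M * C3 + d * (d - 1) * M * C2 + d * (d + 1) * M * C1
          + math.sqrt(T) * (0.5 * d ** 2 + d) * M * C0)
    C5 = (4.0 * math.pi * lam_dn / (1.0 - eps)) ** (-d / 2.0)
    C6 = C4 * (4.0 * math.pi * lam_dn / (1.0 - eps)) ** (d / 2.0)
    C7 = C5 * C6 * math.exp(C6 * T)
    C8 = 2.0 * C1 * C7 * (4.0 * math.pi * lam_dn / (1.0 - eps)) ** (d / 2.0)
    return ((d * C1 / T ** ((d + 1) / 2.0) + d * C8 / T ** ((d - 1) / 2.0))
            * math.exp(-(1.0 - eps) * dist_inf ** 2 / (4.0 * lam_dn * T)))

# Arbitrary illustrative values.
print(lipschitz_constant(d=2, M=1.0, lam_dn=0.5, lam_up=2.0, T=1.0, eps=0.5, dist_inf=0.0))
```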

Next, we propose a computational procedure for obtaining lower bounds on the transition density. There is a substantial literature on lower bounds for the transition density of diffusions, based on either analytical or probabilistic approaches. For instance, Aronson [Reference Aronson2] developed lower-bound estimates for the fundamental solutions of second-order parabolic partial differential equations in divergence form. Using Malliavin calculus, Kusuoka and Stroock [Reference Kusuoka and Stroock16] derived a lower bound for the transition density of uniformly elliptic diffusions, and Bally [Reference Bally3] generalized their idea to locally elliptic diffusions. We follow the approach suggested by Sheu [Reference Sheu23], reviewing it in order to obtain explicit expressions for a computable lower bound.

In order to keep our paper self-contained, we first introduce some notation to be used later.

Let $\mathcal{L}_{b}$ be the generator of the Kolmogorov backward equation:

\begin{equation*}\mathcal{L}_{b}u(x,t)\coloneqq \frac{1}{2}\sum_{i,j=1}^{d}a_{ij}(x)\frac{\partial^{2}u(x,t)}{\partial x_{i}\partial x_{j}}+\sum_{i=1}^{d}\mu _{i}(x)\frac{\partial u(x,t)}{\partial x_{i}}-\frac{\partial u(x,t)}{\partial t}.\end{equation*}

The transition density, viewed as the function $(x,t)\mapsto p(y,t;\ x,0)$, coincides with the fundamental solution of the Kolmogorov backward equation:

\begin{align*}\mathcal{L}_{b}u(x,t)& =0,\qquad t>0, \quad x\in \mathbb{R}^{d}, \\[3pt]u(x,0)& =u_{0}(x).\end{align*}

Throughout the rest of this section, we suppose that Assumptions 4 and 5 are in force.

Recall that $a^{-1}(x)$ is the inverse matrix of a(x), and define

\begin{align*}k(x,\psi) = \frac{1}{2}\sum_{i,j=1}^da^{-1}_{ij}(x)(\mu_i(x)-\psi_i)(\mu_j(x) - \psi_j).\end{align*}

For a fixed $y_{0}\in \mathbb{R}^{d}$, we define

\begin{equation*}f^{\beta }(y;\ y_{0})\coloneqq \left( \frac{1}{\sqrt{2\pi \beta }}\right) ^{d}\frac{1}{\sqrt{\det a(y_{0})}}\exp \left( -\frac{1}{2\beta }\sum_{i,j=1}^{d}a_{ij}^{-1}(y_{0})(y-y_{0})_{i}(y-y_{0})_{j}\right)\!,\end{equation*}

and

\begin{equation*}p^{\beta }(y_{0},t;\ x,0)\coloneqq \mathbb{E}_{x}[f^{\beta }(X(t);\ y_{0})].\end{equation*}

The continuity of the density implies

(23) \begin{equation}\lim_{\beta \rightarrow 0}p^{\beta }(y_{0},t;\ x,0)=p(y_{0},t;\ x,0).\end{equation}
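As a sanity check of this definition, $p^{\beta }(y_{0},t;\ x,0)=\mathbb{E}_{x}[f^{\beta }(X(t);\ y_{0})]$ can be approximated by plain Monte Carlo. The sketch below does so with an Euler–Maruyama discretization of (1); the discretization is of course biased and is used only as an illustration (avoiding exactly this bias is the point of the paper), function names are ours, and the drift and diffusion coefficients in the example call are hypothetical:

```python
import numpy as np

def p_beta_euler(mu, sigma, x, y0, t, beta, n_steps=100, n_paths=2000, seed=0):
    """Crude Monte Carlo approximation of p^beta(y0, t; x, 0) = E_x[f^beta(X(t); y0)]
    using an Euler-Maruyama scheme for the SDE (1).  Biased; illustration only."""
    rng = np.random.default_rng(seed)
    d = x.size
    dt = t / n_steps
    X = np.tile(x.astype(float), (n_paths, 1))
    d_prime = sigma(x).shape[1]
    for _ in range(n_steps):
        dW = rng.normal(scale=np.sqrt(dt), size=(n_paths, d_prime))
        drift = np.array([mu(xi) for xi in X])
        diffu = np.array([sigma(xi) for xi in X])
        X = X + drift * dt + np.einsum('nij,nj->ni', diffu, dW)
    a_y0 = sigma(y0) @ sigma(y0).T
    a_inv = np.linalg.inv(a_y0)
    diff = X - y0
    quad = np.einsum('ni,ij,nj->n', diff, a_inv, diff)
    f_beta = ((2.0 * np.pi * beta) ** (-d / 2.0) / np.sqrt(np.linalg.det(a_y0))
              * np.exp(-quad / (2.0 * beta)))
    return f_beta.mean()

# Hypothetical one-dimensional illustration.
mu_ex = lambda x: -0.5 * x
sigma_ex = lambda x: np.array([[1.0 + 0.1 * np.tanh(x[0])]])
print(p_beta_euler(mu_ex, sigma_ex, np.array([0.0]), np.array([0.5]), t=1.0, beta=0.05))
```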

For simplicity, we also define the logarithmic transform of p and $p^{\beta }$ as

\begin{align*}J(t,x)& \coloneqq -\log (p(y_{0},t;\ x,0)), \\[3pt]J^{\beta }(t,x)& \coloneqq -\log (p^{\beta }(y_{0},t;\ x,0)).\end{align*}

To prepare the analysis, which is based on stochastic control, we introduce the space of control functions, denoted by $\mathcal{F}_{T,x}$. The class $\mathcal{F}_{T,x}$ is defined as a family of measurable functions $\psi\,{:}\,[0,T]\times\mathbb{R}^{d}\rightarrow \mathbb{R}^{d}$ such that the SDE

\begin{equation*}d\eta (t)=\psi (t,\eta (t))dt+\sigma (\eta (t))dW(t),\quad \eta (0)=x\end{equation*}

has a weak solution $\eta (\!\cdot\!)$ that satisfies

\begin{equation*}\mathbb{E}\left( \int_{0}^{T}\Vert \psi (t,\eta (t))\Vert _{2}^{2}dt\right)<\infty .\end{equation*}

Now we state a lemma that is crucial for proving the main result of this section.

Lemma 5. Recall the definition of $\mathcal{F}_{T,x}$ and $\eta(\!\cdot\!) $ from the previous paragraph; then we have

\begin{align*}J^\beta(T,x) = \inf_{\psi\in \mathcal{F}_{T,x}} \mathbb{E}\left(\int_0^Tk(\eta(t),\psi(t))dt+J^{\beta}(0,\eta(T))\right)\!.\end{align*}

Together with (23), we see that

(24) \begin{align}J(T,x) = \lim_{\beta\rightarrow 0 } \inf_{\psi\in \mathcal{F}_{T-\beta,x}}\mathbb{E}\left(\int_0^{T-\beta}k(\eta(t),\psi(t))dt+J^{\beta}(0,\eta(T-\beta))\right)\!.\end{align}

Proof. See [Reference Fleming and Rishel9].

Theorem 6. Suppose Assumptions 4 and 5 are satisfied. Then, for any relatively compact set S, the density $p(\!\cdot\!)=p(\cdot ,T;\ x_{0},0)$ has a uniform lower bound $\delta _{S}>0$ in S; i.e.

\begin{equation*}p(x)\geq \delta _{S}\quad \forall x\in S.\end{equation*}

Furthermore, $\delta _{S}$ can be computed by Algorithm 8.

Proof. Finding a lower bound of the density $p(y_{0},T;\ x_{0},0)$ is equivalent to finding an upper bound for $J(T,x_{0})$. Towards this end, it suffices to find an upper bound for $J^{\beta }(T,x_{0})$ that is uniform in $\beta $. We shall define $\phi (\!\cdot\!)$ as a linear function such that $\phi(0)=x_{0}$, $\phi (T)=y_{0}$. Write

\begin{equation*}\psi (t,x)=\frac{y_{0}-x_{0}}{T}-\frac{x-\phi (t)}{T-t},\quad 0\leq t\leq T-\beta .\end{equation*}

It is not hard to see that $\psi \in \mathcal{F}_{T-\beta ,x_{0}}$. Therefore,

(25) \begin{equation}J^{\beta }(T,x_{0})\leq \mathbb{E}\left( \int_{0}^{T-\beta }k(\eta (t),\psi(t))dt+J^{\beta }(0,\eta (T-\beta ))\right)\!,\end{equation}

according to Lemma 5. Notice that

(26) \begin{equation}(\eta (t)-\phi (t))_{i}=(T-t)\sum_{l=1}^{d^{\prime }}\int_{0}^{t}\frac{1}{T-s}\sigma _{il}(\eta (s))dW_{l}(s),\quad \text{for}\quad i=1,\dots ,d.\end{equation}

It follows that

\begin{equation*}\mathbb{E}\left( (\eta (t)-\phi (t))_{i}(\eta (t)-\phi (t))_{j}\right)=(T-t)^{2}\mathbb{E}\left( \int_{0}^{t}\frac{1}{(T-s)^{2}}a_{ij}(\eta(s))ds\right)\end{equation*}

and

(27) \begin{equation}\mathbb{E}(\Vert \eta (t)-\phi (t)\Vert _{2}^{2})=(T-t)^{2}\mathbb{E}\left(\int_{0}^{t}\frac{1}{(T-s)^{2}}\sum_{i=1}^{d}a_{ii}(\eta (s))ds\right) \leq d\lambda _{\uparrow }(T-t).\end{equation}
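For intuition, the controlled process $\eta$ and the bound (27) can also be examined by simulation. The sketch below (illustration only; the Euler discretization, the particular $\sigma$, and the function names are our own choices, not part of the proof) simulates $\eta$ with the drift $\psi$ chosen above and reports an empirical estimate of $\mathbb{E}(\Vert \eta (t)-\phi (t)\Vert _{2}^{2})$, which (27) bounds by $d\lambda _{\uparrow }(T-t)$:

```python
import numpy as np

def controlled_second_moment(sigma, x0, y0, T, t_check, n_steps=200, n_paths=2000, seed=1):
    """Euler simulation of d eta = psi(t, eta) dt + sigma(eta) dW with
    psi(t, x) = (y0 - x0)/T - (x - phi(t))/(T - t), phi linear from x0 to y0.
    Returns an empirical estimate of E||eta(t_check) - phi(t_check)||_2^2."""
    rng = np.random.default_rng(seed)
    dt = t_check / n_steps
    d_prime = sigma(x0).shape[1]
    eta = np.tile(x0.astype(float), (n_paths, 1))
    for k in range(n_steps):
        t = k * dt
        phi_t = x0 + (y0 - x0) * t / T
        psi = (y0 - x0) / T - (eta - phi_t) / (T - t)
        dW = rng.normal(scale=np.sqrt(dt), size=(n_paths, d_prime))
        diffu = np.array([sigma(e) for e in eta])
        eta = eta + psi * dt + np.einsum('nij,nj->ni', diffu, dW)
    phi_check = x0 + (y0 - x0) * t_check / T
    return np.mean(np.sum((eta - phi_check) ** 2, axis=1))

# Hypothetical two-dimensional example with constant sigma; t_check < T.
sigma_ex = lambda x: np.array([[1.0, 0.0], [0.2, 0.8]])
x0, y0 = np.zeros(2), np.ones(2)
print(controlled_second_moment(sigma_ex, x0, y0, T=1.0, t_check=0.5))
```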

We now apply a Taylor expansion of $k(\eta (t),\psi (t))$ around $(\phi (t),\dot{\phi}(t))$, where $\dot{\phi}(\!\cdot\!)$ denotes the derivative of $\phi(\!\cdot\!)$. For notational simplicity, we define

\begin{align*}\Delta_{1}(t)& =\eta (t)-\phi (t), \\[3pt]\Delta_{2}(t)& =\psi (t)-\dot{\phi}(t)=-\frac{1}{T-t}\Delta_{1}(t).\end{align*}

We also define

\begin{equation*}D_{x_{i}}k(\lambda )=\frac{\partial }{\partial x_{i}}k(\phi (t)+\lambda\Delta_{1}(t),\dot{\phi}(t)+\lambda \Delta_{2}(t)),\end{equation*}

and similarly for $D_{\psi _{i}}k$, $D_{x_{i},x_{j}}k$, $D_{x_{i},\psi_{j}}k $, and $D_{\psi _{i},\psi _{j}}k$. The Taylor expansion with remainders of third order is given as follows:

\begin{align*}k(\eta (t),\psi (t)) = k_{0}(t) + k_{1}(t)+k_{2,1}(t)+k_{2,2}(t)+k_{2,3}(t)+k_{3,1}(t)+k_{3,2}(t)+k_{3,3}(t),\end{align*}

where

\begin{align*}k_{0}(t) &\coloneqq k(\phi(t),\dot{\phi}(t)), \\[3pt]k_{1}(t) &\coloneqq \sum_{i=1}^{d}\left( D_{x_{i}}k(0)\Delta _{1,i}(t)+D_{\psi_{i}}k(0)\Delta _{2,i}(t)\right)\!, \\[3pt]k_{2,1}(t) &\coloneqq \frac{1}{2}\sum_{i,j=1}^{d} D_{x_{i},x_{j}}k(0)\Delta_{1,i}(t)\Delta _{1,j}(t), \\[3pt]k_{2,2}(t) &\coloneqq \sum_{i,j=1}^{d} D_{x_{i},\psi _{j}}k(0)\Delta_{1,i}(t)\Delta _{2,j}(t), \\[3pt]k_{2,3}(t) &\coloneqq \frac{1}{2}\sum_{i,j=1}^{d} D_{\psi _{i},\psi _{j}}k(0)\Delta_{2,i}(t)\Delta _{2,j}(t), \\[3pt]k_{3,1}(t) &\coloneqq \sum_{i,j=1}^{d}\int_{0}^{1}\int_{0}^{1}\left(D_{x_{i},x_{j}}k(\lambda \mu )-D_{x_{i},x_{j}}k(0)\right) \Delta_{1,i}(t)\Delta _{1,j}(t)\lambda d\mu d\lambda, \\[3pt]k_{3,2}(t) &\coloneqq \sum_{i,j=1}^{d}2\int_{0}^{1}\int_{0}^{1}\left( D_{x_{i},\psi_{j}}k(\lambda \mu )-D_{x_{i},\psi _{j}}k(0)\right) \Delta _{1,i}(t)\Delta_{2,j}(t)\lambda d\mu d\lambda, \\[3pt]k_{3,3}(t) &\coloneqq \sum_{i,j=1}^{d}\int_{0}^{1}\int_{0}^{1}\left( D_{\psi_{i},\psi_{j}}k(\lambda \mu )-D_{\psi _{i},\psi _{j}}k(0)\right) \Delta_{2,i}(t)\Delta _{2,j}(t)\lambda d\mu d\lambda.\end{align*}

Now we integrate all the above terms from 0 to $T-\beta $ with respect to the variable t, then take expectations, and analyze the upper bounds of the result term by term.

  1. Zeroth-order term: Notice that $k(x,\psi )$ is a quadratic form in $\mu (x)-\psi $ with matrix $\frac{1}{2}\,(a_{ij}^{-1}(x))$, so

    \begin{equation*}\mathbb{E}\left(\int_{0}^{T-\beta } k_{0}(t)dt\right)= \mathbb{E}\left(\int_{0}^{T-\beta }k(\phi (t),\dot{\phi}(t))dt\right) \leq \lambda_{\downarrow }^{-1}T\left( M+\frac{\Vert y_{0}-x_{0}\Vert _{2}}{T}\right) ^{2}.\end{equation*}
  2. First-order term: We begin with the first-order term $k_{1}(t)$. Since, by (26), both $\Delta _{1,i}(t)$ and $\Delta _{2,i}(t)$ have zero mean, while $D_{x_{i}}k(0)$ and $D_{\psi_{i}}k(0)$ are deterministic, the first-order term satisfies

    \begin{equation*}\mathbb{E}\left(\int_{0}^{T-\beta } k_{1}(t)dt\right)= \mathbb{E}\left(\int_{0}^{T-\beta }\sum_{i=1}^{d}\left( D_{x_{i}}k(0)\Delta _{1,i}(t)+D_{\psi_{i}}k(0)\Delta _{2,i}(t)\right) dt\right) =0.\end{equation*}
  3. Second-order terms: We then treat the second-order terms. As $D_{\psi _{i},\psi _{j}}k(0)=a_{ij}^{-1}(\phi (t))$,

    \begin{align*}\mathbb{E}\left(\int_{0}^{T-\beta } k_{2,3}(t)dt\right) &= \mathbb{E}\left(\int_{0}^{T-\beta }\frac{1}{2}\sum_{i,j=1}^{d}D_{\psi _{i},\psi_{j}}k(0)\Delta _{2,i}(t)\Delta _{2,j}(t)dt\right) \\[3pt]&= \frac{1}{2}\int_{0}^{T-\beta }\mathbb{E}\left( \int_{0}^{t}\frac{1}{(T-s)^{2}}\sum_{i,j=1}^{d}a_{ij}^{-1}(\phi (t))a_{ij}(\eta (s))ds\right) dt.\end{align*}
    Writing
    \begin{equation*}a_{ij}(\eta (s))=(a_{ij}(\eta (s))-a_{ij}(\phi (s)))+(a_{ij}(\phi(s))-a_{ij}(\phi (t)))+a_{ij}(\phi (t)),\end{equation*}
    and noticing that $a(\phi (t))$ is symmetric, so that $\sum_{i,j=1}^{d}a_{ij}^{-1}(\phi (t))a_{ij}(\phi (t))=d$, we see that
    (28) \begin{equation}\frac{1}{2}\int_{0}^{T-\beta }\mathbb{E}\left( \int_{0}^{t}\frac{1}{(T-s)^{2}}\sum_{i,j=1}^{d}a_{ij}^{-1}(\phi (t))a_{ij}(\phi (t))ds\right) dt\leq\frac{d}{2}(\log (T)-\log (\beta )).\end{equation}
    Assumption 4 implies the Lipschitz continuity of $a(\!\cdot\!)$, which gives
    \begin{align*}& \frac{1}{2}\int_{0}^{T-\beta }\mathbb{E}\left( \int_{0}^{t}\frac{1}{(T-s)^{2}}\sum_{i,j=1}^{d}a_{ij}^{-1}(\phi (t))(a_{ij}(\eta (s))-a_{ij}(\phi(s)))ds\right) dt \\[3pt]&\quad \leq \frac{1}{2}\int_{0}^{T-\beta }\mathbb{E}\left( \int_{0}^{t}\frac{M}{(T-s)^{2}}\sum_{i,j=1}^{d}|a_{ij}^{-1}(\phi (t))|\times \Vert \eta (s)-\phi(s)\Vert _{2}ds\right) dt \\[3pt]&\quad = \frac{1}{2}\int_{0}^{T-\beta }\int_{0}^{t}\frac{M}{(T-s)^{2}}\sum_{i,j=1}^{d}|a_{ij}^{-1}(\phi (t))|\times \mathbb{E}\left( \Vert \eta(s)-\phi (s)\Vert _{2}\right) dsdt.\end{align*}
    By (27) and Jensen’s inequality,
    \begin{equation*}\mathbb{E}\left[ \Vert \eta (s)-\phi (s)\Vert _{2}\right] \leq \left(d\lambda _{\uparrow }(T-s)\right) ^{1/2}.\end{equation*}
    Observe that $\sum_{i,j=1}^{d}|a_{ij}^{-1}(\phi (t))|\leq d\lambda_{\downarrow }^{-1}$, so we have
    (29) \begin{equation}\begin{split}& \frac{1}{2}\int_{0}^{T-\beta }\mathbb{E}\left( \int_{0}^{t}\frac{1}{(T-s)^{2}}\sum_{i,j=1}^{d}a_{ij}^{-1}(\phi (t))(a_{ij}(\eta (s))-a_{ij}(\phi(s)))ds\right) dt \\[3pt]&\quad \leq M(d\lambda _{\uparrow }T)^{1/2}d\lambda _{\downarrow }^{-1}.\end{split}\end{equation}
    By the Lipschitz continuity of $(a_{ij}(\!\cdot\!))$, it follows that
    \begin{equation*}|a_{ij}(\phi (s))-a_{ij}(\phi (t))|\leq MT^{-1}\Vert x_{0}-y_{0}\Vert_{2}|s-t|.\end{equation*}
    Therefore,
    (30) \begin{equation}\begin{split}& \frac{1}{2}\int_{0}^{T-\beta }\mathbb{E}\left( \int_{0}^{t}\frac{1}{(T-s)^{2}}\sum_{i,j=1}^{d}a_{ij}^{-1}(\phi (t))(a_{ij}(\phi (s))-a_{ij}(\phi(t)))ds\right) dt \\[3pt]&\quad \leq \frac{1}{2}d\lambda _{\downarrow }^{-1}MT^{-1}\Vert x_{0}-y_{0}\Vert_{2}\int_{0}^{T-\beta }\left( \int_{0}^{t}\frac{t-s}{(T-s)^{2}}ds\right) dt\\[3pt]&\quad \leq \frac{1}{2}d\lambda _{\downarrow }^{-1}M\Vert x_{0}-y_{0}\Vert _{2}.\end{split}\end{equation}
    Combining (28), (29), and (30) yields
    \begin{equation*}\begin{split}\mathbb{E}\left(\int_{0}^{T-\beta } k_{2,3}(t)dt\right) &= \mathbb{E}\left(\int_{0}^{T-\beta }\frac{1}{2}\sum_{i,j=1}^{d}D_{\psi _{i},\psi_{j}}k(0)\Delta _{2,i}(t)\Delta _{2,j}(t)dt\right) \\[3pt] &\leq \frac{d}{2}(\log (T)-\log (\beta ))+M(d\lambda _{\uparrow}T)^{1/2}d\lambda _{\downarrow }^{-1}+\frac{d}{2}\lambda _{\downarrow}^{-1}M\Vert x_{0}-y_{0}\Vert _{2}.\end{split}\end{equation*}
    By the chain rule and Assumption 4, we obtain
    \begin{align*}|D_{x_{i}}a_{ij}^{-1}(x)|& \leq d^{2}\lambda _{\downarrow }^{-2}M, \\[3pt]|D_{x_{i}\psi _{j}}k(0)|& \leq \Psi _{1}(\Vert x_{0}-y_{0}\Vert _{2}/T), \\[3pt]|D_{x_{i}x_{j}}k(0)|& \leq \Psi _{2}(\Vert x_{0}-y_{0}\Vert _{2}/T),\end{align*}
    where $\Psi _{i}(\!\cdot\!)\,{:}\,\mathbb{R}\rightarrow \mathbb{R}$, $i=1,2$, are defined as
    (31) \begin{equation}\Psi _{1}(x)\coloneqq d^{2}\lambda _{\downarrow }^{-2}M(M+x)+d\lambda _{\downarrow}^{-1}M,\end{equation}
    (32) \begin{align}\Psi _{2}(x)&\coloneqq (M+x)^{2}d^{2}\left(\frac{1}{2}\lambda _{\downarrow}^{-2}Md+\lambda _{\downarrow }^{-3}M^{2}d^{2}\right)+2\lambda _{\downarrow}^{-1}M^{2}d^{2}+\lambda _{\downarrow }^{-1}Mdx\nonumber\\[3pt]&\quad+2\lambda _{\downarrow}^{-2}M^{3}d^{3}+2\lambda _{\downarrow }^{-2}M^{2}d^{2}x.\end{align}
    Taking (27) into consideration, we obtain
    \begin{align*}\mathbb{E}\left(\int_{0}^{T-\beta } k_{2,2}(t)dt\right)&= \mathbb{E}\left(\int_{0}^{T-\beta }\frac{1}{2}\sum_{i,j=1}^{d}D_{x_{i},\psi _{j}}k(0)\Delta_{1,i}(t)\Delta _{2,j}(t)dt\right) \\[3pt]& \leq \frac{1}{2}d\lambda _{\uparrow}T\Psi _{1}(\Vert x_{0}-y_{0}\Vert _{2}/T), \\[3pt]\mathbb{E}\left(\int_{0}^{T-\beta } k_{2,1}(t)dt\right)&= \mathbb{E}\left(\int_{0}^{T-\beta }\frac{1}{2}\sum_{i,j=1}^{d}D_{x_{i},x_{j}}k(0)\Delta_{1,i}(t)\Delta _{1,j}(t)dt\right) \\[3pt]& \leq \frac{1}{4}d\lambda _{\uparrow}T^{2}\Psi _{2}(\Vert x_{0}-y_{0}\Vert _{2}/T),\end{align*}
  4. Third-order terms: We proceed to analyze the third-order remainder terms. Let us consider $k_{3,3}(t)$ first. Notice that

    \begin{equation*}|D_{\psi _{i},\psi _{j}}k(\lambda \mu )-D_{\psi _{i},\psi _{j}}k(0)|\leq\lambda _{\downarrow }^{-2}d^{2}M\lambda \mu \Vert \Delta_{1}(t)\Vert _{2};\end{equation*}
    thus,
    \begin{align*}&\left\vert\mathbb{E}\left( k_{3,3}(t)\right)\right\vert\\[3pt]&\quad= \left\vert \mathbb{E}\left(\sum_{i,j=1}^{d}\int_{0}^{1}\int_{0}^{1}\left( D_{\psi _{i},\psi_{j}}k(\lambda \mu )-D_{\psi _{i},\psi _{j}}k(0)\right)\Delta_{2,i}(t)\Delta _{2,j}(t)\lambda d\mu d\lambda \right) \right\vert \\[3pt]&\quad\leq\frac{1}{6}Md^{3}\lambda _{\downarrow }^{-2}\mathbb{E}(\Vert \Delta_{1}(t)\Vert _{2}\Vert \Delta_{2}(t)\Vert _{2}^{2}).\end{align*}
    Then, by the Burkholder–Davis–Gundy inequality,
    \begin{align*}& \mathbb{E}(\Vert \Delta_{1}(t)\Vert _{2}\Vert \Delta_{2}(t)\Vert_{2}^{2})\leq (T-t)C_{\text{BDG}}(3)d^{\frac{1}{2}}\sum_{i=1}^{d}\mathbb{E}\left( \left( \int_{0}^{t}\frac{1}{(T-s)^{2}}a_{ii}(\eta (s))ds\right) ^{\frac{3}{2}}\right) \\[3pt]&\quad \leq C_{\text{BDG}}(3)d^{\frac{3}{2}}\lambda _{\uparrow }^{\frac{3}{2}}(T-t)^{-\frac{1}{2}},\end{align*}
    where $C_{\text{BDG}}(3)$ is the explicit constant in the Burkholder–Davis–Gundy inequality. We can pick
    \[C_{\text{BDG}}(p)=\left( \frac{p(p-1)}{2}\left(\frac{p}{p-1}\right)^{p}\right) ^{p/2}\]
    (see Proposition 4.4.3 of [Reference Revuz and Yor20]). To summarize, we obtain
    \begin{align*}&\left\vert\mathbb{E}\left(\int_{0}^{T-\beta } k_{3,3}(t)dt\right)\right\vert\\[3pt]&\quad= \left\vert \int_{0}^{T-\beta }\mathbb{E}\left(\sum_{i,j=1}^{d}\int_{0}^{1}\int_{0}^{1}\left( D_{\psi _{i},\psi_{j}}k(\lambda \mu )-D_{\psi _{i},\psi _{j}}k(0)\right) \Delta_{2,i}(t)\Delta _{2,j}(t)\lambda d\mu d\lambda \right) dt\right\vert \\[3pt]& \quad\leq \frac{1}{3}C_{\text{BDG}}(3)Md^{\frac{9}{2}}\lambda _{\downarrow}^{-2}\lambda _{\uparrow }^{\frac{3}{2}}T^{\frac{1}{2}}.\end{align*}
    Next, we consider the other two remainders $k_{3,2}(t)$, $k_{3,1}(t)$. Observe that
    \begin{equation*}|D_{x_{i}\psi _{j}}k(\lambda \mu )|\leq \Psi _{1}(\Vert x_{0}-y_{0}\Vert_{2}/T+\lambda \mu \Vert \Delta _{2}(t)\Vert _{2})\end{equation*}
    and
    \begin{equation*}|D_{x_{i}x_{j}}k(\lambda \mu )|\leq \Psi _{2}(\Vert x_{0}-y_{0}\Vert_{2}/T+\lambda \mu \Vert \Delta _{2}(t)\Vert _{2}).\end{equation*}
    Thus, by a similar argument, we can also derive
    \begin{align*}\left\vert\mathbb{E}\left(\int_{0}^{T-\beta } k_{3,2}(t)dt\right)\right\vert\leq \Psi _{3}(\Vert x_{0}-y_{0}\Vert _{2}/T)\end{align*}
    and
    \begin{align*}\left\vert\mathbb{E}\left(\int_{0}^{T-\beta } k_{3,1}(t)dt\right)\right\vert\leq \Psi _{4}(\Vert x_{0}-y_{0}\Vert _{2}/T),\end{align*}
    where $\Psi _{3}(\!\cdot\!)$ and $\Psi _{4}(\!\cdot\!)$ are defined as
    (33) \begin{equation}\Psi _{3}(x)\coloneqq d^{4}T\lambda _{\uparrow }\lambda _{\downarrow}^{-2}Mx+d^{4}T\lambda _{\uparrow }\lambda _{\downarrow}^{-2}M^{2}+d^{3}T\lambda _{\uparrow }\lambda _{\downarrow }^{-1}M+\frac{1}{3}C_{\text{BDG}}(3)d^{\frac{7}{2}}T^{\frac{1}{2}}\lambda _{\uparrow }^{\frac{3}{2}}\lambda _{\downarrow }^{-2}M,\end{equation}
    (34) \begin{align}& \quad \Psi _{4}(x)\coloneqq \left( \frac{1}{4}d^{5}T^{2}\lambda _{\uparrow}\lambda _{\downarrow }^{-2}M+\frac{1}{2}d^{6}T^{2}\lambda _{\uparrow}\lambda _{\downarrow }^{-3}M^{2}\right) x^{2} \notag\\[3pt] & + \frac{1}{9}C_{\text{BDG}}(3)(d\lambda _{\uparrow }T)^{\frac{3}{2}}(\lambda _{\downarrow }^{-2}Md^{3}+2\lambda _{\downarrow}^{-3}M^{3}d^{4})x+d^{5}T^{2}\lambda _{\uparrow }\lambda _{\downarrow}^{-2}M^{2}x+d^{6}T^{2}\lambda _{\uparrow }\lambda _{\downarrow}^{-3}M^{3}x \notag\\[3pt]& +d^{3}T^{2}\lambda _{\uparrow }\lambda _{\downarrow}^{-1}M x+\frac{1}{24}C_{\text{BDG}}(4)d^{5}T\lambda _{\uparrow }\lambda_{\downarrow }^{-2}M+\frac{1}{12}C_{\text{BDG}}(4)d^{6}T\lambda _{\uparrow}\lambda _{\downarrow }^{-3}M^{2} \nonumber\\[3pt]& +\frac{2}{9}C_{\text{BDG}}(3)d^{\frac{9}{2}}T^{\frac{3}{2}}\lambda _{\uparrow }^{\frac{3}{2}}\lambda _{\downarrow}^{-2}M^{2}+\frac{2}{9}C_{\text{BDG}}(3)d^{\frac{11}{2}}T^{\frac{3}{2}}\lambda_{\uparrow }^{\frac{3}{2}}\lambda _{\downarrow }^{-3}M^{3}+\frac{1}{9}C_{ \text{BDG}}(3)d^{\frac{5}{2}}T^{\frac{3}{2}}\lambda _{\uparrow }^{\frac{3}{2}}\lambda _{\downarrow }^{-1}M \nonumber\\[3pt]&+\frac{1}{2}d^{6}T^{2}\lambda _{\uparrow}\lambda _{\downarrow }^{-3}M^{4}+\frac{1}{2}d^{4}T^{2}\lambda _{\uparrow}\lambda _{\downarrow }^{-1}M^{2} +\frac{3}{4}d^{5}T^{2}\lambda _{\uparrow }\lambda _{\downarrow }^{-2}M^{3}.\end{align}

Finally, let us consider $\mathbb{E}(J^{\beta }(0,\eta (T-\beta )))$. Since

\begin{align*}& \mathbb{E}((\eta (T-\beta )-y_{0})_{i}(\eta (T-\beta )-y_{0})_{j}) \\[3pt]&= \mathbb{E}((\eta (T-\beta )-\phi (T-\beta ))_{i}(\eta (T-\beta )-\phi(T-\beta ))_{j} \\[3pt]& \quad +(\phi (T-\beta )-\phi (T))_{i}(\phi (T-\beta )-\phi (T))_{j}) \\[3pt]&\leq d\lambda _{\uparrow }\beta +\beta ^{2}|x_{0}-y_{0}|^{2}/T^{2},\end{align*}

it follows that

\begin{equation*}\begin{split}& \mathbb{E}(J^{\beta }(0,\eta (T-\beta ))) \\[3pt]&= \frac{d}{2}\log (2\pi \beta )+\frac{1}{2}\log \det (a(y_{0}))+\frac{1}{2\beta }\sum_{i,j=1}^{d}a_{ij}^{-1}(y_{0})\mathbb{E}((\eta (T-\beta)-y_{0})_{i}(\eta (T-\beta )-y_{0})_{j}) \\[3pt]&\leq \frac{d}{2}\log (2\pi \beta )+\frac{d}{2}\log \lambda _{\uparrow }+\frac{1}{2}d^{2}\lambda _{\uparrow }\lambda _{\downarrow }^{-1}+\frac{d}{2}\frac{\Vert x_{0}-y_{0}\Vert _{2}^{2}}{T}.\end{split}\end{equation*}

To conclude, let us summarize all the intermediate results and substitute them into (25). We have

\begin{equation*}J(T,x_{0})=\lim_{\beta \rightarrow 0}J^{\beta }(T-\beta ,x_{0})\leq J_{\uparrow }(\Vert x_{0}-y_{0}\Vert _{2};\ T),\end{equation*}

where $J_{\uparrow }(\cdot ;T)$ is defined as

(35) \begin{align}& J_{\uparrow }(x;\ T)\coloneqq \lambda _{\downarrow }^{-1}T\left( M+\frac{x}{T}\right) ^{2}+\frac{d}{2}\log (2\pi T)+M(d\lambda_{\uparrow }T)^{1/2}d\lambda _{\downarrow }^{-1} \nonumber \\ &\quad + \frac{d}{2}\lambda _{\downarrow }^{-1}Mx+\frac{1}{2}d\lambda _{\uparrow}T\Psi _{1}(x/T)+\frac{1}{4}d\lambda _{\uparrow }T^{2}\Psi _{2}(x/T) \nonumber\\[3pt]&\quad +\frac{1}{3}C_{\text{BDG}}(3)Md^{\frac{9}{2}}\lambda _{\downarrow}^{-2}\lambda _{\uparrow }^{\frac{3}{2}}T^{\frac{1}{2}}+\Psi _{3}(x/T)+\Psi_{4}(x/T)+\frac{d}{2}\log \lambda _{\uparrow } \notag \\[3pt]&\quad+ \frac{1}{2}d^{2}\lambda _{\uparrow }\lambda _{\downarrow }^{-1}+\frac{d}{2}\frac{x^{2}}{T}. \end{align}

Therefore, if we pick $D_{S}=\sup_{x\in S}\Vert x-x_{0}\Vert _{2}$, it follows that

\begin{equation*}p(x,T;\ x_{0},0)\geq \exp \left( -J_{\uparrow }(D_{S};\ T)\right)\!,\quad \forall x\in S,\end{equation*}

which completes the proof.
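For completeness, the resulting lower bound can be evaluated explicitly. The sketch below is not the paper's Algorithm 8; it is a direct transcription of $C_{\text{BDG}}$, $\Psi _{1},\dots ,\Psi _{4}$, and $J_{\uparrow }$ from (31)–(35), returning $\delta _{S}=\exp (-J_{\uparrow }(D_{S};\ T))$ (function and argument names are ours):

```python
import math

def c_bdg(p):
    """Explicit Burkholder-Davis-Gundy constant used in the proof of Theorem 6."""
    return (p * (p - 1) / 2.0 * (p / (p - 1)) ** p) ** (p / 2.0)

def density_lower_bound(d, M, lam_dn, lam_up, T, D_S):
    """Transcription of (31)-(35): returns delta_S = exp(-J_up(D_S; T))."""
    def psi1(x):
        return d ** 2 * lam_dn ** -2 * M * (M + x) + d * lam_dn ** -1 * M

    def psi2(x):
        return ((M + x) ** 2 * d ** 2 * (0.5 * lam_dn ** -2 * M * d
                                         + lam_dn ** -3 * M ** 2 * d ** 2)
                + 2 * lam_dn ** -1 * M ** 2 * d ** 2
                + lam_dn ** -1 * M * d * x
                + 2 * lam_dn ** -2 * M ** 3 * d ** 3
                + 2 * lam_dn ** -2 * M ** 2 * d ** 2 * x)

    def psi3(x):
        return (d ** 4 * T * lam_up * lam_dn ** -2 * M * x
                + d ** 4 * T * lam_up * lam_dn ** -2 * M ** 2
                + d ** 3 * T * lam_up * lam_dn ** -1 * M
                + c_bdg(3) / 3.0 * d ** 3.5 * T ** 0.5 * lam_up ** 1.5 * lam_dn ** -2 * M)

    def psi4(x):
        return ((0.25 * d ** 5 * T ** 2 * lam_up * lam_dn ** -2 * M
                 + 0.5 * d ** 6 * T ** 2 * lam_up * lam_dn ** -3 * M ** 2) * x ** 2
                + c_bdg(3) / 9.0 * (d * lam_up * T) ** 1.5
                  * (lam_dn ** -2 * M * d ** 3 + 2 * lam_dn ** -3 * M ** 3 * d ** 4) * x
                + d ** 5 * T ** 2 * lam_up * lam_dn ** -2 * M ** 2 * x
                + d ** 6 * T ** 2 * lam_up * lam_dn ** -3 * M ** 3 * x
                + d ** 3 * T ** 2 * lam_up * lam_dn ** -1 * M * x
                + c_bdg(4) / 24.0 * d ** 5 * T * lam_up * lam_dn ** -2 * M
                + c_bdg(4) / 12.0 * d ** 6 * T * lam_up * lam_dn ** -3 * M ** 2
                + 2.0 / 9.0 * c_bdg(3) * d ** 4.5 * T ** 1.5 * lam_up ** 1.5 * lam_dn ** -2 * M ** 2
                + 2.0 / 9.0 * c_bdg(3) * d ** 5.5 * T ** 1.5 * lam_up ** 1.5 * lam_dn ** -3 * M ** 3
                + 1.0 / 9.0 * c_bdg(3) * d ** 2.5 * T ** 1.5 * lam_up ** 1.5 * lam_dn ** -1 * M
                + 0.5 * d ** 6 * T ** 2 * lam_up * lam_dn ** -3 * M ** 4
                + 0.5 * d ** 4 * T ** 2 * lam_up * lam_dn ** -1 * M ** 2
                + 0.75 * d ** 5 * T ** 2 * lam_up * lam_dn ** -2 * M ** 3)

    x = D_S
    j_up = (lam_dn ** -1 * T * (M + x / T) ** 2
            + 0.5 * d * math.log(2 * math.pi * T)
            + M * (d * lam_up * T) ** 0.5 * d * lam_dn ** -1
            + 0.5 * d * lam_dn ** -1 * M * x
            + 0.5 * d * lam_up * T * psi1(x / T)
            + 0.25 * d * lam_up * T ** 2 * psi2(x / T)
            + c_bdg(3) / 3.0 * M * d ** 4.5 * lam_dn ** -2 * lam_up ** 1.5 * T ** 0.5
            + psi3(x / T) + psi4(x / T)
            + 0.5 * d * math.log(lam_up)
            + 0.5 * d ** 2 * lam_up * lam_dn ** -1
            + 0.5 * d * x ** 2 / T)
    return math.exp(-j_up)
```

Since these constants are quite conservative, $\delta _{S}$ can easily underflow in double precision; in practice one may prefer to work with $J_{\uparrow }(D_{S};\ T)=-\log \delta _{S}$ directly.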

Acknowledgements

We gratefully acknowledge support from the NSF grants 1915967, 1820942, and 1838676, as well as from AFOSR.

References

Ait-Sahalia, Y. (2008). Closed-form likelihood expansions for multivariate diffusions. Ann. Statist. 36, 906–937.
Aronson, D. (1967). Bounds for the fundamental solution of a parabolic equation. Bull. Amer. Math. Soc. 73, 890–896.
Bally, V. (2006). Lower bounds for the density of locally elliptic Itô processes. Ann. Prob. 34, 2406–2440.
Beskos, A., Papaspiliopoulos, O. and Roberts, G. O. (2006). Retrospective exact simulation of diffusion sample paths with applications. Bernoulli 12, 1077–1098.
Beskos, A. and Roberts, G. O. (2005). Exact simulation of diffusions. Ann. Appl. Prob. 15, 2422–2444.
Blanchet, J., Chen, X. and Dong, J. (2017). $\epsilon$-strong simulation for multidimensional stochastic differential equations via rough path analysis. Ann. Appl. Prob. 27, 275–336.
Chen, N. and Huang, Z. (2013). Localization and exact simulation of Brownian motion-driven stochastic differential equations. Math. Operat. Res. 38, 591–616.
Fearnhead, P., Papaspiliopoulos, O., Roberts, G. O. and Stuart, A. (2010). Random-weight particle filtering of continuous time processes. J. R. Statist. Soc. B [Statist. Methodology] 72, 497–512.
Fleming, W. H. and Rishel, R. W. (2012). Deterministic and Stochastic Optimal Control. Springer Science & Business Media, New York.
Friedman, A. (2013). Partial Differential Equations of Parabolic Type. Courier Corporation, Chelmsford, MA.
Giles, M. B. (2008). Multilevel Monte Carlo path simulation. Operat. Res. 56, 607–617.
Giles, M. B. and Szpruch, L. (2014). Antithetic multilevel Monte Carlo estimation for multi-dimensional SDEs without Lévy area simulation. Ann. Appl. Prob. 24, 1585–1620.
Henry-Labordère, P., Tan, X. and Touzi, N. (2017). Unbiased simulation of stochastic differential equations. Ann. Appl. Prob. 27, 3305–3341.
Huber, M. (2016). Nearly optimal Bernoulli factories for linear functions. Combinatorics Prob. Comput. 25, 577–591.
Karatzas, I. and Shreve, S. (2012). Brownian Motion and Stochastic Calculus. Springer Science & Business Media, New York.
Kusuoka, S. and Stroock, D. (1987). Applications of the Malliavin calculus, part 3. J. Fac. Sci. Univ. Tokyo Sect. IA Math. 34, 397–442.
Łatuszyński, K., Kosmidis, I., Papaspiliopoulos, O. and Roberts, G. O. (2011). Simulating events of unknown probabilities via reverse time martingales. Random Structures Algorithms 38, 441–452.
McLeish, D. (2011). A general method for debiasing a Monte Carlo estimator. Monte Carlo Meth. Appl. 17, 301–315.
Nacu, Ş. and Peres, Y. (2005). Fast simulation of new coins from old. Ann. Appl. Prob. 15, 93–115.
Revuz, D. and Yor, M. (2013). Continuous Martingales and Brownian Motion. Springer Science & Business Media, New York.
Rhee, C.-H. and Glynn, P. W. (2012). A new approach to unbiased estimation for SDE's. In WSC '12: Proceedings of the Winter Simulation Conference, Berlin, pp. 1–7.
Rhee, C.-H. and Glynn, P. W. (2015). Unbiased estimation with square root convergence for SDE models. Operat. Res. 63, 1026–1043.
Sheu, S.-J. (1991). Some estimates of the transition density of a nondegenerate diffusion Markov process. Ann. Prob. 19, 538–561.