1. Introduction
1.1. Overview
The dynamical Schrödinger problem [Reference Föllmer19, Reference Léonard27] seeks to find the entropic projection of a reference path measure (such as a Wiener measure) onto the space of path measures with given initial and terminal distributions. Originally motivated by physics, the problem has received increasing interest from other application domains such as statistics and machine learning; see [Reference Bernton, Heng, Doucet and Jacob4, Reference De Bortoli, Thornton, Heng and Doucet16, Reference Pavon, Trigila and Tabak38, Reference Stromme43] and references therein. From a purely mathematical point of view, the time marginal flow, called entropic interpolation, provides a powerful technique for deriving functional inequalities and analysis of metric measure spaces [Reference Boissard, Gozlan, Lehec, Léonard, Menz and Schlichting5, Reference Gentil, Léonard, Ripani and Tamanini20, Reference Gigli and Tamanini21], making the dynamical Schrödinger problem of intrinsic interest. Additionally, the static version of the Schrödinger problem is equivalent to quadratic entropic optimal transport (EOT) [Reference Nutz35], the analysis of which has seen extensive research activities. This is in particular due to EOT admitting efficient computation via Sinkhorn’s algorithm, which lends itself well to large-scale data analysis [Reference Cuturi14, Reference Peyré and Cuturi39].
Schrödinger problems can be interpreted as noisy counterparts of Monge–Kantorovich optimal transport (OT) problems. In particular, [Reference Léonard26, Reference Mikami31, Reference Mikami and Thieullen32] studied the rigorous connection between the two problems, establishing convergence of optimal solutions for dynamical Schrödinger problems (Schrödinger bridges) toward the dynamical OT problem when the noise level tends to zero. In this work, we study local rates of convergence of Schrödinger bridges toward the limiting law. Specifically, we establish large-deviation principles (LDPs) for Schrödinger bridges on a path space and characterize the rate function.
Our baseline setting goes as follows. Let
$\mu_0,\mu_1$
be Borel probability measures on
$\mathbb{R}^d$
with finite second moments that will be fixed throughout. Let E be the space of continuous maps
$[0,1] \to \mathbb{R}^d$
endowed with the sup norm
$\| \omega \|_E = \sup_{t \in [0,1]} |\omega (t)|$
for
$\omega = (\omega (t))_{t \in [0,1]} \in E$
(we use
$| \cdot |$
to denote the Euclidean norm). For a given
$\varepsilon > 0$
(noise level), let
$R^{\varepsilon}$
be the law, defined on the Borel
$\sigma$
-field of E, of
$\xi + \sqrt{\varepsilon}W$
, where
$\xi \sim \mu_0$
and
$W=(W(t))_{t \in [0,1]}$
is a standard Brownian motion starting at 0 independent of
$\xi$
. For
$s,t \in [0,1]$
, we denote the projections at t and (s, t) as
$e_t$
and
$e_{st}$
, respectively, i.e.
$e_t(\omega) =\omega (t)$
and
$e_{st} (\omega) = (\omega (s),\omega(t))$
for
$\omega \in E$
. For a given Borel probability measure P on E, we write
$P_t = P \circ e_t^{-1}$
and
$P_{st} = P \circ e_{st}^{-1}$
. Given two endpoint marginals
$\mu_0,\mu_1$
and a reference measure
$R^{\varepsilon}$
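, the dynamical Schrödinger problem reads as
\begin{equation} \min_{P \colon P_0 = \mu_0,\, P_1 = \mu_1} \mathcal{H}(P \mid R^{\varepsilon}), \end{equation}
where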
$\mathcal{H}(\cdot \mid \cdot)$
denotes the relative entropy (see Section 1.4 for the formal definition). Provided
$\mu_1$
has finite entropy relative to the Lebesgue measure (cf. Remark 1), the problem in (1) admits a unique optimal solution
$P^{\varepsilon}$
, called the Schrödinger bridge. The solution
$P^{\varepsilon}$
is given by a mixture of Brownian bridges against a (unique) optimal solution
$\pi_\varepsilon$
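to the static Schrödinger problem
\begin{equation} \min_{\pi \in \Pi(\mu_0,\mu_1)} \mathcal{H}(\pi \mid R^{\varepsilon}_{01}), \end{equation}
where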
$\Pi(\mu_0,\mu_1)$
is the set of couplings with marginals
$\mu_0$
and
$\mu_1$
. The zero-noise limit (
$\varepsilon \downarrow 0$
) of (2) corresponds to the OT problem with quadratic cost
$c(x,y) = |x-y|^2/2$
,
\begin{equation} \min_{\pi \in \Pi(\mu_0,\mu_1)} \int c \, \, \mathrm{d}\pi, \end{equation}
which admits a unique optimal solution (OT plan)
$\pi_{\mathrm{o}}$
(as
$\mu_1$
is assumed to be absolutely continuous; [Reference Brenier7]).
In his influential work [Reference Mikami31], Mikami proved, under an additional assumption that
$\mu_0$
is absolutely continuous, that
$P^{\varepsilon}$
converges weakly to the law
$P^{\mathrm{o}}$
of the geodesic path connecting two random endpoints following
$\pi_{\mathrm{o}}$
,
$t \mapsto \sigma^{\xi_0,\xi_1}(t)$
for
$\sigma^{xy}(t) = (1-t)x+ty$
and
$(\xi_0,\xi_1) \sim \pi_{\mathrm{o}}$
, i.e.
$P^{\mathrm{o}} = \int \delta_{\sigma^{xy}} \, \, \mathrm{d}\pi_{\mathrm{o}} (x,y)$
with
$\delta_{\cdot}$
denoting the Dirac delta ([Reference Mikami31] indeed proved convergence with respect to Wasserstein
$W_2$
distance). The limiting law
$P^{\mathrm{o}}$
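can be characterized as an optimal solution to the dynamical OT problem
\begin{align*} \min_{P \colon P_0 = \mu_0,\, P_1 = \mu_1} \int_E \bigg( \frac{1}{2}\int_0^1 |\dot{\omega}(t)|^2 \, \, \mathrm{d} t \bigg) \, \mathrm{d} P(\omega), \end{align*}
where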
$\dot{\omega}(t)$
denotes the time derivative of
$\omega$
and
$\int_{0}^1 |\dot{\omega}(t)|^2 \, \, \mathrm{d} t =\infty$
if
$\omega$
is not absolutely continuous [Reference Léonard26]. The marginal laws of the limiting process give rise to a constant-speed geodesic (displacement interpolation; [Reference McCann29]) in the Wasserstein space connecting
$\mu_0$
and
$\mu_1$
.
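As a small numerical illustration of the dynamical OT cost, the following sketch checks on a time grid that the straight line $\sigma^{xy}$ has kinetic energy $\int_0^1 (|\dot{\omega}(t)|^2/2) \, \mathrm{d} t$ equal to $c(x,y) = |x-y|^2/2$, and that a perturbation with the same endpoints has strictly larger energy. The endpoints, the grid size, the sine bump, and the helper names `geodesic` and `kinetic_energy` are illustrative assumptions, not objects from the paper.

```python
import math

def geodesic(x, y, n):
    """Sample sigma^{xy}(t) = (1 - t) x + t y at the n + 1 times t = i / n."""
    return [[(1 - i / n) * a + (i / n) * b for a, b in zip(x, y)]
            for i in range(n + 1)]

def kinetic_energy(path):
    """Approximate (1/2) * int_0^1 |d omega/dt|^2 dt by forward differences."""
    n = len(path) - 1
    total = 0.0
    for p, q in zip(path, path[1:]):
        # |increment|^2 / dt, with dt = 1 / n
        total += n * sum((b - a) ** 2 for a, b in zip(p, q))
    return 0.5 * total

# Hypothetical endpoints in R^2, chosen only for illustration.
x, y = [0.0, 1.0], [2.0, -1.0]
n = 200
line = geodesic(x, y, n)
cost = 0.5 * sum((a - b) ** 2 for a, b in zip(x, y))  # c(x, y) = |x - y|^2 / 2

# Add a bump that vanishes at t = 0 and t = 1, so the endpoints stay fixed.
bumped = [[p[0] + 0.3 * math.sin(math.pi * i / n), p[1]]
          for i, p in enumerate(line)]

energy_line = kinetic_energy(line)
energy_bumped = kinetic_energy(bumped)
```

Here `energy_line` agrees with `cost` up to rounding, while `energy_bumped` exceeds it, which is the pathwise reason why the dynamical OT problem concentrates on straight lines.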
Our main large deviation results establish that (see Sections 1.4 and 2 for notation and definitions), under regularity conditions, for any sequence
$\varepsilon_k \downarrow 0$
, the Schrödinger bridges
$P^{\varepsilon_k}$
satisfy an LDP with rate function
$I(h) = \int_0^1(|\dot{h}(t)|^2/2) \, \, \mathrm{d} t - \psi^c(h(0)) - \psi(h(1))$
, where
$\psi$
is an OT (or Kantorovich) potential from
$\mu_1$
to
$\mu_0$
and
$\psi^c$
is its c-transform (the rate function I is set to
$\infty$
if h(0) or h(1) is outside the support of
$\mu_0$
or
$\mu_1$
, respectively). Very roughly, this means
$P^{\varepsilon_k}(A) \approx \exp\{-\varepsilon_k^{-1} \inf_{h \in A}I(h)\}$
for large k. The rate function I(h) vanishes as soon as
$h \in \Sigma_{\pi_{\mathrm{o}}} \,:\!=\, \{ \sigma^{xy} \colon(x,y) \in \mathrm{spt}(\pi_{\mathrm{o}}) \}$
, which agrees with the support of
$P^{\mathrm{o}}$
, but I(h) is positive outside
$\Sigma_{\pi_{\mathrm{o}}}$
in many cases. Effectively, our result implies that the Schrödinger bridges
$P^{\varepsilon}$
charge exponentially small masses outside the support of the limiting law
$P^{\mathrm{o}}$
. Precisely, we establish a weak-type LDP under uniqueness of OT potentials, which allows for marginals with unbounded supports, but induces a full LDP when
$\mu_0,\mu_1$
are compactly supported.
The proof of the main theorem relies on the expression of
$P^{\varepsilon}$
as a
$\pi_\varepsilon$
-mixture of Brownian bridges. The main ingredient of the proof is the exponential continuity [Reference Dinwoodie and Zabell18] of Brownian bridges, i.e. large-deviation upper and lower bounds for Brownian bridges whose initial and terminal points vary with the noise level. Note that an LDP for Brownian bridges with fixed initial and terminal points was derived in [Reference Hsu23], but Hsu’s proof, which relies on transition density estimates, seems difficult to adapt to establish exponential continuity. Instead, we use techniques from abstract Wiener spaces (cf. [Reference Stroock45, Chapter 8]) to establish this result. Given the exponential continuity, the main theorem follows by combining it with the large-deviation results for
$\pi_\varepsilon$
established in [Reference Bernton, Ghosal and Nutz3]. For the compact support case, we provide a more direct proof of the full LDP using the representation of
$P^{\varepsilon}$
as an integral of a
$(\mu_0 \otimes \mu_1)$
-mixture of Brownian bridges. The proof first shows an LDP for the
$(\mu_0 \otimes \mu_1)$
-mixture of Brownian bridges, and then establishes the full LDP by adapting the (Laplace–)Varadhan lemma (cf. [Reference Dembo and Zeitouni17, Theorem 4.4.2]) and using the convergence of EOT (or Schrödinger) potentials. The alternative proof can be easily adapted to establish an LDP for the dynamical Schrödinger problem with Langevin diffusion as a reference measure when two marginals are compactly supported; cf. Remark 10.
1.2. Literature review
The literature related to this paper is broad, so we confine ourselves to the references directly related to our work. The most closely related are [Reference Bernton, Ghosal and Nutz3, Reference Nutz and Wiesel36], which established large deviations for static Schrödinger problems in fairly general settings, allowing for marginals on a general Polish space and general continuous costs, and our proofs use several results from their work. [Reference Bernton, Ghosal and Nutz3] derived a weak LDP for EOT via a novel cyclical invariance characterization of EOT plans, while [Reference Nutz and Wiesel36] built on convergence of EOT potentials.
The connection between Schrödinger and OT problems has been one of the central problems in the OT literature. We focus here on convergence of Schrödinger problems. The pioneering works in this direction are [Reference Léonard26, Reference Mikami31, Reference Mikami and Thieullen32]. Mikami’s proof in [Reference Mikami31] relies on the fact that the Schrödinger bridge
$P^{\varepsilon}$
corresponds to a weak solution of a certain stochastic differential equation (SDE) with diffusion component
$\sqrt{\varepsilon} \, \, \mathrm{d} W(t)$
, the special case of which is often referred to as the Föllmer process [Reference Lehec25, Reference Mikulincer and Shenfeld33]; see Remark 3. The drift function of said SDE being dependent on
$\varepsilon$
in a nontrivial way (among others) makes the problem of large deviations for dynamical Schrödinger problems fall outside the realm of the Freidlin–Wentzell theory (cf. [Reference Dembo and Zeitouni17, Chapter 5]). On the other hand, Léonard’s proof in [Reference Léonard26] relies on the variational representation of the relative entropy and convex analysis techniques to establish
$\Gamma$
-convergence of the Schrödinger objective functions, which yields convergence of the optimal solutions. Arguably, recent interest in EOT (static Schrödinger problem) stems from the fact that EOT provides an efficient computational means for unregularized OT [Reference Cuturi14, Reference Peyré and Cuturi39]. From this perspective, extensive research has been done on convergence and speed of convergence of EOT costs, potentials, plans, and maps toward those of unregularized OT [Reference Altschuler, Niles-Weed and Stromme1, Reference Carlier, Duval, Peyré and Schmitzer8, Reference Carlier, Pegon and Tamanini9, Reference Chizat, Roussillon, Léger, Vialard and Peyré11, Reference Conforti and Tamanini13, Reference Nutz and Wiesel36, Reference Pal37, Reference Pooladian and Niles-Weed40].
To the best of the author’s knowledge, this is the first paper to establish large deviations for dynamical Schrödinger problems. As noted in the beginning, the dynamical aspect of the Schrödinger bridge has received increasing interest from application domains, which calls for further research on this subject. Our results contribute to the rigorous understanding of the connection between the dynamical Schrödinger and OT problems in the small-noise regime. From a technical perspective, our use of mixture representations to explore large deviations on path spaces might be applied to other problems. Finally, in this work we focus on the Wiener reference measure that corresponds to the quadratic OT problem. Arguably, this setting would be the most basic. Extending our large-deviation results to the dynamical problem in abstract metric spaces [Reference Monsaingeon, Tamanini and Vorotnikov34] would be of interest, but beyond the scope of this paper.
1.3. Organization
The rest of the paper is organized as follows. Section 2 contains background on EOT, Schrödinger, and OT problems, and Section 3 presents the main results. All the proofs are gathered in Section 4.
1.4. Notation and definitions
Let
$x \cdot y$
denote the Euclidean inner product for
$x,y \in \mathbb{R}^d$
. For
$x,y \in \mathbb{R}^d$
and a Borel probability measure P on E, let
$P^{xy}$
denote the (regular) conditional law of X given
$(X(0),X(1)) = (x,y)$
for
$X=(X(t))_{t \in [0,1]} \sim P$
. For a set A, let
$\iota_{A}(x) = 0$
if
$x\in A$
and
$=\infty$
if
$x \notin A$
. On a metric space M, let
$B_M(x,r)$
denote the open ball in M with center x and radius r. For a Borel probability measure
$\mu$
on a metric space, its support is denoted by
$\mathrm{spt}(\mu)$
. For probability measures
$\alpha,\beta$
on a common measurable space,
$\mathcal{H}(\alpha \mid \beta)$
is the relative entropy defined as
\begin{align*} \mathcal{H}(\alpha \mid \beta) \,:\!=\, \begin{cases} \displaystyle\int \log \dfrac{\, \mathrm{d}\alpha}{\, \mathrm{d}\beta} \, \, \mathrm{d}\alpha & \text{if} \ \alpha \ll \beta, \\ \infty & \text{otherwise}. \end{cases}\end{align*}
A lower semicontinuous function
$I\colon M \to [0,\infty]$
defined on a metric space M is called a rate function. The rate function I is called good if all level sets
$\{ x \colon I(x) \le \alpha \}$
for
$\alpha \in [0,\infty)$
are compact. Given a sequence of positive reals
$a_k \to \infty$
, a sequence of Borel probability measures
$\{ P_k \}_{k \in \mathbb{N}}$
on M satisfies a weak large-deviation principle with speed
$a_k$
and rate function I if
(i) for every open set
$A \subset M$
,
$\liminf_{k\to\infty}a_k^{-1}\log P_k(A) \ge -\inf_{x\in A}I(x)$
;
(ii) for every compact set
$A \subset M$
,
$\limsup_{k\to\infty}a_k^{-1}\log P_k(A) \le -\inf_{x\in A}I(x)$
.
If condition (ii) holds for every closed set
$A \subset M$
, then we say that
$\{ P_k \}_{k \in \mathbb{N}}$
satisfies a (full) LDP. We refer the reader to [Reference Dembo and Zeitouni17] as an excellent reference on large deviations.
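To make the definition concrete, consider the classical one-dimensional example in which $P_\varepsilon$ is the law of $\sqrt{\varepsilon} Z$ with Z standard normal; this family satisfies an LDP with speed $\varepsilon^{-1}$ and good rate function $x^2/2$. The sketch below evaluates $\varepsilon \log P_\varepsilon([a,\infty))$ through the complementary error function and compares it with $-\inf_{x \ge a} x^2/2 = -a^2/2$. The threshold `a`, the noise levels, and the function name `scaled_log_tail` are illustrative assumptions.

```python
import math

def scaled_log_tail(a, eps):
    """Return eps * log P(sqrt(eps) * Z >= a) for a standard normal Z and a > 0."""
    tail = 0.5 * math.erfc(a / math.sqrt(2.0 * eps))
    return eps * math.log(tail)

a = 1.0
rate = a * a / 2.0  # infimum of x^2 / 2 over the set [a, infinity)

# Scaled log-probabilities along a decreasing sequence of noise levels.
values = {eps: scaled_log_tail(a, eps) for eps in (0.1, 0.01, 0.002)}
```

The entries of `values` approach `-rate` as the noise level decreases, which is the one-dimensional analogue of the heuristic $P^{\varepsilon_k}(A) \approx \exp\{-\varepsilon_k^{-1}\inf_{h \in A} I(h)\}$.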
2. Preliminaries
2.1. From EOT to Schrödinger problems
We first review EOT and its connection to the Schrödinger problems, which will play a key role in the proofs of the main results. Proofs of the results below can be found in [Reference Léonard27] or [Reference Nutz and Wiesel36]. Throughout, we set
$\mathcal{X} = \mathrm{spt}(\mu_0)$
and
$\mathcal{Y}= \mathrm{spt}(\mu_1)$
.
Given marginals
$\mu_0,\mu_1$
, the EOT problem for quadratic cost
$c(x,y) = |x-y|^2/2$
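reads as
\begin{equation} \min_{\pi \in \Pi(\mu_0,\mu_1)} \int c \, \, \mathrm{d}\pi + \varepsilon\mathcal{H}(\pi \mid \mu_0 \otimes \mu_1). \end{equation}
Setting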
$\, \mathrm{d}\nu_\varepsilon=Z_\varepsilon^{-1}\mathrm{e}^{-c/\varepsilon}\,\, \mathrm{d}(\mu_0\otimes\mu_1)$
with
$Z_{\varepsilon}= \int\mathrm{e}^{-c/\varepsilon}\,\, \mathrm{d}(\mu_0\otimes\mu_1)$
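, we have
\begin{align*} \int c \, \, \mathrm{d}\pi + \varepsilon\mathcal{H}(\pi \mid \mu_0 \otimes \mu_1) = \varepsilon\mathcal{H}(\pi \mid \nu_\varepsilon) - \varepsilon\log Z_\varepsilon, \end{align*}
which implies that (4) is equivalent to the static Schrödinger problem
\begin{equation} \min_{\pi \in \Pi(\mu_0,\mu_1)} \mathcal{H}(\pi \mid \nu_\varepsilon). \end{equation}
Recall that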
$\Pi (\mu_0,\mu_1)$
is compact for the weak topology. Since
$\pi \mapsto \mathcal{H}(\pi \mid \nu_\varepsilon)$
is lower semicontinuous with respect to (w.r.t.) the weak topology (which follows from the variational representation of the relative entropy) and strictly convex on the set of
$\pi$
such that
$\mathcal{H}(\pi \mid \nu_\varepsilon)$
is finite (which follows from the strict convexity of
$x \mapsto x\log x$
), the problem in (5) admits a unique optimal solution
$\pi_{\varepsilon}$
, provided
$\mathcal{H}(\pi \mid \nu_{\varepsilon}) < \infty$
for some
$\pi \in \Pi (\mu_0,\mu_1)$
. Since
$\mu_0$
and
$\mu_1$
have finite second moments, we have
$\mathcal{H}(\mu_0 \otimes \mu_1 \mid \nu_{\varepsilon}) < \infty$
. We will call
$\pi_{\varepsilon}$
the EOT plan.
The EOT plan has a density w.r.t.
$\mu_0 \otimes \mu_1$
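given by
\begin{align*} \frac{\, \mathrm{d}\pi_\varepsilon}{\, \mathrm{d}(\mu_0 \otimes \mu_1)}(x,y) = \mathrm{e}^{(\varphi_{\varepsilon}(x) + \psi_{\varepsilon}(y) - c(x,y))/\varepsilon}, \end{align*}
where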
$\varphi_{\varepsilon} \in L^1(\mu_0)$
and
$\psi_{\varepsilon} \in L^1(\mu_1) $
are EOT potentials satisfying the Schrödinger system
\begin{equation} \begin{cases} \displaystyle\int \mathrm{e}^{(\varphi_{\varepsilon} (x) + \psi_{\varepsilon} (y) - c(x,y))/\varepsilon} \, \, \mathrm{d}\mu_1 (y)= 1, & \text{$\mu_0$-almost every }{x}, \\ \displaystyle\int \mathrm{e}^{(\varphi_{\varepsilon}(x) + \psi_{\varepsilon}(y) - c(x,y))/\varepsilon} \, \, \mathrm{d}\mu_0(x) = 1, & \text{$\mu_1$-almost every }{y}. \end{cases} \end{equation}
EOT potentials are almost surely (a.s.) unique up to additive constants, i.e. if
$(\tilde{\varphi}_{\varepsilon},\tilde{\psi}_{\varepsilon})$
is another pair of EOT potentials, then there exists a constant
$a \in \mathbb{R}$
such that
$\tilde{\varphi}_{\varepsilon} = \varphi_{\varepsilon} + a$
$\mu_0$
-almost everywhere (a.e.) and
$\tilde{\psi}_{\varepsilon} = \psi_{\varepsilon} - a$
$\mu_1$
-a.e. In many cases (e.g. as soon as
$\mu_0,\mu_1$
are sub-Gaussian), we can choose versions of (finite) EOT potentials for which the Schrödinger system (6) holds for all
$x \in \mathcal{X}$
and
$y \in \mathcal{Y}$
(in fact for all
$x \in \mathbb{R}^d$
and
$y \in \mathbb{R}^d$
); see [Reference Mena and Niles-Weed30, Proposition 6]. Whenever possible, we always choose such versions of EOT potentials.
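For intuition, the Schrödinger system (6) can be solved numerically by enforcing its two equations in turn, which is the Sinkhorn iteration mentioned in Section 1.1. The sketch below does this for two finitely supported marginals on the real line. These discrete measures are hypothetical and do not satisfy Assumption 1 below, so the example illustrates only the static EOT problem; the function names, the value of `eps`, and the iteration count are likewise assumptions of the sketch.

```python
import math

def sinkhorn_potentials(xs, mu0, ys, mu1, eps, iters=500):
    """Alternately enforce the two equations of the Schrodinger system
    for c(x, y) = |x - y|^2 / 2 and finitely supported marginals on R."""
    cost = [[0.5 * (x - y) ** 2 for y in ys] for x in xs]
    phi = [0.0] * len(xs)
    psi = [0.0] * len(ys)
    for _ in range(iters):
        # First equation: sum_j exp((phi_i + psi_j - c_ij) / eps) mu1_j = 1.
        phi = [-eps * math.log(sum(math.exp((psi[j] - cost[i][j]) / eps) * mu1[j]
                                   for j in range(len(ys))))
               for i in range(len(xs))]
        # Second equation: sum_i exp((phi_i + psi_j - c_ij) / eps) mu0_i = 1.
        psi = [-eps * math.log(sum(math.exp((phi[i] - cost[i][j]) / eps) * mu0[i]
                                   for i in range(len(xs))))
               for j in range(len(ys))]
    return phi, psi, cost

def eot_plan(phi, psi, cost, mu0, mu1, eps):
    """Weights of the coupling with density exp((phi + psi - c) / eps)
    with respect to the product measure mu0 x mu1."""
    return [[math.exp((phi[i] + psi[j] - cost[i][j]) / eps) * mu0[i] * mu1[j]
             for j in range(len(mu1))] for i in range(len(mu0))]

# Hypothetical marginals used only for this illustration.
xs, mu0 = [0.0, 1.0, 2.0], [0.5, 0.3, 0.2]
ys, mu1 = [0.5, 1.5], [0.4, 0.6]
eps = 0.5
phi, psi, cost = sinkhorn_potentials(xs, mu0, ys, mu1, eps)
plan = eot_plan(phi, psi, cost, mu0, mu1, eps)
row_sums = [sum(row) for row in plan]
col_sums = [sum(plan[i][j] for i in range(len(xs))) for j in range(len(ys))]
```

After the iteration, `row_sums` and `col_sums` reproduce `mu0` and `mu1` up to numerical error, so `plan` is a coupling of the exponential form prescribed for the EOT plan. Adding a constant to `phi` and subtracting it from `psi` leaves `plan` unchanged, in line with uniqueness up to additive constants.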
To link EOT to the original static Schrödinger problem (2), we make the following assumption.
Assumption 1.
$\mu_1 \ll \, \mathrm{d} y$
and
$\mathcal{H}(\mu_1 \mid \, \mathrm{d} y) < \infty$
.
Remark 1. (On the relative entropy
$\mathcal{H}(\mu_1\mid\, \mathrm{d} y)$
.) Here, as in [Reference Léonard27, Appendix A], we define the relative entropy
$\mathcal{H}(\mu_1\mid\, \mathrm{d} y)$
against the Lebesgue measure
$\, \mathrm{d} y$
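given by
\begin{align*} \mathcal{H}(\mu_1 \mid \, \mathrm{d} y) \,:\!=\, \int \log\frac{\rho}{\mathfrak{g}} \, \, \mathrm{d}\mu_1 + \int \log\mathfrak{g} \, \, \mathrm{d}\mu_1, \end{align*}
where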
$\rho = \, \mathrm{d}\mu_1/\, \mathrm{d} y$
and
$\mathfrak{g}$
is the standard Gaussian density on
$\mathbb{R}^d$
.
The reference measure
$R_{01}^{\varepsilon} = R^{\varepsilon} \circ e_{01}^{-1}$
for (2) has a density w.r.t.
$\, \mathrm{d} y\,\, \mathrm{d}\mu_0(x)$
given by
$\, \mathrm{d} R_{01}^\varepsilon(x,y) = (2\pi\varepsilon)^{-d/2}\mathrm{e}^{-c(x,y)/\varepsilon}\,\, \mathrm{d} y\, \mathrm{d}\mu_0(x)$
, so
$\nu_{\varepsilon}$
is absolutely continuous w.r.t.
$R_{01}^\varepsilon$
with density
$\, \mathrm{d}\nu_{\varepsilon}(x,y)=(2\pi\varepsilon)^{d/2} Z_{\varepsilon}^{-1} \rho(y) \, \, \mathrm{d} R^{\varepsilon}_{01}(x,y)$
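. Hence, for every
$\pi \in \Pi(\mu_0,\mu_1)$
,
\begin{align*} \mathcal{H}(\pi \mid R_{01}^{\varepsilon}) = \mathcal{H}(\pi \mid \nu_{\varepsilon}) + \frac{d}{2}\log(2\pi\varepsilon) - \log Z_{\varepsilon} + \mathcal{H}(\mu_1 \mid \, \mathrm{d} y), \end{align*}
and the unique optimal solution to (2) is given by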
$\pi_{\varepsilon}$
.
Going back to the dynamical Schrödinger problem (1), by the chain rule for the relative entropy, we have
$\mathcal{H}(P \mid R^{\varepsilon}) = \mathcal{H}(P_{01} \mid R_{01}^{\varepsilon}) +\int\mathcal{H}(P^{xy} \mid R^{\varepsilon,xy})\,\, \mathrm{d} P_{01}(x,y)$
, which is minimized by taking
$P^{xy} = R^{\varepsilon,xy}$
and
$P_{01}=\pi_\varepsilon$
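, i.e.,
\begin{equation} P^{\varepsilon} = \int R^{\varepsilon,xy} \, \, \mathrm{d}\pi_{\varepsilon}(x,y). \end{equation}
Alternatively, setting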
$\bar{R}^{\varepsilon} = \int R^{\varepsilon,xy} \, \, \mathrm{d}(\mu_0 \otimes \mu_1)$
, which is a
$(\mu_0 \otimes \mu_1)$
-mixture of Brownian bridges,
$P^\varepsilon$
has a density w.r.t.
$\bar{R}^\varepsilon$
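given by
\begin{equation} \frac{\, \mathrm{d} P^{\varepsilon}}{\, \mathrm{d}\bar{R}^{\varepsilon}}(\omega) = \mathrm{e}^{-\phi_{\varepsilon} \circ e_{01}(\omega)/\varepsilon}, \end{equation}
where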
$\phi_{\varepsilon}\colon \mathcal{X} \times \mathcal{Y} \rightarrow \mathbb{R}$
is a function defined by
$\phi_{\varepsilon}(x,y) = c(x,y) - \varphi_{\varepsilon}(x) - \psi_{\varepsilon}(y)$
. To see this, for
$X = (X(t))_{t \in [0,1]} \sim \bar{R}^{\varepsilon}$
and every Borel set
$A \subset E$
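,
\begin{align*} \mathbb{E}\big[\mathbf{1}_A(X)\,\mathrm{e}^{-\phi_{\varepsilon}(X(0),X(1))/\varepsilon}\big] = \int R^{\varepsilon,xy}(A)\,\mathrm{e}^{-\phi_{\varepsilon}(x,y)/\varepsilon} \, \, \mathrm{d}(\mu_0 \otimes \mu_1)(x,y) = \int R^{\varepsilon,xy}(A) \, \, \mathrm{d}\pi_{\varepsilon}(x,y) = P^{\varepsilon}(A), \end{align*}
where we used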
$(X(0),X(1)) \sim \mu_0 \otimes \mu_1$
.
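The representation of $P^{\varepsilon}$ as a $\pi_\varepsilon$-mixture of Brownian bridges suggests a two-step sampling scheme: draw endpoints $(x,y)$ from the coupling, then draw a Brownian bridge from x to y with noise level $\varepsilon$, which can be realized as $\sigma^{xy}(t) + \sqrt{\varepsilon}(W(t) - tW(1))$. The sketch below implements this on a time grid in dimension one. The discrete coupling `plan` is a hypothetical stand-in for $\pi_\varepsilon$ (it is not computed from the marginals here), and the function names, grid size, and seed are illustrative assumptions.

```python
import math
import random

def brownian_bridge(x, y, eps, n, rng):
    """Sample sigma^{xy}(t) + sqrt(eps) * (W(t) - t W(1)) at t = i / n, in dimension one."""
    w = [0.0]
    for _ in range(n):
        # Brownian increments over steps of length 1 / n.
        w.append(w[-1] + rng.gauss(0.0, math.sqrt(1.0 / n)))
    return [(1 - i / n) * x + (i / n) * y + math.sqrt(eps) * (w[i] - (i / n) * w[n])
            for i in range(n + 1)]

def sample_bridge_mixture(plan, xs, ys, eps, n, rng):
    """Draw (x, y) from the discrete coupling `plan`, then a bridge from x to y."""
    pairs = [(x, y) for x in xs for y in ys]
    weights = [plan[i][j] for i in range(len(xs)) for j in range(len(ys))]
    x, y = rng.choices(pairs, weights=weights, k=1)[0]
    return brownian_bridge(x, y, eps, n, rng)

rng = random.Random(0)
# Hypothetical coupling of two-point marginals, standing in for pi_eps.
xs, ys = [0.0, 1.0], [1.0, 2.0]
plan = [[0.5, 0.0], [0.0, 0.5]]
path = sample_bridge_mixture(plan, xs, ys, eps=0.01, n=100, rng=rng)
```

Each sampled `path` starts and ends at a pair of points charged by `plan`, and for small `eps` it stays close to the straight line between them.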
Remark 2. (On Assumption 1.) Assumption 1 is unavoidable to ensure the problem in (2) has a unique optimal solution. On the other hand, the initial distribution
$\mu_0$
need not be absolutely continuous, e.g.
$\mu_0$
can be discrete.
Remark 3. (Connection to Föllmer processes.) The Schrödinger bridge
$P^{\varepsilon}$
corresponds to the law of a weak solution to a certain SDE, the special case of which is often referred to as the Föllmer process. Let
$\mathcal{B}(E)$
be the Borel
$\sigma$
-field on E. Equip
$(E,\mathcal{B}(E),R^{\varepsilon})$
with the canonical filtration (augmented, if necessary), and denote by
$X=(X(t))_{t \in [0,1]}$
the canonical process, i.e.
$X(t,\omega) = \omega(t)$
for
$\omega = (\omega(t))_{t \in [0,1]} \in E$
. Under
$R^\varepsilon$
,
$W = \varepsilon^{-1/2} (X-X(0))$
is a standard Brownian motion starting at 0. Recalling that
$\rho=\, \mathrm{d}\mu_1/\, \mathrm{d} y$
, we set
$\tilde{\psi}_\varepsilon(y) \,:\!=\, \varepsilon((d/2)\log(2\pi\varepsilon)+\log\rho(y)) + \psi_{\varepsilon}(y)$
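. With this notation, it can be seen that
\begin{align*} \frac{\, \mathrm{d}\pi_{\varepsilon}}{\, \mathrm{d} R_{01}^{\varepsilon}}(x,y) = \exp\bigg\{\frac{\varphi_{\varepsilon}(x) + \tilde{\psi}_{\varepsilon}(y)}{\varepsilon}\bigg\}, \end{align*}
which implies (cf. the preceding argument) that
\begin{align*} \frac{\, \mathrm{d} P^{\varepsilon}}{\, \mathrm{d} R^{\varepsilon}}(\omega) = \exp\bigg\{\frac{\varphi_{\varepsilon}(\omega(0)) + \tilde{\psi}_{\varepsilon}(\omega(1))}{\varepsilon}\bigg\}. \end{align*}
We write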
\begin{align*} \mathfrak{h}_\varepsilon (t,y) \,:\!=\, \begin{cases} (2\pi\varepsilon(1-t))^{-d/2}\displaystyle\int \exp\bigg\{{-}\dfrac{1}{\varepsilon}\bigg(\frac{c(y,y')}{1-t}-\tilde{\psi}_{\varepsilon}(y')\bigg)\bigg\}\, \, \mathrm{d} y' & \text{if} \ t \in [0,1), \\ \mathrm{e}^{\tilde{\psi}_{\varepsilon}(y)/\varepsilon} & \text{if} \ t = 1, \end{cases} \end{align*}
which satisfies
$(\partial_t + \varepsilon\Delta_y/2)\mathfrak{h}_\varepsilon = 0$
under regularity conditions (cf. the heat equation). Applying Itô’s formula (cf. [Reference Karatzas and Shreve24, Theorem 3.3.6]), we have
\begin{align*} \log\mathfrak{h}_{\varepsilon}(1,X(1)) = \underbrace{\log\mathfrak{h}_{\varepsilon}(0,X(0))}_{=-\varphi_{\varepsilon}(X(0))/\varepsilon} + \frac{1}{\sqrt{\varepsilon}\,}\int_0^1 b_\varepsilon(t,X(t))\cdot\,\, \mathrm{d} W(t) - \frac{1}{2\varepsilon}\int_0^1|b_\varepsilon(t,X(t))|^2\,\, \mathrm{d} t, \end{align*}
where we define
$b_\varepsilon (t,y) = \varepsilon \nabla_y \log \mathfrak{h}_\varepsilon (t,y)$
. We conclude that,
$R^\varepsilon$
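-a.s.,
\begin{align*} \frac{\, \mathrm{d} P^{\varepsilon}}{\, \mathrm{d} R^{\varepsilon}} = \exp\bigg\{\frac{1}{\sqrt{\varepsilon}\,}\int_0^1 b_\varepsilon(t,X(t))\cdot\,\, \mathrm{d} W(t) - \frac{1}{2\varepsilon}\int_0^1|b_\varepsilon(t,X(t))|^2\,\, \mathrm{d} t\bigg\}. \end{align*}
By Girsanov’s theorem, under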
$P^{\varepsilon}$
the new process
$\tilde{W} = W - \varepsilon^{-1/2}\int_0^\cdot b_\varepsilon(t,X(t))\,\, \mathrm{d} t$
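is a standard Brownian motion, and the process X solves the SDE
\begin{equation} \, \mathrm{d} X(t) = b_\varepsilon(t,X(t))\,\, \mathrm{d} t + \sqrt{\varepsilon}\,\, \mathrm{d}\tilde{W}(t), \quad X(0) \sim \mu_0 \end{equation}
(see, e.g., [Reference Karatzas and Shreve24, Proposition 5.3.6]; see also [Reference Dai Pra15]). When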
$\varepsilon=1$
and
$\mu_0 = \delta_0$
, we can take
$\varphi_{1}(x) = 0$
and
$\psi_{1}(y) = |y|^2/2$
, so
$\mathrm{e}^{\tilde{\psi}_{1}}$
is the density of
$\mu_1$
w.r.t. the standard Gaussian. Hence, the SDE in (9) corresponds to the Föllmer process in [Reference Lehec25, (12)] and [Reference Mikulincer and Shenfeld33]. Abusing terminology, we call
$P^{\varepsilon}$
with
$\varepsilon > 0$
and
$\mu_0 = \delta_0$
a (perturbed) Föllmer process.
2.2. OT potentials
The rate function for Schrödinger bridges involves OT potentials. For duality theory of OT, we refer the reader to [Reference Ambrosio, Gigli and Savare2, Reference Santambrogio41, Reference Villani47]. The OT problem (3) admits a dual problem that reads as
\begin{equation} \max_{\substack{(\varphi,\psi) \in L^1(\mu_0) \times L^1(\mu_1) \\ \varphi + \psi \le c}} \int\varphi\,\, \mathrm{d}\mu_0 + \int\psi\,\, \mathrm{d}\mu_1. \end{equation}
By restricting to the respective support, it is assumed without loss of generality that
$\varphi$
and
$\psi$
are functions defined on
$\mathcal{X}$
and
$\mathcal{Y}$
, respectively. One of
$\varphi$
and
$\psi$
can be replaced with the c-transform of the other. Recall that the c-transform of
$\psi\colon \mathcal{Y} \rightarrow [\!-\infty,\infty)$
with
$\psi \not \equiv -\infty$
is a function
$\psi^c\colon \mathcal{X} \rightarrow [\!-\infty,\infty)$
defined by
$\psi^c (x) \,:\!=\, \inf_{y \in \mathcal{Y}} \{ c(x,y) - \psi (y) \}$
,
$x \in \mathcal{X}$
. The c-transform of
$\varphi\colon \mathcal{X} \rightarrow [\!-\infty,\infty)$
with
$\varphi \not \equiv -\infty$
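is defined analogously. The dual problem (10) then reduces to
\begin{equation} \max_{\psi \in L^1(\mu_1)} \int\psi^c\,\, \mathrm{d}\mu_0 + \int\psi\,\, \mathrm{d}\mu_1, \end{equation}
whose maximum is attained at some c-concave function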
$\psi \in L^1(\mu_1)$
with
$\psi^c \in L^1(\mu_0)$
(a function on
$\mathcal{Y}$
is called c-concave if it is the c-transform of a function on
$\mathcal{X}$
); see, e.g., [Reference Villani47, Theorem 5.9] or [Reference Ambrosio, Gigli and Savare2, Theorem 6.1.5]. We call such a
$\psi$
an OT potential from
$\mu_1$
to
$\mu_0$
. An OT potential from
$\mu_0$
to
$\mu_1$
is defined analogously.
For any OT potential
$\psi$
and any OT plan
$\pi$
, the support of
$\pi$
is contained in the c-superdifferential
$\partial^c \psi$
of
$\psi$
,
$\partial^c\psi \,:\!=\, \{(x,y) \colon \psi^c(x) + \psi(y) = c(x,y)\}$
. Indeed,
$\partial^c \psi$
is a closed set (as c-concave functions are upper semicontinuous) on which
$\pi$
has full measure by duality, so
$\mathrm{spt}(\pi)\subset \partial^c \psi$
. In particular, for
$(x,y) \in \mathrm{spt}(\pi)$
,
$\psi^c(x)$
and
$\psi(y)$
are finite.
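The c-transform and the inequality $\psi^c(x) + \psi(y) \le c(x,y)$ can be checked numerically on a grid. The sketch below uses the one-dimensional choice $\psi(y) = my - m^2/2$, which is an assumption made for illustration; a direct computation gives $\psi^c(x) = -mx$, with equality $\psi^c(x) + \psi(y) = c(x,y)$ exactly when $y = x + m$. The grid, the value of `m`, and the helper name `c_transform` are likewise hypothetical, and the grid minimum is only an approximation of the infimum over the whole line.

```python
def c_transform(psi, ys):
    """Return x -> min over the grid ys of { |x - y|^2 / 2 - psi(y) },
    a grid stand-in for the c-transform psi^c."""
    return lambda x: min(0.5 * (x - y) ** 2 - psi(y) for y in ys)

m = 1.5

def psi(y):
    # Candidate potential psi(y) = m y - m^2 / 2 (assumption of this sketch).
    return m * y - 0.5 * m * m

ys = [-10.0 + 0.01 * j for j in range(2001)]  # grid on [-10, 10]
psi_c = c_transform(psi, ys)

x = 0.5
# Slack c(x, y) - psi^c(x) - psi(y) at a few grid points y; it should be nonnegative.
slacks = [0.5 * (x - y) ** 2 - psi_c(x) - psi(y) for y in ys[::100]]
# Slack at the matched point y = x + m, where it should nearly vanish.
slack_on_graph = 0.5 * (x - (x + m)) ** 2 - psi_c(x) - psi(x + m)
```

In this example the set on which the slack vanishes is the line $\{(x, x+m)\}$, which is the c-superdifferential $\partial^c\psi$ for this particular $\psi$.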
Observe that Assumption 1 ensures that the OT problem (3) admits a unique OT plan
$\pi_{\mathrm{o}}$
. Let
$\mathcal{X}_{\mathrm{o}}$
and
$\mathcal{Y}_{\mathrm{o}}$
denote the projections of
$\mathrm{spt}(\pi_{\mathrm{o}})$
onto
$\mathcal{X}$
and
$\mathcal{Y}$
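, respectively, i.e.,
\begin{align*} \mathcal{X}_{\mathrm{o}} \,:\!=\, \{ x \in \mathcal{X} \colon (x,y) \in \mathrm{spt}(\pi_{\mathrm{o}}) \ \text{for some} \ y \in \mathcal{Y} \}, \end{align*}
and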
$\mathcal{Y}_{\mathrm{o}}$
is defined analogously. As
$\pi_{\mathrm{o}}$
is a coupling for
$\mu_0$
and
$\mu_1$
, the sets
$\mathcal{X}_{\mathrm{o}}$
and
$\mathcal{Y}_{\mathrm{o}}$
have full
$\mu_0$
- and
$\mu_1$
-measure, respectively. As in [Reference Bernton, Ghosal and Nutz3], we assume uniqueness of OT potentials (from
$\mu_1$
to
$\mu_0$
) on
$\mathcal{Y}_{\mathrm{o}}$
to derive our large-deviation results.
Assumption 2. The dual problem (11) admits a unique OT potential
$\psi$
on
$\mathcal{Y}_{\mathrm{o}}$
, i.e. if
$\tilde{\psi}$
is another OT potential, then
$\psi - \tilde{\psi}$
is constant on
$\mathcal{Y}_{\mathrm{o}}$
.
Appendix B in [Reference Bernton, Ghosal and Nutz3] and [Reference Staudt, Hundrieser and Munk42] provide various sufficient conditions under which uniqueness of OT potential holds. For example, Assumption 2 holds under each of the following cases:
(A)
$\mathcal{X}$
and
$\mathcal{Y}$
are compact, and one of them agrees with the closure of a connected open set [Reference Karatzas and Shreve24, Theorem 7.18].
(B) The interior
$\mathrm{int}(\mathcal{Y})$
is connected,
$\mu_1$
is absolutely continuous with positive Lebesgue density on
$\mathrm{int}(\mathcal{Y})$
, and
$\mu_1(\partial\mathcal{Y})=0$
[Reference Bernton, Ghosal and Nutz3, Proposition B.2].
Case (A) does not require
$\mu_0$
or
$\mu_1$
to have a Lebesgue density (although Assumption 1 requires
$\mu_1 \ll \, \mathrm{d} y$
). We provide a self-contained proof of Case (A) in Lemma 3 for completeness. Case (B) imposes no restrictions on
$\mu_0$
, so it allows
$\mu_0$
to be discrete.
Remark 4. Often, regularity conditions are imposed on the input measure
$\mu_0$
to ensure uniqueness or regularity of OT potentials from
$\mu_0$
to
$\mu_1$
. For the (static) EOT case, the role of
$\mu_0$
and
$\mu_1$
is symmetric, so it is possible, without loss of generality, to focus on the forward (
$\mu_0 \to \mu_1$
) case. However, in our dynamical setting, the roles of
$\mu_0$
and
$\mu_1$
are asymmetric because of Assumption 1. Since Assumption 1 already imposes absolute continuity on
$\mu_1$
, we treat OT potentials for the backward direction (
$\mu_1 \to \mu_0$
), contrary to the convention in the literature.
3. Main results
We first recall the weak convergence of
$P^\varepsilon$
toward
$P^{\mathrm{o}} = \int\delta_{\sigma^{xy}}\,\, \mathrm{d}\pi_{\mathrm{o}}(x,y)$
with
$\sigma^{xy}(t) = (1-t)x+ty$
. Recall that the cost function is
$c(x,y) = |x-y|^2/2$
.
Proposition 1. Under Assumption 1,
$P^\varepsilon \to P^{\mathrm{o}}$
weakly as
$\varepsilon \downarrow 0$
. The support of
$P^{\mathrm{o}}$
agrees with
$\Sigma_{\pi_{\mathrm{o}}} \,:\!=\, \{\sigma^{xy} \colon (x,y) \in \mathrm{spt}(\pi_{\mathrm{o}})\}$
.
Remark 5. (On Proposition 1.) A version of this proposition was proved in [Reference Mikami31] under the extra assumption that
$\mu_0$
is absolutely continuous. [Reference Léonard26, Theorem 3.7] implies the proposition but the proof is somewhat involved (as it covers more general settings). We provide a simple proof in Section 4.
We are now in a position to state our main results. Let H denote the space of absolutely continuous maps
$h\colon [0,1] \to \mathbb{R}^d$
with
$\int_0^1 |\dot{h}(t)|^2\,\, \mathrm{d} t < \infty$
, where
$\dot{h}(t) = \, \mathrm{d} h(t)/\, \mathrm{d} t$
. We endow H with the (semi-)inner product
$(g,h)_H = \int_0^1\dot{g}(t)\cdot\dot{h}(t)\,\, \mathrm{d} t$
. Set
$\| \cdot \|_{H} = \sqrt{(\cdot,\cdot )_H}$
. Formally, define
$\| h \|_{H} = \infty$
for
$h \in E \setminus H$
. We first state the weak-type LDP for Schrödinger bridges, which allows for marginals with unbounded supports.
Theorem 1. (Weak-type LDP for Schrödinger bridges.) Suppose Assumptions 1 and 2 hold. Pick any
$\varepsilon_k \downarrow 0$
. Then the following hold:
(i) For every open set
$A \subset e_{01}^{-1}(\mathcal{X}_{\mathrm{o}} \times \mathcal{Y}_{\mathrm{o}})$
(w.r.t. the relative topology),
\begin{align*} \liminf_{k \to \infty} \varepsilon_k \log P^{\varepsilon_k} (A) \ge - \inf_{h \in A} I(h) \end{align*}
for the rate function
$I(h) = {\|h\|_H^2}/{2} - \psi^c(h(0)) - \psi(h(1))$
.
(ii) For every closed set
$A \subset E$
of the form
$A = e_{01}^{-1}(C)$
for some compact set
$C \subset \mathcal{X}_{\mathrm{o}} \times \mathcal{Y}_{\mathrm{o}}$
,
\begin{equation*} \limsup_{k \to \infty} \varepsilon_{k}\log P^{\varepsilon_k} (A) \le -\inf_{h \in A} I(h). \end{equation*}
Theorem 1 is not precisely a weak LDP since (ii) holds for every compact set
$C \subset e_{01}^{-1}(\mathcal{X}_{\mathrm{o}} \times \mathcal{Y}_{\mathrm{o}})$
but also for some noncompact closed sets. As such, we call Theorem 1 a weak-type LDP. If the marginals have compact supports, then a full LDP holds, subject to a technical condition that guarantees the uniqueness of OT potentials.
Corollary 1. (Full LDP for Schrödinger bridges.) Suppose Assumption 1 holds. Pick any
$\varepsilon_k \downarrow 0$
. If
$\mathcal{X}$
and
$\mathcal{Y}$
are compact and one of them agrees with the closure of a connected open set, then the sequence
$\{ P^{\varepsilon_k} \}_{k \in \mathbb{N}}$
satisfies a (full) LDP on E with speed
$\varepsilon_k^{-1}$
and good rate function I, where I is set to
$\infty$
outside
$e_{01}^{-1}(\mathcal{X} \times \mathcal{Y})$
.
We leave several remarks on the preceding results.
Remark 6. (On Corollary 1.) The sets
$\mathcal{X}, \mathcal{Y}$
being compact implies
$\mathcal{X}_{\mathrm{o}} = \mathcal{X}$
and
$\mathcal{Y}_{\mathrm{o}} = \mathcal{Y}$
, as the projections from
$\mathcal{X} \times \mathcal{Y}$
onto
$\mathcal{X}$
and
$\mathcal{Y}$
are then closed maps. The assumption of Corollary 1 guarantees the uniqueness of OT potentials; see the discussion after Assumption 2. Since
$P^{\varepsilon}$
charges no mass outside
$e_{01}^{-1}(\mathcal{X} \times \mathcal{Y})$
, the full LDP is indeed deduced from the preceding theorem. Connectedness of the support of one of the marginals is essential for the uniqueness of OT potentials (see the discussion before Proposition B.2 in [Reference Bernton, Ghosal and Nutz3]). The full LDP (more specifically, establishing exponential tightness) for Schrödinger bridges requires both marginals to be compactly supported, since exponential tightness implies the limiting law
$P^{\mathrm{o}}$
is concentrated on a compact set, which fails to hold if one of the marginals has unbounded support. See [Reference Nutz and Wiesel36, Remark 4.2(b)] for a relevant discussion in the static case.
Remark 7. (On the rate function I(h).) Since
$\psi^c(x) + \psi(y) \le c(x,y)$
by construction, the rate function I(h) is positive as soon as
$h \ne \sigma^{h(0),h(1)}$
. Even when
$h =\sigma^{h(0),h(1)}$
, which entails
$\|h\|_H^2/2=c(h(0),h(1))$
, the rate function
$I(h) = c(h(0),h(1)) - \psi^c (h(0)) - \psi(h(1))$
can be positive provided
$(h(0),h(1)) \notin \mathrm{spt}(\pi_{\mathrm{o}})$
. [Reference Bernton, Ghosal and Nutz3, Section 5] provides several conditions under which the rate function for the static case,
$\phi (x,y) = c(x,y) - \psi^c(x) - \psi(y)$
, is positive outside
$\mathrm{spt}(\pi_{\mathrm{o}})$
. Considering the characterization of the support of
$P^{\mathrm{o}}$
, our large-deviation results essentially imply that the Schrödinger bridges
$P^{\varepsilon}$
charge exponentially small masses outside
$\mathrm{spt}(P^{\mathrm{o}})$
when
$\varepsilon \downarrow 0$
.
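A one-dimensional worked example may help fix ideas. Take $\mu_0 = N(0,1)$ and $\mu_1 = N(m,1)$, a choice made here purely for illustration. The OT plan is then concentrated on $\{(x,x+m)\}$, and one can check directly that $\psi(y) = my - m^2/2$ is an OT potential from $\mu_1$ to $\mu_0$ with $\psi^c(x) = -mx$, so that $\phi(x,y) = (y-x-m)^2/2$. The sketch below evaluates the resulting rate function on discretized paths; the function name `rate_function`, the grid, and the three test paths are assumptions of the sketch.

```python
import math

def rate_function(path, m):
    """I(h) = ||h||_H^2 / 2 - psi^c(h(0)) - psi(h(1)) for mu0 = N(0,1), mu1 = N(m,1),
    with h given by its values at the times i / n (dimension one)."""
    n = len(path) - 1
    # Forward-difference approximation of (1/2) * int_0^1 |h'(t)|^2 dt.
    energy = 0.5 * n * sum((b - a) ** 2 for a, b in zip(path, path[1:]))
    psi_c = -m * path[0]              # psi^c(x) = -m x
    psi = m * path[-1] - 0.5 * m * m  # psi(y) = m y - m^2 / 2
    return energy - psi_c - psi

m, n = 1.5, 100
# Straight line from x = 0.3 to x + m: endpoints in spt(pi_o).
optimal = [0.3 + m * i / n for i in range(n + 1)]
# Constant path at x = 0.3: a straight line whose endpoints are not in spt(pi_o).
constant = [0.3 for _ in range(n + 1)]
# Same endpoints as `optimal`, but not a straight line.
wiggly = [p + 0.2 * math.sin(math.pi * i / n) for i, p in enumerate(optimal)]

i_optimal = rate_function(optimal, m)
i_constant = rate_function(constant, m)
i_wiggly = rate_function(wiggly, m)
```

Up to rounding, `i_optimal` vanishes, `i_constant` equals $m^2/2$, and `i_wiggly` is strictly positive, matching the two ways in which the rate function can be positive that are described in this remark.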
Remark 8. (Proofs of Theorem 1 and Corollary 1.) The proof of Theorem 1 uses the expression
$P^{\varepsilon}(A) = \int R^{\varepsilon,xy}(A)\,\, \mathrm{d}\pi_{\varepsilon}(x,y)$
from (7). The main ingredient is the exponential continuity of
$\{ R^{\varepsilon_k,xy} \}$
, i.e. establishing large-deviation upper and lower bounds for
$\{R^{\varepsilon_k,(x_k,y_k)}\}_{k \in \mathbb{N}}$
when
$(x_k,y_k) \to (x,y)$
, which will be proved in Proposition 4. The proof then directly evaluates
$P^\varepsilon (A)$
by combining the large-deviation results for the static case from [Reference Bernton, Ghosal and Nutz3]. As noted in Remark 6, Corollary 1 is a special case of Theorem 1. Nonetheless, we provide a separate, more direct proof for the compact support case. It relies on the expression
$P^{\varepsilon}(A) = \int_{A}\mathrm{e}^{-\phi_{\varepsilon} \circ e_{01}(\omega)/\varepsilon}\,\, \mathrm{d}\bar{R}^{\varepsilon}(\omega)$
from (8). Then the proof proceeds by (i) proving an LDP for
$\bar{R}^{\varepsilon}$
, which follows directly from the exponential continuity [Reference Dinwoodie and Zabell18], and then (ii) adapting the (Laplace–)Varadhan lemma (cf. [Reference Dembo and Zeitouni17, Theorem 4.4.2]) to evaluate
$P^{\varepsilon}(A)$
. Step (ii) is relatively simple: although the function
$\phi_{\varepsilon}$
depends on
$\varepsilon$
, so that the Varadhan lemma is not directly applicable, the assumption of Corollary 1 ensures uniform convergence of the EOT potentials.
Remark 9. (On uniqueness of OT potentials.) Inspection of the proof of Corollary 1 reveals that, as long as Assumption 1 holds and
$\mathcal{X}$
and
$\mathcal{Y}$
are compact (but without assuming uniqueness of OT potentials), the conclusion of Corollary 1 continues to hold, provided that
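\begin{equation} \varphi_{\varepsilon_k} \to \bar{\varphi} \ \ \text{uniformly on $\mathcal{X}$} \quad \text{and} \quad \psi_{\varepsilon_k} \to \bar{\psi} \ \ \text{uniformly on $\mathcal{Y}$} \tag{12} \end{equation}
for some (continuous) functions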
$\bar{\varphi}$
and
$\bar{\psi}$
on
$\mathcal{X}$
and
$\mathcal{Y}$
, respectively (necessarily,
$(\bar{\varphi},\bar{\psi})$
are dual potentials for
$(\mu_0,\mu_1)$
). The rate function I needs to be modified so that
$(\psi^c,\psi)$
are replaced with
$(\bar{\varphi},\bar{\psi})$
. A similar comment applies to Proposition 3. Conversely, the uniform convergence of the EOT potentials in (12) is necessary for the LDP for the Schrödinger bridges
$\{ P^{\varepsilon_k} \}_{k \in \mathbb{N}}$
to hold, by [Reference Nutz and Wiesel36, Proposition 4.5] in the static case and the contraction principle.
We now look at a few special cases.
Example 1. (Föllmer process.) When
$\mu_0 = \delta_0$
, we have
$\mathcal{Y}_{\mathrm{o}} = \mathcal{Y}$
and
$\psi(y) = |y|^2/2$
, so the rate function reduces to
$I(h) = \| h \|_H^2/2 - |h(1)|^2/2 + \iota_{\{ 0 \}}(h(0)) + \iota_{\mathcal{Y}}(h(1))$
, which vanishes if and only if
$h(t) = ty$
for
$t \in [0,1]$
for some
$y \in \mathcal{Y}$
, i.e. if and only if
$h \in \mathrm{spt}(P^{\mathrm{o}})$
.
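The rate function of Example 1 can be checked numerically. The following is a minimal Python sketch, assuming that $\|h\|_H^2 = \int_0^1 |\dot{h}(t)|^2\,\mathrm{d}t$ (an assumption consistent with $\|h\|_H^2/2 = c(h(0),h(1))$ for straight lines, but not restated in this section). The integral is approximated by finite differences on a uniform grid, the indicator terms are omitted, and the helper names `cameron_martin_energy` and `follmer_rate` are hypothetical.

```python
import math

def cameron_martin_energy(path):
    """Finite-difference approximation of int_0^1 |h'(t)|^2 dt for a path
    sampled on a uniform grid of [0, 1]; each point is a list of floats."""
    n = len(path) - 1
    energy = 0.0
    for a, b in zip(path, path[1:]):
        # |h(t_{i+1}) - h(t_i)|^2 / (1/n)
        energy += sum((bj - aj) ** 2 for aj, bj in zip(a, b)) * n
    return energy

def follmer_rate(path):
    """I(h) = ||h||_H^2 / 2 - |h(1)|^2 / 2 for a path with h(0) = 0
    (the indicator terms of Example 1 are not evaluated here)."""
    end = path[-1]
    return 0.5 * cameron_martin_energy(path) - 0.5 * sum(c * c for c in end)

n = 200
y = [1.0, -2.0]
# Straight line h(t) = t * y, which lies in the support of the limit law.
straight = [[(i / n) * yj for yj in y] for i in range(n + 1)]
# Same endpoints, but with a sinusoidal detour added to every coordinate.
bent = [[(i / n) * yj + 0.3 * math.sin(math.pi * i / n) for yj in y]
        for i in range(n + 1)]

print("straight:", follmer_rate(straight))
print("bent:    ", follmer_rate(bent))
```

In this sketch the straight line gives a value that is zero up to rounding, while the detour is penalized by a strictly positive amount, in line with the characterization of the zero set of I above.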
Example 2. (Two-point marginal.) The LDP in Corollary 1 directly yields an LDP for
$P^{\varepsilon_k} \circ f^{-1}$
for any continuous function f from E into another metric space by the contraction principle (cf. [Reference Dembo and Zeitouni17, Theorem 4.2.1]). We consider the case where
$f=e_{st}$
for
$0 \le s < t \le 1$
. Note that
$P_{st}^{\varepsilon_k}$
is a coupling for
$P_{s}^{\varepsilon_k}$
and
$P_{t}^{\varepsilon_k}$
. Recall that the marginal flow
$(P_t^{\varepsilon})_{t \in [0,1]}$
is called an entropic interpolation, and its limiting analog
$(P_t^{\mathrm{o}})_{t \in [0,1]}$
is a displacement interpolation connecting
$\mu_0$
and
$\mu_1$
. To characterize the rate function for
$P_{st}^{\varepsilon_k}$
, we need additional notation.
For a function
$f\colon \mathbb{R}^d \to (-\infty,\infty]$
and
$t \ge 0$
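, define
\begin{align*} \mathcal{Q}_t f(x) = \inf_{y \in \mathbb{R}^d}\bigg\{f(y) + \frac{|x-y|^2}{2t}\bigg\} \quad \text{for $t > 0$}, \qquad \mathcal{Q}_0 f = f. \end{align*}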
The family of operators
$\{ \mathcal{Q}_t \}_{t \ge 0}$
is called the Hopf–Lax semigroup; cf. [Reference Villani47, Chapter 7]. Assuming Case (A) after Assumption 2, we set
$\varphi = \psi^c$
and extend
$\varphi$
and
$\psi$
to the whole
$\mathbb{R}^d$
by setting
$\varphi=-\infty$
and
$\psi=-\infty$
outside
$\mathcal{X}$
and
$\mathcal{Y}$
, respectively. For
$0 \le s < t \le 1$
, consider the rescaled cost
$c^{s,t}(x,y) = c(x,y)/(t-s)$
.
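To make the Hopf–Lax operator concrete, here is a minimal Python sketch on a one-dimensional grid. It assumes the standard definition $\mathcal{Q}_t f(x) = \inf_{y}\{f(y) + |x-y|^2/(2t)\}$ for $t>0$ with $\mathcal{Q}_0 f = f$, and replaces the infimum by a minimum over grid points; the function name `hopf_lax` and the test function are hypothetical choices made for illustration.

```python
def hopf_lax(f_vals, grid, t):
    """Discrete Hopf-Lax operator on a 1D grid:
    (Q_t f)(x) = min over grid points y of f(y) + (x - y)^2 / (2 t),
    with Q_0 f = f. f_vals[j] is f(grid[j]); float('inf') encodes f = +infinity,
    which is how a function extended by +infinity outside a set can be stored."""
    if t == 0:
        return list(f_vals)
    return [min(fy + (x - y) ** 2 / (2.0 * t) for fy, y in zip(f_vals, grid))
            for x in grid]

# Uniform grid on [-3, 3] with spacing 0.01.
grid = [-3.0 + 0.01 * i for i in range(601)]
# Test function f(y) = y^2 / 2, for which Q_t f(x) = x^2 / (2 (1 + t)) in closed form.
f = [0.5 * y * y for y in grid]
t = 0.5
q = hopf_lax(f, grid, t)

i = 350  # grid point x = 0.5
print(grid[i], q[i], grid[i] ** 2 / (2.0 * (1.0 + t)))
```

The brute-force minimum costs a number of operations quadratic in the grid size, which is acceptable for a sanity check; the discrete values agree with the closed form up to the grid resolution.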
Proposition 2. (LDP for two-point marginal.) Suppose Assumption 1 holds. Pick any
$0 \le s < t \le 1$
and
$\varepsilon_k \downarrow 0$
. If
$\mathcal{X}$
and
$\mathcal{Y}$
are compact and one of them agrees with the closure of a connected open set, then the sequence
$\{ P^{\varepsilon_k}_{st} \}_{k \in \mathbb{N}}$
satisfies an LDP on
$\mathbb{R}^{2d}$
with speed
$\varepsilon_k^{-1}$
and good rate function
$I_{st}(x,y) = c^{s,t}(x,y) - \varphi_s(x) - \psi_t(y)$
, where
$(\varphi_s,\psi_t)\,:\!=\,(\!-\mathcal{Q}_s (\!-\varphi),-\mathcal{Q}_{1-t}(\!-\psi))$
are dual potentials for
$(P_s^{\mathrm{o}},P_t^{\mathrm{o}})$
w.r.t.
$c^{s,t}$
, i.e. optimal solutions to (10) with
$(\mu_0,\mu_1,c)$
replaced by
$(P_s^{\mathrm{o}},P_t^{\mathrm{o}},c^{s,t})$
.
Finally, we point out that the direct proof for Corollary 1 can be easily adapted to cover the dynamical Schrödinger problem with Langevin diffusion as a reference measure.
Remark 10. (Langevin diffusion as reference measure.) For a bounded smooth potential
$V\colon \mathbb{R}^d \to \mathbb{R}$
with bounded derivatives, consider the Langevin diffusion
$X=(X(t))_{t \ge 0}$
defined by the unique (strong) solution to the SDE
$\, \mathrm{d} X(t) = -\nabla V(X(t))\,\, \mathrm{d} t + \, \mathrm{d} W(t)$
,
$X(0) \sim \mu_0$
, where
$(W(t))_{t \ge 0}$
is a standard Brownian motion starting at 0 independent of X(0). Let
$p_{t}(x,y)$
denote the transition density of the Langevin diffusion X and
$\check{R}^\varepsilon$
be the law of
$X^{\varepsilon} \,:\!=\, (X(\varepsilon t))_{t \in [0,1]}$
defined on
$\mathcal{B}(E)$
. Instead of the Wiener reference measure as in (1), we consider the dynamical Schrödinger problem with reference measure
$\check{R}^{\varepsilon}$
:
Under Assumption 1, arguing as in Section 2 (see also [Reference Léonard27, Proposition 2.3]), we can see that the unique optimal solution to (13) is given by
where
$\check{R}^{\varepsilon,xy}$
is the conditional law of
$X^\varepsilon$
given
$(X^\varepsilon (0),X^{\varepsilon}(1))=(x,y)$
and
$\check{\pi}^\varepsilon$
is the unique optimal solution to the static EOT problem
with
$c_{\varepsilon}(x,y)\,:\!=\,-\varepsilon \log p_\varepsilon (x,y)$
. Recall that the transition density
$p_t (x,y)$
is everywhere positive (cf. [Reference Stroock44, Chapter 4]) and the conditional laws
$\check{R}^{\varepsilon,xy}$
are defined for all
$(x,y) \in \mathbb{R}^{2d}$
[Reference Bravo and Chaumont6]. The classical Varadhan asymptotics implies that
$\lim_{\varepsilon \downarrow 0} c_\varepsilon (x,y) = |x-y|^2/2=c(x,y)$
(cf. [Reference Stroock44, Chapter 4]), so we can expect that the Schrödinger bridges
$\{ \check{P}^\varepsilon \}_{\varepsilon > 0}$
satisfy the LDP with the same rate function I as in the Brownian case. The next proposition confirms this under a similar setting to Corollary 1.
Proposition 3. (Full LDP for Schrödinger bridges: Langevin case.) Suppose Assumption 1 holds. Pick any
$\varepsilon_k \downarrow 0$
. If
$\mathcal{X}$
and
$\mathcal{Y}$
are compact and one of them agrees with the closure of a connected open set, then the sequence
$\{ \check{P}^{\varepsilon_k} \}_{k \in \mathbb{N}}$
satisfies a (full) LDP on E with speed
$\varepsilon_k^{-1}$
and good rate function I, where I is given in Corollary 1.
The condition on the potential V appears to be stronger than needed, but is imposed for the sake of simplicity. As announced, the proof follows similar arguments to the direct proof for Corollary 1. To establish exponential continuity for the Langevin bridge
$\check{R}^{\varepsilon,xy}$
, we use the explicit expression for the Radon–Nikodym density of the Langevin bridge against the Brownian bridge; cf. [Reference Levy and Krener28].
4. Proofs for Section 3
Recall that
$R^{\varepsilon,xy}$
is the (regular) conditional law of
$x+\sqrt{\varepsilon}W$
given
$x+\sqrt{\varepsilon}W(1) = y$
for a standard Brownian motion
$W=(W(t))_{t \in [0,1]}$
starting at 0. Alternatively,
$R^{\varepsilon,xy}$
can be characterized as the law of
$\sqrt{\varepsilon} W^\circ + \sigma^{xy}$
with
$W^\circ = (W(t)-tW(1))_{t \in [0,1]}$
a standard Brownian bridge. For simplicity of notation, let
$z = (x,y) \in \mathbb{R}^{2d}$
and write
$R^{\varepsilon,z} = R^{\varepsilon,xy}$
.
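The second characterization gives a direct way to simulate the bridge. The following minimal Python sketch samples $\sqrt{\varepsilon}W^\circ + \sigma^{xy}$ on a uniform grid, assuming that $\sigma^{xy}(t) = (1-t)x + ty$ is the straight line from x to y (our reading of the notation, which is defined outside this section). The function names are hypothetical.

```python
import math
import random

def sample_scaled_bridge(x, y, eps, n, rng):
    """Sample sqrt(eps) * W_bridge + sigma^{xy} on the grid t_i = i / n, where
    W_bridge(t) = W(t) - t W(1) and sigma^{xy}(t) = (1 - t) x + t y."""
    d = len(x)
    # Brownian motion on the grid: independent N(0, 1/n) increments per coordinate.
    w = [[0.0] * d]
    for _ in range(n):
        w.append([wj + rng.gauss(0.0, math.sqrt(1.0 / n)) for wj in w[-1]])
    path = []
    for i in range(n + 1):
        t = i / n
        path.append([
            math.sqrt(eps) * (w[i][j] - t * w[n][j]) + (1.0 - t) * x[j] + t * y[j]
            for j in range(d)
        ])
    return path

def sup_distance_to_line(path, x, y):
    """Largest Euclidean distance between the sampled path and sigma^{xy} on the grid."""
    n = len(path) - 1
    worst = 0.0
    for i, point in enumerate(path):
        t = i / n
        dist = math.sqrt(sum((pj - ((1.0 - t) * xj + t * yj)) ** 2
                             for pj, xj, yj in zip(point, x, y)))
        worst = max(worst, dist)
    return worst

x, y = [0.0, 1.0], [2.0, -1.0]
for eps in (1.0, 0.1, 0.01):
    # Reusing the seed fixes the driving noise, so only the scaling changes.
    path = sample_scaled_bridge(x, y, eps, 500, random.Random(0))
    print(eps, sup_distance_to_line(path, x, y))
```

With the noise held fixed, the distance to the straight line shrinks like $\sqrt{\varepsilon}$, which is the concentration that the large-deviation bounds below quantify on the exponential scale.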
4.1. Proof of Proposition 1
Proof of Proposition 1. By the uniqueness of the OT plan, we have
$\pi_{\varepsilon} \to \pi_{\mathrm{o}}$
weakly by [Reference Bernton, Ghosal and Nutz3, Proposition 3.2], which implies that
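\begin{align*} \eta_{\varepsilon} \,:\!=\, \sup_{g}\bigg|\int g\,\, \mathrm{d}(\pi_{\varepsilon}-\pi_{\mathrm{o}})\bigg| \to 0, \end{align*}
where the supremum is taken over all 1-Lipschitz functions
$g\colon \mathbb{R}^{2d} \to [\!-1,1]$
(see [Reference van der Vaart and Wellner46, Chapter 1.12]). Pick any 1-Lipschitz function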
$f\colon E \to [\!-1,1]$
. We have
\begin{align*} \int f\,\, \mathrm{d} P^{\varepsilon} = \int\bigg(\int f\,\, \mathrm{d} R^{\varepsilon,z}\bigg)\,\, \mathrm{d}\pi_{\varepsilon}(z) = \int\underbrace{\mathbb{E}[f(\sqrt{\varepsilon}W^{\circ} + \sigma^z)]}_{=:g_{\varepsilon}(z)}\,\, \mathrm{d}\pi_{\varepsilon}(z). \end{align*}
By construction,
$g_\varepsilon$
is bounded by 1,
$|g_\varepsilon(z) - g_\varepsilon(z')| \le \|\sigma^z - \sigma^{z'}\|_{E} \le 2|z-z'|$
, and
$\lim_{\varepsilon \downarrow 0} g_{\varepsilon}(z)= f(\sigma^z) = \int f\,\, \mathrm{d}\delta_{\sigma^z}$
. Hence,
\begin{align*} \int g_{\varepsilon}\,\, \mathrm{d}\pi_{\varepsilon} \le \int g_{\varepsilon}\,\, \mathrm{d}\pi_{\mathrm{o}} + 2\eta_\varepsilon = \underbrace{\int\bigg(\int f\,\, \mathrm{d}\delta_{\sigma^z}\bigg)\,\, \mathrm{d}\pi_{\mathrm{o}}}_{=\int f\,\, \mathrm{d} P^{\mathrm{o}}} + o(1), \end{align*}
where we used the dominated convergence theorem. The reverse inequality follows similarly, and we conclude that
$\lim_{\varepsilon\downarrow0}\int f\,\, \mathrm{d} P^{\varepsilon} = \int f\,\, \mathrm{d} P^{\mathrm{o}}$
, which yields
$P^\varepsilon \to P^{\mathrm{o}}$
weakly. The second claim follows from Lemma 1, stated next.
Lemma 1. For any Borel probability measure
$\gamma$
on
$\mathbb{R}^{2d}$
, the mixture
$P = \int\delta_{\sigma^{xy}}\,\, \mathrm{d}\gamma(x,y)$
has support
$\Sigma_{\gamma} \,:\!=\, \{\sigma^{xy} \colon (x,y) \in \mathrm{spt} (\gamma)\}$
.
Proof. The set
$\Sigma_{\gamma}$
is closed in E. Pick any
$(x,y) \in \mathrm{spt}(\gamma)$
and any open set U containing
$\sigma^{xy}$
. Since
$O = \{ (x',y') \colon \sigma^{x'y'} \in U \}$
is open in
$\mathbb{R}^{2d}$
(as
$(x',y') \mapsto \sigma^{x'y'}$
is continuous), we have, for
$(\xi_0,\xi_1) \sim \gamma$
,
$P(U) = \mathbb{P}(\sigma^{\xi_0,\xi_1} \in U) = \mathbb{P}((\xi_0,\xi_1) \in O) = \gamma(O) > 0$
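, so
$\Sigma_{\gamma} \subset \mathrm{spt}(P)$
. Conversely,
$\Sigma_{\gamma}$
is closed and
$P(\Sigma_{\gamma}) \ge \gamma(\mathrm{spt}(\gamma)) = 1$
, so
$\mathrm{spt}(P) \subset \Sigma_{\gamma}$
. This yields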
$\mathrm{spt}(P) = \Sigma_{\gamma}$
.
4.2. Exponential continuity of Brownian bridges
For given
$x,y \in \mathbb{R}^d$
, [Reference Hsu23] showed that the sequence
$\{ R^{\varepsilon,xy} \}_{\varepsilon > 0}$
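satisfies an LDP with rate function
\begin{align*} J_{xy}(h) = \begin{cases} \dfrac{\|h\|_H^2}{2} - \dfrac{|x-y|^2}{2} & \text{if $h \in H$ and $(h(0),h(1)) = (x,y)$}, \\ \infty & \text{otherwise}. \end{cases} \end{align*}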
Write
$J_z(h) = J_{xy}(h)$
for
$z= (x,y)$
. Additionally, set
$H_z \,:\!=\, \{ h \in H \colon (h(0),h(1)) = z \}$
. Pick any
$\varepsilon_k \downarrow 0$
.
Proposition 4. (Exponential continuity of Brownian bridges.)
(i) For every open set
$A \subset E$
,
$\liminf_{k \to \infty} \varepsilon_k \log R^{\varepsilon_k,z_k} (A) \ge -\inf_{h \in A} J_{z}(h)$
whenever
$z_k \to z$
in
$\mathbb{R}^{2d}$
.
(ii) For every closed set
$A \subset E$
,
\begin{equation} \limsup_{k \to \infty} \varepsilon_k \log R^{\varepsilon_k,z_k} (A) \le -\inf_{h \in A} J_{z}(h) \tag{14} \end{equation}
whenever
$z_k \to z$
in
$\mathbb{R}^{2d}$
.
Proof. Hsu’s proof in [Reference Hsu23] that relies on transition function estimates seems difficult to adapt to establishing the exponential continuity. Instead, we adapt the proof of large deviations for abstract Wiener spaces; cf. [Reference Stroock45, Chapter 8]. For the sake of completeness, we provide a self-contained proof.
For (i), it suffices to show that for every
$h \in H$
such that
$J_z (h) < \infty$
,
Set
$\bar{h} \in H_0$
by
$\bar{h}= h -\sigma^{xy}$
and
$h_k \in H_{z_k}$
by
$h_k = \bar{h} + \sigma^{x_k,y_k}$
. Since
$\| h_k - h \|_E \to 0$
,
$B_E(h_k,r/2) \subset B_E(h,r)$
for large k. Observe that
Recall that
$(H_0, (\cdot,\cdot)_H)$
is a reproducing kernel Hilbert space for
$W^\circ$
(cf. [Reference Giné and Nickl22, Exercise 2.6.16]), whose closure in E agrees with
$E_0 \,:\!=\, \{\omega \in E \colon \omega (0)=\omega(1)=0\}$
. Hence, the pair of spaces
$(H_0,E_0)$
coupled with the law of
$W^\circ$
constitutes an abstract Wiener space; cf. [Reference Stroock45, Chapter 8]. Let
$E_0^*$
denote the topological dual of
$E_0$
with dual norm
$\| \cdot \|_{E_0^*}$
, and
$\langle \omega,\omega^* \rangle$
denote the duality pairing for
$\omega \in E_0$
and
$\omega^* \in E_0^*$
. Since
$H_0$
is continuously embedded as a dense subspace of
$E_0$
(as
$\| \cdot \|_E \le \| \cdot \|_H$
on
$H_0$
), for each
$\omega^* \in E_0^*$
, there exists a unique
$h_{\omega^*} \in H_0$
with the property that
$(h,h_{\omega^*})_H = \langle h,\omega^* \rangle$
for all
$h \in H_0$
, and the map
$\omega^* \mapsto h_{\omega^*}$
is continuous, linear, one-to-one, and onto a dense subspace of
$H_0$
(cf. [Reference Stroock45, Lemma 8.2.3]). Let
$0 < \delta < r/2$
and
$\omega^* \in E_0^*$
be such that
$B_{E_0}(h_{\omega^*},\delta) \subset B_{E_0}(\bar{h},r/2)$
. Now, an application of the Cameron–Martin formula (cf. [Reference Stroock45, Theorem 8.2.9]) yields
\begin{align*} \begin{split} R^{\varepsilon_k,0}(B_E(\bar{h},r/2)) & = R^{\varepsilon_k,0}(B_{E_0}(\bar{h},r/2)) \ge R^{\varepsilon_k,0}(B_{E_0}(h_{\omega^*},\delta)) \\ & = \mathbb{P}\big(W^\circ-\varepsilon_k^{-1/2}h_{\omega^*} \in B_{E_0}\big(0,\varepsilon_k^{-1/2}\delta\big)\big) \\ & = \mathbb{E}\big[\exp\big\{{-}\varepsilon_k^{-1/2}\langle W^\circ,\omega^*\rangle - \varepsilon_k^{-1}\|h_{\omega^*}\|_{H}^2/2\big\}\mathbf{1}_{B_{E_0}(0,\varepsilon_k^{-1/2}\delta)}(W^\circ)\big] \\ & \ge \exp\big\{{-}\delta\varepsilon_k^{-1}\|\omega^*\|_{E_0^*} - \varepsilon_k^{-1}\|h_{\omega^*}\|_{H}^2/2\big\} \mathbb{P}\big(W^\circ \in B_{E_0}\big(0,\varepsilon_k^{-1/2}\delta\big)\big), \end{split} \end{align*}
so that, by taking
$k \to \infty$
,
Choosing
$\delta = r/4$
and
$\omega^* \in E_0^*$
with
$\| \bar{h}-h_{\omega^*} \|_{H} < r/4$
, and then taking
$r \downarrow 0$
, we have
For (ii), we first show that for every
$h \in E$
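,
\begin{equation} \limsup_{r\downarrow0}\limsup_{k\to\infty}\varepsilon_k\log R^{\varepsilon_k,z_k}(B_E(h,r)) \le -J_z(h). \tag{15} \end{equation}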
Using the same notation as in (i), we have
$B_E(h,r) \subset B_E(h_k,2r)$
for large k and
\begin{align*} \begin{split} R^{\varepsilon_k,z_k}(B_E(h_k,2r)) & = \mathbb{P}(W^\circ\in B_E(\bar{h}/\sqrt{\varepsilon_k},2r/\sqrt{\varepsilon_k})) \\ & = \mathbb{E}\big[\exp\big\{{-}\varepsilon_k^{-1/2}\langle W^\circ,\omega^*\rangle + \varepsilon_k^{-1/2}\langle W^\circ,\omega^*\rangle\big\} \mathbf{1}_{B_{E_0}(\bar{h}/\sqrt{\varepsilon_k},2r/\sqrt{\varepsilon_k})}(W^\circ)\big] \\ & \le \exp\big\{{-}\varepsilon_k^{-1}(\langle\bar{h},\omega^*\rangle-2r\|\omega^*\|_{E_0^*})\big\} \mathbb{E}\big[\mathrm{e}^{\varepsilon_k^{-1/2}\langle W^\circ,\omega^*\rangle}\big] \\ & = \exp\big\{{-}\varepsilon_k^{-1}(\langle\bar{h},\omega^*\rangle - \|h_{\omega^*}\|_H^2/2 - 2r\|\omega^*\|_{E_0^*})\big\} \end{split} \end{align*}
for all
$\omega^* \in E_0^*$
, where we used the fact that
$\langle W^\circ,\omega^*\rangle \sim N(0,\|h_{\omega^*}\|_{H}^2)$
. This yields
\begin{align*} \begin{split} \limsup_{r\downarrow0}\limsup_{k\to\infty}\varepsilon_k\log R^{\varepsilon_k,z_k}(B_E(h_k,2r)) & \le -\sup_{\omega^*\in E_0^*}\bigg(\langle\bar{h},\omega^*\rangle-\frac{\|h_{\omega^*}\|_H^2}{2}\bigg) \\ & = \begin{cases} -\dfrac{\|\bar{h}\|_H^2}{2} & \text{if $\bar{h} \in H_0$}, \\ -\infty & \text{otherwise}. \end{cases} \end{split} \end{align*}
Now,
$\bar{h} \in H_0$
if and only if
$h \in H_z$
, and
$\| \bar{h} \|_H^2=\| h \|_H^2 - |x-y|^2$
, which leads to (15).
Given (15), it is standard to show that (14) holds for every compact set
$A \subset E$
. It remains to verify exponential tightness for
$\{ R^{\varepsilon_k,z_k} \}_{k \in \mathbb{N}}$
(cf. [Reference Dembo and Zeitouni17, Lemma 1.2.18]), i.e. for every
$\alpha < \infty$
, there exists a compact set
$K \subset E$
such that
$\limsup_{k \to \infty} \varepsilon_k \log R^{\varepsilon_k,z_k} (K^{\mathrm{c}}) < -\alpha$
. We first note that the exponential tightness holds for
$\{ R^{\varepsilon_k,0} \}_{k \in \mathbb{N}}$
. Indeed, by [Reference Stroock45, Corollary 8.3.10], we can construct a separable Banach space F that is continuously embedded in
$E_0$
as a measurable subset with the properties that
$\mathbb{P}(W^\circ \in F)=1$
, bounded subsets of F are totally bounded in
$E_0$
, and
$(H_0,F)$
coupled with the restriction of the law of
$W^\circ$
on F is another abstract Wiener space. Then, choosing
$K_0$
to be the
$E_0$
-closure of a ball in F with large enough radius satisfies
$\limsup_{k \to \infty} \varepsilon_k \log R^{\varepsilon_k,0} (K_0^{\mathrm{c}}) < -\alpha$
by Fernique’s theorem (cf. [Reference Stroock45, Theorem 8.2.1]), and
$K_0$
is compact in
$E_0$
by construction.
Now, for an arbitrary bounded neighborhood
$O \subset \mathbb{R}^{2d}$
of z, set
$K_1=\{ \sigma^{x'y'} \colon (x',y') \in O \}$
. By the Ascoli–Arzelà theorem, the set
$K=\{ \omega + \omega' \colon \omega \in K_0, \omega' \in K_1 \}$
is relatively compact in E and satisfies
$R^{\varepsilon_k,z_k}(K) {\ge} R^{\varepsilon_k,0} (K_0)$
for large k. Indeed,
$\sigma^{z_k} \in K_1$
for large k, so if
$\omega \in K_0$
, then
$\omega + \sigma^{z_k} \in K$
, which implies
$R^{\varepsilon_k,0}(K_0) \le R^{\varepsilon_k,0} (\omega+\sigma^{z_k} \in K) = R^{\varepsilon_k,z_k}(K)$
. This yields exponential tightness for
$\{ R^{\varepsilon_k,z_k} \}_{k \in \mathbb{N}}$
.
Given the exponential continuity, the following corollary concerning large deviations of mixtures of Brownian bridges follows immediately from [Reference Dinwoodie and Zabell18, Theorems 2.1 and 2.2]. The result might be of independent interest.
Corollary 2. (Large deviations for mixtures of Brownian bridges.) Let
$\gamma$
be a Borel probability measure on
$\mathbb{R}^{2d}$
. Consider the mixture distribution
$Q^{\varepsilon} = \int R^{\varepsilon,xy} \, \, \mathrm{d}\gamma(x,y)$
.
(i) The function
\begin{equation} J(h) \,:\!=\, \inf_{(x,y)\in\mathrm{spt}(\gamma)}J_{xy}(h) = \frac{\|h\|_H^2}{2} - c(h(0),h(1)) + \iota_{\mathrm{spt}(\gamma)}(h(0),h(1)) \tag{16} \end{equation}
is lower semicontinuous from E into
$[0,\infty]$
.
(ii) For every open set
$A \subset E$
,
$\liminf_{k \to \infty} \varepsilon_k\log Q^{\varepsilon_k} (A) \ge -\inf_{h \in A} J(h)$
.
(iii) If
$\gamma$
is compactly supported, then for every closed set
$A \subset E$
,
\begin{align*} \limsup_{k \to \infty} \varepsilon_k\log Q^{\varepsilon_k} (A) \le -\inf_{h \in A} J(h), \end{align*}
and J is a good rate function.
Proof. For (i), set
$F = \{ (h,z) \in E \times \mathbb{R}^{2d} \colon (h(0),h(1)) = z \}$
. The rate function
$J_{z}(h)$
can be expressed as
\begin{align*} J_z(h) = \frac{\|h\|_H^2}{2} - c(h(0),h(1)) + \iota_F(h,z). \end{align*}
This yields the second expression for the J function in (16). Since
$\mathrm{spt}(\gamma)$
is closed by definition, what remains is to verify that the mapping
$E \ni h \mapsto \| h \|_{H}^2/2$
is lower semicontinuous. It suffices to show that the set
$\{ h \in H \colon \| h \|_H \le 1 \}$
is closed in E. Let
$\{ h_n \}_{n \in \mathbb{N}} \subset H$
be a sequence with
$ \| h_n \|_H \le 1$
for all
$n \in \mathbb{N}$
and
$h_n \to h_\infty$
in E. We may assume without loss of generality that
$h_n(0)=h_\infty(0)=0$
. Since
$\tilde{H}= \{ h \in H \colon h(0) = 0\}$
endowed with inner product
$( \cdot, \cdot )_H$
is a Hilbert space, by the Banach–Alaoglu theorem, there exists a subsequence
$h_{n'}$
such that
$h_{n'} \to \tilde{h}$
weakly in
$\tilde{H}$
for some
$\tilde{h} \in \tilde{H}$
with
$\| \tilde{h} \|_H \le 1$
, i.e.
$\lim_{n'} (h_{n'},g)_H = (\tilde{h},g)_H$
for all
$g \in \tilde{H}$
. This implies that
$h_\infty = \tilde{h}$
(choose appropriate g) and
$\| h_\infty \|_{H} \le 1$
, as desired.
Part (ii) follows from Proposition 4(i) and [Reference Dinwoodie and Zabell18, Theorem 2.1].
For (iii), the large-deviation upper bound follows from Proposition 4(ii) and [Reference Dinwoodie and Zabell18, Theorem 2.2]. Finally, we verify that J has compact level sets, but this follows from [Reference Dembo and Zeitouni17, Lemma 1.2.18], since the argument in Proposition 4(ii) indeed shows that
$\{ Q^{\varepsilon_k} \}_{k \in \mathbb{N}}$
is exponentially tight (replace O by
$\mathrm{spt}(\gamma)$
).
4.3. Proof of Theorem 1
Proof of Theorem 1. Set
$\phi(z) = c(x,y) - \psi^c(x) - \psi(y)$
for
$z = (x,y) \in \mathcal{X}_{\mathrm{o}} \times \mathcal{Y}_{\mathrm{o}}$
.
For (i), it suffices to show that for any
$h \in e_{01}^{-1}(\mathcal{X}_{\mathrm{o}} \times \mathcal{Y}_{\mathrm{o}}) \cap H$
and
$r > 0$
,
Set
$z = (h(0),h(1)) \in \mathcal{X}_{\mathrm{o}} \times \mathcal{Y}_{\mathrm{o}}$
. By the exponential continuity of
$\{ R^{\varepsilon_k,z} \}$
established in Proposition 4, for every
$\delta >0$
we can choose an open neighborhood
$O_z \subset \mathcal{X}_{\mathrm{o}} \times \mathcal{Y}_{\mathrm{o}}$
of z and a positive integer
$k_z$
such that, for every
$z' \in O_z$
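,
\begin{align*} \varepsilon_k\log R^{\varepsilon_k,z'}(B_E(h,r)) \ge -\inf_{h' \in B_E(h,r)}J_z(h') - \delta \quad \text{for all $k \ge k_z$}. \end{align*}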
For if not, for the open ball
$O_i$
in
$\mathcal{X}_{\mathrm{o}} \times \mathcal{Y}_{\mathrm{o}}$
with center z and radius
$i^{-1}$
, we can find
$z_i' \in O_i$
and a large enough positive integer
$k_i$
(with
$k_i > k_{i-1}$
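) such that
\begin{align*} \varepsilon_{k_i}\log R^{\varepsilon_{k_i},z_i'}(B_E(h,r)) < -\inf_{h' \in B_E(h,r)}J_z(h') - \delta, \end{align*}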
but this contradicts the exponential continuity (as
$z_i' \to z$
). Hence,
\begin{align*} \begin{split} P^{\varepsilon_k}(B_E(h,r)) & \ge \int_{O_z}\exp\big\{\varepsilon_k^{-1}\cdot\varepsilon_k\log R^{\varepsilon_k,z'}(B_E(h,r))\big\}\, \, \mathrm{d}\pi_{\varepsilon_k}(z') \\ & \ge \exp\Big\{\!-\varepsilon_k^{-1}\Big(\inf_{h' \in B_E(h,r)} J_z (h') + \delta\Big)\Big\} \pi_{\varepsilon_k}(O_z). \end{split} \end{align*}
Invoking [Reference Bernton, Ghosal and Nutz3, Corollary 4.7], we arrive at
\begin{align*} \begin{split} \liminf_{k \to \infty} \varepsilon_k \log P^{\varepsilon_k} (B_E(h,r)) & \ge -\inf_{h' \in B_E(h,r)} J_z (h') - \delta - \inf_{z' \in O_z}\phi(z') \\ & \ge -(J_z(h)+\phi(z)) - \delta = -I(h) - \delta, \end{split} \end{align*}
establishing the desired claim.
For (ii), we first observe that for
$A = e_{01}^{-1}(C)$
with
$C \subset \mathcal{X}_{\mathrm{o}} \times \mathcal{Y}_{\mathrm{o}}$
compact,
$P^{\varepsilon}(A) = \int_{C} R^{\varepsilon,z}(A) \, \, \mathrm{d}\pi_{\varepsilon}(z)$
. Taking into account [Reference Bernton, Ghosal and Nutz3, Proposition 4.5], extend
$\phi$
to
$\mathcal{X} \times \mathcal{Y}$
as
\begin{align*} \phi(x,y) = \sup_{\ell\ge2}\sup_{\{(x_i,y_i)\}_{i=2}^\ell\subset\mathrm{spt}(\pi_{\mathrm{o}})} \sup_{\tau}\sum_{i=1}^\ell c(x_i,y_i) - \sum_{i=1}^\ell c(x_i,y_{\tau(i)}), \end{align*}
where
$\sup_{\tau}$
is taken over all permutations of
$\{ 1,\ldots,\ell \}$
and
$(x_1,y_1)=(x,y)$
. The function
$\phi\colon\mathcal{X}\times\mathcal{Y}\rightarrow[0,\infty]$
is lower semicontinuous (see [Reference Bernton, Ghosal and Nutz3, Lemma 4.2]) and agrees with the previous definition of
$\phi$
on
$\mathcal{X}_{\mathrm{o}} \times \mathcal{Y}_{\mathrm{o}}$
. Let
$\delta > 0$
be given. For every
$z \in C$
, by the exponential continuity of
$\{ R^{\varepsilon_k,z} \}$
established in Proposition 4 we can choose a bounded open neighborhood
$O_z \subset \mathcal{X} \times \mathcal{Y}$
of z and a positive integer
$k_z$
such that, for every
$z' \in O_z$
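,
\begin{align*} \varepsilon_k\log R^{\varepsilon_k,z'}(A) \le -\inf_{h \in A}J_z(h) + \delta \quad \text{for all $k \ge k_z$}. \end{align*}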
Furthermore, since
$\phi$
is lower semicontinuous, by choosing
$O_z$
smaller if necessary, we have
$\inf_{z' \in \bar{O}_z} \phi(z') \ge \phi(z) - \delta$
, where
$\bar{O}_z$
denotes the closure of
$O_z$
in
$\mathcal{X} \times \mathcal{Y}$
. By the compactness of C, we can find
$z_1,\ldots,z_N \in C$
such that
$C \subset \bigcup_{i=1}^N O_{z_i}$
, so
\begin{align*} P^{\varepsilon_k}(A) \le \sum_{i=1}^N\int_{O_{z_i}}\mathrm{e}^{\varepsilon_k^{-1}\cdot\varepsilon_k\log R^{\varepsilon_k,z}(A)}\, \, \mathrm{d}\pi_{\varepsilon_k}(z) \le \sum_{i=1}^N\exp\Big\{\varepsilon_k^{-1}\Big(-\inf_{h\in A}J_{z_i}(h) + \delta + \varepsilon_k \log\pi_{\varepsilon_k}(\bar{O}_{z_i})\Big)\Big\}. \end{align*}
We invoke the following elementary result, whose proof follows from Jensen’s inequality [Reference Chatterjee10].
Lemma 2. (Smooth max function.) For
$\beta > 0$
and
$v = (v_1,\ldots,v_N) \in \mathbb{R}^N$
, consider a smooth max function
$m_\beta(v) = \beta^{-1}\log (\sum_{i=1}^N \mathrm{e}^{\beta v_i})$
. Then, for every
$v \in \mathbb{R}^N$
, we have
$\max_{1 \le i \le N}v_i \le m_\beta (v) \le \max_{1 \le i \le N}v_i + \beta^{-1}\log N$
.
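The two-sided bound of Lemma 2 is easy to check numerically. The following minimal Python sketch evaluates $m_\beta(v)$ with the usual log-sum-exp shift (a numerical device added here, not part of the lemma) and prints the bounds for a few values of $\beta$; the function name `smooth_max` and the sample vector are hypothetical.

```python
import math

def smooth_max(v, beta):
    """Return m_beta(v) = (1 / beta) * log(sum_i exp(beta * v_i)).

    The maximum is factored out before exponentiating, so large values of
    beta * v_i do not overflow."""
    if beta <= 0:
        raise ValueError("beta must be positive")
    top = max(v)
    total = sum(math.exp(beta * (vi - top)) for vi in v)
    return top + math.log(total) / beta

v = [-3.0, 0.5, 0.2, -1.0]
for beta in (1.0, 10.0, 1000.0):
    lower = max(v)
    upper = max(v) + math.log(len(v)) / beta
    print(beta, lower, smooth_max(v, beta), upper)
```

In the proofs the lemma is applied with $\beta = \varepsilon_k^{-1}$, so the additive error $\beta^{-1}\log N = \varepsilon_k \log N$ vanishes as $k \to \infty$ for fixed N.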
Using Lemma 2 combined with [Reference Bernton, Ghosal and Nutz3, Corollary 4.3], we have
\begin{align*} \begin{split} \varepsilon_k\log P^{\varepsilon_k}(A) & \le \max_{1 \le i \le N}\Big\{{-}\inf_{h\in A}J_{z_i}(h) + \delta + \varepsilon_k\log\pi_{\varepsilon_k}(\bar{O}_{z_i})\Big\} + \varepsilon_k\log N \\ & \le \max_{1 \le i \le N}\Big\{{-}\inf_{h\in A}J_{z_i}(h) - \inf_{z\in\bar{O}_{z_i}}\phi(z)\Big\} + \delta + o(1) \\ & \le \max_{1 \le i \le N}\Big\{{-}\inf_{h\in A}J_{z_i}(h) - \phi(z_i)\Big\} + 2\delta + o(1) \\ & \le -\inf_{h\in A}\inf_{z\in C}\{J_z(h) + \phi(z)\} + 2\delta + o(1) \\ & = -\inf_{h\in A}\inf_{z\in C}\{I(h) + \iota_{\{z\}}(h(0),h(1))\} + 2\delta + o(1) \\ &= -\inf_{h \in A} I(h) + 2 \delta +o(1), \end{split} \end{align*}
where we used the fact that
$(h(0),h(1)) \in C$
whenever
$h \in A$
by our choice of A. This completes the proof.
4.4. Direct proof of Corollary 1
We first prove the following technical result concerning convergence of EOT potentials.
Lemma 3. (Convergence of EOT potentials.) Suppose that
$\mathcal{X}$
and
$\mathcal{Y}$
are compact and one of them agrees with the closure of a connected open set. Then, under normalization
$\int\psi^c\,\, \mathrm{d}\mu_0 = \int\psi\,\, \mathrm{d}\mu_1$
, the OT potential
$\psi$
from
$\mu_1$
to
$\mu_0$
is everywhere unique, and
$(\psi^c,\psi)$
are bounded and Lipschitz on
$\mathcal{X} \times \mathcal{Y}$
. Furthermore, let
$(\varphi_{\varepsilon},\psi_{\varepsilon})$
be the unique EOT potentials under normalization
$\int\varphi_{\varepsilon}\,\, \mathrm{d}\mu_0 = \int\psi_{\varepsilon}\,\, \mathrm{d}\mu_1$
. Then, for any sequence
$\varepsilon_{k} \downarrow 0$
,
$\varphi_{\varepsilon_k} \to \psi^c$
and
$\psi_{\varepsilon_{k}} \to \psi$
uniformly on
$\mathcal{X}$
and
$\mathcal{Y}$
, respectively.
Proof. The lemma follows from [Reference Santambrogio41, Proposition 7.18] and [Reference Nutz and Wiesel36, Proposition 3.2]. We include a self-contained proof for completeness. First, under the current assumption, we observe that any OT potential
$\psi$
is bounded and Lipschitz on
$\mathcal{Y}$
. We have seen that the support of any OT plan
$\pi$
is contained in
$\partial^c\psi$
, so any
$(x_0,y_0) \in \mathrm{spt}(\pi)$
satisfies
$\psi(y_0) > -\infty$
and
$\psi^c(x_0) > -\infty$
, which entails
$\psi = \psi^{cc} \le \sup_{\mathcal{X}\times\mathcal{Y}}c - \psi^c(x_0)$
and
$\psi \ge -\sup_{\mathcal{X}}\psi^c \ge -\sup_{\mathcal{X}\times\mathcal{Y}}c + \psi(y_0)$
. Lipschitz continuity follows from c-concavity. For the uniqueness, suppose
$\mathrm{int}(\mathcal{Y})$
is connected. Recall that the projections of
$\mathrm{spt}(\pi)$
onto
$\mathcal{X}$
and
$\mathcal{Y}$
agree with
$\mathcal{X}$
and
$\mathcal{Y}$
, respectively (cf. Remark 6). For any OT potential
$\psi$
and any
$y_0 \in \mathrm{int} (\mathcal{Y})$
, we can find
$x_0 \in \mathcal{X}$
such that
$\psi^c(x_0) + \psi(y_0) =c(x_0,y_0)$
, i.e.
$c(x_0,\cdot) - \psi(\!\cdot\!)$
is minimized at
$y_0$
, which entails
$\nabla \psi(y_0) = \nabla_y c(x_0,y_0)$
as long as
$\psi$
is differentiable at
$y_0$
. We have shown that
$\nabla \psi$
is uniquely determined Lebesgue-a.e. on
$\mathrm{int} (\mathcal{Y})$
. As
$\mathrm{int}(\mathcal{Y})$
is connected,
$\psi$
is uniquely determined on
$\mathrm{int}(\mathcal{Y})$
up to additive constants. By continuity,
$\psi$
is uniquely determined on
$\mathcal{Y}$
up to additive constants. If
$\mathrm{int}(\mathcal{X})$
is connected, then the OT potential
$\varphi$
from
$\mu_0$
to
$\mu_1$
is unique up to additive constants. If
$\psi$
is an OT potential from
$\mu_1$
to
$\mu_0$
, then by the definition of the c-transform, we must have
$\int(\psi-\varphi^c)\,\, \mathrm{d}\mu_1 = 0$
, which yields
$\psi = \varphi^c$
$\mu_1$
-a.e. By continuity, we have
$\psi = \varphi^c$
on
$\mathcal{Y}$
.
For the second result, by the Schrödinger system (6) and Jensen’s inequality, we have
$\psi_{\varepsilon}^c \le \varphi_{\varepsilon} \le \sup_{\mathcal{X} \times \mathcal{Y}} c$
and
$\varphi_{\varepsilon}^c \le \psi_{\varepsilon} \le \sup_{\mathcal{X} \times \mathcal{Y}} c$
, so the EOT potentials are uniformly bounded by
$\sup_{\mathcal{X} \times \mathcal{Y}} c$
. Furthermore, under our assumption, the EOT potentials extend to smooth functions on
$\mathbb{R}^d$
by the Schrödinger system, and directly calculating derivatives shows that
$|\nabla\varphi_{\varepsilon}|\vee|\nabla\psi_{\varepsilon}| \le C$
on
$\mathcal{X} \times \mathcal{Y}$
for some constant C independent of
$\varepsilon$
. Hence, the Ascoli–Arzelà theorem applies, and after passing to a subsequence,
$\varphi_{\varepsilon_k} \to \bar{\varphi}$
and
$\psi_{\varepsilon_k} \to \bar{\psi}$
uniformly on
$\mathcal{X}$
and
$\mathcal{Y}$
, respectively. By the identity
$\int\mathrm{e}^{(\varphi_{\varepsilon}+\psi_{\varepsilon}-c)/\varepsilon}\,\, \mathrm{d}(\mu_0\otimes\mu_1) = 1$
and Fatou’s lemma, we have
$\bar{\varphi} + \bar{\psi} \le c$
$(\mu_0 \otimes \mu_1)$
-a.e. By continuity,
$\bar{\varphi} + \bar{\psi} \le c$
on
$\mathcal{X} \times \mathcal{Y}$
, but
$\bar{\psi}^c \le \bar{\varphi}$
and
$\bar{\varphi}^c \le \bar{\psi}$
by construction, so
$\bar{\varphi} = \bar{\psi}^c$
and
$\bar{\psi} = \bar{\varphi}^c$
, i.e.
$(\bar{\varphi},\bar{\psi})$
are c-concave. Now, using duality, for any OT plan
$\pi$
,
$\int c\,\, \mathrm{d}\pi \le \lim_{k}\big(\int c\,\, \mathrm{d}\pi_{\varepsilon_k} + \varepsilon_k H(\pi_{\varepsilon_k}\mid\mu_0\otimes\mu_1)\big) = \int\bar{\varphi}\,\, \mathrm{d}\mu_0 + \int\bar{\psi}\,\, \mathrm{d}\mu_1 \le \int c\,\, \mathrm{d}\pi$
, so
$(\bar{\varphi},\bar{\psi})$
are OT potentials. Since
$\int\bar{\varphi}\,\, \mathrm{d}\mu_0 = \int\bar{\psi}\,\, \mathrm{d}\mu_1$
by construction, by the uniqueness result,
$\bar{\varphi} = \psi^c$
and
$\bar{\psi}=\psi$
. Finally, by the uniqueness of the limits, along the original sequence,
$\varphi_{\varepsilon_k} \to \psi^c$
and
$\psi_{\varepsilon_k} \to \psi$
uniformly on
$\mathcal{X}$
and
$\mathcal{Y}$
, respectively.
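The identity $\int\mathrm{e}^{(\varphi_{\varepsilon}+\psi_{\varepsilon}-c)/\varepsilon}\,\mathrm{d}(\mu_0\otimes\mu_1) = 1$ used in the proof can be illustrated on a toy example. The following minimal Python sketch runs log-domain Sinkhorn iterations for two discrete measures on the line, assuming the Schrödinger system takes the standard form $\varphi_\varepsilon(x) = -\varepsilon\log\int\mathrm{e}^{(\psi_\varepsilon(y)-c(x,y))/\varepsilon}\,\mathrm{d}\mu_1(y)$ (and symmetrically for $\psi_\varepsilon$) with $c(x,y)=|x-y|^2/2$. Discrete marginals do not satisfy the connectedness hypothesis of Lemma 3 and the normalization of the lemma is not imposed, so the sketch only illustrates the identity and the resulting pointwise bound, not the convergence statement. All names and data are hypothetical.

```python
import math

def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))

def sinkhorn_potentials(xs, ys, mu0, mu1, eps, iters=300):
    """Alternate the two equations of the (assumed) Schrodinger system in the
    log domain; returns the potentials on the support points and the cost matrix."""
    m, n = len(xs), len(ys)
    cost = [[0.5 * (x - y) ** 2 for y in ys] for x in xs]
    phi, psi = [0.0] * m, [0.0] * n
    for _ in range(iters):
        phi = [-eps * logsumexp([(psi[j] - cost[i][j]) / eps + math.log(mu1[j])
                                 for j in range(n)]) for i in range(m)]
        psi = [-eps * logsumexp([(phi[i] - cost[i][j]) / eps + math.log(mu0[i])
                                 for i in range(m)]) for j in range(n)]
    return phi, psi, cost

xs = [0.0, 0.25, 0.5, 0.75, 1.0]
ys = [0.1, 0.4, 0.6, 0.9, 1.3]
mu0 = [0.2] * 5
mu1 = [0.2] * 5
eps = 0.05
phi, psi, cost = sinkhorn_potentials(xs, ys, mu0, mu1, eps)

# Total mass of the coupling with density exp((phi + psi - c) / eps) w.r.t. mu0 x mu1.
mass = sum(math.exp((phi[i] + psi[j] - cost[i][j]) / eps) * mu0[i] * mu1[j]
           for i in range(5) for j in range(5))
# Since every term of the sum is at most 1, phi + psi - c <= eps * log(1 / (mu0_i * mu1_j)).
gap = max(phi[i] + psi[j] - cost[i][j] for i in range(5) for j in range(5))
print(mass, gap, eps * math.log(1.0 / (0.2 * 0.2)))
```

The printed bound on $\varphi_\varepsilon + \psi_\varepsilon - c$ is of order $\varepsilon$, which is the discrete counterpart of the limiting inequality $\bar{\varphi} + \bar{\psi} \le c$ obtained above via Fatou's lemma.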
Direct proof of Corollary 1. Set
$S = e_{01}^{-1} (\mathcal{X} \times \mathcal{Y}) \subset E$
. Recall that
$\bar{R}^{\varepsilon} = \int R^{\varepsilon,xy}\,\, \mathrm{d}(\mu_0 \otimes \mu_1)$
. By construction,
$\bar{R}^{\varepsilon}(S)=1$
for all
$\varepsilon > 0$
. Abusing notation, we shall write
$\phi_{\varepsilon}(\omega) = \phi_{\varepsilon} (\omega(0),\omega(1))$
. With this convention, we have
$P^\varepsilon(A) = \int_A\mathrm{e}^{-\phi_\varepsilon/\varepsilon}\,\, \mathrm{d}\bar{R}^{\varepsilon}$
. Set
$J(h) = \inf_{(x,y)\in\mathcal{X}\times\mathcal{Y}}J_{xy}(h)$
and
$\phi(h) = \phi(h(0),h(1)) = c(h(0),h(1))-\psi^c (h(0))-\psi(h(1))$
for
$h \in S$
.
Step 1
Let
$A \subset E$
be open and pick any
$h \in A$
such that
$I(h)<\infty$
(if no such h exists then the conclusion is trivial). By Lemma 3, for every
$\delta > 0$
there exists an open neighborhood
$G \subset A$
of h such that
$\sup_{\omega \in G \cap S}\phi_{\varepsilon_k}(\omega) \le \phi(h) + \delta$
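for all large k. Hence,
\begin{align*} P^{\varepsilon_k}(A) \ge P^{\varepsilon_k}(G) = \int_G\mathrm{e}^{-\phi_{\varepsilon_k}/\varepsilon_k}\,\, \mathrm{d}\bar{R}^{\varepsilon_k} \ge \mathrm{e}^{-(\phi(h)+\delta)/\varepsilon_k}\bar{R}^{\varepsilon_k}(G). \end{align*}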
Corollary 2 implies that
$\varepsilon_k \log P^{\varepsilon_k} (A) \ge -\phi(h) - \delta - J(h) + o(1)$
as
$k \to \infty$
. Noting that
$\phi(h) + J(h) = I(h)$
yields the desired lower bound.
Step 2
For the upper bound, we first note that by Lemma 3,
$\phi_{\varepsilon}$
are uniformly bounded from below on S, i.e.
$\phi_{\varepsilon}(\omega) \ge -M$
for all
$\omega \in S$
and
$\varepsilon > 0$
for some
$M >0$
. Let
$A \subset E$
be closed. Pick any
$\alpha < \infty$
and
$\delta > 0$
. Set
$\Psi_{J}(\alpha) = \{ h \colon J(h) \le \alpha \} \cap A$
, which is a compact subset of E as J is a good rate function and A is closed. By Lemma 3 and the lower semicontinuity of the function J, for every
$h \in \Psi_{J}(\alpha)$
(which entails
$h \in S$
), we can find an open neighborhood
$U_h$
of h such that
$\inf_{\omega \in \bar{U}_h} J(\omega) \ge J(h) - \delta$
and
$\inf_{\omega \in \bar{U}_h \cap S} \phi_{\varepsilon_k} (\omega) \ge \phi(h) - \delta$
for large k, where
$\bar{U}_h$
denotes the closure of
$U_h$
in E. By the compactness of
$\Psi_J(\alpha)$
, we can find
$h_1,\ldots,h_N \in \Psi_J(\alpha)$
such that
$\Psi_J(\alpha) \subset \bigcup_{i=1}^N U_{h_i}$
. Now, setting
$F=\big(\bigcup_{i=1}^N U_{h_i}\big)^c \cap A$
(which is a closed subset of E), we observe that
\begin{align*} P^{\varepsilon_k}(A) & = \int_A\mathrm{e}^{-\phi_{\varepsilon_k}/\varepsilon_k}\,\, \mathrm{d}\bar{R}^{\varepsilon_k} \\ & \le \sum_{i=1}^N\exp\big\{(\varepsilon_k\log\bar{R}^{\varepsilon_k}(\bar{U}_{h_i}) - \phi(h_i) + \delta)/\varepsilon_k\big\} + \mathrm{e}^{(M+\varepsilon_k \log \bar{R}^{\varepsilon_k}(F))/\varepsilon_k}. \end{align*}
Using Lemma 2, and combining Lemma 3 and Corollary 2, we have
\begin{align*} \begin{split} \varepsilon_k\log P^{\varepsilon_k}(A) & \le \max\big\{\varepsilon_k\log\bar{R}^{\varepsilon_k}(\bar{U}_{h_1}) - \phi(h_1) + \delta, \ldots, \varepsilon_k\log\bar{R}^{\varepsilon_k}(\bar{U}_{h_N}) - \phi(h_N) + \delta, \\ & \qquad\qquad M + \varepsilon_k\log\bar{R}^{\varepsilon_k}(F)\big\} + \varepsilon_k\log(N+1) \\ & \le \max\Big\{{-}\inf_{\omega\in\bar{U}_{h_1}}J(\omega) - \phi(h_1) + \delta, \ldots, -\inf_{\omega\in\bar{U}_{h_N}}J(\omega) - \phi(h_N) + \delta, \\ & \qquad\qquad M - \inf_{\omega\in F}J(\omega)\Big\} + o(1) \\ &\le \max\{-I(h_1) + 2\delta, \ldots, -I(h_N) + 2\delta, M-\alpha\} + o(1) \\ & \le \max\Big\{{-}\inf_{h \in A}I(h) + 2\delta, M-\alpha\Big\} + o(1), \end{split} \end{align*}
where we used
$J(h) + \phi (h) = I(h)$
. Since
$\alpha < \infty$
and
$\delta >0$
are arbitrary, we obtain the desired upper bound. Finally, the rate function I being good follows from an argument similar to the proof of Corollary 2(iii). This completes the proof.
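For the reader's convenience, the elementary facts behind the two displays of Step 2 can be written out as follows. This is a sketch; the symbols $a_1,\ldots,a_{N+1}$ are introduced here for illustration only. Since $A \subset \bigcup_{i=1}^N (\bar{U}_{h_i} \cap A) \cup F$, the first display follows by bounding $\mathrm{e}^{-\phi_{\varepsilon_k}/\varepsilon_k}$ by $\mathrm{e}^{-(\phi(h_i)-\delta)/\varepsilon_k}$ on $\bar{U}_{h_i} \cap S$ and by $\mathrm{e}^{M/\varepsilon_k}$ on $F \cap S$. For the second display, any real numbers $a_1,\ldots,a_{N+1}$ and $\varepsilon > 0$ satisfy
\begin{align*} \varepsilon\log\sum_{i=1}^{N+1}\mathrm{e}^{a_i/\varepsilon} \le \varepsilon\log\Big((N+1)\,\mathrm{e}^{\max_{i}a_i/\varepsilon}\Big) = \max_{1 \le i \le N+1}a_i + \varepsilon\log(N+1). \end{align*}
Finally, the choice of $U_{h_i}$ and the definition of F give
\begin{align*} -\inf_{\omega\in\bar{U}_{h_i}}J(\omega) - \phi(h_i) + \delta \le -J(h_i) - \phi(h_i) + 2\delta = -I(h_i) + 2\delta, \qquad \inf_{\omega \in F}J(\omega) \ge \alpha, \end{align*}
where the last bound holds because every $\omega \in F$ lies in A but outside $\Psi_J(\alpha)$, so that $J(\omega) > \alpha$.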
4.5. Proof of Proposition 2
Proof of Proposition 2. The fact that the sequence
$\{ P^{\varepsilon_k}_{st} \}_{k \in \mathbb{N}}$
satisfies an LDP having a good rate function follows from Corollary 1 and the contraction principle. The rate function is given by
\begin{equation*} I_{st}(x,y) = \inf\big\{ I(h) \colon (h(s),h(t)) = (x,y) \big\}. \end{equation*}
First, fix two endpoints
$h(0)=x'$
and
$h(1)=y'$
and optimize
$\| h \|_H^2$
under the constraint
$(h(s),h(t))=(x,y)$
. The optimal h is given by
\begin{equation*} h(u) = \begin{cases} \bigg(1-\dfrac{u}{s}\bigg)x' + \dfrac{u}{s}x & \text{if} \ u \in [0,s], \\[10pt] \bigg(1-\dfrac{u-s}{t-s}\bigg)x + \dfrac{u-s}{t-s}y & \text{if} \ u \in [s,t], \\[10pt] \bigg(1-\dfrac{u-t}{1-t}\bigg)y + \dfrac{u-t}{1-t}y' & \text{if} \ u \in [t,1], \end{cases} \end{equation*}
which gives
$\| h \|_H^2/2 = c^{0,s}(x',x)+c^{s,t}(x,y)+c^{t,1}(y,y')$
. Hence,
\begin{align*} \begin{split} I_{st}(x,y) & = \inf_{x',y'}\big\{c^{0,s}(x',x)+c^{s,t}(x,y)+c^{t,1}(y,y') - \varphi(x') - \psi(y') \big \} \\ & = c^{s,t}(x,y) + \mathcal{Q}_{s}(\!-\varphi)(x) + \mathcal{Q}_{1-t}(\!-\psi)(y) = c^{s,t}(x,y) - \varphi_s(x) - \psi_t(y). \end{split} \end{align*}
The final claim follows from [Reference Villani47, Theorem 7.35] after adjusting the signs.
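The energy identity for the piecewise-linear path used above can be checked directly. The following is a sketch under two assumptions not restated in this proof: that $\| h \|_H^2 = \int_0^1 |\dot{h}(u)|^2\,\, \mathrm{d} u$, and that $c^{a,b}(x,y) = |x-y|^2/(2(b-a))$ for $0 \le a < b \le 1$. On each of the three intervals the derivative of h is constant, so
\begin{align*} \frac{1}{2}\int_0^1|\dot{h}(u)|^2\,\, \mathrm{d} u = \frac{s}{2}\,\frac{|x-x'|^2}{s^2} + \frac{t-s}{2}\,\frac{|y-x|^2}{(t-s)^2} + \frac{1-t}{2}\,\frac{|y'-y|^2}{(1-t)^2} = \frac{|x-x'|^2}{2s} + \frac{|y-x|^2}{2(t-s)} + \frac{|y'-y|^2}{2(1-t)}. \end{align*}
Optimality of the linear interpolation on each interval follows from the Cauchy–Schwarz inequality: for any absolutely continuous g on $[a,b]$,
\begin{align*} |g(b)-g(a)|^2 = \bigg|\int_a^b\dot{g}(u)\,\, \mathrm{d} u\bigg|^2 \le (b-a)\int_a^b|\dot{g}(u)|^2\,\, \mathrm{d} u, \end{align*}
with equality if and only if $\dot{g}$ is constant almost everywhere on $[a,b]$.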
4.6. Proof of Proposition 3
Proof of Proposition 3. The EOT plan
$\check{\pi}^\varepsilon$
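is of the form
\begin{equation*} \frac{\mathrm{d}\check{\pi}^\varepsilon}{\mathrm{d}(\mu_0 \otimes \mu_1)}(x,y) = \exp\bigg\{\frac{\check{\varphi}_\varepsilon(x) + \check{\psi}_\varepsilon(y) - c_\varepsilon(x,y)}{\varepsilon}\bigg\}, \end{equation*}
where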
$(\check{\varphi}_\varepsilon,\check{\psi}_\varepsilon)$
are EOT potentials satisfying the Schrödinger system (6) with c replaced by
$c_\varepsilon$
. For uniqueness, we assume without loss of generality that
$\int\check{\varphi}_\varepsilon\,\, \mathrm{d}\mu_0 = \int\check{\psi}_\varepsilon\,\, \mathrm{d}\mu_1$
. Consider the mixture distribution
$Q^{\varepsilon} = \int\check{R}^{\varepsilon,xy}\,\, \mathrm{d}(\mu_0 \otimes \mu_1)(x,y)$
; then
Furthermore, by [Reference Stroock44, Theorems 4.4.6 and 4.4.12], we have
$\lim_{\varepsilon \downarrow 0} c_{\varepsilon}(x,y) = {|x-y|^2}/{2} = c(x,y)$
uniformly over
$(x,y) \in \mathcal{X} \times \mathcal{Y}$
. Hence, in view of the direct proof of Corollary 1, the desired claim follows once we verify the following:
• the mixture distributions
$\{ Q^{\varepsilon_k} \}_{k \in \mathbb{N}}$
satisfy the LDP with good rate function
$J(h) = \inf_{(x,y) \in \mathcal{X} \times \mathcal{Y}} J_{xy}(h)$
;
• as
$k \to \infty$
,
$\check{\varphi}_{\varepsilon_k}\to\psi^c$
and
$\check{\psi}_{\varepsilon_k}\to\psi$
uniformly on
$\mathcal{X}$
and
$\mathcal{Y}$
, respectively.
The first item follows by establishing the exponential continuity of
$\{ \check{R}^{\varepsilon_k, xy} \}_{k \in \mathbb{N}}$
w.r.t. (x,y). To this end, we invoke the Radon–Nikodym derivative of the Langevin bridge
$\check R^{\varepsilon,xy}$
against the Brownian bridge
$R^{\varepsilon,xy}$
:
where
$\Delta V$
is the Laplacian of V and
$Z_{\varepsilon,xy}$
is the normalizing constant. See [Reference Levy and Krener28, Section 5] and the proof of [Reference Conforti12, Theorem 2.1]; see also Remark 11. Heuristically, this follows from the following observation. The Langevin diffusion
$X^\varepsilon$
follows the SDE
The Girsanov theorem yields that
under
$R^\varepsilon$
. An application of Itô’s formula yields
under
$R^\varepsilon$
. The bridge case is obtained by canceling
$V(\omega(1)) - V(\omega(0))$
, which is to be expected since it depends only on the endpoints. Now, since the potential V has bounded derivatives, the desired exponential continuity follows from Proposition 4.
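The heuristic computation can be sketched as follows. The sketch rests on assumptions inferred from the scaling in Remark 11 rather than restated here: that $X^\varepsilon$ solves $\mathrm{d} X^\varepsilon(t) = -\varepsilon\nabla V(X^\varepsilon(t))\,\, \mathrm{d} t + \sqrt{\varepsilon}\,\, \mathrm{d} W(t)$, that $\check{R}^\varepsilon$ denotes its law, and that $R^\varepsilon$ is the law of $\sqrt{\varepsilon}W$ with the same initial distribution. Under these assumptions, the Girsanov theorem gives
\begin{align*} \frac{\mathrm{d}\check{R}^{\varepsilon}}{\mathrm{d} R^{\varepsilon}}(\omega) = \exp\bigg\{{-}\int_0^1\nabla V(\omega(t))\cdot\mathrm{d}\omega(t) - \frac{\varepsilon}{2}\int_0^1|\nabla V(\omega(t))|^2\,\, \mathrm{d} t\bigg\}, \end{align*}
while Itô's formula under $R^\varepsilon$ gives
\begin{align*} \int_0^1\nabla V(\omega(t))\cdot\mathrm{d}\omega(t) = V(\omega(1)) - V(\omega(0)) - \frac{\varepsilon}{2}\int_0^1\Delta V(\omega(t))\,\, \mathrm{d} t. \end{align*}
Combining the two,
\begin{align*} \frac{\mathrm{d}\check{R}^{\varepsilon}}{\mathrm{d} R^{\varepsilon}}(\omega) = \exp\bigg\{{-}\big(V(\omega(1)) - V(\omega(0))\big) - \frac{\varepsilon}{2}\int_0^1\big(|\nabla V(\omega(t))|^2 - \Delta V(\omega(t))\big)\,\, \mathrm{d} t\bigg\}, \end{align*}
and conditioning on the endpoints absorbs the factor $\exp\{-(V(\omega(1)) - V(\omega(0)))\}$ into the normalizing constant.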
For the second item, by the Schrödinger system and Jensen’s inequality, we have
\begin{align*} \begin{split} |\check{\varphi}_\varepsilon(x) - \check{\varphi}_\varepsilon (x')| & \le \sup_{y \in \mathcal{Y}}|c_\varepsilon(x,y) - c_\varepsilon(x',y)| \\ & \le \sup_{y \in \mathcal{Y}}|c(x,y) - c(x',y)| + 2\sup_{\mathcal{X} \times \mathcal{Y}}|c_\varepsilon-c|. \end{split} \end{align*}
By the generalized Ascoli–Arzelà theorem (cf. [Reference Nutz and Wiesel36, Lemma 2.2]), the sequence of functions
$\{ \check{\varphi}_{\varepsilon_k} \}_{k \in \mathbb{N}}$
converges uniformly on
$\mathcal{X}$
along a subsequence. A similar result holds for
$\check{\psi}_{\varepsilon_k}$
. The rest of the proof is analogous to the second part of the proof of Lemma 3. This completes the proof.
Remark 11 (Derivation of (17).) Formally, the Radon–Nikodym derivative (17) follows by reducing to the
$\varepsilon=1$
case via reparameterization and [Reference Conforti12, (25)]. Indeed, the process
$Y^\varepsilon(t)=X^{\varepsilon}(t)/\sqrt{\varepsilon}$
satisfies
$\, \mathrm{d} Y^{\varepsilon}(t) = -\nabla V^\varepsilon(Y^\varepsilon(t))\,\, \mathrm{d} t + \, \mathrm{d} W(t)$
, where
$V^{\varepsilon}(x) = V(\sqrt{\varepsilon}x)$
. By [Reference Conforti12, (25)], denoting by
$Y^{\varepsilon}_{\#}\mathbb{P}$
the law of the process
$Y^\varepsilon = (Y^\varepsilon (t))_{t \in [0,1]}$
, we have
\begin{align*} \begin{split} \frac{\, \mathrm{d}(Y^{\varepsilon}_{\#}\mathbb{P})^{xy}}{\, \mathrm{d} R^{1,xy}}(\omega) & = Z_{xy}^{-1}\exp\bigg\{{-}\frac{1}{2}\int_0^1\big(|\nabla V^\varepsilon(\omega(t))|^2 - \Delta V^\varepsilon(\omega(t))\big)\,\, \mathrm{d} t \bigg\} \\ & = Z_{xy}^{-1}\exp\bigg\{{-}\frac{\varepsilon}{2} \int_0^1 \big(|\nabla V(\sqrt{\varepsilon}\omega(t))|^2 - \Delta V(\sqrt{\varepsilon}\omega(t))\big)\,\, \mathrm{d} t \bigg\}, \end{split} \end{align*}
where
$Z_{xy}$
is the normalizing constant. Now, the formula (17) follows by a simple reparameterization.
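The reparameterization can be made explicit as follows. This is a sketch, assuming that $R^{\varepsilon,xy}$ is the law of the bridge of $\sqrt{\varepsilon}W$ from x to y; the symbols $\tilde{\omega}$, $\tilde{x}$, and $\tilde{y}$ are introduced here for illustration only. Set $\tilde{\omega} = \omega/\sqrt{\varepsilon}$, $\tilde{x} = x/\sqrt{\varepsilon}$, and $\tilde{y} = y/\sqrt{\varepsilon}$. The map $\omega \mapsto \tilde{\omega}$ pushes $R^{\varepsilon,xy}$ forward to $R^{1,\tilde{x}\tilde{y}}$ and $\check{R}^{\varepsilon,xy}$ forward to $(Y^{\varepsilon}_{\#}\mathbb{P})^{\tilde{x}\tilde{y}}$, so that
\begin{align*} \begin{split} \frac{\mathrm{d}\check{R}^{\varepsilon,xy}}{\mathrm{d} R^{\varepsilon,xy}}(\omega) & = \frac{\mathrm{d}(Y^{\varepsilon}_{\#}\mathbb{P})^{\tilde{x}\tilde{y}}}{\mathrm{d} R^{1,\tilde{x}\tilde{y}}}(\tilde{\omega}) = Z_{\tilde{x}\tilde{y}}^{-1}\exp\bigg\{{-}\frac{\varepsilon}{2}\int_0^1\big(|\nabla V(\sqrt{\varepsilon}\tilde{\omega}(t))|^2 - \Delta V(\sqrt{\varepsilon}\tilde{\omega}(t))\big)\,\, \mathrm{d} t\bigg\} \\ & = Z_{\tilde{x}\tilde{y}}^{-1}\exp\bigg\{{-}\frac{\varepsilon}{2}\int_0^1\big(|\nabla V(\omega(t))|^2 - \Delta V(\omega(t))\big)\,\, \mathrm{d} t\bigg\}, \end{split} \end{align*}
which is the expected form of the right-hand side of (17), with $Z_{\varepsilon,xy} = Z_{\tilde{x}\tilde{y}}$.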
Acknowledgement
The author would like to thank the editor and two anonymous referees for their careful reading and constructive comments that helped improve the quality of this paper.
Funding information
K. Kato is partially supported by NSF grants DMS-1952306, DMS-2210368, and DMS-2413405.
Competing interests
There were no competing interests to declare which arose during the preparation or publication process of this article.