
Large deviations for dynamical Schrödinger problems

Published online by Cambridge University Press:  09 December 2025

Kengo Kato*
Affiliation:
Cornell University
*Postal address: CIS Building 304, Ithaca, NY 14853, USA. Email: kk976@cornell.edu

Abstract

We establish large deviations for dynamical Schrödinger problems driven by perturbed Brownian motions when the noise parameter tends to zero. Our results show that Schrödinger bridges charge exponentially small masses outside the support of the limiting law that agrees with the optimal solution to the dynamical Monge–Kantorovich optimal transport problem. Our proofs build on mixture representations of Schrödinger bridges and on establishing exponential continuity of Brownian bridges with respect to the initial and terminal points.


Type
Original Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution, and reproduction in any medium, provided the original work is properly cited.
Copyright
© The Author(s), 2025. Published by Cambridge University Press on behalf of Applied Probability Trust

1. Introduction

1.1. Overview

The dynamical Schrödinger problem [Reference Föllmer19, Reference Léonard27] seeks to find the entropic projection of a reference path measure (such as a Wiener measure) onto the space of path measures with given initial and terminal distributions. Originally motivated by physics, the problem has received increasing interest from other application domains such as statistics and machine learning; see [Reference Bernton, Heng, Doucet and Jacob4, Reference De Bortoli, Thornton, Heng and Doucet16, Reference Pavon, Trigila and Tabak38, Reference Stromme43] and references therein. From a purely mathematical point of view, the time marginal flow, called entropic interpolation, provides a powerful technique for deriving functional inequalities and analysis of metric measure spaces [Reference Boissard, Gozlan, Lehec, Léonard, Menz and Schlichting5, Reference Gentil, Léonard, Ripani and Tamanini20, Reference Gigli and Tamanini21], making the dynamical Schrödinger problem of intrinsic interest. Additionally, the static version of the Schrödinger problem is equivalent to quadratic entropic optimal transport (EOT) [Reference Nutz35], the analysis of which has seen extensive research activities. This is in particular due to EOT admitting efficient computation via Sinkhorn’s algorithm, which lends itself well to large-scale data analysis [Reference Cuturi14, Reference Peyré and Cuturi39].

Schrödinger problems can be interpreted as noisy counterparts of Monge–Kantorovich optimal transport (OT) problems. In particular, [Reference Léonard26, Reference Mikami31, Reference Mikami and Thieullen32] studied the rigorous connection between the two problems, establishing convergence of optimal solutions for dynamical Schrödinger problems (Schrödinger bridges) toward the dynamical OT problem when the noise level tends to zero. In this work, we study local rates of convergence of Schrödinger bridges toward the limiting law. Specifically, we establish large-deviation principles (LDPs) for Schrödinger bridges on a path space and characterize the rate function.

Our baseline setting goes as follows. Let $\mu_0,\mu_1$ be Borel probability measures on $\mathbb{R}^d$ with finite second moments that will be fixed throughout. Let E be the space of continuous maps $[0,1] \to \mathbb{R}^d$ endowed with the sup norm $\| \omega \|_E = \sup_{t \in [0,1]} |\omega (t)|$ for $\omega = (\omega (t))_{t \in [0,1]} \in E$ (we use $| \cdot |$ to denote the Euclidean norm). For a given $\varepsilon > 0$ (noise level), let $R^{\varepsilon}$ be the law, defined on the Borel $\sigma$ -field of E, of $\xi + \sqrt{\varepsilon}W$ , where $\xi \sim \mu_0$ and $W=(W(t))_{t \in [0,1]}$ is a standard Brownian motion starting at 0 independent of $\xi$ . For $s,t \in [0,1]$ , we denote the projections at t and (s, t) as $e_t$ and $e_{st}$ , respectively, i.e. $e_t(\omega) =\omega (t)$ and $e_{st} (\omega) = (\omega (s),\omega(t))$ for $\omega \in E$ . For a given Borel probability measure P on E, we write $P_t = P \circ e_t^{-1}$ and $P_{st} = P \circ e_{st}^{-1}$ . Given two endpoint marginals $\mu_0,\mu_1$ and a reference measure $R^{\varepsilon}$ , the dynamical Schrödinger problem reads as

\begin{equation}\min_{P: P_0=\mu_0,P_1=\mu_1} \mathcal{H}(P \mid R^\varepsilon), \tag{1}\end{equation}

where $\mathcal{H}(\cdot \mid \cdot)$ denotes the relative entropy (see Section 1.4 for the formal definition). Provided $\mu_1$ has finite entropy relative to the Lebesgue measure (cf. Remark 1), the problem in (1) admits a unique optimal solution $P^{\varepsilon}$ , called the Schrödinger bridge. The solution $P^{\varepsilon}$ is given by a mixture of Brownian bridges against a (unique) optimal solution $\pi_\varepsilon$ to the static Schrödinger problem

\begin{equation}\min_{\pi \in \Pi(\mu_0,\mu_1)} \mathcal{H}(\pi \mid R_{01}^\varepsilon),\tag{2}\end{equation}

where $\Pi(\mu_0,\mu_1)$ is the set of couplings with marginals $\mu_0$ and $\mu_1$ . The zero-noise limit ( $\varepsilon \downarrow 0$ ) of (2) corresponds to the OT problem with quadratic cost $c(x,y) = |x-y|^2/2$ ,

\begin{equation}\min_{\pi \in \Pi(\mu_0,\mu_1)} \int c \, \, \mathrm{d}\pi,\tag{3}\end{equation}

which admits a unique optimal solution (OT plan) $\pi_{\mathrm{o}}$ (as $\mu_1$ is assumed to be absolutely continuous; [Reference Brenier7]).

In his influential work [Reference Mikami31], Mikami proved, under an additional assumption that $\mu_0$ is absolutely continuous, that $P^{\varepsilon}$ converges weakly to the law $P^{\mathrm{o}}$ of the geodesic path connecting two random endpoints following $\pi_{\mathrm{o}}$ , $t \mapsto \sigma^{\xi_0,\xi_1}(t)$ for $\sigma^{xy}(t) = (1-t)x+ty$ and $(\xi_0,\xi_1) \sim \pi_{\mathrm{o}}$ , i.e. $P^{\mathrm{o}} = \int \delta_{\sigma^{xy}} \, \, \mathrm{d}\pi_{\mathrm{o}} (x,y)$ with $\delta_{\cdot}$ denoting the Dirac delta ([Reference Mikami31] indeed proved convergence with respect to Wasserstein $W_2$ distance). The limiting law $P^{\mathrm{o}}$ can be characterized as an optimal solution to the dynamical OT problem

\begin{align*}\min_{P: P_0=\mu_0, P_1=\mu_1} \int \bigg( \frac{1}{2}\int_{0}^1 |\dot{\omega}(t)|^2 \, \, \mathrm{d} t \bigg) \, \, \mathrm{d} P(\omega),\end{align*}

where $\dot{\omega}(t)$ denotes the time derivative of $\omega$ and $\int_{0}^1 |\dot{\omega}(t)|^2 \, \, \mathrm{d} t =\infty$ if $\omega$ is not absolutely continuous [Reference Léonard26]. The marginal laws of the limiting process give rise to a constant-speed geodesic (displacement interpolation; [Reference McCann29]) in the Wasserstein space connecting $\mu_0$ and $\mu_1$ .

Our main large-deviation results establish that (see Sections 1.4 and 2 for notation and definitions), under regularity conditions, for any sequence $\varepsilon_k \downarrow 0$, the Schrödinger bridges $P^{\varepsilon_k}$ satisfy an LDP with rate function $I(h) = \int_0^1(|\dot{h}(t)|^2/2) \, \, \mathrm{d} t - \psi^c(h(0)) - \psi(h(1))$, where $\psi$ is an OT (or Kantorovich) potential from $\mu_1$ to $\mu_0$ and $\psi^c$ is its c-transform (the rate function I is set to $\infty$ if h(0) or h(1) is outside the support of $\mu_0$ or $\mu_1$, respectively). Very roughly, this means $P^{\varepsilon_k}(A) \approx \exp\{-\varepsilon_k^{-1} \inf_{h \in A}I(h)\}$ for large k. The rate function I(h) vanishes as soon as $h \in \Sigma_{\pi_{\mathrm{o}}} \,:\!=\, \{ \sigma^{xy} \colon(x,y) \in \mathrm{spt}(\pi_{\mathrm{o}}) \}$, which agrees with the support of $P^{\mathrm{o}}$, but I(h) is positive outside $\Sigma_{\pi_{\mathrm{o}}}$ in many cases. Effectively, our result implies that the Schrödinger bridges $P^{\varepsilon}$ charge exponentially small masses outside the support of the limiting law $P^{\mathrm{o}}$. Precisely, we establish a weak-type LDP under uniqueness of OT potentials, which allows for marginals with unbounded supports, and which yields a full LDP when $\mu_0,\mu_1$ are compactly supported.

The proof of the main theorem relies on the expression of $P^{\varepsilon}$ as a $\pi_\varepsilon$-mixture of Brownian bridges. The main ingredient of the proof is the exponential continuity [Reference Dinwoodie and Zabell18] of Brownian bridges, i.e. large-deviation upper and lower bounds for Brownian bridges when the locations of the initial and terminal points vary with the noise level. Note that an LDP for Brownian bridges with fixed initial and terminal points was derived in [Reference Hsu23], but Hsu’s proof, which relies on transition density estimates, seems difficult to adapt to establishing the exponential continuity. Instead, we use techniques from abstract Wiener spaces (cf. [Reference Stroock45, Chapter 8]) to establish the said result. Given the exponential continuity, the main theorem follows by combining it with the large-deviation results for $\pi_\varepsilon$ established in [Reference Bernton, Ghosal and Nutz3]. For the compact support case, we provide a more direct proof of the full LDP using the representation of $P^{\varepsilon}$ as an integral of a $(\mu_0 \otimes \mu_1)$-mixture of Brownian bridges. The proof first shows an LDP for the $(\mu_0 \otimes \mu_1)$-mixture of Brownian bridges, and then establishes the full LDP by adapting the (Laplace–)Varadhan lemma (cf. [Reference Dembo and Zeitouni17, Theorem 4.4.2]) and using the convergence of EOT (or Schrödinger) potentials. The alternative proof can be easily adapted to establish an LDP for the dynamical Schrödinger problem with Langevin diffusion as a reference measure when the two marginals are compactly supported; cf. Remark 10.

1.2. Literature review

The literature related to this paper is broad, so we confine ourselves to the references directly related to our work. The most closely related are [Reference Bernton, Ghosal and Nutz3, Reference Nutz and Wiesel36], which established large deviations for static Schrödinger problems in fairly general settings, allowing for marginals on a general Polish space and general continuous costs, and our proofs use several results from their work. [Reference Bernton, Ghosal and Nutz3] derived a weak LDP for EOT via a novel cyclical invariance characterization of EOT plans, while [Reference Nutz and Wiesel36] built on convergence of EOT potentials.

The connection between Schrödinger and OT problems has been one of the central problems in the OT literature. We focus here on convergence of Schrödinger problems. The pioneering works in this direction are [Reference Léonard26, Reference Mikami31, Reference Mikami and Thieullen32]. Mikami’s proof in [Reference Mikami31] relies on the fact that the Schrödinger bridge $P^{\varepsilon}$ corresponds to a weak solution of a certain stochastic differential equation (SDE) with diffusion component $\sqrt{\varepsilon} \, \, \mathrm{d} W(t)$, the special case of which is often referred to as the Föllmer process [Reference Lehec25, Reference Mikulincer and Shenfeld33]; see Remark 3. Because the drift function of this SDE depends on $\varepsilon$ in a nontrivial way (among other difficulties), the problem of large deviations for dynamical Schrödinger problems falls outside the realm of the Freidlin–Wentzell theory (cf. [Reference Dembo and Zeitouni17, Chapter 5]). On the other hand, Léonard’s proof in [Reference Léonard26] relies on the variational representation of the relative entropy and convex analysis techniques to establish $\Gamma$-convergence of the Schrödinger objective functions, which yields convergence of the optimal solutions. Arguably, recent interest in EOT (static Schrödinger problem) stems from the fact that EOT provides an efficient computational means for unregularized OT [Reference Cuturi14, Reference Peyré and Cuturi39]. From this perspective, extensive research has been done on convergence and speed of convergence of EOT costs, potentials, plans, and maps toward those of unregularized OT [Reference Altschuler, Niles-Weed and Stromme1, Reference Carlier, Duval, Peyré and Schmitzer8, Reference Carlier, Pegon and Tamanini9, Reference Chizat, Roussillon, Léger, Vialard and Peyré11, Reference Conforti and Tamanini13, Reference Nutz and Wiesel36, Reference Pal37, Reference Pooladian and Niles-Weed40].

To the best of the author’s knowledge, this is the first paper to establish large deviations for dynamical Schrödinger problems. As noted in the beginning, the dynamical aspect of the Schrödinger bridge has received increasing interest from application domains, which calls for further research on this subject. Our results contribute to the rigorous understanding of the connection between the dynamical Schrödinger and OT problems in the small-noise regime. From a technical perspective, our use of mixture representations to explore large deviations on path spaces might be applied to other problems. Finally, in this work we focus on the Wiener reference measure that corresponds to the quadratic OT problem. Arguably, this setting would be the most basic. Extending our large-deviation results to the dynamical problem in abstract metric spaces [Reference Monsaingeon, Tamanini and Vorotnikov34] would be of interest, but beyond the scope of this paper.

1.3. Organization

The rest of the paper is organized as follows. Section 2 contains background on EOT, Schrödinger, and OT problems, and Section 3 presents the main results. All the proofs are gathered in Section 4.

1.4. Notation and definitions

Let $x \cdot y$ denote the Euclidean inner product for $x,y \in \mathbb{R}^d$ . For $x,y \in \mathbb{R}^d$ and a Borel probability measure P on E, let $P^{xy}$ denote the (regular) conditional law of X given $(X(0),X(1)) = (x,y)$ for $X=(X(t))_{t \in [0,1]} \sim P$ . For a set A, let $\iota_{A}(x) = 0$ if $x\in A$ and $=\infty$ if $x \notin A$ . On a metric space M, let $B_M(x,r)$ denote the open ball in M with center x and radius r. For a Borel probability measure $\mu$ on a metric space, its support is denoted by $\mathrm{spt}(\mu)$ . For probability measures $\alpha,\beta$ on a common measurable space, $\mathcal{H}(\alpha \mid \beta)$ is the relative entropy defined as

\begin{align*} \mathcal{H}(\alpha \mid \beta) \,:\!=\, \begin{cases} \displaystyle\int \log \dfrac{\, \mathrm{d}\alpha}{\, \mathrm{d}\beta} \, \, \mathrm{d}\alpha & \text{if} \ \alpha \ll \beta, \\ \infty & \text{otherwise}. \end{cases}\end{align*}

A lower semicontinuous function $I\colon M \to [0,\infty]$ defined on a metric space M is called a rate function. The rate function I is called good if all level sets $\{ x \colon I(x) \le \alpha \}$ for $\alpha \in [0,\infty)$ are compact. Given a sequence of positive reals $a_k \to \infty$ , a sequence of Borel probability measures $\{ P_k \}_{k \in \mathbb{N}}$ on M satisfies a weak large-deviation principle with speed $a_k$ and rate function I if

  (i) for every open set $A \subset M$, $\liminf_{k\to\infty}a_k^{-1}\log P_k(A) \ge -\inf_{x\in A}I(x)$;

  (ii) for every compact set $A \subset M$, $\limsup_{k\to\infty}a_k^{-1}\log P_k(A) \le -\inf_{x\in A}I(x)$.

If condition (ii) holds for every closed set $A \subset M$ , then we say that $\{ P_k \}_{k \in \mathbb{N}}$ satisfies a (full) LDP. We refer the reader to [Reference Dembo and Zeitouni17] as an excellent reference on large deviations.
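
As a minimal numerical illustration of this definition (an example chosen here, not taken from the paper), consider on $M=\mathbb{R}$ the laws of $\sqrt{\varepsilon}Z$ with Z standard normal. These satisfy an LDP with speed $\varepsilon^{-1}$ and good rate function $I(x)=x^2/2$, so for the closed set $A=[a,\infty)$ with $a>0$ one expects $\varepsilon\log\mathbb{P}(\sqrt{\varepsilon}Z\ge a)\to -a^2/2$ as $\varepsilon\downarrow 0$. The helper name `scaled_log_tail` is hypothetical.

```python
import math


def scaled_log_tail(eps: float, a: float = 1.0) -> float:
    """Return eps * log P(sqrt(eps) * Z >= a) for Z ~ N(0, 1)."""
    # Gaussian tail: P(Z >= x) = erfc(x / sqrt(2)) / 2 with x = a / sqrt(eps).
    tail = 0.5 * math.erfc(a / math.sqrt(2.0 * eps))
    return eps * math.log(tail)


if __name__ == "__main__":
    for eps in (1e-1, 1e-2, 1e-3):
        # The values approach -inf_{x >= a} x^2 / 2 = -a^2 / 2 as eps decreases.
        print(eps, scaled_log_tail(eps))
```

The gap between the printed values and $-a^2/2$ comes from the sub-exponential prefactor of the Gaussian tail, which the logarithmic scaling of an LDP ignores.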

2. Preliminaries

2.1. From EOT to Schrödinger problems

We first review EOT and its connection to the Schrödinger problems, which will play a key role in the proofs of the main results. Proofs of the results below can be found in [Reference Léonard27] or [Reference Nutz and Wiesel36]. Throughout, we set $\mathcal{X} = \mathrm{spt}(\mu_0)$ and $\mathcal{Y}= \mathrm{spt}(\mu_1)$ .

Given marginals $\mu_0,\mu_1$ , the EOT problem for quadratic cost $c(x,y) = |x-y|^2/2$ reads as

\begin{equation} \min_{\pi\in\Pi(\mu_0,\mu_1)}\int c\,\, \mathrm{d}\pi + \varepsilon\mathcal{H}(\pi\mid\mu_0\otimes\mu_1) = \min_{\pi\in\Pi(\mu_0,\mu_1)}\varepsilon\bigg(\int(c/\varepsilon)\,\, \mathrm{d}\pi + \mathcal{H}(\pi\mid\mu_0\otimes\mu_1)\bigg). \tag{4}\end{equation}

Setting $\, \mathrm{d}\nu_\varepsilon=Z_\varepsilon^{-1}\mathrm{e}^{-c/\varepsilon}\,\, \mathrm{d}(\mu_0\otimes\mu_1)$ with $Z_{\varepsilon}= \int\mathrm{e}^{-c/\varepsilon}\,\, \mathrm{d}(\mu_0\otimes\mu_1)$ , we have

\begin{align*}\int (c/\varepsilon) \, \, \mathrm{d}\pi + \mathcal{H}(\pi \mid \mu_0 \otimes \mu_1)=\mathcal{H}(\pi \mid \nu_{\varepsilon}) - \log Z_{\varepsilon},\end{align*}

which implies that (4) is equivalent to the static Schrödinger problem

\begin{equation} \min_{\pi\in\Pi(\mu_0,\mu_1)}\mathcal{H}(\pi\mid\nu_{\varepsilon}). \tag{5}\end{equation}

Recall that $\Pi (\mu_0,\mu_1)$ is compact for the weak topology. Since $\pi \mapsto \mathcal{H}(\pi \mid \nu_\varepsilon)$ is lower semicontinuous with respect to (w.r.t.) the weak topology (which follows from the variational representation of the relative entropy) and strictly convex on the set of $\pi$ such that $\mathcal{H}(\pi \mid \nu_\varepsilon)$ is finite (which follows from the strict convexity of $x \mapsto x\log x$ ), the problem in (5) admits a unique optimal solution $\pi_{\varepsilon}$ , provided $\mathcal{H}(\pi \mid \nu_{\varepsilon}) < \infty$ for some $\pi \in \Pi (\mu_0,\mu_1)$ . Since $\mu_0$ and $\mu_1$ have finite second moments, we have $\mathcal{H}(\mu_0 \otimes \mu_1 \mid \nu_{\varepsilon}) < \infty$ . We will call $\pi_{\varepsilon}$ the EOT plan.

The EOT plan has a density w.r.t. $\mu_0 \otimes \mu_1$ given by

\begin{align*} \, \mathrm{d}\pi_{\varepsilon}(x,y) = \mathrm{e}^{(\varphi_{\varepsilon}(x)+\psi_{\varepsilon}(y)-c(x,y))/\varepsilon}\,\, \mathrm{d}(\mu_0 \otimes \mu_1)(x,y),\end{align*}

where $\varphi_{\varepsilon} \in L^1(\mu_0)$ and $\psi_{\varepsilon} \in L^1(\mu_1) $ are EOT potentials satisfying the Schrödinger system

\begin{equation} \begin{cases} \displaystyle\int \mathrm{e}^{(\varphi_{\varepsilon} (x) + \psi_{\varepsilon} (y) - c(x,y))/\varepsilon} \, \, \mathrm{d}\mu_1 (y)= 1, & \text{$\mu_0$-almost every }{x}, \\ \displaystyle\int \mathrm{e}^{(\varphi_{\varepsilon}(x) + \psi_{\varepsilon}(y) - c(x,y))/\varepsilon} \, \, \mathrm{d}\mu_0(x) = 1, & \text{$\mu_1$-almost every }{y}. \end{cases} \tag{6}\end{equation}

EOT potentials are almost surely (a.s.) unique up to additive constants, i.e. if $(\tilde{\varphi}_{\varepsilon},\tilde{\psi}_{\varepsilon})$ is another pair of EOT potentials, then there exists a constant $a \in \mathbb{R}$ such that $\tilde{\varphi}_{\varepsilon} = \varphi_{\varepsilon} + a$ $\mu_0$ -almost everywhere (a.e.) and $\tilde{\psi}_{\varepsilon} = \psi_{\varepsilon} - a$ $\mu_1$ -a.e. In many cases (e.g. as soon as $\mu_0,\mu_1$ are sub-Gaussian), we can choose versions of (finite) EOT potentials for which the Schrödinger system (6) holds for all $x \in \mathcal{X}$ and $y \in \mathcal{Y}$ (in fact for all $x \in \mathbb{R}^d$ and $y \in \mathbb{R}^d$ ); see [Reference Mena and Niles-Weed30, Proposition 6]. Whenever possible, we always choose such versions of EOT potentials.
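
For finitely supported marginals, the Schrödinger system (6) can be solved approximately by alternating its two equations, which is Sinkhorn's algorithm mentioned in Section 1.1. The following is a minimal sketch under assumptions that are not part of the paper: the marginals are discrete weight vectors, the iteration count is fixed in advance, and the function name `sinkhorn_potentials` is hypothetical.

```python
import numpy as np


def sinkhorn_potentials(x, y, mu0, mu1, eps, n_iter=500):
    """Log-domain Sinkhorn iterations for c(x, y) = |x - y|^2 / 2.

    x: (n, d) support points of mu0, y: (m, d) support points of mu1,
    mu0: (n,) weights, mu1: (m,) weights. Returns (phi, psi, plan).
    """
    cost = 0.5 * ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)
    log_mu0, log_mu1 = np.log(mu0), np.log(mu1)
    phi, psi = np.zeros(len(mu0)), np.zeros(len(mu1))

    def soft_min(values, log_weights, axis):
        # -eps * log sum_j w_j exp(-values_j / eps), computed stably.
        z = log_weights - values / eps
        z_max = z.max(axis=axis, keepdims=True)
        lse = z_max.squeeze(axis) + np.log(np.exp(z - z_max).sum(axis=axis))
        return -eps * lse

    for _ in range(n_iter):
        # First equation of the system: integrate out y for each x.
        phi = soft_min(cost - psi[None, :], log_mu1[None, :], axis=1)
        # Second equation of the system: integrate out x for each y.
        psi = soft_min(cost - phi[:, None], log_mu0[:, None], axis=0)

    # Density of the plan w.r.t. the product measure, times the weights.
    density = np.exp((phi[:, None] + psi[None, :] - cost) / eps)
    plan = density * mu0[:, None] * mu1[None, :]
    return phi, psi, plan
```

Because the last update is the one for `psi`, the second marginal of `plan` matches `mu1` up to rounding, while the first marginal matches `mu0` only up to the convergence error of the iteration.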

To link EOT to the original static Schrödinger problem (2), we make the following assumption.

Assumption 1. $\mu_1 \ll \, \mathrm{d} y$ and $\mathcal{H}(\mu_1 \mid \, \mathrm{d} y) < \infty$ .

Remark 1. (On the relative entropy $\mathcal{H}(\mu_1\mid\, \mathrm{d} y)$ .) Here, as in [Reference Léonard27, Appendix A], we define the relative entropy $\mathcal{H}(\mu_1\mid\, \mathrm{d} y)$ against the Lebesgue measure $\, \mathrm{d} y$ given by

\begin{align*} \mathcal{H}(\mu_1 \mid \, \mathrm{d} y) \,:\!=\, \int \log (\rho/\mathfrak{g}) \, \, \mathrm{d}\mu_1 + \int (\log \mathfrak{g}) \, \, \mathrm{d}\mu_1 \in (-\infty,\infty], \end{align*}

where $\rho = \, \mathrm{d}\mu_1/\, \mathrm{d} y$ and $\mathfrak{g}$ is the standard Gaussian density on $\mathbb{R}^d$ .

The reference measure $R_{01}^{\varepsilon} = R^{\varepsilon} \circ e_{01}^{-1}$ for (2) has a density w.r.t. $\, \mathrm{d} y\,\, \mathrm{d}\mu_0(x)$ given by $\, \mathrm{d} R_{01}^\varepsilon(x,y) = (2\pi\varepsilon)^{-d/2}\mathrm{e}^{-c(x,y)/\varepsilon}\,\, \mathrm{d} y\, \mathrm{d}\mu_0(x)$ , so $\nu_{\varepsilon}$ is absolutely continuous w.r.t. $R_{01}^\varepsilon$ with density $\, \mathrm{d}\nu_{\varepsilon}(x,y)=(2\pi\varepsilon)^{d/2} Z_{\varepsilon}^{-1} \rho(y) \, \, \mathrm{d} R^{\varepsilon}_{01}(x,y)$ . Hence,

\begin{align*} \mathcal{H}(\pi\mid\nu_{\varepsilon}) = \mathcal{H}(\pi \mid R^{\varepsilon}_{01}) - \frac{d}{2}\log(2\pi\varepsilon) + \log Z_{\varepsilon} -\mathcal{H}(\mu_1\mid\, \mathrm{d} y),\end{align*}

and the unique optimal solution to (2) is given by $\pi_{\varepsilon}$ .

Going back to the dynamical Schrödinger problem (1), by the chain rule for the relative entropy, we have $\mathcal{H}(P \mid R^{\varepsilon}) = \mathcal{H}(P_{01} \mid R_{01}^{\varepsilon}) +\int\mathcal{H}(P^{xy} \mid R^{\varepsilon,xy})\,\, \mathrm{d} P_{01}(x,y)$ , which is minimized by taking $P^{xy} = R^{\varepsilon,xy}$ and $P_{01}=\pi_\varepsilon$ , i.e.,

\begin{equation} P^{\varepsilon}(\!\cdot\!) = \int R^{\varepsilon,xy}(\!\cdot\!)\,\, \mathrm{d}\pi_{\varepsilon}(x,y) = \int\mathrm{e}^{(\varphi_{\varepsilon}(x)+\psi_{\varepsilon}(y)-c(x,y))/\varepsilon}R^{\varepsilon,xy}(\!\cdot\!)\,\, \mathrm{d}(\mu_0\otimes\mu_1)(x,y).\tag{7}\end{equation}

Alternatively, setting $\bar{R}^{\varepsilon} = \int R^{\varepsilon,xy} \, \, \mathrm{d}(\mu_0 \otimes \mu_1)$ , which is a $(\mu_0 \otimes \mu_1)$ -mixture of Brownian bridges, $P^\varepsilon$ has a density w.r.t. $\bar{R}^\varepsilon$ given by

\begin{equation} \frac{\, \mathrm{d} P^{\varepsilon}}{\, \mathrm{d}\bar{R}^{\varepsilon}}(\omega) = \mathrm{e}^{-\phi_{\varepsilon}(\omega(0),\omega(1))/\varepsilon}, \quad \omega = (\omega(t))_{t \in [0,1]} \in E, \tag{8}\end{equation}

where $\phi_{\varepsilon}\colon \mathcal{X} \times \mathcal{Y} \rightarrow \mathbb{R}$ is a function defined by $\phi_{\varepsilon}(x,y) = c(x,y) - \varphi_{\varepsilon}(x) - \psi_{\varepsilon}(y)$ . To see this, for $X = (X(t))_{t \in [0,1]} \sim \bar{R}^{\varepsilon}$ and every Borel set $A \subset E$ ,

\begin{align*}\begin{split} \mathbb{E}[\mathbf{1}_{A}(X)\mathrm{e}^{-\phi_{\varepsilon}(X(0),X(1))/\varepsilon}] & = \mathbb{E}[\mathbb{P}(X \in A \mid X(0),X(1))\mathrm{e}^{-\phi_{\varepsilon}(X(0),X(1))/\varepsilon}] \\ & = \mathbb{E}[R^{\varepsilon,(X(0),X(1))}(A)\mathrm{e}^{-\phi_{\varepsilon}(X(0),X(1))/\varepsilon}] = P^{\varepsilon}(A),\end{split}\end{align*}

where we used $(X(0),X(1)) \sim \mu_0 \otimes \mu_1$ .
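
The mixture representation (7) suggests a two-step sampling scheme: draw endpoints (x, y) from $\pi_\varepsilon$, then draw a Brownian bridge with noise level $\varepsilon$ between them. The sketch below assumes a discrete coupling given as a weight matrix and a uniform time grid, and it uses the standard representation of $R^{\varepsilon,xy}$ as the law of $\sigma^{xy}(t) + \sqrt{\varepsilon}(W(t) - tW(1))$ with $\sigma^{xy}(t) = (1-t)x+ty$. The function names `brownian_bridge` and `sample_schrodinger_bridge` are hypothetical.

```python
import numpy as np


def brownian_bridge(x, y, eps, n_steps, rng):
    """Sample sigma^{xy}(t) + sqrt(eps) * (W(t) - t * W(1)) on a uniform grid."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    t = np.linspace(0.0, 1.0, n_steps + 1)[:, None]
    # Brownian increments over steps of length 1 / n_steps, with W(0) = 0.
    dw = rng.normal(scale=np.sqrt(1.0 / n_steps), size=(n_steps, x.size))
    w = np.vstack([np.zeros((1, x.size)), np.cumsum(dw, axis=0)])
    return (1.0 - t) * x + t * y + np.sqrt(eps) * (w - t * w[-1])


def sample_schrodinger_bridge(xs, ys, plan, eps, n_steps, rng):
    """Draw endpoints from the discrete coupling `plan`, then bridge them."""
    flat = rng.choice(plan.size, p=plan.ravel() / plan.sum())
    i, j = np.unravel_index(flat, plan.shape)
    return brownian_bridge(xs[i], ys[j], eps, n_steps, rng)
```

Each returned array has shape `(n_steps + 1, d)` and starts and ends exactly at the sampled endpoints, since the bridge term vanishes at t = 0 and t = 1.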

Remark 2. (On Assumption 1.) Assumption 1 is unavoidable to ensure the problem in (2) has a unique optimal solution. On the other hand, the initial distribution $\mu_0$ need not be absolutely continuous, e.g. $\mu_0$ can be discrete.

Remark 3. (Connection to Föllmer processes.) The Schrödinger bridge $P^{\varepsilon}$ corresponds to the law of a weak solution to a certain SDE, the special case of which is often referred to as the Föllmer process. Let $\mathcal{B}(E)$ be the Borel $\sigma$-field on E. Equip $(E,\mathcal{B}(E),R^{\varepsilon})$ with the canonical filtration (augmented, if necessary), and denote by $X=(X(t))_{t \in [0,1]}$ the canonical process, i.e. $X(t,\omega) = \omega(t)$ for $\omega = (\omega(t))_{t \in [0,1]} \in E$. Under $R^\varepsilon$, $W = \varepsilon^{-1/2} (X-X(0))$ is a standard Brownian motion starting at 0. Recalling that $\rho=\, \mathrm{d}\mu_1/\, \mathrm{d} y$, we set $\tilde{\psi}_\varepsilon(y) \,:\!=\, \varepsilon((d/2)\log(2\pi\varepsilon)+\log\rho(y)) + \psi_{\varepsilon}(y)$. With this notation, it can be seen that

\begin{align*} P^\varepsilon(\!\cdot\!) = \int\mathrm{e}^{(\varphi_{\varepsilon}(x)+\tilde{\psi}_\varepsilon(y))/\varepsilon}R^{\varepsilon,xy}(\!\cdot\!)\, \, \mathrm{d} R_{01}^{\varepsilon}(x,y), \end{align*}

which implies (cf. the preceding argument) that

\begin{align*} \frac{\, \mathrm{d} P^{\varepsilon}}{\, \mathrm{d} R^{\varepsilon}} = \mathrm{e}^{(\varphi_{\varepsilon}(X(0))+\tilde{\psi}_\varepsilon(X(1)))/\varepsilon}. \end{align*}

We write

\begin{align*} \mathfrak{h}_\varepsilon (t,y) \,:\!=\, \begin{cases} (2\pi\varepsilon(1-t))^{-d/2}\displaystyle\int \exp\bigg\{{-}\dfrac{1}{\varepsilon}\bigg(\frac{c(y,y')}{1-t}-\tilde{\psi}_{\varepsilon}(y')\bigg)\bigg\}\, \, \mathrm{d} y' & \text{if} \ t \in [0,1), \\ \mathrm{e}^{\tilde{\psi}_{\varepsilon}(y)/\varepsilon} & \text{if} \ t = 1, \end{cases} \end{align*}

which satisfies $(\partial_t + \varepsilon\Delta_y/2)\mathfrak{h}_\varepsilon = 0$ under regularity conditions (cf. the heat equation). Applying Itô’s formula (cf. [Reference Karatzas and Shreve24, Theorem 3.3.6]), we have

\begin{align*} \log\mathfrak{h}_{\varepsilon}(1,X(1)) = \underbrace{\log\mathfrak{h}_{\varepsilon}(0,X(0))}_{=-\varphi_{\varepsilon}(X(0))/\varepsilon} + \frac{1}{\sqrt{\varepsilon}\,}\int_0^1 b_\varepsilon(t,X(t))\cdot\,\, \mathrm{d} W(t) - \frac{1}{2\varepsilon}\int_0^1|b_\varepsilon(t,X(t))|^2\,\, \mathrm{d} t, \end{align*}

where we define $b_\varepsilon (t,y) = \varepsilon \nabla_y \log \mathfrak{h}_\varepsilon (t,y)$ . We conclude that, $R^\varepsilon$ -a.s.,

\begin{align*} \frac{\, \mathrm{d} P^{\varepsilon}}{\, \mathrm{d} R^{\varepsilon}} = \exp\bigg\{\frac{1}{\sqrt{\varepsilon}\,}\int_0^1 b_\varepsilon(t,X(t))\cdot\,\, \mathrm{d} W(t) - \frac{1}{2\varepsilon}\int_0^1|b_\varepsilon(t,X(t))|^2\,\, \mathrm{d} t\bigg\}. \end{align*}

By Girsanov’s theorem, under $P^{\varepsilon}$ the new process $\tilde{W} = W - \varepsilon^{-1/2}\int_0^\cdot b_\varepsilon(t,X(t))\,\, \mathrm{d} t$ is a standard Brownian motion, and the process X solves the SDE

\begin{equation} \, \mathrm{d} X(t) = b_{\varepsilon}(t,X(t))\,\, \mathrm{d} t + \sqrt{\varepsilon}\,\, \mathrm{d}\tilde{W}(t), \quad X(0) \sim \mu_0 \tag{9}\end{equation}

(see, e.g., [Reference Karatzas and Shreve24, Proposition 5.3.6]; see also [Reference Dai Pra15]). When $\varepsilon=1$ and $\mu_0 = \delta_0$ , we can take $\varphi_{1}(x) = 0$ and $\psi_{1}(y) = |y|^2/2$ , so $\mathrm{e}^{\tilde{\psi}_{1}}$ is the density of $\mu_1$ w.r.t. the standard Gaussian. Hence, the SDE in (9) corresponds to the Föllmer process in [Reference Lehec25, (12)] and [Reference Mikulincer and Shenfeld33]. Abusing terminology, we call $P^{\varepsilon}$ with $\varepsilon > 0$ and $\mu_0 = \delta_0$ a (perturbed) Föllmer process.
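
To make the SDE (9) concrete, consider the Föllmer case $\varepsilon=1$, $\mu_0=\delta_0$ in dimension $d=1$ with Gaussian target $\mu_1 = N(0,s^2)$; this example is chosen here for illustration and is not taken from the paper. In that case $\mathfrak{h}_1(t,y) = \mathbb{E}[f(y+\sqrt{1-t}\,Z)]$ with Z standard normal and $f \propto \mathrm{e}^{\alpha y^2/2}$, $\alpha = 1-s^{-2}$, the density of $\mu_1$ w.r.t. the standard Gaussian, and a direct Gaussian integral gives the linear drift $b_1(t,y) = ky/(1+kt)$ with $k = s^2-1$. The Euler-Maruyama sketch below (with the hypothetical function name `simulate_follmer_gaussian`) simulates this SDE; the variance of X(t) under the exact dynamics is $t(1+kt)$, which equals $s^2$ at $t=1$.

```python
import numpy as np


def simulate_follmer_gaussian(s2, n_paths, n_steps, rng):
    """Euler-Maruyama for dX = b(t, X) dt + dW, X(0) = 0, b(t, y) = k y / (1 + k t)."""
    k = s2 - 1.0
    dt = 1.0 / n_steps
    x = np.zeros(n_paths)
    for step in range(n_steps):
        t = step * dt
        drift = k * x / (1.0 + k * t)
        # One Euler-Maruyama step with an independent Gaussian increment per path.
        x = x + drift * dt + np.sqrt(dt) * rng.normal(size=n_paths)
    return x  # approximate samples of X(1)
```

The empirical variance of the returned samples should be close to `s2`, up to Monte Carlo and time-discretization error.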

2.2. OT potentials

The rate function for Schrödinger bridges involves OT potentials. For duality theory of OT, we refer the reader to [Reference Ambrosio, Gigli and Savare2, Reference Santambrogio41, Reference Villani47]. The OT problem (3) admits a dual problem that reads as

\begin{equation} \max_{\substack{(\varphi,\psi) \in L^1(\mu_0) \times L^1(\mu_1) \\ \varphi + \psi \le c}} \int\varphi\,\, \mathrm{d}\mu_0 + \int\psi\,\, \mathrm{d}\mu_1. \tag{10}\end{equation}

By restricting to the respective support, it is assumed without loss of generality that $\varphi$ and $\psi$ are functions defined on $\mathcal{X}$ and $\mathcal{Y}$ , respectively. One of $\varphi$ and $\psi$ can be replaced with the c-transform of the other. Recall that the c-transform of $\psi\colon \mathcal{Y} \rightarrow [\!-\infty,\infty)$ with $\psi \not \equiv -\infty$ is a function $\psi^c\colon \mathcal{X} \rightarrow [\!-\infty,\infty)$ defined by $\psi^c (x) \,:\!=\, \inf_{y \in \mathcal{Y}} \{ c(x,y) - \psi (y) \}$ , $x \in \mathcal{X}$ . The c-transform of $\varphi\colon \mathcal{X} \rightarrow [\!-\infty,\infty)$ with $\varphi \not \equiv -\infty$ is defined analogously. The dual problem (10) then reduces to

\begin{equation} \max_{\psi \in L^1(\mu_1)} \int \psi^c \, \, \mathrm{d}\mu_0 + \int \psi \, \, \mathrm{d}\mu_1,\tag{11}\end{equation}

whose maximum is attained at some c-concave function $\psi \in L^1(\mu_1)$ with $\psi^c \in L^1(\mu_0)$ (a function on $\mathcal{Y}$ is called c-concave if it is the c-transform of a function on $\mathcal{X}$ ); see, e.g., [Reference Villani47, Theorem 5.9] or [Reference Ambrosio, Gigli and Savare2, Theorem 6.1.5]. We call such a $\psi$ an OT potential from $\mu_1$ to $\mu_0$ . An OT potential from $\mu_0$ to $\mu_1$ is defined analogously.
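
The c-transform is an infimum and can be approximated on a grid. The following is a minimal sketch, assuming $d=1$, a finite grid standing in for $\mathcal{Y}$, and the hypothetical function name `c_transform`. The test function $\psi(y) = my - m^2/2$ is chosen for illustration because its c-transform is available in closed form: the infimum is attained at $y = x+m$, which gives $\psi^c(x) = -mx$.

```python
import numpy as np


def c_transform(psi_vals, x_grid, y_grid):
    """Approximate psi^c(x) = inf_y { |x - y|^2 / 2 - psi(y) } over a grid of y."""
    cost = 0.5 * (x_grid[:, None] - y_grid[None, :]) ** 2
    return (cost - psi_vals[None, :]).min(axis=1)


m = 1.0
y_grid = np.linspace(-10.0, 10.0, 2001)
x_grid = np.linspace(-2.0, 2.0, 9)
psi_vals = m * y_grid - 0.5 * m**2      # psi(y) = m y - m^2 / 2
psi_c = c_transform(psi_vals, x_grid, y_grid)  # close to -m * x_grid
```

By construction, `psi_c[i] + psi_vals[j]` never exceeds the cost between `x_grid[i]` and `y_grid[j]`, which is the discrete form of the constraint $\varphi + \psi \le c$ in (10).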

For any OT potential $\psi$ and any OT plan $\pi$ , the support of $\pi$ is contained in the c-superdifferential $\partial^c \psi$ of $\psi$ , $\partial^c\psi \,:\!=\, \{(x,y) \colon \psi^c(x) + \psi(y) = c(x,y)\}$ . Indeed, $\partial^c \psi$ is a closed set (as c-concave functions are upper semicontinuous) on which $\pi$ has full measure by duality, so $\mathrm{spt}(\pi)\subset \partial^c \psi$ . In particular, for $(x,y) \in \mathrm{spt}(\pi)$ , $\psi^c(x)$ and $\psi(y)$ are finite.
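
The inclusion $\mathrm{spt}(\pi)\subset\partial^c\psi$ also explains why the rate function $I(h) = \int_0^1(|\dot{h}(t)|^2/2)\,\mathrm{d}t - \psi^c(h(0)) - \psi(h(1))$ from Section 1.1 is nonnegative and vanishes on straight lines joining points of $\mathrm{spt}(\pi_{\mathrm{o}})$. The short computation below is a sketch added for orientation; it uses only the Cauchy–Schwarz inequality and the definition of the c-transform, for an absolutely continuous h with $h(0)=x\in\mathcal{X}$ and $h(1)=y\in\mathcal{Y}$.

```latex
\begin{align*}
I(h)
  &= \frac{1}{2}\int_0^1 |\dot{h}(t)|^2 \,\mathrm{d}t - \psi^c(x) - \psi(y) \\
  &\ge \frac{1}{2}\bigg|\int_0^1 \dot{h}(t)\,\mathrm{d}t\bigg|^2 - \psi^c(x) - \psi(y)
   = c(x,y) - \psi^c(x) - \psi(y) \ge 0 .
\end{align*}
```

Equality holds in the first inequality if and only if $\dot{h}$ is a.e. constant, i.e. $h = \sigma^{xy}$ with $\sigma^{xy}(t) = (1-t)x+ty$, and in the second if and only if $(x,y)\in\partial^c\psi$.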

Observe that Assumption 1 ensures that the OT problem (3) admits a unique OT plan $\pi_{\mathrm{o}}$. Let $\mathcal{X}_{\mathrm{o}}$ and $\mathcal{Y}_{\mathrm{o}}$ denote the projections of $\mathrm{spt}(\pi_{\mathrm{o}})$ onto $\mathcal{X}$ and $\mathcal{Y}$, respectively, i.e.,

\begin{align*} \mathcal{X}_{\mathrm{o}} = \{ x \colon (x,y) \in \mathrm{spt}(\pi_{\mathrm{o}}) \ \text{for some} \ y\},\end{align*}

and $\mathcal{Y}_{\mathrm{o}}$ is defined analogously. As $\pi_{\mathrm{o}}$ is a coupling for $\mu_0$ and $\mu_1$ , the sets $\mathcal{X}_{\mathrm{o}}$ and $\mathcal{Y}_{\mathrm{o}}$ have full $\mu_0$ - and $\mu_1$ -measure, respectively. As in [Reference Bernton, Ghosal and Nutz3], we assume uniqueness of OT potentials (from $\mu_1$ to $\mu_0$ ) on $\mathcal{Y}_{\mathrm{o}}$ to derive our large-deviation results.

Assumption 2. The dual problem (11) admits a unique OT potential $\psi$ on $\mathcal{Y}_{\mathrm{o}}$ , i.e. if $\tilde{\psi}$ is another OT potential, then $\psi - \tilde{\psi}$ is constant on $\mathcal{Y}_{\mathrm{o}}$ .

Appendix B in [Reference Bernton, Ghosal and Nutz3] and [Reference Staudt, Hundrieser and Munk42] provide various sufficient conditions under which uniqueness of OT potential holds. For example, Assumption 2 holds under each of the following cases:

  (A) $\mathcal{X}$ and $\mathcal{Y}$ are compact, and one of them agrees with the closure of a connected open set [Reference Karatzas and Shreve24, Theorem 7.18].

  (B) The interior $\mathrm{int}(\mathcal{Y})$ is connected, $\mu_1$ is absolutely continuous with positive Lebesgue density on $\mathrm{int}(\mathcal{Y})$, and $\mu_1(\partial\mathcal{Y})=0$ [Reference Bernton, Ghosal and Nutz3, Proposition B.2].

Case (A) does not require $\mu_0$ or $\mu_1$ to have a Lebesgue density (although Assumption 1 requires $\mu_1 \ll \, \mathrm{d} y$ ). We provide a self-contained proof of Case (A) in Lemma 3 for completeness. Case (B) imposes no restrictions on $\mu_0$ , so it allows $\mu_0$ to be discrete.

Remark 4. Often, regularity conditions are imposed on the input measure $\mu_0$ to ensure uniqueness or regularity of OT potentials from $\mu_0$ to $\mu_1$ . For the (static) EOT case, the role of $\mu_0$ and $\mu_1$ is symmetric, so it is possible, without loss of generality, to focus on the forward ( $\mu_0 \to \mu_1$ ) case. However, in our dynamical setting, the roles of $\mu_0$ and $\mu_1$ are asymmetric because of Assumption 1. Since Assumption 1 already imposes absolute continuity on $\mu_1$ , we treat OT potentials for the backward direction ( $\mu_1 \to \mu_0$ ), contrary to the convention in the literature.

3. Main results

We first recall the weak convergence of $P^\varepsilon$ toward $P^{\mathrm{o}} = \int\delta_{\sigma^{xy}}\,\, \mathrm{d}\pi_{\mathrm{o}}(x,y)$ with $\sigma^{xy}(t) = (1-t)x+ty$ . Recall that the cost function is $c(x,y) = |x-y|^2/2$ .

Proposition 1. Under Assumption 1, $P^\varepsilon \to P^{\mathrm{o}}$ weakly as $\varepsilon \downarrow 0$ . The support of $P^{\mathrm{o}}$ agrees with $\Sigma_{\pi_{\mathrm{o}}} \,:\!=\, \{\sigma^{xy} \colon (x,y) \in \mathrm{spt}(\pi_{\mathrm{o}})\}$ .

Remark 5. (On Proposition 1.) A version of this proposition was proved in [Reference Mikami31] under the extra assumption that $\mu_0$ is absolutely continuous. [Reference Léonard26, Theorem 3.7] implies the proposition but the proof is somewhat involved (as it covers more general settings). We provide a simple proof in Section 4.

We are now in a position to state our main results. Let H denote the space of absolutely continuous maps $h\colon [0,1] \to \mathbb{R}^d$ with $\int_0^1 |\dot{h}(t)|^2\,\, \mathrm{d} t < \infty$ , where $\dot{h}(t) = \, \mathrm{d} h(t)/\, \mathrm{d} t$ . We endow H with the (semi-)inner product $(g,h)_H = \int_0^1\dot{g}(t)\cdot\dot{h}(t)\,\, \mathrm{d} t$ . Set $\| \cdot \|_{H} = \sqrt{(\cdot,\cdot )_H}$ . Formally, define $\| h \|_{H} = \infty$ for $h \in E \setminus H$ . We first state the weak-type LDP for Schrödinger bridges, which allows for marginals with unbounded supports.

Theorem 1. (Weak-type LDP for Schrödinger bridges.) Suppose Assumptions 1 and 2 hold. Pick any $\varepsilon_k \downarrow 0$ . Then the following hold:

  1. (i) For every open set $A \subset e_{01}^{-1}(\mathcal{X}_{\mathrm{o}} \times \mathcal{Y}_{\mathrm{o}})$ (w.r.t. the relative topology),

    \begin{align*} \liminf_{k \to \infty} \varepsilon_k \log P^{\varepsilon_k} (A) \ge - \inf_{h \in A} I(h) \end{align*}
    for the rate function $I(h) = {\|h\|_H^2}/{2} - \psi^c(h(0)) - \psi(h(1))$ .
  2. (ii) For every closed set $A \subset E$ of the form $A = e_{01}^{-1}(C)$ for some compact set $C \subset \mathcal{X}_{\mathrm{o}} \times \mathcal{Y}_{\mathrm{o}}$ ,

    \begin{equation*} \limsup_{k \to \infty} \varepsilon_{k}\log P^{\varepsilon_k} (A) \le -\inf_{h \in A} I(h). \end{equation*}

Theorem 1 is not precisely a weak LDP, since (ii) holds not only for every compact set $C \subset e_{01}^{-1}(\mathcal{X}_{\mathrm{o}} \times \mathcal{Y}_{\mathrm{o}})$ but also for some noncompact closed sets. As such, we call Theorem 1 a weak-type LDP. If the marginals have compact supports, then a full LDP holds, subject to one technical condition essential to guarantee the uniqueness of OT potentials.

Corollary 1. (Full LDP for Schrödinger bridges.) Suppose Assumption 1 holds. Pick any $\varepsilon_k \downarrow 0$ . If $\mathcal{X}$ and $\mathcal{Y}$ are compact and one of them agrees with the closure of a connected open set, then the sequence $\{ P^{\varepsilon_k} \}_{k \in \mathbb{N}}$ satisfies a (full) LDP on E with speed $\varepsilon_k^{-1}$ and good rate function I, where I is set to $\infty$ outside $e_{01}^{-1}(\mathcal{X} \times \mathcal{Y})$ .

We leave several remarks on the preceding results.

Remark 6. (On Corollary 1.) The sets $\mathcal{X}, \mathcal{Y}$ being compact implies $\mathcal{X}_{\mathrm{o}} = \mathcal{X}$ and $\mathcal{Y}_{\mathrm{o}} = \mathcal{Y}$ , as the projections from $\mathcal{X} \times \mathcal{Y}$ onto $\mathcal{X}$ and $\mathcal{Y}$ are then closed maps. The assumption of Corollary 1 guarantees the uniqueness of OT potentials; see the discussion after Assumption 2. Since $P^{\varepsilon}$ charges no mass outside $e_{01}^{-1}(\mathcal{X} \times \mathcal{Y})$ , the full LDP is indeed deduced from the preceding theorem. Connectedness of the support of one of the marginals is essential for the uniqueness of OT potentials (see the discussion before Proposition B.2 in [Reference Bernton, Ghosal and Nutz3]). The full LDP (more specifically, establishing exponential tightness) for Schrödinger bridges requires both marginals to be compactly supported, since exponential tightness implies the limiting law $P^{\mathrm{o}}$ is concentrated on a compact set, which fails to hold if one of the marginals has unbounded support. See [Reference Nutz and Wiesel36, Remark 4.2(b)] for a relevant discussion in the static case.

Remark 7. (On the rate function I(h).) Since $\psi^c(x) + \psi(y) \le c(x,y)$ by construction, the rate function I(h) is positive as soon as $h \ne \sigma^{h(0),h(1)}$ . Even when $h =\sigma^{h(0),h(1)}$ , which entails $\|h\|_H^2/2=c(h(0),h(1))$ , the rate function $I(h) = c(h(0),h(1)) - \psi^c (h(0)) - \psi(h(1))$ can be positive provided $(h(0),h(1)) \notin \mathrm{spt}(\pi_{\mathrm{o}})$ . [Reference Bernton, Ghosal and Nutz3, Section 5] provides several conditions under which the rate function for the static case, $\phi (x,y) = c(x,y) - \psi^c(x) - \psi(y)$ , is positive outside $\mathrm{spt}(\pi_{\mathrm{o}})$ . Considering the characterization of the support of $P^{\mathrm{o}}$ , our large-deviation results essentially imply that the Schrödinger bridges $P^{\varepsilon}$ charge exponentially small masses outside $\mathrm{spt}(P^{\mathrm{o}})$ when $\varepsilon \downarrow 0$ .
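To make the two sources of positivity in Remark 7 explicit, the rate function can be decomposed as follows. This is a short sketch rather than a statement taken from the results above: it assumes $h \in H$ with $(x,y) = (h(0),h(1)) \in \mathcal{X}_{\mathrm{o}} \times \mathcal{Y}_{\mathrm{o}}$ , and it uses the identity $\|h - \sigma^{xy}\|_H^2 = \|h\|_H^2 - |x-y|^2$ , which also appears in the proof of Proposition 4.

\begin{align*} I(h) = \bigg(\frac{\|h\|_H^2}{2} - c(x,y)\bigg) + \big(c(x,y) - \psi^c(x) - \psi(y)\big) = \frac{\|h - \sigma^{xy}\|_H^2}{2} + \phi(x,y). \end{align*}

The first term vanishes only if $h = \sigma^{xy}$ , and the second term is the static rate function evaluated at the endpoints of h.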

Remark 8. (Proofs of Theorem 1 and Corollary 1.) The proof of Theorem 1 uses the expression $P^{\varepsilon}(A) = \int R^{\varepsilon,xy}(A)\,\, \mathrm{d}\pi_{\varepsilon}(x,y)$ from (7). The main ingredient is the exponential continuity of $\{ R^{\varepsilon_k,xy} \}$ , i.e. establishing large-deviation upper and lower bounds for $\{R^{\varepsilon_k,(x_k,y_k)}\}_{k \in \mathbb{N}}$ when $(x_k,y_k) \to (x,y)$ , which will be proved in Proposition 4. The proof then directly evaluates $P^\varepsilon (A)$ by combining the large-deviation results for the static case from [Reference Bernton, Ghosal and Nutz3]. As noted in Remark 6, Corollary 1 is a special case of Theorem 1. Nonetheless, we provide a separate, more direct proof for the compact support case. It relies on the expression $P^{\varepsilon}(A) = \int_{A}\mathrm{e}^{-\phi_{\varepsilon} \circ e_{01}(\omega)/\varepsilon}\,\, \mathrm{d}\bar{R}^{\varepsilon}(\omega)$ from (8). Then the proof proceeds by (i) proving an LDP for $\bar{R}^{\varepsilon}$ , which follows directly from the exponential continuity [Reference Dinwoodie and Zabell18], and then (ii) adapting the (Laplace–)Varadhan lemma (cf. [Reference Dembo and Zeitouni17, Theorem 4.4.2]) to evaluate $P^{\varepsilon}(A)$ . Step (ii) is relatively simple: although the function $\phi_{\varepsilon}$ depends on $\varepsilon$ , so that the Varadhan lemma is not directly applicable, the assumption of Corollary 1 ensures uniform convergence of the EOT potentials.
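The mixture representation $P^{\varepsilon} = \int R^{\varepsilon,xy}\,\mathrm{d}\pi_{\varepsilon}(x,y)$ suggests a two-step sampling scheme: draw endpoints from the EOT plan, then draw a Brownian bridge between them. The following is a minimal numerical sketch under assumptions that are not part of the paper: the marginals are replaced by finitely supported measures (which violates the absolute continuity in Assumption 1 and serves only as an illustration), the plan is approximated by plain Sinkhorn iterations, and all function names and parameter values are hypothetical.

```python
import numpy as np


def sinkhorn_plan(xs, ys, a, b, eps, n_iter=2000):
    """Discrete EOT plan for c(x, y) = |x - y|^2 / 2 via Sinkhorn iterations."""
    cost = 0.5 * np.sum((xs[:, None, :] - ys[None, :, :]) ** 2, axis=-1)
    kernel = np.exp(-cost / eps)
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iter):
        u = a / (kernel @ v)      # rescale rows to match the first marginal
        v = b / (kernel.T @ u)    # rescale columns to match the second marginal
    return u[:, None] * kernel * v[None, :]


def sample_schroedinger_path(xs, ys, plan, eps, n_steps, rng):
    """Draw (x, y) from the plan, then a discretized Brownian bridge from x to y."""
    flat = rng.choice(plan.size, p=plan.ravel() / plan.sum())
    i, j = np.unravel_index(flat, plan.shape)
    x, y = xs[i], ys[j]
    t = np.linspace(0.0, 1.0, n_steps + 1)[:, None]
    steps = rng.normal(scale=np.sqrt(1.0 / n_steps), size=(n_steps, x.size))
    w = np.vstack([np.zeros((1, x.size)), np.cumsum(steps, axis=0)])
    bridge = w - t * w[-1]                       # W(t) - t W(1)
    return (1.0 - t) * x + t * y + np.sqrt(eps) * bridge


rng = np.random.default_rng(0)
xs = rng.uniform(size=(5, 1))    # hypothetical atoms standing in for mu_0
ys = rng.uniform(size=(6, 1))    # hypothetical atoms standing in for mu_1
a = np.full(5, 1 / 5)
b = np.full(6, 1 / 6)
eps = 0.1
plan = sinkhorn_plan(xs, ys, a, b, eps)
path = sample_schroedinger_path(xs, ys, plan, eps, n_steps=200, rng=rng)
```

The sketch keeps the regularization level `eps` moderate, since the kernel `exp(-cost / eps)` underflows for very small values; a log-domain implementation would be needed to explore the regime $\varepsilon \downarrow 0$ numerically.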

Remark 9. (On uniqueness of OT potentials.) Inspection of the proof of Corollary 1 reveals that, as long as Assumption 1 holds and $\mathcal{X}$ and $\mathcal{Y}$ are compact (but without assuming uniqueness of OT potentials), the conclusion of Corollary 1 continues to hold, provided that

(12) \begin{equation} \lim_{k \to \infty}\varphi_{\varepsilon_k} = \bar{\varphi} \quad \text{and} \quad \lim_{k \to \infty}\psi_{\varepsilon_k} = \bar{\psi} \quad \text{uniformly on $\mathcal{X}$ and $\mathcal{Y}$, respectively} \end{equation}

for some (continuous) functions $\bar{\varphi}$ and $\bar{\psi}$ on $\mathcal{X}$ and $\mathcal{Y}$ , respectively (necessarily, $(\bar{\varphi},\bar{\psi})$ are dual potentials for $(\mu_0,\mu_1)$ ). The rate function I needs to be modified so that $(\psi^c,\psi)$ are replaced with $(\bar{\varphi},\bar{\psi})$ . A similar comment applies to Proposition 3. Conversely, the uniform convergence of the EOT potentials in (12) is necessary for the LDP for the Schrödinger bridges $\{ P^{\varepsilon_k} \}_{k \in \mathbb{N}}$ to hold, by [Reference Nutz and Wiesel36, Proposition 4.5] in the static case and the contraction principle.

We now look at a few special cases.

Example 1. (Föllmer process.) When $\mu_0 = \delta_0$ , we have $\mathcal{Y}_{\mathrm{o}} = \mathcal{Y}$ and $\psi(y) = |y|^2/2$ , so the rate function reduces to $I(h) = \| h \|_H^2/2 - |h(1)|^2/2 + \iota_{\{ 0 \}}(h(0)) + \iota_{\mathcal{Y}}(h(1))$ , which vanishes if and only if $h(t) = ty$ for $t \in [0,1]$ for some $y \in \mathcal{Y}$ , i.e. if and only if $h \in \mathrm{spt}(P^{\mathrm{o}})$ .

Example 2. (Two-point marginal.) The LDP in Corollary 1 directly yields an LDP for $P^{\varepsilon_k} \circ f^{-1}$ for any continuous function f from E into another metric space by the contraction principle (cf. [Reference Dembo and Zeitouni17, Theorem 4.2.1]). We consider the case where $f=e_{st}$ for $0 \le s < t \le 1$ . Note that $P_{st}^{\varepsilon_k}$ is a coupling for $P_{s}^{\varepsilon_k}$ and $P_{t}^{\varepsilon_k}$ . Recall that the marginal flow $(P_t^{\varepsilon})_{t \in [0,1]}$ is called an entropic interpolation, and its limiting analog $(P_t^{\mathrm{o}})_{t \in [0,1]}$ is a displacement interpolation connecting $\mu_0$ and $\mu_1$ . To characterize the rate function for $P_{st}^{\varepsilon_k}$ , we need additional notation.

For a function $f\colon \mathbb{R}^d \to (-\infty,\infty]$ and $t \ge 0$ , define

\begin{align*} \mathcal{Q}_{t}(f)(y) = \inf_{x \in \mathbb{R}^d} \bigg\{\frac{c(x,y)}{t} + f(x)\bigg\}, \quad t > 0, \ \mathcal{Q}_{0}(f)=f. \end{align*}

The family of operators $\{ \mathcal{Q}_t \}_{t \ge 0}$ is called the Hopf–Lax semigroup; cf. [Reference Villani47, Chapter 7]. Assuming Case (A) after Assumption 2, we set $\varphi = \psi^c$ and extend $\varphi$ and $\psi$ to the whole $\mathbb{R}^d$ by setting $\varphi=-\infty$ and $\psi=-\infty$ outside $\mathcal{X}$ and $\mathcal{Y}$ , respectively. For $0 \le s < t \le 1$ , consider the rescaled cost $c^{s,t}(x,y) = c(x,y)/(t-s)$ .
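The Hopf–Lax operator defined above can be evaluated by brute force on a grid. The following sketch is only an illustration under assumptions not made in the paper: it works in dimension $d=1$, restricts the infimum to a bounded grid, and uses the test function $f(x) = |x|^2/2$, for which a direct minimization gives the closed form $\mathcal{Q}_t(f)(y) = |y|^2/(2(1+t))$. The function name `hopf_lax` is hypothetical.

```python
import numpy as np


def hopf_lax(f_vals, grid, t):
    """Q_t(f)(y) = inf_x { c(x, y) / t + f(x) } on a 1D grid, with c = |x - y|^2 / 2."""
    if t == 0:
        return f_vals.copy()
    # cost[i, j] = c(grid[i], grid[j]) / t; the infimum runs over the row index i.
    cost = 0.5 * (grid[:, None] - grid[None, :]) ** 2 / t
    return np.min(cost + f_vals[:, None], axis=0)


grid = np.linspace(-5.0, 5.0, 1001)
f_vals = 0.5 * grid ** 2                     # illustrative choice f(x) = |x|^2 / 2
t = 0.5
numeric = hopf_lax(f_vals, grid, t)
exact = grid ** 2 / (2.0 * (1.0 + t))        # closed form for this particular f
interior = np.abs(grid) <= 2.0               # compare away from the grid boundary
max_error = np.max(np.abs(numeric - exact)[interior])
print(max_error)
```

Because the infimum is taken over grid points only, the numerical value can only overestimate the true infimum; the gap shrinks as the grid is refined.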

Proposition 2. (LDP for two-point marginal.) Suppose Assumption 1 holds. Pick any $0 \le s < t \le 1$ and $\varepsilon_k \downarrow 0$ . If $\mathcal{X}$ and $\mathcal{Y}$ are compact and one of them agrees with the closure of a connected open set, then the sequence $\{ P^{\varepsilon_k}_{st} \}_{k \in \mathbb{N}}$ satisfies an LDP on $\mathbb{R}^{2d}$ with speed $\varepsilon_k^{-1}$ and good rate function $I_{st}(x,y) = c^{s,t}(x,y) - \varphi_s(x) - \psi_t(y)$ , where $(\varphi_s,\psi_t)\,:\!=\,(\!-\mathcal{Q}_s (\!-\varphi),-\mathcal{Q}_{1-t}(\!-\psi))$ are dual potentials for $(P_s^{\mathrm{o}},P_t^{\mathrm{o}})$ w.r.t. $c^{s,t}$ , i.e. optimal solutions to (10) with $(\mu_0,\mu_1,c)$ replaced by $(P_s^{\mathrm{o}},P_t^{\mathrm{o}},c^{s,t})$ .

Finally, we point out that the direct proof for Corollary 1 can be easily adapted to cover the dynamical Schrödinger problem with Langevin diffusion as a reference measure.

Remark 10. (Langevin diffusion as reference measure.) For a bounded smooth potential $V\colon \mathbb{R}^d \to \mathbb{R}$ with bounded derivatives, consider the Langevin diffusion $X=(X(t))_{t \ge 0}$ defined by the unique (strong) solution to the SDE $\, \mathrm{d} X(t) = -\nabla V(X(t))\,\, \mathrm{d} t + \, \mathrm{d} W(t)$ , $X(0) \sim \mu_0$ , where $(W(t))_{t \ge 0}$ is a standard Brownian motion starting at 0 independent of X(0). Let $p_{t}(x,y)$ denote the transition density of the Langevin diffusion X and $\check{R}^\varepsilon$ be the law of $X^{\varepsilon} \,:\!=\, (X(\varepsilon t))_{t \in [0,1]}$ defined on $\mathcal{B}(E)$ . Instead of the Wiener reference measure as in (1), we consider the dynamical Schrödinger problem with reference measure $\check{R}^{\varepsilon}$ :

(13) \begin{equation} \min_{P: P_0 = \mu_0,P_1=\mu_1} \mathcal{H}(P\mid\check{R}^{\varepsilon}). \end{equation}

Under Assumption 1, arguing as in Section 2 (see also [Reference Léonard27, Proposition 2.3]), we can see that the unique optimal solution to (13) is given by

\begin{align*} \check{P}^{\varepsilon} (\!\cdot\!) = \int \check{R}^{\varepsilon,xy}(\!\cdot\!) \, \, \mathrm{d}\check{\pi}^{\varepsilon}(x,y), \end{align*}

where $\check{R}^{\varepsilon,xy}$ is the conditional law of $X^\varepsilon$ given $(X^\varepsilon (0),X^{\varepsilon}(1))=(x,y)$ and $\check{\pi}^\varepsilon$ is the unique optimal solution to the static EOT problem

\begin{align*} \min_{\pi\in\Pi(\mu_0,\mu_1)}\int c_{\varepsilon}\,\, \mathrm{d}\pi + \varepsilon\mathcal{H}(\pi\mid\mu_0\otimes\mu_1) \end{align*}

with $c_{\varepsilon}(x,y)\,:\!=\,-\varepsilon \log p_\varepsilon (x,y)$ . Recall that the transition density $p_t (x,y)$ is everywhere positive (cf. [Reference Stroock44, Chapter 4]) and the conditional laws $\check{R}^{\varepsilon,xy}$ are defined for all $(x,y) \in \mathbb{R}^{2d}$ [Reference Bravo and Chaumont6]. The classical Varadhan asymptotics implies that $\lim_{\varepsilon \downarrow 0} c_\varepsilon (x,y) = |x-y|^2/2=c(x,y)$ (cf. [Reference Stroock44, Chapter 4]), so we can expect that the Schrödinger bridges $\{ \check{P}^\varepsilon \}_{\varepsilon > 0}$ satisfy the LDP with the same rate function I as in the Brownian case. The next proposition confirms this under a similar setting to Corollary 1.

Proposition 3. (Full LDP for Schrödinger bridges: Langevin case.) Suppose Assumption 1 holds. Pick any $\varepsilon_k \downarrow 0$ . If $\mathcal{X}$ and $\mathcal{Y}$ are compact and one of them agrees with the closure of a connected open set, then the sequence $\{ \check{P}^{\varepsilon_k} \}_{k \in \mathbb{N}}$ satisfies a (full) LDP on E with speed $\varepsilon_k^{-1}$ and good rate function I, where I is given in Corollary 1.

The condition on the potential V appears to be stronger than needed, but is imposed for the sake of simplicity. As announced, the proof follows similar arguments to the direct proof for Corollary 1. To establish exponential continuity for the Langevin bridge $\check{R}^{\varepsilon,xy}$ , we use the explicit expression for the Radon–Nikodym density of the Langevin bridge against the Brownian bridge; cf. [Reference Levy and Krener28].
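The Varadhan asymptotics $c_\varepsilon \to c$ invoked in Remark 10 can be checked by hand in the simplest admissible case. The sketch below assumes $V \equiv 0$ (so the Langevin diffusion is a Brownian motion and $p_\varepsilon$ is the Gaussian kernel); this special case and the function name `c_eps_brownian` are illustrative choices, not part of the paper. In this case $c_\varepsilon(x,y) = |x-y|^2/2 + (d\varepsilon/2)\log(2\pi\varepsilon)$, so the gap to $c$ vanishes as $\varepsilon \downarrow 0$.

```python
import numpy as np


def c_eps_brownian(x, y, eps):
    """c_eps(x, y) = -eps * log p_eps(x, y) for the Gaussian kernel (case V = 0)."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    y = np.atleast_1d(np.asarray(y, dtype=float))
    d = x.size
    log_p = -0.5 * d * np.log(2.0 * np.pi * eps) - np.sum((x - y) ** 2) / (2.0 * eps)
    return -eps * log_p


x, y = np.array([0.0, 1.0]), np.array([2.0, -1.0])
c = 0.5 * np.sum((x - y) ** 2)               # quadratic cost c(x, y) = |x - y|^2 / 2
for eps in (1e-1, 1e-2, 1e-4, 1e-6):
    # The gap equals (d * eps / 2) * log(2 * pi * eps) and tends to 0.
    print(eps, c_eps_brownian(x, y, eps) - c)
```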

4. Proofs for Section 3

Recall that $R^{\varepsilon,xy}$ is the (regular) conditional law of $x+\sqrt{\varepsilon}W$ given $x+\sqrt{\varepsilon}W(1) = y$ for a standard Brownian motion $W=(W(t))_{t \in [0,1]}$ starting at 0. Alternatively, $R^{\varepsilon,xy}$ can be characterized as the law of $\sqrt{\varepsilon} W^\circ + \sigma^{xy}$ with $W^\circ = (W(t)-tW(1))_{t \in [0,1]}$ a standard Brownian bridge. For simplicity of notation, let $z = (x,y) \in \mathbb{R}^{2d}$ and write $R^{\varepsilon,z} = R^{\varepsilon,xy}$ .
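The second characterization, $R^{\varepsilon,xy} = \mathrm{Law}(\sqrt{\varepsilon} W^\circ + \sigma^{xy})$, is straightforward to simulate. The following sketch discretizes $W$ on a uniform grid (a random-walk approximation, which is an assumption of this illustration) and reports the sup-norm distance between one sampled path and the line $\sigma^{xy}$; the function names, endpoints, and grid size are hypothetical.

```python
import numpy as np


def bridge_path(x, y, eps, n_steps, rng):
    """Discretized sample of sqrt(eps) * W_bridge + sigma^{xy} on [0, 1]."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    y = np.atleast_1d(np.asarray(y, dtype=float))
    t = np.linspace(0.0, 1.0, n_steps + 1)[:, None]
    steps = rng.normal(scale=np.sqrt(1.0 / n_steps), size=(n_steps, x.size))
    w = np.vstack([np.zeros((1, x.size)), np.cumsum(steps, axis=0)])
    bridge = w - t * w[-1]                  # standard Brownian bridge W(t) - t W(1)
    line = (1.0 - t) * x + t * y            # straight line sigma^{xy}
    return line + np.sqrt(eps) * bridge, line


def sup_deviation(x, y, eps, n_steps=500, seed=0):
    """Sup-norm distance between one sampled bridge and the line sigma^{xy}."""
    path, line = bridge_path(x, y, eps, n_steps, np.random.default_rng(seed))
    return np.max(np.linalg.norm(path - line, axis=1))


x, y = [0.0, 0.0], [1.0, 2.0]
for eps in (1.0, 0.1, 0.01):
    # Same seed for every eps, so the deviation scales like sqrt(eps).
    print(eps, sup_deviation(x, y, eps))
```

Reusing the seed makes the scaling in $\sqrt{\varepsilon}$ visible on a single realization; it says nothing by itself about the exponential decay of probabilities, which is the subject of Proposition 4.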

4.1. Proof of Proposition 1

Proof of Proposition 1. By the uniqueness of the OT plan, we have $\pi_{\varepsilon} \to \pi_{\mathrm{o}}$ weakly by [Reference Bernton, Ghosal and Nutz3, Proposition 3.2], which implies that

\begin{align*} \eta_{\varepsilon} \,:\!=\, \sup\nolimits_{\substack{g:\mathbb{R}^{2d}\to[\!-1,1]\\g\,\text{1-Lipschitz}}} \bigg|\int g\,\, \mathrm{d}(\pi_{\varepsilon} - \pi_{\mathrm{o}})\bigg| \to 0 \end{align*}

(see [Reference van der Vaart and Wellner46, Chapter 1.12]). Pick any 1-Lipschitz function $f\colon E \to [\!-1,1]$ . We have

\begin{align*} \int f\,\, \mathrm{d} P^{\varepsilon} = \int\bigg(\int f\,\, \mathrm{d} R^{\varepsilon,z}\bigg)\,\, \mathrm{d}\pi_{\varepsilon}(z) = \int\underbrace{\mathbb{E}[f(\sqrt{\varepsilon}W^{\circ} + \sigma^z)]}_{=:g_{\varepsilon}(z)}\,\, \mathrm{d}\pi_{\varepsilon}(z). \end{align*}

By construction, $g_\varepsilon$ is bounded by 1, $|g_\varepsilon(z) - g_\varepsilon(z')| \le \|\sigma^z - \sigma^{z'}\|_{E} \le 2|z-z'|$ , and $\lim_{\varepsilon \downarrow 0} g_{\varepsilon}(z)= f(\sigma^z) = \int f\,\, \mathrm{d}\delta_{\sigma^z}$ . Hence,

\begin{align*} \int g_{\varepsilon}\,\, \mathrm{d}\pi_{\varepsilon} \le \int g_{\varepsilon}\,\, \mathrm{d}\pi_{\mathrm{o}} + 2\eta_\varepsilon = \underbrace{\int\bigg(\int f\,\, \mathrm{d}\delta_{\sigma^z}\bigg)\,\, \mathrm{d}\pi_{\mathrm{o}}}_{=\int f\,\, \mathrm{d} P^{\mathrm{o}}} + o(1), \end{align*}

where we used the dominated convergence theorem. The reverse inequality follows similarly, and we conclude that $\lim_{\varepsilon\downarrow0}\int f\,\, \mathrm{d} P^{\varepsilon} = \int f\,\, \mathrm{d} P^{\mathrm{o}}$ , which yields $P^\varepsilon \to P^{\mathrm{o}}$ weakly. The second claim follows from Lemma 1 below.

Lemma 1. For any Borel probability measure $\gamma$ on $\mathbb{R}^{2d}$ , the mixture $P = \int\delta_{\sigma^{xy}}\,\, \mathrm{d}\gamma(x,y)$ has support $\Sigma_{\gamma} \,:\!=\, \{\sigma^{xy} \colon (x,y) \in \mathrm{spt} (\gamma)\}$ .

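Proof. The set $\Sigma_{\gamma}$ is closed in E and $P(\Sigma_{\gamma}) = \gamma(\mathrm{spt}(\gamma)) = 1$ , so $\mathrm{spt}(P) \subset \Sigma_{\gamma}$ . For the reverse inclusion, pick any $(x,y) \in \mathrm{spt}(\gamma)$ and any open set U containing $\sigma^{xy}$ . Since $O = \{ (x',y') \colon \sigma^{x'y'} \in U \}$ is open in $\mathbb{R}^{2d}$ (as $(x',y') \mapsto \sigma^{x'y'}$ is continuous), we have, for $(\xi_0,\xi_1) \sim \gamma$ , $P(U) = \mathbb{P}(\sigma^{\xi_0,\xi_1} \in U) = \mathbb{P}((\xi_0,\xi_1) \in O) = \gamma(O) > 0$ , where the last inequality holds because O is an open set containing $(x,y) \in \mathrm{spt}(\gamma)$ . Hence $\sigma^{xy} \in \mathrm{spt}(P)$ , which yields $\mathrm{spt}(P) = \Sigma_{\gamma}$ .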

4.2. Exponential continuity of Brownian bridges

For given $x,y \in \mathbb{R}^d$ , [Reference Hsu23] showed that the sequence $\{ R^{\varepsilon,xy} \}_{\varepsilon > 0}$ satisfies an LDP with rate function

\begin{align*}J_{xy}(h) =\frac{\| h \|_{H}^2}{2} - c(x,y) + \iota_{\{(x,y)\}}(h(0),h(1)).\end{align*}

Write $J_z(h) = J_{xy}(h)$ for $z= (x,y)$ . Additionally, set $H_z \,:\!=\, \{ h \in H \colon (h(0),h(1)) = z \}$ . Pick any $\varepsilon_k \downarrow 0$ .

Proposition 4. (Exponential continuity of Brownian bridges.)

  1. (i) For every open set $A \subset E$ , $\liminf_{k \to \infty} \varepsilon_k \log R^{\varepsilon_k,z_k} (A) \ge -\inf_{h \in A} J_{z}(h)$ whenever $z_k \to z$ in $\mathbb{R}^{2d}$ .

  2. (ii) For every closed set $A \subset E$ ,

    (14) \begin{equation} \limsup_{k \to \infty} \varepsilon_k \log R^{\varepsilon_k,z_k} (A) \le -\inf_{h \in A} J_{z}(h) \end{equation}
    whenever $z_k \to z$ in $\mathbb{R}^{2d}$ .

Proof. Hsu’s proof in [Reference Hsu23], which relies on transition function estimates, seems difficult to adapt to establish exponential continuity. Instead, we adapt the proof of large deviations for abstract Wiener spaces; cf. [Reference Stroock45, Chapter 8]. For the sake of completeness, we provide a self-contained proof.

For (i), it suffices to show that for every $h \in H$ such that $J_z (h) < \infty$ ,

\begin{align*} \liminf_{r \downarrow 0} \liminf_{k \to \infty} \varepsilon_k \log R^{\varepsilon_k,z_k}(B_E(h,r)) \ge -J_z (h). \end{align*}

Set $\bar{h} \in H_0$ by $\bar{h}= h -\sigma^{xy}$ and $h_k \in H_{z_k}$ by $h_k = \bar{h} + \sigma^{x_k,y_k}$ . Since $\| h_k - h \|_E \to 0$ , $B_E(h_k,r/2) \subset B_E(h,r)$ for large k. Observe that

\begin{align*} R^{\varepsilon_k,z_k}(B_E(h_k,r/2)) = R^{\varepsilon_k,0}(B_E(\bar{h},r/2)) = \mathbb{P}\big(W^\circ \in B_E(\bar{h}/\sqrt{\varepsilon_k},r/(2\sqrt{\varepsilon_k}))\big). \end{align*}

Recall that $(H_0, (\cdot,\cdot)_H)$ is a reproducing kernel Hilbert space for $W^\circ$ (cf. [Reference Giné and Nickl22, Exercise 2.6.16]), whose closure in E agrees with $E_0 \,:\!=\, \{\omega \in E \colon \omega (0)=\omega(1)=0\}$ . Hence, the pair of spaces $(H_0,E_0)$ coupled with the law of $W^\circ$ constitutes an abstract Wiener space; cf. [Reference Stroock45, Chapter 8]. Let $E_0^*$ denote the topological dual of $E_0$ with dual norm $\| \cdot \|_{E_0^*}$ , and $\langle \omega,\omega^* \rangle$ denote the duality pairing for $\omega \in E_0$ and $\omega^* \in E_0^*$ . Since $H_0$ is continuously embedded as a dense subspace of $E_0$ (as $\| \cdot \|_E \le \| \cdot \|_H$ on $H_0$ ), for each $\omega^* \in E_0^*$ , there exists a unique $h_{\omega^*} \in H_0$ with the property that $(h,h_{\omega^*})_H = \langle h,\omega^* \rangle$ for all $h \in H_0$ , and the map $\omega^* \mapsto h_{\omega^*}$ is continuous, linear, one-to-one, and onto a dense subspace of $H_0$ (cf. [Reference Stroock45, Lemma 8.2.3]). Let $0 < \delta < r/2$ and $\omega^* \in E_0^*$ be such that $B_{E_0}(h_{\omega^*},\delta) \subset B_{E_0}(\bar{h},r/2)$ . Now, an application of the Cameron–Martin formula (cf. [Reference Stroock45, Theorem 8.2.9]) yields

\begin{align*} \begin{split} R^{\varepsilon_k,0}(B_E(\bar{h},r/2)) & = R^{\varepsilon_k,0}(B_{E_0}(\bar{h},r/2)) \ge R^{\varepsilon_k,0}(B_{E_0}(h_{\omega^*},\delta)) \\ & = \mathbb{P}\big(W^\circ-\varepsilon_k^{-1/2}h_{\omega^*} \in B_{E_0}\big(0,\varepsilon_k^{-1/2}\delta\big)\big) \\ & = \mathbb{E}\big[\exp\big\{{-}\varepsilon_k^{-1/2}\langle W^\circ,\omega^*\rangle - \varepsilon_k^{-1}\|h_{\omega^*}\|_{H}^2/2\big\}\mathbf{1}_{B_{E_0}(0,\varepsilon_k^{-1/2}\delta)}(W^\circ)\big] \\ & \ge \exp\big\{{-}\delta\varepsilon_k^{-1}\|\omega^*\|_{E_0^*} - \varepsilon_k^{-1}\|h_{\omega^*}\|_{H}^2/2\big\} \mathbb{P}\big(W^\circ \in B_{E_0}\big(0,\varepsilon_k^{-1/2}\delta\big)\big), \end{split} \end{align*}

so that, by taking $k \to \infty$ ,

\begin{align*} \liminf_{k\to\infty}\varepsilon_k\log R^{\varepsilon_k,0}(B_E(\bar{h},r/2)) \ge -\delta\|\omega^*\|_{E_0^*} - \frac{\|h_{\omega^*}\|_H^2}{2}. \end{align*}

Choosing $\delta = r/4$ and $\omega^* \in E_0^*$ with $\| \bar{h}-h_{\omega^*} \|_{H} < r/4$ , and then taking $r \downarrow 0$ , we have

\begin{align*} \liminf_{r\downarrow0}\liminf_{k\to\infty}\varepsilon_k\log R^{\varepsilon_k,0}(B_E(\bar{h},r/2)) \ge -\frac{\|\bar{h}\|_H^2}{2} = -\frac{1}{2}(\|h\|_{H}^2 - |x-y|^2) = -J_z(h). \end{align*}

For (ii), we first show that for every $h \in E$ ,

(15) \begin{equation} \limsup_{r \downarrow 0} \limsup_{k \to \infty} \varepsilon_k \log R^{\varepsilon_k,z_k} (B_E(h,r)) \le -J_z(h). \end{equation}

Using the same notation as in (i), we have $B_E(h,r) \subset B_E(h_k,2r)$ for large k and

\begin{align*} \begin{split} R^{\varepsilon_k,z_k}(B_E(h_k,2r)) & = \mathbb{P}(W^\circ\in B_E(\bar{h}/\sqrt{\varepsilon_k},2r/\sqrt{\varepsilon_k})) \\ & = \mathbb{E}\big[\exp\big\{{-}\varepsilon_k^{-1/2}\langle W^\circ,\omega^*\rangle + \varepsilon_k^{-1/2}\langle W^\circ,\omega^*\rangle\big\} \mathbf{1}_{B_{E_0}(\bar{h}/\sqrt{\varepsilon_k},2r/\sqrt{\varepsilon_k})}(W^\circ)\big] \\ & \le \exp\big\{{-}\varepsilon_k^{-1}(\langle\bar{h},\omega^*\rangle-2r\|\omega^*\|_{E_0^*})\big\} \mathbb{E}\big[\mathrm{e}^{\varepsilon_k^{-1/2}\langle W^\circ,\omega^*\rangle}\big] \\ & = \exp\big\{{-}\varepsilon_k^{-1}(\langle\bar{h},\omega^*\rangle - \|h_{\omega^*}\|_H^2/2 - 2r\|\omega^*\|_{E_0^*})\big\} \end{split} \end{align*}

for all $\omega^* \in E_0^*$ , where we used the fact that $\langle W^\circ,\omega^*\rangle \sim N(0,\|h_{\omega^*}\|_{H}^2)$ . This yields

\begin{align*} \begin{split} \limsup_{r\downarrow0}\limsup_{k\to\infty}\varepsilon_k\log R^{\varepsilon_k,z_k}(B_E(h_k,2r)) & \le -\sup_{\omega^*\in E_0^*}\bigg(\langle\bar{h},\omega^*\rangle-\frac{\|h_{\omega^*}\|_H^2}{2}\bigg) \\ & = \begin{cases} -\dfrac{\|\bar{h}\|_H^2}{2} & \text{if $\bar{h} \in H_0$}, \\ -\infty & \text{otherwise}. \end{cases} \end{split} \end{align*}

Now, $\bar{h} \in H_0$ if and only if $h \in H_z$ , and $\| \bar{h} \|_H^2=\| h \|_H^2 - |x-y|^2$ , which leads to (15).
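The norm identity used in the last step can be verified directly. As a short derivation (with $h \in H_z$ , $z = (x,y)$ , and $\bar{h} = h - \sigma^{xy}$ as above), note that $\dot{\sigma}^{xy}(t) = y - x$ and $\int_0^1 \dot{h}(t)\,\, \mathrm{d} t = h(1) - h(0) = y - x$ , so

\begin{align*} \|\bar{h}\|_H^2 = \int_0^1 |\dot{h}(t) - (y-x)|^2\,\, \mathrm{d} t = \|h\|_H^2 - 2(y-x)\cdot\int_0^1\dot{h}(t)\,\, \mathrm{d} t + |y-x|^2 = \|h\|_H^2 - |x-y|^2. \end{align*}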

Given (15), it is standard to show that (14) holds for every compact set $A \subset E$ . It remains to verify exponential tightness for $\{ R^{\varepsilon_k,z_k} \}_{k \in \mathbb{N}}$ (cf. [Reference Dembo and Zeitouni17, Lemma 1.2.18]), i.e. for every $\alpha < \infty$ , there exists a compact set $K \subset E$ such that $\limsup_{k \to \infty} \varepsilon_k \log R^{\varepsilon_k,z_k} (K^{\mathrm{c}}) < -\alpha$ . We first note that the exponential tightness holds for $\{ R^{\varepsilon_k,0} \}_{k \in \mathbb{N}}$ . Indeed, by [Reference Stroock45, Corollary 8.3.10], we can construct a separable Banach space F that is continuously embedded in $E_0$ as a measurable subset with the properties that $\mathbb{P}(W^\circ \in F)=1$ , bounded subsets of F are totally bounded in $E_0$ , and $(H_0,F)$ coupled with the restriction of the law of $W^\circ$ on F is another abstract Wiener space. Then, choosing $K_0$ to be the $E_0$ -closure of a ball in F with large enough radius satisfies $\limsup_{k \to \infty} \varepsilon_k \log R^{\varepsilon_k,0} (K_0^{\mathrm{c}}) < -\alpha$ by Fernique’s theorem (cf. [Reference Stroock45, Theorem 8.2.1]), and $K_0$ is compact in $E_0$ by construction.

Now, for an arbitrary bounded neighborhood $O \subset \mathbb{R}^{2d}$ of z, set $K_1=\{ \sigma^{x'y'} \colon (x',y') \in O \}$ . By the Ascoli–Arzelà theorem, the set $K=\{ \omega + \omega' \colon \omega \in K_0, \omega' \in K_1 \}$ is relatively compact in E, and it satisfies $R^{\varepsilon_k,z_k}(K) \ge R^{\varepsilon_k,0} (K_0)$ for large k. Indeed, $\sigma^{z_k} \in K_1$ for large k, so if $\omega \in K_0$ , then $\omega + \sigma^{z_k} \in K$ , which implies $R^{\varepsilon_k,0}(K_0) \le R^{\varepsilon_k,0} (\omega+\sigma^{z_k} \in K) = R^{\varepsilon_k,z_k}(K)$ . This yields exponential tightness for $\{ R^{\varepsilon_k,z_k} \}_{k \in \mathbb{N}}$ .

Given the exponential continuity, the following corollary concerning large deviations of mixtures of Brownian bridges follows immediately from [Reference Dinwoodie and Zabell18, Theorems 2.1 and 2.2]. The result might be of independent interest.

Corollary 2. (Large deviations for mixtures of Brownian bridges.) Let $\gamma$ be a Borel probability measure on $\mathbb{R}^{2d}$ . Consider the mixture distribution $Q^{\varepsilon} = \int R^{\varepsilon,xy} \, \, \mathrm{d}\gamma(x,y)$ .

  1. (i) The function

    (16) \begin{equation} J(h) \,:\!=\, \inf_{(x,y)\in\mathrm{spt}(\gamma)}J_{xy}(h) = \frac{\|h\|_H^2}{2} - c(h(0),h(1)) + \iota_{\mathrm{spt}(\gamma)}(h(0),h(1)) \end{equation}
    is lower semicontinuous from E into $[0,\infty]$ .
  2. (ii) For every open set $A \subset E$ , $\liminf_{k \to \infty} \varepsilon_k\log Q^{\varepsilon_k} (A) \ge -\inf_{h \in A} J(h)$ .

  3. (iii) If $\gamma$ is compactly supported, then for every closed set $A \subset E$ ,

    \begin{align*} \limsup_{k \to \infty} \varepsilon_k\log Q^{\varepsilon_k} (A) \le -\inf_{h \in A} J(h), \end{align*}
    and J is a good rate function.

Proof. For (i), set $F = \{ (h,z) \in E \times \mathbb{R}^{2d} \colon (h(0),h(1)) = z \}$ . The rate function $J_{z}(h)$ can be expressed as

\begin{equation*} J_{z}(h) = \frac{\|h\|_H^2}{2} - c(x,y) + \iota_{F}(h,z) = \frac{\|h\|_H^2}{2} - c(h(0),h(1)) + \iota_{F}(h,z). \end{equation*}

This yields the second expression for the J function in (16). Since $\mathrm{spt}(\gamma)$ is closed by definition, what remains is to verify that the mapping $E \ni h \mapsto \| h \|_{H}^2/2$ is lower semicontinuous. It suffices to show that the set $\{ h \in H \colon \| h \|_H \le 1 \}$ is closed in E. Let $\{ h_n \}_{n \in \mathbb{N}} \subset H$ be a sequence with $ \| h_n \|_H \le 1$ for all $n \in \mathbb{N}$ and $h_n \to h_\infty$ in E. We may assume without loss of generality that $h_n(0)=h_\infty(0)=0$ . Since $\tilde{H}= \{ h \in H \colon h(0) = 0\}$ endowed with inner product $( \cdot, \cdot )_H$ is a Hilbert space, by the Banach–Alaoglu theorem, there exists a subsequence $h_{n'}$ such that $h_{n'} \to \tilde{h}$ weakly in $\tilde{H}$ for some $\tilde{h} \in \tilde{H}$ with $\| \tilde{h} \|_H \le 1$ , i.e. $\lim_{n'} (h_{n'},g)_H = (\tilde{h},g)_H$ for all $g \in \tilde{H}$ . This implies that $h_\infty = \tilde{h}$ (choose appropriate g) and $\| h_\infty \|_{H} \le 1$ , as desired.
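The map $h \mapsto \|h\|_H^2/2$ is lower semicontinuous on E but not continuous, which is why the argument above cannot be replaced by a plain continuity statement. A quick numerical illustration, using a sequence chosen here for convenience and not taken from the paper: $h_n(t) = \sin(n\pi t)/n$ converges to 0 in sup norm, while $\|h_n\|_H^2 = \pi^2/2$ for every $n$. The sketch assumes $d = 1$ and approximates the H-norm by finite differences; the function name `h_norm_sq` is hypothetical.

```python
import numpy as np


def h_norm_sq(values, dt):
    """Finite-difference approximation of ||h||_H^2 = int_0^1 |h'(t)|^2 dt."""
    return np.sum(np.diff(values) ** 2) / dt


n_steps = 10_000
t = np.linspace(0.0, 1.0, n_steps + 1)
dt = 1.0 / n_steps
for n in (1, 5, 25, 50):
    h_n = np.sin(n * np.pi * t) / n        # illustrative sequence, not from the source
    # Sup norm decays like 1/n while the H-norm stays near pi^2 / 2.
    print(n, np.max(np.abs(h_n)), h_norm_sq(h_n, dt))
```

The limit path 0 has zero H-norm, strictly below the common value along the sequence, which is consistent with lower semicontinuity.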

Part (ii) follows from Proposition 4(i) and [Reference Dinwoodie and Zabell18, Theorem 2.1].

For (iii), the large-deviation upper bound follows from Proposition 4(ii) and [Reference Dinwoodie and Zabell18, Theorem 2.2]. Finally, J has compact level sets by [Reference Dembo and Zeitouni17, Lemma 1.2.18], since the argument in Proposition 4(ii) indeed shows that $\{ Q^{\varepsilon_k} \}_{k \in \mathbb{N}}$ is exponentially tight (replace O by $\mathrm{spt}(\gamma)$ ).

4.3. Proof of Theorem 1

Proof of Theorem 1. Set $\phi(z) = c(x,y) - \psi^c(x) - \psi(y)$ for $z = (x,y) \in \mathcal{X}_{\mathrm{o}} \times \mathcal{Y}_{\mathrm{o}}$ .

For (i), it suffices to show that for any $h \in e_{01}^{-1}(\mathcal{X}_{\mathrm{o}} \times \mathcal{Y}_{\mathrm{o}}) \cap H$ and $r > 0$ ,

\begin{align*} \liminf_{k \to \infty} \varepsilon_k \log P^{\varepsilon_k}(B_E(h,r)) \ge - I(h). \end{align*}

Set $z = (h(0),h(1)) \in \mathcal{X}_{\mathrm{o}} \times \mathcal{Y}_{\mathrm{o}}$ . By the exponential continuity of $\{ R^{\varepsilon_k,z} \}$ established in Proposition 4, for every $\delta >0$ we can choose an open neighborhood $O_z \subset \mathcal{X}_{\mathrm{o}} \times \mathcal{Y}_{\mathrm{o}}$ of z and a positive integer $k_z$ such that, for every $z' \in O_z$ ,

\begin{align*} \varepsilon_k \log R^{\varepsilon_k,z'}(B_E(h,r)) \ge -\inf_{h' \in B_E(h,r)} J_z (h')- \delta, \quad k \ge k_z. \end{align*}

For if not, for the open ball $O_i$ in $\mathcal{X}_{\mathrm{o}} \times \mathcal{Y}_{\mathrm{o}}$ with center z and radius $i^{-1}$ , we can find $z_i' \in O_i$ and a large enough positive integer $k_i$ (with $k_i > k_{i-1}$ ) such that

\begin{align*} \varepsilon_{k_i} \log R^{\varepsilon_{k_i},z_i'}(B_E(h,r)) < -\inf_{h' \in B_E(h,r)} J_z (h')- \delta, \end{align*}

but this contradicts the exponential continuity (as $z_i' \to z$ ). Hence,

\begin{align*} \begin{split} P^{\varepsilon_k}(B_E(h,r)) & \ge \int_{O_z}\exp\big\{\varepsilon_k^{-1}\cdot\varepsilon_k\log R^{\varepsilon_k,z'}(B_E(h,r))\big\}\, \, \mathrm{d}\pi_{\varepsilon_k}(z') \\ & \ge \exp\Big\{\!-\varepsilon_k^{-1}\Big(\inf_{h' \in B_E(h,r)} J_z (h') + \delta\Big)\Big\} \pi_{\varepsilon_k}(O_z). \end{split} \end{align*}

Invoking [Reference Bernton, Ghosal and Nutz3, Corollary 4.7], we arrive at

\begin{align*} \begin{split} \liminf_{k \to \infty} \varepsilon_k \log P^{\varepsilon_k} (B_E(h,r)) & \ge -\inf_{h' \in B_E(h,r)} J_z (h') - \delta - \inf_{z' \in O_z}\phi(z') \\ & \ge -(J_z(h)+\phi(z)) - \delta = -I(h) - \delta, \end{split} \end{align*}

establishing the desired claim.

For (ii), we first observe that for $A = e_{01}^{-1}(C)$ with $C \subset \mathcal{X}_{\mathrm{o}} \times \mathcal{Y}_{\mathrm{o}}$ compact, $P^{\varepsilon}(A) = \int_{C} R^{\varepsilon,z}(A) \, \, \mathrm{d}\pi_{\varepsilon}(z)$ . Taking into account [Reference Bernton, Ghosal and Nutz3, Proposition 4.5], extend $\phi$ to $\mathcal{X} \times \mathcal{Y}$ as

\begin{align*} \phi(x,y) = \sup_{\ell\ge2}\sup_{\{(x_i,y_i)\}_{i=1}^\ell\subset\mathrm{spt}(\pi_{\mathrm{o}})} \sup_{\tau}\sum_{i=1}^\ell c(x_i,y_i) - \sum_{i=1}^\ell c(x_i,y_{\tau(i)}), \end{align*}

where $\sup_{\tau}$ is taken over all permutations of $\{ 1,\ldots,\ell \}$ and $(x_1,y_1)=(x,y)$ . The function $\phi\colon\mathcal{X}\times\mathcal{Y}\rightarrow[0,\infty]$ is lower semicontinuous [Reference Bernton, Ghosal and Nutz3, Lemma 4.2] and agrees with the previous definition of $\phi$ on $\mathcal{X}_{\mathrm{o}} \times \mathcal{Y}_{\mathrm{o}}$ . Let $\delta > 0$ be given. For every $z \in C$ , by the exponential continuity of $\{ R^{\varepsilon_k,z} \}$ established in Proposition 4 we can choose a bounded open neighborhood $O_z \subset \mathcal{X} \times \mathcal{Y}$ of z and a positive integer $k_z$ such that, for every $z' \in O_z$ ,

\begin{align*} \varepsilon_k \log R^{\varepsilon_k,z'}(A) \le -\inf_{h \in A} J_z (h) + \delta, \quad k \ge k_z. \end{align*}

Furthermore, since $\phi$ is lower semicontinuous, by choosing $O_z$ smaller if necessary, we have $\inf_{z' \in \bar{O}_z} \phi(z') \ge \phi(z) - \delta$ , where $\bar{O}_z$ denotes the closure of $O_z$ in $\mathcal{X} \times \mathcal{Y}$ . By the compactness of C, we can find $z_1,\ldots,z_N \in C$ such that $C \subset \bigcup_{i=1}^N O_{z_i}$ , so

\begin{align*} P^{\varepsilon_k}(A) \le \sum_{i=1}^N\int_{O_{z_i}}\mathrm{e}^{\varepsilon_k^{-1}\cdot\varepsilon_k\log R^{\varepsilon_k,z}(A)}\, \, \mathrm{d}\pi_{\varepsilon_k}(z) \le \sum_{i=1}^N\exp\Big\{\varepsilon_k^{-1}\Big(-\inf_{h\in A}J_{z_i}(h) + \delta + \varepsilon_k \log\pi_{\varepsilon_k}(\bar{O}_{z_i})\Big)\Big\}. \end{align*}

We invoke the following elementary result, whose proof follows from Jensen’s inequality [Reference Chatterjee10].

Lemma 2. (Smooth max function.) For $\beta > 0$ and $v = (v_1,\ldots,v_N) \in \mathbb{R}^N$ , consider a smooth max function $m_\beta(v) = \beta^{-1}\log (\sum_{i=1}^N \mathrm{e}^{\beta v_i})$ . Then, for every $v \in \mathbb{R}^N$ , we have $\max_{1 \le i \le N}v_i \le m_\beta (v) \le \max_{1 \le i \le N}v_i + \beta^{-1}\log N$ .
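The two-sided bound in the lemma above is easy to check numerically. The sketch below is only an illustration: the test vector is random, the function name `smooth_max` is hypothetical, and the log-sum-exp is shifted by the maximum for numerical stability (a standard rewriting that does not change the value). In the proof, the lemma is applied with $\beta = \varepsilon_k^{-1}$, so the slack term $\beta^{-1}\log N = \varepsilon_k \log N$ vanishes as $k \to \infty$.

```python
import numpy as np


def smooth_max(v, beta):
    """m_beta(v) = beta^{-1} * log(sum_i exp(beta * v_i)), computed stably."""
    v = np.asarray(v, dtype=float)
    vmax = v.max()
    # Shifting by vmax avoids overflow and leaves the value unchanged.
    return vmax + np.log(np.sum(np.exp(beta * (v - vmax)))) / beta


rng = np.random.default_rng(0)
v = rng.normal(size=5)
for beta in (1.0, 10.0, 100.0):
    lower = v.max()
    upper = v.max() + np.log(v.size) / beta
    # Columns: beta, lower bound, smooth max, upper bound.
    print(beta, lower, smooth_max(v, beta), upper)
```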

Using Lemma 2 combined with [Reference Bernton, Ghosal and Nutz3, Corollary 4.3], we have

\begin{align*} \begin{split} \varepsilon_k\log P^{\varepsilon_k}(A) & \le \max_{1 \le i \le N}\Big\{{-}\inf_{h\in A}J_{z_i}(h) + \delta + \varepsilon_k\log\pi_{\varepsilon_k}(\bar{O}_{z_i})\Big\} + \varepsilon_k\log N \\ & \le \max_{1 \le i \le N}\Big\{{-}\inf_{h\in A}J_{z_i}(h) - \inf_{z\in\bar{O}_{z_i}}\phi(z)\Big\} + \delta + o(1) \\ & \le \max_{1 \le i \le N}\Big\{{-}\inf_{h\in A}J_{z_i}(h) - \phi(z_i)\Big\} + 2\delta + o(1) \\ & \le -\inf_{h\in A}\inf_{z\in C}\{J_z(h) + \phi(z)\} + 2\delta + o(1) \\ & = -\inf_{h\in A}\inf_{z\in C}\{I(h) + \iota_{\{z\}}(h(0),h(1))\} + 2\delta + o(1) \\ &= -\inf_{h \in A} I(h) + 2 \delta +o(1), \end{split} \end{align*}

where we used the fact that $(h(0),h(1)) \in C$ whenever $h \in A$ by our choice of A. This completes the proof.

4.4. Direct proof of Corollary 1

We first prove the following technical result concerning convergence of EOT potentials.

Lemma 3. (Convergence of EOT potentials.) Suppose that $\mathcal{X}$ and $\mathcal{Y}$ are compact and one of them agrees with the closure of a connected open set. Then, under normalization $\int\psi^c\,\, \mathrm{d}\mu_0 = \int\psi\,\, \mathrm{d}\mu_1$ , the OT potential $\psi$ from $\mu_1$ to $\mu_0$ is everywhere unique, and $(\psi^c,\psi)$ are bounded and Lipschitz on $\mathcal{X} \times \mathcal{Y}$ . Furthermore, let $(\varphi_{\varepsilon},\psi_{\varepsilon})$ be the unique EOT potentials under normalization $\int\varphi_{\varepsilon}\,\, \mathrm{d}\mu_0 = \int\psi_{\varepsilon}\,\, \mathrm{d}\mu_1$ . Then, for any sequence $\varepsilon_{k} \downarrow 0$ , $\varphi_{\varepsilon_k} \to \psi^c$ and $\psi_{\varepsilon_{k}} \to \psi$ uniformly on $\mathcal{X}$ and $\mathcal{Y}$ , respectively.

Proof. The lemma follows from [Reference Santambrogio41, Proposition 7.18] and [Reference Nutz and Wiesel36, Proposition 3.2]. We include a self-contained proof for completeness. First, under the current assumption, we observe that any OT potential $\psi$ is bounded and Lipschitz on $\mathcal{Y}$ . We have seen that the support of any OT plan $\pi$ is contained in $\partial^c\psi$ , so any $(x_0,y_0) \in \mathrm{spt}(\pi)$ satisfies $\psi(y_0) > -\infty$ and $\psi^c(x_0) > -\infty$ , which entails $\psi = \psi^{cc} \le \sup_{\mathcal{X}\times\mathcal{Y}}c - \psi^c(x_0)$ and $\psi \ge -\sup_{\mathcal{X}}\psi^c \ge -\sup_{\mathcal{X}\times\mathcal{Y}}c + \psi(y_0)$ . Lipschitz continuity follows from c-concavity. For the uniqueness, suppose $\mathrm{int}(\mathcal{Y})$ is connected. Recall that the projections of $\mathrm{spt}(\pi)$ onto $\mathcal{X}$ and $\mathcal{Y}$ agree with $\mathcal{X}$ and $\mathcal{Y}$ , respectively (cf. Remark 6). For any OT potential $\psi$ and any $y_0 \in \mathrm{int} (\mathcal{Y})$ , we can find $x_0 \in \mathcal{X}$ such that $\psi^c(x_0) + \psi(y_0) =c(x_0,y_0)$ , i.e. $c(x_0,\cdot) - \psi(\!\cdot\!)$ is minimized at $y_0$ , which entails $\nabla \psi(y_0) = \nabla_y c(x_0,y_0)$ as long as $\psi$ is differentiable at $y_0$ . We have shown that $\nabla \psi$ is uniquely determined Lebesgue-a.e. on $\mathrm{int} (\mathcal{Y})$ . As $\mathrm{int}(\mathcal{Y})$ is connected, $\psi$ is uniquely determined on $\mathrm{int}(\mathcal{Y})$ up to additive constants. By continuity, $\psi$ is uniquely determined on $\mathcal{Y}$ up to additive constants. If $\mathrm{int}(\mathcal{X})$ is connected, then the OT potential $\varphi$ from $\mu_0$ to $\mu_1$ is unique up to additive constants. If $\psi$ is an OT potential from $\mu_1$ to $\mu_0$ , then by the definition of the c-transform, we must have $\int(\psi-\varphi^c)\,\, \mathrm{d}\mu_1 = 0$ , which yields $\psi = \varphi^c$ $\mu_1$ -a.e. 
By continuity, we have $\psi = \varphi^c$ on $\mathcal{Y}$ .

For the second result, by the Schrödinger system (6) and Jensen’s inequality, we have $\psi_{\varepsilon}^c \le \varphi_{\varepsilon} \le \sup_{\mathcal{X} \times \mathcal{Y}} c$ and $\varphi_{\varepsilon}^c \le \psi_{\varepsilon} \le \sup_{\mathcal{X} \times \mathcal{Y}} c$ , so the EOT potentials are uniformly bounded by $\sup_{\mathcal{X} \times \mathcal{Y}} c$ . Furthermore, under our assumption, the EOT potentials extend to smooth functions on $\mathbb{R}^d$ by the Schrödinger system, and directly calculating derivatives shows that $|\nabla\varphi_{\varepsilon}|\vee|\nabla\psi_{\varepsilon}| \le C$ on $\mathcal{X} \times \mathcal{Y}$ for some constant C independent of $\varepsilon$ . Hence, the Ascoli–Arzelà theorem applies, and after passing to a subsequence, $\varphi_{\varepsilon_k} \to \bar{\varphi}$ and $\psi_{\varepsilon_k} \to \bar{\psi}$ uniformly on $\mathcal{X}$ and $\mathcal{Y}$ , respectively. By the identity $\int\mathrm{e}^{(\varphi_{\varepsilon}+\psi_{\varepsilon}-c)/\varepsilon}\,\, \mathrm{d}(\mu_0\otimes\mu_1) = 1$ and Fatou’s lemma, we have $\bar{\varphi} + \bar{\psi} \le c$ $(\mu_0 \otimes \mu_1)$ -a.e. By continuity, $\bar{\varphi} + \bar{\psi} \le c$ on $\mathcal{X} \times \mathcal{Y}$ , i.e. $\bar{\varphi} \le \bar{\psi}^c$ and $\bar{\psi} \le \bar{\varphi}^c$ . Since $\bar{\psi}^c \le \bar{\varphi}$ and $\bar{\varphi}^c \le \bar{\psi}$ by construction, we conclude that $\bar{\varphi} = \bar{\psi}^c$ and $\bar{\psi} = \bar{\varphi}^c$ , i.e. $(\bar{\varphi},\bar{\psi})$ are c-concave. Now, using duality, for any OT plan $\pi$ , $\int c\,\, \mathrm{d}\pi \le \lim_{k}\big(\int c\,\, \mathrm{d}\pi_{\varepsilon_k} + \varepsilon_k H(\pi_{\varepsilon_k}\mid\mu_0\otimes\mu_1)\big) = \int\bar{\varphi}\,\, \mathrm{d}\mu_0 + \int\bar{\psi}\,\, \mathrm{d}\mu_1 \le \int c\,\, \mathrm{d}\pi$ , so $(\bar{\varphi},\bar{\psi})$ are OT potentials. Since $\int\bar{\varphi}\,\, \mathrm{d}\mu_0 = \int\bar{\psi}\,\, \mathrm{d}\mu_1$ by construction, by the uniqueness result, $\bar{\varphi} = \psi^c$ and $\bar{\psi}=\psi$ .
Finally, by the uniqueness of the limits, along the original sequence, $\varphi_{\varepsilon_k} \to \psi^c$ and $\psi_{\varepsilon_k} \to \psi$ uniformly on $\mathcal{X}$ and $\mathcal{Y}$ , respectively.
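The two-sided control of the EOT potentials used in the proof above can be illustrated numerically. The sketch below assumes that one equation of the Schrödinger system (6) has the form $\varphi_{\varepsilon}(x) = -\varepsilon\log\int\mathrm{e}^{(\psi_{\varepsilon}(y)-c(x,y))/\varepsilon}\,\mathrm{d}\mu_1(y)$ , which is not restated in this section, and it uses a discrete $\mu_1$ with three atoms purely for illustration (so the assumptions of Lemma 3 are not met). The name `soft_c_transform` and all numerical values are hypothetical.

```python
import math

def soft_c_transform(psi, cost_row, weights, eps):
    """phi(x) = -eps * log( sum_j w_j * exp((psi_j - c(x, y_j)) / eps) ), computed stably."""
    vals = [(p - c) / eps for p, c in zip(psi, cost_row)]
    vmax = max(vals)
    s = sum(w * math.exp(val - vmax) for w, val in zip(weights, vals))
    return -eps * (vmax + math.log(s))

# Illustrative data (assumptions): atoms y_j of mu_1, their weights, and values psi(y_j).
ys = [0.0, 0.5, 1.0]
mu1 = [0.2, 0.3, 0.5]
psi = [0.1, -0.2, 0.05]
x = 0.3
cost_row = [0.5 * (x - y) ** 2 for y in ys]  # quadratic cost c(x, y) = |x - y|^2 / 2

# Hard c-transform psi^c(x) = min_j { c(x, y_j) - psi(y_j) }.
hard = min(c - p for c, p in zip(cost_row, psi))
# Jensen upper bound: integral of (c(x, .) - psi) with respect to mu_1.
mean = sum(w * (c - p) for w, c, p in zip(mu1, cost_row, psi))

for eps in (1.0, 0.1, 0.01):
    phi = soft_c_transform(psi, cost_row, mu1, eps)
    assert hard - 1e-12 <= phi <= mean + 1e-12
```

In this discrete toy setting the soft transform also satisfies $\varphi_\varepsilon(x) \le \psi^c(x) - \varepsilon\log(\min_j w_j)$ , where $w_j$ are the weights of the atoms, so it approaches the hard c-transform as $\varepsilon \downarrow 0$ .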

Direct proof of Corollary 1. Set $S = e_{01}^{-1} (\mathcal{X} \times \mathcal{Y}) \subset E$ . Recall that $\bar{R}^{\varepsilon} = \int R^{\varepsilon,xy}\,\, \mathrm{d}(\mu_0 \otimes \mu_1)$ . By construction, $\bar{R}^{\varepsilon}(S)=1$ for all $\varepsilon > 0$ . Abusing notation, we shall write $\phi_{\varepsilon}(\omega) = \phi_{\varepsilon} (\omega(0),\omega(1))$ . With this convention, we have $P^\varepsilon(A) = \int_A\mathrm{e}^{-\phi_\varepsilon/\varepsilon}\,\, \mathrm{d}\bar{R}^{\varepsilon}$ . Set $J(h) = \inf_{(x,y)\in\mathcal{X}\times\mathcal{Y}}J_{xy}(h)$ and $\phi(h) = \phi(h(0),h(1)) = c(h(0),h(1))-\psi^c (h(0))-\psi(h(1))$ for $h \in S$ .

Step 1

Let $A \subset E$ be open and pick any $h \in A$ such that $I(h)<\infty$ (if no such h exists then the conclusion is trivial). By Lemma 3, for every $\delta > 0$ there exists an open neighborhood $G \subset A$ of h such that $\sup_{\omega \in G \cap S}\phi_{\varepsilon_k}(\omega) \le \phi(h) + \delta$ for all large k. Hence,

\begin{align*} P^{\varepsilon_k}(A) \ge P^{\varepsilon_k}(G) \ge \mathrm{e}^{-(\phi(h)+\delta)/\varepsilon_k}\bar{R}^{\varepsilon_k}(G). \end{align*}

Corollary 2 implies that $\varepsilon_k \log P^{\varepsilon_k} (A) \ge -\phi(h) - \delta - J(h) + o(1)$ as $k \to \infty$ . Noting that $\phi(h) + J(h) = I(h)$ yields the desired lower bound.

Step 2

For the upper bound, we first note that by Lemma 3, $\phi_{\varepsilon}$ are uniformly lower bounded on S, and $\phi_{\varepsilon}(\omega) \ge -M$ for all $\omega \in S$ and $\varepsilon > 0$ for some $M >0$ . Let $A \subset E$ be closed. Pick any $\alpha < \infty$ and $\delta > 0$ . Set $\Psi_{J}(\alpha) = \{ h \colon J(h) \le \alpha \} \cap A$ , which is a compact subset of E as J is a good rate function and A is closed. By Lemma 3 and the lower semicontinuity of the function J, for every $h \in \Psi_{J}(\alpha)$ (which entails $h \in S$ ), we can find an open neighborhood $U_h$ of h such that $\inf_{\omega \in \bar{U}_h} J(\omega) \ge J(h) - \delta$ and $\inf_{\omega \in \bar{U}_h \cap S} \phi_{\varepsilon_k} (\omega) \ge \phi(h) - \delta$ for large k, where $\bar{U}_h$ denotes the closure of $U_h$ in E. By the compactness of $\Psi_J(\alpha)$ , we can find $h_1,\ldots,h_N \in \Psi_J(\alpha)$ such that $\Psi_J(\alpha) \subset \bigcup_{i=1}^N U_{h_i}$ . Now, setting $F=\big(\bigcup_{i=1}^N U_{h_i}\big)^c \cap A$ (which is a closed subset of E), we observe that

\begin{align*} P^{\varepsilon_k}(A) & = \int_A\mathrm{e}^{-\phi_{\varepsilon_k}/\varepsilon_k}\,\, \mathrm{d}\bar{R}^{\varepsilon_k} \\ & \le \sum_{i=1}^N\exp\big\{(\varepsilon_k\log\bar{R}^{\varepsilon_k}(\bar{U}_{h_i}) - \phi(h_i) + \delta)/\varepsilon_k\big\} + \mathrm{e}^{(M+\varepsilon_k \log \bar{R}^{\varepsilon_k}(F))/\varepsilon_k}. \end{align*}

Using Lemma 2, and combining Lemma 3 and Corollary 2, we have

\begin{align*} \begin{split} \varepsilon_k\log P^{\varepsilon_k}(A) & \le \max\big\{\varepsilon_k\log\bar{R}^{\varepsilon_k}(\bar{U}_{h_1}) - \phi(h_1) + \delta, \ldots, \varepsilon_k\log\bar{R}^{\varepsilon_k}(\bar{U}_{h_N}) - \phi(h_N) + \delta, \\ & \qquad\qquad M + \varepsilon_k\log\bar{R}^{\varepsilon_k}(F)\big\} + \varepsilon_k\log(N+1) \\ & \le \max\Big\{{-}\inf_{\omega\in\bar{U}_{h_1}}J(\omega) - \phi(h_1) + \delta, \ldots, -\inf_{\omega\in\bar{U}_{h_N}}J(\omega) - \phi(h_N) + \delta, \\ & \qquad\qquad M - \inf_{\omega\in F}J(\omega)\Big\} + o(1) \\ &\le \max\{-I(h_1) + 2\delta, \ldots, -I(h_N) + 2\delta, M-\alpha\} + o(1) \\ & \le \max\Big\{{-}\inf_{h \in A}I(h) + 2\delta, M-\alpha\Big\} + o(1), \end{split} \end{align*}

where we used $J(h) + \phi (h) = I(h)$ . Since $\alpha < \infty$ and $\delta >0$ are arbitrary, we obtain the desired upper bound. Finally, the rate function I being good follows from an argument similar to the proof of Corollary 2(iii). This completes the proof.

4.5. Proof of Proposition 2

Proof of Proposition 2. The fact that the sequence $\{ P^{\varepsilon_k}_{st} \}_{k \in \mathbb{N}}$ satisfies an LDP having a good rate function follows from Corollary 1 and the contraction principle. The rate function is given by

\begin{align*} I_{st}(x,y) = \inf_{h: h(s)=x, h(t)=y} \bigg\{ \frac{\| h \|_H^2}{2} - \varphi(h(0)) - \psi(h(1)) \bigg\}. \end{align*}

First, fix two endpoints $h(0)=x'$ and $h(1)=y'$ and minimize $\| h \|_H^2$ under the constraint $(h(s),h(t))=(x,y)$ . The optimal h is given by

\begin{equation*} h(u) = \begin{cases} \bigg(1-\dfrac{u}{s}\bigg)x' + \dfrac{u}{s}x & \text{if} \ u \in [0,s], \\[10pt] \bigg(1-\dfrac{u-s}{t-s}\bigg)x + \dfrac{u-s}{t-s}y & \text{if} \ u \in [s,t], \\[10pt] \bigg(1-\dfrac{u-t}{1-t}\bigg)y + \dfrac{u-t}{1-t}y' & \text{if} \ u \in [t,1], \end{cases} \end{equation*}

which gives $\| h \|_H^2/2 = c^{0,s}(x',x)+c^{s,t}(x,y)+c^{t,1}(y,y')$ . Hence,

\begin{align*} \begin{split} I_{st}(x,y) & = \inf_{x',y'}\big\{c^{0,s}(x',x)+c^{s,t}(x,y)+c^{t,1}(y,y') - \varphi(x') - \psi(y') \big \} \\ & = c^{s,t}(x,y) + \mathcal{Q}_{s}(\!-\varphi)(x) + \mathcal{Q}_{1-t}(\!-\psi)(y) = c^{s,t}(x,y) - \varphi_s(x) - \psi_t(y). \end{split} \end{align*}

The final claim follows from [Reference Villani47, Theorem 7.35] after adjusting the signs.
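The energy identity for the piecewise linear minimizer can be checked numerically. The following Python sketch works in dimension one and assumes the cost has the form $c^{a,b}(x,y) = |x-y|^2/(2(b-a))$ , which is not restated in this section. The helper names and the numerical values are illustrative only.

```python
def piecewise_linear_path(xp, x, y, yp, s, t):
    """Return h with h(0)=xp, h(s)=x, h(t)=y, h(1)=yp, linear in between (0 < s < t < 1)."""
    def h(u):
        if u <= s:
            return (1 - u / s) * xp + (u / s) * x
        if u <= t:
            r = (u - s) / (t - s)
            return (1 - r) * x + r * y
        r = (u - t) / (1 - t)
        return (1 - r) * y + r * yp
    return h

def cost(a, b, x, y):
    """Assumed form c^{a,b}(x, y) = |x - y|^2 / (2 (b - a))."""
    return (x - y) ** 2 / (2 * (b - a))

def energy(h, knots, n=1000):
    """Approximate (1/2) * integral of |h'(u)|^2 du by finite differences on each segment."""
    total = 0.0
    for a, b in zip(knots[:-1], knots[1:]):
        du = (b - a) / n
        for i in range(n):
            u0, u1 = a + i * du, a + (i + 1) * du
            total += 0.5 * ((h(u1) - h(u0)) / du) ** 2 * du
    return total

# Illustrative endpoints and times (assumptions).
xp, x, y, yp, s, t = 0.0, 1.0, -0.5, 2.0, 0.25, 0.6
h = piecewise_linear_path(xp, x, y, yp, s, t)
lhs = energy(h, [0.0, s, t, 1.0])
rhs = cost(0, s, xp, x) + cost(s, t, x, y) + cost(t, 1, y, yp)
assert abs(lhs - rhs) < 1e-6
```

Because the path is linear on each of the three segments, the finite-difference energy agrees with the sum of the three costs up to rounding error.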

4.6. Proof of Proposition 3

Proof of Proposition 3. The EOT plan $\check{\pi}^\varepsilon$ is of the form

\begin{align*} \, \mathrm{d}\check{\pi}^{\varepsilon}(x,y) = \exp\{(\check{\varphi}_\varepsilon(x)+\check{\psi}_\varepsilon(y)-c_{\varepsilon}(x,y))/\varepsilon\}\, \, \mathrm{d}(\mu_0 \otimes \mu_1)(x,y), \end{align*}

where $(\check{\varphi}_\varepsilon,\check{\psi}_\varepsilon)$ are EOT potentials satisfying the Schrödinger system (6) with c replaced by $c_\varepsilon$ . For uniqueness, we assume without loss of generality that $\int\check{\varphi}_\varepsilon\,\, \mathrm{d}\mu_0 = \int\check{\psi}_\varepsilon\,\, \mathrm{d}\mu_1$ . Consider the mixture distribution $Q^{\varepsilon} = \int\check{R}^{\varepsilon,xy}\,\, \mathrm{d}(\mu_0 \otimes \mu_1)(x,y)$ ; then

\begin{align*} \frac{\, \mathrm{d}\check{P}^\varepsilon}{\, \mathrm{d} Q^\varepsilon}(\omega) = \exp\bigg\{\frac{1}{\varepsilon}(\check{\varphi}_\varepsilon(\omega(0)) + \check{\psi}_\varepsilon(\omega(1)) - c_{\varepsilon}(\omega(0),\omega(1)))\bigg\}, \quad \omega = (\omega(t))_{t \in [0,1]} \in E. \end{align*}

Furthermore, by [Reference Stroock44, Theorems 4.4.6 and 4.4.12], we have $\lim_{\varepsilon \downarrow 0} c_{\varepsilon}(x,y) = {|x-y|^2}/{2} = c(x,y)$ uniformly over $(x,y) \in \mathcal{X} \times \mathcal{Y}$ . Hence, in view of the direct proof of Corollary 1, the desired claim follows once we verify the following:

  • the mixture distributions $\{ Q^{\varepsilon_k} \}_{k \in \mathbb{N}}$ satisfy the LDP with good rate function $J(h) = \inf_{(x,y) \in \mathcal{X} \times \mathcal{Y}} J_{xy}(h)$ ;

  • as $k \to \infty$ , $\check{\varphi}_{\varepsilon_k}\to\psi^c$ and $\check{\psi}_{\varepsilon_k}\to\psi$ uniformly on $\mathcal{X}$ and $\mathcal{Y}$ , respectively.

The first item follows by establishing the exponential continuity of $\{ \check{R}^{\varepsilon_k, xy} \}_{k \in \mathbb{N}}$ w.r.t. (x,y). To this end, we invoke the Radon–Nikodym derivative of the Langevin bridge $\check R^{\varepsilon,xy}$ against the Brownian bridge $R^{\varepsilon,xy}$ :

(17) \begin{equation} \frac{\, \mathrm{d}\check{R}^{\varepsilon,xy}}{\, \mathrm{d} R^{\varepsilon,xy}}(\omega) = Z_{\varepsilon,xy}^{-1}\exp\bigg\{{-}\frac{\varepsilon}{2}\int_0^1 \big(|\nabla V(\omega(t))|^2 -\Delta V(\omega(t))\big)\,\, \mathrm{d} t\bigg\}, \end{equation}

where $\Delta V$ is the Laplacian of V and $Z_{\varepsilon,xy}$ is the normalizing constant. See [Reference Levy and Krener28, Section 5] and the proof of [Reference Conforti12, Theorem 2.1]; see also Remark 11. Heuristically, this follows from the following observation. The Langevin diffusion $X^\varepsilon$ follows the SDE

\begin{align*} \, \mathrm{d} X^\varepsilon(t) = -\varepsilon\nabla V(X^\varepsilon(t))\,\, \mathrm{d} t + \sqrt{\varepsilon}\,\, \mathrm{d} W(t). \end{align*}

The Girsanov theorem yields that

\begin{align*} \frac{\, \mathrm{d}\check{R}^\varepsilon}{\, \mathrm{d} R^\varepsilon}(\omega) = \exp\bigg\{{-}\int_0^1\nabla V(\omega(t))\cdot\,\, \mathrm{d}\omega(t) - \frac{\varepsilon}{2}\int_0^1|\nabla V(\omega(t))|^2\,\, \mathrm{d} t\bigg\} \end{align*}

under $R^\varepsilon$ . An application of Itô’s formula yields

\begin{align*} \int_0^1\nabla V(\omega(t))\cdot\, \mathrm{d}\omega(t) = V(\omega(1)) - V(\omega(0)) - \frac{\varepsilon}{2}\int_0^1\Delta V(\omega(t))\,\, \mathrm{d} t \end{align*}

under $R^\varepsilon$ . The bridge case is obtained by canceling $V(\omega(1)) - V(\omega(0))$ , which is to be expected since it depends only on the endpoints. Now, since the potential V has bounded derivatives, the desired exponential continuity follows from Proposition 4.

For the second item, by the Schrödinger system and Jensen’s inequality, we have

\begin{align*} \begin{split} |\check{\varphi}_\varepsilon(x) - \check{\varphi}_\varepsilon (x')| & \le \sup_{y \in \mathcal{Y}}|c_\varepsilon(x,y) - c_\varepsilon(x',y)| \\ & \le \sup_{y \in \mathcal{Y}}|c(x,y) - c(x',y)| + 2\sup_{\mathcal{X} \times \mathcal{Y}}|c_\varepsilon-c|. \end{split} \end{align*}

By the generalized Ascoli–Arzelà theorem (cf. [Reference Nutz and Wiesel36, Lemma 2.2]), the sequence of functions $\{ \check{\varphi}_{\varepsilon_k} \}_{k \in \mathbb{N}}$ converges uniformly on $\mathcal{X}$ along a subsequence. A similar result holds for $\check{\psi}_{\varepsilon_k}$ . The rest of the proof is analogous to the second part of the proof of Lemma 3. This completes the proof.
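The equicontinuity estimate for $\check{\varphi}_\varepsilon$ can be illustrated on a toy example. The sketch below assumes the same form of the Schrödinger system as in (6) with a discrete $\mu_1$ , and it replaces the true cost $c_\varepsilon$ , which is not reproduced here, by a hypothetical perturbation `c_eps` of the quadratic cost satisfying $\sup|c_\varepsilon - c| \le \varepsilon$ . All names and values are illustrative.

```python
import math

def soft_potential(x, ys, weights, psi, cost, eps):
    """-eps * log( sum_j w_j * exp((psi_j - cost(x, y_j)) / eps) ), computed stably."""
    vals = [(p - cost(x, y)) / eps for p, y in zip(psi, ys)]
    vmax = max(vals)
    s = sum(w * math.exp(val - vmax) for w, val in zip(weights, vals))
    return -eps * (vmax + math.log(s))

eps = 0.05

def c(x, y):
    """Quadratic cost |x - y|^2 / 2."""
    return 0.5 * (x - y) ** 2

def c_eps(x, y):
    """Hypothetical perturbed cost with |c_eps - c| <= eps (stand-in for the true c_eps)."""
    return c(x, y) + eps * math.sin(x * y)

# Illustrative discrete target measure and potential values (assumptions).
ys = [i / 10 for i in range(11)]
weights = [1 / len(ys)] * len(ys)
psi = [0.3 * math.cos(3 * y) for y in ys]

def check(x, xp):
    """Return (|phi(x) - phi(xp)|, oscillation of c_eps, oscillation of c) over the atoms."""
    gap = abs(soft_potential(x, ys, weights, psi, c_eps, eps)
              - soft_potential(xp, ys, weights, psi, c_eps, eps))
    osc_eps = max(abs(c_eps(x, y) - c_eps(xp, y)) for y in ys)
    osc = max(abs(c(x, y) - c(xp, y)) for y in ys)
    return gap, osc_eps, osc

gap, osc_eps, osc = check(0.2, 0.7)
assert gap <= osc_eps + 1e-12            # first inequality of the display
assert osc_eps <= osc + 2 * eps + 1e-12  # second inequality (triangle inequality)
```

The first inequality reflects that the soft minimum is monotone and commutes with adding constants, so it is 1-Lipschitz with respect to the sup-norm of its argument.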

Remark 11. (Derivation of (17).) Formally, the Radon–Nikodym derivative (17) follows by reducing to the $\varepsilon=1$ case via reparameterization and [Reference Conforti12, (25)]. Indeed, the process $Y^\varepsilon(t)=X^{\varepsilon}(t)/\sqrt{\varepsilon}$ satisfies $\, \mathrm{d} Y^{\varepsilon}(t) = -\nabla V^\varepsilon(Y^\varepsilon(t))\,\, \mathrm{d} t + \, \mathrm{d} W(t)$ , where $V^{\varepsilon}(x) = V(\sqrt{\varepsilon}x)$ . By [Reference Conforti12, (25)], denoting by $Y^{\varepsilon}_{\#}\mathbb{P}$ the law of the process $Y^\varepsilon = (Y^\varepsilon (t))_{t \in [0,1]}$ , we have

\begin{align*} \begin{split} \frac{\, \mathrm{d}(Y^{\varepsilon}_{\#}\mathbb{P})^{xy}}{\, \mathrm{d} R^{1,xy}}(\omega) & = Z_{xy}^{-1}\exp\bigg\{{-}\frac{1}{2}\int_0^1\big(|\nabla V^\varepsilon(\omega(t))|^2 - \Delta V^\varepsilon(\omega(t))\big)\,\, \mathrm{d} t \bigg\} \\ & = Z_{xy}^{-1}\exp\bigg\{{-}\frac{\varepsilon}{2} \int_0^1 \big(|\nabla V(\sqrt{\varepsilon}\omega(t))|^2 - \Delta V(\sqrt{\varepsilon}\omega(t))\big)\,\, \mathrm{d} t \bigg\}, \end{split} \end{align*}

where $Z_{xy}$ is the normalizing constant. Now, the formula (17) follows by a simple reparameterization.
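The scaling step in Remark 11, namely $|\nabla V^\varepsilon(x)|^2 - \Delta V^\varepsilon(x) = \varepsilon\big(|\nabla V(\sqrt{\varepsilon}x)|^2 - \Delta V(\sqrt{\varepsilon}x)\big)$ , can be checked by finite differences. The sketch below is one-dimensional and uses $V(x) = \log\cosh x$ , a hypothetical potential with bounded derivatives chosen only for illustration.

```python
import math

def V(x):
    """Illustrative potential with bounded derivatives (assumption): log cosh."""
    return math.log(math.cosh(x))

def dV(x):
    """First derivative of V."""
    return math.tanh(x)

def d2V(x):
    """Second derivative of V (the Laplacian in dimension one)."""
    return 1.0 / math.cosh(x) ** 2

eps = 0.3
r = math.sqrt(eps)

def V_eps(x):
    """Rescaled potential V^eps(x) = V(sqrt(eps) * x)."""
    return V(r * x)

def integrand_scaled(x, h=1e-4):
    """|grad V^eps|^2 - Laplacian V^eps at x, by central finite differences."""
    g = (V_eps(x + h) - V_eps(x - h)) / (2 * h)
    lap = (V_eps(x + h) - 2 * V_eps(x) + V_eps(x - h)) / h ** 2
    return g * g - lap

def integrand_original(x):
    """eps * (|grad V|^2 - Laplacian V) evaluated at sqrt(eps) * x, in closed form."""
    return eps * (dV(r * x) ** 2 - d2V(r * x))

for x in (-2.0, -0.5, 0.0, 1.3):
    assert abs(integrand_scaled(x) - integrand_original(x)) < 1e-6
```

The factor $\varepsilon$ arises because each derivative of $V^\varepsilon$ contributes a factor $\sqrt{\varepsilon}$ by the chain rule.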

Acknowledgement

The author would like to thank the editor and two anonymous referees for their careful reading and constructive comments that helped improve the quality of this paper.

Funding information

K. Kato is partially supported by NSF grants DMS-1952306, DMS-2210368, and DMS-2413405.

Competing interests

There were no competing interests to declare which arose during the preparation or publication process of this article.

References

Altschuler, J. M., Niles-Weed, J. and Stromme, A. J. (2022). Asymptotics for semidiscrete entropic optimal transport. SIAM J. Math. Anal. 54, 1718–1741.
Ambrosio, L., Gigli, N. and Savaré, G. (2008). Gradient Flows: In Metric Spaces and in the Space of Probability Measures. Springer, Berlin.
Bernton, E., Ghosal, P. and Nutz, M. (2022). Entropic optimal transport: Geometry and large deviations. Duke Math. J. 171, 3363–3400.
Bernton, E., Heng, J., Doucet, A. and Jacob, P. E. (2019). Schrödinger bridge samplers. Preprint, arXiv:1912.13170.
Boissard, E., Gozlan, N., Lehec, J., Léonard, C., Menz, G. and Schlichting, A. (2014). Some recent developments in functional inequalities. ESAIM: Proc. 44, 338–354.
Bravo, G. U. and Chaumont, L. (2011). Markovian bridges: Weak continuity and pathwise constructions. Ann. Prob. 39, 609–647.
Brenier, Y. (1991). Polar factorization and monotone rearrangement of vector-valued functions. Commun. Pure Appl. Math. 44, 375–417.
Carlier, G., Duval, V., Peyré, G. and Schmitzer, B. (2017). Convergence of entropic schemes for optimal transport and gradient flows. SIAM J. Math. Anal. 49, 1385–1418. doi:10.1137/15M1050264.
Carlier, G., Pegon, P. and Tamanini, L. (2023). Convergence rate of general entropic optimal transport costs. Calc. Var. Partial Differ. Equ. 62, 116. doi:10.1007/s00526-023-02455-0.
Chatterjee, S. (2005). An error bound in the Sudakov–Fernique inequality. Preprint, arXiv:0510424.
Chizat, L., Roussillon, P., Léger, F., Vialard, F.-X. and Peyré, G. (2020). Faster Wasserstein distance estimation with the Sinkhorn divergence. In Proc. 34th Int. Conf. Neural Information Processing Systems, 2257–2269.
Conforti, G. (2018). Fluctuations of bridges, reciprocal characteristics and concentration of measure. Ann. Inst. H. Poincaré Prob. Statist. 54, 1432–1463.
Conforti, G. and Tamanini, L. (2021). A formula for the time derivative of the entropic cost and applications. J. Funct. Anal. 280, 108964. doi:10.1016/j.jfa.2021.108964.
Cuturi, M. (2013). Sinkhorn distances: Lightspeed computation of optimal transport. In Proc. 27th Int. Conf. Neural Information Processing Systems, 2292–2300.
Dai Pra, P. (1991). A stochastic control approach to reciprocal diffusion processes. Appl. Math. Optim. 23, 313–329. doi:10.1007/BF01442404.
De Bortoli, V., Thornton, J., Heng, J. and Doucet, A. (2021). Diffusion Schrödinger bridge with applications to score-based generative modeling. In Proc. 35th Int. Conf. Neural Information Processing Systems, 17695–17709.
Dembo, A. and Zeitouni, O. (1998). Large Deviations Techniques and Applications, 2nd edn. Springer, New York. doi:10.1007/978-1-4612-5320-4.
Dinwoodie, I. H. and Zabell, S. L. (1992). Large deviations for exchangeable random vectors. Ann. Prob. 20, 1147–1166.
Föllmer, H. (1988). Random fields and diffusion processes. In École d’Été de Probabilités de Saint-Flour XV–XVII, 1985–87, eds P. Diaconis, D. Elworthy, H. Föllmer, E. Nelson, G. Papanicolaou and Srinivasa R. S. Varadhan. Springer, New York, pp. 101–203.
Gentil, I., Léonard, C., Ripani, L. and Tamanini, L. (2020). An entropic interpolation proof of the HWI inequality. Stoch. Process. Appl. 130, 907–923. doi:10.1016/j.spa.2019.04.002.
Gigli, N. and Tamanini, L. (2021). Second order differentiation formula on $RCD^*(K,N)$ spaces. J. Europ. Math. Soc. 23, 1727–1795. doi:10.4171/jems/1042.
Giné, E. and Nickl, R. (2021). Mathematical Foundations of Infinite-Dimensional Statistical Models. Cambridge University Press.
Hsu, P. (1990). Brownian bridges on Riemannian manifolds. Prob. Theory Relat. Fields 84, 103–118.
Karatzas, I. and Shreve, S. E. (1998). Brownian Motion and Stochastic Calculus. Springer, New York. doi:10.1007/978-1-4612-0949-2.
Lehec, J. (2013). Representation formula for the entropy and functional inequalities. Ann. Inst. H. Poincaré Prob. Statist. 49, 885–899. doi:10.1214/11-AIHP464.
Léonard, C. (2012). From the Schrödinger problem to the Monge–Kantorovich problem. J. Funct. Anal. 262, 1879–1920. doi:10.1016/j.jfa.2011.11.026.
Léonard, C. (2014). A survey of the Schrödinger problem and some of its connections with optimal transport. Discrete Continuous Dynam. Syst. 34, 1533–1574. doi:10.3934/dcds.2014.34.1533.
Levy, B. C. and Krener, A. J. (1993). Dynamics and kinematics of reciprocal diffusions. J. Math. Phys. 34, 1846–1875.
McCann, R. J. (1997). A convexity principle for interacting gases. Adv. Math. 128, 153–179.
Mena, G. and Niles-Weed, J. (2019). Statistical bounds for entropic optimal transport: Sample complexity and the central limit theorem. In Proc. 33rd Int. Conf. Neural Information Processing Systems, 4541–4551.
Mikami, T. (2004). Monge’s problem with a quadratic cost by the zero-noise limit of h-path processes. Prob. Theory Relat. Fields 129, 245–260.
Mikami, T. and Thieullen, M. (2008). Optimal transportation problem by stochastic optimal control. SIAM J. Control Optim. 47, 1127–1139.
Mikulincer, D. and Shenfeld, Y. (2021). The Brownian transport map. Preprint, arXiv:2111.11521.
Monsaingeon, L., Tamanini, L. and Vorotnikov, D. (2023). The dynamical Schrödinger problem in abstract metric spaces. Adv. Math. 426, 109100. doi:10.1016/j.aim.2023.109100.
Nutz, M. (2021). Introduction to entropic optimal transport. Lecture Notes, Columbia University, New York.
Nutz, M. and Wiesel, J. (2022). Entropic optimal transport: Convergence of potentials. Prob. Theory Relat. Fields 184, 401–424.
Pal, S. (2024). On the difference between entropic cost and the optimal transport cost. Ann. Appl. Prob. 34, 1003–1028.
Pavon, M., Trigila, G. and Tabak, E. G. (2021). The data-driven Schrödinger bridge. Commun. Pure Appl. Math. 74, 1545–1573.
Peyré, G. and Cuturi, M. (2019). Computational optimal transport: With applications to data science. Foundations and Trends in Machine Learning 11, 355–607.
Pooladian, A.-A. and Niles-Weed, J. (2021). Entropic estimation of optimal transport maps. Preprint, arXiv:2109.12004.
Santambrogio, F. (2015). Optimal Transport for Applied Mathematicians. Springer, New York.
Staudt, T., Hundrieser, S. and Munk, A. (2022). On the uniqueness of Kantorovich potentials. Preprint, arXiv:2201.08316.
Stromme, A. (2023). Sampling from a Schrödinger bridge. Proc. Mach. Learn. Res. 26, 4058–4067.
Stroock, D. W. (2008). Partial Differential Equations for Probabilists. Cambridge University Press.
Stroock, D. W. (2011). Probability Theory: An Analytic View, 2nd edn. Cambridge University Press.
van der Vaart, A. W. and Wellner, J. A. (1996). Weak Convergence and Empirical Processes: With Applications to Statistics. Springer, New York.
Villani, C. (2009). Optimal Transport: Old and New. Springer, New York.