Transportation on spheres via an entropy formula

The paper proves transportation inequalities for probability measures on spheres for the Wasserstein metrics with respect to cost functions that are powers of the geodesic distance. Let $\mu$ be a probability measure on the sphere ${\bf S}^n$ of the form $d\mu =e^{-U(x)}{\rm d}x$ where ${\rm d}x$ is the rotation invariant probability measure, and $(n-1)I+{\hbox {Hess}}\,U\geq {\kappa _U}I$, where $\kappa _U>0$. Then any probability measure $\nu$ of finite relative entropy with respect to $\mu$ satisfies ${\hbox {Ent}}(\nu \mid \mu ) \geq (\kappa _U/2)W_2(\nu,\, \mu )^2$. The proof uses an explicit formula for the relative entropy which is also valid on connected and compact $C^\infty$ smooth Riemannian manifolds without boundary. A variation of this entropy formula gives the Lichnerowicz integral.


Transportation on the sphere
Optimal transportation involves moving unit mass from one probability distribution to another, at minimal cost, where the cost is measured by Wasserstein's distance.
DEFINITION. Let $(M, d)$ be a compact metric space and let $\mu$ and $\nu$ be probability measures on $M$. Then for $1 \leq p < \infty$, Wasserstein's distance from $\mu$ to $\nu$ is
$$W_p(\nu, \mu) = \inf_{\pi} \Bigl( \int_{M \times M} d(x, y)^p \, \pi(dx\, dy) \Bigr)^{1/p},$$
where the infimum is taken over probability measures $\pi$ on $M \times M$ that have marginals $\nu$ and $\mu$. (See [8], [14].) Transportation inequalities are results that bound the transportation cost $W_p(\nu, \mu)^p$ in terms of $\mu$, $\nu$ and geometrical quantities of $(M, d)$. Typically, one chooses $\mu$ to satisfy special conditions, and then one imposes minimal hypotheses on $\nu$. In this section, we consider the case where $(M, d)$ is the unit sphere $S^2$ in ${\bf R}^3$, and obtain transportation inequalities by vector calculus. In section two, we extend these methods to a connected, compact and $C^\infty$ smooth Riemannian manifold $(M, d)$.
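As a concrete numerical illustration (not part of the paper), when $\mu$ and $\nu$ are uniform empirical measures on the same number of atoms, the infimum over couplings $\pi$ is attained at a permutation matrix, so $W_p$ can be computed by brute force; the sketch below uses the geodesic metric on the circle as a toy example.

```python
import itertools
import math

def wasserstein_p(xs, ys, dist, p=2):
    """W_p between the uniform empirical measures on xs and ys (equal size).
    By Birkhoff's theorem the optimal coupling is a permutation matrix, so
    we minimise the transport cost over all permutations (tiny n only)."""
    n = len(xs)
    cost = min(
        sum(dist(xs[i], ys[sigma[i]]) ** p for i in range(n))
        for sigma in itertools.permutations(range(n))
    )
    return (cost / n) ** (1.0 / p)

def d_circle(a, b):
    """Geodesic distance on the unit circle, angles in radians."""
    t = abs(a - b) % (2.0 * math.pi)
    return min(t, 2.0 * math.pi - t)

# rotating three atoms by 0.1 moves each atom a geodesic distance 0.1
xs = [0.0, math.pi / 2, math.pi]
ys = [x + 0.1 for x in xs]
print(wasserstein_p(xs, ys, d_circle, p=2))  # 0.1 up to rounding
```

Here the optimal coupling is the identity permutation, so $W_2$ equals the common displacement.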
On $S^2$, let $\theta \in [0, 2\pi)$ be the longitude and $\phi \in [0, \pi]$ the colatitude, so the area measure is $dx = \sin\phi \, d\phi \, d\theta$. Let $ABC$ be a spherical triangle where $A$ is the North Pole; then by [10] the Green's function $G(B, C) = -(4\pi)^{-1} \log(1 - \cos d(B, C))$ may be expressed in terms of the longitude and colatitude of $B$ and $C$ via the spherical cosine formula. A related cost function is listed in [14], p. 972. Given probability measures $\mu$ and $\nu$ on $S^2$, we can form
$$h(x) = \int_{S^2} G(x, y) \, (\mu - \nu)(dy),$$
with gradient $\nabla h$ in the $x$ variable.

PROPOSITION 1.1. Let $\mu$ and $\nu$ be nonatomic probability measures on $S^2$. Then
$$W_1(\nu, \mu) \leq \int_{S^2} \|\nabla h(x)\| \, dx.$$

Proof. The Green's function is chosen so that $\nabla \cdot \nabla G(B, C) = \delta_B(C) - 1/(4\pi)$ in the sense of distributions. Given nonatomic probability measures $\mu$ and $\nu$ on $S^2$, their difference $\mu - \nu$ is orthogonal to the constants on $S^2$, so for a $1$-Lipschitz function $\varphi : S^2 \to {\bf R}$, we have
$$\int_{S^2} \varphi \, d(\mu - \nu) = \int_{S^2} \varphi(x) \, \nabla \cdot \nabla h(x) \, dx = -\int_{S^2} \nabla \varphi(x) \cdot \nabla h(x) \, dx \leq \int_{S^2} \|\nabla h(x)\| \, dx,$$
so by Kantorovich's duality theorem [8], the Wasserstein transportation distance is bounded by $W_1(\nu, \mu) \leq \int_{S^2} \|\nabla h(x)\| \, dx$.

DEFINITION. Suppose that $\mu$ is a probability measure and $\nu$ is a probability measure that is absolutely continuous with respect to $\mu$, so $d\nu = v \, d\mu$ for some probability density function $v \in L^1(\mu)$. Then the relative entropy of $\nu$ with respect to $\mu$ is
$$\mathrm{Ent}(\nu \mid \mu) = \int v \log v \, d\mu,$$
where $0 \leq \mathrm{Ent}(\nu \mid \mu) \leq \infty$ by Jensen's inequality.
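The spherical cosine formula $\cos d = \cos\phi_B \cos\phi_C + \sin\phi_B \sin\phi_C \cos(\theta_B - \theta_C)$ gives $d(B, C)$, and hence $G(B, C)$, directly from the coordinates; a short sketch (not from the paper), using the coordinate conventions above:

```python
import math

def sphere_dist(phi_b, theta_b, phi_c, theta_c):
    """Geodesic distance on S^2 from colatitudes phi and longitudes theta,
    via the spherical cosine formula."""
    c = (math.cos(phi_b) * math.cos(phi_c)
         + math.sin(phi_b) * math.sin(phi_c) * math.cos(theta_b - theta_c))
    return math.acos(max(-1.0, min(1.0, c)))  # clamp against rounding

def green(phi_b, theta_b, phi_c, theta_c):
    """G(B, C) = -(4 pi)^{-1} log(1 - cos d(B, C))."""
    d = sphere_dist(phi_b, theta_b, phi_c, theta_c)
    return -math.log(1.0 - math.cos(d)) / (4.0 * math.pi)

# North Pole (phi = 0) to a point on the equator (phi = pi/2): d = pi/2
print(sphere_dist(0.0, 0.0, math.pi / 2, 1.0))  # pi/2
```

Note that $G$ diverges as $C \to B$ (logarithmic singularity) and is finite at the antipode, where $d = \pi$.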
At $x \in S^2$, we have the tangent space $T_x S^2 = \{ y \in {\bf R}^3 : x \cdot y = 0 \}$. For $y \in T_x S^2$ with $\|y\| = 1$, we consider $\exp_x(ty) = x \cos t + y \sin t$, so that $\exp_x(0) = x$, $\|\exp_x(ty)\| = 1$ and $(d/dt)_{t=0} \exp_x(ty) = y$; hence $\exp_x : T_x S^2 \to S^2$ gives the exponential map. We let $J\exp_x$ be the Jacobian determinant of this map.
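These properties of $\exp_x$ can be checked directly (a sketch, not from the paper):

```python
import math

def exp_map(x, y, t):
    """exp_x(t*y) = x cos t + y sin t, for unit x in S^2 and unit y in T_x S^2."""
    return tuple(xi * math.cos(t) + yi * math.sin(t) for xi, yi in zip(x, y))

x = (0.0, 0.0, 1.0)   # base point (North Pole)
y = (1.0, 0.0, 0.0)   # unit tangent vector: x . y = 0
p = exp_map(x, y, 0.75)

# exp_x(t y) stays on the unit sphere ...
print(sum(c * c for c in p))        # 1.0 up to rounding
# ... exp_x(0) = x, and the geodesic makes angle t with x
print(exp_map(x, y, 0.0))           # (0.0, 0.0, 1.0)
```

The dot product $x \cdot \exp_x(ty)$ equals $\cos t$, confirming that $t$ is the geodesic distance travelled.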
Suppose that $\mu(dx) = e^{-U(x)} dx$ is a probability measure and $\nu$ is a probability measure that is absolutely continuous with respect to $\mu$, so $d\nu = v \, d\mu$. We say that a Borel function $\Psi : S^2 \to S^2$ induces $\nu$ from $\mu$ if $\int f(y) \, \nu(dy) = \int f(\Psi(x)) \, \mu(dx)$ for all $f \in C(S^2; {\bf R})$. McCann [12] showed that there exists $\Psi$ that gives the optimal transport strategy for the $W_2$ metric; further, there exists a Lipschitz function $\psi : S^2 \to {\bf R}$ such that $\Psi(x) = \exp_x(\nabla \psi(x))$, so that (1.6) holds, as in [14] p. 569. In [5] and [6], the authors obtain some functional inequalities that are related to $T_p$ inequalities. Here we offer an approach that is more direct, and uses only basic differential geometry to augment McCann's fundamental result. The key point is an explicit formula for the relative entropy in terms of the optimal transport maps.
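The induced-measure identity $\int f \, d\nu = \int f \circ \Psi \, d\mu$ can be illustrated by Monte Carlo sampling. In this sketch (not from the paper), $\mu$ is the uniform measure on the circle and $\Psi$ is a rotation, chosen only because its pushforward is computable by hand; it is a toy stand-in for an optimal map.

```python
import math
import random

random.seed(0)

# mu: uniform probability measure on [0, 2*pi), identified with the circle
samples = [random.uniform(0.0, 2.0 * math.pi) for _ in range(100000)]

def Psi(t):
    """A rotation of the circle; it pushes the uniform measure to itself."""
    return (t + 1.0) % (2.0 * math.pi)

f = math.cos
# integral of f against nu, estimated via f(Psi(x)) sampled from mu;
# since Psi preserves mu here, both estimates approximate 0
est_nu = sum(f(Psi(t)) for t in samples) / len(samples)
est_mu = sum(f(t) for t in samples) / len(samples)
print(abs(est_nu), abs(est_mu))  # both small
```

The point is only the definition: to integrate $f$ against the induced measure $\nu$, one integrates $f \circ \Psi$ against $\mu$.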
LEMMA 1.2. Suppose that $\nu$ has finite relative entropy with respect to $\mu$, and let $H = \mathrm{Hess}_x \, \psi(x)$ and $A = \mathrm{Hess}_x \, d(x, y)^2/2$ at $y = \Psi(x)$. (1.7) Then the relative entropy satisfies (1.8), where $A$ is positive definite, $H$ is symmetric and $A + H$ is also positive definite; and if $\psi \in C^2(S^2; {\bf R})$, then equality holds in (1.8).
Proof. To express the relative entropy in terms of the transportation map, we adapt an argument from [1]. We have
$$\mathrm{Ent}(\nu \mid \mu) = \int_{S^2} \bigl( U(\Psi(x)) - U(x) \bigr) \, \mu(dx) - \int_{S^2} \log J(x) \, \mu(dx),$$
where the final term arises from the Jacobian $J$ of the change of variable $y = \Psi(x)$, where $\Psi = \Psi_1$ and $\Psi_t(x) = \exp_x(t \nabla \psi(x))$. We compute this Jacobian by the chain rule for derivatives with respect to $x$. Specifically, by [6] p. 622, we have $\mathrm{Hess}(\psi(x) + d(x, y)^2/2) \geq 0$ and
$$J(x) = J\exp_x(\nabla \psi(x)) \, \det(A + H),$$
where $J\exp_x$ is the Jacobian of $\exp_x : T_x S^2 \to S^2$ and $\mathrm{Hess} = D_x^2$ is the Hessian, where the expression is evaluated at $y = \exp_x(\nabla \psi(x))$. For $x \in S^2$ and $\tau \in {\bf R}^3$ such that $x \cdot \tau = 0$, we have $\tau \in T_x S^2$ and
$$\exp_x(\tau) = \cos(\|\tau\|) \, x + \sin(\|\tau\|) \, \frac{\tau}{\|\tau\|}; \qquad (1.12)$$
see [5]. By a vector calculus computation, which we replicate from [5], one finds
$$J\exp_x(\tau) = \frac{\sin \|\tau\|}{\|\tau\|}. \qquad (1.13)$$
With $\psi : S^2 \to {\bf R}$ we have $\nabla \psi(x) \perp x$, so $0 = x \cdot \nabla \psi(x)$; differentiating, $0 = \nabla \psi(x) + \mathrm{Hess}(\psi(x)) x$. We write $\theta = \|\nabla \psi(x)\|$ for the angle between $x$ and $\Psi(x)$, and let $v = x \times \theta^{-1} \nabla \psi(x)$, where $\times$ denotes the usual vector product; then $\{x, \theta^{-1} \nabla \psi(x), v\}$ gives an orthonormal basis of ${\bf R}^3$. Hence, with respect to the basis $\{\theta^{-1} \nabla \psi(x), v\}$ of $T_x S^2$,
$$A = \begin{bmatrix} 1 & 0 \\ 0 & \theta \cot \theta \end{bmatrix};$$
hence $A$ is positive definite and is a rank-one perturbation of a multiple of the identity matrix.
Note that the formulas degenerate on the cut locus d(x, y) = π; consider the international date line opposite the Greenwich meridian.
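Away from the cut locus, the rank-one structure of $A$ can be checked numerically: in normal coordinates at $x$, the Hessian of $d(x, y)^2/2$ should have eigenvalue $1$ along the geodesic to $y$ and $\theta \cot \theta$ transverse to it. The finite-difference sketch below (not from the paper, and assuming those eigenvalues) confirms this at $\theta = 1$.

```python
import math

def sdist(u, v):
    """Geodesic distance on S^2 via the dot product."""
    return math.acos(max(-1.0, min(1.0, sum(a * b for a, b in zip(u, v)))))

def exp_np(a, b):
    """Exponential map at the North Pole of the tangent vector a*e1 + b*e2."""
    t = math.hypot(a, b)
    if t < 1e-15:
        return (0.0, 0.0, 1.0)
    return (math.sin(t) * a / t, math.sin(t) * b / t, math.cos(t))

theta = 1.0
y = exp_np(theta, 0.0)          # target point at distance theta along e1

def F(a, b):
    """F = d(x, y)^2 / 2 pulled back to normal coordinates (a, b) at x."""
    return 0.5 * sdist(exp_np(a, b), y) ** 2

h = 1e-4
# central second differences at the origin of normal coordinates
A11 = (F(h, 0.0) - 2.0 * F(0.0, 0.0) + F(-h, 0.0)) / h ** 2   # radial
A22 = (F(0.0, h) - 2.0 * F(0.0, 0.0) + F(0.0, -h)) / h ** 2   # transverse
print(A11, A22)  # approximately 1 and theta/tan(theta)
```

For $\theta$ beyond $\pi/2$ the transverse entry $\theta \cot \theta$ changes sign, which is one reason the cut locus needs care.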
We have (1.16), and we can combine the first two terms in (1.16) by the divergence theorem, in which the Alexandrov Hessian [6], [14] p. 363 satisfies $\mathrm{trace} \, \mathrm{Hess} \, \psi \leq \Delta_D \psi$, where $\Delta_D \psi$ is the distributional Laplacian of the Lipschitz function $\psi$; so we recognise (1.8).
We have the orthonormal basis $\{x, \theta^{-1} \nabla \psi(x), v\}$ for ${\bf R}^3$, in which the final two vectors give an orthonormal basis for $T_x S^2$; hence $A$ and $H$ have $2 \times 2$ matrix forms with respect to the stated basis of $T_x S^2$. The function $f(x) = x - 1 - \log x$ for $x > 0$ is convex and attains its minimum value $f(1) = 0$. Let $T$ be a self-adjoint matrix with eigenvalues $\lambda_1 \geq \dots \geq \lambda_n$, where $\lambda_n > -1$; then the Carleman determinant of $I + T$ is
$$\det{}_2(I + T) = \prod_{j=1}^n (1 + \lambda_j) e^{-\lambda_j}.$$
Since $A + H$ is positive definite, as in [1] Corollary 4.3, we can apply the spectral theorem to compute the Carleman determinant and show that (1.26) holds.

PROPOSITION 1.3. Suppose that the Hessian matrix of $U$ satisfies
$$I + \mathrm{Hess} \, U \geq \kappa_U I \qquad (1.27)$$
for some $\kappa_U > 0$. Then $\mu$ satisfies the transportation inequality
$$\mathrm{Ent}(\nu \mid \mu) \geq \frac{\kappa_U}{2} W_2(\nu, \mu)^2. \qquad (1.28)$$
This applies in particular when $\mu$ is normalized surface area measure.
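Since $f(x) = x - 1 - \log x \geq 0$, we have $-\log \det_2(I + T) = \sum_j f(1 + \lambda_j) \geq 0$, so $\det_2(I + T) \leq 1$ with equality only for $T = 0$; a quick numerical sketch of this (not from the paper):

```python
import math

def carleman_det2(eigs):
    """det_2(I + T) = product over eigenvalues lam of T of (1 + lam) e^{-lam},
    for self-adjoint T with every eigenvalue > -1."""
    assert all(lam > -1.0 for lam in eigs)
    return math.prod((1.0 + lam) * math.exp(-lam) for lam in eigs)

# -log det_2(I + T) is a sum of f(1 + lam) with f(x) = x - 1 - log x >= 0,
# so det_2(I + T) <= 1, with equality exactly when all eigenvalues vanish
print(carleman_det2([0.5, -0.2, 0.1]))  # strictly below 1
print(carleman_det2([0.0, 0.0]))        # exactly 1.0
```

This is the nonnegative contribution referred to in the proof of Proposition 1.3.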
Proof. Let $K : [0, \pi) \to {\bf R}$ be the function
$$K(\theta) = 1 - \theta \cot \theta + \log \frac{\theta}{\sin \theta}. \qquad (1.29)$$
Then from (1.13) and (1.26) we have (1.30). Considering the final integral in (1.8), we note that $t \mapsto \Psi_t(x)$ is a geodesic which has constant speed $\|\partial \Psi_t(x)/\partial t\| = \|\nabla \psi(x)\|$ and satisfies $\langle \partial \Psi_t(x)/\partial t, \Psi_t(x) \rangle = 0$; also
$$\frac{d^2}{dt^2} U(\Psi_t(x)) = \Bigl\langle \mathrm{Hess} \, U \, \frac{\partial \Psi_t(x)}{\partial t}, \frac{\partial \Psi_t(x)}{\partial t} \Bigr\rangle - \|\nabla \psi(x)\|^2 \bigl\langle \nabla U(\Psi_t(x)), \Psi_t(x) \bigr\rangle,$$
where the final term is zero since $\nabla U(\Psi_t(x))$ is in the tangent space at $\Psi_t(x)$, hence is perpendicular to $\Psi_t(x)$. We therefore have the crucial inequality
$$\mathrm{Ent}(\nu \mid \mu) \geq \int_{S^2} K(\|\nabla \psi(x)\|) \, \mu(dx) + \int_{S^2} \int_0^1 (1 - t) \Bigl\langle \mathrm{Hess} \, U \, \frac{\partial \Psi_t(x)}{\partial t}, \frac{\partial \Psi_t(x)}{\partial t} \Bigr\rangle \, dt \, \mu(dx). \qquad (1.32)$$
To simplify the function $K$, we recall from [9] 8.342 the Maclaurin series, expressed there via Euler's $\Gamma$ function and Riemann's $\zeta$ function,
$$\log \frac{\theta}{\sin \theta} = \sum_{k=1}^\infty \frac{\zeta(2k)}{k} \frac{\theta^{2k}}{\pi^{2k}}, \qquad \theta \cot \theta = 1 - 2 \sum_{k=1}^\infty \zeta(2k) \frac{\theta^{2k}}{\pi^{2k}},$$
so
$$K(\theta) = \sum_{k=1}^\infty \Bigl( 2 + \frac{1}{k} \Bigr) \zeta(2k) \frac{\theta^{2k}}{\pi^{2k}} \geq \frac{3 \zeta(2)}{\pi^2} \theta^2 = \frac{\theta^2}{2}. \qquad (1.34)$$
Now we consider (1.32) with the hypothesis (1.27) in force. The Carleman determinant contributes a nonnegative term as in (1.25), while the final integral in (1.32) combines with the integral of $K(\|\nabla \psi(x)\|)$ to give
$$\mathrm{Ent}(\nu \mid \mu) \geq \frac{\kappa_U}{2} \int_{S^2} \|\nabla \psi(x)\|^2 \, \mu(dx) = \frac{\kappa_U}{2} W_2(\nu, \mu)^2.$$
When $\mu$ is normalized surface area, $U$ is a constant and the hypothesis (1.27) holds with $\kappa_U = 1$.
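A series of the kind used here is the classical expansion $\log(\theta/\sin\theta) = \sum_{k \geq 1} \zeta(2k)\,\theta^{2k}/(k \pi^{2k})$, which follows from the product formula for the sine (equivalently, from the Maclaurin series of $\log\Gamma$). Assuming this is the relevant expansion, it can be checked numerically; the sketch below (not from the paper) approximates $\zeta$ by a crude partial sum.

```python
import math

def zeta(s, terms=100000):
    """Crude partial-sum approximation of Riemann's zeta function, s > 1."""
    return sum(1.0 / n ** s for n in range(1, terms + 1))

def log_theta_over_sin(theta, kmax=10):
    """log(theta / sin theta) = sum_{k>=1} zeta(2k) theta^{2k} / (k pi^{2k}),
    valid for 0 < theta < pi."""
    return sum(
        zeta(2 * k) * theta ** (2 * k) / (k * math.pi ** (2 * k))
        for k in range(1, kmax + 1)
    )

theta = 1.0
series = log_theta_over_sin(theta)
exact = math.log(theta / math.sin(theta))
print(series, exact)  # both approximately 0.1726
```

The leading term $\zeta(2)\theta^2/\pi^2 = \theta^2/6$ is what produces the quadratic lower bound used in the proof.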

Transportation on compact Riemannian manifolds
Let $M$ be a connected, compact and $C^\infty$ smooth Riemannian manifold of dimension $n$ without boundary, and let $g$ be the Riemannian metric tensor, giving the metric $d$. Let $\mu(dx) = e^{-U(x)} dx$ be a probability measure on $M$, where $dx$ is Riemannian measure and $U \in C^2(M; {\bf R})$. Suppose that $\nu$ is a probability measure on $M$ that is of finite relative entropy with respect to $\mu$.
Then by McCann's theory [12], there exists a Lipschitz function $\psi : M \to {\bf R}$ such that $\Psi(x) = \exp_x(\nabla \psi(x))$ induces $\nu$ from $\mu$; we let $\Psi_t(x) = \exp_x(t \nabla \psi(x))$. We proceed to compute the quantities which we need for our extension of Lemma 1.2.
LEMMA 2.1. Suppose that $\Psi_t(x) = \exp_x(t \nabla \psi(x))$, where $\Psi_1$ induces the probability measure $\nu$ from $\mu$ and gives the optimal transport map for the $W_2$ metric. Then the relative entropy satisfies (2.12), where $H$ is symmetric and $A + H$ is also positive definite. If $\psi \in C^2(M; {\bf R})$, then equality holds in (2.12).
Proof. This is similar to Lemma 1.2. As in (1.25), we have the corresponding trace formula, and by standard calculations [13] p. 32 we can expand the Jacobian of the exponential map. The curvature operator is the symmetric operator $R_Z : Y \mapsto R(Z, Y) Z$. If $R_Z \geq 0$ as a matrix for all $Z$, so that in particular $M$ has nonnegative Ricci curvature, then we have (2.15) by (3.4) of [Ca].