Mixing and average mixing times for general Markov processes

Abstract Yuval Peres and Perla Sousi showed that the mixing times and average mixing times of reversible Markov chains on finite state spaces are equal up to some universal multiplicative constant. We use tools from nonstandard analysis to extend this result to reversible Markov chains on compact state spaces that satisfy the strong Feller property.


Introduction
Consider the simple random walk on Z_n with n even, given by X_m = X_0 + δ_1 + ⋯ + δ_m (mod n) for an i.i.d. sequence δ_1, δ_2, . . . of random variables with Unif({−1, +1}) distribution. Starting from X_0 = 0, it is easy to see that X_{2m} is always even, while X_{2m+1} is always odd for m ∈ N.
This periodic behaviour means that the chain is not ergodic. On the other hand, there are various ways in which this periodic behaviour seems to be essentially the only obstacle to mixing. For example, for T ∼ Geom(Cn²) with a sufficiently large constant C > 0, one can easily check that the distribution of X_T is very close to uniform on Z_n (see e.g., [LPW] for a coupling argument).
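Both phenomena are easy to verify numerically. The following sketch (the cycle size and the geometric parameter are illustrative choices, not values from the paper) checks that X_m always has the parity of m, while the law of X_T for a geometric random time T with mean of order n² is nearly uniform:

```python
import numpy as np

n = 8  # walk on the cycle with N = 2n = 16 states (illustrative size)
N = 2 * n
P = np.zeros((N, N))
for i in range(N):
    P[i, (i - 1) % N] = 0.5
    P[i, (i + 1) % N] = 0.5

pi = np.full(N, 1.0 / N)  # uniform stationary distribution

# Distribution of X_m started at 0: supported on states with the parity of m.
mu = np.zeros(N); mu[0] = 1.0
for m in range(1, 7):
    mu = mu @ P
    support = np.nonzero(mu)[0]
    assert np.all(support % 2 == m % 2)  # X_m has the parity of m

# A geometric random time T with mean C*n^2 smooths out the parity:
# the law of X_T is sum_t P(T = t) mu_t, which is close to uniform.
p = 1.0 / (4 * n * n)         # success probability, so E[T] = 4 n^2
law_T = np.zeros(N)
mu = np.zeros(N); mu[0] = 1.0
weight = p                     # P(T = 1) = p, P(T = t) = p (1-p)^(t-1)
for t in range(200 * n * n):   # truncate the geometric sum; the tail is negligible
    mu = mu @ P
    law_T += weight * mu
    weight *= (1 - p)
tv = 0.5 * np.abs(law_T - pi).sum()
print("TV distance of X_T from uniform:", tv)
```

The deterministic chain never converges, but randomizing the observation time over a spread-out distribution already yields near-uniformity.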
This sort of (near-)periodicity is often undesired, and a common way to "fix" the problem is to replace a chain with an ε-lazy version (in the example above, the 1/3-lazy chain can be obtained by sampling the driving sequence δ_1, δ_2, . . . i.i.d. ∼ Unif({−1, 0, +1})). This chain is also close to uniform after Θ(n²) steps, but it is natural to ask if smaller modifications can also eliminate periodic behaviour; for the above example, choosing T ∼ Unif({Cn², Cn² + 1}) spreads our random time over only two choices but still works well. This minimal modification turns out to work quite generally, and [PS] shows that this gives an equivalent reduction in the time a discrete chain takes to mix (see also refinements in [HP]). The modest goal of this paper is to give a quick proof of the analogous result for continuous chains.
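The two-point random time can also be checked numerically. In the sketch below (sizes are again illustrative), the walk observed at a single large time stays at total variation distance about 1/2 from uniform, while averaging over two consecutive times removes the obstruction entirely:

```python
import numpy as np

N = 16  # cycle Z_N with N even, so the walk is periodic
P = np.zeros((N, N))
for i in range(N):
    P[i, (i - 1) % N] = 0.5
    P[i, (i + 1) % N] = 0.5
pi = np.full(N, 1.0 / N)

def tv(mu):
    """Total variation distance from the uniform distribution."""
    return 0.5 * np.abs(mu - pi).sum()

# Precompute the laws mu_t of X_t started at 0.
mu = np.zeros(N); mu[0] = 1.0
dists = [mu]
for _ in range(4 * N * N):
    dists.append(dists[-1] @ P)

t0 = 2 * N * N  # a deterministic time of order C n^2
print("single time t0:     ", tv(dists[t0]))                       # stays near 1/2
print("T ~ Unif{t0, t0+1}: ", tv(0.5 * (dists[t0] + dists[t0 + 1])))  # tiny
```

Spreading the observation time over just two consecutive values is enough, which is exactly the phenomenon the paper quantifies.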
Beyond providing a proof of this useful result, we were motivated to write this paper as a way to illustrate the machinery developed in [ADS], which shows that mixing times and hitting times are equal up to multiplicative constants for general Markov processes satisfying regularity conditions. That machinery can be used to give fairly quick and simple translations of facts about discrete chains into facts about continuous chains.

Notation and Main Results
We fix a compact metric state space X endowed with its Borel σ-algebra B[X] and let {P_x^{(t)}(⋅)}_{x∈X,t∈N} denote the transition kernel of a Markov process with stationary measure π. Throughout this paper, all transition kernels are assumed to have a unique stationary distribution. We occasionally write δ_x(⋅) for the Dirac measure at x. Throughout the paper, we include 0 in N. We write P_x(A) and P(x, A) as abbreviations for P_x^{(1)}(A), and we write ∥µ − ν∥_TV for the usual total variation distance between µ and ν.
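For discrete distributions, the total variation distance reduces to half an ℓ¹ distance; a minimal self-contained sketch:

```python
def tv_distance(mu, nu):
    """Total variation distance between two discrete distributions,
    given as dicts mapping states to probabilities."""
    states = set(mu) | set(nu)
    return 0.5 * sum(abs(mu.get(x, 0.0) - nu.get(x, 0.0)) for x in states)

print(tv_distance({"a": 0.5, "b": 0.5}, {"a": 1.0}))  # 0.5
```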

Definition .
We use {P_L^{(t)}(x, ⋅)}_{x∈X,t∈N} to denote the transition kernel of the lazy chain, given by P_L(x, ⋅) = (1/2)δ_x(⋅) + (1/2)P(x, ⋅). It has the same stationary distribution as {P^{(t)}(x, ⋅)}_{x∈X,t∈N}, and we denote its mixing time by t_L(ε).
In general, it is possible to have t_m ≫ t_L due to (near-)periodicity. To avoid these issues, one could instead take an average over two successive steps. This suggests the following definition.

Definition .
For ε ∈ R_{>0}, the average mixing time is t_avg(ε) = min{t ∈ N ∶ sup_{x∈X} ∥(1/2)(P_x^{(t)}(⋅) + P_x^{(t+1)}(⋅)) − π∥_TV ≤ ε}. Recall that a transition kernel {P_x^{(t)}(⋅)}_{x∈X,t∈N} is reversible with respect to π if ∫_A P(x, B) π(dx) = ∫_B P(x, A) π(dx) for all A, B ∈ B[X]. We generalize the result of Peres and Sousi to transition kernels with compact metric state space satisfying the following continuity condition.
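As a degenerate finite illustration (not an example from the paper): for the deterministic two-state flip chain, the raw distribution never approaches stationarity, so the plain mixing time is infinite, while the average over two successive steps is exactly stationary from the first step. A minimal Python check:

```python
import numpy as np

P = np.array([[0.0, 1.0], [1.0, 0.0]])  # deterministic flip; period 2
pi = np.array([0.5, 0.5])               # stationary distribution

mu0 = np.array([1.0, 0.0])  # start in state 0
for t in range(1, 6):
    mu_t = np.linalg.matrix_power(P, t)[0]
    avg = 0.5 * (np.linalg.matrix_power(P, t)[0] + np.linalg.matrix_power(P, t + 1)[0])
    single = 0.5 * np.abs(mu_t - pi).sum()   # stays at 1/2 forever
    averaged = 0.5 * np.abs(avg - pi).sum()  # identically 0
    print(t, single, averaged)
```

This is the extreme case of the gap t_m ≫ t_avg that the definitions above are designed to capture.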

Definition .
(DSF) The transition kernel {P_x^{(t)}(⋅)}_{x∈X,t∈N} satisfies the strong Feller property if for every x ∈ X and every ε > 0, there exists δ > 0 such that d(x, y) < δ implies ∥P_x(⋅) − P_y(⋅)∥_TV < ε. Throughout the paper, we use C to denote the collection of discrete-time reversible transition kernels with compact metric state space satisfying this strong Feller assumption. Our main theorem is the following.

Equivalent Form of Mixing Times and Hitting Times
In this section, we define a quantity that is asymptotically equivalent to the mixing times defined in the previous section. This equivalent form plays an important role throughout the entire paper. Let d(t) = sup_{x∈X} ∥P_x^{(t)}(⋅) − π∥_TV. Similarly, for ε ∈ R_{>0}, define the standardized mixing time to be t(ε) = min{t ∈ N ∶ d(t) ≤ ε}, and let t_L(ε) be the analogous quantity for the lazy kernel g_L.

Nonstandard Analysis and Nonstandard Probability Theory
In this paper, we use nonstandard analysis, powerful machinery derived from mathematical logic, as our main toolkit. For those who are not familiar with nonstandard analysis, [DRW] and [DR] provide reviews tailored to probabilists and statisticians, while [ACH, CNOSP, WL] provide thorough introductions. For completeness, we give a brief introduction to nonstandard analysis as well as nonstandard probability theory. This section is taken from [ADS] and [Kei].
Given any set S, the superstructure V(S) over S is formed by iterating the power set operation countably many times. That is, V_0(S) = S, V_{n+1}(S) = V_n(S) ∪ P(V_n(S)) for each n ∈ N, and V(S) = ⋃_{n∈N} V_n(S).

R. Anderson, H. Duanmu, and A. Smith
We use * : V(S) → V(*S) to denote the nonstandard extension map taking elements, sets, functions, relations, etc., to their nonstandard counterparts. An internal object is an element of a set *b for some b ∈ V(S). We assume that S contains R as a subset. In particular, *R and *N denote the nonstandard extensions of the reals and natural numbers, respectively. An element r ∈ *R is infinite if r > n for every n ∈ N and is finite otherwise. An element r ∈ *R with r > 0 is infinitesimal if r^{−1} is infinite. For r, s ∈ *R, we use the notation r ≈ s as shorthand for the statement "r − s is infinitesimal," and similarly, we use r ⪆ s as shorthand for the statement "either r ≥ s or r ≈ s." Given a topological space (X, T), the monad of a point x ∈ X is the set µ(x) = ⋂{*U ∶ x ∈ U ∈ T}. An element x ∈ *X is near-standard if x ∈ µ(y) for some y ∈ X; we then say y is the standard part of x and write y = st(x). Note that such y is unique when X is Hausdorff. We use NS(*X) to denote the collection of near-standard elements of *X, and we say NS(*X) is the near-standard part of *X. The standard part map st is a function from NS(*X) to X, taking near-standard elements to their standard parts. In both cases, the notation elides the underlying space and the topology T, because these will always be clear from context. For a metric space (X, d), two elements x, y ∈ *X satisfy x ≈ y if and only if *d(x, y) ≈ 0. An internal probability measure µ on (*X, *B[X]) is an internal function from *B[X] to *[0, 1] such that (1) µ(*X) = 1; and (2) µ is hyperfinitely additive (that is, it satisfies the usual equality for an additive measure, but with the sum ranging from 1 to any element in *N).
The Loeb space of an internal probability space (Ω, F, µ) is the standard countably additive probability space obtained by extending st ∘ µ from the internal algebra F to the σ-algebra it generates, and then completing. Every standard model is closely connected to its nonstandard extension via the transfer principle, which asserts that a first-order statement is true in the standard model if and only if it is true in the nonstandard model. Finally, given a cardinal number κ, a nonstandard model is called κ-saturated if the following condition holds: if F is a family of internal sets with cardinality less than κ and F has the finite intersection property, then ⋂F is non-empty. In this paper, we assume our nonstandard model is as saturated as we need (see e.g., [ACH] for the existence of κ-saturated nonstandard models for any uncountable cardinal κ).

Hyperfinite Representation of Compact Spaces
In this section, we give an overview of hyperfinite representations of compact metric spaces. Hyperfinite representations of more general metric spaces are discussed in [DRW], and we use similar notation to [DRW]. For the rest of the paper, we use the common notation d(x, A) = inf{d(x, y) ∶ y ∈ A} for every x ∈ X and every A ⊂ X.
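In code, the distance to a set is just an infimum over the set; a trivial finite sketch:

```python
def dist_to_set(x, A, d):
    """d(x, A) = inf over y in A of d(x, y), for a finite set A."""
    return min(d(x, y) for y in A)

print(dist_to_set(0.0, [0.3, -0.2, 1.0], lambda x, y: abs(x - y)))  # 0.2
```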
The formal definition of a hyperfinite representation of a compact metric space is given below.

Definition .
Let (X, d) be a compact metric space and let δ ∈ *R_{>0} be an infinitesimal. A δ-hyperfinite representation of X is a tuple (S, {B(s)}_{s∈S}) such that (i) S is a hyperfinite subset of *X; (ii) s ∈ B(s) ∈ *B[X] for every s ∈ S; (iii) for every s ∈ S, the diameter of B(s) is no greater than δ; (iv) B(s_1) ∩ B(s_2) = ∅ for distinct s_1, s_2 ∈ S. The set S is called the base set of the hyperfinite representation of X. For every x ∈ ⋃_{s∈S} B(s), we use s_x to denote the unique element in S such that x ∈ B(s_x).
As discussed in [DRW], hyperfinite representations exist for more general spaces. For simplicity, we focus on hyperfinite representations of compact metric spaces in this paper. Moreover, by [ADS], for every compact metric space X and every positive infinitesimal δ, there exists a δ-hyperfinite representation of X.
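Although the hyperfinite construction lives in the nonstandard model, its finite shadow is just a fine partition of the space into cells of small diameter, each tagged by a base point it contains. The sketch below is a standard-world analogue for X = [0, 1], with a small positive `delta` playing the role of the infinitesimal (the function names are ours, not the paper's):

```python
def representation(delta):
    """Finite analogue of a delta-hyperfinite representation of [0, 1]:
    base points s and disjoint half-open cells B(s) of diameter <= delta
    covering [0, 1]."""
    k = int(1.0 / delta) + 1
    S = [i * delta for i in range(k)]             # base set; s lies in B(s)
    B = {s: (s, min(s + delta, 1.0)) for s in S}  # cell [s, s + delta)
    return S, B

def s_x(x, S, B, delta):
    """The unique base point s with x in B(s)."""
    return S[min(int(x / delta), len(S) - 1)]

S, B = representation(0.1)
print(s_x(0.34, S, B, 0.1))  # 0.30000000000000004 (floating point)
```

In the nonstandard model, taking `delta` infinitesimal makes every point of the compact space infinitely close to its tag s_x, which is what makes S a faithful stand-in for X.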

Hyperfinite Representation of Markov Processes
We give a brief introduction to hyperfinite representations of general Markov processes in this section. The construction of such hyperfinite representations is developed in [ADS]. Let {P_x^{(t)}(⋅)}_{x∈X,t∈N} be the transition kernel of a discrete-time Markov process with state space X. We assume that X is a compact metric space for the remainder of the paper unless otherwise mentioned. The transition kernel can be viewed as a function g ∶ X × N × B[X] → [0, 1] with g(x, t, A) = P_x^{(t)}(A) for every x ∈ X, t ∈ N and A ∈ B[X]. We will use g(x, t, A) and P_x^{(t)}(A) interchangeably. We will construct an internal transition kernel on S to represent the standard transition kernel g. We fix a time line M = {0, 1, . . . , K} for some infinite K ∈ *N throughout the paper. A hyperfinite Markov process is defined analogously to a finite Markov process. Namely, a hyperfinite Markov process is characterized by the following four ingredients: (1) a state space S that is a non-empty hyperfinite set; (2) a time line M; (3) an initial distribution {ν_i ∶ i ∈ S} ⊂ *R where each ν_i ≥ 0 and ∑_{i∈S} ν_i = 1; (4) a set {p_{ij}}_{i,j∈S} of non-negative hyperreals with ∑_{j∈S} p_{ij} = 1 for every i ∈ S.
It was shown in [DRW] that one can always construct a hyperfinite Markov process from a fixed collection {ν_i ∶ i ∈ S} and {p_{ij}}_{i,j∈S}.
Let p be a standard probability measure on (X, B[X]) and let (S, {B(s)}_{s∈S}) be a δ-hyperfinite representation of X for some infinitesimal δ. Define the associated hyperfinite probability measure P (with respect to p) to be the internal probability measure on S given by P({s}) = *p(B(s)) for every s ∈ S. We now quote the following result from [ADS]: there exists an internal transition kernel H on S such that (i) the associated internal probability measure Π (with respect to π) is a *stationary distribution of H; (ii) if g is reversible with respect to π, then H is *reversible with respect to Π; (iii) for every t ∈ N, every s ∈ NS(S), and every internal A ⊆ S, we have *g(s, t, ⋃_{a∈A} B(a)) ≈ H(s, t, A).
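The finite shadow of this construction is easy to exhibit: given a standard kernel on a space and a partition into cells, the induced finite chain sends a base point s_i to cell B(s_j) with probability g(s_i, B(s_j)). The sketch below uses a hypothetical kernel on the circle [0, 1) (move by a uniform step of length 0.2) and the cell size 0.1; both choices are illustrative, not from the paper:

```python
import numpy as np

delta = 0.1
K = 10                             # number of cells partitioning the circle [0, 1)
S = [i * delta for i in range(K)]  # base points; B(s_i) = [i*delta, (i+1)*delta)

def g(x, a, b):
    """A hypothetical kernel on the circle: from x, move to x + U (mod 1)
    with U ~ Unif[0, 0.2). Returns the transition probability into [a, b)."""
    mass, step = 0.0, 0.2
    for shift in (0.0, 1.0):       # account for wrap-around past 1
        lo, hi = max(x, a + shift), min(x + step, b + shift)
        mass += max(0.0, hi - lo)
    return mass / step

# Finite analogue of the internal kernel: p[i][j] = g(s_i, B(s_j)).
p = np.array([[g(S[i], S[j], S[j] + delta) for j in range(K)] for i in range(K)])
assert np.allclose(p.sum(axis=1), 1.0)  # each row is a probability vector
print(p[0][:3])  # from cell 0, half the mass lands in cell 0 and half in cell 1
```

In the nonstandard model the same formula, applied with a hyperfinite base set and infinitesimal cells, produces the internal kernel H whose *transition probabilities are infinitely close to those of g, which is the content of item (iii) above.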

Mixing Times and Average Mixing Times with Their Nonstandard Counterparts
In this section, we develop nonstandard notions of mixing and average mixing times for hyperfinite Markov processes, and we show that the nonstandard notions agree with the standard notions. For the remainder of the paper, we fix g ∈ C with stationary measure π. We also fix a hyperfinite representation (S, {B(s)}_{s∈S}) of X and an internal transition kernel {H(s, t, ⋅)}_{s∈S,t∈M} on S as in the theorem quoted above.

Agreement on Mixing Times
The following lemma shows that the mixing time of the lazy chain is no greater than the mixing time of the hyperfinite lazy chain. It is desirable to prove the reverse direction of this lemma as well. To do this, we introduce the following definition, which replaces a "soft" inequality ≤ by a "strict" inequality <.

Definition .
Let ε ∈ R_{>0}. The strict mixing time is t^{(<)}(ε) = min{t ∈ N ∶ sup_{x∈X} ∥P_x^{(t)}(⋅) − π∥_TV < ε}. For every ε > 0, we write t_L^{(<)}(ε) to denote the strict mixing time of the lazy chain.

Theorem
For every ε ∈ R_{>0}: t_L^{(<)}(ε) agrees with T_L^{(<)}(ε) = min{t ∈ M ∶ max_{s∈S} ∥H_L^{(t)}(s, ⋅) − Π∥ < ε}, the internal strict mixing time of the lazy version of the hyperfinite chain.
Hence, we have the desired result. ∎

Agreement of Average Mixing Time
In this section, we show that the hyperfinite average mixing time is equivalent to the standard average mixing time. We begin by showing that standard average mixing times are no greater than hyperfinite average mixing times.

Theorem
For every ε ∈ R_{>0}: Proof Pick ε ∈ R_{>0}. We have the required bound, and hence the claim follows, completing the proof. ∎ The following result is an immediate consequence of the lemma above.

Corollary .
For every ε ∈ R_{>0}, we have T_avg(ε) ⪆ t_avg(ε), where T_avg(ε) is the internal average mixing time of the hyperfinite chain.
The result then follows from the lemma above. ∎ It is desirable to prove the reverse direction of this corollary. To do this, we introduce the following definition.

Definition .
For ε ∈ R_{>0}, the strict average mixing time is t_avg^{(<)}(ε) = min{t ∈ N ∶ sup_{x∈X} ∥(1/2)(P_x^{(t)}(⋅) + P_x^{(t+1)}(⋅)) − π∥_TV < ε}.

Theorem
For every ε ∈ R_{>0}, t_avg^{(<)}(ε) ⪆ T_avg^{(<)}(ε), where T_avg^{(<)}(ε) is the internal strict average mixing time of the hyperfinite chain.

Let t be a natural number satisfying the bound above. Hence, we have the desired result. ∎

Mixing Times and Average Mixing Times on Compact Sets
In this section, we prove our main result. The following well-known equivalence follows from submultiplicativity of d̄(t) and the fact that d(t) ≤ d̄(t) ≤ 2d(t) (see the corresponding lemmas in [LPW]). Recall that t_m denotes the mixing time of a Markov process (see the definition of mixing time above).

Lemma .
For every 0 < ε_1 < ε_2 < 1/2, there exists a positive universal constant c_{ε_1,ε_2} such that t(ε_1) ≤ c_{ε_1,ε_2} t(ε_2) for every Markov process with a unique stationary distribution.
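The submultiplicativity behind this lemma, d̄(s + t) ≤ d̄(s) d̄(t) where d̄(t) is the worst-case total variation distance between two copies of the chain, is easy to verify numerically on a small chain (a sketch; the random 5-state chain is purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 5
P = rng.random((N, N))
P /= P.sum(axis=1, keepdims=True)  # a random stochastic matrix

def d_bar(t):
    """d_bar(t) = max over starting states x, y of TV(P^t(x, .), P^t(y, .))."""
    Pt = np.linalg.matrix_power(P, t)
    return max(0.5 * np.abs(Pt[x] - Pt[y]).sum()
               for x in range(N) for y in range(N))

for s in range(1, 4):
    for t in range(1, 4):
        assert d_bar(s + t) <= d_bar(s) * d_bar(t) + 1e-12
print("d_bar is submultiplicative on this example")
```

Submultiplicativity is what converts a single-ε mixing bound into a bound at any smaller ε, at the cost of the universal constant in the lemma.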
We have the following result for strict mixing times and strict average mixing times.

Lemma .
For every 0 < ε ≤ 1/2, there exist universal positive constants e_ε and e'_ε so that e_ε t_L^{(<)}(ε) ≤ t_avg^{(<)}(ε) ≤ e'_ε t_L^{(<)}(ε) for every finite reversible Markov process. Proof Pick some 0 < ε < 1/2. By the theorem above, we have the first comparison. As the lazy chain of a reversible Markov process is reversible, by the lemma above, we have the second.

Let e_ε = c_ε c_{εε}. Then the first inequality follows. We now prove the other direction. By the theorem and lemma above, we obtain the corresponding bound on t_L(ε). By letting e'_ε = c'_ε c_{εε}, we have the desired result. ∎ We now prove the main theorem for strict mixing times and strict average mixing times.