Persistence of spectral projections for stochastic operators on large tensor products

In this paper it is proved that for families of stochastic operators on a countable tensor product, depending smoothly on parameters, any spectral projection persists smoothly, where smoothness is defined using norms based on ideas of Dobrushin. A rigorous perturbation theory for families of stochastic operators with spectral gap is thereby created. It is illustrated by deriving an effective slow 2-state dynamics for a 3-state probabilistic cellular automaton. Some further potential applications are discussed.


Introduction
The problem of persistence of spectral projections for large tensor products is crucial in many domains. Perhaps the most significant domain is many-particle quantum systems. In addition to condensed matter physics, this has taken on enhanced interest because of the problem of designing quantum registers for quantum computing.
Yet a parallel problem arises for stochastic systems with many components. This paper specialises to Markov processes and mostly to discrete time, such as probabilistic cellular automata (PCA). Then the transition operator T acts on the space F of real-valued continuous functions of the state of the whole system. The space F can be considered to be the tensor product of the spaces of real-valued continuous functions of the state of the individual units.
Even if the system is geometrically ergodic (meaning there is a unique stationary probability and it attracts every probability exponentially in an appropriate metric) and the update of each unit is independent of the state outside a bounded neighbourhood, when one changes parameters in a reasonable way the stationary probability may move at a speed going to infinity with the number of units if distances between probability distributions are measured in any of the standard ways (e.g. total variation, Jensen-Shannon, Hellinger, Kantorovich, Fisher information [M1], and Prokhorov [M2]). The solution proposed in [M1] was to introduce a new metric for probabilities on large product systems, christened "Dobrushin metric" as most of the ingredients were already in Dobrushin's work, but credit should also be given to Vasershtein [Vn] (more commonly transliterated now as Wasserstein). (His metric is defined differently and in a restricted context, but I was able to prove that it is in fact equal to mine in the finite case, see the Appendix to [M2], and with Armstrong-Goodall we have now proved they are equal under the general conditions for definition of mine [AM].)
With respect to this metric, smooth variation of the stationary probability was proved for the class of geometrically ergodic PCA, uniformly in the size of the system [M1]. A slightly more sophisticated way of viewing this result is as persistence of the (rank-1) projection P onto the space of stationary measures, for which the complementary projection Q = I − P sends constant functions to zero. Thus P is a spectral projection: a projection operator onto a subspace corresponding to a closed subset of the spectrum of T, whose complementary projection Q = I − P is onto a complementary subspace corresponding to the disjoint closed complement of the spectrum.
A question that the work of [M1] prompted is whether other spectral projections for stochastic operators might also persist uniformly smoothly in the size of the system. Here suitable conditions are formulated and a proof of persistence is given.
Borel measures p are dual to continuous functions f, in that one can take p(f) to be the integral of f with respect to p. We think of functions as column vectors and measures as row vectors. A transition operator T acts to the right on functions and to the left on measures. Transition operators preserve total probability, which can be written as preservation of the function 1, defined to take the value 1 everywhere.
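For a finite state space the row/column convention can be checked directly. The following is a minimal sketch, assuming a hypothetical 3-state transition matrix `T` (not taken from the paper); it illustrates only that T1 = 1, that pT is again a probability, and that p(Tf) = (pT)(f).

```python
import numpy as np

# Hypothetical 3-state transition matrix (an assumption for illustration):
# each row is a probability vector.
T = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.6, 0.2],
              [0.1, 0.3, 0.6]])

one = np.ones(3)                  # the constant function 1 (column vector)
f = np.array([1.0, -2.0, 0.5])    # an arbitrary function of the state
p = np.array([0.2, 0.5, 0.3])     # an arbitrary probability (row vector)

Tf = T @ f          # T acting to the right, on a function
pT = p @ T          # T acting to the left, on a measure

pairing_right = p @ Tf    # p(Tf)
pairing_left = pT @ f     # (pT)(f)
```

The two pairings agree because both are the same bilinear expression p T f; preservation of 1 is the statement that the rows of `T` sum to one.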
The outline of the paper is that firstly, Dobrushin metric is reviewed (sec. 2). Then the continuation problem for spectral projections of a class of stochastic operators is formulated and solved (sec. 3). An illustration of the result is given (sec. 4), followed by a general development of second-order perturbation theory for stochastic operators (sec. 5) and then a discussion of further potential applications, including to metastability (sec. 6). The paper ends with a short discussion (sec. 7).

Dobrushin metric
Consider transition operators T for functions and probability distributions on the product X of a set of metric spaces (X_s, d_s), for sites s in a countable set S. The spaces (X_s, d_s) are assumed to be Polish (complete separable metric spaces) with bounded diameter, sup_{s∈S} diam(X_s) < ∞. The product X is endowed with product topology and with Borel measures. The set of Borel probabilities on X (measures p satisfying p(X) = 1 and p(Y) ≥ 0 for all Borel subsets Y) is denoted by P.
For a function f : X → R, define its Lipschitz constant with respect to variations on site s ∈ S by
$$\Delta_s(f) = \sup \frac{|f(x) - f(x')|}{d_s(x_s, x'_s)},$$
the supremum being over pairs x, x′ ∈ X differing at site s and agreeing elsewhere. Define the set F of Dobrushin smooth functions to be those for which the semi-norm
$$|f|_F = \sum_{s \in S} \Delta_s(f)$$
is finite. Define the space Z of zero-charge measures on X to be the signed Borel measures µ for which µ(X) = 0. Define a norm on Z by
$$|\mu|_Z = \sup \frac{\mu(f)}{|f|_F},$$
the supremum being over the non-constant functions f ∈ F, and measure the distance between two probabilities ρ, σ ∈ P by |ρ − σ|_Z. This makes P into a complete metric space, with diameter sup_{s∈S} diam(X_s) [M1]. The point of this metric is that if (I − T) is invertible on Z then T has a unique stationary probability p ∈ P and it varies smoothly with respect to changes in T:
$$p' = p\,T'\,(I - T)_Z^{-1},$$
where ′ denotes derivative with respect to parameters and (I − T)_Z means the restriction of I − T to Z (which is necessary to take its inverse) [M1]. Also there are conditions for invertibility of I − T in terms of Dobrushin's dependency matrix, which are verifiable in relevant classes of system.
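The derivative formula for the stationary probability follows from differentiating p = pT, which gives p′(I − T) = pT′. It can be tested numerically on a small chain. The sketch below is an illustration under assumptions: the family `T0 + eps * T1` and the helper names are hypothetical, and the restricted inverse is realised for a finite ergodic chain by inverting I − T + 1p, which agrees with (I − T)^{-1} on zero-sum row vectors.

```python
import numpy as np

def stationary(T):
    """Stationary row vector p with p T = p and sum(p) = 1 (finite ergodic T)."""
    n = T.shape[0]
    A = np.vstack([(np.eye(n) - T).T, np.ones((1, n))])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    p, *_ = np.linalg.lstsq(A, b, rcond=None)
    return p

def inverse_on_Z(T, p, mu):
    """Return x with x (I - T) = mu and sum(x) = 0, for a zero-sum row vector mu."""
    n = T.shape[0]
    M = np.eye(n) - T + np.outer(np.ones(n), p)
    return np.linalg.solve(M.T, mu)   # solves x M = mu

# Hypothetical one-parameter family (assumption): T1 has zero row sums,
# so every member of the family preserves total probability.
T0 = np.array([[0.5, 0.3, 0.2], [0.2, 0.6, 0.2], [0.1, 0.3, 0.6]])
T1 = np.array([[-0.1, 0.1, 0.0], [0.05, -0.1, 0.05], [0.0, 0.1, -0.1]])

def family(eps):
    return T0 + eps * T1

eps, h = 0.3, 1e-5
T = family(eps)
p = stationary(T)
predicted = inverse_on_Z(T, p, p @ T1)        # p' = p T' (I - T)_Z^{-1}
finite_diff = (stationary(family(eps + h)) - stationary(family(eps - h))) / (2 * h)
```

The point of the Dobrushin norm is that the analogous bound on p′ does not deteriorate with the number of sites; this toy check only confirms the algebraic identity.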

Spectral projections
Given a bounded transition operator T on a space F of continuous functions, a spectral projection for T is a bounded operator P on F such that P^2 = P, PT = TP and the parts of the spectrum of T corresponding to the restrictions of T to the image of P and the image of Q = I − P (which are invariant under T) are disjoint. Because they are closed and bounded, it follows that the distance between them is positive, called a spectral gap.
An example of a spectral projection is P = 1p for the stationary probability p for a geometrically ergodic operator T .This is because p1 = 1 for any probability p, T 1 = 1 by definition of a stochastic operator, pT = p for a stationary measure, and geometric ergodicity implies the eigenspace for eigenvalue 1 is one-dimensional and the rest of the spectrum is in a disk of radius less than 1.
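The properties of P = 1p listed above can be verified on a finite example. This is a sketch only, with a hypothetical 3-state matrix `T` chosen for illustration.

```python
import numpy as np

# Hypothetical ergodic 3-state transition matrix (assumption).
T = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.6, 0.2],
              [0.1, 0.3, 0.6]])
one = np.ones(3)

# Stationary probability: left eigenvector of T for the eigenvalue closest to 1.
w, V = np.linalg.eig(T.T)
p = np.real(V[:, np.argmin(np.abs(w - 1.0))])
p = p / p.sum()

P = np.outer(one, p)      # P = 1p: column vector 1 times row vector p
Q = np.eye(3) - P

# Spectral radius of T on the image of Q (the rest of the spectrum).
rest_radius = np.max(np.abs(np.linalg.eigvals(T @ Q)))
```

Here `P @ P == P` uses p1 = 1, and `P @ T == T @ P == P` uses pT = p and T1 = 1.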
To extend the persistence theory to more general spectral projections, one needs to define norms on changes to transition operators and on tangent vectors to the manifold M of projections.
Note that the set M of projections on F is indeed a manifold (a variant of a Grassmann manifold). Here is an outline proof, because ingredients are useful later. Let B(F) be the space of bounded linear operators P on the space F and define Φ on B(F) by
$$\Phi(P) = P^2 - P. \qquad (1)$$

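For a projection P ∈ M, write F = R ⊕ K with R = im P and K = ker P, so that $P = \begin{pmatrix} I & 0 \\ 0 & 0 \end{pmatrix}$. For an arbitrary operator $\pi = \begin{pmatrix} \pi_1 & \pi_2 \\ \pi_3 & \pi_4 \end{pmatrix}$ on F, the derivative of Φ at P is $D\Phi_P(\pi) = P\pi + \pi P - \pi = \begin{pmatrix} \pi_1 & 0 \\ 0 & -\pi_4 \end{pmatrix}$.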
With respect to the same direct-sum decomposition, let T_P M be the space of operators on F of the form
$$\begin{pmatrix} 0 & \pi_2 \\ \pi_3 & 0 \end{pmatrix}, \qquad (4)$$
and N_P M be those of the form $\begin{pmatrix} \pi_1 & 0 \\ 0 & \pi_4 \end{pmatrix}$. Consequently, DΦ_P maps N_P M to itself by $\begin{pmatrix} I & 0 \\ 0 & -I \end{pmatrix}$, which has bounded inverse, namely itself. So the implicit function theorem shows that M is locally the graph of a C^1 function ψ : T_P M → N_P M. This completes the outline proof. Furthermore, Dψ = 0, so T_P M is the tangent space to M at P (hence the notation). Note, however, that the manifold M of projections has many components of different dimensions, corresponding to the rank of the projection.
Now the paper moves to the promised definition of norms on changes to transition operators and on tangents to M.
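The block structure of DΦ_P can be checked numerically in finite dimensions. The sketch below assumes only Φ(P) = P^2 − P, so that DΦ_P(π) = Pπ + πP − π; the dimensions `r`, `k` and the random test operator are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
r, k = 2, 3                       # dim of im P and of ker P (arbitrary)
n = r + k
P = np.diag([1.0] * r + [0.0] * k)   # P = diag(I, 0) in the decomposition R + K

def dPhi(P, pi):
    """Derivative of Phi(P) = P^2 - P at P, applied to the direction pi."""
    return P @ pi + pi @ P - pi

pi = rng.standard_normal((n, n))
out = dPhi(P, pi)

diag_top = out[:r, :r]        # expected: +pi_1
diag_bottom = out[r:, r:]     # expected: -pi_4
off_top = out[:r, r:]         # expected: 0 (tangent directions are killed)
off_bottom = out[r:, :r]      # expected: 0
```

The off-diagonal blocks lie in the kernel of DΦ_P, which is why they form the tangent space, while on the block-diagonal part DΦ_P acts as the involution diag(I, −I).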
Firstly, a change T′ to a transition operator satisfies T′1 = 0, where 0 is the function taking value 0 everywhere, so T′ takes any measure into Z. One can quantify the size of its effect on Z by the operator norm of its restriction to Z:
$$|T'_Z|_Z = \sup_{\mu \in Z \setminus \{0\}} \frac{|\mu T'|_Z}{|\mu|_Z}.$$
One needs also, however, to measure the size of ρT′ for measures ρ outside Z, for example the Borel probabilities P, which can be non-zero even if µT′ = 0 for all µ ∈ Z. A suitable quantification of this is
$$L(T') = \sup_{\rho \in \mathcal{P}} |\rho T'|_Z.$$
These two quantities were used in [M1] to define continuous change of a transition operator and smooth change of transition operator. So take the norm (5), denoted |T′|_*, which combines the two parts |T′_Z|_Z and L(T′). It is a norm because each part is non-negative, homogeneous and satisfies the triangle inequality, and |T′|_* = 0 implies T′ = 0. To see the latter, |T′|_* = 0 implies both |T′_Z|_Z and L(T′) are zero. Since T′ maps any measure into Z, on which |·|_Z is a norm, it follows that µT′ = 0 for any µ ∈ Z and pT′ = 0 for any p ∈ P. Now any measure ν can be written as kp + µ for some k ∈ R, p ∈ P and µ ∈ Z, so νT′ = 0, thus T′ = 0.
Secondly, the question of defining a norm on tangents to the manifold M of projections is addressed. Infinitesimal changes P′ to a projection P do not necessarily map measures to Z. They are characterised by P′P + PP′ = P′, which can equivalently be written as P′P = QP′ or as PP′ = P′Q and hence in the form (4) with respect to the direct sum decomposition F = R ⊕ K.
Nevertheless, for spectral projections one can restrict attention to the submanifolds M_1, M_0 of projections P that fix 1 or send it to 0, respectively, according as eigenvalue 1 is or is not in the spectrum of the restriction of T to the image of P. In either case, tangents satisfy the additional condition P′1 = 0. Thus P′ satisfies the same conditions as a change to a transition operator above. So, use the norm |P′|_* on the tangent spaces to M_0 and M_1. For the special case of P′ = 1p′, where p′ is a change in a probability, µP′ = 0 for all µ ∈ Z and ρP′ = p′ for all ρ ∈ P, so the size of P′ is determined by |p′|_Z.
So now the question of persistence of a spectral projection reduces to application of the implicit function theorem to the equation PT = TP for P in the manifold M_0 or M_1. The tangent space T M_i to M_i (i = 0, 1) at P consists of the operators P′ such that PP′ + P′P = P′ and P′1 = 0. Consider the functions G_i : M_i × N → T M_i, where N is the space (affine with boundary) of transition operators T taking non-negative functions to non-negative ones and satisfying T1 = 1, defined by the commutator
$$G_i(P, T) = [P, T] = PT - TP.$$
It is easily checked that the right hand side is in T M_i. Tangents T′ to N are characterised by T′1 = 0. If ∂G_i/∂P : T M_i → T N is invertible with bounded inverse then a solution P to G_i(P, T) = 0 has locally unique continuation P(T) for nearby T and depends C^1 on T, with
$$P' = -\left(\frac{\partial G_i}{\partial P}\right)^{-1} \frac{\partial G_i}{\partial T}\, T'. \qquad (7)$$
The condition of bounded inverse boils down to a spectral gap. To see this, take a direct-sum decomposition in which
$$P = \begin{pmatrix} I & 0 \\ 0 & 0 \end{pmatrix}, \qquad T = \begin{pmatrix} T_P & 0 \\ 0 & T_Q \end{pmatrix}. \qquad (8)$$
Then as in (4), P′ has the form
$$P' = \begin{pmatrix} 0 & U \\ V & 0 \end{pmatrix} \qquad (9)$$
and for (7) one wants to be able to solve ∂G/∂P (P′) equal to an arbitrary element of T M_i for P′. The elements of T M_i have the same zero-diagonal block form.
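The claim that the commutator lands in the tangent space can be illustrated in finite dimensions. The sketch below assumes the sign convention G(P, T) = PT − TP (the opposite sign gives the same conclusions) and uses a hypothetical projection P = 1p in M_1 with p deliberately not stationary, so that G is non-zero.

```python
import numpy as np

# Hypothetical transition matrix and probability vector (assumptions).
T = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.6, 0.2],
              [0.1, 0.3, 0.6]])
one = np.ones(3)
p = np.array([0.2, 0.5, 0.3])     # a probability that is not stationary for T

P = np.outer(one, p)              # projection fixing 1, i.e. an element of M_1
G = P @ T - T @ P                 # commutator [P, T]

tangent_defect = P @ G + G @ P - G    # zero iff G satisfies P G + G P = G
image_of_one = G @ one                # zero iff G 1 = 0
```

Both tangency conditions hold although P does not commute with T; solving G = 0 within M_1 is then what singles out the spectral projection.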
Thus this reduces to solving two "Sylvester equations"
$$AX - XB = C$$
for operators X = U, V respectively, with A, B equal to T_P, T_Q or vice versa, and arbitrary bounded C. There is a unique bounded solution X to each of the Sylvester equations iff the spectra of T_P and T_Q are disjoint [BR]. In the disjoint case, it is automatic that the resulting inverse operator C ↦ X is bounded [K].
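For small matrices a Sylvester equation of the form AX − XB = C can be solved by vectorisation. The following is a sketch, not the method of the paper: the helper name `solve_sylvester_kron` and the matrices are hypothetical, and the Kronecker approach is only sensible for small dense problems.

```python
import numpy as np

def solve_sylvester_kron(A, B, C):
    """Solve A X - X B = C for X by vectorisation (small dense matrices only).

    Uses vec(A X) = (I kron A) vec(X) and vec(X B) = (B^T kron I) vec(X),
    with column-major vec.
    """
    n, m = A.shape[0], B.shape[0]
    K = np.kron(np.eye(m), A) - np.kron(B.T, np.eye(n))
    x = np.linalg.solve(K, C.flatten(order="F"))
    return x.reshape((n, m), order="F")

# Hypothetical blocks with disjoint spectra: {1, 0.9} versus {0.1, 0.2, -0.1}.
A = np.array([[1.0, 0.2],
              [0.0, 0.9]])
B = np.array([[0.1, 0.3, 0.0],
              [0.0, 0.2, 0.1],
              [0.0, 0.0, -0.1]])
C = np.arange(6.0).reshape(2, 3)

X = solve_sylvester_kron(A, B, C)
residual = np.linalg.norm(A @ X - X @ B - C)
```

The matrix `K` is singular exactly when A and B share an eigenvalue, which is the finite-dimensional form of the disjoint-spectra condition.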
For practical purposes, one needs a bound on the inverse. Define
$$\operatorname{sep}(A, B) = \inf_{X \neq 0} \frac{|AX - XB|}{|X|},$$
the infimum being over bounded operators X. This generalises the definition of the "separation" of two matrices A, B, which is usually defined using Frobenius norm [Sw, Vh]. Note that sep(B, A) is not necessarily equal to sep(A, B). From the above discussion, sep(A, B) > 0 iff the spectra of A and B are disjoint.
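In the classical matrix setting with Frobenius norm, the separation is the smallest singular value of the vectorised map X ↦ AX − XB. The sketch below computes that Frobenius version only; it is not the norm used in this paper, and the function name and test matrices are hypothetical.

```python
import numpy as np

def sep_frobenius(A, B):
    """Frobenius-norm separation: min over X != 0 of |A X - X B|_F / |X|_F."""
    n, m = A.shape[0], B.shape[0]
    K = np.kron(np.eye(m), A) - np.kron(B.T, np.eye(n))
    return np.linalg.svd(K, compute_uv=False).min()

A = np.diag([1.0, 0.9])
B_disjoint = np.diag([0.1, 0.2])     # spectrum disjoint from that of A
B_touching = np.diag([0.9, 0.2])     # shares the eigenvalue 0.9 with A

sep_disjoint = sep_frobenius(A, B_disjoint)
sep_touching = sep_frobenius(A, B_touching)
```

For these diagonal matrices the separation equals the smallest eigenvalue difference, 0.7; for non-normal matrices it can be considerably smaller than the distance between the spectra, which is why a bound on the separation rather than on the gap is what matters.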
In some cases one can be explicit about its size.For example, if the spectral radii ρ A method to obtain similar bounds for some more general forms of separation of the spectra is described in [N].
The important point is that for a family of examples with growing system size N, if the separation is bounded away from zero then the continuation above is uniform in N. Furthermore, it can apply to infinite systems.
The same can be done in continuous time, with the transition operator replaced by a transition generator, but I do not spell it out here, save to mention a bound on the separation when the spectra of A and B lie in disjoint half-planes.

An illustration
As an example, consider a 3-state PCA, with local state space {+, 0, −} at each site of a finite undirected graph with N nodes and bounded degree, say by m. The transition probabilities for the state at a given site are taken to be given by the matrix (11), in basis (+, 0, −), where n_± are the numbers of neighbours in states ± respectively, α ≥ 0 and ε ∈ [0, (1 + αm)^{-1}]. For ε = 0 the system has eigenvalue 1 with multiplicity 2^N and eigenvalue 0 with multiplicity 3^N − 2^N. From the above theory, the spectral projection to the subspace for eigenvalue 1 has a continuation for ε small. It is uniform in N because the separation of the relevant operators can be bounded away from 0 uniformly in N. The dynamics on the image of the projection is slow because the spectrum moves from {1} by at most O(ε). It is still Markovian. It might loosely be considered as an effective PCA with two states {+, −} on each site but R = im P is not spanned by δ-functions on states, so such a description requires interpretation analogous to quasiparticles in quantum mechanics.
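The multiplicities at ε = 0 can be illustrated on a small product system. The sketch below is built on an assumption: the site matrix `T_site` is a stand-in with local eigenvalues {1, 1, 0} (states ± fixed, state 0 jumping to ± with probability 1/2 each), not the matrix (11) of the paper, and the sites are taken to be independent at ε = 0.

```python
import numpy as np
from functools import reduce

# ASSUMPTION: hypothetical epsilon = 0 site matrix in basis (+, 0, -).
T_site = np.array([[1.0, 0.0, 0.0],
                   [0.5, 0.0, 0.5],
                   [0.0, 0.0, 1.0]])

N = 3                                     # number of sites (kept small)
T = reduce(np.kron, [T_site] * N)         # transition operator of N independent sites
dim = T.shape[0]                          # 3**N

# Dimensions of the eigenspaces, computed via ranks.
mult_one = dim - np.linalg.matrix_rank(T - np.eye(dim))   # eigenvalue 1
mult_zero = dim - np.linalg.matrix_rank(T)                # eigenvalue 0
```

For N = 3 this gives multiplicities 8 and 19, matching 2^N and 3^N − 2^N.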
If α < 1/m and ε > 0, it is geometrically ergodic, because the whole system is. This can be proved using Dobrushin's dependency matrix, as follows. Take discrete metric on the local state spaces. Then the update probability distributions p(σ) for the state at a site given its current state σ and the numbers n_± are the rows of the matrix (11) and so the variation distance for a change of state on the given site is at most (1 − ε) and for a change on a neighbouring site is at most αε. Thus the dependency matrix has ℓ∞-norm at most 1 − (1 − mα)ε. If α < 1/m and ε > 0, this is less than 1 and so the system is geometrically ergodic.
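The arithmetic behind the bound on the dependency matrix can be written out as follows. This is a sketch, assuming the notation c_{ss'} (not used in the text) for the entry bounding the variation in the update distribution at site s caused by a change of state at site s', and taking the ℓ∞-norm to be the maximum row sum.

```latex
\sum_{s'} c_{ss'}
  \;\le\; \underbrace{(1-\varepsilon)}_{s' = s}
  \;+\; \underbrace{m \cdot \alpha\varepsilon}_{\text{at most } m \text{ neighbours}}
  \;=\; 1 - (1 - m\alpha)\,\varepsilon ,
\qquad\text{so}\qquad
\|c\|_{\infty} < 1
\iff (1 - m\alpha)\,\varepsilon > 0 ,
```

which holds when α < 1/m and ε > 0.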
It is interesting to compute approximate dynamics on the image of the projection corresponding to the spectrum near 1. This is analogous to a second-order perturbation theory computation in quantum mechanics, e.g. the derivation of the t-J model from the Hubbard model [Sp]. Here, the 0 state mediates interactions between the ± states. A general treatment of second-order perturbation theory for families of stochastic operators is given in the next section.

Second-order perturbation theory
Under the spectral gap condition at ε = 0 and with the norm (5), it has been proved above that for a smooth family of stochastic operators T_ε and a spectral projection P_0 for T_0, there is a range of ε for which P_0 continues smoothly to a spectral projection P_ε for T_ε. It is convenient to write
$$P_\varepsilon = \psi_\varepsilon P_0 \psi_\varepsilon^{-1}$$
for an ε-dependent invertible bounded linear map ψ, with ψ_0 the identity. There is a lot of freedom in the choice of ψ, the only constraints being that ψ_ε(im P_0) = im P_ε and ψ_ε(ker P_0) = ker P_ε, but it is good also to take ψ to be as smooth in ε as is T. Then one can define
$$S = \psi' \psi^{-1},$$
where ′ denotes d/dε, and the above conditions reduce to
$$[S, P] = P', \qquad (12)$$
where [ , ] again denotes the commutator. A convenient solution is
$$S = [P', P],$$
which can be checked to satisfy the condition (12). Then it is desired to compute T̃ = ψ^{-1}Tψ on im P_0. This is the stochastic operator that represents the dynamics of T_ε on im P_ε, using the coordinate system ψ_ε.
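Condition (12) can be checked numerically for a candidate generator. The sketch below assumes the Kato-type choice S = [P′, P] = P′P − PP′ (an assumption where the formula is not legible in this copy) and uses a random projection and a random tangent vector; all names and dimensions are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n, r = 5, 2                                   # total dimension and rank of P

# A non-orthogonal projection P = V diag(I, 0) V^{-1}.
V = np.eye(n) + 0.1 * rng.standard_normal((n, n))
Vinv = np.linalg.inv(V)
D = np.diag([1.0] * r + [0.0] * (n - r))
P = V @ D @ Vinv

# A tangent vector P' to the manifold of projections: off-diagonal blocks only.
block = np.zeros((n, n))
block[:r, r:] = rng.standard_normal((r, n - r))
block[r:, :r] = rng.standard_normal((n - r, r))
dP = V @ block @ Vinv

S = dP @ P - P @ dP                           # candidate S = [P', P]
commutator = S @ P - P @ S                    # [S, P], to be compared with P'
```

The check relies on P P′ P = 0, which is the statement that P′ has zero diagonal blocks.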
Start from T̃_0 = T_0. The first derivative is T̃′ = ψ^{-1}(T′ + [T, S])ψ. Using the above choice of S, one obtains T̃′ = ψ^{-1}Jψ, where J = PT′P + QT′Q. The second derivative T̃′′ can be evaluated in the same way, again as a conjugate by ψ. Evaluating these at ε = 0, one obtains the expansion (13) of T̃ to second order in ε. It is perhaps more useful to substitute [T, P′] = [P, T′], which follows from differentiating PT = TP, since the right-hand side of this is readily computable, but the second occurrence of P′ above means that solution of this equation for P′ is unavoidable.
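The steps of this computation can be sketched as follows. The derivation assumes S = ψ′ψ^{-1} with the choice S = P′P − PP′ (assumptions where the formulas are not legible in this copy), and the final expansion is obtained by the same differentiation rule rather than copied from the paper, so its grouping of terms may differ from that of (13).

```latex
\begin{align*}
\tilde T &= \psi^{-1} T \psi
  &&\Rightarrow\quad
  \tilde T' = \psi^{-1}\bigl(T' + T S - S T\bigr)\psi, \qquad S = \psi'\psi^{-1},\\
P T &= T P
  &&\Rightarrow\quad
  T P' - P' T = P T' - T' P,\\
S &= P'P - PP'
  &&\Rightarrow\quad
  T S - S T = 2\,P T' P - P T' - T' P,\\
& &&\Rightarrow\quad
  T' + T S - S T = P T' P + Q T' Q = J,\\
\tilde T' &= \psi^{-1} J \psi
  &&\Rightarrow\quad
  \tilde T'' = \psi^{-1}\bigl(J' + J S - S J\bigr)\psi,\\
\tilde T_\varepsilon &= T_0 + \varepsilon J_0
  + \tfrac{1}{2}\varepsilon^2 \bigl(J'_0 + J_0 S_0 - S_0 J_0\bigr) + o(\varepsilon^2).
\end{align*}
```

The third line uses that T commutes with P, and the fourth uses Q = I − P.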
For the example of section 4, there is already an effect at first order in ε, so second-order perturbation theory is perhaps unnecessary, but it still serves as an illustration of the procedure.
Here β_± = 1 + αn_±. To compute the second-order term, one uses P_0P′_0 = P′_0Q_0, so that P′_0 has just four independent parameters on each site, and these are obtained by solving [T, P′] = [P, T′]. To compute the ε^2 term in (13), we can first subtract [T′_0, P_0] from P′_0 site-wise. The final result to second order, using the (+, −) part of the ψ_ε basis, is an effective PCA. The outcome is a two-state model in which to leading order there is a small probability εβ_∓/2 = ε(1 + αn_∓)/2 per timestep for transition from state + to −, respectively − to +.
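The leading-order effective dynamics can be written as a local two-state kernel. The sketch below encodes only the leading-order rates quoted in the text; the function name is hypothetical and the O(ε^2) corrections are omitted because they are not reproduced here.

```python
import numpy as np

def effective_two_state_kernel(eps, alpha, n_plus, n_minus):
    """Leading-order effective transition matrix at one site, in basis (+, -).

    n_plus, n_minus: numbers of neighbours currently in state + and -.
    Only the O(eps) rates eps * beta / 2 are included (assumption: higher
    orders dropped).
    """
    beta_plus = 1.0 + alpha * n_plus
    beta_minus = 1.0 + alpha * n_minus
    plus_to_minus = eps * beta_minus / 2.0    # transition + -> -
    minus_to_plus = eps * beta_plus / 2.0     # transition - -> +
    return np.array([[1.0 - plus_to_minus, plus_to_minus],
                     [minus_to_plus, 1.0 - minus_to_plus]])

kernel = effective_two_state_kernel(eps=0.1, alpha=0.2, n_plus=1, n_minus=3)
```

With these hypothetical values a site in state + flips with probability 0.08 and a site in state − with probability 0.06, so each site is pulled towards the majority state of its neighbours.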
Perhaps a reader can suggest (and apply the method to) a more significant example.

Further potential applications
One area to which the above results might be usefully applied is metastability. Metastability is the phenomenon that an ergodic process may spend long times exploring restricted subsets of the support of the probability distribution, switching between them infrequently but sufficiently to achieve ergodicity in the long run. One reference is [BdH].
As an application of the result, one can start from the paper [D] in which the hypothesis is a reversible Markov process in continuous time with spectrum in [−ε, 0] ∪ (−∞, −1] such that each function in the image of the spectral projection P for [−ε, 0] is bounded, and it is proved that there is a partition of the state space into "metastable regions". It is not clear to me, however, whether there are relevant examples satisfying the hypotheses. One might think that Glauber dynamics of the 2D Ising model below the critical temperature would qualify, but I'm not aware that it is proved to have a spectral gap. Nevertheless, if there are examples on product spaces then the result of the present paper proves robustness of spectral gap and hence of the phenomenon of metastability.
One could envisage the result also being useful to treat perturbation of Markov dynamics with more than one stationary distribution, for example with more than one communicating component. Addition of some interaction between the communicating components typically makes the system have a single communicating component but the result of this paper shows there is a continuation of the spectral projection to a spectral projection with spectrum contained near 1 and the same rank (equal to the original number of communicating components), and it gives strong control over the resulting continuation. More substantially, one would like to deduce something about metastability in ergodic finite versions of infinite PCA with non-unique stationary distribution.
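The persistence of the rank can be seen in a toy chain. The sketch below is an illustration under assumptions: the two-block matrix `T0` and the uniform coupling `R` are hypothetical choices, not taken from the paper.

```python
import numpy as np

# Two communicating components, each an ergodic 2-state chain (assumption).
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.2, 0.8], [0.5, 0.5]])
Z2 = np.zeros((2, 2))
T0 = np.block([[A, Z2], [Z2, B]])     # eigenvalue 1 has multiplicity 2

R = np.full((4, 4), 0.25)             # weak uniform coupling between components

def coupled(eps):
    """Stochastic matrix mixing the decoupled dynamics with the coupling."""
    return (1.0 - eps) * T0 + eps * R

eigs_uncoupled = np.linalg.eigvals(coupled(0.0))
eigs_coupled = np.linalg.eigvals(coupled(0.01))

near_one_uncoupled = int(np.sum(np.abs(eigs_uncoupled - 1.0) < 0.05))
near_one_coupled = int(np.sum(np.abs(eigs_coupled - 1.0) < 0.05))
```

After coupling the chain has a single communicating component, yet two eigenvalues remain within O(ε) of 1, so the spectral projection for the spectrum near 1 keeps rank 2.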
Another possible application is to perturbations of product systems in which the units all have simple eigenvalue +1 and isolated spectrum near some λ in the open unit disk. The conclusion is that there persists an invariant subspace with decay constant near λ.

Discussion
It has been proved here that every spectral projection of a stochastic operator on a product space persists C^r-smoothly with respect to the norm (5) for C^r-smooth changes in the transition operator, again measured using (5). This generalises the case of the rank-one projection onto a stationary distribution, treated in [M1].
Second-order perturbation theory has been developed for families of such operators and an example treated.Potential applications have also been suggested to robustness of metastability and some other uses for multi-component stochastic processes.
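In the definition of the norm on Z in section 2, the supremum is over F \ C, where C is the set of constant functions. The Dobrushin distance between any two ρ, σ ∈ P is then defined by D(ρ, σ) = |ρ − σ|_Z.
In continuous time (end of section 3), if the spectra of A and B lie in ℜz ≥ r_A, ℜz ≤ r_B respectively, with r_A > r_B, the explicit (convergent) solution $X = \int_0^\infty e^{(-A+r)t}\, C\, e^{(B-r)t}\, dt$ for r ∈ (r_B, r_A) provides a bound of the form $\operatorname{sep}(A, B) \ge (r_A - r_B - 2\varepsilon)/(c_A c_B)$ for any ε > 0 small enough that the numerator is positive, with c_A, c_B depending on ε.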
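The bound on the separation in continuous time can be sketched as follows. The derivation assumes that c_A, c_B are constants for which the two exponential estimates in the first line hold (such constants exist for bounded A, B and any ε > 0), and takes 2ε < r_A − r_B.

```latex
\begin{align*}
&\|e^{-At}\| \le c_A\, e^{-(r_A - \varepsilon)t},
 \qquad
 \|e^{Bt}\| \le c_B\, e^{(r_B + \varepsilon)t}
 \qquad (t \ge 0),\\
&X = \int_0^\infty e^{(-A+r)t}\, C\, e^{(B-r)t}\, dt
   = \int_0^\infty e^{-At}\, C\, e^{Bt}\, dt,\\
&AX - XB = -\int_0^\infty \frac{d}{dt}\Bigl(e^{-At}\, C\, e^{Bt}\Bigr)\, dt = C,\\
&\|X\| \le c_A c_B \|C\| \int_0^\infty e^{-(r_A - r_B - 2\varepsilon)t}\, dt
   = \frac{c_A c_B\, \|C\|}{r_A - r_B - 2\varepsilon},\\
&\frac{\|AX - XB\|}{\|X\|} \ge \frac{r_A - r_B - 2\varepsilon}{c_A c_B}.
\end{align*}
```

The shift r cancels between the two exponentials; it serves only to make the decay of each factor explicit.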