Markov Capacity for Factor Codes with an Unambiguous Symbol

In this paper, we first give a necessary and sufficient condition for a factor code with an unambiguous symbol to admit a subshift of finite type restricted to which it is one-to-one and onto. We then give a necessary and sufficient condition for the standard factor code on a spoke graph to admit a subshift of finite type restricted to which it is finite-to-one and onto. We also conjecture that for such a code, this finite-to-one property is equivalent to the existence of a stationary Markov chain that achieves the capacity of the corresponding deterministic channel.


Introduction
Shifts of finite type (SFTs), and more generally sofic shifts, are spaces of bi-infinite sequences that play a prominent role in symbolic dynamics. Of particular interest are factor codes (onto sliding block codes) from one such space to another, as they represent ways of encoding blocks in the domain space into blocks in the range space. However, such maps are typically badly many-to-one. So, it would be useful to know when one can restrict to a subspace of the domain such that the code is still onto and one-to-one or finite-to-one. Consider the following properties. Given an irreducible SFT X, a sofic shift Y, and a factor code φ : X → Y,
P1: There exists an SFT Z ⊂ X such that φ|_Z is a conjugacy onto Y.
P2: There exists an SFT Z ⊂ X such that φ|_Z is finite-to-one and onto Y.
P3: There exists a stationary Markov measure ν on X s.t. φ*(ν) = µ_0, the unique measure of maximal entropy (mme for short) on Y.
We are interested in finding checkable, necessary and sufficient conditions for each of these properties and in determining relationships among these properties. Clearly, P1 implies P2, and P2 implies P3 because, given P2, any mme ν on Z satisfies P3 (see Proposition 4.2).
A factor code φ : X → Y can be viewed as an input-constrained, deterministic, but typically lossy, channel in the information theoretic sense: an input x determines a channel output y = φ(x). Our interest in P3 stems from the fact that it is equivalent to the condition that the Markov capacity achieves the capacity of this channel, i.e., there is an input Markov measure on X that achieves capacity (see Sections 3 and 4 for more details).
Since Y is the image of an irreducible shift space, it must be irreducible, and it follows that µ_0 is indeed unique and fully supported on Y. However, we do not require ν to be fully supported on X.
For P1, there are certainly some necessary conditions; for instance, if Y has a fixed point, then X must have a fixed point, and Y must be an SFT.
We consider the special class of factor codes with an unambiguous symbol. This means that the alphabet of Y is {0, 1} and in the block code Φ that generates φ, there is exactly one block u s.t. Φ(u) = 1. In Theorem 6.1, we characterize, for this class, all such φ for which there exists a shift space Z ⊂ X s.t. φ|_Z is a conjugacy onto Y and show that such a Z must necessarily be an SFT, i.e., P1. In Theorem 6.5 we give a refined version of this result when X is the full 2-shift.
For P2, we recall from a counterexample [MPW84, pp. 287-289] that P2 is not always satisfied. We consider a subclass of factor codes with an unambiguous symbol, motivated by that counterexample, called standard factor codes on spoke graphs (for the definition, see Section 7). In Theorem 8.1, for this subclass, we characterize all such φ satisfying P2, and we show that for any φ in this subclass, P2 is equivalent to the existence of an SFT Z ⊂ X such that φ|_Z is almost invertible and onto Y.
The same counterexample in [MPW84] shows that for standard factor codes on spoke graphs, P3 is not always satisfied.
We conjecture that for standard factor codes on spoke graphs, P3 and P2 are equivalent, i.e., if there exists a stationary Markov measure ν on X s.t. φ*(ν) = µ_0, then there exists an SFT Z ⊂ X such that φ|_Z is finite-to-one and onto Y; if true, then for this class, the same characterization for P2 holds for P3. In Proposition 9.6, we prove this in several special cases. The proof combines the Chinese Remainder Theorem and a dominance condition.
We note that P3 is related to the property that a factor code from an irreducible SFT to an irreducible SFT is Markovian, although in that case one assumes that such ν is fully supported [BT84], [BP11].
It was shown in [MPW84, Proposition 3.2] that P2 always holds if we relax SFT Z to sofic Z. Similarly, it was shown in [MPW84, Corollary 3.3] that if we relax stationary Markov ν to stationary hidden Markov ν, then P3 always holds.
We point the reader to a related paper which considers factor codes φ : X → Y as deterministic channels and for a given factor code φ, characterizes those subshifts, of entropy strictly less than that of Y , that can be faithfully encoded through φ [Mac22].
The remainder of this paper is organized as follows. In Section 2, we give brief background on symbolic dynamics, focusing on SFTs, sofic shifts and factor codes. In Section 3, we describe a motivating problem from information theory. In Section 4, we describe factor codes as special channels in information theory (as was done in [MPW84]). We introduce in Section 5 the class of factor codes with an unambiguous symbol and for this class consider P1 in Section 6. In Section 7, we introduce the subclass of standard factor codes on spoke graphs and consider P2 for this subclass in Section 8. In Section 9, we consider P3 for this subclass and prove Proposition 9.6. Finally, in Section 10, we discuss standard factor codes on another class of graphs.

Notation and Brief Background from Symbolic Dynamics
We introduce in this section some basic terms and facts in symbolic dynamics. For more details, see [LM95].
Let A be a finite alphabet. The full A-shift, denoted by X_A, is the collection of all bi-infinite sequences over A. When A = {0, 1, ⋯, n − 1}, the full shift is called the full n-shift and will be denoted by X_[n]. The shift map σ on X_A is defined by (σ(x))_i = x_{i+1}. A shift space (or subshift) is a subset X of a full shift that is compact and is invariant under σ. For any positive integer m, we use B_m(X) to denote the set of all allowed blocks of length m in a shift space X, and B(X) to denote the set of all allowed blocks in X.
Let A_1, A_2 be two alphabets, s, t be two fixed integers and let X be a shift space over A_1. Given a block map Φ : B_{s+t+1}(X) → A_2, the map φ : X → A_2^Z defined by φ(x)_i = Φ(x_[i−s,i+t]) for any i is called a sliding block code with anticipation t and memory s. A sliding block code φ : X → Y is finite-to-one if there is an integer M such that |φ^{−1}(y)| ≤ M for every y ∈ Y, and it is one-to-one when M = 1. Moreover, the sliding block code φ : X → Y is a factor code if it is onto, in which case Y will be called a factor of X, and φ is a conjugacy if it is one-to-one and onto.
A point diamond for φ is a pair of distinct points in X that differ in finitely many coordinates and have the same image under φ. If X is irreducible, then φ is finite-to-one iff it has no point diamonds [LM95, Theorem 8.1.16].
Let G be a directed graph with no multiple edges. For a path γ in G, V(γ) denotes the sequence of vertices of γ and |γ| is the length, i.e., the number of edges, of γ (for example, for γ = e_1 e_2 ⋯ e_n, V(γ) = I(e_1) I(e_2) ⋯ I(e_n) T(e_n) and |γ| = n, where for any i, I(e_i) and T(e_i) denote the initial vertex and the terminal vertex of e_i, respectively). We use V(G) to denote the vertex set of G and X_G to denote the vertex shift induced by G, that is, the shift space whose points are sequences of vertices of bi-infinite paths in G. Let Φ : V(G) → A be a labelling of the vertices of G over a finite alphabet A. A graph diamond of Φ is a pair of distinct paths in G that have the same initial vertex, terminal vertex and label. It is well known that, assuming G is irreducible, the factor code generated by Φ is finite-to-one iff Φ has no graph diamonds [LM95, Section 8.1].
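The graph diamond criterion can be tested mechanically on small examples. The following is a minimal sketch, not taken from the paper: the function name has_graph_diamond and the dictionary encoding of the graph and of the labelling are assumptions made for illustration. It searches the "pair graph" of equally labelled vertex pairs for an off-diagonal pair that is reachable from, and can return to, the diagonal; such a pair exists exactly when two distinct paths share their initial vertex, terminal vertex and label.

```python
from collections import deque


def has_graph_diamond(edges, label):
    """Return True if the vertex labelling has a graph diamond.

    edges: dict mapping each vertex to a list of its successors (no multiple edges).
    label: dict mapping each vertex to its symbol (hypothetical encoding).
    """
    vertices = list(label)
    succ = {v: list(edges.get(v, ())) for v in vertices}
    pred = {v: [] for v in vertices}
    for u in vertices:
        for w in succ[u]:
            pred[w].append(u)

    def step(neighbours):
        # Move both coordinates at once, keeping only equally labelled pairs.
        def move(pair):
            u, v = pair
            return [(a, b) for a in neighbours[u] for b in neighbours[v]
                    if label[a] == label[b]]
        return move

    def reach(starts, move):
        # Breadth-first search in the pair graph.
        seen = set(starts)
        queue = deque(starts)
        while queue:
            for q in move(queue.popleft()):
                if q not in seen:
                    seen.add(q)
                    queue.append(q)
        return seen

    diagonal = [(v, v) for v in vertices]
    forward = reach(diagonal, step(succ))    # pairs reachable from a common start
    backward = reach(diagonal, step(pred))   # pairs that can reach a common end
    return any(u != v for (u, v) in forward & backward)


# Two cycles of equal length through B, all other vertices labelled 0: a diamond.
print(has_graph_diamond({"B": ["a", "b"], "a": ["B"], "b": ["B"]},
                        {"B": 1, "a": 0, "b": 0}))
```

By the criterion quoted above, and assuming the graph is irreducible, a result of False means that the factor code generated by the labelling is finite-to-one.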
A shift space X can be expressed as X = X_F where F is a forbidden set, a list of forbidden words such that x ∈ X iff x contains no element of F. The choice of the forbidden set of X is in general not unique. When X = X_F for some finite set F, X is called a shift of finite type (SFT). An SFT X is called M-step (or has memory M) if X = X_F for a collection F of (M + 1)-blocks. A vertex shift is always a 1-step SFT and conversely, by lifting to its (M + 1)-th higher block shift, an M-step SFT can always be represented as the vertex shift of a graph. A shift space Y is sofic if there exist an SFT X and a sliding block code φ such that φ(X) = Y. Clearly, every SFT is sofic.
There is a general definition of the degree of a factor code on any subshift, see [LM95, Definition 9.1.2]. For our purposes, we focus only on the following equivalent definition of the degree of a 1-block finite-to-one factor code φ : X → Y where X is an irreducible M-step SFT: Let N := max{1, M}. The degree of φ is defined as the minimum, over all blocks w = w_1 w_2 ⋯ w_|w| in Y and all 1 ≤ i ≤ |w| − N + 1, of the number of distinct N-blocks in X that we see beginning at coordinate i among all the pre-images of w [LM95, Proposition 9.1.12]. A word w that achieves the minimum above with some coordinate i is called a magic word, and the subblock w_[i,i+N−1] is called the corresponding magic block. A factor code φ is almost invertible if its degree is 1. While an almost invertible code need not be finite-to-one, on an irreducible SFT it must be finite-to-one [LM95, Proposition 9.2.2].
The topological entropy of a shift space X is
$$h_{top}(X) := \lim_{m\to\infty} \frac{1}{m} \log |B_m(X)|.$$
For a probability measure µ on X, let h(µ) denote its measure theoretic entropy. By the variational principle [Wal82, Theorem 8.6],
$$h_{top}(X) = \sup_{\mu} \{h(\mu) : \mu \text{ is a shift-invariant Borel probability measure on } X\}. \quad (1)$$
A measure of maximal entropy (mme) µ_0 of X is a shift-invariant probability measure on X at which the supremum in (1) is achieved.
Given S ⊂ Z_{≥0}, an S-gap shift X(S) is a subshift of X_[2] such that any x ∈ X(S) is a concatenation of blocks of the form 0^s 1 with s ∈ S, where points with infinitely many 0's to both sides are allowed when S is infinite. Let λ be the unique positive solution to
$$\sum_{m\in S} x^{-m-1} = 1.$$
Then h_top(X(S)) = log λ [DJ12], and the mme µ_0 of X(S) is determined by µ […] for any m, n and any allowed block […]. It has been proven in [DJ12] that X(S) is an SFT if and only if S is finite or cofinite. Indeed, the forbidden set of X(S) is […] when S is finite […], which will be called the standard forbidden set of X(S) in this paper.
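The entropy formula for S-gap shifts lends itself to a quick numerical check. The sketch below is illustrative only: the function names, the encoding of a cofinite gap set as a finite list plus a tail_start parameter, and the choice of base-2 logarithms are assumptions, not notation from the paper. It finds λ by bisection on [1, 2], using the closed form x^{−N}/(x − 1) for the tail sum over m ≥ N.

```python
import math


def gap_sum(x, finite_gaps, tail_start=None):
    """Evaluate sum over m in S of x^(-m-1), where S is the finite list
    finite_gaps together with all integers >= tail_start (if given)."""
    total = sum(x ** -(m + 1) for m in finite_gaps)
    if tail_start is not None:
        # Geometric tail: sum_{m >= N} x^(-m-1) = x^(-N) / (x - 1) for x > 1.
        total += x ** -tail_start / (x - 1.0)
    return total


def gap_shift_entropy(finite_gaps, tail_start=None, iters=200):
    """Return (lam, log2(lam)), where lam solves gap_sum(lam) = 1.

    The left-hand side is decreasing in x, is at least 1 near x = 1 and at
    most 1 at x = 2, so bisection on [1, 2] applies.
    """
    lo, hi = 1.0, 2.0
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if gap_sum(mid, finite_gaps, tail_start) > 1.0:
            lo = mid
        else:
            hi = mid
    lam = (lo + hi) / 2.0
    return lam, math.log2(lam)


# S = {0, 1}: the equation x^-1 + x^-2 = 1 has the golden ratio as its solution.
lam, entropy_bits = gap_shift_entropy([0, 1])
print(lam, entropy_bits)
```

Taking S to be all of Z_{≥0} (finite_gaps empty, tail_start = 0) recovers λ = 2, consistent with X(S) being the full 2-shift in that case.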

A Problem in Information Theory
A central object in information theory is a discrete channel. Here, there is a space of input sequences X, a space of output sequences Y, each over a finite alphabet, and for each x ∈ X a probability measure λ_x on Y which gives the distribution of outputs, given that x was transmitted. One assumes that the map x ↦ λ_x is at least measurable and the channel is stationary in the sense that λ_{σx} = σλ_x where σ is the left shift.
Typically, X and Y are full shifts and in the simplest case, that of a discrete memoryless channel, λ_x(y_1 ⋯ y_n) = ∏_{i=1}^{n} p(y_i | x_i); here, for each element a of the alphabet of X, p(·|a) is a probability distribution on the alphabet of Y; the channel is memoryless in the sense that conditioned on the input x_i, the output y_i is independent of all other inputs. For example, the binary symmetric channel (BSC) is the memoryless channel where X and Y are the full 2-shift and
$$p(0\mid 0) = p(1\mid 1) = 1-\epsilon, \qquad p(1\mid 0) = p(0\mid 1) = \epsilon.$$
Here, ε is a parameter, known as the crossover probability. Given a stationary (i.e., shift invariant) input measure ν on X, one defines the stationary output measure µ on Y by µ = ∫ λ_x dν(x). The mutual information of µ and ν is defined as
$$I(\mu,\nu) := h(\mu) + h(\nu) - h(\mu,\nu) = h(\mu) - h(\mu\mid\nu),$$
where h(·) denotes entropy, and h(·|·) denotes conditional entropy (the second equality follows from the chain rule for entropy, which is a fundamental equality in information theory); in information theory, shift-invariant measures are viewed as stationary processes, and these entropies are often referred to as entropy rates.
There are several notions of channel capacity, which all agree under relatively mild assumptions. The stationary capacity (capacity for short) of a discrete noisy channel is defined as
$$Cap := \sup_{\text{stationary } \nu} I(\mu,\nu).$$
Note that this makes sense since the output measure µ is a function of the input measure ν and the channel.
For a discrete memoryless channel, the capacity can be computed effectively because it agrees with the sup when restricted only to i.i.d. (i.e., stationary Bernoulli) measures, turning it into a finite dimensional optimization problem, and, while there is no known closed form expression for capacity in general, the optimum can be effectively approximated by the well-known Blahut-Arimoto algorithm [Bla72, Ari72].
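To make the finite dimensional optimization concrete, here is a minimal sketch of the Blahut-Arimoto iteration for a discrete memoryless channel. It is an illustration under assumptions: the function name, the row-stochastic list-of-lists encoding W[a][b] = p(b|a), and the stopping rule are choices made here, not taken from [Bla72, Ari72] or from this paper. Capacity is reported in bits.

```python
import math


def blahut_arimoto(W, iters=1000, tol=1e-12):
    """Approximate the capacity (in bits) of a discrete memoryless channel.

    W[a][b] is the probability of output b given input a (rows sum to 1).
    Returns (capacity_estimate, input_distribution).
    """
    n_in, n_out = len(W), len(W[0])
    p = [1.0 / n_in] * n_in          # start from the uniform input distribution
    lower = 0.0
    for _ in range(iters):
        # Output distribution induced by the current input distribution.
        q = [sum(p[a] * W[a][b] for a in range(n_in)) for b in range(n_out)]
        # d[a] = relative entropy between row W[a] and q, in bits.
        d = [sum(W[a][b] * math.log2(W[a][b] / q[b])
                 for b in range(n_out) if W[a][b] > 0)
             for a in range(n_in)]
        lower = sum(p[a] * d[a] for a in range(n_in))   # mutual information at p
        upper = max(d)                                  # standard upper bound
        if upper - lower < tol:
            break
        # Multiplicative update of the input distribution.
        weights = [p[a] * 2.0 ** d[a] for a in range(n_in)]
        z = sum(weights)
        p = [w / z for w in weights]
    return lower, p


# Binary symmetric channel with crossover probability 0.1.
cap, p_opt = blahut_arimoto([[0.9, 0.1], [0.1, 0.9]])
print(cap, p_opt)
```

For the BSC the uniform input is already optimal, so the loop stops at the first iteration with the value 1 − H(ε), where H is the binary entropy function.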
We define the k-th order Markov capacity
$$Cap_k := \sup_{\text{stationary } k\text{-th order Markov } \nu} I(\mu,\nu).$$
We are interested in the problem: when does Markov capacity achieve capacity, i.e., when does Cap_k = Cap for some k?
It is known, using the ergodic decomposition, that under mild assumptions, Cap (resp., Cap_k) coincides with the maximum mutual information over all stationary, ergodic input measures (resp., stationary, irreducible, k-th order Markov input measures) [Gra11, Fei59].
Again, with mild assumptions on the channel, one shows that lim_{k→∞} Cap_k = Cap [CS08]; informally, "Markov capacity asymptotically achieves capacity." This is important because for fixed k, computation of Cap_k is a finite dimensional optimization problem. According to the discussion above, for discrete memoryless channels, Cap_0 = Cap; informally, "Bernoulli capacity achieves capacity." But for channels with memory, even just one step of memory, except in certain cases such as the input-constrained noiseless channels below, it is believed that Cap_k < Cap for all k. However, little along these lines has been proven.
If X is not a full shift, then the channel is called input-constrained. Typically, the input constraint X is an SFT or sofic shift. Such a shift space can be considered a noiseless channel in itself, in a trivial way: Y = X and for each x ∈ X, λ_x = δ_x, the point mass on {x}. The capacity of this channel is easily seen to be the topological entropy, h_top(X), otherwise known as the noiseless capacity, which can be easily computed. Now, consider the input-constrained binary symmetric channel. This is the BSC, where the inputs are required to belong to a given SFT or sofic shift X over {0, 1}. While the capacity of the BSC and the noiseless capacity of X are known explicitly, the capacity of the X-constrained BSC is not known. And while Markov capacity asymptotically achieves capacity of this channel, it is believed that Markov capacity does not achieve capacity, i.e., for all k, Cap_k < Cap. However, this has not been proven.
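As an illustration of why the noiseless capacity is easy to compute: for a vertex shift X_G with irreducible adjacency matrix A, h_top(X_G) is the logarithm of the largest eigenvalue of A (a standard fact, see [LM95]). The sketch below is an assumption-laden example rather than the paper's method; the function name, the 0/1 list-of-lists encoding, and the use of power iteration on A + I (the shift by the identity avoids oscillation when A is periodic) are all choices made for this illustration.

```python
import math


def perron_eigenvalue(A, iters=2000):
    """Estimate the largest eigenvalue of an irreducible nonnegative matrix A
    by power iteration on A + I, whose largest eigenvalue is that of A plus 1."""
    n = len(A)
    v = [1.0] * n
    scale = 1.0
    for _ in range(iters):
        # w = (A + I) v
        w = [v[i] + sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]
        scale = max(w)               # converges to (largest eigenvalue of A) + 1
        v = [x / scale for x in w]   # renormalize so that max(v) = 1
    return scale - 1.0


# Golden mean shift: vertex 1 may not be followed by vertex 1.
lam = perron_eigenvalue([[1, 1], [1, 0]])
print(lam, math.log2(lam))  # noiseless capacity in bits per symbol
```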

Factor Codes as Channels
This brings us to a main point of our paper: for a class of channels, albeit rather simple in practice, we can rigorously decide whether or not Markov capacity achieves capacity. An example of this was given in [MPW84]. Specifically, we view a factor code φ : X → Y as an input-constrained, deterministic channel; here, λ_x = δ_{φ(x)}, so the input determines the output uniquely. Intuitively, for this channel input sequences are distorted in a deterministic way. It follows that, in this case, for any invariant input measure ν, h(µ|ν) = h(φ*(ν)|ν) = 0, where φ* is the map induced by φ on stationary measures on X.
According to [MPW84, Corollary 3.2], there exists a stationary input measure ν (in fact, a stationary hidden Markov input measure) such that φ*(ν) = µ_0, the unique mme on Y. Thus, by the variational principle [Wal82, Theorem 8.6], Cap = h_top(Y) (an alternative to this argument is to show that the map ν ↦ φ*(ν) is onto the set of all stationary measures on Y: given stationary µ on Y, use the Hahn-Banach theorem to find a not-necessarily-stationary ν′ on X s.t. φ*(ν′) = µ and let ν be any weak limit point of the sequence of averages (1/n) Σ_{i=0}^{n−1} σ^i ν′). In summary, we have:
Proposition 4.1. Let φ : X → Y be a factor code from an irreducible SFT X to a sofic shift Y. Let µ_0 be the unique measure of maximal entropy on Y. For the input-constrained, deterministic channel defined by φ,
(1) Cap (resp., Cap_k) coincides with the maximum mutual information over all stationary, ergodic input measures (resp., stationary, irreducible, k-th order Markov input measures);
(2) lim_{k→∞} Cap_k = Cap;
(3) […]
The following simple result gives a relation between P2 and P3.
Proposition 4.2. With the same assumptions as in Proposition 4.1, if there is an SFT Z ⊂ X such that φ|_Z is finite-to-one and onto Y, then there is an irreducible stationary Markov measure ν on Z of order at most the memory of Z such that φ*(ν) = µ_0.
Proof. Let ν be the unique mme of any irreducible component of Z with maximum topological entropy. It is stationary, irreducible, and Markov. Since φ|_Z is finite-to-one and onto, h(φ*(ν)) = h(ν) = h_top(Z) = h_top(Y). Since µ_0 is the unique mme on Y, we have φ*(ν) = µ_0.
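The fact used in this proof, that the mme of an irreducible SFT is Markov, can be made explicit for a vertex shift: with adjacency matrix A, largest eigenvalue λ and positive right eigenvector r, the transition probabilities of the mme are P_ij = A_ij r_j / (λ r_i) (the Parry measure, a standard construction; see [LM95]). The sketch below is only an illustration; the function name and the matrix encoding are assumptions, and the eigenvector is approximated by power iteration on A + I.

```python
def parry_transition_matrix(A, iters=2000):
    """Return (lam, P) for an irreducible 0/1 adjacency matrix A, where lam is
    the largest eigenvalue and P[i][j] = A[i][j] * r[j] / (lam * r[i]) is the
    transition matrix of the Markov measure of maximal entropy on the vertex shift."""
    n = len(A)
    r = [1.0] * n
    scale = 1.0
    for _ in range(iters):
        # Power iteration on A + I to approximate the right Perron eigenvector.
        w = [r[i] + sum(A[i][j] * r[j] for j in range(n)) for i in range(n)]
        scale = max(w)
        r = [x / scale for x in w]
    lam = scale - 1.0
    P = [[A[i][j] * r[j] / (lam * r[i]) for j in range(n)] for i in range(n)]
    return lam, P


# Golden mean shift: from vertex 1 the chain must move to vertex 0.
lam, P = parry_transition_matrix([[1, 1], [1, 0]])
for row in P:
    print(row)
```

Each row of P sums to 1 because r is an eigenvector of A for λ, which is a convenient sanity check on the output.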
Proposition 4.3. Let φ : X → Y be a factor code from an irreducible SFT X to a sofic shift Y. Let ν be an irreducible stationary Markov measure on X and assume that φ*(ν) = µ_0, the unique mme on Y (in particular, Markov capacity achieves capacity of the input-constrained deterministic channel determined by φ). […]
It follows from Propositions 4.2 and 4.3 that P2 holds iff P3 holds with a measure ν that is also irreducible stationary Markov and satisfies any of the equivalent conditions in Proposition 4.3.We will return to this point in Section 9.

Factor Codes with an Unambiguous Symbol
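We begin with a brief introduction to factor codes with an unambiguous symbol. Such factor codes are also known as factor codes with a singleton clump [PQS03].
Let X be a shift space over an alphabet A and let D ∈ B_k(X) for some k ≥ 1. Define the block map Φ : B_k(X) → {0, 1} by
$$\Phi(w) = \begin{cases} 1 & \text{if } w = D, \\ 0 & \text{otherwise.} \end{cases} \quad (3)$$
Then, the factor code φ : X → Y induced by Φ is called a factor code with an unambiguous symbol. Here, Y is the image of φ.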
In the remainder of this paper, we focus on the case when X is an irreducible SFT.Note that in this case, by passing to a higher block shift, in the preceding definition we can and sometimes will assume that k = 1 and that X is an SFT with memory 1.
The following propositions give some properties of Y.
Proposition 5.1. Let φ : X → Y be a factor code with an unambiguous symbol. Then Y is an S-gap shift.
Proof. The elements of Y are arbitrary concatenations of strings of the form 10^s with s ∈ S such that there exists some allowed block w of length k + s + 1 satisfying the following: (1) w_[1,k] = D; (2) w_[s+2,s+k+1] = D; (3) for all 2 ≤ i ≤ s + 1, w_[i,i+k−1] ≠ D. Hence, Y is an S-gap shift.
Proposition 5.2. Let φ : X → Y be a factor code with an unambiguous symbol. If X = X_[2], then
(1) 10^{k−1}1 is not allowed in Y iff D is purely periodic (i.e., D = u^ℓ for some ℓ ≥ 2 and some block u);
(2) for any j ≥ k, 10^j 1 is allowed in Y.
Proof. To prove (1), first observe that 10^{k−1}1 is allowed iff the image of DD is 10^{k−1}1. If 10^{k−1}1 is not allowed, then the image of DD has a prefix of the form 10^c 1 for some 0 […] (here and below in this proof, subscripts are read modulo k). It follows that for all integers m, n and all 0 […] (d, k). Then e = md + nk for some m, n. Thus, for all 0 […]. Since e < k, k/e ≥ 2. So, D is purely periodic. Conversely, assume that D is purely periodic. Then the image of the block DD is not 10^{k−1}1 and so 10^{k−1}1 is not allowed.
We now prove (2). For j ≥ k, we show 10^j 1 is allowed in Y by finding a binary block […] By reversing the roles of 0 and 1 in the domain a similar argument works when b […] where ms + t = k and m is the smallest positive integer such that b_1 ⋯ b_k can be expressed by (5). We consider the following two cases: […] In this case, we claim that (4) is satisfied by letting […] To see this, assume to the contrary that […] This means that there is an extra 1 in addition to the two 1's at the first and the last position in the image. Hence, there is an extra b_1 ⋯ b_k in the input in addition to the two at the initial and tail end (these two b_1 ⋯ b_k's will be called the head and the tail, respectively). Since […] or […] In this case, an extra b_1 ⋯ b_k in the input must intersect the head, the tail and the "intermediate" subblock […] the head and end with some b_1 ⋯ b_t in the tail. Therefore, (4) holds as long as […] which is always possible for some binary […]

Characterization of the One-to-One Condition for Factor Codes with an Unambiguous Symbol
In this section, we address P1 for factor codes with an unambiguous symbol. Throughout this section, a factor code with an unambiguous symbol always refers to the one induced by Φ in (3) unless otherwise specified. We have the following theorem, which characterizes the existence of a subshift of finite type on which the restriction of φ is one-to-one and onto.
Theorem 6.1. Let φ : X → Y be a factor code with an unambiguous symbol defined on an irreducible shift space X. Let S be such that Y is an S-gap shift. Then, there is a shift space Z ⊂ X s.t. φ|_Z is a conjugacy from Z onto Y if and only if either of the following conditions holds:
(C1) S is a finite set;
(C2) there is a fixed point (i.e., fixed via the shift) in X other than D^∞.
Moreover, Z and Y must be SFTs if either (C1) or (C2) holds.
(Note: D^∞ may or may not be in X and even if D^∞ ∈ X, it may or may not be a fixed point.)
Remark 6.2. Note that to say that S is finite means that there exists some M such that every allowed block in X of length M contains D as a subblock. Sometimes, one says that in such a case D is a "Rome".
Remark 6.3. According to Proposition 4.2, when (C1) or (C2) holds, the capacity of the deterministic channel defined by φ is achieved by a Markov chain.
Proof of Theorem 6.1: We first show that Y must be an SFT (and thus Z must also be an SFT) when (C1) or (C2) holds. This follows from the fact that an S-gap shift is an SFT iff S is either finite or cofinite [DJ12]. If (C1) holds, there is nothing to prove. If (C2) holds, then, using irreducibility of X, the image Y contains points with blocks that begin and end with 1 and contain arbitrarily long strings of 0's in between, and thus S is cofinite.
Only if part: If S is finite, we are done. So assume that S is infinite. Then 0^∞ ∈ Y. Since there exists a shift space Z ⊂ […]
If part: Assume Condition (C2) of the theorem. Up to recoding, we may assume that X is a (1-step) vertex shift X_G, D is a vertex of the graph G and there is a vertex A in G such that A is distinct from D and G has a self-loop τ at A. Using irreducibility of X, there are paths in G, β^+ from D to A and β^− from A to D, neither of which contains D in its interior. Let […] Now Y is a gap shift with gap set of the form S := F ∪ {N, N + 1, ⋯}, where each element of F is less than N. For each s ∈ S, choose π_s to be a first-return cycle of length s from D to itself ("first-return" means that it does not contain D in its interior). We will assume that for s ≥ N, we choose π_s = β^+ τ^{s−N} β^−. For y ∈ Y, let O_y := {j ∈ Z : y_j = 1} and define η : Y → X as follows: […] Observe that η is injective because if y, y′ ∈ Y and y ≠ y′, then for some i, WLOG we assume y_i = 1 and y′_i = 0 and so (η(y))_i = D and (η(y′))_i ≠ D. Furthermore, we claim that η is a sliding block code. To see this, note that η is shift-invariant by virtue of its definition, and (η(y […] So, η is an injective sliding block code from Y into X = X_G. Let Z be its image. Then, η^{−1} is a bijective sliding block code from Z onto Y. Moreover, by the construction of η, for every […] It follows that η^{−1} = φ|_Z. This completes the proof of the if part assuming Condition (C2).
Now assume Condition (C1). The proof follows along the same lines except that the definition of η is even easier: S = F is a finite set, and we only need the first two cases, (D1) and (D2), of the definition of η because for any y ∈ Y, O_y is a nonempty set with no maximum and no minimum.
When the domain of φ is X_[2], Condition (C2) in Theorem 6.1 holds and there is always an SFT Z ⊂ X to which the restriction of φ is one-to-one and onto Y. Note that Y must be an S-gap shift with S cofinite. Our next result gives an explicit description of Z for some special cases.
Theorem 6.5. Let φ : X = X_[2] → Y be a factor code with an unambiguous symbol, F be the standard forbidden set of Y and F̄ be the bitwise complement of F. Then, the following are equivalent:
(1) At least one of the symbols from {0, 1} occurs at most once in D;
(2) Either φ|_{X_F} or φ|_{X_F̄} is one-to-one and onto Y;
(3) Either φ|_{X_F} or φ|_{X_F̄} is finite-to-one and onto Y;
(4) Either φ|_{X_F} or φ|_{X_F̄} is onto Y.
(Note: When (1) holds, φ|_{X_F} and φ|_{X_F̄} may not both satisfy (2) (resp., (3) and (4)). For example, suppose k = 4 and D = b_1 b_2 b_3 b_4 = 0000. Then, one verifies that φ|_{X_F} is one-to-one and onto, but φ|_{X_F̄} is not. See Example 6.6 for more details.)
[…] and φ is trivially a conjugacy. Hence, we assume k ≥ 2 throughout the remainder of the proof.
By reversing the roles of 0 and 1 in the domain it follows that φ|_{X_F̄} : X_F̄ → X_F is also one-to-one and onto when b […]
Case 2: There is only one 0 or only one […] We first assume that b_j = 1 for some 1 ≤ j ≤ k and b_i = 0 for any 1 […] with m_l ≥ M for all l ∈ Z, one directly verifies that φ(x) = σ^{j−k}(x). Thus, φ|_{X_F} must be one-to-one and onto Y. By reversing the roles of 0 and 1 in the domain it follows that φ|_{X_F̄} : X_F̄ → X_F is also one-to-one and onto when there is only one 0 in b_[1,k].
(4) ⇒ (1): We argue by contradiction. Suppose there are at least two 1's and at least two 0's in b_[1,k]. Then, k ≥ 4 and 11 ∈ F. We will show that neither φ|_{X_F} nor φ|_{X_F̄} is onto by finding a y ∈ Y and two blocks B_1 ∈ F and B_2 ∈ F̄ such that any x ∈ φ^{−1}(y) contains B_1 and B_2. Indeed, if such a y exists, then y ∉ φ(X_F) and y ∉ φ(X_F̄), and therefore neither φ|_{X_F} nor φ|_{X_F̄} is onto, contradicting (4).
We consider the following cases:
Case 1: Both 00 and 11 are subblocks of b […] Choose y ∈ Y with y_0 = 1. Then, for any […]. Since 11 ∈ F, 00 ∈ F̄ and they are both subblocks of x, we conclude that φ|_{X_F} and φ|_{X_F̄} are not onto.
[…] In either case, one directly verifies that 11 ∈ F, 010 ∈ F̄. Consider any y ∈ Y with y_0 = 1. Then, any x ∈ φ^{−1}(y) satisfies […] and therefore it contains both 11 and 010. Thus, both φ|_{X_F} and φ|_{X_F̄} are not onto.
[…] ) m 1 and it is purely periodic. In this case, we infer from Proposition 5.2 (1) that 10^{k−1}1 is not allowed in Y but 10^k 1 is. Consider y ∈ Y with y_[0,k+1] = 10^k 1. For any x ∈ φ^{−1}(y), either […] ∈ F, we conclude that both φ|_{X_F} and φ|_{X_F̄} are not onto. If […] Hence, we infer from Proposition 5.2 (1) that 10^{k−1}1 is allowed in Y. A similar argument as in Case 2 for odd k implies that both φ|_{X_F} and φ|_{X_F̄} are not onto.
[…] By reversing the roles of 0 and 1, a similar argument as in Subcase 3.1 for s_1 = 0 implies that both φ|_{X_F} and φ|_{X_F̄} are not onto.
If s_2 = j or t_2 = 0, a similar argument as in Subcase 3.1 for s_1 = 0 again implies that both φ|_{X_F} and φ|_{X_F̄} are not onto.
Example 6.6. Let Φ : {0, 1}^4 → {0, 1} be a 4-block code defined by […] → Y be the factor code induced by Φ. Using Proposition 5.2, one verifies that Y is an S-gap shift with S = {0, 4, 5, 6, ⋯}. Equivalently, Y is an SFT with the forbidden set F = {101, 1001, 10001}. Noting that 1^∞ ∈ X, we deduce from Theorem 6.1 that there is an SFT Z ⊂ X such that φ|_Z is a conjugacy. Note that φ|_{X_F̄} is not onto: since 010 ∈ F̄ and Φ^{−1}(100001) = 000010000, the block 100001 is not allowed in the image of φ|_{X_F̄}, and therefore φ|_{X_F̄} is not onto. It follows from Theorem 6.5 that we can choose Z to be X_F. The reader can verify this directly.
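The gap set in this example can be sanity-checked by brute force. The sketch below assumes D = 0000 (the block used in the note following Theorem 6.5) and that the domain is the full 2-shift; the function names are hypothetical. It applies the criterion from the proof of Proposition 5.1: the gap s is allowed iff some binary word of length k + s + 1 contains D exactly at its first and last possible positions.

```python
from itertools import product


def occurrences(word, block):
    """Return the list of offsets at which block occurs in word."""
    k = len(block)
    return [i for i in range(len(word) - k + 1) if word[i:i + k] == block]


def gap_allowed(D, s):
    """True if 1 0^s 1 is in the image, i.e. some binary word of length
    len(D) + s + 1 contains D exactly at offsets 0 and s + 1.
    Exhaustive search, so only suitable for short D and small s."""
    n = len(D) + s + 1
    for bits in product("01", repeat=n):
        if occurrences("".join(bits), D) == [0, s + 1]:
            return True
    return False


# Gaps s < 8 that occur when D = 0000 (assumed block for Example 6.6).
print([s for s in range(8) if gap_allowed("0000", s)])
```

The same function illustrates Proposition 5.2 (1): for the purely periodic block 0101 with k = 4, gap_allowed("0101", 3) is False while gap_allowed("0101", 4) is True.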
Remark 6.7. If a factor code φ defined on an irreducible SFT X is finite-to-one but not one-to-one, then there is no shift space Z ⊂ X such that φ|_Z is one-to-one and onto. This follows from the fact that if such a Z exists, then by […]

A graph G is a spoke graph if it consists of a central state B and finitely many distinct spokes U_i, i ∈ T, such that for any i ≠ j ∈ T, U_i and U_j only intersect at […] the indices of degenerate spokes and T_1 := T \ T_0 denote the indices of regular spokes. See Figure 1 for an example of a spoke graph with two regular spokes and one degenerate spoke. Let Φ : V(G) → {0, 1} be defined by Φ(v) = 1 if v = B and Φ(v) = 0 otherwise. For a block […] Consider the factor code φ : X_G → Y ⊂ X_[2] induced by Φ. We call φ the standard factor code on G. The image Y of φ is a gap shift with gap set […] where […] i := a_i + (j − 1)d_i and 0 ≤ b^(j)_i < D for any i ∈ T_1 and any 1 ≤ j ≤ n(i). For each i ∈ T_1, denote […] Then the gap set S can be expressed by […]

Characterization of the Finite-to-one Condition for Standard Factor Codes on Spoke Graphs
Here, we characterize P2 for standard factor codes on spoke graphs.
Theorem 8.1. Let G be a spoke graph and φ be the standard factor code on G. Then, the following are equivalent:
(1) There is a set W ⊂ T_1 such that {K_i : i ∈ W} are pairwise disjoint and ∪_{i∈W} K_i = ∪_{i∈T_1} K_i.
(2) There is an irreducible SFT Z ⊂ X_G such that φ|_Z is almost invertible and onto Y.
(3) There is an irreducible SFT Z ⊂ X_G such that φ|_Z is finite-to-one and onto Y.
(Note 1: If d_i ≥ 2 for all i ∈ T_0 ∪ T_1, then the vertex shift of a spoke graph does not have a fixed point. If T_1 ≠ ∅, then the image Y always has a fixed point 0^∞. So, under these assumptions, φ|_Z cannot be a conjugacy. Note 2: In (2) and (3), it is not necessary to assume that Z is irreducible since otherwise we can replace Z with an irreducible component with maximal topological entropy.)
Proof.
(1) ⇒ (2): Suppose there is a set […] We first construct a new graph H from the graph G through the following three steps:
(A) Let H be the graph consisting of the central state B ∈ V(G) and all the spokes U_i ⊂ G with i ∈ W;
(B) For each r ∈ S^(1) \ S^(0), add to H a simple cycle, denoted C(r), of length r + 1 starting and ending with B;
(C) For each s ∈ S^(2) \ S^(1), choose an i(s) ∈ T_0 such that |C_i| = s + 1. Add the degenerate spoke U_{i(s)} to H.
See Figure 2 for an example of the construction of H. Let H_1, H_2, H_3 denote the subgraphs consisting of the spokes added to H in Steps (A), (B) and (C), respectively. It is worth noting that any r ∈ S^(1) \ S^(0) corresponds to a "gap" in regular spokes of G that is missing from {U_i : i ∈ W}, and any s ∈ S^(2) \ S^(1) corresponds to a "gap" in degenerate spokes of G that is missing from {U_i : i ∈ T_1}.
The following properties are immediate from the construction of H: […]
Now, define a one-block map Ψ : V(H) → V(G) as follows:
• For any r ∈ S^(1) \ S^(0), choose a cycle C(r) in G starting and ending with B with no […]
Note that for any two distinct vertices […], where C(r_1) and C(r_2) are constructed in Step (B). Let ψ : X_H → X_G be the sliding block code induced by Ψ and define Z := ψ(X_H). Note that any point z ∈ Z is a concatenation of strings of the form […] where the u_j's, v_j's, w_j's and i_j's are vertices in G distinct from B. Thus, […]
To show that ψ is one-to-one, it suffices to show that any string in (11) has a unique Ψ-pre-image, and we prove this by considering the following cases: (1) Any allowed block of the form Bu […] (2) For simplicity, among the infinite paths in (11), we consider only those of the form […] Such a string must be the Ψ-image of some string of the form […]
Since Z = ψ(X_H) is conjugate to X_H, it is an irreducible SFT. We now prove that φ|_Z is almost invertible and onto Y. Note that by definition Φ ∘ Ψ maps the central state B to 1 and maps all other vertices in H to 0. So φ ∘ ψ is the standard factor code on the spoke graph H.
To see that φ ∘ ψ is onto, first note that the image (φ ∘ ψ)(X_H) is a gap shift with gaps of the form […] where we use the fact that S^(0) ⊂ S^(1) in the last equation. Since ∪_{i∈W} K_i = ∪_{i∈T_1} K_i, we have S′ = S where S is such that Y is an S-gap shift. Therefore φ ∘ ψ is onto.
We now show that φ ∘ ψ is finite-to-one. We first note from the construction of H that for any t ∈ S, there is a unique cycle of length t + 1 in H starting and ending with B, whose interior does not contain B. Hence, for any t ∈ S, there is a unique path in H whose image under Φ ∘ Ψ is 10^t 1. This implies that φ ∘ ψ has no graph diamond and therefore it is finite-to-one.
Since the central state B is the only vertex in H whose (Φ ∘ Ψ)-image is 1, and since φ ∘ ψ is a finite-to-one 1-block code on a 1-step SFT, its degree is 1 (by [LM95, Theorem 9.1.11(3) and Proposition 9.1.12]) and therefore it is almost invertible.
Finally, since φ ∘ ψ is almost invertible and onto Y, and ψ is a conjugacy from X_H to Z, we conclude that φ|_Z : Z → Y is almost invertible and onto.
(3) ⇒ (1): Suppose that there is an irreducible SFT Z ⊂ X such that φ|_Z is finite-to-one and onto. Let k be the degree of φ|_Z and L be the maximum length of words in a forbidden list of blocks from X that defines Z. Then, there exist a word of the form u := 0^{e_1} 1 0^{e_2} 1 ⋯ 1 0^{e_n} such that each e_i ∈ S, an integer L ≤ M ≤ |u|, and an index 1 ≤ j ≤ |u| − M + 1 such that the set has cardinality k. Note that u is a magic word and u_{[j,j+M−1]} is the corresponding magic block.
For notational convenience, in the remainder of this proof, for any block w with length |u|, we use the following notation: where u, j and M are defined as above.
Denote elements in E by a^{(1)}, a^{(2)}, ⋯, a^{(k)} and for any 1 ≤ l ≤ k, define Note that R is the set of all φ|_Z-pre-images of u. By a higher block recoding similar to [LM95, Proposition 9.1.7], the following observation follows from [LM95, Proposition 9.1.9 (part 2)].
For any 1 ≤ l ≤ k, define l) and some w ∈ R} to be the index set of regular spokes that can follow some pre-images of u in B^{(l)} and precede some pre-images of u in R. We claim that for any 1 ≤ l ≤ k, the sets {K_i : i ∈ F^{(l)}} are pairwise disjoint and ∪_{i∈F^{(l)}} K_i = ∪_{i∈T_1} K_i. We assume WLOG that l = 1 in the following.

To show that the sets {K_i : i ∈ F^{(1)}} are pairwise disjoint, we suppose to the contrary that there exists f Then, n(f) ∈ S and according to the definition of F^{(1)}, there are v, x ∈ B^{(1)}, w ∈ B^{(l_1)}, y ∈ B^{(l_2)} for some 1 ≤ l_1, l_2 ≤ k such that Φ(vV(γ Then, we infer from Observation 1 that l_1 = l_2 and therefore w = y = a^{(l_1)}. Now, the two words are both φ|_Z-pre-images of u u10^{n(f)}1u u, and they both start with a^{(1)} and end with a^{(l_1)}. Since a^{(1)} and a^{(l_1)} both have length M, which is no less than L, we deduce that the two words in (12) can be extended to a point diamond, contradicting the fact that φ|_Z is finite-to-one.
To show ∪_{i∈F^{(1)}} K_i = ∪_{i∈T_1} K_i, assume to the contrary that there is a g ∈ ∪_{i∈T_1} K_i but g ∉ ∪_{i∈F^{(1)}} K_i. Choose n(g) := g (mod D) such that n(g) > max{d_i : i ∈ T_0} and n(g) ≥ max{d_i L + m_i : i ∈ T_1}. Then, n(g) ∈ S and we deduce from the definition of F^{(1)} that the set does not contain a^{(1)}. Noting that Q ⊂ {a^{(1)}, a^{(2)}, ⋯, a^{(k)}} since u is a magic word, the cardinality of Q is at most k − 1. This contradicts the fact that φ|_Z has degree k, and therefore ∪_{i∈F^{(1)}} . Then, we immediately infer from above that W is the desired set, which completes the proof.
Remark 8.2. Our proof indeed shows that conditions (2) and (3) in Theorem 8.1 are equivalent for any 1-block factor code with an unambiguous symbol defined on a 1-step SFT.
Remark 8.3. Let G be the graph in Figure 3 where B is the central state. Let φ be the standard factor code on G. Then, one verifies that Φ (which generates φ) has no graph diamond and so φ is finite-to-one; on the other hand, φ is not one-to-one: both (V_1 V_2)^∞ and (V_2 V_1)^∞ are preimages of 0^∞. In this case, there is no subshift Z ⊂ X_G such that φ|_Z is one-to-one and onto (see discussion in Remark 6.7).
Here the image Y of φ is an S-gap shift with There are two ways to choose W:
(1) W = {1, 3}. It can be readily checked that ∪_{i∈W} K_i = ∪_{i∈T_1} K_i and K_1 ∩ K_3 = ∅. So, by Theorem 8.1 there is an SFT Z ⊂ X_G such that φ|_Z is finite-to-one and onto Y. In this case, the proof chooses Z to be X_{U_1∪U_3}.
(2) W = {2}. Here, the proof of Theorem 8.1 chooses Z to be X_{U_2}.
This shows that there are two irreducible Markov measures ν_1 and ν_2, with ν_1 supported on X_{U_1∪U_3} and ν_2 supported on X_{U_2}, that both achieve the capacity of the channel given by the standard factor code on G. Using ν_1 and ν_2, one can construct a fully supported irreducible Markov measure on X_G that achieves capacity.
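The condition used in this remark (a set W of regular spokes whose sets K_i are pairwise disjoint and have the same union as all the K_i, i ∈ T_1) can be checked by brute force when the K_i are given as finite sets. The following Python sketch does so; the function name is hypothetical, and the sets in the usage example are invented values chosen only to mimic the two choices of W above, not the actual K_i of the graph in Figure 3.

```python
from itertools import combinations


def admissible_index_sets(K):
    """List every nonempty W such that the sets K[i], i in W, are pairwise
    disjoint and have the same union as the whole family.

    K maps an index i to the finite set K[i].
    """
    total = set().union(*K.values())
    indices = sorted(K)
    found = []
    for size in range(1, len(indices) + 1):
        for W in combinations(indices, size):
            disjoint = all(K[i].isdisjoint(K[j]) for i, j in combinations(W, 2))
            covers = set().union(*(K[i] for i in W)) == total
            if disjoint and covers:
                found.append(set(W))
    return found


# Hypothetical sets (an assumption for illustration, not taken from Figure 3).
K = {1: {0}, 2: {0, 1}, 3: {1}}
print(admissible_index_sets(K))
```

With these illustrative sets the search returns exactly the two index sets {2} and {1, 3}. The search is exponential in the number of spokes, which is acceptable only for small examples.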
In the remainder of this section, we will prove some special cases of Conjecture 9.1. To this end, we begin with some lemmas.

Lemma 9.3. (Consequence of Strong Form of Chinese Remainder Theorem) Let k be a positive integer. If for any 1 ≤ i < j ≤ k, there exists x_{i,j} s.t. x_{i,j} = a_i (mod d_i) and x_{i,j} = a_j (mod d_j), then there exists x such that x = a_l (mod d_l) for any 1 ≤ l ≤ k.
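The statement of Lemma 9.3 can be explored numerically. The Python sketch below is an illustration, not the cited theorem's proof: `pairwise_compatible` tests the standard criterion that gcd(d_i, d_j) divides a_i − a_j for every pair, and `common_solution` builds a simultaneous solution by merging the congruences one at a time. Both function names are hypothetical, and the merging procedure is a textbook technique swapped in here for illustration.

```python
from math import gcd
from itertools import combinations


def pairwise_compatible(residues, moduli):
    """True if each pair x = a_i (mod d_i), x = a_j (mod d_j) is solvable."""
    return all(
        (residues[i] - residues[j]) % gcd(moduli[i], moduli[j]) == 0
        for i, j in combinations(range(len(moduli)), 2)
    )


def common_solution(residues, moduli):
    """Merge the congruences one by one.

    Returns (x, m) with m the lcm of the moduli and 0 <= x < m,
    or None if no common solution exists.
    """
    x, m = 0, 1  # invariant: x solves all congruences merged so far, modulo m
    for a, d in zip(residues, moduli):
        g = gcd(m, d)
        if (a - x) % g != 0:
            return None  # the new congruence conflicts with the merged ones
        step = d // g
        if step == 1:
            t = 0  # d divides m and x already satisfies x = a (mod d)
        else:
            # Solve (m/g) * t = (a - x)/g (mod d/g); needs Python 3.8+ for pow.
            t = ((a - x) // g * pow(m // g, -1, step)) % step
        x, m = x + m * t, m // g * d
        x %= m
    return x, m
```

For example, the system x = 1 (mod 4), x = 3 (mod 6), x = 5 (mod 10) is pairwise solvable even though the moduli are not coprime, and the merge yields x = 45 modulo 60.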
Proof. For any 1 ≤ i < j ≤ k, let g_{i,j} = gcd(d_i, d_j). Then g_{i,j} divides x_{i,j} − a_i and x_{i,j} − a_j, so g_{i,j} divides a_i − a_j. Hence, the generalized Chinese Remainder Theorem [Le56, Theorem 3-12] asserts that there is a common solution to x = a_i (mod where

Proof. Fix a congruence class 0 ≤ j < D and let µ_0 be the mme on Y. Since φ(ν) = µ_0, for all n ≥ max_{i∈T} m_i/D, µ_0(10^{Dk+j+Dn}1) = )). Since µ(1) = ν(B), using the formula for the mme µ we have Letting n → ∞, we have for all i ∈ R_j ∩ P, This yields equation (13) and proves (a). Since Y is a gap shift, the mme on Y is fully supported and so gives positive measure to each allowed gap. Thus, ∪_{i∈T_1∖P} K_i ⊂ ∪_{i∈P} K_i, proving (b).
We then claim that there exists i_3 ∈ S(i_1, i_2, j_1) such that K_{i_2} ∩ K_{i_3} = ∅. If not, then Recalling that j_2 ∈ K_{i_1} ∩ K_{i_2} and j_1 ∈ K_{i_1} ∩ (∩_{i∈S(i_1,i_2,j_1)} K_i), we derive from Lemma 9.3 that there exists j

With these lemmas in hand, we prove the following.
Proposition 9.6. Let G be a spoke graph, φ be the standard factor code on G and P be defined as in Lemma 9.4. If there is a stationary Markov measure ν on X_G s.t. φ_*(ν) = µ_0, the mme of the output Y, then there is an SFT Z ⊂ X_G such that φ|_Z is finite-to-one and onto Y if any of the following hold:
(a) ∩_{i∈P} K_i ≠ ∅ (in particular, this holds when m_i = 1 for all i or the {d_i} are pairwise co-prime (by the Chinese Remainder Theorem));

Proof. According to Theorem 8.1, it suffices to show that there is a W ⊂ T_1 such that

Proof of (a): Let A := ∪_{i∈P} K_i. Note that P ≠ ∅ by the existence of ν. Let j ∈ ∩_{i∈P} K_i. Apply Lemma 9.4(c) to this j and an arbitrary j′ ∈ A to get that for all i ∈ P, i ∈ ∩_{j′∈A} R_{j′} and so each K_i = A. By Lemma 9.4(b), A = ∪_{i∈T_1} K_i. Hence, W can be taken to consist of only one element, namely any element of P.
Proof of (b): Since the K_i's are pairwise intersecting, an application of Lemma 9.3 to

Proof of (c): We assume WLOG that the K_i's are distinct. Denote We claim that for any To see this, assume to the contrary that there are i

Proof of (d): By adding repeated spokes (for which the choice of the set W is not affected), we can regard the cases |P| < 5 as special cases of |P| = 5. Hence, we assume |P| = 5 in the following.
Let P = {1, 2, 3, 4, 5}. A pair i, i′ ∈ P is called an intersecting pair if K_i ≠ K_{i′} and K_i ∩ K_{i′} ≠ ∅. We consider the following cases.

Case 1: For any intersecting pair i, i′ ∈ P, either K_i ⊂ K_{i′} or K_{i′} ⊂ K_i. In this case, we define a partial order ⪯ in the following way: if i, i′ is an intersecting pair and K_i ⊂ K_{i′}, then K_i ⪯ K_{i′}; if i, i′ is not an intersecting pair, then K_i and K_{i′} are incomparable.
The partial order ⪯ partitions the set {K_i : i ∈ P} into several classes such that
1. each class is a chain with a unique maximal element (under ⪯);
2. if K_i and K_{i′} are from different classes, then
Hence, letting W be the indices of all the maximal elements, we have that {K_i : i ∈ W} are pairwise disjoint and

Case 2: There exists an intersecting pair i, i′ ∈ P such that both K_i ⊄ K_{i′} and K_{i′} ⊄ K_i.
Remark 9.7. When |P| ≤ 4, by carefully going through a similar argument as in the proof of (d), one can show that for any i ≠ j ∈ P, K_i ∩ K_j = ∅ or K_i ⊂ K_j or K_j ⊂ K_i.
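When any two of the sets K_i are either disjoint or nested, as in Case 1 of the proof of (d) and in Remark 9.7, taking the inclusion-maximal sets gives a pairwise disjoint subfamily with the same union. The Python sketch below illustrates that selection on finite sets; the function names and the sample family are hypothetical and are not data from the paper.

```python
def is_laminar(K):
    """True if any two sets in K are either disjoint or nested."""
    idx = list(K)
    return all(
        K[i].isdisjoint(K[j]) or K[i] <= K[j] or K[j] <= K[i]
        for a, i in enumerate(idx)
        for j in idx[a + 1:]
    )


def maximal_indices(K):
    """Pick one index for each inclusion-maximal set in K."""
    W = []
    for i in K:
        # Skip i if some other set strictly contains K[i].
        if any(K[i] < K[j] for j in K):
            continue
        # Skip i if an equal set has already been selected.
        if any(K[i] == K[j] for j in W):
            continue
        W.append(i)
    return W


# Hypothetical family (an assumption for illustration only).
K = {1: {0, 1, 2}, 2: {0, 1}, 3: {3}, 4: {3}}
W = maximal_indices(K) if is_laminar(K) else None
```

Here repeated sets (indices 3 and 4) are handled by keeping only the first, in the spirit of the reduction to repeated spokes in the proof of (d). For a family that is not laminar, such as {0, 1} and {1, 2}, the maximal sets need not be disjoint, which is why Case 2 requires a separate argument.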

Standard Factor Codes Defined on Another Class of Graphs
We believe that our approach in the proof of Theorem 8.1 also works for more general graphs. Note that for a graph G with one (regular) spoke, Theorem 8.1 implies that P2 always holds. In this section, as an example we show that P2 also holds for standard factor codes defined on a more general class of graphs. To be specific, let G be a graph which consists of a central state B, a simple path γ^+ from B to B′ ≠ B, a simple path γ^− from B′ to B, and two simple cycles C_1 and C_2 including B′ s.t.
Here, we implicitly assume that γ^+ ≠ ∅ and γ^− ≠ ∅. Just as in Section 7, a standard factor code φ on G is induced by a one-block map Φ : V(G) → {0, 1} that maps the central state B to 1 and any other vertex to 0.
Let Y be the image of φ.We have the following.
Proposition 10.1. Let G be the graph defined above and φ be the standard factor code on G. Then, there is an SFT Z ⊂ X_G s.t. φ|_Z is finite-to-one and onto Y.
We need the following lemma.

Case 2: Neither 00 nor 11 is a subblock of b_1 b_2 ⋯ b_k. In this case, b_1 b_2 ⋯ b_k is a binary block with 0 and 1 occurring alternately. We assume WLOG that b_1

Figure 1: A spoke graph with two regular spokes and one degenerate spoke, where dots denote vertices.

Figure 2: An example of G and H, where dots denote vertices.

Figure 3: The graph G, which is a representation of X F .
[MPW84, section, we consider a class of factor codes with an unambiguous symbol motivated by the example in [MPW84,. A graph U is called a spoke if U consists of a state B, a simple path γ^+ from B to a state B′ ≠ B, a simple path γ^− from B′ to B, a simple cycle C including B′ s.t. γ^+, γ^− and C are all disjoint (except that they all share the state B′ and γ^+, γ^− share the state B). We also allow degenerate spokes with one simple cycle C at B, which we indicate by γ^+

LM95, Corollary 4.4.9], h_top(Z) < h_top(X), which contradicts [LM95, Corollary 8.1.20]. For a simple example of such a φ with an unambiguous symbol, see Example 8.3.

7 Standard Factor Codes Defined on Spoke Graphs

(c) There are subsets E_1 and E_2 of P such that {K_i : i ∈ E_1} and {K_i : i ∈ E_2} are both pairwise disjoint and ∪_{i∈E_1∪E_2} K_i = ∪_{i∈T_1} K_i. In particular, this condition is satisfied if there are only two distinct d_i's;
(d) |P| ≤ 5.