
ALGEBRAIC DYNAMICAL SYSTEMS FROM LDPC CODES SATISFY A STRONG NEGATION OF THE WEAK PINSKER PROPERTY

Published online by Cambridge University Press:  09 June 2025

Tim Austin
Affiliation:
Mathematics Institute, University of Warwick, U.K. (tim.austin@warwick.ac.uk)
Lewis Bowen*
Affiliation:
Department of Mathematics, University of Texas at Austin, U.S.A.
Christopher Shriver
Affiliation:
Department of Mathematics, University of Texas at Austin, U.S.A. (christopher.shriver@math.utexas.edu)

Abstract

We construct an explicit algebraic example of a subshift of finite type over a group $\Gamma $ with an invariant Markov measure which has completely positive sofic entropy (with respect to ‘most’ sofic approximations) and yet does not have a direct Bernoulli factor because its model spaces shatter into exponentially many clusters of sub-exponential size. The example and its analysis are related to random low-density parity-check (LDPC) codes.

Type
Research Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (https://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2025. Published by Cambridge University Press

1 Introduction

This paper constructs explicit dynamical systems with unusual properties related to recent work on the weak Pinsker property and shattering. The construction is explained next; the background, motivation and precise statements are developed afterwards.

Fix natural numbers $d,k$ and let $\Gamma =\Gamma _{d,k}$ be the d-fold free product of order-k cyclic groups

$$\begin{align*}\Gamma := \langle s_1,\dots,s_d:\ s_1^k = \dots = s_d^k = e\rangle = \underbrace{\mathbb{Z}_k\ast \cdots \ast \mathbb{Z}_k}_{d},\end{align*}$$

where $\mathbb {Z}_k$ means $\mathbb {Z}/k\mathbb {Z}$ . The set of all functions $x:\Gamma \to \mathbb {Z}_2$ is denoted $\mathbb {Z}_2^\Gamma $ . This is a compact Abelian group under pointwise addition with the pointwise convergence topology. Let $X \le \mathbb {Z}_2^\Gamma $ be the closed subgroup defined by

$$\begin{align*}X = \left\{x \in \mathbb{Z}_2^\Gamma:\ \sum_{j=0}^{k-1}x_{gs_i^j} = 0 \quad \forall g\in \Gamma,\ i = 1,\dots,d\right\},\end{align*}$$

and let $\mu = m_X$ be the Haar probability measure on X.

For $g\in \Gamma $ , let $T^g: \mathbb {Z}_2^\Gamma \to \mathbb {Z}_2^\Gamma $ be the continuous group automorphism given by permuting indices on the left:

(1) $$ \begin{align} T^g((x_h)_{h \in \Gamma}) = (x_{g^{-1}h})_{h\in \Gamma}. \end{align} $$

The subgroup X is invariant under this action, and hence so is its Haar measure.

This state space is easily visualized in terms of the Cayley graph of $\Gamma $ with its generators $s_1,\dots ,s_d$ . Through each group element g, each $s_i$ generates a k-cycle. So each vertex of the Cayley graph lies in d of these k-cycles, and there are no other relations in the group, so these k-cycles are attached together into a hyper-tree. With this picture in mind, a member of X is simply an assignment of zeros and ones to the vertices of the Cayley graph such that the sum around every k-cycle is even. For this reason, and by analogy with similar constructions in coding theory, we call X a parity check subshift. Indeed, certain random finite parity check codes play a crucial auxiliary role later in the paper: see Section 6.
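The finite analog of this picture can be made concrete in a few lines of code. The sketch below (an illustration in the spirit of the random codes of Section 6, not the paper's construction verbatim) replaces the Cayley graph by d random k-uniform permutations of a vertex set of size n, imposes one parity check per k-cycle, and measures the resulting codebook over $\mathbb{F}_2$ by Gaussian elimination:

```python
import random

def random_k_uniform_perm(n, k, rng):
    """Uniformly random permutation of range(n) made entirely of k-cycles."""
    assert n % k == 0
    verts = list(range(n))
    rng.shuffle(verts)
    perm = [0] * n
    for i in range(0, n, k):
        cycle = verts[i:i + k]
        for j in range(k):
            perm[cycle[j]] = cycle[(j + 1) % k]
    return perm

def gf2_rank(rows):
    """Rank over GF(2) of a list of rows encoded as int bitmasks."""
    basis = {}  # leading-bit position -> reduced row
    for row in rows:
        cur = row
        while cur:
            lead = cur.bit_length() - 1
            if lead in basis:
                cur ^= basis[lead]
            else:
                basis[lead] = cur
                break
    return len(basis)

def parity_checks(perms, n):
    """One GF(2) parity check per cycle: the bits on that cycle must sum to 0."""
    rows = []
    for perm in perms:
        seen = [False] * n
        for v in range(n):
            if not seen[v]:
                mask, u = 0, v
                while not seen[u]:
                    seen[u] = True
                    mask |= 1 << u
                    u = perm[u]
                rows.append(mask)
    return rows

rng = random.Random(0)
n, d, k = 24, 3, 4   # toy sizes; the main theorems require k > d >= 3
perms = [random_k_uniform_perm(n, k, rng) for _ in range(d)]
rows = parity_checks(perms, n)
rank = gf2_rank(rows)
# The codebook has 2 ** (n - rank) elements; there are d*n/k checks, so
# rank <= d*n/k and the normalized log-size is at least (1 - d/k) * log 2.
rate = (n - rank) / n
```

Each check involves exactly k coordinates, which is why these finite codes are low-density parity-check (LDPC) codes.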

Informally stated, our main results are these:

  • If $k> d \ge 3$ , then the sofic entropy of the dynamical system $(X,m_X,T)$ is ${(1-d/k)\log (2)}$ .

  • Every nontrivial factor of $(X,m_X,T)$ has positive sofic entropy and therefore positive Rokhlin entropy (this property is called ‘completely positive entropy’ or ‘CPE’). In fact, we prove the stronger assertion that the outer Pinsker factor of $(X,m_X,T)$ is trivial.

  • The system $(X,m_X,T)$ is not isomorphic to a direct product of a nontrivial Bernoulli shift with another system. Combined with the previous conclusion, this is a strong negation of the weak Pinsker property.

  • The system $(X,m_X,T)$ is not weakly contained in a Bernoulli shift. It is among the first known examples that has completely positive entropy together with this property.

Next, we introduce background needed to state our main results precisely.

1.1 Background: classical entropy theory

Kolmogorov introduced entropy theory into dynamics for the purpose of distinguishing Bernoulli shifts up to measure conjugacy. Given a standard probability space $(\mathtt {K},\kappa )$ , the Bernoulli shift over a countable group $\Gamma $ with base space $(\mathtt {K},\kappa )$ consists of the probability space $(\mathtt {K}^\Gamma , \kappa ^\Gamma )$ together with the action of $\Gamma $ by permuting indices as in (1). A sample of $(\mathtt {K}^\Gamma ,\kappa ^\Gamma )$ is a random $\mathtt {K}$ -valued configuration $(x_g)_{g \in \Gamma }$ whose coordinates are i.i.d. with law $\kappa $ .

Suppose we are given a standard probability space $(X,\mu )$ (where we have left the sigma-algebra out of the notation for simplicity). Let $\operatorname {\mathrm {Aut}}(X,\mu )$ denote the group of all measure-preserving automorphisms of $(X,\mu )$ . A pmp (probability-measure-preserving) action of $\Gamma $ is a homomorphism $T:\Gamma \to \operatorname {\mathrm {Aut}}(X,\mu )$ . The triple $(X,\mu ,T)$ is a $\Gamma $ -system. We also refer to it as a system or action if $\Gamma $ is understood.

If we are given two $\Gamma $ -systems $(X_i,\mu _i,T_i)$ , then a measurable map $\Phi :X_1 \to X_2$ is a factor map if it is a.e. $\Gamma $ -equivariant (this means $\Phi (T_1^g x)=T_2^g \Phi (x)$ for all $g\in \Gamma $ and $\mu _1$ -a.e. $x\in X_1$ ) and the pushforward measure satisfies $\Phi _*\mu _1=\mu _2$ . More precisely, we allow that $\Phi $ be defined only on a subset of full measure. If $\Phi $ is invertible (after ignoring a null set), then it is a measure-conjugacy or isomorphism.

If $\Gamma = \mathbb {Z}$ , then an action of the integers is given by a single transformation $T \in \operatorname {\mathrm {Aut}}(X,\mu )$ . Thus, it makes sense to consider whether two transformations are measurably conjugate.

A problem attributed to von Neumann asks whether there could be two Bernoulli shifts over the group of integers which are not measurably conjugate. To answer this, Kolmogorov defined the entropy rate of a dynamical system in the special case in which $\Gamma =\mathbb {Z}$ [Reference Kolmogorov43, Reference Kolmogorov44]. He proved entropy is invariant under measure-conjugacy and computed entropy rates for Bernoulli shifts, thereby answering the problem in the affirmative. In fact, the entropy rate of a Bernoulli shift action is the same as the Shannon entropy of the base space. When the base space is $(\mathtt {K},\kappa )$ and $\mathtt {K}$ is countable, its Shannon entropy is

$$ \begin{align*}H(\kappa) = - \sum_{k \in \mathtt{K}} \kappa(\{k\}) \log (\kappa(\{k\})).\end{align*} $$

If $\kappa $ is not supported on a countable set, then its Shannon entropy is defined to be $+\infty $ .
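In code, for a base measure given as a finite probability mass function, this is a one-liner (a minimal sketch; the convention $0\log 0 = 0$ is handled by skipping zero-mass atoms):

```python
import math

def shannon_entropy(kappa):
    """Shannon entropy, in nats, of a pmf given as {atom: probability}."""
    return -sum(p * math.log(p) for p in kappa.values() if p > 0)

# A uniform two-point base space has entropy log 2, the constant that
# multiplies (1 - d/k) in the entropy value of the parity check subshift.
fair_bit = {"0": 0.5, "1": 0.5}
```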

Kolmogorov’s theory extends fairly directly to the case when $\Gamma $ is amenable. The first published work on entropy theory for general amenable groups is due to Kieffer [Reference Kieffer41].

Since Kolmogorov’s pioneering work, entropy and Bernoulli shifts have played a central role in classifying dynamical systems. For example, Sinai proved that if an ergodic action of $\mathbb {Z}$ has positive entropy, then it factors onto a Bernoulli shift of the same entropy [Reference Sinaĭ67]. Because entropy cannot increase under a factor map, this shows that Bernoulli factors witness entropy. Inspired by Sinai’s theorem, Ornstein proved that Bernoulli shifts over the integers are isomorphic if and only if they have the same entropy [Reference Ornstein52, Reference Ornstein53]. These results were extended to the case of amenable acting groups in [Reference Ornstein and Weiss58].

Shannon entropy is easily seen to be additive under direct products, and this property is inherited by Kolmogorov’s entropy rate. Naively, one might guess that any ergodic system is isomorphic to a direct product of a Bernoulli shift with a zero entropy system. This turns out to be false; counterexamples to weaker claims appear in [Reference Ornstein56, Reference Ornstein55, Reference Ornstein54]. If it were true for a system which was not itself isomorphic to a Bernoulli shift, then additivity implies that the system would have a nontrivial (direct) factor with zero entropy. A system is said to have completely positive entropy (CPE) if every nontrivial factor has positive entropy. Here a ‘trivial’ factor is a measure-preserving system where the measure is a delta mass at a single point. This system is a factor of every other system, and it is easy to see that its entropy is zero. The paper [Reference Ornstein and Shields57] shows that for any positive number $h>0$ , there exist uncountably many pairwise non-isomorphic transformations which are CPE and have entropy h.

A factor map $\pi :(X_1,\mu _1,T_1) \to (X_2,\mu _2,T_2)$ is said to be direct or split if there is another factor map $\xi :(X_1,\mu _1,T_1) \to (X_3,\mu _3,T_3)$ so that the pair $(\pi ,\xi )$ together forms an isomorphism

$$\begin{align*}(X_1,\mu_1,T_1) \to (X_2 \times X_3,\mu_2 \times \mu_3,T_2\times T_3).\end{align*}$$

Note that the measure on the right-hand side is required to be the product, so in particular the factor maps $\pi $ and $\xi $ must generate independent sigma-subalgebras of subsets of $X_1$ . While Sinai’s factor theorem shows the existence of Bernoulli factors, it does not say anything about the existence of direct Bernoulli factors.

In the 1970s, Thouvenot defined a system to have the weak Pinsker property (WPP) if for every $\epsilon>0$ , it is isomorphic to a direct product of a Bernoulli shift with a system of entropy less than $\epsilon $ [Reference Thouvenot69]. In other words, a system has the WPP if its entropy is witnessed by direct Bernoulli factors. Thouvenot asked whether every ergodic transformation has the WPP. The first author recently proved that this is indeed the case [Reference Austin5]. Moreover, the statement holds whenever the acting group $\Gamma $ is amenable.

1.2 Background: sofic entropy theory

The second author constructed a system without the WPP in the special case when the group $\Gamma $ is a free group of sufficiently high rank [Reference Bowen17]. To explain, we need to pause for a moment to discuss entropy theory when the acting group is not amenable.

An example due to Ornstein and Weiss in [Reference Ornstein and Weiss58] suggested it might not be possible to extend entropy theory to non-amenable groups. However, this changed with the introduction of sofic entropy theory [Reference Bowen11]. The new theory applies to all sofic groups, which is a class of groups containing amenable and linear groups, for example. It is unknown whether all countable groups are sofic. Sofic entropy theory is reviewed in §2.2.

A sofic approximation to a group $\Gamma $ is a sequence $\Sigma $ of partial actions on finite sets which approximates the action of the group on itself by left-translations. To be precise, $\Sigma =(\sigma _n)_{n\in \mathbb {N}}$ where $\sigma _n: \Gamma \to \operatorname {\mathrm {Sym}}(V_n)$ , $V_n$ are finite sets, $\operatorname {\mathrm {Sym}}(V_n)$ is the symmetric group on $V_n$ and the sequence is required to satisfy for all $g, h, f \in \Gamma $ such that f is not the identity,

$$ \begin{align*} 1 &= \lim_{n\to\infty} |V_n|^{-1} |\{ v\in V_n:~ \sigma_n(gh)v = \sigma_n(g)\sigma_n(h)v\}|\\ 0&= \lim_{n\to\infty} |V_n|^{-1} |\{ v\in V_n:~ \sigma_n(f)v = v\}|. \end{align*} $$

A group is called sofic if it admits a sofic approximation. The sofic entropy of a system $(X,\mu ,T)$ (defined in Section 2.2) depends a priori on a choice of sofic approximation, although for many actions where it has been computed, it has been shown not to.

Many classical results extend to the sofic setting. For example, the sofic entropy of a Bernoulli shift action is equal to the Shannon entropy of the base. Sofic entropy is a measure-conjugacy invariant, and so two Bernoulli shifts with different sofic entropy are not isomorphic. In recent work, Seward completed the converse direction: for any countable group $\Gamma $ , if two Bernoulli shifts over $\Gamma $ have the same base space Shannon entropy, then they are measurably conjugate [Reference Stepin68, Reference Bowen13, Reference Seward66]. This converse does not depend on sofic entropy, which might not even be defined.

In a series of works generalizing Krieger’s Theorem [Reference Seward63, Reference Seward64], Seward introduced Rokhlin entropy. To define it, suppose we are given an action $T:\Gamma \to \operatorname {\mathrm {Aut}}(X,\mu )$ and a countable measurable partition $\mathcal {P}$ of X. We say the partition is generating if the smallest sigma-algebra containing it which is also $T(\Gamma )$ -invariant is the sigma-algebra of all measurable sets (up to sets of measure zero). Then Rokhlin entropy is defined to be the infimum of $H_\mu (\mathcal {P})$ over all generating partitions, where

$$ \begin{align*}H_\mu(\mathcal{P}) = - \sum_{P \in \mathcal{P}} \mu(P)\log(\mu(P))\end{align*} $$

is the Shannon entropy of $(\mathcal {P},\mu )$ .

It is immediate that Rokhlin entropy is a measure-conjugacy invariant. Moreover, it upper bounds sofic entropy. In fact, it is unknown whether Rokhlin entropy equals sofic entropy whenever the latter is not minus infinity (which can happen). However, the only known method for computing a lower bound to Rokhlin entropy uses sofic entropy. For example, it is unknown how to compute the Rokhlin entropy of Bernoulli shift actions, except in the case when $\Gamma $ is assumed to be sofic.

In a different paper [Reference Seward65], Seward generalized Sinai’s factor theorem: every ergodic system with positive Rokhlin entropy factors onto a Bernoulli shift with the same entropy.

However, other structural results about classical Kolmogorov–Sinai entropy break down outside the world of amenable groups. For example, Ornstein and Weiss’ example shows that sofic entropy can increase under a factor map. In fact, recent work of the second author shows that if $\Gamma $ is an arbitrary non-amenable group, then every Bernoulli shift over $\Gamma $ factors onto every Bernoulli shift over $\Gamma $ [Reference Bowen15]. For example, Bernoulli shifts of small entropy factor onto Bernoulli shifts of infinite entropy.

Although Bernoulli shifts themselves have been classified, there is no known substitute for the broader ‘Ornstein theory’, which provides necessary and sufficient conditions for general ergodic processes to be isomorphic to Bernoulli shifts. Moreover, some specific counterexamples show that the story must change substantially for some non-amenable groups. For example, when $\Gamma $ has property (T), Popa and Sasyk [Reference Popa and Sasyk60] have given simple examples of factors of Bernoulli shifts that are not isomorphic to Bernoulli shifts.

It is also known that the weak Pinsker property fails for some non-amenable groups, with respect to the main non-amenable notions of entropy. The first counterexample appeared in [Reference Bowen17]. While that counterexample does not have the WPP, it might still admit some direct Bernoulli factors. In other words, the system might be measurably conjugate to the direct product of a Bernoulli shift with another system, but one cannot choose the other system to have entropy less than $\epsilon $ once $\epsilon>0$ is small enough. In this work, we present a new counterexample which does not admit any nontrivial direct Bernoulli factors. Here, a Bernoulli shift $(\mathtt {K}^\Gamma ,\kappa ^\Gamma ,T)$ is said to be trivial if $\kappa $ is supported on a single point in $\mathtt {K}$ . So a trivial Bernoulli shift is measurably conjugate to the trivial system which consists of $\Gamma $ acting on a single point.

1.3 Main results

Recall from the introduction that $\Gamma $ is the d-fold free power of $\mathbb {Z}_k$ , $X \subset (\mathbb {Z}_2)^\Gamma $ is a certain closed ‘parity check’ subgroup and $T: \Gamma \to \operatorname {\mathrm {Aut}}(X,m_X)$ is the canonical shift action by automorphisms.

Theorem A. Let $k> d \ge 3$ . Then there exists a sofic approximation $\Sigma =(\sigma _n)_n$ to $\Gamma $ such that

$$\begin{align*}\operatorname{\mathrm{h}}_\Sigma(X,m_X,T) = (1-d/k)\log 2\end{align*}$$

and along which the outer Pinsker factor of $(X,m_X,T)$ is trivial. In particular, $(X,m_X,T)$ has completely positive sofic entropy along $\Sigma $ .
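The value of the entropy in Theorem A can be anticipated by an elementary linear-algebra count (this heuristic accounts only for a lower bound; proving the matching upper bound along a suitable $\Sigma$ is part of the content of the theorem). Writing $H_{\sigma_n}$ for the parity-check matrix over $\mathbb{F}_2$ whose rows are the cycles of the generators under a $k$-uniform homomorphism $\sigma_n$ (notation introduced here only for this remark), there are $d|V_n|/k$ rows, so the finite codebook $\ker H_{\sigma_n}$ satisfies

$$\begin{align*}|\ker H_{\sigma_n}| = 2^{\,|V_n| - \operatorname{rank}_{\mathbb{F}_2} H_{\sigma_n}} \ \ge\ 2^{\,(1-d/k)|V_n|}, \qquad \text{hence} \quad \frac{1}{|V_n|}\log|\ker H_{\sigma_n}| \ \ge\ (1-d/k)\log 2.\end{align*}$$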

Since sofic entropy is always bounded above by Rokhlin entropy, the last part of this theorem has the following immediate corollary:

Corollary 1.1. The outer Rokhlin Pinsker factor of $(X,m_X,T)$ is trivial, so it has completely positive Rokhlin entropy.

Theorem B. If $k> d \ge 3$ , then the system $(X,m_X,T)$ has no nontrivial direct Bernoulli factors.

Remark. Theorems A and B should hold in the more general setting in which ${\Gamma = \Gamma _1 \ast \cdots \ast \Gamma _d}$ , where each $\Gamma _i$ is a group of order k. That is, we do not need to require that each $\Gamma _i$ is cyclic. The proofs are essentially the same.

Remark. The weak Pinsker entropy of a system $(X,\mu ,T)$ is defined to be the supremum of the Shannon entropies $H(\mathtt {K},\kappa )$ over all direct Bernoulli factors of the form $\Gamma {\curvearrowright } (\mathtt {K},\kappa )^\Gamma $ . This concept was introduced in [Reference Bowen and Tucker-Drob18]. So Theorem B implies that $(X,m_X,T)$ has zero weak Pinsker entropy.

Finally, our methods also give the following.

Theorem C. If $k> d \ge 3$ , then the system $(X,m_X,T)$ is not weakly contained in a Bernoulli system.

For a definition of weak containment of measure-preserving systems, see, for example, [Reference Burton and Kechris19].

Theorems B and C are consequences of the system having totally shattered microstate spaces along some sofic approximation: see Corollary 5.5.

Together, Theorems A and C show that $(X,m_X,T)$ has completely positive sofic entropy along some sofic approximation, and hence also completely positive Rokhlin entropy, but is not weakly contained in a Bernoulli shift. The authors are not aware of other systems for which both of these properties have been verified previously.

1.4 Probabilistic versions of the main theorems

Both Theorem A and Theorem B will be derived as corollaries of probabilistic theorems. To explain, we say that a permutation of a set V is $k$-uniform if it consists entirely of k-cycles: that is, its cycle type is $[k^{n/k}]$ , where $n = |V|$ . Recall that $\Gamma = \Gamma _{d,k} = \langle s_1,\dots ,s_d:\ s_1^k = \dots = s_d^k = e\rangle $ . A homomorphism $\sigma :\Gamma \to \operatorname {\mathrm {Sym}}(V)$ is $k$-uniform if for each generator $s_i$ , the image $\sigma (s_i)$ is $k$-uniform. Let $\mathbb {P}_n$ be the uniform distribution on the set $\operatorname {\mathrm {Hom}}_{\mathrm {unif}}(\Gamma ,\operatorname {\mathrm {Sym}}(V_n))$ of $k$-uniform homomorphisms. We will always assume n is chosen so that k divides n (since otherwise there are no $k$-uniform permutations).
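Since the relations $s_i^k = e$ make a homomorphism $\sigma$ exactly a choice of one permutation per generator with all cycle lengths dividing k, sampling from $\mathbb{P}_n$ amounts to drawing d independent uniform $k$-uniform permutations. For tiny parameters the size of this sample space per generator can be checked by brute force against the standard cycle-type count $n!/((n/k)!\,k^{n/k})$ (a textbook formula, not one taken from this paper):

```python
import itertools
import math

def is_k_uniform(perm, k):
    """True if every cycle of perm (a tuple mapping i -> perm[i]) has length k."""
    n = len(perm)
    seen = [False] * n
    for v in range(n):
        if seen[v]:
            continue
        length, u = 0, v
        while not seen[u]:
            seen[u] = True
            u = perm[u]
            length += 1
        if length != k:
            return False
    return True

n, k = 6, 3
count = sum(is_k_uniform(p, k) for p in itertools.permutations(range(n)))
formula = math.factorial(n) // (math.factorial(n // k) * k ** (n // k))
```

Here both counts equal 40: choose an unordered partition of 6 points into two 3-sets, then a cyclic order on each.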

We infer the existence of sofic approximations with the desired properties from the next proposition, which follows immediately from a more precise estimate in [Reference Airey, Bowen and Lin2, Lemma 3.1].

Proposition 1.2. There are subsets $\Omega _n^{\mathrm {sofic}}$ of $\operatorname {\mathrm {Hom}}_{\mathrm {unif}}(\Gamma ,\operatorname {\mathrm {Sym}}(V_n))$ with ${\mathbb {P}_n(\Omega _n^{\mathrm {sofic}})\to 1}$ and such that $(\sigma _n)_n$ is a sofic approximation to $\Gamma $ whenever $\sigma _n \in \Omega _n^{\mathrm {sofic}}$ for all n.

With Proposition 1.2 in hand, Theorem A is a corollary of the following probabilistic version.

Theorem 1.3. There are subsets $\Omega _n' \subseteq \Omega _n^{\mathrm {sofic}}$ with $\mathbb {P}_n(\Omega _n')\to 1$ and such that, if $\sigma _n \in \Omega _n'$ for every n, then

  1. a. the sofic entropy of $(X,m_X,T)$ along any subsequence of $(\sigma _n)_n$ equals

    $$\begin{align*}(1-d/k)\log 2;\end{align*}$$
  2. b. every nontrivial factor of $(X,m_X,T)$ inherits positive sofic entropy along $(\sigma _n)_n$ .

Part (b) of this theorem refers to ‘inherited’ sofic entropy. This quantity has previously been referred to as ‘extension’ sofic entropy, ‘outer’ sofic entropy, or ‘sofic entropy in the presence’. The outer Pinsker factor along $(\sigma _n)_n$ is the largest factor of $(X,m_X,T)$ for which this quantity vanishes, so part (b) above asserts that the outer Pinsker factor of $(X,m_X,T)$ along $(\sigma _n)_n$ is trivial. We recall these definitions and discuss our choice of terminology in Subsection 2.3 below.

Remark. It is tempting to summarize part (b) of Theorem 1.3 as ‘ $(X,m_X,T)$ has completely positive entropy with high probability according to $\mathbb {P}_n$ ’. But we must be careful because this summary hides an important detail about the order of quantifiers. For part (b) of Theorem 1.3, the high-probability subsets $\Omega _n'$ are found a priori, and then any factor of $(X,m_X,T)$ has positive sofic entropy along any choice of $\sigma _n$ from those subsets. One could instead take ‘completely positive entropy with high probability’ to mean that for every factor of $(X,m_X,T)$ , there is a sequence of high-probability sets $\Omega ^{\prime }_n$ (depending on the factor) within which one sees the desired positive sofic entropy. The latter conclusion is formally weaker, and we do not know whether there exist examples that satisfy one but not the other. Since the collection of all factors may be uncountable, one cannot use a simple diagonal argument to prove that these two formulations are equivalent. (If one identifies factors with their conditional expectation operators, then their induced strong operator topology is separable, but the semi-continuity properties of sofic entropy still do not obviously combine with approximations in this topology to enable a more complicated diagonal argument.)

Theorem B is also a corollary of a probabilistic assertion about randomly-chosen $\sigma _n$ and the geometry of the space of microstates. We will prove that the space of microstates shatters in a strong sense, which roughly speaking means that it splits into a union of exponentially many well-separated clusters, each of which has sub-exponential size. The precise definition is given in §5, and the precise statement of the result for the parity-check subshift (Theorem 6.1, part (3)) is formulated in §6.1 below. It was essentially already known from [Reference Bowen17] that this phenomenon is incompatible with having a direct Bernoulli factor. We give a formal proof of this incompatibility in §5.

Our focus on the geometry and on shattering is inspired by similar ideas from statistical physics on random sparse graphs [Reference Achlioptas and Ricci-Tersenghi1]. As far as we know, the term ‘shattering’ first appears in [Reference Coja-Oghlan and Efthymiou22]. In other works, this phenomenon is called dynamical replica symmetry breaking [Reference Fu and Anderson29, Reference Krza̧kała, Montanari, Ricci-Tersenghi, Semerjian and Zdeborová45].

1.5 Outline of the paper

This paper is divided into two parts. In Part I, we consider general symbolic dynamical systems with a focus on the case in which $\Gamma $ is a free product of finite cyclic groups.

  • Section 2 is a review of sofic group theory and sofic entropy.

  • Section 3 introduces Kikuchi entropy and annealed entropy for actions of $\Gamma _{d,k}$ . These entropies generalize the F-functional and the f-invariant from [Reference Bowen10] and are strongly related to the first moment method in statistical physics. This is the main tool for proving Theorem 1.3 part (a).

  • Section 4 relates completely positive inherited entropy to a version of uniform mixing for model spaces that we call ‘property M’. These general notions provide the background and tools needed to prove Theorem 1.3 part (b).

  • Section 5 defines totally shattered microstate spaces and shows how this property prevents both direct Bernoulli factors and weak containment in a Bernoulli shift.

Part II focuses on the parity check sub-shifts which appear in the main theorems:

  • Section 6 discusses random (finite) LDPC codes.

  • Section 7 proves that for a typical sequence of these codes, the sequence of uniform measures on the codebooks has property M.

  • Section 8 proves that typically these uniform measures converge locally and empirically to the Haar measure. Since the sequence also has property M, this gives Theorem 1.3(b) as an application of Theorem 4.3.

  • Section 9 proves that the sofic entropy of the Haar measure is $(1-d/k) \log 2$ over most sofic approximations (Theorem 1.3(a)).

  • Section 10 proves that the Haar measure typically has totally shattered microstate spaces (Theorem 6.1(3)). This implies Theorem B (the Haar measure has no direct Bernoulli factors) via Corollary 5.5(2).

1.6 Notational conventions

We use the following standard notation for approximate comparison of functions. Let f and g be real-valued functions on the same domain S, and let A be any set of additional parameters appearing in the specification of these functions. Then,

  • In case g is nonnegative, we write $f = O_A(g)$ if there is a positive constant c, depending possibly on A but on nothing else, such that $|f| \le c g$ .

  • In case g is nonnegative and $S = \mathbb {N}$ , we write $f = o_A(g)$ if $g(n)$ is strictly positive for all sufficiently large n and $f(n)/g(n)\to 0$ , where the rate of this convergence may depend on A but on nothing else.

  • For any S, we write $f \lesssim _A g$ or $g \gtrsim _A f$ if (i) both functions are nonnegative, and (ii) there is a positive constant c, depending possibly on A but on nothing else, such that $f \le c g$ . Note that this is similar to writing ‘ $f = O_A(g)$ ’, except here it is part of the assertion that f and g are both nonnegative, in case this is not obvious.

Finally, if $(\Omega _n,\mathbb {P}_n)$ is a sequence of probability spaces indexed by $n \in \mathbb {N}$ , we write ‘ $o_p(1)$ ’ as a placeholder for any sequence of random variables $X_n$ on these spaces such that

$$\begin{align*}\mathbb{P}_n(|X_n|> \varepsilon) \to 0 \qquad \forall \varepsilon > 0.\end{align*}$$

So this is essentially an analog of $o(1)$ for convergence in probability.

Part I

General systems

2 Preliminaries

2.1 Sofic groups

A sofic approximation to a countable group $\Gamma $ is a sequence $\Sigma =\{\sigma _n\}_{n\in \mathbb {N}}$ of maps

$$ \begin{align*}\sigma_n: \Gamma \to \operatorname{\mathrm{Sym}}(V_n),\end{align*} $$

where $V_n$ are finite sets, $\operatorname {\mathrm {Sym}}(V_n)$ is the symmetric group on $V_n$ , and the sequence is required to be asymptotically homomorphic and free in the following sense: For every $g,h \in \Gamma $ , we require that the homomorphism equation $\sigma _n(gh)v=\sigma _n(g)\sigma _n(h)v$ holds for asymptotically all $v\in V_n$ :

$$ \begin{align*}1 = \lim_{n\to\infty} |V_n|^{-1} |\{ v\in V_n:~ \sigma_n(gh)v = \sigma_n(g)\sigma_n(h)v\}|.\end{align*} $$

For every non-identity element $g \in \Gamma \setminus \{1_\Gamma \}$ , we require that the percentage of points fixed by $\sigma _n(g)$ tends to zero:

$$ \begin{align*} 0= \lim_{n\to\infty} |V_n|^{-1} |\{ v\in V_n:~ \sigma_n(g)v = v\}|. \end{align*} $$

A group $\Gamma $ is sofic if it admits a sofic approximation.
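As a toy illustration of these two conditions (for $\Gamma = \mathbb{Z}$, not the group of the main theorems), take $V_n = \{0,\dots,n-1\}$ with $\sigma_n(g)v = v + g \bmod n$. The homomorphism condition then holds exactly, and for each fixed $g \ne 0$ the fixed-point fraction is zero for all $n> |g|$; the names below are ours, for this sketch only:

```python
def sigma(n):
    """Approximation of Z acting on {0,...,n-1}: g sends v to (v + g) mod n."""
    return lambda g: (lambda v: (v + g) % n)

def hom_fraction(n, g, h):
    """Fraction of v where sigma_n(g+h)v agrees with sigma_n(g)sigma_n(h)v."""
    s = sigma(n)
    return sum(s(g + h)(v) == s(g)(s(h)(v)) for v in range(n)) / n

def fixed_fraction(n, g):
    """Fraction of v fixed by sigma_n(g)."""
    s = sigma(n)
    return sum(s(g)(v) == v for v in range(n)) / n
```

Note that the order of limits matters: for fixed n the element g = n is fixed everywhere, which is why the definition fixes g first and then sends n to infinity.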

If $\Gamma $ admits a finite generating set S, then it is common to visualize a map $\sigma _n$ as above in terms of the labeled and directed graph $G(\sigma _n)=(V_n,E_n)$ it induces. The edges of this graph are pairs of the form $(v, \sigma _n(s)v)$ for $s\in S$ and $v\in V_n$ , and the label of this pair is s. Then $\Sigma $ is a sofic approximation precisely when this sequence of graphs $G(\sigma _n)$ Benjamini-Schramm converges to the (labeled and directed) Cayley graph of $\Gamma $ with respect to S [Reference Elek and Szabó27].

It is an exercise to check that all amenable groups and all residually finite groups are sofic. Because finitely generated linear groups are sofic and direct unions of sofic groups are also sofic, it follows that all linear groups are sofic. It is an open problem whether all countable groups are sofic.

The class of sofic groups was introduced implicitly by Gromov [Reference Gromov35] and explicitly by Weiss [Reference Weiss73]. For further background on sofic groups, see [Reference Pestov59, Reference Capraro and Lupini21].

2.2 Sofic entropy

This section defines sofic entropy for subshifts using the formulation from [Reference Bowen14]. See also [Reference Bowen16] or [Reference Kerr and Li40] for more comprehensive references.

Let $\Gamma $ denote a countable group and let $\mathtt {A}$ be a finite set (called the alphabet). Let $\mathtt {A}^\Gamma $ be the set of all functions $x:\Gamma \to \mathtt {A}$ . We write either $x_g$ or $x(g)$ for the value of x on $g\in \Gamma $ , whichever is most convenient. We endow $\mathtt {A}^\Gamma $ with the pointwise convergence topology, under which it is compact and metrizable.

Let $\operatorname {\mathrm {Prob}}(\mathtt {A}^\Gamma )$ denote the space of all Borel probability measures on $\mathtt {A}^\Gamma $ , which we endow with the weak* topology. In this topology, a sequence of Borel probability measures $(\mu _n)_{n \in \mathbb {N}}$ converges to a measure $\mu _\infty $ if and only if for every continuous function $f:\mathtt {A}^\Gamma \to \mathbb {R}$ , $\lim _{n\to \infty } \int f~d \mu _n= \int f~d \mu _\infty $ . An equivalent characterization uses cylinder sets which are defined as follows. Given a finite subset $F \subset \Gamma $ and $x:F \to \mathtt {A}$ , let $C(x,F)$ be the set of all functions $y:\Gamma \to \mathtt {A}$ such that $y(f)=x(f)$ for all $f\in F$ . Then $(\mu _n)_{n \in \mathbb {N}}$ converges to $\mu _\infty $ if and only if for every such F and x, $\lim _{n\to \infty } \mu _n(C(x,F)) = \mu _\infty (C(x,F))$ . This is because the cylinder sets $C(x,F)$ form a sub-basis for the topology on $\mathtt {A}^\Gamma $ .

Let $T = (T^g)_{g\in \Gamma }$ be the shift action on $\mathtt {A}^\Gamma $ defined by $T^gx(f)= x(g^{-1}f)$ for $x \in \mathtt {A}^\Gamma $ . This induces an action on $\operatorname {\mathrm {Prob}}(\mathtt {A}^\Gamma )$ by pushforwards. The set of all shift-invariant Borel probability measures on $\mathtt {A}^\Gamma $ will be denoted $\operatorname {\mathrm {Prob}}^\Gamma (\mathtt {A}^\Gamma )$ . If $\mu $ is a shift-invariant Borel probability measure on $\mathtt {A}^\Gamma $ , then the system $(\mathtt {A}^\Gamma ,\mu ,T)$ is called a shift $\Gamma $ -system. We define here the sofic entropy of such systems.

Given $\sigma :\Gamma \to \textrm {Sym}(V)$ , $v \in V$ and $x:V \to \mathtt {A}$ , the pullback name of x at v is the labeling $\Pi ^\sigma _v(x) \in \mathtt {A}^\Gamma $ defined by

$$\begin{align*}\Pi^\sigma_v(x)(g) = x_{\sigma(g^{-1})v} \quad \forall g \in \Gamma. \end{align*}$$

For the sake of building some intuition, note that when $\sigma $ is a homomorphism, the map $v \mapsto \Pi ^\sigma _{v}(x)$ is $\Gamma $ -equivariant (in the sense that $\Pi ^\sigma _{\sigma (g)v}(x) = T^g \Pi ^\sigma _{v}(x)$ ). In particular, ${\Pi ^\sigma _{v}(x) \in \mathtt {A}^\Gamma} $ is periodic. In general, we think of $\Pi ^\sigma _v(x)$ as an approximate periodic point.

The empirical measure of $x:V\to \mathtt {A}$ is

$$ \begin{align*}P^\sigma_x = |V|^{-1} \sum_{v\in V} \delta_{\Pi^\sigma_v(x)} \in \operatorname{\mathrm{Prob}}(\mathtt{A}^\Gamma),\end{align*} $$

where, for $y \in \mathtt {A}^\Gamma $ , $\delta _y \in \operatorname {\mathrm {Prob}}(\mathtt {A}^\Gamma )$ is the Dirac measure concentrated on $\{y\}$ . For example, if $\sigma $ is a homomorphism, then $P^\sigma _x$ is a $\Gamma $ -invariant measure supported on the $\Gamma $ -orbits of the pullback names $\Pi ^\sigma _v(x)$ .
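To make these definitions concrete, here is a small illustrative Python sketch (our own toy example, not from the paper): we take $d = k = 2$, so that $\Gamma = \mathbb{Z}_2 \ast \mathbb{Z}_2$, choose two $2$-uniform permutations of a six-point set as the images of the generators, and compute the pullback names restricted to a finite window of group elements, together with the resulting empirical measure on that window.

```python
from collections import Counter
from fractions import Fraction

# Toy sketch (hypothetical parameters, not from the paper): d = 2, k = 2,
# so Gamma = Z_2 * Z_2.  A homomorphism sigma is determined by two
# fixed-point-free involutions of V.  We compute pullback names restricted
# to a finite window F of group elements, and the empirical measure P^sigma_x.

V = range(6)
# two 2-uniform permutations (fixed-point-free involutions), as dicts
s1 = {0: 1, 1: 0, 2: 3, 3: 2, 4: 5, 5: 4}
s2 = {0: 2, 2: 0, 1: 4, 4: 1, 3: 5, 5: 3}
gen = {'s1': s1, 's2': s2}

# group elements as reduced words in the generators; since each generator
# is an involution here, sigma(g^{-1}) is computed by applying the letters
# of g in order
F = [(), ('s1',), ('s2',), ('s1', 's2')]

def pullback_name(x, v, window):
    """Restriction of Pi^sigma_v(x) to the window: g -> x(sigma(g^{-1}) v)."""
    name = {}
    for word in window:
        u = v
        for letter in word:
            u = gen[letter][u]
        name[word] = x[u]
    return tuple(sorted(name.items()))

x = {0: 'a', 1: 'b', 2: 'a', 3: 'b', 4: 'a', 5: 'b'}
counts = Counter(pullback_name(x, v, F) for v in V)
empirical = {nm: Fraction(c, len(V)) for nm, c in counts.items()}
for nm, p in empirical.items():
    print(dict(nm), p)
```

The empirical measure here is only the projection of $P^\sigma_x$ to the finitely many coordinates in the window; enlarging the window recovers more of the measure on $\mathtt{A}^\Gamma$.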

Given an open set $\mathcal {O} \subset \operatorname {\mathrm {Prob}}(\mathtt {A}^\Gamma )$ , a map $x:V \to \mathtt {A}$ is called an $(\mathcal {O},\sigma )$ -microstate if $P^\sigma _x \in \mathcal {O}$ . Typically, we take $\mathcal {O}$ to be a small neighborhood of $\mu $ , in which case we consider $(\mathcal {O},\sigma )$ -microstates to be ‘good microstates for $\mu $ ’. Let $\Omega (\sigma ,\mathcal {O}) \subset \mathtt {A}^V$ denote the set of all $(\mathcal {O},\sigma )$ -microstates.

Let $\mu \in \operatorname {\mathrm {Prob}}(\mathtt {A}^\Gamma )$ be $\Gamma $ -invariant and let $\Sigma =(\sigma _i:\Gamma \to \textrm {Sym}(V_i))_{i \in \mathbb {N}}$ be a sofic approximation to $\Gamma $ . We say that the system $(\mathtt {A}^\Gamma ,\mu ,T)$ has microstates along $\Sigma $ if for every neighbourhood $\mathcal {O}$ of $\mu $ ,

$$\begin{align*}\Omega(\sigma_n,\mathcal{O}) \ne \emptyset \quad \mbox{for all sufficiently large }n.\end{align*}$$

More generally, an arbitrary measure-preserving $\Gamma $ -system has microstates along $\Sigma $ if every shift-system factor of it has microstates along $\Sigma $ .

The $\Sigma $ -entropy of the action $(\mathtt {A}^\Gamma , \mu , T)$ is defined by

(2) $$ \begin{align} h_\Sigma(\mathtt{A}^\Gamma, \mu, T):= \inf_{\mathcal{O} \ni \mu} \limsup_{i\to\infty} |V_i|^{-1} \log {\lvert {\Omega(\sigma_i,\mathcal{O})} \rvert}, \end{align} $$

where the infimum is over all open neighborhoods of $\mu $ in $\operatorname {\mathrm {Prob}}(\mathtt {A}^\Gamma )$ . We abbreviate this to $h_\Sigma (\mu )$ if the other data are clear from the context. This number depends on the action $(\mathtt {A}^\Gamma , \mu , T)$ only up to measure conjugacy [Reference Bowen11]. It therefore defines an invariant for any abstract measure-preserving system that can be represented up to measure conjugacy by a shift system with a finite alphabet or, equivalently, that has a finite generating partition. If the system does not have microstates along any subsequence of $\Sigma $ , then we declare that the $\Sigma $ -entropy is $-\infty $ .

2.3 Factor maps and inherited entropy

The second part of Theorem A concerns the entropy not only of $(X,m_X,T)$ but also of all its factors. Since X is a shift-invariant subset of $\mathbb {Z}_2^\Gamma $ , the $\Sigma $ -entropy of $(X,m_X,T)$ is an instance of formula (2). But this system may have factors that are not measure conjugate to shift systems, so that formula (2) does not apply.

Sofic entropy can be generalized to measure-preserving systems on standard measurable spaces in various ways: see, for instance, [Reference Bowen16, Subsection 2.4] and [Reference Austin3, Subsection 3.1] for formulations and proofs of their equivalence. However, rather than repeat these in detail here, we need only recall how they are controlled by another entropy notion that does permit us to reduce our work to the study of shift systems.

Towards defining this, consider a factor map between two shift systems on finite alphabets, say

(3) $$ \begin{align} \Phi:(\mathtt{A}^\Gamma,\mu)\to (\mathtt{B}^\Gamma,\nu), \end{align} $$

where $\mu $ and $\nu $ are both invariant under the shift actions of $\Gamma $ on their respective spaces. Rather than counting good microstates for $\mu $ or $\nu $ separately, we can ask how many of the good microstates for $\nu $ can be lifted to good microstates for $\mu $ – that is, how many good microstates $\nu $ ‘inherits’ through the map $\Phi $ . To make this precise, consider the graphical joining

(4) $$ \begin{align} \lambda := \int_{\mathtt{A}^\Gamma} \delta_{(x,\Phi(x))}\ d\mu(x). \end{align} $$

This is invariant under the shift action of $\Gamma $ on $(\mathtt {A}\times \mathtt {B})^\Gamma $ . Let $\mathrm {proj}_i$ be the coordinate projection from $(\mathtt {A}\times \mathtt {B})^{V_i}$ to $\mathtt {B}^{V_i}$ . Finally, define the inherited $\Sigma $ -entropy of $\Phi $ to be

(5) $$ \begin{align} h_\Sigma(\mu,T\,;\,\Phi) := \inf_{\mathcal{O}\ni \lambda}\limsup_{i\to\infty}|V_i|^{-1}\log|\mathrm{proj}_i[\Omega(\sigma_i,\mathcal{O})]|. \end{align} $$

If $\Phi $ is the identity, then this is easily checked to coincide with $h_\Sigma (\mu )$ .

Like sofic entropy itself, inherited sofic entropy can be generalized to factor maps between arbitrary measure-preserving systems. This general notion is not new, but our use of the term ‘inherited’ is new. Indeed, the idea behind this quantity is already implicit in Kerr’s original approach to defining sofic entropy itself for general measure-preserving systems (see [Reference Kerr39] or [Reference Bowen16, Subsubsection 2.4.2]). It was formulated and studied explicitly by Hayes in [Reference Hayes38, Reference Hayes37], who refers to it as ‘entropy in the presence’ in recognition of a parallel usage in Voiculescu’s theory of free entropy [Reference Voiculescu70]. It has also been studied by other authors under various names, including ‘outer sofic entropy’ and ‘extension sofic entropy’. It is reviewed in [Reference Bowen16, Subsection 11.1], which gives more complete references. Despite this history, we propose the new name ‘inherited’ entropy because it seems to capture the idea behind the definition better.

Starting from (5), one can define entropy for a factor map between general systems by carefully inserting a supremum over generating partitions of the lower system and an infimum over generating partitions of the upper system. However, the proof that sofic entropy itself is invariant under measure conjugacy can be adapted directly to the quantity in (5), showing that it is an invariant of $\Phi $ , where two factor maps are considered equivalent if they form the vertical arrows of a commuting square whose horizontal arrows are measure conjugacies. As a corollary, (5) coincides with the abstract inherited entropy in the case of two shift systems. A more general version of this argument can be found in [Reference Hayes38, Proposition 2.9].

One crucial advantage to working with inherited sofic entropy is its monotonicity under factor maps. The following lemma is a special case of parts (iii) and (iv) of [Reference Hayes38, Proposition 2.10].

Lemma 2.1. If

$$\begin{align*}(X,\mu,T) \stackrel{\Pi}{\to} (Y,\nu,S) \stackrel{\Phi}{\to} (Z,\theta,R)\end{align*}$$

are factor maps, then

$$\begin{align*}h_\Sigma(\mu,T\,;\,\Phi\circ\Pi) \le \min\big\{h_\Sigma(\nu,S\,;\,\Phi),\ h_\Sigma(\mu,T\,;\,\Pi)\big\}.\end{align*}$$

In particular, taking $\Pi $ (resp. $\Phi $ ) to be the identity, we obtain

$$\begin{align*}h_\Sigma(\mu,T\,;\,\Phi) \le h_\Sigma(\mu,T) \quad (\mbox{resp.}\quad h_\Sigma(\mu,T\,;\,\Pi) \le h_\Sigma(\mu,S)).\end{align*}$$

Inspired by this property, one defines the outer $\Sigma $ -Pinsker factor of a measure-preserving system to be the largest factor whose inherited $\Sigma $ -entropy is zero. A routine argument shows that a unique maximal such factor exists. Hayes’ paper [Reference Hayes38] develops this notion as well, although it had been considered in unpublished work previously. As a result, the assertion that every nontrivial factor of a measure-preserving system has positive inherited $\Sigma $ -entropy is equivalent to the assertion that the outer $\Sigma $ -Pinsker factor of that system is trivial. In addition, by the last inequality in Lemma 2.1, if a system has this property, then it also has completely positive $\Sigma $ -entropy. These observations together explain our formulation of the second part of Theorem A.

Starting from Lemma 2.1, Hayes develops enough properties of the outer Pinsker factor to turn it into a valuable tool in the study of sofic entropy in general. For instance, these properties are crucial to his proof that a certain large class of systems of algebraic origin all have completely positive sofic entropy [Reference Hayes37]. Our reasons for using inherited entropy in the proof of Theorem A are similar, but the details of our proof are essentially disjoint from those of Hayes, and our LDPC system is not among the systems of algebraic origin that he considers in that reference.

Among its useful consequences, Lemma 2.1 gives the following:

Corollary 2.2. Let $(X,\mu ,T)$ be a measure-preserving system. If every nontrivial factor map of $(X,\mu ,T)$ to a shift system has positive inherited $\Sigma $ -entropy, then every nontrivial factor map of $(X,\mu ,T)$ has positive inherited $\Sigma $ -entropy.

Proof. Let $\Pi $ be a factor map to another nontrivial system $(Y,\nu ,S)$ . Then $(Y,\nu )$ has a nontrivial partition into two measurable subsets. By acting on this partition using S, we define a further factor map of the form

$$\begin{align*}\Phi:(Y,\nu,S)\to (\{0,1\}^\Gamma,\theta,\textrm{shift}),\end{align*}$$

where $\theta $ is not a Dirac measure. Now our hypothesis gives that $h_\Sigma (\mu ,T\,;\,\Phi \circ \Pi )> 0$ , and this implies that $h_\Sigma (\mu ,T\,;\,\Pi )> 0$ by Lemma 2.1.

Corollary 2.2 can simplify many of the technicalities involved in a proof of completely positive inherited entropy by letting us restrict our attention to factor maps between shift spaces. Given such a map, say

$$\begin{align*}\Phi:(\mathtt{A}^\Gamma,\mu)\to (\mathtt{B}^\Gamma,\nu),\end{align*}$$

it is uniquely determined by its coordinate at the identity of $\Gamma $ , which is an arbitrary measurable map $\phi :\mathtt {A}^\Gamma \to \mathtt {B}$ . Starting from $\phi $ , we write $\phi ^\Gamma $ for the factor map that it induces, which is given by

(6) $$ \begin{align} \phi^\Gamma(x)(\gamma) = \phi(\gamma^{-1} \cdot x). \end{align} $$

When working with such a map $\Phi $ , a further simplification is often necessary. If D is a finite subset of $\Gamma $ , then the map $\phi $ above is called D-local if the image $\phi (x)$ depends only on the coordinates of x indexed by D – equivalently, if $\phi $ factorizes into maps

$$\begin{align*}\mathtt{A}^\Gamma \to \mathtt{A}^D\to \mathtt{B},\end{align*}$$

where the first map is the coordinate projection. We say that $\phi $ is local if it is D-local for some finite set D,and apply the same terminology to the whole of $\phi ^\Gamma $ . This is equivalent to $\phi ^\Gamma $ being a continuous map for the product topologies on our shift spaces.

If $\mathbf {x} \in \mathtt {A}^{V}$ is some good microstate for $\mu $ over a map $\sigma \colon \Gamma \to \operatorname {\mathrm {Sym}}(V)$ , we might attempt to send it to a good microstate for $\nu $ using the map $\phi ^\sigma \colon \mathtt {A}^V \to \mathtt {B}^V$ defined by

(7) $$ \begin{align} \phi^\sigma(\mathbf{x})(v) = \phi( \Pi_v^\sigma \mathbf{x}) , \end{align} $$

since the empirical distribution of $\phi ^\sigma (\mathbf {x})$ would then be $\phi ^\Gamma _* P_{\mathbf {x}}^\sigma $ . Since $P_{\mathbf {x}}^\sigma $ is close to $\mu $ , one would hope this would be close to $\nu $ . This argument is correct in case $\phi ^\Gamma _*$ acts continuously on measures, which in turn holds if $\phi $ is local. In that case, $\phi ^\sigma $ also has the following form of quantitative continuity.

Lemma 2.3. Suppose $\mathtt {A},\mathtt {B}$ are finite sets, $\phi \colon \mathtt {A}^\Gamma \to \mathtt {B}$ is D-local, and $\sigma \colon \Gamma \to \operatorname {\mathrm {Sym}}(V)$ . If $\mathbf {x}$ and $\mathbf {y}$ are elements of $\mathtt {A}^V$ that disagree in exactly one coordinate, then $\phi ^\sigma (\mathbf {x})$ and $\phi ^\sigma (\mathbf {y})$ disagree in at most $|D|$ coordinates.

Equivalently, this asserts that $\phi ^\sigma $ is $|D|$ -Lipschitz for the ‘normalized Hamming metrics’ on $\mathtt {A}^V$ and $\mathtt {B}^V$ . This point of view is introduced and used for an application of Lemma 2.3 in Subsection 5.1.

Proof. Suppose that $\mathbf {x}(u)\ne \mathbf {y}(u)$ , and let $v \in V$ . Since $\phi $ is D-local, we can have $\phi (\Pi _v^\sigma \mathbf {x}) \ne \phi (\Pi _v^\sigma \mathbf {y})$ only if there exists $\gamma \in D$ such that $\mathbf {x}(\sigma (\gamma ^{-1}) v) \ne \mathbf {y}(\sigma (\gamma ^{-1}) v)$ , and hence only if u appears in the set $\sigma (D^{-1})\cdot v$ . Since $\sigma $ is an action by permutations, this holds if and only if v lies in $\sigma (D^{-1})^{-1}\cdot u$ , and that set has cardinality at most $|D^{-1}| = |D|$ .
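As a quick numerical illustration of the lemma (a hedged sketch with made-up parameters, again in the case $d = k = 2$ with a binary alphabet), the following checks that a $D$-local map with $D = \{e, s_1\}$ changes at most $|D| = 2$ output coordinates of $\phi^\sigma$ when a single input coordinate is flipped.

```python
import random

# Sketch with hypothetical parameters: a D-local map with D = {e, s1}
# gives phi^sigma that is |D|-Lipschitz for normalized Hamming metrics.

random.seed(0)
n = 100
perm = list(range(n))
random.shuffle(perm)
# build a fixed-point-free involution s1 (a 2-uniform permutation)
# by pairing consecutive entries of the shuffled list
s1 = {}
for i in range(0, n, 2):
    a, b = perm[i], perm[i + 1]
    s1[a], s1[b] = b, a

def phi_sigma(x):
    # phi reads only the coordinates at e and s1, so phi is D-local, |D| = 2
    return [x[v] ^ x[s1[v]] for v in range(n)]

x = [random.randint(0, 1) for _ in range(n)]
y = list(x)
y[17] ^= 1                      # flip exactly one input coordinate

disagreements = sum(a != b for a, b in zip(phi_sigma(x), phi_sigma(y)))
print(disagreements)            # at most |D| = 2
assert disagreements <= 2
```

The two output coordinates that change are exactly $v = 17$ and $v = s_1(17)$, matching the set $\sigma(D^{-1})^{-1}\cdot u$ in the proof.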

The arguments about $\phi ^\sigma $ above can fail for general measurable factor maps, but this problem can be overcome by approximating these in measure by local factor maps. We say that a sequence of maps

$$\begin{align*}\psi_m:\mathtt{A}^\Gamma \to \mathtt{B} \quad (m=1,2,\dots)\end{align*}$$

is a local approximating sequence to $\phi $ if each $\psi _m$ is local and

(8) $$ \begin{align} \mu\{\psi_m \ne \phi\} \to 0. \end{align} $$

(Note that this notion implicitly also depends on the measure $\mu $ .) These are the special case for shift spaces of the ‘almost Lipschitz approximating sequences’ introduced and used in [Reference Austin3] to study factor maps between more general measure-preserving systems. At a few points in the sequel, we refer to [Reference Austin3] for properties that we need in our special case. For instance, informally, if $\psi $ is a good enough local approximation to $\phi $ , then $\psi ^\sigma $ sends very good microstates for $\mu $ to fairly good microstates for $\phi ^\Gamma _*\mu $ [Reference Austin3, Lemma 4.10].

3 Bethe–Kikuchi entropy and annealed calculations

Throughout this section, we set $\Gamma = \Gamma _{d,k} =\mathbb {Z}_k \ast \dots \ast \mathbb {Z}_k$ equal to the free product of d copies of $\mathbb {Z}_k = \mathbb {Z}/k\mathbb {Z}$ .

The proof of Theorem A part (a) (and its probabilistic version Theorem 1.3) relies on a first moment calculation. These kinds of computations have a long and interesting history in statistical physics and more recently appeared in the entropy theory of actions of the free group, where, for example, they were used to answer a long-standing open problem on the classification of Bernoulli shifts over free groups [Reference Bowen10]. This history is reviewed in §3.3.

In the next section, we introduce the uniformly random sofic approximation to $\Gamma $ . This will be our main focus, and we will only obtain results about deterministic sofic approximations indirectly as a consequence of the analysis of these random ones.

In Section 3.2, we introduce Kikuchi entropy, which has historical roots in physics [Reference Kikuchi42] and is a version of the functional F of [Reference Bowen10] adapted from the free group to groups $\Gamma $ which are free products of finite groups. This entropy is a first approximation to the annealed entropy which in recent ergodic-theory research was called the f-invariant [Reference Bowen10]. These entropies are key ingredients for proving Theorem 1.3 part (a).

3.1 Random sofic approximations

Instead of directly constructing sofic approximation sequences $\sigma _n: \Gamma \to \operatorname {\mathrm {Sym}}(V_n)$ , we will construct probability measures on the space of homomorphisms from $\Gamma $ to $\operatorname {\mathrm {Sym}}(kn)$ . These measures were used for the same purpose in [Reference Airey, Bowen and Lin2].

Let V be a finite set whose size is divisible by k. Let us say that a permutation of V is k-uniform if it consists entirely of k-cycles: that is, its cycle type is $[k^{|V|/k}]$ . Consider a d-tuple of k-uniform permutations $(\sigma (s_1),\dots ,\sigma (s_d))$ . Then $\sigma (s_i)^k$ equals the identity permutation for each i, and so the tuple $(\sigma (s_1),\dots ,\sigma (s_d))$ is the image of the generators $(s_1,\dots ,s_d)$ under a homomorphism $\sigma :\Gamma \to \operatorname {\mathrm {Sym}}(V)$ . We call such a homomorphism k-uniform if the images of these generators are all k-uniform permutations. Denote the set of k-uniform homomorphisms into $\operatorname {\mathrm {Sym}}(V)$ by $\operatorname {\mathrm {Hom}}_{\mathrm {unif}}(\Gamma , \operatorname {\mathrm {Sym}}(V))$ . Note that for arbitrary members of $\operatorname {\mathrm {Hom}}(\Gamma , \operatorname {\mathrm {Sym}}(V))$ , the cycle lengths of the images of the generators of $\Gamma $ must be divisors of k.

Given a homomorphism $\sigma \in \operatorname {\mathrm {Hom}}_{\mathrm {unif}}(\Gamma , \operatorname {\mathrm {Sym}}(V))$ , consider the collection of all the orbits of the individual maps $\sigma _i = \sigma (s_i)$ – that is, all the subsets of the form

(9) $$ \begin{align} \{ \sigma(s_i^j)(v) \,:\, 0 \leq j < k \} \end{align} $$

for $v \in V$ and $i \in [d]$ . Taken together, these may be regarded as a hyper-graph on V. Because we consider a k-uniform homomorphism, this hyper-graph is k-uniform in the usual sense of combinatorics: this is the origin of the terminology. However, it could happen that two of the permutations $\sigma _i$ and $\sigma _j$ give some vertex the same orbit. In this eventuality, it is better to think of the family of sets (9) as a multi-hyper-graph, counting each hyper-edge with its multiplicity.

For each n, we set $V_n := \{1,2,\dots ,n\}$ . If k divides n, then we let $\mathbb {P}_n$ be the uniform distribution on $\operatorname {\mathrm {Hom}}_{\mathrm {unif}}(\Gamma ,\operatorname {\mathrm {Sym}}(V_n))$ . The sofic approximations that appear in Theorem A are obtained at random from these distributions and are shown to have all the desired properties with high probability.

In order to take this approach, Proposition 1.2 above is a basic prerequisite. It allows us to focus on typical properties of the microstate spaces, knowing that random homomorphisms will be good sofic approximation maps with high probability. It is implied by the more precise estimate in [Reference Airey, Bowen and Lin2, Lemma 3.1], where our $\mathbb {P}_n$ is called ‘ $\mathbb {P}_n^u$ ’. Note that attention is restricted to even n in that reference, but that restriction is unnecessary for the case of $\mathbb {P}_n^u$ ; it is included there only for the sake of the other probability distribution $\mathbb {P}_n^p$ that is covered by the same lemma.

3.2 Formula for Kikuchi entropy

Let $\mathtt {A}$ be a finite set and $\Gamma = (\mathbb {Z}_k)^{\ast d}$ as above. Given a $\Gamma $ -invariant measure $\mu \in \operatorname {\mathrm {Prob}}(\mathtt {A}^\Gamma )$ , we define the edge weight $W_\mu ( \cdot; \cdot ) \colon \mathtt {A}^{\mathbb {Z}_k} \times [d] \to [0,1]$ by

$$\begin{align*}W_\mu (\mathbf{a}; i) = \mu \{ \mathbf{x} \in \mathtt{A}^\Gamma \,:\, \mathbf{x}(s_i^j) = \mathbf{a}(j) \ \forall 0 \leq j < k \}. \end{align*}$$

Each $W_\mu (\cdot; i)$ is a probability measure on $\mathtt {A}^{\mathbb {Z}_k}$ which records the statistics of $\mu $ on the hyper-edge $\{e, s_i, s_i^2, \ldots , s_i^{k-1}\}$ . Also define the vertex weight

$$\begin{align*}W_\mu(\mathtt{a}) = \mu \{ \mathbf{x} \in \mathtt{A}^\Gamma \,:\, \mathbf{x}(e) = \mathtt{a} \} \end{align*}$$

for $\mathtt {a} \in \mathtt {A}$ . This probability measure on $\mathtt {A}$ records the single-site statistics of $\mu $ . It is determined by the edge weight $W_\mu (\cdot; \cdot )$ .

More generally, an abstract weight W is a d-tuple of probability vectors $W(\cdot; 1), \ldots , W(\cdot; d)$ on $\mathtt {A}^{\mathbb {Z}_k}$ such that there is a probability vector $W(\cdot )$ on $\mathtt {A}$ satisfying the consistency condition

$$\begin{align*}W(\mathtt{a}_0) = \sum_{\mathbf{a} \in \mathtt{A}^{\mathbb{Z}_k} \,:\, \mathbf{a}(j) = \mathtt{a}_0 } W(\mathbf{a}; i)\end{align*}$$

for every $i \in [d]$ , $j \in \mathbb {Z}_k$ and $\mathtt {a}_0 \in \mathtt {A}$ .

For any weight W, we define the Kikuchi entropy of W by

$$\begin{align*}\mathrm{H_K}(W) := (1-d) \operatorname{\mathrm{H}}( W(\cdot) ) + \frac{1}{k} \sum_{i \in [d]} \operatorname{\mathrm{H}}( W(\cdot; i) ), \end{align*}$$

where $\operatorname {\mathrm {H}}(\cdot )$ denotes Shannon entropy, and for a $\Gamma $ -invariant measure $\mu $ on $\mathtt {A}^\Gamma $ , we abbreviate $\mathrm {H_K}(W_\mu )$ to $\mathrm {H_K}(\mu )$ . The functional $\mathrm {H_K}$ appears in our work because it gives the upper exponential growth rate of the expected number of microstates whose averaged hyper-edge marginals are approximately specified by $\mu $ . This is an analog of [Reference Bowen9, Theorem 1.4]. Given a microstate $\mathbf {x} \in \mathtt {A}^V$ and $\sigma \in \operatorname {\mathrm {Hom}}(\Gamma , \operatorname {\mathrm {Sym}}(V))$ , let $W_{\mathbf {x}, \sigma }$ be the weight corresponding to $P_{\mathbf {x}}^\sigma \in \operatorname {\mathrm {Prob}}(\mathtt {A}^\Gamma )$ , the empirical distribution of $\mathbf {x}$ over $\sigma $ . Also, given two weights $W,W'$ , let

$$ \begin{align*}{\| {W - W'}\|} := \max_{1\le i \le d} \max_{\mathbf{a} \in \mathtt{A}^{\mathbb{Z}_k}} |W( \mathbf{a};i) - W'(\mathbf{a} ;i)|.\end{align*} $$
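The formula for $\mathrm{H_K}$ can be sketched in a few lines of Python (our own illustration, with arbitrarily chosen parameters). As a sanity check, for the weight of an i.i.d. Bernoulli($p$) measure the two terms combine to give exactly the single-site entropy $\operatorname{H}(p)$, since each edge marginal is then a product measure with entropy $k\operatorname{H}(p)$.

```python
from itertools import product
from math import log

# Sketch (not from the paper) computing H_K(W) from an abstract weight,
# checked on the i.i.d. Bernoulli(p) weight, where the terms
# (1-d) H(p) + (1/k) * d * k H(p) collapse to H(p).

def shannon(probs):
    return -sum(q * log(q) for q in probs if q > 0)

def kikuchi_entropy(vertex_w, edge_ws, k):
    # vertex_w: dict a -> W(a); edge_ws: one dict per i in [d], mapping
    # k-tuples over the alphabet to W(a; i)
    d = len(edge_ws)
    return ((1 - d) * shannon(vertex_w.values())
            + sum(shannon(w.values()) for w in edge_ws) / k)

d, k, p = 3, 4, 0.3                      # hypothetical parameters
vertex_w = {0: 1 - p, 1: p}
bern_edge = {a: (1 - p) ** a.count(0) * p ** a.count(1)
             for a in product((0, 1), repeat=k)}
edge_ws = [dict(bern_edge) for _ in range(d)]

hk = kikuchi_entropy(vertex_w, edge_ws, k)
print(abs(hk - shannon([p, 1 - p])) < 1e-9)   # True: H_K reduces to H(p)
```

This is consistent with the role of $\mathrm{H_K}$ as a first approximation to the annealed entropy, which for Bernoulli systems is the base entropy.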

Proposition 3.1. We have

$$\begin{align*}\mathrm{H_K}(\mu) = \lim_{\varepsilon \to 0} \limsup_{n \to \infty} \frac{1}{nk} \log \mathbb{E}_{\sigma_n \sim \mathbb{P}_{nk}} {\lvert {\{ \mathbf{x} \in \mathtt{A}^{nk} \,:\, {\| {W_{\mathbf{x}, \sigma_n} - W_\mu} \|} < \varepsilon \}}\rvert}. \end{align*}$$

The proof of this proposition shows that we also obtain $\mathrm {H_K}(\mu )$ if $\limsup $ is replaced with $\liminf $ on the right-hand side.

Proof. Let $\mathbb {Z}_k$ act on $\mathtt {A}^{\mathbb {Z}_k}$ in the usual way: $\pi \mathbf {a}(j) = \mathbf {a}(j-\pi )$ for $\pi \in \mathbb {Z}_k$ , $\mathbf {a} \in \mathtt {A}^{\mathbb {Z}_k}$ and $j \in \mathbb {Z}_k$ . We will say that two elements $\mathbf {a}, \mathbf {b} \in \mathtt {A}^{\mathbb {Z}_k}$ are equivalent if they are in the same $\mathbb {Z}_k$ -orbit. Let $[\mathbf {a}] \subset \mathtt {A}^{\mathbb {Z}_k}$ denote the equivalence class of $\mathbf {a}$ . For a weight W, write

$$ \begin{align*}W([\mathbf{a}]; i) := \sum_{\mathbf{b} \in [\mathbf{a}]} W(\mathbf{b}; i). \end{align*} $$

A weight W is cyclically invariant if $W(\mathbf {a}; i)=W(\mathbf {b}; i)$ for every equivalent pair $\mathbf {a},\mathbf {b}$ in $\mathtt {A}^{\mathbb {Z}_k}$ . For example, since $\mu $ is $\Gamma $ -invariant, for each $i \in [d]$ , the probability measure $W_\mu (\cdot; i) \in \operatorname {\mathrm {Prob}}(\mathtt {A}^{\mathbb {Z}_k})$ is cyclically invariant.

We first calculate $\mathbb {E} {\lvert {\{ \mathbf {x} \in \mathtt {A}^{kn} \,:\, W_{\mathbf {x}, \sigma _n} = W\}}\rvert }$ for an arbitrary cyclically invariant weight W with denominator $kn$ (i.e., such that $n \cdot W([\mathbf {a}]; i) \in \mathbb {Z}$ for each $i, \mathbf {a}$ ). Letting

$$\begin{align*}N_n = {\lvert {\operatorname{\mathrm{Hom}}_{\mathrm{unif}}(\Gamma, \operatorname{\mathrm{Sym}}(kn))} \rvert} = \left( \frac{(kn)!}{n! k^n} \right)^d, \end{align*}$$

we have

$$ \begin{align*} \mathbb{E} {\lvert {\{ \mathbf{x} \in \mathtt{A}^{kn} \,:\, W_{\mathbf{x}, \sigma_n} = W\}} \rvert} &= \frac{1}{N_n} \sum_{\sigma \in \operatorname{\mathrm{Hom}}_{\mathrm{unif}}(\Gamma, \operatorname{\mathrm{Sym}}(kn))} {\lvert {\{ \mathbf{x} \in \mathtt{A}^{kn} \,:\, W_{\mathbf{x}, \sigma} = W \}}\rvert} \\ &= \frac{1}{N_n} \sum_{\mathbf{x} \in \mathtt{A}^{kn}} {\lvert {\{ \sigma \in \operatorname{\mathrm{Hom}}_{\mathrm{unif}}(\Gamma, \operatorname{\mathrm{Sym}}(kn)) \,:\, W_{\mathbf{x}, \sigma} = W\}} \rvert}. \end{align*} $$

Now a labeling $\mathbf {x} \in \mathtt {A}^{kn}$ admits at least one $\sigma $ that gives the correct weight W if and only if it has the correct ‘vertex statistics’; that is, $\frac {1}{kn}{\lvert {\{ i \in [kn] \,:\, \mathbf {x}(i) = \mathtt {a} \}} \rvert } = W(\mathtt {a})$ for all $\mathtt {a} \in \mathtt {A}$ . There are $\exp \{ kn ( \operatorname {\mathrm {H}}(W(\cdot )) + o(1))\}$ such $\mathbf {x}$ , and each admits the same number of $\sigma $ . From now on, fix one such $\mathbf {x}$ .
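The count $N_n$ above can be checked by brute force for tiny parameters (a sketch with our own toy values $k = n = 2$, not part of the paper's argument): the number of $k$-uniform permutations of a $kn$-element set should be $(kn)!/(n!\,k^n)$.

```python
from itertools import permutations
from math import factorial

# Brute-force check (toy parameters) of the count of k-uniform permutations
# of a kn-element set, whose d-th power is N_n = ((kn)! / (n! k^n))^d.

def cycle_lengths(perm):
    seen, lengths = set(), []
    for start in range(len(perm)):
        if start in seen:
            continue
        length, v = 0, start
        while v not in seen:
            seen.add(v)
            v = perm[v]
            length += 1
        lengths.append(length)
    return lengths

k, n = 2, 2           # permutations of kn = 4 points built from 2-cycles
count = sum(all(l == k for l in cycle_lengths(p))
            for p in permutations(range(k * n)))
formula = factorial(k * n) // (factorial(n) * k ** n)
print(count, formula)  # 3 3: the three fixed-point-free involutions of [4]
assert count == formula
```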

For $i \in [d]$ , let $G_i$ be the set of k-uniform permutations $\pi \in \operatorname {\mathrm {Sym}}(kn)$ such that if $W(\cdot; \pi )$ is the probability vector on $\mathtt {A}^{\mathbb {Z}_k}$ given by

$$ \begin{align*}W(\mathbf{a}; \pi) = (kn)^{-1} |\{v \in [kn]:~ \mathbf{a}(t) = \mathbf{x}(\pi^t(v))~\forall 0\le t < k\}|,\end{align*} $$

then $W(\cdot; \pi ) \equiv W(\cdot; i)$ . Any k-uniform homomorphism $\sigma $ with $W_{\mathbf {x}, \sigma } = W$ is determined by the permutations $\sigma (s_i)$ which must be in $G_i$ . So the number of such homomorphisms is $\prod _{i=1}^d |G_i|$ .

Let $G(\mathbf {x})$ be the set of permutations $g\in \operatorname {\mathrm {Sym}}(kn)$ which fix $\mathbf {x}$ in the sense that $\mathbf {x}(v)=\mathbf {x}(gv)$ for all $v\in [kn]$ . Observe that $G(\mathbf {x})$ acts on $G_i$ by conjugation. This means that if $\pi _i$ is a fixed permutation in $G_i$ and $g\in G(\mathbf {x})$ , then $g \pi _i g^{-1} \in G_i$ . Moreover, this action is transitive. So

$$ \begin{align*}|G_i| = \frac{ |G(\mathbf{x})|}{|\textrm{Stab}(\pi_i)|}\end{align*} $$

where $\textrm {Stab}(\pi _i)$ is the set of $g \in G(\mathbf {x})$ with $g \pi _i g^{-1} = \pi _i$ . Observe

$$ \begin{align*}|G(\mathbf{x})| = \prod_{\mathtt{a} \in \mathtt{A}} \left( kn \cdot W(\mathtt{a}) \right) !.\end{align*} $$

There are two mechanisms by which a $g \in G(\mathbf {x})$ can stabilize $\pi _i$ . Either g can permute k-cycles with the same labels or it can rotate a given labeled k-cycle. Therefore,

$$\begin{align*}|\textrm{Stab}(\pi_i)| = \prod_{[\mathbf{a}] \in \mathtt{A}^{\mathbb{Z}_k}/\mathbb{Z}_k} \left( n \cdot W([\mathbf{a}]; i) \right)! \left( \frac{k}{{\lvert {[\mathbf{a}]} \rvert} } \right)^{n \cdot W([\mathbf{a}]; i)}. \end{align*}$$

Putting everything together, we get

$$\begin{align*}\mathbb{E} {\lvert {\{ \mathbf{x} \in \mathtt{A}^{kn} \,:\, W_{\mathbf{x}, \sigma_n} = W\}} \rvert} = \frac{ e^{kn (\operatorname{\mathrm{H}}(W(\cdot)) + o(1))} \left( n! k^n \prod_{\mathtt{a} \in \mathtt{A}} \left( kn \cdot W(\mathtt{a}) \right) ! \right)^d}{(kn)!^d \prod_{i \in [d]} \prod_{[\mathbf{a}] \in \mathtt{A}^{\mathbb{Z}_k}/\mathbb{Z}_k} \left( n \cdot W([\mathbf{a}]; i) \right)! \left( \frac{k}{{\lvert {[\mathbf{a}]} \rvert}} \right)^{n \cdot W([\mathbf{a}]; i)} }. \end{align*}$$

Applying Stirling’s approximation $\log n! = n \log n - n + o(n)$ , the logarithm of this is

$$\begin{align*}kn \left( \operatorname{\mathrm{H}}(W(\cdot)) + d\sum_{\mathtt{a} \in \mathtt{A}} W(\mathtt{a}) \log W(\mathtt{a}) - \frac{1}{k} \sum_{i \in [d]} \sum_{[\mathbf{a}]} W([\mathbf{a}]; i) \log \frac{W([\mathbf{a}]; i)}{{\lvert {[\mathbf{a}]} \rvert} } \right) + o(n). \end{align*}$$

By cyclic invariance of each $W(\cdot; i)$ , we have $W(\mathbf {a}; i) = \frac {W([\mathbf {a}]; i)}{{\lvert {[\mathbf {a}]} \rvert }}$ , so this gives

$$\begin{align*}\mathbb{E} {\lvert {\{ \mathbf{x} \in \mathtt{A}^{kn} \,:\, W_{\mathbf{x}, \sigma_n} = W\}}\rvert} = e^{nk F(W) + o(n)} ,\end{align*}$$

where

$$ \begin{align*}F(W) := \operatorname{\mathrm{H}}(W(\cdot)) + d \sum_{\mathtt{a} \in \mathtt{A}} W(\mathtt{a}) \log W(\mathtt{a}) - \frac{1}{k} \sum_{i \in [d]} \sum_{[\mathbf{a}]} W([\mathbf{a}]; i) \log \frac{W([\mathbf{a}]; i)}{{\lvert {[\mathbf{a}]} \rvert}}. \end{align*} $$

By the identity $W(\mathbf{a}; i) = \frac{W([\mathbf{a}]; i)}{{\lvert {[\mathbf{a}]} \rvert}}$ again, $F(W)$ coincides with $\mathrm{H_K}(W)$ whenever W is cyclically invariant.

Now, since the number of denominator-$kn$ weights W grows polynomially in n, it follows that for any $\varepsilon>0$ ,

$$\begin{align*}\lim_{n \to \infty} \frac{1}{kn} \log \mathbb{E} {\lvert {\{ \mathbf{x} \in \mathtt{A}^{kn} \,:\, {\| {W_{\mathbf{x}, \sigma_n} - W_\mu}\|} < \varepsilon\}}\rvert} = \sup \{ F(W) \,:\, {\| {W - W_\mu} \|} < \varepsilon\}, \end{align*}$$

and taking $\varepsilon $ to 0 gives the claimed formula, by continuity of F.

The annealed entropy of $\mu $ is defined by

$$\begin{align*}\mathrm{h_{ann}}(\mu) = \inf_{\mathcal{O} \ni \mu} \limsup_{n \to \infty} \frac{1}{nk} \log \mathbb{E}_{\sigma_n \sim \mathbb{P}_{nk}} {\lvert {\Omega(\sigma_n, \mathcal{O})} \rvert}, \end{align*}$$

where the infimum is over all open neighborhoods of $\mu $ .

To emphasize the relationship between $\mathrm {H_K}(\mu )$ and $\mathrm {h_{ann}}(\mu )$ , let

$$ \begin{align*}\mathcal{O}_\epsilon(\mu) = \{\nu \in \operatorname{\mathrm{Prob}}(\mathtt{A}^\Gamma):~ {\| {W_\nu - W_\mu} \|} <\epsilon\}.\end{align*} $$

Then $\mathcal {O}_\epsilon (\mu )$ is an open neighborhood of $\mu $ and Proposition 3.1 becomes

$$\begin{align*}\mathrm{H_K}(\mu) = \inf_{\epsilon>0} \limsup_{n \to \infty} \frac{1}{nk} \log \mathbb{E}_{\sigma_n \sim \mathbb{P}_{nk}} {\lvert {\Omega(\sigma_n, \mathcal{O}_\epsilon(\mu))} \rvert}. \end{align*}$$

In particular, $\mathrm {h_{ann}} (\mu ) \leq \mathrm {H_K}(\mu )$ .

The next proposition shows that $\mathrm {H_K}(\mu )$ is an upper bound for sofic entropy with respect to ‘most’ sofic approximations.

Proposition 3.2. Let $\mu $ be a $\Gamma $ -invariant Borel probability measure on $\mathtt {A}^\Gamma $ . Then there are subsets $\Omega _n' \subseteq \Omega _n^{\mathrm {sofic}}$ with

$$ \begin{align*}\lim_{n\to\infty} \mathbb{P}_n(\Omega_n')= 1\end{align*} $$

and such that if $\Sigma =\{\sigma _n\}_{n=1}^\infty $ satisfies $\sigma _n \in \Omega ^{\prime }_{i_n}$ for some increasing sequence $(i_n)_n$ with $k \mid i_n$ for all n, then

$$ \begin{align*}h_\Sigma(\mathtt{A}^\Gamma,\mu,T)\le \mathrm{H_K}(\mu).\end{align*} $$

Proof. Given a positive integer n divisible by k, $\varepsilon>0$ , and $\sigma :\Gamma \to \operatorname {\mathrm {Sym}}(n)$ , let $N_{n,\varepsilon }(\sigma )$ be ${\lvert {\Omega (\sigma , \mathcal {O}_\varepsilon (\mu ))} \rvert }$ . Think of $N_{n,\varepsilon }$ as a random variable with respect to the uniform measure on the space of k-uniform homomorphisms from $\Gamma $ to $\operatorname {\mathrm {Sym}}(n)$ . In addition, let

$$ \begin{align*} \mathrm{H}_{\mathrm{K},\varepsilon}(\mu) := \limsup_{n \to \infty} \frac{1}{n} \log \mathbb{E}[N_{n,\varepsilon}], \end{align*} $$

where here and below we always restrict n to be a multiple of k.

For each $\varepsilon $ and n as above, let $\Omega _{n,\varepsilon }'$ be the set of those $\sigma \in \Omega _n^{\mathrm {sofic}}$ that satisfy $N_{n,\varepsilon }(\sigma ) \le e^{\sqrt {n}}\mathbb {E}[N_{n,\varepsilon }]$ , and let $\Omega _n' := \Omega _{n,1}'\cap \cdots \cap \Omega _{n,1/n}'$ . Then Markov’s inequality gives

$$ \begin{align*}\mathbb{P}_n(\Omega_n') \ge 1 - \sum_{m=1}^n\mathbb{P}_n\big(N_{n,1/m}(\sigma)> e^{\sqrt{n}}\mathbb{E}[N_{n,1/m}]\big) \ge 1 - ne^{-\sqrt{n}} \to 1.\end{align*} $$

Finally, suppose that $\Sigma =\{\sigma _n\}_{n=1}^\infty $ satisfies $\sigma _n \in \Omega ^{\prime }_{i_n}$ for some increasing sequence $(i_n)_n$ with $k \mid i_n$ for all n. Then we also have $\sigma _n \in \Omega ^{\prime }_{i_n,1/m}$ whenever $i_n\ge m$ . Fixing m and letting $n\to \infty $ , it follows that

$$ \begin{align*}h_\Sigma(\mathtt{A}^\Gamma,\mu,T) \le \limsup_{n \to \infty} \frac{1}{i_n} \log N_{i_n,1/m}(\sigma_n) \le \limsup_{n \to \infty} \frac{1}{i_n} \log \mathbb{E}[N_{i_n,1/m}] \le \mathrm{H}_{\mathrm{K},1/m}(\mu),\end{align*} $$

where the first inequality uses again the fact that $\mathcal {O}_{1/m}(\mu )$ is an open neighborhood of $\mu $ for every positive integer m. Now letting $m\to \infty $ , Proposition 3.1 gives $\mathrm {H_K}(\mu ) = \lim _{m\to \infty }\mathrm {H}_{\mathrm {K},1/m}(\mu )$ , so this completes the proof.

3.3 A brief history

The most basic method for analyzing the behaviour of a random sofic approximation is the first moment method. Our first indication of the typical number of microstates for $(X,m_X,T)$ over $\sigma _n$ chosen from $\mathbb {P}_n$ is given by the expectation of that number.

In the analogous setting of actions of free groups, such averages have been studied intensely in recent years. In [Reference Bowen9], the exponential growth rate of the expected number of good microstates was shown to coincide with an invariant of systems previously introduced by the second author in [Reference Bowen10], where it was used to solve the isomorphism problem for finite-state Bernoulli actions of free groups. In those and several subsequent papers, this invariant was called the ‘f-invariant’. Here, we propose a new term instead: we refer to this quantity as ‘annealed entropy’.

In work of the second author, the f-invariant was obtained as a limit of functionals referred to as F, which are annealed entropies of Markov approximations. As explained further below, this quantity first appeared in refinements of work of Kikuchi [Reference Kikuchi42]. For this reason, we call it Kikuchi entropy. In later sections, it is used to prove Theorem 1.3.

The reason for the name ‘annealed entropy’ is a connection to statistics and statistical physics. During the last forty years, very similar first-moment calculations for various configurations over large sparse random graphs have become a central feature of the analysis of ‘graphical models’ in those disciplines. Often, the use of such averages can be seen as a first attempt to find the value for a ‘typical’ random graph. In such settings, the first moment is referred to as an annealed average: see, for instance, the usage in [Reference Mézard, Parisi and Virasoro50, Section IV.1] or [Reference Mézard and Montanari49, Section 5.4]. Its use as a prediction of typical behaviour is called the ‘Bethe ansatz’ (or sometimes the ‘replica symmetric’ approximation in reference to a phenomenology in the study of spin glasses that we do not explain here: see, for instance, [Reference Mézard, Parisi and Virasoro50, Chapter I] or [Reference Mézard and Montanari49, Chapter 8]Footnote 1 ).

In fact, the origins of these quantities lie even further back in the statistical physics literature. In the general setting for random graphical models studied in statistics, the leading order exponents in first moment calculations are given by quantities called ‘Bethe’ or ‘Kikuchi’ entropy.

The first of these terms refers to foundational work by Bethe [Reference Bethe8]. He estimated the free energy of a certain model of an alloy on a two-dimensional lattice by a recursive expansion that retained nearest-neighbour interactions but ignored the effect of loops in the lattice graph. A more mathematical description is that the two-dimensional lattice is approximated by an infinite regular tree, and this is why such trees are now often called ‘Bethe lattices’ in statistical physics.

In [Reference Kikuchi42], Kikuchi expanded on Bethe’s ideas by proposing a more careful expansion that respects slightly more of the lattice structure. In modern terms, this can be understood as an approximation to the lattice by a hyper-tree rather than a simple tree. While Bethe argued mostly in terms of free energies, Kikuchi’s paper includes various explicit formulae for entropy estimates, and these evolved over time into the quantities studied in statistical inference today. See [Reference Kikuchi42, Equations (A.7) and (C1.6)] for early intimations of these modern formulae. Bethe’s and Kikuchi’s approximation methods can also be found in physics surveys from closer to that time such as Section III in Burley’s contribution [Reference Domb and Green26, Chapter 9].

These formulae were brought explicitly into statistical theory around 2000 by Yedidia and various co-authors in a series of technical reports: see, in particular, [Reference Yedidia77, Reference Yedidia, Freeman and Weiss78] and the further references given there. While these references continued to emphasize free energy more than entropy, they do cover both: the explicit formula for Bethe entropy is [Reference Yedidia77, Formula (1.32)], for example.

The term ‘annealed entropy’ (instead of ‘f-invariant’) emphasizes the connection between these two fields. To be more precise, the annealed entropy of a measure-preserving action of a free group is defined as an infimum of the values of a more elementary quantity over Markov approximations to the action. This more elementary quantity, denoted by ‘F’ in [Reference Bowen10, Reference Bowen9], has precisely the same formula as Bethe entropy. Thus, the same formula for an entropy-like quantity was discovered independently and then used for very similar first-moment calculations in both fields.

Bethe entropy has by now become a textbook topic in statistical inference with graphical models: see, for instance, [Reference Wainwright and Jordan71, Section 4] or [Reference Mézard and Montanari49, Chapter 14, especially Subsection 14.2.4]. Some of the theory of these models has also been analyzed rigorously in the probability literature. For example, [Reference Dembo, Montanari, Sly and Sun24] proves that a first-moment quantity asymptotically agrees with the typical free energy for ferromagnetic Potts models over sparse graph sequences, justifying the ‘Bethe ansatz’ for these models.

Whereas Bethe entropy can be understood as a functional of a probability distribution over a tree, the extension to Kikuchi entropy allows an underlying graph that is a hyper-tree, which is a hyper-graph $G=(V,E)$ such that there is some tree with vertex set V whose subgraphs induced by hyper-edges of G are all connected.

For the first-moment calculations we need below, the exponent is given by the analogous annealed entropy for an action of $\Gamma $ , the d-fold free power of $\mathbb {Z}_k$ , rather than a free group. It turns out that this could again be defined as an infimum of a more elementary quantity over Markov approximations, where now the more elementary quantity is the Kikuchi entropy associated to the Cayley hyper-tree of $\Gamma $ . We do not work this out completely here. Instead, we give a more direct formula for the annealed entropy and only show that it is bounded above by the Kikuchi entropy (see discussion following Proposition 3.1).

4 Completely positive entropy, local convergence and model mixing

In order to prove the completely positive entropy (CPE) statement in Theorem 1.3, we will use a variant of the main result of [Reference Austin and Burton7]. That paper proves that if a model measure sequence locally and empirically converges to the target measure and is uniformly model mixing, then the system is CPE. The local and empirical convergence result needed to prove CPE will also help us establish the lower bound in Theorem 1.3, part (a). We review these concepts here.

As in previous sections, let $\mathtt {A}$ be a finite set, $\Gamma $ be a countable group and $\mu $ be a $\Gamma $ -invariant Borel probability measure on $\mathtt {A}^\Gamma $ . We also let $\Sigma =(\sigma _n)_n$ be a sofic approximation, where $\sigma _n:\Gamma \to \operatorname {\mathrm {Sym}}(V_n)$ for each n.

Given a probability measure $\kappa $ on $\mathtt {A}^{V_n}$ and a vertex $v \in V_n$ , the localization of $\kappa $ at v is the probability measure

$$ \begin{align*}{\mathrm{Loc}({\kappa}, {v})} = (\Pi_v^{\sigma_n})_* \kappa = \sum_{\mathbf{x} \in \mathtt{A}^{V_n} } \kappa(\mathbf{x}) \delta_{\Pi^{\sigma_n}_v(\mathbf{x})} \in \operatorname{\mathrm{Prob}}(\mathtt{A}^\Gamma).\end{align*} $$

This is the law of the pull-back name of a $\kappa $ -random sample, as viewed from a fixed $v\in V_n$ . This measure depends on the homomorphism $\sigma _n$ , but we will usually leave that dependence implicit. If we want to specify $\sigma _n$ , we use the notation ${\mathrm {Loc}_{\sigma _n}({\kappa }, {v})}$ .

A model measure sequence is a sequence $(\mu _n)_n$ of probability measures $\mu _n$ on $\mathtt {A}^{V_n}$ . The sequence $(\mu _n)_n$ is said to converge to $\mu $ locally and empirically if for every open neighborhood $\mathcal {O}$ of $\mu $ in $\operatorname {\mathrm {Prob}}(\mathtt {A}^\Gamma )$ ,

$$ \begin{align*} 1 &= \lim_{n\to\infty} |V_n|^{-1} \, {\lvert {\{v\in V_n:~ {\mathrm{Loc}({\mu_{n}}, {v})} \in \mathcal{O}\}}\rvert} \\ 1 &= \lim_{n\to\infty} \mu_n(\{\mathbf{x} \in \mathtt{A}^{V_n}:~ P_{\mathbf{x}}^{\sigma_n} \in \mathcal{O}\}). \end{align*} $$

Below, we will sometimes refer to the first equality holding for every $\mathcal {O}$ as local convergence and the second as empirical convergence in order to be explicit about which property is relevant.

If the measures $\mu _n$ and/or the maps $\sigma _n$ are random with law $\mathbb {P}_n$ , then we say the sequence converges locally and empirically in probability to $\mu $ if the same limits hold in probability. Explicitly, for every open neighborhood $\mathcal {O}$ of $\mu $ and every $\varepsilon>0$ ,

$$ \begin{align*} 1 &= \lim_{n\to\infty} \mathbb{P}_n\left\{ |V_n|^{-1} \, {\lvert {\{v\in V_n:~ {\mathrm{Loc}({\mu_{n}}, {v})} \in \mathcal{O}\}} \rvert}> 1 - \varepsilon \right\} \\ 1 &= \lim_{n\to\infty} \mathbb{P}_n\left\{ \mu_n(\{\mathbf{x} \in \mathtt{A}^{V_n}:~ P_{\mathbf{x}}^{\sigma_n} \in \mathcal{O}\}) > 1 - \varepsilon \right\}. \end{align*} $$

It will be convenient to reformulate local convergence in probability in terms of total variation distance between marginals. To make this precise, we need notation for the marginals.

Given a finite set $B \subset \Gamma $ and a probability measure $\nu $ on $\mathtt {A}^\Gamma $ , let $\nu _B$ be the probability measure on $\mathtt {A}^B$ equal to the pushforward of $\nu $ under the projection map $\mathtt {A}^\Gamma \to \mathtt {A}^B$ . This is the marginal of $\nu $ on B.

Let $d_{\operatorname {\mathrm {TV}}}$ denote total variation distance. Because the sets of the form $\mathcal {O}(B,\varepsilon ,\mu )=\{\nu \in \operatorname {\mathrm {Prob}}(\mathtt {A}^\Gamma ):~ d_{\operatorname {\mathrm {TV}}}(\nu _B, \mu _B) < \varepsilon \}$ form a neighborhood basis for the topology at $\mu $ , it follows that a sequence of random measures $\mu _n \in \operatorname {\mathrm {Prob}}(\mathtt {A}^{V_n})$ converges locally in probability to a measure $\mu \in \operatorname {\mathrm {Prob}}(\mathtt {A}^\Gamma )$ if and only if for every finite $B \subset \Gamma $ and $\varepsilon>0$ ,

(10) $$ \begin{align} \lim_{n\to\infty} \mathbb{P}_n \left\{ \frac{1}{{\lvert {V_n}\rvert} } {\lvert {\left\{ v \in V_n \,:\, d_{\operatorname{\mathrm{TV}}} \big( {\mathrm{Loc}({\mu_n}, {v})}_{B} ,\, \mu_{B} \big)> \varepsilon \right\}} \rvert} > \varepsilon \right\} = 0. \end{align} $$
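For concreteness, recall the standard fact that for probability measures $\alpha ,\beta $ on the finite set $\mathtt {A}^B$ , the total variation distance appearing here can be computed as

$$ \begin{align*}d_{\operatorname{\mathrm{TV}}}(\alpha,\beta) = \tfrac{1}{2}\sum_{\mathbf{a} \in \mathtt{A}^B} {\lvert {\alpha(\mathbf{a}) - \beta(\mathbf{a})} \rvert} = \max_{S \subset \mathtt{A}^B} {\lvert {\alpha(S) - \beta(S)} \rvert}.\end{align*} $$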

Versions of the next lemma have appeared several times before: for instance, inside the proof of [Reference Bowen12, Theorem 4.1], or explicitly as [Reference Hayes36, Lemma 5.4] or [Reference Austin3, Corollary 5.7]. We include a proof for completeness.

Lemma 4.1. If a sequence of random measures $(\mu _n)_n$ converges locally in probability to an ergodic measure $\mu \in \operatorname {\mathrm {Prob}}^\Gamma (\mathtt {A}^\Gamma )$ over some random sequence of homomorphisms, then it converges locally and empirically in probability to $\mu $ .

Proof. For each n, let $\theta _n \in \operatorname {\mathrm {Prob}}(\operatorname {\mathrm {Prob}}^\Gamma (\mathtt {A}^\Gamma ))$ denote the law of

$$\begin{align*}\frac{1}{{\lvert {V_n}\rvert}} \sum_{v \in V_n} {\mathrm{Loc}_{\sigma_n}({\mu_n}, {v})} , \end{align*}$$

where $(\sigma _n, \mu _n)$ are jointly distributed as given. As stated, $\theta _n$ is supported on $\Gamma $ -invariant measures because each $\sigma _n$ is a homomorphism, and therefore the empirical measure $P^{\sigma _n}_{\mathbf {x}}$ is invariant for any $\mathbf {x} \in \mathtt {A}^{V_n}$ .

After passing to a subsequence if necessary, we may assume that the sequence $(\theta _n)_n$ converges weakly to some $\theta \in \operatorname {\mathrm {Prob}}(\operatorname {\mathrm {Prob}}^\Gamma (\mathtt {A}^\Gamma ))$ . We first show that the barycenter of $\theta $ must be $\mu $ : given a continuous function $g \in C(\mathtt {A}^\Gamma )$ ,

$$ \begin{align*} \iint g(\mathbf{z}) \, \nu(d\mathbf{z})\, \theta(d\nu) &= \lim_{n \to \infty} \iint g(\mathbf{z}) \, \nu(d\mathbf{z})\, \theta_n(d\nu) \\ &= \lim_{n \to \infty} \mathbb{E}\left[ \frac{1}{|V_n|} \sum_{v \in V_n} \int g(\Pi_v^{\sigma_n}\mathbf{x}) \, \mu_n(d\mathbf{x}) \right]. \end{align*} $$

Now given $\varepsilon>0$ , let $\mathcal {O} \ni \mu $ be an open neighborhood such that if $\nu \in \mathcal {O}$ , then $\int g\, d\nu $ is within $\varepsilon $ of $\int g\, d\mu $ . Then we control the expectation above by dividing up the terms based on whether $\mu _n$ looks like $\mu $ near v:

$$ \begin{align*} \mathbb{E}\left[ \frac{1}{|V_n|} \sum_{v \in V_n} \int g(\Pi_v^{\sigma_n}\mathbf{x}) \, \mu_n(d\mathbf{x}) \right] &= \frac{1}{|V_n|} \mathbb{E}\left[ \sum_{\substack{v \in V_n \\ {\mathrm{Loc}({\mu_n}, {v})} \in \mathcal{O}}} \int g(\Pi_v^{\sigma_n}\mathbf{x}) \, \mu_n(d\mathbf{x}) \right] \\ &\qquad + \frac{1}{|V_n|} \mathbb{E}\left[ \sum_{\substack{v \in V_n \\ {\mathrm{Loc}({\mu_n}, {v})} \not\in \mathcal{O}}} \int g(\Pi_v^{\sigma_n}\mathbf{x}) \, \mu_n(d\mathbf{x}) \right]. \end{align*} $$

The magnitude of the second term is bounded by

$$\begin{align*}\max {\lvert {g} \rvert} \cdot \mathbb{E} \left[ \frac{1}{|V_n|} {\lvert {\{v \in V_n : {\mathrm{Loc}({\mu_n}, {v})} \not\in \mathcal{O} \}} \rvert}\right] \end{align*}$$

which goes to 0 as n goes to infinity by definition of local convergence in probability. By choice of $\mathcal {O}$ , the first term is within $\varepsilon $ of $\int g\, d\mu $ for large n. Since $\varepsilon>0$ was arbitrary, this proves that the (subsequential) limit must have barycenter $\mu $ .

Since $\mu $ is ergodic, the only possible subsequential limit with barycenter $\mu $ is $\delta _\mu $ , so this is the true limit. This implies that for any $\varepsilon> 0$ and open $\mathcal {O} \ni \mu $ ,

$$\begin{align*}\mathbb{P} \big\{ \mu_n(\Omega(\sigma_n, \mathcal{O}))> 1-\varepsilon \big\} \to 1. \end{align*}$$

This is because

$$\begin{align*}\mathbb{P} \big\{ 1-\mu_n(\Omega(\sigma_n, \mathcal{O}))> \varepsilon \big\} \leq \frac{\mathbb{E}\big[1-\mu_n(\Omega(\sigma_n, \mathcal{O}))]}{\varepsilon} = \frac{1 - \mathbb{E}[\mathbb{E}_{\mathbf{x} \sim \mu_n}[\mathbf{1}_{P_{\mathbf{x}}^{\sigma_n} \in \mathcal{O}}]]}{\varepsilon} = \frac{1 - \theta_n(\mathcal{O})}{\varepsilon} \to 0\end{align*}$$

using Markov’s inequality, the tower law of expectation and the portmanteau theorem.

4.1 Property M

To define notions of model mixing, we will impose distance functions on the finite sets V which form a given sofic approximation. For this purpose, we will assume $\Gamma $ is finitely generated and let $E \subset \Gamma $ be a finite symmetric generating subset. For $g\in \Gamma $ , let $|g|$ be the word-length of g, which is the length of the shortest word in E representing g. Given $\sigma :\Gamma \to \operatorname {\mathrm {Sym}}(V)$ , define distance in V by

$$ \begin{align*}d_\sigma(v,w) = \min\{ |g|:~ g \in \Gamma, \sigma(g)v=w\}.\end{align*} $$

If there does not exist g with $\sigma (g)v=w$ , then we set $d_\sigma (v,w)=+\infty $ . If $\sigma $ is not a homomorphism, then $d_\sigma $ may fail to satisfy the triangle inequality.

A subset $S \subset V$ is r -separated if $d_\sigma (v,w)> r$ for every pair of distinct $v,w \in S$ .

Suppose a model measure sequence $(\mu _n)_n$ converges locally and empirically to $\mu $ . We say the sequence is uniformly model mixing (umm) if for every finite $F \subset \Gamma $ and every $\epsilon>0$ , there is some $r<\infty $ and a sequence of finite subsets $W_n \subset V_n$ such that

$$ \begin{align*}|W_n| = (1-o(1))|V_n|,\end{align*} $$

and if $S \subset W_n$ is r-separated, then

$$ \begin{align*}\operatorname{\mathrm{H}}( (\mu_n)_{\sigma_n^{F}(S)} ) \ge |S| (\operatorname{\mathrm{H}}(\mu_F) - \epsilon),\end{align*} $$

where

  • $\mu _F$ is the probability measure on $\mathtt {A}^F$ which is the pushforward of $\mu $ under the projection map $\mathtt {A}^\Gamma \to \mathtt {A}^F$ ;

  • $\sigma _n^F(S) = \{\sigma _n(f)s:~f \in F, s\in S\}$ ;

  • $ (\mu _n)_{\sigma _n^{F}(S)} $ is the probability measure on $\mathtt {A}^{\sigma _n^{F}(S)}$ which is the pushforward of $\mu _n$ under the projection map $\mathtt {A}^{V_n} \to \mathtt {A}^{\sigma _n^{F}(S)}$ .

This is a microstates analog of uniform mixing, introduced by Rudolph and Weiss in [Reference Rudolph and Weiss62] for actions of an amenable group; see also [Reference Weiss74, Definition 10], where the name ‘uniform mixing’ appears for the first time. The main result of [Reference Austin and Burton7] is that if $(\mu _n)_n$ locally and empirically converges to $\mu $ and is uniformly model mixing with respect to a fixed sofic approximation $\Sigma $ , then the system $(\mathtt {A}^\Gamma ,\mu ,T)$ has completely positive entropy with respect to $\Sigma $ , in analogy with a corresponding result of [Reference Rudolph and Weiss62].

Unfortunately, we do not know whether the parity check subshifts of Theorem 1.3 are uniformly model mixing. Instead, we define a weaker version of model mixing which suffices.

Definition 4.2 (Property M).

Suppose $(\mu _n)_n$ is a model measure sequence and $\mu \in \operatorname {\mathrm {Prob}}^\Gamma (\mathtt {A}^\Gamma )$ is an invariant measure. We say the sequence has property M if for every $\epsilon>0$ and $0<r<\infty $ , there is a sequence of subsets $S_n \subset V_n$ such that

$$ \begin{align*}\liminf \frac{|S_n|}{|V_n|}>0\end{align*} $$

and

(11) $$ \begin{align} \operatorname{\mathrm{H}}( (\mu_n)_{\sigma_n^{B_r}(S_n)} ) \ge |S_n| (\operatorname{\mathrm{H}}(\mu_{B_r}) - \epsilon) \end{align} $$

for all n, where $B_r = {\mathrm {B}({r}, {e})} \subset \Gamma $ denotes the ball of radius r centered at the identity. In applications, $\mu $ will be a limit of the sequence $(\mu _n)_n$ , but we do not impose any such requirement for the definition.

In contrast with uniform model mixing, we only require $S_n$ to have asymptotically positive density in $V_n$ , and there is no uniform lower bound on this density across different choices of $\epsilon , r$ . This density could be much smaller than $|B_r|^{-1}$ , for example. We also do not require the sets $S_n$ to be separated, although the lower bound (11) does usually imply a kind of approximate separation anyway.

We suspect that other variants of uniform model mixing could be used in a similar way to prove completely positive entropy, so Definition 4.2 is not an attempt at optimal generality. This is why we have chosen a rather bland name for Property M, although it is convenient in our work below.

Here is the main result of this section:

Theorem 4.3. As above, let $\mu $ be a $\Gamma $ -invariant probability measure on $\mathtt {A}^\Gamma $ , $\Sigma =(\sigma _n:\Gamma \to \operatorname {\mathrm {Sym}}(V_n))_n$ a sofic approximation, and $(\mu _n)_n$ a model measure sequence. Assume $(\mu _n)_n$ converges to $\mu $ locally and empirically along $\Sigma $ and has property M. Then every nontrivial factor of $(\mathtt {A}^\Gamma ,\mu ,T)$ has positive $\Sigma $ -entropy.

Proof. By Corollary 2.2, it suffices to consider factor maps to other shift systems. For these, the proof is based on the proof of [Reference Austin and Burton7, Theorem 1.2].

Step 1. We start by establishing some general entropy inequalities. For two or more jointly distributed random variables $X_1, \ldots , X_k$ , define the total correlation

$$\begin{align*}\operatorname{\mathrm{TC}}(X_1; \cdots; X_k) = \left(\sum_{i=1}^k \operatorname{\mathrm{H}}(X_i)\right) - \operatorname{\mathrm{H}}(X_1, \ldots, X_k). \end{align*}$$

This is a generalization of mutual information to more than two random variables, introduced in [Reference Watanabe72]. It can also be recursively defined by setting $\operatorname {\mathrm {TC}}(X_1; X_2) = \operatorname {\mathrm {I}}(X_1; X_2)$ and for $k \geq 3$

$$\begin{align*}\operatorname{\mathrm{TC}}(X_1; \cdots; X_k) = \operatorname{\mathrm{TC}}(X_1; \cdots; X_{k-1}) + \operatorname{\mathrm{I}}(X_1, \ldots, X_{k-1}; X_k). \end{align*}$$

Using this recursion and the data-processing inequality [Reference Cover and Thomas23, Theorem 2.8.1], it can be shown by induction on k that if f is any function and $Y_i = f(X_i)$ for each i, then

(12) $$ \begin{align} \operatorname{\mathrm{TC}}(Y_1; \cdots; Y_k) \leq \operatorname{\mathrm{TC}}(X_1; \cdots; X_k). \end{align} $$

This inequality has also appeared in [Reference Austin6, Lemma 4.3]. Note that the total correlation does not depend on the order in which the random variables are listed. Below, we will refer to the total correlation of a collection of random variables $\{X_i \,:\, i \in I\}$ indexed by a finite set I using the notation $\operatorname {\mathrm {TC}}(\{ X_i \,:\, i \in I\})$ , since fixing an ordering would unnecessarily complicate notation.
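As a simple illustration of the two extremes: if $X_1,\ldots ,X_k$ are independent, then $\operatorname {\mathrm {TC}}(X_1;\cdots ;X_k) = 0$ , while if $X_1 = \cdots = X_k$ are all equal to a single uniform random bit, then each $\operatorname {\mathrm {H}}(X_i) = \log 2$ but $\operatorname {\mathrm {H}}(X_1,\ldots ,X_k) = \log 2$ , so

$$ \begin{align*}\operatorname{\mathrm{TC}}(X_1;\cdots;X_k) = k\log 2 - \log 2 = (k-1)\log 2.\end{align*} $$

Thus, total correlation grows linearly in the number of fully redundant variables, which is why the bounds below are normalized by ${\lvert {S_n}\rvert }$ .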

The Rokhlin distance between random variables $\alpha ,\beta $ which are defined on the same probability space is defined by $\textrm {d}^{\mathrm {Rok}}_\mu (\alpha , \beta ) = \operatorname {\mathrm {H}}_\mu (\alpha | \beta ) + \operatorname {\mathrm {H}}_\mu (\beta | \alpha )$ . This satisfies the triangle inequality, and it equals zero if and only if $\alpha $ and $\beta $ generate the same partition up to null sets. This distance can be used to control total correlation via the bound

$$ \begin{align*} &\hspace{-1in} {\lvert {\operatorname{\mathrm{TC}}(X_1; \cdots; X_k) - \operatorname{\mathrm{TC}}(Y_1; \cdots; Y_k)} \rvert} \\ &\leq {\lvert {\operatorname{\mathrm{H}}(X_1, \ldots, X_k)- \operatorname{\mathrm{H}}(Y_1, \ldots, Y_k)}\rvert} + \sum_{i=1}^k {\lvert {\operatorname{\mathrm{H}}(X_i) - \operatorname{\mathrm{H}}(Y_i)} \rvert} \\ &\leq 2 \sum_{i=1}^k \big( \operatorname{\mathrm{H}}(X_i | Y_i) + \operatorname{\mathrm{H}}(Y_i | X_i) \big) = 2 \sum_{i=1}^k \textrm{d}^{\mathrm{Rok}}(X_i,Y_i). \end{align*} $$

Step 2. Since $(\mu _n)_n$ locally and empirically converges to $\mu $ , if $S_n \subset V_n$ satisfies $\liminf \frac {{\lvert {S_n}\rvert } }{{\lvert {V_n} \rvert }}> 0$ , then

$$\begin{align*}\frac{1}{{\lvert {S_n}\rvert}} \sum_{v \in S_n} \operatorname{\mathrm{H}} \big( (\mu_n)_{\sigma_n^{B_r}(v)} \big) = \operatorname{\mathrm{H}}(\mu_{B_r}) + o(1). \end{align*}$$

So property M implies that for every $r,\varepsilon>0$ , there is a sequence of subsets $S_n \subset V_n$ such that $\liminf \frac {{\lvert {S_n}\rvert }}{{\lvert {V_n}\rvert }}> 0$ and

$$\begin{align*}\frac{1}{{\lvert {S_n}\rvert} } \operatorname{\mathrm{TC}}\big( \{ (\mathbf{x}_n)^{\sigma_n^{B_r}(v)} \,:\, v \in S_n \} \big) \leq \varepsilon \end{align*}$$

for all large n, where $\mathbf {x}_n$ is a random sample of $\mu _n$ and the projections $\{ (\mathbf {x}_n)^{\sigma _n^{B_r}(v)} \,:\, v \in S_n \}$ are jointly distributed in the natural way (from a common sample of $\mathbf {x}_n$ ).

Step 3. Now let $\phi \colon \mathtt {A}^\Gamma \to \mathtt {B}$ be a measurable map into a finite set $\mathtt {B}$ generating a factor map $\phi ^\Gamma $ as in (6). If this factor is nontrivial, then $\operatorname {\mathrm {H}}_\mu (\phi )> 0$ . We want to show that the property M assumption on $\mu _n$ implies that

$$\begin{align*}h_\Sigma(\mu\,;\,\phi^\Gamma)> 0.\end{align*}$$

Let $\lambda $ be the graphical joining of the factor map $\phi ^\Gamma $ as in (4).

Fix $r \in \mathbb {N}$ and a ${\mathrm {B}({r}, {e})}$ -local function $\psi \colon \mathtt {A}^\Gamma \to \mathtt {B}$ which approximates $\phi $ closely enough in measure that $\textrm {d}^{\mathrm {Rok}}_\mu (\psi , \phi ) < \frac {1}{8}\operatorname {\mathrm {H}}_\mu (\phi )$ .

Now with $\varepsilon = \frac {1}{8}\operatorname {\mathrm {H}}_\mu (\phi )$ and this r, let $(S_n)_n$ be the sequence of subsets of $V_n$ given by property M. Since $\psi $ is ${\mathrm {B}({r}, {e})}$ -local, the data-processing inequality (12) above implies that

$$\begin{align*}\operatorname{\mathrm{TC}}\big( \{ {\psi^{\sigma_n}}(\mathbf{x}_n)(v) \,:\, v \in S_n \} \big) \leq \operatorname{\mathrm{TC}}\big( \{ (\mathbf{x}_n)^{\sigma_n^{B_r}(v)} \,:\, v \in S_n \} \big) \leq \varepsilon {\lvert {S_n}\rvert}, \end{align*}$$

where $\mathbf {x}_n$ is a random sample of $\mu _n$ and $\psi ^{\sigma _n} \colon \mathtt {A}^{V_n} \to \mathtt {B}^{V_n}$ is defined as in (7).

Step 4. Let $(\psi _m)_m$ be a local approximating sequence for $\phi $ , meaning that (8) holds, and hence, $\textrm {d}^{\mathrm {Rok}}_\mu (\phi , \psi _m)$ also converges to 0. Since the Rokhlin distance satisfies the triangle inequality, there is some $M \in \mathbb {N}$ such that if $m \geq M$ , then $\textrm {d}^{\mathrm {Rok}}_\mu (\psi , \psi _m) < \frac {1}{8} \operatorname {\mathrm {H}}_\mu (\phi )$ .

Given an open neighborhood $\mathcal {O} \ni \lambda $ , by [Reference Austin3, Prop. 4.10], there is some open neighborhood $\mathcal {U} \ni \mu $ and some $m \geq M$ such that, for all large enough n, the map $(\mathrm {id}_{\mathtt {A}^{V_n}},\psi _m^{\sigma _n})$ sends $\mathcal {U}$ -microstates to $\mathcal {O}$ -microstates. Let us also assume m is large enough that $\operatorname {\mathrm {H}}_\mu (\psi _m) \geq \frac {1}{2} \operatorname {\mathrm {H}}_\mu (\phi )$ .

Now fix some R such that $\psi $ and $\psi _m$ are both ${\mathrm {B}({R}, {e})}$ -local. By local convergence of $\mu _n$ to $\mu $ , for any $\delta>0$ , the fraction of $v \in S_n$ for which the local marginal ${\mathrm {Loc}({\mu _n}, {v})}_{B_R}$ is within total variation distance $\delta $ of $\mu _{B_R}$ is $1-o(1)$ . For the rest of the $v \in S_n$ , the term in the sum below has the upper bound $2 \log {\lvert {\mathtt {B}} \rvert } $ :

$$ \begin{align*} &\hspace{-1in} \tfrac{1}{{\lvert {S_n}\rvert} } {\lvert {\operatorname{\mathrm{TC}}\big( \{ {\psi^{\sigma_n}}(\mathbf{x}_n)(v) \,:\, v \in S_n \} \big) - \operatorname{\mathrm{TC}}\big( \{ {\psi_m^{\sigma_n}}(\mathbf{x}_n)(v) \,:\, v \in S_n \} \big)} \rvert} \\ &\leq \frac{2}{{\lvert {S_n}\rvert} } \sum_{v \in S_n} \left( \operatorname{\mathrm{H}}_{{\mathrm{Loc}({\mu_n}, {v})}} (\psi | \psi_m) + \operatorname{\mathrm{H}}_{{\mathrm{Loc}({\mu_n}, {v})} } (\psi_m | \psi) \right) \\ &\leq 2 \textrm{d}^{\mathrm{Rok}}_\mu(\psi_{m},\psi) + o(1). \end{align*} $$

Hence,

(13) $$ \begin{align} \tfrac{1}{{\lvert {S_n}\rvert} } \operatorname{\mathrm{TC}}\big( \{ {\psi_m^{\sigma_n}}(\mathbf{x}_n)(v) \,:\, v \in S_n \} \big) \leq \frac{3}{8} \operatorname{\mathrm{H}}_\mu(\phi) + o(1). \end{align} $$

By empirical convergence, for large n, the model measure $\mu _n$ is mostly supported on $\mathcal {U}$ -microstates. So $(\mathrm {id}_{\mathtt {A}^{V_n}},\psi _m^{\sigma _n})_*\mu _n$ is mostly supported on $\mathcal {O}$ -microstates, and using Fano’s inequality, we see that

$$\begin{align*}\frac{1}{{\lvert {V_n}\rvert} } \operatorname{\mathrm{H}}( {\psi_{m}^{\sigma_n}}_*\mu_n) \leq \frac{1}{{\lvert {V_n}\rvert} } \log {\lvert {\mathrm{proj}_n[\Omega(\sigma_n, \mathcal{O})]} \rvert} + o(1). \end{align*}$$

The total correlation bound (13) gives

$$ \begin{align*} \operatorname{\mathrm{H}}( {\psi_{m}^{\sigma_n}}_*\mu_n) = \operatorname{\mathrm{H}}( \psi_{m}^{\sigma_n}(\mathbf{x}_n)) &\ge \operatorname{\mathrm{H}}\big( \{ \psi_{m}^{\sigma_n}(\mathbf{x}_n)(v) \,:\, v \in S_n \} \big) \\ &\geq \sum_{v \in S_n} \operatorname{\mathrm{H}}\big( \psi_{m}^{\sigma_n}(\mathbf{x}_n)(v) \big) -{\lvert {S_n}\rvert} \big( \tfrac{3}{8} \operatorname{\mathrm{H}}_\mu(\phi) + o(1) \big). \end{align*} $$

Since $(\mu _n)_n$ converges locally to $\mu $ and $\psi _m$ is a local function,

$$\begin{align*}\frac{1}{{\lvert {S_n}\rvert} } \sum_{v \in S_n} \operatorname{\mathrm{H}}\big( \psi_{m}^{\sigma_n}(\mathbf{x}_n)(v) \big) = \operatorname{\mathrm{H}}_\mu(\psi_m) + o(1) \geq \tfrac{1}{2} \operatorname{\mathrm{H}}_\mu(\phi) + o(1). \end{align*}$$

So

$$\begin{align*}\frac{1}{{\lvert {V_n}\rvert} } \log {\lvert {\mathrm{proj}_n[\Omega(\sigma_n, \mathcal{O})]}\rvert} \geq \frac{{\lvert {S_n}\rvert} }{{\lvert {V_n}\rvert} } \left( \tfrac{1}{8}\operatorname{\mathrm{H}}_\mu(\phi) + o(1) \right) + o(1) , \end{align*}$$

and for every $\mathcal {O} \ni \lambda $ ,

$$\begin{align*}\liminf_{n \to \infty} \frac{1}{{\lvert {V_n}\rvert} } \log {\lvert {\mathrm{proj}_n[\Omega(\sigma_n, \mathcal{O})]}\rvert} \geq \left(\liminf_{n \to \infty} \frac{{\lvert {S_n}\rvert} }{{\lvert {V_n}\rvert} }\right) \tfrac{1}{8} \operatorname{\mathrm{H}}_\mu(\phi). \end{align*}$$

Since we chose $(S_n)_n$ independently of $\mathcal {O}$ , and $\liminf _{n \to \infty } \frac {{\lvert {S_n}\rvert } }{{\lvert {V_n}\rvert } }> 0$ , taking the infimum over $\mathcal {O}$ completes the proof.

5 Shattering

Let $(\mathtt {A}^\Gamma ,\mu ,T)$ be a shift $\Gamma $ -system and $\Sigma =(\sigma _n)_n$ a sofic approximation, where ${\sigma _n:\Gamma \to \operatorname {\mathrm {Sym}}(V_n)}$ . On each model space $\mathtt {A}^{V_n}$ , we have the normalized Hamming distance defined by

$$ \begin{align*}\textrm{d}^{(V_n)}(\mathbf{x},\mathbf{y})= |V_n|^{-1}|\{v\in V_n:~\mathbf{x}(v)\ne \mathbf{y}(v)\}|.\end{align*} $$

In this section, we derive ergodic-theoretic consequences from the following phenomenon, which is at the heart of our study of parity check shifts.

Definition 5.1. The shift system has totally shattered microstate spaces along $\Sigma $ if (i) it has microstates along $\Sigma $ , and (ii) there exists a $\delta> 0$ for which the following holds. For every $\varepsilon> 0$ , there exist a weak $^\ast $ neighbourhood $\mathcal {O}$ of $\mu $ and a positive integer $n_0$ such that, for any $n\ge n_0$ and any two microstates $\mathbf {x},\mathbf {y} \in \Omega (\sigma _n,\mathcal {O})$ , we have

$$ \begin{align*}\mbox{either} \quad \textrm{d}^{(V_n)}(\mathbf{x},\mathbf{y}) \ge \delta \quad \mbox{or}\quad \textrm{d}^{(V_n)}(\mathbf{x},\mathbf{y}) < \varepsilon.\end{align*} $$

We refer to any such $\delta $ as a shatter distance for the system along $\Sigma $ .
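Note that if $2\varepsilon \le \delta $ , then the relation ‘ $\textrm {d}^{(V_n)}(\mathbf {x},\mathbf {y}) < \varepsilon $ ’ on $\Omega (\sigma _n,\mathcal {O})$ is transitive and hence an equivalence relation: if $\textrm {d}^{(V_n)}(\mathbf {x},\mathbf {y}) < \varepsilon $ and $\textrm {d}^{(V_n)}(\mathbf {y},\mathbf {z}) < \varepsilon $ , then

$$ \begin{align*}\textrm{d}^{(V_n)}(\mathbf{x},\mathbf{z}) \le \textrm{d}^{(V_n)}(\mathbf{x},\mathbf{y}) + \textrm{d}^{(V_n)}(\mathbf{y},\mathbf{z}) < 2\varepsilon \le \delta,\end{align*} $$

so the dichotomy above forces $\textrm {d}^{(V_n)}(\mathbf {x},\mathbf {z}) < \varepsilon $ . The microstate space then decomposes into ‘clusters’ of diameter less than $\varepsilon $ that are pairwise separated by distance at least $\delta $ .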

5.1 Consequences of shattering

In this section, fix a sofic approximation $\Sigma = (\sigma _n)_n$ by homomorphisms, and assume that $(\mathtt {A}^\Gamma ,\mu ,T)$ has totally shattered microstate spaces along $\Sigma $ .

Theorem 5.2. For every $\varepsilon> 0$ , there is a neighbourhood $\mathcal {O}$ of $\mu $ for which the following holds. Let $(\mathtt {B}^\Gamma ,\nu ,T)$ be another shift system that has microstates along $\Sigma $ , let $(\mathtt {K}^\Gamma , \kappa ^\Gamma ,T)$ be a Bernoulli shift, and let

$$\begin{align*}\phi:\mathtt{B}^\Gamma \times \mathtt{K}^\Gamma \to \mathtt{A}\end{align*}$$

be a measurable map. If

(14) $$ \begin{align} \phi^\Gamma_\ast (\nu\times \kappa^\Gamma) \in \mathcal{O}, \end{align} $$

then

(15) $$ \begin{align} (\nu\times \kappa^\Gamma\times \kappa^\Gamma)\{(y,z,z'):\ \phi(y,z) \ne \phi(y,z')\} < \varepsilon. \end{align} $$

Intuitively, the conclusion is that if $\phi ^\Gamma $ is approximately a factor map onto $(\mathtt {A}^\Gamma ,\mu ,T)$ , then it must be approximately independent from the second coordinate in $\mathtt {B}^\Gamma \times \mathtt {K}^\Gamma $ . There are many ways to capture the latter assertion precisely, but (15) turns out to be convenient during the proof.

The proof of Theorem 5.2 has much in common with the main proof in [Reference Gamarnik and Sudan33]. In that paper, Gamarnik and Sudan used a relative of shattering to prove an a.a.s. upper bound on the maximum size of an independent set in a random regular graph that can be constructed using a local algorithm. The property they use there is now called the ‘overlap gap property’ and is actually a little weaker than being totally shattered. See [Reference Gamarnik32] for a recent survey. The reference [Reference Lyons46, Section 4] explains how the absence of an approximating local algorithm for certain combinatorial problems implies that a resulting limit process is not weakly contained in a Bernoulli shift.

Our proof of Theorem 5.2 needs a couple of known facts about microstate spaces for a product in which one factor is Bernoulli. We recall these as separate lemmas before starting the proof.

Lemma 5.3. Let $(\mathtt {B}^\Gamma ,\nu ,T)$ be a shift system and let $(\mathtt {L}^\Gamma ,\lambda ^\Gamma ,T)$ be a Bernoulli shift. Assume that $\mathbf {y}_n \in \mathtt {B}^{V_n}$ is a sequence such that $P^{\sigma _n}_{\mathbf {y}_n} \to \nu $ . Then

$$\begin{align*}\lambda^{V_n}\big\{\mathbf{z}:\ P^{\sigma_n}_{(\mathbf{y}_n,\mathbf{z})}\in \mathcal{O}\big\} \to 1\end{align*}$$

for every neighbourhood $\mathcal {O}$ of $\nu \times \lambda ^\Gamma $ .

Lemma 5.3 is folklore in the study of sofic entropy, and a full proof can be found inside the proofs of some of its consequences in the literature. The earliest, and perhaps the easiest to extract, is inside the proof of the lower bound in [Reference Bowen11, Theorem 8.1].

The next lemma is more specialized but was also used in the second author’s previous counterexample to the weak Pinsker conjecture for some sofic groups: it is a special case of [Reference Bowen17, Proposition 7.9]. It refers to ‘hereditary’ neighbourhoods of a shift-invariant measure. We do not repeat the definition of these here; the only property we need is that they form a basis for the weak $^\ast $ topology.

Lemma 5.4. Let $\mathcal {U} \subset \mathcal {V}$ be open neighbourhoods of $\nu \times \kappa ^\Gamma $ with $\mathcal {U}$ hereditary and $\mathcal {V}$ containing the closure of $\mathcal {U}$ . For every $\delta>0$ , if n is large enough, then for every $\mathbf {z}, \mathbf {z}' \in \mathtt {K}^{V_n}$ and $\mathbf {y} \in \mathtt {B}^{V_n}$ , if $(\mathbf {y},\mathbf {z}),\, (\mathbf {y}, \mathbf {z}')$ are in $\Omega (\sigma _n, \mathcal {U})$ , then they are $\delta $ -connected within $\Omega (\sigma _n,\mathcal {V})$ (this means there are $\mathbf {w}_1,\ldots , \mathbf {w}_k\in \Omega (\sigma _n,\mathcal {V})$ with $\mathbf {w}_1=(\mathbf {y},\mathbf {z})$ , $\mathbf {w}_k=(\mathbf {y},\mathbf {z}')$ and $d^{(V_n)}(\mathbf {w}_i, \mathbf {w}_{i+1})<\delta $ for all i).

Proof of Theorem 5.2.

Let $\delta $ be a shatter distance for the system along $\Sigma $ . Let $\varepsilon> 0$ be small enough that $2\varepsilon < \delta $ , and now let $\mathcal {O}_1$ and $n_0$ be a neighbourhood and positive integer as promised by Definition 5.1 for this choice of $\varepsilon $ . Lastly, choose a smaller neighbourhood $\mathcal {O}$ of $\mu $ whose closure is contained in $\mathcal {O}_1$ .

Let $\phi :\mathtt {B}^\Gamma \times \mathtt {K}^\Gamma \to \mathtt {A}$ be a measurable map such that $\phi _\ast ^\Gamma (\nu \times \kappa ^\Gamma ) \in \mathcal {O}$ . In the rest of the proof, we show that (15) holds for this $\mathcal {O}$ and with $3\varepsilon $ in place of $\varepsilon $ .

Step 1. Since $\mathtt {A}$ is finite and $\phi $ is measurable, for any $\eta> 0$ , there is an approximating map

$$\begin{align*}\psi:\mathtt{B}^\Gamma\times \mathtt{K}^\Gamma \to \mathtt{A}\end{align*}$$

such that

(16) $$ \begin{align} (\nu\times \kappa^\Gamma)\{\psi \ne \phi\} < \eta \end{align} $$

and such that (i) $\psi $ is a local map and (ii) $\psi $ depends on the coordinates in $\mathtt {K}^\Gamma $ only through some finite measurable partition $\mathcal {P}$ of $\mathtt {K}$ . If we choose $\eta $ sufficiently small in terms of $\mathcal {O}\subset \mathcal {O}_1$ and $\varepsilon $ , then (14) and (16) imply that $\psi ^\Gamma _\ast (\nu \times \kappa ^\Gamma )$ still lies in $\mathcal {O}$ , and also (16) implies that the desired conclusion (15) holds for $\phi $ with error $3\varepsilon $ if it holds for $\psi $ with error $2\varepsilon $ .

By replacing $\mathtt {K}$ with the set of cells $\mathcal {P}$ , we have therefore reduced our work to the case when $\mathtt {K}$ is finite and $\phi $ is F-local for some finite subset F of $\Gamma $ . We assume this for the rest of the proof and do not refer to $\psi $ again.

Having made these assumptions, let us note that the set

$$\begin{align*}\Delta:= \{(y,z,z'):\ \phi(y,z) \ne \phi(y,z')\}\end{align*}$$

is closed and open in $\mathtt {B}^\Gamma \times \mathtt {K}^\Gamma \times \mathtt {K}^\Gamma $ .

Step 2. Since $\phi $ is local, pushing forward by the equivariant map $\phi ^\Gamma $ acts continuously on probability measures. We may therefore choose a neighbourhood $\mathcal {V}$ of $\nu \times \kappa ^\Gamma $ such that $\phi ^\Gamma _\ast [\mathcal {V}] \subset \mathcal {O}$ . Since $\sigma _n$ is a homomorphism, we have $P_{\phi ^{\sigma _n}(\mathbf {x})}^{\sigma _n} = \phi ^\Gamma _* P_{\mathbf {x}}^{\sigma _n}$ , so this implies that

(17) $$ \begin{align} \phi^{\sigma_n}[\Omega(\sigma_n,\mathcal{V})] \subset \Omega(\sigma_n,\mathcal{O}) \quad \mbox{for every}\ n. \end{align} $$

In addition, since hereditary neighbourhoods form a basis, we may let $\mathcal {U}$ be a hereditary neighbourhood of $\nu \times \kappa ^\Gamma $ whose closure is contained in $\mathcal {V}$ .

Step 3. By assumption, there is a sequence $\mathbf {y}_n \in \mathtt {B}^{V_n}$ such that $P^{\sigma _n}_{\mathbf {y}_n} \to \nu $ . Fix this for the rest of the proof.

For each n, consider $\mathbf {z}_n$ and $\mathbf {z}_n' \in \mathtt {K}^{V_n}$ drawn independently at random according to $\kappa ^{V_n}$ , and consider the event

$$\begin{align*}E := \big\{(\mathbf{z}_n,\mathbf{z}_n') \in \mathtt{K}^{V_n}\times \mathtt{K}^{V_n}:\ (\mathbf{y}_n,\mathbf{z}_n),(\mathbf{y}_n,\mathbf{z}_n')\in \Omega(\sigma_n,\mathcal{U})\big\}.\end{align*}$$

By Lemma 5.3, we have

(18) $$ \begin{align} (\kappa^{V_n}\times \kappa^{V_n})(E) \to 1. \end{align} $$

When E occurs, we can draw the following additional conclusions:

  • The points $(\mathbf {y}_n,\mathbf {z}_n)$ and $(\mathbf {y}_n,\mathbf {z}_n')$ are $(\delta /|F|)$ -connected within $\Omega (\sigma _n,\mathcal {V})$ for all sufficiently large n (not depending on the specific values of $\mathbf {y}_n$ , $\mathbf {z}_n$ or $\mathbf {z}_n'$ ), by Lemma 5.4.

  • Since $\phi $ is F-local, the map $\phi ^{\sigma _n}$ is $|F|$ -Lipschitz for the normalized Hamming metrics (Lemma 2.3), so $\phi ^{\sigma _n}(\mathbf {y}_n,\mathbf {z}_n)$ and $\phi ^{\sigma _n}(\mathbf {y}_n,\mathbf {z}_n')$ both lie in $\Omega (\sigma _n,\mathcal {O})$ and are $\delta $ -connected within that set, by the previous conclusion and (17).

  • By total shattering and our choice of $\mathcal {O}$ , we can now deduce that

    (19) $$ \begin{align} \textrm{d}^{(V_n)}\big(\phi^{\sigma_n}(\mathbf{y}_n,\mathbf{z}_n),\phi^{\sigma_n}(\mathbf{y}_n,\mathbf{z}_n')\big) < \varepsilon \end{align} $$
    for all sufficiently large n. Indeed, if n is large enough and
    $$\begin{align*}\phi^{\sigma_n}(\mathbf{y}_n,\mathbf{z}_n) = \mathbf{w}_1,\ \dots,\ \mathbf{w}_l = \phi^{\sigma_n}(\mathbf{y}_n,\mathbf{z}_n')\end{align*}$$
    is a sequence in $\Omega (\sigma _n,\mathcal {O})$ with all consecutive distances less than $\delta $ , then total shattering implies that all of these distances are actually less than $\varepsilon $ . Then the triangle inequality implies that $\textrm {d}^{(V_n)}(\mathbf {w}_1,\mathbf {w}_3) < 2\varepsilon $ , which is still less than $\delta $ . We may therefore invoke total shattering again to conclude that in fact ${\textrm {d}^{(V_n)}(\mathbf {w}_1,\mathbf {w}_3) < \varepsilon }$ . Now a simple induction shows that in fact $\textrm {d}^{(V_n)}(\mathbf {w}_1,\mathbf {w}_i) < \varepsilon $ for every i, giving (19) when $i=l$ .

Unpacking the definitions of normalized Hamming metric and empirical distribution, the left-hand side of (19) is equal to

$$\begin{align*}\frac{1}{|V_n|}\sum_{v \in V_n} 1_{\left\{\phi(\Pi^{\sigma_n}_v\mathbf{y}_n,\Pi^{\sigma_n}_v\mathbf{z}_n)\ne \phi(\Pi^{\sigma_n}_v\mathbf{y}_n,\Pi^{\sigma_n}_v\mathbf{z}_n')\right\}} = P^{\sigma_n}_{(\mathbf{y}_n,\mathbf{z}_n,\mathbf{z}_n')}(\Delta).\end{align*}$$

Therefore, in view of the conclusions above, (18) implies that

(20) $$ \begin{align} (\kappa^{V_n}\times \kappa^{V_n})\big\{P^{\sigma_n}_{(\mathbf{y}_n,\mathbf{z}_n,\mathbf{z}_n')}(\Delta) < \varepsilon\big\} \to 1. \end{align} $$

However, since $\Delta $ is a closed and open set, another appeal to Lemma 5.3 (this time for the larger product $\nu \times \kappa ^\Gamma \times \kappa ^\Gamma $ ) shows that

(21) $$ \begin{align} (\kappa^{V_n}\times \kappa^{V_n}) \big\{P^{\sigma_n}_{(\mathbf{y}_n,\mathbf{z}_n,\mathbf{z}_n')}(\Delta)> (\nu\times \kappa^\Gamma\times \kappa^\Gamma)(\Delta) - \varepsilon\big\} \to 1. \end{align} $$

The limits (20) and (21) can hold simultaneously only if

$$\begin{align*}(\nu \times \kappa^\Gamma\times \kappa^\Gamma)(\Delta) < 2\varepsilon.\end{align*}$$

Since $\varepsilon $ was arbitrary, this completes the proof.

Theorem 5.2 has several more streamlined consequences.

Corollary 5.5. Suppose that $(\mathtt {A}^\Gamma ,\mu ,T)$ has totally shattered microstate spaces along $\Sigma $ .

  1. 1. If $(Y,\theta ,S)$ is any probability-preserving $\Gamma $ -system that has microstates along $\Sigma $ (recall the definition of this property from Subsection 2.2), and $(\mathtt {K}^\Gamma ,\kappa ^\Gamma ,T)$ is a Bernoulli shift, then any factor map

    $$\begin{align*}(Y\times \mathtt{K}^\Gamma,\theta\times \kappa^\Gamma,S\times T)\to (\mathtt{A}^\Gamma,\mu,T)\end{align*}$$
    factorizes through the coordinate projection $Y \times \mathtt {K}^\Gamma \to Y$ up to agreement a.e.
  2. 2. The system $(\mathtt {A}^\Gamma ,\mu ,T)$ has no nontrivial direct Bernoulli factors.

  3. 3. The system $(\mathtt {A}^\Gamma ,\mu ,T)$ is not weakly contained in a Bernoulli shift unless it is actually a trivial system, meaning that $\mu = \delta _{(\dots ,a,a,\dots )}$ for some $a \in \mathtt {A}$ .

Proof. Part 1. Let the factor map in question be $\xi ^\Gamma $ for some measurable map ${\xi :Y\times \mathtt {K}^\Gamma \to \mathtt {A}}$ . Let $\varepsilon> 0$ , and let $\mathcal {O}$ be the neighbourhood of $\mu $ given by Theorem 5.2 for this $\varepsilon $ .

By approximating the level sets of $\xi $ by measurable rectangles, for any $\eta> 0$ , we can choose (i) a factor map $\pi $ from $(Y,\theta ,S)$ to a finite-alphabet shift system $(\mathtt {B}^\Gamma ,\nu ,T)$ and (ii) a map $\phi :\mathtt {B}^\Gamma \times \mathtt {K}^\Gamma \to \mathtt {A}$ such that

(22) $$ \begin{align} (\theta\times \kappa^\Gamma)\{(y,z):\ \phi(\pi(y),z) \ne \xi(y,z)\} < \eta. \end{align} $$

Provided we choose $\eta $ small enough, the inequality (22) implies that

$$ \begin{align*}\phi^\Gamma_\ast(\nu\times \kappa^\Gamma) \in \mathcal{O}.\end{align*} $$

By shrinking further if necessary, we may also assume that $\eta < \varepsilon $ .

Fix a choice of $\eta $ with these properties, and consider the resulting maps $\pi $ and $\phi $ . Since $(Y,\theta ,S)$ has microstates along $\Sigma $ , so does its factor $(\mathtt {B}^\Gamma ,\nu ,T)$ . By our choice of $\eta $ , we have arranged that this factor system and the map $\phi $ satisfy (14). Therefore, Theorem 5.2 tells us that they also satisfy (15). Combined with (22), this gives

$$ \begin{align*} &(\theta\times \kappa^\Gamma\times \kappa^\Gamma)\{(y,z,z'):\ \xi(y,z)\neq \xi(y,z')\} \\ &\qquad\qquad\qquad\le 2\eta + (\nu\times \kappa^\Gamma\times \kappa^\Gamma)\{(y_1,z,z'):\ \phi(y_1,z)\neq \phi(y_1,z')\} < 2\eta + \varepsilon < 3\varepsilon. \end{align*} $$

Since $\varepsilon $ is arbitrary, the left-hand side here must actually equal $0$ , so in fact,

$$\begin{align*}\xi(y,z) = \xi(y,z') \quad \mbox{for}\ (\theta\times \kappa^\Gamma\times \kappa^\Gamma)\mbox{-a.e.}\ (y,z,z').\end{align*}$$

Therefore, possibly after adjusting on a set of measure zero, $\xi $ depends on only the first coordinate in $Y\times \mathtt {K}^\Gamma $ .

Part 2. If

$$\begin{align*}\xi^\Gamma:(Y,\theta,S)\times (\mathtt{K}^\Gamma,\kappa^\Gamma,T)\to (\mathtt{A}^\Gamma,\mu,T)\end{align*}$$

is an isomorphism, then the system $(Y,\theta ,S)$ is a factor of $(\mathtt {A}^\Gamma ,\mu ,T)$ , and so it inherits the property of having microstates along $\Sigma $ . Therefore, Part 1 shows that, up to agreement a.e., $\xi $ depends on only the first coordinate in $Y\times \mathtt {K}^\Gamma $ . Since $\xi ^\Gamma $ is an isomorphism, this is possible only if the Bernoulli factor $(\mathtt {K}^\Gamma ,\kappa ^\Gamma ,T)$ is trivial.

Part 3. Towards a contradiction, suppose that $(\mathtt {K}^\Gamma ,\kappa ^\Gamma ,T)$ is a Bernoulli shift and that $\phi _m:\mathtt {K}^\Gamma \to \mathtt {A}$ is a sequence of measurable maps such that

(23) $$ \begin{align} \phi^\Gamma_{m\ast}\kappa^\Gamma \to \mu. \end{align} $$

Apply Theorem 5.2 by inserting a trivial one-point system in the place of $(\mathtt {B}^\Gamma ,\nu ,T)$ . The conclusion is that

$$\begin{align*}(\kappa^\Gamma\times \kappa^\Gamma)\{(z,z'):\ \phi_m(z) \ne \phi_m(z')\} \to 0.\end{align*}$$

However, this implies that the distribution of $\phi _m$ is converging to $\delta _a$ for some $a\in \mathtt {A}$ . Combined with (23), it follows that $\mu = \delta _{(\dots ,a,a,\dots )}$ .

5.2 Shattering and Bernoulli splittings

Suppose the shift system $(\mathtt {A}^\Gamma , \mu , T)$ has totally shattered microstate spaces along $\Sigma $ . Note that as soon as $\varepsilon < \delta /2$ , if $\mathcal {O}$ is the neighbourhood of $\mu $ given by the definition, then the relation ' $\textrm {d}^{(V_n)} < \delta $ ' restricted to the microstate space $\Omega (\sigma _n, \mathcal {O})$ is an equivalence relation, so it partitions the microstate space into small-diameter, well-separated clusters.

In this section, we give a second proof of Corollary 5.5(2) and sketch a third. Both of these approaches are based on the clusters of microstates: one uses the number of clusters, which is one of the sofic homological invariants introduced in [Reference Bowen17], and the other uses the sizes of clusters. Both can be compared to the use of the ‘overlap gap property’ introduced in [Reference Gamarnik and Sudan33] to prove an a.a.s. upper bound on the size of an independent set on a random regular graph, when the independent set is required to be constructed from a local algorithm.

The size of clusters can be used as follows: if a direct Bernoulli factor exists, it can be shown that its entropy rate is a uniform lower bound on the exponential growth rate of the sizes of all microstate clusters, while if a system has totally shattered microstate spaces, then all of its clusters have subexponential size. This implies Corollary 5.5(2).

In the rest of the section, we give a proof using the number of clusters. First, we give relevant definitions.

If $\mathcal {O}$ is a subset of a metric space and $x,y \in \mathcal {O}$ , we say $x,y$ are $\delta $ -connected within $\mathcal {O}$ if there is a sequence of points $z_1, \ldots , z_l \in \mathcal {O}$ with $z_1 = x$ , $z_l = y$ , and $\textrm {d}(z_i, z_{i+1}) < \delta $ for each i. This defines an equivalence relation on $\mathcal {O}$ , which we denote $\mathrm {cl}(\mathcal {O},\delta )$ .
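This notion of $\delta $-connectedness is easy to experiment with directly. The following is a minimal sketch (toy data, not from the paper) that computes the classes of $\mathrm {cl}(\mathcal {O},\delta )$ for a handful of binary strings under the normalized Hamming metric, using a union-find over pairs at distance less than $\delta $.

```python
# Illustrative sketch: the delta-connected equivalence classes of a finite
# set of binary strings under the normalized Hamming metric.

def hamming(x, y):
    """Normalized Hamming distance between equal-length strings."""
    return sum(a != b for a, b in zip(x, y)) / len(x)

def delta_classes(points, delta):
    """Partition `points` into delta-connected components: x and y are joined
    whenever a chain of points links them with consecutive gaps < delta."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if hamming(points[i], points[j]) < delta:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[ri] = rj

    classes = {}
    for i in range(len(points)):
        classes.setdefault(find(i), []).append(points[i])
    return list(classes.values())

# Two well-separated clusters: within each, consecutive distances are 0.25,
# while all cross-cluster distances are at least 0.5.
pts = ["0000", "1000", "1100", "0111", "1111"]
print(len(delta_classes(pts, 0.3)))  # 2
```

Note the chaining: "1100" and "0000" are at distance $0.5 \ge \delta $, yet they land in the same class via the intermediate point "1000".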

Given a map $\sigma \colon \Gamma \to \operatorname {\mathrm {Sym}}(V)$ , a labeling $\mathbf {x} \in \mathtt {A}^{V}$ , an open set $\mathcal {O} \subset \operatorname {\mathrm {Prob}}(\mathtt {A}^\Gamma )$ , and $\delta>0$ , let

$$\begin{align*}\mathrm{cl}(\sigma, \mathbf{x}, \mathcal{O}, \delta) = [\mathbf{x}]_{\mathrm{cl}(\Omega(\sigma,\mathcal{O}),\delta)} = \{ \mathbf{z} \in \mathtt{A}^{V} \,:\, \mathbf{z} \text{ is }\delta\text{-connected to } \mathbf{x} \text{ within } \Omega(\sigma,\mathcal{O}) \} , \end{align*}$$

where $\Omega (\sigma ,\mathcal {O}) \subset \mathtt {A}^{V}$ has the normalized Hamming metric $\textrm {d}^{(V)}$ . In particular, $\mathrm {cl}(\sigma ,\mathbf {x}, \mathcal {O}, \delta ) \subset \Omega (\sigma ,\mathcal {O})$ .

Now if in addition to the above we have some $\mathcal {O}' \subset \mathcal {O}$ , the quotient

$$\begin{align*}\Omega(\sigma_n, \mathcal{O}')\,\big/\,\mathrm{cl}\big(\Omega(\sigma_n, \mathcal{O}), \delta\big)\end{align*}$$

is the set of clusters of $\mathcal {O}'$ -microstates that are $\delta $ -connected within $\Omega (\sigma _n, \mathcal {O})$ . We define

$$\begin{align*}b_{0,\Sigma}(\mu) := \sup_{\delta> 0,\ \mathcal{O} \ni \mu}\ \inf_{\mu \in \mathcal{O}' \subset \mathcal{O}}\ \limsup_{n\to\infty} \frac{1}{\lvert V_n\rvert} \log \big\lvert \Omega(\sigma_n, \mathcal{O}')\,\big/\,\mathrm{cl}\big(\Omega(\sigma_n, \mathcal{O}), \delta\big) \big\rvert.\end{align*}$$

Informally, we first set coarseness parameters $\mathcal {O}, \delta $ which divide microstate spaces into clusters, and then consider the exponential growth rate of the number of these clusters which contain $\mathcal {O}'$ -good microstates for $\mu $ . At first, it might seem more natural to consider directly the growth rate of the number of $\delta $ -connected clusters within a single $\Omega (\sigma _n, \mathcal {O})$ :

$$\begin{align*}\limsup_{n\to\infty} \frac{1}{\lvert V_n\rvert} \log \big\lvert \Omega(\sigma_n, \mathcal{O})\,\big/\,\mathrm{cl}\big(\Omega(\sigma_n, \mathcal{O}), \delta\big) \big\rvert,\end{align*}$$

then take $\delta $ to $0$ and in some sense $\mathcal {O}$ to $\mu $ . But there is no monotonicity in $\mathcal {O}$ : shrinking the neighbourhood of $\mu $ removes some microstates from $\Omega (\sigma _n, \mathcal {O})$ , which can both remove some clusters and break a cluster into multiple pieces. Considering pairs $\mathcal {O}' \subset \mathcal {O}$ is one natural way around this. See also the discussion of a related definition of connected model spaces in [Reference Austin4, Section 2.2].

In [Reference Bowen17], $b_{0,\Sigma }(\mu )$ is called the $0$ th Betti number of $\mu $ . If X is totally disconnected and $\mu \in \operatorname {\mathrm {Prob}}^\Gamma (X^\Gamma )$ , then $b_{0,\Sigma }(\mu )$ is a measure-conjugacy invariant [Reference Bowen17, Corollary 4.2].

It follows directly from the definition that $b_{0,\Sigma }(\mu ) \le \operatorname {\mathrm {h}}_\Sigma (\mu )$ (a similar inequality holds for higher-dimensional sofic homology theories [Reference Bowen17, Lemma 7.13]).

Lemma 5.6. If $(\mathtt {A}^\Gamma , \mu )$ has totally shattered microstate spaces along $\Sigma $ , then ${b_{0,\Sigma }(\mu ) = \operatorname {\mathrm {h}}_\Sigma (\mu )}$ .

Proof. Recall that in general $b_{0,\Sigma }(\mu ) \leq \operatorname {\mathrm {h}}_\Sigma (\mu )$ , so we only have to prove that having totally shattered microstate spaces implies the reverse inequality.

Let $\delta>0$ be as in the definition of totally shattered microstate spaces. Given $\varepsilon < \delta /2$ , there exists a neighbourhood U of $\mu $ such that for all large n, all $\mathbf {x},\mathbf {y} \in \Omega (\sigma _n, U)$ satisfy $\textrm {d}^{(V_n)}(\mathbf {x},\mathbf {y}) \in [0,\varepsilon ) \cup [\delta , \infty )$ . In particular, for every $\mathbf {x} \in \Omega (\sigma _n, U)$ ,

$$\begin{align*}\mathrm{cl}(\sigma_n, \mathbf{x}, U, \varepsilon) \subseteq {\mathrm{B}({\varepsilon}, {\mathbf{x}})} , \end{align*}$$

where ${\mathrm {B}({\varepsilon }, {\mathbf {x}})}$ is the radius- $\varepsilon $ ball around $\mathbf {x}$ . Thus,

$$\begin{align*}{\lvert {\mathrm{cl}(\sigma_n, \mathbf{x}, U, \varepsilon)}\rvert} \leq {\lvert {{\mathrm{B}({\varepsilon}, {\mathbf{x}})} }\rvert} \le \exp\big( {\lvert {V_n}\rvert} (\operatorname{\mathrm{H}}(\varepsilon) + \varepsilon \log {\lvert {\mathtt{A}}\rvert} + o(1) )\big). \end{align*}$$

This leads to a lower bound on the number of clusters for all $\mathcal {O}' \subset U$ :

$$\begin{align*}\big\lvert \Omega(\sigma_n, \mathcal{O}')\,\big/\,\mathrm{cl}\big(\Omega(\sigma_n, U), \varepsilon\big) \big\rvert \ge \lvert \Omega(\sigma_n, \mathcal{O}')\rvert \cdot \exp\big(-\lvert V_n\rvert \big(\operatorname{\mathrm{H}}(\varepsilon) + \varepsilon \log \lvert \mathtt{A}\rvert + o(1)\big)\big).\end{align*}$$

Hence,

$$\begin{align*}\inf_{\mathcal{O}' \subset U}\ \limsup_{n\to\infty} \frac{1}{\lvert V_n\rvert} \log \big\lvert \Omega(\sigma_n, \mathcal{O}')\,\big/\,\mathrm{cl}\big(\Omega(\sigma_n, U), \varepsilon\big) \big\rvert \ge \operatorname{\mathrm{h}}_\Sigma(\mu) - \operatorname{\mathrm{H}}(\varepsilon) - \varepsilon \log \lvert \mathtt{A}\rvert,\end{align*}$$

and taking the supremum over $\varepsilon>0$ and $U \ni \mu $ completes the proof, since $\operatorname {\mathrm {H}}(\varepsilon ) + \varepsilon \log \lvert \mathtt {A}\rvert \to 0$ as $\varepsilon \to 0$ .
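The ball-size bound on $|{\mathrm {B}({\varepsilon }, {\mathbf {x}})}|$ used in this proof is the standard entropy estimate for Hamming balls. The following sketch (not from the paper; `H` here denotes the natural-log binary entropy) checks it numerically for one choice of parameters, where the $o(1)$ term is not even needed.

```python
# Numerical sanity check of the counting bound: a normalized-Hamming ball of
# radius eps in A^V, |A| = q, has at most exp(|V| (H(eps) + eps log q)) points
# (up to an o(1) correction in the exponent, which these parameters don't need).
import math

def H(p):
    """Natural-log binary entropy, with H(0) = H(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log(p) - (1 - p) * math.log(1 - p)

def ball_size(n, eps, q):
    """Exact number of points of A^n (|A| = q) within normalized Hamming
    distance eps of a fixed point: sum of C(n,j)(q-1)^j over j <= eps*n."""
    return sum(math.comb(n, j) * (q - 1) ** j for j in range(int(eps * n) + 1))

n, eps, q = 100, 0.1, 4
assert ball_size(n, eps, q) <= math.exp(n * (H(eps) + eps * math.log(q)))
```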

Alternate proof of Corollary 5.5(2).

To obtain a contradiction, suppose that $(\mathtt {A}^\Gamma ,\mu ,T)$ is measurably conjugate to the direct product of a nontrivial Bernoulli shift $(\mathtt {K}^\Gamma ,\kappa ^\Gamma ,T)$ with another shift system $(\mathtt {B}^\Gamma ,\nu ,S)$ where $\mathtt {B}$ is a compact metrizable space. We may assume $\mathtt {B}$ is totally disconnected without loss of generality because any dynamical system is measurably conjugate to a system of this form; this assumption is used in results of [Reference Bowen17] cited below.

Sofic entropy is additive under taking direct products with a Bernoulli shift [Reference Bowen11, Theorem 8.1], so

(24) $$ \begin{align} \operatorname{\mathrm{h}}_\Sigma(\mathtt{A}^\Gamma,\mu,T) = \operatorname{\mathrm{H}}(\mathtt{K},\kappa) + \operatorname{\mathrm{h}}_\Sigma(\mathtt{B}^\Gamma,\nu,S)> \operatorname{\mathrm{h}}_\Sigma(\mathtt{B}^\Gamma,\nu,S). \end{align} $$

Theorem 7.8 of [Reference Bowen17] implies that the $0$ -dimensional sofic homology theories of $(\mathtt {A}^\Gamma ,\mu ,T)$ and $(\mathtt {B}^\Gamma ,\nu ,S)$ are equivalent. In particular, this implies that the exponential growth rate of the number of clusters in the microstate spaces of the two actions is the same. In the notation of [Reference Bowen17], this means

$$ \begin{align*}b_{0,\Sigma}(\mathtt{A}^\Gamma,\mu,T) = b_{0,\Sigma}(\mathtt{B}^\Gamma,\nu,S)\le \operatorname{\mathrm{h}}_\Sigma(\mathtt{B}^\Gamma,\nu,S),\end{align*} $$

where the last inequality holds by [Reference Bowen17, Lemma 7.13].

Because $(\mathtt {A}^\Gamma ,\mu ,T)$ has totally shattered microstate spaces, by Lemma 5.6, we have $b_{0,\Sigma }(\mathtt {A}^\Gamma ,\mu ,T) = \operatorname {\mathrm {h}}_\Sigma (\mathtt {A}^\Gamma ,\mu ,T)$ . Combined with the previous inequality, this implies

$$ \begin{align*}\operatorname{\mathrm{h}}_\Sigma(\mathtt{A}^\Gamma,\mu,T) = b_{0,\Sigma}(\mathtt{A}^\Gamma,\mu,T) \le \operatorname{\mathrm{h}}_\Sigma(\mathtt{B}^\Gamma,\nu,S),\end{align*} $$

which contradicts (24). This contradiction finishes the proof.

PART II

Parity check subshifts

In this part, we fix natural numbers $d,k$ with $k> d \geq 3$ and let $\Gamma =\Gamma _{d,k}$ be the d-fold free product of order-k cyclic groups:

$$\begin{align*}\Gamma := \langle s_1,\dots,s_d:\ s_1^k = \dots = s_d^k = e\rangle = \underbrace{\mathbb{Z}_k\ast \cdots \ast \mathbb{Z}_k}_{d}.\end{align*}$$

Let $X \le \mathbb {Z}_2^\Gamma $ be the closed subgroup defined by

$$\begin{align*}X = \left\{x \in \mathbb{Z}_2^\Gamma:\ \sum_{j=0}^{k-1}x_{gs_i^j} = 0 \ \forall g\in \Gamma,\ i = 1,\dots,d\right\},\end{align*}$$

and let $\mu = m_X$ be the Haar probability measure on X.

If F is any subset of $\Gamma $ , let $X_F$ be the image of X under the coordinate projection map $\mathbb {Z}_2^\Gamma \to \mathbb {Z}_2^F$ , and let $m_F$ be the pushforward of $m_X$ under this projection. We mostly use these notations when F is $B_r$ , the ball of radius r centered at the identity in the Cayley graph of $\Gamma $ , where we use $\{s_i^j:~ 1\le i \le d, 1\le j \le k-1\}$ for the generating set.
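For intuition about these balls, one can enumerate $B_r$ directly: elements of $\Gamma _{d,k}$ have a normal form as reduced 'syllable' words in the $s_i^j$ (adjacent syllables using distinct generators), and for this generating set the word metric counts syllables. The following sketch (not from the paper) does this by breadth-first search.

```python
# Sketch: elements of Gamma_{d,k} as reduced syllable words ((i1,j1),(i2,j2),...)
# with adjacent i's distinct and 1 <= j <= k-1, and a BFS computing the ball B_r
# for the generating set {s_i^j}.
from collections import deque

d, k = 3, 4

def mult_gen(word, i, j):
    """Right-multiply a reduced word by s_i^j and reduce the last syllable."""
    if word and word[-1][0] == i:
        jj = (word[-1][1] + j) % k
        return word[:-1] + (((i, jj),) if jj else ())
    return word + ((i, j),)

def ball(r):
    """All group elements within word-metric distance r of the identity ()."""
    seen = {()}
    frontier = deque([()])
    for _ in range(r):
        nxt = deque()
        for w in frontier:
            for i in range(d):
                for j in range(1, k):
                    u = mult_gen(w, i, j)
                    if u not in seen:
                        seen.add(u)
                        nxt.append(u)
        frontier = nxt
    return seen

# |B_1| = 1 + d(k-1) and |B_2| = |B_1| + d(k-1)(d-1)(k-1).
print(len(ball(1)), len(ball(2)))  # 10 64
```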

6 LDPC codes and measures on microstate spaces

6.1 The use of LDPC codes

Our proofs of the main theorems are considerably simplified by using the special structure of the system as a subgroup of $\mathbb {Z}_2^\Gamma $ . We largely do so via the corresponding finitary codes constructed over the sofic approximations. Given a k-uniform homomorphism $\sigma :\Gamma _{d,k}\to \operatorname {\mathrm {Sym}}(V)$ , let

$$\begin{align*}X_\sigma := \left\{\mathbf{x} \in \mathbb{Z}_2^V:\ \sum_{j=0}^{k-1}x_{\sigma_i^j(v)} = 0\ \forall v\in V,\ i=1,\dots,d\right\}.\end{align*}$$

Let $\mu _\sigma $ be the uniform distribution on $X_\sigma $ .

These are the obvious finitary analogs of X and $m_X$ themselves. It turns out that these finitary constructs can be used as better and better approximations to the infinitary system and measure, and their linear structure makes them easier to analyze than the ‘looser’ sets $\Omega (\sigma ,U)$ that are used to define sofic entropy.
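As a concrete illustration, here is a brute-force computation of $X_\sigma $ for a toy homomorphism (a sketch only; the small parameters $n=8$ , $k=4$ , $d=2$ ignore the paper's standing assumption $k> d \geq 3$ ).

```python
# Toy instance of the code X_sigma: |V| = 8, k = 4, d = 2, with sigma(s_1)
# and sigma(s_2) each acting by two 4-cycles.
from itertools import product

n = 8
# Each generator's permutation, recorded as the list of its k-cycles (orbits).
orbits = {
    1: [(0, 1, 2, 3), (4, 5, 6, 7)],  # sigma(s_1)
    2: [(0, 4, 1, 5), (2, 6, 3, 7)],  # sigma(s_2)
}

def in_code(x):
    """Check the parity constraints: each orbit of each generator sums to 0 mod 2."""
    return all(sum(x[v] for v in orb) % 2 == 0
               for orbs in orbits.values() for orb in orbs)

X_sigma = [x for x in product((0, 1), repeat=n) if in_code(x)]

# The four parity checks have one linear dependency over Z_2 (the two rows for
# each generator sum to the all-ones vector), so the rank is 3 and
# |X_sigma| = 2^(8-3) = 32.
print(len(X_sigma))  # 32
```

Being the kernel of a linear map, $X_\sigma $ is automatically a subgroup of $\mathbb {Z}_2^V$ : the sum of any two codewords is again a codeword.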

In fact, sets such as $X_\sigma $ are classical objects in coding theory. They are linear codes over the field $\mathbb {Z}_2$ , each defined by a collection of parity-check constraints. Since each of those parity-checks involves only k vertices, and k is fixed as n grows, these are examples of low-density parity-check (‘LDPC’) codes. Such codes were introduced in Gallager’s PhD thesis [Reference Gallager31, Reference Gallager30]. After many years of relative neglect, they were rediscovered independently by MacKay in the late 1990s [Reference MacKay47], and they are now a textbook family of codes with desirable properties. A good basic reference for their theory is [Reference MacKay48, Chapter 47], and a more dedicated treatment is [Reference Richardson and Urbanke61]. In fact, the essence of the parity-check subshift X itself already appears in those sources too, playing the role of an ‘idealized limit code’ on which to investigate the performance of local decoding algorithms: see, for instance, [Reference MacKay48, Figure 47.11] and the discussion around it.

6.2 Outline of the rest of the paper

Our use of the measures $\mu _\sigma $ to analyze the system $(X,m_X,T)$ rests on the following main results. As before, we confine the index n to multiples of k. Abbreviate $X_{\sigma _n}$ to $X_n$ and $\mu _{\sigma _n}$ to $\mu _n$ .

Recall that $\mathbb {P}_n$ is the uniform probability measure on $\operatorname {\mathrm {Hom}}_{\mathrm {unif}}(\Gamma , \operatorname {\mathrm {Sym}}(n))$ , the set of k-uniform homomorphisms $\sigma :\Gamma \to \operatorname {\mathrm {Sym}}(n)$ .

Theorem 6.1. There are subsets

$$\begin{align*}\Omega^{\prime}_n \subseteq \operatorname{\mathrm{Hom}}_{\mathrm{unif}}(\Gamma,\operatorname{\mathrm{Sym}}(V_n))\end{align*}$$

such that $\mathbb {P}_n(\Omega ^{\prime }_n) \to 1$ and the following holds. If $\sigma _n\in \Omega _n'$ for each n, then

  1. 1. $(\mu _n)_n$ has property M;

  2. 2. $(\mu _n)_n$ converges locally and empirically to $m_X$ ;

  3. 3. $(X,m_X,T)$ has totally shattered microstate spaces along $\Sigma =(\sigma _n)_n$ .

Our proof of Theorem 6.1 relies on the linear structure of the codes $X_\sigma $ .

In the next few sections, we introduce notation to formulate Proposition 7.1 which, roughly speaking, rules out near-cancellations among parity-check words of a typical $\sigma _n \sim \mathbb {P}_n$ . §7.1 proves Theorem 6.1(1) from Proposition 7.1. The rest of §7 proves Proposition 7.1.

Item (2) of Theorem 6.1 is proven in §8. Its proof relies on item (1). Item (3) is proven in §10. Its proof does not refer to items (1) or (2). Theorem B is an immediate consequence of item (3) and Corollary 5.5(2).

Theorem 1.3 is proven in §9. Part (b) of that theorem follows readily from items (1) and (2) of Theorem 6.1 together with Theorem 4.3. Part (a) (which computes the sofic entropy value) uses item (2) and the Bethe-Kikuchi entropy theory of §3.

6.3 Random factor graphs

In this subsection, fix a size n that is divisible by k and suppress it from the notation: thus, for instance, $\mathbb {P}$ stands for $\mathbb {P}_n$ . Most of our work towards Theorem 6.1 consists of estimates of various probabilities under $\mathbb {P}$ .

Several of these estimates involve a sum or union bound over possible subfamilies of the set of all hyper-edges of the form (9). We need to be able to move the sum outside an expectation, and for this purpose, the sum must be over a range which is fixed, not random. For this reason, it is convenient to augment the information in $\sigma _n$ with a labeling of the family of hyper-edges (9) by a fixed index set.

Let $E_1$ , …, $E_d$ be disjoint sets, each of size $n/k$ , and let $E := E_1\cup \cdots \cup E_d$ . Taking some terminology from coding theory, we refer to the elements of E as check nodes: this is explained further in Subsection 6.4 below. Let $\sigma \in \operatorname {\mathrm {Hom}}_{\mathrm {unif}}(\Gamma ,\operatorname {\mathrm {Sym}}(V))$ with $|V|=n$ . Fix $i \in [d]$ , and note that V is partitioned into the orbits of the generator $\sigma (s_i)$ . Each orbit corresponds to a hyper-edge as in (9). Since each hyper-edge has size k, there are $n/k$ of them, and so there exists a bijection between $E_i$ and this family of hyper-edges. Let us choose such a bijection uniformly and independently at random for each i, and record the result as a subset $H\subseteq E\times V$ : a pair $(e,v) \in E_i\times V$ lies in H if e is attached by the $i^{\mathrm {th}}$ bijection to the hyper-edge that contains v.

We regard H as a bipartite graph on the disjoint union of E and V. As such, each check node in E has exactly k neighbours in V, and each vertex in V has exactly one neighbour in each of the subsets $E_i$ (and thus d neighbours in total). It follows that each intersection $H\cap (E_i\times V)$ is equivalent to a partition of V into parts labeled by $E_i$ . In the sequel, we borrow some more terminology from coding theory and refer to any such bipartite graph H on E and V as a factor graph: see, for instance, [Reference MacKay48, Sections 26.1 and 47.2]. Beware that this is actually a slight deviation from standard usage, which would not insist that H be a union of the partitions $H\cap (E_i\times V)$ , but here we do take this as part of the definition of a ‘factor graph’.
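To make the construction concrete, the following sketch (toy permutations, not from the paper, with a fixed labeling standing in for the uniformly random bijections) builds such a graph H from the orbits of the generators and checks its degree structure.

```python
# Sketch: from the generators' permutations to a labeled factor graph H on
# check nodes E = E_1 u ... u E_d and vertices V = {0, ..., n-1}.

def cycles(perm):
    """Decompose a permutation of {0,...,n-1}, given as a list v -> perm[v],
    into its cycles (the orbits, i.e. the hyper-edges)."""
    seen, out = set(), []
    for v in range(len(perm)):
        if v not in seen:
            cyc, w = [], v
            while w not in seen:
                seen.add(w)
                cyc.append(w)
                w = perm[w]
            out.append(tuple(cyc))
    return out

n, k, d = 8, 4, 2
perms = [
    [1, 2, 3, 0, 5, 6, 7, 4],  # sigma(s_1) = (0 1 2 3)(4 5 6 7)
    [4, 5, 6, 7, 1, 0, 3, 2],  # sigma(s_2) = (0 4 1 5)(2 6 3 7)
]

# Label the n/k orbits of generator i by check nodes E_i = {(i,0), (i,1), ...}.
H = set()
for i, perm in enumerate(perms):
    for a, orbit in enumerate(cycles(perm)):
        for v in orbit:
            H.add(((i, a), v))

# Degree structure: each check node has k neighbours, each vertex has d.
check_deg, vert_deg = {}, {}
for (e, v) in H:
    check_deg[e] = check_deg.get(e, 0) + 1
    vert_deg[v] = vert_deg.get(v, 0) + 1
print(sorted(set(check_deg.values())), sorted(set(vert_deg.values())))  # [4] [2]
```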

More generally, if $F \subseteq E$ , then a partial factor graph on F and V is a bipartite graph $M \subseteq F\times V$ such that every check node in F has precisely k neighbours in V and every vertex in V is joined to at most one check node in each intersection $F\cap E_i$ . Equivalently, there exists a factor graph H such that $M = H \cap (F\times V)$ . In particular, if $F = E_i$ , then a partial factor graph is simply a partition of V into k-sets that are labeled by $E_i$ .

If H is a factor graph on E and V and $F\subseteq E$ , then the vertex neighbourhood of F is the set

$$\begin{align*}\mathtt{Vert}(H;F) := \{v \in V:\ (e,v) \in H\ \mbox{for some}\ e \in F\}.\end{align*}$$

We also use the notation $\mathtt {Vert}(M;F) $ for a partial factor graph M in the same way.

Similarly, the check-node neighbourhood of $U\subseteq V$ is the set

$$\begin{align*}\mathtt{Check}(H;U) := \{e \in E:\ (e,v) \in H\ \mbox{for some}\ v \in U\}.\end{align*}$$

We may iterate these definitions to define neighbourhoods with larger radii. In general, for $F \subseteq E$ or $U\subseteq V$ , we define

$$ \begin{align*} \mathtt{Vert}_1(H;F) &:= \mathtt{Vert}(H;F),\\ \mathtt{Check}_1(H;U) &:= \mathtt{Check}(H;U),\\ \mathtt{Check}_1(H;F) &:= \mathtt{Check}_1\big(H;\mathtt{Vert}_1(H;F)\big) \\ \mbox{and} \quad \mathtt{Vert}_1(H;U) &:= \mathtt{Vert}_1\big(H;\mathtt{Check}_1(H;U)\big). \end{align*} $$

Then for integers $r> 1$ , we make the recursive definitions

$$\begin{align*}\mathtt{Vert}_r(H;F) := \mathtt{Vert}_1\big(H;\ \mathtt{Vert}_{r-1}(H;F)\big),\end{align*}$$

and the same with U in place of F or $\mathtt {Check}$ in place of $\mathtt {Vert}$ . In graph theoretic terms, if $F \subseteq E$ , then $\mathtt {Vert}_r(H;F)$ is the set of all vertices whose graph distance in H is at most $2r-1$ from F, $\mathtt {Check}_r(H;F)$ is the set of all check nodes whose graph distance in H is at most $2r$ from F, and similarly for subsets of V.
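These iterated neighbourhoods are easy to compute on a small example. The sketch below (toy factor graph, not from the paper) implements $\mathtt {Vert}$ and $\mathtt {Check}$ and checks the graph-distance description of $\mathtt {Vert}_r$ .

```python
# Sketch: iterated neighbourhoods Vert_r and Check_r on a toy factor graph.
H = set()
check_orbits = {        # check node -> its k vertex neighbours (toy data)
    "a1": (0, 1, 2, 3), "a2": (4, 5, 6, 7),   # checks in E_1
    "b1": (0, 4, 1, 5), "b2": (2, 6, 3, 7),   # checks in E_2
}
for e, orbit in check_orbits.items():
    for v in orbit:
        H.add((e, v))

def Vert(F):   # vertex neighbourhood of a set F of check nodes
    return {v for (e, v) in H if e in F}

def Check(U):  # check-node neighbourhood of a set U of vertices
    return {e for (e, v) in H if v in U}

def Vert_r(F, r):  # Vert_r(H; F) for a set F of check nodes
    U = Vert(F)
    for _ in range(r - 1):
        U = Vert(Check(U))
    return U

# Vert_r(H; F) is the set of vertices at graph distance at most 2r-1 from F:
# radius 1 gives a1's own vertices, radius 2 already reaches everything.
print(sorted(Vert_r({"a1"}, 1)), sorted(Vert_r({"a1"}, 2)))
```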

These neighbourhoods are compatible with the Cayley graph neighbourhoods $B_r = {\mathrm {B}({e}, {r})}$ induced by the word metric on $\Gamma $ (with respect to the generating set $\{s_i^j:~ 1 \le i \le d, 1\le j \le k-1\})$ . Specifically, if H arises from $\sigma $ through the construction above, and $U\subseteq V$ , then

$$\begin{align*}\sigma^{B_r}(U) := \{\sigma^g(u):\ g\in B_r,\ u \in U\} = \mathtt{Vert}_r(H;U).\end{align*}$$

Now consider $\sigma \sim \mathbb {P}$ and generate H from it as described above. Then H results from two random steps: the choice of $\sigma $ and then the choice of a bijection for each i. Let $\tilde {\mathbb {P}}$ be the joint distribution of $(\sigma ,H)$ after this construction. This is a probability measure on

$$\begin{align*}\tilde{\Omega} := \operatorname{\mathrm{Hom}}_{\mathrm{unif}}(\Gamma,\operatorname{\mathrm{Sym}}(V)) \times \{0,1\}^{E\times V}.\end{align*}$$

It is a coupling of $\mathbb {P}$ to the law of H described above, and it has the following simple properties:

  • Given $\sigma $ , the conditional distribution of H is uniform over $((n/k)!)^d$ choices of bijections.

  • Given H, the conditional distribution of $\sigma $ is uniform over all choices of cyclic orderings for each hyper-edge of the multi-hyper-graph: there are ${((k-1)!)^{dn/k}}$ such choices for any H.

  • Under the marginal distribution of H, the intersections $H\cap (E_i\times V)$ are independent as i varies.

For $(\sigma ,H)$ drawn from $\tilde {\mathbb {P}}$ , the underlying multi-hyper-graph may be read off from either coordinate: it is the multi-set of all $\sigma (s_i)$ orbits for $i\in [d]$ , and it is also the collection of all vertex neighbourhoods of the check nodes according to H. The labeling of these hyper-edges by the fixed set E that is given by H is convenient for union bounds and other forms of counting. Most of our probabilistic estimates concerning these hyper-graphs later refer to $\tilde {\mathbb {P}}$ rather than $\mathbb {P}.$

Remark. The random hyper-graphs on V that arise from $\sigma \sim \mathbb {P}$ are k-uniform and d-regular.

Models of such random hyper-graphs already have an established place in the literature on probabilistic combinatorics: see, for instance, [Reference Wormald76, Subsection 3.5] and the references given there. However, most of that literature is dedicated to a uniform random choice of such a hyper-graph, and this is not the same as the distribution that results from a random choice of $\sigma $ . The point is that if the hyper-graph is generated by a homomorphism $\sigma $ , then its hyper-edges can be classified according to which generator $s_i$ gave rise to them. For each single i, the corresponding hyper-edges form a partition of V into k-sets. In general, a k-uniform d-regular hyper-graph need not be a union of partitions.

In case $k=2$ , the difference here is between a uniformly random d-regular graph and the sum of d independent random matchings. In this case, the difference has been studied in some depth, with the outcome that these models are ‘contiguous’: see, for instance, [Reference Greenhill, Janson, Kim and Wormald34] or [Reference Wormald76, Subsection 4.3]. The relation of contiguity is strong enough to allow us to transfer most phenomena of interest from one model to the other.

However, for $k> d \ge 3$ , it turns out that contiguity fails. While we have not found a reference for this fact in the literature, it follows fairly easily from some other standard results, so we explain this in Appendix A.

For uniformly random k-uniform d-regular hyper-graphs, estimates on the typical behaviour of the resulting LDPC codes are widely available in the coding theory literature. For example, the typical rate of the resulting LDPC code is known very exactly from [Reference Miller and Cohen51] (we cite this in our proof of non-contiguity in Appendix A). These known results would include most of the facts we need here, were it not for the difference in the underlying random hyper-graph model. However, we have not found the analogous estimates for our distributions $\mathbb {P}$ , so we must develop them here from scratch as necessary. Nevertheless, the conclusions generally look the same as in those previous works, and we have been guided by those throughout.

6.4 Parity-check matrices

To prove the desired properties of the random measures $\mu _{\sigma _n}$ , we make careful use of the linear structure of the codes $X_n$ . As is standard in coding theory, this structure is conveniently summarized by a parity-check matrix.

To introduce this point of view, we start with some more notation. If a ground set A is understood and $B \subseteq A$ , then $\mathbf {e}_B$ denotes the mod- $2$ indicator function of B: this is the element of $\mathbb {Z}_2^A$ with entries equal to $\mathtt {1}$ precisely at the indices in B.

Now consider a vertex set V of size n which is a multiple of k, and let $E = E_1\cup \cdots \cup E_d$ as in Subsection 3.1. Let H be a factor graph on E and V, and turn it into the $(E\times V)$ -matrix $\mathbf {H} = \mathbf {e}_H$ , which also takes values in $\mathbb {Z}_2$ . If this generates the same hyper-graph as $\sigma \in \operatorname {\mathrm {Hom}}_{\mathrm {unif}}(\Gamma , \operatorname {\mathrm {Sym}}(V))$ , then our code is given by

$$\begin{align*}X_\sigma = \{\mathbf{x} \in \mathbb{Z}_2^V:\ \mathbf{H}\mathbf{x} = \boldsymbol{0}\} = \ker \mathbf{H}.\end{align*}$$

In this role, $\mathbf {H}$ is called the parity-check matrix (for an introduction, see, for instance, [Reference Cameron and van Lint20, Chapter 9], [Reference Cover and Thomas23, Section 7.11] or [Reference MacKay48, Chapter 1], especially the discussion of Exercise 1.9). This is why we refer to the elements of E as ‘check nodes’: each corresponds to a row of $\mathbf {H}$ , which every codeword $\mathbf {x}$ must be orthogonal to. The representation of a code using a set of connections between vertices and check nodes is a common example of a ‘factor graph representation’ [Reference MacKay48, Section 47.2] – hence our use of the term ‘factor graph’ for H.

Because $X_\sigma $ is a linear subspace of $\mathbb {Z}_2^V$ , it may equivalently be specified via its dual code

$$ \begin{align*}X_\sigma^\perp := \{\mathbf{y} \in \mathbb{Z}_2^V:\ \langle\mathbf{y}, \mathbf{x}\rangle = 0 \quad\pmod{2} \ \forall \mathbf{x} \in X_\sigma \}.\end{align*} $$

Below, we refer to elements of $X_\sigma ^\perp $ as parity checks. By the construction of $X_\sigma $ , this $X_\sigma ^\perp $ is precisely the linear subspace of $\mathbb {Z}_2^V$ spanned by the rows of $\mathbf {H}$ – that is, by the vectors $\mathbf {e}_{\mathtt {Vert}(H;\{e\})}$ for $e \in E$ . More succinctly,

$$\begin{align*}X_\sigma^\perp = \mathrm{img}\,(\mathbf{H}^{\mathsf{T}}).\end{align*}$$
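These linear-algebra facts are easy to check numerically in small cases. The sketch below (our own illustration; the toy matrix is arbitrary) realizes a code as the kernel of a parity-check matrix over $\mathbb {Z}_2$ and confirms the rank-nullity relation $\dim X_\sigma = |V| - \operatorname {rank} \mathbf {H}$ :

```python
from itertools import product

def rank_mod2(rows):
    """Row-reduce a list of 0/1 rows over Z_2 and return the rank."""
    rows = [list(r) for r in rows]
    rank, ncols = 0, len(rows[0])
    for col in range(ncols):
        pivot = next((i for i in range(rank, len(rows)) if rows[i][col]), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for i in range(len(rows)):
            if i != rank and rows[i][col]:
                rows[i] = [(a + b) % 2 for a, b in zip(rows[i], rows[rank])]
        rank += 1
    return rank

# Toy parity-check matrix H: 3 checks on 6 bits (each row plays the role of e_{Vert(H;e)})
H = [[1, 1, 0, 0, 1, 0],
     [0, 1, 1, 0, 0, 1],
     [1, 0, 0, 1, 0, 1]]

# The code X = ker H, found by brute force over Z_2^6
X = [x for x in product([0, 1], repeat=6)
     if all(sum(h * xi for h, xi in zip(row, x)) % 2 == 0 for row in H)]

# rank-nullity over Z_2: dim X = |V| - rank(H)
assert len(X) == 2 ** (6 - rank_mod2(H))
```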

Parts (1) and (2) of Theorem 6.1 are proved by counting relations or ‘near relations’ among the rows of $\mathbf {H}$ . The second of these parts requires the more complicated calculation. It is based on Proposition 7.1 below. With that proposition in hand, we can also prove Theorem 6.1(2); see Section 8.

7 Property M

In this section, we again suppress the subscript n from our notation, as in Subsection 6.3. As before, this index will tend to $\infty $ along multiples of k.

The main part of the proofs of parts (1) and (2) of Theorem 6.1 is another, more technical proposition. Put roughly, it rules out most ‘near cancellations’ among the parity-check words of a typical $(\sigma ,H)$ drawn at random. However, for the application to Theorem 6.1, we need such a result not only when $(\sigma ,H) \sim \tilde {\mathbb {P}}$ but also after conditioning $(\sigma ,H)$ on a small fraction of the vertices and check nodes.

We formulate this technical proposition next. For each $i \in [d]$ , let $F_i \subseteq E_i$ and let $W_i := \mathtt {Vert}(H;F_i)$ . Let $w_i:= |W_i| = k|F_i|$ . Let $F:= F_1\cup \cdots \cup F_d$ , $W := W_1 \cup \cdots \cup W_d$ , and $w:= |W|$ ; see Figure 1. Then

$$\begin{align*}\max\{w_1,\dots,w_d\} \le w \le w_1 + \cdots + w_d.\end{align*}$$

Figure 1 Diagrams showing full factor graph H (left) and partial factor graph M (right). The square vertices on the right of each graph are the check nodes, colored according to their membership in the sets $E_1, E_2, E_3$ . The distribution $\tilde {\mathbb {P}}^M$ draws a new pair $(\sigma , H)$ conditioned on the edges on the right being present.

Let M be a partial factor graph on F and V, and let $\tilde {\mathbb {P}}^M$ be the distribution obtained by conditioning $\tilde {\mathbb {P}}$ on the event that $H\cap (F\times V) = M$ . The key to Theorem 6.1 is that if the sets $F_i$ are small enough, then after this conditioning, the rest of H is very unlikely to create many new parity checks that involve only vertices inside W. Roughly speaking, this means that if $\mathbf {x} \in \mathbb {Z}^W_2$ satisfies all of the parity checks arising from F, then, with high probability, it admits an extension satisfying all of the parity checks.

Here and in the rest of the paper, if $F\subseteq E$ and $U\subseteq V$ , then $\mathbf {H}_{F\times U}$ denotes the submatrix of $\mathbf {H}$ indexed by these sets, and we use analogous notation for the transpose $\mathbf {H}^{\mathsf {T}}$ and for vectors indexed by either E or V. We identify $\mathbb {Z}_2^{E\setminus F}$ as a subspace of $\mathbb {Z}_2^E$ in the obvious way, and so any $\mathbf {y} \in \mathbb {Z}_2^{E\setminus F}$ may be written as a tuple $(\mathbf {y}_1,\dots ,\mathbf {y}_d)^{\mathsf {T}}$ with $\mathbf {y}_i \in \mathbb {Z}_2^{E_i\setminus F_i}$ .

Proposition 7.1 (‘Few additional checks inside W’).

For every $K> 0$ and $\varepsilon> 0$ , the following holds for all sufficiently small $\delta $ (depending on d, k, $\varepsilon $ and K). Let $F_i$ , $W_i$ , $w_i$ and M be as above. If

(25) $$ \begin{align} \delta n \le w_1 + \dots + w_d \le K \delta n, \end{align} $$

then

$$ \begin{align*} \tilde{\mathbb{P}}^M\Big(\exists \mathbf{y} \in \mathbb{Z}_2^{E\setminus F}\ \mbox{such that}\ \min\{|\mathbf{y}_i|,|\mathbf{e}_{E_i\setminus F_i} - \mathbf{y}_i|\} \ge &\varepsilon\delta\frac{n}{k}\ \mbox{for some } i\\ &\quad \mbox{ and}\ (\mathbf{H}^{\mathsf{T}}\mathbf{y})_{V\setminus W} = \boldsymbol{0}\Big) \to 0 \end{align*} $$

as $n\to \infty $ . The rate of convergence depends only on d, k and $\delta $ , and it is uniform in $\delta $ provided $\delta $ is small enough and also bounded away from zero.

We show how Proposition 7.1 implies Theorem 6.1 in the next subsection, and then we prove Proposition 7.1 itself in Subsection 7.2.

Remark. Proposition 7.1 has a similar flavor to the analysis of the satisfiability threshold for the random combinatorial problem known as random XORSAT. But our setting has the additional complication that we must analyse conditional probabilities given the behaviour of the random factor graph H on the small sets $F_i$ , which seems to make our situation less ‘homogeneous’. The random XORSAT model is treated in [Reference Mézard and Montanari49, Chapter 18] using the paradigm of ‘belief propagation’. It is possible that this approach could also be brought to bear on the situation of Proposition 7.1, possibly leading to an alternative proof, but we have not pursued this idea further.

7.1 Property M from few additional checks

To deduce Theorem 6.1 from Proposition 7.1, we also need two simpler calculations that we present as separate lemmas.

The first concerns the following construction. Fix a subset R of V and let $r> 0$ . For any factor graph $H\subset E\times V$ , we can form the neighbourhood $F := \mathtt {Check}_r(H;R)$ , and then the intersection $M := H \cap (F\times V)$ is a partial factor graph on F and V. Let us call the pair $(F,M)$ possible with respect to $(R,r)$ if they can arise from a factor graph in this way.

Lemma 7.2. Let R and r be as above, let $(F,M)$ be possible with respect to $(R,r)$ , and let $(\sigma ,H) \sim \tilde {\mathbb {P}}$ . If the event

$$\begin{align*}\{H:\ H\cap (F\times V) = M\}\end{align*}$$

occurs, then so does the event

$$\begin{align*}\{H:\ \mathtt{Check}_r(H;R) = F\}.\end{align*}$$

If we condition on the former event, then the subsets $H\cap ((E_i\setminus F)\times V)$ are independent for different i, and each is a uniform random labeled partition of $V\setminus \mathtt {Vert}(M; E_i \cap F)$ into k-sets.

Proof. First, the definition of the neighbourhood $\mathtt {Check}_r(H;R)$ depends only on those edges of the bipartite graph H that connect this neighbourhood to V. Since we are told that $(F,M)$ is possible with respect to $(R,r)$ , all those edges must already be visible in M, and so knowing that $H\cap (F\times V) = M$ is enough to tell us that $\mathtt {Check}_r(H;R) = F$ .

Now the conditional probability in question is $\tilde {\mathbb {P}}^M$ , as introduced previously. Since $\tilde {\mathbb {P}}$ is a uniform distribution, $\tilde {\mathbb {P}}^M$ is the uniform distribution over all factor graphs for which the event holds. However, if the event holds, then (i) $H \cap (F\times V)$ is uniquely determined, (ii) each $H\cap ((E_i\setminus F)\times V)$ must consist of a labeled partition of $V\setminus \mathtt {Vert}(M; E_i \cap F)$ into k-sets, and (iii) any tuple of such labeled partitions is still possible. So this conditional distribution is simply the uniform distribution over the Cartesian product set of tuples of such labeled partitions. This implies the desired joint distribution for these sets.

The next lemma may be well known in coding theory, but we have not found a convenient reference.

Lemma 7.3. Let A be any finite nonempty index set, $\delta < 1/3$ , and let Y be a linear subspace of $\mathbb {Z}_2^A$ such that

(26) $$ \begin{align} \mbox{either}\quad |\mathbf{y}| \le \delta |A| \quad \mbox{or} \quad |\mathbf{y}| \ge (1-\delta)|A| \quad \mbox{for every} \quad \mathbf{y} \in Y. \end{align} $$

Then $\dim Y \le 2\delta |A| + 1$ .

Proof. Let Z be the subset of all $\mathbf {y} \in Y$ for which $|\mathbf {y}| \le \delta |A|$ . We prove that Z is a linear subspace, $\dim Y/Z \le 1$ , and $\dim Z \le 2\delta |A|$ .

First, Z clearly contains $\boldsymbol {0}$ , and if $\mathbf {y},\mathbf {y}' \in Z$ , then $\mathbf {y} + \mathbf {y}' \in Y$ and

$$\begin{align*}|\mathbf{y} + \mathbf{y}'| \le 2\delta |A|.\end{align*}$$

Since $2\delta < 1 - \delta $ , by (26), this forces $\mathbf {y} + \mathbf {y}' \in Z$ , so Z is a subspace.

Next, if $\mathbf {y},\mathbf {y}' \in Y\setminus Z$ , then

$$\begin{align*}|\mathbf{y} - \mathbf{y}'| \le |(\mathtt{1},\dots,\mathtt{1}) - \mathbf{y}| + |(\mathtt{1},\dots,\mathtt{1}) - \mathbf{y}'| \le 2\delta |A|,\end{align*}$$

so again we must in fact have $\mathbf {y} - \mathbf {y}' \in Z$ . Therefore, $\dim Y/Z \le 1$ .

Finally, call an index $i \in A$ proper if some member of Z is nonzero in this coordinate. Now choose $\mathbf {z}$ from Z uniformly at random. If i is proper, then the coordinate $z_i$ is equally likely to be $0$ or $1$ , so the expectation of $|\mathbf {z}|$ is half the number of proper coordinates. Therefore, the number of proper coordinates is at most $2\delta |A|$ , and this number is an upper bound on $\dim Z$ .
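For a concrete check of Lemma 7.3 (a small numerical sketch of ours, not part of the proof): with $|A| = 10$ and $\delta = 0.3$ , the span of three standard basis vectors together with the all-ones vector satisfies condition (26), and its dimension $4$ is well within the bound $2\delta |A| + 1 = 7$ .

```python
from itertools import product

A = 10
delta = 0.3
# Generators: e_1, e_2, e_3 and the all-ones vector (1,...,1)
gens = [tuple(1 if j == i else 0 for j in range(A)) for i in range(3)]
gens.append(tuple([1] * A))

# Enumerate the subspace Y spanned by the generators over Z_2
Y = set()
for coeffs in product([0, 1], repeat=len(gens)):
    v = tuple(sum(c * g[j] for c, g in zip(coeffs, gens)) % 2 for j in range(A))
    Y.add(v)

# Every element has weight <= delta*|A| or >= (1-delta)*|A| (condition (26))
assert all(sum(y) <= delta * A or sum(y) >= (1 - delta) * A for y in Y)

dim_Y = len(Y).bit_length() - 1  # |Y| = 2^dim, and the generators are independent
assert dim_Y == 4
assert dim_Y <= 2 * delta * A + 1  # the bound of Lemma 7.3
```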

Property M (Theorem 6.1) follows from the following result, which is also used to prove local and empirical convergence:

Theorem 7.4. Given $r \in \mathbb {N}$ and $\eta> 0$ , for all small enough $\delta>0$ , the following holds: if for each n, $R_n \subset V_n$ is a subset of size $\lceil \delta n \rceil $ , then with high probability as $n \to \infty $ ,

$$\begin{align*}\operatorname{\mathrm{H}}\! \big( (\mu_n)_{\sigma_n^{B_r}(R_n)} \big) \geq (1 - \eta) \, \operatorname{\mathrm{H}} \!\big( m_{B_r} \big)|R_n|. \end{align*}$$

Proof of Theorem 7.4 assuming Proposition 7.1.

Beware that we continue to suppress n from subscripts where possible. It should be understood that the following argument and construction are carried out for each n that is divisible by k.

Fix $\varepsilon> 0$ . For a small positive $\delta $ to be specified shortly, let R be any fixed choice of a subset of V of size $\lceil \delta n\rceil $ , and now consider the following three subsets of $(\sigma ,H)$ in $\tilde {\Omega }$ :

  i. (Few additional checks) $(\sigma ,H)$ is in $\Omega ^1$ if every $\mathbf {y} \in \mathbb {Z}_2^{E\setminus \mathtt {Check}_r(H;R)}$ such that $(\mathbf {H}^{\mathsf {T}}\mathbf {y})_{V\setminus \mathtt {Vert}_r(H;R)} = \boldsymbol {0}$ also satisfies

    $$\begin{align*}\min\{|\mathbf{y}_i|,|\mathbf{e}_{E_i\setminus F_i} - \mathbf{y}_i|\} < \varepsilon\delta\frac{n}{k} \qquad \mbox{for every}\ i \in [d],\end{align*}$$
    where $F_i = E_i \cap \mathtt {Check}_r(H;R)$ .
  ii. (Most vertices in R well-separated) $(\sigma ,H)$ is in $\Omega ^2$ if the set

    $$\begin{align*}S_1 := \big\{v \in R:\ \mathtt{Vert}_r(H;v)\cap \mathtt{Vert}_r(H;R \setminus v) = \emptyset\big\}\end{align*}$$
    has $|S_1|> (1-\varepsilon )|R|$ .
  iii. (Most vertices in R not close to any short loops) $(\sigma ,H)$ is in $\Omega ^3$ if the set

    $$\begin{align*}S_2 := \big\{v \in R:\ \mbox{the orbit map}\ B_r\to \mathtt{Vert}_r(H;v)\ \mbox{is injective}\big\}\end{align*}$$
    has $|S_2|> (1-\varepsilon )|R|$ . (Here, the orbit map sends $\gamma \in B_r \subset \Gamma $ to $\sigma (\gamma )v$ .)

Let $S := S_1\cap S_2$ and $\Omega ' := \Omega ^1 \cap \Omega ^2\cap \Omega ^3$ . Think of S as the result of expurgating the ‘bad’ vertices from R and $\Omega '$ as the event that there are only a few of these.

In the remainder of the proof, we show that, provided $\delta $ was chosen small enough, we have $\tilde {\mathbb {P}}(\Omega ')\to 1$ and the set R satisfies the desired entropy bound.

Step 1: $\tilde {\mathbb {P}}(\Omega ')\to 1$ . First, we have $\tilde {\mathbb {P}}(\Omega ^2) \to 1$ as $n\to \infty $ for any sufficiently small $\delta $ in terms of d, k, r and $\varepsilon $ . To see this, observe that although we are fixing R and choosing $(\sigma ,H)$ at random, we obtain the same distribution on ${\lvert {S_1}\rvert }$ if we choose R uniformly at random among all subsets of V of cardinality $\lceil \delta n\rceil $ , independently of $(\sigma ,H)$ . Thus, it suffices to show that on any d-regular k-uniform hyper-graph on n vertices, if a subset $R \subset V$ of $\lceil \delta n\rceil $ vertices is chosen uniformly at random, then the probability that more than $\varepsilon \delta n$ vertices of R lie within distance $2r$ of another vertex of R tends to zero as $n\to \infty $ .

If some number, say x, of vertices have already been chosen, then the probability that the next vertex is not within distance $2r$ of the previously selected vertices is at least

$$ \begin{align*}1- \frac{(kd)^{2r}x}{n-x} \ge 1- \frac{(kd)^{2r}\lceil\delta n\rceil}{n - \lceil\delta n\rceil}\ge 1- (kd)^{2r} \frac{\delta}{1-\delta}.\end{align*} $$

This is because the number of vertices in the $(2r)$ -neighbourhood of a given vertex is at most $(kd)^{2r}$ . Thus, $\tilde {\mathbb {P}}(\Omega ^2)$ is at least the probability that in $\lceil \delta n\rceil $ Bernoulli trials with success probability $1- (kd)^{2r} \, \delta /(1-\delta ) $ , there are at least $(1-\varepsilon )\lceil \delta n\rceil $ successes. This occurs with overwhelming probability as $n\to \infty $ as long as $\varepsilon> (kd)^{2r} \, \delta /(1-\delta )$ , which we may assume by choosing $\delta $ sufficiently small.

Second, for any $\delta> 0$ , we have $\tilde {\mathbb {P}}(\Omega ^3)\to 1$ as $n\to \infty $ as an immediate corollary of Proposition 1.2.

So now let us show that $\tilde {\mathbb {P}}(\Omega ^1) \to 1$ as $n\to \infty $ . This is our application of Proposition 7.1. We make contact with that proposition by conditioning on the partial factor graph $H\cap (\mathtt {Check}_r(H;R)\times V)$ and using the law of total probability.

Let $(F,M)$ be a possible pair with respect to $(R,r)$ as in Lemma 7.2, and let $\tilde {\mathbb {P}}^M$ be the result of conditioning $\tilde {\mathbb {P}}$ on the event $H\cap (F\times V) = M$ , as previously. Let ${W_i := \mathtt {Vert}(M;F_i)}$ and let $W := W_1\cup \cdots \cup W_d$ . By the law of total probability, $\tilde {\mathbb {P}}(\tilde {\Omega }\setminus \Omega ^1)$ is equal to the sum

$$ \begin{align*} \sum_{\mathrm{possible}\,F,M}\tilde{\mathbb{P}}&\big(H\cap (F\times V) = M\big)\cdot \tilde{\mathbb{P}}^M\Big(\exists \mathbf{y} \in \mathbb{Z}_2^{E\setminus F}\ \mbox{such that}\\ &\quad\qquad \min\{|\mathbf{y}_i|,|\mathbf{e}_{E_i\setminus F_i} - \mathbf{y}_i|\} \ge \varepsilon\delta\frac{n}{k}\ \ \mbox{for some } i\ \mbox{and}\ (\mathbf{H}^{\mathsf{T}}\mathbf{y})_{V\setminus W} = \boldsymbol{0}\Big). \end{align*} $$

Let $K := (dk)^{r+1}+1$ . Our construction of the possible pair $(F,M)$ gives that

$$\begin{align*}\delta n\le |R| \le |W| \le \sum_{i=1}^d |W_i| \le (dk)^{r+1}|R| \le K\delta n\end{align*}$$

when $\delta n>1$ , and so the condition (25) is satisfied for this value of K by the second factor in every term of the sum above. Therefore, by Proposition 7.1, if $\delta $ is small enough in terms of d, k, $\varepsilon $ and r, then this sum is a convex combination of quantities that converge to $0$ at a rate depending only on d, k and $\delta $ , and hence, so does the whole expression.

This proves that $\tilde {\mathbb {P}}(\Omega ')\to 1$ for any sufficiently small $\delta $ . Fix such a $\delta $ for the rest of the proof.

Step 2. To finish the proof, we show that if $(\sigma ,H) \in \Omega '$ , then properties (i), (ii) and (iii) imply that S has the properties required to witness property M as in Definition 4.2.

First, $S_1$ is $(2r)$ -separated by property (ii), and hence, so is S.

Second, properties (ii) and (iii) together give

$$\begin{align*}\frac{|S|}{|V|} \ge (1 - 2\varepsilon)\frac{|R|}{|V|} \ge (1 - 2\varepsilon)\delta,\end{align*}$$

which is uniformly positive in n provided $\varepsilon < 1/2$ .

Finally, we must prove (11). Because $\mu _\sigma $ is the uniform distribution on the linear subspace $X_\sigma $ of $\mathbb {Z}_2^V$ , the projection of $\mu _\sigma $ over the subset of vertices $\sigma ^{B_r}(S) = \mathtt {Vert}_r(H;S)$ is the uniform distribution on the linear space

$$\begin{align*}Z:= \{\mathbf{x}_{\mathtt{Vert}_r(H;S)}:\ \mathbf{x} \in X_\sigma\}.\end{align*}$$

Therefore, after ignoring a factor of $\log 2$ , we need a lower bound on $\dim Z$ .

By property (iii), each of the neighbourhoods $\mathtt {Vert}_r(H;v)$ for $v \in S$ is a bijective copy of $B_r \subseteq \Gamma $ , and by property (ii), these neighbourhoods are disjoint. Therefore, Z is naturally identified with a linear subspace of

$$ \begin{align*}\prod_{v \in S} \mathbb{Z}_2^{\mathtt{Vert}_r(H;v)}\cong (\mathbb{Z}_2^{B_r})^S.\end{align*} $$

As a linear subspace of $\mathbb {Z}_2^{\mathtt {Vert}_r(H;S)}$ , Z is determined by its dual code $Z^\perp $ – that is, its own collection of parity-check words in $\mathbb {Z}_2^{\mathtt {Vert}_r(H;S)}$ [Reference MacKay48, Section 13.10]. Since Z is the projection of $X_\sigma $ , each word in $Z^\perp $ becomes a word in $X^\perp _\sigma $ when it is extended by $0$ to $V\setminus \mathtt {Vert}_r(H;S)$ . Therefore, $Z^\perp $ is identified with the set of all mod- $2$ sums of the rows of $\mathbf {H}$ that vanish on $V\setminus \mathtt {Vert}_r(H;S)$ , and so the rank-nullity formula gives

(27) $$ \begin{align} \dim Z = \dim (\mathbb{Z}_2^{B_r})^S - \dim \ker \mathbf{H}^{\mathsf{T}}_{(V\setminus \mathtt{Vert}_r(H;S))\times E}. \end{align} $$

So let us consider the ways in which a mod- $2$ sum of the rows of $\mathbf {H}$ can vanish on $V\setminus \mathtt {Vert}_r(H;S)$ . First, each individual row of $\mathbf {H}$ that is indexed by an element of $\mathtt {Check}_r(H;S)$ vanishes outside $\mathtt {Vert}_r(H;v)$ for some single element $v \in S$ . Since each of these sets is a bijective copy of $B_r$ , these checks by themselves show that Z is actually a linear subspace of $X_{B_r}^S \subset (\mathbb {Z}_2^{B_r})^S$ .

However, given any other mod- $2$ sum of rows of $\mathbf {H}$ which vanishes on $V\setminus \mathtt {Vert}_r(H;S)$ , we may remove any summands indexed by $\mathtt {Check}_r(H;S)$ without losing that property. So now consider a mod- $2$ sum of rows of $\mathbf {H}$ that is supported in $\mathtt {Vert}_r(H;S)$ and uses no rows indexed by $\mathtt {Check}_r(H;S)$ . It might use only rows indexed by $\mathtt {Check}_r(H;R)\setminus \mathtt {Check}_r(H;S)$ . Or it might include at least one summand indexed by an element of $E\setminus \mathtt {Check}_r(H;R)$ , in which case those summands by themselves define a nonzero vector in

$$ \begin{align*}D := \ker \mathbf{H}^{\mathsf{T}}_{(V\setminus \mathtt{Vert}_r(H;R)) \times (E\setminus \mathtt{Check}_r(H;R))}\end{align*} $$

because the rows of $\mathbf {H}$ indexed by $\mathtt {Check}_r(H;R)$ are supported in $\mathtt {Vert}_r(H;R)$ , and so ignoring them cannot change the support in $V\setminus \mathtt {Vert}_r(H;R)$ .

Combining these possibilities, (27) implies the lower bound

(28) $$ \begin{align} \dim Z \ge |S|\cdot \dim X_{B_r} - |\mathtt{Check}_r(H;R)\setminus \mathtt{Check}_r(H;S)| - \dim D. \end{align} $$

By properties (ii) and (iii), $|S| \ge (1-2\varepsilon )|R|$ . Since ${|\mathtt {Check}_r(H;R \setminus S)| \le (dk)^{r-1}k |R\setminus S|}$ , this implies

$$\begin{align*}|\mathtt{Check}_r(H;R)\setminus \mathtt{Check}_r(H;S)| \le |\mathtt{Check}_r(H;R\setminus S)| \le Kk\varepsilon|S|.\end{align*}$$

However, by property (i), if $\mathbf {y} \in D$ and we write $\mathbf {y} = (\mathbf {y}_1,\dots ,\mathbf {y}_d)^{\mathsf {T}}$ , then for each i, we have $\mathbf {y}_i \in \mathbb {Z}_2^{E_i\setminus \mathtt {Check}_r(H;R)}$ and

$$\begin{align*}\mbox{either}\ |\mathbf{y}_i| < \varepsilon\delta\frac{n}{k} \quad \mbox{or} \quad |\mathbf{e}_{E_i\setminus \mathtt{Check}_r(H;R)} - \mathbf{y}_i| < \varepsilon\delta\frac{n}{k}.\end{align*}$$

Therefore, provided $\varepsilon \delta < 1/3$ , Lemma 7.3 gives

$$\begin{align*}\dim\{\mathbf{y}_i:\ \mathbf{y} \in D\} \le 2\varepsilon\delta\frac{n}{k} + 1\quad \mbox{for each}\ i,\end{align*}$$

and hence, $\dim D \le 2\varepsilon \delta nd/k+d$ .

Inserting these bounds into (28), we finally arrive at

$$ \begin{align*} \dim Z &\ge |S|\cdot \dim X_{B_r} - Kk\varepsilon|S| - 2\varepsilon\delta\frac{nd}{k}-d\\ &\ge \left(\dim X_{B_r} - Kk\varepsilon - \frac{2\varepsilon d}{k} - o(1)\right)\cdot |S|. \end{align*} $$

This gives us a lower bound on the desired joint entropy:

$$ \begin{align*} \operatorname{\mathrm{H}}\! \big( (\mu_n)_{\sigma_n^{B_r}(R)} \big) &\geq \operatorname{\mathrm{H}}\! \big( (\mu_n)_{\sigma_n^{B_r}(S)} \big) \\ &= \log 2 \cdot \dim Z \\ &\geq \left( \log2 \cdot \dim X_{B_r} - \varepsilon \log 2 \cdot \Big(Kk + \frac{2 d}{k}+o(1)\Big) \right)\cdot (1-2\varepsilon)|R| \\ &\geq \left( \operatorname{\mathrm{H}}(m_{B_r}) - \varepsilon C \right)|R|, \end{align*} $$

where $C = \log 2 \cdot (Kk + \frac {2 d}{k}) + 2\operatorname {\mathrm {H}}(m_{B_r})$ . Since C depends only on $d,k$ and r, we could have taken $\varepsilon = \eta /C$ at the beginning, so this proves (11).

Remark. Proposition 7.1 shows that, with high conditional probability in the choice of $\mathbf {H}$ , the only parity checks among the bits in W that are created by $\mathbf {H}$ are (i) those created by the rows in F, and possibly (ii) a few others that are generated by some vectors $\mathbf {y}$ whose support is either extremely small or extremely close to the whole of $E\setminus F$ . Since the number of possible vectors of type (ii) is very small compared with those of type (i), this implies that there are few enough ‘spurious’ parity checks among the bits in W to give Theorem 6.1. However, we do not expect that there are no extra parity checks of type (ii): just by chance, one should typically find as many as a very small multiple of n of these.

Proof of Theorem 6.1 item (1) from Theorem 7.4.

We have to prove there are subsets

$$\begin{align*}\Omega^{\prime}_n \subseteq \operatorname{\mathrm{Hom}}_{\mathrm{unif}}(\Gamma,\operatorname{\mathrm{Sym}}(V_n))\end{align*}$$

such that $\mathbb {P}_n(\Omega ^{\prime }_n) \to 1$ and if $\sigma _n\in \Omega _n'$ for each n, then $(\mu _n)_n$ has property M with respect to $\Sigma =(\sigma _n)_n$ . This means that for every $\epsilon>0$ and $0<r<\infty $ , there is a sequence of subsets $S_n \subset V_n$ such that

$$ \begin{align*}\liminf \frac{|S_n|}{|V_n|}>0\end{align*} $$

and

(29) $$ \begin{align} \operatorname{\mathrm{H}}( (\mu_n)_{\sigma_n^{B_r}(S_n)} ) \ge |S_n| (\operatorname{\mathrm{H}}(m_{B_r}) - \epsilon) \end{align} $$

for all n.

Let $r,\varepsilon>0$ . By Theorem 7.4, if $\delta>0$ is sufficiently small and for each n, $S_n \subset V_n$ is an arbitrary set of size $\lceil \delta n \rceil $ , then there is some sequence $(\Omega ^{\prime }_n)_n$ with $\mathbb {P}_n(\Omega ^{\prime }_n) \to 1$ such that the desired entropy inequality holds when $\sigma _n \in \Omega ^{\prime }_n$ , and $\liminf \frac {{\lvert {S_n}\rvert } }{n} = \liminf \frac {\lceil \delta n \rceil }{n} = \delta> 0$ .

Then, by diagonalizing over the countably many choices of $r \in \mathbb {N}$ and $\varepsilon \in \{\frac {1}{2},\frac {1}{3},\frac {1}{4},\ldots \}$ , we can get a single sequence of sets $(\Omega ^{\prime }_n)_n$ with $\mathbb {P}(\Omega ^{\prime }_n) \to 1$ such that for any $r \in \mathbb {N}$ and $\varepsilon>0$ , there is a sequence $(S_n)_n$ with the desired properties.

7.2 Proof that there are few additional checks

This subsection proves Proposition 7.1. Let the sets $F_i$ , $W_i$ and M and parameters K, $\varepsilon $ , $w_i$ and w be as in that statement. For $\mathbf {y} \in \mathbb {Z}_2^{E\setminus F}$ , let

$$\begin{align*}G_{\mathbf{y}} := \left\{ (\sigma, H) \,:\, (\mathbf{H}^{\mathsf{T}}\mathbf{y})_{V\setminus W} = \boldsymbol{0}\right\} \subset \tilde{\Omega}_n.\end{align*}$$

Write $\mathbf {y}=(\mathbf {y}_1, \ldots , \mathbf {y}_d)^{\mathsf {T}}$ with $\mathbf {y}_i \in \mathbb {Z}_2^{E_i\setminus F_i}$ . We will focus on those vectors $\mathbf {y}$ which satisfy restrictions on the cardinalities $|\mathbf {y}_i|$ coming from Proposition 7.1. To be precise, let $\mathcal {R}=\mathcal {R}(\varepsilon ,\delta )$ be the set of all nonnegative integer tuples $\mathbf {\ell } = (\ell _1,\dots ,\ell _d)$ that satisfy

(30) $$ \begin{align} \ell_i \le \frac{n-w_i}{k} \quad \mbox{for every} \ i \quad \mbox{and} \quad \min\left\{\ell_i,\frac{n-w_i}{k} - \ell_i\right\} \ge \varepsilon\delta\frac{n}{k} \quad \mbox{for some}\ i. \end{align} $$

The main conclusion of Proposition 7.1 is equivalent to

$$\begin{align*}\tilde{\mathbb{P}}^M\left(\bigcup_{\mathbf{y}:\ (|\mathbf{y}_1|,\dots,|\mathbf{y}_d|) \in \mathcal{R}}G_{\mathbf{y}}\right) \to 0\end{align*}$$

as $n\to \infty $ . Our proof uses a simple union bound over $\mathbf {y}$ . We will derive estimates on $\tilde {\mathbb {P}}^M(G_{\mathbf {y}})$ , depending on $\varepsilon $ and $\delta $ , for those $\mathbf {y}$ with $(|\mathbf {y}_1|,\dots ,|\mathbf {y}_d|) \in \mathcal {R}(\varepsilon ,\delta )$ , and then use these to conclude Proposition 7.1 provided $\delta $ is chosen correctly.

Fix a vector $\mathbf {y} \in \mathbb {Z}_2^{E\setminus F}$ with $(|\mathbf {y}_1|,\dots ,|\mathbf {y}_d|) \in \mathcal {R}$ , and set $r_i := k|\mathbf {y}_i|$ and $\mathbf {r} := (r_1,\dots ,r_d)$ . Let $Y_i$ be the support of the random vector $\mathbf {H}^{\mathsf {T}}\mathbf {y}_i$ for each i. This means $\mathbf {H}^{\mathsf {T}}\mathbf {y}_i = \mathbf {e}_{Y_i}$ . These random sets are independent by the independence in Lemma 7.2. Each $Y_i$ is uniformly random among all subsets of $V\setminus W_i$ of size $r_i$ .

Each $Y_i$ may have nonempty intersection with $W\setminus W_i$ . We bound $\tilde {\mathbb {P}}^M(G_{\mathbf {y}})$ by breaking into a few further cases depending on the sizes of these intersections. To this end, let $Z_i := Y_i\setminus W$ . Under $\tilde {\mathbb {P}}^M$ , the cardinality $|Z_i|$ is a random quantity obtained by sampling $r_i$ points from $V\setminus W_i$ without replacement and counting how many of them land in $V\setminus W$ . This random quantity has the hypergeometric distribution with parameters $n-w_i$ , $n-w$ and $r_i$ : see, for instance, [Reference Feller28, Section II.6]. Its possible values are the integers $s_i$ that satisfy

(31) $$ \begin{align} \max(0,r_i-(w-w_i)) \le s_i \le \min(r_i,n-w), \end{align} $$

and for such values, we have

$$\begin{align*}\tilde{\mathbb{P}}^M(|Z_i| = s_i) = \frac{{n-w \choose s_i}{w-w_i \choose r_i-s_i}}{{n-w_i \choose r_i}}.\end{align*}$$

By the standard exponential estimate for binomial coefficients (see, for instance, [Reference Cover and Thomas23, Example 11.1.3]), this ratio is at most

(32) $$ \begin{align} \exp\left(\operatorname{\mathrm{H}}\left(\frac{s_i}{n-w}\right)(n-w) + \operatorname{\mathrm{H}}\left(\frac{r_i-s_i}{w-w_i}\right)(w-w_i) - \operatorname{\mathrm{H}}\left(\frac{r_i}{n-w_i}\right)(n-w_i) + o(n)\right), \end{align} $$

where the quality of the error term does not depend on any of the other parameters.
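The estimate (32) follows from the standard bounds $e^{n\operatorname {H}(k/n)}/(n+1) \le \binom {n}{k} \le e^{n\operatorname {H}(k/n)}$ , with $\operatorname {H}$ the natural-log binary entropy. A small numerical sketch (the parameter values below are illustrative choices of ours):

```python
from math import comb, log

def Hb(p):
    """Binary entropy in nats, with H(0) = H(1) = 0."""
    return 0.0 if p in (0.0, 1.0) else -p * log(p) - (1 - p) * log(1 - p)

# Illustrative parameters: population of size n - w_i, of which n - w are
# "successes", with r_i draws and s_i observed successes
n, w, w_i, r_i, s_i = 1000, 60, 20, 100, 95

# The hypergeometric probability of |Z_i| = s_i
prob = comb(n - w, s_i) * comb(w - w_i, r_i - s_i) / comb(n - w_i, r_i)

# Exponent from (32), with an explicit log(n+1) standing in for the o(n) term
bound = (Hb(s_i / (n - w)) * (n - w)
         + Hb((r_i - s_i) / (w - w_i)) * (w - w_i)
         - Hb(r_i / (n - w_i)) * (n - w_i)
         + log(n + 1))

assert 0 < prob <= 1
assert log(prob) <= bound
```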

Towards Proposition 7.1, we estimate the probability of $G_{\mathbf {y}}$ after further conditioning on the tuple of cardinalities $|Z_i|$ , and then combine this estimate with (32) using the law of total probability. That refined conditional probability estimate depends on the following lemma.

Let $\mathtt {Even}(d)$ be the subset of all strings in $\{0,1\}^d$ that have even weight.

Lemma 7.5. Let $q_1$ , …, $q_d$ be probability distributions on $\{0,1\}$ , and assume that q is a coupling of $q_1$ , …, $q_d$ that is supported on ${\mathtt {Even}}(d)$ . Then

$$\begin{align*}\operatorname{\mathrm{H}}(q) \le \left(1 - \frac{1}{d}\right)\big(\operatorname{\mathrm{H}}(q_1) + \dots + \operatorname{\mathrm{H}}(q_d)\big).\end{align*}$$

Proof. By permuting indices, we may assume without loss of generality that

$$\begin{align*}\operatorname{\mathrm{H}}(q_1) \ge \dots \ge \operatorname{\mathrm{H}}(q_d).\end{align*}$$

Let $\mathbf {z} = (z_1,\dots ,z_d)$ be the identity map on $\{0,1\}^d$ , and regard it as a random binary string with distribution q. Then $\mathbf {z}$ has even weight almost surely, and hence, the coordinates $z_2$ , …, $z_d$ determine $z_1$ almost surely. Therefore,

$$\begin{align*}\operatorname{\mathrm{H}}(q) = \operatorname{\mathrm{H}}_q(\mathbf{z}) = \operatorname{\mathrm{H}}_q(z_2,\dots,z_d) \le \operatorname{\mathrm{H}}_q(z_2) + \cdots + \operatorname{\mathrm{H}}_q(z_d) = \operatorname{\mathrm{H}}(q_2) + \cdots + \operatorname{\mathrm{H}}(q_d).\end{align*}$$

Because of our ordering of the indices, this is at most the desired upper bound.
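As a sanity check on Lemma 7.5 (our own, with a distribution chosen to make the bound tight): for $d = 3$ , the uniform coupling on the four even-weight strings has uniform marginals, and the inequality holds with equality.

```python
from math import log

d = 3
even = [(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)
        if (a + b + c) % 2 == 0]
q = {s: 0.25 for s in even}  # uniform coupling supported on Even(3)

def H(dist):
    """Shannon entropy in nats of a dict of probabilities."""
    return -sum(p * log(p) for p in dist.values() if p > 0)

# Marginals q_1, q_2, q_3: each coordinate of the uniform even coupling is a fair bit
marginals = []
for i in range(d):
    m = {0: 0.0, 1: 0.0}
    for s, p in q.items():
        m[s[i]] += p
    marginals.append(m)

lhs = H(q)                                        # = log 4
rhs = (1 - 1 / d) * sum(H(m) for m in marginals)  # = (2/3) * 3 log 2 = log 4
assert lhs <= rhs + 1e-12   # the inequality of Lemma 7.5
assert abs(lhs - rhs) < 1e-12  # and it is tight here
```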

Now let $\mathcal {S}(\mathbf {r})$ be the set of all integer tuples $\mathbf {s} = (s_1,\dots ,s_d)$ that satisfy (31) for every i.

Lemma 7.6. Fix a vector $\mathbf {y} \in \mathbb {Z}_2^{E\setminus F}$ with $(|\mathbf {y}_1|,\dots ,|\mathbf {y}_d|) \in \mathcal {R}$ as above. If $\mathbf {s} \in \mathcal {S}(\mathbf {r})$ , then

$$\begin{align*}\tilde{\mathbb{P}}^M\left(G_{\mathbf{y}}\ \big|\ |Z_i|=s_i\ \mbox{for each}\ i\right) \le \exp\left(-\frac{1}{d}\sum_{i=1}^d\operatorname{\mathrm{H}}\left(\frac{s_i}{n-w}\right)\cdot (n-w) + o_d(n)\right).\end{align*}$$

Proof. To lighten notation, within this proof, let

$$\begin{align*}\tilde{\mathbb{P}}^{M,\mathbf{s}} := \tilde{\mathbb{P}}^M\big(\,\cdot\,\big|\ |Z_i|=s_i\ \mbox{for each}\ i\big).\end{align*}$$

Record the random sets $Z_1$ , …, $Z_d$ into the random vector

$$\begin{align*}\boldsymbol{\eta} = (\eta_v)_{v\in V\setminus W} \in (\{0,1\}^d)^{V\setminus W} \quad \mbox{where}\ \eta_v := (1_{Z_1}(v),\dots,1_{Z_d}(v)).\end{align*}$$

Let $P_{\boldsymbol {\eta }} = \frac {1}{{\lvert {V \setminus W}\rvert } } \sum _{v \in V \setminus W} \delta _{\eta _v}$ be the empirical distribution of $\boldsymbol {\eta }$ . This is a probability distribution on $\{0,1\}^d$ , and the event $G_{\mathbf {y}}$ occurs if and only if this probability distribution is supported on the subset $\mathtt {Even}(d)$ . The marginals of $P_{\boldsymbol {\eta }}$ are $(q_i,1-q_i)$ for $i=1,2,\dots ,d$ , where $q_i := s_i/(n-w)$ . Moreover, $P_{\boldsymbol {\eta }}$ must take values that are multiples of $1/(n-w)$ , and the total number of such possible distributions is at most

$$\begin{align*}(n-w+1)^{2^d} \le n^{2^d}= e^{o_d(n)}\end{align*}$$

(here and in some subsequent steps, we generally loosen $o_d(n-w)$ to $o_d(n)$ ). Therefore,

(33) $$ \begin{align} \tilde{\mathbb{P}}^{M,\mathbf{s}}(G_{\mathbf{y}}) \le e^{o_d(n)}\max &\big\{\tilde{\mathbb{P}}^{M,\mathbf{s}}(P_{\boldsymbol{\eta}} = q):\ q\ \mbox{a coupling}\\ &\qquad\qquad\qquad\qquad\mbox{of}\ q_1,\dots,q_d\ \mbox{such that}\ q(\mathtt{Even}(d)) = 1\big\}.\notag \end{align} $$

For a distribution q as above, the set of vectors $\boldsymbol {\eta }$ that give $P_{\boldsymbol {\eta }} = q$ is the ‘type class’ of q, and its size is simply bounded using the entropy of q:

$$\begin{align*}|\{\boldsymbol{\eta} \in (\{0,1\}^d)^{V\setminus W}:\ P_{\boldsymbol{\eta}} = q\}| \le e^{\operatorname{\mathrm{H}}(q)\cdot (n-w)}\end{align*}$$

(see, for instance, [23, Theorem 11.1.3], except note that Cover and Thomas use $\log _2$ rather than natural logarithms to define $\operatorname {\mathrm {H}}$ ). By Lemma 7.5, this upper bound is always at most

(34) $$ \begin{align} \exp\left(\left(1 - \frac{1}{d}\right)\cdot \big(\operatorname{\mathrm{H}}(q_1) + \cdots + \operatorname{\mathrm{H}}(q_d)\big)\cdot (n-w)\right). \end{align} $$

However, under the conditional probability measure $\tilde {\mathbb {P}}^{M,\mathbf {s}}$ , the set $Z_i$ is a uniform random subset of $V\setminus W$ of size $s_i$ , and these random sets are still independent. Therefore, the probability of any particular d-tuple of sets of these sizes occurring is

$$\begin{align*}\prod_{i=1}^d{n-w \choose s_i}^{-1},\end{align*}$$

and by another use of standard exponential estimates on binomial coefficients [23, Example 11.1.3], this is at most

$$\begin{align*}\exp\left(-\big(\operatorname{\mathrm{H}}(q_1) + \cdots + \operatorname{\mathrm{H}}(q_d)\big)\cdot (n-w) + o_d(n)\right).\end{align*}$$

Multiplying by the cardinality upper bound (34), we obtain

$$\begin{align*}\tilde{\mathbb{P}}^{M,\mathbf{s}}(P_{\boldsymbol{\eta}} = q) \le \exp\left(-\frac{1}{d}\big(\operatorname{\mathrm{H}}(q_1) + \cdots + \operatorname{\mathrm{H}}(q_d)\big)\cdot (n-w) + o_d(n)\right)\end{align*}$$

for any such coupling q. Since this upper bound is independent of the particular coupling q, and the extra factor in (33) is sub-exponential, this gives the result.
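
The type-class bound used in this proof, $|\{\boldsymbol{\eta}:\ P_{\boldsymbol{\eta}} = q\}| \le e^{\operatorname{H}(q)\cdot n}$, can be checked exhaustively in small cases. In the sketch below (our own illustration), the type class size is the exact multinomial coefficient, and we compare it against $e^{n\operatorname{H}(q)}$ over every type of a given length on a three-letter alphabet.

```python
import math

def type_class_size(counts):
    """Exact number of sequences with the given letter counts (multinomial)."""
    n = sum(counts)
    size = math.factorial(n)
    for c in counts:
        size //= math.factorial(c)
    return size

def entropy_bound(counts):
    """e^{n H(q)} for the empirical distribution q = counts / n."""
    n = sum(counts)
    h = -sum((c / n) * math.log(c / n) for c in counts if c > 0)
    return math.exp(n * h)

# check |T(q)| <= e^{n H(q)} over every type of length n on a 3-letter alphabet
n = 12
ok = all(
    type_class_size((a, b, n - a - b)) <= entropy_bound((a, b, n - a - b)) + 1e-6
    for a in range(n + 1) for b in range(n + 1 - a)
)
```

The bound is attained (up to polynomial factors) at the balanced type, which is why the exponential rate in the proof is exactly $\operatorname{H}(q)$.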

Lemma 7.7. Fix k and d as before, and define the function

$$\begin{align*}f(t,\alpha',\alpha") := (1 - k^{-1})\operatorname{\mathrm{H}}\big((1-t)\alpha' + t\alpha"\big) - (1-t)(1 - d^{-1})\operatorname{\mathrm{H}}(\alpha') - t\operatorname{\mathrm{H}}(\alpha")\end{align*}$$

for $0 \le \alpha ',\alpha ",t \le 1$ . Then

$$\begin{align*}\tilde{\mathbb{P}}^M\left(\bigcup_{\mathbf{y}:\ (|\mathbf{y}_1|,\dots,|\mathbf{y}_d|) \in \mathcal{R}}G_{\mathbf{y}}\right) \le \sum_{\mathbf{r} \in k\mathcal{R},\ \mathbf{s} \in \mathcal{S}(\mathbf{r})}\exp \left(-\sum_{i=1}^df_i(r_i,s_i)\cdot (n-w_i) + o_d(n)\right),\end{align*}$$

where

$$\begin{align*}f_i(r_i,s_i) = f\left(\frac{w-w_i}{n-w_i},\frac{s_i}{n-w},\frac{r_i-s_i}{w-w_i}\right).\end{align*}$$

Proof. For each $\mathbf {y}$ , we let $r_i := k|\mathbf {y}_i|$ and bound $\tilde {\mathbb {P}}^M(G_{\mathbf {y}})$ from above using (32), Lemma 7.6 and the law of total probability. The resulting upper bound is

$$ \begin{align*} \tilde{\mathbb{P}}^M\left(G_{\mathbf{y}}\right) &\le \sum_{\mathbf{s} \in \mathcal{S}(\mathbf{r})}\exp\Bigg(-\frac{1}{d}\sum_{i=1}^d\operatorname{\mathrm{H}}\left(\frac{s_i}{n-w}\right) (n-w) + \sum_{i=1}^d\operatorname{\mathrm{H}}\left(\frac{s_i}{n-w}\right)(n-w) \\ &\qquad + \sum_{i=1}^d\operatorname{\mathrm{H}}\left(\frac{r_i-s_i}{w-w_i}\right)(w-w_i) - \sum_{i=1}^d\operatorname{\mathrm{H}}\left(\frac{r_i}{n-w_i}\right)(n-w_i) + o_d(n)\Bigg). \end{align*} $$

However, the number of vectors $\mathbf {y} \in \mathbb {Z}_2^{E\setminus F}$ with given weights $|\mathbf {y}_i| = r_i/k$ is at most

$$\begin{align*}\exp\left(\sum_{i=1}^d\operatorname{\mathrm{H}}\left(\frac{r_i}{n-w_i}\right)\frac{n-w_i}{k}\right).\end{align*}$$

Therefore, the sum of $\tilde {\mathbb {P}}^M(G_{\mathbf {y}})$ over all $\mathbf {y}$ satisfying $(|\mathbf {y}_1|,\dots ,|\mathbf {y}_d|) \in \mathcal {R}$ is at most

(35) $$ \begin{align} \sum_{\mathbf{r} \in k\mathcal{R},\ \mathbf{s} \in \mathcal{S}(\mathbf{r})}\exp \left(-\sum_{i=1}^df_i(r_i,s_i)\cdot (n-w_i) + o_d(n)\right), \end{align} $$

where

$$ \begin{align*} f_i(r_i,s_i) &:= -\frac{1}{k}\operatorname{\mathrm{H}}\left(\frac{r_i}{n-w_i}\right) + \frac{1}{d}\cdot \frac{n-w}{n-w_i}\cdot \operatorname{\mathrm{H}}\left(\frac{s_i}{n-w}\right) \\ &\qquad -\frac{n-w}{n-w_i}\cdot \operatorname{\mathrm{H}}\left(\frac{s_i}{n-w}\right) - \frac{w-w_i}{n-w_i}\cdot \operatorname{\mathrm{H}}\left(\frac{r_i-s_i}{w-w_i}\right) + \operatorname{\mathrm{H}}\left(\frac{r_i}{n-w_i}\right)\\ &= \left(1 - \frac{1}{k}\right)\cdot \operatorname{\mathrm{H}}\left(\frac{r_i}{n-w_i}\right)\\ &\qquad -\left(1-\frac{1}{d}\right)\cdot \frac{n-w}{n-w_i}\cdot \operatorname{\mathrm{H}}\left(\frac{s_i}{n-w}\right) - \frac{w-w_i}{n-w_i}\cdot \operatorname{\mathrm{H}}\left(\frac{r_i-s_i}{w-w_i}\right)\\ &= f\left(\frac{w-w_i}{n-w_i},\frac{s_i}{n-w},\frac{r_i-s_i}{w-w_i}\right).\\[-38pt] \end{align*} $$
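
The algebra above can be double-checked numerically: with any concrete choice of the parameters (the integer values below are arbitrary, chosen only to satisfy $w_i \le w < n$ and $s_i \le r_i$), the raw combination of entropy terms agrees with the closed form f.

```python
import math

def H(x):
    """Binary entropy with natural logarithm."""
    return 0.0 if x <= 0 or x >= 1 else -x * math.log(x) - (1 - x) * math.log(1 - x)

def f(t, a1, a2, d, k):
    """The function f(t, alpha', alpha'') from Lemma 7.7."""
    a = (1 - t) * a1 + t * a2
    return (1 - 1 / k) * H(a) - (1 - t) * (1 - 1 / d) * H(a1) - t * H(a2)

# concrete (hypothetical) parameter values for a spot check
d, k = 3, 6
n, w, wi, ri, si = 1000, 100, 40, 90, 70

# the raw combination of entropy terms from the first line of the display
raw = (-H(ri / (n - wi)) / k
       + (1 / d) * ((n - w) / (n - wi)) * H(si / (n - w))
       - ((n - w) / (n - wi)) * H(si / (n - w))
       - ((w - wi) / (n - wi)) * H((ri - si) / (w - wi))
       + H(ri / (n - wi)))

t, a1, a2 = (w - wi) / (n - wi), si / (n - w), (ri - si) / (w - wi)
closed = f(t, a1, a2, d, k)
```

The check also confirms the identity $(1-t)\alpha' + t\alpha'' = r_i/(n-w_i)$ implicit in the substitution.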

We are nearly ready to prove Proposition 7.1. For that proof, we must combine Lemma 7.7 with an elementary but rather fiddly estimate. That estimate refers to the functions

$$\begin{align*}\gamma_1(t) := \frac{1}{(\log(1/t))^{1/3}} \quad \mbox{and} \quad \gamma_2(t) := \frac{1}{(\log(1/t))^{2/3}},\end{align*}$$

both for $0 < t < 1$ . The exponents $1/3$ and $2/3$ are not particularly special here: all we really need is the ordering $0 < 1/3 < 2/3 < 1$ . The next lemma gives a collection of simple bounds on the quantity $f(t,\alpha ',\alpha ")$ for different ranges of the arguments. Each part requires that t is sufficiently small in terms of d and k. The quantities $t_0^{\mathrm {(a)}}, t_0^{\mathrm {(b)}}$ , $t_0^{\mathrm {(c)}}$ and $t_0^{\mathrm {(d)}}$ are unspecified positive numbers that are sufficiently small in terms of only d and k. Recall we assume $k> d \geq 3$ .

Lemma 7.8. Write $\alpha := (1-t)\alpha ' + t\alpha "$ , and in parts (b), (c) and (d) let $p \in \{1,2\}$ (so that $\gamma _p$ is one of the two functions above). The function f from Lemma 7.7 satisfies the following.

  (a) If $t < t_0^{\mathrm {(a)}}$ and either $\alpha " \le \alpha ' \le 1/2$ or $\alpha " \ge \alpha '> 1/2$, then

    $$\begin{align*}f(t,\alpha',\alpha") \gtrsim_{d,k} \operatorname{\mathrm{H}}(\alpha).\end{align*}$$
    (This includes the assertion that the left-hand side is nonnegative; see Section 1.6.)
  (b) If $t < t_0^{\mathrm {(b)}}$ and $t\gamma _p(t) \le \alpha ' \le 1 - t\gamma _p(t)$, then

    $$\begin{align*}f(t,\alpha',\alpha") \gtrsim_{d,k} t\cdot (\log(1/t))^{1-p/3}.\end{align*}$$
  (c) If $t < t_0^{\mathrm {(c)}}$ and $\gamma _p(t) \le \alpha " \le 1 - \gamma _p(t)$, then

    $$\begin{align*}f(t,\alpha',\alpha") \gtrsim_{d,k} t\cdot (\log(1/t))^{1-p/3}.\end{align*}$$
  (d) If $t < t_0^{\mathrm {(d)}}$ and

    $$\begin{align*}[\ \alpha' < t\gamma_p(t)\ \mbox{or}\ \alpha'> 1-t\gamma_p(t) ]\ \mbox{and}\ [\ \alpha" < \gamma_p(t)\ \mbox{or}\ \alpha"> 1 - \gamma_p(t)\ ],\end{align*}$$
    then
    $$\begin{align*}\max\{ 0, -f(t,\alpha',\alpha")\} \lesssim_{d,k} t\cdot (\log(1/t))^{1-p/3}.\end{align*}$$
    (The maximum is used here to maintain the nonnegativity convention for $\lesssim $ .)

Observe that the hypotheses of at least one of parts (b), (c) or (d) must hold whenever $t < \min \{ t_0^{\mathrm {(b)}},t_0^{\mathrm {(c)}},t_0^{\mathrm {(d)}}\}$ .

Proof. Each part of this lemma is symmetric under replacing $(\alpha ',\alpha ")$ with ${(1-\alpha ',1-\alpha ")}$ , and so is the function f. We therefore assume that $\alpha ' \le 1/2$ throughout the proof.

Part (a). By the concavity of $\operatorname {\mathrm {H}}$ , we have

$$ \begin{align*} &(1-t)(1 - d^{-1})\operatorname{\mathrm{H}}(\alpha') + t\operatorname{\mathrm{H}}(\alpha") \\ &\le \big((1-t)(1 - d^{-1}) + t\big)\operatorname{\mathrm{H}}\left(\frac{(1-t)(1 - d^{-1})}{(1-t)(1 - d^{-1}) + t}\alpha' + \frac{t}{(1-t)(1 - d^{-1}) + t}\alpha"\right). \end{align*} $$

Since $1-d^{-1} < 1$ , the convex combination inside the argument of $\operatorname {\mathrm {H}}$ here skews more towards $\alpha "$ than does the convex combination that gives $\alpha $ . Therefore, since $\alpha " \le \alpha ' \le 1/2$ and $\operatorname {\mathrm {H}}$ is increasing on $[0,1/2]$ , the right-hand side above is bounded above by

$$\begin{align*}\big((1-t)(1 - d^{-1}) + t\big)\operatorname{\mathrm{H}}(\alpha) = (1 - d^{-1} + td^{-1})\operatorname{\mathrm{H}}(\alpha).\end{align*}$$

Therefore,

$$\begin{align*}f(t,\alpha',\alpha") \ge \big((1-k^{-1}) - (1-d^{-1} + td^{-1})\big)\operatorname{\mathrm{H}}(\alpha) = (d^{-1} - k^{-1} - td^{-1})\operatorname{\mathrm{H}}(\alpha),\end{align*}$$

which is $\gtrsim _{d,k} \operatorname {\mathrm {H}}(\alpha )$ provided $t < t_0^{\mathrm {(a)}}$ .

Part (b). For this part, our assumptions are now $\alpha ' \le 1/2$ and $\alpha ' \ge t\gamma _p(t)$ . Let ${c_1 := (d^{-1} - k^{-1})/2> 0}$ . Since $\alpha '\le 1/2$ and $\operatorname {\mathrm {H}}$ is continuous on $[0,1]$ and increasing on $[0,1/2]$ , we have that

(36) $$ \begin{align} \operatorname{\mathrm{H}}((1-t)\alpha' + t\alpha") \ge \frac{1-d^{-1} + c_1}{1-k^{-1}}\operatorname{\mathrm{H}}(\alpha') \quad \mbox{whenever}\ t < t_0^{\mathrm{(b)}}. \end{align} $$

If $\alpha ' \ge t\gamma _p(t)$ and $t < t_0^{\mathrm {(b)}}$ , then

(37) $$ \begin{align} \operatorname{\mathrm{H}}(\alpha') &\ge \operatorname{\mathrm{H}}(t\gamma_p(t)) \ge t\cdot \gamma_p(t) \cdot \big(\log(1/\gamma_p(t)) + \log(1/t)\big) \\ &\ge t\cdot \gamma_p(t) \cdot \log(1/t) = t \cdot (\log(1/t))^{1-p/3}. \nonumber \end{align} $$

Combining (36) and (37), we obtain

$$ \begin{align*} f(t,\alpha',\alpha") &\ge (1 - d^{-1} + c_1)\operatorname{\mathrm{H}}(\alpha') - (1-t)(1-d^{-1})\operatorname{\mathrm{H}}(\alpha') - t\operatorname{\mathrm{H}}(\alpha")\\ &\ge c_1\operatorname{\mathrm{H}}(\alpha') - t\operatorname{\mathrm{H}}(\alpha")\\ &\ge c_1\cdot t\cdot (\log(1/t))^{1-p/3} - \log 2\cdot t\\ &\gtrsim_{d,k} t\cdot (\log(1/t))^{1-p/3} \qquad \qquad \qquad \mbox{if}\ t < t_0^{\mathrm{(b)}}. \end{align*} $$

Part (c). For this part, our assumptions are now $\alpha ' \le 1/2$ and $\alpha " \ge \gamma _p(t)$ . We may also assume that $\alpha ' < t\gamma _p(t)$ , for otherwise, part (b) already gives the desired bound.

If, in addition, we have $\alpha ' \ge \alpha "$ , then part (a) gives

$$\begin{align*}f(t,\alpha',\alpha")\gtrsim_{d,k} \operatorname{\mathrm{H}}(\alpha),\end{align*}$$

and this in turn satisfies

(38) $$ \begin{align} \operatorname{\mathrm{H}}(\alpha) \ge \operatorname{\mathrm{H}}(t\alpha") \ge t\cdot \alpha"\cdot \log(1/t) \ge t\cdot \gamma_p(t) \cdot \log(1/t) = t\cdot (\log(1/t))^{1-p/3}. \end{align} $$

So for the rest of this part, assume in addition that $\alpha ' \le \alpha "$ . Then for sufficiently small t, we must have the ordering $\alpha ' \le \alpha < 1/2$ , and so

$$ \begin{align*} (1-t)(1-d^{-1})\operatorname{\mathrm{H}}(\alpha') + t\operatorname{\mathrm{H}}(\alpha") &\le (1-d^{-1})\operatorname{\mathrm{H}}(\alpha') + t\operatorname{\mathrm{H}}(\alpha")\\ &\le (1-d^{-1})\operatorname{\mathrm{H}}(\alpha) + t\operatorname{\mathrm{H}}(\alpha"). \end{align*} $$

Therefore, in this case, it suffices to show that

$$\begin{align*}(d^{-1} - k^{-1})\operatorname{\mathrm{H}}(\alpha) - t\operatorname{\mathrm{H}}(\alpha") \gtrsim_{d,k} t\cdot (\log(1/t))^{1-p/3}.\end{align*}$$

Since the second left-hand term here is $O(t)$ , this follows by another use of (38).

Part (d). For this case, we simply neglect the positive term in f entirely. If

$$\begin{align*}t < t_0^{\mathrm{(d)}}, \quad \alpha' < t\cdot \gamma_p(t) \quad \mbox{and}\quad \alpha" < \gamma_p(t),\end{align*}$$

then

$$\begin{align*}\operatorname{\mathrm{H}}(\alpha') \lesssim_{d,k} t\cdot \gamma_p(t)\cdot \log\frac{1}{t} = t\cdot (\log(1/t))^{1-p/3}\end{align*}$$

and

$$\begin{align*}t \operatorname{\mathrm{H}}(\alpha") = O(t) \lesssim_{d,k} t\cdot (\log(1/t))^{1-p/3}.\end{align*}$$

In the cases where $\alpha '> 1 - t \cdot \gamma _p(t)$ or $\alpha "> 1 - \gamma _p(t)$ , the same estimates hold, by the symmetry $\operatorname {\mathrm {H}}(x) = \operatorname {\mathrm {H}}(1-x)$ . Adding these estimates gives the conclusion.
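
The explicit constant extracted in the proof of part (a), namely $f \ge (d^{-1} - k^{-1} - td^{-1})\operatorname{H}(\alpha)$, can be verified on a grid. The sketch below (our own check, with $d=3$, $k=6$ as in the paper's standing assumption $k > d \ge 3$) tests the inequality over the region $\alpha'' \le \alpha' \le 1/2$.

```python
import math

def H(x):
    """Binary entropy with natural logarithm."""
    return 0.0 if x <= 0 or x >= 1 else -x * math.log(x) - (1 - x) * math.log(1 - x)

def f(t, a1, a2, d, k):
    """The function f(t, alpha', alpha'') from Lemma 7.7."""
    a = (1 - t) * a1 + t * a2
    return (1 - 1 / k) * H(a) - (1 - t) * (1 - 1 / d) * H(a1) - t * H(a2)

d, k, t = 3, 6, 0.01
steps = 50
ok = True
for i in range(steps + 1):
    a1 = 0.5 * i / steps              # alpha' in [0, 1/2]
    for j in range(steps + 1):
        a2 = a1 * j / steps           # alpha'' <= alpha'
        a = (1 - t) * a1 + t * a2
        lower = (1 / d - 1 / k - t / d) * H(a)
        ok = ok and f(t, a1, a2, d, k) >= lower - 1e-12
```

Equality holds along the diagonal $\alpha' = \alpha''$, which is why the concavity step in the proof cannot be improved there.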

Corollary 7.9. Fix $K\ge 1$ and $\delta> 0$ , and let the other notation be as for Lemma 7.8. If $\delta $ is sufficiently small in terms of d, k and K, and if

$$\begin{align*}t \le K\delta \quad \mbox{and} \quad \delta\cdot \gamma_1(\delta) \le \alpha \le 1 - \delta\cdot \gamma_1(\delta),\end{align*}$$

then

$$\begin{align*}f(t,\alpha',\alpha") \gtrsim_{d,k,K} \delta \cdot (\log(1/\delta))^{2/3}\end{align*}$$

(irrespective of any further bounds on $\alpha '$ and $\alpha "$ ).

Proof. By the same symmetry as for Lemma 7.8, we may assume that $\alpha ' \le 1/2$ .

Having done so, suppose first that $\alpha " \le \alpha '$ . Then part (a) of Lemma 7.8 gives

$$\begin{align*}f(t,\alpha',\alpha") \gtrsim_{d,k} \operatorname{\mathrm{H}}(\alpha),\end{align*}$$

and our assumed range for $\alpha $ gives

(39) $$ \begin{align} \operatorname{\mathrm{H}}(\alpha) \ge\operatorname{\mathrm{H}}(\delta\cdot \gamma_1(\delta)) \ge \delta\cdot (\log(1/\delta))^{2/3}, \end{align} $$

giving a lower bound of the desired form.

So now suppose that $\alpha ' \le 1/2$ and $\alpha " \ge \alpha '$ . Since $t \le K\delta $ , it follows that

$$\begin{align*}\alpha' \le \alpha\le (1-K\delta)\alpha' + K\delta.\end{align*}$$

Provided $\delta $ is sufficiently small in terms of d and k, this range of possible values for $\alpha '$ and $\alpha $ implies that

$$\begin{align*}(1-d^{-1})\operatorname{\mathrm{H}}(\alpha') \le \left(1-\frac{d^{-1}+k^{-1}}{2}\right)\operatorname{\mathrm{H}}(\alpha)\end{align*}$$

(noting that the constant in front of the entropy on the left is slightly smaller than the constant in front of the entropy on the right). Rearranging, this implies that

$$\begin{align*}(1-k^{-1})\operatorname{\mathrm{H}}(\alpha) - (1-d^{-1})\operatorname{\mathrm{H}}(\alpha') \ge \frac{d^{-1}-k^{-1}}{2}\operatorname{\mathrm{H}}(\alpha) \ge \frac{d^{-1}-k^{-1}}{2}\cdot \delta\cdot (\log(1/\delta))^{2/3},\end{align*}$$

using again the lower bound (39). This now gives

$$ \begin{align*} f(t,\alpha',\alpha") &\ge \frac{d^{-1}-k^{-1}}{2}\cdot \delta\cdot (\log(1/\delta))^{2/3} - t\big(\operatorname{\mathrm{H}}(\alpha") - (1-d^{-1})\operatorname{\mathrm{H}}(\alpha')\big)\\ &\ge \frac{d^{-1}-k^{-1}}{2}\cdot \delta\cdot (\log(1/\delta))^{2/3} - \log 2\cdot K\cdot \delta. \end{align*} $$

This implies the desired lower bound on f for all sufficiently small $\delta $ .

Proof of Proposition 7.1.

Fix K and $\varepsilon $ and also $F_i$ , $W_i$ , $w_i$ and M as in the statement of Proposition 7.1, and suppose that (25) holds. We prove the convergence to zero of the required probabilities provided that $\delta $ is small enough in terms of d, k, $\varepsilon $ and K.

First, assume $\delta $ is small enough that

(40) $$ \begin{align} \gamma_1(\delta) \le \varepsilon, \end{align} $$

and also small enough in terms of d, k and K that Corollary 7.9 applies.

By Lemma 7.7, and since a small choice of $\delta $ ensures that $n-w_i \ge n/2$ for each i, it suffices to show that, if $\delta $ is sufficiently small, then the sum

(41) $$ \begin{align} \sum_{i=1}^d f\left(\frac{w-w_i}{n-w_i},\frac{s_i}{n-w},\frac{r_i-s_i}{w-w_i}\right) \end{align} $$

is bounded below by a positive quantity that is independent of $\mathbf {r} \in k\mathcal {R}$ and $\mathbf {s} \in \mathcal {S}(\mathbf {r})$ .

So fix $\mathbf {r}$ and $\mathbf {s}$ , and let

$$\begin{align*}(t_i,\alpha^{\prime}_i,\alpha^{\prime\prime}_i) := \left(\frac{w-w_i}{n-w_i},\frac{s_i}{n-w},\frac{r_i-s_i}{w-w_i}\right)\end{align*}$$

and

$$ \begin{align*} \alpha_i = (1-t_i)\alpha^{\prime}_i + t_i \alpha^{\prime\prime}_i =\frac{r_i}{n-w_i}. \end{align*} $$

For each i, this implies that

$$\begin{align*}t_i \le \frac{w}{n} \le K \delta.\end{align*}$$

Assume $\delta $ is also small enough that $K\delta < \min \{t_0^{\mathrm {(a)}}, t_0^{\mathrm {(b)}},t_0^{\mathrm {(c)}},t_0^{\mathrm {(d)}}\}$ , so each $t_i$ is also less than this minimum.

Classify the indices $i \in [d]$ into two subsets:

$$ \begin{align*} I_1 &= \{i \in [d]:\ \varepsilon\delta \le r_i/(n-w_i) \le 1 - \varepsilon\delta\}\\ I_2 &= [d] \setminus I_1. \end{align*} $$

The definition of $\mathcal {R}$ implies that $I_1 \neq \emptyset $ . Consider the terms in (41) for indices in these subsets:

  • If $i \in I_1$ , then, by our choice of $\delta $ in (40) and since $t_i \le K\delta $ , Corollary 7.9 gives

    $$\begin{align*}f\left(\frac{w-w_i}{n-w_i},\frac{s_i}{n-w},\frac{r_i-s_i}{w-w_i}\right) \ge C_1 \cdot \delta \cdot (\log(1/\delta))^{2/3}\end{align*}$$
    for some positive constant $C_1$ depending only on d, k and K.
  • Next, for $i\in I_2$ , we use that at least one of parts (b), (c), and (d) of Lemma 7.8 must hold, which gives that

    $$ \begin{align*} \max\left\{ 0, \, -f\left(\frac{w-w_i}{n-w_i},\frac{s_i}{n-w},\frac{r_i-s_i}{w-w_i}\right) \right\} &\lesssim_{d,k} t_i\cdot (\log(1/t_i))^{1/3}\\ &\lesssim_K \delta\cdot (\log(1/\delta))^{1/3} \end{align*} $$
    provided $\delta $ is sufficiently small, and so the left-hand side is bounded above by $C_2\cdot \delta \cdot (\log (1/\delta ))^{1/3}$ for some positive $C_2$ depending only on d, k and K.

We bound (41) from below by adding these estimates. At least one term has $i \in I_1$ , and there are at most $d-1$ terms with $i \in I_2$ . This leaves the lower bound

$$\begin{align*}C_1\cdot \delta \cdot (\log(1/\delta))^{2/3} - (d-1)\cdot C_2\cdot \delta\cdot (\log(1/\delta))^{1/3}.\end{align*}$$

This is positive for all sufficiently small $\delta $ , uniformly over different choices of $\mathbf {r}$ or $\mathbf {s}$ , as required.

8 Proof of local and empirical convergence

In this section, we prove Theorem 6.1(2): we show that if $(\Omega ^{\prime }_n)_n$ is the sequence given by Theorem 6.1(1) and $\sigma _n \in \Omega ^{\prime }_n$ for each n, then the measures $\mu _n$ converge locally and empirically in probability to the Haar measure m.

Recall $X_n = X_{\sigma _n}$ is the set of parity check codewords over $\sigma _n \in \operatorname {\mathrm {Hom}}_{\mathrm {unif}}(\Gamma ,\operatorname {\mathrm {Sym}}(V_n))$ . For $v \in V_n$ , let

$$\begin{align*}X_{n,v} = \left\{ \Pi_v^{\sigma_n} \mathbf{x} \,:\, \mathbf{x} \in X_n \right\} \end{align*}$$

be the set of pullback names at v of all the codewords in $X_n$ . As above, let

$$\begin{align*}{\mathrm{Loc}({\mu_{n}}, {v})} = \left( \Pi_v^{\sigma_n} \right)_* \mu_n \in \operatorname{\mathrm{Prob}}(X) .\end{align*}$$

Let $X_{n,v,r}$ be the projection of $X_{n,v}$ onto $\mathbb {Z}_2^{B_r}$ . Call a vertex $v \in V_n$ r-proper if $X_{n,v,r} = X_{B_r}$ . Otherwise, call it r-improper.

The following lemma is a (stronger) version of [61, Lemma 3.47] for our random factor graph model.

Lemma 8.1. For any $r, \varepsilon>0$ ,

$$\begin{align*}\mathbb{P} \left( \frac{1}{{\lvert {V_n}\rvert} } {\lvert {\{ v \in V_n \,:\, v \text{ is } r\text{-improper} \}}\rvert} \geq \varepsilon \right) \to 0 \end{align*}$$

as $n \to \infty $ .

Proof. In general, $X_{n,v,r}$ is a vector subspace of $X_{B_r}$ . If v is r-improper, then $X_{n,v,r}$ is a subspace of strictly smaller dimension, so

(42) $$ \begin{align} \operatorname{\mathrm{H}}({\mathrm{Loc}({\mu_{n}}, {v})}_{B_r}) \leq \log {\lvert {X_{n,v,r}}\rvert} = (\dim X_{n,v,r}) \log 2 \leq (\dim X_{B_r} -1)\log 2 = \operatorname{\mathrm{H}}(m_{B_r}) - \log 2. \end{align} $$

With $\varepsilon ,r$ as given, pick $\eta = \frac {\varepsilon \log 2}{4}$ . By Theorem 7.4, for small enough $\delta>0$ , if for each n we pick a subset $R_n$ of $V_n$ of size $\lceil \delta n \rceil $ , then with high probability, $\sigma _n$ satisfies

$$\begin{align*}\operatorname{\mathrm{H}}( (\mu_n)_{\sigma_n^{B_r}\cdot R_n} ) \geq \left( \operatorname{\mathrm{H}}(m_{B_r}) - \eta \right) {\lvert {R_n} \rvert}. \end{align*}$$

Now for the sake of contradiction, suppose that

$$\begin{align*}\limsup_{n \to \infty} \mathbb{P} \left( \frac{1}{{\lvert {V_n}\rvert} } {\lvert {\{ v \in V_n \,:\, v \text{ is } r\text{-improper} \}}\rvert} \geq \varepsilon \right)> 0. \end{align*}$$

Then, by symmetry of the law of $\sigma _n$ , and using that ${\lvert {R_n} \rvert } \geq \delta {\lvert {V_n}\rvert } $ , the probability that the fraction of r-improper vertices within $R_n$ is at least $\frac {\varepsilon }{2}$ is uniformly bounded below for infinitely many n. But if an $\frac {\varepsilon }{2}$ fraction of vertices of $R_n$ are r-improper, then by (42) and subadditivity of Shannon entropy,

$$\begin{align*}\operatorname{\mathrm{H}}((\mu_n)_{\sigma_n^{B_r} \cdot R_n} ) \leq \left( \operatorname{\mathrm{H}}(m_{B_r}) - \tfrac{\varepsilon}{2} \log 2 \right) {\lvert {R_n}\rvert} .\end{align*}$$

Combining with the above, this implies that with nonvanishing probability,

$$\begin{align*}\left( \operatorname{\mathrm{H}}(m_{B_r}) - \eta \right) {\lvert {R_n}\rvert} \leq \left( \operatorname{\mathrm{H}}(m_{B_r}) - \tfrac{\varepsilon}{2} \log 2 \right) {\lvert {R_n}\rvert}. \end{align*}$$

But this is false by choice of $\eta $ , so it must be that

$$\begin{align*}\limsup_{n \to \infty} \mathbb{P} \left( \frac{1}{{\lvert {V_n}\rvert} } {\lvert {\{ v \in V_n \,:\, v \text{ is } r\text{-improper} \}}\rvert} \geq \varepsilon \right) = 0,\end{align*}$$

as desired.
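
The inequality (42) is an instance of a general fact: the uniform measure on a $\mathbb {Z}_2$-subspace has entropy $\dim \cdot \log 2$, so passing to a proper subspace costs at least $\log 2$. A toy sketch (the subspaces below are our own illustrations, not the actual $X_{n,v,r}$):

```python
import math

def span(gens):
    """All GF(2) linear combinations of the given bitmask generators."""
    vecs = {0}
    for g in gens:
        vecs |= {v ^ g for v in vecs}
    return vecs

full = span([1, 2, 4, 8])          # stand-in for X_{B_r}: all of Z_2^4
sub = span([0b0011, 0b0101])       # a proper subspace, dimension 2

H_full = math.log(len(full))       # entropy of the uniform measure = dim * log 2
H_sub = math.log(len(sub))
gap_ok = H_sub <= H_full - math.log(2) + 1e-12
```

Here the entropy gap is $2\log 2$, comfortably beyond the $\log 2$ minimum guaranteed by a single lost dimension.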

Lemma 8.2. Assume $k\ge 3$ . Then the action of $\Gamma $ on $(X,m_X)$ is mixing and hence ergodic.

Proof. We prove this using Fourier analysis on the compact Abelian group X.

Step 1. For any $g \in \Gamma $ , let $|g|$ be its word length in the generating set $\{s_i^j:\ i = 1,\dots ,d,\ j=1,\dots ,k-1\}$ . The characters of X form an orthonormal basis for $L^2_{\mathbb {C}}(m_X)$ , and all non-identity characters have mean zero. It therefore suffices to prove that, for any two non-identity characters $\chi _1$ and $\chi _2$ , we have

$$\begin{align*}\langle \chi_1,\chi_2\circ T^g\rangle = 0\end{align*}$$

whenever $|g|$ is sufficiently large.

Step 2. Every character of X is the restriction of a character of $\mathbb {Z}_2^\Gamma $ , and all of these have the form

$$\begin{align*}\chi_U(\mathbf{x}) := (-1)^{\sum_{g \in U}x_g} \qquad (\mathbf{x} \in \mathbb{Z}_2^\Gamma)\end{align*}$$

for some finite subset U of $\Gamma $ . In addition, the action T is by group automorphisms and satisfies $\chi _U\circ T^g = \chi _{gU}$ . Therefore, $\chi _U\cdot (\chi _W\circ T^g) = \chi _{U\triangle gW}$ for any U, W and g, where $\triangle $ denotes symmetric difference.

Let E be the set of all k-cycles of the form $\{g,gs_i,\dots ,gs_i^{k-1}\}$ in the Cayley graph of $\Gamma $ , where $g \in \Gamma $ and $i \in \{1,\dots ,d\}$ . As in the Introduction, we can regard E as the set of k-hyper-edges of a hypergraph on $\Gamma $ , and the corresponding characters $\chi _e$ for $e \in E$ give the parity checks that define the LDPC subgroup X. Therefore, by Pontrjagin duality, a character $\chi _U$ restricts to the identity character on X if and only if there is a finite subfamily F of E such that

$$\begin{align*}1_U = \sum_{e \in F}1_e \quad\mod 2.\end{align*}$$

Let us write $X^\perp $ for the collection of finite subsets U that have this property. Regarded as a subspace of $\mathbb {Z}_2^{\oplus \Gamma }$ , this $X^\perp $ is the linear span of the set $\{1_e:\ e \in E\}$ .

In these terms, we must show that if U and W are finite subsets of $\Gamma $ and neither of them lies in $X^\perp $ , then $U\triangle gW$ also does not lie in $X^\perp $ whenever $|g|$ is sufficiently large. Fix such U and W for the rest of the proof.

Step 3. Let $B_r$ and $B_s$ be closed balls around the identity in the right Cayley graph that contain U and W, respectively. More specifically, in this last step of the proof, we assume that $g \in \Gamma $ satisfies both (i) $|g|> r+ s+ 2$ and also (ii) $U\triangle gW \in X^\perp $ , and derive a contradiction from these. Assumption (i) implies U and $gW$ are disjoint, and therefore, $U\triangle gW=U\cup gW$ . Having made assumption (ii), let F be a finite subfamily of E such that

(43) $$ \begin{align} 1_{U\triangle gW} = \sum_{e \in F}1_e \quad\mod 2. \end{align} $$

To work with elements of $X^\perp $ , it helps to introduce the dual graph $(E,\tilde {E})$ of the hypergraph $(\Gamma ,E)$ . The vertices of the dual graph are the hyperedges in E, and two hyperedges $e_1$ and $e_2$ are joined in $\tilde {E}$ if and only if $e_1\cap e_2 \ne \emptyset $ .

In the dual graph, F is the union of its connected components. At least one of these must meet both U and $gW$ . Indeed, otherwise we could let G be the union of those connected components of F that meet U and would then find that $1_U$ agrees with $\sum _{e \in G}1_e$ mod $2$ , contradicting our assumption that $U \not \in X^\perp $ .

Next, because the hypergraph $(\Gamma ,E)$ is a hyper-tree and we have $|g|> r+s +2$ , there is a single hyperedge $e_0$ lying ‘between’ $B_r$ and $gB_s$ in the following sense: removing this hyperedge $e_0$ disconnects that hypergraph into k components, one of which contains the whole of $B_r$ (and hence U), and a different one of which contains the whole of $gB_s$ (and hence $gW$ ).

As a result, if $F_0$ is a connected component of F in the dual graph which meets both U and $gW$ , then $F_0$ must contain $e_0$ . Now consider again the k connected components of the hypergraph $(\Gamma ,E\setminus e_0)$ . Since $k\ge 3$ , at least one of them does not meet either U or $gW$ ; let $V \subset \Gamma $ be one such component. Then V meets $e_0$ in a single vertex (that is, group element) h.

Finally, let $h'$ be a vertex of $V\cap \bigcup _{e\in F}e$ that lies at maximal distance from h. This $h'$ may be equal to h, or it may lie ‘deeper inside’ the component V. However, by that distance maximality and the hyper-tree structure of $(\Gamma ,E)$ , $h'$ can be contained in only one member of F, and because $h' \in V$ , it cannot lie in either U or $gW$ . So at $h'$ , the left-hand side of (43) must equal zero, while the right-hand side must equal  $1$ . This is the desired contradiction with that equation.
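
The character calculus used in Step 2 (multiplicativity under symmetric difference, and mean zero for every non-identity character) can be verified directly on a finite model $\mathbb {Z}_2^n$ in place of $\mathbb {Z}_2^\Gamma$. The sketch below is our own illustration.

```python
import itertools

n = 4

def chi(U, x):
    """The character chi_U(x) = (-1)^{sum over g in U of x_g} on Z_2^n."""
    return (-1) ** sum(x[g] for g in U)

cube = list(itertools.product([0, 1], repeat=n))
U, W = frozenset({0, 2}), frozenset({2, 3})
sym_diff = U ^ W

# multiplicativity: chi_U * chi_W = chi_{U symmetric-difference W}
mult_ok = all(chi(U, x) * chi(W, x) == chi(sym_diff, x) for x in cube)

# every non-identity character has mean zero (hence distinct characters
# are orthogonal in L^2 of the uniform measure)
mean_zero_ok = all(
    sum(chi(S, x) for x in cube) == 0
    for r in range(1, n + 1) for S in itertools.combinations(range(n), r)
)
```

Multiplicativity holds because overlapping indices contribute twice and so cancel mod 2, which is exactly the symmetric-difference rule used for $\chi_U\cdot (\chi_W\circ T^g) = \chi_{U\triangle gW}$.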

Proof of Theorem 6.1(2).

Let ${\mathrm {Loc}({\mu _n}, {v})}_{B_r}$ denote the marginal of ${\mathrm {Loc}({\mu _n}, {v})}$ on $X_{n,v,r}$ .

Since ${\mathrm {Loc}({\mu _n}, {v})}_{B_r}$ is the uniform distribution on $X_{n,v,r}$ and $m_{B_r}$ is the uniform distribution on $X_{B_r}$ , local convergence in probability to $m_X$ is implied by

$$\begin{align*}\lim_{n\to\infty} \mathbb{P} \left( \frac{1}{{\lvert {V_n}\rvert} } {\lvert { \left\{ v \in V_n \,:\, X_{n,v,r} \ne X_{B_r} \right\}} \rvert}> \varepsilon \right) = 0 , \end{align*}$$

which is true by Lemma 8.1.

Since $(\mu _n)_n$ converges locally in probability to $m_X$ , which is ergodic by Lemma 8.2, Lemma 4.1 completes the proof.

9 Proof of Theorem 1.3

Next, we prove part (a) of Theorem 1.3. After this, part (b) of Theorem 1.3 is an immediate consequence of Theorem 6.1 (parts 1 and 2) and Theorem 4.3.

9.1 Proof of the sofic entropy value

Proof of Theorem 1.3, part (a)

Let $\sigma _n \sim \mathbb {P}_n$ and let $\Omega _n' \subseteq \Omega _n^{\mathrm {sofic}}$ be as in Proposition 3.2. Let $\Sigma =\{\sigma _n\}_{n=1}^\infty $ satisfy $\sigma _n \in \Omega ^{\prime }_{i_n}$ for some increasing sequence $(i_n)_n$ with $k \mid i_n$ for all n. It suffices to prove

$$ \begin{align*}\operatorname{\mathrm{h}}_\Sigma(X,m_X,T)= (1-d/k)\log 2.\end{align*} $$

By definition,

$$ \begin{align*}\mathrm{H_K}(m_X) = (1-d) \operatorname{\mathrm{H}}( W_{m_X}(\cdot) ) + \frac{1}{k} \sum_{i \in [d]} \operatorname{\mathrm{H}}( W_{m_X}(\cdot; i) ).\end{align*} $$

The weight $W_{m_X}$ is uniform on $\mathbb {Z}_2$ . So $ \operatorname {\mathrm {H}}( W_{m_X}(\cdot ) )=\log (2)$ . For each $i\in [d]$ , the measure $W_{m_X}(\cdot; i)$ is supported on the subspace of $\mathbb {Z}_2^k$ which is the kernel of the homomorphism $(x_1,\ldots , x_k) \mapsto \sum _{j=1}^k x_j \in \mathbb {Z}_2$ . This subspace has cardinality $2^{k-1}$ . So

$$ \begin{align*}\operatorname{\mathrm{H}}( W_{m_X}(\cdot; i) ) \leq (k-1)\log(2).\end{align*} $$

Combined with the previous formula, we obtain

$$ \begin{align*}\mathrm{H_K}(m_X) \leq (1 -d/k)\log(2).\end{align*} $$

The conclusion of Proposition 3.2 now implies the upper bound

$$ \begin{align*}\operatorname{\mathrm{h}}_\Sigma(X,m_X,T)\le (1-d/k)\log 2.\end{align*} $$

By Theorem 6.1(2), $\mu _n$ converges empirically in probability to $m_X$ . Therefore, if $\mathcal {O}$ is any open neighborhood of $m_X$ , then

$$ \begin{align*}1=\lim_{n\to\infty} \mu_n(\Omega(\sigma_n, \mathcal{O})).\end{align*} $$

Since $\mu _n$ is the uniform measure on $X_{\sigma _n}$ , this implies

$$ \begin{align*}\operatorname{\mathrm{h}}_\Sigma(X,m_X,T)\ge \lim_{n\to\infty} \frac{1}{i_n} \log |X_{\sigma_n}|.\end{align*} $$

We may estimate $|X_{\sigma _n}|$ by simple dimension-counting. In fact, $X_{\sigma _n}$ is the kernel of a homomorphism from $\mathbb {Z}_2^{i_n}$ to $\mathbb {Z}_2^{di_n/k}$ . So

$$ \begin{align*}|X_{\sigma_n}| \ge 2^{(1-d/k)i_n}.\end{align*} $$

This gives the lower bound

$$ \begin{align*}\operatorname{\mathrm{h}}_\Sigma(X,m_X,T)\ge (1-d/k)\log 2.\\[-35pt]\end{align*} $$
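
The dimension count at the end of this proof is rank-nullity over $\mathbb {Z}_2$: a homomorphism into $\mathbb {Z}_2^{dn/k}$ has rank at most $dn/k$, so its kernel has dimension at least $(1-d/k)n$. Here is a hedged sketch, with a random parity-check matrix standing in for the actual checks determined by $\sigma _n$.

```python
import random

def gf2_rank(rows):
    """Rank over GF(2); each row is a bitmask-encoded vector of length n."""
    basis = {}                       # highest set bit -> pivot row
    for row in rows:
        while row:
            hb = row.bit_length() - 1
            if hb not in basis:
                basis[hb] = row
                break
            row ^= basis[hb]         # eliminate the leading bit
    return len(basis)

random.seed(0)
d, k, n = 3, 6, 60                   # n divisible by k, as in the proof
m = d * n // k                       # number of parity-check equations
rows = [random.getrandbits(n) for _ in range(m)]
kernel_dim = n - gf2_rank(rows)      # rank-nullity over GF(2)
ok = kernel_dim >= n - m             # hence |kernel| >= 2^{(1 - d/k) n}
```

The inequality holds for every parity-check matrix, random or not, since the rank can never exceed the number of rows.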

10 Proof of Theorems B and C

Theorem B is an immediate consequence of Theorem 6.1(3) and Corollary 5.5(2).

10.1 Expected number of low-density codewords

In this section, we derive an asymptotic formula for the expected number of low-density codewords. Versions of this appear in [49, Formula (11.10)] and [61, Lemma 3.163]. The authors of [49] and [61] use a configuration model instead of a permutation model. As discussed above, these models are not contiguous (see Appendix A). Moreover, we are able to show a stronger result than contiguity alone would imply.

For $\eta>0$ and $\sigma \in \operatorname {\mathrm {Hom}}_{\mathrm {unif}}(\Gamma , \operatorname {\mathrm {Sym}}(kn))$ , let $X_{\sigma }^\eta $ denote the set of $\mathbf {x} \in (\mathbb {Z}_2)^{kn}$ such that, for each of the d types of hyper-edge, the sum of $\mathbf {x}$ around all but at most a fraction $\eta $ of the hyper-edges of that type is even. More formally,

$$\begin{align*}X_\sigma^\eta = \bigcap_{i \in [d]} \left\{\mathbf{x} \in (\mathbb{Z}_2)^{kn} \,:\, {\lvert {\left\{v \in [kn] \,:\, \sum_{j=0}^{k-1} \mathbf{x}(\sigma_i^j v) = 0 \quad\pmod{2} \right\}}\rvert}> kn (1-\eta) \right\} .\end{align*}$$

We can think of these as ‘approximate codewords’.

For $t \in [0,1]$ , we define the upper exponential growth rate of the expected number of approximate codewords of density t by

$$\begin{align*}G_{\mathrm{cw}}(t) = \inf_{\varepsilon, \eta>0}\limsup_{n \to \infty} \frac{1}{kn} \log \mathbb{E}_{\sigma \sim \mathbb{P}_{kn}} {\lvert {\{\mathbf{x} \in X_\sigma^\eta \,:\, \tfrac{1}{kn}{\lvert {\mathbf{x}}\rvert} \in (t-\varepsilon,\, t+\varepsilon) \}}\rvert}. \end{align*}$$

Proposition 10.1. For any $d,k$ ,

$$\begin{align*}G_{\mathrm{cw}}(t) = \frac{1}{2} t \big( d \log (k-1)- d +2 +(d -2) \log (t) \big)+O(t^2). \end{align*}$$

In particular, for $k \geq 2$ , this is negative for small $t>0$ if and only if $d> 2$ .

The ‘ $O(t^2)$ ’ term here is a power series convergent on some neighborhood of $t=0$ with lowest-order term $t^2$ .

Figure 2 compares exact plots of $G_{\mathrm {cw}}(t)$ (created using parametric plots in the parameter s of Lemma 10.2) with plots of this approximation.

Figure 2 Comparison of $G_{\mathrm {cw}}(t)$ (solid lines) with asymptotic in Proposition 10.1 (dashed lines) for $k=6$ and several choices of d.
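Such parametric plots are straightforward to reproduce. The following illustrative Python sketch evaluates $G_{\mathrm {cw}}$ via the parametrization of Lemma 10.2 below and checks the sign behaviour asserted in Proposition 10.1 at one sample parameter value (the choices $k=6$ , $s=0.05$ are our own and arbitrary):

```python
import math

def gcw(s, d, k):
    """Evaluate (t, G_cw(t)) from the parametrization of Lemma 10.2."""
    Z = 0.5 * ((1 + s) ** k + (1 - s) ** k)
    t = s * ((1 + s) ** (k - 1) - (1 - s) ** (k - 1)) / (2 * Z)
    H = -t * math.log(t) - (1 - t) * math.log(1 - t)  # Shannon entropy, in nats
    return t, (1 - d) * H + (d / k) * (-k * t * math.log(s) + math.log(Z))

k = 6
t3, g3 = gcw(0.05, d=3, k=k)  # d > 2: negative for small t
t2, g2 = gcw(0.05, d=2, k=k)  # d = 2: positive for small t
assert g3 < 0 < g2
```

Note that t depends only on s and k, so the two calls above sample the two curves at the same density t.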

The proof of Proposition 10.1 is based on the following lemma.

Lemma 10.2. For any $k,d$ and any $t \in [0,1]$ ,

$$\begin{align*}G_{\mathrm{cw}}(t) = (1-d) \operatorname{\mathrm{H}}(t) + \frac{d}{k}\left( -kt \log s + \log Z \right) , \end{align*}$$

where s and Z are related to t via

$$\begin{align*}t = s \frac{(1+s)^{k-1} - (1-s)^{k-1}}{(1+s)^k + (1-s)^k} \quad \text{and} \quad Z = \frac{1}{2} \left( (1+s)^k + (1-s)^k \right). \end{align*}$$

Proof of Lemma 10.2.

Here, let $\mathtt {A} = \mathbb {Z}_2$ . We also let $\mathcal {W}$ denote the set of weights whose edge weights are cyclically invariant and supported on configurations of even parity; the terminology of weights was defined in Section 3.2.

We first show that, for every $t \in [0,1]$ ,

(44) $$ \begin{align} G_{\mathrm{cw}}(t) = \sup_{W \in \mathcal{W} \,:\, W(1) = t} \mathrm{H_K}(W). \end{align} $$

First, suppose $W \in \mathcal {W}$ has $W(1) = t$ . For every $\varepsilon , \eta> 0$ , if $\delta>0$ is small enough, then

$$\begin{align*}\{ \mathbf{x} \in (\mathbb{Z}_2)^{kn} \,:\, {\| {W_{\sigma, \mathbf{x}} - W}\|} < \delta\} \subseteq \{ \mathbf{x} \in X_\sigma^\eta \,:\, \tfrac{1}{kn} {\lvert {\mathbf{x}} \rvert} \in (t - \varepsilon, t+ \varepsilon) \} .\end{align*}$$

By Proposition 3.1, this implies that $\mathrm {H_K}(W) \leq G_{\mathrm {cw}}(t)$ , which gives one inequality in Equation (44).

For the converse inequality: Given $\varepsilon ,\eta>0$ , let $\mathcal {W}_{\varepsilon ,\eta }$ be the set of cyclically-invariant weights with ${\lvert {W(1) - t}\rvert } < \varepsilon $ and each $W(\cdot; i)$ giving mass at least $1 - \eta $ to even-parity configurations.

Given $\delta>0$ , Proposition 3.1 provides, for each weight W, some $r_W> 0$ which satisfies

$$\begin{align*}\limsup_{n \to \infty} \frac{1}{kn} \log \mathbb{E}_\sigma {\lvert {\{ \mathbf{x} \in (\mathbb{Z}_2)^{kn} \,:\, {\| {W_{\sigma, \mathbf{x}} - W} \|} < r_W \}} \rvert} \leq \mathrm{H_K}(W) + \delta. \end{align*}$$

By compactness, there is a finite set $\mathcal {S} \subset \overline {\mathcal {W}_{\varepsilon ,\eta }}$ such that the balls centered at $W \in \mathcal {S}$ of radius $r_W$ cover $\mathcal {W}_{\varepsilon ,\eta }$ . Then we have

$$\begin{align*}\{ \mathbf{x} \in X_\sigma^\eta \,:\, \tfrac{1}{kn}{\lvert {\mathbf{x}}\rvert} \in (t-\varepsilon, t+\varepsilon) \} \subseteq \bigcup_{W \in \mathcal{S}} \{ \mathbf{x} \in (\mathbb{Z}_2)^{kn} \,:\, {\| {W_{\sigma, \mathbf{x}} - W} \|} < r_W \}. \end{align*}$$

Therefore,

$$ \begin{align*} \limsup_{n \to \infty} &\frac{1}{kn} \log \mathbb{E}_\sigma {\lvert {\{ \mathbf{x} \in X_\sigma^\eta \,:\, \tfrac{1}{kn}{\lvert {\mathbf{x}}\rvert} \in (t-\varepsilon, t+\varepsilon) \}}\rvert} \\ &\leq \max_{W \in \overline{\mathcal{W}_{\varepsilon,\eta}}} \limsup_{n \to \infty} \frac{1}{kn} \log \mathbb{E}_\sigma {\lvert {\{ \mathbf{x} \in (\mathbb{Z}_2)^{kn} \,:\, {\| {W_{\sigma, \mathbf{x}} - W}\|} < r_W \}}\rvert} \\ &\leq \max_{W \in \overline{\mathcal{W}_{\varepsilon,\eta}}} \mathrm{H_K}(W) + \delta. \end{align*} $$

Taking $\varepsilon ,\eta $ and then $\delta $ to 0 gives the other inequality in Equation (44).

Now any weight achieving the supremum in Equation (44) must have all its edge weights equal: to maximize $\mathrm {H_K}$ under the constraint $W(1) = t$ , the $W(\cdot; i)$ that maximizes the edge term in the definition of $\mathrm {H_K}(W)$ should be used for every $i \in [d]$ .

We can specify such a weight by a single probability vector $\mathbf {p} \in \operatorname {\mathrm {Prob}}(\mathtt {A}^{\mathbb {Z}_k}/\mathbb {Z}_k)$ recording the edge weight. We use a probability measure on $\mathtt {A}^{\mathbb {Z}_k}/\mathbb {Z}_k$ rather than $\mathtt {A}^{\mathbb {Z}_k}$ because the edge weights must be invariant under cyclic permutations. Let $W_{\mathbf {p}}$ be the weight with edge weights specified by $\mathbf {p}$ , and write $\alpha (\mathbf {p}) = W_{\mathbf {p}}(1)$ . The Kikuchi entropy of $W_{\mathbf {p}}$ is

$$\begin{align*}\mathrm{H_K}(W_{\mathbf{p}}) = (1-d) \operatorname{\mathrm{H}}(\alpha(\mathbf{p})) + \frac{d}{k}\left( \operatorname{\mathrm{H}}(\mathbf{p}) + \sum_{[\mathbf{a}] \in \mathtt{A}^{\mathbb{Z}_k}/\mathbb{Z}_k} \mathbf{p}([\mathbf{a}]) \log {\lvert {[\mathbf{a}]}\rvert}\right), \end{align*}$$

where ${\lvert {[\mathbf {a}]}\rvert }$ is the number of elements of the equivalence class $[\mathbf {a}] \in \mathtt {A}^{\mathbb {Z}_k}/\mathbb {Z}_k$ .

Now, we are interested in estimating

$$\begin{align*}G_{\mathrm{cw}}(t) = \max_{\mathbf{p} \,:\, \alpha(\mathbf{p})=t} \mathrm{H_K}(W_{\mathbf{p}}) , \end{align*}$$

where the maximum is also constrained to $\mathbf {p}$ supported on equivalence classes of even-parity configurations. Since $\alpha (\mathbf {p})$ is fixed to be t, we really just need to maximize

$$\begin{align*}\operatorname{\mathrm{H}}(\mathbf{p}) + \sum_{[\mathbf{a}] \in \mathtt{A}^{\mathbb{Z}_k}/\mathbb{Z}_k} \mathbf{p}([\mathbf{a}]) \log {\lvert {[\mathbf{a}]}\rvert} \end{align*}$$

subject to the constraints

$$\begin{align*}\sum_{[\mathbf{a}]} \mathbf{p}([\mathbf{a}]) \alpha([\mathbf{a}]) = kt \quad \text{and} \quad \sum_{[\mathbf{a}]} \mathbf{p}([\mathbf{a}]) = 1, \end{align*}$$

where $\alpha ([\mathbf {a}])$ is the number of $1$ ’s in any representative of $[\mathbf {a}]$ . So we get two Lagrange multipliers $\lambda _1, \lambda _2$ so that a maximizer on the interior of the constraint region is given by

$$\begin{align*}\mathbf{p}([\mathbf{a}]) = {\lvert {[\mathbf{a}]}\rvert} \cdot e^{\lambda_1 \alpha([\mathbf{a}]) + \lambda_2 - 1} \end{align*}$$

for $[\mathbf {a}] \in \mathtt {A}^{\mathbb {Z}_k}/\mathbb {Z}_k$ with even parity and 0 otherwise. We rewrite this as

$$\begin{align*}\mathbf{p}([\mathbf{a}]) = {\lvert {[\mathbf{a}]}\rvert} \cdot \frac{s^{\alpha([\mathbf{a}])}}{Z}, \end{align*}$$

where Z is determined by the normalization constraint and s is determined by the density-t constraint. Since the objective function is strictly concave, the critical point given by these $Z,s$ is in fact the unique maximum. The corresponding $\mathrm {H_K}$ -maximizing weight W therefore satisfies

$$\begin{align*}W(\mathbf{a}; i) = \frac{s^{\alpha(\mathbf{a})}}{Z} \end{align*}$$

if $\alpha (\mathbf {a})$ (the number of $1$ ’s in $\mathbf {a}$ ) is even and $W(\mathbf {a}; i) = 0$ otherwise.

Note that

(45) $$ \begin{align} Z = \frac{1}{2} \left( (1+s)^k + (1-s)^k \right) \end{align} $$

since

$$\begin{align*}Z = \sum_{\substack{\mathbf{a} \in \mathtt{A}^k \\ \alpha(\mathbf{a}) \text{ even}}} s^{\alpha(\mathbf{a})} = \sum_{\substack{m = 0 \\ m \text{ even}}}^k s^m \binom{k}{m} = \frac{1}{2} \left( \sum_{m=0}^k s^m \binom{k}{m} + \sum_{m=0}^k (-s)^m \binom{k}{m} \right). \end{align*}$$

The constraint $\sum _{[\mathbf {a}]} \mathbf {p}([\mathbf {a}])\, \alpha ([\mathbf {a}]) = kt$ gives the relation between t and s:

$$\begin{align*}kt = \sum_{[\mathbf{a}] \text{ even}} {\lvert {[\mathbf{a}]} \rvert} \frac{s^{\alpha([\mathbf{a}])}}{Z} \cdot \alpha([\mathbf{a}]) = s \frac{\frac{dZ}{ds}}{Z} = s k \frac{(1+s)^{k-1} - (1-s)^{k-1}}{(1+s)^k + (1-s)^k}. \end{align*}$$

Finally, we can calculate

$$\begin{align*}\operatorname{\mathrm{H}}(W(\cdot; i)) = -\sum_{\mathbf{a} \text{ even}} \frac{s^{\alpha(\mathbf{a})}}{Z}\alpha(\mathbf{a}) \log s + \log Z = -kt \log s + \log Z, \end{align*}$$

which gives the claimed formula.
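Both the closed form (45) for Z and the density relation $kt = s Z'/Z$ can be confirmed by brute-force enumeration over even-parity configurations in $\mathtt {A}^k$ (equivalent to summing over equivalence classes, since $\mathbf {p}([\mathbf {a}]) = |[\mathbf {a}]|\, s^{\alpha ([\mathbf {a}])}/Z$ ). The following illustrative Python check uses arbitrary values $k=6$ , $s=0.3$ of our own choosing:

```python
import itertools
import math

k, s = 6, 0.3

# Enumerate even-parity configurations a in {0,1}^k directly.
evens = [a for a in itertools.product((0, 1), repeat=k) if sum(a) % 2 == 0]

Z_enum = sum(s ** sum(a) for a in evens)
Z_closed = 0.5 * ((1 + s) ** k + (1 - s) ** k)  # Equation (45)
assert abs(Z_enum - Z_closed) < 1e-12

# The density constraint: the p-average of alpha(a) equals kt = s Z'/Z.
kt_enum = sum(sum(a) * s ** sum(a) for a in evens) / Z_enum
kt_closed = (s * k * ((1 + s) ** (k - 1) - (1 - s) ** (k - 1))
             / ((1 + s) ** k + (1 - s) ** k))
assert abs(kt_enum - kt_closed) < 1e-12
```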

This lemma does not give an explicit formula for $G_{\mathrm {cw}}(t)$ : we have t as a function of s, but we do not have an explicit formula for the inverse function. Still, we can use it to prove Proposition 10.1.

Proof of Proposition 10.1.

Expanding the formula for t as a power series centered at $s=0$ , we get $t = (k-1) s^2 + O(s^4)$ , so by Lagrange inversion [Reference Whittaker and Watson75, §7.32] there is a power series for $s^2$ on some interval around $0$ of the form

$$\begin{align*}s^2 = \frac{t}{k-1} + O(t^2). \end{align*}$$

Hence,

$$\begin{align*}t \log s = \frac{1}{2} t \log \frac{t}{k-1} + O(t^2) \end{align*}$$

and

$$\begin{align*}\log Z = \frac{1}{2} k (k-1) s^2 + O(s^4) = \frac{1}{2} kt + O(t^2). \end{align*}$$

The estimate for Shannon entropy

$$\begin{align*}\operatorname{\mathrm{H}}(t) = -t \log t + t + O(t^2) \end{align*}$$

completes the proof.
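The quality of the resulting approximation can be checked numerically: evaluating $G_{\mathrm {cw}}$ exactly via the parametrization of Lemma 10.2 and comparing with the leading term of Proposition 10.1, the discrepancy should shrink like $t^2$ . In the illustrative Python sketch below, the sample values $k=6$ , $d=3$ and the error tolerance $5t^2$ are our own choices:

```python
import math

k, d = 6, 3

def exact(s):
    # Parametric formula of Lemma 10.2.
    Z = 0.5 * ((1 + s) ** k + (1 - s) ** k)
    t = s * ((1 + s) ** (k - 1) - (1 - s) ** (k - 1)) / (2 * Z)
    H = -t * math.log(t) - (1 - t) * math.log(1 - t)
    return t, (1 - d) * H + (d / k) * (-k * t * math.log(s) + math.log(Z))

def leading(t):
    # Leading term of Proposition 10.1; the error is O(t^2).
    return 0.5 * t * (d * math.log(k - 1) - d + 2 + (d - 2) * math.log(t))

for s in (0.1, 0.05, 0.02):
    t, g = exact(s)
    assert abs(g - leading(t)) < 5 * t ** 2  # agreement up to O(t^2)
```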

10.2 Proof of Theorem 6.1(3)

Recall that $\mu $ has totally shattered microstate spaces along $(\sigma _n)_n$ if there exists a $\delta> 0$ for which the following holds: For every $\varepsilon> 0$ there exist a weak $^\ast $ neighborhood U of $\mu $ and a positive integer $n_0$ such that for any $n\ge n_0$ and any two microstates $\mathbf {x},\mathbf {y} \in \Omega (\sigma _n,U)$ , we have either $\textrm {d}^{(V_n)}(\mathbf {x},\mathbf {y}) \ge \delta $ or $\textrm {d}^{(V_n)}(\mathbf {x},\mathbf {y}) < \varepsilon $ .

Proof of Theorem 6.1(3).

Given $k \geq 2$ and $d>2$ , by Proposition 10.1 there is some $\delta>0$ such that $G_{\mathrm {cw}}(t) < 0$ for $t \in (0,\delta )$ . We will use this $\delta $ to establish that the microstate spaces are totally shattered.

For any fixed $\varepsilon>0$ ,

$$\begin{align*}\inf_{\eta> 0} \limsup_{n \to \infty} \frac{1}{n} \log \mathbb{E} | \{ \mathbf{x} \in X_{n}^\eta \,:\, \tfrac{1}{n} |\mathbf{x}| \in [\varepsilon, \delta] \} | = \sup\{ G_{\mathrm{cw}}(t) \,:\, t \in (\varepsilon, \delta) \} < 0. \end{align*}$$

Pick some $\eta $ such that the expression in the infimum is negative, and let $U^\eta \subset \operatorname {\mathrm {Prob}}(\mathbb {Z}_2^\Gamma )$ be the set of all probability measures whose marginal on every hyper-edge gives probability greater than $1 - \eta $ to labelings with even parity. Then $U^\eta $ is a weak $^\ast $ neighborhood of $m_X$ and

$$\begin{align*}X_{n}^\eta = \Omega(\sigma_{n}, U^\eta) , \end{align*}$$

so

$$\begin{align*}\limsup_{n \to \infty} \frac{1}{n} \log \mathbb{E} | \{ \mathbf{x} \in \Omega(\sigma_{n}, U^\eta) \,:\, \tfrac{1}{n} |\mathbf{x}| \in [\varepsilon, \delta] \} | < 0, \end{align*}$$

and there are subsets $\Omega _n^\varepsilon \subset \Omega _n^{\mathrm {sofic}}$ with $\mathbb {P}_n(\Omega _n^\varepsilon ) \to 1$ and such that for all large enough n, if $\sigma _n \in \Omega _n^\varepsilon $ , then $\{ \mathbf {x} \in \Omega (\sigma _{n}, U^\eta ) \,:\, \tfrac {1}{n} |\mathbf {x}| \in [\varepsilon , \delta ] \} = \varnothing $ . Now if $\mathbf {x}, \mathbf {y} \in \Omega (\sigma _{n}, U^{\eta /2})$ , then $\mathbf {x}+\mathbf {y} \in \Omega (\sigma _{n}, U^{\eta })$ , so

$$\begin{align*}\textrm{d}^{(V_n)}(\mathbf{x}, \mathbf{y}) = \tfrac{1}{n} |\mathbf{x} + \mathbf{y}| \notin [\varepsilon, \delta]. \end{align*}$$

We can then get a single sequence $\Omega _n'$ that works for every $\varepsilon $ by picking one for each $\varepsilon = \frac {1}{2}, \frac {1}{3}, \ldots $ and then using a diagonal argument.
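The displayed identity $\textrm {d}^{(V_n)}(\mathbf {x}, \mathbf {y}) = \tfrac {1}{n}|\mathbf {x}+\mathbf {y}|$ uses only the fact that coordinatewise addition in $\mathbb {Z}_2$ is XOR, so the normalized Hamming distance between two labelings equals the normalized weight of their sum. A minimal illustrative check, with an encoding of labelings as integer bitmasks that is entirely our own:

```python
import random

random.seed(1)
n = 64
x = random.getrandbits(n)  # a Z_2-labeling of n vertices, encoded as a bitmask
y = random.getrandbits(n)

# Coordinatewise addition in Z_2 is XOR, so the normalized Hamming distance
# between x and y is the normalized weight of x + y.
dist = sum(((x >> i) & 1) != ((y >> i) & 1) for i in range(n)) / n
assert dist == bin(x ^ y).count("1") / n
```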

10.3 Proof of Theorem C

Theorem C is an immediate consequence of Corollary 5.5(3) and Theorem 6.1(3).

11 Directions for further study

A probability measure-preserving action $\Gamma {\curvearrowright } (X,\mu )$ is anti-Pinsker if it has positive entropy but does not have any nontrivial direct Bernoulli factors. We are being deliberately vague here by not specifying whether ‘positive entropy’ refers to sofic, Rokhlin or some other notion of entropy.

11.1 Possible anti-Pinsker actions of other groups

  1. Are there explicit anti-Pinsker actions of a free group? One candidate is the frozen model associated to independent sets [Reference Ding, Sly and Sun25].

  2. Do all non-amenable groups admit anti-Pinsker actions? Here, it might be necessary to use Rokhlin rather than sofic entropy.

  3. Given a positive number h and a non-amenable group $\Gamma $ , does there exist an uncountable family of pairwise non-measurably conjugate ergodic pmp actions of $\Gamma $ which are anti-Pinsker, have completely positive sofic entropy and have sofic entropy h (with respect to some fixed sofic approximation)? Starting from the example of the present paper, one place to look might be among its ‘typical’ compact extensions.

11.2 Open problems for the parity-check subshift

Let $(X,m_X,T)$ be the system in Theorem A.

  1. Does there exist a sofic approximation $\Sigma $ to $\Gamma $ with respect to which $(X,m_X,T)$ does not have completely positive sofic entropy?

  2. Is $(X,m_X,T)$ finitely determined? This would mean that if $(\{0,1\}^\Gamma ,\mu ,T)$ is another system for which $\mu $ is close to $m_X$ both in sofic entropy and in the weak $^\ast $ topology, then there is a joining of these two systems under which the identity coordinates agree with high probability. This property characterizes those processes isomorphic to Bernoulli shifts over amenable groups [Reference Ornstein and Weiss58], but little is known about it for non-amenable groups.

  3. Formally, the family of equations that defines our LDPC shift X can be used to define a system of algebraic origin inside $A^\Gamma $ for another compact Abelian group A, such as a finite cyclic group or the continuous circle $\mathbb {R}/\mathbb {Z}$ . For which such A (and which values of d and k) is the resulting system still anti-Pinsker? If there are such examples with $A = \mathbb {R}/\mathbb {Z}$ , do these have infinite sofic entropy?

A A failure of contiguity

Let V be a vertex set of size n divisible by k, let E be a set of size $d n/k$ , and let $\tilde {\mathbb {P}}_n$ be the measure on k-uniform d-regular factor graphs on $(V,E)$ that is constructed in Section 6.3. In addition, let $\mathbb {P}_n^{\mathrm {unif}}$ be the uniform distribution on all such k-uniform d-regular factor graphs on $(V,E)$ , produced without reference to a partition $E = \bigsqcup _{i\in [d]} E_i$ .

Proposition A.1. Let $k> d \ge 3$ , and let

$$\begin{align*}U_n := \left\{ H \subset V \times E \,:\, \exists\, E'\subseteq E \text{ with every } v \in V \text{ adjacent to exactly two elements of } E' \right\}.\end{align*}$$

Then $\tilde {\mathbb {P}}_n(U_n) = 1$ for all n, but $\mathbb {P}_n^{\mathrm {unif}}(U_n)\to 0$ as $n\to \infty $ . As a result, the models $\tilde {\mathbb {P}}_n$ and $\mathbb {P}_n^{\mathrm {unif}}$ are not contiguous.

Note that the definition of $U_n$ ensures that contiguity also fails for the associated multi-hyper-graph models with unlabeled hyper-edges.

Proof. If H arises from $\tilde {\mathbb {P}}_n$ , let $i,j \in [d]$ be distinct and let $E' = E_i \cup E_j$ . Then every vertex is adjacent to exactly one check node in each of $E_i$ and $E_j$ , and hence to exactly two check nodes in $E'$ .

However, the very precise calculations in [Reference Miller and Cohen51] show that, with high probability according to $\mathbb {P}_n^{\mathrm {unif}}$ , the transposed parity-check matrix associated to H has kernel that is either trivial (if d is odd) or one-dimensional (if d is even, in which case, the all 1’s vector is in the kernel). In either case, there can be no $E'$ as promised by the event $U_n$ , since it would give an additional nontrivial element of the kernel.

Acknowledgements

The authors thank Brandon Seward for helpful discussions and an anonymous referee for pointing out several corrections. The authors (ordered alphabetically by last name) were supported by NSF grants DMS-1855694, DMS-2154680 and DMS-1937215, respectively.

Competing interests

The authors have no competing interests to declare.

Footnotes

Tim Austin, supported in part by NSF grant DMS-1855694

Lewis Bowen, supported in part by NSF grant DMS-2154680

Christopher Shriver, supported in part by NSF grant DMS-1937215

1 Indeed, the second author has previously also suggested the term ‘replica-symmetric entropy’ [Reference Bowen14, Subsection 7.3], but we feel ‘annealed’ reflects the general nature of this quantity better.

References

Achlioptas, D and Ricci-Tersenghi, F (2006) On the solution-space geometry of random constraint satisfaction problems. In STOC'06: Proceedings of the 38th Annual ACM Symposium on Theory of Computing. New York: ACM, 130–139.
Airey, D, Bowen, L and Lin, Y (2022) A topological dynamical system with two different positive sofic entropies. Trans. Amer. Math. Soc. Ser. B 9(2), 35–98.
Austin, T (2016) Additivity properties of sofic entropy and measures on model spaces. Forum Math. Sigma 4, 79 pp.
Austin, T (2016) The geometry of model spaces for probability-preserving actions of sofic groups. Anal. Geom. Metr. Spaces 4(1), 160–186.
Austin, T (2018) Measure concentration and the weak Pinsker property. Publ. Math. Inst. Hautes Études Sci. 128, 1–119.
Austin, T (2020) Multi-variate correlation and mixtures of product measures. Kybernetika 56(3), 459–499.
Austin, T and Burton, P (2019) Uniform mixing and completely positive sofic entropy. J. Anal. Math. 138(2), 597–612.
Bethe, HA (1935) Statistical theory of superlattices. Proc. Roy. Soc. London Ser. A 152, 552–575.
Bowen, L (2010) The ergodic theory of free group actions: Entropy and the f-invariant. Groups Geom. Dyn. 4(3), 419–432.
Bowen, L (2010) A measure-conjugacy invariant for free group actions. Ann. of Math. (2) 171(2), 1387–1400.
Bowen, L (2010) Measure conjugacy invariants for actions of countable sofic groups. J. Amer. Math. Soc. 23(1), 217–245.
Bowen, L (2011) Entropy for expansive algebraic actions of residually finite groups. Ergodic Theory Dynam. Systems 31(3), 703–718.
Bowen, L (2012) Every countably infinite group is almost Ornstein. In Dynamical Systems and Group Actions, vol. 567. Contemp. Math. Providence, RI: Amer. Math. Soc., 67–78.
Bowen, L (2018) A brief introduction to sofic entropy theory. In Proceedings of the International Congress of Mathematicians—Rio de Janeiro 2018. Vol. III. Invited Lectures. Hackensack, NJ: World Sci. Publ., 1847–1866.
Bowen, L (2019) Finitary random interlacements and the Gaboriau-Lyons problem. Geom. Funct. Anal. 29(3), 659–689.
Bowen, L (2020) Examples in the entropy theory of countable group actions. Ergodic Theory Dynam. Systems 40(10), 2593–2680.
Bowen, L (2022) Sofic homological invariants and the Weak Pinsker Property. Amer. J. Math. 144(1), 169–226.
Bowen, L and Tucker-Drob, RD (2022) Superrigidity, measure equivalence, and weak Pinsker entropy. Groups Geom. Dyn. 16(1), 247–286.
Burton, PJ and Kechris, AS (2020) Weak containment of measure-preserving group actions. Ergodic Theory Dynam. Systems 40(10), 2681–2733.
Cameron, PJ and van Lint, JH (1991) Designs, Graphs, Codes and Their Links, vol. 22. London Mathematical Society Student Texts. Cambridge: Cambridge University Press.
Capraro, V and Lupini, M (2015) Introduction to Sofic and Hyperlinear Groups and Connes' Embedding Conjecture, vol. 2136. Lecture Notes in Mathematics. Cham: Springer. With an appendix by Vladimir Pestov.
Coja-Oghlan, A and Efthymiou, C (2015) On independent sets in random graphs. Random Structures Algorithms 47(3), 436–486.
Cover, TM and Thomas, JA (2006) Elements of Information Theory, 2nd edn. Hoboken, NJ: Wiley-Interscience [John Wiley & Sons].
Dembo, A, Montanari, A, Sly, A and Sun, N (2014) The replica symmetric solution for Potts models on $d$ -regular graphs. Comm. Math. Phys. 327(2), 551–575.
Ding, J, Sly, A and Sun, N (2016) Maximum independent sets on random regular graphs. Acta Math. 217(2), 263–340.
Domb, C and Green, MS (eds) (1972) Phase Transitions and Critical Phenomena, vol. 2. London-New York: Academic Press.
Elek, G and Szabó, E (2004) Sofic groups and direct finiteness. J. Algebra 280(2), 426–434.
Feller, W (1968) An Introduction to Probability Theory and Its Applications, vol. I, 3rd edn. New York-London-Sydney: John Wiley & Sons, Inc.
Fu, Y and Anderson, PW (1986) Application of statistical mechanics to NP-complete problems in combinatorial optimisation. J. Phys. A 19(9), 1605–1620.
Gallager, RG (1963) Low-Density Parity-Check Codes. PhD thesis, MIT. http://www.inference.org.uk/mackay/gallager/papers/ldpc.pdf.
Gallager, RG (1962) Low-density parity-check codes. IRE Trans. IT-8, 21–28.
Gamarnik, D (2021) The overlap gap property: A topological barrier to optimizing over random structures. Proc. Natl. Acad. Sci. 118(41). Available online at arXiv:2109.14409.
Gamarnik, D and Sudan, M (2017) Limits of local algorithms over sparse random graphs. Ann. Probab. 45(4), 2353–2376.
Greenhill, C, Janson, S, Kim, JH and Wormald, NC (2002) Permutation pseudographs and contiguity. Combin. Probab. Comput. 11, 273–298.
Gromov, M (1999) Endomorphisms of symbolic algebraic varieties. J. Eur. Math. Soc. (JEMS) 1(2), 109–197.
Hayes, B (2016) Fuglede-Kadison determinants and sofic entropy. Geom. Funct. Anal. 26(2), 520–606.
Hayes, B (2017) Mixing and spectral gap relative to Pinsker factors for sofic groups. In Proceedings of the 2014 Maui and 2015 Qinhuangdao Conferences in Honour of Vaughan F. R. Jones' 60th Birthday. Canberra: Australian National University, Centre for Mathematics and Its Applications, 193–221.
Hayes, B (2021) Relative entropy and the Pinsker product formula for sofic groups. Groups Geom. Dyn. 15(2), 413–463.
Kerr, D (2013) Sofic measure entropy via finite partitions. Groups Geom. Dyn. 7(3), 617–632.
Kerr, D and Li, H (2016) Ergodic Theory: Independence and Dichotomies. Springer Monographs in Mathematics. Cham: Springer.
Kieffer, JC (1975) A generalized Shannon-McMillan theorem for the action of an amenable group on a probability space. Ann. Probab. 3(6), 1031–1037.
Kikuchi, R (1951) A theory of cooperative phenomena. Phys. Rev. 81, 988–1003.
Kolmogorov, AN (1958) A new metric invariant of transient dynamical systems and automorphisms in Lebesgue spaces. Dokl. Akad. Nauk SSSR (N.S.) 119, 861–864.
Kolmogorov, AN (1959) Entropy per unit time as a metric invariant of automorphisms. Dokl. Akad. Nauk SSSR 124, 754–755.
Krza̧kała, F, Montanari, A, Ricci-Tersenghi, F, Semerjian, G and Zdeborová, L (2007) Gibbs states and the set of solutions of random constraint satisfaction problems. Proc. Natl. Acad. Sci. USA 104(25), 10318–10323.
Lyons, R (2017) Factors of IID on trees. Combin. Probab. Comput. 26(2), 285–300.
MacKay, DJC (1999) Good error-correcting codes based on very sparse matrices. IEEE Trans. Inform. Theory 45(2), 399–431.
MacKay, DJC (2003) Information Theory, Inference and Learning Algorithms. New York: Cambridge University Press.
Mézard, M and Montanari, A (2009) Information, Physics, and Computation. Oxford: Oxford University Press.
Mézard, M, Parisi, G and Virasoro, MA (1987) Spin Glass Theory and Beyond, vol. 9. World Scientific Lecture Notes in Physics. Teaneck, NJ: World Scientific Publishing Co., Inc.
Miller, G and Cohen, G (2003) The rate of regular LDPC codes. IEEE Trans. Inform. Theory 49(11), 2989–2992.
Ornstein, D (1970) Bernoulli shifts with the same entropy are isomorphic. Adv. Math. 4, 337–352.
Ornstein, D (1970) Two Bernoulli shifts with infinite entropy are isomorphic. Adv. Math. 5, 339–348.
Ornstein, DS (1973) An example of a Kolmogorov automorphism that is not a Bernoulli shift. Adv. Math. 10, 49–62.
Ornstein, DS (1973) A $K$ automorphism with no square root and Pinsker's conjecture. Adv. Math. 10, 89–102.
Ornstein, DS (1973) A mixing transformation for which Pinsker's conjecture fails. Adv. Math. 10, 103–123.
Ornstein, DS and Shields, PC (1973) An uncountable family of $K$ -automorphisms. Adv. Math. 10, 63–88.
Ornstein, DS and Weiss, B (1987) Entropy and isomorphism theorems for actions of amenable groups. J. Analyse Math. 48, 1–141.
Pestov, VG (2008) Hyperlinear and sofic groups: A brief guide. Bull. Symb. Log. 14(4), 449–480.
Popa, S and Sasyk, R (2007) On the cohomology of Bernoulli actions. Ergodic Theory Dynam. Systems 27(1), 241–251.
Richardson, T and Urbanke, R (2008) Modern Coding Theory, 1st edn. Cambridge: Cambridge University Press.
Rudolph, DJ and Weiss, B (2000) Entropy and mixing for amenable group actions. Ann. of Math. (2) 151(3), 1119–1150.
Seward, B (2019) Krieger's finite generator theorem for actions of countable groups I. Invent. Math. 215(1), 265–310.
Seward, B (2019) Krieger's finite generator theorem for actions of countable groups II. J. Mod. Dyn. 15, 1–39.
Seward, B (2020) Positive entropy actions of countable groups factor onto Bernoulli shifts. J. Amer. Math. Soc. 33(1), 57–101.
Seward, B (2022) Bernoulli shifts with bases of equal entropy are isomorphic. J. Mod. Dyn. 18, 345–362.
Sinaĭ, JG (1964) On a weak isomorphism of transformations with invariant measure. Mat. Sb. (N.S.) 63(105), 23–42.
Stepin, AM (1975) Bernoulli shifts on groups. Dokl. Akad. Nauk SSSR 223(2), 300–302.
Thouvenot, JP (1977) On the stability of the weak Pinsker property. Israel J. Math. 27(2), 150–162.
Voiculescu, D (1996) The analogues of entropy and of Fisher's information measure in free probability theory. III. The absence of Cartan subalgebras. Geom. Funct. Anal. 6(1), 172–199.
Wainwright, MJ and Jordan, MI (2008) Graphical models, exponential families, and variational inference. Found. Trends Mach. Learn. 1(1–2), 1–305.
Watanabe, S (1960) Information theoretical analysis of multivariate correlation. IBM Journal of Research and Development 4(1), 66–82.
Weiss, B (2000) Sofic groups and dynamical systems. Sankhyā Ser. A 62(3), 350–359. Ergodic theory and harmonic analysis (Mumbai, 1999).
Weiss, B (2003) Actions of amenable groups. In Topics in Dynamics and Ergodic Theory, vol. 310. London Math. Soc. Lecture Note Ser. Cambridge: Cambridge Univ. Press, 226–262.
Whittaker, ET and Watson, GN (2021) A Course of Modern Analysis—An Introduction to the General Theory of Infinite Processes and of Analytic Functions with an Account of the Principal Transcendental Functions, 5th edn. Cambridge: Cambridge University Press. Edited by VH Moll, with a foreword by SJ Patterson.
Wormald, NC (1999) Models of random regular graphs. In Surveys in Combinatorics, 1999 (Canterbury), vol. 267. London Math. Soc. Lecture Note Ser. Cambridge: Cambridge Univ. Press, 239–298.
Yedidia, J (2000) An idiosyncratic journey beyond mean field theory. Technical Report TR-2000-27, Mitsubishi Electric Res. Lab.
Yedidia, J, Freeman, W and Weiss, Y (2001) Bethe free energy, Kikuchi approximations, and belief propagation algorithms. Technical Report TR-2001-16, Mitsubishi Electric Res. Lab.