Action convergence of operators and graphs

Abstract. We present a new approach to graph limit theory that unifies and generalizes the two most well-developed directions, namely dense graph limits (even the more general $L^p$ limits) and Benjamini–Schramm limits (even in the stronger local-global setting). We illustrate by examples that this new framework provides a rich limit theory with natural limit objects for graphs of intermediate density. Moreover, it provides a limit theory for bounded operators (called P-operators) of the form $L^\infty(\Omega)\to L^1(\Omega)$ for probability spaces $\Omega$. We introduce a metric to compare P-operators (for example, finite matrices) even if they act on different spaces. We prove a compactness result, which implies that, in appropriate norms, limits of uniformly bounded P-operators can again be represented by P-operators. We show that limits of operators representing graphs are self-adjoint, positivity-preserving P-operators called graphops. Graphons, $L^p$ graphons, and graphings (known from graph limit theory) are special examples of graphops. We describe a new point of view on random matrix theory using our operator limit framework.


Introduction
A fundamental question posed in the emerging field of graph limit theory is the following: How can we measure the similarity of graphs? Each branch of graph limit theory is based on a similarity metric [28]. Experience shows that, to be useful in applications, the similarity metric should satisfy a few natural properties.
(1) (Expressive power) The similarity metric should be fine enough to provide a rich enough picture of graph theory.
(2) (Compactness) The similarity metric should be coarse enough to provide many interesting Cauchy convergent graph sequences.
(3) (Limit objects) Limits of Cauchy convergent sequences of graphs should be naturally represented by "graph-like" analytic objects.

The tension between the first and the second requirement makes the search for useful similarity metrics especially interesting. The so-called dense graph limit theory is based on a set of equivalent metrics. One of them is the δ_◻-distance [9, 28, 29]. Convergence in δ_◻ is equivalent to the convergence of subgraph densities. The completion of the set of all graphs in this metric is compact, and thus every graph sequence has a convergent subsequence, which is a very useful property. A shortcoming of dense graph limit theory is that sparse graphs are considered to be similar to the empty graph, and thus it does not have enough expressive power to study graphs in which the number of edges is subquadratic in the number of vertices. Another similarity notion was introduced by Benjamini and Schramm [4] to study bounded degree graphs, which are basically the sparsest graphs. This metric requires an absolute bound on the largest degree, and hence it cannot be used for graphs with a super-linear number of edges.
Graph sequences in which the number of edges is super-linear and subquadratic in the number of vertices are called graph sequences of intermediate density.
Finding useful similarity notions for graphs of intermediate density is a major research direction in graph limit theory. There are many promising nonequivalent approaches to this subject [6-8, 18, 23, 32, 33, 35]. However, none of them provides a real unification of the most well-developed branches: dense graph limit theory (together with its L^p extension [7, 8]), Benjamini–Schramm limit theory (together with the stronger local-global convergence, see, e.g., [11, 21]), and the corresponding limit objects: graphons, L^p graphons, and graphings.
In this paper, we take a new point of view on the subject. Instead of considering graphs as static structures, we focus on the action and dynamics generated by graphs. One can associate various operators with graphs. The most well-known examples are adjacency operators, Laplace operators, and Markov kernel operators (related to random walks). We formulate a general framework of operator convergence and apply it to graph theory through representing operators.
The dynamical aspect is present in many existing limit theories. However, it has not been exploited to unify them. Limit objects, such as graphons and graphings, act on L^2 spaces of probability spaces. (Even the so-called L^p graphons can be viewed as operators of the form L^q(Ω) → L^p(Ω), where Ω is a probability space.) While graphons are compact operators represented by measurable functions of the form W : [0,1]^2 → [0,1], graphings are noncompact and are represented by singular measures on [0,1]^2 concentrated on edge sets of bounded degree Borel graphs [14, 21]. A common property of all of these objects is that they are bounded operators in an appropriate norm and they act on function spaces of random variables. Graphons and graphings are bounded in the usual L^2 operator norm ∥·∥_{2→2}, and L^p graphons are bounded in the ∥·∥_{q→p} norm, where p^{-1} + q^{-1} = 1.
In spite of the fact that the existing convergence notions for graphons and graphings are intuitively similar, the exact connection has not yet been explained from a functional analytic point of view. In this paper, we introduce a general convergence notion for operators acting on functions on probability spaces. We show that graphon convergence, L^p graphon convergence, and local-global convergence of graphings are all special cases of this general convergence notion. Moreover, we obtain a very general framework for graph limit theory by studying the convergence of operator representations of graphs.

Á. Backhausz and B. Szegedy
We also demonstrate that the new limit theory for operators has applications beyond graph theory through a new approach to random matrix theory. An important motivation for this paper comes from a previous result by the authors, which proves Gaussianity for almost eigenvectors of random regular graphs using graph limit techniques (local-global limits) and information theory [2]. It is very natural to ask if similar limit techniques can be used to study dense random matrices, such as matrices with i.i.d. ±1 entries. Available graph limit techniques proved to be too weak for this problem. Dense random matrices (when regarded as weighted graphs) converge to trivial objects in dense graph limit theory. Note that an interesting connection between dense graph limits and random matrices was investigated in [31, 37].
We propose a new limit approach for matrices, graphs, and operators, which is based on the following quite simple and natural probabilistic viewpoint on matrix actions. Let A ∈ R^{n×n} be an arbitrary matrix and let v ∈ R^n be a vector. Let M denote the 2 × n matrix whose rows are v and vA. Each column of M is an element of R^2, and thus, by choosing a random column, we obtain a probability distribution μ_v on R^2 (see Figure 1). The following interesting question arises: How much do we learn about A if we know the set of all probability measures μ_v arising this way?
It is easy to see, for example, that A is the identity matrix if and only if μ_v is supported on the line y = x in R^2 for every v ∈ R^n. The matrix A is degenerate if and only if there is a measure μ_v that is not the Dirac measure δ_{(0,0)} but is supported on the line y = 0.
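For finite matrices, the measures μ_v are empirical distributions and easy to compute. The following sketch (the helper name `mu_v` is ours, introduced only for illustration) checks the two characterizations above: for the identity matrix every μ_v sits on the line y = x, and for an eigenvector v the measure μ_v sits on the line y = λx.

```python
import numpy as np

def mu_v(A, v):
    """Support points of mu_v: a random column of the 2 x n matrix M
    with rows v and vA gives a point of R^2."""
    M = np.vstack([v, v @ A])
    return M.T  # n points in R^2; mu_v is the uniform measure on these rows

# A = identity: every mu_v is supported on the line y = x
n = 5
v = np.array([1.0, -2.0, 0.5, 3.0, -0.25])
pts = mu_v(np.eye(n), v)
assert np.allclose(pts[:, 1], pts[:, 0])

# eigenvector: mu_v is supported on the line y = lambda * x
A = np.array([[2.0, 1.0], [1.0, 2.0]])   # eigenvalues 1 and 3
v = np.array([1.0, 1.0])                 # eigenvector with eigenvalue 3
pts = mu_v(A, v)
assert np.allclose(pts[:, 1], 3.0 * pts[:, 0])
```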
Philosophically, we regard each measure μ_v as an observation associated with the action of A, and we regard the set of all possible observations {μ_v : v ∈ R^n} as the profile of A. A useful fact about profiles is that they allow us to compare matrices of different sizes, since they are sets of probability measures on R^2, independently of the sizes of the matrices. Another nice fact is that the profile of A contains rather detailed information about the eigenvalues of A and the entry distributions of the corresponding eigenvectors. It is easy to see that v is an eigenvector with eigenvalue λ if and only if the measure μ_v is supported on the line y = λx in R^2. The entry distribution of v is simply the distribution of the x coordinates in μ_v.
It is useful to extend this idea to the case when k vectors v_1, v_2, . . ., v_k are considered simultaneously. (For some technical reasons, we will assume that v_1, v_2, . . ., v_k are in [−1, 1]^n.) In this case, M is the 2k × n matrix with rows {v_i}_{i=1}^k and {v_i A}_{i=1}^k. A random column in M yields a probability distribution on R^{2k}, and the k-profile S_k(A) of A is the set of all such probability measures. We regard A and B as similar if for small natural numbers k their k-profiles are close in the Hausdorff metric d_H defined for sets of probability measures on R^{2k}, based on the Lévy–Prokhorov metric for individual measures (the precise definition will be given in Section 2). This similarity can be metrized by the formula

d_M(A, B) := Σ_{k=1}^∞ 2^{-k} d_H(S_k(A), S_k(B)).

A sequence of matrices converges in this metric if for every fixed k, their k-profiles converge in d_H. The above ideas generalize naturally to the framework where (Ω, A, μ) is a probability space and A is an operator of the form A : L^∞(Ω) → L^1(Ω). Such operators with an appropriate boundedness condition will be called P-operators (Definition 2.1). If v ∈ L^∞(Ω), then both v and vA are random variables, and their joint distribution is a measure μ_v on R^2. This allows us to define k-profiles, the metric, and convergence for P-operators similarly as we defined them for matrices. Note that matrices are special P-operators, where the probability space is [n] := {1, 2, . . ., n} with the uniform distribution. In this case L^∞([n]) = L^1([n]) = R^{[n]}, and every matrix is a P-operator. Note that both graphons (symmetric measurable functions of the form W : [0,1]^2 → [0,1]) and graphings (certain bounded degree Borel graphs on measure spaces) are special P-operators. We prove the next surprising result.
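The comparison of k-profiles can be sketched numerically. The code below is a crude Monte Carlo proxy, not the actual d_H over d_LP: the helper names `sample_profile` and `proxy_distance` are ours, and each sampled measure is summarized only by its first two moments. Even this rough stand-in already allows comparing matrices of different sizes.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_profile(A, k=2, trials=200):
    """Monte Carlo sketch of the k-profile S_k(A): for random vectors
    v_1,...,v_k in [-1,1]^n, record the first two moments of the
    2k-dimensional empirical distribution D(v_1,...,v_k, v_1 A,...,v_k A)."""
    n = A.shape[0]
    feats = []
    for _ in range(trials):
        V = rng.uniform(-1, 1, size=(k, n))
        M = np.vstack([V, V @ A])   # 2k x n; columns are points of R^{2k}
        feats.append(np.concatenate([M.mean(axis=1), (M ** 2).mean(axis=1)]))
    return np.array(feats)

def proxy_distance(A, B, k=2):
    """Hausdorff distance between the two sampled moment clouds, a
    stand-in for d_H over the Levy-Prokhorov metric."""
    X, Y = sample_profile(A, k), sample_profile(B, k)
    d = np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=2)
    return max(d.min(axis=1).max(), d.min(axis=0).max())

n = 100
d_same = proxy_distance(np.eye(n), np.eye(n))         # identity vs identity
d_diff = proxy_distance(np.eye(n), np.zeros((n, n)))  # identity vs the zero matrix
assert d_same < d_diff
```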
Theorem 1.1 P-operator convergence (given by Definition 2.5) restricted to the set of graphons is the same as graphon convergence. Furthermore, P-operator convergence restricted to the set of graphings is equivalent to the local-global convergence of graphings.
The proof of the above theorem relies on a recent result of the second author, which reformulates local-global convergence in terms of the colored star metric [36]. Our main theorem (in an informal language) is the following.

Theorem 1.2 (Compactness and limit object) Every sequence of P-operators with uniformly bounded ∥·∥_{∞→1} norms has a Cauchy convergent subsequence with respect to d_M. Furthermore, if p, q ∈ [1, ∞), then every Cauchy convergent sequence of ∥·∥_{p→q} uniformly bounded P-operators has a limit, which is also a P-operator, and the same bound applies for its norm.
A particularly nice property of graphops is that they can be represented by symmetric finite measures ν on Ω^2 with absolutely continuous marginals (see Theorem 6.3).
Intuitively, the measure ν plays the role of the "edge set" of the graphop A. When scaled to a probability measure, ν can be used to sample a random element of Ω × Ω, which is the analogue of a random directed edge in a finite graph. By disintegrating ν, we obtain measures ν_x for every x ∈ Ω, describing "neighborhoods" in A.
Adjacency matrices of graphs (or positively weighted graphs), graphons, L^p graphons, and graphings are all examples of graphops. A concrete example of a graphop (called the spherical graphop), which belongs to none of the previous classes, is explained in Figure 4.

Remark 1.3 Action convergence vs. shape (quotient) convergence:
The representation of graph limits by measures on Ω^2 (or, more specifically, on [0,1]^2) is not new. It was proved in [26] that limits of shape convergent graph sequences can be represented by such measures. Roughly speaking, a graph sequence is shape convergent if the sets of possible bounded quotients of the graphs converge in the Hausdorff metric. Although shape convergence can be used to study intermediate density graph sequences, in general, it has less expressive power than action convergence. For many interesting graph sequences, action convergence is strictly finer than shape convergence. For example, shape convergence does not capture Benjamini–Schramm limits, whereas action convergence captures the even finer local-global limits. Another difference is that action convergence works for more general operators (for example, matrices with negative entries), but quotient convergence has not been extended to this case due to various difficulties.

Remark 1.4 Graphops have "edge densities" and "degrees."
The expected value of 1_Ω A is the edge density of A. The value of 1_Ω A at a point x ∈ Ω is the "degree" of x. The distribution of the random variable 1_Ω A is the "degree distribution" of A.
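For a finite graph on the uniform probability space [n], these notions reduce to familiar quantities: 1A is the degree function, and its expected value is the average degree (it plays the role of the edge density after the relevant normalization). A minimal sketch:

```python
import numpy as np

# adjacency matrix of a small graph, viewed as a P-operator on ([4], uniform)
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

one = np.ones(4)       # the constant-1 function on Omega = [4]
deg = one @ A          # the "degree" function 1A: (1A)(x) = deg(x)
density = deg.mean()   # expected value of 1A: here the average degree 2|E|/n

assert np.allclose(deg, [2, 2, 3, 1])
assert density == 2.0  # the graph has |E| = 4 edges on 4 vertices
```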
Remark 1.5 In dense graph limit theory, there is a rather natural answer to the question of when two graphons are isomorphic, i.e., when their δ_◻ distance is 0 [10]. Similar statements would be useful for sparse graph limits. Unfortunately, it is well known that in the case of Benjamini–Schramm and local-global convergence, graphing isomorphism is much more complicated and no natural description is known. This problem is also inherited by action convergence. However, on the positive side, there are several natural invariants, such as the eigenvalue gap or the "degree distribution", that are isomorphism invariants, as the following remark shows.

Remark 1.6
It is a known phenomenon in dense graph limit theory that certain graphon parameters are not continuous with respect to the topology but satisfy lower semicontinuity. An example of this is the entropy ∫ −W log W [27]. A similar phenomenon occurs in the theory of local-global limits. While the second largest eigenvalue is continuous in dense graph limit theory, it is only lower semicontinuous in the theory of local-global convergence; in particular, the spectral gap of the limit object is greater than or equal to the limit of the spectral gaps of the graphs. This can be seen by making the following argument more precise. First, after an appropriate truncation, every eigenvector of the limit object can be approximated with almost eigenvectors on the finite graphs, and hence we can also find proper eigenvectors for an eigenvalue close to the original eigenvalue of the limit object. On the other hand, by considering the disjoint union of a 3-regular random graph and a 3-regular graph without spectral gap, and looking at the eigenvectors of the first part extended with zeros, we can decrease the spectral gap of the limit with this kind of perturbation of the graph. Hence the spectral gap cannot be continuous with respect to action convergence. Other graph parameters happen to be continuous, for example, the size of the maximal cut normalized by the number of edges, which can be read off from the profile of the graph (Figure 2).

Adjacency operator convergence:
We obtain a general graph convergence notion by considering the convergence of appropriately normalized adjacency matrices of graphs. For a graph G, let A(G) denote the adjacency matrix of G. It turns out that for a bounded degree sequence of graphs {G_i}_{i=1}^∞, the P-operator convergence of the sequence {A(G_i)}_{i=1}^∞ is equivalent to local-global convergence (and thus it implies Benjamini–Schramm convergence). On the other hand, for a general graph sequence, the P-operator convergence of A(G_i)/|V(G_i)| is equivalent to dense graph convergence. For graph sequences of intermediate growth, we normalize each operator A(G_i) by a constant depending on G_i to obtain a nontrivial convergence notion and limit object. A natural choice is the spectral radius given by ∥A(G_i)∥_{2→2} or, more generally, norms of the form ∥A(G_i)∥_{p→q}.
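As a concrete instance of the dense normalization A(G)/|V(G)|, the action of a complete graph is already close to that of its limit, the constant 1 graphon, which acts on a function by taking its mean. A minimal numerical sketch (our own illustration, not a computation from the paper):

```python
import numpy as np

n = 200
A = np.ones((n, n)) - np.eye(n)             # adjacency matrix of K_n
v = np.random.default_rng(1).uniform(-1, 1, n)

w = v @ (A / n)                             # action of the normalized operator A(K_n)/n
# the constant-1 graphon acts by (vW)(x) = integral of v = mean(v);
# here w_j = mean(v) - v_j / n, so the error is at most 1/n
assert np.max(np.abs(w - v.mean())) < 0.01
```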
The convergence of normalized adjacency matrices leads to a rich limit theory for graphs of intermediate density. To demonstrate this, we give various examples of convergent sequences and limit objects. We calculate the limit object of hypercube graphs. The hypercube graph Q_n is the graph on {0,1}^n in which two vectors are connected if they differ in exactly one coordinate. These graphs are very sparse and they are of intermediate density. The graph Q_n is vertex-transitive and can be represented as a Cayley graph of the elementary abelian group Z_2^n with respect to a minimal generating system. Quite surprisingly, the limiting P-operator turns out to be a Cayley graph of the compact group Z_2^∞ with respect to a carefully chosen topological generating system. This illustrates that our limit objects give natural representations of convergent sequences. We calculate, similarly, natural representations for other convergent graph sequences, such as increasing powers of regular graphs and incidence graphs of projective planes.

Random walk metric and convergence: A possible limitation for the use of adjacency operator convergence is that it may trivialize if the degree distribution is very uneven in a graph sequence. The simplest examples are stars and subdivisions of complete graphs. In the star graph S_n, there is one vertex of degree n and n vertices of degree 1. When normalized in any reasonable way, these graphs converge to the 0 operator. The property that a graph has a very uneven degree distribution is related to the property that a random walk on the graph spends a positive proportion of the time in a negligible fraction of the vertex set. A natural way to counterbalance this problem is to use Markov kernels of random walks instead of adjacency operators. (Such a modified limit was first used by Benjamini and Curien in the case of bounded degree graphs [3].) The P-operator language shows a nice advantage, in this case, over the plain matrix language (Figure 3). Even for finite graphs G, the corresponding Markov kernel M(G) is not just a matrix. The underlying probability space on V(G) is modified from the uniform distribution to the stationary distribution ν_G of the random walk. Note that M(G) is a P-operator on the probability space (V(G), ν_G). The random walk metric d_RW on finite nonempty graphs is given by

d_RW(G_1, G_2) := d_M(M(G_1), M(G_2)).

The completion of the set of finite, nonempty graphs in d_RW is a compact space G_RW. Elements of G_RW can be represented by Markov graphops, i.e., positivity-preserving, self-adjoint, 1-regular P-operators. Note that if A is a Markov graphop, then ∥A∥_{2→2} = 1. We will see that Markov graphops can also be represented by symmetric self-couplings of probability spaces (Ω, A, μ). A symmetric self-coupling is a probability measure ν on (Ω × Ω, A ⊗ A) such that ν is symmetric with respect to interchanging the coordinates and both marginals of ν are equal to μ. A very pleasant property of the set of all Markov graphops is that it is compact in the metric d_M (see Theorem 3.3), and thus we do not need any extra conditions to guarantee convergent subsequences.
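In matrix terms, the Markov kernel is the degree-normalized adjacency matrix, viewed over the stationary distribution ν_G. A small sketch for the star S_4 (our own illustration) checking stationarity and self-adjointness, i.e., reversibility, with respect to ν_G:

```python
import numpy as np

# star graph S_4: center 0 joined to the leaves 1..4
n = 5
A = np.zeros((n, n))
A[0, 1:] = 1.0
A[1:, 0] = 1.0

deg = A.sum(axis=1)
P = A / deg[:, None]          # Markov kernel of the random walk on the star
nu = deg / deg.sum()          # stationary distribution nu_G (proportional to degrees)

assert np.allclose(nu @ P, nu)    # stationarity of nu_G
F = nu[:, None] * P               # F_{ij} = nu_i P_{ij}
assert np.allclose(F, F.T)        # reversibility: self-adjointness w.r.t. nu_G
```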
We will show in the examples section (Section 12) that stars and subdivisions of complete graphs converge to natural and nontrivial limit objects under random walk convergence. Note that random walk convergence coincides with normalized adjacency operator convergence for regular graphs (graphs in which every degree is the same).
As we mentioned before, random walk convergence is very convenient. Every graph sequence {G_i}_{i=1}^∞ of nonempty graphs has a convergent subsequence in d_RW, and the limit object is usually an interesting structured object, independently of the sparsity of the sequence. The most trivial limit object one can get is the quasi-random Markov graphop, which can be represented by the constant 1 graphon W(x, y) := 1. This occurs, for example, if the second largest eigenvalue in absolute value of M(G_i) is o(1).

Extended random walk convergence: Finally, we describe a general convergence notion that combines the advantages of adjacency operator convergence and random walk convergence. A feature of random walk convergence is that some information may be lost in the limit regarding degree distributions. It turns out that there is a rather natural way to solve this problem, using a mild extension of random walk convergence based on a simultaneous version of action convergence. The principle of action convergence allows us to introduce the convergence of pairs (A, f), where A : L^∞(Ω) → L^1(Ω) is a P-operator and f is a measurable function on Ω. Roughly speaking, this goes by considering f as a reference function that is automatically included as the last function in every function system used in the definition of the k-profile of A. More precisely, we define S_k(A, f) as the set of all possible joint distributions of the random variables v_1, . . ., v_k, v_1A, . . ., v_kA, f. We can use the extra function to store information on the degrees of vertices in G. For a graph G, let d*_G denote a function on V(G) that is an appropriately normalized version of the degree function d_G. We can represent G by the pair (M(G), d*_G) (recall equation (1.1)). In the limit, we obtain a similar pair (A, f), where A is a Markov graphop. The non-negative function f^{-1} (which may also take the value ∞) can be used to "re-scale" the probability measure on Ω to a possibly infinite measure.
Pairs of the form (A, f) can also be used to represent generalized graphons of the form W : R_+ × R_+ → [0,1], where W is symmetric and ∥W∥_1 < ∞. This construction will be described in Section 5. Note that these generalized graphons arise in the recently emerging theory of graphexes [6, 23].
Sparse random graphs: An important application of graph limit theory is the study of random graph models. With a useful limit theory, we can view large random graphs as approximations of a single idealistic infinite object. For example, in the framework of dense graph limit theory, growing Erdős–Rényi random graphs with edge probability 1/2 converge to the constant 1/2 graphon. Note that the existence of a single limit object requires a concentration of measure type result for the random model in the given graph metric. These concentration results are interesting even if an actual representation of the limit object is unknown. A good example of this is the fact that, for fixed d, random d-regular graphs on n points are concentrated in the local-global topology if n is large [1]. It is a far-reaching open problem, however, whether they concentrate around the same point for growing n. The case of preferential attachment trees is much better understood. They converge in the Benjamini–Schramm metric [5, 34]; moreover, they are trees and thus hyperfinite graphs. It follows that they also converge in the local-global topology. Action convergence for bounded degree graphs is equivalent to local-global convergence by Theorem 9.2, and thus our theory includes all these facts. The next natural question is what happens for sparse but not bounded degree (intermediate density) graphs. It turns out that intermediate density Erdős–Rényi graphs (normalized by the average degree) all converge to the constant 1 graphon, as Remark 2.15 in Section 2 shows. In this sense, Erdős–Rényi graphs behave similarly to the dense setting.

Applications to random matrix theory: As we mentioned earlier, the notion of P-operator convergence was partially motivated by efforts to find a fine enough convergence notion such that random matrices converge to a structured, nontrivial object. The study of this limiting object can help in describing approximate properties of random matrices, such as entry distributions of eigenvectors and almost eigenvectors. In dense graph limit theory, random matrices with i.i.d. ±1 entries converge, but the limit object is the constant 0 function. (In a refinement of dense limit theory [25], the limit object is the constant function on [0,1]^2 whose value is the uniform probability measure on {1, −1}.) Our main observation about random matrices is that k-profiles of appropriately normalized random matrices are nontrivial, rich objects, and their study brings a new point of view to random matrix theory. Let G_n denote a random matrix whose entries are i.i.d. zero-mean ±1/√n-valued random variables. The normalizing factor √n is needed to obtain a bounded spectral radius. With probability close to 1, we have that ∥G_n∥_{2→2} is close to 2; see, e.g., [19]. In this paper, we do groundwork on the limiting properties of G_n with respect to action convergence. We prove a concentration of measure type statement for G_n with respect to the metric d_M. This means that for large n, the matrix-valued random variable G_n is well concentrated in the metric space of P-operators equipped with the distance d_M. This concentration result, together with our compactness results, implies that for certain good sequences {n_i}_{i=1}^∞ of natural numbers, {G_{n_i}}_{i=1}^∞ is convergent with probability one, and the limit object is represented by some P-operator L acting on L^2([0,1]) with bounded ∥·∥_{2→2} norm. In this paper, we leave open the question whether the sequence of all natural numbers is a good sequence. Note that a similar open problem is known for random regular graphs using local-global convergence [1], and it is known that a positive answer would imply the convergence of a great number of interesting graph parameters. For our application, it will be enough that any sequence of natural numbers contains a good subsequence. Our general results in this paper prepare a follow-up paper, which focuses on the limiting properties of random matrices with a special emphasis on eigenvectors and almost eigenvectors.
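The normalization can be probed numerically: the largest singular value of an i.i.d. ±1/√n matrix is close to 2 for moderately large n. This is our own illustrative check, not part of the paper's proofs:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 800
G = rng.choice([-1.0, 1.0], size=(n, n)) / np.sqrt(n)  # i.i.d. +-1/sqrt(n) entries

spec = np.linalg.norm(G, 2)   # ||G_n||_{2->2}: the largest singular value
assert abs(spec - 2.0) < 0.2  # close to 2, as cited from [19] in the text
```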

Limits of matrices and operators
For k vectors {v_i ∈ R^n}_{i=1}^k, we define their joint empirical entry distribution, denoted by D(v_1, v_2, . . ., v_k), as the probability measure on R^k given by

D(v_1, v_2, . . ., v_k) := (1/n) Σ_{j=1}^n δ_{(v_{1,j}, v_{2,j}, . . ., v_{k,j})},

where v_{i,j} denotes the jth component of v_i and δ_x denotes the Dirac measure at x ∈ R^k. A natural view on empirical entry distributions is the following. Consider [n] := {1, 2, . . ., n} as a probability space with the uniform distribution μ_{[n]} and vectors in R^n as functions of the form v : [n] → R. From this viewpoint, vectors are random variables and matrices in R^{n×n} are operators acting on the space of random variables on the probability space ([n], μ_{[n]}). The joint empirical entry distribution D(v_1, v_2, . . ., v_k) is simply the joint distribution of the vectors v_1, v_2, . . ., v_k viewed as random variables.
Let (Ω, A, μ) (or shortly Ω) be a probability space, and assume that v_1, v_2, . . ., v_k are R-valued measurable functions on Ω. We denote by D(v_1, v_2, . . ., v_k) the joint distribution of v_1, v_2, . . ., v_k. In other words, it is the push-forward of the measure μ under the map x ↦ (v_1(x), v_2(x), . . ., v_k(x)), which is a Borel measure on R^k.

Definition 2.1 A P-operator is a linear operator of the form A : L^∞(Ω) → L^1(Ω) such that

∥A∥_{∞→1} := sup_{v ∈ L^∞(Ω), ∥v∥_∞ ≤ 1} ∥vA∥_1

is finite. We denote by B(Ω) the set of all P-operators on Ω.
Note that B([n]) (with the uniform distribution on [n]) is the set of all n × n matrices. Thus, every matrix A ∈ R^{n×n} is a P-operator.
For a set S ⊆ R, we denote by L^∞_S(Ω) the set of bounded measurable functions on Ω whose values are in S.

Definition 2.2 (k-profile of P-operators) For a P-operator A ∈ B(Ω), we define the k-profile of A, denoted by S_k(A), as the set of all possible probability measures of the form

D(v_1, v_2, . . ., v_k, v_1A, v_2A, . . ., v_kA),   (2.2)

where v_1, v_2, . . ., v_k run through all possible k-tuples of functions in L^∞_{[−1,1]}(Ω). For joint distributions of the form (2.2), we will often use the shorthand notation D_A(v_1, v_2, . . ., v_k). Let P(R^k) denote the set of Borel probability measures on R^k for k ∈ N.
Definition 2.3 (Lévy–Prokhorov metric) For μ, ν ∈ P(R^k), let

d_LP(μ, ν) := inf{ε > 0 : μ(U) ≤ ν(U^ε) + ε and ν(U) ≤ μ(U^ε) + ε for every U ∈ B_k},

where B_k is the Borel σ-algebra on R^k and U^ε is the set of points that have distance smaller than ε from U.

Definition 2.4 (Hausdorff metric) We measure the distance of subsets X, Y ⊆ P(R^k) using the Hausdorff metric d_H given by

d_H(X, Y) := max{sup_{x∈X} inf_{y∈Y} d_LP(x, y), sup_{y∈Y} inf_{x∈X} d_LP(x, y)}.

We have d_H(X, Y) = d_H(cl(X), cl(Y)), where cl is the closure in d_LP. It follows that d_H is a pseudometric for all subsets in P(R^k) and it is a metric for closed sets.

Definition 2.5 (Action convergence of P-operators) We say that a sequence of P-operators {A_i}_{i=1}^∞ is action convergent if for every k ∈ N the sequence {S_k(A_i)}_{i=1}^∞ is a Cauchy sequence in d_H.

We will often use the following consequence of the definition: for an action convergent sequence of operators {A_i}_{i=1}^∞ and every k ∈ N, the k-profiles S_k(A_i) converge in d_H to a closed set of probability measures.

Remark 2.3 The completeness of (P(R^k), d_LP) implies that the induced Hausdorff topology is also complete [20]. Therefore, a sequence {A_i}_{i=1}^∞ in the above definition is convergent if and only if for every k ∈ N there is a closed set X_k such that lim_{i→∞} d_H(S_k(A_i), X_k) = 0.

Definition 2.6 (Metrization of action convergence) For two P-operators A, B, let

d_M(A, B) := Σ_{k=1}^∞ 2^{-k} d_H(S_k(A), S_k(B)).

For c > 0, let P_c(R^k) denote the set of measures in P(R^k) in which the absolute values of the coordinates have first moments at most c. Furthermore, let Q_c(R^k) denote the set of closed sets in the metric space (P_c(R^k), d_LP). The following lemma is an easy consequence of classical results.

Lemma 2.4 The metric spaces (P_c(R^k), d_LP) and (Q_c(R^k), d_H) are both compact and complete metric spaces.

Proof Markov's inequality gives uniform tightness in P_c(R^k), which implies the compactness of (P_c(R^k), d_LP). It is known that the Hausdorff distance on the closed subsets of a compact space is again compact (see, e.g., [20] or "Hausdorff metric" in [22]). ∎

Lemma 2.5 Let A ∈ B(Ω) and let c := max{∥A∥_{∞→1}, 1}. Then for every k ∈ N, we have that S_k(A) ∈ Q_c(R^{2k}).

Proof Let {v_i}_{i=1}^k be a system of vectors in L^∞_{[−1,1]}(Ω). We have that ∥v_i∥_1 ≤ ∥v_i∥_∞ ≤ 1 and ∥v_iA∥_1 ≤ ∥A∥_{∞→1} for 1 ≤ i ≤ k. Since the first moments of the absolute values of the coordinates in (2.2) are given by {∥v_i∥_1}_{i=1}^k and {∥v_iA∥_1}_{i=1}^k, it follows that S_k(A) ⊆ P_c(R^{2k}). Further using that S_k(A) is closed, the proof is complete. ∎

Lemma 2.4 and Lemma 2.5 have the following corollary.
Lemma 2.6 (Sequential compactness) Let {A_i}_{i=1}^∞ be a sequence of P-operators with uniformly bounded ∥·∥_{∞→1} norms. Then it has an action convergent subsequence.
Proof By the previous lemmas, for every k, we can extract a subsequence for which {S_k(A_i)} is convergent. By a diagonalization argument, we can also find a subsequence for which {S_k(A_i)} is convergent for every k at the same time, and this implies action convergence. ∎

For a real number p ∈ [1, ∞) and a measurable function v : Ω → R, we have that

∥v∥_p := (∫_Ω |v|^p dμ)^{1/p}.

If p = ∞, then ∥v∥_∞ denotes the essential supremum of |v|. It is well known that for p ≤ q, we have ∥v∥_p ≤ ∥v∥_q for any measurable function v on Ω. Let p, q ∈ [1, ∞] and let A : L^∞(Ω) → L^1(Ω) be a linear operator. The operator norm ∥A∥_{p→q} is defined by

∥A∥_{p→q} := sup_{0 ≠ v ∈ L^∞(Ω)} ∥vA∥_q / ∥v∥_p.
We say that A is (p, q)-bounded if ∥A∥_{p→q} is finite. We have that if p′, q′ ∈ [1, ∞] satisfy p′ ≥ p and q′ ≤ q, then ∥A∥_{p′→q′} ≤ ∥A∥_{p→q}. We denote by B_{p,q}(Ω) the set of (p, q)-bounded linear operators from L^∞(Ω) to L^1(Ω).

Remark 2.7 (L^2 theory) If a P-operator A satisfies ∥A∥_{p→q} < ∞, then A extends uniquely to an operator of the form L^p(Ω) → L^q(Ω). In this sense, by slightly abusing the notation, we can identify the set B_{p,q}(Ω) with the set of operators L^p(Ω) → L^q(Ω) with bounded ∥·∥_{p→q} norm. An especially nice class of P-operators is the set B_{2,2}(Ω). If Ω is fixed, then these operators are closed with respect to composition and they form a so-called von Neumann algebra.
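The monotonicity of these norms can be probed numerically (our own sketch, not from the paper). On the probability space ([n], uniform), the 1/√n normalizations in ∥v∥_2 and ∥vA∥_2 cancel, so ∥·∥_{2→2} coincides with the usual spectral norm, while a random search over ±1 vectors lower-bounds ∥A∥_{∞→1}:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
A = rng.standard_normal((n, n))

spec = np.linalg.norm(A, 2)   # ||A||_{2->2}: the normalizations cancel

# lower bound for ||A||_{inf->1} = sup over ||v||_inf <= 1 of ||vA||_1,
# where ||w||_1 = mean(|w_i|) on the uniform probability space [n]
best = max(np.abs(rng.choice([-1.0, 1.0], size=n) @ A).mean() for _ in range(500))

assert best <= spec + 1e-9    # instance of ||A||_{inf->1} <= ||A||_{2->2}
```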

Remark 2.8 (L^p graphons as P-operators) Let p ∈ (1, ∞] and let W : [0,1]^2 → R be a symmetric measurable function with ∥W∥_p < ∞. Let q := p/(p − 1). We can associate with W the operator A_W given by

(vA_W)(x) := ∫_0^1 W(x, y)v(y) dy.

It is easy to see that ∥A_W∥_{q→p} < ∞, and thus A_W ∈ B_{q,p} is a P-operator representing the so-called L^p-graphon W. For a theory of L^p-graphon convergence, see [7]. It follows from our theory that for sequences of L^p-graphons {W_i}_{i=1}^∞ with uniformly bounded L^p-norms, action convergence of the representing operators A_{W_i} is equivalent to L^p-graphon convergence (see Section 8, in particular, Theorem 8.2).
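A Riemann-sum discretization makes the action of A_W concrete. The graphon W(x, y) = xy below is our own illustrative choice; acting on the constant-1 function should give ∫_0^1 xy dy = x/2:

```python
import numpy as np

n = 1000
x = (np.arange(n) + 0.5) / n   # midpoints of an equipartition of [0, 1]
W = np.outer(x, x)             # W(x, y) = x*y, a bounded symmetric graphon

def act(v):
    """Discretized action (v A_W)(x) ~ integral of W(x, y) v(y) dy."""
    return (W @ v) / n

v = np.ones(n)
assert np.allclose(act(v), x / 2)   # integral of x*y dy equals x/2
```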
The following theorem is one of the main results of this paper. Its proof can be found in Section 4.
Theorem 2.9 (Existence of limit object) Let p, q ∈ [1, ∞) and let {A_i}_{i=1}^∞ be an action convergent sequence of P-operators with uniformly bounded ∥·∥_{p→q} norms. Then there is a P-operator A such that lim_{i→∞} d_M(A_i, A) = 0 and ∥A∥_{p→q} ≤ sup_i ∥A_i∥_{p→q}.

Definition 2.7 (Weak equivalence and weak containment) Let A and B be two P-operators. We say that A and B are weakly equivalent if d_M(A, B) = 0. We have that A and B are weakly equivalent if and only if cl(S_k(A)) = cl(S_k(B)) holds for every k ∈ N. We say that A is weakly contained in B if cl(S_k(A)) ⊆ cl(S_k(B)) holds for every k ∈ N. It is easy to see that norms of the form ∥·∥_{p→q} are invariant with respect to weak equivalence (these norms can be read off from the 1-profiles of P-operators). Let X denote the set of weak equivalence classes of P-operators, and let X′ ⊂ X denote the set of equivalence classes of P-operators defined on atomless probability spaces. The next theorem follows from Lemma 2.6, Theorem 2.9, and Lemma 3.1.

Theorem 2.10 (Compactness) For every c > 0, p ∈ [1, ∞), and q ∈ (1, ∞], the set of weak equivalence classes of P-operators A with ∥A∥_{p→q} ≤ c is a compact metric space with the metric d_M.
Lemma 2.12 Let k ∈ N and let A, B be P-operators both in B(Ω) for some probability space (Ω, A, μ). Then the Hausdorff distance of the k-profiles S_k(A) and S_k(B) is bounded in terms of ∥A − B∥_{2→2}. By switching the roles of A and B and repeating the same argument, we get the above inequality with A and B switched. This implies the statement of the lemma. ∎

Lemma 2.13 (Norm distance vs. d_M distance) Assume that A, B are P-operators acting on the same space L^∞(Ω). Then d_M(A, B) is bounded from above in terms of ∥A − B∥_{2→2}.

Proof Using Lemma 2.12, we obtain the required bound; the last inequality is clear from the argument after Lemma 2.6. ∎

Let A ∈ B(Ω, A, μ) be a P-operator. We define the bilinear form (f, g)_A ∶= ∫_Ω f ⋅ (gA) dμ for f, g ∈ L^∞(Ω); since f ∈ L^∞(Ω) and gA ∈ L^1(Ω), the value (f, g)_A is finite. In general, if ∥A∥_{p→q} < ∞ holds for a conjugate pair with 1/p + 1/q = 1, then we have |(f, g)_A| ≤ ∥A∥_{p→q} ∥f∥_p ∥g∥_p, so the form extends continuously. We define the cut norm of A by ∥A∥_◻ ∶= sup_{S,T∈A} |(1_S, 1_T)_A|. It is well known (see equation (3.6) in [9] or Lemma 8.11 in [27]) that ∥A∥_◻ ≤ ∥A∥_{∞→1} ≤ 4∥A∥_◻, which means that ∥.∥_◻ is equivalent to the norm ∥.∥_{∞→1}.

Let ψ ∶ Ω → Ω be an invertible measure-preserving transformation with measure-preserving inverse. The transformation ψ induces a natural, linear action on L^∞(Ω), also denoted by ψ, defined by (fψ)(x) ∶= f(ψ(x)). Using this action, one defines a distance in which ψ, ϕ run through all invertible measure-preserving transformations of Ω. The proof of the next lemma follows from Lemma 2.13 and inequality (2.7).

Example 2.15 Let G_n denote the Erdős–Rényi random graph with average degree f(n) such that lim_{n→∞} f(n) = ∞. Let A_n be the adjacency operator of G_n and let B_n be the n × n matrix with every entry equal to 1/n. Theorems 6.2 and 6.3 of [16] imply that ∥A_n/f(n) − B_n∥_{2→2} = o(1) with probability close to 1. Recalling that the 2→2 norm is related to the spectrum, it follows from Lemma 2.13 that if n is large enough, then d_M(A_n/f(n), B_n) = o(1); note that B_n represents the constant 1 graphon W. It follows that the scaled random graphs G_n converge in probability to W.
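The Erdős–Rényi example can be checked numerically. The following sketch (an illustration, not part of the paper; it identifies ∥.∥_{2→2} with the spectral norm, as in the finite examples above) compares the scaled adjacency matrix with the matrix B_n representing the constant 1 graphon:

```python
import numpy as np

def er_adjacency(n, p, rng):
    """Adjacency matrix of an Erdos-Renyi random graph G(n, p), no loops."""
    upper = np.triu(rng.random((n, n)) < p, k=1).astype(float)
    return upper + upper.T

n, p = 400, 0.2
rng = np.random.default_rng(42)
A = er_adjacency(n, p, rng)
f = A.sum() / n                  # empirical average degree f(n)
B = np.ones((n, n)) / n          # the matrix representing the constant 1 graphon

# spectral distance of the scaled adjacency operator from the constant graphon;
# concentration of random-matrix spectra makes this small for large average degree
dist = np.linalg.norm(A / f - B, 2)
```

With this seed `dist` comes out small (roughly of order 2/√f(n)), illustrating the convergence claimed in the example.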

P-operators with special properties
The goal of this chapter is to show that various fundamental properties behave well with respect to P-operator convergence.

Lemma 3.1 Atomless P-operators are closed with respect to d M .
Proof Let A ∈ B(Ω) be an atomless P-operator and let B ∈ B(Ω_2) be a P-operator with d_M(B, A) ≤ ε. Since Ω is atomless, there is v ∈ L^∞_{[−1,1]}(Ω) with uniform distribution; let w ∈ L^∞_{[−1,1]}(Ω_2) be such that α ∈ S_1(A) and β ∈ S_1(B) are close in d_LP, where α_1 = D(v) and β_1 = D(w) are the marginals of α and β on the first coordinate. It follows that β_1 is at most 3ε far from the uniform distribution in d_LP, and thus the largest atom in β_1 is at most 10ε. Hence the largest atom in Ω_2 has weight at most 10ε = 10 d_M(B, A). We obtained that if B is the limit of atomless operators, then B is atomless. ∎

Proposition Let {A_i}_{i=1}^∞ be a sequence of uniformly (p, q)-bounded P-operators converging to a P-operator A ∈ B(Ω). Then, we have the following two statements.
(1) If A_i is positive for every i, then A is also positive.
(2) If A_i is self-adjoint for every i, then A is also self-adjoint.

Proof To prove the first claim, choose, by Remark 2.2, functions v_i ∈ L^∞(Ω_i) approximating v together with their images. We have by the assumption that (v_i, v_i)_{A_i} ≥ 0 holds for every i, and thus (v, v)_A ≥ 0 holds.
To prove the second claim, let v, w ∈ L^∞_{[−1,1]}(Ω) and let μ ∶= D_A(v, w). Again by Remark 2.2, we have that for every i ∈ N there exist functions v_i, w_i ∈ L^∞_{[−1,1]}(Ω_i) such that E((v_i A_i) w_i) converges to E((vA)w) as i goes to infinity. On the other hand, we have that E(v_i (w_i A_i)) converges to E(v(wA)). Since E((v_i A_i) w_i) = E(v_i (w_i A_i)) by assumption, we have E((vA)w) = E(v(wA)). ∎

Proposition Let {A_i}_{i=1}^∞ be a sequence of uniformly (p, q)-bounded P-operators converging to a P-operator A ∈ B(Ω). Then, we have the following two statements.
(1) If A_i is positivity-preserving for every i, then A is also positivity-preserving.
(2) If A_i is c-regular for every i, then A is also c-regular.
weakly converges to δ_0. Based on the uniform boundedness of the functions, it follows from Lemma 13.6 that the corresponding d_LP distance converges to 0, and so (v, vA) is the weak limit. Let {v_i}_{i=1}^∞ be a sequence of functions such that D_{A_i}(v_i) weakly converges to D_A(1_Ω). We have that D(v_i − 1_{Ω_i}) weakly converges to δ_0, and hence by Lemma 13.6 we have that d_LP(D_{A_i}(1_{Ω_i}), D_{A_i}(v_i)) goes to 0 as i goes to infinity.

Proof First, we show the claim for v = 1_Ω, and thus by linearity and the previous statement we obtain that ∥vA∥_∞ ≤ m.
The fact that ∥vA∥_∞ ≤ 1 follows by the first part of the proof. Now, we use the spectral theorem for the bounded self-adjoint operator A to get a projection-valued measure E representing A. Suppose that E is not supported on [−1, 1] and there exists a nonzero v supported spectrally outside [−1, 1]. Since the image of v is not zero, ∥vA^k∥_2 tends to infinity as k → ∞. This contradicts the fact that ∥vA^k∥_2 is uniformly bounded for v ∈ L^∞(Ω). It follows that ∥A∥_{2→2} ≤ 1, and this finishes the proof. ∎

Theorem 3.3 Let M be the set of weak equivalence classes of Markov graphops. Then (M, d_M) is a compact metric space.
Proof Let {A_i}_{i=1}^∞ be a sequence of Markov graphops with limit A. Lemma 3.2 guarantees that this sequence has uniformly bounded (2, 2)-norms. Hence we can apply the two propositions above, and we get that A is also a Markov graphop. The compactness follows from Lemma 2.6 and Theorem 2.9. ∎

Remark 3.4 For the case of dense graph limits, it is known that every graphon arises as the limit of finite graphs [28]. The answer to the same question for graphops is more complicated. To start with, for the bounded degree case, one of the main open problems related to Benjamini–Schramm graph limits is the question whether every unimodular random graph is the Benjamini–Schramm limit of finite graphs. A positive answer to this question would imply that every group is sofic, and it would solve several important open problems in group theory (see, e.g., [15]). Unfortunately, the analogous question for the stronger local-global convergence was refuted by Kun and Thom [24]. This implies that not every graphing is the local-global limit of a finite graph sequence. Since the theory of local-global limits is embedded into the theory of action convergence, we obtain examples of graphops that do not arise as finite graph limits. It is also natural to ask whether there is some general invariance property that is common to all finite graph limits, in the same spirit as unimodularity for Benjamini–Schramm limits. This turns out to be true. Unimodularity, in this sense, is equivalent to the property that the representing graphing is a self-adjoint operator. Our results in this section show that, under mild conditions, action limits of finite graphs are always self-adjoint operators.

Construction of the limit object
In this section, we prove Theorem 2.9. Let {A_i}_{i=1}^∞ be a convergent sequence of P-operators with uniformly bounded ∥.∥_{p→q} norms, and for every k ∈ N let X_k denote the limit of the k-profiles of the operators A_i. We wish to prove that there is a P-operator A ∈ B_{p,q}(Ω) for some probability space (Ω, A, μ) such that for every k ∈ N the k-profile of A is X_k.

Before the formal definitions, we explain the heuristics behind this proof. Recall that our goal is to construct a P-operator such that its k-profile is the limit of the k-profiles of the operators in a given convergent sequence of operators for every fixed k. Take countable dense sets of points in the limiting k-profiles of the operators for every k. Then we have that each point in this dense system can be approximated by elements in the k-profiles of the converging operators A_i. On the other hand, every point in the k-profile of A_i involves 2k measurable functions on Ω_i. Recall that, since Ω_i is a probability space, these functions are random variables. Very roughly speaking, the main idea is to take all the functions involved in an approximating profile point of A_i for our dense system of points in the limiting profiles. These are countably many functions for each i. By choosing a subsequence, we can assume that the joint distributions of these countably many functions (random variables) converge weakly and the limit is some probability distribution on Ω ∶= R^∞.

As a first attempt, we could try to produce the limiting operator on the function space of this probability space. Observe that each coordinate function in the probability space on R^∞ corresponds to a function involved in a k-profile for some k. Since every k-profile comes from k functions and their k images, we obtain some information on a possible limiting operator. More precisely, we obtain that certain coordinate functions are the images of some other coordinate functions under this action. However, it is not clear that such an action extends to the full function space on Ω, and so we need a more complicated version of the above idea. Instead of just working with functions involved in profile points, we need enough functions to represent the function space of a whole σ-algebra. To this end, we extend the above function systems by new functions that are obtained by some natural operations. To keep track of the new functions, we introduce an abstract algebraic formalism involving semigroups. The main difficulty in the proof is to verify that, at the end of this process, we obtain a well-defined operator and it has the desired limiting properties.
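In finite terms, the k-profile points that the above heuristic manipulates are simply empirical joint distributions of functions and their images. A minimal illustrative sketch (our notation and convention, not the paper's formal construction):

```python
import numpy as np

def profile_point(A, vs):
    """Empirical joint distribution D_A(v_1, ..., v_k) for a matrix P-operator A:
    returns an n x 2k array whose rows, taken with uniform weight 1/n,
    form a probability measure on R^{2k}."""
    images = [A @ v for v in vs]
    return np.stack(list(vs) + images, axis=1)

# the single-edge graph on 2 vertices: the action swaps the two coordinates of v
A = np.array([[0.0, 1.0], [1.0, 0.0]])
v = np.array([1.0, -1.0])
pt = profile_point(A, [v])
# rows are the pairs (v(x), (vA)(x)): here (1, -1) and (-1, 1)
```

The construction in this section takes weak limits of such empirical measures for countably many function systems at once.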
First, we will need an algebraic notion, the free semigroup F(G, L) of Definition 4.1.

Construction of a function system: In this technical part of the proof, we construct a function system {v_{i,f} ∈ L^∞(Ω_i)}_{i∈N, f∈F} for some countable index set F. Later, we will use this function system to construct a probability distribution κ ∈ P(R^{F×{0,1}}) and an operator A ∈ B_{p,q}(R^{F×{0,1}}, κ). We will show that A is an appropriate limit object for the sequence {A_i}_{i=1}^∞. The index set F will be the free semigroup generated by G and an appropriate set of nonlinear operators on function spaces. For y ∈ Q and z ∈ Q^+ we consider the truncation-type function l, where l is given by the pair (y, z) ∈ Q × Q^+. Note that by definition ∥l(v)∥_∞ ≤ 1. As these functions are naturally identified with Q × Q^+, we will use L = Q × Q^+. Furthermore, let F ∶= F(G, L) be as in Definition 4.1. We have that F is countable. Now we describe the functions {v_{i,g}}_{i∈N, g∈G}. For every i, k ∈ N and t ∈ X′_k let {v_{i,(t,j)}}_{j=1}^k be a system of functions in L^∞_{[−1,1]}(Ω_i) such that the joint distribution of (v_{i,(t,1)}, …, v_{i,(t,k)}, v_{i,(t,1)}A_i, …, v_{i,(t,k)}A_i) converges to t as i goes to infinity. Now we construct the functions {v_{i,w}}_{i∈N, w∈F} recursively according to the length m(w) of the word w. For words of length 1, the functions are already constructed above. Assume that for some k ∈ N we have already constructed all the functions v_{i,w} with m(w) ≤ k.
Construction of the probability space: Let ξ_i ∶ Ω_i → R^{F×{0,1}} be the function such that for f ∈ F, e ∈ {0, 1}, and ω ∈ Ω_i we have ξ_i(ω)_{(f,e)} = (v_{i,f} A_i^e)(ω), where A_i^0 is defined to be the identity operator. Let κ_i ∈ P(R^{F×{0,1}}) denote the distribution of the random variable ξ_i. (In other words, κ_i is the joint distribution of the functions {v_{i,f}}_{f∈F} and {v_{i,f} A_i}_{f∈F}.) Since τ(κ_i) ≤ c holds (recall equation (2.4) for the definition of τ), we have that there is a strictly increasing sequence {n_i}_{i=1}^∞ of natural numbers such that κ_{n_i} is weakly convergent with limit κ as i goes to infinity. Let Ω ∶= R^{F×{0,1}} be the probability space with the Borel σ-algebra A and probability measure κ. We will consider Ω as a topological space, equipped with the product topology. The fact that κ is a probability measure follows from its definition as a weak limit of probability distributions κ_{n_i}.

Construction of the operator: We will define an operator A ∈ B_{p,q}(Ω) with Ω defined above. For (f, e) ∈ F × {0, 1} let π_{(f,e)} ∶ R^{F×{0,1}} → R denote the projection function to the coordinate at (f, e). Notice that π_{(f,e)} ∘ ξ_i = v_{i,f} A_i^e. In particular, due to the definition of κ, we also have the following.

Lemma 4.1 The coordinate functions on R^{F×{0,1}} have the following properties.
(1) For every f_1, f_2 ∈ F we have π_{(f_1 f_2, 0)} = π_{(f_1, 0)} π_{(f_2, 0)} κ-almost everywhere.
(2) If f ∈ F and l = (y, z) holds for some y, z, then π_{(l(f), 0)} = h_{p,q} ∘ π_{(f,1)} κ-almost everywhere.
(3) For all a_1, …, a_k ∈ F and λ_1, …, λ_k ∈ R we have ∥∑_{j=1}^k λ_j π_{(a_j,1)}∥_q ≤ c ∥∑_{j=1}^k λ_j π_{(a_j,0)}∥_p.
(4) The linear span of the functions {π_{(f,0)}}_{f∈F} is dense in the space L^p(Ω).
(5) For every k ∈ N and t ∈ X′_k, the joint distribution of (π_{((t,1),0)}, …, π_{((t,k),0)}, π_{((t,1),1)}, …, π_{((t,k),1)}) is t.
Remark 4.2 Note that when functions on Ω are treated as functions in L^r(Ω) for some r ∈ [1, ∞], then they are considered to be equal if they differ on a set of κ-measure zero. This kind of weak equality of functions enables various algebraic correspondences between different coordinate functions, which would be impossible in a strict sense. As a toy example, let us consider the uniform measure μ on {(x, x) ∶ x ∈ [0, 1]}, which is a Borel measure on R². We have that the x-coordinate function (x, y) ↦ x is equal to the y-coordinate function (x, y) ↦ y in the space L^r(R², μ), because they agree on the support of μ.
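The toy example of the remark can be simulated directly (illustration only):

```python
import numpy as np

# Sample the uniform measure mu on the diagonal {(x, x) : x in [0, 1]} of R^2
# and compare the two coordinate functions on the sample.
rng = np.random.default_rng(0)
t = rng.random(10_000)
points = np.stack([t, t], axis=1)      # mu-distributed points (x, y) with x = y

coord_x, coord_y = points[:, 0], points[:, 1]
# as elements of L^r(R^2, mu) the two coordinate functions are equal,
# although as functions on R^2 they are certainly not
```

Every sampled point lies on the support of μ, where the two coordinate functions agree, mirroring the κ-almost-everywhere identities used throughout this section.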
The proof of Lemma 4.1 will use the following two lemmas.
Here v is viewed as a random variable on Ω. For r = 1, it is well known that v ∈ L^1(Ω) implies lim_{n→∞} nP(|v| ≥ n) = 0. For r > 1, by Markov's inequality we have that P(|v|^r ≥ n^r) is at most ∥v∥_r^r / n^r, and thus nP(|v| ≥ n) converges to 0 as n goes to infinity. Now it suffices to show that lim_{n→∞} ∫_{U_n} |v|^r dκ = 0. The facts that v is measurable and κ is a probability measure imply that lim_{n→∞} κ(U_n) = 0 and lim_{n→∞} ∫_{U_n} |v|^r dκ = 0. ∎

The following lemma is easy to prove; see, e.g., Theorem 22.4 in the lecture notes [13].
Lemma 4.4 Let {v_i}_{i∈I} be a system of functions for some countable index set I such that for every a, b ∈ I there is c ∈ I with v_a v_b = v_c. Let A_0 be the σ-algebra generated by the functions {v_i}_{i∈I}. Suppose that the constant 1 function on Ω can be approximated by a uniformly bounded family of finite linear combinations of {v_i}_{i∈I}. Then the L^r-closure of the linear span of {v_i}_{i∈I} is L^r(Ω, A_0, μ) for every r ∈ [1, ∞).

Proof of Lemma 4.1 To prove the first statement of the lemma, recall that, by the construction of the function system, we have for every i ∈ N and f_1, f_2 ∈ F that v_{i, f_1 f_2} = v_{i, f_1} v_{i, f_2} holds. Therefore, by equation (4.1) and the continuity of π, we have that each κ_i is supported inside the closed set {x ∶ π_{(f_1 f_2, 0)}(x) = π_{(f_1, 0)}(x) π_{(f_2, 0)}(x)}. This implies that κ is also supported inside this set, and thus π_{(f_1 f_2, 0)} = π_{(f_1, 0)} π_{(f_2, 0)} holds κ-almost everywhere.
The proof of the second statement is similar to the first one. Again, by the construction of the function system, we have for every i ∈ N and f ∈ F that v_{i, l(f)} = h_{p,q} ∘ (v_{i,f} A_i). This means, by the definition of κ_i, equation (4.1), and the continuity of π, that κ_i is supported on the closed set {x ∶ π_{(l(f),0)}(x) = h_{p,q}(π_{(f,1)}(x))} for every i ∈ N. Thus, π_{(l(f),0)} = h_{p,q} ∘ π_{(f,1)} holds κ-almost everywhere.
For the proof of the third statement, let us use that ∥A_i∥_{p→q} ≤ c holds for every i ∈ N, and hence ∥∑_{j=1}^k λ_j v_{i,a_j} A_i∥_q ≤ c ∥∑_{j=1}^k λ_j v_{i,a_j}∥_p. Since the sum on the right-hand side is a function in L^∞(Ω_i), the function |∑_{j=1}^k λ_j π_{(a_j,0)}|^p is a bounded, continuous function on the support of κ. Therefore, using the weak convergence κ_{n_i} → κ and equation (4.1) again (in particular, integrating the pth power of the absolute values with respect to κ_i), we obtain the convergence of the right-hand sides. On the other hand, |∑_{j=1}^k λ_j π_{(a_j,1)}|^q is a nonnegative continuous function; thus weak convergence, in this case, implies the following inequality: ∫ |∑_{j=1}^k λ_j π_{(a_j,1)}|^q dκ ≤ liminf_{i→∞} ∫ |∑_{j=1}^k λ_j π_{(a_j,1)}|^q dκ_{n_i}. These inequalities together yield the third statement.
For the fourth statement, let H_r denote the L^r-closure of the linear span of the functions {π_{(f,0)}}_{f∈F} for r ∈ [1, ∞). First we show that π_{(f,1)} ∈ H_q holds for every f ∈ F. By the second statement, we have equation (4.2), where l_j is given by the pair (j/n, 1/n) for −n² ≤ j ≤ n². Since the right-hand side of (4.2) is in H_q, we obtain that the left-hand side is also in H_q. On the other hand, we have π_{(f,1)} ∈ L^q(Ω) due to the third statement. Hence, by Lemma 4.3, we obtain that, as n goes to infinity, the left-hand side of (4.2) converges to π_{(f,1)} in L^q(Ω), and thus π_{(f,1)} ∈ H_q.
Let A_0 be the σ-algebra generated by the functions {π_{(f,0)}}_{f∈F}. Notice that the constant 1 function on Ω can be approximated already using X′_1. We have by the first statement in this lemma and Lemma 4.4 that H_r = L^r(Ω, A_0, κ) holds for every r ∈ [1, ∞). As we have shown, we have for every f ∈ F that π_{(f,1)} ∈ H_q = L^q(Ω, A_0, κ), and thus all coordinate functions on R^{F×{0,1}} are measurable in A_0. The last statement of the lemma follows directly from the definition of the functions {v_{i,(t,j)}}_{i∈N, j∈[k]} and the definition of κ. ∎

We are ready to define the operator A ∈ B_{p,q}(Ω).
Let π_{(f,0)} A ∶= π_{(f,1)} for f ∈ F. This defines a linear operator on the linear span of {π_{(f,0)}}_{f∈F}, which is bounded due to the third statement of Lemma 4.1. Hence it has a unique continuous linear extension to its L^p-closure. By the fourth statement of the same lemma, we get that there is a unique operator A ∈ B_{p,q}(Ω) with ∥A∥_{p→q} ≤ c such that π_{(f,0)} A = π_{(f,1)} holds for every f ∈ F.

Last part of the proof: The last statement of Lemma 4.1, together with the equality π_{((t,j),0)} A = π_{((t,j),1)}, implies that for every k ∈ N and t ∈ X′_k we have t ∈ S_k(A). Therefore, for every k ∈ N we have that X_k ⊆ S*_k(A). Our goal is to show that X_k = S*_k(A) for every k ∈ N, and thus it remains to prove the reverse containment. Let ε > 0 be arbitrary. We have by the fourth statement of Lemma 4.1 that for some large enough natural number m there are elements f_1, f_2, …, f_m ∈ F and real numbers {λ_{a,j}}_{a∈[m], j∈[k]} such that for every j ∈ [k] we have ∥w_j − v_j∥_p ≤ ε, where w_j ∶= ∑_{a=1}^m λ_{a,j} π_{(f_a,0)} for j ∈ [k]. Since only vectors with infinity norm at most 1 are considered in the profile, we will use a truncating function. Namely, let h ∶ R → R be the continuous function with h(x) = x for |x| ≤ 1 and h(x) = x/|x| for |x| > 1. This also implies by the triangle inequality that ∥h ∘ w_j − v_j∥_p ≤ 2ε. By the definition of κ, we have the analogous approximations for the operators A_{n_i}, where c′ ∶= max(c, 1). Notice that the functions h ∘ z_{i,j} − z_{i,j} are uniformly bounded and their distribution converges weakly to the distribution of h ∘ w_j − w_j. Hence, if i is large enough, then by (4.3) we have that ∥h ∘ z_{i,j} − z_{i,j}∥_p ≤ 3ε holds for j ∈ [k], and thus the corresponding profile points are close. This holds for every ε > 0, and thus α ∈ X_k. ∎

General graph limits
There are various ways of representing graphs by operators and, in particular, by P-operators. Depending on the chosen representation, we get a corresponding limit notion for graphs. In this section, we list four natural operator representations of graphs and investigate the corresponding graph limit notions. Let G be a finite graph on the vertex set V(G) = [n] with edge set E(G).

Adjacency operator convergence: Recall that μ_{[n]} denotes the uniform distribution on [n]. We denote by A(G) ∈ B([n], μ_{[n]}) the P-operator defined by (vA(G))(x) ∶= ∑_{y∶{x,y}∈E(G)} v(y); we have ∥A(G)∥_{∞→∞} ≤ d, where d is the maximal degree in G. We can say that a graph sequence {G_i}_{i=1}^∞ is convergent if {A(G_i)}_{i=1}^∞ is an action convergent sequence of P-operators. We obtain compactness for graphs with uniformly bounded degree. Quite surprisingly (and nontrivially), it turns out that this convergence notion is equivalent to local-global convergence, which is a refinement of Benjamini–Schramm convergence (see [4, 21]). However, for graphs with unbounded degrees, compactness is not guaranteed. This can be solved by scaling the operators A(G) by some number that depends on G. For example, we have that ∥A(G)/|V(G)|∥_{2→2} ≤ 1 holds for every graph G. Again, quite surprisingly, it turns out that convergence of A(G_i)/|V(G_i)| is equivalent to dense graph convergence. (For a definition of dense graph convergence, see [28].) This motivates us to introduce scaling functions that map graphs to positive real numbers. Let G denote the set of isomorphism classes of finite graphs.

Definition 5.1 Let f ∶ G → R^+ be a function. We say that a graph sequence {G_i}_{i=1}^∞ is adjacency operator convergent (or just convergent) with scaling f if the sequence {A(G_i)/f(G_i)}_{i=1}^∞ is an action convergent sequence of P-operators.

Recall that a graphop is a self-adjoint, positivity-preserving P-operator. The word graphop is a mixture of the words graph and operator. Note that both graphons and graphings used in graph limit theory are graphops. Theorem 2.9, combined with the two propositions of Section 3, implies the following.

Theorem 5.1 Let {G_i}_{i=1}^∞ be an adjacency operator convergent sequence of graphs with scaling f ∶ G → R^+. Assume that there exist p ∈ [1, ∞), q ∈ (1, ∞), and c ∈ R^+ such that ∥A(G_i)/f(G_i)∥_{p→q} ≤ c holds for every i ∈ N. Then there is a graphop A such that lim_{i→∞} A(G_i)/f(G_i) = A holds. We say that A is the adjacency operator limit of ({G_i}_{i=1}^∞, f).

A natural scaling is f_{p,q}(G) ∶= ∥A(G)∥_{p→q} defined for nonempty graphs G (f_{p,q} can be defined as 1 on the empty graph), where p ∈ [1, ∞), q ∈ (1, ∞]. Let us call it norm scaling. With this scaling, every graph sequence has a convergent subsequence; hence we have sequential compactness for arbitrary graph sequences. Norm scaling leads to a general convergence notion that generalizes local-global convergence and recovers dense graph limits up to a constant multiplicative factor in the limit object. The norm scaling is very convenient to use for general graph sequences where no other natural normalization is given.

Random walk convergence of graphs: Let ν_G denote the stationary measure of the random walk on G. It is well known that ν_G is the probability measure on [n] given by ν_G(i) = d_i/(2|E(G)|), where d_i is the degree of the vertex i for i ∈ [n]. We denote by M(G) ∈ B_{2,2}([n], ν_G) the P-operator defined by equation (1.1). The operator M(G) is known as the Markov kernel corresponding to the random walk on G. We have that M(G) is a Markov graphop. Consequently, by Lemma 3.2, we have that ∥M(G)∥_{2→2} = 1. (If G has no edges, then M(G) is not defined.)
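The basic properties of M(G) quoted above (rows summing to 1, self-adjointness on L²(ν_G), and ∥M(G)∥_{2→2} = 1) can be verified numerically for a small graph. A sketch, with the standard random-walk kernel assumed:

```python
import numpy as np

def markov_kernel(A):
    """Random-walk Markov kernel of a graph: M[x, y] = A[x, y] / deg(x),
    together with the stationary measure nu_G(x) = deg(x) / (2|E|)."""
    deg = A.sum(axis=1)
    return A / deg[:, None], deg / deg.sum()

# path graph on 4 vertices
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
M, nu = markov_kernel(A)

# detailed balance nu(x) M[x, y] = nu(y) M[y, x] is exactly self-adjointness on L^2(nu)
balance = nu[:, None] * M
```

Reversibility makes S = D^{1/2} M D^{−1/2} symmetric with spectrum in [−1, 1] containing 1, which is why the 2→2 norm on L²(ν_G) equals 1.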

Definition 5.2 A graph sequence of nonempty graphs {G_i}_{i=1}^∞ is random walk convergent if {M(G_i)}_{i=1}^∞ is a convergent sequence of P-operators.

The following theorem is a direct consequence of Theorem 3.3.

Theorem 5.2 Every graph sequence {G_i}_{i=1}^∞ has a random walk convergent subsequence. If {G_i}_{i=1}^∞ is random walk convergent, then there is a Markov graphop A such that lim_{i→∞} M(G_i) = A. We say that A is the random walk limit of {G_i}_{i=1}^∞.

Note that, for regular graphs, random walk convergence is equivalent to adjacency operator convergence with scaling by ∥A(G)∥_{2→2}. However, if there is a small but nonzero number of very high degree points in G and many low degree points, then adjacency operator convergence may trivialize. Examples for this are the star graphs or the 2-subdivisions of complete graphs. In these cases, random walk convergence turns out to be more natural and leads to interesting and nontrivial limit objects (see Section 12).

Extended random walk convergence: A Markov pair is a pair of a Markov graphop A ∈ B(Ω) and a measurable function f on Ω. As we explained in the introduction, a sequence of Markov pairs {(A_i, f_i)}_{i=1}^∞ is convergent if the extended k-profiles formed by the corresponding distributions converge in d_H for every k ∈ N. It can be proved with a slight extension of the proof of Theorem 2.9 (details will be worked out elsewhere) that a convergent sequence of Markov pairs has a limit, which is also a Markov pair. There are two different uses of Markov pairs. The first one is the following.

Á. Backhausz and B. Szegedy

In spite of the fact that the Markov kernel of a finite graph M(G) determines the sequence of the nonzero degrees (even with multiplicities), this information may be lost in the limit. Even if it is preserved in some way (examples for this are sequences of bounded degree graphs), degrees cannot be read off in the usual way from the limit object, which is a Markov graphop. The idea is that we can store the information on the degrees in a normalized version d* of the degree function d. Let ν be the representing measure of A given by Theorem 6.3 and let ν′ be the marginal measure of ν on (Ω, A). Let M(A) denote the Markov graphop on (Ω, A, ν′) determined by the measure ν, using again Theorem 6.3. Note that the action of M(A) is given by the disintegration of ν. The representation of A is given by the pair (M(A), 1_Ω A).
Another interesting use of Markov pairs is that we can use them to represent generalized graphons (related to graphexes [6, 23]), which are symmetric nonnegative measurable functions of the form W ∶ R_+² → R_+. Let ν′ be the marginal distribution of ν on R_+. Let M denote the Markov graphop on (R_+, ν′) determined by Theorem 6.3. For x ∈ R_+ let f(x) ∶= ∫_{R_+} W(x, y) dλ. We can represent the generalized graphon W by the Markov pair (M, f).

Laplace operator convergence: Using the above notations, we denote by L(G) ∈ B_2([n], μ_{[n]}) the P-operator defined by (vL(G))(x) ∶= deg(x)v(x) − ∑_{y∶{x,y}∈E(G)} v(y). We have that L(G) is a positive self-adjoint operator. Note that, in contrast with A(G) and M(G), the operator L(G) is typically not positivity-preserving. On the other hand, we gain positivity. Similarly to the previous definitions, we say that a graph sequence {G_i}_{i=1}^∞ is Laplace operator convergent if the suitably normalized sequence {L(G_i)}_{i=1}^∞ has uniformly bounded operator norm and is a convergent sequence of P-operators. Limit objects are positive, self-adjoint P-operators.

Degree weighted operator convergence: Finally, we mention one more interesting P-operator related to G. Similarly to M(G), we use the stationary distribution of the random walk. Let F(G) ∈ B([n], ν_G) be the corresponding degree weighted operator. Again, we have that F(G) is a graphop. Indeed, the positivity-preserving property is clear, and self-adjointness can be verified by a simple calculation. Limits of appropriately normalized versions of F(G_i) are graphops.
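The contrast drawn above for the Laplace operator (positive and self-adjoint, but not positivity-preserving) can be checked on a small example; the unnormalised combinatorial Laplacian D − A used here is our reading of L(G), assumed for illustration:

```python
import numpy as np

def laplacian(A):
    """Combinatorial Laplacian D - A of a graph with adjacency matrix A."""
    return np.diag(A.sum(axis=1)) - A

# star K_{1,2}: vertex 0 joined to vertices 1 and 2
A = np.array([[0, 1, 1],
              [1, 0, 0],
              [1, 0, 0]], dtype=float)
L = laplacian(A)

# applying L to the nonnegative indicator of vertex 1 produces a
# negative value at vertex 0, so L is not positivity-preserving
image = L @ np.array([0.0, 1.0, 0.0])
```

The eigenvalues of L are nonnegative (positivity as an operator), while the negative off-diagonal entries destroy positivity-preservation, exactly the trade-off described in the text.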

Measure representation of graphops
Let A ∈ B(Ω, A, μ) be a graphop. In this section, we construct a measure ν on Ω × Ω that represents the operator A. This means that the operator A can be reconstructed from the measure ν in a natural way. Intuitively, if we think of A as an infinite graph-like object, then ν shows where we can find the edges in Ω × Ω. Note that both graphons and graphings are given in terms of such measures rather than in the form of operators. More precisely, graphons are given by a measurable function which is the Radon–Nikodym derivative of a measure on [0, 1] × [0, 1]. Our goal, in this chapter, is to bring the operator language closer to the existing representations of graph limits.
Assume (Ω, A) is a standard Borel space. Let R denote the set of product sets of the form S × T ⊆ Ω × Ω, where S, T ∈ A. We have that R is a so-called semiring. We define the function ν on R such that ν(S × T) ∶= (1_S, 1_T)_A holds for S, T ∈ A (recall equation (2.5)).
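The defining identity ν(S × T) ∶= (1_S, 1_T)_A can be tested on a finite graphop; the uniform base measure and the resulting formula (1/n)·1_Sᵀ A 1_T are assumptions of this sketch:

```python
import numpy as np

def nu_product(A, S, T):
    """nu(S x T) := (1_S, 1_T)_A, computed as (1/n) * 1_S^T A 1_T
    for a matrix graphop on ([n], uniform measure)."""
    n = A.shape[0]
    oneS, oneT = np.zeros(n), np.zeros(n)
    oneS[list(S)] = 1.0
    oneT[list(T)] = 1.0
    return float(oneS @ A @ oneT) / n

# the 4-cycle as a graphop
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)
```

Symmetry ν(S × T) = ν(T × S) reflects self-adjointness, additivity over disjoint product sets reflects bilinearity, and the total mass ν(Ω × Ω) equals the average degree.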

Lemma 6.1
The function ν has the following properties.
(1) ν(S × T) ≥ 0 for every S, T ∈ A.
(2) If S × T is the disjoint union of the finitely many product sets S_i × T_i, then ν(S × T) = ∑_i ν(S_i × T_i).
(3) For every ε > 0 there is δ > 0 such that ν(S × T) ≤ ε whenever the minimum of μ(S) and μ(T) is at most δ.
Proof By the positivity-preserving property of A and the bilinearity of (·, ·)_A we have that ν satisfies the first two properties. To show the last property, observe that, by the self-adjointness of A, we have ν(S × T) = ν(T × S), and so the statement is equivalent to showing the existence of δ > 0 such that ν(S × T) ≤ ε whenever μ(T) ≤ δ. We have by the first two properties that ν(S × T) ≤ ν(Ω × T) = ∫_T f dμ, where f ∶= 1_Ω A. Now, since f ≥ 0 and ∫ f dμ < ∞, the statement of the lemma follows from the well-known absolute continuity property of integration. ∎

The proof of the following lemma follows a similar scheme as [12, Lemma A.10] or [17, Theorem 454D].

Lemma 6.2 The function ν is a premeasure on R and it has a unique extension to a Borel measure on (Ω × Ω, A ⊗ A), denoted also by ν.
Proof First, we claim that if ε > 0 and δ > 0 satisfy the third property in Lemma 6.1 and μ(S), μ(T) ≥ 1 − δ, then ν(S × T) ≥ ν(Ω × Ω) − 2ε. Indeed, by Lemma 6.1, we have ν(Ω × Ω) − ν(S × T) = ν((Ω ∖ S) × Ω) + ν(S × (Ω ∖ T)) ≤ 2ε.

To prove that ν is a premeasure, the only nontrivial part is to show that if R ∈ R is the pairwise disjoint union of sets S_i × T_i ∈ R (i ∈ N), then ν(R) = ∑_{i=1}^∞ ν(S_i × T_i). Note that it suffices to prove it for R = Ω × Ω, since, given any other product set R, we can obtain Ω × Ω as the disjoint union of R and a finite number of other product sets, in such a way that the claim for Ω × Ω implies the claim for R. Now, since Ω is standard, there exists a topology τ on Ω generating A such that (1) we can approximate every set in A by a compact set with arbitrary precision measured in μ; (2) both S_i and T_i are open for every i ∈ N. Let ε > 0 be arbitrary and let K_1, K_2 ∈ A be τ-compact sets such that μ(K_1), μ(K_2) ≥ 1 − δ(ε). We have that {S_i × T_i}_{i=1}^∞ is an open cover of the compact set K_1 × K_2, and so there is a finite subcover. Applying the second property of ν to this finite subcover, we obtain that ∑_{i=1}^∞ ν(S_i × T_i) ≥ ν(Ω × Ω) − 2ε. Since this is true for every ε > 0, we obtain that ν is a premeasure. The Carathéodory extension theorem implies that there is a unique extension of ν to A ⊗ A. ∎

Theorem 6.3 (Measure representation of graphops) If A ∈ B(Ω, A, μ) is a graphop, then there is a unique finite measure ν on (Ω × Ω, A ⊗ A) with the following properties.
(2) The marginal distribution of ν on Ω is absolutely continuous with respect to μ.

Conversely, if ν is a finite measure on (Ω × Ω, A ⊗ A) satisfying the first two properties, then there is a unique graphop A such that the third property is satisfied.
Proof The existence and uniqueness of ν follow from Lemma 6.2. For the converse statement, let f ∈ L^∞(Ω) be an arbitrary function. Then g ↦ ∫ f(x)g(y) dν(x, y) is a bounded linear functional on L^∞(Ω), and thus, by duality, there is a unique function m(f) ∈ L^1(Ω) with ∫ m(f) g dμ = ∫ f(x)g(y) dν(x, y) for every g ∈ L^∞(Ω). The map A ∶ f ↦ m(f) is a self-adjoint, positivity-preserving linear operator with ∥A∥_{∞→1} = ν(Ω²), satisfying the third property. Notice that for this converse statement we assumed that ν is finite, which guarantees that we can apply the extension theorem. ∎

Remark 6.4 (Fiber measures) A natural way of reconstructing A from the representing measure ν goes by disintegrating the measure ν. By using the disintegration theorem, one obtains a family of measures {ν_x}_{x∈Ω} on (Ω, A) (called fiber measures) from which ν can be recovered by integration over the first coordinate. In general, it is very convenient to describe a graphop in terms of fiber measures. This is illustrated in Figure 4. For a Markov graphop, the marginal distribution of ν is equal to μ. This means that Markov graphops are completely specified by the data (Ω, A, ν), where ν is a symmetric probability measure on (Ω × Ω, A ⊗ A). Such objects are symmetric self-couplings of probability spaces.

Quotient convergence and partitions
In the first part of this section, we relate P-operator convergence to the so-called quotient convergence, which was studied in different forms by different authors [8,11,26].The version that we generalize to P-operators was defined in [26].In the second part of the chapter, we describe a variant of action convergence that turns out to be equivalent to the original version for uniformly (p, q)-bounded sequences.

Definition 7.3 (Quotient convergence and metric) A sequence of P-operators {A_i}_{i=1}^∞ is quotient convergent if, for every k ∈ N, the quotient sets Q_k(A_i) converge in the Hausdorff metric as i → ∞.
The following proposition says that P-operator convergence is stronger than quotient convergence if the sequence has uniformly bounded ∥.∥ p→q norm for some p ∈ [1, ∞), q ∈ (1, ∞].

Lemma 7.1 Let us fix c ≥ 1 and numbers p ∈ [1, ∞), q ∈ (1, ∞]. If A and B are P-operators with ∥A∥_{p→q}, ∥B∥_{p→q} ≤ c, then for every k ∈ N the Hausdorff distance of Q_k(A) and Q_k(B) tends to 0 as d_M(A, B) tends to 0.
Proof Depending on p and q, let us choose p′ ∈ (p, ∞) and q′ ∈ (1, q) with 1/p′ + 1/q′ = 1. Then we have ∥A∥_{p′→q′} ≤ ∥A∥_{p→q}. Let M ∈ Q_k(B). We need to show that if A and B are sufficiently close in d_M, then there is an element of Q_k(A) close to M; the other direction follows by the symmetry of the argument.
Let v_1, v_2, …, v_k be a balanced fractional function partition of Ω_2 such that the corresponding quotient of B is M. We have by the definition of d_M that there are vectors w_1, w_2, …, w_k whose joint empirical distribution (together with their images) is close to that of the v_i. Let ε_2 > 0 be an arbitrary constant, and suppose that d_M(A, B) is so small that the right-hand side of the above inequality is smaller than ε_2/2. The fact that {v_i}_{i=1}^k is a balanced fractional function partition can be expressed by the joint empirical distribution of these vectors: it is concentrated on k-tuples of nonnegative numbers with sum 1, and the mean of every marginal is 1/k. By the inequality above, the joint empirical distribution of {w_i}_{i=1}^k is close to this, and it follows that there is a balanced fractional function partition close to {w_i}_{i=1}^k. For such a function system, we have for every i, j that the corresponding quotient entry is close to M_{i,j}, where the last inequality is by (2.6) and holds whenever d_M(A, B) is sufficiently small. This follows from Lemma 13.4. ∎

The following proposition is a direct consequence of the previous lemma.
Proposition 7.2 Let {A_i}_{i=1}^∞ be an action convergent sequence of P-operators with uniformly bounded ∥.∥_{p→q} norms. Then {A_i}_{i=1}^∞ is quotient convergent.

In the rest of this section, we will formulate a version of action convergence. It is clear that {v_i}_{i=1}^k is a function partition if and only if there is a measurable partition {P_i}_{i=1}^k of Ω such that (v_1(x), …, v_k(x)) = e_i for x ∈ P_i, where e_i ∈ R^k is the vector with 1 at the ith coordinate and 0 everywhere else. Let A ∈ B(Ω, A, μ). The next theorem gives a useful equivalent formulation of P-operator convergence for uniformly bounded operators.
Theorem 7.3 Let p ∈ [1, ∞), q ∈ (1, ∞], and let {A_i}_{i=1}^∞ be a uniformly (p, q)-bounded sequence of P-operators. Then {A_i}_{i=1}^∞ is convergent if and only if, for every k, the sequence of k-profiles restricted to function partitions converges.

The proof of this theorem is a direct consequence of the following two lemmas.
Proof Assume that A ∈ B(Ω). If μ satisfies the conditions of the first part of the lemma, it can be written as μ = D_A({v_i}_{i=1}^k). On the other hand, since d_LP(μ, M_k) ≤ δ, we can find a probability measure in M_k with distance at most δ from μ. The first k marginals are concentrated on vectors with entries in [0, 1]; hence there are [0, 1]-valued functions w_1, w_2, …, w_k with the corresponding joint distribution, by the upper bound on the matrix norm. By using Lemma 13.1, it follows that if δ is small enough, then μ_2 is close to μ. We can apply the first part of the statement for μ_2 and B. We obtain that there is a partition into at most 2m measurable sets. By taking a common refinement of the level sets, we obtain that there is a partition in which every w_i is measurable. This means that there exist real numbers {u_{i,j}}_{i∈[k], j∈[N]} between −1 and 1 such that for every i ∈ [k] we have the corresponding step-function representation. Let ε′ > 0 be some sufficiently small number. If δ is small enough, we have that there is a suitable partition. We obtain that if m is big enough and ε′ is small enough, then κ ∈ S_k(B) is at most ε far from μ. All the estimates in the proof depend only on c, ε, k and p, q. ∎
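A quotient of a finite P-operator with respect to a function partition, as used throughout this section, can be sketched as follows (the normalisation Q[i, j] = (1_{P_i}, 1_{P_j})_A is one plausible choice, assumed for illustration):

```python
import numpy as np

def quotient(A, parts):
    """Quotient of a matrix P-operator on ([n], uniform measure) with respect to a
    partition of [n] into index lists: Q[i, j] = (1_{P_i}, 1_{P_j})_A."""
    n = A.shape[0]
    Q = np.zeros((len(parts), len(parts)))
    for i, Pi in enumerate(parts):
        for j, Pj in enumerate(parts):
            Q[i, j] = A[np.ix_(Pi, Pj)].sum() / n
    return Q

# 4-cycle split into even and odd vertices: every edge crosses the partition
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)
Q = quotient(A, [[0, 2], [1, 3]])
```

The small matrix Q records the edge density between partition classes, which is the data that quotient convergence compares across operators.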

Dense graph limits and graphons
In this chapter, we explain how the so-called dense graph limit theory fits into our general limit theory. Let us consider the probability space ([0, 1], L, λ), where λ is the Lebesgue measure on the Lebesgue σ-algebra L. Special P-operators on L^2([0, 1]), called graphons, play a crucial role in dense graph limits. A graphon is a two-variable measurable function W: [0, 1]^2 → [0, 1] with the symmetry property that W(x, y) = W(y, x) holds for every x, y ∈ [0, 1]. Graphons act on the Hilbert space L^2([0, 1]) by (fW)(x) = ∫_0^1 W(x, y) f(y) dy, where f ∈ L^2([0, 1]). It is easy to see that ∥W∥_{2→2} ≤ 1, and thus graphons are P-operators. It is also clear that graphons are positivity-preserving and self-adjoint operators, and hence they are also graphops. Let W denote the space of graphons. For U, W ∈ W, we say that U ∼ W (U is isomorphic to W) if δ_◻(U, W) = 0. Let W̃ := W/∼ be the set of equivalence classes. The next theorem from [28] is a fundamental result in graph limit theory.
The space (W̃, δ_◻) is essentially the graph limit space. Every finite graph G on n vertices is represented by the graphon W_G with W_G(x, y) := 1 if (⌈xn⌉, ⌈yn⌉) is an edge of G and W_G(x, y) := 0 otherwise. Graph convergence in dense graph limit theory is equivalent to the convergence of the representing functions W_G in (W̃, δ_◻). Our next theorem shows that this limit theory is embedded into our more general limit framework.
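A minimal numerical sketch of this representation (our own code, not from the paper; the function name `graphon_action` is ours): on step functions that are constant on the n intervals of the graphon representation, the action of W_G reduces to multiplication by A(G)/n.

```python
import numpy as np

def graphon_action(A, f_vals):
    """Apply the step-function graphon W_G of a graph with adjacency
    matrix A to a step function f (constant on the n intervals
    ((i-1)/n, i/n]):  (f W_G)(x) = integral of W_G(x, y) f(y) dy.
    On step functions this reduces to (A f) / n."""
    n = A.shape[0]
    return A @ f_vals / n

# Triangle K_3
A = np.array([[0, 1, 1],
              [1, 0, 1],
              [1, 1, 0]], dtype=float)
f = np.array([1.0, 1.0, 1.0])   # the constant-1 function
g = graphon_action(A, f)        # each entry is degree(i)/n = 2/3
```

This illustrates why A(G)/|V(G)| is the right normalization of the adjacency matrix when comparing with W_G.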

Theorem 8.2
The two pseudometrics δ ◻ and d M are equivalent on W.
Proof By Lemma 2.14, it remains to show that for every ε > 0 there is δ > 0 such that if d_M(U, W) ≤ δ, then δ_◻(U, W) ≤ ε. By contradiction, let us assume that there exist ε > 0 and two sequences of graphons {U_i}_{i=1}^∞ and {W_i}_{i=1}^∞ with d_M(U_i, W_i) → 0 but δ_◻(U_i, W_i) ≥ ε for every i. By choosing an appropriate subsequence, we can assume by Theorem 8.1 that lim_{i→∞} U_i = U and lim_{i→∞} W_i = W hold, where the convergence is in δ_◻. We obtain that δ_◻(U, W) ≥ ε. On the other hand, by the triangle inequality and Lemma 2.14, we have that d_M(U, W) ≤ d_M(U, U_i) + d_M(U_i, W_i) + d_M(W_i, W), and thus by taking lim_{i→∞} we get that d_M(U, W) = 0. We have by Proposition 7 that U and W are quotient equivalent. It is well known in graph limit theory (see, e.g., [8]) that such quotient equivalence implies δ_◻(U, W) = 0, a contradiction. ∎ We need the following lemma.

Lemma 8.3
For every ε > 0 and k, there exists a number n such that if G is a finite graph with at least n vertices, then the Hausdorff distance of the k-profiles of W_G and A(G)/|V(G)| is smaller than ε.

Proof The k-profile of A(G)/|V(G)| is based on functions h_j: [n] → R with ∥h_j∥_∞ ≤ 1 (where integers represent the vertices of the graph), while the k-profile of W_G is based on functions v_j: [0, 1] → R with ∥v_j∥_∞ ≤ 1. By the representation given above (after Theorem 8.1), for every such function h_j we can define the function v_j(x) = h_j(⌈xn⌉) for x ∈ (0, 1] and v_j(0) = h_j(1). Since this is consistent with the graphon representation of graphs, the corresponding empirical distributions will be exactly the same. Therefore, the main part is to show that for every ε > 0 there exists N such that for n ≥ N and for every system of functions v_1, ..., v_k: [0, 1] → [−1, 1], there are functions h_1, ..., h_k on [n] such that the Lévy-Prokhorov distance of the two empirical distributions is at most ε.
First, we suppose that all possible values of the functions v_j are multiples of an appropriate δ > 0 (to be chosen later, depending only on k and ε). That is, we suppose that v_j: [0, 1] → T_δ, where T_δ is the finite set consisting of the multiples of δ within [−1, 1]. We will see later that this easily implies the general statement. Now, we use the weak Szemerédi regularity lemma (in the form of Lemma 9.3 and the remark afterward in [27]). This says that, given S ≥ 2 and a graph G on n vertices, there exists a partition P of the vertices of G into S parts such that (i) the difference between the sizes of any two classes is at most 1; (ii) we have d_◻(G, G_P) ≤ 4/√(log S). Here, G_P is a weighted graph on the vertex set of G, where the weight of uv is equal to the edge density between the class of u and the class of v. By considering the corresponding graphons, we can say the following. Given ε > 0, let S = S(ε) be chosen such that 4/√(log S) < ε. Furthermore, let us partition [0, 1] into ∪_{s=1}^S P_s, where each P_s is the union of the intervals corresponding to the vertices in class s (in the graphon representation of a graph, the ith vertex is represented by the interval ((i − 1)/n, i/n)). Then, the regularity lemma implies that, given ε > 0 and W_G, we can find a graphon W′_G with ∥W_G − W′_G∥_◻ ≤ ε, where W′_G is a step function that is constant on each set of the form P_s × P_t. Furthermore, by inequality (2.7), this condition implies that ∥W_G − W′_G∥_{∞→1} ≤ 4ε. Then, by the same argument as in the proof of Lemma 2.12, namely, using Lemma 13.2, we obtain that the corresponding profiles are close. Now, we define a modification of the family of functions (v_1, ..., v_k) mapping from [0, 1] to T_δ, with the new versions being constant on each interval ((i − 1)/n, i/n), but with approximately the same joint empirical distribution. For every element ω ∈ T_δ^k, we define μ_{s,ω} := |P_s ∩ (v_1, ..., v_k)^{−1}(ω)|, where |·| denotes the Lebesgue measure on [0, 1]. For every ω ∈ T_δ^k, we choose ⌊μ_{s,ω} · n⌋ intervals belonging to P_s, and define (v̄_1, ..., v̄_k) to take value ω on these intervals (for different ω's we choose different intervals; since the sum of the μ_{s,ω} is 1, and we have n intervals, this is possible). On the remaining intervals, we let (v̄_1, ..., v̄_k) be 0.
We will use the following facts about this transformation. First, if the quantities μ_{s,ω} are the same for every s = 1, ..., S and ω ∈ T_δ^k for (v_1, ..., v_k) and some (u_1, ..., u_k), then the corresponding empirical distributions with respect to W′_G coincide. To see this, recall that W′_G is constant on each P_i × P_j, which means that only the average on P_j counts, and we did not change that. However, the quantities μ_{s,ω} change when we use the rounding, in order to make the function constant on each interval ((i − 1)/n, i/n). For each ω ∈ T_δ^k, we have at most one "remainder" interval, and by using the fact that this set has a fixed size, and that the partition classes have almost the same size, we get that the rounding affects the function only on a set whose measure tends to 0 as n → ∞. Thus, for large enough n, we can find functions u_1, ..., u_k: [0, 1] → T_δ such that they have the same quantities μ_{s,ω} as v_1, ..., v_k, and ∥u_j − v̄_j∥_1 ≤ ε. By Lemma 13.2 and the finiteness of the norm of W′_G, it follows that the corresponding empirical distributions are close for large enough n. Now, we use the same argument for v̄_j that we used for v_j, which is based on the proof of Lemma 2.12 and Lemma 13.2 and ∥W_G − W′_G∥_◻ ≤ ε. This yields the analogous closeness with respect to W_G. Since v̄_j is constant on each interval (i/n, (i + 1)/n), it is straightforward to assign a function h_j: [n] → T_δ to it with the same empirical distribution. To see that this is the right normalization of the adjacency matrix, recall that W_G is defined on [0, 1]^2, and the length of the intervals on which it is constant is 1/n. Putting this together, we conclude that we can find appropriate functions h_1, ..., h_k. Finally, let w_1, ..., w_k: [0, 1] → R with ∥w_j∥_∞ ≤ 1 be arbitrary functions (we omit the condition that they take values in T_δ). It is clear that we can find v_1, ..., v_k: [0, 1] → T_δ such that ∥w_j − v_j∥_1 ≤ δ. As we have seen before, this implies that the corresponding empirical distributions are close. Therefore, if δ = δ(ε) is a fixed constant smaller than ε, then, together with the first step of the proof, we can conclude that the Hausdorff distance of the k-profiles of W_G and A(G)/|V(G)| is smaller than ε for large enough n. ∎ Together with Theorem 8.2, this implies the following.
for every fixed r, the probability distribution of isomorphism classes of neighborhoods of radius r converges as i goes to infinity. It is often useful to refine this convergence notion to the local-global setting. In this framework, we put "colorings" on the vertex sets of the graphs G_i in all possible ways and look at all possible colored neighborhood statistics. This should not be confused with colored Benjamini-Schramm limits, where we put one coloring on each graph G_i. We give the formal definition of local-global limits.
We summarize the notion of local-global convergence based on [21]. A rooted graph is a graph with a distinguished vertex o called the root. The radius of a rooted graph is the maximal distance from the root over all vertices. A k-coloring of a graph is a map from its vertex set to [k]. Let G_{d,k,r} denote the set of isomorphism classes of k-colored rooted graphs of maximal degree at most d and radius at most r. Note that G_{d,k,r} is a finite set. We denote by P(G_{d,k,r}) the set of probability distributions on G_{d,k,r}. We have that P(G_{d,k,r}), together with the total variation distance d_TV, is a compact metric space. Abusing notation, we denote by d_H the Hausdorff distance for subsets of (P(G_{d,k,r}), d_TV).
Let G = (V, E) ∈ G_d, and let f: V → [k] be a k-coloring. We denote by τ_r(G, f) the probability distribution on G_{d,k,r} obtained by putting the root o at a uniformly chosen random vertex of G and then taking the colored neighborhood of o of radius r. Let Z_{k,r}(G) denote the set of all possible probability distributions τ_r(G, f), where f runs through all possible k-colorings of G. We have that Z_{k,r}(G) is a subset of P(G_{d,k,r}).
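For r = 1, the distribution τ_r(G, f) on colored rooted neighborhoods can be sketched directly (a toy illustration with our own function name `tau_1` and our own encoding of an isomorphism class as a pair (root color, sorted neighbor colors)):

```python
from collections import Counter
from fractions import Fraction

def tau_1(adj, coloring):
    """Distribution of 1-neighborhood types (root color, multiset of
    neighbor colors) for a uniformly random root: a toy version of
    tau_r(G, f) for r = 1."""
    n = len(adj)
    counts = Counter(
        (coloring[v], tuple(sorted(coloring[u] for u in adj[v])))
        for v in range(n)
    )
    return {t: Fraction(c, n) for t, c in counts.items()}

# 4-cycle with alternating 2-coloring
adj = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}
col = [1, 2, 1, 2]
dist = tau_1(adj, col)  # two neighborhood types, each with probability 1/2
```

The set Z_{k,1}(G) would then be obtained by letting the coloring range over all k^|V| possibilities.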
We say that a graph sequence {G_i}_{i=1}^∞ in G_d is local-global convergent if for every k and r the sequence {Z_{k,r}(G_i)}_{i=1}^∞ is convergent in the metric d_H. It was proved in [21] that limits of local-global convergent graph sequences can be described by certain Borel graphs called graphings. We give the formal definition below (Definition 9.2), where e(x, S) denotes the number of edges from x ∈ X to S ⊆ X.
The probability measure ν allows us to talk about random vertices in G. The colored neighborhood of a random vertex in G is a graph in G_{d,k,r}. The probability distribution τ_r(G, f), the set Z_{k,r}(G), and convergence are defined similarly to the case of finite graphs.
Finite graphs are special graphings, where X is a finite set and ν is the uniform distribution. It will be important that graphings are bounded operators on L^2(X, ν). The action is given by (vG)(x) = ∑_{y: (x,y)∈E(G)} v(y) for v ∈ L^2(X, ν). We have that ∥G∥_{2→2} ≤ d. Note that the integral formula (9.1) is equivalent to the fact that G is a self-adjoint operator. Graphings are also positivity-preserving and hence they are examples of graphops. The next theorem is proved in [36].
It states that a sequence of graphings is local-global convergent if and only if {Z_{k,r}(G_i)}_{i=1}^∞ is convergent in d_H for every fixed k ≥ 1. Our main theorem here says that, restricted to graphings, P-operator convergence is the same as local-global convergence. Consequently, P-operator convergence is a generalization of graphing convergence.

Theorem 9.2 A sequence of graphings is local-global convergent if and only if it is convergent in the metric d M .
Proof We need some preparation. Let M_{d,k} denote the corresponding set of vectors. There is a natural bijection α between M_{d,k} and G_{d,k,1} given in the following way. For a vector v ∈ M_{d,k}, let q(v) denote the unique coordinate i ∈ [k] with v_i = 1, and let s(v) := ∑_{i=k+1}^{2k} v_i. We denote by α(v) the colored star in which the color of the root (denoted by o) is q(v), the root has s(v) neighbors, and for the neighbors of o the color i ∈ [k] is used v_{i+k} times. It is clear that the isomorphism type α(v) is determined by this information and that each isomorphism type in G_{d,k,1} is obtained this way. Consequently, α is a bijection. We denote by ᾱ the bijection between the sets of probability measures P(M_{d,k}) and P(G_{d,k,1}) induced by α via the formula ᾱ(μ)(T) := μ(α^{−1}(T)). It is clear that ᾱ is continuous with respect to the given metrization on P(M_{d,k}) and P(G_{d,k,1}).
Observe that if G is a graphing (of maximal degree d), then the set Z_{k,1}(G) can be expressed through the k-profile of G. To see this, notice that colorings f: X → [k] are in a one-to-one correspondence with systems of 0-1-valued function partitions. Now the continuity of ᾱ and Theorem 7.3 finish the proof. ∎
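The bijection α between vectors and colored stars can be prototyped as an encode/decode pair (our own encoding of the isomorphism type as a pair (root color, neighbor-color counts); the helper names are hypothetical):

```python
def alpha(v, k):
    """Decode a vector v of length 2k into a colored star: the root
    color is the unique i in [k] with v[i-1] = 1, and the root has
    v[k+i-1] neighbors of color i (a sketch of the bijection alpha)."""
    root_color = v[:k].index(1) + 1
    neighbor_counts = {i + 1: v[k + i] for i in range(k) if v[k + i] > 0}
    return root_color, neighbor_counts

def alpha_inverse(root_color, neighbor_counts, k):
    """Re-encode a colored star as a vector of length 2k."""
    v = [0] * (2 * k)
    v[root_color - 1] = 1
    for color, count in neighbor_counts.items():
        v[k + color - 1] = count
    return v

# k = 3: root color 2; two neighbors of color 1, one of color 3
v = [0, 1, 0, 2, 0, 1]
star = alpha(v, 3)
```

The round trip alpha_inverse(alpha(v)) = v mirrors the claim that α is a bijection.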

Generalizations
Action convergence is based on a very general principle. We do not exploit its full generality in this paper, but as an illustration we describe a few useful generalizations.
Complex spaces: The theory developed in this paper can be generalized to operators acting on complex-valued function spaces. Most of the definitions are the same, and the proofs of the theorems require only minor changes. Note also that if a P-operator A over C has the special property that it takes real-valued functions to real-valued functions, then its k-profile over C-valued functions can be reconstructed from its 2k-profile over R by decomposing functions into real and imaginary parts.

Simultaneous convergence:
We have mentioned in the introduction that it is sometimes useful to introduce simultaneous convergence of pairs (A, f), where A ∈ B(Ω) is a P-operator and f is a measurable function on Ω. Based on the same principle, one can further generalize this to a simultaneous convergence notion for several P-operators and several functions. An especially interesting case is when matrices and their adjoint matrices are considered simultaneously; in this case, S_k(A) is defined using the actions of both A and A*. It is not clear whether this leads to a finer convergence notion for matrices or not.
Nonlinear operators: In the definition of the metric d_M, we never use the linearity of the operators. The definition of the k-profile and of the distance d_M(A, B) is meaningful for arbitrary maps. (Even multivalued functions can be allowed.) Interesting examples of nonlinear operators are finite matrices composed pointwise with nonlinear functions. For example: (x, y, z)A := ((x + y)^2, sin(y + z), z − x). Such functions arise in deep learning.
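The nonlinear example from the text can be run directly; the profile-style statistics are then obtained by pushing random inputs through the map (the sampling scaffolding and names are ours):

```python
import numpy as np

def nonlinear_A(v):
    """The nonlinear 'operator' from the text:
    (x, y, z)A := ((x + y)^2, sin(y + z), z - x)."""
    x, y, z = v
    return np.array([(x + y) ** 2, np.sin(y + z), z - x])

# Profile-style statistics: the joint empirical distribution of
# (input, output) pairs is still well defined without linearity.
rng = np.random.default_rng(0)
inputs = rng.uniform(-1, 1, size=(10_000, 3))
outputs = np.apply_along_axis(nonlinear_A, 1, inputs)
```

Nothing in the construction of d_M breaks here, which is the point of the remark.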

Random matrices
In this section, we investigate the convergence of certain dense random matrices with respect to the metric d_M. We consider a sequence of normalized random matrices (H_n) with independent zero-mean ±n^{−1/2}-valued random variables as entries. This is the same as choosing an element uniformly at random from the set of all n × n matrices with ±n^{−1/2} entries. This set will be denoted by M_n. Our goal is to prove the following.
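Sampling from M_n is easy to sketch (function name ours; the concentration of the operator norm around 2 is only illustrated numerically here, not proved):

```python
import numpy as np

def sample_H(n, rng):
    """A uniformly random element of M_n: independent entries,
    each +-n^{-1/2} with probability 1/2."""
    signs = rng.choice([-1.0, 1.0], size=(n, n))
    return signs / np.sqrt(n)

rng = np.random.default_rng(1)
H = sample_H(400, rng)
op_norm = np.linalg.norm(H, 2)  # concentrates around 2 as n grows
```

All entries have absolute value n^{-1/2} = 0.05 for n = 400, and the 2→2 norm stays bounded, which is what makes the compactness results applicable.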
Proposition For every infinite S ′ ⊆ N there exists a P-operator A and an infinite set S ⊆ S ′ such that the sequence (H j ) j∈S converges to A with respect to d M with probability 1.
We start with a statement on the concentration of measure.
Proof For 1 ≤ j ≤ n, let F_j be the σ-algebra generated by the first j columns of H_n. We apply the well-known concentration inequalities to the Doob martingale Y_j := E(d_M(M_n, H_n) | F_j). Let X_{p,q,c} be the set of matrices M with ∥M∥_{p→q} ≤ c.
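For a martingale whose increments are bounded by 1/n, as in the column-exposure argument here, Azuma's inequality gives an explicit tail bound; `azuma_bound` is our name for this standard quantity, not the paper's notation.

```python
import math

def azuma_bound(n, eta):
    """Azuma-Hoeffding tail bound for a martingale Y_0, ..., Y_n with
    increments |Y_j - Y_{j-1}| <= 1/n:
    P(|Y_n - Y_0| >= eta) <= 2 exp(-eta^2 / (2 * n * (1/n)^2))
                           = 2 exp(-eta^2 * n / 2)."""
    return 2 * math.exp(-eta ** 2 * n / 2)
```

The sum of squared increments is n · (1/n)^2 = 1/n, so the bound decays exponentially in n for every fixed η > 0, which is exactly the concentration needed in the proof.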

Lemma 11.2
There exists a sequence of matrices (M_j)_{j∈N} such that the following conditions hold, where H_j is a uniformly chosen random element of M_j.
Proof Given ε > 0, first we find a sequence of matrices around which the random matrices are concentrated with error ε. The metric space (X_{2,2,3}, d_M) is compact by Theorem 2.10; hence it contains a finite ε/8-net. We denote the size of this net by F(ε). Consider balls of radius ε/8 around the elements of this net. Let N_{ε,n} be the set of matrices satisfying the following property: in one of these balls, it is the closest element of M_n to the center (in case of equality, choose one arbitrarily). Then N_{ε,n} is an ε/4-net in M_n ∩ X_{2,2,3}, and its size is at most F(ε), as we have chosen at most one element from each ball. It follows that there exists M′_{ε,n} ∈ N_{ε,n} such that H_n is close to M′_{ε,n} with large probability. Since the operator norm of our random matrix H_n is concentrated around its expectation 2 (see, e.g., [19]), the probability P(∥H_n∥_2 > 3) tends to 0 as n goes to infinity. Therefore, for every ε > 0, the required estimate holds. This, together with Lemma 11.1 for η = ε/4 and (M′_{ε,n})_{n∈N}, implies the analogous bound for M′_{ε,n}. By combining this with Lemma 11.1 for η = ε/2, we conclude the desired concentration. The proof can be completed by a standard diagonalization argument. More precisely, we can choose a function n_0(ε) such that the above bounds hold for n ≥ n_0(ε). Then the sequence M_j = M′_{1/k(j), j} satisfies the conditions of the lemma. ∎

Proof of Proposition 11
Let (M j ) j∈N be a sequence of matrices satisfying the conditions of Lemma 11.2.By this lemma, we can choose an infinite subset S ⊆ S ′ such that (d M (M j , H j )) j∈S tends to 0 with probability 1 as j → ∞, and (M j ) j∈S converges to a P-operator A with respect to d M .To guarantee the second condition, we can use Lemma 2.6 and Theorem 2.9, because M j ∈ X 2,2,3 for all j.This S will be an appropriate subset of S ′ .∎

Hypercubes and uniform towers
The hypercube graph Q_n has vertex set {0, 1}^n, where two vertices are connected if and only if they differ in exactly one coordinate; thus Q_n has 2^n vertices and n2^{n−1} edges. This means that the sequence {Q_n}_{n=1}^∞ is very sparse, but not of bounded degree. Note that Q_n is a Cayley graph of the group Z_2^n, where {0, 1} is identified with the cyclic group Z_2 of order 2 and the generators are the basis vectors e_i, i ∈ [n], with 1 at the ith coordinate and 0 elsewhere.
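The basic parameters of Q_n can be checked with a short sketch (our own code; `hypercube` is not the paper's notation):

```python
from itertools import product

def hypercube(n):
    """Q_n as the Cayley graph of Z_2^n with the standard basis
    vectors e_i as generators: vertices are 0-1 tuples, and edges
    join tuples differing in exactly one coordinate."""
    vertices = list(product([0, 1], repeat=n))
    edges = {
        frozenset((v, tuple(v[:i] + ((v[i] + 1) % 2,) + v[i + 1:])))
        for v in vertices for i in range(n)
    }
    return vertices, edges

V, E = hypercube(4)  # 2^4 = 16 vertices, 4 * 2^3 = 32 edges, 4-regular
```

The degree n grows with the dimension, so the sequence is sparse (n2^{n-1} edges on 2^n vertices) but not of bounded degree, matching the paragraph above.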
Our goal is to show that hypercubes converge to an appropriate Cayley graph of the compact group Z_2^∞ with a carefully chosen topological basis. A topological basis is an independent set of vectors in Z_2^∞ that generates a dense subgroup of Z_2^∞. (Note that topological independence is not assumed here.) Quite surprisingly, the usual topological basis {e_i}_{i=1}^∞ of Z_2^∞ is not useful for constructing the limit of the hypercubes. The main obstacle is that {e_i}_{i=1}^∞ is a countable set, and there is no natural uniform distribution on a countably infinite set. Instead, we need to find a nice enough topological basis with uncountably many elements and a natural uniform distribution on this basis.
Since Q_n is regular, adjacency operator convergence is equivalent to random walk convergence, so we do not have to choose between them. The right scaling of the sequence is A_n := A(Q_n)/n, where A(Q_n) is the adjacency matrix of Q_n. The operator A_n is a Markov graphop, and if {A_n}_{n=1}^∞ is convergent, then the limit is also a Markov graphop (recall Theorem 3.3). As stated above, the purpose of this part of the paper is to show that these operators indeed converge and to determine the limit object. Some details regarding the general convergence will be left to the reader. We will work with the subsequence {Q_{2^n}}_{n=1}^∞, which has especially nice properties based on certain uniform mappings between Q_{2^{n+1}} and Q_{2^n}. The general convergence can be obtained from approximate versions of these uniform maps. In Figure 5, we show the adjacency matrix of the eight-dimensional hypercube Q_8 using two different orderings of the vertices. Light gray points represent zeros, and black points represent ones. The first ordering is based on the binary forms of the numbers 0, 1, 2, ..., 255, which is a rather natural way to order {0, 1}^8. In the second figure, we compose this ordering with a carefully chosen automorphism of the group Z_2^8. Quite surprisingly, it turns out that the second figure provides a more useful representation when going to the limit. There is a qualitative difference between the two types of representations of Q_8. Intuitively, the first pictures would converge to some "infinite picture" in which each vertical (and horizontal) line has countable intersection with the black points. On the other hand, the second figure fits into a sequence such that, after going to the limit, vertical (and horizontal) lines have uncountable intersections with the black points. We will see later that this helps in putting a uniform distribution on the limiting picture.
We will need the following definition, the first part of which is a generalization of the graph-theoretic notion of covering. Namely, for b = 1, we get back the usual definition. For larger b, the map can contract several neighbors of a given vertex into one single vertex, but, as condition (3) shows, only in a balanced way, i.e., the number of contracted neighbors is fixed.

Definition 12.1 (Uniform map and uniform tower
Furthermore, the first and third properties imply that (v ∘ f)A(G_2) = b · ((vA(G_1)) ∘ f). To see this, notice that the left-hand side at a given vertex u ∈ V(G_2) is the sum of v ∘ f over the neighbors of u. By the third property, exactly b neighbors of u are mapped to each neighbor of w := f(u). On the other hand, since f is a graph homomorphism, neighbors of u can be mapped only to neighbors of w. We get that the left-hand side is equal to b times the sum of the values of v at the neighbors of w, and the latter sum is exactly (vA(G_1))(w). This proves the equality. Therefore, if {G_i, f_i}_{i=1}^∞ is a tower of such maps and X denotes the inverse limit of the vertex sets, we have that X is compact with respect to the subspace topology. The map π_i: X → X_i defined by π_i(x_1, x_2, ...) := x_i is continuous. If each f_i has the property that |f_i^{−1}(v)| = |f_i^{−1}(w)| holds for every v, w ∈ X_{i+1}, then by the Kolmogorov extension theorem there is a unique Borel probability measure μ on X such that for every i the push-forward measure of μ under π_i is uniform on X_i. We call μ the uniform measure on X.

Definition 12.2 Let {G_i, f_i}_{i=1}^∞ be a uniform tower such that G_i is finite and d_i-regular for i ∈ N. Let V be the inverse limit of {V(G_i), f_i}_{i=1}^∞. For every x ∈ V, let N(x) ⊆ V denote the inverse limit of the sets of neighbors of the π_i(x), and let ν_x denote the uniform measure on N(x) (more precisely, as the previous argument shows, there is a uniform measure on V, as it was constructed as the inverse limit of the finite sets V(G_i), and this induces a uniform measure on N(x)). Let A be the P-operator in B_{2,2}(V, μ) defined by (fA)(x) = ∫_V f dν_x. We say that A is the inverse limit of the tower {G_i, f_i}_{i=1}^∞.
Theorem 12.2 (Convergence of uniform towers) Let {G_i, f_i}_{i=1}^∞ be a uniform tower such that G_i is d_i-regular for i ∈ N. Then {A(G_i)/d_i} is a convergent sequence of P-operators, and the limit object is the inverse limit of {G_i, f_i}_{i=1}^∞. In particular, the P-operator A constructed above is the limit of the graph sequence {Q_{2^n}}.

Proof
We conclude with some further remarks on the convergence of hypercubes. Similarity of large-dimensional hypercubes is interesting even without knowing the limit object. We only establish it for powers-of-two dimensions. Note that it is true in general, but this requires an asymptotic version of our covering techniques. This extension of our proof is more technical, but the main idea is essentially the same. Our results imply that various extra structures on hypercubes behave similarly in different large dimensions. For example, this includes entry statistics of almost eigenvectors, even in the simultaneous setting where many almost eigenvectors are viewed together. (For a more precise formulation of this problem, see [2].) We emphasize that our results on hypercubes serve more as a theoretical foundation for a large class of questions than as a complete description of their properties. Any problem that is expressible in terms of the sets S_k can now be reduced to the study of the limit object, which is a natural algebraic structure. However, this direction is new and remains to be explored.

Product graphs
The product of two graphs G_1 and G_2 is the graph on V(G_1) × V(G_2) such that ((i, j), (k, l)) ∈ E(G_1 × G_2) if and only if (i, k) ∈ E(G_1) and (j, l) ∈ E(G_2). Graph sequences formed by the powers of a given graph are good test cases for limit theories. We have that 2|E(G_i)| = (2|E(G)|)^i = |V(G_i)|^β, where β := log(2|E(G)|)/log|V(G)|. The number 0 ≤ β ≤ 2 expresses the exponent of the growth rate of the number of edges in terms of the number of vertices in {G_i}_{i=1}^∞. One can view G_i as a fractal-like graph (see Figure 6).
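A quick way to sketch this product is via the Kronecker product of adjacency matrices, since for this product A(G_1 × G_2) = A(G_1) ⊗ A(G_2); the example below checks the edge-count identity |E(G_1 × G_2)| = 2|E(G_1)||E(G_2)| (code and names are ours):

```python
import numpy as np

def graph_product(A1, A2):
    """Adjacency matrix of the product graph: ((i,j),(k,l)) is an edge
    iff (i,k) is an edge of G1 and (j,l) is an edge of G2. This is
    exactly the Kronecker product of the adjacency matrices."""
    return np.kron(A1, A2)

# K_3 times K_2
A_K3 = np.ones((3, 3)) - np.eye(3)   # 3 edges
A_K2 = np.array([[0.0, 1.0],
                 [1.0, 0.0]])        # 1 edge
P = graph_product(A_K3, A_K2)
edges = int(P.sum()) // 2            # |E| = 2 * 3 * 1 = 6
```

Iterating this construction on a single graph G yields the powers G_i whose edge growth exponent β appears above.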
When G is d-regular, we can use Theorem 12.2 to compute the limit object of {G_i}_{i=1}^∞. The main observation is that the map π_i: V(G_{i+1}) → V(G_i) given by the projection to the first i coordinates is uniform, and thus {G_i, π_i}_{i=1}^∞ is a uniform tower. It is easy to see that the inverse limit is simply given by the infinite power G^∞ with the uniform distribution on the vertices and on the edges. According to Theorem 12.2, this inverse limit is basically the limit of the normalized P-operator sequence {A(G_i)/d_i}_{i=1}^∞. The corresponding graphop A ∈ B(V(G)^∞) is given by (vA)(x) = E_{(x,y)∈E(G^∞)} v(y), where the expected value is calculated according to the product measure on the neighbors of x. More precisely, if x = (x_1, x_2, ...) ∈ V(G^∞) is fixed, then the set of neighbors of x is the infinite product N(x_1) × N(x_2) × ⋯. We define ν_x as the product of the uniform measures on the sets N(x_i), and the expected value is taken with respect to ν_x.
Similarly to the previous case, by approximating L^2 functions with continuous ones, it can be proved that M is indeed the limit of the M_n, and hence the sequence of 2-subdivisions of complete graphs converges to this P-operator according to random walk convergence.

Incidence graphs of finite projective planes
Let q be a prime power, and let P(q) denote the projective plane over the finite field with q elements. The plane P(q) has q^2 + q + 1 lines and q^2 + q + 1 points. We denote by G_q the bipartite graph whose vertices are the lines and the points of P(q) and whose edges are the incidences in P(q): a line l is connected to a point p if l contains p. We have that G_q is (q + 1)-regular, |V(G_q)| = 2(q^2 + q + 1), and |E(G_q)| = (q^2 + q + 1)(q + 1). It follows that the sequence G_q is an intermediate density sequence: the number of edges is roughly the 3/2 power of the number of vertices.
The proof is based on the fact that the eigenvalues of G_q are known to be q + 1, −q − 1, √q, −√q with multiplicities 1, 1, q^2 + q, q^2 + q. The two eigenvalues q + 1 and −q − 1 belong to the constant 1 vector v_1 and to the vector v_2, which takes the value 1 at points and −1 at lines. Let B_q := (v_1^* v_1 − v_2^* v_2)/(2(q^2 + q + 1)). Then ∥A(G_q)/(q + 1) − B_q∥_{2→2} = √q/(q + 1) ≤ q^{−1/2}. It follows from Lemma 2.13 that d_M(A(G_q)/(q + 1), B_q) ≤ 3∥A(G_q)/(q + 1) − B_q∥_{2→2}^{1/2} ≤ 3q^{−1/4}, and hence the limit of A(G_q)/(q + 1) is the same as the limit of B_q as the prime power q goes to infinity. On the other hand, B_q is twice the normalized adjacency matrix of the complete bipartite graph with equal color classes on 2(q^2 + q + 1) vertices. This proves the claim.
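For q = 2, these spectral facts can be checked numerically on the Fano plane (the line set below is one standard labeling of P(2); variable names are ours):

```python
import numpy as np

# The Fano plane P(2): 7 points, 7 lines (one standard labeling)
lines = [{1, 2, 3}, {1, 4, 5}, {1, 6, 7}, {2, 4, 6},
         {2, 5, 7}, {3, 4, 7}, {3, 5, 6}]

# Bipartite incidence graph G_q for q = 2: 7 points + 7 lines
n_pts = 7
A = np.zeros((14, 14))
for j, line in enumerate(lines):
    for p in line:
        A[p - 1, n_pts + j] = A[n_pts + j, p - 1] = 1.0

eigs = np.sort(np.linalg.eigvalsh(A))
# Expected spectrum: q+1 = 3 and -(q+1) = -3 with multiplicity 1 each,
# and +-sqrt(q) = +-sqrt(2) with multiplicity q^2 + q = 6 each.
```

The gap between the two dominant eigenvalues and the bulk at ±√q is what makes the rank-two approximation B_q accurate in the 2→2 norm.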
The next question illustrates that this does not end the limiting investigation of G_q. We can look at it on a finer scale by removing the two dominant eigenvectors and normalizing with a different constant (Figure 7).

Question 12.3 Let B′_q := (A(G_q)/(q + 1) − B_q) · q^{1/2}. We have that ∥B′_q∥_{2→2} = 1. Does the sequence of P-operators B′_q converge as the prime power q goes to infinity? If yes, what is the limit object? Note that by compactness we know that B′_q has convergent subsequences.

Appendix (technical lemmas)
Recall from equation (2.1) that D(v) denotes the empirical distribution of a vector, while (2.4) says that for a probability measure μ on R^k we defined τ(μ) ∈ [0, ∞].

Lemma 13.3 Let q ∈ (1, ∞), and let X be a real-valued random variable with E(|X|^q) = c < ∞. Then for z ∈ R_+ we have that E|f_z(X) − X| ≤ cz^{1−q}.
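Lemma 13.3 can be sanity-checked numerically. We assume here that f_z acts by clipping at height z (an assumption on our part; the paper defines f_z earlier in the text), for which the per-sample inequality |f_z(x) − x| ≤ |x|^q z^{1−q} holds whenever q > 1:

```python
import numpy as np

def f_z(x, z):
    """Truncation at height z (an assumed form of f_z): clip x to [-z, z]."""
    return np.clip(x, -z, z)

# Empirical check of Lemma 13.3: if E|X|^q = c, then E|f_z(X) - X| <= c z^{1-q}
rng = np.random.default_rng(2)
X = rng.standard_normal(100_000)
q, z = 2.0, 3.0
c = np.mean(np.abs(X) ** q)
lhs = np.mean(np.abs(f_z(X, z) - X))
rhs = c * z ** (1 - q)
```

Since the inequality |f_z(x) − x| ≤ |x|^q z^{1−q} holds pointwise, it holds for the sample means as well, so the check is deterministic rather than statistical.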

Proof
The statement E(|Y|^q) ≤ c follows from the compactness of {μ ∈ P(R) : ∫ |x|^q dμ ≤ c} in the weak topology. Since f_z is continuous with compact support,

Figure 1: Two probability measures of the form μ_v in the profile of a 2,000 × 2,000 random matrix.

Figure 2: Graph ⇒ operator ⇒ action ⇒ measure (computing an element in the 1-profile of a graph).
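The pipeline of Figure 2 (graph ⇒ operator ⇒ action ⇒ measure) can be sketched in a few lines (a toy version with our own names): a vector v on the vertices together with its image under the normalized operator gives n points in R^2, whose uniform distribution is the measure D_A(v) in the 1-profile.

```python
import numpy as np

def one_profile_element(A, v):
    """The 'graph => operator => action => measure' pipeline of
    Figure 2: return the n sample points (v_i, (Av)_i) in R^2 whose
    uniform distribution is the empirical measure D_A(v)."""
    return np.column_stack([v, A @ v])

# Path on 3 vertices, normalized by |V| = 3, with a test vector v
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float) / 3
v = np.array([1.0, -1.0, 1.0])
pts = one_profile_element(A, v)  # 3 points in the plane
```

Varying v over the unit ball traces out the whole 1-profile S_1(A) of the operator.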

Definition 4.1 (Free semigroup with operators) Let G and L be sets. We denote by F(G, L) the free semigroup with generator set G and operator set L (freely acting on F(G, L)). More precisely, we have that F(G, L) is the smallest set of abstract words satisfying the following properties.
Now we return to the proof of Lemma 4.1.

https://doi.org/10.4153/S0008414X2000070X Published online by Cambridge University Press

Remark 5.3 (Representing graphops by Markov pairs) In general, every graphop A ∈ B(Ω, A, μ) can be naturally represented by a Markov pair in the following way.

Remark 6.5 (Markov graphops as couplings) Markov graphops are special graphops such that 1_Ω A = 1_Ω. It follows that the marginal distribution of ν on Ω is equal to

Figure 4: The spherical graphop. It is neither a graphon nor a graphing: it is somewhere halfway in between.
Let d_H denote the corresponding Hausdorff distance (recall Definition 2.4).

Definition 9.2 (Graphing) Let X be a Polish topological space, and let ν be a probability measure on the Borel sets of X. A graphing is a graph G on V(G) = X with Borel measurable edge set E(G) ⊂ X × X in which every vertex has degree at most d and

∫_A e(x, B) dν(x) = ∫_B e(x, A) dν(x) (9.1)

holds for all measurable sets A, B ⊆ X.

Figure 5: Two representations of the adjacency matrix of the hypercube in dimension 8.

Figure 7: The Fano plane P(2) and the matrix of the incidence graph of the projective plane over the field F_9.
a growing sequence of finite graphs, then the action convergence of {A(G_i)/|V(G_i)|}_{i=1}^∞ is equivalent to dense graph convergence.

Benjamini-Schramm and local-global limits are used in the study of bounded degree graphs. Let d be a fixed number, and let G_d denote the set of isomorphism classes of graphs with maximal degree at most d. Informally speaking, a graph sequence

Lemma 11.1 For every n ∈ N, let M_n ∈ M_n be fixed, and let H_n be a uniformly chosen element of M_n. Then for every η > 0, we have the following concentration bound. If two matrices in M_n differ only in a single column, then the distance of D_A(v_1, ..., v_k) and D_B(v_1, ..., v_k) is at most 1/n in the Lévy-Prokhorov metric (for arbitrary k and vectors v_j), because the two measures coincide everywhere except on an event of probability 1/n. This implies d_M(A, B) ≤ 1/n. Hence |Y_j − Y_{j−1}| ≤ 1/n holds for every j = 1, 2, ..., n. Therefore, by Azuma's inequality, we have the stated bound.

Observe that ∥A(G_i)/d_i∥_2 = 1 and that if f_i is (a_i, b_i)-uniform, then d_{i+1} = d_i b_i. We have by Lemma 12.1 that A(G_i)/d_i ≺ A(G_{i+1})/d_{i+1} holds for every i ∈ N. It follows by compactness that S_k(A(G_i)/d_i) converges to ∪_{i=1}^∞ S_k(A(G_i)/d_i) in d_H as i goes to infinity. Let A ∈ B_{2,2}(V, μ) be the inverse limit of the tower {G_i, f_i}_{i=1}^∞, with the uniform measure on the edges. More precisely, let A denote the P-operator in B_{2,2}(G^∞, μ) defined by (fA)(x) = ∫_{z∈Q} f(x + z) dν(z).