Tree limits and limits of random trees

We explore the tree limits recently defined by Elek and Tardos. In particular, we find tree limits for many classes of random trees. We give general theorems for three classes of random trees: conditioned Galton-Watson trees and simply generated trees; split trees and generalized split trees (as defined here); and trees defined by a continuous-time branching process. These general results include, for example, random labelled trees, ordered trees, random recursive trees, preferential attachment trees, and binary search trees.

In this case we equip P(X) with the standard weak topology, see e.g. [11]. If X is a Polish space, then so is P(X), see [11, Appendix III] or [12, Theorem 8.9.5].
The Dirac measure (unit point mass) at a point x is denoted $\delta_x$. If μ is a probability measure on a space X, then $\xi \sim \mu$ and $\mu = \mathcal{L}(\xi)$ both denote that ξ is a random element of X with distribution μ.

Limits
Unspecified limits are as n → ∞.
As usual, w.h.p. (with high probability) means with probability tending to 1 as a parameter (here always n) tends to ∞.
If $Z$ and $Z_n$ are random elements of a metric space X, then $Z_n \xrightarrow{\mathrm{d}} Z$, $Z_n \xrightarrow{\mathrm{p}} Z$, and $Z_n \xrightarrow{\mathrm{a.s.}} Z$ denote convergence in distribution, in probability, and almost surely (a.s.), respectively. Note that $Z_n \xrightarrow{\mathrm{d}} Z$ is the same as convergence in P(X) of the distributions, i.e., $\mathcal{L}(Z_n) \to \mathcal{L}(Z)$. If $(a_n)_n$ is a sequence of positive numbers, then $o_{\mathrm{p}}(a_n)$ denotes a sequence of random variables $Z_n$ such that $Z_n/a_n \xrightarrow{\mathrm{p}} 0$; this is equivalent to $|Z_n|/a_n < \varepsilon$ w.h.p. for every $\varepsilon > 0$.

Miscellaneous
If T is a tree, we abuse notation and write T for its vertex set V(T). The number of vertices is denoted by |T|. If T is a rooted tree, then the root is denoted by o.
If x, y ∈ R, then x ∧ y := min{x, y}. On the other hand, if v and w are vertices in a rooted tree, then v ∧ w denotes their last common ancestor.

Convergence of trees and long dendrons
We give here a summary of the main definitions and results of [20], together with some further notation.

Convergence of trees
For $r \geq 1$, let $M_r$ be the space of real $r \times r$ matrices; note that $M_r = \mathbb{R}^{r^2}$ is a Polish space, and thus $P(M_r)$ is a Polish space.
For a set X with a given function $d: X^2 \to \mathbb{R}$, and $r \geq 1$, let $\rho_r: X^r \to M_r$ be the map given by the entries
$$\rho_r(x_1, \ldots, x_r)_{ij} = \rho_r(x_1, \ldots, x_r; X, d)_{ij} := \begin{cases} d(x_i, x_j), & i \neq j, \\ 0, & i = j. \end{cases} \quad (3.1)$$
We often consider $\rho_r$ when d is a metric on X; then the special definition in (3.1) when i = j is redundant. However, for the long dendrons defined below, we typically have d(x, x) > 0, and then the definition (3.1) is important. See also Remark 3.10.
A metric measure space is a triple (X, d, μ), where X is a measurable space (so $X = (X, \mathcal{F})$ with $\mathcal{F}$ hidden in the notation), $\mu \in P(X)$, and $d: X^2 \to \mathbb{R}$ is a measurable metric on X.
Suppose, more generally, that $X = (X, \mathcal{F}, \mu)$ is a probability space and that $d: X^2 \to \mathbb{R}$ is a measurable function. For $r \geq 1$, define the sampling measure
$$\tau_r(X) = \tau_r(X, d, \mu) := \rho_r(\mu^r) \in P(M_r), \quad (3.2)$$
the push-forward of the measure $\mu^r \in P(X^r)$ along $\rho_r$. In other words, if $\xi_1, \ldots, \xi_r$ are i.i.d. random points in X with $\xi_i \sim \mu$, then
$$\tau_r(X) := \mathcal{L}\bigl(\rho_r(\xi_1, \ldots, \xi_r; X)\bigr), \quad (3.3)$$
the distribution of the random matrix $\rho_r(\xi_1, \ldots, \xi_r) \in M_r$.

A finite tree T is regarded as a metric space $(T, d_T)$, where $d_T$ is the graph distance. Furthermore, if c > 0, we let cT denote the metric space $(T, c\,d_T)$, where all distances are rescaled by c. We regard cT as a metric probability space by equipping it with the uniform measure $\mu_T$ defined by $\mu_T\{x\} = 1/|T|$ for $x \in T$. Then $\tau_r(cT) \in P(M_r)$ is defined by (3.2).
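To make the construction concrete, the sampling measure $\tau_r(cT)$ of a finite tree can be approximated by Monte Carlo: draw r i.i.d. uniform vertices, record the matrix $\rho_r$ of rescaled graph distances (zero on the diagonal, as in (3.1)), and repeat. A minimal sketch in Python; the example tree, the scaling c, and the sample count are illustrative choices, not from the text:

```python
import random
from collections import deque

def graph_distances(adj, source):
    """BFS distances from `source` in an unweighted tree given as an adjacency dict."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist

def rho_r(vertices, dist, c=1.0):
    """The matrix rho_r(x_1,...,x_r): c*d(x_i, x_j) off the diagonal, 0 on it (cf. (3.1))."""
    return [[0.0 if i == j else c * dist[v][w] for j, w in enumerate(vertices)]
            for i, v in enumerate(vertices)]

def sample_tau_r(adj, r, c=1.0, samples=1000, seed=0):
    """Monte Carlo draws from the sampling measure tau_r(cT) of (3.2)-(3.3)."""
    rng = random.Random(seed)
    nodes = list(adj)
    dist = {v: graph_distances(adj, v) for v in nodes}  # all-pairs distances
    return [rho_r([rng.choice(nodes) for _ in range(r)], dist, c)
            for _ in range(samples)]

# Path P_5 on vertices 0..4, rescaled by c = 1/5.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
draws = sample_tau_r(adj, r=3, c=1/5)
```

Each draw is a symmetric matrix with zero diagonal; the empirical distribution of the draws approximates $\tau_3(\frac15 P_5)$.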
Definition 3.1. Let $(T_n)_1^\infty$ be a sequence of finite trees and $(c_n)_1^\infty$ a sequence of positive numbers. Then the sequence $(c_n T_n)_1^\infty$ converges if the sampling measures converge for every fixed r, i.e., if there exist $\lambda_r \in P(M_r)$ such that, as $n \to \infty$,
$$\tau_r(c_n T_n) = \tau_r(T_n, c_n d_{T_n}, \mu_{T_n}) \to \lambda_r \text{ in } P(M_r), \quad r \geq 1. \quad (3.4)$$
By (3.3), the condition (3.4) is equivalent to convergence in distribution of the random matrices $\rho_r(\xi_1^{(n)}, \ldots, \xi_r^{(n)}; c_n T_n)$, where for each n, $(\xi_i^{(n)})_i$ are i.i.d. uniform random vertices of $T_n$.

Remark 3.2.
As said in the introduction, Elek and Tardos [20] consider only the normalization $c_n = 1/\operatorname{diam}(T_n)$, but we will not assume this.
A real tree is a complete non-empty metric space (T, d) such that for any pair of distinct points $x, y \in T$, there exists a unique isometric map $\alpha: [0, d(x, y)] \to T$ with $\alpha(0) = x$ and $\alpha(d(x, y)) = y$, and furthermore, for every $s \in (0, d(x, y))$, x and y are in different components of $T \setminus \{\alpha(s)\}$. (There are several different but equivalent versions of the definition; see e.g. [17, 18, 36, 37].)

Remark 3.3. Note that we define real trees as complete (as do [20]); this is often not required. For our purposes completeness is convenient and no real loss of generality; if T is an incomplete real tree (defined as above without completeness), then the completion $\overline{T}$ is also a real tree (see e.g. [17, Theorem 8]), and in the limit theory below we can use $\overline{T}$ instead of T.
If T = (T, d) is a real tree and c > 0, let cT := (T, cd). Then cT is also a real tree. A measured real tree is a real tree T = (T, d) equipped with a probability measure μ. We will only consider separable trees T and Borel measures μ, and then (T, d, μ) is always a metric measure space. (For non-separable measured real trees, see [20], where they e.g. are used in the proofs; then μ might be defined on a smaller σ-field than the Borel one, and the condition that d has to be measurable is added. See also Remark 3.10.)

Example 3.4. If T is any finite tree (in the usual combinatorial sense), let $\widehat{T}$ denote the real tree obtained by regarding each edge in T as an interval of length 1. Then $\widehat{T}$ is a compact real tree, and T is isometrically embedded as a subset of $\widehat{T}$. Hence, we can regard $\mu_T$ as a probability measure on $\widehat{T}$, and $(\widehat{T}, \mu_T) = (\widehat{T}, d, \mu_T)$ is a measured real tree. Obviously, $\tau_r(T) = \tau_r(\widehat{T}, \mu_T)$. More generally, cT is isometrically embedded in $c\widehat{T}$ for any c > 0, and
$$\tau_r(cT) = \tau_r(c\widehat{T}, \mu_T). \quad (3.5)$$
We can therefore sometimes identify $\widehat{T}$ and T; see Section 5.
Consequently, we can regard Definition 3.1 as a special case of the following definition.
Definition 3.5. Let $(T_n)_1^\infty = (T_n, d_n, \mu_n)_1^\infty$ be a sequence of measured real trees. Then the sequence $(T_n)_1^\infty$ converges if the sampling measures converge for every fixed r, i.e., if there exist $\lambda_r \in P(M_r)$ such that, as $n \to \infty$,
$$\tau_r(T_n) \to \lambda_r \text{ in } P(M_r), \quad r \geq 1. \quad (3.6)$$
Again, (3.6) is equivalent to convergence in distribution of the random matrices $\rho_r(\xi_1^{(n)}, \ldots, \xi_r^{(n)}; T_n)$, where, for each n, $(\xi_i^{(n)})_i$ are i.i.d. random points in $T_n$ with $\xi_i^{(n)} \sim \mu_n$.
Remark 3.6. Gromov [22, Chapter 3½] studied general complete separable metric measure spaces (with a finite Borel measure, which we may normalize to be a probability measure as above). He defined the Gromov-Prohorov metric (see Villani [50, p. 762] for another version, and Löhr [38] for the equivalence), and he also considered convergence in the sense above, i.e., $\tau_r(X_n) \to \tau_r(X)$ for every r, where $X_n$ and X are metric measure spaces; it turns out that this is equivalent to convergence in the Gromov-Prohorov metric, see Greven, Pfaffelhuber and Winter [21]; see also [31]. Gromov [22, 3½.14 and 3½.18] noted also that it is possible that $\tau_r(X_n)$ converges for every r to some limit measure, but that there is no metric measure space X that is the limit. (One of Gromov's examples is the sequence of unit spheres $S^n$ with uniform measure, which behave as in Theorem 3.13 below with almost all distances being almost equal; a metric measure space limit would have to have almost all distances equal to some positive constant, which is impossible for separable spaces.) The new idea by Elek and Tardos [20] is to define another type of limit object (long dendrons) that works in general when $X_n$ are trees.

Long dendrons
Elek and Tardos [20] defined limit objects as follows. Note that a real tree T is locally connected (and locally pathwise connected); thus, if p ∈ T, then T \ {p} is the disjoint union of one or several (possibly infinitely many) open connected components; these are called p-branches. A branch of T is a p-branch for some p ∈ T.
An isomorphism between two long dendrons $D = (T, d, \nu)$ and $D' = (T', d', \nu')$ is an isometry f from (T, d) onto (T', d') such that the induced mapping $\hat{f}: (t, x) \mapsto (f(t), x)$ of $A_D$ onto $A_{D'}$ maps ν to ν'.

We call the real tree T the base of the long dendron D; we may identify T with $T \times \{0\} \subset A_D$. It is shown in [20, Lemma 6.2] that the base T of a long dendron is necessarily separable; thus T and $A_D$ are Polish spaces.

Remark 3.8.
Elek and Tardos [20] also define a dendron as a long dendron such that if $\xi_1, \xi_2 \in A_D$ are i.i.d. random points with distribution ν, then $d_D(\xi_1, \xi_2) \leq 1$ a.s. These are the limit objects for real trees with diameter 1, and thus for trees with the Elek-Tardos normalization in Remark 3.2, but they have no special importance in the present paper. We may call them short dendrons. (For consistency with [20], we keep the name long dendron, although for our purposes it would be more natural to change notation and call them dendrons.)

For a long dendron $D = (T, d, \nu)$, we use again (3.2)-(3.3) and define the sampling measure
$$\tau_r(D) := \mathcal{L}\bigl(\rho_r(\xi_1, \ldots, \xi_r; A_D, d_D)\bigr), \quad (3.8)$$
i.e., the distribution of the random matrix $\rho_r(\xi_1, \ldots, \xi_r)$, where $\xi_1, \ldots, \xi_r$ are i.i.d. random points in $A_D$ with distribution ν.

Convergence of finite or real trees to a long dendron is defined by adding to Definitions 3.1 and 3.5 that the limits of the sampling measures are the sampling measures for the limit.

Definition 3.9. Let $(T_n)_1^\infty$ be a sequence of finite trees and $(c_n)_1^\infty$ a sequence of positive numbers, and let D be a long dendron. Then the sequence $(c_n T_n)_1^\infty$ converges to D if, as $n \to \infty$,
$$\tau_r(c_n T_n) \to \tau_r(D) \text{ in } P(M_r), \quad r \geq 1. \quad (3.9)$$
Similarly, if $(T_n)_1^\infty$ is a sequence of real trees and D a long dendron, then $T_n$ converges to D if, as $n \to \infty$,
$$\tau_r(T_n) \to \tau_r(D) \text{ in } P(M_r), \quad r \geq 1. \quad (3.10)$$
Again, (3.9) and (3.10) are equivalent to convergence in distribution of the random matrices $\rho_r(\xi_1^{(n)}, \ldots, \xi_r^{(n)})$, where $(\xi_i^{(n)})_i$ are i.i.d. as above.

In general, we have to attach a continuum of half-lines at each point t (so that each half-line has measure 0), for example by defining $T' := T \times \mathbb{C}$, regarded as T with the half-lines $\{(t, re^{i\theta}): r \geq 0\}$ attached, for every $t \in T$ and $\theta \in [0, 2\pi)$, and with the measure ν' on T' equal to the push-forward by the map $(t, r, \theta) \mapsto (t, re^{i\theta})$ of the measure $\nu \times d\theta/2\pi$. Note that then $\tau_k(T') = \tau_k(D)$ for every $k \geq 1$. (Note how the special definition for i = j in (3.1) interacts with (3.7) to give the desired result.) However, we agree with Elek and Tardos [20] that it is more convenient to use long dendrons as limit objects.
One reason is that the trees just constructed are nonseparable, and that the measures are not Borel measures on T . (There are plenty of nonmeasurable open sets.) Another reason is that long dendrons provide uniqueness of the limits in a simple way.

Two examples
The following examples of long dendrons are rather simple, and extreme in the sense that the measure ν on A D = T × [0, ∞) is supported on a 'one-dimensional' set with one of the coordinates fixed. Nevertheless, these two examples will play the main role in our limit theorems for random trees.
A measured real tree $T = (T, d, \mu)$ may be regarded as a long dendron $D = (T, d, \nu)$, where ν is the push-forward of μ under the embedding $t \mapsto (t, 0)$ of T into $A_D = T \times [0, \infty)$. Note that if $\xi \sim \mu$, then $(\xi, 0) \sim \nu$. Since (3.7) implies that $d_D$ equals d on $T \times \{0\} = T$, it follows that $\tau_r(D) = \tau_r(T)$ for every $r \geq 1$. We may thus identify the long dendron D with the measured real tree T.
By Remark 3.6, convergence of a sequence of (real) trees to D as in Definition 3.9 is equivalent to convergence to T in the Gromov-Prohorov metric.

In the opposite extreme case, the base T consists of a single point o, and we may identify $A_D = \{o\} \times [0, \infty)$ with $[0, \infty)$. Thus every probability measure ν on $[0, \infty)$ defines a long dendron $\Upsilon_\nu := (T, d, \nu)$.
By (3.1) and (3.7), the random matrix $\rho_r(\xi_1, \ldots, \xi_r)$ has entries $\xi_i + \xi_j$ for $i \neq j$ and 0 for $i = j$, where $\xi_1, \ldots, \xi_r$ are i.i.d. with distribution ν. (3.11)

A particularly simple, and important, case is when $\nu = \delta_a$ for some $a \geq 0$. In this case we denote the long dendron by $\Upsilon_a$, and note that $\xi_i = a$ is non-random, and thus (3.11) shows that $\rho_r(\xi_1, \ldots, \xi_r)$ is the constant matrix with entries $2a$ off the diagonal and 0 on the diagonal. (3.12)

This leads to the following simple characterization of convergence to the long dendron $\Upsilon_a$.
Theorem 3.13. Let $(c_n T_n)_n$ be a sequence of rescaled trees, and $a \geq 0$. Then $c_n T_n \to \Upsilon_a$ if and only if
$$c_n d_n(\xi_1^{(n)}, \xi_2^{(n)}) \xrightarrow{\mathrm{p}} 2a, \quad (3.13)$$
where $d_n$ is the graph distance in $T_n$ and $(\xi_i^{(n)})_i$ are i.i.d. uniformly random vertices in $T_n$. The same holds, mutatis mutandis, for a sequence $(T_n, d_n, \mu_n)$ of measured real trees.
Proof. Convergence in distribution to a constant is the same as convergence in probability. Thus, (3.3), (3.1) and (3.12) show that Definition 3.9 now yields (3.14) By symmetry, it suffices to consider the case i = 1, j = 2.
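Theorem 3.13 reduces convergence to $\Upsilon_a$ to a single two-point statistic: the rescaled distance between two i.i.d. uniform vertices must concentrate at a constant. For the star $S_n$ of Example 7.2 below, this distance equals 2 with probability $1 - O(1/n)$, which can be checked numerically. A sketch (the sample sizes are arbitrary choices):

```python
import random

def star_distance(n, rng):
    """Graph distance in the star S_n = K_{n-1,1} between two i.i.d.
    uniform vertices. Vertex 0 is the centre; 1..n-1 are peripheral."""
    u, v = rng.randrange(n), rng.randrange(n)
    if u == v:
        return 0
    if u == 0 or v == 0:  # one endpoint is the centre
        return 1
    return 2  # two distinct peripheral vertices

rng = random.Random(42)
n, trials = 10_000, 5_000
freq2 = sum(star_distance(n, rng) == 2 for _ in range(trials)) / trials
# freq2 is close to (n-1)(n-2)/n^2 = 1 - O(1/n)
```

Since the distance concentrates at the constant 2, the two-point condition of Theorem 3.13 is satisfied (with the trivial scaling $c_n = 1$).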

Limit theorems
Some of the main results of Elek and Tardos [20] are the following, here somewhat reformulated.

This is not stated in quite this generality in [20]; we show in Section 15 how it follows from other results in [20]. (We postpone this proof until the end of the paper because it uses arguments from [20] quite different from the other arguments in the present paper.)

Again, this is not stated in quite this form in [20], but it is a simple consequence of [20, Lemmas 7.1 and 7.2]; we omit the details.

Infinite matrices
We extend the definitions in Section 3 to the case $r = \infty$, i.e. to infinite matrices. Let $M_\infty$ be the space of infinite real matrices $(a_{ij})_{i,j=1}^\infty$. Define $\rho_r$ and $\tau_r$ by (3.1) and (3.2)-(3.3) also for $r = \infty$.

Given any $A = (a_{ij})_{i,j=1}^s \in M_s$, with $r \leq s \leq \infty$, define the restriction $\pi_r(A) := (a_{ij})_{i,j=1}^r$, i.e., the $r \times r$ top left corner of A. Furthermore, if $A \in M_s$ is a random matrix with distribution $\nu \in P(M_s)$, we denote the distribution of $\pi_r(A)$ by $\pi_r(\nu) \in P(M_r)$. (This is the push-forward of ν, see (2.1).) In other words, $\pi_r(\nu)$ is the marginal distribution of the $r \times r$ top left corner.

Say that a sequence $\lambda_r \in P(M_r)$, $1 \leq r < \infty$, is consistent if $\pi_r(\lambda_s) = \lambda_r$ when $r \leq s$. If $\lambda \in P(M_\infty)$, then the sequence $\lambda_r := \pi_r(\lambda)$ is obviously consistent. Conversely, every consistent sequence arises in this way for a unique $\lambda \in P(M_\infty)$; the corresponding statement for distributions of random vectors in $\mathbb{R}^\infty$ is well-known [34, Theorem 6.14], and the result for $M_\infty$ follows immediately by reading the entries of the matrices in a suitable fixed order. Furthermore, if $\lambda, \lambda_n \in P(M_\infty)$, then
$$\lambda_n \to \lambda \text{ in } P(M_\infty) \text{ if and only if } \pi_r(\lambda_n) \to \pi_r(\lambda) \text{ in } P(M_r) \text{ for every finite } r. \quad (4.2)$$
Again, this follows immediately from the corresponding well-known fact for $\mathbb{R}^\infty$ [11, p. 19].

A sequence $\tau_r(X)$, $r \geq 1$, given by (3.3) is obviously consistent; furthermore, $\tau_r(X) = \pi_r(\tau_\infty(X))$ for every r. Consequently, (4.2) implies the following.
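The consistency property used here is easy to see concretely: restricting the matrix of pairwise distances of s sampled points to its top-left r × r corner gives exactly the distance matrix of the first r points, so the sampling measures form a consistent family. A small sketch (the metric space [0, 1] with the usual distance is an illustrative choice):

```python
import random

def rho(points, d):
    """rho_r as in (3.1): d(x_i, x_j) off the diagonal, 0 on it."""
    r = len(points)
    return [[0.0 if i == j else d(points[i], points[j]) for j in range(r)]
            for i in range(r)]

def restrict(A, r):
    """Top-left r x r corner of the matrix A."""
    return [row[:r] for row in A[:r]]

# Restricting rho_5 of 5 i.i.d. points to its 3 x 3 corner equals rho_3
# of the first 3 points, entry by entry: the marginals are consistent.
d = lambda x, y: abs(x - y)          # metric on [0, 1]
rng = random.Random(1)
pts = [rng.random() for _ in range(5)]
assert restrict(rho(pts, d), 3) == rho(pts[:3], d)
```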
Theorem 4.1. A sequence of measured real trees converges in the sense of Definition 3.5 if and only if the sampling measures $\tau_\infty(T_n)$ converge in $P(M_\infty)$, i.e., if and only if the infinite random matrices $\rho_\infty(\xi_1^{(n)}, \xi_2^{(n)}, \ldots)$ converge in distribution in $M_\infty$. In particular, the same result holds for a sequence $(c_n T_n)_1^\infty$ of rescaled finite trees.
Proof. This follows from the remarks before the theorem. Note that if (3.6) holds for every $r \geq 1$, then $(\lambda_r)_r$ is a consistent sequence, since $(\tau_r(T_n))_r$ is for every n.

Abstract tree limits
Based on the preceding section, we can define tree limits in an abstract way as follows, using only (part of) the definitions and elementary considerations above and none of the deep results of [20]. (Cf. [15] for graph limits.)

Let $\mathcal{T}_f$ be the set of all rescaled finite trees cT (with arbitrary c > 0). Then $\tau_\infty$ maps $\mathcal{T}_f$ into $P(M_\infty)$, and we define T as the closure of $\tau_\infty(\mathcal{T}_f)$ in $P(M_\infty)$. This defines T as a closed subset of the Polish space $P(M_\infty)$; thus T is a Polish space. Hence, we can regard T as a (complete and separable) metric space whenever convenient; if necessary we can define a metric on T e.g. as the Prohorov metric on $P(M_\infty)$ [11, Appendix III], [12, Theorem 8.3.2], but we have in the present paper no need for a specific choice of metric.

We can identify a rescaled finite tree cT with its image $\tau_\infty(cT) \in T$ (temporarily ignoring the question whether this is a one-to-one map). Then convergence as in Definition 3.1 is, by Theorem 4.1, the same as convergence in the metric space T. Furthermore, T is the set of all possible limits of convergent sequences; thus it is natural to say that T is the set of tree limits.
We have thus defined a set of tree limits; moreover, this set has turned out to be a Polish space. Similarly, a measured real tree T defines an element $\tau_\infty(T) \in P(M_\infty)$. We define $\mathcal{T}_r$ as the set of all measured real trees and $T_r := \tau_\infty(\mathcal{T}_r) \subset P(M_\infty)$. (We ignore the set-theoretic difficulty of defining the "set of all measured real trees"; formally we either consider trees that are subsets of some huge universe, or suitable equivalence classes under isomorphisms.) Then the following holds.
Theorem 5.1. With notations as above, $T_f \subseteq T_r \subseteq T$, where $T_f := \tau_\infty(\mathcal{T}_f)$.

We postpone the proof. It follows that convergence of measured real trees as in Definition 3.5 also is the same as convergence in T. From now on, whenever convenient, we identify finite trees and measured real trees with their images in T.
Returning to the deep results by Elek and Tardos [20] in Theorems 3.15-3.17, we first note that, similarly, each long dendron D defines an element $\tau_\infty(D) \in P(M_\infty)$.

Proof. If D is a long dendron, then by Theorem 3.16, there exists a convergent sequence of rescaled finite trees $(c_n T_n)_n$ that converges to D. In other words, $\tau_\infty(c_n T_n) \to \tau_\infty(D)$. Thus $\tau_\infty(D) \in T$. Conversely, if $\mu \in T$, then there exists a sequence $c_n T_n \in \mathcal{T}_f$ such that $\tau_\infty(c_n T_n) \to \mu$. Thus the sequence $c_n T_n$ is convergent, and by Theorem 3.15, there exists a long dendron D such that $c_n T_n \to D$, which means $\tau_\infty(c_n T_n) \to \tau_\infty(D)$. Consequently, $\mu = \tau_\infty(D)$.
Hence, τ ∞ (D) = T, and we have already remarked that τ ∞ is injective on D by Theorem 3.17.
Consequently, we can identify D and T, and regard also D as the set of all tree limits. (As done by Elek and Tardos [20].) Note that this defines a topology on D, making D into a Polish space.
We ignore the taking of equivalence classes, and regard D as the set of all long dendrons. Thus, the topology on D gives a notion of convergence for long dendrons.

Proof. Immediate by the definitions and comments before the theorem together with (4.2).
Summarizing, we may thus regard finite trees, real trees, and long dendrons as elements of the Polish space T ⊂ P(M ∞ ). This gives a unified meaning to convergence of trees and real trees to a long dendron, and also a notion of convergence of long dendrons.
We turn to the question whether $\tau_\infty$ is injective (up to obvious isomorphisms) on the sets $\mathcal{T}_f$ of finite trees and $\mathcal{T}_r$ of measured real trees; recall that for long dendrons, this is answered (positively) by Theorem 3.17. Gromov [22, 3½.5 and 3½.7] studied a more general setting and proved that if $X_1 = (X_1, d_1, \mu_1)$ and $X_2 = (X_2, d_2, \mu_2)$ are two separable and complete metric measure spaces such that the measures have full support, and $\tau_\infty(X_1) = \tau_\infty(X_2)$, then $X_1$ and $X_2$ are isomorphic. This applies immediately to rescaled finite trees, and it follows that if $c_1 T_1$ and $c_2 T_2$ are rescaled trees with $\tau_\infty(c_1 T_1) = \tau_\infty(c_2 T_2)$, then $T_1 \cong T_2$ as metric spaces, and thus as trees, and $c_1 = c_2$. (Except in the trivial case $|T_1| = |T_2| = 1$, when $c_1$ and $c_2$ are arbitrary.) In other words, $\tau_\infty: \mathcal{T}_f \to T$ is injective up to isomorphism.
For measured real trees (T, d, μ), this is not quite true, since it may happen that μ is concentrated on a subtree $T' \subset T$, and then $\tau_\infty(T, d, \mu) = \tau_\infty(T', d, \mu)$. However, if $\mathcal{T}_c$ is the set of measured real trees such that every branch has positive measure, then $\tau_\infty$ is injective on $\mathcal{T}_c$ (up to isomorphism). One way to see that is to note that every $T = (T, d, \mu) \in \mathcal{T}_c$ may be regarded as a long dendron as in Example 3.12, and then use Theorem 3.17.
In general, given a measured real tree T, we may prune branches of measure 0 and obtain a subtree $T' \in \mathcal{T}_c$; this is called the core of T in [20], where a detailed definition is given. We see that the mapping $\tau_\infty$ does not distinguish between a measured real tree T and its core $T'$.
In other words, our identification of measured real trees with tree limits in T means that we ignore branches of measure 0, and thus identify a tree with its core, but trees with different cores are distinguished. With some care, we may thus also regard measured real trees as elements of T.
One important consequence of regarding trees, measured real trees and long dendrons as elements of the Polish space T is that then standard theory (e.g. [11]) defines for us random trees, random measured real trees and random long dendrons, as well as convergence in probability or distribution of such random objects. This will be a central topic in the remainder of the paper.
First, however, it remains to prove Theorem 5.1.
Proof of Theorem 5.1. First, as explained in Example 3.4, a rescaled finite tree $cT \in \mathcal{T}_f$ can be embedded in a measured real tree $c\widehat{T} \in \mathcal{T}_r$ such that (3.5) holds for all finite r, and thus also for $r = \infty$. This proves $T_f \subseteq T_r$. Recalling (5.1), it remains only to show $T_r \subseteq T$.

We give first a short proof using the results of [20]. If T is a measured real tree, then the constant sequence T, T, . . . trivially is convergent, and thus Theorem 3.15 shows that there exists a long dendron D such that $T \to D$, which by Theorem 5.3 means $\tau_\infty(T) = \tau_\infty(D)$. Hence, by Theorem 5.2, $\tau_\infty(T) \in T$.

We give also an alternative, elementary proof. We do this in several steps. We consider for simplicity, as said earlier, only separable trees.
Step 1. As in [20], say that a measured real tree is a finite real tree if it can be obtained from a finite tree by regarding each edge as an interval of some positive length (not necessarily the same for all edges), and adding a probability measure on the (finite) set of vertices. Note that a finite real tree has finite diameter. [20, Lemma 7.2] shows that every finite real tree T of diameter 1 is a limit of rescaled finite trees, i.e., T ∈ T. By rescaling, the same holds for every finite real tree.
Step 2. Suppose that T = (T, d, μ) is a measured real tree such that μ is concentrated on a finite set of points.

Step 3. Suppose that T = (T, d, μ) is a measured real tree such that μ is concentrated on a countable set of points $\{x_1, x_2, \ldots\}$.

Step 4. Let T = (T, d, μ) be any separable tree. There exists a countable dense subset $A := \{x_1, x_2, \ldots\}$. For each $n \geq 1$, define a measurable function $f_n: T \to A$ such that $d(x, f_n(x)) < 1/n$ for all x. (For example, let $f_n(x) := x_i$ for the smallest i such that $d(x, x_i) < 1/n$.)

Compactness
Recall that a set S in a metric space X is relatively compact if every sequence in S has a convergent subsequence. (This is equivalent to the closure $\overline{S}$ being compact.) Recall also that a family $\{Z_\alpha : \alpha \in A\}$ of random variables in a metric space X is tight if for every $\varepsilon > 0$, there exists a compact set $K \subseteq X$ such that $P(Z_\alpha \in K) > 1 - \varepsilon$ for every $\alpha \in A$. In this case we also say that the family of distributions $\{\mathcal{L}(Z_\alpha)\}$ is tight.
Prohorov's theorem [11, Section 6] says that for a Polish space X, the set of distributions $\{\mathcal{L}(Z_\alpha)\}$ is relatively compact in P(X) if and only if $\{Z_\alpha\}$ is tight. In particular, this leads to the following characterization of relative compactness in T.

Proof. (i) ⇐⇒ (ii) ⇐⇒ (iii): Prohorov's theorem, together with the definition of convergence and
(iv) =⇒ (iii): Follows by symmetry and the fact that $M_\infty$ has the product topology. To be more precise, let ε > 0. By (iv), there exist constants and, by symmetry, It suffices to consider ε < 1/2. It then follows that and thus the events and thus we may choose

Definition 6.2. A set of rescaled trees, measured real trees, or long dendrons, is tight if (iv) (or, equivalently, (v)) in Theorem 6.1 holds.
With this definition, Theorem 6.1 simply says that a set of rescaled trees, measured real trees or long dendrons is relatively compact if and only if it is tight. Usually, we consider sequences rather than general sets, and then Theorem 6.1 has the following corollary.

Corollary 6.3. If $(c_n T_n)_n$ is a tight sequence of rescaled trees, then some subsequence converges to some long dendron.
The same holds for tight sequences of measured real trees and for tight sequences of long dendrons.

Simple examples
As a preparation for the study of limits of random trees in the following sections, we give here a few simple examples of limits of deterministic trees.

Example 7.1 (paths).
Let $P_n$ be the path with n vertices. We may take the vertices to be $\{1, \ldots, n\}$, and then $d_{P_n}(x, y) = |x - y|$. Let $I = (I, d, \mu)$ be the unit interval $I := [0, 1]$ considered as a measured real tree with the usual metric d and Lebesgue measure μ. We regard I as a long dendron as in Example 3.11, and claim that $\frac{1}{n} P_n \to I$. To see this, note that if $(\xi_i^{(n)})_i$ are i.i.d. uniform random vertices of $P_n$, then $(\xi_i^{(n)}/n)_i$ converge jointly in distribution to i.i.d. uniform points $(U_i)_i$ on [0, 1], and thus $\frac{1}{n} d_{P_n}(\xi_i^{(n)}, \xi_j^{(n)}) = |\xi_i^{(n)} - \xi_j^{(n)}|/n \to |U_i - U_j|$ jointly, for every r. This implies convergence in distribution, and thus $\tau_r(\frac{1}{n} P_n) \to \tau_r(I)$, and thus $\frac{1}{n} P_n \to I$.
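The convergence in Example 7.1 can also be checked numerically: the rescaled distance between two uniform random vertices of $P_n$ is $|\xi_1 - \xi_2|/n$, which is within $O(1/n)$ in distribution of $|U_1 - U_2|$ for $U_i$ uniform on [0, 1]; both means are close to 1/3. A sketch (the sample sizes are arbitrary choices):

```python
import random

rng = random.Random(0)
n, trials = 1000, 200_000

# Rescaled distance between two i.i.d. uniform vertices of the path P_n.
mean_path = sum(abs(rng.randrange(1, n + 1) - rng.randrange(1, n + 1)) / n
                for _ in range(trials)) / trials

# The limit: |U - V| for U, V i.i.d. uniform on [0, 1]; E|U - V| = 1/3.
mean_limit = sum(abs(rng.random() - rng.random()) for _ in range(trials)) / trials
```

With these sample sizes, both empirical means agree with 1/3 to about two decimal places.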
The diameter of $P_n$ is $n - 1$. Obviously, we obtain the same limit I if we use the Elek-Tardos normalization $\frac{1}{n-1} P_n$. (Note that the limit I is a short dendron.)

Example 7.2 (stars). Let $S_n = K_{n-1,1}$ be a star with n vertices. If $(\xi_i^{(n)})_i$ are i.i.d. random vertices in $S_n$, then with probability $1 - O(1/n)$, $\xi_1^{(n)}$ and $\xi_2^{(n)}$ are distinct peripheral vertices, and thus $d(\xi_1^{(n)}, \xi_2^{(n)}) = 2$.

Recall that $x \wedge y$ denotes the last common ancestor of $x, y \in B_n$. If $h(x \wedge y) \geq k$, then x and y are both descendants of one of the $2^k$ vertices z with depth k. For each z, the number of such x (or y) is $2^{n-k} - 1$. Consequently, Theorem 3.13 yields the limit in (7.9). We see that (7.9) encapsulates (and formalizes) the fact that almost all pairs of vertices in $B_n$ have distance $\approx 2n$.

Recall that $B_n$ has $N := 2^n - 1$ vertices. Thus, (7.9) can also be written in terms of N. The results extend to complete b-ary trees $T_n^b$.

Example 7.4 (superstars). Let $T_n$ consist of a central vertex o with n paths attached: $N_{kn}$ paths with k edges for $k \geq 1$, all having o as one endpoint but otherwise disjoint, for some numbers $N_{kn} \geq 0$ with $\sum_k N_{kn} = n$. The number of vertices is thus $|T_n| = 1 + \sum_k k N_{kn}$. We assume that as $n \to \infty$, for some

Thus $|T_n| \sim \gamma n$. (7.14)

Suppose further (this actually follows from the other assumptions) that ... uniformly random vertices of $T_n$. It follows from the assumptions above that, for $k \geq 1$, ... Let ν be the probability distribution on $\mathbb{N}$ given by $\nu\{k\} = q_k$, and let ... Furthermore, for any $i, j \geq 1$, by (7.15) and (7.14), as in the special case in Example 7.2, ... It follows from (7.18) and (7.19) that for any $r \geq 1$, ... Hence,

Limits of random trees
In the rest of the paper we consider limits of random (finite) trees. Suppose that $T_n$, $n \geq 1$, are random trees (with any distributions) and let, conditioned on $T_n$, $(\xi_i^{(n)})_i$ be i.i.d. uniformly random vertices of $T_n$. We are concerned with limits in distribution or probability of $c_n T_n$ to some random or deterministic long dendron (tree limit). (Here, $c_n$ are some given positive numbers.) By the definitions above, this is equivalent to convergence of the conditional distributions (8.1), regarded as random elements of $P(M_\infty)$; we thus want to show either that $\tau_\infty(c_n T_n)$ converges in distribution to $\tau_\infty(D)$ for a random long dendron D, or (as a special case) that it converges in probability to $\tau_\infty(D)$ for a fixed D.

Remark 8.1.
It is important that we consider randomness in two steps: first T n is a random tree and then (ξ (n) i ) i are random vertices in T n . As seen in (8.1), we are interested in the quenched version, where we first sample T n and then condition on T n .
The alternative annealed version considers $T_n$ and $(\xi_i^{(n)})_i$ as random together; the annealed distribution is then the expectation of the quenched one, and is a deterministic element of $P(M_\infty)$.

We note one simple case where the difference between quenched and annealed disappears.
Proof. Recall that for any random variables Z n , Thus, for deterministic trees T n , the convergence in probability (3.13) is equivalent to Consequently, by Theorem 3.13, (i) is equivalent to A simple argument using Markov's inequality shows that (8.6) is equivalent to (iii). (This argument is a conditional version of (8.4).) Furthermore, since the left-hand side of (8.6) is bounded by 1, (8.4) shows that (8.6) is equivalent to which by a final application of (8.4) is equivalent to (ii).
We give also a version of the compactness criterion in Theorem 6.1 for random trees. We state the theorem for a sequence of random trees, although the statement and proof hold for an arbitrary set.

(i) The sequence $(c_n T_n)_n$ of random elements of T is relatively compact in P(T).
(ii) The sequence $(c_n T_n)_n$ of random elements of T is tight.
(ii) =⇒ (iii): By the definitions in Section 5, T is a closed subspace of P(M ∞ ) and it follows that (ii) means that for every ε > 0, there exists a compact set K ε ⊂ P(M ∞ ) such that, for every n 1, Furthermore, Prohorov's theorem (now applied to the Polish space M ∞ ) shows that for every Consequently, for every ε, δ > 0, and all n, and thus, using also (8.9), By taking δ = ε, this shows (iii).
(iii) =⇒ (ii): By (iii), for every ε > 0, there exists C ε such that This is a compact subset of M ∞ , and (8.14) implies Hence, and Markov's inequality shows that, for any 1, By the definition of τ ∞ , this is the same as and note that K ε is compact by Prohorov's theorem. It follows by (8.19) that Hence, the sequence τ ∞ (c n T n ) is tight in P(M ∞ ), and thus in T.
Remark 8.4. Again, the same holds, mutatis mutandis, for random measured real trees and for random long dendrons. In fact, the argument is quite general and holds for any measured metric spaces. We believe that this may be known, but we do not know a reference and have included a full proof for completeness.

Conditioned Galton-Watson trees, I
Consider a Galton-Watson process with some given offspring distribution ζ. (We let ζ denote both the distribution and a random variable with this distribution.) The family tree of the Galton-Watson process is a random tree T, which in the subcritical and critical cases (i.e., when $\mathbb{E}\zeta \leq 1$) is a.s. finite. T is a Galton-Watson tree, and the random tree $T_n := (T \mid |T| = n)$ obtained by conditioning T on a given size n is said to be a conditioned Galton-Watson tree. (We consider only n such that $P(|T| = n) > 0$.) For further details, see e.g. the survey [29].
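As an aside, a conditioned Galton-Watson tree can be simulated by naive rejection: grow Galton-Watson trees and keep the first one of size exactly n. A sketch with Poisson(1) offspring (this choice corresponds to uniformly random labelled trees, a standard fact; cf. Remark 9.3 below); the size cap is a practical guard for the simulation, not part of the definition:

```python
import math
import random

def poisson(lam, rng):
    """Sample Poisson(lam) by Knuth's product-of-uniforms method."""
    L, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= L:
            return k
        k += 1

def gw_offspring_list(rng, cap=10_000):
    """Grow a Galton-Watson tree with Poisson(1) offspring, breadth-first.
    Returns the list of offspring counts (one per vertex), or None if the
    tree exceeds `cap` vertices (a practical guard)."""
    degrees, pending = [], 1  # pending = vertices generated but not yet processed
    while pending > 0:
        if len(degrees) >= cap:
            return None
        d = poisson(1.0, rng)
        degrees.append(d)
        pending += d - 1
    return degrees

def conditioned_gw(n, rng):
    """Rejection sampling of T_n = (T | |T| = n)."""
    while True:
        degs = gw_offspring_list(rng)
        if degs is not None and len(degs) == n:
            return degs

rng = random.Random(7)
degs = conditioned_gw(10, rng)
# A tree with 10 vertices has 9 edges, i.e. total offspring 9.
```

Rejection is only practical for small n, since $P(|T| = n)$ decays like $n^{-3/2}$ in the critical case; it is nevertheless a faithful, if inefficient, sampler of the conditional law.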
In the standard case $\mathbb{E}\zeta = 1$ and $\operatorname{Var}\zeta < \infty$, Aldous [5, 6, 7] proved convergence in distribution of the conditioned Galton-Watson tree $T_n$, after rescaling, to a limit object called the Brownian continuum random tree; this is a random measured real tree which we denote by $T_{2e}$, for reasons given below. Aldous's original result was not in terms of the type of convergence discussed in the present paper, but it holds in the present context too. In fact, Aldous's result has been stated in several different forms, more or less equivalent; one version, stated e.g. in [24, Theorem 8] and [2, Theorem 5.2], is convergence in the Gromov-Hausdorff-Prohorov metric (defined in e.g. [50, Chapter 27] and [41, Section 6]), which is stronger than Gromov-Prohorov convergence and thus implies convergence in the tree limit sense used in the present paper (see Remark 3.6 and Example 3.11). We thus have the following.

Theorem 9.1. Let $T_n$ be a conditioned Galton-Watson tree with critical offspring distribution ζ with finite variance, i.e., we assume $\mathbb{E}\zeta = 1$ and $\sigma^2 := \operatorname{Var}\zeta \in (0, \infty)$. Then, as $n \to \infty$,
$$\sigma n^{-1/2} T_n \xrightarrow{\mathrm{d}} T_{2e}, \quad (9.1)$$
where $T_{2e}$ is the Brownian continuum random tree.
Proof. As said before the theorem, this is known. For completeness, we sketch a proof in the present context; omitted details can be found e.g. in [7] and [36]. One standard version of Aldous's theorem uses the contour function $C_{T_n}(t)$ of $T_n$. In general, if T is a rooted tree with |T| = n, then $C_T$ is a continuous function $[0, 2(n-1)] \to [0, \infty)$; informally, $C_T(t)$ is the distance, at time t, from the root to a particle that travels with unit speed along the "outside" of the tree, starting at the root at time 0 and returning at time 2(n − 1), having traversed every edge once in each direction. Aldous [7] showed that
$$n^{-1/2} C_{T_n}(2(n-1)t) \xrightarrow{\mathrm{d}} \frac{2}{\sigma} \mathbf{e}(t) \text{ in } C[0, 1], \quad (9.2)$$
where $\mathbf{e}$ is a standard Brownian excursion. Taking $g(t) = C_T(2(n-1)t)$ for a rooted tree T with |T| = n gives $T_g = \widehat{T}$, the real tree obtained from T as in Example 3.4. The measure $\hat{\mu}$ induced by g is the uniform measure on the edges of T, and not the uniform measure $\mu_T$ on the vertices of $T \subseteq \widehat{T}$; however, it is easy to couple these measures and find $\hat{\xi} \sim \hat{\mu}$ and $\xi \sim \mu_T$ such that $P(d(\hat{\xi}, \xi) > 1) \leq 1/n$. It follows that (9.2) implies (9.1), both in the sense of the present paper and in the stronger Gromov-Hausdorff-Prohorov metric.
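The contour function used in the proof can be computed by a depth-first walk: the particle steps down each edge and back up, recording its distance from the root at integer times. A sketch for a rooted tree given by child lists (the example tree is an illustrative choice):

```python
def contour(children, root=0):
    """Contour function of a rooted tree: the values C(0), C(1), ...,
    C(2(n-1)), the depth of a particle travelling at unit speed around
    the outside of the tree, traversing each edge once in each direction."""
    C = [0]
    def walk(v, depth):
        for w in children.get(v, []):
            C.append(depth + 1)   # step down the edge (v, w)
            walk(w, depth + 1)
            C.append(depth)       # step back up
    walk(root, 0)
    return C

# Root 0 with children 1 and 2; vertex 1 has a child 3.  (n = 4 vertices)
children = {0: [1, 2], 1: [3]}
C = contour(children)
```

Here the walk visits times 0, 1, ..., 2(n − 1) = 6, starts and ends at the root (depth 0), and its maximum equals the height of the tree.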
Remark 9.2. Duquesne [19] considered the case when ζ has infinite variance and furthermore lies in the domain of attraction of a stable distribution; he extended Aldous's result and showed convergence of the contour process of c n T n (for suitable c n) to a certain stochastic process in this case too; this implies convergence of c n T n to a random real tree called the stable tree [37] in the Gromov-Hausdorff-Prohorov sense, and thus in the weaker sense of tree limits, also in this case. (See [24, Theorem 8], with a somewhat stronger assumption on ζ.)

Remark 9.3. As is well known, several important classes of random trees can be represented as conditioned Galton-Watson trees T n satisfying the conditions above by choosing suitable offspring distributions ζ; thus Theorem 9.1 applies to them. This includes (uniformly) random labelled trees (σ² = 1), random ordered trees (σ² = 2) and random binary trees (σ² = 1/2); see e.g. [6] and [29].

Remark 9.4. Recall that random simply generated trees are defined by a weight sequence (w k)_k; see, again, e.g. [6] or [29] for the definition and for the well-known fact that while simply generated trees are more general than conditioned Galton-Watson trees, they can in many cases be reduced to equivalent conditioned Galton-Watson trees. Thus Theorem 9.1 applies to simply generated trees under rather weak conditions. In other cases of simply generated trees, the results in Sections 10 and 11 may apply.

Conditioned Galton-Watson trees, II
Although a large class of conditioned Galton-Watson trees (and simply generated trees) is covered by Theorem 9.1, there are also other cases. One class of conditioned Galton-Watson trees with a different local limit behaviour, showing condensation, was found by Jonsson and Stefánsson [33]; this was generalized in [29], with further results in [1] and [49]. This class of conditioned Galton-Watson trees (called type II in [29] and [49]) has offspring distributions ζ satisfying
0 < κ := E ζ < 1, (10.1)
E e^{tζ} = ∞ for every t > 0. (10.2)
In other words, the Galton-Watson trees are subcritical, and ζ has infinite moment generating function; see further [29, Section 8]. We will show that this class has tree limits that are very different from the ones in Section 9.
For a rooted tree T and a vertex v ∈ T, let δ(v) denote the outdegree of v. Furthermore, let Δ = Δ(T) := max_{v∈T} δ(v) be the maximum outdegree, and let v† be the vertex with maximum outdegree (chosen as e.g. the lexicographically first if there is a tie), so Δ = δ(v†).
It is shown in [29, Section 19.6] that (10.1)-(10.2) imply the existence (asymptotically) of one or several vertices of very high outdegree, with a total outdegree ≈ (1 − κ)n; typically, there is one single large vertex with outdegree ≈ (1 − κ)n, but this is not always the case; see [29] and Remark 10.8. We will assume that there is such a vertex; this case is known as complete condensation.
To be precise, we assume that ζ is such that
Δ(T n)/n p−→ 1 − κ. (10.3)
Recall that ϒ ν is the deterministic long dendron defined in Example 3.12.

Remark 10.2. Note that there is no rescaling of T n in (10.4); the situation is similar to Examples 7.2 and 7.4. Distances are typically small; formally, the distance d(ξ^(n)_1, ξ^(n)_2) between two random vertices is stochastically bounded (i.e., tight). Hence, the local limits studied in [29] and [49] are essentially global in this case.

Remark 10.3. The diameter diam(T n) p−→ ∞, e.g. by Lemma 10.6 below. Hence, rescaling such that the diameter becomes 1 would only give the trivial limit ϒ 0, see Remark 3.14.
The rest of this section contains the proof of Theorem 10.1. We begin with some further notation. In this proof, all trees are rooted and ordered. Trees that are equal up to order-preserving isomorphisms are regarded as equal. Let T be the countable set of all finite trees.
Let again T denote the (unconditioned) Galton-Watson tree with the chosen offspring distribution ζ . Since E ζ < 1, T is a.s. finite. If t is any fixed finite tree, let π t := P (T = t). (10.5) In other words, (π t ) t∈T is the probability distribution of T ∈ T. The fringe tree [4] of a tree T at a vertex v, denoted T v , is the subtree of T consisting of v and its descendants, rooted at v.
Let t denote a finite tree. For any tree T, let
N_t(T) := #{v ∈ T : T^v = t}, (10.6)
the number of fringe trees of T that equal t. Then, for every fixed t ∈ T,
N_t(T n)/n p−→ π t. (10.7)
Both sides of (10.7) are, regarded as functions of t, probability distributions on the countable set of finite trees. (Note that (N_t(T n)/n)_t is the conditional distribution of T^v_n given T n, with v a random vertex, while (π t)_t is the distribution of T.) Hence, (10.7) says that the random distribution (N_t(T n)/n)_t converges in probability to (π t)_t in the space P(T) with the usual weak topology. (Note that we here consider convergence of random probability distributions, regarded as elements of the space P(T) of probability distributions on finite trees.) We claim that, since T is countable, it follows that the random distribution (N_t(T n)/n)_t converges in probability to (π t)_t in total variation, and thus, for any set A ⊆ T of finite trees,
Σ_{t∈A} N_t(T n)/n p−→ Σ_{t∈A} π t. (10.8)
To see this, note that the corresponding result for sequences of probability distributions on a countable set is well known; see e.g. [23, Theorem 5.6.4]. The version used here, with random distributions and convergence in probability, follows by essentially the same proof, or by first using the Skorohod coupling theorem [34, Theorem 4.30] to see that we may assume that (10.7) holds a.s. for each t, and then using the deterministic version.
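Fringe-tree frequencies of this kind are easy to observe numerically. The sketch below pools the vertices of many independent (unconditioned) subcritical Galton-Watson trees and checks that the fraction of vertices whose fringe tree is a single vertex is close to π_t = P(ζ = 0) for t a one-vertex tree. The particular offspring law used here has a finite moment generating function, so it does not satisfy (10.2); it is chosen purely for simplicity of the simulation, and all names are ours.

```python
import random

def gw_vertices_and_leaves(p0, rng):
    """Generate one Galton-Watson tree with offspring distribution
    P(zeta = 0) = p0, P(zeta = 2) = 1 - p0 (subcritical when p0 > 1/2);
    return (#vertices, #leaves) = (#fringe trees, #one-vertex fringe trees)."""
    alive, vertices, leaves = 1, 0, 0
    while alive:
        alive -= 1
        vertices += 1
        if rng.random() < p0:
            leaves += 1        # zeta = 0: this vertex is a one-vertex fringe tree
        else:
            alive += 2         # zeta = 2: two children
    return vertices, leaves

def leaf_fraction(p0=0.6, trees=20000, seed=1):
    """Pooled fraction of one-vertex fringe trees; close to p0 for large samples."""
    rng = random.Random(seed)
    v_tot = l_tot = 0
    for _ in range(trees):
        v, l = gw_vertices_and_leaves(p0, rng)
        v_tot += v
        l_tot += l
    return l_tot / v_tot
```

With p0 = 0.6 the pooled leaf fraction concentrates near 0.6, in line with the limit π_t = P(ζ = 0) for the one-vertex tree t.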
We need an extension of (10.7). Let

Note that v is in the set in (10.13) if and only if T̂^v ∈ T_j, where T_j is the set of all trees t̂ such that the root has exactly k children, and, if w is the jth of these, the fringe tree t̂^w = t. Hence, (10.8) shows that, using also the recursive property of the Galton-Watson tree T, (10.14)

The result follows, since N_{t,k}(T n) = Σ_{j=1}^k N_{t,k,j}(T n).

We have so far not used the assumption (10.3), but it is essential for the next lemma. Recall that Δ = Δ(T n) is the maximum outdegree, and that w.h.p. v† is the only vertex of outdegree Δ. Hence, w.h.p., N_{t,Δ}(T n) is the number of children v of v† such that T^v_n = t. Now consider the N_t(T n) vertices v such that T^v_n = t. Assume for convenience n > |t|, so that the root is not one of these vertices. Then, using (10.7) and (10.12),

Thus, using (10.18) and N_{t,k} ≤ k N_k, w.h.p.,
≤ k p_k π t n + o_p(n). (10.20)
Using also (10.17) and (10.11), we find that w.h.p.
Next, for a tree t and ℓ ≥ 0, let w_ℓ(t) be the number of vertices of t at distance ℓ from the root. Furthermore, for ℓ ≥ 1, let W_ℓ := w_ℓ(T^{v†}_n), the number of vertices in T n that are descendants of v† and ℓ generations from it, and let W_0 := n − Σ_{ℓ≥1} W_ℓ be the number of vertices that are not descendants of v†.
Hence, by (10.24), for any finite T 0. Furthermore, by elementary branching process theory, In particular, the sum converges, and it follows from (10.26) that

Proof. If ξ^(n)_1 and ξ^(n)_2 both are descendants of v†, then d( and Lemma 10.7 yields −→. Then, (10.42) and the independence of (Y^(n)_i)_i show that a.s. the sequence T n is such that, conditioned on T n, we have (Y^(n) Consequently, for every r ≥ 1, using also (10.43), Hence, a.s., τ_r(T n) → τ_r(ϒ ν) and thus T n → ϒ ν.

Remark 10.8. There exist offspring distributions ζ satisfying (10.1)-(10.2) but not (10.3); see [29]. In one such example, there exists a subsequence of n such that Δ(T n)/n p−→ 0; there exists also another subsequence for which T n w.h.p. contains two vertices of outdegree ≈ n/3.
It is an open problem to find tree limits in such cases. In the case just mentioned with two large vertices (but not more), we conjecture that the tree limit is similar to ϒ ν in Example 3.12, but has a base consisting of a unit interval with the marginal distribution of ν concentrated on the two endpoints.
Remark 10.9. The proof above is based on the result (10.7) for random fringe trees T v n of T n . However, we also consider the parent of the random node v, see e.g. (10.9); thus we really consider properties of (part of) the extended fringe tree, also defined by Aldous [4]. The asymptotic distribution of the entire extended fringe tree was found by Stufler [49]. However, his result is for the annealed version, where the tree T n and the vertex v are chosen at random together, while we here need the quenched version, where we fix (i.e., condition on) T n and then take a random vertex v. We have therefore used the (quenched) result (10.7) rather than the result of [49]. In fact, the argument above is easily extended to show that for the part of the extended fringe tree up to the first very large ancestor (i.e., w.h.p., v † ), the infinite limit tree found by Stufler [49] is also the limit in the quenched sense. However, this does not hold for the remaining part of the extended fringe tree; this part is, for n − o p (n) choices of v, equal to the part of T n between the root and v † , and conditioned on T n it is thus w.h.p. equal to some random tree determined by T n . Consequently, there is no quenched limit of the entire extended fringe tree.

Simply generated trees, type III
As said in Remark 9.4, many simply generated trees are covered by the results for conditioned Galton-Watson trees in the preceding sections. However, there are also simply generated trees of a different type (called type III in [29]), where there is no equivalent conditioned Galton-Watson tree. These are defined by weight sequences (w k)_k such that the power series Σ_k w_k z^k has radius of convergence 0, i.e.,
lim sup_{k→∞} w_k^{1/k} = ∞. (11.1)
In the case w k = k!, the root of T n w.h.p. has outdegree n − 1 − Z n with Z n = O_p(1), and of the n − 1 − Z n branches, Z n have size 2 and all others are single vertices (i.e., leaves of T n).
The case w k = (k!) α with 0 < α < 1 is similar [32]; there are more branches that have size 2, and the largest may have size 1/α + 1, but their number is still o p (n) and (11.2) holds. If w k = (k!) α with α > 1, then T n = S n w.h.p.
Thus, (11.3) holds for any α > 0.

In all these cases, it turns out that the random trees T n after rescaling have a non-random tree limit (in probability) of the type ϒ a in Example 3.12. More precisely, for some a ∈ (0, ∞),
(log n)^{−1} T n p−→ ϒ a. (12.1)
We note that by Theorem 8.2, (12.1) is equivalent to
d(ξ^(n)_1, ξ^(n)_2)/log n p−→ a, (12.2)
where, as usual, d is the distance in T n and (ξ^(n)_i)_i are i.i.d. random vertices in T n. Equivalently, from our point of view, i.e. with regard to distances between random points, these classes of logarithmic trees behave just like the deterministic binary tree in Example 7.3. (We do not know any natural example of logarithmic random trees that does not satisfy (12.1)-(12.2).)

Remark 12.1. Note that Theorem 8.2 shows that in this case, with convergence to a constant, the annealed result (12.2) is sufficient. To prove (12.1) for some random trees T n, we may therefore work with annealed results and do not have to show quenched versions (which often are more difficult).
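For the deterministic complete binary tree of Example 7.3, the behaviour behind (12.2) can be checked directly: a uniformly random vertex is w.h.p. within O(1) of the bottom level, and the last common ancestor of two independent random vertices is within O(1) of the root, so their distance is ≈ 2 log₂ n. A quick numerical check (heap numbering of the vertices; the helper names are ours):

```python
import random

def heap_lca(i, j):
    """Last common ancestor of vertices i, j of the complete binary tree
    in heap numbering (root = 1, children of k are 2k and 2k+1)."""
    while i != j:
        if i > j:
            i //= 2
        else:
            j //= 2
    return i

def mean_random_distance(h, samples=20000, seed=1):
    """Monte Carlo mean of d(xi_1, xi_2) for two uniform vertices of the
    complete binary tree of height h (n = 2^(h+1) - 1 vertices)."""
    rng = random.Random(seed)
    n = 2 ** (h + 1) - 1
    depth = lambda k: k.bit_length() - 1   # depth of vertex k in heap numbering
    tot = 0
    for _ in range(samples):
        i, j = rng.randint(1, n), rng.randint(1, n)
        tot += depth(i) + depth(j) - 2 * depth(heap_lca(i, j))
    return tot / samples
```

For h = 20 the mean is close to 2(h − 1) − 2 ≈ 36, i.e. ≈ 2 log₂ n up to an O(1) correction, consistent with (12.2) with a = 2/log 2 when logarithms are natural.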
We end with an example that, as far as we know, does not follow from the general results in Sections 13 and 14.
Example 12.8. Simple families of increasing trees (simply generated increasing trees) were studied by Panholzer and Prodinger [47], who showed that if the generating function is a polynomial of degree d ≥ 2, then (12.2) holds with a = d/(d − 1). Hence,

Split trees
Random split trees were introduced by Devroye [14] as a unified model that includes many important families of random trees (of logarithmic height), for example binary search trees, m-ary search trees, tries and digital search trees. Theorem 13.1 below shows that random split trees after rescaling have a non-random tree limit of the type ϒ a in Example 3.12. Equivalently, by Theorem 8.2, distances between random points satisfy (12.2). The definition of split trees involves several parameters b, s, s_0, s_1 and a split vector V = (V_1, . . . , V_b), which is a random vector with V_i ≥ 0 and Σ_{i=1}^{b} V_i = 1, i.e., a random probability distribution on {1, . . . , b}. The split tree is defined as a subtree of the infinite b-ary tree T b. The tree is constructed by adding a sequence of n balls to the tree, which initially is empty. Each ball arrives at the root and then moves recursively as follows; see [14] for further details.
Each vertex v is equipped with its own copy V(v) of the random split vector V; these copies are independent. Each vertex has maximum capacity s ≥ 1; the first s balls that arrive at a vertex stay there (temporarily), but when the (s + 1)th ball arrives at the vertex, some balls are sent to its children, leaving s_0 ∈ [0, s] balls that remain in the vertex for ever. (The details of this step depend on s_1; see [14].) Any further ball that comes to the vertex is immediately passed along to one of its children, with probability V(v)_i for child i and independently of all previous events. The split tree T n is defined as the set of all vertices that have been visited by a ball; note that (if s_0 = 0) some vertices in T n may be empty, but there is always at least one ball in some descendant of the vertex.
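The ball-insertion process is straightforward to simulate. The following sketch implements the special case b = 2, s = s_0 = 1, s_1 = 0 with V = (U, 1 − U), U uniform, which reproduces the random binary search tree; the class and function names are ours, not from [14].

```python
import random

class SplitNode:
    """One vertex of a (b = 2, s = s0 = 1) split tree: it keeps the first
    ball it receives and routes every later ball to child 0 with its
    private probability U, else to child 1.  With U uniform on (0, 1)
    this is the random binary search tree."""
    def __init__(self, rng):
        self.U = rng.random()
        self.child = [None, None]

def split_tree(n, seed=0):
    """Insert n balls; return (root, number of vertices)."""
    rng = random.Random(seed)
    root = SplitNode(rng)          # first ball creates (and stays at) the root
    size = 1
    for _ in range(n - 1):         # remaining n - 1 balls
        v = root
        while True:
            i = 0 if rng.random() < v.U else 1
            if v.child[i] is None:
                v.child[i] = SplitNode(rng)   # ball comes to rest in a new vertex
                size += 1
                break
            v = v.child[i]
    return root, size
```

Since s = s_0 = 1, every ball creates exactly one vertex, so here |T n| = n deterministically; for other parameter choices |T n| is genuinely random, as discussed below.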
We exclude the trivial case max(V_1, . . . , V_b) = 1 a.s., and then T n is finite a.s. (Usually one assumes the slightly stronger condition that V_i < 1 a.s. for every i [14].) It is important to note that T n is defined with a fixed number n of balls, while the number of vertices |T n| in general is random. Nevertheless, it is easy to see that (13.1) Furthermore, since each vertex stores at most s balls, we have the deterministic lower bound
|T n| ≥ n/s. (13.2)
In fact, in most cases E |T n|/n converges to some constant, and, moreover, |T n|/n converges in probability to the same constant; see [25, Theorem 1.1]. However, this is not always the case; for some tries, E |T n|/n oscillates.
Theorem 13.1. Let T n be a random split tree with a split vector V = (V_1, . . . , V_b) and let χ be given by (13.3). Then,
(log n)^{−1} T n p−→ ϒ_{2/χ}. (13.4)
Proof. As said in Section 12, by Theorem 8.2, (13.4) is equivalent to
d(ξ^(n)_1, ξ^(n)_2)/log n p−→ 2/χ. (13.5)
Under a technical condition, (13.5) was proved by Berzunza, Cai and Holmgren [8, Corollary 1], as a corollary to some stronger estimates. (Actually, their d is slightly different and includes the distance to the root, but the same proof yields (13.5).) For completeness, we give a proof (by similar methods) in the following subsection, not requiring any further conditions; in fact, we there consider an even more general model.
Without going into details, we note that the proof of (13.5) in [8], as well as our similar proof in Section 13.1, is based on showing the two results (12.3) and (12.4), and that (12.3) was shown by Holmgren [25].
Note also the related fact that if η^(n) is a random ball in T n, then
d(η^(n), o)/log n p−→ 1/χ. (13.6)
[48] showed (in particular) the corresponding fact for the distance between two random balls:
d(η^(n)_1, η^(n)_2)/log n p−→ 2/χ. (13.7)
See also Albert, Holmgren, Johansson and Skerman [3, Lemma 3.4], showing that d(η^(n)_1 ∧ η^(n)_2, o) is tight, which together with (13.6) implies (13.7).

Remark 13.2. In analogy with Theorem 8.2, we can interpret (13.7) as convergence where we equip T n with the probability measure μ*_n defined as the distribution of the balls on T n.

Generalized split trees
We define random generalized split trees as follows; this is a minor variation of the model in Broutin, Devroye, McLeish and de la Salle [13]. Let 2 ≤ b < ∞ be a fixed branching factor and suppose that for every integer n ≥ 1 we have a random vector N^(n) = (N^(n)_0, N^(n)_1, . . . , N^(n)_b) of non-negative integers with Σ_{i=0}^{b} N^(n)_i = n. Consider the infinite b-ary tree T b. For a given number n of balls, all starting at the root, distribute the balls according to N^(n), with N^(n)_0 balls remaining in the root (for ever), and N^(n)_i balls passed to the ith child. Continue recursively in each subtree that has received at least one ball, using an independent copy of N^(m) at each vertex that has received m balls.
It is convenient to begin by equipping each vertex v in the infinite tree T b with a private copy N^(n,v) of N^(n) for each n ≥ 1, with all these random vectors N^(n,v) independent. Then, at each vertex v that receives m ≥ 1 balls, we apply N^(m,v).
The tree T n is defined as the set of all vertices that have received at least one ball (whether or not any ball remains there). Equivalently, T n is the set of all vertices v ∈ T b such that the fringe tree T^v_b contains at least one ball. Note again that the size |T n| is random. We assume the following:

(ST1) There exists a constant C_0 such that for every n, a.s., N^(n)_0 ≤ C_0.

(ST2) The random vector n^{−1} N^(n) converges in distribution as n → ∞:
n^{−1} N^(n) d−→ V = (V_0, V_1, . . . , V_b). (13.11)

(ST3) For every n ≥ 1,
P( max_{1≤i≤b} N^(n)_i = n ) < 1, (13.12)
and, similarly,
P( max_{1≤i≤b} V_i = 1 ) < 1. (13.13)

We call the limit V in (13.11) the (asymptotic) split vector. Note that V_0 = 0 by (ST1) and (ST2); thus it suffices to consider (V_1, . . . , V_b). Thus, V is a random probability distribution on {1, . . . , b}. It should be clear that the definition above includes the split trees defined by Devroye [14] and discussed above. (In particular, our model includes tries, unlike the version in [13].)

Remark 13.3. (13.12) only excludes the trivial case when, for some n, a.s. all n balls are passed to the same child, and therefore, by induction, continue along some infinite path so that T n becomes infinite.
Conversely, it is easy to see by induction that (13.12) implies that T n is finite a.s. for every n ≥ 1. Moreover, (13.13) implies uniformity in (13.12): it is easy to see that (13.12)-(13.13) are equivalent to the existence of c, δ > 0 such that, for every n ≥ 1,

Theorem 13.4. Let T n be a random generalized split tree with a split vector V = (V_1, . . . , V_b) and let χ be given by (13.3). Then
d(ξ^(n)_1, ξ^(n)_2)/log n p−→ 2/χ,
where d is the distance in T n and (ξ^(n)_i)_i are i.i.d. random vertices in T n.

To prove Theorem 13.4, we show a series of lemmas. We define random variables W^(n) and W as size-biased selections from N^(n)/n and V. More precisely, conditionally on N^(n), we select an index I with distribution P(I = i | N^(n)) = N^(n)_i/n, and then define W^(n) := N^(n)_I/n. It follows from (ST2) that Note that (13.21) and, by (13.3), Furthermore, Let N_v be the number of balls received by vertex v ∈ T b. Thus

Proof. First, (13.31) follows immediately from the fact that, by (ST1), no vertex contains more than C_0 balls when the construction is finished; hence there are at least n/C_0 vertices containing balls. For (13.28), recall (13.15), and assume, as we may, that δ < 1/2. Let r := 1/(1 − δ) < 2, and let X_k be the number of vertices v such that N_v ∈ [r^k, r^{k+1}). For a given k ≥ 0, generate the tree as usual, but stop at every vertex v that receives N_v < r^{k+1} balls, and colour these vertices pink. If a pink vertex v has N_v ≥ r^k, recolour it red. Since the red vertices receive disjoint sets of balls, the number R_k of them is at most n/r^k. Condition on the set of red vertices and the numbers of balls in them, and continue the construction of the tree. Since r < 2, each red vertex has at most one child w with N_w ≥ r^k, and by (13.15), with probability at least c it has none. Continuing, we see that for each red vertex, the number of descendants w with N_w ≥ r^k is dominated by a geometric distribution, and thus the expected number of such descendants is O(1). Consequently, E[X_k | R_k] ≤ C R_k, and thus E X_k ≤ C E R_k ≤ C n/r^k.
(13.32) This yields which is (13.28). We obtain (13.29) in the same way, summing only over k with r^{k+1} > K.
Finally, the argument above shows that X_k is stochastically dominated by a sum of n/r^k independent copies of a geometric random variable ζ. Furthermore, we may choose these to be independent also for different k. (The red vertices for different k are not independent, but the stochastic upper bound that we use holds also conditioned on events for larger k.) Hence, (13.34) where ζ_i ∈ Ge(p) are i.i.d. with some fixed 0 < p < 1, and m_n := Σ_{k=0}^{∞} n/r^k ≤ C n. Hence, (13.30) follows by the law of large numbers.
Lemma 13.7. Let χ > 0 be given by (13.3). If D_n is the depth of a random ball in T n, then
D_n / log n p−→ 1/χ.
Let L := λ log n and define the stopping time τ as the smallest k such that one of the following occurs.
(a) k ≥ L, (b) N_{v_k} ≤ B, (c) k > D, and thus the ball has come to rest.
By the comments just made, we can couple the sequence (Y_k)_k with an i.i.d. sequence (ζ_k)_k.
Let, recalling (13.36), Then, (13.37) implies  By (ST1), there are at most C 0 |S| such balls, and thus the conditional probability given T n that our random ball is one of them is C 0 |S|/n. Hence, by (13.29), We may increase B if necessary so that B > C 1 /ε, and thus P (E − ) < ε.
On the other hand, if E − holds, then, by (13.38), Thus, by the law of large numbers, recalling that E ζ k = χ by (13.22), on the event E − , w.h.p.
If λ < 1/χ, and ε is so small that λ(χ + ε) + ε < 1, then (13.42), (13.39) and Markov's inequality yield Hence, Letting ε → 0, we see that P(D ≤ λ log n − 2) ≤ P(D < L − 1) → 0. In other words, for any λ < 1/χ,
D > λ log n − 2 w.h.p. (13.45)
For the other side, assume λ > 1/χ, and let E_+ be the event {D ≥ L}. Let E_+ := {τ = L} and The law of large numbers and (13.38) imply that on the event E_+, w.h.p., (13.46) Hence, if ε is small enough, (13.39) and Markov's inequality yield If E_+ holds, then (a) and (c) cannot hold, and thus N_{v_τ} ≤ B. Hence our chosen ball belongs to a subtree rooted at v_τ with at most B balls. Conditioned on N_{v_τ} = m, this subtree is a copy of T m, and since the finitely many random trees T m, 1 ≤ m ≤ B, all are a.s. finite and thus have finite (random) heights H(T m), there exists a constant C_2 such that It follows that, conditioned on E, The constant C_2 may depend on ε, but it follows that for large n, (1). (13.51) Since ε can be arbitrarily small, this shows that for any λ > 1/χ and δ > 0,
D ≤ (λ + δ) log n w.h.p., (13.52)
which together with (13.45) completes the proof.
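As an illustration of the role of χ, consider again the binary search tree, where V = (U, 1 − U) with U uniform; the size-biased variable W then picks W = U with probability U and W = 1 − U otherwise. Taking χ = E ln(1/W), which is how χ enters the estimates above, one gets χ = 2∫₀¹ u ln(1/u) du = 1/2, so depths grow like (1/χ) log n = 2 log n, the classical binary-search-tree constant. A Monte Carlo sanity check (function name ours):

```python
import math
import random

def chi_bst(samples=200000, seed=2):
    """Monte Carlo estimate of chi = E[ln(1/W)] for the binary-search-tree
    split vector V = (U, 1-U): conditionally on U, the size-biased pick is
    W = U with probability U, else W = 1 - U.  The exact value is 1/2."""
    rng = random.Random(seed)
    tot = 0.0
    for _ in range(samples):
        u = rng.random()
        w = u if rng.random() < u else 1 - u   # size-biased coordinate of V
        tot += math.log(1 / w)
    return tot / samples
```

The estimate concentrates near 0.5 (the per-sample variance is 1/4, so 2·10⁵ samples give an error of order 10⁻³).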
We transfer this result from balls to vertices.
Lemma 13.8. Let T n and ξ^(n)_i be as above. Then
d(ξ^(n)_1, o)/log n p−→ 1/χ.
Proof. Again, let L := λ log n for a fixed λ > 0. Let Z_k be the set of vertices of T n with depth k, and Z^b_k the set of balls with depth k; define Z_{≥k}, Z_{<k}, Z^b_{≥k}, Z^b_{<k} analogously.

First, let λ > 1/χ. Let U_L be the set of all vertices of T b with depth L. For any v ∈ T b, conditioned on N_v, the fringe subtree T^v_n has the same distribution as T m with m = N_v. Consequently, Lemma 13.6 shows that and thus Since Z_{≥L} is the union of the fringe trees T^v_n for v ∈ U_L, and Z^b_{≥L} is the set of all balls that reach some vertex in U_L, it follows from (13.55) that

In the opposite direction, let λ < 1/χ. Let ε > 0 and let B be a large number. We split Z_L into the two sets Z'_L := {v ∈ Z_L : N_v > B} and Z''_L := {v ∈ Z_L : N_v ≤ B}. By (13.29), we may choose B so large that
E |Z'_L| ≤ εn. (13.59)
To treat Z''_L, we now stop the construction of T n at each vertex v with N_v ≤ B. If such a vertex also has depth ≤ L, we colour it green. Let G be the set of all green vertices. Then the set Z''_L is included in the union of the fringe trees T^v_n for v ∈ G. Furthermore, conditioned on the set G and (N_v)_{v∈G}, each T^v_n (for v ∈ G) has the same distribution as T m for m = N_v. Thus, by Lemma 13.6, (13.48) shows that each fringe tree T^v_n (for v ∈ G) with probability ≥ 1/2 has height ≤ C_2; if this happens, T^v_n has in particular at least one ball of depth ≤ C_2 in the fringe tree, and thus depth ≤ L + C_2 in T n. Hence, Together with (13.61), this yields and then Lemma 13.7 implies This completes the proof together with (13.58).
Lemma 13.9. With notations as above, Proof. There is a standard identification of the vertices of T b with finite strings i_1 · · · i_k with k ≥ 0 and i_j ∈ {1, . . . , b}. If v = i_1 · · · i_k ∈ T b, let v_j := i_1 · · · i_j for j ≤ k, and define Then [13, Lemma 2], by (ST2) and induction over k, as n → ∞, for some v ∈ U_K, it follows from (13.72) that, using also (13.31) and Σ_{v∈U_K} V_v = 1, Since the probability on the left-hand side is bounded by 1, we may assume that so is the term o_p(1) on the right-hand side, and thus we may take the expectation and use dominated convergence to conclude Furthermore, by the definition (13.68) and independence, Hence, given any ε > 0, we can find K such that (13.74) yields

Remark 13.10. The random recursive tree and preferential attachment trees are not split trees in the sense above, since degrees are unbounded. Nevertheless, if the definition above is generalized to allow b = ∞, they too can be regarded as split trees; see [30]. We conjecture that under suitable conditions, Theorem 13.4 extends to the case b = ∞, but we have not pursued this. (Random recursive trees and preferential attachment trees can be handled by Theorem 14.3 below instead.)

Crump-Mode-Jagers branching process trees
A Crump-Mode-Jagers (CMJ) branching process (see e.g. [28]) is a continuous-time process, where each individual gives birth to a (generally random) number of children at arbitrary random times; the times at which a single individual gets children are thus described by a point process on [0, ∞). All individuals have independent and identically distributed such point processes. We start with a single individual, born at time 0; we also suppose that the CMJ process is supercritical and that it never dies out; hence its size a.s. grows to ∞.
The family tree of the CMJ process is a growing random tree T̂_t, t ≥ 0, where the vertices are all individuals born up to time t. We stop the tree at the stopping time τ(n) where the tree first reaches n vertices. Then (provided births a.s. occur at distinct times) T n := T̂_{τ(n)} is a random tree with fixed size |T n| = n. More generally, τ(n) may be defined as the first time the total weight reaches n, where each individual has a weight given by some "characteristic" ψ; see [26] for details. (For example, for an m-ary search tree, ψ counts the balls, and we stop when there are n balls; cf. the split trees in Section 13.) Many examples of such CMJ trees are discussed in the survey [26, Sections 6-8]; these include for example binary search trees and m-ary search trees (also covered by Section 13), and random recursive trees and preferential attachment trees. We give in Theorem 14.3 a general result for such trees. For example, this applies to Examples 12.3-12.6.
The point process can informally be regarded as the random set {ξ̂_i}_{i=1}^N of the birth times ξ̂_i of the children of the root, where the number of children N ∈ {0, 1, . . . , ∞} in general is random. (We use the notation ξ̂_i to avoid confusion with the random vertices ξ^(n)_i.) Formally, the point process Ξ is defined as the random measure Σ_{i=1}^N δ_{ξ̂_i}. Let μ := E Ξ denote the intensity measure of Ξ. We define the Laplace transform of the measure μ on [0, ∞) by
μ̂(θ) := ∫_0^∞ e^{−θt} dμ(t).
As in [26], we make the following assumptions; see further [26].
(A1) μ{0} < 1. (This rules out a rather trivial case with explosions already at the start. In all examples in [26], μ{0} = 0.)
(A2) μ is not concentrated on any lattice hZ, h > 0. (This is for convenience only.)
(A3) E N > 1. (This is known as the supercritical case.) For simplicity, we further assume that N ≥ 1 a.s. (In this case, every individual has at least one child, so the process never dies out and |T ∞| = ∞.)
(A4) There exists a real number α > 0 (the Malthusian parameter) such that μ̂(α) = 1, i.e.,
∫_0^∞ e^{−αt} dμ(t) = 1. (14.2)
(A5) μ̂(θ) < ∞ for some θ < α.
(A6ψ) (Only needed if the stopping time τ(n) is defined using a weight ψ; thus void in the case that T n always has n vertices.) The random variable sup_t e^{−θt} ψ(t) has finite expectation for some θ < α.
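For a concrete instance of (A4): in the random recursive tree, each individual gives birth at the times of a rate-1 Poisson process, so μ is Lebesgue measure on [0, ∞), the Laplace transform is μ̂(α) = 1/α, and the Malthusian parameter is α = 1. Numerically, (A4) is a one-dimensional root-finding problem, solvable e.g. by bisection; the sketch below assumes only that the supplied Laplace transform is decreasing in α (the function names are ours).

```python
def malthusian(laplace, lo=1e-9, hi=100.0, tol=1e-12):
    """Solve laplace(alpha) = 1 by bisection.

    `laplace` must be a decreasing function of alpha with
    laplace(lo) > 1 > laplace(hi), as is the case for the Laplace
    transform of a (non-trivial) intensity measure on [0, oo)."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if laplace(mid) > 1:
            lo = mid        # root lies to the right of mid
        else:
            hi = mid        # root lies to the left of mid
    return (lo + hi) / 2

# Random recursive tree: mu = Lebesgue measure, laplace(a) = 1/a, so alpha = 1.
alpha = malthusian(lambda a: 1.0 / a)
```

Running this returns α ≈ 1, as expected for the random recursive tree.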
We assume also the following technical condition. (We conjecture that it is not necessary, but we use it in our proof.) Define the random variable
Ξ̂(α) := ∫_0^∞ e^{−αt} dΞ(t) = Σ_{i=1}^N e^{−α ξ̂_i}, (14.3)
and note that (14.2) is equivalent to E Ξ̂(α) = 1. We assume a weak moment condition.
(A7) We have E[Ξ̂(α) log Ξ̂(α)] < ∞.

We regard T̂_t as a subtree of the infinite tree T ∞, where the vertices are all finite strings i_1 · · · i_k of natural numbers i_j ∈ N, with 0 ≤ k < ∞; thus the children of v are vi, for i = 1, 2, . . ., in this order. Note that the length of the string labelling v ∈ T ∞ equals d(v, o); we denote this length by |v|.
For a vertex v ∈ T ∞ , let b v be the time that v is born in our CMJ branching process; if v never appears, then b v := ∞.
The fringe tree T̂^v_t (defined as ∅ if b_v > t, so v ∉ T̂_t) is, from the time b_v on, a copy of the entire branching process tree, and thus (14.6) implies that for every v with b_v < ∞, which holds trivially also for b_v = ∞ (with e^{−∞} = 0). Consequently, Consider first the children of the root; these are labelled with i ∈ N. Since Z_t = 1 + Σ_i Z^i_t, we have by (14.11) Equivalently, there is a.s. equality in (14.12).
Let v ∈ T ∞ and apply (14.15) to the fringe tree T̂^v_t, again regarded as a copy of the original branching process; this shows that if b_v < ∞, then Taking the expectation yields, by dominated convergence, We want to show that the right-hand side of (14.27) tends to 0 as k → ∞. Define, for k ≥ 0, By (14.17), a.s., For convenience, we define Q_{∞;i} also when b_i = ∞, as some copy of Q_∞ independent of everything else. Then, (14.31) and (14.28) yield, for any k ≥ 0, Letting k → ∞ in (14.33), we obtain by dominated convergence, since Q_{k;i} ≤ Q_{0;i} and Σ_i e^{−2α b_i} Q_{0;i} = Q_1 ≤ Q_0 = W^2 < ∞ a.s., with strict inequality as soon as there is more than one non-zero term in the sum. Moreover, since b_i and Q_{∞;i} are independent, using (14.3)-(14.4) again, and thus there is equality in (14.35) a.s. Suppose that P(Q_∞ > 0) > 0. Conditioned on the offspring of the root, the fringe trees T̂^i_t, i ∈ N, are independent copies of T̂_t. Hence, the events {N ≥ 2}, {Q_{∞;1} > 0} and {Q_{∞;2} > 0} are independent, and thus with positive probability they occur together, and then there is strict inequality in (14.35). This contradiction shows that Q_∞ = 0 a.s.
Consequently, (14.28) shows that, since W > 0 a.s.,

Proof of Theorem 3.15. This is the only place in the present paper where we use the machinery of ultraproducts used in [20] to prove the results there. We refer to [20] for definitions and basic properties, and will here only give the additional arguments needed. We fix, as in [20], an ultrafilter ω on N. All ultralimits and ultraproducts are defined using ω.
Let (T n ) ∞ 1 = (T n , d n , μ n ) ∞ 1 be a convergent sequence of measured real trees. Thus (3.6) holds for some measures λ r ∈ P(M r ).
Taking r = 2 in Definition 3.5, we see by (3.3) and (3.1) that, in particular,
d_n(ξ^(n)_1, ξ^(n)_2) d−→ ζ (15.1)
for some random variable ζ. It follows from (15.1) that the sequence of random variables d_n(ξ^(n)_1, ξ^(n)_2) is tight, i.e., that for every ε > 0 there exists a constant C_ε such that for every n,
P( d_n(ξ^(n)_1, ξ^(n)_2) > C_ε ) ≤ ε. (15.2)
Fix ε > 0. By (15.2) and Fubini's theorem, there exists x_n ∈ T n such that
P( d_n(ξ^(n)_1, x_n) > C_ε ) ≤ ε. (15.3)
Let A_n := {x ∈ T n : d_n(x, x_n) ≤ C_ε}. Then (15.3) says
μ_n(A_n) ≥ 1 − ε. (15.4)
As in [20], form the ultraproduct T := Π_ω T n, and equip it with the pseudometric d := lim_ω d_n (which may take the value +∞) and the probability measure μ := Π_ω μ_n. Let x := [(x_n)_n] ∈ T and A := Π_ω A_n ⊆ T. For any y ∈ A, y = [(y_n)_n] for some y_n ∈ T n with y_n ∈ A_n and thus d_n(x_n, y_n) ≤ C_ε for every n; hence d(x, y) ≤ C_ε. (It is shown in [20] that X is μ-measurable.) Here x = x(ε) and X = X(ε) may depend on ε. However, two infinite balls B(x_1, ∞) and B(x_2, ∞) in T either coincide or are disjoint. (Such infinite balls are called clusters in [20].) Hence, considering only ε ≤ 1/2, it follows from (15.7) that all X(ε) coincide, and consequently form a cluster X with, using (15.7) again, μ(X) = 1.
This means that the sequence (T n, d_n, μ_n)_n is essentially bounded, in the terminology of [20]. Consequently, [20, Theorem 4] applies, and shows that
lim_ω τ_r(T n) = lim_ω τ_r(T n, d_n, μ_n) = τ_r(D), (15.8)
for every r ≥ 1 and some long dendron D (constructed from the ultraproduct T in a way that we do not have to consider further).
Remark 15.1. The proof shows that a tight sequence (T n ) n of measured real trees is essentially bounded. The converse does not hold, since we may let T n be arbitrary along some subsequences without affecting the ultraproduct and ultralimits, and thus the property of being essentially bounded. Nevertheless, a sequence (T n ) n such that every subsequence is essentially bounded is tight (as a consequence of [20,Theorem 4]). Similarly, a sequence is tight if and only if it is essentially bounded for every ultrafilter ω.