The distribution of the maximum protection number in simply generated trees

The protection number of a vertex $v$ in a tree is the length of the shortest path from $v$ to any leaf contained in the maximal subtree where $v$ is the root. In this paper, we determine the distribution of the maximum protection number of a vertex in simply generated trees, thereby refining a recent result of Devroye, Goh and Zhao. Two different cases can be observed: if the given family of trees allows vertices of outdegree $1$, then the maximum protection number is on average logarithmic in the tree size, with a discrete double-exponential limiting distribution. If no such vertices are allowed, the maximum protection number is doubly logarithmic in the tree size and concentrated on at most two values. These results are obtained by studying the singular behaviour of the generating functions of trees with bounded protection number. While a general distributional result by Prodinger and Wagner can be used in the first case, we prove a variant of that result in the second case.

1. Introduction

1.1. Simply generated trees. Simply generated trees were introduced by Meir and Moon [21], and, owing to their use in describing an entire class of trees, have become a general framework for studying random trees. A simply generated family of rooted trees is characterised by a sequence of weights associated with the different possible outdegrees of a vertex. Specifically, for a given sequence of nonnegative real numbers w_j (j ≥ 0), one defines the weight of a rooted ordered tree to be the product ∏_v w_{d(v)} over all vertices v of the tree, where d(v) denotes the outdegree (number of children) of v. Letting Φ(t) = Σ_{j≥0} w_j t^j be the weight generating function and Y(x) the generating function in which the coefficient of x^n is the sum of the weights over all n-vertex rooted ordered trees, one has the fundamental relation
(1) Y(x) = xΦ(Y(x)).
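The relation (1) determines the coefficients of Y(x), which one can compute by iterating Y ↦ xΦ(Y) on truncated power series. The following sketch (a minimal illustration, not part of the paper) does this for labelled trees, Φ(t) = e^t, where Y(x) is an exponential generating function; Cayley's formula then gives n! [x^n] Y(x) = n^{n−1}.

```python
from fractions import Fraction
import math

N = 8  # keep power series coefficients of x^0 .. x^(N-1)

def exp_series(s):
    """exp of a truncated series s with s[0] = 0, using (e^s)' = s' * e^s."""
    e = [Fraction(0)] * N
    e[0] = Fraction(1)
    for n in range(1, N):
        e[n] = sum(Fraction(k, n) * s[k] * e[n - k] for k in range(1, n + 1))
    return e

# Iterate Y <- x * Phi(Y) with Phi(t) = e^t; each pass fixes one more coefficient.
Y = [Fraction(0)] * N
for _ in range(N):
    Y = [Fraction(0)] + exp_series(Y)[:N - 1]  # the leading factor x is a shift

# Cayley's formula: there are n^(n-1) labelled rooted trees on n vertices.
counts = [int(Y[n] * math.factorial(n)) for n in range(1, N)]
assert counts == [n ** (n - 1) for n in range(1, N)]
print(counts)  # [1, 2, 9, 64, 625, 7776, 117649]
```

Since every application of the map multiplies by x, the coefficient of x^n is exact after n iterations, so N passes suffice at truncation order N.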
Common examples of simply generated trees are: plane trees, with weight generating function Φ(t) = 1/(1 − t); binary trees (Φ(t) = 1 + t^2); pruned binary trees (Φ(t) = (1 + t)^2); and labelled trees (Φ(t) = e^t). In the first three examples, Y(x) becomes an ordinary generating function with the total weight being the number of trees in the respective family, while Y(x) can be seen as an exponential generating function in the case of labelled trees. In addition to the fact that the notion of simply generated trees covers many important examples, there is also a strong connection to the probabilistic model of Bienaymé-Galton-Watson trees: here, one fixes a probability distribution on the set of nonnegative integers.
Next, a random tree is constructed by starting with a root that produces offspring according to the given distribution. In each subsequent step, all vertices of the current generation also produce offspring according to the same distribution, all independent of each other and independent of all previous generations. The process stops if none of the vertices of a generation have children. If the weights in the construction of a simply generated family are taken to be the corresponding probabilities of the offspring distribution, then one verifies easily that the distribution of a random n-vertex tree from that family (with probabilities proportional to the weights) is the same as that of the Bienaymé-Galton-Watson process, conditioned on the event that the final tree has n vertices.
Conversely, even if the weight sequence of a simply generated family does not represent a probability measure, it is often possible to determine an equivalent probability measure that produces the same random tree distribution. For example, random plane trees correspond to a geometric distribution while random rooted labelled trees correspond to a Poisson distribution. We refer to [7] and [16] for more background on simply generated trees and Bienaymé-Galton-Watson trees.

1.2. Protection numbers in trees.
Protection numbers in trees measure the distance to the nearest leaf successor. Formally, this can be expressed as follows.
Definition (Protection number). The protection number of a vertex v is the length of the shortest path from v to any leaf contained in the maximal subtree where v is the root.
Alternatively, the protection number can be defined recursively: a leaf has protection number 0, the parent of a leaf has protection number 1, and generally the protection number of an interior vertex is the minimum of the protection numbers of its children plus 1. In this paper, we will be particularly interested in the maximum protection number of a tree, which is the largest protection number among all vertices. Figure 1 shows an example of a tree along with the protection numbers of all its vertices. The study of protection numbers in trees began with Cheon and Shapiro [4] considering the average number of vertices with protection number of at least 2 (called 2-protected) in ordered trees. Several other authors contributed to knowledge in this direction, by studying the number of 2-protected vertices in various types of trees: k-ary trees [20]; digital search trees [8]; binary search trees [19]; ternary search trees [14]; tries and suffix trees [10]; random recursive trees [18]; and general simply generated trees from which some previously known cases were also obtained [6].
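The recursive description translates directly into code. In the following sketch, a tree is encoded as the nested tuple of its root subtrees (an illustrative encoding chosen here for convenience, not one used in the paper); the two functions compute the protection number of the root and the maximum protection number over all vertices.

```python
def protection(tree):
    """Protection number of the root: 0 for a leaf, else 1 + min over the children."""
    if not tree:
        return 0
    return 1 + min(protection(child) for child in tree)

def max_protection(tree):
    """Largest protection number among all vertices of the tree."""
    return max([protection(tree)] + [max_protection(child) for child in tree])

leaf = ()
path4 = ((((),),),)    # a path on 4 vertices: protection numbers 3, 2, 1, 0
cherry = (leaf, leaf)  # a root with two leaf children

assert protection(leaf) == 0 and max_protection(leaf) == 0
assert protection(cherry) == 1 and max_protection(cherry) == 1
assert protection(path4) == 3 and max_protection(path4) == 3
assert max_protection((cherry,)) == 2  # root whose only child is a cherry
```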
Generalising the concept of a vertex being 2-protected, k-protected vertices (vertices with protection number at least k) also became a topic of interest. Devroye and Janson [6] proved convergence of the probability that a random vertex in a random simply generated tree has protection number k. Copenhaver gave a closed formula for the number of k-protected vertices in all unlabelled rooted plane trees on n vertices along with expected values, and these results were extended by Heuberger and Prodinger [13]. A study of k-protected vertices in binary search trees was carried out by Bóna [2] and Bóna and Pittel [3]. Holmgren and Janson [15] proved general limit theorems for fringe subtrees and related tree functionals, applications of which include a normal limit law for the number of k-protected vertices in binary search trees and random recursive trees.
Moreover, the protection number of the root has also been studied for several families of trees. In [13], Heuberger and Prodinger derived the probability of a plane tree having a root that is k-protected, while the probability distribution of the protection number of the root of recursive trees was determined by Gołębiewski and Klimczak in [12]. The protection number of the root in simply generated trees, Pólya trees, and unlabelled non-plane binary trees was studied by Gittenberger, Gołębiewski, Larcher, and Sulkowska in [11], where they also obtained results relating to the protection number of a randomly chosen vertex.
Very recently, Devroye, Goh and Zhao [5] studied the maximum protection number in Bienaymé-Galton-Watson trees, referring to it as the leaf-height. Specifically, they showed the following: if X_n is the maximum protection number in a Bienaymé-Galton-Watson tree conditioned on having n vertices, then X_n/log n converges in probability to a constant if there is a positive probability that a vertex has exactly one child. If this is not the case, then X_n/log log n converges in probability to a constant.
Our aim in this paper is to refine the result of Devroye, Goh and Zhao by providing the full limiting distribution of the maximum protection number. For our analytic approach, the framework of simply generated trees is more natural than the probabilistic setting of Bienaymé-Galton-Watson trees, though as mentioned earlier the two are largely equivalent.
1.3. Statement of results. As was already observed by Devroye, Goh and Zhao in [5], there are two fundamentally different cases to be considered, depending on whether or not vertices of outdegree 1 are allowed (have nonzero weight) in the given family of simply generated trees. If such vertices can occur, then we find that the maximum protection number of a random tree with n vertices is on average of order log n, with a discrete double-exponential distribution in the limit. On the other hand, if there are no vertices of outdegree 1, then the maximum protection number is on average of order log log n. There is an intuitive explanation for this phenomenon. If outdegree 1 is allowed, it becomes easy to create vertices with high protection number: if the subtree rooted at a vertex is an (h + 1)-vertex path, then this vertex has protection number h. On the other hand, if outdegree 1 is forbidden, then the smallest possible subtree rooted at a vertex of protection number h is a complete binary tree with 2^{h+1} − 1 vertices. An illustration of the two cases is given in Figure 2.
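The behaviour of the maximum protection number can be observed directly on small plane trees (where outdegree 1 is allowed and all weights equal 1). The following sketch, a brute-force check not taken from the paper, enumerates all plane trees with n vertices as nested tuples of root subtrees and tabulates the maximum protection number; already for n = 4, both the path (maximum 3) and flatter trees appear.

```python
from collections import Counter

def trees(n):
    """Yield all plane trees with n vertices; a tree is the tuple of its root subtrees."""
    yield from forests(n - 1)

def forests(m):
    """Yield all tuples of plane trees with m vertices in total."""
    if m == 0:
        yield ()
        return
    for k in range(1, m + 1):
        for first in trees(k):
            for rest in forests(m - k):
                yield (first,) + rest

def protection(t):
    return 0 if not t else 1 + min(protection(c) for c in t)

def max_protection(t):
    return max([protection(t)] + [max_protection(c) for c in t])

dist = Counter(max_protection(t) for t in trees(4))
# The 5 plane trees on 4 vertices split as: maximum 1 (3 trees), 2 (1 tree), 3 (the path).
assert dist == Counter({1: 3, 2: 1, 3: 1})
```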
In the case where vertices of outdegree 1 can occur, the limiting distribution turns out to be a discrete double-exponential distribution that also occurs in many other combinatorial examples, and for which general results are available; see Section 2.2. These results are adapted in Section 5.2 to the case where there are no vertices of outdegree 1. In the following results, we make a common technical assumption, stating formally that there is a positive real number τ, less than the radius of convergence of Φ, such that Φ(τ) = τΦ′(τ) (see Section 2.1 for further details). This is equivalent to the offspring distribution of the associated Bienaymé-Galton-Watson process having a finite exponential moment, which is the case for all the examples mentioned earlier (plane trees, binary trees, pruned binary trees, labelled trees). This assumption is crucial for the analytic techniques that we are using, which are based on an asymptotic analysis of generating functions. However, it is quite likely that our main results remain valid under somewhat milder conditions.

Theorem 1. Given a family of simply generated trees with w_1 = Φ′(0) ≠ 0, the proportion of trees of size n whose maximum protection number is at most h is asymptotically given by as n → ∞ and h = log_d(n) + O(1), where κ (given in (55)) and d = (ρΦ′(0))^{−1} > 1 are positive constants. Moreover, the expected value of the maximum protection number in trees with n vertices is where γ denotes the Euler-Mascheroni constant and ψ_d is the 1-periodic function that is defined by the Fourier series

In the case where vertices of outdegree 1 are excluded, we show that the maximum protection number is strongly concentrated. In fact, with high probability it only takes on one of at most two different values (depending on the size of the tree). The precise result can be stated as follows.
Theorem 2. Given a family of simply generated trees with w_1 = Φ′(0) = 0, set r = min{i ∈ N : i ≥ 2 and w_i ≠ 0} and D = gcd{i ∈ N : w_i ≠ 0}. The proportion of trees of size n whose maximum protection number is at most h is asymptotically given by as n → ∞, n ≡ 1 (mod D), and h = log_r(log_d(n)) + O(1), where κ = w_r λ_1^r / Φ(τ) and d = µ^{−r} > 1 are positive constants with λ_1 and µ defined in (61) and (62) respectively (see Lemma 5.5). Moreover, there is a sequence of positive integers h_n such that the maximum protection number of a tree with n vertices is h_n or h_n + 1 with high probability (i.e., with probability tending to 1 as n → ∞, where n ≡ 1 (mod D)).
Specifically, with m_n = log_r(log_d(n)) and {m_n} denoting its fractional part, one can set If we restrict to those values of n for which {m_n} ∈ [ε, 1 − ε], where ε > 0 is fixed, then with high probability X_n is equal to ⌈m_n⌉.
Note that in the setting of Theorem 2, it is easy to see that there are no trees of size n if n ≢ 1 (mod D). In the setting of Theorem 1, we have gcd{i ∈ N : w_i ≠ 0} = 1 because w_1 ≠ 0. Theorem 1 is illustrated in Figure 3, while Theorem 2 is illustrated in Figure 4.

The proof of Theorem 1 relies on a general distributional result provided in [22], see Theorem 2.1. For the proof of Theorem 2, however, we will need a variant for doubly-exponential convergence of the dominant singularities. The statement and proof are similar to the original, and we expect that this variant will be useful in other contexts, too.

Theorem 3. Let Y_h(x) = Σ_{n≥0} y_{h,n} x^n (h ≥ 0) be a sequence of generating functions with nonnegative coefficients such that y_{h,n} is nondecreasing in h with lim_{h→∞} y_{h,n} = y_n, and let X_n denote the sequence of random variables with support N_0 = {0, 1, 2, . . .} defined by P(X_n ≤ h) = y_{h,n}/y_n. Assume that each generating function Y_h has a singularity at ρ_h ∈ R such that
(1) ρ_h = ρ(1 + κζ^{r^h} + o(ζ^{r^h})) as h → ∞ for some constants ρ > 0, κ > 0, ζ ∈ (0, 1), and r > 1.
(2) Y_h(x) can be continued analytically to the domain {x ∈ C : |x| ≤ (1 + δ)|ρ_h|, |Arg(x/ρ_h − 1)| > φ} for some fixed δ > 0 and φ ∈ (0, π/2), and holds within this domain, uniformly in h, where U_h(x) is analytic and uniformly bounded in h within the aforementioned region, α ∈ R \ N_0, and A_h is a constant depending on h such that lim_{h→∞} A_h = A ≠ 0. Finally,

In the next theorem, we show that the consequences of this distributional result are quite drastic.
Theorem 4. Assume the conditions of Theorem 3. There is a sequence of nonnegative integers h_n such that X_n is equal to h_n or h_n + 1 with high probability. Specifically, with m_n = log_r(log_d(n)) and {m_n} denoting its fractional part, one can set If we restrict to those values of n for which {m_n} ∈ [ε, 1 − ε], where ε > 0 is fixed, then with high probability X_n is equal to ⌈m_n⌉.

2.1. Basic facts about simply generated trees. For our purposes, we will make the following typical technical assumptions: first, we assume without loss of generality that w_0 = 1, or equivalently Φ(0) = 1. In other words, leaves have an associated weight of 1, which can be achieved by means of a normalising factor if necessary. Moreover, to avoid trivial cases in which the only possible trees are paths, we assume that w_j > 0 for at least one j ≥ 2. Finally, we assume that there is a positive real number τ, less than the radius of convergence of Φ, such that Φ(τ) = τΦ′(τ). As mentioned earlier, this is equivalent to the offspring distribution having exponential moments.
It is well known (see e.g. [7, Section 3.1.4]) that if such a τ exists, it is unique, and the radius of convergence ρ of Y can be expressed as ρ = τ/Φ(τ), which is equivalent to ρ and τ satisfying the simultaneous equations y = xΦ(y) and 1 = xΦ′(y) (which essentially mean that the implicit function theorem fails at the point (ρ, τ)). Moreover, Y has a square root singularity at ρ with τ = Y(ρ), with a singular expansion of the form
(4) Y(x) = a − b(1 − x/ρ)^{1/2} + c(1 − x/ρ) + O((1 − x/ρ)^{3/2}).
The coefficients a, b, c can be expressed in terms of Φ and τ. In particular, we have a = τ and b = (2Φ(τ)/Φ″(τ))^{1/2}. In fact, there is a full Newton-Puiseux expansion in powers of (1 − x/ρ)^{1/2}. If the weight sequence is aperiodic, i.e., gcd{j : w_j ≠ 0} = 1, then ρ is the only singularity on the circle of convergence of Y, and for sufficiently small ε > 0 there are no solutions to the simultaneous equations y = xΦ(y) and 1 = xΦ′(y) with |x| ≤ ρ + ε and |y| ≤ τ + ε other than (x, y) = (ρ, τ). Otherwise, if this gcd is equal to D, there are D singularities at ρe^{2kπi/D} (k ∈ {0, 1, . . . , D − 1}), all with the same singular behaviour. In the following, we assume for technical simplicity that the weight sequence is indeed aperiodic, but the proofs are readily adapted to the periodic setting; see Remarks 3.17 and 5.9. By means of singularity analysis [9, Chapter VI], the singular expansion (4) yields an asymptotic formula for the coefficients of Y: we have y_n = [x^n]Y(x) ∼ (b/(2√π)) n^{−3/2} ρ^{−n}. If the weight sequence corresponds to a probability distribution, then y_n is the probability that an unconditioned Bienaymé-Galton-Watson tree has exactly n vertices when the process ends. For other classes such as plane trees or binary trees, y_n represents the number of n-vertex trees in the respective class.
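For plane trees (Φ(t) = 1/(1 − t)) these quantities are explicit: Φ(τ) = τΦ′(τ) gives τ = 1/2, hence ρ = τ/Φ(τ) = 1/4 and b = (2Φ(τ)/Φ″(τ))^{1/2} = 1/2. The sketch below (an illustration of the standard facts recalled above, not code from the paper) recovers τ by bisection and checks the coefficient asymptotics y_n ∼ (b/(2√π)) n^{−3/2} ρ^{−n} against the exact count y_n = Catalan(n − 1).

```python
import math

Phi   = lambda t: 1 / (1 - t)
dPhi  = lambda t: 1 / (1 - t) ** 2
ddPhi = lambda t: 2 / (1 - t) ** 3

# Solve Phi(tau) = tau * Phi'(tau) by bisection; f is positive to the left of tau.
f = lambda t: Phi(t) - t * dPhi(t)
lo, hi = 1e-9, 1 - 1e-9
for _ in range(100):
    mid = (lo + hi) / 2
    lo, hi = (mid, hi) if f(mid) > 0 else (lo, mid)
tau = (lo + hi) / 2
rho = tau / Phi(tau)
assert abs(tau - 0.5) < 1e-9 and abs(rho - 0.25) < 1e-9

# Compare log y_n with the asymptotic log for n = 500, y_n = Catalan(n-1).
b = math.sqrt(2 * Phi(tau) / ddPhi(tau))
n = 500
log_exact = math.log(math.comb(2 * n - 2, n - 1)) - math.log(n)
log_asym = math.log(b / (2 * math.sqrt(math.pi))) - 1.5 * math.log(n) - n * math.log(rho)
assert abs(log_exact - log_asym) < 0.01
```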

2.2. A general distributional result. The discrete double-exponential distribution in Theorem 1 has been observed in many other combinatorial instances, for example the longest run of zeros in a random 0-1-string, the longest horizontal segment in Motzkin paths, or the maximum outdegree in plane trees. This can often be traced back to the behaviour of the singularities of associated generating functions. The following general result, taken from [22], will be a key tool for us.
Theorem 2.1 (see [22, Theorem 1]). Let Y_h(x) = Σ_{n≥0} y_{h,n} x^n (h ≥ 0) be a sequence of generating functions with nonnegative coefficients such that y_{h,n} is nondecreasing in h, and let X_n denote the sequence of random variables with support N_0 = {0, 1, 2, . . .} defined by
(5) P(X_n ≤ h) = y_{h,n}/y_n.
Assume, moreover, that each generating function Y_h has a singularity ρ_h ∈ R such that
(1) ρ_h = ρ(1 + κζ^h + o(ζ^h)) as h → ∞ for some constants ρ > 0, κ > 0 and ζ ∈ (0, 1).
(2) Y_h(x) can be continued analytically to the domain
(6) {x ∈ C : |x| ≤ (1 + δ)|ρ_h|, |Arg(x/ρ_h − 1)| > φ}
for some fixed δ > 0 and φ ∈ (0, π/2), and holds within this domain, uniformly in h, where U_h(x) is analytic and uniformly bounded in h within the aforementioned region, α ∈ R \ N_0, and A_h is a constant depending on h such that lim_{h→∞} A_h = A ≠ 0. Finally, for a function U(x) that is analytic within this region. Then the asymptotic formula holds as n → ∞ and h = log_d(n) + O(1), where d = ζ^{−1}. Hence the shifted random variable X_n − log_d(n) converges weakly to a limiting distribution if n runs through a subset of the positive integers such that the fractional part {log_d(n)} of log_d(n) converges.
As we will see, the conditions of this theorem hold for the random variable X_n given by the maximum protection number of a random n-vertex tree from a simply generated family that satisfies our technical assumptions. Under slightly stronger assumptions, which also hold in our case, one has the following theorem on the expected value of the random variable X_n.

Theorem 2.2 (see [22, Theorem 2]). In the setting of Theorem 2.1, assume additionally that
(1) there exists a constant K such that y_{h,n} = y_n for h > Kn,
(2) the asymptotic expansions of Y_h and Y around their singularities are given by
Then the mean of X_n satisfies where γ denotes the Euler-Mascheroni constant and ψ_d is given by (2).

2.3. A system of functional equations. As a first step of our analysis, we consider a number of auxiliary generating functions and derive a system of functional equations that is satisfied by these generating functions. The family of simply generated trees and the associated weight generating function Φ are regarded as fixed throughout. Let h be a positive integer and k an integer with 0 ≤ k ≤ h. Consider trees with the following two properties:
P1. No vertex has a protection number greater than h.
P2. The root is at least k-protected (but also at most h-protected).
Let Y_{h,k}(x) be the associated generating function, where x marks the number of vertices. Note in particular that when k = 0, we obtain the generating function for trees whose maximum protection number is at most h. Hence we can express the probability that the maximum protection number of a random n-vertex tree (from our simply generated family) is at most h as the quotient [x^n]Y_{h,0}(x) / [x^n]Y(x). This is precisely the form of (5), and indeed our general strategy will be to show that the generating functions Y_{h,0} satisfy the technical conditions of Theorem 2.1. Compared to the examples given in [22], this will be a rather lengthy technical task. However, we believe that the general method, in which a sequence of functional equations is shown to converge uniformly in a suitable region, is also potentially applicable to other instances and therefore interesting in its own right.
Let us now derive a system of functional equations, using the standard decomposition of a rooted tree into the root and its branches. Clearly, if a tree has property P1, then this must also be the case for all its branches. Moreover, property P2 is satisfied for k > 0 if and only if the root of each of the branches is at least (k − 1)-protected, but not all of them are h-protected (as this would make the root (h + 1)-protected). Thus, for 1 ≤ k ≤ h, we have
(7) Y_{h,k}(x) = x(Φ(Y_{h,k−1}(x)) − Φ(Y_{h,h}(x))).
Note that the only case in which the root is only 0-protected is when the root is the only vertex. Hence we have
(8) Y_{h,0}(x) = x + Y_{h,1}(x).
The analytic properties of the system of functional equations given by (7) and (8) will be studied in the following section, culminating in Proposition 3.16, which shows that Theorem 2.1 is indeed applicable to our problem.
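The system above can be iterated on truncated power series to compute the coefficients of Y_{h,0} explicitly. The sketch below (an illustration under the stated decomposition, for plane trees with Φ(t) = 1/(1 − t)) exploits the fact that every equation carries a factor x, so repeated substitution fixes one further coefficient per pass. As sanity checks: for h large the coefficients are Catalan numbers (no tree with n ≤ 8 vertices has maximum protection number above 10), and for h = 1 hand enumeration gives 1, 1, 1, 3 trees of sizes 1, ..., 4.

```python
N = 9  # track coefficients of x^0 .. x^8

def geom(s):
    """1/(1 - s) for a truncated series s with s[0] = 0, via g = 1 + s*g."""
    g = [0] * N
    g[0] = 1
    for n in range(1, N):
        g[n] = sum(s[k] * g[n - k] for k in range(1, n + 1))
    return g

def Y_h0(h):
    """Coefficients of Y_{h,0} obtained by iterating the system for k = 0..h."""
    Y = {k: [0] * N for k in range(h + 1)}
    for _ in range(2 * N):  # each pass fixes at least one more coefficient
        phi_top = geom(Y[h])  # Phi(Y_{h,h})
        new = {}
        for k in range(1, h + 1):
            phi_prev = geom(Y[k - 1])  # Phi(Y_{h,k-1})
            # Y_{h,k} = x * (Phi(Y_{h,k-1}) - Phi(Y_{h,h}))
            new[k] = [0] + [phi_prev[n] - phi_top[n] for n in range(N - 1)]
        y0 = list(new[1])  # Y_{h,0} = x + Y_{h,1}
        y0[1] += 1
        new[0] = y0
        Y = new
    return Y[0]

# For h = 10, no constraint is active for n <= 8: the Catalan numbers appear.
assert Y_h0(10)[1:] == [1, 1, 2, 5, 14, 42, 132, 429]
# For h = 1, hand enumeration gives 1, 1, 1, 3 trees of sizes 1..4.
assert Y_h0(1)[:5] == [0, 1, 1, 1, 3]
```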
3. Analysis of the functional equations

3.1. Contractions and implicit equations. This section is devoted to a detailed analysis of the generating functions Y_{h,k} that satisfy the system of equations given by (7) and (8). The first step will be to reduce it to a single implicit equation satisfied by Y_{h,1}, which is then shown to converge to the functional equation (1) in a sense that will be made precise. This is then used to infer information on the region of analyticity of Y_{h,1} as well as its behaviour around the dominant singularity, which is also shown to converge to the dominant singularity of Y. This information is collected in Proposition 3.16 at the end of the section.
In the following, we will prove various statements for sufficiently small ε > 0. In several, but finitely many, steps it might be necessary to decrease ε; we tacitly assume that ε is always small enough to ensure validity of all statements up to the given point. In order to avoid ambiguities, we will always assume that ε < 1. Let us remark that ε and other constants as well as all implied O-constants that occur in this section depend on the specific simply generated family of trees (in particular the weight generating function Φ and therefore ρ and τ ), but nothing else.
Recall that ρ is the dominant singularity of the generating function Y of our simply generated family of trees.
Let us write D_δ(w) := {z ∈ C : |z − w| < δ} for open disks. For ε > 0, we define the region Ξ_ε, and we also write Ξ̄_ε for its closure. As τ is less than the radius of convergence of Φ by our assumptions, we may choose ε > 0 sufficiently small such that τ + 2ε is still smaller than the radius of convergence of Φ.
Consider the function defined by f_{x,z}(y) = x(Φ(y) − Φ(z)). We can rewrite the functional equation (7) in terms of this function as Y_{h,k} = f_{x,Y_{h,h}(x)}(Y_{h,k−1}). Writing f^{(j)}_{x,z} for the j-fold iterate of f_{x,z}, i.e., f^{(0)}_{x,z}(y) = y and f^{(j+1)}_{x,z}(y) = f_{x,z}(f^{(j)}_{x,z}(y)), we obtain
(10) Y_{h,h}(x) = f^{(h−1)}_{x,Y_{h,h}(x)}(Y_{h,1}(x))
and
(11) Y_{h,1}(x) = x(Φ(x + Y_{h,1}(x)) − Φ(Y_{h,h}(x))).
This means that (10) and (11) are a system of two functional equations for Y_{h,1}(x) and Y_{h,h}(x). We intend to solve (10) for Y_{h,h}(x) and then plug the solution into (11). As a first step towards this goal, we show that f_{x,z} represents a contraction on a suitable region.
For the remainder of this section, λ will be defined as in Lemma 3.2.

Lemma 3.3. For sufficiently small ε > 0, f_{x,z} maps Ξ^{(2)}_ε to itself and is a contraction with Lipschitz constant λ.
Proof. The fact that f_{x,z} maps Ξ^{(2)}_ε to itself for sufficiently small ε > 0 is a direct consequence of Lemma 3.1.
Making use of Lemma 3.2, the contraction property now follows by a standard argument. Banach's fixed point theorem together with Lemma 3.3 implies that f_{x,z} has a unique fixed point in Ξ^{(2)}_ε. This fixed point will be denoted by g(x, z), i.e.,
(12) g(x, z) = f_{x,z}(g(x, z)).
If we plug in 0 for z, we see that (12) holds with g(x, 0) = 0, so uniqueness of the fixed point implies that
(13) g(x, 0) = 0.
Lemma 3.4. For sufficiently small ε > 0, g is an analytic function, and ∂g/∂z (x, z) is bounded.
Proof. Note that, using Lemma 3.2, we have that |1 − (∂/∂y) f_{x,z}(y)| ≥ 1 − λ is bounded away from zero for sufficiently small ε > 0 and (x, y, z) ∈ Ξ_ε. Thus the analytic implicit function theorem shows that g as defined by (12) is analytic and has bounded partial derivative ∂g/∂z (x, z) on Ξ_ε for sufficiently small ε > 0.
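The contraction can be watched numerically. For plane trees, f_{x,z}(y) = x(1/(1 − y) − 1/(1 − z)); the following sketch (sample values of x and z chosen for illustration only) iterates this map from a starting point near 0 and checks that the iterates settle on a fixed point g(x, z), with g(x, 0) = 0 as in (13).

```python
def f(x, z, y):
    # f_{x,z}(y) = x * (Phi(y) - Phi(z)) with Phi(t) = 1/(1 - t)  (plane trees)
    return x * (1 / (1 - y) - 1 / (1 - z))

def fixed_point(x, z, y=0.0, iterations=200):
    for _ in range(iterations):
        y = f(x, z, y)
    return y

# g(x, 0) = 0: even starting away from 0, the iterates are pulled back to 0.
assert abs(fixed_point(0.2, 0.0, y=0.05)) < 1e-12
# For z != 0, the fixed point g(x, z) satisfies y = f_{x,z}(y).
g = fixed_point(0.2, 0.1)
assert abs(g - f(0.2, 0.1, g)) < 1e-12
```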
We now intend to solve (10) for Y_{h,h}(x). Therefore, we consider the equation
(14) f^{(h−1)}_{x,z}(y) = z
and attempt to solve it for z. For large h, f^{(h−1)}_{x,z}(y) will be close to the fixed point g(x, z) of f_{x,z} by the Banach fixed point theorem.
Therefore, we define Λ_h as the difference between the two: Λ_h(x, y, z) := f^{(h−1)}_{x,z}(y) − g(x, z). With this notation, (14) can be rewritten as
(15) z = g(x, z) + Λ_h(x, y, z).
We first establish bounds on Λ_h.
Proof. Since g(x, z) is defined as the fixed point of f_{x,z} and f_{x,z} is a contraction with Lipschitz constant λ, we have |Λ_h(x, y, z)| = |f^{(h−1)}_{x,z}(y) − g(x, z)| ≤ λ^{h−1} |y − g(x, z)| = O(λ^h) for (x, y, z) ∈ Ξ_ε, so we have shown (16).
In order to apply the analytic implicit function theorem to the implicit equation (10) for Y h,h , we will need to show that the derivative of the difference of the two sides of (15) with respect to z is nonzero. The derivative of the second summand on the right-hand side of (15) is small by (17), so we first consider the remaining part of the equation.
Lemma 3.6. There is a δ > 0 such that for sufficiently small ε > 0, we have
Proof. To compute ∂g/∂z (x, z), we differentiate (12) with respect to z and obtain
Note that the denominator is nonzero for (x, y, z) ∈ Ξ_ε, and by (13), it follows that for ε → 0, uniformly in x. Therefore, we have 1 + ρΦ′(0) and choosing ε small enough yields the result.

We need bounds for z such that we remain in the region where our previous results hold. In fact, (13) shows that z = 0 would be a solution when the summand Λ_h (which is O(λ^h)) is removed from the implicit equation, so we expect that the summand Λ_h does not perturb z too much. This is shown in the following lemma.
Lemma 3.7. There exists an ε > 0 such that for sufficiently large h, there is a unique analytic function q_h : Ξ

Proof. We choose h sufficiently large such that (17) implies where δ is taken as in Lemma 3.6, and such that (20) implies for all (x, y, z) ∈ Ξ_ε for which (15) holds.
By definition of f, we have f_{x,0}(0) = 0 and therefore f^{(h−1)}_{x,0}(0) = 0, so z = 0 is a solution of (14) for y = 0. By (18) and (23), we have for (x, y, z) ∈ Ξ_ε. The analytic implicit function theorem thus implies that, for every x, there is an analytic function q_h defined in a neighbourhood of (x, 0) such that (22) holds there and such that q_h(x, 0) = 0. Next we show that this extends to the whole region Ξ_ε: let r(x_0) be the supremum of all r < τ − ρ + ε for which there is an analytic continuation of q_h (in the second variable) to D_r(0) with values in Ξ_ε. Suppose for contradiction that r(x_0) < τ − ρ + ε. Consider a point y_0 with |y_0| = r(x_0), and take a sequence y_n → y_0 such that |y_n| < r(x_0). Note that |q_h(x_0, y_n)| ≤ ε/2 by (24). Without loss of generality, we can assume that q_h(x_0, y_n) converges to some q_0 with |q_0| ≤ ε/2 as n → ∞ (by compactness). By continuity, we have q_0 = f^{(h−1)}_{x_0,q_0}(y_0). Since (x_0, y_0, q_0) ∈ Ξ_ε, we can still use the analytic implicit function theorem together with (25) to conclude that there is a neighbourhood of (x_0, y_0, q_0) in which the equation z = f^{(h−1)}_{x,z}(y) has an analytic solution z = q̃_h(x, y) with q̃_h(x_0, y_0) = q_0. We assume the neighbourhood to be chosen small enough such that q̃_h(x, y) ∈ Ξ_ε for all (x, y) in the neighbourhood. For large enough n, this neighbourhood contains (x_0, y_n, q_h(x_0, y_n)), so we must have q_h(x_0, y_n) = q̃_h(x_0, y_n) for all those n. This implies that q̃_h is an analytic continuation of q_h in a neighbourhood of (x_0, y_0) with values in Ξ_ε, contradicting the choice of r(x_0).

Thus we have Y_{h,h}(x) = q_h(x, Y_{h,1}(x)), at least in a neighbourhood of 0, which we can plug into (11) to get Y_{h,1}(x) = x(Φ(x + Y_{h,1}(x)) − Φ(q_h(x, Y_{h,1}(x)))); this can be rewritten as Y_{h,1}(x) = F_h(x, Y_{h,1}(x)). The function F_h is analytic on Ξ^{(1,2)}_ε by Lemma 3.8 and the fact that Φ is analytic for these arguments. Note also that . By the estimate on q_h in Lemma 3.8, we also have uniformly for (x, y) ∈ Ξ^{(1,2)}_ε. Using the same argument as in Lemma 3.5, we can also assume (redefining ε if necessary) that and analogous estimates for any finite number of partial derivatives hold as well.
Having reduced the original system of equations to a single equation for Y_{h,1}(x), we now deduce properties of its dominant singularity. Since Y_{h,1}(x) has a power series with nonnegative coefficients, by Pringsheim's theorem it must have a dominant positive real singularity that we denote by ρ_h. Since the coefficients of Y_{h,1}(x) are bounded above by those of Y(x), we also know that ρ_h ≥ ρ.
On the other hand, we also have so by continuity there must exist some x_0 ∈ (ρ/2, ρ) such that Moreover, if h is large enough we have as x_0 and thus also Y_{h,1}(x_0) are bounded below by positive constants, and analogously But this would mean that Y_{h,1} has a square root singularity at x_0 < ρ_h (compare the discussion in Section 3.3 later), and we reach a contradiction. Hence we can assume that Assume next that ρ_h > ρ + λ^{h/2}. Note here that u_1 ≤ τ + ε/2 + λ^{h/2} by (28), thus u_1 is in the region of analyticity of Φ (again assuming h to be large enough). However, since u ≤ ρΦ(u) for all positive real u for which Φ(u) is well-defined (the line u ↦ u/ρ is a tangent to the graph of the convex function Φ at τ), for sufficiently large h the right-hand side in (29) is necessarily greater than the left, and we reach another contradiction. So it follows that ρ_h ≤ ρ + λ^{h/2}, and in particular (ρ_h, η_{h,1}) lies within the region of analyticity of F_h. So the singularity at ρ_h must be due to the implicit function theorem failing at this point:

The second equation in particular gives us
by (27). Since Φ′ is increasing for positive real arguments and we know that ρΦ′(τ) = 1 and

As we have established that η_{h,1} → τ − ρ as h → ∞, we will use the abbreviation η_1 := τ − ρ in the following. This will later be generalised to η_{h,k} := Y_{h,k}(ρ_h) → η_k; see Sections 4 and 5. For our next step, we need a multidimensional generalisation of Rouché's theorem.

Theorem 3.10. Let Ω be a bounded domain in C^n whose boundary ∂Ω is piecewise smooth. Suppose that u, v : Ω → C^n are analytic functions, and that the boundary of Ω does not contain any zeros of u. Moreover, assume that for every z ∈ ∂Ω, there is at least one coordinate j for which |u_j(z)| > |v_j(z)| holds. Then u and u + v have the same number of zeros in Ω.
Lemma 3.11. If ε is chosen sufficiently small and h sufficiently large, then the pair (ρ_h, η_{h,1}) is the only solution to the simultaneous equations F_h(x, y) = y and ∂F_h/∂y (x, y) = 1 with (x, y) ∈ Ξ^{(1,2)}_ε.

Proof. Note that (ρ, η_1) is a solution to the simultaneous equations F_∞(x, y) = x(Φ(x + y) − 1) = y and ∂F_∞/∂y (x, y) = xΦ′(x + y) = 1, and that there is no other solution with |x| ≤ ρ + ε and |y| ≤ η_1 + ε if ε is chosen sufficiently small, by our assumptions on the function Φ (see Section 2.1). We take Ω = Ξ^{(1,2)}_ε in Theorem 3.10 and set u(x, y) = (F_∞(x, y) − y, ∂F_∞/∂y (x, y) − 1) and v(x, y) = (F_h(x, y) − F_∞(x, y), ∂F_h/∂y (x, y) − ∂F_∞/∂y (x, y)). Note that both coordinates of v are O(λ^h) by (26) and (27). Since the boundary ∂Ω contains no zeros of u, if we choose h sufficiently large, then the conditions of Theorem 3.10 are satisfied.
Consequently, u and u + v have the same number of zeros in Ω, namely 1. Solutions to the simultaneous equations F_h(x, y) = y and ∂F_h/∂y (x, y) = 1 are precisely the zeros of u + v, so this completes the proof.
At this point, it already follows from general principles (see the discussion in [9, Chapter VII.4]) that for every sufficiently large h, Y_{h,1} has a dominant square root singularity at ρ_h, and is otherwise analytic in a domain of the form (6). As we will need uniformity of the asymptotic expansion and a uniform bound for the domain of analyticity, we will make this more precise in the following section.

3.2. Asymptotic expansion and region of analyticity.
Lemma 3.12. Let ε > 0 be such that all previous lemmata hold. There exist δ_1, δ_2 > 0, a positive number h_0, and analytic functions R_h and S_h such that (30) holds for (x, y) ∈ D_{δ_1}(ρ_h) × D_{δ_2}(η_{h,1}) and h ≥ h_0, such that |R_h| is bounded from above and below by positive constants on D_{δ_1}(ρ_h) × D_{δ_2}(η_{h,1}) for h ≥ h_0 (uniformly in h), and |S_h| is bounded from above and below by positive constants on D_{δ_2}(η_{h,1}) for h ≥ h_0 (uniformly in h). Furthermore, the sequences R_h and S_h converge uniformly to analytic functions R and S, respectively. The same holds for their partial derivatives.

By (34), this can be rewritten as
Rearranging and using the definition of S_h(y) as well as (32) yields for all y ∈ D_{δ_2}(η_{h,1}) and h ≥ h_0. Thus |S_h(y)| is bounded from below and above by positive constants for every such y and h. We now define R_h(x, y) such that (30) holds, which is equivalent to Rearranging and using the definition of R_h(x, y) yields by (31) for x ∈ D_{δ_1}(ρ_h) \ {ρ_h} and y ∈ D_{δ_2}(η_{h,1}) and h ≥ h_0. In other words, |R_h(x, y)| is bounded from below and above by positive constants for these (x, y) and h.
To prove analyticity of R_h, we use Cauchy's formula to rewrite it as a contour integral (note that the integrand has a removable singularity at ζ = ρ_h in this case). The integral is also defined for x = ρ_h and clearly defines an analytic function on D_{δ_1}(ρ_h) × D_{δ_2}(η_{h,1}) whose absolute value is bounded from above and below by positive constants.
To see uniform convergence of R_h, we use Cauchy's formula once more to obtain a representation valid for y ∈ D_{δ_2}(η_{h,1}). Without loss of generality, h_0 is large enough that |ρ_h − ρ| < δ_1/4 and |η_{h,1} − η_1| < δ_2/4. By Cauchy's theorem, we can change the contour of integration so that (35) yields an estimate for y ∈ D_{δ_2/4}(η_1), as the deformation happens within the region of analyticity of the integrand. Using (26) and the fact that the denominator of the integrand is bounded away from zero then gives a bound for x ∈ D_{δ_1/4}(ρ) and y ∈ D_{δ_2/4}(η_1). By Lemma 3.9, replacing the remaining occurrences of ρ_h by ρ introduces another error term of O(λ^{h/2}), so that we obtain the desired estimate for x ∈ D_{δ_1/4}(ρ) and y ∈ D_{δ_2/4}(η_1). Of course, the O-constants do not depend on x and y; therefore, we have uniform convergence. Analogously, we obtain the corresponding estimate for S_h for y ∈ D_{δ_2/4}(η_1), and analogous results hold for the partial derivatives. We replace δ_1 by δ_1/4 and δ_2 by δ_2/4 to get the result as stated in the lemma.
Lemma 3.13.

Proof. We first choose δ_1 and δ_2 as in Lemma 3.12. Then y = F_h(x, y) and Lemma 3.12 imply an estimate in which the fraction on the right-hand side is bounded by some absolute constant, again according to Lemma 3.12. So, by decreasing δ_1 if necessary, the right-hand side is at most δ_2/2.

Lemma 3.14. Let ε > 0 be such that the previous lemmata hold. There exists δ_0 > 0 such that, for all sufficiently large h, the following asymptotic formula holds.

Proof. By (30), the function Y_{h,1} is determined by an implicit equation. For some h ≥ h_0, let r_h be the supremum of all r ≤ δ_1 such that Y_{h,1} can be continued analytically to C(r) with values in D_{δ_2/2}(η_{h,1}). We claim that r_h = δ_1.
Suppose for contradiction that r_h < δ_1 and let x_∞ ∈ C(r_h). Choose a sequence of elements x_n ∈ C(r_h) converging to x_∞ as n → ∞ and set y_n := Y_{h,1}(x_n) for all n. By assumption, we have |y_n − η_{h,1}| ≤ δ_2/2. By replacing the sequence x_n by a subsequence if necessary, we may assume that the sequence y_n converges to some limit y_∞. Note that |y_∞ − η_{h,1}| ≤ δ_2/2. By continuity of F_h, we also have y_∞ = F_h(x_∞, y_∞). As (x_∞, y_∞) ∈ Ξ_ε^{(1,2)} with x_∞ ≠ ρ_h, Lemma 3.11 and the analytic implicit function theorem imply that Y_{h,1} can be continued analytically in a suitable open neighbourhood of x_∞. This neighbourhood can be chosen small enough that the inequality |Y_{h,1}(x) − η_{h,1}| ≤ δ_2 holds for all x in it. However, Lemma 3.13 then implies that we actually have |Y_{h,1}(x) − η_{h,1}| ≤ δ_2/2 for all such x.
The open neighbourhoods associated with all x_∞ ∈ C(r_h) cover the compact set C(r_h), so a finite subcover can be selected. Thus we find an analytic continuation of Y_{h,1} to C(r′_h) for some r′_h ∈ (r_h, δ_1) with values still in D_{δ_2/2}(η_{h,1}), contradicting the choice of r_h.
Thus we have r_h = δ_1. In particular, we choose h large enough that |η_{h,1} − η_1| is sufficiently small. Rearranging (39) yields (40). We know from Lemma 3.12 that R_h is bounded above and S_h is bounded below on D_{δ_1}(ρ_h) × D_{δ_2}(η_{h,1}) and D_{δ_2}(η_{h,1}), respectively. Therefore, the absolute value of the first factor on the right-hand side of (40) is bounded above and below by positive constants for x ∈ D_{δ_1}(ρ_h). For x < ρ_h, the factor (1 − x/ρ_h) is trivially positive and η_{h,1} > Y_{h,1}(x) because Y_{h,1} is strictly increasing on (0, ρ_h), so the first factor on the right-hand side of (40) must be positive. Thus we may take the principal value of the square root to rewrite (40) as (41) for x ∈ C(δ_1). The above considerations also show that the radicand in (41) remains positive in the limit x → ρ_h^− (i.e., as x approaches ρ_h from the left) and then for h → ∞. As we just observed that the first factor on the right-hand side of (41) is bounded, (41) implies an estimate with an O-constant that is independent of h. We can now iterate this argument: using Taylor expansion along with the fact that the partial derivatives of R_h and S_h are uniformly bounded above while S_h is also uniformly bounded below, we obtain

Plugging this into (41) yields
still with an O-constant that is independent of h. This can be continued arbitrarily often to obtain further terms of the expansion and an improved error term (for our purposes, it is enough to stop at O((x − ρ_h)^2)). Indeed, it is well known (cf. [9]) that the coefficients admit an explicit form in which the numerator N is a polynomial in R_h(ρ_h, η_{h,1}), S_h(η_{h,1}) and their derivatives. By Lemma 3.12, R_h and S_h as well as their partial derivatives converge uniformly to R and S and their partial derivatives, respectively, with an error bound of O(λ^{h/2}). We also know that ρ_h and η_{h,1} converge exponentially to ρ and η_1, respectively (see Lemma 3.9). This means that first replacing all occurrences of R_h and S_h by R and S, respectively, and then replacing all occurrences of ρ_h and η_{h,1} by ρ and η_1, respectively, shows that a_h = a + O(λ^{h/2}), b_h = b + O(λ^{h/2}), and c_h = c + O(λ^{h/2}), where a, b, and c are the results of these replacements. Taking the limit for h → ∞ in (30) shows that R and S, and therefore a, b, and c, play the same role with respect to F_∞ as R_h, S_h, a_h, b_h, and c_h play with respect to F_h, which implies that a, b, and c are indeed the constants from (4).
Having dealt with the behaviour around the singularity, it remains to prove a uniform bound on Y_{h,1} in a domain of the form (6) for fixed δ.
Lemma 3.15. Let ε > 0 be such that all previous lemmata hold. There exist δ > 0 and a positive integer h_0 such that Y_{h,1}(x) has an analytic continuation to the stated domain for all h ≥ h_0 and satisfies the uniform upper bound there, with δ_0 as in the previous lemma.

Proof. Note that trivially r_h ≥ ρ. If lim inf_{h→∞} r_h > ρ, we are done: in this case, there is some δ > 0 such that Y_{h,1} extends analytically to D_{ρ(1+δ)}(0) \ D_{δ_0}(ρ_h) and satisfies |Y_{h,1}(x)| < η_1 + ε/2 there. As the previous lemma covers D_{δ_0}(ρ_h), this already completes the proof.
So let us assume that lim inf_{h→∞} r_h = ρ and derive a contradiction. The assumption implies that there is an increasing sequence of positive integers h_j such that lim_{j→∞} r_{h_j} = ρ. Without loss of generality, we may assume that r_{h_j} ≤ ρ + ε/2 for all j. Pick (for each sufficiently large j) a point x_{h_j} with |x_{h_j}| = r_{h_j} and |Y_{h_j,1}(x_{h_j})| = η_1 + ε/2. If this were not possible, we could analytically continue Y_{h_j,1} at every point x with |x| = r_{h_j} and x ∉ D_{δ_0}(ρ_{h_j}) to a disk on which Y_{h_j,1} is still bounded by η_1 + ε/2. This analytic continuation is possible since, by Lemma 3.11, the pair (ρ_{h_j}, η_{h_j,1}) is the only solution to the simultaneous equations F_{h_j}(x, y) = y and ∂F_{h_j}/∂y (x, y) = 1 with (x, y) ∈ Ξ_ε^{(1,2)}, so the analytic implicit function theorem becomes applicable (compare, e.g., the analytic continuation of q_h in Lemma 3.8). By compactness, this would allow us to extend Y_{h_j,1} to D_r(0) \ D_{δ_0}(ρ_{h_j}) for some r > r_{h_j} while still maintaining the inequality |Y_{h_j,1}(x)| < η_1 + ε/2, contradicting the choice of r_{h_j}. Without loss of generality (choosing a subsequence if necessary), we can assume that x_{h_j} and Y_{h_j,1}(x_{h_j}) have limits x_∞ and y_∞, respectively. By construction, |x_∞| = ρ and |y_∞| = η_1 + ε/2. Since x_{h_j} ∉ D_{δ_0}(ρ) for all j, Arg x_{h_j} is bounded away from 0. Thus we can find α > 0 such that |Arg x_{h_j}| ≥ 2α for all j. Define the region A by A = {z ∈ C : |z| < 1/2 or (|z| < 1 and |Arg z| < α)}.
Note that x_{h_j} A avoids the part of the real axis that contains ρ_{h_j} (see Figure 5), so the function Y_{h_j,1}(x) is analytic in this region for all j by construction, since (x, Y_{h_j,1}(x)) ∈ Ξ_ε^{(1,2)} whenever x ∈ x_{h_j} A. So we have a sequence of functions W_j(z) := Y_{h_j,1}(x_{h_j} z) that are all analytic on A and uniformly bounded above by η_1 + ε/2 by our choice of x_{h_j}. By Montel's theorem, there is a subsequence of these functions (without loss of generality the sequence itself) that converges locally uniformly, and thus to an analytic function W_∞ on A. This function needs to satisfy the following:

Figure 5. Illustration of the domain x_{h_j} A.
This is also equivalent to a corresponding implicit equation. These two properties imply that W_∞(z) = Y(x_∞ z) − x_∞ z, since Y is the unique function that is analytic at 0 and satisfies the implicit equation (1).
Now, for x ∈ x_{h_j} A, |x| ≤ ρ + ε holds by assumption, as does the corresponding bound on Y_{h_j,1}(x), so that the denominator in (43) is bounded below by a positive constant for sufficiently large j.
So we can conclude that Y′_{h_j,1}(x) is uniformly bounded by a constant for x ∈ x_{h_j} A, implying that W′_j(z) is uniformly bounded (for all z ∈ A and all sufficiently large j) by a constant that is independent of j. Therefore, (W_j) is a uniformly equicontinuous sequence of functions on the closure of A. By the Arzelà–Ascoli theorem, this implies that W_j(z) → W_∞(z) = Y(x_∞ z) − x_∞ z holds for all z in the closure of A, not only on A. In particular, y_∞ = Y(x_∞) − x_∞. Here, we have |x_∞| ≤ ρ and |y_∞| = η_1 + ε/2 by assumption. However, |Y(x) − x| ≤ η_1 holds for all |x| ≤ ρ by the triangle inequality, so we finally reach a contradiction.
We conclude this section with a summary of the results proven so far. The following proposition follows by combining the last two lemmata.
Proposition 3.16. There exists a constant δ > 0 such that Y_{h,1}(x) can be continued analytically to the stated domain for every sufficiently large h. Moreover, Y_{h,1}(x) is then uniformly bounded on this domain by a constant that is independent of h, and the singular expansion holds near the singularity, where the O-constant is independent of h and a_h, b_h, c_h converge at an exponential rate to a, b, c, respectively, as h → ∞.

If D > 1, then the number n of vertices of every tree in our simply generated family is congruent to 1 modulo D, because all outdegrees are multiples of D. Trivially, the same is true for all trees with maximum protection number h.
By [9, Remark VI.17], both Y and Y_{h,1} have D conjugate roots on their circles of convergence. Therefore, it is enough to study the positive root at the radius of convergence. Up to Theorem 3.10, no changes are required. In Lemma 3.11, there are exactly D solutions to the simultaneous equations instead of exactly one. Lemmata 3.12, 3.13, and 3.14 analyse the behaviour of Y_{h,1} around the dominant positive singularity and remain valid without any change. In the proof of Lemma 3.15, we need to exclude balls around the conjugate roots, and Proposition 3.16 must likewise be adjusted to exclude the conjugate roots.
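As a numerical sanity check of this periodicity (our illustration, not part of the paper's argument): for full binary trees, Φ(t) = 1 + t², so D = 2 and only tree sizes n ≡ 1 (mod 2) occur. Iterating the fixed-point equation Y = xΦ(Y) on truncated power series makes both the vanishing even coefficients and the odd counts visible.

```python
# Full binary trees: Phi(t) = 1 + t^2, so every outdegree is a multiple
# of D = 2 and every tree size n satisfies n == 1 (mod 2).
N = 12  # truncation order: we track coefficients of x^0, ..., x^(N-1)

def mul(a, b):
    """Product of two power series truncated at order N."""
    c = [0] * N
    for i, ai in enumerate(a):
        for j in range(min(len(b), N - i)):
            c[i + j] += ai * b[j]
    return c

Y = [0] * N
for _ in range(N):  # after N iterations, coefficients up to x^(N-1) are fixed
    Y2 = mul(Y, Y)
    phi_Y = [(1 if k == 0 else 0) + Y2[k] for k in range(N)]  # 1 + Y^2
    Y = [0] + phi_Y[:N - 1]                                   # multiply by x

even = [Y[n] for n in range(0, N, 2)]  # coefficients at even sizes: all zero
odd = [Y[n] for n in range(1, N, 2)]   # coefficients at odd sizes
```

The odd coefficients come out as 1, 1, 2, 5, 14, 42: the Catalan numbers, counting full binary trees with 0, 1, 2, … internal vertices.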
4. The exponential case: w_1 ≠ 0

4.1. Asymptotics of the singularities. Proposition 3.16, which concluded the previous section, shows that condition (2) of Theorem 2.1 is satisfied (with α = 1/2) by the generating functions Y_{h,1} (and thus also Y_{h,0}, since Y_{h,0}(x) = Y_{h,1}(x) + x). It remains to study the behaviour of the singularity ρ_h of Y_{h,0} and Y_{h,1} to make the theorem applicable. As it turns out, condition (1) of Theorem 2.1 holds precisely if vertices of outdegree 1 are allowed in our simply generated family of trees. In terms of the weight generating function Φ, this can be expressed as w_1 = Φ′(0) ≠ 0. Starting with Lemma 4.3, we will assume that this holds. The case where vertices of outdegree 1 cannot occur (equivalently, w_1 = Φ′(0) = 0) is covered in Section 5.
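Before analysing ρ_h, it may help to see the limiting quantities numerically. The following sketch (the helper name and the bracketing intervals are ours, not from the paper) computes τ as the unique zero of H(x) = xΦ′(x) − Φ(x), as recalled later in the proof of Lemma 5.2, together with ρ = τ/Φ(τ), for two standard families.

```python
import math

def tau_rho(Phi, dPhi, lo, hi, iters=100):
    """Bisection for the root tau of H(x) = x*Phi'(x) - Phi(x); returns (tau, rho)."""
    H = lambda x: x * dPhi(x) - Phi(x)
    for _ in range(iters):
        mid = (lo + hi) / 2
        if H(lo) * H(mid) <= 0:
            hi = mid
        else:
            lo = mid
    tau = (lo + hi) / 2
    return tau, tau / Phi(tau)

# labelled trees: Phi(t) = e^t gives tau = 1 and rho = 1/e
tau_lab, rho_lab = tau_rho(math.exp, math.exp, 0.1, 5.0)

# plane trees: Phi(t) = 1/(1-t) gives tau = 1/2 and rho = 1/4
tau_pl, rho_pl = tau_rho(lambda t: 1 / (1 - t), lambda t: 1 / (1 - t) ** 2, 0.01, 0.99)
```

For plane trees this also gives η_1 = τ − ρ = 1/4, which will reappear in the estimates below.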
Let us define the auxiliary quantities η_{h,k} := Y_{h,k}(ρ_h) for all 0 ≤ k ≤ h. We know that these must exist and be finite for all sufficiently large h. Since the coefficients of Y_{h,k} are nonincreasing in k in view of the combinatorial interpretation, we must have the monotonicity property (44). Note also that the following system of equations holds, in view of (8) and (7), respectively. Since Y_{h,1} is singular at ρ_h by assumption, the Jacobian determinant of the system that determines Y_{h,0}, Y_{h,1}, …, Y_{h,h} needs to vanish (as there would otherwise be an analytic continuation by the analytic implicit function theorem). This determinant is given in (47).
Using column expansion with respect to the last column, we find that the determinant simplifies to the expression in (47). We will now use (45), (46), and (47) to determine an asymptotic formula for ρ_h. Throughout this section, the B_i will always denote positive constants with B_i < 1 that depend on the specific family of simply generated trees, but on nothing else.

Proof. Since we already know that η_{h,1} converges to τ − ρ and that ρ_h converges to ρ, η_{h,0} converges to τ by (45). By the monotonicity property (44), all η_{h,k} must therefore be bounded by a single constant M for sufficiently large h. Since η_{h,1} converges to τ − ρ, we must have that

Thus by induction
This proves the desired inequality for sufficiently large h and 1 ≤ k ≤ h with B_1 = ρΦ′(τ − ρ/2) < 1, and we are done.
With this bound, we will be able to refine the estimates for the system of equations, leading to better estimates for ρ_h and η_{h,0}. Recall from Lemma 3.9 that ρ_h and η_{h,1} converge to their respective limits ρ and τ − ρ = η_1 (at least) exponentially fast. Since η_{h,0} = η_{h,1} + ρ_h by (45), this also applies to η_{h,0}. We show that an analogous statement holds for η_{h,k} with arbitrary k. In view of (46), it is natural to expect that η_{h,k} → η_k, where η_k is defined recursively by η_0 = τ and η_k = ρΦ(η_{k−1}) − ρ for k > 0; this coincides with our earlier definition for k = 1, since η_1 = τ − ρ. This is proven in the following lemma.
For a suitable choice of B_2, the estimate for ρ_h has been established by Lemma 3.9, as has the estimate for η_{h,k} in the cases k = 0 and k = 1.
Without loss of generality, suppose that B_2 ≥ B_1. Then, using (46), we obtain an estimate in which ξ_{h,k−1} lies between η_{k−1} and η_{h,k−1} (by the mean value theorem) and the O-constant is independent of k. Let M be this O-constant. We already know (compare the proof of

Iterating this inequality yields
and the desired statement follows.
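To make the convergence η_{h,k} → η_k and the geometric decay concrete, one can iterate the recursion η_0 = τ, η_k = ρΦ(η_{k−1}) − ρ numerically. For plane trees (Φ(t) = 1/(1 − t), τ = 1/2, ρ = 1/4) the recursion even has the closed form η_k = 3/(4 + 2·4^k), so that ζ^{−k}η_k converges to 3/2, where ζ = ρΦ′(0) = 1/4. This is our illustration, not taken from the paper:

```python
# Plane trees: Phi(t) = 1/(1-t), tau = 1/2, rho = 1/4, zeta = rho*Phi'(0) = 1/4.
tau, rho, zeta = 0.5, 0.25, 0.25
Phi = lambda t: 1.0 / (1.0 - t)

K = 12
eta = [tau]
for k in range(1, K + 1):
    eta.append(rho * Phi(eta[-1]) - rho)  # eta_k = rho*Phi(eta_{k-1}) - rho

ratios = [eta[k] / zeta**k for k in range(1, K + 1)]
closed = [3.0 / (4 + 2 * 4**k) for k in range(K + 1)]
# ratios increase towards 3/2, matching the closed form eta_k = 3/(4 + 2*4^k)
```

The ratios ζ^{−k}η_k stabilise quickly, illustrating the geometric decay η_k ≈ λ_1 ζ^k; for this family, λ_1 = 3/2.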
From Lemma 4.1 and the fact that η_{h,k} → η_k, we trivially obtain η_k ≤ C · B_1^k, with the same constants B_1 and C as in Lemma 4.1. In fact, we can be more precise, as the following lemma demonstrates. Since the expression ρΦ′(0) occurs frequently in what follows, we set ζ := ρΦ′(0). Recall that we assume Φ′(0) ≠ 0 until the end of this section.

Lemma 4.3. The limit λ_1 := lim_{k→∞} ζ^{−k} η_k exists, and an explicit error bound holds.

Proof. Recall that we defined the sequence (η_k)_{k≥0} by η_0 = τ and η_k = ρΦ(η_{k−1}) − ρ for k ≥ 1. Using Taylor expansion, we obtain η_k = ζη_{k−1}(1 + O(η_{k−1})). Since we already know that η_{k−1} ≤ C · B_1^{k−1}, this implies that ζ^{−k}η_k = ζ^{−(k−1)}η_{k−1}(1 + O(B_1^{k−1})). Now it follows that the infinite product converges. The error bound follows by estimating the tail of the product.

Next, we consider the expression in (47) and determine the asymptotic behaviour of its parts.
In view of Lemma 4.2, the first part of the expression simplifies accordingly. Since ζ < 1 and B_1 < 1, we can simply evaluate the geometric series, and the expression further simplifies, for an appropriately chosen B_3 < 1. This proves the first statement. For the second statement, we also use Lemma 4.2, along with the monotonicity of Φ′ and the assumption that Φ′(0) ≠ 0, which implies that Φ′(η_j) is bounded away from 0. By Lemma 4.3, the product that defines λ_2 converges, so we can rewrite the product term and use again the estimate Φ′(η_j) = Φ′(0) + O(ζ^j) on the remaining product.

Iterating this inequality yields the stated bound for k > 1. In view of the exponential bound on η_{h,ℓ} provided by Lemma 4.1, we can choose ℓ so large that M^{1/(r−1)} η_{h,ℓ} ≤ 1/2 for all sufficiently large h. This proves the desired bound for k ≥ ℓ with B_1 = 2^{−r^{−ℓ}} and a suitable choice of C (for k < ℓ, it is implied by the exponential bound). Our next step is an analogue of Lemma 4.4.
Lemma 5.2. For large enough h and the same constant B_1 < 1 as in the previous lemma, we have the following estimates.

Proof. We already know that ρ_h Φ′(η_{h,1}) converges to ρΦ′(τ − ρ) < 1, so the bound holds for sufficiently large h and some q < 1. As in the proof of Proposition 4.7, we observe that the function H(x) = xΦ′(x) − Φ(x) is increasing (on the positive real numbers within the radius of convergence of Φ) with derivative H′(x) = xΦ″(x) and a unique zero at τ. So it follows from this equation that η_{h,0} = τ + O(B_1^{r^h}). Using this estimate for η_{h,0} in (56), it follows that ρ_h = ρ + O(B_1^{r^h}).

As in the previous section, we will approximate η_{h,k} by η_k, defined recursively by η_0 = τ and η_k = ρ(Φ(η_{k−1}) − 1). As it turns out, this approximation is even more precise in the current case.
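For comparison with the previous section, the recursion can again be iterated for a concrete family; here the decay is doubly exponential. For full binary trees (Φ(t) = 1 + t², τ = 1, ρ = 1/2, and r = 2, the smallest outdegree with positive weight, which is our reading of the notation), the recursion η_k = ρ(Φ(η_{k−1}) − 1) becomes η_k = η_{k−1}²/2, with closed form η_k = 2·(1/2)^{2^k}, i.e. of the shape λμ^{r^k} with λ = 2 and μ = 1/2. A sketch, not from the paper:

```python
# Full binary trees: Phi(t) = 1 + t^2, tau = 1, rho = 1/2, r = 2.
rho, tau = 0.5, 1.0

K = 5
eta = [tau]
for k in range(1, K + 1):
    # eta_k = rho*(Phi(eta_{k-1}) - 1) = eta_{k-1}^2 / 2,
    # written without the "1 + t^2 - 1" cancellation for numerical stability
    eta.append(rho * eta[-1] ** 2)

closed = [2.0 / 2 ** (2 ** k) for k in range(K + 1)]  # eta_k = 2*(1/2)^(2^k)
```

All values here are exact powers of two in binary floating point, so the iterates agree with the closed form term by term and visibly decay doubly exponentially.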
Plugging this into (59) then implies the statement for any B_2 > B_1.
The next lemma parallels Lemma 4.3.
We iterate this recursion k times to obtain the corresponding expansion. The infinite series converge in view of the estimate θ_j = O(η_j) = O(B_1^{r^j}) that we get from Lemma 5.1. Moreover, we have Σ_{j=k}^∞ r^{k−1−j} θ_j = O(B_1^{r^k}) by the same bound. The result follows upon taking the exponential on both sides and multiplying by λ_1, setting

(62) μ := exp( log(η_0/λ_1) + Σ_{j=0}^∞ r^{−1−j} θ_j ) = (η_0/λ_1) ∏_{j=0}^∞ e^{θ_j/r^{j+1}}.
Note that μ < 1, because we already know that η_k = O(B_1^{r^k}). In order to further analyse the behaviour of the product ∏_{j=1}^h (ρ_h Φ′(η_{h,j})) in (47), we need one more short lemma.

Proof. Again, we can assume that h is so large that η_{h,k} ≤ τ for all k ≥ 1. The auxiliary function Ψ_2(u) = log(Φ′(e^u)) is continuously differentiable on (−∞, log τ] and satisfies lim_{u→−∞} Ψ′_2(u) = r − 1. Thus its derivative is also bounded, and the same argument as in Lemma 5.4 shows that log(Φ′(η_{h,k})/Φ′(η_k)) ≤ K log(η_{h,k}/η_k) for some positive constant K. Now the statement follows from Lemma 5.4.
Lemma 5.7. There exist positive constants λ_2, λ_3 and B_3 < 1 such that, for large enough h, the stated estimate holds, with μ as in Lemma 5.5.

Consequently,
Putting everything together, the statement of the lemma follows with λ_2 = Π μ^{−r} and a suitable choice of B_3 > B_2.
With this estimate for the product term in the Jacobian determinant (47), and the estimate for the sum term from Lemma 5.2, we can now obtain a better asymptotic formula for ρ_h Φ′(η_{h,0}) than the one obtained in (56). For large enough h, this holds with the stated error term; recall here that B_1 < B_3. Moreover, combining Lemmata 5.4 and 5.5 leads to η_{h,h} = λ_1 μ^{r^h} (1 + O(B_2^{r^h})), since B_2 was chosen to be greater than B_1, and we can apply this to (57), since B_3 was chosen to be greater than B_1 (and thus also greater than μ) and B_2. As we did earlier to obtain Lemma 5.3, we multiply the two equations (64) and (65) and divide by ρ_h. From this, the following result follows in exactly the same way as Proposition 4.7 follows from (53), and in particular

ρ_h = ρ ( 1 + (w_r λ_1^r / Φ(τ)) μ^{r^{h+1}} + O(μ^{r^{h+1}} B_4^{r^h}) ).

5.2. An adapted general scheme and the proof of Theorem 2. In this final section, we will first prove Theorems 3 and 4. Then we will be able to put all the pieces together and prove Theorem 2.
Proof of Theorem 3. We apply singularity analysis and use the uniformity condition to obtain y_{h,n} = A_h n^{−α−1} ρ_h^{−n} (1 + o(1)) uniformly in h as n → ∞, as well as y_n = A n^{−α−1} ρ^{−n} (1 + o(1)).
Since in addition A_h → A and ρ_h = ρ(1 + κζ^{r^h} + o(ζ^{r^h})), it holds that y_{h,n}/y_n = (ρ_h/ρ)^{−n} (1 + o(1)).
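The rest of the computation can be sketched as follows (our sketch; it assumes the uniform expansions y_{h,n} = A_h n^{−α−1} ρ_h^{−n}(1 + o(1)) and y_n = A n^{−α−1} ρ^{−n}(1 + o(1)) just stated):

```latex
\frac{y_{h,n}}{y_n}
  = \frac{A_h}{A}\Bigl(\frac{\rho_h}{\rho}\Bigr)^{-n}\bigl(1+o(1)\bigr)
  = \bigl(1+\kappa\zeta^{r^h}+o(\zeta^{r^h})\bigr)^{-n}\bigl(1+o(1)\bigr)
  = \exp\bigl(-\kappa n\,\zeta^{r^h}(1+o(1))\bigr)\bigl(1+o(1)\bigr),
```

using A_h → A and log(1 + u) = u(1 + o(1)) as u → 0; the ratio is thus governed by the quantity nζ^{r^h}, which is the source of the limiting behaviour described in the theorem.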