1 Introduction
A classical theorem (Cayley’s formula) says that there are
$n^{n-2}$
labeled trees on
$\{1,\dots ,n\}$
, rooted at
$1$
:
The
$16$
labeled rooted trees at
$1$
, for
$n=4$
. Here and below the root vertex is indicated as solid, and nonroot vertices are hollow. There are
$64$
labeled rooted trees, obtained by selecting any of the four vertices as root.

The
$4$
unlabeled rooted trees, for
$n=4$
.

These are called Cayley trees in his honor. In contrast, there is no formula for rooted, unlabeled trees, called Pólya trees to remember that Pólya [Reference PólyaPól37] (followed by Otter [Reference OtterOtt48]) determined their asymptotics: The number of unlabeled rooted trees on n vertices is asymptotic to
$$ \begin{align} \frac{b\sqrt{\rho}}{2\sqrt\pi}n^{-3/2}\rho^{-n}\big(1+\mathcal O(\tfrac1n)\big) \end{align} $$
with
$b=2.6811266\dots $
and
$\rho =0.338219\dots $
. This follows from the functional equation
for the generating function
$t(x)=x+x^2+2x^3+4x^4+\cdots $
counting Pólya trees. The functional equation leads to a fast recursive procedure to produce the number of Pólya trees on n vertices.
Even for small n, the number of labeled and unlabeled trees are quite different. The following numbers are from the Online Encyclopedia of Integer Sequences: [OEIS24, A000081] (Pólya, unlabeled rooted) and [OEIS24, A000272] (Cayley, labeled rooted at 1). We include, for comparison and in italics, [OEIS24, A000055] (unlabeled unrooted) and [OEIS24, A000169] (labeled rooted); these shall not concern us in this article:
$$ \begin{align} \begin{array}{c|cccccccccc} n & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10\\ \hline \textit{unlabeled unrooted} & \mathit{1} & \mathit{1} & \mathit{1} & \mathit{2} & \mathit{3} & \mathit{6} & \mathit{11} & \mathit{23} & \mathit{47} & \mathit{106}\\ \text{P}\acute{\rm o}\text{lya} & 1 & 1 & 2 & 4 & 9 & 20 & 48 & 115 & 286 & 719\\ \text{Cayley} & 1 & 1 & 3 & 16 & 125 & 1296 & 16807 & 262144 & 4782969 & 10^8\\ \textit{labeled rooted} & \mathit{1} & \mathit{2} & \mathit{9} & \mathit{64} & \mathit{625} & \mathit{7776} & \mathit{117649} & \mathit{2097152} & \mathit{43046721} & \mathit{10^9} \end{array} \end{align} $$
It is natural to suspect that properties of labeled and unlabeled trees will behave differently. There has been a good deal of elegant asymptotic theory for both classes of trees, reviewed in Section 2. The main result of this paper is to supplement the asymptotics by providing a Monte Carlo algorithm to generate uniformly random Pólya trees (it is easy to generate random Cayley trees for fixed n using Prüfer codes, as explained in §2.2).
Theorem A (= Algorithms 2 and 3).
There are linear-time algorithms (assuming constant-time operations on integers
$\le n$
) that produce, from a labeled tree on
$[n]$
, a uniform permutation fixing it; and, from a permutation of
$[n]$
, a uniform labeled tree fixed by it.
The main application of Theorem A is a conjecturally efficient procedure to generate uniform Pólya trees. For a fixed parameter
$n \in \mathbb {N}$
, it starts with any Cayley tree
${\mathcal {T}}$
on
$[n]$
(e.g., an
$(n-1)$
-pointed star), and repeatedly lets
$\sigma $
be a uniform permutation fixing
${\mathcal {T}}$
, and replaces
${\mathcal {T}}$
by a uniform labeled tree fixed by
$\sigma $
. The limit distribution of this Markov chain (on the tree side) is a uniform Pólya tree.
We are not able to bound the necessary number of steps to obtain convergence, but experimentation suggests that a constant number of steps (perhaps
$20$
) could be enough; we have checked this for n between
$10^3$
and
$10^6$
, albeit crudely (by tracking the evolution of the parameters of height, width, etc. along the first steps of the Markov chain).
Conjecture B. The Markov chain yields an
$\mathcal O(-n\log \epsilon )$
-time algorithm for producing an
$\epsilon $
-approximation to a uniform Pólya tree.
The algorithm gives a formula for the number of Cayley trees admitting a given automorphism. For integers
$m,x,y$
, consider
$f(m,x,y)=(mx+y)^{m-1}y$
. Let
$\sigma $
be a permutation in
$S_n$
fixing
$1$
, with
$\lambda _d\, d$
-cycles. Set
$\mu _d = \sum _{e|d} e\lambda _e$
.
Theorem C (= Theorem 3.3).
The number of labeled trees rooted at
$1$
on
$[n]$
that are
$\sigma $
-invariant is
When
$\sigma = \text {id}$
, we have
$\lambda _1 = n$
, all trees are fixed and the formula boils down to
$n^{n-2}$
.
Background on trees and their asymptotic theory is collected in Section 2. We also review there the Burnside process, a general Monte Carlo Markov chain approach to generating random orbits for a finite group acting on a finite set of which the above Markov chain is an example.
The algorithm itself is explained in Section 3. Section 4 illustrates the algorithm. It compares the distribution of labeled and unlabeled trees, and compares asymptotic theory with simulations. It may be read directly for further motivation.
2 Background
This section reviews the extensive prior literature on trees (§2.1), Cayley trees (§2.2) and Pólya trees (§2.3). These last two classes have global “shape properties” captured by Aldous’ “continuum random tree” explained in §2.4. The “Burnside process,” which is the basis of our algorithm, is explained in §2.5.
2.1 Trees
Because of their surprisingly wide applicability, there is a large combinatorial, enumerative and probabilistic literature on trees. Knuth [Reference KnuthKnu97] offers a succinct review of the basics, see in particular [Reference KnuthKnu97, §2.3.4.4]. This is supplemented by the
$40^+$
-page development in [Reference KnuthKnu11, §7.2.1.6]. The classical book by Moon [Reference MoonMoo70] develops probabilistic aspects which are brought much further in the readable account of Drmota [Reference DrmotaDrm09]. Any serious account of graph theory [Reference DiestelDie17, Reference Bondy and MurtyBM08, Reference BollobásBol98] has a chapter on trees.
Of course, trees come in many flavors; phylogenetic trees [Reference HolmesHol03, Reference FelsensteinFel03], real trees [Reference EvansEva08], A-B trees [Reference OdlyzkoOdl84], search trees [Reference MahmoudMah92], …. The present paper deals only with Cayley and Pólya trees, but we think the distinction between labeled and unlabeled, and our new algorithm, can be carried over to some of these classes.
Note that all the trees we consider are rooted, namely come with a distinguished vertex; this is often dictated by applications (search trees, for example), but is also more convenient: once a tree is rooted, it naturally admits an ordering of its edges towards the root, a notion of leaf, and a recursive decomposition obtained by cutting the top branches off the root and viewing them as smaller rooted trees.
Below we will consider Cayley trees, whose vertices are labeled by
$[n]$
with
$1$
chosen as root, and Pólya trees, whose vertices are not labeled. There is little difference between rooted and unrooted labeled trees: there is an
$n:1$
map from rooted labeled trees to unrooted labeled trees, which admits a section by selecting
$1$
as root. Therefore counts, statistics, etc. of rooted and unrooted labeled trees can be readily deduced from each other. On the other hand, no such statements hold for Pólya trees, and it is crucial that our unlabeled trees are rooted. In this article, we do not consider unrooted Pólya trees (often called “free trees”); see [Reference StuflerStu23] for some results on their distribution. There is a natural map from Cayley trees to Pólya trees, in which all labels are forgotten except
$1$
which is designated as root.
An automorphism of a tree is a bijection of its vertex set that fixes the root and induces a bijection of its edges. For a Cayley tree, such a bijection may be written as an element of the symmetric group
$S_n$
that fixes
$1$
.
A wide variety of “features” are studied in the literature:
-
distances the height of a tree is the maximum distance from the root to a leaf, and its path length is the sum of all distances to the root. Also of interest is “pick a pair of vertices at random. What is the distribution of their distance?”. This was Aldous’ original motivation for his continuum random tree;
-
growth for
$k=1,2,\dots ,\text {height}$
, let
$w_k$
be the number of vertices at distance k. The vector
$(w_1,w_2,\dots ,w_{\text {height}})$
is called the profile and its maximum is called the width of the tree; -
degrees the maximum degree, the average degree, the empirical measure of degrees, and the number of leaves; we could list the degree of the root here, and also above as
$w_1$
.
For special applications, particularly phylogenetic trees, a variety of additional “shape parameters” are of interest. See [Reference MatsenMat06] for a review of some, and the extensive book [Reference Fischer, Herbst, Kersting, Kühn and WickeFHK+23].
These features give rise to the question what does a typical tree “look like”?
2.2 Cayley trees
There are many different proofs that there are
$n^{n-2}$
labeled trees rooted at
$1$
. See Lovász [Reference LovászLov07] or [Reference Addario-Berry, Blanc-Renaudie, Donderwinkel, Maazoun and MartinABBRD+23]. One that we find algorithmically useful is via Prüfer codes. This assigns to a tree
${\mathcal {T}}$
a sequence of integers
$[a_1,\dots ,a_{n-2}]$
, with
$1\le a_i\le n$
, giving a bijection between Cayley trees
$\mathscr C_n$
and
$[n]^{n-2}$
. The mapping is easy to state. Given
${\mathcal {T}}\in \mathscr C_n$
, remove the leaf edge with the lowest leaf (the root never counts as a leaf) and let
$a_1$
be the label on the other side of the edge. Continue with the remaining tree to generate
$a_2,\dots ,a_{n-2}$
. For example,

since the leaves are removed in order
$2,4,5$
and their neighbors are respectively
$1,3,3$
. It is similarly easy to reconstruct
${\mathcal {T}}$
from the code.
By inspection, the degrees
$d_i$
of vertices
$1,\dots ,n$
can be read from the code
$[a_1,\dots ,a_{n-2}]$
via
$d_i=n_i+1$
if i occurs
$n_i$
times in the code. For the example above, the code
$[1,3,3]$
has
$d_1=2$
,
$d_2=d_4=d_5=1$
,
$d_3=3$
. The number of trees with degree sequence
$d_1,\dots ,d_n$
is therefore
$$\begin{align*}\binom{n-2}{d_1-1\quad\cdots\quad d_n-1}.\end{align*}$$
Translated as a probability statement, this becomes the following
Proposition 2.1 (Folklore).
For
${\mathcal {T}}\in {\mathscr {C}}_n$
chosen uniformly, the joint distribution of the degrees
$(d_1,\dots ,d_n)$
is exactly the same as the joint distribution of the box counts
$(n_1,\dots ,n_n)$
when
$n-2$
balls are dropped randomly into n boxes (with
$d_i=n_i+1)$
.
From this, standard facts about “balls in boxes” (multinomial allocation) yield theorems for trees. This is developed in [Reference RényiRén59, Reference MoonMoo70] and elsewhere. We content ourselves with a corollary:
Corollary 2.2. For
${\mathcal {T}}\in \mathscr C_n$
chosen uniformly, let
$\ell ({\mathcal {T}})$
be the number of leaves. Then
$\ell ({\mathcal {T}})$
has mean and variance
and, normalized by its mean and standard deviation,
$\ell ({\mathcal {T}})$
has a limiting standard normal distribution.
Proof. This is just a translation of classical facts about
$n-2$
balls dropped into n boxes and the distribution of the empty cells, see [Reference Kolčin and ČistjakovKČ74]. For example, if
$$\begin{align*}X_i=\begin{cases}1&\text{ if box i is empty},\\0&\text{ else}\end{cases},\qquad \ell({\mathcal{T}})=\sum_{i=1}^n X_i;\end{align*}$$
and
$\mathbb E_n(X_i)=(1-\frac 1n)^{n-2}=\frac 1e(1+\mathcal O(\tfrac 1n))$
,
$\mathbb E_n(X_i X_j)=(1-\frac 2n)^{n-2}$
for
$i\ne j$
, and an easy variant of the central limit theorem yields the corollary.
By symmetry, Proposition 2.1 also applies to the degree of the root.
The maximum degree in a tree
${\mathcal {T}}$
is one more than the maximal box count if
$n-2$
balls are randomly dropped into n boxes. By now, it is well known that the maximum of integer valued random variables tend not to have limiting distributions. A nice paper of Carr, Goh and Schmutz [Reference Carr, Goh and SchmutzCGS94] shows that the maximum is concentrated around
$\lfloor \log n/\log \log n\rfloor $
:
Theorem 2.3. Let
${\mathcal {T}}\in \mathscr C_n$
be chosen uniformly, and let
$\Delta ({\mathcal {T}})$
be the maximum degree. Set
$k_n=\lfloor \log n/\log \log n\rfloor $
. Then as
$n\to \infty $
with probability tending to
$1$
.
The authors prove refined, oscillating approximations for the limiting chance that
$\Delta ({\mathcal {T}})$
takes the three possible values, depending on
$\log n/\log \log n-k_n\in [0,1)$
. This differs from the parallel result below (Theorem 2.4) for Pólya trees.
As explained in Drmota [Reference DrmotaDrm09, Example 1.9], random Cayley trees are exactly distributed as a Galton-Watson process with Poisson birth distribution, conditioned to have n descendants at extinction. This implies that, as random metric spaces for the path metric,
${\mathcal {T}}\in \mathscr C_n$
converges to Aldous’ continuum random tree. Now, a host of limit theorems for “global functions” are available, see §2.4 below for details. We remark here that the number of leaves and the maximum degree are not suitably continuous so are not covered by the general theory; see the remark at the end of §2.4.
2.3 Pólya trees
The symmetric group
$S_n$
acts on labeled trees in
$\mathscr C_n$
by permuting their labels, and the subgroup
$S_{n-1}=\{\sigma :\sigma (1)=1\}$
acts on labeled trees rooted at
$1$
. The orbits are unlabeled rooted trees. If
$t_n$
is the number of Pólya trees on n vertices and
$t(x)=\sum _{n\ge 1}t_n x^n$
is their generating function, Pólya [Reference PólyaPól37] proved the functional equation
Equivalently, we have a recursive formula computing the numbers
$t_n$
: writing
$j_s$
for the number of s-vertex subtrees that are directly attached to the root, we obtain (see [Reference KnuthKnu97, §2.3.4.4(2)], where Pólya trees are called “oriented trees”)
$$ \begin{align} t_n = \sum_{j_1+2j_2+\cdots=n-1}\binom{t_1+j_1-1}{j_1}\cdots\binom{t_{n-1}+j_{n-1}-1}{j_{n-1}}. \end{align} $$
Pólya [Reference PólyaPól37], followed by Otter [Reference OtterOtt48], used (3) and delicate singularity analysis to prove the asymptotics (1).
The functional equation and further delicate analysis has been used by the analytic combinatorics community to derive remarkable limiting properties for various features of random Pólya trees. A survey is in [Reference DrmotaDrm09, Section 3.6]. In particular, this includes the limiting distribution of the height, profile, and limiting degree distribution [Reference Panagiotou and SinhaPS12].
Later work includes a remarkable limit theorem for the size of the automorphism group of a random Pólya tree [Reference Olsson and WagnerOW23]: for a random rooted Pólya tree
${\mathcal {T}}$
,
$$\begin{align*}\frac{\log\#\text{Aut}({\mathcal {T}})-\mu n}{\sqrt n}\Longrightarrow\mathscr N(0,\sigma^2)\end{align*}$$
for
$\mu =0.1373423\dots $
and
$\sigma ^2=0.1967696\dots $
; thus “random trees have exponentially many automorphisms”; this will be useful in section 3 when a random automorphism must be chosen.
Many of these developments use an algorithm for generating a random element of a combinatorial class in the presence of a functional equation such as (3). This is called the “Boltzmann sampler.” It has been extensively developed by the Flajolet school, in the highly recommended original article [Reference Duchon, Flajolet, Louchard and SchaefferDFLS04]; see also [Reference Flajolet, Fusy and PivoteauFFP07] focusing on algorithms for generating unlabeled objects.
The Boltzmann sampler generates objects of random size N and one must tune parameters to get the distribution of N centered about a desired n, rejecting samples for which
$N\neq n$
. The community has introduced a host of techniques to aid this but at present writing much is art; see §4.1 for a rough speed comparison which is not favorable, in the case of Pólya trees, to the Boltzmann sampler. Indeed, for large n our experiments show that thousands of samples must be discarded before one gets one of the exact degree needed. In our comparisons of data with limits of theoretical predictions, we definitely wanted a source of fixed-size trees. Note that for some classes, for example, labeled binary trees, there exist Boltzmann samplers that produce data of exact size n, and therefore produce random samples in time complexity
$\mathcal O(n)$
; see [Reference SportielloSpo21]. This does not appear to be the case in the context of Pólya trees, though the analytic nature of the generating function
$t(x)$
(beyond its disk of convergence, it converges on a Pacman-like domain) gives rise to more efficient cutoff mechanisms (“singular capped Boltzmann samplers”) for which time complexity
$\mathcal O(n^2)$
can be achieved; see [Reference Flajolet, Fusy and PivoteauFFP07, Figure 9]. This seems to be currently an active area of research.
The functional equation (3) can be turned into a procedure that uniformly samples Pólya trees; in essence, it recursively produces smaller Pólya trees of appropriately distributed sizes and combines them at their root; see [Reference Nijenhuis and WilfNW78, Algorithm
$\texttt{RANRUT}$
] and [Reference KnuthKnu11, §7.2.1.6, Exercise 91]. Its performance is clearly polynomial, though beware that its implementation requires extremely high-precision arithmetic.
We note that, if one wishes to enumerate all Pólya trees with n vertices, then there are extremely efficient algorithms that will produce them, see [Reference KnuthKnu11, §7.2.1.6, Algorithm O]. More precisely, this algorithm produces the successor of a Pólya tree under a very elegant bijection between
$[t_n]$
and the Pólya trees on n vertices. A careful examination of this bijection may also turn it into a polynomial-time algorithm producing uniform Pólya trees.
Note next that, for random Pólya trees, [Reference Robinson and SchwenkRS75] determine the mean number of leaves. It is asymptotic to
$(0.438156\dots )n$
which is greater than the value
$n/e$
from Proposition 2.1 above, so already this local feature distinguishes Cayley and Pólya trees.
Another difference between Cayley and Pólya trees is seen from work of Goh and Schmutz [Reference Goh and SchmutzGS94]. Recall that Theorem 2.3 above shows that the maximum degree of a random Cayley tree concentrates with high probability on just three values. The following theorem shows that the maximum degree of a random Pólya tree has an “oscillating extreme value distribution.” In particular, all values of the form
$\lfloor c_1\log n\rfloor +d$
have positive probability. Furthermore, Cayley trees have maximum degree concentrated around
$\lfloor \log n/\log \log n\rfloor $
while Pólya trees have maximum degree centered at
$\lfloor c_1\log n\rfloor $
.
Theorem 2.4 [Reference Goh and SchmutzGS94].
For a random Pólya tree on n vertices, the maximum degree is concentrated around
$\Theta (\log n)$
: for
$\mu _n=c_1\log n$
, and any fixed integer d, we have
for explicit constants
$c_0\approx 3.262$
,
$c_1\approx 0.9227$
,
$\rho \approx 0.3383$
.
2.4 The continuum random tree
As described in the Introduction, there are literally hundreds of flavors of trees, each with natural uniform distribution and associated limit theorems. Drmota’s book [Reference DrmotaDrm09] is a splendid account, with careful proofs and full references. Another superb reference to the continuum random tree is [Reference Le GallLe05].
Aldous [Reference AldousAld91a, Reference AldousAld91b, Reference AldousAld93] introduces a kind of Brownian motion on such spaces and shows how many classes of trees and measures have the same limit theory. Thus, a hard won theorem for some specific model holds for many others. He has written a splendid overview of the subject [Reference AldousAld91b]. His rough motivation follows: a tree has a unique shortest path between two vertices which makes the tree into a metric space. For many models, if two vertices are chosen at random their distance is of order
$\sqrt n$
. His continuum random tree models these metric spaces of trees in which the mean distance between vertices is of order
$\sqrt n$
.
The limit object, the “continuum random tree” (CRT), in a random, compact metric space. This uses Gromov-Hausdorff’s idea that the space of all compact, Polish, metric spaces can itself be made into a metric space, call it
$(\mathscr M,d_{GH})$
. This is itself a Polish space and the well developed machine of weak convergence is in force.
To describe the limit object, recall that Brownian excursion on
$[0,1]$
is just Brownian motion
$B_t$
with
$B_0=B_1=0$
, conditioned to be non-negative. Call this process
$E_t$
, for
$t\in [0,1]$
. Use
$E_t$
to define a (random) pseudo-metric on
$[0,1]$
via
The quotient of
$[0,1]$
obtained by identifying points at distance
$0$
from each other gives a random metric space
$([0,1]/{\sim },d_E)$
, the CRT.
Now consider a class of trees
$\mathscr T_n$
on n vertices. Each
${\mathcal {T}}\in \mathscr T_n$
may be viewed as a metric space, and remains so if the metric is rescaled by dividing it by
$\sqrt n$
. Choosing
${\mathcal {T}}$
uniformly in
$\mathscr T_n$
gives a (random) metric space
$({\mathcal {T}},d_{\mathcal {T}}/\sqrt n)$
.
Aldous shows that for many classes of trees
This class includes trees constructed from critical Galton Watson trees with progeny having finite variance (the limiting E is rescaled by
$\sigma $
), conditioned to have n total progeny.
Since Cayley trees are Galton Watson trees with
$\text {Poisson}(1)$
births per generation, random Cayley trees have a CRT limit. Drmota [Reference DrmotaDrm09, Theorems 4.8, 4.11] works out the distribution of height and shape of random Cayley trees from this point of view.
For quite a while, it was a mystery: are Pólya trees in the domain of the CRT? Drmota and Gittenberger [Reference Drmota and GittenbergerDG10, Theorem 1] showed that Pólya trees are not Galton-Watson and indeed Aldous wrote “the model “all unordered unlabeled trees equally likely” does not fit into this set-up, and no simple probabilistic description is known” [Reference AldousAld91b, page 29] (though later in the same paper he conjectures that Pólya trees have CRT limits!).
A breakthrough occurred in work of Haas and Miermont [Reference Haas and MiermontHM12] sharpened by Panagiotou and Stufler[Reference Panagiotou and StuflerPS18] and Gittenberger et al [Reference Gittenberger, Jin and WallnerGJW18]. These authors show that a random Pólya tree has a large “spine” which is Galton-Watson. The spine is then “decorated” with small forests (of size
$\Theta (\log n)$
) which do not affect convergence to a rescaled CRT. The details are deep, beautiful mathematics, and the account of Panagiotou-Stufler is recommended reading.
From all of this, for “global functions,” especially those that are continuous in the Gromov-Hausdorff topology, one expects Cayley and Pólya trees to “look the same.” Note however three caveats: (a) we have found it difficult to find a reasonable, useful description of the continuous functions on tree space, and there seem actually to be very few; (b) one needs Gromov-Hausdorff convergence of pointed, normalized spaces, which is even harder to obtain; (c) on the positive side, a weaker notion of convergence (Gromov-Hausdorff-Prokhorov convergence, see [Reference Greven, Pfaffelhuber and WinterGPW09]) suffices, exploiting the fact that the rooted trees come equipped themselves with a probability measure; more functionals become continuous once this setting is adopted, such as path length considered in §4.2. We are grateful to Grégory Miermont for explaining this to us.
In all cases, every theorem on the distribution of a tree parameter is a feat, and one should not take for granted that the convergence rate of one parameter is in any way related to that of another one. The examples in Section 4 tell a nuanced story. Furthermore, there are any number of “local features” where this limit theory doesn’t hold and special purpose theory and simulation are the only game in town; these are also considered in that section.
2.5 The Burnside process
Counting and sampling problems often occur in the presence of a group action. We begin in a general abstract setting: let
$\mathscr X$
be a finite set, and let G be a finite group acting on the right on
$\mathscr X$
. This divides
$\mathscr X$
into disjoint orbits
Natural questions are:
-
• How many orbits are there?
-
• What are typical orbit sizes?
-
• Do the orbits have “nice names,” that is, can they be labeled by a convenient coding?
-
• How can one choose an orbit uniformly at random?
The Burnside process [Reference Goldberg and JerrumGJ02, Reference DiaconisDia05] gives a Markov chain Monte-Carlo approach to the last problem, and this permits progress on the first two problems.
The process runs as follows:
-
1. From
$x\in \mathscr X$
choose
$g\in G$
uniformly in the stabilizer
$G_x=\{h\in G:x^h = x\}$
; -
2. From
$g\in G$
choose
$x\in \mathscr X$
uniformly in the fixed point set
$\mathscr X_g=\{y\in \mathscr X:y^g=y\}$
.
In other words, we consider the bipartite graph with vertex set
$\mathscr X\sqcup G$
and an edge between
$x\in \mathscr X$
and
$g\in G$
whenever
$x^g=x$
, and perform a simple random walk on this graph.
The two steps of the chain go from
$x\in \mathscr X$
to
$y\in \mathscr X$
with chance
$$\begin{align*}K(x,y)=\frac1{\#G_x}\sum_{g\in G_x\cap G_y}\frac1{\#\mathscr X_g}.\end{align*}$$
It is easy to see that this gives an ergodic, reversible Markov chain on
$\mathscr X$
with stationary distribution
$$\begin{align*}\pi(x)=\frac{k^{-1}}{\#\mathscr O_x},\qquad k=\#\text{orbits}, x\in\mathscr O_x.\end{align*}$$
Thus, running the chain, and simply reporting the current orbit gives a symmetric Markov chain on
$\{1,\dots ,k\}$
with uniform stationary distribution.
This procedure has been applied when
$\mathscr X=G$
with
$x^g=g^{-1}x g$
, so the orbits are conjugacy classes. When furthermore
$G=S_n$
, the classes are indexed by partitions of n and the procedure gives a useful way to generate a random partition; see [Reference Diaconis and TungDT24, Reference Diaconis and HowesDH25].
The Burnside process is a close cousin of the celebrated Swedsen Wang algorithm used for Ising simulations; see [Reference Andersen and DiaconisAD07] for details. It is expected to converge rapidly. This can be proved in some cases [Reference Diaconis and ZhongDZ23, Reference RahmaniRah22, Reference PaguyoPag23, Reference Diaconis, Lin and RamDLR25a, Reference Diaconis, Lin and RamDLR25b] but careful analysis of the running time for complex problems, such as partitions or trees, are open research problems. For a sharp analysis of a Markov chain on Pólya trees (that has a nonuniform stationary distribution) see [Reference FulmanFul09].
Our focus is on empirical results, some of which we report in Section 4.
3 The algorithm
The implementation of the Burnside process requires two basic operations:
-
1. Given a labeled rooted tree on
$[n]$
, select uniformly one of its automorphisms; -
2. Given a permutation on
$[n]$
, select uniformly a tree fixed by that permutation.
It proceeds as follows: start with a tree
${\mathcal {T}}$
, labeled by
$[n]$
and rooted at
$1$
. Then repeatedly follow steps 1. and 2., computing a uniform automorphism
$\sigma $
of
${\mathcal {T}}$
and replacing
${\mathcal {T}}$
by a uniform
$\sigma $
-fixed tree; and finally, after sufficiently many (see Observation 4.1) steps, return
${\mathcal {T}}$
forgetting its labels but keeping its root at
$1$
.
We shall describe algorithms that implement efficiently the above two steps; but first review and extend a well-known encoding of labeled trees by their Prüfer codes. This is needed for the algorithm, but also gives our refinement of Cayley’s formula.
3.1 Prüfer codes with automorphisms, and a refinement of Cayley’s formula
Consider
$\sigma \in S_n$
, a permutation on n points. We are interested in the set
$\mathscr T_\sigma $
of labeled rooted trees
${\mathcal {T}}$
on
$[n]$
that are invariant under
$\sigma $
.
The crucial remark is that if
${\mathcal {T}}$
is invariant under
$\sigma $
, then there is an associated tree
${\mathcal {T}}/\sigma $
on the cycles of
$\sigma $
; in which the cycle containing i is below the cycle containing j in
${\mathcal {T}}/\sigma $
if and only if i is below j in
${\mathcal {T}}$
. For example, consider the full binary tree and some quotients:

The number of trees
${\mathcal {T}}$
giving rise to the same
${\mathcal {T}}/\sigma $
is the product
$\pi (\sigma )$
of the cycle lengths of
$\sigma $
; indeed each vertex in
${\mathcal {T}}/\sigma $
, labeled by a cycle of length d, corresponds to d vertices in
${\mathcal {T}}$
in some given cyclic order. There are d ways of matching that cyclic order with the order of the vertex’s parent, and all these choices may be made independently. In other words, we think of each such edge in
${\mathcal {T}}/\sigma $
as a cable made of d strands of
${\mathcal {T}}$
; they can be twisted in d ways to recover a possible
${\mathcal {T}}$
.
On the other hand, not all trees on the set of cycles of
$\sigma $
are legal; only those for which if i is below j (our trees grow downwards) then the cycle length at j divides the cycle length at i. These preliminary remarks are in fact sufficient to count the number of trees in
$\mathscr T_\sigma $
.
We now get into specifics. The following is an easy variant of Prüfer codes:
Lemma 3.1. Let
$X,Y$
be any sets, and consider
$n\in \mathbb {N}$
. There is a bijection between the following two objects:
-
1. Forests on
$[n]$
with a choice of one root per tree, a label in X on each edge, and a label in Y on each root; -
2. Sequences in
$([n]\times X\cup \{0\}\times Y)^n$
whose last term is in
$\{0\}\times Y$
.
Proof. Given a forest, construct a sequence as follows: select the lowest-numbered leaf in it, say k. Either k has a parent j in the forest, and a label
$x\in X$
on the edge
$\{j,k\}$
, or k is a root with a label
$y\in Y$
on it. In the former case, the first term of the Prüfer sequence is
$(j,x)$
while in the second case it is
$(0,y)$
. Repeat until all n vertices have been removed, and note that the last vertex is perforce a root. We have produced a sequence as desired.
Conversely, given a sequence, Algorithm 1 constructs a forest, and it is easy to verify that both operations are mutual inverses.
For example, consider the following forest:

The lowest-numbered leaf is
$3$
, so the sequence starts with
$(9,x_3)$
, and the edge
$\{3,9\}$
is removed. The lowest-numbered leaf is then the root
$4$
, so the sequence continues with
$(0,y_4)$
; proceeding, the whole sequence is
Starting again from this sequence
$\Sigma $
, the counts of the numbers, increased by one, are
$$\begin{align*}\begin{array}{c|ccccccccccc} n & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 & 11\\ \hline \# & 3 & 3 & 1 & 1 & 1 & 1 & 2 & 2 & 2 & 1 & 1 \end{array}.\end{align*}$$
For example, the numbers
$1$
and
$2$
each appear twice in
$\Sigma $
. Adding
$1$
gives the
$3,3$
that start the second line. The table is used, along with
$\sigma $
, to reconstruct as follows the original forest.
The first entry of
$\Sigma $
is
$(9,x_3)$
and the lowest-numbered
$1$
in the table above is
$3$
so
$\{3,9\}$
is selected as an edge labeled
$x_3$
while the count of
$9$
is decreased to
$1$
. The second entry of
$\Sigma $
is
$(0,y_4)$
so
$4$
is made a root with label
$y_4$
; proceeding, the original forest is reconstructed.

Corollary 3.2. For any
$m,x,y\in \mathbb {N}$
, the number
$f(m,x,y)$
of labeled forests on
$[m]$
with a symbol in
$[y]$
at every root and a symbol in
$[x]$
at every nonroot is given by
Note that one recovers in particular Cayley’s count
$\tfrac {\partial f}{\partial y}f(m,1,0)=m^{m-1}$
of rooted trees on
$[m]$
.
Proof. There is a bijection between edges and nonroot vertices, in which every edge is paired with its endpoint furthest from the root.
The next theorem is our extension of Cayley’s formula.
Theorem 3.3. Let
$\sigma $
be a permutation in
$S_n$
fixing
$1$
and with
$\lambda _d$
cycles of length d; so
$\lambda \vdash n$
. Furthermore set
$\mu _d = \sum _{e | d, e<d} e\lambda _e$
. Then the number of labeled
$1$
-rooted trees on
$[n]$
that are
$\sigma $
-invariant is
Proof. If
$\sigma $
is the identity, this reduces to the classical Cayley enumeration formula. We proceed by induction on the maximal cycle length d of
$\sigma $
. Let there be then
$\lambda _d>0$
cycles of length d in
$\sigma $
, and let
$\sigma '$
be the permutation from which these cycles were removed.
Every
$\sigma $
-invariant tree may then be produced as follows: choose first a
$\sigma '$
-invariant tree
$T'$
on the complement in
$[n]$
of the d-cycles. Choose then a forest F on
$[\lambda _d]$
, with a label in
$[d]$
on each nonroot and a label in
$[\mu _d]$
on each root. Duplicate each tree d times in F, to support it on the union of the d-cycles, by choosing for each nonroot vertex
$\in [\lambda _d]$
a point in that cycle. Attach then each root to any
$[\mu _d]$
vertices of
$T'$
whose cycle length divides d. We have exhausted all the choices and decorations of F to produce once every possible
$\sigma $
-invariant tree; so
$t_{n,\sigma } = t_{n-d\lambda _d,\sigma '} f(d\lambda _d,d,\mu _d)$
and the claim follows by induction.
Theorem 3.3 bears some similarity to work done for phylogenetic trees [Reference Billey, Konvalinka and MatsenBKM16, Reference FusyFus16, Reference Matsen, Billey, Kas and KonvalinkaMBKK18, Reference GesselGes21]. Recall that a phylogenetic tree is a rooted, leaf labeled, binary tree with n terminal nodes. The symmetric group acts on these trees and the orbits are “tree shapes.” The Burnside process applies here and gives a method for choosing a uniformly distributed tree shape. As part of their work, Billey, Konvalinka and Matsen derive a nice formula for the number of phylogenetic trees fixed by a permutation
$\sigma $
. Call this number
$A(n,\sigma )$
. Suppose that
$\sigma $
has cycles of length
$\lambda _1,\lambda _2,\dots ,\lambda _\ell $
in decreasing order. They give
$$\begin{align*}A(n,\sigma) = \prod_{i=2}^\ell (2(\lambda_i +\cdots+\lambda_\ell)-1).\end{align*}$$
Note that taking
$\sigma =\text {id}$
gives
$\lambda _1=\dots =\lambda _n=1$
and recovers for
$A(n,\text {id})$
the classical count
$(2n-3)!!$
of phylogenetic trees. Neither formula is a consequence of the other, but they both admit a reasonably simple product form because the fixed subtrees under powers of
$\sigma $
form an increasing exhaustion of the tree. Note also that in the case of these (binary) phylogenetic trees all cycle lengths are a power of
$2$
, so there are polynomially many conjugacy classes of
$\sigma $
to consider, so the Burnside process can be implemented on a set of states of polynomial size.
There is a parallel development in studying the action of the symmetric group on the set of functions from
$[n]$
to itself, see [Reference Constantineau and LabelleCL91, Reference Meir and MoonMM84, Reference MutafchievMut88]. Formulas similar to Theorem 3.3 are obtained by these authors; for example, the number of functions
$[n]\to [n]$
that are fixed by
$\sigma \in S_n$
– namely, commute with
$\sigma $
– is
$$\begin{align*}\prod_{i=1}^n(\mu_d)^{\lambda_d}\end{align*}$$
in the notation of Theorem 3.3; that formula can be proven by similar arguments to ours, and fits nicely with Joyal’s proof of Cayley’s formula establishing a bijection between bi-rooted trees and self-maps of
$[n]$
, see [Reference Aigner and ZieglerAZ18, page 236]. Labelle [Reference LabelleLab86, Corollary A2] deduces Theorem 3.3 as a count of those self-maps of
$[n]$
commuting with
$\sigma $
and with a single periodic (perforce fixed) point; see also [Reference Constantineau and LabelleCL90].
3.2 From a labeled tree to a permutation fixing it
The problem of computing the automorphism group of a graph is a long-standing one, and has been studied extensively, both theoretically [Reference BabaiBab18] and practically [Reference McKay and PipernoMP14, Reference Anders and SchweitzerAS24]. If the automorphism group has been “computed” in a sufficiently convenient format, it is usually straightforward to produce a uniformly random element from the group.
For practical purposes, and for the applications and tests we have in mind (with
$n\approx 10^6$
), much more efficient procedures are necessary.
Edmonds [Reference Busacker and SaatyBS65, §6-21] gives a procedure to test isomorphism of rooted trees, based on canonical labelings of vertices. Mathematically, it may be described as follows: label each vertex with the nested list consisting, in lexicographically increasing order, of the labels of its children. Thus leaves are labeled by the empty list, and the four nonisomorphic unlabeled rooted trees from Figure 2 have root respectively labeled
$(((())))$
,
$(((),()))$
,
$((),(()))$
and
$((),(),())$
. Two trees are isomorphic if and only if their root labels are equal.
This algorithm is analyzed in detail in [Reference Aho, Hopcroft and UllmanAHU75, Theorem 3.3] where it is shown to run in linear time, as an application of bucket sorting. Here are some details, included for background; the algorithm itself will not be used. The idea is that, in decreasing order of height, the nested lists can be replaced by integers: start by putting
$0$
at each leaf; then proceed in decreasing order of height, first labeling vertices by lists of integers and then sorting the set of these lists and replacing them by their index in the sorted set. These integers are called i-numbers, and two trees are isomorphic if and only if they have, at each level, the same sets of lists of i-numbers. Again for the trees from Figure 2, the i-numbers are

Colbourn and Booth show that a small modification of the algorithm gives the automorphism group of the tree, also in linear time. (Here some care is necessary: the automorphism group is given by a collection of generators; it is crucial to show that there are at most
$n-1$
generators, and that each of them can be given in a very compact format, as a permutation for which only the moved points are stored.) We content ourselves with an intermediate notion: there is a linear-time algorithm that computes the automorphism partition, namely the partition of the vertex set
$[n]$
into orbits of the symmetric group. It is based on the following
Lemma 3.4 (Colbourn and Booth [Reference Colbourn and BoothCB81, Lemma 2.1]).
Two vertices
$v,w$
of a tree are in the same orbit of its automorphism group if and only if the list of i-numbers from the root to v coincides with the list of i-numbers from the root to w.
The automorphism group of a rooted tree admits the following description:
Lemma 3.5. Let
${\mathcal {T}}$
be a rooted tree with subtrees
${\mathcal {T}}_1,\dots ,{\mathcal {T}}_d$
immediately attached to the root, and let
$\lambda $
denote the partition of
$[d]$
in which
$i,j$
are in the same part precisely when
${\mathcal {T}}_i,{\mathcal {T}}_j$
are isomorphic. For each class
$C\in \lambda $
, choose an element
$i(C)$
in it. Then
$$\begin{align*}\text{Aut}({\mathcal {T}})=\bigg(\prod_{i\in[d]}\text{Aut}({\mathcal {T}}_i)\bigg)\rtimes\bigg(\prod_{C\in\lambda}S_C\bigg).\end{align*}$$
Proof. Trees in each class C may be interchanged arbitrarily, and may also be acted upon by their own automorphism group. Conversely, any automorphisms of the
${\mathcal {T}}_i$
and permutation in
$S_C$
may be assembled into an automorphism of
${\mathcal {T}}$
.

Therefore, a uniformly random element of
$\text {Aut}({\mathcal {T}})$
may be obtained by selecting uniformly elements of the symmetric groups
$S_C$
above, and then recursively selecting uniformly random elements of
$\text {Aut}({\mathcal {T}}_{i(C)})$
. The resulting permutation of
$[n]$
is obtained by composing these operations, as done in lines 3–5 of Algorithm 2 in which the image of a descendant is chosen so as to be the descendant of the already-chosen image, and has not yet been allocated in the image of the permutation. In this manner, once the orbit partition of a tree’s vertex set is known, Algorithm 2 produces in linear time (assuming constant-time operations on
$[n]$
) a uniformly random permutation fixing the tree.
In practice, we only implemented Algorithm 2, and did not implement the i-numbering algorithm, since there are very fast packages that compute the automorphism partition in close to linear time. The most efficient one we found is available at https://automorphisms.org, based on [Reference Anders, Schweitzer and StießASS23].
Note that, when it turns to practical and not asymptotic considerations, many graph manipulation libraries are ill-suited for large trees because of memory allocation overhead. The best data structure, to store a forest on
$[n]$
, consists of three integer lists of size n, pointing to the parent (
$0$
for roots), the firstborn (
$0$
for leaves) and the next sibling (
$0$
for youngest children) of each vertex. Note that only the parent pointer is necessary to characterize the tree, and seems to be the preferred format in [Reference KnuthKnu11, §7.2.1.6].
3.3 From a permutation to a tree fixed by it
Consider a permutation
$\sigma $
of
$[n]$
, and a tree T that is invariant under
$\sigma $
. For every
$d\in [n]$
, the restriction of T to the d-cycles of
$\sigma $
is a forest
$F_d$
in which each tree appears d times isomorphically. Furthermore, these d-uples of trees are grafted on the subtree
$T_{|d}$
of T spanned by the forests
$F_e$
for
$e|d$
. We may thus construct T iteratively, by going through all d in increasing order, and selecting uniformly a forest
$F_d$
. In fact, to take into account the symmetry, we proceed as follows. Let
$C_d$
be the d-cycles of
$\sigma $
, let
$n_d$
be the number of d-cycles, and let
$v_d$
be the number of vertices of
$T_{|d}$
. We construct a forest
$F'$
on
$n_d$
, with a decoration in
$[d]$
on each edge, and a decoration in
$[v_d]$
on each root. The edge decorations determine how an edge in
$F'$
may give rise to d edges in
$F_d$
, while the root decorations determine how the root in
$F'$
may be attached to a vertex in
$F_{|d}$
. The overall algorithm, which runs in linear time assuming constant-time operations on
$[n]$
, is given as Algorithm 3.

Note that the total number of random choices Algorithm 3 may make coincides with the total number of trees given by Theorem 3.3.
Here is an example to illustrate the algorithm. Consider the permutation
$\sigma =(5,6)(7,8)(9,10)(11,12,13)(14,15,16,17,18,19)(20,21,22,23,24,25)$
in
$S_{25}$
. To construct an invariant tree, we first choose uniformly a tree on the fixed points of
$\sigma $
, say

Then we consider the three
$2$
-cycles, and choose a forest on
$\{\{5,6\},\{7,8\},\{9,10\}\}$
, with an element of
$\mathbb Z/2\mathbb Z$
on each edge and an element of the fixed-point-set
$[4]$
at each root; for example,

We graft the double of this forest onto the previous tree, attaching each root by an edge to the indicated vertex, and matching
$\{5,6\}$
to
$\{7,8\}$
by a shift of
$1$
, namely
$5$
to
$8$
and
$6$
to
$7$
:

For the
$3$
-cycle, there is a unique choice of forest, namely a single root, again with a label in
$[4]$
at the root, say
$2$
; grafting the tripling of that vertex at
$2$
gives the tree

Finally there are two
$6$
-cycles, for which we must choose a forest on them (say the one-edge tree), a label for the root among the points of order dividing
$6$
, namely
$[11]$
(say
$10$
) and an element in
$\mathbb {Z}/6\mathbb {Z}$
on the edge (say
$2$
). Grafting six copies of that edge at
$10$
, and matching the
$6$
-cycles with a rotation of
$2$
, produces the tree

3.4
$\sigma $
-Prüfer sequences
Here is a more extensive discussion of the encoding of trees with symmetries by a Prüfer sequence.
We consider a permutation
$\sigma $
of
$[n]$
fixing
$1$
. Say its number of d-cycles is
$\lambda _d$
as above, so
$\lambda $
is a partition of n. Let
$\Lambda _d$
be the union of
$\sigma $
-cycles of length dividing d, namely those
$i\in [n]$
with
$\sigma ^d(i)=i$
, and let
$\Lambda ^{\prime }_d$
be the subset of
$\Lambda _d$
consisting of points of order strictly dividing d.
A
$\sigma $
-Prüfer sequence is a sequence of numbers in
$[n]$
of length
$(\lambda _1-1)+\lambda _2+\dots +\lambda _n$
, namely one less than the number of cycles of
$\sigma $
, subject to the following restriction. The sequence is grouped in blocks as in the sum above, so there is one block of numbers for each cycle length. The entries in the block of cycles of length d have to belong to
$\Lambda _d$
, and the last entry in each such block has furthermore to belong to
$\Lambda ^{\prime }_d$
, with the convention that
$\Lambda ^{\prime }_1=\{1\}$
.
Proposition 3.6. There is a bijection between
$\sigma $
-Prüfer sequences and trees on
$[n]$
, rooted at
$1$
and
$\sigma $
-invariant.
This is really just a reformulation of Theorem 3.3.
Let us illustrate this with three remarks. Firstly, if
$\sigma =\text {id}$
, then a
$\sigma $
-Prüfer sequence is really just a classical Prüfer sequence in
$[n]^{n-2}$
, with an extra
$1$
appended.
Consider now as example the permutation
$\sigma \in S_{25}$
from the previous section, and the corresponding tree from (5). Let us re-encode it as a
$\sigma $
-Prüfer sequence. Because of the cycle counts of
$\sigma $
, this sequence will take the form
where the blocks of period
$1,2,3,6$
have been separated by
$|$
. We first consider the fixed points of
$\sigma $
, and the corresponding subtree on
$[4]$
. Within that subtree, the smallest leaf is
$2$
, and its neighbor is
$4$
; the next smallest leaf is
$3$
also with neighbor
$4$
; then the next smallest leaf is
$4$
with neighbor
$1$
. Therefore the
$\sigma $
-Prüfer sequence starts as
We now move on to
$2$
-cycles and to the corresponding subtree on
$[10]$
. The smallest leaf cycle is
$(5,6)$
attached to
$4$
; the next-smallest is
$(7,8)$
with minimum attached to
$5$
; the next-smallest
$(9,10)$
attached to
$1$
; so the
$\sigma $
-Prüfer sequence continues as
In the subtree spanned by
$\Lambda _3$
, namely on
$\{1,2,3,4,11,12,13\}$
, there is a single leaf
$3$
-cycle
$(11,12,13)$
attached to
$2$
, so the
$\sigma $
-Prüfer sequence continues as
Finally in the tree the smallest leaf
$6$
-cycle is
$(24,25,26,27,22,23)$
, with minimum attached to
$18$
; and after removing that
$6$
-cycle the smallest leaf
$6$
-cycle is
$(14,15,16,17,18,19)$
with minimum attached to
$10$
. The
$\sigma $
-Prüfer sequence is therefore
It is clear that this process produces a bijection, essentially by following the steps in reverse.
The third remark is that there is some asymmetry in the handling of fixed points. This asymmetry vanishes if we count invariant forests rather than trees. For this, we consider an arbitrary permutation
$\sigma \in S_n$
, now set
and let
$\Lambda ^{\prime }_d\subseteq \Lambda _d$
consist of points of period strictly dividing d, with
$0$
included. An extended
$\sigma $
-Prüfer sequence a sequence of length
$\lambda _1+\dots +\lambda _d$
, such that the entries in the block of d-cycles belong to
$\Lambda _d$
, and the last entry in that block belongs to
$\Lambda ^{\prime }_d$
.
Proposition 3.7. There is a bijection between extended
$\sigma $
-Prüfer sequences and
$\sigma $
-invariant rooted forests on
$[n]$
.
Proof. Given a forest on
$[n]$
, extend it to a tree on
$\{0\}\cup [n]$
by connecting all roots to
$0$
. This tree is rooted at
$0$
, and
$\sigma '$
-invariant for the permutation
$\sigma '$
of
$\{0\}\cup [n]$
that extends
$\sigma $
by fixing
$0$
. Apply the previous proposition.
(This is merely the generalization of the classical fact that Prüfer sequences in
$[n+1]^{n-1}$
code forests on
$[n]$
).
4 Comparison
The algorithms from Section 3 were implemented in the computer language Julia, and thanks to their essentially linear complexity were run to produce large numbers (in the millions) of random trees with large numbers of vertices (also in the millions). We have tested our code in various ways, checking in particular that, for small n, the proportion of Pólya trees produced matches the exact counts in Table 2.
This section gathers some empirical data that can be extracted from these experiments. The first, and foremost, observation is that the Burnside process performs remarkably well. It is of course difficult to judge how close a sample is to the uniform distribution if the uniform distribution itself is not known; but we observe:
Observation 4.1. Starting with the (obviously not random) tree of height
$1$
, given by the Prüfer code
$[1,\dots ,1]$
, approximately
$20$
steps of the Burnside process are sufficient to produce a uniformly random tree; that is, a tree for which height, width, number of leaves, etc. have converged to their expected value.
This has been checked experimentally for a number of vertices up to
$n\approx 10^7$
; and could be pushed further with more optimizations.
We shall consider in turn the “features” (distances, growth, degrees) from §2.1, comparing experimental data with those predicted using the constants
$b,\rho $
appearing in (1). Recall from §2.4 that a random Pólya tree has as its “spine” a large Galton-Watson tree whose offspring distribution has variance
$\sigma =b\sqrt {\rho /2}$
, so our comparisons will show how well the spine, and its asymptotic features, describe the random Pólya tree. We record our route to computing these constants at the end of this section; for reference, they are
There are actually three kinds of comparisons that we have in mind:
-
1. compare our sampler (the “Burnside sampler”) to others, and in particular the “Boltzmann sampler” mentioned in §2.3;
-
2. compare features of Cayley and Pólya trees;
-
3. compare experimental data to theoretical predictions.
Data are often compared using “qq plots.” These are a convenient means of comparing distributions, by plotting quantiles of one distribution against the other’s: a dot appears at (
$x,y$
) if the proportion of data points
$\le x$
in the first distribution equals the proportion of data points
$\le y$
in the second. Thus closeness to the diagonal indicates similarity of distributions.
4.1 Comparing samplers
We have compared our code to freely available implementations of the Boltzmann sampler. One of them, “usain-boltz” [Reference Dien and PépinDP23], is distributed as the Python package usainboltz. It can generate structures following a grammar, and attempts to tune parameters so as to produce objects with the correct cardinality. For example, producing a Pólya tree is in principle obtained by the grammar rules

expressing B as an atom z followed by an unordered collection of B’s. The command generator.sample((1000,1000)) produces a random tree with
$1000$
vertices in roughly
$2$
seconds on a 2022 laptop. Another freely available sampler was written by Carine Pivoteau, see
https://github.com/CarinePivoteau/Alea2023Notebooks
It also consists of easy-to-interface Python code, and can also produce a Pólya tree with
$1000$
vertices in roughly
$2$
seconds on a 2022 laptop. A rough comparison (qq plot of tree depth of her trees compared to ours, see the next subsection) indicated that all three samplers are either correct, or at least suffer from similar bugs:

We refrain from more serious speed comparisons, but note that our Julia code produces random Pólya trees on
$1000$
vertices in less than a millisecond, even without serious optimization.
We stress again the fact that the Boltzmann sampler doesn’t return a random Pólya tree of exact degree n, but rather a random tree of degree N where N is random with mean n. For large n, our experience shows it can take thousands of samples to get one of the exact degree needed. Of course, fixed degree is crucial for tasks such as comparing data to limit approximations.
4.2 Distances
One well-understood statistic is the height
$H_n=H({\mathcal {T}}_n)$
of a random Pólya tree
${\mathcal {T}}_n$
. A Galton-Watson tree whose offspring distribution has variance
$\sigma $
has, according to [Reference DrmotaDrm09, Theorem 4.8], normalized height
$H_n/\sqrt n$
converging in distribution to
$2/\sigma \max _{0\le t\le 1}E_t$
, where
$E_t$
is the Brownian excursion, see §2.4. There is an analytic expression for this maximum, closely related to the Kolmogorov-Smirnov distribution, with density given by
Since height is a continuous parameter, Drmota deduces in [Reference DrmotaDrm09, Theorem 4.59]
$$\begin{align*}H({\mathcal {T}}_n)/\sqrt n\approx\frac{2\sqrt2}{b\sqrt\rho}\max_{t\in[0,1]}E_t. \end{align*}$$
We computed the heights of 1 000 000 random trees with 1000 vertices to the analytic estimate, and fitted a function of the form
$d M(\mu _e+\sigma _e x)/d(\mu _e+\sigma _e x)$
for optimal
$\mu _e,\sigma _e$
to the data, obtaining an empirical estimate
$\sigma _e$
of
$\sigma $
. The result, and its qq plot compared to the analytic distribution (7) is

One sees that the optimal fit, on the left, is extremely good, but quite far from the asymptotics; while the qq plot shows significant differences. In particular, the optimal
$\sigma _e$
turns out to be approximately
$1.04$
, and this value persists for
$n=10\,000$
and even
$n=100\,000$
; so if there is convergence of
$\sigma _e$
to the value
$\sigma =1.103\dots $
from (6) then it certainly is very slow. We have no explanation for this phenomenon.
The comparison with Cayley trees also gives quite clear results; the distributions are similar, but with a different scale. Here is the qq plot of depth of Cayley trees compared to Pólya trees:

Another important function is the path length
$I_n$
of a random Pólya tree, namely the sum of all distances from vertices to the root. According to [Reference DrmotaDrm09, Theorem 4.9], the normalized path length
$I_n/n^{3/2}$
converges in distribution to
$2/\sigma \int _0^1 E_t dt$
; this last term
$\omega _+=\int _0^1 E_t dt$
is often called the Brownian excursion area, and its distribution, known as the Airy distribution has been computed by Takács [Reference TakácsTak91, Theorem 5]: recall Kummer’s confluent hypergeometric series
and let
$\alpha _1\approx 2.3381,\alpha _2\approx 4.0879,\dots $
be the absolute values of the negative zeros of the Airy function
$Ai$
; then
$\omega _+$
is distributed as
$$\begin{align*}\frac{2^{13/6}3^{-3/2}}{x^{10/3}}\sum_{i\ge1}\exp(-2\alpha_i^3/27x^2)\alpha_i^2 U(-5/6,4/3;2\alpha_i^3/27x^2).\end{align*}$$
Again we see a good agreement between experimental data and the theoretical distribution, but with a significantly different parameter
$\sigma $
(optimal fit
$\sigma _e\approx 1.138$
, theoretical
$\sigma =1.103\dots $
from (6)):

4.3 Growth
Recall that the growth function
$L_n(k)$
is the number of vertices at distance k from the root in a random n-vertex Pólya tree
${\mathcal {T}}_n$
; it is sometimes called its profile. These parameters also admit limiting distributions based on the Brownian excursion: setting
$\ell _n(t)=n^{-1/2} L_n(t n^{1/2})$
, we have by [Reference DrmotaDrm09, Theorem 4.10] that
$\ell _n(t)$
behaves as
$\sigma /2 \ell (t\sigma /2)$
for
$\ell $
the occupation measure
$$\begin{align*}\ell(t)=\lim_{\epsilon\to0}\epsilon^{-1}\int_0^1 \mathbf1_{[t,t+\epsilon]}(E_s)ds.\end{align*}$$
In particular, the maximum of
$L_n$
is the width
$W_n$
, and is a continuous parameter – in other words, it may be estimated on the Galton-Watson spine of
${\mathcal {T}}_n$
. We thus obtain, according to [Reference DrmotaDrm09, Theorem 4.11],
$$\begin{align*}W({\mathcal{T}}_{n})/\sqrt{n}\approx 2\frac{\sqrt{8}}{b\sqrt{\rho}}\sup_{t\in[0,1]}\ell(t);\end{align*}$$
note then that the distribution of
$\sup \ell (t)$
is twice that of
$\sup E_t$
.
We can compare the widths of 1 000 000 random trees with 1000 vertices to the analytic estimate, and fit a function of the form
$dM(\tau x)/d(\tau x)$
for optimal
$\tau $
to the data; the predicted
$\tau $
is
$2b\sqrt {\eta /8n}$
. The result is

Once more, the optimal, empirical parameter gives an extremely good fit, while the analytic value of the parameter is quite far from the observations.
The comparison with Cayley trees also gives quite clear results; the distributions are similar, but with a different scale. Here is the qq plot of the width of Cayley trees compared to Pólya trees:

4.4 Degrees
The degree distribution in a random Pólya tree is of a quite different nature, and cannot be approached at all by a limiting model such as the CRT. One of the most practical methods involves generating functions: recall the generating function
$t(z)$
for Pólya trees from (3), and note that there is a similar-looking functional equation for the generating function
$f_m$
of Pólya trees with out-degree at most m; this already appears in Otter [Reference OtterOtt48].
The distributions of degree-n vertices in a random Pólya tree, for
$n=1,\dots ,8$
.

All the functions
$f_m$
have a root singularity at
$\rho _m$
on their circle of convergence, leading to the asymptotic count
$t_{n,m}\approx b_m\rho _m^n n^{-3/2}$
of Pólya trees with maximal out-degree m. Here
$t_{n,\infty }=t_n$
and
$\rho _\infty =\rho $
is the constant we had before.
Goh and Schmutz [Reference Goh and SchmutzGS94] show that
$\rho _m$
behaves quite precisely as
for some constant
$c\approx 1.1103$
. This immediately leads to
$$\begin{align*}\mathbb P(\text{max.deg.}\le m)\approx\frac{b\rho^n n^{-3/2}}{b_m\rho_m^n n^{-3/2}}\approx\exp(-c n\rho^m).\end{align*}$$
This means that the maximum degree has expected value
$-\log (n)/\log (\rho )$
, doubly exponential decay below its expectancy, and exponential convergence to
$1$
above its expectancy.
Note that this is quite different from the maximum degree of a random labeled tree, whose expectancy is
$\propto \log n/\log \log n$
and concentrates on three values, see Theorem 2.4. For comparison, here is a plot with both distributions, where the circle radius is proportional to the logarithm of the distribution density:

Here the fit between data and predictions is excellent: already for
$n=100$
the curves are indistinguishable to the eye, in their cumulative distributions,

and as a qq-plot comparing to random values sampled according to the cumulative distribution
$\exp (-c n\rho ^m)$
, namely samples
$\log (-\log (x)/c)/log(\rho )$
for x drawn uniformly from
$[0,1]$
:

However, for that last plot, the optimal fitting parameter c was
$1.8$
, far off from the
$1.1103$
predicted by theory.
We finally turn to the number of leaves, or more generally of vertices of given small degree. We increase, here, the number of vertices to
$n=10\,000$
, and compute
$100\,000$
samples, comparing their distribution of degrees (which is normal) to the values predicted in [Reference Robinson and SchwenkRS75, Table 2], which we repeat for convenience:
$$\begin{align*}\begin{array}{r|c@{\quad}c@{\quad}c@{\quad}c@{\quad}c} \text{degree} & 1 & 2 & 3 & 4 & 5\\ \mathbf P(\deg) & 0.438\,156 & 0.293\,998 & 0.159\,114 & 0.068\,592 & 0.026\,027\\ \hline \text{degree} & 6 & 7 & 8 & 9 & 10\\ \mathbf P(\deg) & 0.009\,259 & 0.003\,198 & 0.000\,985 & 0.000\,355 & 0.000\,316 \end{array}\end{align*}$$
The means of the histograms in Table 3 closely match the exact asymptotics for the mean degrees of above table. For example the mean of the degree
$1$
vertices is 0.4382, in agreement with the theoretical mean
$0.438\,156$
. This match of our simulations to theory is one more verification that our code is working as it should.
4.5 Estimating the constants
$b,\rho $
The constants
$b,\rho $
appearing in (1) may be approximated to high precision as follows. Consider a large parameter n (in our experiments,
$40$
is plenty and gives hundreds of correct digits; we eschew detailed error estimates, deferring to [Reference OtterOtt48]), and the map
$$\begin{align*}F\colon\mathbb{R}^{n+1}\to\mathbb{R}^{n+1},\;((u_1,\dots,u_n),\rho)\mapsto\Big(\big(u_i-\rho^i\exp\big(\sum_{1\le j\le n/i} u_{i\cdot j}/j\big)\big)_{i=1,\dots,n},u_1-1\Big)\end{align*}$$
approximating the generating function t. More precisely, the variable
$u_i$
tracks the value
$t(\rho ^i)$
truncated to
$n/i$
terms, and solving
$F=0$
amounts to simultaneously solving (3) truncated to n terms and the condition
$t(\rho )=1$
, see [Reference OtterOtt48, (11)]. Therefore,
$\rho $
is approximately the last coordinate of a zero of F, and for
$\rho _0=0.338$
the point
$((\rho _0^i)_{i=1,\dots ,n},\rho _0)$
is in its basin of attraction, so Newton’s method converges very fast to a solution
$\rho $
with hundreds of digits –
$13$
iterations were enough. Then to compute b we replace the last coordinate of F by
$u_1-1+\epsilon $
, find a solution
$\rho _\epsilon $
, and estimate b as
$\epsilon /\sqrt {\rho -\rho _\epsilon }$
. This merely requires thrice more significant digits from
$\rho $
than we get for b.
5 Conclusions
We have devised and implemented an efficient, robust procedure for sampling Pólya trees. Using it, we have compared experimental data to the existing literature. Our simulations show a systematic departure from the asymptotics. Perhaps there is a correction term which eventually tends to zero but slowly enough (or with a large enough constant) to be visible for present sample sizes.
We also do not have any theoretical foundation for the performance of our algorithm. The data we obtained are compatible with the possibility that the Markov chain mixes in constant (i.e., independent of n) time, and that
$20$
steps of the Burnside process produce a Pólya tree that is for all practical purposes indistinguishable from a uniform random sample.
With the present tools, it is straightforward to set up the Burnside process to generate a random unlabeled graph on n vertices, or a random unlabeled graph on n vertices with m edges. The same two step procedure works; from a labeled graph G (say on n vertices; there are
$2^{\binom n2}$
such graphs), choose an automorphism of G uniformly using the algorithms and software tools mentioned above, and then a random
$G'$
fixed by this automorphism by flipping a coin for each orbit on pairs. For the literature on unlabeled graphs see [OEIS24, A000088], and for competing algorithms see [Reference WormaldWor87].
Acknowledgments
We thank David Aldous, Maciej Bendkowski, Vladimir Dotsenko, Sergey Dovgal, Jason Fulman, Éric Fusy, Slava Grigorchuk, Susan Holmes, Michael Howes, Don Knuth, Grégory Miermont, Ivan Mitrofanov, Evita Nestoridi, Jim Pitman, Carine Pivoteau, Nathan Tung, and Stephan Wagner for their generous comments and encouragement on this project. We are also deeply grateful to the referees for their thoughtful, useful reviews.
Competing interests
The authors have no competing interests to declare.
Funding statement
L.B. gratefully acknowledges partial support from the SNSF advanced grant TMAG-2_216487/1. P.D. gratefully acknowledges partial support from NSF grant 1954042.
Data availability statement
All experiments were performed using the programming language Julia. Its many advantages include speed, parallelism, a large collection of libraries implementing statistics, plotting, and special functions; and the ability to link in external code such as the DejaVu functions.
The experiments are all part of a Jupyter notebook, which can be downloaded as part of the resources. The degree count data in §4.4 has been extracted, and can be downloaded in plain text format (one column per degree,
$100\,000$
rows). The data are available at https://doi.org/10.5281/zenodo.14186423
Ethical standards
The research meets all ethical guidelines, including adherence to the legal requirements of the study country.
Author contributions
L.B. and P.D. share their contribution to this work.








