In this paper we study the number of random records in an arbitrary split tree (or, equivalently, the number of random cuttings required to eliminate the tree). We show that a classical limit theorem for the convergence of sums of triangular arrays to infinitely divisible distributions can be used to determine the distribution of this number. After normalization the distributions are shown to be asymptotically weakly 1-stable. This work is a generalization of our earlier results for the random binary search tree in Holmgren (2010), which is one specific case of split trees. Other important examples of split trees include m-ary search trees, quad trees, medians of (2k + 1)-trees, simplex trees, tries, and digital search trees.
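The equivalence between records and cuttings can be explored numerically in the simplest split tree, the random binary search tree. The sketch below builds the tree by inserting a uniformly random permutation of keys, attaches independent Uniform(0,1) labels to the nodes, and counts the nodes whose label is smaller than every label on their path from the root. It is an illustrative simulation under these assumptions, not the general split-tree construction analysed in the paper. The function name `random_bst_records` is hypothetical.

```python
import random

def random_bst_records(n, rng=None):
    """Count records in a random binary search tree on keys 0..n-1.

    Hypothetical illustration: insert a uniformly random permutation,
    give each node an independent Uniform(0,1) label, and call a node a
    record if its label is below every label on its root path."""
    if rng is None:
        rng = random.Random()
    keys = list(range(n))
    rng.shuffle(keys)
    if not keys:
        return 0
    left, right = {}, {}
    root = keys[0]
    for key in keys[1:]:
        node = root
        while True:
            side = left if key < node else right
            if node in side:
                node = side[node]      # keep descending
            else:
                side[node] = key       # attach the new key as a leaf
                break
    labels = {k: rng.random() for k in keys}
    # Depth-first traversal carrying the minimum label seen above each node.
    records = 0
    stack = [(root, float("inf"))]
    while stack:
        node, path_min = stack.pop()
        if labels[node] < path_min:
            records += 1
        new_min = min(path_min, labels[node])
        for child in (left.get(node), right.get(node)):
            if child is not None:
                stack.append((child, new_min))
    return records
```

Averaging `random_bst_records(n)` over many trials for growing n gives a rough empirical feel for the growth of the record count; the root is always a record.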
We consider a generalized form of the coupon collection problem in which a random number, S, of balls is drawn at each stage from an urn initially containing n white balls (coupons). Each white ball drawn is colored red and returned to the urn; red balls drawn are simply returned to the urn. The question considered is then: how many white balls (uncollected coupons) remain in the urn after $k_n$ draws? Our analysis is asymptotic as n → ∞. We concentrate on the case $k_n/n \to \infty$ (the superlinear case), although we sketch known results for other ranges of $k_n$. A Gaussian limit is obtained via a martingale representation for the lower superlinear range, and a Poisson limit is derived for the upper boundary of this range via the Chen–Stein approximation.
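The urn scheme can be simulated directly. The sketch assumes that the S balls of one stage are distinct (drawn without replacement within the stage) and that S comes from a caller-supplied sampler; both the sampler argument and the function name are hypothetical.

```python
import random

def white_balls_remaining(n, k, sample_size, rng=None):
    """Simulate k stages of the urn with n balls, all initially white.

    At each stage S = sample_size(rng) distinct balls are drawn uniformly
    (assumed 0 <= S <= n); white ones are colored red, and all are returned.
    Returns the number of white balls (uncollected coupons) left."""
    if rng is None:
        rng = random.Random()
    white = [True] * n
    remaining = n
    for _ in range(k):
        s = sample_size(rng)
        for ball in rng.sample(range(n), s):
            if white[ball]:
                white[ball] = False
                remaining -= 1
    return remaining
```

For example, `white_balls_remaining(1000, 5000, lambda r: r.randint(1, 3))` draws S uniformly from {1, 2, 3} at each of 5000 stages.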
In this paper we study the size of the largest clique ω(G(n, α)) in a random graph G(n, α) on n vertices which has power-law degree distribution with exponent α. We show that, for ‘flat’ degree sequences with α > 2, the largest clique in G(n, α) has constant size with high probability, while, for the heavy-tailed case 0 < α < 2, ω(G(n, α)) grows as a power of n. Moreover, we show that a simple, natural algorithm finds, with high probability, a clique of size (1 − o(1))ω(G(n, α)) in G(n, α) in polynomial time.
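To give a concrete picture of a simple polynomial-time clique heuristic, the sketch below scans vertices in order of decreasing degree and keeps each vertex that is adjacent to all vertices kept so far. This greedy rule is an assumption for illustration and is not necessarily the algorithm analysed in the paper; the graph is any undirected graph given as a dictionary of neighbour sets.

```python
def greedy_clique(adj):
    """Greedy clique heuristic (hypothetical illustration).

    adj maps each vertex to the set of its neighbours. Vertices are
    scanned by decreasing degree; a vertex joins the clique when it is
    adjacent to every vertex already chosen. Runs in O(n^2) time."""
    clique = []
    for v in sorted(adj, key=lambda u: len(adj[u]), reverse=True):
        if all(v in adj[u] for u in clique):
            clique.append(v)
    return clique
```

Scanning high-degree vertices first is a natural choice for heavy-tailed graphs, where the large cliques tend to live among the hubs.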
We investigate the final size distribution of the SIR (susceptible-infected-recovered) epidemic model in the critical regime. Using the integral representation of Martin-Löf (1998) for the hitting time of a Brownian motion with parabolic drift, we derive asymptotic expressions for the final size distribution that capture the effect of the initial number of infectives and the closeness of the reproduction number to one. These asymptotics shed light on the bimodality of the limiting density of the final size observed in Martin-Löf (1998). We also discuss the connection to the largest component in the Erdős-Rényi random graph, and, using this connection, find an integral expression of the Laplace transform of the normalized Brownian excursion area in terms of Airy functions.
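A quick way to see the final size distribution is to simulate a Reed–Frost chain-binomial SIR epidemic, in which each infective independently infects each susceptible with probability p during one generation. Under a standard coupling, the final size started from one infective has the law of the connected component of a fixed vertex in G(n, p). The choice of the Reed–Frost model and the parametrization p = λ/n with λ near 1 as "critical" are assumptions of this sketch, not necessarily the exact setting of the paper.

```python
import random

def reed_frost_final_size(n, p, initial_infected=1, rng=None):
    """Final size of a Reed–Frost SIR epidemic in a closed population of n.

    In each generation, every susceptible independently escapes each of
    the current infectives with probability 1 - p; infectives recover
    after one generation. Returns the total number ever infected."""
    if rng is None:
        rng = random.Random()
    susceptible = n - initial_infected
    infected = initial_infected
    total = initial_infected
    while infected > 0 and susceptible > 0:
        q = 1.0 - (1.0 - p) ** infected  # chance a susceptible is infected
        new = sum(1 for _ in range(susceptible) if rng.random() < q)
        susceptible -= new
        total += new
        infected = new
    return total
```

A histogram of `reed_frost_final_size(n, 1/n)` over many runs, for a few thousand individuals, shows a mass of small outbreaks together with a long tail of larger ones.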
In this paper we consider a generalized coupon collection problem in which a customer repeatedly buys a random number of distinct coupons in order to gather a large number n of available coupons. We address the following question: how many different coupons are collected after $k = k_n$ draws, as n → ∞? We identify three phases of $k_n$: the sublinear, the linear, and the superlinear. In the growing sublinear phase we see o(n) different coupons, and, when the number of coupons bought per draw is genuinely random, a Gaussian distribution is obtained across the entire phase under the appropriate centering and scaling. However, if the number of purchases is fixed, a degeneracy arises and normality holds only at the higher end of this phase. If the number of purchases has a fixed range, the small number of different coupons collected in the sublinear phase grows, in the linear phase, into a quantity that needs the centering and scaling of the usual central limit theorems to become normally distributed, with a normal distribution different from that of the sublinear phase. The Gaussian results are obtained via martingale theory. We say a few words in passing about the high probability of collecting nearly all the coupons in the superlinear phase. Our aim is to present the results in a way that explores the critical transition at the ‘seam line’ between different Gaussian phases, and between these phases and other nonnormal phases.
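The mean number of collected coupons follows from a one-line argument, assuming the draws are independent and each draw returns S distinct coupons chosen uniformly, with E[S] = μ. By symmetry a fixed coupon is missed by one draw with probability E[1 − S/n] = 1 − μ/n, so E[collected after k draws] = n(1 − (1 − μ/n)^k). The sketch below codes this formula and a Monte Carlo check; the function names are hypothetical.

```python
import random

def expected_collected(n, k, mean_s):
    """Exact mean number of distinct coupons after k independent draws,
    each returning S distinct uniform coupons with E[S] = mean_s."""
    return n * (1.0 - (1.0 - mean_s / n) ** k)

def simulate_collected(n, k, sample_size, rng):
    """One realization: sample_size(rng) gives S for each draw."""
    seen = set()
    for _ in range(k):
        seen.update(rng.sample(range(n), sample_size(rng)))
    return len(seen)
```

With k of order n the formula gives a constant fraction of n, matching the linear phase; with k/n → ∞ the expected number of missing coupons, n(1 − μ/n)^k, tends to 0 once k is large enough compared with n log n.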
We study first passage percolation (FPP) on the configuration model (CM) having power-law degrees with exponent τ ∈ [1, 2) and exponential edge weights. We derive the distributional limit of the minimal weight of a path between typical vertices in the network and the number of edges on the minimal-weight path, both of which can be computed in terms of the Poisson-Dirichlet distribution. We explicitly describe these limits via the construction of infinite limiting objects describing the FPP problem in the densely connected core of the network. We consider two separate cases: the original CM, in which each edge, regardless of its multiplicity, receives an independent exponential weight, and the erased CM, in which there is a single independent exponential weight between any pair of direct neighbors. While the results are qualitatively similar, surprisingly, the limiting random variables are quite different. Our results imply that the flow-carrying properties of the network are markedly different from either the mean-field setting or the locally tree-like setting, which occurs when τ > 2 and in which the hopcount between typical vertices scales as log n. In our setting the hopcount is tight and has an explicit limiting distribution, showing that information can be transferred remarkably quickly between different vertices in the network. This efficiency has a downside in that such networks are remarkably fragile to directed attacks. These results continue a general program by the authors to obtain a complete picture of how random disorder changes the inherent geometry of various random network models; see Aldous and Bhamidi (2010), Bhamidi (2008), and Bhamidi, van der Hofstad and Hooghiemstra (2009).
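The two models can be compared on finite graphs with a short simulation. The sketch pairs half-edges uniformly at random, gives each edge an independent Exp(1) weight (or, with `erased=True`, one weight per pair of distinct neighbours, dropping self-loops), and runs Dijkstra's algorithm to obtain the minimal weight and the hopcount of the optimal path. The caller supplies the degree sequence; sampling power-law degrees with a given τ is left out, and the function name is hypothetical.

```python
import heapq
import random

def fpp_configuration_model(degrees, source, target, erased=False, rng=None):
    """Minimal path weight and hopcount between source and target.

    Half-edges are paired uniformly at random. Each edge gets an Exp(1)
    weight; with erased=True, parallel edges share one weight and
    self-loops are dropped. Returns (inf, None) if target is unreachable."""
    if rng is None:
        rng = random.Random()
    if sum(degrees) % 2:
        raise ValueError("total degree must be even")
    half_edges = [v for v, d in enumerate(degrees) for _ in range(d)]
    rng.shuffle(half_edges)  # consecutive pairs form a uniform matching
    adj = {v: [] for v in range(len(degrees))}
    seen_pairs = set()
    for a, b in zip(half_edges[::2], half_edges[1::2]):
        if erased:
            key = (min(a, b), max(a, b))
            if a == b or key in seen_pairs:
                continue
            seen_pairs.add(key)
        w = rng.expovariate(1.0)
        adj[a].append((b, w))
        adj[b].append((a, w))
    # Dijkstra, recording the number of edges on each best path found.
    dist, hops = {source: 0.0}, {source: 0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue
        if u == target:
            return d, hops[u]
        for v, w in adj[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v], hops[v] = nd, hops[u] + 1
                heapq.heappush(heap, (nd, v))
    return float("inf"), None
```

Running both variants on the same heavy-tailed degree sequence and comparing the empirical hopcount distributions gives a hands-on view of the difference between the original and the erased CM.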
The Ehrenfest urn is a model for the diffusion of gases between two chambers. Classic research treats this system as a Markovian model with a fixed number of balls and derives its steady-state behavior as a binomial distribution (which can be approximated by a normal distribution). We study the gradual change of an urn containing a very large number n of balls from the initial condition to the steady state. We look at the status of the urn after $k_n$ draws. We identify three phases of $k_n$: the growing sublinear, the linear, and the superlinear. In the growing sublinear phase the amount of gas in each chamber is normally distributed, with parameters that are influenced by the initial conditions. In the linear phase a different normal distribution applies, in which the influence of the initial conditions is attenuated. The steady state is not a good approximation until a certain superlinear amount of time has elapsed. At the superlinear stage the mix is nearly perfect, with a nearly perfectly symmetric normal distribution in which the effect of the initial conditions is completely washed away. We give interpretations of how the results in different phases conjoin at the ‘seam lines’. In fact, these Gaussian phases are all manifestations of one master theorem. The results are obtained via martingale theory.
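The washing-out of the initial condition can be seen already in the mean. Assuming the classic dynamics (each draw picks one of the n balls uniformly and moves it to the other chamber), the count X_k in chamber A satisfies E[X_{k+1} | X_k] = (1 − 2/n)X_k + 1, so E[X_k] − n/2 = (X_0 − n/2)(1 − 2/n)^k. The deviation from n/2 therefore stays close to its initial value for sublinear k, shrinks by a constant factor for k of order n, and vanishes only for superlinear k. The sketch codes the chain and this formula; the function names are hypothetical.

```python
import random

def ehrenfest(n, x0, k, rng=None):
    """Number of balls in chamber A after k draws of the classic Ehrenfest
    urn with n balls, x0 of them initially in chamber A."""
    if rng is None:
        rng = random.Random()
    x = x0
    for _ in range(k):
        if rng.randrange(n) < x:  # the drawn ball is in chamber A
            x -= 1
        else:
            x += 1
    return x

def ehrenfest_mean(n, x0, k):
    """E[X_k] = n/2 + (x0 - n/2) * (1 - 2/n)**k."""
    return n / 2 + (x0 - n / 2) * (1 - 2 / n) ** k
```

Since each draw changes X_k by exactly one ball, X_k always has the same parity as x0 + k, a detail the normal approximations smooth over.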
Recurrence equations for the number of types and the frequency of each type in a random sample drawn from a finite population undergoing discrete, nonoverlapping generations and reproducing according to the Cannings exchangeable model are deduced under the assumption of a mutation scheme with infinitely many types. The case of overlapping generations in discrete time is also considered. The equations are developed for the Wright-Fisher model and the Moran model, and extended to the case of the limit coalescent with nonrecurrent mutation as the population size goes to ∞ and the mutation rate to 0. Computations of the total variation distance for the distribution of the number of types in the sample suggest that the exact Moran model provides a better approximation for the sampling formula under the exact Wright-Fisher model than the Ewens sampling formula in the limit of the Kingman coalescent with nonrecurrent mutation. On the other hand, this model seems to provide a good approximation for a Λ-coalescent with nonrecurrent mutation as long as the probability of multiple mergers and the mutation rate are small enough.
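As a reference point for such comparisons, the distribution of the number of types K_n under the Ewens sampling formula can be computed exactly with the Hoppe-urn recursion: the (m+1)-th sampled gene starts a new type with probability θ/(θ+m). The sketch below does this and adds a total variation helper; pmfs for the Moran or Wright-Fisher models would have to come from the paper's recurrences, which are not reproduced here. The function names are hypothetical.

```python
def ewens_num_types_pmf(n, theta):
    """P(K_n = k) for k = 0..n under the Ewens sampling formula with
    mutation parameter theta (n >= 1), via the Hoppe-urn recursion."""
    pmf = [0.0] * (n + 1)
    pmf[1] = 1.0  # a single gene has exactly one type
    for m in range(1, n):
        new = theta / (theta + m)  # chance the (m+1)-th gene is a new type
        nxt = [0.0] * (n + 1)
        for k in range(1, m + 1):
            nxt[k] += pmf[k] * (1 - new)
            nxt[k + 1] += pmf[k] * new
        pmf = nxt
    return pmf

def total_variation(p, q):
    """Total variation distance between two pmfs on the same support."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))
```

The mean of this pmf equals the familiar sum θ/θ + θ/(θ+1) + … + θ/(θ+n−1), which is a convenient sanity check.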
We introduce and motivate the study of (n + 1) × r arrays X with Bernoulli entries $X_{k,j}$ and independently distributed rows. We study the distribution of $S_n$, the number of consecutive pairs of successes (or runs of length 2) when reading the array down the columns and across the rows. With the case r = 1 having been studied by several authors, and permitting some initial inferences for the general case r > 1, we examine various distributional properties and representations of $S_n$ for the case r = 2 and, using a more explicit analysis, the case of multinomial and identically distributed rows. Applications are also given in cases where the array X arises from a Pólya sampling scheme.
We consider subtrees of various shapes lying on the fringe of a random recursive tree. We prove that (under suitable normalization) the number of isomorphic images of a given fixed tree shape on the fringe of the recursive tree is asymptotically Gaussian. The parameters of the asymptotic normal distribution involve the shape functional of the given tree. The proof uses the contraction method.
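Counting fringe subtrees by shape is straightforward once each subtree has a canonical key. In the sketch below a random recursive tree is grown by attaching node i to a uniformly chosen earlier node, and each fringe subtree (a node together with all its descendants) is encoded as the sorted tuple of its children's codes, so that isomorphic unordered subtrees receive equal keys. The encoding and function names are illustrative choices, not taken from the paper.

```python
import random
from collections import Counter

def random_recursive_tree(n, rng=None):
    """parents[i] is the parent of node i; node 0 is the root, and node i
    attaches to a uniformly random node among 0..i-1."""
    if rng is None:
        rng = random.Random()
    return [None] + [rng.randrange(i) for i in range(1, n)]

def fringe_shape_counts(parents):
    """Counter mapping each unordered shape to its number of fringe copies.

    A leaf is encoded as (); an internal node as the sorted tuple of its
    children's codes. Assumes every child has a larger label than its
    parent, as in a recursive tree, so a reverse scan sees children first."""
    children = [[] for _ in parents]
    for v, p in enumerate(parents):
        if p is not None:
            children[p].append(v)
    shape = [None] * len(parents)
    for v in reversed(range(len(parents))):
        shape[v] = tuple(sorted(shape[c] for c in children[v]))
    return Counter(shape)
```

For instance, the count of the key `((),)` is the number of fringe subtrees that are a single edge (a node with one leaf child), which can be tracked across many simulated trees to see the Gaussian behaviour emerge.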
Consider a sequence of exchangeable or independent binary trials ordered on a line or on a circle. We study the statistics counting the number of occurrences of an F-S string of length (at least) $k_1 + k_2$, that is, (at least) $k_1$ failures followed by (at least) $k_2$ successes, in n such trials. The associated waiting time for the rth occurrence of an F-S string of length (at least) $k_1 + k_2$ in linearly ordered trials is also examined. Exact formulae, lower and upper bounds, and approximations are derived for their distributions. Exact formulae for the means and variances of the number of occurrences of F-S strings are also given. Particular exchangeable and independent sequences of binary random variables used in applied research, together with numerical examples, further clarify the theoretical results.
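For linearly ordered trials, one natural way to count "at least" F-S strings is to split the sequence into maximal runs and count each failure run of length at least k1 that is immediately followed by a success run of length at least k2. This counting convention is an assumption of the sketch (other conventions, such as exact lengths or circular ordering, give different statistics); 0 denotes a failure and 1 a success.

```python
from itertools import groupby

def count_fs_strings(trials, k1, k2):
    """Number of adjacent (failure run, success run) pairs in a linear 0/1
    sequence with failure run length >= k1 and success run length >= k2."""
    runs = [(value, len(list(group))) for value, group in groupby(trials)]
    count = 0
    for (v1, len1), (v2, len2) in zip(runs, runs[1:]):
        if v1 == 0 and v2 == 1 and len1 >= k1 and len2 >= k2:
            count += 1
    return count
```

Applying this to simulated sequences gives Monte Carlo estimates to set against exact formulae for the mean and variance.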
Let $K_n$ denote the number of types of a sample of size $n$ taken from an exchangeable coalescent process (Ξ-coalescent) with mutation. A distributional recursion for the sequence $(K_n)_{n\in\mathbb{N}}$ is derived. If the coalescent does not have proper frequencies, i.e. if the characterizing measure Ξ on the infinite simplex Δ does not have mass at 0 and satisfies $\int_\Delta |x|\,\Xi(dx)/(x,x) < \infty$, where $|x| := \sum_{i=1}^\infty x_i$ and $(x,x) := \sum_{i=1}^\infty x_i^2$ for $x = (x_1, x_2, \ldots) \in \Delta$, then $K_n/n$ converges weakly as $n \to \infty$ to a limiting variable $K$ that is characterized by an exponential integral of the subordinator associated with the coalescent process. For so-called simple measures Ξ satisfying $\int_\Delta \Xi(dx)/(x,x) < \infty$, we characterize the distribution of $K$ via a fixed-point equation.
We propose an alternative condition for stochastic domination. This condition differs in an essential way from the strong likelihood ratio property. We also give an example that satisfies the new condition but not the strong likelihood ratio property.
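For orientation only, the sketch below checks two classical orders for distributions on {0, 1, …, m}: the usual stochastic order (P(X > t) ≤ P(Y > t) for all t) and the likelihood ratio order (q/p nondecreasing). It is well known that the likelihood ratio order implies the stochastic order but not conversely. Neither the paper's new condition nor its strong likelihood ratio property is reproduced here; the function names are hypothetical.

```python
def stochastically_leq(p, q, tol=1e-12):
    """X <=_st Y for pmfs p (of X) and q (of Y) on 0..len(p)-1:
    P(X > t) <= P(Y > t) for every t."""
    return all(sum(p[t + 1:]) <= sum(q[t + 1:]) + tol for t in range(len(p)))

def likelihood_ratio_leq(p, q, tol=1e-12):
    """X <=_lr Y: p[i] * q[j] >= p[j] * q[i] whenever i < j,
    i.e. q/p is nondecreasing where defined."""
    n = len(p)
    return all(p[i] * q[j] + tol >= p[j] * q[i]
               for i in range(n) for j in range(i + 1, n))
```

The pair p = [0.4, 0.2, 0.4], q = [0.3, 0.1, 0.6] satisfies the stochastic order but not the likelihood ratio order, since q/p takes the values 0.75, 0.5, 1.5.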
The lying oracle problem asks for optimal strategies in a two-person game in which an oracle predicts the outcomes of coin flips and a player bets on the outcomes. The oracle announces whether the coin will land heads or tails, but may at times lie. We analyze the variant of the game which uses a biased coin, where the probability p that the coin lands heads is common knowledge. We determine optimal strategies for both the oracle and the player, and we give an explicit expression for the expected payoff to the player when the coin is flipped n times and the oracle may lie at most k times.
The probability that two randomly selected phylogenetic trees of the same size are isomorphic is found to be asymptotic to a decreasing exponential modulated by a polynomial factor. The number of symmetrical nodes in a random phylogenetic tree of large size obeys a limiting Gaussian distribution, in the sense of both central and local limits. The probability that two random phylogenetic trees have the same number of symmetries asymptotically obeys an inverse square-root law. Precise estimates for these problems are obtained by methods of analytic combinatorics, involving bivariate generating functions, singularity analysis, and quasi-powers approximations.
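A small helper makes the notion of a symmetrical node concrete. Assuming a rooted binary phylogenetic tree written as nested 2-tuples with leaf labels at the bottom, and calling an internal node symmetrical when its two subtrees have the same unlabeled shape, the sketch counts such nodes; the automorphism group of the unlabeled shape then has 2^s elements for s symmetrical nodes. The representation and names are assumptions of this illustration.

```python
def shape(tree):
    """Canonical unordered shape of a binary tree given as nested 2-tuples;
    leaf labels are ignored and every leaf has shape ()."""
    if not isinstance(tree, tuple):
        return ()
    return tuple(sorted((shape(tree[0]), shape(tree[1]))))

def symmetric_nodes(tree):
    """Number of internal nodes whose two subtrees have the same shape."""
    if not isinstance(tree, tuple):
        return 0
    left, right = tree
    here = 1 if shape(left) == shape(right) else 0
    return here + symmetric_nodes(left) + symmetric_nodes(right)
```

For example, the balanced tree `(("a", "b"), ("c", "d"))` has three symmetrical nodes (the root and both cherries), while `(("a", "b"), "c")` has one.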
In an infinite sequence of independent Bernoulli trials with success probabilities $p_k = a/(a+b+k-1)$ for $k = 1, 2, 3, \ldots$, let $N_r$ be the number of occurrences of r consecutive successes, where $r \ge 2$. Expressions for the first two moments of $N_r$ are derived. Asymptotics of the probability of no occurrence of r consecutive successes are obtained for large r. Using an embedding in a marked Poisson process, it is indicated how the distribution of $N_r$ can be calculated for small r.
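If occurrences are counted as overlapping windows (the number of indices k at which r consecutive successes start), linearity of expectation gives E[N_r] = Σ_k p_k p_{k+1} ⋯ p_{k+r−1}. This counting convention is an assumption and may differ from the paper's definition for r ≥ 3. For a = 1, b = 0 and r = 2 the series telescopes: Σ 1/(k(k+1)) = 1. The sketch evaluates a truncated version of the series.

```python
def expected_runs(a, b, r, terms=10**5):
    """Truncated series sum_{k=1}^{terms} prod_{i=0}^{r-1} p_{k+i} with
    p_k = a / (a + b + k - 1): the expected number of overlapping windows
    of r consecutive successes, up to the truncation error."""
    def p(k):
        return a / (a + b + k - 1)
    total = 0.0
    for k in range(1, terms + 1):
        prod = 1.0
        for i in range(r):
            prod *= p(k + i)
        total += prod
    return total
```

Because p_k decays like a/k, the omitted tail of the series is of order terms^(1 − r), so the truncation is mildest for larger r.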
Let $(X_i)_{i\in\mathbb{N}}$ be a sequence of independent and identically distributed random variables with values in the set $\mathbb{N}_0$ of nonnegative integers. Motivated by applications in enumerative combinatorics and analysis of algorithms, we investigate the number of gaps and the length of the longest gap in the set $\{X_1, \ldots, X_n\}$ of the first n values. We obtain necessary and sufficient conditions in terms of the tail sequence $(q_k)_{k\in\mathbb{N}_0}$, $q_k = P(X_1 \ge k)$, for the gaps to vanish asymptotically as $n \to \infty$: these are $\sum_{k=0}^\infty q_{k+1}/q_k < \infty$ and $\lim_{k\to\infty} q_{k+1}/q_k = 0$ for convergence almost surely and convergence in probability, respectively. We further show that the length of the longest gap tends to ∞ in probability if $q_{k+1}/q_k \to 1$. For the family of geometric distributions, which can be regarded as the borderline case between the light-tailed and the heavy-tailed situations and which is also of particular interest in applications, we study the distribution of the length of the longest gap, using a construction based on the Sukhatme–Rényi representation of exponential order statistics to resolve the asymptotic distributional periodicities.
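The gap statistics are easy to compute from a sample. The sketch assumes one plausible reading: a gap value is an integer between 0 and the sample maximum that does not appear in the sample, and the longest gap is the longest run of consecutive gap values. It also samples the geometric borderline case with q_k = p^k, for which q_{k+1}/q_k = p is constant. The function names are hypothetical.

```python
import random

def gap_statistics(sample):
    """(number of missing values in 0..max(sample),
        length of the longest run of consecutive missing values)."""
    present = set(sample)
    missing = longest = run = 0
    for v in range(max(present) + 1):
        if v in present:
            run = 0
        else:
            missing += 1
            run += 1
            longest = max(longest, run)
    return missing, longest

def geometric_sample(n, p, rng):
    """n i.i.d. values with P(X = k) = (1 - p) * p**k, so q_k = p**k."""
    out = []
    for _ in range(n):
        k = 0
        while rng.random() < p:
            k += 1
        out.append(k)
    return out
```

Repeating `gap_statistics(geometric_sample(n, 0.5, rng))` for increasing n illustrates that, in the geometric case, the gap statistics do not settle down to a single limit.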
In this work we give precise asymptotic expressions for the probability of the existence of fixed-size components at the threshold of connectivity for random geometric graphs.
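Component sizes of a random geometric graph can be examined on small instances with a brute-force union-find. The sketch joins two points when their Euclidean distance is at most r and returns the sorted component sizes; the choice of the unit square and of r is left to the caller, and the quadratic pair scan is only meant for small n. The function name is hypothetical.

```python
import math
import random

def component_sizes(points, r):
    """Sorted component sizes of the graph joining points at distance <= r
    (O(n^2) pair scan with union-find; suitable for small n)."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= r:
                parent[find(i)] = find(j)
    sizes = {}
    for i in range(len(points)):
        root = find(i)
        sizes[root] = sizes.get(root, 0) + 1
    return sorted(sizes.values())
```

For example, with `pts = [(rng.random(), rng.random()) for _ in range(n)]`, the frequency of components of size 1 or 2 in `component_sizes(pts, r)` over repeated samples gives an empirical estimate of the probabilities discussed above.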
In this paper we consider a class of quasi-birth-and-death processes for which explicit solutions can be obtained for the rate matrix R and the associated matrix G. The probabilistic interpretations of these matrices allow us to describe their elements in terms of paths on the two-dimensional lattice. Then determining explicit expressions for the matrices becomes equivalent to solving a lattice path counting problem, the solution of which is derived using path decomposition, Bernoulli excursions, and hypergeometric functions. A few applications are provided, including classical models for which we obtain some new results.
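When closed forms are not available, G and R are usually computed numerically, which also gives a check on explicit solutions. The sketch assumes a discrete-time QBD with blocks A0 (level up), A1 (same level) and A2 (level down), a common but not universal convention. It finds G as the minimal nonnegative solution of G = A2 + A1 G + A0 G² by functional iteration from zero, then uses the standard relation R = A0 (I − A1 − A0 G)^(−1). It relies on NumPy; the function name is hypothetical.

```python
import numpy as np

def qbd_G_R(A0, A1, A2, tol=1e-12, max_iter=100_000):
    """Minimal nonnegative G solving G = A2 + A1 G + A0 G^2, and
    R = A0 (I - A1 - A0 G)^{-1}, which solves R = A0 + R A1 + R^2 A2.
    Convention: A0 moves the level up, A1 keeps it, A2 moves it down."""
    A0, A1, A2 = (np.asarray(A, dtype=float) for A in (A0, A1, A2))
    G = np.zeros_like(A1)
    for _ in range(max_iter):
        G_next = A2 + A1 @ G + A0 @ G @ G
        if np.max(np.abs(G_next - G)) < tol:
            G = G_next
            break
        G = G_next
    else:
        raise RuntimeError("functional iteration did not converge")
    I = np.eye(A1.shape[0])
    R = A0 @ np.linalg.inv(I - A1 - A0 @ G)
    return G, R
```

In the scalar birth-death case with up-probability 0.2 and down-probability 0.3, this returns G = 1 and R = 0.2/0.3 = 2/3.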
Mallows and Shepp (2008) developed the following necklace processes. Start with a necklace consisting of one white bead and one black bead, and insert, one at a time, under a deterministic rule, a white bead or a black bead between a randomly chosen adjacent pair. They studied the statistical properties of the number of white beads by investigating the nature of the moments and the expected number of gaps of given length between white beads. In this note we study the number of white beads via Pólya urns and give a classification of necklace processes for some general rules. Additionally, we discuss the number of runs, i.e. maximal blocks of consecutive beads of the same color, instead of the number of gaps.
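The process can be simulated for any deterministic insertion rule. In the sketch, the necklace is a circular list, each step picks a uniformly random adjacent pair (including the pair formed by the last and first beads), and a caller-supplied `rule(left, right)` decides the color of the inserted bead. The example rule is hypothetical and is not one of the rules classified in the paper; runs are counted on the circle.

```python
import random

def necklace(steps, rule, rng=None):
    """Grow a circular necklace from one white ('W') and one black ('B')
    bead; each step inserts rule(left, right) between a uniformly random
    adjacent pair (beads[i], beads[(i + 1) % len(beads)])."""
    if rng is None:
        rng = random.Random()
    beads = ["W", "B"]
    for _ in range(steps):
        i = rng.randrange(len(beads))
        left, right = beads[i], beads[(i + 1) % len(beads)]
        beads.insert(i + 1, rule(left, right))  # i + 1 == len appends
    return beads

def circular_runs(beads):
    """Number of maximal same-color blocks around the circle."""
    changes = sum(beads[i] != beads[i - 1] for i in range(len(beads)))
    return changes if changes else 1

def example_rule(left, right):
    """Hypothetical rule: white between equal beads, black otherwise."""
    return "W" if left == right else "B"
```

Tracking `beads.count("W")` and `circular_runs(beads)` along a long run gives empirical curves to compare with urn-based predictions.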