In DNA sequences, specific words may take on biological functions as marker or signalling sequences. Frequent-word analyses can often identify such words because they are particularly abundant. Accurate statistics are needed to assess the statistical significance of these word frequencies. The set of shuffled sequences (letter sequences that have the same k-word composition as the sequence being analysed, for some choice of k) is considered the most appropriate sample space for analysing word counts. However, little is known about these word counts. Here we present exact formulae for word counts in shuffled sequences.
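The simplest case is k = 1, where shuffles preserve only the letter counts. In that case the expected count of a word has a closed form, because under a uniformly random arrangement of the letters every window is equally likely to spell the word. The Python sketch below illustrates that case only. It is not the paper's formula for general k, and the function names are hypothetical. A brute-force average over all permutations is included as a cross-check.

```python
from collections import Counter
from itertools import permutations
from math import perm


def count_occurrences(seq: str, word: str) -> int:
    """Count possibly overlapping occurrences of `word` in `seq`."""
    m = len(word)
    return sum(1 for i in range(len(seq) - m + 1) if seq[i:i + m] == word)


def expected_count_1shuffle(seq: str, word: str) -> float:
    """Exact mean count of `word` over uniform letter permutations of `seq` (k = 1 only)."""
    n, m = len(seq), len(word)
    if m > n:
        return 0.0
    have, need = Counter(seq), Counter(word)
    # P(a fixed window spells `word`) = prod_a (n_a)_(c_a) / (n)_m  (falling factorials);
    # math.perm returns 0 when a letter is needed more often than it occurs.
    numerator = 1
    for letter, c in need.items():
        numerator *= perm(have[letter], c)
    return (n - m + 1) * numerator / perm(n, m)


def brute_force_expected(seq: str, word: str) -> float:
    """Average over all n! orderings (repeated letters weight every arrangement equally)."""
    counts = [count_occurrences("".join(p), word) for p in permutations(seq)]
    return sum(counts) / len(counts)
```

For `"AACGT"` and the word `"AC"`, each of the four windows spells `AC` with probability (2/5)(1/4). The expected count is therefore 0.4.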
Consider n cells, of which some are target cells, and suppose that each cell has a weight. The cells are killed in a sequential manner, with each currently live cell being the next one killed with a probability proportional to its weight. We study the distribution of the number of cells that are alive at the moment when all the target cells have been killed.
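The killing scheme can be simulated directly. The Python sketch below uses hypothetical function names. Each victim is drawn from the live cells with probability proportional to its weight, and the sketch records how many cells are still alive when the last target dies. It gives only a Monte Carlo estimate, not the exact distribution studied in the paper.

```python
import random


def alive_when_targets_dead(weights, targets, rng):
    """Kill live cells one by one, each chosen with probability proportional to its
    weight; return how many cells are alive once every target cell has been killed."""
    alive = list(range(len(weights)))
    remaining = set(targets)
    while remaining:
        victim = rng.choices(alive, weights=[weights[i] for i in alive], k=1)[0]
        alive.remove(victim)
        remaining.discard(victim)
    return len(alive)


def mean_alive(weights, targets, reps, seed=0):
    """Monte Carlo estimate of the expected number of survivors."""
    rng = random.Random(seed)
    return sum(alive_when_targets_dead(weights, targets, rng) for _ in range(reps)) / reps
```

Consider five equal weights and a single target. The target dies at a uniformly random step, so the number of survivors is uniform on 0, 1, 2, 3, 4 and has mean 2.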
Consider the single-server queue with an infinite buffer and a first-in–first-out discipline, either of type M/M/1 or Geom/Geom/1. Denote by 𝒜 the arrival process and by s the services. Assume the stability condition to be satisfied. Denote by 𝒟 the departure process in equilibrium and by r the time spent by the customers at the very back of the queue. We prove that (𝒟, r) has the same law as (𝒜, s), which is an extension of the classical Burke theorem. In fact, r can be viewed as the sequence of departures from a dual storage model. This duality between the two models also appears when studying the transient behaviour of a tandem by means of the Robinson–Schensted–Knuth algorithm: the first and last rows of the resulting semistandard Young tableau are respectively the last instant of departure from the queue and the total number of departures from the store.
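The classical part of Burke's theorem says that the departure process of a stable M/M/1 queue in equilibrium is Poisson with the arrival rate. A quick numerical check simulates the queue-length Markov chain, started from its stationary geometric law. This hypothetical sketch looks only at the departure times. It does not examine the time r spent at the back of the queue, nor the joint law of (𝒟, r).

```python
import random


def mm1_departures(lam, mu, horizon, seed=0):
    """Simulate a stationary M/M/1 queue on [0, horizon] and return its departure times."""
    assert lam < mu, "stability condition rho = lam / mu < 1"
    rng = random.Random(seed)
    rho = lam / mu
    # Start in equilibrium: P(Q = k) = (1 - rho) * rho**k
    q = 0
    while rng.random() < rho:
        q += 1
    t, departures = 0.0, []
    while True:
        rate = lam + (mu if q > 0 else 0.0)   # total rate of the next event
        t += rng.expovariate(rate)
        if t > horizon:
            return departures
        if rng.random() < lam / rate:
            q += 1                             # arrival
        else:
            q -= 1                             # service completion
            departures.append(t)
```

The gaps between departures should then look exponential with mean 1/λ, so their coefficient of variation should be close to 1.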
Consider the random graph model of Barabási and Albert, where we add a new vertex in every step and connect it to some old vertices with probabilities proportional to their degrees. If we connect it to only one of the old vertices then this will be a tree. These graphs have been shown to have a power-law degree distribution, the same as that observed in some large real-world networks. We are interested in the width of the tree and we show that it is at the nth step; this also holds for a slight generalization of the model with another constant. We then see how this theoretical result can be applied to directory trees.
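The tree version of the Barabási–Albert model can be grown with the standard endpoint-list trick. Each vertex appears in the list once per incident edge, so a uniform entry from the list selects a vertex with probability proportional to its degree. The sketch below makes three illustrative assumptions: a two-vertex seed tree, the function name, and taking the width to be the size of the most populous level (the usual definition).

```python
import random
from collections import Counter


def ba_tree_width(n, seed=0):
    """Grow a Barabási-Albert tree (one edge per new vertex) with n vertices and
    return (width, level sizes); width = size of the most populous level."""
    rng = random.Random(seed)
    depth = [0, 1]          # vertex 0 is the root; vertex 1 hangs below it
    endpoints = [0, 1]      # each vertex appears once per incident edge (= its degree)
    for v in range(2, n):
        parent = rng.choice(endpoints)   # uniform endpoint => probability ∝ degree
        depth.append(depth[parent] + 1)
        endpoints.extend((parent, v))
    levels = Counter(depth[:n])
    return max(levels.values()), levels
```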
Berry-Esseen-type bounds to the normal, based on zero- and size-bias couplings, are derived using Stein's method. The zero biasing bounds are illustrated in an application to combinatorial central limit theorems in which the random permutation has either the uniform distribution or one that is constant over permutations with the same cycle type, with no fixed points. The size biasing bounds are applied to the occurrences of fixed, relatively ordered subsequences (such as rising sequences) in a random permutation, and to the occurrences of patterns, extreme values, and subgraphs in finite graphs.
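The statistic behind the combinatorial central limit theorem is $Y = \sum_i a_{i,\pi(i)}$ for a random permutation π. For uniform π, Hoeffding's classical formulas give its mean and variance:

- $E\,Y = \frac{1}{n}\sum_{i,j} a_{ij}$
- $\operatorname{Var} Y = \frac{1}{n-1}\sum_{i,j} (a_{ij} - a_{i\cdot} - a_{\cdot j} + a_{\cdot\cdot})^2$, where dots denote row, column and grand averages.

The sketch below computes these quantities and checks them by enumeration. It covers the uniform case only. It does not implement the zero-bias coupling or the Berry–Esseen bounds.

```python
from itertools import permutations


def comb_clt_moments(a):
    """Hoeffding's mean and variance of Y = sum_i a[i][pi(i)], pi uniform on S_n (n >= 2)."""
    n = len(a)
    row = [sum(r) / n for r in a]                                 # a_{i.}
    col = [sum(a[i][j] for i in range(n)) / n for j in range(n)]  # a_{.j}
    grand = sum(row) / n                                          # a_{..}
    mean = n * grand                                              # (1/n) * sum_{i,j} a_ij
    var = sum((a[i][j] - row[i] - col[j] + grand) ** 2
              for i in range(n) for j in range(n)) / (n - 1)
    return mean, var


def brute_force_moments(a):
    """Same quantities by averaging over all n! permutations (small n only)."""
    n = len(a)
    ys = [sum(a[i][p[i]] for i in range(n)) for p in permutations(range(n))]
    m = sum(ys) / len(ys)
    return m, sum((y - m) ** 2 for y in ys) / len(ys)
```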
The convex hull of n independent random points in $\mathbb{R}^d$, chosen according to the normal distribution, is called a Gaussian polytope. Estimates for the variance of the number of i-faces and for the variance of the i-th intrinsic volume of a Gaussian polytope in $\mathbb{R}^d$, $d \in \mathbb{N}$, are established by means of the Efron-Stein jackknife inequality and a new formula of Blaschke-Petkantschin type. These estimates imply laws of large numbers for the number of i-faces and for the i-th intrinsic volume of a Gaussian polytope as $n \to \infty$.
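In the plane (d = 2) and for i = 0 (vertices), the face count of a Gaussian polytope can be explored by simulation. The sketch samples standard normal points and counts the vertices of their convex hull with Andrew's monotone-chain algorithm. It is a hypothetical Monte Carlo illustration of the quantity whose variance is estimated, not the Efron–Stein argument itself.

```python
import random


def hull_vertex_count(points):
    """Number of convex-hull vertices (Andrew's monotone chain; collinear points dropped)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return len(pts)

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return len(lower) + len(upper) - 2   # both chains repeat their endpoints


def gaussian_polygon_vertices(n, rng):
    """Vertex count of the convex hull of n standard normal points in R^2."""
    return hull_vertex_count([(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)])


def vertex_count_stats(n, reps, seed=0):
    """Sample mean and (unbiased) variance of the vertex count."""
    rng = random.Random(seed)
    xs = [gaussian_polygon_vertices(n, rng) for _ in range(reps)]
    m = sum(xs) / reps
    return m, sum((x - m) ** 2 for x in xs) / (reps - 1)
```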
In a tree, a level consists of all those nodes that are the same distance from the root. We derive asymptotic approximations to the correlation coefficients of two level sizes in random recursive trees and binary search trees. These coefficients undergo sharp sign-changes when one level is fixed and the other is varying. We also propose a new means of deriving an asymptotic estimate for the expected width, which is the number of nodes at the most abundant level. Crucial to our methods of proof is the uniformity achieved by singularity analysis.
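In a random recursive tree, each new node attaches to a uniformly chosen earlier node. Its level sizes are cheap to sample, which gives an empirical view of the correlation between two levels and of the width (the largest level size). The sketch below is a plain Monte Carlo estimate with hypothetical names. It does not implement the singularity-analysis asymptotics.

```python
import random


def rrt_levels(n, rng):
    """Level sizes of a random recursive tree: node i attaches to a uniform earlier node."""
    depth = [0]
    for i in range(1, n):
        depth.append(depth[rng.randrange(i)] + 1)
    sizes = [0] * (max(depth) + 1)
    for d in depth:
        sizes[d] += 1
    return sizes


def level_correlation(n, k, j, reps, seed=0):
    """Empirical Pearson correlation between the sizes of levels k and j."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(reps):
        s = rrt_levels(n, rng)
        xs.append(s[k] if k < len(s) else 0)
        ys.append(s[j] if j < len(s) else 0)
    mx, my = sum(xs) / reps, sum(ys) / reps
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / reps
    vx = sum((x - mx) ** 2 for x in xs) / reps
    vy = sum((y - my) ** 2 for y in ys) / reps
    if vx == 0 or vy == 0:
        return float("nan")   # e.g. level 0 always has exactly one node
    return cov / (vx * vy) ** 0.5
```

The width of a sampled tree is `max(rrt_levels(n, rng))`.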
Let $Y_k(\omega)$ ($k \ge 0$) be the number of vertices of a Galton-Watson tree $\omega$ that have $k$ children, so that $Z(\omega) := \sum_{k \ge 0} Y_k(\omega)$ is the total progeny of $\omega$. In this paper, we prove various statistical properties of $Z$ and $Y_k$. We first show, under a mild condition, an asymptotic expansion of $P(Z = n)$ as $n \to \infty$, improving the theorem of Otter (1949). Next, we show that $\bar{Y}_k(\omega) := \sum_{j=0}^{k} Y_j(\omega)$ is the total progeny of a new Galton-Watson tree that is hidden in the original tree $\omega$. We then study the joint probability distribution of $Z$ and $\{Y_k\}_k$, and show that, as $n \to \infty$, $\{Y_k/n\}_k$ is asymptotically Gaussian under the conditional distribution $P(\cdot \mid Z = n)$.
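A Galton–Watson tree can be explored one vertex at a time while recording how many vertices have k children. The sketch below assumes a geometric offspring law $P(k) = p(1-p)^k$ with p = 0.6. This law is chosen only because it is subcritical (mean 2/3), so the tree is finite with probability one. The size cap is a safeguard for other laws. Every finite tree satisfies $\sum_k k\,Y_k = Z - 1$, because each vertex other than the root is the child of exactly one vertex.

```python
import random
from collections import Counter


def gw_degree_counts(offspring, rng, max_nodes=10**6):
    """Explore one Galton-Watson tree and return a Counter {k: Y_k}.
    `offspring(rng)` samples a child count; raises if the tree exceeds max_nodes."""
    counts = Counter()
    pending = 1          # vertices generated but not yet given their children
    explored = 0
    while pending:
        k = offspring(rng)
        counts[k] += 1
        pending += k - 1
        explored += 1
        if explored > max_nodes:
            raise RuntimeError("tree too large; use a subcritical law or raise max_nodes")
    return counts


def geometric_offspring(rng, p=0.6):
    """P(k) = p * (1 - p)**k, mean (1 - p) / p; p = 0.6 is an assumed subcritical choice."""
    k = 0
    while rng.random() > p:
        k += 1
    return k
```

Given `counts = gw_degree_counts(geometric_offspring, rng)`, $Z$ is `sum(counts.values())` and $\bar{Y}_k$ is `sum(counts[j] for j in range(k + 1))`.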
A simplified proof of Thorp and Walden's fundamental theorem of card counting is presented, and a corresponding central limit theorem is established. Results are applied to the casino game of trente et quarante, which was studied by Poisson and De Morgan.
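In trente et quarante, cards are dealt into a row until the row's point total exceeds 30. Every row therefore ends on a total between 31 and 40. The sketch below assumes the usual rules: a six-deck shoe, aces counting 1, and court cards counting 10. It deals the two rows from a shuffled shoe. The function names are hypothetical.

```python
import random


def shoe(decks=6):
    """Point values of a shoe: ace = 1, 2-10 at face value, J/Q/K = 10 (assumed rules)."""
    one_suit = list(range(1, 11)) + [10, 10, 10]
    return one_suit * 4 * decks


def deal_row(cards, start):
    """Deal from cards[start:] until the total exceeds 30; return (total, next index)."""
    total, i = 0, start
    while total <= 30:
        total += cards[i]
        i += 1
    return total, i


def deal_coup(rng, decks=6):
    """Shuffle a fresh shoe and deal the first (black) and second (red) rows."""
    cards = shoe(decks)
    rng.shuffle(cards)
    black, nxt = deal_row(cards, 0)
    red, _ = deal_row(cards, nxt)
    return black, red
```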
Throw n points sequentially and at random onto a unit circle and append a clockwise arc (or rod) of length s to each such point. The resulting random set (the free gas of rods) is a union of a random number of clusters with random sizes modelling a free deposition process on a one-dimensional substrate. A variant of this model is investigated in order to take into account the role of the disorder, θ > 0; this involves Dirichlet(θ) distributions. For such free deposition processes with disorder θ, we shall be interested in the occurrence times and probabilities, as n grows, of two specific types of configurations: those avoiding overlapping rods (the hard-rod gas) and those for which the largest gap is smaller than the rod length s (the packing gas). Special attention is paid to the thermodynamic limit when ns = ρ for some finite density ρ of points. The occurrence of parking configurations, those for which hard-rod and packing constraints are both fulfilled, is then studied. Finally, some aspects of these problems are investigated in the low-disorder limit θ ↓ 0 as n ↑ ∞ while nθ = γ > 0. Here, Poisson-Dirichlet(γ) partitions play some role.
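The sketch below assumes uniformly placed points on a circle of circumference 1. This is taken here to correspond to θ = 1, where the spacings are Dirichlet(1, …, 1). In that case the hard-rod probability has the classical closed form $(1 - ns)_+^{\,n-1}$, since all n spacings must exceed s. The sketch compares the formula with a Monte Carlo estimate. It does not cover general disorder θ or the packing constraint.

```python
import random


def hard_rod_probability_exact(n, s):
    """P(no two of the n rods of length s overlap) for uniform points (assumed theta = 1)."""
    return max(0.0, 1.0 - n * s) ** (n - 1)


def hard_rod_probability_mc(n, s, reps, seed=0):
    """Monte Carlo estimate of the same probability."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        pts = sorted(rng.random() for _ in range(n))
        gaps = [b - a for a, b in zip(pts, pts[1:])] + [1.0 - pts[-1] + pts[0]]
        # the rod at a point overlaps the next rod exactly when the gap between them is < s
        hits += all(g >= s for g in gaps)
    return hits / reps
```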
In this paper we study different types of planar random motions (performed with constant velocity) with three directions, defined by the vectors $d_j = (\cos(2\pi j/3), \sin(2\pi j/3))$ for $j = 0, 1, 2$, changing at Poisson-paced times. We examine the cyclic motion (where the change of direction is deterministic), the completely uniform motion (where at each Poisson event each direction can be taken with probability 1/3) and the symmetrically deviating case (where the particle can choose all directions except that taken before the Poisson event). For each of the above random motions we derive the explicit distribution of the position of the particle, by using an approach based on order statistics. We prove that the densities obtained are solutions of the partial differential equations governing the processes. We are also able to give the explicit distributions on the boundary and, for the case of the symmetrically deviating motion, we can write it as the distribution of a telegraph process. For the symmetrically deviating motion we use a generalization of the Bose-Einstein statistics in order to determine the distribution of the triple $(N_0, N_1, N_2)$ (conditional on $N(t) = k$, with $N_0 + N_1 + N_2 = N(t) + 1$, where $N(t)$ is the number of Poisson events in $[0, t]$), where $N_j$ denotes the number of times the direction $d_j$ ($j = 0, 1, 2$) is taken. Possible extensions to four or more directions are briefly considered.
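The cyclic motion can be simulated by holding each direction for an exponential time and switching $d_j \to d_{(j+1) \bmod 3}$ at each Poisson event. The position at time t is $c\,(T_0 d_0 + T_1 d_1 + T_2 d_2)$ with $T_0 + T_1 + T_2 = t$. It therefore always lies in the triangle with vertices $c\,t\,d_j$, which gives a simple sanity check. The speed c, the starting direction and the names are assumptions of this sketch. Changing the update of `j` gives the uniform or symmetrically deviating variants.

```python
import math
import random

# The three unit directions d_j = (cos(2*pi*j/3), sin(2*pi*j/3))
DIRS = [(math.cos(2 * math.pi * j / 3), math.sin(2 * math.pi * j / 3)) for j in range(3)]


def cyclic_motion_position(t, rate, c=1.0, rng=None, start_dir=0):
    """Position at time t of the cyclic motion: speed c, direction d_j -> d_{j+1 mod 3}
    at each event of a Poisson process with the given rate."""
    rng = rng or random.Random()
    x = y = 0.0
    j, elapsed = start_dir, 0.0
    while True:
        hold = rng.expovariate(rate)        # time until the next Poisson event
        remaining = t - elapsed
        step = min(hold, remaining)
        x += c * step * DIRS[j][0]
        y += c * step * DIRS[j][1]
        if hold >= remaining:
            return x, y
        elapsed += hold
        j = (j + 1) % 3                     # deterministic cyclic change of direction
```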
We investigate the limit distributions associated with cost measures in Sattolo's algorithm for generating random cyclic permutations. The number of moves made by an element turns out to be a mixture of 1 and 1 plus a geometric distribution with parameter ½, where the mixing probability is the limiting ratio of the rank of the element being moved to the size of the permutation. On the other hand, the raw distance traveled by an element to its final destination does not converge in distribution without norming. Linearly scaled, the distance converges to a mixture of a uniform and a shifted product of a pair of independent uniforms. The results are obtained via randomization as a transform, followed by derandomization as an inverse transform. The work extends an analysis by Prodinger.
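Sattolo's algorithm differs from the Fisher–Yates shuffle only in one respect: the swap partner is drawn from positions strictly below the current one. This forces the result to be a single n-cycle. In the sketch below, `moves` counts the swaps each element takes part in. That is one plausible reading of the cost measure and is an assumption, as are the function names.

```python
import random


def sattolo(n, rng=None):
    """Sattolo's algorithm on [0, ..., n-1]; returns (cyclic permutation, moves per element)."""
    rng = rng or random.Random()
    a = list(range(n))
    moves = [0] * n
    for i in range(n - 1, 0, -1):
        j = rng.randrange(i)            # 0 <= j < i: j == i is excluded, unlike Fisher-Yates
        a[i], a[j] = a[j], a[i]
        moves[a[i]] += 1                # both swapped elements count one move
        moves[a[j]] += 1
    return a, moves


def is_single_cycle(perm):
    """True if i -> perm[i] is one cycle through all positions."""
    x, length = perm[0], 1
    while x != 0:
        x, length = perm[x], length + 1
    return length == len(perm)
```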
Consider the classical coupon-collector's problem in which items of m distinct types arrive in sequence. An arriving item is installed in system i ≥ 1 if i is the smallest index such that system i does not contain an item of the arrival's type. We study the expected number of items in system j at the moment when system 1 first contains an item of each type.
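The multi-system coupon collector can be simulated directly. The sketch below assumes that the m types are equally likely, as in the classical setting. It returns the number of items in each system at the moment system 1 is complete. Averaging `sizes[j - 1]` over runs, with missing systems counted as 0, estimates the expected number of items in system j.

```python
import random


def items_per_system(m, rng):
    """Arrivals have uniform types in range(m); each goes to the first system lacking
    its type. Stop when system 1 (index 0) holds all m types; return system sizes."""
    systems = []                       # systems[i] = set of types held by system i + 1
    while not systems or len(systems[0]) < m:
        t = rng.randrange(m)
        for s in systems:
            if t not in s:
                s.add(t)
                break
        else:                          # every existing system already holds type t
            systems.append({t})
    return [len(s) for s in systems]
```

The type set of system i + 1 is always contained in that of system i, so the returned sizes are non-increasing.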
This paper studies path lengths in random binary search trees under the random permutation model. It is known that the total path length, when properly normalized, converges almost surely to a nondegenerate random variable Z. The limit distribution is commonly referred to as the ‘quicksort distribution’. For the class $\mathcal{A}_m$ of finite binary trees with at most m nodes, we partition the external nodes of the binary search tree according to the largest tree that each external node belongs to. Thus, the external path length is divided into parts, each part associated with a tree in $\mathcal{A}_m$. We show that the vector of these path lengths, after normalization, converges almost surely to a constant vector times Z.
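The total path length studied here can be computed by inserting the keys of a permutation into a binary search tree and summing the node depths. Under the random permutation model its mean is $2(n+1)H_n - 4n$, a classical result that the exhaustive average below reproduces exactly for small n. The normalization whose limit is the quicksort distribution is $(L_n - E L_n)/n$. The sketch tracks internal path length only and does not partition nodes by the trees in $\mathcal{A}_m$. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations


def bst_internal_path_length(keys):
    """Sum of node depths (root at depth 0) of the BST built by inserting distinct keys in order."""
    left, right = {}, {}
    root, total = None, 0
    for k in keys:
        if root is None:
            root = k
            continue
        node, depth = root, 1
        while True:
            child = left if k < node else right
            if node in child:
                node, depth = child[node], depth + 1
            else:
                child[node] = k          # attach k as a new leaf at this depth
                total += depth
                break
    return total


def expected_path_length(n):
    """Classical mean 2(n+1)H_n - 4n under the random permutation model, as a Fraction."""
    h = sum(Fraction(1, k) for k in range(1, n + 1))
    return 2 * (n + 1) * h - 4 * n


def average_path_length(n):
    """Exact average over all n! insertion orders (small n only)."""
    perms = list(permutations(range(n)))
    return Fraction(sum(bst_internal_path_length(p) for p in perms), len(perms))
```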
In this paper, we investigate sooner and later waiting time problems for patterns $S_0$ and $S_1$ in multistate Markov dependent trials. The probability functions and the probability generating functions of the sooner and later waiting time random variables are studied. We also give the probability generating functions of three further quantities:

- the distance between successive occurrences of $S_0$;
- the distance between successive occurrences of $S_0$ and $S_1$;
- the waiting time until the $r$-th occurrence of $S_0$.
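The sooner waiting time can be estimated by running the chain until either pattern appears as a run of consecutive states. The sketch below takes a transition matrix and an initial law as plain lists, which is an assumed interface. Ties are broken in favour of $S_0$. An i.i.d. fair coin is the special case with identical rows. Encoding heads as 0 and tails as 1, Penney's classic example gives P(TH before HH) = 3/4.

```python
import random


def sooner_waiting_time(P, init, s0, s1, rng, max_steps=10**6):
    """Run a Markov chain (transition matrix P as a list of rows, initial law `init`)
    until pattern s0 or s1 (lists of states) first ends at the current trial.
    Returns (number of trials, which) with which = 0 or 1; ties go to s0."""
    states = range(len(P))
    x = rng.choices(states, weights=init)[0]
    history = [x]
    for t in range(1, max_steps + 1):
        for which, pat in ((0, s0), (1, s1)):
            if len(history) >= len(pat) and history[-len(pat):] == pat:
                return t, which
        x = rng.choices(states, weights=P[x])[0]   # next trial depends on the current state
        history.append(x)
    raise RuntimeError("no pattern within max_steps")
```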
In this paper, we introduce a compound random mapping model which can be viewed as a generalization of the basic random mapping model considered by Ross and by Jaworski. We investigate a particular example, the Poisson compound random mapping, and compare results for this model with results known for the well-studied uniform random mapping model. We show that, although the structure of the components of the random digraph associated with a Poisson compound mapping differs from the structure of the components of the random digraph associated with the uniform model, the limiting distribution of the normalized order statistics for the sizes of the components is the same as in the uniform case, i.e. the limiting distribution is the Poisson-Dirichlet(½) distribution on the simplex $\{\{x_i\} : \sum x_i \le 1,\ x_i \ge x_{i+1} \ge 0 \text{ for every } i \ge 1\}$.
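In the uniform random mapping, each point is mapped to a uniformly chosen point. The component sizes are those of the weakly connected components of the functional digraph i → f(i), which union-find extracts in near-linear time. Dividing the sorted sizes by n gives the normalized order statistics whose limit is Poisson–Dirichlet(1/2). The Poisson compound model is not implemented here; only the uniform baseline is sketched, with hypothetical function names.

```python
import random


def mapping_component_sizes(f):
    """Component sizes (descending) of the functional digraph i -> f[i]."""
    n = len(f)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    for i in range(n):
        ri, rj = find(i), find(f[i])
        if ri != rj:
            parent[ri] = rj
    sizes = {}
    for i in range(n):
        r = find(i)
        sizes[r] = sizes.get(r, 0) + 1
    return sorted(sizes.values(), reverse=True)


def uniform_mapping(n, rng):
    """A uniform random mapping of {0, ..., n-1} into itself."""
    return [rng.randrange(n) for _ in range(n)]
```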
Uniform sequential tree-building aggregation of n particles is analyzed together with the effect of the avalanche that takes place when a subtree rooted at a uniformly chosen vertex is removed. For large n, the expected subtree size is found to be $\simeq \log n$ both for the tree of size n and for the tree that remains after an avalanche. Repeated breakage-restoration cycles are seen to give independent avalanches, which attain size $k$ ($1 \le k \le n-1$) with probability $(k(k+1))^{-1}$, and restored trees that are recursive.
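Assume the aggregate grows as a random recursive tree, with each new particle attaching to a uniformly chosen existing one. Then the mean size of the subtree below a uniform vertex equals $1 + (\text{sum of depths})/n$, whose expectation is the harmonic number $H_n \simeq \log n$. The sketch below computes all subtree sizes in one backward pass. This works because every parent has a smaller label than its children. The function names are hypothetical.

```python
import random


def rrt_parents(n, rng):
    """Random recursive tree on 0..n-1: node i > 0 picks a uniform parent in 0..i-1."""
    return [-1] + [rng.randrange(i) for i in range(1, n)]


def subtree_sizes(parent):
    """Subtree size of every node; children carry larger labels, so one backward pass suffices."""
    size = [1] * len(parent)
    for v in range(len(parent) - 1, 0, -1):
        size[parent[v]] += size[v]
    return size


def mean_subtree_size(parent):
    """Expected size of the subtree (avalanche) below a uniformly chosen vertex of this tree."""
    return sum(subtree_sizes(parent)) / len(parent)
```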
The coupon subset collection problem is a generalization of the classical coupon collecting problem, in that rather than collecting individual coupons we obtain, at each time point, a random subset of coupons. The problem of interest is to determine the expected number of subsets needed until each coupon is contained in at least one of these subsets. We provide bounds on this number, give efficient simulation procedures for estimating it, and then apply our results to a reliability problem.
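A plain Monte Carlo estimate of the expected number of subsets is straightforward. The paper's more efficient simulation procedures are not reproduced here. The sketch assumes the random subset is drawn from an explicit finite list with given weights. The union of the subsets must cover all m coupons, or the loop never ends. With singleton subsets of equal weight it reduces to the classical problem, whose answer is $mH_m$ (for example, 5.5 for m = 3).

```python
import random


def subsets_needed(m, subsets, probs, rng):
    """Draw subsets i.i.d. from `subsets` (weights `probs`) until coupons 0..m-1 are all covered."""
    covered, draws = set(), 0
    while len(covered) < m:
        covered |= rng.choices(subsets, weights=probs)[0]
        draws += 1
    return draws


def estimate_expected_draws(m, subsets, probs, reps, seed=0):
    """Monte Carlo estimate of the expected number of subsets needed."""
    rng = random.Random(seed)
    return sum(subsets_needed(m, subsets, probs, rng) for _ in range(reps)) / reps
```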
The problem of finding bounds for $P(A_1 \cup \cdots \cup A_n)$ based on $P(A_{k_1} \cap \cdots \cap A_{k_i})$ ($1 \le k_1 < \cdots < k_i \le n$, $i = 1, \dots, d$) goes back to Boole (1854, 1868) and Bonferroni (1937). In this paper upper bounds are presented using methods in graph theory. The main theorem is a common generalization of earlier results of Hunter and Worsley and of recent results of Prékopa and the author. Algorithms are given to compute the bounds. Examples bounding values of multivariate normal distribution functions are presented.
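One of the earlier results being generalized is Hunter's bound. For any spanning tree T on the events, $P(\bigcup_i A_i) \le \sum_i P(A_i) - \sum_{(i,j) \in T} P(A_i \cap A_j)$, and the sharpest such bound uses a maximum-weight spanning tree. The sketch below finds that tree with Kruskal's algorithm. It implements Hunter's bound, which uses only pairwise intersections (d = 2), not the new bound of this paper. The function name and input format are assumptions.

```python
from itertools import combinations


def hunter_upper_bound(p1, p2):
    """Hunter's bound on P(A_1 ∪ ... ∪ A_n). p1[i] = P(A_i); p2[(i, j)] = P(A_i ∩ A_j) for i < j."""
    n = len(p1)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree_weight = 0.0
    # Kruskal's algorithm on edges in descending weight order gives a maximum-weight spanning tree
    for (i, j) in sorted(combinations(range(n), 2), key=lambda e: p2[e], reverse=True):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            tree_weight += p2[(i, j)]
    return sum(p1) - tree_weight
```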
Optical mapping is a new technique to generate restriction maps of DNA easily and quickly. DNA restriction maps can be aligned by comparing corresponding restriction fragment lengths. To relate, organize, and analyse these maps it is necessary to rapidly compare maps. The issue of the statistical significance of approximately matching maps then becomes central, as in BLAST with sequence scoring. In this paper, we study the approximation to the distribution of counts of matched regions of specified length when comparing two DNA restriction maps. Distributional results are given to enable us to compute p-values and hence to determine whether or not the two restriction maps are related. The key tool used is the Chen-Stein method of Poisson approximation. Certain open problems are described.
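Suppose the Chen–Stein method justifies a Poisson approximation for the number N of matched regions. Then an approximate p-value for an observed count k is the Poisson upper tail $P(N \ge k)$. In the sketch below the mean λ is an input, since computing it depends on the map model. The function name is hypothetical. The subtraction `1 - cdf` loses relative accuracy for very small p-values.

```python
import math


def poisson_upper_tail(k, lam):
    """P(N >= k) for N ~ Poisson(lam): approximate p-value of observing k matched regions."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)        # P(N = 0)
    cdf = term
    for i in range(1, k):        # accumulate P(N = 1), ..., P(N = k - 1)
        term *= lam / i
        cdf += term
    return max(0.0, 1.0 - cdf)
```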