A sequence of independent Bernoulli random variables with success probabilities a / (a + b + k − 1), k = 1, 2, 3, …, is embedded in a marked Poisson process with intensity 1. Using this, conditional Poisson limits follow for counts of failure strings.
Recently, Makri, Philippou and Psillakis (2007b) studied the exact distribution of success run statistics defined on an urn model. They derived the exact distributions of various success run statistics for a sequence of binary trials generated by the Pólya-Eggenberger sampling scheme. In our study we derive the joint distributions of run statistics defined on the multicolor urn model using a simple unified combinatorial approach and extend some of the results of Makri, Philippou and Psillakis (2007b). As a consequence of our results, we obtain the joint distributions of success and failure runs defined on the two-color urn model. The results enable us to compute the characteristics of particular consecutive-type systems and start-up demonstration tests.
We study a process where balls are repeatedly thrown into n boxes independently according to some probability distribution p. We start with n balls, and at each step, all balls landing in the same box are fused into a single ball; the process terminates when there is only one ball left (coalescence). Let $c := \sum_j p_j^2$, the collision probability of two fixed balls. We show that the expected coalescence time is asymptotically $2c^{-1}$, under two constraints on p that exclude a thin set of distributions p. One of the constraints is $c = o(\ln^{-2} n)$. This $\ln^{-2} n$ is shown to be a threshold value: for $c = \omega(\ln^{-2} n)$, there exists p with c(p) = c such that the expected coalescence time far exceeds $c^{-1}$. Connections to coalescent processes in population biology and theoretical computer science are discussed.
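The process described in this abstract can be simulated directly. The following is a minimal sketch, not the authors' method: the function names `coalescence_time` and `collision_probability` are hypothetical, and the distribution p is passed as a plain list of box probabilities.

```python
import random


def collision_probability(p):
    """Probability that two fixed balls land in the same box: sum_j p_j^2."""
    return sum(q * q for q in p)


def coalescence_time(p, rng=None):
    """Number of throwing rounds until a single ball remains.

    Starts with n = len(p) balls; in each round every ball is thrown
    independently into a box chosen according to p, and all balls sharing
    a box are fused into one.
    """
    rng = rng or random.Random()
    n = len(p)
    balls = n
    rounds = 0
    while balls > 1:
        landed = rng.choices(range(n), weights=p, k=balls)
        balls = len(set(landed))  # one fused ball per occupied box
        rounds += 1
    return rounds
```

Averaging `coalescence_time(p)` over many runs and comparing it with `2 / collision_probability(p)` gives an empirical feel for the asymptotic statement, bearing in mind that the abstract's result is asymptotic and holds only under its two constraints on p.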
We study the number of collisions, $X_n$, of an exchangeable coalescent with multiple collisions (Λ-coalescent) which starts with n particles and is driven by rates determined by a finite characteristic measure $\eta(dx) = x^{-2}\Lambda(dx)$. Via a coupling technique, we derive limiting laws of $X_n$, using previous results on regenerative compositions derived from stick-breaking partitions of the unit interval. The possible limiting laws of $X_n$ include normal, stable with index 1 ≤ α < 2, and Mittag-Leffler distributions. The results apply, in particular, to the case when η is a beta(a − 2, b) distribution with parameters a > 2 and b > 0. The approach taken allows us to derive asymptotics of three other functionals of the coalescent: the absorption time, the length of an external branch chosen at random from the n external branches, and the number of collision events that occur before the randomly selected external branch coalesces with one of its neighbours.
We study a model arising in chemistry where n elements numbered 1, 2, …, n are randomly permuted and, if i is immediately to the left of i + 1, then they become stuck together to form a cluster. The resulting clusters are then numbered and considered as elements, and this process is repeated until only a single cluster remains. In this article we study properties of the distribution of the number of permutations required.
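The clustering process can be sketched as a short simulation. This is an illustrative sketch under the assumption that each round uses a fresh uniformly random permutation of the current clusters; the function name `permutations_until_single_cluster` is hypothetical. Since only the number of remaining clusters matters for the next round, the sketch tracks that count rather than the clusters themselves.

```python
import random


def permutations_until_single_cluster(n, rng=None):
    """Count the random permutations needed until one cluster remains."""
    rng = rng or random.Random()
    count = 0
    while n > 1:
        perm = list(range(1, n + 1))
        rng.shuffle(perm)
        count += 1
        # Every position where i sits immediately to the left of i + 1
        # glues two elements together, reducing the cluster count by one.
        joins = sum(1 for left, right in zip(perm, perm[1:]) if right == left + 1)
        n -= joins
    return count
```

Repeated calls, for example `[permutations_until_single_cluster(10) for _ in range(1000)]`, give an empirical sample from the distribution the abstract studies.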
We investigate the average similarity of random strings as captured by the average number of ‘cousins’ in the underlying tree structures. Analytical techniques including poissonization and the Mellin transform are used for accurate calculation of the mean. The string alphabets we consider are m-ary, and the corresponding trees are m-ary trees. Certain analytic issues arise in the m-ary case that do not have an analog in the binary case.
In a sequence of independent Bernoulli trials the probability of success in the kth trial is $p_k = a / (a + b + k - 1)$. An explicit formula for the binomial moments of the number of two consecutive successes in the first n trials is obtained and some consequences of it are derived.
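The first moment of this count follows from independence and linearity of expectation alone, which makes for a small worked example. The sketch below is not the paper's formula for the general binomial moments; it assumes that overlapping pairs are counted (every k < n with successes in trials k and k + 1), and the names `success_prob` and `mean_consecutive_pairs` are hypothetical.

```python
from fractions import Fraction


def success_prob(k, a, b):
    """p_k = a / (a + b + k - 1), kept exact (a and b integers or Fractions)."""
    return Fraction(a) / (a + b + k - 1)


def mean_consecutive_pairs(n, a, b):
    """Expected number of k in 1..n-1 with successes in trials k and k + 1.

    By independence each pair contributes p_k * p_{k+1}.
    """
    return sum(success_prob(k, a, b) * success_prob(k + 1, a, b) for k in range(1, n))
```

For a = 1 and b = 0 the terms are 1/(k(k + 1)), so the sum telescopes to 1 − 1/n; `mean_consecutive_pairs(10, 1, 0)` therefore equals `Fraction(9, 10)`.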
We study collision probabilities concerning the simple balls-and-bins problem developed by Wendl (2003). In this article we give the factorial moment of the number of collisions. Moreover, we obtain a Poisson approximation for the number of collisions using the Chen-Stein method.
The distributions of the run occurrences for a sequence of independent and identically distributed (i.i.d.) experiments are usually obtained by combinatorial methods (see Balakrishnan and Koutras (2002, Chapter 5)) and the resulting formulae are often very tedious, while the distributions for non-i.i.d. experiments are generally intractable. It is therefore of practical interest to find a suitable approximate model with reasonable approximation accuracy. In this paper we demonstrate that the negative binomial distribution is the most suitable approximate model for the number of k-runs: it outperforms the Poisson approximation, the general compound Poisson approximation as observed in Eichelsbacher and Roos (1999), and the translated Poisson approximation in Röllin (2005). In particular, its accuracy of approximation in terms of the total variation distance improves when the number of experiments increases, in the same way as the normal approximation improves in the Berry-Esseen theorem.
Detection of repeated sequences within complete genomes is a powerful tool to help understand genome dynamics and species evolutionary history. To distinguish significant repeats from those that can be obtained just by chance, statistical methods have to be developed. In this paper we show that the distribution of the number of long repeats in long sequences generated by stationary Markov chains can be approximated by a Poisson distribution with explicit parameter. Using the Chen-Stein method we provide a bound for the approximation error; this bound converges to 0 as soon as the length n of the sequence tends to ∞ and the length t of the repeats satisfies $n^2 \rho^t = O(1)$ for some 0 < ρ < 1. Using this Poisson approximation, p-values can then be easily calculated to determine if a given genome is significantly enriched in repeats of length t.
Start with a necklace consisting of one white bead and one black bead, and add new beads one at a time by inserting each new bead between a randomly chosen adjacent pair of old beads, with the proviso that the new bead will be white if and only if both beads of the adjacent pair are black. Let $W_n$ denote the number of white beads when the total number of beads is n. We show that $E W_n = n/3$ and, with $c^2 = 2/45$, that $(W_n - n/3) / (c\sqrt{n})$ is asymptotically standard normal. We find that, for all r ≥ 1 and n > 2r, the rth cumulant of the distribution of $W_n$ is of the form $n h_r$. We find the expected numbers of gaps of given length between white beads, and examine the asymptotics of the longest gaps.
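For small necklaces the stated mean can be checked exactly by enumerating every insertion position. The following is a sketch under the assumption that each of the m adjacent pairs on a necklace of m beads is chosen with probability 1/m (so the initial two-bead necklace has two gaps); the function name `expected_white` is hypothetical.

```python
from fractions import Fraction


def expected_white(n_target, beads=("W", "B")):
    """Exact expected number of white beads once the necklace has n_target beads.

    The necklace is stored as a tuple read circularly; position i denotes
    the gap between beads[i] and beads[(i + 1) % m].
    """
    m = len(beads)
    if m == n_target:
        return Fraction(beads.count("W"))
    total = Fraction(0)
    for i in range(m):
        left, right = beads[i], beads[(i + 1) % m]
        # The new bead is white only when both neighbours are black.
        new = "W" if left == "B" and right == "B" else "B"
        child = beads[: i + 1] + (new,) + beads[i + 1:]
        total += expected_white(n_target, child)
    return total / m
```

The enumeration grows factorially, so it is only practical for small n, but within that range `expected_white(n)` agrees with n/3 for n ≥ 3. The reason is visible in the code: white beads are never adjacent, so a necklace with w whites has n − 2w black-black gaps, which gives the recursion E W_{n+1} = 1 + (1 − 2/n) E W_n.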
In this paper we investigate the ‘local’ properties of a random mapping model, $T_n^{\hat D}$, which maps the set {1, 2, …, n} into itself. The random mapping $T_n^{\hat D}$, which was introduced in a companion paper (Hansen and Jaworski (2008)), is constructed using a collection of exchangeable random variables $\hat D_1, \ldots, \hat D_n$ which satisfy $\sum_{i=1}^{n} \hat D_i = n$. In the random digraph, $G_n^{\hat D}$, which represents the mapping $T_n^{\hat D}$, the in-degree sequence for the vertices is given by the variables $\hat D_1, \hat D_2, \ldots, \hat D_n$, and, in some sense, $G_n^{\hat D}$ can be viewed as an analogue of the general independent degree models from random graph theory. By local properties we mean the distributions of random mapping characteristics related to a given vertex v of $G_n^{\hat D}$ (for example, the numbers of predecessors and successors of v in $G_n^{\hat D}$). We show that the distribution of several variables associated with the local structure of $G_n^{\hat D}$ can be expressed in terms of expectations of simple functions of $\hat D_1, \hat D_2, \ldots, \hat D_n$. We also consider two special examples of $T_n^{\hat D}$ which correspond to random mappings with preferential and anti-preferential attachment, and determine, for these examples, exact and asymptotic distributions for the local structure variables considered in this paper. These distributions are also of independent interest.
Expressions for the joint distribution of the longest and second longest excursions as well as the marginal distributions of the three longest excursions in the Brownian bridge are obtained. The method, which primarily makes use of the weak convergence of the random walk to the Brownian motion, in principle makes it possible to obtain any desired joint or marginal distribution. Numerical illustrations of the results are also given.
Statistics denoting the numbers of success runs of length exactly equal and at least equal to a fixed length, as well as the sum of the lengths of success runs of length greater than or equal to a specific length, are considered. They are defined on both linearly and circularly ordered binary sequences, derived according to the Pólya-Eggenberger urn model. A waiting time associated with the sum of lengths statistic in linear sequences is also examined. Exact marginal and joint probability distribution functions are obtained in terms of binomial coefficients by a simple unified combinatorial approach. Mean values are also derived in closed form. Computationally tractable formulae for conditional distributions, given the number of successes in the sequence, useful in nonparametric tests of randomness, are provided. The distribution of the length of the longest success run and the reliability of certain consecutive systems are deduced using specific probabilities of the studied statistics. Numerical examples are given to illustrate the theoretical results.
Consider a random graph, having a prespecified degree distribution F, but other than that being uniformly distributed, describing the social structure (friendship) in a large community. Suppose that one individual in the community is externally infected by an infectious disease and that the disease then spreads in such a way that infected individuals infect their not yet infected friends independently with probability p. For this situation, we determine the values of R0, the basic reproduction number, and τ0, the asymptotic final size in the case of a major outbreak. Furthermore, we examine some different local vaccination strategies, where individuals are chosen randomly and vaccinated, or friends of the selected individuals are vaccinated, prior to the introduction of the disease. For the studied vaccination strategies, we determine Rv, the reproduction number, and τv, the asymptotic final proportion infected in the case of a major outbreak, after vaccinating a fraction v.
In a sequence of independent Bernoulli trials the probability of success in the kth trial is $p_k$, k = 1, 2, …. The number of strings with a given number of failures between two subsequent successes is studied. Explicit expressions for distributions and moments are obtained for the case in which $p_k = a/(a + b + k - 1)$, a > 0, b ≥ 0. Also, the limit behaviour of the longest failure string in the first n trials is considered. For b = 0, the strings correspond to cycles in random permutations.
Let n points be randomly and independently placed in $\mathbb{R}^d$ according to a common probability law. It is known that the expected volume for the convex hull of these points, in the cases where n − d ≥ 2 and even, is related linearly to expected volumes of the convex hulls for j points, j < n. We show that similar identities for these volumes hold almost surely - and in contexts where independence and communality of law do not apply. New geometric and topological identities developed here provide a foundation for this result.
We derive a new compound Poisson distribution with explicit parameters to approximate the number of overlapping occurrences of any set of words in a Markovian sequence. Using the Chen-Stein method, we provide a bound for the approximation error. This error converges to 0 under the rare event condition, even for overlapping families, which improves previous results. As a consequence, we also propose Poisson approximations for the declumped count and the number of competing renewals.
We propose a simple and efficient scheme for ranking all teams in a tournament where matches can be played simultaneously. We show that the distribution of the number of rounds of the proposed scheme can be derived using lattice path counting techniques used in ballot problems. We also discuss our method from the viewpoint of parallel sorting algorithms.
The trie is a sort of digital tree. Ideally, to achieve balance, the trie should grow from an unbiased source generating keys of bits with equal likelihoods. In practice, the lack of bias is not always guaranteed. We investigate the distance between randomly selected pairs of nodes among the keys in a biased trie. This research complements that of Christophi and Mahmoud (2005); however, the results and some of the methodology are strikingly different. Analytical techniques are still useful for moments calculation. Both mean and variance are of polynomial order. It is demonstrated that the standardized distance approaches a normal limiting random variable. This is proved by the contraction method, whereby the limit distribution is shown to approach the fixed-point solution of a distributional equation in the Wasserstein metric space.