Computer-based tests with randomly generated questions allow a large number of different tests to be generated. Given a fixed number of alternatives for each question, the number of tests that need to be generated before all possible questions have appeared is surprisingly low.
AMS subject classification (MSC2010) 60G70, 60K99
Introduction
The use of computer-based tests in which questions are randomly generated in some way provides a means whereby a large number of different tests can be generated; many universities currently use such tests as part of the student assessment process. In this paper we present findings showing that, although the number of different possible tests is high and grows very rapidly as the number of alternatives for each question increases, the average number of tests that need to be generated before all possible questions have appeared at least once is surprisingly low. We presented preliminary findings along these lines in Cornish et al. (2006).
A computer-based test consists of q questions, each selected at random, independently of the others, from a separate bank of a alternatives. Let N_q be the number of tests one needs to generate in order to see all the aq questions in the q question banks at least once. We are interested in how, for fixed a, the random variable N_q grows with the number of questions q in the test.
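Assume that each test draws one alternative uniformly from each bank. The text says only "at random", so uniformity is an assumption here. Under that assumption N_q is the maximum of q independent coupon-collector times T, each with a coupons. Hence P(N_q ≤ n) = P(T ≤ n)^q, and P(T ≤ n) follows from inclusion–exclusion over the alternatives still missing. The minimal Python sketch below evaluates E[N_q] = Σ_{n≥0} P(N_q > n) numerically. The function names are illustrative and not taken from the paper.

```python
from math import comb


def prob_bank_complete(a: int, n: int) -> float:
    """P(T <= n): every one of the a alternatives in a single bank has
    appeared within n tests (inclusion-exclusion over missing alternatives)."""
    return sum((-1) ** j * comb(a, j) * (1 - j / a) ** n for j in range(a + 1))


def expected_tests(a: int, q: int, tol: float = 1e-12) -> float:
    """E[N_q] = sum over n >= 0 of P(N_q > n), using P(N_q <= n) = P(T <= n) ** q,
    which holds because the q banks are sampled independently."""
    total, n = 0.0, 0
    while True:
        tail = 1.0 - prob_bank_complete(a, n) ** q
        # Tails are non-increasing in n; stop once they are negligible.
        if n >= a and tail < tol:
            return total
        total += tail
        n += 1


# Hypothetical usage: a = 4 alternatives per question, growing test length q.
for q in (1, 10, 100, 1000):
    print(q, round(expected_tests(4, q), 2))
```

The number of distinct tests a^q grows geometrically in q. By contrast, P(T > n) decays geometrically in n, so E[N_q] should grow only roughly logarithmically in q. The loop prints the values so this can be checked directly.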
Our program of interpreting a nonlinear Markov process as the law of large numbers (LLN) limit of an approximating Markov interacting-particle system was fulfilled in Chapter 9 for a wide class of interactions. In this chapter we address the natural next step in the analysis of approximating systems of interacting particles. Namely, we deal with processes involving fluctuations around the dynamic LLN limit. The objective is to show that in many cases the limiting behavior of a fluctuation process is described by an infinite-dimensional Gaussian process of Ornstein–Uhlenbeck type. This statement can be called a dynamic central limit theorem (CLT). As in Chapter 9 we start with a formal calculation of the generator for the fluctuation process in order to be able to compare it with the limiting second-order Ornstein–Uhlenbeck generator. Then we deduce a weak form of the CLT, though with precise convergence rates. Finally we sketch the proof of the full result (i.e. the convergence of fluctuation processes in a certain Skorohod space of càdlàg paths with values in weighted Sobolev spaces) for a basic coagulation model, referring for details to the original paper.
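The chapter's setting is infinite-dimensional. A one-dimensional toy, which is an assumption and not a model from the text, still shows the LLN-plus-CLT picture. Take N unit clusters that merge pairwise with constant kernel K, so the cluster count n jumps n → n - 1 at rate K n(n - 1)/(2N). In the LLN limit, h = n/N follows dx/dt = -(K/2)x² with x(0) = 1. Linearizing the jump rates around x(t) suggests a Gaussian fluctuation √N(h - x(t)) whose variance V solves dV/dt = -2K x V + K x²/2 with V(0) = 0. That gives V(t) = (u³ - 1)/(3u⁴), where u = 1 + Kt/2. The sketch below compares this prediction with a Monte Carlo estimate. All parameter values are hypothetical.

```python
import random
import statistics


def clusters_at(N: int, t: float, K: float, rng: random.Random) -> int:
    """Toy coagulation count: start with N clusters; n -> n - 1 at rate
    K * n * (n - 1) / (2 * N). Returns n(t) from an exact (Gillespie) simulation."""
    n, clock = N, 0.0
    while n > 1:
        clock += rng.expovariate(K * n * (n - 1) / (2 * N))
        if clock > t:
            break
        n -= 1
    return n


def lln_limit(t: float, K: float) -> float:
    """Solution of dx/dt = -(K/2) x^2, x(0) = 1."""
    return 1.0 / (1.0 + K * t / 2)


def ou_variance(t: float, K: float) -> float:
    """Solution of dV/dt = -2 K x V + K x^2 / 2, V(0) = 0, with x = lln_limit."""
    u = 1.0 + K * t / 2
    return (u ** 3 - 1) / (3 * u ** 4)


# Hypothetical parameters: N clusters, time t, kernel K, number of replicas.
rng = random.Random(2024)
N, t, K, reps = 2000, 1.0, 1.0, 400
x = lln_limit(t, K)
ys = [(clusters_at(N, t, K, rng) / N - x) * N ** 0.5 for _ in range(reps)]
print("sample mean:", statistics.mean(ys))
print("sample variance:", statistics.variance(ys), "predicted:", ou_variance(t, K))
```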
Generators for fluctuation processes
In this section we calculate generators for the fluctuation processes of approximating Markov interacting-particle systems around their LLN limits, which are given by solutions to kinetic equations. Here we undertake only general, formal calculations, without paying much attention to the precise conditions under which the various manipulations actually make sense. We postpone to later sections the justification of these calculations for concrete models, in various strong or weak topologies and under differing assumptions. The calculations are lengthy but straightforward.
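The shape of such a calculation shows up clearly in a one-dimensional toy, assumed here purely for illustration. A cluster count n jumps n → n - 1 at rate K n(n - 1)/(2N), h = n/N, and the LLN limit solves ẋ = -(K/2)x². For Y = √N(h - x(t)), expand f in a Taylor series. The O(√N) terms then cancel and a one-dimensional, time-inhomogeneous Ornstein–Uhlenbeck generator remains. This holds for fixed Y and f with bounded third derivative:

```latex
\begin{aligned}
\mathcal{L}^N_t f(Y)
 &= \frac{NK}{2}\,h\Bigl(h-\frac1N\Bigr)\Bigl[f\Bigl(Y-\frac{1}{\sqrt N}\Bigr)-f(Y)\Bigr]
    + \sqrt N\,\frac{K}{2}\,x(t)^2 f'(Y),
 \qquad h = x(t)+\frac{Y}{\sqrt N},\\
 &= \frac{NK}{2}\Bigl(x^2+\frac{2xY}{\sqrt N}+O(N^{-1})\Bigr)
    \Bigl(-\frac{f'(Y)}{\sqrt N}+\frac{f''(Y)}{2N}+O(N^{-3/2})\Bigr)
    + \sqrt N\,\frac{K}{2}\,x^2 f'(Y)\\
 &= -K\,x(t)\,Y f'(Y) + \frac{K\,x(t)^2}{4}\,f''(Y) + O(N^{-1/2}).
\end{aligned}
```

The limit has drift -K x(t) Y and diffusion coefficient K x(t)²/2.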
We prove a long-standing conjecture which characterizes the Ewens–Pitman two-parameter family of exchangeable random partitions, plus a short list of limit and exceptional cases, by the following property: for each n = 2, 3, …, if one of n individuals is chosen uniformly at random, independently of the random partition π_n of these individuals into various types, and all individuals of the same type as the chosen individual are deleted, then for each r > 0, given that r individuals remain, these individuals are partitioned according to π′_r for some sequence of random partitions (π′_r) which does not depend on n. An analogous result characterizes the associated Poisson–Dirichlet family of random discrete distributions by an independence property related to random deletion of a frequency chosen by a size-biased pick. We also survey the regenerative properties of members of the two-parameter family, and settle a question regarding the explicit arrangement of intervals with lengths given by the terms of the Poisson–Dirichlet random sequence into the interval partition induced by the range of a homogeneous neutral-to-the-right process.
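A common way to sample from the Ewens–Pitman (α, θ) family is Pitman's two-parameter "Chinese restaurant" scheme. The (m+1)-th individual joins an existing type of size n_j with probability (n_j - α)/(m + θ). It starts a new type with probability (θ + kα)/(m + θ), where k is the current number of types. The sketch below samples π_n this way and then applies the random-deletion operation described above. It assumes 0 ≤ α < 1 and θ > -α, and its function names are hypothetical.

```python
import random


def sample_crp(n: int, alpha: float, theta: float, rng: random.Random) -> list[list[int]]:
    """Two-parameter Chinese restaurant process on individuals 0, ..., n - 1.
    Assumes n >= 1, 0 <= alpha < 1 and theta > -alpha."""
    blocks: list[list[int]] = [[0]]  # the first individual always starts a type
    for m in range(1, n):  # m individuals are already placed
        # Unnormalised weights: existing type j gets n_j - alpha, a new type
        # gets theta + alpha * k; together they sum to m + theta.
        weights = [len(b) - alpha for b in blocks] + [theta + alpha * len(blocks)]
        j = rng.choices(range(len(weights)), weights=weights)[0]
        if j == len(blocks):
            blocks.append([m])
        else:
            blocks[j].append(m)
    return blocks


def delete_type_of_random_individual(blocks: list[list[int]], rng: random.Random) -> list[int]:
    """Pick one individual uniformly at random, delete every individual of the
    same type, and return the block sizes of what remains, largest first."""
    n = sum(len(b) for b in blocks)
    chosen = rng.randrange(n)
    return sorted((len(b) for b in blocks if chosen not in b), reverse=True)


rng = random.Random(7)
blocks = sample_crp(20, alpha=0.5, theta=1.0, rng=rng)
remaining = delete_type_of_random_individual(blocks, rng)
print(sorted(map(len, blocks), reverse=True), "->", remaining)
```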
Kingman introduced the concept of a partition structure, that is, a family of probability distributions for random partitions π_n of a positive integer n, with a sampling consistency property as n varies.
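For the two-parameter family, the law of π_n has an explicit exchangeable partition probability function (EPPF). The probability of one particular set partition of {1, …, n} with block sizes n_1, …, n_k is [∏_{i=1}^{k-1}(θ + iα)] ∏_j (1 - α)_{n_j - 1} / (θ + 1)_{n - 1}, where (x)_m denotes the rising factorial. In EPPF terms, sampling consistency (removing individual n + 1 from π_{n+1} yields π_n) reads p(n_1, …, n_k) = Σ_j p(…, n_j + 1, …) + p(n_1, …, n_k, 1). The sketch below checks numerically both that the probabilities sum to 1 and that this identity holds. The helper names are illustrative.

```python
from math import prod


def rising(x: float, m: int) -> float:
    """Rising factorial x (x + 1) ... (x + m - 1); equal to 1 when m = 0."""
    return prod(x + i for i in range(m))


def eppf(sizes: list[int], alpha: float, theta: float) -> float:
    """Probability that pi_n equals one given set partition with these block
    sizes, for the Ewens-Pitman (alpha, theta) family."""
    n, k = sum(sizes), len(sizes)
    top = prod(theta + i * alpha for i in range(1, k))
    top *= prod(rising(1 - alpha, s - 1) for s in sizes)
    return top / rising(theta + 1, n - 1)


def set_partitions(n: int):
    """Yield every set partition of {0, ..., n - 1} as a list of blocks."""
    if n == 0:
        yield []
        return
    for part in set_partitions(n - 1):
        for i in range(len(part)):  # put element n - 1 into an existing block
            yield part[:i] + [part[i] + [n - 1]] + part[i + 1:]
        yield part + [[n - 1]]  # or into a new block of its own


def consistency_gap(sizes: list[int], alpha: float, theta: float) -> float:
    """p_n(sizes) minus the total p_{n+1}-probability of all ways to add one more
    individual; zero for a sampling-consistent family."""
    grown = sum(eppf(sizes[:j] + [sizes[j] + 1] + sizes[j + 1:], alpha, theta)
                for j in range(len(sizes)))
    grown += eppf(sizes + [1], alpha, theta)
    return eppf(sizes, alpha, theta) - grown


total = sum(eppf([len(b) for b in p], 0.5, 1.0) for p in set_partitions(4))
print(total, consistency_gap([3, 1, 1], 0.3, 2.0))
```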
The volume of a Wiener sausage constructed from a diffusion process with periodic, mean-zero, divergence-free velocity field, in dimension 3 or more, is shown to have a non-random and positive asymptotic rate of growth. This is used to establish the existence of a homogenized limit for such a diffusion when subject to Dirichlet conditions on the boundaries of a sparse and independent array of obstacles. There is a constant effective long-time loss rate at the obstacles. The dependence of this rate on the form and intensity of the obstacles and on the velocity field is investigated. A Monte Carlo algorithm for the computation of the volume growth rate of the sausage is introduced and some numerical results are presented for the Taylor–Green velocity field.
We consider the problem of the existence and characterization of a homogenized limit for advection-diffusion in a perforated domain. We were first led to this problem as a model for the transport of water vapour in the atmosphere, subject to molecular diffusion and turbulent advection, where the vapour is also lost by condensation on suspended ice crystals. It is of interest to determine the long-time rate of loss and, in particular, whether this is strongly affected by the advection. In this article we address a simple version of this set-up, in which the advection is periodic in space and constant in time and the ice crystals remain fixed in space.
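The abstract mentions a Monte Carlo algorithm for the volume growth rate of the sausage. That algorithm is not reproduced here. As a rough illustration only, the Python sketch below does three things:

* It assumes one common form of the Taylor–Green field, u = (sin x cos y cos z, -cos x sin y cos z, 0), which is periodic, mean-zero and divergence-free.
* It simulates dX = u(X) dt + √(2κ) dW by Euler–Maruyama.
* It counts the distinct voxels of side h visited by the path, as a crude proxy for the sausage volume.

The diffusivity κ, the voxel side h and the voxel-counting scheme are all assumptions. Multiplying the final count by h³ and dividing by the elapsed time gives a rough growth-rate estimate.

```python
import math
import random


def taylor_green(x: float, y: float, z: float) -> tuple[float, float, float]:
    """Assumed Taylor-Green-type velocity field: 2*pi-periodic, mean-zero, divergence-free."""
    return (math.sin(x) * math.cos(y) * math.cos(z),
            -math.cos(x) * math.sin(y) * math.cos(z),
            0.0)


def divergence(x: float, y: float, z: float, eps: float = 1e-5) -> float:
    """Central finite-difference divergence, used only as a sanity check."""
    du = taylor_green(x + eps, y, z)[0] - taylor_green(x - eps, y, z)[0]
    dv = taylor_green(x, y + eps, z)[1] - taylor_green(x, y - eps, z)[1]
    dw = taylor_green(x, y, z + eps)[2] - taylor_green(x, y, z - eps)[2]
    return (du + dv + dw) / (2 * eps)


def visited_voxels(n_steps: int, dt: float, kappa: float, h: float,
                   rng: random.Random) -> list[int]:
    """Euler-Maruyama path of dX = u(X) dt + sqrt(2 kappa) dW from the origin.
    Returns, after each step, the number of distinct voxels of side h visited
    so far: a crude stand-in for the sausage volume divided by h**3."""
    pos = [0.0, 0.0, 0.0]
    seen = {(0, 0, 0)}
    noise = math.sqrt(2.0 * kappa * dt)
    history = []
    for _ in range(n_steps):
        u = taylor_green(*pos)
        pos = [p + ui * dt + noise * rng.gauss(0.0, 1.0) for p, ui in zip(pos, u)]
        seen.add(tuple(math.floor(p / h) for p in pos))
        history.append(len(seen))
    return history


rng = random.Random(1)
dt, n_steps, h = 1e-3, 20000, 0.2
counts = visited_voxels(n_steps, dt, kappa=0.5, h=h, rng=rng)
print("rough volume growth rate:", counts[-1] * h ** 3 / (n_steps * dt))
```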
In the introduction to this book general kinetic equations were obtained as the law of large numbers (LLN) limit of rather general Markov models of interacting particles. This deduction can be called informal, because the limit was performed (albeit quite rigorously) on the forms of the corresponding equations rather than on their solutions, and only the latter type of limit can make any practical sense. Thus it was noted that, in order to make the theory work properly, one has to complete two tasks: to obtain the well-posedness of the limiting kinetic equations (specifying nonlinear Markov processes) and to prove the convergence of the approximating processes to the solutions of these kinetic equations. The first task was settled in Part II. In this chapter we address the second task by proving the convergence of approximations and also supplying precise estimates for error terms.
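A toy illustration can make this convergence concrete. The model is an assumption, not one from the text: a cluster count n(t) starts at N and jumps n → n - 1 at rate K n(n - 1)/(2N). Its LLN limit is x(t) = 1/(1 + Kt/2). For expectations, the convergence of the approximating process can be examined without sampling noise. The Kolmogorov forward equation is a finite linear ODE, and integrating it with classical RK4 gives E[n(t)]/N exactly up to integration error. In this toy one would expect the gap to the kinetic solution to shrink roughly like 1/N. The sketch prints it for a few hypothetical values of N.

```python
def mean_fraction(N: int, t: float, K: float = 1.0, dt: float = 1e-3) -> float:
    """E[n(t)] / N for the toy pure-death chain n -> n - 1 at rate
    K n (n - 1) / (2N), n(0) = N, via the Kolmogorov forward equation and RK4."""
    rate = [K * n * (n - 1) / (2 * N) for n in range(N + 2)]  # index N + 1 is padding
    p = [0.0] * (N + 2)
    p[N] = 1.0  # all probability mass starts at n = N

    def deriv(q: list[float]) -> list[float]:
        # dp_n/dt = rate_{n+1} p_{n+1} - rate_n p_n (state 1 is absorbing: rate_1 = 0)
        d = [0.0] * (N + 2)
        for n in range(1, N + 1):
            d[n] = rate[n + 1] * q[n + 1] - rate[n] * q[n]
        return d

    for _ in range(round(t / dt)):
        k1 = deriv(p)
        k2 = deriv([a + 0.5 * dt * b for a, b in zip(p, k1)])
        k3 = deriv([a + 0.5 * dt * b for a, b in zip(p, k2)])
        k4 = deriv([a + dt * b for a, b in zip(p, k3)])
        p = [a + dt / 6 * (b1 + 2 * b2 + 2 * b3 + b4)
             for a, b1, b2, b3, b4 in zip(p, k1, k2, k3, k4)]
    return sum(n * p[n] for n in range(N + 1)) / N


x = 1.0 / (1.0 + 1.0 / 2)  # kinetic (LLN) solution x(t) at t = 1 with K = 1
errors = {N: abs(mean_fraction(N, 1.0) - x) for N in (50, 100, 200)}
print(errors)
```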
We can proceed either analytically using semigroup methods or by working directly with the convergence of stochastic processes. Each method has its advantages, and we shall demonstrate both. To obtain the convergence of semigroups we need to estimate the difference between the approximating and limiting generators on a sufficiently rich class of functionals (forming a core for the limiting generator). In Section 9.1 we calculate this difference explicitly for functionals on measures having well-defined first- and second-order variational derivatives. Section 9.2 is devoted to the case of limiting generators of Lévy–Khintchine type with bounded coefficients.
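A scalar toy version of this comparison can be done by hand. It is assumed for illustration only, since the text works with functionals on measures. Take the approximating generator A_N f(h) = N(K/2)h(h - 1/N)[f(h - 1/N) - f(h)] on h ∈ {1/N, 2/N, …, 1}. Take the limiting generator Λf(h) = -(K/2)h² f′(h). For f(h) = h², direct expansion gives A_N f(h) - Λf(h) = 3Kh²/(2N) - Kh/(2N²), an O(1/N) difference. The sketch evaluates both generators and checks this formula numerically with K = 1.

```python
def approx_generator(f, h: float, N: int, K: float = 1.0) -> float:
    """A_N f(h): h -> h - 1/N at rate N (K/2) h (h - 1/N), i.e. K n (n - 1) / (2N) with n = N h."""
    return N * (K / 2) * h * (h - 1 / N) * (f(h - 1 / N) - f(h))


def limit_generator(df, h: float, K: float = 1.0) -> float:
    """Lambda f(h) = -(K/2) h^2 f'(h), the generator of the flow dh/dt = -(K/2) h^2."""
    return -(K / 2) * h * h * df(h)


def square(h: float) -> float:
    return h * h


def d_square(h: float) -> float:
    return 2 * h


for N in (10, 100, 1000):
    gap = approx_generator(square, 0.5, N) - limit_generator(d_square, 0.5)
    predicted = 3 * 0.5 ** 2 / (2 * N) - 0.5 / (2 * N ** 2)  # closed form with K = 1
    print(N, gap, predicted)
```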
The construction of the Ornstein–Uhlenbeck (OU) semigroups in Section 10.4 is very straightforward. However, the corresponding process is Gaussian, so it is also natural and insightful to construct infinite-dimensional OU semigroups and/or propagators in an alternative way: by completion, starting from their action on Gaussian test functions. The analysis of this action leads to a Riccati equation. We shall sketch this approach to infinite-dimensional OU semigroups here, starting with the theory of differential Riccati equations on symmetric operators in Banach spaces.
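A scalar analogue, which is an assumption standing in for the Banach-space setting, shows where the Riccati equation comes from. Consider the one-dimensional OU process dX = -λX dt + σ dW with λ > 0, and the Gaussian test function f(x) = exp(-q x²/2). Substitute the ansatz E_x f(X_t) = c(t) exp(-Q(t) x²/2) into the backward equation ∂_t u = -λx ∂_x u + (σ²/2) ∂²_x u. This gives the Riccati equation Q′ = -2λQ - σ²Q² with Q(0) = q, together with c′ = -σ²Qc/2 and c(0) = 1. Its solution is Q(t) = q e^{-2λt}/(1 + q s²(t)), where s²(t) = σ²(1 - e^{-2λt})/(2λ). The sketch integrates the Riccati equation with RK4 and compares the result with this closed form. The parameter values are hypothetical.

```python
import math


def riccati_rhs(Q: float, lam: float, sigma: float) -> float:
    """Right-hand side of Q' = -2 lam Q - sigma^2 Q^2."""
    return -2.0 * lam * Q - sigma ** 2 * Q ** 2


def riccati_rk4(q0: float, lam: float, sigma: float, t: float, steps: int = 1000) -> float:
    """Classical RK4 integration of the scalar Riccati equation from Q(0) = q0."""
    dt, Q = t / steps, q0
    for _ in range(steps):
        k1 = riccati_rhs(Q, lam, sigma)
        k2 = riccati_rhs(Q + 0.5 * dt * k1, lam, sigma)
        k3 = riccati_rhs(Q + 0.5 * dt * k2, lam, sigma)
        k4 = riccati_rhs(Q + dt * k3, lam, sigma)
        Q += dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
    return Q


def riccati_exact(q0: float, lam: float, sigma: float, t: float) -> float:
    """Q(t) = q0 e^{-2 lam t} / (1 + q0 s^2(t)), s^2(t) = sigma^2 (1 - e^{-2 lam t}) / (2 lam)."""
    s2 = sigma ** 2 * (1.0 - math.exp(-2.0 * lam * t)) / (2.0 * lam)
    return q0 * math.exp(-2.0 * lam * t) / (1.0 + q0 * s2)


print(riccati_rk4(2.0, 0.7, 1.3, 1.5), riccati_exact(2.0, 0.7, 1.3, 1.5))
```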
Let B and B* be a real Banach space and its dual, with the duality denoted as usual by (·, ·). We say that a densely defined (possibly unbounded) operator C from B to B* is symmetric (resp. positive) if (Cν, ω) = (Cω, ν) (resp. (Cν, ν) ≥ 0) for all ν, ω in the domain of C. We denote by SL+(B, B*) the space of bounded positive operators taking B to B*. Analogous definitions apply to operators taking B* to B. The notion of positivity induces a (partial) order relation on the space of symmetric operators: C_1 ≤ C_2 if C_2 − C_1 is positive.
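In finite dimensions, with B = B* = R^n and the duality given by the dot product, these notions reduce to familiar matrix conditions:

* C is symmetric if C = Cᵀ.
* C is positive if (Cv, v) ≥ 0 for all v. Equivalently, its symmetric part (C + Cᵀ)/2 has no negative eigenvalue.
* For symmetric A and B, A ≤ B means that B - A is positive.

The NumPy sketch below checks these conditions and shows that the order is only partial. Its helper names are hypothetical.

```python
import numpy as np


def is_symmetric(C, tol: float = 1e-12) -> bool:
    """C equals its transpose (up to tol)."""
    C = np.asarray(C, dtype=float)
    return C.shape[0] == C.shape[1] and np.allclose(C, C.T, atol=tol)


def is_positive(C, tol: float = 1e-12) -> bool:
    """(C v, v) >= 0 for all v. Only the symmetric part (C + C^T)/2 contributes
    to (C v, v), so it suffices that its smallest eigenvalue is >= -tol."""
    C = np.asarray(C, dtype=float)
    S = 0.5 * (C + C.T)
    return bool(np.linalg.eigvalsh(S).min() >= -tol)


def loewner_leq(A, B, tol: float = 1e-12) -> bool:
    """A <= B in the order induced by positivity on symmetric matrices: B - A is positive."""
    A, B = np.asarray(A, dtype=float), np.asarray(B, dtype=float)
    return is_symmetric(A, tol) and is_symmetric(B, tol) and is_positive(B - A, tol)


A, B = np.diag([1.0, 0.0]), np.diag([0.0, 1.0])
print(loewner_leq(A, B), loewner_leq(B, A))  # both False: A and B are not comparable
```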
The properties of separability, metrizability, compactness and completeness for a topological space S are crucial for the analysis of S-valued random processes. Here we shall recall the basic relevant notions for the space of Borel measures, highlighting the main ideas and examples and omitting lengthy proofs.
Recall that a topological (e.g. metric) space is called separable if it contains a countable dense subset. It is useful to bear in mind that separability is a topological property, unlike, say, completeness, which depends on the choice of distance. (For example, an open interval and the line R are homeomorphic, but the usual distance is complete for the line and not complete for the interval.) The following standard examples show that separability cannot be taken for granted.
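Example A.1 The Banach space l∞ of bounded sequences of real (or complex) numbers a = (a_1, a_2, …) equipped with the sup norm ∥a∥ = sup_i |a_i| is not separable. Its subset of sequences with values in {0, 1} is uncountable, and the distance between any two distinct such sequences is 1. Hence the open balls of radius 1/2 centred at these sequences are pairwise disjoint. Any dense subset must contain a point in each of these balls, so it cannot be countable.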
Example A.2 The Banach spaces C(R^d), L^∞(R^d), M_signed(R^d) are not separable because they contain a subspace isomorphic to l∞.
Example A.3 The Banach spaces C_∞(R^d), L^p(R^d), p ∈ [1, ∞), are separable; this follows from the Stone–Weierstrass theorem.
We begin by reviewing some probabilistic results about the Dirichlet Process and its close relatives, focussing on their implications for statistical modelling and analysis. We then introduce a class of simple mixture models in which clusters are of different ‘colours’, with statistical characteristics that are constant within colours, but different between colours. Thus cluster identities are exchangeable only within colours. The basic form of our model is a variant on the familiar Dirichlet process, and we find that much of the standard modelling and computational machinery associated with the Dirichlet process may be readily adapted to our generalisation. The methodology is illustrated with an application to the partially-parametric clustering of gene expression profiles.
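The coloured variant introduced in the paper is not reproduced here. For orientation, the sketch below draws a truncated realization of an ordinary Dirichlet process via Sethuraman's stick-breaking construction. The weights are w_k = V_k ∏_{j<k}(1 - V_j) with V_k ~ Beta(1, θ), and the atoms are drawn independently from a base measure. The truncation level and the standard-normal base measure are assumptions made for illustration.

```python
import random
from typing import Callable


def stick_breaking(theta: float, n_atoms: int,
                   base_sampler: Callable[[random.Random], float],
                   rng: random.Random):
    """Truncated stick-breaking draw from a Dirichlet process DP(theta, base).
    Returns (weights, atoms, leftover stick mass after n_atoms breaks)."""
    weights, atoms, remaining = [], [], 1.0
    for _ in range(n_atoms):
        v = rng.betavariate(1.0, theta)  # V_k ~ Beta(1, theta)
        weights.append(remaining * v)    # w_k = V_k * prod_{j<k} (1 - V_j)
        atoms.append(base_sampler(rng))  # atoms i.i.d. from the base measure
        remaining *= 1.0 - v
    return weights, atoms, remaining


rng = random.Random(3)
w, atoms, rest = stick_breaking(1.0, 200, lambda r: r.gauss(0.0, 1.0), rng)
top = sorted(zip(w, atoms), reverse=True)[:3]
print("largest weights and their atoms:", top)
print("unassigned mass after truncation:", rest)
```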
The purpose of this note is four-fold: to remind some Bayesian nonparametricians gently that closer study of some probabilistic literature might be rewarded, to encourage probabilists to think that there are statistical modelling problems worthy of their attention, to point out to all another important connection between the work of John Kingman and modern statistical methodology (the role of the coalescent in population genetics approaches to statistical genomics being the most important example; see papers by Donnelly, Ewens and Griffiths in this volume), and finally to introduce a modest generalisation of the Dirichlet process.
Chapter 3 was devoted to the construction of Markov processes by means of SDEs. Here we shall discuss analytical constructions. In Section 4.1 we sketch the content of the chapter, making, in passing, a comparison between these two approaches.
Comparing analytical and probabilistic tools
Sections 4.2 and 4.3 deal with the integral generators corresponding probabilistically to pure jump Markov processes. The basic series expansion (4.3), (4.4) is easily obtained analytically via the Duhamel principle, and probabilistically it can be obtained as the expansion of averages of terms corresponding to a fixed number of jumps; see Theorem 2.35. Thus for bounded generators both methods lead to the same, easily handled, explicit formula for such processes. In the less trivial situation of unbounded rates the analytical treatment given below rapidly yields the general existence result and eventually, subject to the existence of a second bound, uniqueness and non-explosion. However, if the process does explode in finite time, leading to non-uniqueness, specifying the various processes that arise (i.e. solutions to the evolution equation) requires us to fix "boundary conditions at infinity", and this is most naturally done probabilistically by specifying the behavior of a process after it reaches infinity (i.e. after explosion). We shall not develop the theory in this direction; see, however, Exercise 2.7.
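Take a bounded generator on a finite state space. This setting is an assumption made only to keep the example small. There the expansion by number of jumps can be seen concretely through uniformization, in which all jump rates are padded to a common bound λ. With P = I + Q/λ, one has e^{tQ} = Σ_k e^{-λt}(λt)^k/k! P^k, where the k-th term collects the paths with exactly k (possibly fictitious) jumps. It is offered as a simple relative of the series (4.3), (4.4), not as a reproduction of it. The sketch checks it against the closed form for a two-state chain; the parameter values are hypothetical.

```python
from math import exp


def mat_mul(A: list[list[float]], B: list[list[float]]) -> list[list[float]]:
    """Plain matrix product of two nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def uniformized_semigroup(Q: list[list[float]], t: float, terms: int = 200) -> list[list[float]]:
    """e^{tQ} for a finite generator matrix Q (rows sum to 0) as a sum over the
    number k of jumps: sum_k e^{-lam t} (lam t)^k / k! * P^k, with P = I + Q / lam.
    Assumes lam * t is moderate so that exp(-lam t) does not underflow."""
    n = len(Q)
    lam = max(-Q[i][i] for i in range(n)) or 1.0  # uniform bound on the jump rates
    P = [[float(i == j) + Q[i][j] / lam for j in range(n)] for i in range(n)]
    power = [[float(i == j) for j in range(n)] for i in range(n)]  # P^0
    weight = exp(-lam * t)  # Poisson probability of k = 0 jumps
    result = [[0.0] * n for _ in range(n)]
    for k in range(terms):
        for i in range(n):
            for j in range(n):
                result[i][j] += weight * power[i][j]
        power = mat_mul(power, P)    # P^(k+1)
        weight *= lam * t / (k + 1)  # Poisson probability of k + 1 jumps
    return result


a, b, t = 1.5, 0.5, 0.8
Pt = uniformized_semigroup([[-a, a], [b, -b]], t)
print(Pt)
print(b / (a + b) + a / (a + b) * exp(-(a + b) * t))  # closed form for Pt[0][0]
```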