SECTION 1 considers the elementary case of conditioning on a map that takes only finitely many different values, as motivation for the general definition.
SECTION 2 defines conditional probability distributions for conditioning on the value of a general measurable map.
SECTION 3 discusses existence of conditional distributions by means of a slightly more general concept, disintegration, which is essential for the understanding of general conditional densities.
SECTION 4 defines conditional densities. It develops the general analog of the elementary formula for a conditional density: (joint density)/(marginal density).
SECTION *5 illustrates how conditional distributions can be identified by symmetry considerations. The classical Borel paradox is presented as a warning against the misuse of symmetry.
SECTION 6 discusses the abstract Kolmogorov conditional expectation, explaining why it is natural to take the conditioning information to be a sub-sigma-field.
SECTION *7 discusses the statistical concept of sufficiency.
Conditional distributions: the elementary case
In introductory probability courses, conditional probabilities of events are defined as ratios, ℙ(A ∣ B) = ℙAB/ℙB, provided ℙB ≠ 0. The division by ℙB ensures that ℙ(· ∣ B) is also a probability measure, which puts zero mass outside the set B, that is, ℙ(Bᶜ ∣ B) = 0. The conditional expectation of a random variable X is defined as its expectation with respect to ℙ(· ∣ B), or, more succinctly, ℙ(X ∣ B) = ℙ(XB)/ℙB. If ℙB = 0, the conditional probabilities and conditional expectations are either left undefined or are extracted by some heuristic limiting argument.
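As a small illustration of the ratio definition, the following Python sketch computes ℙ(A ∣ B), ℙ(Bᶜ ∣ B), and ℙ(X ∣ B) = ℙ(XB)/ℙB exactly for two fair dice. The sample space, the events A and B, the variable X, and the helper names `expect`, `indicator`, and `cond_expect` are all assumptions chosen for this example; none of them comes from the text.

```python
from fractions import Fraction
from itertools import product

# Hypothetical sample space: two fair dice, each outcome has mass 1/36.
OMEGA = list(product(range(1, 7), repeat=2))
MASS = Fraction(1, len(OMEGA))

def expect(X):
    """Expectation PX of a real-valued function X on OMEGA."""
    return sum(X(w) * MASS for w in OMEGA)

def indicator(event):
    """Identify an event with its indicator function, as in P(XB)."""
    return lambda w: 1 if event(w) else 0

def cond_expect(X, B):
    """P(X | B) = P(XB) / PB, defined only when PB != 0."""
    pB = expect(indicator(B))
    if pB == 0:
        raise ZeroDivisionError("PB = 0: the ratio definition does not apply")
    return expect(lambda w: X(w) * indicator(B)(w)) / pB

B = lambda w: w[0] + w[1] == 8   # conditioning event: the sum equals 8
A = lambda w: w[0] == 6          # event: the first die shows six
X = lambda w: w[0]               # random variable: value of the first die

p_A_given_B = cond_expect(indicator(A), B)                     # 1/5
p_Bc_given_B = cond_expect(indicator(lambda w: not B(w)), B)   # 0
x_given_B = cond_expect(X, B)                                  # 4
```

The event B contains five equally likely outcomes, so the conditional measure spreads mass 1/5 over each of them and puts none outside B. The explicit error for ℙB = 0 marks the point where the elementary definition stops working.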
SECTION 1 defines the concepts of weak convergence for sequences of probability measures on a metric space, and of convergence in distribution for sequences of random elements of a metric space and derives some of their consequences. Several equivalent definitions for weak convergence are noted.
SECTION 2 establishes several more equivalences for weak convergence of probability measures on the real line, then derives some central limit theorems for sums of independent random variables by means of Lindeberg's substitution method.
SECTION 3 explains why the multivariate analogs of the methods from Section 2 are not often explicitly applied.
SECTION 4 develops the calculus of stochastic order symbols.
SECTION *5 derives conditions under which sequences of probability measures have weakly convergent subsequences.
Definition and consequences
Roughly speaking, central limit theorems give conditions under which sums of random variables have approximate normal distributions. For example:
If ξ1, …, ξn are independent random variables with ℙξi = 0 for each i and ∑i var(ξi) = 1, and if none of the ξi makes too large a contribution to their sum, then ∑i ξi is approximately N(0, 1) distributed.
The traditional way to formalize approximate normality requires, for each real x, that ℙ{∑i ξi ≤ x} ≈ ℙ{Z ≤ x} where Z has a N(0, 1) distribution. Of course the variable Z is used just as a convenient way to describe a calculation with the N(0, 1) probability measure; Z could be replaced by any other random variable with the same distribution.
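The traditional formalization can be checked numerically in one simple case. The sketch below assumes each ξi takes the values ±1/√n with probability 1/2, so that ℙξi = 0 and ∑i var(ξi) = 1, and compares the exact value of ℙ{∑i ξi ≤ x} with ℙ{Z ≤ x} over a grid of x values. The choice of summands, the grid, and the function names are assumptions made for illustration only.

```python
from math import comb, erf, floor, sqrt

def normal_cdf(x):
    """P{Z <= x} for Z with a N(0, 1) distribution."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def sum_cdf(n, x):
    """Exact P{sum_i xi_i <= x} when xi_i = +-1/sqrt(n), each with probability 1/2.

    The sum equals (2K - n)/sqrt(n) with K ~ Binomial(n, 1/2), so the event
    is {K <= (n + x*sqrt(n))/2}.
    """
    k_max = floor((n + x * sqrt(n)) / 2)
    if k_max < 0:
        return 0.0
    k_max = min(k_max, n)
    return sum(comb(n, k) for k in range(k_max + 1)) / 2**n

def max_gap(n, grid):
    """Largest discrepancy between the two distribution functions on the grid."""
    return max(abs(sum_cdf(n, x) - normal_cdf(x)) for x in grid)

GRID = [i / 10 for i in range(-30, 31)]   # x from -3.0 to 3.0
gap_25 = max_gap(25, GRID)
gap_400 = max_gap(400, GRID)
```

With these summands the discrepancy shrinks as n grows, because each individual ξi contributes less to the sum. The gap cannot vanish entirely at a fixed n, since the distribution function of the sum has jumps and the normal distribution function does not.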
SECTION 1 explains why the traditional split of introductory probability courses into two segments—the study of discrete distributions, and the study of “continuous” distributions—is unnecessary in a measure theoretic treatment. Absolute continuity of one measure with respect to another measure is defined. A simple case of the Radon-Nikodym theorem is proved.
SECTION *2 establishes the Lebesgue decomposition of a measure into parts absolutely continuous and singular with respect to another measure, a result that includes the Radon-Nikodym theorem as a particular case.
SECTION 3 shows how densities enter into the definitions of various distances between measures.
SECTION 4 explains the connection between the classical concept of absolute continuity and its measure theoretic generalization. Part of the Fundamental Theorem of Calculus is deduced from the Radon-Nikodym theorem.
SECTION *5 establishes the Vitali covering lemma, the key to the identification of derivatives as densities.
SECTION *6 presents the proof of the other part of the Fundamental Theorem of Calculus, showing that absolutely continuous functions (on the real line) are Lebesgue integrals of their derivatives, which exist almost everywhere.
Densities and absolute continuity
Nonnegative measurable functions create new measures from old.
Let (X, A, µ) be a measure space, and let Δ(·) be a function in M+(X, A). The increasing, linear functional defined on M+(X, A) by νf ≔ µ(fΔ) inherits from µ the Monotone Convergence property, which identifies it as an integral with respect to a measure on A.
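On a finite set the construction can be carried out by hand. The sketch below uses a hypothetical three-point space, represents µ by its point masses, defines the functional f ↦ µ(fΔ), and recovers the point masses of the resulting measure by applying the functional to indicator functions. The space, the masses, the density values, and the function names are assumptions for illustration.

```python
from fractions import Fraction

# Hypothetical finite measure space: point masses of mu on X = {a, b, c}.
MU = {"a": Fraction(1, 2), "b": Fraction(2), "c": Fraction(3)}
# A nonnegative function Delta on X, playing the role of the density.
DELTA = {"a": Fraction(4), "b": Fraction(0), "c": Fraction(1, 3)}

def integrate(masses, f):
    """Integral of f with respect to the measure with the given point masses."""
    return sum(f(x) * m for x, m in masses.items())

def nu(f):
    """The functional f -> mu(f * Delta)."""
    return integrate(MU, lambda x: f(x) * DELTA[x])

# Point masses of the new measure, from its values on indicator functions.
NU_MASSES = {x: nu(lambda y, x=x: 1 if y == x else 0) for x in MU}

# A test function: both routes to its integral must agree.
f = lambda x: {"a": 1, "b": 5, "c": 6}[x]
via_functional = nu(f)
via_masses = integrate(NU_MASSES, f)
```

The new measure has point mass Δ(x)µ{x} at each x. In particular it gives zero mass to the point b, where Δ vanishes, even though µ{b} > 0.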
SECTION 1 introduces independence as a property that justifies some sort of factorization of probabilities or expectations. A key factorization Theorem is stated, with proof deferred to the next Section, as motivation for the measure theoretic approach. The Theorem is illustrated by a derivation of a simple form of the strong law of large numbers, under an assumption of bounded fourth moments.
SECTION 2 formally defines independence as a property of sigma-fields. The key Theorem from Section 1 is used as motivation for the introduction of a few standard techniques for dealing with independence. Product sigma-fields are defined.
SECTION 3 describes a method for constructing measures on product spaces, starting from a family of kernels.
SECTION 4 specializes the results from Section 3 to define product measures. The Tonelli and Fubini theorems are deduced. Several important applications are presented.
SECTION *5 discusses some difficulties encountered in extending the results of Sections 3 and 4 when the measures are not sigma-finite.
SECTION 6 introduces a blocking technique to refine the proof of the strong law of large numbers from Section 1, to get a version that requires only a second moment condition.
SECTION *7 introduces a truncation technique to further refine the proof of the strong law of large numbers, to get a version that requires only a first moment condition for identically distributed summands.
SECTION *8 discusses the construction of probability measures on products of countably many spaces.
We consider an extension of the Monotone Subsequence lemma of Erdős and Szekeres in higher dimensions. Let $v_1,\dots,v_n \in \mathbb{R}^d$ be a sequence of real vectors. For a subset $I \subseteq [n]$ and a vector $\vec{c} \in \{0,1\}^d$ we say that $I$ is $\vec{c}$-free if there are no $i < j$ in $I$ such that, for every $k = 1,\dots,d$, $v_i^k < v_j^k$ if and only if $\vec{c}_k = 0$. We construct sequences of vectors with the property that the largest $\vec{c}$-free subset is small for every choice of $\vec{c}$. In particular, for $d = 2$ the largest $\vec{c}$-free subset is $O(n^{5/8})$ for all four possible $\vec{c}$. The smallest possible value remains far from being determined.
We also consider and resolve a simpler variant of the problem.
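The definition of a c-free subset can be made concrete with a brute-force checker. The sketch below is an exhaustive search suitable only for very small n. It is not the construction from the abstract, and the function names and sample data are hypothetical.

```python
from itertools import combinations, product

def is_pattern(u, v, c):
    """True if, for every coordinate k, (u[k] < v[k]) holds exactly when c[k] == 0."""
    return all((uk < vk) == (ck == 0) for uk, vk, ck in zip(u, v, c))

def is_c_free(vectors, I, c):
    """I (indices in increasing order) is c-free if no pair i < j in I forms the pattern c."""
    return not any(is_pattern(vectors[i], vectors[j], c)
                   for i, j in combinations(I, 2))

def largest_c_free(vectors, c):
    """Size of the largest c-free index set, by exhaustive search (small n only)."""
    n = len(vectors)
    for size in range(n, 0, -1):
        for I in combinations(range(n), size):
            if is_c_free(vectors, I, c):
                return size
    return 0

# d = 1: an increasing sequence of one-dimensional vectors.
increasing = [(1,), (2,), (3,), (4,)]
size_c0 = largest_c_free(increasing, (0,))
size_c1 = largest_c_free(increasing, (1,))

# d = 2: first coordinate increasing, second decreasing (hypothetical data).
points = [(1, 4), (2, 3), (3, 2), (4, 1)]
sizes = {c: largest_c_free(points, c) for c in product((0, 1), repeat=2)}
```

For d = 1 and c = (0), a c-free set contains no increasing pair, so it is a non-increasing subsequence. This is how the definition extends the monotone subsequence setting. In the two-dimensional example every pair realizes the pattern c = (0, 1), so only single points are free for that c, while the whole sequence is free for the other three choices.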
SECTION 1 explains why you will not learn from this Chapter everything there is to know about the multivariate normal distribution.
SECTION 2 introduces Fernique's inequality. As illustration, Sudakov's lower bound for the expected value of a maximum of correlated normals is derived.
SECTION *3 proves Fernique's inequality.
SECTION 4 introduces the Gaussian isoperimetric inequality. As an application, Borell's tail bound for the distribution of the maximum of correlated normals is derived.
SECTION *5 proves the Gaussian isoperimetric inequality.
Introduction
Of all the probability distributions on multidimensional Euclidean spaces the multivariate normal is the most studied and, in many ways, the most tractable. In years past, the statistical subject known as “Multivariate Analysis” was almost entirely devoted to the study of the multivariate normal. The literature on Gaussian processes—stochastic processes whose finite dimensional distributions are all multivariate normal—is vast. It is important to know a little about the multivariate normal.
As you saw in Section 8.6, the multivariate normal is uniquely determined by its vector of means and its matrix of covariances. In principle, everything that one might want to know about the distribution can be determined by calculation of means and covariances, but in practice it is not completely straightforward. In this Chapter you will see two elegant examples of what can be achieved: Fernique's (1975) inequality, which deduces important information about the spread in a multivariate normal distribution from its covariances; and Borell's (1975) Gaussian isoperimetric inequality, with a proof due to Ehrhard (1983a, 1983b).
We investigate a graph function which is related to the local density, the maximal cut and the least eigenvalue of a graph. In particular it enables us to prove the following assertions.
Let $p \ge 3$ be an integer, $c \in (0, 1/2)$, and $G$ be a $K_p$-free graph on $n$ vertices with $e \le cn^2$ edges. There exists a positive constant $\alpha = \alpha(c, p)$ such that:
(a) some $\lfloor n/2 \rfloor$-subset of $V(G)$ induces at most $(c/4 - \alpha)n^2$ edges (this answers a question of Paul Erdős);
(b) $G$ can be made bipartite by the omission of at most $(c/2 - \alpha)n^2$ edges.
SECTION 1 gives some examples of martingales, submartingales, and supermartingales.
SECTION 2 introduces stopping times and the sigma-fields corresponding to “information available at a random time.” A most important Stopping Time Lemma is proved, extending the martingale properties to processes evaluated at stopping times.
SECTION 3 shows that positive supermartingales converge almost surely.
SECTION 4 presents a condition under which a submartingale can be written as a difference between a positive martingale and a positive supermartingale (the Krickeberg decomposition). A limit theorem for submartingales then follows.
SECTION *5 proves the Krickeberg decomposition.
SECTION *6 defines uniform integrability and shows how uniformly integrable martingales are particularly well behaved.
SECTION *7 shows that martingale theory works just as well when time is reversed.
SECTION *8 uses reverse martingale theory to study exchangeable probability measures on infinite product spaces. The de Finetti representation and the Hewitt-Savage zero-one law are proved.
What are they?
The theory of martingales (and submartingales and supermartingales and other related concepts) has had a profound effect on modern probability theory. Whole branches of probability, such as stochastic calculus, rest on martingale foundations. The theory is elegant and powerful: amazing consequences flow from an innocuous assumption regarding conditional expectations. Every serious user of probability needs to know at least the rudiments of martingale theory.
A little notation goes a long way in martingale theory. A fixed probability space (Ω, ℱ, ℙ) sits in the background.
We shall pack circuits of arbitrary lengths into the complete graph $K_N$. More precisely, if $N$ is odd and $\sum_{i=1}^{t} m_i = \binom{N}{2}$, $m_i \ge 3$, then the edges of $K_N$ can be written as an edge-disjoint union of circuits of lengths $m_1,\dots,m_t$. Since the degrees of the vertices in any such packing must be even, this result cannot hold for even $N$. For $N$ even, we prove that if $\sum_{i=1}^{t} m_i \le \binom{N}{2} - \frac{N}{2}$ then we can write some subgraph of $K_N$ as an edge-disjoint union of circuits of lengths $m_1,\dots,m_t$. In particular, $K_N$ minus a 1-factor can be written as a union of such circuits when $\sum_{i=1}^{t} m_i = \binom{N}{2} - \frac{N}{2}$. We shall also show that these results are best possible.
It is shown that it is possible to extend α-Hölder maps from subsets of Lp to Lq (1 < p, q ≤ 2) isometrically if and only if α ≤ p/q*, and isomorphically if and only if α ≤ p/2. It is also proved that the set of α that allow an isomorphic extension for α-Hölder maps from subsets of X to Y is monotone when Y is a dual Banach space. Finally, the isometric and isomorphic extension problems for Hölder functions between Lp and Lq are studied for general p, q ≥ 1, and a question posed by K. Ball is solved by showing that it is not true that all Lipschitz maps from subsets of Hilbert space into normed spaces extend to the whole of Hilbert space.
We study the properties of single boundary spike solutions for the following singularly perturbed problem [equation missing in source]. It is known that at a non-degenerate critical point of the mean curvature function H(P), there exists a single boundary spike solution. In this paper, we show that the single boundary spike solution is unique and moreover it has exactly (N − 1) small eigenvalues. We obtain the exact asymptotics of the small eigenvalues in terms of H(P).
It is shown that, if h and k are harmonic in ℝ² and there exists a positive constant c so that [inequality missing in source] in ℝ², where h⁺ = max {h, 0}, then it need not follow that h − k is identically a constant. The necessary counterexample is obtained by applying Arakelyan's theorem on approximation by an entire function in certain regions in ℝ².
For every 0 < k < min{m,n} and any linear subspace E of real m × n matrices whose non-zero elements have rank greater than k, we show that there is a maximal extension Emax satisfying the same rank condition, and that the dimension of Emax is not less than (m – k)(n – k). We apply this result to the study of quasiconvex functions defined on the complement E⊥ of E in the form F(X) = f(PE⊥(X)), where PE⊥ is the orthogonal projection to E⊥.
We say that a Banach space E is a continuation space for a given parabolic problem if the E-norm of any non-global solution has to become unbounded. We will prove that for large classes of parabolic systems of two equations, the space E = Lr1 × Lr2 can be a continuation space even though the problem is not locally well posed in E. This stands in contrast with classical results for analogous scalar equations.
In this paper we consider the semilinear elliptic problem in a bounded domain Ω ⊆ Rⁿ, [equation missing in source] where μ ≥ 0, 0 ≤ α ≤ 2, 2α* := 2(n − α)/(n − 2), f : Ω → R+ is measurable, f > 0 a.e., having a lower-order singularity than |x|⁻² at the origin, and g : R → R is either linear or superlinear. For 1 < p < n, we characterize a class of singular functions Ip for which the embedding is compact. When p = 2, α = 2, f ∈ I2 and 0 ≤ μ < (½(n − 2))², we prove that the linear problem has -discrete spectrum. By improving the Hardy inequality we show that for f belonging to a certain subclass of I2, the first eigenvalue goes to a positive number as μ approaches (½(n − 2))². Furthermore, when g is superlinear, we show that for the same subclass of I2, the functional corresponding to the differential equation satisfies the Palais-Smale condition if α = 2 and a Brezis-Nirenberg type of phenomenon occurs for the case 0 ≤ α < 2.
In this paper, we consider non-negative solutions of [equation missing in source]. We prove that if pq ≤ 1, every solution is global, while if pq > 1, all solutions blow up in finite time. We also show that if p, q ≥ 1, then blow-up can occur only on the boundary.