A life which included no improbable events would be the real statistical improbability.
Poul Anderson
It is plain that any scientist is trying to correlate the incoherent body of facts confronting him with some definite and orderly scheme of abstract relations, the kind of scheme which he can borrow only from mathematics.
G. H. Hardy
This chapter introduces the basic concepts of probability in an informal way. We discuss our everyday experience of chance, and explain why we need a theory and how we start to construct one. Mathematical probability is motivated by our intuitive ideas about likelihood as a proportion in many practical instances. We discuss some of the more common questions and problems in probability, and conclude with a brief account of the history of the subject.
Chance
My only solution for the problem of habitual accidents is to stay in bed all day. Even then, there is always the chance that you will fall out.
Robert Benchley
It is not certain that everything is uncertain.
Blaise Pascal
You can be reasonably confident that the sun will rise tomorrow, but what it will be shining on is a good deal more problematical. In fact, the one thing we can be certain of is that uncertainty and randomness are unavoidable aspects of our experience.
At a personal level, minor ailments and diseases appear unpredictably and are resolved not much more predictably. Your income and spending are subject to erratic strokes of good or bad fortune.
The calculus of probabilities, in an appropriate form, should interest equally the mathematician, the experimentalist, and the statesman. … It is under its influence that lotteries and other disgraceful traps cunningly laid for greed and ignorance have finally disappeared.
François Arago, Eulogy on Laplace, 1827
Lastly, one of the principal uses to which this Doctrine of Chances may be applied, is the discovering of some truths, which cannot fail of pleasing the mind, by their generality and simplicity; the admirable connexion of its consequences will increase the pleasure of the discovery; and the seeming paradoxes wherewith it abounds, will afford very great matter of surprize and entertainment to the inquisitive.
Abraham de Moivre, The Doctrine of Chances, 1756
This book provides an introduction to elementary probability and some of its simple applications. In particular, a principal purpose of the book is to help the student to solve problems. Probability is now being taught to an ever wider audience, not all of whom can be assumed to have a high level of problem-solving skills and mathematical background. It is also characteristic of probability that, even at an elementary level, few problems are entirely routine. Successful problem solving requires flexibility and imagination on the part of the student. Commonly, these skills are developed by observation of examples and practice at exercises, both of which this text aims to supply.
With these targets in mind, in each chapter of the book, the theoretical exposition is accompanied by a large number of examples and is followed by worked examples incorporating a cluster of exercises.
Although more detailed and formal than the presentation in §1.1, this appendix does not claim to provide a complete, rigorous presentation of axiomatic set theory (there are several entire books devoted to the subject, some of them listed in the references). Although axioms for set theory will be stated in detail, some definitions, such as linear ordering and well-ordering, will be assumed to be known (from Chapter 1).
Mathematical Logic
Around 300 B.C., Euclid's geometry presented “a strictly logical deduction of theorems from a set of definitions, postulates and axioms” (Struik, 1948, p. 59). Euclid went a long way, although not all the way, to the modern ideal of the axiomatic method, where, when the proof of a theorem is written out in detail, it can be checked mechanically and precisely to ascertain that it is (or is not) a proof. From a modern point of view, perhaps the least strictly logical part of Euclid's system is his definitions, for example, “a point is that which has no extension,” “a line is a length, without width …” As was noted in §1.1, a truly precise mathematical system, or ‘formal system’, begins with some basic undefined terms. Then other terms can be defined from the basic ones.
The most widely accepted formal systems, giving a foundation for modern mathematics, are based on propositional calculus and first-order predicate logic. Only a very brief introduction to these topics will be given here. For more details see, for example, Kleene (1967).
So far we have dealt with convergence of laws mainly on finite-dimensional Euclidean spaces ℝ^k, for the central limit theorem (§§9.3–9.5). Now we'll treat converging laws on more general, possibly infinite-dimensional spaces. Here are some cases where such spaces and laws can be helpful.
Let x(t, ω) be the position of a randomly moving particle at time t, where ω ∈ Ω, for some probability space (Ω, 𝒜, P). For each ω, we then have a continuous function t ↦ x(t, ω). Suppose we consider times t with 0 ≤ t ≤ 1 and that x is real-valued (the particle is moving along a line, or we just consider one coordinate of its position). Then x(·, ω) belongs to the space C[0, 1] of continuous real-valued functions on [0, 1]. The space C[0, 1] has a usual norm, the supremum norm ‖f‖ ≔ sup{|f(t)|: 0 ≤ t ≤ 1}. Then C[0, 1] is a complete separable metric space for the metric d defined as usual by d(f, g) ≔ ‖f − g‖. It may be useful to approximate the process x, for example, by a process y_n such that for each ω and each k = 1, …, n, y_n(·, ω) is linear on the interval [(k − 1)/n, k/n]. Thus it may help to define y_n converging to x in law (or in probability or a.s.) in C[0, 1].
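As a rough numerical illustration of the supremum distance and the piecewise-linear approximation described above, the following Python sketch may help. It is not from the source: the sample path sin(2πt), the grid-based approximation of the supremum, and the helper names `sup_norm`, `piecewise_linear`, and `sup_distance` are all assumptions chosen for illustration, and a single deterministic path stands in for the path of the process at one fixed sample point.

```python
import math

def sup_norm(f, grid_size=10_001):
    """Approximate sup |f(t)| over 0 <= t <= 1 on a uniform grid (an assumption:
    the true supremum is only approximated, not computed exactly)."""
    return max(abs(f(i / (grid_size - 1))) for i in range(grid_size))

def piecewise_linear(x, n):
    """Return y_n, which agrees with x at the points k/n and is linear
    on each interval [(k - 1)/n, k/n]."""
    knots = [x(k / n) for k in range(n + 1)]

    def y(t):
        k = min(int(t * n), n - 1)   # index of the interval containing t
        s = t * n - k                # relative position inside that interval
        return (1.0 - s) * knots[k] + s * knots[k + 1]

    return y

def sup_distance(f, g):
    """d(f, g): the supremum norm of f - g, approximated on the grid."""
    return sup_norm(lambda t: f(t) - g(t))

# Hypothetical sample path, standing in for one realization of the process.
def path(t):
    return math.sin(2.0 * math.pi * t)

errors = [sup_distance(path, piecewise_linear(path, n)) for n in (4, 16, 64)]
```

For this smooth path the approximate distances in `errors` shrink as n grows, which is the kind of convergence in C[0, 1] the paragraph has in mind for a single path.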
A classical example of measure is the length of intervals. In the modern theory of measure, developed by Émile Borel and Henri Lebesgue around 1900, the first task is to extend the notion of “length” to very general subsets of the real line. In representing intervals as finite, disjoint unions of other intervals, it is convenient to use left open, right closed intervals. The length is denoted by λ((a, b]) ≔ b − a for a ≤ b. Now, in the extended real number system [−∞, ∞] ≔ {−∞} ∪ ℝ ∪ {+∞}, −∞ and +∞ are two objects that are not real numbers. Often +∞ is written simply as ∞. The linear ordering of real numbers is extended by setting −∞ < x < ∞ for any real number x. Convergence to ±∞ will be for the interval topology, as defined in §2.2; for example, x_n → +∞ iff for any K < ∞ there is an m with x_n > K for all n > m. If a sequence or series of real numbers is called convergent, however, and the limit is not specified, then the limit is supposed to be in ℝ, not ±∞. For any real x, x + (−∞) ≔ −∞ and x + ∞ ≔ +∞, while ∞ − ∞, or ∞ + (−∞), is undefined, although of course it may happen that a_n → +∞ and b_n → −∞ while a_n + b_n approaches a finite limit.
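The conventions for the extended real numbers have a loose analogue in IEEE 754 floating-point arithmetic, which the Python sketch below uses purely as an illustration. The analogy is an assumption of this example, not something the source states; floats are not the extended real line, and the sequences `a`, `b` and the helper `exceeds_after` are hypothetical names introduced here.

```python
import math

INF = math.inf

# The ordering -inf < x < +inf holds for any finite float x.
x = 3.5
ordering_ok = -INF < x < INF

# x + (-inf) = -inf and x + inf = +inf for finite x.
sum_minus = x + (-INF)
sum_plus = x + INF

# inf + (-inf) is left undefined; floating point signals this with NaN.
undefined = INF + (-INF)

# a_n -> +inf and b_n -> -inf, yet a_n + b_n stays at the finite value 1.
def a(n):
    return float(n) + 1.0

def b(n):
    return -float(n)

sums = [a(n) + b(n) for n in (10, 1_000, 100_000)]

def exceeds_after(seq, K, m, horizon):
    """Spot-check that seq(n) > K for all m < n <= horizon.
    This is a finite check only; it cannot prove convergence to +inf."""
    return all(seq(n) > K for n in range(m + 1, horizon + 1))
```

Here `exceeds_after(a, 50.0, 50, 1000)` holds while `exceeds_after(a, 50.0, 10, 1000)` does not, reflecting that the index m in the definition depends on K.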
Stochastic processes have been treated so far mainly in connection with martingales, although a general definition was given: a stochastic process is a function X of two variables t and ω, t ∈ T, ω ∈ Ω, where (Ω, 𝒜, P) is a probability space and for each t, X(t, ·) is measurable on Ω. Taking T to be the set of positive integers, any sequence of random variables is a stochastic process. In much of the more classical theory of processes, T is a subset of the real line. But by the 1950s, if not before, it began to be realized that there are highly irregular random processes, useful in representing or approximating “noise,” for example, which are in a sense defined over the line but which do not have values at points t. Instead, “integrals” W(f) = ∫ W(t)f(t)dt are defined only if f has some smoothness and/or other regularity properties. Thus an appropriate index set T for the process may be a set of functions on ℝ rather than a subset of ℝ. Such processes are also useful where we may have random functions not only changing in time but defined also on space, so that T may be a set of smooth functions of space as well as, or instead of, time variables. At any rate, the beginnings of the theory of stochastic processes, and a basic existence theorem, hold for an arbitrary index set T without any structure.
Nearly every measure used in mathematics is defined on a space where there is also a topology such that the domain of the measure is either the Borel σ-algebra generated by the topology, its completion for the measure, or perhaps an intermediate σ-algebra. Defining the integrals of real-valued functions on a measure space did not involve any topology as such on the domain space, although structures on the range space ℝ (order as well as topology) were used. Section 7.1 will explore relations between measures and topologies.
The derivative of one measure with respect to another, dγ/dμ = f, is defined by the Radon-Nikodym theorem (§5.5) in case γ is absolutely continuous with respect to μ. A natural question is whether differentiation is valid in the sense that then γ(A)/μ(A) converges to f(x) as the set A shrinks down to x. For this, it is clearly not enough that the sets A contain x and their measures approach 0, as most of the sets might be far from x. One would expect that the sets should be included in neighborhoods of x forming a filter base converging to x. In ℝ, for the usual differentiation, the sets A are intervals, usually with an endpoint at x. It turns out that it is not enough for the sets A to converge to x.
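A small numerical sketch of the well-behaved case on the real line may make the ratio γ(A)/μ(A) concrete. Everything specific here is an assumption for illustration: μ is taken to be ordinary length, γ is taken to have density f(t) = 3t² with respect to μ, and the sets A are intervals (x, x + h] with an endpoint at x.

```python
def gamma(a, b):
    """gamma((a, b]) for the measure with density f(t) = 3 t^2 with respect
    to length; the integral of 3 t^2 over (a, b] is b^3 - a^3."""
    return b ** 3 - a ** 3

def mu(a, b):
    """mu((a, b]) = b - a, the ordinary length of the interval."""
    return b - a

def density(x):
    """The assumed Radon-Nikodym derivative f = d(gamma)/d(mu)."""
    return 3.0 * x ** 2

x = 0.5
ratios = []
for h in (1e-1, 1e-2, 1e-3, 1e-4):
    a, b = x, x + h                  # the interval (x, x + h] shrinks down to x
    ratios.append(gamma(a, b) / mu(a, b))

gaps = [abs(r - density(x)) for r in ratios]
```

In this example the gaps between the ratios and f(x) = 0.75 decrease as h shrinks. The sketch only covers intervals anchored at x; it says nothing about more general families of sets, which is where the difficulty mentioned in the text arises.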
Functional analysis is concerned with infinite-dimensional linear spaces, such as Banach spaces and Hilbert spaces, which most often consist of functions or equivalence classes of functions. Each Banach space X has a dual space X′ defined as the set of all continuous linear functions from X into the field ℝ or ℂ.
One of the main examples of duality is for L^p spaces. Let (X, S, μ) be a measure space. Let 1 < p < ∞ and 1/p + 1/q = 1. Then it turns out that L^p and L^q are dual to each other via the linear functional f ↦ ∫ f g dμ for f in L^p and g in L^q. For p = q = 2, L^2 is a Hilbert space, and it was shown previously that any continuous linear form on a Hilbert space H is given by inner product with a fixed element of H (Theorem 5.5.1).
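The pairing of f with g is bounded because of Hölder's inequality, |∫ f g dμ| ≤ ‖f‖_p ‖g‖_q. The Python sketch below checks this numerically on a toy measure space; the four-point space, its weights, the sample functions, and the helper names `lp_norm` and `pairing` are all assumptions made up for illustration.

```python
def lp_norm(values, weights, p):
    """The p-norm of a function on a finite measure space, where
    weights[i] is the measure of the i-th point."""
    return sum(w * abs(v) ** p for v, w in zip(values, weights)) ** (1.0 / p)

def pairing(f, g, weights):
    """The integral of f * g with respect to the weighted point masses."""
    return sum(w * a * b for a, b, w in zip(f, g, weights))

# Hypothetical measure space with four points and these point masses.
weights = [0.5, 1.0, 2.0, 0.25]
f = [1.0, -2.0, 0.5, 3.0]
g = [0.3, 1.5, -1.0, 2.0]

p = 3.0
q = p / (p - 1.0)        # conjugate exponent, so that 1/p + 1/q = 1

lhs = abs(pairing(f, g, weights))
rhs = lp_norm(f, weights, p) * lp_norm(g, weights, q)
holder_holds = lhs <= rhs
```

A single numerical check like this is of course not a proof; it only shows the inequality in action for one choice of p, f, and g.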
Other than linear subspaces, some of the most natural and frequently applied subsets of a vector space S are the convex subsets C, such that for any x and y in C, and 0 < t < 1, we have tx + (1 – t)y ∈ C. These sets are treated in §§6.2 and 6.6. A function for which the region above its graph is convex is called a convex function. §6.3 deals with convex functions. Convex sets and functions are among the main subjects of modern real analysis.
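The two definitions can be spot-checked numerically. In the Python sketch below, the unit disk, the function u ↦ u², the sample points, and the helper names are all assumptions chosen for illustration; checking finitely many values of t can refute convexity but cannot establish it.

```python
def convex_combination(x, y, t):
    """Return t*x + (1 - t)*y for points given as tuples of coordinates."""
    return tuple(t * a + (1.0 - t) * b for a, b in zip(x, y))

def in_unit_disk(point):
    """Membership in the closed unit disk of the plane (a convex set)."""
    return point[0] ** 2 + point[1] ** 2 <= 1.0

def above_graph(point, f):
    """Membership in the region on or above the graph of f."""
    u, v = point
    return v >= f(u)

def square(u):
    return u * u

ts = [k / 10 for k in range(1, 10)]      # sample values with 0 < t < 1

# Two points of the disk: every sampled combination stays in the disk.
disk_ok = all(
    in_unit_disk(convex_combination((0.6, 0.7), (-0.9, 0.1), t)) for t in ts
)

# Two points above the graph of u^2: the combinations stay above the graph.
graph_ok = all(
    above_graph(convex_combination((-1.0, 1.5), (2.0, 5.0), t), square)
    for t in ts
)

# The complement of the disk is not convex: the midpoint of two outside
# points, (1.5, 0) and (-1.5, 0), falls inside the disk.
complement_ok = all(
    not in_unit_disk(convex_combination((1.5, 0.0), (-1.5, 0.0), t)) for t in ts
)
```

Here `disk_ok` and `graph_ok` hold while `complement_ok` fails, since t = 0.5 gives the origin.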
In constructing a building, the builders may well use different techniques and materials to lay the foundation than they use in the rest of the building. Likewise, almost every field of mathematics can be built on a foundation of axiomatic set theory. This foundation is accepted by most logicians and mathematicians concerned with foundations, but only a minority of mathematicians have the time or inclination to learn axiomatic set theory in detail.
To make another analogy, higher-level computer languages and programs written in them are built on a foundation of computer hardware and systems programs. How much the people who write high-level programs need to know about the hardware and operating systems will depend on the problem at hand.
In modern real analysis, set-theoretic questions are somewhat more to the fore than they are in most work in algebra, complex analysis, geometry, and applied mathematics. A relatively recent line of development in real analysis, “nonstandard analysis,” allows, for example, positive numbers that are infinitely small but not zero. Nonstandard analysis depends even more heavily on the specifics of set theory than earlier developments in real analysis did.
This chapter will give only enough of an introduction to set theory to define some notation and concepts used in the rest of the book. In other words, this chapter presents mainly “naive” (as opposed to axiomatic) set theory.