How does one calculate the probability of throwing heads more than fifteen times in 25 tosses of a fair coin? What is the probability of winning a lottery prize? Is it exceptional for a city that averages eight serious fires per year to experience twelve serious fires in one particular year? These kinds of questions can be answered with the probability distributions that we will be looking at in this chapter: the binomial distribution, the Poisson distribution and the hypergeometric distribution. A basic knowledge of these distributions is essential in the study of probability theory. This chapter gives insight into the different types of problems to which these probability distributions can be applied. The binomial model refers to a series of independent trials of an experiment that has two possible outcomes. Such an elementary experiment is also known as a Bernoulli experiment, after the famous Swiss mathematician Jakob Bernoulli (1654–1705). In most cases, the two possible outcomes of a Bernoulli experiment will be specified as “success” or “failure.” Many probability problems boil down to determining the probability distribution of the total number of successes in a series of independent trials of a Bernoulli experiment. The Poisson distribution is another important distribution and is used, in particular, to model the occurrence of rare events. When you know the expected value of a Poisson distribution, you know enough to calculate all of the probabilities of that distribution.
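The first and third opening questions can be worked out directly from the binomial and Poisson probability mass functions. The sketch below assumes two things: the coin tosses are independent with success probability 1/2, and (for illustration only) the yearly number of serious fires is Poisson distributed with expected value 8. The helper names `binomial_tail` and `poisson_at_least` are hypothetical.

```python
from math import comb, exp, factorial


def binomial_tail(n: int, p: float, k: int) -> float:
    """Return P(X > k) for X ~ Binomial(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k + 1, n + 1))


def poisson_at_least(lam: float, k: int) -> float:
    """Return P(X >= k) for X ~ Poisson(lam), via the complement P(X <= k - 1)."""
    return 1.0 - sum(exp(-lam) * lam**j / factorial(j) for j in range(k))


# More than fifteen heads in 25 tosses of a fair coin.
p_heads = binomial_tail(25, 0.5, 15)

# Twelve or more serious fires in a year when the yearly average is eight,
# assuming (for illustration) a Poisson model for the yearly count.
p_fires = poisson_at_least(8.0, 12)

print(f"P(more than 15 heads) = {p_heads:.4f}")
print(f"P(12 or more fires)   = {p_fires:.4f}")
```

The first probability comes out at about 0.115 and the second at about 0.112. The Poisson calculation uses nothing but the expected value 8, which illustrates the point made above.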
Why do so many students find probability difficult? Could it be the way the subject is taught in so many textbooks? When I was a student, a class in topology made a great impression on me. The teacher asked us not to take notes during the first hour of his lectures. In that hour, he explained ideas and concepts from topology in a non-rigorous, intuitive way. All we had to do was listen in order to grasp the concepts being introduced. In the second hour of the lecture, the material from the first hour was treated in a mathematically rigorous way and the students were allowed to take notes. I learned a lot from this approach of interweaving intuition and formal mathematics.
This book is written very much in the same spirit. It first helps you develop a “feel for probabilities” before presenting the more formal mathematics. The book is not written in a theorem–proof style. Instead, it aims to teach the novice the concepts of probability through the use of motivating and insightful examples. No mathematics is introduced without specific examples and applications to motivate the theory. Instruction is driven by the need to answer questions about probability problems that are drawn from real-world contexts. The book is organized into two parts. Part One is informal, using many thought-provoking examples and problems from the real world to help the reader understand what probability really means. Probability can be fun and engaging, but this beautiful branch of mathematics is also indispensable to modern science.
In Chapter 8, conditional probabilities are introduced by conditioning upon the occurrence of an event B of nonzero probability. In applications, this event B is often of the form Y = b for a discrete random variable Y. However, when the random variable Y is continuous, the condition Y = b has probability zero for any number b. The purpose of this chapter is to develop techniques for handling a condition provided by the observed value of a continuous random variable. We will see that the conditional probability density function of X given Y = b for continuous random variables is analogous to the conditional probability mass function of X given Y = b for discrete random variables. The conditional distribution of X given Y = b enables us to define the natural concept of conditional expectation of X given Y = b. This concept allows for an intuitive understanding and is of utmost importance. In statistical applications, it is often more convenient to work with conditional expectations instead of the correlation coefficient when measuring the strength of the relationship between two dependent random variables. In applied probability problems, the computation of the expected value of a random variable X is often greatly simplified by conditioning on an appropriately chosen random variable Y. Learning the value of Y provides additional information about the random variable X and for that reason the computation of the conditional expectation of X given Y = b is often simple.
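Here is a small illustration of how conditioning on Y simplifies the computation of E(X). The model is hypothetical: Y is uniform on (0, 1), and given Y = y, X is uniform on (0, y). Conditioning gives E(X | Y = y) = y/2, so E(X) = E(Y)/2 = 1/4. The sketch checks this against a simulation of X that draws Y first.

```python
import random


def sample_x(rng: random.Random) -> float:
    """Draw X by first drawing Y, then X given Y = y (hypothetical model)."""
    y = rng.random()             # Y ~ Uniform(0, 1)
    return rng.uniform(0.0, y)   # X | Y = y ~ Uniform(0, y)


rng = random.Random(2024)
n = 200_000
estimate = sum(sample_x(rng) for _ in range(n)) / n

# Conditioning on Y: E[X | Y = y] = y / 2, so E[X] = E[Y] / 2 = 1/4.
exact = 0.5 * 0.5
print(f"simulated E[X] = {estimate:.4f}, by conditioning = {exact}")
```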
Many random phenomena happen in continuous time. Examples include the occurrence of cell phone calls, the spread of epidemic diseases, and stock price fluctuations. A continuous-time Markov chain is a very useful stochastic process for modeling such phenomena. It is a process that goes from state to state according to a Markov chain, but the times between state transitions are continuous random variables having an exponential distribution.
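The description above translates directly into a simulation: stay in the current state for an exponentially distributed time, then jump according to the embedded chain. The sketch uses a hypothetical two-state machine. State 0 means "up" and is left at failure rate 1. State 1 means "down" and is left at repair rate 4. For this particular chain the long-run fraction of time up is repair_rate / (fail_rate + repair_rate) = 0.8. The function name is illustrative.

```python
import random


def fraction_time_up(fail_rate: float, repair_rate: float,
                     horizon: float, seed: int = 0) -> float:
    """Simulate a two-state continuous-time Markov chain (0 = up, 1 = down)."""
    rng = random.Random(seed)
    state, t, time_up = 0, 0.0, 0.0
    while True:
        # Exponential holding time with the rate of leaving the current state.
        rate = fail_rate if state == 0 else repair_rate
        stay = rng.expovariate(rate)
        if t + stay >= horizon:
            if state == 0:
                time_up += horizon - t
            break
        if state == 0:
            time_up += stay
        t += stay
        # With only two states, the embedded jump chain simply alternates.
        state = 1 - state
    return time_up / horizon


frac = fraction_time_up(fail_rate=1.0, repair_rate=4.0, horizon=100_000.0)
print(f"long-run fraction of time up: {frac:.3f} (theory: {4.0 / 5.0})")
```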
The purpose of this chapter is to give an elementary introduction to continuous-time Markov chains. The basic concept of the continuous-time Markov chain model is the so-called transition rate function. Several examples will be given to illustrate this basic concept. Next we discuss the time-dependent behavior of the process and give Kolmogorov's differential equations to compute the time-dependent state probabilities. Finally, we present the flow-rate-equation method to compute the limiting state probabilities and illustrate this powerful method with several examples dealing with queueing systems.
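In a birth-and-death process, transitions only go between neighbouring states. For such a process the flow-rate equations (rate out of a state equals rate into it) can be solved through the simpler cut equations λ_j p_j = μ_{j+1} p_{j+1}. The sketch applies this to a hypothetical single-server queue with arrival rate 2, service rate 3 and room for at most 5 customers. It then checks the full flow-rate equation at every state. The numbers and the function name are assumptions made for illustration.

```python
def birth_death_limiting(birth, death):
    """Limiting probabilities p_0..p_N of a finite birth-and-death process.

    birth[j] is the rate from state j to j + 1 and death[j] the rate from
    state j + 1 to j, for j = 0..N-1.
    """
    weights = [1.0]
    for lam_j, mu_next in zip(birth, death):
        # Cut equation between j and j + 1: lam_j * p_j = mu_{j+1} * p_{j+1}.
        weights.append(weights[-1] * lam_j / mu_next)
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical queue: arrival rate 2, service rate 3, capacity 5
# (arrivals that find the system full are lost).
lam, mu, capacity = 2.0, 3.0, 5
p = birth_death_limiting([lam] * capacity, [mu] * capacity)

# Check the flow-rate equation "rate out of j = rate into j" at every state.
for j in range(capacity + 1):
    out_rate = p[j] * ((lam if j < capacity else 0.0) + (mu if j > 0 else 0.0))
    in_rate = (p[j - 1] * lam if j > 0 else 0.0) + (p[j + 1] * mu if j < capacity else 0.0)
    assert abs(out_rate - in_rate) < 1e-12
print([round(x, 4) for x in p])
```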
Markov chain model
A continuous-time stochastic process {X(t), t ≥ 0} is a collection of random variables indexed by a continuous time parameter t ∈ [0, ∞), where the random variable X(t) is called the state of the process at time t. In an inventory problem X(t) might be the stock on hand at time t and in a queueing problem X(t) might be the number of customers present at time t. The formal definition of a continuous-time Markov chain is a natural extension of the definition of a discrete-time Markov chain.
In previous chapters we have dealt with sequences of independent random variables. However, many random systems evolving in time involve sequences of dependent random variables. Think of the outside weather temperature on successive days, or the price of IBM stock at the end of successive trading days. Many such systems have the property that the current state alone contains sufficient information to give the probability distribution of the next state. The probability model with this feature is called a Markov chain. The concepts of state and state transition are at the heart of Markov chain analysis. Thinking in terms of states and state transitions is very useful for analyzing many practical problems in applied probability.
Markov chains are named after the Russian mathematician Andrey Markov (1856–1922), who first developed this probability model in order to analyze the alternation of vowels and consonants in Pushkin's poem “Eugene Onegin.” His work helped to launch the modern theory of stochastic processes (a stochastic process is a collection of random variables, indexed by an ordered time variable). The characteristic property of a Markov chain is that its memory goes back only to the most recent state. Knowledge of the current state alone is sufficient to describe the future development of the process. A Markov model is the simplest model for random systems evolving in time when the successive states of the system are not independent.
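Because only the current state matters, the n-step transition probabilities of a Markov chain are the entries of the n-th power of its one-step transition matrix P. The sketch uses a hypothetical two-state weather chain (0 = dry, 1 = wet) with made-up transition probabilities. After many steps both rows of P^n approach the same distribution, here (4/7, 3/7), so the starting state is forgotten.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(p, n):
    """Return P^n by repeated multiplication (n >= 1)."""
    result = p
    for _ in range(n - 1):
        result = mat_mul(result, p)
    return result


# Hypothetical weather chain: state 0 = dry day, state 1 = wet day.
# Row i gives the distribution of tomorrow's state when today's state is i.
P = [[0.7, 0.3],
     [0.4, 0.6]]

P2 = mat_pow(P, 2)    # two-step transition probabilities
P20 = mat_pow(P, 20)  # rows are nearly identical: the start state is forgotten
print(P2)
print(P20)
```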
Constructing the mathematical foundations of probability theory has proven to be a long-lasting process of trial and error. The approach consisting of defining probabilities as relative frequencies in cases of repeatable experiments leads to an unsatisfactory theory. The frequency view of probability has a long history that goes back to Aristotle. It was not until 1933 that the great Russian mathematician Andrej Nikolajewitsch Kolmogorov (1903–1987) laid a satisfactory mathematical foundation for probability theory. He did this by taking a number of axioms as his starting point, as had been done in other fields of mathematics. Axioms state a number of minimal requirements that the mathematical objects in question (such as points and lines in geometry) must satisfy. In the axiomatic approach of Kolmogorov, probability figures as a function on subsets of a so-called sample space, where the sample space represents the set of all possible outcomes of the experiment. The axioms are the basis for the mathematical theory of probability. As a milestone, the law of large numbers can be deduced from the axioms by logical reasoning. The law of large numbers confirms our intuition that the probability of an event in a repeatable experiment can be estimated by the relative frequency of its occurrence in many repetitions of the experiment. This law is the fundamental link between theory and the real world. Its proof has to be postponed until Chapter 14.
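The law of large numbers is easy to observe empirically. In the sketch below, the relative frequency of a six in repeated rolls of a fair die settles down near 1/6 as the number of rolls grows. The seed and sample sizes are arbitrary choices. This is a simulation, not a proof.

```python
import random

rng = random.Random(7)
freqs = {}

# Relative frequency of a six in n rolls of a fair die, for growing n.
for n in (100, 10_000, 1_000_000):
    sixes = sum(1 for _ in range(n) if rng.randint(1, 6) == 6)
    freqs[n] = sixes / n
    print(f"n = {n:>9,}: relative frequency {freqs[n]:.4f} (probability {1 / 6:.4f})")
```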
In many practical applications of probability, physical situations are better described by random variables that can take on a continuum of possible values rather than a discrete number of values. Examples are the decay time of a radioactive particle, the time until the occurrence of the next earthquake in a certain region, the lifetime of a battery, the annual rainfall in London, and so on. These examples make clear the fundamental difference between discrete random variables, which take on a discrete number of values, and continuous random variables, which take on a continuum of values. Whereas a discrete random variable assigns positive probabilities to its individual values, any individual value has probability zero for a continuous random variable. It is only meaningful to speak of the probability of a continuous random variable taking on a value in some interval. Taking the lifetime of a battery as an example, it will be intuitively clear that the probability of this lifetime taking on a specific value becomes zero when a finer and finer unit of time is used. If heights could be measured with infinite precision, the height of a randomly chosen person would be a continuous random variable. In reality, heights cannot be measured with infinite precision, but the mathematical analysis of the distribution of heights of people is greatly simplified when using a mathematical model in which the height of a randomly chosen person is modeled as a continuous random variable.
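The battery example can be made concrete. The sketch assumes, purely for illustration, that the lifetime is exponentially distributed with a mean of 10 hours. It computes the probability that the lifetime falls in the interval (5, 5 + h]. As the interval width h shrinks, this probability goes to zero, which is why a single value has probability zero.

```python
from math import exp


def prob_in_interval(mean: float, t: float, h: float) -> float:
    """P(t < X <= t + h) for an exponentially distributed lifetime X."""
    return exp(-t / mean) - exp(-(t + h) / mean)


# Hypothetical battery: exponential lifetime with mean 10 hours.
mean_life, t = 10.0, 5.0
widths = (1.0, 0.1, 0.01, 0.001, 1e-6)
probs = [prob_in_interval(mean_life, t, h) for h in widths]
for h, pr in zip(widths, probs):
    print(f"P({t} < X <= {t} + {h}) = {pr:.3e}")
```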
This appendix first gives some background material on counting methods. Many probability problems require counting techniques. In particular, these techniques are extremely useful for computing probabilities in a chance experiment in which all possible outcomes are equally likely. In such experiments, one needs effective methods to count the number of outcomes in any specific event. In counting problems, it is important to know whether the order in which the elements are counted is relevant or not. After the discussion on counting methods, the appendix summarizes a number of properties of the famous number e and the exponential function e^x, both of which play an important role in probability.
Permutations
How many different ways can you arrange a number of different objects such as letters or numbers? For example, what is the number of different ways that the three letters A, B, and C can be arranged? By writing out all the possibilities ABC, ACB, BAC, BCA, CAB, and CBA, you can see that the total number is 6. This brute-force method of writing down all the possibilities and counting them is not practical when the number of possibilities gets large, as with the number of different ways to arrange the 26 letters of the alphabet. You can also determine that the three letters A, B, and C can be written down in 6 different ways by reasoning as follows. For the first position, there are 3 available letters to choose from, for the second position there are 2 letters left to choose from, and only one letter remains for the third position. This gives 3 × 2 × 1 = 6 arrangements.
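Both counting approaches can be checked in a few lines. Brute-force enumeration lists the six arrangements of A, B and C. The position-by-position reasoning gives 3 × 2 × 1. The same reasoning applied to the 26 letters of the alphabet gives 26!, far too many to write out.

```python
from itertools import permutations
from math import factorial, prod

# Brute force: write out every arrangement of A, B, C and count them.
arrangements = ["".join(p) for p in permutations("ABC")]
print(arrangements, len(arrangements))

# Position-by-position reasoning: 3 choices, then 2, then 1.
by_reasoning = prod(range(3, 0, -1))

# The same reasoning for the 26 letters of the alphabet gives 26!.
alphabet_count = factorial(26)
print(by_reasoning, alphabet_count)
```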
Generating functions were introduced by the Swiss genius Leonhard Euler (1707–1783) in the eighteenth century to facilitate calculations in counting problems. However, this important concept is also extremely useful in applied probability, as was first demonstrated by the work of Abraham de Moivre (1667–1754) who discovered the technique of generating functions independently of Euler. In modern probability theory, generating functions are an indispensable tool in combination with methods from numerical analysis.
The purpose of this chapter is to give the basic properties of generating functions and to show the utility of this concept. First, the generating function is defined for a discrete random variable taking values in the nonnegative integers. Next, we consider the more general moment-generating function, which is defined for any random variable. The (moment) generating function is a powerful tool for both theoretical and computational purposes. In particular, it can be used to prove the central limit theorem. A sketch of the proof will be given. This chapter also gives a proof of the strong law of large numbers, using moment-generating functions together with so-called Chernoff bounds. Finally, the strong law of large numbers is used to establish the powerful renewal-reward theorem for stochastic processes having the property that the process probabilistically restarts itself at certain points in time.
Generating functions
We first introduce the concept of generating function for a discrete random variable X whose possible values belong to the set of nonnegative integers.
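For such a random variable X, the generating function is G(z) = E(z^X) = Σ_k P(X = k) z^k. It is therefore a polynomial or power series whose coefficients are the probabilities. The sketch uses two standard properties. First, the generating function of a sum of independent random variables is the product of their generating functions. Second, G'(1) = E(X). Together these give the distribution and mean of the sum of two fair dice. The helper names are illustrative.

```python
from fractions import Fraction


def pgf(pmf, z):
    """G(z) = E[z^X] = sum_k P(X = k) z^k, with pmf[k] = P(X = k)."""
    return sum(p * z**k for k, p in enumerate(pmf))


def multiply(a, b):
    """Product of two polynomials given as coefficient lists (index = power of z)."""
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


# One fair die: G(z) = (z + z^2 + ... + z^6) / 6.
die = [Fraction(0)] + [Fraction(1, 6)] * 6

# Sum of two independent dice: G_{X+Y}(z) = G_X(z) G_Y(z),
# so the coefficients of the product are the pmf of the sum.
two_dice = multiply(die, die)

# G'(1) = sum_k k P(X = k) = E[X].
mean = sum(k * p for k, p in enumerate(two_dice))

print(pgf(two_dice, 1), two_dice[7], mean)  # 1 1/6 7
```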
In experiments, one is often interested not only in individual random variables, but also in relationships between two or more random variables. For example, if the experiment is the testing of a new medicine, the researcher might be interested in cholesterol level, blood pressure, and glucose level of a test person. Similarly, a political scientist investigating the behavior of voters might be interested in the income and level of education of a voter. There are many more examples in the physical sciences, medical sciences, and social sciences. In applications, one often wishes to make inferences about one random variable on the basis of observations of other random variables.
The purpose of this chapter is to familiarize the student with the notation and the techniques relating to experiments whose outcomes are described by two or more real numbers. The discussion is restricted to the case of pairs of random variables. The chapter treats joint and marginal densities, along with covariance and correlation. Also, the transformation rule for jointly distributed random variables and regression to the mean are discussed.
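The sketch below illustrates joint and marginal distributions, covariance and correlation in the discrete case. It uses a hypothetical pair: X is the score of the first of two fair dice and Y is the total score. Since Y = X + (second die), Cov(X, Y) = Var(X) = 35/12 and the correlation coefficient is 1/√2 ≈ 0.707.

```python
from fractions import Fraction
from itertools import product
from math import sqrt

# Joint pmf of (X, Y): X = first die, Y = sum of two fair dice.
joint = {}
for d1, d2 in product(range(1, 7), repeat=2):
    key = (d1, d1 + d2)
    joint[key] = joint.get(key, Fraction(0)) + Fraction(1, 36)

# Marginal pmf of X: sum the joint pmf over all values of Y.
marginal_x = {}
for (x, _), p in joint.items():
    marginal_x[x] = marginal_x.get(x, Fraction(0)) + p


def expect(g):
    """E[g(X, Y)] computed from the joint pmf."""
    return sum(p * g(x, y) for (x, y), p in joint.items())


ex, ey = expect(lambda x, y: x), expect(lambda x, y: y)
cov = expect(lambda x, y: x * y) - ex * ey
var_x = expect(lambda x, y: x * x) - ex**2
var_y = expect(lambda x, y: y * y) - ey**2
rho = cov / sqrt(var_x * var_y)
print(cov, var_x, var_y, round(rho, 4))
```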
In performing a chance experiment, one is often not interested in the particular outcome that occurs but in a specific numerical value associated with that outcome. Any function that assigns a real number to each outcome in the sample space of the experiment is called a random variable. Intuitively, a random variable can be thought of as a quantity whose value is not fixed. The value of a random variable is determined by the outcome of the experiment and consequently probabilities can be assigned to the possible values of the random variable.
The purpose of this chapter is to familiarize the reader with a number of basic rules for calculating characteristics of random variables such as the expected value and the variance. In addition, we give rules for the expected value and the variance of a sum of random variables, including the square-root rule. The rules for random variables are most easily explained and understood in the context of discrete random variables. These random variables can take on only a finite or countably infinite number of values (the so-called continuous random variables that can take on a continuum of values are treated in the next chapter). To conclude this chapter, we discuss the most important discrete random variables, such as the binomial, the Poisson and the hypergeometric random variables.
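The sketch below shows the rules for sums on a small case. We take the square-root rule in its usual form: if n independent random variables each have standard deviation σ, their sum has standard deviation σ√n. The code computes the exact distribution of the total of four fair dice by repeated convolution. It then checks that the mean is n times 7/2, the variance is n times 35/12, and the standard deviation follows the square-root rule. The choice n = 4 is arbitrary.

```python
from fractions import Fraction
from math import sqrt


def mean_var(pmf):
    """Exact expected value and variance of a discrete pmf {value: probability}."""
    mu = sum(k * p for k, p in pmf.items())
    var = sum((k - mu) ** 2 * p for k, p in pmf.items())
    return mu, var


def convolve(a, b):
    """pmf of the sum of two independent random variables."""
    out = {}
    for i, p in a.items():
        for j, q in b.items():
            out[i + j] = out.get(i + j, Fraction(0)) + p * q
    return out


die = {k: Fraction(1, 6) for k in range(1, 7)}
mu, var = mean_var(die)  # 7/2 and 35/12

n = 4
total = die
for _ in range(n - 1):
    total = convolve(total, die)
mu_n, var_n = mean_var(total)

print(mu_n == n * mu, var_n == n * var)
print(sqrt(var_n), sqrt(n) * sqrt(var))  # square-root rule
```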
Do the one-dimensional normal distribution and the one-dimensional central limit theorem allow for a generalization to dimension two or higher? The answer is yes. Just as the one-dimensional normal density is completely determined by its expected value and variance, the bivariate normal density is completely specified by the expected values and the variances of its marginal densities and by its correlation coefficient. The bivariate normal distribution appears in many applied probability problems. This probability distribution can be extended to the multivariate normal distribution in higher dimensions. The multivariate normal distribution arises when you take the sum of a large number of independent random vectors. To get this distribution, all you have to do is to compute a vector of expected values and a matrix of covariances. The multidimensional central limit theorem explains why so many natural phenomena have the multivariate normal distribution. A nice feature of the multivariate normal distribution is its mathematical tractability. The fact that any linear combination of multivariate normal random variables has a univariate normal distribution makes the multivariate normal distribution very convenient for financial portfolio analysis, among others.
The purpose of this chapter is to give a first introduction to the multivariate normal distribution and the multidimensional central limit theorem. Several practical applications will be discussed, including the drunkard's walk in higher dimensions and the chi-square test.
This eagerly awaited textbook covers everything the graduate student in probability wants to know about Brownian motion, as well as the latest research in the area. Starting with the construction of Brownian motion, the book then proceeds to sample path properties like continuity and nowhere differentiability. Notions of fractal dimension are introduced early and are used throughout the book to describe fine properties of Brownian paths. The relation of Brownian motion and random walk is explored from several viewpoints, including a development of the theory of Brownian local times from random walk embeddings. Stochastic integration is introduced as a tool and an accessible treatment of the potential theory of Brownian motion clears the path for an extensive treatment of intersections of Brownian paths. An investigation of exceptional points on the Brownian path and an appendix on SLE processes, by Oded Schramm and Wendelin Werner, lead directly to recent research themes.
Stein's method is a collection of probabilistic techniques that allow one to assess the distance between two probability distributions by means of differential operators. In 2007, the authors discovered that one can combine Stein's method with the powerful Malliavin calculus of variations, in order to deduce quantitative central limit theorems involving functionals of general Gaussian fields. This book provides an ideal introduction both to Stein's method and Malliavin calculus, from the standpoint of normal approximations on a Gaussian space. Many recent developments and applications are studied in detail, for instance: fourth moment theorems on the Wiener chaos, density estimates, Breuer–Major theorems for fractional processes, recursive cumulant computations, optimal rates and universality results for homogeneous sums. Largely self-contained, the book is perfect for self-study. It will appeal to researchers and graduate students in probability and statistics, especially those who wish to understand the connections between Stein's method and Malliavin calculus.