Distribution theory is concerned with probability distributions of random variables, with the emphasis on the types of random variables frequently used in the theory and application of statistical methods. For instance, in a statistical estimation problem we may need to determine the probability distribution of a proposed estimator or to calculate probabilities in order to construct a confidence interval.
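As an illustration (ours, not the text's), the distribution of an estimator can often be approximated by simulation. The Python sketch below, which assumes numpy and uses an exponential population purely for concreteness, approximates the sampling distribution of the sample mean and forms a normal-approximation 95% confidence interval from a single sample.

    import numpy as np

    rng = np.random.default_rng(0)
    n, reps = 25, 10_000

    # Sampling distribution of the sample mean, approximated by simulation.
    means = rng.exponential(scale=1.0, size=(reps, n)).mean(axis=1)

    # Normal-approximation 95% confidence interval from one sample of size n.
    sample = rng.exponential(scale=1.0, size=n)
    half_width = 1.96 * sample.std(ddof=1) / np.sqrt(n)
    ci = (sample.mean() - half_width, sample.mean() + half_width)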
Clearly, there is a close relationship between distribution theory and probability theory; in some sense, distribution theory consists of those aspects of probability theory that are often used in the development of statistical theory and methodology. In particular, the problem of deriving properties of probability distributions of statistics, such as the sample mean or sample standard deviation, based on assumptions about the distributions of the underlying random variables, receives much emphasis in distribution theory.
In this chapter, we consider the basic properties of probability distributions. Although these concepts most likely are familiar to anyone who has studied elementary probability theory, they play such a central role in the subsequent chapters that they are presented here for completeness.
Basic Framework
The starting point for probability theory and, hence, distribution theory is the concept of an experiment. The term experiment may actually refer to a physical experiment in the usual sense, but more generally we will refer to something as an experiment when it has the following properties: there is a well-defined set of possible outcomes of the experiment, each time the experiment is performed exactly one of the possible outcomes occurs, and the outcome that occurs is governed by some chance mechanism.
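As a small concrete instance (ours, not the author's): a roll of a fair die is an experiment in this sense. The possible outcomes are 1 through 6, exactly one occurs each time the experiment is performed, and chance governs which. A brief Python sketch, assuming numpy:

    import numpy as np

    rng = np.random.default_rng(0)

    # Repeat the experiment many times; each trial yields exactly one
    # outcome from the well-defined set {1, ..., 6}.
    rolls = rng.integers(1, 7, size=10_000)

    # The relative frequencies estimate each outcome's probability (about 1/6).
    freq = {k: float(np.mean(rolls == k)) for k in range(1, 7)}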
This chapter explains simultaneous-equation models, and how to estimate them using instrumental variables (or two-stage least squares). These techniques are needed to avoid simultaneity bias (aka endogeneity bias). The lead example will be hypothetical supply and demand equations for butter in the state of Wisconsin. The source of endogeneity bias will be explained, and so will methods for working around this problem.
Then we discuss two real examples—(i) the way education and fertility influence each other, and (ii) the effect of school choice on social capital. These examples indicate how social scientists use two-stage least squares to handle (i) reciprocal causation and (ii) self-selection of subjects into the sample. (In the social sciences, two-stage least squares is often seen as the solution to problems of statistical inference.) At the end of the chapter there is a literature review, which puts modeling issues into a broader perspective.
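To fix ideas before the butter example, here is a minimal numerical sketch of the two-stage procedure (ours, not the book's), in Python with numpy. The names y, X, and Z are illustrative: X holds the right-hand-side variables (some of them endogenous), Z holds the instruments together with any exogenous regressors, and in practice both matrices would include an intercept column.

    import numpy as np

    def two_stage_least_squares(y, X, Z):
        # Stage 1: replace X with its projection onto the column space
        # of the instruments Z (the fitted values from regressing X on Z).
        X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
        # Stage 2: ordinary least squares of y on the stage-1 fitted values.
        beta, *_ = np.linalg.lstsq(X_hat, y, rcond=None)
        return beta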
We turn now to butter. Supply and demand need some preliminary discussion. For an economist, butter supply is not a single quantity but a relationship between quantity and price. The supply curve shows the quantity of butter that farmers would bring to market at different prices. In the left-hand panel of figure 1, price is on the horizontal axis and quantity on the vertical. (Economists usually do it the other way around.)

[Figure 1. Supply and demand. The vertical axis shows quantity; the horizontal axis, price.]
This appendix contains a review of some basic mathematical facts that are used throughout this book. For further details, the reader should consult a book on mathematical analysis, such as Apostol (1974), Rudin (1976), or Wade (2004).
Sets
Basic definitions. A set is a collection of objects that itself is viewed as a single entity. We write x ∈ A to indicate that x is an element of a set A; we write x ∉ A to indicate that x is not an element of A. The set that contains no elements is known as the empty set and is denoted by ∅.
Let A and B denote sets. If every element of B is also an element of A, we say that B is a subset of A; this is denoted by B ⊂ A. If there also exists an element of A that is not in B, we say that B is a proper subset of A. If A and B have exactly the same elements, we write A = B. The difference between A and B, written A \ B, is the set consisting of all elements of A that are not elements of B.
Set algebra. Let S denote a fixed set such that all sets under consideration are subsets of S and let A and B denote subsets of S. The union of A and B is the set C whose elements are either elements of A or elements of B or are elements of both; we write C = A ∪ B.
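For example, if A = {1, 2, 3} and B = {3, 4}, then A ∪ B = {1, 2, 3, 4} and A \ B = {1, 2}; here B is not a subset of A, since 4 ∉ A.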
This chapter is about the regression line. The regression line is important on its own (to statisticians), and it will help us with multiple regression in chapter 4. The first example is a scatter diagram showing the heights of 1078 fathers and their sons (figure 1). Each pair of fathers and sons becomes a dot on the diagram. The height of the father is plotted on the x-axis; the height of his son, on the y-axis. The left-hand vertical strip (inside the chimney) shows the families where the father is 64 inches tall to the nearest inch; the right-hand vertical strip, families where the father is 72 inches tall. Many other strips could be drawn too. The regression line (solid) approximates the average height of the sons, given the heights of their fathers. This line goes through the centers of all the vertical strips. The regression line is flatter than the SD line, which is dashed. “SD” is shorthand for “standard deviation”; definitions come next.
The regression line
We have n subjects indexed by i = 1, …, n, and two data variables x and y. A data variable stores a value for each subject in a study. Thus, xi is the value of x for subject i, and yi is the value of y for subject i.
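As a sketch of the computation (ours, not the chapter's, and anticipating the definitions of the SD and the correlation coefficient r given below), the regression line of y on x is determined by the two means, the two SDs, and r. In Python, assuming numpy:

    import numpy as np

    def regression_line(x, y):
        x = np.asarray(x, dtype=float)
        y = np.asarray(y, dtype=float)
        r = np.corrcoef(x, y)[0, 1]      # correlation coefficient
        # The SD line has slope y.std() / x.std(); multiplying by r
        # (with |r| < 1) is what makes the regression line flatter.
        slope = r * y.std() / x.std()
        # The line passes through the point of averages (x.mean(), y.mean()).
        intercept = y.mean() - slope * x.mean()
        return slope, intercept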
This chapter is concerned with two separate but interrelated themes. The first has to do with extending the discussion of Chapter 4 to more complicated hypothesis testing problems, and the second is concerned with conditional inference.
We will consider first testing two-sided hypotheses of the form H0 : θ ∈ [θ1, θ2] (with θ1 < θ2) or H0 : θ = θ0 where, in each case, the alternative H1 includes all θ not part of H0. For such problems we cannot expect to find a uniformly most powerful test in the sense of Chapter 4. However, by introducing an additional concept of unbiasedness (Section 7.1), we are able to define a family of uniformly most powerful unbiased, or UMPU, tests. In general, characterising UMPU tests for two-sided problems is a much harder task than characterising UMP tests for one-sided hypotheses, but for one specific but important example, that of a one-parameter exponential family, we are able to find UMPU tests. The details of this are the subject of Section 7.1.2.
The extension to multiparameter exponential families involves the notion of conditional tests, discussed in Section 7.2. In some situations, a statistical problem may be greatly simplified by working not with the unconditional distribution of a test statistic, but the conditional distribution given some other statistic. We discuss two situations where conditional tests naturally arise, one when there are ancillary statistics, and the other where conditional procedures are used to construct similar tests. The basic idea behind an ancillary statistic is that of a quantity with distribution not depending on the parameter of interest.
From now on, we consider a variety of specific statistical problems, beginning in this chapter with a re-examination of the theory of hypothesis testing. The concepts and terminology of decision theory will always be present in the background, but inevitably, each method that we consider has developed its own techniques.
In Section 4.1 we introduce the key ideas in the Neyman–Pearson framework for hypothesis testing. The fundamental notion is that of seeking a test which maximises power, the probability under repeated sampling of correctly rejecting an incorrect hypothesis, subject to some pre-specified fixed size, the probability of incorrectly rejecting a true hypothesis. When the hypotheses under test are simple, so that they completely specify the distribution of X, the Neyman–Pearson Theorem (Section 4.2) gives a simple characterisation of the optimal test. We shall see in Section 4.3 that this result may be extended to certain composite (non-simple) hypotheses, when the family of distributions under consideration possesses the property of monotone likelihood ratio. Other, more elaborate, hypothesis testing problems require the introduction of further structure, and are considered in Chapter 7. The current chapter finishes (Section 4.4) with a description of the Bayesian approach to hypothesis testing based on Bayes factors, which may conflict sharply with the Neyman–Pearson frequentist approach.
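As a concrete instance (ours, not the book's): for the simple hypotheses H0: mu = 0 against H1: mu = 1, based on n independent N(mu, 1) observations, the likelihood ratio is increasing in the sample mean, so the optimal test of size alpha rejects when the standardised sample mean exceeds the upper-alpha normal quantile. A Python sketch, assuming numpy and scipy:

    import numpy as np
    from scipy.stats import norm

    def most_powerful_test(x, alpha=0.05):
        # Test statistic: sqrt(n) * xbar, which is N(0, 1) under H0.
        z = np.sqrt(len(x)) * np.mean(x)
        # Reject for large z; this threshold gives size exactly alpha.
        return z > norm.ppf(1 - alpha)

    # Power against mu = 1: 1 - norm.cdf(norm.ppf(1 - alpha) - np.sqrt(n)).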
This book aims to provide a concise but comprehensive account of the essential elements of statistical inference and theory. It is designed to be used as a text for courses on statistical theory for students of mathematics or statistics at the advanced undergraduate or Masters level (UK) or the first-year graduate level (US), or as a reference for researchers in other fields seeking a concise treatment of the key concepts of and approaches to statistical inference. It is intended to give a contemporary and accessible account of procedures used to draw formal inference from data.
The book focusses on a clear presentation of the main concepts and results underlying different frameworks of inference, with particular emphasis on the contrasts among frequentist, Fisherian and Bayesian approaches. It provides a description of basic material on these main approaches to inference, as well as more advanced material on recent developments in statistical theory, including higher-order likelihood inference, bootstrap methods, conditional inference and predictive inference. It places particular emphasis on contemporary computational ideas, such as those applied in bootstrap methodology and Markov chain Monte Carlo techniques of Bayesian inference. Throughout, the text concentrates on concepts, rather than mathematical detail, but every effort has been made to present the key theoretical results in as precise and rigorous a manner as possible, consistent with the overall mathematical level of the book. The book contains numerous extended examples of the application of contrasting inference techniques to real data, as well as selected historical commentaries. Each chapter concludes with an accessible set of problems and exercises.