THIS CHAPTER INTRODUCES several important concepts, provides a guide to the rest of the book, and offers some historical perspective and suggestions for further reading.
Econometrics
Econometrics is largely concerned with quantifying the relationship between one or more variables y, called the response variables or the dependent variables, and one or more variables x, called regressors, independent variables, or covariates. The response variable or variables may be continuous or discrete; the latter case includes binary, multinomial, and count data. For example, y might represent the quantities demanded of a set of goods, and x could include income and the prices of the goods; or y might represent investment in capital equipment, and x could include measures of expected sales, cash flows, and borrowing costs; or y might represent a decision to travel by public transportation rather than private, and x could include income, fares, and travel time under various alternatives.
In addition to the covariates, it is assumed that unobservable random variables affect y, so that y itself is a random variable. It is characterized either by a probability density function (p.d.f.) for continuous y or a probability mass function (p.m.f.) for discrete y. The p.d.f. or p.m.f. depends on the values of unknown parameters, denoted by θ. The notation y ∼ f(y∣θ, x) means that y has the p.d.f. or p.m.f. f(y∣θ, x), where the function depends on the parameters and covariates.
THE END OF the previous chapter mentions that simulation has greatly expanded the scope of Bayesian inference. This chapter reviews methods for generating independent samples from probability distributions. The methods discussed here form the basis for the newer methods discussed in Chapter 7 that are capable of dealing with a wide variety of distributions but do not generate independent samples.
All major statistics packages contain routines for generating random variables from such standard distributions as those summarized in Appendix A. The following examples are intended to illustrate methods of generating samples. I do not claim that the algorithms are the best that can be designed, and it is not necessary to study the methods in great detail. The goal for the chapter is to present the standard techniques of simulation and explain the kinds of questions that simulated samples can help answer.
Many of the applications discussed can be regarded as attempts to approximate a quantity such as E[g(X)], where X ∼ f(x), when the necessary integral, ∫ g(x)f(x) dx, cannot be computed analytically. This problem includes the computation of expected values (where g(X) = X) and other moments, as well as P(c1 ≤ X ≤ c2), for which you set g(X) = 1(c1 ≤ X ≤ c2).
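To fix ideas, here is a minimal R sketch of this approximation; the target distribution N(0, 1), the function g(x) = x², and the constants c1 = −1 and c2 = 2 are arbitrary choices for illustration, not examples from the text.

    ## Monte Carlo approximation of E[g(X)] and P(c1 <= X <= c2);
    ## the target distribution and constants are illustrative choices.
    set.seed(123)
    x <- rnorm(100000)              # draws from the target f, here N(0, 1)
    mean(x^2)                       # approximates E[X^2] (exact value: 1)
    c1 <- -1; c2 <- 2
    mean(x >= c1 & x <= c2)         # approximates P(c1 <= X <= c2)
    pnorm(c2) - pnorm(c1)           # exact probability, for comparison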
Probability Integral Transformation Method
The most basic method of generating samples takes advantage of the ability of computers to generate values that can be regarded as drawn independently from a uniform distribution on (0,1), U(0, 1).
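The idea is that if U ∼ U(0, 1) and F is an invertible c.d.f., then X = F⁻¹(U) has c.d.f. F. A minimal R sketch, with the exponential target and its rate chosen arbitrarily for illustration:

    ## Probability integral transformation: X = F^{-1}(U) has c.d.f. F.
    ## Illustrated for an exponential target with arbitrary rate 2.
    set.seed(123)
    lambda <- 2
    u <- runif(100000)              # independent U(0, 1) draws
    x <- -log(1 - u) / lambda       # inverse exponential c.d.f.
    c(mean(x), 1 / lambda)          # sample mean vs. theoretical mean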
THE BASIS OF an MCMC algorithm is the construction of a transition kernel (see Section 6.3), p(x, y), that has an invariant density equal to the target density. Given such a kernel, the process can be started at x0 to yield a draw x1 from p(x0, x1), x2 from p(x1, x2), …, and xG from p(xG–1, xG), where G is the desired number of simulations. After a transient period, the distribution of the xg is approximately equal to the target distribution. The question is how to find a kernel that has the target as its invariant distribution. It is remarkable that there is a general principle for finding such kernels, the Metropolis-Hastings (MH) algorithm. I first discuss a special case – the Gibbs algorithm or Gibbs sampler – and then explain a more general version of the MH algorithm.
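The following R fragment is a minimal sketch of a random-walk version of the MH algorithm, intended only to show the mechanics of proposing, accepting or rejecting, and accumulating draws; the target density, starting value, and proposal scale are invented for illustration.

    ## Minimal random-walk Metropolis-Hastings sketch; the target,
    ## starting value, and proposal scale are arbitrary choices.
    set.seed(123)
    log_target <- function(x) dnorm(x, mean = 1, sd = 2, log = TRUE)
    G <- 10000                      # number of simulated values
    draws <- numeric(G)
    x <- 0                          # starting value x0
    for (g in 1:G) {
      y <- x + rnorm(1)             # proposal from a symmetric kernel
      if (log(runif(1)) < log_target(y) - log_target(x)) x <- y
      draws[g] <- x
    }
    mean(draws[-(1:1000)])          # drop a transient period, then summarize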
It is important to distinguish between the number of simulated values G and the number of observations n in the sample of data that is being analyzed. The former may be made very large, the only restriction coming from computer time and capacity, but the number of observations is fixed at the time the data are collected. Larger values of G lead to more accurate approximations. MCMC algorithms approximate the exact posterior distribution of the parameters, taking the number of observations to be fixed at n.
THE FIRST SECTION of this chapter discusses general properties of posterior distributions. It continues with an explanation of how a Bayesian statistician uses the posterior distribution to conduct statistical inference, which consists of learning about parameter values in the form of point or interval estimates, making predictions, and comparing alternative models.
Properties of Posterior Distributions
This section discusses general properties of posterior distributions, starting with the likelihood function. It continues by generalizing the concept to include models with more than one parameter and goes on to discuss the revision of posterior distributions as more data become available, the role of the sample size, and the concept of identification.
The Likelihood Function
As you have seen, the posterior distribution is proportional to the product of the likelihood function and the prior distribution. The latter is somewhat controversial and is discussed in Chapter 4, but the choice of a likelihood function is also an important matter and requires discussion. A central issue is that the Bayesian must specify an explicit likelihood function to derive the posterior distribution. In some cases, the choice of a likelihood function appears straightforward. In the coin-tossing experiment of Section 2.2, for example, the choice of a Bernoulli distribution seems natural, but it does require the assumptions of independent trials and a constant probability. These assumptions might be considered prior information, but they are conventionally a part of the likelihood function rather than of the prior distribution.
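In the coin-tossing case, combining the Bernoulli likelihood with a Beta(α, β) prior yields a Beta(α + y, β + n − y) posterior after y successes in n trials, a standard conjugate result. A small R sketch, with invented prior values and data:

    ## Bernoulli likelihood with a Beta prior (conjugate case);
    ## the prior parameters and data below are invented.
    a <- 1; b <- 1                  # Beta(1, 1) prior (uniform on (0, 1))
    y <- 7; n <- 10                 # 7 successes in 10 trials
    a1 <- a + y; b1 <- b + n - y    # posterior is Beta(a1, b1)
    a1 / (a1 + b1)                  # posterior mean of the probability
    qbeta(c(0.025, 0.975), a1, b1)  # 95% posterior interval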
THE MODELS DISCUSSED in this book are rather easy to program, and students are encouraged to do some or all of the exercises by writing their own programs. The writing of programs requires a complete understanding of the problem and is therefore the best way to ensure that the material has been mastered.
A number of excellent programs are suitable for programming the MCMC algorithms described in this book. At this writing, the most popular seems to be R, which is a free software environment for statistical computing and graphics. There are versions for UNIX, Windows, and MacOS, and it may be downloaded from your preferred CRAN mirror. R is explained in a large number of books and online material. An excellent general introduction is Maindonald and Braun (2010), and Springer publishes a large number of titles in its “Use R!” series, some of which cover Bayesian methods. Another important feature of R is the extensive set of packages that provide tools for specialized tasks.
Two useful packages for Bayesian model fitting in R are:
• MCMCpack is available at http://mcmcpack.wustl.edu; it contains some of the models discussed in this book as well as some additional measurement and ecological inference models of interest to political scientists. Its lead developers are Andrew Martin, Kevin M. Quinn, and Jong Hee Park.
• The coda package (http://cran.r-project.org/web/packages/coda/coda.pdf), which MCMCpack utilizes to summarize the MCMC output by preparing summaries, computing convergence diagnostics, and making plots. Functions in the coda package can be used to analyze any MCMC output; a brief usage sketch follows this list.
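As an illustration of how such output might be examined, here is a short R sketch; the draws are simulated so that the example is self-contained, standing in for output from a real sampler.

    ## Illustrative coda usage; the draws below are simulated only so
    ## the example runs on its own, not output from a real sampler.
    library(coda)
    draws <- mcmc(rnorm(5000))      # wrap draws as an mcmc object
    summary(draws)                  # numerical summaries
    effectiveSize(draws)            # effective sample size
    geweke.diag(draws)              # a convergence diagnostic
    plot(draws)                     # trace and density plots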
THE MOTIVATION FOR this edition is the same as for the first: to provide a concise introduction to the main ideas of Bayesian statistics and econometrics. The changes, however, have made the book somewhat less concise. In particular, I have added a chapter on Bayesian nonparametrics and new sections on the ordinal probit model, item response models, factor analysis models, and time-varying variances. I believe that these additional materials make the book more useful to readers. Another difference is that this edition adopts the R statistics environment as the primary tool for computing.
In addition to those thanked in the preface to the first edition, without implicating them in any errors or omissions, I offer my sincere gratitude to John Burkett, Stephen Haptonstahl, Alejandro Jara, Kyu Ho Kang, Xun Pang, Jong Hee Park, Srikanth Ramamurthy, Richard Startz, and Ghislain Vieilledent.
I am grateful for the continued support of Lisa, Aida, my grandchildren, and Sylvia Silver and her family.
With sadness, I note the recent passing of my friends and colleagues Peter Steiner, Arthur Goldberger, and Arnold Zellner, and of my dear son Arthur, to whom I dedicate this edition.
THIS CHAPTER CONCERNS data sets for which the assumption made about the exogeneity of covariates in Chapter 4 and subsequent chapters is untenable. Covariates that are correlated with the disturbance term are called endogenous variables in the econometrics literature. Three types of models are taken up in which endogeneity may be present: treatment models, unobserved covariates, and sample selection subject to incidental truncation.
Treatment Models
Treatment models are used to compare responses of individuals who belong either to a treatment or a control group. If the assignment to a group is random, as in many clinical trials, the assignment may be regarded as independent of any characteristics of the individual. But in many economic applications and in clinical trials in which compliance is not guaranteed, whether an individual is in the treatment or control group is a choice made by the individual, and the choice may depend on unobserved covariates that are correlated with the response variable. Such unobserved covariates are called confounders in the statistical literature; in the econometrics literature, the treatment assignment is called endogenous when it is not independent of the response variable. As an example, let the response variable be wages and the treatment be participation in a job training program. You might expect that people with sufficient motivation to participate in training would earn higher wages, even without participating in the program, than those with less motivation. The problem may be less serious if individuals are randomly assigned to the training program, but there may still be confounding. For example, individuals assigned to the program may choose not to participate, and individuals not assigned to the program may find a way to participate.
THIS BOOK IS a concise introduction to Bayesian statistics and econometrics. It can be used as a supplement to a frequentist course by instructors who wish to introduce the Bayesian viewpoint or as a text in a course on Bayesian econometrics supplemented by readings in the current literature.
While the student should have had some exposure to standard probability theory and statistics, the book does not make extensive use of statistical theory. Indeed, because of its reliance on simulation techniques, it requires less background in statistics and probability than most books that take a frequentist approach. It is, however, strongly recommended that the student become familiar with the forms and properties of the standard probability distributions collected in Appendix A.
Since the advent of Markov chain Monte Carlo (MCMC) methods in the early 1990s, Bayesian methods have been extended to a large and growing number of applications. This book limits itself to explaining in detail a few important applications. Its main goal is to provide examples of MCMC algorithms to enable students and researchers to design algorithms for the models that arise in their own research. More attention is paid to the design of algorithms for the models than to the specification and interpretation of the models themselves because I assume that the student has been exposed to these models in other statistics and econometrics classes.
THE ANALYSIS OF time series data has generated a vast literature from both frequentist and Bayesian viewpoints. This chapter considers a few standard models to illustrate how they can be analyzed with MCMC methods. Section 11.6 provides references to more detailed explanations and additional models.
Autoregressive Models
This section is concerned with models of the general form

    yt = x′tβ + ∊t,  ∊t = φ1∊t−1 + ⋯ + φp∊t−p + ut,

where ut ∼ N(0, σ²) and t = 1, …, T. The disturbance ∊t is said to be autoregressive of order p, denoted by ∊t ∼ AR(p). This model is a way to capture the possibility that disturbances in a particular time period continue to affect y in later time periods, a property that characterizes many time series in economics and other areas.
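A minimal R sketch of data generated from such a model, using the built-in arima.sim function; the AR(2) coefficients and regression parameters are invented for illustration.

    ## Simulating a regression with AR(2) disturbances; all parameter
    ## values below are invented for illustration.
    set.seed(123)
    Tobs <- 200
    x <- rnorm(Tobs)                                   # one covariate
    eps <- arima.sim(list(ar = c(0.5, 0.3)), n = Tobs) # AR(2) disturbances
    y <- 1 + 2 * x + eps                               # beta = (1, 2)
    coef(lm(y ~ x))                                    # OLS fit, for reference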
Assume that the stochastic process defining ∊t is second-order stationary, which means that the means E(∊t) and all covariances E(∊s∊t) of the process are finite and independent of t and s, although the covariances may depend on ∣t – s∣. Because the variance is the special case of the covariance when t = s, it is finite and independent of time.
The stationarity property imposes restrictions on the φs. To state these, I define the lag operator L. It operates on time-subscripted variables as Lzt = zt−1, which implies that Lrzt = zt−r for integer values of r. The polynomial in the lag operator, Φ(L) = 1 − φ1L − ⋯ − φpLp, allows the AR(p) disturbance to be written compactly as Φ(L)∊t = ut; stationarity requires that the roots of Φ(z) = 0 lie outside the unit circle.
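These restrictions can be checked numerically. A sketch using R's polyroot, with the same invented AR(2) coefficients as in the simulation above:

    ## Stationarity check: all roots of 1 - phi_1 z - ... - phi_p z^p
    ## must lie outside the unit circle (invented AR(2) coefficients).
    phi <- c(0.5, 0.3)
    Mod(polyroot(c(1, -phi)))       # both moduli exceed 1: stationary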