In this chapter, we will lay the groundwork for our presentation of three strategies to estimate causal effects when simple conditioning on observed variables that lie along back-door paths will not suffice. These strategies will be taken up in Chapters 9, 10, and 11, where we will explain instrumental variable estimators, front-door identification with causal mechanisms, and conditioning estimators that use data on pretreatment values of the outcome variable. Under very specific assumptions, these three strategies will identify average causal effects of interest, even though selection is on the unobservables and treatment assignment is nonignorable.
In this chapter, we will first review the related concepts of nonignorable treatment assignment and selection on the unobservables, using the directed graphs presented in prior chapters. To deepen understanding of these concepts, we will then demonstrate why additional posttreatment data on the outcome of interest are unlikely to aid in the point identification of the treatment effects of most central concern. One indirect goal of this demonstration is to convince the reader that oft-heard claims such as “I would be able to establish that this association is causal if I had longitudinal data” are nearly always untrue if the longed-for longitudinal data are additional measurements taken only after treatment exposure. Instead, longitudinal data are most useful, as we will explain in detail in Chapter 11, when pretreatment measures are available for those who are subsequently exposed to the treatment.
As discussed in previous chapters, the fundamental challenge of causal inference is that an individual cannot be simultaneously observed in both the treatment and control states. In some situations, however, it is possible to observe the same individual or unit of observation in the treatment and control states at different points in time. If the potential outcomes do not evolve over time for reasons other than the treatment, then the causal effect of a treatment can be estimated as the difference between an individual's observed outcome in the control state at time 1 and the same individual's observed outcome in the treatment state at time 2. The assumption that potential outcomes are stable over time (and thus over age, for individuals) is often heroic. If, however, potential outcomes evolve in a predictable way, then it may be possible to use the longitudinal structure of the data to predict the counterfactual outcomes of each individual.
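The logic of this before/after comparison can be sketched in a few lines. The following Python fragment uses entirely hypothetical numbers (the function name and data are ours, not the text's) and simply averages each unit's change between the control-state observation at time 1 and the treatment-state observation at time 2, which identifies the effect only under the stability assumption just described.

```python
# Sketch (hypothetical data): before/after estimate of a causal effect,
# valid ONLY if potential outcomes are stable over time apart from the
# treatment itself -- the "often heroic" assumption noted in the text.

def before_after_effect(y_control_t1, y_treated_t2):
    """Average within-unit change: treated outcome at time 2 minus
    control outcome at time 1, averaged over units."""
    diffs = [y2 - y1 for y1, y2 in zip(y_control_t1, y_treated_t2)]
    return sum(diffs) / len(diffs)

# Each unit is observed untreated at time 1 and treated at time 2.
y_t1 = [10.0, 12.0, 11.0, 13.0]   # control-state outcomes, time 1
y_t2 = [13.0, 15.0, 14.0, 16.0]   # treatment-state outcomes, time 2

print(before_after_effect(y_t1, y_t2))  # → 3.0
```

If outcomes drift over time for reasons unrelated to treatment, this estimator attributes the drift to the treatment, which is why the chapter turns next to designs that model the underlying trajectory.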
We begin our discussion with the interrupted time series (ITS) design, which we already introduced with the example of the year of the fire horse in Section 2.8.1. The ITS design is the simplest case, where the goal is to determine the degree to which a treatment shifts the underlying trajectory of an outcome.
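As a minimal sketch of the ITS idea, the following Python fragment simulates an outcome with a known level shift at the interruption (all numbers are hypothetical and known by construction): it fits the pre-interruption trend by ordinary least squares, projects that trend forward as the counterfactual trajectory, and reads the treatment effect as the average gap between observed post-period outcomes and the projection.

```python
# Sketch of the ITS logic on simulated data (hypothetical numbers, shift
# known by construction): fit the pre-interruption trend, project it
# forward, and estimate the effect as the observed-minus-projected gap.

def slope_intercept(xs, ys):
    """Ordinary least squares fit of ys on xs (simple linear trend)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    return b, my - b * mx

t = list(range(20))
cut = 10                                   # interruption after period 9
true_shift = 4.0                           # built-in level shift
y = [2.0 + 0.5 * ti + (true_shift if ti >= cut else 0.0) for ti in t]

b, a = slope_intercept(t[:cut], y[:cut])   # trend from pre-period only
gaps = [y[ti] - (a + b * ti) for ti in t[cut:]]
print(sum(gaps) / len(gaps))               # → 4.0 (the built-in shift)
```

The sketch recovers the shift exactly only because the simulated trajectory is noiseless and linear; in practice the credibility of an ITS estimate rests on how well the pre-period trend predicts the counterfactual post-period trajectory.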
The rise of the counterfactual model to prominence has increased the popularity of data analysis routines that are most clearly useful for estimating the effects of causes. The matching estimators that we will review and explain in this chapter are perhaps the best example of a classic technique that has reemerged in the past three decades as a promising procedure for estimating causal effects. Matching represents an intuitive method for addressing causal questions, primarily because it pushes the analyst to confront the process of causal exposure as well as the limitations of available data. Accordingly, among social scientists who adopt a counterfactual perspective, matching methods are fast becoming an indispensable technique for prosecuting causal questions, even though they usually prove to be the beginning rather than the end of causal analysis on any particular topic.
We begin with a brief discussion of the past use of matching methods. Then, we present the fundamental concepts underlying matching, including stratification of the data, weighting to achieve balance, and propensity scores. Thereafter, we discuss how matching is usually undertaken in practice, including an overview of various matching algorithms.
In the course of the presentation, we will offer four hypothetical examples that demonstrate some of the essential claims of the matching literature, progressing from idealized examples of stratification and weighting to the implementation of alternative matching algorithms on simulated data for which the treatment effects of interest are known by construction.
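The stratification-and-weighting idea at the core of matching can be sketched briefly. The Python fragment below uses hypothetical observations (the strata labels and outcomes are ours): units are grouped into strata of a confounder, the treatment-control mean difference is computed within each stratum, and the stratum-specific differences are averaged with weights proportional to stratum size.

```python
# Sketch (hypothetical data) of stratification and weighting: condition
# on a confounder by comparing treated and control units within strata,
# then average the within-stratum differences, weighting by stratum size.
from collections import defaultdict

# (stratum, treated?, outcome) -- hypothetical observations
data = [
    ("low", 1, 8.0), ("low", 1, 9.0), ("low", 0, 5.0), ("low", 0, 6.0),
    ("high", 1, 14.0), ("high", 0, 10.0), ("high", 0, 11.0), ("high", 0, 9.0),
]

strata = defaultdict(lambda: {1: [], 0: []})
for s, d, y in data:
    strata[s][d].append(y)

n_total = len(data)
effect = 0.0
for s, groups in strata.items():
    diff = sum(groups[1]) / len(groups[1]) - sum(groups[0]) / len(groups[0])
    effect += (len(groups[1]) + len(groups[0])) / n_total * diff

print(effect)  # → 3.5
```

With many confounders, exact stratification quickly runs out of data, which is where propensity scores enter: they collapse the stratification problem onto a single estimated dimension.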
Probability and statistics are as much about intuition and problem solving as they are about theorem proving. Consequently, students can find it very difficult to make a successful transition from lectures to examinations to practice because the problems involved can vary so much in nature. Since the subject is critical in so many applications from insurance to telecommunications to bioinformatics, the authors have collected more than 200 worked examples and examination questions with complete solutions to help students develop a deep understanding of the subject rather than a superficial knowledge of sophisticated theories. With amusing stories and historical asides sprinkled throughout, this enjoyable book will leave students better equipped to solve problems in practice and under exam conditions.
The original motivation for writing this book was rather personal. The first author, in the course of his teaching career in the Department of Pure Mathematics and Mathematical Statistics (DPMMS), University of Cambridge, and St John's College, Cambridge, had many painful experiences when good (or even brilliant) students, who were interested in the subject of mathematics and its applications and who performed well during their first academic year, stumbled or nearly failed in the exams. This led to great frustration, which was very hard to overcome in subsequent undergraduate years. A conscientious tutor is always sympathetic to such misfortunes, but even pointing out a student's obvious weaknesses (if any) does not always help. For the second author, such experiences were as a parent of a Cambridge University student rather than as a teacher.
We therefore felt that a monograph focusing on Cambridge University mathematics examination questions would be beneficial for a number of students. Given our own research and teaching backgrounds, it was natural for us to select probability and statistics as the overall topic. The obvious starting point was the first-year course in probability and the second-year course in statistics. In order to cover other courses, several further volumes will be needed; for better or worse, we have decided to embark on such a project.
This entry-level text offers clear and concise guidelines on how to select, construct, interpret, and evaluate count data models. Written for researchers with little or no background in advanced statistics, the book presents treatments of all major models using numerous tables, insets, and detailed modeling suggestions. It begins by demonstrating the fundamentals of modeling count data, including a thorough presentation of the Poisson model. It then works up to an analysis of the problem of overdispersion and of the negative binomial model, and finally to the many variations that can be made to the base count models. Examples in Stata, R, and SAS code enable readers to adapt models for their own purposes, making the text an ideal resource for researchers working in health, ecology, econometrics, transportation, and other fields.