Do charter schools increase the test scores of elementary school students? If so, how large are the gains in comparison to those that could be realized by implementing alternative educational reforms? Does obtaining a college degree increase an individual's labor market earnings? If so, is this particular effect large relative to the earnings gains that could be achieved only through on-the-job training? Did the use of a butterfly ballot in some Florida counties in the 2000 presidential election cost Al Gore votes? If so, was the number of miscast votes sufficiently large to have altered the election outcome?
At their core, these types of questions are simple cause-and-effect questions of the form, Does X cause Y? If X causes Y, how large is the effect of X on Y? Is the size of this effect large relative to the effects of other causes of Y?
Simple cause-and-effect questions are the motivation for much research in the social, demographic, and health sciences, even though definitive answers to cause-and-effect questions may not always be possible to formulate given the constraints that researchers face in collecting data and evaluating alternative explanations. Even so, there is reason for optimism about our current and future abilities to effectively address cause-and-effect questions. Over the past four decades, a counterfactual model of causality has been developed and refined, and as a result a unified framework for the prosecution of causal questions is now available.
In this chapter, we introduce the foundational components of the potential outcome model. We first discuss causal states, the relationship between potential and observed outcome variables, and the usage of the label “counterfactual” to refer to unobserved potential outcomes. We introduce average causal effects and then discuss the assumption of causal effect stability, which is maintained explicitly in most applications that use the potential outcome model. We discuss simple estimation techniques and demonstrate the importance of considering the relationship between the potential outcomes and the process of causal exposure. We conclude by extending our presentation to over-time potential outcome variables for one or more units of analysis, as well as causal variables that take on more than two values.
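To make these foundational ideas concrete, the sketch below simulates potential outcomes for a hypothetical population and contrasts the true average treatment effect with the naive difference in observed means when causal exposure depends on the potential outcomes. All variable names, parameter values, and the data-generating step are illustrative assumptions, not quantities from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical potential outcomes: y1 under treatment, y0 under control.
y0 = rng.normal(10, 2, n)
y1 = y0 + rng.normal(2, 1, n)          # individual-level effects are heterogeneous

# Nonrandom exposure: units with higher y0 are more likely to take the treatment.
p = 1 / (1 + np.exp(-(y0 - 10)))
d = rng.binomial(1, p)

y_obs = np.where(d == 1, y1, y0)       # only one potential outcome is ever observed

true_ate = np.mean(y1 - y0)
naive = y_obs[d == 1].mean() - y_obs[d == 0].mean()
print(f"true ATE   = {true_ate:.2f}")
print(f"naive diff = {naive:.2f}")     # biased upward because exposure depends on y0
```

The gap between the two printed numbers is exactly the kind of selection problem that the rest of the presentation is designed to address.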
Defining the Causal States
The counterfactual framework for observational data analysis presupposes the existence of well-defined causal states to which all members of the population of interest could be exposed. As we will show in the next section, causal effects are then defined based on comparisons of outcomes that would result from exposure to alternative causal states. For a binary cause, the two states are usually labeled treatment and control. When a many-valued cause is analyzed, the convention is to refer to the alternative states as alternative treatments.
Although these labels are simple, the assumed underlying states must be very carefully defined so that the contribution of an empirical analysis based upon them is clear.
With an Extended Example of a Weighted Regression Alternative to Matching
In the last chapter, we argued that traditional regression estimators of causal effects have substantial weaknesses, especially when individual-level causal effects are heterogeneous in ways that are not explicitly parameterized. In this chapter, we will introduce weighted regression estimators that solve these problems by appropriately averaging individual-level heterogeneity across the treatment and control groups using estimated propensity scores. In part because of this capacity, weighted regression estimators are now at the frontier of causal effect estimation, alongside the latest matching estimators that are also designed to properly handle such heterogeneity.
In the long run, we expect that weighted regression estimators will prove to be a common choice among alternative conditioning procedures that are used to estimate causal effects. In fact, we expect that weighted regression estimators will be used more frequently than the matching estimators presented in Chapter 5 when there is good overlap in the distributions of adjustment variables across the treatment and control groups. We have four primary reasons for this prediction, each of which we will explain in this chapter. First, weighted regression estimators allow the analyst to adopt the spirit of matching, and the clear thinking that it promotes, within a mode of data analysis that utilizes widely available software and that is familiar to most social scientists.
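A minimal sketch of one such estimator, under the assumption that a single observed variable x suffices to block all back-door paths: estimate propensity scores, reweight each observation by the inverse of its estimated probability of its observed treatment status, and run a weighted regression of the outcome on the treatment indicator. The data-generating step, variable names, and parameter values below are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 50_000

# Hypothetical data: x confounds the treatment d and the outcome y; true effect = 2.
x = rng.normal(size=n)
d = rng.binomial(1, 1 / (1 + np.exp(-1.5 * x)))
y = 1.0 + 2.0 * d + 3.0 * x + rng.normal(size=n)

# Step 1: estimate propensity scores e(x) = P(d = 1 | x).
e = LogisticRegression().fit(x.reshape(-1, 1), d).predict_proba(x.reshape(-1, 1))[:, 1]

# Step 2: form ATE weights and run a weighted regression of y on d.
w = d / e + (1 - d) / (1 - e)
X = np.column_stack([np.ones(n), d])
sw = np.sqrt(w)
coef, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
print(f"weighted-regression estimate of the treatment effect: {coef[1]:.2f}")  # roughly 2.0
```

The weighting step is what carries the "spirit of matching": it balances the distribution of x across the treatment and control groups before the familiar regression machinery is applied.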
What role should the counterfactual approach to observational data analysis play in causal analysis in the social sciences? Some scholars see its elaboration as a justification for experimental methodology as an alternative to observational data analysis. We agree that by laying bare the challenges that confront causal analysis with observational data, the counterfactual approach does indirectly support experimentation as an alternative to observation. But, because experiments are often (perhaps usually) infeasible for most of the causal questions that practicing social scientists appear to want to answer, this implication, when considered apart from others, is understandably distressing.
We see the observational data analysis methods associated with the potential outcome model, motivated using directed graphs, as useful tools that can help to improve the investigation of causal relationships within the social sciences, especially when experiments are infeasible. Accordingly, we believe that the methods associated with the counterfactual approach complement and extend older approaches to causal analysis with observational data by shaping the goals of an analysis, requiring explicit consideration of individual-level heterogeneity of causal effects, encouraging a wider consideration of available identification strategies, and clarifying standards for credible interpretations.
In this chapter, we first shore up our presentation of the counterfactual approach by considering several critical perspectives on its utility. We weigh in with the arguments that we find most compelling, and it will not be surprising to the reader that we find these objections less serious than do those who have formulated them.
Social scientists have recognized for decades that the best explanations for how causes bring about their effects must specify in empirically verifiable ways the causal pathways between causes and their outcomes. This valuation of depth of causal explanation applies to the counterfactual tradition as well. Accordingly, it is widely recognized that a consistent estimate of a counterfactually defined causal effect of D on Y may not qualify as a sufficiently deep causal account of how D affects Y, based on the standards that prevail in a particular field of study.
In this chapter, we first discuss the dangers of insufficiently deep explanations of causal effects, reconsidering the weak explanatory power of some of the natural experiments discussed already in Chapter 9. We then consider the older literature on intervening variables in the social sciences as a way to introduce the mechanism-based estimation strategy proposed by Pearl (2009). In some respects, Pearl's approach is completely new, and it shows in a novel and sophisticated way how intervening variables can be used to identify causal effects, even when unblocked back-door paths between a causal variable and an outcome variable are present. In other respects, however, Pearl's approach is refreshingly familiar, as it helps to clarify the appropriate usage of intervening variables when attempting to deepen the explanation of a causal claim.
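The identification result at the heart of this strategy, Pearl's front-door adjustment, can be stated in its standard textbook form (not necessarily the chapter's own notation) as

\[
\Pr(y \mid \operatorname{do}(d)) \;=\; \sum_{m} \Pr(m \mid d) \sum_{d'} \Pr(y \mid d', m)\,\Pr(d'),
\]

which holds when the intervening variable M intercepts all directed paths from D to Y, there is no unblocked back-door path from D to M, and every back-door path from M to Y is blocked by D.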
Independent of Pearl's important work, a diverse group of social scientists has recently argued that mechanisms are central to all explanation in social science research.
In this chapter, we consider how analysts can proceed when no observed variables are available to point-identify and then estimate causal effects using the procedures explained in prior chapters. We discuss three complementary approaches. First, we will review the early literature on selection-bias adjustment, which shows clearly how distributional assumptions about unobservables, when harnessed from within a structural model, can identify causal parameters of interest. Point identification and estimation utilizing this strategy was frequent before it became clear to researchers in the 1990s how rarely the required distributional assumptions were warranted for their applications. The more recent selection-bias adjustment literature offers less restrictive semiparametric methods, but it also reemphasizes the relative value of instrumental variables in contrast to distributional assumptions for unobservables.
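A canonical instance of this early strategy, stated here in its standard textbook form rather than any formulation specific to the chapter, is the bivariate-normal selection model. If the outcome equation is Y = X\beta + u, treatment is taken when Z\gamma + v > 0, and (u, v) are jointly normal with Corr(u, v) = \rho and Var(v) = 1, then

\[
E[Y \mid X, Z, D = 1] \;=\; X\beta \;+\; \rho\,\sigma_{u}\,\frac{\phi(Z\gamma)}{\Phi(Z\gamma)},
\]

so that the distributional assumption converts the selection problem into an omitted-variable problem with a known functional form, the inverse Mills ratio term.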
We will then consider the exact opposite strategy. Rather than place strong and usually untestable assumptions on the distributional characteristics of unobservables, the set-identification approach asks what can be learned about particular causal parameters by asserting only weak but defensible assumptions about unobserved variables. Instead of attempting to point-identify average causal effects, the set-identification approach suggests that it is more credible to try to limit the interval within which an average treatment effect must fall.
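As an illustration of this logic (the standard "no-assumption" bounds for a bounded outcome, not a result specific to this chapter), let \pi = \Pr(D = 1) and suppose Y is known to lie in [y_{\min}, y_{\max}]. The average treatment effect \Delta = E[Y(1)] - E[Y(0)] must then satisfy

\[
E[Y \mid D{=}1]\,\pi + y_{\min}(1-\pi) - E[Y \mid D{=}0](1-\pi) - y_{\max}\,\pi
\;\le\; \Delta \;\le\;
E[Y \mid D{=}1]\,\pi + y_{\max}(1-\pi) - E[Y \mid D{=}0](1-\pi) - y_{\min}\,\pi,
\]

an interval whose width equals y_{\max} - y_{\min} and which the data alone, without further assumptions, can never shrink to a point.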
Finally, we will consider the related approach known as sensitivity analysis. Here, the analyst offers an estimate based on the provisional maintenance of an identifying assumption – most often, ignorability or selection on the observables – and then assesses the extent to which the estimate would vary as violations of the identifying assumption increase in severity.
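A minimal sketch of that exercise, using a simple linear omitted-variable bias formula rather than any particular method from the sensitivity-analysis literature (the baseline estimate, the confounder effect delta, and the prevalence gap prev_gap below are all hypothetical):

```python
import numpy as np

naive_estimate = 2.0          # hypothetical estimate maintained under ignorability

# Hypothetical sensitivity parameters:
#   delta    : effect of an unobserved binary confounder U on the outcome
#   prev_gap : Pr(U = 1 | treated) - Pr(U = 1 | control)
for delta in [0.5, 1.0, 2.0, 4.0]:
    for prev_gap in [0.1, 0.3, 0.5]:
        bias = delta * prev_gap                 # simple linear omitted-variable bias
        adjusted = naive_estimate - bias
        print(f"delta={delta:4.1f}  prev_gap={prev_gap:.1f}  "
              f"adjusted estimate={adjusted:5.2f}")
```

Reading down the table shows how severe a violation of the identifying assumption would have to be before the substantive conclusion changes sign or becomes negligible.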
In this chapter, we will lay the groundwork for our presentation of three strategies to estimate causal effects when simple conditioning on observed variables that lie along back-door paths will not suffice. These strategies will be taken up in Chapters 9, 10, and 11, where we will explain instrumental variable estimators, front-door identification with causal mechanisms, and conditioning estimators that use data on pretreatment values of the outcome variable. Under very specific assumptions, these three strategies will identify average causal effects of interest, even though selection is on the unobservables and treatment assignment is nonignorable.
In this chapter, we will first review the related concepts of nonignorable treatment assignment and selection on the unobservables, using the directed graphs presented in prior chapters. To deepen the understanding of these concepts, we will then demonstrate why the usage of additional posttreatment data on the outcome of interest is unlikely to aid in the point identification of the treatment effects of most central concern. One indirect goal of this demonstration is to convince the reader that oft-heard claims such as “I would be able to establish that this association is causal if I had longitudinal data” are nearly always untrue if the longed-for longitudinal data are additional measurements taken only after treatment exposure. Instead, longitudinal data are most useful, as we will later explain in detail in Chapter 11, when pretreatment measures are available for those who are subsequently exposed to the treatment.
As discussed in previous chapters, the fundamental challenge of causal inference is that an individual cannot be simultaneously observed in both the treatment and control states. In some situations, however, it is possible to observe the same individual or unit of observation in the treatment and control states at different points in time. If the potential outcomes do not evolve in time for reasons other than the treatment, then the causal effect of a treatment can be estimated as the difference between an individual's observed outcome in the control state at time 1 and the same individual's observed outcome in the treatment state at time 2. The assumption that potential outcomes are stable in time (and thus age for individuals) is often heroic. If, however, potential outcomes evolve in a predictable way, then it may be possible to use the longitudinal structure of the data to predict the counterfactual outcomes of each individual.
We begin our discussion with the interrupted time series (ITS) design, which we introduced already with the example of the year of the fire horse in Section 2.8.1. The ITS design is the simplest case where the goal is to determine the degree to which a treatment shifts the underlying trajectory of an outcome.
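As a sketch of that logic (with simulated data and an assumed linear pre-treatment trend, not the fire-horse example itself), an ITS analysis fits the trajectory on the pre-treatment observations, projects it forward as the counterfactual, and reads the effect off the gap:

```python
import numpy as np

rng = np.random.default_rng(2)
t = np.arange(40)                       # time periods
t0 = 25                                 # treatment begins at period 25 (hypothetical)

# Simulated outcome: linear trend plus a downward shift of 3 after treatment.
y = 5.0 + 0.4 * t + rng.normal(0, 0.5, t.size)
y[t >= t0] -= 3.0

# Fit the pre-treatment trajectory and project it into the post-treatment window.
slope, intercept = np.polyfit(t[t < t0], y[t < t0], 1)
counterfactual = intercept + slope * t[t >= t0]

effect = np.mean(y[t >= t0] - counterfactual)
print(f"estimated shift in the trajectory: {effect:.2f}")   # close to -3
```

The credibility of the estimate rests entirely on the assumption that the pre-treatment trajectory would have continued unchanged in the absence of the treatment.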
The rise of the counterfactual model to prominence has increased the popularity of data analysis routines that are most clearly useful for estimating the effects of causes. The matching estimators that we will review and explain in this chapter are perhaps the best example of a classic technique that has reemerged in the past three decades as a promising procedure for estimating causal effects. Matching represents an intuitive method for addressing causal questions, primarily because it pushes the analyst to confront the process of causal exposure as well as the limitations of available data. Accordingly, among social scientists who adopt a counterfactual perspective, matching methods are fast becoming an indispensable technique for prosecuting causal questions, even though they usually prove to be the beginning rather than the end of causal analysis on any particular topic.
We begin with a brief discussion of the past use of matching methods. Then, we present the fundamental concepts underlying matching, including stratification of the data, weighting to achieve balance, and propensity scores. Thereafter, we discuss how matching is usually undertaken in practice, including an overview of various matching algorithms.
In the course of presentation, we will offer four hypothetical examples that demonstrate some of the essential claims of the matching literature, progressing from idealized examples of stratification and weighting to the implementation of alternative matching algorithms on simulated data for which the treatment effects of interest are known by construction.
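To preview the core ideas of stratification and weighting in code (all data simulated; variable names, the quintile cut points, and the logistic propensity model are illustrative assumptions), the sketch below estimates propensity scores, forms strata on them, computes treated-control contrasts within strata, and averages those contrasts with weights proportional to stratum size:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)
n = 50_000

# Simulated data: x drives both treatment assignment and the outcome; true effect = 1.
x = rng.normal(size=n)
d = rng.binomial(1, 1 / (1 + np.exp(-x)))
y = 1.0 * d + 2.0 * x + rng.normal(size=n)

# Estimate propensity scores and cut them into quintile strata.
e = LogisticRegression().fit(x.reshape(-1, 1), d).predict_proba(x.reshape(-1, 1))[:, 1]
edges = np.quantile(e, [0.2, 0.4, 0.6, 0.8])
stratum = np.digitize(e, edges)

# Within-stratum contrasts, averaged with stratum-size weights.
estimate = 0.0
for s in range(5):
    m = stratum == s
    contrast = y[m & (d == 1)].mean() - y[m & (d == 0)].mean()
    estimate += contrast * m.mean()

naive = y[d == 1].mean() - y[d == 0].mean()
print(f"naive contrast:      {naive:.2f}")
print(f"stratified estimate: {estimate:.2f}")  # much closer to the true effect of 1.0
```

Even this crude five-stratum version removes most of the confounding in the naive contrast, which is the intuition behind the full matching estimators developed in the chapter.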
Providing a novel approach to sparsity, this comprehensive book presents the theory of stochastic processes that are ruled by linear stochastic differential equations, and that admit a parsimonious representation in a matched wavelet-like basis. Two key themes are the statistical property of infinite divisibility, which leads to two distinct types of behaviour - Gaussian and sparse - and the structural link between linear stochastic processes and spline functions, which is exploited to simplify the mathematical analysis. The core of the book is devoted to investigating sparse processes, including a complete description of their transform-domain statistics. The final part develops practical signal-processing algorithms that are based on these models, with special emphasis on biomedical image reconstruction. This is an ideal reference for graduate students and researchers with an interest in signal/image processing, compressed sensing, approximation theory, machine learning, or statistics.
The stochastic processes that we wish to characterize are those generated by linear transformation of non-Gaussian white noise. If we were operating in the discrete domain and restricting ourselves to a finite number of dimensions, we would be able to use any sequence of i.i.d. random variables w_n as input to the linear system and rely on conventional multivariate statistics to characterize the output. This strongly suggests that specifying the mixing matrix (L^{-1}) and the probability density function (pdf) of the innovation is sufficient to obtain a complete description of a linear stochastic process, at least in the discrete setting.
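In that discrete, finite-dimensional setting the recipe can be written down directly: pick an i.i.d. innovation sequence, pick a mixing matrix, and apply it. The sketch below is illustrative only; it uses a lower-triangular summation matrix as L^{-1}, so the output is a discrete random-walk-type process driven by either Gaussian or heavy-tailed (Laplace) innovations, and only the latter choice produces sparse, spiky increments.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 1024

# Mixing matrix L^{-1}: a cumulative-sum (discrete integration) operator.
L_inv = np.tril(np.ones((n, n)))

# Two i.i.d. innovation sequences with unit variance.
w_gauss = rng.normal(0, 1, n)
w_laplace = rng.laplace(0, 1 / np.sqrt(2), n)    # heavy-tailed, hence "sparse"

s_gauss = L_inv @ w_gauss        # discrete Brownian-motion-like path
s_sparse = L_inv @ w_laplace     # discrete Levy-type path with occasional jumps

# The innovation pdf shows up in the tails: compare excess kurtosis of increments.
for name, s in [("Gaussian", s_gauss), ("Laplace", s_sparse)]:
    incr = np.diff(s)
    k = np.mean((incr - incr.mean()) ** 4) / np.var(incr) ** 2 - 3
    print(f"{name:8s} innovations: excess kurtosis of increments = {k:5.2f}")
```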
But our goal is more ambitious since we place ourselves in the context of continuously defined processes. The situation is then not quite as straightforward because: (1) we are dealing with infinite-dimensional objects, (2) it is much harder to properly define the notion of continuous-domain white noise, and (3) there are theoretical restrictions on the class of admissible innovations. While this calls for advanced mathematical machinery, the payoff is that the continuous-domain formalism lends itself better to analytical computations, by virtue of the powerful tools of functional and harmonic analysis. Another benefit is that the non-Gaussian members of the family are necessarily sparse as a consequence of the theory, which rests upon the powerful characterization and existence theorems of Lévy–Khintchine, Minlos, Bochner, and Gelfand–Vilenkin.
As in the subsequent chapters, we start by providing some intuition in the first section and then proceed with a more formal characterization.
In order to obtain an uncoupled representation of a sparse process s = L^{-1}w of the type described in the previous chapters, it is essential that we somehow invert the integral operator L^{-1}. The ideal scenario would be to apply the differential operator L = (L^{-1})^{-1} to uncover the innovation w that is independent at every point. Unfortunately, this is not feasible in practice because we do not have access to the signal s(r) over the entire domain r ∈ ℝ^d, but only to its sampled values on a lattice or, more generally, to a series of coefficients in some appropriate basis. Our analysis options, as already alluded to in Chapter 2, are essentially twofold: the application of a discrete version of the operator, or an operator-like wavelet analysis.
This chapter is devoted to the in-depth investigation of these two modes of representation. As with the other chapters, we start with a concrete example (Lévy process) to lay down the key ideas in Section 8.1. Our primary tool for deriving the transform-domain pdfs is the characteristic functional, reviewed in Section 8.2 and properly extended so that it can handle arbitrary analysis functions φ ∈ L_p(ℝ^d). In Section 8.3, we investigate the decoupling ability of finite-difference-type operators and determine the statistical distribution of the resulting generalized increments. In Section 8.4, we show how a sparse process can be expanded in a matched-wavelet basis and provide the complete multivariate description of the transform-domain statistics, including general formulas for the wavelet cumulants.
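The first of these two analysis options is easy to preview numerically. For a discrete Lévy-process-like signal built by summing i.i.d. innovations, the matched finite-difference operator recovers increments that are again independent, which can be checked informally through the sample autocorrelation. The construction below is an illustrative sketch under those assumptions, not the book's own code.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 4096

w = rng.laplace(0, 1.0, n)       # i.i.d. non-Gaussian innovation (illustrative)
s = np.cumsum(w)                 # discrete analogue of a Levy process: s = L^{-1} w

u = np.diff(s)                   # finite difference: discrete analogue of applying L

def lag1_autocorr(x):
    x = x - x.mean()
    return np.dot(x[:-1], x[1:]) / np.dot(x, x)

print(f"lag-1 autocorrelation of s (strongly coupled samples): {lag1_autocorr(s):+.3f}")
print(f"lag-1 autocorrelation of the increments (decoupled):   {lag1_autocorr(u):+.3f}")
```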
Sparsity: Occam's razor of modern signal processing?
The hypotheses of Gaussianity and stationarity play a central role in the standard statistical formulation of signal processing. They fully justify the use of the Fourier transform as the optimal signal representation and naturally lead to the derivation of optimal linear filtering algorithms for a large variety of statistical estimation tasks. This classical view of signal processing is elegant and reassuring, but it is not at the forefront of research anymore.
Starting with the discovery of the wavelet transform in the late 1980s [Dau88, Mal89], researchers in signal processing have progressively moved away from the Fourier transform and have uncovered powerful alternatives. Consequently, they have ceased modeling signals as Gaussian stationary processes and have adopted a more deterministic, approximation-theoretic point of view. The key developments that are presently reshaping the field, and which are central to the theory presented in this book, are summarized below.
• Novel transforms and dictionaries for the representation of signals. New redundant and non-redundant representations of signals (wavelets, local cosine, curvelets) have emerged since the mid-1990s and have led to better algorithms for data compression, data processing, and feature extraction. The most prominent example is the wavelet-based JPEG-2000 standard for image compression [CSE00], which outperforms the widely used JPEG method based on the DCT (discrete cosine transform). Another illustration is wavelet-domain image denoising, which provides a good alternative to more traditional linear filtering [Don95] (a toy thresholding sketch follows below). The various dictionaries of basis functions that have been proposed so far are tailored to specific types of signals; there does not appear to be one that fits all.
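As a toy illustration of that wavelet-domain denoising idea, the sketch below applies a single-level orthonormal Haar transform and soft-thresholds the detail coefficients; it is written out by hand so that no particular wavelet library is assumed, and the test signal, noise level, and threshold rule are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 1024
sigma = 0.5

# Piecewise-constant test signal plus Gaussian noise (illustrative).
clean = np.where(np.arange(n) < n // 2, 1.0, -1.0)
noisy = clean + rng.normal(0, sigma, n)

# Single-level orthonormal Haar transform.
approx = (noisy[0::2] + noisy[1::2]) / np.sqrt(2)
detail = (noisy[0::2] - noisy[1::2]) / np.sqrt(2)

# Soft-threshold the detail coefficients (universal threshold).
thr = sigma * np.sqrt(2 * np.log(n))
detail = np.sign(detail) * np.maximum(np.abs(detail) - thr, 0.0)

# Inverse Haar transform.
denoised = np.empty(n)
denoised[0::2] = (approx + detail) / np.sqrt(2)
denoised[1::2] = (approx - detail) / np.sqrt(2)

print(f"noisy    RMSE: {np.sqrt(np.mean((noisy - clean) ** 2)):.3f}")
print(f"denoised RMSE: {np.sqrt(np.mean((denoised - clean) ** 2)):.3f}")
```

Because the clean signal is sparse in the Haar detail band while the noise is not, thresholding suppresses noise with little distortion of the signal, which is the essence of the sparsity-based alternative to linear filtering.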