Abstract. Epidemiologic methods were developed to prove general causation: identifying exposures that increase the risk of particular diseases. Courts often are more interested in specific causation: On balance of probabilities, was the plaintiff's disease caused by exposure to the agent in question? Some authorities have suggested that a relative risk greater than 2.0 meets the standard of proof for specific causation. Such a definite criterion is appealing, but there are difficulties. Bias and confounding are familiar problems; and individual differences must also be considered. The issues are explored in the context of the swine flu vaccine and Guillain-Barré syndrome. The conclusion: There is a considerable gap between relative risks and proof of specific causation.
Introduction
This article discusses the role of epidemiologic evidence in toxic tort cases, especially relative risk: Does a relative risk above 2.0 show specific causation? Relative risk compares groups in an epidemiologic study: One group is exposed to some hazard, like a toxic substance; the other “control” group is not exposed. For present purposes, relative risk is the ratio
RR = Observed/Expected.
The numerator in this fraction is the number of injuries observed in the exposed group. The expected number in the denominator is computed on the theory that exposure has no effect, so that injury rates in the exposed group should be the same as injury rates in the control group.
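As a minimal numerical sketch of this ratio, consider the hypothetical counts below; they are not drawn from any study discussed in the chapter.

```python
# Minimal sketch of the relative-risk calculation described above.
# The counts are hypothetical, purely for illustration.

exposed_cases = 40      # injuries observed in the exposed group
exposed_n = 10_000      # size of the exposed group
control_cases = 20      # injuries observed in the control group
control_n = 10_000      # size of the control group

# Expected cases in the exposed group if exposure had no effect,
# i.e., if the exposed group had the control group's injury rate.
expected = control_cases / control_n * exposed_n

rr = exposed_cases / expected            # RR = Observed / Expected
print(f"relative risk = {rr:.2f}")       # 2.00 in this hypothetical example
```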
“Son, no matter how far you travel, or how smart you get, always remember this: Someday, somewhere, a guy is going to show you a nice brand-new deck of cards on which the seal is never broken, and this guy is going to offer to bet you that the jack of spades will jump out of this deck and squirt cider in your ear. But, son, do not bet him, for as sure as you do you are going to get an ear full of cider.”
—Damon Runyon
Abstract. After sketching the conflict between objectivists and subjectivists on the foundations of statistics, this chapter discusses an issue facing statisticians of both schools, namely, model validation. Statistical models originate in the study of games of chance and have been successfully applied in the physical and life sciences. However, there are basic problems in applying the models to social phenomena; some of the difficulties will be pointed out. Hooke's law will be contrasted with regression models for salary discrimination, the latter being a fairly typical application in the social sciences.
What is probability?
For a contemporary mathematician, probability is easy to define, as a countably additive set function on a σ-field, with a total mass of one. This definition, perhaps cryptic for non-mathematicians, was introduced by A. N. Kolmogorov around 1930, and has been extremely convenient for mathematical work; theorems can be stated with clarity, and proved with rigor.
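In symbols, a standard statement of Kolmogorov's axioms (added here for reference) reads:

```latex
% P is a probability on a sigma-field \mathcal{F} of subsets of \Omega.
\begin{align*}
&P : \mathcal{F} \to [0,1], \qquad P(\Omega) = 1,\\
&P\Bigl(\bigcup_{i=1}^{\infty} A_i\Bigr) = \sum_{i=1}^{\infty} P(A_i)
\quad \text{whenever } A_1, A_2, \ldots \in \mathcal{F} \text{ are pairwise disjoint.}
\end{align*}
```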
Abstract. The U.S. Census is a sophisticated, complex undertaking, carried out on a vast scale. It is remarkably accurate. Statistical adjustment is unlikely to improve on the census, because adjustment can easily introduce more error than it takes out. The data suggest a strong geographical pattern to such errors, even after controlling for demographic variables, which contradicts basic premises of adjustment. In fact, the complex demographic controls built into the adjustment process seem on the whole to have been counterproductive.
Introduction
The census has been taken every ten years since 1790, and provides a wealth of demographic information for researchers and policy-makers. Beyond that, counts are used to apportion Congress and redistrict states. Moreover, census data are the basis for allocating federal tax money to cities and other local governments. For such purposes, the geographical distribution of the population matters more than counts for the nation as a whole. Data from 1990 and previous censuses suggested there would be a net undercount in 2000. Furthermore, the undercount would depend on age, race, ethnicity, gender, and, most importantly, geography. This differential undercount, with its implications for sharing power and money, attracted considerable attention in the media and the courthouse.
There were proposals to adjust the census by statistical methods, but this is advisable only if the adjustment gives a truer picture of the population and its geographical distribution.
Abstract. Statistical inference with convenience samples is a risky business. Technical issues and substantive issues overlap. No amount of statistical maneuvering can get very far without deep understanding of how the data were generated. Empirical generalizations from a single data set should be viewed with suspicion. Rather than ask what would happen in principle if the study were repeated, it is better to repeat the study, as is standard in physical science. Indeed, it is generally impossible to predict variability across replications of an experiment without replicating the experiment, just as it is generally impossible to predict the effect of intervention without actually intervening.
Introduction
Researchers who study punishment and social control, like those who study other social phenomena, typically seek to generalize their findings from the data they have to some larger context: In statistical jargon, they generalize from a sample to a population. Generalizations are one important product of empirical inquiry. Of course, the process by which the data are selected introduces uncertainty. Indeed, any given data set is but one of many that could have been studied. If the data set had been different, the statistical summaries would have been different, and so would the conclusions, at least by a little.
Abstract. Regression adjustments are often made to experimental data to address confounders that may not be balanced by randomization. Since randomization does not justify the models, bias is likely; nor are the usual variance calculations to be trusted. Here, we evaluate regression adjustments using Neyman's non-parametric model. Previous results are generalized, and more intuitive proofs are given. A bias term is isolated, and conditions are given for unbiased estimation in finite samples.
Introduction
Data from randomized controlled experiments (including clinical trials) are often analyzed using regression models and the like. The behavior of the estimates can be calibrated using the non-parametric model in Neyman (1923), where each subject has potential responses to several possible treatments. Only one response can be observed, according to the subject's assignment; the other potential responses must then remain unobserved. Covariates are measured for each subject and may be entered into the regression, perhaps with the hope of improving precision by adjusting the data to compensate for minor imbalances in the assignment groups.
As discussed in Freedman (2006b [Chapter 17], 2008a), randomization does not justify the regression model, so that bias can be expected, and the usual formulas do not give the right variances. Moreover, regression need not improve precision. Here, we extend some of those results, with proofs that are more intuitive.
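The following sketch, on simulated data, illustrates the setup just described: each subject has two potential responses, only one is observed after random assignment, and a covariate-adjusted regression estimate is compared with the simple difference in means. It is an illustration of the framework, not a reproduction of the chapter's results; the data-generating step is made up.

```python
# Sketch of Neyman's potential-outcomes setup with a regression adjustment.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 1000

# Each subject has two potential responses; only one can be observed.
z = rng.normal(size=n)                            # a covariate
y_control = 1.0 + z**2 + rng.normal(size=n)       # response if untreated
y_treated = y_control + 2.0                       # response if treated (true effect = 2)

assign = rng.permutation(np.r_[np.ones(n // 2), np.zeros(n // 2)])  # randomization
y_obs = np.where(assign == 1, y_treated, y_control)

# Difference in means: unbiased for the average effect under randomization.
ate_dm = y_obs[assign == 1].mean() - y_obs[assign == 0].mean()

# Regression adjustment: regress the observed response on treatment and z.
X = sm.add_constant(np.column_stack([assign, z]))
ate_reg = sm.OLS(y_obs, X).fit().params[1]

print(ate_dm, ate_reg)   # both near 2 in this simulation
```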
Abstract. Endogeneity bias is an issue in regression models, including linear and probit models. Conventional methods for removing the bias have their own problems. The usual Heckman two-step procedure should not be used in the probit model: From a theoretical perspective, this procedure is unsatisfactory, and likelihood methods are superior. However, serious numerical problems occur when standard software packages try to maximize the biprobit likelihood function, even if the number of covariates is small. The log-likelihood surface may be nearly flat or may have saddle points with one small positive eigenvalue and several large negative eigenvalues. The conditions under which parameters in the model are identifiable are described; this produces novel results.
Introduction
Suppose a linear regression model describes responses to treatment and to covariates. If subjects self-select into treatment, the process being dependent on the error term in the model, endogeneity bias is likely. Similarly, we may have a linear model that is to be estimated on sample data; if subjects self-select into the sample, endogeneity becomes an issue.
Heckman (1978, 1979) suggested a simple and ingenious two-step method for taking care of endogeneity, which works under the conditions described in those papers. This method is widely used. Some researchers have applied the method to probit response models. However, the extension is unsatisfactory.
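For reference, here is a sketch of the classical two-step procedure in the linear-response, sample-selection case, on simulated data. The data-generating step and variable names are hypothetical, and the probit-response extension criticized in the chapter is not implemented here.

```python
# Sketch of Heckman's classical two-step correction for sample selection.
import numpy as np
import statsmodels.api as sm
from scipy.stats import norm

rng = np.random.default_rng(1)
n = 5000
z = rng.normal(size=n)                        # covariate driving selection only
u, v = rng.multivariate_normal([0, 0], [[1, 0.6], [0.6, 1]], size=n).T

x = rng.normal(size=n)
y = 1.0 + 2.0 * x + u                         # outcome equation
selected = (0.5 + z + v) > 0                  # selection: y observed only if selected

# Step 1: probit of selection on the selection covariates; inverse Mills ratio.
W = sm.add_constant(z)
probit = sm.Probit(selected.astype(int), W).fit(disp=0)
index = W @ probit.params
mills = norm.pdf(index) / norm.cdf(index)

# Step 2: OLS on the selected sample, adding the Mills ratio as a regressor.
X2 = sm.add_constant(np.column_stack([x[selected], mills[selected]]))
fit = sm.OLS(y[selected], X2).fit()
print(fit.params)   # intercept, slope on x, coefficient on the Mills ratio
```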
This book is about the coordinate-free, or geometric, approach to the theory of linear models; more precisely, Model I ANOVA and linear regression models with non-random predictors in a finite-dimensional setting. This approach is more insightful, more elegant, more direct, and simpler than the more common matrix approach to linear regression, analysis of variance, and analysis of covariance models in statistics. The book discusses the intuition behind and optimal properties of various methods of estimating and testing hypotheses about unknown parameters in the models. Topics covered range from linear algebra (inner product spaces, orthogonal projections, orthogonal subspaces) to Tjur experimental designs, basic distribution theory, the geometric version of the Gauss-Markov theorem, optimal and non-optimal properties of Gauss-Markov, Bayes, and shrinkage estimators under the assumption of normality, the optimal properties of the F-test, and the analysis of covariance and missing observations.
Drawing sound causal inferences from observational data is a central goal in social science. How to do so is controversial. Technical approaches based on statistical models—graphical models, non-parametric structural equation models, instrumental variable estimators, hierarchical Bayesian models, etc.—are proliferating. But David Freedman has long argued that these methods are not reliable. He demonstrated repeatedly that it can be better to rely on subject-matter expertise and to exploit natural variation to mitigate confounding and rule out competing explanations.
When Freedman first enunciated this position decades ago, many were skeptical. They found it hard to believe that a probabilist and mathematical statistician of his stature would favor “low-tech” approaches. But the tide is turning. An increasing number of social scientists now agree that statistical technique cannot substitute for good research design and subject-matter knowledge. This view is particularly common among those who understand the mathematics and have on-the-ground experience.
Historically, “shoe-leather epidemiology” is epitomized by intensive, door-to-door canvassing that wears out investigators' shoes. In contrast, advocates of statistical modeling sometimes claim that their methods can salvage poor research design or low-quality data. Some suggest that their algorithms are general-purpose inference engines: Put in data, turn the crank, out come quantitative causal relationships, no knowledge of the subject required.
This is tantamount to pulling a rabbit from a hat. Freedman's conservation of rabbits principle says “to pull a rabbit from a hat, a rabbit must first be placed in the hat.” In statistical modeling, assumptions put the rabbit in the hat.
Abstract. Regressions can be weighted by propensity scores in order to reduce bias. However, weighting is likely to increase random error in the estimates and to bias the estimated standard errors downward, even when selection mechanisms are well understood. Moreover, in some cases, weighting will increase the bias in estimated causal parameters. If investigators have a good causal model, it seems better just to fit the model without weights. If the causal model is improperly specified, there can be significant problems in retrieving the situation by weighting, although weighting may help under some circumstances.
Estimating causal effects is often the key to evaluating social programs, but the interventions of interest are seldom assigned at random. Observational data are therefore frequently encountered. In order to estimate causal effects from observational data, some researchers weight regressions using “propensity scores.” This simple and ingenious idea is due to Robins and his collaborators. If the conditions are right, propensity scores can be used to advantage when estimating causal effects.
However, weighting has been applied in many different contexts. The costs of misapplying the technique, in terms of bias and variance, can be serious. Many users, particularly in the social sciences, seem unaware of the pitfalls. Therefore, it may be useful to explain the idea and the circumstances under which it can go astray.
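A sketch of the basic idea, on simulated data: estimate propensity scores by logistic regression, form inverse-probability weights, and fit a weighted regression. The data-generating step and the effect size are invented for illustration.

```python
# Sketch of weighting a regression by estimated propensity scores
# (inverse-probability-of-treatment weights).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 5000
x = rng.normal(size=n)                                   # a confounder
p_treat = 1.0 / (1.0 + np.exp(-0.5 * x))                 # true propensity score
treat = rng.binomial(1, p_treat)
y = 1.0 + 2.0 * treat + x + rng.normal(size=n)           # true treatment effect = 2

# Step 1: estimate propensity scores by a logit of treatment on x.
X_ps = sm.add_constant(x)
ps = sm.Logit(treat, X_ps).fit(disp=0).predict(X_ps)

# Step 2: inverse-probability weights.
w = np.where(treat == 1, 1.0 / ps, 1.0 / (1.0 - ps))

# Step 3: weighted regression of the outcome on treatment.
fit = sm.WLS(y, sm.add_constant(treat), weights=w).fit()
print(fit.params)    # second entry estimates the average treatment effect
```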
Abstract. In his 1997 book, King announced “A Solution to the Ecological Inference Problem.” King's method may be tested with data where truth is known. In the test data, his method produces results that are far from truth, and diagnostics are unreliable. Ecological regression makes estimates that are similar to King's, while the neighborhood model is more accurate. His announcement is premature.
Introduction
Before discussing King (1997), we explain the problem of “ecological inference.” Suppose, for instance, that in a certain precinct there are 500 registered voters of whom 100 are Hispanic and 400 are non-Hispanic. Suppose too that a Hispanic candidate gets ninety votes in this precinct. (Such data would be available from public records.) We would like to know how many of the votes for the Hispanic candidate came from the Hispanics. That is a typical ecological-inference problem. The secrecy of the ballot box prevents a direct solution, so indirect methods are used.
This review will compare three methods for making ecological inferences. First and easiest is the “neighborhood model.” This model makes its estimates by assuming that, within a precinct, ethnicity has no influence on voting behavior: In the example, of the ninety votes for the Hispanic candidate, 90 × 100/(100 + 400) = 18 are estimated to come from the Hispanic voters. The second method to consider is “ecological regression,” which requires data on many precincts (indexed by i).
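The next sketch reproduces the neighborhood-model arithmetic from the example and adds a toy version of ecological (Goodman) regression across precincts; the multi-precinct data are invented purely for illustration.

```python
# Sketch of the neighborhood model and ecological regression.
import numpy as np

# Neighborhood model: within a precinct, ethnicity is assumed to have no
# influence on voting, so votes are allocated in proportion to group size.
hispanic, non_hispanic, votes = 100, 400, 90
from_hispanics = votes * hispanic / (hispanic + non_hispanic)
print(from_hispanics)   # 18.0, as in the text

# Ecological regression across many precincts i: regress the candidate's
# vote share on the Hispanic share of registered voters (hypothetical data).
rng = np.random.default_rng(3)
hisp_share = rng.uniform(0, 1, size=200)                         # x_i
vote_share = 0.2 + 0.5 * hisp_share + rng.normal(0, 0.05, 200)   # y_i
slope, intercept = np.polyfit(hisp_share, vote_share, 1)
# intercept ~ support among non-Hispanics; intercept + slope ~ support among Hispanics
print(intercept, intercept + slope)
```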
Abstract. The logit model is often used to analyze experimental data. However, randomization does not justify the model, so the usual estimators can be inconsistent. A consistent estimator is proposed. Neyman's non-parametric setup is used as a benchmark. In this setup, each subject has two potential responses, one if treated and the other if untreated; only one of the two responses can be observed. Besides the mathematics, there are simulation results, a brief review of the literature, and some recommendations for practice.
Introduction
The logit model is often fitted to experimental data. As explained below, randomization does not justify the assumptions behind the model. Thus, the conventional estimator of log odds is difficult to interpret; an alternative will be suggested. Neyman's setup is used to define parameters and prove results. (Grammatical niceties apart, the terms “logit model” and “logistic regression” are used interchangeably.)
After explaining the models and estimators, we present simulations to illustrate the findings. A brief review of the literature describes the history and current usage. Some practical recommendations are derived from the theory. Analytic proofs are sketched at the end of the chapter.
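The contrast at issue can be illustrated on simulated data: the coefficient from a conventional logit fit versus a plug-in estimate of the log odds formed from average predicted probabilities. This is a sketch under assumed data-generating conditions, not the chapter's analysis.

```python
# Conventional logit coefficient vs. a plug-in log-odds estimate.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 2000
x = rng.normal(size=n)                                    # covariate
treat = rng.binomial(1, 0.5, size=n)                      # randomized assignment
p = 1 / (1 + np.exp(-(-1.0 + 1.0 * treat + 2.0 * x)))     # heterogeneous responses
y = rng.binomial(1, p)

X = sm.add_constant(np.column_stack([treat, x]))          # columns: const, treat, x
fit = sm.Logit(y, X).fit(disp=0)
coef_log_odds = fit.params[1]                             # conventional estimator

# Plug-in: average predicted probabilities with everyone treated / untreated,
# then take the difference in log odds of those averages.
X1 = np.column_stack([np.ones(n), np.ones(n), x])
X0 = np.column_stack([np.ones(n), np.zeros(n), x])
p1, p0 = fit.predict(X1).mean(), fit.predict(X0).mean()
plug_in_log_odds = np.log(p1 / (1 - p1)) - np.log(p0 / (1 - p0))

print(coef_log_odds, plug_in_log_odds)
```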
Abstract. King's “solution” works with some data sets and fails with others. As a theoretical matter, inferring the behavior of subgroups from aggregate data is generally impossible: The relevant parameters are not identifiable. Unfortunately, King's diagnostics do not discriminate between probable successes and probable failures. Caution would seem to be in order.
Introduction
King (1997) proposed a method for ecological inference and made sweeping claims about its validity. According to King, his method provided realistic estimates of uncertainty, with diagnostics capable of detecting failures in assumptions. He also claimed that his method was robust, giving correct inferences even when the model is wrong.
Our review (Freedman, Klein, Ostland, and Roberts 1998 [Chapter 5]) showed that the claims were exaggerated. King's method works if its assumptions hold. If the assumptions fail, estimates are unreliable, and so are the internally generated estimates of uncertainty. His diagnostics do not distinguish between cases where his method works and cases where it fails. King (1999) raised various objections to our review. After summarizing the issues, we will respond to his main points and a few of the minor ones. The objections have little substance.
Model comparisons
Our review compared King's method to ecological regression and the neighborhood model. In our test data, the neighborhood model was the most accurate, while King's method was no better than ecological regression. To implement King's method, we used his software package EZIDOS, which we downloaded from his web site.
David A. Freedman presents in this book the foundations of statistical models and their limitations for causal inference. Examples, drawn from political science, public policy, law, and epidemiology, are real and important.
A statistical model is a set of equations that relate observable data to underlying parameters. The parameters are supposed to characterize the real world. Formulating a statistical model requires assumptions. Rarely are those assumptions tested. Indeed, some are untestable in principle, as Freedman shows in this volume. Assumptions are involved in choosing which parameters to include, the functional relationship between the data and the parameters, and how chance enters the model. It is common to assume that the data are a simple function of one or more parameters, plus random error. Linear regression is often used to estimate those parameters. More complicated models are increasingly common, but all models are limited by the validity of the assumptions on which they ride.
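A minimal sketch of such a model, with made-up numbers: the data are assumed to follow a straight line plus random error, and least squares estimates the two parameters.

```python
# Minimal sketch: data assumed to be a simple function of parameters plus
# random error, with the parameters estimated by linear regression.
import numpy as np

rng = np.random.default_rng(5)
x = rng.uniform(0, 10, size=100)
y = 3.0 + 1.5 * x + rng.normal(0, 1, size=100)   # assumed form: y = a + b*x + error

slope, intercept = np.polyfit(x, y, 1)           # least-squares estimates of b and a
print(intercept, slope)                          # roughly 3.0 and 1.5
```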
Freedman's observation that statistical models are fragile pervades this volume. Modeling assumptions—rarely examined or even enunciated—fail in ways that undermine model-based causal inference. Because of their unrealistic assumptions, many new techniques constitute not progress but regress. Freedman advocates instead “shoe leather” methods, which identify and exploit natural variation to mitigate confounding and which require intimate subject-matter knowledge to develop appropriate research designs and eliminate rival explanations.
Abstract. One type of scientific inquiry involves the analysis of large data sets, often using statistical models and formal tests of hypotheses. Large observational studies have, for example, led to important progress in health science. However, in fields ranging from epidemiology to political science, other types of scientific inquiry are also productive. Informal reasoning, qualitative insights, and the creation of novel data sets that require deep substantive knowledge and a great expenditure of effort and shoe leather have pivotal roles. Many breakthroughs came from recognizing anomalies and capitalizing on accidents, which require immersion in the subject. Progress means refuting old ideas if they are wrong, developing new ideas that are better, and testing both. Qualitative insights can play a key role in all three tasks. Combining the qualitative and the quantitative (and a healthy dose of skepticism) may provide the most secure results.
One type of scientific inquiry involves the analysis of large data sets, often using statistical models and formal tests of hypotheses. A moment's thought, however, shows that there must be other types of scientific inquiry. For instance, something has to be done to answer questions like the following. How should a study be designed? What sorts of data should be collected? What kind of a model is needed? Which hypotheses should be formulated in terms of the model and then tested against the data?
Abstract. Graphical models for causation can be set up using fewer hypothetical counterfactuals than are commonly employed. Invariance of error distributions may be essential for causal inference, but the errors themselves need not be invariant. Graphs can be interpreted using conditional distributions so that one can better address connections between the mathematical framework and causality in the world. The identification problem is posed in terms of conditionals. As will be seen, causal relationships cannot be inferred from a data set by running regressions unless there is substantial prior knowledge about the mechanisms that generated the data. There are few successful applications of graphical models, mainly because few causal pathways can be excluded on a priori grounds. The invariance conditions themselves remain to be assessed.
In this chapter, I review the logical basis for inferring causation from regression equations, proceeding by example. The starting point is a simple regression, next is a path model, and then simultaneous equations (for supply and demand). After that come nonlinear graphical models.
The key to making a causal inference from nonexperimental data by regression is some kind of invariance, exogeneity being a further issue. Parameters need to be invariant to interventions. This well-known condition will be stated here with a little more precision than is customary. Invariance is also needed for errors or error distributions, a topic that has attracted less attention.