I test several hypotheses concerning the origins of political repression in the states of the United States. The hypotheses are drawn from the elitist theory of democracy, which asserts that repression of unpopular political minorities stems from the intolerance of the mass public, the generally more tolerant elites not supporting such repression. Focusing on the repressive legislation adopted by the states during the McCarthy era, I examine the relationships between elite and mass opinion and repressive public policy. Generally it seems that elites, not masses, were responsible for the repression of the era. These findings suggest that the elitist theory of democracy is in need of substantial theoretical reconsideration, as well as further empirical investigation.
Over three decades of research on citizen willingness to “put up with” political differences has led to the conclusion that the U.S. public is remarkably intolerant. Though the particular political minority that is salient enough to attract the wrath of the public may oscillate over time between the Left and the Right (e.g., Sullivan, Piereson, and Marcus 1982), generally, to be much outside the centrist mainstream of U.S. politics is to incur a considerable risk of being the object of mass political intolerance.
This book is primarily intended for advanced undergraduates or beginning graduate students in statistics. It should also be of interest to many students and professionals in the social and health sciences. Although written as a textbook, it can be read on its own. The focus is on applications of linear models, including generalized least squares, two-stage least squares, probits and logits. The bootstrap is explained as a technique for estimating bias and computing standard errors.
The contents of the book can fairly be described as what you have to know in order to start reading empirical papers that use statistical models. The emphasis throughout is on the connection—or lack of connection—between the models and the real phenomena. Much of the discussion is organized around published studies; the key papers are reprinted for ease of reference. Some observers may find the tone of the discussion too skeptical. If you are among them, I would make an unusual request: suspend belief until you finish reading the book. (Suspension of disbelief is all too easily obtained, but that is a topic for another day.)
The first chapter contrasts observational studies with experiments, and introduces regression as a technique that may help to adjust for confounding in observational studies. There is a chapter that explains the regression line, and another chapter with a quick review of matrix algebra. (At Berkeley, half the statistics majors need these chapters.)
1. In table 1, there were 837 deaths from other causes in the total treatment group (screened plus refused) and 879 in the control group. Not much different.
Comments. (i) Groups are the same size, so we can look at numbers or rates. (ii) The difference in number of deaths is relatively small, and not statistically significant.
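(iii) A back-of-the-envelope check (ours, treating the two counts as independent Poisson variables): the difference is 879 − 837 = 42 deaths, with standard error roughly √(837 + 879) ≈ 41, so the gap is about one standard error, well within chance variation.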
2. This comparison is biased. The control group includes women who would have accepted screening if they had been asked, and are therefore comparable to women in the screening group. But the control group also includes women who would have refused screening. The latter are poorer, less well educated, less at risk from breast cancer. (A comparison that includes only the subjects who follow the investigators' treatment plans is called “per protocol analysis,” and is generally biased.)
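To see the bias mechanically, here is a minimal simulation sketch in Python (all rates and proportions invented for illustration; numpy is the only dependency). Screening is given no effect at all, yet the per-protocol contrast comes out different from zero, because the women who accept screening have a different baseline risk from the control arm as a whole.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 200_000                        # women per arm (hypothetical trial)

    def draw_arm():
        # 2/3 of women would accept screening if offered; acceptors are
        # assumed to have a higher baseline death rate than refusers
        # (all numbers invented for illustration).
        accepts = rng.random(n) < 2 / 3
        p_death = np.where(accepts, 0.004, 0.002)
        deaths = rng.random(n) < p_death
        return accepts, deaths

    # Screening is given NO effect, so both arms are drawn identically.
    t_accepts, t_deaths = draw_arm()   # treatment arm (offered screening)
    _, c_deaths = draw_arm()           # control arm

    itt = t_deaths.mean() - c_deaths.mean()             # whole arm vs whole arm
    pp = t_deaths[t_accepts].mean() - c_deaths.mean()   # screened vs whole arm

    print(f"intention-to-treat difference: {itt:+.5f}")  # close to zero
    print(f"per-protocol difference:       {pp:+.5f}")   # spurious, about +0.0007

The intention-to-treat comparison of whole arm against whole arm stays near zero, as it should; the per-protocol comparison manufactures a difference out of selection alone.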
3. Natural experiment. The fact that the Lambeth Company moved its pipe (i) sets up the comparison with Southwark & Vauxhall (table 2) and (ii) makes it harder to explain the difference in death rates between the Lambeth customers and the Southwark & Vauxhall customers on the basis of some difference between the two groups—other than the water. For instance, people were generally not choosing between the two water companies on the basis of how the water tasted. If they had been, self-selection and confounding would be bigger issues. The change in water intake point is one basis for the view that the data could be analyzed as if they were from a randomized controlled experiment.
Every statistician and data analyst has to make choices. The need arises especially when data have been collected and it is time to think about which model to use to describe and summarise the data. Another choice, often, is whether all measured variables are important enough to be included, for example, to make predictions. Can we make life simpler by only including a few of them, without making the prediction significantly worse?
In this book we present several methods to help make these choices easier. Model selection is a broad area and it reaches far beyond deciding on which variables to include in a regression model.
Two generations ago, setting up and analysing a single model was already hard work, and one rarely went to the trouble of analysing the same data via several alternative models. Thus ‘model selection’ was not much of an issue, apart from perhaps checking the model via goodness-of-fit tests. In the 1970s and later, proper model selection criteria were developed and actively used. With modern computing, long lists of candidate models, whether thought through in advance or not, can now be fitted to a data set with unprecedented versatility and convenience. But this creates problems too. With a multitude of models fitted, it is clear that methods are needed that somehow summarise model fits.
Data can often be modelled in different ways. There might be simple approaches and more advanced ones that perhaps have more parameters. When many covariates are measured we could attempt to use them all to model their influence on a response, or only a subset of them, which would make it easier to interpret and communicate the results. For selecting a model among a list of candidates, Akaike's information criterion (AIC) is among the most popular and versatile strategies. Its essence is a penalised version of the attained maximum log-likelihood, for each model. In this chapter we shall see AIC at work in a range of applications, in addition to unravelling its basic construction and properties. Attention is also given to natural generalisations and modifications of AIC that in various situations aim at performing more accurately.
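As a minimal illustration of the penalised log-likelihood idea, here is a sketch in Python with simulated data (ours, not an example from the book), using the common sign convention AIC = −2·(maximised log-likelihood) + 2·(number of parameters), where smaller is better:

    import numpy as np

    rng = np.random.default_rng(1)
    n = 100
    x = np.linspace(0, 1, n)
    y = 1.0 + 2.0 * x + rng.normal(scale=0.3, size=n)  # true model: a straight line

    def aic(degree):
        # Fit a polynomial of the given degree by least squares.
        X = np.vander(x, degree + 1)
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ beta
        sigma2 = resid @ resid / n                # maximum-likelihood variance
        k = degree + 2                            # coefficients plus the variance
        loglik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)
        return -2 * loglik + 2 * k

    for d in range(5):
        print(f"degree {d}: AIC = {aic(d):.1f}")

With data generated from a straight line, the degree-1 fit typically attains the smallest AIC: higher degrees improve the likelihood slightly, but pay more in penalty than they gain in fit.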
Information criteria for balancing fit with complexity
In Chapter 1 various problems were discussed where the task of selecting a suitable statistical model, from a list of candidates, was an important ingredient. By necessity there are different model selection strategies, corresponding to different aims and uses associated with the selected model. Most (but not all) selection methods are defined in terms of an appropriate information criterion, a mechanism that uses data to give each candidate model a certain score; this then leads to a fully ranked list of candidate models, from the ostensibly best to the worst.
In this chapter we compare some information criteria with respect to consistency and efficiency, which are classical themes in model selection. The comparison is driven by a study of the ‘penalty’ applied to the maximised log-likelihood value, in a framework with increasing sample size. AIC is not strongly consistent, though it is efficient, while the opposite is true for the BIC. We also introduce Hannan and Quinn's criterion, which has properties similar to those of the BIC, while Mallows's Cp and Akaike's FPE behave like AIC.
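In one common sign convention (a hedged summary: ℓ_max(M) is the maximised log-likelihood of model M, p_M its number of parameters, n the sample size, and the model with the highest score is selected), the criteria differ only in their penalties:

    AIC(M) = 2 ℓ_max(M) − 2 p_M
    BIC(M) = 2 ℓ_max(M) − p_M log n
    HQ(M)  = 2 ℓ_max(M) − 2c p_M log log n,   with a constant c > 1

The BIC and Hannan–Quinn penalties grow with n, which is the source of their consistency; the fixed AIC penalty does not, which is why AIC retains a positive probability of selecting too large a model, but also why it adapts well when no finite true model exists.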
Comparing selectors: consistency, efficiency and parsimony
If we make the assumption that there exists one true model that generated the data and that this model is one of the candidate models, we would want the model selection method to identify this true model. This is related to consistency. A model selection method is weakly consistent if, with probability tending to one as the sample size tends to infinity, the selection method is able to select the true model from the candidate models. Strong consistency is obtained when the selection of the true model happens almost surely. Often, we do not wish to make the assumption that the true model is amongst the candidate models. If instead we are willing to assume that there is a candidate model that is closest in Kullback–Leibler distance to the true model, we can state weak consistency as the property that, with probability tending to one, the model selection method picks such a closest model.
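In symbols, writing M̂_n for the model selected at sample size n and M_0 for the true (or Kullback–Leibler closest) candidate, a compact restatement:

    weak consistency:    P(M̂_n = M_0) → 1 as n → ∞
    strong consistency:  M̂_n = M_0 eventually, almost surely
    KL distance:         KL(f, g) = ∫ f(y) log{ f(y)/g(y) } dy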
This book is about making choices. If there are several possibilities for modelling data, which should we take? If multiple explanatory variables are measured, should they all be used when forming predictions, making classifications, or summarising what influences the response variables, or will including only a few of them work equally well, or better? If so, which ones should we include? Model selection problems arise in many forms and on widely varying occasions. In this chapter we present some data examples and discuss some of the questions they lead to. Later in the book we come back to these data and suggest some answers. A short preview of what is to come in later chapters is also provided.
Introduction
With data collection becoming ever easier and cheaper in many fields of applied science, there is a growing need for methods that point to interesting, important features of the data and that help to build a model. The model we wish to construct should be rich enough to explain relations in the data, but on the other hand simple enough to understand, explain to others, and use. It is when we negotiate this balance that model selection methods come into play. They provide formal support to guide data users in their search for good models, or for determining which variables to include when making predictions and classifications.
The model selection methods presented earlier (such as AIC and the BIC) have one thing in common: they select one single ‘best model’, which should then be used to explain all aspects of the mechanisms underlying the data and predict all future data points. The tolerance discussion in Chapter 5 showed that sometimes one model is best for estimating one type of estimand, whereas another model is best for another estimand. The point of view expressed via the focussed information criterion (FIC) is that a ‘best model’ should depend on the parameter under focus, such as the mean, or the variance, or the particular covariate values, etc. Thus the FIC allows and encourages different models to be selected for different parameters of interest.
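Schematically (our sketch of the idea, not the book's exact formula): each candidate model M yields an estimator μ̂_M of the focus parameter μ, and the FIC ranks models by an estimate of the mean squared error

    mse(μ̂_M) = Var(μ̂_M) + {bias(μ̂_M)}²,

so that a small model may win for one focus, trading a little bias for much less variance, while a bigger model wins for another.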
Estimators and notation in submodels
In model selection applications there is a list of models to consider. We shall assume here that there is a ‘smallest’ and a ‘biggest’ model among these, and that the others lie between these two extremes. More concretely, there is a narrow model, which is the simplest model that we possibly might use for the data, having an unknown parameter vector θ of length p. Secondly, in the wide model, the largest model that we consider, there are an additional q parameters γ = (γ1, …, γq). We assume that the narrow model is a special case of the wide model, which means that there is a value γ0 such that with γ = γ0 in the wide model, we get precisely the narrow model.
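A concrete illustration (ours, for orientation): in a regression setting the narrow model might be y ~ N(β0 + β1x, σ²), with θ = (β0, β1, σ) and p = 3, while the wide model adds q further covariates z1, …, zq with coefficients γ = (γ1, …, γq); setting γ0 = (0, …, 0) switches these covariates off and recovers the narrow model exactly.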