This chapter uses the history of polling to explain how pollsters have dealt with the challenges of nonresponse. It tells the tale of three polling paradigms: large-scale polling, quota sampling, and random sampling. The first two paradigms came crashing down after pollsters made poor predictions for presidential elections. The third paradigm remains vibrant intellectually but is increasingly difficult to implement. We do not yet know whether the bad polling predictions in 2016 and 2020 will push the field to a new paradigm, but they certainly raised doubts about the current state of the field.
This chapter focuses on next-generation selection models that expand on the Heckman model, using copula and control-function approaches to estimate selection models for a much wider range of statistical distributions. It also shows how to generate weights that account for nonignorable nonresponse: not only do these weights increase the weight on demographic groups that respond with lower probabilities, they also increase the weights on people whose opinions may make them less inclined to respond. Finally, the chapter shows how to modify a Heckman model to estimate a nonignorable nonresponse selection model when we have a response-related variable that is available only for people in the survey sample.
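As a rough illustration of what such weights do (a simulation sketch of our own, not code from the book), the following assumes response depends on both a demographic group and the opinion itself; because the data are simulated, the true response propensities are known, whereas in practice they would have to be estimated with a selection model of the kind described above.

    import numpy as np

    rng = np.random.default_rng(0)
    N = 100_000
    young = rng.binomial(1, 0.5, N)                   # demographic group
    support = rng.binomial(1, 0.4 + 0.2 * young, N)   # opinion of interest

    # Response depends on the demographic AND on the opinion itself.
    p_resp = 0.5 - 0.2 * young - 0.15 * support
    resp = rng.binomial(1, p_resp, N).astype(bool)

    # Demographic-only weights: inverse of the response rate within each group.
    rate = {g: resp[young == g].mean() for g in (0, 1)}
    w_demo = np.array([1 / rate[g] for g in young[resp]])

    # "Nonignorable" weights: inverse of the full response propensity, known
    # here only because the data are simulated; in practice it would come
    # from a fitted selection model.
    w_full = 1 / p_resp[resp]

    print(f"truth        {support.mean():.3f}")
    print(f"unweighted   {support[resp].mean():.3f}")
    print(f"demo weights {np.average(support[resp], weights=w_demo):.3f}")
    print(f"full weights {np.average(support[resp], weights=w_full):.3f}")

In this simulation the demographic-only weights fix the age imbalance but not the opinion-driven nonresponse, while the full propensity weights recover the population value.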
This is a brief conclusion arguing that the direction forward is clear, even if the path is not. The time for assuming away problems is past. We should begin with a paradigm that reflects all the ways that polling can go wrong and then identify, model, and measure all the sources of bias, not just the ones that are easy to fix. Much work remains to be done, though, as these new models and data sources will require considerable evaluation and development theoretically, empirically, and practically. The payoff will be that survey researchers can remain true to their aspiration of using information about a small number of people to understand the realities of many people, even as it gets harder to hear from anyone, let alone the random samples that our previous theory relied on.
This chapter explores ways to diagnose the potential for nonignorable nonresponse to cause problems. Section 7.1 describes how to define the range of population values that are consistent with the observed data. These calculations require virtually no assumptions and are robust to nonignorable nonresponse; they are simple but tend to be uninformative. Section 7.2 shows how to postulate possible levels of nonignorability and assess how results would change.
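As a hedged sketch of the kind of calculation Section 7.1 describes (the numbers are made up for illustration, not taken from the book), consider a poll where 60 percent of the sample responds and 55 percent of respondents support a candidate:

    # No-assumption bounds: the unobserved 40% could all be supporters or all
    # be opponents, which brackets the population value.
    resp_rate = 0.60          # share of the sample that responded (assumed)
    observed_support = 0.55   # support among respondents (assumed)

    lower = observed_support * resp_rate                     # nonrespondents all opposed
    upper = observed_support * resp_rate + (1 - resp_rate)   # nonrespondents all supportive
    print(f"bounds: [{lower:.2f}, {upper:.2f}]")              # [0.33, 0.73]

    # Section 7.2-style sensitivity: posit how much more or less supportive
    # nonrespondents might be, and see how the population estimate moves.
    for gap in (-0.10, -0.05, 0.0, 0.05, 0.10):
        nonresp_support = observed_support + gap
        total = resp_rate * observed_support + (1 - resp_rate) * nonresp_support
        print(f"nonrespondents {gap:+.2f} different -> population {total:.2f}")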
This chapter brings together the argument so far, showing how nonignorable nonresponse may manifest itself and how the various models perform across these contexts, including how they may fail. It also highlights the ideal response to potential nonignorable nonresponse: (1) creating randomized instruments, (2) using the randomized instrument to diagnose nonignorable nonresponse, (3) moving to conventional weights if there is no evidence of nonignorable nonresponse, or (4) using the selection models explained here when there is such evidence. Section 11.1 simulates and analyzes data across a range of scenarios using multiple methods. Section 11.2 discusses how to diagnose whether nonresponse is nonignorable. Section 11.3 integrates the approaches with a decision tree based on properties of the data. Section 11.4 discusses how selection models can fail.
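A schematic rendering of that four-step logic, in our own words rather than the book's, might look like this:

    def choose_adjustment(has_randomized_instrument: bool,
                          evidence_of_nonignorability: bool) -> str:
        """Schematic version of the chapter's decision tree (our paraphrase)."""
        if not has_randomized_instrument:
            return "add a randomized instrument so nonignorability can be tested"
        if not evidence_of_nonignorability:
            return "conventional weighting: nonresponse looks ignorable"
        return "selection model that exploits the randomized instrument"

    print(choose_adjustment(True, False))
    print(choose_adjustment(True, True))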
This chapter describes contemporary practices of probabilistic and nonprobabilistic pollsters. First, even pollsters who aspire to random sampling are doing something quite foreign to the random sampling paradigm, so continuing to use the language of random sampling is becoming increasingly untenable. Second, the energy and growth in polling are concentrated in nonprobabilistic polls that do not even pretend to adhere to the tenets of the random sampling paradigm; when we use, teach, and critique such polls, we need a new language for assessing them. Finally, one of the biggest vulnerabilities for both probabilistic and nonprobabilistic polling is nonignorable nonresponse, something largely ignored in the current state of the art. It is striking that, despite the incredible diversity of techniques currently deployed, academic and commercial pollsters mostly continue to use models that assume away nonignorable nonresponse.
This chapter illustrates how to use randomized response treatments to assess possible nonresponse bias. It focuses on a 2019 survey and shows how nonignorable nonresponse may have deflated Trump support in the Midwest and among Democrats even as nonignorable nonresponse inflated Trump support among Republicans. We also show that Democrats who responded to the poll were much more liberal on race than Democrats who did not respond, a pattern that was particularly strong among White Democrats and absent among non-White Democrats. Section 12.1 describes a survey design with a randomized response instrument. Section 12.2 discusses nonignorable nonresponse bias for turnout questions. Section 12.3 looks at presidential support, revealing regional and partisan differences in nonignorable nonresponse. Section 12.4 looks at race, focusing on partisan and racial differences in nonignorable nonresponse. Section 12.5 assesses nonignorable nonresponse on climate, taxes, and tariffs.
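To convey the design idea in miniature (a simulation sketch of our own; the book's 2019 survey and its specific instrument are not reproduced here), suppose a randomized incentive raises response rates while supporters of a candidate are less likely to respond:

    import numpy as np

    rng = np.random.default_rng(1)
    N = 200_000
    support = rng.binomial(1, 0.45, N)          # true opinion
    high_incentive = rng.binomial(1, 0.5, N)    # randomized instrument

    # Supporters are less likely to respond; the incentive lifts everyone.
    p_resp = 0.35 - 0.10 * support + 0.15 * high_incentive
    resp = rng.binomial(1, p_resp, N).astype(bool)

    for arm in (0, 1):
        sel = resp & (high_incentive == arm)
        print(f"arm {arm}: response rate {sel.sum() / (high_incentive == arm).sum():.2f}, "
              f"observed support {support[sel].mean():.3f}")
    # Because the incentive is randomized, a gap in observed support between
    # arms cannot come from who was sampled; it signals that the marginal
    # respondents differ, i.e., nonignorable nonresponse.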
This chapter presents the intuition behind why nonignorable nonresponse can be a problem and how it can arise in many contexts. With a foundation that explicitly centers this possibility, we can better reason through when the problem may be larger, how to diagnose it, and how to fix or at least ameliorate it. Section 5.1 describes qualitatively when nonignorable nonresponse may be likely. Section 5.2 works through the intuition about how and why nonignorable nonresponse undermines polling accuracy. Section 5.3 presents a framework for modeling nonignorable nonresponse and culminates by describing Meng’s (2018) model of sampling error. Section 5.4 raises the possibility that nonignorability varies across groups, over time, and even across questions.
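The core of that framework is an exact identity from Meng (2018): the error of a respondent mean equals the population correlation between responding and the outcome, times sqrt((N - n)/n), times the population standard deviation of the outcome. A small numeric check on simulated data (our own sketch, not the book's code):

    import numpy as np

    rng = np.random.default_rng(2)
    N = 50_000
    y = rng.normal(50, 10, N)                    # outcome in the full population
    p_resp = 1 / (1 + np.exp(-(y - 50) / 10))    # responding depends on y itself
    r = rng.binomial(1, p_resp, N)

    n = r.sum()
    error = y[r == 1].mean() - y.mean()
    rho = np.corrcoef(r, y)[0, 1]                         # "data defect correlation"
    decomposition = rho * np.sqrt((N - n) / n) * y.std()  # population SD (ddof=0)
    print(f"error {error:.4f}  decomposition {decomposition:.4f}")  # identical up to rounding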
Nonresponse is a challenge in many fields, including demography, economics, public health, sociology, and business. This chapter explores nonpolitical manifestations of nonignorable nonresponse by focusing on population health. For many conditions, the decision to get tested or the willingness to allow a test is deeply wrapped up in the likelihood of having the condition. During Covid, for example, people who thought they might have been exposed to the virus were almost certainly more likely to get tested, meaning that nonignorable nonresponse complicated our ability to understand the Covid outbreak. Section 13.1 discusses the challenge of estimating public health variables in terms of a nonignorable missing data problem. Section 13.2 explores how first-stage instruments can improve the efficiency and accuracy of efforts to assess prevalence. Section 13.3 presents a framework for comparing Covid positivity rates across regions even when testing rates differ.
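A toy simulation (ours, with invented numbers, not the book's framework) shows why raw positivity comparisons mislead when testing depends on likely infection: two regions with identical prevalence report very different positivity simply because their testing behavior differs.

    import numpy as np

    rng = np.random.default_rng(3)
    prevalence = 0.05
    N = 1_000_000
    for region, (p_test_infected, p_test_healthy) in {
        "A (broad testing)":  (0.60, 0.20),
        "B (narrow testing)": (0.60, 0.05),
    }.items():
        infected = rng.binomial(1, prevalence, N)
        p_test = np.where(infected == 1, p_test_infected, p_test_healthy)
        tested = rng.binomial(1, p_test, N).astype(bool)
        positivity = infected[tested].mean()
        print(f"region {region}: tests {tested.mean():.2%} of people, "
              f"positivity {positivity:.1%}, true prevalence {prevalence:.1%}")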
This chapter explains weighting in a manner that allows us to appreciate both the power and the vulnerability of the technique and, by extension, of other techniques that rely on similar assumptions. Once we understand how weighting works, we will better understand when it works. The chapter opens by discussing weighting in general terms, and the subsequent sections get more granular. Sections 3.2 and 3.3 cover widely used weighting techniques: cell weighting and raking. Section 3.4 covers variable selection, a topic that may well be more important than the weighting technique. Section 3.5 covers the effect of weighting on precision, a topic that frequently gets lost in polling reporting. The chapter mixes intuitive and somewhat technical descriptions of weighting; the technical details in Sections 3.2 and 3.3 can be skimmed by readers focused on the big picture of how weighting works.
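For readers who want to see the mechanics, here is a compact raking sketch with made-up margins (our illustration, not the book's code), followed by the Kish design effect that the precision discussion in Section 3.5 turns on:

    import numpy as np

    rng = np.random.default_rng(4)
    n = 2_000
    data = {"female": rng.binomial(1, 0.60, n),   # sample over-represents women
            "senior": rng.binomial(1, 0.20, n)}   # and under-represents seniors
    margins = {"female": {0: 0.50, 1: 0.50},      # known population margins
               "senior": {0: 0.70, 1: 0.30}}

    w = np.ones(n)
    for _ in range(50):                           # iterate until margins match
        for var, targets in margins.items():
            totals = {lvl: w[data[var] == lvl].sum() for lvl in targets}
            grand = sum(totals.values())
            for lvl, target in targets.items():
                # Rescale this level's weights so its weighted share hits the target.
                w[data[var] == lvl] *= target * grand / totals[lvl]

    for var, targets in margins.items():
        for lvl, target in targets.items():
            share = w[data[var] == lvl].sum() / w.sum()
            print(f"{var}={lvl}: weighted share {share:.3f} (target {target:.2f})")

    # Precision cost of the weights, via the Kish design effect.
    deff = n * (w ** 2).sum() / w.sum() ** 2
    print(f"design effect {deff:.2f}, effective sample size {n / deff:.0f}")

The design effect summarizes the cost in precision: the more variable the weights, the smaller the effective sample size.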
This chapter introduces selection models in a way that highlights important intuition about how they work. Section 8.1 formalizes the model we’ve been working with already. Section 8.2 uses the model to highlight a bad news, good news story. The bad news is that statistical estimation of a two-equation model like this will be challenging. The good news is that the model helps us recognize the traces nonignorable nonresponse leaves in observable data. Section 8.3 introduces the Heckman selection model. Section 8.4 uses the Heckman model to highlight the starkly different way that selection and weighting approaches use information. The Heckman model is far from perfect, however, as Section 8.5 explains.
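As a minimal sketch of the two-step version of that model (simulated data and variable names of our choosing, not the book's example), the first stage is a probit for responding and the second stage adds the inverse Mills ratio to an outcome regression on respondents:

    import numpy as np
    import statsmodels.api as sm
    from scipy.stats import norm

    rng = np.random.default_rng(5)
    N = 20_000
    x = rng.normal(size=N)                      # predictor of the opinion
    z = rng.normal(size=N)                      # instrument: shifts response only
    u = rng.multivariate_normal([0, 0], [[1, 0.6], [0.6, 1]], size=N)
    y = 1.0 + 0.5 * x + u[:, 0]                 # opinion (observed only if responding)
    respond = (0.3 + 0.8 * z + u[:, 1] > 0)     # response equation, correlated errors

    # Step 1: probit for response, using x and the instrument z.
    X1 = sm.add_constant(np.column_stack([x, z]))
    probit = sm.Probit(respond.astype(int), X1).fit(disp=0)
    xb = X1 @ probit.params
    mills = norm.pdf(xb) / norm.cdf(xb)         # inverse Mills ratio

    # Step 2: outcome regression on respondents only, with the Mills ratio added.
    X2 = sm.add_constant(np.column_stack([x[respond], mills[respond]]))
    ols = sm.OLS(y[respond], X2).fit()
    print(ols.params)   # intercept and slope near 1.0 and 0.5; last term absorbs selection

In this simulation, an ordinary regression on respondents alone would overstate the intercept because respondents have systematically higher errors; the Mills-ratio term absorbs that selection effect, which is the trace of nonignorable nonresponse that Section 8.2 describes.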