Simulation of random variables is important in applied statistics for several reasons. First, we use probability models to mimic variation in the world, and the tools of simulation can help us better understand how this variation plays out. Second, we can use simulation to approximate the sampling distribution of data and propagate this to the sampling distribution of statistical estimates and procedures. Third, regression models are not deterministic; they produce probabilistic predictions. Simulation is the most convenient and general way to represent uncertainties in forecasts.
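As a concrete illustration of the second point, here is a minimal sketch (not from the book; the population mean 5, standard deviation 2, and sample size 20 are made-up values) of approximating a sampling distribution by repeated simulation in base R:

```r
# Approximate the sampling distribution of the mean of n = 20 draws
# from a normal population by simulating many replications.
n_sims <- 1000
n <- 20
sample_means <- replicate(n_sims, mean(rnorm(n, mean = 5, sd = 2)))
hist(sample_means, main = "Simulated sampling distribution of the mean")
sd(sample_means)   # compare to the theoretical value 2 / sqrt(20)
```

The simulated standard deviation of the means should be close to the theoretical standard error, and the approximation improves as the number of simulations grows.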
As discussed in Chapter 1, regression is fundamentally a technology for predicting an outcome y from inputs x_1, x_2, .... In this chapter we introduce regression in the simple (but not trivial) case of a linear model predicting a continuous y from a single continuous x, thus fitting the model y_i = a + b x_i + error to data (x_i, y_i), i = 1, ..., n. We demonstrate with an applied example that includes the steps of fitting the model, displaying the data and fitted line, and interpreting the fit. We then show how to check the fitting procedure using fake-data simulation, and the chapter concludes with an explanation of how linear regression includes simple comparison as a special case.
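A hedged sketch of the fake-data idea, with hypothetical true values a = 0.2, b = 0.3: simulate data from a known model, refit, and check that the estimates recover the truth. We use lm here; the same formula works with stan_glm from rstanarm.

```r
# Fake-data check: simulate (x, y) from known coefficients, refit the
# linear model, and compare the estimates to the assumed truth.
a <- 0.2; b <- 0.3; sigma <- 0.5; n <- 100
x <- runif(n, 0, 10)
y <- a + b * x + rnorm(n, 0, sigma)
fake <- data.frame(x, y)
fit <- lm(y ~ x, data = fake)   # stan_glm(y ~ x, data = fake) is analogous
plot(fake$x, fake$y)
abline(fit)                     # fitted line over the simulated data
coef(fit)                       # should be close to (0.2, 0.3)
```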
Most of this book is devoted to examples and tools for the practical use and understanding of regression models, starting with linear regression with a single predictor and moving to multiple predictors, nonlinear models, and applications in prediction and causal inference. In this chapter, we lay out some of the mathematical structure of inference for regression models and some algebra to help you understand estimation for linear regression. We also explain the rationale for the use of the Bayesian fitting routine stan_glm and its connection to classical linear regression. This chapter thus provides background and motivation for the mathematical and computational tools used in the rest of the book.
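For example, the classical least-squares estimates have closed forms, b_hat = sum((x_i - x_bar)(y_i - y_bar)) / sum((x_i - x_bar)^2) and a_hat = y_bar - b_hat * x_bar. The following sketch (simulated data with made-up coefficients) checks the hand computation against lm; stan_glm with its default weak priors gives very similar estimates on data like these.

```r
# Least-squares slope and intercept computed from the closed-form
# algebra, then checked against lm() on the same simulated data.
set.seed(123)
x <- rnorm(50)
y <- 1 + 2 * x + rnorm(50)
b_hat <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
a_hat <- mean(y) - b_hat * mean(x)
c(a_hat, b_hat)
coef(lm(y ~ x))   # should match the hand computation
```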
The present chapter considers two sorts of operations that are done as adjuncts to fitting a regression. In poststratification, the outputs from a fitted model are combined to make predictions about a new population that can differ systematically from the data. The model allows us to adjust for differences between sample and population, as long as the relevant adjustment variables are included as predictors in the regression and their distribution is known in the target population. Poststratification is a form of post-processing of inferences that is important in survey research and also arises in causal inference for varying treatment effects, as discussed in subsequent chapters. In contrast, missing-data analysis is a pre-processing step in which data are cleaned or imputed so that they can be used more easily in a statistical analysis. This chapter introduces the basic ideas of poststratification and missing-data imputation using a mix of real and simulated-data examples.
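A toy sketch of the poststratification mechanics (all numbers hypothetical): fit a model to a sample whose group mix differs from the population's, predict the outcome for each population cell, and weight the cell predictions by known population counts.

```r
# Poststratification sketch: the sample over-represents group "a",
# but the population counts let us reweight the cell predictions.
set.seed(1)
survey <- data.frame(group = sample(c("a", "b", "c"), 500,
                                    replace = TRUE, prob = c(0.6, 0.3, 0.1)))
survey$y <- c(a = 1, b = 2, c = 3)[survey$group] + rnorm(500, 0, 0.5)
fit <- lm(y ~ group, data = survey)
poststrat <- data.frame(group = c("a", "b", "c"),
                        N     = c(2000, 5000, 3000))  # known population counts
cell_pred <- predict(fit, newdata = poststrat)
weighted.mean(cell_pred, w = poststrat$N)  # population-level estimate
```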
We can apply the principle of logistic regression (taking a linear “link function” y = a + bx and extending it through a nonlinear transformation and a probability model) to predict bounded or discrete data of different forms. This chapter presents this generalized linear modeling framework and goes through several important special cases, including Poisson or negative binomial regression for count data, the logistic-binomial and probit models, ordered logistic regression, robust regression, and some extensions. As always, we explain these models with a variety of examples, including graphs of data and fitted models along with associated R code, with the goal that you should be able to build, fit, understand, and evaluate these models on new problems.
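A minimal sketch of one special case, Poisson regression on simulated counts, using base R's glm (the coefficients 0.5 and 1.0 are made up for illustration):

```r
# Poisson regression: the log link keeps the expected count positive.
set.seed(2)
x <- runif(100, 0, 2)
y <- rpois(100, lambda = exp(0.5 + 1.0 * x))
fit <- glm(y ~ x, family = poisson(link = "log"))
coef(fit)                                                     # near (0.5, 1.0)
predict(fit, newdata = data.frame(x = 1), type = "response")  # expected count at x = 1
```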
In this chapter we turn to the assumptions of the regression model, along with diagnostics that can be used to assess whether some of these assumptions are reasonable. Some of the most important assumptions rely on the researcher’s knowledge of the subject area and may not be directly testable from the available data alone. Hence, it is good to understand the ideas underlying the model, while recognizing that there is no substitute for engagement with data and the purposes for which they are being used. We show different sorts of plots of data, fitted models, and residuals, developing these methods in the context of real and simulated-data examples. We consider diagnostics based on predictive simulation from the fitted model, along with numerical summaries of fit, including residual error, explained variance, external validation, and cross validation. The goal is to develop a set of tools that you can use in constructing, interpreting, and evaluating regression models with multiple predictors.
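One basic diagnostic, the residuals-versus-fitted plot, can be sketched as follows (simulated data; for a well-specified model the residuals should scatter around zero with no visible pattern):

```r
# Residuals versus fitted values for a simulated linear fit.
set.seed(3)
x <- runif(100)
y <- 1 + 2 * x + rnorm(100, 0, 0.3)
fit <- lm(y ~ x)
plot(fitted(fit), resid(fit),
     xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)   # funnel shapes or curvature would signal problems
```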
Bayesian inference involves three steps that go beyond classical estimation. First, the data and model are combined to form a posterior distribution, which we typically summarize by a set of simulations of the parameters in the model. Second, we can propagate uncertainty in this distribution; that is, we can get simulation-based predictions for unobserved or future outcomes that account for uncertainty in the model parameters. Third, we can include additional information in the model using a prior distribution. The present chapter describes all three of these steps in the context of examples capturing challenges of prediction and inference.
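A minimal sketch of the three steps using stan_glm from rstanarm (assumed installed; the simulated data and the normal(0, 2.5) prior are illustrative choices, not recommendations):

```r
library(rstanarm)
set.seed(4)
d <- data.frame(x = runif(50))
d$y <- 1 + 2 * d$x + rnorm(50, 0, 0.5)
fit <- stan_glm(y ~ x, data = d,
                prior = normal(0, 2.5),   # step 3: prior information on the slope
                refresh = 0)              # suppress sampler progress output
sims <- as.matrix(fit)                    # step 1: posterior simulations
preds <- posterior_predict(fit,           # step 2: predictions that propagate
                           newdata = data.frame(x = 0.5))  # parameter uncertainty
quantile(preds, c(0.25, 0.75))            # 50% predictive interval at x = 0.5
```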
Before fitting a model, though, it is a good idea to understand where your numbers are coming from. The present chapter demonstrates through examples how to use graphical tools to explore and understand data and measurements.
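For instance, a minimal first look might be (a sketch using the built-in mtcars data as a stand-in for your own):

```r
# Quick exploration before any modeling: variable types, summaries, plots.
str(mtcars)          # what variables are there, and of what type?
summary(mtcars$mpg)  # range and center of the outcome
hist(mtcars$mpg, main = "Outcome distribution")
plot(mtcars$wt, mtcars$mpg,
     xlab = "Weight", ylab = "Miles per gallon")  # outcome vs. a predictor
```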
Building on the success of Abadir and Magnus' Matrix Algebra in the Econometric Exercises Series, Statistics serves as a bridge between elementary and specialized statistics. Professors Abadir, Heijmans, and Magnus freely use matrix algebra to cover intermediate to advanced material. Each chapter contains a general introduction, followed by a series of connected exercises which build up knowledge systematically. The characteristic feature of the book (and indeed the series) is that all exercises are fully solved. The authors present many new proofs of established results, along with new results, often involving shortcuts that rely on statistical conditioning arguments.
This groundbreaking textbook combines straightforward explanations with a wealth of practical examples to offer an innovative approach to teaching linear algebra. Requiring no prior knowledge of the subject, it covers the aspects of linear algebra - vectors, matrices, and least squares - that are needed for engineering applications, discussing examples across data science, machine learning and artificial intelligence, signal and image processing, tomography, navigation, control, and finance. The numerous practical exercises throughout allow students to test their understanding and translate their knowledge into solving real-world problems, with lecture slides, additional computational exercises in Julia and MATLAB®, and data sets accompanying the book online. Suitable for both one-semester and one-quarter courses, as well as self-study, this self-contained text provides beginning students with the foundation they need to progress to more advanced study.