In statistics, we are often interested in some characteristics of a population. Maybe we are interested in the mean of some measurable characteristic, or maybe we are interested in the proportion of the population that have some property. In all but the simplest cases, the population is so large that it is impossible, or at least impractical, to take the measurement on every item in the population. We therefore have to settle on taking a sample and measuring those units selected for this sample.
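The idea of estimating a population mean or proportion from a sample can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the population is simulated (normally distributed values with mean 170 and standard deviation 10), and the function name, sample size, and threshold are hypothetical choices, not taken from the text.

```python
import random
import statistics


def estimate_from_sample(population, n, threshold, seed=0):
    """Draw a simple random sample of size n (without replacement) and
    return (sample mean, sample proportion of values above threshold)."""
    rng = random.Random(seed)
    sample = rng.sample(population, n)
    mean_est = statistics.fmean(sample)
    prop_est = sum(x > threshold for x in sample) / n
    return mean_est, prop_est


# Simulated population (an assumption made purely for illustration).
pop_rng = random.Random(42)
population = [pop_rng.gauss(170.0, 10.0) for _ in range(100_000)]

# Measure only 500 units instead of all 100,000.
mean_est, prop_est = estimate_from_sample(population, n=500, threshold=180.0)

# In practice these population values are unknown; they are computed here
# only because the population is simulated and can be measured in full.
true_mean = statistics.fmean(population)
true_prop = sum(x > 180.0 for x in population) / len(population)

print(f"sample mean {mean_est:.2f} vs population mean {true_mean:.2f}")
print(f"sample proportion {prop_est:.3f} vs population proportion {true_prop:.3f}")
```

The sample estimates will generally differ a little from the population values; how much they differ is what the standard error quantifies.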
Forecasting is an important problem that spans many fields, including business and industry, government, economics, environmental sciences, medicine, social science, politics, and finance. Forecasting problems are often classified as short term, medium term, and long term. Short-term forecasting problems involve predicting events only a few time periods (days, weeks, and months) into the future. Medium-term forecasts extend from 1 to 2 years into the future, and long-term forecasting problems can extend beyond that by many years.
Often we look at the relationships between categorical variables, such as which hospital a patient is admitted to, or whether a person has diabetes, pre-diabetes, or no diabetes at all. These variables can be nominal (like the hospital) or ordinal (like diabetes, pre-diabetes, or no diabetes). In many cases we want to know something about how these variables are related.
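One common way to quantify how two categorical variables are related is Pearson's chi-square statistic on a contingency table; whether the chapter uses this particular method is an assumption. The sketch below uses made-up counts (rows are two hypothetical hospitals, columns are diabetes, pre-diabetes, and no diabetes), and the function name is illustrative.

```python
def chi_square_statistic(table):
    """Return (Pearson chi-square statistic, degrees of freedom) for a
    two-way table of counts given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)

    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected

    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


# Hypothetical counts, invented for illustration only.
# Rows: hospital A, hospital B. Columns: diabetes, pre-diabetes, no diabetes.
counts = [
    [30, 45, 125],
    [20, 40, 140],
]

stat, dof = chi_square_statistic(counts)
print(f"chi-square = {stat:.3f} on {dof} degrees of freedom")
```

A larger statistic indicates a larger departure from independence. Note that this statistic treats both variables as nominal, so it ignores the ordering of an ordinal variable such as diabetes status.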
In the 1990s and before, most of the world’s information was stored on paper and other analog media, such as film. However, with the proliferation of personal computers and the internet, by 2000 one-quarter of the world’s information was stored digitally. Since that time, the amount of digital data has exploded, roughly doubling every couple of years, so that now more than 98% of all stored information is digital.
Graphical plots are the means by which data are most easily visualized and understood. Indeed, there is no better tool for finding patterns in data than the human eye applied to appropriate displays of relevant data, particularly patterns that are ill-specified or unknown.
Overfitting refers to the use of a model with more parameters than can be justified by the data. Models that are overfit are often poor at predicting the outcome of new observations, that is, observations that were not used in the construction of the model. The next example illustrates this concept.
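As a rough sketch of the idea (not the book's own example), the simulation below compares a two-parameter straight line with a polynomial that has as many coefficients as there are data points and therefore passes through every training observation. All settings are assumptions chosen for illustration: the true relationship y = 1 + 2x, the noise level, the ten equally spaced training points, and the function names.

```python
import random


def fit_line(xs, ys):
    """Ordinary least-squares straight line (two parameters)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def fit_interpolant(xs, ys):
    """Polynomial of degree len(xs) - 1 through every training point
    (Lagrange form), i.e. one parameter per observation."""
    def predict(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            weight = 1.0
            for k, xk in enumerate(xs):
                if k != i:
                    weight *= (x - xk) / (xi - xk)
            total += yi * weight
        return total
    return predict


def mse(model, xs, ys):
    """Mean squared prediction error of model on the points (xs, ys)."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def simulate(n_reps=200, n_points=10, noise_sd=0.3, seed=1):
    """Average training and test error of both models over n_reps datasets."""
    rng = random.Random(seed)
    train_x = [i / (n_points - 1) for i in range(n_points)]
    # New observations are taken midway between the training points.
    test_x = [(a + b) / 2 for a, b in zip(train_x, train_x[1:])]
    totals = {"line_train": 0.0, "line_test": 0.0,
              "poly_train": 0.0, "poly_test": 0.0}
    for _ in range(n_reps):
        train_y = [1.0 + 2.0 * x + rng.gauss(0.0, noise_sd) for x in train_x]
        test_y = [1.0 + 2.0 * x + rng.gauss(0.0, noise_sd) for x in test_x]
        line = fit_line(train_x, train_y)
        poly = fit_interpolant(train_x, train_y)
        totals["line_train"] += mse(line, train_x, train_y)
        totals["line_test"] += mse(line, test_x, test_y)
        totals["poly_train"] += mse(poly, train_x, train_y)
        totals["poly_test"] += mse(poly, test_x, test_y)
    return {name: value / n_reps for name, value in totals.items()}


results = simulate()
for name, value in results.items():
    print(f"{name}: {value:.4f}")
```

The interpolating polynomial fits the training data perfectly, yet on average it predicts new observations worse than the simple line, which is the signature of overfitting.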
We began the last chapter by reviewing the terms population, parameter, sample, and statistic. Parameters are numerical characteristics of a population that we would like to know, but since the population is nearly always too large to make a measurement on every unit, we often rely on a sample from the population.
In Chapter 9 we discussed point estimation for a parameter or a vector of parameters. In Chapters 10 and 11, on confidence intervals and hypothesis testing, we needed the idea of the standard error of an estimator.
A common problem in statistics is to compare groups. Does a new drug work better at reducing the time of hospitalization from COVID? Which pop-up ad generates a higher click-rate? Which type of metal – aluminum, brass, or stainless steel – will produce the most reliable product? Usually, the question involves either the mean response or the proportion of responses.
The problem of statistical inference can be described as follows. There is a population and we would like to know certain aspects of the units that make up the population. For example, we might want to know what proportion have a certain property, or what the mean value (of some measure) of all units in the population is. The population is too large to sample in its entirety, so we rely on information from a sample taken from the population.
In Chapter 14 we studied multiple regression and polynomial regression and how these techniques can be used to determine the relationship between an outcome and several predictor variables.
In Chapter 3 we learned about the fundamental ideas of probability, and in Chapter 4 we generalized the notion of probability from working with sets to working with random variables and distributions. In many ways, random variables and their associated distributions can simplify probability calculations and, appropriately applied, are useful models for real-world phenomena.
In this chapter, we look at the analytic studies that are our main tools for identifying the causes of disease and evaluating health interventions. Unlike descriptive epidemiology, analytic studies involve planned comparisons between people with and without disease, or between people with and without exposures thought to cause (or prevent) disease. They try to answer the questions, ‘Why do some people develop disease?’ and ‘How strong is the association between exposure and outcome?’. This group of studies includes the intervention, cohort and case–control studies that you met briefly in Chapter 1. Together, descriptive and analytic epidemiology provide information for all stages of health planning, from the identification of problems and their causes to the design, funding and implementation of public health solutions and the evaluation of whether these solutions really work and are cost-effective in practice.
This paper considers linear rational expectations models in the frequency domain. The paper characterizes existence and uniqueness of solutions to particular as well as generic systems. The set of all solutions to a given system is shown to be a finite-dimensional affine space in the frequency domain. It is demonstrated that solutions can be discontinuous with respect to the parameters of the models in the context of nonuniqueness, invalidating mainstream frequentist and Bayesian methods. The ill-posedness of the problem motivates regularized solutions with theoretically guaranteed uniqueness, continuity, and even differentiability properties.
In the previous chapters we have considered the ‘nuts and bolts’ of epidemiology. In this and the next few chapters we look at how epidemiology is used in practice to improve public health. We start with ‘surveillance’ because without timely information on emerging and changing health problems, public health action can be paralysed or, at best, inefficient. In this chapter we discuss the design and use of surveillance systems that enable health officials to detect new risks and diseases such as mpox promptly, track known diseases and health problems, and generate data needed for effective health planning and resource allocation.