This chapter, new to the second edition, illustrates many aspects of medical statistics using the COVID-19 pandemic. Topics covered include reporting cases, case fatality as a function of age, developing vaccines, testing for infection and modelling the spread of infection.
The topic of clinical trials is introduced using the example of the MRC trial of streptomycin in TB. The role of randomization, the subject of experimental design and the ethical problems of conducting trials in patients are covered.
Summarizing results from many studies has a long history and is currently a hot topic, largely as a result of the Evidence Based Medicine movement. It is treated in this chapter, starting with an early attempt by Karl Pearson at the beginning of the twentieth century. The statistical techniques of meta-analysis are described, as are the Cochrane Collaboration and its programme of summarizing results from clinical trials.
The development of the MMR vaccine and the history of the study of the three diseases it is designed to protect against, measles, mumps and rubella, are treated in this chapter, as is the controversy attendant on the claim that it might be a cause of autism. The controversy is taken as an example to illustrate the many statistical topics that have been developed throughout the book.
Statistical models of processes in which random events affect partly random subsequent events are covered in this chapter. The sequence of eruptions of the geyser Old Faithful is taken as a simple example to illustrate Markov chains. Infectious disease models are then covered, together with the history of various attempts at modelling them from the early twentieth century onwards. Modelling religious conversion as a stochastic process is treated briefly.
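As a toy illustration of such a chain, the Python sketch below simulates a two-state Markov chain in which each eruption is classified as short or long; the transition probabilities are invented for illustration and are not estimates from the chapter.

```python
import numpy as np

# Two-state Markov chain: state 0 = short eruption, state 1 = long eruption.
# These transition probabilities are illustrative, not estimates from data.
P = np.array([[0.1, 0.9],   # a short eruption is usually followed by a long one
              [0.5, 0.5]])  # after a long eruption, either type may follow

rng = np.random.default_rng(0)
state, chain = 0, [0]
for _ in range(10_000):
    state = rng.choice(2, p=P[state])  # the next state depends only on the current one
    chain.append(state)

# The long-run fraction of time in each state approximates the stationary distribution.
print(np.bincount(chain) / len(chain))
```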
During the past half-century, exponential families have attained a position at the center of parametric statistical inference. Theoretical advances have been matched, and more than matched, in the world of applications, where logistic regression by itself has become the go-to methodology in medical statistics, computer-based prediction algorithms, and the social sciences. This book is based on a one-semester graduate course for first-year Ph.D. and advanced master's students. After presenting the basic structure of univariate and multivariate exponential families, their application to generalized linear models including logistic and Poisson regression is described in detail, emphasizing geometrical ideas, computational practice, and the analogy with ordinary linear regression. Connections are made with a variety of current statistical methodologies: missing data, survival analysis and proportional hazards, false discovery rates, bootstrapping, and empirical Bayes analysis. The book connects exponential family theory with its applications in a way that doesn't require advanced mathematical preparation.
As a result of the COVID-19 pandemic, medical statistics and public health data have become staples of newsfeeds worldwide, with infection rates, deaths, case fatality and the mysterious R figure featuring regularly. However, we don't all have the statistical background needed to translate this information into knowledge. In this lively account, Stephen Senn explains these statistical phenomena and demonstrates how statistics is essential to making rational decisions about medical care. The second edition has been thoroughly updated to cover developments of the last two decades and includes a new chapter on medical statistical challenges of COVID-19, along with additional material on infectious disease modelling and representation of women in clinical trials. Senn entertains with anecdotes, puzzles and paradoxes, while tackling big themes including: clinical trials and the development of medicines, life tables, vaccines and their risks or lack of them, smoking and lung cancer, and even the power of prayer.
Beyond quantifying the amount of association between two variables, as was the goal in a previous chapter, regression analysis aims at describing that association and/or at predicting one of the variables based on the others. Examples of applications where this is needed abound in engineering and a broad range of industries. For example, in the insurance industry, when pricing a policy, the predictor variable encapsulates the available information about what is being insured, and the response variable is a measure of the risk that the insurance company would take on if underwriting the policy. In this context, a procedure is evaluated solely on its performance at predicting that risk, and can otherwise be very complicated and have no simple interpretation. The chapter covers both local methods such as kernel regression (e.g., local averaging) and empirical risk minimization over a parametric model (e.g., linear models fitted by least squares). Cross-validation is introduced as a method for estimating the prediction power of a given regression or classification method.
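To make the local-averaging idea concrete, here is a minimal Python sketch of Nadaraya-Watson kernel regression with a Gaussian kernel, together with a simple K-fold cross-validation estimate of prediction error; the data and candidate bandwidths are synthetic and purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data: a noisy sine curve (purely illustrative).
x = np.sort(rng.uniform(0, 1, 200))
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=x.size)

def kernel_regression(x_train, y_train, x_query, bandwidth):
    """Nadaraya-Watson estimator: a locally weighted average with a Gaussian kernel."""
    w = np.exp(-0.5 * ((x_query[:, None] - x_train[None, :]) / bandwidth) ** 2)
    return (w @ y_train) / w.sum(axis=1)

def cv_error(bandwidth, k=5):
    """K-fold cross-validation estimate of the mean squared prediction error."""
    folds = np.array_split(rng.permutation(x.size), k)
    errors = []
    for test in folds:
        train = np.setdiff1d(np.arange(x.size), test)
        pred = kernel_regression(x[train], y[train], x[test], bandwidth)
        errors.append(np.mean((y[test] - pred) ** 2))
    return np.mean(errors)

for h in (0.01, 0.05, 0.2):  # candidate bandwidths, chosen for illustration
    print(f"bandwidth {h}: CV error {cv_error(h):.3f}")
```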
Measurements are often numerical in nature, which naturally leads to distributions on the real line. We start our discussion of such distributions in the present chapter, and in the process introduce the concept of a random variable, which is really a device to facilitate the writing of probability statements and the corresponding computations. We introduce objects such as the distribution function, survival function, and quantile function, any of which characterizes the underlying distribution.
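For a concrete sense of these three objects, the short Python sketch below evaluates them for the standard normal distribution using SciPy; the evaluation points are chosen purely for illustration.

```python
from scipy.stats import norm

# Distribution function, survival function and quantile function of the
# standard normal; the evaluation points are chosen for illustration.
print(norm.cdf(1.0))    # F(1.0) = P(X <= 1.0), about 0.841
print(norm.sf(1.0))     # S(1.0) = P(X > 1.0) = 1 - F(1.0), about 0.159
print(norm.ppf(0.975))  # quantile function F^{-1}(0.975), about 1.96
```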
Some experiments lead to considering not one, but several measurements. As before, each measurement is represented by a random variable, and these are stacked into a random vector. For example, in the context of an experiment that consists of flipping a coin multiple times, we defined in a previous chapter one random variable per flip, each indicating the result of that flip. These are then concatenated to form a random vector, compactly describing the outcome of the entire experiment. Concepts such as conditional probability and independence are introduced.
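The coin-flip example can be made concrete in a few lines of Python; the sketch below simulates realizations of the random vector and checks the independence of two coordinates empirically (the numbers of flips and trials are illustrative).

```python
import numpy as np

rng = np.random.default_rng(2)

# Each row is one realization of a random vector of n fair-coin flips
# (1 = heads, 0 = tails); sample sizes are chosen for illustration.
n, trials = 3, 100_000
flips = rng.integers(0, 2, size=(trials, n))

# Independence check: P(X1 = 1 and X2 = 1) should be close to P(X1 = 1) * P(X2 = 1).
p1 = (flips[:, 0] == 1).mean()
p2 = (flips[:, 1] == 1).mean()
p12 = ((flips[:, 0] == 1) & (flips[:, 1] == 1)).mean()
print(p12, "vs", p1 * p2)  # both should be near 0.25
```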
We consider an experiment that yields, as data, a sample of independent and identically distributed (real-valued) random variables with a common distribution on the real line. The estimation of the underlying mean and median is discussed at length, and bootstrap confidence intervals are constructed. Tests comparing the underlying distribution to a given distribution (e.g., the standard normal distribution) or to a family of distributions (e.g., the normal family) are introduced. Censoring, which is very common in some clinical trials, is briefly discussed.
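As one concrete construction, the Python sketch below computes a percentile bootstrap confidence interval for the median of a synthetic i.i.d. sample; the data and number of resamples are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
sample = rng.exponential(scale=2.0, size=100)  # synthetic i.i.d. sample

# Percentile bootstrap: resample with replacement, recompute the median,
# and take empirical quantiles of the bootstrap distribution.
B = 5000
boot_medians = np.array([
    np.median(rng.choice(sample, size=sample.size, replace=True))
    for _ in range(B)
])
lo, hi = np.percentile(boot_medians, [2.5, 97.5])
print(f"median {np.median(sample):.3f}, 95% bootstrap CI ({lo:.3f}, {hi:.3f})")
```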
In this chapter we introduce some tools for sampling from a distribution. We also explain how to use computer simulations to approximate probabilities and, more generally, expectations, which can allow one to circumvent complicated mathematical derivations. The methods that are introduced include Monte Carlo sampling/integration, rejection sampling, and Markov Chain Monte Carlo sampling.
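The sketch below illustrates two of these methods in Python: Monte Carlo integration of an expectation with a known closed form, and rejection sampling from a simple density; both examples are chosen purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)

# Monte Carlo integration: approximate E[cos(X)] for X ~ N(0, 1) by an
# average over a large sample; the exact value is exp(-1/2) ~ 0.6065.
x = rng.normal(size=1_000_000)
print("Monte Carlo estimate:", np.cos(x).mean())

# Rejection sampling from the density f(x) = 2x on [0, 1], using
# Uniform(0, 1) proposals with envelope constant M = 2.
def rejection_sample(n, M=2.0):
    draws = []
    while len(draws) < n:
        u, v = rng.uniform(), rng.uniform()
        if v <= 2 * u / M:  # accept with probability f(u) / (M * g(u))
            draws.append(u)
    return np.array(draws)

print("mean of draws (exact value 2/3):", rejection_sample(10_000).mean())
```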
An expectation is simply a weighted mean, and means are at the core of Probability Theory and Statistics. In Statistics, in particular, such expectations are used to define parameters of interest. It turns out that an expectation can be approximated by an empirical average based on a sample from the distribution of interest, and the accuracy of this approximation can be quantified via what is referred to as concentration inequalities.
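A minimal Python sketch of this idea, using Hoeffding's inequality as the concentration inequality for variables bounded in [0, 1] (the sample size and tolerance are chosen for illustration):

```python
import numpy as np

# An expectation approximated by an empirical average. For variables taking
# values in [0, 1], Hoeffding's inequality bounds the deviation:
# P(|mean - E[X]| >= t) <= 2 * exp(-2 * n * t**2).
rng = np.random.default_rng(5)
n, t = 10_000, 0.01
x = rng.uniform(size=n)  # E[X] = 1/2

print("empirical average:", x.mean())
print("Hoeffding bound at t = 0.01:", 2 * np.exp(-2 * n * t**2))
```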