This chapter introduces key concepts and methods in Bayesian statistical modelling. The posterior predictive distribution captures both epistemic uncertainty in model parameters and aleatory uncertainty in future outcomes. A Bayesian p-value gives the probability that a statistic computed from data output by a given model will be more extreme than the value of the same statistic computed from observed data. Bayesian p-values close to 0 or 1 suggest the model may be inadequate. Markov chain Monte Carlo is a general-purpose tool for sampling from complex, unnormalised distributions. It produces dependent samples, so the effective sample size is usually smaller than the number of iterations. Informative priors are useful when data leave large uncertainties in parameter values. Empirical Bayes combines information across related datasets by estimating a distribution over parameters using frequentist methods. Hierarchical modelling provides a unified Bayesian framework for handling multiple related datasets, capturing group structure via a hierarchical graph.
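The Bayesian p-value described above can be sketched with a posterior predictive check. The example below is illustrative only: the binary sequence is made up, the model (independent Bernoulli trials with a flat Beta(1, 1) prior) and the test statistic (the number of switches between 0 and 1) are assumptions chosen for simplicity, and the function names are hypothetical.

```python
import random

def n_switches(seq):
    """Test statistic: how many times consecutive values differ."""
    return sum(a != b for a, b in zip(seq, seq[1:]))

def bayesian_p_value(observed, n_draws=4000, seed=0):
    """Fraction of replicated datasets whose statistic is >= the observed one."""
    rng = random.Random(seed)
    n, s = len(observed), sum(observed)
    t_obs = n_switches(observed)
    extreme = 0
    for _ in range(n_draws):
        # Epistemic uncertainty: draw theta from the Beta posterior
        # (flat Beta(1, 1) prior is an assumption of this sketch).
        theta = rng.betavariate(1 + s, 1 + n - s)
        # Aleatory uncertainty: simulate a replicated dataset given theta.
        replicated = [1 if rng.random() < theta else 0 for _ in range(n)]
        if n_switches(replicated) >= t_obs:
            extreme += 1
    return extreme / n_draws

# Made-up sequence with long runs, i.e. few switches.
observed = [1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
p = bayesian_p_value(observed)
print(f"Bayesian p-value: {p:.3f}")
```

Because the observed sequence switches far less often than independent trials typically would, the p-value should come out close to 1, which hints that the independence assumption is inadequate for these data.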
Queues feature in our daily lives like never before. From the checkout counter in the community grocery store to customer support over the phone, queues are theatres of great social and engineering drama. Entire business operations of many leading companies are geared towards providing hassle-free customer support and experience – timely and effective resolution of client queries about services on a regular basis. Alternatively, it could be effective traffic management and resource optimization for a multiplex cinema operator involved in ticket sales. Sometimes it may not involve humans at all, as in the case of a database query to a computer server for specific information that may be routed through a job queue. How a queue moves in time and how services are offered over epochs determine how businesses will be able to make a profit or how efficiently computer servers will execute tasks. All these have a huge technological and economic impact. No wonder we have seen huge investments by concerned stakeholders to upgrade and upscale hardware and software infrastructure to re-engineer queues towards greater system efficiency and profitability. The mathematical technology of queues is crafted out of models that investigate and replicate the stochastic behavior of engineering systems. This is the subject of our study in this chapter.
Statistical experiments enable us to make inferences from data about parameters that characterize a population. Generally speaking, inferences may be of two types, namely, deductive inference and inductive inference. Deductive inference pertains to conclusions based on a set of premises (propositions) and their synthesis. Deductive reasoning has a definitive character. For example, all men are mortal (first proposition); Socrates is a man (second proposition); hence, Socrates is mortal (deductive conclusion). On the other hand, inductive inference has a probabilistic character. One conducts an experiment and collects data. Based on these data, certain conclusions are drawn that may have a broader applicability beyond the contours of the particular experiment performed by the researcher. This generalization of the conclusions drawn from the particular experiment constitutes the framework of inductive reasoning. For example, the heights of a small group of people belonging to a certain population are measured. Based on calculations from this small sample, and upon finding that within it the average height of men is greater than the average height of women, it is inferred that the men of this population are generally taller than the women.
The formal practice of inductive reasoning dates back to the thesis of Gottfried Wilhelm Leibniz (see Figure 5.1). He was the first to propose that probability is a relation between hypothesis and evidence (data). His thesis was founded on three conceptual pillars: chance (probability), possibilities (realizable random events), and ideas (generalization of inferences by induction). We have encountered the first two concepts in earlier chapters of this textbook. In this chapter, we will delve into the third theme whereby we will discuss methods to draw conclusions from data derived from statistical experiments based on the principles of inductive reasoning.
In this chapter, we explore an unsupervised learning problem: estimating a distribution function from two-dimensional data. Although there is no response variable, the workflow mirrors that of supervised learning. We select the best-fitting function within a family by maximising the sum of the log of the distribution's values at the observed data points. As in supervised learning, excessive flexibility leads to overfitting, while insufficient flexibility leads to underfitting. We use cross-validation to identify a function family that achieves a happy medium.
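As a rough sketch of this workflow, the example below scores a family of two-dimensional density estimates by the sum of log densities at held-out points and picks the best by cross-validation. The choice of an isotropic Gaussian kernel density estimate, with the bandwidth `h` controlling flexibility, is an assumption made for illustration, as are the synthetic data and the function names.

```python
import math
import random

def log_density(point, train, h):
    """Log of an isotropic 2-D Gaussian kernel density estimate at `point`."""
    x, y = point
    total = sum(math.exp(-((x - a) ** 2 + (y - b) ** 2) / (2 * h * h))
                for a, b in train)
    # The tiny constant guards against log(0) when every kernel underflows.
    return math.log(total / (len(train) * 2 * math.pi * h * h) + 1e-300)

def cv_log_likelihood(data, h, k=5):
    """Sum of held-out log densities over k folds."""
    folds = [data[i::k] for i in range(k)]
    score = 0.0
    for i, held_out in enumerate(folds):
        train = [p for j, fold in enumerate(folds) if j != i for p in fold]
        score += sum(log_density(p, train, h) for p in held_out)
    return score

# Synthetic data: two Gaussian clusters (an assumption for illustration).
rng = random.Random(0)
data = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(40)]
data += [(rng.gauss(4, 1), rng.gauss(4, 1)) for _ in range(40)]
rng.shuffle(data)

bandwidths = [0.05, 0.2, 0.5, 1.0, 3.0]
scores = {h: cv_log_likelihood(data, h) for h in bandwidths}
best_h = max(scores, key=scores.get)
print("cross-validated log-likelihoods:", scores)
print("selected bandwidth:", best_h)
```

A very small bandwidth is highly flexible and overfits the training points, while a very large one oversmooths and underfits; the held-out score should penalise both extremes.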
This chapter introduces simple and multiple linear regression models – core tools in predictive modelling due to their simplicity and interpretability. These models assume the response variable is a linear function of the predictor(s), plus a noise term. The regression function gives the expected response given the predictors. The coefficient of determination, R², measures how much of the variance in the response is explained by the model. In simple linear regression, R² equals the square of the Pearson correlation between response and predictor; in multiple regression, it equals the square of the correlation between response and predicted values. Each coefficient in multiple regression reflects the expected change in the response for a one-unit increase in that predictor, holding others fixed. Standardising predictors lets us compare coefficient sizes. Strong collinearity between predictors increases uncertainty in the fitted coefficients. Models using only a subset of predictors may generalise better than those using all of them, which can overfit. The squared error risk of a modelling procedure – its expected test error – can be broken down into bias, variance and irreducible noise.
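The link between the coefficient of determination and the Pearson correlation can be checked numerically. The sketch below fits a simple linear regression by least squares on synthetic data (the data-generating line and noise level are arbitrary assumptions) and compares the two quantities; the helper names are hypothetical.

```python
import math
import random

def mean(values):
    return sum(values) / len(values)

def fit_simple_ols(x, y):
    """Least-squares intercept and slope for one predictor."""
    xb, yb = mean(x), mean(y)
    sxy = sum((a - xb) * (b - yb) for a, b in zip(x, y))
    sxx = sum((a - xb) ** 2 for a in x)
    slope = sxy / sxx
    return yb - slope * xb, slope

def r_squared(y, y_hat):
    """1 - RSS / TSS: the share of response variance explained."""
    yb = mean(y)
    rss = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    tss = sum((a - yb) ** 2 for a in y)
    return 1 - rss / tss

def pearson(u, v):
    ub, vb = mean(u), mean(v)
    cov = sum((a - ub) * (b - vb) for a, b in zip(u, v))
    return cov / math.sqrt(sum((a - ub) ** 2 for a in u) *
                           sum((b - vb) ** 2 for b in v))

# Synthetic data: linear signal plus Gaussian noise (assumed for illustration).
rng = random.Random(0)
x = [rng.uniform(0, 10) for _ in range(200)]
y = [1.0 + 0.5 * xi + rng.gauss(0, 1.0) for xi in x]

intercept, slope = fit_simple_ols(x, y)
y_hat = [intercept + slope * xi for xi in x]
r2 = r_squared(y, y_hat)
print(r2, pearson(x, y) ** 2, pearson(y, y_hat) ** 2)
```

All three printed values should agree up to floating-point rounding, since with a single predictor the fitted values are a linear function of that predictor.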
Our lived experiences are punctuated by events that are sometimes a result of our purposeful intentions and at other times outcomes that happen by pure chance. Even at an abstract level, it is a very human endeavor to deduce meaning from seemingly random observations, an exercise whose primary objective is to derive a causal structure in observed phenomena. In fact, our whole intellectual pursuit that differentiates us from other beings can be understood through our inner urge to discover the very purpose of our existence and the conditions that make this possible. This eternal play between chance episodes and purposeful volition manifests in diverse situations that I have labored to recreate through computer simulations of realistic events. This play has a dual role: first, it binds together the flow of our varied experiences and, second, it offers us a perspective to assimilate our understanding of events happening around us that affect us. In order to appreciate this play of chance and purpose, it is essential that students and readers have a conceptual grounding in the areas of probability, statistics, and stochastic processes. Therefore, several playful computer simulations and projects are interlaced with theoretical foundations and numerical examples, both solved and exercise problems. In this way, the presentation in this book remains true to its spirit of inviting thoughtful readers to the various aspects of this area of study.
Historical remark
The advent of a rigorous framework for studying probability and statistics dates back to the eighth century AD and is documented in the works of Al-Khalil, who was an Arab philologist. This branch of mathematics continues to be under development with major contributions from Soviet mathematician Andrey N. Kolmogorov, who developed the modern foundations of probability and statistical theory from a measure-theoretic standpoint in the twentieth century.
In this chapter, we examine our first supervised learning problem, focusing on how to construct prediction functions and assess their performance. Given data consisting of predictor–response pairs, we can learn the parameters of a prediction function by minimising a loss, such as the residual sum of squares, which measures the discrepancy between actual and predicted responses. Using more flexible families of prediction functions typically reduces loss on the training data, but excessive flexibility can lead to overfitting: fitting to noise rather than the systematic component of the relationship. Overfitting results in poor prediction performance on new, unseen data. To estimate how a prediction method will perform on unseen data, we use cross-validation. However, when we compare many prediction methods using cross-validation, the best-performing method often appears better than it truly is; its apparent performance is an unreliable guide to its future accuracy. Prior knowledge is crucial for selecting plausible prediction methods to compare. Finally, we can use bootstrapping to quantify uncertainty in prediction functions and their predictions.
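The gap between training loss and cross-validated loss can be illustrated with a small experiment. The sketch below uses k-nearest-neighbour regression as the family of prediction functions, which is an assumption made for brevity (smaller k means more flexibility); the synthetic sine-plus-noise data and the function names are likewise hypothetical.

```python
import math
import random

def knn_predict(x0, train, k):
    """Average response of the k training points closest to x0."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x0))[:k]
    return sum(y for _, y in nearest) / k

def rss(points, train, k):
    """Residual sum of squares of k-NN predictions on `points`."""
    return sum((y - knn_predict(x, train, k)) ** 2 for x, y in points)

def cv_rss(data, k, n_folds=5):
    """Residual sum of squares accumulated over held-out folds."""
    folds = [data[i::n_folds] for i in range(n_folds)]
    total = 0.0
    for i, held_out in enumerate(folds):
        train = [p for j, fold in enumerate(folds) if j != i for p in fold]
        total += rss(held_out, train, k)
    return total

# Synthetic predictor-response pairs: smooth signal plus noise (assumed).
rng = random.Random(1)
xs = [rng.uniform(0, 6) for _ in range(100)]
data = [(x, math.sin(x) + rng.gauss(0, 0.5)) for x in xs]

train_rss = {k: rss(data, data, k) for k in (1, 10)}
cv = {k: cv_rss(data, k) for k in (1, 10)}
print("training RSS:", train_rss)
print("cross-validated RSS:", cv)
```

With k = 1 each training point is its own nearest neighbour, so the training loss is zero even though the fit tracks the noise; the cross-validated loss should reveal that the less flexible k = 10 predicts unseen points better.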
In this chapter, we examine how to quantify uncertainty about model parameters, highlighting two main approaches: frequentist and Bayesian. We start by modelling a data-generating mechanism with a parametric family, where different parameter values correspond to different models. Assuming our model family can describe the mechanism, we use data to infer plausible parameters and quantify uncertainty. In frequentist inference, we build parameter estimators and study their sampling distributions across repeated data collection. Here, parameters are fixed unknown constants, and only estimators are treated probabilistically. In Bayesian inference, parameters are latent random variables. We express uncertainty through probability, combining prior beliefs about parameter values with observed data using Bayes’ rule to obtain a posterior distribution. The posterior and the frequentist sampling distribution often play similar roles and can resemble each other in practice. Computational tools like bootstrapping and Markov chain Monte Carlo help estimate sampling and posterior distributions, respectively.
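To make the resemblance between the two approaches concrete, the sketch below estimates the uncertainty in a proportion twice: once with a bootstrap approximation to the sampling distribution of the sample mean, and once with the closed-form standard deviation of a Beta posterior. The made-up data (30 successes in 100 trials), the flat Beta(1, 1) prior, and the function names are all assumptions for illustration.

```python
import math
import random

rng = random.Random(0)
data = [1] * 30 + [0] * 70  # made-up binary outcomes

def bootstrap_sd(sample, n_resamples=2000):
    """Standard deviation of the sample mean across bootstrap resamples."""
    n = len(sample)
    estimates = []
    for _ in range(n_resamples):
        resample = rng.choices(sample, k=n)  # draw n values with replacement
        estimates.append(sum(resample) / n)
    m = sum(estimates) / n_resamples
    return math.sqrt(sum((e - m) ** 2 for e in estimates) / (n_resamples - 1))

def beta_posterior_sd(successes, failures):
    """Posterior standard deviation under a flat Beta(1, 1) prior."""
    a, b = 1 + successes, 1 + failures
    return math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))

boot_sd = bootstrap_sd(data)
post_sd = beta_posterior_sd(sum(data), len(data) - sum(data))
print(f"bootstrap standard error: {boot_sd:.4f}")
print(f"posterior standard deviation: {post_sd:.4f}")
```

The two numbers answer different questions (variability of an estimator across repeated datasets versus uncertainty about a latent parameter), yet with this much data and a flat prior they should be close.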
This chapter introduces simple and multiple Bayesian linear regression models, in which parameters are treated as latent random variables. Thanks to their simplicity, these models yield closed-form posteriors. With flat priors, the posterior closely resembles the frequentist sampling distribution. We also explore the use of shrinkage priors to penalise model complexity and reduce overfitting. A Gaussian prior on the coefficients leads to ridge regression, where the MAP estimate corresponds to L2-regularised least squares. A Laplace prior yields lasso regression, based on L1 regularisation. Both are examples of regularisation techniques, but they behave differently: ridge regression shrinks all coefficients toward zero, while lasso tends to set some exactly to zero, producing a sparse model.
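The different behaviour of the two penalties is easiest to see in a special case. The sketch below assumes an orthonormal design (so that each coefficient can be treated separately) and the objectives ½‖y − Xb‖² + ½λ‖b‖² for ridge and ½‖y − Xb‖² + λ‖b‖₁ for lasso; under those assumptions the penalised estimates are simple functions of the least-squares coefficients. The coefficient values and λ are made up.

```python
import math

def ridge_shrink(b_ols, lam):
    """Ridge under an orthonormal design: proportional shrinkage."""
    return b_ols / (1 + lam)

def lasso_shrink(b_ols, lam):
    """Lasso under an orthonormal design: soft thresholding."""
    return math.copysign(max(abs(b_ols) - lam, 0.0), b_ols)

ols = [3.0, -1.5, 0.4, -0.2]  # made-up least-squares coefficients
lam = 0.5                     # made-up penalty strength

ridge = [ridge_shrink(b, lam) for b in ols]
lasso = [lasso_shrink(b, lam) for b in ols]
print("ridge:", ridge)
print("lasso:", lasso)
```

Ridge scales every coefficient towards zero without reaching it, whereas lasso subtracts a fixed amount and sets coefficients smaller in magnitude than λ to exactly zero, which is the source of its sparsity.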