Search results for Pattern Recognition and Machine Learning

3 - Unsupervised Learning Warm-Up
James Burridge, University of Portsmouth, Nick Tosh, University of Galway
Book:

Inference in Statistical Modelling and Machine Learning

Published online:

22 May 2026

Print publication:

30 June 2026, pp 24-35
- Chapter
Summary

In this chapter, we explore an unsupervised learning problem: estimating a distribution function from two-dimensional data. Although there is no response variable, the workflow mirrors that of supervised learning. We select the best-fitting function within a family by maximising the sum of the log of the distribution's values at the observed data points. As in supervised learning, excessive flexibility leads to overfitting, while insufficient flexibility leads to underfitting. We use cross-validation to identify a function family that achieves a happy medium.
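
The workflow in this summary can be sketched in a few lines. This is a minimal illustration, not the chapter's own example: the three nested Gaussian families, the synthetic data and the helper names (`fit_gaussian`, `log_likelihood`, `cv_score`) are all assumptions made here. Each family is fitted by maximising the summed log density on the training points, and families are compared by their held-out log density under k-fold cross-validation.

```python
import math
import random

FAMILIES = ["spherical", "diagonal", "full"]  # increasing flexibility (assumed families)

def fit_gaussian(points, family):
    """Maximum-likelihood fit of a 2-D Gaussian; returns (mean, covariance entries)."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / n
    syy = sum((p[1] - my) ** 2 for p in points) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    if family == "spherical":      # one shared variance, no correlation
        s = (sxx + syy) / 2
        sxx, syy, sxy = s, s, 0.0
    elif family == "diagonal":     # separate variances, no correlation
        sxy = 0.0
    # "full": keep the unrestricted covariance
    return (mx, my), (sxx, sxy, syy)

def log_likelihood(points, model):
    """Sum of the log density values at the given points."""
    (mx, my), (sxx, sxy, syy) = model
    det = sxx * syy - sxy ** 2
    total = 0.0
    for x, y in points:
        dx, dy = x - mx, y - my
        quad = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
        total += -math.log(2 * math.pi) - 0.5 * math.log(det) - 0.5 * quad
    return total

def cv_score(points, family, k=5):
    """Held-out log-likelihood summed over k folds."""
    folds = [points[i::k] for i in range(k)]
    score = 0.0
    for i in range(k):
        train = [p for j, fold in enumerate(folds) if j != i for p in fold]
        score += log_likelihood(folds[i], fit_gaussian(train, family))
    return score

# Synthetic correlated 2-D data (an assumption for illustration only).
rng = random.Random(0)
data = []
for _ in range(200):
    g1, g2 = rng.gauss(0, 1), rng.gauss(0, 1)
    data.append((g1, 0.8 * g1 + 0.6 * g2))

train_ll = {f: log_likelihood(data, fit_gaussian(data, f)) for f in FAMILIES}
cv_ll = {f: cv_score(data, f) for f in FAMILIES}
best_family = max(cv_ll, key=cv_ll.get)
print(train_ll, cv_ll, best_family)
```

Because the families are nested, the training log-likelihood can only rise with flexibility, which is why the held-out score rather than the training score is used to choose between them.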

7 - Frequentist Linear Regression
James Burridge, University of Portsmouth, Nick Tosh, University of Galway
Book:

Inference in Statistical Modelling and Machine Learning

Published online:

22 May 2026

Print publication:

30 June 2026, pp 96-123
- Chapter
Summary

This chapter introduces simple and multiple linear regression models – core tools in predictive modelling due to their simplicity and interpretability. These models assume the response variable is a linear function of the predictor(s), plus a noise term. The regression function gives the expected response given the predictors. The coefficient of determination, R², measures how much of the variance in the response is explained by the model. In simple linear regression, R² equals the square of the Pearson correlation between response and predictor; in multiple regression, it equals the square of the correlation between response and predicted values. Each coefficient in multiple regression reflects the expected change in the response for a one-unit increase in that predictor, holding others fixed. Standardising predictors lets us compare coefficient sizes. Strong collinearity between predictors increases uncertainty in the fitted coefficients. Models that use only a subset of predictors may generalise better than those that use all of them, because the larger models can overfit. The squared error risk of a modelling procedure – its expected test error – can be broken down into bias, variance and irreducible noise.
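
The stated identity between the coefficient of determination and the squared Pearson correlation in simple linear regression can be checked numerically. The sketch below uses made-up data and hypothetical helper names; it fits the least-squares line, computes one minus the ratio of residual to total sum of squares, and compares that with the squared correlation.

```python
import math

def simple_linear_fit(xs, ys):
    """Least-squares intercept and slope for one predictor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

def r_squared(xs, ys):
    """Coefficient of determination: 1 - RSS / TSS."""
    intercept, slope = simple_linear_fit(xs, ys)
    my = sum(ys) / len(ys)
    rss = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    tss = sum((y - my) ** 2 for y in ys)
    return 1 - rss / tss

def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Illustrative data (not from the book).
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.2, 1.9, 3.2, 3.8, 5.1, 5.8]

r2 = r_squared(xs, ys)
rho = pearson(xs, ys)
print(r2, rho ** 2)  # the two values agree up to rounding
```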

Appendix C - Jensen’s and Gibbs’ Inequalities
James Burridge, University of Portsmouth, Nick Tosh, University of Galway
Book:

Inference in Statistical Modelling and Machine Learning

Published online:

22 May 2026

Print publication:

30 June 2026, pp 304-305
- Chapter

2 - Supervised Learning Warm-Up
James Burridge, University of Portsmouth, Nick Tosh, University of Galway
Book:

Inference in Statistical Modelling and Machine Learning

Published online:

22 May 2026

Print publication:

30 June 2026, pp 8-23
- Chapter
Summary

In this chapter, we examine our first supervised learning problem, focusing on how to construct prediction functions and assess their performance. Given data consisting of predictor–response pairs, we can learn the parameters of a prediction function by minimising a loss, such as the residual sum of squares, which measures the discrepancy between actual and predicted responses. Using more flexible families of prediction functions typically reduces loss on the training data, but excessive flexibility can lead to overfitting: fitting to noise rather than the systematic component of the relationship. Overfitting results in poor prediction performance on new, unseen data. To estimate how a prediction method will perform on unseen data, we use cross-validation. However, when we compare many prediction methods using cross-validation, the best-performing method often appears better than it truly is; its apparent performance is an unreliable guide to its future accuracy. Prior knowledge is crucial for selecting plausible prediction methods to compare. Finally, we can use bootstrapping to quantify uncertainty in prediction functions and their predictions.
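
The closing point about bootstrapping can be illustrated with a small sketch. Everything specific here is an assumption: the straight-line prediction function fitted by minimising the residual sum of squares, the synthetic data, the percentile method and the names `fit_line` and `bootstrap_fit_interval`. Predictor–response pairs are resampled with replacement, the line is refitted each time, and the spread of the refitted predictions at a chosen point gives a rough measure of uncertainty in the fitted function.

```python
import random

def fit_line(pairs):
    """Intercept and slope that minimise the residual sum of squares."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    slope = sxy / sxx
    return my - slope * mx, slope

def bootstrap_fit_interval(pairs, x0, n_boot=1000, seed=0):
    """Approximate 95% percentile interval for the fitted value at x0."""
    rng = random.Random(seed)
    preds = []
    for _ in range(n_boot):
        resample = [rng.choice(pairs) for _ in pairs]  # sample pairs with replacement
        intercept, slope = fit_line(resample)
        preds.append(intercept + slope * x0)
    preds.sort()
    return preds[int(0.025 * n_boot)], preds[int(0.975 * n_boot) - 1]

# Synthetic data: a linear trend plus noise (illustrative assumption).
rng = random.Random(1)
pairs = []
for _ in range(50):
    x = rng.uniform(0, 10)
    pairs.append((x, 1.0 + 0.5 * x + rng.gauss(0, 1)))

lower, upper = bootstrap_fit_interval(pairs, x0=5.0)
print(lower, upper)
```

The interval describes uncertainty in the fitted function at `x0`, not the spread of a new response around it, which would also include the noise term.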

References
James Burridge, University of Portsmouth, Nick Tosh, University of Galway
Book:

Inference in Statistical Modelling and Machine Learning

Published online:

22 May 2026

Print publication:

30 June 2026, p 306
- Chapter

6 - Frequentist and Bayesian Uncertainty
James Burridge, University of Portsmouth, Nick Tosh, University of Galway
Book:

Inference in Statistical Modelling and Machine Learning

Published online:

22 May 2026

Print publication:

30 June 2026, pp 57-95
- Chapter
Summary

In this chapter, we examine how to quantify uncertainty about model parameters, highlighting two main approaches: frequentist and Bayesian. We start by modelling a data-generating mechanism with a parametric family, where different parameter values correspond to different models. Assuming our model family can describe the mechanism, we use data to infer plausible parameters and quantify uncertainty. In frequentist inference, we build parameter estimators and study their sampling distributions across repeated data collection. Here, parameters are fixed unknown constants, and only estimators are treated probabilistically. In Bayesian inference, parameters are latent random variables. We express uncertainty through probability, combining prior beliefs about parameter values with observed data using Bayes’ rule to obtain a posterior distribution. The posterior and the frequentist sampling distribution often play similar roles and can resemble each other in practice. Computational tools like bootstrapping and Markov chain Monte Carlo help estimate sampling and posterior distributions, respectively.
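
As a rough illustration of the Markov chain Monte Carlo idea mentioned in this summary, the sketch below runs a random-walk Metropolis sampler for the mean of Gaussian data and compares the result with the closed-form posterior. The model (known noise standard deviation, Gaussian prior), the tuning values and all function names are assumptions chosen to keep the example checkable; they are not taken from the chapter.

```python
import math
import random

SIGMA = 1.0        # assumed known noise standard deviation
PRIOR_MEAN = 0.0   # assumed Gaussian prior on the unknown mean
PRIOR_SD = 10.0

def log_posterior(mu, data):
    """Unnormalised log posterior: log prior plus log likelihood."""
    log_prior = -0.5 * ((mu - PRIOR_MEAN) / PRIOR_SD) ** 2
    log_lik = sum(-0.5 * ((x - mu) / SIGMA) ** 2 for x in data)
    return log_prior + log_lik

def metropolis(data, n_samples=20000, step=0.5, seed=0):
    """Random-walk Metropolis sampler for the posterior of the mean."""
    rng = random.Random(seed)
    mu = sum(data) / len(data)          # start at the sample mean
    current = log_posterior(mu, data)
    samples = []
    for _ in range(n_samples):
        proposal = mu + rng.gauss(0.0, step)
        candidate = log_posterior(proposal, data)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, candidate - current)):
            mu, current = proposal, candidate
        samples.append(mu)
    return samples

data_rng = random.Random(1)
data = [data_rng.gauss(2.0, SIGMA) for _ in range(10)]

samples = metropolis(data)
mcmc_mean = sum(samples) / len(samples)
mcmc_sd = math.sqrt(sum((s - mcmc_mean) ** 2 for s in samples) / len(samples))

# Closed-form Gaussian posterior for comparison.
precision = 1 / PRIOR_SD ** 2 + len(data) / SIGMA ** 2
exact_mean = (PRIOR_MEAN / PRIOR_SD ** 2 + sum(data) / SIGMA ** 2) / precision
exact_sd = precision ** -0.5
print(mcmc_mean, exact_mean, mcmc_sd, exact_sd)
```

With a prior this wide, the posterior standard deviation is close to the frequentist standard error of the sample mean, which is one concrete sense in which the two distributions can resemble each other.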

9 - Bayesian Linear Regression, Priors, and Regularisation
James Burridge, University of Portsmouth, Nick Tosh, University of Galway
Book:

Inference in Statistical Modelling and Machine Learning

Published online:

22 May 2026

Print publication:

30 June 2026, pp 133-152
- Chapter
Summary

This chapter introduces simple and multiple Bayesian linear regression models, in which parameters are treated as latent random variables. Thanks to their simplicity, these models yield closed-form posteriors. With flat priors, the posterior closely resembles the frequentist sampling distribution. We also explore the use of shrinkage priors to penalise model complexity and reduce overfitting. A Gaussian prior on the coefficients leads to ridge regression, where the MAP estimate corresponds to L2-regularised least squares. A Laplace prior yields lasso regression, based on L1 regularisation. Both are examples of regularisation techniques, but they behave differently: ridge regression shrinks all coefficients toward zero, while lasso tends to set some exactly to zero, producing a sparse model.
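
The contrast between ridge and lasso is easiest to see with a single coefficient and no intercept, where both MAP estimates have closed forms. The sketch below is a simplified setting chosen here, not the chapter's: the objectives are half the residual sum of squares plus either `(lam / 2) * beta**2` (Gaussian prior) or `lam * abs(beta)` (Laplace prior), and the data and function names are invented for illustration.

```python
import math

def sums(xs, ys):
    """Return sum of x*y and sum of x*x for a no-intercept fit."""
    return sum(x * y for x, y in zip(xs, ys)), sum(x * x for x in xs)

def ols_coef(xs, ys):
    sxy, sxx = sums(xs, ys)
    return sxy / sxx

def ridge_coef(xs, ys, lam):
    """Minimiser of 0.5 * RSS + 0.5 * lam * beta**2 (Gaussian prior, MAP)."""
    sxy, sxx = sums(xs, ys)
    return sxy / (sxx + lam)

def lasso_coef(xs, ys, lam):
    """Minimiser of 0.5 * RSS + lam * |beta| (Laplace prior, MAP): soft thresholding."""
    sxy, sxx = sums(xs, ys)
    shrunk = max(abs(sxy) - lam, 0.0)
    return math.copysign(shrunk, sxy) / sxx

# Illustrative data (not from the book).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [0.5, 0.9, 1.7, 1.9]

for lam in (0.0, 10.0, 20.0):
    print(lam, ridge_coef(xs, ys, lam), lasso_coef(xs, ys, lam))
```

Under these scalings the penalty weight equals the noise variance divided by the prior variance for ridge, and the noise variance divided by the Laplace scale for lasso. Ridge shrinks the coefficient smoothly and never reaches zero for finite `lam`, whereas lasso returns exactly zero once `lam` is at least the absolute value of the sum of `x * y`.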

11 - Classification
James Burridge, University of Portsmouth, Nick Tosh, University of Galway
Book:

Inference in Statistical Modelling and Machine Learning

Published online:

22 May 2026

Print publication:

30 June 2026, pp 192-221
- Chapter
Summary

This chapter introduces probabilistic models for supervised learning tasks where the prediction target is categorical. In binary classification, the target takes two values; models output the conditional probability of one of these, given the predictors. Logistic regression expresses the log odds as a linear function of the predictors and is fitted by minimising (regularised) cross-entropy loss. Minimising unregularised cross-entropy is equivalent to maximising likelihood, but in linearly separable cases, a maximum likelihood solution may not exist. Regularisation ensures the problem is well posed and helps control overfitting. In multiclass classification, the target can take K > 2 values, and models output a K-dimensional probability vector. Multinomial logistic regression expresses a K-dimensional score vector as a linear function of the predictors and applies the softmax function to convert scores into probabilities. k-nearest neighbours (k-NN) is a non-parametric method that estimates class probabilities from nearby training points. In high-dimensional predictor spaces, parametric models like logistic regression often outperform non-parametric ones like k-NN.
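
A minimal sketch of the score-to-probability step described above, with invented weights and hypothetical function names: a linear map produces a K-dimensional score vector, softmax converts it to probabilities, and cross-entropy scores the probability given to the true class. The final lines check that softmax with two classes reduces to the logistic (sigmoid) function of the score difference, which is the log odds.

```python
import math

def linear_scores(weights, biases, x):
    """One score per class: a linear function of the predictors."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]

def softmax(scores):
    """Convert scores to probabilities; subtracting the max avoids overflow."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, label):
    """Negative log probability assigned to the true class."""
    return -math.log(probs[label])

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Illustrative three-class model with two predictors (values are made up).
weights = [[1.0, -0.5], [0.2, 0.3], [-1.0, 0.8]]
biases = [0.0, 0.1, -0.1]
x = [2.0, 1.0]

probs = softmax(linear_scores(weights, biases, x))
loss_if_class_0 = cross_entropy(probs, 0)
loss_if_class_2 = cross_entropy(probs, 2)
print(probs, loss_if_class_0, loss_if_class_2)

# Two-class special case: softmax on (s, 0) equals sigmoid(s).
two_class = softmax([1.5, 0.0])
print(two_class[0], sigmoid(1.5))
```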

Index
James Burridge, University of Portsmouth, Nick Tosh, University of Galway
Book:

Inference in Statistical Modelling and Machine Learning

Published online:

22 May 2026

Print publication:

30 June 2026, pp 307-309
- Chapter

5 - Probabilistic Modelling
James Burridge, University of Portsmouth, Nick Tosh, University of Galway
Book:

Inference in Statistical Modelling and Machine Learning

Published online:

22 May 2026

Print publication:

30 June 2026, pp 41-56
- Chapter
Summary

In this chapter, we introduce probabilistic models of the mechanisms that generate data. Probabilistic models let us express scientific hypotheses with clear truth conditions, even when the mechanisms are inherently stochastic. A conditional probabilistic model describes how the conditional probability density of a response variable given a predictor depends on the predictor’s value. This dependence is controlled by a parameter vector, whose possible values form the model’s hypothesis space. Fitting the model means choosing a specific parameter vector based on data. One common approach is maximum likelihood, which selects parameters that make the observed data most probable. For many conditional models, maximising likelihood is equivalent to minimising the residual sum of squares.
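
The equivalence in the last sentence can be made explicit in the most familiar case. The derivation below is a sketch under an assumption not stated in the summary: responses are a parameterised function of the predictor plus independent Gaussian noise with constant variance. The symbols $f$, $\theta$, $\sigma^2$ and $\varepsilon_i$ are notation introduced here.

```latex
% Assumed model: y_i = f(x_i; \theta) + \varepsilon_i, with \varepsilon_i \sim \mathcal{N}(0, \sigma^2) independently.
\begin{align*}
\log L(\theta)
  &= \sum_{i=1}^{n} \log p(y_i \mid x_i, \theta) \\
  &= -\frac{n}{2}\log\!\left(2\pi\sigma^2\right)
     - \frac{1}{2\sigma^2}\sum_{i=1}^{n}\bigl(y_i - f(x_i;\theta)\bigr)^2 \\
  &= \text{const} - \frac{1}{2\sigma^2}\,\mathrm{RSS}(\theta),
\end{align*}
so that
\[
  \arg\max_{\theta}\, \log L(\theta) \;=\; \arg\min_{\theta}\, \mathrm{RSS}(\theta).
\]
```

The first term does not depend on $\theta$ and the factor $1/(2\sigma^2)$ is positive, so the parameter vector that maximises the likelihood is the one that minimises the residual sum of squares.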

Notation
James Burridge, University of Portsmouth, Nick Tosh, University of Galway
Book:

Inference in Statistical Modelling and Machine Learning

Published online:

22 May 2026

Print publication:

30 June 2026, p xii
- Chapter

14 - Expanding the Toolkit
James Burridge, University of Portsmouth, Nick Tosh, University of Galway
Book:

Inference in Statistical Modelling and Machine Learning

Published online:

22 May 2026

Print publication:

30 June 2026, pp 293-298
- Chapter
Summary

This final chapter sketches ways to expand the inference toolkit introduced in the book. We explore more flexible prediction functions, including splines, generalised additive models and local regression. These methods improve expressivity while controlling complexity, helping avoid overfitting. We also show how differential equations – commonly used in scientific modelling – fit naturally into the probabilistic framework by defining parameterised function families. Inherently stochastic systems can be modelled using Markov processes, allowing inference via familiar likelihood-based methods. Finally, we discuss generative language models, focusing on the GPT architecture. GPT models define probability distributions over token sequences using autoregressive neural networks trained via cross-entropy loss. Though the underlying architecture is complex, the core modelling idea – predicting the next word given prior context – builds directly on probabilistic and machine learning principles developed throughout the book.

Appendix B - Linear Algebra
James Burridge, University of Portsmouth, Nick Tosh, University of Galway
Book:

Inference in Statistical Modelling and Machine Learning

Published online:

22 May 2026

Print publication:

30 June 2026, pp 301-303
- Chapter

1 - Orientation
James Burridge, University of Portsmouth, Nick Tosh, University of Galway
Book:

Inference in Statistical Modelling and Machine Learning

Published online:

22 May 2026

Print publication:

30 June 2026, pp 1-7
- Chapter
Summary

This chapter provides an overview of the types of inference problems we address and the different approaches to solving them. We focus on risky inference: drawing conclusions, learning and making predictions in situations where certainty is impossible. Predicting a response from one or more predictors using past data is called supervised learning. When the response is continuous, the task is regression; when it is categorical, the task is classification. In unsupervised learning, there is no response variable. Instead, the goal is to find patterns or structure in data, as in density estimation, clustering and dimensionality reduction. In both supervised and unsupervised contexts, overfitting occurs when we model data in excessive detail and fail to distinguish systematic patterns from noise; underfitting occurs when our models are too simple to capture systematic patterns. Probability is a key tool for tackling risky inference, with frequentist and Bayesian interpretations motivating distinct approaches. Finally, large neural networks have proven remarkably effective in both supervised and unsupervised tasks, often avoiding overfitting despite containing billions of parameters.

4 - Interlude: Probability, Likelihood and Bayes
James Burridge, University of Portsmouth, Nick Tosh, University of Galway
Book:

Inference in Statistical Modelling and Machine Learning

Published online:

22 May 2026

Print publication:

30 June 2026, pp 36-40
- Chapter
Summary

This chapter introduces key ideas about probability, likelihood, and Bayesian inference. The likelihood of a hypothesis is the conditional probability of the data given the hypothesis. One way of using data to choose a hypothesis from a hypothesis space is to pick the hypothesis with the greatest likelihood; this is known as maximum likelihood inference. When used to choose between hypotheses that differ greatly in intrinsic plausibility, maximum likelihood inference is unreliable. Bayesian inference takes likelihoods into account but is also sensitive to the intrinsic plausibility of hypotheses.
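
A toy calculation shows how maximum likelihood and Bayesian inference can disagree when hypotheses differ greatly in prior plausibility. The scenario (a fair coin versus a rare two-headed coin, five heads observed) and the prior values are invented for illustration and are not taken from the chapter.

```python
# Probability of heads under each hypothesis (assumed toy scenario).
HEADS_PROB = {"fair": 0.5, "two_headed": 1.0}

# Prior plausibility: two-headed coins are assumed to be very rare.
prior = {"fair": 0.999, "two_headed": 0.001}

def likelihood(hypothesis, heads, flips):
    """Probability of the observed sequence given the hypothesis."""
    p = HEADS_PROB[hypothesis]
    return p ** heads * (1 - p) ** (flips - heads)

heads, flips = 5, 5
lik = {h: likelihood(h, heads, flips) for h in prior}

# Maximum likelihood ignores the prior entirely.
ml_choice = max(lik, key=lik.get)

# Bayes' rule: posterior is proportional to prior times likelihood.
unnormalised = {h: prior[h] * lik[h] for h in prior}
evidence = sum(unnormalised.values())
posterior = {h: v / evidence for h, v in unnormalised.items()}
map_choice = max(posterior, key=posterior.get)

print(ml_choice, map_choice, posterior)
```

Five heads in a row are 32 times more probable under the two-headed hypothesis, so maximum likelihood selects it, yet the posterior still favours the fair coin because the prior odds of 999 to 1 outweigh that likelihood ratio.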

Contents
Ryoma Sato, National Institute of Informatics, Chiyoda, Japan
Book:

Accelerating Deep Neural Networks

Published online:

09 May 2026

Print publication:

04 June 2026, pp v-viii
- Chapter

8 - Quantile Regression
Isabella Verdinelli, Carnegie Mellon University, Pennsylvania, Larry Wasserman, Carnegie Mellon University, Pennsylvania
Book:

All of Regression

Published online:

08 May 2026

Print publication:

04 June 2026, pp 110-116
- Chapter
Summary

Most regression methods estimate the mean of Y given X. But it can also be useful to estimate the quantiles of Y given X. This provides more information about the relationship between X and Y.
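
One standard way to estimate a quantile is to minimise the pinball (check) loss. The summary does not say which estimator the chapter uses, so this is a sketch of that general idea in the simplest case, with no predictor: the constant that minimises the summed pinball loss is a sample quantile. The data and function names are illustrative assumptions.

```python
def pinball_loss(y, q, tau):
    """Check loss: under-predictions cost tau, over-predictions cost 1 - tau."""
    diff = y - q
    return tau * diff if diff >= 0 else (tau - 1) * diff

def best_constant_quantile(ys, tau):
    """Search the observed values for the constant with the lowest total pinball loss."""
    candidates = sorted(set(ys))
    return min(candidates, key=lambda q: sum(pinball_loss(y, q, tau) for y in ys))

# Illustrative data with one extreme value.
ys = [1.0, 2.0, 3.0, 4.0, 100.0]

median_estimate = best_constant_quantile(ys, tau=0.5)
upper_estimate = best_constant_quantile(ys, tau=0.9)
mean_estimate = sum(ys) / len(ys)
print(median_estimate, upper_estimate, mean_estimate)
```

Quantile regression with predictors replaces the constant `q` with a function of X and minimises the same loss. Comparing the fitted median with the mean on skewed data like this shows the extra information about Y that quantiles can provide.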

Appendix A - Matrix Theory
Isabella Verdinelli, Carnegie Mellon University, Pennsylvania, Larry Wasserman, Carnegie Mellon University, Pennsylvania
Book:

All of Regression

Published online:

08 May 2026

Print publication:

04 June 2026, pp 189-192
- Chapter

4 - Pruning
Ryoma Sato, National Institute of Informatics, Chiyoda, Japan
Book:

Accelerating Deep Neural Networks

Published online:

09 May 2026

Print publication:

04 June 2026, pp 88-124
- Chapter
Summary

In Chapter 3, we achieved speedup by reducing the precision used throughout the model using low-precision techniques. In this chapter, we introduce pruning, a method that transforms the model at a finer granularity to accelerate computation.

Bibliography
Isabella Verdinelli, Carnegie Mellon University, Pennsylvania, Larry Wasserman, Carnegie Mellon University, Pennsylvania
Book:

All of Regression

Published online:

08 May 2026

Print publication:

04 June 2026, pp 202-205
- Chapter


2513 results in Pattern Recognition and Machine Learning
