In this chapter we study the linear-Gaussian setting, where the forward model is linear and both the prior on 𝑢 and the distribution of the observation noise 𝜂 are Gaussian. This setting is highly amenable to analysis and arises frequently in applications. Moreover, as we will see throughout these notes, many methods employed in nonlinear or non-Gaussian settings build on ideas from the linear-Gaussian case by performing linearization or invoking Gaussian approximations.
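As a brief illustration of why this setting is tractable, the standard posterior formulas can be written out. The symbols below are assumptions made for this sketch and are not taken from the abstract: the linear forward model is a matrix $A$, the prior is $\mathcal{N}(\hat{m}, \hat{C})$, and the noise is $\mathcal{N}(0, \Gamma)$, independent of 𝑢. Under these assumptions the posterior is again Gaussian, with the mean and covariance characterized as follows.

```latex
\begin{aligned}
y &= A u + \eta, \qquad u \sim \mathcal{N}(\hat{m}, \hat{C}), \qquad \eta \sim \mathcal{N}(0, \Gamma), \\
u \mid y &\sim \mathcal{N}(m, C), \\
C^{-1} &= A^{\top} \Gamma^{-1} A + \hat{C}^{-1}, \\
C^{-1} m &= A^{\top} \Gamma^{-1} y + \hat{C}^{-1} \hat{m}.
\end{aligned}
```

The posterior precision is the sum of the data precision and the prior precision, which is the structure that linearization-based and Gaussian-approximation methods try to reuse.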
In practical applications, we often try many different model classes (such as SVM, neural networks, decision trees), and we want to select the best model to achieve the smallest test loss. This problem is referred to as model selection. This chapter studies techniques used to analyze model selection problems.
In this chapter we describe the Extended Kalman Filter (ExKF) and the Ensemble Kalman Filter (EnKF). The ExKF approximates the predictive covariance by linearization, while the EnKF approximates it by the empirical covariance of a collection of particles. The ExKF is a provably accurate approximation of the filtering distribution if the dynamics are approximately linear and small noise is present in both signal and data, in which case the filtering distribution is well approximated by a Gaussian.
This chapter introduces the concepts of covering numbers and uniform convergence, and uses them to analyze the generalization of machine learning algorithms.
In the standard multiarmed bandit problem, one observes a fixed number of arms. To achieve optimal regret bounds, one estimates confidence intervals of the arms by counting. In the contextual bandit problem, one observes side information for each arm, which can be used as features for more accurate confidence interval estimation. This chapter studies contextual bandit problems with both linear and nonlinear models.
In this chapter we explore the properties of Bayesian inversion from the perspective of an optimization problem which corresponds to maximizing the posterior probability; that is, to finding a maximum a posteriori (MAP) estimator, or mode of the posterior distribution. We demonstrate the properties of the point estimator resulting from this optimization problem, showing its positive and negative attributes, the latter motivating our work in the following three chapters. We also introduce, and study, basic gradient-based optimization algorithms.
In this chapter we introduce Monte Carlo sampling and importance sampling. These are two general techniques for estimating expectations with respect to a given pdf π. Monte Carlo generates independent samples from π and combines them with equal weights, whilst importance sampling uses independent samples, weighted appropriately, from a different distribution. In quantifying the error in Monte Carlo and importance sampling, we will use a distance on random probability measures that reduces to total variation in the case of deterministic probability measures; and we will introduce the χ² divergence.
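The contrast between the two estimators can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the chapter's own construction: the target π is taken to be a standard normal, the proposal is a wider normal, the test function is f(x) = x², and the importance weights are self-normalized (one common choice). All function names are hypothetical.

```python
import math
import random


def normal_pdf(x, mean=0.0, std=1.0):
    """Density of a one-dimensional Gaussian."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def monte_carlo(f, sample_target, n, rng):
    """Average f over n independent samples from the target, equal weights."""
    return sum(f(sample_target(rng)) for _ in range(n)) / n


def importance_sampling(f, target_pdf, proposal_pdf, sample_proposal, n, rng):
    """Self-normalized importance sampling with samples from the proposal."""
    xs = [sample_proposal(rng) for _ in range(n)]
    # Weight each sample by the ratio of target density to proposal density.
    ws = [target_pdf(x) / proposal_pdf(x) for x in xs]
    total = sum(ws)
    return sum(w * f(x) for w, x in zip(ws, xs)) / total


def square(x):
    return x * x


rng = random.Random(0)  # fixed seed so the run is reproducible

# Assumed target: N(0, 1). Assumed proposal: N(0, 2^2).
mc_estimate = monte_carlo(square, lambda r: r.gauss(0.0, 1.0), 20000, rng)
is_estimate = importance_sampling(
    square,
    lambda x: normal_pdf(x, 0.0, 1.0),
    lambda x: normal_pdf(x, 0.0, 2.0),
    lambda r: r.gauss(0.0, 2.0),
    20000,
    rng,
)
print(mc_estimate, is_estimate)
```

Both estimates approximate the second moment of the standard normal, which equals 1. The proposal here is deliberately wider than the target so that the weights stay bounded; a proposal with lighter tails than the target would generally give much less reliable estimates.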
For bandit problems, we consider the so-called partial information setting, where only the outcome of the action taken is observed. In this chapter, we will investigate some bandit algorithms that are commonly used.
In this chapter we introduce the Bayesian approach to inverse problems in which the unknown parameter and the observed data are viewed as random variables. In this probabilistic formulation, the solution of the inverse problem is the posterior distribution on the parameter given the data. We will show that the Bayesian formulation leads to a form of well-posedness: small perturbations of the forward model or the observed data translate into small perturbations of the posterior distribution. Well-posedness requires a notion of distance between probability measures. We introduce the total variation and Hellinger distances, giving characterizations of them, and bounds relating them, that will be used throughout these notes. We prove well-posedness in the Hellinger distance.
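For intuition, the two distances can be computed for discrete distributions. The sketch below assumes one common normalization, which may differ from the chapter's: total variation as half the ℓ¹ difference, and Hellinger as the square root of half the squared ℓ² difference of the square-rooted probabilities. Under that normalization the two are related by d_TV/√2 ≤ d_H ≤ √d_TV. The function names and the example vectors are hypothetical.

```python
import math


def total_variation(p, q):
    """Total variation distance: half the sum of absolute differences."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def hellinger(p, q):
    """Hellinger distance, normalized so that it lies in [0, 1]."""
    squared = 0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(squared)


# Two illustrative probability vectors on three outcomes.
p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]

d_tv = total_variation(p, q)
d_h = hellinger(p, q)

# Check the two-sided bound relating the distances (assumed normalization).
lower_ok = d_tv / math.sqrt(2.0) <= d_h
upper_ok = d_h <= math.sqrt(d_tv)
print(d_tv, d_h, lower_ok, upper_ok)
```

Because each distance bounds the other up to a power, a well-posedness statement proved in one of them carries over to the other, with a possibly different rate.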
The aim of these notes is to provide a clear and concise mathematical introduction to the subjects of Inverse Problems and Data Assimilation, and their interrelations, together with bibliographic pointers to literature in this area that goes into greater depth. The target audiences are advanced undergraduates and beginning graduate students in the mathematical sciences, together with researchers in the sciences and engineering who are interested in the systematic underpinnings of methodologies widely used in their disciplines.
In Chapter 14, we introduced the basic definitions of online learning, and analyzed a number of first-order algorithms. In this chapter, we consider more advanced online learning algorithms that inherently exploit second-order information.