The idea of a reproducing kernel Hilbert space (RKHS) was popularized in machine learning through support vector machines (SVMs) in the 1990s. This chapter presents an overview of RKHS kernel methods and their theoretical analysis.
In practical applications, we typically solve the empirical risk minimization problem using optimization methods such as stochastic gradient descent (SGD). Such an algorithm searches for a model parameter along a path that does not cover the entire model space. Empirical process analysis may therefore not be the best tool for analyzing the performance of specific computational procedures. In recent years, another theoretical tool, which we may refer to as stability analysis, has been proposed to analyze such computational procedures.
In this chapter, we focus on additive models, which can be regarded as sums of base models. The goal of an additive model is to find a combination of models such that the combined model is more accurate than the base models.
This chapter considers lower bounds for empirical processes and statistical estimation problems. We know that upper bounds for empirical processes and empirical risk minimization can be obtained from the covering number analysis. We show that, under suitable conditions, lower bounds can also be obtained using covering numbers.
In online learning, we consider a learning model that is different from that of supervised learning, in that we make predictions sequentially and obtain feedback after predictions are made. In this chapter, we introduce this learning model as well as some first-order online learning algorithms.
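The sequential protocol described above (predict, then receive feedback, then update) can be illustrated with a minimal sketch of online gradient descent, one common first-order method. The scalar linear model, the squared loss, and the names `online_gradient_descent`, `stream`, and `learning_rate` are assumptions made for illustration; they are not taken from the chapter.

```python
def online_gradient_descent(stream, learning_rate=0.1):
    """Run online gradient descent on a stream of (x, y) pairs.

    Assumed setup (illustrative only): scalar linear predictor w * x
    with squared loss (w * x - y) ** 2.
    """
    w = 0.0
    losses = []
    for x, y in stream:
        prediction = w * x             # predict before the label is revealed
        error = prediction - y         # feedback arrives after the prediction
        losses.append(error ** 2)      # loss suffered in this round
        w -= learning_rate * 2.0 * error * x  # first-order (gradient) update
    return w, losses


# Hypothetical stream generated by the rule y = 2 * x.
stream = [(x, 2.0 * x) for _ in range(70) for x in (1.0, -1.0, 0.5)]
w, losses = online_gradient_descent(stream)
```

In this toy stream the per-round loss shrinks as `w` approaches the generating slope; the quantity usually studied in online learning is the cumulative loss over all rounds, here `sum(losses)`.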
This chapter describes some theoretical results for reinforcement learning; the analysis may be regarded as a natural generalization of the techniques introduced for contextual bandit problems. We consider both model-free and model-based algorithms, and introduce structural results for reinforcement learning that lead to algorithms with provably efficient statistical complexity.
This chapter demonstrates the use of optimization, namely the 3DVAR and 4DVAR methodologies, to obtain information from the filtering and smoothing distributions. We emphasize that the methods we present in this chapter do not provide approximations of the filtering and smoothing distributions; they simply provide estimates of the signal, given data, in the filtering (on-line) and smoothing (off-line) data scenarios.
This chapter is devoted to the particle filter, a method that approximates the filtering distribution by a sum of Dirac masses. Particle filters provably converge to the filtering distribution as the number of particles, and hence the number of Dirac masses, approaches infinity. We focus on the bootstrap particle filter (BPF), also known as sequential importance resampling; it is linked to the material on Monte Carlo and importance sampling described in Chapter 5.
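The propagate, weight, and resample cycle of a bootstrap particle filter can be sketched as follows. This is a minimal illustration under assumptions that do not come from the chapter: a scalar linear-Gaussian model, multinomial resampling at every step, and the hypothetical names `bootstrap_particle_filter`, `a`, `state_std`, and `obs_std`.

```python
import math
import random


def bootstrap_particle_filter(observations, n_particles, rng,
                              a=0.9, state_std=1.0, obs_std=0.1):
    """Return the estimated filtering mean after each observation.

    Assumed toy model (illustrative only):
        v_{j+1} = a * v_j + N(0, state_std^2)
        y_{j+1} = v_{j+1} + N(0, obs_std^2)
    """
    particles = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]
    means = []
    for y in observations:
        # 1. Propagate every particle through the (assumed) dynamics.
        particles = [a * v + rng.gauss(0.0, state_std) for v in particles]
        # 2. Weight by the observation likelihood, in log space for stability.
        log_w = [-0.5 * ((y - v) / obs_std) ** 2 for v in particles]
        max_log_w = max(log_w)
        weights = [math.exp(lw - max_log_w) for lw in log_w]
        total = sum(weights)
        weights = [w / total for w in weights]
        # The weighted particles form a sum of Dirac masses; take its mean.
        means.append(sum(w * v for w, v in zip(weights, particles)))
        # 3. Resample with replacement according to the weights.
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return means


rng = random.Random(1)
means = bootstrap_particle_filter([2.0] * 10, n_particles=500, rng=rng)
```

Because the assumed observation noise is small relative to the state noise, the estimated filtering means stay close to the observed value in this example.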
In this chapter we study Markov chain Monte Carlo (MCMC), a methodology that delivers approximate samples from a given target distribution π. The methodology applies to settings in which π is the posterior distribution in (1.2), but it is also widely used in numerous applications beyond Bayesian inference. As with Monte Carlo and importance sampling, MCMC may be viewed as approximating the target distribution by a sum of Dirac masses, thus allowing the approximation of expectations with respect to the target. Implementation of Monte Carlo presupposes that independent samples from the target can be obtained. Importance sampling and MCMC bypass this restrictive assumption: importance sampling by appropriately weighting independent samples from a proposal distribution, and MCMC by drawing correlated samples from a Markov kernel that has the target as invariant distribution.
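As a minimal sketch of drawing correlated samples from a Markov kernel that leaves the target invariant, the code below implements random-walk Metropolis, one standard MCMC method. The standard normal target, the step size, and the names `random_walk_metropolis` and `log_target` are assumptions chosen for illustration, not details from the chapter.

```python
import math
import random


def random_walk_metropolis(log_target, x0, n_steps, step_size, rng):
    """Generate a chain of correlated samples targeting exp(log_target).

    log_target only needs to be known up to an additive constant.
    """
    x = x0
    log_p = log_target(x)
    samples = []
    for _ in range(n_steps):
        proposal = x + rng.gauss(0.0, step_size)   # symmetric proposal
        log_p_new = log_target(proposal)
        # Accept with probability min(1, pi(proposal) / pi(x)).
        accept_prob = math.exp(min(0.0, log_p_new - log_p))
        if rng.random() < accept_prob:
            x, log_p = proposal, log_p_new
        samples.append(x)                          # repeat x on rejection
    return samples


# Assumed target: standard normal, log density -x^2 / 2 up to a constant.
rng = random.Random(0)
samples = random_walk_metropolis(lambda x: -0.5 * x * x, 0.0, 20000, 1.0, rng)

# Equal-weight Dirac masses at the samples approximate expectations.
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
```

The empirical mean and variance of the chain approximate those of the assumed target. Since consecutive samples are correlated, the chain carries less information than the same number of independent draws.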
In this chapter we again adopt an optimization approach to the problem of Bayesian inference, but instead seek a Gaussian distribution p = N(μ, Σ) that minimizes some distance-like measure from the posterior π^y(u). However, rather than using a metric to define the distance, we use the Kullback–Leibler divergence introduced in Section 4.1.
This chapter presents some known theoretical results for neural networks, including some theoretical analysis that has been developed recently. We show that neural networks can be analyzed using kernel methods and L1 regularization methods that have been studied in previous chapters.
In the sequential estimation problems investigated in the next few chapters, we observe a sequence of random variables that are not independent. This requires a generalization of sums of independent variables, called martingales. This chapter studies probability inequalities and uniform convergence for martingales, which are essential in analyzing sequential statistical estimation problems.