In this chapter we describe scattering representations, a signal representation built using wavelet multiscale decompositions with a deep convolutional architecture. Its construction highlights the fundamental role of geometric stability in deep learning representations, and provides a mathematical basis to study CNNs. We describe its main mathematical properties, its applications to computer vision, speech recognition and physical sciences, as well as its extensions to Lie Groups and non-Euclidean domains. Finally, we discuss recent applications to modeling high-dimensional probability densities.
We indicated in the concluding remarks of the previous chapter that feedforward neural networks have powerful modeling capabilities, as reflected by the universal approximation theorem. In one of its versions, the theorem asserts that networks with a single hidden layer are rich enough to model almost any arbitrary function.
We encountered one instance of Bayesian inference in Chapter 50, based on the quadratic loss in the context of mean-square-error (MSE) estimation. We explained there that the optimal solution for inferring a hidden zero-mean random variable from observations of another zero-mean random variable is given by the conditional mean of the hidden variable given the observation, whose computation requires knowledge of the corresponding conditional distribution.
In supervised methods, learning is attained by training on a sufficient amount of labeled data in order to deliver reliable levels of classification. However, there are important situations in practice where data is scarce because it is either difficult or expensive to collect. This scenario leads to few-shot learning, where it is desired to train a classifier by using only a few training samples for each class.
We illustrated in Example 63.2 one limitation of linear separation surfaces by considering the XOR mapping (63.11). The example showed that certain feature spaces are not linearly separable and cannot be resolved by the perceptron algorithm. The result in the example was used to motivate one powerful approach to nonlinear separation surfaces by means of kernel methods.
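The XOR limitation can be checked numerically. The following is a minimal sketch, not the book's Example 63.2: it assumes a ±1 encoding of the inputs and labels, and it uses an explicit product feature in place of a kernel (kernel methods apply the same kind of lifting implicitly). The names `perceptron`, `X_lifted`, and the choice of lifted feature are illustrative assumptions.

```python
def perceptron(features, labels, epochs=100):
    """Plain perceptron updates; returns (weights, bias, converged)."""
    w = [0.0] * len(features[0])
    b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for h, y in zip(features, labels):
            score = sum(wi * hi for wi, hi in zip(w, h)) + b
            if y * score <= 0:  # misclassified (or on the boundary)
                w = [wi + y * hi for wi, hi in zip(w, h)]
                b += y
                mistakes += 1
        if mistakes == 0:  # a full pass with no errors: data separated
            return w, b, True
    return w, b, False


# XOR with an assumed +/-1 encoding: the label is +1 when the inputs differ.
X = [(-1, -1), (-1, 1), (1, -1), (1, 1)]
y = [-1, 1, 1, -1]

# In the original 2-D space no error-free pass is possible.
_, _, linear_ok = perceptron(X, y)

# Lift each point with the product feature x1 * x2 (illustrative choice).
X_lifted = [(x1, x2, x1 * x2) for x1, x2 in X]
w, b, lifted_ok = perceptron(X_lifted, y)

print("separable in original space:", linear_ok)
print("separable after lifting:", lifted_ok)
```

In the lifted space the label equals the negative of the third coordinate, so a single hyperplane separates the two classes and the perceptron stops after a handful of updates.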
In the immediately preceding chapters we developed several techniques for the design of linear classifiers, such as logistic regression, perceptron, and support vector machines (SVM). These algorithms are suitable for data that are linearly separable; otherwise, their performance degrades significantly. In this chapter we explain how the methods can be adjusted to determine nonlinear separation surfaces.
In most multistage decision problems, we are interested in determining the optimal strategy (i.e., the optimal actions to follow in the state–action space). Most of the algorithms described in the previous chapters focused on evaluating the state and state–action value functions for a given policy. More is needed to learn the optimal policy.
We derived in the previous two chapters procedures for assessing the performance of strategies used by agents interacting with a Markov decision process (MDP), including obtaining optimal policies. Among other methods, we discussed the policy evaluation algorithm (44.116) and the value and policy iterations (45.23) and (45.43), respectively.
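For readers who want a concrete picture of value iteration, here is a minimal sketch in its generic textbook form. It is not a transcription of listing (45.23), which is not reproduced here; the two-state MDP, the discount factor, and all names (`value_iteration`, `P`, `R`) are hypothetical.

```python
def value_iteration(P, R, gamma=0.9, tol=1e-8, max_iter=10_000):
    """P[s][a] is a list of (next_state, probability); R[s][a] is the reward."""
    n_states = len(P)

    def q_value(V, s, a):
        # One-step lookahead: immediate reward plus discounted future value.
        return R[s][a] + gamma * sum(p * V[s2] for s2, p in P[s][a])

    V = [0.0] * n_states
    for _ in range(max_iter):
        V_new = [max(q_value(V, s, a) for a in range(len(P[s])))
                 for s in range(n_states)]
        gap = max(abs(x - y) for x, y in zip(V, V_new))
        V = V_new
        if gap < tol:
            break

    # Greedy policy with respect to the converged value function.
    policy = [max(range(len(P[s])), key=lambda a, s=s: q_value(V, s, a))
              for s in range(n_states)]
    return V, policy


# Hypothetical MDP: action 0 stays put, action 1 switches state.
# Staying in state 1 pays a reward of 1; everything else pays 0.
P = [
    [[(0, 1.0)], [(1, 1.0)]],
    [[(1, 1.0)], [(0, 1.0)]],
]
R = [[0.0, 0.0], [1.0, 0.0]]

V, policy = value_iteration(P, R)
print(V, policy)
```

For this toy problem the values approach 9 and 10 for states 0 and 1, and the greedy policy moves from state 0 to state 1 and then stays there.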
In this chapter, we describe a tensor network (TN) based common language established between machine learning and many-body physics, which allows for bidirectional contributions. By showing that many-body wave functions are structurally equivalent to mappings of convolutional and recurrent networks, we bring forth quantum entanglement measures as natural quantifiers of dependencies modeled by such networks. Accordingly, we propose a novel entanglement-based deep learning design scheme that sheds light on the success of popular architectural choices made by deep learning practitioners and suggests new practical prescriptions. In the other direction, we construct TNs corresponding to deep recurrent and convolutional networks. This allows us to theoretically demonstrate that these architectures are powerful enough to represent highly entangled quantum systems polynomially more efficiently than previously employed architectures. We thus provide theoretical motivation to shift neural-network-based wave function representations closer to state-of-the-art deep learning architectures.
Principal component analysis (PCA) is a formidable tool for dimensionality reduction. Given a collection of feature vectors in a high-dimensional space, PCA replaces them by lower-dimensional vectors.
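A minimal sketch of this reduction, assuming NumPy is available and computing the principal directions from the SVD of the centered data. The function name `pca_reduce`, the synthetic data, and the choice of one retained component are illustrative assumptions.

```python
import numpy as np


def pca_reduce(H, k):
    """Project the rows of H (one feature vector per row) onto the top-k principal directions."""
    mean = H.mean(axis=0)
    Hc = H - mean                      # center the data
    _, _, Vt = np.linalg.svd(Hc, full_matrices=False)
    W = Vt[:k].T                       # columns are orthonormal principal directions
    Z = Hc @ W                         # reduced representation, one row per sample
    return Z, W, mean


rng = np.random.default_rng(0)

# Synthetic data: 200 points in 3-D lying close to a single line.
t = rng.normal(size=(200, 1))
H = t @ np.array([[2.0, 1.0, -1.0]]) + 0.01 * rng.normal(size=(200, 3))

Z, W, mean = pca_reduce(H, k=1)

# Map the reduced vectors back to 3-D and measure what was lost.
H_hat = Z @ W.T + mean
rel_err = np.linalg.norm(H - H_hat) / np.linalg.norm(H)
print(Z.shape, rel_err)
```

Because the synthetic data are nearly one-dimensional, a single component reconstructs them with a small relative error; real data typically need more components.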
Markov decision processes (MDPs) are at the core of reinforcement learning theory. Similar to Markov chains, MDPs involve an underlying Markovian process that evolves from one state to another, with the probability of visiting a new state being dependent on the most recent state. Different from Markov chains, MDPs involve both agents and actions taken by these agents. As a result, the next state is dependent on which action was chosen at the state preceding it. MDPs therefore provide a powerful framework to explore state spaces and to learn from actions and rewards.
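The dependence of the next state on both the current state and the chosen action can be sketched as follows. The transition probabilities, rewards, and policy are made up for illustration, and the names `step` and `rollout` are hypothetical.

```python
import random

# Hypothetical MDP with 2 states and 2 actions.
# P[s][a][s2] is the probability of landing in s2 after taking action a in state s.
P = [
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.7, 0.3], [0.05, 0.95]],
]
# R[s][a] is the reward collected for taking action a in state s.
R = [[0.0, -1.0], [1.0, 0.0]]


def step(state, action, rng):
    """Sample the next state from the row selected by (state, action)."""
    probs = P[state][action]
    next_state = rng.choices(range(len(probs)), weights=probs)[0]
    return next_state, R[state][action]


def rollout(policy, start, horizon, rng):
    """Follow a policy for a fixed number of steps, recording (s, a, r, s2)."""
    s = start
    trajectory = []
    for _ in range(horizon):
        a = policy(s)
        s2, r = step(s, a, rng)
        trajectory.append((s, a, r, s2))
        s = s2
    return trajectory


rng = random.Random(0)
trajectory = rollout(lambda s: 1 - s, start=0, horizon=5, rng=rng)
for transition in trajectory:
    print(transition)
```

Removing the action index from `P` would reduce this to an ordinary Markov chain, which is the distinction the paragraph above draws.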