After a discussion of best programming practices and a brief summary of the basic features of the Python programming language, Chapter 1 discusses several modern idioms. These include list comprehensions, dictionaries, the for-else idiom, and other ways to iterate Pythonically. Throughout, the focus is on programming in a way that feels natural, i.e., working with the language rather than against it. The chapter also includes basic information on how to make figures using Matplotlib, as well as advice on how to use the NumPy library effectively, with an emphasis on slicing, vectorization, and broadcasting. The chapter is rounded out by a physics project, which studies the visualization of electric fields, and a problem set.
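As a flavor of the idioms in question, the short sketch below (illustrative only, not taken from the book) combines a list comprehension, the for-else idiom, and NumPy broadcasting:

```python
import numpy as np

# List comprehension and the for-else idiom (illustrative, not from the book).
squares = [n**2 for n in range(10) if n % 2 == 0]

for n in squares:
    if n > 50:
        print("found a square above 50:", n)
        break
else:
    print("no square above 50")  # runs only if the loop finished without a break

# NumPy broadcasting: evaluate a quantity on a grid without explicit Python loops.
xs = np.linspace(-1.0, 1.0, 5)
ys = np.linspace(-1.0, 1.0, 5)
r2 = xs[:, None]**2 + ys[None, :]**2   # shapes (5, 1) + (1, 5) -> (5, 5) by broadcasting
print(r2.shape)
```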
Extracting the latent structures underlying complex nonlinear local and nonlocal flows is essential for their analysis and modeling. In this Element the authors attempt to provide a consistent framework through Koopman theory and its popular discrete approximation, dynamic mode decomposition (DMD). They investigate the conditions for performing appropriate linearization, dimensionality reduction, and representation of flows in a highly general setting. The essential elements of this framework are Koopman eigenfunctions (KEFs), for which existence conditions are formulated by viewing the dynamics as a curve in state space. These conditions lay the foundations for system reconstruction, global controllability, and observability for nonlinear dynamics. They examine the limitations of DMD through the analysis of Koopman theory and propose a new mode decomposition technique based on the typical time profile of the dynamics.
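For readers unfamiliar with DMD, the minimal NumPy sketch below implements the standard exact-DMD recipe (a textbook formulation, not the Element's own code): project the snapshot data onto a low-rank subspace, fit a linear operator there, and recover its eigenvalues and modes.

```python
import numpy as np

def dmd(X, Xp, r):
    """Minimal exact DMD: fit a rank-r linear operator A with X' ~ A X."""
    U, s, Vh = np.linalg.svd(X, full_matrices=False)
    U, s, V = U[:, :r], s[:r], Vh[:r].conj().T
    A_tilde = U.conj().T @ Xp @ V / s          # r x r projection of A
    eigvals, W = np.linalg.eig(A_tilde)
    modes = (Xp @ V / s) @ W                   # exact DMD modes
    return eigvals, modes

# Toy example: snapshots of a damped oscillation sampled in time.
t = np.linspace(0, 10, 101)
data = np.array([np.exp(-0.1 * t) * np.cos(2 * t),
                 np.exp(-0.1 * t) * np.sin(2 * t)])
eigvals, modes = dmd(data[:, :-1], data[:, 1:], r=2)
print(np.log(eigvals) / (t[1] - t[0]))         # continuous-time eigenvalues, about -0.1 +/- 2i
```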
In this Element, the authors consider fully discretized p-Laplacian problems (evolution, boundary value, and variational problems) on graphs. The motivation for nonlocal continuum limits comes from the quest to understand collective dynamics in large ensembles of interacting particles, which is a fundamental problem in nonlinear science, with applications ranging from biology to physics, chemistry, and computer science. Using the theory of graphons, the authors give a unified treatment of all the above problems and establish the continuum limit for each of them, together with non-asymptotic convergence rates. They also describe an algorithmic framework based on proximal splitting to solve these discrete problems on graphs.
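To fix ideas, the sketch below takes one explicit Euler step of a discrete p-Laplacian evolution on a small ring graph; the sign and weighting conventions are one common choice and are not taken from the Element.

```python
import numpy as np

def graph_p_laplacian(W, u, p):
    """One common discrete graph p-Laplacian (a sketch, not the Element's code):
    (Delta_p u)_i = sum_j W_ij |u_j - u_i|^(p-2) (u_j - u_i)."""
    diff = u[None, :] - u[:, None]             # diff[i, j] = u_j - u_i
    return np.sum(W * np.abs(diff) ** (p - 2) * diff, axis=1)

# Explicit Euler step of du/dt = Delta_p u on a ring graph with unit weights.
n, p, dt = 6, 3.0, 0.05
W = np.zeros((n, n))
for i in range(n):
    W[i, (i + 1) % n] = W[(i + 1) % n, i] = 1.0
u = np.random.rand(n)
u_next = u + dt * graph_p_laplacian(W, u, p)
```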
Normalizing flows, diffusion normalizing flows, and variational autoencoders are powerful generative models. This Element provides a unified framework to handle these approaches via Markov chains. The authors consider stochastic normalizing flows as a pair of Markov chains fulfilling some properties, and show how many state-of-the-art models for data generation fit into this framework. Indeed, numerical simulations show that including stochastic layers improves the expressivity of the network and allows for generating multimodal distributions from unimodal ones. The Markov chain point of view enables the coupling of deterministic layers, such as invertible neural networks, with stochastic layers, such as Metropolis-Hastings layers, Langevin layers, variational autoencoders, and diffusion normalizing flows, in a mathematically sound way. The authors' framework establishes a useful mathematical tool to combine the various approaches.
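As an illustration of what a stochastic layer can look like, the sketch below implements a simple unadjusted Langevin step applied to a toy Gaussian target; the step size and target are illustrative assumptions, not the authors' setup.

```python
import numpy as np

def langevin_layer(x, score, step, rng):
    """One unadjusted Langevin step: x <- x + step * score(x) + sqrt(2*step) * noise.
    A sketch of the kind of stochastic layer meant here, not the authors' implementation."""
    return x + step * score(x) + np.sqrt(2.0 * step) * rng.standard_normal(x.shape)

# Toy target: standard Gaussian, whose score is -x; start from a shifted unimodal sample.
rng = np.random.default_rng(0)
x = rng.standard_normal((1000, 2)) + 5.0
for _ in range(200):
    x = langevin_layer(x, lambda x: -x, step=0.05, rng=rng)
print(x.mean(axis=0))   # drifts toward the target mean (about 0)
```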
This chapter summarizes recent advances in the analysis of the optimization landscape of neural network training. We first review classical results for linear networks trained with a squared loss and without regularization. Such results show that, under certain conditions on the input-output data, spurious local minima are guaranteed not to exist, i.e., critical points are either saddle points or global minima. Moreover, the globally optimal weights can be found by factorizing certain matrices obtained from the input-output covariance matrices. We then review recent results for deep networks with parallel structure, positively homogeneous network mapping and regularization, and trained with a convex loss. Such results show that the non-convex objective on the weights can be lower-bounded by a convex objective on the network mapping. Moreover, when the network is sufficiently wide, local minima of the non-convex objective that satisfy a certain condition yield global minima of both the non-convex and convex objectives, and there is always a non-increasing path to a global minimizer from any initialization.
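The role of the covariance matrices in the linear case can be made concrete with a small sketch. The reduced-rank construction below follows the classical Baldi and Hornik style analysis for a two-layer linear network with squared loss, under standard assumptions (full-rank input covariance, distinct eigenvalues); it is not code from this chapter.

```python
import numpy as np

# Global optimum of a width-r two-layer linear network (assumed setting, see lead-in):
# the optimal product W2 @ W1 is the rank-r reduced-rank regression solution built from
# the input-output covariance matrices.
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 10))            # inputs, one sample per row
Y = X @ rng.standard_normal((10, 4)) + 0.1 * rng.standard_normal((500, 4))

Sxx = X.T @ X / len(X)                        # input covariance
Syx = Y.T @ X / len(X)                        # output-input covariance
W_ols = Syx @ np.linalg.inv(Sxx)              # unconstrained least-squares map

r = 2
M = Syx @ np.linalg.inv(Sxx) @ Syx.T          # matrix whose top-r eigenvectors define the projector
eigvals, eigvecs = np.linalg.eigh(M)
U_r = eigvecs[:, -r:]                         # top-r eigenvectors (eigh sorts ascending)
W_star = U_r @ U_r.T @ W_ols                  # rank-r globally optimal linear map
```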
In this chapter we discuss the algorithmic and theoretical underpinnings of layer-wise relevance propagation (LRP), apply the method to a complex model trained for the task of visual question answering (VQA), and demonstrate that it produces meaningful explanations, revealing interesting details about the model’s reasoning. We conclude the chapter by commenting on the general limitations of current explanation techniques and interesting future directions.
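For orientation, the sketch below applies the basic LRP epsilon rule to a single dense layer; the toy weights are illustrative and unrelated to the VQA model discussed in the chapter.

```python
import numpy as np

def lrp_dense(a, W, b, R_out, eps=1e-6):
    """LRP epsilon rule for one dense layer z = a @ W + b (a generic sketch of the rule):
    redistribute the output relevance R_out onto the input activations a in proportion to
    each input's contribution a_j * W_jk to the pre-activation z_k."""
    z = a @ W + b                              # pre-activations of the layer
    s = R_out / (z + eps * np.sign(z))         # stabilized ratio R_k / z_k
    return a * (W @ s)                         # R_j = a_j * sum_k W_jk * s_k

# Toy layer: 3 inputs, 2 outputs; relevance is conserved up to the epsilon/bias terms.
a = np.array([1.0, 2.0, 0.5])
W = np.array([[0.3, -0.2], [0.1, 0.4], [-0.5, 0.2]])
b = np.zeros(2)
R_in = lrp_dense(a, W, b, R_out=np.array([1.0, 0.0]))
print(R_in, R_in.sum())
```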
Over the last few decades, sparsity has become a driving force in the development of new and better algorithms in signal and image processing. In the context of the recent deep learning zenith, a pivotal work by Papyan et al. showed that deep neural networks can be interpreted and analyzed as pursuit algorithms seeking sparse representations of signals belonging to a multilayer synthesis sparse model. In this chapter we review recent contributions showing that this observation is correct but incomplete, in the sense that such a model provides a symbiotic mixture of coupled synthesis and analysis sparse priors. We make this observation precise and use it to expand on uniqueness guarantees and stability bounds for the pursuit of multilayer sparse representations. We then explore a convex relaxation of the resulting pursuit and derive efficient optimization algorithms to approximate its solution. Importantly, we deploy these algorithms in a supervised learning formulation that generalizes feed-forward convolutional neural networks into recurrent ones, improving their performance without increasing the number of parameters of the model.
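A single-layer instance of such a convex relaxation is the Lasso, which can be approximated with ISTA; the sketch below is standard textbook ISTA, shown only as the basic building block, not the chapter's multilayer algorithm.

```python
import numpy as np

def ista(D, x, lam, n_iter=200):
    """ISTA for the Lasso relaxation of sparse pursuit:
    min_z 0.5 * ||x - D z||^2 + lam * ||z||_1."""
    L = np.linalg.norm(D, 2) ** 2              # Lipschitz constant of the gradient
    z = np.zeros(D.shape[1])
    for _ in range(n_iter):
        g = z - (D.T @ (D @ z - x)) / L        # gradient step on the data term
        z = np.sign(g) * np.maximum(np.abs(g) - lam / L, 0.0)   # soft threshold
    return z

# Toy example: recover a sparse code from a column-normalized random dictionary.
rng = np.random.default_rng(0)
D = rng.standard_normal((20, 50)); D /= np.linalg.norm(D, axis=0)
z_true = np.zeros(50); z_true[[3, 17, 41]] = [1.0, -2.0, 1.5]
x = D @ z_true + 0.01 * rng.standard_normal(20)
z_hat = ista(D, x, lam=0.05)
```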
This chapter provides theoretical insights into why and how deep learning can generalize well despite its large capacity, complexity, possible algorithmic instability, non-robustness, and sharp minima, responding to an open question in the literature. We also discuss approaches to providing non-vacuous generalization guarantees for deep learning. On the basis of the theoretical observations, we propose new open problems.
We give a concise review of the dynamical systems and control theory approaches to deep learning. From the viewpoint of dynamical systems, the back-propagation algorithm in deep learning becomes a simple consequence of the variational equations of ODEs. From the viewpoint of control theory, deep learning is a case of mean-field control in that all the agents share the same control. As an application, we discuss a new class of algorithms for deep learning based on Pontryagin's maximum principle in control theory.
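The correspondence can be made concrete in a few lines: a residual network is forward Euler on an ODE, and back-propagation discretizes the adjoint (variational) equation. The sketch below uses an illustrative tanh vector field and loss, not a construction taken from the chapter.

```python
import numpy as np

def f(x, theta):
    return np.tanh(theta @ x)                  # illustrative residual vector field

def df_dx(x, theta):
    return (1 - np.tanh(theta @ x) ** 2)[:, None] * theta   # Jacobian of f w.r.t. x

h, K, d = 0.1, 5, 3
rng = np.random.default_rng(0)
thetas = [rng.standard_normal((d, d)) for _ in range(K)]

# Forward pass: forward Euler on dx/dt = f(x, theta), storing the trajectory.
xs = [rng.standard_normal(d)]
for k in range(K):
    xs.append(xs[-1] + h * f(xs[-1], thetas[k]))

# Backward pass: the discrete adjoint a_k = dL/dx_k for the toy loss L = 0.5 * ||x_K||^2,
# i.e. back-propagation as the discretized variational equation da/dt = -a df/dx.
a = xs[-1].copy()
for k in reversed(range(K)):
    a = a + h * (df_dx(xs[k], thetas[k]).T @ a)    # a_k = (I + h df/dx)^T a_{k+1}
```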
We take a look at the universal approximation question for stochastic feedforward neural networks. In contrast with deterministic networks, which represent mappings from inputs to outputs, stochastic networks represent mappings from inputs to probability distributions over outputs. Even if the sets of inputs and outputs are finite, the set of stochastic mappings is continuous. Moreover, the values of the output variables may be correlated, which requires that their values be computed jointly. A prominent class of stochastic feedforward networks are deep belief networks. We discuss their representational power in terms of compositions of the Markov kernels expressed by the layers of the network. We investigate different types of shallow and deep architectures, and the minimal number of layers and units that are necessary and sufficient for the network to approximate any stochastic mapping arbitrarily well. The discussion builds on notions of probability sharing, focusing on the case of binary variables and sigmoid units. After reviewing existing results, we present a detailed analysis of shallow networks and a unified analysis for a variety of deep networks.
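As a concrete picture of a layer acting as a Markov kernel, the sketch below samples binary outputs from sigmoid units; it is a generic stochastic feedforward layer, not a deep belief network implementation.

```python
import numpy as np

def sigmoid_layer_kernel(W, b, x, rng):
    """One layer of a stochastic feedforward network with binary sigmoid units:
    given input x, each output unit is 1 with probability sigmoid(W x + b), sampled
    independently. Composing layers composes the corresponding Markov kernels."""
    p = 1.0 / (1.0 + np.exp(-(W @ x + b)))
    return (rng.random(p.shape) < p).astype(int)

rng = np.random.default_rng(0)
W1, b1 = rng.standard_normal((4, 3)), rng.standard_normal(4)
W2, b2 = rng.standard_normal((2, 4)), rng.standard_normal(2)
x = np.array([1, 0, 1])
h = sigmoid_layer_kernel(W1, b1, x, rng)       # hidden binary sample
y = sigmoid_layer_kernel(W2, b2, h, rng)       # output binary sample
```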
Deep generative models have recently been proposed as modular data-driven priors to solve inverse problems. Linear inverse problems involve the reconstruction of an unknown signal (e.g. a tomographic image) from an underdetermined system of noisy linear measurements. Most results in the literature require that the reconstructed signal has some known structure, e.g. that it is sparse in some known basis (usually Fourier or wavelet). Such prior assumptions can be replaced with pre-trained deep generative models (e.g. generative adversarial networks (GANs) and variational autoencoders (VAEs)) with significant performance gains. This chapter surveys this rapidly evolving research area and includes empirical and theoretical results in compressed sensing for deep generative models.
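The basic reconstruction procedure can be sketched in a few lines: fix the generator and run gradient descent over its latent variable to fit the measurements. The toy generator, measurement matrix, and step size below are illustrative assumptions, not a trained GAN or VAE.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, k = 100, 30, 5                             # signal dim, measurements, latent dim
B = rng.standard_normal((n, k)) / np.sqrt(n)

def G(z):
    return np.tanh(B @ z)                        # toy "generator" standing in for a decoder

A = rng.standard_normal((m, n)) / np.sqrt(m)     # random measurement matrix
z_true = rng.standard_normal(k)
y = A @ G(z_true)                                # noiseless measurements of the unknown signal

# Latent-space search: gradient descent on 0.5 * ||A G(z) - y||^2 over z.
z = np.zeros(k)
for _ in range(500):
    r = A @ G(z) - y
    jac = A @ ((1 - np.tanh(B @ z) ** 2)[:, None] * B)   # Jacobian of A G(z) w.r.t. z
    z -= 0.05 * jac.T @ r
print(np.linalg.norm(A @ G(z) - y))              # measurement residual after the latent search
```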
We describe the new field of the mathematical analysis of deep learning. This field emerged around a list of research questions that were not answered within the classical framework of learning theory. These questions concern: the outstanding generalization power of overparametrized neural networks, the role of depth in deep architectures, the apparent absence of the curse of dimensionality, the surprisingly successful optimization performance despite the non-convexity of the problem, understanding what features are learned, why deep architectures perform exceptionally well in physical problems, and which fine aspects of an architecture affect the behavior of a learning task in which way. We present an overview of modern approaches that yield partial answers to these questions. For selected approaches, we describe the main ideas in more detail.
In this chapter we describe scattering representations, a signal representation built using wavelet multiscale decompositions with a deep convolutional architecture. Its construction highlights the fundamental role of geometric stability in deep learning representations and provides a mathematical basis for studying CNNs. We describe its main mathematical properties, its applications to computer vision, speech recognition, and the physical sciences, as well as its extensions to Lie groups and non-Euclidean domains. Finally, we discuss recent applications to modeling high-dimensional probability densities.
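As a minimal illustration of the construction, the sketch below computes first-order scattering coefficients of a 1-D signal using Gabor band-pass filters followed by a modulus nonlinearity and local averaging; the filter parameters are illustrative, not the chapter's.

```python
import numpy as np

def scattering_first_order(x, xis, sigma=8.0, phi_sigma=32.0):
    """First-order scattering coefficients S1 x(xi) = |x * psi_xi| * phi for a 1-D signal,
    with Gabor band-pass filters psi_xi and a Gaussian low-pass phi (a didactic sketch)."""
    t = np.arange(-4 * phi_sigma, 4 * phi_sigma + 1)
    phi = np.exp(-t**2 / (2 * phi_sigma**2)); phi /= phi.sum()
    coeffs = []
    for xi in xis:
        psi = np.exp(1j * xi * t) * np.exp(-t**2 / (2 * sigma**2))
        u = np.abs(np.convolve(x, psi, mode="same"))      # wavelet modulus (non-linearity)
        coeffs.append(np.convolve(u, phi, mode="same"))   # local averaging -> stability
    return np.array(coeffs)

# Toy signal: a superposition of two tones; each coefficient responds mainly to the nearby tone.
n = 1024
x = np.sin(0.3 * np.arange(n)) + 0.5 * np.sin(1.2 * np.arange(n))
S1 = scattering_first_order(x, xis=[0.3, 0.6, 1.2])
```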