An intuitive and accessible text explaining the fundamentals and applications of graph signal processing. Requiring only an elementary understanding of linear algebra, it covers both basic and advanced topics, including node domain processing, graph signal frequency, sampling, and graph signal representations, as well as how to choose a graph. Understand the basic insights behind key concepts and learn how graphs can be associated to a range of specific applications across physical, biological and social networks, distributed sensor networks, image and video processing, and machine learning. With numerous exercises and Matlab examples to help put knowledge into practice, and a solutions manual available online for instructors, this unique text is essential reading for graduate and senior undergraduate students taking courses on graph signal processing, signal processing, information processing, and data analysis, as well as researchers and industry professionals.
This textbook establishes a theoretical framework for understanding deep learning models of practical relevance. With an approach that borrows from theoretical physics, Roberts and Yaida provide clear and pedagogical explanations of how realistic deep neural networks actually work. To make results from the theoretical forefront accessible, the authors eschew the subject's traditional emphasis on intimidating formality without sacrificing accuracy. Straightforward and approachable, this volume balances detailed first-principle derivations of novel results with insight and intuition for theorists and practitioners alike. This self-contained textbook is ideal for students and researchers interested in artificial intelligence with minimal prerequisites of linear algebra, calculus, and informal probability theory, and it can easily fill a semester-long course on deep learning theory. For the first time, the exciting practical advances in modern artificial intelligence capabilities can be matched with a set of effective principles, providing a timeless blueprint for theoretical research in deep learning.
A Boolean analog of the standard compressive sensing problem, known as nonadaptive group testing, is analyzed in this chapter. Its success is characterized via the notion of separability (intimately related to disjunctiveness and strong selectivity) of the testing procedure. The minimal number of tests making separability possible is determined, and a deterministic procedure using roughly this number of tests is presented. Finally, it is shown that solving a linear feasibility program allows one to exactly recover sparse binary vectors from the outcomes of a separable testing procedure.
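As a small illustration of separability, the brute-force sketch below checks whether a testing procedure gives distinct outcomes for all sets of at most s defective items, and decodes by exhaustive search. The definition used (distinct outcomes for all supports of size at most s) is an assumption about the chapter's convention, the function names are hypothetical, and the exhaustive decoder is a stand-in, not the linear feasibility program mentioned above.

```python
from itertools import combinations


def outcomes(tests, defectives):
    """Boolean outcome of each pooled test: 1 if the pool contains a defective item."""
    return tuple(int(bool(pool & defectives)) for pool in tests)


def supports(num_items, s):
    """Enumerate all candidate defective sets of size at most s."""
    for size in range(s + 1):
        for combo in combinations(range(num_items), size):
            yield frozenset(combo)


def is_separable(tests, num_items, s):
    """Assumed definition: all supports of size <= s yield distinct outcomes."""
    seen = set()
    for support in supports(num_items, s):
        y = outcomes(tests, support)
        if y in seen:
            return False
        seen.add(y)
    return True


def decode(tests, num_items, s, observed):
    """Exhaustive-search decoder (illustrative only, exponential cost)."""
    for support in supports(num_items, s):
        if outcomes(tests, support) == tuple(observed):
            return support
    return None


# Three items, three pooled tests, each test pooling two items.
tests = [{0, 1}, {1, 2}, {0, 2}]
```

With these pools, a single defective item is always identifiable, but two defective items are not, since any pair makes all three tests positive.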
This chapter is concerned with quasi-Monte Carlo rules, i.e., multivariate quadrature rules featuring equal weights and deterministically chosen evaluation points. The variation of a function and the star discrepancy of a set of points are defined as a prerequisite to the Koksma–Hlawka inequality, which bounds the error of a quasi-Monte Carlo rule by the product of the variation and the star discrepancy. Finally, some evaluation points with small star discrepancy are uncovered, namely the Halton sequence and the Hammersley set.
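A minimal sketch of an equal-weight rule evaluated at Halton points may help make this concrete. It assumes the usual construction of the Halton sequence from radical inverses in coprime bases (here 2 and 3); the function names and the test integrand are illustrative choices, not taken from the chapter.

```python
def radical_inverse(n, base):
    """Mirror the base-`base` digits of n about the radix point."""
    value = 0.0
    scale = 1.0 / base
    while n > 0:
        n, digit = divmod(n, base)
        value += digit * scale
        scale /= base
    return value


def halton(num_points, bases=(2, 3)):
    """First `num_points` Halton points, one coordinate per base (indices start at 1)."""
    return [
        tuple(radical_inverse(i, b) for b in bases)
        for i in range(1, num_points + 1)
    ]


def qmc_estimate(f, points):
    """Equal-weight quadrature rule: the plain average of f over the points."""
    return sum(f(*p) for p in points) / len(points)


# Illustrative integrand on the unit square; its exact integral is 1/4.
estimate = qmc_estimate(lambda x, y: x * y, halton(1000))
```

The points are fully deterministic, so repeated runs return the same estimate, in contrast with a Monte Carlo average over random points.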
This appendix recalls some key notions of probability theory, such as tails and moment generating functions. These notions are essential in the proof of some concentration inequalities, e.g., the McDiarmid inequality. In turn, these inequalities are used to establish the restricted isometry properties for sparse vectors and for low-rank matrices required earlier.
The high dimensionality of datapoints often constitutes an obstacle to efficient computations. This chapter investigates three workarounds that replace the datapoints by some substitutes selected in a lower dimensional set. The first workaround is principal component analysis, where the lower dimensional set is a linear space spanned by the top singular vectors of the data matrix. The second workaround is a Johnson–Lindenstrauss projection, where the lower dimensional set is a random linear space. The third workaround is locally linear embedding, where the lower dimensional set is not chosen as a linear space anymore.
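To illustrate the first workaround in its simplest form, the sketch below finds the top right singular vector of a data matrix by power iteration and replaces a datapoint by its projection onto the line it spans. Power iteration, the all-ones starting vector, and the function names are assumptions made for this example; a library SVD routine would be the usual choice in practice.

```python
import math


def top_singular_vector(data, num_iters=200):
    """Power iteration on X^T X, where the rows of X are the datapoints."""
    dim = len(data[0])
    # Deterministic start for reproducibility; a random start is more common.
    v = [1.0 / math.sqrt(dim)] * dim
    for _ in range(num_iters):
        # Compute X v, then X^T (X v).
        xv = [sum(xi * vi for xi, vi in zip(row, v)) for row in data]
        w = [sum(row[k] * c for row, c in zip(data, xv)) for k in range(dim)]
        norm = math.sqrt(sum(wk * wk for wk in w))
        if norm == 0.0:
            raise ValueError("start vector is orthogonal to the data")
        v = [wk / norm for wk in w]
    return v


def project(point, direction):
    """Substitute for `point` in the one-dimensional space spanned by `direction`."""
    coeff = sum(p * d for p, d in zip(point, direction))
    return tuple(coeff * d for d in direction)


# Toy data lying on the line spanned by (1, 1).
data = [(1.0, 1.0), (2.0, 2.0), (-1.0, -1.0), (-2.0, -2.0)]
direction = top_singular_vector(data)
substitute = project((3.0, 1.0), direction)
```

For this toy data the recovered direction is (1, 1) normalized, and the point (3, 1) is replaced by (2, 2).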
This chapter studies binary classification from a non-statistical viewpoint. For data that are linearly separable, the perceptron algorithm is presented first. It is followed by an optimization program, known as the hard support vector machine (SVM), consisting in maximizing the margin. For data that are not exactly linearly separable, this optimization program is relaxed into soft SVM. Finally, for data that are linearly separable only after applying a feature map, the representer theorem is used to validate the so-called kernel trick.
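For linearly separable data, the perceptron update can be sketched in a few lines. The version below is the textbook rule with a bias term and labels in {-1, +1}; the epoch cap and the toy dataset are assumptions added for the example.

```python
def perceptron(points, labels, max_epochs=1000):
    """Return (w, b) with y * (<w, x> + b) > 0 for every training pair."""
    w = [0.0] * len(points[0])
    b = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in zip(points, labels):
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * score <= 0:
                # Misclassified (or on the boundary): move the hyperplane toward x.
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
                mistakes += 1
        if mistakes == 0:
            return w, b
    raise RuntimeError("no separating hyperplane found within max_epochs")


# Toy linearly separable data with labels in {-1, +1}.
points = [(2.0, 1.0), (3.0, 2.0), (-1.0, -1.0), (-2.0, -3.0)]
labels = [1, 1, -1, -1]
w, b = perceptron(points, labels)
```

The returned hyperplane separates the data but does not maximize the margin, which is what the hard SVM program adds.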
This chapter corroborates the empirical belief in the superiority of deep networks over shallow ones. It does so by highlighting three situations where a clear advantage can be demonstrated. First, using depth two, there are activation functions turning neural networks into universal approximators even when restricting the width. Second, depth overcomes the limitation that shallow ReLU networks cannot generate compactly supported functions. Third, the approximation rate of Lipschitz functions by deep ReLU networks is better than that of shallow ones.
This chapter considers the unsupervised learning task known as clustering, which consists in grouping unlabeled datapoints based on some similarity information. The single-linkage algorithm is examined first. Then, the Lloyd algorithm is presented to illustrate the center-based clustering strategy. Finally, the problem of detecting two communities via spectral clustering is analyzed under the stochastic block model.
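The center-based strategy can be sketched as the familiar alternation between assigning points to their nearest center and recomputing each center as a centroid. This is a minimal version of the Lloyd algorithm under assumed conventions: initial centers are supplied by the caller, and an empty cluster keeps its previous center.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def centroid(cluster):
    n = len(cluster)
    return tuple(sum(coords) / n for coords in zip(*cluster))


def lloyd(points, initial_centers, max_iters=100):
    """Alternate assignment and centroid steps until the centers stop moving."""
    centers = [tuple(float(c) for c in center) for center in initial_centers]
    labels = []
    for _ in range(max_iters):
        # Assignment step: index of the nearest center for each point.
        labels = [
            min(range(len(centers)), key=lambda j: squared_distance(p, centers[j]))
            for p in points
        ]
        # Update step: each center becomes the centroid of its cluster.
        new_centers = []
        for j, old_center in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == j]
            new_centers.append(centroid(members) if members else old_center)
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels


# Two well-separated toy groups and one initial center in each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, labels = lloyd(points, [(0, 0), (10, 10)])
```

The result depends on the initial centers in general; here the two groups are far enough apart that the algorithm stops after the second pass.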
This chapter starts by introducing the key concepts attached to neural networks, such as architecture, weights, biases, and activation function. It proceeds with the specific choice of the rectified linear unit (ReLU) as activation function. In this case, neural networks generate continuous piecewise linear (CPwL) functions. It is then shown that, in the univariate setting, any CPwL function can be generated by a shallow ReLU network. This is no longer true in the multivariate setting, for which it is nonetheless shown that any CPwL function can be generated by a deep ReLU network.
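The univariate statement can be illustrated with one standard construction: a CPwL function with breakpoints t_1 < ... < t_k and slopes s_0, ..., s_k is an offset, plus s_0 x written as s_0 (ReLU(x) - ReLU(-x)), plus one ReLU unit per breakpoint weighted by the jump in slope. The sketch below assumes this construction, which may differ from the one used in the chapter, and the function names are hypothetical.

```python
def relu(t):
    return max(t, 0.0)


def shallow_relu_network(breakpoints, slopes, offset):
    """Build x -> offset + sum_j a_j * relu(w_j * x + b_j) for a univariate CPwL function.

    `slopes[0]` is the slope left of the first breakpoint, `slopes[j + 1]` the slope
    right of `breakpoints[j]`, and the function equals offset + slopes[0] * x to the
    left of the first breakpoint.
    """
    if len(slopes) != len(breakpoints) + 1:
        raise ValueError("need exactly one more slope than breakpoints")
    # Hidden units stored as (inner weight, bias, outer weight).
    # The linear part slopes[0] * x is relu(x) - relu(-x), scaled by slopes[0].
    units = [(1.0, 0.0, slopes[0]), (-1.0, 0.0, -slopes[0])]
    for j, t in enumerate(breakpoints):
        # One unit per breakpoint, weighted by the change of slope there.
        units.append((1.0, -t, slopes[j + 1] - slopes[j]))

    def network(x):
        return offset + sum(a * relu(w * x + b) for w, b, a in units)

    return network


# Hat function supported on [0, 2] with peak value 1 at x = 1.
hat = shallow_relu_network([0.0, 1.0, 2.0], [0.0, 1.0, -1.0, 0.0], 0.0)
# Absolute value: slope -1 left of 0 and +1 right of 0.
absolute = shallow_relu_network([0.0], [-1.0, 1.0], 0.0)
```

The hat function is compactly supported, which is possible here only because the input is univariate.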
This chapter touches on some aspects related to the training of neural networks. First, a method called backpropagation is presented as a way to efficiently compute gradients in descent algorithms when deep networks are used. Next, the chapter considers shallow networks in the overparametrized regime, and it is proved that the empirical-risk landscape, despite its nonconvexity, features no strict local minimizers. Finally, convolutional neural networks are briefly mentioned.
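The idea behind backpropagation can be sketched on a deliberately tiny model: a chain of scalar layers with a squared loss, where one backward pass reuses the stored activations to produce every partial derivative. The tanh activation, the width-one layers, and the loss are assumptions chosen to keep the example short and differentiable; they are not the setting of the chapter.

```python
import math


def forward(weights, biases, x):
    """Return all activations of the chain a_l = tanh(w_l * a_{l-1} + b_l), with a_0 = x."""
    activations = [x]
    for w, b in zip(weights, biases):
        activations.append(math.tanh(w * activations[-1] + b))
    return activations


def loss(weights, biases, x, y):
    return 0.5 * (forward(weights, biases, x)[-1] - y) ** 2


def backprop(weights, biases, x, y):
    """Gradients of the loss with respect to every weight and bias, in one backward pass."""
    activations = forward(weights, biases, x)
    grad_w = [0.0] * len(weights)
    grad_b = [0.0] * len(biases)
    delta = activations[-1] - y  # derivative of the loss w.r.t. the output activation
    for layer in reversed(range(len(weights))):
        a_out = activations[layer + 1]
        dz = delta * (1.0 - a_out ** 2)  # tanh'(z) = 1 - tanh(z)^2
        grad_w[layer] = dz * activations[layer]
        grad_b[layer] = dz
        delta = dz * weights[layer]  # propagate to the previous activation
    return grad_w, grad_b


# Illustrative parameters for a three-layer chain.
weights = [0.5, -1.2, 0.8]
biases = [0.1, 0.3, -0.2]
grad_w, grad_b = backprop(weights, biases, x=0.7, y=0.4)
```

The cost of the backward pass is comparable to that of one forward pass, whereas finite differences would need a separate forward pass per parameter.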
In this chapter, a variation of the standard compressive sensing problem is studied. In this variation, sparse vectors are replaced by low-rank matrices. Recovery is now performed by nuclear-norm minimization, with success characterized by an analog of the null space property for the observation map. This property holds with high probability for random observation maps, again as a consequence of an analog of the restricted isometry property. Finally, a formulation of nuclear norm minimization as a semidefinite program is justified.
This appendix states and proves several important results about completeness, convexity, and extreme points. These results, including the supporting hyperplane theorem and the Hahn–Banach extension theorem, are invoked throughout the text.