The phase retrieval problem consists of finding the intersections between a high-dimensional magnitude torus, A, in a Euclidean space and a second set, B, defined by the auxiliary data. The problem is difficult because the set A is not convex. In this chapter we give a very explicit description of the tangent and normal bundles of the torus in both the image-space and Fourier representations. Using this description, we show that, for practical support constraints, the intersections between A and B are not usually transversal. The main text concludes with numerical examples demonstrating this phenomenon, with various types of images and support constraints. The chapter closes with appendices on the tangent and normal bundles of submanifolds of Euclidean spaces, and a fast algorithm for finding the orthogonal projections onto the tangent and normal bundles of a magnitude torus.
This chapter repeats much of the analysis of the previous chapter, but under the assumption that the image is nonnegative and that its autocorrelation image has sufficiently small support, rather than with an estimate for the support of the image itself. In this case the auxiliary set is B+, the set consisting of nonnegative images, and the intersections of A and B+ lie on the boundary of B+, which is convex but not smooth. This complicates the notion of transversality, as was discussed in Chapter 4. We then present many numerical examples exploring the behavior of algorithms using nonnegativity and also the assumption that the unknown image has a given l1-norm. The chapter concludes with an appendix describing an efficient algorithm for finding the l2-nearest point on the boundary of an l1-ball.
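As an illustration of the appendix topic, the following is a minimal sketch of one way to compute an l2-nearest point on the boundary of an l1-ball. It is not taken from the chapter: it assumes the standard sort-and-threshold projection onto a simplex, applied to the absolute values of the input, with the signs restored afterwards. The function name `nearest_on_l1_sphere` and the parameter `radius` are hypothetical. When an input coordinate is exactly zero and the point lies inside the ball, the nearest point is not unique, and this sketch picks the nonnegative sign.

```python
def nearest_on_l1_sphere(x, radius=1.0):
    """Return a point y with sum(|y_i|) == radius that is l2-closest to x.

    Assumption: sort-and-threshold simplex projection applied to |x|.
    A positive threshold shrinks points lying outside the ball; a negative
    threshold pushes points lying inside the ball out to the boundary.
    """
    magnitudes = sorted((abs(v) for v in x), reverse=True)
    running_sum = 0.0
    theta = 0.0
    for j, u in enumerate(magnitudes, start=1):
        running_sum += u
        candidate = (running_sum - radius) / j
        # Keep the threshold from the largest prefix that stays active.
        if u - candidate > 0:
            theta = candidate
    # Restore signs; a zero coordinate is treated as nonnegative.
    return [(1.0 if v >= 0 else -1.0) * max(abs(v) - theta, 0.0) for v in x]


if __name__ == "__main__":
    print(nearest_on_l1_sphere([3.0, 0.0]))    # point outside the unit l1-ball
    print(nearest_on_l1_sphere([0.2, -0.1]))   # point inside the unit l1-ball
```

The cost is dominated by the sort, so the sketch runs in O(n log n) time for an n-dimensional input.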
This chapter sets the stage for the study of algorithms used in the practical solution of the phase retrieval problem. It defines the types of maps used and many of the algorithms that will be considered in Chapters 7–10. Many of the concepts that arise in these chapters are introduced here as well.
CUDA is now the dominant language used for programming GPUs, one of the most exciting hardware developments of recent decades. With CUDA, you can use a desktop PC for work that would have previously required a large cluster of PCs or access to an HPC facility. As a result, CUDA is increasingly important in scientific and technical computing across the whole STEM community, from medical physics and financial modelling to big data applications and beyond. This unique book on CUDA draws on the author's passion for and long experience of developing and using computers to acquire and analyse scientific data. The result is an innovative text featuring a much richer set of examples than found in any other comparable book on GPU computing. Much attention has been paid to the C++ coding style, which is compact, elegant and efficient. A code base of examples and supporting material is available online, which readers can build on for their own projects.
A Boolean analog of the standard compressive sensing problem, known as nonadaptive group testing, is analyzed in this chapter. Its success is characterized via the notion of separability (intimately related to disjunctiveness and strong selectivity) of the testing procedure. The minimal number of tests making separability possible is determined, and a deterministic procedure using roughly this number of tests is presented. Finally, it is shown that solving a linear feasibility program allows one to exactly recover sparse binary vectors from the outcomes of a separable testing procedure.
This chapter is concerned with quasi-Monte Carlo rules, i.e., multivariate quadrature rules featuring equal weights and deterministically chosen evaluation points. The variation of a function and the star discrepancy of a set of points are defined as a prerequisite to the Koksma–Hlawka inequality, which bounds the error of a quasi-Monte Carlo rule by the product of the variation and the star discrepancy. Finally, some evaluation points with small star discrepancy are uncovered, namely the Halton sequence and the Hammersley set.
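To make the notion of an equal-weight rule concrete, here is a minimal sketch of a quasi-Monte Carlo estimate on the unit square using Halton points. It relies on the usual radical-inverse definition of the Halton sequence. The function names, the choice of bases 2 and 3, and the test integrand are assumptions made for illustration and do not come from the chapter.

```python
def radical_inverse(index, base):
    """Mirror the base-`base` digits of `index` about the radix point."""
    result = 0.0
    scale = 1.0 / base
    while index > 0:
        index, digit = divmod(index, base)
        result += digit * scale
        scale /= base
    return result


def halton(n_points, bases=(2, 3)):
    """First n_points of the Halton sequence (indices 1..n_points)."""
    return [
        tuple(radical_inverse(i, b) for b in bases)
        for i in range(1, n_points + 1)
    ]


def qmc_estimate(f, points):
    """Equal-weight quadrature: the plain average of f over the points."""
    return sum(f(*p) for p in points) / len(points)


if __name__ == "__main__":
    # The exact integral of x*y over the unit square is 1/4.
    print(qmc_estimate(lambda x, y: x * y, halton(1024)))
```

The bases are chosen pairwise coprime (here the first two primes), which is the standard choice for the Halton construction.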
This appendix recalls some key notions of probability theory, such as tails and moment generating functions. These notions are essential in the proof of some concentration inequalities, e.g., the McDiarmid inequality. In turn, these inequalities are used to establish the restricted isometry properties for sparse vectors and for low-rank matrices required earlier.
The high dimensionality of datapoints often constitutes an obstacle to efficient computations. This chapter investigates three workarounds that replace the datapoints by some substitutes selected in a lower dimensional set. The first workaround is principal component analysis, where the lower dimensional set is a linear space spanned by the top singular vectors of the data matrix. The second workaround is a Johnson–Lindenstrauss projection, where the lower dimensional set is a random linear space. The third workaround is locally linear embedding, where the lower dimensional set is not chosen as a linear space anymore.
This chapter studies binary classification from a non-statistical viewpoint. For data that are linearly separable, the perceptron algorithm is presented first. It is followed by an optimization program, known as the hard support vector machine (SVM), consisting in maximizing the margin. For data that are not exactly linearly separable, this optimization program is relaxed into soft SVM. Finally, for data that are linearly separable only after applying a feature map, the representer theorem is used to validate the so-called kernel trick.
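For the linearly separable case, the perceptron update can be sketched as follows. This is a minimal textbook version, not necessarily the exact variant presented in the chapter. The function name `perceptron`, the explicit bias term `b`, and the `max_epochs` cap are assumptions made for illustration.

```python
def perceptron(points, labels, max_epochs=1000):
    """Find (w, b) with labels[i] * (w . x_i + b) > 0 for every sample.

    Labels must be +1 or -1. Raises RuntimeError if no separating
    hyperplane is found within max_epochs passes over the data.
    """
    w = [0.0] * len(points[0])
    b = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in zip(points, labels):
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * score <= 0:
                # Misclassified (or on the boundary): move toward y * x.
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
                mistakes += 1
        if mistakes == 0:
            return w, b
    raise RuntimeError("no separating hyperplane found within max_epochs")


if __name__ == "__main__":
    pts = [(2.0, 1.0), (3.0, 3.0), (-1.0, -2.0), (-3.0, -1.0)]
    lbl = [1, 1, -1, -1]
    print(perceptron(pts, lbl))
```

On separable data the loop stops after finitely many updates. The hard SVM mentioned above differs in that it selects, among all separating hyperplanes, the one with the largest margin.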
This chapter corroborates the empirical belief in the superiority of deep networks over shallow ones. It does so by highlighting three situations where a clear advantage can be demonstrated. First, using depth two, there are activation functions turning neural networks into universal approximators even when restricting the width. Second, depth overcomes the limitation that shallow ReLU networks cannot generate compactly supported functions. Third, the approximation rate of Lipschitz functions by deep ReLU networks is better than that of shallow ones.
This chapter considers the unsupervised learning task known as clustering, which consists in grouping unlabeled datapoints based on some similarity information. The single-linkage algorithm is examined first. Then, the Lloyd algorithm is presented to illustrate the center-based clustering strategy. Finally, the problem of detecting two communities via spectral clustering is analyzed under the stochastic block model.
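The center-based strategy can be sketched with a minimal version of the Lloyd iteration, which alternates between assigning each point to its nearest center and moving each center to the mean of its assigned points. The function names, the caller-supplied initial centers, and the rule of leaving an empty cluster's center unchanged are assumptions made for illustration.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest_center(p, centers):
    """Index of the center closest to p."""
    return min(range(len(centers)), key=lambda k: sq_dist(p, centers[k]))


def lloyd(points, initial_centers, max_iter=100):
    """Alternate assignment and averaging until the centers stop moving."""
    centers = [tuple(c) for c in initial_centers]
    for _ in range(max_iter):
        clusters = [[] for _ in centers]
        for p in points:
            clusters[nearest_center(p, centers)].append(p)
        new_centers = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centers[k]  # assumption: keep an empty cluster's center
            for k, cluster in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    labels = [nearest_center(p, centers) for p in points]
    return centers, labels


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    print(lloyd(data, [(0, 0), (10, 10)]))
```

The result depends on the initial centers, so in practice the iteration is often restarted from several initializations.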
This chapter starts by introducing the key concepts attached to neural networks, such as architecture, weights, biases, and activation function. It proceeds with the specific choice of the rectified linear unit (ReLU) as activation function. In this case, neural networks generate continuous piecewise linear (CPwL) functions. It is then shown that, in the univariate setting, any CPwL function can be generated by a shallow ReLU network. This is no longer true in the multivariate setting, for which it is nonetheless shown that any CPwL function can be generated by a deep ReLU network.
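The univariate statement can be illustrated with one standard construction, which may differ from the one used in the chapter. A CPwL function with knots t_0 < ... < t_n and slopes s_0, ..., s_{n-1} is written as f(x) = y_0 + s_0 (x - t_0) + sum over interior knots of (s_i - s_{i-1}) ReLU(x - t_i), and the linear term is split using x = ReLU(x) - ReLU(-x). The function names and the representation of a network as a list of (weight, bias, output coefficient) triples are assumptions made for illustration.

```python
def relu(z):
    return max(z, 0.0)


def shallow_relu_from_knots(knots):
    """Build a one-hidden-layer ReLU network interpolating the knots.

    knots: list of (t, y) pairs sorted by t, with at least two entries.
    Outside the knot range the function continues with the first and last
    slopes. Returns (hidden, output_bias), where hidden is a list of
    (weight, bias, output_coefficient) triples.
    """
    ts = [t for t, _ in knots]
    ys = [y for _, y in knots]
    slopes = [
        (ys[i + 1] - ys[i]) / (ts[i + 1] - ts[i]) for i in range(len(ts) - 1)
    ]
    # Two neurons encode the linear term s_0 * (x - t_0).
    hidden = [(1.0, -ts[0], slopes[0]), (-1.0, ts[0], -slopes[0])]
    # One neuron per interior knot encodes the change of slope there.
    for i in range(1, len(slopes)):
        hidden.append((1.0, -ts[i], slopes[i] - slopes[i - 1]))
    return hidden, ys[0]


def evaluate(network, x):
    """Evaluate the shallow network at the scalar input x."""
    hidden, output_bias = network
    return output_bias + sum(c * relu(w * x + b) for w, b, c in hidden)


if __name__ == "__main__":
    net = shallow_relu_from_knots([(-1.0, 0.0), (0.0, 1.0), (1.0, 0.0), (2.0, 2.0)])
    print([evaluate(net, x) for x in (-2.0, -1.0, 0.0, 0.5, 1.0, 2.0, 3.0)])
```

For n + 1 knots the sketch uses n + 1 hidden neurons: two for the linear term and one for each of the n - 1 interior knots.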
This chapter touches on some aspects related to the training of neural networks. First, a method called backpropagation is presented as a way to efficiently compute gradients in descent algorithms when deep networks are used. Next, the chapter considers shallow networks in the overparametrized regime, and it is proved that the empirical-risk landscape, despite its nonconvexity, features no strict local minimizers. Finally, convolutional neural networks are briefly mentioned.