Optimization problems often have symmetries. For example, the value of the cost function may not change if its input vectors are scaled, translated or rotated. Then, it makes sense to quotient out the symmetries. If the quotient space is a manifold, it is called a quotient manifold. This often happens when the symmetries result from invariance to group actions: This chapter first reviews conditions for this to happen. Continuing with general quotient manifolds, the chapter reviews geometric concepts (points, tangent vectors, vector fields, retractions, Riemannian metrics, gradients, connections, Hessians and acceleration) to show how to work numerically with these abstract objects through lifts. The chapter aims to show the reader what it means to optimize on a quotient manifold, and how to do so on a computer. To this end, two important sections detail the relation between running Riemannian gradient descent and Newton’s method on the quotient manifold and running them on the non-quotiented manifold (called the total space). The running example is the Grassmann manifold as a quotient of the Stiefel manifold. Its tools are summarized in a closing section.
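As an illustration of working through lifts, here is a minimal NumPy sketch (ours, not the book's Manopt code; all function names are hypothetical). It represents a point of the Grassmann manifold by a Stiefel matrix whose columns span it, projects an ambient matrix onto the horizontal space at that representative, and checks that two representatives related by an orthogonal factor encode the same point.

```python
import numpy as np

def random_stiefel(n, p, rng):
    # A representative X with orthonormal columns: span(X) is a point of Grassmann(n, p).
    Q, _ = np.linalg.qr(rng.standard_normal((n, p)))
    return Q

def horizontal_projection(X, Z):
    # Horizontal space at X for the quotient St(n, p) / O(p): matrices H with X^T H = 0.
    # Projecting an ambient matrix Z removes the component along the fiber through X.
    return Z - X @ (X.T @ Z)

def same_subspace(X, Y, tol=1e-10):
    # Two Stiefel matrices represent the same Grassmann point iff their column spans agree,
    # i.e., all singular values of X^T Y equal 1 (all principal angles are zero).
    s = np.linalg.svd(X.T @ Y, compute_uv=False)
    return bool(np.all(np.abs(s - 1.0) < tol))

n, p = 8, 3
rng = np.random.default_rng(0)
X = random_stiefel(n, p, rng)
Q, _ = np.linalg.qr(rng.standard_normal((p, p)))
Y = X @ Q                                   # same Grassmann point, different representative
H = horizontal_projection(X, rng.standard_normal((n, p)))
print(same_subspace(X, Y))                  # True
print(np.linalg.norm(X.T @ H))              # ~0: H is a horizontal lift at X
```

The horizontal space stands in for the tangent space of the quotient: tangent vectors of the Grassmann manifold are manipulated numerically through such horizontal lifts at a chosen representative.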
This chapter provides the classical definitions for manifolds (via charts, which have not appeared thus far), smooth maps to and from manifolds, tangent vectors and tangent spaces, and differentials of smooth maps. Special care is taken to introduce the atlas topology on manifolds and to justify topological restrictions in the definition of a manifold. It is shown explicitly that embedded submanifolds of linear spaces as detailed in earlier chapters are manifolds in the general sense. Then, sections go on to explain how the geometric concepts introduced for embedded submanifolds extend to general manifolds mostly without effort. This includes tangent bundles, vector fields, retractions, local frames, Riemannian metrics, gradients, connections, Hessians, velocity and acceleration, geodesics and Taylor expansions. One section explains why the Lie bracket of two vector fields can be interpreted as a vector field (which we omitted in Chapter 5). The chapter closes with a section about submanifolds embedded in general manifolds (rather than only in linear spaces): This is useful in preparation for the next chapter.
To design more sophisticated optimization algorithms, we need more refined geometric tools. In particular, to define the Hessian of a cost function, we need a means to differentiate the gradient vector field. This chapter highlights why this requires care, then proceeds to define connections: the proper concept from differential geometry for this task. The proposed definition is stated somewhat differently from the usual one: An optional section details why the two are equivalent. Riemannian manifolds have a privileged connection called the Riemannian connection, which is used to define Riemannian Hessians. The same concept is used to differentiate vector fields along curves. Applied to the velocity vector field of a curve, this yields the notion of intrinsic acceleration; geodesics are the curves with zero intrinsic acceleration. The tools built in this chapter naturally lead to second-order Taylor expansions of cost functions along curves. These then motivate the definition of second-order retractions. Two optional closing sections further consider the important special case of Hessians on Riemannian submanifolds, and an intuitive way to build second-order retractions by projection.
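To make the submanifold case concrete, here is a small NumPy sketch (ours, assuming the cost is the Rayleigh quotient $f(x) = x^\top A x$ on the unit sphere) of the usual recipe for Hessians on Riemannian submanifolds: project the directional derivative of the Euclidean gradient, plus a correction coming from differentiating the projector; for the sphere this reduces to the formula coded below.

```python
import numpy as np

def proj_sphere(x, z):
    # Orthogonal projection onto the tangent space of the unit sphere at x.
    return z - (x @ z) * x

def rgrad(x, egrad):
    # Riemannian gradient on the sphere: project the Euclidean gradient.
    return proj_sphere(x, egrad)

def rhess(x, u, egrad, ehess_u):
    # Riemannian Hessian on the sphere applied to a tangent vector u:
    # project the directional derivative of the Euclidean gradient, then add the
    # correction term obtained by differentiating the projector along u.
    return proj_sphere(x, ehess_u) - (x @ egrad) * u

# Example: f(x) = x^T A x, so egrad = 2 A x and ehess[u] = 2 A u.
rng = np.random.default_rng(0)
n = 5
A = rng.standard_normal((n, n)); A = (A + A.T) / 2
x = rng.standard_normal(n); x /= np.linalg.norm(x)
u = proj_sphere(x, rng.standard_normal(n))

egrad = 2 * A @ x
H_u = rhess(x, u, egrad, 2 * A @ u)
print(u @ rgrad(x, egrad))                        # directional derivative <grad f(x), u>
print(np.linalg.norm(proj_sphere(x, H_u) - H_u))  # ~0: the Hessian maps into the tangent space
```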
Optimization on Riemannian manifolds, the result of smooth geometry and optimization merging into one elegant modern framework, spans many areas of science and engineering, including machine learning, computer vision, signal processing, dynamical systems and scientific computing.
This text introduces the differential geometry and Riemannian geometry concepts that will help students and researchers in applied mathematics, computer science and engineering gain a firm mathematical grounding to use these tools confidently in their research. Its charts-last approach will prove more intuitive from an optimizer's viewpoint, and all definitions and theorems are motivated to build time-tested optimization algorithms. Starting from first principles, the text goes on to cover current research on topics including worst-case complexity and geodesic convexity. Readers will appreciate the tricks of the trade sprinkled throughout the book for conducting research in this area and for writing effective numerical implementations.
Before we define any technical terms, this chapter describes simple optimization problems as they arise in data science, imaging and robotics, with a focus on the natural domain of definition of the variables (the unknowns). In so doing, we proceed through a sequence of problems whose search spaces are a Euclidean space or a linear subspace thereof (which still falls within the realm of classical unconstrained optimization), then a sphere and a product of spheres. We further encounter the set of matrices with orthonormal columns (Stiefel manifold) and a quotient thereof which only considers the subspace generated by the orthonormal columns (Grassmann manifold). Continuing, we then discuss optimization problems where the unknowns are a collection of rotations (orthogonal matrices), a matrix of fixed size and rank, and a positive definite matrix. In closing, we discuss how a classical change of variables in semidefinite programming known as the Burer–Monteiro factorization can sometimes also lead to optimization on a smooth manifold, exhibiting a benign non-convexity phenomenon.
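To fix ideas on that last point, here is a small NumPy sketch (ours, with hypothetical function names) of the Burer–Monteiro substitution for an SDP with constraints $\mathrm{diag}(X) = 1$ and $X$ positive semidefinite: writing $X = YY^\top$ with unit-norm rows of $Y$ turns the feasible set into a product of spheres, which is a smooth manifold.

```python
import numpy as np

# Burer-Monteiro change of variables: substitute X = Y Y^T with Y of size n-by-p.
# The constraint diag(Y Y^T) = 1 says each row of Y has unit norm, so the search
# space becomes a product of n unit spheres.

def cost(Y, C):
    # <C, Y Y^T>, written without forming Y Y^T explicitly.
    return np.trace(Y.T @ (C @ Y))

def normalize_rows(Y):
    # Map an arbitrary matrix to the manifold by rescaling each row to unit norm.
    return Y / np.linalg.norm(Y, axis=1, keepdims=True)

rng = np.random.default_rng(0)
n, p = 6, 3
C = rng.standard_normal((n, n)); C = (C + C.T) / 2
Y = normalize_rows(rng.standard_normal((n, p)))
X = Y @ Y.T
print(np.allclose(np.diag(X), 1.0))   # True: X is feasible for the SDP
print(cost(Y, C), np.trace(C @ X))    # same value, computed two ways
```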
This chapter details how to work on several manifolds of practical interest, focusing on embedded submanifolds of linear spaces. It provides two tables which point to Manopt implementations of those manifolds, and to the various places in the book where it is explained how to work with products of manifolds. The manifolds detailed in this chapter include Euclidean spaces, unit spheres, the Stiefel manifold (orthonormal matrices), the orthogonal group and associated group of rotations, the manifold of matrices with a given size and rank, and hyperbolic space in the hyperboloid model. It further discusses geometric tools for optimization on a manifold defined by (regular) constraints $h(x) = 0$ in general. That last section notably makes it possible to connect concepts from Riemannian optimization with classical concepts from constrained optimization in linear spaces, namely, Lagrange multipliers and KKT conditions under the linear independence constraint qualification (LICQ).
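As a taste of the tools such tables point to, the following NumPy sketch (ours, not Manopt's implementation) provides two basic ingredients for the Stiefel manifold: orthogonal projection onto a tangent space and a QR-based retraction.

```python
import numpy as np

def sym(M):
    return (M + M.T) / 2

def proj_stiefel(X, Z):
    # Orthogonal projection of an ambient matrix Z onto the tangent space of the
    # Stiefel manifold at X (X has orthonormal columns): tangent V satisfy sym(X^T V) = 0.
    return Z - X @ sym(X.T @ Z)

def retract_qr(X, V):
    # QR-based retraction: move from X along the tangent vector V, then return to the
    # manifold via the Q factor (with signs fixed so that R has a positive diagonal).
    Q, R = np.linalg.qr(X + V)
    return Q * np.sign(np.sign(np.diag(R)) + 0.5)

rng = np.random.default_rng(0)
n, p = 7, 3
X, _ = np.linalg.qr(rng.standard_normal((n, p)))
V = proj_stiefel(X, rng.standard_normal((n, p)))
Y = retract_qr(X, 0.1 * V)
print(np.linalg.norm(sym(X.T @ V)))          # ~0: V is tangent at X
print(np.linalg.norm(Y.T @ Y - np.eye(p)))   # ~0: the retraction lands on the manifold
```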
The main purpose of this chapter is to define and analyze Riemannian gradient descent methods. This family of algorithms aims to minimize real-valued functions (called cost functions) on manifolds. They apply to general manifolds, hence in particular also to embedded submanifolds of linear spaces. The previous chapter provides all necessary geometric tools for that setting. The initial technical steps involve constructing first-order Taylor expansions of the cost function along smooth curves, and identifying necessary optimality conditions (at a solution, the Riemannian gradient must vanish). Then, the chapter presents the algorithm and proposes a worst-case iteration complexity analysis. The main conclusion is that, under a Lipschitz-type assumption on the gradient of the cost function composed with the retraction, the algorithm finds a point with gradient norm smaller than $\varepsilon$ in at most a multiple of $1/\varepsilon^2$ iterations. The chapter ends with three optional sections: They discuss local convergence rates, detail how to compute gradients in practice and describe how to check that a gradient is correctly implemented.
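For concreteness, here is a minimal sketch (ours) of Riemannian gradient descent with Armijo backtracking on the unit sphere, minimizing the Rayleigh quotient $x^\top A x$; the projection and normalization formulas are specific to the sphere, and the step-size parameters are illustrative.

```python
import numpy as np

def proj(x, z):
    return z - (x @ z) * x              # tangent-space projection at x

def retract(x, v):
    y = x + v
    return y / np.linalg.norm(y)        # retraction by normalization

def rgd(A, x, steps=200, t0=1.0, beta=0.5, sigma=1e-4):
    f = lambda x: x @ A @ x
    for _ in range(steps):
        g = proj(x, 2 * A @ x)          # Riemannian gradient = projected Euclidean gradient
        if np.linalg.norm(g) < 1e-10:
            break
        t = t0
        while f(retract(x, -t * g)) > f(x) - sigma * t * (g @ g):
            t *= beta                   # Armijo backtracking on the pulled-back cost
        x = retract(x, -t * g)
    return x

rng = np.random.default_rng(0)
n = 10
A = rng.standard_normal((n, n)); A = (A + A.T) / 2
x0 = rng.standard_normal(n); x0 /= np.linalg.norm(x0)
x = rgd(A, x0)
print(x @ A @ x, np.linalg.eigvalsh(A)[0])   # final cost vs. smallest eigenvalue of A
```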
The optimization algorithms from Chapters 4 and 6 require only rather simple tools from Riemannian geometry, all covered in Chapters 3 and 5 for embedded submanifolds then generalized in Chapter 8. This chapter provides additional geometric tools to gain deeper insight and help develop more sophisticated algorithms. It opens with the Riemannian distance then discusses exponential maps as retractions which generate geodesics. This is paired with a careful discussion of what it means to invert the exponential map. Then, the chapter defines parallel transport to compare tangent vectors in different tangent spaces. Later, the chapter defines transporters, which can be seen as a relaxed type of parallel transport. Before that, we take a deep dive into the notion of Lipschitz continuity for gradients and Hessians on Riemannian manifolds, aiming to connect these concepts with the Lipschitz-type regularity assumptions we required to analyze gradient descent and trust regions. The chapter closes with a discussion of how to approximate Riemannian Hessians with finite differences of gradients via transporters, and with an introduction to the differentiation of tensor fields of all orders.
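The following NumPy sketch (ours) illustrates that last point on the unit sphere: a transporter, here simply orthogonal projection onto the tangent space at x, brings the gradient at a nearby point back to that tangent space, and a finite difference of gradients then approximates a Hessian-vector product; it is compared against the exact submanifold formula for the Rayleigh quotient cost.

```python
import numpy as np

def proj(x, z):
    return z - (x @ z) * x              # tangent-space projection at x

def retract(x, v):
    y = x + v
    return y / np.linalg.norm(y)

def rgrad(A, x):
    return proj(x, 2 * A @ x)           # Riemannian gradient of f(x) = x^T A x

def hess_fd(A, x, u, t=1e-6):
    # Finite-difference approximation: (Transp_{y -> x} grad f(y) - grad f(x)) / t,
    # with y = R_x(t u) and the transporter taken to be projection onto T_x.
    y = retract(x, t * u)
    return (proj(x, rgrad(A, y)) - rgrad(A, x)) / t

def hess_exact(A, x, u):
    # Exact Riemannian Hessian-vector product on the sphere.
    return proj(x, 2 * A @ u) - (x @ (2 * A @ x)) * u

rng = np.random.default_rng(0)
n = 6
A = rng.standard_normal((n, n)); A = (A + A.T) / 2
x = rng.standard_normal(n); x /= np.linalg.norm(x)
u = proj(x, rng.standard_normal(n))
print(np.linalg.norm(hess_fd(A, x, u) - hess_exact(A, x, u)))  # small (finite-difference error)
```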
Convexity is one of the most fruitful concepts in classical optimization. Geodesic convexity generalizes that concept to optimization on Riemannian manifolds. There are several ways to carry out such a generalization: This chapter favors permissive definitions which are sufficient to retain the most important properties for optimization purposes (e.g., local optima are global optima). Alternative definitions are discussed, highlighting the fact that all coincide for the special case of Hadamard manifolds (essentially, negatively curved Riemannian manifolds). The chapter continues with a discussion of the special properties of differentiable geodesically (strictly, strongly) convex functions, and builds on them to show global linear convergence of Riemannian gradient descent, assuming strong geodesic convexity and Lipschitz continuous gradients (via the Polyak–Łojasiewicz inequality). The chapter closes with two examples of manifolds where geodesic convexity has proved useful, namely, the positive orthant with a log-barrier metric (recovering geometric programming), and the cone of positive definite matrices with the log-Euclidean and the affine invariant Riemannian metrics.
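For the last example, the geodesics of the affine-invariant metric on positive definite matrices are known in closed form. The sketch below (ours, using only NumPy eigendecompositions) evaluates $\gamma(t) = A^{1/2}(A^{-1/2} B A^{-1/2})^t A^{1/2}$; its midpoint is the matrix geometric mean, at which a geodesically convex function is at most the average of its values at $A$ and $B$.

```python
import numpy as np

def spd_power(S, t):
    # S^t for a symmetric positive definite matrix, via eigendecomposition.
    w, V = np.linalg.eigh(S)
    return (V * w**t) @ V.T

def spd_geodesic(A, B, t):
    # Geodesic from A to B in the affine-invariant geometry of the SPD cone:
    # gamma(t) = A^{1/2} (A^{-1/2} B A^{-1/2})^t A^{1/2}; gamma(1/2) is the geometric mean.
    Ah = spd_power(A, 0.5)
    Aih = spd_power(A, -0.5)
    return Ah @ spd_power(Aih @ B @ Aih, t) @ Ah

def random_spd(n, rng):
    M = rng.standard_normal((n, n))
    return M @ M.T + n * np.eye(n)

rng = np.random.default_rng(0)
A, B = random_spd(4, rng), random_spd(4, rng)
mid = spd_geodesic(A, B, 0.5)
print(np.allclose(spd_geodesic(A, B, 0.0), A), np.allclose(spd_geodesic(A, B, 1.0), B))
print(np.linalg.eigvalsh(mid)[0] > 0)   # the midpoint is again positive definite
```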
As an entry point to differential geometry, this chapter defines embedded submanifolds as subsets of linear spaces which can be locally defined by equations satisfying certain regularity conditions. Such sets can be linearized, yielding the notion of tangent space. The chapter further defines what it means for a map to and from a submanifold to be smooth, and how to differentiate such maps. The (disjoint) union of all tangent spaces forms the tangent bundle which is also a manifold. That makes it possible to define vector fields (maps which select a tangent vector at each point) and retractions (smooth maps which generate curves passing through any point with any given velocity). The chapter then proceeds to endow each tangent space with an inner product (turning each one into a Euclidean space). Under some regularity conditions, this extra structure turns the manifold into a Riemannian manifold. This makes it possible to define the Riemannian gradient of a real function. Taken together, these concepts are sufficient to build simple algorithms in the next chapter. An optional closing section defines local frames: They are useful for proofs but can be skipped for practical matters.
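As a concrete instance of these definitions, the unit sphere in $\mathbb{R}^n$ is an embedded submanifold defined by the equation $x^\top x = 1$. The short NumPy sketch below (ours) checks numerically that tangent vectors at $x$ are exactly those orthogonal to $x$, and that the normalization map is a retraction: its curves pass through $x$ with the prescribed velocity.

```python
import numpy as np

# The sphere as the zero set of h(x) = x^T x - 1. Its tangent space at x is
# ker Dh(x) = { v : x^T v = 0 }, and R_x(v) = (x + v) / ||x + v|| is a retraction:
# the curve t -> R_x(t v) passes through x at t = 0 with velocity v.

def retract(x, v):
    y = x + v
    return y / np.linalg.norm(y)

rng = np.random.default_rng(0)
n = 5
x = rng.standard_normal(n); x /= np.linalg.norm(x)
z = rng.standard_normal(n)
v = z - (x @ z) * x                     # project z to obtain a tangent vector: x^T v = 0

print(abs(x @ v))                       # ~0: v lies in ker Dh(x)
t = 1e-6
velocity_fd = (retract(x, t * v) - retract(x, -t * v)) / (2 * t)
print(np.linalg.norm(velocity_fd - v))  # ~0: the retraction curve has velocity v at t = 0
```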
The main purpose of this chapter is to motivate and analyze the Riemannian trust-region method (RTR). This optimization algorithm shines brightest when it uses both the Riemannian gradient and the Riemannian Hessian. It applies for optimization on manifolds in general, thus for embedded submanifolds of linear spaces in particular. For that setting, the previous chapters introduce the necessary geometric tools. Toward RTR, the chapter first introduces a Riemannian version of Newton's method. It is motivated by first developing second-order optimality conditions. Each iteration of Newton's method requires solving a linear system of equations in a tangent space. To this end, the classical conjugate gradients method (CG) is reviewed. Then, RTR is presented with a worst-case convergence analysis guaranteeing it can find points which approximately satisfy first- and second-order necessary optimality conditions under some assumptions. Subproblems can be solved with a variant of CG called truncated-CG (tCG). The chapter closes with three optional sections: one about local convergence, one providing simpler conditions to ensure convergence, and one about checking Hessians numerically.
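The following sketch (ours, in NumPy) shows a compact version of the truncated-CG subproblem solver on the unit sphere with the Rayleigh quotient cost. It follows the standard Steihaug–Toint pattern of stopping at the trust-region boundary or along negative-curvature directions, but omits practical refinements such as preconditioning and more careful inner stopping rules.

```python
import numpy as np

# tCG approximately minimizes the model m(v) = <g, v> + 0.5 <v, H v> over the tangent
# space at x, within a ball of radius Delta.

def proj(x, z):
    return z - (x @ z) * x

def to_boundary(v, p, Delta):
    # Positive tau such that ||v + tau p|| = Delta.
    a, b, c = p @ p, 2 * (v @ p), v @ v - Delta**2
    return (-b + np.sqrt(b**2 - 4 * a * c)) / (2 * a)

def tCG(H, g, Delta, maxiter=100, tol=1e-10):
    v = np.zeros_like(g)
    r, p = g.copy(), -g.copy()
    for _ in range(maxiter):
        Hp = H(p)
        kappa = p @ Hp
        if kappa <= 0:                        # negative curvature: go to the boundary
            return v + to_boundary(v, p, Delta) * p
        alpha = (r @ r) / kappa
        v_new = v + alpha * p
        if np.linalg.norm(v_new) >= Delta:    # the step leaves the trust region
            return v + to_boundary(v, p, Delta) * p
        r_new = r + alpha * Hp
        if np.linalg.norm(r_new) <= tol:
            return v_new
        beta = (r_new @ r_new) / (r @ r)
        p = -r_new + beta * p
        v, r = v_new, r_new
    return v

rng = np.random.default_rng(0)
n = 8
A = rng.standard_normal((n, n)); A = (A + A.T) / 2
x = rng.standard_normal(n); x /= np.linalg.norm(x)

g = proj(x, 2 * A @ x)                                    # Riemannian gradient
H = lambda u: proj(x, 2 * A @ u) - (x @ (2 * A @ x)) * u  # Riemannian Hessian as an operator
v = tCG(H, g, Delta=0.5)
print(np.linalg.norm(v) <= 0.5 + 1e-12)                   # step stays in the trust region
print(g @ v + 0.5 * (v @ H(v)) < 0)                       # and decreases the model
```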