A tutorial on adjoint methods and their use for data assimilation in glaciology

Glen D. Granzow

doi:10.3189/2014JoG13J205

Published online by Cambridge University Press: 10 July 2017

Glen D. Granzow*: Affiliation:
Department of Computer Science, University of Montana, Missoula, MT, USA E-mail: ggranzow@gmail.com

Abstract

This paper provides an introduction to adjoint methods, which are used to find the gradient of an objective function, as required by optimization algorithms. Examples are included, culminating in a data-assimilation problem from glaciology.

Keywords

glacier modelling; glaciological instruments and methods; glaciological model experiments; ice-sheet modelling

Information

Type: Instruments and Methods
Information: Journal of Glaciology, Volume 60, Issue 221, 2014, pp. 440–446

DOI: https://doi.org/10.3189/2014JoG13J205
Copyright: Copyright © International Glaciological Society 2014

Preface

Numerical models of glaciers and ice sheets include parameters such as viscosity (often expressed using other parameters), density, thermal conductivity, heat capacity, geothermal heat flux and basal traction coefficients. (Even basal topography can be considered a model ‘parameter’.) Many of these parameters are difficult to measure directly. We use the phrase ‘data assimilation’ to refer to a method where more easily measured data, such as surface velocities, are used to estimate the values of such parameters. The idea behind data assimilation is to find the values of the parameter of interest (e.g. a frictional coefficient along the bed) that result in the best match between values calculated by the model (surface velocities, perhaps) and measured data. Thus, finding appropriate values for model parameters can be accomplished by solving an optimization problem, namely that of minimizing a measure of the difference between calculated and measured values.

In this paper ‘adjoint methods’ are described in the context of optimization problems. Application to an idealized data-assimilation problem from glaciology is described in the final section. Although applications using real glaciological data can be found in various published research papers (e.g. Brinkerhoff and others, 2011), no attempt is made to review them here.

Introduction

Suppose we have a problem involving a collection of parameters p whose solution is u = F(p). We want to find the values of the parameters p that minimize (or maximize) a given (scalar) function g(u). Since u is a function of the parameters p, we can think of g as a function of p. (Formally we could introduce a new function, but we believe the presentation is clearer, albeit less rigorous, without it.) Most efficient algorithms used to optimize g require knowledge of ∂g/∂p_k for each of the parameters p_k (components of p). Adjoint methods can be used to find these derivatives. We write this collection of partial derivatives as a column matrix, ∂g/∂p = [∂g/∂p₁ ∂g/∂p₂ … ∂g/∂p_m]^T.

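The Jacobian matrix of the function F that maps the m parameters p to the n components of the solution u is defined as

$$\frac{\partial \mathbf{u}}{\partial \mathbf{p}} = \begin{bmatrix} \partial u_1/\partial p_1 & \cdots & \partial u_1/\partial p_m \\ \vdots & \ddots & \vdots \\ \partial u_n/\partial p_1 & \cdots & \partial u_n/\partial p_m \end{bmatrix}.$$

This Jacobian matrix can be used to approximate changes in the solution u resulting from small changes in the parameters p. For an individual component of the solution u,

$$\Delta u_i \approx \sum_{k=1}^{m} \frac{\partial u_i}{\partial p_k}\,\Delta p_k.$$

The approximate changes for all components of u can be written compactly (using matrix multiplication) as

$$\Delta \mathbf{u} \approx \frac{\partial \mathbf{u}}{\partial \mathbf{p}}\,\Delta \mathbf{p}.$$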

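To find the desired derivatives, ∂g/∂p, consider a single component, ∂g/∂p_k. Using the chain rule,

$$\frac{\partial g}{\partial p_k} = \sum_{i=1}^{n} \frac{\partial g}{\partial u_i}\,\frac{\partial u_i}{\partial p_k}.$$

Note that the derivatives of the components of u with respect to p_k are the elements of the kth column of the Jacobian matrix, ∂u/∂p. We can write the desired derivatives compactly using the transpose (also called the adjoint*) of the Jacobian matrix,

$$\frac{\partial g}{\partial \mathbf{p}} = \left(\frac{\partial \mathbf{u}}{\partial \mathbf{p}}\right)^{T} \frac{\partial g}{\partial \mathbf{u}}.$$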

Use of the adjoint of a Jacobian matrix seems to be where ‘adjoint methods’ get their name.

The method can be illustrated by a simple example where the function F mapping the parameters p to the solution u is given explicitly.

Example 1:

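Suppose

$$\mathbf{u} = F(\mathbf{p}) = \begin{bmatrix} p_1^2 + p_2 \\ p_1 p_2 \end{bmatrix}$$

and g(u) = u₁ + u₂². We want to find ∂g/∂p.

To calculate ∂g/∂p using the adjoint method we begin by finding the Jacobian matrix

$$\frac{\partial \mathbf{u}}{\partial \mathbf{p}} = \begin{bmatrix} 2p_1 & 1 \\ p_2 & p_1 \end{bmatrix}.$$

Then

$$\frac{\partial g}{\partial \mathbf{p}} = \left(\frac{\partial \mathbf{u}}{\partial \mathbf{p}}\right)^{T} \frac{\partial g}{\partial \mathbf{u}} = \begin{bmatrix} 2p_1 & p_2 \\ 1 & p_1 \end{bmatrix} \begin{bmatrix} 1 \\ 2u_2 \end{bmatrix} = \begin{bmatrix} 2p_1 + 2p_2 u_2 \\ 1 + 2p_1 u_2 \end{bmatrix} = \begin{bmatrix} 2p_1 + 2p_1 p_2^2 \\ 1 + 2p_1^2 p_2 \end{bmatrix}.$$

To check this solution we find ∂g/∂p directly using g(u) = u₁ + u₂² = (p₁² + p₂) + (p₁p₂)², from which we find ∂g/∂p₁ = 2p₁ + 2p₁p₂² and ∂g/∂p₂ = 1 + 2p₁²p₂.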
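The same check can be carried out numerically. The following is a minimal sketch in Python with NumPy (neither is used in the paper), taking u₁ = p₁² + p₂ and u₂ = p₁p₂ as implied by the direct check above. The function names and the test point are illustrative assumptions.

```python
import numpy as np

def F(p):
    """Map parameters p = (p1, p2) to the solution u = (u1, u2)."""
    p1, p2 = p
    return np.array([p1**2 + p2, p1 * p2])

def jacobian(p):
    """Jacobian du/dp: row i holds the derivatives of u_i."""
    p1, p2 = p
    return np.array([[2.0 * p1, 1.0],
                     [p2, p1]])

def dg_du(u):
    """Gradient of g(u) = u1 + u2**2 with respect to u."""
    return np.array([1.0, 2.0 * u[1]])

def dg_dp(p):
    """dg/dp = (du/dp)^T dg/du, i.e. the transposed Jacobian applied to dg/du."""
    return jacobian(p).T @ dg_du(F(p))

p = np.array([1.5, -0.5])            # arbitrary test point (assumption)
adjoint = dg_dp(p)

# Direct differentiation of g = (p1^2 + p2) + (p1 p2)^2 for comparison.
p1, p2 = p
direct = np.array([2 * p1 + 2 * p1 * p2**2, 1 + 2 * p1**2 * p2])
print(adjoint, direct)
```

Both arrays should agree to rounding error at any test point, since the two expressions are algebraically identical.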

A Typical Situation

Often the problem to be solved (whose solution is u = F(p)) is expressed as a system of n equations, f(u, p) = 0. For this problem we still have a function g(u), for which we want to find optimal values of the parameters p and thus need ∂g/∂p = [∂g/∂p₁ ∂g/∂p₂ … ∂g/∂p_m]^T. In the preceding section the chain rule was used to show that ∂g/∂p = (∂u/∂p)^T ∂g/∂u. To simplify the appearance of some of the equations that follow, we begin by taking the transpose of both sides of this equation and use the matrix identity (AB)^T = B^T A^T to obtain

$$\frac{\partial g}{\partial \mathbf{p}}^{T} = \frac{\partial g}{\partial \mathbf{u}}^{T}\,\frac{\partial \mathbf{u}}{\partial \mathbf{p}}.$$

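As before, the elements of ∂g/∂u are obtained by differentiating g(u), but now the elements of ∂u/∂p must be determined from f(u, p) = 0. If we take the total derivative of both sides of the equation f_i(u, p) = 0 with respect to p_k we obtain

$$\sum_{j=1}^{n} \frac{\partial f_i}{\partial u_j}\,\frac{\partial u_j}{\partial p_k} + \frac{\partial f_i}{\partial p_k} = 0.$$

Doing this for every combination of the n equations f_i and the m elements of p results in nm equations that can be represented compactly as ∂f/∂u ∂u/∂p + ∂f/∂p = 0:

$$\begin{bmatrix} \partial f_1/\partial u_1 & \cdots & \partial f_1/\partial u_n \\ \vdots & \ddots & \vdots \\ \partial f_n/\partial u_1 & \cdots & \partial f_n/\partial u_n \end{bmatrix} \begin{bmatrix} \partial u_1/\partial p_1 & \cdots & \partial u_1/\partial p_m \\ \vdots & \ddots & \vdots \\ \partial u_n/\partial p_1 & \cdots & \partial u_n/\partial p_m \end{bmatrix} + \begin{bmatrix} \partial f_1/\partial p_1 & \cdots & \partial f_1/\partial p_m \\ \vdots & \ddots & \vdots \\ \partial f_n/\partial p_1 & \cdots & \partial f_n/\partial p_m \end{bmatrix} = \mathbf{0}.$$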

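This equation can be solved for the ∂u_i/∂p_k factors,

$$\frac{\partial \mathbf{u}}{\partial \mathbf{p}} = -\left(\frac{\partial \mathbf{f}}{\partial \mathbf{u}}\right)^{-1} \frac{\partial \mathbf{f}}{\partial \mathbf{p}},$$

needed to obtain our goal,

$$\frac{\partial g}{\partial \mathbf{p}}^{T} = \frac{\partial g}{\partial \mathbf{u}}^{T}\,\frac{\partial \mathbf{u}}{\partial \mathbf{p}} = -\frac{\partial g}{\partial \mathbf{u}}^{T} \left(\frac{\partial \mathbf{f}}{\partial \mathbf{u}}\right)^{-1} \frac{\partial \mathbf{f}}{\partial \mathbf{p}}.$$

Here ∂g/∂u^T is a 1 × n (row) matrix, ∂f/∂u is n × n and ∂f/∂p is n × m. To avoid multiplying the two large matrices we perform the multiplication in the order

$$\frac{\partial g}{\partial \mathbf{p}}^{T} = -\left[\frac{\partial g}{\partial \mathbf{u}}^{T} \left(\frac{\partial \mathbf{f}}{\partial \mathbf{u}}\right)^{-1}\right] \frac{\partial \mathbf{f}}{\partial \mathbf{p}} = -\boldsymbol{\lambda}^{T}\,\frac{\partial \mathbf{f}}{\partial \mathbf{p}},$$

with the product λ^T = ∂g/∂u^T (∂f/∂u)⁻¹ found by solving the adjoint problem

$$\left(\frac{\partial \mathbf{f}}{\partial \mathbf{u}}\right)^{T} \boldsymbol{\lambda} = \frac{\partial g}{\partial \mathbf{u}}.$$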

Note that ∂f/∂u is a Jacobian matrix; once again obtaining the desired partial derivatives (∂g/∂p) uses the transpose (i.e. adjoint) of a Jacobian matrix. Also note that this Jacobian matrix is the one used in Newton’s method to (approximately) solve f(u, p) = 0 (with fixed p) iteratively.
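The effect of the order of multiplication can be illustrated with a short Python/NumPy sketch. The random matrices below are hypothetical stand-ins for ∂f/∂u, ∂f/∂p and ∂g/∂u, and the function names are assumptions made for illustration. The first function forms ∂u/∂p explicitly, which requires a linear solve with m right-hand sides. The second solves the single adjoint system for λ.

```python
import numpy as np

def gradient_via_sensitivities(df_du, df_dp, dg_du):
    """Form du/dp first (a solve with m right-hand sides), then apply the chain rule."""
    du_dp = -np.linalg.solve(df_du, df_dp)      # n x m matrix du/dp
    return dg_du @ du_dp                        # dg/dp^T, length m

def gradient_via_adjoint(df_du, df_dp, dg_du):
    """Solve one adjoint system, then take one vector-matrix product."""
    lam = np.linalg.solve(df_du.T, dg_du)       # (df/du)^T lam = dg/du
    return -lam @ df_dp                         # dg/dp^T = -lam^T df/dp

rng = np.random.default_rng(0)
n, m = 6, 4                                     # illustrative sizes (assumption)
df_du = rng.standard_normal((n, n)) + n * np.eye(n)   # shifted to keep it well conditioned
df_dp = rng.standard_normal((n, m))
dg_du = rng.standard_normal(n)

print(gradient_via_sensitivities(df_du, df_dp, dg_du))
print(gradient_via_adjoint(df_du, df_dp, dg_du))
```

The two results agree because matrix multiplication is associative; the adjoint ordering simply avoids ever forming the n × m matrix ∂u/∂p.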

Example 2:

Suppose

f₁(u₁, u₂, p₁, p₂) = u₁ + u₂ + p₁

f₂(u₁, u₂, p₁, p₂) = u₁³ – u₂ + p₂

g(u₁, u₂) = u₁² + u₂²

The system of equations f₁(u₁, u₂, p₁, p₂) = 0 and f₂(u₁, u₂, p₁, p₂) = 0 has a solution, (u₁, u₂), that depends on the parameters p₁ and p₂. Changing the values of the parameters p₁ and p₂ causes the values of the variables u₁ and u₂ to change and thus the value of g(u₁, u₂) also changes. We want to find ∂g/∂p^T = [∂g/∂p₁ ∂g/∂p₂].

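To calculate ∂g/∂p^T using the adjoint method we begin by finding the Jacobian matrix

$$\frac{\partial \mathbf{f}}{\partial \mathbf{u}} = \begin{bmatrix} 1 & 1 \\ 3u_1^2 & -1 \end{bmatrix}$$

along with

$$\frac{\partial \mathbf{f}}{\partial \mathbf{p}} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \quad \text{and} \quad \frac{\partial g}{\partial \mathbf{u}} = \begin{bmatrix} 2u_1 \\ 2u_2 \end{bmatrix}.$$

Next we write the adjoint problem (∂f/∂u)^T λ = ∂g/∂u,

$$\begin{bmatrix} 1 & 3u_1^2 \\ 1 & -1 \end{bmatrix} \begin{bmatrix} \lambda_1 \\ \lambda_2 \end{bmatrix} = \begin{bmatrix} 2u_1 \\ 2u_2 \end{bmatrix},$$

and solve it to find

$$\boldsymbol{\lambda} = \frac{2}{1 + 3u_1^2} \begin{bmatrix} u_1 + 3u_1^2 u_2 \\ u_1 - u_2 \end{bmatrix}.$$

The desired derivatives are now found from

$$\frac{\partial g}{\partial \mathbf{p}}^{T} = -\boldsymbol{\lambda}^{T}\,\frac{\partial \mathbf{f}}{\partial \mathbf{p}} = \frac{-2}{1 + 3u_1^2} \begin{bmatrix} u_1 + 3u_1^2 u_2 & u_1 - u_2 \end{bmatrix}.$$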

To appreciate the meaning of these derivatives we consider a couple of specific value pairs for the parameters p ₁ and p ₂.

First, if p₁ = 0 and p₂ = 0, the equation f(u, p) = 0 has solution u = 0, so g(u) = 0. Since g = u₁² + u₂² ≥ 0 (always), g(u) = 0 must be a global minimum so we expect ∂g/∂p = 0. Indeed, substituting u₁ = 0 and u₂ = 0 into our expression confirms this: ∂g/∂p^T = [0 0].

To gain further appreciation of ∂g/∂p, we consider a graphical representation of our example problem. If we plot solutions of f₁(u₁, u₂, p₁, p₂) = u₁ + u₂ + p₁ = 0 in the (u₁, u₂) plane we obtain a line with slope –1 and vertical intercept –p₁. If we plot solutions of f₂(u₁, u₂, p₁, p₂) = u₁³ – u₂ + p₂ = 0 in the (u₁, u₂) plane we obtain a cubic curve that crosses the vertical axis at p₂ with slope 0. The solution to the system of equations f(u, p) = 0 is represented by the point where the line and the cubic curve cross. The function g gives the square of the distance from the origin, so circles centered on the origin represent contours of constant g. Curves for p₁ = –2, p₂ = 0 and g = 2 are shown in Figure 1.

Fig. 1. Curves for the problem presented in example 2, with p ₁ = –2, p ₂ = 0 and g( u) = 2.

As can be seen from Figure 1, for this value of p the solution of f(u, p) = 0 is u = (1, 1). If we increase p₁, the line with slope –1 will move down in the figure so the solution point will follow the cubic curve, decreasing its distance from the origin, so g decreases. If, however, we increase p₂, the cubic curve will move up so the solution point will follow the line with slope –1, which is tangent to the (dashed) circle representing a contour of constant g. Thus we expect ∂g/∂p₁ < 0 and ∂g/∂p₂ = 0. Substituting the solution u₁ = 1 and u₂ = 1 into our expression gives ∂g/∂p^T = [–2 0], confirming this. The reader is encouraged to use the graphical representation to find other parameter pairs where one of the components of ∂g/∂p is zero and use the expression for ∂g/∂p^T to confirm their findings.
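Example 2 can also be checked numerically. The sketch below (Python with NumPy, an assumption of this illustration rather than something used in the paper) solves f(u, p) = 0 by Newton's method with the Jacobian ∂f/∂u, then solves the adjoint problem to obtain ∂g/∂p^T. The function names, starting guess and tolerances are illustrative choices.

```python
import numpy as np

def f(u, p):
    """Residuals f1 = u1 + u2 + p1 and f2 = u1^3 - u2 + p2."""
    return np.array([u[0] + u[1] + p[0],
                     u[0]**3 - u[1] + p[1]])

def df_du(u):
    """Jacobian of f with respect to u."""
    return np.array([[1.0, 1.0],
                     [3.0 * u[0]**2, -1.0]])

DF_DP = np.eye(2)                     # df/dp is the identity for this example

def solve_forward(p, u0=(0.0, 0.0), tol=1e-12, max_iter=50):
    """Newton's method for f(u, p) = 0 with p held fixed."""
    u = np.array(u0, dtype=float)
    for _ in range(max_iter):
        du = np.linalg.solve(df_du(u), -f(u, p))
        u += du
        if np.linalg.norm(du) < tol:
            break
    return u

def g(u):
    return u[0]**2 + u[1]**2

def adjoint_gradient(p):
    """dg/dp^T = -lam^T df/dp, with (df/du)^T lam = dg/du."""
    u = solve_forward(p)
    lam = np.linalg.solve(df_du(u).T, 2.0 * u)   # dg/du = [2 u1, 2 u2]
    return -lam @ DF_DP

p = np.array([-2.0, 0.0])
u = solve_forward(p)
grad = adjoint_gradient(p)
print(u, grad)
```

For p₁ = –2 and p₂ = 0 the computed solution and gradient can be compared with u = (1, 1) and ∂g/∂p^T = [–2 0] from the text.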

A Numerical Example

In this section we use finite differences to approximate the boundary-value problem

$$c_2 u'' + c_1 u' + c_0 u = p(x) \quad \text{for } 0 < x < 1, \qquad u(0) = a_0, \quad u(1) = a_1,$$

where p(x) = p₀ + p₁x + p₂x² (a second-order polynomial). After computing an approximate solution for particular values of the parameters c₂, c₁, c₀, p₀, p₁, p₂, a₀ and a₁, we use the adjoint method to calculate the rate of change of a function of the solution, g(u), as we vary these parameters. Two different functions are considered: (1) g is the value of u at the midpoint of the domain, and (2) g is the average value of u,

$$g(u) = \int_0^1 u(x)\,\mathrm{d}x.$$

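To apply the finite-difference method to our differential equation we divide the domain [0, 1] into n subintervals, each with length ∆x = 1/n. The endpoints of these subintervals are x₀, x₁, …, x_n, where x_k = k∆x. We denote approximate values of u at each of these points as u_k. The first derivative, u′, at x_k is approximated by the centered difference

$$u'_k = \frac{u_{k+1} - u_{k-1}}{2\,\Delta x}.$$

The second derivative, u″, at x_k is approximated by

$$u''_k = \frac{u_{k+1} - 2u_k + u_{k-1}}{\Delta x^2}.$$

Substituting these approximations into the differential equation gives

$$c_2\,\frac{u_{k+1} - 2u_k + u_{k-1}}{\Delta x^2} + c_1\,\frac{u_{k+1} - u_{k-1}}{2\,\Delta x} + c_0 u_k = p_0 + p_1 x_k + p_2 x_k^2,$$

which can be rearranged to give

$$\left(\frac{c_2}{\Delta x^2} - \frac{c_1}{2\,\Delta x}\right) u_{k-1} + \left(c_0 - \frac{2c_2}{\Delta x^2}\right) u_k + \left(\frac{c_2}{\Delta x^2} + \frac{c_1}{2\,\Delta x}\right) u_{k+1} = p_0 + p_1 x_k + p_2 x_k^2,$$

a linear algebraic equation that can be written for each of the interior points k = 1, 2, …, n – 1. At the endpoints of the domain we assert u₀ = a₀ and u_n = a₁. This collection of equations can be written compactly as Au = b, where A is a tridiagonal matrix, and u and b are column matrices. The matrix equation can be solved (e.g. using Gaussian elimination and back substitution) for u, giving an approximation to the solution of the boundary-value problem.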

Following the procedure in the preceding paragraph for the (somewhat arbitrary) parameter values c ₂ = 1, c ₁ = –2, c ₀ = 1, p ₀ = 1, p ₁ = 1, p ₂ = –5, a ₀ = 0 and a ₁ = 0 generates the approximation shown in Figure 2.

Fig. 2. Solution to the boundary-value problem u″ – 2u′ + u = 1 + x – 5x ² for 0 < x < 1, and u( 0) = 0, u( 1) = 0. The solid curve is the exact solution; the black circles are an approximate solution found using finite differences with n = 20.
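A finite-difference solve of this kind might be sketched as follows in Python with NumPy. This is an illustration under stated assumptions, not the program that produced Figure 2: it assumes a centered difference for u′, the standard three-point difference for u″, rows that are not rescaled by ∆x², and a dense solver in place of a tridiagonal one. The function name `assemble` is hypothetical.

```python
import numpy as np

def assemble(c2, c1, c0, p0, p1, p2, a0, a1, n=20):
    """Build A and b for c2 u'' + c1 u' + c0 u = p0 + p1 x + p2 x^2 on [0, 1]."""
    dx = 1.0 / n
    x = np.linspace(0.0, 1.0, n + 1)
    A = np.zeros((n + 1, n + 1))
    b = p0 + p1 * x + p2 * x**2
    for k in range(1, n):                      # interior points
        A[k, k - 1] = c2 / dx**2 - c1 / (2 * dx)
        A[k, k] = c0 - 2 * c2 / dx**2
        A[k, k + 1] = c2 / dx**2 + c1 / (2 * dx)
    A[0, 0] = A[n, n] = 1.0                    # boundary rows: u_0 = a0, u_n = a1
    b[0], b[n] = a0, a1
    return A, b, x

# Parameter values quoted in the text for Figure 2.
A, b, x = assemble(1.0, -2.0, 1.0, 1.0, 1.0, -5.0, 0.0, 0.0)
u = np.linalg.solve(A, b)                      # approximate u at x_0, ..., x_n
print(np.column_stack([x, u]))
```

A dense matrix keeps the sketch short; for large n a banded or sparse solver would exploit the tridiagonal structure of A.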

We can use the adjoint method to find the rate of change of a function of u with respect to the parameters c₂, c₁, c₀, p₀, p₁, p₂, a₀ and a₁. We consider first the simple function g(u) = u_{n/2}, where an even value for n has been chosen.

To utilize the procedure described in the preceding section, the system of equations represented by the matrix equation Au = b is written in the form f(u, p) = 0, by subtracting the right-hand sides from both sides of each equation. Here p = [c₂ c₁ c₀ p₀ p₁ p₂ a₀ a₁]. To find ∂g/∂p we need the three matrices ∂f/∂u, ∂f/∂p and ∂g/∂u. Since ∂f/∂u and ∂f/∂p require two columns to be displayed clearly, they are shown in Figure 3. For g(u) = u_{n/2}, ∂g/∂u = [0 … 0 1 0 … 0]^T, where the single nonzero entry is in the position corresponding to u_{n/2}.

Fig. 3. Matrices used to calculate ∂g/∂ p for the numerical example involving a boundary-value problem.

After solving the adjoint problem (∂f/∂u)^T λ = ∂g/∂u for λ we can calculate ∂g/∂p^T = –λ^T ∂f/∂p. The results for the example problem whose solution is shown in Figure 2 are given in Table 1.

Table 1. Partial derivatives of g(u) using a finite-difference approximation to the boundary-value problem c₂u″ + c₁u′ + c₀u = p₀ + p₁x + p₂x² for 0 < x < 1, and u(0) = a₀, u(1) = a₁, with c₂ = 1, c₁ = –2, c₀ = 1, p₀ = 1, p₁ = 1, p₂ = –5, a₀ = 0 and a₁ = 0

The values of the partial derivatives found using the adjoint method were verified by perturbing the parameters one at a time. For each parameter (c₂, c₁, c₀, p₀, p₁, p₂, a₀ and a₁) an additional finite-difference approximation was calculated with that parameter increased by 0.01, giving new values of u. Calling these new values ũ, we can approximate the partial derivative of g with respect to the perturbed parameter by (g(ũ) – g(u))/0.01. As seen in Table 1, these approximations are very close to the values found using the adjoint method.
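The adjoint calculation and the perturbation check described above might be sketched as follows in Python with NumPy. The sketch makes several assumptions that are not confirmed by the text: centered differences for u′, rows of f = Au – b that are not rescaled by ∆x², and a dense solver. The names `assemble`, `adjoint_gradient` and `g_mid` are hypothetical, and the numbers it prints are not claimed to reproduce Table 1.

```python
import numpy as np

def assemble(params, n=20):
    """A and b for c2 u'' + c1 u' + c0 u = p0 + p1 x + p2 x^2, u(0)=a0, u(1)=a1."""
    c2, c1, c0, p0, p1, p2, a0, a1 = params
    dx = 1.0 / n
    x = np.linspace(0.0, 1.0, n + 1)
    A = np.zeros((n + 1, n + 1))
    b = p0 + p1 * x + p2 * x**2
    for k in range(1, n):
        A[k, k - 1] = c2 / dx**2 - c1 / (2 * dx)
        A[k, k] = c0 - 2 * c2 / dx**2
        A[k, k + 1] = c2 / dx**2 + c1 / (2 * dx)
    A[0, 0] = A[n, n] = 1.0
    b[0], b[n] = a0, a1
    return A, b, x

def adjoint_gradient(params, dg_du, n=20):
    """Return dg/dp^T = -lam^T df/dp for f(u, p) = A u - b."""
    A, b, x = assemble(params, n)
    u = np.linalg.solve(A, b)
    dx = 1.0 / n
    i = np.arange(1, n)                                      # interior rows
    df_dp = np.zeros((n + 1, 8))
    df_dp[i, 0] = (u[i + 1] - 2 * u[i] + u[i - 1]) / dx**2   # df/dc2
    df_dp[i, 1] = (u[i + 1] - u[i - 1]) / (2 * dx)           # df/dc1
    df_dp[i, 2] = u[i]                                       # df/dc0
    df_dp[i, 3] = -1.0                                       # df/dp0
    df_dp[i, 4] = -x[i]                                      # df/dp1
    df_dp[i, 5] = -x[i]**2                                   # df/dp2
    df_dp[0, 6] = -1.0                                       # df/da0
    df_dp[n, 7] = -1.0                                       # df/da1
    lam = np.linalg.solve(A.T, dg_du)                        # (df/du)^T lam = dg/du
    return -lam @ df_dp

n = 20
params = np.array([1.0, -2.0, 1.0, 1.0, 1.0, -5.0, 0.0, 0.0])
dg_du = np.zeros(n + 1)
dg_du[n // 2] = 1.0                                          # g(u) = u_{n/2}
grad_adjoint = adjoint_gradient(params, dg_du, n)

def g_mid(params):
    A, b, _ = assemble(params, n)
    return np.linalg.solve(A, b)[n // 2]

eps = 0.01                                                   # perturbation size from the text
grad_fd = np.array([(g_mid(params + eps * np.eye(8)[k]) - g_mid(params)) / eps
                    for k in range(8)])
print(np.column_stack([grad_adjoint, grad_fd]))
```

Because u depends linearly on p₀, p₁, p₂, a₀ and a₁, the perturbation estimates for those five parameters should match the adjoint values to rounding error, while those for c₂, c₁ and c₀ differ slightly with a perturbation of this size.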

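To calculate the partial derivatives of g(u) = ∫₀¹ u(x) dx we use (the composite) Simpson’s rule (which requires an even value for n) to approximate this integral,

$$g(\mathbf{u}) \approx \frac{\Delta x}{3}\left(u_0 + 4u_1 + 2u_2 + 4u_3 + \cdots + 2u_{n-2} + 4u_{n-1} + u_n\right),$$

from which we have

$$\frac{\partial g}{\partial \mathbf{u}} = \frac{\Delta x}{3}\begin{bmatrix} 1 & 4 & 2 & 4 & \cdots & 2 & 4 & 1 \end{bmatrix}^{T}.$$

Again, ∂g/∂p^T = –λ^T ∂f/∂p, where λ is the solution of the adjoint problem (∂f/∂u)^T λ = ∂g/∂u. Results are given in Table 1.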
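Because Simpson's rule is a weighted sum of the u_k, the vector ∂g/∂u is simply the vector of weights. A minimal Python/NumPy sketch, assuming the standard composite Simpson weights on [0, 1] (the helper name `simpson_weights` is hypothetical):

```python
import numpy as np

def simpson_weights(n):
    """Composite Simpson weights on [0, 1] with n (even) subintervals."""
    if n % 2:
        raise ValueError("Simpson's rule needs an even n")
    w = np.ones(n + 1)
    w[1:n:2] = 4.0            # odd-indexed interior points
    w[2:n:2] = 2.0            # even-indexed interior points
    return w / (3.0 * n)      # factor dx / 3 with dx = 1 / n

n = 20
dg_du = simpson_weights(n)    # g(u) is approximated by dg_du @ u
x = np.linspace(0.0, 1.0, n + 1)
approx = dg_du @ x**2         # sanity check on a quadratic integrand
print(approx)
```

This vector would take the place of the unit vector used for g(u) = u_{n/2} as the right-hand side of the adjoint problem; nothing else in the procedure changes.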

A Data-Assimilation Example from Glaciology

Mathematical models of glaciers typically include parameters which are difficult to estimate directly from field measurements. For example, some type of frictional coefficient is required at the base of the glacier where the ice rests upon the earth. The basal traction (described mathematically using this frictional coefficient) affects the velocity of the ice throughout the glacier. Thus, it is plausible to use the more easily measured velocity at the upper surface of the glacier to determine the frictional coefficient. In this concluding section we present an example problem, using a simplified geometry to demonstrate the use of the adjoint method to accomplish this.

The example problem is related to a benchmark problem, ‘Experiment D’ of the Ice Sheet Model Intercomparison Project for Higher-Order Models (ISMIP-HOM) (Pattyn and others, 2008). In ISMIP-HOM Experiment D, a sheet of ice with uniform thickness of 1000 m rests on a plane inclined at an angle α = 0.1°. A domain of length 20 km with periodic boundary conditions at each end is considered. A numerical model is used to calculate the velocity of the ice, which varies not only throughout the thickness of the ice but also along the length of the domain, due to variations in basal traction. There is no variation in the horizontal direction transverse to the inclination.

In ISMIP-HOM Experiment D, the basal traction is prescribed and ice velocities (and pressure) are calculated. We call this a ‘forward problem’. Here we are interested in a related ‘inverse problem’: given the surface velocity, find the basal traction.

The model used is a finite-element approximation* of Stokes’ equations†

along with conservation of mass,

$$\frac{\partial u}{\partial x} + \frac{\partial v}{\partial y} = 0.$$

In these equations, x and y are the coordinate directions, x along the incline and y perpendicular to it (almost vertical). The components of velocity in the x- and y-directions are given by u and v, respectively. Pressure is represented by the variable p, while ρ = 910 kg m⁻³ is the density of ice and g = 9.81 m s⁻² is the acceleration due to gravity. The viscosity, μ, varies according to Glen’s flow law‡

with the effective strain rate

and flow parameters A = 10⁻¹⁶ Pa⁻ⁿ a⁻¹ and n = 3.

Stress-free boundary conditions apply at the upper surface

while at the base v = 0 and basal traction is modeled using the boundary condition

In ISMIP-HOM Experiment D

$$\beta^2 = 1000 + 1000 \sin\left(\frac{2\pi x}{L}\right),$$

where L is the length of the domain (20 km for the problem presented here). This frictional coefficient is plotted as a heavy black curve in the lower left panel of Figure 4. The upper left panel of Figure 4 shows the velocity component, u, at the (top) surface of the ice for a solution of the ‘forward’ problem (using β² = 1000 + 1000 sin(2πx/L)) plotted as a heavy black line.

Fig. 4. Solution of an inverse problem. The component of the ice velocity in the direction parallel to the bed (u; m a⁻¹) at the (top) surface of the ice is shown in the upper left panel. This velocity evolves towards the desired solution (the heavy black curve) as the basal friction (β², plotted in the lower left panel) is changed. The upper right panel shows the error, g(u), decreasing as the coefficients, p_k, in the trigonometric expansion of β² (shown in the lower right panel) change.

Now, for the inverse problem. Suppose we are given the surface velocity component, u, plotted as a heavy black curve in the upper left panel of Figure 4. We hope to use this information to discover the basal friction coefficient, β ² (plotted as a heavy black curve in the lower left panel of Fig. 4). To do so we pose the problem as an optimization problem: find β ² such that the surface velocity component, u, found by solving the ‘forward’ problem minimizes the error given by

where u_d is the desired surface velocity. To discretize the function β² a trigonometric expansion§ is used,

A variety of algorithms for solving optimization problems exist (Press, 2007). Here we choose the Broyden–Fletcher–Goldfarb–Shanno (BFGS) algorithm (Press, 2007, p. 521–524), which uses partial derivatives of the objective function, g(u), with respect to the parameters, p_k, that can be varied. These partial derivatives (∂g/∂p_k) are exactly what the adjoint method described in the previous sections can provide. Starting with a ‘guess’ for β², an approximate solution to the inverse problem is obtained by repeating the following steps:

1. Solve the ‘forward’ problem with the current estimate of β².
2. Evaluate the error, g(u).
3. Evaluate the partial derivatives, ∂g/∂p_k , using the adjoint method.
4. Calculate an improved estimate of β² (i.e. the p_k ) according to the BFGS algorithm.
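The four steps above can be sketched on a toy problem. The Python/NumPy code below is only an illustration of the loop structure: the forward model is a hypothetical 3 × 3 linear system f(u, p) = Au – p = 0 standing in for the ice-flow model, the ‘measured’ data are synthetic, and the optimizer is a bare-bones BFGS with a backtracking line search rather than the Numerical Recipes routine or the software used in the paper.

```python
import numpy as np

# Hypothetical stand-in for the forward model: f(u, p) = A u - p = 0.
A = np.array([[2.0, -1.0, 0.0],
              [-1.0, 2.0, -1.0],
              [0.0, -1.0, 2.0]])
p_true = np.array([1.0, 2.0, 3.0])
u_d = np.linalg.solve(A, p_true)                  # synthetic "measured" data

def error_and_gradient(p):
    u = np.linalg.solve(A, p)                     # step 1: forward problem
    g = np.sum((u - u_d)**2)                      # step 2: error g(u)
    lam = np.linalg.solve(A.T, 2.0 * (u - u_d))   # step 3: adjoint problem
    return g, lam                                 # dg/dp = -lam^T df/dp with df/dp = -I

def bfgs(fun, p, max_iter=100, tol=1e-10):
    H = np.eye(p.size)                            # inverse-Hessian estimate
    g, grad = fun(p)
    history = [g]
    for _ in range(max_iter):
        if np.linalg.norm(grad) < tol:
            break
        d = -H @ grad                             # search direction
        t = 1.0
        g_new, grad_new = fun(p + t * d)
        while g_new > g + 1e-4 * t * (grad @ d) and t > 1e-12:
            t *= 0.5                              # backtracking line search
            g_new, grad_new = fun(p + t * d)
        s, y = t * d, grad_new - grad
        ys = y @ s
        if ys > 1e-10 * np.linalg.norm(y) * np.linalg.norm(s):
            V = np.eye(p.size) - np.outer(s, y) / ys
            H = V @ H @ V.T + np.outer(s, s) / ys # step 4: BFGS update
        p, g, grad = p + s, g_new, grad_new
        history.append(g)
    return p, history

p_est, history = bfgs(error_and_gradient, np.zeros(3))
print(p_est, history[-1])
```

In this toy problem each evaluation costs one forward solve and one adjoint solve regardless of the number of parameters, which is the property that makes the adjoint method attractive when the parameter vector is long.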

The results of applying this process are shown in Figure 4. The initial guess was β² = 1000; i.e. p₀ = 1000 and p_k = 0 for 1 ≤ k ≤ N where, here, N = 30. The gray curves in the left panels of Figure 4 show approximations of u and β² as the process progressed. The original guess of β² = 1000 (represented by a horizontal line in the lower left panel of Fig. 4) resulted in the bottommost horizontal line in the upper left panel of Figure 4. As the estimate of β² improves, the curves approach the desired values plotted as heavy black curves. The upper right panel of Figure 4 shows the error, g(u), decreasing throughout the process. Further improvement can be made by allowing more iterations of the algorithm. The lower right panel of Figure 4 shows the evolution of the coefficient (p_k) values. The upper curve represents p₀, which starts at 1000, dips down for a while but returns to a value very near 1000. The curve that increases from zero to near 1000 represents p₁. The remainder of the curves, which remain near zero, represent p₂ through p₃₀. In the optimal solution these values are all zero, but the largest, p₉, was 36.9 when the process was terminated. Again, further improvement can be made by allowing more iterations.

Acknowledgements

Most of the ideas in the Introduction were taken from section 3 of Errico (1997). Most of the ideas in the section entitled ‘A typical situation’ were taken from section 8.7 of Strang (2007). For the ‘A data-assimilation example from glaciology’ section the finite-element and adjoint methods were implemented using the FEniCS and dolfin-adjoint software packages. Thanks to Jesse Johnson who served as committee chair for my thesis (Granzow, 2013), which includes source code for computer programs that produced Figures 2 and 4. Thanks also to scientific editor Weili Wang and reviewers Fuyuki Saito and Stephen Price. This paper was written while the author was supported by NASA Research Opportunities in Space and Earth Sciences (ROSES) grant NNX11AR23G.

Footnotes

page 440 note * The adjoint of a matrix Z whose elements are complex numbers is the transpose of the matrix whose elements are the complex conjugates of the elements of Z. For real-valued matrices, such as the Jacobian matrices in this paper, the transpose and the adjoint are the same.

page 444 note * The finite-element method is a technique for discretizing a boundary-value problem, approximating its solution using the solution to a system of algebraic equations. One step in the method is to partition the domain of the problem into a set of ‘elements’. Here a uniform grid with 32 elements along the x-axis and 16 elements along the y-axis was used.

† Stokes’ equations represent Newton’s second law, ∑F=m a, for a viscous fluid when the inertial term, m a, is negligible. The equations presented here have been simplified using the assumption that derivatives with respect to z (the transverse direction) are zero.

‡ Glen’s flow law is a nonlinear constitutive equation used to model the relationship between stress and strain rates in ice. This relationship is temperature-dependent, but the model used here assumes that the ice is isothermal.

§ Fourier series, which are trigonometric expansions of this form with an infinite number of terms, can be used to represent any differentiable periodic function (Tolstov, 1962). Truncated Fourier series, such as that used here, are often used to approximate functions of unknown form. An alternate discretization of β² would be an expansion in terms of the ‘test functions’ used in the finite-element method. This could result, for example, in approximating β² with a continuous piecewise linear function.

References

Brinkerhoff, DJ, Meierbachtol, TW, Johnson, JV and Harper, JT (2011) Sensitivity of the frozen/melted basal boundary to perturbations of basal traction and geothermal heat flux: Isunguata Sermia, western Greenland. Ann. Glaciol., 52(59), 43–50 (doi: 10.3189/172756411799096330)

Errico, RM (1997) What is an adjoint model? Bull. Am. Meteorol. Soc., 78(11), 2577–2591 (doi: 10.1175/1520-0477(1997)078<2577:WIAAM>2.0.CO;2)

Granzow, GD (2013) An investigation of viscosity using measured velocities on Helheim Glacier. (Master’s thesis, University of Montana)

Pattyn, F and 20 others (2008) Benchmark experiments for higher-order and full-Stokes ice sheet models (ISMIP-HOM). Cryosphere, 2(2), 95–108 (doi: 10.5194/tc-2-95-2008)

Press, WH (2007) Numerical recipes: the art of scientific computing, 3rd edn. Cambridge University Press, Cambridge

Strang, G (2007) Computational science and engineering. Wellesley-Cambridge, Wellesley, MA

Tolstov, GP [transl. A. Silverman] (1962) Fourier series. Prentice-Hall, Englewood Cliffs, NJ
