To save content items to your account,
please confirm that you agree to abide by our usage policies.
If this is the first time you use this feature, you will be asked to authorise Cambridge Core to connect with your account.
Find out more about saving content to .
To save content items to your Kindle, first ensure no-reply@cambridge.org
is added to your Approved Personal Document E-mail List under your Personal Document Settings
on the Manage Your Content and Devices page of your Amazon account. Then enter the ‘name’ part
of your Kindle email address below.
Find out more about saving to your Kindle.
Note you can select to save to either the @free.kindle.com or @kindle.com variations.
‘@free.kindle.com’ emails are free but can only be saved to your device when it is connected to wi-fi.
‘@kindle.com’ emails can be delivered even when you are not connected to wi-fi, but note that service fees apply.
‘Assimilate’ is a word that conjures up a variety of meanings ranging from its use in the biological to the social to the physical sciences. In all of its uses and meanings, the word embraces the concept of incorporation. “Incorporation of what?” is central to the definition. In our case, we expand its usage by appending the words dynamic and data – where dynamic implies the use of a law or equation or set of equations, typically physical laws. Now we have dynamic law, we have data and we assimilate. It is this melding of data to law or matching of data and law or, in the spirit of the dictionary definition, incorporating data into the law that captures the meaning of our title, dynamic data assimilation.
Its modern-day usage stems from the efforts of meteorologists to estimate the 3-D state of the global atmosphere. The genesis of this effort began in the midto late-1960s when atmospheric general circulation models (now known as global or climate prediction models) came into prominence and the weather satellites began to collect data on a global scale. The major question surfaced: Is it possible or feasible to make long-term weather predictions, predictions on the order of weeks instead of days? Certainly a first step in such an endeavor is to estimate the atmospheric state in the global domain so that the deterministic model can use this state as initial condition and march out into the future via numerical integration of the governing equations. Over the populated continents of the world, Europe and the North American Continent, conventional data are generally sufficient to define a meaningful state.
In this chapter we provide an overview of the historical developments that led to the vast and rich discipline called dynamic data assimilation.
Where do we begin the history?
In Chapters 1 and 2 we have established our philosophy of dynamic data assimilation. Central to this philosophy is the existence of data and governing equations or model dynamics. Thus, the Herculean efforts by scientists like Galileo and Kepler and Newton, efforts that made use of observations to formulate theory, fall outside our scope. Their monumental contributions established some of the governing equations (also known as constraints) upon which dynamic data assimilation depends.
The mathematicians and astronomers of the seventeenth and eighteenth centuries who made use of the Newtonian laws to calculate the orbits of comets were the first data assimilators in our sense of the definition. Newton was among them and he discussed the problem in Principia (Book III, Prop. XLI). Regarding the problem of determining the orbit of comets, he said: “This being a problem of very great difficulty, I tried many methods of resolving it.” Among the early investigators of this problem were Leonard Euler, Louis Lagrange, Pierre-Simon Laplace, and lesser known amateur astronomers like Heinrich Olbers. The task of finding the path of a comet, of course, relied on the coupled set of nonlinear differential equations that described its path under the assumption of two-body celestial mechanics – the motion controlled by the gravitational attraction of the comet to the Sun.
In this chapter we revisit the linear least squares estimation problem and solve it using the method of orthogonal projection. This geometric view is quite fundamental and has guided the development and extension of least squares solutions in several directions. In Section 6.1 we describe the basic principles of orthogonal projections, namely, projecting a vector z onto a single vector h. In Section 6.2, we discuss the extension of this idea of projecting a given vector z onto the subspace spanned by the columns of the measurement matrix H ∈ ℝm×n. An interesting outcome of this exercise is that the set of linear equations defining the optimal estimate by this geometric method are identical to those derived from the method of normal equations. This invariance of the least squares solution with respect to the methods underscores the importance of this class of solutions. Section 6.3 develops the geometric equivalent of the weighted or generalized linear least squares problem. It is shown that the optimal solution is given by an oblique projection as opposed to an orthogonal projection. In Section 6.4 we derive conditions for the invariance of least squares solutions under linear transformations of both the model space ℝn and the observation space ℝm. It turns out invariance is achievable within the framework of generalized or weighted least squares formulation.
Orthogonal projection: basic idea
Let h = (h1, h2, …, hm)T ∈ ℝm be the given vector representing the measurement system, and let z = (z1, z2, …, zm)T ∈ ℝm be a set of m observations, where it is assumed that z is not a multiple of h. Refer to Figure 6.1.1.
In Chapters 5 and 7 the least squares problem – minimization of the residual norm, f (x) = ∥r(x)∥ where r(x) = (z − Hx) in (5.1.11) and (7.1.3) with respect to the state variable x is formulated. There are essentially two mathematically equivalent approaches to this minimization. In the first, compute the gradient ∇ f (x) and obtain (the minimizer) x by solving ∇ f (x) = 0. We then check if the Hessian ∇2f (x) is positive definite to guarantee that x is indeed a local minimum. In the linear least squares problem in Chapter 5, f (x) is a quadratic function x and hence ∇ f (x) = 0 leads to the solution of a linear system of the type Ax = b with A a symmetric and positive definite matrix (refer to (5.1.17)) which can be solved by the methods described in Chapter 9. In the nonlinear least squares problem, f (x) may be highly nonlinear (far beyond the quadratic nonlinearity). In this case, we can compute x by solving a nonlinear algebraic system given by ∇ f (x) = 0, and then checking for the positive definiteness of the Hessian ∇2f(x). Alternatively, we can approximate f(x) locally around a current operating point, say, xc by a quadratic form Q(y) (using either the first-order or the second-order method described in Chapter 7) where y = (x – xc).
In this opening chapter of Part V we develop the basic concepts leading to the formulation of the so-called data assimilation problem for static models. This problem arises in a wide variety of application domains and accordingly it goes with different terminologies that are unique to an application domain. For example, in oceanography and geological exploration, it is known as the inverse problem. In meteorology, this is known as the retrieval problem, objective analysis, three dimensional variational assimilation (3DVAR) problem, to mention a few. Henceforth, we use the term static data assimilation problem, retrieval problem, and inverse problem interchangeably. Despite these differences in the origin and the peculiarities of the labels, there is a common mathematical structure – a unity in diversity – that underlie all of these problems. The primary aim of this chapter is to develop this common framework. In Part VI we develop the data assimilation for dynamic models.
In Section 18.1, we describe the basic building blocks leading to the statement of the data assimilation problem for the static model. It turns out that this problem is intrinsically under-determined (where the number n of unknown variables is larger than the number, m of equations) which in turn implies that the solution space has a large degree of freedom (equal to n–m) leading to infinitely many solutions. Any attempt to induce uniqueness of the solution calls for the reduction of the dimensionality of the solution space. This is achieved by a general technique that is called regularization. A comprehensive review of the various regularization techniques that are commonly used is given in Section 18.2.
This chapter complements Chapter 1 by providing a bottom-up view of data assimilation through illustrative examples – one for each of the four classes of problems introduced there. We also include a discussion of problems associated with sensitivity and predictability. Using the standard least squares formulation, we provide a natural and intuitive interpretation of the solutions to these problems.
Least squares
The central criterion used in data assimilation is least squares. As stated earlier, it arose 200 years ago and history has bestowed simultaneity of discovery on both Gauss and Legendre. It assumes a variety of forms, but its fundamental tenet in data assimilation is minimization of the squared departure between the desired estimate and observations and/or other “background” information (typically a forecast). It was built on the foundation of variational calculus, the branch of mathematics that explores minimization of integrals – for example, integrals that express the path of quickest descent (the brachistichrone problem), path of least time (refraction of light), and the principle of least action. As such, there is a rich heritage of applied mathematical methods that can be brought to bear on these minimization problems.
Deterministic/Static problem
In its simplest form, the solution of a data assimilation problem underpinned by least squares reduces to averaging the observations. It is no more or no less than the “carpenter's rule of thumb”: the best estimate of a length measured more than once with the same instrument is the average of the measurements. Let's put this adage in the context of a dynamical law where we choose the nonlinear advection constraint of Burgers (see Chapter 3).
It was around 1660 Newton discovered the method for solving nonlinear equations that bears his name. Shortly thereafter – around 1665 – he also developed the secant method for solving nonlinear equations. Since then these methods have become a part of the folklore in numerical analysis (see Exercises 12.1 and 12.2). In addition to solving nonlinear equations, these methods can also be applied to the problem of minimizing a nonlinear function. In this chapter we provide an overview of the classical Newton's method and many of its modern relatives called quasi-Newton methods for unconstrained minimization. The major advantage of the Newton's method is its quadratic convergence (Exercise 12.3) but in finding the next descent direction it requires solution of a linear system which is often a bottleneck. Quasi-Newton methods are designed to preserve the good convergence properties of the Newton's method while they provide considerable relief from this computational bottleneck. Quasi-Newton methods are extensions of the secant method. Davidon was the first to revive the modern interest in quasi-Newton methods in 1959 but his work remained unpublished till 1991. However, Fletcher and Powell in 1963 published Davidon's ideas and helped to revive this line of approach to designing efficient minimization algorithms.
The philosophy and practice that underlie the design of quasi-Newton methods underscore the importance of the trade - off between rate of convergence and computational cost and storage.
In this chapter we describe a deterministic approach to stability and predictability of dynamic systems. While the classical stability theory deals with characterizing the growth and behavior of perturbations around an equilibrium state of a system, the goal of the predictability theory is to quantify the growth and behavior of infinitesimally small perturbations superimposed on an evolving trajectory – be it stable, unstable or chaotic – of the given dynamical system. Any two states that are infinitesimally close to each other are called analogs. Thus, predictability theory seeks to characterize the future evolution of analogous states. Since every trajectory starts from an initial state, predictability analysis is often recast as one of analyzing the sensitive dependence on initial state. Despite this difference in goals, both stability and predictability theories depend heavily on the same set of mathematical ideas and tools drawn from the spectral (eigenvalue) theory of finite dimensional (matrix) operators.
The goals and problems related to deterministic predictability theory are reviewed in the opening Section 32.1. Section 32.2 through 32.5 provide a succinct review of stability theory of dynamical systems. Predictability analysis using singular vectors is developed in Section 32.6. A summary of a fundamental theorem of Osledec leading to the definition of Lyapunov vectors and Lyapunov indices is given in Section 32.7. This section also contains two related algorithms for computing Lyapunov indices. The concluding Section 32.8 describes two methods for generating deterministic ensembles using which one can assess evolution of the spread among the trajectories measured among other things through the sample covariance.
This chapter provides an introduction to the principles and techniques of statistical least squares estimation of an unknown vector x ∈ ℝn when the observations are corrupted by additive random noise. While the techniques and developments in this chapter parallel those of Chapter 5, the key assumption relative to the random nature of the observation sets this chapter apart. An immediate consequence is that the estimates are random variables and we now need to contend with the additional challenge of quantifying its mean, variance and many of the other desirable attributes such as unbiasedness, efficiency, consistency, to mention a few.
Section 14.1 contains the derivation of the statistical least squares estimate. An analysis of the quality of the fit between the linear model and the data is presented in Section 14.2. The Gauss–Markov theorem and its implications of optimality of the linear least squares estimates are covered in Section 14.3. A discussion of the model error and its impact on the quality of the least squares estimate is presented in Section 14.4.
Statistical least squares estimate
Consider the linear estimation problem where the unknown x ∈ ℝn and the known observation z ∈ ℝm are related as
where H ∈ ℝm×n is a known matrix and v is the additive random noise corrupting the observations. For definiteness, it is assumed that m > n. This noise vector v is not observable and to render the problem tractable, the following assumptions are made.
This chapter provides an overview of the methods for recursively estimating the state of a nonlinear stochastic dynamical system based on a set of observations that (a) depend (nonlinearly) on the state being estimated and (b) are corrupted by additive white noise. The exact solution to this problem involves characterizing the evolution of the posterior probability density function over the state space, ℝn. This evolution equation can easily be derived from first principles. However, except in special cases (linear dynamics and linear observations) it is often difficult to explicitly characterize the form of the density as a function of space and time. Numerical methods are the only recourse to solving this class of infinite dimensional problems. Given this challenge and the difficulty, researchers have sought for alternate characterization, namely to compute the evolution of the moments of distribution of states being estimated. Ideally, one would require infinitely many moments to provide an equivalent characterization of the distribution. This infinite dimensional problem is further exacerbated by the fact that the rth moment often depends on the qth moment, for q > r. Computational feasibility demands that we find a “good” finite dimensional approximation to this infinite system of coupled moments.
One useful idea is to find the closure property among these moments, namely to find the least positive integer p such that the first p moments depend only among themselves and not on moments of order larger than p.
In the variational approach, the dynamic data assimilation problem is recast as a minimization of the least squares performance criterion subject to the dynamic constraints. The first-order adjoint methods described in Chapters 22–24 enable us to compute the gradient of this objective function. Since the convergence of the gradient algorithm can be slow, especially in nonlinear problems of interest in geophysical applications, the gradient obtained using the first-order adjoint method is often used in conjunction with the quasi-Newton methods (Chapter 12) to obtain faster convergence. The strength of the quasi-Newton methods lies in their ability to extract the approximate Hessian of the objective function which in turn is used in a Newton-like algorithm. It is well known that minimization algorithms using the Hessian information perform better. Thus it behooves us to ponder the following question: in addition to the gradient, can we directly compute the Hessian related information, namely the Hessian-vector product? If this information can be obtained, we can then use it in conjunction with the conjugate gradient algorithm to obtain faster convergence. A framework for using the Hessian-vector product within the conjugate gradient algorithm framework is described in Section 12.3.
In this chapter we derive the so-called second-order adjoint method for computing simultaneously the gradient and the Hessian-vector product. The derivation for the scalar case is given in Section 25.1 and its extension to include the vector case is given in 25.2. Section 25.3 describes an application of the second-order adjoint method for computing the sensitivity of a response function. First-order adjoint sensitivity computations are given in Section 24.5.
This chapter provides an overview of the classical Bayesian method for point estimation. The main point of departure of this method from other methods is that it considers the unknown x as a random variable. All the prior knowledge about this unknown is summarized in the form of a known prior distribution p(x) of x. If z is the set of observations that contains information about the unknown x, this distribution is often given in the form of a conditional distribution p(z│x). The basic idea is to combine these two pieces of information to obtain an optimal estimate of x, called the Bayes estimate.
The Bayesian framework is developed in Section 16.1. Special classes of Bayesian estimators – Bayes least squares estimate leading to the conditional mean (which is also the minimum variance estimate), conditional mode, and conditional median estimates are derived in Section 16.2.
The Bayesian framework
Let x ∈ ℝn be the unknown to be estimated and z ∈ ℝm be the observations that contain information about the unknown x to be estimated. The distinguishing feature of the Bayes framework is that it also treats the unknown x as a random variable. It is assumed that a prior distributionp(x) is known. This distribution summarizes our initial belief about the unknown. It is assumed that nature picks a value of x from the distribution p(x) but decides to tease us by not disclosing her choice, thereby defining a game.
In this chapter we extend the first-order adjoint method developed in the context of linear dynamical systems in Chapter 23 to the case of general nonlinear dynamical systems. Since most of the models of interest in research and operations are nonlinear, the contents of this chapter are especially applicable to real world problems.
In Chapter 23 we have brought out the similarities and differences between the two methods – the Lagrangian approach and the adjoint operator theoretic approach – for computing the gradient of the functional representing the desired criterion. Since we have featured the Lagrangian method twice – in Chapters 22 and 23 – we use the adjoint operator theoretic approach in this chapter. Our decision to use alternate approaches in different chapters is driven by two goals. First it provides variety and is hopefully more stimulating to the reader. Second, and more importantly, it enables the reader to acquire dexterity with the spectrum of tools that can be applied to real world problems.
The statement of the inverse problem is contained in Section 24.1. The first-order perturbation method for quantifying the growth of error is described in Section 24.2. Computation of the gradient using the adjoint method and an algorithm for minimization are given in Section 24.3 and 24.4 respectively. In Section 24.5, we introduce the reader to computation of model output sensitivity via adjoint modeling, i.e., the change of model output with respect to elements of the control vector.
So far in Chapters 5 through 7, it was assumed that the number m of observations is fixed and is known in advance. This treatment has come to be known as the fixed sample or off-line version of the least squares problem. In this chapter, we introduce the rudiments of the dual problem wherein the data or the observations are not known in advance and arrive sequentially in time. The challenge is to keep updating the optimal estimates as the new observations arrive on the scene. A naive way would be to repeatedly solve a sequence of least squares problems after the arrival of every new observation using the methods described in Chapters 5 through 7. A little reflection will, however, reveal that this is inefficient and computationally very expensive. The real question is: knowing the optimal estimate x(m) based on the m samples, can we compute x(m + 1), the optimal estimate for (m + 1) samples, recursively by computing an increment or a correction to x(m) that reflects the new information contained in the new (m + 1)th observation? The answer is indeed “yes”, and leads to the sequential or recursive method for least squares estimation which is the subject of this chapter.
Section 8.1 provides an introduction to the deterministic recursive linear least squares estimation.
A recursive framework
Let x ∈ ℝn denote the state of the system under observation where n is fixed.