Recall from Chapters 5 and 6 that the optimal linear estimate x is given by the solution of the normal equations
(HᵀH)x = Hᵀz when m > n
and
(HHᵀ)y = z with x = Hᵀy when m < n,
where H ∈ ℝm×n is of full rank. In either case the Grammian (HᵀH ∈ ℝn×n or HHᵀ ∈ ℝm×m) is a symmetric and positive definite matrix. In the opening Section 9.1, we describe the classical Cholesky decomposition algorithm for solving linear systems with symmetric and positive definite matrices. This algorithm is essentially an adaptation of the method of LU decomposition for general matrices. Solving the normal equations by the Cholesky decomposition is computationally very efficient, but it may exhibit instability resulting from finite precision arithmetic. To alleviate this problem, a new class of methods based directly on the orthogonal decomposition of the (rectangular) measurement matrix H was developed during the 1960s. In this chapter we describe two such methods: the first is based on the QR-decomposition (Section 9.2), and the second on the singular value decomposition (SVD) (Section 9.3). Section 9.4 compares the amount of work, measured in the number of floating point operations (FLOPs), required to solve the linear least squares problem by these methods.
Cholesky decomposition
We begin by describing the classical LU-decomposition.
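To make the procedure concrete, the Cholesky route through the m > n normal equations can be sketched as follows (a minimal Python sketch assuming NumPy; the function name and the line-fitting example are illustrative, not from the text):

```python
import numpy as np

def solve_normal_equations(H, z):
    """Solve (H^T H) x = H^T z via Cholesky decomposition, assuming
    H (m x n, m > n) has full column rank so H^T H is symmetric
    positive definite."""
    A = H.T @ H                     # the n x n Grammian
    b = H.T @ z
    L = np.linalg.cholesky(A)       # A = L L^T, L lower triangular
    y = np.linalg.solve(L, b)       # forward substitution: L y = b
    return np.linalg.solve(L.T, y)  # back substitution: L^T x = y

# overdetermined example: fit x(t) = c0 + c1*t to exact data
t = np.linspace(0.0, 1.0, 20)
H = np.column_stack([np.ones_like(t), t])
z = 1.0 + 2.0 * t
x = solve_normal_equations(H, z)   # recovers [1.0, 2.0] up to rounding
```

Because the Grammian is formed explicitly, the condition number of HᵀH is the square of that of H, which is the source of the instability mentioned above.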
This chapter develops the solution to the retrieval problem (stated in Chapter 18) using the Bayesian framework and draws heavily from Part III (especially Chapters 16 and 17). The method based on this framework has also come to be known as the three-dimensional variational method – 3DVAR for short. This class of methods is becoming the industry standard at operational weather prediction centers around the world (Parrish and Derber (1992), Lorenc (1995), Gauthier et al. (1996), Cohn et al. (1998), Rabier et al. (1998), Andersson et al. (1998)). This global method does not require any of the data selection strategies on which local methods depend.
From the algorithmic perspective, there are two approaches to this problem – working in the model space (ℝn) or in the observation space (ℝm) (refer to Chapter 17). While these two approaches are logically equivalent, the computational advantage lies with the model space approach when n < m, and with the observation space approach when n > m.
In Section 20.1, we derive the Bayesian framework for the problem. The straightforward solution for the special case when the forward operator is linear is covered in Section 20.2. Section 20.3 then brings out the duality between the model space and the observation space formulations using the notion of preconditioning. The general nonlinear case is treated in the next two sections, with the second-order method in Section 20.4 and the first-order method in Section 20.5. The first-order method for the nonlinear case closely resembles the linear formulation.
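For the linear case the duality can be illustrated numerically (a Python sketch assuming NumPy; the covariances B and R, the operator H, and the toy dimensions are illustrative choices, not from the text): the model-space form solves an n × n system, the observation-space form an m × m system, and the two yield the same analysis.

```python
import numpy as np

# synthetic linear setup: analysis of background xb against observations z
rng = np.random.default_rng(1)
n, m = 4, 6                         # state and observation dimensions
H = rng.standard_normal((m, n))     # linear forward operator
B = 0.5 * np.eye(n)                 # background error covariance
R = 0.1 * np.eye(m)                 # observation error covariance
xb = rng.standard_normal(n)         # background (prior) state
z = rng.standard_normal(m)          # observations
d = z - H @ xb                      # innovation vector

# model-space form: solve an n x n system
x_model = xb + np.linalg.solve(
    np.linalg.inv(B) + H.T @ np.linalg.solve(R, H),
    H.T @ np.linalg.solve(R, d))

# observation-space form: solve an m x m system
x_obs = xb + B @ H.T @ np.linalg.solve(H @ B @ H.T + R, d)

# the two forms agree, by the Sherman-Morrison-Woodbury identity
```

Choosing whichever system is smaller (n × n vs. m × m) is precisely the computational advantage discussed above.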
In the opening chapter of Part VI we considered a very special dynamical model for pedagogical reasons. Having gained some working knowledge of the methodology for solving the inverse problem using the Lagrangian framework, we now consider the general linear dynamical system. Once we understand the underpinnings of this methodology in the context of a general linear dynamical system, we can apply it to a wide variety of linear models.
When compared to Chapter 22, the contents of this chapter are a generalization in one sense and a specialization in another. The generalization comes from the fact that we consider the generic linear dynamical system where the state variables are vectors instead of scalars. The specialization, on the other hand, comes from the fact that we consider only the problem of estimating the initial condition, rather than an initial condition and a parameter (x0 and α in the straight line problem).
It could be argued that since few models of interest in real world applications are linear, this chapter's value is essentially academic. While this argument carries some weight, it should be recognized that linear analysis has a fundamental role to play in the development of the adjoint method for nonlinear dynamical systems. For example, one standard approach to nonlinear system analysis is the so-called perturbation method, in which the nonlinear problem is reduced to a local analysis of an associated linear system. We also want to demonstrate that the data assimilation problem is intrinsically challenging, even when the system is governed by linear dynamics and the observations are linear functions of the state variables.
In Chapters 22–25 we discussed the solution to the off-line, 4DVAR problem of assimilating a given set of observations into deterministic/dynamic models using the classical least squares method (Part II). In this framework, the adjoint method facilitates the computation of the gradient of the least squares objective function, which, when used in conjunction with the minimization methods described in Part III, leads to the optimal initial conditions for the dynamic model. Even in the ideal case of a perfect (error free) dynamic model, the computed values of the optimal initial condition are noisy in response to erroneous observations. The deterministic approaches in Chapters 22–25 are predicated on the assumption that the statistical properties of the noise corrupting the observations are not known a priori. The question is: if we are given additional information, say the second-order properties (mean and covariance) of the noisy observations, how can we use this information to derive the second-order properties of the optimal initial conditions? This can only be achieved by relying on the statistical least squares method described in Chapter 14.
The goal of this chapter is twofold. The first is to apply the statistical least squares method of Chapter 14. More specifically, we derive explicit expressions for the unbiased, optimal (least squares) estimate of the initial condition and its covariance when the model is linear and the observations are a linear function of the state.
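As a rough illustration of the kind of expressions involved, here is a NumPy sketch of the statistical least squares (Gauss–Markov) estimate and its covariance for a linear observation model z = Hx + v with zero-mean noise of covariance R; the matrix sizes and the names R, P, x_hat are illustrative stand-ins, not from the text:

```python
import numpy as np

# linear observation model z = H x + v, with E[v] = 0 and Cov[v] = R
rng = np.random.default_rng(3)
x_true = np.array([1.0, -2.0])
H = rng.standard_normal((50, 2))
R = 0.1 * np.eye(50)
z = H @ x_true + rng.multivariate_normal(np.zeros(50), R)

# unbiased minimum variance (Gauss-Markov) estimate and its covariance
P = np.linalg.inv(H.T @ np.linalg.solve(R, H))  # covariance of the estimate
x_hat = P @ H.T @ np.linalg.solve(R, z)
```

The point of the second-order information is visible here: the same matrix HᵀR⁻¹H that determines the estimate also determines its covariance P.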
In this chapter we provide an overview of the role and use of spatial digital filters in solving the retrieval problem of interest in Part V. The chapter begins with a classification of filters in Section 21.1. Nonrecursive filters are covered in Section 21.2, and a detailed account of recursive filters and their use is given in Section 21.3.
Filters: A Classification
The word filter in spatial digital filter is used in the same (functional) sense as in coffee filter, water filter, or filter lenses, to name a few – that is, to prevent the passage of some unwanted component: coffee sediments from the brew, impurities from the drinking water, light of a particular wavelength or color. Spatial filters are designed to prevent the passage of signal components of a specified frequency or wavelength. For example, the low-pass filter is designed to suppress, or filter out, high-frequency (short-wavelength) signals. There is a vast corpus of literature dealing with filters in general. In this section we provide a useful classification of these filters.
Filters can be classified along at least five different dimensions, depending on the type of signal and the properties of the filter. Refer to Figure 21.1.1. Signals to be filtered can be in analog or digital form, and can be functions of time and/or space. For example, in time series modelling we deal with digital signals in discrete time, whereas analog computers use continuous-time signals.
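As a small illustration of a nonrecursive low-pass filter in the digital, discrete-space setting, the following Python sketch (assuming NumPy; the weights (1/4, 1/2, 1/4) and the toy signal are illustrative choices) removes a two-grid-interval wave while largely preserving a long wave:

```python
import numpy as np

# a long wave plus a two-grid-interval wave (sign alternates each point)
x = np.arange(100)
long_wave = np.sin(2.0 * np.pi * x / 50.0)
short_wave = 0.5 * (-1.0) ** x
signal = long_wave + short_wave

# three-point nonrecursive low-pass filter with weights (1/4, 1/2, 1/4);
# its frequency response is zero at the two-grid-interval wavelength
weights = np.array([0.25, 0.5, 0.25])
filtered = np.convolve(signal, weights, mode="same")
# away from the boundaries, only the (slightly damped) long wave survives
```

The filter is "nonrecursive" because each output value depends only on a fixed window of input values, never on previously computed outputs.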
In this opening chapter of Part VI, we introduce the basic principles of data assimilation using the now classical Lagrangian framework. This is done using a very simple dynamical system representing a particle moving in a straight line at a constant velocity, and hence the title “straight line problem”.
In Section 22.1 the statement of the problem is given. By reformulating this problem as one of fitting a straight line to data, we compute the required solution in closed form in Section 22.2. This solution is used as a benchmark against which the basic iterative scheme for data assimilation is compared. A first introduction to the iterative algorithm for data assimilation (which has come to be known as the adjoint method) is derived in Section 22.3. Section 22.4 describes a practical method for experimental analysis of this class of algorithms based on the notion of Monte Carlo type twin experiments.
A statement of the inverse problem
We begin by describing a common physical phenomenon of interest in everyday life. A particle is observed to be moving in a straight line at a constant velocity, say, α > 0. The problem is to model the dynamics of motion of this particle and then predict its position at a future instant in time, say t > 0.
Let x(t) ∈ ℝ denote the state representing the position of the particle at time t ≥ 0, where x(t0) = x0 is the initial state. Refer to Figure 22.1.1.
In this setup, the dynamics of motion of the particle can be adequately described by the first-order equation dx/dt = α with initial condition x(t0) = x0, whose solution is x(t) = x0 + α(t − t0).
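The closed-form fit, in the style of the twin experiments of Section 22.4, can be sketched as follows (a Python sketch assuming NumPy; the noise level, observation times, and variable names are illustrative): generate synthetic observations from known x0 and α, corrupt them with noise, then recover both by linear least squares.

```python
import numpy as np

# generate synthetic ("true") data, corrupt it with noise, then recover
# the initial condition and velocity by linear least squares
rng = np.random.default_rng(2)
x0_true, alpha_true = 1.0, 0.5
t = np.arange(0.0, 10.0)                       # observation times
z = x0_true + alpha_true * t + 0.05 * rng.standard_normal(t.size)

# x(t) = x0 + alpha*t is linear in (x0, alpha): solve H c = z
H = np.column_stack([np.ones_like(t), t])
(x0_hat, alpha_hat), *_ = np.linalg.lstsq(H, z, rcond=None)
```

Because the truth is known by construction, the recovered (x0_hat, alpha_hat) can be compared directly against (x0_true, alpha_true), which is the essence of a twin experiment.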
This opening chapter begins with a discussion of the role of dynamic data assimilation in applied sciences. After a brief review of the models and observations, we describe four basic forms – based on the model characteristics: static vs. dynamic and deterministic vs. stochastic. We then describe two related problems – analysis of sensitivity and predictability of the models. In the process, two basic goals are achieved: (a) we introduce the mathematical notation and concepts and (b) we provide a top-down view of data assimilation with pointers to various parts of the book where the basic forms and methodology for solving the problems are found.
Forecast: justification for data assimilation
It is the desire to forecast, to predict with accuracy, that demands a strategy to meld observations with a model, a coupling that we call data assimilation. At the fountainhead of data assimilation is Carl Friedrich Gauss and his prediction of the reappearance of Ceres, a planetoid that disappeared behind the Sun in 1801, only to reappear a year later. And in a tour de force, the likes of which have rarely been seen in science, Gauss told astronomers where to point their telescopes to locate the wanderer. In the process, he introduced the method of least squares, the foundation of data assimilation. We briefly explore these historical aspects of data assimilation in Chapter 4.
Prediction came into prominence in the early seventeenth century with Johannes Kepler's establishment of the three laws of planetary motion.
The major impetus for the development of conjugate direction/gradient methods stems from a weakness of the steepest descent method (Chapter 10). Recall that while the search directions – the negatives of the gradient of the function being minimized – can be computed rather easily, the convergence of the steepest descent method can be annoyingly slow. This is often exhibited by the zig-zag or oscillatory behavior of the iterates: to use an analogy, there is a lot of talk with very little substance. The net force that drives the iterates towards the minimum becomes very weak as the problem becomes progressively ill-conditioned (see Remark 10.3.1). This undesirable behavior is largely a result of the absence of transitivity of the orthogonality of successive search directions (Exercise 10.5). Consequently the iterates are caged in a smaller (two-dimensional) subspace, and the method is unable to exploit the full n degrees of freedom at our disposal. The conjugate direction method was designed to remedy this situation by requiring that successive search directions be mutually A-Conjugate (Exercise 10.6). A-Conjugacy is a natural extension of classical orthogonality: it can be shown that if a set of vectors is A-Conjugate, then it is also linearly independent. Thus, as the iteration proceeds, the conjugate direction/gradient method guarantees that the iterates minimize the given function over subspaces of increasing dimension. It is this expanding subspace property, a hallmark of the method, that guarantees convergence in at most n steps provided the arithmetic is exact. The conjugate gradient (CG) method is a special case of the conjugate direction (CD) method in which the mutually A-Conjugate directions are derived recursively using the gradient of the function being minimized.
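A minimal Python sketch of the CG iteration for minimizing f(x) = ½xᵀAx − bᵀx with A symmetric positive definite (assuming NumPy; the helper name and the small test system are illustrative, not from the text):

```python
import numpy as np

def conjugate_gradient(A, b, x0, tol=1e-10):
    """Minimize f(x) = 0.5 x^T A x - b^T x for symmetric positive
    definite A; in exact arithmetic the iterates converge in at
    most n steps."""
    x = x0.copy()
    r = b - A @ x                 # residual = negative gradient of f
    p = r.copy()                  # first direction: steepest descent
    for _ in range(len(b)):
        if np.linalg.norm(r) < tol:
            break
        Ap = A @ p
        alpha = (r @ r) / (p @ Ap)          # exact line search along p
        x = x + alpha * p
        r_new = r - alpha * Ap
        beta = (r_new @ r_new) / (r @ r)    # keeps directions A-conjugate
        p = r_new + beta * p
        r = r_new
    return x

A = np.array([[4.0, 1.0], [1.0, 3.0]])      # small SPD test system
b = np.array([1.0, 2.0])
x = conjugate_gradient(A, b, np.zeros(2))   # solves A x = b in 2 steps
```

Note how each new direction p is built from the current residual plus a multiple of the previous direction; this single extra term is what enforces A-conjugacy and the expanding subspace property.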
With the advent of numerical weather prediction (NWP) in the post-WWII period of the 1950s, it became clear that numerical weather map analysis was a necessary adjunct to the prediction. Whereas a subjective hand analysis of weather data was the standard through the mid-twentieth century, the time constraints of operational NWP made it advisable to analyze the variables on a network of grid points in an objective fashion. It is worth mentioning, however, that the limited availability of computers in the early 1950s led the Norwegian weather service to prepare initial data for the one-level barotropic forecast by hand. In fact, the forecast (a 24-hour prediction) was also prepared graphically by methods that were akin to those discussed in Chapter 2 (Section 2.6). These forecasts compared favorably with the computer-generated results in the USA and Sweden. Nevertheless, by the mid-1950s, objective data assimilation via computer was the rule. We review several of these early methods of data assimilation.
In Section 19.1 we provide a brief review of the polynomial approximation method with a discussion of two ways of handling the balance constraints – as strong and as weak constraints. An algorithm based on Tikhonov regularization is described in Section 19.2. A class of iterative techniques known as successive correction methods (SCM) is reviewed in Section 19.3, along with the convergence properties of these schemes. The concluding Section 19.4 derives the mechanics of the optimum interpolation method and its relation to SCM.
In this chapter our aim is to provide an introduction to the nonlinear least squares problem. In practice many of the problems of interest are nonlinear in nature. These include several problems of interest in radar and satellite meteorology, exploration problems in geology, and tomography, to mention a few. In Section 7.1, we describe the first-order method which in many ways is a direct extension of the ideas developed in Chapters 5 and 6. This method is based on a classical idea from numerical analysis – replacing h(x) by its linear approximation at a given operating point xc, and solving a linear problem to obtain a new operating point, xnew, which is closer to the target state, x, than the original starting point. By repeatedly applying this idea, we can get as close to the target state as needed. The second-order counterpart of this idea is to replace h(x) by its quadratic approximation and, except for a few algebraic details, this method essentially follows the above iterative paradigm. This second-order method is described in Section 7.2.
A first-order method
Let x ∈ ℝn denote the state of a system under observation. Let z ∈ ℝm denote a set of observables which depend on the state x, and let z = h(x) be a representation of the physical laws that relate the underlying state to the observables – temperature to the radiated energy measured by a satellite, or rain to the reflectivity measured by a Doppler radar – where h(x) = (h1(x), h2(x), …, hm(x))T is an m-vector-valued function of the vector x, and hi : ℝn → ℝ is the scalar-valued function that is the ith component of h(x), for i = 1, 2, …, m.
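The first-order idea described above – linearize h at the current operating point and solve the resulting linear problem for the next point – can be sketched as follows (Python with NumPy; the callables h and jacobian, the iteration count, and the toy operator are illustrative assumptions, not from the text):

```python
import numpy as np

def first_order_iteration(h, jacobian, z, x0, n_iter=20):
    """Repeatedly linearize h about the current operating point and
    solve the resulting linear least squares problem for the update."""
    x = x0.copy()
    for _ in range(n_iter):
        r = z - h(x)            # residual at the current operating point
        J = jacobian(x)         # m x n Jacobian of h at x
        dx, *_ = np.linalg.lstsq(J, r, rcond=None)
        x = x + dx              # new operating point, closer to the target
    return x

# toy nonlinear observation operator h(x) = (x1^2, x1*x2)
h = lambda x: np.array([x[0] ** 2, x[0] * x[1]])
jac = lambda x: np.array([[2.0 * x[0], 0.0], [x[1], x[0]]])
z = np.array([4.0, 6.0])        # consistent with the target state (2, 3)
x = first_order_iteration(h, jac, z, np.array([1.0, 1.0]))
```

In this toy run the iterates move from the starting point (1, 1) toward the target state (2, 3), illustrating the repeated relinearization paradigm.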
In Chapters 14 through 16, we concentrated on the basic optimality of estimators derived using different philosophies – least sum of squared errors and minimum variance estimates (Chapter 14), maximum likelihood estimates (Chapter 15), and optimality with respect to several key parameters of the posterior distribution, including the conditional mean, mode, and median (Chapter 16). In this concluding chapter of Part IV, we turn to analyzing the structure of a certain class of optimal estimates. For example, we know only that the conditional mean of the posterior distribution is a minimum variance estimate; this mean, in general, could be a nonlinear function of the observations z. This observation brings us to the following structural question: when is a linear function of the observations optimal? Understanding the structural properties of an estimator is extremely important and is a major determinant in evaluating the computational feasibility of these estimates.
In Section 17.1 we derive conditions under which a linear function of the observations defines a minimum variance estimate. We then extend this analysis in Section 17.2 to the sequential framework, where it is assumed that we have two pieces of information about the unknown: (a) an a priori estimate x− and its associated covariance matrix Σ−, and (b) a new observation z and its covariance matrix Σv. We derive conditions under which a linear function of x− and z leads to a minimum variance estimate x+ of the unknown x.
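For intuition, here is a NumPy sketch of the resulting linear blend of a prior with a new observation (the function name linear_blend and the scalar example are illustrative; S_prior and S_v stand in for the prior and observation covariances):

```python
import numpy as np

def linear_blend(x_prior, S_prior, z, H, S_v):
    """Blend a prior estimate (x_prior, S_prior) with a new observation
    (z, S_v) linearly, using the minimum variance weighting."""
    innovation = z - H @ x_prior
    S = H @ S_prior @ H.T + S_v              # innovation covariance
    K = S_prior @ H.T @ np.linalg.inv(S)     # optimal linear weights
    x_post = x_prior + K @ innovation
    S_post = (np.eye(len(x_prior)) - K @ H) @ S_prior
    return x_post, S_post

# scalar example: two equally uncertain measurements of one quantity;
# the blend averages them and halves the variance
x_post, S_post = linear_blend(np.array([1.0]), np.array([[1.0]]),
                              np.array([2.0]), np.array([[1.0]]),
                              np.array([[1.0]]))
```

The weight matrix K is large when the prior is uncertain relative to the observation and small in the opposite case, which is exactly the behavior one wants from a minimum variance combination.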