Generating the adjoint model (ADJM) by hand is tedious, time-consuming, and error prone. In most practical applications of data assimilation today, derivative codes, including the ADJM, are generated by automatic differentiation (AD) tools, which evaluate exact derivatives of a function expressed as a computer program. The terminology and methods of AD are introduced, including the practical execution of the forward and reverse modes of differentiation. Various AD tools based on the two major AD approaches, source transformation and operator overloading, are compiled along with their web pages.
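The operator-overloading approach mentioned above can be illustrated with a minimal forward-mode sketch using dual numbers; this toy class is not from any AD tool named in the chapter, and only multiplication and addition are overloaded:

```python
# Minimal forward-mode AD via dual numbers (illustrative sketch only).
class Dual:
    def __init__(self, val, dot):
        self.val, self.dot = val, dot  # function value and derivative

    def __mul__(self, other):
        # Product rule: (uv)' = u'v + uv'
        return Dual(self.val * other.val,
                    self.dot * other.val + self.val * other.dot)

    def __add__(self, other):
        return Dual(self.val + other.val, self.dot + other.dot)

def f(x):
    return x * x + x          # f(x) = x^2 + x, so f'(x) = 2x + 1

x = Dual(3.0, 1.0)            # seed the derivative dx/dx = 1
y = f(x)
print(y.val, y.dot)           # 12.0 7.0
```

Running the unmodified program `f` on `Dual` inputs propagates derivatives alongside values, which is the essence of the operator-overloading approach; reverse mode instead records the operations and sweeps backward through them.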
Data assimilation is theoretically founded on probability, statistics, control theory, information theory, linear algebra, and functional analysis. At the same time, data assimilation is a very practical subject, given its goal of estimating the posterior probability density function in realistic high-dimensional applications. This puts data assimilation at the intersection between the contrasting requirements of theory and practice. Based on over twenty years of teaching courses in data assimilation, Principles of Data Assimilation introduces a unique perspective that is firmly based on mathematical theories, but also acknowledges the practical limitations of the theory. With the inclusion of numerous examples and practical case studies throughout, this new perspective will help students and researchers to competently interpret data assimilation results and to identify critical challenges of developing data assimilation algorithms. The use of information theory also introduces new pathways for the further development, understanding, and improvement of data assimilation methods.
This chapter gives an introduction to extreme value theory. Unlike most statistical analyses, which are concerned with the typical properties of a random variable, extreme value theory is concerned with rare events that occur in the tail of the distribution. The cornerstone of extreme value theory is the Extremal Types Theorem. This theorem states that the maximum of N independent and identically distributed random variables can converge, after suitable normalization, only to a single distribution in the limit of large N. This limiting distribution is called the Generalized Extreme Value (GEV) distribution. The theorem is analogous to the central limit theorem, except that the focus is on the maximum rather than the sum of random variables. The GEV provides the basis for estimating the probability of events more extreme than any that occurred in a sample. The GEV is characterized by three parameters, called the location, scale, and shape. A procedure called the maximum likelihood method can be used to estimate these parameters, quantify their uncertainty, and account for dependencies on time or external environmental conditions.
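The maximum likelihood fit of the three GEV parameters can be sketched with `scipy.stats.genextreme`; the simulated "annual maxima" below are illustrative, and note that scipy's shape parameter is the negative of the shape parameter in the common climatological convention:

```python
import numpy as np
from scipy import stats

# Simulate 50 years of annual maxima, each the max of 365 daily values
# (synthetic data for illustration only).
rng = np.random.default_rng(0)
annual_max = rng.normal(size=(50, 365)).max(axis=1)

# Fit the GEV location, scale, and shape by maximum likelihood.
c, loc, scale = stats.genextreme.fit(annual_max)

# 1-in-100 "return level": the value exceeded with probability 1/100,
# i.e., an extreme beyond most of the observed sample.
x100 = stats.genextreme.isf(0.01, c, loc=loc, scale=scale)
print(c, loc, scale, x100)
```

The fitted distribution can then be queried for exceedance probabilities of values more extreme than any in the sample, which is exactly the use case described above.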
The correlation test is a standard procedure for deciding if two variables are linearly related. This chapter discusses a test for independence that avoids the linearity assumption. The basic idea is the following. If two variables are dependent, then changing the value of one of them, say c, changes the distribution of the other. Therefore, if samples are collected for a fixed value of c, and additional samples are collected for a different value of c, and so on for different values of c, then a dependence implies that the distributions for different c’s should differ. It follows that deciding that some aspect of the distributions depends on c is equivalent to deciding that the variables are dependent. A special case of this approach is the t-test, which tests if two populations have identical means. Generalizing this test to more than two populations leads to Analysis of Variance (ANOVA), which is the topic of this chapter. ANOVA is a method for testing if two or more populations have the same means. In weather and climate studies, ANOVA is used most often to quantify the predictability of an ensemble forecast, hence this framing is discussed extensively in this chapter.
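A one-way ANOVA of the kind described above can be sketched with `scipy.stats.f_oneway`; the three groups below play the role of samples collected at three values of c, and the data are synthetic, with one group given a shifted mean so the test should detect a difference:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Samples at three values of c (illustrative data, not from the chapter);
# the second group has a shifted mean.
g1 = rng.normal(0.0, 1.0, size=40)
g2 = rng.normal(1.5, 1.0, size=40)
g3 = rng.normal(0.0, 1.0, size=40)

# Test the null hypothesis that all three populations have the same mean.
f_stat, p_value = stats.f_oneway(g1, g2, g3)
print(f_stat, p_value)
```

A small p-value leads to rejecting equality of means, and hence, in the framing above, to concluding that the variable depends on c.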
The previous chapter considered the following problem: given a distribution, deduce the characteristics of samples drawn from that distribution. This chapter goes in the opposite direction: given a random sample, infer the distribution from which the sample was drawn. It is impossible to infer the distribution exactly from a finite sample. Our strategy is more limited: we propose a hypothesis about the distribution, then decide whether or not to accept the hypothesis based on the sample. Such procedures are called hypothesis tests. In each test, a decision rule for deciding whether to accept or reject the hypothesis is formulated. The probability that the rule gives the wrong decision when the hypothesis is true leads to the concept of a significance level. In climate studies, the most common questions addressed by hypothesis tests are whether two random variables (1) have the same mean, (2) have the same variance, or (3) are independent. This chapter discusses the corresponding tests for normal distributions, called the (1) t-test (or difference-in-means test), (2) F-test (or difference-in-variance test), and (3) correlation test.
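The three tests listed above can be sketched as follows with synthetic data; `scipy` provides the t-test and correlation test directly, while the variance-ratio F-test is computed by hand from the F distribution (an assumption of this sketch, since scipy has no one-call variance F-test):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
x = rng.normal(0.0, 1.0, size=100)   # illustrative samples
y = rng.normal(0.0, 1.0, size=100)

# (1) difference-in-means: two-sample t-test
t_stat, p_t = stats.ttest_ind(x, y)

# (2) difference-in-variance: two-sided F-test on the variance ratio
f_stat = x.var(ddof=1) / y.var(ddof=1)
p_f = 2 * min(stats.f.cdf(f_stat, 99, 99), stats.f.sf(f_stat, 99, 99))

# (3) independence (under normality): correlation test
r, p_r = stats.pearsonr(x, y)
print(p_t, p_f, p_r)
```

Since both samples here come from the same normal distribution, none of the three null hypotheses should typically be rejected.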
This chapter reviews some essential concepts of probability and statistics, including: line plots, histograms, scatter plots, mean, median, quantiles, variance, random variables, the probability density function, the expectation of a random variable, covariance and correlation, independence, the normal distribution (also known as the Gaussian distribution), and the chi-square distribution. These concepts provide the foundation for the statistical methods discussed in the rest of this book.
Field significance is concerned with testing a large number of hypotheses simultaneously. Previous chapters have discussed methods for testing one hypothesis, such as whether one variable is correlated with one other variable. Field significance is concerned with whether one variable is related to a random vector. In climate applications, a characteristic feature of field significance problems is that the variables in the random vector correspond to quantities at different geographic locations. As such, neighboring variables are correlated and therefore exhibit spatial dependence. This spatial dependence needs to be taken into account when testing hypotheses. This chapter introduces the concept of field significance and explains three hypothesis test procedures: a Monte Carlo method proposed by Livezey and Chen (1983) and an associated permutation test, a regression method proposed by DelSole and Yang (2011), and a procedure to control the false discovery rate, proposed in a general context by Benjamini and Hochberg (1995) and applied to field significance problems by Ventura et al. (2004) and Wilks (2006).
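The Benjamini–Hochberg false discovery rate procedure mentioned above is short enough to sketch directly; the p-values below are illustrative:

```python
import numpy as np

def benjamini_hochberg(p_values, q=0.05):
    """Return a boolean mask of rejected hypotheses, controlling the
    false discovery rate at level q (Benjamini and Hochberg, 1995)."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    sorted_p = p[order]
    # Find the largest k such that p_(k) <= (k/m) * q ...
    below = sorted_p <= (np.arange(1, m + 1) / m) * q
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])
        # ... and reject the k hypotheses with the smallest p-values.
        reject[order[: k + 1]] = True
    return reject

pvals = [0.001, 0.008, 0.039, 0.041, 0.30, 0.76]
print(benjamini_hochberg(pvals, q=0.05))  # [ True  True False False False False]
```

In a field significance setting, `pvals` would hold the local p-values at each grid point, and the mask identifies locations with "significant" results while controlling the expected fraction of false discoveries.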
The previous chapter discussed Analysis of Variance (ANOVA), a procedure for deciding if populations have identical scalar means. This chapter discusses the generalization of this test to vector means, which is called Multivariate Analysis of Variance, or MANOVA. MANOVA can detect predictability of random vectors and decompose a random vector into a sum of components ordered such that the first maximizes predictability, the second maximizes predictability subject to being uncorrelated with the first, and so on. This decomposition is called Predictable Component Analysis (PrCA) or signal-to-noise maximizing EOF analysis. A slight modification of this procedure can decompose forecast skill. The connection between PrCA, Canonical Correlation Analysis, and Multivariate Regression is reviewed. In typical climate studies, the dimension of the random vector exceeds the number of samples, leading to an ill-posed problem. The standard approach to this problem is to apply PrCA on a small number of principal components. The problem of selecting the number of principal components can be framed as a model selection problem in regression.
Many scientific questions lead to hypotheses about random vectors. For instance, the question of whether global warming has occurred over a geographic region is a question about whether temperature has changed at each spatial location within the region. One approach to addressing such a question is to apply a univariate test to each location separately and then use the results collectively to make a decision. This approach is called multiple testing or multiple comparisons and is common in genomics for analyzing gene expressions. The disadvantage of this approach is that it does not fully account for correlation between variables. Multivariate techniques provide a framework for hypothesis testing that takes into account correlations between variables. Although multivariate tests are more comprehensive, they require estimating more parameters and therefore have low power when the number of variables is large. Multivariate statistical analysis draws heavily on linear algebra and includes a generalization of the normal distribution, called the multivariate normal distribution, whose population parameters are the mean vector and the covariance matrix.
Climate data are correlated over short spatial and temporal scales. For instance, today’s weather tends to be correlated with tomorrow’s weather, and weather in one city tends to be correlated with weather in a neighboring city. Such correlations imply that weather events are not independent. This chapter discusses an approach to accounting for spatial and temporal dependencies based on stochastic processes. A stochastic process is a collection of random variables indexed by a parameter, such as time or space. A stochastic process is described by the moments at a single time (e.g., mean and variance), and also by the degree of dependence between two times, often measured by the autocorrelation function. This chapter presents these concepts and discusses common mathematical models for generating stochastic processes, especially autoregressive models. The focus of this chapter is on developing the language for describing stochastic processes. Challenges in estimating parameters and testing hypotheses about stochastic processes are discussed.
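The autoregressive models mentioned above can be sketched with a first-order example; the coefficient and sample size below are illustrative. An AR(1) process x(t) = phi * x(t-1) + noise has lag-1 autocorrelation equal to phi, which the simulation recovers:

```python
import numpy as np

rng = np.random.default_rng(3)
phi = 0.8           # AR(1) coefficient (illustrative value)
n = 20000
x = np.zeros(n)
for t in range(1, n):
    # Each value depends on the previous one plus independent noise.
    x[t] = phi * x[t - 1] + rng.normal()

# The sample lag-1 autocorrelation should be close to phi.
r1 = np.corrcoef(x[:-1], x[1:])[0, 1]
print(r1)
```

The decay of the autocorrelation function with lag (phi, phi^2, phi^3, ...) is what quantifies the temporal dependence discussed in this chapter.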
This chapter discusses a procedure for quantifying differences between two covariance matrices. Despite being applicable to a range of statistical problems, the general procedure has no standard name. In this chapter, we call it Covariance Discriminant Analysis (CDA). CDA finds the linear combination of variables that maximizes the ratio of variances. More generally, CDA decomposes two multivariate time series, separately, into components ordered such that the variance ratio of the first component is maximized, and each succeeding component maximizes the variance ratio under the constraint that it is uncorrelated with the preceding components. This technique is used in numerous other multivariate techniques, including canonical correlation analysis, predictable component analysis, and multivariate ANOVA. CDA also is used to identify low-frequency components that maximize the ratio of low-frequency to high-frequency variance. To mitigate overfitting, the standard approach is to apply CDA to a few principal components. No standard criterion exists for choosing the number of principal components. A new criterion is proposed in this chapter.
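Maximizing a ratio of variances, as in the CDA procedure described above, reduces to a generalized eigenvalue problem; a minimal sketch with synthetic data (the dimensions and inflation factor are illustrative, not from the chapter) is:

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(8)
# Two synthetic data sets; the first has inflated variance in one direction.
X1 = rng.normal(size=(200, 3))
X1[:, 0] *= 3.0
X2 = rng.normal(size=(200, 3))
C1 = np.cov(X1, rowvar=False)
C2 = np.cov(X2, rowvar=False)

# Generalized eigenvalue problem C1 w = lambda C2 w: the leading
# eigenvector maximizes the variance ratio w'C1w / w'C2w.
ratios, W = eigh(C1, C2)        # eigenvalues returned in ascending order
print(ratios[-1])               # largest variance ratio
```

Succeeding eigenvectors maximize the ratio subject to the uncorrelatedness constraints described above, giving the full ordered decomposition.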
The hypothesis tests discussed in the previous chapters are parametric. That is, the procedures assume samples come from a prescribed family of distributions, leaving only the parameters of the distribution open to question. For instance, a univariate Gaussian distribution is characterized by two parameters, the mean and variance, and hypotheses are expressed in terms of those parameters. This chapter discusses a class of procedures called nonparametric statistics, or distribution-free methods, that make fewer assumptions. For some hypotheses, nonparametric tests are almost as powerful as parametric tests, hence some statisticians recommend nonparametric methods as a first choice. This chapter discusses the following nonparametric tests: the Wilcoxon rank-sum test (a nonparametric version of the t-test), the Kruskal-Wallis test (a nonparametric version of Analysis of Variance), a median-based nonparametric version of the F-test, and Spearman’s rank correlation (a nonparametric version of the correlation test). This chapter assumes familiarity with hypothesis tests, particularly the concepts of null hypothesis, decision rule, and significance level.
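Several of the nonparametric tests listed above are available in `scipy.stats`; a sketch on synthetic data (the shifts and sample sizes are illustrative) is:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
x = rng.normal(0.0, 1.0, size=30)   # illustrative samples
y = rng.normal(0.7, 1.0, size=30)
z = rng.normal(0.0, 1.0, size=30)

# Wilcoxon rank-sum test: nonparametric analogue of the t-test.
u_stat, p_ranksum = stats.ranksums(x, y)

# Kruskal-Wallis test: nonparametric analogue of one-way ANOVA.
h_stat, p_kw = stats.kruskal(x, y, z)

# Spearman rank correlation: nonparametric analogue of the correlation test.
rho, p_spear = stats.spearmanr(x, y)
print(p_ranksum, p_kw, p_spear)
```

Because these tests operate on ranks rather than raw values, they do not require the samples to come from normal distributions.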
A goal in statistics is to make inferences about a population. Typically, such inferences are in the form of estimates of population parameters; for instance, the mean and variance of a normal distribution. Estimates of population parameters are imperfect because they are based on a finite amount of data. The uncertainty in a parameter estimate may be quantified using a confidence interval. A confidence interval is a random interval that encloses the population value with a specified probability. Confidence intervals are related to hypothesis tests about population parameters. Specifically, for a given hypothesis about the value of a parameter, a test at the 5% significance level would reject that value if and only if the 95% confidence interval did not contain it. This chapter explains how to construct a confidence interval for a difference in means, a ratio of variances, and a correlation coefficient. These confidence intervals assume the samples come from normal distributions. If the distribution is not Gaussian, or the quantity being inferred is complicated, then bootstrap methods offer an important alternative approach, as discussed at the end of this chapter.
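The bootstrap alternative mentioned above can be sketched for a correlation coefficient; the data, replicate count, and percentile method below are illustrative choices, not the chapter's specific prescriptions:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 200
x = rng.normal(size=n)
y = 0.5 * x + rng.normal(size=n)   # synthetic correlated pair

# Percentile bootstrap: resample (x, y) pairs with replacement,
# recompute the correlation, and take quantiles of the replicates.
boot = []
for _ in range(2000):
    idx = rng.integers(0, n, size=n)
    boot.append(np.corrcoef(x[idx], y[idx])[0, 1])
lo, hi = np.quantile(boot, [0.025, 0.975])
print(lo, hi)                      # approximate 95% confidence interval
```

No normality assumption is needed here, which is why the bootstrap is attractive when the Gaussian-based intervals of this chapter do not apply.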
This chapter discusses the problem of selecting predictors in a linear regression model, which is a special case of model selection. One might think that the best model is the one with the most predictors. However, each predictor is associated with a parameter that must be estimated, and errors in the estimation add uncertainty to the final prediction. Thus, when deciding whether to include certain predictors or not, the associated gain in prediction skill should exceed the loss due to estimation error. Model selection is not easily addressed using a hypothesis testing framework because multiple testing is involved. Instead, the standard approach is to define a criterion for preferring one model over another. One criterion is to select the model that gives the best predictions of independent data. By independent data, we mean data that is generated independently of the sample that was used to inform the model building process. Criteria for identifying the model that gives the best predictions in independent data include Mallows’ Cp, Akaike’s Information Criterion, Bayesian Information Criterion, and cross-validated error.
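The trade-off described above, prediction skill versus estimation error, can be sketched with Akaike's Information Criterion for nested least-squares models; the synthetic data below have only two informative predictors (an illustrative setup, not from the chapter):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 100
X = rng.normal(size=(n, 5))
# Only the first two predictors matter in this synthetic example.
y = 1.5 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(size=n)

def aic(y, X_sub):
    """AIC for ordinary least squares with Gaussian errors."""
    A = np.column_stack([np.ones(len(y)), X_sub])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    k = A.shape[1] + 1                  # coefficients plus error variance
    return len(y) * np.log(np.mean(resid**2)) + 2 * k

# Compare nested models using the first p predictors, p = 1..5.
scores = [aic(y, X[:, :p]) for p in range(1, 6)]
best = int(np.argmin(scores)) + 1
print(scores, best)
```

The penalty term 2k embodies the loss due to estimation error: adding a useless predictor lowers the residual variance only slightly, so the penalty dominates and the criterion disfavors the larger model.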
Data assimilation is a procedure for combining observations and forecasts of a system into a single, improved description of the system state. Because observations and forecasts are uncertain, they are each best described by probability distributions. The problem of combining these two distributions into a new, updated distribution that summarizes all our knowledge is solved by Bayes theorem. If the distributions are Gaussian, then the parameters of the updated distribution can be written as an explicit function of the parameters of the observation and forecast distributions. The assumption of Gaussian distributions is tantamount to assuming linear models for observations and state dynamics. The purpose of this chapter is to provide an introduction to the essence of data assimilation. Accordingly, this chapter discusses the data assimilation problem for Gaussian distributions in which the solution from Bayes theorem can be derived analytically. Practical data assimilation usually requires modifications of this assimilation procedure, a special case of which is discussed in the next chapter.
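For the scalar Gaussian case, the Bayes-theorem update described above can be written in closed form; the forecast and observation numbers below are illustrative:

```python
# Scalar Gaussian data assimilation update (minimal sketch; values are
# illustrative). Forecast ~ N(xf, sf2), observation yo with error
# variance so2; Bayes theorem gives a Gaussian posterior.
xf, sf2 = 10.0, 4.0    # forecast mean and variance
yo, so2 = 12.0, 1.0    # observation and observation-error variance

K = sf2 / (sf2 + so2)          # weight given to the observation
xa = xf + K * (yo - xf)        # posterior (analysis) mean
sa2 = (1 - K) * sf2            # posterior variance

print(xa, sa2)                 # 11.6 0.8
```

The posterior mean is a precision-weighted average of forecast and observation, and the posterior variance is smaller than either input variance, reflecting the combined information.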
Scientists often propose hypotheses based on patterns seen in data. However, if a scientist tests a hypothesis using the same data that suggested the hypothesis, then that scientist has violated a rule of science. The rule is: test hypotheses with independent data. This rule may sound so obvious as to be hardly worth mentioning. In fact, this mistake occurs frequently, especially when analyzing large data sets. Among the many pitfalls in statistics, screening is particularly serious. Screening is the process of evaluating a property for a large number of samples and then selecting samples in which that property is extreme. Screening is closely related to data fishing, data dredging, or data snooping. After a sample has been selected through screening, classical hypothesis tests exhibit selection bias. Quantifying the effect of screening often reveals that it creates biases that are substantially larger than one might guess. This chapter explains the concept of screening and illustrates it through examples from selecting predictors, interpreting correlation maps, and identifying change points.
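The size of the screening effect is easy to demonstrate by Monte Carlo; the sample and predictor counts below are illustrative. Every predictor is generated independently of y, yet the best screened correlation is far from zero:

```python
import numpy as np

rng = np.random.default_rng(7)
n, m = 50, 1000                    # 50 samples, 1000 candidate predictors

y = rng.normal(size=n)
X = rng.normal(size=(n, m))        # predictors independent of y by construction

# Screening: keep the predictor whose correlation with y is most extreme.
corrs = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(m)])
best = np.abs(corrs).max()
print(best)                        # substantially greater than zero
```

A classical correlation test applied to the selected predictor alone would declare it highly significant, illustrating the selection bias described above.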