The previous chapter considered the following problem: given a distribution, deduce the characteristics of samples drawn from that distribution. This chapter goes in the opposite direction: given a random sample, infer the distribution from which the sample was drawn. It is impossible to infer the distribution exactly from a finite sample. Our strategy is more limited: we propose a hypothesis about the distribution, then decide whether or not to accept the hypothesis based on the sample. Such procedures are called hypothesis tests. In each test, a decision rule for deciding whether to accept or reject the hypothesis is formulated. The probability that the rule gives the wrong decision when the hypothesis is true leads to the concept of a significance level. In climate studies, the most common questions addressed by hypothesis tests are whether two random variables (1) have the same mean, (2) have the same variance, or (3) are independent. This chapter discusses the corresponding tests for normal distributions, called the (1) t-test (or difference-in-means test), (2) F-test (or difference-in-variance test), and (3) correlation test.
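As an illustrative sketch (not taken from the chapter), the three tests can be run in Python with NumPy and SciPy on synthetic data; the sample sizes, seed, and variable names are arbitrary choices, and the two-sided F-test is assembled by hand since SciPy has no direct variance-ratio test:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(loc=0.0, scale=1.0, size=200)   # sample from N(0, 1)
y = rng.normal(loc=0.0, scale=1.0, size=200)   # independent sample, same distribution

# (1) t-test: do the two samples have the same mean?
t_stat, t_p = stats.ttest_ind(x, y)

# (2) F-test: ratio of sample variances, two-sided p-value built from the F distribution.
f_stat = np.var(x, ddof=1) / np.var(y, ddof=1)
f_p = 2 * min(stats.f.cdf(f_stat, len(x) - 1, len(y) - 1),
              stats.f.sf(f_stat, len(x) - 1, len(y) - 1))

# (3) correlation test: is the correlation between x and y zero?
r, r_p = stats.pearsonr(x, y)

# Both samples come from the same distribution and are independent,
# so all three null hypotheses are true here.
```

Because the data satisfy all three null hypotheses by construction, rejections in this example occur only at the rate set by the significance level.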
This chapter reviews some essential concepts of probability and statistics, including: line plots, histograms, scatter plots, mean, median, quantiles, variance, random variables, probability density function, expectation of a random variable, covariance and correlation, independence, the normal distribution (also known as the Gaussian distribution), and the chi-square distribution. The above concepts provide the foundation for the statistical methods discussed in the rest of this book.
Field significance is concerned with testing a large number of hypotheses simultaneously. Previous chapters have discussed methods for testing one hypothesis, such as whether one variable is correlated with one other variable. Field significance is concerned with whether one variable is related to a random vector. In climate applications, a characteristic feature of field significance problems is that the variables in the random vector correspond to quantities at different geographic locations. As such, neighboring variables are correlated and therefore exhibit spatial dependence. This spatial dependence needs to be taken into account when testing hypotheses. This chapter introduces the concept of field significance and explains three hypothesis test procedures: a Monte Carlo method proposed by Livezey and Chen (1983) and an associated permutation test, a regression method proposed by DelSole and Yang (2011), and a procedure to control the false discovery rate, proposed in a general context by Benjamini and Hochberg (1995) and applied to field significance problems by Ventura et al. (2004) and Wilks (2006).
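The Benjamini–Hochberg step-up procedure is simple enough to sketch directly. The function name, p-values, and FDR level below are illustrative choices, not from the cited papers:

```python
import numpy as np

def benjamini_hochberg(pvals, alpha=0.10):
    """Return a boolean mask of rejected hypotheses, controlling the
    false discovery rate at level alpha (Benjamini-Hochberg step-up)."""
    p = np.asarray(pvals)
    m = len(p)
    order = np.argsort(p)
    thresholds = alpha * np.arange(1, m + 1) / m   # i-th smallest p compared to alpha*i/m
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])           # largest index meeting its threshold
        reject[order[: k + 1]] = True              # reject all hypotheses up to that rank
    return reject

# Ten "grid points": three with genuinely small p-values, seven consistent with the null.
pvals = [0.001, 0.004, 0.019, 0.30, 0.45, 0.50, 0.65, 0.70, 0.85, 0.95]
mask = benjamini_hochberg(pvals, alpha=0.10)
```

With these inputs the procedure rejects the first three hypotheses: 0.019 exceeds its own pointwise threshold of 0.1 × 3/10 only slightly less than required, and the step-up rule carries all smaller p-values along with the largest rank that passes.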
The previous chapter discussed Analysis of Variance (ANOVA), a procedure for deciding if populations have identical scalar means. This chapter discusses the generalization of this test to vector means, which is called Multivariate Analysis of Variance, or MANOVA. MANOVA can detect predictability of random vectors and decompose a random vector into a sum of components ordered such that the first maximizes predictability, the second maximizes predictability subject to being uncorrelated with the first, and so on. This decomposition is called Predictable Component Analysis (PrCA) or signal-to-noise maximizing EOF analysis. A slight modification of this procedure can decompose forecast skill. The connection between PrCA, Canonical Correlation Analysis, and Multivariate Regression is reviewed. In typical climate studies, the dimension of the random vector exceeds the number of samples, leading to an ill-posed problem. The standard approach to this problem is to apply PrCA on a small number of principal components. The problem of selecting the number of principal components can be framed as a model selection problem in regression.
Many scientific questions lead to hypotheses about random vectors. For instance, the question of whether global warming has occurred over a geographic region is a question about whether temperature has changed at each spatial location within the region. One approach to addressing such a question is to apply a univariate test to each location separately and then use the results collectively to make a decision. This approach is called multiple testing or multiple comparisons and is common in genomics for analyzing gene expressions. The disadvantage of this approach is that it does not fully account for correlation between variables. Multivariate techniques provide a framework for hypothesis testing that takes into account correlations between variables. Although multivariate tests are more comprehensive, they require estimating more parameters and therefore have low power when the number of variables is large. Multivariate statistical analysis draws heavily on linear algebra and includes a generalization of the normal distribution, called the multivariate normal distribution, whose population parameters are the mean vector and the covariance matrix.
Climate data are correlated over short spatial and temporal scales. For instance, today’s weather tends to be correlated with tomorrow’s weather, and weather in one city tends to be correlated with weather in a neighboring city. Such correlations imply that weather events are not independent. This chapter discusses an approach to accounting for spatial and temporal dependencies based on stochastic processes. A stochastic process is a collection of random variables indexed by a parameter, such as time or space. A stochastic process is described by the moments at a single time (e.g., mean and variance), and also by the degree of dependence between two times, often measured by the autocorrelation function. This chapter presents these concepts and discusses common mathematical models for generating stochastic processes, especially autoregressive models. The focus of this chapter is on developing the language for describing stochastic processes. Challenges in estimating parameters and testing hypotheses about stochastic processes are discussed.
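A minimal sketch of these ideas (illustrative only; the coefficient, seed, and series length are arbitrary): simulate a first-order autoregressive, AR(1), process and estimate its lag-1 autocorrelation, which for an AR(1) process equals the autoregressive coefficient:

```python
import numpy as np

rng = np.random.default_rng(42)
phi, n = 0.7, 50_000               # AR(1) coefficient and series length

# Simulate x_t = phi * x_{t-1} + eps_t with unit-variance white noise.
x = np.zeros(n)
eps = rng.normal(size=n)
for t in range(1, n):
    x[t] = phi * x[t - 1] + eps[t]

# Sample lag-1 autocorrelation; for an AR(1) process it estimates phi.
x0 = x - x.mean()
rho1 = np.sum(x0[1:] * x0[:-1]) / np.sum(x0 ** 2)
```

With a long series the estimate `rho1` falls close to the true coefficient 0.7; shorter climate records give much noisier estimates, which is one of the estimation challenges the chapter discusses.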
This chapter discusses a procedure for quantifying differences between two covariance matrices. Despite being applicable to a range of statistical problems, the general procedure has no standard name. In this chapter, we call it Covariance Discriminant Analysis (CDA). CDA finds the linear combination of variables that maximizes the ratio of variances. More generally, CDA decomposes two multivariate time series, separately, into components ordered such that the variance ratio of the first component is maximized, and each succeeding component maximizes the variance ratio under the constraint that it is uncorrelated with the preceding components. This technique is used in numerous other multivariate techniques, including canonical correlation analysis, predictable component analysis, and multivariate ANOVA. CDA also is used to identify low-frequency components that maximize the ratio of low-frequency to high-frequency variance. To mitigate overfitting, the standard approach is to apply CDA to a few principal components. No standard criterion exists for choosing the number of principal components. A new criterion is proposed in this chapter.
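Maximizing a ratio of variances over linear combinations is a generalized eigenvalue problem, which the following sketch illustrates on synthetic data (the datasets, dimensions, and seed are arbitrary choices, not from the chapter):

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(1)
n, p = 500, 3

# Two datasets whose covariances differ mainly along the first coordinate:
# the first dataset has standard deviation 3 there, the second has 1.
X1 = rng.normal(size=(n, p)) * np.array([3.0, 1.0, 1.0])
X2 = rng.normal(size=(n, p))

S1 = np.cov(X1, rowvar=False)
S2 = np.cov(X2, rowvar=False)

# Generalized eigenproblem S1 w = lambda S2 w: eigenvalues are variance
# ratios, eigenvectors give the discriminant components (eigh returns
# eigenvalues in ascending order, so the maximizing component is last).
ratios, W = eigh(S1, S2)
w_max = W[:, -1]                   # combination maximizing var1 / var2
max_ratio = ratios[-1]
```

Here the leading component recovers (approximately) the first coordinate, where the population variance ratio is 9; succeeding eigenvectors maximize the ratio subject to the uncorrelatedness constraints described above.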
The hypothesis tests discussed in the previous chapters are parametric. That is, the procedures assume samples come from a prescribed family of distributions, leaving only the parameters of the distribution open to question. For instance, a univariate Gaussian distribution is characterized by two parameters, the mean and variance, and hypotheses are expressed in terms of those parameters. This chapter discusses a class of procedures called nonparametric statistics, or distribution-free methods, that make fewer assumptions. For some hypotheses, nonparametric tests are almost as powerful as parametric tests, hence some statisticians recommend nonparametric methods as a first choice. This chapter discusses the following nonparametric tests: the Wilcoxon rank-sum test, a nonparametric version of the t-test; the Kruskal-Wallis test, a nonparametric version of Analysis of Variance; a nonparametric version of the F-test, based on medians; and Spearman’s rank correlation, a nonparametric version of the correlation test. This chapter assumes familiarity with hypothesis tests, particularly the concepts of null hypothesis, decision rule, and significance level.
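Several of these tests are available in SciPy, as the following sketch shows on synthetic skewed data (the distributions, sizes, and seed are illustrative choices; the median-based F-test analogue has no single standard SciPy routine and is omitted):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
# Two skewed (exponential) samples whose scales, and hence medians, differ.
x = rng.exponential(scale=1.0, size=100)
y = rng.exponential(scale=2.0, size=100)

# Wilcoxon rank-sum test: nonparametric analogue of the two-sample t-test.
w_stat, w_p = stats.ranksums(x, y)

# Kruskal-Wallis test: nonparametric analogue of one-way ANOVA (here, two groups).
h_stat, h_p = stats.kruskal(x, y)

# Spearman rank correlation: nonparametric analogue of the correlation test.
rho, rho_p = stats.spearmanr(x, y)
```

The rank-based location tests detect the difference in scale-shifted medians despite the non-Gaussian distributions, while Spearman’s correlation correctly finds no monotone association between the two independent samples.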
A goal in statistics is to make inferences about a population. Typically, such inferences are in the form of estimates of population parameters; for instance, the mean and variance of a normal distribution. Estimates of population parameters are imperfect because they are based on a finite amount of data. The uncertainty in a parameter estimate may be quantified using a confidence interval. A confidence interval is a random interval that encloses the population value with a specified probability. Confidence intervals are related to hypothesis tests about population parameters. Specifically, for a given hypothesis about the value of a parameter, a test at the 5% significance level would reject that value if and only if the 95% confidence interval did not contain the hypothesized value. This chapter explains how to construct a confidence interval for a difference in means, a ratio of variances, and a correlation coefficient. These confidence intervals assume the samples come from normal distributions. If the distribution is not Gaussian, or the quantity being inferred is complicated, then bootstrap methods offer an important alternative approach, as discussed at the end of this chapter.
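As an illustrative sketch (sample sizes, seed, and the number of bootstrap resamples are arbitrary), here is a normal-theory 95% confidence interval for a difference in means alongside a percentile-bootstrap interval for the same quantity:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
x = rng.normal(loc=1.0, scale=1.0, size=80)
y = rng.normal(loc=0.0, scale=1.0, size=80)

# Normal-theory 95% CI for the difference in means (pooled variance, t distribution).
diff = x.mean() - y.mean()
sp2 = ((len(x) - 1) * x.var(ddof=1) + (len(y) - 1) * y.var(ddof=1)) \
      / (len(x) + len(y) - 2)
se = np.sqrt(sp2 * (1 / len(x) + 1 / len(y)))
tcrit = stats.t.ppf(0.975, len(x) + len(y) - 2)
ci_normal = (diff - tcrit * se, diff + tcrit * se)

# Percentile bootstrap: resample each sample with replacement, recompute the
# difference in means, and take the central 95% of the resampled statistics.
boots = [rng.choice(x, len(x)).mean() - rng.choice(y, len(y)).mean()
         for _ in range(2000)]
ci_boot = (np.percentile(boots, 2.5), np.percentile(boots, 97.5))
```

For Gaussian data like this the two intervals nearly coincide; the bootstrap becomes valuable precisely when the normal-theory assumptions fail or the statistic is too complicated for an analytic interval.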
This chapter discusses the problem of selecting predictors in a linear regression model, which is a special case of model selection. One might think that the best model is the one with the most predictors. However, each predictor is associated with a parameter that must be estimated, and errors in the estimation add uncertainty to the final prediction. Thus, when deciding whether to include certain predictors or not, the associated gain in prediction skill should exceed the loss due to estimation error. Model selection is not easily addressed using a hypothesis testing framework because multiple testing is involved. Instead, the standard approach is to define a criterion for preferring one model over another. One criterion is to select the model that gives the best predictions of independent data. By independent data, we mean data that is generated independently of the sample that was used to inform the model building process. Criteria for identifying the model that gives the best predictions in independent data include Mallows’ Cp, Akaike’s Information Criterion, Bayesian Information Criterion, and cross-validated error.
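Two of the criteria named above can be sketched in a few lines. In this illustrative example (synthetic data; only the first two of five candidate predictors actually influence the response), nested models are scored by AIC and BIC:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 200
X = rng.normal(size=(n, 5))                              # five candidate predictors
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(size=n)   # only the first two matter

def ols_rss(y, X_sub):
    """Residual sum of squares and parameter count for an OLS fit with intercept."""
    Xd = np.column_stack([np.ones(len(y)), X_sub])
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    return np.sum((y - Xd @ beta) ** 2), Xd.shape[1]

def aic(y, X_sub):
    rss, k = ols_rss(y, X_sub)
    return len(y) * np.log(rss / len(y)) + 2 * k          # Gaussian-likelihood AIC

def bic(y, X_sub):
    rss, k = ols_rss(y, X_sub)
    return len(y) * np.log(rss / len(y)) + np.log(len(y)) * k

# Nested candidate models: the first k predictors, k = 1..5.
aic_scores = [aic(y, X[:, :k]) for k in range(1, 6)]
bic_scores = [bic(y, X[:, :k]) for k in range(1, 6)]
best_aic = int(np.argmin(aic_scores)) + 1
best_bic = int(np.argmin(bic_scores)) + 1
```

Both criteria penalize each added parameter, so the three irrelevant predictors must buy a large enough drop in residual error to be worth their estimation cost; BIC’s heavier log(n) penalty makes it the more conservative of the two.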
Data assimilation is a procedure for combining observations and forecasts of a system into a single, improved description of the system state. Because observations and forecasts are uncertain, they are each best described by probability distributions. The problem of combining these two distributions into a new, updated distribution that summarizes all our knowledge is solved by Bayes theorem. If the distributions are Gaussian, then the parameters of the updated distribution can be written as an explicit function of the parameters of the observation and forecast distributions. The assumption of Gaussian distributions is tantamount to assuming linear models for observations and state dynamics. The purpose of this chapter is to provide an introduction to the essence of data assimilation. Accordingly, this chapter discusses the data assimilation problem for Gaussian distributions in which the solution from Bayes theorem can be derived analytically. Practical data assimilation usually requires modifications of this assimilation procedure, a special case of which is discussed in the next chapter.
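For a single scalar state variable the Gaussian Bayes update can be written out completely. The numbers below are arbitrary illustrative values; the point is that the posterior precision is the sum of the two precisions, which is algebraically identical to the Kalman update:

```python
# Forecast (background): N(mu_b, var_b); observation: N(y_obs, var_o).
mu_b, var_b = 10.0, 4.0
y_obs, var_o = 12.0, 1.0

# Bayes' theorem with a Gaussian prior and Gaussian likelihood gives a
# Gaussian posterior; written as the Kalman update:
K = var_b / (var_b + var_o)        # Kalman gain: weight given to the observation
mu_a = mu_b + K * (y_obs - mu_b)   # analysis mean: forecast corrected toward obs
var_a = (1 - K) * var_b            # analysis variance: always smaller than var_b

# Equivalent precision-weighted form: 1/var_a = 1/var_b + 1/var_o.
mu_check = (mu_b / var_b + y_obs / var_o) / (1 / var_b + 1 / var_o)
```

Because the observation here is four times more precise than the forecast, the analysis mean (11.6) sits much closer to the observation, and the analysis variance (0.8) is smaller than either input variance.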
Scientists often propose hypotheses based on patterns seen in data. However, if a scientist tests a hypothesis using the same data that suggested the hypothesis, then that scientist has violated a rule of science. The rule is: test hypotheses with independent data. This rule may sound so obvious as to be hardly worth mentioning. In fact, this mistake occurs frequently, especially when analyzing large data sets. Among the many pitfalls in statistics, screening is particularly serious. Screening is the process of evaluating a property for a large number of samples and then selecting samples in which that property is extreme. Screening is closely related to data fishing, data dredging, or data snooping. After a sample has been selected through screening, classical hypothesis tests exhibit selection bias. Quantifying the effect of screening often reveals that it creates biases that are substantially larger than one might guess. This chapter explains the concept of screening and illustrates it through examples from selecting predictors, interpreting correlation maps, and identifying change points.
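The size of the screening bias is easy to demonstrate by Monte Carlo. In this illustrative sketch (sample size, number of candidate predictors, and trial count are arbitrary), predictors are generated independently of the response, the most extreme correlation is selected in each trial, and the selected correlation is then judged by a naive 5% test:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(11)
n, m, trials = 30, 100, 500        # sample size, candidate predictors, Monte Carlo trials

selected_r = []
for _ in range(trials):
    y = rng.normal(size=n)
    X = rng.normal(size=(n, m))    # predictors independent of y by construction
    yc = y - y.mean()
    Xc = X - X.mean(axis=0)
    r = (Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))
    selected_r.append(np.abs(r).max())   # screening: keep the most extreme correlation

# Critical |r| for a naive two-sided 5% correlation test with n - 2 df.
t95 = stats.t.ppf(0.975, n - 2)
crit = t95 / np.sqrt(n - 2 + t95 ** 2)

# Fraction of trials where the screened correlation passes the naive test,
# even though no predictor is actually related to y.
false_positive_rate = np.mean(np.array(selected_r) > crit)
```

With 100 candidates, the screened correlation passes the naive 5% test in nearly every trial rather than 5% of them, which is exactly the selection bias the chapter warns about.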
The correlation coefficient measures the linear relation between scalar X and scalar Y. How can the linear relation between vector X and vector Y be measured? Canonical Correlation Analysis (CCA) provides a way. CCA finds a linear combination of X, and a (separate) linear combination of Y, that maximizes the correlation. The resulting maximized correlation is called a canonical correlation. More generally, CCA decomposes two sets of variables into a sequence of component pairs ordered such that the first pair has maximum correlation, the second has maximum correlation subject to being uncorrelated with the first, and so on. The entire decomposition can be derived from a Singular Value Decomposition of a suitable matrix. If the dimension of the X and Y vectors is too large, overfitting becomes a problem. In this case, CCA often is computed using a few principal components of X and Y. The criterion for selecting the number of principal components is not standard. The Mutual Information Criterion (MIC) introduced in Chapter 14 is used in this chapter.
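One standard SVD formulation (illustrative sketch; the synthetic datasets, shared-signal construction, and seed are arbitrary choices): whiten each centered dataset via its thin SVD, then take the SVD of the cross-product of the whitened variables, whose singular values are the canonical correlations:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1000
z = rng.normal(size=n)                       # signal shared by both datasets

# Two 2-dimensional datasets sharing one common component plus noise.
X = np.column_stack([z + 0.5 * rng.normal(size=n), rng.normal(size=n)])
Y = np.column_stack([z + 0.5 * rng.normal(size=n), rng.normal(size=n)])

Xc = X - X.mean(axis=0)
Yc = Y - Y.mean(axis=0)

def whiten(A):
    """Left singular vectors of the centered data: an orthonormal basis,
    i.e., the data transformed to unit, uncorrelated variance."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U

Qx, Qy = whiten(Xc), whiten(Yc)

# Singular values of the cross-product of the whitened data are the
# sample canonical correlations, in decreasing order.
_, canon_corrs, _ = np.linalg.svd(Qx.T @ Qy)
```

The shared component has population correlation 0.8 between the two datasets, so the leading canonical correlation estimates 0.8 while the second is near zero; canonical correlations are always between 0 and 1.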
The previous chapter discussed data assimilation for the case in which the variables have known Gaussian distributions. However, in atmospheric and oceanic data assimilation, the distributions are neither Gaussian nor known, and the large number of state variables creates numerical challenges. This chapter discusses a class of algorithms, called Ensemble Square Root Filters, for performing data assimilation with high-dimensional, nonlinear systems. The basic idea is to use a collection of forecasts (called an ensemble) to estimate the statistics of the background distribution. In addition, observational information is incorporated by adjusting individual ensemble members (i.e., forecasts) rather than computing an entire distribution. This chapter discusses three standard filters: the Ensemble Transform Kalman Filter (ETKF), the Ensemble Square Root Filter (EnSRF), and the Ensemble Adjustment Kalman Filter (EAKF). However, ensemble filters often experience filter divergence, in which the analysis no longer tracks the truth. This chapter discusses standard approaches to mitigating filter divergence, namely covariance inflation and covariance localization.
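The essence of a square-root update can be sketched for a single scalar state variable (a deliberately minimal illustration; the operational ETKF, EnSRF, and EAKF apply analogous transformations to high-dimensional ensembles). The ensemble statistics replace the background mean and variance, and each member is shifted and rescaled rather than an explicit distribution being computed:

```python
import numpy as np

rng = np.random.default_rng(4)
n_ens = 500                                 # ensemble size

# Background ensemble approximating a forecast distribution N(10, 4).
xb = rng.normal(loc=10.0, scale=2.0, size=n_ens)
y_obs, var_o = 12.0, 1.0                    # observation and its error variance

# Ensemble statistics stand in for the background mean and variance.
mu_b = xb.mean()
var_b = xb.var(ddof=1)
K = var_b / (var_b + var_o)                 # Kalman gain from ensemble statistics

# Deterministic square-root update: shift the ensemble mean toward the
# observation and shrink the deviations so the analysis variance is
# exactly (1 - K) * var_b, without perturbing the observation.
alpha = np.sqrt(1 - K)                      # scalar square-root factor
xa = mu_b + K * (y_obs - mu_b) + alpha * (xb - mu_b)
```

The updated ensemble reproduces the Gaussian analysis mean and variance exactly by construction, which is the defining property that distinguishes square-root filters from stochastic perturbed-observation filters.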
Some variables can be modeled by a linear combination of other random variables, plus random noise. Such models are used to quantify the relation between variables, to make predictions, and to test hypotheses about the relation between variables. After identifying the variables to include in a model, the next step is to estimate the coefficients that multiply them, called the regression parameters. This chapter discusses the least squares method for estimating regression parameters. The least squares method estimates the parameters by minimizing the sum of squared differences between the fitted model and the data. This chapter also describes measures for the goodness of fit and an illuminating geometric interpretation of least squares fitting. The least squares method is illustrated on various routine calculations in weather and climate analysis (e.g., fitting a trend). Procedures for testing hypotheses about linear models are discussed in the next chapter.
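A routine example of the kind mentioned above, fitting a linear trend by least squares, can be sketched as follows (the "temperature" series is synthetic, with an assumed trend of 0.02 degrees per year chosen purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(8)
years = np.arange(1970, 2020)
true_trend = 0.02                          # degrees per year (assumed for the example)
temp = true_trend * (years - years[0]) + rng.normal(scale=0.1, size=len(years))

# Least squares: choose intercept and slope to minimize the sum of
# squared differences between the fitted line and the data.
X = np.column_stack([np.ones(len(years)), years - years[0]])
beta, *_ = np.linalg.lstsq(X, temp, rcond=None)
intercept, slope = beta

# Goodness of fit: R^2, the fraction of variance explained by the line.
resid = temp - X @ beta
r2 = 1 - resid.var() / temp.var()
```

The estimated slope recovers the prescribed trend to within its sampling error, and R^2 summarizes how much of the series variance the fitted trend explains.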
Multivariate linear regression is a method for modeling linear relations between two random vectors, say X and Y. Common reasons for using multivariate regression include (1) predicting Y given X, (2) testing hypotheses about the relation between X and Y, and (3) projecting Y onto prescribed time series or spatial patterns. Special cases of multivariate regression models include Linear Inverse Models (LIMs) and Vector Autoregressive Models. Multivariate regression also is fundamental to other statistical techniques, including canonical correlation analysis, discriminant analysis, and predictable component analysis. This chapter introduces multivariate linear regression and discusses estimation, measures of association, hypothesis testing, and model selection. In climate studies, model selection often involves selecting Y as well as X. For instance, Y may be a set of principal components that need to be chosen, which is not a standard selection problem. This chapter introduces a criterion for selecting X and Y simultaneously called Mutual Information Criterion (MIC).
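Estimation in the multivariate model is a direct extension of ordinary least squares: one linear solve fits all response columns at once. A minimal sketch on synthetic data (dimensions, coefficient matrix, and noise level are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(9)
n = 300
X = rng.normal(size=(n, 2))                # two predictors
B_true = np.array([[1.0, 0.0, 2.0],
                   [0.0, -1.0, 1.0]])      # maps 2 predictors to 3 responses
Y = X @ B_true + 0.1 * rng.normal(size=(n, 3))

# Multivariate least squares: the same solve as univariate regression,
# applied column-by-column to Y in a single call.
B_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
```

Each column of `B_hat` is exactly the univariate least squares solution for the corresponding response; the multivariate framework adds joint hypothesis tests and measures of association across the responses, as discussed in the chapter.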