This chapter discusses a procedure for quantifying differences between two covariance matrices. Despite being applicable to a range of statistical problems, the general procedure has no standard name. In this chapter, we call it Covariance Discriminant Analysis (CDA). CDA finds the linear combination of variables that maximizes the ratio of variances. More generally, CDA decomposes two multivariate time series, separately, into components ordered such that the variance ratio of the first component is maximized, and each succeeding component maximizes the variance ratio under the constraint that it is uncorrelated with the preceding components. The same optimization underlies numerous other multivariate techniques, including canonical correlation analysis, predictable component analysis, and multivariate ANOVA. CDA also is used to identify low-frequency components that maximize the ratio of low-frequency to high-frequency variance. To mitigate overfitting, the standard approach is to apply CDA to a few principal components. No standard criterion exists for choosing the number of principal components. A new criterion is proposed in this chapter.
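The core computation can be sketched as a generalized eigenvalue problem. The following is a minimal illustration with synthetic data and illustrative variable names; the principal-component truncation used to mitigate overfitting is omitted.

```python
# Sketch: maximize the ratio of variances between two samples by solving the
# generalized eigenproblem  C1 q = lambda * C2 q  (eigenvalues = variance ratios).
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(0)
x1 = rng.standard_normal((200, 5))          # sample 1: time x variables
x2 = rng.standard_normal((300, 5)) * 2.0    # sample 2: larger variance

c1 = np.cov(x1, rowvar=False)               # covariance matrix of sample 1
c2 = np.cov(x2, rowvar=False)               # covariance matrix of sample 2

ratios, q = eigh(c1, c2)                    # generalized eigenvalue problem
order = np.argsort(ratios)[::-1]            # order components by decreasing ratio
ratios, q = ratios[order], q[:, order]

print("maximized variance ratio:", ratios[0])
print("leading discriminant weights:", q[:, 0])
```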
The hypothesis tests discussed in the previous chapters are parametric. That is, the procedures assume samples come from a prescribed family of distributions, leaving only the parameters of the distribution open to question. For instance, a univariate Gaussian distribution is characterized by two parameters, the mean and variance, and hypotheses are expressed in terms of those parameters. This chapter discusses a class of procedures called nonparametric statistics, or distribution-free methods, that make fewer assumptions. For some hypotheses, nonparametric tests are almost as powerful as parametric tests, hence some statisticians recommend nonparametric methods as a first choice. This chapter discusses the following nonparametric tests: the Wilcoxon rank-sum test, a nonparametric version of the t-test; the Kruskal-Wallis test, a nonparametric version of Analysis of Variance; a nonparametric version of the F-test based on medians; and Spearman’s rank correlation, a nonparametric version of the correlation test. This chapter assumes familiarity with hypothesis tests, particularly the concepts of null hypothesis, decision rule, and significance level.
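For orientation, the tests named above are all available in standard software. The sketch below uses SciPy routines on synthetic data; the median-based variance test is represented here by Levene's test with median centering (the Brown-Forsythe variant), which is one common choice and may differ from the chapter's exact construction.

```python
# Sketch: common nonparametric tests applied to synthetic samples.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.standard_normal(50)
y = rng.standard_normal(60) + 0.5
z = rng.standard_normal(40) - 0.3

print(stats.ranksums(x, y))                      # Wilcoxon rank-sum (t-test analogue)
print(stats.kruskal(x, y, z))                    # Kruskal-Wallis (ANOVA analogue)
print(stats.levene(x, y, center='median'))       # median-based test of equal variances
print(stats.spearmanr(x, x + rng.standard_normal(50)))   # Spearman rank correlation
```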
A goal in statistics is to make inferences about a population. Typically, such inferences are in the form of estimates of population parameters; for instance, the mean and variance of a normal distribution. Estimates of population parameters are imperfect because they are based on a finite amount of data. The uncertainty in a parameter estimate may be quantified using a confidence interval. A confidence interval is a random interval that encloses the population value with a specified probability. Confidence intervals are related to hypothesis tests about population parameters. Specifically, for a given hypothesis about the value of a parameter, a test at the 5% significance level would reject that value if the 95% confidence interval did not contain the hypothesized value. This chapter explains how to construct a confidence interval for a difference in means, a ratio of variances, and a correlation coefficient. These confidence intervals assume the samples come from normal distributions. If the distribution is not Gaussian, or the quantity being inferred is complicated, then bootstrap methods offer an important alternative approach, as discussed at the end of this chapter.
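A minimal sketch of two normal-theory intervals and a bootstrap alternative is given below; the data, the pairing used for the correlation, and the 95% level are purely illustrative.

```python
# Sketch: confidence intervals for a difference in means and a correlation,
# plus a percentile bootstrap interval that does not assume normality.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
x = rng.normal(1.0, 1.0, 40)
y = rng.normal(0.5, 1.0, 50)
nx, ny = len(x), len(y)

# 95% CI for the difference in means (pooled-variance t interval)
sp2 = ((nx - 1) * x.var(ddof=1) + (ny - 1) * y.var(ddof=1)) / (nx + ny - 2)
half = stats.t.ppf(0.975, nx + ny - 2) * np.sqrt(sp2 * (1 / nx + 1 / ny))
diff = x.mean() - y.mean()
print("difference in means:", (diff - half, diff + half))

# 95% CI for a correlation coefficient via Fisher's z-transform
r = np.corrcoef(x, y[:nx])[0, 1]
z, se = np.arctanh(r), 1 / np.sqrt(nx - 3)
print("correlation:", (np.tanh(z - 1.96 * se), np.tanh(z + 1.96 * se)))

# Percentile bootstrap CI for the difference in means
boot = [rng.choice(x, nx).mean() - rng.choice(y, ny).mean() for _ in range(2000)]
print("bootstrap difference in means:", tuple(np.percentile(boot, [2.5, 97.5])))
```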
This chapter discusses the problem of selecting predictors in a linear regression model, which is a special case of model selection. One might think that the best model is the one with the most predictors. However, each predictor is associated with a parameter that must be estimated, and errors in the estimation add uncertainty to the final prediction. Thus, when deciding whether to include certain predictors or not, the associated gain in prediction skill should exceed the loss due to estimation error. Model selection is not easily addressed using a hypothesis testing framework because multiple testing is involved. Instead, the standard approach is to define a criterion for preferring one model over another. One criterion is to select the model that gives the best predictions of independent data. By independent data, we mean data that is generated independently of the sample that was used to inform the model building process. Criteria for identifying the model that gives the best predictions in independent data include Mallows’ Cp, Akaike’s Information Criterion, Bayesian Information Criterion, and cross-validated error.
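As a concrete illustration of comparing predictor subsets, the sketch below evaluates AIC and BIC for every subset of a small synthetic regression problem. The Gaussian-likelihood forms used here are one common convention and may differ in constants from the chapter's definitions.

```python
# Sketch: exhaustive predictor selection in linear regression using AIC and BIC.
import itertools
import numpy as np

rng = np.random.default_rng(3)
n, p = 100, 4
X = rng.standard_normal((n, p))
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.standard_normal(n)   # only two real predictors

def info_criteria(y, Xsub):
    Xd = np.column_stack([np.ones(len(y)), Xsub])             # add intercept
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    rss = np.sum((y - Xd @ beta) ** 2)
    k = Xd.shape[1]
    aic = n * np.log(rss / n) + 2 * k
    bic = n * np.log(rss / n) + k * np.log(n)
    return aic, bic

for r in range(1, p + 1):
    for subset in itertools.combinations(range(p), r):
        aic, bic = info_criteria(y, X[:, list(subset)])
        print(subset, "AIC=%.1f  BIC=%.1f" % (aic, bic))
```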
Data assimilation is a procedure for combining observations and forecasts of a system into a single, improved description of the system state. Because observations and forecasts are uncertain, they are each best described by probability distributions. The problem of combining these two distributions into a new, updated distribution that summarizes all our knowledge is solved by Bayes theorem. If the distributions are Gaussian, then the parameters of the updated distribution can be written as an explicit function of the parameters of the observation and forecast distributions. The assumption of Gaussian distributions is tantamount to assuming linear models for observations and state dynamics. The purpose of this chapter is to provide an introduction to the essence of data assimilation. Accordingly, this chapter discusses the data assimilation problem for Gaussian distributions in which the solution from Bayes theorem can be derived analytically. Practical data assimilation usually requires modifications of this assimilation procedure, a special case of which is discussed in the next chapter.
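The Gaussian case reduces to a Kalman-type update of the forecast mean and covariance. The following sketch uses a two-variable state with illustrative numbers; the matrices H and R stand for an assumed observation operator and observation-error covariance.

```python
# Sketch: Bayesian update of a Gaussian forecast (background) given a Gaussian
# observation, yielding the analysis mean and covariance.
import numpy as np

xb = np.array([1.0, 0.0])                 # forecast (background) mean
Pb = np.array([[1.0, 0.5],
               [0.5, 2.0]])               # forecast covariance

H = np.array([[1.0, 0.0]])                # observe the first state variable only
R = np.array([[0.5]])                     # observation-error covariance
y = np.array([1.8])                       # observed value

K = Pb @ H.T @ np.linalg.inv(H @ Pb @ H.T + R)   # Kalman gain
xa = xb + K @ (y - H @ xb)                        # analysis mean
Pa = (np.eye(2) - K @ H) @ Pb                     # analysis covariance

print("analysis mean:", xa)
print("analysis covariance:\n", Pa)
```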
Scientists often propose hypotheses based on patterns seen in data. However, if a scientist tests a hypothesis using the same data that suggested the hypothesis, then that scientist has violated a rule of science. The rule is: test hypotheses with independent data. This rule may sound so obvious as to be hardly worth mentioning. In fact, this mistake occurs frequently, especially when analyzing large data sets. Among the many pitfalls in statistics, screening is particularly serious. Screening is the process of evaluating a property for a large number of samples and then selecting samples in which that property is extreme. Screening is closely related to data fishing, data dredging, or data snooping. After a sample has been selected through screening, classical hypothesis tests exhibit selection bias. Quantifying the effect of screening often reveals that it creates biases that are substantially larger than one might guess. This chapter explains the concept of screening and illustrates it through examples from selecting predictors, interpreting correlation maps, and identifying change points.
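The size of the screening effect is easy to demonstrate by simulation. In the sketch below, every candidate predictor is pure noise, yet the predictor selected for having the largest sample correlation passes a nominal 5% significance test far more often than 5% of the time; the dimensions are illustrative.

```python
# Sketch: selection bias from screening the most correlated of many noise predictors.
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n, n_candidates, n_trials = 50, 100, 500
crit = stats.t.ppf(0.975, n - 2)          # two-sided 5% threshold for the t statistic
false_hits = 0

for _ in range(n_trials):
    y = rng.standard_normal(n)
    X = rng.standard_normal((n, n_candidates))        # all candidates are pure noise
    r = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(n_candidates)])
    r_best = r[np.argmax(np.abs(r))]                  # screened (largest) correlation
    t = r_best * np.sqrt((n - 2) / (1 - r_best**2))
    false_hits += abs(t) > crit

print("rejection rate after screening:", false_hits / n_trials)   # far above 0.05
```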
The correlation coefficient measures the linear relation between scalar X and scalar Y. How can the linear relation between vector X and vector Y be measured? Canonical Correlation Analysis (CCA) provides a way. CCA finds a linear combination of X, and a (separate) linear combination of Y, that maximizes the correlation. The resulting maximized correlation is called a canonical correlation. More generally, CCA decomposes two sets of variables into an ordered sequence of component pairs ordered such that the first pair has maximum correlation, the second has maximum correlation subject to being uncorrelated with the first, and so on. The entire decomposition can be derived from a Singular Value Decomposition of a suitable matrix. If the dimension of the X and Y vectors is too large, overfitting becomes a problem. In this case, CCA often is computed using a few principal components of X and Y. The criterion for selecting the number of principal components is not standard. The Mutual Information Criterion (MIC) introduced in Chapter 14 is used in this chapter.
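A minimal sketch of the SVD construction is given below: the cross-covariance matrix is whitened by the inverse square roots of the two covariance matrices, and the singular values of the result are the canonical correlations. The data are synthetic and low-dimensional, so no principal-component truncation is needed.

```python
# Sketch: CCA via the SVD of a whitened cross-covariance matrix.
import numpy as np

rng = np.random.default_rng(5)
n = 300
X = rng.standard_normal((n, 3))
Y = 0.8 * X[:, :2] + 0.5 * rng.standard_normal((n, 2))   # Y partly driven by X

Xc, Yc = X - X.mean(0), Y - Y.mean(0)
Cxx, Cyy = Xc.T @ Xc / (n - 1), Yc.T @ Yc / (n - 1)
Cxy = Xc.T @ Yc / (n - 1)

def inv_sqrt(C):
    """Symmetric inverse square root of a positive definite matrix."""
    vals, vecs = np.linalg.eigh(C)
    return vecs @ np.diag(vals ** -0.5) @ vecs.T

Wx, Wy = inv_sqrt(Cxx), inv_sqrt(Cyy)
U, s, Vt = np.linalg.svd(Wx @ Cxy @ Wy)   # singular values = canonical correlations
A = Wx @ U                                 # canonical weight vectors for X (columns)
B = Wy @ Vt.T                              # canonical weight vectors for Y (columns)

print("canonical correlations:", s)
```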
The previous chapter discussed data assimilation for the case in which the variables have known Gaussian distributions. However, in atmospheric and oceanic data assimilation, the distributions are neither Gaussian nor known, and the large number of state variables creates numerical challenges. This chapter discusses a class of algorithms, called Ensemble Square Root Filters, for performing data assimilation with high-dimensional, nonlinear systems. The basic idea is to use a collection of forecasts (called an ensemble) to estimate the statistics of the background distribution. In addition, observational information is incorporated by adjusting individual ensemble members (i.e., forecasts) rather than computing an entire distribution. This chapter discusses three standard filters: the Ensemble Transform Kalman Filter (ETKF), the Ensemble Square Root Filter (EnSRF), and the Ensemble Adjustment Kalman Filter (EAKF). However, ensemble filters often experience filter divergence, in which the analysis no longer tracks the truth. This chapter discusses standard approaches to mitigating filter divergence, namely covariance inflation and covariance localization.
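To make the ensemble-space update concrete, the sketch below performs a single ETKF-style analysis step with small, illustrative dimensions. Covariance inflation and localization, which the chapter discusses as remedies for filter divergence, are omitted here.

```python
# Sketch: one ETKF analysis step computed in ensemble (weight) space.
import numpy as np

rng = np.random.default_rng(6)
n, m, p = 4, 10, 2                          # state dim, ensemble size, number of obs
Xf = rng.standard_normal((n, m))            # forecast (background) ensemble
H = np.zeros((p, n)); H[0, 0] = H[1, 2] = 1.0   # observe state components 0 and 2
R = 0.25 * np.eye(p)                        # observation-error covariance
y = np.array([0.5, -0.3])                   # observations

xb = Xf.mean(axis=1, keepdims=True)
Xp = Xf - xb                                # ensemble perturbations
Yp = H @ Xp                                 # perturbations in observation space

C = Yp.T @ np.linalg.inv(R)
Pa = np.linalg.inv((m - 1) * np.eye(m) + C @ Yp)     # analysis covariance in weight space
wa = Pa @ C @ (y - (H @ xb).ravel())                  # mean-update weights

vals, vecs = np.linalg.eigh((m - 1) * Pa)             # symmetric square root transform
Wa = vecs @ np.diag(np.sqrt(vals)) @ vecs.T

Xa = xb + Xp @ (wa[:, None] + Wa)           # analysis ensemble
print("analysis mean:", Xa.mean(axis=1))
```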
Some variables can be modeled by a linear combination of other random variables, plus random noise. Such models are used to quantify the relation between variables, to make predictions, and to test hypotheses about the relation between variables. After identifying the variables to include in a model, the next step is to estimate the coefficients that multiply them, called the regression parameters. This chapter discusses the least squares method for estimating regression parameters. The least squares method estimates the parameters by minimizing the sum of squared differences between the fitted model and the data. This chapter also describes measures for the goodness of fit and an illuminating geometric interpretation of least squares fitting. The least squares method is illustrated on various routine calculations in weather and climate analysis (e.g., fitting a trend). Procedures for testing hypotheses about linear models are discussed in the next chapter.
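The routine trend-fitting calculation mentioned above can be written in a few lines. The sketch below builds a design matrix with an intercept and time, solves the least squares problem, and reports R-squared as a goodness-of-fit measure; the series is synthetic.

```python
# Sketch: least squares fit of a linear trend to a time series.
import numpy as np

rng = np.random.default_rng(7)
t = np.arange(100)                                   # time index (e.g., years)
y = 0.02 * t + rng.standard_normal(100)              # synthetic series with a trend

X = np.column_stack([np.ones_like(t, dtype=float), t])   # design matrix [1, t]
beta, *_ = np.linalg.lstsq(X, y, rcond=None)              # least squares estimate

yhat = X @ beta
r2 = 1 - np.sum((y - yhat) ** 2) / np.sum((y - y.mean()) ** 2)   # goodness of fit
print("intercept, trend:", beta, " R^2:", r2)
```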
Multivariate linear regression is a method for modeling linear relations between two random vectors, say X and Y. Common reasons for using multivariate regression include (1) to predict Y given X, (2) to test hypotheses about the relation between X and Y, and (3) to project Y onto prescribed time series or spatial patterns. Special cases of multivariate regression models include Linear Inverse Models (LIMs) and Vector Autoregressive Models. Multivariate regression also is fundamental to other statistical techniques, including canonical correlation analysis, discriminant analysis, and predictable component analysis. This chapter introduces multivariate linear regression and discusses estimation, measures of association, hypothesis testing, and model selection. In climate studies, model selection often involves selecting Y as well as X. For instance, Y may be a set of principal components that need to be chosen, which is not a standard selection problem. This chapter introduces a criterion for selecting X and Y simultaneously called Mutual Information Criterion (MIC).
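Estimation by least squares carries over directly from the univariate case: the coefficient matrix is obtained column by column, which NumPy's solver handles in one call. The sketch below uses synthetic data and illustrative dimensions.

```python
# Sketch: least squares estimation of a multivariate regression Y = X B + noise.
import numpy as np

rng = np.random.default_rng(8)
n = 200
X = np.column_stack([np.ones(n), rng.standard_normal((n, 3))])   # intercept + 3 predictors
B_true = rng.standard_normal((4, 2))                              # two response variables
Y = X @ B_true + 0.3 * rng.standard_normal((n, 2))

B_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)   # 4 x 2 matrix of regression parameters
print("estimated coefficient matrix:\n", B_hat)
```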
The method of least squares will fit any model to a data set, but is the resulting model "good"? One criterion is that the model should fit the data significantly better than a simpler model with fewer predictors. After all, if the fit is not significantly better, then the model with fewer predictors is almost as good. For linear models, this approach is equivalent to testing if selected regression parameters vanish. This chapter discusses procedures for testing such hypotheses. In interpreting such hypotheses, it is important to recognize that a regression parameter for a given predictor quantifies the expected rate of change of the predictand while holding the other predictors constant. Equivalently, the regression parameter quantifies the dependence between two variables after controlling for, or regressing out, the other predictors. These concepts are important for identifying a confounding variable, which is a third variable that influences two variables so as to produce a correlation between them. This chapter also discusses how detection and attribution of climate change can be framed in a regression model framework.
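The standard test compares the residual sum of squares of the full model with that of a reduced model in which the parameters under test are set to zero. A minimal sketch on synthetic data follows; the predictors x2 and x3 are constructed to be truly irrelevant.

```python
# Sketch: F-test for whether a subset of regression parameters vanishes.
import numpy as np
from scipy import stats

rng = np.random.default_rng(9)
n = 120
X = np.column_stack([np.ones(n), rng.standard_normal((n, 3))])   # intercept + x1, x2, x3
y = X @ np.array([1.0, 2.0, 0.0, 0.0]) + rng.standard_normal(n)  # x2, x3 have zero effect

def rss(Xd, y):
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    return np.sum((y - Xd @ beta) ** 2)

rss_full = rss(X, y)
rss_reduced = rss(X[:, :2], y)            # drop x2 and x3 (test beta2 = beta3 = 0)
q, dof = 2, n - X.shape[1]                # number of restrictions, residual dof
F = ((rss_reduced - rss_full) / q) / (rss_full / dof)
print("F =", F, " p-value =", stats.f.sf(F, q, dof))
```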
Large data sets are difficult to grasp. To make progress, we often seek a few quantities that capture as much of the information in the data as possible. In this chapter, we discuss a procedure called Principal Component Analysis (PCA), also called Empirical Orthogonal Function (EOF) analysis, which finds the components that minimize the sum of squared differences between the components and the data. The components are ordered such that the first approximates the data the best (in a least squares sense), the second approximates the data the best among all components orthogonal to the first, and so on. In typical climate applications, a principal component consists of two parts: (1) a fixed spatial structure, called an Empirical Orthogonal Function (EOF), and (2) its time-dependent amplitude, called a PC time series. The EOFs are orthogonal and the PC time series are uncorrelated. Principal components often are used as input to other analyses, such as linear regression, canonical correlation analysis, predictable components analysis, or discriminant analysis. The procedure for performing area-weighted PCA is discussed in detail in this chapter.
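A common implementation of area-weighted PCA applies the SVD to the anomaly data matrix after weighting each grid point by the square root of the cosine of latitude. The sketch below uses a synthetic field on a one-dimensional latitude grid; real applications would flatten a latitude-longitude grid into the spatial dimension.

```python
# Sketch: area-weighted PCA/EOF analysis via the singular value decomposition.
import numpy as np

rng = np.random.default_rng(10)
ntime, nlat = 120, 30
lat = np.linspace(-60, 60, nlat)
field = rng.standard_normal((ntime, nlat))           # time x space data matrix

w = np.sqrt(np.cos(np.deg2rad(lat)))                  # square-root area weights
anom = field - field.mean(axis=0)                     # remove the time mean
U, s, Vt = np.linalg.svd(anom * w, full_matrices=False)

eofs = Vt / w                                         # EOFs on the unweighted grid
pcs = U * s                                           # PC time series
var_frac = s**2 / np.sum(s**2)                        # fraction of weighted variance explained
print("variance explained by first 3 EOFs:", var_frac[:3])
```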
This chapter introduces the power spectrum. The power spectrum is the Fourier Transform of the autocovariance function, and the autocovariance function is the (inverse) Fourier Transform of the power spectrum. As such, the power spectrum and autocovariance function offer two complementary but mathematically equivalent descriptions of a stochastic process. The power spectrum quantifies how variance is distributed over frequencies and is useful for identifying periodic behavior in time series. The discrete Fourier transform of a time series can be summarized in a periodogram, which provides a starting point for estimating power spectra. Estimation of the power spectrum can be counterintuitive because the uncertainty in periodogram elements does not decrease with increasing sample size. To reduce uncertainty, periodogram estimates are averaged over a frequency interval called the bandwidth. Trends and discontinuities in time series can lead to similar low-frequency structure despite very different temporal characteristics. Spectral analysis provides a particularly insightful way to understand the behavior of linear filters.
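The contrast between a raw periodogram and an averaged estimate is easy to see numerically. The sketch below generates a red-noise (AR(1)) series and compares the raw periodogram with a reduced-variance estimate; averaging here is done over segments (Welch's method), one common alternative to averaging the periodogram over a frequency bandwidth.

```python
# Sketch: raw periodogram versus an averaged power spectrum estimate for red noise.
import numpy as np
from scipy import signal

rng = np.random.default_rng(11)
n, phi = 2048, 0.7
x = np.zeros(n)
for t in range(1, n):                          # simple AR(1) process (red noise)
    x[t] = phi * x[t - 1] + rng.standard_normal()

freq_raw, p_raw = signal.periodogram(x)        # raw periodogram: noisy at every frequency
freq_avg, p_avg = signal.welch(x, nperseg=256) # segment-averaged estimate: reduced variance

print("raw periodogram points:", len(p_raw), " averaged points:", len(p_avg))
```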