Wavelets are mathematical tools for analyzing time series or images (although not exclusively so: for examples of usage in other applications, see Stollnitz et al., 1996, and Sweldens, 1996). Our discussion of wavelets in this book focuses on their use with time series, which we take to be any sequence of observations associated with an ordered independent variable t (the variable t can assume either a discrete set of values such as the integers or a continuum of values such as the entire real axis - examples of both types include time, depth or distance along a line, so a time series need not actually involve time). Wavelets are a relatively new way of analyzing time series in that the formal subject dates back to the 1980s, but in many aspects wavelets are a synthesis of older ideas with new elegant mathematical results and efficient computational algorithms. Wavelet analysis is in some cases complementary to existing analysis techniques (e.g., correlation and spectral analysis) and in other cases capable of solving problems for which little progress had been made prior to the introduction of wavelets.
Broadly speaking (and with apologies for the play on words!), there have been two main waves of wavelets. The first wave resulted in what is known as the continuous wavelet transform (CWT), which is designed to work with time series defined over the entire real axis; the second, in the discrete wavelet transform (DWT), which deals with series defined essentially over a range of integers (usually t = 0, 1, …, N − 1, where N denotes the number of values in the time series). In this chapter we introduce and motivate wavelets via the CWT.
Here we introduce the discrete wavelet transform (DWT), which is the basic tool needed for studying time series via wavelets and plays a role analogous to that of the discrete Fourier transform in spectral analysis. We assume only that the reader is familiar with the basic ideas from linear filtering theory and linear algebra presented in Chapters 2 and 3. Our exposition builds slowly upon these ideas and hence is more detailed than necessary for readers with strong backgrounds in these areas. We encourage such readers just to use the Key Facts and Definitions in each section or to skip directly to Section 4.12 – this has a concise self-contained development of the DWT. For complementary introductions to the DWT, see Strang (1989, 1993), Rioul and Vetterli (1991), Press et al. (1992) and Mulcahy (1996).
The remainder of this chapter is organized as follows. Section 4.1 gives a qualitative description of the DWT using primarily the Haar and D(4) wavelets as examples. The formal mathematical development of the DWT begins in Section 4.2, which defines the wavelet filter and discusses some basic conditions that a filter must satisfy to qualify as a wavelet filter. Section 4.3 presents the scaling filter, which is constructed in a simple manner from the wavelet filter. The wavelet and scaling filters are used in parallel to define the pyramid algorithm for computing (and precisely defining) the DWT – various aspects of this algorithm are presented in Sections 4.4, 4.5 and 4.6.
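As a concrete preview, the basic building block of the pyramid algorithm can be sketched in a few lines; the Haar filter values and the sign/indexing conventions below are one common choice and not necessarily those adopted in the chapter, and the input series is invented for illustration.

```python
import math

# One stage of the DWT pyramid algorithm using the Haar filters. The
# filter values and the sign/indexing conventions are one common choice
# and may differ from those adopted in the chapter.
g = [1 / math.sqrt(2), 1 / math.sqrt(2)]   # scaling (low-pass) filter
h = [1 / math.sqrt(2), -1 / math.sqrt(2)]  # wavelet (high-pass) filter

def pyramid_step(x):
    """Filter x with g and h, then downsample by two."""
    scaling = [g[0] * x[2 * t] + g[1] * x[2 * t + 1] for t in range(len(x) // 2)]
    wavelet = [h[0] * x[2 * t] + h[1] * x[2 * t + 1] for t in range(len(x) // 2)]
    return scaling, wavelet

x = [1.0, 3.0, 2.0, 4.0]                   # a made-up series of length 4
v, w = pyramid_step(x)
# Orthonormality of the filters means the step preserves energy:
assert abs(sum(c * c for c in v + w) - sum(c * c for c in x)) < 1e-12
```

Applying the same step recursively to the scaling output is what produces the multi-level transform.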
The last decade has seen an explosion of interest in wavelets, a subject area that has coalesced from roots in mathematics, physics, electrical engineering and other disciplines. As a result, wavelet methodology has had a significant impact in areas as diverse as differential equations, image processing and statistics. This book is an introduction to wavelets and their application in the analysis of discrete time series typical of those acquired in the physical sciences. While we present a thorough introduction to the basic theory behind the discrete wavelet transform (DWT), our goal is to bridge the gap between theory and practice by
• emphasizing what the DWT actually means in practical terms;
• showing how the DWT can be used to create informative descriptive statistics for time series analysts;
• discussing how stochastic models can be used to assess the statistical properties of quantities computed from the DWT; and
• presenting substantive examples of wavelet analysis of time series representative of those encountered in the physical sciences.
To date, most books on wavelets describe them in terms of continuous functions and often introduce the reader to a plethora of different types of wavelets. We concentrate on developing wavelet methods in discrete time via standard filtering and matrix transformation ideas.
The continuous time wavelet transform is becoming a well-established tool for multiple scale representation of a continuous time ‘signal,’ which by definition is a finite energy function defined over the entire real axis. This transform essentially correlates a signal with ‘stretched’ versions of a wavelet function (in essence a continuous time band-pass filter) and yields a multiresolution representation of the signal. In this chapter we summarize the important ideas and results for the multiresolution view of the continuous time wavelet transform. Our primary intent is to demonstrate the close relationship between continuous time wavelet analysis and the discrete time wavelet analysis presented in Chapter 4. To make this connection, we adopt a formalism that allows us to bridge the gap between the inner product convention used in mathematical discussions on wavelets and the filtering convention favored by engineers. For simplicity we deal only with signals, scaling functions and wavelet functions that are all taken to be real-valued. Only the case of dyadic wavelet analysis (where the scaling factor in the dilation of the basis function takes the value of two) is considered here.
As we saw in Chapters 4 and 5, one important use for the discrete wavelet transform (DWT) and its variant, the maximal overlap DWT (MODWT), is to decompose the sample variance of a time series on a scale-by-scale basis. In this chapter we explore wavelet-based analysis of variance (ANOVA) in more depth by defining a theoretical quantity known as the wavelet variance (sometimes called the wavelet spectrum). This theoretical variance can be readily estimated based upon the DWT or MODWT and has been successfully used in a number of applications; see, for example, Gamage (1990), Bradshaw and Spies (1992), Flandrin (1992), Gao and Li (1993), Hudgins et al. (1993), Kumar and Foufoula-Georgiou (1993, 1997), Tewfik et al. (1993), Wornell (1993), Scargle (1997), Torrence and Compo (1998) and Carmona et al. (1998). The definition for the wavelet variance and rationales for considering it are given in Section 8.1, after which we discuss a few of its basic properties in Section 8.2. We consider in Section 8.3 how to estimate the wavelet variance given a time series that can be regarded as a realization of a portion of length N of a stochastic process with stationary backward differences. We investigate the large sample statistical properties of wavelet variance estimators and discuss methods for determining an approximate confidence interval for the true wavelet variance based upon the estimated wavelet variance (Section 8.4).
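As a toy illustration of the scale-by-scale idea, the following sketch decomposes the sample variance of a short series with a full Haar DWT. The transform conventions and the series itself are our own invention, not an example from the text; the decomposition rests on the orthonormality of the transform.

```python
import math

def haar_dwt(x):
    """Full Haar DWT of a series of dyadic length N = 2^J (one common
    convention): returns the wavelet coefficients for each level and the
    single remaining scaling coefficient."""
    levels, v = [], list(x)
    while len(v) > 1:
        w = [(v[2 * t] - v[2 * t + 1]) / math.sqrt(2) for t in range(len(v) // 2)]
        v = [(v[2 * t] + v[2 * t + 1]) / math.sqrt(2) for t in range(len(v) // 2)]
        levels.append(w)
    return levels, v[0]

x = [2.0, 4.0, 3.0, 7.0, 5.0, 1.0, 6.0, 4.0]   # invented series, N = 8
n = len(x)
levels, v_last = haar_dwt(x)

# Sum of squared wavelet coefficients at each level, divided by N, gives
# the contribution of each scale to the sample variance (divisor N):
mean = sum(x) / n
sample_var = sum((xi - mean) ** 2 for xi in x) / n
per_scale = [sum(c * c for c in w) / n for w in levels]
assert abs(sum(per_scale) - sample_var) < 1e-12
```

The single scaling coefficient captures the sample mean, so the wavelet coefficients account for exactly the variance.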
In Chapter 4 we discussed the discrete wavelet transform (DWT), which essentially decomposes a time series X into coefficients that can be associated with different scales and times. We can thus regard the DWT of X as a ‘time/scale’ decomposition. The wavelet coefficients for a given scale τj ≡ 2^(j−1) tell us how localized weighted averages of X vary from one averaging period to the next. The scale τj gives us the effective width in time (i.e., degree of localization) of the weighted averages. Because the DWT can be formulated in terms of filters, we can relate the notion of scale to certain bands of frequencies. The equivalent filter that yields the wavelet coefficients for scale τj is approximately a band-pass filter with a pass-band given by [1/2^(j+1), 1/2^j]. For a sample size N = 2^J, the N − 1 wavelet coefficients constitute - when taken together - an octave band decomposition of the frequency interval [1/2^(J+1), 1/2], while the single scaling coefficient is associated with the interval [0, 1/2^(J+1)]. Taken as a whole, the DWT coefficients thus decompose the frequency interval [0, 1/2] into adjacent individual intervals.
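The nominal pass-bands can be tabulated directly. The snippet below (with J = 3 chosen arbitrarily) simply checks that the octave bands together with the scaling band tile the interval [0, 1/2].

```python
# Nominal pass-bands of the DWT coefficients for J = 3 levels
# (frequencies standardized so that the Nyquist frequency is 1/2).
J = 3
bands = [(1 / 2 ** (j + 1), 1 / 2 ** j) for j in range(1, J + 1)]
scaling_band = (0.0, 1 / 2 ** (J + 1))

# Ordered from low to high frequency, the intervals tile [0, 1/2]:
edges = [scaling_band] + bands[::-1]
assert all(a[1] == b[0] for a, b in zip(edges, edges[1:]))
assert edges[0][0] == 0.0 and edges[-1][1] == 0.5
```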
In this chapter we consider the discrete wavelet packet transform (DWPT), whichcan be regarded as any one of a collection of orthonormal transforms, each of which can be readilycomputed using a very simple modification of the pyramid algorithm for the DWT.
The chi-square statistic for testing hypotheses concerning multinomial distributions derives its name from the asymptotic approximation to its distribution. Two important applications are the testing of independence in a two-way classification and the testing of goodness-of-fit. In the second application the multinomial distribution is created artificially by grouping the data, and the asymptotic chi-square approximation may be lost if the original data are used to estimate nuisance parameters.
Quadratic Forms in Normal Vectors
The chi-square distribution with k degrees of freedom is (by definition) the distribution of X_1^2 + ... + X_k^2 for i.i.d. N(0, 1)-distributed variables X_1, ..., X_k. The sum of squares is the squared norm ||X||^2 of the standard normal vector X = (X_1, ..., X_k). The following lemma gives a characterization of the distribution of the norm of a general zero-mean normal vector.
17.1 Lemma. If the vector X is N_k(0, Σ)-distributed, then ||X||^2 is distributed as sum_i d_i Z_i^2 for i.i.d. N(0, 1)-distributed variables Z_1, ..., Z_k, with d_1, ..., d_k the eigenvalues of Σ.
Proof. There exists an orthogonal matrix O such that O Σ O^T = diag(d_1, ..., d_k). Then the vector OX is N_k(0, diag(d_1, ..., d_k))-distributed, which is the same as the distribution of the vector (√d_1 Z_1, ..., √d_k Z_k). Consequently ||X||^2 = ||OX||^2 has the same distribution as sum_i d_i Z_i^2.
The distribution of a quadratic form of the type sum_i d_i Z_i^2 is complicated in general. However, in the case that every d_i is either 0 or 1, it reduces to a chi-square distribution. If this is not naturally the case in an application, then a statistic is often transformed to achieve this desirable situation. The definition of the Pearson statistic illustrates this.
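The role of the eigenvalues can be checked numerically in a small case. The 2x2 covariance matrix below is invented for illustration, and the eigenvalues are computed by hand from the characteristic polynomial.

```python
import math

# Eigenvalues of the invented 2x2 covariance matrix [[a, b], [b, c]],
# computed from the characteristic polynomial. For X ~ N_2(0, Sigma),
# ||X||^2 is then distributed as d1*Z1^2 + d2*Z2^2.
a, b, c = 2.0, 0.5, 1.0
trace, det = a + c, a * c - b * b
disc = math.sqrt(trace ** 2 - 4 * det)
d1, d2 = (trace + disc) / 2, (trace - disc) / 2

# The eigenvalues sum to the trace, so E||X||^2 = d1 + d2 = trace(Sigma).
assert abs(d1 + d2 - trace) < 1e-12
assert abs(d1 * d2 - det) < 1e-12
```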
Pearson Statistic
Suppose that we observe a vector X_n = (X_n1, ..., X_nk) with the multinomial distribution corresponding to n trials and k classes having probabilities p = (p_1, ..., p_k). The Pearson statistic for testing the null hypothesis H_0: p = a is given by C_n = sum_i (X_ni − n a_i)^2 / (n a_i), the sum taken over i = 1, ..., k.
We shall show that the sequence C_n converges in distribution to a chi-square distribution with k − 1 degrees of freedom if the null hypothesis is true. The practical relevance is that we can use the chi-square table to find critical values for the test.
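The statistic itself is elementary to compute. In the following sketch the counts and null probabilities are made up, and the statistic takes the standard form: the sum over classes of (observed − expected)^2 / expected.

```python
# Pearson statistic for an invented multinomial sample: n = 100 trials
# over k = 3 classes, observed counts x, null probabilities a.
x = [30, 50, 20]
a = [0.25, 0.50, 0.25]
n = sum(x)
pearson = sum((xi - n * ai) ** 2 / (n * ai) for xi, ai in zip(x, a))
# (30-25)^2/25 + (50-50)^2/50 + (20-25)^2/25 = 2.0; this would be
# compared with chi-square critical values on k - 1 = 2 degrees of freedom.
assert abs(pearson - 2.0) < 1e-12
```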
In this chapter we derive the asymptotic distribution of estimators of quantiles from the asymptotic distribution of the corresponding estimators of a distribution function. Empirical quantiles are an example, and hence we also discuss some results concerning order statistics. Furthermore, we discuss the asymptotics of the median absolute deviation, which is the empirical 1/2-quantile of the observations centered at their 1/2-quantile.
Weak Consistency
The quantile function of a cumulative distribution function F is the generalized inverse F^{-1}: (0, 1) → R given by F^{-1}(p) = inf{x : F(x) ≥ p}.
It is a left-continuous function with range equal to the support of F and hence is often unbounded. The following lemma records some useful properties.
Proof. The proofs of the inequalities in (i) through (iv) are best given by a picture. The equalities in (v) follow from (ii) and (iv) and the monotonicity of F and F^{-1}; this proves the first statement, and the second is immediate from the inequalities in (ii) and (iii). Statement (vi) follows from (i) and the definition of F^{-1}. Consequences of (ii) and (iv) are that F^{-1} is strictly increasing (i.e., has no flat parts) if and only if F is continuous. Thus F^{-1} is a proper inverse if and only if F is both continuous and strictly increasing, as one would expect.
By (i) the random variable F^{-1}(U) has distribution function F if U is uniformly distributed on [0, 1]. This is called the quantile transformation. On the other hand, by (i) and (ii) the variable F(X) is uniformly distributed on [0, 1] if and only if X has a continuous distribution function F. This is called the probability integral transformation.
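Both transformations are easy to illustrate for a small discrete distribution. The law on {1, 2, 3} below is an invented example, and the generalized inverse is taken as F^{-1}(p) = inf{x : F(x) >= p}.

```python
import random

# An invented discrete law on {1, 2, 3} with probabilities 0.2, 0.5, 0.3;
# cum holds F(1), F(2), F(3).
xs = [1, 2, 3]
cum = [0.2, 0.7, 1.0]

def quantile(p):
    """Generalized inverse: F^{-1}(p) = inf{x : F(x) >= p}."""
    for x, c in zip(xs, cum):
        if c >= p:
            return x
    return xs[-1]

assert quantile(0.20) == 1   # F(1) = 0.2 already reaches p
assert quantile(0.21) == 2
assert quantile(0.70) == 2
assert quantile(0.99) == 3

# Quantile transformation: F^{-1}(U) with U uniform has distribution F,
# so the simulated frequency of the outcome 1 should be near 0.2.
random.seed(1)
draws = [quantile(random.random()) for _ in range(10000)]
freq1 = draws.count(1) / len(draws)
assert 0.15 < freq1 < 0.25
```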
A sequence of quantile functions F_n^{-1} is defined to converge weakly to a limit quantile function F^{-1}, denoted F_n^{-1} ⇝ F^{-1}, if F_n^{-1}(p) → F^{-1}(p) at every p where F^{-1} is continuous. This type of convergence is not only analogous in form to the weak convergence of distribution functions, it is the same.
A projection of a random variable is defined as a closest element in a given set of functions. We can use projections to derive the asymptotic distribution of a sequence of variables by comparing these to projections of a simple form. Conditional expectations are special projections. The Hájek projection is a sum of independent variables; it is the leading term in the Hoeffding decomposition.
Projections
A common method to derive the limit distribution of a sequence of statistics T_n is to show that it is asymptotically equivalent to a sequence S_n whose limit behavior is known. The basis of this method is Slutsky's lemma, which shows that the sequence T_n = (T_n − S_n) + S_n converges in distribution to S if T_n − S_n converges in probability to zero and S_n converges in distribution to S.
How do we find a suitable sequence S_n? First, the variables S_n must be of a simple form, because the limit properties of the sequence S_n must be known. Second, S_n must be close enough to T_n. One solution is to search for the closest S_n of a certain predetermined form. In this chapter, “closest” is taken to mean closest in square expectation.
Let T and the random variables in a given set S be defined on the same probability space, all with finite second moments. A random variable Ŝ is called a projection of T onto S if it belongs to S and minimizes E(T − S)^2 over all S in S.
Often S is a linear space in the sense that aS_1 + bS_2 is in S for every a, b ∈ R, whenever S_1, S_2 are in S. In this case Ŝ is the projection of T onto S if and only if T − Ŝ is orthogonal to S for the inner product <S_1, S_2> = E S_1 S_2. This is the content of the following theorem.
Theorem. Let S be a linear space of random variables with finite second moments. Then Ŝ is the projection of T onto S if and only if Ŝ belongs to S and E(T − Ŝ)S = 0 for every S in S.
Every two projections of T onto S are almost surely equal.
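The orthogonality characterization can be verified directly in a finite toy model, representing random variables as vectors over equally likely outcomes; the variables below are our own invented example.

```python
# Random variables represented as vectors over four equally likely
# outcomes (an invented toy model), so that E XY is a scaled dot product.
def inner(u, v):
    return sum(p * q for p, q in zip(u, v)) / len(u)

T = [3.0, 1.0, 4.0, 5.0]
S1 = [1.0, 1.0, 1.0, 1.0]            # the constant variable 1
S2 = [1.0, -1.0, 2.0, 0.0]

# Projection of T onto the span of S1 and S2: solve the normal equations
# E(T - a*S1 - b*S2) Sj = 0 (j = 1, 2) by Cramer's rule.
g11, g12, g22 = inner(S1, S1), inner(S1, S2), inner(S2, S2)
c1, c2 = inner(T, S1), inner(T, S2)
det = g11 * g22 - g12 * g12
a = (c1 * g22 - c2 * g12) / det
b = (g11 * c2 - g12 * c1) / det
proj = [a * u + b * v for u, v in zip(S1, S2)]

# The residual is orthogonal to every element of the span:
resid = [t - p for t, p in zip(T, proj)]
assert abs(inner(resid, S1)) < 1e-12
assert abs(inner(resid, S2)) < 1e-12
```

The normal equations here are exactly the condition E(T − Ŝ)S = 0 of the theorem, specialized to a two-dimensional span.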
This chapter is concerned with statistical models that are indexed by infinite-dimensional parameters. It gives an introduction to the theory of asymptotic efficiency, and discusses methods of estimation and testing.
Introduction
Semiparametric models are statistical models in which the parameter is not a Euclidean vector but ranges over an “infinite-dimensional” parameter set. A different name is “model with a large parameter space.” In the situation in which the observations consist of a random sample from a common distribution P, the model is simply the set P of all possible values of P: a collection of probability measures on the sample space. The simplest type of infinite-dimensional model is the nonparametric model, in which we observe a random sample from a completely unknown distribution. Then P is the collection of all probability measures on the sample space, and, as we shall see and as is intuitively clear, the empirical distribution is an asymptotically efficient estimator for the underlying distribution. More interesting are the intermediate models, which are not “nicely” parametrized by a Euclidean parameter, as are the standard classical models, but do restrict the distribution in an important way. Such models are often parametrized by infinite-dimensional parameters, such as distribution functions or densities, that express the structure under study. Many aspects of these parameters are estimable by the same order of accuracy as classical parameters, and efficient estimators are asymptotically normal. In particular, the model may have a natural parametrization in which θ is a Euclidean parameter and η runs through a nonparametric class of distributions, or some other infinite-dimensional set. This gives a semiparametric model in the strict sense, in which we aim at estimating θ and consider η as a nuisance parameter. More generally, we focus on estimating the value of some function on the model.
In this chapter we extend the theory of asymptotic efficiency, as developed in Chapters 8 and 15, from parametric to semiparametric models and discuss some methods of estimation and testing. Although the efficiency theory (lower bounds) is fairly complete, there are still important holes in the estimation theory. In particular, the extent to which the lower bounds are sharp is unclear.
This chapter is an introduction to estimating densities if the underlying density of a sample of observations is considered completely unknown, up to existence of derivatives. We derive rates of convergence for the mean square error of kernel estimators and show that these cannot be improved. We also consider regularization by monotonicity.
Introduction
Statistical models are called parametric models if they are described by a Euclidean parameter (in a nice way). For instance, the binomial model is described by a single parameter p, and the normal model is given through two unknowns: the mean and the variance of the observations. In many situations there is insufficient motivation for using a particular parametric model, such as a normal model. An alternative at the other end of the scale is a nonparametric model, which leaves the underlying distribution of the observations essentially free. In this chapter we discuss one example of a problem of nonparametric estimation: estimating the density of a sample of observations if nothing is known a priori. From the many methods for this problem, we present two: kernel estimation and monotone estimation. Notwithstanding its simplicity, this method can be fully asymptotically efficient.
Kernel Estimators
The most popular nonparametric estimator of a distribution based on a sample of observations is the empirical distribution, whose properties are discussed at length in Chapter 19. This is a discrete probability distribution and possesses no density. The most popular method of nonparametric density estimation, the kernel method, can be viewed as a recipe to “smooth out” the point masses of size 1/n in order to turn the empirical distribution into a continuous distribution.
Let X_1, ..., X_n be a random sample from a density f on the real line. If we knew that f belongs to the normal family of densities, then the natural estimate of f would be the normal density with mean X̄ and variance S^2, the sample mean and sample variance of the observations.
In this section we suppose that we have no prior knowledge of the form of f and want to “let the data speak as much as possible for themselves.”
Let K be a probability density with mean 0 and variance 1, for instance the standard normal density.
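A minimal sketch of the resulting kernel estimate, fhat(x) = (1/(n h)) sum_i K((x − X_i)/h), with an invented sample and a hand-picked bandwidth:

```python
import math

# Gaussian-kernel density estimate with an invented sample and a
# hand-picked bandwidth h: fhat(x) = (1/(n*h)) * sum_i K((x - X_i)/h).
def K(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi)

sample = [0.2, 0.5, 0.9, 1.4, 1.7]
h = 0.4

def fhat(x):
    return sum(K((x - xi) / h) for xi in sample) / (len(sample) * h)

# fhat is itself a probability density: nonnegative, integrating to one
# (checked by the trapezoidal rule on a wide grid).
grid = [-6 + 14 * i / 2000 for i in range(2001)]
step = grid[1] - grid[0]
area = sum((fhat(u) + fhat(v)) / 2 * step for u, v in zip(grid, grid[1:]))
assert abs(area - 1.0) < 1e-3
```

The choice of bandwidth h governs the smoothness of fhat and drives the rates of convergence discussed in this chapter.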