In this chapter we are concerned with probability models for the mathematical description of random signals. We start with the fundamental concepts of random experiment, random variable, and statistical regularity, and we show how they lead to the concepts of probability, probability distributions, and averages, and to the development of probabilistic models for random signals. Then we introduce the concept of a stationary random process as a model for random signals, and we explain how to characterize the average behavior of such processes using the autocorrelation sequence (time-domain) and the power spectral density (frequency-domain). Finally, we discuss the effect of LTI systems on the autocorrelation and power spectral density of stationary random processes.
Study objectives
After studying this chapter you should be able to:
Understand the concepts of randomness, random experiment, statistical variability, statistical regularity, random variable, probability distributions, and statistical averages like mean and variance.
Understand the concept of correlation between two random variables, its measurement by quantities like covariance and correlation coefficient, and the meaning of covariance in the context of estimating the value of one random variable using a linear function of the value of another random variable.
Understand the concept of a random process and the characterization of its average behavior by the autocorrelation sequence (time-domain) and power spectral density (frequency-domain), develop an insight into the processing of stationary processes by LTI systems, and be able to compute mean, autocorrelation, and power spectral density of the output sequence from that of the input sequence and the impulse response.
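The last objective can be previewed concretely: for a zero-mean white-noise input with unit variance, the output autocorrelation of an LTI system is r_y[l] = sum_n h[n] h[n+l]. The book's computations use Matlab; the following Python sketch (the signal length and impulse response are our own illustrative choices, not the book's) checks this relation numerically:

```python
import numpy as np

rng = np.random.default_rng(0)

# White-noise input: zero mean, unit variance, so r_x[l] = delta[l]
x = rng.standard_normal(100_000)

# A hypothetical FIR system with impulse response h[n]
h = np.array([1.0, 0.5, 0.25])
y = np.convolve(x, h, mode="full")[: len(x)]

# For white input, theory gives r_y[l] = sum_n h[n] h[n+l]
r_y_theory = np.array([np.sum(h[: len(h) - l] * h[l:]) for l in range(len(h))])

# Estimate r_y[l] directly from the output sequence
r_y_est = np.array([np.mean(y[: len(y) - l] * y[l:]) for l in range(len(h))])

print(r_y_theory)   # theoretical values: 1.3125, 0.625, 0.25
print(r_y_est)      # estimates close to the theoretical values
```

The estimated autocorrelation converges to the theoretical one as the record length grows, which is the statistical-regularity idea the chapter builds on.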
During the last three decades Digital Signal Processing (DSP) has evolved into a core area of study in electrical and computer engineering. Today, DSP provides the methodology and algorithms for the solution of a continuously growing number of practical problems in scientific, engineering, and multimedia applications.
Despite the existence of a number of excellent textbooks focusing either on the theory of DSP or on the application of DSP algorithms using interactive software packages, we feel there is a strong need for a book bridging the two approaches by combining the best of both worlds. This was our motivation for writing this book: to help students and practicing engineers understand the fundamental mathematical principles underlying the operation of a DSP method, appreciate its practical limitations, and grasp, in sufficient detail, its practical implementation.
Objectives
The principal objective of this book is to provide a systematic introduction to the basic concepts and methodologies for digital signal processing, based whenever possible on fundamental principles. A secondary objective is to develop a foundation that can be used by students, researchers, and practicing engineers as the basis for further study and research in this field. To achieve these objectives, we have focused on material that is fundamental and where the scope of application is not limited to the solution of specialized problems, that is, material that has a broad scope of application.
In this chapter we introduce the concept of Fourier or frequency-domain representation of signals. The basic idea is that any signal can be described as a sum or integral of sinusoidal signals. However, the exact form of the representation depends on whether the signal is continuous-time or discrete-time and whether it is periodic or aperiodic. The underlying mathematical framework is provided by the theory of Fourier series, introduced by Jean Baptiste Joseph Fourier (1768–1830).
The major justification for the frequency-domain approach is that LTI systems have a simple behavior with sinusoidal inputs: the response of an LTI system to a sinusoid is a sinusoid with the same frequency but different amplitude and phase.
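This sinusoid-in, sinusoid-out behavior is easy to verify numerically. The sketch below, in Python rather than the Matlab the book uses, passes a cosine through a hypothetical two-point averager and compares the output with the sinusoid predicted by the frequency response:

```python
import numpy as np

# A simple FIR system: y[n] = 0.5 x[n] + 0.5 x[n-1] (two-point average)
h = np.array([0.5, 0.5])
w = np.pi / 4                      # input frequency in rad/sample

n = np.arange(200)
x = np.cos(w * n)
y = np.convolve(x, h)[: len(n)]

# Frequency response at w: H(e^{jw}) = sum_n h[n] e^{-jwn}
H = np.sum(h * np.exp(-1j * w * np.arange(len(h))))

# Predicted steady-state output: same frequency, scaled and shifted
y_pred = np.abs(H) * np.cos(w * n + np.angle(H))

# After the short transient, the output matches the prediction exactly
print(np.max(np.abs(y[5:] - y_pred[5:])))
```

The agreement (to numerical precision) shows the output is indeed a sinusoid at the input frequency, with amplitude |H(e^{jw})| and phase shift angle(H(e^{jw})).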
Study objectives
After studying this chapter you should be able to:
Understand the fundamental differences between continuous-time and discrete-time sinusoidal signals.
Evaluate analytically the Fourier representation of continuous-time signals using the Fourier series (periodic signals) and the Fourier transform (aperiodic signals).
Evaluate analytically and numerically the Fourier representation of discrete-time signals using the Fourier series (periodic signals) and the Fourier transform (aperiodic signals).
Choose the proper mathematical formulas to determine the Fourier representation of any signal based on whether the signal is continuous-time or discrete-time and whether it is periodic or aperiodic.
Understand the use and implications of the various properties of the discrete-time Fourier transform.
Signal processing is a discipline concerned with the acquisition, representation, manipulation, and transformation of signals required in a wide range of practical applications. In this chapter, we introduce the concepts of signals, systems, and signal processing. We first discuss different classes of signals, based on their mathematical and physical representations. Then, we focus on continuous-time and discrete-time signals and the systems required for their processing: continuous-time systems, discrete-time systems, and interface systems between these classes of signal. We continue with a discussion of analog signal processing, digital signal processing, and a brief outline of the book.
Study objectives
After studying this chapter you should be able to:
Understand the concept of signal and explain the differences between continuous-time, discrete-time, and digital signals.
Explain how the physical representation of signals influences their mathematical representation and vice versa.
Explain the concepts of continuous-time and discrete-time systems and justify the need for interface systems between the analog and digital worlds.
Recognize the differences between analog and digital signal processing and explain the key advantages of digital over analog processing.
Signals
For our purposes a signal is defined as any physical quantity that varies as a function of time, space, or any other variable or variables. Signals convey information in their patterns of variation. The manipulation of this information involves the acquisition, storage, transmission, and transformation of signals.
There are many signals that could be used as examples in this section. However, we shall restrict our attention to a few signals that illustrate several important concepts and that will be useful in later chapters.
A key feature of the discrete-time systems discussed so far is that the signals at the input, output, and every internal node have the same sampling rate. However, many practical applications either require different sampling rates or can be implemented more efficiently by processing signals at different sampling rates. Discrete-time systems with different sampling rates at various parts of the system are called multirate systems. The practical implementation of multirate systems requires changing the sampling rate of a signal using discrete-time operations, that is, without reconstructing and resampling a continuous-time signal. The fundamental operations for changing the sampling rate are decimation and interpolation. The subject of this chapter is the analysis, design, and efficient implementation of decimation and interpolation systems, and their application to two important areas of multirate signal processing: sampling rate conversion and multirate filter banks.
Study objectives
After studying this chapter you should be able to:
Understand the operations of decimation, interpolation, and arbitrary sampling rate change in the time and frequency domains.
Understand the efficient implementation of discrete-time systems for sampling rate conversion using polyphase structures.
Design Nyquist filters, a special type of filter widely used for the efficient implementation of multirate filters and filter banks.
Understand the operation, properties, and design of two-channel filter banks with perfect reconstruction analysis and synthesis capabilities.
Sampling rate conversion
The need for sampling rate conversion arises in many practical applications, including digital audio, communication systems, image processing, and high-definition television.
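The two fundamental operations behind sampling rate conversion, decimation and interpolation, can be sketched in a few lines. The book's examples use Matlab; here is an illustrative Python version (the test sequence and the factors M and L are our own choices):

```python
import numpy as np

x = np.arange(8, dtype=float)   # a short test sequence

# Downsampling by M: keep every Mth sample. In a full decimator this
# is preceded by an anti-aliasing lowpass filter.
M = 2
x_down = x[::M]                 # samples 0, 2, 4, 6

# Upsampling by L: insert L-1 zeros between samples. In a full
# interpolator this is followed by a lowpass (interpolation) filter.
L = 3
x_up = np.zeros(len(x) * L)
x_up[::L] = x                   # first samples: 0, 0, 0, 1, 0, 0, 2, ...

print(x_down)
print(x_up[:9])
```

Cascading an upsampler by L, a lowpass filter, and a downsampler by M changes the rate by the rational factor L/M, which is the structure studied later in the chapter.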
In this chapter we discuss the basic concepts and the mathematical tools that form the basis for the representation and analysis of discrete-time signals and systems. We start by showing how to generate, manipulate, plot, and analyze basic signals and systems using Matlab. Then we discuss the key properties of causality, stability, linearity, and time-invariance, which are possessed by the majority of systems considered in this book. We continue with the mathematical representation, properties, and implementation of linear time-invariant systems. The principal goal is to understand the interaction between signals and systems to the extent that we can adequately predict the effect of a system upon the input signal. This is extremely difficult, if not impossible, for arbitrary systems. Thus, we focus on linear time-invariant systems because they are amenable to a tractable mathematical analysis and have important signal processing applications.
Study objectives
After studying this chapter you should be able to:
Describe discrete-time signals mathematically and generate, manipulate, and plot discrete-time signals using Matlab.
Check whether a discrete-time system is linear, time-invariant, causal, and stable; show that the input-output relationship of any linear time-invariant system can be expressed in terms of the convolution sum formula.
Determine analytically the convolution for sequences defined by simple formulas, write computer programs for the numerical computation of convolution, and understand the differences between stream and block processing.
Determine numerically the response of discrete-time systems described by linear constant-coefficient difference equations.
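As a preview of these objectives, here is a minimal Python sketch (the book itself uses Matlab, and the sequences and coefficients below are our own illustrative choices) of the convolution sum and of solving a simple linear constant-coefficient difference equation by recursion:

```python
import numpy as np

# Convolution sum: y[n] = sum_k x[k] h[n-k]
x = np.array([1.0, 2.0, 3.0])
h = np.array([1.0, 1.0, 1.0])    # three-point moving sum

print(np.convolve(x, h))         # 1, 3, 6, 5, 3

# Direct (block-processing) implementation of the convolution sum
def conv(x, h):
    y = np.zeros(len(x) + len(h) - 1)
    for n in range(len(y)):
        for k in range(len(x)):
            if 0 <= n - k < len(h):
                y[n] += x[k] * h[n - k]
    return y

print(conv(x, h))                # same result as np.convolve

# A linear constant-coefficient difference equation, solved sample by
# sample (stream processing): y[n] = 0.5 y[n-1] + x[n]
def lccde(x):
    y, prev = [], 0.0
    for xn in x:
        prev = 0.5 * prev + xn
        y.append(prev)
    return y

print(lccde([1, 0, 0, 0]))       # impulse response: 1, 0.5, 0.25, 0.125
```

The nested-loop version makes the convolution sum explicit, while the recursive version illustrates how a difference equation generates its output one sample at a time.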
Scientists do mensurative or manipulative experiments to test hypotheses. The result of an experiment will differ from the expected outcome under the null hypothesis because of two things: (a) chance and (b) any effect of the experimental condition or treatment. This concept was illustrated with the chi-square test for nominal scale data in Chapter 6.
Although life scientists work with nominal scale variables, most of the data they collect are measured on a ratio, interval or ordinal scale and are often summarised by calculating a statistic such as the mean (which is also called the average: see Section 3.5). For example, you might have the mean blood pressure of a sample of five astronauts who had spent the previous six months in space and need to know if it differs significantly from the mean blood pressure of the population on Earth. An agricultural scientist might need to know if the mean weight of tomatoes differs significantly between two or more fertiliser treatments. If you knew the range of values within which 95% of the means of samples taken from a particular population were likely to occur, then a sample mean within this range would be considered non-significant and one outside this range would be considered significant. This chapter explains how a common property of many variables measured on a ratio, interval or ordinal scale can be used for significance testing.
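The 95% range idea can be illustrated numerically. Suppose (purely hypothetically; these values are not from the text) the population mean and standard deviation of blood pressure are known, and samples of size five are drawn. Under the normal approximation, 95% of sample means fall within about 1.96 standard errors of the population mean:

```python
import math

# Hypothetical population parameters, chosen for illustration only
mu, sigma, n = 120.0, 15.0, 5

# Standard error of the mean for samples of size n
sem = sigma / math.sqrt(n)

# 95% of sample means are expected within mu +/- 1.96 * SEM
lower, upper = mu - 1.96 * sem, mu + 1.96 * sem
print(round(lower, 1), round(upper, 1))   # 106.9 133.1
```

A sample mean inside this range would be considered non-significant; one outside it would be considered significant at the 5% level.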
Life scientists often collect data that can be assigned to two or more discrete and mutually exclusive categories. For example, a sample of 20 humans can be partitioned into two categories of ‘right-handed’ or ‘left-handed’ (because even people who claim to be ambidextrous still perform a greater proportion of actions with one hand and can be classified as having a dominant right or left hand). These two categories are discrete, because there is no intermediate state, and mutually exclusive, because a person cannot be assigned to both. They also make up the entire set of possible outcomes within the sample and are therefore contingent upon each other, because for a fixed sample size a decrease in the number in one category must be accompanied by an increase in the number in the other and vice versa. These are nominal scale data (Chapter 3). The questions researchers ask about these data are the sort asked about any sample(s) from a population.
First, you may need to know the probability a sample has been taken from a population having a known or expected proportion within each category. For example, the proportion of left-handed people in the world is close to 0.1 (10%), which can be considered the proportion in the population because it is from a sample of several million people. A biomedical scientist, who suspected the proportion of left- and right-handed people showed some variation among occupations, sampled 20 statisticians and found that four were left-handed and 16 right-handed. The question is whether the proportions in the sample were significantly different from the expected proportions of 0.1 and 0.9, respectively. The difference between the population and the sample might be solely due to chance or also reflect career choice.
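The probability in this example can be computed exactly from the binomial distribution. A minimal Python sketch (the helper name binom_pmf is our own, and the one-tailed calculation below is just one way to frame the question):

```python
from math import comb

# Exact binomial probability of observing k "successes" (left-handers)
# in a sample of n, when the population proportion is p
def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 20, 0.1

# One-tailed probability of 4 or more left-handers out of 20
p_value = 1 - sum(binom_pmf(k, n, p) for k in range(4))
print(round(p_value, 3))   # about 0.133: not significant at the 0.05 level
```

With a probability of roughly 0.13 of getting four or more left-handers by chance alone, the sample gives no convincing evidence that statisticians differ from the general population.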
This chapter explains how simple linear regression analysis describes the functional (linear) relationship between a dependent and an independent variable, followed by an introduction to some more complex regression analyses which include fitting curves to non-linear relationships and the use of multiple linear regression to simultaneously model the relationship between a dependent variable and two or more independent ones. If you are using this book for an introductory course, you may only need to study the sections on simple linear regression, but the material in the latter part will be a very useful bridge to more advanced courses and texts.
The uses of correlation and regression were contrasted in Chapter 16. Correlation examines if two variables are related. Regression describes the functional relationship between a dependent variable (which is often called the response variable) and an independent variable (which is often called the predictor variable).
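The least-squares formulas behind simple linear regression can be sketched directly: the slope is the sum of cross-products divided by the sum of squares of the predictor, and the intercept follows from the means. The data below are invented purely for illustration:

```python
# Least-squares fit of y = a + b x (simple linear regression);
# these data are made up for illustration only
x = [1.0, 2.0, 3.0, 4.0, 5.0]   # independent (predictor) variable
y = [2.1, 3.9, 6.2, 7.8, 10.1]  # dependent (response) variable

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Slope: sum of cross-products over sum of squares of x
b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / \
    sum((xi - x_bar) ** 2 for xi in x)

# Intercept: the fitted line passes through (x_bar, y_bar)
a = y_bar - b * x_bar

print(round(b, 3), round(a, 3))   # slope about 1.99, intercept about 0.05
```

The fitted line always passes through the point of means, which is why the intercept can be recovered from the slope and the two sample means.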
One way of generating hypotheses is to collect data and look for patterns. Often, however, it is difficult to see any underlying trend or feature from a set of data which is just a list of numbers. Graphs and descriptive statistics are very useful for summarising and displaying data in ways that may reveal patterns. This chapter describes the different types of data you are likely to encounter and discusses ways of displaying them.
Variables, experimental units and types of data
The particular attributes you measure when you collect data are called variables (e.g. body temperature, the numbers of a particular species of beetle per broad bean pod, the amount of fungal damage per leaf or the numbers of brown and albino mice). These data are collected from each experimental or sampling unit, which may be an individual (e.g. a human or a whale) or a defined item (e.g. a square metre of the seabed, a leaf or a lake). If you only measure one variable per experimental unit, the data set is univariate. Data for two variables per unit are bivariate. Data for three or more variables measured on the same experimental unit are multivariate.
This chapter describes some non-parametric tests for ratio, interval or ordinal scale univariate data. These tests do not use the predictable normal distribution of sample means, which is the basis of most parametric tests, to assess whether samples are from the same population. Because of this, non-parametric tests are generally not as powerful as their parametric equivalents, but if the data are grossly non-normal and cannot be satisfactorily improved by transformation it is necessary to use a non-parametric test.
Non-parametric tests are often called ‘distribution free tests’, but most nevertheless assume that the samples being analysed are from populations with the same distribution. They should not be used where there are gross differences in distribution (including the variance) among samples and the general rule that the ratio of the largest to smallest sample variance should not exceed 4:1 discussed in Chapter 14 also applies. Many non-parametric tests for ratio, interval or ordinal data calculate a statistic from a comparison of two or more samples and work in the following way.
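The 4:1 variance-ratio rule of thumb mentioned above is easy to check before running a test. A minimal Python sketch with invented sample data:

```python
# Rule of thumb from the text: the ratio of the largest to the smallest
# sample variance should not exceed 4:1. The samples are invented
# purely for illustration.
def variance(sample):
    n = len(sample)
    mean = sum(sample) / n
    # Sample variance: sum of squared deviations over n - 1
    return sum((x - mean) ** 2 for x in sample) / (n - 1)

samples = [
    [4.1, 5.0, 6.2, 5.5],
    [3.9, 4.8, 5.1, 4.4],
    [6.0, 7.5, 5.2, 6.8],
]

variances = [variance(s) for s in samples]
ratio = max(variances) / min(variances)
print(ratio <= 4)   # True for these data: the rule of thumb is satisfied
```

If the ratio exceeded 4, gross differences in spread among the samples would make the comparison suspect even for a non-parametric test.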
A single-factor ANOVA (Chapter 11) is used to analyse univariate data from samples exposed to different levels or aspects of only one factor. For example, it could be used to compare the oxygen consumption of a species of intertidal crab (the response variable) at two or more temperatures (the factor), the growth of brain tumours (the response variable) exposed to a range of drugs (the factor), or the insecticide resistance of a moth (the response variable) from several different locations (the factor).
Often, however, life scientists obtain univariate data for a response variable in relation to more than one factor. Examples of two-factor experiments are the oxygen consumption of an intertidal crab at several combinations of temperature and humidity, the growth of brain tumours exposed to a range of drugs and different levels of radiation therapy, or the insecticide resistance of an agricultural pest from different locations and different host plants.
This chapter gives a summary of the essential concepts of probability needed for the statistical tests covered in this book. More advanced material that will be useful if you go on to higher-level statistics courses is in Section 7.6.
Probability
The probability of any event can only vary between zero (0) and one (1) (which correspond to 0% and 100%). If an event is certain to occur, it has a probability of 1. If an event is certain not to occur, it has a probability of 0.
This chapter is an introduction to four slightly more complex analyses of variance often used by life scientists: two-factor ANOVA without replication, ANOVA for randomised block designs, repeated-measures ANOVA and nested ANOVA. An understanding of these is not essential if you are reading this book as an introduction to biostatistics. If, however, you need to use models that are more complex, the explanations given here are straightforward extensions of the pictorial descriptions in Chapters 11 and 13 and will help with many of the models used to analyse designs that are more complex.
Two-factor ANOVA without replication
This is a special case of the two-factor ANOVA described in Chapter 13. Sometimes an orthogonal experiment with two independent factors has to be done without replication because there is a shortage of experimental subjects or the treatments are very expensive to administer. The simplest case of ANOVA without replication is a two-factor design. You cannot do a single-factor ANOVA without replication.
Sometimes when testing whether two or more samples have come from the same population, you may suspect that the value of the response variable is also affected by a second factor. For example, you might need to compare the concentration of lead in the tissues of rats from a contaminated mine site and several uncontaminated sites. You could do this by trapping 20 rats at the mine and 20 from each of five randomly chosen uncontaminated sites. These data for the response variable ‘lead concentration’ and the factor ‘site’ could be analysed by a single-factor ANOVA (Chapter 11), but there is a potential complication. Heavy metals such as lead often accumulate in the bodies of animals, so the concentration in an individual’s tissues will be determined by (a) the amount of lead it has been exposed to at each site and (b) its age. Therefore, if you have a wide age range of rats in each sample, the variance in lead concentration within each is also likely to be large. This will give a relatively large standard error of the mean and a large error term in ANOVA (Chapter 11).
An example for a comparison between a mine site and a relatively uncontaminated control site is given in Figures 18.1(a) and (b). The lead concentration at a particular age is always higher in a rat from the mine site compared to one from the control, so there appears to be a difference between sites, but a t test or single-factor ANOVA will not show a significant difference because the data are confounded by the wide age range of rats trapped at each site. This second, confounding factor (here, age) is called the covariate.
To generate hypotheses you often sample different groups or places (which is sometimes called a mensurative experiment because you usually measure something, such as height or weight, on each sampling unit) and explore these data for patterns or associations. To test hypotheses, you may do mensurative experiments, or manipulative experiments where you change a condition and observe the effect of that change on each experimental unit (like the experiment with millipedes and light described in Chapter 2). Often you may do several experiments of both types to test a particular hypothesis. The quality of your sampling and the design of your experiment can have an effect upon the outcome and therefore determine whether your hypothesis is rejected or not, so it is absolutely necessary to have an appropriate experimental design.
First, you should attempt to make your measurements as accurate and precise as possible so they are the best estimates of actual values. Accuracy is the closeness of a measured value to the true value. Precision is the ‘spread’ or variability of repeated measures of the same value. For example, a thermometer that consistently gives a reading corresponding to a true temperature (e.g. 20°C) is both accurate and precise. Another that gives a reading consistently higher (e.g. +10°C) than a true temperature is not accurate, but it is very precise. In contrast, a thermometer that gives a reading that fluctuates around a true temperature is not precise and will usually be inaccurate except when it occasionally happens to correspond to the true temperature.
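The thermometer example can be quantified: the bias (mean deviation from the true value) measures inaccuracy, and the spread (standard deviation) of repeated readings measures imprecision. A Python sketch with invented readings mirroring the three thermometers described:

```python
import statistics

# Repeated readings of a true temperature of 20 degrees C; the values
# are invented to mirror the three thermometers in the text
true_temp = 20.0
accurate_precise = [20.0, 20.1, 19.9, 20.0]  # close to truth, low spread
biased_precise   = [30.0, 30.1, 29.9, 30.0]  # consistently +10, low spread
imprecise        = [17.0, 24.0, 19.0, 22.0]  # fluctuates around truth

for readings in (accurate_precise, biased_precise, imprecise):
    bias = statistics.mean(readings) - true_temp    # inaccuracy
    spread = statistics.stdev(readings)             # imprecision
    print(round(bias, 2), round(spread, 2))
```

The first thermometer has near-zero bias and small spread (accurate and precise), the second has a large bias but the same small spread (inaccurate but precise), and the third has little bias on average but a large spread (imprecise, and usually inaccurate on any single reading).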