In various signal processing and communication applications, multiple signals can coexist, and a receiver has to detect or estimate multiple signals (or possibly their features) simultaneously. For example, we can consider a multiple-speaker identification system that attempts to identify multiple speakers’ voices simultaneously. Another example is a multi-sensor system for multiple-signal classification in radar and sonar applications. The notion of signal combining in Chapter 4 can be extended to the case of multiple signals. Since other signals coexist, the signal combiner plays a crucial role not only in combining multiple observations, but also in mitigating interference from the other signals.
In this chapter, we discuss optimal signal combining to estimate multiple signals simultaneously when multiple observations or received signals are available. Various well-known optimal combiners are introduced. In particular, we mainly focus on the MMSE combiner, as it is widely used and has several important properties. In signal combining or estimation, no particular constraint is imposed on the transmitted signals, while in Chapter 7 we will discuss signal detection (not estimation) for multiple signals under the assumption that each signal is an element of a signal alphabet or constellation, which becomes a crucial constraint in signal detection.
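As a rough illustration of how an MMSE combiner jointly estimates multiple signals, the sketch below builds the combiner W = Rx Hᵀ(H Rx Hᵀ + Rn)⁻¹ for a linear model y = Hx + n. The channel matrix, signal covariance, and noise level are illustrative assumptions, not values from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

K, N = 2, 4                      # K signal sources, N receive antennas (assumed)
H = rng.standard_normal((N, K))  # hypothetical channel matrix
Rx = np.eye(K)                   # unit-power, uncorrelated sources
sigma2 = 0.1                     # assumed noise variance
Rn = sigma2 * np.eye(N)

# MMSE combiner: W = Rx H^T (H Rx H^T + Rn)^{-1}
W = Rx @ H.T @ np.linalg.inv(H @ Rx @ H.T + Rn)

x = rng.standard_normal(K)                             # transmitted signals
y = H @ x + np.sqrt(sigma2) * rng.standard_normal(N)   # received vector
x_hat = W @ y                                          # joint estimate of all K signals
```

Note that only second-order statistics (Rx, Rn, and H) are needed to form W, which is the point made about linear MMSE combining above.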
Systems with multiple signals
Suppose that there are K signal sources transmitting signals simultaneously through a common channel to a receiver as shown in Fig. 6.1. The signals generated by the multiple sources may or may not be correlated.
Statistical hypothesis testing is a process of accepting or rejecting a hypothesis based on observations, where multiple hypotheses are proposed to characterize the observations. Taking observations as realizations of a certain random variable, each hypothesis can be described by a different probability distribution of the random variable. Under a certain criterion, a hypothesis can be accepted for given observations. Signal detection is an application of statistical hypothesis testing.
In this chapter, we present an overview of signal detection and introduce key techniques for performance analysis. We mainly focus on fundamentals of signal detection in this chapter, while various signal detection problems and detection algorithms will be discussed in later chapters (e.g. Chapters 7, 8, and 9).
Elements of hypothesis testing
There are three key elements in statistical hypothesis testing: (i) observation(s); (ii) a set of hypotheses; and (iii) prior information. With these key elements, the decision process or hypothesis testing can be illustrated as in Fig. 2.1.
In statistical hypothesis testing, both observations and prior information are important and should be taken into account. However, in some cases no prior information is available, or the prior information could be useless. In such cases, statistical hypothesis testing relies only on observations.
Suppose that there are M (≥ 2) hypotheses. Then we have an M-ary hypothesis testing problem, in which we choose one of the M hypotheses that are proposed to characterize the observations and prior information under a certain performance criterion. There are various hypothesis tests or decision rules, depending on the criterion.
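A minimal sketch of one such decision rule, the maximum a posteriori (MAP) test for M hypotheses, is shown below. The Gaussian observation model, the hypothesised means, and the priors are illustrative assumptions, not from the text; with equal priors the rule reduces to maximum likelihood.

```python
import numpy as np

# M-ary hypothesis test: hypothesis H_m says y ~ Normal(mu_m, 1).
# The MAP rule picks argmax_m  pi_m * p(y | H_m).
mus = np.array([-2.0, 0.0, 2.0])      # M = 3 hypothesised means (assumed)
priors = np.array([0.5, 0.25, 0.25])  # prior probabilities (assumed)

def map_decide(y):
    log_lik = -0.5 * (y - mus) ** 2   # Gaussian log-likelihood up to a constant
    return int(np.argmax(np.log(priors) + log_lik))

# An observation near 2 should be attributed to the third hypothesis
decision = map_decide(1.9)
```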
For better signal reception, it is often desirable to use multiple sensors or antennas at a receiver. (Note that we regard receive antenna and sensor as interchangeable terms; both refer to a device that can receive signals through a certain channel medium. For convenience, however, we prefer antenna throughout the book, with wireless communication applications in mind.) To extract a signal of interest, the multiple signals received by multiple antennas are to be properly combined. For signal combining, we need to take into account the desired signal's (statistical or deterministic) properties as well as the statistical properties of the background noise.
Although there are various signal combining techniques, we focus on linear combining techniques in this chapter, because they can be relatively easily implemented and their analysis is tractable. In addition, only second-order moments of a desired signal and noise are usually required to find a linear combiner under the MMSE criterion.
Signals in space
Suppose that there are N sensors or antennas to receive a signal of interest generated from a source, which can be a radio signal or a voice. In general, the signal is received through a certain channel medium with channel attenuation or distortion and corrupted by noise. Since multiple observations of a signal are available using multiple sensors or antennas, the signal can be seen as a vector in a vector space as illustrated in Fig. 4.1.
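One common linear combiner for this vector view is maximum-ratio combining, sketched below purely as an illustration (the chapter's own development is more general); the channel vector and noise variance are assumed for the example.

```python
import numpy as np

rng = np.random.default_rng(1)

N = 4                        # number of receive antennas (assumed)
h = rng.standard_normal(N)   # hypothetical channel vector
sigma2 = 0.5                 # assumed noise variance

# Maximum-ratio combining: weight each observation by its channel gain,
# w = h / ||h||^2, so that w^T h = 1 (the desired signal passes undistorted).
w = h / (h @ h)

s = 1.0                                                # desired scalar signal
r = h * s + np.sqrt(sigma2) * rng.standard_normal(N)   # vector observation
s_hat = w @ r                                          # combined estimate
```

The key point matches the text: the combiner uses the desired signal's (deterministic) channel vector together with the noise statistics, here the simplest white-noise case.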
Digital television is a multibillion-dollar industry with commercial systems now being deployed worldwide. In this concise yet detailed guide, you will learn about the standards that apply to fixed-line and mobile digital television, as well as the underlying principles involved. The digital television standards are presented to aid understanding of new systems in the market and reveal the variations between different systems used throughout the world. Discussions of source and channel coding then provide the essential knowledge needed for designing reliable new systems. Throughout the book the theory is supported by over 200 figures and tables, whilst an extensive glossary defines practical terminology. This is an ideal reference for practitioners in the field of digital television. It will also appeal to graduate students and researchers in electrical engineering and computer science, and can be used as a textbook for graduate courses on digital television systems.
This authoritative guide is the first to provide a complete system design perspective based on existing international standards and state-of-the-art networking and infrastructure technologies, from theoretical analyses to practical design considerations. The four most critical components involved in a multimedia networking system - data compression, quality of service (QoS), communication protocols, and effective digital rights management - are intensively addressed. Many real-world commercial systems and prototypes are also introduced, as are software samples and integration examples, allowing readers to understand practical tradeoffs in the design of multimedia architectures, and get hands-on experience learning the methodologies and procedures. Balancing just the right amount of theory with practical design and integration knowledge, this book is ideal for graduate students and researchers in electrical engineering and computer science, and also for practitioners in the communications and networking industry. It can also be used as a textbook for specialized graduate-level courses on multimedia networking.
Compression for Multimedia was primarily developed as class notes for my course on techniques for compression of data, speech, music, pictures, and video that I have been teaching for more than 10 years at the University of Aerospace Instrumentation, St Petersburg.
During spring 2005 I worked at Lund University as the Lise Meitner Visiting Professor. I have used part of this time to thoroughly revise and substantially extend my previous notes, resulting in the present version.
I would also like to mention that this task could not have been fulfilled without support. Above all, I am indebted to my colleague and husband Boris Kudryashov. Without our collaboration I would not have reached my view of how various compression techniques could be developed and should be taught. Boris' help in solving many TEX problems was invaluable. Special thanks go to Grigory Tenengolts who supported our research and development of practical methods for multimedia compression. Finally, I am grateful to Rolf Johannesson who proposed me as a Lise Meitner Visiting Professor and, needless to say, to the Engineering faculty of Lund University who made his recommendation come true! Rolf also suggested that I should give an undergraduate course on compression for multimedia at Lund University, develop these notes, and eventually publish them as a book. Thanks!
Rate distortion theory is the part of information theory which studies data compression with a fidelity criterion. In this chapter we consider the notion of the rate-distortion function, which is a theoretical limit on quantizer performance. The Blahut algorithm for finding the rate-distortion function numerically is given. In order to compare the performances of different quantizers, some results of high-resolution quantization theory are discussed. A comparison of quantization procedures for sources with the generalized Gaussian distribution is also performed.
Rate-distortion function
Each quantization procedure is characterized by the average distortion D and by the quantization rate R. The goal of compression system design is to optimize the rate-distortion tradeoff. In order to compare different quantizers, the rate-distortion function R(D) (Cover and Thomas 1971) is introduced. Our goal is to find the best quantization procedure for a given source. We say that, for a given source at a given distortion D = D0, a quantization procedure with rate-distortion function R1(D) is better than another with rate-distortion function R2(D) if R1(D0) ≤ R2(D0). Unfortunately, it is often difficult to identify the best quantization procedure. The reason is that the best quantizer can have very high computational complexity, or may even be unknown. On the other hand, it is possible to find the best rate-distortion function without finding the best quantization procedure. This theoretical lower limit on the rate at a given distortion is provided by the information rate-distortion function (Cover and Thomas 1971).
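Since the Blahut algorithm is mentioned above, here is a compact sketch of it for a discrete source. The binary symmetric source with Hamming distortion at the end is an illustrative test case chosen because its rate-distortion function has the known closed form R(D) = 1 − h(D); the slope value s = −3 is an arbitrary choice of operating point.

```python
import numpy as np

def blahut_rd(p, d, s, n_iter=500):
    """Compute one (D, R) point on the rate-distortion curve of a discrete
    source by the Blahut algorithm.  p is the source distribution, d[x, y]
    the distortion matrix, and s <= 0 a slope parameter that traces the
    curve.  R is returned in bits."""
    q = np.full(d.shape[1], 1.0 / d.shape[1])   # reproduction distribution q(y)
    for _ in range(n_iter):
        A = q * np.exp(s * d)                   # q(y) * exp(s * d(x, y))
        Q = A / A.sum(axis=1, keepdims=True)    # test channel Q(y | x)
        q = p @ Q                               # re-estimate the marginal on Y
    D = float(np.sum(p[:, None] * Q * d))
    R = float(np.sum(p[:, None] * Q * np.log2(Q / q)))
    return D, R

# Binary symmetric source with Hamming distortion
p = np.array([0.5, 0.5])
d = np.array([[0.0, 1.0], [1.0, 0.0]])
D, R = blahut_rd(p, d, s=-3.0)
```

The returned pair can be checked against the closed form: R should equal 1 − h(D) for the computed D, which is exactly the self-consistency the algorithm converges to.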
Multimedia data can be considered as data observed at the output of a source with memory. We sometimes say that speech and images have considerable redundancy, meaning the statistical correlation or dependence between the samples of such sources, which is referred to as memory in the information theory literature. Scalar quantization does not exploit this redundancy or memory. As was shown in Chapter 3, scalar quantization for sources with memory provides a rate-distortion function which is rather far from the achievable rate-distortion function H(D) for a given source. Vector quantization can attain better rate-distortion performance, but usually at the cost of significantly increased computational complexity. Another approach, which leads to a better rate-distortion function while preserving rather low computational complexity, combines linear processing with scalar quantization: first we remove redundancy from the data, and then apply scalar quantization to the output of the resulting memoryless source. The outputs of this memoryless source can also be vector quantized, with lower average distortion but higher computational complexity. The two most important approaches of this kind are predictive coding and transform coding (Jayant and Noll 1984). The first is mainly used for speech compression, and the second is applied to image, audio, and video coding. In this chapter, we consider predictive coding systems, which use time-domain operations to remove redundancy and thereby reduce the bit-rate for given quantization error levels.
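The gain from predictive coding can be sketched on a synthetic source with memory (the first-order autoregressive model and its coefficient below are assumptions for illustration): subtracting a linear prediction leaves a residual with far smaller variance than the source itself, so a scalar quantizer needs fewer bits for the same distortion.

```python
import numpy as np

rng = np.random.default_rng(2)

# AR(1) source with memory: x[n] = a * x[n-1] + w[n]  (a assumed)
a, n = 0.95, 10_000
w = rng.standard_normal(n)
x = np.zeros(n)
for i in range(1, n):
    x[i] = a * x[i - 1] + w[i]

# First-order linear prediction removes most of the redundancy:
# the residual e[n] = x[n] - a * x[n-1] is (approximately) memoryless
# and has much smaller variance than x.
e = x[1:] - a * x[:-1]
var_x, var_e = x.var(), e.var()
```

Here var_x is roughly 1/(1 − a²) ≈ 10 while var_e is roughly 1, which is the variance reduction that predictive coding converts into bit-rate savings.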
In this Appendix we consider data compression algorithms which guarantee that the file reconstructed from the compressed bitstream coincides bit-by-bit with the original input file. These lossless (entropy-coding) algorithms have many applications. They can be used for archiving different types of data: texts, images, speech, and audio. Since multimedia data can usually be considered as the output of a source with memory or, in other words, have significant redundancy, entropy coding can be combined with different types of preprocessing, for example, linear prediction.
However, multimedia compression standards are based mainly on lossy coding schemes, which provide much larger compression ratios than lossless coding. It might seem that entropy coding plays no role in such compression systems, but that is not the case. In fact, lossless coding is also an important part of lossy compression standards. In this case we consider the quantized outputs of a preprocessing block as the outputs of a discrete-time source, estimate the statistics of this source, and apply entropy-coding techniques to its outputs.
Symbol-by-symbol lossless coding
Let a random variable take on values x from the discrete set X = {0, 1, …, M − 1} and let p(x) > 0 be the probability mass function of X or, in other words, the probability distribution on the set X. If we do not exploit the fact that different values x are not equally probable, we can only construct a fixed-length code for all possible values x ∈ X.
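A small worked example of the gap between fixed-length coding and the entropy bound (the 4-ary distribution below is an illustrative assumption): the fixed-length code spends ⌈log₂ M⌉ bits on every symbol, while a variable-length code can approach the entropy H(X).

```python
import math

# Illustrative 4-ary source (probabilities assumed for the example)
p = [0.5, 0.25, 0.125, 0.125]
M = len(p)

# Fixed-length coding ignores the probabilities entirely
fixed_bits = math.ceil(math.log2(M))                # 2 bits per symbol

# Entropy: the lower bound a variable-length (entropy) code can approach
H = -sum(px * math.log2(px) for px in p)            # 1.75 bits per symbol

# For this dyadic distribution the prefix code {0, 10, 110, 111}
# achieves the entropy exactly: 0.5*1 + 0.25*2 + 0.125*3 + 0.125*3 = 1.75
```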
The main application of audio compression systems is to obtain compact digital representations of high-quality (CD-quality) wideband audio signals. Typically, audio signals recorded on CDs and digital audio tapes are sampled at 44.1 or 48 kHz and each sample is represented by a 16-bit integer; that is, uncompressed two-channel stereo CD-quality audio requires 2 × 44.1 (48) × 16 ≈ 1.41 (1.54) Mb/s for transmission. Unlike speech compression systems, audio codecs process sounds generated by arbitrary sources and cannot exploit specific features of the input signals. However, almost all modern audio codecs are based on a model of the human auditory system. The key idea behind so-called perceptual coding is to remove the parts of the input signal which humans cannot perceive. The imperceptible information removed by the perceptual coder is called the irrelevancy. Since, similarly to speech signals, audio signals can be interpreted as outputs of sources with memory, perceptual coders remove both irrelevancy and redundancy in order to provide the lowest possible bit-rate for a given quality.
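The bit-rate arithmetic quoted above can be reproduced directly:

```python
# Uncompressed PCM bit-rate: channels x sampling rate x bits per sample
def pcm_rate_mbps(fs_hz, channels=2, bits=16):
    return channels * fs_hz * bits / 1e6

cd_rate = pcm_rate_mbps(44_100)    # CD audio:  2 x 44100 x 16 = 1.4112 Mb/s
dat_rate = pcm_rate_mbps(48_000)   # DAT audio: 2 x 48000 x 16 = 1.536  Mb/s
```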
An important part of perceptual audio coders is the psychoacoustic model of the human hearing. This model is used in order to estimate the amount of quantization noise that is inaudible. In the next section, we consider physical phenomena which are exploited by the psychoacoustic model.
Video signals represent sequences of images, or frames, which can be transmitted at a rate from 10 up to 60 frames/s (fps), providing the illusion of motion in the displayed signal. They can be represented in different formats which differ in frame size and frame rate. For example, the QCIF video format, intended for video conferencing and mobile applications, uses frames of size 176 × 144 pixels transmitted at 10 fps. The High Definition Television (HDTV) standard uses frames of much larger size, 1280 × 720 pixels, transmitted at 60 fps. The frames can be represented in RGB or YUV formats with 24 or fewer bits per pixel (fewer due to decimation of the U and V components). If the frames are in RGB format, then the first processing step in any video coder is the RGB to YUV transform.
Unlike still images, video sequences contain so-called temporal redundancy, which arises from objects repeating in consecutive frames of a video sequence. The simplest method of temporal prediction uses the previous frame as a prediction of the current frame. However, the residual formed by subtracting the prediction from the current frame typically has large energy, the reason being object movements between the current and the previous frames. Better predictions can be obtained by compensating for the motion between two frames.
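A minimal sketch of motion estimation by full-search block matching, one standard way to compensate for motion between two frames (the frame size, block size, and search range below are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)

def best_motion_vector(prev, cur, top, left, bs=8, search=4):
    """Full-search block matching: find the displacement (dy, dx) within
    +/- search pixels that minimises the sum of absolute differences (SAD)
    between a block of the current frame and the previous frame."""
    block = cur[top:top + bs, left:left + bs]
    best, best_sad = (0, 0), np.inf
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + bs > prev.shape[0] or x + bs > prev.shape[1]:
                continue                    # candidate block falls outside the frame
            sad = np.abs(prev[y:y + bs, x:x + bs] - block).sum()
            if sad < best_sad:
                best_sad, best = sad, (dy, dx)
    return best

# Synthetic check: the current frame is the previous frame shifted by (2, 3),
# so the estimator should recover exactly that displacement.
prev = rng.standard_normal((32, 32))
cur = np.roll(np.roll(prev, -2, axis=0), -3, axis=1)
mv = best_motion_vector(prev, cur, top=8, left=8)
```

The residual after subtracting the motion-compensated block is then what gets transformed and quantized, in place of the much larger frame-difference residual described above.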