A study of human hearing, and of the biomechanical processes involved in it, reveals several nonlinear steps, or stages, in the perception of sound. Each of these stages contributes to the eventual mismatch between the subjective features we perceive and the purely physical characteristics of the sound.
Put simply, what we think we hear is quite significantly different from the physical sounds that may be present (which in turn differs from what would be captured electronically by, for example, a computer). By taking into account the various nonlinearities in the hearing process, and some of the basic physical characteristics of the ear, nervous system, and brain, it is possible to account for the discrepancy.
Over the years, science and technology have incrementally improved the ability to model the hearing process from purely physical data. One simple example is that of A-law compression (or the similar μ-law used in some regions of the world), where approximately logarithmic amplitude quantisation replaces the linear quantisation of PCM: humans tend to perceive amplitude logarithmically rather than linearly, and so A-law quantisation using 8 bits sounds better than linear PCM quantisation using 8 bits. It thus achieves a higher degree of subjective speech quality than PCM for a given bitrate.
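To make the companding idea concrete, the following is a minimal sketch of the continuous A-law curve followed by uniform 8-bit quantisation. It is not the piecewise-linear ITU-T G.711 codec itself, and the 440 Hz test tone, its amplitude and the 8 kHz sampling rate are arbitrary illustrative choices.

```python
import numpy as np

def a_law_compress(x, A=87.6):
    """Map samples in [-1, 1] through the (continuous) A-law companding curve."""
    ax = np.abs(x)
    small = ax < 1.0 / A
    y = np.empty_like(ax)
    y[small] = A * ax[small] / (1.0 + np.log(A))
    y[~small] = (1.0 + np.log(A * ax[~small])) / (1.0 + np.log(A))
    return np.sign(x) * y

def quantise_8bit(y):
    """Uniformly quantise values in [-1, 1] to 256 levels."""
    return np.round((y + 1.0) * 127.5).astype(np.uint8)

# A quiet tone occupies far more of the 256 levels after companding than
# it does under plain linear 8-bit PCM, which is why it sounds better.
t = np.arange(8000) / 8000.0
x = 0.05 * np.sin(2 * np.pi * 440 * t)            # low-amplitude 440 Hz tone
print(np.unique(quantise_8bit(a_law_compress(x))).size,
      "levels used vs", np.unique(quantise_8bit(x)).size, "for linear PCM")
```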
Physical processes
The ear, as shown diagrammatically in Figure 4.1, includes the pinna, which filters sound and focuses it into the external auditory canal. Sound then acts upon the eardrum, from where it is transmitted and amplified by the three bones, the malleus, incus and stapes, to the oval window, which opens onto the cochlea.
This chapter introduces the notion of noisy quantum channels, and the different types of “quantum noise” that affect qubit messages passed through such channels. The main types of noisy channel reviewed here are the depolarizing, bit-flip, phase-flip, and bit-phase-flip channels. Then the quantum channel capacity χ is defined through the Holevo–Schumacher–Westmoreland (HSW) theorem. This theorem can conceptually be viewed as the elegant quantum counterpart of Shannon's (noisy) channel coding theorem, which was described in Chapter 13. Here, I shall not venture into the complex proof of the HSW theorem but only provide a background illustrating the similarity with its classical counterpart. The resemblance between the channel capacity χ and the Holevo bound, as described in Chapter 21, and that with the classical mutual information H(X; Y), as described in Chapter 5, are both discussed. For advanced reference, a hint is provided as to the meaning of the still not fully explored concept of quantum coherent information. Several examples of quantum channel capacity, derived from direct applications of the HSW theorem, along with the solution of the maximization problem, are provided.
Noisy quantum channels
The notion of “noisiness” in a classical communication channel was first introduced in Chapter 12, when describing channel entropy. Such a channel can be viewed schematically as a probabilistic relation between two random sources, X for the originator, and Y for the recipient.
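By contrast, a noisy quantum channel acts on a qubit's density matrix rather than on symbol probabilities. As a minimal numerical sketch, the bit-flip and phase-flip channels named above can be written in the standard Kraus-operator form; the flip probability p = 0.1 is an arbitrary choice for illustration, and the depolarizing and bit-phase-flip channels follow the same pattern with different Pauli operators.

```python
import numpy as np

I = np.eye(2)
X = np.array([[0, 1], [1, 0]])        # Pauli X (bit flip)
Z = np.array([[1, 0], [0, -1]])       # Pauli Z (phase flip)

def apply_channel(rho, kraus_ops):
    """Apply a channel given by its Kraus operators: rho -> sum_k E_k rho E_k^dagger."""
    return sum(E @ rho @ E.conj().T for E in kraus_ops)

p = 0.1                               # arbitrary flip probability
bit_flip = [np.sqrt(1 - p) * I, np.sqrt(p) * X]
phase_flip = [np.sqrt(1 - p) * I, np.sqrt(p) * Z]

rho = np.array([[1, 0], [0, 0]], dtype=complex)   # qubit prepared in |0><0|
print(apply_channel(rho, bit_flip))   # becomes (1-p)|0><0| + p|1><1|
print(apply_channel(rho, phase_flip)) # |0><0| is left unchanged by phase flips
```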
This chapter makes us walk a few preliminary, but decisive, steps towards quantum information theory (QIT), which will be the focus of the rest of this book. Here, we shall remain in the classical world, yet get a hint that it is possible to think of a different world where computations may be reversible, namely, without any loss of information. One key realization through this paradigm shift is that “information is physical.” As we shall see, such a nonintuitive and striking conclusion actually results from the age-old paradox of Maxwell's demon in thermodynamics, which eventually found an elegant resolution in Landauer's principle. This principle states that the erasure of a single bit of information requires one to provide an energy proportional to kT log 2, where log 2, as we know from Shannon's theory, is the measure of information and also the entropy of a two-level system with a uniformly distributed source. This consideration brings up the issue of irreversible computation. Logic gates, used at the heart of the CPU in modern computers, are based on such computation irreversibility. I shall then describe the computers' von Neumann architecture, the intimate workings of the ALU processing network, and the elementary logic gates on which the ALU is based. This will also provide some basics of Boolean logic, expanding on Chapter 1, which is the key to the following logic-gate concepts.
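To illustrate the irreversibility that Landauer's principle attaches a cost to, here is a textbook one-bit full adder built from ordinary irreversible gates (a generic example, not taken from this chapter): three input bits are mapped to two output bits, so distinct inputs collapse onto the same output and information is erased.

```python
def full_adder(a, b, carry_in):
    """One-bit full adder built from irreversible AND/OR/XOR gates."""
    s = a ^ b ^ carry_in                        # sum bit
    carry_out = (a & b) | (carry_in & (a ^ b))  # carry bit
    return s, carry_out

# Two output bits from three input bits: the mapping cannot be inverted,
# e.g. inputs 001, 010 and 100 all produce sum = 1, carry = 0.
for bits in [(0, 0, 1), (0, 1, 0), (1, 0, 0)]:
    print(bits, "->", full_adder(*bits))
```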
We saw in Chapter 13 that, while vocal-tract methods can often generate intelligible speech, they seem fundamentally limited in terms of generating natural-sounding speech. We saw that, in the case of formant synthesis, the main limitation is not so much in generating the speech from the parametric representation, but rather in generating these parameters from the input specification created by the text-analysis process. The mapping between the specification and the parameters is highly complex, and seems beyond what we can express in explicit human-derived rules, no matter how “expert” the rule designer. We face the same problems with articulatory synthesis, and in addition have to deal with the facts that acquiring data is fundamentally difficult and that improving naturalness often necessitates a considerable increase in the complexity of the synthesiser.
A partial solution to the complexities of the specification-to-parameter mapping is found in the classical LP technique, whereby we bypassed the issue of generating the vocal-tract parameters explicitly and instead measured them from data. The source parameters, however, were still specified by an explicit model, which was identified as the main source of the unnaturalness.
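As a reminder of what “measuring the vocal-tract parameters from data” amounts to in classical LP, the following is a generic sketch of the autocorrelation method (not the book's own code; the frame length, model order and synthetic test frame are arbitrary illustrative choices).

```python
import numpy as np

def lpc_coefficients(frame, order=10):
    """Estimate all-pole (vocal-tract) filter coefficients for one speech frame
    by solving the autocorrelation normal equations R a = r."""
    frame = frame * np.hamming(len(frame))               # taper the analysis frame
    r = np.correlate(frame, frame, mode='full')[len(frame) - 1:]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    return np.linalg.solve(R, r[1:order + 1])            # predictor coefficients

# A damped sinusoid plus a little noise stands in for a voiced speech frame.
rng = np.random.default_rng(0)
t = np.arange(200) / 8000.0
frame = np.exp(-40 * t) * np.sin(2 * np.pi * 800 * t) + 0.01 * rng.standard_normal(200)
print(lpc_coefficients(frame, order=4))
```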
In this chapter we introduce a set of techniques that attempt to get around these limitations. In a way, these can be viewed as extensions of the classical LP technique in that they use a data-driven approach: the increase in quality, however, largely arises from the abandonment of the over-simplistic impulse/noise source model.
Analysis techniques are those used to examine, understand and interpret the content of recorded sound signals. Sometimes these lead to visualisation methods, whilst at other times they may be used in specifying some form of further processing or measurement of the audio.
There is a general set of analysis techniques which are common to all audio signals, and indeed to many forms of data, particularly the traditional methods used for signal processing. We have already met and used the basic technique of decomposing sound into multiple sinusoidal components with the Fast Fourier Transform (FFT), and have considered forming a polynomial equation to replicate audio waveform characteristics through linear prediction (LPC), but there are many other useful techniques we have not yet considered.
Most analysis techniques operate on analysis windows, or frames, of input audio. Most also require that the analysis window is a representative stationary selection of the signal (stationary in that the signal statistics and frequency distribution do not change appreciably during the time duration of the window – otherwise results may be inaccurate). We discussed the stationarity issue in Section 2.5.1, and should note that the choice of analysis window size, as well as the choice of analysis methods used, depends strongly upon the nature of the signal being analysed. Speech, noise and music all have different characteristics, and while many of the same methods can be used in their analysis, knowledge of their characteristics leads to different analysis periods and different parameter ranges in the analysis results.
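A minimal sketch of this frame-based processing is given below; the 32 ms window, 50% hop and 8 kHz sampling rate are illustrative choices only, and would differ for, say, music analysis.

```python
import numpy as np

def analysis_frames(x, frame_len=256, hop=128):
    """Split a signal into overlapping, Hamming-windowed analysis frames."""
    w = np.hamming(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    return np.stack([x[i * hop:i * hop + frame_len] * w for i in range(n_frames)])

# The stationarity assumption is that the spectrum changes little within each
# 256-sample (32 ms at 8 kHz) window, so a per-frame FFT is meaningful.
fs = 8000
x = np.sin(2 * np.pi * 300 * np.arange(fs) / fs)     # one second of a 300 Hz tone
frames = analysis_frames(x)
spectra = np.abs(np.fft.rfft(frames, axis=1))
print(frames.shape, spectra.shape)                   # (61, 256) (61, 129)
```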
This chapter is concerned with a remarkable type of code, whose purpose is to ensure that any errors occurring during the transmission of data can be identified and automatically corrected. These codes are referred to as error-correcting codes (ECC). The field of error-correcting codes is rather involved and diverse; therefore, this chapter will only constitute a first exposure to, and a basic introduction of, the key principles and algorithms. The two main families of ECC, linear block codes and cyclic codes, will be considered. I will then describe in further detail some specifics concerning the most popular ECC types used in both telecommunications and information technology. The last section concerns the evaluation of corrected bit-error rates (BER), or BER improvement, after information reception and ECC decoding.
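As a concrete foretaste of a linear block code, here is the classic (7,4) Hamming code, which corrects any single bit error; this is a standard construction used purely as an illustration here, not the chapter's own worked example.

```python
import numpy as np

# Generator and parity-check matrices of the (7,4) Hamming code, in the
# systematic form G = [I | P], H = [P^T | I].
G = np.array([[1, 0, 0, 0, 1, 1, 0],
              [0, 1, 0, 0, 1, 0, 1],
              [0, 0, 1, 0, 0, 1, 1],
              [0, 0, 0, 1, 1, 1, 1]])
H = np.array([[1, 1, 0, 1, 1, 0, 0],
              [1, 0, 1, 1, 0, 1, 0],
              [0, 1, 1, 1, 0, 0, 1]])

def encode(msg):
    return msg @ G % 2

def correct(word):
    syndrome = H @ word % 2
    if syndrome.any():                      # non-zero syndrome: locate the error
        err_pos = next(i for i in range(7) if np.array_equal(H[:, i], syndrome))
        word = word.copy()
        word[err_pos] ^= 1
    return word

msg = np.array([1, 0, 1, 1])
codeword = encode(msg)
received = codeword.copy(); received[2] ^= 1        # flip one bit in transit
print(np.array_equal(correct(received), codeword))  # True: error corrected
```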
Communication channel
The communication of information through a message sequence is made over what we shall now call a communication channel or, in Shannon's terminology, a channel. This channel first comprises a source, which generates the message symbols from some alphabet. Next to the source comes an encoder, which transforms the symbols or symbol arrangements into codewords, using one of the many possible coding algorithms reviewed in Chapters 9 and 10, whose purpose is to compress the information into the smallest number of bits. Next is a transmitter, which converts the codewords into physical waveforms or signals. These signals are then propagated through a physical transmission pipe, which can be made of vacuum, air, copper wire, coaxial cable, or optical fiber.
We now turn to unit-selection synthesis, which is the dominant synthesis technique in text-to-speech today. Unit selection is the natural extension of second-generation concatenative systems, and deals with the issues of how to manage large numbers of units, how to extend prosody beyond just F0 and timing control, and how to alleviate the distortions caused by signal processing.
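The standard formulation of unit selection chooses one unit per target item by minimising a combination of target and join (concatenation) costs over the candidate units. The sketch below shows that search as a generic dynamic program; the numeric “units” and the cost functions are placeholders for illustration only, not the book's actual definitions.

```python
def select_units(targets, candidates, target_cost, join_cost):
    """Viterbi-style search: pick one candidate unit per target so that the sum
    of target costs and join (concatenation) costs is minimal."""
    best = [{u: (target_cost(targets[0], u), [u]) for u in candidates[0]}]
    for i in range(1, len(targets)):
        layer = {}
        for u in candidates[i]:
            prev_u, (prev_cost, path) = min(
                best[i - 1].items(),
                key=lambda kv: kv[1][0] + join_cost(kv[0], u))
            layer[u] = (prev_cost + join_cost(prev_u, u) + target_cost(targets[i], u),
                        path + [u])
        best.append(layer)
    return min(best[-1].values(), key=lambda v: v[0])[1]

# Toy usage: "units" are plain numbers, the target cost is the distance to the
# requested value and the join cost penalises jumps between consecutive units.
targets = [1.0, 2.0, 3.0]
candidates = [[0.8, 1.5], [1.9, 2.6], [2.7, 3.2]]
print(select_units(targets, candidates,
                   target_cost=lambda t, u: abs(t - u),
                   join_cost=lambda a, b: 0.1 * abs(b - a)))   # -> [0.8, 1.9, 3.2]
```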
From concatenative synthesis to unit selection
The main progression from first- to second-generation systems was a move away from fully explicit synthesis models. Of the first-generation techniques, classical LP synthesis differs from formant synthesis in that it uses data, rather than rules, to specify vocal-tract behaviour. Both first-generation techniques, however, still used explicit source models. The improved quality of second-generation techniques stems largely from abandoning explicit source models as well, regardless of whether TD-PSOLA (no model), RELP (use of real residuals) or a sinusoidal model (no strict source/filter model) is employed. The direction of progress is therefore clear: a movement away from explicit, hand-written rules, towards implicit, data-driven techniques.
By the early 1990s, a typical second-generation system was a concatenative diphone system in which the pitch and timing of the original waveforms were modified by a signal-processing technique to match the pitch and timing of the specification. In these second-generation systems, the assumption is that the specification from the text-analysis system comprises a list of items as before, where each item is specified with phonetic/phonemic identity information, a pitch and a timing value. Hence, these systems assume the following.
This second chapter concludes our exploration tour of coding and data compression. We shall first consider integer coding, which represents another branch of the family of optimal codes (alongside Shannon–Fano and Huffman coding). Integer coding applies to the case where the source symbols are fully known, but the probability distribution is only partially known (thus, the previous optimal codes cannot be implemented). Three main integer codes, called Elias, Fibonacci, and Golomb–Rice, will then be described. Together with the previous chapter, this description will complete our inventory of static codes, namely codes that apply to cases where the source symbols are known and the task is to assign the optimal code type. In the most general case, the source symbols and their distribution are unknown, or the distribution may change according to the number of symbols collected so far. Then, we must find new algorithms to assign optimal codes without such knowledge; this is referred to as dynamic coding. The three main algorithms for dynamic coding to be considered here are referred to as arithmetic coding, adaptive Huffman coding, and Lempel–Ziv coding.
Integer coding
The principle of integer coding is to assign an optimal (and predefined) codeword to a list of n known symbols, which we may call {1,2,3,…, n}. In such a list, the symbols are ranked in order of decreasing frequency or probability, or mathematically speaking, in order of “nonincreasing” frequency or probability.
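To make this concrete, here is a sketch of one member of the Elias family named above, the Elias gamma code; the construction is standard, and the chapter's own treatment of the three codes follows.

```python
def elias_gamma(n):
    """Elias gamma codeword for a positive integer n: floor(log2 n) zero bits
    followed by the binary representation of n."""
    assert n >= 1
    binary = bin(n)[2:]                 # e.g. 9 -> '1001'
    return '0' * (len(binary) - 1) + binary

for n in range(1, 10):
    print(n, elias_gamma(n))
# 1 -> '1', 2 -> '010', 3 -> '011', 4 -> '00100', ...: the shorter codewords go
# to the smaller (i.e. more probable) integers, matching the nonincreasing ranking.
```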
It is always a great opportunity and pleasure for a professor to introduce a new textbook. This one is especially unusual in the sense that, first of all, it concerns two fields, namely classical and quantum information theory, which are rarely taught together with the same reach and depth. Second, as its subtitle indicates, this textbook primarily addresses the telecom scientist. Being myself a quantum-mechanics teacher, but not conversant with the current telecoms paradigm and its community's expectations, I find the task of introducing such a textbook quite a challenge. Furthermore, both subjects in information theory can be regarded by physicists and engineers from all horizons, including in telecoms, as essentially academic in scope and rather difficult to reconcile in their applications. How then do we proceed from there?
I shall state from the outset that there is no need to convince the reader (telecom engineer or physicist or both) of the benefits of Shannon's classical theory. Generally unbeknown to millions of telecom and computer users, Shannon's principles pervade all applications concerning data storage, computer files, digital music and video, and wireline and wireless broadband communications alike. The point here is that classical information theory is not only a must-know from any academic standpoint; it is also a key to understanding the mathematical principles underlying our information society.
Shannon's theory being renowned for its completeness and societal impact, the telecom engineer (and the physicist within!) may therefore wonder about the benefits of quantum mechanics (QM) when it comes to information.