We saw in Chapter 13 that, while vocal-tract methods can often generate intelligible speech, they seem fundamentally limited in terms of generating natural-sounding speech. We saw that, in the case of formant synthesis, the main limitation lies not so much in generating the speech from the parametric representation as in generating these parameters from the input specification created by the text-analysis process. The mapping between the specification and the parameters is highly complex, and seems beyond what we can express in explicit human-derived rules, no matter how “expert” the rule designer. We face the same problems with articulatory synthesis, and in addition have to deal with the facts that acquiring data is fundamentally difficult and that improving naturalness often necessitates a considerable increase in the complexity of the synthesiser.
A partial solution to the complexities of the specification-to-parameter mapping is found in the classical LP technique, whereby we bypassed the issue of generating the vocal-tract parameters explicitly and instead measured them from data. The source parameters, however, were still specified by an explicit model, which was identified as the main source of the unnaturalness.
In this chapter we introduce a set of techniques that attempt to get around these limitations. In a way, these can be viewed as extensions of the classical LP technique in that they use a data-driven approach: the increase in quality, however, largely arises from the abandonment of the over-simplistic impulse/noise source model.
Analysis techniques are those used to examine, understand and interpret the content of recorded sound signals. Sometimes these lead to visualisation methods, whilst at other times they may be used in specifying some form of further processing or measurement of the audio.
There is a general set of analysis techniques which are common to all audio signals, and indeed to many forms of data, particularly the traditional methods used for signal processing. We have already met and used the basic technique of decomposing sound into multiple sinusoidal components with the Fast Fourier Transform (FFT), and have considered forming a polynomial equation to replicate audio waveform characteristics through linear prediction (LPC), but there are many other useful techniques we have not yet considered.
Most analysis techniques operate on analysis windows, or frames, of input audio. Most also require that the analysis window be a representative, stationary selection of the signal (stationary in that the signal statistics and frequency distribution do not change appreciably over the duration of the window – otherwise results may be inaccurate). We discussed the stationarity issue in Section 2.5.1, and should note that the choice of analysis window size, as well as the choice of analysis methods used, depends strongly upon the nature of the signal being analysed. Speech, noise and music all have different characteristics, and while many of the same methods can be used in their analysis, knowledge of their characteristics leads to different analysis periods and different ranges for the resulting analysis parameters.
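As a minimal sketch of this frame-based approach, the snippet below windows a signal into short quasi-stationary frames and takes the magnitude FFT of each. The sample rate, frame and hop sizes, and the placeholder noise signal are illustrative assumptions rather than values from the text.

```python
# Frame-based spectral analysis: window the signal into short frames and FFT each one.
import numpy as np

fs = 16000                      # sample rate in Hz (assumed)
frame_len = int(0.025 * fs)     # 25 ms analysis window, short enough to be quasi-stationary
hop = int(0.010 * fs)           # 10 ms hop between successive frames

signal = np.random.randn(fs)    # placeholder: one second of noise stands in for real audio
window = np.hamming(frame_len)  # taper to reduce spectral leakage at the frame edges

spectra = []
for start in range(0, len(signal) - frame_len + 1, hop):
    frame = signal[start:start + frame_len] * window
    spectrum = np.fft.rfft(frame)        # FFT of the windowed frame
    spectra.append(np.abs(spectrum))     # keep the magnitude spectrum

spectra = np.array(spectra)     # shape: (num_frames, frame_len // 2 + 1)
print(spectra.shape)
```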
This chapter is concerned with a remarkable type of code, whose purpose is to ensure that any errors occurring during the transmission of data can be identified and automatically corrected. These codes are referred to as error-correcting codes (ECC). The field of error-correcting codes is rather involved and diverse; therefore, this chapter constitutes only a first exposure and a basic introduction to the key principles and algorithms. The two main families of ECC, linear block codes and cyclic codes, will be considered. I will then describe in further detail some specifics of the most popular ECC types used in both telecommunications and information technology. The last section concerns the evaluation of corrected bit-error rates (BER), or BER improvement, after information reception and ECC decoding.
Communication channel
The communication of information through a message sequence is made over what we shall now call a communication channel or, in Shannon's terminology, a channel. This channel first comprises a source, which generates the message symbols from some alphabet. Next to the source comes an encoder, which transforms the symbols or symbol arrangements into codewords, using one of the many possible coding algorithms reviewed in Chapters 9 and 10, whose purpose is to compress the information into the smallest number of bits. Next is a transmitter, which converts the codewords into physical waveforms or signals. These signals are then propagated through a physical transmission pipe, which can be made of vacuum, air, copper wire, coaxial wire, or optical fiber.
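As a first concrete taste of the linear-block-code family mentioned above, here is a minimal Python sketch of the Hamming(7,4) code: four message bits are expanded into a seven-bit codeword whose parity structure lets the decoder locate and flip any single corrupted bit. The matrices and the example message are illustrative choices, not taken from the text.

```python
# Hamming(7,4): a simple linear block code with single-error correction.
import numpy as np

G = np.array([[1, 0, 0, 0, 1, 1, 0],      # generator matrix in systematic form [I | P]
              [0, 1, 0, 0, 1, 0, 1],
              [0, 0, 1, 0, 0, 1, 1],
              [0, 0, 0, 1, 1, 1, 1]])
H = np.array([[1, 1, 0, 1, 1, 0, 0],      # parity-check matrix [P^T | I]
              [1, 0, 1, 1, 0, 1, 0],
              [0, 1, 1, 1, 0, 0, 1]])

def encode(msg4):
    return (msg4 @ G) % 2                 # 4 message bits -> 7-bit codeword

def decode(word7):
    syndrome = (H @ word7) % 2            # non-zero syndrome flags an error
    for i in range(7):                    # a single-bit error produces the i-th column of H
        if np.array_equal(syndrome, H[:, i]):
            word7 = word7.copy()
            word7[i] ^= 1                 # flip the erroneous bit
            break
    return word7[:4]                      # systematic code: first 4 bits are the message

msg = np.array([1, 0, 1, 1])
code = encode(msg)
code[2] ^= 1                              # channel introduces a single bit error
print(decode(code))                       # recovers [1 0 1 1]
```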
We now turn to unit-selection synthesis, which is the dominant synthesis technique in text-to-speech today. Unit selection is the natural extension of second-generation concatenative systems, and deals with the issues of how to manage large numbers of units, how to extend prosody beyond just F0 and timing control, and how to alleviate the distortions caused by signal processing.
From concatenative synthesis to unit selection
The main progression from first- to second-generation systems was a move away from fully explicit synthesis models. Of the first-generation techniques, classical LP synthesis differs from formant synthesis in that it uses data, rather than rules, to specify vocal-tract behaviour. Both first-generation techniques, however, still used explicit source models. The improved quality of second-generation techniques stems largely from abandoning explicit source models as well, regardless of whether TD-PSOLA (no model), RELP (use of real residuals) or a sinusoidal model (no strict source/filter model) is employed. The direction of progress is therefore clear: a movement away from explicit, hand-written rules, towards implicit, data-driven techniques.
By the early 1990s, a typical second-generation system was a concatenative diphone system in which the pitch and timing of the original waveforms were modified by a signal-processing technique to match the pitch and timing of the specification. In these second-generation systems, the assumption is that the specification from the text-analysis system comprises a list of items as before, where each item is specified with phonetic/phonemic identity information, a pitch and a timing value. Hence, these systems assume the following.
This second chapter on the subject concludes our exploration of coding and data compression. We shall first consider integer coding, which represents another branch of the family of optimal codes (alongside Shannon–Fano and Huffman coding). Integer coding applies to the case where the source symbols are fully known but their probability distribution is only partially known (so that the previous optimal codes cannot be implemented). Three main integer codes, called Elias, Fibonacci, and Golomb–Rice, will then be described. Together with the previous chapter, this description completes our inventory of static codes, namely codes that apply to cases where the source symbols are known and the task is to assign the optimal code type. In the most general case, the source symbols and their distribution are unknown, or the distribution may change according to the number of symbols collected. We must then find new algorithms that assign optimal codes without such knowledge; this is referred to as dynamic coding. The three main algorithms for dynamic coding to be considered here are arithmetic coding, adaptive Huffman coding, and Lempel–Ziv coding.
Integer coding
The principle of integer coding is to assign an optimal (and predefined) codeword to a list of n known symbols, which we may call {1,2,3,…, n}. In such a list, the symbols are ranked in order of decreasing frequency or probability, or mathematically speaking, in order of “nonincreasing” frequency or probability.
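To make this concrete, here is a minimal Python sketch of one such integer code, the Elias gamma code: the rank n of a symbol is written as a run of zeros followed by the binary form of n, so that lower ranks (more probable symbols) receive shorter codewords. The example is illustrative and not drawn from the text.

```python
# Elias gamma code: floor(log2 n) zeros, then the binary representation of n.
def elias_gamma(n: int) -> str:
    assert n >= 1
    binary = bin(n)[2:]                 # binary representation of n, no leading zeros
    return "0" * (len(binary) - 1) + binary

for rank in range(1, 9):
    print(rank, elias_gamma(rank))
# 1 -> 1, 2 -> 010, 3 -> 011, 4 -> 00100, ...
```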
It is always a great opportunity and pleasure for a professor to introduce a new textbook. This one is especially unusual, in the sense that, first of all, it concerns two fields, namely classical and quantum information theories, which are rarely taught together with the same reach and depth. Second, as its subtitle indicates, this textbook primarily addresses the telecom scientist. As a quantum-mechanics teacher who is not conversant with the current telecoms paradigm and the expectations of its community, I find the task of introducing such a textbook quite a challenge. Furthermore, physicists and engineers from all horizons, including telecoms, may regard both branches of information theory as essentially academic in scope and rather difficult to reconcile in their applications. How then do we proceed from there?
I shall state at the outset that there is no need to convince the reader (telecom engineer, physicist, or both) of the benefits of Shannon's classical theory. Generally unbeknownst to millions of telecom and computer users, Shannon's principles pervade all applications concerning data storage, computer files, digital music and video, and wireline and wireless broadband communications alike. The point here is that classical information theory is not only a must from any academic standpoint; it is also a key to understanding the mathematical principles underlying our information society.
Given the reputation of Shannon's theory for completeness and societal impact, the telecom engineer (and the physicist within!) may therefore wonder about the benefits of quantum mechanics (QM) when it comes to information.
This appendix provides a brief overview of common data-compression standards used for sounds, texts, files, images, and videos. The description is meant to be introductory and makes no pretense of comprehensively defining the actual standards and their current updated versions. The list of selected standards is also indicative, and does not reflect the full diversity of those available on the market as freeware, shareware, or under license. It is a tricky endeavor to attempt to describe in a few pages a subject that would fill entire bookshelves. The hope is that the reader will get a flavor of it and be enticed to learn more about this seemingly endless, yet fascinating, subject. Why put this whole matter into an appendix, and not a fully fledged chapter? Because this set of chapters is primarily focused on information theory, not on information standards. While the former provides a universal and slowly evolving background reference, like science, the latter is practically the reverse. As we shall see in this appendix, however, information standards are extremely sophisticated and “intellectually smart,” despite being just an application field for the theory. No telecom engineer or scientist can afford to ignore, or will fail to benefit from, this essential fact!
This chapter marks a key turning point in our journey through information-theory land. Heretofore, we have covered only some very basic notions of IT, which have nonetheless led us to grasp the subtle concepts of information and entropy. Here, we are going to take significant steps into the depths of Shannon's theory, and hopefully begin to appreciate its power and elegance. This chapter is somewhat more mathematically demanding, but it is not significantly more complex than the preceding material. Let's say that there is more ink involved in the equations and in the derivation of the key results. This light investment will prove well worth it when it comes to appreciating the forthcoming chapters!
I will first introduce two more entropy definitions, joint and conditional entropies, just as there are joint and conditional probabilities. This leads to a new fundamental notion, that of mutual information, which is central to IT and to Shannon's various laws. I then introduce relative entropy, based on the concept of “distance” between two PDFs. Relative entropy broadens the perspective beyond this chapter, in particular through an (optional) appendix exploring the second law of thermodynamics in the light of information theory.
Joint and conditional entropies
So far, in this book, the notions of probability distribution and entropy have been associated with single, independent events x, as selected from a discrete source X = {x}.
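The quantities introduced in this chapter extend this picture to pairs of events. As a small numerical sketch, the Python snippet below computes the joint, marginal, and conditional entropies and the mutual information for an arbitrarily chosen 2×2 joint distribution; the numbers are purely illustrative.

```python
# Joint and conditional entropies and mutual information for a toy joint PDF p(x, y).
import numpy as np

p_xy = np.array([[0.4, 0.1],            # joint PDF p(x, y); rows index x, columns index y
                 [0.1, 0.4]])
p_x = p_xy.sum(axis=1)                  # marginal p(x)
p_y = p_xy.sum(axis=0)                  # marginal p(y)

def H(p):
    p = p[p > 0]
    return -np.sum(p * np.log2(p))      # entropy in bits

H_X, H_Y, H_XY = H(p_x), H(p_y), H(p_xy.flatten())
H_Y_given_X = H_XY - H_X                # conditional entropy H(Y|X) = H(X,Y) - H(X)
I_XY = H_X + H_Y - H_XY                 # mutual information I(X;Y)
print(H_X, H_Y, H_XY, H_Y_given_X, I_XY)
```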
Chapter 1 enabled us to familiarize ourselves with (or to revisit, or to brush up on) the concept of probability. As we have seen, any probability is associated with a given event x_i from a given event space S = {x_1, x_2,…, x_N}. The discrete set {p(x_1), p(x_2),…, p(x_N)} represents the probability distribution function, or PDF, which will be the focus of this second chapter.
So far, we have considered single events that can be numbered. These are called discrete events, which correspond to event spaces having a finite size N (no matter how big N may be!). At this stage, we are ready to expand our perspective and consider event spaces having unbounded or infinite sizes (N → ∞). In this case, we can still allocate an integer number to each discrete event, while the PDF, p(x_i), remains a function of the discrete variable x_i. But we can also conceive that an event corresponds to a real number, for instance in the physical measurement of a quantity such as length, angle, speed, or mass. This defines another infinity of events, each tagged by a real number x. In this case, the PDF, p(x), is a function of the continuous variable x.
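As a compact reminder (standard definitions, not specific to this text), the normalization conditions for the two kinds of PDF read:

```latex
% Discrete PDF over events x_i versus continuous PDF over the real variable x
\sum_{i=1}^{N} p(x_i) = 1 \quad (N \ \text{finite or}\ N \to \infty),
\qquad
\int_{-\infty}^{+\infty} p(x)\,\mathrm{d}x = 1 .
```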
This chapter is an opportunity to look at the properties of both discrete and continuous PDFs, as well as to acquire a wealth of new conceptual tools!
In this chapter we introduce the three main synthesis techniques that dominated the field up until the late 1980s, collectively known as first-generation techniques. Even though these techniques are used less today, it is still useful to discuss them because, apart from simple historical interest, they give us an understanding of why today's systems are configured the way they are. As an example, we need to know why today's dominant technique of unit selection is used rather than the more basic approach of generating speech waveforms “from scratch”. Furthermore, modern techniques have been made possible only by vast increases in processing power and memory, so in fact, for applications that require small footprints and low processing cost, the techniques explained here remain quite competitive.
Synthesis specification: the input to the synthesiser
First-generation techniques usually require a fairly detailed, low-level description of what is to be spoken. For purposes of explanation, we will take this to be a phonetic representation for the verbal component, together with a time for each phone and an F0 contour for the whole sentence. The phones will have been generated by a combination of lexical lookup, G2P rules and post-lexical processing (see Chapter 8), while the timing and F0 contour will have been generated by a classical prosody algorithm of the type described in Chapter 9. It is often convenient to place this information in a new structure called a synthesis specification.
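A hypothetical sketch of how such a synthesis specification might be held in code is given below; the field names, phone symbols, and values are purely illustrative and not taken from the text.

```python
# A toy container for a synthesis specification: phone identities with durations
# plus an F0 contour spanning the sentence.
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class PhoneSpec:
    phone: str                  # phonetic identity, e.g. from lexical lookup / G2P rules
    duration: float             # duration in seconds from the prosody component

@dataclass
class SynthesisSpec:
    phones: List[PhoneSpec]                 # the verbal component, one entry per phone
    f0_contour: List[Tuple[float, float]]   # (time in s, F0 in Hz) points for the sentence

spec = SynthesisSpec(
    phones=[PhoneSpec("h", 0.06), PhoneSpec("eh", 0.09),
            PhoneSpec("l", 0.05), PhoneSpec("ow", 0.12)],
    f0_contour=[(0.0, 120.0), (0.16, 140.0), (0.32, 110.0)],
)
print(len(spec.phones), "phones,", len(spec.f0_contour), "F0 points")
```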
This chapter describes what is generally considered to be one of the most important and historic contributions to the field of quantum computing, namely Shor's factorization algorithm. As its name indicates, this algorithm makes it possible to factorize numbers, that is, to decompose them into a unique product of prime numbers. Classical factorization algorithms developed previously have a complexity, or computing time, that increases exponentially with the size of the number, making the task intractable, if not hopeless, for large numbers. In contrast, Shor's algorithm is able to factor a number of any size in polynomial time, making the factorization problem tractable should a quantum computer ever be realized in the future. Since Shor's algorithm is based on several nonintuitive properties and other mathematical subtleties, this chapter presents a certain level of difficulty. With the previous chapters and tools readily assimilated, and some patience in going through the different preliminary steps required, this difficulty is, however, quite surmountable. I have sought to make this description of Shor's algorithm as mathematically complete and free of gaps as possible, while avoiding academic considerations that may not be deemed necessary from an engineering perspective. Ultimately, Shor's algorithm comes down to only a few basic instructions. What is conceptually challenging is to grasp why it works so well, and also to feel comfortable with the fact that its implementation actually takes a fair amount of trial and error. The two preliminaries to Shor's algorithm are the phase-estimation and the related order-finding algorithms.
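The number-theoretic skeleton of the algorithm can be sketched classically: pick a random a, find the order r of a modulo N, and, when r is even and a^(r/2) is not congruent to −1 (mod N), extract factors from gcd(a^(r/2) ± 1, N). In the sketch below (an illustration, not the chapter's treatment), the order is found by brute force, standing in for the quantum phase-estimation/order-finding subroutine, and N is assumed to be an odd composite that is not a prime power.

```python
# Classical skeleton of Shor's algorithm; brute-force order finding replaces the
# quantum subroutine. Assumes N is an odd composite, not a prime power.
from math import gcd
from random import randrange

def find_order(a, N):
    r, x = 1, a % N
    while x != 1:                       # smallest r with a^r = 1 (mod N)
        x = (x * a) % N
        r += 1
    return r

def shor_classical(N):
    while True:
        a = randrange(2, N)
        g = gcd(a, N)
        if g > 1:
            return g, N // g            # lucky guess: a already shares a factor with N
        r = find_order(a, N)
        if r % 2 == 1:
            continue                    # odd order: try another a
        y = pow(a, r // 2, N)
        if y == N - 1:
            continue                    # a^(r/2) = -1 (mod N): try another a
        p = gcd(y - 1, N)
        if 1 < p < N:
            return p, N // p
        q = gcd(y + 1, N)
        if 1 < q < N:
            return q, N // q

print(shor_classical(15))               # e.g. (3, 5)
```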
This chapter is concerned with the measurement of quantum states. This requires us to introduce the subtle notion of quantum measurement, an operation that has no counterpart in the classical domain. To this end, we first need to develop some new tools, starting with Dirac notation, a formalism that is not only very elegant but also relatively simple to handle. The introduction of Dirac notation makes it possible to become familiar with the inner product for quantum states, as well as with various properties of operators and states concerning projection, change of basis, unitary transformations, matrix elements, similarity transformations, eigenvalues and eigenstates, spectral decomposition and diagonal representation, the matrix trace, and the density operator or matrix. The concept of the density matrix makes it possible to give a first, brief hint of the analog of Shannon's entropy in the quantum world, referred to as von Neumann's entropy, to be further developed in Chapter 21. Once we have all the required tools, we can focus on quantum measurement and analyze three different types, referred to as basis-state measurements, projection or von Neumann measurements, and POVM measurements. In particular, POVM measurements are shown to possess the remarkable property of unambiguous quantum state discrimination (UQSD), whereby it is possible to derive “absolutely certain” information from unknown system states. The more complex case of quantum measurements in composite systems described by joint or tensor states is then considered.
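As a small numerical illustration (not drawn from the text), the Python sketch below applies the Born rule with computational-basis projectors to a single-qubit state and then computes the von Neumann entropy of the dephased density matrix defined by the measurement statistics.

```python
# Projective measurement of a single qubit in the computational basis,
# plus the von Neumann entropy of the resulting (dephased) density matrix.
import numpy as np

psi = np.array([1.0, 1.0]) / np.sqrt(2)        # the state (|0> + |1>)/sqrt(2)
P0 = np.array([[1.0, 0.0], [0.0, 0.0]])        # projector |0><0|
P1 = np.array([[0.0, 0.0], [0.0, 1.0]])        # projector |1><1|

prob0 = psi.conj() @ P0 @ psi                  # Born rule: p(i) = <psi|P_i|psi>
prob1 = psi.conj() @ P1 @ psi
print(prob0, prob1)                            # 0.5, 0.5

rho = prob0 * P0 + prob1 * P1                  # density matrix after the measurement statistics
eigenvalues = np.linalg.eigvalsh(rho)
eigenvalues = eigenvalues[eigenvalues > 0]
S = -np.sum(eigenvalues * np.log2(eigenvalues))  # von Neumann entropy in bits
print(S)                                       # 1.0 for this maximally mixed state
```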