This chapter represents our first step into quantum information theory (QIT). The key to making this transition is to become familiar with the concept of the quantum bit, or qubit, which is a probabilistic superposition of the classical 0 and 1 bits. In the quantum world, the classical 0 and 1 bits become the pure states |0〉 and |1〉, respectively. It is as if a coin could classically be in either the heads or the tails state, but is now allowed to exist in a superposition of both! Then I show that qubits can be physically transformed by the action of unitary matrices, which are also called operators. I show that such qubit transformations, resulting from any qubit manipulation, can be described by rotations on a 2D surface, which is referred to as the Bloch sphere. The Pauli matrices are shown to generate all such unitary transformations. These transformations are reversible, because they are characterized by unitary matrices; this property always makes it possible to trace back the input information carried by qubits. I will then describe different types of elementary quantum computations performed by elementary quantum gates, forming a veritable “zoo” of unitary operators, called I, X, Y, Z, H, CNOT, CCNOT, CROSSOVER or SWAP, controlled-U, and controlled-controlled-U. These gates can be used to form quantum circuits, involving any number of qubits, of which several examples and tools for analysis are provided. Finally, the concept of the tensor product, progressively introduced through the above description, is formalized.
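To make the gate “zoo” above concrete, here is a minimal NumPy sketch (my own illustration, not code from the chapter) that builds the |0〉 and |1〉 states, the Pauli and Hadamard matrices, and a CNOT, and checks the unitarity that guarantees reversibility.

    import numpy as np

    # Computational basis states |0> and |1> as column vectors
    ket0 = np.array([[1], [0]], dtype=complex)
    ket1 = np.array([[0], [1]], dtype=complex)

    # Single-qubit gate matrices: identity, Pauli X/Y/Z, Hadamard
    I = np.eye(2, dtype=complex)
    X = np.array([[0, 1], [1, 0]], dtype=complex)
    Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
    Z = np.array([[1, 0], [0, -1]], dtype=complex)
    H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)

    # H|0> is the equal superposition (|0> + |1>)/sqrt(2)
    plus = H @ ket0
    print(plus.ravel())                     # [0.707..., 0.707...]

    # Unitarity (U^dagger U = I) is what makes the gates reversible
    print(np.allclose(H.conj().T @ H, I))   # True

    # CNOT acting on the two-qubit state |10> flips the target: |11>
    CNOT = np.array([[1, 0, 0, 0],
                     [0, 1, 0, 0],
                     [0, 0, 0, 1],
                     [0, 0, 1, 0]], dtype=complex)
    ket10 = np.kron(ket1, ket0)             # tensor product of |1> and |0>
    print((CNOT @ ket10).ravel())           # equals |11>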
We now turn to an examination of just what is involved in performing text-to-speech (TTS) synthesis. In the previous chapter, we described some of the basic properties of language, the nature of signal, form and meaning, and the four main processes of generation, encoding, decoding and understanding. We will now use this framework to explain how text-to-speech can be performed.
In TTS, the input is writing and the output speech. While it is somewhat unconventional to regard it as such, here we consider writing as a signal, in just the same way as speech. Normal reading then is a process of decoding the signal into the message, and then understanding the message to create meaning. We stress this, because too often no distinction at all is made between signal and form in writing; we hear about “the words on the page”. More often an informal distinction is made in that it is admitted that real writing requires some “tidying up” to find the linguistic form, for instance by “text normalisation”, which removes capital letters, spells out numbers or separates punctuation. Here we take a more structured view in that we see linguistic form as clean, abstract and unambiguous, and written form as a noisy signal that has been encoded from this form.
The process of reading aloud, then, is one of taking a signal of one type, writing, and converting it into a signal of another type, speech.
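As a toy illustration of the “tidying up” step mentioned above, the sketch below shows a deliberately crude text normaliser; the digit table and the tokenisation rules are invented for the example and are far simpler than anything a real TTS front end would use.

    import re

    # A toy text normaliser (illustrative only): it lower-cases, spells out
    # digits, and splits punctuation off from words.
    DIGITS = {"0": "zero", "1": "one", "2": "two", "3": "three", "4": "four",
              "5": "five", "6": "six", "7": "seven", "8": "eight", "9": "nine"}

    def normalise(text: str) -> list[str]:
        # Separate punctuation marks from the words they are attached to
        text = re.sub(r"([.,!?;])", r" \1 ", text)
        tokens = []
        for tok in text.split():
            if tok.isdigit():
                # Spell out each digit (a real system would verbalise numbers properly)
                tokens.extend(DIGITS[d] for d in tok)
            else:
                tokens.append(tok.lower())
        return tokens

    print(normalise("Flight 42 leaves at 9, OK?"))
    # ['flight', 'four', 'two', 'leaves', 'at', 'nine', ',', 'ok', '?']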
This relatively short chapter on channel entropy describes the entropy properties of communication channels, of which I have given a generic description in Chapter 11 concerning error-correction coding. It will also serve to pave the way towards probably the most important of all Shannon's theorems, which concerns channel coding, as described in the more extensive Chapter 13. Here, we shall consider the different basic communication channels, starting with the binary symmetric channel, and continuing with nonbinary, asymmetric channel types. In each case, we analyze the channel's entropy characteristics and mutual information, given a discrete source transmitting symbols and information thereof, through the channel. This will lead us to define the symbol error rate (SER), which corresponds to the probability that symbols will be wrongly received or mistaken upon reception and decoding.
Binary symmetric channel
The concept of the communication channel was introduced in Chapter 11. To recall briefly, a communication channel is a transmission means for encoded information. Its constituents are an originator source (generating message symbols), an encoder, a transmitter, a physical transmission pipe, a receiver, a decoder, and a recipient source (restituting message symbols). The two sources (originator and recipient) may be discrete or continuous. The encoding and decoding scheme may include not only symbol-to-codeword conversion and the reverse, but also data compression and error correction, which we will not be concerned with in this chapter. Here, we shall consider binary channels.
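As a small numerical sketch of what follows (using the standard textbook formulas rather than the chapter's own notation, and with function names of my choosing), the mutual information of a binary symmetric channel with crossover probability p is H(Y) − H2(p); with a uniform input it reduces to the capacity 1 − H2(p).

    import numpy as np

    def h2(p: float) -> float:
        """Binary entropy function H2(p) in bits."""
        if p in (0.0, 1.0):
            return 0.0
        return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

    def bsc_mutual_information(p: float, q: float = 0.5) -> float:
        """I(X;Y) for a binary symmetric channel with crossover (error)
        probability p and input distribution P(X=1) = q."""
        # Output distribution P(Y=1) = q(1-p) + (1-q)p
        py1 = q * (1 - p) + (1 - q) * p
        # I(X;Y) = H(Y) - H(Y|X), and H(Y|X) = H2(p) for the BSC
        return h2(py1) - h2(p)

    # With a uniform input, the mutual information equals the capacity 1 - H2(p)
    for p in (0.0, 0.1, 0.5):
        print(p, bsc_mutual_information(p))
    # p = 0.5 gives 0 bits: the received symbols carry no information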
The concept of entropy is central to information theory (IT). The name, of Greek origin (entropia, tropos), means turning point or transformation. It was first coined in 1864 by the physicist R. Clausius, who postulated the second law of thermodynamics. Among other implications, this law establishes the impossibility of perpetual motion, and also that the entropy of a thermally isolated system (such as our Universe) can only increase. Because of its universal implications and its conceptual subtlety, the word entropy has always been enshrouded in some mystery, even today to large and educated audiences.
The subsequent work of L. Boltzmann, which laid the foundations of statistical mechanics, made it possible to further clarify the definition of entropy as a natural measure of disorder. The precursors and founders of the later information theory (L. Szilárd, H. Nyquist, R. Hartley, J. von Neumann, C. Shannon, E. Jaynes, and L. Brillouin) drew numerous parallels between the measure of information (the uncertainty in communication-source messages) and physical entropy (the disorder or chaos within material systems). Comparing information with disorder is not at all intuitive. This is because information (as we conceive it) is pretty much the conceptual opposite of disorder! Even more striking is the fact that the respective formulations for entropy that have been successively made in physics and IT happen to match exactly. Legend has it that Shannon chose the word “entropy” on the advice of his colleague von Neumann: “Call it entropy.”
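For reference, the two formulations being compared are, in their standard modern forms (not necessarily the notation used later in the book):

    H = -\sum_i p_i \log_2 p_i      % Shannon's information entropy, in bits
    S = -k_B \sum_i p_i \ln p_i     % Boltzmann-Gibbs entropy in statistical mechanics

The two expressions coincide up to the Boltzmann constant k_B and the choice of logarithm base.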
We now turn to the problem of how to convert the discrete, linguistic, word-based representation generated by the text-analysis system into a continuous acoustic waveform. One of the primary difficulties in this task stems from the fact that the two representations are so different in nature. The linguistic description is discrete, the same for each speaker for a given accent, compact and minimal. By contrast, the acoustic waveform is continuous, is massively redundant, and varies considerably even between utterances with the same pronunciation from the same speaker. To help with the complexity of this transformation, we break the problem down into a number of components. The first of these components, pronunciation, is the subject of this chapter. While specifics vary, this can be thought of as a system that takes the word-based linguistic representation and generates a phonemic or phonetic description of what is to be spoken by the subsequent waveform-synthesis component. In generating this representation, we make use of a lexicon, to find the pronunciations of words we know and can store, and a grapheme-to-phoneme (G2P) algorithm, to guess the pronunciations of words we don't know or can't store. After doing this we may find that simply concatenating the pronunciations for the words in the lexicon is not enough; words interact in a number of ways and so a certain amount of post-lexical processing is required. Finally, there is considerable choice in terms of how exactly we should specify the pronunciations for words, hence rigorously defining a pronunciation representation is in itself a key topic.
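The lexicon-plus-G2P arrangement described above can be caricatured in a few lines of Python; the entries and the single-letter rules below are invented purely for illustration and bear no relation to any real lexicon or trained G2P model.

    # A toy pronunciation module (illustrative only): look the word up in a small
    # lexicon first, and fall back to a naive letter-to-sound guess otherwise.
    LEXICON = {
        "cat": ["k", "ae", "t"],
        "dog": ["d", "ao", "g"],
        "the": ["dh", "ax"],
    }

    # Extremely crude single-letter G2P rules, purely for illustration
    LETTER_TO_PHONE = {"a": "ae", "b": "b", "c": "k", "d": "d", "e": "eh",
                       "g": "g", "o": "ao", "s": "s", "t": "t"}

    def pronounce(word: str) -> list[str]:
        word = word.lower()
        if word in LEXICON:                  # known word: use the lexicon
            return LEXICON[word]
        # Unknown word: guess with letter-to-phoneme rules (real G2P is far richer)
        return [LETTER_TO_PHONE.get(ch, ch) for ch in word]

    print(pronounce("cat"))    # ['k', 'ae', 't']       (lexicon hit)
    print(pronounce("cats"))   # ['k', 'ae', 't', 's']  (G2P guess)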
This chapter considers the continuous-channel case represented by the Gaussian channel, namely, a continuous communication channel with Gaussian additive noise. This will lead to a fundamental application of Shannon's coding theorem, referred to as the Shannon–Hartley theorem (SHT), another famous result of information theory, which also credits the earlier 1920 contribution of Ralph Hartley, who derived what became known as Hartley's law of communication channels. This theorem relates channel capacity to the signal and noise powers, in a most elegant and simple formula. As a recent and little-noticed development in this field, I will describe the nonlinear channel, where the noise is also a function of the transmitted signal power, owing to channel nonlinearities (an exclusive feature of certain physical transmission pipes, such as optical fibers). As we shall see, the modified SHT accounting for nonlinearity represents a major conceptual advance in information theory and its applications to optical communications, although its existence and consequences have, so far, been overlooked in textbooks. This chapter completes our description of classical information theory, which rests on Shannon's works and founding theorems. Upon completion, we will be equipped to approach the field of quantum information theory, which represents the second part of this series of chapters.
Gaussian channel
Referring to Chapter 6, a continuous communications channel assumes a continuous originator source, X, whose symbol alphabet x1,…, xi can be viewed as representing time samples of a continuous, real variable x, which is associated with a continuous probability distribution function or PDF, p(x).
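As a quick numerical sketch of the Shannon–Hartley theorem discussed in this chapter (my own example values, not the book's), the capacity C = B log2(1 + S/N) of an additive white Gaussian noise channel can be evaluated directly:

    import numpy as np

    def shannon_hartley_capacity(bandwidth_hz: float, snr_linear: float) -> float:
        """Channel capacity C = B * log2(1 + S/N) in bits per second,
        for an additive white Gaussian noise channel."""
        return bandwidth_hz * np.log2(1.0 + snr_linear)

    # Example: a 1-MHz channel with a 20-dB signal-to-noise ratio
    snr_db = 20.0
    snr = 10 ** (snr_db / 10)                    # 20 dB -> factor of 100
    print(shannon_hartley_capacity(1e6, snr))    # about 6.66 Mbit/s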
Informally, we can describe prosody as the part of human communication that expresses emotion, emphasises words, reveals the speaker's attitude, breaks a sentence into phrases, governs sentence rhythm and controls the intonation, pitch or tune of the utterance. This chapter describes how to predict prosodic form from the text, while Chapter 9 goes on to describe how to synthesize the acoustics of prosodic expression from these form representations. In this chapter we first introduce the various manifestations of prosody in terms of phrasing, prominence and intonation. Next we go on to describe how prosody is used in communication, and in particular explain why this has a much more direct effect on the final speech patterns than is the case with verbal communication. Finally we describe techniques for predicting what prosody should be generated from a text input.
Prosodic form
In our discussion of the verbal component of language, we saw that, while there were many difficulties in pinning down the exact nature of words and phonemes, broadly speaking words and phonemes were fairly easy to find, identify and demarcate. Furthermore, people can do this readily without much specialist linguistic training – given a simple sentence, most people can say which words were spoken, and with some guidance people have little difficulty in identifying the basic sounds in that sentence.
The situation is nowhere near as clear for prosody, and it may amaze newcomers to this topic to discover that there are no widely agreed description or representation systems for any aspect of prosody, be it to do with emotion, intonation, phrasing or rhythm.
In this chapter we turn to the topic of speech analysis, which tackles the problem of deriving representations from recordings of real speech signals. This book is of course concerned with speech synthesis – and at first sight it may seem that the techniques for generating speech “bottom-up” as described in Chapters 10 and 11 may be sufficient for our purpose. As we shall see, however, many techniques in speech synthesis actually rely on an analysis phase, which captures key properties of real speech and then uses these to generate new speech signals. In addition, the various techniques here enable useful characterisation of real speech phenomena for purposes of visualisation or statistical analysis. Speech analysis then is the process of converting a speech signal into an alternative representation that in some way better represents the information which we are interested in. We need to perform analysis because waveforms do not usually directly give us the type of information we are interested in.
Nearly all speech analysis is concerned with three key problems. First, we wish to remove the influence of phase; second, we wish to perform source/filter separation, so that we can study the spectral envelope of sounds independently of the source that they are spoken with. Finally, we often wish to transform these spectral envelopes and source signals into other representations that are coded more efficiently, have certain robustness properties, or more clearly show the linguistic information we require.
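A minimal sketch of the first of these problems, removing phase, is shown below: a short windowed frame is Fourier transformed and only the magnitude is kept. The sample rate, frame length and test signal are arbitrary choices for the example.

    import numpy as np

    # Discarding phase: take a short windowed frame of a signal and keep
    # only the magnitude of its Fourier transform.
    fs = 16000                                   # assumed sample rate (Hz)
    t = np.arange(0, 0.02, 1 / fs)               # one 20-ms frame
    frame = np.sin(2 * np.pi * 200 * t) + 0.3 * np.sin(2 * np.pi * 1200 * t)

    windowed = frame * np.hamming(len(frame))    # taper to reduce edge effects
    spectrum = np.fft.rfft(windowed)
    magnitude = np.abs(spectrum)                 # phase is thrown away here

    # The strongest component shows up as a peak near 200 Hz
    freqs = np.fft.rfftfreq(len(windowed), d=1 / fs)
    print(freqs[np.argmax(magnitude)])           # 200.0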
This final chapter concerns cryptography, the principle of securing information against access or tampering by third parties. Classical cryptography refers to the manipulation of classical bits for this purpose, while quantum cryptography can be viewed as doing the same with qubits. I describe these two approaches in the same chapter, as in my view the field of cryptography should be understood as a whole and appreciated within such a broader framework, as opposed to focusing on the specific applications offered by the quantum approach. I thus begin by introducing the notions of message encryption, message decryption, and code breaking, the action of retrieving the message information contents without knowledge of the code's secret algorithm or secret key. I then consider the basic algorithms to achieve encryption and decryption with binary numbers, which leads to the early IBM concept of the Lucifer cryptosystem, which is the ancestor of the first data encryption standard (DES). The principle of double-key encryption, which alleviates the problem of key exchange, is first considered as an elegant solution, but one that is unsafe against code breaking. Then the revolutionary principles of cryptography without key exchange and public-key cryptography (PKC) are considered, the latter also being known as RSA. The PKC–RSA cryptosystem is based on the extreme difficulty of factorizing large numbers. This is the reason for the description made earlier in Chapter 20 concerning Shor's factorization algorithm.
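As a generic illustration of encryption and decryption on binary data with a shared secret key (a one-time-pad-style XOR of message and key bits, deliberately much simpler than the Lucifer/DES and RSA systems discussed in the chapter):

    import secrets

    def xor_bytes(data: bytes, key: bytes) -> bytes:
        # XOR each message byte with the corresponding key byte
        return bytes(d ^ k for d, k in zip(data, key))

    message = b"ATTACK AT DAWN"
    key = secrets.token_bytes(len(message))      # shared secret key, same length

    ciphertext = xor_bytes(message, key)         # encryption
    recovered = xor_bytes(ciphertext, key)       # decryption uses the same key

    print(ciphertext.hex())
    print(recovered)                             # b'ATTACK AT DAWN'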
This appendix gives a brief guide to the probability theory needed at various stages in the book. What follows is too brief to serve as a first exposure to probability; rather, it is intended as a reference. Good introductory books on probability include Bishop, and Duda, Hart and Stork.
Discrete probabilities
Discrete events are the simplest to interpret. For example, what is the probability of
it raining tomorrow?
a 6 being thrown on a die?
Probability can be thought of as the chance of a particular event occurring. We limit our probability measure to lie in the range 0 to 1, where
lower numbers indicate that the event is less likely to occur, 0 indicates it will never occur;
higher numbers indicate that the event is more likely to occur, 1 indicates that the event will definitely occur.
We like to think that we have a good grasp of both estimating and using probability. For simple cases such as “will it rain tomorrow?” we can do reasonably well. However, as situations get more complicated things are not always so clear. The aim of probability theory is to give us a mathematically sound way of inferring information using probabilities.
Discrete random variables
Let some event have M possible outcomes. We are interested in the probability of each of these outcomes occurring.
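As a toy example tying this to the die mentioned earlier, here is a discrete random variable with M = 6 equally likely outcomes; the probabilities sum to 1 and the probability of any single outcome is 1/6.

    import numpy as np

    # A fair die: M = 6 outcomes, each with probability 1/6
    M = 6
    outcomes = np.arange(1, M + 1)
    probs = np.full(M, 1.0 / M)

    print(probs.sum())                    # 1.0 (probabilities sum to one)
    print((outcomes * probs).sum())       # expected value: 3.5

    # Probability of the event "a 6 is thrown"
    print(probs[outcomes == 6].sum())     # 0.1666...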
The previous chapter introduced the concept of coding optimality, as based on variable-length codewords. As we have learnt, an optimal code is one for which the mean codeword length closely approaches or is equal to the source entropy. There exist several families of codes that can be called optimal, as based on various types of algorithms. This chapter, and the following, will provide an overview of this rich subject, which finds many applications in communications, in particular in the domain of data compression. In this chapter, I will introduce Huffman codes, and then I will describe how they can be used to perform data compression to the limits predicted by Shannon. I will then introduce the principle of block codes, which also enable data compression.
Huffman codes
As we have learnt earlier, variable-length codes are in the general case more efficient than fixed-length ones. The most frequent source symbols are assigned the shortest codewords, and the reverse for the less frequent ones. The coding-tree method makes it possible to find some heuristic codeword assignment, according to the above rule. Despite the lack of further guidance, the result proved effective, considering that we obtained η = 96.23% with a ternary coding of the English-character source (see Fig. 8.3, Table 8.3). But we have no clue as to whether other coding trees with greater coding efficiencies may ever exist, unless we try out all the possibilities, which is impractical.
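For reference, the standard binary Huffman construction introduced in this chapter can be sketched in a few lines; this is my own compact illustration on an arbitrary test string, not the book's ternary English-character example.

    import heapq
    from collections import Counter

    def huffman_code(freqs):
        """Build a binary Huffman code for a {symbol: frequency} mapping."""
        # Heap items: (weight, tie_breaker, {symbol: partial_codeword})
        heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(freqs.items())]
        heapq.heapify(heap)
        counter = len(heap)
        while len(heap) > 1:
            w1, _, c1 = heapq.heappop(heap)
            w2, _, c2 = heapq.heappop(heap)
            # Prefix the codewords of the two merged subtrees with 0 and 1
            merged = {s: "0" + c for s, c in c1.items()}
            merged.update({s: "1" + c for s, c in c2.items()})
            heapq.heappush(heap, (w1 + w2, counter, merged))
            counter += 1
        return heap[0][2]

    freqs = Counter("this is an example of a huffman tree")
    codes = huffman_code(dict(freqs))
    avg_len = sum(freqs[s] * len(codes[s]) for s in freqs) / sum(freqs.values())
    print(codes[" "], codes["e"], codes["x"])   # frequent symbols get short codewords
    print(round(avg_len, 3))                    # mean codeword length in bits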
This mathematically intensive chapter takes us through our first steps in the domain of quantum computation (QC) algorithms. The simplest of them is the Deutsch algorithm, which makes it possible to determine whether or not a Boolean function is constant for any input. The key result is that this QC algorithm provides the answer at once, whereas in the classical case it would take two independent calculations. Next, I describe the generalization of this algorithm to n qubits, referred to as the Deutsch–Jozsa algorithm. Although they have no specific or useful applications in quantum computing, both algorithms represent a most elegant means of introducing the concept of quantum computation parallelism. I then describe two most important QC algorithms, which nicely exploit quantum parallelism. The first is the quantum Fourier transform (QFT), for which a detailed analysis of QFT circuits and quantum-gate requirements is also provided. As will be shown in the next chapter, a key application of the QFT concerns the famous Shor's algorithm, which makes it possible to factor numbers into primes in polynomial time. The second algorithm, no less famous than Shor's, is referred to as the Grover quantum database search, whose application is the identification of database items with a quadratic gain in speed.
Deutsch algorithm
Our exploration of quantum algorithms shall begin with the solution of a very basic problem: finding whether or not a Boolean function f(x) is a constant.
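The algorithm can be simulated directly with small matrices. The sketch below (my own, not the chapter's notation) prepares |0〉|1〉, applies Hadamards, makes a single call to the oracle U_f, and reads the answer off the first qubit.

    import numpy as np

    H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
    I2 = np.eye(2)

    def oracle(f):
        """U_f |x>|y> = |x>|y XOR f(x)> as a 4x4 permutation matrix."""
        U = np.zeros((4, 4))
        for x in (0, 1):
            for y in (0, 1):
                U[2 * x + (y ^ f(x)), 2 * x + y] = 1
        return U

    def deutsch(f) -> str:
        """One-query test of whether f: {0,1} -> {0,1} is constant or balanced."""
        state = np.kron([1, 0], [0, 1])          # |0>|1>
        state = np.kron(H, H) @ state            # Hadamard both qubits
        state = oracle(f) @ state                # single oracle call
        state = np.kron(H, I2) @ state           # Hadamard the first qubit
        # Probability that the first qubit is measured as |0>
        p0 = state[0] ** 2 + state[1] ** 2
        return "constant" if np.isclose(p0, 1.0) else "balanced"

    print(deutsch(lambda x: 0))       # constant
    print(deutsch(lambda x: 1))       # constant
    print(deutsch(lambda x: x))       # balanced
    print(deutsch(lambda x: 1 - x))   # balanced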
This chapter is about coding information, which is the art of packaging and formatting information into meaningful codewords. Such codewords are meant to be recognized by computing machines for efficient processing or by human beings for practical understanding. The number of possible codes and corresponding codewords is infinite, just like the number of events to which information can be associated, in Shannon's meaning. This is the point where information theory will start revealing its elegance and power. We will learn that codes can be characterized by a certain efficiency, which implies that some codes are more efficient than others. This will lead us to a description of the first of Shannon's theorems, concerning source coding. As we shall see, coding is a rich subject, with many practical consequences and applications, in particular in the way we communicate information efficiently. We will start our exploration of information coding with numbers and then with language, which conveys some background and flavor as preparation for approaching the more formal theory leading to the abstract concept of code optimality.
Coding numbers
Consider a source made of N different events. We can label the events through a set of numbers ranging from 1 to N, which constitute a basic source code. This code represents one out of N! different possibilities. In the code, each of the numbers represents a codeword.
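A tiny worked example (my own numbers): with N = 12 events we can label each with a fixed-length binary codeword of ⌈log2 N⌉ = 4 bits, and there are N! distinct ways of assigning the labels.

    import math

    # Labelling N = 12 events with fixed-length binary codewords
    N = 12
    bits_per_codeword = math.ceil(math.log2(N))      # 4 bits suffice for 12 events
    codebook = {event: format(event - 1, f"0{bits_per_codeword}b")
                for event in range(1, N + 1)}

    print(bits_per_codeword)           # 4
    print(codebook[1], codebook[12])   # 0000 1011
    print(math.factorial(N))           # 479001600 possible distinct labelings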
The speech-production process was qualitatively described in Chapter 7. There we showed that speech is produced by a source, such as the glottis, which is subsequently modified by the vocal tract acting as a filter. In this chapter, we turn our attention to developing a more-formal quantitative model of speech production, using the techniques of signals and filters described in Chapter 10.
The acoustic theory of speech production
Such models often come under the heading of the acoustic theory of speech production, which refers both to the general field of research in mathematical speech-production models and to the book of that title by Fant. Although considerable work in this field had been done prior to its publication, this book was the first to bring together the various strands of work and describe the whole process in a unified manner. Furthermore, Fant backed up his study with extensive empirical work using X-rays and mechanical models to test and verify the speech-production models being proposed. Since then, many refinements to the model have been made, as researchers have sought to improve the accuracy and practicality of these models. Here we focus on the single most widely accepted model, but conclude the chapter with a discussion of variations on it.
As with any modelling process, we have to reach a compromise between a model that accurately describes the phenomena in question and one that is simple, effective and suited to practical needs.
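A bare-bones source-filter sketch in the spirit of this model is given below; the fundamental frequency, formant frequencies and bandwidths are made-up values chosen only to illustrate the idea of exciting an all-pole “vocal tract” filter with an impulse-train “source”.

    import numpy as np
    from scipy.signal import lfilter

    # Source: an impulse train with a 100-Hz fundamental
    fs = 16000
    n = int(fs * 0.5)                              # half a second of signal
    source = np.zeros(n)
    source[::fs // 100] = 1.0

    # Filter: an all-pole filter with resonances (formants) near 700 and 1200 Hz
    def formant_poles(freqs_hz, bandwidths_hz):
        poles = []
        for f, bw in zip(freqs_hz, bandwidths_hz):
            r = np.exp(-np.pi * bw / fs)           # pole radius from bandwidth
            poles += [r * np.exp(2j * np.pi * f / fs),
                      r * np.exp(-2j * np.pi * f / fs)]
        return poles

    a = np.real(np.poly(formant_poles([700, 1200], [100, 150])))  # denominator
    speech_like = lfilter([1.0], a, source)        # pass the source through the filter

    print(speech_like[:5])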