This chapter deals with quantum error correction and the related quantum error-correction codes (QECC), which can be applied to noisy quantum channels and quantum memories with the purpose of preserving or protecting information integrity. I first describe the basics of quantum repetition codes, as applicable to bit-flip and phase-flip quantum channels. Then I consider the 9-qubit Shor code, which is capable of diagnosing and correcting any combination of bit-flip and phase-flip errors, up to one error of each type. Furthermore, it is shown that the Shor code is, in fact, capable of fully restoring qubit integrity under a continuum of bit or phase errors, a property that has no counterpart in the classical world of error-correction codes. But the exploration of QECC does not stop here! We shall discover the elegant Calderbank–Shor–Steane (CSS) codes, which are capable of correcting an arbitrary number t of errors, both bit-flip and phase-flip. As an application of the CSS codes, I then describe the 7-qubit Hadamard–Steane code, which can correct up to one error on a single qubit. A corresponding quantum circuit, based on an original generator-matrix example, is presented.
Quantum repetition code
In Chapter 11, we saw that the simplest form of error-correction code (ECC) is the repetition code, based on the principle of majority logic. The background assumption is that in a given message sequence, or bit string, the probability of a bit error is sufficiently small for the majority of bits to be correctly transmitted through the channel.
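To make the majority-logic principle concrete, here is a minimal sketch (not from the chapter) of a classical three-bit repetition code decoded by majority vote over a bit-flip channel; the error probability p, the function names, and the trial count are illustrative assumptions.

```python
# Minimal sketch: 3-bit repetition code with majority-vote decoding
# over a binary symmetric (bit-flip) channel. Values are illustrative.
import random

def encode(bit):
    return [bit] * 3                         # repeat the bit three times

def bit_flip_channel(bits, p):
    return [b ^ (random.random() < p) for b in bits]   # flip each bit with probability p

def decode(bits):
    return int(sum(bits) >= 2)               # majority logic: at least two of three agree

random.seed(0)
p = 0.1
trials = 100_000
errors = sum(decode(bit_flip_channel(encode(1), p)) != 1 for _ in range(trials))
# Residual word-error rate is about 3*p**2 - 2*p**3 (~0.028 for p = 0.1),
# much smaller than the raw bit-error probability p, provided p is small.
print(errors / trials)
```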
I'd like to say that Text-to-Speech Synthesis was years in the planning but nothing could be further from the truth. In mid 2004, as Rhetorical Systems was nearing its end, I suddenly found myself with spare time on my hands for the first time since …, well, ever to be honest. I had no clear idea of what to do next and thought it might “serve the community well” to jot down a few thoughts on TTS. The initial idea was a slim “hardcore” technical book explaining the state of the art, but as writing continued I realised that more and more background material was needed. Eventually the initial idea was dropped and the more-comprehensive volume that you now see was written.
Some early notes were made in Edinburgh, but the book mainly started during a summer I spent in Boulder, Colorado. I was freelancing at that point but am grateful to Ron Cole and his group for accommodating me in such a friendly fashion. A good deal of work was completed outside the office and I am eternally grateful to Laura Michalis for putting me up, putting up with me and generally being there and supportive during that phase.
While the book was in effect written entirely by myself, I should pay particular thanks to Steve Young and Mark Gales as I used the HTK book and lecture notes directly in Chapter 15 and the appendix.
Because of the reader's interest in information theory, it is assumed that, to some extent, he or she is relatively familiar with probability theory, its main concepts, theorems, and practical tools. Whether a graduate student or a confirmed professional, it is possible, however, that a good fraction, if not all of this background knowledge has been somewhat forgotten over time, or has become a bit rusty, or even worse, completely obliterated by one's academic or professional specialization!
This is why this book includes a couple of chapters on probability basics. Should such basics be crystal clear in the reader's mind, however, these two chapters can simply be skipped. They can always be revisited later for backup, should some of the associated concepts and tools present any hurdles in the following chapters. This being stated, some expert readers may yet dare to test their knowledge by considering some of this chapter's (easy) problems, for starters. Finally, any parent or teacher might find the first chapter useful to introduce children and teens to probability.
I have sought to make this review of probability basics as simple, informal, and practical as it could be. Just like the rest of this book, it is definitely not intended to be a math course, following the canonical theorem–proof–lemma–example suite. There exist scores of rigorous books on probability theory at all levels, as well as many Internet sites providing elementary tutorials on the subject.
Audio is normally, and best, handled in Matlab as a vector of samples, with each individual value being a double-precision floating-point number. A sampled sound can be completely specified by the sequence of these numbers plus one other item of information: the sample rate. In general, the majority of digital audio systems differ from this in only one major respect, which is that they tend to store the sequence of samples as fixed-point numbers instead. This can be a complicating factor for those other systems, but an advantage to Matlab users, who have two fewer considerations to be concerned with when processing audio: namely overflow and underflow.
Any operation that Matlab can perform on a vector can, in theory, be performed on stored audio. The audio vector can be loaded and saved in the same way as any other Matlab variable, processed, added, plotted, and so on. However, there are, of course, some special considerations when dealing with audio that need to be discussed within this chapter, as a foundation for the processing and analysis discussed in the later chapters.
This chapter begins with an overview of audio input and output in Matlab, including recording and playback, before considering scaling issues, basic processing methods, then aspects of continuous analysis and processing. A section on visualisation covers the main time- and frequency-domain plotting techniques. Finally, methods of generating sounds and noise are given.
Handling audio in Matlab
Given a high enough sample rate, the double precision vector has sufficient resolution for almost any type of processing that may need to be performed – meaning that one can usually safely ignore quantisation issues when in the Matlab environment.
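The chapter itself works in Matlab; purely to illustrate the fixed-point versus double-precision distinction made above, here is a short Python sketch of the same idea. The sample rate, tone frequency, and 16-bit scaling are assumptions for illustration, not values from the text.

```python
# Illustration only (the book uses Matlab): converting between double-precision
# samples in [-1.0, 1.0] and 16-bit fixed-point samples, which is where
# overflow (clipping) and quantisation become concerns.
import numpy as np

fs = 16_000                                   # sample rate in Hz (assumed)
t = np.arange(fs) / fs
x = 0.5 * np.sin(2 * np.pi * 440 * t)         # one second of a 440 Hz tone

x_int16 = np.clip(np.round(x * 32767), -32768, 32767).astype(np.int16)  # to fixed point
x_back = x_int16.astype(np.float64) / 32767   # back to double precision

print(np.max(np.abs(x - x_back)))             # worst-case quantisation error (~1.5e-5)
```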
Before delving into the details of how to perform text-to-speech conversion, we will first examine some of the fundamentals of communication in general. This chapter looks at the various ways in which people communicate and how communication varies depending on the situation and the means which are used. From this we can develop a general model, which will then help us specify the text-to-speech problem more exactly in the following chapter.
Types of communication
We experience the world through our senses and we can think of this as a process of gaining information. We share this ability with most other animals: if an animal hears running water it can infer that there is a stream nearby; if it sees a ripe fruit it can infer that there is food available. This ability to extract information from the world via the senses is a great advantage in the survival of any species. Animals can, however, cause information to be created: many animals make noises, such as barks or roars, or gestures, such as flapping or head nodding, which are intended to be interpreted by other animals. We call the process of deliberate creation of information, with the intention that it be interpreted, communication.
The prerequisites for communication are an ability to create information in one being, an ability to transmit this information and an ability to perceive the created information by another being.
This relatively short but mathematically intense chapter brings us to the core of Shannon's information theory, with the definition of channel capacity and the subsequent, most famous channel coding theorem (CCT), the second most important theorem from Shannon (next to the source coding theorem, described in Chapter 8). The formal proof of the channel coding theorem is a bit tedious and, therefore, does not lend itself to much oversimplification. I have sought, however, to guide the reader in as many steps as is necessary to reach the proof without hurdles. After defining channel capacity, we will consider the notion of typical sequences and typical sets (of such sequences) in codebooks, which will make it possible to tackle the said CCT. We will first proceed through a formal proof, inspired by the original Shannon paper (but consistent with our notation, and with more explanation where warranted); then through different, more intuitive or less formal approaches.
Channel capacity
In Chapter 12, I have shown that in a noisy channel, the mutual information, H(X;Y) = H(Y) − H(Y|X), represents the measure of the true information contents in the output or recipient source Y, given the equivocation H(Y|X), which measures the informationless channel noise. We have also shown that mutual information depends on the input probability distribution, p(x).
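As a small numeric illustration of the relation H(X;Y) = H(Y) − H(Y|X) and of its dependence on p(x), here is a hedged sketch for a binary symmetric channel; the crossover probability and the grid of input distributions are assumed values, not the book's.

```python
# Numeric check of H(X;Y) = H(Y) - H(Y|X) for a binary symmetric channel (BSC).
# The crossover probability eps and input distribution q are illustrative.
import numpy as np

def h2(p):                                    # binary entropy function, in bits
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

def mutual_information(q, eps):
    # q = Pr[X = 1]; eps = channel crossover probability
    p_y1 = q * (1 - eps) + (1 - q) * eps      # output distribution p(y = 1)
    return h2(p_y1) - h2(eps)                 # H(Y) - H(Y|X); H(Y|X) = h2(eps) for a BSC

eps = 0.1
qs = np.linspace(0, 1, 101)
best = max(mutual_information(q, eps) for q in qs)
print(best, 1 - h2(eps))                      # maximum over p(x) equals the capacity 1 - h2(eps)
```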
This chapter represents our first step into quantum information theory (QIT). The key to operating such a transition is to become familiar with the concept of the quantum bit, or qubit, which is a probabilistic superposition of the classical 0 and 1 bits. In the quantum world, the classical 0 and 1 bits become the pure states |0〉 and |1〉, respectively. It is as if a coin can be classically in either heads or tails states, but is now allowed to exist in a superposition of both! Then I show that qubits can be physically transformed by the action of unitary matrices, which are also called operators. I show that such qubit transformations, resulting from any qubit manipulation, can be described by rotations on a 2D surface, which is referred to as the Bloch sphere. The Pauli matrices are shown to generate all such unitary transformations. These transformations are reversible, because they are characterized by unitary matrices; this property always makes it possible to trace the input information carried by qubits. I will then describe different types of elementary quantum computations performed by elementary quantum gates, forming a veritable “zoo” of unitary operators, called I, X, Y, Z, H, CNOT, CCNOT, CROSSOVER or SWAP, controlled-U, and controlled-controlled-U. These gates can be used to form quantum circuits, involving any number of qubits, and of which several examples and tools for analysis are provided. Finally, the concept of tensor product, as progressively introduced through the above description, is eventually formalized.
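As a minimal numeric companion to the gates listed above, the sketch below writes down the Pauli matrices and the Hadamard gate, verifies that each is unitary (hence reversible), and applies H to the pure state |0〉; this is an illustration in Python/NumPy, not material from the chapter.

```python
# Minimal illustration: Pauli matrices, a unitarity check, and H acting on |0>.
import numpy as np

I = np.eye(2)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.array([[1, 0], [0, -1]], dtype=complex)
H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)

for U in (X, Y, Z, H):
    assert np.allclose(U.conj().T @ U, I)     # unitary: U†U = I, so the transformation is reversible

ket0 = np.array([1, 0], dtype=complex)
print(H @ ket0)                               # (|0> + |1>)/sqrt(2): an equal superposition of the pure states
```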
We now turn to an examination of just what is involved in performing text-to-speech (TTS) synthesis. In the previous chapter, we described some of the basic properties of language, the nature of signal, form and meaning, and the four main processes of generation, encoding, decoding and understanding. We will now use this framework to explain how text-to-speech can be performed.
In TTS, the input is writing and the output is speech. While it is somewhat unconventional to regard it as such, here we consider writing as a signal, in just the same way as speech. Normal reading, then, is a process of decoding the signal into the message, and then understanding the message to create meaning. We stress this because too often no distinction at all is made between signal and form in writing; we hear about “the words on the page”. More often an informal distinction is made in that it is admitted that real writing requires some “tidying up” to find the linguistic form, for instance by “text normalisation”, which removes capital letters, spells out numbers or separates punctuation. Here we take a more structured view in that we see linguistic form as clean, abstract and unambiguous, and written form as a noisy signal that has been encoded from this form.
The process of reading aloud, then, is one of taking a signal of one type, writing, and converting it into a signal of another type, speech.
This relatively short chapter on channel entropy describes the entropy properties of communication channels, of which I have given a generic description in Chapter 11 concerning error-correction coding. It will also serve to pave the way towards probably the most important of all Shannon's theorems, which concerns channel coding, as described in the more extensive Chapter 13. Here, we shall consider the different basic communication channels, starting with the binary symmetric channel, and continuing with nonbinary, asymmetric channel types. In each case, we analyze the channel's entropy characteristics and mutual information, given a discrete source transmitting symbols and information thereof, through the channel. This will lead us to define the symbol error rate (SER), which corresponds to the probability that symbols will be wrongly received or mistaken upon reception and decoding.
Binary symmetric channel
The concept of the communication channel was introduced in Chapter 11. To recall briefly, a communication channel is a transmission means for encoded information. Its constituents are an originator source (generating message symbols), an encoder, a transmitter, a physical transmission pipe, a receiver, a decoder, and a recipient source (restituting message symbols). The two sources (originator and recipient) may be discrete or continuous. The encoding and decoding scheme may include not only symbol-to-codeword conversion and the reverse, but also data compression and error correction, which we will not be concerned with in this chapter. Here, we shall consider binary channels.
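The sketch below, with illustrative numbers only, computes the quantities just described for a binary (here slightly asymmetric) channel defined by its transition matrix: the output entropy, the conditional entropy H(Y|X), their difference (the mutual information), and the symbol error rate (SER).

```python
# Sketch with assumed values: entropies and symbol error rate for a binary,
# possibly asymmetric channel given by its transition matrix P, where
# P[x, y] = Pr[Y = y | X = x].
import numpy as np

def entropy(p):
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

px = np.array([0.5, 0.5])                 # input distribution p(x)
P = np.array([[0.95, 0.05],               # asymmetric example: Pr[Y=1 | X=0] = 0.05,
              [0.10, 0.90]])              #                     Pr[Y=0 | X=1] = 0.10

pxy = px[:, None] * P                     # joint distribution p(x, y)
py = pxy.sum(axis=0)                      # output distribution p(y)

H_Y = entropy(py)
H_Y_given_X = np.sum(px * np.array([entropy(P[x]) for x in range(2)]))
ser = pxy[0, 1] + pxy[1, 0]               # probability that a symbol is wrongly received

print(H_Y - H_Y_given_X, ser)             # mutual information (bits/symbol) and SER
```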
The concept of entropy is central to information theory (IT). The name, of Greek origin (entropia, tropos), means turning point or transformation. It was first coined in 1864 by the physicist R. Clausius, who postulated the second law of thermodynamics. Among other implications, this law establishes the impossibility of perpetual motion, and also that the entropy of a thermally isolated system (such as our Universe) can only increase. Because of its universal implications and its conceptual subtlety, the word entropy has always been enshrouded in some mystery, even today, to large and educated audiences.
The subsequent works of L. Boltzmann, which set the grounds of statistical mechanics, made it possible to provide further clarifications of the definition of entropy, as a natural measure of disorder. The precursors and founders of the later information theory (L. Szilárd, H. Nyquist, R. Hartley, J. von Neumann, C. Shannon, E. Jaynes, and L. Brillouin) drew as many parallels between the measure of information (the uncertainty in communication-source messages) and physical entropy (the disorder or chaos within material systems). Comparing information with disorder is not at all intuitive. This is because information (as we conceive it) is pretty much the conceptual opposite of disorder! Even more striking is the fact that the respective formulations for entropy that have been successively made in physics and IT happen to match exactly. A legend has it that Shannon chose the word “entropy” from the following advice of his colleague von Neumann: “Call it entropy.
We now turn to the problem of how to convert the discrete, linguistic, word-based representation generated by the text-analysis system into a continuous acoustic waveform. One of the primary difficulties in this task stems from the fact that the two representations are so different in nature. The linguistic description is discrete, the same for each speaker for a given accent, compact and minimal. By contrast, the acoustic waveform is continuous, is massively redundant, and varies considerably even between utterances with the same pronunciation from the same speaker. To help with the complexity of this transformation, we break the problem down into a number of components. The first of these components, pronunciation, is the subject of this chapter. While specifics vary, this can be thought of as a system that takes the word-based linguistic representation and generates a phonemic or phonetic description of what is to be spoken by the subsequent waveform-synthesis component. In generating this representation, we make use of a lexicon, to find the pronunciations of words we know and can store, and a grapheme-to-phoneme (G2P) algorithm, to guess the pronunciations of words we don't know or can't store. After doing this we may find that simply concatenating the pronunciations for the words in the lexicon is not enough; words interact in a number of ways and so a certain amount of post-lexical processing is required. Finally, there is considerable choice in terms of how exactly we should specify the pronunciations for words, hence rigorously defining a pronunciation representation is in itself a key topic.
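As a toy sketch of the lexicon-plus-G2P arrangement just described, the snippet below looks a word up in a small lexicon and falls back to a deliberately naive letter-by-letter rule set when the word is unknown; the entries, phone symbols, and rules are invented purely for illustration and are not the book's data.

```python
# Toy sketch: lexicon lookup with a naive grapheme-to-phoneme (G2P) fallback.
# Lexicon entries and letter-to-phone rules are invented for illustration.
LEXICON = {
    "speech": ["s", "p", "iy", "ch"],
    "the": ["dh", "ax"],
}

G2P_RULES = {"s": ["s"], "p": ["p"], "e": ["eh"], "c": ["k"], "h": ["hh"], "t": ["t"]}

def pronounce(word):
    if word in LEXICON:                       # known word: use the stored pronunciation
        return LEXICON[word]
    phones = []                               # unknown word: guess letter by letter
    for letter in word:
        phones.extend(G2P_RULES.get(letter, []))
    return phones

print(pronounce("the"), pronounce("speech"), pronounce("tech"))
```

A real system would, of course, use context-dependent rules or a trained G2P model rather than single-letter substitutions, and would follow lookup with the post-lexical processing mentioned above.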
This chapter considers the continuous-channel case represented by the Gaussian channel, namely, a continuous communication channel with Gaussian additive noise. This will lead to a fundamental application of Shannon's coding theorem, referred to as the Shannon–Hartley theorem (SHT), another famous result of information theory, which also credits the earlier 1920 contribution of Ralph Hartley, who derived what remained known as the Hartley's law of communication channels. This theorem relates channel capacity to the signal and noise powers, in a most elegant and simple formula. As a recent and little-noticed development in this field, I will describe the nonlinear channel, where the noise is also a function of the transmitted signal power, owing to channel nonlinearities (an exclusive feature of certain physical transmission pipes, such as optical fibers). As we shall see, the modified SHT accounting for nonlinearity represents a major conceptual progress in information theory and its applications to optical communications, although its existence and consequences have, so far, been overlooked in textbooks. This chapter completes our description of classical information theory, as resting on Shannon's works and founding theorems. Upon completion, we will then be equipped to approach the field of quantum information theory, which represents the second part of this series of chapters.
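The "elegant and simple formula" referred to here is the standard Shannon–Hartley expression C = B log2(1 + S/N); the quick numeric illustration below uses assumed bandwidth and signal-to-noise values, chosen only for the example.

```python
# Shannon-Hartley capacity C = B * log2(1 + S/N); bandwidth and SNR are assumed values.
import numpy as np

def shannon_hartley_capacity(bandwidth_hz, snr_linear):
    return bandwidth_hz * np.log2(1 + snr_linear)

B = 1e6                                   # 1 MHz bandwidth (assumed)
snr_db = 20.0                             # 20 dB signal-to-noise ratio (assumed)
snr = 10 ** (snr_db / 10)

print(shannon_hartley_capacity(B, snr))   # ~6.66 Mbit/s
```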
Gaussian channel
Referring to Chapter 6, a continuous communications channel assumes a continuous originator source, X, whose symbol alphabet x1,…, xi can be viewed as representing time samples of a continuous, real variable x, which is associated with a continuous probability distribution function or PDF, p(x).