Speech-processing technology has been a mainstream area of research for more than 50 years. The ultimate goal of speech research is to build systems that mimic (or potentially surpass) human capabilities in understanding, generating and coding speech for a range of human-to-human and human-to-machine interactions.
In the area of speech coding a great deal of success has been achieved in creating systems that significantly reduce the overall bit rate of the speech signal (from around 100 kilobits per second down to 8 kilobits per second or less), while maintaining speech intelligibility and quality at levels appropriate for the intended applications. The heart of the modern cellular industry is the 8 kilobit per second speech coder, embedded in VLSI logic on the more than two billion cellphones in use worldwide at the end of 2007.
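To make these figures concrete, here is a back-of-the-envelope calculation of the compression ratio involved; the sampling rate and sample width are standard telephone-band values assumed for illustration, not quoted from the text above.

```python
# Rough bit-rate arithmetic for telephone-band speech (illustrative values).
sample_rate_hz = 8_000   # standard telephone-band sampling rate
bits_per_sample = 16     # linear PCM resolution

pcm_rate = sample_rate_hz * bits_per_sample  # 128,000 bit/s: "of the order of 100 kbit/s"
coded_rate = 8_000                           # the 8 kbit/s cellular speech coder

print(f"compression ratio: {pcm_rate // coded_rate}:1")  # -> 16:1
```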
In the area of speech recognition and understanding by machines, steady progress has enabled systems to become part of everyday life in the form of call centres for the airline, financial, medical and banking industries, help desks for large businesses, form and report generation for the legal and medical communities, and dictation machines that enable individuals to enter text into machines without having to type the text explicitly.
This chapter is concerned with the measure of information contained in qubits. This can be done only through quantum measurement, an operation that has no counterpart in the classical domain. I shall first describe in detail the case of single-qubit measurements, which shows under which measurement conditions “classical” bits can be retrieved. Next, we consider measurements of higher-order states, or n-qubits. Particular attention is given to the Einstein–Podolsky–Rosen (EPR) or Bell states, which, unlike other joint tensor states, are shown to be entangled. The various single-qubit measurement outcomes from the EPR–Bell states illustrate an effect of causality on the information concerning the other qubit. We then focus on the technique of Bell measurement, which makes it possible to know which Bell state is being measured, yielding two classical bits as the outcome. The property of EPR–Bell state entanglement is exploited in the principle of quantum superdense coding, which makes it possible to transmit classical bits at twice the classical rate, namely through the generation and measurement of a single qubit. Another key application concerns quantum teleportation. It consists of the transmission of quantum states over arbitrary distances, by means of a common EPR–Bell state resource shared by the two channel ends. While quantum teleportation of a qubit is instantaneous, owing to the effect of quantum-state collapse, it is shown that its completion does require the communication of two classical bits, which is itself limited by the speed of light.
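As a minimal numerical illustration of the entanglement property discussed above, the following sketch builds the four EPR–Bell states with NumPy and traces out one qubit; the helper names are my own, not from the chapter. That every reduced state comes out as the maximally mixed state I/2, so each single-qubit outcome is equally likely, is precisely the signature of entanglement.

```python
import numpy as np

ket0 = np.array([1.0, 0.0])
ket1 = np.array([0.0, 1.0])

# The four EPR-Bell states, built as superpositions of tensor products
phi_plus  = (np.kron(ket0, ket0) + np.kron(ket1, ket1)) / np.sqrt(2)
phi_minus = (np.kron(ket0, ket0) - np.kron(ket1, ket1)) / np.sqrt(2)
psi_plus  = (np.kron(ket0, ket1) + np.kron(ket1, ket0)) / np.sqrt(2)
psi_minus = (np.kron(ket0, ket1) - np.kron(ket1, ket0)) / np.sqrt(2)

def reduced_first_qubit(state):
    """Partial trace over the second qubit of a two-qubit pure state."""
    rho = np.outer(state, state.conj()).reshape(2, 2, 2, 2)
    return np.einsum('abcb->ac', rho)

# Every Bell state reduces to the maximally mixed state I/2: a single-qubit
# measurement on either half yields 0 or 1 with equal probability, which is
# the signature of entanglement (a product state would reduce to a pure state).
for name, s in [('Phi+', phi_plus), ('Phi-', phi_minus),
                ('Psi+', psi_plus), ('Psi-', psi_minus)]:
    print(name, np.round(reduced_first_qubit(s), 3).tolist())
```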
This chapter will take us into a world very different from all that we have seen so far concerning Shannon's information theory. As we shall see, it is a strange world made of virtual computers (universal Turing machines) and abstract axioms that can be demonstrated without mathematics merely by the force of logic, as well as relatively involved formalism. If the mere evocation of Shannon, of information theory, or of entropy may raise eyebrows in one's professional circle, how much more so that of Kolmogorov complexity! This chapter will remove some of the mystery surrounding “complexity,” also called “algorithmic entropy,” without pretending to uncover it all. Why address such a subject right here, in the middle of our description of Shannon's information theory? Because, as we shall see, algorithmic entropy and Shannon entropy meet conceptually at some point, to the extent of being asymptotically bounded, even if they come from totally uncorrelated basic assumptions! This remarkable convergence between fields must make integral part of our IT culture, even if this chapter will only provide a flavor. It may be perceived as being somewhat more difficult or demanding than the preceding chapters, but the extra investment, as we believe, is well worth it. In any case, this chapter can be revisited later on, should the reader prefer to keep focused on Shannon's theory and move directly to the next stage, without venturing into the intriguing sidetracks of algorithmic information theory.
The task of text decoding is to take a tokenised sentence and determine the best sequence of words. In many situations this is a classical disambiguation problem: there is one, and only one, correct sequence of words that gave rise to the text, and it is our job to determine this. In other situations, especially where we are dealing with non-natural-language text such as numbers and dates and so on, there may be a few different acceptable word sequences.
So, in general, text decoding in TTS is a process of resolving ambiguity. The ambiguity arises because two or more underlying forms share the same surface form, and, given the surface form (i.e. the writing), we need to find which of the underlying forms is the correct one. There are many types of linguistic ambiguity, including word identity, grammatical and semantic ambiguity, but in TTS we need only concentrate on the type of ambiguity that affects the actual sound produced. So, while two distinct words share the orthographic form bank, both sound the same, so we can ignore this type of ambiguity for TTS purposes. Tokens such as record can be pronounced in two different ways, so this is the type of ambiguity we need to resolve.
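A toy sketch of this kind of homograph resolution follows; the context rule and the ARPAbet-style phone strings are simplified stand-ins assumed purely for illustration, not a production TTS lexicon or algorithm.

```python
# A toy homograph disambiguator for "record": picks a pronunciation
# from local context.

PRONUNCIATIONS = {
    ("record", "noun"): "R EH1 K ER0 D",    # stress on the first syllable
    ("record", "verb"): "R IH0 K AO1 R D",  # stress on the second syllable
}

def guess_pos(prev_word):
    """Crude context rule: a determiner signals a noun, 'to' a verb."""
    w = prev_word.lower()
    if w in {"a", "an", "the", "this", "that"}:
        return "noun"
    if w == "to":
        return "verb"
    return "noun"  # default guess

def pronounce(tokens):
    out = []
    for i, tok in enumerate(tokens):
        pos = guess_pos(tokens[i - 1]) if i > 0 else "noun"
        out.append(PRONUNCIATIONS.get((tok.lower(), pos), tok))
    return out

print(pronounce("I want to record a record".split()))
```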
In this chapter, we concentrate on resolving ambiguity relating to the verbal component of language.
A study of human hearing and the biomechanical processes involved reveals several nonlinear steps, or stages, in the perception of sound. Each of these stages contributes to the eventual divergence between the subjective features of a sound and its purely physical ones in human hearing.
Put simply, what we think we hear is quite significantly different from the physical sounds that may be present (which in turn differs from what would be captured electronically by, for example, a computer). By taking into account the various nonlinearities in the hearing process, and some of the basic physical characteristics of the ear, nervous system, and brain, it is possible to account for the discrepancy.
Over the years, science and technology have incrementally improved our ability to model the hearing process from purely physical data. One simple example is that of A-law compression (or the similar μ-law used in some regions of the world), where approximately logarithmic amplitude quantisation replaces the linear quantisation of PCM: humans tend to perceive amplitude logarithmically rather than linearly, and thus A-law quantisation using 8 bits sounds better than linear PCM quantisation using 8 bits. It therefore achieves a higher degree of subjective speech quality than PCM for a given bit rate.
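The continuous A-law characteristic is easy to write down; the sketch below applies it before a uniform 8-bit quantiser, so that small amplitudes receive proportionally finer resolution. The function names and the demo values are mine; only the A = 87.6 companding law itself is standard (ITU-T G.711).

```python
import numpy as np

A = 87.6  # standard A-law parameter (ITU-T G.711)

def alaw_compress(x):
    """Continuous A-law companding of samples x in [-1, 1]."""
    x = np.asarray(x, dtype=float)
    ax = np.abs(x)
    small = A * ax / (1.0 + np.log(A))
    large = (1.0 + np.log(np.maximum(A * ax, 1e-300))) / (1.0 + np.log(A))
    return np.sign(x) * np.where(ax < 1.0 / A, small, large)

def quantise(y, bits=8):
    """Uniform quantisation of companded samples to 2**bits levels."""
    levels = 2 ** bits - 1
    return np.round((y + 1.0) / 2.0 * levels) / levels * 2.0 - 1.0

# A quiet sample of 0.001 is mapped to about 0.016 before quantisation,
# giving it roughly 16x more of the quantiser's range than linear PCM would.
print(alaw_compress(0.001))            # ~0.016
print(quantise(alaw_compress(0.001)))  # survives 8-bit quantisation
```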
Physical processes
The ear, as shown diagrammatically in Figure 4.1, includes the pinna which filters sound and focuses it into the external auditory canal. Sound then acts upon the eardrum where it is transmitted and amplified by the three bones, the malleus, incus and stapes, to the oval window, opening on to the cochlea.
This chapter introduces the notion of noisy quantum channels, and the different types of “quantum noise” that affect qubit messages passed through such channels. The main types of noisy channel reviewed here are the depolarizing, bit-flip, phase-flip, and bit-phase-flip channels. Then the quantum channel capacity χ is defined through the Holevo–Schumacher–Westmoreland (HSW) theorem. This theorem can conceptually be viewed as the elegant quantum counterpart of Shannon's (noisy) channel coding theorem, which was described in Chapter 13. Here, I shall not venture into the complex proof of the HSW theorem but only provide a background illustrating the similarity with its classical counterpart. The resemblances to the channel capacity χ and the Holevo bound, as described in Chapter 21, and to the classical mutual information H(X; Y), as described in Chapter 5, are both discussed. For advanced reference, a hint is provided as to the meaning of the still not fully explored concept of quantum coherent information. Several examples of quantum channel capacity, derived from direct applications of the HSW theorem, along with the solution of the maximization problem, are provided.
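The channels named above all have simple operator-sum forms, and a few lines of NumPy suffice to watch their effect on a density matrix. The sketch below is my own illustration, not taken from the chapter, and uses the standard single-Pauli representations of these channels.

```python
import numpy as np

# Pauli matrices
I2 = np.eye(2)
X = np.array([[0, 1], [1, 0]])
Y = np.array([[0, -1j], [1j, 0]])
Z = np.array([[1, 0], [0, -1]])

def bit_flip(rho, p):
    """Apply X (a bit flip) with probability p."""
    return (1 - p) * rho + p * X @ rho @ X

def phase_flip(rho, p):
    """Apply Z (a phase flip) with probability p."""
    return (1 - p) * rho + p * Z @ rho @ Z

def bit_phase_flip(rho, p):
    """Apply Y (a combined bit and phase flip) with probability p."""
    return (1 - p) * rho + p * Y @ rho @ Y.conj().T

def depolarizing(rho, p):
    """Replace the state with the maximally mixed state I/2 with probability p."""
    return (1 - p) * rho + p * I2 / 2

# Send |+> = (|0> + |1>)/sqrt(2) through the phase-flip channel:
plus = np.array([1.0, 1.0]) / np.sqrt(2)
rho = np.outer(plus, plus)
print(np.round(phase_flip(rho, 0.25), 3))  # off-diagonal coherences shrink
```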
Noisy quantum channels
The notion of “noisiness” in a classical communication channel was first introduced in Chapter 12, when describing channel entropy. Such a channel can be viewed schematically as a probabilistic relation between two random sources, X for the originator, and Y for the recipient.
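For a concrete instance of such a probabilistic relation, consider the binary symmetric channel: a transition-probability matrix p(y|x) linking the originator X to the recipient Y. The crossover value below is arbitrary, chosen only to make the example run.

```python
import numpy as np

eps = 0.1  # crossover probability (arbitrary, for illustration)

# Transition matrix p(y|x): rows index the input x, columns the output y.
P = np.array([[1 - eps, eps],
              [eps, 1 - eps]])

px = np.array([0.5, 0.5])  # originator X, uniformly distributed
py = px @ P                # recipient Y: p(y) = sum_x p(x) p(y|x)
print(py)                  # -> [0.5 0.5]
```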
This chapter walks us through a few preliminary, but decisive, steps towards quantum information theory (QIT), which will be the focus of the rest of this book. Here, we shall remain in the classical world, yet get a hint that it is possible to think of a different world where computations may be reversible, namely, without any loss of information. One key realization through this paradigm shift is that “information is physical.” As we shall see, such a nonintuitive and striking conclusion actually results from the age-long paradox of Maxwell's demon in thermodynamics, which eventually found an elegant resolution in Landauer's principle. This principle states that the erasure of a single bit of information requires one to provide an energy that is proportional to log 2, which, as we know from Shannon's theory, is the measure of information and also the entropy of a two-level system with a uniformly distributed source. This consideration brings up the issue of irreversible computation. Logic gates, used at the heart of the CPU in modern computers, are based on such irreversible computation. I shall then describe the computers' von Neumann architecture, the intimate workings of the ALU processing network, and the elementary logic gates on which the ALU is based. This will also provide some basics of Boolean logic, expanding on Chapter 1, which is the key to the following logic-gate concepts.
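The irreversibility at stake here is easy to exhibit in code: a NAND gate merges distinct inputs into one output, while a reversible gate such as the Toffoli (CCNOT) gate permutes its inputs and so erases nothing. The sketch below is an illustration of this contrast under my own naming, not an excerpt from the chapter.

```python
from itertools import product

def nand(a, b):
    """Irreversible: distinct inputs collapse onto the same output."""
    return 1 - (a & b)

def toffoli(a, b, c):
    """Reversible CCNOT gate: flips c when a = b = 1.
    With the ancilla c = 1, its third output equals NAND(a, b)."""
    return (a, b, c ^ (a & b))

# NAND sends three of the four input pairs to the same value 1, so the
# inputs cannot be recovered from the output: information is erased.
print({(a, b): nand(a, b) for a, b in product((0, 1), repeat=2)})

# The Toffoli gate, by contrast, is a permutation of the 8 input triples:
# nothing is erased, so no Landauer energy cost is incurred.
outputs = {toffoli(a, b, c) for a, b, c in product((0, 1), repeat=3)}
print(len(outputs))  # -> 8, a bijection
```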