Multimedia Computing

Gerald Friedland; Ramesh Jain

doi:10.1017/CBO9781139049351

Chapter 14: Speech Compression

pp. 174-187

Gerald Friedland, International Computer Science Institute, Berkeley, California

Ramesh Jain, University of California, Irvine

Summary

While the compression techniques presented so far have assumed generic acoustic or visual content, this chapter presents lossy compression techniques designed specifically for one type of acoustic data: human speech. Almost every human being on earth talks virtually every day, so there is a lot of captured digital speech content. Every movie or TV show contains an audio track, most of which usually consists of spoken language. The most important use of captured speech, however, is communication, such as in cell phones, voice-over-IP applications, or video conferencing and meeting recordings. Most of the compression concepts discussed so far also work on speech. The algorithms presented in this chapter go further: they were developed to achieve a higher compression ratio while preserving higher perceptual quality by exploiting speech-specific properties of the audio signal. We discussed human speech in Chapter 5; this chapter builds on that knowledge and goes directly into the algorithms.

Properties of a Speech Coder

As explained in Chapter 5, the properties of every sound are defined by the properties of the objects that create the sound, by the environment that the sound waves travel in, and by the characteristics of the receiver and/or capturing device. The object that creates human speech is the vocal tract. Vocal tracts also exist in animals, such as birds or cats. As we all know, the sounds they produce differ substantially from average human speech, so a birdsong compression or cat's meow encoding algorithm would also have to be substantially different. The following algorithms all try to exploit the characteristics of speech and have very limited applicability to music or other nonspeech audio. However, all of them are of importance to multimedia computing because millions of people use them in everyday life.

About the book

Book DOI: https://doi.org/10.1017/CBO9781139049351
Subjects: Communications and Signal Processing, Computer Science, Engineering, Robotics, Vision, and Graphics
Format: Hardback
- Publication date: 28 July 2014
- ISBN: 9780521764513
Format: Digital
- Publication date: 12 September 2018
- ISBN: 9781139049351