9 - Speech recognition

Published online by Cambridge University Press:  05 June 2016

Ian Vince McLoughlin
Affiliation: University of Kent

Summary

Having considered big data in the previous chapter, we now turn our attention to speech recognition – probably the one area of speech research that has gained the most from machine learning techniques. In fact, as discussed in the introduction to Chapter 8, it was only through the application of well-trained machine learning methods that automatic speech recognition (ASR) technology was able to advance beyond a decades-long plateau that had limited performance, and hence the spread of further applications.

What is speech recognition?

Entire texts have been written on the subject of speech recognition, and this topic alone probably accounts for more than half of the recent research literature and computational development effort in the fields of speech and audio processing. There are good reasons for this interest, primarily driven by the wish to be able to communicate more naturally with a computer (i.e. without the use of a keyboard and mouse). This wish has been around for almost as long as electronic computers have been with us. From a historical perspective we might consider identifying a hierarchy of mainstream human–computer interaction steps as follows:

Hardwired: The computer designer (i.e. engineer) ‘reprograms’ a computer, and provides input by reconnecting wires and circuits.

Card: Punched cards are used as input, printed tape as output.

Paper: Teletype input is used directly, and printed paper as output.

Alphanumeric: Electronic keyboards and monitors (visual display units), alphanumeric data.

Graphical: Mice and graphical displays enable the rise of graphical user interfaces (GUIs).

WIMP: Standardised methods of windows, icons, menus and pointer (WIMP) interaction become predominant.

Touch: Touch-sensitive displays, particularly on smaller devices.

Speech commands: Nascent speech commands (such as voice dialling, voice commands, speech alerts), plus workable dictation capabilities and the ability to read back selected text.

Natural language: We speak to the computer much as we would to a person, and it responds similarly.

Anticipatory: The computer understands us when we speak to it, just as a close friend, husband or wife would, often anticipating what we will say and understanding the implied context as well as shared references or memories of past events.

Type: Chapter
Information: Speech and Audio Processing: A MATLAB-based Approach, pp. 267–313
Publisher: Cambridge University Press
Print publication year: 2016