from Part I - General discussion
Published online by Cambridge University Press: 05 August 2015
This chapter focuses on basic statistical models (Gaussian mixture models (GMM), hidden Markov models (HMM),n–gram models and latent topic models), which are widely used in speech and language processing. These are well-known generative models, and these probabilistic models can generate speech and language features based on their likelihood functions. We also provide parameter-learning schemes based on maximum likelihood (ML) estimation which is derived according to the expectation and maximization (EM) algorithm (Dempster et al. 1976). Basically, the following chapters extend these statistical models from ML schemes to Bayesian schemes. These models are fundamental for speech and language processing.We specifically build an automatic speech recognition (ASR) system based on these models and extend them to deal with different problems in speaker clustering, speech verification, speech separation and other natural language processing systems.
In this chapter, Section 3.1 first introduces the probabilistic approach to ASR, which aims to find the most likely word sequence W corresponding to the input speech feature vectors O. Bayes decision theory provides a theoretical solution to build up a speech recognition system based on the posterior distribution of the word sequence p(W|O) given speech feature vectors O. Then the Bayes theorem decomposes the problem based on p(W|O) into two problems based on two generative models of speech features p(O|W) (acoustic model) and language features p(W) (language model), respectively. Therefore, the Bayes theorem changes the original problem to these two independent generative model problems.
Next, Section 3.2 introduces the HMM with the corresponding likelihood function as a generative model of speech features. The section first describes the discrete HMM, which has a multinomial distribution as a state observation distribution, and Section 3.2.4 introduces the GMM as a state observation distribution of the continuous density HMM for acoustic modeling. The GMM by itself is also used as a powerful statistical model for other speech processing approaches in the later chapters. Section 3.3 provides the basic algorithms of forward–backward and Viterbi algorithms. In Section 3.4, ML estimation of HMM parameters is derived according to the EM algorithm to deal with latent variables included in the HMM efficiently. Thus, we provide the conventional ML treatment of basic statistical models for acoustic models based on the HMM.
To save this book to your Kindle, first ensure no-reply@cambridge.org is added to your Approved Personal Document E-mail List under your Personal Document Settings on the Manage Your Content and Devices page of your Amazon account. Then enter the ‘name’ part of your Kindle email address below. Find out more about saving to your Kindle.
Note you can select to save to either the @free.kindle.com or @kindle.com variations. ‘@free.kindle.com’ emails are free but can only be saved to your device when it is connected to wi-fi. ‘@kindle.com’ emails can be delivered even when you are not connected to wi-fi, but note that service fees apply.
Find out more about the Kindle Personal Document Service.
To save content items to your account, please confirm that you agree to abide by our usage policies. If this is the first time you use this feature, you will be asked to authorise Cambridge Core to connect with your account. Find out more about saving content to Dropbox.
To save content items to your account, please confirm that you agree to abide by our usage policies. If this is the first time you use this feature, you will be asked to authorise Cambridge Core to connect with your account. Find out more about saving content to Google Drive.