Asymptotic approximation is also well known in practical Bayesian approaches (De Bruijn 1970) for approximately obtaining posterior distributions. For example, as we discussed in Chapter 2, the posterior distribution of a model parameter p(Θ|O) and that of a model p(M|O) given observations O = {oₜ ∈ ℝᴰ | t = 1, …, T} are usually difficult to compute analytically. The asymptotic approach assumes that we have enough data (i.e., that T is sufficiently large), and this assumption makes Bayesian inference mathematically tractable. As particular examples of asymptotic approximations, we introduce the Laplace approximation and the Bayesian information criterion, which are widely used in speech and language processing.
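To make the difficulty concrete, Bayes' rule (a standard identity, not specific to this book) expresses the parameter posterior as

p(Θ|O) = p(O|Θ) p(Θ) / ∫ p(O|Θ) p(Θ) dΘ.

The normalizing integral in the denominator, and likewise the marginal likelihood p(O|M) = ∫ p(O|Θ, M) p(Θ|M) dΘ appearing in p(M|O), rarely has a closed form; this is exactly what the asymptotic approximations below sidestep.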
The Laplace approximation is used to approximate a complex distribution by a Gaussian distribution (Kass & Raftery 1995, Bernardo & Smith 2009). It assumes that the posterior distribution is sharply peaked around its maximum value, which corresponds to the mode of the posterior distribution. The posterior distribution is then modeled as a Gaussian whose mean is this mode and whose covariance is given by the curvature of the log posterior at the mode. With this approximation, we can obtain posterior distributions analytically to some extent. Section 6.1 first explains the Laplace approximation in general. In Sections 6.3 and 6.4 we then discuss using the Laplace approximation to analytically obtain Bayesian predictive distributions for acoustic modeling, and to provide a Bayesian extension of successful neural-network-based acoustic models, respectively.
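As a preview of Section 6.1, the following minimal Python sketch illustrates the idea under simple assumptions: an unnormalized log posterior (here an illustrative Gamma-shaped toy example, not a model from this book) is maximized to find the mode, and the curvature there gives the variance of the approximating Gaussian.

import numpy as np
from scipy.optimize import minimize

def log_posterior(theta):
    # Illustrative unnormalized log posterior with a Gamma(3, 1) shape,
    # theta > 0; its true mode is 2 (an assumed toy example).
    return 2.0 * np.log(theta) - theta

# Step 1: find the mode (the MAP estimate) by minimizing the
# negative log posterior.
res = minimize(lambda t: -log_posterior(t[0]), x0=[1.0], bounds=[(1e-6, None)])
mode = res.x[0]

# Step 2: curvature of the negative log posterior at the mode
# (central finite difference).
eps = 1e-5
curv = -(log_posterior(mode + eps) - 2 * log_posterior(mode)
         + log_posterior(mode - eps)) / eps**2

# Step 3: Laplace approximation p(theta|O) ≈ N(theta | mode, 1/curv).
print(f"mode = {mode:.4f}, variance = {1.0 / curv:.4f}")  # expected: 2.0, 2.0

In the multivariate case the scalar curvature becomes the Hessian of the negative log posterior evaluated at the mode, and its inverse serves as the covariance matrix of the approximating Gaussian.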
Another example of asymptotic approximation is the Bayesian information criterion, or Schwarz criterion (Schwarz 1978). It approximates the posterior distribution of a model p(M|O) with a simple equation; since it too assumes the large sample case, it is likewise an instance of asymptotic approximation. Section 6.2 explains the Bayesian information criterion in general; it is used for model selection problems in speech processing. For example, Section 6.5 discusses selecting an appropriate model structure for hidden Markov models, and Section 6.6 discusses estimating the number of speakers and detecting speech segments in conversations by regarding these tasks as model selection problems.
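To preview how the criterion is used in Sections 6.5 and 6.6, the sketch below applies a widely used ΔBIC-based speaker change test (the data, penalty weight lam, and function names are illustrative assumptions, not this book's implementation): a window of feature vectors is modeled either by one Gaussian or by two Gaussians split at a candidate frame, and the penalized log-likelihood gain decides between the two hypotheses.

import numpy as np

def delta_bic(X, t, lam=1.0):
    # ΔBIC for a candidate change point at frame t of X (N frames x d dims):
    # positive values favor two Gaussians (a speaker change) over one.
    N, d = X.shape
    logdet = lambda Z: np.linalg.slogdet(np.cov(Z, rowvar=False, bias=True))[1]
    # Penalty: extra mean and full-covariance parameters introduced by the split.
    penalty = 0.5 * (d + 0.5 * d * (d + 1)) * np.log(N)
    return 0.5 * (N * logdet(X) - t * logdet(X[:t])
                  - (N - t) * logdet(X[t:])) - lam * penalty

# Toy data: two "speakers" with different statistics; true change at frame 200.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (200, 13)),
               rng.normal(2.0, 1.5, (200, 13))])
print(delta_bic(X, 200))  # large positive: a change point is hypothesized here
print(delta_bic(X, 100))  # markedly smaller for a misplaced split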
Laplace approximation
This section describes the basic theory of the Laplace approximation. We first consider a simple case where the model does not have latent variables. We focus on the posterior distributions of model parameters, but the approximation can be applied to other continuous random variables as well.