Bayesian approaches to acoustic modeling: a review

Published online by Cambridge University Press: 06 December 2012

Shinji Watanabe*
Affiliation:
Mitsubishi Electric Research Laboratories (MERL), 201 Broadway, Cambridge, MA 02139, USA.
Atsushi Nakamura
Affiliation:
NTT Communication Science Laboratories, NTT Corporation, 2-4, Hikaridai, Seika-cho, Soraku-gun, Kyoto 619-0237, Japan.
*Corresponding author: Shinji Watanabe. Email: watanabe@merl.com

Abstract

This paper focuses on applications of Bayesian approaches to acoustic modeling for speech recognition and related speech-processing applications. Bayesian approaches have been widely studied in the fields of statistics and machine learning, and one of their advantages is that their generalization capability is better than that of conventional approaches (e.g., maximum likelihood). On the other hand, since inference in Bayesian approaches involves integrals and expectations that are mathematically intractable in most cases and require heavy numerical computations, it is generally difficult to apply them to practical speech recognition problems. However, there have been many such attempts, and this paper aims to summarize these attempts to encourage further progress on Bayesian approaches in the speech-processing field. This paper describes various applications of Bayesian approaches to speech processing in terms of the four typical ways of approximating Bayesian inferences, i.e., maximum a posteriori approximation, model complexity control using a Bayesian information criterion based on asymptotic approximation, variational approximation, and Markov chain Monte Carlo-based sampling techniques.

Information

Type
Overview Paper
Creative Commons
The online version of this article is published within an Open Access environment subject to the conditions of the Creative Commons Attribution-NonCommercial-ShareAlike license. The written permission of Cambridge University Press must be obtained for commercial re-use.
Copyright
Copyright © The Authors, 2012.

Table 1. Comparison of VBEC and other Bayesian frameworks in terms of Bayesian advantages.

Fig. 1. Superiority of VBEC-based acoustic model construction for a small amount of training data.

Fig. 2. Robust classification based on marginalization effect.

Table 2. Automatic determination of acoustic model topology.

Table 3. Technical trend of speech recognition using VB.

Fig. 3. Graphical representation of multi-scale mixture model.

Algorithm 1. Gibbs sampling-based multi-scale mixture model.

Fig. 4. Diarization error rate for NTT meeting data.

Table 4. Comparison of MCMC and VB for speaker clustering.