
Optimized wavelet-domain filtering under noisy and reverberant conditions

Published online by Cambridge University Press:  27 July 2015

Randy Gomez*
Affiliation:
Honda Research Institute Co., Ltd., Wako-shi, Saitama 351-0188, Japan
Tatsuya Kawahara
Affiliation:
Kyoto University, ACCMS, Sakyo-ku, Kyoto 606-8501, Japan
Kazuhiro Nakadai
Affiliation:
Honda Research Institute Co., Ltd., Wako-shi, Saitama 351-0188, Japan
*Corresponding author: R. Gomez. Email: r.gomez@jp.honda-ri.com

Abstract

This paper presents a robust wavelet-based speech enhancement method for automatic speech recognition (ASR) in reverberant and noisy conditions. We propose a novel scheme for improving the speech, late reflection, and noise power estimates obtained from the observed contaminated signal. The improved estimates are used to calculate the Wiener gain for filtering out the late reflections and additive noise. In the proposed scheme, the wavelet family and its parameters are optimized using an acoustic model (AM). In the offline mode, the optimal wavelet family is selected separately for the speech, late reflections, and background noise based on the AM likelihood. The parameters of the selected wavelet family are then optimized specifically for each signal subspace. As a result, we obtain wavelets that are each sensitive to the speech, the late reflections, or the additive noise, and that can independently and accurately estimate these signals directly from an observed contaminated signal. During speech recognition, the most suitable wavelet is identified from the pre-stored wavelets, and wavelet-domain filtering is applied to the noisy and reverberant speech signal. Experimental evaluations using real reverberant data demonstrate the effectiveness and robustness of the proposed method.
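The filtering step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the common Wiener gain form (speech power divided by the sum of speech, late reflection, and noise power), uses a fixed one-level Haar transform in place of the optimized wavelets, and takes the per-band power estimates as given inputs. All function and parameter names are hypothetical.

```python
import math

def haar_dwt(x):
    """One-level Haar DWT of an even-length sequence: returns (approx, detail)."""
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x), 2)]
    return approx, detail

def haar_idwt(approx, detail):
    """Inverse of haar_dwt: interleaves the reconstructed sample pairs."""
    s = math.sqrt(2.0)
    x = []
    for a, d in zip(approx, detail):
        x.append((a + d) / s)
        x.append((a - d) / s)
    return x

def wiener_gain(speech_pow, late_pow, noise_pow, floor=1e-12):
    """Assumed gain form: S / (S + R + N), with a floor to avoid division by zero."""
    denom = speech_pow + late_pow + noise_pow
    return speech_pow / max(denom, floor)

def enhance(x, speech_pow, late_pow, noise_pow):
    """Scale each Haar band by its Wiener gain, then reconstruct.

    Each *_pow argument is a hypothetical (approx_band, detail_band) pair
    of power estimates; how they are obtained is outside this sketch.
    """
    approx, detail = haar_dwt(x)
    g_a = wiener_gain(speech_pow[0], late_pow[0], noise_pow[0])
    g_d = wiener_gain(speech_pow[1], late_pow[1], noise_pow[1])
    return haar_idwt([g_a * a for a in approx], [g_d * d for d in detail])
```

When the late reflection and noise power estimates are zero, the gain is 1 in both bands and `enhance` returns the input unchanged; larger interference estimates attenuate the corresponding band.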

Information

Type
Original Paper
Creative Commons
CC BY
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/3.0/), which permits unrestricted re-use, distribution, and reproduction in any medium, provided the original work is properly cited.
Copyright
Copyright © The Authors, 2015

Fig. 1. Overview of the enhancement model.

Fig. 2. Wavelet parameter optimization scheme for speech, late reflection and background noise.

Fig. 3. Noise profile and reverberation time identification.

Fig. 4. Combined noise and late reflection power tracking.

Table 1. GMM classification performance.

Fig. 5. ASR performance in word accuracy (averaged over all types of noise: Mall, Hall, Crowd, Office, Vacuum cleaner, and Computer noise).

Table 2. Performance in word accuracy (%) attributed to the series of optimizations.

Fig. 6. Robustness to noise types that are not enrolled in the profile database (results averaged over 20, 10, and 0 dB SNR).