
Combining augmented statistical noise suppression and framewise speech/non-speech classification for robust voice activity detection

  • Yasunari Obuchi
Abstract

This paper proposes a new voice activity detection (VAD) algorithm based on statistical noise suppression and framewise speech/non-speech classification. Although many VAD algorithms that are robust in noisy environments have been developed, the most successful ones are related in some way to statistical noise suppression. Accordingly, we formulate our VAD algorithm as a combination of noise suppression and subsequent framewise classification. The noise suppression part is improved by introducing the idea that any unreliable frequency component should be removed and that the decision should be made using the remaining signal. This augmentation can be realized using a few additional parameters embedded in the gain-estimation process. The framewise classification part can be either model-less or model-based. A model-less classifier has the advantage that it can be applied to any situation, even if no training data are available. In contrast, a model-based classifier (e.g., a neural-network-based classifier) requires training data but tends to be more accurate. The accuracy of the proposed algorithm is evaluated using the public CENSREC-1-C framework and is shown to be superior to many existing algorithms.
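The abstract describes a two-stage structure: statistical noise suppression followed by a framewise speech/non-speech decision. As a rough illustration of that structure only, the sketch below uses the classical statistical model-based likelihood-ratio test of Sohn, Kim and Sung (1999), with a decision-directed a priori SNR estimate and a plain Wiener gain. It does not implement the paper's augmented gain estimation, whose additional parameters are not described in the abstract. The function name `sohn_style_vad`, the assumption that the noise spectrum `noise_psd` is known in advance, and the values of `alpha`, `threshold` and `xi_floor` are all hypothetical choices for illustration.

```python
import math
from typing import List, Sequence


def sohn_style_vad(
    power_frames: Sequence[Sequence[float]],
    noise_psd: Sequence[float],
    alpha: float = 0.98,
    threshold: float = 0.5,
    xi_floor: float = 1e-3,
) -> List[bool]:
    """Return one speech (True) / non-speech (False) decision per frame.

    power_frames: per-frame lists of |X_k|^2, one value per frequency bin.
    noise_psd:    noise power lambda_N(k) per bin; ASSUMED known and fixed here.
    alpha, threshold, xi_floor: hypothetical illustrative values, not tuned.
    """
    num_bins = len(noise_psd)
    # |A_hat_k(t-1)|^2: noise-suppressed power from the previous frame
    prev_clean = [0.0] * num_bins
    decisions = []
    for frame in power_frames:
        log_lr_sum = 0.0
        for k in range(num_bins):
            # a posteriori SNR
            gamma = frame[k] / noise_psd[k]
            # decision-directed a priori SNR estimate, floored to stay positive
            xi = alpha * prev_clean[k] / noise_psd[k] + (1.0 - alpha) * max(gamma - 1.0, 0.0)
            xi = max(xi, xi_floor)
            # per-bin log likelihood ratio under the Gaussian model
            log_lr_sum += gamma * xi / (1.0 + xi) - math.log(1.0 + xi)
            # noise suppression: Wiener gain (a simple stand-in for an MMSE amplitude estimator)
            gain = xi / (1.0 + xi)
            prev_clean[k] = gain * gain * frame[k]
        # model-less framewise decision: mean log-LR compared with a fixed threshold
        decisions.append(log_lr_sum / num_bins > threshold)
    return decisions
```

For example, with `noise_psd = [1.0] * 8`, a sequence of five frames of power 1.0 in every bin, then five of 20.0, then five of 1.0 is classified as five non-speech, five speech and five non-speech frames. The fixed threshold plays the role of a model-less classifier; replacing it with a classifier trained on per-frame features would correspond to the model-based option mentioned in the abstract. Sohn et al. also apply hangover smoothing across frames, which this sketch omits.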

Copyright
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution, and reproduction in any medium, provided the original work is properly cited.
Corresponding author: Y. Obuchi, Email: obuchiysnr@stf.teu.ac.jp


APSIPA Transactions on Signal and Information Processing
  • ISSN: 2048-7703
  • EISSN: 2048-7703
