Noise masking method based on an effective ratio mask estimation in Gammatone channels

  • Feng Bao (a1) and Waleed H. Abdulla (a1)
Abstract

In computational auditory scene analysis, accurate estimation of the binary mask or ratio mask plays a key role in noise masking. An inaccurate estimate often leads to artifacts and temporal discontinuity in the synthesized speech. To overcome this problem, we propose a new ratio mask estimation method based on Wiener filtering in each Gammatone channel. In reconstructing the Wiener filter, we use the relationship between the speech and noise power spectra in each Gammatone channel to build the objective function for the convex optimization of the speech power. To improve the accuracy of the estimate, the estimated ratio mask is further modified based on its adjacent time–frequency units and then smoothed by interpolating with the estimated binary masks. Objective tests, including signal-to-noise ratio improvement, spectral distortion and intelligibility, and a subjective listening test demonstrate the superiority of the proposed method over the reference methods.
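To make the idea of a Wiener-type ratio mask concrete, the following is a minimal sketch and not the authors' implementation. It assumes the speech and noise power in each Gammatone channel and frame are already available (the paper obtains the speech power by convex optimization, which is not reproduced here). The function names, the 0 dB binary-mask threshold, and the blending weight `alpha` are hypothetical choices made only for illustration.

```python
import math

EPS = 1e-12  # guards against division by zero and log of zero


def wiener_ratio_mask(speech_power, noise_power):
    """Wiener-type gain S / (S + N) for each time-frequency unit."""
    return [[s / (s + n + EPS) for s, n in zip(s_row, n_row)]
            for s_row, n_row in zip(speech_power, noise_power)]


def binary_mask(speech_power, noise_power, threshold_db=0.0):
    """1.0 where the local SNR in dB exceeds threshold_db, else 0.0.

    The 0 dB default is an assumption, not a value taken from the paper.
    """
    return [[1.0 if 10.0 * math.log10((s + EPS) / (n + EPS)) > threshold_db
             else 0.0
             for s, n in zip(s_row, n_row)]
            for s_row, n_row in zip(speech_power, noise_power)]


def blend(ratio, binary, alpha=0.5):
    """Linear interpolation of the two masks; alpha is a hypothetical weight."""
    return [[alpha * r + (1.0 - alpha) * b for r, b in zip(r_row, b_row)]
            for r_row, b_row in zip(ratio, binary)]


# Toy power values: rows are Gammatone channels, columns are frames.
speech = [[4.0, 1.0], [0.0, 9.0]]
noise = [[1.0, 1.0], [2.0, 1.0]]

rm = wiener_ratio_mask(speech, noise)
bm = binary_mask(speech, noise)
mask = blend(rm, bm)
```

In this sketch the ratio mask stays between 0 and 1 and attenuates each unit in proportion to its estimated noise share, while the binary mask makes a hard keep-or-discard decision. Interpolating the two is one simple way to smooth the ratio mask; the paper's own modification based on adjacent time–frequency units is not shown.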

Copyright
This is an Open Access article, distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives licence (http://creativecommons.org/licenses/by-nc-nd/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is unaltered and is properly cited. The written permission of Cambridge University Press must be obtained for commercial re-use or in order to create a derivative work.
Corresponding author
Corresponding author: F. Bao, Email: fbao026@aucklanduni.ac.nz
APSIPA Transactions on Signal and Information Processing
  • ISSN: 2048-7703
  • EISSN: 2048-7703
  • URL: /core/journals/apsipa-transactions-on-signal-and-information-processing