
A review of blind source separation methods: two converging routes to ILRMA originating from ICA and NMF

  • Hiroshi Sawada (a1), Nobutaka Ono (a2), Hirokazu Kameoka (a1), Daichi Kitamura (a3) and Hiroshi Saruwatari (a4)...
Abstract

This paper describes several important methods for the blind source separation of audio signals in an integrated manner. Two historically developed routes are featured. One started from independent component analysis (ICA) and evolved to independent vector analysis (IVA) by extending the notion of independence from a scalar to a vector. In the other route, nonnegative matrix factorization (NMF) has been extended to multichannel NMF (MNMF). As a convergence point of these two routes, independent low-rank matrix analysis (ILRMA) has been proposed, which integrates IVA and MNMF. All the objective functions in these methods are efficiently optimized by majorization-minimization algorithms with appropriately designed auxiliary functions. Experimental results for a simple two-source two-microphone case are given to illustrate the characteristics of the five methods (ICA, IVA, NMF, MNMF, and ILRMA).
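To make the majorization-minimization idea concrete, the following is a minimal sketch of single-channel NMF with the classic multiplicative updates, which can be derived by minimizing an auxiliary function that upper-bounds the objective. It is an illustration under stated assumptions, not the paper's algorithm: the squared Euclidean cost is chosen here for simplicity, and the function names (`nmf_mm`, `squared_error`), the random initialization, and the small constant `eps` are all hypothetical choices made for this example.

```python
import random


def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]


def squared_error(V, W, H):
    """Objective: squared Euclidean distance between V and the model W H."""
    WH = matmul(W, H)
    return sum((v - x) ** 2 for rv, rx in zip(V, WH) for v, x in zip(rv, rx))


def nmf_mm(V, rank, n_iter=50, seed=0, eps=1e-12):
    """Factorize nonnegative V as W H using multiplicative updates.

    Assumption for this sketch: squared Euclidean cost. Each update
    minimizes an auxiliary (majorizing) function of the cost, so the
    cost should not increase from one iteration to the next.
    """
    rng = random.Random(seed)
    n_rows, n_cols = len(V), len(V[0])
    # Strictly positive initial values keep every later iterate nonnegative.
    W = [[rng.uniform(0.1, 1.0) for _ in range(rank)] for _ in range(n_rows)]
    H = [[rng.uniform(0.1, 1.0) for _ in range(n_cols)] for _ in range(rank)]
    history = [squared_error(V, W, H)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H), element-wise.
        Wt = transpose(W)
        num = matmul(Wt, V)
        den = matmul(matmul(Wt, W), H)
        H = [[h * n / (d + eps) for h, n, d in zip(rh, rn, rd)]
             for rh, rn, rd in zip(H, num, den)]
        # W <- W * (V H^T) / (W H H^T), element-wise.
        Ht = transpose(H)
        num = matmul(V, Ht)
        den = matmul(W, matmul(H, Ht))
        W = [[w * n / (d + eps) for w, n, d in zip(rw, rn, rd)]
             for rw, rn, rd in zip(W, num, den)]
        history.append(squared_error(V, W, H))
    return W, H, history


# Toy nonnegative matrix standing in for a power spectrogram.
V_demo = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [3.0, 6.0, 9.5]]
W_demo, H_demo, history_demo = nmf_mm(V_demo, rank=2)
print("initial cost:", history_demo[0], "final cost:", history_demo[-1])
```

The recorded `history` lets one check the defining property of a majorization-minimization scheme, namely that the objective does not increase across iterations. Other cost functions and the multichannel models named in the abstract call for differently designed auxiliary functions, which this sketch does not cover.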

Copyright
This is an Open Access article, distributed under the terms of the Creative Commons Attribution-NonCommercial-ShareAlike licence (http://creativecommons.org/licenses/by-nc-sa/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the same Creative Commons licence is included and the original work is properly cited. The written permission of Cambridge University Press must be obtained for commercial re-use.
Corresponding author
Corresponding author: Hiroshi Sawada Email: sawada.hiroshi@lab.ntt.co.jp
APSIPA Transactions on Signal and Information Processing
  • ISSN: 2048-7703
  • EISSN: 2048-7703
  • URL: /core/journals/apsipa-transactions-on-signal-and-information-processing