A review of blind source separation methods: two converging routes to ILRMA originating from ICA and NMF

Published online by Cambridge University Press:  14 May 2019

Hiroshi Sawada*
Affiliation:
NTT Corporation, Tokyo, Japan
Nobutaka Ono
Affiliation:
Tokyo Metropolitan University, Hino, Japan
Hirokazu Kameoka
Affiliation:
NTT Corporation, Tokyo, Japan
Daichi Kitamura
Affiliation:
National Institute of Technology, Kagawa College, Takamatsu, Japan
Hiroshi Saruwatari
Affiliation:
The University of Tokyo, Tokyo, Japan
*Corresponding author: Hiroshi Sawada. Email: sawada.hiroshi@lab.ntt.co.jp

Abstract

This paper describes several important methods for the blind source separation of audio signals in an integrated manner. Two historically developed routes are featured. One started from independent component analysis (ICA) and evolved to independent vector analysis (IVA) by extending the notion of independence from a scalar to a vector. In the other route, nonnegative matrix factorization (NMF) has been extended to multichannel NMF (MNMF). As a convergence point of these two routes, independent low-rank matrix analysis (ILRMA) has been proposed, which integrates IVA and MNMF in a clever way. All the objective functions in these methods are efficiently optimized by majorization-minimization algorithms with appropriately designed auxiliary functions. Experimental results for a simple two-source two-microphone case are given to illustrate the characteristics of these five methods.

Information

Type
Overview Paper
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution-NonCommercial-ShareAlike licence (http://creativecommons.org/licenses/by-nc-sa/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the same Creative Commons licence is included and the original work is properly cited. The written permission of Cambridge University Press must be obtained for commercial re-use.
Copyright
Copyright © The Authors, 2019

Fig. 1. Various methods for blind audio source separation. Methods in blue are discussed in this paper in an integrated manner.

Fig. 2. Historical development of BSS methods.

Table 1. Notations.

Fig. 3. Tensor and sliced matrices.

Fig. 4. Independence in ICA and IVA.

Fig. 5. NMF as spectrogram model fitting.
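
As a minimal sketch of the model fitting named in Fig. 5, the following NumPy code fits a nonnegative matrix X (standing in for a power spectrogram) with the product TV of bases T and activations V. It assumes the Itakura-Saito divergence as the fitting criterion and the square-root multiplicative updates that a majorization-minimization derivation yields for it. The function names `is_divergence` and `is_nmf`, the random initialization, and the default parameters are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def is_divergence(X, Y):
    """Itakura-Saito divergence between strictly positive arrays X and Y."""
    R = X / Y
    return float(np.sum(R - np.log(R) - 1.0))


def is_nmf(X, n_bases, n_iter=100, seed=0, eps=1e-12):
    """Fit X (I x J, strictly positive) with T (I x K) @ V (K x J).

    Uses multiplicative updates with exponent 1/2, for which the
    Itakura-Saito cost is non-increasing at every update.
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = X.shape
    T = rng.random((n_freq, n_bases)) + eps   # bases t_ik
    V = rng.random((n_bases, n_frames)) + eps  # activations v_kj
    costs = []
    for _ in range(n_iter):
        Y = T @ V
        T *= np.sqrt(((X / Y**2) @ V.T) / ((1.0 / Y) @ V.T))
        Y = T @ V
        V *= np.sqrt((T.T @ (X / Y**2)) / (T.T @ (1.0 / Y)))
        costs.append(is_divergence(X, T @ V))
    return T, V, costs
```

Calling `is_nmf(X, n_bases=2)` on a strictly positive matrix returns the two factors and the cost after each iteration, so the monotone decrease can be checked directly. A small positive offset should be added to X beforehand if it may contain zeros.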

Fig. 6. Example of MNMF-learned spatial property. The left and middle plots show the learned complex arguments $\arg([\mathsf{H}_{ik}]_{12}), k=1,\ldots,10$, and $\arg([\mathsf{H}_{in}]_{12}), n=1,2$, respectively. The right figure illustrates the corresponding two-source two-microphone situation.

Fig. 7. ILRMA: unified method of IVA and NMF.

Fig. 8. Majorization-minimization: minimizing the auxiliary function indirectly minimizes the objective function.
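
The principle in Fig. 8 can be illustrated on a toy problem that is not taken from the paper: minimizing $f(x) = \sum_i \sqrt{(x - a_i)^2 + \varepsilon}$ over a scalar $x$. Because the square root is concave, its tangent line gives the quadratic auxiliary function $Q(x \mid x_k) = \sum_i \left[ \frac{(x - a_i)^2 + \varepsilon}{2 s_i} + \frac{s_i}{2} \right]$ with $s_i = \sqrt{(x_k - a_i)^2 + \varepsilon}$, which satisfies $Q \ge f$ with equality at $x = x_k$. Minimizing $Q$ in closed form therefore never increases $f$. The names `objective`, `mm_step`, and `mm_minimize` and the choice of toy objective are assumptions made for this sketch.

```python
import math


def objective(x, points, eps=1e-8):
    """Smoothed sum of absolute deviations from the given points."""
    return sum(math.sqrt((x - a) ** 2 + eps) for a in points)


def mm_step(x, points, eps=1e-8):
    """Minimize the quadratic auxiliary function built at the current x.

    The minimizer is a weighted mean with weights 1 / s_i.
    """
    weights = [1.0 / math.sqrt((x - a) ** 2 + eps) for a in points]
    return sum(w * a for w, a in zip(weights, points)) / sum(weights)


def mm_minimize(points, x0, n_iter=50, eps=1e-8):
    """Run n_iter MM steps from x0; return the final x and the cost history."""
    x = x0
    history = [objective(x, points, eps)]
    for _ in range(n_iter):
        x = mm_step(x, points, eps)
        history.append(objective(x, points, eps))
    return x, history
```

For example, `mm_minimize([0.0, 1.0, 2.0, 10.0, 11.0], x0=8.0)` moves toward the median of the points, and the returned history is non-increasing, which is the behavior the figure depicts.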

Fig. 9. Source images (leftmost column) and source estimates by ICA, IVA, and ILRMA (three columns on the right) whose scales were adjusted by projection back (PB). The first and second rows correspond to music and speech sources, respectively. The plots are spectrograms colored in log scale with large values being yellow. The ICA estimates were not well separated in a full-band sense (SDRs = 6.27 dB, 1.38 dB). The IVA estimates were well separated (SDRs = 13.52 dB, 8.79 dB). The ILRMA estimates were even better separated (SDRs = 16.78 dB, 12.33 dB). Detailed investigations are shown in Fig. 10.
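
The scale adjustment by projection back mentioned in Fig. 9 can be sketched as follows, assuming the determined case (as many sources as microphones) with one invertible demixing matrix per frequency bin. Each separated component is multiplied by the corresponding entry of the inverse demixing matrix, which maps it to its image at a reference microphone and cancels the arbitrary per-source scaling. The function name `projection_back`, the array layout, and the `ref_mic` parameter are assumptions made for illustration.

```python
import numpy as np


def projection_back(W, Y, ref_mic=0):
    """Rescale separated signals to their images at a reference microphone.

    W: demixing matrices, shape (n_freq, n_src, n_mic) with n_src == n_mic.
    Y: separated signals, shape (n_freq, n_frames, n_src).
    Returns an array shaped like Y whose n-th component is
    [W^{-1}]_{ref_mic, n} * y_n in every frequency bin.
    """
    A = np.linalg.inv(W)              # estimated mixing matrices per bin
    scale = A[:, ref_mic, None, :]    # shape (n_freq, 1, n_src)
    return scale * Y
```

Because the scaling is applied independently in each frequency bin, this step resolves the scale ambiguity only; it does not address the permutation ambiguity discussed in connection with ICA.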

Fig. 10. (Continued from Fig. 9) Source estimates and auxiliary variables of ICA, IVA, and ILRMA. The source estimates $y_{ij,n}$ were not scale-adjusted, and had direct links to the auxiliary variables. The ICA estimates were not well separated because there was no communication channel among frequency bins (auxiliary variables used in the other two methods) and the permutation problem was not solved. The IVA estimates were well separated. The IVA auxiliary variables $R$, $[R]_{j,n} = r_{j,n}$, represented the activities of source estimates and helped to solve the permutation problem. The ILRMA estimates were even better separated. The ILRMA bases $T$ and activations $V$, $[T_n]_{ik} = t_{ik,n}$, $[V_n]_{kj} = v_{kj,n}$, modeled the source estimates with low-rank matrices, which were richer representations than the IVA auxiliary variables $R$.

Fig. 11. Experimental mixtures and variables (log scale, large values in yellow) of NMF and MNMF. The two-channel mixtures look very similar in a power spectrum sense. However, their phases (not shown) differ considerably, and this difference is what makes effective multichannel separation possible. The NMF results were obtained separately for each mixture. No multichannel information was exploited, and thus the two sources were not separated. In the MNMF results, 10 NMF bases were clustered into two classes according to the multichannel information $\mathsf{H}_{in}$ in the model (26). The off-diagonal elements $[\mathsf{H}_{in}]_{mm'}, m\ne m'$, expressed the phase differences between the microphones as spatial cues, and the two sources were well separated (SDRs = 14.96 dB, 10.31 dB).