Nested Gibbs sampling for mixture-of-mixture model and its application to speaker clustering

  • Naohiro Tawara (a1), Tetsuji Ogawa (a1), Shinji Watanabe (a2) and Tetsunori Kobayashi (a1)
Abstract

This paper proposes a novel model estimation method that uses nested Gibbs sampling to estimate a mixture-of-mixture model, in which the distribution of each component is itself represented by a mixture model. This model is suitable for analyzing multilevel data composed of frame-wise observations, such as videos and acoustic signals. Deterministic procedures, such as the expectation–maximization algorithm, have been employed to estimate these kinds of models, but this approach often suffers from a large bias when the amount of data is limited. To avoid this problem, we introduce a Markov chain Monte Carlo-based model estimation method. In particular, we aim to identify a suitable sampling method for mixture-of-mixture models. Gibbs sampling is a possible approach, but it can easily become trapped in a local optimum when each component is represented by a multi-modal distribution. Thus, we propose a novel Gibbs sampling method, called “nested Gibbs sampling,” which represents the lower-level (fine) data structure based on elemental mixture distributions and the higher-level (coarse) data structure based on mixture-of-mixture distributions. We applied this method to a speaker clustering problem and conducted experiments under various conditions. The results demonstrated that the proposed method outperformed conventional sampling-based, variational Bayesian, and hierarchical agglomerative methods.
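
To make the two-level structure concrete, the following is a minimal sketch of sampling latent labels in a mixture-of-mixture model. It is an illustration only and not the nested Gibbs sampler proposed in the paper: it assumes one-dimensional frames, a shared variance, and model parameters that are known and held fixed, whereas a full sampler would also resample the parameters. All names (`gibbs_sweep`, `speaker_w`, `comp_w`, `means`) and the toy data are hypothetical.

```python
import numpy as np

def log_gauss(x, mean, var):
    # Log density of a 1-D Gaussian, broadcast over arrays.
    return -0.5 * (np.log(2 * np.pi * var) + (x - mean) ** 2 / var)

def sample_categorical(log_w, rng):
    # Draw an index from unnormalised log weights.
    p = np.exp(log_w - log_w.max())
    return rng.choice(len(p), p=p / p.sum())

def gibbs_sweep(utterances, speaker_w, comp_w, means, var, rng):
    """One sweep over all utterances with fixed parameters (an assumption).

    Higher level: sample a speaker label per utterance.
    Lower level: sample a mixture-component label per frame.
    """
    z, v = [], []
    for frames in utterances:
        # Joint log weight of each frame under each (speaker, component):
        # shape (S, K, T).
        ll = np.log(comp_w)[:, :, None] + log_gauss(
            frames[None, None, :], means[:, :, None], var)
        # Marginalise the components per frame, then sum over frames.
        utt_ll = np.logaddexp.reduce(ll, axis=1).sum(axis=1)
        s = sample_categorical(np.log(speaker_w) + utt_ll, rng)
        # Given the sampled speaker, sample a component for every frame.
        comps = np.array([sample_categorical(ll[s, :, t], rng)
                          for t in range(len(frames))])
        z.append(s)
        v.append(comps)
    return np.array(z), v

# Toy data: two speakers, each a two-component mixture (hypothetical values).
rng = np.random.default_rng(0)
speaker_w = np.array([0.5, 0.5])
comp_w = np.array([[0.5, 0.5], [0.5, 0.5]])
means = np.array([[-10.0, -8.0], [8.0, 10.0]])
utterances = [rng.normal(-9, 1, 20), rng.normal(9, 1, 30), rng.normal(-9, 1, 25)]
z, v = gibbs_sweep(utterances, speaker_w, comp_w, means, 1.0, rng)
```

Here `z` holds one speaker label per utterance (the coarse level) and `v` holds one component label per frame (the fine level). Because the two speakers in the toy data are well separated, the utterance-level labels are effectively determined by the data; the multi-modal case the abstract describes is harder precisely because such separation cannot be assumed.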

Copyright
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution, and reproduction in any medium, provided the original work is properly cited.
Corresponding author
Corresponding author: N. Tawara tawara@pcl.cs.waseda.ac.jp
APSIPA Transactions on Signal and Information Processing
  • ISSN: 2048-7703
  • EISSN: 2048-7703
  • URL: /core/journals/apsipa-transactions-on-signal-and-information-processing