
Ensemble based speaker recognition using unsupervised data selection

Published online by Cambridge University Press:  10 May 2016

Chien-Lin Huang*
Affiliation:
Department of Computer Science and Information Engineering, National Central University, Taiwan 32001, Republic of China
Jia-Ching Wang
Affiliation:
Department of Computer Science and Information Engineering, National Central University, Taiwan 32001, Republic of China
Bin Ma
Affiliation:
Human Language Technology, Institute for Infocomm Research (I2R), Singapore 138632, Singapore
*Corresponding author: C.-L. Huang. Email: chiccocl@gmail.com

Abstract

This paper presents an ensemble-based approach to speaker recognition using unsupervised data selection. Ensemble learning is a type of machine learning that combines several weak learners to achieve better performance than a single learner. A speech utterance is divided into several subsets based on its acoustic characteristics using unsupervised data selection methods. The ensemble classifiers are then trained with these non-overlapping subsets of speech data to improve the recognition accuracy. This approach has two advantages. First, without any auxiliary information, ensemble classifiers based on unsupervised data selection make use of the different acoustic characteristics of speech data. Second, the ensemble classifiers apply a divide-and-conquer strategy to avoid a local optimum in the training of a single classifier. Our experiments on the 2010 and 2008 NIST Speaker Recognition Evaluation datasets show that using ensemble classifiers yields a significant performance gain.
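The idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the clustering method, the classifier, or the score weighting, so this sketch assumes plain k-means for the unsupervised data selection, one diagonal Gaussian per speaker and subset as a stand-in weak learner, and weights proportional to the number of test frames in each subset. All function names (`kmeans`, `assign`, `train_ensemble`, `score`, `make_speaker`) and the synthetic data are hypothetical.

```python
import numpy as np

def assign(frames, centroids):
    """Index of the nearest centroid (squared Euclidean) for every frame."""
    d = ((frames[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

def kmeans(frames, k, iters=20, seed=0):
    """Plain k-means (an assumed choice of unsupervised data selection)."""
    rng = np.random.default_rng(seed)
    centroids = frames[rng.choice(len(frames), size=k, replace=False)].copy()
    for _ in range(iters):
        labels = assign(frames, centroids)
        for j in range(k):
            members = frames[labels == j]
            if len(members):  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    return centroids, assign(frames, centroids)

def fit_gaussian(frames):
    """Diagonal Gaussian (mean, variance) with a small variance floor."""
    return frames.mean(axis=0), frames.var(axis=0) + 1e-3

def avg_loglik(frames, model):
    """Average per-frame log-likelihood under a diagonal Gaussian."""
    mean, var = model
    ll = -0.5 * (np.log(2 * np.pi * var) + (frames - mean) ** 2 / var).sum(axis=1)
    return ll.mean()

def train_ensemble(speaker_frames, centroids):
    """Train one classifier per (speaker, subset); subsets do not overlap."""
    models = {}
    for spk, frames in speaker_frames.items():
        labels = assign(frames, centroids)
        models[spk] = {j: fit_gaussian(frames[labels == j])
                       for j in range(len(centroids))
                       if (labels == j).sum() > 1}
    return models

def score(test_frames, spk_models, centroids):
    """Route test frames to subsets, then combine the per-subset scores."""
    labels = assign(test_frames, centroids)
    total, weight_sum = 0.0, 0.0
    for j, model in spk_models.items():
        subset = test_frames[labels == j]
        if len(subset) == 0:
            continue
        w = len(subset) / len(test_frames)  # assumed weighting scheme
        total += w * avg_loglik(subset, model)
        weight_sum += w
    return total / weight_sum if weight_sum else -np.inf

def make_speaker(offset, rng, n=200, dim=4):
    """Synthetic frames: two acoustic classes plus a speaker-specific offset."""
    low = rng.normal(-5.0 + offset, 1.0, size=(n, dim))
    high = rng.normal(5.0 + offset, 1.0, size=(n, dim))
    return np.vstack([low, high])

rng = np.random.default_rng(1)
train = {"A": make_speaker(0.0, rng), "B": make_speaker(2.0, rng)}
centroids, _ = kmeans(np.vstack(list(train.values())), k=2)
models = train_ensemble(train, centroids)
test_a = make_speaker(0.0, rng, n=50)
score_a = score(test_a, models["A"], centroids)
score_b = score(test_a, models["B"], centroids)
```

On this toy data, frames from speaker A should score higher against A's ensemble than against B's (`score_a > score_b`). In a real system the weak learners would be full speaker-recognition back-ends rather than single Gaussians, and the features would come from actual speech.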

Information

Type
Original Paper
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution-NonCommercial-ShareAlike licence (http://creativecommons.org/licenses/by-nc-sa/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the same Creative Commons licence is included and the original work is properly cited. The written permission of Cambridge University Press must be obtained for commercial re-use.
Copyright
Copyright © The Authors, 2016

Fig. 1. The pipeline of the proposed ensemble-based speaker recognition using unsupervised data selection.

Fig. 2. Illustration of speaker discriminative feature analysis using the mean of short-term spectral features in a long-term window.

Fig. 3. Testing procedure of ensemble classifiers using unsupervised data selection.

Table 1. Data (or parameters) used for ensemble classifiers based on different evaluation systems.

Table 2. Results of ensemble classifiers using different clustering and weighting schemes on MAP and ZT-norm systems on NIST SRE-2010.

Table 3. Results of ensemble classifiers using different distance metrics on MAP and ZT-norm systems on NIST SRE-2010.

Table 4. Results of ensemble classifiers using different distance metrics on MAP and ZT-norm systems on NIST SRE-2008.

Fig. 4. DCF curves of eigenchannel with ZT-norm systems with different numbers of UBM mixtures on NIST SRE-2010.

Fig. 5. DCF curves of eigenchannel with ZT-norm systems with different numbers of UBM mixtures and data subsets on NIST SRE-2010.

Table 5. Results of i-vector system with and without ensemble classifiers on NIST SRE-2010.

Table 6. Results of i-vector system with and without ensemble classifiers on NIST SRE-2008.

Fig. 6. DET curves showing the improvements from the conventional i-vector system to the ensemble-based system and to the fusion with the LTF system on SRE-2010.

Table 7. Fusion of ensemble-based i-vector system with LTFs on NIST SRE-2010 and SRE-2008.