
Advances in anti-spoofing: from the perspective of ASVspoof challenges

Published online by Cambridge University Press: 15 January 2020

Madhu R. Kamble*
Affiliation:
Speech Research Lab, Dhirubhai Ambani Institute of Information and Communication Technology (DA-IICT), Gandhinagar, Gujarat, India
Hardik B. Sailor
Affiliation:
University of Sheffield, UK
Hemant A. Patil
Affiliation:
Speech Research Lab, Dhirubhai Ambani Institute of Information and Communication Technology (DA-IICT), Gandhinagar, Gujarat, India
Haizhou Li
Affiliation:
National University of Singapore (NUS), Singapore
*Corresponding author: Madhu R. Kamble. Email: madhu_kamble@daiict.ac.in, mk310191@gmail.com

Abstract

In recent years, automatic speaker verification (ASV) has been used extensively for voice biometrics, which has increased interest in securing these systems for real-world applications. ASV systems are vulnerable to various kinds of spoofing attacks, namely synthetic speech (SS), voice conversion (VC), replay, twins, and impersonation. This paper reviews the literature on ASV spoof detection, including novel acoustic feature representations, deep learning, and end-to-end systems. It also summarizes previous studies of spoofing attacks, with emphasis on SS, VC, and replay, along with recent efforts to develop countermeasures for the spoof speech detection (SSD) task. The limitations and challenges of the SSD task are also presented. Although several countermeasures have been reported in the literature, they are mostly validated on a particular database, and their performance is far from perfect. The security of voice biometric systems against spoofing attacks therefore remains a challenging topic. This paper is based on a tutorial presented at the APSIPA Annual Summit and Conference 2017 and is intended as a quick start for those interested in the topic.

Information

Type
Overview Paper
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution, and reproduction in any medium, provided the original work is properly cited.
Copyright
Copyright © The Authors, 2020

Fig. 1. Biometric identification along with spoofing techniques for fingerprint, iris, face, and voice. (Images are adapted from [19].)

Fig. 2. Brief illustration of an Automatic Speaker Verification (ASV) system. After [20].

Fig. 3. Selected chronological progress of ASVspoof for voice biometrics. At INTERSPEECH 2013, a special session was organized and the Spoofing and Anti-Spoofing (SAS) corpus of speech synthesis and voice conversion spoofing data was created. The first ASVspoof challenge was held at INTERSPEECH 2015. In 2016, the OCTAVE project started; it focused only on replay spoofing data and resulted in the second edition of the ASVspoof challenge at INTERSPEECH 2017. The third edition, ASVspoof 2019, addressed physical access (PA) and logical access (LA) attacks and was scheduled for INTERSPEECH 2019 [23]. IS indicates INTERSPEECH.

Fig. 4. Different spoofing attacks on voice biometrics along with their availability and risk factor. IS: INTERSPEECH. Adapted from [32].

Fig. 5. Spectral energy densities of natural (Panel I), synthetic speech (Panel II), and voice converted speech (Panel III). (a) Time-domain speech signal, and (b) corresponding spectral energy density.

Fig. 6. Spectral energy densities of natural (Panel I) and replay speech (Panel II). (a) Time-domain speech signal, and (b) spectral energy density.

Table 1. Various corpora on spoofing attacks against ASV systems.

Table 2. A summary of ASVspoof 2015 Challenge database [22].

Table 3. A summary of AVspoof Database [14].

Table 4. A summary of ASVspoof 2017 Challenge version 2.0 [24,81].

Table 5. A summary of ASVspoof 2019 Challenge database [82].

Table 6. Data volume of the ReMASC corpus (*indicates incomplete data due to recording device crashes).

Table 7. The four possible decision outcomes in an ASV system [20].

Fig. 7. Spoofing detection framework.

Fig. 8. Block diagram of the CFCCIF feature extraction process. After [93].

Fig. 9. Eight different types of features for a natural utterance, D15_1000931, from the development set of the ASVspoof 2015 challenge dataset. For each feature type, only the lower half of the FFT frequency bins is shown. Adapted from [95].

Fig. 10. Block diagram of two-level scattering decomposition. Adapted from [98].

Table 8. Comparison of results (in % EER) on ASVspoof 2015 Challenge Database.

Fig. 11. Schematic block diagram of the short-time instantaneous amplitude and frequency modulation (AM-FM) feature set. After [122].

Fig. 12. The ConvRBM subband filters in the temporal domain (a) without and (b) with pre-emphasis. After [139].

Table 9. Comparison of results (in % EER) on ASVspoof 2017 Challenge Database.
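
Tables 8 and 9 report results as % EER (equal error rate), the operating point at which the false acceptance rate equals the false rejection rate. The sketch below shows one simple way such a figure could be computed from detection scores. It is illustrative only and is not the official ASVspoof scoring tool. The function name `compute_eer`, the score convention (a higher score means "more likely genuine"), and the toy scores are all assumptions made for this example.

```python
import numpy as np


def compute_eer(genuine_scores, spoof_scores):
    """Approximate the equal error rate (EER) by a threshold sweep.

    Assumption: a higher score means the trial is more likely genuine.
    Returns (eer, threshold), where eer is a fraction in [0, 1].
    """
    genuine = np.asarray(genuine_scores, dtype=float)
    spoof = np.asarray(spoof_scores, dtype=float)

    # Candidate thresholds: every distinct observed score, in ascending order.
    thresholds = np.unique(np.concatenate([genuine, spoof]))

    # False rejection rate: genuine trials scored below the threshold.
    frr = np.array([(genuine < t).mean() for t in thresholds])
    # False acceptance rate: spoofed trials scored at or above the threshold.
    far = np.array([(spoof >= t).mean() for t in thresholds])

    # Pick the threshold where the two error rates are closest,
    # and report their average as the EER estimate.
    idx = int(np.argmin(np.abs(far - frr)))
    eer = (far[idx] + frr[idx]) / 2.0
    return eer, thresholds[idx]


# Toy scores (hypothetical): one genuine and one spoofed trial are misranked.
eer, threshold = compute_eer([0.9, 0.8, 0.3, 0.6], [0.1, 0.2, 0.7, 0.4])
print(f"EER = {100 * eer:.1f}% at threshold {threshold}")
```

With only a handful of trials the two error curves may not cross exactly, so this sketch averages them at the closest point. Evaluation toolkits typically interpolate between thresholds instead.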