
Environmental sound recognition: a survey

Published online by Cambridge University Press:  15 December 2014

Sachin Chachada*
Affiliation:
Ming Hsieh Department of Electrical Engineering, University of Southern California, Los Angeles, CA 90089, USA
C.-C. Jay Kuo
Affiliation:
Ming Hsieh Department of Electrical Engineering, University of Southern California, Los Angeles, CA 90089, USA
*
Corresponding author: Sachin Chachada Email: chachada@usc.edu

Abstract

Although research in audio recognition has traditionally focused on speech and music signals, the problem of environmental sound recognition (ESR) has received increasing attention in recent years, and research on ESR has grown significantly over the past decade. Recent work has focused on the non-stationary aspects of environmental sounds, and several new features based on non-stationary characteristics have been proposed. These features strive to maximize the information they capture about a signal's temporal and spectral characteristics. Furthermore, sequential learning methods have been used to capture the long-term variation of environmental sounds. In this paper, we offer a qualitative and explanatory survey of these recent developments. It comprises four parts: (i) basic environmental sound-processing schemes, (ii) stationary ESR techniques, (iii) non-stationary ESR techniques, and (iv) a performance comparison of selected methods. Finally, we give concluding remarks and discuss future research and development trends in the ESR field.

Information

Type
Overview Paper
Creative Commons
The online version of this article is published within an Open Access environment subject to the conditions of the Creative Commons Attribution-NonCommercial-ShareAlike licence. The written permission of Cambridge University Press must be obtained for commercial re-use.
Copyright
Copyright © The Authors, 2014
Figures and Tables

Fig. 1. Taxonomy for audio features as proposed in [18].

Fig. 2. Illustration of the NB-ACF feature extraction process.

Table 1. Environmental Sound Database (ESD).

Table 2. Selected methods for comparison.

Fig. 3. The feature extraction process used in Method M10.

Fig. 4. Averaged classification accuracies over 30 trials.

Fig. 5. Classification accuracies for 30 trials.

Table 3. McNemar's test statistic for 1 of 30 trials.

Table 4. Class-pairs that frequently failed McNemar's test over 30 trials.

Table 5. Paired t-test statistic for 30 trials.

Fig. 6. Comparison of averaged classification accuracies of M6, M7, and M10.

Fig. 7. Comparison of averaged classification accuracies of M2, M3, M5, and M9.

Fig. 8. Comparison of averaged classification accuracies of M1, M10, and M11.