
Environmental sound recognition: a survey

  • Sachin Chachada (a1) and C.-C. Jay Kuo (a1)


Although research in audio recognition has traditionally focused on speech and music signals, the problem of environmental sound recognition (ESR) has received growing attention in recent years, and research on ESR has increased significantly in the past decade. Recent work has focused on the appraisal of non-stationary aspects of environmental sounds, and several new features predicated on non-stationary characteristics have been proposed. These features aim to capture as much information as possible about a signal's temporal and spectral characteristics. Furthermore, sequential learning methods have been used to capture the long-term variation of environmental sounds. In this survey, we offer a qualitative and elucidatory overview of recent developments. It includes four parts: (i) basic environmental sound-processing schemes, (ii) stationary ESR techniques, (iii) non-stationary ESR techniques, and (iv) performance comparison of selected methods. Finally, we give concluding remarks and discuss future research and development trends in the ESR field.


The online version of this article is published within an Open Access environment subject to the conditions of the Creative Commons Attribution-NonCommercial-ShareAlike licence. The written permission of Cambridge University Press must be obtained for commercial re-use.

Corresponding author

Corresponding author: Sachin Chachada Email:




