
6 - Computational Analysis of Vocal Expression of Affect: Trends and Challenges

from Part I - Conceptual Models of Social Signals

Published online by Cambridge University Press:  13 July 2017

Klaus Scherer, University of Geneva
Björn Schuller, Imperial College London and Technical University Munich
Aaron Elkins, San Diego State University
Judee K. Burgoon, University of Arizona
Nadia Magnenat-Thalmann, Université de Genève
Maja Pantic, Imperial College London
Alessandro Vinciarelli, University of Glasgow

Summary

In this chapter, we first provide a short introduction to the "classic" audio features used in this field and to the methods that enable the automatic recognition of human emotion as reflected in the voice. From there, we focus on the main trends leading up to the central challenges for future research. Admittedly, the line is difficult to draw here: it is not obvious what counts as a contemporary trend and where the "future" starts. Further, several of the named trends and challenges are not limited to the analysis of speech but hold for many, if not all, modalities; we focus on examples and references from the speech analysis domain.

“Classic Features”: Perceptual and Acoustic Measures

Systematic treatises on the importance of emotional expression in speech communication, and on its powerful impact on the listener, can be found throughout history. Early Greek and Roman manuals on rhetoric (e.g., by Aristotle, Cicero, and Quintilian) suggested concrete strategies for making speech emotionally expressive. Evolutionary theorists such as Spencer, Bell, and Darwin highlighted the social functions of emotional expression in speech and music. The empirical investigation of the effect of emotion on the voice started with psychiatrists trying to diagnose emotional disturbances and with early radio researchers concerned with the communication of speaker attributes and states via vocal cues in speech, both using the newly developed methods of electroacoustic analysis. Systematic research programs started in the 1960s, when psychiatrists renewed their interest in diagnosing affective states, nonverbal communication researchers explored the capacity of different bodily channels to carry signals of emotion, emotion psychologists charted the expression of emotion in different modalities, and linguists – particularly phoneticians – discovered the importance of pragmatic information, all making use of ever more sophisticated technology to study the effects of emotion on the voice (see Scherer, 2003, for further details).

While much of the relevant research has focused exclusively on the recognition of vocally expressed emotions by naive listeners, research on the production of emotional speech has used the extraction of acoustic parameters from the speech signal as a method to understand the patterning of the vocal expression of different emotions.
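As a minimal illustration of what such acoustic parameter extraction involves, the sketch below estimates two classic measures – fundamental frequency (F0) and short-time energy – from a single analysis frame. The synthetic 220 Hz signal, the 40 ms frame length, and the autocorrelation-based F0 estimator are illustrative assumptions, not the specific procedures used in the studies cited in this chapter.

```python
import numpy as np

def frame_energy(frame):
    """Mean short-time energy (average squared amplitude) of one frame."""
    return float(np.sum(frame ** 2) / len(frame))

def estimate_f0(frame, sr, fmin=60.0, fmax=400.0):
    """Crude F0 estimate: pick the autocorrelation peak whose lag lies
    in the plausible voice range [fmin, fmax] Hz."""
    frame = frame - frame.mean()
    # Keep only non-negative lags of the full autocorrelation.
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo = int(sr / fmax)   # smallest admissible period in samples
    hi = int(sr / fmin)   # largest admissible period in samples
    lag = lo + int(np.argmax(ac[lo:hi]))
    return sr / lag

# Synthetic "vocal" signal: a 220 Hz tone sampled at 16 kHz, one 40 ms frame.
sr = 16000
t = np.arange(0, 0.04, 1 / sr)
frame = 0.5 * np.sin(2 * np.pi * 220 * t)

f0 = estimate_f0(frame, sr)       # close to 220 Hz
energy = frame_energy(frame)      # close to 0.125 for a 0.5-amplitude sine
```

In practice, such low-level descriptors are computed over a sliding window across an utterance and then summarized by statistical functionals (means, ranges, contours) before being related to the expressed emotion.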


References

Banea, C., Mihalcea, R., & Wiebe, J. (2011). Multilingual sentiment and subjectivity. In I. Zitouni & D. Bikel (Eds), Multilingual Natural Language Processing. Prentice Hall.
Batliner, A. & Schuller, B. (2014). More than fifty years of speech processing – the rise of computational paralinguistics and ethical demands. In Proceedings ETHICOMP 2014. Paris, France: CERNA, Commission de réflexion sur l'Ethique de la Recherche en sciences et technologies du Numérique d'Allistene.
Bonneh, Y. S., Levanon, Y., Dean-Pardo, O., Lossos, L., & Adini, Y. (2011). Abnormal speech spectrum and increased pitch variability in young autistic children. Frontiers in Human Neuroscience, 4.
Callejas, Z. & López-Cózar, R. (2008). Influence of contextual information in emotion annotation for spoken dialogue systems. Speech Communication, 50(5), 416–433.
Chen, S. X. & Bond, M. H. (2010). Two languages, two personalities? Examining language effects on the expression of personality in a bilingual context. Personality and Social Psychology Bulletin, 36(11), 1514–1528.
Cirillo, J. (2004). Communication by unvoiced speech: The role of whispering. Annals of the Brazilian Academy of Sciences, 76(2), 1–11.
Cirillo, J. & Todt, D. (2002). Decoding whispered vocalizations: Relationships between social and emotional variables. In Proceedings IX International Conference on Neural Information Processing (ICONIP) (pp. 1559–1563).
Coutinho, E., Deng, J., & Schuller, B. (2014). Transfer learning emotion manifestation across music and speech. In Proceedings 2014 International Joint Conference on Neural Networks (IJCNN) as part of the IEEE World Congress on Computational Intelligence (IEEE WCCI). Beijing: IEEE.
Cowie, R. (2011). Editorial: "Ethics and good practice" – computers and forbidden places: Where machines may and may not go. In P. Petta, C. Pelachaud, & R. Cowie (Eds), Emotion-Oriented Systems: The Humaine Handbook (pp. 707–712). Berlin: Springer.
Davidov, D., Tsur, O., & Rappoport, A. (2010). Semi-supervised recognition of sarcastic sentences in Twitter and Amazon. In Proceedings 14th Conference on Computational Natural Language Learning (pp. 107–116).
Deng, J. & Schuller, B. (2012). Confidence measures in speech emotion recognition based on semi-supervised learning. In Proceedings Interspeech 2012. Portland, OR.
Deng, J., Han, W., & Schuller, B. (2012). Confidence measures for speech emotion recognition: A start. In T. Fingscheidt & W. Kellermann (Eds), Proceedings 10th ITG Conference on Speech Communication (pp. 1–4). Braunschweig, Germany: IEEE.
Deng, J., Zhang, Z., Marchi, E., & Schuller, B. (2013). Sparse autoencoder-based feature transfer learning for speech emotion recognition. In Proceedings 5th Biannual Humaine Association Conference on Affective Computing and Intelligent Interaction (ACII 2013) (pp. 511–516). Geneva: IEEE.
Deng, J., Xia, R., Zhang, Z., Liu, Y., & Schuller, B. (2014). Introducing shared-hidden-layer autoencoders for transfer learning and their application in acoustic emotion recognition. In Proceedings 39th IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2014). Florence, Italy: IEEE.
Dhall, A., Goecke, R., Joshi, J., Wagner, M., & Gedeon, T. (Eds) (2013). Proceedings of the 2013 Emotion Recognition in the Wild Challenge and Workshop. Sydney: ACM.
Döring, S., Goldie, P., & McGuinness, S. (2011). Principalism: A method for the ethics of emotion-oriented machines. In P. Petta, C. Pelachaud, & R. Cowie (Eds), Emotion-Oriented Systems: The Humaine Handbook (pp. 713–724). Berlin: Springer.
Forbes-Riley, K. & Litman, D. (2004). Predicting emotion in spoken dialogue from multiple knowledge sources. In Proceedings HLT/NAACL (pp. 201–208).
Goldie, P., Döring, S., & Cowie, R. (2011). The ethical distinctiveness of emotion-oriented technology: Four long-term issues. In P. Petta, C. Pelachaud, & R. Cowie (Eds), Emotion-Oriented Systems: The Humaine Handbook (pp. 725–734). Berlin: Springer.
Grossman, R. B., Bemis, R. H., Skwerer, D. P., & Tager-Flusberg, H. (2010). Lexical and affective prosody in children with high-functioning autism. Journal of Speech, Language, and Hearing Research, 53, 778–793.
Gunes, H., Schuller, B., Pantic, M., & Cowie, R. (2011). Emotion representation, analysis and synthesis in continuous space: A survey. In Proceedings International Workshop on Emotion Synthesis, Representation, and Analysis in Continuous Space (EmoSPACE 2011), held in conjunction with the 9th IEEE International Conference on Automatic Face & Gesture Recognition and Workshops (FG 2011) (pp. 827–834). Santa Barbara, CA: IEEE.
Han, W., Zhang, Z., Deng, J., et al. (2012). Towards distributed recognition of emotion in speech. In Proceedings 5th International Symposium on Communications, Control, and Signal Processing (ISCCSP 2012) (pp. 1–4). Rome, Italy: IEEE.
Han, W., Li, H., Ruan, H., et al. (2013). Active learning for dimensional speech emotion recognition. In Proceedings Interspeech 2013 (pp. 2856–2859). Lyon, France: ISCA.
Havasi, C., Speer, R., & Alonso, J. (2007). ConceptNet 3: A flexible, multilingual semantic network for common sense knowledge. In Recent Advances in Natural Language Processing, September.
Hayes, A. F. & Krippendorff, K. (2007). Answering the call for a standard reliability measure for coding data. Communication Methods and Measures, 1(1), 77–89.
Juslin, P. N. & Laukka, P. (2003). Communication of emotions in vocal expression and music performance: Different channels, same code? Psychological Bulletin, 129, 770–814.
Kajackas, A., Anskaitis, A., & Gursnys, D. (2008). Peculiarities of testing the impact of packet loss on voice quality. Electronics and Electrical Engineering, 82(2), 35–40.
Kövecses, Z. (2000). The concept of anger: Universal or culture specific? Psychopathology, 33, 159–170.
Lindquist, K., Feldman Barrett, L., Bliss-Moreau, E., & Russell, J. (2006). Language and the perception of emotion. Emotion, 6(1), 125–138.
Liscombe, J., Riccardi, G., & Hakkani-Tür, D. (2005). Using context to improve emotion detection in spoken dialog systems. In Proceedings Interspeech (pp. 1845–1848).
Mahdhaoui, A. & Chetouani, M. (2009). A new approach for motherese detection using a semi-supervised algorithm. In Machine Learning for Signal Processing XIX – Proceedings of the 2009 IEEE Signal Processing Society Workshop (MLSP) (pp. 1–6).
Marchi, E., Schuller, B., Batliner, A., et al. (2012a). Emotion in the speech of children with autism spectrum conditions: Prosody and everything else. In Proceedings 3rd Workshop on Child, Computer and Interaction (WOCCI 2012), Satellite Event of Interspeech 2012. Portland, OR: ISCA.
Marchi, E., Batliner, A., Schuller, B., et al. (2012b). Speech, emotion, age, language, task, and typicality: Trying to disentangle performance and feature relevance. In Proceedings 1st International Workshop on Wide Spectrum Social Signal Processing (WS3P 2012), held in conjunction with the ASE/IEEE International Conference on Social Computing (SocialCom 2012). Amsterdam, The Netherlands: IEEE.
Obin, N. (2012). Cries and whispers – classification of vocal effort in expressive speech. In Proceedings Interspeech. Portland, OR: ISCA.
Patel, S. & Scherer, K. R. (2013). Vocal behaviour. In J. A. Hall & M. L. Knapp (Eds), Handbook of Nonverbal Communication. Berlin: Mouton-DeGruyter.
Ramírez-Esparza, N., Gosling, S. D., Benet-Martínez, V., Potter, J. P., & Pennebaker, J. W. (2006). Do bilinguals have two personalities? A special case of cultural frame switching. Journal of Research in Personality, 40, 99–120.
Riviello, M. T., Chetouani, M., Cohen, D., & Esposito, A. (2010). On the perception of emotional "voices": A cross-cultural comparison among American, French and Italian subjects. In Analysis of Verbal and Nonverbal Communication and Enactment: The Processing Issues (vol. 6800, pp. 368–377). Springer LNCS.
Sauter, D., Eisner, F., Ekman, P., & Scott, S. K. (2010). Cross-cultural recognition of basic emotions through nonverbal emotional vocalizations. Proceedings of the National Academy of Sciences of the United States of America, 107(6), 2408–2412.
Sauter, D. A. (2006). An investigation into vocal expressions of emotions: The roles of valence, culture, and acoustic factors. PhD thesis, University College London.
Scherer, K. R. (1986). Vocal affect expression: A review and a model for future research. Psychological Bulletin, 99, 143–165.
Scherer, K. R. (2003). Vocal communication of emotion: A review of research paradigms. Speech Communication, 40, 227–256.
Scherer, K. R. & Brosch, T. (2009). Culture-specific appraisal biases contribute to emotion dispositions. European Journal of Personality, 23, 265–288.
Schröder, M., Devillers, L., Karpouzis, K., et al. (2007). What should a generic emotion markup language be able to represent? In A. Paiva, R. W. Picard, & R. Prada (Eds), Affective Computing and Intelligent Interaction: Second International Conference, ACII 2007, Lisbon, Portugal, September 12–14, 2007, Proceedings. Lecture Notes in Computer Science (LNCS) (vol. 4738, pp. 440–451). Berlin: Springer.
Schuller, B. (2012). The computational paralinguistics challenge. IEEE Signal Processing Magazine, 29(4), 97–101.
Schuller, B. & Batliner, A. (2013). Computational Paralinguistics: Emotion, Affect and Personality in Speech and Language Processing. Hoboken, NJ: Wiley.
Schuller, B. & Devillers, L. (2010). Incremental acoustic valence recognition: An inter-corpus perspective on features, matching, and performance in a gating paradigm. In Proceedings Interspeech (pp. 2794–2797). Makuhari, Japan: ISCA.
Schuller, B., Dunwell, I., Weninger, F., & Paletta, L. (2013a). Serious gaming for behavior change – the state of play. IEEE Pervasive Computing Magazine, Special Issue on Understanding and Changing Behavior, 12(3), 48–55.
Schuller, B., Steidl, S., Batliner, A., et al. (2013b). The INTERSPEECH 2013 computational paralinguistics challenge: Social signals, conflict, emotion, autism. In Proceedings Interspeech 2013 (pp. 148–152). Lyon, France: ISCA.
Silverman, K., Beckman, M., Pitrelli, J., et al. (1992). ToBI: A standard for labeling English prosody. In Proceedings ICSLP (vol. 2, pp. 867–870).
Sneddon, I., Goldie, P., & Petta, P. (2011). Ethics in emotion-oriented systems: The challenges for an ethics committee. In P. Petta, C. Pelachaud, & R. Cowie (Eds), Emotion-Oriented Systems: The Humaine Handbook. Berlin: Springer.
Sundberg, J., Patel, S., Björkner, E., & Scherer, K. R. (2011). Interdependencies among voice source parameters in emotional speech. IEEE Transactions on Affective Computing, 99, 2423–2426.
Tawari, A. & Trivedi, M. M. (2010a). Speech emotion analysis: Exploring the role of context. IEEE Transactions on Multimedia, 12(6), 502–509.
Tawari, A. & Trivedi, M. M. (2010b). Speech emotion analysis in noisy real world environment. In Proceedings 20th International Conference on Pattern Recognition (ICPR) (pp. 4605–4608). Istanbul, Turkey: IAPR.
Weninger, F., Eyben, F., Schuller, B., Mortillaro, M., & Scherer, K. R. (2013). On the acoustics of emotion in audio: What speech, music and sound have in common. Frontiers in Psychology, Emotion Science, Special Issue on Expression of Emotion in Music and Vocal Communication, 4(292), 1–12.
Wöllmer, M., Eyben, F., Reiter, S., et al. (2008). Abandoning emotion classes – towards continuous emotion recognition with modelling of long-range dependencies. In Proceedings Interspeech 2008 (pp. 597–600). Brisbane, Australia: ISCA.
Wöllmer, M., Weninger, F., Knaup, T., et al. (2013). YouTube movie reviews: Sentiment analysis in an audiovisual context. IEEE Intelligent Systems Magazine, Special Issue on Statistical Approaches to Concept-Level Sentiment Analysis, 28(3), 46–53.
Wu, D. & Parsons, T. (2011). Active class selection for arousal classification. In Proceedings Affective Computing and Intelligent Interaction (ACII) (pp. 132–141).
Zhang, Z. & Schuller, B. (2012). Active learning by sparse instance tracking and classifier confidence in acoustic emotion recognition. In Proceedings Interspeech 2012. Portland, OR: ISCA.
Zhang, Z., Weninger, F., Wöllmer, M., & Schuller, B. (2011). Unsupervised learning in cross-corpus acoustic emotion recognition. In Proceedings 12th Biannual IEEE Automatic Speech Recognition and Understanding Workshop (ASRU 2011) (pp. 523–528). Big Island, HI: IEEE.
Zhang, Z., Deng, J., Marchi, E., & Schuller, B. (2013a). Active learning by label uncertainty for acoustic emotion recognition. In Proceedings Interspeech 2013 (pp. 2841–2845). Lyon, France: ISCA.
Zhang, Z., Deng, J., & Schuller, B. (2013b). Co-training succeeds in computational paralinguistics. In Proceedings 38th IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2013) (pp. 8505–8509). Vancouver: IEEE.
