In this chapter we first provide a short introduction to the “classic” audio features used in this field and to the methods leading to the automatic recognition of human emotion as reflected in the voice. From there, we focus on the main trends leading up to the major challenges for future research. It must be stated that a line is difficult to draw here between what counts as a contemporary trend and where the “future” starts. Further, several of the named trends and challenges are not limited to the analysis of speech but hold for many, if not all, modalities; we focus on examples and references from the speech analysis domain.
“Classic Features”: Perceptual and Acoustic Measures
Systematic treatises on the importance of emotional expression in speech communication and its powerful impact on the listener can be found throughout history. Early Greek and Roman manuals on rhetoric (e.g., by Aristotle, Cicero, and Quintilian) suggested concrete strategies for making speech emotionally expressive. Evolutionary theorists, such as Spencer, Bell, and Darwin, highlighted the social functions of emotional expression in speech and music. The empirical investigation of the effect of emotion on the voice began with psychiatrists trying to diagnose emotional disturbances and with early radio researchers concerned with the communication of speaker attributes and states via vocal cues in speech, both using the newly developed methods of electroacoustic analysis. Systematic research programs started in the 1960s, when psychiatrists renewed their interest in diagnosing affective states, nonverbal communication researchers explored the capacity of different bodily channels to carry signals of emotion, emotion psychologists charted the expression of emotion in different modalities, and linguists, particularly phoneticians, discovered the importance of pragmatic information, all making use of ever more sophisticated technology to study the effects of emotion on the voice (see Scherer, 2003, for further details).
While much of the relevant research has focused exclusively on the recognition of vocally expressed emotions by naive listeners, research on the production of emotional speech has used the extraction of acoustic parameters from the speech signal as a method to understand the patterning of the vocal expression of different emotions.