
Voice as objective biomarker of stress: association of speech features and cortisol

Published online by Cambridge University Press: 03 September 2025

Felix Menne
Affiliation:
ki:elements GmbH, Saarbrücken, Germany
Hali Lindsay
Affiliation:
ki:elements GmbH, Saarbrücken, Germany
Johannes Tröger
Affiliation:
ki:elements GmbH, Saarbrücken, Germany
Silke Paulmann
Affiliation:
Department of Psychology and Centre for Brain Science, University of Essex, Colchester, UK
Alexandra König
Affiliation:
ki:elements GmbH, Saarbrücken, Germany; Centre Hospitalier et Universitaire, Clinique Gériatrique du Cerveau et du Mouvement, Centre Mémoire de Ressources et de Recherche, Université Côte d’Azur, Nice, France
Nadine Steinbach
Affiliation:
Department of Psychiatry, Psychosomatic Medicine and Psychotherapy, University Hospital, Goethe-University Frankfurt am Main, Frankfurt, Germany
Andreas Reif
Affiliation:
Department of Psychiatry, Psychosomatic Medicine and Psychotherapy, University Hospital, Goethe-University Frankfurt am Main, Frankfurt, Germany
Michael M. Plichta
Affiliation:
Department of Psychiatry, Psychosomatic Medicine and Psychotherapy, University Hospital, Goethe-University Frankfurt am Main, Frankfurt, Germany
Maren Schmidt-Kassow*
Affiliation:
Department of Psychiatry, Psychosomatic Medicine and Psychotherapy, University Hospital, Goethe-University Frankfurt am Main, Frankfurt, Germany
* Corresponding author: Maren Schmidt-Kassow; Email: schmidt-kassow@med.uni-frankfurt.de

Abstract

Objective:

Cortisol is a well-established biomarker of stress, assessed through salivary or blood samples, which are intrusive and time-consuming. Speech, influenced by physiological stress responses, offers a promising non-invasive, real-time alternative for stress detection. This study examined relationships between speech features, state anger, and salivary cortisol using a validated stress-induction paradigm.

Methods:

Participants (N = 82) were assigned to cold (n = 43) or warm water (n = 39) groups. Saliva samples and speech recordings were collected before and 20 minutes after the Socially Evaluated Cold Pressor Test (SECPT), alongside State–Trait Anger Expression Inventory (STAXI) ratings. Acoustic features from frequency, energy, spectral, and temporal domains were analysed. Statistical analyses included Wilcoxon tests, correlations, linear mixed models (LMMs), and machine learning (ML) models, adjusting for covariates.
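The before/after comparison of a single speech feature within one group can be sketched as a paired Wilcoxon signed-rank test. This is a minimal illustration only: the abstract does not state which Wilcoxon variant or software was used, so the paired variant, the normal approximation (without continuity or tie correction), the function name `wilcoxon_signed_rank`, and the example values are all assumptions.

```python
import math
from statistics import NormalDist


def wilcoxon_signed_rank(before, after):
    """Two-sided paired Wilcoxon signed-rank test (normal approximation).

    Assumption: no continuity or tie correction is applied, so the
    p-value is only approximate for small samples.
    """
    # Paired differences; zero differences carry no sign and are dropped.
    diffs = [a - b for b, a in zip(before, after) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")

    # Rank absolute differences, giving tied values their average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)

    # Under the null hypothesis, W+ has this mean and standard deviation.
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / sd
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return w_plus, w_minus, z, p


# Hypothetical feature values for eight speakers (not study data).
before = [10, 12, 9, 14, 11, 13, 8, 15]
after = [11, 14, 12, 18, 16, 19, 15, 23]
w_plus, w_minus, z, p = wilcoxon_signed_rank(before, after)
```

With many features tested, the resulting p-values would still need a multiple-comparison correction; the abstract does not say which correction was applied.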

Results:

Post-intervention, the cold group showed significantly higher cortisol and state anger. Stress-related speech changes occurred across domains. Alpha ratio decreased and MFCC3 increased post-stress in the cold group, associated with cortisol and robust to sex and baseline levels. Cortisol–speech correlations were significant in the cold group, including sex-specific patterns. LMMs indicated baseline cortisol influenced feature changes, differing by sex. ML models modestly predicted SECPT group membership (AUC = 0.55) and showed moderate accuracy estimating cortisol and STAXI scores, with mean absolute errors corresponding to ∼24–38% and ∼16–28% of observed ranges, respectively.
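Expressing a mean absolute error as a share of the observed range can be illustrated as follows. This is a sketch under the assumption that the range is the maximum minus the minimum of the observed target values; the function name `mae_percent_of_range` and the example numbers are hypothetical and not taken from the study.

```python
def mae_percent_of_range(observed, predicted):
    """Mean absolute error as a percentage of the observed value range.

    Assumption: "observed range" means max(observed) - min(observed).
    """
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    mae = sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)
    value_range = max(observed) - min(observed)
    if value_range == 0:
        raise ValueError("observed values have zero range")
    return 100 * mae / value_range


# Hypothetical target values and model predictions (not study data).
observed = [2.0, 4.0, 6.0, 10.0]
predicted = [3.0, 5.0, 4.0, 8.0]
share = mae_percent_of_range(observed, predicted)  # MAE 1.5 over a range of 8
```

Normalising by the range makes errors comparable across targets measured on different scales, such as cortisol concentrations and STAXI scores.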

Conclusion:

This study demonstrates the potential of speech features as objective stress markers, revealing associations with cortisol and state anger. Speech analysis may offer a valuable, non-invasive tool for assessing stress responses, with notable sex differences in vocal biomarkers.

Information

Type
Original Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2025. Published by Cambridge University Press on behalf of Scandinavian College of Neuropsychopharmacology

Figure 1. Graphical overview depicting the study timeline.


Table 1. Demographics, STAXI scores and cortisol data


Table 2. Difference in speech features before and after intervention. Features that remained significant after correction are highlighted in bold. Arrows indicate whether the feature value increased or decreased after exposure to cold/warm water. Features are organised by feature type and then alphabetically. Only features that were significant in the warm and/or cold group are summarised here; all results are listed in Supplementary Table 2.


Figure 2. Delta in correlations of speech features and cortisol differences (after-before) for cold and warm groups for the whole sample and stratified by sex. For brevity, only significant results are displayed here; the full results are displayed in Supplementary Table 3.


Table 3. Associations of acoustic features with cortisol levels across conditions. The table presents correlation coefficients (corr), effect sizes (ES), and adjusted p-values (p-adj) for the overall population. Sex-specific correlations are indicated separately for males (M) and females (F), with positive (+) or negative (−) directions. Fisher’s Z-test results, including Z-values and corresponding p-values, are reported under the ‘Sex Comparison’ header to assess differences between sexes. Results are shown for both cold and warm conditions. For brevity, only significant results are reported here; the full results are displayed in Supplementary Table 3.
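A comparison of two independent correlations with Fisher's Z-test, as used for the sex comparison, can be sketched as below. The standard two-sample form with a two-sided p-value is assumed; the function name `fisher_z_test` and the example correlations and sample sizes are hypothetical and not taken from the study.

```python
import math
from statistics import NormalDist


def fisher_z_test(r1, n1, r2, n2):
    """Compare two independent Pearson correlations via Fisher's z-transform.

    Returns the Z statistic and a two-sided p-value.
    """
    if n1 <= 3 or n2 <= 3:
        raise ValueError("each sample needs more than 3 observations")
    # Fisher's transform: z = atanh(r), approximately normal with
    # variance 1 / (n - 3).
    z1 = math.atanh(r1)
    z2 = math.atanh(r2)
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    z = (z1 - z2) / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p


# Hypothetical feature-cortisol correlations in two subgroups (not study data).
z, p = fisher_z_test(r1=0.6, n1=23, r2=-0.1, n2=20)
```

A significant result indicates that the strength or direction of the feature-cortisol association differs between the two subgroups.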


Figure 3. Delta in correlations of speech features and STAXI scores (after-before) for cold and warm groups for the whole sample and stratified by sex. For brevity, only significant results are displayed here; the full results are displayed in Supplementary Table 4.


Table 4. Summary of speech feature associations with sex, stress, and robustness to confounding factors. The table categorises key speech features based on their sensitivity to sex differences (✔ = significant, ✘ = not significant), their link to stress responses (as measured by cortisol levels), and whether these associations are robust to confounding factors such as baseline cortisol and sex. The significant Before v. After column indicates whether a feature showed significant differences before and after exposure to the stress condition and if so, in which condition. Cortisol correlation direction shows whether the feature has a positive (+) or negative (−) correlation with cortisol levels. Direction (before-after stress) indicates whether the feature difference increased (↑) or decreased (↓) after exposure to the stress condition.

Supplementary material: File

Menne et al. supplementary material
