
Does prosody mark sarcasm early in an utterance? A production and perception study, including listeners who self-identified as being on the autism spectrum

Published online by Cambridge University Press: 26 January 2026

Csilla Tatár*
Affiliation:
University of Michigan, US
Jonathan R. Brennan
Affiliation:
University of Michigan, US
Jelena Krivokapić
Affiliation:
University of Michigan, US; Haskins Laboratories, US
Ezra Keshet
Affiliation:
University of Michigan, US
*Corresponding author. Email: cstatar@umich.edu

Abstract

This study examines the utterance-initial prosodic marking of sarcasm in English and its perception by listeners who did and did not self-identify as being on the autism spectrum. We ask (i) whether speakers use prosody to mark sarcasm in the early, ‘pre-target’ portion of an utterance (that is, the portion before the ‘target’ word most closely associated with the sarcastic intent), (ii) whether individuals vary in how they mark sarcasm, (iii) whether listeners reliably recognize sarcasm from pre-target prosody alone, and (iv) whether recognition accuracy varies by speaker or by self-identified autistic traits. Eight American English speakers were recorded producing utterances presented in contexts conducive to either sarcasm or sincerity. The pre-target portions were examined for syllable duration and f0-related properties (maximum, minimum, range, and wiggliness) and were presented in a two-alternative forced-choice experiment to listeners who either did (n = 51) or did not (n = 44) self-identify as being on the autism spectrum. Results show that speakers distinguish sarcasm from sincerity in the pre-target region, with duration being the most salient marker. Most listeners recognize sarcasm from pre-target fragments, but how well each speaker is perceived varies. Whether or not listeners self-identified as being on the autism spectrum does not predict their accuracy in recognizing sarcasm and sincerity. The results provide evidence that utterance-initial prosody contributes to sarcasm recognition, with the proviso that speaker and listener variation be taken into account.
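The recognition measure described above is, at its simplest, the proportion of two-alternative forced-choice trials on which the listener's response matched the intended affect. The minimal Python sketch below computes that proportion separately for sarcastic and sincere stimuli. The `(intended, response)` trial format and the function name `accuracy_by_affect` are illustrative assumptions, not the authors' analysis code, and the trials are invented.

```python
from collections import defaultdict

def accuracy_by_affect(trials):
    """Proportion of correct 2AFC responses for each intended affect.

    Each trial is an (intended, response) pair, e.g. ("sarcasm", "sincerity").
    This trial format is a hypothetical illustration.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for intended, response in trials:
        total[intended] += 1
        if response == intended:
            correct[intended] += 1
    return {affect: correct[affect] / total[affect] for affect in total}

# Invented trials, for illustration only.
trials = [("sarcasm", "sarcasm"), ("sarcasm", "sincerity"),
          ("sincerity", "sincerity"), ("sincerity", "sincerity")]
print(accuracy_by_affect(trials))  # {'sarcasm': 0.5, 'sincerity': 1.0}
```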

Information

Type
Research Article
Creative Commons
Creative Commons License: CC BY
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2026. Published by Cambridge University Press on behalf of The International Phonetic Association

Table 1. An example of an utterance with three contexts: one conducive to sincerity, one to sarcasm, and one to disbelief

Table 2. Pairwise correlations between the acoustic measures

Figure 1. Centered data by speaker (P1–P8) in the sincerity and sarcasm conditions. The black points correspond to the individual means; error bars indicate standard error of the mean.
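One common way to center data by speaker is to subtract each speaker's own mean from that speaker's values, so that a speaker's overall tempo or pitch level does not mask the sarcasm/sincerity contrast. The sketch below assumes that procedure (the article's exact centering may differ, for example z-scoring) and also computes the standard error of the mean used for the error bars. The helper names are hypothetical and the durations are invented.

```python
from math import sqrt
from statistics import mean, stdev

def center_by_speaker(rows):
    """Subtract each speaker's own mean from that speaker's values.

    `rows` is a list of (speaker, value) pairs; pairs are returned in the
    same order with centered values.
    """
    by_speaker = {}
    for speaker, value in rows:
        by_speaker.setdefault(speaker, []).append(value)
    speaker_means = {s: mean(vals) for s, vals in by_speaker.items()}
    return [(s, v - speaker_means[s]) for s, v in rows]

def sem(values):
    """Standard error of the mean: sample standard deviation / sqrt(n)."""
    return stdev(values) / sqrt(len(values))

# Invented syllable durations in seconds, for illustration only.
rows = [("P1", 0.20), ("P1", 0.30), ("P2", 0.25), ("P2", 0.35)]
centered = center_by_speaker(rows)
p1 = [v for s, v in centered if s == "P1"]
print(mean(p1), sem(p1))
```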

Figure 2. Model-predicted regression lines by speaker. The y-axis shows affect (1 = sarcasm, 0 = sincerity); the x-axes show the phonetic measures.
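Because the outcome is binary (1 = sarcasm, 0 = sincerity), curves like these are typically obtained from a logistic model: a linear predictor on the log-odds scale passed through the inverse-logit function. The sketch below assumes a single predictor with a per-speaker intercept and slope; the coefficient values are hypothetical, not estimates from the study, and the function names are illustrative.

```python
from math import exp

def inv_logit(eta):
    """Numerically stable logistic function 1 / (1 + exp(-eta))."""
    if eta >= 0:
        return 1.0 / (1.0 + exp(-eta))
    z = exp(eta)  # avoids overflow for large negative eta
    return z / (1.0 + z)

def predicted_sarcasm_probability(xs, intercept, slope):
    """P(affect = 1, i.e. sarcasm) at each x under a one-predictor logistic model.

    `intercept` and `slope` are hypothetical per-speaker coefficients on the
    log-odds scale; x is a (centered) phonetic measure such as duration.
    """
    return [inv_logit(intercept + slope * x) for x in xs]

# Hypothetical coefficients, not estimates from the study.
xs = [-0.1, 0.0, 0.1]
print(predicted_sarcasm_probability(xs, intercept=0.0, slope=10.0))
```

A positive slope means that larger values of the measure raise the predicted probability of sarcasm; a slope near zero gives a flat line at the intercept's probability.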

Figure 3. Sarcasm and sincerity recognition rates. The large points correspond to the individual means; error bars indicate standard error of the mean.

Table 3. Mean accuracy rates of pilot listeners (n = 49)

Figure 4. Accuracy for sarcasm and sincerity by self-identified ASC grouping. The large points correspond to the group means; error bars indicate standard error of the mean.

Figure 5. Affect recognition accuracy by ASC diagnosis as reported by Prolific participants.

Figure 6. Affect recognition accuracy by speaker and self-identified ASC grouping. Black points indicate grand-average accuracy, while colored points and error bars indicate the posterior mean and 95% CI per speaker.
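A per-speaker posterior mean and 95% interval can be summarized from posterior (e.g. MCMC) draws as the mean of the draws and their 2.5th and 97.5th percentiles. The sketch below assumes an equal-tailed percentile interval (the article may use another interval type, such as a highest-density interval); `summarize_posterior` is a hypothetical helper and the draws are invented.

```python
from statistics import mean

def percentile(sorted_draws, q):
    """Percentile q (0 <= q <= 1) of pre-sorted draws, by linear interpolation."""
    pos = q * (len(sorted_draws) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_draws) - 1)
    return sorted_draws[lo] + (pos - lo) * (sorted_draws[hi] - sorted_draws[lo])

def summarize_posterior(draws, level=0.95):
    """Posterior mean and equal-tailed credible interval from posterior draws."""
    s = sorted(draws)
    tail = (1.0 - level) / 2.0
    return mean(s), percentile(s, tail), percentile(s, 1.0 - tail)

# Invented posterior draws of accuracy for one speaker, for illustration only.
draws_by_speaker = {"P1": [i / 100 for i in range(101)]}
for speaker, draws in draws_by_speaker.items():
    print(speaker, summarize_posterior(draws))
```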

Table A1. Sample production stimuli

Table B1. Summary statistics of speakers’ production data

Table C1. Utterance lists keyed by positive adjectives (e.g., brainy stands for ‘Lynn’s really quite a brainy kid’, of which only the pre-target portion, ‘Lynn’s really quite a’, was analyzed)

Table C2. The structure of the perception study lists. Shaded cells indicate that no item from the participant was included in the corresponding list

Table D1. Recognition rates (%) by listener group

Table D2. Recognition rates (%) by speaker, pooled across listener groups

Table D3. Recognition rates (%) by speaker, separated by listener group

Table E1. Individual affect recognition accuracy scores

Figure F1. Accuracy rate by speaker and AQ grouping. Black points indicate grand-average accuracy, while colored points and error bars indicate the posterior mean and 95% CI per speaker.

Figure F2. Accuracy for sarcasm and sincerity by AQ grouping (over threshold, under threshold). The large points correspond to the group means; error bars indicate standard error of the mean.
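Grouping listeners by an AQ cutoff and averaging accuracy within each group can be sketched as follows. The cutoff is left as a required parameter rather than taken from the article, the example value of 30 is a placeholder, and counting scores equal to the cutoff as over threshold is an assumption. The function name and data are hypothetical.

```python
from statistics import mean

def mean_accuracy_by_aq_group(listeners, threshold):
    """Mean recognition accuracy for listeners over vs. under an AQ cutoff.

    `listeners` is a list of (aq_score, accuracy) pairs. `threshold` must be
    supplied by the caller; scores equal to it count as "over threshold"
    here, which is an assumption.
    """
    groups = {}
    for aq_score, accuracy in listeners:
        label = "over threshold" if aq_score >= threshold else "under threshold"
        groups.setdefault(label, []).append(accuracy)
    return {label: mean(accs) for label, accs in groups.items()}

# Invented scores and a placeholder cutoff, for illustration only.
listeners = [(35, 0.70), (12, 0.80), (28, 0.60), (40, 0.90)]
print(mean_accuracy_by_aq_group(listeners, threshold=30))
```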