Section 4 - Diversity of Rhythm from Oral Speech to Music

Published online by Cambridge University Press: 23 April 2026

Lars Meyer
Affiliation:
Max Planck Institute for Human Cognitive and Brain Sciences
Antje Strauss
Affiliation:
University of Konstanz

Figure 23.1 Annotation with the Praat phonetic workbench software (German spontaneous report, illustrating hesitations).

Figure 23.2 The rPVI and nPVI for different languages, showing linear and non-linear properties for the two metrics.

Figure 23.3 Speech modulation frequency scale.

Figure 23.4 Rhythm formant analysis data flow.

Figure 23.5 Dialogue registers: top, 20 s, toddlers (Section 23.7.1); bottom, 18 s, caller–choir exchange (Section 23.7.2).

Figure 23.6 State machine (finite transition network) representing Ega orature dialogue grammar.

Figure 23.7 Low-frequency spectrogram: first chant section of the orature session.

Figure 23.8 English: top, L1 male, South-Eastern British English; middle, Chinese L2 female (fluent); bottom, Chinese L2 male (less fluent).

Figure 23.9 RFA demodulation and LF spectrum outputs for a newsreading (top) and a poetry reading (bottom).

Figure 23.10 Comparison of variances of newsreadings (B) and poetry readings (H).

Figure 23.11 Hierarchical clustering of newsreading and poetry reading (Euclidean distance and farthest neighbour clustering).

Figure 23.12 Distance network with modern recitations of two genres of Tang dynasty poetry.

Figure 24.1 Individual example frames from the instruction-oriented video produced by KP for the purpose of this study.

Figure 24.2 Overview of the experimental procedure and the material provided in each step.

Figure 24.3 Illustration of the recording setting inside the sound-treated booth of the CIE Acoustics Lab.

(photo taken with consent of participant IF3)

Figure 24.4 Illustration of the Smalley–Trent five-minute personality test result.

Freely drawn based on the screen image; printed with kind permission of Sara Pearsell.

Figure 24.5 Example of a Praat TextGrid file in combination with its corresponding sound file. The figure shows sentence 2 of the fable uttered by the female speaker KF (“Sie wurden einig, dass derjenige für den Stärkeren gelten sollte, der den Wanderer zwingen würde, seinen Mantel abzunehmen”). Vertical bars mark landmarks in the speech signal at six levels, from acoustic energy peaks (level 6) to syllable and individual sound-segment boundaries (levels 5 and 1). The dark gray curve in the spectrogram shows the f0 contour (100–400 Hz); “sil” labels indicate silent (nonspeech) intervals.

Figure 24.6 Results on rhythm characteristics illustrated by two time-interval parameters. Estimated marginal means and error bars (95% CI) for (a) the delta V and (b) the syllable-based rPVI. Dark gray bars indicate the effect-oriented and light gray bars the instruction-oriented video conditions. Top panels show the female speakers’ and bottom panels the male speakers’ results.

Figure 24.7 Results on intonation, timbre, timing, and loudness illustrated by one parameter each.

Figure 24.8 Results on jaw lowering represented by absolute minimum and normalized range. Estimated marginal means and error bars (95% CI) for (a) the minimum jaw-lowering amplitude and (b) the normalized jaw-lowering range, where values below zero indicate that speakers opened their mouth less than in the CE condition. Dark gray bars indicate the effect-oriented and light gray bars the instruction-oriented video conditions. Top panels show the female speakers’ and bottom panels the male speakers’ results. Note that the male display of jaw-lowering amplitude shows dB*10 to account for the head-size differences between male and female speakers on absolute amplitude offset levels (Alam et al., 2015).

Figure 26.1(A)

Figure 26.1(B)

Figure 26.2 Linear mixed-effects regression results for neural and acoustic data. Neural tracking (speech-brain coherence in the theta band, 4–8 Hz) is related to rhythmic regularity and pulse clarity, even after controlling for utterance type.

Figure 27.1 The relationship between symbolic time structure (beat time) and subsymbolic time structure (real time). The relationship between both time structures is characterized by their mapping of the reference beat. In other words, the idealized isochronic beat in beat time is distorted in subsymbolic real time.

Figure 27.2 Two examples of the metrical grid. The left example, two bars from the old folk song “Scarborough Fair,” displays an instance of an isochronic meter (here, 6/8); the right example, three bars from the song “Seven Days” by Sting, shows an instance of a non-isochronic meter (5/4); note that the beat level combines 3/8 + 3/8 + 2/8 + 2/8.

Figure 27.3 The same set of onsets and durations placed on two different metrical grids. The resulting rhythmic Gestalt is different for the two cases.

Figure 27.4a.

Figure 27.4b.

Figure 27.4c.

Figure 27.5(A)

Figure 27.5(B)

Figure 27.5(C)

Figure 27.6a.

Figure 27.6b.

Figure 27.6c.

Figure 27.6d.

Figure 27.7 Intensity and F0 of an utterance with late peak in German.

Figure 27.8 Inventory of prosodic domains association to beats for German eine Lagune in der Wüste.

Figure 27.9 A representation of the last two bars of the example in Figure 27.3, 4/4 version.

Figure 27.10 A representation of the last two bars of the example in Figure 27.3, 3/4 version.

Figure 28.1 The PRISM framework. The three mechanisms proposed in the PRISM framework (Fiveash et al., 2021).

Copyright © 2021 by American Psychological Association. Reproduced with permission (Fiveash et al., 2021).

Figure 28.2 Broader cognitive and biological considerations. In addition to the neural and cognitive considerations presented in the PRISM framework, there are broader cognitive and biological considerations to keep in mind when investigating speech and music connections.

Copyright © 2021 by American Psychological Association. Reproduced with permission (Fiveash et al., 2021).

Figure 29.1 An overview of the processes and structures involved in Interaction Phonology. The diagram depicts the processes in a listener who entrains to the rhythmic patterns of speech based on the expectations inherent in their language competence. The level of rhythmic-prosodic entrainment can be strengthened in difficult communicative situations. That way, the listener’s attention is guided to higher-order linguistic aspects connected to the rhythmic structures thus enhanced. This attentional process may alter the way that rhythmic-prosodic structures are connected to higher-order linguistic patterns, but also intensify the level of entrainment with an interlocutor. Taken together, these processes are expected to aid mutual understanding, particularly in “difficult” situations. The model relies on a set of modules, some of which are part of the speaker’s grammar. These encompass (1) an entrainment module, (2) an auditory analysis guided by it, which is also linked to (3) motor patterns, which lead to convergence in speech production as an automatic by-product of entrainment, (4) a set of linguistic structures and expectations as part of a speaker’s grammar, which are linked to the levels of entrainment via their corresponding levels of prosodic organization, and (5) a monitoring of communication relevance, which estimates the need for entrainment (informed by the auditory and linguistic analysis) and adjusts the level of entrainment by modulating the coupling strength.

Figure 29.2 An adapted sketch of Interaction Phonology. Those parts of Interaction Phonology that have received empirical support are indicated by check marks. Other parts are either marked as optional (auditory-motor mapping and speech adaptation) or have been modified or extended in line with empirical findings. In particular, the language-specific structures and expectations for which we have evidence that they guide rhythmic-prosodic entrainment and are shaped by it are currently restricted to phonetic-phonological ones. It remains unclear whether syntactic or lexical adaptations are likewise connected with entrainment processes.
