
From co-speech actions to co-speech gestures? Effects of visibility and information structure on cross-modal synchronization

Published online by Cambridge University Press:  28 November 2025

Petra Wagner*
Affiliation:
Faculty of Linguistics and Literary Studies and CITEC, Bielefeld University, Bielefeld, Germany

Abstract

We investigate the synchronization of speech and co-speech actions (i.e., manual game moves) in a dyadic game interaction across different levels of information structure and mutual visibility. We analyze cross-modal synchronization as the temporal distance between co-speech actions and corresponding (1) pitch accent peaks and (2) word onsets. For (1), we find no effect of mutual visibility on cross-modal synchronization. However, pitch accent peaks and co-speech actions are more tightly aligned when game moves are prosodically prominent due to information structural needs. This result is in line with a view of a tightly coupled processing of modalities, where prominence-lending modifications in the prosodic structure attract corresponding manual actions to be realized in a way similar to ‘beat’ gestures. For (2), we do find an effect of mutual visibility, under which co-speech actions are produced earlier and more tightly aligned with word onsets. This result is in line with a view in which co-speech actions act as communicative affordances, which may help an early disambiguation of a message, similar to ‘representational’ co-speech gestures.

Information

Type
Article
Creative Commons
Creative Commons License - CC BY
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2025. Published by Cambridge University Press

Figure 1. The recording setup in the full visibility condition.


Figure 2. Manipulation of the four different within-subject visibility conditions. 1. Full visibility (upper left): transparent game board between interlocutors; 2. manual visibility (upper right): transparent game board and curtain at head height; 3. facial visibility (lower left): nontransparent game board between interlocutors; and 4. no visibility (lower right): nontransparent game board and curtain at head height.


Table 1. Contingency table of game moves classified as (un)predictable or (un)important in the full data set. Numbers in brackets indicate the reduced number of game moves that are used in the analyses of cross-modal synchronization (cf. 2.5)


Figure 3. Manipulation of importance and unpredictability in the recordings. The game move on the left (field number 5) is unpredictable but unimportant for the outcome of the game. The move on the right (field number 7) is more predictable but game-decisive, hence important.


Figure 4. Illustration of distance measurements between pitch accent peaks (distPEAK), word initializations (distWORDINIT) and the time of manual contact with the game board when performing a game move. Both prosodic anchors (word initializations = dashed line; PA peak = dotted line) create different ‘0’ points on the time axis. Moves preceding those anchor points result in negative values, and moves following those anchor points result in positive values. In this illustration, a manual game move is executed following the word initialization (left), resulting in a positive value for the distWORDINIT measure, and precedes the PA peak (right), resulting in a negative value for the distPEAK measure.
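The sign convention described in the caption can be sketched as follows. This is a minimal illustration, not code from the study; the function and variable names, as well as the example timestamps, are hypothetical.

```python
def cross_modal_distances(move_time, word_init_time, peak_time):
    """Temporal distance (in seconds) of a manual game move from two
    prosodic anchors. Negative values: the move precedes the anchor;
    positive values: the move follows it."""
    dist_wordinit = move_time - word_init_time
    dist_peak = move_time - peak_time
    return dist_wordinit, dist_peak

# Example mirroring the illustration: the manual move follows the word
# onset (dist_wordinit is positive, ~0.20 s) but precedes the pitch
# accent peak (dist_peak is negative, ~-0.15 s).
dw, dp = cross_modal_distances(move_time=1.30, word_init_time=1.10, peak_time=1.45)
```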


Figure 5. Density plot of distances between pitch accent peaks (‘0’) and the timing of corresponding manual game board contacts. The data show a strong bimodal distribution, with local maxima occurring 241 and 28 ms before the corresponding pitch peak position (dashed lines). The data are split into pitch peak preceding and pitch peak synchronized co-speech movements using the local minimum (solid line) between both maxima (preceding the corresponding pitch peak by 119 ms).


Figure 6. Interaction plot showing the predicted probabilities (marginal means and 95% confidence intervals) of manual game moves to be closely aligned with their corresponding pitch accent peaks (i.e., synchronized distPEAK events). If a game move is important and unpredictable, this probability increases.


Figure 7. Interaction plot showing the predicted timing (marginal means and 95% confidence intervals) of manual game board contact relative to the corresponding initializations of verbalized game moves (distWORDINIT) in milliseconds. Predictable and unimportant moves are realized considerably earlier than important or unpredictable game moves. Manual game moves are generally realized earlier when interlocutors can see each other’s hands (ManualVisi).