The objective of this special issue is to present innovative research demonstrating that prosody needs to be reconceptualized as an inherently multimodal phenomenon, manifested in the spoken domain, the visual domain, or both. The studies included are organized into three core themes. Theme 1 addresses the temporal alignment of spoken and visual aspects of prosody, and how this alignment is shaped by linguistic factors, speaker-specific traits (such as neurodiversity) and language learning patterns. Theme 2 deals with the coordination of spoken and visual aspects of prosody in conveying pragmatic intent, focusing on negation, emotion and epistemic stance. Theme 3 explores how visual signals, including head movements and manual signals, fulfil essential prosodic roles across typologically diverse sign languages. Taken together, the empirical evidence presented here shows that prosody is also embodied and that our bodily movements can manifest prosodic characteristics. First, the findings show the need to comprehensively re-evaluate our understanding of how speakers, listeners and learners engage with the prosodic dimension of language. Second, they reveal that non-referential gestures are deeply meaningful and prosodically structured. Ultimately, visual cues are presented as indispensable for building accurate models of the human language capacity.