Emerson’s poetry has been somewhat of an enigma for readers and critics alike, who have often found it thematically opaque and stylistically unwieldy. Many have concluded that he was incapable of writing “better” verse, a conclusion predicated upon the assumption that he intended to do otherwise but couldn’t. This essay takes as a starting point the idea that the roughness of Emerson’s poetic style was intentional and that his metric irregularities are not accidents. After analyzing the style, rhetoric, and prosody of the poems, this essay contextualizes these elements within Emerson’s metaphysics. It argues that Emerson’s poetry reveals the crumbling of meter that led to the modernist revolution and free verse; poetic style did not suddenly jump from Longfellow to Whitman, but rather meter was stretched and strained before it was broken.
The “speech envelope” is often used as an acoustic proxy for neural rhythm. The problem lies in the assumption that the unfiltered, broadband signal can satisfactorily model neural modulation in the auditory pathway (and beyond). However, the auditory system does not function as a passive transducer but rather decomposes and segregates the signal into an array of tonotopically organized frequency channels. This modulation filtering results in a partitioning of slow (3–20 Hz) neural modulation patterns across the tonotopic axis that bear only a passing resemblance to the broadband speech envelope. Such polychromatic diversity (in frequency, magnitude, and phase) of auditory modulation patterns is critical for decoding the speech signal, as it highlights linguistic properties, such as articulatory-acoustic and prosodic features, that are important for understanding spoken language. The low-frequency modulation patterns associated with high-frequency (>2 kHz) auditory channels are especially important for prosodic processing and consonant discrimination, both key for speech intelligibility, especially in adverse listening conditions and among the hard of hearing.
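The contrast between a broadband envelope and per-channel modulation patterns can be illustrated with a toy signal. The sketch below is not the chapter's method: it assumes NumPy, a synthetic two-carrier signal (500 Hz and 3 kHz) whose 4 Hz modulations are in antiphase, crude FFT brick-wall filters in place of a realistic auditory filterbank, and hypothetical helper names (`bandpass`, `slow_envelope`, `modulation_depth`).

```python
import numpy as np

def bandpass(x, fs, lo, hi):
    """Crude brick-wall band-pass via FFT masking (illustration only)."""
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    spectrum[(freqs < lo) | (freqs > hi)] = 0.0
    return np.fft.irfft(spectrum, n=len(x))

def slow_envelope(x, fs, cutoff=30.0):
    """Analytic-signal magnitude, low-passed to keep only slow modulations."""
    n = len(x)
    spectrum = np.fft.fft(x)
    weights = np.zeros(n)
    weights[0] = 1.0                      # keep DC
    weights[1:(n + 1) // 2] = 2.0         # double the positive frequencies
    if n % 2 == 0:
        weights[n // 2] = 1.0             # keep the Nyquist bin once
    envelope = np.abs(np.fft.ifft(spectrum * weights))
    return bandpass(envelope, fs, 0.0, cutoff)

def modulation_depth(envelope):
    """(max - min) / (max + min) of an envelope."""
    return (envelope.max() - envelope.min()) / (envelope.max() + envelope.min())

# Synthetic signal (assumption): two carriers with antiphase 4 Hz modulation.
fs = 16000
t = np.arange(fs) / fs                    # one second of samples
slow = 0.8 * np.cos(2 * np.pi * 4 * t)
low_channel = (1 + slow) * np.sin(2 * np.pi * 500 * t)
high_channel = (1 - slow) * np.sin(2 * np.pi * 3000 * t)
broadband = low_channel + high_channel

depth_broadband = modulation_depth(slow_envelope(broadband, fs))
depth_low = modulation_depth(slow_envelope(bandpass(broadband, fs, 100, 1000), fs))
depth_high = modulation_depth(slow_envelope(bandpass(broadband, fs, 2000, 4000), fs))

print("broadband:", round(depth_broadband, 2))
print("low band: ", round(depth_low, 2))
print("high band:", round(depth_high, 2))
```

In this toy case each channel's envelope retains its strong 4 Hz modulation, whereas the broadband envelope is much flatter because the two channels fluctuate in antiphase. Real speech and real auditory filters are far more complex; the example only shows why summing across channels can hide channel-specific modulation.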
The temporal signatures that characterize speech – especially its prosodic qualities – are observable in the movements of the hands and bodies of its speakers. A neurobiological account of these prosodic rhythms is thus likely to benefit from insights on the neural coding principles underlying co-speech gestures. Here we consider whether the vestibular system, a sensory system that encodes movements of the body, contributes to prosodic processing. Careful review of the vestibular system’s anatomy and physiology, its role in dynamic attention and active inference, its relevance for the perception and production of rhythmic sound sequences, and its involvement in vocalization all point to a potential role for vestibular codes in the neural tracking of speech. Noting that the kinematics and time course of co-speech movements closely mirror prosodic fluctuations in spoken language, we propose that the vestibular system cooperates with other afferent networks to encode and decode prosodic features in multimodal discourse and possibly in the processing of speech presented unimodally.
Intonation units (IUs) are a fundamental prosodic unit of all known human languages, and as such they likely constitute an absolute universal property of language. IUs are chunks defined by a specific pattern of syllable delivery, together with resets in pitch and articulatory force. In this chapter we discuss IUs from four different perspectives and introduce them within the context of rhythms of speech, language, and the brain. First, we provide a detailed description of how IUs are defined. Second, we review linguistic research on the roles of IUs in communication, including their cross-linguistic applicability. This body of research suggests that IUs provide a universal structural cue for the cognitive dynamics of speech production and comprehension at a timescale of ~1 Hz. Third, we synthesize the linguistic perspective with findings from the study of brain rhythms and cognition. Finally, we review the existing algorithmic tools for IU identification from speech acoustics, to facilitate the incorporation of IUs in experimental and quantitative research.
Almost no seminar, book, or YouTube tutorial on successful public speaking is without the established and traditional “cork exercise.” It is supposed to enhance speakers’ rhythm and intelligibility, for which there is, however, no scientific evidence so far. Our experiment addresses this gap. Twenty speakers performed a presentation task three times: (1) before a cork exercise intervention, (2) immediately after it, and (3) some minutes later after having completed a distractor questionnaire. The intervention was a video recorded by a professional media trainer. Results show significant rhythmic (and related melodic and articulatory) differences between presentations (1) and (2), suggesting a positive effect for speakers in (2). However, in presentation (3), all measurements revert to the baseline presentation (1) level. Thus, the “cork exercise” basically works and yields positive effects; however, they are short-lived. The chapter ends with suggestions for further research and practical ideas for a more sustainable design of the cork exercise.
In speech perception, timing and content are interdependent. For example, in distal rate effects, context speech rate determines the number of words, syllables, and phonemes heard in an unchanging target speech segment. Such results confront psycholinguistic theory with the chicken-and-egg problem of concurrently inferring speech timing and content, and the interrelated issues of narrowing the search space of speech interpretations without bias and optimizing the speed/accuracy tradeoff in online processing. We propose listeners address these issues by managing the timing of speech-related computations. Specifically, we claim: (1) Listeners model speech timing as part of a speaker model; (2) variable-length sequences of morphosyntactic units are the basic increments of speech inference; and (3) listeners adaptively schedule inferential updates and computationally intensive operations according to (4) fluctuations in uncertainty predicted by the speaker model. We illustrate these claims in a mechanistic model – vowel-onset-paced syllable inference – explaining multiple psycholinguistic results, including distal rate effects.
On phrasal timescales, spontaneous conversational speech is not very rhythmic. Instead, periods of speech activity are intermittent: Words tend to come in short bursts and are often interrupted with hesitations. Nonetheless, it has been suggested that there is a production mechanism that generates phrasal rhythmicity in speech. This chapter examines the empirical evidence for such a mechanism and concludes that speakers do not directly control the timing of phrases. Instead, it is argued that temporal patterns associated with phrases are epiphenomena of processes involved in conceptual-syntactic organization. A model is presented in which coherency-monitoring systems govern the initiation and interruption of speech activity. Hesitations arise when conceptual or syntactic systems fail to achieve sufficiently ordered states. The model provides a mechanism to account for intermittency on phrasal timescales.
Australian languages have often been noted for their high rates of phonological uniformity cross-linguistically; investigations into the phonetics of these languages, however, have revealed rich phonetic variation below the phonological level. In the current study, the phonetic correlates of stress in thirteen Australian languages with fixed initial stress placement are investigated using corpus phonetics methods and based on archival field recordings of natural speech. Across these languages, a high f0 peak is a common correlate of initial stress, as has often been cited in the literature; increased vowel duration is similarly common. Effects of onset consonant or post-tonic consonant lengthening have been noted for many Australian languages and are sometimes found in this study, though the lengthening may only apply to one or two of stops, nasals, and glides.
This study investigates how lexical, phrasal, and contrastive stress are acoustically realized in American English, focusing on whether men and women differ in how they use pitch, amplitude, and duration to convey stress. Thirty-six native speakers completed minimal-pair stress production tasks online. We analyzed the resulting speech using prosodic contour measures, Bayesian ANOVAs, mixed-effects regression, Random Forest Classification, and human coder judgments. Results show greater acoustic overlap between lexical and contrastive stress than between either of those and phrasal stress. Duration was the primary cue for phrasal stress, while lexical and contrastive stress relied more evenly on multiple cues. Gender-based differences were especially evident in contrastive stress, which, to our knowledge, has not previously been studied in relation to gender: women relied more on pitch, while men emphasized amplitude and duration. These findings highlight the multidimensional acoustic nature of stress realization and demonstrate the value of combining computational and perceptual approaches in prosody research.
Americans of Mexican or Central American heritage have developed a cluster of dialects that follow recognisable patterns of immigrant groups. These dialects exhibit diversity depending on the region of the United States where they are spoken, the relative concentrations of their speakers, the degree of historical discrimination, the presence of African American influence and other factors. They all share a background of Spanish interference features, but they have all undergone a process of winnowing those features and adopting others as they develop. Phonetic influences have been easier to document than morphosyntactic influences.
The prosodic phenomenon called stød in Danish is largely predictable from stress patterns. Words with stress on the ultimate syllable generally have stød, and words with stress on the penultimate syllable generally do not have stød (Grønnum 2005, Goldshtein 2023). According to Basbøll (2005), stød in monomorphemic words can be explained exclusively by the location of the stressed syllable. However, there are exceptions to this general pattern. For example, words with stress on the penultimate syllable ending in -en, -er, or -el often have stød. In this study, an experiment with a two-alternative forced-choice task is used to investigate speakers’ preference for stød in monomorphemic nonce words ending in -en, -er, or -el, compared to nonce words ending in -e. The results show that participants prefer stød more often in nonce words ending in -en, -er, or -el than in nonce words ending in -e, and this preference increases in the same direction as the distribution in the lexicon. The study therefore shows that stød is not exclusively conditioned by the location of the stressed syllable in monomorphemic words. Speakers’ generalizations for stød are more fine-grained, and they reflect statistical patterns in the lexicon.
Although children with cochlear implants (CIs) have limited access to pitch information due to suboptimal device transmission, durational cues are relatively well preserved, allowing for the acquisition of prosodic cues needed for communication. Recent findings show that Mandarin-speaking preschoolers with CIs can produce prosodic cues (e.g., duration and pitch) to disambiguate noun-noun compounds (e.g., xiong-mao “panda”) and lists (e.g., xiong, mao “bear, cat”), with those implanted early (before age 2) demonstrating production patterns similar to their typical hearing (TH) peers. This then raises questions about these children’s ability to perceive prosodic cues, and whether early implantation again enhances their performance. These questions were investigated using a two-alternative forced-choice task with 57 Mandarin-speaking preschoolers with CIs and 66 TH peers. The results show that all preschoolers can perceive the prosodic cues needed to identify compounds but not lists, suggesting that, as in English, the mapping between prosodic cues and postlexical meaning is also acquired late in children learning a tonal language. In terms of the effect of CIs, those implanted before age 2 performed as well as their TH peers. These findings suggest that preschoolers may rely more on other linguistic information rather than prosodic cues when comprehending compounds and lists, offering cross-linguistic evidence for this tendency. Furthermore, interventions for preschoolers with CIs should support the mapping of prosodic cues to discourse functions rather than just vocabulary training, improving daily communicative abilities.
This article tests theories of verb stress in the Northwest Caucasian language Abkhaz using a new database corpus of 3,115 inflected forms of 445 verbs. I describe the creation of the corpus and show how it can be used to gain new insights into principles of Abkhaz stress assignment, which depend on complex interactions between phonology and polysynthetic verbal morphology. I implement a previous theory of Abkhaz stress assignment (Dybo 1977) in a computer program and use the corpus to assess empirically how well it accounts for stress patterns across the lexicon of eventive verbs in Abkhaz. I show how this empirical evaluation identifies both strengths and weaknesses of the theory, and use these to propose a revised theory of Abkhaz stress assignment. The revised theory ties with or outperforms the original in all verb categories, accounting for the stress alternations in 40 additional verbs, which comprise almost 10% of the corpus. This shows that corpora, combined with computational implementations of phonological theories, can be used to further our understanding of highly complex phonological data sets.
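The general workflow described here, implementing a stress theory as a program and scoring it against a corpus, can be sketched as follows. Everything in the example is hypothetical: the two placeholder "theories" (`theory_initial`, `theory_penultimate`) are not Dybo's rules or the article's revision, the three-verb corpus is invented, and the criterion that a verb counts as accounted for only when all of its forms are predicted correctly is an assumption made for illustration.

```python
from collections import namedtuple

# One inflected form: verb id, verb category, syllables, attested stress index.
Form = namedtuple("Form", "verb category syllables attested_stress")

def theory_initial(syllables):
    """Placeholder theory A (hypothetical): always stress the first syllable."""
    return 0

def theory_penultimate(syllables):
    """Placeholder theory B (hypothetical): stress the penultimate syllable."""
    return max(len(syllables) - 2, 0)

def verbs_accounted_for(theory, corpus):
    """Verbs for which the theory predicts the stress of every form correctly."""
    all_correct = {}
    for form in corpus:
        correct = theory(form.syllables) == form.attested_stress
        all_correct[form.verb] = all_correct.get(form.verb, True) and correct
    return {verb for verb, ok in all_correct.items() if ok}

def score_by_category(theory, corpus):
    """Map each verb category to (verbs accounted for, total verbs)."""
    accounted = verbs_accounted_for(theory, corpus)
    verbs_in_category = {}
    for form in corpus:
        verbs_in_category.setdefault(form.category, set()).add(form.verb)
    return {cat: (len(verbs & accounted), len(verbs))
            for cat, verbs in verbs_in_category.items()}

# Invented toy corpus; syllables are opaque placeholders, not Abkhaz data.
corpus = [
    Form("verb_a", "A", ("s1", "s2", "s3"), 0),
    Form("verb_a", "A", ("s1", "s2"), 0),
    Form("verb_b", "A", ("s1", "s2", "s3"), 1),
    Form("verb_b", "A", ("s1", "s2", "s3", "s4"), 2),
    Form("verb_c", "B", ("s1", "s2"), 0),
    Form("verb_c", "B", ("s1", "s2", "s3"), 1),
]

baseline_scores = score_by_category(theory_initial, corpus)
revised_scores = score_by_category(theory_penultimate, corpus)
newly_accounted = (verbs_accounted_for(theory_penultimate, corpus)
                   - verbs_accounted_for(theory_initial, corpus))

print("baseline:", baseline_scores)
print("revised: ", revised_scores)
print("newly accounted for:", sorted(newly_accounted))
```

Scoring per category makes it possible to check whether one theory ties with or outperforms another in every verb class, and the set difference lists the verbs gained by the second theory.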
Although Elizabeth Bowen is primarily known for her work as a novelist throughout her long career, her prose frequently resembles poetry. She often borrows elements from verse to enhance her fiction. Notably, the three-part greater romantic lyric has an influence on The House in Paris and The Death of the Heart in its plying together of past and present, as well as different locations. In her lectures, radio broadcasts, and literary criticism, Bowen was fond of illustrating the craft of fiction with examples from verse. Not only was she an avid reader and reviewer of contemporary poets such as T. S. Eliot, W. H. Auden, Stephen Spender, Christopher Isherwood, and May Sarton, but she was also a close friend of many of them. Whenever her work was compared to poetry, she took it as the highest compliment. This essay explores her intertwined connections, both in her language and in her life, to the poets and poetry that surrounded her.
Critics have routinely voiced their frustrations with William Carlos Williams’s term ‘measure’. But from the late 1930s onwards, he compared his idea of ‘measure’ to the science of measurement. This chapter suggests, first, that to fully appreciate Williams’s measure, one must understand how the science of measurement frequently appeared in the vocabulary of a variety of contemporaneous critics of poetry. In so doing, it sketches a lineage of scientific criticism that began in the late nineteenth century and that shaped modernist theories of prosody. Second, by close reading Williams’s long poem Paterson (1963), it suggests that by rejecting the term ‘rhythm’ and reprising ‘measure’, Williams was attempting to define the knowledge practices proper to poetry in an era where to measure was to know.
This chapter introduces the varied, intense, committed, unruly and, above all else, deeply political attempts to fashion a definitive scientific account of poetic production from 1880 to the present. It shows how, when one casts their eye back on nineteenth- and twentieth-century disciplinary history, criticism was not just written by literary critics. It was also an activity undertaken by scientists – by mathematicians, physicists, psychologists, statisticians, public rationalists, early computer scientists, educationalists and other generalist intellectuals seduced by the power of scientific rationality. This chapter then rehearses the major arguments of the book, noting, first, how professionalised literary criticism was shaped by this search for a science of verse. Second, it outlines how a series of modern poets, from Laura Riding to Veronica Forrest-Thomson, theorised how their poetry could produce a form of knowledge removed from the hegemony of scientific rationality. To do this, the chapter outlines a theory of the epistemology and political power of poetic artifice.
The objective of this special issue is to present innovative research demonstrating that prosody needs to be reconceptualized as an inherently multimodal phenomenon, manifested across the spoken and/or visual domains. The studies included are organized into three core themes. Theme 1 addresses the temporal alignment of spoken and visual aspects of prosody, and how this is shaped by linguistic factors, speaker-specific traits (such as neurodiversity) and language learning patterns. Theme 2 deals with the coordination of spoken and visual aspects of prosody in conveying pragmatic intent, focusing on aspects such as negation, emotion and epistemic stance. Theme 3 explores how visual signals, including head movements and manual signals, fulfil essential prosodic roles across diverse sign language typologies. Taken together, the empirical evidence presented here shows that prosody is also embodied and that our bodily movements can manifest prosodic characteristics. On the one hand, they show the need to comprehensively re-evaluate our understanding of how speakers, listeners and learners engage with the prosodic dimension of language. On the other hand, they reveal that non-referential gestures are deeply meaningful and prosodically structured. Ultimately, visual cues are presented as indispensable for building accurate models of the human language capacity.
This chapter studies the sixty-plus songs not forming part of Fauré’s seven defined song cycles, with reference also to the recently-published body of wordless vocalises that Fauré produced between 1906 and 1916. His evolving technique in song writing is viewed chronologically, in relation to poets he set, noting how he adapted compositional techniques to different poets; patterns that emerge can imply two further unstated ‘cycles’ involving his settings of Hugo and Baudelaire. Some meticulous hidden musical structuring can be related to his close attention to poetry, along with an unusual but focussed approach to syllabification, with vocal lines characteristically running in rhythmic counterpoint over piano parts rather than comfortably lying within them. Singers with whom Fauré collaborated closely are discussed, noting their vocal and musical qualities and how these may have marked Fauré’s vocal writing; the chapter ends by reciprocally quoting their accounts of Fauré’s wishes and preferences in performance.
This article examines the V3 particle så in Fenno-Swedish, where the particle can follow both initial arguments and adjuncts in root clauses. In Mainland Scandinavian, this distribution is rather strictly limited to the latter context. The starting point is that the V3-pattern-triggering så is the ‘general adverbial resumptive’ in copy-left dislocation. In copy-left dislocation, an agreeing resumptive item causes a similar V3 pattern, where the adverbial spell-outs of the resumptive are partially interchangeable with så. Three hypotheses are considered. Firstly, så may have become a fully generalised resumptive that is interchangeable with all spell-outs. Secondly, the distribution could include all initial elements, including wh-phrases and negation markers, that are not pure operators. Finally, the paper suggests that the phenomenon is partially prosodic, and that så satisfies a preference for having an anacrusis in the prosodic constituent that includes the finite verb.
The results of a production experiment show that English speakers distinguish elements under contrastive focus from elements that are merely new in the discourse. A novel paradigm eliciting both contrastively focused and merely discourse-new elements in the same sentence avoids differences in information structure and pitch accenting in the context surrounding the target elements that were confounds in previous studies on the topic. Elements under contrastive focus show greater duration, relative intensity, and F0 movement with respect to other elements in the utterance than elements that are new in the discourse but not under contrastive focus. We argue that the phonetic differences revealed here cannot be explained in terms of systematic manipulation of pitch-accent type or phrasal boundaries, and should instead be analyzed as differences in phrase-level phonological prominence for contrastively focused and merely discourse-new elements.