Durational information provides a reliable cue to the unfolding syntactic structure of a sentence. At the same time, durational properties of speech are largely dependent on predictability: Less predictable elements of an utterance are more carefully articulated, and thus produced more slowly. While these two determinants of duration (structure and predictability) often align, there exists a well-defined exception where the two factors make opposite predictions. We discuss converging evidence for tempo modulation playing a crucial role in the disambiguation of clausal attachment (modifier versus argument), leading to a shorter duration for the less predictable nested structure and a longer duration for the more predictable sisterhood structure. We then present an account of these temporal patterns based on the interaction of independently motivated prosodic principles.
A string of speech is a string of syllables, a series of varying amounts of jaw openings/closings. Neutralizing the vowel-intrinsic jaw opening indicates a pattern of jaw opening matching the utterance syllable prominence patterns. The hypothesis is that the jaw-opening patterns ensue from the metrical hierarchy of the language, such that for languages such as English, we see exponentially increasing jaw displacement on the metrically strong syllable within each foot, phrase, and utterance; for languages such as French, Chinese, and Japanese, we see increased jaw displacement at the end of each phrase, with the largest amount of jaw displacement at the end of the utterance. These language-specific jaw-displacement patterns tend to be carried over when learning a second language. Also explored in this chapter are segmental articulation interactions with jaw-displacement patterns, as well as the relationship between metrically motivated jaw displacement patterns and listeners’ perceptions of utterance prominence patterns.
A knowledge graph (KG) is a machine-readable structured representation of knowledge consisting of entities (entity and entity type) and relationships in various forms (e.g., labeled property graphs and resource description frameworks (RDFs)) (Sheth et al., 2019b). Knowledge-infused Learning (KiL), built on machine learning and deep learning, integrates external knowledge to address challenging problems in low-resource and open-domain natural language processing tasks and domain-specific problems. Domain-specific problems require the application of task-specific knowledge (implicit or explicit) to generic AI models. For example, to detect emerging events in a stream of crisis-related posts (e.g., hurricanes, the COVID-19 pandemic), a generic language model (e.g., Word2Vec (Mikolov et al., 2013), BERT) can be fine-tuned using the concepts and relationships found in a disaster ontology (e.g., empathi from Gaur et al., 2019a). Low-resource problems are characterized by having few labeled samples, with further labeling being costly in terms of effort, quality, and time. For instance, annotating millions of posts from users in various mental health communities on Reddit would require (a) establishing guidelines for annotation, (b) training annotators, (c) resolving annotation conflicts, and (d) improving quality over multiple iterations to achieve high annotator agreement. A study by Gaur et al. (2021b) proposed a KiL pipeline to annotate such extensive social data at scale, shifting the human role from annotators to evaluators.
This chapter reviews speech rhythm in the context of prosodic entrainment in speakers with autism, and then presents data on speaking-rate entrainment obtained from conversations of children and adolescents with and without autism. The study focuses in particular on speaking-rate entrainment at the level of the conversational turn and compares patterns of speaking-rate entrainment to patterns in entrainment of fundamental frequency. Local entrainment at the conversational-turn level is furthermore compared to global entrainment that occurs over the course of the entire conversation. Results show no differences in speaking-rate entrainment at the turn level between speakers with and without autism. Furthermore, speaking-rate and fundamental-frequency entrainment behavior are correlated at the level of the conversational turn for both groups. Lastly, results suggest that turn-level entrainment is not correlated with global entrainment in fundamental frequency, possibly indicating that local and global entrainment serve different conversational functions.
A considerable amount of the linguistic input that young infants receive consists of multi-word utterances where word boundaries are not marked by pauses. Therefore, a crucial step in language acquisition is to learn to parse the continuous speech stream into possible word candidates. Here we argue that the ability to anticipate how the speech signal will unfold plays an important part in speech segmentation throughout the lifespan, and that spoken language that is rhythmic and temporally predictable will have the biggest effect on speech segmentation. We introduce spontaneous pupillary synchrony with auditory stimuli as a novel way of investigating speech perception and segmentation as the speech signal unfolds. We discuss two studies with adults and young infants that show what synchronized changes in pupil size can reveal about the perception of temporal and structural rhythmic regularities in spoken language.
In speech, linguistic information is encoded in hierarchically organized units such as phones, syllables, and words. In auditory neuroscience, it is widely accepted that syllables in connected speech are quasi-rhythmic, and the rhythmicity makes them suitable to be encoded by theta-band neural oscillations. The rhythmicity of phones or words, however, is more controversial. Here, we analyze the statistical regularity in the duration of phones, syllables, and words, based on large corpora in English and Mandarin Chinese. The coefficient of variation (CV) of unit duration is slightly lower for syllables than phones and words, consistent with the idea that syllables are more rhythmic than phones and words, but the difference is weak. The mean duration of phones, syllables, and words matches the timescales of alpha-, theta-, and delta-band neural oscillations, respectively.
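The coefficient of variation mentioned above is the standard deviation of a set of durations divided by their mean, so a lower value indicates more regular (more rhythmic) timing. The following is a minimal sketch of that calculation, not the chapter's analysis code: the function name and the example durations are hypothetical, and the use of the population standard deviation is an assumption, since the abstract does not say which estimator was used.

```python
from statistics import mean, pstdev


def coefficient_of_variation(durations):
    """Return the CV (standard deviation / mean) of unit durations.

    `durations` is a sequence of positive durations in seconds.
    Assumption: population standard deviation (pstdev); the source
    does not specify the estimator.
    """
    if len(durations) == 0:
        raise ValueError("at least one duration is required")
    avg = mean(durations)
    if avg <= 0:
        raise ValueError("mean duration must be positive")
    return pstdev(durations) / avg


# Hypothetical syllable durations in seconds, for illustration only.
example_syllables = [0.18, 0.22, 0.20, 0.25, 0.15]
cv_syllables = coefficient_of_variation(example_syllables)
print(f"CV of example syllable durations: {cv_syllables:.3f}")
```

Because the CV is dimensionless, it allows units with very different mean durations (phones, syllables, words) to be compared on the same scale, which is what the comparison in the abstract relies on.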
The “speech envelope” is often used as an acoustic proxy for neural rhythm. The problem is its assumption that the unfiltered, broadband signal can satisfactorily model neural modulation in the auditory pathway (and beyond). However, the auditory system does not function as a passive transducer but rather decomposes and segregates the signal into an array of tonotopically organized frequency channels. This modulation filtering results in a partitioning of slow (3–20 Hz) neural modulation patterns across the tonotopic axis that bear only a passing resemblance to the broadband speech envelope. Such polychromatic diversity (in frequency, magnitude, and phase) of auditory modulation patterns is critical for decoding the speech signal, as it highlights critical linguistic properties such as articulatory-acoustic and prosodic features important for decoding and understanding spoken language. The low-frequency modulation patterns associated with high-frequency (>2 kHz) auditory channels are especially important for prosodic processing and consonant discrimination, both key for speech intelligibility, especially in adverse listening conditions and among the hard of hearing.
A novel approach of this book is its reliance on experimental evidence primarily drawn from well-controlled comparisons between completely illiterate and literate individuals, highlighting the mind-enhancing powers of reading. To properly interpret this evidence, it is necessary to clarify the evolving definitions of literacy and often inconsistent terminology used to describe individuals with varying literacy levels.
A better understanding of where speech and language rhythms come from may not only require their investigation in humans but also their roots in the animal kingdom. In this chapter, we summarize what is known about the role of locomotion and respiration as generators of rhythm across species. Furthermore, we discuss selected prosodic phenomena such as f0 declination over the course of an utterance and final lengthening at the end of an utterance as markers of rhythm. We summarize the evidence as to what extent they may also appear in communicative calls of animals, propose a new research program along those lines, and discuss their relation to language representations.
An increasing number of studies report that different forms of rhythmic stimulation influence linguistic task performance. First, this chapter aims at describing to what extent the construction of a tree-like structure in which lower-level units are combined into higher-level constituents in linguistic syntax and rhythm could be subserved by similar mechanisms. Second, we review and categorize rhythmic stimulation findings based on the temporal delay between the rhythmic stimulation and linguistic task that it influences, the precise relationship between the rhythmic and linguistic stimuli used, and the nature of the linguistic task. Lastly, this chapter discusses which categories of rhythmic stimulation effects can be interpreted in a framework based on a shared cognitive system that is responsible for hierarchical structure building.
New media create new realities, and, more than we often realize or acknowledge, new ways of thinking: new minds. Reading and the written medium transform not only societies but also individual minds.
The medium is not merely a channel for transmitting information or a passive carrier of content. While we tend to focus on the content, it is the medium that brings about the deeper, transformative effects. Extending McLuhan’s insight, one compelling conclusion emerges: The mind is the medium. The science of the benefits of the written medium for individual minds elucidates the myriad ways in which reading reshapes and enhances human cognition.
The study of memory resilience and cognitive aging remains in its early stages. Nevertheless, growing evidence suggests that a lifetime of literacy engagement and continued reading in older age confer significant cognitive benefits. High literacy levels are associated with increased cognitive reserve, which may offer a buffer against age-related memory decline. Once forgetfulness begins to interfere with daily functioning, this additional reserve may help avid readers maintain cognitive performance. In people at elevated risk for age-related memory disorders, such reserve may even delay or mitigate the onset of full-blown dementia.
Wise deliberative spaces depend on multiple forms of trust. Chapter 6 explores the factors that have diminished social, epistemic, and institutional trust over recent decades. These factors include fewer face-to-face interactions, the weakening of cross-cutting identities (across party lines), rising income inequality, and the fragmentation of the news media. The chapter examines specific patterns of epistemic and institutional trust in scientists, academics, journalists, and government officials. Drawing on the work of Mercier and Sperber, it then analyzes the mechanisms people use to calibrate trust. Humans have evolved to rely on a combination of social cues, plausibility checks, and argument evaluation to decide whom to trust. While these strategies can be fairly reliable in small-scale environments, they tend to be less dependable in large-scale, complex societies. The chapter concludes by considering strategies to strengthen trust and improve trust calibration. These include cultivating democratic, deliberative systems at the local level, especially those that promote civic interaction with outgroup members, and enhancing the transparency and responsiveness of institutions.
The term post-truth refers to circumstances in which objective facts exert less influence on public opinion than appeals to emotion and personal belief. While not new, this phenomenon has intensified with the speed at which misinformation and conspiracy theories can spread online, compounded by rising political polarization. This book draws on leading research in psychology and other social sciences to explain how post-truth claims emerge, why they persist despite contrary evidence, and how we might respond to their challenges. My analysis integrates three distinct approaches to human reasoning: Bayesian models, dual-process theories, and social argumentation. I introduce the term wise deliberative spaces to describe forums that pursue truth and the common good through discourse practices that foster deliberative dialogue. These spaces have declined in recent decades due to reduced face-to-face community engagement, shifts in the media landscape, declining trust in knowledge-producing institutions, and deepening political divides. The chapter concludes by summarizing the book’s organization.
Research on speech rhythm over the last decades has led to the widespread application of so-called rhythm metrics in order to empirically quantify variation in timing across languages and dialects. Many of these rhythm metrics are duration-based, such as the standard deviation of vocalic and consonantal interval duration (ΔV and ΔC, respectively), the coefficient of variation of vocalic interval duration (VarcoV), and the normalized pairwise variability index for vocalic intervals (nPVI-V). While these and other duration-based rhythm metrics have been widely used in research, and also tested for their reliability, there are also a number of lesser-used acoustic rhythm metrics. These indices rely solely on measures of variability in pitch, loudness, or other factors, or combine them with measures of duration. This chapter discusses which rhythm metrics are available and concludes with practical recommendations for their application (an accompanying Praat script is available at https://osf.io/79qyg/).
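The duration-based metrics named above can be sketched in a few lines. This is an illustrative Python version using the commonly cited definitions (ΔV as the standard deviation of vocalic interval durations, VarcoV as that standard deviation divided by the mean and multiplied by 100, and nPVI as 100 times the mean normalized difference between successive intervals). It is not the chapter's Praat script; the function names and sample durations are hypothetical, and the population standard deviation is an assumption.

```python
from statistics import mean, pstdev


def delta(durations):
    """Delta metric (e.g. ΔV or ΔC): standard deviation of interval durations.

    Assumption: population standard deviation.
    """
    return pstdev(durations)


def varco(durations):
    """Varco metric (e.g. VarcoV): 100 * standard deviation / mean."""
    return 100.0 * pstdev(durations) / mean(durations)


def npvi(durations):
    """Normalized pairwise variability index over successive intervals.

    Each absolute difference between neighbouring durations is divided
    by the mean of the two, then averaged and scaled by 100.
    """
    if len(durations) < 2:
        raise ValueError("nPVI needs at least two intervals")
    terms = [
        abs(a - b) / ((a + b) / 2.0)
        for a, b in zip(durations, durations[1:])
    ]
    return 100.0 * mean(terms)


# Hypothetical vocalic interval durations in seconds, for illustration only.
vocalic = [0.08, 0.15, 0.06, 0.12, 0.09]
print(f"deltaV = {delta(vocalic):.4f} s")
print(f"VarcoV = {varco(vocalic):.1f}")
print(f"nPVI-V = {npvi(vocalic):.1f}")
```

ΔV is expressed in seconds and therefore scales with speaking rate, whereas VarcoV and nPVI-V are normalized by the mean duration (globally and pairwise, respectively), which is the usual motivation for preferring them when speakers differ in tempo.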
In an era of rampant misinformation, conspiracy theories, and political polarization, this book confronts the paradox between rational models of human cognition and seemingly irrational behavior. Drawing on cutting-edge research in psychology and other social sciences, it explores practical tools such as fostering digital literacy and cultivating “wise deliberative spaces” grounded in argument, perspective taking, and moral inquiry. Writing for graduate students, researchers, and general readers, E. Michael Nussbaum provides an accessible introduction to contemporary models of reasoning, motivation, and dialogue. With chapters on truth, talk, trust, and thinking, the volume presents a revised model of dual-process theory, linking it to deliberative dialogue while integrating insights from education, communication studies, philosophy, and political science. The result is a timely vision of cautious optimism for navigating today’s post-truth challenges.
Speech is a multiplexed signal displaying levels of complexity, organizational principles, and perceptual units of analysis at distinct timescales. This critical acoustic signal for human communication is thus characterized at distinct representational and temporal scales, related to distinct linguistic features, from acoustic to supra-lexical. This chapter presents an overview of experimental work devoted to the characterization of the speech signal at different timescales, beyond its acoustic properties. The functional relevance of these different levels of analysis for speech processing is discussed. We advocate that studying speech perception through the prism of multi-timescale representations effectively integrates work from various research areas into a coherent picture and contributes significantly to increasing our knowledge on the topic. Finally, we discuss how these experimental results fit with neural data and current dynamical models of speech perception.