Connected speech is defined here as any speech in units larger than single words, including phenomena that happen at word boundaries even in careful speech, as well as phenomena of spontaneous or conversational speech. The former includes abstract phonological processes that are triggered by word boundaries (e.g. insertion of /r/ in some English dialects, as in Australia[ɹ] is) but that are accompanied by sub-phonemic, phonetic effects. The latter topic covers acoustic characteristics and perception of connected speech, regardless of word boundaries. For example, the vowel space appears to shrink in more connected and/or spontaneous speech, phonemically voiced stop consonants are often reduced to approximants, and segmental deletions and reduction in the number of syllables are common. It is often difficult to believe the extent of the reduction that one finds in spontaneous speech, and even when listening to recordings, one frequently fails to notice the reductions until one zooms in and examines individual syllables. Providing an array of examples (audio available online) may help to demonstrate the pervasiveness of reduction in connected speech.
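As a minimal illustration of one way to quantify the vowel-space shrinkage mentioned above (a sketch, not the chapter's own procedure), the code below measures vowel-space area as the area of the convex hull of vowel tokens in the F1/F2 plane; all formant values are invented for illustration.

# Sketch: vowel-space area as the convex hull of (F2, F1) vowel tokens.
# A smaller hull for spontaneous tokens is consistent with reduction.
import numpy as np
from scipy.spatial import ConvexHull

def vowel_space_area(f1_hz, f2_hz):
    """Area (in Hz^2) of the convex hull of vowel tokens in the F2/F1 plane."""
    points = np.column_stack([f2_hz, f1_hz])
    return ConvexHull(points).volume  # for 2-D input, .volume is the area

# Hypothetical midpoint formants for careful vs. spontaneous tokens.
careful = vowel_space_area([300, 750, 700, 320], [2400, 1700, 1100, 900])
spontaneous = vowel_space_area([380, 650, 620, 400], [2100, 1650, 1250, 1050])
print(f"careful: {careful:.0f} Hz^2, spontaneous: {spontaneous:.0f} Hz^2")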
The phonetics/phonology interface refers to the relationship between the physical dimensions of phonetics and the abstract arrangement of phonemes and their manifestations within the phonological systems of languages. This chapter provides an overview of a range of approaches to the investigation of the phonetics/phonology interface, with particular attention to the relationships between phonetic factors such as positional prominence, acoustic salience and articulatory gestures, and phonological phenomena such as segment features and inventories, assimilation, and tone. I survey several clusters of theoretical orientation, each with distinct theoretical underpinnings and claims about the extent to which phonological concepts encode, reflect or direct phonetic details. I conclude with a discussion synthesising these seemingly disparate approaches, unifying them around a theme of linking the continuous physical dimensions of phonetic science with the abstract cognitive categories and rules of combination that typify phonological models. I discuss pedagogical implications and new directions in which facets of the interface can be explored.
This chapter addresses the issue of measuring consonants from an acoustic perspective. After reviewing some of the historical precedents that laid the foundations of acoustic analysis relevant for speech, the chapter provides a detailed report of the techniques for measuring the acoustic information of consonants of six manners of articulation (fricatives, stops, affricates, nasals, approximants, and the group of trills, taps and flaps). The chapter discusses links between the main articulatory characteristics of consonants of each manner and their acoustic correlates, with a focus on those acoustic variables that differentiate consonants within a manner, and on the variety of methods that are employed to measure them. Whenever possible, the chapter gives specific guidelines on how to apply the measurements, highlighting the differences in implementation between authors as well as the advantages and disadvantages of selecting one approach over another. In its closing sections, the chapter discusses some recent studies of consonant measurement, provides some practical recommendations for teaching, and identifies some future directions for the topic.
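As one concrete example of such a measurement (a sketch under stated assumptions, not the chapter's own implementation), the code below computes the four spectral moments often used to differentiate fricative place of articulation; the file name, window placement and window size are assumptions.

# Sketch: spectral moments of a frication-noise frame.
import numpy as np
from scipy.io import wavfile

def spectral_moments(frame, sr):
    """Centre of gravity, spread, skewness and kurtosis of the power spectrum."""
    spectrum = np.abs(np.fft.rfft(frame * np.hanning(len(frame)))) ** 2
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
    p = spectrum / spectrum.sum()            # treat the spectrum as a probability mass
    cog = np.sum(freqs * p)                  # 1st moment: centre of gravity
    sd = np.sqrt(np.sum((freqs - cog) ** 2 * p))
    skew = np.sum((freqs - cog) ** 3 * p) / sd ** 3
    kurt = np.sum((freqs - cog) ** 4 * p) / sd ** 4 - 3
    return cog, sd, skew, kurt

sr, audio = wavfile.read("fricative.wav")           # hypothetical mono recording
mid = len(audio) // 2
frame = audio[mid - 512:mid + 512].astype(float)    # 1024-sample midpoint window
print(spectral_moments(frame, sr))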
This chapter gives an overview of critical issues in contemporary research on the phonetics of intonation, arising from a survey of historical and recent trends in the field. We begin with a brief introduction to some of the key concepts to be used in the description of intonation in the chapter, which is based primarily on the Autosegmental Metrical framework. In the subsequent historical overview, we place this tone-based framework in its historical context, comparing it with the British tune-based tradition, before outlining more recent developments arising out of studies of typological variation of intonation, which have influenced our understanding of both the forms and the meanings of intonation. Three critical issues in the study of intonation are then reviewed: defining the phonetic variables of intonation, the relationship of intonation to other linguistic structures, and intonational variation and change. A sampling of recent research subsequently highlights work that relates to these critical issues. Key considerations for the teaching of intonation are then reviewed, before some closing comments on future directions for intonation research.
This chapter covers the methods for measuring rhythm and the main paradigms used to study rhythm perception. An overview of ideas about speech rhythm is provided, starting with the traditional view of isochrony and rhythm classes. Production and perception methods used to establish rhythm-class differences are presented and critically reviewed, as are a number of research practices associated with them. Recent developments leading to an alternative view of rhythm are discussed, and suggestions for pedagogical practice and future research are provided.
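As a minimal sketch of one widely used rhythm metric (not the chapter's own code), the function below computes the normalised Pairwise Variability Index (nPVI) over successive interval durations, such as vocalic intervals; the durations are invented for illustration.

# Sketch: nPVI = 100 * mean(|d_k - d_{k+1}| / ((d_k + d_{k+1}) / 2)).
import numpy as np

def npvi(durations):
    """Normalised Pairwise Variability Index over successive durations (s)."""
    d = np.asarray(durations, dtype=float)
    return 100.0 * np.mean(np.abs(np.diff(d)) / ((d[:-1] + d[1:]) / 2.0))

# Alternating long/short intervals yield a high nPVI; even durations, near 0.
print(npvi([0.12, 0.06, 0.14, 0.05, 0.11]))  # stress-timed-like pattern
print(npvi([0.09, 0.10, 0.09, 0.10, 0.09]))  # syllable-timed-like pattern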
Consonants are speech sounds produced with a closure or near-complete constriction of the vocal tract. All languages systematically exploit place of articulation to differentiate consonants. Eight other phonetically independent parameters are used to create consonant contrast: airstream, constriction degree, laryngeal setting, nasality, laterality, length, articulator stiffness, and respiratory strength. Aspiration, affrication, pre-stopping, secondary articulations, and other properties of ‘complex’ consonants are best described as patterns of coordination in the underlying gestures.
This chapter covers two related prosodic phenomena: stress, i.e. the relative perceived prominence of individual syllables, and speech rhythm, the distributed prominence of syllables across stretches of speech and their perceived regularity in time. Both stress and rhythm can be viewed from the angles of perception and production, and speakers of different languages differ in how stress and rhythm are produced, perceived and interpreted for linguistic meaning. The chapter explains which articulatory and phonatory factors have been found to play a role in the production of stressed syllables, and distinguishes between stress and accent. The historically important concepts of rhythm classes and isochrony are presented in the context of current developments and debates. Three recent issues for research are presented in some detail: the analysis of stress in different languages, rhythm metrics, and rhythm and perception. The chapter further explores the role of rhythm for turn-taking in everyday talk, showing that conversationalists aim to rhythmically integrate their turns at talk with those of other speakers.
Eye-tracking has proven to be a fruitful method to investigate how listeners process spoken language in real time. This chapter highlights the contribution of eye-tracking to our understanding of various classical issues in phonetics concerning the uptake of segmental and suprasegmental information during speech processing, as well as the role of gaze during speech perception. The review introduces the visual-world paradigm and shows how variations of this paradigm can be used to investigate the timing of cue uptake, how speech processing is influenced by phonetic context, how word recognition is affected by connected-speech processes, how listeners use word-level prosody such as lexical stress, and how intonation contributes to reference resolution and sentence comprehension. Importantly, since the eye-tracking record is continuous, it allows us to distinguish early perceptual processes from post-perceptual processes. The chapter also provides a brief note on the most important issues to be considered in teaching and using eye-tracking, including comments on data processing, data analysis and interpretation, as well as suggestions for how to implement eye-tracking experiments.
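As a minimal sketch of a common visual-world analysis step (the data format and column names are assumptions, not a standard from the chapter), the code below computes the proportion of fixations to the target object in successive time bins after word onset.

# Sketch: fixation proportions to the target per time bin.
import pandas as pd

def fixation_proportions(samples, bin_ms=50):
    """samples: one gaze sample per row, with 'time_ms' (relative to word
    onset) and 'aoi' (area-of-interest label) columns."""
    bins = (samples["time_ms"] // bin_ms) * bin_ms
    return samples.assign(bin=bins).groupby("bin")["aoi"].apply(
        lambda s: (s == "target").mean())

# Invented gaze samples: looks converge on the target over time.
samples = pd.DataFrame({
    "time_ms": [10, 40, 60, 90, 110, 140, 160, 190],
    "aoi": ["distractor", "distractor", "target", "distractor",
            "target", "target", "target", "target"],
})
print(fixation_proportions(samples))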
This chapter provides an introduction to the acoustic and perceptual measurement of vowels. The measurable acoustic properties of vowels are formants, duration, pitch and intensity. Perceptual measurements include identification and discrimination of natural or synthesised vowels. After a brief review of the historical representation of the vowel space, technical details are given on measuring the acoustic properties of vowels, including perceptual measurements and speaker normalisation. The latter plays a pivotal role in comparing vowel spaces across language and gender groups. A few normalisation methods, along with the transformation of acoustic formant frequency values into auditory scales, are reviewed to provide a foundation for a cross-linguistic and curvilinear comparison of vowels. In addition, we describe competing models and theories and discuss correlations between vowel height and pitch, followed by practical scenarios and future studies on these measurements using software and internet resources.
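As a minimal sketch of one of the normalisation methods in this family, the code below applies Lobanov (z-score) normalisation, rescaling each formant by a speaker's own mean and standard deviation so that vowel spaces become comparable across speakers; the values are invented for illustration.

# Sketch: Lobanov normalisation, z = (F - speaker mean) / speaker sd,
# computed per speaker and per formant over that speaker's vowel tokens.
import numpy as np

def lobanov(formant_hz):
    f = np.asarray(formant_hz, dtype=float)
    return (f - f.mean()) / f.std(ddof=1)

# Hypothetical F1 midpoints for the same five vowels from two speakers
# with very different vocal-tract lengths:
speaker_a_f1 = [290, 400, 550, 700, 760]
speaker_b_f1 = [350, 480, 660, 840, 910]
print(np.round(lobanov(speaker_a_f1), 2))
print(np.round(lobanov(speaker_b_f1), 2))  # similar z-score patterns emerge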
The physiology of speech encompasses the articulatory structures, including the respiratory system, the larynx and various vocal tract articulators, together with the sensory organs, which provide auditory, somatosensory and visual inputs that map the feature space in which speech is produced and perceived. In this chapter the focus is on the neurophysiology of the articulatory structures. The acoustic characteristics of speech sounds are determined by changes in the length and tension of muscles, coordinated, at the lowest level, by interlinked clusters of motor neurons and interneurons in the brainstem which are themselves directed by excitation from cortical and midbrain structures. This chapter provides a brief foundation to these systems and structures, taking a functional perspective. The progressive nature of research into the anatomy and physiology of speech continues to generate new discoveries, and advances in modelling and mapping of biomechanical and neural control promise new avenues for phonetic research.
This chapter addresses the bidirectional interface between phonetics and speech-language therapy/pathology, focusing on the application of phonetic principles and methods within the clinical domain. The history of clinical phonetics as a phonetic subdomain is charted, including the birth of the extensions to the IPA for disordered speech (extIPA). Three critical issues are touched on: the complexities of the phonetics/phonology interface in discussing disordered speech; the related clinical application of different levels of transcription; and how advancing technologies are enabling clinical phoneticians to better understand the implications of clinical conditions for speech perception and production. In discussing a range of clinical populations and affected speech subsystems, the chapter highlights some of the salient phonetic features explored in recent years and insights gained from different instrumental methods. Best practice for teaching and learning is described in the context of the professional training objective of most clinical phonetics programmes, and future directions of clinical phonetics are hypothesised in terms of the evolving technological and clinical landscapes.
Pitch, the subjective impression of whether individual speech sounds are perceived as relatively high or low, is an important characteristic of spoken language, contributing in some languages to the lexical identity of words and in all languages to the perception of the intonation pattern of utterances. Pitch corresponds to the physiological parameter of the frequency of vibration of the vocal folds, the fundamental frequency, which can be measured in cycles per second or hertz. Estimating and measuring fundamental frequency and modelling pitch are not easy. After presenting some automatic models of pitch, we address issues related to the detection and measurement of fundamental frequency, including tracking/detection errors, and explain how many of these errors can be avoided by the appropriate choice of pitch ceiling and floor settings. We finally discuss the use of acoustic scales (linear, logarithmic, psychoacoustic) for the measurement of pitch. Based on evidence from recent findings in neuroanatomy, neurophysiology, behavioural studies and speech production, we suggest that a new scale, the Octave-Median (OMe) scale, appears to be more natural for the study of speech prosody.
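As a minimal sketch of such speaker-adapted floor and ceiling settings, the code below runs a two-pass f0 analysis with parselmouth (a Python interface to Praat); the floor/ceiling heuristic (0.75 times the first quartile, 1.5 times the third) follows one published proposal and should be treated as an assumption, as should the file name. The closing line expresses f0 in octaves relative to the speaker's median, in the spirit of the OMe scale described above.

# Sketch: two-pass f0 extraction to reduce octave and halving errors.
import numpy as np
import parselmouth

snd = parselmouth.Sound("utterance.wav")             # hypothetical recording

# Pass 1: deliberately broad range, just to estimate the speaker's f0 span.
f0 = snd.to_pitch(pitch_floor=60, pitch_ceiling=700).selected_array["frequency"]
voiced = f0[f0 > 0]                                  # 0 marks unvoiced frames
q1, q3 = np.percentile(voiced, [25, 75])

# Pass 2: speaker-adapted floor and ceiling.
pitch = snd.to_pitch(pitch_floor=0.75 * q1, pitch_ceiling=1.5 * q3)
f0 = pitch.selected_array["frequency"]
voiced = f0[f0 > 0]

ome = np.log2(voiced / np.median(voiced))            # octaves re. the median f0
print(f"median f0: {np.median(voiced):.1f} Hz, OMe range: "
      f"{ome.min():.2f} to {ome.max():.2f} octaves")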
Building machines to converse with human beings through automatic speech recognition (ASR) and understanding (ASU) has long been a topic of great interest for scientists and engineers, and we have recently witnessed rapid technological advances in this area. Here, we first cast the ASR problem as a pattern-matching and channel-decoding paradigm. We then follow this with a discussion of the Hidden Markov Model (HMM), which is the most successful technique for modelling fundamental speech units, such as phones and words, in order to solve ASR as a search through a top-down decoding network. Recent advances using deep neural networks as parts of an ASR system are also highlighted. We then compare the conventional top-down decoding approach with the recently proposed automatic speech attribute transcription (ASAT) paradigm, which can better leverage knowledge sources in speech production, auditory perception and language theory through bottom-up integration. Finally we discuss how the processing-based speech engineering and knowledge-based speech science communities can work collaboratively to improve our understanding of speech and enhance ASR capabilities.
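As a minimal sketch of the dynamic-programming search at the heart of HMM-based decoding (a toy model, not a production ASR system), the code below implements Viterbi decoding: finding the most likely hidden state sequence given observation log-likelihoods. The two-state model and observation scores are invented for illustration.

# Sketch: Viterbi decoding over log-probabilities.
import numpy as np

def viterbi(log_init, log_trans, log_obs):
    """log_init: (S,), log_trans: (S, S), log_obs: (T, S) log-likelihoods."""
    T, S = log_obs.shape
    delta = log_init + log_obs[0]            # best log-prob ending in each state
    back = np.zeros((T, S), dtype=int)       # backpointers to best predecessors
    for t in range(1, T):
        scores = delta[:, None] + log_trans  # scores[i, j]: from state i to j
        back[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + log_obs[t]
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):            # trace the best path backwards
        path.append(int(back[t, path[-1]]))
    return path[::-1], float(delta.max())

# Toy example: two phone-like states, four observation frames.
log_init = np.log([0.6, 0.4])
log_trans = np.log([[0.7, 0.3], [0.2, 0.8]])
log_obs = np.log([[0.9, 0.1], [0.6, 0.4], [0.2, 0.8], [0.1, 0.9]])
states, score = viterbi(log_init, log_trans, log_obs)
print(states, score)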