In Chapter 1 it was noted that the outer ear (pinna), ear canal, and middle ear modify and distort the acoustic signal before it reaches the inner ear. Although these changes would be considered serious defects if produced by microphones or other physical instruments, they can furnish valuable perceptual information. Thus, the complex acoustic changes produced by the pinnae when a source moves provide information concerning position, and result in perception of an unchanging sound at a changing location. This chapter discusses the effects produced by the pinnae, as well as other examples of how changes in acoustic input are not perceived as differences in the nature of the sound, but rather as differences in the location of the sound source relative to the listener's head.
Obviously, it is often important for us to know where a sound originates, but there is another advantage associated with the ability to localize sound. As we shall see, mechanisms employed for localization allow us to hear signals that would otherwise be inaudible.
Any position in space can be specified relative to an observer by its azimuth (angle from straight ahead measured in the horizontal plane), elevation (angle from the horizontal measured in a vertical plane), and distance. Unfortunately, it is difficult to mimic the localization of sound sources in space using headphones because the sounds generated by them usually have apparent sources positioned within the listener's head (for reasons that will be discussed subsequently).
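The three coordinates described above fully specify a source position relative to the listener. As a minimal illustration (the function name and the axis convention of x = right, y = ahead, z = up are assumptions chosen for this sketch, not part of the text), azimuth, elevation, and distance can be converted to head-centered Cartesian coordinates:

```python
import math

def to_cartesian(azimuth_deg, elevation_deg, distance):
    """Convert azimuth (degrees clockwise from straight ahead, in the
    horizontal plane), elevation (degrees above the horizontal), and
    distance into head-centered Cartesian coordinates.
    Convention assumed here: x = right, y = ahead, z = up."""
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    x = distance * math.cos(el) * math.sin(az)  # rightward offset
    y = distance * math.cos(el) * math.cos(az)  # forward offset
    z = distance * math.sin(el)                 # vertical offset
    return x, y, z

# A source 2 m away, directly to the listener's right, at ear level:
x, y, z = to_cartesian(90.0, 0.0, 2.0)  # x ≈ 2.0, y ≈ 0.0, z ≈ 0.0
```

A source straight ahead has azimuth 0°; one directly overhead has elevation 90° regardless of azimuth.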
The world is a noisy place, and signals of importance are often accompanied by louder irrelevant sounds. Hearing would lose much of its usefulness if we could discern only whichever sound was at the highest level. However, there are mechanisms permitting us to attend to sounds that are fainter. In addition, we can, under certain circumstances, perceptually restore sounds that have been completely obliterated.
Some of the mechanisms enhancing the perception of signals in the presence of interfering sounds have been discussed in Chapter 2. One of these mechanisms squelches or reduces the interference produced by echoes and reverberation through the isolation of the early-arriving components that reach the listener directly from the source. Another mechanism is associated with localization, and reduces the threshold for a sound originating at one place when subject to interference by a sound originating at another position. It is also pointed out in Chapter 2 that contralateral induction can restore monaurally masked signals, and prevent mislocalization of a source.
Nevertheless, signals of importance can still be completely masked and obliterated. But when masking is intermittent, snatches of the signal occurring before and after the obliterated segment can furnish information concerning the missing segments. When this occurs, a perceptual synthesis can allow listeners to perceive the missing sounds as clearly as those actually present. As we shall see, this “temporal induction” of missing fragments can not only result in illusory continuity of steady-state sounds such as tones, but it can also restore contextually appropriate segments of time-varying signals such as speech, melodies, and tone glides.
Books on perception usually concentrate on a single modality, such as vision or hearing, or a subdivision of a modality, for example, color vision or speech perception. Even when an introductory book on perception deals with several modalities, it is generally subdivided into sections with little overlap. The few books treating the senses together as a single topic generally emphasize philosophy or epistemology (but see Gibson, 1966; Marks, 1978).
Yet the senses are not independent. Events in nature are often multidimensional in character and stimulate more than one sensory system. An organism which optimizes its ability to interact appropriately with the environment is one that integrates relevant information across sensory systems. In a book dealing largely with single neurons that respond to more than one modality, Stein and Meredith (1993) stated that “we know of no animal with a nervous system in which the different sensory representations are organized so that they maintain exclusivity from one another.”
Multimodal perception
Interaction of vision with senses other than hearing
Depth perception in vision is based on a number of cues, including disparity of the images at the two retinae, and motion parallax. But, in addition to these cues transmitted by the optic nerve, there are proprioceptive ocular cues from the muscles producing accommodation (changes in the curvature of the lens necessary to produce a sharp image) and convergence (adjustment of the ocular axes so that the fixated object is imaged on corresponding points of each fovea).
This chapter reviews a classical problem, perception of tones, and suggests that our understanding of this topic may be enhanced by considering it as part of a larger topic: that of perception of acoustic repetition. As we shall see, periodic sounds repeated at tonal and infratonal frequencies appear to form a single perceptual continuum, with study in one range enhancing understanding in the other.
Terminology
Some terms used in psychoacoustics are ambiguous. The American National Standards Institute (ANSI, 1976/1999) booklet Acoustical Terminology defines some basic technical words as having two meanings, one applying to the stimulus and the other to the sensation produced by the stimulus. The conflation of terms describing stimuli with terms describing their sensory correlates is an old (and continuing) potential source of serious conceptual confusion – a danger that in 1730 led Newton (1952, p. 124) to warn that it is incorrect to use such terms as red light or yellow light, since “ … the Rays to speak properly are not coloured.” However, the ANSI definitions for the word “tone” reflect current usage, and state that the word can refer to: “(a) Sound wave capable of exciting an auditory sensation having pitch. (b) Sound sensation having pitch.” A similar ambiguity involving use of the same term to denote both stimulus and sensation is stated formally in the ANSI definitions for the word “sound.” The use of both of these terms will be restricted here to describe only the stimuli.
The comprehension of speech and the appreciation of music require listeners to distinguish between different arrangements of component sounds. It is often assumed that the temporal resolution of successive items is required for these tasks, and that a blurring and perceptual inability to distinguish between permuted orders takes place if sounds follow each other too rapidly. However, there is evidence indicating that this common-sense assumption is false. When components follow each other at rates that are too rapid to permit the identification of order or even the sounds themselves, changes in their arrangement can be recognized readily. This chapter examines the rules governing the perception of sequences and other stimulus patterns, and how they apply to the special continua of speech and music.
Rate at which component sounds occur in speech and music
Speech is often considered to consist of a sequence of acoustic units called phones, which correspond to linguistic units called phonemes (the nature of phonemes will be discussed in some detail in Chapter 7). Phonemes occur at rates averaging more than 10 per second, with the order of these components defining syllables and words. Conversational English averages about 135 words per minute, and since the average word has about 5 phonemes, this corresponds to an average duration of about 90 ms per phoneme (Efron, 1963). It should be kept in mind that individual phonemes vary greatly in duration, and that the boundaries separating temporally contiguous phonemes are often not sharply defined.
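The figure of about 90 ms per phoneme follows directly from the two averages cited. The arithmetic can be made explicit (variable names here are illustrative only):

```python
# Averages cited for conversational English (Efron, 1963)
words_per_minute = 135
phonemes_per_word = 5

phonemes_per_second = words_per_minute * phonemes_per_word / 60
mean_phoneme_duration_ms = 1000 / phonemes_per_second

print(phonemes_per_second)               # 11.25
print(round(mean_phoneme_duration_ms))   # 89, i.e., about 90 ms
```

This also confirms the statement that phonemes occur at rates averaging more than 10 per second.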
This chapter provides a brief introduction to the physical nature of sound, the manner in which it is transmitted and transformed within the ear, and the nature of auditory neural responses.
The nature of auditory stimuli
The sounds responsible for hearing consist of rapid changes in air pressure that can be produced in a variety of ways – for example, by vibrations of objects such as the tines of a tuning fork or the wings of an insect, by puffs of air released by a siren or our vocal cords, and by the noisy turbulence of air escaping from a small opening. Sound travels through the air at sea level at a velocity of about 335 meters per second, or 1,100 feet per second, for all but very great amplitudes (extent of pressure changes) and for all waveforms (patterns of pressure changes over time). Special interest is attached to periodic sounds, or sounds having a fixed waveform repeated at a fixed frequency. Frequency is measured in hertz (Hz), or numbers of repetitions of a waveform per second; thus, 1,000 Hz corresponds to 1,000 repetitions of a particular waveform per second. The time required for one complete statement of an iterated waveform is its period. Periodic sounds from about 20 through 16,000 Hz can produce a sensation of pitch and are called tones.
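The relations stated above – period as the reciprocal of frequency, and the fixed velocity of sound – can be sketched numerically. The function names and the wavelength calculation (velocity divided by frequency, a standard relation not spelled out in the text) are illustrative assumptions:

```python
SPEED_OF_SOUND_M_PER_S = 335.0  # approximate velocity in air at sea level

def period_ms(frequency_hz):
    """Duration of one complete statement of the waveform, in milliseconds."""
    return 1000.0 / frequency_hz

def wavelength_m(frequency_hz):
    """Distance covered by one period of the wave, in meters."""
    return SPEED_OF_SOUND_M_PER_S / frequency_hz

print(period_ms(1000))     # 1.0 -> a 1,000 Hz tone repeats every 1 ms
print(period_ms(20))       # 50.0 -> the lower tonal limit has a 50 ms period
print(wavelength_m(1000))  # 0.335 -> about a third of a meter per cycle
```

Note how the 20–16,000 Hz tonal range spans periods from 50 ms down to about 0.06 ms.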
Earlier chapters dealing with nonlinguistic auditory perception treated humans as receivers and processors of acoustic information. But when dealing with speech perception, it is necessary also to consider humans as generators of acoustic signals. The two topics of speech production and speech perception are closely linked, as we shall see.
We shall deal first with the generation of speech sounds and the nature of the acoustic signals. The topic of speech perception will then be described in relation to general principles, which are applicable to nonspeech sounds as well as to speech. Finally, the topic of special characteristics and mechanisms employed for the perception of speech will be examined.
Speech production
The structures used for producing speech have evolved from organs that served other functions in our prelinguistic ancestors and still perform nonlinguistic functions in humans.
It is convenient to divide the system for production of speech into three regions (see Figure 7.1). The subglottal system delivers air under pressure to the larynx (located within the Adam's apple) which contains a pair of vocal folds (also called vocal cords). The opening between the vocal folds is called the glottis, and the rapid opening and closing of the glottal slit interrupts the air flow, resulting in a buzz-like sound. The buzz is then spectrally shaped to form speech sounds or phonemes by the supralaryngeal vocal tract having the larynx at one end and the lips and the nostrils at the other.
Bird songs are among the most beautiful, complex sounds produced in the natural world and have inspired some of our greatest poets and composers. Whilst biologists are equally impressed, their curiosity is also aroused. How and why has such an elaborate form of communication developed among birds? Charles Darwin was one of many who struggled to attempt an answer, and the elaborate songs of male birds such as nightingales clearly influenced his thinking as he developed the theory of sexual selection. Since then, biologists from many different disciplines, ranging from molecular biology to ecology, have found bird song to be a fascinating and productive area for research. The scientific study of bird song has made important contributions to such areas as neurobiology, ethology and evolutionary biology. In doing so, it has generated a large and diverse literature, which can be frustrating to those attempting to enter or survey the field. At the moment, the choice is largely between wrestling with the original literature or tackling advanced, multi-author volumes. Although our book is aimed particularly at students of biology, we hope that our colleagues in different branches of biology and psychology will find it a useful introduction. We have also tried to make it accessible to the growing numbers of ornithologists and naturalists who increasingly want to know more about the animals they watch and study.