Abstract
The aim of this paper is to examine English vowel spaces as produced and perceived by Americans and Koreans. It identifies similarities and differences between the two groups in terms of perceptual contrast, as a basis for further research on appropriate auditory scales. The paper first reviews non-linguistic sources of speaker variation and methods of vowel normalization, then turns to competing theories of speech production and perception. Finally, it illustrates how Americans and Koreans perceive synthesized English vowels, followed by a discussion of the production data of B. Yang (1996).
Speaker variation
Speech signals vary greatly both between and within speakers. In real life, a speaker never produces a word in a physically identical way on two occasions or in two different contexts. Furthermore, no two speakers produce a word in exactly the same way, either articulatorily or acoustically. Consequently, phonetic measurements of vowels from two different speakers or language populations may contain so much speaker variation that comparisons between them lead to invalid inferences.
Traunmüller (1988) divided the sources of speaker variation into two types: (1) linguistic factors, such as dialectal and sociolectal differences, and (2) non-linguistic factors, such as physical anatomy, age, gender, and the emotional state of the speaker. Speakers also adapt their speech output to the listener and to the environment (Lindblom & Engstrand, 1989).