Perceptual validation of vowel normalization methods for variationist research

Published online by Cambridge University Press: 26 April 2021

Santiago Barreda*
Affiliation:
University of California, Davis

Abstract

The evaluation of normalization methods sometimes focuses on the maximization of vowel-space similarity. This focus can lead to the adoption of methods that erase legitimate phonetic variation from our data, that is, overnormalization. First, a production corpus is presented that highlights three types of variation in formant patterns: uniform scaling, nonuniform scaling, and centralization. Then the results of two perceptual experiments are presented, both suggesting that listeners tend to ignore variation according to uniform scaling, while associating nonuniform scaling and centralization with phonetic differences. Overall, results suggest that normalization methods that remove variation not according to uniform scaling can remove legitimate phonetic variation from vowel formant data. As a result, although these methods can provide more similar vowel spaces, they do so by erasing phonetic variation from vowel data that may be socially and linguistically meaningful, including a potential male-female difference in the low vowels in our corpus.
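The distinction between uniform and nonuniform scaling can be illustrated with a minimal sketch. It assumes the commonly cited forms of two of the methods named in the figure captions: log-mean normalization as subtraction of a speaker's grand mean log formant (here computed over F1 and F2 only), and Lobanov normalization as a within-speaker z-score applied to each formant separately. The formant values, the scaling factors, and all function names are hypothetical and are not taken from the article's corpus; the Watt and Fabricius method is omitted.

```python
import math
from statistics import mean, stdev

# Hypothetical (F1, F2) values in Hz for three vowels of one speaker.
# These are illustrative numbers, not data from the article.
BASE = {"i": (300.0, 2300.0), "a": (750.0, 1300.0), "u": (350.0, 900.0)}


def scale(vowels, k1, k2):
    """Return a new speaker whose F1 is scaled by k1 and F2 by k2."""
    return {v: (f1 * k1, f2 * k2) for v, (f1, f2) in vowels.items()}


def log_mean(vowels):
    """Log-mean normalization (assumed form): subtract the speaker's
    grand mean log formant from every log formant."""
    grand = mean(math.log(f) for pair in vowels.values() for f in pair)
    return {v: (math.log(f1) - grand, math.log(f2) - grand)
            for v, (f1, f2) in vowels.items()}


def lobanov(vowels):
    """Lobanov normalization (assumed form): z-score each formant
    separately within the speaker."""
    f1s = [f1 for f1, _ in vowels.values()]
    f2s = [f2 for _, f2 in vowels.values()]
    m1, s1 = mean(f1s), stdev(f1s)
    m2, s2 = mean(f2s), stdev(f2s)
    return {v: ((f1 - m1) / s1, (f2 - m2) / s2)
            for v, (f1, f2) in vowels.items()}


def max_diff(a, b):
    """Largest absolute difference between two normalized vowel spaces."""
    return max(abs(x - y) for v in a for x, y in zip(a[v], b[v]))


# Uniform scaling: every formant multiplied by the same constant.
uniform = scale(BASE, 1.15, 1.15)
# Nonuniform scaling: F1 and F2 multiplied by different constants.
nonuniform = scale(BASE, 1.25, 1.05)

for name, other in [("uniform", uniform), ("nonuniform", nonuniform)]:
    print(name,
          "log-mean diff:", round(max_diff(log_mean(BASE), log_mean(other)), 4),
          "Lobanov diff:", round(max_diff(lobanov(BASE), lobanov(other)), 4))
```

Under these assumptions, both methods map the uniformly scaled speaker onto the original, whereas only the log-mean method preserves the difference introduced by nonuniform scaling. The per-formant z-score removes it entirely, which is the kind of overnormalization the abstract describes.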

Information

Type
Research Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution, and reproduction in any medium, provided the original work is properly cited.
Copyright
Copyright © The Author(s), 2021. Published by Cambridge University Press

Figure 1. Vowel spaces of one male and one female speaker who differ primarily according to uniform scaling, presented in Hertz and normalized using the log-mean (LM), Watt and Fabricius (WF) and Lobanov (LB) methods.

Figure 2. The vowel spaces of one male and one female speaker (top row), and two male speakers (bottom row) who differ in nonuniform scaling. Vowel spaces are presented in Hertz and normalized using the log-mean (LM), Watt and Fabricius (WF) and Lobanov (LB) methods.

Figure 3. The vowel spaces of two male speakers (top row), and one male and one female speaker (bottom row) who differ in centralization. Vowel spaces are presented in Hertz and normalized using the log-mean (LM), Watt and Fabricius (WF) and Lobanov (LB) methods.

Figure 4. Mean productions of a subset of vowels produced by thirty male (dashed line) and female (solid line) speakers of California English, presented in Hertz and normalized using three methods. Ellipses enclose two standard deviations.

Figure 5. (a) Locations of training and testing stimuli for the standard voice. Points indicate the steps along the /i/-/ɪ/ continuum. (b) Comparison of testing stimuli for the large (L), medium (M), and small (S) standard speaker. Smaller speakers produce higher formant frequencies. Polygons outline vowel spaces implied by training vowels. (c) Comparison of testing stimuli for the large voices by voice type. (d) Comparison of all testing stimuli for standard (circle), high-F1 (triangle), and centralized (square) voice types across large, medium, and small speakers.

Table 1. Formant frequencies for large-speaker stimuli. Tokens whose vowel labels are numbers are steps along the /i/-/ɪ/ continua for the voices, with /i/ being the first step.

Figure 6. (a) Testing vowels plotted according to F1 and F2 values in Hertz. The same vowels are presented when normalized according to different normalization methods (b-d).

Figure 7. (a) Proportion of classifications into response categories for each continuum step, by voice type and size. (b) The top row contrasts classifications across size for each voice type (small voices in broken lines). The bottom row contrasts average classification rates for standard (Std), high-F1 (HF1), and centralized (Cent) voices, averaged across sizes.

Figure 8. (a) Distribution of squared correlation (R²) between classification functions for pairs of voices in the bootstrap analysis. Lines indicate 95% highest-density intervals, points indicate means. Numbers indicate corresponding lines in the right panel. (b) Testing continua used in this experiment. Lines indicate specific differences highlighted in the left panel. For example, although line (1) indicates a small acoustic difference, the members of these continua are phonetically dissimilar. In contrast, line (3) compares continua that are acoustically dissimilar yet phonetically similar.

Figure 9. Proportion of classifications into /i/ (left distribution), /ɪ/ (middle distribution), and /ɛ/ (right distribution) by continuum step, presented across voice and presentation type. The final row compares the classification functions of the panels in each column. The final column compares the classification functions for the panels in that row.

Figure 10. (a) Points indicate classification rates for all testing stimuli into different categories, organized along each of the different normalized spaces. Continuum steps increase left to right. Point types indicate standard (circle), high-F1 (triangle), and centralized (square) speakers. Lines indicate predicted classification rates at each location. (b) Distribution of R² for each normalization method and the differences in R² between each method resulting from the bootstrap analysis.

Figure 11. In the top row, productions of thirty male (dashed line) and female (solid line) California speakers are compared for low and mid vowels. Ellipses enclose two standard deviations. In the bottom row, lines indicate values of t-statistics comparing means for each vowel along F1 (circles) and F2 (squares). The horizontal dotted lines indicate the level at which values reach significance, and filled points indicate significant comparisons.