Forced alignment for Nordic languages: Rapidly constructing a high-quality prototype

Published online by Cambridge University Press: 01 December 2021

Nathan J. Young
Affiliation:
Centre for Research on Bilingualism, Stockholm University, Stockholm 106 91, Sweden. Email: nathan.young@biling.su.se
Michael McGarrah
Affiliation:
Department of Computer Science, Georgia Institute of Technology, North Avenue, Atlanta, Georgia 30332, USA. Email: mcgarrah@gmail.com

Abstract

We propose a rapid adaptation of FAVE-Align to the Nordic languages, and we offer our own adaptation to Swedish as a template. This study is motivated by the fact that researchers of lesser-studied languages often have neither sufficient speech material nor sufficient time to train a forced aligner. Faced with a similar problem, we made a limited number of surface changes to FAVE-Align so that it, along with its original hidden Markov models for English, could be used on Stockholm Swedish. We tested the performance of this prototype on the three main sociolects of Stockholm Swedish and found that read-aloud alignments met all of the minimal benchmarks set by the literature. Spontaneous-speech alignments met three of the four minimal benchmarks. We conclude that an adaptation such as ours would especially suit laboratory experiments in Nordic phonetics that rely on elicited speech.

Information

Type
Research Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives licence (https://creativecommons.org/licenses/by-nc-nd/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is unaltered and is properly cited. The written permission of Cambridge University Press must be obtained for commercial re-use or in order to create a derivative work.
Copyright
© The Author(s), 2021. Published by Cambridge University Press on behalf of Nordic Association of Linguistics

Figure 1. INPUTS 1 and 2: Five-column tab-delimited transcription input for FAVE-Align, produced with ELAN, and sound file.

Figure 2. INPUT 3: Pronunciation dictionary with all possible pronunciations using ASCII characters for IPA.

Figure 3. OUTPUT: Phonetically segmented file that is readable in Praat.

Table 1. Schedule of the benchmarks set in the literature according to the four most popular measurements. (Abbreviations: AE American English; BE British English; S spontaneous speech; R read-aloud speech; ms milliseconds; pct percentage.)

Table 2. SweFAbet, corresponding IPA, grapheme, Swedish lexical example, and closest English phoneme with ARPAbet.

Figure 4. Map of greater Stockholm and its metro. Home neighborhoods of the nine speakers are plotted, and speakers are itemized according to their respective social dialects.

Table 3. (top) Upper and lower performance standards from the literature. (bottom) Performance of SweFA for three male speakers of Stockholm’s three main sociolects each in two speech styles according to four metrics. Results highlighted in light gray exceed the lowest standards in the literature; results highlighted in dark gray exceed the highest standards in the literature.

Figure A1. Section of FAVE’s Python code that defines monophones.

Figure A2. Section of SweFA’s Python code that defines monophones.

Figure A3. Section of SweFA’s Python code that defines monophone string length and stress numbering.

Figure A4. Additional section of SweFA’s Python code that defines monophone string length.

Figure A5. Section of FAVE’s Python code that converts potential UTF-8 characters in the transcription into ASCII.

Figure A6. Excerpt from lines 40470 to 41068 of the hidden Markov model vectors for the monophone UH in unstressed position (indicated by ~h “UH0”).

Figure A7. Converting the FAVE-Align vectors for UH to SweFA’s OEH. First UH1 and UH2 are duplicated, then the names are changed.

Figure A8. Dictionary format for /FAVE-align/model/dict. Every entry requires its own line, the entry must be in ASCII, and the entry is separated from its pronunciation by a single space. Subsequent spaces separate monophones.
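
The dictionary format described in Figure A8 can be illustrated with a minimal Python sketch. It is not taken from FAVE-Align or SweFA: the function names `parse_dict_line` and `load_dict` are hypothetical, and the sample entries and their monophone labels are invented for illustration rather than drawn from the SweFA dictionary. The sketch assumes only what the caption states: one entry per line, ASCII only, a single space between the entry and its pronunciation, and further spaces between monophones. It also assumes that a word with several pronunciations is listed on several lines.

```python
def parse_dict_line(line):
    """Split one dictionary line into (entry, [monophones]).

    Assumed format, per the Figure A8 caption: ASCII only, the entry
    is separated from its pronunciation by a single space, and the
    remaining spaces separate monophones.
    """
    line = line.rstrip("\n")
    if not line.isascii():
        raise ValueError(f"dictionary line must be ASCII: {line!r}")
    # The first space ends the entry; everything after it is the pronunciation.
    entry, sep, pronunciation = line.partition(" ")
    monophones = pronunciation.split()
    if not entry or not sep or not monophones:
        raise ValueError(f"expected 'ENTRY PHONE PHONE ...': {line!r}")
    return entry, monophones


def load_dict(lines):
    """Collect every pronunciation listed for each entry."""
    pronunciations = {}
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        entry, monophones = parse_dict_line(line)
        pronunciations.setdefault(entry, []).append(monophones)
    return pronunciations


# Invented sample lines (hypothetical entries and monophone labels).
sample = [
    "HEJ H EH1 J",
    "HEJ H AE1 J",
]
lexicon = load_dict(sample)
```

Rejecting non-ASCII lines at load time, rather than silently transliterating them, is a design choice of this sketch; it surfaces dictionary entries that would otherwise fail to match the ASCII-converted transcription.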