Identifying the dialectal background of American Finnish speakers using a supervised machine-learning model

Published online by Cambridge University Press: 11 July 2023

Ilmari Ivaska*
Affiliation:
Department of Finnish and Finno-Ugric Languages, University of Turku, 20014, Finland
Mirva Johnson
Affiliation:
Department of German, Nordic, and Slavic+, University of Wisconsin-Madison, Madison, WI 53705, USA
Tommi Kurki
Affiliation:
Department of Finnish and Finno-Ugric Languages, University of Turku, 20014, Finland
*Corresponding author: Ilmari Ivaska; Email: ilmari.ivaska@utu.fi

Abstract

This study presents results of two experiments using supervised machine-learning models to examine individual Finnish speakers’ dialectal backgrounds. Data come from interviews conducted with heritage speakers of Finnish in northern Wisconsin and are compared to data from the Finnish Dialect Syntax Archive. The models were constructed and then, following successful validation testing, used to identify the dialectal background of five individual American Finnish speakers. Results showed individual variation in dialectal backgrounds and some correlation to speakers’ likely language input. Our approach offers a new methodological tool for examining speakers’ dialectal backgrounds in situations of language contact.

Information

Type
Research Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2023. Published by Cambridge University Press on behalf of The Nordic Association of Linguists

Figure 1. Finland’s dialect areas: (1) Southwest, (2) Southwest transitional, (3) Häme, (4) South Ostrobothnia, (5) Central/North Ostrobothnia, (6) Far North, (7) Savo, (8) Southeast. The dark-grey areas are predominantly Swedish-speaking. Map by Tommi Kurki.

Table 1. Background of American Finnish speakers

Table 2. Distribution of variants of inessive case forms

Table 3. Confusion matrix of the classification of the LAX informants

Figure 2. The fifteen most important dialect features for distinguishing speakers from different dialect regions. Abbreviations used in the model: 1PLPRON = first person plural pronoun; 1SGPRON = first person singular pronoun; D = /d/ variable; INE = inessive case; ME = standard first person plural pronoun; MINA = standard first person singular pronoun; OA = word-final -oa; SCHWA = schwa vowel; SCHWA_H = schwa vowel inserted following /h/; VERBPL = first person plural verb ending.

Figure 3. True and predicted locations of the LAX informants.

Figure 4. The fifteen most important dialect features when predicting a speaker’s geographical location.

Figure 5. Standard Finnish speaker as a point of reference. The two T points in the Gulf of Finland are islands that belonged to Finland until WWII. The one further east is the island of Seiskari (coordinates 60.02302N, 28.37754E), and the one a little further west is the island of Suursaari (coordinates 60.055833N, 26.983889E). Both of these locations are included in the LAX data.

Figure 6. Marie’s dialectal distribution.

Figure 7. Laura’s dialectal distribution.

Figure 8. Gerry’s dialectal distribution.

Figure 9. Don’s dialectal distribution.

Figure 10. Ron’s dialectal distribution.

Supplementary material: File

Ivaska et al. supplementary material

Appendix