Lects in Helsinki Finnish - a probabilistic component modeling approach

Published online by Cambridge University Press: 17 May 2021

Olli Kuparinen*
Tampere University
Jaakko Peltonen
Tampere University
Liisa Mustanoja
Tampere University
Unni Leino
Tampere University
Jenni Santaharju
University of Helsinki


This article examines Finnish lects spoken in Helsinki from the 1970s to the 2010s with a probabilistic model called Latent Dirichlet Allocation. The model searches for underlying components based on the linguistic features used in the interviews. Several coherent lects were discovered as components in the data, which counters the results of previous studies that report only weak covariation between features that are assumed to be present in the same lect. The speakers, however, are not categorical in their linguistic behavior and tend to use more than one lect in their speech. This implies that the lects should not be considered in parallel with seemingly uniform linguistic systems such as languages, but as partial systems that constitute a network.
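As an illustration only, the generative assumption behind Latent Dirichlet Allocation can be sketched as follows: each speaker gets a mixture over lects drawn from a Dirichlet distribution, and each observed feature is produced by first picking a lect from that mixture and then picking a feature from that lect. This is a minimal sketch, not the authors' implementation. The lect names, feature labels, probabilities, and the `alpha` value are all hypothetical.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw one probability vector from a Dirichlet via normalized gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

# Hypothetical lects: each is a probability distribution over made-up feature labels.
LECTS = {
    "lect_A": {"feature_1": 0.6, "feature_2": 0.3, "feature_3": 0.1},
    "lect_B": {"feature_1": 0.1, "feature_2": 0.2, "feature_3": 0.7},
}

def generate_interview(n_tokens, alpha, rng):
    """Generate feature tokens for one speaker under the LDA generative story."""
    names = list(LECTS)
    # Per-speaker mixture over lects (symmetric Dirichlet prior, assumed here).
    mixture = sample_dirichlet([alpha] * len(names), rng)
    tokens = []
    for _ in range(n_tokens):
        # Pick a lect for this token, then a feature from that lect.
        lect = rng.choices(names, weights=mixture)[0]
        feats = LECTS[lect]
        feature = rng.choices(list(feats), weights=list(feats.values()))[0]
        tokens.append((lect, feature))
    return dict(zip(names, mixture)), tokens

rng = random.Random(0)
mixture, tokens = generate_interview(200, alpha=1.0, rng=rng)
```

Fitting the model runs this story in reverse: given only the observed features per interview, it estimates the lect distributions and each speaker's mixture. A speaker whose mixture puts weight on more than one lect corresponds to the non-categorical behavior described above.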

Research Article
Copyright © The Author(s), 2021. Published by Cambridge University Press


Supplementary material: File

Kuparinen et al. supplementary material

Figure S1
