
Constrained language use in Finnish: A corpus-driven approach

Published online by Cambridge University Press:  13 April 2020

Ilmari Ivaska*
Department of Finnish and Finno-Ugric Languages, FI-20014, University of Turku, Finland
Silvia Bernardini*
Department of Interpreting and Translation, University of Bologna, Corso della Repubblica 136, 47121 Forlì (FC), Italy


It has been suggested that second languages and translated languages are constrained by an interplay of several linguistic systems. This paper reports on a data-driven quantitative study on constrained Finnish. We detect linguistic phenomena that distinguish constrained from non-constrained Finnish across constrained varieties, first/source languages, and registers. Implementing a two-phase method, we first detect key quantitative differences of syntactically defined POS bigrams between each variety-, language-pair- and register-specific constrained dataset and its non-constrained counterpart, using Boruta feature selection. We then use the results as variables in a Multi-dimensional Analysis. The results show that both nominal complexity and verbal/clausal complexity distinguish constrained from non-constrained Finnish. These differences interact with both type of constraint and register: the constrained varieties are less sensitive to register differences, and this tendency is more pronounced in learner Finnish than in translated Finnish. Leaving out any of these variables from the analysis would blur our view of this multi-faceted phenomenon.
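The first phase described above relies on frequencies of syntactically defined POS bigrams. The abstract does not spell out how these bigrams are built, so the following is only a minimal sketch under one assumption: that a bigram pairs the POS tag of a dependent with the POS tag of its syntactic head in a dependency parse. The toy sentence, the function names, and the `ROOT` label are all hypothetical and serve illustration only.

```python
from collections import Counter

# Hypothetical toy parse: (token id, universal POS tag, head id); head 0 is the root.
# These rows are invented for illustration and are not data from the study.
SENTENCE = [
    (1, "PRON", 2),
    (2, "VERB", 0),
    (3, "ADJ", 4),
    (4, "NOUN", 2),
]


def syntactic_pos_bigrams(sentence):
    """Return (dependent POS, head POS) pairs, one per dependency arc.

    Assumption: "syntactically defined" means the two tags are linked by a
    dependency relation rather than by linear adjacency.
    """
    pos_by_id = {tok_id: pos for tok_id, pos, _ in sentence}
    pos_by_id[0] = "ROOT"  # placeholder tag for the artificial root node
    return [(pos, pos_by_id[head]) for _, pos, head in sentence]


def relative_frequencies(sentences):
    """Normalise bigram counts by the total number of bigrams in a text."""
    counts = Counter(bg for s in sentences for bg in syntactic_pos_bigrams(s))
    total = sum(counts.values())
    return {bg: n / total for bg, n in counts.items()}


freqs = relative_frequencies([SENTENCE])
```

Per-text relative frequencies of this kind could then serve as candidate variables for a feature-selection step such as Boruta, with the retained variables passed on to a factor-analytic Multi-dimensional Analysis. Both of those later steps are outside the scope of this sketch.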

Research Article
© Cambridge University Press 2020


