The origins of Uralic speakers in Fennoscandia have for a long time intrigued both linguists and archaeologists (e.g. Fogelberg 1999; Lang 2018), and also geneticists in recent decades (e.g. Saag et al. 2017; Lamnidis et al. 2018; Tambets et al. 2018). To create a common basis for multidisciplinary work, the Department of Biology at the University of Turku launched the two-year archaeological and linguistic data-collection project, Kipot ja kielet (KiKi), as a part of a strategic data-collection call.
Archaeological data collection
KiKi's archaeological data collection continued the work of the Argeopop project (funded by the Finnish Academy 2009–2013), in which a Stone Age (c. 8900–1600 BC) artefact database was created at the Finnish Heritage Agency. While the Argeopop database was never made publicly accessible, the collected data were used in several papers applying spatiotemporal modelling of Stone Age settlement in Finland (e.g. Tallavaara et al. 2010; Sundell et al. 2014).
To widen the archaeological perspective and scale, KiKi not only increased the documentation of Stone Age artefacts, but also digitised the typologically discernible Bronze Age (c. 1600–500 BC) and Iron Age (c. 500 BC–AD 1200/1300) finds. The data-collection work was extended from the Finnish Heritage Agency to regional archaeological collections in museums and universities around Finland. The current archaeological database comprises over 40 000 entries and records around 70–80 per cent of Stone Age, 90 per cent of Bronze Age and 40–45 per cent of Iron Age finds discovered in Finland, the Åland Islands and the Karelian areas ceded to Russia in 1945 (metal-detector finds from the past two decades are not included) (Figures 1–2).
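The period boundaries quoted above can be expressed as a simple lookup, for instance when grouping database entries by period. The following is only a minimal sketch: the names `PERIODS` and `period_for_year` are hypothetical, the dates are the approximate ones given in the text, and the choice of AD 1300 as the end of the Iron Age (the text gives AD 1200/1300) is an assumption. It does not describe how the KiKi database itself assigns periods.

```python
from typing import Optional

# Approximate period boundaries taken from the text, as signed integers
# (negative = BC, positive = AD). The Iron Age end of 1300 is an assumption;
# the text gives AD 1200/1300.
PERIODS = [
    ("Stone Age", -8900, -1600),
    ("Bronze Age", -1600, -500),
    ("Iron Age", -500, 1300),
]


def period_for_year(year: int) -> Optional[str]:
    """Return the period whose half-open range [start, end) contains year."""
    for name, start, end in PERIODS:
        if start <= year < end:
            return name
    # Years outside all three ranges are left unclassified.
    return None
```

Because the ranges are half-open, a boundary year such as 1600 BC falls into the later period; with dates that are themselves approximate, this is a convention rather than a claim about the finds.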
The archaeological data-collection work has included a systematic examination of archaeological collections around Finland. Finds have been photographed, and their typological features and measurements recorded. Details, such as use-wear or the remains of organic materials, have also been registered. The work has supplemented the information available in finds catalogues and provided an opportunity to verify or update original records. The verification of find locations allows a more accurate examination of the spatial distribution of finds.
Linguistic data collection
The Uralic language family consists of around 40 languages, which are spoken in a wide area of Northern Eurasia (Figure 3). Before KiKi no comprehensive collection of Uralic typological (structural) data on these languages existed, even though the languages were sporadically represented in the World Atlas of Language Structures (WALS). Some Uralic languages were included in the language structure list Grambank developed by the Max Planck Institute for the Science of Human History. Although the Grambank list captures diversity between language families, it does not make fine-grained distinctions within a language family. Thus, in the course of the KiKi project, researchers from the University of Tartu compiled an additional list containing 165 features that would differentiate between Uralic languages (Figure 4). In total, the datasets include over 300 features relevant for 34 Uralic languages or language varieties, such as different dialects of Karelian. The online interface will provide geographic visualisation of these language character distributions.
The linguistic data were collected using a typological questionnaire containing 165 questions. For the Uralic typological database (UraTyp), which also includes 195 questions obtained from Grambank, the coding was based mainly on grammars and grammar sketches; additionally, coders interviewed language experts and explained the questions to them. This method helped to overcome gaps and terminological differences in grammar books. It also allowed the inclusion of results from the most recent, as yet unpublished studies, as well as from older publications that may have followed different research traditions allowing only sporadic entries.
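Questionnaire answers of this kind can be compared across languages once they are coded as feature values. The following is a minimal sketch of one such comparison, assuming binary codings (1 = present, 0 = absent) with uncoded features marked as missing; the text does not specify the coding scheme, and the feature identifiers, language labels and values below are invented for illustration.

```python
from typing import Dict, Optional

# Hypothetical coding: 1 = feature present, 0 = absent, None = not coded.
FeatureVector = Dict[str, Optional[int]]


def structural_distance(a: FeatureVector, b: FeatureVector) -> Optional[float]:
    """Share of features coded in both languages on which the two differ."""
    shared = [f for f in a if f in b and a[f] is not None and b[f] is not None]
    if not shared:
        # No feature is coded for both languages, so no distance is defined.
        return None
    differing = sum(1 for f in shared if a[f] != b[f])
    return differing / len(shared)


# Invented example data, not taken from UraTyp or Grambank.
lang_a: FeatureVector = {"F001": 1, "F002": 0, "F003": 1, "F004": None}
lang_b: FeatureVector = {"F001": 1, "F002": 1, "F003": 0, "F004": 1}
lang_c: FeatureVector = {"F001": 1, "F002": 0, "F003": None, "F004": 0}
```

Skipping features that are missing in either language keeps sporadically documented varieties comparable, although distances based on very few shared features should be read with caution.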
The information collected in the Finnish archaeological database forms a sound basis for comparative and interdisciplinary studies on the prehistory of Eastern Fennoscandia and answers the need for big data approaches. The project aims to show high usability of the data for advanced studies of spatiotemporal trends and variability in material culture and the overall changes in human activity concentrations through space and time. The database has vast potential for traditional archaeological research, as previously it was not possible to make use of archaeological big data at any comparable scale in Finland. The archaeological database not only offers tools for comprehensive typologies (cf. Whittaker et al. 1998) and accurate distribution maps, but also provides new possibilities to establish connections between ostensibly separate archaeological phenomena. It has already been possible to observe connecting local variants between ‘East-Karelian’ and ‘Bothnic’ Stone Age tool types (cf. Lehtosalo-Hilander 1988). The origin and development of Bronze Age stone axe types in Finland also now appear more complicated than traditionally thought (Figure 2). The database even facilitates museological studies by offering the possibility of analysing how the criteria for curating certain finds have changed over time, and how these selections shape our understanding of archaeological periods, sites and related finds.
The linguistic database is the first of its kind to include large-scale data on all branches of the Uralic languages. It will offer a useful tool for studying and teaching linguistic variation of Uralic languages and structural (dis)similarities between Uralic and other language families, and may help trace the lineage of the Uralic family. The acquired information on linguistic events, contacts and divergences will complement the interdisciplinary studies on the human past.
Once the two databases are published, it will still be possible to add to and develop them so that new data on, for example, Iron Age artefacts and language variants, such as dialects, can be incorporated. The open-access data will also be a useful tool for public outreach, increasing the impact and the value of the databases.
We thank Saara-Veera Härmä, Gerson Klumpp, Richard Kowalik, Enni Lappela, Sirpa Leskinen, Helle Metslang, Karl Pajusalu, Minerva Piha, Eva Saar, Jasse Tiilikkala and Elisa Väisänen for their work on the project. Niko Anttiroiko, Henrik Asplund, Reija Eeva, Juha Jämbäck, Leena Koivisto, Jutta Kuitunen, Kreetta Lesell, Veronica Lindholm, Tytti Partanen-Räikkönen, Tanja Ratilainen, Katja Vuoristo and Anna Väänänen are thanked for their help and assistance with the archaeological collections around the country.
The Kipot ja kielet project was funded by the University of Turku between 2018 and 2020.