
Mosaic Database: Consolidation, Innovation, and Challenges in the Comparative Family Demography of Historical Europe

Published online by Cambridge University Press: 11 September 2024

Mikołaj Szołtysek*
Affiliation:
The Cardinal Wyszyński University, Warsaw, Poland
Bartosz Ogórek
Affiliation:
Institute of History, Polish Academy of Sciences, Warsaw, Poland
Siegfried Gruber
Affiliation:
University of Graz, Graz, Austria
Radosław Poniat
Affiliation:
University of Białystok, Białystok, Poland
*Corresponding author: Mikołaj Szołtysek; Email: mszoltis@gmail.com

Abstract

This paper looks at the progress that the Mosaic database has enabled in the study of family structures in continental Europe in the past. Our main argument is that the combination of comprehensive archival research, digitization and computation, data mining, and open-access dissemination that is at the core of the Mosaic project is bringing about an important shift in the fundamental principles that have driven European family history research to date. These transformative features of Mosaic go beyond mere data infrastructural developments, as scaling up to much larger datasets leads to qualitative differences in measurements, methods, and questions. Integrating these perspectives can lead to an important incremental shift in both the scale and the scope of knowledge about historical European family systems.

Information

Type
Advances in Data and Methods
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2024. Published by Cambridge University Press on behalf of Social Science History Association

Figure 1. Changes in the volume of the Mosaic data over time (in population totals). Source: Gruber, Siegfried, Mikołaj Szołtysek, and Bartosz Ogórek (2023) Mosaic datafile, 2023 [machine-readable dataset]. IPUMS-International (mosaic.ipums.org). Note: The discrepancy between the two curves arises because the release of new data in the early years of Mosaic was somewhat delayed by the requirements of several ongoing research projects. Currently, all datasets ever researched are also publicly available as part of Mosaic.

Figure 2. Spatial distribution of Mosaic data by settlement points. Source: Gruber, Siegfried, Mikołaj Szołtysek, and Bartosz Ogórek (2023) Mosaic datafile, 2023 [machine-readable dataset]. IPUMS-International (mosaic.ipums.org).

Figure 3. Spatial distribution of Mosaic data by regions. Source: Gruber, Siegfried, Mikołaj Szołtysek, and Bartosz Ogórek (2023) Mosaic datafile, 2023 [machine-readable dataset]. IPUMS-International (mosaic.ipums.org). Note: Each point on the map (B) represents the centroid of one Mosaic regional population as defined in the text. “Expected new data” include: (1) census samples for Bessarabia 1850, Zagreb 1857, Serbia 1863, Montenegro 1879, Armenians in Istanbul 1907, and Sarajevo 1910 (about 200,000 persons in total) (see S. Gruber, “Demography and society in historical Southeastern Europe” (FWF No. P 34285)); (2) a selection of 12 localities from the Spanish census of 1887 (Censo de la Población de España) (ca. 60,000 persons) (University of Zaragoza); (3) a selection of 33 localities from the 1860 census in the province of Zaragoza in Spain (ca. 25,000 persons) (University of Zaragoza); (4) the Florentine Catasto of 1427 based on D. Herlihy and C. Klapisch-Zuber’s original datafile (ca. 270,000 persons).

Figure 4. Spatial distribution of the selected demographic parameters across Mosaic data. Source: Gruber, Siegfried, Mikołaj Szołtysek, and Bartosz Ogórek (2023) Mosaic datafile, 2023 [machine-readable dataset]. IPUMS-International (mosaic.ipums.org).

Figure 5. The share of nuclear and multifamily households for two sub-datasets of the Mosaic collection (by number of households per region). Source: Gruber, Siegfried, Mikołaj Szołtysek, and Bartosz Ogórek (2023) Mosaic datafile, 2023 [machine-readable dataset]. IPUMS-International (mosaic.ipums.org). Note: The left side of the scatter diagram shows the 363 localities from the German territories in the period 1690-1867; the right side shows the 234 parishes from the Polish-Lithuanian Commonwealth (late 18th and early 19th centuries).

Figure 6. An example of the multilevel embedding of a single locality from the Mosaic collection (the parish of Kazimierza Wielka in Poland, 1791). Source: Gruber, Siegfried, Mikołaj Szołtysek, and Bartosz Ogórek (2023) Mosaic datafile, 2023 [machine-readable dataset]. IPUMS-International (mosaic.ipums.org). Note: The ellipses, from the smallest to the largest, indicate respectively the province, the country, and the macro-region.

Figure 7. The share of householders among ever-married men, all Mosaic regions. Source: Gruber, Siegfried, Mikołaj Szołtysek, and Bartosz Ogórek (2023) Mosaic datafile, 2023 [machine-readable dataset]. IPUMS-International (mosaic.ipums.org). Note: The proportion of householders among ever-married men in certain age groups was used as a proxy for the relationship between entering marriage and becoming head of household. The five larger territorial groupings follow major institutional and socioeconomic distinctions across historic Europe. “Germany”: German-dominated areas other than the Habsburg territories; “West”: areas west and south-west of Germany; “Habsburg”: Austrian, Hungarian, Croatian, as well as Slovakian data; “East”: east-central and eastern Europe, including the former Polish-Lithuanian Commonwealth and Russia (including Siberian territories geographically in Asia); “Balkans”: areas south and/or east of Croatia and Hungary.

Figure 8. Sequences of main life course transitions in Poland-Lithuania based on synthetic cohorts. Source: Gruber, Siegfried, Mikołaj Szołtysek, and Bartosz Ogórek (2023) Mosaic datafile, 2023 [machine-readable dataset]. IPUMS-International (mosaic.ipums.org). Note: Male population only. Entry into marriage is measured with the singulate mean age at marriage (Hajnal 1982); leaving home with the singulate mean age at leaving home (see Schürer 2004; Szołtysek 2015: 282–83); and household formation with the singulate mean age at household headship (see Szołtysek 2015: 512–13). Home-leaving data are based on estimates of parental co-residence taken from the listings, adjusted for the availability of parents as assessed through CAMSIM microsimulation (Szołtysek 2015: 284–85). “West,” “East1,” and “East 3” stand, respectively, for western and central Kingdom of Poland (including Silesia); the central Belarussian part of the Grand Duchy of Lithuania; and the southern Belarussian part of the Grand Duchy of Lithuania.
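
The singulate mean age at marriage used in Figure 8 can be illustrated with a minimal sketch of the standard textbook form of Hajnal's calculation. The function name and the example proportions are hypothetical and are not taken from the Mosaic data; the sketch assumes proportions never married are available for the eight five-year age groups from 15-19 to 50-54.

```python
def smam(prop_single):
    """Singulate mean age at marriage from proportions never married.

    prop_single: proportions single (0 to 1) for the eight five-year age
    groups 15-19, 20-24, ..., 50-54, in that order (an assumed input layout).
    """
    if len(prop_single) != 8:
        raise ValueError("expected 8 age groups (15-19 ... 50-54)")
    # Years lived single up to age 50: 15 years before age 15, plus
    # 5 years per age group weighted by the proportion still single.
    years_single = 15 + 5 * sum(prop_single[:7])
    # Proportion still single at exact age 50: mean of the 45-49 and 50-54 groups.
    single_at_50 = (prop_single[6] + prop_single[7]) / 2
    # Subtract the years lived by those who never marry, then average
    # over those who marry before age 50.
    return (years_single - 50 * single_at_50) / (1 - single_at_50)


# Hypothetical proportions single by age group (illustration only).
example = [0.98, 0.70, 0.35, 0.20, 0.15, 0.12, 0.10, 0.10]
print(round(smam(example), 2))
```

The same arithmetic can in principle be applied to other "singulate" measures (for example, replacing the proportion single with the proportion still living with parents), although the adjustments described in the cited works are not reproduced here.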

Figure 9. Four-cluster structure of Hajnal’s household formation markers on the geographic coordinates, Mosaic, and NAPP datasets combined. Sources: Gruber, Siegfried, Mikołaj Szołtysek, and Bartosz Ogórek (2023) Mosaic datafile, 2023 [machine-readable dataset]. IPUMS-International (mosaic.ipums.org); Minnesota Population Center. Integrated Public Use Microdata Series, International: Version 7.2 [dataset]. Minneapolis, MN: IPUMS, 2019. https://doi.org/10.18128/D020.V7.2. Note: SMAM: Singulate Mean Age at Marriage; Service: the share of unmarried women in the age group that is determined by the value of the female SMAM in each of the populations under study; CMHD: Cumulative Marriage Headship Difference (for details, see Szołtysek and Ogórek 2020: 56–57). Characteristics of cluster medoids: k1: 20.8 (female SMAM), 3.9 (proportion of female servants), 7.4 (CMHD), 0.41 (share of nuclear households); k2 (respectively): 20.7, 10.6, 2.4, 0.7; k3: 26.3, 32, 1.1, 0.7; k4: 27.2, 59.6, 0.7, 0.8.

Table 1. Application of various locator variables to the encoding of the members of a domestic group in the Mosaic data

Figure 10. Distribution of Mosaic regions by the value of the Patriarchy Index, by five bigger territorial groupings. Source: Gruber, Siegfried, Mikołaj Szołtysek, and Bartosz Ogórek (2023) Mosaic datafile, 2023 [machine-readable dataset]. IPUMS-International (mosaic.ipums.org). Note: The groupings are as in Figure 7.

Figure 11. The connectivity graph showing the spatial weight matrix for Mosaic data. Source: Gruber, Siegfried, Mikołaj Szołtysek, and Bartosz Ogórek (2023) Mosaic datafile, 2023 [machine-readable dataset]. IPUMS-International (mosaic.ipums.org). Note: Neighbors are defined as the five nearest neighbors (based on great-circle distances), weighted with a row-standardized inverse-distance weight matrix.
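
The weighting scheme described in the note to Figure 11 can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the authors' code: the function names, the spherical Earth radius, and the sample coordinates are hypothetical, and the sketch assumes that no two points share identical coordinates (a zero distance would make the inverse weight undefined).

```python
import math


def great_circle_km(p, q, radius_km=6371.0):
    """Haversine distance between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    # Clamp to 1 to guard against tiny floating-point overshoot.
    return 2 * radius_km * math.asin(min(1.0, math.sqrt(a)))


def knn_inverse_distance_weights(points, k=5):
    """Row-standardized inverse-distance weights over the k nearest neighbors."""
    n = len(points)
    if n <= k:
        raise ValueError("need more points than neighbors")
    weights = [[0.0] * n for _ in range(n)]
    for i, p in enumerate(points):
        # Distances from point i to every other point, nearest first.
        dists = sorted((great_circle_km(p, q), j)
                       for j, q in enumerate(points) if j != i)
        inverse = [(j, 1.0 / d) for d, j in dists[:k]]
        total = sum(w for _, w in inverse)
        for j, w in inverse:
            weights[i][j] = w / total  # each row sums to 1
    return weights


# Hypothetical (lat, lon) centroids, for illustration only.
centroids = [(52.2, 21.0), (50.1, 19.9), (47.1, 15.4), (48.2, 16.4),
             (53.1, 23.2), (51.1, 17.0), (44.8, 20.5)]
W = knn_inverse_distance_weights(centroids, k=5)
print([round(w, 3) for w in W[0]])
```

Note that a nearest-neighbor matrix of this kind is generally not symmetric: point A may be among the five nearest neighbors of point B without the reverse being true.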

Figure 12. Moran scatter plot for the proportion of elderly living in stem family configurations. Source: Gruber, Siegfried, Mikołaj Szołtysek, and Bartosz Ogórek (2023) Mosaic datafile, 2023 [machine-readable dataset]. IPUMS-International (mosaic.ipums.org). Note: The spatially lagged variable is equal to the average value of the variable of interest among the neighbors of each datapoint.
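
The quantities behind a Moran scatter plot can be sketched in a few lines. The example below is a minimal illustration with a hypothetical four-point weight matrix and made-up values; it computes the spatially lagged variable (the weighted average among each point's neighbors) and the global Moran's I statistic. It is not the authors' implementation.

```python
def spatial_lag(values, weights):
    """Weighted average of the neighbors' values for each observation."""
    return [sum(w * v for w, v in zip(row, values)) for row in weights]


def morans_i(values, weights):
    """Global Moran's I for a given spatial weight matrix."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]                    # deviations from the mean
    s0 = sum(sum(row) for row in weights)             # sum of all weights
    cross = sum(weights[i][j] * z[i] * z[j]
                for i in range(n) for j in range(n))  # spatial cross-products
    return (n / s0) * cross / sum(d * d for d in z)


# Hypothetical example: four points on a line, each linked to its
# immediate neighbors, with row-standardized weights.
W = [[0.0, 1.0, 0.0, 0.0],
     [0.5, 0.0, 0.5, 0.0],
     [0.0, 0.5, 0.0, 0.5],
     [0.0, 0.0, 1.0, 0.0]]
x = [1.0, 2.0, 3.0, 4.0]

print(spatial_lag(x, W))   # values plotted on the vertical axis
print(morans_i(x, W))
```

With a row-standardized weight matrix, the slope of the least-squares line through the scatter of the lagged variable against the original variable equals Moran's I, which is why the plot is a convenient visual summary of spatial autocorrelation.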

Figure 13. Mosaic regions’ centroids overlaid on selected geocovariates. Sources: For Mosaic: Gruber, Siegfried, Mikołaj Szołtysek, and Bartosz Ogórek (2023) Mosaic datafile, 2023 [machine-readable dataset]. IPUMS-International (mosaic.ipums.org). For (A), gridded elevation: the GTOPO30 dataset (downloaded 30 and 31 August 2016 from http://earthexplorer.usgs.gov/; files: gt30e020n40, gt30e020n90, gt30w020n40, gt30w020n90, gt30w060n90). For (B), land suitability for agriculture: Ramankutty et al. (2002), data download: https://sage.nelson.wisc.edu/data-and-models/atlas-of-the-biosphere/mapping-the-biosphere/land-use/suitability-for-agriculture/. For (C), share croplands (1800): the History Database of the Global Environment (HYDE 3.1; https://public.yoda.uu.nl/geo/UU01/G4HO5I.html). For (D), share forests (1800): Ellis et al. (2010).
Notes: All this information, over which the Mosaic data are superimposed, has been converted into numerical data on an interval scale linked to the centroids of the regions.
To derive the information on terrain ruggedness, we used the Terrain Ruggedness Index (TRI) (Wilson et al. 2007). For this, we applied the focal function in the R library raster (the TRI formula can be found in the help for the “terrain” function in the raster library). For Mosaic sites, we generated the information for all sites by looking at the raster data within a circle with a diameter of 7.5 km around the site coordinates. Based on these data, we derived the population-weighted values for all regions.
The land suitability for agriculture (LSA) index was extracted from the corresponding raster files using the “extract” function in the R package “raster.” For each region in our database (residential points for Mosaic), a population-weighted centroid was first derived so that the value of the variable reflected the mean around that point (with the buffer size set at 50 km to better capture local variation in the environment).
The same procedures were applied to cropland. However, since the proportion of cropland is available from HYDE 3.1 for each decade after 1700, we can use the raster for the date closest to each census date for each region in our collection. The afforestation data is available in four scenarios (1700, 1800, 1900, and 2000). Again, we can use the grid data closest to the respective census date.
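
Two of the steps described in these notes, choosing the raster layer closest to a census date and aggregating site values into a population-weighted regional value, can be sketched as follows. The notes describe an R workflow based on the raster package; the Python fragment below only illustrates the arithmetic. The function names, the tie-breaking rule (prefer the earlier layer), and the example numbers are assumptions made for illustration.

```python
def closest_layer_year(census_year, available_years):
    """Pick the raster year nearest to the census year.

    Ties are broken in favor of the earlier layer (an assumption;
    the notes do not say how ties were handled).
    """
    return min(available_years, key=lambda y: (abs(y - census_year), y))


def population_weighted_mean(values, populations):
    """Aggregate site-level values to a region, weighting by site population."""
    total = sum(populations)
    if total <= 0:
        raise ValueError("total population must be positive")
    return sum(v * p for v, p in zip(values, populations)) / total


# Hypothetical decadal cropland layers and four forest-cover scenarios.
cropland_years = list(range(1700, 1901, 10))
forest_years = [1700, 1800, 1900, 2000]

print(closest_layer_year(1791, cropland_years))  # layer used for a 1791 listing
print(closest_layer_year(1791, forest_years))

# Hypothetical site-level index values and site populations within one region.
print(population_weighted_mean([10.0, 20.0], [100, 300]))
```

Weighting by population means that large settlements dominate the regional value, which matches the stated aim of describing the environment where people actually lived rather than the geometric center of a region.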

Figure 14. Within-country regional distribution of the Patriarchy Index across Eurasia. Sources: For Mosaic: Gruber, Siegfried, Mikołaj Szołtysek, and Bartosz Ogórek (2023) Mosaic datafile, 2023 [machine-readable dataset]. IPUMS-International (mosaic.ipums.org). For NAPP and IPUMS-I: Minnesota Population Center. Integrated Public Use Microdata Series, International: Version 7.2 [dataset]. Minneapolis, MN: IPUMS, 2019, https://doi.org/10.18128/D020.V7.2. Notes: The data used for the figure comprised 311 regional populations from 1700 to 1926 with 29 million individuals. The contemporary data included 546 regions from 21 countries with 65 million individuals. Each of the stacked histograms refers to the distribution of regional PI values within a country and contains the mean PI value for a particular region. The dashed vertical line shows the mean PI value for the entire data set.

Figure 15. Spatio-temporal variation in the Mosaic data. Source: Gruber, Siegfried, Mikołaj Szołtysek, and Bartosz Ogórek (2023) Mosaic datafile, 2023 [machine-readable dataset]. IPUMS-International (mosaic.ipums.org). Notes: The size of the circles indicates the number of regions in each period and region. The five bigger territorial groupings are as in Figure 7.

Figure 16. Bootstrapped sex ratios and their corresponding 95% confidence intervals by sample size, Mosaic data. Source: Gruber, Siegfried, Mikołaj Szołtysek, and Bartosz Ogórek (2023) Mosaic datafile, 2023 [machine-readable dataset]. IPUMS-International (mosaic.ipums.org). Notes: Sample size refers to the number of children aged 0–4 in a particular region. Confidence intervals are based on resampling with replacement (5,000 replications).
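
The bootstrap procedure mentioned in the notes to Figure 16 can be illustrated with a minimal sketch. It assumes the sex ratio is expressed as boys per 100 girls among children aged 0–4 and that the interval is a simple percentile interval; the function name, the counts, and the fixed random seed are hypothetical, and the authors' exact procedure may differ.

```python
import random


def bootstrap_sex_ratio(n_boys, n_girls, replications=5000, seed=0):
    """Percentile bootstrap 95% interval for the sex ratio (boys per 100 girls)."""
    if n_boys <= 0 or n_girls <= 0:
        raise ValueError("need at least one boy and one girl")
    rng = random.Random(seed)
    n = n_boys + n_girls
    p_boy = n_boys / n
    ratios = []
    for _ in range(replications):
        # Resampling n children with replacement from the observed sample:
        # each draw is a boy with probability n_boys / n.
        boys = sum(rng.random() < p_boy for _ in range(n))
        girls = n - boys
        if girls == 0:
            continue  # ratio undefined for this replicate; skip it
        ratios.append(100 * boys / girls)
    ratios.sort()
    lower = ratios[int(0.025 * len(ratios))]
    upper = ratios[int(0.975 * len(ratios)) - 1]
    return 100 * n_boys / n_girls, lower, upper


# Hypothetical counts: the same observed ratio in a small and a large sample.
for boys, girls in [(105, 100), (1050, 1000)]:
    ratio, lower, upper = bootstrap_sex_ratio(boys, girls, replications=1000)
    print(boys + girls, round(ratio, 1), round(lower, 1), round(upper, 1))
```

Running the sketch on a small and a large sample with the same observed ratio shows the pattern the figure is designed to convey: the interval narrows as the number of children in a region grows.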