Prussia disaggregated: the demography of its universe of localities in 1871

Abstract We provide, for the first time, a detailed and comprehensive overview of the demography of more than 50,000 towns, villages, and manors in 1871 Prussia. We study religion, literacy, fertility, and group segregation by location type (town, village, and manor). We find that Jews live predominantly in towns. Villages and manors are substantially segregated by denomination, whereas towns are less segregated. Yet, we find relatively lower levels of segregation by literacy. Regression analyses with county-fixed effects show that a larger share of Protestants is associated with higher literacy rates across all location types. A larger share of Jews relative to Catholics is not significantly associated with higher literacy in towns, but it is in villages and manors. Finally, a larger share of Jews is associated with lower fertility in towns, which is not explained by differences in literacy.


Introduction
Nineteenth-century Prussia, the dominant state in the German Empire, provides a fascinating setting to study many of the most fundamental questions in 19th-century economic history. Its relatively uniform institutional setting and extremely rich sources of surviving data provide researchers with unique opportunities to study the links between religion, literacy, fertility, economic development, and many other topics.
Prussian census data have long been recognized as a source of high-quality disaggregated data. Data for more than 400 Prussian counties (Kreise) have been used for demographic research for at least 25 years [see, e.g., Galloway et al. (1994) and Lee et al. (1994) for research into the fertility decline in Prussia]. Compared to earlier research which either looked at national aggregates or used district-level data for the roughly three dozen districts (Regierungsbezirke), such as the work of the
Finally, we make publicly available digitized locality-level data which, to our knowledge, have never been available to researchers before. Many smaller locations have since been incorporated into larger villages and towns and their distinctive features have thus "disappeared" in those larger administrative units. Our new data will give economic and social historians the opportunity to reconstruct, at the time of German unification in 1871, the demographic, religious, and educational structure of each location, giving renewed impetus to research on rural economic history.
We report several main results. We find that as much as 79% of Jews in Prussia lived in towns. For comparison, only 35% of Protestants and 26% of Catholics lived in towns. For the first time, we can shed light on the level of literacy in rural administrative units. We find that about 73% of people living in manors were able to read and write. This is an astonishingly high level, considering the rural nature of these administrative units with an average size of 126 people. Such a literacy level compares favorably to the whole of France (62%) and it was much higher than in Italy or Spain in the same period. The literacy rate in Italy at the time of unification in 1861 was only 27% and around 35% in 1871; the literacy rate in Spain was around 30% by 1870 [Cipolla (1969)]. In fact, the average literacy rate in Prussian manors in 1871 was much higher than in large Spanish cities such as Barcelona (0.50), Madrid (0.66), or Valencia (0.40) three decades later, in 1900 [Cinnirella et al. (2020)]. Compared to Italy, the literacy level in the Prussian manors was considerably higher than the literacy rate of Turin (0.58), the city with the highest level of literacy in 1871 Italy [Federico et al. (2019)].
We construct a dissimilarity index to document group segregation along religious lines and for literate and illiterate people. Our results unveil a significant level of denominational segregation within counties, suggesting that Protestants and Catholics tended to cluster in localities. In particular, we find that denominational segregation was more pronounced in villages and manors than in towns. This is an important result for those working with county-level data as county averages seem to hide a significant amount of variation across localities. On the contrary, we find relatively low levels of segregation between literate and illiterate people.
Our new data set allows us to perform regression analyses accounting for county-fixed effects. We find that the relationship between literacy and population size is negative for manors, whereas it is positive for towns and villages.
The regression analysis shows that in towns, villages, and manors, a larger share of Protestants is associated with higher literacy. Although a larger share of Jews is associated with higher literacy in villages and manors, this is not the case in towns.
Finally, we find a large and significant negative association between the share of Jews in towns and fertility, measured as the number of children below age 10 over the total number of women (child-woman ratio). Importantly, this negative association is not explained by differences in literacy, consistent with a possible cultural explanation for the lower fertility.
Our regression results should not be interpreted causally. In fact, it is not the objective of this paper to identify any causal relationship between the socio-economic variables at our disposal. The aim is to provide a comprehensive cross-sectional description of some religious and socio-economic patterns of Prussian demography in 1871. Our hope is that the analysis provided in this paper will spark new research ideas.
The rest of the paper is structured as follows: we discuss the related literature in section 2 and introduce the data in section 3. We perform a descriptive analysis in section 4 and document the extent of group segregation in section 5. In section 6, we present the results from the regression analysis, and section 7 concludes with some suggestions for future research.

Literature
The previous literature on demographic and religious aspects of 19th-century Germany had a strong focus on the determinants of the fertility transition. In the 1960s and 1970s, the so-called European Fertility Project was carried out and Knodel analyzed the demographic patterns for Germany [Knodel (1974); Coale and Watkins (1986)]. Yet, the studies on Germany mainly relied either on individual-level data for a very small sample of villages, or on region-level aggregates. Knodel and Maynes (1976) review evidence on urban-rural and regional differences in marriage patterns in Germany around 1880. They use data from the 1880 census of the Imperial Statistical Bureau (Statistisches Reichsamt) at the district level (Regierungsbezirk, n = 74). They find that the proportion of people never marrying and the age at marriage were higher in urban areas and lower in the countryside. Knodel (1977) reviews results on urban-rural differences in nuptiality, fertility, illegitimacy, and infant mortality across German states. He finds that urban-rural residence patterns account for relatively small differences in the proportion of single women, overall fertility, and infant mortality. On the contrary, there is no clear urban-rural differential in illegitimacy, male nuptiality, and marital fertility. As in the previous case, these results are based on census data aggregated at the district level. Knodel (1979) uses a sample of 12 German village genealogies to document changes in reproductive patterns in Germany during the 19th century. He finds large differences in the timing of the emergence of fertility limitation. Religious aspects of fertility limitation behavior are not addressed. Brown and Guinnane (2002) use district-level data (138 rural districts and 38 urban districts) from Bavaria to study the determinants of the fertility transition in the 19th century. They find that Bavaria's fertility transition occurred late with respect to neighboring Prussia. Catholicism was strongly positively related to fertility, underlining the potential importance of cultural norms. Yet, the authors also find an important role for economic motives: areas with higher economic opportunities for women experienced the most rapid decline in fertility.
Their results differ from those put forward by Galloway et al. (1994). Using data for 407 counties in Prussia in the period 1875-1910, they find that religion is by far the most important determinant of fertility levels, followed by ethnicity. Galloway et al. (1998) analyze the fertility decline in cities and rural counties in Prussia for the same period. They find that urban fertility was much lower than rural fertility because of changes in female labor force participation, communications, improvements in education, and a reduction in infant mortality. 1 Galloway has also made a substantial contribution to the study of Prussian economic history with the creation of the Galloway Prussia Database, which provides, at the county level, many socioeconomic variables from different Prussian censuses for the period 1861-1914. 2 Becker et al. (2014) created a complementary database which goes further back in time and provides a rich collection of variables covering the period 1816-1901.
Recently, there has been an increased interest in the relationship between Protestantism and the accumulation of human capital. 3 Becker and Woessmann (2009) use county-level data for 1871 Prussia to take a fresh look at Weber's hypothesis of a Protestant work ethic. Their main finding is that, although Protestantism affects literacy, after controlling for literacy, Protestantism has no residual impact on economic outcomes. Cantoni (2015) addresses a similar question using city-level data for German territories for the period 1300-1900. He finds no evidence of a Protestant advantage in terms of urbanization. The difference in the results is likely explained by the different unit of observation, namely the county and the city. With our newly collected data, we can shed light on the relationship between literacy and religious denominations by locality.
Goldstein and Klüsener (2014) use county-level census data to document how different modes of production in agriculture between west and east Elbian territories might explain differences in non-marital fertility toward the end of the 19th century. Related to our story, they argue that the rural population of the west lived more concentrated in rural villages (with comparatively more social control) whereas the rural population in East Elbia was more dispersed on rural estates. Different degrees of social control and more seasonal work migration might have generated persistent differences in non-marital fertility patterns. With our data at the highly disaggregated level of more than 50,000 localities, one can shed more light on different demographic patterns in rural areas in eastern and western Prussian territories. Relatedly, Cinnirella and Hornung (2017) document the relationship between the power of landed elites and marriage patterns. They find no systematic evidence in favor of the hypothesis that noble landowners directly interfered with marriage decisions. Yet, they find a robust negative relationship between education and the share of married women. Ogilvie and Küpker (2015) study human capital levels in Wuerttemberg between 1610 and 1899. The analysis, based on individual-level data from two selected locations, finds that literacy (measured by the ability to sign their marriage inventory) declined significantly with age, controlling for family wealth at marriage. The authors interpret this finding as evidence of a progressive decay of writing skills after leaving school, consistent with the idea that literacy was not relevant for economic life.
1 More recently, Klüsener and Goldstein (2016) reintroduced geography in the debate on the determinants of the fertility decline. By applying spatial econometrics to county-level data in Prussia, they find a strong degree of spatial clustering in the fertility decline.

Data
Starting with the first full-scale population census in 1816, the Royal Prussian Statistical Office collected a large amount of economic and demographic data. Although individual-level census data have not been preserved to this day, tens of thousands of pages of county-level data have survived in archives. Thanks to proverbial Prussian orderliness and thoroughness, high-quality data for the Prussian counties (Kreise) covering nearly the whole of the 19th century are at researchers' disposal. These data provide a unique source for empirical research in economic history, with the potential to study historical micro-regional data with modern microeconometric methods.
For this paper, we collected data for the universe of localities in 1871 Prussia provided by the official population census (Die Gemeinden und Gutsbezirke des Preussischen Staates und ihre Bevölkerung nach den Urmaterialien der allgemeinen Volkszählung vom 1. December 1871). We collected information on the population (by gender), religious denomination (Protestant, Catholic, Jewish, other Christian religion, and other religion), number of people below age 10, and number of literate and illiterate people aged 10 or older. All these variables are provided for every single locality, i.e., towns (Stadt), villages (Landgemeinde), and manors (Gutsbezirk). 4 In total, we have data for 54,270 localities. The results presented below are based on 1,285 towns, 37,783 villages, and 15,202 manors. Descriptive statistics are reported in Table 1.
The average size of towns was 6,204 inhabitants, 386 inhabitants for villages, and only 126 inhabitants for manors (see Table B1 in the Appendix). 5 The urbanization rate in 1871 Prussia, i.e., the share of people living in towns was around 30%. Therefore, although the number of towns is only ca. 2% of the total number of localities (Table 1), they represent close to one-third of the whole population.
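The arithmetic behind the last two sentences can be checked roughly from the figures quoted in the text. The sketch below multiplies the number of localities of each type by its reported average size; because those averages are rounded, the implied totals are approximations and not census totals. The variable and function names are illustrative only.

```python
# Counts and average sizes as quoted in the text. The averages are rounded,
# so the implied population totals are approximations, not census figures.
localities = {
    "town":    {"count": 1_285,  "mean_size": 6_204},
    "village": {"count": 37_783, "mean_size": 386},
    "manor":   {"count": 15_202, "mean_size": 126},
}


def implied_population(kind):
    """Number of localities of this type times their average size."""
    entry = localities[kind]
    return entry["count"] * entry["mean_size"]


total_localities = sum(v["count"] for v in localities.values())
total_population = sum(implied_population(k) for k in localities)

town_share_of_localities = localities["town"]["count"] / total_localities
town_share_of_population = implied_population("town") / total_population

print(f"towns as a share of localities: {town_share_of_localities:.1%}")
print(f"towns as a share of implied population: {town_share_of_population:.1%}")
```

Under these rounded inputs, towns make up a little over 2% of localities and just under one-third of the implied population, in line with the statement above.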

Descriptive analyses

Religion
In Table 2, we show the share of religious groups by type of locality. In towns, 67% are Protestants, 29% are Catholics, and 3% Jews. In villages, we find about 71% of Protestants and 29% of Catholics and a very small share of Jews. We find a similar distribution across manors. The most interesting result regards the Jewish population, 79% of which lived in towns (see Table 3).
For each type of locality (town, village, and manor), we can compute the Herfindahl index (also known as the index of diversity), which indicates the probability of sampling two individuals of different denominations. 6 In Figure 1, we display the distribution of the Herfindahl index for the whole of Prussia and by locality type. As expected, we find that there is much more religious diversity in towns (0.21) than in villages (0.08) or in manors (0.12) (see Table B1 in the Appendix).
4 There are a few cases (n = 213) of localities "not incorporated" (nicht incommunalisierte Wohnplätze) which are not analyzed separately but included in the category of manors. For the county of Namslau (in the province of Silesia), data on religion and literacy are missing for manors. We therefore drop this county from the dataset (n = 132). We drop 191 observations where either demographic, religious, or literacy status is completely missing. The printed originals contain a very small number of typos for which there was no obvious correction. As described in the Appendix, the Prussian Statistical Office provided corrections in the appendix of each volume, which we took on board, but in a very small number of cases, no correction was provided. As a result, we drop a few observations in which the share of males, the share of females, or the literacy rate is larger than 1.01 (n = 12). We do the same if the sum of the religious shares in a locality is above 1.01 (n = 18). We chose the value of 1.01 to allow for rounding. Regarding the data on literacy, it is important to note that, when constructing the literacy rate, we count individuals whose literacy status is missing as illiterate. For more details about the data sources, see the Data Appendix A.
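As an illustration of the index of diversity described above, the following sketch computes it for a single locality. The exact formula used in the paper is not reproduced in this text, so the code assumes the common definition, one minus the sum of squared denominational shares, which equals the probability that two residents drawn at random (with replacement) belong to different denominations. The two example localities are hypothetical.

```python
def diversity_index(counts):
    """Assumed definition: 1 - sum of squared group shares.

    `counts` maps a denomination name to the number of residents.
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("locality has no recorded population")
    return 1.0 - sum((n / total) ** 2 for n in counts.values())


# Hypothetical localities, for illustration only (not taken from the census).
mixed_town = {"protestant": 670, "catholic": 290, "jewish": 30, "other": 10}
uniform_village = {"protestant": 386, "catholic": 0, "jewish": 0}

print(round(diversity_index(mixed_town), 3))       # religiously mixed locality
print(round(diversity_index(uniform_village), 3))  # single-denomination locality
```

A locality with a single denomination has an index of zero, and the index rises as the population is spread more evenly across denominations.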

Literacy
The literature has used various ways to measure historical literacy rates, e.g., signatures in marriage registers or age heaping, which exploits the fact that self-reported age is often rounded [see A'Hearn et al. (2009)]. The census reports the number of people aged 10 or older who are able to read and write and the number who are illiterate. Using this information, we construct the literacy rate for each locality. 7 The distributions of the literacy rate for the whole of Prussia and by type of locality are shown in Figure 2. It is well known that the average level of literacy in 19th-century Prussia, around 80%, was high in both absolute and relative terms. What, so far, was less known is the distribution of literacy in villages and manors. Although the distribution of literacy is left skewed in the case of villages, it appears to be more uniform in the case of manors. In particular, we find that even in small manors, on average, about 73% of the population was able to read and write (compared to 89% in towns and 82% in villages). Therefore, the literacy rate in manors in Prussia was higher than the country-wide literacy rate in many European countries. For example, it compares with the overall literacy rate of 69% in France in 1870 and it is much higher than the literacy rate in Italy in 1870 (32%) or in Spain (26%) in 1860 [Cipolla (1969)].
Note: The literacy rate is computed as the ratio of the number of people above age 10 able to read and write over the total number of people above age 10. The vertical dotted lines indicate the sample mean. Source: Die Gemeinden und Gutsbezirke des Preussischen Staates und ihre Bevölkerung: nach den Urmaterialien der allgemeinen Volkszählung vom 1. December 1871. Berlin: Verl. d. Königl. Statist. Bureaus, 1873-1874.
7 We compute the literacy rate as the ratio of the number of people aged 10 or above who are able to read and write over the total number of people aged 10 or above. As mentioned above, we do not consider the individuals whose literacy status is missing as literate.
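The construction of the literacy rate can be sketched as follows. The function and argument names are hypothetical, and the treatment of missing values follows the text: residents aged 10 or older with missing literacy status enter the denominator but not the numerator. The example counts are invented for illustration.

```python
def literacy_rate(literate, illiterate, status_missing=0):
    """Share of residents aged 10+ who can read and write.

    Residents with missing literacy status are counted in the denominator
    only, i.e., they are treated as not literate.
    """
    aged_10_plus = literate + illiterate + status_missing
    if aged_10_plus == 0:
        return None  # no residents aged 10+ recorded for this locality
    return literate / aged_10_plus


# Hypothetical manor: 73 literate, 25 illiterate, 2 with missing status.
print(literacy_rate(73, 25, 2))
```

Treating missing status as not literate makes the resulting rate a conservative measure whenever some of those residents could in fact read and write.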

Fertility
To study differences in fertility across localities, we compute the ratio of children below age 10 over the total number of women in the population (child-woman ratio) as our measure of fertility. 8 It is important to stress that we standardize by the number of women because including men could introduce a significant bias due to male deployment and casualties in the Franco-Prussian war of 1870/71, and because of gender-biased internal migration to larger settlements such as towns. 9 This index for fertility is 0.5 for the whole of Prussia, i.e., there are 0.5 children under the age of 10 for each woman (of any age) in Prussia in 1871. The average by locality provides interesting insights: whereas fertility was comparatively lower in towns (0.469), we do not find a significant difference in fertility levels between villages (0.497) and manors (0.495) (Figure 3). 10
Figure 3. Distribution of fertility by locality. Note: The child-woman ratio is computed as the number of children under age ten over the number of women. Only for display purposes, we omit from the graph 104 localities in which the child-woman ratio is larger than one. Source: Die Gemeinden und Gutsbezirke des Preussischen Staates und ihre Bevölkerung: nach den Urmaterialien der allgemeinen Volkszählung vom 1. December 1871. Berlin: Verl. d. Königl. Statist. Bureaus, 1873-1874.
8 Note that the data do not allow us to measure the number of children ever born, i.e., we cannot explicitly account for child mortality.
9 It should be noted that the child-woman ratio can take values over one when the number of children below age 10 is larger than the number of women. Indeed, we have 104 such cases. For display purposes we omit these observations in Figure 3 but we use them in the regressions.
10 It should be noted that if we standardized the number of children below age 10 per capita, thus including also men in the denominator, we would find that fertility levels in manors are slightly but significantly higher than in villages.
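The choice of denominator can be illustrated with a minimal sketch. The two functions below compute the child-woman ratio used in the paper and the alternative per-capita standardization mentioned in the footnotes; the function names and the two example localities are hypothetical.

```python
def child_woman_ratio(children_under_10, women):
    """Children below age 10 per woman of any age; None if no women recorded."""
    if women == 0:
        return None
    return children_under_10 / women


def children_per_capita(children_under_10, men, women):
    """Alternative standardization by total population (men and women)."""
    population = men + women
    if population == 0:
        return None
    return children_under_10 / population


# Two hypothetical localities with identical numbers of children and women;
# the second has fewer men (e.g., through deployment or out-migration).
balanced = {"children_under_10": 50, "men": 100, "women": 100}
few_men = {"children_under_10": 50, "men": 60, "women": 100}

for name, loc in (("balanced", balanced), ("few_men", few_men)):
    cwr = child_woman_ratio(loc["children_under_10"], loc["women"])
    pc = children_per_capita(loc["children_under_10"], loc["men"], loc["women"])
    print(name, cwr, pc)
```

In this toy case the child-woman ratio is the same in both localities, while the per-capita measure is higher where men are missing, which is the kind of bias the standardization by women is meant to avoid.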

Religious segregation
One of the advantages of our data is the possibility to study how two or more groups (e.g., religious denominations) are distributed across localities (towns, villages, and manors). In fact, obtaining a clearer understanding of how religious groups clustered locally is important as it sheds light on potential (i) interactions between the groups and (ii) spillover effects. The segregation analysis can also inform us about the potential ecological fallacy of analyzing data at a more aggregate level such as the county.
To study group segregation, we use the dissimilarity index D, which is the most common measure of group segregation. For the case of two groups (e.g., Protestants and Catholics), the dissimilarity index is defined as follows:
$$D = \frac{1}{2} \sum_{i} \left| \frac{p_i}{P} - \frac{c_i}{C} \right|$$
where p_i is the number of Protestants in the i-th locality; P is the total number of Protestants in the large geographic entity for which the index of dissimilarity is calculated (in our case the county); c_i and C refer, respectively, to the Catholic population in the locality and in the county. The index can be interpreted as the share of a group that would have to move to a different locality (town, village, or manor) to produce a distribution that matches that of the larger area, e.g., the county. 11 The index can take on values between zero and one. A value of the index close to one indicates that the groups are highly segregated, for example nearly all Protestants live in towns whereas nearly all Catholics live in villages (or manors); a value of the index close to zero indicates that the groups are not segregated and that the distribution in each locality closely matches the distribution at the county level. It is important to note that the index D is "scale free," i.e., it does not depend on the population size of the county or on the relative shares of the groups.
We provide a simple example to explain how the index works. Let us assume that there are two counties, A and B. Within each county there are two localities, 1 and 2. In county A, there are 100 people in total, 90% Catholics and 10% Protestants. All Catholics (n = 90) live in locality 1 and all Protestants (n = 10) in locality 2. In this county, the groups are perfectly segregated, i.e., the dissimilarity index D for county A is 1. In county B there are also 100 people, 80% Catholics and 20% Protestants. In locality 1, there are 20 Catholics and 5 Protestants; in locality 2, there are 60 Catholics and 15 Protestants.
The dissimilarity index D for county B is zero as the ratio of Catholics to Protestants is always 4:1, for both localities and for the county as a whole.
Figure 4 shows the distribution of the dissimilarity index D for Protestants and Catholics across counties. The average is about 0.55, i.e., we would need to move 55% of a denomination between localities to match the distribution at the county level. This value for the dissimilarity index suggests that there is a substantial level of religious segregation in 1871 Prussia. 12 This important result is a warning for those, including the authors, who have been working with county-level data. In fact, in many cases the distribution of denominations at county level deviates significantly from the distribution at lower levels of administrative units. We show in Figure 5 how the segregation index is distributed geographically. The darkest regions, that is, those with the highest levels of religious segregation, are in the Rhineland and Hessen-Nassau provinces (in the west) and in East Prussia (in the east).
11 The index is "symmetric" in the sense that it does not matter which group is moved to match the distribution at the larger level.
12 To put this value in context, gender occupational segregation in the USA is equal to 0.25 whereas multi-group occupational segregation by ethnicity is equal to 0.12 [Alonso-Villar et al. (2012)].
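The two-county example used to introduce the index can be reproduced with a few lines of code. The sketch assumes the standard two-group form of the dissimilarity index, half the sum over localities of the absolute difference between each locality's share of the county's Protestants and its share of the county's Catholics; the function name and data layout are illustrative.

```python
def dissimilarity_index(localities):
    """Two-group dissimilarity index for one county.

    `localities` is a list of (protestants, catholics) counts, one pair
    per locality in the county.
    """
    total_p = sum(p for p, _ in localities)
    total_c = sum(c for _, c in localities)
    if total_p == 0 or total_c == 0:
        raise ValueError("both groups must be present in the county")
    return 0.5 * sum(abs(p / total_p - c / total_c) for p, c in localities)


# County A: all 90 Catholics in locality 1, all 10 Protestants in locality 2.
county_a = [(0, 90), (10, 0)]
# County B: 5 Protestants and 20 Catholics in locality 1, 15 and 60 in locality 2.
county_b = [(5, 20), (15, 60)]

print(dissimilarity_index(county_a))  # perfectly segregated county
print(dissimilarity_index(county_b))  # same 4:1 ratio in every locality

# Scaling every count by the same factor leaves the index unchanged.
county_b_scaled = [(10 * p, 10 * c) for p, c in county_b]
print(dissimilarity_index(county_b_scaled))
```

County A yields an index of one and county B an index of zero, and the scaled version of county B shows that the index does not depend on the size of the county.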
In Figure 6, we report the dissimilarity index computed over all five denominations (Protestants, Catholics, Jews, other Christians, and other religion). As one can see the distribution of D does not change by much, and the mean is 0.54. For this reason, and because Protestants and Catholics are by far the largest denominations in Prussia, the following segregation analysis will be based only on these two main groups.

Religious segregation in urban and rural localities
To assess the contribution of urban and rural localities to religious segregation, we perform the segregation analysis separately for towns and rural localities, i.e., villages and manors. 13 In Figure 7, we report the kernel density estimates of the distributions of D separately for towns and rural localities. As one might expect, most of the religious segregation takes place across rural localities, i.e., villages and manors tend to be either predominantly Protestant or Catholic. On the contrary, there is much less segregation across towns, although the right tail of the distribution indicates the presence of counties with highly segregated towns. For example, the county of Zell, near Koblenz, in the Rhineland province, has a dissimilarity index equal to 0.87. In this county there are two towns, Zell and Trarbach, whose populations are denominationally highly clustered. The town of Zell is predominantly Catholic with 2,223 Catholics, 60 Protestants, and 40 Jews; Trarbach has 1,475 Protestants, 227 Catholics, 1 Jew, and 1 from another Christian denomination. The two towns are only about 10 km distant from each other.
Note: The mean of the dissimilarity index across counties is 0.55. Only for display purposes, we remove the bar at zero that indicates the city-counties (Stadtkreis) for which, by definition, locality and county coincide. Source: Die Gemeinden und Gutsbezirke des Preussischen Staates und ihre Bevölkerung: nach den Urmaterialien der allgemeinen Volkszählung vom 1. December 1871. Berlin: Verl. d. Königl. Statist. Bureaus, 1873-1874.
13 We consider only counties in which there are at least two towns for the urban sample and two villages or manors for the rural sample.
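The value reported for the county of Zell can be checked by hand from the town populations given above. The worked calculation below assumes the standard two-group form of the dissimilarity index and uses only the Protestant and Catholic counts of the two towns, with P and C denoting the Protestant and Catholic totals across both towns.

```latex
\begin{aligned}
P &= 60 + 1475 = 1535, \qquad C = 2223 + 227 = 2450, \\
D &= \frac{1}{2}\left( \left| \frac{60}{1535} - \frac{2223}{2450} \right|
      + \left| \frac{1475}{1535} - \frac{227}{2450} \right| \right) \\
  &\approx \frac{1}{2}\,(0.868 + 0.868) \approx 0.87 .
\end{aligned}
```

Under these assumptions the calculation reproduces the reported index of 0.87.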

Human capital segregation
Since we have data on the number of literates and illiterates for each locality, we can compute the dissimilarity index for these two groups to study if people in Prussia clustered in high vs. low human capital groups. 14 The histogram of the human capital dissimilarity index is shown in Figure 8. The first thing to note is the relatively low level of human capital segregation with an average of 0.23. This means that the distribution of literates and illiterates across localities does not deviate much from the distribution of the two groups at the county level. We notice also the relatively low level of variance compared to religious segregation.
We show in Figure 9 the geographic distribution of human capital segregation in 1871 Prussia. We find relatively high levels of human capital segregation in Hessen-Nassau, Hannover, and in the western part of the province of Prussia (regions of Danzig and Marienwerder).
Also in this case, we can try to disentangle the urban and rural contributions to the human capital segregation of the county. Thus, we compute the dissimilarity index for the group of literates and illiterates in, respectively, urban and rural localities. The kernel density estimates of the respective distributions (with the relative mean) are shown in Figure 10. Consistent with the findings on religious segregation, we find a relatively higher level of human capital segregation in rural localities than in urban centers. However, in both cases the level of human capital segregation is lower than the religious segregation.
14 Note that we do not consider the individuals whose information on their literacy status is missing.
At this point one could ask: How much of the human capital segregation is "explained" by religious segregation? A higher level of religious segregation could mechanically result in a high level of human capital segregation. 15 To address this question, we first compute the dissimilarity index for literacy for localities with above-average and below-average Protestant shares. The distributions of the respective indexes are displayed in Figure 11. We find that, in a given county, localities with above-average and below-average Protestant shares have similarly low levels of human capital segregation. This result suggests that, whereas Protestants and Catholics tended to be segregated across (mainly rural) localities (see Figure 7), we do not observe differential human capital segregation by religious majorities.
Second, we can plot the human capital dissimilarity index against the religious dissimilarity index (Figure 12). As one can see, almost all of the religious segregation indexes lie below the 45-degree line, suggesting that increasing levels of religious segregation are less than proportionally related to increments in human capital segregation. Referring to the previous example of the county of Zell, which has a level of religious segregation equal to 0.87 (!), the dissimilarity index for literacy is only 0.19. This is only possible if there is not a perfect correspondence between Protestantism (Catholicism) and literacy (illiteracy).
Note: The mean of the dissimilarity index for literacy is 0.23. Only for display purposes, we remove the bar at zero that indicates the city-counties (Stadtkreis) for which, by definition, locality and county coincide. Source: Die Gemeinden und Gutsbezirke des Preussischen Staates und ihre Bevölkerung: nach den Urmaterialien der allgemeinen Volkszählung vom 1. December 1871. Berlin: Verl. d. Königl. Statist. Bureaus, 1873-1874.

Regression analyses
So far, we have documented how religion, literacy, and fertility were distributed across localities. In this section, we study the relationship between these variables in a regression framework. In particular, we analyze, for the whole of Prussia and by type of locality (i) the relationship between literacy and population size; (ii) the relationship between literacy and religious denomination; and (iii) the relationship between fertility and religious denomination.
Some of these relationships have been studied before in the literature and with more sophisticated econometric analyses. For example, the relationship between religious denominations and literacy has been explored in Becker and Woessmann (2008). The relationship between fertility, human capital, and religion has been investigated in Becker et al. (2012). However, the advantage of our setting is that we can exploit fine-grained local variation, accounting for time-invariant county-specific characteristics that might have affected previous results based mainly on cross-county or cross-city estimates. Yet, it is important to bear in mind that the relationships reported and discussed below do not have a causal interpretation. In fact, earlier work by Becker and Woessmann (2009) used distance to Wittenberg, the center of Martin Luther's Reformation movement, as an instrumental variable for the spread of the Reformation throughout the regions of Prussia. Obviously, this instrument does not lend itself to exploiting within-county variation in Protestantism. Although distance to Wittenberg is expected to identify variation in Protestantism at a more aggregated level such as the county, this instrument has no power to explain variation within very short distances such as across localities within counties.

Literacy and population
In this section, we explore the possibility of a heterogeneous relationship between population size and literacy across locality type. This could be due to agglomeration economies in the provision of education.
We start by exploring graphically the relationship between literacy and log population size. 16 In Figure 13, we present scatter plots of literacy rates against log population size by locality type. The relationship between literacy and log population size is increasing for towns and villages. By contrast, for manors the relationship between literacy and population size is negative: the larger the manor, the lower the literacy rate. This finding is substantiated by the regression analysis below.
In Table 4, we report the ordinary least squares (OLS) estimates. Results in column 1 indicate the existence of a positive relationship between literacy and log population size which holds even when including dummies for locality type (column 2). The coefficients for the locality dummies indicate that literacy levels in villages and manors were significantly lower than in towns. The estimates by locality in columns 4-6 (with county-fixed effects) confirm the results shown in Figure 13, namely a positive relationship between literacy and log population size for towns and villages and a negative relationship for manors. 17 The result on the negative relationship between literacy and log population in manors is consistent with the interpretation that a larger manorial population would be more spatially distributed and would thus have less access to churches and schools, impairing the accumulation of human capital. It is important to bear in mind that this data source does not allow us to account for the Polish minority, which was predominantly present in eastern counties. If Poles were disproportionally living in large manors in the east, and if their level of literacy was comparatively lower, this could also explain the negative relationship between literacy and population size for manors [Kersting et al. (2020)]. One way to address this point is to control for the share of Catholics, because Poles were predominantly Catholic. If we include the share of Catholics in the specification of column 6 in Table 4 (not shown), the negative relationship between literacy and log population is virtually unaffected. This result should attenuate the concern that the Polish population affects the association between literacy and population size across localities.
16 We use log population size as the population distribution is highly skewed to the right. The main conclusions of the analysis hold if we use population levels instead of log population.
17 Note that these results hold if we control for the share of people with missing information on their literacy status.
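The regressions with county-fixed effects discussed in this section can be illustrated with a minimal sketch. It is not the estimation code behind Table 4: the data are synthetic, the slopes of +0.01 and -0.01 are hypothetical values chosen only to mimic the signs reported in the text, and the function names `within_slope` and `simulate` are introduced here for illustration. The sketch relies on the standard result that demeaning the outcome and the regressor within each county yields the same slope as including a full set of county dummies.

```python
import math
import random
from collections import defaultdict


def within_slope(rows):
    """OLS slope of literacy on log population after removing county means.

    rows: iterable of (county, population, literacy_rate) tuples.
    Demeaning both variables within each county is equivalent to including
    a full set of county dummies (county fixed effects).
    """
    by_county = defaultdict(list)
    for county, pop, lit in rows:
        by_county[county].append((math.log(pop), lit))

    sxy = sxx = 0.0
    for obs in by_county.values():
        mean_x = sum(x for x, _ in obs) / len(obs)
        mean_y = sum(y for _, y in obs) / len(obs)
        for x, y in obs:
            sxy += (x - mean_x) * (y - mean_y)
            sxx += (x - mean_x) ** 2
    return sxy / sxx


def simulate(slope, n_counties=50, n_per_county=40, seed=0):
    """Synthetic localities (NOT the Prussian data) with a known slope."""
    rng = random.Random(seed)
    rows = []
    for county in range(n_counties):
        county_effect = rng.uniform(0.5, 0.8)  # hypothetical county level
        for _ in range(n_per_county):
            pop = math.exp(rng.uniform(3.0, 8.0))
            lit = county_effect + slope * math.log(pop) + rng.gauss(0, 0.02)
            rows.append((county, pop, lit))
    return rows


# Hypothetical slopes chosen only to mimic the signs reported in the text.
samples = {"village": simulate(0.01, seed=1), "manor": simulate(-0.01, seed=2)}
estimates = {kind: within_slope(rows) for kind, rows in samples.items()}
for kind, b in estimates.items():
    print(f"{kind}: within-county slope = {b:+.4f}")
```

Estimating the slope separately for each locality type, as in the dictionary above, mirrors the sample split by locality; additional controls such as the share of Catholics would require a multivariate estimator rather than this single-regressor formula.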

Literacy and religion
In Table 5, we report OLS estimates of the relationship between literacy and religious denomination. The coefficients for the religious groups in column 1 confirm previous results in the literature, namely that literacy rates are higher when the shares of Protestants and Jews are higher (the share of Catholics is the reference group in the regression). This result for the share of Protestants holds when accounting for locality fixed effects (column 2) and county-fixed effects (column 3).
The estimates by locality (columns 4-6) also unveil important heterogeneity. First, the Protestant "literacy premium" is much larger in manors. Second, we have previously seen that Jews lived predominantly in towns (see Table 3). The estimates in column 4 suggest that, in towns, a larger share of Jews is not associated with higher literacy. 18 There is, however, a significantly positive relationship between the share of Jews and literacy rates in rural villages and manors (columns 5 and 6, respectively). 19
Following Becker and Woessmann (2009), a large literature has studied the relationship between Protestantism and literacy. 20 Our new data allow us to investigate this relationship in more detail. In Figure 14, we plot the relationship between literacy and the share of Protestants by locality. The scatter plots reveal an interesting pattern, namely that literacy does not increase monotonically with the share of Protestants. Instead, we observe that the literacy rate first declines at very low levels of the share of Protestants and starts to increase only once the share reaches around 20%. This pattern seems to be consistent across types of locality, although it is more accentuated in villages and manors. A possible explanation for this U-shaped pattern relates to the provision of public education. Religiously mixed localities might have suffered from coordination problems in the provision of public education, thus generating lower levels of literacy [Cinnirella and Schueler (2016)]. 21
18 See Abramitzky and Halaburda (this issue) on the literacy of urban and rural Jews in the context of interwar Poland.
19 Note that when we talk about the Protestant "literacy premium," we are essentially associating the share of Protestants and the literacy rate in a location with an underlying individual-level association. This is subject to any potential caveat arising from ecological fallacy, which, however, is less pronounced in smaller locations, such as municipalities, than in larger geographic units, such as counties.
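A non-monotonic relationship of the kind described for Figure 14 can be checked without imposing a functional form by averaging literacy within bins of the Protestant share. The following is a minimal sketch under stated assumptions: the data are synthetic and constructed to dip near a 20% share, the helper `binned_means` is hypothetical, and ten equal-width bins are an arbitrary choice rather than the procedure used for the figure.

```python
def binned_means(shares, literacy, n_bins=10):
    """Mean literacy within equal-width bins of the Protestant share (0 to 1)."""
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for share, lit in zip(shares, literacy):
        b = min(int(share * n_bins), n_bins - 1)  # a share of 1.0 goes in the last bin
        sums[b] += lit
        counts[b] += 1
    # Empty bins are reported as None rather than as zero.
    return [sums[b] / counts[b] if counts[b] else None for b in range(n_bins)]


# Synthetic illustration (NOT the Prussian data): literacy dips near a 20% share.
shares = [i / 200 for i in range(201)]
literacy = [0.70 + 0.5 * (s - 0.2) ** 2 for s in shares]

means = binned_means(shares, literacy)
lowest_bin = min(range(len(means)), key=lambda b: means[b])
print("bin with the lowest mean literacy:", lowest_bin)
```

A lowest bin that lies strictly between the first and the last bin indicates a U-shape; with real data, the bin counts should also be inspected, since sparsely populated bins give noisy means.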

Fertility and religion
Finally, we analyze the relationship between fertility and the shares of religious denominations (Table 6). We proxy the fertility level by the ratio of the number of children below age 10 to the total number of women (child-woman ratio). The specification with county-fixed effects in column 3 shows that higher shares of Protestants and of Jews are both associated with lower fertility. The sample split by locality shows that the negative relationship between the share of Jews and fertility is particularly accentuated in towns (column 4). The negative coefficient for the share of Protestants is also large in towns but not statistically significant at conventional levels.
As discussed in the literature and shown in Table 5, denominations are systematically different in terms of literacy levels. This raises the question of the extent to which differences in literacy can "explain" the relationship between fertility and denominations. In Table 7, we estimate, by locality, the relationship between fertility and the share of different denominations conditional on literacy. The estimates suggest that differences in literacy levels do not explain away the large negative relationship between the share of Jews and fertility in towns. Instead, controlling for literacy reduces, to some extent, the negative coefficient for the share of Protestants. These results are consistent with the notion that the fertility of Jewish minorities living in urban centers might also have had a cultural explanation. Botticini et al. (2019), who analyze fertility patterns of Jews in central and eastern Europe from 1500 to 1930, argue that childcare practices such as breastfeeding led to a higher survival rate of newborns, which can explain the lower number of children born in the first place.
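The fertility proxy used in this section is a simple ratio, which the following sketch makes explicit. The two localities and their counts are hypothetical and serve only to show the computation; the function name `child_woman_ratio` is introduced here for illustration.

```python
def child_woman_ratio(children_under_10, women):
    """Children below age 10 divided by the total number of women."""
    if women <= 0:
        raise ValueError("the ratio is undefined for a locality without women")
    return children_under_10 / women


# Hypothetical localities, not taken from the census data.
localities = [
    {"name": "town A", "children_under_10": 1200, "women": 2500},
    {"name": "village B", "children_under_10": 90, "women": 150},
]

ratios = {
    loc["name"]: child_woman_ratio(loc["children_under_10"], loc["women"])
    for loc in localities
}
for name, ratio in ratios.items():
    print(f"{name}: child-woman ratio = {ratio:.2f}")
```

Because the denominator is all women rather than women of childbearing age, the ratio also reflects the age structure of the female population, which is one reason to treat it as a proxy rather than a direct measure of fertility.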

Conclusion
In this paper, we analyze, for the first time, demographic, religious, and educational patterns in 1871 Prussia at the level of locality (town, village, and manor) for more than 50,000 localities. This allows us not only to provide basic descriptive statistics at the locality level but also to analyze in more depth whether and to what extent demographic groups were geographically segregated.
From the descriptive analysis, we find that Jews in Prussia lived predominantly in urban centers, as about 79% of them resided in towns. Prussia was already well-known for high levels of literacy in the 19th century. Our new data set shows that levels of literacy were quite high even in manors. With an average of 73% of people able to read and write, manors in Prussia had literacy levels higher than the whole of France and much higher than Italy and Spain.
21 An alternative explanation would be that members of a religious minority are more fervent believers and attachment to their own identity can help to resist the influence of the majority religion [Rocco (2016, 2018); Verdier (2000, 2001)]. In this context, literacy could have been an important element of Protestant identity. This could explain the relatively high levels of literacy at low shares of Protestantism, i.e., when Protestants are clearly a minority. What makes this alternative interpretation the less likely one is the fact that for very low shares of Protestants, their influence on the aggregate literacy rate in a municipality is small.
From the segregation analysis, we find that there was a substantial level of segregation by religious denomination within counties. That is, for a given distribution of denominations at the county level, Protestants and Catholics tended to cluster in villages and/or manors. As expected, religious segregation in towns was significantly lower. Interestingly, we find a relatively low level of human capital segregation, i.e., towns, villages, and manors were substantially less clustered by literacy. We also provide some evidence that the correlation between denominational and human capital segregation is not perfect, suggesting a potential mixing of more literate Protestants with less literate Catholics and vice versa in the localities.
The results from the regression analysis, which accounts for county fixed effects, show that the relationship between literacy and population size differs by locality type. In particular, we find that the relationship between literacy and log population is negative for manors. This result is consistent with the interpretation that a larger manorial population was more sparsely distributed and had less access to schools and/or churches. An alternative explanation is selective domestic migration.
The regression analysis also shows that, in towns, a larger share of Jews is not associated with higher literacy. This result is consistent with the hypothesis that towns formed and possibly attracted high-literacy individuals, leveling out differences between Jews and other religious groups. In fact, in villages and manors, a larger share of Jews is associated with higher literacy rates, in line with the work of Botticini and Eckstein (2012).
Finally, we find a large negative association between fertility and the share of Jews in towns. This fertility pattern is not explained by differences in literacy, suggesting the existence of cultural explanations, possibly related to childcare practices.
The regression results presented in this paper should not be interpreted as causal. In fact, it was not the purpose of this paper to unveil any causal relationship between the socio-economic variables at our disposal but to report "stylized facts" which, so far, had been investigated only at a more aggregated level. It is, however, our hope that the findings unveiled here will trigger more research on Prussian economic history. For example, it would be interesting to study whether localities which were predominantly Catholic had some spillover effects from neighboring Protestant localities which invested more in primary schools. Another "fact" that begs for an answer is how manors managed to achieve such high levels of literacy. Finally, although there is already substantial evidence on the trade-off between quantity and quality of children in Prussia [Becker et al. (2010, 2012)], it would be interesting to analyze the extent of the trade-off across localities and in conjunction with differences in the sex ratio, the latter possibly driven by gender-specific migration.