Part 5 Sociolinguistic and Geographical Approaches

26 Sociolinguistic Variation in Slavic Languages

26.1 Introduction

26.1.1 The Notion of Sociolinguistic Variation

Studies of sociolinguistic variation provide the basis for investigating the relationship between society and language (Holmes 2013). This chapter outlines approaches to studying linguistic variation in Slavic languages by investigating both quantitative and qualitative aspects of variation, with a focus on the relationship between human communication in society and the corresponding linguistic features. Methodologically, we approach the notion of sociolinguistic variation in this chapter according to the language user and the language use (Gregory 1988):

variation according to the language user, covering such parameters as age, gender, region, socio-economic status, and other sociologically distinctive features (Crystal 2004: 286, 364);
variation according to the language use, covering such parameters as communicative functions, for example academic writing or regulatory legalese, and communication styles, for example politeness or adherence to language standards (Holmes 2013).

Analysis of sociolinguistic variation starts from the assumption that language plays a role in shaping society, its concepts and beliefs, while in turn being shaped by them (Halliday 1978). In this view, the meanings exchanged in society and the linguistic constructions used to express those meanings are shaped by millions of interactions between individuals on the micro level. The totality of those interactions establishes expectations within society on the macro level, both with respect to the language user, as a member of various social groups, and with respect to the language use, as adherence to the norm which constrains the features appropriate in specific social contexts. For example, the authors of academic articles come from different demographic groups, but the social context of sharing academic knowledge guides them in choosing the linguistic constructions expected in this sphere of communication, rather than those expected in their demographic group. In turn, the interface between the micro and macro levels is regulated (1) through formal language policies, which also change over time depending on the level of their acceptance in society, and (2) through the informal shaping of communication by influential speakers, such as writers, editors, or educators, who act as gatekeepers setting the expectations of what is appropriate in a given social exchange. For example, PhD advisors, along with journal editors and reviewers, act as gatekeepers for their students, helping them to adhere to the norms of academic writing.

The theoretical framework used in this chapter takes into account concepts from both formal and functional linguistics. Functional linguistics focuses on the link between how language functions in society and which resources it has to express those functions (Halliday 1985). This implies the need to deal with variation, both in the linguistic functions, which evolve as the product of culture (in some societies the communicative function of academic writing is not sufficiently distinguishable from other kinds of communication), and in the lexicogrammatical resources as codified linguistic expectations (for example, an argument in favor of wearing facemasks is expressed differently in a Facebook forum than in a research article).

The very notion of sociolinguistic variation with respect to the user refers to ‘identity’ as a factor of social cohesion, so it is important to emphasize that this variation does not influence the communicative function of language as a whole, but rather operates on the microlinguistic level(s), for example through the sense of social identity of various social groups of speakers of a given language. Sociolinguistic variation can therefore be analyzed from linguistic standpoints consistent with the view that language and its contextual features can be analyzed as a code. These standpoints are language policy and language planning (given that much of this process is related to the spatial and social status of the language; Mesthrie et al. 2009: 373–374) and a merging of functional and discourse-oriented approaches.

Modern study of sociolinguistic variation entails a wide range of research ideas, both microlinguistic (the study of variation of the language elements, e.g. specific words, in different types of texts/spoken varieties) and macrolinguistic (the study of different types of language varieties, e.g. regional, as separate idioms).

The focus of formal linguistics is on the distinctive features of language variation, which can be divided into several types, most of which will be taken into account in this chapter: orthographical features (e.g. spelling differences in language variants); phonological features (e.g. the distinctive use of phonemes in dialects); grammatical features (such as word-formational, inflectional, constructional, and others); lexical features (the choice of the vocabulary in relation to sociolinguistic factors); discourse features (e.g. the structural organization of a text); and so on (Crystal 2004: 286–297).

Generally speaking, there are two main determiners which play a role in controlling the direction of the discourse: global (e.g. topic markers and topic shifters), and local (exemplifiers, relators, evaluators, and so on). Distinctive features which determine sociolinguistic markers can be found in both, on various levels of linguistic analysis.

While we cannot attempt to overview studies of sociolinguistic variation across the entirety of Slavic languages, we will examine modern methods for studying sociolinguistic variation. More specifically, we will examine the typology of sociolinguistic layers with respect to variation according to the language user and the language use, while the examples will be provided by a contrastive study which compares variation parameters in two fairly distant Slavic languages, Russian and Serbo-Croatian. See also reviews of specific studies in Slavic sociolinguistic variation (Belikov & Krysin 2001, Lauersdorf 2009).

Research on sociolinguistic variation in Serbian linguistics dates back as far as the middle of the twentieth century. It developed from dialectology as a study of geographical differences in speech: phonological, morphological, grammatical, and lexical. Mainly in the second half of the twentieth century, starting from the 1960s, it was noticed that the ‘main picture’ of dialectal distribution had to be supported by further language differentiation according to generation, sex, educational level, and other social factors. In other words, besides the term ‘geographical space’, the term ‘social space’ had to be considered in linguistic research as well. The notion of ‘sociolect’ was born (Bugarski 2009: 15).

In time, the notion of sociolinguistic variation in linguistic research gained autonomy, developing its own methodological apparatus. Inspired by Labov, Trudgill, Milroy, and other researchers, Serbian sociolinguists introduced the term variety, as “any publicly pronounced form of a language: geographical, social, professional etc.” (Bugarski 2009: 23). In this way, the horizontal dimension of language research, to which the term dialect was assigned, gained a new, vertical dimension. In recent times, the term vernacular has been used instead of variety.

In Serbian linguistics, there were two main factors which marked the transition from ‘classical’ or ‘rural’ dialectology towards research on language varieties (although ‘classical’ dialectology has continued to develop on its own to this day). The first factor was a rising interest in urbanization and the language of urban populations, which can be traced back to the 1920s, but has reached its full potential from the 1980s onwards, with P. L. Thomas, Lj. Rajić, and other researchers (Bošnjaković 2009: 49). Urbanization as a linguistic subject was analyzed from both dialectal and standardization viewpoints. This research developed from an assumption (which was later confirmed) that social changes and language changes are causally related (Jović 1978: 496). The second factor was the (implicit or explicit) introduction of the term variable as a social marker causally related to language behavior. Later research showed that sociolinguistic variables can be found along the lines of the following social markers: class identity, occupation, sex, mobility, age, region of origin, education, personality, and affinity towards social networking (Rajić 2009). All of these variables were introduced in the publication The Speech of Novi Sad (Volume 1: theoretical foundations, phonetic features, 2009; Volume 2: morphosyntax, lexical, and pragmatic features, 2011), a publication which fully revealed this type of sociolinguistic research both in Serbian and Slavic linguistics.

In conclusion, it can be said that, although research on sociolinguistic variation in Serbian linguistics does not follow the ‘waves’ described in Eckert (2012), either historically or methodologically, it does not contradict them either. The first wave, according to which the term variation can be described in relation to ‘standard’ language, more or less correlates with ‘classical’ dialectological research on vernacular(s) in the first half of the twentieth century. However, the understanding of the language variation without the polarization towards the language standard, which is the main feature of the second wave, and the rise of ‘social identities’ in which the speakers place themselves through stylistic practice, which is the main feature of the third wave, both constitute the habitus of research on sociolinguistic variation in Serbian linguistics from the 1990s onwards.

26.1.2 Sources of Evidence

In this description we will rely on two primary sources of information: quantitative research on the basis of large corpora and dictionary descriptions on the basis of fieldwork-based studies. This provides a complementary perspective, as large corpora offer an objective view of how linguistic constructions are actually used by a large number of speakers in a range of communicative situations. At the same time, corpus research focuses on written texts because of their availability in electronic form, so other sources, such as field studies, are needed to capture phenomena more widely manifested in spoken language.

26.1.2.1 Corpora

The first source of evidence comes from monolingual corpora, as they provide examples for qualitative analysis, as well as counts of detectable forms for quantitative analysis, which can be at the level of words, morphemes, or syntactic constructions (for corpora with syntactic annotation). This helps in comparing the frequencies of forms with the functions expressed by those forms, even when the link between the forms and the functions is rarely one-to-one (Sharoff 2017). Corpora as collections of texts have been used since computers became powerful enough to process large volumes of texts (see Kučera & Francis 1967). Some of the applications of corpora concern studying sociolinguistic variation, for example Reppen et al. (2002), with one of the studies in that collection focusing on variation in expressing deontic or epistemic modality via dolžen, nado, or nel’zja in two Russian registers (news vs. fiction) using the Uppsala corpus of Russian (de Haan 2002). For a general overview of methods for studying sociolinguistic variation using corpora, see Andersen (2010).
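The kind of count-based comparison described above can be illustrated with a minimal sketch. It assumes pre-tokenized text and reports the frequency of selected forms per million tokens, which makes samples of different sizes comparable. The function name per_million and the two short token samples are hypothetical and invented purely for illustration; they are not drawn from the Uppsala corpus or from any study cited here.

```python
from collections import Counter

def per_million(tokens, forms):
    """Return the frequency of each target form per million tokens."""
    counts = Counter(token.lower() for token in tokens)
    total = len(tokens)
    return {form: counts[form] * 1_000_000 / total for form in forms}

# HYPOTHETICAL toy samples (transliterated); a real study would read
# tokenized corpus files for each register instead.
news = "on dolžen byl prijti no nel'zja bylo".split()
fiction = "nado idti nado spešit' on dolžen znat'".split()

MODALS = ["dolžen", "nado", "nel'zja"]
news_freq = per_million(news, MODALS)
fiction_freq = per_million(fiction, MODALS)

for form in MODALS:
    print(form, round(news_freq[form]), round(fiction_freq[form]))
```

With samples this small the figures are meaningless in themselves; the point is only that raw counts are normalized by sample size before registers are compared.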

Since the beginning of the 2000s the volume of text available on the Web has democratized the process of collecting large corpora via crawling large samples of webpages (Sharoff 2006, Baroni et al. 2009). Since a Web snapshot for a given language provides the closest approximation to creating comparable descriptions, the following Web corpora are used in this study:

hrWac: a corpus of 1.3 billion words, 574,000 pages, produced by crawling Serbo-Croatian language websites from the .hr Internet domain (Ljubešić & Klubička 2014);
ruWac: a corpus of 2.5 billion words, 2 million pages, produced by crawling Russian language websites without restricting the Internet domains (Sharoff et al. 2017);
ukWac: a corpus of 2 billion words, 2.5 million pages, produced by crawling English language websites from the .uk Internet domain (Ferraresi et al. 2008);
GICR: a corpus of 20 billion words, which consists of Russian social media posts with information about the age, place of origin, and gender of their authors, primarily from Livejournal.com and VK.com (Belikov et al. 2014).

These corpora will allow us to capture variation through quantitative study of contexts. GICR from this list is particularly important, as the language users in its texts can be described via demographic parameters, thus providing a text-external link to the lexicogrammatical features, which can be extracted from texts automatically. Table 26.1 lists Russian administrative regions with the largest amount of texts in the Livejournal portion of GICR. Some regional indicators, such as Moscow, Saint Petersburg, or the USA, are less interpretable with respect to their dialectal variation, as these regions are the destinations of mass migration. However, for smaller regions this corpus provides authentic examples from million-word corpora to study relevant features.

Table 26.1 Region-specific sub-corpora for Russian bloggers in Livejournal

%	Words	Location
37.29	1,598,317,700	NA
21.02	900,987,080	Moscow
5.49	235,093,320	Saint Petersburg
4.76	203,762,440	Ukraine
1.69	72,221,940	Israel
1.39	59,565,660	Belarus
1.03	44,241,820	USA
0.95	40,673,500	Moscow region
0.80	34,392,680	Yekaterinburg region
0.78	33,428,360	Novosibirsk region
0.76	32,430,020	Germany
0.51	21,809,480	Samara region
0.47	20,061,300	Latvia
0.44	18,765,880	Estonia
0.43	18,363,940	Krasnodar region
0.42	18,060,980	Canada
0.40	17,047,660	Rostov region
0.39	16,534,700	Bashkortostan
0.38	16,467,360	Chelyabinsk region
0.37	15,855,280	Tatarstan
0.37	15,820,000	Perm region
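The percentage and word columns of Table 26.1 can be cross-checked against each other. The sketch below takes the first four rows of the table and computes, for each row, the total size implied by dividing its word count by its share. The helper name implied_total is ours and is not part of any corpus toolkit; only the figures are taken from the table.

```python
# Rows copied from Table 26.1: (share in percent, words, location).
ROWS = [
    (37.29, 1_598_317_700, "NA"),
    (21.02, 900_987_080, "Moscow"),
    (5.49, 235_093_320, "Saint Petersburg"),
    (4.76, 203_762_440, "Ukraine"),
]

def implied_total(percent, words):
    """Estimate the size of the whole collection from a single row."""
    return words / (percent / 100)

totals = [implied_total(percent, words) for percent, words, _ in ROWS]

# Relative spread between the largest and smallest estimate.
spread = (max(totals) - min(totals)) / min(totals)

for (_, _, location), total in zip(ROWS, totals):
    print(f"{location}: {total:,.0f}")
```

The four estimates agree to within a fraction of a percent, at roughly 4.28 to 4.29 billion words, which suggests that the two columns are mutually consistent and that the percentages were computed over a collection of about that size.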

26.1.2.2 Dictionary Descriptions

The second source comes from monolingual descriptive dictionaries as a special kind of linguistic manual which provides a unique view on sociolinguistic variation. Given their primary task – to describe language on both syntagmatic and paradigmatic levels – descriptive dictionaries by nature have in their sources a large variety of documents representing different types of texts or kinds of discourse. Furthermore, the metalanguage of a dictionary is in most cases structured so as to mark different kinds of sociolinguistic variation; everything which is not ‘standard’ to the lexicographer must be explicated in some way, usually by means of specific labels (e.g. ‘jarg. [jargon]’, ‘nonlit. [non-literary]’, etc.). External indicators of sociolinguistic variation in dictionaries (elements of dictionary metalanguage) can be turned into variables and correlated with variables of internal indicators of sociolinguistic variation (e.g. the typological lexicogrammatical features of lexis which is being defined). Because of this, descriptive dictionaries – when they are processed by quantitative methods – may provide useful additional/control information to the linguist who explores genre variation in the Web corpora and vice versa (Hanks 2012).

A survey of dictionary users (Šipka 2021) shows that primary normative labels (those with the primary purpose of excluding the word from the formal standard language variety, such as slang, colloquial, etc.) have a higher excluding effect than secondary normative labels (which mark something else, but have a secondary effect of excluding, e.g. facetious, obscene, etc.). This indicates that descriptive dictionaries have an impact on the process of language standardization.

26.1.3 On Different Views on Sociolinguistic Variation

The Standardization View

The conception of ‘standard’ or ‘literary’ language is based on the view that the contemporary language represents an entity which has undergone four main stages of language planning: selection, codification, implementation, and elaboration, followed by other stages such as acceptance, expansion, cultivation, and evaluation (Haugen 1987, in Mesthrie et al. 2009: 375 and Radovanović 2003: 190). The core of the ‘literary’ language is based on the rules governing the linguistic levels: orthographic, lexical, grammatical, and the like, which represent the language standard. On the other hand, the periphery of this language includes various kinds of atypical or non-standard language use. In other words, the central zone of the ‘standard’ language represents the literary (written) language in standardized form, whereas the peripheral zone is marked by various cases of vernacular, dialectal, obsolete, or jargonistic language use.

This setting implies Schuchardt’s view on the relation between language of the individual and language of the collective (in Belić 1951 and Kovačević 2014: 31–32). The individual participates in the building and institutionalizing of the language of the collective; the collective serves as a corrective to the individual through accepting or not accepting its innovations. This ‘genetic’ relationship is open to conceptualizing through various kinds of sociolinguistic models related to notions of ‘center/periphery’ and ‘standard/non-standard’. Furthermore, this setting also implies that language users can ‘delegate’ their language collective.

From a historical perspective, it is important to note that the standardization view in its entirety is based on the notion of language cohesion among the members of one nation; therefore a national language is often perceived as a necessary condition for a nation to exist.

The main stages of language standardization can be illustrated by the case of the standard Serbian (Serbo-Croatian) and Russian languages from their origins to the present day. Contemporary Serbian is based on the Štokavian dialect, which is differentiated from other dialects (Čakavian and Kajkavian) by the pronunciation of the interrogative pronoun što (as opposed to ča and kaj, respectively). What is taken to be the basis of the language standard is the state of the Serbian language immediately following an overall language reform (orthographical, grammatical, lexical), which was undertaken by Vuk Stefanović Karadžić in the nineteenth century.

The modern Russian standard emerged first from the Central Russian dialects when Moscow was gaining prominence over other Russian-speaking regions during the fifteenth century, while in itself this dialect incorporated some features of the earlier Northern Russian dialects (e.g. the ‘hard g’ consonant) and Southern dialects (reduction of unstressed vowels). This was followed by its extensive expansion over the realm of the Russian Empire towards Siberia and the Far East. The literary standard was also heavily influenced by borrowings from a range of European languages, such as Polish, Dutch, French, and German (Timberlake 1993).

The View from Communicative Functions

In addition to variations coming from the user, variations in the sociocultural context of communication shape the linguistic constructions in various ways. First, the sociocultural context sets expectations on the kinds of communicative intentions suitable in a given situation. To describe them we need a typology of communicative functions. Second, it sets expectations on the kinds of linguistic constructions appropriate for expressing the communicative intentions; this is what Halliday (1978) refers to as ‘register’. Thus, we need a typology of register features.

The choices available for the realization of communicative intentions are instantiated in individual texts according to individual preferences of their authors. However, statistical features obtained from a large corpus anonymize these individual preferences and provide a map of how the system of language connects the communicative functions with their realizations. The intersubjective nature of stylistic expectations also leads to the relative stability of typical genre-related ways of linguistic realizations of communicative intentions. In Bakhtin’s words, “each separate utterance is individual […] but each sphere in which language is used develops its own relatively stable types of utterances” (Bakhtin 1986: 60).

Therefore, we will address in this chapter the communicative functions available in existing corpora following the framework of Sharoff (2018), followed by statistical analysis of register features associated with those functions following the framework of Biber & Conrad (2009). We start from the assumption that the kinds of sociocultural contexts are mostly compatible across modern societies (Francophone, Slavic, or otherwise), especially when considering data from the Web corpora. However, the parameters of sociolinguistic variation across languages differ with respect to the following factors:

Variation in communicative functions. If corpora are collected from different sources or via different pipelines, such as newspapers or social media, the distribution of communicative functions is likely to be different. The composition of available corpora can also differ because of sociocultural differences in the frequencies of the functions, such as the preferences for the balance of argumentation or factual reporting in newspapers.
Standard ways for expressing communicative functions. Usually functions are associated with ways for their realization acceptable in the society. However, some cultures can lack codified lexicogrammatical features for realizing specific functions. For example, reporting in citizen journalism does not necessarily follow the established journalistic conventions, thus shifting the register choices. Alternatively, the gatekeepers are likely to influence the frequencies of register features accepted in the prestigious genres, such as expressions expected in academic writing or fiction.
Language-specific linguistic features. Finally, the functions can be similar and well codified, while they can be expressed via language-specific mechanisms. For example, the communicative function of narration is commonly associated with a higher rate of temporal adverbials and verbs in the past tense, unless a culture prefers to express some kinds of narrative reporting in the present tense in the form of ‘historical present’. Similarly, argumentative texts are often characterized by a higher rate of explicit causation markers and emphatics. However, in certain cultures their use might be discouraged by the gatekeepers.

The Interpersonal View

This view falls into the scope of research of the ‘ethnography of communication’. Dell Hymes developed a checklist of dimensions of sociolinguistic awareness that are involved when speakers communicate in particular speaking communications: genre, topic, purpose (or function), setting, key (emotional tone), participants, message, act sequence, rules of interaction, and norms of interpretation (Hymes 1971, 1974).

More specifically, we will discuss methods for investigating such kinds of variation as:

with respect to the language standardization model: division according to regional variation, center vs. periphery, literary vs. vernacular language, standard language vs. dialects; temporal variation; social variation (jargon and slang), and idiolectal variation
with respect to registers, communicative functions, and their lexicogrammatical features: argumentative, news reporting, personal reporting, academic writing
with respect to interpersonal context of language use: politeness, code-switching.

26.2 Variation with Respect to the Language Standardization Model

Variation with respect to the language standardization model implies the existence of levels of sociolinguistic variation which can be presented in terms of their ‘peripheral’ relation to the language standard. In other words, the nature of these levels relies on their linguistic deviation from the ‘standard language’, the ‘standard’ being understood as a set of criteria for selecting the correct language expression in society. In many Slavic languages, definitely in the two languages described here, these so-called standard varieties are privileged over the other regional and social varieties. Bearing in mind that the notion of the language standard does not present one homogeneous whole, it is understandable why this type of variation correlates with social and language stratification. The most common types of variation in Slavic languages are: (1) regional (division according to various forms of regional and/or vernacular language use); (2) social (which implies that different social levels of speakers share common linguistic markers); (3) temporal (division according to the time of language use); and (4) idiolectal (which covers different kinds of ‘incorrect’ language use related to the sense of identity of the speaker(s)). However, this division should be taken conditionally, given that, linguistically, these types of variation do not form isolated wholes, but rather overflow into each other.

26.2.1 Regional Variation

In the sociolinguistic sense, ‘regional variation’ represents a complex notion which combines different linguistic approaches to the notion of ‘region’.

The most common approach treats the ‘region’ in a geographical sense, as a set of language features pertinent to the territory where a certain idiolect or dialect is spoken. This sense also entails the fact that ‘regional’ use of language concerns the lexis naming the objects and terms from various aspects of everyday life, that is, the names of the seasons, plants, folk remedies, domestic and wild animals, fruit, agricultural tools, church paraphernalia, terms of common law, etc.

Features of regional variation in a geographical sense operate across a broad range of linguistic contexts, from the constructional (accentual, phonological, morphological, and so on) to the lexical-semantic. For instance, in the Dictionary of the Serbian Academy the same label, ‘pokr.’ (‘regional’), is applied in a wide range of cases: to the lexemes used specifically in certain regions (e.g. lotnjak (n) pokr. … ‘olive oil of a good quality’ (Poljica); nikolča (n) pokr. … ‘national dance’ (srednji Timok)); to the regional phonetic variants of the lexemes of the standard language (e.g. mrmnjati (v) pokr. … ‘mrmljati’ (to murmur); nedomak (prep) pokr. … ‘nadomak’ (within reach)); and to the culturally specific meanings or constructions of polysemous words (e.g. kuća (n HOUSE) … pokr. ‘dečja igra’ (a children’s game); negodovati nekoga (v RESENT + Gen. [Anim.]) … pokr. ‘osuđivati nekoga’ (to judge someone)), etc.

Dialectal variation, which is directed towards systemic changes in the spoken language, can be considered a narrower subtype of regional variation. It is most commonly analyzed through linguistic atlases, collections of linguo-geographical maps which enable their users to analyze the areal dissemination of dialectal features belonging to different levels of language use (Miloradović 2012: 141). These atlases often represent the results of international projects, and their making is measured in decades. The most important atlases in the Slavic world are the European linguistic atlas ALE, the Slavic linguistic atlas OLA, and the Carpathian dialectological atlas OKDA; there are also national projects, such as the Serbian dialectological atlas SDA (earlier, Serbo-Croatian dialectological atlas). The central result of linguistic atlases is the production of linguistic maps which represent phonetic, morphological, syntactic, semantic, and lexico-derivational variation of languages in one or several areas (see http://slavatlas.org).

As a dialectal language feature can be realized on different levels of the organization of the language structure, this also implies various issues regarding the dialects (as linguistically complex forms) in relation to sociolinguistic perception of (literary) language and its variation. The research by V. Karlić and S. Šakić (2019) analyzes the language of literature written by Serbian writers from Croatia whose works were published after 1991 by the Serbian Cultural Society Prosvjeta’s publishing house in the edition Mala plava biblioteka. The paper shows that the lexical choices between the ‘Serbian’ and ‘Croatian’ idiom made by the writers (sveštenik – svećenik ‘priest’, hrišćanin – kršćanin ‘Christian’, and so on), especially when they represent dialogue between the characters, can be context-dependent.

Social media corpora like GICR for Russian provide information about author profiles to study examples of sociolinguistic variation. For example, GICR can be used to test the regional distribution of such phenomena as names of professions (ximička, ximica ‘chemistry teacher’) or food (rasstegaj, rastjagaj ‘a specific kind of pie’) with the differences clearly depending on the origin of the speaker (Belikov 2010). At the same time, modern society is characterized by extensive demographic movements, which dilute the specific geolinguistic features for the destinations of mass migration, as well as by innovations, which create new phenomena enriching the previously known distributions of dialectal features. For example, GICR shows a clear association of new words or senses with specific regions, such as multifora ‘plastic wallet’ or svečka lit. ‘candle’, in the sense of ‘high-rise tower’ (Belikov et al. 2014).
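Because region-specific sub-corpora differ greatly in size (see Table 26.1), a regional association of this kind is more safely read from normalized rates than from raw counts. The following is a minimal sketch of that normalization. The region names, sub-corpus sizes, and counts are all hypothetical placeholders invented for illustration; they do not report GICR data and do not indicate where any of the words above is actually used.

```python
# HYPOTHETICAL sub-corpus sizes in words (placeholders, not GICR figures).
REGION_SIZES = {
    "region_A": 30_000_000,
    "region_B": 20_000_000,
}

# HYPOTHETICAL raw counts of one word per region, invented for illustration.
RAW_COUNTS = {
    "region_A": {"multifora": 120},
    "region_B": {"multifora": 6},
}

def rate_per_million(count, size):
    """Normalize a raw count by sub-corpus size (occurrences per million words)."""
    return count * 1_000_000 / size

def regional_rates(word):
    """Return the per-million rate of a word in every region."""
    return {
        region: rate_per_million(RAW_COUNTS[region].get(word, 0), size)
        for region, size in REGION_SIZES.items()
    }

rates = regional_rates("multifora")
for region, rate in rates.items():
    print(f"{region}: {rate:.2f} per million words")
```

A word that is concentrated in one region shows a markedly higher rate there than elsewhere. Whether such a difference is statistically reliable would need a significance test on the underlying counts, which this sketch does not attempt.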

Development of mass transportation as well as socio-political upheavals of the twentieth century have led to extensive demographic movements, especially in the context of the use of Russian in the Soviet Union, for example, mass movements to Siberia and Kazakhstan in the 1950s and 1960s, or the mass migration to the Moscow region in the 1990s. This population mixing has led to very extensive contacts across the regional varieties resulting in the lack of well-defined dialectal features known from the rural communities. Therefore, it became difficult to provide definitive features which can describe the language of the destinations of mass migration as they are indicated in the current profile in social media accounts, while the picture for the regions less affected by mass migration is much more reliable.

26.2.2 Social Variation

In relation to the language standard there is usually a notion of two kinds of social variation: jargon (occupational registers) and slang (subcultural and youth registers).

Jargon variation usually implies the existence of special registers related to various scientific and cultural fields (finance, politics, medicine, etc.). Lexical units belonging to these registers can exist as separate terminological units, or they can be integrated into the standard language either by terminologization (left (n) … ‘leva ruka’ (the left hand) → pol. ‘members of revolutionary or liberal parties’) or by determinologization (constitution (n) … pol. ‘the main law which determines rights and duties of citizens in a state’ → ‘a set of physical or mental characteristics’).

On the other hand, there are several types of divisions in slang variation. The most common divisions are by the age of the speaker (slang as a characteristic feature of the language of the youth), by the social status of the speaker (slang as a subcultural entity with the task of providing mutual understanding among speakers belonging to the same social structures), and by the speaker’s education (Bugarski 2006, Vujović & Alanović 2011). The main linguistic characteristic of slang is its tendency towards imaginative and vivid lexical creations, which opens several systemic possibilities for the analysis of its distinctiveness. Its typical mechanisms are: assigning new, metaphorical meanings to existing words (krvav (adj) ‘bloody’ … odličan, izvanredan ‘excellent, remarkable’); distortion of the rules of language creation through permutation of sounds in a word (vozdra instead of zdravo ‘hello’); shortening of words or even just using their initials (profa ‘professor’; za dž (za džabe) ‘for free’); wide adoption of lexical borrowings (picikato (Ital. pizzicato) ‘gentle’, pis (Eng. piece) ‘small amount of drugs’); and so on (Bugarski 2006: 22).

The slang features in Slavic languages are mostly researched in urban areas, where several non-linguistic factors are at play: changes in socio-political concepts, technical and scientific progress, the development of educational activities, etc. These studies show that sociolinguistic variation can be investigated through the morphosyntactic features of words as well. The most prominent examples are the variation of lexical doublets and various case constructions according to the age and occupation of the speaker (e.g. ostareti – ostariti ‘grow old’, zbog toga – radi toga ‘because of this’, etc.) (Vujović & Alanović 2011: 46–51).

However, the most productive features of slang in Serbian at the word level lie in the mechanisms of word formation. For instance, there is a large number of expressive suffixes which, in the standard language, have fairly marginal, obsolete, or informal use, but are highly productive in slang usage: -džija (tabadžija ‘goon, rowdy person’, tupadžija ‘sap, dull or chattering person’), -uša (uspijuša ‘twerker, dirty dancer, a female who dances tastelessly and sexually provocatively’, bilderuša ‘muscle chick, tasteless female body builder’), etc. In addition, part of the lexical inventory of slang is created by compounding and blending: radoholičar ‘workaholic’, čedovišta ‘sweetchildmonsters’, etc. (For a detailed list of suffixes and compounds in Serbian slang see Bugarski 2006: 238–274, 275–280.) Furthermore, in the urban environment it is common for residents to develop different derivational models of gentilics: Pejtonac – Pejtočanin – Pejtončan – Gradpejtonac – Gradić Pejtonac etc. ‘person living or working in the neighborhood called Peyton Place (named after a US series popular in the former Yugoslavia in the 1960s)’ (Štasni & Ajdžanović 2011).

26.2.3 Temporal Variation

26.2.3.1 Descriptive Studies

In relation to the vertical, time-related dimension of the ‘standard language’, there are layers which can be characterized as ‘archaic’ or ‘obsolete’ and ‘new’.

The ‘archaic’ layer: lexis which belonged to the ‘higher’ styles in the history of Serbian literary language develops an anachronistic relation to the language standard (in Serbian, usually from the Slaveno-Serbian period: otvečanije ‘response’, vinodelije ‘winemaking’); today it can be used with highly expressive function.

The ‘obsolete’ layer: lexis, idioms, and meanings obsolete in relation to better choices (plav (adj. ‘blue’) in the meaning of koji ima svetliju nijansu, svetao (‘lighter’): ‘yellow can be more or less blue’, etc.); it can be mixed with the dialect (in obsolete dialectal words).

The ‘new’ layer can be attributed both to lexis and to its meanings (e.g. new lexical borrowings: peč ‘patch’, strimovati ‘to stream’ …); in the standardizational sense, it signifies that the word is not yet accepted.

Variation with respect to the time of language use (diachronic level of sociolinguistic variation): there is also a model, to which this use is related. The model represents the ‘modern’ or ‘present-day’ usage of the standard language. In relation to this model, language can be ‘archaic’ or ‘new’ (with further subclassification in both levels, e.g. ‘archaic’ and ‘recently obsolete’). Differences can be established on the levels of word formation, word origin, syntax, and so on.

26.2.3.2 Corpus Evidence

Corpus evidence comes primarily either from historical corpora, such as the Russian National Corpus (RNC; Sharoff 2005), or from social media corpora annotated with author profiles, such as GICR (Belikov et al. 2014). Historical corpora are suitable for detecting variation over long time intervals, but they are limited by the availability of their sources; in particular, fiction and legal texts are the main sources in the historical part of the RNC, as the language of formal writing is preserved much better, even though it reflects only a small portion of the total language use. This language is also very much influenced by the gatekeepers of its time.

On the other hand, social media have made it possible to capture the language of spontaneous interaction in everyday life. At the same time, they are limited to a relatively recent time period. Sizeable social media collections are available only from the 2000s, and their authors are mostly aged from 18 to 60; see the distribution of the number of blog posts for the authors who have explicitly provided their year of birth in Figure 26.1. The number of data points in a study of this kind can be expanded further via automatic age prediction for the authors who have not provided their age (Nguyen et al. 2016), but this has not been attempted for Slavic languages yet.¹

Figure 26.1 The number of blog authors in GICR with respect to the year of their birth

Another limitation of relying on corpora is that they often exhibit topical biases leading to a predictable prevalence of specific topics, such as dating and education for authors younger than 22 or history and illnesses for those older than 55. Nevertheless, corpora allow the detection of more interesting patterns, in particular by associating grammatical properties with age differences. This can be done by building a statistical model predicting the author’s age on the basis of selected lexicogrammatical features, such as those from Biber (1988). For example, a linear Support Vector Regression model (Cherkassky & Ma 2004) can predict the author’s age in GICR using the grammatical features with a mean absolute error of 8.5 years. On the basis of modern GICR data for Russian, this model assigns higher positive weights (indicating a positive correlation with age) to such features as the rate of third person pronouns, hedges, amplifiers, and nominalizations, while higher negative weights (indicating a negative correlation with age) are assigned to such features as the rate of first person pronouns, relative subordinate clauses, and omissions of čto (‘that’) after verbs of reporting, such as soglasit’sja ‘agree’, utverždat’ ‘claim’. This suggests greater acceptance of more personal, less formal communication in the blog posts of younger writers.
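The logic of reading such a model can be illustrated with a minimal sketch. It does not train a Support Vector Regression model; it only shows how a fitted linear model turns feature rates into an age estimate and how the mean absolute error is computed. The feature names, the weights, the bias, and the two author records are all hypothetical: only the signs of the weights follow the pattern reported above.

```python
from statistics import mean

# Hypothetical weights (years per unit of feature rate). Only the signs follow
# the pattern described in the text; the magnitudes are invented.
WEIGHTS = {
    "pron_3p": 2.0,          # third person pronouns: positive weight
    "nominalizations": 1.5,  # nominalizations: positive weight
    "pron_1p": -2.5,         # first person pronouns: negative weight
    "chto_omission": -1.0,   # omission of 'čto' after reporting verbs: negative
}
BIAS = 35.0  # hypothetical baseline age


def predict_age(features):
    """Linear prediction: bias plus the weighted sum of the feature rates."""
    return BIAS + sum(WEIGHTS[name] * features.get(name, 0.0) for name in WEIGHTS)


def mean_absolute_error(true_ages, predicted_ages):
    """Average absolute difference between stated and predicted ages."""
    return mean(abs(t - p) for t, p in zip(true_ages, predicted_ages))


# Two invented authors: (feature rates, age stated in the profile).
authors = [
    ({"pron_3p": 2.0, "nominalizations": 2.0, "pron_1p": 1.0, "chto_omission": 0.0}, 45),
    ({"pron_3p": 0.0, "nominalizations": 1.0, "pron_1p": 4.0, "chto_omission": 2.0}, 20),
]
predictions = [predict_age(feats) for feats, _ in authors]
mae = mean_absolute_error([age for _, age in authors], predictions)
print(predictions, mae)
```

In a real study the weights would be estimated from the corpus by the regression model, and the error would be measured on authors not used for training.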

26.2.4 Idiolectal Variation

In Slavic languages, idiolectal variation implies idiosyncrasies. These idiosyncrasies function as transitional forms between the language of the collective and the language of the individual (the levels of ‘langue’ and ‘parole’). This phenomenon is most frequently researched on the lexical and semantic levels of language structure.

On the lexical level, it has been noticed that speakers who belong to certain regional, cultural, educational, or age groups use non-standard or substandard phonetic realizations of certain words signifying notions in everyday language use. These phonetic realizations mark the identity of the above-mentioned groups. For instance, forms like bicikli (instead of bicikl, ‘bicycle’), utornik (instead of utorak, ‘Tuesday’), šaraf (instead of šraf, ‘screw’), etc. are considered to be used more frequently among older native speakers of Serbian from Novi Sad. Idioms in active language use also fall into this classification, for instance izvoditi kerefeke instead of izvoditi nešto (‘to act out’), or mani me instead of ostavi me na miru (‘leave me be’). Researchers of this level agree that the variables of age and the sense of local identity are the most dominant in this division (Štrbac & Vujović 2011). Idiolectal deviation from the language standard can also be marked by morphonological irregularities. In Serbian, these irregularities can be related to deviation from the orthoepic norm (beleti instead of beliti ‘whiten’, izvrsan instead of izvrstan ‘excellent’), to the point of (more or less widely accepted) incorrect pronunciation (gledaoc, nosioc instead of gledalac ‘spectator’, nosilac ‘carrier’).

The same applies to the idiosyncrasies on the lexical level of the language structure. Non-standard or substandard realizations here imply that the language users of certain groups use loan words, archaisms, etc. which have the same meaning as ‘standard’ words, but with different connotation (for instance, kanda instead of izgleda, ‘it seems so’, or špacirati se instead of šetati se, ‘to stroll’). When used in speech, these forms are used as recognizable features of their users’ identity.

The idiosyncrasies best known to lexicographers are hapax legomena and uncommonly used words or potential words. Both types are a result of non-standard procedures in the word formation of nouns, adjectives, adverbs, and verbs. Idiosyncrasies are commonly used in the language of literature, where they serve as a way of expressing the personality of the author as well as the author’s thought. In Serbian, individualisms can arise as a result of analogy to more common words (zrakoproliće ‘beamshed’ in analogy to krvoproliće ‘bloodshed’), or by connecting or compounding the base of the word with an unusual or unexpected derivational affix or word (mladoženjstvo ‘groomness’). Potential words, on the other hand, usually arise by analogy, and signify the penetration of new derivational models into the standard language (detovati ‘to spend time as a child’, kolumbovati ‘to spend time acting as Columbus’).

The conclusion to be drawn from this is that variation with respect to the language standardization model does not imply the existence of whole new idioms which deviate from the standard language, but rather a set of recognizable landmarks on different levels of language structure (morphonological, semantic, word-formation, pragmatic, and so on). Overall, the word-formation and lexical levels seem to be the most active in this division.

26.3 Variation with Respect to Communicative Functions

26.3.1 Distribution of Communicative Functions

As mentioned by Douglas Biber, “language may vary across genres even more markedly than across languages” (Biber 1995). Nowadays it is relatively easy to collect very large samples from the Web to build representative corpora. However, there is often a lack of understanding of the composition of those corpora with respect to their inherent variation in terms of communicative functions codified via genres.

From the viewpoint of the categories for describing the genres, a large number of labels is needed to account for a large number of different kinds of texts in a large corpus. However, from the viewpoint of their annotation and meaningful comparison, a smaller number of labels is needed to cover the immense variety of text types. It is difficult in practice to compare corpora using even the 70 genres from the BNC typology, and many genre variation studies focus on smaller subsets of 10 to 15 genres (Lee & Swales 2006, Szmrecsanyi 2009). At the same time, even the full BNC genre set is far too small to describe variation with respect to very common Web genres such as personal blogs or discussion forums.

Another difficulty comes from the fact that the Web is a reasonably unconstrained publication medium, and many Web pages are produced without explicit gatekeepers, such as editors or reviewers, who ensure greater uniformity of formally published outputs. In the end, Web corpora contain many more examples of genre hybridism, for example, citizen journalism, which combines news reporting with personal observations, thus blending established genre categories.

In this chapter we take a topological approach to describing the variation of communicative functions, following the framework of Sharoff (2018). With the use of a small number of categories, the communicative functions of each text in a Web corpus can be analyzed with respect to how similar the text is to prototypes. The communicative functions common on the Web are:

A1 Argumentative. To what extent does the text try to persuade the reader? (For example, argumentative blog entries, newspaper opinion columns).
A8 News. To what extent does the text appear to be an informative report of events recent at the time of writing? Information about future events can be considered as reporting too. (For example, reporting newswire story).
A11 Personal. To what extent does the text report a first-person story? (For example, diary entries, travel blogs).
A12 Promotion. To what extent does the text promote a product or service? (For example, adverts, promotional publications).
A14 Academic. To what extent does the text report academic research? (For example, academic research papers).

The prototypes are given in the lists of examples, so that a citizen journalism blog entry can be judged as serving the same function as a reporting newswire story irrespective of its place of publication (i.e. a blog instead of a newspaper). At the same time, hybridization of genres is represented via assessing the presence of several functions in the same text. For example, a citizen journalism post can blend the function of news reporting with personal reporting as in a private diary entry.
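This representation can be sketched in a few lines of code. The sketch assumes that each text carries a score between 0 and 1 for every communicative function; the score scale, the threshold of 0.5, and the example values are assumptions made for illustration and are not taken from the framework itself.

```python
def classify_functions(scores, threshold=0.5):
    """Return the dominant function, all functions present, and a hybrid flag.

    `scores` maps function labels (e.g. "A8") to a degree of similarity to the
    prototype. The 0-1 scale and the threshold are hypothetical.
    """
    dominant = max(scores, key=scores.get)
    present = [label for label, score in scores.items() if score >= threshold]
    return dominant, present, len(present) > 1


# An invented citizen journalism post: mostly news, with a personal component.
post = {"A1": 0.1, "A8": 0.8, "A11": 0.6, "A12": 0.0, "A14": 0.0}
dominant, present, is_hybrid = classify_functions(post)
print(dominant, present, is_hybrid)
```

Counting each text under its dominant function yields a single label per text, while the list of functions above the threshold keeps the information about hybrids.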

26.3.1.1 Variation with Respect to Similarity to Prototypes

While each text can be assessed with respect to its communicative functions manually, assessing them across the entirety of the Web corpora requires development of automatic classifiers. Table 26.2 presents the results of automatic classification of the respective corpora with respect to selected communicative functions using a neural model (Sharoff 2021). As the model can predict hybrid categories, the counts are given for the function with the strongest presence (according to the automatic classifier).

Table 26.2 Distribution of communicative functions in three Russian corpora as compared to English

FTD	GICR-lj		GICR-news		ruWac		ukWac
FTD	%	Freq.	%	Freq.	%	Freq.	%	Freq.
A1. Argument	4.69	480,113	28.64	605,223	18.20	222,741	18.27	366,412
A4. Fiction	1.22	124,942	0.12	2,624	3.23	39,526	1.37	34,943
A8. News	2.08	212,712	64.67	1,366,632	5.77	70,689	11.78	236,233
A11. Personal	81.07	8,305,825	2.29	48,414	44.29	542,111	4.45	89,199
A12. Promotion	1.99	204,261	0.66	13,977	5.34	65,417	21.52	547,013
A14. Academic	0.57	58,636	0.29	6,172	4.77	58,410	2.62	52,516

The composition of these corpora differs in both expected and unexpected ways. Reporting personal experience (A11) is by far the most common communicative function for postings in social media (GICR-lj). This is followed by argumentative texts (A1), primarily concerning expression of opinions and discussions on such topics as politics, parenting, or entertainment. The profile of the GICR news segment demonstrates a combination of factual reporting and argumentative texts providing analysis and opinions at the ratio of roughly two to one, as its sources consist of informative newswires (e.g. lenta.ru or rosbalt.ru). Analysis of composition also shows an estimate of the amount of reposting in GICR-lj, primarily from news and fiction, which obscures the demographic features of the authors for studying variation with respect to the user by means of social media.

In contrast to well-curated sources of GICR, the corpora produced by wide crawling of websites (ruWac and ukWac) contain a substantial portion of promotional texts, which reflects one of the major functions of the Web as a medium of business transactions, especially in the case of ukWac in English. As ukWac was constrained with respect to its top-level domain name (.uk), while ruWac was not, ruWac contains a substantial portion of texts from livejournal.com, blogspot.com, and hiblogger.net, which provide sources of personal reporting, making it fairly different from ukWac and more similar to GICR.

26.3.1.2 Cases of Hybridization

Individual authors contributing to their blogs have freedom in expressing their thoughts by combining several communicative functions, so that a personal blog entry can follow the style of traditional mass media with additions of personal diaries and argumentation about the state of affairs, thus instantiating an example of citizen journalism. In turn, many online news outlets are likely to employ some of the genre techniques of citizen journalism, since this genre has gained popularity on the Web. With respect to the social media sources (GICR-lj), 11 percent of texts are detected as hybrids, with the majority of them being hybrids of A1 and A11 or vice versa, for example, political argumentation supported by personal stories.

In addition to complete hybridization (when the entire text is aimed at expressing two or more communicative functions), many texts also have quanta of specific communicative functions. This situation is especially common in interviews, which can include clearly separate parts, such as some aimed at reporting and others at evaluating (Kibrik 2013). Quantization of communicative functions can be easily detected via human interpretation, but since the granularity of automatic predictions is at the text level, this situation is not captured by the automatic classifiers.

26.3.2 Lexicogrammatical Features

After completing our analysis of variation in terms of communicative functions, we can zoom in to describe variation in terms of lexicogrammatical features. A set of features suitable for automatic extraction from corpora has been proposed by Douglas Biber (1988); this has been adapted to Russian (Katinskaya & Sharoff 2015, Sharoff 2021). The features include:

text-level features, such as

average word length
type/token ratio (TTR)

part-of-speech (POS) features, such as

past tense verbs
wh-words in English and the corresponding question words in Russian
nominalizations (nouns ending in -tion, -ness, -ment in English, -cija, -st’, -nie in Russian)

syntactic features, such as

adjectives in the attributive function
subordinate clauses.

The POS and syntactic features of this kind can be reliably extracted using modern tools such as udpipe (Straka et al. 2016). Even though the original set was initially proposed by Biber for English, it matches many properties of Russian (and other Slavic languages). For example, the major parts of speech, subordinate clauses, or nominalizations provide useful indicators for the functional varieties. Even when the classes differ from English, they can be functionally equivalent. For example, wh-words do not form a class of the same kind in Slavic languages. However, their translations (kto, gde, kogda … in Russian) are likely to be a useful indicator of functional variation, though not necessarily identical to how they are used in English.
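The text-level features and the suffix-based nominalization rate can be approximated without a parser, as in the following minimal sketch. It is not the extraction procedure used in the cited studies: the tokenizer is a plain regular expression, the Russian suffixes are given in Cyrillic (-ция, -сть, -ние, corresponding to -cija, -st’, -nie), and matching is done on raw word forms, so inflected forms are missed and words such as Russian есть ‘is’ are falsely matched. The function and variable names are hypothetical.

```python
import re

# Suffixes from the feature list above; the Russian ones are in Cyrillic.
NOMINALIZATION_SUFFIXES = {
    "en": ("tion", "ness", "ment"),
    "ru": ("ция", "сть", "ние"),
}


def tokenize(text):
    """Crude tokenizer: lowercase runs of word characters."""
    return re.findall(r"\w+", text.lower())


def text_features(text, lang):
    """Average word length, type/token ratio, and nominalization rate."""
    tokens = tokenize(text)
    suffixes = NOMINALIZATION_SUFFIXES[lang]
    nominalizations = [t for t in tokens if t.endswith(suffixes)]
    return {
        "avg_word_length": sum(len(t) for t in tokens) / len(tokens),
        "ttr": len(set(tokens)) / len(tokens),
        "nominalization_rate": len(nominalizations) / len(tokens),
    }


sample = "The development of the movement surprised the observers"
features = text_features(sample, "en")
print(features)
```

Since the type/token ratio depends on text length, comparisons across texts are usually made on samples of equal size; a lemmatizer would be needed to count inflected nominalizations in Russian.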

Information about the predicted communicative functions in a corpus can be used in assessing their register profile. Table 26.3 presents the lexicogrammatical features most closely associated with the respective communicative functions in ruWac. The absence of + and − in this table indicates the absence of statistically significant correlation; their presence indicates the degree of correlation with respective communicative functions (Sharoff 2021).

Table 26.3 Comparison of register features in Russian using ruWac

Features	A1 Argument	A4 Fiction	A8 News	A11 Personal	A12 Promotion	A14 Academic
Type–token ratio		+ +	+ +	− − −	+ +
Word length		− − −	+ + +	−	+ +
Adverbs			− −		+ +	−−
Conjunctions	+ +	−	+
Discourse particles		−		+ +	− −	−
Nouns		+		− −	+ +
Nominalizations	+ +	− −		− − −		+ +
Prepositions		−	+ +	−		−
Pronouns, 1p	− −	−	− − −	+ + +		− − −
Pronouns, 2p		+	− −		+ +	− − −
Pronouns, 3p		+ + +	− −	− −	− −	− −
Pronouns, WH-	+ +			−
Verbs, past		+ + +	+ +	+ +	− −	− −
Verbs, present		+ +		+		+
Attributive adjectives	+ +		− −		+ +
Negation	+				− −	− −
Subordinate clauses	+	−		−	+

Each communicative function has its own profile with respect to the features. For example, the Russian argumentative texts (A1) in comparison to other functions show higher rates of conjunctions, nominalizations, wh-pronouns, and attributive adjectives. In total this indicates the most typical features associated with expressing argument in Russian. The unexpected register profile of the promotional texts (A12) concerns their similarity to fairly formal texts with the higher rates of attributive adjectives and subordinate clauses, denser noun phrases, longer words, and the higher type–token ratio (TTR). For example, the following sentence from a typical promotional text in Russian has a long list of noun phrases and a range of different, often infrequent words without repetitions (leading to a higher TTR): Lučšie partnerskie programmy dlja zarabotka na vašem sajte s oplatoj za kliki, procenta s prodaž, SMS partnerki … (‘The best partnership programs for earning on your website with payment through clicks, sales royalty, SMS partnerships …’).

Data from Table 26.3 can also be used to investigate differences between reasonably similar communicative functions. Examples from fiction (A4) are often used as a substitute for representing everyday language. At the same time, the table shows how Russian fiction is different from personal reporting in blogs (A11) by the distribution of discourse particles, nouns, pronouns, and TTR. The differences indicate much better planning of linguistic constructions employed by fiction authors (e.g. the higher density of nouns) and the use of more varied lexicon (TTR) in comparison to more spontaneous, often repetitive interaction in personal blog entries. At the same time, personal blog entries exhibit a much higher rate of first person pronouns and discourse particles which indicate less objective reporting and lesser formality in comparison to fiction.

The functions of A4, A8, and A11 are all related to a more general communicative function of narration about past events. From the viewpoint of features, this is expressed in their register profile by the rate of verbs in the past tense. At the same time, they exhibit important differences from each other. For example, texts classified as A8 (news reporting) show a higher rate of prepositions (which are often used for expressing spatiotemporal circumstances that are especially important for news reporting) and a lower rate of pronouns, which indicate a very different context of narration in the case of news reporting in comparison to either fiction or personal reporting.

The register profile of academic writing is clearly different from other functions, including the A1 argumentative texts which come from newspaper opinion columns and argumentative blogs. Academic texts in Russian demonstrate considerably lower rates of adverbs, prepositions, verbs in the past tense, and negations (unlike other argumentative texts), which indicate the lack of the narrative context of academic argumentation. Also, the low rate of negations can be linked to the aim of obtaining positive knowledge (what is true) and the greater formality of academic writing. Surprisingly, TTR, which often correlates with formality and difficulty in English, does not emerge as a defining feature of Russian academic texts, possibly because formal academic texts often use repetitive constructions.

According to the corpora of Serbian scientific texts from the eighteenth and nineteenth centuries, academic discourse in the Serbian language is characterized by a large number of distinctive features. For example, the present tense is used more than other tenses (Ova škodljiva trava/zadušuje konoplju ‘this destructive grass/smothers hemp’); verbal forms are expressed in a personal way (1st person/plural: Sve navedene pojave/svrstavamo u dve klase … u radu ćemo istražiti … ‘All the cited phenomena/we classify into two classes … in the work we will study …’); de-agentivized (passive) constructions ‘should + infinitive’ are used in texts that give guidelines to the readers (seme/treba dobro osušiti ‘one should dry the seed well’, treba napisati šest jednačina ‘one should write six equations’); nominalized constructions are frequently used (promene se najčešće dešavaju pri spavanju ‘changes most frequently occur during sleep’); and so on (Stojanović 2014). These features lead to the expressiveness of the contemporary style of Serbian scientific writing (Stojanović 2014: 85–119).

26.4 Variation with Respect to Communication Media

There is also an aspect of variation which is not often addressed in traditional accounts of sociolinguistic variation. Digital communication, in addition to providing a better window into sociolinguistic research (such as the GICR corpus discussed above), has also resulted in changes in the use of language. The relevant technological innovations on the Web include all forms of user-generated content, such as Wikipedia or review-sharing websites, as well as social media. They all offer more opportunities for individual contributions, with a relatively low threshold for participation in comparison to traditional communication and distribution channels. At the same time, the new affordances are first developed and then accepted by a small proportion of the population, primarily within the younger, more affluent, and better educated groups. In the case of Russian this has led (among other things) to the much greater use of both borrowing and calques from English, for example, sajt (‘website’), klik (‘click’), and partnerki (‘partnerships’). A little later some of the innovations gain enough popularity to lead to their standardization, for example, sajt belongs to the top 500 words in the general corpus of ruWac, as this word can be used in the most formal discourse contexts: Materialy konferencii budut razmeščeny na oficial’nom sajte konferencii v vide èlektronnogo sbornika i razmeščeny v RINC (‘The conference materials will be published on the official website of the conference as an electronic publication and will be included in the Russian Science Citation Index’).

Because of the difference in the alphabet between English and Russian, expression of some borrowings can be made easier by combining the Cyrillic and Latin characters when the more difficult combinations of Latin characters are copied, thus creating a mix of spellings, for example, call-центр (the call center), web-сайт (the website), and Android’ом (with Android).
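Mixed spellings of this kind can be located in a corpus automatically. The following minimal sketch flags tokens that contain letters from both scripts, using the Unicode character names from the Python standard library; the function names are hypothetical, and only the Latin and Cyrillic scripts are distinguished.

```python
import unicodedata


def scripts_in(token):
    """Return the set of scripts (Latin, Cyrillic) used by the letters of a token."""
    scripts = set()
    for ch in token:
        if not ch.isalpha():
            continue  # skip hyphens, apostrophes, digits
        name = unicodedata.name(ch, "")
        if name.startswith("LATIN"):
            scripts.add("Latin")
        elif name.startswith("CYRILLIC"):
            scripts.add("Cyrillic")
    return scripts


def is_mixed_script(token):
    """True if the token combines Latin and Cyrillic letters."""
    return scripts_in(token) == {"Latin", "Cyrillic"}


examples = ["call-центр", "web-сайт", "Android’ом", "сайт", "website"]
mixed = [t for t in examples if is_mixed_script(t)]
print(mixed)
```

The same check also flags accidental mixtures, such as a Latin c typed in place of the Cyrillic с, so the output needs manual inspection before it is counted as borrowing.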

The new ways of using language also lead to the emergence of new cultural phenomena, which can start their relatively unhindered development in the absence of gatekeepers (at least at the initial stage of their development). For example, many orthographic features have been borrowed into Russian social media from English, such as an extensive use of smileys, punctuation combining exclamation and question marks (!?!?), repeated characters (Daaaa!!, ‘yeeees!!’), and texts written in the upper case or with strikethroughs (Piperski & Somin 2013).
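Some of these orthographic features are regular enough to be detected with simple patterns. The sketch below covers three of them (repeated characters, combined exclamation and question marks, and words in upper case); the patterns, the threshold of three identical characters in a row, and the function name are assumptions made for illustration, and smileys and strikethroughs are not handled.

```python
import re

# Three or more identical word characters in a row, e.g. "Daaaa".
REPEATED_CHAR = re.compile(r"(\w)\1{2,}")
# A run of marks containing both "!" and "?" next to each other, e.g. "!?!?".
MIXED_MARKS = re.compile(r"[!?]*(?:!\?|\?!)[!?]*")


def orthographic_markers(text):
    """Flag the presence of three informal orthographic features in a text."""
    return {
        "repeated_chars": bool(REPEATED_CHAR.search(text)),
        "mixed_marks": bool(MIXED_MARKS.search(text)),
        # A word of at least two characters whose cased letters are all upper case.
        "all_caps": any(len(w) > 1 and w.isupper() for w in text.split()),
    }


print(orthographic_markers("Daaaa!!"))
print(orthographic_markers("ЧТО!?!?"))
```

Rates of such markers per thousand words could then be compared across author groups or communicative functions in the same way as the lexicogrammatical features.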

On the Russian Internet, there is also a case of creative spelling in the subculture known as padonki (lit. ‘scoundrels’ with the first vowel misspelled, o → a), also known as Olbanskij yazyk (Olbanian language, this time misspelling A → O). This is related to, though not directly borrowed from, the cacography tradition of making deliberate spelling or writing mistakes, usually with the purpose of mockery. What is specific to the Russian subculture case is a very extensive development of rules for violating the expected norms of Russian spelling, such as spelling a as o and vice versa, turning v to ff (for example, podonkov → padonkoff, ‘scoundrels’ in plural genitive), etc., often with the aim of expressing a phonetically similar string in a way which is maximally different from the accepted orthographic norms (Krongauz 2013).

27 False Cognates

27.1 Introduction and Definitions

The term false cognate refers to pairs of words in two languages or language varieties (such as dialects) perceived as having similar form and non-identical meaning. If more than two languages are included, one can talk about sets of words. We use this term rather than false friend, given that it is a more specialized linguistic term.

This concept comprises cases such as English gift vs. German Gift ‘poison’, or BCS palac ‘thumb’ vs. Rus. palec ‘finger’, where words in two languages share form while exhibiting different meanings. Inasmuch as the problem of false cognates arises primarily in translation, most designations of the concept refer to ‘false friends of the translator’, as in French faux amis du traducteur, German falsche Freunde des Übersetzers, Cze. falešní přátelé překladatele, Pol. fałszywi przyjaciele tłumacza, and Rus. ložnye druz’ja perevodčika (found also in shorter forms: faux amis, falsche Freunde, ložnye druz’ja). The fact that such words can generate mistakes is underlined in many terms, including the German irreführende Fremdwörter, lit. ‘misleading foreign words’, the Cze. mezijazykové falsiekvivalenty, lit. ‘interlingual false equivalents’, and zrádné slovo, lit. ‘deceptive word’, and the Pol. złudny odpowiednik, lit. ‘deceptive equivalent’, and odpowiednik pozorny, lit. ‘misleading equivalent’.

All the aforementioned terms approach the concept from a psychological and applied linguistic perspective – they encompass all cross-linguistic pairs with a potential of generating false equivalence. A different perspective is observable in the Pol. term tautonim, the Cze. mezijazyková homonymie and Rus. mež”jazykovaja omonimija, lit. ‘interlingual homonymy’, and the BCS međujezički paronim, lit. ‘interlingual paronym’, and međujezički homonim, lit. ‘interlingual homonym’. They refer to a contrastive linguistic category, regardless of their translational and/or psychological functioning.

This points to the following two overlapping phenomena:

(1) false cognates – a relation between two words, each from its own language (or language variety), which can cause cross-linguistic false equivalence – a psycholinguistic and applied linguistic category;
(2) cross-linguistic paronyms – a relation between two words, each from its own language (or language variety), with a similar form and different meaning – a contrastive linguistic category.

The languages or varieties considered here need to have something in common. In Slavic languages, the link that is the sine qua non for the existence of false cognates is inherited Proto-Slavic vocabulary and the so-called internationalisms (the vocabulary from the shared Greco-Roman cultural circle). The first ground for the relationship is dominant in inter-Slavic false cognates, the second in Slavic-to-non-Slavic false cognates.

There are those who see false cognates as a subset of lexical parallels, that is, the words that coincide in form and may or may not coincide in meaning. More information about this approach can be found in Dubičyns’kij & Reuter (2011, 2015, 2020), as well as in Kozdra & Dubichynskyi (2019).

Being perceived as having similar form in Slavic languages presupposes phonological correspondences between the two words in question. The forms of the two words need to include one of the following or a combination of them:

(a) sounds occupying approximately the same place in the phonological systems of their languages, which means that suprasegmental and phonotactic features, such as positional palatalization and final devoicing, may be different;
(b) reflexes of the same Proto-Slavic sound; or
(c) standard phonological and morphological adaptation of a borrowed word.

To exemplify the first category, (a) above, let us consider LSo. sad ‘fruit’ and Pol. or Rus. sad ‘orchard’, where all sounds occupy approximately the same places in the phonological systems of these languages. Within this category there are examples such as BCS bólnica ‘hospital’ vs. Sln. bolníca ‘female patient’, where suprasegmental features (such as stress placement) are different. Different phonotactics can be exemplified by Cze. mrav [mraf] ‘custom, deportment, manners’ vs. BCS mrav [mrav] ‘ant’, with final devoicing in Cze. and a lack of it in BCS. The second situation, see (b), can be seen in the fact that Macedonian danok ‘tax’ corresponds with BCS danak ‘levy’, and Slovene dánek ‘day (diminutive)’ given the correspondences of the development of jers (o:a:e), established in the chapters on vowels and consonants, although the BCS and Macedonian lexemes are etymologically related to PSL *dânь ‘tribute, tax’, while the Slovenian diminutive is linked with PSL *dьnь. Finally, English scan in general and technical senses, and Pol. skanować ‘scan, in a technical sense only’ exemplify the third category, as a is the standard adaptation of the English æ sound in Polish, and the -ować suffix is one way to adapt borrowed English words in Polish. Phonological development is covered in this volume in Chapters 2, 3, and 5; phonological adaptations in borrowing are covered in Chapter 25 in this volume.

The importance of false cognates in Slavic languages is that they represent results of semantic development from Proto-Slavic to present-day Slavic languages, covered in this volume in Chapter 24, and lexical borrowing, covered in Chapter 25. The topic of false cognates in Slavic languages has commanded considerable interest in theoretical and practical research. The list of references at the end of the chapter lists various Slavic-to-Slavic and Slavic-to-non-Slavic dictionaries, papers about Slavic false cognates, and even papers about false cognates in Slavic dialects.

27.2 Types of False Cognates

Slavic languages feature large areas of commonality in their vocabularies (body parts, measures, kinship terms, flora, fauna, etc., which is covered in Chapter 23 in this volume). Golubović & Gooskens (2015) explored mutual intelligibility between the six Slavic languages in the European Union and obtained a variety of scores in different tasks intended to measure understanding of texts in another Slavic language, ranging from 9.52 percent to 96.52 percent. The ability to understand another Slavic language is limited by various factors. For example, some languages are closer to one another than others, understanding of certain types of texts is easier than with other types, etc. False cognates are definitely one of the factors that hamper inter-Slavic communication.

The lexical relationship in a pair of false cognates can take different configurations. First, there is a difference between full and partial false cognates. In some pairs, the two words do not share any meanings; their entire semantic structure is in the relationship of false cognates. For example, Pol. angielski ‘English’ and Rus. angel’skij ‘angelic, cherubic’ are full false cognates. On the other hand, Pol. broda ‘beard, chin’ and standard contemporary Russian boroda ‘beard’ exemplify partial false cognates, given that only a part of the semantic structure (in this case the sense of ‘chin’) engages in the relationship of false cognates. Second, some false cognates are monodirectional, others are bidirectional. The aforementioned pair broda : boroda is monodirectional. Only when one translates from Polish into Russian does the relation of false cognates exist (with the meaning ‘chin’). Translating from Russian into Polish will not involve false cognates, as the two languages share the sense of ‘beard’. The example angielski : angel’skij exemplifies the bidirectional type. Translators working from Polish into Russian and from Russian into Polish alike will have to pay attention to false cognates.
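The two distinctions just drawn (full vs. partial, monodirectional vs. bidirectional) can be restated as a comparison of sense inventories. The following is only a minimal sketch under simplifying assumptions: senses are represented by English glosses, and the names Word and classify are hypothetical helpers introduced for illustration, not part of any existing tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    language: str
    form: str
    senses: frozenset  # English glosses standing in for senses (an assumption)


def classify(a: Word, b: Word) -> dict:
    """Classify a pair of formally similar words by comparing their sense sets."""
    shared = a.senses & b.senses
    # A translation direction is risky when the source word has a sense
    # that the look-alike word in the target language lacks.
    risky = []
    if a.senses - b.senses:
        risky.append(f"{a.language}->{b.language}")
    if b.senses - a.senses:
        risky.append(f"{b.language}->{a.language}")
    if not risky:
        return {"type": "not false cognates", "direction": "none", "risky": []}
    return {
        "type": "partial" if shared else "full",
        "direction": "bidirectional" if len(risky) == 2 else "monodirectional",
        "risky": risky,
    }


broda = Word("Pol", "broda", frozenset({"beard", "chin"}))
boroda = Word("Rus", "boroda", frozenset({"beard"}))
angielski = Word("Pol", "angielski", frozenset({"English"}))
angelskij = Word("Rus", "angel'skij", frozenset({"angelic", "cherubic"}))

print(classify(broda, boroda))
print(classify(angielski, angelskij))
```

On the two pairs discussed above, the sketch labels broda : boroda as partial and monodirectional (risky only from Polish into Russian) and angielski : angel’skij as full and bidirectional. Real sense inventories are of course not reducible to matching glosses, so the set comparison is a deliberate simplification.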

The pairs of false cognates also differ as to the underlying cause of the relationship. The three main mechanisms behind the emergence of false cognates in Slavic languages are the split of an inherited Proto-Slavic root, lexical borrowing, and word formation. The split, the first aforementioned mechanism, can be seen in the development of Proto-Slavic words *batę (meaning ‘father, brother, relative, uncle’, depending on the language), and *životъ (meaning ‘life’, ‘stomach’, ‘body’, depending on the language). Borrowing can be seen in BCS fleka ‘stain’ vs. Pol. flek ‘heel tip’, where both members of the pair are German loanwords, but their meanings differ. Finally, Sln. mostišče ‘pile dwelling, bridgehead’ vs. Rus. mostišče ‘bridge (augmentative)’ exemplifies word-formation differences, in this particular case two different functions of the suffix -išče, referring to the place in Slovene and denoting the augmentative in Russian. Concrete pairs of false cognates can also include a combination of these factors. Word-formation differences can also emerge when the two words are derived from different stems, for example Rus. xiščnik ‘vulture, predator’, derived from -xitit’ ‘to plunder’, coinciding with Slovene hišnik ‘janitor’, related to Slovene hišen ‘residential’ and hiša ‘house’. The aforementioned factors can be concurrently present in a pair of randomly connected false cognates; thus BCS dah ‘breath’ vs. Pol. dach ‘roof’ is a product of the development of a PSL root in BCS and borrowing from German in Polish.

There is a scale of difference in the meaning of false cognates, ranging from subtle differences in the same general subject-matter field, as in BCS penzija, Rus. pensija ‘retirement money’ vs. Pol. pensja ‘salary’, to loosely related meanings, as in Cze. život ‘life’ vs. Rus. život ‘stomach’, to completely unrelated meanings, as in the aforementioned example dah : dach. Subtle differences also exist in the stylistic sphere, when the two languages in question share their basic meaning. For example, both BCS plesati and Pol. pląsać mean ‘to dance’, yet the BCS word is neutral and its Pol. equivalent archaic.

27.3 History

False cognates in various lexical fields can be traced back even in the development of early Germanic and Roman loanwords. In some cases, semantic divergence can be related to the polysemy of the respective Proto-Slavic lexical items. For instance, PSL *duma ‘advice; thought; opinion’ (< Proto-Germanic *dōma ‘judgment’) has reflexes with divergent meanings, for example Rus. duma ‘thought; council’ and Bul. duma ‘word; conversation; thought’. In other cases, the diversification of the meanings of early loanwords has to do with idiosyncratic semantic changes that occurred in a specific language or a group of Slavic languages. For example, in Western Slavic languages PSL *kъ̏nęgъ ‘prince, ruler’ (< North-West Germanic *kuningaz ‘ruler’) develops a new sense of ‘priest’ (Pol. ksiądz, Cze. kněz, Slk. kňaz), while in Polabian t‘ėnądz, there was also an additional sense of ‘moon’.

The diversification of meaning that provoked emergence of Slavic false cognates can involve different degrees of semantic change from the original meaning. That is the case of the polysemy of the diverse adaptations of the Roman word *comes in Slavic languages. Its original meaning ‘companion’ most probably designated the official that represented the Roman power in late imperial provinces (Boček 2010). In feudalism, PSL *kъmetъ/*kъmetь, an early Roman loanword, underwent various semantic modifications, yielding Old Rus. ‘knight, warrior, member of a feudal lord’s retinue’ and Old Cze. ‘head of a family, village leader, representative of a village’. The traces of the former meaning, related to the high hierarchy position, are found in Bul. kmet ‘elected foreman; village headman’, Mac. kmet ‘village community leader’, and arch. Ser. kmet ‘land judge, village leader’. The designation of the person having a high position can be metaphorically linked with Cze. kmet and Slk. kmeť, both of them marked as archaic and literary, denoting ‘wise and experienced old man’.

However, in the history of Slavic languages the lexeme *kъmetъ/*kъmetь underwent semantic devalorization: one of the semantic components of kmet implied subservient relationships between the village leader and his ruler. The impact of that component was that the word, influenced by the functional metaphor, got the meaning BCS ‘serf, one who doesn’t have his own land’, and with the evolution of feudal relationships BCS ‘peasant who has to pay rent for his land’. Sln. kmet ‘peasant’ and Ukr. kmet ‘village dweller’ are nothing but results of further semantic widening after the word lost its feudal connotations. In contemporary Sln. kmet shows two semantic drifts: by metonymic shift (‘peasant’ → ‘his land and surroundings’), the non-nominative plural forms (na kmeteh, s kmetov) acquire the meaning ‘in the countryside/from the countryside’; metaphorically, Sln. kmet denotes also ‘pawn’ as the least important chess piece.

Another semantic shift, triggered by a metaphor ‘peasant’ → ‘someone who behaves like a peasant’, occurred in some Slavic languages where the word obtained pejorative connotations. Thus Pol. kmiot ‘vulgar person, boor’ and Sln. kmet ‘awkward, inelegant person’ were the furthest departures from the original meaning.

After having considered two cases of loanwords that had developed different meanings in various Slavic languages, we will pass to inter-Slavic examples of false cognates, resulting from diverse semantic mechanisms. The word *sъklèpъ is a deverbative noun from PSL *sъklepa̋ti ‘to join, to unite’ (Snoj 2016: 683), and its original meaning denotes ‘a spot where two elements are united’. As the result of semantic narrowing, Sln. sklep denotes ‘joint, connection between bones’. In West Slavic languages the starting point of its semantic development was Old Pol. sklep ‘vault’. By metonymic shift, the word could express also a place with a vault: Old Pol. sklep, Cze. sklep ‘basement, cellar, treasury’. Rus. sklep, a Pol. loanword adopted via Ukrainian, narrowed its meaning: it did not denote just any underground place but acquired the specific meaning ‘burial vault’. The semantic development in contemporary Pol. was different from that in Russian – motivated by metonymic shift, Pol. sklep became the general word for shops and stores, as they were often located in underground places (Boryś 2008: 551).

In contrast to all the above-listed cases, where sklep denotes concrete objects, Sln. sklep takes on an additional abstract meaning ‘final meaning, finding, conclusion’. That semantic shift was triggered by the following metaphor: in the eighteenth century, the verb sklepati ‘to join, to unite’ got the meaning ‘to combine thoughts’.

As has been demonstrated, in the history of Slavic languages, one can distinguish the following mechanisms of semantic development in inherited Proto-Slavic roots and in words borrowed from other languages:

(a) various types of transfers, such as metaphorical and metonymical shifts, drift from concrete to abstract meanings;
(b) widening and narrowing.

This can be seen in Pol. żywot ‘hagiography, the life of a saint’, which is narrowed down compared to Cze. život ‘life’, which maintains the inherited Proto-Slavic sense (for more information about this root, see Saenko 2021). In the homographic pair of Sln. grad ‘castle’ vs. BCS grad ‘city’, the latter language features a broader semantic widening than the former. In the example of Rus. list ‘leaf, sheet’ vs. Pol. list ‘letter’, Polish features metaphorical transfer while Russian retains the original sense.

In addition, maintenance and loss have played a considerable role in the emergence of false cognates. This can be exemplified with the pair Ukr. ložyty ‘lay, put’ vs. BCS ložiti ‘heat, make a fire’, where the former maintains the inherited Proto-Slavic sense, while the latter loses it and features a further semantic development. This is not the case with the pair Rus. vrednyj ‘harmful’ vs. Mac. vreden ‘valuable, worthy, diligent’, as these units refer to two different Proto-Slavic roots: the former is related to PSL *vrědьnъ ‘harmful’, while the latter is linked with the PSL *verdьnъ ‘worthy’. The derivatives of *verdьnъ ‘worthy’ were in some Slavic languages exposed to additional semantic shifts, for example the comparative adverbial form urẹ:dnẹ in the Slovenian dialect of Upper Savinja Valley has acquired the meaning ‘cheaper’ and not ‘worthier’ (Weiss 1990: 102).

For more information about semantic development see Chapter 24 in this volume.

False cognates are occasionally generated by paronymic attraction, that is, the change of form caused by the similarity with another word. For example, the Pol. word dorożka ‘droshky, a horse carriage’ is a false cognate with Rus. dorožka ‘narrow road’ owing to the fact that the Rus. borrowing drożki ‘droshky, a horse carriage’ has been adapted in Pol. into dorożka in consequence of paronymic attraction with Rus. dorožka.

27.4 Inter-Slavic vs. Slavic and Non-Slavic

The distribution of the aforementioned mechanisms in the emergence of the Slavic-to-Slavic false cognates is substantially different from that found in the emergence of Slavic-to-non-Slavic false cognates. Šipka (2015) compared false cognates between BCS and Polish with those between BCS and English. It turns out that in the former category (Slavic-to-Slavic false cognates) the split of an inherited Proto-Slavic root accounts for 91 percent of the cases, borrowing for 7 percent, and word formation for 1 percent. In the latter group (Slavic-to-non-Slavic false cognates), borrowing accounts for 97 percent of the cases and word formation for only 3 percent. Most commonly, the borrowing is from Latin in both languages (51 percent of the cases); English borrowings in BCS comprise 22 percent of the cases, and borrowings from other languages 24 percent. This disproportion has consequences in applied linguistics. False cognates will be an issue very early in the process of teaching Slavic languages to Slavs (given that inherited roots are found mostly in frequent vocabulary items). In teaching Slavic languages to non-Slavs, false cognates will become frequent much later, given that most of the borrowings are in the sphere of abstract vocabulary.
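The percentages quoted from Šipka (2015) can be cross-checked with a few lines of arithmetic. The sketch below only restates the figures given in the text; the dictionary labels are shorthand introduced here, and the reading that the three source-language figures are shares of all BCS–English cases (rather than of the borrowings alone) is an inference from the fact that they add up to the 97 percent reported for borrowing.

```python
# Shares in percent, as reported in the text for Šipka (2015).
slavic_to_slavic = {"root split": 91, "borrowing": 7, "word formation": 1}
slavic_to_non_slavic = {"borrowing": 97, "word formation": 3}
# Breakdown of the BCS-English borrowing cases by source (labels are shorthand).
borrowing_by_source = {
    "Latin, in both languages": 51,
    "English, into BCS": 22,
    "other languages": 24,
}


def total(shares: dict) -> int:
    """Sum the percentage shares of one distribution."""
    return sum(shares.values())


print(total(slavic_to_slavic))      # itemized BCS-Polish shares
print(total(slavic_to_non_slavic))  # itemized BCS-English shares
print(total(borrowing_by_source))   # source breakdown of the borrowings
```

The BCS–Polish shares add up to 99 rather than 100; the text does not itemize the remaining percentage point, which may simply reflect rounding.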

27.5 Research Tradition and Dictionaries

The topic of false cognates in Slavic languages has aroused considerable interest in theoretical and practical research. There are dictionaries and glossaries of the Slavic-to-Slavic false cognates (Belin’kaja 2015, Birbrajer 1987, Bunčić 2020, Čemerikić et al. 1988, Grabčikov 1980, Ivanova & Aleksić 2007, Kononenko & Spivak 2008, Lewis 2016, Lewis 2002, Lotko 1992, Lytov 2020, Szalek & Nečas 1993, Orłoś 2006, Sedakova 2005, Sedakova 2008, Stojković 2004, Šušarina 2015, Tokarz et al. 1994, Tokarz 1998, 1999, Uryadov 2015, Vlček 1966, Vyxota 2004, Wijas 2014, Žuravlev & Zaxarov 1977). There are studies of Slavic-to-Slavic false cognates (Franek 1998, Bunčić 2000, Dubičinskij & Rojter 2015, Ilievska 2006, Kalenić 2001, Karpaczewa 1987, Konickaja 2011, Lewis 2008, Lewis 2015, Lobkovskaja 2012, Opačić 1995, Peti-Stantić 2014, Popović & Trostinska 1988, 1989, Soglasnova 2018, Xucišvili 2006, Xucišvili 2010). There are furthermore papers on Slavic inter-dialectal false cognates (Blažeka 2013, Blažeka & Franc 2017).
There are dictionaries of Slavic-to-non-Slavic false cognates (Akulenko 1969, Borisova 2002, Dubičinskij & Rojter 2011, Gotlib 1985, Kanonič 2001, Kovačević 2009, Krasnov 2004, Paxotin 2006, Vyxota 2002). Finally, there are papers about Slavic-to-non-Slavic false cognates (Ivir 1968, Lipczuk 1988, Šipka 1991, 2015).

27.6 Conclusion

The study of false cognates in Slavic languages has its applied and theoretical importance. In applied linguistics, false cognates are a formidable hurdle in inter-Slavic second language acquisition, as they are encountered very early in the process of instruction, unlike Slavic to non-Slavic language instruction, when most of them appear much later in the process. Theoretically, the study of false cognates is important as they represent the final point of various processes in the history of the lexicons of individual Slavic languages: semantic developments, lexical borrowing, and word-formation processes. As such, they encapsulate not only the aforementioned processes but also the dynamics of interaction between them.

While false cognates seem to be well documented between most major Slavic languages, the need for further research still exists in the following three areas. First, the gaps where there is a conspicuous absence of Slavic-to-Slavic and Slavic-to-non-Slavic dictionaries of false cognates should be filled. Second, the applied linguistic ramifications of false cognates should be elucidated. Finally, theoretical work on linking false cognates with semantic development in the lexicon of Slavic languages and on their typological classifications should be conducted.

28 Dialectal Fragmentation

28.1 The Concept of Dialectal Fragmentation

As a metaphor for the description of language variation over time, the notion of dialectal fragmentation presumes that there is a relatively homogeneous whole (a language) which in the course of time undergoes separation into pieces (dialects) due to a variety of historical circumstances. In this respect, the notion is analogous to the Stammbaum (branching diagram) conception of linguistic development, which starts from an artificially reconstructed point of departure (here Proto-Slavic [PSL], ca. fifth century AD), then gradually branches out into increasingly diverse, yet genetically related dialects (here East, West, and South Slavic, whose representatives are attested, aside from earlier toponyms, directly and variously from the tenth to twelfth centuries AD).

This conceptualization is of course rather schematic. Since all attested human languages exhibit some degree of variation, we must assume that protolanguages conform to this pattern. The generalization applies to PSL (particularly in view of its numerous tribal sub-affiliations, including at least one, whose name *Xъrvat- ‘Croat’, is ultimately of Iranian origin, thus dating back to the prolonged period of close Slavic-Sarmatian contact north of the Black Sea, ca. 700 BC–AD 300). Unambiguous supporting evidence for this assertion can only be traced back to the early period of the Slavic Migrations (seventh to eighth centuries) from the probable Slavic homeland, which was originally situated to the (north)east of the Carpathians, although some resettlement to the mid-Danubian basin had occurred by the time of the Avar Khaghanate in the seventh century (Nichols 1993). These migrations were part of the great movement of peoples which attended the collapse of the Roman Empire, marking the transition from late Antiquity to the early Middle Ages. By the end of this period, the Slavic speech continuum extended from the Elbe River and the Baltic Sea (also NW Russia) in the north to the Peloponnesus in the south, and as far east as the Dniepr.
This territorial expansion (noted in the mid-sixth century AD by the Gothic historian Jordanes and the Byzantine historian Procopius, with reference to the Sklavenoi and Antai, as well as by John of Ephesus, who describes the Slavs as present in “all of Greece, Thessaly, all of Thrace” in AD 581–584) not only intensified prior contact with Baltic and Germanic in the northeast and northwest, but also brought the Slavs into contact for the first time with a number of other languages, including Finnic in the north, as well as Balkan Romance (chiefly Romanian and Aromanian), Albanian, and Greek in the south (not to mention possible surviving remnants of Paleo-Balkan languages other than Proto-Albanian). This geographical dispersion facilitated rapid and territorially differentiated linguistic change in the Slavic speech area. The effects of dispersion were augmented by those of actual division, when in the last centuries of the first millennium AD the South Slavs were gradually separated from their northern kinfolk as the Bavarian kingdom extended farther eastward (mid-eighth century), and the Magyar (Hungarian) kingdom incorporated most of Great Moravia in the early tenth century.

Some developments which we will consider partially link West and South Slavic (e.g. the ‘jugoslavisms’ in Central Slovak or the lenition of *g > γ in parts of northern West South Slavic), others point to an earlier separation of West Slavic from the other branches (as in particular treatments of the Second Velar Palatalization and *tl, *dl consonant clusters), still others oppose South Slavic to both West and East Slavic (sometimes referred to as ‘North Slavic’), as in the conservative retention of the reflexes of nasal vowels in certain desinences. At times the residue of earlier changes manifests itself in the form of discontinuous isoglosses (as in Carpathian-Balkan lexical parallels) or more fragmented, tessellated formations (as in the various endings for the third person singular and first person plural in the present tense). Finally, interaction with different non-Slavic languages (Finnic, Germanic, Venetian, and the many languages of the Balkans) has contributed to a mixture of divergent and parallel innovations in the Slavic languages which participate in the process. As a result, we find perfect past tense formations with ‘have’ not only in Balkan Slavic, but also in colloquial Czech, or the merger of palato-alveolar voiceless affricates with pre-palatals or dentals not only in northern Russia, but also on the Dalmatian coast.

The reader is advised that although the first sections of this chapter focus more on early dialect differences, the chapter is not intended to serve as a historical outline of the development from Proto-Slavic to the modern daughter languages (for this, see e.g. Schenker 1993, Greenberg 2017). Instead, it seeks merely to illustrate and provide a minimal historical context for some of the kinds of dialectal fragmentation which have been documented in Slavic languages. The topics covered include the following: PSL palatalizations; other PSL dialect differences in phonology, prosody, and morphology; pre-triadic variation (morphology); early East Slavic tribal dialects; lexical isoglosses across Slavic; inter-Slavic areal features; prosodic continua in West and South Slavic; the role of external factors (economic, political); the effects of contact with non-Slavic languages (Finnic, Germanic, Venetian, the Balkan Sprachbund); and finally, the impact of sociolinguistic factors (religious confession, gender). While most of the topics and illustrative material concern phonology and prosody, I have also made some reference to morphology, syntax, and the lexicon. Under the assumption that the reader has no prior familiarity with the historical phonology (or grammar) of Slavic, I have also provided a modicum of explanatory background where it seemed necessary.

28.2 Notes on the Transcription of Proto-Slavic and Old Church Slavonic

When citing PSL reconstructed forms, I follow current common practice, striking a balance between a precise phonetic reconstruction and typographic considerations, which are intended to minimize or avoid the use of special IPA symbols. This principally concerns the PSL ‘back low’ vowel, whose phonetic reconstruction would be [ɔ], but for which we use *a (as in Greenberg 2017, Vermeer 2014); correspondingly, the PSL ‘front low’ *e probably resembled [ε] or [ä]. Further, in representing PSL vowels, we use the macron to indicate length /ā, ē, ū, ī/ and leave the low short vowels unmarked /a, e/, while marking the short high vowels (ŭ, ĭ), as a concession to tradition and a reminder that these are the ancestors of OCS ъ, ь, respectively.

For Old Church Slavonic (hereafter OCS), we retain the letters ъ (back jer, pronounced roughly as the vowel in English ‘but’) and ь (the front jer, pronounced roughly as in ‘bit’), but transliterate the jat’ vowel as ě (phonetically probably [ʲæ], with a sub-phonemic palatal on-glide), and the jery <ы> as y (phonetically [ɨ], Rus. ы). Unless otherwise indicated, when referring to the hushing consonants, č = [tʃ], š = [ʃ], ž = [ʒ] (pronounced with varying degrees of secondary palatalization or retroflexion), I will use the term ‘palatal’ here in a broader traditional sense (as in Vermeer 2014), rather than only to refer to IPA [ɕ, ʝ, c, ɟ]; c = [ts]. The affricates ć, dź are pre-palatal, as in BCS and Polish. Czech <ě> usually indicates [jε], for example běs [bjεs] ‘demon’, except after /m/, where město = [mɲεsto], and after dentals, where it indicates an IPA palatal stop, for example děd [ɟet] ‘grandfather’.

Generic geographical abbreviations: N = north, NE = northeast, NW = northwest, S = south, SE = southeast, SW = southwest.

28.3 Early Differences in the Proto-Slavic Palatalizations

The clearest early evidence for PSL dialectal differentiation at the time of the Slavic Migration is provided by the so-called Second Velar Palatalization and the Jot Palatalization of PSL *dj, *tj (Greenberg 2017, Schenker 1993, Shevelov 1965). These changes are best understood in the overall context of PSL palatalizations.

At a stage in the development of PSL which had been reached in the first half of the first millennium AD, but prior to the Slavic migrations, the existing velar obstruents (the stops *g, *k, and the voiceless fricative *x) could occur in front of both back and front vowels (*gadŭ ‘year’: *genā ‘woman’, *kala ‘wheel’: *kela ‘forehead’, *xaldŭ ‘cold’: *xestŭ ‘sixth’). Over the period of approximately the fifth to eighth centuries, this distribution was fundamentally realigned due to a series of palatalizations, by which velars adapted their pronunciation to following and (more rarely) preceding front vowels (compare the different [k] sounds in English ‘cool, cold’ vs. ‘kill’, or the even more emphatic alteration of historical *k in ‘chilly’, a cognate of ‘cool’, as well as the difference between the velar and palatal fricatives in German Ach [ax] ‘Ah!’ and ich [iɕ] ‘I’). These changes occurred in two historically distinct stages, known as the First and Second Velar Palatalizations (hereafter, 1st and 2nd Vel Pal). The two stages were formally separated by the Monophthongization of all inherited diphthongs (hereafter, Mono), a change which provided a fresh stock of roots and endings with long *ē. According to the current consensus, an integral part of the 2nd Vel Pal also includes what has traditionally been referred to as the Third Velar Palatalization (3rd Vel Pal). The 3rd Vel Pal yields the same reflexes as 2nd Vel Pal, but is caused by a preceding (high) front vowel under various conditions (including the value of the following vowel).

The outcomes of the velar palatalizations in OCS are in most respects typical of the majority of Slavic dialects (see Table 28.1).

Table 28.1 Reflexes of velar palatalizations in Old Church Slavonic

	1st Vel Pal	2nd Vel Pal	3rd Vel Pal
*g	ž- žena ‘woman, wife’	(d)z- (d)zělo ‘very’	-(d)z kъnę(d)zь ‘prince; priest’
*k	č- čelo ‘forehead’	c- cělь ‘whole’	-c otьcь ‘father’
*x	š- šestъ ‘sixth’	s- sědъ ‘grey’	-s vьsь ‘all’

The relative chronology of the 1st Vel Pal, Mono, and 2nd Vel Pal can be established by comparing the behavior of the voiceless velar *k- at four reconstructed stages in its PSL development in position before the back vowel *-a [ɔ] and the front vowel *-e [ε] in the following combinations: *ka-, *ke-, *kei-, *kai- (see Table 28.2). Of particular interest here is the different behavior of the Indo-European e-grade and o-grade cognate diphthongs in PSL *keist- > *čeist- (by 1st Vel Pal) > *čīst- ‘clean’ (by Mono) vs. *kaist- > *kēst- (by Mono) > *cěsta (by 2nd Vel Pal, thus not *čěsta > časta) ‘road’ (i.e. ‘a cleared path’, e.g. Cze. cesta, Sln. césta).

Table 28.2 Relative chronology of First Velar Palatalization, monophthongization of i-diphthongs, and Second Velar Palatalization

		Diachronic progression (stage a > stage b, etc.)
PSL		1st Vel Pal >	Mono >	2nd Vel Pal	AD 850^a	gloss
*k-	*kala >	*kala	*kala	*kala	*kolo	‘wheel’
	*kela >	*čela	*čela	*čela	*čelo	‘forehead’
	*kei̯sta >	*čei̯sta	*čīsta	*čīsta	*čisto	‘clean’
	*kai̯stā >	*kai̯stā	*kēstā	*cēstā	*cěsta	‘road’

^a 850 = a few years before Constantine and Methodius’s mission to Great Moravia.
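The argument from relative chronology can be replayed mechanically by treating each change as a string rewrite and applying the rewrites in a fixed order. The following is a minimal sketch under stated assumptions: only the reflexes of *k are modeled (as in Table 28.2), the later changes leading to the AD 850 forms are left out, and the function names and the counterfactual ordering are illustrative devices introduced here, not part of the reconstruction itself.

```python
import re

GLIDE = "i\u032f"   # non-syllabic i, as in the diphthongs ei̯ and ai̯
FRONT = "eēiī"      # front vowels that trigger velar palatalization


def first_vel_pal(form: str) -> str:
    """1st Vel Pal: *k > *č before a front vowel."""
    return re.sub(f"k(?=[{FRONT}])", "č", form)


def mono(form: str) -> str:
    """Mono: *ei̯ > *ī and *ai̯ > *ē."""
    return form.replace("e" + GLIDE, "ī").replace("a" + GLIDE, "ē")


def second_vel_pal(form: str) -> str:
    """2nd Vel Pal: any *k still standing before a front vowel > *c."""
    return re.sub(f"k(?=[{FRONT}])", "c", form)


def derive(form: str, rules) -> list:
    """Apply the rules in order and return every intermediate stage."""
    stages = [form]
    for rule in rules:
        stages.append(rule(stages[-1]))
    return stages


ATTESTED_ORDER = (first_vel_pal, mono, second_vel_pal)
COUNTERFACTUAL = (mono, first_vel_pal, second_vel_pal)  # hypothetical ordering

for start in ("kala", "kela", "ke" + GLIDE + "sta", "ka" + GLIDE + "stā"):
    print(" > ".join(derive(start, ATTESTED_ORDER)))

# With Mono placed first, *kai̯stā would feed the 1st Vel Pal.
print(" > ".join(derive("ka" + GLIDE + "stā", COUNTERFACTUAL)))
```

In the attested order the four inputs end as kala, čela, čīsta, and cēstā, matching the 2nd Vel Pal column of the table. If monophthongization is instead placed before the 1st Vel Pal, the ‘road’ word comes out with č- (čēstā), the outcome that the text rules out, which is why Mono must fall between the two palatalizations.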

This chronology is consistent with the fact that later loanwords associated with the adoption of Christianity by the Slavs exhibit 2nd Vel Pal rather than 1st Vel Pal before front vowels, for example OCS crьky, Rus. cerkov’ < PSL *kьrky ‘church’, Gen. *kьrkъve < West Germanic *kirikō Fem. < Greek kurikon Adj. ‘the Lord’s’. The PSL forms here probably date to the mid-eighth century, when Carantanian Slovenes could have borrowed the word in its West Germanic form (see Pronk-Tiethoff 2013).

As shown in Table 28.3, the 1st Vel Pal (*g > *ž, *k > *č, *x > *š) and Mono (more specifically, *ai- > *ě₂) yield uniform reflexes across all of Slavic. Note that the subscript in *ě₂ is merely a linguistic convention used to indicate the diphthongal origin of the vowel *ě in certain lexemes and morphemes. This vowel is identical phonetically to *ě of monophthongal origin, although this identity is partially obscured by a late PSL dissimilation, whereby *čě- (from 1st Vel Pal) > č’a, etc. in almost all of Slavic (e.g. *krīkētei ‘to scream’ > *kričěti > Rus. kričát’ vs. *kesŭ ‘time’ > OCS (Zographensis) čěsъ). The same change occurred in *cě₂-, etc., but only in Bul.-Mac. dialectal cal, pl. cali (*cě₂l-) ‘whole’ (vs. standard Bul. c’al, celi), as well as dzalo (*dzě₂lo) ‘very’, etc. in a few medieval SW Bul.-Mac. texts (Scatton 1976). The identity of the sound in question is confirmed by the fact that the OCS alphabet uses the same letter (called jat’) to designate both *ě and *ě₂. Early Slavic loanwords from the period of the Slavic migration in Greece (primarily toponyms) show no surviving velars in the position for the 1st Vel Pal, nor any PSL diphthongs (see further below).

Table 28.3 Summary of Slavic velar palatalizations

	PSL^a	West Slavic Czech	South Slavic OCS	East Slavic^c Russian	Gloss
1st Vel Pal	*Gen-	žena	žena	žená	‘woman’
	*Kel-	čelo	čelo	ORus. čelo	‘forehead’
	*Xest-	šestý	šestъi	šestój	‘sixth’
2nd Vel Pal	Gail → Gě2l-	OCze. zielo	dzělo ~ zělo	ORus. zělo	‘very’
	Kail → Kě2l-	celý	cělъ	cel	‘whole’
	Xaid → Xě2d-	OCze. šiedivý	sědъ	sedój	‘grey’
	Gwaizd → Gvě2zd-	hvězda^b	dzvězda	zvezdá	‘star’
	Kwait → Kvě2t-	květ	cvětъ	cvet	‘flower’
3rd Vel Pal	kuninG → kuni͕g-	kněz	kъnęzь	kn’az’	‘prince; priest’
	*atiK-	otec	otьcь	otéc	‘father’
	*viX-	OCze. vše	vьse	vse	‘all’ neut

^a Capital letters [G, K, X] in proto-forms indicate the velars which are affected by the Palatalization in question.

^b Cze. h < *g by lenition (a separate development of *g, not of *z), cf. Pol. gwiazda.

^c The 2nd Vel Pal is absent in Old Novgorod-Pskov and, dialectally, after *kv- and *gv- (> *hv-) in Ukr.

In contrast, the 2nd Vel Pal (again Table 28.3) already shows several different kinds of reflexation in the Slavic dialects. The differences involve various limitations on the occurrence of the 2nd Vel Pal (in West Slavic and dialectal East Slavic), as well as a particular treatment of *x- (again in West Slavic).

Type 1: *g-, *gv- > (d)z, (d)zv, *k-, *kv- > c, cv, *x- > s (OCS, South Slavic, and most of East Slavic). This most common type articulates all reflexes as dentals/alveolars. It also treats *kv-, *gv- the same as *k-, *g-. It is probable that an intervening ‘pre-palatal’ stage (dź, ć, ś) preceded the alveolar outcome (see Vermeer 2014).
Type 2: *g > (d)z, *k > c (as in type 1), but *x > š (i.e. a palatal rather than a dental) and the velars remain in *kv-, *gv- (West Slavic).
Type 3: This rare type does not undergo 2nd Vel Pal, but retains the velars before the reflex of *ě₂ as [k’, g’, x’] in all positions (NW Russian), for example Old Novgorod-Pskov kěle ‘whole’ (Rus. cel), xěrь ‘grey cloth’ Fem. (Rus. séryj ‘grey’, OCze. šěř ‘grey cloth’), Xědovo (toponym) (Rus. sedój ‘grey’), lęgi Impv. ‘lie down, go to bed’ (-i₂ in the Impv. Sg. is a particular development of Indo-European *-ōi, rather than *-ēi), gvězda ‘star’, květ ‘flower’. The modern dialect evidence indicates that the velars here underwent palatalization, but did not shift their fundamental place of articulation, thus k’, g’, x’ (Zaliznjak 2004).
Type 4: This particular Ukrainian dialect type is the same as Type 1, except that *kv- and *gv- exhibit diffuse regional variation between preservation and standard East Slavic palatalization of the velars for this position, as is reflected in a small number of examples, such as (1) Ukr. kvit ‘flower’ ~ cvit ‘blossom; flower’ (kvit is more common in Kiev-Poles’e dialects), and (2) dialectal zvizda ~ hvizda (the lexeme itself is better attested in SW Ukrainian, mainly as zvizda, which occurs in Church Slavonic and folklore, but is generally replaced in most of Ukrainian by zorja, attested since the thirteenth century). Note that the distinction in the plural which Russian expresses with cvetý ‘flowers’ vs. cvetá ‘colors’ is rendered by Klymentij Zinovijiv (a late seventeenth-century Ukr. poet) through květy vs. cvěty (Shevelov 1979: 56–58, as well as Map 28.2 for approximate isoglosses). The semantic complexity of the problem can be seen in the co-occurrence of the two types in the Carpathian Ukrainian dialect of Torunj (Nikolaev & Tolstaja 2001), where c’v’it, Gen. c’v’ítu, pl. c’v’itɷ́ means ‘color of foliage on a tree; blossoming; color (in general)’, and the plural signifies both ‘flowers’ and ‘colors’, whereas the plurale tantum kv’ítɷ, Instr. -amɪ means only ‘flowers’, for example v’inók … s kv’ítamɪ ‘a wreath of/with flowers’, and the corresponding sg form is the diminutive Fem. kv’ítka ‘a flower’. Meanwhile, the underlying verb shows only the 2nd Vel Pal (cf. Inf. cvɪstɪ́, 1p.pres cvɪtú, 3p -é).

For still other East Slavic dialect types, some of which even differentiate the reflexes of 2nd and 3rd Vel Pal from one another, see Nikolaev (1988).

Due to limitations of space, we will not discuss dialectal differences in the treatment of *skě₂-clusters (e.g. *voskъ ‘wax’, *na voskě ‘on the wax’ > OCze. voščě ‘wax’ Loc., OCS voscě; see e.g. Shevelov 1979, Vermeer 2014).

To conclude this section, we will consider a different type of PSL palatalization (of somewhat later date than those described above), one which contributed perhaps even more significantly to early Slavic dialectal fragmentation. I refer to the so-called ‘Jot Palatalization’ of the dentals in the combinations *tj, *dj, as in Late PSL *světja ‘candle’ and *medja ‘boundary’, respectively (from *svaitiā, *mediā by glide formation [^j] of the derivational morpheme {i} in position between a stop and a vowel, a typologically common development).

In general, *tj and *dj exhibit four different kinds of reflexes (see Table 28.4):

Type 1 (East Slavic): Merger with the result of the 1st Vel Pal (thus, č, ž in ESl, except for č, dž with archaic retention of an initial stop, in parallel with the voiceless reflex, in SW Ukr.; less commonly in South Slavic – č, dž in Torlak, whereas č, but j [a palatal glide] in Slovene);
Type 2 (West Slavic): Merger with the 2nd Vel Pal (thus, WSL c, dz in Slovak and Polish, but c, z in Czech and Sorbian, although the merger is incomplete in Slovak, due to lenition in the reflex of the 3rd Vel Pal; contrast Slk. *dj > medza vs. *-g > peniaze by 3rd Vel Pal);
Type 3 (Western South Slavic): A distinct pre-palatal reflex, which arguably continues an earlier phase in the PSL development (ć, dź [spelled đ] in BCS, t́, d́ in some Čakavian and (rarely) Torlak dialects, which also attest ḱ, ǵ);
Type 4 (Eastern South Slavic): Clusters of the type št, šč, (rarely šk’) < *tj, *dj > žd, ždž, (rarely žg’), which bear witness to an earlier stage of gemination (ćć, dźdź), with subsequent dissimilation of the initial affricate. More specifically, OCS: certainly št, perhaps earlier šč; Bulgarian: nearly always št, žd, rarely šk’, žg’; Macedonian: šč/št, žđž/žd (in the Southwest and far East), but more commonly ḱ, ǵ (in the North, most of the Central West, and therefore also standard Macedonian). The last-mentioned reflexes in the standard language are due to the influence of Serbian ć, đ (perhaps even ḱ, ǵ or t́, d́, due to the proximity of Southeastern Torlak), which increased in the fourteenth century and particularly after Skopje became the Serbian capital, whereas the older forms št, šč, ždž, žd are more consistently maintained in peripheral Southwestern and many Eastern dialects. The two sets of reflexes exhibit a north–south gradation and a lexicalized distribution, for example Central West synonyms snošti ~ sinok’a ‘last night’ (*sь-nokti), but also differentiated derivatives, such as praštilo ‘drawstring’ ~ praḱa ‘slingshot’ (both from *prati- ‘send, throw’). Unusual for Macedonian is the pairing of *tj > š’č’ (guréš’č’o ‘hot’) and *dj > ž (with lenition, e.g. méža ‘boundary hedge’) in the remote SW village of Vrnik in eastern Albania (Schallert 2005).

Table 28.4 Reflexes of *tj, *kt’, *dj in modern Slavic dialect areas

		West	South				East
	Late PSL	West	Sln	BCS	Torlak^a	Bul-Mac^b	Rus
*tj	*světja ‘candle’	Pol. świeca	svéča	svijèća *svēt́ȁ Čak	svečá	svešti, svešči sveḱi (pl.)	svečá
*kt’ *gt’	noktь ‘night’ mogtь ‘might’	Pol. noc moc	nọ̑č mọ̑č	nȏć mȏć	noč moč	nošt, nošč; noḱ mošt, moḱ	noč moč
*dj	*medja ‘boundary’	miedza Pol. medza Slk. meza Cze.	mέja	mèđa	medžá	meždá, meždžá mežg(’)a, meǵa	mežá medža^d
Merger with Vel Pal?		2nd (except Slk.)^c	1st (*tj)	–	1st	–	1st
Lenition of voiced reflex		Czech	+	Kaj, WŠtok?	–	rare	all (?)

^a Torlak here = easternmost Torlak (Timok-Lužnica).

^b Mac. reflexes.

^c Cf. Slk. medza (*dj- > dz) vs. peniaze ‘money’ (*g with 3rd Vel Pal > z).

^d SW Ukr. medžá (Shevelov 1979).

As is evident from the above, lenition (spirantization) of the initial voiced stop in the reflex of *dj is widely attested in all three branches of Slavic: (1) [ž] in nearly all of ESl and (quite rarely) in southern dialects of Macedonian, (2) [z] in Czech and Sorbian, and (3) [j] (with complete lenition to palatal glide) in Slovene, as well as in adjacent Čakavian and (to varying degrees) some of West Štokavian.

The reflexes of *tj are joined by those of the cluster *kt’ (including *gt’ > *kt’ by early devoicing assimilation).

28.4 Other PSL Dialect Differences

Among other notable late PSL dialect differences (phonological, prosodic, morphological), we may cite the following.

Archaic retention of the PSL clusters *tl, *dl is typical of West Slavic (Cze. pletla ‘she knitted’, mýdlo ‘soap’), whereas in East and South Slavic the initial dental stop is lost (Rus. plelá, mýlo, BCS plèla, mȉlo, Sln. molíti ‘to pray’). Exceptions here are NW Russian, in which the stop is not lost, but dissimilates to a velar (*tlešč- > kleščь ‘bream’) and NW Slovene (mọ̑dli̥m ‘pray’ 1p sg., kridwo ‘wing’), whereas Central Slovak is congruent with South Slavic plela, šilo ‘awl’ (*ši-dlo), one of several ‘jugoslavisms’ which typify this zone of Slovak.

Innovations in the treatment of closed syllables ending in ‘liquid (r, l) + obstruent’ (the so-called *TORT and *ORT groups) fall into two categories, viz. root structural (for *TORT) and prosodic-vocalic (for *ORT). For the first type, *golva ‘head’ > East Slavic pleophony, as in Rus. golová (for complications in the development of Ukrainian, see Garde 1974; for further variation, including Rus. dial. polymja ‘flame’, see Zaliznjak 1985, 2004) ~ West and South Slavic share metathesis, but with different vocalism, even within West Slavic, see Pol. głowa ~ Cze. hlava, Bul. glavá (for Lechitic a preliminary stage *gǝlowa has been proposed). For the second type, we find that the treatment of *ORT depends on the PSL tone: PSL acute tone *őrdlo ‘plow’ > BCS short falling pitch rȁlo, North (= East + West) Slavic /a/ Pol. radło, ORus. ralo vs. PSL circumflex tone *ȏlkъtь masc. ‘elbow’ > BCS long falling pitch lȃkat ~ North Slavic /o/ Pol. łokieć, Slk. loket’, Rus. lókot’ (but cf. Central Slovak laket’, again as in South Slavic). Since /a/ is the late PSL reflex of PSL long *ā, whereas /o/ reflects short *ă, East Slavic in the course of metathesis appears to have converted the PSL acute in this position to length, but the PSL circumflex to brevity. This is the opposite of the South Slavic quantitative reflexes of these same two PSL tones (at least in disyllabic word forms). Nor is the observance of such a prosodic difference in early East Slavic at all surprising in light of differences such as koróva ‘cow’ from late PSL *kőrva (acute) vs. górod ‘town’ from late PSL *gȏrdъ (circumflex).

There are also a small number of North–South dichotomies, two of which involve the treatment of late PSL nasal vowels in auslaut in specific morphological environments (Pres. Act. Participle: OCS nesy ‘carrying’ ~ ORus. nesa, gen. sg and nom-acc. pl in ja-stems: OCS zemlę ‘earth, land’ ~ ORus. zemlě, OPol. ziemie, where the absence of a nasal vowel in OPol. is strongly diagnostic) and one which concerns the replacement of *-a (the anticipated o-stem masc.instr.sg, as probably in the adverb *vьčera̍ ‘yesterday’) either with the ŭ-stem desinence (as in North Slavic, cf. ORus. bog-ъ-mь ‘god’, Pol. bratem ‘brother’, for the Polish reflex of *ъ, cf. sen ‘sleep, dream’ < *sъnъ) or a freshly minted o-stem equivalent (as in South Slavic, cf. OCS bog-o-mь, BCS bratom). There are a very limited number of exceptions with the reflex of *-ъmь in South Slavic (attested in a tiny number of peripheral southern Štokavian dialects, see Ivić 1994).

28.5 Pre-triadic Variation

Other candidates for evidence of PSL dialect fragmentation include some whose areal distribution differs at times markedly from the triadic type.

One such piece of evidence can be found in the idiosyncratic areal distribution of the two competing suffixes (original *-nū- and secondary *-nūn-, as per Andersen 1999) for Slavic verbs formed with the n-suffix (cf. respectively Slovene sah-ni-ti ‘dry (intrans.)’ from -ny- < *nū, vs. BCS sah-nu-ti from *-nǫ- < *-nūn). The common element *-nū- in each form represents a lengthened grade variant of *-nu-, itself a zero grade of the IE *neu- (for the lengthened grade, cf. Greek deik-nū-mi ‘I show’ vs. zero-grade pl. deik-nu-men ‘we show’, for *-neu̯, cf. the Slavic past passive participle formation type mi-nov-en ‘past’). The archaic *nū-type occurs only on the western periphery of the West and South Slavic speech territories (cf. West Slavic: Polabian, Upper Sorbian, Silesian; South Slavic: Slovene, Čakavian, some Kajkavian, West Štokavian), which suggests that it was brought there by an early wave of settlers from the east and south, before the newer *nǫ-reflex began to spread to the rest of Slavic (for the pathways of migration determined by recent archaeology, see Baran 1990, whose map is redeployed in Andersen 1999).

A second example of apparent pre-triadic variation is the case of three distinct endings for the third person singular present tense (thematic *-tь, pronominal *-tъ, and originally injunctive [?] *-t > zero), all of which occur in our earliest written sources (tenth to twelfth centuries), exhibiting an areal distribution in several respects incongruent with the threefold East–West–South classification (thus, in the Postscript to the Old Russian ‘Ostromir Gospel’, the Novgorod (?) scribe Grigorij writes da iže gorazněe sego napiše ‘but if anyone can write better than this’, using an East Slavic 3p.sg.pres dialect form napiše, even though throughout the main body of the Gospel text he has consistently used the ‘standard’ ORus. thematic ending -tь, which survives widely in modern dialects, as well as in i-verbs in standard Ukrainian, rather than Old Church Slavonic -tъ; see Miller (1988) for the linguo-geographical distribution and history of the question, as well as Olander 2015). Of particular note is the retention of -t in Western Macedonian (as in OCS) down to the present day, despite zero in the rest of South Slavic.

A third example of a scattered distribution, also inconsistent with the modern triadic grouping, is evident in at least three of the four attested first person plural endings, see *-mъ (OCS, Bul. [-im, -em], Rus.); *-mo (BCS, Slovene; Central Slovak; Ukrainian, since at least the fourteenth century); *-me (Czech, Slovak, Bulgarian [-ame], Macedonian, Ukrainian West Galician dialect); *-my (Polish, Sorbian, and as a variant in some of OCS and medieval Bulgarian) appears to be secondary (for PSL reconstructions, see Vaillant 1966 Vol. 3, Olander 2015). Two factors which initiated or supported modern vocalic endings are (1) the merging of 1sg -mь and 1pl -mъ in the numerically small, but highly frequent athematic verbs (whence -my in early Bulgarian, by analogy to the first plural pronoun), and (2) the analogical spread of the athematic 1sg -m(ь) (cf. dām ‘give’, ěm ‘eat’, věm ‘know’, esm̥ ‘am’, imām ‘have’) in South and West Slavic to some or all of the remaining verb classes, which would have increased the motivation for endings with a vowel following -m (e.g. BCS prȍsīm : prȍsīmo ‘ask’, čìtām : čìtāmo ‘read’; Mac. kradam : krademe ‘steal’ vs. Bul. kradá : kradém).

28.6 Early East Slavic Tribal Dialects

The Old Russian Novgorod birchbark documents (twelfth to fifteenth centuries) attest an interesting array of distinctive archaisms, including (1) nom.sg.masc -e (vs. -ъ in all the rest of Slavic; see Vermeer 1994, Zaliznjak 2004), (2) the absence of the 2nd Vel Pal (for examples, see above), and (3) the retention of PSL *tl-, *dl- clusters (in the dissimilated form kl-, gl-, e.g. Old Novgorod kleščь ‘bream’, jogla ‘fir tree’, vs. Rus. leščь, jelь, but with a structural resemblance to Polish jodła and segmental identity with East Baltic cognates, such as Lithuanian ẽglė, Latvian egle). These traits set the Old Novgorod dialect apart from the rest of East Slavic and in the case of the nom.sg and the 2nd Vel Pal from the rest of Slavic as a whole. According to one theory, these archaisms indicate that the East Slavic Kriviči tribe, who composed the Slavic population in the western part of the Pskov-Novgorod regions in the tenth century, had separated from the main body of Proto-East Slavic speakers during the beginning of the Migration period, then re-established contact with other East Slavic tribes as the latter moved northward. Yet another Northwestern Russian dialect trait, which is already attested in the earliest Novgorod documents, is tsokan’je (the pronunciation of *č of diverse PSL origin as [c’] or [c], e.g. c’isto ‘clean’, skac’eš ‘you leap’ ~ Rus. čisto, skačešь), attributed by some scholars to assimilation of a Finno-Ugric substratum. The widespread occurrence of this innovation primarily in northernmost Russian dialects overlaps considerably with the domain of the Novgorod Republic (twelfth to fifteenth centuries) with its extensive colonial outreach.

Other innovations contributed further to the differentiation of parts of Northwestern Russian, such as the formation of the so-called ‘Second pleophony’ (s’arép for Russian serp ‘sickle’) and a low-mid, non-palatalizing front-to-central value for the back jer *ъ when it occurred before the palatal glide (as in the nom.sg.masc.adj desinence *-ъjь, e.g. molodέj/-ə́j < *mold-ъjь ‘young’, Rus. molodój, or in the root vowel, e.g. mέju/mə́ju < *mъjǫ ‘I wash’, Rus. móju) or before palatalized sonorants (as in the residual lexeme odέn’je ‘stave at bottom of barrel; bale of hay’ < *o-dъn-ьje, where *dъn- is the same etymon as in Rus. dno, pl. dón’ja ‘bottom’ < *dъn-o). For further discussion of these and other archaic traits, including some (such as *květъ > t’v’et (sic), and ‘exotic’ oxytonesis in tučá ‘rain cloud’, bur’á ‘storm’, vol’á ‘will’ vs. Rus. túča, búr’a, vól’a) which arguably can be related to the formation of the Vjatiči tribal dialect (situated non-cotangentially to the southwest of the Kriviči in the eleventh to thirteenth centuries), see Nikolaev (1994).

28.7 Early Lexical Differences across and within Slavic Languages

Here we offer a small sampling of lexical differences which exhibit significant linguo-geographical patterning within Slavic.

East, South ~ West (Bräuer 1961, Skok 1971–1973): (1) ‘feast, wedding feast, (patron) saint’s day feast’, Rus., OCS, BCS pȋr, pȉjer (Bosnia), Sln. pȋr ~ Pol. gody, uczta; Cze. svátek, hostina, (2) ‘watch, look at’, Rus. smotret’, OCS sъmotriti, BCS mòtriti ~ *patrěti > Pol. patrzeć, Cze. patřit ‘to belong, to be fit for’ (cf. hledět, dívat se ‘look at’), for the semantic shift ‘look at’ > ‘watch out’ > ‘belong to, pertain’, see Skok 1971–1973, and (3) ‘wedding’ Rus. svad’ba, BCS, Bul., Mac. svadba, ~ Pol. ślub (but Cze. svatba).

South ~ East, West (Sławski 1962): (1) *žica ‘thread’ ~ Rus. nit’, Pol. nić, (2) *slana ‘hoar-frost, frost’ ~ Pol. szron, Rus. inej, (3) *loza ‘grape-vine’ (whereas Pol. łoza ‘osier, Salix cinerea’), (4) *kaniti ‘invite > intend’, (5) *brojь ‘number’ ~ Rus. čislo, Cze. číslo, Pol. liczba, and (6) *gaziti ‘wade (at a ford), tread’ ~ *bresti, *broditi ‘wander’ > *brodъ ‘ford’ > Rus. brod, Pol. bród.

Within South Slavic, lexical differentiation allows us to distinguish an innovating Center (BCS, especially Štokavian) from a more conservative Periphery (Slovene and to some degree Čakavian in the northwest, as well as Bulgarian-Macedonian in the southeast; Tolstoj 1974/1997, Popović 1960): (1) *vatra : *ognь ‘fire’, (2) *kyša : *dъzdjь ‘rain’, (3) *pravъ : *desnъ ‘right (hand)’, (4) *znojь : *potъ ‘sweat’ (but cf. also S Bul. znoj), (5) *čadъ/*čad’ъ : *sadja ‘soot’, (6) *ručati : *obědati ‘dine (eat the main meal)’, and (7) *govoriti : *gъlčěti ‘to speak, talk’ (the latter is restricted to the Prekmurje dialect in easternmost Slovene and various Bulgarian dialects, such as Mizija in the northeast, the central Rhodopes, and some of the Stara Planina, including dialects adjacent to Torlak, where it is again attested in eastern Torlak; for *gъlčěti and other verba dicendi, see Schallert & Greenberg 2007).

Certain Eastern Bulgarian innovations, which are also reflected in standard Bulgarian, are not generally found in the rest of South Slavic, including Western Bulgarian, Macedonian, and sometimes Southern Bulgarian (Stojkov 1993): (1) ‘leg’ *korkъ > Bul. krak ~ *noga pan-Slavic, including Mac., W Bul., S Bul., and (2) ‘to pluck, pick’ *kǫsati (PSL ‘bite’) > Bul. kǝ́sa ~ *rъvati ‘pluck; tear’ > Rus. rvat’, Pol. rwać, SW Mac. (Kostur) ǝ́rve 3sg ~ *ky(d)nǫti (PSL ‘throw’) > BCS, Mac. kine, W Bul. kinem (note that *ky(d)nǫti is itself an innovation, when viewed in the general Slavic perspective; see Schallert 2001, BDA-OT 2001, Lexicon). There are an even larger number of lexical oppositions of smaller geographical scope, which tend to distinguish Eastern Bulgarian from the West and sometimes also the South (Stojkov 1993), although the landscape is sometimes complicated by additional terms (BDA-OT 2001), for example (1) ‘hot’ goréšto : žéžko, (2) ‘loom’ stan (also some NW): razboj (also SE Thrace), (3) ‘do not (prohibitive imperative)’ nedéj (some NW): nemój (also the South), both attested, though usually in different contexts, in OCS: nedějь ‘do not’, ne mo[d]zi < *mogi ‘can’t’, and (4) ‘I’ az : jas, ja.

Northern ~ Southern Russian (Kuraszkiewicz 1963): (1) ‘plow a field’ > orat’ ~ *paxat’, (2) ‘horse’ koń, komoń ~ lošad’, (3) ‘wolf’ volk ~ b’ir’uk, (4) ‘squirrel’ vékša ~ bélka, (5) ‘rooster’ petúx (as well as pevún [center-north], petún Pskov-Novgorod) ~ kóčet (although mainly limited to the SE, this form nonetheless extends north as far as Kostroma, whereas the areal of petux continues from the North into the center and southwest; see Kasatkin et al. 2012: map 6), and (6) ‘ashes’ pópel ~ zola (note that variation between popel and pepel is also widely attested in Slavic, see Saenko 2017: especially figure 1).

There are a number of lexical isoglosses which tend to link Carpathian Ukrainian with ‘Balkan Slavic’ in sensu largo and which tend to overlap with very old accentual developments (see Nikolaev 1994 for a basic list, as well as the relevant literature, especially Bernštejn 1963; the ‘Balkan’ areal here basically encompasses W Bul., Mac., S, E, and N Štokavian, but not W Štok., Čakavian, Slovene, nor Kajkavian): *bričь ‘razor’, *žirъ ‘acorn, beech nut’ (especially for feeding pigs); note that žir in Macedonian is dialectal, whereas the standard form is želad, as in Bul. žǝ́lǝd, Rus. ž’ólud’, which etymologically appears to be the older of the two forms; *těrati ‘seek, chase after, drive’, *jarьmъ ‘halter, harness’, *osojь ‘sunny side of hill’, *tǫča ‘hail’, *sъnokti ‘yesterday evening, last night’.

28.8 Inter-Slavic Areal Features

Some phonological (and prosodic) developments in Slavic dialects exhibit an areal distribution over adjacent portions of different branches of Slavic. A well-known example is the lenition of voiced (lax) *g > γ (voiced velar fricative) [> h] in Upper Sorbian-Czech-Slovak-Ukrainian-Western Belarusian-Southern Russian, as well as some of Slovene and Northwest Čakavian in Croatia (i.e. those South Slavic dialects which were geographically contiguous pre-historically in Pannonia with southernmost West Slavic). The dialects affected by lenition occupy a central position in the Slavic linguistic continuum, in contradistinction to dialects of the conservative periphery, where *g does not lenite, namely NW West Slavic (Polish, Lower Sorbian, Kashubian, and extinct Polabian), most of South Slavic, and the rest of East Slavic. The lenited forms are attested in Slovene toponyms by the end of the tenth century, but since the fifteenth century have receded in the eastern and central regions (Greenberg 2000).

However, an analysis of the phonological environments and areal distribution of the *g > γ change indicates an important difference in one of the positions of its occurrence (see Andersen 1969, whose treatment of the question we follow here, while leaving aside differences in the phonetic details of the lenited *g, cf. γ vs. h). Whereas the outcome is generally uniform in (1) intervocalic position (*bogatъ > *boγat) and (2) following a so-called ‘weak jer’ (*sъgoda > *zγoda), in (3) the position following fricative *z, we find two distinct reflexes, namely (a) *mozgъ ‘brain’ > *mozg- (Resia [NW Sln.], E Moravia, Slk., Ukr., most of Bel., which form a contiguous ‘center within the center’) vs. (b) *mozγ-/*mozh- (NW-SSl other than Resia, Cze., N Bel., S Rus., where Czech represents the western ‘periphery of the center’, while N Belarusian and S Russian form the eastern edge thereof). The retention of -zg- is somewhat unexpected in the geographical center of the lenition, where one would anticipate *-zγ-.

Here we must bear in mind that for much of the first millennium AD the PSL inventory for permissible consonant clusters (such as -zg-) was limited to ‘fricative + stop’ (e.g. -zd-, -zg-, -st-, -sk-; an exception is the sequence of voiceless fricatives -sx-, as in *iz-xod- ‘go out’ > OCS isxoditi, isxod ‘exit’ and the like). This situation changed dramatically toward the end of the millennium when a large number of new clusters entered the language as a result of the so-called ‘loss of weak jers’, whereby the mid-high lax vowels ъ and ь were dropped in ‘weak’ position, that is, word-finally (*sъnъ ‘sleep, dream’ > Rus. son, Cze. sen, etc.) or (as pertinent here) when followed by syllables containing full vowels (*sъ-goda ‘early’ > Sln. zgodaj, *vъzgorьje ‘small hill’ > Rus. vzgor’je, *tъkati ‘to weave’ > Rus. tkat’, *bьdenьje ‘vigilance’ > Rus. bdenie). It was the creation of the new consonant clusters issuing from the fall of weak jers that removed the prior constraint upon combinations other than ‘fricative + stop’, thus opening the gate for *zγ- and providing us with a crucial diagnostic for the relative chronology of the lenition.

One can then apply the concept of variable rule-ordering (in our case, the two sound changes ‘lenition’ and ‘jer drop’) to the three positions of relevance for the treatment of *g in leniting dialects as described above (see Table 28.5). Assuming that lenition preceded the loss of weak jers in the ‘centermost’ dialect group (Chronology I), *g > γ would have occurred only in the first two positions, with the relative chronology for the second position being *sъgoda > *sъγoda > *sγoda > *zγoda (by voicing assimilation between fricatives), since /γ/ had been created before coming into contact with the preceding fricative /s/. Such is not the case in the third position, where the ‘fricative + stop’ rule blocked the change *mozgъ > *mozγъ.

Table 28.5 Relative chronologies of the lenition of *g > γ and the fall of weak jers

Relative chronology I				Relative chronology II
Center: EMoravia, Slk, Ukr, most Bel				Periphery: Cze (West), NBel, SRus (East)
PSL	*bogatъ ‘rich’	*sъgoda^a ‘early’	*mozgъ ‘brain’	PSL	*bogatъ ‘rich’	*sъgoda ‘early’	*mozgъ ‘brain’
Lenition	boγatъ	sъγoda	mozgъ	Jer fall	bogat	zgoda	mozg
Jer fall	boγat	zγoda	mozg-	Lenition	boγat	zγoda	mozγ-

^a For dialects in which sъ goda does not occur, one may substitute *sъ gory ‘down from the hill’.

Conversely, if lenition occurred after the loss of jers (Chronology II, which applies to the two peripheries, i.e. Czech in the west, and Northern Belarusian, Southern Russian in the east), then nothing would prevent the occurrence of *sъgoda > *sgoda > *zgoda (by voicing assimilation) in the first stage, after which the ensuing phase of lenition (stage two) could occur without positional constraints, resulting in *boγat-, *zγoda, *mozγ-. This also indicates that the leniting impulse occurred as a wave, which passed from the center of the change to the periphery, rather than constantly recycling within the center.
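
The two relative chronologies in Table 28.5 can be restated as a short Python sketch of variable rule-ordering. This is an illustration under simplifying assumptions rather than a model of the actual sound laws: every jer in the three sample forms is treated as weak, the ‘fricative + stop’ constraint is represented by a flag that is switched on only while the jers are still present, and the names lenition, jer_fall, and derive are hypothetical.

```python
import re

FRICATIVES = "sz"  # fricatives that can precede the stop in the sample forms


def lenition(form, cluster_constraint):
    """*g > γ; while the 'fricative + stop' constraint holds, g after s/z is kept."""
    if cluster_constraint:
        return re.sub(rf"(?<![{FRICATIVES}])g", "γ", form)
    return form.replace("g", "γ")


def jer_fall(form):
    """Drop the jers (all weak in these examples), then assimilate s > z before g/γ."""
    form = form.replace("ъ", "").replace("ь", "")
    return re.sub(r"s(?=[gγ])", "z", form)


def derive(form, rules):
    """Apply the sound changes in the given order and return the final form."""
    for rule in rules:
        form = rule(form)
    return form


# Chronology I (center): lenition first, while the cluster constraint is active.
CHRONOLOGY_I = [lambda f: lenition(f, cluster_constraint=True), jer_fall]
# Chronology II (periphery): jer fall first, so lenition applies without constraint.
CHRONOLOGY_II = [jer_fall, lambda f: lenition(f, cluster_constraint=False)]

for psl in ["bogatъ", "sъgoda", "mozgъ"]:
    print(psl, derive(psl, CHRONOLOGY_I), derive(psl, CHRONOLOGY_II))
```

The only form whose outcome depends on the ordering is the third one: the sketch keeps mozg under Chronology I and yields mozγ under Chronology II, which is the diagnostic difference between the center and the two peripheries.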

A more circumscribed, but still powerful sphere of influence (this time in the domain of prosody) has been exercised by Polish, arguably upon Eastern Slovak (which, like Polish, has lost phonemic length in its vowel systems and generally stresses the penultimate syllable), but most certainly upon Lemko (from the idiosyncratic lexeme lem ‘only’) dialects of Ukrainian (spoken along the Polish-eastern Slovak border until 1947), where we usually find stress on the penult (po našómu ‘in our language’ [JS-conversation with Lemko émigré in Texas, 1990]; starósta ‘elder’, mołóda ‘young’ fem, každómu ‘to each one’ masc, na hołówu ‘onto (his) head’ (Kuraszkiewicz 1963); compare Russian (which is generally conservative in matters of accent) nášemu, stárosta, molodá, káždomu, na gólovu/ná golovu).

Similarly, but more specifically induced by southern (Małopolska) Polish dialects, along both sides of the Carpathians as far as the San and Už river basins (and including some of the Hucul Ukrainian dialects), we find a particular phonotactic voicing rule, whereby voiceless obstruents become voiced and voiced obstruents fail to devoice in word-final position before words beginning with a vowel, or with a sonorant (l, r, m, n) or the palatal glide (j), for example (for voiceless obstruents) Hucul pładž-abo-śmix (*plač a[l’]bo směx) ‘(you don’t know) whether to laugh or cry’, niz-mérzne (*nos mrzne) ‘(my) nose is freezing’, kid-utík (*kot utek) ‘the cat ran away’, tag-jeg- u̯ uyn (*tak jak on) ‘just like him’ (Kuraszkiewicz 1963).
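
The sandhi voicing rule just described can be sketched in Python as follows. The sketch is an approximation under stated assumptions: the input words (płač, nis, kit, and so on) are assumed pre-sandhi citation forms supplied for illustration, the voiceless-voiced pairs and the set of trigger segments are simplified, and the function name apply_sandhi is hypothetical.

```python
# Word-final voiceless obstruents and their voiced counterparts (assumed pairs).
VOICED = {"p": "b", "t": "d", "k": "g", "s": "z", "š": "ž", "c": "dz", "č": "dž"}

# Segments that trigger voicing when they begin the following word:
# vowels (plain or stressed), the sonorants l, r, m, n, and the palatal glide j.
TRIGGERS = set("aeiouyáéíóú") | set("lrmn") | {"j"}


def apply_sandhi(words):
    """Voice a word-final voiceless obstruent before a vowel, sonorant, or j."""
    result = []
    for i, word in enumerate(words):
        following = words[i + 1] if i + 1 < len(words) else ""
        if following and following[0] in TRIGGERS and word[-1] in VOICED:
            word = word[:-1] + VOICED[word[-1]]
        result.append(word)
    return "-".join(result)


print(apply_sandhi(["płač", "abo", "śmix"]))
print(apply_sandhi(["nis", "mérzne"]))
print(apply_sandhi(["kit", "utík"]))
```

A final obstruent that is already voiced is simply left unchanged by this function, which corresponds to the second half of the rule (voiced obstruents fail to devoice in this position).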

28.9 Prosodic Continua

The Slavic speech area is notable for its rich inventories of prosodic phenomena (including various combinations of pitch, quantity, and free word stress, as well as fixed or restricted stress). It is therefore not surprising that in some cases prosodic continua have formed across neighboring dialects within a single branch of Slavic or even within one language. Here we will consider two examples, both of which exhibit a longitudinal (here west–east) gradation of prosodic systems.

Our first example is drawn from West Slavic; we can diagram it as follows:

Czech > Central Moravian > Western Slovak > Central Slovak > Eastern Slovak > Sotak

Czech has fixed initial stress with unrestricted distribution of long and short vowels, but with a generally higher concentration of long root vowels in some morphological and prosodic classes in SW Czech (see Voráč 1955; note that in the examples cited below, Czech and Slovak long vowels are indicated with the acute diacritic). Central Moravian also has fixed initial stress, but with some shortening of high vowels (kúň/kuň ‘horse’, mlíko/mliko ‘milk’) and even of /a/ under the old acute tone (rak ‘crab’, krava ‘cow’, blato ‘mud’ vs. Cze. rák, kráva, bláto). West Slovak has initial stress, with regional variation in the quantitative reflex of the old acute (kráva ~ krava). Central Slovak (as well as Standard Slovak) combines initial stress with a ‘rhythmic law’, which tends to prohibit two successive syllables with long vowels by shortening the second of the two. Thus, for example, in Central Slovak even though the fem. def. adj. desinence is etymologically long due to vowel contraction (*-aja > *-ā), the manifestation of this length depends on the quantity of the preceding root syllable (cf. nová ‘the new (one)’ : krásna ‘the beautiful (one)’). East Slovak and Lacha Moravian show no long vowels and fixed penultimate stress, as in Polish. Finally, the peripheral (now mainly extinct) Sotak dialect has a locally variant mixture of free stress (as in Ukrainian, but with a different etymological distribution) and quantity. The Sotak dialect, which derives its name from the typical substitution of a fricative for an affricate (as in the interrogative pronoun [so] < *tso ‘what’), is little known except among Slovak dialectologists, despite its intrinsic historical and typological interest. Let us consider then the two main Sotak prosodic subtypes, both of which were attested in the Cirochou river valley in the mid-twentieth century.

A clear example of the ‘free stress’ Sotak type is the Humenné dialect of Modra nad Cirochou. Stress (marked here with the vertical ictus ̍) on the final open syllable is attested in PSL circumflex stems with original long root vowels: ucho̍ ‘ear’, meso̍ ‘meat’, hlava̍ ‘head’ nom., *hlavu̍ acc., z duba̍ ‘from the oak’, but not in na vo̍dźä ‘on the water’ (with short circumflex), or žä̍na ‘woman/wife’ (with short PSL oxytone) (see Liška 1968, but also with reference to other research published in the 1940s).

As an example of the Sotak type with phonemic quantity under the accent, compare the neighboring dialect of Dlhé nad Cirochou, where long vowels /ā, ǟ, ọ̄, ẹ̄/ are pronounced more closed than their short counterparts, but also about one-and-a-half times longer. (This is reminiscent of Polish ‘pochylenie’, in which the reflexes of some of the medieval Polish long vowels are typically qualitatively distinct from their short vowel counterparts, but without any quantitative distinction.) Stress, though generally unrestricted, does not occur on final open syllables (perhaps an early phase in the development of a paroxytonic system). For examples, see Table 28.6, again drawn from data in Liška 1968, who based his findings on spectrographic, oscillographic, and roentgenographic analysis of samples from the data he collected. For a derivation from PSL prosodemes, see Schallert 2011.¹

Table 28.6 Long : short vowel pairs in Dlhé nad Cirochou (Humenné Sotak dialect of ESk)

	PSL *a	PSL *ę	PSL ja, ě	PSL *o	PSL ě : e
Long	spāl’i ‘will burn’	p’ǟta ‘fifth’	p’ǟna ‘drunk’ fem.	pọ̄t ‘loft’ PSL *podъ	śčẹ̄p’u ‘I split’
Short	spal’i ‘they slept’	p’äta ‘heel’	p’äna ‘foam’	pot ‘sweat’	śčep’u ‘I graft’
L : S^a	1.63 : 1	1.46 : 1	1.56 : 1	1.53 : 1	1.40 : 1

^a L : S = relative duration of long vowels compared to their short vowel counterparts.
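As a quick arithmetic check on the statement that the long vowels are about one-and-a-half times longer than their short counterparts, the five ratios in Table 28.6 can be averaged. The snippet below is only an illustrative calculation over the tabulated values; the dictionary keys are informal labels for the table columns.

```python
from statistics import mean

# Long : short duration ratios as listed in Table 28.6 (column order preserved).
ratios = {"*a": 1.63, "*ę": 1.46, "ja, ě": 1.56, "*o": 1.53, "ě : e": 1.40}

average = mean(ratios.values())
print(f"mean L:S ratio = {average:.2f}")  # mean L:S ratio = 1.52
print(f"range = {min(ratios.values()):.2f} to {max(ratios.values()):.2f}")  # range = 1.40 to 1.63
```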

Our second example of a prosodic continuum is taken from the Macedonian speech territory in the southern Balkans, diagrammatically (Baerman 1999):

Western Mac (fixed) > Central Western > < Central < Easternmost Mac (free).

(Note that the arrows in the diagram indicate the basic directions of change.)

Fixed stress in the West is historically innovating and manifests itself in three varieties, falling on the penult in Korča in Albania (vodeníčar ‘miller’, def. vodeničáro ‘the miller’), on the initial syllable in Oščima near Lake Prespa in Greece (*vódeničar, *vódeničaro), and elsewhere on the antepenult as in Standard Macedonian (vodéničar, vodeníčarot), although the latter generalization holds mainly only for dialects spoken in the Republic of North Macedonia, as well as some immediately adjacent dialects within and to the north of the Šar Planina mountain range (in the extreme SW of Kosovo and a strip of Albania adjacent thereto) and to the south, extending several kilometers into NW Greece. This is to say that the rest of the Macedonian dialects in NW Greece belong to the Central-Western accentual system (see below).
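The three fixed-stress varieties just described differ only in which syllable is counted. A minimal sketch follows, assuming that every vowel letter is a syllable nucleus (syllabic r and similar cases are ignored) and that words shorter than the counting window are stressed on their first syllable; the function name and its parameter are hypothetical and serve only to illustrate the counting.

```python
import unicodedata

VOWELS = "aeiou"   # simplifying assumption: only these letters count as nuclei
ACUTE = "\u0301"   # combining acute accent, used here to mark stress

def place_stress(word, position):
    """Mark stress on the syllable counted from the end of the word
    (3 = antepenult, 2 = penult) or on the first syllable ("initial")."""
    nuclei = [i for i, ch in enumerate(word) if ch in VOWELS]
    if position == "initial":
        target = nuclei[0]
    else:
        # Short words fall back to their first syllable (an assumption).
        target = nuclei[max(len(nuclei) - position, 0)]
    marked = word[:target + 1] + ACUTE + word[target + 1:]
    return unicodedata.normalize("NFC", marked)

print(place_stress("vodeničar", 3))          # vodéničar (antepenult type)
print(place_stress("vodeničarot", 3))        # vodeníčarot
print(place_stress("vodeničaro", 2))         # vodeničáro (penult type)
print(place_stress("vodeničar", "initial"))  # vódeničar (initial type)
```

Because the count runs from the end of the word, adding the definite article moves the stress one syllable to the right in the antepenult and penult types, but not in the initial type.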

The other typologically and historically conservative pole is represented by those Eastern Macedonian dialects situated more closely to the Bulgarian border, where dialects exhibit a free, lexically determined stress system comparable in most respects to Western Bulgarian (and, except for fixed root stress in the present tense of i-verbs, to Russian). These two typological extremes are separated by two intermediate zones, Central and Central-Western, in which phonological tendencies towards antepenultimate or penultimate stress interact with trends toward uniform stress in morphological categories or classes. The picture is further complicated by a tendency towards proclisis (initial stress) on verbal prefixes (usually just in the singular, but sometimes also in the plural) in some dialects in the aorist or (more commonly) the imperative.

The Central zone stretches in parallel with the Easternmost zone in a fairly broad band from southernmost Torlak in Serbia through Kumanovo in the north of Macedonia all the way to within a few kilometers of Thessaloniki in Greece in the south. Here lexicalized stress has been virtually eliminated, as root stress has been generalized in the present tense (péčeš ‘bake’ 2sg, cf. Rus. pečóš) in contradistinction to desinential stress in the aorist (pečé ‘baked’ 2-3sg, cf. BCS pȅče), while the accent has been retracted in PSL oxytonic nouns (rebró > rébro ‘rib’, žená > žéna ‘woman, wife’), although in districts adjoining the Easternmost zone, end-stress remains in the neuter type momčé ‘young man’, as well as in Turkisms such as džadé ‘road’, and polysyllabic suffixed feminine nouns, such as visočiná ‘height’ and slobodá ‘freedom’. In the Central zone, the imperative (with its simple paradigm and communicatively marked status) is the most volatile form in terms of a tendency to generalize one stress pattern; see root stress in the North (naprávi, naprávite), but initial (including prefixal) stress in the mid-Central region of Štip (nápravi, nápravite) and with oxytonic plural in most of the south (nápravi, napravéjte, with -ej- by analogy with the productive a-verb form -ajte).

The Central-Western zone (the second transitional type) runs next to the Western zone in a narrow strip across Southwestern Kosovo, then enters Macedonia to the north of Skopje, continuing down the Vardar river valley to Veles, while expanding southwestward to include the Tikveš and Mariovo regions, then turning fairly sharply to the west a few kilometers south of the border with Greece, before ending in the Kostur dialect of Vrnik in eastern Albania. In the CW zone (represented here by the Kajlar dialect in northern Greece; Drvošanov 1993), although stress in the present remains on the root (nósam ‘carry’ 1sg, nósiš 2sg, nósime 1pl), it is generalized on the suffix in the highly productive -uva derivative class (dunisúvam ‘I bring’, dunisúvăme ‘we bring’). While the accent is retracted from the ending in the singular forms of the aorist (réku ‘said’ 1sg, réče 2-3sg vs. plural rikóme, -óte, -óa), in the imperfect, stress is advanced from the root onto the penult in all forms (vikáše 2-3sg ‘called’, vikáme 1pl, etc.), including the 1sg viká, which is leveled with the rest of the paradigm (cf. 2-3sg vikáše) and which may have ended in a closed syllable (*vikáx) at the time of the retraction from final open syllables. These changes in the preterite paradigms reflect an underlying trend towards the generalization of penultimate stress. Note that some of these preterite forms once had (and in some dialects still do have) long vowels due to compensatory lengthening attendant upon the loss of /x/ (rikó:me, rikó:te from *rekóxme, *rekóxte or imperfect vikā́me < *vikáaše, -áa[x]me). In some CW dialects the degree of lexical variation is quite high, in both the noun and the verb, as competing stress patterns overlap, suggesting subsystems still in the process of formation.

Early evidence for fixed stress systems in Macedonian is quite rare, but paroxytonic tendencies in the Central-West zone can be dated to the sixteenth century, as attested in a Macedonian-Greek lexicon and brief phrase book based on the Kostur dialect in southwest Aegean Macedonia (Giannelli & Vaillant 1958). Here we find forms such as pl. golóbi ‘pigeons’ (sg *gólob); grében ‘comb’ : pl grebéni, as well as the stress shift caused in the imperative singular by the addition of an enclitic Postelí mi posteláta ‘Make my bed!’ (from *pósteli + mi, where initial stress in the imperative singular as the starting point in our derivation is suggested by other examples such as óstavi ni da spime ‘Leave us alone to sleep’). Such competing patterns may have contributed to the development of the so-called ‘double stress’, which is so widespread in parts of Aegean Macedonian and some western Rhodope dialects (e.g. Suxo [to the east of Saloniki] v’éč́er ‘evening’, pl. v’éč́eróvi) and which has parallels in Greek (e.g. o δáskalos ‘the teacher’ : o δáskalós mas ‘our teacher’).

The origins of fixed antepenult and penult stress have not yet been determined with any certainty, although scholars have sought rough parallels in the framework offered by neighboring languages such as Greek, where stress is restricted to one of the last three syllables of the phonological word (more precisely to one of the last three morae or units of duration, since Ancient Greek had both short and long vowels). Of course, in the Greek noun at least, this ‘tri-moraic’ law allows for the existence of lexically specified end-stressed oxytona (ModGk γerós ‘robust’), penult-stressed paroxytona (ModGk γéros ‘old man’), and antepenult-stressed proparoxytona (ánthropos), whereas in the verb, so-called ‘recessive’ stress on the antepenultimate mora is generally consistent (Mackridge 1985). An Ancient Greek exception is the Aeolian dialect of Lesbos, where stress is also recessive in the noun (e.g. pótamos ‘river’ vs. Ionic potamós; Goodwin & Gulick 1930/1958).

28.10 The Role of External Factors (Economic and Political)

The spread of dialect features is often driven by economic and/or political factors. An example of economic motivation is the quest for more arable land which attracted speakers of the northeastern Mazovian Polish dialect to the south (Lesser Poland/Małopolska), resulting in the spread of the phonological feature known as mazurzenie (the merger of alveolar /š ž č dž/ with dental /s z c dz/, e.g. szósty > sósty ‘sixth’, etc.) (Stieber 1973). The structural-linguistic factor at play here is the pressure exerted by the recently formed pre-palatal /ś/ (siadam ‘I sit down’, siano ‘hay’) upon the phonological space previously occupied by postalveolar /š/. Since this factor was latent in all Polish dialects at the time, the change was readily adopted even by speakers of other neighboring dialects. Less productive was the northernmost Mazovian (e.g. Kurpie) assibilating palatalization of labial stops, for example ṕśęć < pięć ‘five’ (a unique development in the Slavic speech area, but one with some parallels in Romance, cf. French savoir : sache [saʃ] < *sapere : sapia-, rouge [ruʒ] < rubia-), which remained confined for the most part to Mazovia (Friedrich 1955). It is nonetheless notable that the twentieth-century western isogloss for mazurzenie conforms generally with the political boundaries which existed prior to the thirteenth century between Great Poland (Wielkopolska) (in the west) and most of the remaining regions of medieval Poland (Stieber 1973).
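The mazurzenie merger described above amounts to a simple mapping from the hushing series to the dental series. The sketch below applies it to standard Polish orthography; the mapping table and function name are illustrative assumptions, and apart from szósty > sósty the input words are illustrative examples rather than forms cited from Stieber. Orthographic rz is deliberately left untouched.

```python
# Hushing -> dental, in Polish orthography. The digraph "dż" is listed
# first so that it is not split into "d" + "ż" by the later "ż" rule.
MERGER = [("dż", "dz"), ("sz", "s"), ("cz", "c"), ("ż", "z")]

def mazurzenie(word):
    """Replace each hushing consonant spelling with its dental counterpart."""
    for hushing, dental in MERGER:
        word = word.replace(hushing, dental)
    return word

print(mazurzenie("szósty"))  # sósty
print(mazurzenie("czapka"))  # capka
print(mazurzenie("żaba"))    # zaba
```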

A historical sequence of economic and political factors fostered the expansion of the Serbian Eastern Herzegovinian dialect, which eventually came to serve as the basis for one of the varieties of literary Serbian, as first encoded in Vuk Karadžić’s Serbian dictionary with German and Latin glosses (published in Vienna in 1818, but here cited from the 1852 edition, as per the reprint of 1935). One of the most notable features of Vuk’s dialect was the so-called jekavian pronunciation of the reflex of the Common Slavic jat’ vowel (vjȅra ‘faith’ < *vě̋ra, brȉjeg ‘hill, bank’ < *brě̑gъ < *bȇrgъ), although Vuk’s dictionary also records ekavian (vȅra, brȇg) and ikavian (vȉra, brȋg) variants from other dialects. Another prominent Eastern Herzegovinian trait that one finds systematically indicated by Vuk is the so-called Neo-Štokavian accent retraction, which also generated secondary short and long rising pitches on the newly stressed syllables (e.g. ženȁ ‘wife’ > žèna, rūkȁ ‘hand’ > rúka). Difficult economic conditions in the highlands of the Dinaric mountain range had always tended to stimulate gradual resettlement to the lowlands to the east and the Dalmatian coast to the west, but this trend was given a powerful fresh impulse by the vicissitudes attendant upon the Turkish conquest of the Balkans, which imparted a northwestern vector to the spread of the Eastern Herzegovinian dialect. These events culminated in the great Serbian migrations (seobe), which followed the unsuccessful outcomes of the two Habsburg-Ottoman Wars (1690, 1737–1739).

28.11 Fragmentation Induced by Contact with Non-Slavic Languages

Historical change in the Slavic dialects has also been shaped by prolonged contact with non-Slavic languages. As an example of contact phenomena on the northern periphery of East Slavic (in this case generally attributed by scholars to a Finnic substratum in Karelia), we may cite the case of so-called ‘ljapan’je’, which covers a range of prosodic and phonological features in Zaonežje North Russian dialects (Mixajlova 2019); see the widespread stress shift from the final syllable to the initial syllable, leading in some cases to the accentuation of original pretonic vowels which had undergone jakan’e vowel reduction at an earlier stage, for example Pjáškom s mjáškom do Medvjášky ‘On foot with a sack to Medveška’ (standard Rus. Peškóm s meškóm do Medvéški). The phenomenon was first observed by early Russian folklorists, such as Rybnikov and Gil’ferding (see Ter-Avanesova 1989, which also provides a detailed diachronic analysis of ljapan’je, including typological parallels with similar stress retractions in BCS dialects in Slavonija and the Drava river basin). This stress type also occurs in phrasal constructs with proclitics, for example n’á bojus’ ‘I’m not afraid’. Perhaps more unusual in the Slavic context is the rise of a prothetic /g/ before an initial mid front vowel in this same kind of dialect, for example gésli ‘if’ (jesli), as in one type of Veps dialect (where the continuum is järva > därva ~ gärva ‘lake’).

Several other features of Russian or its northern dialects have been attributed (albeit with varying degrees of probability) to a Finnic substratum. These features include: (1) the formation of the so-called ‘second (partitive) genitive’ and ‘second (locative) prepositional’ cases, whose desinences have been recycled from the original u-stem endings (as in standard Russian kiló sáxaru ‘a kilo of sugar’ vs. vkus sáxara ‘the taste of sugar’, and iskál v snegú ‘he searched in the snow’ vs. govoril o snége ‘he talked about (the) snow’); (2) the use of a locative phrase, rather than the verb ‘have’, to indicate possession, again as in standard Russian u menja est’ dom ‘I have a home/house’, as in Finnish Minulla on talo (*iměti ‘to have’ in the rest of Slavic, including Ukrainian and Belarusian); (3) in Northern dialects, the use of the nominative case as a direct object (Kosi travá poka rosá! lit. ‘Mow grass while the dew’s still fresh’/‘The early bird catches the worm’) and the nominative object of infinitive, as well as (4) the use of resultative perfects such as U volkov zdes’ xoženo ‘Wolves have been walking/prowling here’ and On privykši ‘He is used to it’ (the two types overlap only in Karelia), both in contradistinction to standard Russian concordant L-preterites (guljali, xodili, brodili : privyk < privyklъ); and (5) the use of a postponed demonstrative pronoun (*t-) in a fashion which bears some resemblance to the development of the definite article in the history of Bulgarian and Macedonian (domo-t ‘the house’, domu-tomu ‘to the house’), particularly in the influential and widely popular writings of the mid-seventeenth-century priest Avvakum, who grew up in a region which included a Mordvinian-speaking population (Kiparsky 1968).

In South Slavic, interaction with Albanian in a context of extensive bilingualism along the border with Montenegro may have supported retention of /ä/, which can be reconstructed as a plausible intermediate stage between /ə/ and /a/ in an archaic reflex of the jer vowels (dä̑n ‘day’ < *dьnь, sä̏n ‘sleep’ < *sъnъ), but this same interaction also promoted innovations such as the devoicing of consonants in auslaut (grop < grob ‘grave’, noš < nož ‘knife’) and the shift of the palatal lateral continuant to a palatalized variant (ljúljati ‘rock’ [ʎuʎati] > l’ul’at; Ivić 1994). Similarly, it has been suggested that peripheral southwestern Macedonian dialects of the Kostur region retained the nasal feature in forms such as réndo ‘the row’, zə́mbi, zɔ́mbi, zǻmbi ‘teeth’ through decomposition of *ę, *ǫ into ‘vowel + nasal stop’ at a stage prior to the general denasalization which affected the rest of the Macedonian dialects, and that this ‘archaic innovation’ transpired due to centuries of contact with speakers of neighboring dialects of Aromanian, Albanian, and Greek, all of which have undergone prenasalization of voiced stops (in the case of Greek, from as early as the fourth century AD, when -mp- > -mb-, -nt- > -nd-, etc., for example seventh-century papyrus <Pondikón> ‘mouse’ < Pontikon, thirteenth-century <loumpardoí> < ‘Lombards’; Arvaniti & Joseph 2000). The hypothesis is supported by a correlation between the timbre of the root vowel in Macedonian and in the geographically relevant contact language. Thus, central and northern Kostur Macedonian, Albanian, and Aromanian all have a schwa-like vowel in this context (e.g. Mac. Northern Kostur zə́mbi ~ Alb. Tosk dhëmp ‘tooth’, dhëmbi ‘the tooth’, by chance an IE cognate of PSL *zǫbъ; Arm. Farsheroţi plăɲg^ụ ‘I cry’), whereas southeastern Kostur Macedonian (in this context) and Greek (altogether) lack such a vowel (e.g. Kostur Mac. zɔ́mbi ~ Gk. /a/ lambo ‘I shine’, WMac. Gk. /o/ [đond] ‘tooth’ [Newton 1972: 149]) (Friedman 2018, Lindstedt 2016).

This same region of SW Macedonia is also the epicenter for a set of radical innovations in the verbal system, one of which is the development of a new series of perfect tenses formed with the verb ima (‘have’) and sum (‘am’, etc.) in combination with past passive participles, for example Imam bideno ‘I have been’, Imam imano ‘I have had’, Sum dojden ‘I have arrived’, and even Veḱe sum jaden ‘I have (literally ‘am’) already eaten’, all forms with differing degrees of productivity in Western Macedonian. Resultative perfects formed with ‘have’ as the auxiliary are typical of Aromanian, Albanian, and Greek, and their particular prosperity in Western Macedonia is generally attributed to intense contact with these non-Slavic languages in the context of the Balkan Sprachbund (linguistic league). Macedonian, in its turn, has arguably contributed to the formation of perfects with ‘be’ as the alternative auxiliary in Aromanian (Gołąb 1984), although the presence of both ‘have’ and ‘be’ perfects with lexically determined diathetic functions is typical of Romance languages in general.

Another well-known bilingual contact zone in South Slavic is the eastern Adriatic coast and adjacent islands, where centuries-long interaction with the politically and culturally dominant Venetian dialect of Italian has played a significant role in the formation of the Cakavian variant of the Čakavian dialect of Croatian. Cakavian is attested from as far north as the Labin district in the southeast of the peninsula of Istria to as far south as the island of Vis, occurring principally, but not exclusively, in the towns, which is where the Venetian administrators and colonists were concentrated (Małecki 1929, Moguš 1977: 66–79). The influence of Venetian was particularly strong during the fifteenth to eighteenth centuries and is manifested on a variety of linguistic levels, including the lexicon and morphology (see below for examples), as well as syntax and verbal semantics (Kalsbeek 2011, e.g., cites (1) clause-initial clitics, ću te kumpanjit ‘I will accompany you’, (2) the habitual preterite suffix -eva-, smo čuvievȁle skupa ‘We used to tend our flocks together’, (3) a simplified lexical system for verbs of motion, where voditi ‘lead/guide by hand’ and voziti ‘convey in vehicle’ are replaced by peljati ‘drive, impel’ (cf. Ital. condurre ‘lead; drive’), (4) substitution of adverbs for verbal prefixes denoting direction, da ne grȅ nuõtre ‘that (s)he not go inside/enter’, (5) purpose clauses with za ‘for’ + inf as in Italian per + inf, although this construction is more widespread). It is nonetheless a prominent phonological trait which lends its name to Cakavian, as reflected in the iconically typical pronunciation of the interrogative pronoun ca [tsa] ‘what’ (with variants co, ce, more rarely će [tśe]), corresponding to the geographically and demographically more widespread Čakavian ča [tša].
The historical substitution of a dental affricate /c/ for the palato-alveolar affricate /č/ that we see in this example is part of a broader tendency, which also affected the palato-alveolar fricatives /ž, š/, shifting their articulation to the pre-palatal /ź, ś/ or dental /z, s/ region, depending on the local dialect, for example žena ‘woman, wife’ > źenȁ, zena, široko ‘wide’ > śirõko, siroko (for phonetic details, see Moguš 1977). In those Cakavian dialects where /ź/ rather than /z/ replaced /ž/, the shift to pre-palatal articulation also often affected the dental sibilants, e.g. selo ‘village’ > śelo, zelen ‘green’ > źelẽn. In such dialects we find words in which both an original dental and an original palato-alveolar have been changed, resulting in extensive merger with /ś, ź/, for example šest ‘6’ > śȇś, železo ‘iron’ > źeleźo. An excellent example of a consistent Cakavian dialect of the last-mentioned type was recently recorded in the speech of an older female informant in the village of Štrmac (near Labin in southeast Istria; Nežić 2013: 419–420): *č > ocȗva ‘keeps’, docȅkat ‘wait until’, *c > ocȁ ‘father’ gen., *ć > kȕća ‘house’, ćȅmo ‘we will’, *š > źnȏś ‘you know’, śȇś ‘6’, ślȁ ‘she went’, *s > śo ‘they are’, śe ‘self’, śadȁ ‘now’, *ž > oźenȉli ‘they married’, źȉvi ‘alive’ pl., *z > źa ‘for’, źȋt ‘wall’. Cakavian since the early twentieth century has generally been in retreat before local Čakavian and literary Štokavian, while at times exhibiting nearly free variation between the possible outcomes of the fricatives (as documented, e.g. for Vis [Moskovljevič 1972, based on fieldwork from 1928] and Labin [Nežić 2013]).

This elimination of the ‘hushing’ series /ž, š/ brought the Čakavian system into closer alignment with that of Venetian, which never possessed or developed /ž, š/. On the other hand, the introduction of /ś, ź/ in some Čakavian dialects seems to reflect a different kind of adaptation, since the corresponding Venetian sibilants, though commonly designated /s, z/ at the phonemic level, have often been described in phonetic terms as situated in their articulation and acoustic impression between [s, z] and [š, ž] (Bidwell 1967, Piccio 1928, Małecki 1929). This is consistent with Canepari (1979), who identifies a type of Modern Venetian pronunciation of /s, z/ in similar, but more precise terms as ‘apico-alveolar’ sibilants articulated farther back than the dentals [s, z], that is, as alveolar retracted (not ‘retroflex’!) sibilants. Since the IPA lacks separate symbols for this articulation, one might have recourse to [s̠, z̠] or even ‘ṣ, ẓ’ (following Małecki and the earlier Italian linguists). It is conceivable that this pronunciation is not a recent innovation, as suggested by typological parallels elsewhere in Romance (e.g. Old French pousser > English push) and by the phonology of Venetian loanwords in Cakavian, for example Northern Cres (Velčić 2003) śalvḁ̑n ‘saved’ (< Ven *salvo, cf. Ital salvato with morphological substitution of the Slavic past passive participle suffix /-an-/ for Romance /-at-/), and žvȇlt/źvȇlt ‘fast, quick’ < Ven zvelt (cf. Ital svelto).

The linguistic impact of long-term contact with German is felt most profoundly in West Slavic (especially Czech and Sorbian) and northern West South Slavic (chiefly Slovene and Kajkavian). Depending on local circumstances, this impact is reflected in various components of the languages involved (phonological, morphosyntactic, aspectual, and lexical). Some of the influence was reciprocal (as in Carinthia, between local Bavarian dialects and the northernmost Slovene dialects). Aside from possible prehistoric Iron Age Germanic-Slavic contact/mixture in the west in the context of the Przeworsk culture of the late La Tène to Roman times (third century BC to fifth century AD) (Nichols 1993), this contact was initiated or maintained by the Slavs in the course of the Slavic Migration, but by the ninth to tenth centuries the demographic and political tide had turned, such that at a later date, German colonies and concentration in urban centers such as Prague, Ljubljana, and Zagreb/Agram, as well as the smaller towns, served as points of radiation for the politically and culturally dominant language.

To begin with a West Slavic example, we may consider colloquial Prague speech (Townsend 1990), where we find the use of a perfect tense formation involving ‘have’ (German haben) + past passive participles in two types of construction, the first with the short form of the participle in neuter gender, for example Máme tady rozsvíceno ‘We have the lights turned on’, Mám tady uklizeno ‘I’ve tidied/cleaned up here’, the second with the long form of the participle marking concord with topicalized (sentence-initial) direct object, for example Referát už mám napsanej ‘The paper/report I’ve already written’ (note that in standard Czech this last example would have a short form predicate participle (napsán), just as in the first type, whereas the long form would more often be attributive; in German there is no such distinction marked directly on the participle itself, a factor which may have played a role in the erosion of the formal difference in Prague Czech). In the noun phrase, demonstratives in colloquial Prague Czech can be used in ways that resemble colloquial and even standard usage of the article in German, for example Ten Honza je už dlouho pryč = Der Honza ist schon lange weg, or to replace the third person pronoun in emphatic (?) topicalized position, for example Toho jsem ale neviděl ‘But him I didn’t see’ = Den habe ich aber nicht gesehen. In the Czech Doudleby dialect, the usage of the demonstrative at times resembles a definite article, for example ti [those-Masc.-nom.-pl] tahouni se používali po sekaní obilí … ‘the draft animals were used after harvesting the crops …’ (where the reference to draft animals is generic, since there has been no prior allusion to them). More broadly in the Czech lexicon, loanwords and phraseological calques from German are evident at different stylistic levels: colloquial Cze. kšeft < Geschäft ‘business’ or Mám smůlu < Ich habe Pech ‘I’ve got pitch (tar)’ that is, ‘I’ve got bad luck’, learned calques (obrození < Wiedergeburt ‘Rebirth, Renaissance’, výlet < Ausflug ‘excursion, outing’), and high-register borrowings (as in the use of říše neut. ‘empire, kingdom, realm’ < das Reich, which dates back to Old Czech and is one of the rare Germanisms to be found in Mácha’s “Máj” (1836), cf. line 3 of canto 2, Klesla hvězda s nebes výše … padá v neskončené říše … ‘A star fell from the heavens on high … It falls into the endless kingdom …’).

Turning to Slovene, we begin with phonetic innovations which are typical of the contact zone between Bavarian and Slovene as spoken in Carinthia (Austria), but which are not found in other Bavarian or Slovene dialects (Greenberg 2000). These include shared innovations, such as (1) uvular trilled [R] and (2) g, k pronounced as laryngeals [h, ʔ], but also those which Bavarian has adopted due to contact with Slovene, such as (1) merger of /h/ and /x/, (2) lengthening of short stressed root vowels (similar to the well-known Slovene type bràta > bráta), and (3) diphthongal pronunciation of short e and o (Carint. Sln. smréaʔa ‘juniper’ ~ Carint. Bav. schean ‘schön’, Carint. Sln. ʔoaža ‘hide, leather’ ~ Carint. Bav. roat ‘rot’). In morphosyntax, a characteristic feature of several prominent sixteenth-century Slovene Protestant Bible translators is the use of a fully inflected, morphologically demonstrative definite article (*t-) in a fashion which closely resembles that of the German article, for example from Primož Trubar’s preface to his translation of the New Testament (1557): Tiga [masc.gen] Noviga testamenta ena [fem.Nom.] dolga predguvor ‘An extended preface to the New Testament’ (with topicalization of the genitive noun phrase and in addition the use of an ‘indefinite’ article, based on the numeral ‘one’), Ta [masc.nom] pervi deil te [fem.gen] slovenske dolge predguvori ‘The first part/Part one of the extended Slovene preface’, and Na to [fem.acc] pervo nedelo tiga [masc.gen] adventa ‘On the first week of Advent’ (contrary to English usage, Advent takes a definite article in German). This usage was somewhat pejoratively characterized by Kopitar in his 1808 grammar of Slovene as typical of German-influenced town speech (in contrast with the more authentic Slovene spoken in the countryside) and even Slovene Protestant writers of the next generation, such as Dalmatin, made less use of the form.
Although the definite article has long since been banished from the standard language, an interesting and highly idiosyncratic reflex of the masc.nom.sg /ta/ is widely used in modern colloquial Slovene to modify adjectives (such as zelen ‘green’) and ordinal adverbs (such as prvič ‘for the first time’), but never the bare noun (*ta svinčnik ‘the pencil’), which shows that it is not a true definite article. Though derived from the demonstrative *t-, uninflected /ta/ is readily distinguished from the latter in minimal pairs, such as pr [*pri] ta lepem ‘with the nice one’ vs. pr tem lepem ‘with that nice one’ (example provided by Marta Pirnat). The form /ta/ can also co-occur with demonstratives, for example téga (gen) ta zelenega svinčnika ‘of this green pencil’, and in other unexpected combinations (see Marušič & Žaucer 2008).

The mutual interactions of Slavic (chiefly Czech, but also the rest of West Slavic, and Slovene, but also Kajkavian and Croatian) and Germanic (most intensely the Bavarian dialect) have also been viewed by scholars within the construct of a Central European (‘Danubian’ or ‘Carpathian Basin’) Sprachbund, which includes other languages of the region, such as Hungarian and Yiddish (see Thomas 2008 for a thorough survey and methodological study). Among many other features, both phonological and morphological, which have been attributed either to long-term Germanic influence or to participation in the Sprachbund (as underlying or contributing factors), we may note the development of different kinds of fixed stress in West Slavic (see above), the reinforcement of conservative traits in verbal aspect such as the use of the imperfective aspect in sequences of events (a trait which serves to distinguish West from East Slavic; see Dickey 2011), and the use of a ‘double perfect’ to form the pluperfect, for example Cze. byl jsem viděl ‘I had seen’, Sln. Padel sem bil ‘I had fallen’, and Kajkavian ja sam ti bil pisal ‘I had written to you’ (cf. Yiddish, and Bavarian dialects, Er hat gǝhat gǝtrunkǝn ‘He had drunk’).

28.12 Sociolinguistic Factors

One sociolinguistic correlate of dialect variation which is reinforced by language contact is that of religious confession. Examples of this phenomenon are particularly common in South Slavic with its long-standing confluence of different creeds. Thus, according to Ivić (1994), in the consonant systems of Štokavian dialects in Bosnia and Herzegovina, one archaism (the retention of *x, sometimes as /h/) and one innovation (the loss of a distinction between pre-palatal /ć, dź/ and postalveolar /č, dž/) are both far more common in villages with a predominantly Muslim Slavic population than in those whose population is Orthodox or Catholic, a circumstance which has probably been shaped in part by more intense exposure to Turkish (which has /h/, but lacks pre-palatals) and perhaps even Arabic (as mediated via Islamic practices of daily occurrence). In Mostar, an important administrative center with a significant Muslim population, even the Orthodox population has retained /x/.

A particularly striking example of dialect preservation and innovation in the context of a strongly insular religious community is that of the Pavlikjan Bulgarian dialect, spoken by descendants of Armenian and Syrian adherents of the Paulician Christian heretical sect, some of whose members were exiled by the Byzantine authorities in the eighth to tenth centuries to the Balkans, where they settled in the south near the city of Plovdiv (where their descendants are known to linguists as the Southern Pavlikjans; see Miletič 1900). In the course of time (but beginning perhaps as early as the fifteenth century), some of the Pavlikjans migrated due north through the central Stara Planina (‘Old Mountains’) to the Danubian plain near Svištov and Nikopol (regarding these Northern Pavlikjans, see Nedelčev 1994). Due to the successful efforts of Franciscan missionaries, the Pavlikjans (‘Palk’éne’, in their native dialect) converted en masse to Catholicism in the mid-seventeenth century, a decision which subsequently exposed villages of the Northern branch to Ottoman Turkish reprisals following the unsuccessful Čiprovci rebellion of 1688 (undertaken principally by Bulgarian Catholics in the Northwest). As a result, many of the Northern Pavlikjan Catholics fled across the Danube to Romania, some settling in the east near Bucharest (Popešt-Leorden; see Petrovič & Vrabie 1963, 1965; Mladenov 1993), but with the permission of the Austro-Hungarian authorities mostly in the west in the province of Banat in 1738–1741 (Bešenov, Vinga; see Miletič 1900, Stojkov 1967).

Remarkable features encountered in Pavlikjan vocalic systems include (1) the raising of the PSL mid vowels (*o > u, e.g. *kon’ > kun ‘horse’, *e > i, mit ‘honey’ in Popešt-Leorden and in Miletič’s description of Southern Pavlikjan) and (2) preservation of PSL *y in forms similar to Ru. ы, Ukr. <и> =[ɩ], and [ə]. These vowels also tend to serve as common reflexes of *i after depalatalized consonants (pɪsmo ‘letter’, gŏdǝ́na ‘year’). Particularly notable is the treatment of *y after labial consonants in Banat and Southern Pavlikjan in isolated examples such as [vu] ‘you’ pl. < *vy (not *vie, as in most Bul. and Mac.) and [muja] ‘I wash’ trans. < *myja. This apparent relic is otherwise attested in South Slavic in the Old Slovene Freising Fragments (tenth century) <mui> *my ‘we’, and the SE Mac. village of Visoka (Małecki 1936) múja, mújiš́. Broch (1899) also describes a similar reflex (high-mid rounded /ω/) in the Eastern Slovak-Rusyn dialect of Koroml’a in the Carpathians (e.g. mωt’ ‘to wash’, bώu̯at’ ‘to live, dwell’, gríbω ‘mushrooms’, cf. Russian myt’, byvat’), although one must bear in mind that /ω/ in the Koroml’a dialect also originates from other sources, such as (probably lengthened) *o in rωch ‘horn’.

To conclude this chapter, we note that another sociolinguistic variable which affects dialect performance is, unsurprisingly, gender. Thus, according to Vidoeski (2005: 105), the pronunciation of the mid-vowels /e, o/ as low-mid [ε, ɔ] (bέlo ‘white’ neut., kɔ́sa ‘hair’) is a general tendency in Northern Macedonian dialects (such as Kumanovo and Kriva Palanka). Although unfamiliar with Vidoeski’s observation at the time, in the course of fieldwork conducted in 1983 in the village of Luke (formerly Drží kon’, literally ‘Hold the horse!’, immediately to the south of the border with Serbia in northeastern Macedonia), I observed that this type of vowel was used consistently in the speech of one of two young informants (male), but not at all in the speech of the other (female), whose mid vowels were pronounced slightly high-mid [e, o]. Both speakers had full command of other distinctive local dialect traits in morphology and word stress, which differ markedly from those of Standard Macedonian. Similar kinds of observations on gender-correlated phonetic differences in the pronunciation of vowels can be found in the commentaries to the various volumes of the Bulgarian Dialect Atlas. As an example, according to the commentary to BDA-SE (Map 9), in the town of Zlatograd in the SE Rhodopes, instead of the usual dialect (and literary) pronunciation žə́na ‘I reap’, older Christian female informants say žė́na (where ė is identified as a quite narrow vowel, yet pronounced with medium ‘lip rounding’).

29 Language Contacts

The contact history of Slavic is probably as old as the Slavic language family itself, running back to prehistoric times when Slavic split off from Balto-Slavic (or directly from Proto-Indo-European, if one doubts the existence of Balto-Slavic).¹ On the basis of lexical loans and a few structural features, we can reconstruct prehistoric contacts with Uralic, Germanic, and Iranian. Recently Gvozdanović (2009) has argued for prehistoric contacts with Celtic, but these were rejected by most critics (Daiber 2012, Vath 2013). More promising is Holzer’s (1989) reconstruction of a prehistoric layer of loans in Slavic – and to a lesser extent in Baltic – from an unidentified Indo-European language, which Holzer calls Temematic and attempts to link to the Cimmerians mentioned in early Greek sources.

When the Slavs enter history, the social preconditions of language contacts become more visible. Multiethnic and multilingual empires rise all over Europe, and many if not most Slavic languages become involved in complex speech communities, giving rise to language contact areas in the Balkans and possibly also in Central Europe. Whereas in Southeast and Central Europe Slavic languages were impacted by dominant imperial languages (German, Turkish, Greek), as seen in lexical borrowing and ephemeral forms of mixed learner varieties, Russian itself became an imperial language, assimilated speakers of various Finnic languages, and endangered many indigenous languages. Russian is also the only Slavic language that served as a lexifier for pidgin languages. With the rise of literacy in Slavic, lexical loans as well as structural patterns from major languages of learning (Greek, Latin, later replaced by Western European languages) were borrowed massively; mutual exchanges ensued among the Slavic languages.

29.1 Prehistoric Contacts

29.1.1 Contact-Induced Origins of Proto-Slavic?

In recent decades, proposals have appeared that Slavic originated through contacts (Birnbaum 1982: 12–13, Curta 2004). For Curta, Slavic is clearly a ‘contact language’ whose emergence he describes as pidginization, deriving from “contact between speakers of various, mutually unintelligible languages” (2004: 135–136). Slavic, however, shows no signs of radical restructuring of its inherited PIE grammatical base, so is unlikely to have ever gone through a pidgin stage. A hypothetical case can be made for some kind of koinéization of Slavic within the Avar khaganate, to be sure, but this implies long-standing prior existence of a set of very similar, though not uniform dialects (Andersen 2020). In a linguistically more well-balanced attempt, Andersen (2003) makes a case for Slavic (and Baltic) being the contact-linguistic result of a succession of early migrations of IE-speaking groups into Eastern Europe, where dialects of early settlers merged with those of later settlers.

29.1.2 Slavic and Uralic

Early contacts with Uralic languages are evident from Slavic loanwords in the Balto-Finnic languages, though most of them cannot be shown to pre-date the emergence of East Slavic. During the early period of contacts, lexical borrowing appears to have been a one-way road from Slavic to Uralic (more specifically Finnic). In fact, hardly any Uralic loanwords can be identified in Common Slavic (CS), which fits general expectations about how substrates show up in the receiving language (Thomason & Kaufman 1988: 240, Kallio 2005: 275). It proves difficult to identify Uralic substrate effects on CS phonology or morphosyntax.

Only for later, Finnic-Russian, contacts can more specific substrate effects be identified (Veenker 1967, Kiparsky 1969, Thomason & Kaufman 1988: 238–251, Saarikivi 2006). On the phonetic level, word-initial stress shift in some North Russian dialects, Northern Russian cokanje, and possibly also Central Russian akanje, are likely candidates for Finnic substrate influence.² Among morphosyntactic features for which a Finnic substrate has been suggested, locative possessive predications in Russian (1) as well as nominative objects in necessitative constructions in Northern Russian dialects (2) are the least controversial candidates:

(1)
U menja knig-a.
prp 1sg.gen book-nom.sg
‘I have a book.’

(2)
Vod-a pi-t’.
water-nom.sg drink-inf
‘It is necessary to drink water.’

Also plausible is a Finnic substrate underlying the Russian partitive genitive (3)–(4) and predicative instrumental (5),³ both of which have neat parallels in most Finnic languages:

(3)
Vod-y ubyva-et.
water-gen.sg decrease-3prs.sg
‘The water is going down.’

(4)
Da-j-te chleb-a, požalujsta.
give-imv-2pl bread-gen.sg please
‘Pass me some bread, please!’

(5)
S 1983 po 1988 on by-l člen-om
from 1983 till 1988 3nom.sg cop.pst.sg.m member-instr.sg
komissi-i.
commission-gen.sg
‘From 1983 till 1988 he was a member of the commission.’

29.1.3 Proto-Slavic and Germanic

Genuine contact relations between Slavic and Germanic seem to start only in the first centuries of our era. It is generally assumed that contacts started with the intrusion of Eastern Germanic speaking groups (the predecessors of the Goths) into the Slavic homeland from the first century onwards. This first symbiosis of Goths and Slavs was disrupted by the coming of the Huns in AD 376 and the emigration of the Goths westwards. Slavs migrated into Central Europe sometime before AD 600, and came into contact with speakers of Western Germanic dialects (Old High German, Old Saxon).

Ever since Uhlenbeck (1893), loan relations between Germanic and Slavic have been treated as unidirectional. Slavs were assumed to borrow Germanic lexical items, but not the other way round. Capitalizing on the fact that in many cases the loan direction cannot be unambiguously determined on formal linguistic grounds, Martynov (1963) argues that borrowing between early Germanic and Slavic was mutual, but his alternative etymologies generally raise more problems than they solve.

Identifying the ultimate source of Germanic loanwords proves a challenge, since features differentiating East from West Germanic are few and not present in each and every etymon in question.⁴ Germanic loanwords are therefore classified primarily on a mix of cultural and lexicographic considerations, which, however, do not always yield conclusive results. Both Gothic and later West Germanic loanwords display a semantic pattern suggestive of an exclusively cultural borrowing relationship (Kluge 1913: 41–42).

29.1.4 Proto-Slavic and Iranian

Since Rozwadowski (1915) it has been argued on the basis of Slavic-Iranian lexical parallels that pre-Christian religious notions of the Slavs were remodeled according to dualistic concepts of the reformed Iranian religion. This concept was, however, rejected early on by Antoine Meillet (1926: 68). The close historical phonetic similarity between PSL and Proto-Iranian makes it hard to tell loanwords from inherited cognates, so that a clear decision on the matter of loanwords or inheritance is difficult (Andersen 2003: 48, Álvarez-Pedrosa 2014: 71). This is particularly true of the three main items cited: CS bogъ ‘god’, svętъ ‘holy’, divъ ‘strange’ (also used as a theonym in the Igor Tale; cf. Álvarez-Pedrosa 2014: 68), which are said to derive from Iranian. These words show no Iranian sound changes that would identify them as such. In the case of CS bogъ, however, the absence of lengthening of the stem vowel by Winter’s Law indeed favors a loanword account (Winter 1978: 442). If it was not the words themselves that were borrowed from Iranian, at least their shifted meanings seem to have been taken over.

29.2 Slavic within the Multilingual Empires of Europe

29.2.1 The Byzantine and Ottoman Empires and the Emergence of the Balkan Linguistic Area

The Balkan Linguistic Area (BLA) is usually not treated in connection with the successive multilingual empires of the Balkans, though the link strongly suggests itself given that the Balkans is the only European region that can boast a continuous imperial history from the Roman empire until the demise of the Ottoman empire. Stern (2013b) argues that the uninterrupted imperial social organization brought about a large-scale deterritorialization of languages in the Balkans that made bi- and multilingualism a common feature of daily face-to-face interactions for a significant part of the members of what might be termed the Balkanic speech community. But socio-historical research into the origins and preconditions of the BLA has never been particularly strong, its focus having always been philological and structural-linguistic. We refer the reader to the most comprehensive and influential introductions to the vast research field (Sandfeld 1930, Schaller 1975, Feuillet 2012, Asenova 2002, Friedman & Joseph forthcoming).

The BLA is a zone of linguistic convergence defined by considerable structural overlap between several genetically unrelated or distantly related and sociogeographically adjoining languages. The Balkan languages form a subset of the languages of the Balkans and comprise: Greek, Albanian, Aromanian, Daco-Romanian, Megleno-Romanian, Bulgarian, Macedonian, Judezmo, Romani, the Torlak dialects of Serbian, and, with some reservations, Turkish.

The commonly registered Balkan linguistic features pertain predominantly to the morphosyntactic domain, though the Balkan languages also share many phonological and lexical features. The lexical aspects of the BLA are clearly under-researched.⁵ As a matter of fact, cultural loanwords in the Balkan languages, which derive mostly from Turkish and to a lesser extent from Greek, show a different pattern of diffusion from the morphosyntactic and phonetic features of the BLA. Lists of Balkan linguistic features vary considerably among authors of surveys. We will restrict Table 29.1 to features which form the common core of all lists and which, moreover, are shared by Balkan Slavic.

Table 29.1 Balkan linguistic features

No.	Feature	BSlavic	BRomance	Albanian	Greek	Source
Phonology
1	Stressed central vowels	Bulgarian	+	+	−	Old Balkanian, Balkan Romance?, Slavic
Morphosyntax
2	Postpositional definite article	+	+	+	−	Unknown
3	Loss of infinitive	+ (remnants in dialects)	+ (remnants)	+ (restored in Geg)	+	Greek
4	Syncretism of gen and dat	+	+	+	+	Greek? Romance?
5	gen/dat enclitic as poss pronoun	+	−	−	+	Greek?
6	Analytic comparative	+	+	+	+	Romance?
7	(Invariant) particle for future tense based on ‘want’	+	+	+ (not in Geg)	+	Romance?
8	Evidential verb form	+	~ (‘resumptive’)	+	−	Turkish
9	Pleonastic use of object pronouns (object doubling)	+	+	+	+	Unknown
10	Syncretism of place and direction (ubi=quo)	+	+	+	+	Unknown
Lexical and semantic patterns
11	Numerals 11-19 as digit-on-ten	+	+	+	−	Slavic
12	Relative pronoun based on ‘where’	+	+	+	+	Unknown

The determination of the sources of individual Balkanisms has troubled scholars from the beginning, and with few exceptions remains open to debate.

Although the beginnings of linguistic convergence in the Balkans elude us, the entrance of Slavic into what became known as the Balkan Sprachbund or Balkan Linguistic Area can be assigned a fairly precise terminus post quem, namely the first successful intrusion of bands of Slavs in 548 across the Danube into Eastern Roman territory. It is notoriously difficult to date the emergence of any Balkan linguistic feature for any of the Balkan languages. There are, though, indications that some features, like the postpositive article, must have emerged before the end of the Old Slavonic period, when the reduced vowels in weak positions were still present, so that OSL člověkъ-tъ could yield Modern Bulgarian čovekă-tØ (Asenova 2002: 125). The latest linguistic Balkanism to affect Slavic was the emergence in Bulgarian of morphologically marked evidentiality, which would not pre-date the fourteenth century. The time range for the emergence of Balkanisms in Slavic may thus be assumed to span the whole period from the Slavic migrations well into the Ottoman era.

29.2.2 The Russian Empire and the Soviet Union

Among the multilingual empires that comprise Slavic languages, the Russian empire is clearly the youngest. Imperial expansion starts from the sixteenth century with the conquest of Siberia, though the spread of Russian among Finno-Ugric peoples west of the Urals from the eleventh century onwards (Leinonen 2006: 235) could be seen as prefiguring the colonial enterprise that turned the principality of Moscow into the Russian empire.

The Russian empire also differs with respect to the position Slavic languages take within its imperial linguistic ecology. The imperial language is itself Slavic (i.e. Russian). Ukrainian, Belarusian, and also Polish found themselves in a subordinate position not unlike that of Slavic in the Habsburg and Ottoman empires. Indeed, it was these three Slavic minority languages, which entered the imperial Russian dominion by two successive events (the annexation of left-bank Ukraine in 1667 and the Polish partitions at the end of the eighteenth century), that became subject to severe and restrictive language policy measures of the late Russian empire.

Russian itself remained largely unaffected by multilingualism within the confines of the Russian empire and later on the Soviet Union, with the exception of regionally restricted L2 varieties of Russian used by titular nations and minority groups. These varieties show a number of features that betray the local substrate, but also reflect features associated with second language acquisition, such as loss of gender agreement. None of these regional varieties has been systematically researched.⁶

In recent decades, essential changes to the usage norms and grammatical makeup of endangered indigenous minority languages have been observed under the impact of the cultural spread of Russian. Some of the changes, especially those entailing simplification or reduction of grammatical structures, may be due to general tendencies in language attrition, while others more or less clearly reflect direct impact of Russian (Grenoble 2000, Gruzdeva 2000, Kazakevič 2000, Nevskaja 2000, Leinonen 2006, Anderson 2005).

29.2.2.1 New, Contact-Induced Languages: Three Pidgins and One Mixed Language

Russian imperial expansion, especially where it took the form of a colonial frontier, could have led to the emergence of pidgins all along the paths of pioneering colonists who discovered and appropriated the North Asian land mass from the sixteenth to the eighteenth centuries. In fact, however, only for Taimyr Pidgin Russian and for the mixed language Copper Island Aleut can a truly colonial origin be claimed. Russenorsk and Chinese Pidgin Russian clearly originate from informal cross-border trade relations.

Taimyr Pidgin Russian (TPR) is a colonial frontier pidgin that bridged the linguistic divide between Russians and the nomadic peoples of the Taimyr peninsula (Nganasan, Enets, Nenets, Evenki, and Yakut). Its earliest attestations date from the second half of the nineteenth century; it went extinct only recently. It was first described in the 1980s by the Uralist Evgenij Arnol’dovič Xelimskij (1987, 2000). Further extensive research has been done by Stern (2005, 2009, 2012).

The lexicon of TPR is predominantly Russian. Its dominant SOV basic word order reflects the word order of its substrate languages Nganasan, Dolgan (close to Yakut), and Evenki. A most unusual peculiarity of TPR is the wholesale preservation of the verbal inflections of its Russian lexifier. Russian nominal grammatical morphology has been replaced by a simple contrast of uninflected bare nouns for grammatical core relations and nouns plus the postposition mesto ‘place’ for peripheral relations. The genitive/accusative forms of the personal pronouns (menja, tebja, ego) have been generalized for all syntactic contexts. The locative adverbial tut ‘here’ is used as a demonstrative and as 3rd sg personal pronoun.

Russenorsk (RN) facilitated barter trade between Russian merchants from the White Sea region and Norwegian peasant fishermen in northern Norway (Finnmark, Troms) from the eighteenth up to the beginning of the twentieth century. The main body of RN material was published by Olaf Broch (1930), who also provided a first in-depth analysis (1927). Ingvild Broch and Ernst Håkon Jahr were able to extend the text corpus and add new insights into the history and structure of RN (1981, 1984).

RN is one of the very rare dual-source pidgins. Its lexicon is made up primarily of Norwegian and Russian items. There is also a small group of items from Dutch and English, which is suggestive of the influence of a nautical jargon. The basic word order is SVO. A basic differentiation is made between nouns and adjectives marked by -a (fiska ‘fish’, bela ‘white’) and verbs marked by -om (robotom ‘work’). The preposition på, which has a double Russian and Norwegian etymology, is used as a generalized marker for peripheral case relations. The generalized forms of the personal pronoun are moja for the first, tvoja (alongside English ju) for the second person.

Chinese-Russian Pidgin (CRP) emerged as a trade pidgin at the trading-post of Kyaxta on the former Chinese-Russian border (now Mongolia) in the latter half of the eighteenth century. With the annexation of the Russian Far East in 1858, CRP underwent a functional shift from trade to colonial pidgin, giving rise to its Ussuri variant. A third variant came with the construction of the Chinese Eastern Railway (1896–1900) in Manchuria and the foundation of Harbin. With the disruption of Chinese-Russian relations in the 1930s CRP went extinct. The first grammatical treatment is Čerepanov (1853). Work on CRP has recently been taken up again by Elena Vsevolodovna Perexval’skaja (2008), who built her monogenetic argument for a Pan-Siberian pidgin largely on insights into CRP. A recent extensive overview of CRP is Shapiro (2012).

All three variants of CRP share basic features, though the Manchurian variety seems to stand somewhat apart. The basic word order of CRP is SOV, which is suggestive of an Altaic (probably Mongolian) substrate. Somewhat similar to RN, moja, tvoja, naša, vaša, and ego (for sg. and pl.) have become the generalized forms of the personal and possessive pronoun. A desinence -i serves as a generalized verb marker.

Copper Island or Mednyj Aleut (CIA) is a moribund mixed language that was in use as an in-group language among the inhabitants of Mednyj Island, situated in the Bering Sea roughly 100 miles off the shore of Kamchatka. Menovščikov provides a first description (1964). Additional material was collected in the 1980s by Sergej Golovko and Nikolaj Vakhtin (Golovko 1994, Golovko & Vakhtin 1990).

The mixing pattern of CIA is not found in any other mixed language. It shows a split between the nominal morphological complex, which is Aleut, and the verbal morphological complex, which is Russian. The lexicon is basically Aleut, with a modest number of Russian cultural loans added.

29.2.2.2 Between East and West: Suržyk and Trasjanka

Ever since the annexation of Belarusian and Ukrainian territories into the Russian empire, but particularly since industrialization with massive labor migrations from the countryside into cities after World War II, Ukrainian and Belarusian have been in everyday face-to-face contact with closely related Russian. The typical outcomes of these contacts, Russian-lexified speech varieties with a mixed Russian-Belarusian/Ukrainian phonetic and morphological matrix, came to be known as Suržyk in Ukraine and Trasjanka in Belarus.

Only recently has systematic research on Suržyk and Trasjanka gained momentum.⁷ It has emerged that Suržyk, and even more Trasjanka, resemble patterns of sociostylistic variation found in dialect-standard continua throughout most modern industrialized European nation states (Stern 2013a). In fact, many speakers of either Trasjanka or Suržyk do not perceive their language as mixed, but rather as a local variety of either Russian or Belarusian/Ukrainian (Kittel et al. 2010).

29.3 Written Language Contacts

Language contact is a key feature of the very beginnings of Slavic literature, which consisted almost exclusively of translations of Byzantine Greek clerical texts into (Old) Church Slavonic. Medieval translators applied more or less literal translation, leading to syntactic copies of the Greek model (Birnbaum 1996) and to frequent loan translations of Greek compound nouns (Molnár 1985). However, the rigid barrier between sacred and profane, which on the linguistic level took the form of diglossia (Uspenskij 1983, Lunt 1987), counteracted the spread of Greek loan features into registers of wider communication. Diglossia gradually dissolved, so that in the Polish-Lithuanian Commonwealth a lexically mixed register (Ruthenian, Polish, Church Slavonic) for confessional polemics and catechetic literature, known as prosta mova, emerged from the late sixteenth century onwards (Moser 2002).

The modern age saw an increasing westwards orientation of the cultural elites of Eastern European Slavic-speaking countries, most markedly in Russia. Access to refined Western culture was initially sought via the Slavic-speaking immediate neighbors to the West. Before the Petrine reforms paved the way for accessing Western culture through Western languages directly, a steady influx of lexical items from Polish into written Russian texts is observed throughout the seventeenth century (Kochmann 1967, Witkowski 1999), as are effects on syntax and style (Moser 1998).

With the Petrine reforms, direct loans of mostly technical terminology from Dutch (van der Meulen 1909, 1959), German (Bond 1974, Otten 1985), and to a lesser extent English (Wójtowicz 1993) entered Russian through translations, whereas the role of a refined model language for the higher classes was taken over by French from the mid-eighteenth century. The strong presence of French in the social life of Russian elites throughout the eighteenth and nineteenth centuries is overt in many Russian literary works of that period and can be traced in the syntax (Hüttl-Folter 1996) as well as in the semantic structures and phraseology (Smith 2006, Offord 2015) of modern literary Russian.

The Age of Nationalism also inspired large-scale attempts at language engineering in order to reslavicize those Slavic languages which had been subject to long-standing cultural and linguistic domination. Turkisms (Henninger 1990) and Germanisms (Thomas 1996) became major targets of linguistic purification, and Russian as the one language that never underwent cultural domination came to serve as the major source of reslavicization (Giger 2008: 126). Thus, Kollár’s idea of balanced Slavic mutuality (1844) involuntarily turned into a policy of replacing one model of cultural-linguistic domination by another one, which was believed to restore lost authenticity on the basis of genetic proximity. Slavicizing purification efforts focused primarily on the lexicon, but also included phraseology and grammar.

29.4 Outlook: Topics for the Age of Globalization

The age of globalization is marked by high international mobility, migrations, and resettlement of workforces and refugees alongside the establishment of English as the language of the globalizing world. In one way or another all these aspects of globalization have found reflection in recent research on Slavic linguistic contacts, but the overall amount of research done is still modest. The opening of borders in the post-socialist space has given rise to vibrant local cross-border economies, mostly within the informal sector, but only Russian-Chinese border traffic has so far attracted the attention of contact linguists (Fedorova 2012, Namsaraeva 2014, Oglezneva 2007, 2014, Stern 2015, 2016). Soviet and post-Soviet emigration to the West, especially where it entailed permanent resettlement of larger groups of Russian speakers, has attracted research on language maintenance and attrition (Zemskaja 2001; see also in this volume Chapter 31, on Slavic heritage languages). Perhaps the best-documented and researched case of a Slavic-speaking immigrant community is that of the Russian Germans or Spätaussiedler, who adopted a bilingual in-group practice of conversational code-switching alongside an emergent mixed code with a Russian matrix (Berend 1998, Brehmer 2007). Finally, the impact of Global English on Slavic languages has brought numerous studies addressing recent changes of language use and style that are underway in subcultures and specialized professions under the impact of English (Pfandl 2002, Rathmayr 2002, Mečkovskaja 2007). It is here that we perceive that while past contact is accepted as a fascinating ingredient of cultural history, present contact remains even for professional linguists a deplorable sign of cultural threat and decay (Griffin 1997, 2001, Gorham 2000).

30 The Slavic Literary Micro-Languages

30.1 Definition of the Slavic Literary Micro-Languages (SLMLs)

In addition to the major Slavic languages that are treated in any handbook on Slavic languages, there are additional Slavic languages – usually used by Slavic minority groups – with (to some extent) established traditions of literacy. In the past, these languages were often dealt with under dialectology, with a special note that a given dialect has a written form. The status of those minority languages is often disputed, in the sense that there are discussions – often politically oriented – on whether to give them the status of languages of their own, or of dialects of a language X, or some status between the aforementioned two (with or without a written form). A turning point in studying these phenomena was due to the Tartu-based linguist Aleksandr Duličenko (1941–), who introduced into Slavic studies the term Slavjanskie literaturnye mikrojazyki ‘Slavic literary micro-languages (SLMLs)’ in his seminal work of the same title published in 1981, and advanced the theoretical frame of SLMLs. Here the term ‘literary language’, traditionally used in Slavic linguistics, does not mean exclusively ‘a language of literature’, but rather means a ‘standard language’ with polyfunctional usage in a given linguistic community, including an aesthetic function for writing literary works (see Duličenko’s characteristics of SLMLs below). The idea of SLMLs can be found exclusively in Slavic studies, not in theoretical sociolinguistic literature (Stern 2018: 20). It has not been supported by all Slavists as a fact (see Section 30.7), but since the appearance of Duličenko’s above-mentioned monograph and his works in this field, the terminology has become more or less stabilized and accepted in some handbooks written by specialists (see Suprun 1989, Rehder et al. 1993, Mečkovskaja 2000, Okuka & Krenn 2002, Piper 2008, Večerka 2008, Siatkowska 2004, and many others).

According to Duličenko (2015: 572–573), an SLML is a language that has the following features: (1) It is one of the forms in which language exists – similarly to any ‘ethnic’ literary language (and unlike such linguistic forms as dialects or substandard vernaculars); (2) a micro-language is a written form of language and, as such, implements certain orthographic principles; (3) it is based on a specific dialect – like a majority of ‘ethnic’ literary languages; (4) it is characterized by normalization tendencies, that is, literary norms are developed at the phonological, grammatical [i.e. morpho-syntactic], and lexical levels; (5) [literary norms] may become stable and be further codified (i.e. established by way of normative grammars and normative dictionaries); (6) which, on the whole, is a consequence of the functioning of such a literary linguistic form in various spheres of life; (7) within the framework of an organized and socially maintained literary linguistic process (for a critique of this definition, see Section 30.6).

Duličenko (2015: 574) considers that the existence of the above-mentioned features distinguishes SLMLs from so-called literary dialects, in which no process of standardization can be observed and which often remain personal linguistic experiments. This characterization of SLMLs is rather close to that of major Slavic standard languages, though the overall quality is different. Another important fact is that, according to Duličenko (1981: 9), an SLML always co-exists with a major national language and the SLML hierarchically occupies a lower position in two respects: language-internally, the degree of standardization is less strict in SLMLs, while language-externally, the functional polyvalency is narrower in SLMLs.

In addition to the term SLMLs, in his newer publications Duličenko applies the terminology Malye slavjanskie literaturnye jazyki ‘Small Slavic Literary Languages’ (cf. Duličenko 2017) as a synonym for SLML. Other scholars who work on these languages, but without paying attention to the notion of a literary language (such as Breu 2018), often use the terminology without ‘literary’, that is, Slavic Micro-Languages (SML). The term SLML or SML is widespread nowadays not only among scholars, but also among language activists who often play the role of codifiers of an SLML. For instance, codifiers of the Bunjevac literary language and the Podlachian literary language call their languages ‘micro-languages’ in addition to the usual terminology ‘language’ (Popov & Kujundžić Ostojić 2019: 3, Maksymiuk 2018).¹

30.2 Changing and Varying List of SLMLs

The number of SLMLs varies over time and from scholar to scholar, not least because SLMLs not only appear but also disappear for various reasons. For instance, when Duličenko published his first monograph in 1981, it was inconceivable to consider Silesian as a literary micro-language, but Duličenko (2018: 4) now regards it as an SLML. On the other hand, West Polesian, which had a certain success in the late 1980s and early 1990s in Soviet Belarus and Ukraine, seems to have disappeared since the leader of the movement, Mykola Šyljahovyč, stopped his activity. In addition, the disappearance of an SLML does not always rule out the reappearance of the same or a similar SLML. For instance, Óndra Łysohorsky’s literary Lachian is counted as one of the specific SLMLs, but since Łysohorsky’s death in 1989, nobody has published anything in Lachian. However, today, after 30 years (as of 2020), Eva Tvrda, a Silesian poet, inspired by Łysohorsky, publishes her works in Lachian. Nevertheless, at this stage it seems to be impossible to find any features of SLMLs that could be applied to her Lachian. In 1981, Duličenko (1981: 10–11) counted 12 SLMLs with two unsuccessful attempts (Table 30.1). The 12 SLMLs’ approximate geographical distribution is as shown in Figure 30.1.

Table 30.1 SLMLs according to Duličenko (1981)

Linguonym	Genetic affiliation	Main existing areas
Vojvodina Rusyn	East Slavic/West Slavic	Northern Serbia, Eastern Croatia, Southern Hungary
Burgenland Croatian	South Slavic	The border area between Austria and Hungary, Western Slovakia
Molise Croatian	South Slavic	South Central Italy in the province of Campobasso
Prekmurje Slovene	South Slavic	Eastern Slovenia close to Hungary
Čakavian	South Slavic	Adriatic coast in Croatia
Kajkavian	South Slavic	Northern Croatia
Banat Bulgarian	South Slavic	Western Romania, North-Eastern Serbia
Kashubian	West Slavic	Northern Poland
Lachian	West Slavic	Border zones between the Czech Republic and Poland
East Slovak	West Slavic	Eastern Slovakia
Carpatho-Rusyn	East Slavic	South-Western Ukraine, Eastern Slovakia, South Eastern Poland^a
Resian	South Slavic	Friuli Venezia Giulia in Italy, close to Slovenia
Venetian Slovene (attempt)	South Slavic	Friuli Venezia Giulia in Italy, close to Slovenia
Aegean Macedonian (attempt)	South Slavic	Originally in Greece, but attempts were made in Romania and Poland

^a This last variety is referred to as the Lemko language. Some activists do not treat it as a kind of Carpatho-Rusyn, but as a separate language.

Figure 30.1 Approximate geographical distributions of SLMLs in Duličenko (1981)

Venetian Slovene and Aegean Macedonian are not included in Figure 30.1.

In his encyclopedic handbook, Duličenko (2003–2004) proposed 20 languages, including attempts. Table 30.2 lists only those which were added to the list in Duličenko (1981).

Table 30.2 Micro-languages added by Duličenko

Linguonym	Genetic affiliation	Main existing areas
Pomak (attempt)	South Slavic	Northern Greece
West Polesian	East Slavic	Southwestern Belarus, Northwestern Ukraine
Upper Sorbian	West Slavic	Eastern Germany (Bautzen)
Lower Sorbian	West Slavic	Eastern Germany (Cottbus)
Vič (attempt)	West Slavic	Lithuania close to Belarus
Halšan (attempt)	East Slavic	Lithuania close to Belarus

Among these, Pomak, West Polesian, Vič, and Halšan are phenomena that appeared after the publication of Duličenko (1981). The other difference is that the two varieties of Sorbian have been added to the list, which is disputable. As will be mentioned in Section 30.6, the linguistic stability of both varieties of Sorbian, with their continuous and established written tradition, and their sociolinguistic status are clear enough; this particular idea of Duličenko is therefore not supported even among those scholars who recognize literary micro-languages as linguistic phenomena.²

In his latest publication on this topic, Reference Duličenko, Duličenko and NomachiDuličenko (2018) proposes 20 SLMLs. The difference between the previous and current lists of languages consists in that Vič and Halšan are not mentioned any more, while Bunjevac (Northern Serbia) and Silesian (Southern Poland) are added. Reference Duličenko, Duličenko and NomachiDuličenko (2018) does not seem to cover all SLMLs and attempts. Among such, one could mention various attempts to (re-)codify literary Moravian (West Slavic) in the Czech Republic (Reference Šustek and DuličenkoŠustek 1998, Reference Osowski, Stern, Nomachi and BelićOsowski 2018 ), and literary Podlachian (East Slavic) (Reference Maksymiuk, Stern, Nomachi and BelićMaksymiuk 2018, Reference Mladenova2021 ). In the Balkans, there was an attempt to codify literary Gorani (South Slavic) that even gained an official status in Kosovo (Reference Nomachi, Duličenko and NomachiNomachi 2018, Reference Długosz, Greenberg and GrenobleDługosz 2020 ).

Only new attempts are mapped on Figure 30.2.

Figure 30.2 Approximate geographical distributions of SLMLs in Duličenko (2003–2004) and later publications

If Internet-based projects are included, one could mention the project of the Siberian literary language in the early twenty-first century. It was based on the northern Russian dialect spoken in Siberia and even some kind of a normative grammar was produced, but the project itself disappeared.³ Similar phenomena such as Katsian in Northern Russia could also be mentioned, but so far there does not seem to be any notable success in establishing a written form (cf. Baranova 2014).

In contrast to the above-mentioned recently emerging, rather unstable SLMLs and attempts, some SLMLs such as Burgenland Croatian or Vojvodina Rusyn are much more stable and have a longer history, comparable to that of the major Slavic languages.

It is important to note that there are attempts to create a literary dialect with an established norm, but without an intention to establish a language: Mazurian (West Slavic) (Szatkowski 2020) and Kurpian (West Slavic) (Rubach 2009, 2016, 2017, 2019). The codifiers of those dialects do not claim any official status or any linguistic separateness from Polish. However, the latter, the Kurpian dialect, already has a well-established normative grammar, dictionaries, textbooks, and other publications. Also, there is a local cultural society, Związek Kurpiów ‘The Union of Kurpians’, that has been promoting activity in the dialect.

Večerka (2008) offers a somewhat different list of SLMLs from Duličenko and his followers. In addition to those SLMLs in Duličenko’s open-ended list, as SLMLs Večerka counts the following seven defunct varieties: the Biblical Czech language (West Slavic) that was used among Slovaks in the eighteenth and nineteenth centuries; Jesuit Slovak/Trnava Slovak (West Slavic), in which some orthography manuals and textbooks were printed in the eighteenth century; Camaldolese Slovak (also West Slavic, named for an eighteenth-century monastery), which was the first attempt at a codified Slovak language based on the western dialect in the eighteenth century; Bernolakian Slovak (West Slavic), which was the first codified Slovak based on the western dialect in the eighteenth century; Slaveno-Serbian (South Slavic), which was a mixture of Russian, the Russian and Serbian recensions of Church Slavonic, and the Serbian vernacular (the Vojvodina dialect) in the eighteenth century; Transylvanian Bulgarian (South Slavic), which existed between the sixteenth and eighteenth centuries in present-day Romania; and Ruthenian/Prosta mova ‘Simple Speech’ (East Slavic), which was an official language of the Polish-Lithuanian Commonwealth in the sixteenth and seventeenth centuries. Although there is no universally accepted definition of SLMLs, many of Večerka’s additions are usually dealt with within the history of each major Slavic literary language (Slovak, Serbian, both Ukrainian and Belarusian for Ruthenian, respectively). In addition, not all these cases show features characteristic of the SLMLs discussed in Section 30.1; they will not be discussed in this chapter.

30.3 Classifications of SLMLs

In Section 30.2, the genetic affiliation of SLMLs was mentioned. Generally speaking, there is no difficulty in determining the genetic affiliation on the macro level, except for Vojvodina Rusyn, whose affiliation is disputed among scholars (West Slavic or East Slavic). Indeed, Duličenko (2011: 333, 2017: 630) offers a classification of SLMLs based on the traditional tripartition of the Slavic languages. However, the detailed status of each SLML is not always obvious. When an SLML is formed based on a transitional dialect between language X and language Y, its linguistic affiliation can often be disputed. For instance, the literary Lachian language was formed based on a transitional dialect between Czech and Polish. In this case there can easily appear three (even four) camps among scholars and activists with regard to the status of Lachian: Lachian is not a language but a dialect of Czech, or a dialect of Polish, or neither Czech nor Polish, or both Czech and Polish. The last option is implausible, but at least Óndra Łysohorsky, the founder and the codifier of the Lachian literary language, took the third position (neither Polish nor Czech); he intentionally set up an orthographic system which is a mixture of Czech and Polish to show the transitional status between Czech and Polish (Łysohorsky 1988: 816). There are many SLMLs that are difficult to uniquely classify genetically on the micro level.

In Slavic linguistics, there is another type of classification of SLMLs. It is also due to Duličenko and is based on the combination of an areal-geographical principle and an ethnolinguistic principle (Duličenko 2003–2004: 6, 2011: 334). In later works, Duličenko offers a different ordering of the groups, but the concept remains the same. The four groups are (1) Autonomous, (2) Insular, (3) Peripheral-Insular, and (4) Peripheral (or regional). According to Duličenko (2011: 323–324), each group is defined as follows.

Autonomous SLMLs: (a) they are autonomous both on the geographical and on the ethnolinguistic and genetic levels; (b) they have all possibilities needed for cultivating a norm of a literary language; (c) and for widening their functional spectrum (scope of application).
Insular SLMLs: (a) they emerge as a result of migration to different areas in other Slavic or non-Slavic lands at different periods of time; (b) geographically, they are detached from their own ethnolinguistic root; (c) on a genetic level, they retain fairly clear links to their original linguistic root; (d) as for the form of the literary language, they look like the Autonomous SLMLs, that is, norms are formed or being formed; (e) attempts to try to keep or widen their functional spectrum are observed.
Peripheral-Insular SLMLs: (a) these are languages which have moved beyond the boundary of the main area and found themselves partially in a different ethnolinguistic environment, and only state borders make them, relatively speaking, islands; (b) on a genetic level, just like Insular SLMLs, this group of languages keep clear connection with their ethnolinguistic root; (c) on the level of literary languages, work on normalizing them is underway in the same manner as Insular SLMLs.
Peripheral (or regional) SLMLs: (a) geographically, they are adjacent to their original ethnolinguistic array, that is, to the corresponding majority Slavic people and their language; (b) they retain clear genetic ties with it, but their separateness appears on the cultural and linguistic level (the local culture performed in a local dialect or dialects in line with the process of making a single literary language); (c) being used mainly in literary and artistic works and thereby forming a literary language, this group of literary languages is least standardized, and if we bear in mind that writers often write relying mainly on their own speeches and dialects, then these can be called polycentric literary languages. But even so, the functioning process of forming a literary language is directed here towards the formation of a koine.

Based on these four typological groups, SLMLs can be classified as follows (Duličenko 2018: 4–6):

Autonomous SLMLs: Upper Sorbian, Lower Sorbian, Kashubian
Insular SLMLs: Vojvodina Rusyn, Burgenland Croatian, Molise Slavic, Resian, Banat Bulgarian
Peripheral-Insular SLMLs: Carpatho-Rusyn, Aegean Macedonian, Pomak, Venetian Slovene, Bunjevac
Peripheral (or regional) SLMLs: Čakavian, Kajkavian, Prekmurje Slovene, Lachian, East Slovak, West Polesian, Silesian.
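
The grouping above can be read as a simple lookup table. The following is a minimal Python sketch, assuming only the membership given in the list above; the names SLML_GROUPS, GROUP_OF, and group_of are hypothetical and introduced purely for illustration.

```python
# Illustrative sketch only: group membership is copied from the list above
# (Duličenko 2018: 4-6). SLML_GROUPS, GROUP_OF and group_of are hypothetical
# names, not part of any published resource.
SLML_GROUPS = {
    "Autonomous": ["Upper Sorbian", "Lower Sorbian", "Kashubian"],
    "Insular": ["Vojvodina Rusyn", "Burgenland Croatian", "Molise Slavic",
                "Resian", "Banat Bulgarian"],
    "Peripheral-Insular": ["Carpatho-Rusyn", "Aegean Macedonian", "Pomak",
                           "Venetian Slovene", "Bunjevac"],
    "Peripheral": ["Čakavian", "Kajkavian", "Prekmurje Slovene", "Lachian",
                   "East Slovak", "West Polesian", "Silesian"],
}

# Invert the table so that each language points to its group.
GROUP_OF = {lang: group
            for group, langs in SLML_GROUPS.items()
            for lang in langs}


def group_of(language):
    """Return the typological group of a language, or None if it is not listed."""
    return GROUP_OF.get(language)


# Total number of languages across the four groups.
total = sum(len(langs) for langs in SLML_GROUPS.values())

print(group_of("Kashubian"))
print(group_of("Moravian"))
print(total)
```

Summing the group sizes gives 20, which agrees with the number of SLMLs proposed in Duličenko (2018). A one-to-one table of this kind is a simplification, since the assignment of a language to a group is not absolute.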

As has been mentioned in Section 30.2, all these categories are open-ended, and the classification cannot be absolute. Some attempts at creating SLMLs could be added into one of those categories. In addition, in some cases, an SLML does not always fit with a given category. For instance, West Polesian is linguistically much closer to Standard Ukrainian than to Standard Belarusian, though West Polesian can be dealt with as a distinct dialect of Belarusian. Therefore, following the linguistic facts, West Polesian could be classified as a Peripheral-Insular SLML. Kashubian perhaps should not be treated absolutely as an Autonomous SLML, because it could be seen as a Peripheral SLML considering the traditional placement of Kashubian in Polish linguistics.

30.4 Conditions and Factors Particularly Pertinent for Creating SLMLs

There are no universal conditions whose fulfillment automatically produces SLMLs, because in the emergence of SLMLs, as is the case with major literary languages, each case is rather specific and has experienced different political, administrative, and ethnolinguistic changes and different dialectal patterning with different cultures. However, there are some characteristics shared by all SLMLs. According to Duličenko (2015: 582–583), there are two kinds of features – obligatory and optional – which he regarded as conditions for creating SLMLs. The primary features are as follows:

the existence of people in a given society who recognize their own ethnic specificity, which leads them to their common ethnonym (ethnic endonym)
the existence of a linguistic separateness, that is, awareness of material difference between one’s own and other languages and awareness of homogeneity (uniformity) of one’s own language, based on which later on a literary language will be formed and a linguonym (i.e. self-name of a language) will appear
the social cohesion of the ethnolinguistic community, that is, community members’ awareness of their own affiliation to the same ethno-social organism
territorial isolation of the ethno-linguo-social community and in this connection a compactness of the environment.

These four features are essential and obligatory. The absence of one of them makes it difficult to create an SLML, according to Duličenko (2015).⁴ In addition to them, Duličenko (2015) recognizes secondary features which are not obligatory:

the existence of a pre-tradition of a literary language (more broadly, a cultural-linguistic pre-tradition) based on their own mother tongue and non-mother tongue
the existence of cultural-religious or cultural-national movements that acted, for example, in the form of Protestantism (the sixteenth century) and national revival (the second half of the eighteenth and mid-nineteenth centuries)
the existence of a subjective factor, which is usually expressed in the form of awakeners.

In this context, one could also mention political changes as a supplement to two of Duličenko’s secondary conditions. For instance, appearance of the Lachian LML was closely associated with the emergence of socialism, particularly that in the Soviet Union, as Łysohorsky saw an ideal future in the advent of socialism including the rise of proletarians and self-determination and that is why he started his linguo-cultural activities. Other cases are the most recent SLMLs and similar attempts. They became possible only after the political changes in the Eastern Bloc or during the collapse of the Bloc when democratization started to infiltrate into the (former) Bloc. In addition, it is important that the rise of nationalism in those countries stimulated this tendency. The attempts of Vič and Halšan in Lithuania, and of West Polesian in Soviet Belarus and Ukraine should be placed in this context. In connection with this, a further step was a political Europeanization of the former Eastern Bloc. The former Bloc countries saw it as almost imperative to become members of the European Union and cooperate with the Council of Europe, and therefore they had to accept the multilingualism that Europe has been promoting. In this context, Kashubian received official status in Poland in 2005, which influenced and activated the Silesian autonomy movement to pursue the same status in one way or another.

In addition to these conditions, one could also add the development of new technologies and their application to SLMLs for widening the sphere of usage. The most successful societies promoting their own SLMLs need to possess not only traditional media such as books, journals, and language handbooks, but also mass media such as newspapers and TV and radio broadcasting in SLMLs. Today, SLMLs without a presence on the Internet are not even imaginable; the Internet is vital for their activities, including the propagation of an SLML beyond the society associated with it (cf. Stern 2018: 85–123). For instance, the most important information about the Podlachian LML can be found exclusively on the website Svoja.org. The use of the Internet implies the existence of language users and some successful language transmission from older generations to newer.

30.5 SLMLs as Minority and Endangered Languages

In most cases, SLMLs are lesser-used languages and endangered, though the degree of endangerment varies from one SLML to another. The UNESCO Atlas of the World’s Languages in Danger mentions only Upper and Lower Sorbian, Kashubian, Carpatho-Rusyn, Vojvodina Rusyn, and Polesian (Salminen 2008: 36), while the electronic version of the Atlas includes Banat Bulgarian, Burgenland Croatian, Resian, and Molise Croatian (=Slavic), as of September 2021.⁵ This project is important for minority languages but has no binding legal force. According to the list of languages covered by the European Charter for Regional or Minority Languages (ECRML), the following seven SLMLs are included: Bunjevac (Serbia), Burgenland Croatian (Austria),⁶ Kashubian (Poland), Lemko (= one of the varieties of Carpatho-Rusyn) (Poland), Lower Sorbian (Germany), Ruthenian (= Rusyn) (Bosnia and Herzegovina, Croatia, Hungary, Romania, Serbia, Slovakia, Ukraine), and Upper Sorbian (Germany), as of May 2015.⁷ Since each European state that has ratified the Charter has to choose at least some measures to protect any minority language that it itself has recognized, the above-mentioned SLMLs are officially included.

This does not always mean that other SLMLs are not protected, but some of them are not treated separately as SLMLs.⁸ For instance, Bulgarian is protected in Romania and in Serbia, while Banat Bulgarian is not treated as a separate entity. Indeed, in the communist period, Bulgarian was already protected and taught to Bulgarian minorities in Romania, but the language taught in the whole country was the standard Bulgarian language. Speakers of Banat Bulgarian, though its standardized form equipped with a Latin script-based orthography is significantly different from Standard Bulgarian (cf. Mladenova 2021), had to study the standard variety in schools; in the long run it is therefore unclear whether the official status of minority language in Romania helps Banat Bulgarians to preserve and develop their linguistic heritage. The same is true for Resian and Molise Slavic in Italy, as they are protected simply as Slovene and Croatian. While in Serbia today Bunjevac is treated as a separate entity, not as a variety/dialect of Croatian or Serbian, outside of Serbia Bunjevac is mainly treated as a dialect of Croatian. These different treatments of the language–dialect dichotomy are based on different language policies of each state and, in some cases, on different linguistic identities of speakers, too.

In this context, one should note that Kajkavian and Čakavian LMLs are usually treated as dialects of Croatian and are therefore not treated as endangered languages, though there are various local initiatives to preserve the dialectal diversity in Croatia. The same can be said about Kashubian spoken in Renfrew County in Ontario, Canada. But in this particular case, the problem is that speakers identify themselves as Poles and their language as Polish as well.⁹

Naturally, in all the above-mentioned cases, near-personal projects or attempts at creating SLMLs with little visible success, as well as dialects whose speakers have no awareness of linguistic separateness, are neither mentioned in nor protected by laws. It is even impossible to specify how many speakers there are in these cases, as national censuses do not usually collect information about the number of dialect speakers.

In the context of protection and development of SLMLs as minority languages, contemporary Vojvodina Rusyn turns out to be a specific case. When Rusyns and their language were officially recognized in 1919 in the Kingdom of Serbs, Croats, and Slovenes, the Rusyns’ cultural organization, the Rusyn Popular Education Society, was immediately set up, and the Society published newspapers, journals, books, and textbooks, including a normative grammar which was taught in primary schools. Thus, in the very early phase of its codification, the polyfunctionality of newly codified literary Rusyn started its development. After World War II, Rusyns could take advantage of better social and legal conditions. The first Rusyn high school opened in 1945, followed by the first radio and TV broadcasting in 1949 and 1975, respectively. In 1977 one high school in Ruski Kerestur was launched in which all subjects in three classes have been taught in Rusyn since then. In 1981, the department of Rusyn language and literature was established within the Faculty of Philosophy at the University of Novi Sad and it has existed until today (cf. Fejsa 2010). This rather straightforward development is exceptional among SLMLs. As of 2021, there are only three SLMLs (Vojvodina Rusyn, Carpatho-Rusyn in Slovakia, and Sorbian) that have their own departments/institutes in higher education.¹⁰ However, this does not mean that the future of Vojvodina Rusyn or Sorbian is secured, particularly as the Lower Sorbian case clearly shows.

Establishing (more stable) standardized forms and their implementation in the educational system is an important strategy for the preservation and development of endangered languages, including Slavic minority languages. However, this strategy does not work well enough when a given language shows a high dialectal diversity. For example, Kashubian is being standardized based on the Central dialect, which is significantly different from Southern and Northern dialects. This means that Kashubs from Southern and Northern regions will eventually have to learn a new variety that is rather foreign to them.

30.6 SLMLs Pro et Contra: Critical Assessments of SLMLs as a Category and Alternative Approaches to Slavic Micro-Languages

Since the appearance of Duličenko (1981), there have been various kinds of discussion over SLMLs. Critiques of individual SLMLs are based on a language–dialect dichotomy. For instance, the Bulgarian scholar Ivan Kočev (1984) attacked Duličenko because Duličenko divided Banat Bulgarian from the corpus of the Bulgarian language. A more scholarly critique was made by Hanna Popowska-Taborska (1988), who claimed that Kashubian is nevertheless one of the Polish dialects, purely considering linguistic facts. Konstantin Lifanov (2018) considers that there has been no East Slovak LML, because each sort of writing in Eastern Slovakia that emerged in different periods of time has been individual not only in terms of its linguistic features, but also of its mechanism of emergence and sphere of functions. In addition, it seems that Kajkavian and Čakavian dialects of Croatian may not be justified as LMLs, at least today, as there are no clear or observable standardizing tendencies, though there were indeed rich traditions of written forms which can be labeled as literary languages, as Croatian scholars have described them in the history of Croatian.

There have been critiques about the theoretical aspects of SLMLs as well. For instance, Peter Rehder (1984–1985) criticized Duličenko for the claim that macro-languages and micro-languages are different not in their quantitative respects, but exclusively in their qualitative respects.

Criticizing the vagueness of the definition of SLMLs, which could also be applied to a regular literary language, Dieter Stern (2018) provides more concrete features to define SLMLs. Stern identifies six features by which LMLs can be distinguished from literary languages (Table 30.3).

Table 30.3 Six features of LML in contrast to national standard language

	Literary micro-language	National standard language
1	Second/supplementary language in writing	First, often only language in writing
2	Externally claimable	Not externally claimable
3	Regionally bounded	Supraregional
4	Subnational	National
5	In-group only	In-group/out-group
6	Writing necessarily an act of identity	Writing not necessarily an act of identity

(Stern 2018: 25)

According to Stern (ibid.), it is feature (2) that makes scholars include Sorbian in SLMLs.¹¹ In addition, Stern warns that the SLMLs defined by him can work only within the frame of a modern nation-state. Therefore, Renaissance poets’ works in Čakavian do not justify the existence of a Čakavian SLML now.¹²

Skorvid (2017) criticizes Duličenko and offers a different paradigm for SLMLs with different terminologies. On the one hand, Skorvid points out that Duličenko’s terminology includes quite different linguistic entities under the same terms ‘SLMLs’ or ‘SMLs’, which are vague and are interpreted in totally different ways by different scholars. On the other hand, Skorvid views as problematic the inconsistency of Duličenko’s usage of terminology even in his own publications, which has been pointed out and criticized by Henzelmann (2016), too.¹³

The most problematic issue is that since 2003, Upper and Lower Sorbian have been included in Duličenko’s list of SLMLs. However, they cannot in fact be placed on the same level and treated together with West Polesian or Lachian, which Skorvid calls literary idiolects. Skorvid considers that Sorbian, Vojvodina Rusyn, and Lemko in Poland should be called simply minority languages, while he suggests that Kashubian, Čakavian, and Kajkavian could be classified as regional languages.

Along similar lines to Skorvid, Vladislav Knoll (2017) criticizes the general vagueness of Duličenko’s paradigm of SLMLs and offers a different and detailed scheme to classify the cultural-written idioms that Duličenko has been calling SLMLs. Knoll differentiates five stages and defines them as follows:

Unwritten idiom (UI): a linguistic form that functions only in an oral form (including the sphere of an oral culture/literature)
Literary idiolect (LI): a linguistic form which a certain person uses, alone or with a group of confederates (in an oral and/or written form)
Literary dialect (LD): a linguistic form which possesses a written artistic literature (genres are typically limited), but is absent in the social sphere in a written form; ways of writing it are often not unified
Regional literary language (RL): a linguistic form that has a writing culture, which is limitedly used in artistic literature and in social life, but is characterized by its absence in some functional spheres
National literary language (LL): a linguistic form that is present in all available functional spheres.

Considering these features, Knoll summarizes his re-classifications of SLMLs as in Table 30.4.

Table 30.4 Knoll’s scheme for the re-classification of SLMLs

Writing	UI	LI	LD	RL	LL
Secondary	+	not obligatory	+	+	+
Artistic literature	(only in an oral form)	+	+	+	+
Social functions	−	−	−	limited	+

Knoll did not classify all the SLMLs that Duličenko listed, and different stages in the historical development of a single SLML could be placed in different cells. According to Knoll, however, Lachian is LI, while Sorbian, Burgenland Croatian, Vojvodina Rusyn, Kashubian, and Carpatho-Rusyn in Slovakia are characterized as RL. Čakavian, Kajkavian, Prekmurje Slovene, and East Slovak could be classified as LD.
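The five stages defined above can be read as a small decision procedure. The following Python sketch is only an illustration of that reading, not Knoll's own formalization: the class `IdiomProfile`, its three fields, and the order in which the tests are applied are all assumptions introduced here, and the sketch ignores the fact that one idiom may occupy different stages at different points in its history.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IdiomProfile:
    """Hypothetical feature bundle for one idiom at one point in time."""
    community_wide: bool      # used beyond a single author and a circle of associates
    written: bool             # has any written form
    written_social_use: str   # "none", "limited", or "full" (written use in social life)


def knoll_stage(profile: IdiomProfile) -> str:
    """Map a profile to UI, LI, LD, RL, or LL (assumed ordering of tests)."""
    if profile.written_social_use not in {"none", "limited", "full"}:
        raise ValueError("written_social_use must be 'none', 'limited', or 'full'")
    if not profile.community_wide:
        return "LI"   # literary idiolect: writing is not obligatory
    if not profile.written:
        return "UI"   # unwritten idiom: oral form only
    if profile.written_social_use == "none":
        return "LD"   # literary dialect: written literature, no written social use
    if profile.written_social_use == "limited":
        return "RL"   # regional literary language: some functional spheres missing
    return "LL"       # national literary language: all functional spheres
```

For instance, a profile with `community_wide=True`, `written=True`, and `written_social_use="limited"` is mapped to `"RL"`, the stage to which Knoll assigns Sorbian and Vojvodina Rusyn.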

31 Heritage Language Forms

31.1 Introduction

The complex political, economic, and social forces of the last few decades have impelled massive waves of migration from regions of Eastern Europe and the post-Soviet space, prompting a renewed spike in the continuous spread of Slavic languages to new locales documented throughout their history. Along the traditional path of three-generational language shift characteristic of immigrant settings, these varieties have been passed down as heritage languages, that is, languages acquired naturalistically, albeit to varying degrees, by bilingual children who most typically attain higher proficiency in the societally dominant language by adulthood.

This chapter presents a broad overview of the current research on the formal properties of Slavic languages developing in heritage language (HL) settings. Considering the scarcity of directly comparable data across the Slavic varieties, the chapter will take stock of and draw generalizations from a relatively narrow set of the most representative and amply researched structural changes in heritage Slavic grammatical systems, analyzed against the backdrop of the corresponding baseline systems (the latter notion encompasses both the homeland and first-generation diaspora varieties). Starting with morphosyntactic phenomena in the nominal and verbal domains, I will focus on the restructuring of the case and gender systems and shifts in the grammatical encoding of temporal distinctions through aspect and tense morphology (Section 31.2). At the levels of sentence organization and discourse structure, I will discuss word order change in fixed and variable syntactic configurations (Section 31.3). The concluding discussion in Section 31.4 will generalize over the recurrent themes of the surveyed body of work and identify directions for future studies in heritage Slavic linguistics.

31.2 Morphosyntax

As highly inflected fusional systems, Slavic languages are characterized by complex nominal and verbal paradigms and elaborate agreement marking. Considering that inflectional morphology often represents the primary locus of change attested in HL settings (Polinsky 2018), it does not come as a surprise that the majority of existing HL scholarship to date has been concerned with their core morphosyntactic categories: case, gender, and tense-aspect marking.

31.2.1 Case

Within the domain of nominal morphology, an area most susceptible to change in HLs (Benmamoun et al. 2013), grammatical case seems particularly vulnerable to pressures of unbalanced bilingualism. Most contemporary languages within the Slavic group (with the notable exception of Bulgarian and Macedonian) retain the Proto-Slavic case paradigm consisting of seven forms: nominative, genitive, dative, accusative, instrumental, prepositional, and (often substantially weakened) vocative, with considerable variability in the number of productive distinctions and their syntactic and semantic behavior across the Slavic family (see Chapter 7). This section draws on data from heritage Russian, Polish, Ukrainian, Serbian, and Croatian to identify some common properties of heritage Slavic case systems arising under different sociolinguistic scenarios of intergenerational language transmission.

Working with low- and intermediate-proficiency heritage speakers in the US, Polinsky (1997, 2006) documented differences between the six-case system of Modern Russian and its HL realization. Case grammars of the least fluent speakers were shown to have undergone reanalysis toward a binary opposition between the nominative and accusative (Polinsky 1997: 381), with the degree of case loss corresponding to the speakers’ level of proficiency. In the encoding of verbal arguments, where standard Russian relies on the nominative, accusative, and dative for the marking of subjects, direct objects, and indirect objects, respectively, heritage grammars were shown to incrementally merge the subject and direct object, both indicated by the nominative, while the accusative, if retained, was employed primarily for the marking of indirect objects. The resulting reduced system is schematized in (1), where the arrows point in the direction of change (Polinsky 2006: 220):

(1) Dative > Accusative > Nominative

In the encoding of adjuncts, the attested patterns of case marker use suggest a similar trajectory of shift toward the nominative, which eventually replaces all oblique cases for low-proficiency speakers. In these grammars, Polinsky (1997) argued, case ceases to be marked as a grammatical category, barring some lexicalized ‘chunks’ of structure. In instances where some of the peripheral cases are retained, their distribution once again seems to conform to a set pattern of underlying principles, including the replacement of the lexically governed genitive with the accusative or nominative (e.g. boitsja prestupniki ‘afraid of criminals-NOM/ACC’), loss of the genitive of negation in favor of the nominative (e.g. u nee net muž ‘she has no husband-NOM’), and loss of prepositional obliques (Polinsky 1997: 378–379; see example (3)).

In a subsequent study of narratives produced by a child and an adult heritage speaker, Polinsky (2008b) found, consistent with the generalization in (1) above, that fewer than half of the direct objects carried the expected accusative case marking. In the same vein, both the child (36 percent) and especially the adult (69 percent) failed to use the oblique cases in prepositional phrases, opting for the nominative as the most frequent replacement form (Polinsky 2008b: 153). Examples (2) and (3) illustrate these patterns.¹

(2)
i malčik idjot iskat’ ljaguška
and boy-nom goes look for frog-nom
‘And the boy goes looking for his frog’ (Polinsky 2008b: 153)

(3)

Ja s	babuška	s deduška	govorju	po russkom
I with	granny-nom	with grandpa-nom	speak	in Russian
‘I speak Russian with my grandparents’ (Laleko 2010a: 51)

It is noteworthy that, among the massive shifts documented in Polinsky’s data, the directionality of change in the HL seems reflective of tendencies already present in Standard Russian. For instance, loss of the instrumental and attrition of the genitive (Polinsky 1997: 375–377) both appear to be the more categorical instantiations of the variability in the contextual distribution of these forms in the baseline (e.g. the instrumental/nominative alternation attested with predicative nominals and adjectives; the optionality in the occurrence of the genitive of negation). In this light, it is no accident that the breakdown of the heritage case system appears to originate along the ‘cracks’ permeating the baseline system, rather than within its more stable areas. At more advanced stages of restructuring, however, even the core distinctions begin to disappear, as evidenced by changes in argument marking strategies and loss of preposition-governed obliques. Taken together, these findings raise the question of what specific factors condition the degree of restructuring of HL case systems and at what point along the path of these systems’ development such changes originate or become most prevalent.

Several studies have attempted to address these questions by investigating the acquisition of case inflections in child heritage speakers. Schwartz & Minkov (2014) looked at the Russian case system in simultaneous and successive Russian-Hebrew bilingual children aged between 36 and 42 months, in comparison to age-matched monolinguals, and found that the differences between bilinguals and monolinguals, and among bilinguals, were largely quantitative (2014: 85). In terms of error types, most notably substitution of nominative case and errors in oblique cases, bilinguals were qualitatively similar to monolinguals. The study identified the dative and prepositional cases in the singular and the genitive and prepositional cases in the plural as the locus of greatest difficulty for the bilingual acquirers, a result attributed to the relatively low frequencies (i.e. sparsity in the input) and high variability of these forms in Russian, along with negative transfer from Hebrew.

Focusing specifically on the bilingual acquisition of the accusative case inflection, Janssen & Meir (2019) compared data obtained from child heritage speakers of Russian in the Netherlands and Israel via production, comprehension, and repetition tasks. The bilingual children were overall less accurate than the monolingual controls, a pattern signaled by erroneous overextensions of nominative inflections and a lower sensitivity to the accusative case cues in the comprehension of OVS sentences. The authors attributed these results to the competition between forms in the HL and negative transfer from the dominant languages, both with sparse case morphology. The study also demonstrated facilitative effects of length of uninterrupted HL acquisition and amount of exposure at home on the production of the accusative case by child speakers.

In a follow-up investigation, Meir et al. (2020) drew on online eye-tracking data to further probe the production and comprehension of the accusative case morphology in child Russian-Hebrew speakers. Similarly to Janssen & Meir’s (2019) findings, the study showed that these early heritage bilinguals were less accurate in using the dedicated accusative inflections (e.g. kukl-u ‘doll-acc’) but were on target in producing the unmarked forms that exhibit syncretism between the accusative and the nominative (e.g. stol-ø ‘table-acc/nom’). Despite the lower accuracy in production, the bilingual children were able to use the case cue predictively, albeit with a delay compared to the monolingual children, when processing OVS sentences in Russian and exhibited an advantage over their Hebrew-speaking peers in integrating case morphology in their processing of OVS structures. In the authors’ account, these results show that bilingual online processing of weaker cues in one language may be aided by stronger cues in the other language, suggesting that the processing strategies in the two languages interact for predictive comprehension in bilinguals (Meir et al. 2020: 388).

In contrast to the above studies, Antonova Ünlü & Wei (2018) presented evidence of an overall successful mastery of the Russian case system in a longitudinal production study of a Russian-Turkish child between the ages of 2;11 and 4;0, with the highest accuracy rates across the recordings obtained for the nominative case (100 percent correct use), the lowest rates obtained for the genitive case (82 percent correct use), and the remaining accuracy rates ranging between 93 and 95 percent, a pattern quite similar to that reported for age-matched monolingual children (2018: 651). The two most problematic genitive case constructions attested in the corpus, genitive of negation and genitive with numbers and quantity words, represent areas of delays in monolingual acquisition and are therefore predictably prone to variability in bilinguals. In fact, problems with the use of case and number marking with numeral-noun and adjective-noun expressions were shown to persist even in adult heritage speakers, as evidenced by data from adult Russian-Hebrew bilinguals reported in Meir & Polinsky (2021). In rating sentences with possible and impossible agreement patterns, speakers with an early Age of Onset of bilingualism (AoO) demonstrated across-the-board problems in detecting ungrammatical inflection in both conditions, while speakers with a later AoO were challenged only by the numeral-noun forms. In the latter group, difficulties emerged in the use of paucal count forms, which formally coincide with the form of genitive singular in Russian (tri stol-a ‘three tables-m.pauc’) and the genitive plural forms, used with numerals 5 and above (pjat’ samoljet-ov ‘five airplanes-m.pl.gen’). According to the authors, these results could stem from broader problems with case morphology in heritage Russian under the influence of Hebrew and a pressure to streamline the complex Russian case system into a more economical system with fewer distinctions. In the domain of count forms, these distinctions are reduced to a single numerical representation, the genitive plural, chosen as a generalized count form in heritage grammars due to its more frequent occurrence.
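The contrast between the baseline count forms and the generalized heritage pattern can be made concrete with a short Python sketch. The first function encodes the standard Russian rule for a noun phrase in a nominative context; the second is only a schematic rendering of the generalization described above, and its treatment of the numeral 1 is an assumption made here, since the study summarized above concerns the paucal and genitive plural forms. The function names and the labels returned are illustrative.

```python
def baseline_count_form(n: int) -> str:
    """Noun form after cardinal numeral n in standard Russian (nominative context)."""
    if n < 0:
        raise ValueError("n must be non-negative")
    last_two, last = n % 100, n % 10
    if 11 <= last_two <= 14:
        return "gen.pl"            # 11-14 always take the genitive plural
    if last == 1:
        return "nom.sg"            # 1, 21, 31, ...
    if 2 <= last <= 4:
        return "gen.sg (paucal)"   # 2-4, 22-24, ...
    return "gen.pl"                # 0, 5-10, 15-20, 25-30, ...


def heritage_count_form(n: int) -> str:
    """Schematic heritage pattern: genitive plural as the single count form.

    Assumption: the numeral 1 keeps the nominative singular.
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    return "nom.sg" if n == 1 else "gen.pl"


for n in (3, 5, 12, 22):
    print(n, baseline_count_form(n), "->", heritage_count_form(n))
```

The two functions diverge only where the baseline calls for the paucal form (e.g. 3 and 22), which is where the later-AoO speakers were reported to have difficulty.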

A similar tendency toward case reduction is reported in Kozminska’s (2015) investigation of Polish spoken in the Chicago area of the US. Among other patterns of change, adult heritage speakers displayed a smaller range of case distinctions in their speech, often substituting one of the more functionally marked cases of the Polish seven-case system (dative, locative, instrumental) with a less functionally marked case (nominative, genitive, accusative), sans vocative. The following examples illustrate the occurrence of the less marked case (either the nominative, as glossed by the author, or the accusative) in place of the expected genitive in (4a) and a functional extension of the genitive to the locative context in (4b).

(4)

Case in heritage Polish (Kozminska 2015: 249)
a.	Po-szł-a	pilnować	t-e	dzieci.
	pf-go-past.3.sg.f	look after	def.pl.f.nom	child-pl.nom
	‘She started looking after the children.’
b.	Moj-a	mama	pracow-ał-a	w fabryk-i
	my-nom.f	mom-nom	worked-3.sg.f	in factory-gen
	ceramiczn-ej.
	ceramic-gen.f
	‘My mom worked in a ceramic factory.’

These patterns are reminiscent of those discussed earlier for heritage Russian. Apart from the overapplication of the unmarked cases, the extension of the genitive to obliques mirrors another previously observed pattern of change: emergence of a generalized oblique case – the prepositional in the instance of Russian (Polinsky & Kagan 2007) – that absorbs the functions of all other oblique cases. Considering the high frequency of the genitive in Polish, its occurrence in place of the instrumental, dative, and locative cases, as documented in Kozminska’s (2015) study, serves to illustrate this general principle.

Wolski-Moskoff’s (2019) intergenerational investigation of case morphology in Chicago-area Polish revealed proficiency-based differences in the shape of HL case systems. The more advanced grammars seemed to preserve the basic case distinctions, with some reduction in their functions and diminished frequency in the occurrence of the obliques. The latter tendency also showed up in the data from first-generation immigrants (particularly for genitive and instrumental), highlighting the link between divergent parental input and HL properties. Lower-proficiency speakers, on the other hand, manifested a qualitatively different case system, governed for some by the syntactic rules of English, as evidenced by the replacement of all obliques with the nominative and increased reliance on prepositions.

While still relatively scarce, analyses of the heritage Slavic case systems carried out outside of the US context yield a similarly complex set of results, with some findings pointing to their global reorganization in the heritage varieties while others highlight similarities in the use of case morphology by heritage and homeland speakers. In the former group, Laskowski’s (2014) work with child bilinguals in the Polish community in Sweden demonstrated significant case simplification, manifested in the expansion of prepositional constructions to mark case relations and reduction of the case paradigm on the principle of replacing the more functionally marked or ‘weak’ cases (dative, locative, and instrumental) with the less functionally marked or ‘strong’ cases (nominative, genitive, and accusative), culminating in the use of the nominative as the default form in the least proficient speakers. These findings corroborate the implicational case restructuring model previously put forward in Ďurovič’s (1983: 24) study with second-generation Serbo-Croatian speakers in Sweden, which arranges the cases in order from the least to the most vulnerable in the HL, as shown in (5).

(5) Nominative > accusative > genitive > locative > instrumental > dative > vocative

Subsequent work on heritage Serbian in Australia validated this model’s predictions in an English-dominant setting (Dimitrijević-Savić 2008), contributing to the overall pattern of findings that highlight the systematic nature of case reorganization in heritage bilinguals while also raising important questions about the extent to which HL case restructuring may be correlated with the presence of case as a grammatical category in the dominant language.
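One way to see what an implicational model of this kind predicts is to check a speaker's inventory of retained cases against the ordering in (5). The Python sketch below assumes a strict reading of the hierarchy, namely that retaining a more vulnerable case implies retaining every less vulnerable one; that reading, the function name, and the example inventories are illustrative assumptions, not part of the cited studies.

```python
from typing import Iterable

# Ordering in (5), from least to most vulnerable in the heritage language.
HIERARCHY = [
    "nominative", "accusative", "genitive", "locative",
    "instrumental", "dative", "vocative",
]


def is_consistent(retained: Iterable[str]) -> bool:
    """True if the retained cases form an unbroken initial stretch of HIERARCHY."""
    retained = set(retained)
    unknown = retained - set(HIERARCHY)
    if unknown:
        raise ValueError(f"unknown case labels: {sorted(unknown)}")
    flags = [case in retained for case in HIERARCHY]
    # Consistent only if no retained case follows a lost one.
    return flags == sorted(flags, reverse=True)


print(is_consistent({"nominative", "accusative", "genitive"}))  # True
print(is_consistent({"nominative", "dative"}))                  # False
```

Under this reading, an inventory that keeps the dative while lacking the accusative would count as a counterexample to the model.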

To explore the degree of heritage grammar permeability to cross-linguistic influence at the level of syntactic categories, Hansen (2018) drew on a corpus of semi-structured interviews and written essays by heritage Croatian and Serbian speakers in Germany. With respect to case marking, the study identified a strong trend for case replacement triggered by the case patterns of German. For example, the HL subject pronoun in (6) appears in the dative case, replicating the form expected in German, instead of the accusative required in Croatian.

(6)
Njemu je još više sram.
he-dat be-3.sg even more shame
‘He is even more embarrassed.’ (Hansen 2018: 22)

Other types of case substitutions in Hansen’s corpus are consistent with the previously documented expansion of the unmarked (or strong) nominative to non-nominative contexts, for example after the preposition na ‘on’ instead of the locative (e.g. na njemački ‘in German’), or its use instead of the accusative on the adjective modifying an accusative noun, as in (7).

(7)
one turski krovove
that-acc.pl Turkish-nom.pl roof-acc.pl
‘those Turkish roofs’ (Hansen 2018: 32)

These observations demonstrate that case restructuring in a HL setting is not limited to situations in which the dominant contact language lacks the grammatical category of case. As a case in point, the availability of a four-case system in German does not prevent case restructuring of the six-case systems of Croatian and Serbian, and although the similarity between the heritage and dominant languages in the use of case forms does increase, it is also clear that not all of the observed case shifts are driven entirely by cross-linguistic transfer.

A very different picture emerges in recent work on heritage Slavic languages in Toronto, Canada. Applying variationist analyses to a corpus of spontaneous speech from three generations of Russian, Polish, and Ukrainian speakers, Łyskawa & Nagy (2020) found few principled differences between the case systems of heritage and homeland varieties. All groups exhibited some instances of non-normative case occurrence (approximately 2 percent in homeland speakers and 8 percent in heritage speakers); however, controlling for contextual differences rendered the differences insignificant. Still, the analysis of case mismatches in the HL data pointed to broadly similar types of deviations from normative usage as those reported in prior studies. In particular, the most common instance of case replacement in all three varieties was a shift to the nominative; in heritage Polish and Ukrainian, a shift from the genitive (and, less frequently, the instrumental) to the accusative was also attested. In fact, the authors observed that the genitive-to-accusative shift attested in heritage and homeland Polish and Ukrainian replicates the historical development already completed in Russian (Łyskawa & Nagy 2020: 149), supporting the overall conclusion that the underlying principles of HL change are similar to those operating across the corresponding homeland varieties.

In the same vein, Isurin & Ivanova-Sullivan’s (2008) study of narratives from heritage learners in a college-level Russian language program in the US found no principled shifts in the use of case forms by heritage bilinguals; however, the authors did observe occasional substitutions of obliques with other obliques and functional changes with cases marking direction, location, and means (2008: 84). Taken together with the findings of the Toronto-based study, these results bring into focus the dimension of HL proficiency, associated with access to formal instruction and availability of a speech community, as an important predictor of the degree and nature of the restructuring (or preservation) in HL case systems.

In summary, the available studies on the acquisition and maintenance of grammatical case distinctions in heritage Slavic bilinguals encompass a broad range of experimental perspectives and span various points of the proficiency continuum. Under sociolinguistically favorable conditions that provide speakers with ample opportunities for HL use, the case systems are likely to remain on par with the corresponding homeland systems, with differences confined mainly to the peripheral areas and stemming from the application of globally similar pressures of language change, often accelerated in HL settings. In the absence of sustained input, however, HLs tend to develop reduced and reorganized paradigms of case, ranging from systems that absorb certain case patterns from the societally dominant language and regularize the existing ones according to their functional salience, to those where the function of case is restricted to signaling only the core grammatical distinctions (e.g. subjects vs. objects; arguments vs. obliques), and finally to those in which the entire system is collapsed and all nominal forms are retained as invariably nominative.

31.2.2 Gender

While our understanding of case in heritage Slavic is informed by cross-linguistic studies, heritage Slavic gender systems have so far been examined most systematically in relation to Russian. Like other Slavic languages, Russian has three grammatical genders (Chapter 7). In the nominative, feminine nouns typically end in -a or a palatalized consonant; neuter nouns end in -o, -e; and masculine nouns end in a non-palatalized consonant. These distinctions are reflected in agreement relations between nouns, their modifiers, and past tense verbs (see also Chapter 13).

This system has been shown to undergo varying degrees of reanalysis in HL varieties. Drawing on production and comprehension data, Polinsky (2008a) documented a shift in gender assignment strategies from those based on declension type in monolinguals to those governed by phonological principles in English-dominant heritage speakers. Intermediate speakers retained the three-way system, but grouped nouns into gender classes on the basis of nominal endings: nouns ending in a consonant were reanalyzed as masculine, nouns ending in a stressed -o as neuter, and all remaining nouns as feminine. Adhering to the same phonological criterion, grammars of low-proficiency speakers reduced the system even further by eliminating the neuter and retaining only the binary masculine–feminine opposition, with gender assignment carried out on the basis of whether the final sound is a consonant or a vowel. Thus, regardless of the degree of streamlining in the organization of the HL gender system, these results underscore its internal consistency at various points of the proficiency continuum.
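The two phonologically based assignment strategies just described are simple enough to state as rules. The Python sketch below is an illustrative rendering only: it operates on transliterated nominative singular forms, takes stress on the final vowel as an explicit argument, and assumes that a final palatalized consonant (written with an apostrophe) counts as a consonant. The vowel inventory, the function names, and the sample nouns are assumptions introduced here.

```python
VOWELS = set("aeiouyè")  # assumed vowel letters of the transliteration


def ends_in_consonant(noun: str) -> bool:
    """True if the transliterated form ends in a (plain or palatalized) consonant."""
    core = noun.rstrip("'’").lower()  # drop the palatalization mark, if any
    if not core:
        raise ValueError("empty noun")
    return core[-1] not in VOWELS


def intermediate_gender(noun: str, final_stressed: bool) -> str:
    """Three-way system described for intermediate speakers."""
    if ends_in_consonant(noun):
        return "masculine"
    if noun.lower().endswith("o") and final_stressed:
        return "neuter"
    return "feminine"


def low_proficiency_gender(noun: str) -> str:
    """Binary system described for low-proficiency speakers."""
    return "masculine" if ends_in_consonant(noun) else "feminine"


for noun, stressed in [("stol", False), ("okno", True), ("jabloko", False), ("kniga", False)]:
    print(noun, intermediate_gender(noun, stressed), low_proficiency_gender(noun))
```

On these rules a baseline neuter with an unstressed final -o, such as jabloko ‘apple’, falls into the feminine class in both systems, while a stressed-final neuter such as okno ‘window’ remains neuter only in the three-way system.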

Apart from changes in gender assignment principles, heritage grammars have been shown to exhibit different strategies of gender agreement. Laleko (2018) employed a series of acceptability judgment tasks to examine the agreement behavior of animate nouns, targeting the patterns of fixed (i.e. only masculine or feminine) and variable (i.e. either masculine or feminine) agreement that hold in this sub-domain of the Russian gender system. In the latter category, the study tested nouns denoting occupations or describing personal qualities (i.e. hybrid nouns such as vrač ‘doctor’ and common-gender nouns such as kollega ‘colleague’), whose gender reference may vary in context. Heritage speakers converged with the monolinguals in fixed agreement contexts (with the exception of formally opaque nouns like papa ‘dad’), but significantly underrated the referential agreement pattern with hybrid and common-gender nouns, treating these forms as invariably masculine or feminine in accordance with their morphophonological properties. In eliminating contextual optionality and thus reducing the processing costs necessary for resolving competition among conflicting gender cues, the bilingual grammars arrived at a more formally transparent gender system than the one instantiated in the baseline.

Several studies with child heritage speakers have underscored the facilitative effect of gender transparency in the acquisition of gender distinctions. For Russian-Norwegian bilinguals, Rodina & Westergaard (2017) and Mitrofanova et al. (2018) documented robust variation in the shape of the acquired gender system based on the amount and consistency of HL exposure, positively correlated with a greater accuracy in non-transparent contexts. At the very low end, the heritage bilinguals applied the masculine agreement pattern nearly exclusively (possibly under transfer from Norwegian), or reduced the system to the masculine–feminine opposition, in line with Polinsky’s (2008a) findings for adults, while children with the most exposure made monolingual-like distinctions. Turning to another language dyad, Brehmer & Rothweiler (2012) examined the acquisition of typical (i.e. formally transparent) and atypical masculine, feminine, and neuter nouns by Polish children in Germany and found systematic agreement errors with neuter and atypical forms. In these challenging contexts, children relied on morphophonological and, when available, semantic cues and exhibited a bias for masculine agreement, argued to serve as a default pattern in these early grammars. A similar tendency towards the overgeneralization of the masculine had previously been documented for other heritage varieties of Polish, including those developing under contact with Swedish (Laskowski 2014) and English (Lyra 1962, Polinsky 1995).

The issue of the potential impact of the gender system instantiated in the societally dominant language on the development of gender in the HL has been taken up in a number of studies. Several large-scale investigations have been carried out across various geographic locales, focusing on the same HL variety developing in contact with gendered and genderless systems. Overall, the results have not systematically corroborated the role of transfer as a reliable predictor of the emerging HL gender system, pointing instead to the importance of continuous and varied input in the HL. For instance, Schwartz et al. (2015) elicited adjectival gender agreement from preschool bilingual Russian-speaking children with English, Hebrew, German, and Finnish as the majority languages and found primarily quantitative differences between the bilingual children and monolingual controls, with similar error types found in both groups. Since the child bilinguals in the study came from predominantly Russian-speaking communities and displayed monolingual-like patterns of language development, transfer from the majority language did not prove to be a factor in these speakers’ emerging grammars of gender.

Drawing on elicited production data from Russian-speaking children in Germany, Israel, Norway, Latvia, and the UK, Rodina et al. (2020) examined formally transparent and opaque nouns – including, in the latter group, stem-stressed schwa-final nouns (neuter and feminine) and nouns ending in palatalized consonants (feminine and masculine) – and demonstrated that, despite difficulties with infrequent and non-transparently marked forms, the majority of bilingual children successfully acquire the three-way gender distinction. As in prior work, some children exhibited reduced gender systems (masculine–feminine) or lacked gender altogether (masculine only), underscoring the crucial role of input consistency in the construction of early grammatical representations in bilinguals.

In summary, the reviewed studies on the expression of the three-way Slavic gender systems in HLs paint a dynamic but coherent picture of change across various heritage speaker groups. At higher ends of attainment, heritage grammars tend to retain the three-way distinction, but reorganize the gender system to increase its formal transparency, resulting in a more predictable categorization of nouns into classes, and decreased reliance on referential factors in construing agreement relations in favor of formal features. In lower-proficiency speakers, the gender systems are reduced to a binary morphophonologically salient contrast or eliminated altogether.

31.2.3 Aspect and Tense

One of the most salient features of Slavic is the morphologically encoded grammatical category of aspect, manifested as a binary contrast between perfective and imperfective verb forms (see Chapter 10). Working with heritage Russian speakers at low and intermediate proficiency levels, Polinsky (1997, 2006, 2009) demonstrated that these distinctions no longer obtain for all verbs, with verbs mostly retained by these speakers as single-valued forms (i.e. invariably perfective or imperfective). In accounting for their distribution, researchers have suggested that aspectual morphology in these HL varieties serves to mark the lexical distinction between telic and atelic predicates,² rather than grammatical aspect proper (Polinsky 1997, Pereltsvaig 2005, Laleko 2010a).

Research with high-proficiency speakers has pointed to a more nuanced recognition of aspectual forms in these grammars, which nevertheless differed from the baseline systems in areas related to these forms’ contextual distribution. For example, Laleko (2010b) documented a reduced range of discourse-pragmatic functions associated with the imperfective aspect in advanced heritage Russian speakers, resulting in a system with greater transparency and less optionality in the selection of aspectual forms by these early bilinguals, as compared both to late Russian-dominant bilinguals (i.e. the HL speakers’ parents’ generation) and monolingual controls.

Apart from the marking of aspect, the encoding of temporality in Slavic HLs has also been investigated in relation to tense morphology (see Chapter 9). While tense is usually retained better than aspect (Polinsky 2009), some studies have demonstrated quantitative changes in the distribution of tensed forms in the HL. For example, in Laleko’s (2010a) elicited production experiment, heritage speakers of Russian in the US used overall more non-finite verb forms than baseline controls (53.8 percent vs. 40.6 percent), with a particularly high proportion of imperatives (8.7 percent vs. 1.7 percent), and underused past-tense forms (29.2 percent vs. 50.9 percent), which are more morphologically complex in Russian than present-tense forms due to the obligatory marking of aspect.

Using data from elicited narratives and an acceptability judgment task, Brehmer & Czachór (2012) examined the formation and use of two semantically equivalent variants of the analytic future tense in heritage Polish in Germany. The heritage speakers exhibited knowledge of the formal rules for both constructions; however, in quantitative terms they demonstrated a preference for the variant that replicates the German pattern (the combination of the auxiliary być ‘be’ and the main verb infinitive), also attested across other Indo-European languages, over the more typologically peculiar variant used in Polish (być ‘be’ followed by a participle inflected for gender and number). Apart from cross-linguistic transfer, this trend is favored by processing considerations that also account for Laleko’s (2010a) results discussed above: opting for the less inflected variant allows heritage speakers to lessen the burden associated with the selection and phonological integration of morphology.

31.3 Word Order

A typologically salient feature of Slavic languages is their flexible word order. A number of researchers have attested changes in word order principles operating in HL settings, both in constructions in which the order of elements is syntactically fixed and those where constituent placement serves to mark their discourse function. The majority of existing heritage Slavic studies have discussed changes in both of these domains primarily in relation to dominant language influence.

A good example of the former phenomenon is the placement of clitics in the South Slavic languages (see also Chapter 17). Drawing on sociolinguistic interviews with first- and second-generation speakers of Serbian in Australia, Dimitrijević-Savić (2008: 67–74) identified differences in the use and location of clitics in the speech of heritage bilinguals, compared to standard grammatical conventions. While clitics in Serbian typically occupy the so-called second position (i.e. occur after the first clausal constituent or after the first word within that constituent), heritage speakers often placed dative and accusative pronominal clitics after the verb (i.e. in positions where the direct and indirect objects are realized in English). A similar pattern of non-canonical clitic placement was documented in Ivanova-Sullivan’s (2019) longitudinal case study of a Bulgarian-English child observed between the ages of 2;0 and 4;0. Example (8) illustrates one of the earliest occurrences of a Bulgarian object clitic in the child’s production, appearing post-verbally instead of its expected preverbal position.

(8)
Iskam da vidja go.
want to see it-cl.m
‘I want to see it.’ (Ivanova-Sullivan 2019: 22)

Both studies also document instances of clitic omission and their replacement with full pronominal forms, suggesting that differences between the heritage and baseline grammars, when apparent, lie not only in the syntax of clitic placement, but also involve their formal and semantic properties. However, as demonstrated by the analysis of direct object production by Polish-German and Portuguese-German school-aged children in Rinke et al. (2019), contact with a cliticless language does not appear to affect the acquisition of object clitics in the HL, which is modulated instead by the accumulated amount of contact with the HL.
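The second-position generalization described above for Serbian clitics, and the post-verbal placement reported for heritage speakers, can be made concrete with a minimal sketch. The function names and the placeholder tokens (DET, NOUN, VERB, OBJ, CL) are illustrative assumptions rather than part of any cited analysis, and the sketch deliberately ignores clitic clusters, prosody, and the other factors that constrain real clitic placement.

```python
from typing import List


def second_position_placements(constituents: List[List[str]], clitic: str) -> List[str]:
    """Return the word orders allowed by the second-position generalization:
    the clitic follows either the first clausal constituent or the first
    word within that constituent."""
    if not constituents or not constituents[0]:
        raise ValueError("need a non-empty initial constituent")
    first = constituents[0]
    # Flatten the remaining constituents into a single word list.
    tail = [word for constituent in constituents[1:] for word in constituent]
    after_first_constituent = first + [clitic] + tail
    after_first_word = first[:1] + [clitic] + first[1:] + tail
    placements: List[str] = []
    for order in (after_first_constituent, after_first_word):
        sentence = " ".join(order)
        # The two options coincide when the first constituent is one word.
        if sentence not in placements:
            placements.append(sentence)
    return placements


def postverbal_placement(constituents: List[List[str]], verb_index: int, clitic: str) -> str:
    """Return the order with the clitic directly after the verb, i.e. the
    non-canonical pattern reported for heritage speakers."""
    words: List[str] = []
    for i, constituent in enumerate(constituents):
        words.extend(constituent)
        if i == verb_index:
            words.append(clitic)
    return " ".join(words)


# Hypothetical clause skeleton: [subject phrase] [verb] [object]
clause = [["DET", "NOUN"], ["VERB"], ["OBJ"]]
print(second_position_placements(clause, "CL"))
print(postverbal_placement(clause, 1, "CL"))
```

With a two-word initial constituent the first function yields two candidate orders, whereas the post-verbal function yields the single order that mirrors the position of full objects in English.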

Apart from the ordering of clitics, word order in Slavic is traditionally described as free, with various configurations of main clausal constituents deemed acceptable under the appropriate information-structural conditions (see also Chapter 17). To anticipate the discussion below, most published studies on syntactic change in heritage Slavic point to word order rigidification and strengthening of the predominant SVO pattern. However, since the majority of existing work has been conducted in an English-dominant setting (but see e.g. Laskowski 2014 for similar patterns in heritage Polish in Sweden), it is not yet possible to decisively determine the underlying factors driving this change.

Several studies have employed oral narratives to quantify the occurrence of canonical and non-canonical word order patterns in the naturalistic data. Looking specifically at subject–verb inversion (see example (9a)), Isurin & Ivanova-Sullivan (2008) and Ivanova-Sullivan (2014: 105–110) have documented a significant reduction in the use of VS orders by heritage Russian speakers, with no detectable effect of proficiency on the frequency of their use. In a different study, Laleko & Dubinina (2018) detected similar patterns of diminished use of VS orders (9 percent in the heritage group vs. 14 percent in the monolingual group) alongside a higher occurrence of canonical SV(O) orders in the heritage group (76 percent vs. 67 percent). Additionally, Laleko and Dubinina found that heritage and baseline speakers adhered to different principles in the use of dislocation constructions, which involve the movement of objects to preverbal positions as illustrated in (9b).

(9)

Word order in Russian (Laleko & Dubinina 2018: 194)
a.	Ljagušku	poceloval	princ.			OVS
	frog-acc	kissed	prince-nom
	‘A prince kissed the frog.’
b.	Ljagušku	princ		poceloval.	OSV
	frog-acc	prince-nom		kissed
	‘A prince kissed the frog.’

Dislocations occurred more commonly with pronominal and otherwise short (i.e. grammatically light) constituents in the HL, whereas for monolinguals the effect of grammatical weight was more likely to be overridden by information-structural requirements on the placement of constituents in accordance with their role in discourse (i.e. new vs. old information).

The effects of information structure on word order variation in heritage Russian in the US were investigated in Laleko (2022), who employed acceptability judgment tests to measure the effects of subject focus on the speakers’ ratings of SV(O) and (O)VS sentences. In line with the previously attested trends, heritage speakers over-accepted the canonical orders and under-accepted the inverted orders, a trend manifested most categorically in low-proficiency speakers. Additionally, while the monolinguals were equally sensitive to focus in intransitive and transitive conditions, only in the latter contexts did focus serve as a predictor of subject placement for high-proficiency heritage speakers. As discussed in Laleko, these results suggest that HL speakers do not experience a global difficulty with focus marking through word order, but possibly find it more taxing to utilize information-structural cues in contexts where other cues (such as the lexico-syntactic properties of the verb in the case of intransitive predicates) must also be integrated.

While most of the above studies account for the observed decrease in HL word order variability in terms of dominant language transfer, work conducted outside of the English-dominant environment suggests that influence from the contact language may affect the shape of the HL word order system without loss of word order flexibility. Looking at a set of written narratives from heritage Russian adolescents in Germany, Brehmer & Usanova (2015) did not find a lower rate of non-canonical constructions in the HL corpus, compared to monolingual data. However, the study did report a more frequent use of verb-final orders, a property of German subordinate clauses, in the HL essays. In fact, the heritage Russian writers applied this restricted German-like pattern more liberally by extending it to main declarative clauses, resulting in a novel, less pragmatically marked structure adopted into the HL.

In another study involving German as the contact language, Brehmer & Sopata (2021) examined the placement of auxiliaries and infinitive complements in complex verb phrases in heritage Polish, as illustrated in (10).

(10)

Word order in complex predicates in Polish (Brehmer & Sopata 2021)
a.	On	chc-e	złapać		ptak-a.
	he-nom	wants	catch-inf.pf		bird-acc
	‘He wants to catch a/the bird.’
b.	On	chc-e	go	złapać.
	he-nom	wants	he-acc.cl	catch-inf.pf
	‘He wants to catch him/it.’

In Polish, word order in these constructions varies depending on the pragmatic context, with both adjacent (10a) and discontinuous (10b) structures licensed by the grammar, whereas German only allows for the discontinuous placement of these elements. Using an apparent-time approach, the study examined data from elicited oral production and acceptability judgments in several age groups of HL speakers and documented transfer effects from German in both simultaneous and sequential bilinguals, manifested as an overuse and over-acceptance of the discontinuous pattern. The developmental trajectories in the acquisition of the relevant structures were further shown to depend on a complex interplay of factors, with a joint effect of delayed acquisition in childhood and attrition in adulthood documented for simultaneous bilinguals.

In summary, while syntactic convergence with the dominant language emerges as a prime direction of HL word order change based on the reviewed studies, more work is necessary to account for the mechanisms and outcomes of such change in a wider set of linguistic environments. The fact that HL grammars, like the grammars of all natural languages, are subject to language-internal pressures associated with diachronic change predicts the possibility of non-contact-induced developments in HLs that may produce novel patterns not fully mapping onto the dominant language template. Analyses of HL corpora, particularly those based on naturalistic observation, and experimental studies probing into the speakers’ judgments and processing would be essential for such investigations.

31.4 Conclusions and Future Directions

The chapter has offered a concise but comprehensive review of current research on grammatical properties of Slavic languages as HLs. Notwithstanding the expectedly high degree of variability in heritage speaker competencies observed across the surveyed studies, the many recurring motifs in this literature make it clear that the trajectory of change in heritage grammars undergoing restructuring is modulated by a set of common principles, many of which are already familiar from research on intergenerational language loss (Seliger & Vago 1991). Two such trends seem particularly robust in relation to the data discussed here. On the one hand, the inflectionally rich and highly synthetic Slavic languages represent a very clear illustration of the effects of morphological leveling in heritage grammar formation, a process manifested as reduction and regularization of the nominal and verbal inflectional paradigms and decrease or loss of allomorphic variation. The resulting streamlining of morphophonological distinctions often goes hand in hand with the development of analytic or periphrastic constructions to replace synthetic forms, innovations in the marking of grammatical relations, and reorganization of word order principles. At first glance, some of these changes seem to operate in the direction of convergence between the typological structures of the heritage and dominant languages, especially considering that the lion’s share of existing empirical studies have examined the formation of heritage grammars in close contact with the inflectionally rigid and predominantly analytic grammatical system of English.
Yet, several studies outside the English-dominant environment have demonstrated that greater morphological richness of the ambient language does not suffice to avert paradigm leveling in the heritage system, and that the internal principles of change are still operative in such systems alongside external, transfer-induced changes, resulting in a conglomeration of properties that require careful evaluation in relation to the structures of the relevant baseline and dominant languages as well as to the general principles of diachronic change.

On a larger scale, the predominantly fusional nature of Slavic morphology creates ideal conditions for observing linguistic changes motivated by the interactions between the principle of economy, which favors the development of grammatical systems with the least possible number of forms (this encompasses the leveling processes discussed above), and the principle of transparency, which serves to establish the most predictable and unilateral links between forms and meanings within a linguistic system (i.e. one meaning–one form). Effects of the latter principle seem particularly evident in processes manifested as elimination of (or delays in the acquisition of) opaque forms, reduction in the range of their meanings, and difficulties with processing underspecified, ambiguously marked, or optionally occurring elements, as demonstrated by numerous studies reviewed in this chapter. In situations of sustained societal bilingualism, these global pressures may surface in the heritage varieties as no more than quantitative tendencies affecting the distribution of forms across different contexts or variation in accessibility during online processing. In the more extreme cases, these processes could be accelerated to the level of linguistic representation and culminate in the elimination of certain grammatical categories from the HL or their full or partial adaptation to the structural template of the societally dominant language.

HLs can offer much to the study of language change and historical research more generally. As the brief survey presented in this chapter hopes to have demonstrated, the quickly expanding field of heritage linguistics is positioned perfectly at the intersection of distinct linguistic traditions and can fruitfully build on and inform a range of methodological and conceptual approaches to language study. While much progress has been made in recent years toward the development of the empirical and theoretical foundations of heritage Slavic linguistics, a major drawback of the existing body of knowledge is that it remains limited to only a few most well-represented Slavic languages, examined in a relatively narrow set of language contact situations. Some notable gaps concern HL development in Slavic-to-Slavic contact settings (e.g. Ukrainian in Poland or Serbo-Croatian in Slovenia), that is, in situations where the grammatical systems of the heritage and dominant languages are maximally similar, and contexts in which typologically-related non-dominant HLs develop simultaneously (e.g. early Russian-Ukrainian bilinguals in the US) or undergo koineization under prolonged inter-linguistic and inter-dialectal contact in established immigrant communities. Going forward, along with the advancement in the range of research methods employed for the study of HLs at various points on the continuum of language change, a key imperative is to expand the typological diversity of language dyads under investigation in order to draw broader conclusions about the overarching principles of grammatical development in Slavic languages as HLs in different socio-demographic niches and across a wider range of linguistic locales.

32 Scripts

32.1 Introduction

A script is a set of characters that have a certain relation to linguistic units. Every script belongs to a certain script type (Daniels 2009; called a ‘writing system’ by Coulmas 1999) based on its main level of representation: In logographic scripts (e.g. Chinese, Sumerian cuneiform, Mayan), a single character represents a word or a morpheme; in syllabaries (e.g. Hiragana, Cypriot, Cherokee), a single character represents a syllable; and in alphabets (e.g. Cyrillic, Latin, Greek), a single character represents a phoneme. Special cases are consonantal alphabets (or abjads, e.g. Arabic, Hebrew, Phoenician), in which vowels are not normally written, and syllabic alphabets (or alphasyllabaries or abugidas, e.g. Nagari, Ethiopic, Cree), in which vowels are represented as systematic modifications of consonant signs. However, writing systems as adaptations of scripts to a certain language (called ‘spelling systems’ by Coulmas 1996: 1381) never belong to a ‘pure’ type. Thus, for example, all existing logographic systems are actually logosyllabic (making use of the rebus principle to also write words independently of their meaning), and alphabetic texts often contain logograms like 〈&〉, 〈€〉, 〈%〉, 〈°C〉, etc. All scripts that have been used for Slavic languages are either alphabets or consonantal alphabets.

All types of writing systems exploit links to phonology. However, no existing writing system is a simple ‘transcription’ of the phonology of a language. This means that, although most of the languages of the world are never written down, in a language that happens to have a written form the writing system is an independent subsystem of that language. We call this subsystem the graphematics of a language. It works in a similar way to the phonological subsystem. Consequently, we need to distinguish between the systematic aspects of the system, which are referred to as graphemics, and the material aspects, which are referred to as graphetics. Graphemic material is conventionally enclosed in angle brackets 〈…〉, and graphetic material (following Fuhrhop & Buchmann 2009) in vertical bars |…|.

This chapter only treats general issues of the use and development of scripts for writing Slavic languages. The concrete spelling systems of the Slavic languages are the subject of Chapter 33.

32.2 Scripts Used for Slavic Languages

Six different scripts have so far been used for writing Slavic languages. The main ones are the Glagolitic, Cyrillic, and Latin alphabets. Apart from that, Slavic texts have also been written in the Greek alphabet and the Arabic and Hebrew consonantal alphabets.
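The typology from the introduction, together with the six scripts just listed, can be summarized as a small lookup table. The following is only an illustrative sketch: the enum and function names are assumptions introduced here, and the entries are limited to the example scripts named in this chapter.

```python
from enum import Enum


class ScriptType(Enum):
    LOGOGRAPHIC = "a character represents a word or a morpheme"
    SYLLABARY = "a character represents a syllable"
    ALPHABET = "a character represents a phoneme"
    CONSONANTAL_ALPHABET = "alphabet in which vowels are not normally written (abjad)"
    SYLLABIC_ALPHABET = "vowels are systematic modifications of consonant signs (abugida)"


# Example scripts as named in the chapter; not an exhaustive inventory.
EXAMPLES = {
    ScriptType.LOGOGRAPHIC: ["Chinese", "Sumerian cuneiform", "Mayan"],
    ScriptType.SYLLABARY: ["Hiragana", "Cypriot", "Cherokee"],
    ScriptType.ALPHABET: ["Glagolitic", "Cyrillic", "Latin", "Greek"],
    ScriptType.CONSONANTAL_ALPHABET: ["Arabic", "Hebrew", "Phoenician"],
    ScriptType.SYLLABIC_ALPHABET: ["Nagari", "Ethiopic", "Cree"],
}

# The six scripts reported as having been used for Slavic languages.
SLAVIC_SCRIPTS = ["Glagolitic", "Cyrillic", "Latin", "Greek", "Arabic", "Hebrew"]


def script_type(name: str) -> ScriptType:
    """Look up the main script type of one of the example scripts."""
    for kind, scripts in EXAMPLES.items():
        if name in scripts:
            return kind
    raise KeyError(f"script not listed in this sketch: {name}")


for script in SLAVIC_SCRIPTS:
    print(f"{script}: {script_type(script).name}")
```

Running the loop confirms the generalization stated in the introduction: every script used for Slavic falls under either the alphabet or the consonantal alphabet type.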

32.2.1 Before the Glagolitic Script

Before the actual invention of a Slavic script, as the monk Xrabr (ca. AD 900) writes in his treatise “O pismenexъ” (‘About the letters’), Slavs sporadically wrote in the Greek and Latin alphabets bez ustroenija (‘unsystematically’), and they used črьty i rězy (‘lines and cuts’) to write down the Slavic language(s). The unsystematic use of the Latin alphabet is attested, for example, in the Freising Manuscripts. The ‘lines and cuts’ have variously been identified as Germanic runes, as the (Turkic) Bulgar script, or as proof of a specifically Slavic pre-Christian script. However, no evidence has been found for Slavic written in Germanic runes: all attempts at reading runes as Slavic have proved unsuccessful (e.g. Leciejewski’s (1906: 68–73) reading of “ᛋᚨᛒᚨᚱ” on a bracteate from Wapno that is nowadays dated to the fifth/sixth century (Żak 1985) as a supposed given name ‘Zabaw’ – with the last letter allegedly being 〈ᚹ (w)〉 and no other attestation for the name – rather than as ‘Sabar’, which probably is an abbreviation of the well-attested Gothic and Old High German name Saba-ric (Nowak 2003: 299–300)); Macháček et al.’s (2021) suggestion that the runes 〈ᛏᛒᛖᛗᛞᛟ (tbemdo)〉 on a sixth-century bone fragment recently found in south-eastern Czechia were written by a Slav is pure speculation, since they merely constitute a part of the translingual runic alphabet sequence. The Bulgar script has not been deciphered yet, and so it is not impossible that, when we one day are able to read it, some of the texts in this script will turn out to be Slavic (Trunte 2019). The numerous reports about pre-Christian, specifically Slavic writing systems or ‘Slavic runes’, however, are entirely based on fakes or pseudoscience (or ‘amateurish linguistics’; Zaliznjak 2010).

32.2.2 The Glagolitic Script

The heatedly debated question of the nineteenth century about whether the Glagolitic or the Cyrillic alphabet is older and which of them was invented by Saint Cyril (Constantine) has been answered unequivocally: the oldest Slavic script is the Glagolitic alphabet (see Table 32.1), and it was invented by Cyril. Among the main arguments in favor of this are the fact that in palimpsests and glossed documents containing both alphabets, it is always the original text that is in Glagolitic while the newer text or the glosses are in Cyrillic, the tendency for the most archaic forms of Old Church Slavonic to be conveyed in Glagolitic documents, and the fact that it was the Glagolitic script that was originally called kurilovica ‘Cyril’s script’ (e.g. in Upyrʹ Lixyj’s colophon of 1047).

Table 32.1 Glagolitic and Cyrillic scripts

The debate about the ‘origin’ of the Glagolitic letters, however, is still ongoing. Several exogenic (‘natural’, ‘palaeographic’; Trunte 2019) theories have claimed an evolutionary development of the Glagolitic script from some other existing script, for example Greek (minuscule), Armenian, or some Oriental script, whereas endogenic (‘artificial’, ‘ideographic’; Trunte 2019) approaches have tried to understand the alphabet as created according to some underlying design principle, for example by combining the three Christian symbols cross, circle, and triangle (Tschernochvostoff [1947] 1995) or by connecting lines on a common grid (Jončev & Jončeva 1982). None of these approaches is convincing in its entirety. Therefore, most probably Cyril freely invented the Glagolitic letters, sometimes being inspired by theological ideas (e.g. when using the cross as the first letter of the alphabet or when designing the letters and , both of which show one part of the Holy Trinity enclosed by divine eternity, to form the abbreviation for ‘Jesus’, which can be read as descent from heaven and ascension) and sometimes using associations with other scripts he knew (e.g. ← Greek , ← Latin 〈P〉, ← Armenian , or ← Hebrew ).

Invented by Cyril, the Glagolitic script was first used for Cyril and Methodius’s Moravian mission, which began in 863 and continued after Cyril’s death in 869 until Methodius’s death in 885, after which their disciples were expelled from Moravia by Wiching, who was reinstated as bishop of Nitra by the Frankish adversaries of the Slavic liturgy. Cyril and Methodius’s followers mainly fled to modern Bulgaria and North Macedonia, maybe also directly to Dalmatia and Istria (cf. Birnbaum 1996); continuity with the Moravian mission has also been claimed for the Sázava Monastery near Prague (where the Prague Glagolitic Fragments were written in the eleventh century; cf. Čajka 2011: 41–45). From Bulgaria and North Macedonia, the script spread further to the region of modern-day Serbia, Bosnia and Herzegovina, and Montenegro (which, however, is only attested from Štokavian influences in Old Church Slavonic manuscripts like the Codex Marianus, the Gršković Apostle, or the Mihanović Homiliarium, as well as from individual Glagolitic letters in Serbian Cyrillic manuscripts, not from inscriptions), and to Kievan Rusʹ (cf. e.g. the Glagolitic inscriptions in Novgorod; Gippius & Mixeev 2022). The oldest manuscripts and inscriptions we have were thus written about a generation after the Moravian mission, though some very early texts (especially the Kiev Folia) show West Slavic and Catholic influences. Consequently, the ‘classical’ version of Glagolitic we know from these texts must already have been the result of a development from the first version of Glagolitic originally invented by Cyril (cf. Trunte 2004).
For example, the four ‘classical’ nasal vowel letters show clear signs of modifications (, which is a combination of , which does not exist separately, and , which might originally have been either the letter for or a general nasal vowel letter or diacritic); in the oldest texts we find a second /x/ letter next to canonical , the so-called ‘spidery x’ , whose original function is unclear (maybe it represented the archaic pronunciation [kʰ] of Greek , maybe [x] vs. for [ç] in Greek loanwords, maybe German [h], maybe it was even the original letter for , cf. Fuchsbauer 2021: 107f.); there is evidence that, unlike Cyrillic letters, which have numerical values only for units, tens, and hundreds, in Glagolitic there originally was also a fourth row of letters representing thousands (e.g. = 1000, = 9000), but this row has been preserved only fragmentarily.

Between the end of the ninth and the end of the twelfth century, the Glagolitic alphabet was gradually superseded by Cyrillic, first in eastern Bulgaria (Preslav) and later in North Macedonia (Ohrid). During this time, both scripts existed side by side, as, for example, Cyrillic glosses in Glagolitic manuscripts show. Obviously, in some places people already used the Cyrillic script for most purposes while holding on to Glagolitic for liturgical texts. When Glagolitic text production had ceased, Glagolitic texts continued to be copied into Cyrillic, which of course still required reading them. Moreover, some developments within Cyrillic seem to have had a ‘backward’ influence on Glagolitic, for example the differentiation of and and the reuse of (from Greek ) as a letter for the reflex of *tj (i.e. št in Bulgaria) might have led to the subsequent analogous differentiation of Glagolitic and and the reassignment of Glagolitic .

After the twelfth century, Glagolitic continued to be used by the glagoljaši (‘Glagolites’, Catholic priests reading mass in Croatian Church Slavonic) along the Dalmatian coast, on the islands, and in Istria, after whom the script received its modern name. In this region, Glagolitic developed a different shape, the so-called angular Glagolitic (see Section 32.3.1). In the fourteenth century, the Croatian Glagolitic tradition was also carried to Prague, where Charles IV founded the Emaus Monastery with Croatian monks in 1347, and from there to Oleśnica (Lower Silesia) in 1380 and to Kleparz near (nowadays in) Cracow in 1390, where Glagolitic texts were written for several decades (Żurek 2006). In Istria, Dalmatia, and especially on the Croatian islands, the Glagolitic alphabet continued to be written, also for mundane purposes and for the vernacular language (mostly Čakavian), in some places up to the nineteenth century. In the twentieth century, the script finally fell out of use, being supplanted by the Latin alphabet.

32.2.3 The Cyrillic Script

The Cyrillic alphabet is probably the consequence of the widespread knowledge of the Greek alphabet among educated Slavs in eastern Bulgaria. At the end of the ninth century, it was probably members of the Preslav Literary School in eastern Bulgaria that started to use Greek letters for all the sounds of Old Church Slavonic that could be expressed with Greek letters ( …; see the medium-shaded cells in Table 32.1), complementing them with those Glagolitic letters that represented non-Greek sounds, adapted to the Greek/Cyrillic style (; ; see the dark-shaded cells in Table 32.1; the light-shaded cells indicate letters of unclear/disputed origin). What resulted was the Cyrillic alphabet, which spread from Bulgaria to North Macedonia and to Serbia and, together with Church Slavonic texts after the Christianization of Rusʹ in 988, to the East Slavic area, so that it came to be used by all Slavs in the Byzantine sphere of influence – the so-called Slavia Orthodoxa (Picchio 1958). Very early on, alongside Church Slavonic for religious texts, mundane texts were also written in the Slavic vernaculars (e.g. the Novgorod birch bark texts going back to the first half of the eleventh century). Historically, the Cyrillic alphabet was also used by Catholics (especially the Franciscans) in Croatia and Bosnia and Herzegovina from the eleventh to the nineteenth century, in the sixteenth-century Protestant prints from Tübingen and Urach, and by Bosnian Muslims, who until the nineteenth century used the Cyrillic alphabet as their main script for everyday use (while the Arabic script was mainly used for certain types of literature). As a non-Slavic language, Romanian was written in Cyrillic from the first extant Romanian manuscript of 1521 to its gradual replacement with the Latin alphabet during the nineteenth century. (The Romanian Orthodox Church used Church Slavonic as its liturgical language from the thirteenth to the early eighteenth century.)
Many non-Slavic languages of the Russian Empire (e.g. Udmurt, Komi, Mari, Yakut, Even; including Aleut, Yupik, and Tlingit in Alaska) were written in Cyrillic as well, often for purposes of proselytization, but with early evidence dating back to the Middle Ages (e.g. Karelian in the Novgorod birch bark letters no. 292 and no. 403 from the thirteenth and fourteenth centuries). From the 1930s on, Cyrillization was also extended to languages of the Soviet Union that had formerly been written in other scripts like Arabic (e.g. Tatar, Kazakh, Chechen, Dungan) or Mongolian (e.g. Buryat, Kalmyk) and to further hitherto unwritten languages (e.g. Chukchi, Nivkh, Yukaghir). Cyrillic was also introduced to Mongolia in the 1940s and used, in the first decade or so after the 1949 revolution, for some languages in China (e.g. Uyghur). It has been replaced by the Latin alphabet for Romanian, Bosnian, and Croatian, and many former Soviet languages spoken outside the Russian Federation (e.g. Azerbaijani, Turkmen, Karakalpak); however, within the Russian Federation, an attempt to convert the Tatar language to the Latin alphabet was stymied by the State Duma, and Crimean Tatar had to return to the Cyrillic alphabet after the annexation of Crimea.

32.2.4 The Latin Script

As reported by Xrabr, the Latin alphabet was used to write Slavic from the beginning. The oldest evidence of such sporadic and unsystematic use of the Latin alphabet is provided by the Freising Manuscripts, written in Old Slovenian around the year 1000. However, because of the dominant role of the Latin language, the vernacular languages of Slavia Latina started to be written down in the Latin alphabet only in the fourteenth century (in Czech, Polish, Croatian, and Slovenian, in that chronological order) and thus much later than the vernaculars of Slavia Orthodoxa. The Latin alphabet has also been used for Belarusian and Ukrainian, and in the course of the twentieth century it came to be used alongside Cyrillic for Serbian and Montenegrin. Because of the international significance of the Latin alphabet, all Slavic languages written in Cyrillic are regularly transcribed into Latin letters for various purposes (from passports and international telegrams to tourist maps and text messages).

32.2.5 Other Scripts

Slavic-speaking Muslims have lived in two areas. Tatars, who had lived in the Grand Duchy of Lithuania since the fourteenth century, produced religious texts in Ruthenian and Polish written in the Arabic script – the so-called kitaby, which are attested from the seventeenth to the nineteenth century (but must in part have been written much earlier; cf. Reference MiškineneMiškinene 2001). In the Ottoman Empire, especially in Bosnia and the Sandžak, Serbo-Croatian-speaking Muslims produced aljamiado (alhamijado) literature in the Arabic script from the fifteenth century onward (cf. Reference KalajdžijaKalajdžija 2019). After the independence of Bosnia and Herzegovina, there have been attempts at revitalizing arebica (Reference SchlundSchlund 2020: 392–393).

Although Jews play an important role in many Slavic societies, and did so even more before the Holocaust, there are hardly any Slavic texts in the Hebrew script. This is because Slavic-speaking Jews have tended to use their in-group languages Yiddish, Ladino, and Hebrew for internal communication, and when writing in a Slavic language they usually use the same alphabet as the gentiles. Therefore, a number of Old Czech, Old Sorbian, Old Polish, and other Slavic glosses in eleventh- to thirteenth-century Hebrew texts (see the collection in Reference Kupfer and LewickiKupfer & Lewicki 1956, Reference MoskovičMoskovič et al. 2014, Reference Bláha, Dittmann, Komárek, Polakovič and UličnáBláha et al. 2015, Reference Pronk and KapetanovićPronk 2018) and a famous late-twelfth-century bracteate with the inscription 〈משקא קרל (mšk’ krl)〉 for Pol. Mieszko król ‘King Mieszko’ are about the only Slavic ‘texts’ in the Hebrew script. Whether the language varieties used in these glosses constitute a separate Judeo-Slavic language (also named Knaanic) akin to Judeo-Spanish (Ladino) or Yiddish, or just regular Czech, Polish, Sorbian, etc. in Hebrew letters, is a matter of debate.

Apart from probable but unpreserved attempts before the invention of Cyrillic, the Greek alphabet is used again today to write Pomak and Aegean Macedonian, two Slavic languages spoken in Greece. Notable examples of literacy in South-East Slavic dialects spoken in Greece are nineteenth-century manuscripts in the Greek alphabet like the Konikovo Gospel (Reference Lindstedt, Spasov and NuorluotoLindstedt et al. 2008) or the Kulakia Gospel (Reference Mazon and VaillantMazon & Vaillant 1938).

32.3 Graphetics: Development and Variation of Scripts Used by the Slavs

Just like languages, scripts can exhibit diachronic and diatopic variation. Consequently, within a script we can distinguish several glyphic variants (Reference BunčićBunčić et al. 2016: 22–24).

32.3.1 Glagolitic Letters

Since no manuscripts from the Moravian mission are preserved, the original shapes of the Glagolitic letters are a matter of debate. The Kiev Folia (probably from the early tenth century; Figure 32.1) are arguably the oldest preserved manuscript. They differ from the ‘classical’ Glagolitic in the large Old Church Slavonic manuscripts in several respects. Most notably, the letters, which have markedly different heights (the shortest being , the tallest and ), do not sit on a baseline but appear to be hanging down from a line. The letters and are not symmetrical yet, the is simply a cross, the lowest line of is not connected to the upper part, the left part of is rectangular rather than rounded, and the right part of and is significantly smaller than the left part.

Figure 32.1 Oldest form of Glagolitic

(Kiev Folia, tenth century, fol. 4r) (https://lccn.loc.gov/2021667731)

In all the Old Church Slavonic manuscripts the letters have a lot of curves and loops (see Figure 32.2). This version, which seems to have been optimized for writing with ink (cf. the additional loops and curves in Latin and Cyrillic cursives: for , for , for , for , etc.) is therefore commonly called round Glagolitic. In Croatia, the Glagolitic alphabet was often incised in stone, and for these inscriptions it was more practical to have straight lines and angular shapes. The resulting angular Glagolitic script was also used in manuscripts from Croatia (see Figure 32.3). One of its characteristics is an abundance of ligatures that reduce the number of largely redundant vertical lines (e.g. round > angular → ligature , > → , > → ). The angular variant was also used in prints from the late fifteenth to the twentieth century. For everyday use, a cursive form, called brzopis ‘quick writing’ in Croatian, developed and was used up to the nineteenth century (see Figure 32.4).

Figure 32.2 Round Glagolitic

(Codex Marianus, eleventh century, fol. 36r) (https://w.wiki/8DwH)

Figure 32.3 Angular Glagolitic

(Vrbnik Statute, sixteenth century, p. 2) (http://urn.nsk.hr/urn:nbn:hr:238:355082)

Figure 32.4 Cursive Glagolitic

(Sermon on Love for Enemies, 1790, fol. 3v) (http://urn.nsk.hr/urn:nbn:hr:238:201216)

32.3.2 Cyrillic Letters

The oldest Cyrillic liturgical texts are written in a variant called ustav, which is a solemn two-line script, in which all the letters have the same height, and ascenders and descenders only occur in the form of hairlines (Figure 32.5). A more casual variant was the poluustav, a four-line script in which ascenders and descenders are more pronounced and straight lines can more often be bent or tilted (Figure 32.6). Gradually, ustav was completely superseded by poluustav, which was also the variant used in print, for Church Slavonic to this day. Among the Catholics in Bosnia and Croatia, however, poluustav was replaced by a Western variant of Cyrillic, often referred to as Bosančica (and in Croatian as hrvatska ćirilica ‘Croatian Cyrillic’; Figure 32.7). Diatopic (and diachronic) variation was most pronounced in the multiplicity of cursives: The letter shapes in a text written in skoropisʹ can be used to date and locate manuscripts (Figure 32.8) and, in the case of Western Cyrillic, even to determine the religion of the writer (Reference BunčićBunčić et al. 2016: 198–200).

Figure 32.5 Ustav

(Codex Suprasliensis, eleventh century, fol. 28r) (https://w.wiki/8DwE)

Figure 32.6 Poluustav

(Cologne Manuscript of Isaac the Syrian, fifteenth century, fol. 2r) (Isaak 2015: 1)

Figure 32.7 Western Cyrillic cursive

(Poljica Statute, 1665, fol. 1r) (www.croatianhistory.net/etf/et04.html)

Figure 32.8 Ruthenian Skoropisʹ

(Debt bond, 1569; Russian National Library, Zinčenko collection No. 46) (https://expositions.nlr.ru/rusautograph/pismo/ustav/albom.php)

A major turning point in the development of Cyrillic was Tsar Peter I’s alphabet reform of 1708. By introducing the more frugal letter shapes of his graždanskij šrift (or graždanskaja pečatʹ) ‘civil type’ (Figure 32.9), which were modeled on contemporaneous roman type (cf. |а| → |a|, |е| → |e|, |ѧ| → |я|, etc.), and omitting all the diacritics, suspended letters, titla, and synonymous letters characteristic of church prints, he made printing for mundane purposes simpler and therefore cheaper (cf. the title of a news report from 1711: “Релѧ́цїа. ѡ҆ поведенїи бы́вшемъ въ а҆́рмеѣ Е҆гѡ Црⷭ҇кагѡ Вели́чества · Ма́їа съ трїдесѧ́тагѡ числа̀” in Old Cyrillic and “Реляція. О поведенїи бывшемъ въ Армее Его Царского Велїчества · маїя съ 30 числа” in civil type; see the facsimiles in Reference BunčićBunčić et al. 2016: 112). A fact that is often forgotten is that this reform also applied to handwriting, replacing the old skoropisʹ with graždanskoe pisʹmo ‘civil cursive’. Subsequently, the civil script was also adopted by the other peoples using the Cyrillic alphabet. Consequently, the former diatopic variety has been replaced by an almost uniform Cyrillic alphabet in print and relatively little variation between the national cursives. Exceptions are a few special letter shapes in Serbian, Montenegrin, and Macedonian Cyrillic (|б| and italic |г|, |д|, |п|, |т| vs. |б|, |г|, |д|, |п|, |т| for 〈б〉, 〈г〉, 〈д〉, 〈п〉, 〈т〉). Apart from that, a Bulgarian variant of Cyrillic with more independent small-letter shapes and more ascenders and descenders (e.g. |ϐ|, |ꙅ|, |ɡ|, |u|, |k|, |ʌ|, |n|, |m| for 〈в〉, 〈г〉, 〈д〉, 〈и〉, 〈к〉, 〈л〉, 〈п〉, 〈т〉) has been developing for several decades (Reference Kempgen, Tomelleri and KempgenKempgen 2015); however, while this variant is ubiquitous on Bulgarian posters, signposts, book covers, etc., it has only recently come into use as body text font.

Figure 32.9 Civil type

(Sanktpeterburgskie vědomosti, June 1711, p. 8) (https://w.wiki/8DwB)

32.3.3 Latin Letters

In contrast to Glagolitic and Cyrillic, the Latin alphabet has a long pre-Slavic history. However, Slavic texts only start to be written after the introduction of the Carolingian minuscule (in which e.g. the Old Slovenian Freising Manuscripts are written; see Figure 32.10). The development of the Latin alphabet during the Late Middle Ages is also reflected in Slavic manuscripts. Thus, Slavic manuscripts of the Gothic period are written in the textualis style (a kind of blackletter), for example the Polish Sankt Florian Psalter (Psałterz floriański) or the Slovenian Rateče Manuscript (Rateški rokopis; also Celovški rokopis ‘Klagenfurt Manuscript’; Figure 32.11), both from the late fourteenth century. In Dalmatia, meanwhile, the rotunda style that is also widespread in Italy can be seen for example in the Croatian Red i zakon of 1345 or the Vatican Prayer Book from around 1400 (Figure 32.12). After the introduction of letterpress printing, Slavia Latina was divided into a northern part (i.e. the West Slavic languages) using blackletter and a southern part (i.e. the South Slavic languages) using roman type (or antiqua; Figure 32.13) since the sixteenth century (after the first Croatian books – e.g. the Lectionary of Bernadine of Split, Venice 1495 – were printed in a rotunda typeface, and the two very first Slovenian books were printed in 1550 in Tübingen in Schwabacher). In the north, the two main variants of blackletter are Schwabacher (Figure 32.14) and Fraktur (Figure 32.15), the former having slightly rounder, simpler, and wider letters than the latter (compare |d|, |g|, |m|, |o|, |s| in Schwabacher with |d|, |g|, |m|, |o|, |s| in Fraktur). While German texts were printed in Schwabacher only at the beginning but then mainly in Fraktur from the latter half of the sixteenth century to 1941, Polish texts remained in Schwabacher until blackletter was replaced by roman type around the turn of the eighteenth century. 
Czech was also mainly printed in Schwabacher (although a few Czech texts were in Fraktur), and the abolition of blackletter only happened in the 1820s–1830s (Reference BunčićBunčić et al. 2016: 300–303). Sorbian texts, by contrast, were printed mainly in Fraktur (just like German texts). Roman type was only introduced in 1847 for Upper Sorbian and 1862 for Lower Sorbian, but Fraktur remained in use for Upper Sorbian until the Nazi period, whereas for Lower Sorbian blackletter was even revived after 1989 (Reference BunčićBunčić et al. 2016: 303–305).

Figure 32.10 Carolingian minuscule

(Freising Manuscripts, Slovenian part, late tenth century, fol. 160v) (https://w.wiki/8D$a)

Figure 32.11 Textualis

(Slovenian Rateče Manuscript, late fourteenth century) (https://w.wiki/8D$m)

Figure 32.12 Rotunda

(Vatican Croatian Prayerbook, around 1400, fol. 22v) (Lupić 2019: 59)

Figure 32.13 Roman type

(Marko Marulić, Judita, Venice 1586, fol. 3v) (http://urn.nsk.hr/urn:nbn:hr:238:371054)

Figure 32.14 Schwabacher

(Czech Grammar of Náměšť, 1533, p. 10) (https://doi.org/10.5282/ubm/digi.1032)

Figure 32.15 Fraktur

(Upper Sorbian New Testament translated by Michał Frencel, 1706, Matthew 1:1) (https://w.wiki/8E22)

The use of blackletter and roman type in print correlates with the use of a German cursive (Kurrent) or a roman cursive in handwriting. Although nowadays the German cursive is no longer taught in schools, every country teaches a different cursive variant in its schools, so that people writing a foreign language usually have an ‘accent in writing’ (Schriftakzent; Reference Meyer, Meyer and ReinkowskiMeyer 2018) because the letter shapes they learned at school are not the same as those that native speakers of the language learned. This also applies across scripts (e.g. Russian schoolchildren learn a specific cursive variant of the Latin alphabet in their foreign language classes).

32.4 Social, Cultural, Religious, and Political Influences

Being a part of language planning, the adoption or change of a writing system for a language is an action undertaken by people and as such subject to all kinds of extralinguistic influences. The phenomenon that ‘script follows religion’ (Reference SampsonSampson 1985: 16), though vastly overestimated, is responsible for the fact that the use of the Latin and Cyrillic alphabets almost exactly coincides with Slavia Latina and Slavia Orthodoxa, respectively. The Serbian speech community, which belongs to Slavia Orthodoxa but uses both alphabets, writes texts about Orthodox religion exclusively in the Cyrillic script.

Furthermore, we can see areal effects of power relations between scripts (cf. Reference CalvetCalvet 1999: 94–98). For example, the Bosnian Muslims have never relied exclusively on the Arabic alphabet but used the Cyrillic script (just like the Catholics and Orthodox in Bosnia) until the end of the Ottoman era and mainly the Latin alphabet since 1878. This change is part of a Latinization process in the Balkans (which might be seen in the context of a Europeanization or Westernization), which also led to the obsolescence of Glagolitic and Cyrillic for Croatian and to the introduction of the Latin alphabet to Serbian and its increasing use there (Reference BunčićBunčić et al. 2016: 321–324). In the Russian sphere of influence, by contrast, it was the Cyrillic alphabet that ousted the Latin alphabet that had been in use for Belarusian and (to a lesser extent) Ukrainian until the beginning of the twentieth century, and that was also exported to a great number of languages of the Soviet Union, Mongolia, and even China. The replacement of blackletter with roman type in Polish, Czech, and Upper and Lower Sorbian can be seen as a Slavization that gained momentum during the period of Slavic Revival, but also as a deliberate alienation from the German sphere of influence.

32.5 Biscriptality

In the course of history, many Slavic speech communities have employed more than one writing system at the same time. This phenomenon, which can be called biscriptality, can be broken down into 3 × 3 major types (Reference BunčićBunčić et al. 2016: 67): On the one hand, we have to distinguish between scripts, script variants, and orthographies. (Orthographic biscriptality is treated in Section 33.6.) On the other hand, the writing systems in parallel use can be distributed in three different ways: in an equipollent opposition, a privative opposition, or a diasituative distribution. (Most of the following examples are taken from Reference BunčićBunčić et al. 2016, where further references can be found.)

32.5.1 Scriptal Pluricentricity

In an equipollent distribution, both writing systems are used by different parts of the speech community, which differ from each other by place of residence, ethnicity, or religion (e.g. [Orthodox] vs. [Catholic]). Such a language has to be considered pluricentric (even if the varieties differ only in the writing system used). For example, in medieval and early modern Croatia, three alphabets were used to write vernacular texts: Glagolitic, Cyrillic, and Latin. However, contrary to Reference HercigonjaHercigonja’s (2006) picture of a unified ‘triscriptal and trilingual culture’ (tropismena i trojezična kultura), these alphabets were used by different parts of the population, depending primarily on the region where they lived and secondarily on their social status (e.g. the Latin alphabet was initially written by those people who had also learned the Latin language). Consequently, the Protestant Bible society in Tübingen in the 1560s printed books in all three alphabets in order to reach a maximum audience. By the nineteenth century, the Latin alphabet was virtually uncontested among the (Catholic) Croats, but the same Serbo-Croatian language was written almost exclusively in Cyrillic by the (Orthodox) Serbs, and the Bosnian Muslims used both the Cyrillic and the Arabic script (and changed from Cyrillic to Latin after 1878). In the twentieth century, the Latin alphabet came to be used by all parts of the speech community, but the Cyrillic alphabet is specific to the Serbs and Montenegrins, so that the choice of alphabet now depends on nationality. The Arabic script was also used by Muslims writing Belarusian, Polish, Bulgarian, and Macedonian, making those languages scriptally pluricentric during the time of this use as well. The sporadic use of the Hebrew script for West Slavic languages mentioned above happened at a time when these languages were hardly written in the Latin alphabet either, but the mechanism of course is the same. 
In nineteenth- and early twentieth-century Belarus, many people learned to read and write in confessional schools, that is, Catholics learned the Latin alphabet, and Orthodox the Cyrillic. This is why some publications appearing after the Russian Revolution of 1905 – most notably the newspaper Naša Niva and Taraškevič’s Belarusian grammar – were printed in parallel in both alphabets, to reach both parts of the population. An interesting case of scriptal pluricentricity is the pidgin Russenorsk, which was of course mainly a spoken language, but if it was written down, Norwegians wrote it in the Latin alphabet, and Russians in Cyrillic.

32.5.2 Glyphic Pluricentricity

Coming back to the Balkans, the melting pot of cultures, it is in Bosnia and Herzegovina that we can find three different variants of the Cyrillic alphabet used by three different religious groups: the Orthodox using Eastern variants of Cyrillic, poluustav and Serbian skoropisʹ, the Muslims using a specific handwritten variant of Western Cyrillic called begovica, and the Catholics using a form of Western Cyrillic introduced mainly by the Franciscans. Similarly, the Masurian Protestants used blackletter into the twentieth century, long after the Catholic Poles had switched to roman type. While civil type has become the only variant used by the majority of the Russians to write Russian, Old Believers continue writing Russian in Old Cyrillic (mostly by hand) to this day, often still detesting the civil type (though many modern Old Believers’ printed – and online – publications nowadays do use the civil script, alongside Old Cyrillic on book covers, posters, website headers, etc.).

32.5.3 Digraphia

The second kind of distribution of writing systems is the privative opposition. This is a functional distribution (e.g. [+ religious] vs. [− religious]) that often resembles diglossia, which is why we use terms modeled on the word diglossia to describe this. The people living in Poljica, a small region between Split and Omiš, from the twelfth to the mid-eighteenth century used the Glagolitic script for religious purposes but (Western) Cyrillic for secular texts. While this was a very stable situation lasting many centuries, there are many examples of short-lived digraphia in transitional phases, for example when Glagolitic was superseded by Cyrillic in Bulgaria in the ninth/tenth century, where the latter seems to have started to be used for non-liturgical purposes and the former continued to be used in liturgy for a considerable time.

The modern use of the Glagolitic alphabet in Croatia for exclusively decorative purposes can also be called digraphia, although it would be a very marginal case, and one might argue whether the Glagolitic alphabet fulfills a linguistic function at all and thus whether it is really used as a writing system.

For technical reasons (e.g. for writing e-mails or text messages), languages otherwise written in Cyrillic are nowadays sometimes (and even more so in the 1980s–2000s, when the technical restrictions were even more severe) written in Latin letters. Usually, this transcription is not very systematic, using di- and trigraphs (e.g. 〈ch〉 for 〈ч〉; 〈sh〉 for 〈ш〉, but often also for 〈щ〉) but also characters with similar shapes (e.g. 〈4〉 for 〈ч〉; 〈w〉 for 〈ш〉) and leaving the disambiguation to the context (cf. Reference BircerBircer 2004).
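The ambiguity of such ad hoc transcription can be made concrete with a small sketch. The Python snippet below is only an illustration: it encodes the four informal renderings mentioned above and inverts them, which shows that 〈sh〉 alone cannot be resolved without context. The names `RENDERINGS`, `invert`, and `ambiguous` are hypothetical, and the table is far from a complete inventory of informal Latinization.

```python
from collections import defaultdict

# Informal Latin stand-ins for Cyrillic letters, limited to the examples
# given in the text (an illustrative subset, not a full transcription scheme).
RENDERINGS = {
    "ч": ["ch", "4"],
    "ш": ["sh", "w"],
    "щ": ["sh"],
}


def invert(renderings):
    """Map each Latin stand-in to the Cyrillic letters it may represent."""
    inverse = defaultdict(list)
    for cyrillic, latin_forms in renderings.items():
        for latin in latin_forms:
            inverse[latin].append(cyrillic)
    return dict(inverse)


def ambiguous(readings):
    """Return only those stand-ins that have more than one Cyrillic reading."""
    return {latin: cyr for latin, cyr in readings.items() if len(cyr) > 1}


READINGS = invert(RENDERINGS)

for latin, cyrillic in ambiguous(READINGS).items():
    print(latin, "->", " or ".join(cyrillic))
```

In this toy table only 〈sh〉 is ambiguous; a reader has to decide between 〈ш〉 and 〈щ〉 from the surrounding word, which is exactly the disambiguation by context described above.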

32.5.4 Diglyphia

After the introduction of civil type in Russia by Tsar Peter I in 1708, the Old Cyrillic script variant continued to be used for religious texts, so that there was a functional distribution of the two variants. An interesting case is that of the ‘political sermons’ (simply called slova in Russian) of Feofan Prokopovič (1681–1736): as sermons, they were printed in Old Cyrillic by the Synodal Press, but as political speeches, they were printed in civil type by the Academy of Sciences (Reference BunčićBunčić et al. 2016: 114f.). This situation of diglyphia lasted until the 1760s; since then, Russian texts have been exclusively printed in civil type, while Old Cyrillic type has been reserved for the Church Slavonic language. Where Old Cyrillic fonts are used for Russian texts today, it is always for textual functions like emphasis or to convey certain associations, not in the context of a sociolinguistic distribution of script variants.

32.5.5 Bigraphism

The third type of distribution of writing systems is diasituative distribution, where the choice depends on a multitude of sometimes contradictory factors, mainly concerning indexical values ascribed to the writing systems. The prime examples of bigraphism are Modern Serbian and Montenegrin. Every Serbian and Montenegrin child learns to read and write their native language in Cyrillic and Latin in first and second grade, and after that, both alphabets are employed seemingly at random. The main factors influencing the choice of script are the indexical values associated with the two alphabets: Cyrillic is associated with values like tradition, home, Serbian culture, slowing down, or diligence, whereas the Latin alphabet is associated with values like modernity, business, Western culture, speed, or efficiency. The fact that these values are not mutually exclusive, with all of them often being – to varying degrees – related to the same situation, means that, in contrast to functional distributions like digraphia, people actually have a choice of which script to use in a certain situation. However, in contrast to pluricentricity, the same text is never written in both scripts in parallel, because every reader can read both scripts.

A situation similar to that in Modern Serbian and Montenegrin also existed in sixteenth- to eighteenth-century Ruthenian, although the quantity of texts produced in Cyrillic and Latin was not as balanced, with the number of Cyrillic texts far exceeding the number of texts in the Latin alphabet. Renewed attempts in the nineteenth century to introduce the Latin alphabet for writing the Modern Ukrainian language in Galicia led to public discussions in 1833–1837 and 1858–1859 that have been called ‘alphabet wars’ (Ukr. azbučna vijna, cf. Reference LesjukLesjuk 2014: 417–492; following Čop’s name for a similar phenomenon in Slovenia, see Section 33.6.3).

Bigraphism is also typical of linguistic minorities with a different writing system than the majority. For example, Rusyns in Slovakia in part use the Latin alphabet next to the traditional Cyrillic alphabet because some Rusyns read Cyrillic only with difficulty or not at all. The co-existence of Old Church Slavonic documents in Glagolitic and Cyrillic might also be a sign of bigraphism, though we lack data to assess the exact sociolinguistic situation and to exclude a privative or equipollent distribution.

32.5.6 Biglyphism

The Sorbian languages were characterized by a similar distribution of blackletter and roman type from 1847 (for Upper Sorbian) and 1862 (for Lower Sorbian) to 1938. The specificity of this situation of biglyphism was that both script variants were associated with domestic or foreign values by different parts of the population: the more educated Sorbs associated roman type with Sorbian culture within the framework of Slavic National Revival and blackletter with German hegemony, whereas many common people saw blackletter as the traditional Sorbian script variant and roman type as a sign of Czech or – among Lower Sorbs – Upper Sorbian dominance.
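The typology worked through in Sections 32.5.1 to 32.5.6 can be summarized as a lookup table that pairs the level at which two writing systems differ with the way they are distributed. The Python sketch below is merely one convenient way of writing the grid down; the identifiers `TYPOLOGY` and `classify` are hypothetical, and the orthographic row is deliberately left out because its terms are introduced only in Section 33.6.

```python
from typing import Optional

# (level of difference, type of distribution) -> term used in this chapter.
# Only the six cells named in Sections 32.5.1-32.5.6 are filled in.
TYPOLOGY = {
    ("script", "equipollent"): "scriptal pluricentricity",
    ("script", "privative"): "digraphia",
    ("script", "diasituative"): "bigraphism",
    ("script variant", "equipollent"): "glyphic pluricentricity",
    ("script variant", "privative"): "diglyphia",
    ("script variant", "diasituative"): "biglyphism",
}


def classify(level: str, distribution: str) -> Optional[str]:
    """Return the term for a cell of the grid, or None if it is not covered here."""
    return TYPOLOGY.get((level, distribution))


# Examples taken from the text:
# Modern Serbian, Cyrillic vs. Latin, choice driven by indexical values.
print(classify("script", "diasituative"))
# Russian 1708-1760s, civil type vs. Old Cyrillic, split by function.
print(classify("script variant", "privative"))
```

The first call yields the term for the Serbian case and the second the term for eighteenth-century Russian; any query on the orthographic level returns `None` in this sketch.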

32.6 Script Reforms

The writing system as the most ‘visual’ subsystem of language, and a subsystem that has to be consciously learned (nowadays usually in school), is much more accessible to language planning than most other subsystems of language. Consequently, apart from slow, evolutionary developments that are characteristic of language change in general, writing systems are often characterized by abrupt, revolutionary changes on the basis of writing reforms.

All Slavic languages have been the object of such writing reforms. However, changes of script have always occurred gradually, for example the replacement of Glagolitic and Cyrillic with the Latin alphabet for Croatian, or the introduction of the Latin alphabet to Serbian. When in Belarusian the 1933 Narkamaŭka orthography, in contrast to the 1918 Taraškevica orthography, included only the Cyrillic alphabet, this was only the formal recognition of a process that had already begun in 1912 when Naša Niva discontinued its Latin-alphabet edition. All attempts at changing the script of a Slavic language by reform have been unsuccessful: three proposals to latinize the Ukrainian language (in 1834, 1859, and 1923; cf. Reference BunčićBunčić et al. 2016: 280), the 1930 proposals for a Latin alphabet for Russian (cf. Reference Alpatov, Tomelleri and KempgenAlpatov 2015), an attempted Cyrillization of Polish by the Russian authorities after the January Uprising of 1863/64, and even the declaration in the Serbian constitution of 2006 that only the Cyrillic alphabet is official, which did not change anything about the status of the Latin alphabet in the everyday life of the Serbs.

This is different with script variants. The most well-known Slavic example of a reform directed at glyphic variants is Peter I’s 1708 introduction of civil type for Russian, which was successful not only in Russia but eventually also in all the other languages written with Cyrillic letters. The reform that introduced roman type to Upper and Lower Sorbian in 1847 and 1862 was equally successful, although blackletter continued to be used for a long time in Upper Sorbian and is even used to this day in Lower Sorbian. In the other West Slavic languages, blackletter was phased out more slowly.

32.7 Conclusion

The Slavic scripts roughly align with the cultural division between Slavia Orthodoxa and Slavia Latina. Although the Cyrillo-Methodian tradition of the Glagolitic script connected to the Old Church Slavonic language was originally not confined to either of the two areas and the Glagolitic script was used longest in Catholic Croatia, it is nowadays continued in the form of the Cyrillic script, whereas the Latin alphabet in Slavia Latina is based on a completely different, ‘Western’ tradition. However, mutual influences abound and can be seen in script changes and various instances of biscriptality as well as in the introduction of roman type in the West in the sixteenth century, the adoption of its design principles in the ‘civil type’ in the East in the eighteenth century and the gradual replacement of both blackletter and Old Cyrillic, which was (almost) completed only in the twentieth century.

33 Orthographies

33.1 Introduction

The writing system of a language consists of a script (or several scripts), which provides a set of characters and their general relation to the respective linguistic units, and an orthography, which provides a norm for the concrete adaptation of the script to the target language. In Reference CoseriuCoseriu’s (1952) sense, the script is the sistema, whereas the orthography is the norma, which makes choices from the options the script provides (e.g. whether /j/ is written as 〈j〉, 〈i〉, or 〈y〉 in an orthography based on the Latin script) but often also deviates from the system in arbitrary ways. The term orthography is sometimes restricted to such norms that are formally standardized; here, however, it also encompasses orthographic usage (in the sense of German Gebrauchsnorm ‘norm by use’) that might otherwise be referred to by terms like spelling or Pol. grafia.

33.2 Orthographic Principles

33.2.1 Phonemic (or Phonetic) Principle

Orthographies are structured on the basis of several underlying principles. The basic orthographic principle of an alphabet is, of course, the phonemic principle, that is, the principle that the graphemes used to write a word should exactly mirror the phonemes occurring in it on the basis of fixed grapheme–phoneme correspondences. This principle is especially strong in languages like Belarusian, Serbo-Croatian, or Macedonian. It can be seen in spellings like Bel. 〈стала〉 as genitive singular of 〈стол〉 ‘table’, in Rus. 〈безграничный〉 ‘boundless’ vs. 〈беспокойный〉 ‘restless’, or in Scr. 〈udžbenik〉 ‘textbook’ from učiti ‘to teach’ + ‑benik. Rarely, orthographies represent allophones of the same phoneme by different graphemes; thus, in Belarusian, [v] spelled as 〈в〉 and [w] spelled as 〈ў〉 are allophones of the phoneme /v/ (Reference Bieder and RehderBieder 1998: 114; cf. 〈паправіць〉 ‘repair (verb)’ vs. 〈папраўка〉 ‘repair (noun)’). This can either be seen as a phonetic principle sensu stricto or as overspecification from the point of view of the phonemic principle. Underspecification is much more common; for example, in Slovenian the one letter 〈e〉 has to represent three separate phonemes, /ɛ/, /e/, and /ə/, and the Bulgarian graphemes 〈а〉 and 〈я〉 each have to cover both /a/ and /ə/. Note, however, that even the most ‘phonemic’ orthographies like the ones for Belarusian or Serbo-Croatian usually do not represent all the phonological detail, for example by disregarding stress, vowel length, or intonation; orthographies are not phonemic transcriptions, because there are other principles at work as well.
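Under- and overspecification can be pictured as properties of a grapheme-to-phoneme table. The following Python sketch uses only correspondences mentioned in this section (Slovenian 〈e〉, Bulgarian 〈а〉, Belarusian 〈в〉 and 〈ў〉); the function names and the data layout are assumptions made for illustration, not an established formalism.

```python
from collections import defaultdict

# (language, grapheme) -> set of phonemes the grapheme can represent,
# restricted to the examples discussed in the text.
GRAPHEME_TO_PHONEMES = {
    ("Slovenian", "e"): {"ɛ", "e", "ə"},
    ("Bulgarian", "а"): {"a", "ə"},
    ("Belarusian", "в"): {"v"},
    ("Belarusian", "ў"): {"v"},
}


def underspecified(table):
    """Graphemes that stand for more than one phoneme."""
    return {key for key, phonemes in table.items() if len(phonemes) > 1}


def overspecified(table):
    """Phonemes that are written with more than one grapheme."""
    by_phoneme = defaultdict(set)
    for (language, grapheme), phonemes in table.items():
        for phoneme in phonemes:
            by_phoneme[(language, phoneme)].add(grapheme)
    return {key: gs for key, gs in by_phoneme.items() if len(gs) > 1}


print(sorted(underspecified(GRAPHEME_TO_PHONEMES)))
print(overspecified(GRAPHEME_TO_PHONEMES))
```

On this small table, Slovenian 〈e〉 and Bulgarian 〈а〉 come out as underspecified, while Belarusian /v/ is the only phoneme with two graphemes. A strictly phonemic orthography would leave both results empty.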

33.2.2 Syllabic Principle

The syllabic principle is the basic principle in a syllabary, but it also occurs in alphabets. For example, in Russian the vowel letters determine the pronunciation of the consonant letters (e.g. 〈т〉 can be /t/ or /tʲ/) and the pronunciation of many vowel letters depends on whether they are preceded by a consonant or not (e.g. 〈е〉 can be /ɛ/ or /jɛ/) – compare 〈лук〉 [ˈɫuk] ‘onion’ with 〈люк〉 [ˈlʲuk] ‘hatchway’ and 〈юг〉 [ˈjuk] ‘south’. Similarly, in Polish the pronunciation of consonant letters depends on whether they are followed by 〈i〉, regardless of whether the latter is pronounced as /i/ or mute, for example 〈sykać〉 [ˈsɪkat͡ɕ] ‘hiss’ vs. 〈sikać〉 [ˈɕikat͡ɕ] ‘splash’, 〈cało〉 [ˈt͡sawɔ] ‘whole’ vs. 〈ciało〉 [ˈt͡ɕawɔ] ‘body’. In Czech, this principle applies only to the letters 〈t〉, 〈d〉, and 〈n〉 before front vowels, for example 〈tým〉 [ˈtiːm] ‘team’ vs. 〈tím〉 [ˈciːm] ‘thereby’, 〈devět〉 [ˈdɛvjɛt] ‘nine’ vs. 〈děvče〉 [ˈɟɛft͡ʃɛ] ‘girl’. In all these cases, the pronunciation of a letter can only be determined by reading other letters, often the whole syllable.
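The dependence of a letter’s reading on its neighbours can be illustrated with a minimal sketch in Python. It is a toy model under stated assumptions, not a description of Russian orthoepy: the function name `to_phonemes` and the three small letter tables are hypothetical, only a handful of letters are covered, and stress, vowel reduction, final devoicing, and the velarization of /l/ are ignored, so the output is a rough phonemic string rather than the phonetic transcriptions given above.

```python
# Toy model of the syllabic principle in Russian (illustrative subset only).
CONSONANTS = {"л": "l", "к": "k", "г": "g", "т": "t"}
PLAIN_VOWELS = {"а": "a", "э": "ɛ", "о": "ɔ", "у": "u"}
# Vowel letters that mark a preceding consonant as palatalized
# and stand for /j/ + vowel when no consonant precedes.
IOTATED_VOWELS = {"я": "a", "е": "ɛ", "ё": "ɔ", "ю": "u"}


def to_phonemes(word: str) -> str:
    """Return a rough phonemic string for a word written with the letters above."""
    out = []
    after_consonant = False
    for letter in word:
        if letter in CONSONANTS:
            out.append(CONSONANTS[letter])
            after_consonant = True
        elif letter in PLAIN_VOWELS:
            out.append(PLAIN_VOWELS[letter])
            after_consonant = False
        elif letter in IOTATED_VOWELS:
            if after_consonant:
                # The vowel letter determines the reading of the consonant letter.
                out[-1] += "ʲ"
                out.append(IOTATED_VOWELS[letter])
            else:
                # Without a preceding consonant, the same letter is read with /j/.
                out.append("j" + IOTATED_VOWELS[letter])
            after_consonant = False
        else:
            raise ValueError(f"letter not covered by this sketch: {letter!r}")
    return "".join(out)


for example in ("лук", "люк", "юг"):
    print(example, to_phonemes(example))
```

In this sketch neither 〈л〉 nor 〈ю〉 can be read in isolation: the same 〈ю〉 yields a palatalized consonant in 〈люк〉 and an initial /j/ in 〈юг〉.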

33.2.3 Morphological Principle (or Stem Principle)

The morphological principle (sometimes referred to as stem principle) keeps the spelling of morphemes constant even though their pronunciation changes. This is the case when morphonological alternations like vowel reduction, assimilation, or final devoicing are ignored in spelling, for example Rus. 〈стола〉 [stʌˈɫa] ‘table (gen.sg)’ (rather than *〈стала〉) because of 〈стол〉 ‘table (nom.sg)’, Scr. 〈hrvatski〉 [ˈxr̩vat͡ski] ‘Croatian’ (rather than *〈hrvacki〉) because of 〈Hrvat〉 ‘Croat’. Umlaut graphemes are a special device: they signify both a morphological relationship and a deviating pronunciation, like Rus. and Bel. 〈ё〉 (morphologically {е} but phonologically /ʲɔ/ or /jɔ/), Pol. 〈ó〉 ({o} but /u/) and 〈rz〉 ({r} but /ʒ/), or Cze. 〈ů〉 ({o} but /uː/). Thus, the forms given in Table 33.1 could have alternative, more ‘phonological’ spellings, but the ‘umlaut’ graphemes signify their relation with forms containing the same stem morpheme.

Table 33.1 ‘Umlaut’ graphemes

Language	Normative spelling	Alternative	Morphological relation
Russian	〈жёны〉 ‘wives’	*〈жоны〉	〈жена〉 ‘wife (nom.sg)’
Polish	〈nóg〉 ‘leg (gen.pl)’	*〈nuk〉	〈noga〉 ‘leg (nom.sg)’
Polish	〈pierze〉 ‘plumage’	*〈pieże〉	〈pióro〉 ‘feather’
Czech	〈dům〉 ‘house’	*〈dúm〉	〈domu〉 ‘house (gen.sg)’

33.2.4 Lexical Principle

The lexical principle, which is the counterpart of the morphological principle by virtue of ensuring that different lexemes (or morphemes) are spelled differently in spite of identical pronunciation (cf. English 〈seas〉 vs. 〈sees〉 vs. 〈seize〉 or 〈write〉 vs. 〈right〉 vs. 〈rite〉), hardly plays a role in the Slavic languages. Rare examples are Cze. 〈být〉 ‘to be’ vs. 〈bít〉 ‘to hit’ or Rus. 〈компания〉 ‘company’ vs. 〈кампания〉 ‘campaign’ (all of which, however, also have a clear historical basis). Otherwise, the main reason for different spellings of homonyms is always the morphological principle; for example, the spellings of the Polish homonyms 〈morze〉 ‘sea’ and 〈może〉 ‘can’ signify that these word forms are related to 〈morski〉 ‘maritime’ and 〈mógł〉 ‘could’, respectively; similarly, the spelling of Rus. 〈везти〉 ‘to drive’ in contrast to 〈вести〉 ‘to lead’ demonstrates its relatedness to 〈везу〉 ‘I drive’. In pre-1917 Russian orthography, there were significantly more spellings based solely on the lexical principle, for example 〈лѣчу〉 ‘I heal’ vs. 〈лечу〉 ‘I fly’ or 〈миръ〉 ‘peace’ vs. 〈міръ〉 ‘world’ – the former being the result of the merger of the phonemes *ě and *e, and the latter a deliberate distinction of two meanings of the same etymon. In Modern Church Slavonic, there are further such ‘artificial’ distinctions, for example using 〈ѧ〉 vs. 〈ꙗ〉 in 〈ѧзы́къ〉 ‘tongue; language’ vs. 〈ꙗзы́къ〉 ‘people; pagans’.

In some cases, diacritics are used to distinguish homophones. Thus, in Bulgarian the grave accent is obligatory on the word 〈ѝ〉 ‘her (dat.sg.f)’ (to distinguish it from 〈и〉 ‘and’) and in Macedonian on two more words (〈сѐ〉 ‘everything’ vs. 〈се〉 (reflexive pronoun) and 〈нѐ〉 ‘us’ vs. 〈не〉 ‘not’). Non-obligatory practices to disambiguate otherwise homographic word forms also exist in other languages, for example Scr. |sâm| ‘self’ vs. |sam| ‘am’, or Rus. |нёбо| ‘palate’ vs. |небо| ‘sky, heaven’ (but note that these word forms are also pronounced differently).

33.2.5 Grammatical Principle

The grammatical principle, which uses spelling to mark grammatical categories and which is so central in French, for example, is not used very much in Slavic languages. However, the Russian soft sign 〈ь〉 does not serve any phonological purpose after the postalveolars and affricates, because 〈ч〉 and 〈щ〉 are always palatalized and 〈ж〉, 〈ц〉, and 〈ш〉 are never palatalized, independent of what follows. Thus, the soft sign is placed at the end of i‑stem feminines but not masculines, for example 〈ложь〉 (f) ‘lie’ vs. 〈нож〉 (m) ‘knife’, 〈печь〉 (f) ‘oven’ vs. 〈меч〉 (m) ‘sword’, and even with loanwords like 〈тушь〉 (f) ‘India ink’ vs. 〈туш〉 (m) ‘fanfare’; furthermore, it marks the second person singular in all verbs (e.g. 〈пишешь〉 ‘(you) write’), and with some verbs, the imperative (e.g. 〈не плачь〉 ‘don’t cry!’ vs. 〈плач〉 (m) ‘lament’) or the infinitive (e.g. 〈сечь〉 ‘to hack’).

The grammatical principle is very consistently observed in the orthography of Modern Church Slavonic, which uses a number of highly artificial rules for the use of homophonous letters, letter variants, and diacritics to distinguish homonymous forms of the same word, as illustrated in Table 33.2 (cf. Trunte 2018: 26–27). A remnant of this system is the use of accent marks in the Russian translation of the Bible, for example |чтò| ‘what’ vs. |что| ‘that’, |приходи́те| ‘come!’ vs. |прихо́дите| ‘you are coming’, or |го́рода| ‘town (gen.sg)’ vs. |города̀| ‘towns (nom.pl)’.

Table 33.2 Examples of Modern Church Slavonic grammatical distinctions

Opposition	Spelling 1	Function	Spelling 2	Function	Meaning
〈ѡ〉 vs. 〈о〉	〈но́вагѡ〉	gen.sg	〈но́ваго〉	acc.sg	‘new’
〈е〉 vs. 〈є〉	〈і҆ере́й〉	nom.sg	〈і҆ерє́й〉	gen.pl	‘priest’
oksia vs. kamora	〈ри́зы〉	gen.sg	〈ри̑зы〉	nom.pl	‘robe’

33.2.6 Etymological (or Historical) Principle

The etymological principle (or historical principle) conveys historical connections that no longer affect the contemporary phonology of a language. In the modern Slavic orthographies, this principle is not very strong. However, it underlies the use of Pol. 〈rz〉 vs. 〈ż〉, 〈h〉 vs. 〈ch〉, and 〈ó〉 vs. 〈u〉, Cze. 〈y〉 vs. 〈i〉, or Rus. unstressed 〈o〉 vs. 〈a〉 etc. wherever these graphemes are not supported by the morphological principle, for example Pol. 〈porządek〉 [pɔˈʒɔndɛk] ‘order’ but 〈pożądany〉 [pɔʒɔnˈdanɪ] ‘desired’, Cze. 〈pivo〉 [ˈpivɔ] ‘beer’ but 〈pytel〉 [ˈpitɛl] ‘sack’, or Rus. 〈паук〉 [pʌˈuk] ‘spider’ but 〈пором〉 [pʌˈrɔm] ‘ferry’. Additionally, foreign features are sometimes retained in spelling but not in pronunciation, for example in Rus. 〈комментарий〉 [kʌmʲɪnˈtaɾʲɪj] ‘comment’ or Cze. 〈komunismus〉 [ˈkɔmunɪzmus] ‘communism’. The fact that in Russian the spelling of 〈ѣ〉 vs. 〈е〉 was by the turn of the twentieth century based almost exclusively on the historical principle and therefore had to be learned by heart was the main reason for the spelling reform of 1917 (see Section 33.7).

A notable difference between the Cyrillic and the Latin alphabet is that in the former, letters and combinations of letters cannot be used with foreign grapheme–phoneme correspondences. Thus, it is completely impossible to borrow a name like 〈Shakespeare〉 into Cyrillic as something like 〈Схакеспеаре (Sxakespeare)〉 – it has to be spelled according to domestic grapheme–phoneme correspondences as 〈Шекспир (Šekspir)〉 (cf. Bunčić 2003). This applies even within Cyrillic, so that Russian names like 〈Екатеринбург (Ekaterinburg)〉 or 〈Калининград (Kaliningrad)〉 cannot be borrowed into Serbian without being turned into 〈Јекатеринбург (Jekaterinburg)〉 and 〈Калињинград (Kalinjingrad)〉, and Bulgarian names like 〈Търново (Tărnovo)〉 or 〈Свищов (Svištov)〉 have to be assimilated to Russian as 〈Тырново (Tyrnovo)〉 and 〈Свиштов (Svištov)〉. This restriction limits the extent to which the etymological principle can be used in orthographies based on the Cyrillic alphabet, which also includes the Serbian orthography written in the Latin alphabet, because it is based on a one-to-one correspondence between the two alphabets (and therefore a Serbian text always has 〈Šekspir〉, whereas Croatian texts have 〈Shakespeare〉).

33.2.7 Deep vs. Shallow Orthographies

As one can see, the relative weight of the orthographic principles is very unevenly distributed among the Slavic languages. In some (e.g. Belarusian, Serbo-Croatian, Macedonian), the phonemic (or phonetic) principle is very strong and hardly constrained by other principles. Such orthographies are commonly called shallow. Other orthographies are comparatively deep by virtue of having many non-phonemic spellings on the basis of higher-order principles. Polish and Russian are examples of deeper orthographies, and the Modern Church Slavonic orthography is by far the deepest. No Slavic language, however, has an orthography as shallow as Finnish or as deep as Irish, English, or French.

The relative weight of the principles can also change over time. It has already been mentioned that the Russian orthography before 1917 was deeper than the contemporary one. For Serbo-Croatian, there were debates between supporters of shallower orthographies in the tradition of Vuk Karadžić and Aleksandar Belić and those of deeper orthographies in the tradition of Croatian literacy since the Renaissance, such as Dragutin Boranić and Adolf Bratoljub Klaić. While Serbian orthography traditionally leans toward the former and Croatian orthography toward the latter, all current Serbo-Croatian orthographies are a compromise between the two extremes.

33.3 Punctuation, Numbers, Abbreviations

Punctuation has so far been severely understudied. During the Middle Ages, the main punctuation mark, in all three Slavic alphabets, was the middle dot 〈·〉. With the introduction of letterpress printing, the modern punctuation system evolved in Western Europe and subsequently spread to the Cyrillic alphabet as well, so that throughout the modern Slavic languages the set of punctuation marks is essentially the same.

The use of the comma in particular, however, differs considerably. Although comma rules are too complex to discuss here in detail, there are general tendencies towards ‘heavy’ or ‘light’ comma use in the Slavic languages. Thus, in most Slavic languages, all subordinate clauses are separated by commas. However, the Serbo-Croatian comma rules are ‘lighter’ and very similar to the English ones, that is, subordinate clauses are usually not separated by commas (with the exception of non-defining relative clauses, clauses signifying a contrast or consequence, and subclauses followed by the main clause). The Russian comma is especially ‘heavy’ because it is also used to separate sentence adverbials like konečno ‘of course’ or nakonec ‘finally’ as well as any syntagma introduced by krome ‘except’, kak ‘as’, or čem ‘than’.

Quotation marks, while being used for roughly the same functions in all the Slavic languages, exhibit a wide variety of forms. Thus, in Russian, Belarusian, and Ukrainian, the main shape is «цитата» (but „цитата“ and “цитата” are used as well); in Polish, the main shape is „cytat” (with «cytat» being used less frequently); in Bulgarian, Czech, Slovak, and Slovenian, the main shape is „citat“ (note the difference between the shape combination ₉₉ … ⁹⁹ in Polish and ₉₉ … ⁶⁶ in the other languages), but in Slovenian the shape »citat« is also common, and in Bulgarian «цитат» and „цитат” are in use next to „цитат“. Serbo-Croatian and Macedonian are characterized by a wide variety of shapes in common use: »цитат«, “цитат”, „цитат“, and „цитат”. For quotations inside quotations, single variants of the same quotation marks are used in Polish („‚cytat’ w cytacie”), Czech and Slovak („‚citát‘ v citátě“), as well as Slovenian („‚citat‘ v citatu“ or »›citat‹ v citatu«). In Serbo-Croatian (especially in Croatian), single apostrophes (in identical or opposing shapes) can be used regardless of the shape of the main quotation marks (e.g. »’citat’ u citatu«, „‘citat’ u citatu“, etc.). In Bulgarian, and often also in the East Slavic languages, the same quotation marks are used for the inner and the outer quotation, with adjacent quotation marks being merged into one (i.e. *〈„„〉, *〈»»〉, etc., for example Bul. „цитат“ в цитат“, Rus. «цитата» в цитате»). Additionally, in the East Slavic languages, Polish, Serbo-Croatian, and Macedonian, nested quotations can also be marked by combining any other different shapes used in these languages (e.g. Pol. „«cytat» w cytacie”, Rus. «„цитата“ в цитате»). In Czech, Slovak, Slovenian, Serbo-Croatian, and Macedonian, quotation marks are also used for direct speech in narrative texts (e.g. Cze. „Kdo je tam?“ rozkřikl se.), whereas the East Slavic languages, Polish, and Bulgarian use a dash in this function (e.g. Ukr. – Хто там? – закричав він.). The latter practice is also used in Serbo-Croatian and Macedonian.
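The main and inner shapes listed above lend themselves to a small lookup table. The following Python sketch is only an illustration under stated assumptions: the language codes, the dictionary names `OUTER` and `INNER`, and the function `quote` are hypothetical, and only the main shapes for Russian, Polish, and Czech are included, not the full range of variants described in the text.

```python
# Main (outer) and single (inner) quotation marks as described in the text.
# The keys are hypothetical language codes chosen for this sketch.
OUTER = {
    "ru": ("\u00ab", "\u00bb"),  # «цитата»
    "pl": ("\u201e", "\u201d"),  # „cytat”
    "cs": ("\u201e", "\u201c"),  # „citát“
}
INNER = {
    "pl": ("\u201a", "\u2019"),  # ‚cytat’
    "cs": ("\u201a", "\u2018"),  # ‚citát‘
}


def quote(text: str, lang: str, nested: bool = False) -> str:
    """Wrap text in the outer or, if nested, the inner quotation marks of lang."""
    opening, closing = (INNER if nested else OUTER)[lang]
    return f"{opening}{text}{closing}"


print(quote(quote("cytat", "pl", nested=True) + " w cytacie", "pl"))
print(quote(quote("citát", "cs", nested=True) + " v citátě", "cs"))
```

The Polish and Czech results differ only in the closing marks, which is exactly the ⁹⁹ vs. ⁶⁶ contrast noted above. Inner marks for Russian are deliberately left out, since the text describes several competing practices there.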

The modern Slavic languages use Arabic and Roman numbers just like the other European languages, although the use of these two sets of numbers differs in details; for instance, the East Slavic languages and Polish use Roman numbers for centuries (e.g. Pol. XXI wiek ‘21st century’, Rus. XXI век), whereas the other Slavic languages use Arabic numbers for this purpose. The West Slavic languages and the languages of former Yugoslavia use a dot to indicate ordinal numerals (e.g. Cze. 21. století), whereas the other Slavic languages either use numbers without any distinction (e.g. Bul. 21 век) or append the appropriate ending (e.g. Rus. 1980-е годы ‘the 1980s’).
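The conventions for writing centuries can be summarized in a short Python sketch. It is illustrative only: the language codes and the names `to_roman`, `CENTURY_FORMATS`, and `format_century` are hypothetical, and the four patterns are taken directly from the examples in the preceding paragraph rather than from any style guide.

```python
ROMAN_SYMBOLS = [
    (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
    (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
    (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I"),
]


def to_roman(number: int) -> str:
    """Convert a positive integer to a Roman numeral."""
    if number <= 0:
        raise ValueError("Roman numerals are defined for positive integers only")
    parts = []
    for value, symbol in ROMAN_SYMBOLS:
        count, number = divmod(number, value)
        parts.append(symbol * count)
    return "".join(parts)


# One formatter per (hypothetical) language code, following the text's examples.
CENTURY_FORMATS = {
    "pl": lambda n: f"{to_roman(n)} wiek",  # Roman number
    "ru": lambda n: f"{to_roman(n)} век",   # Roman number
    "cs": lambda n: f"{n}. století",        # Arabic number with ordinal dot
    "bg": lambda n: f"{n} век",             # Arabic number without marking
}


def format_century(number: int, lang: str) -> str:
    return CENTURY_FORMATS[lang](number)


for lang in ("pl", "ru", "cs", "bg"):
    print(lang, format_century(21, lang))
```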

Apart from that, texts in Old Cyrillic or Glagolitic use Cyrillic or Glagolitic numbers, which consist of letters that are assigned numeric values. These systems are based on the Milesian number system used in Greek; however, while Cyrillic letters are assigned the numeric values of the corresponding Greek letters (e.g. Cyrillic 〈кв〉 = Greek 〈κβ〉 ‘22’), Glagolitic numbers are based on the order of the Glagolitic alphabet (e.g. Glagolitic 〈ⰽⰲ〉 ‘43’, 〈ⰻⰱ〉 ‘22’).
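The difference between the two ways of assigning values can be made explicit in a minimal Python sketch. Several assumptions are made for illustration: letters are referred to by Latin-script labels (`az`, `buky`, `vedi`, etc.) instead of the actual glyphs, only the first thirteen letters of the Glagolitic order and a handful of Greek values are included, and all names in the code are hypothetical.

```python
# Greek (Milesian) values for a few letters.
GREEK_VALUES = {"alpha": 1, "beta": 2, "gamma": 3, "delta": 4, "kappa": 20}

# Cyrillic letters inherit the value of their Greek counterpart;
# "buky" has no Greek counterpart and therefore no value in this table.
CYRILLIC_TO_GREEK = {
    "az": "alpha", "vedi": "beta", "glagoli": "gamma",
    "dobro": "delta", "kako": "kappa",
}
CYRILLIC_VALUES = {
    letter: GREEK_VALUES[greek] for letter, greek in CYRILLIC_TO_GREEK.items()
}

# Glagolitic letters receive their value from their position in the alphabet.
GLAGOLITIC_ORDER = [
    "az", "buky", "vedi", "glagoli", "dobro", "est", "zhivete", "dzelo",
    "zemlja", "izhe", "i", "djerv", "kako",
]


def positional_value(position: int) -> int:
    """Positions 1-9 are units, 10-18 tens, 19-27 hundreds."""
    if position <= 9:
        return position
    if position <= 18:
        return (position - 9) * 10
    return (position - 18) * 100


GLAGOLITIC_VALUES = {
    letter: positional_value(position)
    for position, letter in enumerate(GLAGOLITIC_ORDER, start=1)
}


def numeral_value(letters, table) -> int:
    """Add up the values of the letters of an additive numeral."""
    return sum(table[letter] for letter in letters)


print(numeral_value(["kako", "vedi"], CYRILLIC_VALUES))
print(numeral_value(["kako", "vedi"], GLAGOLITIC_VALUES))
print(numeral_value(["i", "buky"], GLAGOLITIC_VALUES))
```

The same pair of letters thus stands for 22 in Cyrillic but 43 in Glagolitic, whereas 22 in Glagolitic requires a different pair.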

In texts written in Old Cyrillic or Glagolitic, abbreviations are formed either using various forms of titlo as an abbreviation mark (originally only for sacred words) or superscript letters that signify that at least one other letter is omitted. Nowadays, all modern Slavic languages use the same types of abbreviations as other European languages, including the abbreviation dot. Additionally, however, when taking notes of a conversation, lecture, etc., people often rather omit the middle of a word than the end because of the importance of inflection in the Slavic languages, so that for example a Russian linguistics student might write 〈диф-ому пр-ку〉 for дифференциальному признаку ‘distinctive feature (dat.sg)’.

33.4 Orthographic Devices for Adapting a Script to a Language

When a script is applied to a new language, there are usually some phonemic oppositions that cannot be expressed by the script because they do not exist in the language for which the script was used before. There are different ways to solve this problem (see Tables 33.3 and 33.4; SRu. = Southern [Pannonian, Vojvodina] Rusyn, NRu. = Northern [Carpathian] Rusyn).

Table 33.3 Letters and digraphs: Latin script

Table 33.4 Letters and digraphs: Cyrillic script

33.4.1 Underspecification

At the beginning, the problem is often not solved at all. Thus, the Old Church Slavonic Glagolitic alphabet used 〈ⱑ〉 for both *ě and *ja and – at least initially – very imperfectly represented palatalized consonants and /j/. It seems likely that Cyril’s original Glagolitic alphabet did not contain any preiotated letters, the distinction between and having been added only later and maybe originally representing a separate vowel phoneme, *ü /y/. In the Slavic Latin alphabet of the twelfth/thirteenth century (Cze. primitivní pravopis ‘primitive orthography’, Pol. grafia niezłożona ‘non-complex spelling’), individual letters could represent several different phonemes (e.g. 〈z〉 for Pol. /z/, /ɕ/, /ʑ/, /ʒ/, and /d͡z/: 〈wzacone〉 w zakonie, 〈Zeraz〉 Sieradz, 〈zeme〉 ziemie, 〈ziuoth〉 żywot, 〈Bezdeze〉 Bezdziedze; Mazur 1993: 155).

Underspecification can also be the result of sound changes. An example is the change of e > o in Russian, due to which /o/ could at least since the sixteenth century appear after palatalized consonants, which, however, could not be represented by the orthography until solutions were invented in the eighteenth century, among them the digraph proposed by Vasilij Tatiščev, the ligature proposed by Vasilij Adodurov (and not, as is often claimed, by Vasilij Trediakovskij), the digraph advocated by Aleksandr Sumarokov, and, finally, the diacritic combination 〈ё〉 proposed by Ekaterina Daškova in 1783 (and not by Nikolaj Karamzin), which is still not an obligatory part of the Russian alphabet to this day (cf. Uspenskij 1975: 85, 208–212, Pčelov & Čumakov 2000: 13–16, 22–23).

33.4.2 Digraphs

The next step in the development of Latin-based Slavic orthographies was the use of combinations of letters (Cze. spřežkový pravopis ‘digraph orthography’, Pol. grafia złożona ‘complex spelling’), for example 〈sz〉 for /ʃ/, 〈rz〉 for /r̤/, etc. Similarly, the Glagolitic and Cyrillic alphabet use the digraphs and (nowadays ) to represent *y, but in general this is much rarer in the Cyrillic and Glagolitic than in the Latin alphabet.

33.4.3 Special Letters

Digraphs have the disadvantage of potential ambiguity, as for example in Pol. 〈rz〉, which signifies /ʒ/ in marzyć ‘to dream’ but /rz/ in marznąć ‘to freeze’, or Ser. 〈nj〉, which signifies /ɲ/ in vanjski ‘external’ but /nj/ in vanjezički ‘extralinguistic’. An alternative is the creation of special letters, for example by turning digraphs into ligatures. Thus, Vuk Karadžić created the ligatures 〈љ〉 for /ʎ/ from 〈л〉 and 〈ь〉 and 〈њ〉 for /ɲ/ from 〈н〉 and 〈ь〉 (so that 〈вањски〉 and 〈ванјезички〉 are clearly distinguished). Januszowski ([1594] 1983: 188) designed special ligatures for the Polish digraphs, so that he could distinguish between 〈marзyć〉 and 〈marᴣnąć〉, but they did not take hold. The preiotated letters of Old Cyrillic, , , , and , as the name suggests, were originally combinations with iota (, , , ), and the Cyrillic letter , whose original shape seems to have been mirrored horizontally, might have arisen as a ligature from the Greek digraph (pronounced [y]). Cyrillic is nowadays viewed as a single letter although it still consists of two unconnected parts, and .
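The ambiguity of the digraph 〈nj〉 becomes tangible when the two Serbian alphabets are converted into each other mechanically. The Python sketch below is a simplified illustration, not an actual transliteration tool: the names `CYR_TO_LAT`, `cyr_to_lat`, and `lat_to_cyr_greedy` are hypothetical, the letter table covers only the letters of the two example words, and the ‘greedy’ strategy (always reading 〈nj〉 as one unit) is an assumption chosen to expose the problem.

```python
# Cyrillic-to-Latin correspondences for the letters of the two example words.
CYR_TO_LAT = {
    "в": "v", "а": "a", "њ": "nj", "н": "n", "ј": "j", "с": "s",
    "к": "k", "и": "i", "е": "e", "з": "z", "ч": "č",
}
# Inverse table; it contains both "n", "j" and the digraph "nj".
LAT_TO_CYR = {latin: cyrillic for cyrillic, latin in CYR_TO_LAT.items()}


def cyr_to_lat(word: str) -> str:
    """Cyrillic to Latin: every letter has exactly one rendering."""
    return "".join(CYR_TO_LAT[letter] for letter in word)


def lat_to_cyr_greedy(word: str) -> str:
    """Latin to Cyrillic, always treating a possible digraph as one unit."""
    out = []
    i = 0
    while i < len(word):
        pair = word[i:i + 2]
        if len(pair) == 2 and pair in LAT_TO_CYR:
            out.append(LAT_TO_CYR[pair])
            i += 2
        else:
            out.append(LAT_TO_CYR[word[i]])
            i += 1
    return "".join(out)


for cyrillic in ("вањски", "ванјезички"):
    latin = cyr_to_lat(cyrillic)
    print(cyrillic, latin, lat_to_cyr_greedy(latin))
```

The conversion from Cyrillic to Latin is unproblematic, but the way back restores only 〈вањски〉 correctly; for vanjezički the greedy reading wrongly produces 〈вањезички〉, because the Latin spelling does not record the morpheme boundary between 〈n〉 and 〈j〉.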

An alternative way of creating a special letter is borrowing. As we have seen, Cyrillic is essentially the Greek alphabet with several additional letters borrowed from Glagolitic. Vuk Karadžić was heavily criticized for integrating Latin 〈j〉 into his Serbian Cyrillic alphabet in 1818, but his alphabet is used to this day. He also borrowed 〈џ〉 from the Romanian Cyrillic orthography.

Another possible source of special letters are letter variants. For example, is attested as a variant of beta in Greek texts, which made it possible to distinguish between for /b/ and for /v/ in Cyrillic. The distinction of and in Old Cyrillic also seems to be a secondary differentiation of what originally was just one nasal vowel borrowed from Glagolitic . , , and could at some time all be used for /e/, but nowadays Rus. and Ukr. are in opposition to , and Vuk Karadžić created Ser. and as a secondary differentiation of the Old Cyrillic letter djerv .

33.4.4 Diacritics

Probably the most elegant way to adapt a script to a language is by using diacritical marks (this stage of orthographic development is called diakritický pravopis ‘diacritic orthography’ in Czech, and the introduction of diacritics into Polish marks the beginning of the Middle Polish orthography). However, diacritics do not appear out of thin air. They can be iconic, developed from (parts of) letters, from deletion marks, from disambiguation marks, or borrowed (Bunčić 2023).

Iconic diacritics like the Greek tone marks, which signify rising tone by the acute 〈´〉, falling tone by the grave 〈`〉, and rising-falling tone by the circumflex accent 〈 ̑ 〉, do not occur in Slavic orthographies. (The conventional Serbo-Croatian accent marks, which are not part of official orthography, were borrowed from Greek and are in part rather anti-iconic, e.g. 〈`〉 for a (short) rising tone. This seems to be due to the very impressionistic description of Štokavian tone by a probably rather tone-deaf Vuk Karadžić (1814: XXXVI) and Đuro Daničić’s endeavor to gently modify his friend’s system rather than introduce a new one; cf. Anić 1981.) However, some Greek diacritics that were borrowed into Slavic writing systems have preserved their iconicity, for example the trema (diaeresis) 〈¨〉 signifying two separate syllables and the breve 〈˘〉 signifying a short or non-syllabic vowel, which have kept their meanings in Church Slavonic with kendema for syllabic /i/ vs. with breve for non-syllabic /v/ as well as in Rus. , Bel. , and Ukr. .

The only diacritic developed from a letter in Slavic is the Czech kroužek ‘little circle’ (ring) on 〈ů〉 for /uː/, which is obviously descended from the letter 〈o〉, from an original spelling 〈uo〉 for the diphthong . However, the German umlaut dots 〈¨〉 can be traced back to the letter 〈e〉 (cf. in old German texts for 〈ä〉, 〈ö〉, 〈ü〉), and they were borrowed as Slk. 〈ä〉, Rusyn , and Rus. 〈ё〉 (Pčelov & Čumakov 2000; compare German 〈Mann〉 ‘man’ vs. 〈Männer〉 ‘men’ with Rus. 〈жена〉 ‘wife’ vs. 〈жёны〉 ‘wives’).

The most obvious deletion mark is a stroke through the letter, as in Pol. 〈ł〉 and Scr. 〈đ〉, which were originally employed to signify that 〈l〉 or 〈d〉 ‘is not the correct letter (but close)’. The Old Polish nasal vowel letter , which eventually gave rise to , probably also originated from an 〈o〉 or 〈a〉 struck out. In medieval manuscripts, dots above or below letters were also used for cancellation, and it is this deletion dot that was turned into a diacritic in Old Irish and then probably borrowed by Jan Hus for Czech (Bunčić 2023), where it later turned into the háček (caron) and produced , and in Czech, Slovak, Slovenian, and Serbo-Croatian as well as, before its change of shape, Pol. and, via Zaborowski’s ([1514] 1983: 58) double dot 〈¨〉 that then turned into a stroke , Pol. , and and, borrowed into South Slavic languages, Scr. , Mac. and , and Mon. and .

There are no diacritics created from disambiguation marks (like the opposition between 〈i〉 and 〈ı〉 in Turkish) in Slavic orthographies. Most Slavic diacritics are in fact borrowed from a non-Slavic or from another Slavic language (see the examples above and Section 33.5).

While the Latin alphabet mostly uses detached diacritical marks in the narrow sense, the Cyrillic alphabet has a tradition of using diacritical elements that are directly attached to the letter they modify, for example Ukr. vs. or Old Cyrillic vs. ; these probably arose from a secondary differentiation of letter variants, just like other forms of special letters (see Section 33.4.3).

33.4.5 Schriftdenken

When a script is applied to a new language, certain mechanisms of the writing system of the donor language are often transferred to the new language although they are not needed there. Trubetzkoy (1954: 15) has called this phenomenon Schriftdenken (‘script-thinking’) and given the Glagolitic alphabet as an example: just as the Greek alphabet of the ninth century contained three graphemes for /i/, two letters for /o/ and only a digraph for /u/, the Glagolitic alphabet has three /i/ letters , two /o/ letters , and a ligature (derived from a digraph ) for /u/. Manifestations of Schriftdenken can also be seen in more modern implementations of scripts, for example when the principle behind Polish spellings like 〈gość〉 ‘guest’ with a separate diacritic on the assimilated 〈s〉 is transferred to the Belarusian Taraškevica orthography as 〈госьць〉 (where the Narkamaŭka spelling 〈госць (goscʹ)〉 is sufficient to convey the pronunciation) or when Russian, Ukrainian, Polish, and Slovak influences compete in contemporary Rusyn orthographies (Weth & Bunčić 2020).

33.5 Social, Cultural, Religious, and Political Influences

Just as with scripts and script variants (see Section 32.5), cultural influences and political alliances can also be seen in the borrowing of orthographic devices. A very conspicuous example of this is the borrowing of diacritics. In the Slavic world, we see a repeated borrowing of Greek diacritics into Church Slavonic during the so-called Second and Third South Slavic Influences, and a spread of the Czech diacritics (dot/háček and stroke/acute) to many other Slavic languages after the Hussite movement had led to Czech being the first language of Slavia Latina with a noteworthy text production and with a systematic orthography, which therefore served as a role model for other Slavic speech communities: Polish, Slovak, Upper and Lower Sorbian, Serbo-Croatian, and Slovenian, as well as the Latin orthographies for Belarusian and Ukrainian, non-Slavic languages like Hungarian, Lithuanian, Latvian, or even Lakota, and scientific transliteration and Americanist phonetic transcription. Historical cultural ties with France and Germany left their traces in the use of quotation marks of the French («citation» and dashes for direct speech) or German shape („Zitat“ and quotation marks for direct speech).

33.6 Orthographies in Biscriptality

There are many cases in which two or more orthographies are used (or were used historically) simultaneously for the same language. As described in Section 32.6, this phenomenon can be analyzed as biscriptality, using the model proposed by Bunčić et al. (2016: 67). Just like scripts and script variants, orthographies can be distributed in three different ways: in an equipollent opposition, a privative opposition, or a diasituative distribution. (Most of the following examples are taken from Bunčić et al. 2016, where further references can be found.)

33.6.1 Orthographic Pluricentricity

An example of an equipollent distribution of orthographies is Modern Serbo-Croatian. Bosnian, Croatian, Montenegrin, and Serbian nowadays all have independent spelling norms. While in general the differences are subtle, the Montenegrin authorities have gone a step further in officially introducing two additional letters to both alphabets, 〈ś〉/〈с́〉 for [ɕ] and 〈ź〉/〈з́〉 for [ʑ]. These letters reflect the – actually rather colloquial – palatalization of [sj] and [zj], which in the other standard varieties are written as 〈sj〉/〈сј〉 and 〈zj〉/〈зј〉. (Additionally, palatalization is also rendered as 〈ć〉/〈ћ〉 for [t͡ɕ] and 〈đ〉/〈ђ〉 for [d͡ʑ] in the place of 〈tj〉/〈тј〉 for [tj] and 〈dj〉/〈дј〉 for [dj].) However, the official orthography does not declare this as mandatory, so that at least up to now most Montenegrin texts do not use the new letters at all. Historically, there had been orthographic pluricentricity within what is now the Croatian variety until the introduction of the Gajica alphabet in the 1830s. The main difference lay between Italian-based orthographies along the coast and Hungarian-based orthographies in the hinterland. Among the former, a Dalmatian and a Ragusan orthography could be distinguished, and among the latter, a Catholic and a Protestant variety (cf. Marti 2012: 282–286).

Further – historic – Slavic cases of orthographic pluricentricity are: Russian between the World Wars, when the emigration continued to use the old orthography (Bunin famously exclaimed “охъ, какое проклятое правописаніе!” (‘oh, what a damned orthography!’) when reading the Izvestija in reformed spelling, Bunin 1935: 97; cf. Bunčić et al. 2016: 219–224), while in the Soviet Union only the new orthography could be used; Upper Sorbian, which from the seventeenth century to World War II had a German-based Protestant orthography (with 〈ß〉 for /s/, 〈ſch〉 for /ʃ/, etc.) and a Czech-based Catholic orthography (with 〈ſ/s〉 for /s/, 〈ſſ/š〉 for /ʃ/, etc.); and Polish, which was – in part due to the partitions of Poland – governed by two different norm authorities from 1877 to 1936, the Warsaw school of orthography and the Cracow school of orthography. In 1936, the two orthographies were replaced by a compromise, which for example favored the Cracow spelling 〈Anglia〉 ‘England’ over the Warsaw spelling 〈Anglja〉 but also the Warsaw spelling 〈Francja〉 ‘France’ over the Cracow spelling 〈Francya〉.

33.6.2 Diorthographia

A privative opposition yields sociolinguistic situations in which orthographies are functionally distributed, in a similar way as speech forms are distributed in diglossia. For example, in the texts written on birch bark from eleventh- to fifteenth-century Novgorod, there were two orthographies in use, a ‘standard orthography’ (standartnaja orfografija) and a vernacular orthography (bytovaja orfografija ‘mundane orthography’, Zaliznjak 2004: 21–22). The latter was characterized by using |о| ~ |ъ| as well as |е| ~ |ь| (~ |ѣ|) as allographs in free variation. While documents on parchment were (with one exception) exclusively written in the standard orthography, the vernacular orthography seems to have been the default for birch bark texts, at least during the thirteenth century, when it accounted for ca. 90 percent of the birch bark texts. (Note that the choice of orthography was determined neither by the social position of the writer nor by the content of the text.)

In the Early Modern Period, the orthography used in private handwriting was often different from the one in printed texts or widely circulated manuscripts. Čejka (1999: 28) has called the former ‘pro foro interno’ and the latter ‘pro foro externo’. In Czech, the Brethren orthography was never completely adopted in the manuscripts ‘pro foro interno’; in Polish, the new orthography established by the mid-sixteenth century took about another century to be used in private writings as well; and in Russian, the orthographic changes associated with the civil style introduced in 1708 took until the end of the century to completely prevail in handwriting.

A technically induced case of diorthographia is the reduction of Slavic orthographies to the 26 letters of the Latin alphabet without diacritics (according to the restrictions of the ASCII code or a Western codepage) for use in e‑mails, text messages, etc. (nowadays becoming increasingly rare due to the Unicode support in most applications). While in many cases diacritics are simply omitted (〈č〉 → 〈c〉, 〈Ł〉 → 〈L〉, etc.), a few letters are commonly replaced with digraphs (e.g. 〈đ〉 → 〈dj〉, 〈ť 〉 → 〈t‘〉). However, complex systems striving for an unambiguous transliteration have also been used (defining correspondences like Scr. 〈č〉 → 〈cc〉, 〈ć〉 → 〈ch〉, etc.).
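The simple variant of this reduction (omitting diacritics and replacing a few letters with digraphs) can be sketched in Python with the standard `unicodedata` module. The function name `strip_diacritics` and the small replacement table are hypothetical and cover only the examples mentioned above. The table is needed because letters such as 〈đ〉 and 〈ł〉 have no canonical decomposition in Unicode, so removing combining marks alone leaves them untouched.

```python
import unicodedata

# Letters without a canonical Unicode decomposition need explicit replacements.
# The choices follow the examples in the text (〈đ〉 → 〈dj〉, 〈Ł〉 → 〈L〉).
EXPLICIT_REPLACEMENTS = {"đ": "dj", "Đ": "Dj", "ł": "l", "Ł": "L"}


def strip_diacritics(text: str) -> str:
    """Reduce Latin-script text to base letters, omitting diacritics."""
    out = []
    for char in text:
        if char in EXPLICIT_REPLACEMENTS:
            out.append(EXPLICIT_REPLACEMENTS[char])
            continue
        # NFD splits e.g. 〈č〉 into 〈c〉 plus a combining caron,
        # and the combining mark is then dropped.
        decomposed = unicodedata.normalize("NFD", char)
        out.append("".join(c for c in decomposed if not unicodedata.combining(c)))
    return "".join(out)


for word in ("čaša", "Łódź", "đak"):
    print(word, strip_diacritics(word))
```

Such a reduction is lossy: after it, Scr. 〈č〉, 〈ć〉, and 〈c〉 all appear as 〈c〉, which is what motivates the more complex unambiguous systems mentioned at the end of the paragraph.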

33.6.3 Biorthographism

With a diasituative distribution, two (or more) orthographies are used in accordance with a multitude of indexical values ascribed to them. For example, the first officially codified orthography for Modern Belarusian was laid down in Taraškevič’s 1918 grammar and is therefore commonly called Taraškevica. It was superseded by the russificatory Narkamaŭka orthography of 1933. However, the Taraškevica norm started to be used again in the course of Perestrojka, so that the two norms began to compete with each other. After Lukašėnka was elected president in 1994, Taraškevica gradually became a symbol of a democratic, oppositional orientation. However, since the oppositional newspaper Naša Niva switched to Narkamaŭka in 2008, Taraškevica has again been so marginalized (existing only in a handful of webpages, including a separate version of Wikipedia) that the Belarusian biorthographic situation can be considered to have practically ended after two decades, at least in Belarus itself; Belarusians outside of Belarus can of course still use it freely, and many continue to do so.

In the 1820s–1830s, Slovenian was written in several different orthographies that used different devices to augment the Latin alphabet: The traditional Bohorič alphabet (bohoričica, introduced in 1584) was based on German grapheme–phoneme correspondences (〈z〉 /t͡s/, 〈ſ〉 /s/) and used long 〈ſ〉 vs. round 〈s〉 to distinguish between voiceless /s/ and voiced /z/ as well as digraphs formed with 〈h〉 for the postalveolars (〈ſh〉 /ʃ/, 〈sh〉 /ʒ/, 〈zh〉 /t͡ʃ/). The Dajnko alphabet (dajnčica, 1824) instead used other grapheme–phoneme correspondences (〈c〉 /t͡s/, 〈s〉 /s/, 〈z〉 /z/, 〈x〉 /ʒ/) and a few special letters (〈ɥ〉 /t͡ʃ/, 〈ȣ〉 /ʃ/, 〈ŋ〉 /ɲ/). The Metelko alphabet (metelčica, 1825) introduced 13 special letters – mostly borrowed from Cyrillic – for phonemes for which there was no unambiguous letter in the Latin alphabet (e.g. 〈ꝫ〉 /z/, 〈ƞ〉 /t͡s/, 〈ɰ〉 /ʃt͡ʃ/). School books and grammars appeared in all three alphabets, and all had their supporters. A public discussion in 1831–1833, which brought about the end of the Metelko alphabet, was called the ‘alphabet war’ by Matija Čop (‘ABC-Krieg’, Zhóp 1833; Sln. abecedna vojna). The Dajnko alphabet continued to have supporters especially in Styria, but 1839 saw the first book printed in Gajica, the orthography that finally prevailed and is used to this day.

This alphabet, which was introduced by Ljudevit Gaj in 1830, uses 〈č〉 /t͡ʃ/, 〈š〉 /ʃ/, and 〈ž〉 /ʒ/, and for Serbo-Croatian also 〈ć〉 /t͡ɕ/ (and 〈đ〉 /d͡ʑ/, which was introduced later by Đuro Daničić). However, the Rječnik hrvatskoga ili srpskoga jezika (Dictionary of the Croatian or Serbian language, Zagreb 1880–1976), which was initiated by Đuro Daničić in 1867 and edited by the Yugoslav Academy of Sciences for over a hundred years, was printed in a special orthography that also included the letters 〈ǵ〉 for 〈dž〉, 〈ļ〉 for 〈lj〉, and 〈ń〉 for 〈nj〉. Some Serbo-Croatian scholarly texts from the late nineteenth and twentieth centuries also used this orthography, which was associated with scientific accuracy and the common Serbo-Croatian literary heritage on which the dictionary was based (because the three additional letters allow a one-to-one transliteration into Cyrillic: 〈ǵ〉 ↔ 〈џ〉, 〈ļ〉 ↔ 〈љ〉, 〈ń〉 ↔ 〈њ〉).
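The one-to-one correspondence mentioned above can be sketched in a few lines of Python. The names `LATIN`, `CYRILLIC`, `to_cyrillic`, and `to_latin` are assumptions made for illustration, and the sketch handles lowercase letters only; the Latin string lists the thirty letters of the dictionary’s orthography in Cyrillic alphabetical order.

```python
import unicodedata

# Thirty Latin letters of the dictionary orthography, in the order of the
# Cyrillic alphabet (normalized so that each letter is one code point).
LATIN = unicodedata.normalize("NFC", "abvgdđežzijklļmnńoprstćufhcčǵš")
CYRILLIC = "абвгдђежзијклљмнњопрстћуфхцчџш"

# Since every letter has exactly one counterpart, a character-level table
# is sufficient in both directions.
TO_CYRILLIC = str.maketrans(LATIN, CYRILLIC)
TO_LATIN = str.maketrans(CYRILLIC, LATIN)


def to_cyrillic(text):
    """Transliterate lowercase Latin text letter by letter into Cyrillic."""
    return unicodedata.normalize("NFC", text).translate(TO_CYRILLIC)


def to_latin(text):
    """Transliterate lowercase Cyrillic text letter by letter into Latin."""
    return text.translate(TO_LATIN)


print(to_cyrillic("ļubav"))
print(to_latin(to_cyrillic("ńegov")))
```

With the digraphs 〈dž〉, 〈lj〉, and 〈nj〉 of ordinary Gajica, such a table would not be enough, since a sequence like 〈nj〉 can in principle stand either for 〈њ〉 or for 〈н〉 followed by 〈ј〉.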

33.7 Spelling Reforms

Orthography, as a system that has to be actively learned, is an element of language that is very accessible to language planning. However, those who have painstakingly (in former times literally) managed to master it are often reluctant to learn new rules. This is why spelling reforms are almost always a matter of heated discussion, at least in modern societies.

Usually, orthography reforms aim at making an orthography shallower, that is, at strengthening the phonemic principle, often readjusting spellings to phonology after a sound change, and thus making writing and especially learning to write easier. For example, by abolishing the letter 〈ѣ〉 (and other letters) in 1917, the Russian spelling reform reacted, with a few centuries’ delay, to the merger of *ě with *e in Russian (which had – probably somewhat artificially – still been pronounced differently by educated people at the time of Peter I’s alphabet reform). A phonemicization was also the pronounced aim of Vuk Karadžić’s orthography reform, which started in 1814 and used Sava Mrkalj’s rendering Piši kao što govoriš of Adelung’s principle Schreib wie du sprichst (‘Write as you speak’) as a motto. The introduction of diacritics for Czech in the early fifteenth-century treatise De orthographia bohemica and the improvements of Polish orthography by Stanisław Zaborowski, Hieronymus Vietor, and others a century later also met their objectives of making orthography reflect pronunciation more clearly.

Other spelling reforms were intended to increase or decrease the visual differences between one language and another. Thus, the mid-nineteenth-century replacement of 〈w〉 with 〈v〉 and 〈au〉 with 〈ou〉 in Czech distanced Czech from German. The 1933 reforms of Ukrainian and Belarusian orthography brought these languages visually closer to Russian (and the 1990 reintroduction of the letter 〈ґ〉 in Ukrainian slightly increased the distance again). Similarly, the so-called korienski pravopis ‘root orthography’ introduced by the Croatian fascist government in 1941, a distinctly morphemic orthography, sought to increase the contrast to Serbian with its determinedly phonemic orthography. The Bulgarian spelling reform of 1945 avowedly followed the Russian spelling reform of 1917 by abolishing all the letters that did not exist in the new Russian orthography (i.e. 〈ѣ〉, 〈ѫ〉, and final 〈‑ъ〉) while at the same time phonemicizing Bulgarian spelling at the expense of the morphological principle. The major aim of some spelling reforms is the orthographic (re)unification of a language, for example the introduction of Gajica to Croatian and Slovenian, which were written in a multitude of orthographies at the time, or the Polish spelling reform of 1936, which ended the time of the competing orthographic ‘schools’ of Warsaw and Cracow.

Some reform proposals, however, were not accepted by the population, although they made just as much sense as the successful reforms. For example, together with the introduction of civil type in 1708, Peter I also abolished the ‘superfluous’ letters 〈и〉 and 〈з〉 in favor of 〈і〉 and 〈ѕ〉, respectively, but the doublets had to be reintroduced two years later and continued to be used for another two centuries before the issue was finally resolved by the reform of 1917 (which, however, abolished 〈і〉 and 〈ѕ〉 instead of 〈и〉 and 〈з〉). Very sensible proposals to improve Russian orthography were put forward by official orthographic commissions in 1964 and 2000 but were never adopted (cf. Karpova 2010). The Metelko alphabet of 1825 represented Slovenian phonology much better than both the Bohorič alphabet current at the time and the Gaj alphabet used today because it had separate letters for the phonemes /e/, /ɛ/, /ə/, /o/, and /ɔ/ (〈ϵ〉, 〈e〉, 〈ꙅ〉, 〈o〉, and , respectively). The Omarčevski orthography (Bul. omarčevski pravopis), introduced by the Bulgarian spelling reform of 1921, anticipated many solutions of the 1945 orthography that is in use to this day, but it was, for example, more consistent in using a single letter 〈ѫ〉 to represent /ə/ rather than the three letters 〈ъ〉, 〈а〉, and 〈я〉, two of which are ambiguous. Nonetheless, it was rejected by many intellectuals and withdrawn after the coup d’état of 1923. The Russian letter 〈ё〉, which was invented in 1783 by Princess Ekaterina Daškova (cf. Pčelov & Čumakov 2000: 13–16), took 12 years to be used in a printed book for the first time and 159 years to become a mandatory part of the Russian school program. Two hundred and forty years after its invention, still nobody is obliged to actually use it (with the exception of proper names in official documents, for which it was made obligatory in 2017), and most people replace it with 〈е〉 in almost all circumstances.

The question of which factors influence the success or failure of writing reforms is not yet resolved. Among the most significant factors seem to be the literacy rate of the speech community (lower literacy rates making it easier to reform the writing system) and the timing with respect to political events (writing reforms implemented immediately after a revolution seeming to have the best chances of being accepted; cf. Bunčić 2017).

33.8 Conclusion

Most Slavic orthographies are relatively shallow, relying mainly on the phonemic and the morphological principles, with other orthographic principles playing minor roles. However, in some orthographies with a rather unbroken tradition (like Polish or Czech) the historical principle plays a certain role, and others (like Russian or to some extent Bulgarian) rely heavily on the morphological principle. In some minor details (like comma rules or quotation marks) one can see different Slavic languages displaying different external influences, especially from French and German. In the way writing systems were adapted to the Slavic languages, the major split is between the languages written in Cyrillic and the ones written in the Latin alphabet: in the Latin alphabet, the main devices used are diacritics (nowadays especially the ones introduced by the Hussites) and digraphs, whereas the Cyrillic alphabet hardly has diacritics or digraphs at all but uses special letters created from ligatures or with diacritic elements or borrowed from a different script. Spelling reforms over the course of history have generally strengthened the phonemic principle, unified orthography for a language, or increased or decreased differences between languages in line with the political situation.

Vod-a	pi-t’.
water-nom.sg	drink-inf
‘It is necessary to drink water.’

Vod-y	ubyva-et.
water-gen.sg	decrease-3prs.sg
‘The water is going down.’

Da-j-te	chleb-a,	požalujsta.
give-imv-2pl	bread-gen.sg	please
‘Pass me some bread, please!’

i	malčik	idjot	iskat’	ljaguška
and	boy-nom	goes	look for	frog-nom
‘And the boy goes looking for his frog’ (Reference Polinsky, Brinton, Kagan and BauckusPolinsky 2008b: 153)

Njemu	je	još	više	sram.
he-dat	be-3.sg	still	more	shame
‘He is even more embarrassed.’ (Reference Hansen, Hansen, Grković-Major and SonnenhauserHansen 2018: 22)

one	turski	krovove
this-acc.pl	Turkish-nom.pl	roof-acc.pl
‘those Turkish roofs’ (Reference Hansen, Hansen, Grković-Major and SonnenhauserHansen 2018: 32)

Iskam	da vidja	go.
want	to see	it-cl.m
‘I want to see it.’ (Reference Ivanova-SullivanIvanova-Sullivan 2019: 22)

U	menja	knig-a.
prp	1sg.gen	book-nom.sg
‘I have a book.’
