Part IV Empirical Approaches

14 Studying Epigraphic Writing

14.1 Introduction

In this chapter, we focus on epigraphic writing (i.e. writing on hard materials such as stone, metal and ceramics) in ancient Italy, from around 300 to 50 BC. This material presents a whole host of methodological problems for the study of orthography, but also a wealth of opportunities for gaining insight into the mechanics of these languages and their wider social context. First, the problems. There were multiple steps in the production of epigraphic texts on stone or bronze, and each step (payment, commissioning, drafting, tracing onto the support, incising, painting) might be completed by one or more different people. It therefore becomes difficult to know who was in control of the orthography in any individual case. In some cases, the person commissioning the inscription, whose name and personal details may be recorded in the inscription, might have come prepared with a draft, which was faithfully copied; in other cases, his or her instructions might have been much vaguer; in others still, the tracing of the text might have been done inexpertly or inaccurately, or the stonecutter might have made changes at a late stage. In yet other cases, such as some curse tablets and types of graffiti, it may be possible to reduce this multistage process to the actions of one individual, but we can never be certain: many curse tablets, for example, were likely written or drafted by professionals, who themselves carried handbooks of example curses copied from elsewhere to be adapted for their clients (Gager 1992: 4–5, Dickie 2001: 48–49, McDonald 2015: 136). We have to assess each inscription carefully in its own context, looking for any clues as to how it was put together.¹

Then there are the practical problems of studying languages with limited data. Small corpora of only a few hundred or a few thousand words limit the number of instances of each spelling, making it harder to find patterns. Damage to stones over the centuries can obscure some of the spelling variation. Dating the texts is also a constant problem: many texts in ancient languages can only be dated to within a century or two, which limits our ability to recognize change over time. As a result, this kind of ancient written evidence is sometimes labeled ‘bad data’, at least from the point of view of traditional sociolinguistics (Labov 1972a: 100). However, these issues can be overcome. There has been a wealth of work on the fragmentary ancient languages of the Iron Age Mediterranean in the last decade or so, including Clackson 2015 (an overview), Dupraz 2012 (on Oscan and Umbrian), McDonald 2015, 2017 (on Oscan), Zair 2016 (on Oscan), Mullen 2013 (on Gaul), Steele 2013b (on Cyprus), Estarán Tolosa 2016 (on the western Mediterranean, including Iberia), Adiego 2006 (on Carian) and Tribulato 2012 (on Sicily), and we have made considerable inroads into establishing a solid methodological basis for understanding these languages and their writing systems (McDonald 2017).

As we discuss below, there were often identifiable spelling norms within communities or regions; interpreting the significance of these norms is the next challenge. Our knowledge of the ancient Italian education system – for professional scribes and inscription writers, for the elite, and for the wider population who had some knowledge of writing – is very limited. As a result, any notion of orthographic ‘rules’ or ‘norms’ has to be treated carefully. We know, however, that teaching and learning did occur, and that this involved more than teaching the basics of the alphabet. Some of the orthographic conventions that we see probably originated as spelling ‘rules’ taught to learners from the earliest stages of their education. Much of this teaching was very informal, but a considerable proportion of it centered on professional training – for scribes, priests and priestesses, craftsmen, stonemasons and other kinds of workers who made use of writing (for an overview of education in Italy at this period, see McDonald 2019). The ‘schools’ or ‘tutors’ used by the elite were another way in which orthography was transmitted. Latin, in particular, underwent a process of ‘standardization’ in the second and first centuries BC (Clackson and Horrocks 2007), and the remarkable consistency of Oscan orthographic conventions and letter shapes makes it seem likely that Oscan, a neighboring and closely related language, was experiencing a similar crystallization of rules at around the same time.

This does not mean, however, that these rules were as systematic as those of modern standard languages, nor as fixed and rigid. On the contrary, the notion of centralized reforms accepted by a wide community in any ancient language, which has sometimes been mooted in twentieth-century scholarship, should probably be given up. We should assume that the orthographic norms attested in the written record of any language in ancient Italy reflect both collectively accepted norms (explicitly taught in some kind of teaching system) and individual variations, preferences and innovations, which may either have been adopted by writers other than their originator or quickly sunk into oblivion.

It is also important to note that many studies of ancient orthography consider multilingualism closely. This is partly because the ancient world was so profoundly multilingual that many of the slaves, craftsmen and soldiers involved in the creation of inscriptions spoke and wrote in multiple languages, and indeed in multiple alphabets. But it is also because our knowledge of fragmentary languages often draws on our more detailed knowledge of languages like Latin and Greek, so that our best way into the epigraphic evidence in lesser-known languages like Oscan, Umbrian or Venetic is often through the study of contact and exchange.

In sum, the baseline assumptions of scholars have changed profoundly in the past twenty years. Writing which was previously deemed incompetent, error-prone or careless is now much more likely to be assumed to be meaningful in some way, whether or not the writer or copyist was aware of its significance. This change of viewpoint has allowed a much deeper understanding of variation and change in ancient orthography. Spellings that look like errors or incompetence may be unconscious borrowings from another language known by the writer, or a particular orthography may have been used deliberately to evoke a specific effect. Nonstandard spelling may be the result of a lack of knowledge or of experimentation with the orthography, or may show us a snapshot of a wider change in progress. Even once the writing system of a particular area had become more established, there was still an ongoing process of adaptation and change, with reference to other nearby alphabets, the changing phonology of the language, conscious archaism (or even false archaism) referencing older forms, processes of top-down standardization and so on. Ancient orthography can often be best understood on the micro level of individual inscriptions, taking into account their context, location and linguistic landscape; in other words, we must follow the principle of “informational maximalism” (Janda and Joseph 2003: 37).

14.2 Republican and Early Imperial Latin Orthography: Grammarians and Epigraphy

The surviving evidence of Latin epigraphy spans a millennium; the unique depth and breadth of the Latin corpus means that we can watch orthographic fashions come and go. The most famous of these, perhaps, is the ‘Claudian’ letters: new characters invented and promoted by the somewhat scholarly emperor Claudius (ruled AD 41–54), which do not appear to have outlived his reign (Suetonius Life of Claudius 41.3). More tenacious were the apex (a small stroke or ‘accent’ above long vowels, apart from <i>) and the i-longa (a taller version of <i>), which can be seen as early as the second century BC, most commonly in official and more expensive inscriptions. A nice example of the apex and i-longa in action is the dedication to Augustus in the College of the Augustales in Herculaneum (AE 1979 169; see Figure 14.1); however, they were never used consistently (Gordon 1983: 14, Clackson and Horrocks 2007: 95). We also have literary texts which provide a commentary on Latin spelling rules, advocating some and rejecting others. The epigraphic material provides a helpful warning against taking the literary texts too much at face value, as the ‘rules’ that writers espoused did not always match writing practices on the ground.

Figure 14.1 Dedication to Augustus, Herculaneum, first century AD

The apex and i-longa are used several times (e.g. in the name Lucii in the second line).

(AE 1979, 169. Photo: K. McDonald)

One example is the use of double graphemes to represent long vowels, such as <aa> for /aː/, also known as geminatio vocalium. This spelling convention is documented in the epigraphic record of Latin between about 140 and 50 BC, mostly in initial syllables, resulting in spellings such as Maarcus, seedes and iuus (Salomies 2015: 172). The earliest example may be aaram (CIL I² 2238, 135/4 BC, Delos). Double <aa> is both the earliest and the most frequently attested double vowel spelling, with <uu> appearing around 117 BC and <ee> not until around 100 BC (Vine 1993: 269). The spelling <oo> is rare, and <ii> particularly rare, though not unknown (Chahoud 2019: 65), probably because of possible confusion with the cursive letter <e>, which was written as two vertical lines (Vine 1993: 272). The double vowel convention probably did not originate in Latin inscriptions but in Oscan, a closely related language spoken in central and southern Italy, whose speakers (and writers) had been in sustained contact with Latin since the Roman expansion into Italy in the late fourth and early third centuries BC (Lejeune 1975: 240–42, Vine 1993: 267–86). Double vowels are attested in Oscan much earlier than in Latin, possibly as early as the third century BC (depending on the date of Pallanum 1/Fr 2, which was written some time from c. 300 BC onward; there are many examples from 200 BC onward).²

In Oscan, vowel length is only distinguished for stressed vowels, and the word stress falls on the initial syllable (see Meiser 1986: 135–51 for comments on both Umbrian and Oscan); as a result, the writing of long vowels as double is almost always confined to initial syllables. The spelling <aa> is by far the most common double vowel grapheme (about 55 percent of the attested examples). This can be ascribed to the phonology of Oscan vowels: the phonemes /εː/ and /ɔː/ do exist, but they are relatively infrequent, due to a proto-historic vowel shift (Meiser 1986: 39–54). As a result, /aː/ is by far the most frequent open long vowel in the language. In Latin, by contrast, vowel length is relevant in all word positions, and the word stress falls on the penultimate or antepenultimate syllable, so it does not necessarily fall on the first syllable of polysyllabic forms. Furthermore, /aː/ was not particularly frequent compared to the other long vowels. Nevertheless, the extant examples of double vowel graphemes used to represent long vowels in Latin inscriptions show exactly the same tendencies as in Oscan: double vowel graphemes are attested mainly in word-initial syllables (or in monosyllabic words), and double <aa> accounts for roughly half of all the documented examples. To give a few examples:

Oscan aasas (‘altars’, nominative plural, ImIt Teruentum 34/ST Sa 1, line B 1, Agnone);
Latin aara (‘altar’, nominative singular, CIL I² 1439, Bouillae);
Oscan Staattieís (‘Statius’, masculine name, genitive singular, ImIt Abella 3/ST Cm 3, Abella);
Latin Staatius (‘Statius’, masculine name, nominative singular, CIL I² 1845, Amiternum).

As can be seen from these examples, many words and names are common to both languages (as is understandable in two closely related languages spoken in neighboring areas, and sometimes within the same communities). The shared names in particular, Vine has suggested, may have been the motivation for the cross-over of this orthographic convention (Vine 1993: 279–80, Chahoud 2019: 66). It seems reasonable to suppose that the double spelling of long vowels might have been introduced into Latin by Oscan/Latin bilinguals who were literate in both languages.³ The new orthographic rule that some writers of Latin seem to have acquired can be expressed as follows: ‘(optionally) transcribe any long vowel in word-initial syllables and in stressed monosyllabic forms as a double vowel, especially if the vowel is an /aː/’. If we accept this explanation of the origin of the convention, some writers of Latin ended up following a convention that was (partly) based on the phonology of a non-Latin language. However, the rule also seems to have been subject to reinterpretation within Latin itself. The practice of writing long vowels with a digraph is mentioned by various Roman grammarians, with arguments both for and against it. The poet Lucius Accius (c. 170–c. 86 BC) seems to have been in favor, according to Terentius Scaurus:

Accius wanted the syllables which are long by nature to be written with gemination of the vowel, whereas, otherwise, the fact that a vowel is long or short can be expressed by the addition or the suppression of an apex.

(Dangel 1995, fr. XXI)

The poet Gaius Lucilius (c. 180–102/101 BC), who is characterized as an “assertive voice delivering views on an unprecedented variety of themes” (Chahoud 2019: 46), and who had a special interest in orthography, seems to have rejected this rule, for the following reason:

First, AA is a long, and A a short syllable: however, we will do the same for both, and, just as we speak, we shall write pacem [peace], placide [calmly], Ianum [Janus], aridum [dry], acetum [vinegar], as the Greeks do with Ἆρες Ἄρες [two allomorphs of the god-name Ares in the vocative, the first one with long /aː/, the second one with short /a/].

(Lucilius fr. 9, 5, as quoted in Charpin 1991)

Lucilius advocated what he considered a phonetic rule: just “as [the Romans] speak,” they should write <a> in all cases. This means that Lucilius did not regard the suprasegmental opposition between short and long vowels as a truly phonetic one. Furthermore, Lucilius reminds his readers that ancient Greek orthography also uses a single grapheme, alpha, for both short /a/ and long /aː/, even though it has separate letters for the short and long values of some other vowels, for example /e(ː)/ and /o(ː)/. His Greek example (Ἆρες Ἄρες, Homer Iliad 5.31) is also used by his younger Greek contemporary, Dionysius Thrax, which suggests a common stock of examples on which they both drew, and shows how closely Lucilius was aligning himself with Greek rather than ‘local’ or ‘nonurban’ orthographic norms (Vine 1993: 286, Chahoud 2019: 61, 65). Interestingly, however, it seems that Lucilius was only concerned with the transcription of the vowel /a(ː)/ in the first syllable of word forms (Vine 1993: 279–80). As examples, he quotes pācem, plăcidē, Iānum, āridum, ăcētum, all of which contain the initial vowel /a/ or /aː/ in a stressed (in most cases) or an unstressed syllable (ăcētum, in which the stressed vowel is /eː/). As we have seen, the doubling of /aː/ in initial syllables was the most common kind of vowel doubling in epigraphy, and this was perhaps partly what Lucilius had in mind.
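The optional rule formulated earlier for Latin (‘transcribe any long vowel in word-initial syllables and in stressed monosyllabic forms as a double vowel, especially if the vowel is an /aː/’) is explicit enough to be written as a small procedure. The Python sketch below is purely illustrative and rests on assumptions of our own: long vowels in the input are marked with macrons (a modern editorial convention, not an ancient one), <i> or <u> immediately before another vowel is treated as consonantal, and only the first vowel of the word is examined, which also covers monosyllables. The function names are hypothetical, and because the ancient rule was optional, the output shows one spelling a writer could have chosen, not a prediction of what any inscription contains.

```python
import unicodedata

# Illustrative sketch only: function names are hypothetical, and marking
# long vowels with macrons is an assumed modern input convention.
MACRON = "\u0304"  # combining macron (the modern length mark)
VOWELS = set("aeiouAEIOU")
GLIDES = set("iuIU")  # letters that can also write consonantal /j/ and /w/


def strip_macrons(text: str) -> str:
    """Remove macrons, which ancient inscriptions did not use."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if c != MACRON)


def geminate_initial_vowel(word: str, only_a: bool = False) -> str:
    """Optionally double the long vowel of the first syllable.

    Only the first syllabic vowel is examined, so monosyllables are
    covered too. With only_a=True the rule is restricted to /a:/.
    """
    chars = unicodedata.normalize("NFD", word)
    for i, ch in enumerate(chars):
        if ch not in VOWELS:
            continue  # consonants and combining marks
        nxt = chars[i + 1] if i + 1 < len(chars) else ""
        if ch in GLIDES and nxt in VOWELS:
            continue  # assumed consonantal i/u, as in iūs or quaestor
        is_long = nxt == MACRON
        if is_long and (not only_a or ch.lower() == "a"):
            # insert a second copy of the vowel before the macron
            chars = chars[: i + 1] + ch.lower() + chars[i + 1:]
        break  # stop after the first syllabic vowel
    return strip_macrons(chars)


if __name__ == "__main__":
    # Prints Maarcus, seedes, iuus, Staatius, placide
    for w in ["Mārcus", "sēdēs", "iūs", "Stātius", "placidē"]:
        print(w, "->", geminate_initial_vowel(w))
```

With only_a=True the rule is narrowed to /aː/, the case Lucilius perhaps had chiefly in mind, so sēdēs stays sedes while Mārcus still becomes Maarcus. Diphthongs, vowels in hiatus (such as the i of diu) and the position of stress in longer words are deliberately not modelled.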

Were either Accius or Lucilius aware of Oscan spelling conventions, including the double spelling <aa> for /aː/? Lucilius probably knew some Oscan (Charpin 1978: 11, Adams 2003: 120–22); Accius, on the other hand, was born in Pisaurum, an Umbrian-speaking area. Perhaps Lucilius’s appeal to Greek models implicitly drew a contrast with some Romans’ habit of sharing spelling rules with their Samnite neighbors, a habit which had a very different social and cultural meaning. Of the two systems for marking long vowels, it was the apex that survived longer: Quintilian (c. AD 35–c. 100) referred to double vowels as an obsolete practice (Institutio oratoria 1.4.10, also 1.7.14). It is worth remembering that authors’ comments on orthographic rules draw a contrast between themselves and (a) other ‘experts’ and (b) the spellings used by their contemporaries. For example, Quintilian, writing in the first century AD, advocated a very sparing use of the letter <k>:

For I think that in fact K should not be used in any words except those which can also be signified by the letter on its own [e.g. K can be used as an abbreviation for Kalendae, the first day of the month]. I mention this because some people believe that K is necessary whenever it is followed by the letter A, even though the letter C exists, which has the same quality in front of every vowel.

(Quintilian Institutio oratoria 1.7.10)

Here, Quintilian explicitly limits the instances where <k> is correct, but he acknowledges that there are other opinions on the matter. To a modern learner of Latin, the letter <k> is limited to a very small number of words: Kalends, Karthago and a few others. And Quintilian is right that writers of Latin in his time did not consistently use <k> before <a> (although this had been the orthographic convention a few hundred years before him). But we can see that what was ‘incorrect’ to Quintilian was not ‘nonstandard’ in the literate community more widely. For example, karissimus/-a and related words are extremely common on tombstones, and this spelling is almost as common as carissimus.⁴ The spelling with <k> may have been preferred by some writers and not others, as can be seen in the Vindolanda tablets, where letters to and from some of the commanding officers and their wives show this spelling (Adams 1995: 119). The ‘standard’ orthography espoused by Quintilian was not the only spelling used even by professional and educated writers. These slight mismatches between the literary testimonia and the epigraphic evidence remind us that neither ‘education’ nor ‘literacy’ was monolithic. A great deal of our evidence for education hinges on the accounts of male authors from a certain class and time period, writing for a particular audience; there were many literate people outside this social class who had their own understanding of what the ‘rules’ were and, of course, these rules changed considerably over time.

14.3 Oscan Orthography: Regional Communities of Practice

We have already touched on the possible influence of Oscan orthography on Latin orthography. How much do we know about orthographic norms in Oscan itself? Oscan is an unusual example of a fragmentary language in some respects. Firstly, it is written in (at least) three alphabets (Tikkanen 2020). The best attested are the ‘Oscan’ (also called ‘National’ or ‘Central’) alphabet, which used the Etruscan alphabet as its direct model; the Greek alphabet, used particularly in Lucania and Bruttium; and the Latin alphabet, used particularly in the areas nearest Rome and in later inscriptions.⁵ All of these alphabets were adapted to a greater or lesser extent to write Oscan: for example, Oscan in the Greek alphabet uses a special character for /f/, often in the shape of an <s>. The adaptation of the Greek alphabet to writing Oscan also required the creation of norms for how to write vowels and certain consonant clusters. We explore both of these examples below.

Secondly, Oscan is attested across a particularly wide range of document types and, as a result, we have evidence of the spellings used by a range of people with different purposes in mind. Surviving inscriptions include legal texts, religious texts, inscriptions commemorating building work, dedications, curse tablets, graffiti, coins and artists’ signatures. Thirdly, it is often considered to show an unusually high level of orthographic ‘standardization’ (Zair 2016: 124–25, Tikkanen 2020). This is not to say that there is no spelling variation, particularly in the texts written in the Greek and Latin alphabets. Several recent studies, including Zair (2016) and McDonald (2015), both discussed below, have looked at Oscan orthography and tried to make sense of some of this variation; although some spelling ‘rules’ have been proposed, the evidence is sparse and needs to be used carefully. All the same, the evidence leads us in some interesting directions, and suggests that even in communities with a great deal of spelling variation, there was probably some sense that ‘rules’ existed.

Zair (2016: 26–95) has explored the vowel orthography of Oscan in the Greek alphabet. The spelling of vowels presented a problem in all of Oscan’s alphabets, because none of them had an inventory of signs that mapped directly onto the vowels of the language. In the Oscan alphabet, diacritic marks were created c. 300 BC to mitigate this issue. In the Greek alphabet, however, the problem remained: how could a six-vowel system (/i(ː)/, /e(ː)/, /ε(ː)/, /a(ː)/, /o(ː)/, /u(ː)/, plus diphthongs) be written using the nine Greek vowel graphemes <ι>, <ει>, <ε>, <η>, <α>, <ο>, <ω>, <ου>, <υ>?⁶ This topic had previously been explored by Lejeune (1970), but considerable new evidence was uncovered in the second half of the twentieth century, which invited a new evaluation of the issue. As described by Lejeune, there were two major stages in how vowels were written in Oscan in the Greek alphabet.⁷ Lejeune believed in orthographic norms created in centralized scribal schools, the most important of which was centered on the sanctuary of Rossano di Vaglio (Lejeune 1970: 276). This understanding of the education system allows for the possibility of a widespread and relatively quick spelling reform (as set out in Table 14.1).

Table 14.1 Lejeune’s analysis of vowel orthography in Oscan in the Greek alphabet

	Stage 1 (up to 300 BC)	Stage 2 (after 300 BC)
/i/	ι	ι
/e/	ε	ει
/ε/	ε	ε
/εi/	ει	ηι
/a/	α	α
/a/ in word-final position	ο	ο
		Stage 2, nonfinal syllables	Stage 2, final syllables
/o/	ο	ο	ο, ου, ω
/o/ next to a labial	ο	ω
/u/	υ	ου	ου, ο, ω
/u/ > /ju/ > /y/	υ	ιυ, υ
/ou/	ου	ωϝ, ωυ

However, Zair’s analysis of the current evidence has shown that, at the very least, a large minority of the extant inscriptions do not follow the system outlined by Lejeune (Zair 2016: 44). Rather than accepting that these exceptions are the products of other, minor scribal schools, or of no scribal school at all (which is how Lejeune explains spelling variation), Zair puts forward a different proposition. There was, in fact, a great deal more variation in vowel spelling than has previously been supposed, both across the whole period in which these inscriptions were written and within individual inscriptions. There was no single ‘reform’, but a collection of spellings from which writers could choose, most of which were in use from the fourth to the first century BC (as shown in Table 14.2).

Table 14.2 Zair’s analysis of vowel orthography in Oscan in the Greek alphabet

	Spelling 1	Spelling 2	Spelling 3
/i/	ι
/e/	ι	ε	ει
/ε/	ε
/εi/	ει (perhaps not in use in the first century BC)	ηι
/a/	α
/o/	ο	ω
/u/	ου	ο	υ (only /y/ < *-u- after dental)

This scrutiny of the vowel orthography is not done simply for its own sake: Zair shows how a better understanding of the vowel orthography can help us produce better data for understanding Oscan etymology and morphology (Zair 2016: 80–83). It also gives us some insight into the education system, and the kinds of rules that people were (or were not) taught. There is no sign, for example, that writers were taught to avoid spelling variation on principle, except perhaps as a matter of individual preference (Zair 2016: 91). However, there are hints of orthographic conventions that were taken up by many writers, such as the use of -ηι for the /εi/ diphthong. Forty-five of the 48 instances of the spelling -ηι- occur in either the dative singular -ηι or the genitive singular -ηιϲ,⁸ which suggests a convention of sorts for writing these particular noun endings. It is very likely that this was influenced by the same spelling found in dative endings in Greek (Zair 2016: 95).

Another recent study is McDonald’s work on the writing of consonant clusters in Oscan in the Greek alphabet, in particular /ps/, written either as <ψ> or <πϲ>, and /ks/, written either as <ξ> or <κϲ> (McDonald 2015: 82–93). The Greek letter psi is the most common way to spell the cluster /ps/, but it is not used consistently. Most notably, all our examples of psi are found in names, whether personal names or divine names (e.g. νοψιν, a male name in the accusative,⁹ Laos 2/Lu 46, Laos), while our examples of <πϲ> are found in other kinds of words (e.g. (ω)πϲανω ‘building (gerundive)’, Potentia 1/Lu 5, Rossano di Vaglio). Xi follows the same pattern, but is even rarer. On its own, this might not tell us much: there are only a handful of examples of these spellings, and many Oscan inscriptions contain only names, so we do not have enough other kinds of words to make a fair comparison. However, there is a regional pattern in the data. The split between names and other words holds everywhere except Messina, in Sicily, where we have the only nonname instance of xi (μεδδειξ ‘magistrate’, nominative plural, Messana 4/Me 1) and the only name written with <κϲ> (μαμαρεκϲ, Messana 6/Me 4). Historically, a difference in orthographic norms between the Oscan speakers of Messina and other speakers using the Greek alphabet is plausible, because the Mamertines who occupied Messina were (supposedly) Oscan-speaking mercenaries hired in Campania, where the Oscan alphabet was used (McDonald 2015: 90–92). Perhaps they had the Oscan alphabet, which has no psi or xi, partly in mind. The pattern identified here may be due to chance, but it may also be the result of a small community of writers whose orthographic norms came from a different source from those of others writing in the same alphabet.

There are some mysteries among the apparent orthographic rules of Oscan. For example, although double consonants can be written either double or single in Oscan, in loanwords from Latin and Greek containing /-st-/ the <s> is almost always written double (Zair 2016: 163–64). Examples include kvaísstur < Latin quaestor, passtata < Greek παστάς ‘porch’, perisstul[leís] < Greek περίστυλον ‘colonnade’. Native words containing /st/, however, are written <st>. This may reflect some perceived difference in the quality of the Greek and Latin cluster compared to the sound of the cluster in Oscan. Strikingly, however, the doubling of the <s> is not found when the same borrowed words are written in the Greek alphabet: quaestor, for example, appears as κϝαιϲτορ. Zair suggests that writers in Lucania and Bruttium (using the Greek alphabet) may have been in close enough contact with the Oscan alphabet to be aware of some orthographic rules (e.g. ‘double letters to write geminate consonants are noncompulsory’), but not close enough to be aware of more detailed exceptions to those rules (e.g. ‘more-or-less compulsory double writing of <s> for /sst/ in loanwords’; Zair 2016: 164). This example suggests not just that there were regional orthographic conventions, but that we might be able to reconstruct something of how the rules spread from place to place, being adapted, changed and simplified as they went.

14.4 Umbrian Orthography: Rules in the Iguvine Tables

Let us now discuss orthography in another fragmentarily attested language: Umbrian, another Italic language closely related to Latin, written in and around Umbria from around the seventh century BC to the first century AD. It is attested mainly in an exceptional text known as the Iguvine Tables, which merits detailed exploration. The Iguvine Tables are seven bronze tablets engraved with descriptions of six complex state rituals, together with four general regulations of the Atiedian Brotherhood, which was in charge of these rituals in the city-state of Gubbio (Iguuium). These texts were engraved at various times from the end of the third century to the beginning of the first century BC. It must be emphasized that they represent only the last stage of a written tradition that had arisen as early as the last decades of the fourth century BC, as well as of a lengthy oral tradition (Maggiani and Nardo 2014). Most of the engraved texts contain older sections taken from some earlier version that had been written on a perishable support such as wood or waxed tablets (Rix 1985: 27–34, Dupraz 2011). As a result, the Iguvine Tables document the writing habitus of a restricted community of learned priests, who would have drafted the text to be inscribed. They tell us something about the orthographic norms used by a small number of individuals (elite priests and the craftspeople – who may have been enslaved, freed or freeborn – whom they hired to engrave their inscriptions) in one small city over a span of about two centuries. We know little about the education of these people, but it probably mirrored the systems of learning to write found elsewhere in Italy: the professional training of both priests and stonecutters created and reproduced local spelling conventions.
Our knowledge of Umbrian is so heavily based on this single series of documents that it is very difficult for us to speak of ‘Umbrian’ writing habits more broadly; however, the Iguvine Tables represent a uniquely long text in an Italic language other than Latin (four times longer than the next longest text, the Oscan Tabula Bantina), and they provide a rich resource for thinking about orthographic norms in a single community (Figure 14.2).

Figure 14.2 Table 5 of the Iguvine Tables, showing the end of the older text (in the Umbrian alphabet) followed by the beginning of the newer text (in the Latin alphabet)

(image: K. McDonald)

The Iguvine priests used two different writing systems: an Etruscan-based alphabet, usually known as the ‘Umbrian’ alphabet, and the Latin alphabet. The Umbrian alphabet used in the Tables can be further divided into two variants, which share the same inventory of signs although the shapes of some letters differ. Both variants include two letters unique to Umbrian, which we usually transcribe <ř> and <ç>. The Latin alphabet is used on the last two and a half tablets, which were probably written last, around the first century BC. This change of alphabet should not be explained as a form of decay or a ‘forgetting’ of the Umbrian alphabet over time; it more likely reflects a process of fitting Umbrian traditions into a changing political and social landscape, in competition first with other Umbrian city-states, and then with Rome (see McDonald and Zair 2017 for similar considerations regarding the Tabula Bantina). The orthography of the Iguvine Tables respects some general spelling conventions throughout, but these conventions left much room for individual freedom and creative innovation, and there is considerable internal variation in the document in how some phonemes are represented. Meiser was nonetheless able to establish the overall correspondences between phonemes and graphemes. The Umbrian language had three front vowel qualities (i.e. vowels produced with the high point of the tongue close to the front of the mouth), each of which could be short or long; their spellings are shown in Table 14.3 (Meiser 1986: 27–28, Dupraz 2016).

Table 14.3 Front vowels in Umbrian (simplified presentation)

Phonemes	Graphemes (Umbrian alphabet)	Graphemes (Latin alphabet)
/iː/	<i>, <ih>	<i>, <ihi>
/i/	<i>	<i>
/eː/	<e>, <i>, <eh>	<e>, <i>, <ei>, <eh>, <ehe>
/e/	<e>, <i>	<e>, <i>, <ei>
/ε/	<e>	<e>
/εː/	<e>, <eh>	<e>, <ee>, <ehe>

Vowel length was distinctive only in stressed syllables (as in Oscan, but unlike Latin), and the stress fell on the initial syllable of the word. Specific strategies were optionally available for representing long vowels, and these are documented only in stressed syllables. The main one was the use of the grapheme <h> as a marker of length, often combined, in the texts in the Latin alphabet, with repetition of the vowel, so that, for example, /iː/ was written <ih> in the Umbrian alphabet and <ihi> in the Latin alphabet (e.g. persnihmu ‘he should pray’, IV 11, 23, 25, alongside persnimu, IV 8, 10, and Latin-alphabet persnihimu, VI b 17). The transcription of the mid-close vowel /e(ː)/ clearly posed a difficulty: neither the Etruscan alphabet, from which the Umbrian alphabet was adapted, nor the Latin alphabet had three symbols for front vowels. This problem was usually solved by using (arbitrarily) either the grapheme <e> or <i>, combined (in the Latin alphabet) with the length marker <h> when relevant; however, a digraph <ei> also arose as a specific marker for this mid-close vowel.

This is the usual explanation of the orthography of front vowels in Umbrian, and it is broadly correct, but it fails to account for some specific spelling conventions that emerge when we turn from the level of the whole corpus to analyze particular subsets of words. The analysis of the orthographic rules of the Iguvine Tables, begun more than a century ago by von Planta (1892), is not yet complete. As we shall see, there is considerable methodological value in looking closely at individual forms and lexemes when using this kind of epigraphic evidence. To begin with, there seem to be orthographic rules relating to particular morphological forms. The descriptions of rituals are mainly written in a specific mood called the imperative II, often known in Latin grammars as the ‘future imperative’, which means something like ‘X should do’ or ‘X shall do’. The Tables contain hundreds of examples of this form. The imperative II of verbs of the *-ē-conjugation is almost always written with <e> in the texts in the Umbrian alphabet, and with <i> in the texts in the Latin alphabet (von Planta 1892: 1, 95), regardless of the other possibilities that Umbrian orthography as a whole offered for /e/. See, for instance, the following examples:

habetutu ‘they shall have’ (I b 15, Umbrian alphabet);
kařetu ‘you shall call’ (I b 33, Umbrian alphabet), but carsitu (e.g. in VII a 43, Latin alphabet);
habetu ‘you shall have’ (e.g. in II b 23, 2×, Umbrian alphabet);
uřetu ‘you shall lighten’ (e.g. in III 12, Umbrian alphabet);
sersitu ‘he shall be sitting’ (VI b 41, Latin alphabet).

It seems clear that there was a collectively shared norm as to the orthography of the verbal suffix *-ē- in the imperative II, and the writers of the Iguvine Tables are remarkably consistent in its application. Many questions remain open, of course (for instance, was this rule explicitly taught when learning to write?), but the existence of the convention is clear. A slightly different problem is that of the present active participles of the *-ē-conjugation. In these participles, the verbal suffix is always written with <e>, in both alphabets:

kutef ‘being silent’ (e.g. in I a 6, Umbrian alphabet);
zeřef ‘sitting’ (e.g. in I a 25, Umbrian alphabet);
serse ‘sitting’ (e.g. in VI b 41, 3×, Latin alphabet).

This raises a problem often encountered when discussing the orthography of ancient languages, especially fragmentarily attested ones: is the spelling with <e> a purely orthographic rule, or should it be ascribed to a specific phonetic property of the vowel in the suffix? There are reasons to believe that in the present active participles, a prehistoric conditioned sound change known as Osthoff’s law¹⁰ shortened the etymologically long *-ē- to *-e-, which, following a vowel shift, became /ε/. This /ε/ vowel is always written <e> in Umbrian (Untermann 2000, s.v. kutef; Fortson and Weiss 2019: 639). On the other hand, as we have seen, <e> is also one of the usual notations for /e/ (< unstressed *-ē- after the vowel shift), alongside <i> and <ei>.¹¹ The quality of the vowel in this ending therefore remains uncertain.

The fact that in the texts written in the Latin alphabet the imperative II always has <i> (e.g. sersitu in VI b 41), whereas the present active participle always has <e> (e.g. serse, the participle of the same lexeme), seems to imply that the two vowel qualities were felt to be phonetically different. We may therefore assume that the scribes using the Umbrian alphabet did not try to devise a contrastive orthography (since they used <e> both for the imperative II and for the present active participle), and that this contrast was a new spelling convention created by later priests who used the Latin alphabet.

In Umbrian, spelling conventions also existed at the lexical level; that is, some spellings stayed consistent across different attestations of the same word. It is probably significant, for instance, that the lexeme frite ‘trust’, which is unfortunately attested only in texts written in the Latin alphabet, is always written with <i> in the first syllable, although it contains a stressed *-ē- > /eː/, which may in principle appear as <e> or <ei>, or even as <eh> or <ehe>. This noun is attested in two different texts (the long version of the description of the piaculum ritual, VI a 1–VI b 47, 4×, and the long version of the lustratio ritual, VI b 48–VII a 54, 4×). Its uniform spelling is therefore probably to be interpreted not as the choice of a single priest or scribe, but as a wider spelling convention.

The variation in spelling that appears when we examine the corpus as a whole is often the effect of multiple smaller-scale norms, each followed by different individuals. One example is the spelling of the demonstrative stem *eks- in the long version of the lustratio ritual.

Umbrian has a proximal demonstrative stem *eks- > /εss-/, which is almost always written <es->. It seems that the vowel /ε/ was phonetically raised to /e/ before several consonant clusters, among them *-ks- > /ss/, when these were followed by a front vowel; this raising probably took place in some of the case forms of *eks-, but not in all (Meiser 1986: 110, Dupraz 2012: 84). In general, however, the raising is not reflected in the orthography of this demonstrative. The choice not to mark the phonetic opposition between stem vowels with and without raising should be interpreted as a lexical orthographic norm. On the one hand, the raising is almost never written as such in the inflectional forms of this demonstrative (though it is not clear how all these forms were pronounced). On the other hand, it is always written in the adverbs derived from the demonstrative stem, even in those in which the following vowel is not a front vowel (e.g. isunt ‘in that same way’, in which <i> represents /e/; Dupraz 2012: 78–81, 84). This means that in this family of words a lexical orthographic rule has developed which does not reflect the original phonetic properties of the words in question.

There are only two exceptions to this lexical rule. The long version of the lustratio contains six examples of the dative-ablative plural of the stem *eks-.¹² In that case form, the stem is followed by the front vowel /e/ (ending /-er/ < *-oys or similar). Four of the six forms show the expected spelling esir (VII a 10, VII a 18, VII a 26, VII a 32); the other two, however, have isir (VII a 21, VII a 34). The four regular forms appear in two parallel long prayer texts (VII a 9–VII a 20; VII a 25–VII a 34): the same prayer formula is quoted both in VII a 10 and in VII a 26, and another formula is attested both in VII a 18 and in VII a 32. The context of the two irregular forms is significant: they appear in two short prayers following the first and the second long prayer respectively (VII a 21–VII a 23; VII a 34–VII a 36), and in both these short prayers the form isir appears in the very same formula. It is unlikely to be a coincidence that the spelling <is> for the demonstrative stem *eks- appears only in these two parallel complementary prayers and nowhere else in the Iguvine Tables. We may speculate, for example, that the short complementary prayers were added by a different scribe who did not know, or did not accept, the convention that all forms of the demonstrative *eks-, regardless of their (original) pronunciation, were written with <es>. Micro-level examples of this kind help us to reconstruct the complex history of how the prayers were copied and recopied before reaching the version we see today.

In many cases, of course, orthographic practices vary not within a single text but between different texts drafted and copied by different individuals, so we find examples where a particular spelling is used consistently within one section of the Tables but contrasts with the spelling in another section. It is probably significant, for instance, that the ending of the first person singular present active indicative of the *-ā-conjugation, which is attested both in the long version of the piaculum and in the long version of the lustratio, is written <-au> in the piaculum (15×) and <-auu> in the lustratio (8×). This ending may have been pronounced /aʷu/, with an intervocalic glide. The scribe responsible for engraving the long version of the lustratio seems to have tried to represent the glide, unlike his colleague; unfortunately, this ending is not attested in the other sections of the text.

Finally, language contact must always be taken into account when investigating the orthographic norms in a corpus like that of the Iguvine Tables. To return to an orthographic convention we have already discussed: the double spelling of long vowels is attested in Umbrian, as in Latin and Oscan, as a specific marker of length, but only in seven word forms (Dupraz 2016: 18–21):

frateer (‘brothers’, nominative plural masculine, V b 16);
eest (‘he will go’, future indicative third singular, VI a 2);
ooserclom- (‘watching tower (?)’, accusative singular neuter, VI a 12);
meersta (‘correct, righteous’, accusative singular feminine, VI a 17);
eesona (‘divine’, accusative plural feminine, VI a 18);
eetu (‘thou shalt go/he shall go’, imperative II, VI b 54);
feetu (‘he shall do’, imperative II, VII a 41).

These forms appear in three of the four texts written in the Latin alphabet: the general regulation V b 8–V b 18, the long version of the piaculum, and the long version of the lustratio. These four texts date from the end of the second century and the beginning of the first century BC. It has been suggested that the Umbrian double vowel graphemes are an effect of language contact, probably with Latin, which at that period was beginning to become the prestige language of the whole peninsula (Prosdocimi 1984: 154–60). This argument is open to question, because the extant documentation, however scanty, does not show the same pattern as Latin (and Oscan). Of the seven forms showing double vowels, six contain the digraph <ee>, and the remaining one has <oo>. The word frateer has its double vowel in a noninitial syllable. This is not what would be expected if the writers had adopted the Latin pattern of double vowels, in which <aa> in initial syllables predominates. The phoneme /aː/ is fairly common in Umbrian, including in word-initial syllables, yet it is never written <aa>.

While it may safely be assumed that the principle of doubling vowel graphemes arose in contact with Latin (and perhaps also with Oscan), it seems that double vowels were used in Umbrian in a particular context. In fact, this spelling functions as a variant of a local orthographic convention unconnected to Latin and Oscan double vowels: the digraph <ee> is an allograph of the digraph <ei>, which is used in the later Iguvine Tables to transcribe the phonemes /e/ and /eː/. Both the digraph <ei> and the much rarer double vowel spellings are mainly attested in forms in which the vowel (whether etymologically short or long) was originally followed by a consonant in the syllable coda, but where this consonant has undergone lenition and has been lost or weakened from a stop to a semivowel. This weakened consonant may still have been pronounced as a semivowel yod, at least in the more formal registers.¹³

Therefore, even in languages of fragmentary attestation, the written tradition of which is often considered the worst possible ‘bad data’, language contact should not be regarded as the entire explanation for all orthographic changes, since orthography always presupposes an effort, however unconscious, to transcribe the specific properties of the language being used. In the present case, several of the Iguvine priests seem to have devised a new orthographic norm, perhaps taking the cue from their knowledge of a rule attested in contemporary Latin and Oscan, but adapting it carefully to the needs of their own language.

14.5 Venetic Orthography and Punctuation: Becoming Roman

To finish, we would like briefly to highlight punctuation as part of orthography, and to emphasize the cultural meaning that can sometimes be bound up with orthography. In some languages, punctuation conventions are just as important as spelling conventions and can take on considerable significance for writers. A key example is Venetic, a language of northeastern Italy attested from around the sixth to the first century BC; it is probably related to the other Italic languages such as Latin, Oscan and Umbrian, but may form a separate branch of the Italic language family. The most notable feature of the Venetic orthographic system is not its spelling but its punctuation. In the Venetic system of punctuation, any syllable consisting of a consonant followed by a vowel (CV) is left unmarked. However, syllables that end in a consonant, syllables containing a diphthong, and syllables consisting of a vowel alone are marked by placing dots or short lines around the letter that prevents the syllable from conforming to the CV structure. For example: .e.go vhu.k.s.siia.i. vo.l.tiio.m.mnina.i. (Es 2, funerary inscription on stone, 475–350 BC, Pellegrini and Prosdocimi 1967: 54–56). This system was borrowed from an Etruscan punctuation system around the fifth century BC and then maintained until the first century BC, long outliving its use in Etruscan (Prosdocimi 1983: 79–84, Wachter 1986: 111–12, Bonfante and Bonfante 2002: 56). One of the most interesting phases of Venetic epigraphy is the period around the first century BC, when the Latin alphabet begins to be used but Venetic punctuation is still maintained in a few inscriptions (not always in accordance with the original rules). So, for example, an older urn reads, in the Venetic alphabet:

va.n.te.i. vho.u.go.n.tio.i. .e.go

‘For Vants Fougontios I (am)’

Es 79, Este (Pellegrini and Prosdocimi 1967: 197)

But one of the more recent examples, which uses the Latin alphabet, reads:

frema. .i.uantina. .ktulistoi uesces

‘Frema Iuantina for Ktulistos (as his) foster-child’

Es 104, Este (Pellegrini and Prosdocimi 1967: 222)

There is some recognition, even after the change to the Latin alphabet, that there ‘should’ be punctuation, and that sometimes it should be word-internal, but the clusters and diphthongs have not been marked as they would have been under the earlier rules. The initial <i-> has been marked, so there is perhaps some remaining understanding that initial vowels needed punctuation. Were there new orthographic rules in play that we have not yet understood? Or, as seems more likely, was the visual appearance of the orthography what mattered most to these writers? It is relevant that we see this orthographic archaism mainly in inscriptions on funerary urns, which were buried in family groups spanning multiple generations. Each time a new family member was cremated and buried, the grave would be opened and the older urns would be visible once again. This perhaps influenced the spelling of names on the new urns: the writers wanted their orthography to fit the visual language that tied the family urns together.

14.6 Conclusion

We selected the case studies of ancient Italian epigraphy in this chapter to highlight just a few of the methodological issues that arise when studying ancient orthography, and some of the recent approaches scholars have taken to understanding the orthography used by writers of these languages. As we have seen, although we are dealing with languages without a written ‘standard language’ (or, perhaps, languages in the very earliest stages of developing such a standard), we can still speak of spelling conventions and orthographic norms. These norms arose within communities or groups of writers, but they were flexible and mutable, and they changed faster over time than the norms of a truly ‘standardized’ written language. Nevertheless, it seems likely that writers of many of these languages could have articulated, in greater or lesser detail, some of the ‘rules’ they had been taught, if asked to do so.

When we investigate Latin orthography, we have both literary accounts and epigraphic evidence to guide us. These are frequently complementary, as literary authors typically set out their own opinions in contrast to the spellings used by the wider literate community. In more fragmentary languages, contact with Latin or Greek can offer a way into the evidence, but frequently we have to take the texts on their own merits. As we have seen from the Umbrian examples in particular, the key is sometimes to work at levels other than that of the entire corpus: we need to go text by text, lexeme by lexeme, or verb form by verb form, to uncover the orthographic conventions behind the apparently endless variation. Oscan, with its three alphabets, gives us a particularly striking case study of regionalism in orthographic practices within a single language. And in both our Oscan and Venetic case studies, we have seen how there can be an awareness of orthographic ‘rules’, which may even have cultural meaning attached to them, without these rules being applied identically by all writers. Overall, like many other scholars working today, we advocate an approach which prioritizes both textual detail and social context. In this way, we believe that it is possible to overcome the methodological issues of epigraphic texts and to yield new information about ancient orthography.

15 Materiality of Writing

15.1 Introduction

Historically, orthography has always depended on the materiality (physicality) of writing tools and surfaces. The perceived durability or ephemerality of a text has also been a significant factor in the development of orthographies: in certain cases, hopes for permanence, as opposed to transience, encouraged orthographic differences in written (printed) texts. In ancient Rome, for example, capital letters carved in stone were more durable than the minuscules used on wax tablets. After the invention of movable type printing in fifteenth-century Europe, manuscripts became more ephemeral than printed texts. Similarly, newspapers printed in Lithuania at the end of the nineteenth century were considered more perishable than printed books. The materiality of the printing milieu was one of the main reasons for the symmetry between majuscules and minuscules in orthographies based on Latin script, and for the absence of most of the abbreviations characteristic of medieval manuscripts. Because the space available in printers’ type cases was limited, the number of distinct types would also have been limited, which may have affected the development of certain orthographies and restricted the abundance of diacritics. The materiality of the sheet of paper itself may also have influenced the use of hyphens and abbreviations and, in certain cases, the choice of graphemes.

This chapter focuses on the materiality of orthography as approached from the perspectives of paleography, codicology and orthography. Materiality, as a physical condition for or cause of spelling change and variation, is treated as a physical reality that may either restrict or encourage particular orthographic practices. The chapter then discusses how the imagined durability or ephemerality of a text might influence the orthographic approach (minuscules vs. majuscules, double (parallel) orthographies, and manuscripts vs. prints). The symmetry of capital and noncapital graphemes in Latin script is approached as a result of printers’ material working conditions. The chapter also discusses how manuscript orthographies might have been modeled on the imagined type collections of local printing shops, and how the specific properties of writing materials might have shaped certain orthographic peculiarities.

This chapter deals primarily with the European orthographic tradition and, within it, with some features of Lithuanian orthography, as the history of Lithuanian orthography is my main area of expertise.

15.2 Paleography

15.2.1 Scope

In Europe, paleography was introduced in the seventeenth century by Jean Mabillon, who sought to date manuscripts on the basis of their handwriting. Mabillon worked with Latin manuscripts, and Latin remains very much a focus of paleographic research. Today, however, any old manuscript (e.g. Armenian, Egyptian, Chinese) falls within the scope of paleography. Paleographers read old manuscripts, and the perception of orthographic nuances has always been the key to deciphering ancient texts. By the middle of the twentieth century, codicology had branched off from paleography, developing into a separate science of manuscript books (codices). In a sense, paleography remained a ‘bare’ philological science, shorn of the additional manuscript information that was taken over by codicology. Orthography remains one of the foci of paleography, while codicology addresses other aspects of how manuscript books originated. Still, codicology may be aided by orthography, especially in dating manuscripts and in judging more accurately where manuscripts might have been created. Orthography is neither an object nor a tool of codicology, but codicologists may exploit the results of orthographic research.

Paleography and codicology are often paired, as the titles of many publications show: the edited volume Problemy paleografii i kodikologii v SSSR [Problems of Paleography and Codicology in the USSR] (Li͡ublinskai͡a 1974), The Makings of the Medieval Hebrew Book: Studies in Palaeography and Codicology (Beit-Arie 1993), ‘Paleography and codicology’ (Mathisen 2008), ‘Paleography and codicology: Bibliothèque Nationale de France, Arabe 328a’ (Powers 2009), ‘Palaeography, codicology and language’ (Rambaran-Olm 2014) and, of course, the Cambridge Studies in Palaeography and Codicology series. The older of the two terms is paleography (from Greek παλαιός ‘old’ and γράφειν ‘to write’). The Merriam-Webster dictionary gives it a second meaning, “an ancient or antiquated manner of writing,” which dates from 1749. Today, paleography is defined as a science whose focus is “the study of ancient or antiquated writings and inscriptions: the deciphering and interpretation of historical writing systems and manuscripts.” Encyclopaedia Britannica defines it simply as the “study of ancient and medieval handwriting.” Two salient attributes emerge from these definitions: age (old, ancient, medieval) and type of human activity (writing or handwriting). In short, paleography, the original meaning of which was old writing, today refers to the study of old writing.

The publication in 1681 of De re diplomatica (in Latin) by Mabillon, a French Benedictine monk (Mabillon 1681b, Boyle 1984: 12–13, Aris 1995: 417, Mathisen 2008: 140), is considered to mark the beginning of Latin paleography. Mabillon’s purpose was to “[establish] the age of Latin manuscripts based on their handwriting and other internal considerations” (Mathisen 2008: 141). Reading, understanding and conveying the meaning of old texts are the main goals of paleography. As Carroll (1976: 39) put it, “Paleography is the science – and one might very well say the art – of deciphering texts.” Paleographers also prepare old texts for publication for modern readers; as Koster (2009: 258) noted, “a tradition of critical scrutiny and editing […] has been one of the hallmarks of our discipline.” Modern paleography considers many aspects of this work: form of writing, orthography, languages and dialects. Paleography began as the analysis of Latin texts, and it is sometimes claimed that “paleography is [currently] regarded as relating to Greek and Latin scripts with their derivatives, thus, as a rule, excluding Egyptian, Hebrew, and Middle and Far Eastern scripts” (Encyclopaedia Britannica). In practice, however, paleographers work with scripts other than Latin as well, and surely the analysis of any old manuscript may be termed paleographic.
Consider, for instance, works by paleographers on Armenian, Egyptian, Aramaic and Chinese scripts: ‘Album of Armenian paleography’ (Clackson 2003, Russell 2006), ‘Egyptian paleography’ (Breasted 1910), ‘Ancient Egyptian epigraphy and paleography’ (Silverman 1979), ‘A calligraphic approach to Aramaic paleography’ (Daniels 1984), ‘Wu Dacheng’s paleography and artifact studies’ (Brown 2011). Modern paleography deals with any ancient or medieval writing.

Paleographers, among other things, study scripts. Although the same script can be used to write various languages (Latin, Greek, Cyrillic and Arabic scripts, for example, are each used for many different languages), an orthography usually characterizes the writing of a single language. In periods of prestandard orthography, we may analyze the orthographies of a particular region or tradition (e.g. Wycliffite or Egyptian), of a particular time period, of a particular person, or even of a particular phase in the creative life of a single individual, for example Ion Heliade Rădulescu in Romanian (Close 1974) or Simonas Daukantas in Lithuanian (Subačius 2020).

15.2.2 Orthography in the Context of Paleography

Certainly, orthography is very important for paleography. Not all aspects of writing that matter to paleographers are objects of orthographic study, however. The introduction of letter extensions (in today’s terminology, ascenders and descenders) for certain minuscules such as b, d, g, h, p, q and ſ in some early scripts, for instance in New Roman Cursive, in use since the third century AD (Marcos 2017: 16), and in many later medieval minuscules, is an object of paleography. This modification concerns only the visual form of the letters; the relationship of each letter to sounds, syllables and words, as well as to the other letters of the alphabet, is unchanged. In other words, the system of graphemic signs (graphemes) remains intact. This is not the case with the minuscules i and j, however. Initially there was only one letter, i, and j was merely an elongated variant of it, drawn with a descender in certain strong positions for easier reading. By the late medieval and early modern period, the two had developed into different graphemes, one representing a vowel and the other a consonant. Because the inventory of letters had been transformed, this development became an object not only of paleography but also of historical orthography. The border into orthography was crossed because the graphic system expanded.

Before rapid ways of writing were developed, a single alphabetic set of letters was sufficient, and many of the oldest writing systems did not distinguish majuscules from minuscules. The development of cursive and minuscule writing was of great importance both for paleography and for historical orthography. Writing more quickly generally led to changes in letter shapes. In this way, the separate rounded letters carved in stone in ancient times became somewhat smaller, linked shapes on papyrus, wax tablets, parchment and paper. The more important texts continued to be written in capital letters, however, as if minuscules had less significance because they could be produced more quickly (compare today’s use of capital letters for book titles and chapter headings and of minuscules in the body of texts). As systemic modifications of scripts, cursives and minuscules were significant additions. Individual orthographies inherited this duality of majuscules and minuscules, and it became an important characteristic of many of them. Not only were the shapes restyled, but the very system of graphemes changed. Today, many alphabets include both capitals and minuscules, commonly known in English as uppercase and lowercase letters. These designations derive from traditional printers’ shops, where the case containing the capital letters was situated above the case containing their minuscule counterparts. Knowledge of orthographic rules helps paleographers make sense of specific signs and decipher old texts. Orthographic development, on the other hand, is an object of investigation in its own right: historical orthography constitutes a separate scholarly discipline.

Orthography is a system of graphemes (as opposed to letter shapes, known as glyphs) and other graphic signs (e.g. obligatory or optional diacritics), together with their relations within a word and their combination into digraphs, trigraphs and so on. It is a system of ideas that governs our recognition and choice of graphemes, expressed through the shapes of particular letters (see also Baker 1997: 93 and Sebba 2007: 10). Imagine language as an invisible being with an audible voice, and think of writing as a visible garment worn by that being. In this sense, language and writing are like Mr. Griffin in Wells’s well-known novel The Invisible Man (1897): unseen yet clothed and speaking. Only the garment is visible; we cannot see language per se. Now imagine orthography as the pattern used to make that garment. That pattern represents the underlying principle of what we ultimately see as concrete signs (letters and so on). In prestandard periods, however, the inventory of graphemes and other graphic signs, and the rules governing them, were often unstable and characterized by abundant variation. Both the number of graphemes and the regulations governing them could fluctuate: consider, for example, the variety of letters with diacritics in Early Modern Latin, French, Polish and Lithuanian orthographies (Strockis 2007, Šinkūnas 2010, Baddeley 2012, Bunčić 2012). Paleographic research is bounded in time by the appearance of printed texts; the study of printed texts falls to historians of printing.

15.3 Codicology

15.3.1 Scope

Codicology (from Latin codex ‘notebook, book’ and Greek λόγος ‘word’) is a much younger term than paleography. According to Merriam-Webster, which defines it as “the study of manuscripts as cultural artefacts for historical purposes,” codicology dates back only to 1953 in English. The term was not coined in English, however; the French term codicologie appeared first. The first to use it systematically was Alphonse Dain in his 1949 book Les manuscrits (Dain 1949, Gruijs 1972: 92). Charles Samaran had earlier attempted, in 1934–35, to name the new science codigraphie to distinguish it from bibliographie, but the term did not catch on (Gruijs 1972: 94). According to Delaissé, “the term codicologie or codicology usually signifies ‘a wider knowledge of the mediaeval book’” (Delaissé, after Gruijs 1972: 101).¹ Ostrowski spells out its scope in more detail:

codicology (codicologie, Handschriftenkunde, manuscript study) is concerned with such matters as inscriptions, format, number of lines to a page, dimensions of the text on the page, size and type of paper, watermarks, binding, clamps, foliation, type of ink, damp and grease stains, pin-prickings, arrangement of gatherings, decoration, and so on. All of these help to provide information about the origin and history of the codex, and about the society in which it was produced.

(Ostrowski 1977: 264)

The term codicology is used in both a narrow and a broad sense. According to Albert Gruijs, “[c]odicology – in the strictest sense – is archaeology. For, like the archaeologist, the codicologist examines the codex first and foremost as an object from the past” (Gruijs 1972: 90). Moreover, “codicology comprises the investigation of all physical aspects of codices, together with the indispensable interpretation of the results which such a synthesis has to provide for subsequent historical research” (Gruijs 1972: 102). In the broadest sense of the term, by contrast, codicology “virtually coincides with the modern conception of manuscript studies as a multidimensional approach to the codex as object-in-itself, and as cultural phenomenon” (Gruijs 1972: 102).

An example of codicology in the wider sense […] seems to me to be the book by Jean Destrez, La pecia dans les manuscrits universitaires du XIIIe et du XIVe siècle […] In this work the author is in fact dealing with a social institution: the organisation and techniques employed in reduplicating university texts rapidly by lending separate quires, or pecia, to student copyists so that a whole book could be copied in the time it took one student to copy one quire.

(Gruijs 1972: 103)

Gruijs also elaborates on what he considers to be codicology in a strict sense:

a. a highly detailed description of the physical aspects of the object […]; b. a synthesis based on this description which outlines the material evolution of the codex; […] c. confrontation of this evolution with the actual contents of the item in question […]. The whole gives a picture of the static and dynamic structure of the manuscript.

(Gruijs 1972: 104)

Gruijs considers “everything not comprised by these principles to belong to codicology in the wider sense” (Gruijs 1972: 104). In Ostrowski’s words, “[o]ne can distinguish between codicology in the broad sense, that is, the study of a manuscript in its social and historical context, and codicology in the narrow sense, that is, merely a description of the manuscript” (Ostrowski 1977: 264). Thus, in the narrower sense of the term, codicology is focused on the physical manuscript per se; in the broader sense, it takes in the whole cultural and environmental context in which that manuscript was created and preserved. Codicology deals with “physical and paratextual features” (Smith 2014: 37). The materiality of an object is central to codicology in its narrower sense. As van Beek put it, traditionally “we are supposed to know that matter doesn’t matter” (van Beek 1996: 15). His own approach to the materiality of codicological objects, however, is different:

It is ultimately a gut feeling that leads me to accept that material culture significantly contributes to our construction and perception of the world and that presumably this must have to do with the most superficial aspect (appearance) of objects: the materiality, tangibility of things. That is to say that material objects contribute something special that is different from the textual attributes of cultural meaning.

(van Beek 1996: 10)

Historically, paleography encompassed aspects that today are considered part of codicology. Features that earlier paleographers had regarded as nonessential, and had often neglected or treated only superficially, became the major research object for codicologists: the materiality of the object itself came to the forefront. Codicology branched out of paleography, thereby narrowing the object of paleography. In earlier paleographic works, alongside the reading and analysis of old texts, the preface or footnotes provided details about the writing materials used, the conditions in which manuscripts were compiled, their previous and current owners, the historical circumstances of their assembly and the institutions involved in their preservation. Since the mid-twentieth century, however, all these details have been an object of codicology, in both the narrow and the broad sense.

In general terms, codicology is what paleography was before the mid-twentieth century, minus the study of texts: it is what remained of early paleography once paleography’s object had narrowed to scripting, writing, reading and orthography. As Smith put it, paleography is the “philological counterpart” of codicology (Smith 2014: 35). Gruijs (1972: 90) has compared codicology to archaeology, describing “old books” as “archaeological ‘finds’” which must be “subjected to different types of interpretation: material, historical, ethnological and artistic” (Lieftinck 1958–59, after Gruijs 1972: 89–90). Codicology, which is “sometimes called in English ‘the archeology of the codex’”, is known in Russian as “archeografija” (Ostrowski 1977); in the former Soviet Union, the term codicology (kodikologija) did not appear in the title of a book until 1974 (Ostrowski 1977: 264). The term codicology “is increasingly being adopted to distinguish the study of old writing (paleography) from the study of the codex or manuscript in which the writing is found” (Ostrowski 1977: 264). At the same time, codicology itself has been growing in importance: “over the past decades, interest has shifted from the contents of manuscripts to the ‘complex network of historical circumstances and processes’ […] that must come together to produce these objects” (Echard 2013: 298; the embedded quotation comes from Pearsall 2011: xvi).

15.3.2 Orthography in the Context of Codicology

Both paleographers and codicologists analyze very old (ancient, medieval) writings and their physicality in a cultural milieu. While paleographers work with orthography as well as other features of writing (scripting), codicologists engage with orthography less directly. Still, the dating and localization of a codex are an essential part of codicological research, and the orthography of that codex may help draw more precise conclusions. For instance, the Greek Codex Alexandrinus was believed to exhibit Egyptian Greek forms, but after a quantitative analysis of certain orthographic patterns, Smith was able to assert that its “orthographic variations have very little in common with the variants found in Egyptian Koine” (Smith 2014: 243). Thus, he was able to disprove the Egyptian provenance of Alexandrinus through orthographic research.

An analogy might be drawn with the so-called Qumran Hebrew orthography found in the Dead Sea Scrolls. Tov (1986) had claimed that there was a separate ‘Qumran orthography’ characterized by full (plene) rather than short (defective) spellings. However, after examining the orthography of other scrolls discovered around the same time, Kim challenged the stability of that orthography: “The Qumran sectarian works show a wide spectrum in their orthography, which refuses to be described uniformly. This clearly shows that the orthography of the Qumran literature was unstable” (Kim 2004: 81). For this reason, “the orthography can no longer tell where the text came from” (Kim 2004: 81). Once again, orthographic features were used to shed light on the proposed origin of a manuscript, although here we are dealing with scrolls rather than codices. It is not that orthography is a direct object of codicology; rather, orthographic research can contribute to codicological knowledge, helping to improve our understanding of when and where a manuscript was compiled. Without orthographic input, a codicologist may miss important points in the history of a codex.

Let us briefly take stock. Orthography helps paleographers to understand texts. It may help codicologists to characterize the provenance of a codex. Orthographic patterns and variations that allow the original date and locality of a codex to be established more precisely might be the line at which orthography meets codicology: orthographic data allow codicological inferences to be drawn about time and place.

15.4 Orthography and Materiality

In contrast to linguistic, ideological and social reasons for orthographic change, such as political intentions, regional traditions, dialectal variation and the marking of group membership, materiality might be considered a physical condition of, or reason for, spelling change and variation. Materiality is important for orthography when the choice of graphemes and other graphic elements, and the rules governing them, depends neither on phonetic nor on cultural (social, political) principles, but on the constraints or encouragement of certain physical realities. The physical writing environment might become a factor not only in the style of a letter shape, but also in that letter’s orthographic use.

Miller has argued that “a key question is to determine when, where and for whom the material attributes of things matter. Materiality like nature is mainly a potential presence. The solidity of the branch comes to matter mainly when you hit your head on it” (Miller 1996: 27). So, to extend the metaphor, on what occasions might orthography hit its head on the solidity of a branch? What kind of influence might the tangible materiality of writing and printing exert on orthography? What impact might papyrus, parchment, paper, scroll, book, clay or wax tablet, pen or stylus, movable type, typewriter and computer monitor have on such an abstract aspect of writing as orthography, with its sets of rules, norms, combinations of graphemes, morphemes, punctuation, abbreviation and so on? How might the surface and space of a clay or wax tablet, or of a sheet of parchment or paper, affect orthographic particularities? It is common knowledge that the cuneiform writing system was devised for a stick or reed to be pressed into wet clay (tablets, rolls), whereas other writing tools were more appropriate for papyrus, parchment and paper, whose surfaces better accommodated the strokes of quill, pen or brush. In ancient Greece and Rome, a chisel was used to carve words in stone in majuscules (capitals), but the wax tablet and stylus prompted the development of different shapes, what we today call minuscule letters. After the invention of movable type, of course, text production changed dramatically: individual letter strokes in manuscripts were replaced by ready-made mirrored letter-blocks, known as types, which were set one by one in a composing stick, then inked and pressed onto paper.

Even some luxuriously prepared manuscript volumes have a very unsettled orthography, showing signs of aesthetic play with variants, as in the case of Simonas Daukantas’s History of the Lithuanian Lowlands (1831–34). In this book, conveying the solidity of Lithuania’s past through the striking material form of the volume was prioritized over the orthographic needs of its few potential readers. Individual attempts to develop orthographies have been shaped as much by an almost unrestrained imagination that challenged the printing houses (e.g. Daukantas) as by the constraints imposed by the printers available (e.g. Martynas Mažvydas, 1510–63, and Jurgis Ambrozijus Pabrėža, 1771–1849).

15.4.1 Durability and Ephemerality

The materiality of texts also means that they can be mutilated or destroyed. Despite the maxim verba volant, scripta manent, traditionally attributed to Caius Titus, van Peer has argued that “[o]wing to their very materiality, written texts are prone to erosion and damage”; in fact, written language is so vulnerable to material destruction that it “needs the creation and operation of special institutions in order to be preserved” (van Peer 1997: 35). Van Peer discusses signs that are mutilated or erased for magical, political or economic reasons, that is, any material destruction of texts or parts of them. He even regards editing and proofreading one’s own text as a form of mutilation. Although typically described as a cleaning process (as in “[t]his allowed us to do a more thorough job of editing and cleaning of typographical errors,” Bernard 1980: 134), even typographical correction involves the alteration, and therefore the mutilation, of an earlier text. The concept of mutilation draws our attention to the importance of any text that may be produced, even those labeled flawed. Mutilation presupposes the significance of the text in its premutilated form, and of any chain of alterations made to it before it reaches us in its so-called final form. In short, it presupposes that all surviving texts are worthy of study.

In practice, however, people often ‘measure’ texts according to their chances of survival: an ‘erroneous’ text is likely to be destroyed more readily than its ‘corrected’ counterpart. In earlier centuries, printers tended to destroy manuscripts once the book versions of them had been printed. Proofs were not preserved; it is rare to find a sheet with marked proofs from the sixteenth century (see the Newberry Library text with the 1570 proofs in Novickas 2004: 32). Even today we tend to perceive texts as more or less worthy of preservation. Imagine a continuum of texts lined up according to their presumed intended perishability: at one end would be placed the most preservable and durable writings, and at the other the most ephemeral, those almost unworthy of a place under the sun. I call this dichotomy durability versus ephemerality. Manuscripts in general are recognized as more easily lost or destroyed than printed material. Consider Crain’s analysis of two nineteenth-century children’s books, Jack and the Bean-stalk (1848) and The Harper Establishment (Abbott 1855), the latter of which describes how storybooks are made. Crain noted that

[H]ere manuscripts are represented as an ephemeral and disposable stage of book production. Manuscripts, in their scruffy, fragile uniqueness, appear not as, for example, auratic signs of authorship but as “obscure and seemingly useless” rolls. Jack’s story and The Harper Establishment promote and valorize both the “thingness” […] of the codex, its seemingly durable materiality, and the collaborative labor (and laborers) invested in it.

(Crain 2013: 159)

The codex in this context is a printed book, as opposed to a manuscript roll. Books are regarded as durable, whereas manuscripts are ephemeral and disposable. This dichotomy may be discernible in other spheres as well. Historically, as noted above, minuscules were developed primarily to increase the speed of writing, and texts written in them may have been perceived as more perishable than those written in majuscules, because minuscules were used on more perishable surfaces (e.g. wax tablets) and in less formal contexts. For example,

[I]n the fifth century, during the lifetime of St. Mesrop Mashtots, who invented an alphabetic, fully phonetic script for Armenian, and certainly soon after him, there existed both the rounded and square forms of erkat’agir [majuscule]. These were suitable for more or less formal purposes with different media (stone, parchment, papyrus), and there was a nascent bolorgir [minuscule and other cursive book hands] as well, without any “transitional” stage.

(Russell 2006: 278, Stone et al. 2002)

Similarly, in ancient Egypt, what was known as “the book-hand” became “the hand used for religious books, while the rapid cursive [was] employed only for business and other secular affairs” (Breasted 1910: 136). Both minuscule letters and rapid cursive would appear to be associated with the more disposable, perishable end of the survivability continuum. The production of codices in late medieval England can tentatively be divided into two further categories according to the level of commercialization: “an amount of porosity between ‘non-commercial’ and ‘commercial’ is an important way of considering the degree of necessary co-existence of the two paradigms” (Pouzet 2011: 238), “and indeed between professional and nonprofessional” (Echard 2013: 300) paradigms. The distinction between commercial and professional manuscripts on the one hand and noncommercial and nonprofessional ones on the other could suggest an awareness of the latter’s comparatively high perishability. Commercial and professional production correlates with durability, whereas noncommercial and nonprofessional production is associated with the more ephemeral nature of such texts.

Accordingly, how a scribe (or society) perceives the durability and ephemerality of a text may depend on the material used (i.e. on materiality) and on the purpose of the text (e.g. sacred or quotidian). The ultimate goal of a text may be discerned from the effort that the scribe put into its production. The commerciality and professionalism of a text also depend on the extent to which it is distributed and on the scribe’s ability to achieve and maintain a high standard of work. Thus, commerciality determines distribution, professionalism reflects ability, and durability indicates the degree to which a manuscript is intended to survive. That intention (i.e. the long- or short-term survival of the manuscript) is the primary concern of the scribe, followed by production (professional or otherwise) and then distribution (commercial or otherwise). Greater effort was put into the preparation and completion of texts that were meant to survive longer, and these were generally more accurate, tidier and better organized. In other words, the more durable the text, the stricter the orthographic norms: texts intended to be preserved in perpetuity were expected to have a more careful orthography. Carelessness can be a sign of the greater ephemerality of a piece of writing, while meticulousness may signal the expectation of longer survival (Subačius 2021: 35, 38–39, 196, 539, 595, 625). The perceived materiality of a text, its chances of survival and its anticipated place in future society may have significantly influenced orthographic choices.

15.4.2 Printing

Manuscripts produced in the centuries that followed Antiquity and the Middle Ages fall outside the scope of either paleography or codicology. The print era nevertheless holds great interest for historians of orthography, as orthographic systems continued to develop. Not only do orthographies in manuscripts and prints sometimes differ, but some languages went through centuries of evolution before developing strong orthographic standards. Consider the European standard languages with early dialect selection, such as Danish, Dutch, English, French, German, Hungarian, Polish, Spanish and Swedish, all of which chose a dialectal norm as the basis for their standard during the early modern period, especially during the Renaissance (Subačius 2002). The foundations of standard orthographies were laid then, but it took several hundred years, most often until the eighteenth and nineteenth centuries, for uniform orthographic standards to develop. In the standard languages with late dialect selection, such as Albanian, Croatian, Estonian, Finnish, Latvian, Lithuanian, Slovak, Slovenian and Ukrainian, orthographic variation must have been even greater, as their orthographies were typically not standardized until the end of the nineteenth or the beginning of the twentieth century. Indeed, some European orthographies, such as Galician, Macedonian, Rusyn, Valencian, Võru and Yiddish, have continued to evolve since then, through the late twentieth century up to the present day. Some variation persists even in highly standardized orthographies, such as German (daß and dass), English (minuscule and miniscule) and Portuguese (carácter and caráter) (Marquilhas 2015: 284).

The changing materiality of text production during the print era also stimulated certain orthographic modifications. Initially, as is commonly known, printers aimed to follow closely the traditions associated with manuscript production, but with the passing of time they grew bold enough to introduce a variety of changes. Most contemporary readers simply do not notice standardized orthography, in which graphemes appear to be wearing uniforms. Readers can predict which letters will appear within words and are never surprised. In short, standard orthography is, most of the time, for most people, uninteresting, even boring. However, if its uniformity is disturbed even slightly, one notices the difference at once. For those who wish to read at speed, a disturbed orthography is an unwelcome distraction. We read more quickly and absorb the information a text contains more efficiently when the orthography is unmarked. We do not want to be distracted by orthographic discrepancies that do not comply with our expectations and hinder our progress through the text. Over many years of training, we develop the skills required to read machine-made (printed) texts quickly; orthographic uniformity underpins this proficiency and contributes greatly to our ability to compete with the many other skilled readers in present-day society. The uniformity of standard orthography is an asset that enables thoughts and ideas to be shared and exchanged much more quickly than before.

When we read historical texts, however, we often encounter prestandard orthographies characterized by diversity and variation. In the early phases of orthographic development during the print era, the need for uniformity was first recognized not by readers but by the typesetters in printing houses (see Voeste 2012: 176). It was they who introduced ‘order’ into orthography, and readers reaped the fruits of their labor. As in today’s consumer society, a producer created a product which in turn sparked demand for it. During the period of orthographic diversity, a more uniform orthography promised greater efficiency and profitability to the printers and typesetters of books and newspapers. As the upper and lower cases containing the type grew more standardized, typesetters became more skilled and were able to set words and entire texts more quickly. The faster the production, the more profit the printer earned (see Voeste 2012: 174–76). The greater the profit, the more printers were able to invest in their businesses, enabling them to produce more texts, and the more texts readers were exposed to, the more their reading skills improved. Thus, consumers benefited from the standardization of orthography achieved through printing.

Materiality had very different implications for printers than for manuscript writers. That difference could, in some cases, influence orthography, as printers altered orthography to ensure optimal speed of production. When they began to produce their own texts, printers found that both majuscules and minuscules were used in manuscripts, and they adopted both. Today, the alphabets of many languages feature this dual system of letters; consider, for example, the Greek, Latin and Cyrillic scripts and the many alphabets based on them. Although some scripts, such as Arabic, Georgian, Hebrew and Devanagari (used for Hindi and Sanskrit), are unicameral, meaning they have only one set of letters, the bicameral scripts inherited their duality from manuscripts. Minuscule letters could be written more quickly with a quill than their majuscule equivalents. For a typesetter, however, setting text in majuscules or minuscules took a comparable amount of time. Had printing been introduced in early Antiquity, we can speculate that printers would have created only one set of alphabetic letters (a unicameral alphabet), since working with one case of letters rather than two would have increased the speed of production. But the tradition of two alphabetic sets was well rooted in manuscripts, so it was only natural that printers from the fifteenth century on would adopt it.

In a manuscript, one might write any sign with no extra time or effort. Graphemic representation had no limits other than those set by the imagination, and any combination, diacritic mark, abbreviation and so on was possible. By contrast, a typesetter is constrained by the limited inventory of type in his cases, and the larger the inventory, the more intricate the process of setting. In manuscripts in Latin script, the two alphabetic sets of letters were not symmetrical, and early printers took over this asymmetry. Over time, however, printers made the two sets symmetrical. Consider the disappearance of the long s (ſ) from orthographies based on Latin script. Until the end of the eighteenth century, texts in various languages contained two lowercase letters, ſ and s, and only one uppercase S for a single phoneme /s/ (ſ was used in word-initial and medial positions, s in word-final position). At the end of the eighteenth century, printers in various European countries stopped including the asymmetrical long ſ in the fonts they were producing (Subačius 2004c: 239–43). With the long ſ missing from new fonts, printers were unable to use it even if they wanted to. In his old age, the American politician, scientist and former printer Benjamin Franklin deplored the loss of the long ſ. Were it not for the printers, Latin-based alphabets might still have asymmetric features like the one that persists in the Greek alphabet, in which the capital sigma Σ has two corresponding lowercase forms, σ and ς.
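The positional rule for long s summarized above is mechanical enough to be stated as a procedure. The following Python sketch is offered only as an illustration under a deliberately simplified assumption: it applies exactly the rule as given here (ſ wherever a lowercase s is followed by another letter, round s otherwise) and ignores the further contextual conventions that varied between languages and printers. The function name `apply_long_s` is hypothetical.

```python
def apply_long_s(text: str) -> str:
    """Respell lowercase s as long s (ſ) under a simplified positional rule.

    Assumption (simplified, for illustration only): ſ is used wherever a
    lowercase s is followed by another letter (word-initial and medial
    positions); round s is kept at the end of a word, i.e. before a space,
    punctuation or the end of the text. Uppercase S is left unchanged,
    since there was only one capital form.
    """
    out = []
    for i, ch in enumerate(text):
        nxt = text[i + 1] if i + 1 < len(text) else ""
        if ch == "s" and nxt.isalpha():
            out.append("ſ")  # initial or medial position
        else:
            out.append(ch)   # final s, capital S, or any other character
    return "".join(out)


if __name__ == "__main__":
    for sample in ["success", "mistress is", "Sense", "sense."]:
        print(sample, "->", apply_long_s(sample))
```

Running the block prints, for example, `success -> ſucceſs` and `Sense -> Senſe`: the lowercase letter splits into two forms by position while the capital remains single, which is precisely the asymmetry that printers eventually removed by abandoning ſ. Unicode still encodes this asymmetry; in Python, `"ſ".upper()` returns `"S"`.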

Consider another asymmetry, almost omnipresent in Latin-script alphabets in early modern times: a single capital V corresponded to lowercase v and u, and a single capital I to lowercase i and j. Printers removed these orthographic particularities by filling ‘the gaps’ in the upper case, adding the symmetrical capitals U and J. Thus, the materiality of the printers’ milieu encouraged them to make the upper and lower cases more symmetrical, helping typesetters to apply the same skills to both cases. Consequently, the physicality of the printers’ environment influenced the Latin script and the orthographies based on it. In sum, for orthographies based on Latin script, the ‘order’ achieved through standardization and symmetry between the two sets was primarily the work of printers.

Newly developing orthographies were often limited by the inventories of printers whose type had been assembled for the established orthographies of other languages. Consider the case of Jan Weinreich, a sixteenth-century East Prussian printer who printed Latin, German and Polish texts in Karaliaučius (Regiomons, Królewiec, Königsberg; today Kaliningrad). The author of the first printed book in Lithuanian, Martynas Mažvydas, took his text to Weinreich’s shop to be published in 1547. There it was printed with the two Polish letters ą and ę, which carry the diacritic ogonek ˛ to signify nasality. Polish orthography marked nasality with only these two letters, whereas Lithuanian then had four nasal vowels. In manuscripts of that period, other Lithuanian authors marked the nasality of all four vowels (i and u in addition to a and e) with a dot under the letter (Gelumbeckaitė 2008).

For a long time, the inventories available to printers meant that only the nasal vowels written <ą> and <ę> were marked, and the dot under the letter found in manuscripts was never printed. Printers were ready to use the letters they had in their Polish type sets, but any request for a new grapheme would have been a real nuisance. Not only was it expensive to cut punches, make matrices and cast the type in lead, but printers also had to think about where the new letters would be placed within their type cases, and changing the location of type undermined their skills and their ability to work efficiently. Today, Lithuanian orthography includes four nasal graphemes, <ą>, <ę>, <į> and <ų>, reflecting the fact that the ogonek was chosen by the early printers and not by Lithuanian authors, who initially preferred dots under the letters; <į> and <ų> were subsequently modeled after <ą> and <ę>. The materiality of the printing environment (i.e. the inventories of printers) was thus a decisive factor in the choice of certain Lithuanian graphemes. One might label this a Polish orthographic influence, ensured by the printers’ inventories.

15.4.3 Double Orthography: Prints and Manuscripts

As printing developed, an important duality between divergent orthographies came to be recognized. Regarding eighteenth-century English orthography, Noel Osselton wrote:

Lord Chesterfield speaks (in The World, 1754) of “two very different orthographies, the pedantic, and the polite” which were current at the time and it seems to me that the history of English spelling in that period must take account of this double standard which both existed and was recognized to exist. Dr. Johnson’s letters are full of spellings he would never have countenanced in his Dictionary.

(Osselton 1963: 274)

Orthographic duality developed as printed texts diverged from manuscript writing traditions. The difference between the two also stemmed from the formal use of orthography in printed texts as opposed to its informal use in manuscripts. According to Tieken-Boon van Ostade, there were “two spelling systems currently in use, a public spelling system and a private one” (Tieken-Boon van Ostade 1998: 457). Moreover,

[t]he existence of a dual spelling system was recognized as such in the early eighteenth century, as the following quotation from The Spectator (1711) shows: […] “he told us […] that he never liked Pedantry in Spelling, and that he spelt like a Gentleman and not like a Scholar.”

(Tieken-Boon van Ostade 1998: 464)

In contemporary terms, of course, pedantic and scholarly spellings were those used by printers. On the continuum from durability to ephemerality, the pedantic and scholarly way of spelling and the polite and gentlemanly way would have been located at opposite ends, the former being considered more durable than the latter. The different material reality of printers modified not only the ways texts were produced and distributed, but also orthography itself. As Rutkowska has noted in her review of Orthographies in Early Modern Europe (Baddeley and Voeste 2012b), a similar duality of orthography has been identified in many other European languages:

The evidence for the significance of printing and printers in the process of standardisation can be seen in that printed books reached a relatively high level of spelling regularisation much earlier than handwritten documents which preserved idiosyncrasies for decades or even centuries longer. In fact, in several languages, two separate systems have been identified, one in printed books and the other in manuscripts, for example in English (Nevalainen, pp. 141–46), Polish (Bunčić, pp. 224–25), and Czech (Berger, pp. 264–65).

(Rutkowska 2015b: 299–300)

Double orthographies could also exist within print itself. Around the turn of the twentieth century (c. 1899–1904), a double orthography developed in the printed materials produced by the Lithuanian diaspora in America. Texts judged worthy of longevity were printed in books, while those presumed to have a shorter life expectancy appeared in newspapers. Book orthography used the ‘new’ graphemes <č> and <š> for the phonemes /tʃ/ and /ʃ/, whereas newspaper orthography used the ‘old’ digraphs <cz> and <sz> for the same phonemes. It was not uncommon to find both in the same newspaper, often on the same page: some columns were printed in the older newspaper orthography and others in book orthography. The latter columns, which were wider and ended with the phrase ‘to be continued’, kept the serialized texts in standing type until the entire text was completed; the texts were then reprinted as a separate volume. Book orthography was used in these columns in anticipation of their eventual transfer to book form (Subačius 2004b).

A comparable orthographic duplicity existed in the Lithuanian newspapers of East Prussia (1890–93). One orthography was used in newspapers aimed at more educated readers, which were considered more durable, and another in those aimed at less educated readers, such as newspapers about farming, which were considered more perishable (Venckienė 2004). In both cases, one Lithuanian orthography was applied consistently in some texts and another in others, as if both orthographies were already partly standardized and their standards observed. Duplicity of orthography in printing thus depended less on the materiality of the means of production than on the presumed survivability of the printed matter.

Prior to standardization, creative individuals may have invented new orthographies, whether or not they intended to compete with tradition. Consider the Lithuanian author Jurgis Ambrozijus Pabrėža, who compiled multiple manuscript volumes but never printed a line of them. His orthography, with the doubled vowel graphemes <aa>, <ee>, <oo>, <ii> and <yy>, stands out as the only attempt in the entire history of Lithuanian writing to introduce doubled vowel letters (Subačius 2011b: 448). Pabrėža’s orthography, however, was designed to contain only letters present in Polish orthography, because the type in printing houses was adjusted to print Polish texts. Thus, knowledge of the material contents of printing houses influenced Pabrėža’s unique orthographic system.

An equally creative figure was Simonas Daukantas, whose orthography, especially in his manuscripts, was probably the most diverse in the known history of written Lithuanian. In his voluminous manuscript History of the Lithuanian Lowlands (1831–34), for instance, Daukantas employed at least seven different spellings, <ei>, <ęi>, <yi>, <ij>, <ie>, <iei> and <iey>, to render a single diphthong /əi/ of his Lowland Lithuanian dialect, even though he could have picked a single uniform digraph had he followed a generalizing phonetic rule. The impressive size of the manuscript (553 leaves = 1,106 pages in folio) itself conveyed the message that the history of Lithuania, written in the Lithuanian language, was weighty, that it had a concrete, material shape, and that it was solid and undeniable. To Daukantas, impressing readers of both Polish and Lithuanian with the sheer visuality and materiality of the volume mattered more than generalizing orthographic rules. Indeed, the emphasis on the materiality of the volume enabled Daukantas to take a relaxed approach to orthography. In approximately one half of his manuscript, for instance, he experimented with the aesthetics of spelling variants: initial graphemes with long ascenders and descenders, such as <l> and <t>, were correlated with the long variants of the digraphs, <yi> and <ij>, in which <y> and <j> have descenders (Subačius 2020; regarding aesthetic variation in orthography, see Voeste 2007b: 303). The striking materiality of the manuscript both encouraged and enabled the laxity of this newly developed individual orthography. Because orthography mattered less than the materiality of the work, it became a playground for aesthetic individualism.

15.4.4 Text

The materiality of the laid-out text, as it extends across a specific material surface (e.g. a page of paper), can also affect orthographic decisions and variation. Traditional ‘on-page’ textual materiality, or direct materiality, may influence the use of abbreviations. The need to save precious writing material fostered the development of a wide range of abbreviations in the manuscript culture of Latin and other languages in the Middle Ages. The best-known study of Latin and Italian abbreviations was completed more than a century ago by Cappelli (1912). Abbreviations and ligatures conserved the expensive parchment used for handwritten texts and saved the scribe time. Because writing material was scarce, the scribe was expected to ‘save’ text, that is, to use fewer graphemes to render the same amount of text. The more expensive the writing material, the more abbreviations one was encouraged to use.

Early printers imitated the traditional practices of scribes and used the abbreviations developed for manuscripts. This changed as printing developed. Paper was much more readily available in the print era, and setting entire words in print made texts far more legible and accessible. Printing changed the attitude of the public toward abbreviations and led to most of them being abandoned. The dependence of orthography on the materiality of the page is also evident in the practice of breaking words at the end of a line and completing them on the next. In earlier centuries, before the modern hyphen <-> became widely accepted, various approaches were used, ranging from a slash </> or a double hyphen <=> to no sign at all. Then as now, the use of this mark depends on the position of the word at the end of the line, that is, on the direct materiality of the page; position at the end of the line alone governs the rule, not any other orthographic principle. One further kind of direct materiality influenced the development of orthographic features: the choice of different graphemes in different positions on the page. For instance, in most places in his manuscripts Pabrėža wrote the digraph <sz> for the phoneme /ʃ/, but sometimes at the end of a line, and only in the final word of a line, he used the ‘old’ ligature <ß> instead. Thus, the end-of-line position influenced graphemic variation. In such cases, the wider the sheet of paper, the fewer <ß> ligatures would appear (Subačius 1996: 18–19).
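The claim that a wider sheet yields fewer <ß> ligatures is essentially arithmetic: wider lines hold more words, so there are fewer lines and therefore fewer line-final words. The Python sketch below makes that step explicit under simplifying assumptions of my own: line width is counted in characters, lines are filled greedily, and every line-final word takes <ß> (Pabrėža did so only sometimes). The function names and the sample words, given in a Polish-style Lithuanian spelling, are invented for illustration.

```python
def lay_out(words: list[str], width: int) -> list[list[str]]:
    """Fill lines greedily: start a new line when the next word would exceed `width` characters."""
    lines: list[list[str]] = []
    current: list[str] = []
    length = 0
    for word in words:
        needed = len(word) if not current else length + 1 + len(word)  # +1 for the space
        if current and needed > width:
            lines.append(current)
            current, length = [word], len(word)
        else:
            current.append(word)
            length = needed
    if current:
        lines.append(current)
    return lines


def mark_line_ends(lines: list[list[str]]) -> list[list[str]]:
    """Assumed rule: only the last word of each line spells /ʃ/ as <ß> instead of <sz>."""
    return [line[:-1] + [line[-1].replace("sz", "ß")] for line in lines]


def ligature_count(words: list[str], width: int) -> int:
    """Number of <ß> produced when `words` are set in lines of the given width."""
    return sum(w.count("ß") for line in mark_line_ends(lay_out(words, width)) for w in line)


# Invented sample in which every word contains exactly one <sz>.
sample = ["szis", "szirdis", "szventas"] * 6
for width in (20, 40, 80):
    print(f"line width {width:>2}: {ligature_count(sample, width)} ligatures")
```

Because every word in this sample contains <sz>, the ligature count equals the number of lines, which cannot rise as the width grows; with mixed text the effect would hold only on average.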

15.5 Conclusion

The materiality of texts leaves them open to mutilation, destruction or preservation. People ‘measure’ texts according to their chances of survival. An ‘erroneous’ text (e.g. an earlier draft) is likely to be destroyed sooner than its ‘corrected’ counterpart. In certain cases, changes in orthography may have been influenced by an awareness of a text’s worthiness for preservation and by predictions of the length of its material life (the durability or ephemerality of texts). Historically, minuscules were developed primarily to increase writing speed, and texts composed in them may have been perceived as more perishable than those written in majuscules. Manuscripts are generally regarded as more perishable than printed texts. Texts intended for lengthy preservation were expected to have a more prestigious orthography. Simultaneous double orthographies developed in various societies, one more prestigious than the other.

Typesetters were the first to recognize the need for orthographic uniformity. In manuscripts, the two alphabetic styles of capital and noncapital letters in the Latin script were asymmetric, but over time printers made them symmetric by abandoning the ‘redundant’ long <ſ> and introducing uppercase <U> and <J>. Printing changed the attitude of the public toward abbreviations and led to most of them being abandoned. The hyphen at the end of a line depends on the position of the word at the end of the line, that is, on the direct materiality of the page, and not on any other orthographic principle. Sometimes different graphemes were also chosen for the end of a line. Paleography involves using historical orthographies; codicology may be served by them. The dependence of historical orthography on the ‘direct’ materiality of texts and on the material aspects of text production is a comparatively new and developing field of research.

16 Data Collection and Interpretation

16.1 Introduction

This chapter is intended to help with the earliest stages of an empirical investigation, while you are contemplating how to design a study. For this reason, the premises and assumptions of analysis, which are usually taken for granted, are stated explicitly here and, I hope, clarified to some extent. For a project already in the planning phase, this chapter will help you evaluate the advantages and disadvantages of a given method. It will also help you understand which theoretical decisions are inevitably made as soon as an empirical method is chosen. If you are aware of a method’s theoretical prerequisites, you can give your study a sounder theoretical basis and discuss its foundations more convincingly. A key issue, addressed at several points in the chapter, is that data collection and interpretation are always intertwined. Data collection is predetermined and controlled by prior findings or by your research objectives. Conversely, the modes of description applied during data collection favor a specific set of interpretations. Even when we rely on data that others have collected and described (e.g. when using historical corpora), we must still apply analytical reasoning, formulate hypotheses and reach sound interpretations. Even large corpora and effective analysis tools will not spare us this task. In the comic science fiction novel The Hitchhiker’s Guide to the Galaxy (Adams 1979), ‘42’ is the uninterpretable answer to the ultimate question of life, given by a supercomputer after several million years of computing time. A number such as ‘42’ remains a poor answer unless it is interpreted in its historical context.

In the following, I briefly discuss two typical forms of historical spelling studies, namely the tracking of a specific spelling feature, that is, its emergence, spread or decline, and the observation of a spelling variable, that is, the alternation between different spelling variants. Both cases allow for comparative variable analysis, as tracking specific spelling features also works according to the pattern of 1:0 (occurrence vs. nonoccurrence; 1 = occurrence, 0 = nonoccurrence). Three possible methods are available for comparative variable analysis, which are presented in more detail in Section 16.2. These methods are the comparison of variants in a single text (TRAVA), the comparison of two or more texts (TERVA) and, as a subvariant of this second method, the comparison of different copies of the same text (CTVA). The advantages and disadvantages of each method can be taken into account if you are aware of each method’s dangers and pitfalls. A targeted use of these methods or a use of two or more of them in combination may help you to strengthen your interpretation and/or to find alternative explanatory hypotheses.

16.2 Data Collection

The first step in any investigation involves basic decisions about which research questions to pursue and which analytical tools to use. In these times of ‘computational revolution’ and widespread statistical methods, it is possible to draw on larger corpora of electronic data. This allows researchers to base their work on broader data and thus, ideally, to obtain more significant and representative results. Even with the increasingly widespread use of computational technology, however, a key question remains: how good is the material we actually have in hand for analysis? As is often pointed out, Labov (1994: 11) described historical linguistics as “the art of making the best use of bad data.” Even if this seems true at first glance, especially from the point of view of ‘big data’, it may still be argued that the statement is misleading. Data are always incomplete and never perfect: historical data can never fully reflect the complexity of the linguistic reality in which we are actually interested. Even large corpora do not entirely mirror reality or guarantee objective knowledge. They remain highly filtered, codified and potentially distorted datasets (see also the call for more representative corpora in Elspaß 2012a). The availability of more data does not necessarily make patterns less messy, more transparent or easier for the researcher to interpret.

On the contrary, the availability of more data may tempt us to privilege correlations over causal explanations (Mazzocchi 2015: 1252). However, correlations only tell us that something is happening; they cannot give us the crucial answer to why it is happening. In fact, the more data we have, the greater the possibility of false, or so-called spurious, correlations: Calude and Longo (2017) showed that the bigger the database one mines for correlations, the higher the chance of finding recurrent (spurious) regularities, which of course should not be interpreted as evidence of causation. It should therefore be borne in mind that the epistemological difficulties inherent in narrower-scale, traditional methods remain when using ‘big data’ and statistical methods. The ‘computational turn’ has thus not resulted in electronic tools taking over the task of analytical reasoning. We still need to formulate hypotheses based on nondeductive inference rules (e.g. induction or analogy) and test them for plausibility (Cellucci 2013, section 4).
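The general point about spurious correlations can be illustrated, though not reproduced, with a small simulation: generate many mutually independent random ‘variables’ and count how many pairs nevertheless correlate strongly. The sketch below is my own illustration, not Calude and Longo’s method; the number of observations (20), the threshold (|r| > 0.5) and the random seed are arbitrary choices. Because each smaller set of variables is a prefix of the larger one, the count of spurious pairs can only grow as variables are added.

```python
import itertools
import math
import random


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spurious_pairs(series: list[list[float]], threshold: float = 0.5) -> int:
    """Count pairs of series whose correlation exceeds the threshold in absolute value."""
    return sum(
        1 for x, y in itertools.combinations(series, 2)
        if abs(pearson(x, y)) > threshold
    )


rng = random.Random(42)
# 40 independent random 'variables', each observed at 20 points.
# By construction no pair is causally related, so every strong r is spurious.
pool = [[rng.gauss(0.0, 1.0) for _ in range(20)] for _ in range(40)]

for k in (5, 10, 20, 40):
    # Nested prefixes: each larger 'database' contains the smaller one.
    print(f"{k:>2} variables, {k * (k - 1) // 2:>3} pairs: "
          f"{spurious_pairs(pool[:k])} with |r| > 0.5")
```

The number of pairs grows roughly with the square of the number of variables, which is why strong but meaningless correlations become more frequent as a database grows.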

Moreover, we still have to reason creatively from the available data and consider even implausible inferences that might elude computer-based logic. The problem is easily demonstrated by the example of language contact, where the texts under study are influenced to a greater or lesser degree by a contact language. This is especially true of so-called invisible languages, a concept introduced by Havinga and Langer (2015: 1–34). The term refers to linguistic features or even entire varieties that were marginalized or systematically excluded from the written language at a certain point in time (e.g. Low German, South Jutish, North Frisian). Invisible languages may be stigmatized substrate languages or regional varieties that only become perceptible when they interfere with the textual records of the elite language. This makes it all the more important to rely on our experience and interpretive skill to detect, for example, subtle contact phenomena in the records.

Hypotheses play a role from the very beginning of data collection, which is itself controlled by hypotheses and prior findings. Pragmatic considerations must also be weighed, for example how much time is available, what the costs are and what expertise is needed. To answer these questions and obtain relevant data, we first need to identify the research questions. In historical orthography, identifying a research question typically involves one of two approaches: focusing on the usage (emergence, spread or decline) of a particular spelling feature, or on the variable occurrence of different spellings. When starting with existing corpora or easily accessible sources, it makes practical sense to investigate questions that are predetermined by the corpus or the given material, so the corpora or sources are very likely to affect our choice of questions. Alternatively, one could put together a corpus, or modify the datasets of an existing corpus, to address a previously defined research question. In either case, however, the data collected or used will not be an amorphous mass assembled without criteria, but a targeted and sensible selection of historical material, carefully compiled either for researchers in general or for a specific research project.

For the first approach, if the usage (emergence, spread or decline) of a particular spelling feature is to be addressed, a number of additional parameters must be decided, for example which time periods, text types or writing materials are to be included. This is where the first hypotheses and, of course, the researcher’s expertise come into play: the selection is already shaped by considerations of which texts, or which linguistic or external factors, may be particularly relevant to the investigation. Let us look at some simple examples. One could investigate the use of graphemes, graphs, glyphs or other characters, for example homoform digraphs (e.g. aa and tt), heteroform digraphs (e.g. ae and th), diacritics (e.g. í and ñ), ligature glyphs (e.g. æ and fi), allographs and contextual variants (e.g. |ſ| and |ς|), abbreviations (e.g. ₰), punctuation marks (e.g. ¿ and ⸗) or typographical spaces of different widths (em-quad, en-quad). At first glance, these examples may seem like superficial questions of form, but they entail many potential problems that influence the researcher’s hypotheses and, therefore, their data collection and methods. Homoform digraphs may indicate distinctive syllabic features (New High German Ratte /ʁaṭə/ ‘rat’ vs. Rate /ʁaː.tə/ ‘rate’; dots indicating different syllable boundaries), and they may be subject to combinatorial restrictions (Early New High German *raatt ‘council’). Allographs and contextual variants may involve questions of capitalization and word separation, so morphological, syntactic and semantic factors must be taken into account.

Investigating graphotactic combinations of graphemes quickly leads to questions of specification and of possible minimal or maximal word constraints. Depending on the context, graphemes may be underspecified or overspecified. In German, <v> is underspecified because it may be pronounced [f] or [v]. On the other hand, <g> may be overspecified in cases of final devoicing. Although <g> is pronounced voiceless at the end of the syllable, the logographic spelling specifies more than the pronunciation does: it indicates paradigmatic congruence with word forms in which no final devoicing occurs (Tag, in line with Tages, Tage). Minimal and maximal word constraints describe the permissible length of word forms, that is, the minimal or maximal size of lexical words. In Polish, these constraints allow both minimal words such as the preposition w ‘in’ and maximal words such as the noun konstantynopolitańczykowianeczka ‘young girl of Constantinople’.

Furthermore, external factors are likely to play a role. Abbreviations, for instance, may have been used predominantly by professional writers, or homoform digraphs might have been an important aid for a target group of unskilled readers. If we want to take a closer look at one of these issues, we will also have to collect or select data that shed light on the internal and external factors involved, or use corpora for which the influence of these factors has already been established. On the basis of prior findings, preliminary hypotheses and initial samples of the data, the researcher can then opt for either a qualitative or a quantitative survey, and select a synchronic longitudinal study of individuals, a cross-sectional study (a ‘snapshot’ of a specific historical moment) or a comparative diachronic analysis of several points in time (time-series analysis). The researcher will probably want to choose a text type in which the spelling feature occurs frequently and a writing material that might be particularly interesting for the research question. Examples include uncommon writing surfaces such as stone, wood or metal (epitaphs, house inscriptions, jewelry) (Balbach 2014, Schmid 1989) or the relationship of text and image on paintings or coins. Anyone considering a comparative analysis in a cross-sectional study may want to compare different writing materials, for example parchment and paper, scrolls and codices, block books and incunabula, or luxury editions and ordinary reading copies. Scrolls were more difficult to handle during writing than codices, so scribes could not move back and forth between paragraphs as easily; spellings in scrolls may therefore be less consistent. Block books contain lettering carved out of woodblocks and differ fundamentally in production from printing with movable type. Can we expect differences in spelling or punctuation between the two types? Luxury editions can weigh so much that no one would want to pick them up or hold them while reading, in contrast to ordinary reading copies. Did the scribes of these prestigious luxury copies therefore put more or less effort into writing uniformly?

Now for the second approach to studying historical material: addressing a spelling variable. If occurrences of different spellings are to be investigated, the variables in question must first be properly defined. It may help to remember that variables do not consist only of simple variants following the pattern of king vs. kyng or Early New High German nahme vs. na_me ‘name’.Footnote ¹ Variables may also include sequenced elements, such as A + B vs. B + A (Early New High German raht/rath ‘council’). Variants of these two types (so-called simple and complex paradigmatic variants) may even co-occur and form multipart variants (so-called syntagmatic variants): bo_ke and book_ or queen, que_n, cween, cwe_n (see Wolfram 1991: 23–24, 2006: 334, and Auer and Voeste 2012: 253–55 for corresponding grammatical variants). It must also be borne in mind that variants may span word boundaries. This is obvious in compound spellings (hitchhiker/hitch-hiker/hitch hiker) or in ‘long-distance’ assimilations such as Old High German mag ih vs. meg ih (i-mutation), but could also play a role in other cases, for example wyth hys mount/with hys mount (analogous to satysfye/satisfye).
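The distinction between simple, complex and multipart variants can be operationalized if, as in the examples above, competing spellings are aligned to equal length, with ‘_’ marking an empty slot. The following sketch uses a heuristic of my own rather than a published procedure: each run of adjacent differing positions counts as one site of variation; a single site is a simple paradigmatic variant, or a complex one if it merely reorders the same letters (A + B vs. B + A); more than one site yields a multipart (syntagmatic) variant.

```python
def difference_sites(a: str, b: str) -> list[tuple[int, int]]:
    """Return (start, end) spans of adjacent positions at which two aligned spellings differ."""
    if len(a) != len(b):
        raise ValueError("align the spellings to equal length first, using '_' for empty slots")
    sites: list[tuple[int, int]] = []
    start = None
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y and start is None:
            start = i                   # a new site of variation begins
        elif x == y and start is not None:
            sites.append((start, i))    # the current site ends
            start = None
    if start is not None:
        sites.append((start, len(a)))
    return sites


def classify(a: str, b: str) -> str:
    """Heuristic classification of a pair of aligned spelling variants."""
    sites = difference_sites(a, b)
    if not sites:
        return "identical"
    if len(sites) > 1:
        return "syntagmatic (multipart)"
    start, end = sites[0]
    if end - start > 1 and sorted(a[start:end]) == sorted(b[start:end]):
        return "complex paradigmatic (A + B vs. B + A)"
    return "simple paradigmatic"


for pair in [("king", "kyng"), ("nahme", "na_me"), ("raht", "rath"),
             ("queen", "cween"), ("bo_ke", "book_"), ("queen", "cwe_n")]:
    print(*pair, "->", classify(*pair))
```

Under this rule, queen/cween counts as simple because <qu> ~ <cw> forms one site, whereas queen/cwe_n combines two sites and is therefore multipart, which matches the way the paragraph groups these forms. Variants spanning word boundaries (hitch_hiker vs. hitch-hiker) work the same way once aligned.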

The next step is to choose a method, or to combine different methods, to analyze the variables in question (as also discussed in Voeste 2020). Intratextual variable analysis (TRAVA) investigates the frequency and range of variants in a single text copy in order to compare specific contexts and explain particular usages of the variants. Since the external variables remain constant within a single text copy, this method is particularly useful for detecting internal factors that may trigger the choice of a spelling variant, such as lexical category, intervocalic position or syllabic characteristics. It is also helpful for detecting hypercorrect formsFootnote ² or unintentional interferencesFootnote ³ stemming from the local substrate variety or the native tongue of a scribe or typesetter. Intertextual variable analysis (TERVA) compares the results of two or more intratextual investigations, for example with respect to external determinants such as time and place. This approach may also be used to identify diachronic or diatopic differences in order to exclude them from a study of other external constraints, such as the influence of scribal ‘schools’ or language contact. The third method, cross-textual variable analysis (CTVA), is a subtype of the intertextual method. It compares the variants in different versions of the same text, tracing alterations from one version to the next in order to detect a pattern of deliberate changes. As a precondition, this approach requires successive textual records, such as a handwritten template and its printed edition, or different copies of the same text.

Ideally, the three methods can be combined to uncover the impact of language-internal and language-external factors on spelling, including diachronic, diatopic, text-type-specific or media-dependent aspects (concerning the writing material). In what follows, I present the advantages and disadvantages of the three methods in more detail. My examples stem from the early history of printing, as this is the area I am most familiar with as a researcher.
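For CTVA, a first practical step is usually to align two versions of the same text and list where their spellings diverge. A minimal sketch, assuming plain-text transcriptions split on whitespace, can use the standard library’s `difflib.SequenceMatcher` to align tokens and collect one-for-one replacements; insertions, deletions and unequal replacements (e.g. a word split in two) are simply skipped here. The two sample lines are invented for illustration and do not come from an actual template and edition.

```python
import difflib


def spelling_changes(template: str, edition: str) -> list[tuple[str, str]]:
    """Align two versions token by token and return one-for-one replacements (old, new)."""
    old, new = template.split(), edition.split()
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    changes: list[tuple[str, str]] = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # Keep only replacements of equal length, which pair up token by token;
        # insertions, deletions and n-to-m replacements are skipped in this sketch.
        if tag == "replace" and i2 - i1 == j2 - j1:
            changes.extend(zip(old[i1:i2], new[j1:j2]))
    return changes


# Invented sample lines standing in for a handwritten template and a printed edition.
template = "do ſprach maria zu ioſeph"
edition = "Do ſprach Maria zu Joſeph"
for before, after in spelling_changes(template, edition):
    print(f"{before} -> {after}")
```

The resulting list is only raw material: whether the changes form a pattern of deliberate alteration still has to be judged by the researcher.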

16.2.1 Intratextual Variable Analysis (TRAVA)

TRAVA is the most suitable method for determining possible internal factors that trigger the choice of a variant, such as syllable boundaries (Early New High German menner ‘men’ but man ‘man’),Footnote ⁴ adjacent sonorous segments (Turkish düğün ‘marriage’) or assimilations (Hungarian egyben ‘in one piece’, hatban ‘in six pieces’), and questions of word shape (so-called graphematic weight)Footnote ⁵ (Late Middle English ‘Chancery standard’ theyre ‘their’) or lexical category (German function words in/*inn, dir/*dier ‘to you’). This intratextual approach by no means excludes language-external factors. Aspects such as regional origin, level of education and experience, or individual preferences (idiosyncrasies) of the scribes or typesetters involved may have led to the deliberate or unintentional use of certain variants. But if we deal with variability in one and the same text copy, diachronic, diatopic or text-type-specific explanatory factors are less likely.

TRAVA has the advantage of allowing detailed insights into a single text copy, which makes it possible to identify even factors that are not immediately apparent. It is advisable to use TRAVA as a preliminary study before starting a larger intertextual variable analysis; potential decisive factors identified in a single text can then be investigated further in a larger corpus. TRAVA is by no means simpler than the other methods. Especially with small numbers of tokens, it is difficult to rely on percentages alone. Consider the following example. Figure 16.1 shows the text of a pamphlet of about 1500 (ISTC ih00134500),Footnote ⁶ a Christian song about Mary’s sufferings, in which I have highlighted the uppercase and lowercase letters of the nomina sacra: Maria (13), maria (7), iheſus (2), Jheſus (1), Johannes (2), Joſeph (2), ſymeon (2) and annas (1). In total, 18 of the 30 tokens (i.e. 60 percent) are capitalized. If we were to conduct a time-series analysis, we would certainly observe that the percentages increase rapidly during the sixteenth century until all nomina sacra are capitalized (the topic is discussed thoroughly in Bergmann and Nerius 2006). In a TRAVA, however, one cannot simply argue for a change in progress; one must also address the question of why variable spellings occur at all at a given point in time. Was the author or typesetter unaware that his spellings were inconsistent, or was his use of upper and lower case in fact intentional?
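How little a figure of 60 percent means with only 30 tokens can be shown by recomputing it from the counts reported above and attaching a confidence interval. The sketch below uses the Wilson score interval, a standard choice for binomial proportions that the chapter itself does not use; capitalization is read off the first letter of each form. For 18 of 30 tokens, the 95 percent interval runs from roughly 42 to 75 percent, so the data are compatible with anything from a minority to a clear majority of capitalized forms.

```python
import math

# Token counts of the nomina sacra in ISTC ih00134500, as reported in the text.
counts = {
    "Maria": 13, "maria": 7, "iheſus": 2, "Jheſus": 1,
    "Johannes": 2, "Joſeph": 2, "ſymeon": 2, "annas": 1,
}

total = sum(counts.values())
# A form counts as capitalized if its first letter is uppercase (long 'ſ' is lowercase).
capitalized = sum(n for form, n in counts.items() if form[0].isupper())


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 gives about 95%)."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = (z / denom) * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return centre - half_width, centre + half_width


low, high = wilson_interval(capitalized, total)
print(f"{capitalized}/{total} capitalized = {capitalized / total:.0%} "
      f"(95% CI {low:.0%} to {high:.0%})")
```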

Figure 16.1 Die sieben Herzensleiden Unserer Lieben Frau, 5 pages, with highlighted uppercase and lowercase letters of nomina sacra

(ISTC ih00134500)

As preliminary work for such hypotheses and interpretations, the data must be carefully checked and described. A closer look at the incunable shows that the name Maria is repeatedly placed at the beginning of a paragraph (opening the verse), either as the initial word or after an alinea,Footnote ⁷ so that upper case may have served as a structuring element. Furthermore, in 11 of 13 instances, names used as vocative expressions are written in upper case. Names also tend to be written in upper case when following a full stop (9 of 13 instances) or when they are the first word at the top of the page. Even the exceptions may be explainable: sometimes the space in the line seems not to have been sufficient for a capital letter (see Figure 16.2). All these considerations are, of course, only hypotheses based on the single incunable selected. They would, however, be a good starting point for a more detailed survey. To plan a larger study on this basis, one would code the position (i.e. initial, after pilcrow, after full stop, at the top of the page), the case (e.g. nominative) or use in a vocative expression, and questions of line spacing (i.e. justification), and could then determine which of the hypotheses hold up in a large-scale approach.
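The factors listed for a larger study can be turned into a coding scheme in which each token of a nomen sacrum becomes one record and capitalization rates are tabulated per factor. The sketch below is one hypothetical way of doing this; the field names, category labels and the `tight_line` flag (a stand-in for the justification constraint) are my own assumptions, and no data from the incunable are encoded in it.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class NomenToken:
    """One occurrence of a nomen sacrum, coded for the factors named above (hypothetical scheme)."""
    form: str         # spelling as printed, e.g. "Maria" or "maria"
    position: str     # e.g. "initial", "after pilcrow", "after full stop", "top of page", "other"
    case: str         # e.g. "nominative", "genitive"
    vocative: bool    # used as a form of address?
    tight_line: bool  # hypothetical flag: little space left in the justified line

    @property
    def capitalized(self) -> bool:
        return self.form[:1].isupper()


def capitalization_by(tokens: list[NomenToken], factor: str) -> dict:
    """Map each value of one coded factor to (capitalized tokens, all tokens)."""
    table = defaultdict(lambda: [0, 0])
    for token in tokens:
        cell = table[getattr(token, factor)]
        cell[0] += int(token.capitalized)
        cell[1] += 1
    return {value: (cap, n) for value, (cap, n) in table.items()}
```

Calling `capitalization_by(tokens, "vocative")` or `capitalization_by(tokens, "position")` on a coded corpus would yield counts of the kind reported above (e.g. 11 of 13 vocatives), which could then be compared across texts.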

Figure 16.2 Maria in lower case (vocative after full stop)

(ISTC ih00134500, p. 4, line 15)

16.2.2 Intertextual Variable Analysis (TERVA)

TERVA is a method that involves at least two different texts to investigate the influence of a particular variable, usually the influence of an external one such as time or place of text production. The comparison should be conducted in such a way that only one variable, the so-called independent variable, is examined as a possible causal determinant. TERVA is difficult to perform because it is usually based on the precondition of ceteris paribus (‘all other things being equal’). If we want to find out whether the independent variable influences the dependent variables (the so-called regressands), it must be verified that no potentially confounding variables interfere. Consequently, all other variables have to remain the same. However, the main problem that has to be addressed is the extent to which the ceteris paribus condition is fulfilled or can be fulfilled at all. Usually, historical studies do not strictly follow the criterion of ceteris paribus ; for example, texts stemming from the same region or from the same decade or the same quarter of a century are often considered to be ‘the same’. On the other hand, the ceteris paribus condition can be interpreted very strictly so that, for example, two London texts may only be regarded as ‘the same’ if they were produced in the same year, in the same printing shop or chancery, or even by the same typesetter or scribe. Sometimes, even if we focus on one and the same individual, challenges are never lacking. Reference Bowie, Gerstenberg and VoesteBowie (2015) has shown that speakers behave a lot like sinusoidal curves, oscillating back and forth between variants during their lifetime. This has also been proven for spelling, morphology and syntax. A corpus analysis of Thomas Mann’s works (Reference GrimmGrimm 1991) has shown, for example, that his linguistic features are sometimes more consistent with his brother Heinrich’s than with his own. 
Before engaging with a TERVA type of study, therefore, one should carefully consider when the ceteris paribus condition is sufficiently met and when it is not, in order to conduct a reliable study that produces meaningful results.
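Whether the ceteris paribus condition is ‘sufficiently’ met can be made explicit by recording metadata for each text and listing every field, other than the independent variable, on which two texts differ. The Python sketch below does this. The field names, the two records and the `coarsen` option (used to treat, for example, years within the same quarter century as ‘the same’) are hypothetical illustrations, not part of the method as described above.

```python
from __future__ import annotations

from typing import Callable, Mapping, Optional


def confounds(
    a: Mapping[str, object],
    b: Mapping[str, object],
    independent: str,
    coarsen: Optional[Mapping[str, Callable[[object], object]]] = None,
) -> list[str]:
    """List metadata fields, other than the independent variable, on which
    two texts differ; each one is a potential confounding variable."""
    coarsen = coarsen or {}
    diffs = []
    for key in sorted((set(a) | set(b)) - {independent}):
        va, vb = a.get(key), b.get(key)
        # Apply a coarser notion of 'the same' (e.g. same quarter century)
        # only when both texts actually record a value for this field.
        if key in coarsen and va is not None and vb is not None:
            va, vb = coarsen[key](va), coarsen[key](vb)
        if va != vb:
            diffs.append(key)
    return diffs


# Hypothetical records; field names and values are illustrative only.
text_1 = {"place": "Hanover", "year": 1669, "text_type": "treatise",
          "denomination": "Lutheran"}
text_2 = {"place": "Hanover", "year": 1670, "text_type": "treatise",
          "denomination": "Catholic"}

print(confounds(text_1, text_2, "denomination"))                               # ['year']
print(confounds(text_1, text_2, "denomination", {"year": lambda y: y // 10}))  # ['year']
print(confounds(text_1, text_2, "denomination", {"year": lambda y: y // 25}))  # []
```

Under a strict reading, the one-year gap counts as a confound. Binned by decade it still does, because 1669 and 1670 fall into different decades, whereas binned by quarter century it disappears. The choice of granularity is therefore a decision the researcher has to make and report.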

In the following example, I compare three theological treatises printed in Hanover in 1669/70.⁸ The publication was motivated by the conversion of John Frederick, duke of the Welf dynasty in the northern territory of Brunswick-Calenberg, to the Roman Catholic Church. The increasing presence of the Catholic faith in Hanover, an exclusively Protestant residence,⁹ led to a theological dispute initiated by the chief court chaplain and general superintendent Justus Gesenius. Writing under the pseudonym Timotheus Friedlieb, Gesenius opened the dispute by publishing a treatise entitled Warum wilt du nicht Römisch=Catholisch werden/wie deine Vorfahren waren? (‘Why don’t you want to be Roman Catholic like your ancestors were?’). The question was soon answered on behalf of the Duke by the Jesuit Gaspar Sevenstern and the Capuchin priest Christoph Kirchweg (Köcher 1895: 57). The material analyzed here meets the ceteris paribus condition sufficiently: it consists of three texts of the same type, printed within a few months in the same city. Despite these similarities, the treatises show striking differences. Both Sevenstern and Kirchweg chose Upper German variants, which were not typical of the northern territory of Brunswick-Calenberg and the city of Hanover (Ahlzweig and Pieske 2009). The contrast with Gesenius is particularly evident in the use of apocope (Habermann 1997, 2012: 75–78).¹⁰ While Gesenius chose the variant with word-final -e (Ende ‘end’, Worte ‘words’) in almost all instances, his opponents preferred apocopated forms in both the singular and the plural. Figure 16.3 shows the distribution of variants with final -e as percentages. The percentages are based on at least 100 tokens in the singular and 50 in the plural per text (tokens in the dative singular were excluded).
Looking at these results, one question immediately comes to mind: how can we explain the contrast between Gesenius’s text (96–99 percent final -e) and those of Sevenstern and Kirchweg (39–49 percent final -e)? The crucial difference between the treatises is the conflicting theological position of their authors: Gesenius was a Lutheran, while Sevenstern and Kirchweg were both Roman Catholics. The use by Sevenstern and Kirchweg of Upper German variants such as Cron-⊘ ‘crown’ instead of Crone or Zeugnus ‘testimony’ instead of Zeugnis therefore suggests that these southern variants functioned as Catholic shibboleth forms even in the North (for East Upper German, see Rössler 2005). The comparison shows that in the second half of the seventeenth century, after the Counter-Reformation had come to an end with the Peace of Westphalia (1648), denominational differences still influenced the choice of variants.

Figure 16.3 Percentages of final e in singular and plural
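The percentages in Figure 16.3 rest on a simple tally. For each text and grammatical number, the tally computes the share of tokens with final -e, leaves out the dative singular, and reports a value only once the stated minimum token count is reached. Below is a minimal Python sketch of such a tally. It assumes a hypothetical token format (text, number, case, presence of final -e), and the toy data are invented; they do not reproduce the study's figures.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterable, NamedTuple, Optional

# Minimum token counts per text, as stated for the study: 100 sg, 50 pl.
MIN_TOKENS = {"sg": 100, "pl": 50}


class Token(NamedTuple):
    text: str       # which treatise the token comes from
    number: str     # "sg" or "pl"
    case: str       # e.g. "nom", "acc", "dat"
    final_e: bool   # True for Crone, False for apocopated Cron-⊘


def final_e_percentages(
    tokens: Iterable[Token],
) -> dict[tuple[str, str], Optional[float]]:
    """Percentage of tokens with final -e per (text, number); None if the
    count is below the minimum and no percentage should be reported."""
    counts: dict[tuple[str, str], list[int]] = defaultdict(lambda: [0, 0])
    for tok in tokens:
        if tok.number == "sg" and tok.case == "dat":
            continue  # dative singular tokens are excluded
        tally = counts[(tok.text, tok.number)]
        tally[0] += int(tok.final_e)  # tokens with final -e
        tally[1] += 1                 # all counted tokens
    return {
        key: (100 * with_e / total if total >= MIN_TOKENS[key[1]] else None)
        for key, (with_e, total) in counts.items()
    }


# Invented toy data for a single hypothetical text "X".
toy = ([Token("X", "sg", "nom", True)] * 60
       + [Token("X", "sg", "acc", False)] * 60
       + [Token("X", "sg", "dat", True)] * 30   # ignored (dative singular)
       + [Token("X", "pl", "nom", True)] * 40)  # too few plural tokens
print(final_e_percentages(toy))  # {('X', 'sg'): 50.0, ('X', 'pl'): None}
```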

Although the analysis of an isolated determinant based on the ceteris paribus condition is useful and convincing, it is not always the best choice. A causal relationship between a spelling variable and an external or internal constraint may be difficult to isolate, since linguistic variables can be affected by more than one factor (for an introduction, see Walker 2010, 2014). In addition, various factors may affect each other or jointly trigger the choice of a variant, so that it would not be advisable to isolate a single causal determinant. Therefore, especially when working with larger corpora, historical linguists operate on the assumption of mutatis mutandis (‘with the necessary modifications’), allowing other possible determinants to vary as well. In a time-series analysis, for example, multiple factors (such as text type, dialect or social rank of the author) are correlated with each other, while the aim is still to determine which of them is particularly influential. While the ceteris paribus assumption is helpful for isolating causation, the mutatis mutandis concept calls for a multivariate analysis that measures the correlations between a spelling variable and several other determinants, usually computed with the correlation coefficient r in the open-source software R (Gries 2013).
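The correlation coefficient r mentioned above measures the linear association between two numerically coded variables. The Python sketch below computes Pearson's r from its definition and tabulates it for every pair of variables, which is one simple way of seeing how far the candidate determinants are themselves correlated. It is only a stand-in for the R-based workflow cited above, and a genuinely multivariate model (for example, a regression with several predictors) goes beyond it. The variable names and values in the usage lines are hypothetical.

```python
from __future__ import annotations

import math
from itertools import combinations
from typing import Mapping, Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's r = sum((x-mx)(y-my)) / sqrt(sum((x-mx)^2) * sum((y-my)^2))."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined for a constant variable")
    return sxy / math.sqrt(sxx * syy)


def correlation_table(
    columns: Mapping[str, Sequence[float]],
) -> dict[tuple[str, str], float]:
    """r for every pair of variables, e.g. a spelling variable and several
    candidate determinants coded as numbers (0/1 for binary factors)."""
    return {(a, b): pearson_r(columns[a], columns[b])
            for a, b in combinations(columns, 2)}


# Hypothetical coding of six texts (values invented for illustration).
data = {
    "year": [1600, 1620, 1640, 1660, 1680, 1700],
    "southern_printer": [1, 1, 0, 1, 0, 0],
    "share_final_e": [0.40, 0.45, 0.80, 0.50, 0.85, 0.90],
}
for pair, r in correlation_table(data).items():
    print(pair, round(r, 2))
```

A strong correlation between two determinants (here, say, year and printer origin) is a warning that their effects on the spelling variable cannot be separated by bivariate r alone.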

16.2.3 Cross-textual Variable Analysis (CTVA)

CTVA is a subtype of TERVA that involves comparing different versions of the same text. As a precondition, this method requires different copies, reprints, editions or successive textual records, such as a draft, a first complete manuscript and a fair or final copy. CTVA adheres even more strongly to the principle of ceteris paribus because it holds author and text constant, treating them not merely as similar but as identical factors. CTVA is particularly suitable for investigating changes that appear to have been made intentionally (and perhaps even systematically) from one version of a text to the next, whether introduced by the scribe, the editor, the proofreader or the compositor. Such variation may indicate a sound change or characteristic regional or local features and, even more interestingly, may point to a revaluation of spelling variants. Revaluations or reanalyses typically occur in folk etymologies or in so-called eggcorns (eggcorn is a reanalysis of acorn). In spelling, they are most evident in capitalization or in word segmentation, where syllable or morpheme boundaries are explicitly marked. Revaluations can thus serve as evidence for the grammatical editing of spelling variants.

In the following sample study, I compare three editions (A, B₁ and B₂) of a poem by Hans Witzstat, later used as a hymn text: Der geiſtlich buchßbaum. Von dem ſtreyt des fleyſchs wider den geyſt (‘The sacred boxwood. On the conflict between flesh and spirit’). The title is probably an allusion to a popular song about a dispute between a boxwood and a willow. All three editions were printed in Nuremberg: first in 1526 (A) by Jobst Gutknecht, and then, probably in 1528 (B₁ and B₂), by Kunigunde Hergot, who continued her husband’s printing business after he was executed for political reasons in 1527. The ceteris paribus condition is thus strictly met for these texts; two of the copies even stem from the same printing shop. The editions consist of only seven octavo pages. Nevertheless, they show a whole range of spelling differences: texts A and B₁ differ in 148 instances and A and B₂ in 178 instances, while B₁ and B₂ differ in only 72 instances (see the examples below).
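Counts of differing instances presuppose a word-by-word alignment of the editions. One way to sketch this in Python is with the standard library's `difflib.SequenceMatcher`, applied to token lists rather than characters. Pairing tokens position by position inside each differing stretch is a simplifying assumption of this sketch, and differences in word segmentation (one word against two) show up as unpaired tokens. The second line below is an invented normalization, not a quotation from B₁ or B₂.

```python
from __future__ import annotations

import difflib


def spelling_differences(a: list[str], b: list[str]) -> list[tuple[str, str]]:
    """Align two editions token by token and return the differing pairs.

    Inside each non-matching stretch, tokens are paired by position; a
    token without a counterpart is paired with the empty string.
    """
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    diffs: list[tuple[str, str]] = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        seg_a, seg_b = a[i1:i2], b[j1:j2]
        for k in range(max(len(seg_a), len(seg_b))):
            diffs.append((seg_a[k] if k < len(seg_a) else "",
                          seg_b[k] if k < len(seg_b) else ""))
    return diffs


edition_a = "Von dem ſtreyt des fleyſchs wider den geyſt".split()
edition_b = "Von dem ſtreit des fleiſchs wider den geiſt".split()  # invented variant
print(spelling_differences(edition_a, edition_b))
# [('ſtreyt', 'ſtreit'), ('fleyſchs', 'fleiſchs'), ('geyſt', 'geiſt')]
```

In practice the collated pairs, rather than the bare count, are what matter, since they can then be sorted by type (for example <ei> versus <ey>) as in the analysis that follows.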

Particularly apparent are the frequent differences in the graphic representation of the diphthong /aɪ/ or /ɛɪ/, which is written <ey> or <ei>, and once also <ai> (in A; a spelling typical of Middle High German (MHG) /ɛɪ/ in Upper German dialects). Apart from the <ai> spelling of MHG /ɛɪ/, there is no historically justifiable pattern: the reflexes of both MHG /ɛɪ/ and /iː/ appear as <ei> and <ey>, for example reyn ‘clean’/allein ‘alone’ (MHG reine, alein) and weyl ‘while’/feindt ‘foe’ (MHG wîle, vî(e)nt). There is, however, an interesting level of consistency among the texts: due to positional constraints, <ei> never occurs at the end of a word (e.g. bey ‘by, near’, drey ‘three’, frey ‘free’). At first glance, the alternation between <ei> and <ey> seems to represent free variation, because the same types are realized with both variants (dein ~ deyn ‘your’, fleiſch ~ fleyſch ‘flesh’, ſtreit ~ ſtreyt ‘dispute’). However, the typesetters of the three texts proceeded according to different principles. Text A from Jobst Gutknecht’s workshop shows a clear tendency to avoid <ei> before letters whose stems have ascenders, probably because the i-dot comes too close to the ascender (Figure 16.4). But while its typesetter opted for <ey> instead of <ei> here, he otherwise still used spellings with simple <i> before ascenders (nit ‘not’, iſt ‘is’, will ‘want(s)’ and so on). A comparison of the three editions shows that only compositor B₁ largely followed this practice, or at least adhered closely to the template of typesetter A (Figure 16.5). He also followed this template in other cases, whereas typesetter B₂ did not follow in their footsteps (a hypothesis that is also confirmed by spellings such as abendt, verporgen, ſtätig). However, we cannot conclude from this congruence that typesetter B₁’s avoidance of the sequence <ei> + ascender was a conscious decision.
The typesetter of text B₂ probably did not comprehend this aesthetic restriction, or if he was aware of it, he did not continue the practice. His deviations from the template (presumably A or B₁) rather speak in favor of free variation in text B₂.

Figure 16.4 Undesired combination of i-dot and ascender

Figure 16.5 Percentages of <y> and <i> in relation to following ascenders/nonascenders (Σ 92)

The analysis reveals a different typesetting behavior, that is, a different treatment of the variable <ei>/<ey>, in each copy of the text. One of the typesetters (A) followed the aesthetic-typographical restriction against using <ei> before letters with ascenders. The typesetter of the first reprint (B₁) adopted this special feature, just as he closely adhered to the template in typesetting and spelling generally. The third typesetter (B₂), on the other hand, took a completely different approach: in his copy, <ei> and <ey> alternate independently of following letters with ascenders. A comparison via CTVA makes it possible to detect such differences in the behavior of the craftsmen involved. Clearly, this method is of great value for developing a satisfactory historical interpretation, especially since such subtle differences fade away when large amounts of data are analyzed in historical corpora.
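The classification behind Figure 16.5 can be sketched as follows. For each occurrence of <ei> or <ey>, the sketch records whether the next letter has an ascender, is some other letter, or whether the digraph is word-final. It then computes the share of <ey> in each context. Which letters count as having ascenders is an assumption of this Python sketch: t is included because the text treats nit ‘not’ as a case of <i> before an ascender. The word list merely reuses forms cited above and is not a count from any of the editions.

```python
from __future__ import annotations

import re
from collections import Counter
from typing import Iterable, Optional

# Assumption: letters treated as having ascenders (including long s ſ and t).
ASCENDERS = set("bdfhklſßt")


def diphthong_contexts(words: Iterable[str]) -> Counter:
    """Count (<ei>|<ey>, context) pairs; context is 'ascender', 'other'
    (any other following letter) or 'final' (end of word)."""
    counts: Counter = Counter()
    for word in words:
        w = word.lower()  # lower() keeps ſ as ſ (casefold() would turn it into s)
        for m in re.finditer(r"e[iy]", w):
            nxt = w[m.end()] if m.end() < len(w) else ""
            if not nxt:
                context = "final"
            elif nxt in ASCENDERS:
                context = "ascender"
            else:
                context = "other"
            counts[(m.group(), context)] += 1
    return counts


def share_ey(counts: Counter, context: str) -> Optional[float]:
    """Proportion of <ey> among all <ei>/<ey> tokens in one context."""
    total = counts[("ei", context)] + counts[("ey", context)]
    return counts[("ey", context)] / total if total else None


words = ["fleyſch", "ſtreyt", "weyl", "deyn", "dein", "allein", "feindt", "bey"]
counts = diphthong_contexts(words)
print(share_ey(counts, "ascender"))  # 1.0
print(share_ey(counts, "other"))     # 0.25
```

Run separately on each edition, such shares would make the contrast between A/B₁ and B₂ directly comparable. Whether a given compositor treated t, or any other letter, as an ascender would itself have to be checked against the type actually used.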

16.3 Data Interpretation

Once the data have been reviewed and analyzed, relevant conclusions have to be drawn. However, this does not happen on neutral ground, because data interpretation cannot be truly separated from the preceding data collection. As the examples above have shown from different angles, the annotation or close inspection and description of data during the collection phase is a prerequisite for further hypotheses and interpretations. In turn, data collection is aided or even decisively shaped by our focus on the phenomena we consider relevant. Judgments about relevance may be based on specific research objectives, and our scholarly expertise, or even our own interests and tastes, filter our view of the data and influence their description. Abbott (2004: 245) points out the problem of tunnel vision, in which researchers tend to see the same puzzles and answer the same questions over and over again. For political, individual or moral reasons, they may be tempted to address well-intended but repetitive research questions, for example by focusing on particular social groups (women, workers, religious ‘outsiders’). Such focused research may miss the fact that many patterns recur across different social groups. Against this backdrop, we need to interpret our historical data and the behavior of scribes and typesetters with an open mind. We should always expect unknown determinants, including ones we may not have considered previously, to influence the choice of variants. Unexpected determinants might, for instance, be connected to frequency effects, so that an ongoing change first affects function words and reaches content words only with a time lag (see Phillips 2006: 181–96, Bybee 2007: 5–18). Or there might be questions of orthographic form and design to consider.
Written characters and words may be extended, abbreviated, modernized or historicized because of the available space, format or material (Shute 2017). Shorter forms may have served to make better use of limited space, or historical spellings may have been chosen to support the retrospective construction of tradition and thus to secure social prestige (Voeste 2010: 2–7). It is therefore especially important to study the original writing surface and layout (large or small format, single or multiple columns, even or uneven lines) and not to rely solely on decontextualized spelling variants collected in electronic corpora.

16.4 Conclusion

If forming hypotheses depends so much on the method of data collection, what does this imply for the three methods presented? In principle, TRAVA takes all variants of a text into account in their full scope, although the variants are generally narrowed down to certain subsets (such as the use of uppercase letters). The potential set of hypotheses is initially unlimited; all language-internal and language-external variables can be considered as explanatory factors. The particular advantage of TRAVA, therefore, lies in forming hypotheses: if you work thoroughly with this method and study the variants in detail, you may arrive at innovative hypotheses about variable correlations. TERVA, on the other hand, is a much more stringent method guided by theoretical presuppositions, since it is based on a defined test setup and aims to test a predetermined explanation. The independent and dependent variables in question are specified in advance and then brought together for testing. As a descriptive technique, TERVA can therefore be seen as a more focused method. In terms of research logic, everything that has been said about TERVA applies even more to CTVA. In this third and last method, the problem of ceteris paribus is reduced, though not wholly eliminated: the comparative investigation of temporally and spatially distant versions of one and the same text inevitably draws the researcher’s attention to different material modes and to the text’s various sociocultural contexts.

All three methods have their specific advantages and disadvantages, and none of them possesses an inherent superiority over the others. One can even attempt to combine all three methods and apply them consecutively to achieve the best possible outcome. But no matter which of the methods you choose or how you combine them, historical orthography remains a field for those who are thrilled by complex puzzles and bored by simple solutions.

17 Philological Approaches

17.1 Introduction

As a mode of study, philology has a long history, yet the way the term has been used, and the attitude of scholars to both the study of orthography and the meaning of orthographic variation, have changed substantially over time. This chapter outlines the origins and history of philology, from its roots in the Classical period to the present day, and discusses how far philological approaches pertain to the study of historical orthography. Philology’s focus on material, historical and manuscript contexts makes it an especially fruitful way of interrogating historical texts, and philological methods have long been viewed as a particularly apt way of dealing with (among other features) the wide orthographic variation naturally present in medieval works. To illustrate the concerns and approaches of present-day philologists to the study of historical orthography, the chapter presents two case studies. The first focuses on scribal practices in Old English and provides an example of a manuscript-centered analysis of orthography. The second focuses on the scripting of Old English and Old High German and illustrates how historical orthographies can be analyzed by mapping spelling onto an etymological sound reference system.¹

The term philology ultimately derives from Greek φιλολογία ‘love of reasoning, love of learning and literature’, which is a derivative of the compound adjective φιλόλογος ‘fond of words’. It enters the modern European languages via Latin philologia in the later Middle Ages; compare French philologie, Spanish filología, Italian, Portuguese and Polish filologia, Czech filologie, Russian филоло́гия. The English and German words (philology, Philologie) are coined on the basis of the French form.² Philology involves a wide range of practices that are generally linked to the study of texts and languages. Initially tied to the task of editing works from Classical Antiquity (fifth century BC–fifth century AD), philology has split into separate branches, including Classical philology, comparative philology (or historical linguistics), manuscript studies, Altertumskunde (‘study of antiquity’) and literary criticism.³ As a result of this differentiation, the notion of what philological approaches entail has changed considerably and also differs from discipline to discipline. Nevertheless, all philological practices are characterized by an orientation toward the material sources in which the languages and literatures of the past are attested.

Not only the notion of what philological approaches entail but also their appreciation has changed. Jacob Grimm (1785–1863) placed a very high value on philology when he famously claimed that “none among the sciences is prouder, more noble, more pugnacious than philology and more implacable against mistakes.”⁴ In the first half of the twentieth century, by contrast, Buchstabenphilologie (i.e. philology of letters) came to be used as a derogatory term for research held to rely too strongly on the written word. In recent decades, philology has undergone a rehabilitation, particularly in historical sociolinguistics and pragmatics, with their strong focus on manuscript evidence. These shifting uses of philology have shaped philological approaches to orthography: whether orthography was considered worthy of study has waxed and waned with the fortunes of philology itself.

17.2 Philology in Classical Antiquity and the Middle Ages

In Classical Antiquity (eighth century BC–fifth century AD) and the Middle Ages (fifth–fifteenth century), philology included all branches of learning, as illustrated by Martianus Capella’s fifth-century allegorical encyclopaedia De nuptiis Philologiae et Mercurii (ed. Willis 1983). This work gives an account of the seven maidens whom Philology, a personification of learning, receives as a wedding gift from her husband Mercury. The maidens embody the seven liberal arts: grammar, dialectic and rhetoric (the trivium of medieval education), and geometry, arithmetic, astronomy and music (the quadrivium).⁵ The study of orthography lies at the very foundation of this curriculum, as it represents the initial step of the Ars grammatica, the first discipline of the trivium. Classical and medieval discussions of orthography center on the concept of the littera, which combines nomen ‘name’, figura ‘shape’ and vox ‘voice’ (or potestas ‘might’, i.e. the sound value of a letter).⁶ Thus, littera refers not only to the character as a visual unit of a writing system, but also to the sound a character represents, as well as to the name by which it is identified. Consequently, medieval discussions of orthography focus not only on the correct spelling of Latin words but also on their pronunciation. Regional differences in Latin are addressed by Abbo of Fleury in the tenth century: in his Quaestiones grammaticales (ed. Guerreau-Jalabert 1982), he criticizes the way his students at Ramsey Abbey apparently pronounced words like civis, using the spelling <qui> to represent what must have been the sound /k/.⁷

Medieval accounts of orthography do not normally focus on the vernacular languages. One notable exception is the twelfth-century First Grammatical Treatise (ed. and trans. Benediktsson 1972), which devises an orthographic system for Old Icelandic in which each speech sound is represented by a single character (Huth 2003: 444–57). While the First Grammarian’s use of minimal pairs to establish differences between sounds is highly innovative and reminiscent of twentieth-century linguistic methodology, his terminology is firmly grounded in medieval grammatical theory; a stafr, for example, like its Latin counterpart littera, combines shape, sound and name (Benediktsson 1972: 44–45). Sharing the fate of many orthographic reforms, few of the First Grammarian’s suggestions were incorporated into Old Icelandic spelling practice.

17.3 Orthography, Renaissance Philology and Beyond

In the Renaissance (fifteenth and sixteenth centuries), philology evolved as the set of methods necessary to edit Classical Greek and Latin texts.⁸ The antiquarian interest of scholars in the sixteenth and seventeenth centuries also extended to the vernacular languages, which resulted in the establishment of ‘modern’ philology. Textual criticism necessitated the collation of different manuscripts and thus resulted in fine-grained analyses of orthographic differences and their implications (see Zanobini 2016: 5). The method culminated in the nineteenth century in Lachmann’s scientific approach to the reconstruction of the archetype of a text, still one of the tenets of Classical and medieval philology.⁹

On a theoretical level, Renaissance scholars started to rethink the Classical concept of littera. Julius Caesar Scaliger (1484–1558) was among the first to criticize the Ars grammatica and to argue that littera referred only to the written letter (Vogt-Spira 1991: 311–13). Scaliger adduced spurious etymological ‘evidence’ to support his view: he explained that litera – to be spelled with a single <t> – derives from lineaturae, that is, the lines drawn on the page. Vogt-Spira suggests a connection between this new conceptualization of writing and the practice of silent reading, which certainly became the norm with the spread of printed books, though it must have started some centuries earlier (Vogt-Spira 1991: 313–14).¹⁰ Printing, in any case, did engender a wider debate on orthography, which manifested itself in proposals for orthographic reform for the modern languages across sixteenth- and seventeenth-century Western Europe (Neis 2011: 174). The aim of such proposals was to bring spelling into line with contemporary pronunciation, as in the work of Louis Meigret (1500–58) for French, John Hart (d. 1574) for English and Gonzalo de Correas (1571–1631) for Spanish (Neis 2011, Salmon 1999: 15–21, Lucas 2000). Reform attempts were often informed by philological work on medieval texts. Some of the characters proposed by Sir Thomas Smith (1513–77) for English, for instance, were adopted from Anglo-Saxon scripts, such as <ð> and <þ> for the voiced and voiceless dental fricatives, respectively, or <ꝼ> for /v/ and <ȝ> for /ʤ/ (Lucas 2000: 6).

The use of these characters was closely linked to type design.¹¹ Such spelling reforms were generally unsuccessful (Liuzza 1996: 25); however, their influence is still visible in the International Phonetic Alphabet.¹² During the eighteenth-century boom in the publication of pronouncing dictionaries, elocutionists employed a variety of methods to convey their preferred pronunciation and reconcile it with spelling, such as italic and Gothic fonts (Johnston 1764), accents and macrons (Jones 1798), numeric (Kenrick 1773) and alphanumeric notation (Walker 1791, Sheridan 1780), or notation systems of their own devising (Spence 1775).¹³ However, spelling reform itself was not a concern of the eighteenth-century orthoepists.

17.4 Orthography and Comparative Philology

Philology took a new turn in the late eighteenth century with the identification of Sanskrit as an Indo-European language by Sir William Jones in 1786. This discovery resulted in a focus on the relationships and history of the Indo-European languages (Sonderegger 2000a: 417, 2000b: 443), as in the work undertaken by the Danish philologist Rasmus Rask (1787–1832) and the Germans Franz Bopp (1791–1867) and Jacob (1785–1863) and Wilhelm Grimm (1786–1859). Jacob Grimm discussed orthography in a lecture addressed to the Prussian Academy of Sciences in 1847. His starting point was contemporary German orthography, which he considered ‘barbaric’ in contrast with earlier spelling:

More than 800 years ago, in St. Gall during the time of Notker, German orthography was in a better state, and great care was applied to the exact designation of our sounds; good things can still be said about the writings of the twelfth and thirteenth century; only since the fourteenth century has it started to deteriorate.

(Grimm 1864: I, 349)

Grimm specifically criticized words in which the spelling deviates from the spoken language, for example ‘superfluous’ letters in compounds like Schifffahrt,¹⁴ etymological or hypercorrect spellings, as well as the different ways of representing vowel length (Grimm 1864: I, 330, 349–50). Interestingly, Grimm also took issue with features of written language that have no counterpart in the spoken language, such as hyphens, apostrophes and word-initial capitals. His remarks illustrate that, in his view, an ideal writing system is closely aligned with the spoken language and has a one-to-one relationship between letters/graphs and sounds. This attitude is coupled with the belief that earlier orthographies represented this ideal state.¹⁵ Grimm has been accused by later scholars of being unable to distinguish between letters and sounds (see Haas 1990: 10–13). While he may not have been an astute phonetician, part of this criticism arises from Grimm’s Deutsche Grammatik (1822), whose Book 1 is entitled Von den buchstaben. Yet Grimm’s use of buchstabe stands in the Classical littera tradition: he clearly separated zeichen (‘sign’) and laut (‘sound’), and he was well aware that the sounds of historical stages of languages can only partially be recovered from writing. Fowkes goes one step further in his defense of Grimm and argues “that his use of the term Buchstabe was tantamount to ‘phoneme’” (Fowkes 1964: 60).¹⁶ On the other hand, as Haas (1990: 13) reminds us, as a philologist Grimm primarily dealt with letters and not with sounds.

The neogrammarians (Junggrammatiker) in the second half of the nineteenth century took the natural sciences, primarily anatomy and biology, as a model for their linguistic work.¹⁷ On the one hand, this resulted in the application of empirical methods to the study of articulatory phonetics, or ‘sound physiology’. On the other hand, based on the wide-reaching impact of Charles Darwin’s On the Origin of Species (1859), it led to the adoption of the tree model for historical linguistics and resulted in the reconstruction of Proto-Indo-European. Orthography is largely ignored in comparative philology, even though an understanding of the letter–sound correlations of the earliest attested stages of languages is a prerequisite for any reconstruction. The attitude of historical linguistics toward writing is reflected in the handbooks on the earlier stages of languages: they traditionally start with a section on spelling and pronunciation before moving on to a more detailed discussion of phonology and morphology (and, more rarely, syntax). In-depth discussions of the writing system itself, however, are usually absent.¹⁸ Only a few nineteenth-century studies focus specifically on orthography, for example Friedrich Wilkens’s Zum hochalemannischen Konsonantismus der althochdeutschen Zeit (1891) and Friedrich Kauffmann’s ‘Über althochdeutsche Orthographie’ (1892), both on Old High German spelling, and Karl D. Bülbring’s ‘Was lässt sich aus dem gebrauch der buchstaben k und c im Matthäus-Evangelium des Rushworth-Manuscripts folgern?’ (1899) on Old English orthography.

A theoretical discussion of writing is provided by Hermann Paul (1846–1921) in Prinzipien der Sprachgeschichte (Paul 2009 [1880]). In chapter 13, ‘Language and writing’ (Sprache und schrift), Paul stresses that any linguistic information from the past is only accessible through “the medium of writing” (das medium der schrift, Paul 2009 [1880]: 245). However, he holds that it is impossible to fully reconvert writing into speech, even in the case of writing systems that are close to spoken language. To illustrate the relationship between spoken and written language, Paul uses two similes. First, spoken language relates to writing as a line relates to a number (Paul 2009 [1880]: 246), since speech sounds blend into each other whereas writing is discontinuous. Second, they relate as a painting to a rough sketch, meaning that writing can never express all the nuances of speech, and only someone familiar with the language will be able to recover details such as quantity or stress (Paul 2009 [1880]: 249–50).

17.5 Saussure and the Structuralists

Starting with Ferdinand de Saussure’s (1857–1913) ground-breaking publications, philology, understood as the diachronic analysis of language, came to be contrasted with synchronic linguistics. Saussure explicitly criticized philology for “attaching itself too slavishly to the written language and forgetting the living language” (Saussure 1995 [1916]: 14). In his view, the only function of writing was to represent spoken language; for those who study writing rather than language, Saussure used a comparison similar to the one presented by Hermann Paul: “It is as if one believed that, in order to know someone, one should look at his photo rather than at his face” (Saussure 1995 [1916]: 45). Therefore, he argued, the sole object of linguistics is spoken language. The only reason for studying writing is that linguists need to understand its “functionality, defects and perils” (44) in order to recover language from written sources.

This view dominated the structuralists’ approach to language and culminated in Leonard Bloomfield’s famous statement in his introduction to Language that “[w]riting is not language, but merely a way of recording language by means of visible marks” (Bloomfield 1973 [1933]: 21). While Bloomfield considered writing an impediment, whose study is only needed in order to “get […] information about the speech of past times” (Bloomfield 1973: 20–21), he nevertheless devoted a chapter to “Written records” (Bloomfield 1973: 297–313), in which he discussed the properties and history of different writing systems. Bloomfield also used a simile to illustrate the relationship of writing to language: “writing is […] merely an external device, like the use of the phonograph, which happens to preserve for our observation some features of the speech of past times” (Bloomfield 1973: 299). The generally negative attitude toward writing meant that, as Venezky (1970: 10) put it, “orthography was relegated to the backporch of the new linguistic science.”

In the Prague linguistic circle, Josef Vachek (1909–96) began to rethink the structuralist stance on writing, identifying written language as a separate norm alongside spoken language. Vachek saw the two norms as independent but co-ordinated representations of a universal linguistic norm, or langue. Yet he accorded independent status only to established writing systems and considered the earliest attempts at writing by a linguistic community as “a mere transposition of the spoken norm” (Vachek 1939: 102) or as “a kind of quasi-transcription” (Vachek 1945–49: 91), and thus as a secondary system of representation. Vachek’s work heralded the development of grapholinguistics as a separate linguistic discipline. This field has been dominated by a debate on the relationship between writing and spoken language and by the consequent methodological question of whether to use an autonomistic or a relational approach to graphemic analysis. Proponents of an autonomistic approach call for an analysis of written language without recourse to the spoken language.¹⁹

17.6 Philology in the Twentieth Century

Philological approaches to orthography in the twentieth century have largely eschewed autonomistic methods. Instead, they are characterized by a careful assessment of spelling evidence in combination with other philological methods. Particular significance has been attached to the letter–sound correlations of the Latin alphabet in different regions and time periods. An early study urging a reconsideration of the evidence of orthographic variation was Daunt’s (1939) examination of certain Old English spellings. These had traditionally been viewed as representing short diphthongs arising from a number of sound changes.Footnote ²⁰ Under one such change (breaking), monophthongs followed by /r/ or /l/ plus another consonant (or by /h/ on its own) became diphthongs, for example weorpan, eald, feohtan. Daunt reinterpreted the digraphs <ea> and <eo>. In his view they were not evidence of short diphthongs but spellings for allophones of the short vowels /æ/ and /e/, with the second vowel graph indicating the velar quality of the following consonant. Daunt attributed this practice to the influence of Irish orthography, in which vowel graphs are used as diacritics to distinguish velarized and palatalized consonants.Footnote ²¹

Important work on Old High German orthography and phonology was undertaken by Penzl. His numerous publications provided new impulses, in particular by making explicit some of the methodological issues at stake (e.g. Penzl 1950, 1959, 1971, 1982, 1987).Footnote ²² For example, Penzl (1971: 305) proposes a method for establishing the phonological systems of early Germanic languages. It combines an “internal graphemic analysis” with a “diagraphic comparison.” The former takes into consideration the “choice and distribution of graphemes” within a single text. The latter entails an analysis of the spellings attested in earlier or later periods as well as in different dialects. Penzl illustrates the application of this method with an example from the St. Gall Paternoster & Creed (c. 790). This text uses <o>, <oo> in words like losi, prooth, sonen, erstoont. A comparison of this material with the same words in Notker’s works (eleventh century) shows that Notker makes a graphic contrast between <ô> (lôse, brôt) and <uô> ([be]suônet, irstuônt). Early St. Gall charter material (before 762) shows that Notker’s <ô> corresponds to <au> or later <ao> (e.g. Autmarus, Gaozberto). This is not the case for Notker’s <uô>, which corresponds to <o> in the charters. This evidence makes it clear that in the Paternoster & Creed “the two o oo must have been different, even if lack of symbols led to their graphic merger in the writing system of [St.] G[all] Pat[ernoster]’s scribe” (Penzl 1971: 306). Penzl’s method also draws on other types of evidence, including comparative data from the wider language family, meter and rhyme, loanwords, typological aspects and metalinguistic comments.
On a theoretical level, Penzl (1971: 307) identified the “phonemic fit of the orthography” as “a major consideration” of any analysis of written texts. He questioned the structuralist assumption of biuniqueness (i.e. a one-to-one relation of graphemes and phonemes), which resulted either in misconceptions of the phonology represented by early orthographies or in a rejection of writing as an object worthy of study. In Penzl’s work, by contrast, a careful consideration of the “complex orthographic solutions” (Penzl 1971: 307) leads to a deeper understanding of writing systems and their evolution.

In another work arguing for a more nuanced relationship between orthography and sound values, Clark (1992a) worked at the intersection of history and onomastics. She carefully considered the spellings of personal and place names in the Domesday Book, which routinely render /θ/ and /ð/ as <t> or <d>, for example. The traditional view held that these unetymological spellings reflected French speakers’ pronunciation of insular names. Clark disputed this view and concluded that the Domesday scribes were “not consciously representing current pronunciations used either by scribes or by informants” (Clark 1992a: 320). Rather, in what was a Latin-language administrative text, they were deliberately rendering insular names according to Latin (or, when that failed, French) orthographic norms. Similar considerations have produced other studies. These involve detailed discussions of the pronunciation of medieval Latin, letter–sound correlations and vernacular phonology. Examples include Harvey’s (2011) reassessment of the origins of Celtic orthography (and publications cited there) and Dietz’s (2006) analysis of digraphs in the transition from Old to Middle English. These studies demonstrate philology’s continuing applicability to a number of related disciplines.

17.7 ‘New Philology’ and Pragmaphilology

In the second half of the twentieth century, a renewed emphasis on the value of the manuscript sources of medieval texts lay at the heart of ‘New Philology’. This approach arose from concerns among literary scholars that medieval studies – long seen as a bastion of the philological method – had become marginalized and widely perceived as irrelevant. The pressure came from newer methodologies and advances, particularly in literary criticism. In a special volume of Speculum, Nichols (1990) describes New Philology in terms of renewal. Its adherents, he writes, had a strong desire to return to philology’s origins in manuscript culture. This entailed a concentration on the materiality of the text (see Chapter 15, this volume). It contrasted with the earlier focus on stemmata and on reconstructing an idealized, ‘original’ text as envisaged by its author. The new approach presented itself as a fundamental shift. Earlier efforts had narrowed the variation (orthographic, morphological or lexical) naturally present in multiple-witness texts, in an attempt to retrieve the author’s ‘original’ text. New Philologists, by contrast, emphasized the importance of the variety inherent in the different manuscripts.Footnote ²³ Variety in the manuscript and linguistic variation were seen as fundamental aspects of the condition of medieval texts:

If we accept the multiple forms in which our artifacts have been transmitted, we may recognize that medieval culture did not simply live with diversity, it cultivated it. The ‘new’ philology of the last decade or more reminds us that, as medievalists, we need to embrace the consequences of that diversity, not simply live with it, but to situate it squarely within our methodology.

(Nichols 1990: 8–9)

New Philology’s emphasis on the original manuscript text aligned literary studies more closely with some of the more language-oriented approaches to studying manuscript texts, although some questioned whether New Philology offered anything that was not already being done.Footnote ²⁴

More recent work has again returned to the question of manuscript transmission and how it can be elucidated by the evidence of orthographic variation.Footnote ²⁵ Scholars in historical linguistics have also sought a return to the manuscript text. Such a plea was at the heart of Lass’s (2004) essay on the processes of textual selection in corpus-building. Lass makes a strong case for including texts only in their manuscript form. He advocates the faithful recording of features such as spelling, capitalization and punctuation, and rejects edited texts which normalize or modernize. A corpus built from edited texts risks incorporating the distortions of editorial choices into the evidence we gain from corpus inquiry. Smith (1996: 14) also emphasizes the necessity of bringing together philological and linguistic approaches when studying Old or Middle English, as advocated in historical pragmatics and historical sociolinguistics. He points out that our earliest records of the language are mediated to us through writing. An understanding of writing systems is therefore essential if we are to undertake effective historical language research (Smith 1996: 56).Footnote ²⁶

The emergence of pragmaphilology as a discipline also reflects the increasing preoccupation of scholars with the written text in its context. In a seminal publication heralding the arrival of the new discipline, Jacobs and Jucker (1995: 11–12) state that “adequate (i.e. pragmatic) analysis of historical texts must study these texts in their entirety including sociohistorical context, their production process and – crucially – a faithful account not only of the syntactic/lexical level but also the physical and orthographic level.” Among the more recent studies in the field are those which combine small details (e.g. punctuation or paleography) with morphosyntactic features. They also bring in wider concerns such as the social contexts of a text’s production. The aim is a more nuanced and rounded picture of the text and its communicative function.Footnote ²⁷ At heart, pragmaphilology maintains a focus on original texts. In this it resembles New Philology and other recent fields in linguistics, such as historical sociolinguistics. As Taavitsainen and Fitzmaurice (2007: 18) note, “a prerequisite for the conduct of historical pragmatics is the acceptance of written texts as legitimate data.” The increasing availability of high-quality facsimiles and online scans of manuscripts has been fundamental in enabling a more manuscript-centered approach. Such an approach can account for paleographical data alongside areas which fall more traditionally within the domain of linguistics. Baisch (2018: 183) notes that the increasing availability of digital editions “has begun to open up new possibilities which reflect central preoccupations of the New Philology.”

17.8 A Linguistic Atlas of Late Mediaeval English

While not always a mainstream approach, the philological focus on manuscript text and context repeatedly surfaces as a primary concern among scholars working on medieval language and literature. This holistic approach is exemplified by the substantial body of work undertaken in compiling A Linguistic Atlas of Late Mediaeval English (LALME, McIntosh et al. 1986a) and its subsequent counterpart, A Linguistic Atlas of Early Middle English (LAEME, Laing 2013–). The seeds of LALME were sown by McIntosh in a 1956 article. In it he advocated the study of Middle English orthography in its own right, and not just as a way to devise or understand the correspondence between written and spoken Middle English (McIntosh 1956). His confidence in the value of written language as evidence in its own right was not widely shared at the time, and it put him at odds with the structuralist stance: “there is beyond doubt at present a fairly prevalent feeling that the approach to spoken manifestations of language is in some fundamental sense a more rewarding – not to say reputable – pursuit than that to written texts” (McIntosh 1956: 37). This approach was informed by McIntosh’s earlier work as a dialectologist of present-day Scots. It also drew on his observation that surviving Middle English manuscripts showed orthographic patterns from which geographical or dialectal correspondences could be drawn.

McIntosh’s methodology was novel in that it treated each manuscript witness as a linguistic informant, the equivalent of a living speaker in a dialect survey. Using a set of items akin to a dialect questionnaire, he collected counts of a wide range of variants from each witness in order to construct a profile for each scribe (McIntosh 1974: 602–3). The profiles recorded three kinds of feature:

‘S-features’, which reflect differences in spoken language, such as hem/þem;
‘W-features’, orthographic features that reflect written language and have no bearing on the pronunciation of a word, such as sche/she;
‘G-features’, paleographical features such as the shape of a particular graph.

Throughout, McIntosh emphasizes the value of working across disciplines to view the problem in the round, because “it is sometimes the case that a scribe fails to impose his own S-features on texts but does impose upon them various scribal characteristics of his own” (McIntosh 1974: 603). That is to say, paleographical variants are used alongside evidence from spellings which encode spoken variation, as well as from those which do not. This complementary evidence is used as part of the ‘fit technique’ to place writers geographically.Footnote ²⁸

This build-up of small details, culled directly from the manuscripts themselves, enabled LALME researchers to categorize scribal behavior into different types. For example, a scribe may copy his exemplar literatim, reproducing a near-identical text. Alternatively, he may ‘translate’ the exemplar into his own linguistic norms, substituting his favored spellings for those he finds in it. Or he may do something in between, perhaps beginning as a more literatim scribe before moving to translating behavior as he becomes more familiar with the exemplar’s forms. Benskin and Laing (1981) also described the behavior of a Mischsprache scribe. Such a scribe mixes forms from the exemplar with those of his own preferred usage but, importantly, maintains this behavior throughout his copy. Altogether, this methodology tells us how the scribe of the surviving manuscript went about his task. Through the collection of relict forms, it can also give us an idea of the nature of the underlying exemplar. As Laing (1988: 83) notes, “dialectal analysis often provides the means to do far more than place a scribe on the map.” More recent research has focused on what can be discovered about the writing systems employed by different scribes. Painstaking analysis has revealed that some writers used “litteral substitution sets”, in which one sound is represented by several litterae. Others used “potestatic substitution sets”, in which one symbol represents several sounds (see Laing and Lass 2009: 1).Footnote ²⁹ Laing (1999), for example, details how the written forms of <þ>, <ƿ> and <y> changed during the Middle English period. These changes led some scribes to use the three graphs interchangeably for a range of pronunciations including /ð/, /θ/, /w/ and /j/.
Laing and Lass (2009) regard scribal variation, with its overlapping functions and formal equivalences, as systematic rather than the result of ‘mental failure’ or ‘scribal error’.Footnote ³⁰ This emphasis values the scribe’s input (as a ‘native speaker’) instead of correcting what is perceived as an inferior version of the author’s original. It links the LALME/LAEME project’s attitude to historical texts with that of new philologists, historical sociolinguists and pragmaphilologists: “[i]t is recognised that a ‘corrupt’ text may reflect the activity of a contemporary editor, critic, or adaptor rather than that of a merely careless copyist” (Laing 1988: 83).

17.9 Case Study: Two Scribes of the Tanner Bede

The methods outlined by the compilers of LALME/LAEME are useful for the study of Middle English, but they can also be applied to Old English material. The language situation is, however, rather different. In general, later writers of Old English appear to have used a focused variety (i.e. late West Saxon). Middle English, by contrast, was “par excellence, the dialectal phase of English” (Strang 1970: 224), when writing routinely reflected local usage.Footnote ³¹ The important thing to bear in mind is that many surviving Old English texts are copies rather than autograph writings. What we see on the page is therefore not the result of a considered scripting choice, of the kind examined in our second case study below. It is the outcome of the copying behavior of the latest scribe. Thus, in line with McIntosh’s observations, we may detect orthographic as well as morphosyntactic features that were either transmitted from the exemplar text or translated into the scribe’s own preferred usage. A further difference between late Old English and Middle English is that Old English literacy was probably far less widespread socially, being more or less restricted to the ecclesiastical elite. In addition, Viking attacks of the ninth and tenth centuries destroyed Northumbrian and Mercian monasteries and their libraries. As a result, a substantial part of our data for Old English comes from eleventh-century Wessex and the dialect written there (Fulk and Cain 2013: 21–22).

This case study examines the performance of two scribes of Oxford, Bodleian Library, Tanner 10 (T), a late tenth-century copy of the Old English translation of Bede’s Historia ecclesiastica. It is based on the methodology developed in Wallis (2013), which is adapted from that of the LALME project. T can be examined alongside the other Bede manuscripts. Such an examination reveals that the original translation, which no longer survives, was written in a Mercian dialect. The text was then progressively West-Saxonized as a succession of scribes recopied it during the late tenth and eleventh centuries (Miller 1890, Wallis 2013). In total, five scribes contributed to T; the two under examination are referred to as T2 and T4.Footnote ³² A questionnaire was used to collect the variant spellings, and five features are examined here.Footnote ³³ All five are conservative features indicative of the original Mercian dialect of the Old English Bede:

ah is a form of ac (‘but’) commonly found in Anglian dialects rather than West Saxon (Hogg 1992a: 275);
ec is a spelling of eac (‘also’) which shows Anglian smoothing (Campbell 1959: 95);
spellings retaining <oe> represent a rounded front vowel, written e in West Saxon, and are found in non–West-Saxon dialects (Campbell 1959: 76–78, 133);
double vowel spellings, for example tiid for tid (‘time’), represent long vowels in older texts (Campbell 1959: 13);
cuom- and cwom- are early spellings of com-, the past tense of cuman (‘to come’), before the loss of -w- (Ringe and Taylor 2014: 339).

These features are summarized in Table 17.1.

Table 17.1 Spelling variation in T2 and T4

Relict feature	Newer variant
ah	ac
ec	eac
oe	e
double vowels: aa, ee, ii, oo, uu	a, e, i, o, u
cuom-, cwom-	com-

These features appear sporadically as relicts in other Bede manuscripts, as well as in T. It is important to ascertain not only the form(s) of each feature present, but also where each instance occurs, by folio. In that way we can detect whether a scribe’s behavior is consistent throughout his stint or changes as he writes. Two main trends are noticeable in T2’s performance. Firstly, he has a strong tendency to use the more conservative spellings. For example, he only ever uses ah, and never ac, and he transmits six <oe> spellings, including roeðnis ‘storminess’, woedelnisse ‘poverty’ and woen ‘hope’. Another relict feature transmitted throughout his stint is the use of double vowels, in proper names as well as common words. His 23 examples include tiidum ‘time’, cwoom ‘came’ and the personal name eedgils. Finally, T2 only uses older spellings of the past tense of the verb cuman. He vacillates between the older <u> for /w/ and the newer wynn, but it is notable that he never writes com (see our second case study on variation between <u> and wynn in Old English).Footnote ³⁴ T2 may introduce a form of his own on a single occasion, right at the beginning of his stint, where he writes eac (f. 103r). Thereafter, however, he always writes ec (three times), apparently following his exemplar. T2’s copying behavior thus appears to fall toward the literatim end of the spectrum. On the basis of the features discussed here, there is little evidence that he brings many of his own preferred spellings to his copy, and he maintains conservative spellings throughout.

When T2 reached the end of his stint, the copying task was taken up by T4, whose approach over the next 18 folios is rather different. T4 begins by reproducing a number of forms from his exemplar. On the first two folios (f. 117v–118r) we find ah, ec, cuom, cwom, and forðfoered ‘to depart, die’ with an <oe> spelling. As he continues, however, T4 gradually abandons these inherited spellings in favor of ones which reflect his own training and preferences. Notably, he does not change all these spellings at the same point in his copy. Ec is soon changed to eac on f. 118r, com makes its appearance a little later, on f. 119r, and the change from ah to ac comes later still (f. 121v). This suggests that the changes perhaps happen at a lexical level rather than at a systematic, orthographic level. Previous exposure to a spelling does not seem to be a factor: ec is written only once before the spelling is changed, while ah appears four times before it is replaced. The <oe> forms are harder to interpret. Their absence from the later folios may represent a conscious change by T4, or no such forms may have existed in this part of the exemplar.
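The folio-by-folio reasoning above can be made explicit with a small tally. The Python sketch below records each token as a (folio, item, variant) triple. For each questionnaire item, it reports the first folio on which the newer variant appears and how many relict tokens precede it. The function names and record format are hypothetical. The folio placements in the sample are invented for illustration, loosely modeled on the switch points reported for T4; they are not a transcription of Tanner 10.

```python
import re

# Hypothetical record format: (folio, questionnaire item, variant as written).


def folio_key(folio):
    """Sort key for folio references such as '118r': recto before verso."""
    match = re.fullmatch(r"(\d+)([rv])", folio)
    if match is None:
        raise ValueError(f"unrecognized folio reference: {folio!r}")
    return int(match.group(1)), 0 if match.group(2) == "r" else 1


def switch_point(records, item, newer):
    """Return (first folio showing the newer variant, relict tokens before it).

    Any variant other than `newer` counts as a relict. The folio is None
    if the newer variant never appears for this item.
    """
    tokens = sorted((r for r in records if r[1] == item), key=lambda r: folio_key(r[0]))
    relicts_before = 0
    for folio, _, variant in tokens:
        if variant == newer:
            return folio, relicts_before
        relicts_before += 1
    return None, relicts_before


# Invented sample loosely modeled on T4's reported switch points (illustration only).
t4_sample = [
    ("117v", "ah/ac", "ah"),
    ("117v", "ec/eac", "ec"),
    ("117v", "cuom/com", "cuom"),
    ("118r", "cuom/com", "cwom"),
    ("118r", "ah/ac", "ah"),
    ("118r", "ec/eac", "eac"),
    ("119r", "cuom/com", "com"),
    ("119v", "ah/ac", "ah"),
    ("120v", "ah/ac", "ah"),
    ("121v", "ah/ac", "ac"),
]

newer_forms = {"ec/eac": "eac", "cuom/com": "com", "ah/ac": "ac"}
for item, newer in newer_forms.items():
    folio, n_relicts = switch_point(t4_sample, item, newer)
    print(f"{item}: newer form first on f. {folio}, after {n_relicts} relict token(s)")
```

Reporting switch points item by item, rather than one switch point per scribe, is what would let a profile show the staggered, possibly lexical changeover described for T4.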

A different pattern is shown, however, by the four double vowel spellings, which appear sporadically throughout T4’s stint in words such as aa ‘always’ and riim ‘reckoning’. The contributions of both scribes are short. Even so, T2 appears rather more likely to transmit a double vowel spelling than T4 (23 times in 1,540 graphic units, against T4’s four times in 3,651 graphic units). It is possible that fewer double vowel spellings occurred in T4’s section of the text than in T2’s. This seems unlikely, however, because such spellings are also transmitted by scribes T1 and T5. Double vowels occur in each scribe’s stint, though to differing degrees. This might suggest that the scribes did not find them too incongruous a spelling, or that they were part of the T scribes’ passive repertoire (Benskin and Laing 1981: 58–59). T4, then, acts as a translator scribe, albeit one who starts out more literatim before ‘writing in’ to his own preferred norms and style. Of course, without the original exemplar, we cannot be entirely sure to what extent either scribe altered the text, and it should be stressed that this is not an exhaustive survey. Nevertheless, we can compare T2 and T4 with each other and with other scribes of the Bede manuscripts. This comparison builds a picture of the sorts of features we would expect to have been in the archetype, and which therefore may well have occurred in T’s exemplar. A scribal profile analyzes both the features used and their distribution. Building one enables us to map the internal consistency of each scribe, in addition to their differences from one another.
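The raw counts above come from stints of very different lengths, so a rate per 1,000 graphic units gives a fairer comparison. The Python sketch below computes these rates from the reported counts. It also adds a two-sided Fisher exact test, written out with `math.comb`; this test is an addition not made in the source. The test treats every graphic unit as an independent opportunity, which it is not. It also cannot account for how many double vowel spellings each scribe's exemplar actually contained. Its p-value should therefore be read as a rough guide only.

```python
from math import comb


def rate_per_1000(hits, units):
    """Occurrences per 1,000 graphic units."""
    return 1000 * hits / units


def fisher_exact_two_sided(hits_a, units_a, hits_b, units_b):
    """Two-sided Fisher exact test for hits_a/units_a versus hits_b/units_b.

    Conditions on the total number of hits and sums the hypergeometric
    probabilities of every split no more likely than the observed one.
    """
    total_hits = hits_a + hits_b
    total_units = units_a + units_b

    def pmf(x):
        # Probability that x of the total hits fall in sample A.
        return comb(units_a, x) * comb(units_b, total_hits - x) / comb(total_units, total_hits)

    observed = pmf(hits_a)
    lowest = max(0, total_hits - units_b)
    highest = min(total_hits, units_a)
    # A small relative tolerance guards against floating-point ties.
    return sum(
        p for p in (pmf(x) for x in range(lowest, highest + 1)) if p <= observed * (1 + 1e-7)
    )


# Counts reported above for double vowel spellings.
t2_hits, t2_units = 23, 1540
t4_hits, t4_units = 4, 3651

t2_rate = rate_per_1000(t2_hits, t2_units)
t4_rate = rate_per_1000(t4_hits, t4_units)
p_value = fisher_exact_two_sided(t2_hits, t2_units, t4_hits, t4_units)
print(f"T2: {t2_rate:.2f} per 1,000 units; T4: {t4_rate:.2f} per 1,000 units")
print(f"ratio {t2_rate / t4_rate:.1f}; two-sided Fisher p = {p_value:.2g}")
```

With these counts, T2's rate is about 14.9 per 1,000 units against about 1.1 for T4, roughly thirteen to fourteen times higher. This is consistent with the qualitative judgment in the text, subject to the caveats above.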

17.10 Quantification in Philological Approaches to Orthography

Beyond the research on early English encouraged by the LALME/LAEME project, a number of philological approaches to orthography from the early decades of the twenty-first century have addressed written language by mapping spelling onto a linguistic reference system. This method was initially developed by Mihm and Elmentaler in the context of a project entitled Niederrheinische Sprachgeschichte at the University of Duisburg, which focused on administrative writing in Duisburg from the fourteenth to the seventeenth century (as described in Elmentaler 2003: 49–51). While rejecting an autonomistic analysis of written language as impractical, Elmentaler takes great pains to avoid circular reasoning. This is achieved by analyzing graphs (Graphien) according to their correspondence to Lautpositionen (‘sound positions’), which are units defined by sound etymology and context. Elmentaler’s research also relies on the strict separation of scribes and exact quantification (Elmentaler 2003: 60–63). Graphs and sound positions are correlated, which makes it possible to establish the grapheme systems of individual scribes and to assess the overlap in the representation of different sound positions. On a wider level, Elmentaler’s research confirms that early written languages are fully functional, that the letter–sound correlations of Latin are persistent, and that change in written language is often discontinuous (Elmentaler 2003: 51–53).
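Correlating graphs with sound positions, as described above, amounts to a cross-tabulation kept separately for each scribe. The Python sketch below builds such a table from (sound position, graph) pairs. It then lists the graphs that serve more than one sound position, which is one simple way to inspect overlap. The position labels and token counts are hypothetical, chosen only to illustrate the procedure; they are not Elmentaler's categories or data.

```python
from collections import Counter, defaultdict


def cross_tabulate(tokens):
    """Map each sound position to a Counter of the graphs used for it."""
    table = defaultdict(Counter)
    for position, graph in tokens:
        table[position][graph] += 1
    return dict(table)


def shared_graphs(table):
    """Graphs used for more than one sound position, with the positions they cover."""
    coverage = defaultdict(set)
    for position, graph_counts in table.items():
        for graph in graph_counts:
            coverage[graph].add(position)
    return {graph: sorted(positions) for graph, positions in coverage.items() if len(positions) > 1}


# Hypothetical tokens for a single scribe (illustration only).
scribe_tokens = (
    [("w_initial", "uu")] * 5
    + [("w_initial", "u")]
    + [("w_cluster", "u")] * 3
    + [("w_before_u", "u")] * 2
    + [("u_vowel", "u")] * 4
)

table = cross_tabulate(scribe_tokens)
for position, graph_counts in sorted(table.items()):
    total = sum(graph_counts.values())
    shares = ", ".join(f"<{g}> {n}/{total}" for g, n in graph_counts.most_common())
    print(f"{position}: {shares}")
print(shared_graphs(table))
```

Because each scribe gets a separate table, comparing two scribes reduces to comparing their tables position by position. This mirrors, in a simplified way, the strict separation of scribes that the method requires.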

Subsequent studies have applied Mihm and Elmentaler’s approach to other types of material. Larsen (2001, 2004) adopted it for a study of Middle Dutch statutes of the Flemish town of Ghent. Kawasaki (2004) used it for a graphemic analysis of the Old Saxon Heliand, and Seiler (2014) in research on the earliest Old English, Old High German and Old Saxon sources. These studies address different research questions and, consequently, adapt the relational method to suit their own purposes:

Larsen aims at establishing the entire grapheme systems represented in the material from Ghent;
Kawasaki systematically compares the spellings for the dental letters þ, d, đ and t across the five extant manuscripts of Heliand;
Seiler investigates how a number of consonant phonemes are represented, and how ‘superfluous’ graphemes like <k>, <q>, <x> and <z> are employed.

These differences aside, the studies share a cautious stance when it comes to attributing exact sound values to the orthographic features under investigation, and they all aim to elucidate the workings of nonstandardized writing systems.

17.11 Case Study: The Scripting of /w/ in Old English and Old High German

This second case study focuses on the spellings for one sound, the continuant of Proto-Germanic */w/ in Old English and Old High German. This sound was phonologically stable, yet it is represented in various ways, since Latin had no corresponding sound and, therefore, the alphabet provided no suitable character.Footnote ³⁵ The case study provides insights into the scripting of Old English and Old High German; the details presented here are based on a comparative analysis of early orthography (Seiler 2014), which relies on quantitative data to identify the factors determining graphemic choices. The methodology is adapted from Elmentaler (2003; see above), mapping spellings onto an etymological reference system. The results show that while, overall, the spellings for Old English and Old High German /w/ are variable, there are clear-cut diatopic and diachronic patterns. Furthermore, different orthographic solutions tend to be used for specific sound positions. Once these factors are taken into consideration, Old English and Old High German orthographies turn out to be surprisingly consistent.

When scribes in England and in Frankia started to write their vernacular languages with the Latin alphabet, three typologically distinct spellings for the representation of /w/ were available to them. The first option was single <u>, though this graph stood for labiodental /v/ in Latin. The second was the digraph <uu>. The third was to adopt the character <ƿ>, named wynn ‘joy’, from the runic script (ᚹ). All three spellings (as well as some others) are attested in Old English and Old High German sources, yet their patterns of distribution are very different. Wynn is the standard spelling in Old English from the ninth century onward and remained in use well beyond the end of the Anglo-Saxon period.Footnote ³⁶ Early Old English sources, going back to the late seventh and eighth centuries, generally use <u> or <uu> instead, though rare instances of wynn occur. Single <u> dominates in the eighth-century versions of Cædmon’s Hymn transmitted as part of Bede’s Ecclesiastical History (see the extract from the Moore manuscript under (1) below). It is also found in names attested in the earliest Anglo-Saxon charters. Even in texts that use <ƿ> or <uu> elsewhere, <u> is often retained as a spelling for /w/ in the consonant clusters /kw/, /hw/, /sw/ and so on. The Alfredian translation of the Pastoral Care, for example, normally employs <ƿ> but uses <cu>, <su> and so on for these clusters (2).Footnote ³⁷ These spellings are clearly modeled on Latin words like suavis ‘sweet’, which contain a bilabial semivowel (Stotz 1996: 142). Double <uu> occurs only occasionally, mostly in early Mercian sources, as in the examples from the Épinal Glossary (3). However, the digraph spelling continues to be used for Old English names in Anglo-Latin texts, as in the Vita St. Æthelwoldi (4) (Lapidge and Winterbottom 1991: clxxxviii). Again, there is a restriction: <uu> is rarely used before the vowel /u/ (e.g. uulfgar and not **uuulfgar). The following text samples illustrate the range of spellings found in different Anglo-Saxon sources:

(1) Nu scylun hergan hefaenricaes uard, metudæs maecti end his modgidanc, uerc uuldurfadur, sue he uundra gihuaes, eci dryctin, or astelidæ (Cædmon’s Hymn from the Moore manuscript, Cambridge University Library, MS Kk. 5.16, c. 737, ed. Dobbie 1942: 105; emphasis added here and throughout);
(2) Ne cuæð he ðæt forðyðe he ænegum men ðæs ƿyscte oððe ƿilnode, ac he ƿitgode sua sua hit geƿeorðan sceolde (Old English Pastoral Care, Bodleian Library, Hatton MS 20, late ninth century, ed. Sweet 1871: I, 29.10);
(3) [232] ca[ta]ractis: uuaeterthruch ‘water-pipe’, [1026] telum: uueb ‘web’, [1040] taberna: uuinaern ‘tavern’, [1045] talpa: uuandaeuuiorpae ‘mole’, [1062] uitelli: suehoras ‘fathers-in-law’, [1088] uirecta: quicae ‘green place’ (Épinal Glossary, c. 700, ed. Pheifer 1974);
(4) Est enim ciuitas quaedam modica, commerciis abunde referta, quae solito uuealinga ford appellatur, in qua uir strenuus quidam morabatur, cui nomen erat Ælfhelmus, qui casu lumen amittens oculorum cecitatem multis perpessus est annis. Huic in somnis tempore gallicinii sanctus AĐELVVOLDUS antistes adstitit eumque ut maturius uuintoniam pergeret et ad eius tumbam gratia recipiendi uisus accederet ammonuit […] (Vita St. Æthelwoldi, ed. Lapidge and Winterbottom 1991: 42).Footnote ³⁸

In Old High German, the digraph <uu> is already the regular spelling for /w/ in the earliest sources in the eighth century. Its use is doubtless modeled on West Frankish spelling practice, where <uu> is attested in personal names on coins and charters from the late sixth century onward (e.g. UUaldemarus, UUandeberctus; see Wells 1972: 118–19, 144, 157, Felder 2003: 700). From Merovingian Frankia the digraph presumably also spread to Anglo-Saxon England (Seiler 2015: 119–20). Eventually, <uu> or <vv> were combined into a single character with touching or overlapping strokes, resulting in the establishment of a new letter <w>.Footnote ³⁹ The runic character wynn, on the other hand, is restricted to a small number of texts and is rarely used consistently (Braune and Heidermanns 2018: 24). The presence of wynn in Old High German is generally attributed to Anglo-Saxon influence. One text in which it is found is the Hildebrandslied, an alliterative heroic poem (5). The mixture of <uu> and <ƿ> spellings suggests that wynn occurred in the exemplar from which the extant version was copied. It was not, however, normally used by the two scribes (see Lühr 1982: 32–34). The Hildebrandslied was copied in Fulda, one of the centers of the Anglo-Saxon mission on the Continent, which explains the presence of insular influence in the scriptorium.

The restrictions on <uu> found in Old English also apply to Old High German orthography: in consonant clusters and before the vowel /u/, many scribes prefer single <u> as a spelling for /w/ (5, 6, 7). One exception is Otfrid of Weissenburg, who, in one of the prefaces to his Evangelienbuch, explicitly argues for a ‘triple u’ in the sequence /wu-/: “Sometimes, as I believe, three u are necessary for the sound; the first two as consonants, as it seems to me, but the third keeping its vocalic sound” (ed. Magoun 1943: 880). Otfrid also insisted on this spelling being used in the Evangelienbuch (see Seiler 2010: 92–95, 99). For the cluster /kw/, many Old High German sources resort to <qu> or similar spellings, as in the Old High German translation of Tatian’s Diatessaron (6). This spelling is clearly modeled on the many Latin words containing a labiovelar (quia, quod and so on). Incidentally, the same spelling occurs in some Old English sources (e.g. quicae in (3) above).

Overall, Old High German orthography is highly idiosyncratic and more prone to intricate digraph and trigraph spellings than Old English. The scribe of part Ka of the Abrogans glossary, for example, uses <ouu> to represent Proto-Germanic */w/ in clusters with /s/ or /z/, single <u> in other clusters and double <uu> elsewhere (7). The trigraph may owe its composition to the insertion of a parasitic vowel after the sibilant (Braune and Heidermanns 2018: 103); however, many intricate spelling rules are graphic in nature and unconnected to the sound level. The following examples illustrate the range of Old High German spellings for /w/:

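(5) […] gurtun sih iro suert ana, helidos, ubar [h]ringa. do sie to dero hiltiu ritun. hiltibraht gimahalta, heribrantes sunu – her uuas heroro man, ferahes frotoro –; her fragen gistuont fohem uuortum, [h]ƿ́er sin fater ƿ́ari […] ‘[…] the warriors girded on their swords over their ring-mail when they rode to that battle. Hildebrand spoke, Heribrand’s son (he was the older man, the wiser in life); he began to ask, in few words, who his father was […]’ (Hildebrandslied 5b–9, c. 830, ed. Lühr 1982, I, 2);Footnote ⁴⁰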
(6) Inti quad Zacharias zi themo engile: uuanan uueiz ih thaz? ih bim alt, inti mīn quena fram ist gigangan in ira tagun ‘And Zacharias said to the angel: Whence shall I know this? I am old, and my wife is advanced in her days’ (Old High German Tatian, c. 830, ed. Braune and Ebbinghaus 1994: 47);
(7) [12.19] ambiguus : undar zouuaim ‘going two ways’, [12.20] dubius : zouuiual ‘doubt’, [28.20] natare : souuimman ‘to swim’, [29.02] natabat : souuam ‘swam’; [13.11] ambitus : cadhuing ‘region’, [23.06] ego inquid : ih qhuad ‘I said’, [30.16] adfligit : thuingit ‘he throws down’; [10.21] almum : uuih ‘holy’, [25.16] crescit : uuahsit ‘it grows’ (Abrogans, Cod. Sang. 911, c. 790, ed. Bischoff et al. 1977).

A comparison of the spellings for /w/ in Old English and Old High German suggests that similar factors were at work. Orthographic solutions are shaped by two opposing principles: firstly, a desire for an unambiguous representation of vernacular sounds and, secondly, the rules of Latin grammar. This leads to compromises such as single <u> in clusters and before the vowel /u/, with <uu> used elsewhere. The dominance of Latin spelling practice and orthographic rules results in similarities between individual writing systems but also across the traditions of the West Germanic languages. Such similarities are due to a shared background rather than to direct influence of one spelling system on another. Individual scribes define their own, sometimes intricate spelling rules, though Old English spelling converges on a relatively uniform representation of the vernacular in the course of the Anglo-Saxon period. Old High German orthography, on the other hand, remains more fragmented. Finally, scribal choices are also affected by text genre. In nonstandardized writing systems, spellings often carry associations beyond the sounds that they represent. The runic character ƿ, for example, is clearly a ‘vernacular’ graph. Single <u>, but also the digraph <uu>, on the other hand, stand for (Merovingian) Latinity and are thus more suitable for representing vernacular elements in Latin texts.Footnote ⁴¹ On a more general level, this case study shows how core philological methods can be updated to reach a more sophisticated understanding of the writing systems of the past. This entails, on the one hand, a more nuanced assessment of the correlations between spellings and sounds and, on the other, investigating writing systems as culturally transmitted phenomena with features that go beyond sound representation. By shifting attention squarely onto written language, the term Buchstabenphilologie (‘philology of letters’) may thus be reclaimed as the study of writing in its own right.

17.12 Conclusion

The popularity of philology has waxed and waned among scholars of historical texts, but it has never been entirely eclipsed by other methods. It has frequently been noted that ‘philology’ is difficult to define (e.g. Nichols 1990: 2, Fulk 2016: 95), since it encompasses a wide range of methods and requires competence in a number of disciplines.Footnote ⁴² Nevertheless, it is because it is a fundamental part of textual scholarship that philology remains a relevant and valid approach to the study of historical texts on a variety of levels. While the concerns of philologists may have moved away from textual editing and the recovery of the original authorial text, the methodology and expertise developed by philologists now find their use in “mediating between the demands of linguistic methodology and the limitations that beset the records of prior states of the language available for linguistic analysis” (Fulk 2016: 96). It is precisely this mediating role that is most valuable: philology is easily absorbed by and combined with newer theoretical linguistic approaches, providing scholars with a deeper understanding of the “extralinguistic contexts of linguistic data” (Fulk 2016: 95). Thus, a range of scholarship has developed that combines philological sensitivities with the theoretical underpinnings of, for example, variationist linguistics in historical sociolinguistics or politeness theory in historical pragmatics. It is arguably in these fields, where philological methods can take advantage of advances in digital humanities such as corpus linguistics and digital editing, that we see the most fruitful combinations of the strands laid down by twentieth-century work (e.g. New Philology, pragmaphilology, LALME), much of which involves the study of historical orthographies alongside several other features.
Finally, these newer fields place an emphasis on finding new texts to study, often by the kinds of writers who have been overlooked by traditional scholarship, such as the documents of lower-class writers studied in ‘language history from below’ (Elspaß 2012b). The supply of historical documents is therefore by no means exhausted, and much work remains for philologists using such combined methods, both on existing documents and on those yet to be discovered.

18 Exploring Orthographic Distribution

18.1 Introduction

“I’ll call for pen and ink, and write my mind.” This line from Shakespeare (1 H6, V, 3, 66, see Burns 2000) neatly introduces the content and scope of this chapter. The words are spoken by the Earl of Suffolk who, in love with Margaret and unsure whether to set her free, calls for pen and ink to commit his intentions to paper. The present chapter examines the relationship between ‘pen and paper’ in the composition of early English manuscripts and printed books, on the hypothesis that scribes and printers shared some common practice in this respect. Manuscripts and early printed books are taken as the source of evidence for discussing the concept of spacing and distribution. The term distribution is taken not merely as the arrangement of sentences and paragraphs on the page but in its widest sense, referring to the writers’ decisions both in the preparation of the writing surface and in the writing process itself.

This chapter discusses the arrangement of the external aspects of the text together with the distribution of internal features associated with spacing in Late Middle English (1350–1500) and Early Modern English (1500–1700). The focus is on English, and on Early Modern English in particular, as the period in which spacing conventions were becoming established. In the following, we describe the rationale behind the composition of early English handwritten documents, reconsidering aspects such as the preparation of the writing surface, the dimensions of the folio, margin conventions, frame and line ruling, the use of columns (and their association with the formality of the text) and line justification. Next, we explain the main notions of the concept of spacing, describing the different types of line-final word division and their variants, and providing a general overview of existing research. Finally, two case studies discuss the emergence of spacing in the Middle Ages and its development throughout the Early Modern English period, paying attention to both handwritten and printed sources. These studies consider divisions both in the middle and at the end of a line. Divisions in the middle of a line are described in light of the evidence of nominal and adjectival compounds, reflexives, adverbs and words which, although independent lexemes, are irregularly found written together in the period. Line-final word division, in turn, is examined according to the type of boundary involved, whether morphological, phonological or anomalous. We argue that it is through narrow case studies like these that we can most effectively advance our general understanding of orthographic distribution.

18.2 Formatting and Layout

The preparation of the writing surface was a time-consuming process in which animal skin (sheepskin was more frequently used in Britain than goatskin) was turned into parchment through cleaning and subsequent dehairing. Making parchment was arduous and lengthy, which made the resulting material a valuable, luxury product that only the elite of the time could afford. Because parchment, and writing materials in general, were scarce and therefore expensive, careful planning of their use was crucial in order to make the most of the writing surface (Clemens and Graham 2007: 15–17).

In this context, columns were a recurrent practice among medieval and, to a lesser extent, Early Modern English scribes, although “in the fifteenth century a renewed preference among some for layouts with long lines is detectable, probably under Italian Humanistic influence” (Derolez 2003: 37). Although there is no one-to-one correspondence between the use of columns and the formality of the text, columns are more strongly associated with particular registers and genres, especially those with a higher level of formality. As far as genre is concerned, columns are especially frequent in highly esteemed literary compositions, poetry in particular, which are often richly decorated and colored specimens. This is the case with MS Hunter 7, containing a decorated version of John Gower’s Confessio Amantis; MS Hunter 197, housing a copy of Geoffrey Chaucer’s The Canterbury Tales; and MS Hunter 5, a precious version of John Lydgate’s Fall of Princes (Cross 2004).

As far as register is concerned, columns are also more widely associated with texts of a higher level of formality, and medical writing may be the best testimony to this scribal preference. Theoretical and surgical treatises were considered the most academic registers and belonged to the learned tradition, being mostly translations of learned Latin medicine of academic origin designed for physicians of the highest class and (barber) surgeons. Remedies, in turn, reflect the language used by lay people, as they were collections of recipes that families kept for use at home. While remedy books, intended for private use, are seldom laid out in two columns, theoretical and surgical treatises are often found in columns, depending on the circulation and value of the item at hand. MS Hunter 95, for instance, is a beautiful two-column copy of a Late Middle English version of the Book of Operation. From the beginning of the sixteenth century, however, the use of columns declined among Early Modern English scribes.

The dimensions of the margins are in most cases a matter of convention in early English manuscripts, insofar as the foot margin is usually twice as wide as the top margin, and the side margins are wider than the top and narrower than the foot (Johnston 1945: 72). In this way, medieval scribes ensured that “the height of the written space equalled the width of the page” (de Hamel 1992: 21). Regardless of its dimensions, in a regular quire of eight leaves the upper and lower margins measure approximately 20 mm and 35–40 mm, respectively, while the left and right margins measure c. 15–20 mm and 15–35 mm. These proportions can be taken as a benchmark in both Late Middle English texts, as in MS Hunter 497 (Calle-Martín and Miranda-García 2012: 26), MS Wellcome 542 (Calle-Martín and Castaño-Gil 2013: 29) or MS Hunter 328 (Calle-Martín 2020: 15), and Early Modern English specimens, as in MS Wellcome 3009 (Criado-Peña 2018: 16) or MS Rylands 1310 (Calle-Martín 2020: 16). Figure 18.1 shows the average dimensions of a manuscript folio, with the approximate size of the margins and the actual writing space, as found in the fifteenth-century English translation of Macer Floridus’s De viribus herbarum (Glasgow University Library, MS Hunter 497).

Figure 18.1 Margins and writing space in MS Hunter 497

Ruling techniques changed over time, however. Drypoint ruling, which consisted in scoring the page with a sharp instrument, was used until the eleventh century; since the impression was visible on both sides, only one side of the leaf needed to be ruled. Leadpoint, in turn, was in vogue until the thirteenth century; it is distinguished by its grey or reddish-brown color and, unlike drypoint, required both sides of the page to be ruled. Finally, ink ruling came into use from the fourteenth century, often in the same color as the running text, although this practice became less and less fashionable from the fifteenth century, after which only frame ruling remained (Derolez 2003: 55, Clemens and Graham 2007: 16–17). While frame ruling was a consistent practice in early English manuscripts, line ruling was more often associated with valuable copies, to ensure that the text had a visually appealing layout (Calle-Martín 2020: 17). The number of lines of a handwritten composition depended on the size of the volume.

The history of English handwriting in the period 1400–1600 is characterized by the replacement of the Anglicana hand by the Secretary script, the latter more cursive and considerably smaller in size (Roberts 2005: 4, Calle-Martín 2011b: 35–54). The progressive spread of the Secretary hand had a crucial impact on the design of the manuscript page, leaving more room for running text. Although both manuscripts date from the fifteenth century, the scribe of MS Hunter 497 consistently uses a hybrid Anglicana hand, while MS Hunter 328 is written in a more cursive hand that allows more running words per page.

Line justification is also a matter of scribal choice in handwritten documents, depending in most cases on the value of the copy at hand. Even though scribes generally sought to make the most of the writing space, valuable copies are particularly respectful of the inner and outer margins, and line-fillers are frequent devices to avoid leaving a blank line at the close of a paragraph. Less valuable copies are more concerned with economizing on writing space and, as a consequence, frequently extend the running text into the margins, to the detriment of word division, and make wider use of abbreviations. Printed texts, on the other hand, are naturally more prone to line justification, while at the same time avoiding line-fillers as a visual device.

In addition to the size and spacing conventions of the written material, formatting was a principal means of giving the written text some kind of organization. Decorative material was often employed to mark major divisions in the text. The litterae notabiliores stand out as visual indicators of the beginning of a sentence, becoming “the primary way in which the reader was guided through the text” (Smith 2020b: 212). Hand rubrication of this kind was enormously costly at the time, and underlining and colored ink, red in particular, were also frequently used to indicate divisions within the text, guiding the reader’s eye to its important parts. A hierarchy of scripts was also a common device for macrostructural purposes in both handwritten and printed documents, where “square capitals [were used] for main headings, uncials for lower level headings and initial words, and Caroline minuscule for the main text” (Smith 1994: 36–38; Baron 2001: 22). MS Hunter 135, housing a sixteenth-century English version of De chirurgia libri IV (ff. 34r–73v), displays this kind of hierarchical arrangement: section titles are written in an italic script, while the running text is in a fairly legible Tudor Secretary hand.

Punctuation also played a decisive role in the organization of the text, understood as a means of dividing it into pages, lines and paragraphs. Punctuation was conceived pragmatically and, besides pages and lines, the paragraph was the earliest unit of punctuation, often unaccompanied by any internal punctuation mark until the seventh century (Lennard 1995: 65–68). Written punctuation developed from then on, and by the eleventh century there was a set of symbols, with overlapping uses, devised to meet particular needs. The Middle Ages thus stand out as a crucial period in the development of the punctuation system: until the eleventh century it consisted of overlapping repertoires of marks associated with a particular scriptorium or geographic area, and from the twelfth to the fourteenth century of “a general repertoire with a wide European distribution” (Lennard 1995: 66). Apart from the paragraph itself, different punctuation marks thus appeared to create the mise-en-page and make the text more readable. They include, for instance, the paragraphus § and the paraph ¶, along with other symbols such as the virgule /, the double virgule // and the perioslash ./, each of which could take various forms (.//, //., etc.). The paragraphus mostly indicated divisions in a text. The paraph carried the pragmatic function of “a macro structural marker to indicate particular relationships within the paragraph as well as the major sections and subsections within the text” (Calle-Martín and Miranda-García 2005: 33). The virgule and its variants recur with section titles and also serve to separate sense units that are semantically and syntactically independent. The period (punctus), in turn, also served to set off some key terms of a text, in addition to marking various sentential, clausal and phrasal relationships.

18.3 Spacing: Word Division

The term word division denotes the threefold rendering of some words in historical texts, which may appear joined, hyphenated or separated, although the last of these overwhelmingly predominated (Tannenbaum 1930: 146). Word division is relevant to orthography because of its connection with punctuation. The phenomenon dates back to the sixth century, when Irish and Anglo-Saxon scribes contributed decisively to the development of the system of distinctiones. These scribes needed visual cues in order to understand Latin texts, most of them written in scriptura continua, and turned to the practice of word separation, using spaces and points (Clemens and Graham 2007: 83–84, Calle-Martín 2011a: 18). Word division, however, has traditionally been neglected in the literature, where references to it are limited to noting the lack of an orthographic standard until the first half of the sixteenth century (Denholm-Young 1954: 70, Petti 1977: 31).

Line-final word division is the breaking of a word at the end of a line and, unlike divisions in other positions, the rules governing it vary according to phonological and morphological factors. The issue is also largely ignored in traditional handbooks of paleography, where omitting it is the rule rather than the exception. The few references to the topic reveal that English penmen shared no consensus, either phonological or morphological, governing line-final word division. The splitting of words at the end of lines is considered arbitrary in handwriting, and the only precept “seems to have been that not less than two completing letters could be carried over to the second line” (Hector 1958; see also Denholm-Young 1954: 70, Petti 1977: 31). More recent approaches, which propose the existence of conventional patterns, have called this traditional view into question. The topic remains open to interpretation, however. For Old English (OE), Hladký (1985a: 73) states that the main principle of word division is morphological, covering suffixed, prefixed and compound words. Lutz, in turn, holds that the division of polysyllabic words reflects their phonological organization into syllables, thus assuming that line-final word division is based on OE syllabification (Lutz 1986: 193; see also Burchfield 1994: 182). Given the distinctive practices of individual scribes, there was no uniform approach in the period, and the evidence adduced in these modern studies cannot be generalized to the period as a whole.

To address this limitation, the last decade has seen a number of statistical analyses (Calle-Martín 2009, 2011a) that examine the phenomenon in terms of the particular choices of scribes, providing empirical data that can eventually be compared with other texts. Methodologically, these investigations build on Hladký’s (1985a, 1985b) approach to the study of word division in historical texts, which classifies divisions by the ultimate force behind the split, that is, morphology or phonology. The former relies on the traditional word-formation principles of prefixation, suffixation and composition, as in vn-curable, sauour-ing and som-what, respectively. The latter divides words according to their pronunciation, and the following phonological rules are distinguished: (i) the CV-CV rule, that is, division after an open syllable, as in sy-newes; (ii) the C-C rule, division between two consonants, as in mer-curye; (iii) the V-V rule, division between two adjacent vowels, as in api-um; (iv) the ST rule, either the preservation or the separation of the cluster -st, as in sub-stance and was-tyng, respectively; (v) the CL rule, keeping a consonant and a liquid together provided that both belong to the same syllable, as in par-brakynge; and (vi) the CT rule, division between the pair -ct, as in elec-tuaryes. These statistical analyses have also added a third group (Calle-Martín 2009: 38, 2011a: 18) to account for anomalous divisions that fall outside this twofold classification, as in ointme-nt.
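The twofold classification, plus the anomalous group, can be made concrete with a small rule-based sketch in Python. It is an illustration under strong simplifying assumptions, not the procedure used in the studies cited: the prefix, suffix and compound lists (`PREFIXES`, `SUFFIXES`, `WORDS`) are hypothetical toy inventories, <y> is treated as a vowel, and the phonological tests only inspect the letters on either side of the break rather than a genuine syllabification. Morphological boundaries are tested first, and a carried-over part containing no vowel (as in ointme-nt) is labeled anomalous.

```python
# Toy classifier for line-final word breaks, loosely following the
# morphological / phonological / anomalous grouping described above.
# All word lists are hypothetical examples, not the inventories of the cited studies.

VOWELS = set("aeiouy")  # assumption: <y> counts as a vowel, as often in Middle English
LIQUIDS = set("lr")
PREFIXES = {"vn", "dis"}        # hypothetical prefix list
SUFFIXES = {"ing", "nesse"}     # hypothetical suffix list
WORDS = {"som", "what"}         # hypothetical lexicon for spotting compounds


def classify_break(left: str, right: str) -> str:
    """Label a split written as left-|right, e.g. ("sy", "newes") for sy-newes."""
    if not left or not right:
        raise ValueError("both parts of the split must be non-empty")
    left, right = left.lower(), right.lower()

    # 1. Morphological boundaries take priority.
    if left in PREFIXES:
        return "morphological:prefix"
    if right in SUFFIXES:
        return "morphological:suffix"
    if left in WORDS and right in WORDS:
        return "morphological:compound"

    # 2. A carried-over part with no vowel cannot form a syllable (ointme-nt).
    if not any(ch in VOWELS for ch in right):
        return "anomalous"

    # 3. Phonological rules, checked from the most to the least specific.
    last, first = left[-1], right[0]
    if last == "c" and first == "t":
        return "phonological:CT"        # elec-tuaryes
    if right.startswith("st") or (last == "s" and first == "t"):
        return "phonological:ST"        # sub-stance (kept), was-tyng (split)
    if first not in VOWELS and len(right) > 1 and right[1] in LIQUIDS:
        return "phonological:CL"        # par-brakynge
    if last in VOWELS and first not in VOWELS:
        return "phonological:CV-CV"     # sy-newes
    if last not in VOWELS and first not in VOWELS:
        return "phonological:C-C"       # mer-curye
    if last in VOWELS and first in VOWELS:
        return "phonological:V-V"       # api-um
    return "unclassified"               # e.g. consonant | vowel splits


if __name__ == "__main__":
    for pair in [("vn", "curable"), ("sy", "newes"), ("ointme", "nt")]:
        print("-".join(pair), classify_break(*pair))
```

The order of the checks is a design choice: a split such as sub-stance would also satisfy the C-C test, so the more specific ST, CT and CL tests are applied before the general consonant and vowel rules.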

18.4 Case Studies

This section explores the emergence of spacing in the Middle Ages and its development throughout the Early Modern English period, paying attention to both handwritten and printed sources. Spacing is examined by considering divisions both in the middle and at the end of a line (see Subsections 18.4.1 and 18.4.2, respectively). The two case studies are intended to show that the proposed methodology for researching orthographic distribution works in practice, with word division serving as the example. The data come from the two components of The Málaga Corpus of Early English Scientific Prose, namely The Málaga Corpus of Late Middle English Scientific Prose (1350–1500) (Miranda-García et al. 2014) and The Málaga Corpus of Early Modern English Scientific Prose (1500–1700) (Calle-Martín et al. 2016). These corpora contain material from the three branches of early English scientific writing: specialized treatises, surgical treatises and recipe collections (Voigts 1984, Taavitsainen and Tyrkkö 2010). Both components are transcribed following semi-diplomatic conventions in order to reproduce the original handwriting accurately: spelling, lineation, paragraphing, word division and punctuation follow the scribal hand exactly, while abbreviations are systematically expanded in italics.
The corpus has been automatically annotated with CLAWS7 (Constituent Likelihood Automatic Word-tagging System), developed by the UCREL team at Lancaster University (Garside 1987, Garside and Smith 1997),Footnote ¹ whose tagset includes more than 160 tags together with specific labels for the different punctuation marks (Romero-Barranco 2020).Footnote ²

The printed material comes from the Early English Books Online (EEBO) corpus, which contains a total of 755 million words from 25,368 texts from the period 1470–1700. Although the corpus covers a wide range of fields, such as literature, philosophy, history, religion, science and politics, the present case study relies exclusively on its scientific component so that all the data belong to the same text type. The EEBO corpus has been supplemented with a small collection of sixteenth- and seventeenth-century texts on pharmacy, botany, alchemy and medicine that we have compiled manually following the Málaga Corpus editorial model (Brunschwig 1528, Dodoens 1578, Ruscelli 1595 and Hartman 1696). The size and format of EEBO allow the examination of particular spelling features, word division included, from a chronological perspective, both in the middle and at the end of a line.

Compiling the data was a straightforward process. The Málaga Corpus contains a semi-diplomatic transcription of the original manuscript text, and a simple search sufficed to retrieve instances of word division in the middle of the line, since the modernized version of the corpus allows all the variant forms of a given lexeme to be generated. Line-final breaks, in turn, were retrieved automatically by searching for the hyphen (-) and the double hyphen (=), the punctuation symbols used to mark such breaks in handwritten texts. For printed texts, instances of division in the middle of a line were retrieved automatically through the Sketch Engine interface,Footnote ³ and in this case several searches were needed to cover the orthographic variation of Early Modern English. Line-final word division instances were then retrieved automatically by searching for the hyphen, which also marks these breaks in printed texts. Quantitatively, divisions in the middle of a line are expressed as percentages (%), while the distribution of line-final word splits is analyzed with normalized frequencies (n.f.).
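The retrieval of line-final breaks and the normalization of their counts can be sketched as follows. The sketch assumes a plain-text transcription with one manuscript line per text line and breaks marked by a trailing hyphen (-) or double hyphen (=); the function names and the basis of 10,000 words for normalized frequencies are illustrative choices, not those of the Málaga Corpus or of Sketch Engine.

```python
# Sketch: retrieve line-final word breaks from a line-by-line transcription
# and express their frequency per 10,000 running words.
# Assumptions: one manuscript line per text line; breaks marked by a
# trailing "-" or "=" on the first fragment.

BREAK_MARKS = ("-", "=")


def line_final_breaks(lines: list[str]) -> list[tuple[str, str]]:
    """Return (first fragment, carried-over fragment) for each marked break."""
    breaks = []
    for current, following in zip(lines, lines[1:]):
        tokens, next_tokens = current.split(), following.split()
        if not tokens or not next_tokens:
            continue
        last = tokens[-1]
        # Require at least one letter before the mark, so a stray "-" is ignored.
        if len(last) > 1 and last.endswith(BREAK_MARKS):
            breaks.append((last[:-1], next_tokens[0]))
    return breaks


def running_words(lines: list[str]) -> int:
    """Count words, treating the two fragments of a divided word as one word."""
    tokens = sum(len(line.split()) for line in lines)
    return tokens - len(line_final_breaks(lines))


def normalized_frequency(count: int, total_words: int, basis: int = 10_000) -> float:
    """Occurrences per `basis` running words."""
    return count / total_words * basis


if __name__ == "__main__":
    sample = ["the sy-", "newes of the mer=", "curye is good"]
    found = line_final_breaks(sample)
    print(found)
    print(round(normalized_frequency(len(found), running_words(sample)), 1))
```

Because `split()` counts the two halves of a divided word as separate tokens, `running_words` subtracts one token per break, so that the normalized frequency is computed over words rather than fragments.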

18.4.1 Word Division

Compounds are taken to be lexemes consisting of two independent lexemes (Bauer 2003: 40). Word division is described here on the evidence of nominal and adjectival compounds (e.g. headache, toothache, aquavitae, rosemary, lukewarm), compound adverbs (e.g. therewith, within, inward), reflexive pronouns (e.g. myself, himself, themselves) and other forms that are invariably written as two words in present-day English but were written together for some time in the history of English (e.g. shall be and as much). Figure 18.2 shows the distribution of nominal and adjectival compounds in the period 1350–1700, where the joined form is observed to proliferate over time. While the fifteenth century shows a marked preference for separating the two members of the compound (72.2 percent), the sixteenth century stands out as a transitional period, marking the progressive decline of the separated form (58.8 percent) and the rise of the joined form (38.2 percent). The seventeenth century shows the eventual standardization of the joined spelling, with a rate of 66.2 percent, while the hyphenated (17.7 percent) and split forms (16.1 percent) are much less frequent in handwritten texts. Figure 18.3, in turn, presents a similar picture in printed documents, inasmuch as split forms significantly predominate over joined forms in the sixteenth century (71.1 percent and 28.1 percent, respectively). In the seventeenth century, however, joined forms rise significantly (51.9 percent) and eventually outnumber split forms (40.8 percent). Even so, the proportion of split forms in print remains surprisingly high compared with handwritten documents (40.8 percent as against 16.1 percent).

Figure 18.2 Nominal and adjectival compounds in handwritten texts (%)

Figure 18.3 Nominal and adjectival compounds in printed texts (%)
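The percentages reported for the threefold representation of compounds and reflexives amount to counting joined, hyphenated and split spellings of a given pair of elements and dividing by their total. The following Python sketch shows one way of doing this with a regular expression on a single spelling of each element; the function name is hypothetical, and, given the orthographic variation of Early Modern English, real searches would need separate patterns for variants such as selfe or selues. Only spaces and tabs count as separators, so that line-final breaks are left to the separate line-final analysis.

```python
import re
from collections import Counter


def variant_shares(text: str, first: str, second: str) -> dict[str, float]:
    """Percentage of joined, hyphenated and split spellings of first+second.

    Only one spelling of each element is searched; variant spellings
    (e.g. "selfe") would need further calls or a broader pattern.
    """
    # Optional separator: absent = joined, "-" = hyphenated, spaces/tabs = split.
    # Newlines are not allowed, so divisions at the end of a line are not counted.
    pattern = re.compile(
        rf"\b{re.escape(first)}([ \t]+|-)?{re.escape(second)}\b", re.IGNORECASE
    )
    counts = Counter()
    for match in pattern.finditer(text):
        separator = match.group(1)
        if separator is None:
            counts["joined"] += 1
        elif separator == "-":
            counts["hyphenated"] += 1
        else:
            counts["split"] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {form: round(100 * n / total, 1) for form, n in counts.items()}


if __name__ == "__main__":
    sample = "take my self and my-self and myself and my self"
    print(variant_shares(sample, "my", "self"))
```

The word boundaries (`\b`) keep the search from matching inside longer words, so a form such as enemy self is not counted as a split my self.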

The standardization of these compounds is not systematic in all cases, as the adoption of the joined form seems to proceed differently across compounds. On the one hand, there is a group of compounds, such as aquavitae, rosemary and quicksilver, which are systematically written in their joined form at the beginning of the seventeenth century, with only minimal occurrence of the separated and hyphenated forms. On the other hand, another set of compounds is more resistant to the joined form and more likely to appear hyphenated well into the seventeenth century. These include the -ache compounds (e.g. headache, toothache) together with other combinations such as lukewarm.

Figures 18.4 and 18.5 show the threefold representation of reflexives in handwritten and printed documents, respectively. As far as handwriting is concerned, a similar trend is observed, with the eventual standardization of the joined form of the reflexive at the beginning of the seventeenth century. There is a marked preference for the split form of the reflexive throughout the fifteenth and sixteenth centuries, with rates of 76.9 percent and 67.9 percent, respectively, followed by the joined form (23.1 percent and 30.3 percent), while the hyphenated form is sporadic. In the seventeenth century, the results again show the rise of the joined form (46.1 percent), which outnumbers the split (39.1 percent) and hyphenated spellings (14.7 percent), although the split form of these reflexives remains considerably frequent in these texts. Printers, on the other hand, show a different attitude toward reflexives insofar as the split form is clearly preferred in both the sixteenth (94.1 percent) and the seventeenth century (68.7 percent). Even though the joined form of reflexives rises significantly in the seventeenth century (31.2 percent), the split form remains the more widespread one in printed documents.

Figure 18.4 Reflexive forms in handwritten texts (%)

Figure 18.5 Reflexive forms in printed texts (%)

There is also room for morphological variation in the distribution of reflexives depending on the person of the pronoun, with three clear diachronic tendencies. First and second person pronouns are systematically separated in Late Middle English (1350–1500) and Early Modern English (1500–1700) (e.g. my self, your self, our selves, your selves) in both handwritten and printed texts, with only occasional occurrences of the hyphenated and joined spellings (e.g. my-self, myself). Third person plural pronouns, in turn, follow another trend. Even though the split spelling is clearly preferred in the fifteenth century (e.g. them selves), the joined form begins to slightly outnumber the others in the sixteenth century, becoming the standard spelling at the turn of the following century (e.g. themselves). Third person singular pronouns lie somewhere between these tendencies, with a preference for the split form throughout the fifteenth and sixteenth centuries and the rise of the joined form in the seventeenth century, during which the two spellings, himself and him self, show a balanced distribution.

Figures 18.6 and 18.7 present the development of the spellings of the adverbs afterward, inward, outward, therewith and within, together with the preposition without, in handwritten and printed texts. These items turn out to be somewhat more advanced in the standardization process, with a systematic form already adopted in the sixteenth century. In the fifteenth century, adverbs are written either joined or separated, at rates of 57.8 percent and 42.1 percent, respectively. One century later, however, the split form declines, leaving these words with an unequivocal spelling. The joined form becomes general practice among penmen and printers in the sixteenth and seventeenth centuries. A different tendency is observed for lexemes which appear together without any apparent justification. This is particularly the case with combinations like asmuch and shalbe, the latter “found capriciously till the seventeenth century” (Denholm-Young 1954: 70).

Figure 18.6 Other adverbs and prepositions in handwritten texts (%)

Figure 18.7 Other adverbs and prepositions in printed texts (%)

Figure 18.8 presents the distribution of these items in handwritten texts. As shown, they developed irregularly, with drastic ups and downs over time. Interestingly, fifteenth-century scribes widely separate these lexemes, in 99.1 percent of the instances. The sixteenth century, however, witnesses the rise of the joined spelling, with 89.1 percent joined and just 10.7 percent separated instances. As in the previous cases, standardization seems to take place in the early seventeenth century, although here it is the split form that prevails, accounting for 85.9 percent of the examples. Printed texts, on the other hand, already favor the split spelling in the sixteenth century, with 81.8 percent split and 18.2 percent joined instances (Figure 18.9). By the seventeenth century, the printers’ avoidance of the joined form of these lexemes amounts to a consensus, with split spellings in 98.2 percent of the instances.

Figure 18.8 Development of shalbe and asmuch in handwritten texts (%)

Figure 18.9 Development of shalbe and asmuch in printed texts (%)

18.4.2 Line-final Word Division

An empirical analysis of line-final word division in early English prose must begin with a statistical overview of the phenomenon in OE, the period in which this practice originated, in order to test whether Anglo-Saxon scribes followed a regular set of patterns. The quantitative analysis refutes the argument of Hladký (1985a: 73) that the major principle determining line-final divisions in OE is fundamentally morphological. As shown in Table 18.1, phonological divisions outnumber morphological breaks: the former reach a normalized frequency of 86.97, the latter of 70.11. Anomalous boundaries are sporadic, at just 7.22.

Table 18.1 Type of division in OE (n.f.)

Phonological	86.97
Morphological	70.11
Anomalous	7.22
Total	164.31
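Assuming that n.f. stands for normalized frequency, that is, a raw count rescaled to a fixed amount of text, the figures in Table 18.1 can be related to raw counts and to one another as in the sketch below. The base of 10,000 words is an assumption, not a value stated in this section, and the function names are hypothetical. Applied to the three OE figures, the shares come to roughly 52.9, 42.7 and 4.4 percent; the components sum to 164.30, so the 0.01 gap from the printed total presumably reflects rounding.

```python
def normalized_frequency(raw_count: int, corpus_words: int, base: int = 10_000) -> float:
    """Rescale a raw count to a rate per `base` words (the base is an assumption)."""
    if corpus_words <= 0:
        raise ValueError("corpus_words must be positive")
    return raw_count * base / corpus_words


def shares(table: dict[str, float]) -> dict[str, float]:
    """Proportion of each category relative to the sum of all categories."""
    total = sum(table.values())
    return {kind: value / total for kind, value in table.items()}


# Normalized frequencies of division types in OE, as printed in Table 18.1
OE_TABLE_18_1 = {"Phonological": 86.97, "Morphological": 70.11, "Anomalous": 7.22}

if __name__ == "__main__":
    # A hypothetical text with 5 breaks of some type in 20,000 words
    print(normalized_frequency(5, 20_000))
    for kind, share in shares(OE_TABLE_18_1).items():
        print(f"{kind}: {share:.1%}")
```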

A previous study of line-final breaks in OE sheds light on the erratic distribution of the phenomenon among Anglo-Saxon writers, who were more concerned with making the most of the writing surface at the margin. Interestingly, phonological splits exceed morphological ones in MS Corpus Christi College 140, containing the Anglo-Saxon version of the Gospel According to St. Matthew (Mt for short), and MS Vitellius A.xv (Vit for short), housing the Beowulf manuscript and three prose tracts dated c. 1000 (Rypins 1998). By contrast, there is a substantial preference for morphological line-final divisions in MS Corpus Christi College 201, housing a mid-eleventh-century version of Apollonius of Tyre (AoT for short), despite its sharing the same dialect and chronology as Mt (Calle-Martín 2011a: 19–20). Although morphological splits are more widespread in the work of several OE writers, the available data do not corroborate the principle that morphological breaks prevail in the period, as these ultimately depend on the idiosyncratic preferences of individual scribes.

Phonologically speaking, as shown in Table 18.2, there is a major preference for the CV-CV rule, as in cire-niscan (AoT, li), and the C-C rule, as in nih-tes (Vit, fol. 103r, 16), although other possibilities arise depending on the word and the space actually available at the margin: (i) the V-V rule, as in farise-isce (Mt, xiv); (ii) the ST rule, as in fæs-tenu (Vit, fol. 112v, 17) or arce-strates (AoT, xxii, li);⁴ and (iii) the CL rule, as in wun-driende (Vit, fol. 106v, 9) and hreo-fla (Mt, vii).
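The phonological rules named above can be approximated by inspecting the letters on either side of the break. The sketch below is a hypothetical reconstruction inferred from the chapter's examples, not the coding scheme used by Calle-Martín: it treats <y> and <æ> as vowel letters, checks the more specific ST, CT and CL clusters before the general V-V, CV-CV and C-C patterns, and ignores morphology entirely, so a prefix break such as y-do would simply be labeled CV-CV.

```python
VOWELS = set("aeiouyæ")  # assumption: <y> and <æ> count as vowel letters


def is_vowel(ch: str) -> bool:
    return ch in VOWELS


def phonological_rule(left: str, right: str) -> str:
    """Label a line-final break `left-` / `right` with the rule it seems to follow.

    Hypothetical reconstruction from the chapter's examples; returns
    'other' for breaks that fit none of the patterns (e.g. heof-enum).
    """
    left, right = left.lower(), right.lower()
    if not left or not right:
        raise ValueError("both fragments must be non-empty")
    last, first = left[-1], right[0]
    second = right[1] if len(right) > 1 else ""

    # ST: the break touches an <st> cluster (fæs-tenu, arce-strates)
    if (last == "s" and first == "t") or right.startswith("st"):
        return "ST"
    # CT: the break falls between <c> and <t> (lac-tea)
    if last == "c" and first == "t":
        return "CT"
    # CL: the new line opens with consonant + liquid (wun-driende, hreo-fla)
    if not is_vowel(first) and first not in "lr" and second in ("l", "r"):
        return "CL"
    # V-V: a vowel on both sides of the break (farise-isce)
    if is_vowel(last) and is_vowel(first):
        return "V-V"
    # CV-CV: open syllable before a single consonant + vowel (cire-niscan)
    if is_vowel(last) and not is_vowel(first) and is_vowel(second):
        return "CV-CV"
    # C-C: a consonant on both sides of the break (nih-tes)
    if not is_vowel(last) and not is_vowel(first):
        return "C-C"
    return "other"


if __name__ == "__main__":
    for left, right in [("cire", "niscan"), ("nih", "tes"), ("farise", "isce"),
                        ("fæs", "tenu"), ("wun", "driende"), ("lac", "tea"),
                        ("heof", "enum")]:
        print(f"{left}-{right}: {phonological_rule(left, right)}")
```

The order of the checks is the main design choice: a break such as con-tractum satisfies both the C-C and the CL patterns, and the chapter files it under CL, so the cluster tests come first.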

Table 18.2 Phonological boundaries in OE (n.f.)

	Vit	Mt	AoT
CV-CV rule	59.8	51.6	1.5
C-C rule	50.5	42.7	6.1
V-V rule	0.9	1.9	–
ST rule	–	1.9	0.3
CL rule	0.9	–	–

(from Calle-Martín 2011a: 21)

From a morphological standpoint, as shown in Table 18.3, prefixation is usually the most frequent type of division in OE, followed by suffixation and composition. While prefixes outnumber suffixes in Vit and AoT, suffixation surpasses prefixation in Mt. Anomalous boundaries, in turn, are irregularly distributed in the OE period. In most cases, this irregular separation results from limited space at the margin of the folios: the scribe was, to some extent, forced to break the word at an irregular point, always provided that at least two letters were carried over to the following line, as in hlafo-rd (Mt, xxvi). In other cases, the distortion may be explained by the scribe's misinterpretation of the inflection, as in heof-enum (Mt, vi).

Table 18.3 Morphological boundaries in OE (n.f.)

	Vit	Mt	AoT
Prefixation	47.7	24.3	22.9
Suffixation	41.1	33.7	4.5
Composition	10.2	8.4	3.1

(from Calle-Martín 2011a: 21)

As illustrated in Table 18.4, phonological boundaries grow in importance in fifteenth-century handwritten documents, coinciding with the gradual decline of morphological divisions, which reach a normalized frequency of just 15.02, a figure only moderately above that of anomalous divisions (8.59) (Calle-Martín 2011a: 23; see also Hladký 1985a: 74–75, Hladký 1987: 137, Calle-Martín 2009: 40). Still, there are texts where line-final word division is subject to an array of random rules by which some words are liable to be broken almost anywhere (Calle-Martín 2011a; see also Calle-Martín 2009: 40). This is the case, for instance, of MS Peterhouse College 118 (P118 for short), which presents a significant number of anomalous splits.

Table 18.4 Type of division in ME (n.f.)

Phonological	151.88
Morphological	15.02
Anomalous	8.59
Total	175.49

From a phonological viewpoint, the CV-CV rule predominates, followed by the C-C rule (see Table 18.5). There are cases, however, where a consonant letter is spuriously added after the break, perhaps in an attempt to follow the C-C rule rather than produce an irregular split. In MS Hunter 328 and MS Sloane 340 (H328 and S340 for short), for instance, the scribes write vrin-nal (S340, fol. 49v, 10) and strang-gurie (H328, fol. 26v, 23) to avoid breaking a syllable at the end of the line, even where other breaks, such as the CV-CV rule in vri-nal or strangu-rie, would also have been possible. In addition, in contrast with OE, fifteenth-century scribes show a growing specialization, as seen in the slight increase in V-V splits, always provided that both vowels are pronounced, as in ve-ynes (S340, fol. 42v, 3);⁵ the more frequent use of the ST rule, as in dyges-tyon (S340, fol. 40v, 24); and the appearance of the CT rule, as in lac-tea (H328, fol. 11r, 25).

Table 18.5 Phonological boundaries in ME (n.f.)

	E2622	H328	S340	P118	H497
CV-CV rule	56.08	36.52	104.82	87.81	127.12
C-C rule	25.67	10.80	57.24	29.27	103.47
V-V rule	2.70	1.86	8.27	9.06	6.39
ST rule	–	0.37	5.51	–	0.52
CT rule	–	0.37	–	–	0.27

(from Calle-Martín 2011a: 25, 2009: 41–45)

From a morphological standpoint, Table 18.6 presents the distribution of morphological boundaries among fifteenth-century scribes, where suffixation is observed to outnumber prefixation in most cases (in P118 the two are equally frequent). The only exception is MS Egerton 2622 (E2622 for short), on account of its preference for prefixes, as in for-sohte (fol. 136v, 6), up-warde (fol. 155v, 8) or y-do (fol. 148r, 10).

Table 18.6 Morphological boundaries in ME (n.f.)

	E2622	H328	S340	P118	H497
Prefixation	3.37	1.11	8.96	6.75	5.56
Suffixation	1.35	6.70	15.86	6.75	8.90
Composition	1.35	1.49	0.68	–	4.45

(from Calle-Martín 2011a: 25, 2009: 46–47)

Table 18.7 presents the distribution of line-final breaks among sixteenth- and seventeenth-century penmen and printers, where phonological boundaries are considerably less frequent than in previous centuries. As shown, there is a higher tendency for line-final division in printed texts than in handwritten ones (a total of 89.39 vs. 55.19, and 74.21 vs. 36.78 for phonological divisions). Morphological division, in turn, is sporadic in both types of texts. Notwithstanding these general tendencies, there is also room for variation across individual texts, insofar as anomalous boundaries exceed morphological ones in two early-sixteenth-century compositions in particular, MS Ryland 1310 (R1310 for short) and the Booke of Dystyllacyon of Waters (Dyst for short; Brunschwig 1528).

Table 18.7 Type of division in EModE (n.f.)

	Handwriting	Printing
Phonological	36.78	74.21
Morphological	11.71	9.52
Anomalous	6.70	5.66
Total	55.19	89.39

As far as phonological divisions are concerned, Tables 18.8 and 18.9 show that the CV-CV and the C-C rules are, as expected, the most widespread boundaries in both handwriting and printing, with all the other rules lagging well behind. There are, however, exceptions in some texts. For instance, the C-C rule outnumbers the CV-CV rule in MS Ferguson 7 (FER7 for short), housing an early-seventeenth-century handwritten extract of The Secrets of Alexis of Piemont and A Niewe Herbal or Historie of Plants (AoP and NH for short, respectively; Ruscelli 1595, Dodoens 1578); this is interpreted as an erratic practice of the scribe, given the preference for the CV-CV rule in the printed versions of the same texts. Apart from the preference for CV-CV, both handwritten and printed sources show a higher level of specialization in view of the constrained distribution of the V-V rule, on the one hand, and the ST rule, on the other, as in indige-stion (R1310, fol. 3r, 34), ipo-stasis (R1310, fol. 11v, 22), Ma-sterwort (FPh, p. 26) and so on. Likewise, the CL rule revives in the period, becoming more frequent in printed compositions, NH and FPh in particular (Dodoens 1578; Hartman 1696). In handwriting, MS Hunter 95 (H95 for short) stands out on account of its relatively high frequency of CL breaks, as in Com-frey (fol. 1v, 55) and con-tractum (fol. 10r, 3).

Table 18.8 Phonological boundaries in EModE handwritten texts (n.f.)

	R1310	H135	H95	FER7	W6812
CV-CV rule	27.53	15.22	16.07	4.96	32.56
C-C rule	21.81	10.55	4.38	5.52	25.97
V-V rule	1.03	1.22	0	0.82	2.32
ST rule	5.71	1.96	0	0	2.32
CT rule	0.51	0.24	0	0.27	0.77
CL rule	2.07	1.96	5.84	0.55	2.71

Table 18.9 Phonological boundaries in EModE printed texts (n.f.)

	Dyst	AoP	NH	FPh
CV-CV rule	51.37	49.87	63.44	58.99
C-C rule	38.20	34.91	55.05	57.79
V-V rule	1.50	3.11	0	4.55
ST rule	4.32	0.62	2.33	2.87
CT rule	0.56	0	0	1.43
CL rule	3.38	4.98	10.73	9.35
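The FER7 exception noted above can be checked directly against the CV-CV and C-C rows of Tables 18.8 and 18.9. The snippet below is a simple illustration using the normalized frequencies as printed; the function name `exceptions` is hypothetical.

```python
# CV-CV and C-C rows (n.f.) from Tables 18.8 (handwritten) and 18.9 (printed)
CV_CV = {"R1310": 27.53, "H135": 15.22, "H95": 16.07, "FER7": 4.96, "W6812": 32.56,
         "Dyst": 51.37, "AoP": 49.87, "NH": 63.44, "FPh": 58.99}
C_C = {"R1310": 21.81, "H135": 10.55, "H95": 4.38, "FER7": 5.52, "W6812": 25.97,
       "Dyst": 38.20, "AoP": 34.91, "NH": 55.05, "FPh": 57.79}


def exceptions(expected: dict[str, float], rival: dict[str, float]) -> list[str]:
    """Texts in which the rival rule is more frequent than the expected one."""
    return [text for text, value in expected.items() if rival[text] > value]


if __name__ == "__main__":
    print(exceptions(CV_CV, C_C))  # ['FER7']
```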

Finally, morphological divisions are more erratic, with no standard practice across penmen and printers. Tables 18.10 and 18.11 present the distribution of morphological boundaries in handwritten and printed documents, respectively. While suffixation sharply outnumbers composition in MS Wellcome 6812 (W6812 for short), composition surpasses suffixation in H135. Prefixation, except in H95, becomes almost nonexistent in handwritten texts. In printed documents, by contrast, composition is generally the most widespread type (the only exception being FPh, where suffixation is preferred), followed by suffixation and prefixation, the latter negligible across all the texts.

Table 18.10 Morphological boundaries in EModE handwritten texts (n.f.)

	R1310	H135	H95	FER7	W6812
Prefixation	1.03	0.98	2.92	0.55	0.77
Suffixation	4.15	3.92	4.38	2.20	15.50
Composition	4.15	7.61	1.46	2.76	5.42

Table 18.11 Morphological boundaries in EModE printed texts (n.f.)

	Dyst	AoP	NH	FPh
Prefixation	0.37	1.24	0	1.91
Suffixation	1.31	4.98	3.73	12.94
Composition	2.63	6.85	8.39	6.71

18.5 Some Follow-up Thoughts

Since the analysis section has been quite intense and data-driven, let us pause for a minute and take stock of the purpose of the present contribution before moving on to some follow-up thoughts. The core section of the chapter has been concerned with the emergence of spacing in the Middle Ages and its development in the Early Modern English period. After a brief description of the writing material and the arrangement of the text on the writing surface, the chapter has focused in greater detail on spacing as applied to words broken in the middle and at the end of a line, as these are the two environments where both scribes and printers had to make choices about separating a word. To this end, two case studies were offered, covering both handwritten and printed sources. As far as handwritten material is concerned, the data were drawn from the two components of The Málaga Corpus of Early English Scientific Prose, namely The Málaga Corpus of Late Middle English Scientific Prose (for the period 1350–1500) and The Málaga Corpus of Early Modern English Scientific Prose (for the period 1500–1700). As for the printed material, the analyses have relied on the scientific component of Early English Books Online together with other sixteenth- and seventeenth-century scientific texts, providing fresh data to evaluate printers’ attitudes toward word division at the time.

Word division in the middle of the line has been explored in light of the evidence provided by nominal/adjectival compounds, compound adverbs, reflexives and words which, although rendered unequivocally as two independent lexemes in present-day English, are irregularly found together in the history of English. The orthographic standardization of these words began in the sixteenth century and, after a period of competition between the joined and split forms, it was not until the seventeenth century that the solid version of these forms seems to have been adopted. Notwithstanding this, reflexives show an unexpected development in that the joined form is the most frequent one in seventeenth-century handwritten documents, while the split form is still the dominant practice in printed texts. The shape of independent lexemes such as shall be and as much, however, was deemed to be the result of scribes’ and printers’ choices, insofar as they were mainly separated in the fifteenth century but overwhelmingly joined in sixteenth-century handwritten texts. Line-final word division was discussed in terms of the typology of breaks, whether morphological, phonological or anomalous. While Old English scribes broke words erratically to make the most of the writing surface, a more standard practice was observed among medieval and Early Modern English penmen. Phonological divisions gain importance in the period 1500–1700, with a marked preference for the CV-CV and the C-C rules, followed by the V-V rule. Morphological breaks, in turn, become sporadic, with suffixation preferred over prefixation and composition. This rationale also seems to be the dominant pattern for breaking words at the end of a line in early printed texts, where both the CV-CV and the C-C rules are systematically adopted by printers.

With that said, what can our readers take from our case studies and from our chapter as a whole? We believe that they testify to how little can be established, from a scientific, linguistic point of view, about orthographic distribution as a whole. Through research, however, one can learn more about the practical aspects of conducting empirical work on something as little explored as word separation, in the hope of reaching some encouraging generalization about at least one area of orthographic distribution. The study of word separation in historical documents is at times painstaking for the analyst in view of the number of irregular breaks scattered throughout the text, with no explanation other than the need to make the most of such an expensive writing surface as parchment or paper. In the absence of a standard pattern among scribes and printers, one of the methodological problems with studying word separation across time lies in the selection of data, which may, to some extent, bias the results. Differing attitudes toward word division are commonplace in the historical record, and some of the texts under scrutiny have illustrated this same point. In spite of this shortcoming, the study of the phenomenon in handwritten texts also sheds light on scribal attitudes toward word division in the medieval period, when both regular and irregular practices are found depending on the hand involved. The Renaissance, and the printing press in particular, marked the beginning of a new era in which both scribes and printers were increasingly committed to a more standard practice in the middle of the line – with a growing preference for solid forms toward the seventeenth century – and at the end of the line – with the increasing adoption of phonological divisions throughout the early modern period, the CV-CV and the C-C rules in particular.

Even though word division in the middle of the line was practically standardized toward the end of the seventeenth century, line-final word division still awaits the attention of other scholars to provide a more convincing picture of the phenomenon after the arrival of printing. This chapter has shown mostly one side of the coin, and there is still a long way to go in gathering evidence from other periods, genres and text types. The seventeenth century was crucial in the development of word division as a result of the printers’ decisive contribution. Studying the phenomenon from a community-of-practice perspective would therefore open new doors to assessing the role of different printing houses in the dissemination of sixteenth- and seventeenth-century practices of word division. Tyrkkö (2020: 70) argues that “printers formed a tight-knit professional community where new innovations, or deviations from current standards, were immediately noticed”, and the community-of-practice perspective “is a valuable one to take when it comes to Early Modern printing and thus to spelling standardisation” (Tyrkkö 2013).

To wrap up, then, where might future research start in order to better understand orthographic distribution and its relation to word division? We believe that a diachronic study is a desideratum to reconsider the actual contribution of seventeenth-century printers, the role of eighteenth- and nineteenth-century prescriptive grammars and the eventual configuration of the phenomenon as it now stands in present-day English, worldwide varieties included. Genre and text-type variation would also be a revealing line of research in light of the evidence provided by, for instance, magazine and newspaper material, which may well have pioneered the standardization process in comparison with the timid contribution of text types such as fiction or science, among others.

18.6 Conclusion

The present chapter has provided an overview of issues relating to orthographic distribution and has then discussed the emergence of spacing in early English writing, considering the attitudes of both scribes and printers toward word division in the middle and at the end of the line. The two case studies have revealed a degree of orthographic variation throughout the fifteenth and sixteenth centuries, when the phenomenon was found to depend mostly on the individual preferences of scribes. The seventeenth century, in turn, brought some fresh air to the issue with the progressive adoption of solid forms in the middle of the line and phonological divisions at the end of the line. This trend, however, was only incipient at the turn of the seventeenth century, which is still too early a date to propose any kind of orthographic standardization in the writing of these words. The case studies have, we hope, improved our knowledge of word division in early English and have provided a methodological framework for the study of word division across time. The topic surely awaits the future insight of other scholars to elucidate the moment and the forces which contributed to the eventual standardization of line-final word division in English. It is from relatively narrow areas of empirical work that, we believe, useful generalizations about such a big topic as orthographic organization can be drawn in the future.

19 Comparative and Sociopragmatic Methods

19.1 Introduction

From the outset of philological inquiry into texts from historical stages of languages, interest in text materiality (discussed in Chapter 15, this volume) has paved the way for the development of paleography, historical phonology and text criticism. In each of these fields, the graphic signs themselves, as well as their systems, have consistently played a subordinate role. It is only recently that writing systems, orthography and punctuation have become truly independent research areas, whose consolidation continues, given that research on them remains dispersed in case studies across various disciplines. According to Coulmas (2013: 17–18), a writing system can, in this context, be understood as a set of graphic signs (scripts) used to encode a language in written form, while orthography is a set of scripts, together with a set of rules, that regulates its usage.¹

The first attempts at systematizing script types were made in paleography (e.g. Karskiĭ 1901, Zhukovskai͡a 1955; see also Chapter 15, this volume), an integral part of historical science, which was to introduce a divide between historical science and philology. If we examine how language histories have been written over time, an endeavor that began at the same time as paleography, the history of writing systems was by and large excluded from them because, on the one hand, it did not fit into the structuralist concept of language and, on the other, it already belonged to another discipline. However, it must be borne in mind that, as a practice, paleography was more interested in the shape of scripts, and their inventory was described from this perspective, which Zalizni͡ak refers to as calligraphy in a broader sense (Zalizni͡ak 2002a [1979]: 560–61).

Subsequently, the divide in text criticism between historical science and philology deepened as separate editorial canons were developed for the needs of historical science and of philology, even though the very same texts were often objects of interest to both disciplines. Ultimately, this resulted in contradictory solutions for interpreting and representing text materiality, particularly regarding text normalization and modernization in historical editions versus literal (diplomatic) reproduction in philological editions, which limited the interoperability of editions across the two disciplines (Piotrowski 2012: 19, Sahle 2013: 107–10, 143–67, 225–53). The editor’s vision, as the basis of final decisions about the shape of the edited text, often led to editorial interventions for clarity’s sake, notably concerning orthography. This procedure deviates from the original text and impedes an appropriate graphematic analysis (Černá-Willi 2012: E5–E7).

Philological editions were mostly used in diachronic phonological research, where the scripts were interpreted as graphic representations of phonemes; this was the reason for the faithful representation of the spelling used in source texts. However, the authors of some studies distanced themselves from the prevailing paleographical practice; instead, they described the set of graphemes and their corresponding functions as used in the language system under scrutiny. The phonological and graphemic reconstruction of linguistic proto-stages was based on structuralist oppositions: phoneme ~ allophone; grapheme ~ allograph (Lisowski 2001, Marti 2014, Stadnik-Holzer 2014). Research topics such as capitalization, punctuation and text structure, however, remained outside the scope of linguistic research for a long time, because the modernization practice in historical editions rendered them invisible, while philological editions frequently followed the same principle in order to ensure ease of reading. Ultimately, it was paleography that completed their description (e.g. Karskiĭ 1901: 232–57). From the 1990s on, linguistic interest in these research areas has burgeoned; its potential, however, has still not been fully realized.

Alongside a functional description, a description of users’ concepts of writing systems and their attitudes toward them had already been outlined in the late 1970s (Zalizni͡ak 2002 [1979]). The user-based turn was motivated by the subjectivity of usage in and of writing systems and strove for a description of the system actually used, which obviously deviated from the prescribed one (Zalizni͡ak 2002a [1979]: 566–67). Ultimately, such considerations were to open up a vast field of sociolinguistic and sociopragmatic research into the usage of writing systems (see Section 19.3). Besides descriptive studies that furnish an overview of writing systems in particular sources and, more broadly, languages, a comparative paradigm was deployed in more advanced studies. On the one hand, this approach provides a positivist exploration of similarities and divergences in graphic representation within single texts or between several texts. On the other, it can provide access to the sociolinguistic and sociopragmatic variables that explain orthographic consistency or variation and, paired with appropriate digital mark-up, can be used to calculate and measure regularity in variation. The starting point for this development can be observed in text criticism, where a holistic method for studying texts was used, involving both an in-depth investigation of the text and a thorough reconstruction of its extralinguistic context.

In this chapter, comparative and sociopragmatic methods are illustrated with material from Slavic languages written in early modern times, when their writing systems had already developed to a great extent, thus following, albeit sometimes with some delay, the common European trend. Traditionally, Slavic studies have shown considerable interest in writing, orthography, and variation in and of writing systems. And yet, as in other philologies, this research has not been carried out within a specific sociolinguistic or sociopragmatic paradigm, even though it inevitably touches upon important concepts within it. This chapter also serves as a summary of trends therein and is structured as follows.²

First, the most important directions in comparative studies on Slavic writing systems and orthographies are presented. Second, the theoretical preliminaries in sociopragmatics and the deployment of this framework for research on writing systems are discussed, since sociopragmatic concepts have largely been developed without taking writing systems into consideration. Then, pragmaphilology is presented as a promising direction that has recently emerged within Slavic studies, and it serves as an example of the shift from the initial paleographic interest in linguistic material to studying variation in and of several writing systems. To conclude, the chapter summarizes the criticism of both methods, evaluates their impact on future development within the discipline, and maps out prospective directions for how best to adopt those methods.

19.2 Comparative Method: Exploring Variation

Given the relative scarcity of historical sources, the comparative method has predominantly been adopted for small-scale studies with a high zoom-in effect. These have resulted in detailed explorations and presuppose the existence of a certain number of comparable studies in order to provide sufficient evidence for verified generalizations (for further considerations, see Section 19.5). The comparative method is applied to selected sources and is used to explore variation, either within a single text or between several copies of a text, in order to identify individual, small-scale (e.g. communities of practice) or institutionally driven (e.g. chanceries, printing houses) alternation between variants (see Auer and Hinskens 2005: 336).³

This approach requires serial, influential texts, such as manuscript copies or book reprints. In research on writing systems and orthographies, the comparative method has so far been applied to various versions of the Bible (for Bibles printed in Czech, see Fidlerová et al. 2010) and to ecclesiastical literature such as Joannes de Caulibus’s Meditationes vitae Christi (for translations printed in Polish, such as Baltazar Opiec’s Żywot Pana Jezu Kristu, see Lisowski 2001, Bunčić 2012). The advantage of focusing on such texts lies in their wide circulation and their apparent, frequency-induced influence on the establishment of orthographic norms across society, whether officially established or socially agreed upon. Both factors underpin the significance of serial sources for linguistic and cultural history.

The vast majority of serial texts in the vernacular languages of the Middle Ages and early modern times were originally translations (mainly from Latin). However, this key factor was not always taken into consideration in earlier philological scholarship, which led to analyses of target texts in isolation, without recourse to the source texts. In translation studies, meanwhile, comparison between source text and target text, as well as between different translations of one and the same text, became an established analytical method. There, interest in the quality of the target text’s translation and in the cultural transfer of the source text led to a focus on the mechanisms of cultural accommodation between the cultures in contact. Moreover, research on these mechanisms encompassed not only the motivation behind a translation but also the aesthetic and ideological programs that explain interpretative deviations between source and target text, as well as between translations by different authors.

The impact of the linguistic shape of the source text on the target text (linguistic interference), particularly in medieval and early modern translations, became obvious as one consequence of cultural translation from the learned languages into the vernaculars; this led to the unification of linguistic/philological research methods with those employed in translation studies (see Zemenová 2011, Lazar 2018, Maier and Shamin 2018). This interdisciplinary research has primarily concentrated on lexical and morphosyntactic interference; the study of writing systems and orthographies was thereby reduced to observing orthographic ‘mistakes’, which were interpreted as the hasty or inattentive writing of inexperienced writers. Notwithstanding the existence of such irregularities, future studies of orthographic interference in medieval and early modern translations should provide fresh insights into the development of, and interrelationships between, European writing systems, in which micro- and macrolinguistic variation remains to be explored.

At the microlinguistic level, a careful and systematic examination of the orthographic practices of influential literate individuals can exemplify the practices of the social groups over which they wielded influence, particularly in language-learning situations. This approach often reveals the consistent writing principles of an expert writer.Footnote ⁴ This was the case with the 1607 Russian-German phrasebook and dictionary compiled by Tönnies Fonne, examined by Hendriks (2014: 81–138).Footnote ⁵ Fonne’s phrasebook followed the tradition of phrasebooks for German-speaking merchants trading in Eastern Europe, where East Slavic vernaculars, including Russian, were spoken. These phrasebooks were used for language learning, helping their users to develop essential speaking, writing, reading and cultural competence, and they are considered to have had a considerable reach.

Macrolinguistic variation in writing systems and orthography can be productively explored by examining serial texts from different regions. In this case, not only common features but also regionally induced divergence between related writing systems can be explored. This approach is exemplified by fifteenth-century Czech and Old Slovak translations of the German municipal law used in East Central European towns.Footnote ⁶ In these closely related varieties, the onset of an orthographic change from simplified and digraphic orthography toward diacritic orthography has been documented after the Hussite Wars (approximately 1419–34).Footnote ⁷ The shift toward diacritic orthography, which took place while other orthographies remained in simultaneous use, was motivated by two extralinguistic factors:

(1) Writing in, and translating into, Slavic varieties, together with the appropriate use of spelling, was part of the Hussite ideological program; hence, translation activity and the use of diacritic orthography were a sign of affiliation with this movement.
(2) In times of uncertainty, the towns sought the symbolic re-establishment and legitimation of German law through Slavic translations, which were compiled using the new and more progressive orthography.

The study found diacritic orthography used alongside simplified and digraphic orthography in the Old Slovak translation, whereas the Czech translation shows, with some exceptions, the exclusive use of simplified and digraphic orthography. This divergence arose from the relative chronology of the translations of German law in the two regions: in Bohemia, Czech translations were already appearing at the beginning of the fifteenth century (the pre-Hussite period), whereas the Old Slovak translations in Upper Hungary date to the mid-fifteenth century (the post-Hussite period). The later onset of a Slavic written tradition in Upper Hungary facilitated the adoption of diacritic orthography, while in Bohemia the established translation and orthographic tradition hindered the breakthrough of the innovative, progressive orthography (Lazar 2016: 193–96). As this study shows, several types of variation (in this case, diachronic and diatopic) may be explored at once.Footnote ⁸ Further exploration of the orthographic principles applied in other languages or varieties will help to group European writing systems according to shared orthographic principles that cross linguistic boundaries and to explore the systemic similarities and differences between them.

The comparative method is also frequently used to explore and compare the writing systems found in texts written or printed on various media and surfaces (see Franklin 2019: 1–2, Rozhdestvenskai͡a 1992, Schaeken 2019).Footnote ⁹ The interplay between the varied materiality of texts and writing systems has been explored mainly for the earlier stages of language development, where linguistics collaborates with archaeology and with auxiliary historical sciences such as sphragistics, numismatics and epigraphy, which deal with texts written on seals, coins, church walls and so on (see Section 19.4). For later language stages, such collaboration is usually less intensive because of the substantial growth in texts preserved on traditional media (manuscripts and prints). Sources written during the transition from the Middle Ages to early modern times have attracted particular interest among researchers across linguistic specializations, since the medial shift from manuscript to print constituted a seminal event in the history of European writing systems and orthographies. This scholarship has examined similarities and discrepancies between the writing systems used in different media and analyzed the motivation for diverging graphemic choices (Lisowski 2001, Fidlerová et al. 2010, Bunčić 2012). However, the steadily growing interest in the role of written language in shaping urban space, as evidenced by the studies on the beginnings of written languages mentioned above, has revived synthetic approaches to the writing systems and orthographies displayed in urban spaces.
Diachronic studies of linguistic landscapes (see Pavlenko 2010, Pavlenko and Mullen 2015) involve interdisciplinary approaches and multimodal corpora; they take into consideration the complete medial and material range of written texts, including pictures of billboards, information signs, shop names, street names, commemorative plaques, graffiti and so on.

19.3 Theoretical Preliminaries in Sociopragmatics

The sociopragmatic paradigm in historical linguistics dates back to the mid-1990s and is usually divided into two branches: diachronic pragmatics and pragmaphilology (Taavitsainen and Fitzmaurice 2007: 13–15). As its name suggests, the discipline unites two subdisciplines, pragmatics and sociolinguistics,Footnote ¹⁰ whose boundaries are elastic. Whereas historical linguistics aims to describe earlier language stages and to explain the causes of linguistic change and the genetic relationships between languages, sociopragmatics also takes into account the historical conditions of text production, transmission and reception, in order to explain language change and to reconstruct the meanings that texts conveyed in those settings (Taavitsainen 2012: 1464). Despite the historical distance, this perspective enables an appropriate interpretation of historical texts. This holds equally for historical studies of writing systems and orthographies, since their development and change depend on sociolinguistic and sociopragmatic variables (Baddeley and Voeste 2012a: 11).

The development of historical sociopragmatics has been facilitated by placing conceptually oral texts in the spotlight (Koch and Österreicher 1994: 587, 2008: 199–203),Footnote ¹¹ that is, plays, dialogues, courtroom records, textbooks and phrasebooks, as well as ego-documents (letters, diaries, postcards, notes and so on),Footnote ¹² writings that often give access to an ‘invisible’ language history ‘from below’ (Elspaß 2007: 2–3, Taavitsainen and Fitzmaurice 2007: 18–21, Taavitsainen 2012: 1466–67, Havinga and Langer 2015: 2–5).Footnote ¹³ A significant characteristic of these texts is their heterogeneity at multiple levels of the language system; one needs to be aware, however, of the constraints imposed by genre-related encoding practices (for more details, see Taavitsainen and Fitzmaurice 2007: 18). The development of appropriate concepts depends largely on the available sources, which explains the abovementioned production of one-off case studies that often provide an insufficient basis for generalization. English is an exception, as the available corpora cover its entire history and are relatively representative compared with those compiled so far for other languages (Nevalainen 2012b: 1442). Hence, the following presentation of both branches, and of the research frameworks used in them, is based on representative case studies that demonstrate their practical applications, advantages and prospects.

19.4 Pragmaphilology Meets Diachronic Pragmatics

The differences between the philological and the historical sociopragmatic approach can be traced in the history of research on the Novgorodian birchbark letters: short texts of a utilitarian character that archaeologists have been finding in Novgorod, in northern Russia, from 1951 to the present day. Dating from between the eleventh and fourteenth centuries, these texts were written on strips of birchbark in an East Slavic variety known as the Old Novgorodian dialect. Initially, their linguistic interpretation served as an auxiliary to historians, who needed the letters segmented into words and sentences and their difficult, often obscure contents explained. In parallel, the significance of these texts for the history of the Slavic languages came to be acknowledged, and the first structuralist descriptions of the birchbark letters, which also include descriptions of their paleography and phonology, were published (Borkovskiĭ 1955). The writing system used in the birchbark letters differs from that used on parchment; earlier research even judged it a writing system that made do with fewer characters and ascribed it to the less educated strata of society (Avanesov 1955: 80–81; compare the criticism in Zalizni͡ak 1986: 93, 104, 217).

This particular writing system and its orthography have been described in great detail, notably by Zalizni͡ak (1986, 2002b). He demonstrated that both the writing system and the orthography of the birchbark letters followed a given set of rules and, moreover, that their discrepancies from the writing system and orthography used on parchment were not random (Zalizni͡ak 1986: 96–97). In line with this finding, observations on line-final word division, which reflect orthographic conventions and text layout, show a shift from lines ending exclusively in an open syllable, until c. 1350, to the possibility of dividing a word so that a line could also end in a closed syllable, after 1350 (Schaeken 1995: particularly pages 101–2).Footnote ¹⁴ In Novgorodian society, there was a dichotomy between a bookish and an everyday writing system and, to a lesser extent, between two orthographic systems. As shown below, these two orthographies had a social function. In structural terms, Zalizni͡ak (2002b: 594–95) speaks of a default and a non-distinguished writing system. Within the default writing system, the letters <ъ> /ø/ and <о> /o/, and <ь> /ʲø/ and <е> /e/, were kept distinct, while within the non-distinguished writing system they were interchangeable; moreover, intermediate writing systems sharing features of the default and the non-distinguished system existed. Within the non-distinguished system in particular, this interchangeability could be more or less systematic (Zalizni͡ak 1986: 100–5).
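To make the structural contrast concrete, the following Python sketch compares two spellings under each system. It is a hypothetical illustration, not Zalizni͡ak’s procedure: it assumes, as a simplification, that the non-distinguished system merges <ъ> with <о> and <ь> with <е> in every position and that nothing else varies. The helper names and the toy spellings are invented for illustration.

```python
# Hypothetical sketch: simplified model of the two Novgorodian systems.
# Assumption: in the non-distinguished system <ъ>~<о> and <ь>~<е> are
# interchangeable everywhere; no other letters vary.
COLLAPSE = str.maketrans({"ъ": "о", "ь": "е"})


def collapse(spelling: str) -> str:
    """Map a spelling to a key in which <ъ>/<о> and <ь>/<е> are merged."""
    return spelling.lower().translate(COLLAPSE)


def equivalent_non_distinguished(a: str, b: str) -> bool:
    """True if a and b differ at most in the interchangeable letter pairs."""
    return collapse(a) == collapse(b)


def equivalent_default(a: str, b: str) -> bool:
    """In the default system the pairs are distinct, so only identity counts."""
    return a.lower() == b.lower()


# Toy spellings chosen for illustration only.
print(equivalent_non_distinguished("поклонъ", "поклоно"))  # True
print(equivalent_default("поклонъ", "поклоно"))            # False
```

A real study would need to model the intermediate systems and position-dependent behaviour that Zalizni͡ak describes; the sketch only shows why the same pair of spellings counts as one form in one system and as two in the other.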

The choice of writing system depended not only on the author’s referential perspective on the particular situation but also on the intended sustainability of a piece of writing (i.e. the time span it was meant to serve). As Franklin (2002: 40) observes, “[s]cribes of parchment manuscripts kept half an eye on eternity; senders of birchbark letters would hardly have counted on the prying persistence of future archeologists.” Following Koch and Österreicher’s model of communicative distance and immediacy (Koch and Österreicher 1994: 588, 2008: 201), two referential perspectives can be posited: an immediacy perspective and a distance perspective. The two perspectives are distinguished by the clusters of features associated with each, as shown in Table 19.1.

Table 19.1 Clusters of features in Novgorodian writings with different authors’ referential perspectives

Features	Immediacy perspective	Distance perspective
Writing material^a	Birchbark, church walls, everyday items, etc.	Parchment, ritual liturgical items
Writing system	Everyday (non-distinguished)	Bookish (default)
Orthography	Everyday	Bookish
Formulaic/formulation conventions	An everyday set	A bookish set
Conveyance practice	Conceptually oral	Conceptually written

^a On the use of different items and surfaces for writing, see Rozhdestvenskai͡a 1992: 152–55, Schaeken 2019: 44–47, Franklin 2019: 1–18.

However, the immediacy and distance perspectives were not strictly separated, which may explain idiosyncrasies within the material clusters (Schaeken 2011b: 354–58, 2019: 49–53). Epigraphic inscriptions on church walls are a good example of this phenomenon: they show that writers could alternate between everyday and bookish orthography. This choice depended primarily on the function the writer intended. It would be simplistic to claim that the writing material determined the choice of orthography (see Bunčić 2016: 137). Yet birchbark as a material and everyday orthography clearly co-occur regularly (see Schaeken 2019: 49–50).Footnote ¹⁵ Hence, functionality (sustainability) was decisive in the choice between an everyday and a bookish orthography. As this summary of research on birchbark letters shows, the description of text materiality within paleography paved the way for the description of writing materiality (writing system and orthography) from the perspective of historical linguistics. The example of the birchbark letters demonstrates that the use of diorthographia in Novgorod could only be explained by reconstructing the sociopragmatic context of its use. Without a sociopragmatic perspective, the birchbark letters were interpreted merely as unlearned writings. Contextualizing their usage as a strategy in situations requiring an immediacy or a distance perspective revealed a competent and conscious user who could readily switch between two writing systems.

Further pragmaphilological research on birchbark letters has indeed confirmed the existence of this type of competent writer. Over the last two decades, the writing and delivery of the birchbark letters have been reconstructed in terms of the participants’ communicative roles, including the role of the messenger who delivered the letter, read the message aloud to the addressee, and later returned to the original sender with an answer (Gippius 2004, Gippius and Schaeken 2011, Schaeken 2019: 141–85). The structure of the letters was subsequently revealed to be a polyphony of references to the author, the messenger, the addressee and other individuals participating in the communicative situation. These findings led to the reinterpretation of the contents of a range of birchbark letters and showed that sociopragmatic analysis is necessary for an adequate historical interpretation of these sources. In several birchbark letters, switching between the default writing system and a ‘corrected’ non-distinguished writing system marked a different referential perspective and therefore served to structure the text (Schaeken 2011b: 354–58).Footnote ¹⁶ Describing the birchbark letters as conceptually oral texts “with limited interference of specific genre conventions” (Schaeken 2011a: 10) that unite several referential perspectives revealed the motivation behind the use of several writing systems.

19.5 Criticism, Impact and Perspectives

The issues facing historical orthography are largely those that anyone working in historical linguistics is likely to encounter: the general sparsity of linguistic resources, their restricted availability and their selectiveness. These factors have inhibited digital research in historical orthography by producing unbalanced corpora, even though the automatic recognition of manuscripts and prints has already reached a high standard and has at least facilitated text acquisition. Research on historical orthography in a digital environment therefore remains limited to accessible corpora. The range of languages involved, however, has grown, increasing the diversity of languages available for research.

The shift toward digitization in the humanities has expanded the range of research methods from hermeneutic to computer-based ones and has ultimately promoted more comprehensive research into writing systems from a big-data perspective. These new methods have underscored the need to encode characters unambiguously and consistently in digital representations of analogue texts so that they can be retrieved by digital applications (Piotrowski 2012: 11). However, the most common practice in text criticism and corpus linguistics involves modernizing orthography for the sake of standardized retrieval (see, for example, DIAKORP within the Czech National Corpus, CNC). Recently, this trend has changed: interest in spelling and orthography in linguistics has burgeoned, and the TEI encoding initiative has offered a practicable way of tagging spelling irregularities and of analyzing them systematically. A multilayer representation of content in a digital environment is a powerful instrument that appears to resolve these discrepancies; it includes, but is not restricted to, interlinking transcripts and facsimiles. Such a representation enables the validation of the transcriber’s solutions against the facsimile, as well as the indication of alternative readings proposed by the transcriber in cases of uncertainty (Vertan 2018: 55). Notwithstanding all the advantages of this instrument, a substantial amount of research on both writing systems and orthography is still carried out manually.
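As a rough illustration of such multilayer encoding, the sketch below reads a TEI fragment in which each variant spelling is wrapped in a <choice> element holding an original form (<orig>) and a regularized form (<reg>), and rebuilds either layer with Python’s standard ElementTree module. The fragment and its regularizations are invented for illustration, and the helper functions are hypothetical; actual editions may instead use <sic>/<corr> or other TEI mechanisms.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
NS = {"tei": TEI_NS}

# Invented TEI fragment: each <choice> pairs an original spelling with a
# regularized one, so both layers are kept in a single file.
SAMPLE = (
    f'<p xmlns="{TEI_NS}">'
    "<choice><orig>geden</orig><reg>jeden</reg></choice> "
    "<choice><orig>czlowiek</orig><reg>člověk</reg></choice>"
    "</p>"
)


def layer(elem, keep):
    """Rebuild the text of elem, keeping only the <keep> child ('orig' or 'reg') of each <choice>."""
    parts = [elem.text or ""]
    for child in elem:
        if child.tag == f"{{{TEI_NS}}}choice":
            chosen = child.find(f"tei:{keep}", NS)
            if chosen is not None:
                parts.append("".join(chosen.itertext()))
        else:
            parts.append(layer(child, keep))  # recurse into other markup
        parts.append(child.tail or "")
    return "".join(parts)


def variant_pairs(root):
    """List (original, regularized) spelling pairs for systematic analysis."""
    return [
        (c.findtext("tei:orig", namespaces=NS), c.findtext("tei:reg", namespaces=NS))
        for c in root.iter(f"{{{TEI_NS}}}choice")
    ]


root = ET.fromstring(SAMPLE)
print(layer(root, "orig"))  # geden czlowiek
print(layer(root, "reg"))   # jeden člověk
print(variant_pairs(root))
```

Retrieval can then run on the regularized layer while the original spellings remain available for orthographic analysis, which is the point of keeping both layers rather than modernizing the text outright.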

Similarly, as already highlighted in Section 19.1, historians are interested not in orthographic variation but in content, and sometimes in text structure. In his comprehensive legal analysis, for example, Mikuła (2018: 24) points out the importance of variation in text structure, since it may convey key changes in legal norms or in their interpretation. Yet Mikuła also notes that language variation at other levels is left out of his analysis. Jamborová (2010), however, clearly showed that orthographic variation may be distinctive for lexical meaning and text interpretation, contrasting město (in the sense of civitas) and miesto (in the sense of locus) in fifteenth-century Czech legal codices. Interestingly, restricted orthographic variation, or its absence, may also indicate formalization as opposed to variable, informal contexts: such is the case of the n-gram geden czlowiek ‘one man/somebody’ in Middle Czech legal sources (Lazar forthcoming).
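The observation that restricted variation may signal formalization suggests a simple quantitative check. The Python sketch below is a hypothetical illustration, not the method used in Lazar (forthcoming): it summarizes how many distinct spellings a fixed phrase shows and how dominant its most frequent spelling is. The attestation lists are invented and are not drawn from Middle Czech sources.

```python
from collections import Counter


def variation_profile(attestations):
    """Summarize the spelling variation of one n-gram across its attestations."""
    if not attestations:
        raise ValueError("need at least one attestation")
    counts = Counter(attestations)
    top_spelling, top_count = counts.most_common(1)[0]
    return {
        "tokens": len(attestations),
        "distinct_spellings": len(counts),
        "dominant_spelling": top_spelling,
        "dominant_share": top_count / len(attestations),
    }


# Invented attestations: a formulaic context with one fixed spelling
# versus a freer context with several competing spellings.
formulaic = ["geden czlowiek"] * 4
informal = ["geden czlowiek", "jeden czlowiek", "geden czlowiek", "gieden clowiek"]

print(variation_profile(formulaic))
print(variation_profile(informal))
```

Few distinct spellings and a high dominant share would be consistent with a formalized context, but any threshold would have to be justified against the corpus and genre in question.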

To verify existing findings, the range of genres examined needs to be broadened (Fidlerová et al. 2010: 303), not only by focusing on serial texts but also by exploring longer time spans. This will allow a closer look at the editorial practice of printing houses and their respective programs for unifying orthography on the eve of the standardization of European languages. Legal texts are one such area: they were usually updated and recompiled several times as the circumstances of their use evolved and, as previous research shows, they constitute a fruitful and promising field for research in historical orthography. In particular, specialized platforms that collect sources on selected topics and integrate research tools, such as automatic comparison of text versions, provide a promising environment for large-scale research (see the research area Sources from Laws of the Past, IURA, for Polish legal texts). A longitudinal exploration of orthographic decisions made in a mixed community of practice, including translators and publishers, is feasible with material from early newspapers, which frequently contained translated news reports. Some preliminary analyses in this direction, based on articles from Russian newspapers of the first decades of the eighteenth century, have been carried out by Maier and Shamin (2018). Such large-scale studies provide a solid foundation for generalizations and subsequent typologies of orthographic change as bundles of intra- and extralinguistic factors across language boundaries or language families (see Baddeley and Voeste 2012b: 11, Condorelli 2020c).
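The automatic comparison of text versions mentioned above can be approximated, for illustration, by word-level alignment. The Python sketch below uses the standard library’s difflib.SequenceMatcher, a generic sequence-alignment tool and not the tooling of IURA or any other platform, to align two versions of a passage and collect one-to-one word substitutions as candidate orthographic variants. The two versions are invented, and treating every substitution as orthographic is a simplifying assumption; lexical or morphological changes would have to be filtered out separately.

```python
import difflib


def substituted_words(version_a, version_b):
    """Align two token lists and return (a, b) pairs for one-to-one word substitutions."""
    matcher = difflib.SequenceMatcher(a=version_a, b=version_b, autojunk=False)
    pairs = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        # Only equal-length 'replace' blocks can be paired word by word;
        # insertions, deletions and uneven replacements are ignored here.
        if op == "replace" and (i2 - i1) == (j2 - j1):
            pairs.extend(zip(version_a[i1:i2], version_b[j1:j2]))
    return pairs


# Invented versions of the same passage from two hypothetical redactions.
older = "geden czlowiek ma prawo".split()
newer = "jeden czlowiek ma prawo".split()

print(substituted_words(older, newer))  # [('geden', 'jeden')]
```

Run over successive redactions of a legal text, such pairs could be aggregated by grapheme to trace when a given spelling gave way to another.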

Contact linguistics is an area in which the contact between orthographic systems and their mutual influence need to be explored more intensively in future. Orthographic systems came into contact primarily through exchanges within and between professional networks, the acquisition of several writing systems or typefaces, foreign language learning, and the translation process itself. Some research on these topics has been presented in Section 19.2 (see also Chapter 11, this volume), but it has yet to move beyond isolated studies sui generis. To do so, a shift is called for: from a comparative approach, which was undoubtedly adopted because of the peculiarities of the materials involved, toward a sociolinguistic approach that can explain the outcomes of the comparison.

To sum up, recent developments in the digital humanities have opened up new avenues for the diachronic study of writing systems and orthographies, which had for some time been impeded by editorial practices. As a result of this shift, interdisciplinary approaches to the same material have been consolidated and synergies between the disciplines involved have emerged. Apart from identifying the relevant type of orthographic variation (see Rutkowska and Rössler 2012: 217–19), this synthetic view has enabled a broader and more diversified scope for explaining variation in writing systems and orthographies. This is precisely the direction that future research needs to follow.

19.6 Conclusion

The present chapter has drawn upon the comparative and sociopragmatic methods in historical orthography research. Having defined writing systems and orthography, including the potential overlap between these terms, I have described orthography as a supportive discipline on the fringes of other disciplines, such as paleography, historical phonology and text criticism. I have also explained the growing interest in studying writing systems and orthographies as independent disciplines. Section 19.2 has presented the adoption of the comparative method in Slavic studies and its principal directions, for example, the proliferation of small-scale studies on individual and group variation, the exploration of successive versions of biblical and ecclesiastical literature, microlinguistic and macrolinguistic approaches to variation, and the investigation and comparison of texts written on different types of medium and surface. Section 19.3 has summarized the theoretical preliminaries of historical sociopragmatics, based primarily on research on English historical orthography, pointing to such aspects of the sociopragmatic context as the historical conditions of text production, transmission and reception and the reconstruction of the meanings conveyed by texts in their context, with a focus on conceptually oral texts. In Section 19.4, I have offered an overview of the most important concepts in Slavic studies (mostly involving Russian, Czech and Polish material). These have been roughly divided into pragmaphilology and diachronic pragmatics, and the differences and synergies between these approaches have been exemplified. In Section 19.5, the proposed methods have been critically appraised and their applicability to future research demonstrated.

20 Reconstructing a Prehistoric Writing System

20.1 Introduction

How can we reconstruct the orthography of a writing system that is no longer used? How can we test our assumptions and reach definite conclusions? This chapter investigates the orthographic reconstruction of a historical writing system, taking as a case study the Linear B (LB) syllabary of Bronze Age Greece, which was used to render Mycenaean Greek, the oldest Greek dialect attested in written form. Reconstructing the orthography of a historical writing system poses challenges of both a structural and a linguistic nature. In the case of LB, the task is complicated by the paucity of the evidence, its state of preservation (fragmentary texts) and its nature (economic records). The evidence we are left with consists of syllabic sequences such as a-to-ro-qo, i-qo, po-me: this is how the words ‘man’, ‘horse’ and ‘shepherd’ were spelled in Mycenaean Greek. There is clearly a considerable gap to bridge between these spellings and their alphabetic Greek counterparts ἄνθρωπος /antʰroːpos/ ‘man’, ἵππος /hippos/ ‘horse’, ποιμήν /poimeːn/ ‘shepherd’. How did we arrive at such reconstructions? Understanding and systematically reconstructing the orthographic conventions devised to write this early form of Greek with a syllabic system was the first step toward establishing a methodology for reading and interpreting LB texts. Reconstructing these conventions helped us make sense of the grapholinguistic units: knowing how specific phonological and morphological features were encoded as recurrent patterns allowed us to ‘reconstruct’ the linguistic reality behind the writing system and to study the language in its diachronic development.

In the case of LB, orthographic reconstruction was a backward process, based on (and borne out by) comparison with later evidence from alphabetic Greek. The methods used to reconstruct historical orthographies may be many, inasmuch as they are context-dependent. A single, overarching method may prove ill-suited to accounting for the peculiarities of the many contextual realities. Although some general ‘universals’ (e.g. phonotactics) may help to reconstruct historical orthographies, the raison d’être of each orthography remains context-based, since each intertwines a writing system with a linguistic system. For, as Faber (1991: 620) holds, “any linguistic interpretation of an orthography is based on an understanding of its creation and use.” This contribution aims to be useful to those interested in seeing how reconstructing the orthography of a writing system no longer in use gives us a finer appreciation of the language it encodes, to be studied in its diachronic development. In this respect, Greek is unique in having a continuous written record from the Late Bronze Age (c. 1400–1200 BC) until today, although over time it was rendered with typologically different writing systems: it was first written with the LB syllabary, adapted from the earlier Minoan Linear A syllabary, before the adoption in the Iron Age of the alphabet, adapted from the Phoenician alphabet.

20.2 Writing in Bronze Age Greece: Linear B in Context

A number of writing systems were in use in Bronze Age Greece, all of which originated on the island of Crete, situated in the middle of the Aegean Sea at a crossroads between Europe, Egypt and the ancient Near East. From a typological perspective, all these writing systems are syllabaries, meaning that each graphic sign represents a syllable (e.g. /pa/, /e/; see Tables 20.1–20.3) and is a phonological unit in the script. These syllabaries were complemented by a set of logographic (or ideographic) signs, which depict real-world referents and stand for words or concepts, not individual syllables (e.g. a sign depicting a tripod cooking pot stands for the word ‘tripod’; a sign depicting a pig flanked by the syllabogram /si/ stands for the concept ‘fattened pig’, see Meissner 2019). Crete first saw the rise of Cretan Hieroglyphic and the Linear A script (LA). The former is understood to be a North/East Cretan phenomenon (with major find-spots at Knossos, Mallia and Petras) in use c. 1900–1700 BC; the latter, whose original nucleus is probably to be sought in central Crete (Phaistos), shows a much wider geographical distribution (across Crete and the Aegean islands, with some finds also in Asia Minor) as well as a longer time span, c. 1800–1450 BC. Both scripts are still undeciphered and are understood to encode the indigenous language(s) of Bronze Age Crete (on LA see Schoep 2002, Davis 2014, Salgarella 2020, 2021; on Cretan Hieroglyphic see Olivier and Godart 1996, Ferrara 2015, Decorte 2017, 2018).

Table 20.1 The LB ‘basic’ syllabary

Darker shading: LB signs that have both the same shape and the same (or an approximate) phonetic value in LA; lighter shading: signs that share only the shape (but not the phonetic value) with LA; no shading: signs that are new LB introductions.

(drawings by the author based on actual attestations)

The role played by LA, and by the civilization that created and used it (the so-called ‘Minoans’ in the literature), cannot be overestimated: over time LA was taken as a template (mother-script) for the creation of two further writing systems, LB, used on Crete and Mainland Greece c. 1400–1190 BC, and Cypro-Minoan, developed on Cyprus and used c. 1600–1050 BC (on LB see Ventris and Chadwick 1973, Palmer 1963, Hooker 1980, Bernabé and Luján 2006, Duhoux and Morpurgo-Davies 2008, 2011, 2014, Melena 2014c, Del Freo and Perna 2019; on Cypriot scripts see Steele 2013a, 2013b, 2019). Both daughter-scripts render languages different from that of the template: LB was successfully deciphered in 1952 and proven to write an archaic form of Greek, while Cypro-Minoan is still undeciphered. This chapter focuses on LB because, as the only one of these systems to have been deciphered, it is the one we understand best, and it offers the most insight into its diachronic development (starting from the process of adaptation from LA) and into the reconstruction of the orthographic conventions used to write Greek with it in the Bronze Age.

As a final remark, it should be stressed that LB had a restricted context of use, being limited to the bookkeeping of bureaucratic transactions by palatial administrations. Our evidence consists of inscribed clay tablets (and some vessels); the tablets were burnt, and thus preserved, in a number of firing episodes that took place at the end of the Bronze Age. Owing to the economic nature of the evidence, LB texts show a highly formulaic structure.

20.3 The Ancestry of the Syllabary

The first time Greek speakers set out to write down their language they made use of syllabic signs, since LB was molded and adapted from the LA writing system already in use on Crete. Scholars have long worked on reconstructing the process and circumstances of adaptation and script transmission from one system to the other. Upon the discovery of the first inscribed documents at the start of the twentieth century, the British archaeologist Arthur Evans, who was then excavating the palatial center of Knossos on Crete (report on the first excavations in Evans 1901), coined the unifying label ‘Minoan linear scripts’ for such writing, further subdivided into scripts of ‘Class A’ and ‘Class B’ (with a chronological connotation). From very early on, in fact, it was apparent that a good number of signs appeared in both systems (hence their listing with the prefix ‘AB’ in the systematized sign list), implying that upon adaptation these signs had been directly borrowed from the template. Some of these shared signs are likely to have retained both the same shape and the same (or an approximately comparable) phonetic value and are therefore standardly referred to as ‘homomorphic and homophonic’ signs; other signs show a comparable form (homomorphic), but their correspondence in phonetic value cannot be proved beyond dispute (most recently Steele and Meissner 2017). For this reason, and because the decipherment of LB as Greek (see Section 20.4) allows syllabic signs to be given phonetic values, LA texts can be read at least approximately. On top of this core of AB ‘shared’ signs, some 12 signs were created ex novo in LB (see Section 20.8).

From a typological perspective, a syllabary is often deemed a poor fit for rendering Greek in writing: the syllabic structure of the LB system, which only encodes open syllables (i.e. those ending in a vowel), does not allow for the straightforward notation of final consonants (i.e. consonants in the coda of a syllable) or of consonantal clusters, at times resulting in ambiguity as to the correct reading and interpretation of words (see Section 20.5).Footnote ¹ This is why, in mainstream scholarship, LB is often seen as an ‘unsuitable’ system for writing Greek. However, it must be acknowledged that in the Late Bronze Age the (logo-)syllabic system was the best (if not the only) ‘game in town’ in the Aegean area. Given the derivation of LB from LA, it is a reasonable assumption that the form of the syllabary may well reflect the characteristics of another language (Minoan), for which it was created, and not of Greek. Nor is it too far-fetched to suppose that, in the context of adaptation, the first Mycenaean Greek writers may have retained not only the script in its purely graphic form (i.e. its sign repertory), but also the orthographic conventions bound to it in the template system. This would have been the most effortless solution for individuals who had to learn the writing technology ex novo, as well as how to notate their own, different, language in writing.

In this respect, the possibility may be entertained that the first writers of LB might have been bilingual, mastering both Mycenaean Greek and Minoan (most recently Salgarella 2020: 377), given the rather systematic regularity of spellings and the observance of orthographic conventions from the earliest documented stage of LB writing. This could explain, at least in part, why the system was not adapted to reflect the phonological repertory of the Greek language more accurately and explicitly (e.g. voicing and aspiration, see Sections 20.5, 20.6). What is remarkable is that, all in all, the extant LB texts present notable orthographic uniformity across both time (some 200 years) and space (Crete and Mainland Greece). This is not to say that the system was completely standardized, as modifications, additions and local as well as chronological writing preferences can nevertheless be observed (see Section 20.8; on the possible existence of ‘local scripts’ within LB, see Melena 2014c: 84, Del Freo and Perna 2019: 137–38).

20.4 Reconstructing Orthographic Conventions

At this point we may well wonder how the orthographic conventions used to write Mycenaean Greek with the LB writing system were reconstructed. Regrettably, we have no contemporary records or accounts of writing conventions, and it fell to modern philologists to reconstruct the conventions that scribes are believed to have followed when reducing the spoken language to syllabic writing. This reconstruction can be seen as a step-by-step process stretching from predecipherment to postdecipherment. In short, the method followed was threefold. The first step involved looking for general recurrent patterns, and this was done before the decipherment by scholars working on what were then still called the ‘Minoan scripts’, that is, both LA and LB. To start with, Evans was already able to identify a gender distinction (masculine/feminine), inferable from the juxtaposition of self-evident logograms depicting a man or a woman with words showing different endings (among them, for example, the signs later read as jo, a masculine ending, and ja, a feminine ending; Evans 1909: 35–36). Building on this, the American scholar Kober demonstrated the existence of not just two, but three grammatical cases, and provided evidence of inflection by collecting sets of words containing sequences of the same signs but ending in different terminations: the so-called ‘Kober’s triplets’ (Kober 1946, 1948; for a description of the LB decipherment process see Chadwick 1967, Pope 2008, Judson 2017b).

As a second step, building on Kober’s results and on a further search for systematic patterns, Ventris carried out a methodical analysis of sign frequency and position in 20 ‘work notes’ that he circulated to scholars (Ventris and Sacconi 1988). Using a combinatory method, he established relative links between signs, resulting in a ‘grid’ in which each row included signs likely to share the same consonant, and each column signs likely to share the same vowel (although at this stage phonetic values had yet to be suggested). Variation in spelling also played a role in the construction of the ‘grid’: if two words differed by one sign only, by implication the two variant signs had to have something in common (either the same consonant or the same vowel). One such example is the word later transcribed as a-re-pa-zo-o, which also occurs in the alternative spelling a-re-po-zo-o: although Ventris could not yet read it, he was able to list the two variant signs (pa and po) in the same row as arguably sharing the same consonant. The ‘grid’ was then tested against the texts.
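The logic of linking signs through variant spellings can be illustrated with a short Python sketch. It is not a reconstruction of Ventris’s own working method: the function name `variant_links` is hypothetical, signs are treated as opaque tokens (shown here in their later transliteration only for readability), and the word list is illustrative. Two spellings of equal length that differ in exactly one position yield a pair of signs that, by implication, share either a consonant or a vowel.

```python
from itertools import combinations


def variant_links(spellings):
    """Return pairs of signs that alternate in otherwise identical spellings.

    Each spelling is a hyphen-separated sequence of signs, treated as opaque
    tokens. Two spellings with the same number of signs that differ in
    exactly one position suggest that the two alternating signs share a
    consonant or a vowel.
    """
    links = set()
    split = [s.split("-") for s in spellings]
    for a, b in combinations(split, 2):
        if len(a) != len(b):
            continue  # only compare words with the same number of signs
        diffs = [(x, y) for x, y in zip(a, b) if x != y]
        if len(diffs) == 1:
            links.add(tuple(sorted(diffs[0])))
    return links


print(variant_links(["a-re-pa-zo-o", "a-re-po-zo-o", "ko-no-so"]))
# → {('pa', 'po')}
```

Spellings that differ in more than one position (e.g. ko-no-so and ku-ru-so) yield no link, since the comparison would no longer isolate a single pair of signs.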

At this point, Ventris made an educated guess by looking for the place name Amnisos (the harbor of Knossos) on the tablets, as he thought that place names were likely to have been recorded, and the ‘grid’ already suggested readings for some signs as a, n + vowel and consonant + i. Fortunately, one of Kober’s triplets showed exactly the desired pattern: a-*i-n*-**, which could easily be restored as a-mi-ni-so. By filling in the gaps with the new values and updating the ‘grid’ accordingly, Ventris began to allocate phonetic values to signs and, consequently, to work out tentative readings of further words. The result of this work was what he called an ‘experimental vocabulary’ (Ventris and Sacconi 1988: 337–48): a list of LB words for which plausible Greek counterparts could be suggested, enabling him to demonstrate that the language encoded in LB showed enough Greek features and vocabulary to be identified as Greek. Ventris’s decipherment was endorsed by the classicist Chadwick, who, by testing Ventris’s ‘grid’ further, was able to read a number of additional Greek words not included in the ‘experimental vocabulary’. The collaboration between the two firmly established the phonetic values of the core LB syllabary. The third and final step in reconstructing the orthographic conventions involved comparing the LB spellings with words known in alphabetic Greek on the one hand, and with their etymology in reconstructed Proto-Indo-European on the other. This allowed for a better and more nuanced appreciation of the readings, as well as of the historical development of the Greek language.
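The Amnisos guess amounts to matching a partially filled pattern against attested sign sequences. The Python sketch below assumes a simplified notation in which each sign is written as a consonant part followed by its vowel and `*` is a wildcard, as in a-*i-n*-** above. The function `matches` and the candidate list (spellings cited elsewhere in this chapter) are illustrative assumptions, and only basic CV signs are handled.

```python
def matches(pattern, spelling):
    """Return True if a hyphenated spelling fits a partial grid pattern.

    Each pattern element is a consonant part followed by a vowel, with "*"
    as a wildcard: "a" is the plain vowel sign a, "*i" is any consonant + i,
    "n*" is n + any vowel, and "**" is any consonant + any vowel.
    """
    signs = spelling.split("-")
    if len(signs) != len(pattern):
        return False
    for pat, sign in zip(pattern, signs):
        pat_cons, pat_vowel = pat[:-1], pat[-1]
        cons, vowel = sign[:-1], sign[-1]
        if pat_cons == "*":
            if not cons:
                return False  # the wildcard stands for some consonant
        elif pat_cons != cons:
            return False
        if pat_vowel != "*" and pat_vowel != vowel:
            return False
    return True


pattern = ["a", "*i", "n*", "**"]
candidates = ["ko-no-so", "ko-no-si-jo", "a-ni-o-ko", "a-mi-ni-so"]
print([c for c in candidates if matches(pattern, c)])
# → ['a-mi-ni-so']
```

The near miss a-ni-o-ko fails only at its third sign, which shows how each additional constraint supplied by the grid narrows the field of candidates.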

As we can see, reconstructing orthographic conventions went hand in hand with (and resulted from) the unfolding of the decipherment process: from the first stages of Ventris’s analysis, it was recognized that the words appearing on the LB documents, which started to sound quite like Greek, showed ‘an unlikely set of spelling conventions’ (Chadwick 1967: 67). Beneath this syllabic garb, the lexemes of Greek began to reveal themselves: the more words were given a Greek interpretation and read accordingly, the more the spelling conventions were confirmed. As Ventris himself announced on the BBC on July 1, 1952,

I have come to the conclusion that the Knossos and Pylos tablets must, after all, be written in Greek – a difficult and archaic Greek, seeing that it is 500 years older than Homer and written in a rather abbreviated form, but Greek nevertheless. Once I made this assumption, most of the peculiarities of the language and spelling which had puzzled me seemed to find a logical explanation.

(quoted in Chadwick 1967: 68, full transcript in Ventris and Sacconi 1988)

By the end of the decipherment process, it only remained to systematize these conventions coherently: they were first put forward as the ‘assumed rules of Mycenaean orthography’ by Ventris and Chadwick in their postdecipherment technical article ‘Evidence for Greek dialect in the Mycenaean archives’ (1953) and firmly established later in the pivotal publication Documents in Mycenaean Greek (1956, followed by a second edition in 1973). However, Chadwick and Ventris made no secret of the unease they felt in reconstructing such ‘rules’, which they expressed on several occasions in their work:

these rules had been forced upon us as the result of identifying the Mycenaean words as Greek; they were in many respect unexpected and unwelcome; […] although they were empirically determined, they do form a coherent pattern

(Chadwick 1967: 74)

and

the inadequacy of the script led to considerable uncertainty about the exact form of many words, which could only be given intelligible shape by the assumption of certain rules of orthography. […] These conventions are based on the general assumption that the pronunciation behind the spelling is a normal – though archaic – form of East Greek, such as had already been inferred for the period by philologists.

(Ventris and Chadwick 1973: 67)

In the years that followed, scholars attempted to account for these complex (and at times rather puzzling) orthographic conventions in detail (see, e.g., Vilborg 1960, Palmer 1963, Doria 1965, Hooker 1980; in more recent reference volumes see Bartoněk 2003: 106–12, Risch and Hajnal 2006: 45–55), and developed approaches aimed at understanding the raison d’être of such spelling strategies, with a focus on the systematic principles behind the spelling of consonantal clusters (for which see Section 20.6). In this respect, two main currents of thought have been followed (summary in Woodard 1997: 19–132, Bernabé and Luján 2006: 45–52): one encompasses syllable-dependent approaches, hinging on the premise that orthographic conventions are dependent upon syllabic structure (Householder 1964, Beekes 1971, Ruijgh 1985: 105–26, Sampson 1985: 65–70, Morpurgo-Davies 1987: 91–104); the other, by contrast, encompasses non–syllable-dependent approaches, based on the idea that such spelling representations are sensitive to a set of hierarchical relations (e.g. sonority hierarchy), and not dependent upon syllabic structure (Tronskiĭ 1962, Viredaz 1983, Justeson 1988, Woodard 1994, 1997: 62–78, 112–32).Footnote ² To this latter group belongs the theory of the ‘hierarchy of orthographic strength’ elaborated by Woodard (1994, 1997: 62–78, 112–32), which stands out for not only giving an accurate account of the principles behind the spellings, but also managing to predict spelling outcomes (more on this in Section 20.6).

In conclusion, LB writing conventions were first established right after the decipherment and refined over time by comparing LB spellings with the phonology of reconstructed Proto-Indo-European on the one hand, and that of the later (first-millennium) alphabetic Greek dialects on the other. The very regularity of these orthographic ‘rules’ (although with some exceptions, discussed in Section 20.7) implies that these conventions did exist. It remains to be demonstrated whether such conventions were created in the process of script adaptation to write Greek with the LB syllabary, or were continued, to some extent, from the previous system (LA), which, however, rendered a different language. In keeping with the theme of the present Handbook, this chapter focuses on the LB syllabary only (not the whole logo-syllabic set). What follows is an outline of the main orthographic conventions (‘rules’) that scholars have reconstructed for the LB writing system. It should be borne in mind, however, that the LB sign repertory extends beyond its syllabic component (accounting for phonetic units) to encompass a set of logograms (i.e. picture-signs standing for real-world referents and commodities), a number of monograms (i.e. signs made up of all the individual signs, closely interwoven, of the word they stand for), ligatures (i.e. combinations of logogram plus syllabogram) and measure signs (for exhaustive descriptions of the LB script and documents see the works listed in Section 20.2).

20.5 Writing Greek in Syllables

With respect to syllabic structure, LB signs represent open syllables of the type (C)V (consonant + vowel, e.g. /da/, or vowel alone, e.g. /a/). There are, however, some exceptional signs with a CCV structure (consonant + consonant + vowel), where the second consonant is either a labialized sound (/w/, e.g. dwo) or a palatalized sound (/j/, e.g. rja) (see Section 20.7). The open syllable structure is a structural constraint on writing Greek that must be borne in mind since, as explained below, those responsible for ‘standardizing’ the writing (orthographic) conventions upon adaptation of the LB script had to find ways around it in order to account for consonants in the coda of a syllable (a common feature of Greek, an inflectional Indo-European language) as well as for word-initial and word-internal consonantal clusters.

After the decipherment, scholars were able to arrange signs by phonetic value into 13 series (Table 20.1, horizontal rows): 12 consonantal series intersecting with five vowels (/a/, /e/, /i/, /o/, /u/; Table 20.1, vertical columns), plus the series of plain vowel signs. The entire syllabary, with a total of 87 signs, is subdivided into a ‘basic’ and an ‘additional’ syllabary. The former comprises some 59 signs and represents the fundamental nucleus of the script, with the basic set of sounds necessary to write down any Greek word (although not unambiguously). The latter comprises some 14 signs, which are either ‘doublets’ (a single sign used in place of a sequence of two signs already present in the basic syllabary, e.g. au for a-u) or ‘complex’ signs representing consonantal clusters (discussed in Section 20.7). In addition, there are some 14 still undeciphered signs (discussed in Section 20.9). In what follows, the main characteristics of the basic syllabary are outlined (in-depth descriptions in Melena 2014c: 26–53, Bernabé and Luján 2006: 19–21, 23–26, Del Freo and Perna 2019: 132–33).

The 12 consonantal series of the basic syllabary consist of stops (/d/, /k/, /p/, /t/), nasals (/m/, /n/), liquids (/l/, /r/), a sibilant (/s/), a labial approximant (/w/), a palatal approximant (/j/), a labiovelar (/kʷ/)Footnote ³ and a z-series of still debated phonetic interpretation.Footnote ⁴ There is one vocalic series, with each grapheme standing for either a long or a short vowel. It is evident at first glance that the system, as it stands, underrepresents phonemes (from a present-day perspective). As to vowels, vowel length is not marked (e.g. the sign transliterated as ‘o’ may represent either /o/ or /oː/; likewise the sign transliterated as ‘e’ could be either /e/ or /eː/), nor is possible initial aspiration (the script has no series for the aspirate /h/, except for sign a₂ rendering /ha/). As for consonants, voice and aspiration are not marked in the series rendering stops, nor in the labiovelar series, and there is one single series for the liquids (/l/, /r/), conventionally transcribed as the r-series (e.g. the syllabogram ra is read as either /ra/ or /la/). Moreover, LB neither uses diacritics to mark accents or aspiration, nor has a way of marking geminated (i.e. double) consonants (e.g. mi-to-we-sa /miltowessa/ *μιλτοFεσσα ‘painted red’).

This ‘minimum marking’ is a crucial shortcoming of the system, as it does not allow for a straightforward phonetic and phonemic reading of Mycenaean Greek words. Hence, an adequate understanding of the contexts in which words occur is often necessary for their correct interpretation and reconstruction. The shortcoming affects stops to a great extent: since voiceless, voiced and aspirated stops are not differentiated, the graphemes of the p-series represent the phonemes /p/, /b/, /pʰ/, and the graphemes of the k-series represent /k/, /g/, /kʰ/. Dental stops are the only exception, as in this case, in addition to the d-series (voiced /d/), the system has a dedicated t-series for voiceless /t/ and plausibly also aspirated /tʰ/.
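The scale of this underrepresentation can be made concrete by enumerating the phonemic values a single basic sign may stand for. The mapping `SERIES` below is a simplified assumption drawn from the description above (the z-series and the additional syllabary are omitted, and the t-series is given /t/ and the plausible /tʰ/), and `possible_readings` is a hypothetical helper; the sketch covers basic CV and plain vowel signs only.

```python
from itertools import product

# Phonemes conflated under each consonantal series (simplified assumption).
SERIES = {
    "p": ["p", "b", "pʰ"],
    "k": ["k", "g", "kʰ"],
    "q": ["kʷ", "gʷ", "kʷʰ"],  # labiovelars: voice and aspiration unmarked
    "t": ["t", "tʰ"],
    "d": ["d"],
    "r": ["r", "l"],           # a single series for both liquids
    "m": ["m"], "n": ["n"], "s": ["s"], "w": ["w"], "j": ["j"],
    "": [""],                  # plain vowel signs
}


def possible_readings(sign):
    """List the phonemic values a basic CV sign can stand for."""
    consonant, vowel = sign[:-1], sign[-1]
    vowels = [vowel, vowel + "ː"]  # vowel length is never marked
    return [c + v for c, v in product(SERIES[consonant], vowels)]


print(possible_readings("pa"))
# → ['pa', 'paː', 'ba', 'baː', 'pʰa', 'pʰaː']
```

For a two-sign spelling such as pa-te, the readings of the individual signs already combine to 6 × 4 = 24 possibilities, before any omitted consonants (such as the /n/ of /pantes/) are taken into account.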

These characteristics, together with the LB orthographic conventions (Section 20.6), partly obscure the exact phonological reality of the word concealed behind the spelling. In certain contexts, the LB script may indeed create ambiguity: for example, the spelling pa-te could be interpreted as either /pateːr/ ‘father’ (alphabetic Greek πατήρ) or /pantes/ ‘all’ (alphabetic Greek πάντες), and only a contextual analysis can give us the most suitable reconstruction. This brings us to the next section, which illustrates the orthographic conventions as reconstructed.

20.6 Spelling ‘Rules’

Once confronted with the technology of writing, those responsible for adapting the writing system to the needs of the Greek language established, and, to an extent, likely inherited, a number of writing conventions, also known in the scholarship as ‘spelling rules’ (most recent descriptions in Bernabé and Luján 2006: 31–52, Melena 2014c: 89–123, Del Freo and Perna 2019: 140–46). As mentioned earlier, the main feature of the LB syllabary is its open syllable structure (syllables of the (C)V or C(C)V type), which hinders the accurate rendering of Greek phonological and morphological features alike. This is particularly true of closed syllables (CVC type: e.g. consonants at word-end as case markers) and consonantal clusters (CCV type), which are abundant in Greek (e.g. ἄνθρωπος /antʰroːpos/ ‘man’). To overcome this critical issue, Mycenaean Greek writers (‘scribes’ in the literature) adopted two main procedures. Let us illustrate these with some examples.

The first solution, the simplest and most economical, was to omit the ‘extra’ consonant in the coda of a syllable: this always applies to consonants at word-end, and it applies word-internally when the coda consonant is /l/, /r/ (liquids), /m/, /n/ (nasals) or /s/ (sibilant). Because a sound is omitted, this spelling is also called ‘partial spelling’ (Woodard 1997: 11). How, then, would a word like χαλκός /kʰalkos/ ‘bronze’ be spelled? Here the syllabification is kʰal-kos, a sequence of two closed syllables (CVC–CVC). Following the rule outlined above, word-final /s/ is dropped, and /l/ in the word-internal cluster is dropped likewise because it is a liquid: the resulting spelling is ka-ko. But what if the word-internal consonant is none of the above? In this case, the second procedure (a slightly more creative one) consisted in spelling out both consonants of the cluster, giving rise to two syllables sharing the same vowel. This is the so-called ‘empty vowel’ (alternatively, ‘dummy vowel’), as this vowel was not meant to be pronounced (it is not present in the phonological word). One such example is the graphic rendering of χρυσός /kʰrusos/ ‘gold’, whose syllabification is kʰru-sos (CCV–CVC), with a consonantal cluster (/kʰr/) at word-start. With the aid of an ‘empty vowel’, added right after the first consonant of the cluster, the resulting spelling is ku-ru-so (with word-final /s/ regularly omitted).

Another example is the spelling of the renowned site of Knossos on Crete: the place name is rendered as Κνωσσός /knoːssos/ in alphabetic Greek (and has remained the same until today), with a word-initial consonantal cluster. In LB the name is written with an ‘empty vowel’, resulting in the spelling ko-no-so. The procedure employing the ‘empty vowel’ is used rather systematically for both word-initial and word-internal consonantal clusters beginning with a stop (stop + stop; stop + /l/, /r/, /m/, /n/; sometimes also stop + /s/), and for the clusters /mn/ (e.g. Ἀμνισός /amnisos/ ‘Amnisos’ = a-mi-ni-so) and /sm/ (see below). This alternative spelling strategy is also called ‘plenary spelling’, since all consonants are spelled out (Woodard 1997: 11). Some observations are in order on the spelling of /s/ in clusters, as its treatment is not uniform. We have already seen that word-final /s/ is always omitted. In word-initial and word-internal position, /s/ is normally omitted when it begins a consonantal cluster with a stop (e.g. σπέρμον /spermon/ ‘seed/grain’ = pe-mo; φάσγανα /pʰasgana/ ‘sword’ = pa-ka-na; Fάστυ /wastu/ ‘city’ = wa-tu); however, /s/ is usually spelled out when the second consonant of the cluster is /m/ (smV) or /w/ (swV) (e.g. δοσμός /dosmos/ ‘contribution’ = do-so-mo). Finally, in addition to biconsonantal clusters, there are a few instances of triconsonantal clusters: here the ‘empty vowel’ rule applies and all three consonants are spelled out (e.g. the man’s name a-re-ku-tu-ru-wo /Alektruōn/).

In sum, consonantal clusters are spelled using either partial spelling (omission of a sound) or plenary spelling (full rendition of both consonants). These two strategies appear not to be accidental but to comply with the sonority hierarchy of consonantal sounds. Woodard (1994, 1997) has demonstrated that these spellings respect what he calls the ‘hierarchy of orthographic strength’ (Section 20.4). This theory assumes that orthographic strength progressively decreases from stops to liquids, in the sequence stop > fricative > nasal > glide > liquid. Woodard (1997: 65) thus concludes that “within a word, any two successive consonants will be represented with plenary spelling if, and only if, the orthographic strength of the first is greater or equal to that of the second; otherwise, partial spelling will be used.” This non–syllable-dependent approach gives an accurate and elegant explanation of the systematic procedures used to write consonantal clusters and is therefore worth bearing in mind for any further analysis of syllabic spelling.
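Because Woodard’s rule is stated as a pairwise comparison, it lends itself to a compact sketch. The Python function below is a minimal illustration, not Woodard’s own formalization: it assumes that words are supplied as lists of phonemic segments, assigns each consonant a rank in the hierarchy stop > fricative > nasal > glide > liquid, drops word-final consonants, writes geminates single, and gives every written consonant the vowel that follows its cluster (the ‘empty vowel’). The segment inventory, the `CLASSES`, `STRENGTH` and `SERIES` tables and the name `spell` are assumptions for illustration; glides, diphthongs and aspiration are not handled.

```python
# Rank in the hierarchy of orthographic strength (higher = stronger).
CLASSES = {
    4: ["p", "b", "pʰ", "t", "tʰ", "d", "k", "g", "kʰ", "kʷ"],  # stops
    3: ["s"],                                                   # fricative
    2: ["m", "n"],                                              # nasals
    1: ["w", "j"],                                              # glides
    0: ["l", "r"],                                              # liquids
}
STRENGTH = {seg: rank for rank, segs in CLASSES.items() for seg in segs}

# Series letter used in transliteration: voice and aspiration are not
# marked, the labiovelar is written q, and /l/ falls under the r-series.
SERIES = {"b": "p", "pʰ": "p", "g": "k", "kʰ": "k", "tʰ": "t",
          "kʷ": "q", "l": "r"}
VOWELS = {"a", "e", "i", "o", "u"}


def spell(segments):
    """Spell a word, given as a list of phonemic segments, in LB fashion."""
    segs = [s.rstrip("ː") for s in segments]  # vowel length is not marked
    # Geminate consonants are written single.
    simplified = []
    for s in segs:
        if simplified and s == simplified[-1] and s not in VOWELS:
            continue
        simplified.append(s)
    # Word-final consonants are always omitted.
    while simplified and simplified[-1] not in VOWELS:
        simplified.pop()

    signs, cluster = [], []
    for seg in simplified:
        if seg not in VOWELS:
            cluster.append(seg)
            continue
        # Woodard's rule: a consonant is written (plenary spelling) only if
        # it is at least as strong as the next one; otherwise it is dropped
        # (partial spelling). The last consonant is the onset and is kept.
        written = [c for c, nxt in zip(cluster, cluster[1:])
                   if STRENGTH[c] >= STRENGTH[nxt]]
        written += cluster[-1:]
        if written:
            # Every written consonant takes the following vowel.
            signs.extend(SERIES.get(c, c) + seg for c in written)
        else:
            signs.append(seg)  # plain vowel sign
        cluster = []
    return "-".join(signs)


print(spell(["kʰ", "a", "l", "k", "o", "s"]))      # → ka-ko
print(spell(["s", "p", "e", "r", "m", "o", "n"]))  # → pe-mo
```

Applied to the other examples in this section, the sketch also returns ku-ru-so, ko-no-so, a-mi-ni-so, pa-ka-na, wa-tu, do-so-mo and mi-to-we-sa. It applies the generalization without exceptions, however: for instance, it always writes stop + /s/ in plenary fashion, whereas the text notes that this happens only sometimes.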

The next set of spelling rules concerns vowels. As mentioned earlier, neither vowel length nor initial aspiration is marked. The script has no sign series for aspirated vowels, the only exception being sign a₂ (an LB innovation) standing for /ha/, which is attested in word-initial, word-internal (at compound boundary) and word-final position (see Melena 2014c: 73–78, Pierini 2014). Some of the still undeciphered signs (Section 20.8) might represent aspirated vowels, although no such case has so far been clearly identified. Nevertheless, there seem to have been ways of signaling intervocalic aspiration, even though it is not explicitly marked: one of these was the intentional omission of the so-called ‘graphic glides’. ‘Graphic glides’ is the name conventionally given to the transitional sounds /j/ and /w/ that follow /i/ and /u/, respectively, before another vowel, easing the phonetic transition between the two adjacent vowels. The phonetic nature of glides is still problematic, and it is unclear whether they are simply transitional sounds or actual subphonemic features (see Meissner 2008, Melena 2014c: 23). For this reason, they are for now still called ‘graphic’ glides. Let us see the rule at work: the adjective ko-no-si-jo /knoːssios/ ‘of Knossos’ shows the glide /j/ between the vowels /i/ and /o/; likewise, the noun ta-ra-nu-we /tʰraːnues/ ‘footstools’ (compare alphabetic Greek θρῆνοι/θρόνοι) shows the glide /w/ after /u/ and before /e/.
The use of glides is reasonably consistent (with a few contextual exceptions: see Melena 2014c: 116–17, 120–22, Del Freo and Perna 2019: 142), and their absence generally points to intervocalic aspiration: e.g. the noun a-ni-o-ko /hannihokʰos/ ‘rein-holder/charioteer’ and the personal name wa-tu-o-ko /wastuhokʰos/.Footnote ⁵ Omitting glides where they would be expected is one way of marking intervocalic aspiration; another, in environments that would not have featured a glide anyway, is the hiatus: writing two consecutive vowels blocks vowel contraction, pointing to the presence of intervocalic /h/. We can see this phenomenon in the neuter plural ending -wo-a /woha/ of the perfect participle, which may also show the more accurate alternative spelling -wo-a₂ (e.g. te-tu-ko-wo-a alongside te-tu-ko-wo-a₂ /tetukʰwoha/ ‘completely built’), and in the dative-locative plural endings -a-i (a-stems) and -o-i (o-stems), representing /ahi/ and /ohi/ respectively (e.g. e-qe-ta-i /hekʷetahi/ ‘to the Followers’).
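The glide convention and its interaction with aspiration can be sketched independently of the cluster conventions. The Python function below is a hypothetical illustration: it assumes the word has already been divided into (onset, vowel) pairs written with series letters, so that cluster, length and final-consonant conventions are taken as already applied, and it uses an onset of "h" to stand for aspiration, which is left unwritten and therefore blocks the glide.

```python
def spell_syllables(syllables):
    """Spell (onset, vowel) pairs, applying the graphic-glide convention.

    An onset "h" stands for aspiration: it is not written, and because it
    fills the onset no glide is inserted, leaving a written hiatus.
    """
    signs = []
    prev_vowel = None
    for onset, vowel in syllables:
        if onset == "h":
            sign = vowel  # aspiration unmarked: bare vowel sign
        elif onset == "" and prev_vowel == "i":
            sign = "j" + vowel  # graphic glide /j/ after i
        elif onset == "" and prev_vowel == "u":
            sign = "w" + vowel  # graphic glide /w/ after u
        else:
            sign = onset + vowel
        signs.append(sign)
        prev_vowel = vowel
    return "-".join(signs)


print(spell_syllables([("k", "o"), ("n", "o"), ("s", "i"), ("", "o")]))
# → ko-no-si-jo
print(spell_syllables([("h", "a"), ("n", "i"), ("h", "o"), ("k", "o")]))
# → a-ni-o-ko
```

Comparing the two outputs shows the diagnostic value of the convention: after si the glide appears, whereas after ni in a-ni-o-ko its absence is what signals the underlying /h/.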

Let us now move on to another kind of vowel sequence: diphthongs. Mycenaean Greek has /i/ diphthongs and /u/ diphthongs, but the script has no separate, complete sign series for writing them, so a set of conventions had to be devised. In /i/ diphthongs the second element, /i/, is conventionally not spelled out (with very few exceptions).⁶ Examples include po-me /poimeːn/ 'shepherd' (alphabetic Greek ποιμήν) and ko-wa /korwai/ 'girls' (alphabetic Greek κόρ(ϝ)αι, nominative plural). By contrast, in /u/ diphthongs the second element, /u/, is always written out: for example e-u-me-de /eumeːdeːs/ 'Eumedes' (alphabetic Greek Εὐμήδης), na-u-do-mo /naudomoi/ 'ship builders'. LB does, however, have three signs that represent diphthongs: a₃ /ai/, a₄ /au/ and ra₃ /rai, lai/. All three belong to the 'additional syllabary', which is discussed in the next section.
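The asymmetry between /i/ and /u/ diphthongs can likewise be expressed as a rule. The following Python sketch is hypothetical and deliberately simplified: it assumes words are supplied as (onset, nucleus) pairs with codas already dropped, and it ignores the diphthong signs a₃, a₄ and ra₃ altogether. The names `spell_syllable` and `spell_word` are illustrative only.

```python
# Hypothetical sketch: spells syllables whose nucleus may be a diphthong,
# following the convention described above. Words are assumed to be given as
# (onset, nucleus) pairs with codas already dropped; the diphthong signs
# a₃, a₄ and ra₃ are deliberately ignored.

def spell_syllable(onset, nucleus):
    """Return the list of signs used for one syllable."""
    first, second = nucleus[0], nucleus[1:]
    if second == "i":
        # /i/ diphthongs: the second element is normally left unwritten.
        return [onset + first]
    if second == "u":
        # /u/ diphthongs: the second element is written with the bare sign u.
        return [onset + first, "u"]
    # Plain vowel: one sign.
    return [onset + nucleus]

def spell_word(syllables):
    """Join the signs for every syllable into a hyphenated transliteration."""
    signs = []
    for onset, nucleus in syllables:
        signs.extend(spell_syllable(onset, nucleus))
    return "-".join(signs)

print(spell_word([("p", "oi"), ("m", "e")]))               # → po-me
print(spell_word([("k", "o"), ("w", "ai")]))               # → ko-wa
print(spell_word([("", "eu"), ("m", "e"), ("d", "e")]))    # → e-u-me-de
print(spell_word([("n", "au"), ("d", "o"), ("m", "oi")]))  # → na-u-do-mo
```

Note that the model loses information in one direction only: /poi/ and /po/ both come out as po, whereas /eu/ and /e/ remain distinct.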

20.7 Breaking the ‘Rules’

The 'additional syllabary' (Table 20.2) consists of 14 signs that do not form complete series (in-depth discussion in Bernabé and Luján 2006: 26–30, Melena 2014c: 53–82, Del Freo and Perna 2019: 133–34). These are conventionally subdivided into 'doublets' and 'complex syllabograms'. The former group comprises signs whose phonetic value is understood to be similar to that of signs in the basic syllabary (hence 'doublets'). Classified as such are a₂ /ha/, a₃ /ai/, a₄ /au/, pu₂ /pʰu, b(ʰ)u/, ra₂ /rja, lja/ (giving /rra, lla/) and ro₂ /rjo, ljo/ (giving /rro, llo/),⁷ ra₃ /rai, lai/ and ta₂ /sta/ (?).⁸ Of these, four are LB innovations (a₂, a₃, ro₂, ra₃), while the other four (a₄, pu₂, ra₂, ta₂) have graphic antecedents in LA (shaded in Table 20.2). The latter group comprises signs of the CCV type, where the second consonant is either /w/ (labialized) or /j/ (yodized): dwe and dwo, twe and two, pte (from pʲe?) and nwa. Except for nwa, which has a graphic antecedent in LA, these signs are all new introductions in LB (for a discussion of additional syllabary signs and their role in reconstructing the process by which LA was adapted into LB, see Judson 2017a).

Table 20.2 The LB ‘additional syllabary’

Shading marks signs that share only their shape (not their phonetic value) with LA; unshaded signs are new LB introductions.

(drawings by the author based on actual attestations)

It is worth noting that additional syllabary signs are used to render specific phonological traits, for example aspiration, gemination, diphthongs, and labialized and yodized clusters (CwV, CjV). Their use, however, is not systematic: it seems to have been up to each individual writer whether to use these signs instead of combinations of signs already available in the basic syllabary. Such spelling alternations made it possible to work out the phonetic value of most additional syllabary signs, which were established through the joint effort of a number of scholars (in primis Meriggi 1955, Palmer 1955, Petruševski and Ilievski 1958, Ephron 1961, Lejeune 1962, Chadwick 1968). By way of example, the syllabogram ra₂ alternates with the digraph -ri-ja in the suffix /tria/ (compare alphabetic Greek -τρια) of feminine agent nouns (e.g. a-ke-ti-ra₂ and a-ke-ti-ri-ja /askeːtriai/ 'weavers'), which indicates that ra₂ represented the cluster /rja/. Using one sign instead of a sequence of two would have been a more economical choice for tablet writers.

There are other similar examples of variant spellings of the same word with and without an additional syllabary sign: for example pe-ru-si-nwa alongside pe-ru-si-nu-wa (/perusinwa/ 'last year's', alphabetic Greek περυσινός), pte-re-wa alongside pe-te-re-wa (/ptelewa/ 'made of elm wood', alphabetic Greek πτελέα 'elm tree'), o-da-twe-ta alongside o-da-tu-we-ta and te-mi-dwe-ta alongside te-mi-de-we-ta (terms used to describe chariot wheels), pa-we-a₂ alongside pa-we-a (/pʰarweha/ 'clothes', compare Homeric φάρε(h)α), and pe-ra₃-ko-ra-i-ja alongside pe-ra-a-ko-ra-i-jo (the 'Further Province' in Pylos). Interestingly, in some instances such variation is attested even within the output of a single scribe (e.g. scribe H 32 at Pylos). Scribes could therefore 'break the rules' and use alternative spellings, as long as these did not compromise the recognition of the underlying phonological word.
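Since these doublet spellings alternate with basic-syllabary sequences, one simple way to group variant spellings in a transliterated word list is to expand each additional-syllabary sign into the sequence it alternates with before comparing. The Python sketch below assumes that approach; the `EXPANSIONS` table is hypothetical and covers only the alternations cited in this section, so it is neither complete nor authoritative (dwe and twe expand with different vowels simply because the attested variants are spelled that way).

```python
# Hypothetical normalization table: each additional-syllabary sign is mapped to
# the basic-syllabary sequence it alternates with in the variants cited above.
# This is an illustrative subset, not a complete or authoritative mapping.
EXPANSIONS = {
    "ra₂": ["ri", "ja"],  # a-ke-ti-ra₂ ~ a-ke-ti-ri-ja
    "nwa": ["nu", "wa"],  # pe-ru-si-nwa ~ pe-ru-si-nu-wa
    "pte": ["pe", "te"],  # pte-re-wa ~ pe-te-re-wa
    "twe": ["tu", "we"],  # o-da-twe-ta ~ o-da-tu-we-ta
    "dwe": ["de", "we"],  # te-mi-dwe-ta ~ te-mi-de-we-ta
    "a₂": ["a"],          # pa-we-a₂ ~ pa-we-a (aspiration otherwise unwritten)
}

def normalize(word):
    """Rewrite a hyphenated transliteration using basic-syllabary signs only."""
    signs = []
    for sign in word.split("-"):
        signs.extend(EXPANSIONS.get(sign, [sign]))
    return "-".join(signs)

def same_word(a, b):
    """True if two spellings differ only by the alternations in EXPANSIONS."""
    return normalize(a) == normalize(b)

print(normalize("a-ke-ti-ra₂"))                # → a-ke-ti-ri-ja
print(same_word("pte-re-wa", "pe-te-re-wa"))   # → True
print(same_word("pe-ru-si-nwa", "pa-we-a"))    # → False
```

A mapping of this kind only recovers alternations already known; it cannot by itself establish the value of a sign, which is why the values were worked out from attested alternations in the first place.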

20.8 Scribal ‘Creativity’: Inconsistencies or Scribal Choices?

Overall, some 12 new signs were introduced in LB by the scribes, in both the basic and the additional syllabary (see Melena 2014c: 84–88). The basic syllabary was expanded with seven o-series signs (do, jo, mo, no, qo, so, wo)⁹ and one e-series sign (pe). The additional syllabary was expanded with four 'doublets' (three signs of the a-series, a₂ /ha/, a₃ /ai/ and ra₃ /rai, lai/, and one of the o-series, ro₂ /rjo, ljo/, arguably giving /rro, llo/) and all the new 'complex' syllabograms (except nwa). The fact that LB introduced most o-series signs has led some scholars to argue that the model script, LA, and by implication the Minoan language, had a three-vowel system (/a/, /i/, /u/) (Palaima and Sikkenga 1999; objections in Davis 2014: 240–42, Melena 2014c: 86, Meissner and Steele 2017). The absence of pe from LA, which, at least graphically, has a number of e-series signs that continued into LB, remains puzzling and has yet to be convincingly explained.

As to the innovations in the additional syllabary, most of these signs were introduced either to express specific features more clearly or for use in specific contexts. I have already mentioned that a₂ /ha/ is the only sign that clearly marks aspiration; given that most of its occurrences are either word-initial or word-internal at the start of the second element of a compound, it has been suggested that a₂ may have originated as a demarcative sign used mainly at word-start or to mark a compound boundary (Melena 2014c: 74). Another sign created to suit a specific context is ra₃ /rai, lai/, which was used word-finally to mark the nominative plural of feminine nouns. Moreover, ra₃ appears to be a Pylian creation: it is widely used at Pylos in this morphological context by a number of scribes (H 1, 2, 4, 21, 31), but is never attested elsewhere. It is not, however, the only sign to have originated at Pylos (or at least to be limited to that site). The complex syllabogram two is so far attested only once, used by the Pylian scribe H 43 to write the man's name o-two-we-o (genitive singular). Its counterpart twe, on the other hand, is attested only at Knossos (where it is more widely employed) and never elsewhere. Whether this pattern of attestations is fortuitous, reflecting the partial state of preservation of the extant evidence, or genuinely meaningful remains to be ascertained.

What is interesting, however, is that the other two complex signs newly introduced in LB, namely dwo and dwe, show a comparable formation: they are labialized CCV signs belonging to the o-series and e-series. Moreover, we can be reasonably sure that dwo was created within LB. Its shape is made of two mirrored wo signs, in other words 'a pair of wo's'. It has been argued (Risch 1957: 32) that this graphic form itself indicates that dwo was created on the basis of the Greek language: the Greek numeral 'two' is δύο /duo/, so 'two wo's' (δύο + wo) suggests the reading dwo. In turn, dwe may well have originated from dwo by analogy. All in all, we may see here an underlying tendency to create pairs in an attempt to systematize the new introductions. It can also be noted that a large number of new signs belong to the vowel series most innovated in the basic syllabary (i.e. the newly expanded e- and o-series), and some of these signs seem to have been created by analogy. This is likely to have been the case for ro₂ /rjo, ljo/ (giving /rro, llo/), a sign arguably created by analogy with the inherited ra₂ /rja, lja/, given that we have no examples of an alternative spelling ro₂/-ri-jo (whereas the alternation ra₂/-ri-ja is attested). The same principle may also have operated in the creation of the new sign a₃, representing the diphthong /ai/, by analogy with the inherited sign a₄, which notes its /u/-diphthong counterpart /au/. A final note is worth adding about pte, as this sign stands out for not belonging to any series and is thus somewhat isolated in the structure of the syllabary.
It has been suggested (see most recently Melena 2014c: 69–70) that its phonetic value /pte/ developed from an original /pje/, in which the palatalized labial never became fully palatalized but was replaced by the cluster /pt/. Interestingly, and strangely enough given the assigned value, this sign is an LB innovation, not present in LA (or at least not yet found there), and it is used only to spell a limited number of words, mainly at Knossos and Pylos and sporadically at Tiryns.

By presenting and discussing the signs newly introduced in LB, this section has stressed the 'creativity' of the writers involved in adapting the script to the Greek language. Some creations were added to better account for Greek phonological features (e.g. aspiration); others appear to be more restricted in both context of use and attestation (e.g. ra₃). All in all, such examples deserve the label 'scribal creativity'.

20.9 Filling the Gaps and the Undeciphered Signs

Looking back at Tables 20.1 and 20.2, some irregularities catch the eye: why are there still empty slots in the syllabic series? Some of these empty slots are to be expected for phonetic reasons and can therefore be taken as 'structural gaps'. This is most likely the case for the slots corresponding to the sequences /(C)wu/ (including qu = /kʷu/), and perhaps also for the sequence /ji/ (although Melena 2014a: 83 suggests a value /ji/ or /zi/ for the undeciphered sign *63; for a possible, though still speculative, reconstruction of the general structure of the LB syllabary see Melena 2014c: 88–89, Del Freo and Perna 2019: 138–40). Other empty slots, by contrast, may simply reflect our still partial understanding of the LB syllabary and could be filled by some of the undeciphered signs. For example, the values /ju/, /zi/ and /zu/ are not implausible and would fill some of the incomplete series, so it is reasonable to look for them among the undeciphered signs.

In fact, although more than half a century has passed since the decipherment of LB, some 14 syllabograms (Table 20.3) are still enigmatic and remain untransliterated; they are hence called 'undeciphered' and are conventionally referred to by their classification number preceded by an asterisk (see the summary, with specific references, in Del Freo and Perna 2019: 134–35; Judson 2020). Here too, some of these signs were inherited from LA (Table 20.3, shaded slots), while others are attested only in LB. Most undeciphered signs are rarely used and usually occur in names of probably non-Greek origin (e.g. a female name spelled *18-to-no at Knossos). Moreover, many of them have a limited geographical distribution, being attested only at certain sites: for instance, signs *18, *47 and *49 are attested only at Knossos; sign *63 only at Pylos and Thebes; and signs *64, *83 and *86 only at Knossos and Pylos.

Thanks to alternative spellings with signs of the basic syllabary, as well as contextual analyses, some very speculative phonetic values have been proposed for a number of undeciphered signs. These have been put forward mainly by Melena (2014c: 88–89) and are given in Table 20.3 (with further references to specific discussions). For the time being, all these tentative values must be treated with caution, as most are not officially endorsed by the Mycenological Colloquia,¹⁰ and the hypotheses awaiting confirmation need to be tested in further studies. Indeed, as Judson's (2020) thorough analysis of the undeciphered signs points out, we are at present at an impasse: with very few exceptions, there is no way to prove incontrovertibly any of the values attributed to these signs. The only sign whose phonetic value can be determined with some security is *65 (Melena 2014a: 75–79, 81–83), which is likely to be read /ju/ because of some compelling spelling alternations (e.g. a place name variably spelled ri-*65-no, ri-u-no, ri-jo-no) and a possible etymological connection of the LB word i-*65 (nominative), alternating with i-je-we (dative), with the Greek word for 'son' (/ʰius/ > υἱός). If this reading gains general acceptance, sign *65 is likely to be moved to the basic syllabary to fill the slot ju /ju/.

Table 20.3 The LB ‘undeciphered’ syllabograms

Shading marks signs that share only their shape (not their phonetic value) with LA; unshaded signs are new LB introductions.

(drawings by the author based on actual attestations)

20.10 Beyond Signs: Tablet Layout

In addition to orthographic conventions, LB documents are characterized by a distinctive set of 'layouts' (i.e. ways of arranging textual information on the writing surface), which can be regarded as a kind of mise-en-page. The purpose of this strategy appears to have been to improve the legibility of the record as a whole, so that key information (e.g. place names, commodities listed, personnel involved) could be accessed quickly and at first glance. As such, textual structure itself can be taken as a carrier of information, as each item occupies a dedicated space. There are examples of 'capitalization', where the first word of the record is written in larger characters and stands out from the rest of the text (usually because of its importance). One such example is given in Figure 20.1, where a man's name (a-re-ke-se-u /Alekseus/) is written in 'capitals', followed by a place name (pa-i-to 'Phaistos') in smaller characters on the second line, with the items recorded (a flock of sheep) written as logograms followed by numerals at the far right. This arrangement emphasizes the name of the individual (in this case a shepherd) in charge of the flock and its location, as well as highlighting the overall size of the flock (100 sheep): a simple and effective method.

Figure 20.1 LB tablet from Knossos (KN Da 1156)

(drawing by the author after CoMIK, Chadwick et al. 1986–1998, vol. 2: 43)

Change in character size is one strategy used to separate words, but not the only one. The 'scribes' devised two other methods for marking word division: leaving a space between two consecutive words (as we do), or using a word divider in the shape of a short vertical stroke (represented by a comma in transliterations, as in Figure 20.1; on word division see Duhoux 1999: 227–36, Melena 2014c: 123–28, Del Freo and Perna 2019: 146–47, Meissner forthcoming). The latter method appears to have had an edge over the former, as it is the more frequently used (especially at Pylos), albeit not systematically. The word divider is a new introduction in LB, not present in LA. In some cases we may even speak of scribal hypercorrection, as word dividers are sometimes placed (unnecessarily) between a word and a logogram, between logograms, or even between a logogram and its numerical entry. There are, however, exceptions to the use of either method, namely formulae (nominal compounds) and clitic particles. Some words are simply juxtaposed without graphic separation: for example a-ne-mo-i-je-re-ja 'priestess' (i-je-re-ja /hiʲereiʲa/, alphabetic Greek ἱερεία) 'of the winds' (a-ne-mo /anemoːn/, alphabetic Greek ἀνέμων, genitive plural), and pa-si-te-o-i 'to all' (pa-si /pansi/, alphabetic Greek πα(ν)σί) 'the Gods' (te-o-i /tʰehoihi/, alphabetic Greek θέοισι, dative plural).
Clitics are usually attached to the start or end of their host word: o-u-di-do-si /ou didonsi/ 'they do not give', with the proclitic o-u- /ou/ 'not' (alphabetic Greek οὐ) preceding the verb; e-ke-qe /ekʰei kʷe/ '(s/he) has', with the enclitic -qe /kʷe/ (alphabetic Greek τε) following the verb (on Mycenaean particles see Salgarella 2018, 2019a, 2019b). As a last remark, in LB there are no cases of scriptio continua (also known as scriptura continua; see Chapter 18, this volume), nor are words ever split across lines (which, by contrast, happens quite often in LA). Moreover, writing lines are at times ruled to guide the direction of writing; this too is a new feature introduced in LB (only a few LA texts show ruling, and never consistently throughout the document). In conclusion, LB documents show neater writing on the tablet surface and appear to have improved on the mise-en-page for quicker information retrieval, resulting in an overall systematization of writing practice.

20.11 Conclusion

This chapter has discussed the methods scholars have used over the past decades to reconstruct the orthography of a writing system that was previously unknown and is no longer in use, and the challenges they faced in reaching as accurate an understanding as possible of both the writing system itself (Linear B) and the underlying language (Mycenaean Greek). The reconstruction of LB orthographic conventions was both context-based and context-driven: the historical and linguistic backdrop against which an existing writing system, LA, was adapted to render a different language, Greek, played a major role in a reconstruction that is still ongoing (e.g. the 'undeciphered' signs). It was a complex process, stretching from predecipherment (identification of recurrent patterns) to postdecipherment (systematic analysis of those patterns to work out orthographic conventions by comparing LB spellings with Proto-Indo-European and alphabetic Greek). This enabled scholars to draw up the 'rules' that governed the system and, by assessing deviations, to evaluate the extent to which they were adhered to. Indeed, orthographic variation is observable even within the time span of LB itself (variant spellings, site-restricted signs). The chapter has thus illustrated how an initially unknown orthographic system was reconstructed, and how that reconstruction contributed to a more subtle appreciation of the diachronic development of Greek, here seen in its 'Bronze Age snapshot'.
