Throughout this book, we have examined the novel computational methods of establishing family relationships among languages associated with the Gray–Atkinson approach, showing that they fail to provide an alternative to traditional techniques such as the comparative method and linguistic paleontology. It is for good reason that Kiparsky (2014: 65) calls the comparative approach “the gold standard” in historical linguistics and that Longobardi and Guardiano (2009: 1681) consider it to be “undoubtedly one of the greatest achievements of the human sciences”. Still, despite their significant triumphs, none of these time-tested methods can yet provide conclusive answers to a number of the key questions of linguistic prehistory. Although multiple lines of linguistic evidence, for example, point to an Indo-European homeland in or near the Pontic steppes (as discussed in the preceding chapters), the issue remains open. Continued uncertainty, however, does not mean that the outlook for more decisive studies is gloomy, much less that we should jettison our accumulated knowledge and well-honed methods so that we can reinvent historical linguistics as a quasi-biological science. In this concluding chapter, we consider the future outlook of historical linguistics in general and of Indo-European studies in particular.
In our view, the best opportunities – as well as the biggest challenges – for historical linguistics lie with its connections to other fields, both within and outside the broad discipline of linguistics. Cultural and physical anthropology, history, and philology all have something to contribute to our understanding of the differentiation and spread of languages. Recent advances in such fields as archeology (Heggarty 2014), geochemical fingerprinting (Kamber 2009), and human genetics (Underhill et al. 2010; Myres et al. 2011; Rootsi et al. 2012; Patterson et al. 2012; inter alia) shed new light on prehistoric population movements, which have far-reaching effects on the structure of language groups. As for the wider discipline of linguistics, we agree with Kiparsky (2014: 88) that historical linguistics
is situated at a crossroads where almost all branches of the field meet. A historical study might draw on processing and pragmatics, morphology and corpus linguistics, sociolinguistics and syntax, phonetics and formal language theory.
As a result of such developments, studies of language contact now put the analysis of language change on a firmer footing, making substratum and superstratum hypotheses, long considered with suspicion, empirically falsifiable, so that they can be advanced more confidently (Matras and Sakel 2007). In particular, it has been argued that contacts both within the Indo-European family and between Indo-European and other families have had profound effects on language change (see Stilo 2004; Aikio 2006; Schrijver 2009, 2011; Kiparsky 2012; Filppula 2013; and others). In another promising development, the enhanced documentation of understudied languages, in conjunction with advances in theoretical linguistics and language typology, allows for more detailed and encompassing comparative work. Experience gained in studying one particular language family has also been successfully applied in examining other families. Sociolinguistic studies of ongoing sound changes (e.g. Labov et al. 2013) allow for a better understanding of similar changes in the past. Corpus linguistics permits a more fine-grained analysis of phenomena involving variation – and ultimately, historical change – conditioned by frequency (cf. the study of English vowel syncope by Bybee 2007). Needless to say, such enhanced understanding of language change clarifies language relatedness.
But perhaps the most important change in historical linguistics derives from what Kiparsky (2014: 88) calls “breaching […] the Saussurian firewall between synchrony [i.e. theoretical linguistics] and diachrony [i.e. historical linguistics]”. Ironically, the parallels here to biology are much stronger than Gray and Atkinson would admit. In their essay on “curious parallels and curious connections” between biology and historical linguistics, Atkinson and Gray (2005: 521–524) note a number of methodological challenges shared by the two fields, such as “developing algorithms to determine the probability that lexical characters are cognate”, “model fitting and comparison”, “developing methods to investigate reticulate evolution”, and the like. Curiously, what they fail to note is the more general point: just as the biological classification of species, originally based on externally accessible characteristics, underwent a revolution on the grounds of progress in theoretical biology (i.e. the rise of molecular genetics) so too progress in the phylogenetic classification of languages must be based on progress in theoretical linguistics. Thus, only linguists – not biologists – can push the research frontier forward by identifying the basic building blocks of language – its “atoms”, in Mark Baker’s memorable metaphor (Baker 2001b) – and by examining carefully how they play out in linguistic evolution.
The most important paradigmatic change in linguistics, one continuously alluded to in this book (see especially Chapters 3 and 4), is the realization that language is not merely words. Thus if the main problem with the Gray–Atkinson method lies in its exclusive use of lexical material, could some of the same procedures and algorithms that they use be applied to phonological or grammatical characteristics? The remainder of this chapter considers this very question.
The “atoms” of sound: distinctive features
One possible approach, pioneered by Heggarty (2000), is to quantify linguistic distance (i.e. the degree of similarity/difference between any two language varieties: e.g. Italian and French) and the magnitude of change over time (i.e. linguistic difference between stages of a given language: e.g. Latin and Italian) in phonetic terms. Heggarty’s starting point is an observation pertaining to words cognate with the Latin castellum ‘castle’. As he argues, “it scarcely takes a linguist to tell that Italian castello /kas'tεllo/, while indeed different from Spanish castillo /kas'tiʎo/, is far less different from it than is French château /ʃato/” (2000: 533). But how can one capture such an intuition in an objective manner? Heggarty’s answer is to apply our knowledge of phonetics and cross-linguistic typology. After aligning correspondent phonemes (a methodological step that is itself more complicated than it seems; see Heggarty 2000: 544–547), one can look for phoneme differentiation. For example, by aligning the phonetic representations of Italian castello and French château, one notes that the Italian /k/ corresponds to the French /ʃ/ (as a result of the sound change in French, discussed in detail in Chapters 4 and 8), the /s/ is missing from the French but not the Italian word, and so on.
However, it quickly becomes clear that the number of different phonemes (as well as deleted or inserted phonemes), known as the Levenshtein distance, is still an inadequate measure of language distance. For instance, the English word man differs from bad by two phonemes, the same number that characterizes the difference between man and bin. However, the correspondences between /m/ and /b/ and between /n/ and /d/ are surface realizations of the same pattern: the loss of nasality (as can be easily ascertained by pronouncing the word man with a severe cold or by pinching one’s nose shut). In contrast, the two phonemic differences between man and bin involve different patterns: the loss of nasality on the consonant and the change in vowel quality. Moreover, phonemes may not be the best units in which to measure change, as the examination of distinctive features offers a number of advantages. For example, the phonemic difference between tap and tab and between tap and tan amounts to one phoneme in each case. However, /p/ and /b/ share two features and differ in one: in the place of articulation both are bilabial, and in the manner of articulation both are stops, the only difference being that of voicing: /p/ is voiceless and /b/ is voiced. In contrast, /p/ and /n/ differ in three features: place of articulation (bilabial vs. dento-alveolar), manner of articulation (or nasality, depending on analysis of the oral/nasal distinction), and voicing (voiceless vs. voiced). Thus, the difference between /p/ and /n/ is significantly greater than that between /p/ and /b/; consequently, the latter correspondence is much more frequently attested, both in synchronic patterns (e.g. word-final devoicing in Russian: grib ‘mushroom’ and grip ‘flu’ are pronounced the same, [grip]) and in historical change. A change from /p/ to /n/, or vice versa, on the other hand, is virtually unattested. 
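The contrast between counting phonemes and counting features can be made concrete with a short sketch. The code below is only an illustration of the reasoning in this paragraph, not Heggarty’s own procedure: the phonemic transcriptions, the three-feature inventory, and the function names are simplifying assumptions introduced here for the example.

```python
# Illustrative sketch only: transcriptions and the three-feature inventory
# are simplified assumptions, not Heggarty's actual feature system.

def levenshtein(a, b):
    """Minimum number of substitutions, insertions, and deletions
    needed to turn phoneme sequence a into phoneme sequence b."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]

# (place, manner, voicing) for the consonants discussed in the text.
FEATURES = {
    "p": ("bilabial", "stop", "voiceless"),
    "b": ("bilabial", "stop", "voiced"),
    "m": ("bilabial", "nasal", "voiced"),
    "n": ("alveolar", "nasal", "voiced"),
    "d": ("alveolar", "stop", "voiced"),
}

def feature_differences(x, y):
    """Number of features in which two consonants differ."""
    return sum(f != g for f, g in zip(FEATURES[x], FEATURES[y]))

man, bad, bin_ = ["m", "æ", "n"], ["b", "æ", "d"], ["b", "ɪ", "n"]

# Phoneme counting cannot tell the two pairs apart.
print(levenshtein(man, bad), levenshtein(man, bin_))

# Feature counting separates /p/~/b/ from /p/~/n/.
print(feature_differences("p", "b"), feature_differences("p", "n"))
```

Under these assumptions, both word pairs come out at the same phoneme-level distance, while the feature count distinguishes the single-feature correspondence /p/ ~ /b/ from the three-feature correspondence /p/ ~ /n/.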
Assuming that all features are assigned equal weight, the “overall similarity rating” for /p/ and /b/ would be 2/3, or 67 percent. Heggarty further maintains, however, that different features should be assigned different weights; for example, on the grounds that the place of articulation and the manner of articulation “are standardly used to bear more phonemic distinctions than is [voicing], the similarity rating for [p] and [b] […] actually emerges from the calculations not at 2/3, but nearer 4/5, or 80 per cent” (2000: 543).
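The arithmetic behind these two ratings can be sketched as follows. The equal-weight case reproduces the 2/3 figure directly. The unequal weights (2 for place, 2 for manner, 1 for voicing) are purely hypothetical values chosen here because they happen to yield 4/5; Heggarty’s actual weighting scheme is not reproduced in this chapter and may well differ.

```python
from fractions import Fraction

# Feature values for /p/ and /b/ as described in the text.
P = {"place": "bilabial", "manner": "stop", "voicing": "voiceless"}
B = {"place": "bilabial", "manner": "stop", "voicing": "voiced"}

def similarity(x, y, weights):
    """Weighted share of features on which two sounds agree."""
    total = sum(weights.values())
    shared = sum(w for name, w in weights.items() if x[name] == y[name])
    return Fraction(shared, total)

# All features count equally.
EQUAL = {"place": 1, "manner": 1, "voicing": 1}

# HYPOTHETICAL weights, for illustration only: place and manner
# count twice as much as voicing.
HYPOTHETICAL = {"place": 2, "manner": 2, "voicing": 1}

print(similarity(P, B, EQUAL))         # 2/3
print(similarity(P, B, HYPOTHETICAL))  # 4/5
```

The point of the sketch is only that down-weighting voicing relative to place and manner raises the rating for /p/ and /b/; the particular numbers depend entirely on the weights assumed.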
The overall results of Heggarty’s calculations accord well with common intuitions about language similarities as well as with the accepted phylogenetic classification of the languages that he examines. For example, Russian is more similar to other Slavic languages (with an average similarity rating based on forty cognates of 72.5, where 100 means full identity) than it is to modern Romance (37.6) or modern Germanic languages (39.3). In contrast, Norwegian is more similar to other modern Germanic languages (57) than to either modern Romance (40.8) or modern Slavic languages (39.3). In the comparison of modern Romance languages, Italian is the closest to Latin (63) and French is the most distinct (36). (Curiously, these results even confirm our impressionistic personal intuition that Portuguese bears more resemblance to Polish than to any other Slavic language.)
While these results are certainly interesting, they are not without problems. The first difficulty is evident if we compare the similarity ratings presented by Heggarty (2000: 539, 551) in the three charts; the relevant data are summarized in Table 11.1.
These data make it clear that the similarity ratings depend heavily on the cognate set employed – that is, the list of words on the basis of which the similarity is calculated. Going back to the original observation about the cognates of the Latin castellum ‘castle’, the calculations performed here confirm the common intuition that Italian and Spanish are more like each other – in fact, more than twice as much – than either is like French. However, the difference is much less conspicuous if we compare cognates of numerals from ‘one’ to ‘ten’: here the similarity between Italian and Spanish is only 1.5 times greater than that between Italian and French. Based on the first cognate set, Italian and Spanish are 90 percent similar, whereas based on the numerals they are merely 68 percent similar; based on the more expansive set of forty cognates, Italian and Spanish are 72 percent similar. These results confirm the point we stressed in Chapter 4: the choice of the data set can predetermine, or at least bias, the results. In this respect, Heggarty’s approach shares many of the flaws of the Gray–Atkinson approach.
Two additional problems emerge when Heggarty’s method is used to quantify change over time or the time depth of a particular extinct language (or an earlier stage of an extant language). First, to use his methods effectively one would need to know the precise pronunciation of the given words in the given extinct language. For such languages as Latin or Anglo-Saxon (Old English), pronunciation is fairly well understood, but for Proto-Indo-European (PIE), the manner in which reconstructed words were pronounced is notoriously controversial. We have no indisputable PIE pronunciation table to compare with those of its extant or intermediate descendants. The second problem is that rates of change (calculated as “retained phonetic similarity” between ancestral language and its descendant subtracted from 100 percent) vary from language pair to language pair. As can be deduced from Table 11.1, Latin changed vastly more in becoming modern French than in turning into any other Romance language. This observation is further confirmed by “retained phonetic similarity” ratings, in which Latin to French is rated at 36 percent, whereas Latin to Italian is 63 percent, Latin to Spanish or to Romanian 57 percent, and Latin to Portuguese 54 percent. Since all of these modern languages developed from Latin over the same amount of time, the resulting rates of change in different Romance branches vary from 20.2 percent (for Italian) to 39.2 percent (for French). The rate of change in the development from Anglo-Saxon to modern English, which took much less time, is comparable to the rate of change from Latin to French, 37.4 percent. Note, however, that the rate of change for English and French is nearly four times greater than that for Modern Greek emerging out of classical Greek (10.1 percent). It is distinctly possible, moreover, that some languages changed faster than French or more slowly than Greek. 
Thus, the average rate of change in phonetics is about as meaningful as the average temperature of patients in a hospital. Heggarty (2000: 554) admits as much: “rates of change in phonetics seem quite unable to give us a dating tool of any precision at all”.
To summarize, neither phonemes themselves nor distinctive phonemic features appear to be good elements for quantifying the linguistic distance between languages and for deducing from such a measurement their relatedness. As a result, a different type of comparanda must be identified.
Phonological/morphological rules and typological features as the “atoms” of change
A promising alternative approach has been taken by Ringe et al. (2002), who compare and quantify phonological and morphological changes rather than phonological and morphological elements. While this approach is theoretically attractive, as it avoids most problems identified for both the Gray–Atkinson model and that of Heggarty, it suffers from another flaw: there are few if any such characters that can be applied meaningfully to the classification of Indo-European languages on all levels. Ringe and his colleagues identify twenty-two phonological characters and fifteen morphological characters usable for high-order groupings within the Indo-European family. However, these phonological and morphological changes are not appropriate for identifying many lower-level subgroupings. Conversely, phonological and grammatical changes that are useful for identifying such lower-level subgroupings are not necessarily helpful for high-order groupings.
A good example of such complexities is the presence or absence of pleophony, discussed in Chapters 4 and 5, in connection with the proper placement of Polish within the Slavic tree. Pleophony is useful for subgrouping Slavic languages, but has no value for the rest of the Indo-European tree. Similarly, the lenition of intervocalic consonants, which can be reflected in voicing, spirantization, or even deletion, helps distinguish Western Romance languages (those spoken north and west of the so-called La Spezia–Rimini Line) from those in the South Romance (or Italo-Romance) and Eastern (or Balkan) Romance groupings. Another innovation, which correlates with intervocalic voicing, is the use of -s to mark plurals of nouns regardless of gender or declension in Western Romance languages, rather than the change of the final vowel, as in Eastern and South Romance languages. Both of these innovations are illustrated in Table 11.2. Yet, these developments are of no use for classifying languages outside the Romance family.
Table 11.2 Intervocalic lenition (voicing, spirantization, or deletion) and plural -s in Romance languages, illustrated with the singular and plural forms of the words for ‘life’ and ‘wolf’
Yet, selecting phonological and grammatical changes as comparanda also depends crucially on one’s theoretical assumptions. As McMahon and McMahon (2008: 277) remind us,
the features used will depend on the depth of linguistic analysis, and often on the theoretical model being used, as different constructions may be recognized in different theories. Our historical knowledge of structural features is also less certain, and, for example, it is not currently clear whether some morphosyntactic characteristics might be more prone to borrowing than others, or might arise spontaneously from a number of different sources, and hence be erroneously thought to indicate relatedness.
In fact, attempts have been made by the Gray–Atkinson team to construct phylogenetic language trees based on typological rather than lexical information. For instance, Greenhill, Atkinson, et al. (2010) analyzed a global dataset based on 99 languages and 138 typological features compiled in the World Atlas of Language Structures (WALS; Dryer and Haspelmath 2013). The NeighborNet analysis of the data by Greenhill, Atkinson, et al. correctly grouped some of the languages in this database into known language families, such as Indo-European, Altaic, and Nakh-Dagestanian (Northeast Caucasian). However, other well-established families, including Sino-Tibetan, Uralic, Austronesian, and Trans-New Guinea, were not recovered. These authors also point out a substantial number of conflicting signals (represented in NeighborNet diagrams by box-like structures), leading to an inaccurate recovery of many well-attested phylogenetic relationships within major language families. For instance, the network links German to French rather than to English, a much more closely related language. When applied to a lexical rather than grammatical dataset, the same method recovered a much more tree-like signal, leading the authors to conclude that “the lexical data were a significantly better fit to the expected family trees than the typological data” (Greenhill, Atkinson, et al. 2010: 2446).
The problem, however, is not that typological features are inappropriate for phylogenetic work in general, but rather that we still lack an accurate and comprehensive list of all typological attributes that define cross-linguistic variation. As good a database as WALS (Dryer and Haspelmath 2013) is, with its catalog of over 140 typological features across 2,561 languages, many if not most of the features that it lists are epiphenomenal, questionable, or contain information duplicating that of other features. For instance, the Order of Genitive and Noun (Feature 86A) is epiphenomenal, as several unrelated types of both Genitive–Noun and Noun–Genitive constructions should be distinguished, as discussed by Longobardi (2001) and Crisma (in press). Another problem can be illustrated by the order of subject, object, and verb (Feature 81A), which is a composite of two related features: the order of subject and verb (Feature 82A) and the order of object and verb (Feature 83A). Even if only the order of subject, object, and verb is considered, the validity of this feature is dubious for at least three types of languages. First, in languages with Philippine-style topic marking, such as Tagalog and Malagasy, one of two elements may be considered the subject: the semantic subject (i.e. whoever did the action) or the grammatical subject, which is reflected on the verb in the form of topic marking morphology (Guilfoyle et al. 1992). Second, in many syntactically ergative languages the word order is determined not by whether a given element is a subject or an object, but by its case marking: ergative vs. absolutive. Third, in non-configurational languages, such as Warlpiri, the predominant word order of subject, object, and verb cannot be determined at all.
Moreover, many of the features listed in the WALS database describe superficial patterns rather than the deep design properties responsible for these patterns. For instance, the order of object and verb and the order of adposition and noun phrase (Feature 85A) may be determined by one and the same factor – the head-directionality parameter, which determines the relative ordering of the head (e.g. verb or preposition) and its complement (Travis 1984; Baker 2001b). As a result, most Verb–Object languages (92 percent) have prepositions, whereas an even greater majority of Object–Verb languages (97 percent) have postpositions.
The “atoms of language”: parameters
As can be seen from the brief discussion above, identifying deep-design typological features that truly underpin cross-linguistic variation and change – called “parameters” – is a matter of heated theoretical debate. Even the number of parameters has been contested, with some scholars suggesting a figure between ten and twenty, and others claiming that the number of parameters must be “at least in the hundreds” (Longobardi and Guardiano 2009; this issue is discussed in greater detail in Baker 2001b). But even if a comprehensive list has not yet been compiled, pioneering work is ongoing to use parameters (or rather parameter values, to which we return below) as comparanda in determining phylogenetic relationships, most notably by the LanGeLin (Language and Gene Lineages) project headed by Giuseppe Longobardi at the University of York (Longobardi 2003, 2005; Guardiano and Longobardi 2005; Longobardi and Guardiano 2009; Longobardi et al. 2013).
Taking as their point of departure the groundbreaking work of Nichols (1992) and Dunn et al. (2005), who applied phylogenetic concerns and methods to “language structure”, Longobardi and his team “explore the historical significance not of surface generalizations [like those found in the WALS database], but of syntactic parameters, which should encode the rich implicational relations supposedly connecting distinct observable phenomena at the level of abstract cognitive structures” (Longobardi and Guardiano 2009: 1683). Longobardi’s project is couched in the Principles-and-Parameters framework, developed since the publication of Chomsky’s (1981) Lectures on Government and Binding. According to this theory, the invariant human language faculty, or Universal Grammar (UG), predefines
a set of open choices between presumably binary values […] closed by each language learner on the basis of his/her environmental linguistic evidence […] grammar acquisition should reduce, for a substantial part, to parameter setting, and the core grammar of every natural language can in principle be represented by a string of binary symbols […] each coding the value of a parameter in UG.
To illustrate how such parameters work, consider the so-called wh-parameter, which pertains to the placement of question-words (who? what? where? and the like). Its binary options are: (a) question-words must appear sentence-initially, or (b) they do not have to do so. In English, Spanish, and Russian, the former option is instantiated (the parameter is set as “yes”, or “fronting”), whereas in Chinese and Japanese, the latter option is chosen (the parameter is set as “no”, or “no fronting”). For example, in Japanese ‘Who did John kick?’ is rendered as John-ga dare-o butta ka? (literally, ‘John who kicked?’). Crucially, the question word dare-o does not appear sentence-initially. (The placement of the object dare-o before the verb butta, in contrast to English, is controlled by a separate Head directionality parameter, discussed immediately below.)
The list of widely discussed parameters (see Baker 2001b) includes the Polysynthesis parameter, which determines whether a verb must include some expression for each event participant (subject, object, indirect object), either via agreement morphemes or via incorporation (set as “yes” in Mohawk, “no” in English); the Head directionality parameter, which determines whether heads (verbs, adpositions, auxiliaries) precede their complements (set “yes” for English, “no” for Japanese); the Subject side parameter, which determines whether the subject is placed sentence-initially (set “yes” for English, “no” for Malagasy); the Verb attraction parameter, which determines whether the verb precedes certain adverbs such as often and its counterparts in other languages (set “yes” for Welsh, “no” for English); the Subject placement parameter, which determines whether the subject precedes the verb (set “yes” for French, “no” for Welsh); and the Pro-drop parameter, which determines whether the subject can be omitted if its reference is understood from context (set “yes” for Spanish, “no” for French). With only these six parameters in mind (in that order), French can be described as: “no”, “yes”, “yes”, “yes”, “yes”, and “no”. Because a particular value of one parameter can entail the irrelevance of another parameter, only the first four of the six parameters can be set for English: “no”, “yes”, “yes”, and “no”; the last two parameters are undefined for English.
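The idea of describing a language as a string of parameter values, some of which may be undefined, can be sketched in a few lines. The encoding below uses the six parameters in the order just given, with the French and English values stated in the text. The distance function is an assumption made for illustration: it counts disagreements only over parameters that are defined in both languages (a normalized Hamming-style measure). It should not be read as the exact metric used in any particular study.

```python
from fractions import Fraction

# The six parameters, in the order used in the text.
PARAMETERS = (
    "polysynthesis",
    "head_directionality",
    "subject_side",
    "verb_attraction",
    "subject_placement",
    "pro_drop",
)

# True = "yes", False = "no", None = undefined (irrelevant) for that language.
FRENCH = (False, True, True, True, True, False)
ENGLISH = (False, True, True, False, None, None)

def parametric_distance(lang_a, lang_b):
    """Share of differing values among parameters defined in BOTH languages.

    Illustrative assumption: undefined parameters are simply skipped.
    """
    comparable = [(a, b) for a, b in zip(lang_a, lang_b)
                  if a is not None and b is not None]
    if not comparable:
        raise ValueError("no parameter is defined in both languages")
    differing = sum(a != b for a, b in comparable)
    return Fraction(differing, len(comparable))

# French and English differ only in verb attraction among the four
# parameters that are defined for both.
print(parametric_distance(FRENCH, ENGLISH))  # 1/4
```

Skipping undefined values, rather than counting them as mismatches, reflects the observation that a parameter made irrelevant by another setting carries no independent information about the language.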
According to Longobardi and Guardiano (2009), the parametric framework represents, mutatis mutandis, theoretical progress in linguistics parallel to that of the rise of molecular genetics in biology. The key similarity between parameters and genetic markers is that both “form a universal list of discrete options” (2009: 1684). As such, the parametric approach is applicable to any set of languages, no matter how different, unlike the classical comparative method based on lexical cognates, which cannot apply to languages so distinct that no reliable sound correspondences can be identified. Another similarity between the parametric approach and the use of genetic markers in biology is that in both cases, like is compared with like: the value of a parameter in one language is compared to the value of exactly the same parameter in other languages. Also like many genetic polymorphisms, parameters are “virtually immune from natural selection and, in general, from environmental factors [and] largely unaffected by deliberate individual choice” (2009: 1686). Therefore, parameters and their values in individual languages can serve as ideal comparanda for a phylogenetic study.
For practical reasons, Longobardi and Guardiano (2009) limit themselves to parameters pertaining to the structure of noun phrases, examining sixty-three binary parameters as set in twenty-eight languages (twenty-three extant and five extinct languages). Twenty-two of these languages are from the Indo-European family; the non-Indo-European languages in the sample are Hebrew, Arabic, Wolof, Hungarian, Finnish, and Basque. The tree generated from the syntactic differences (see Figure 6 in the Appendix) “meets most of [the authors’] expectations” (2009: 1693). Basque, usually treated as an isolate, is the first outlier. Wolof, a West Atlantic language that has never been connected to any European or Mediterranean language, comes second. Both Basque and Wolof are clearly recognized as external to a node coinciding with the so-called Nostratic grouping (cf. Pedersen 1931, 1951; Illič-Svityč 1971/1984; Dolgopolsky 1988; Bomhard 2008, 2011). The next outmost bifurcation singles out the (West) Semitic subgroup, containing Arabic and Hebrew. The Uralic (Finno-Ugric) family (Finnish and Hungarian in the sample) emerges correctly as well.
The branching within the remaining Indo-European family is overwhelmingly the expected one, although a few surprises are found. Among these unexpected patterns is the grouping of Slavic with Hindi, possibly explainable as reflecting the deep unity of satem languages (see Chapter 4). Since no other Indic or Indo-Iranian languages were included in the study, it is hard to evaluate how severe this problem is. The three Slavic languages used in the study – Russian, Serbo-Croatian, and Bulgarian – are connected into a Slavic group, and Hindi emerges as an outlier of this group. One could say, however, that the grouping of Hindi with the three Slavic languages reflects the so-called “Core IE languages”, a branch of the IE family consisting of Balto-Slavic and Indo-Iranian languages (see Figure 4 in the Appendix and Ringe et al. 2002). A second problem concerns the extinct Germanic languages: while modern Germanic languages – Norwegian, English, and German – form one cluster within Germanic, the two extinct languages in the sample – Gothic and Old English – are shown as forming another distinct cluster. This tree fails, in other words, to reflect the connection between Old English and Modern English. Longobardi and Guardiano (2009: 1693) account for this configuration as an effect of time: “the two ancient varieties, chronologically closer to the common source, will naturally attract each other, and indirectly affect the position of German” as an outlier among the three modern Germanic languages. This seems to be more of an issue with the Bayesian phylogenetic method than with the use of grammatical rather than lexical data. An alternative explanation for the odd grouping of English and Norwegian is that it is a reflection of the Scandinavian influence on English (see Chapter 4).
Another surprising placement is that of Grico, a Greek variety spoken in Italy, which is grouped with Eastern Romance (Romanian) as an outlier branch within the Romance subfamily. This mistake is likely attributable to areal influence. Curiously, if only modern languages are considered (Longobardi and Guardiano 2009: 1701, their Figure 5), Grico is grouped with Greek and not with the Romance languages. We have no explanation for this shift in grouping. It is hard to tell from the available data whether the method indeed works better when only modern languages are considered (the only improvement being the correct treatment of Grico), but if this in fact turns out to be the case, it would be an additional advantage of Longobardi and Guardiano’s method over previous computational phylogeny techniques, which seem to work better if ancient languages are included as well.
Perhaps the clearest departure from the traditional family tree in this approach concerns Slavic languages: of the three languages considered, Russian and Serbo-Croatian are grouped together, with Bulgarian as the outlier among the three. The traditional classification, in contrast, places Bulgarian with Serbo-Croatian in the South Slavic grouping, whereas Russian falls into the East Slavic branch. However, the classification produced by Longobardi and Guardiano may not be all that surprising: it has long been noted that Bulgarian differs significantly from Serbo-Croatian because of influences from the Balkan Sprachbund. The most notable difference between Bulgarian and Serbo-Croatian is the presence of post-posed articles in Bulgarian (for Longobardi and Guardiano this characteristic of the nominal system is associated with parameter 12: the “+” value is assigned to the three languages in the sample with post-posed articles: Romanian, Bulgarian, and Norwegian). As a result of such distinctive features, some Slavic scholars (cf. Sussex and Cubberley 2006: 42) have proposed a four-way division of Slavic languages into (North-)West, (North-)East, South-West (Serbo-Croatian and Slovenian), and South-East (Bulgarian and Macedonian). Longobardi and Guardiano (2009: 1693) further note that Bulgarian and Romanian “continue to be well-behaved Slavic and Romance languages, respectively, with opposite values for parameter 45”. One possible approach is “to argue that this persistence in 45 makes the two languages very different in other subtler surface properties, which go beyond the simplest noun-article phrases” (2009: 1693).
While at this stage of analysis there is no conclusive list of parameters that may be more susceptible to areal interference, it is a promising avenue of research to look for ways “to single out genetic from areal sources of similarities” (2009: 1693). At any rate, it must be stressed that horizontal transmission (i.e. borrowing) of grammatical patterns is much less common than lexical borrowing (see Chapter 4). Grammatical borrowing requires prolonged and intense contact between linguistic groups, while lexical borrowing is much more ubiquitous. Therefore, interference effects in grammar are bound to be both less pervasive and more easily identifiable on geographical grounds. For example, Russian has borrowed a significant portion of its vocabulary from English and French, but not its grammatical patterns.
Bortolussi et al. (2011) argue that the findings resulting from the application of the Parametric Comparison Method (PCM) are significantly beyond chance. The reliability of the PCM has been further tested in Longobardi et al. (2013), which applies the method – in order to verify and validate it – to domains whose phylogeny is already known. This study consisted of “some experiments performed on a selection of 26 contemporary Indo-European varieties belonging to the Romance, Greek, Germanic, Celtic, Slavic, Indic and Iranian families” (2013: 123). Nine additional contemporary Indo-European languages were analyzed: Sicilian, (Northern) Calabrese, Bovese Greek, Danish, Icelandic, Slovenian, Polish, Farsi, and Marathi.
The PCM identified the main subfamilies of Indo-European “strikingly well” (2013: 124); importantly, horizontal transmission does not seem to limit the effectiveness of this method seriously enough to undermine the correct representation of the vertical (i.e. phylogenetic) relations. Moreover, the PCM is shown to be able “to reconstruct chronologically deep phylogenies using exclusively modern language data, often the only available data outside Eurasia” (2013: 123).
While it is clear that the Parametric Comparison Method (PCM) needs much fine-tuning and extension, the overall conclusion that has emerged from the LanGeLin project is that generative syntax, “and more generally the bio-cognitive framework of which it is a salient part […] can […] become a true historical science, capable of gaining insights into the actual (pre)history of human populations, no less than the successful historical-comparative enterprise of the 19th century” (Longobardi et al. 2013: 124). We find such conclusions largely convincing.