How to Build a Language

Ian Roberts

doi:10.1017/9781316576595.011

10 How to Build a Language: Language Typology and Universals

In Brazil, one can always find a five-legged cat.

(attributed to Voltaire)

Up to now, I’ve mostly talked about English. The reason for this is very simple: that’s the one language I can be sure you know or you wouldn’t be reading this book in the first place. And English is a real language, with lots of native speakers (about 400 million, in fact, making it third in the world after Mandarin and Spanish) and a fairly well-documented history. But there are plenty of other languages around, somewhere in the region of six to seven thousand according to most estimates. Of these, we now have some kind of linguistic description of about two thousand or more; most of the available information can be found on-line at the database of the World Atlas of Language Structures (www.wals.info). So now it’s time to take a look at what’s out there. There are two reasons for doing this. First, it’s intrinsically interesting to see the world’s linguistic flora and fauna. Second, as we’ll see, there appear to be patterns in the variation we observe among the world’s languages. We’re interested in patterns, as I’ve mentioned a couple of times. But, picking up the thread from the end of the last chapter, these patterns might conceivably be telling us something about Universal Grammar. So looking at diversity always involves looking at the other side of the coin: trying to see what does not vary and may therefore be universal and so, perhaps, innate. I’m going to concentrate on syntactic diversity and universals, partly for reasons of brevity and partly because this is where most of the interesting work has been done (although there’s plenty to say about phonological and morphological diversity too).

The pioneer in the field of language typology was Joseph Greenberg. In his original research during the early 1960s, Greenberg looked at thirty languages from around the world and observed about forty-five universals of different types. One of the things he looked at was the basic order of the main elements in a simple sentence: subject (S), verb (V) and object (O). We saw in Chapter 7 that we can distinguish VO languages like English, where the verb normally precedes the object (as in eat a mouse) from OV languages like Japanese where the order is the opposite (a mouse eat). In Greenberg’s early work, the subject was brought into the picture too, so we talk about SVO languages like English (Clover ate a mouse) as opposed to SOV languages like Japanese (Clover a mouse ate).

So, S, V and O give you three basic elements in the sentence. That means there are six logically possible permutations. All six are found in the world’s languages, which may not come as a surprise. But what’s interesting is that the incidence of the orders is highly skewed. Two orders are much commoner than all of the others, SVO and SOV, and two other orders, OVS and OSV, are extremely rare (in fact, they were not known to exist for sure until the 1970s). The picture is summarised by the data from the 1,377 languages analysed for word order in the World Atlas of Language Structures in (1):

(1)

- SOV (Clover a mouse ate): 565 languages = 41 percent
- SVO (Clover ate a mouse): 488 languages = 35 percent
- VSO (Ate Clover a mouse): 95 languages = 7 percent
- VOS (Ate a mouse Clover): 25 languages = 2 percent
- OVS (A mouse ate Clover): 11 languages = 0.8 percent
- OSV (A mouse Clover ate): 4 languages = 0.3 percent

You might have noticed that the percentages only add up to 86.1 percent. The other 13.9 percent (189 languages) are languages for which a single basic order can’t be isolated for one reason or another. But it’s absolutely clear that there is a skewing here. We ought to expect each order to show up a sixth of the time (or if we allow a category of “no dominant order”, thinking of the missing 14 percent, a seventh) which would be in the region of 15 percent. But what we see is a big preference for SOV and SVO and a big aversion to OVS and OSV. Why should this be? There are basically four possible answers to this question.
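The arithmetic behind these figures is easy to check. The following is a minimal sketch in Python that uses only the counts quoted above; the names `COUNTS`, `TOTAL` and `percent` are made up for this illustration and have nothing to do with the WALS database itself.

```python
# Counts as quoted in the text (1,377 languages analysed for word order).
COUNTS = {
    "SOV": 565,
    "SVO": 488,
    "VSO": 95,
    "VOS": 25,
    "OVS": 11,
    "OSV": 4,
}
TOTAL = 1377


def percent(count, total=TOTAL):
    """Share of the sample, expressed as a percentage."""
    return 100 * count / total


classified = sum(COUNTS.values())   # languages with a single basic order
no_dominant = TOTAL - classified    # languages with no single basic order

for order, count in COUNTS.items():
    print(f"{order}: {count} languages = {percent(count):.1f} percent")
print(f"no dominant order: {no_dominant} languages = {percent(no_dominant):.1f} percent")

# What an even spread over six (or seven) categories would look like.
print(f"one sixth   = {100 / 6:.1f} percent")
print(f"one seventh = {100 / 7:.1f} percent")
```

Computed directly from the counts, the 189 remaining languages come to about 13.7 percent of the sample; the 13.9 percent in the text is what is left over when the rounded percentages are subtracted from 100. Either way, the even-spread baselines (roughly 14 to 17 percent) are nowhere near the observed figures.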

The first is that it isn’t really true. We’ve only looked at around a third of the world’s languages here, and so maybe the other 4,000 or so will turn up a whole lot more OVS and OSV languages. That’s of course possible, but as our databases have grown over the years, the basic tendencies have remained pretty constant; the only thing which has really changed has been the discovery of the OSV and OVS languages, which are so rare (and mostly spoken in inaccessible parts of the Amazon jungle) that they were only discovered quite recently. Every survey so far has shown SVO and SOV to be the big winners.

A second answer is that it’s due to history. Languages can change their orders (we saw that when we looked at syntactic reconstruction in Chapter 7), and so maybe they tend to change in the same ways and, perhaps starting from a more even distribution in the distant past (something we can never know for sure), they’ve moved towards SOV and SVO. But this only shifts the question: why should languages all change that way? Of course, it could be a coincidence but it’s a very odd one, and we don’t like coincidences anyway.

The third possibility is to appeal to psycholinguistics. Maybe we prefer to ‘process’ the favoured orders. Perhaps SVO and SOV orders are just, to put it very simply, easier to use, or to understand. This view is quite influential, and there is some psycholinguistic evidence to support it (we didn’t go into this area of psycholinguistics in Chapter 9 so I won’t say any more here).

The fourth possibility is that Universal Grammar, while obviously allowing all of the orders, prefers SOV and SVO. Perhaps the PS-rules that generate these orders are somehow more natural than others. Remember from Chapter 4 that the verb and its direct object form a constituent, the VP, in English. SOV and SVO are the two orders which need the basic PS-rule S → NP VP (with the expansion of VP determining whether you have OV or VO), while the others seemingly have something else. This view is also influential, but as you can see it depends on working out which syntactic rules might be favoured and why. That can be done, but it would take us far beyond what can be looked at here.

So you can see that even a set of cross-linguistic data as simple as what we have in (1) raises really interesting and tricky questions, and potentially links up with things we saw in the other chapters, notably psycholinguistics.

One of the really important innovations in Greenberg’s early work was the idea of implicational universals. These are if-then statements about cross-linguistic variation like (2):

(2)

If a language has VSO order, it has Prepositions.

Greenberg had in mind the notion of implication from propositional logic (not the pragmatically enriched everyday notion). We saw the truth table for this in Chapter 5, and I’m repeating it here:

(3)
p q p → q
t t t
f t t
t f f
f f t

So let’s take p to be ‘this language has VSO order’ and q to be ‘This language has Prepositions’. (Here Prepositions are opposed to Postpositions, so it’s a question of whether you say to London, with a Preposition, or London to, with a Postposition). So, what (2) says is that you can find VSO languages with Prepositions, i.e. where both p and q hold; Welsh is like this, for example. It also says you can find non-VSO languages with Prepositions, i.e. where p is false and q is true. English is a language like this. But you can’t find VSO languages with Postpositions (p is true and q is false). As far as we know, this is true. There are very few languages with this combination of orders in the sentence and in the PP (in fact, the World Atlas of Language Structures lists just eight languages out of a total of 1,076, so this combination of orders does exist, but it’s very rare, being found in under 1 percent of the languages surveyed; see www.wals.info). On the other hand, non-VSO languages with Postpositions (p and q both false) are found, e.g. Japanese. So what an implicational universal says is that, out of the four possible combinations of two properties, one of them won’t be found. If that turns out to be true, the same question as we saw in relation to the preference for SVO and SOV orders arises, with the same possible answers.
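For readers who like to see this spelled out, here is a small Python sketch of the logic. It assumes nothing beyond the truth table in (3) and the three example languages just mentioned; the function name `implies` and the `LANGUAGES` table are invented for the illustration.

```python
def implies(p: bool, q: bool) -> bool:
    """Material implication: false only when p is true and q is false."""
    return (not p) or q


# (p, q) = (has VSO order, has Prepositions), as described in the text.
LANGUAGES = {
    "Welsh": (True, True),
    "English": (False, True),
    "Japanese": (False, False),
}

# Rebuild the truth table row by row and attach the example languages.
for p, q in [(True, True), (False, True), (True, False), (False, False)]:
    examples = [name for name, pq in LANGUAGES.items() if pq == (p, q)]
    status = "allowed" if implies(p, q) else "ruled out"
    print(p, q, implies(p, q), status, examples)
```

Only the row where p is true and q is false comes out as ruled out, and that is the row with no example language attached: the VSO-with-Postpositions combination.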

Comparing what we saw regarding the preference for SVO and SOV with the implicational universal in (2), we can see that the first deals with a statistical preference while the latter, if it really holds, is an absolute statement. That makes our question (why?) and the possible answers all the more interesting.

Greenberg used implicational universals, both statistical and absolute ones, to define language types. This was the second really important idea in Greenberg’s early work. To see how language types can be set up, let’s look at French. French is an SVO language, as you can see from (4):

(4)

- Le chat (S) mange (V) la souris (O).
- ‘The cat eats the mouse’.

In French, we also have Prepositions rather than Postpositions, as (5) shows:

(5)

- sous (P) la table (O)
- ‘under the table’

So we can say French is a PO language. Next, in possessive constructions, the possessor (the owner) follows the possessee (the thing owned):

(6)

- la plume (N) de ma tante (Poss)
- ‘the pen of my aunt’

Finally, adjectives usually follow nouns:

(7)

- le chat (N) intelligent (A)
- ‘the cat intelligent (i.e. the intelligent cat)’

So we can call French an SVO, PO, NPoss, NA language. That (partially) defines its word-order type. This particular type is very common around the world. Other languages following this pattern include the other Romance languages, Albanian, Yoruba, Edo, all the Bantu languages, most of the Chadic languages of West Africa, Khmer (spoken in Cambodia), Vietnamese, Thai and many Austronesian languages of South-East Asia and the Pacific, including Malay. English almost fits this pattern, but you can put the possessor before the possessed noun (as in my auntie’s pen), and adjectives of course precede nouns. So English is of the SVO, PO, PossN, AN type, a slightly rarer one than the French type.

Now let’s look at Japanese. As I already mentioned, this is an SOV language:

(8)

- Taroo-ga tegami-o kaita.
- Taroo (S) letter (O) wrote (V)
- ‘Taroo wrote the letter’.

It has Postpositions:

(9)

- Tokyo kara
- Tokyo (O) from (P)
- ‘from Tokyo’

Possessors precede the possessed noun:

(10)

- Taroo-no ie
- Taroo (Poss) house (N)
- ‘Taroo’s house’

And adjectives precede nouns:

(11)

- kono omosiroi hon
- this interesting (A) book (N)
- ‘this interesting book’

So Japanese is SOV, OP, PossN, AN, the opposite of French on every count. This is also a very common type. Other languages following this pattern include: Hindi, Bengali and the other Indic languages of Northern India, Modern Armenian, most of the Finno-Ugric languages of Northern Russia and Siberia (but not Finnish), the Altaic languages of Central Asia, the Paleo-Siberian languages, Korean, Ainu (the indigenous language of Japan), Hottentot (spoken in South Africa), Abkhaz and other Caucasian languages, the Dravidian languages of South India, the Sino-Tibetan languages of China, South-East Asia and Tibet (although Mandarin is slightly exceptional), Navaho (the most spoken surviving Native American language) and Quechua (spoken in the Andes, the ancient language of the Incas).

So we see another tendency: languages tend to favour the French type or the Japanese type. Most SVO languages are like French (but not all, of course, as we can see from English), and most SOV languages are like Japanese.

Now let’s look at Welsh, a VSO language. Here we see the following pattern:

(12)

a. Lladdodd y dyn ddraig.
  Killed(V) the man(S) dragon(O)
  ‘The man killed a dragon’.
b. i Fangor
  to(P) Bangor(O)
c. car John
  car(N) John(Poss)
  ‘John’s car’
d. bws bach
  bus(N) little(A)
  ‘little bus’

So Welsh is VSO, PO, NPoss and NA. This is just like French on all counts except the first. But if we leave aside the position of the subject, and just look at the order of the verb (V) and the direct object (O), then we see that Welsh and French are the same. This is one reason why some typologists have suggested leaving the subject out of basic word-order typology.
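One way to make the comparison concrete is to write each language's type down as a small record and count the differences. The Python sketch below is only an illustration: the class `WordOrderType` and the helpers `without_subject` and `differences` are invented names, and the values are read straight off the French, English, Japanese and Welsh examples above (so Welsh counts as noun-before-adjective, as in bws bach).

```python
from typing import NamedTuple


class WordOrderType(NamedTuple):
    clause: str      # order of S, V and O
    adposition: str  # "PO" = Prepositions, "OP" = Postpositions
    possessor: str   # "NPoss" or "PossN"
    adjective: str   # "NA" or "AN"


TYPES = {
    "French": WordOrderType("SVO", "PO", "NPoss", "NA"),
    "English": WordOrderType("SVO", "PO", "PossN", "AN"),
    "Japanese": WordOrderType("SOV", "OP", "PossN", "AN"),
    "Welsh": WordOrderType("VSO", "PO", "NPoss", "NA"),
}


def without_subject(t: WordOrderType) -> WordOrderType:
    """Leave the subject out, keeping only the VO or OV part of the clause order."""
    return t._replace(clause=t.clause.replace("S", ""))


def differences(a: WordOrderType, b: WordOrderType) -> list:
    """Names of the features on which two types disagree."""
    return [name for name, x, y in zip(a._fields, a, b) if x != y]


print(differences(TYPES["French"], TYPES["Japanese"]))  # every feature differs
print(differences(TYPES["French"], TYPES["Welsh"]))     # only the clause order
print(differences(without_subject(TYPES["French"]),
                  without_subject(TYPES["Welsh"])))     # nothing left
```

With the subject left out, French and Welsh fall together as VO languages, while Japanese comes out as OV and still differs from French on every feature.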

Here are some examples of the other basic word-order types:

(13)

- Malagasy, VOS:
- nahita ny vehivahy ny mpianatra
- saw the woman the student
- ‘The student saw the woman’.

(14)

- Hixkaryana, OVS:
- toto yonove kamara
- man he-ate-him jaguar
- ‘The jaguar ate the man’.

(15)

- Nadëb, OSV:
- samũũy yi qa-wùh
- howler-monkey people eat
- ‘People eat howler monkeys’.

Hixkaryana and Nadëb are both spoken in the Amazon jungle. Malagasy is spoken on the island of Madagascar.

These word-order correlations, as they are known, can be elaborated further. In a famous 1992 paper, Matthew Dryer established the following sixteen correlations:

(16)

a.	V	object	see [ the man ]
b.	P	object	in Cambridge
c.	V	PP	slept [ on the floor]
d.	want	infinitive	wants [ to see Mary]
e.	copular	predicate	is [ a teacher]
f.	aux	VP	has [ eaten dinner]
g.	Neg-Aux	VP	don’t know
h.	Comp	sentence	that [ John is sick]
i.	Q-marker	sentence	if [ John is sick]
j.	adverb	sentence	because [ John is sick]
k.	V	manner adverb	ran slowly
l.	article	noun	the man
m.	plural marker	noun	PL man (= “men”)
n.	noun	relative clause	movies [ that we saw]
o.	noun	genitive	father [ of John]
p.	adjective	standard of comparison	taller [ than John]

Dryer showed that the best predictor of all the other pairs was the order of verb and direct object, again indicating that VO and OV are the most basic typological distinctions as far as word order is concerned. Again, our earlier question, along with its four possible answers, comes up here, and still more forcefully.

In addition to implicational universals and language types, there are two further kinds of possible universals: substantive universals and formal universals (this distinction was originally made by Chomsky). Substantive universals are grammatical categories and notions of the kind we looked at throughout Chapters 1 to 5, although I mostly illustrated them from English. This includes things like tense, negation, interrogatives, nouns, verbs, vowels, etc. As far as we know, all languages have these. So all languages can talk about the time of an event in relation to the time of utterance (although by no means all languages do this with inflectional morphology), all languages have some way of denying the truth of a proposition and have some way of asking questions. Furthermore, the basic noun (thing words) vs verb (event words) distinction seems to hold universally. Finally, in the domain of phonetics and phonology, all languages seem to make a distinction between vowels and consonants. These are very interesting universals, if they really hold. Perhaps the most interesting one is the noun-verb distinction. Talking about time, denying things and asking questions could all be universal because these are obvious things to do with language, and things people are interested in doing: this would be a functional explanation for the universals. Similarly, all languages distinguish vowels and consonants because of the way the organs of speech are set up. But why distinguish nouns and verbs? The question is all the more interesting in the light of type theory, as discussed in Chapter 5. If predicates and common nouns are of the same type, <e,t>, then there’s not even a semantic reason. So this starts to look like a candidate for a feature of Universal Grammar. Maybe.

Another type of substantive universal might be a set of universally available categories from which languages select. This might include things like nasalised vowels (French has them, English doesn’t, as we mentioned in Chapter 1), adjectives (certainly not all languages have these, ‘qualities’ can be expressed with verbs or nouns, i.e. instead of the fat cat, you can say something like the cat fats), auxiliaries, articles and so on. When languages have these things, they tend to behave alike, but certainly not all languages have them.

Formal universals, on the other hand, are statements about the actual rules of syntax. One might be that no language has a rule which simply reverses the order of all the words in a clause in order to form an interrogative from a declarative, as in (17):

(17)

a. This is the house that Jack built.
b. *Built Jack that house the is this?

As far as we know, this is unknown in the world’s languages. But changing the order of words in a sentence for one reason or another is quite common. So the rules that permute order (to use the technical term, which we didn’t get to in Chapter 4) are constrained. A much more interesting constraint than ‘no reversing the order’ is structure dependency. We can see this in operation with the English rule that swaps the position of the auxiliary and the subject to turn a declarative sentence into an interrogative (this one we did see in Chapter 4) as with the auxiliary is in (18):

(18)

a. The person is here.
b. Is the person here?

At first sight it seems easier (both for us and, presumably, for a computer or a smartphone) to apply this rule to the first auxiliary in the sentence. But we can see that isn’t true from the next pair:

(19)

a. The person who is here is rich.
b. *Is the person who – here is rich?

(19b) is ungrammatical. But here we applied the dumb inversion rule of moving the first auxiliary in the sentence (going from left to right). The grammatical version of (19b) is (20):

(20)

Is the person who is here – rich?

Here we’ve applied the rule, correctly, to the second is. The reason for this is that the rule inverts the first auxiliary after the subject NP. In the declarative sentence (19a) the subject NP is complex, containing a relative clause: the person who is here. So the first is after the subject NP is actually the second one in the sentence. Rules like this don’t just look at the linear order, but at the hierarchical structure of sentences. They’re sensitive to hierarchical notions like NPs, rather than linear notions like ‘first (or second) going from left to right’. If this is correct, then this kind of structure dependency is a good candidate for something in Universal Grammar, as it limits what the rules can do.
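The contrast between the two rules can be sketched in a few lines of Python. This is a toy illustration under loud assumptions: the only auxiliary it knows is is, and the division of the sentence into a subject NP and the rest is supplied by hand rather than worked out by a parser. The function names `invert_linear` and `invert_structural` are invented for the purpose.

```python
AUXILIARIES = {"is"}  # toy lexicon: just the auxiliary used in the examples


def invert_linear(words):
    """The 'dumb' rule: front the first auxiliary, counting from the left."""
    for i, word in enumerate(words):
        if word in AUXILIARIES:
            return [word] + words[:i] + words[i + 1:]
    return words


def invert_structural(subject_np, rest):
    """The structure-dependent rule: front the first auxiliary after the subject NP."""
    for i, word in enumerate(rest):
        if word in AUXILIARIES:
            return [word] + subject_np + rest[:i] + rest[i + 1:]
    return subject_np + rest


# The hand-supplied bracketing: [subject NP] + rest of the clause.
simple_np, simple_rest = ["the", "person"], ["is", "here"]
complex_np, complex_rest = ["the", "person", "who", "is", "here"], ["is", "rich"]

print(" ".join(invert_linear(simple_np + simple_rest)))         # is the person here
print(" ".join(invert_structural(simple_np, simple_rest)))      # is the person here
print(" ".join(invert_linear(complex_np + complex_rest)))       # is the person who here is rich
print(" ".join(invert_structural(complex_np, complex_rest)))    # is the person who is here rich
```

The two rules agree on the simple sentence, which is why the linear rule looks tempting. They come apart only when the subject NP itself contains an auxiliary, and there the linear rule produces the ungrammatical string while the structure-dependent one produces the grammatical question.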

*

So here we’ve seen several important aspects of diversity and universals. We’ve seen that the world’s languages vary a lot – look at how Japanese and French differ on all the word-order features we looked at – but that the variation shows interesting patterns. The closer one looks at word order, the more of this kind of thing one observes. How to explain it is a different matter, and several general lines of explanation are available, as we saw. We also briefly looked at substantive universals, noting in particular the prevalence of the noun vs verb distinction. Finally, we briefly looked at structure dependency as a possible formal universal.

A major question, and a much-debated one, is how typological observations like those concerning word order in the world’s languages that we looked at here could be connected to something like Chomsky’s notion of Universal Grammar. At first sight, it seems that there’s no hope of a Universal Grammar for word order because we find so much variation. But then when you start to see patterns you might think again. One way to reconcile systematic variation with Universal Grammar has been under development since around 1980. This is the principles-and-parameters approach. The idea is that certain things are absolutely fixed and part of Universal Grammar (there might not be many of these, but the noun-verb distinction and structure dependency are quite good candidates): these are the principles. Then there are parameters of variation, governing things like whether the language is VO or OV (which might predict lots of other word-order properties, as we saw). The principles and the parameters can be thought of as innate; they make up Universal Grammar, and then the child acquiring a language has to ‘fix the values’ of the parameters in order to acquire I-language competence. This idea seems to be able to reconcile the idea of an innate Universal Grammar with the variation we can observe in the world’s languages.

But every aspect of this is controversial. There are linguists who don’t think any of the universals put forward by Greenberg, Dryer and others are universal at all; or at least that they’ll all turn out to be wrong as we get more and more data about more and more languages. There are those who find the notion of innate ideas difficult and look for other explanations for first-language acquisition and any universals we might have found. There are many who recognise that probably something has to be innate, but it might be a generalised capacity to learn complicated systems, or a capacity to communicate, rather than anything specific to language (and as odd as the verb-noun distinction). And finally there are those who accept Chomsky’s general position but are sceptical about the principles-and-parameters idea.

It seems there are no absolute certainties here. But there may be one: these questions, like everything else about language, are fascinating, complex and profound.

What their answers are we just don’t know, but we’re working on it.
