1.1 What Is Communicative Efficiency?
Generally speaking, efficiency means minimization of a cost-to-benefit ratio. In other words, being efficient means not spending more effort than necessary in order to achieve something. This idea is popular nowadays. We are taught to work smarter, not harder. We are advised to keep only things and human contacts that are meaningful to us. We are expected to practise time management and use energy-efficient cars and gadgets.
Efficiency is an inherent property of living organisms. It is a product of biological evolution: individuals who behave efficiently are more fit and ultimately leave more copies of their genes (Ha 2010). There is plenty of evidence that humans and other animals behave efficiently in foraging, parental investment, cooperation and sibling rivalry. For example, the kinematic paths of human motion minimize the energy costs of movement (Anderson and Pandy 2001). Penguins waddle because it conserves energy: walking without the rocking motion would require more work from the muscles (Griffin and Kram 2000). Zach (1979) found efficient foraging behaviour in Northwestern crows, who feed on whelks (sea snails) by dropping them from a height in order to break them. The birds preferred the largest whelks, which have a higher caloric content and break more readily than medium and small ones. Since ascending flight was energetically expensive, the crows minimized the total amount of ascending flight required for breaking whelks by choosing the optimal height of drop. As a result, they achieved a large positive difference between the calories gained from whelks and the calories spent flying.
But efficiency is not only a result of biological evolution. It also comes with practice. For example, professional runners position their heels in such a way as to lower metabolic energy consumption (Scholz et al. 2008; see also Napoli and Liapis 2019). Since language is a very old and frequent human activity, we have had many opportunities to optimize it, both in phylogeny and ontogeny.
Human language as such can be regarded as a very efficient tool because it helps us to save time and effort when we need something from others. Language has created huge benefits for us as a species, allowing us to build large and complex societies and cope with many challenges. At the same time, we tend to save our articulatory and processing effort while using language. For example, during the COVID-19 pandemic, people all over the world started using abbreviated names for the coronavirus. The clipped form corona is particularly popular, being used in many languages, such as Bengali, Hebrew, Indonesian, Malayalam and Romanian. In Dutch, as well as in German, Danish and Swedish, corona is particularly frequent in compounds. The Dutch, for example, speak about coronapatiënten ‘corona-patients’, coronadoden ‘corona-deaths’ and coronatests. They must adhere to coronaregels ‘corona-rules’ and deal with the coronacrisis. In short, we are living in the coronatijd ‘corona-time’ at the moment. Speakers of Australian English are probably the champions of least effort. They have come up with a radically shortened form, rona. One would say, I’m in iso [self-isolation] because of rona.
Moreover, we are aware of our tendency to save effort. We can even use it as an excuse. For example, at one meeting Donald Trump called Tim Cook, the Apple CEO, ‘Tim Apple’. After the media started making fun of his gaffe, Trump posted a message on Twitter, saying that he had been trying to ‘save time & words’:
At a recent round table meeting of business executives, & long after formally introducing Tim Cook of Apple, I quickly referred to Tim + Apple as Tim/Apple as an easy way to save time & words. The Fake News was disparagingly all over this, & it became yet another bad Trump story!
Thus, efficiency is an important aspect of linguistic communication. But it is not easy to study. Unfortunately, it is impossible to tell exactly how efficient a particular utterance is in a specific context. The reason is that we cannot measure all the costs and all the benefits of communication (see more in Section 1.2). Instead, we can compare alternative expressions that convey similar meanings and say which one is more costly and which is less. In many situations the speaker can choose between expressions of different length. Some examples are given in (1). In (1a), one can use the lexical causative stop or the periphrastic causative get X to stop. Example (1b) illustrates the use of different referential expressions: the longer proper name Jennifer and the shorter pronominal form she. In (1c), the difference between the sentences is in the use or absence of the complementizer that. In (1d), the speaker can choose between the clipped form maths and the full form mathematics. The example in (1e) contrasts the analytic and synthetic comparative forms of adjectives, which can sometimes be used interchangeably in English. The example in (1f) illustrates variation in the pronunciation of I don’t know. The variants differ in total length, in the presence or absence of the pronominal subject and in the amount of articulatory detail. The example in (1g) is an instance of the genitive alternation, where the Saxon genitive with ‑’s is shorter than the Norman genitive with of and also allows for the omission of some determiners.
(1) a. John stopped the car. – John got the car to stop.
    b. Jennifer entered the room. – She entered the room.
    c. She believes you are here. – She believes that you are here.
    d. I’m studying maths. – I’m studying mathematics.
    e. Ann is cleverer than Mary. – Ann is more clever than Mary.
    f. Dunno [dəˈnəʊ]. – I don’t know [aɪ dəʊn(t) ˈnəʊ].
    g. the emperor’s family – the family of the emperor
In all these pairs, the costs of articulation are lower if the speaker uses the shorter variant. The shorter variant also takes less time. But it is not always the best one. Sometimes one needs to use a more effortful expression in order to make sure that the intended meaning is conveyed. For example, if there is a chance of phonetic misinterpretation, one will use hyperarticulation: It’s not a pin, it’s a bin. Also, when talking to a stranger, a local is unlikely to use an abbreviated variant of a toponym. For example, if a Berliner says Alex instead of Alexanderplatz when giving directions to a tourist, they are likely to be misunderstood. We speak of efficiency when people use less costly expressions while still conveying the intended meaning. In many cases, this means using shorter forms to convey easily accessible meanings, and longer forms to convey less accessible ones. More examples of such contrasts can be found in Chapter 2.
But efficiency is not only about saving articulation effort and time. Different structures can be more or less efficient from the perspective of language processing. For example, (2) illustrates variation in the order of syntactic constituents. According to some theories, the sentence in (2a), where the short prepositional phrase precedes the long object, requires less processing effort than the sentence in (2b), where the order is reversed. The reason is that (2b) has longer syntactic dependencies, which create higher memory costs. These issues are discussed in detail in Chapter 3.
(2) a. I met [on the street] [my eccentric aunt from San Francisco].
    b. I met [my eccentric aunt from San Francisco] [on the street].
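The claim about dependency distances can be made concrete with a few lines of code. The Python sketch below sums the linear distances between each word and its syntactic head for the two orderings; the dependency structures are hand-assigned here purely for illustration (a real study would take them from a parsed treebank), and the function name is ours, not a term from the literature.

```python
def total_dependency_length(heads):
    """Sum of linear distances between each word and its head.

    `heads` maps a word's position to its head's position (0-based);
    the root verb maps to itself and is skipped.
    """
    return sum(abs(i - h) for i, h in heads.items() if i != h)

# Hand-assigned (illustrative) dependencies for the two orderings:
# "I met on the street my eccentric aunt from San Francisco"
#  0 1   2  3   4      5  6         7    8    9   10
heads_a = {0: 1, 1: 1, 2: 1, 3: 4, 4: 2, 5: 7, 6: 7, 7: 1, 8: 7, 9: 10, 10: 8}
# "I met my eccentric aunt from San Francisco on the street"
#  0 1   2  3         4    5    6   7         8  9   10
heads_b = {0: 1, 1: 1, 2: 4, 3: 4, 4: 1, 5: 4, 6: 7, 7: 5, 8: 1, 9: 10, 10: 8}

print(total_dependency_length(heads_a), total_dependency_length(heads_b))  # 18 21
```

Under these (assumed) analyses, the order with the short prepositional phrase first yields the smaller total dependency length, in line with the processing account sketched above.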
In the above-mentioned examples, users have a choice between more and less costly expressions. Very often, these choices become conventionalized and associated with different meanings, grammatical categories or registers. They become obligatory. A typical example is the singular–plural distinction. Cross-linguistically, singular forms are less often marked formally than plural forms (Greenberg 1966), as illustrated by the pair book – books in (3a). In (3b), the shorter form furniture has a collective use, whereas the longer form a piece of furniture has a singulative meaning. In (3c), the comparative forms of adjectives are more costly than the positive forms.
(3) a. (one) book-Ø – (five) book-s
    b. furniture – a piece of furniture
    c. nice – nicer, expensive – more expensive
Unlike in (1) and (2), the speaker has no choice because the constructions convey different categories and meanings (although one can find languages where number marking is optional, for instance). Still, these asymmetries are efficient because more frequent meanings and categories are expressed by less costly forms. This saves effort and time in the long run.
Finally, we can compare the costs of expressions which are not functionally or formally related at all, provided we can also compare their accessibility. According to Zipf’s (1965 [1935]) Law of Abbreviation, more frequent words tend to be shorter than less frequent ones. Compare, for example, the short and frequent words I, in and be with the long and rare words harpsichord, archaeopteryx and gongoozle ‘to watch the passage of boats’. Although one can also find many pairs of words in which the frequent member is longer than the rare one (e.g., the word understand is more frequent in everyday language than the physics term quark, but the former is longer than the latter), any text of sufficient length will yield a significant negative correlation between frequency and length (see Section 2.6).
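The Law of Abbreviation is easy to check on any word-frequency list. The Python sketch below uses a toy corpus and a plain Pearson correlation between token frequency and word length; published studies typically use much larger corpora and rank-based statistics, so this is only a minimal illustration.

```python
from collections import Counter
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Toy corpus: the frequent words are short, the rare words are long.
text = ("i saw her in the park and i said i would be in touch "
        "the harpsichord in the museum was near the archaeopteryx")
freqs = Counter(text.split())
words = list(freqs)
r = pearson([freqs[w] for w in words], [len(w) for w in words])
print(r < 0)  # True: frequency and length are negatively correlated
```

Even on such a tiny sample, the long rare words (harpsichord, archaeopteryx) and the short frequent ones (the, i, in) pull the correlation below zero.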
The idea of minimizing the costs of communication while keeping the benefits has a long tradition in linguistics. Similar ideas were already used in the 19th century to explain the processes of grammaticalization and sound change. For example, Georg Curtius (1820–1885), a German philologist, explained phonetic attrition (Verwitterung ‘weathering’) by the drive towards Bequemlichkeit ‘comfort’. This drive is counterbalanced by the tendency to preserve meaning-bearing sounds and syllables, which resist attrition in order to remain recognizable (Delbrück 1919: 143–144). Thus, language users try to minimize their effort while making sure that the meanings are conveyed. Similarly, William Dwight Whitney (1875: 69) wrote about the tendency towards ease and economy as a driving force of assimilation. He also noted that what is easy for the ‘practised speaker’ is not necessarily easy for second language learners and children, thus pointing to a potential conflict with learnability – another important factor in language evolution.
Zipf not only formulated the Law of Abbreviation (see above), but also contemplated the causes of efficient behaviour. He argued that language users act as rational ‘artisans’ who follow the Principle of Least Effort (Zipf 1949; see also Section 5.4.2). Among more recent approaches, one can mention the following closely related principles and hypotheses:
Haiman’s (1983) principle of economy;
Du Bois’ (1985) dictum ‘Grammars code best what speakers do most’;
Cristofaro’s (2003) principle of Information Recoverability;
Hawkins’ (2004) principle ‘Minimize Forms’;
Givón’s (2017: 157) code–quantity principle;
Haspelmath’s (2021a) form–frequency correspondence hypothesis.
Efficient word order has also received substantial attention. One of the earliest contributions is Behaghel’s (1909) law of growing constituents, which says that of two constituents of different length, the longer one follows the shorter one. This provides advantages both for production and comprehension. Later, Yngve (1960) wrote about efficient word orders generated by a formal language model, which place fewer demands on working memory. Hawkins (2014) formulated the principles ‘Minimize Domains’ and ‘Maximize On-line Processing’. One manifestation of word-order efficiency which has received much attention recently is so-called dependency distance minimization (Ferrer-i-Cancho 2006; Liu 2008; Futrell, Mahowald and Gibson 2015b; see also Chapter 3).
The speaker’s efficient choices are also discussed in pragmatics. In particular, they are captured by some of the Gricean and Neo-Gricean principles, maxims and heuristics (Grice 1975; Horn 1984; Levinson 2000), as will be shown in Section 1.4. I should also mention here Keller’s hypermaxim ‘Talk in such a way that you are socially successful, at the lowest possible cost’ and his maxim ‘Talk in such a way that you do not spend more energy than you need to attain your goal’ (Keller 1994: 107).
In recent decades, these and similar ideas have been tested on large and typologically diverse corpora with the help of advanced quantitative methods (see Levshina and Moran 2021 for an overview). Examples are phonological studies of language production, focusing on the duration of words and the articulation or omission of certain sounds (e.g., Cohen Priva 2008; Bell et al. 2009; Seyfarth 2014), studies of the use and omission of optional grammatical markers, such as complementizers or relativizers (e.g., Jaeger 2006; Wasow, Jaeger and Orr 2011), and studies of the above-mentioned dependency distances. In addition to corpora, we can rely on other methods, such as computational modelling, artificial language learning, communication games and traditional psycholinguistic experiments. In many studies, an important role is played by information theory (cf. Gibson et al. 2019).
All this wealth of ideas and evidence requires systematization and explanation, as well as some critical re-evaluation. In particular, the following questions require an answer:
What are the different costs and benefits in language communication?
What efficient linguistic strategies are there?
What are the pragmatic and cognitive mechanisms of efficient linguistic behaviour for the speaker and the addressee?
How do efficient conventionalized linguistic form–meaning pairings develop?
This book addresses these questions and provides many examples of efficient linguistic structures and patterns of use. Note that we will only speak here about communicative efficiency, that is, minimization of the cost-to-benefit ratio in language use, and leave out other possible types of efficiency in language (e.g., learning efficiency).
1.2 Benefits and Costs in Communication
1.2.1 Types of Benefits
If efficiency is minimization of a cost-to-benefit ratio, what are the costs and benefits of using language? We will begin with the benefits. Surprisingly, they are rarely discussed in the literature on communicative efficiency.
Speaking very generally, the ultimate goal of all our activities as an organism is survival. For this purpose, we need to collaborate with some people and compete with others. This involves influencing other people, so that they give us some material goods, help us, attack our rivals or simply leave us alone. We also benefit from useful information that we request and obtain because it helps us to adjust our behaviour and adapt to the environment better. These are the benefits of communication in a very broad sense.
Following Relevance Theory (Sperber and Wilson 1995; Wilson and Sperber 2004), we can also speak of benefits as positive cognitive (or contextual) effects for the addressee. Positive cognitive effects are worthwhile differences between the old (before communication) and new (after communication) representation of the world. They represent new conclusions based on the utterance and context, but also the strengthening, revision and abandonment of already available assumptions. Cognitive effects are similar to the updating of prior beliefs in Bayesian inference. Human cognition is geared towards maximizing cognitive effects (Sperber and Wilson 1995). The changes in beliefs correspond to diverse cognitive processes in the addressee: learning new information or confirming previous beliefs about the world, bonding with the speaker, deciding to perform an action, empathizing with the speaker, enjoying the style or accepting new linguistic conventions. These diverse processes illustrate Jakobson’s referential, phatic, conative, emotive, poetic and metalingual functions of language use (Jakobson 1971 [1960]). Importantly, cognitive effects and the resulting processes represent benefits not only for the addressee but also for the speaker, who is interested in evoking them.
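The analogy with Bayesian inference can be made concrete. The Python sketch below, with purely invented numbers, shows how an utterance can act as evidence that shifts an addressee's prior belief about the world; the scenario and the likelihood values are illustrative assumptions, not a model from the literature.

```python
def bayes_update(prior, likelihood):
    """Posterior over hypotheses after one observation (both given as dicts)."""
    unnorm = {h: prior[h] * likelihood[h] for h in prior}
    z = sum(unnorm.values())  # normalizing constant
    return {h: v / z for h, v in unnorm.items()}

# The addressee's prior belief about whether it is raining, updated after
# hearing "Take an umbrella" (toy likelihoods of that utterance per state).
prior = {"rain": 0.2, "dry": 0.8}
likelihood = {"rain": 0.9, "dry": 0.1}  # P(utterance | state), assumed
posterior = bayes_update(prior, likelihood)
print(posterior["rain"] > prior["rain"])  # True: the belief was strengthened
```

The strengthening of an assumption, in Relevance-Theoretic terms, corresponds here to the posterior probability of ‘rain’ rising above its prior.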
In order to evoke desired cognitive effects in the addressee, the speaker needs to ensure that the linguistic units and their functions (that is, lexical meanings, grammatical categories, roles and other information) are transferred more or less successfully. Using Relevance-Theoretic parlance, we can say that successful communication requires a recovery of what is explicitly said, or explicatures. This information is obtained by a combination of decoding and inference, with the help of such operations as reference resolution, semantic narrowing, loosening, speech act identification and others. Explicatures form the basis for recovery of implicated premises and conclusions, which represent cognitive effects for the addressee.
Of course, this is an idealization. We do not always recover all units and all meanings; nor do we need to. First of all, communication happens in a noisy channel, to use Shannon’s (1948) terminology. Faithful transfer of linguistic units can fail due to physical impediments (e.g., speaking in a crowded pub) or processing difficulties (e.g., see F. Ferreira 2003 on ‘good-enough’ processing of sentences). Our language seems to be protected against noise by redundancy (cf. Hengeveld and Leufkens 2018), which means that not all units must be transferred perfectly. At the same time, it is obvious that linguistic units must be of some use for communicators. If we speak about the grammatical function of a case marker, for instance, we assume that this meaning helps the addressee to understand who did what to whom, even if this information can be partly inferred from other linguistic cues (e.g., lexical or semantic properties of the arguments). The working hypothesis is that human languages develop and retain conventionalized cues because these cues are normally useful for evoking cognitive effects.
The benefits, from more specific to very general ones, are displayed in Figure 1.1. We will assume that in most cases the transfer of linguistic units is successful, helping the addressee to obtain intended cognitive effects and adjust their own behaviour, as a result. From the speaker’s perspective, triggering desirable cognitive effects in the addressee helps to influence the addressee’s behaviour in a useful way. Finally, influencing other people’s behaviour or adjusting one’s own increases the chances of the language user’s survival as a living organism.

Figure 1.1 A hierarchy of benefits in linguistic communication
1.2.2 Types of Costs
Communication costs have received more attention than benefits in the literature. They can be classified into several types, as shown in Figure 1.2. First of all, we can speak about costs related to the effort involved in communication. The two major types are processing effort and the effort of articulation (including signing in sign languages) or writing. Processing costs are associated with the different cognitive processes required for language comprehension and production.

Figure 1.2 Different types of costs in linguistic communication
Time (or space in writing) is another type of cost. According to V. Ferreira (2008), speakers have a responsibility not only to say things their addressees can understand, but also to say things quickly. Similarly, Clark’s (1996) ‘temporal imperative’ says that speakers need to use time in conversation wisely and responsibly. Since articulation takes time, these costs usually go together. However, there are situations in which time has an independent value. An important aspect of the efficient use of time has to do with word order. Speakers tend to produce first the constituents that are more accessible. Accessibility is influenced by a range of factors, including frequency, givenness and animacy. By producing accessible material first, the speaker buys time for planning less accessible units (see more in Section 3.2.2).
Most studies of efficiency focus on the amount of effort and time, but other costs can be important, too. For example, poor communication can have severe social consequences, including loss of face, ruined reputation and broken relationships. Politicians know this all too well. For example, when the current US President Joe Biden once said, I’m Irish but not stupid, many Irish people were not amused. Why? The use of but signals that the speaker thinks that both he and the addressee are familiar with the cultural stereotype that Irish people are generally stupid. From that, it is easy to conclude that Biden actually thinks that his audience shares the stereotypical belief that the Irish are stupid.
It is difficult to measure social costs, but there are ways of quantifying the degree of miscommunication with the help of information theory. For example, Kemp, Xu and Regier (2018) operationalize what they call communicative costs as the difference between the speaker’s and the addressee’s probability distributions over the referents that can be represented by a certain referential expression. See more on this approach in Chapter 6.
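One standard way to quantify such a mismatch between two probability distributions is the Kullback–Leibler divergence. The Python sketch below uses toy distributions over three referents; it illustrates the general idea rather than Kemp, Xu and Regier's exact formulation.

```python
from math import log2

def kl_divergence(p, q):
    """KL divergence D(p || q) in bits; assumes q > 0 wherever p > 0."""
    return sum(pi * log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# The speaker's intended distribution over three referents vs. the listener's
# reconstruction after hearing an expression (all numbers are toy values).
speaker   = [1.0, 0.0, 0.0]   # the speaker means referent 0
precise   = [0.9, 0.05, 0.05] # listener's guess after a specific expression
ambiguous = [0.4, 0.3, 0.3]   # listener's guess after a vague expression
print(kl_divergence(speaker, precise) < kl_divergence(speaker, ambiguous))  # True
```

The vaguer expression leaves the listener's distribution further from the speaker's intention, i.e., it incurs a higher communicative cost in this sense.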
Social costs are closely related to effort. Misunderstanding can lead to additional articulation costs and loss of time, from a simple repair in a dialogue to extensive explanations and press releases. Articulatory and social costs can also be in conflict, as one can see in the current debate about the use of feminitives in German. In particular, the masculine plural form of nouns referring to human beings is considered ambiguous in the sense that it is not clear whether it names men only or both men and women. For example, die Kollegen ‘the colleagues’ and die Lehrer ‘the teachers’ can be interpreted in both ways. In order to be gender-inclusive, avoiding the male-only interpretation, it is considered appropriate by many people to use these forms along with the plural feminine forms, e.g., Kolleginnen und Kollegen ‘colleagues (female) and colleagues (male)’. This can lead to very long forms, especially if there are attributes. For example, in a job advertisement, one would write something like this:
(4) Wir suchen ein-e erfahren-e Buchhalter-in / ein-en erfahren-en Buchhalter.
    we search art-f.acc experienced-f.sg.acc accountant-f / art-m.acc experienced-m.acc accountant
    ‘We are looking for an experienced accountant (male or female).’
In this example, the costs of writing and space are particularly high.
There have been attempts to make the forms gender-inclusive but more compact. For example, one can use the so-called Genderstern ‘gender star’, an asterisk followed by the feminitive suffix, as in Lehrer*innen ‘teachers (male or female)’. Alternatively, one can use a gap or a slash, e.g., Lehrer_innen and Lehrer/innen. Also, the suffix can be written with a capital -I-: LehrerInnen. To reflect this in spoken language, the feminitive suffix is separated from the stem by a glottal stop. It is also possible to avoid gendered nouns with the help of passive constructions, relative clauses or participles, e.g., Studierende ‘the ones who study’ instead of Studenten und Studentinnen ‘students (male and female)’. The attempts to make the German language more gender-equal with the help of these forms are a matter of heated debate.
Another trade-off between social costs and articulation costs is associated with politeness and etiquette. Commonly, expressions that are appropriate in formal situations are long. For example, in Japanese, when asking someone for a favour, one says yoroshiku onegaiitashimasu in very formal situations, yoroshiku onegaishimasu in less formal situations, and simply yoroshiku when speaking to one’s friends. Similarly, formal and distant V-forms are often longer than informal and intimate T-forms, e.g., French vous parlez vs. tu parles, Russian vy znaete ‘you.PL know.IPF.PRES.2PL’ vs. ty znaeš ‘you.SG know.IPF.PRES.2SG’. If one uses a form that is shorter than required, one saves articulation costs but risks substantial social costs.
Let us now turn to articulation costs, which play a major role in studies of efficiency. Unfortunately, the current state of research does not allow us to measure articulation costs precisely (ideally, in calories or other units of energy). Usually, estimates are based on the number of phonological segments, or even letters. This is not unproblematic, of course. In contrasts like cat vs. crocodile or cat vs. cats, the latter wordform is obviously more costly than the former, but it is easy to find less clear cases. For example, take the wordform cups [kʌps], with four segments including a short vowel and voiceless consonants, and calm [kɑːm], which has three segments, but a long vowel and a sonorant. Which one is more costly (cf. Martinet 1963: 169)? The costs associated with stress and pitch also still await more precise estimation.
Articulation effort is not a property of spoken languages only. It also includes kinematic effort in sign languages and gesture communication. A signer who moves more joints moves greater mass and therefore expends more articulatory effort than one who moves fewer joints. Also, moving one’s shoulders or elbows – that is, joints that are more proximal to the torso – is more effortful than moving one’s wrists or fingers (Napoli, Sanders and Wright 2014).
As pointed out by Levinson (2000: 28), human speech encoding is the slowest stage in human communication, due to the physiological constraints on articulation. It also consumes muscular energy. All other aspects of speech production and comprehension, including inference, can run at a much higher rate. So,
inference is cheap, articulation expensive, and thus the design requirements are for a system that maximizes inference.
But this does not mean that processing comes at zero cost. In order to measure processing costs, we can use different behavioural and neural indicators. For example, extra effort in language production can be accompanied by disfluency markers (e.g., um, eh) and longer planning times (Beattie and Butterworth 1979). Other markers of effort are speech errors, such as blends, e.g., ‘Don’t shell so loud’, where shell is a blend of shout and yell (Hockett 1967). These reflect difficulties in choosing between lexical items with similar semantic features (Fromkin 1973).
Another important behavioural marker is reaction times. For example, Britt, Ferrara and Mirman (2016) found that language users were fastest to name pictures that had only one appropriate name, slower when selecting among synonymous names (e.g., gift and present), and slowest when selecting between closely related names that were near semantic neighbours (e.g., jam and jelly). This suggests that lexical choice can be costly.
Similarly, production of grammatical structures can be more or less complex. F. Ferreira (1991) showed that it took participants longer to initiate an utterance when sentences were more syntactically complex, as measured by the number of nodes in a phrase-structure tree. Also, if a sentence had a syntactically complex subject and a syntactically complex object, speakers tended to pause at the subject–verb phrase boundary. The duration of pauses increased with upcoming complexity.
Another type of production cost is associated with the use of memory. If the speaker has to keep some elements of the utterance in a memory buffer before dispatching them, this creates additional memory costs. More on that follows in Chapter 3.
Processing costs in comprehension can be detected by poor comprehension accuracy, slower reading times and greater activation of brain areas. In particular, one can detect deflections in the EEG (electroencephalogram) signal, which serve as indicators of extra effort in language processing.
An important cause of processing costs is low predictability, which corresponds to low average cloze probability in a norming task, or to high surprisal (unpredictability) based on n-grams (that is, sequences of neighbouring words) or on more sophisticated neural network models trained on large corpora. More predictable words are pre-activated by the previous context. As a result, it is easier to retrieve their lexical information. Words that are less predictable have longer reading times (e.g., Demberg and Keller 2008; Frank and Bod 2011; Smith and Levy 2013; Merkx and Frank 2020; Wilcox et al. 2020). They also trigger an EEG deflection with greater amplitude in comparison with predictable words (Frank et al. 2015). This happens approximately 400 ms after word onset, which is why the effect is called the N400. Units that do not fit semantically or violate our expectations based on encyclopaedic or contextual knowledge also elicit an N400, due to the more effortful unification of the unit with the context (see Baggio and Hagoort 2011 for an overview).
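N-gram surprisal can be computed in a few lines. The Python sketch below estimates bigram surprisal from raw counts in a toy corpus (no smoothing, so it only works for attested bigrams); it shows that a less expected continuation carries higher surprisal.

```python
from collections import Counter
from math import log2

def bigram_surprisal(tokens, prev, word):
    """Surprisal -log2 P(word | prev), estimated from raw bigram counts."""
    bigrams = Counter(zip(tokens, tokens[1:]))
    contexts = Counter(tokens[:-1])  # every token that serves as a context
    return -log2(bigrams[(prev, word)] / contexts[prev])

tokens = "the dog barked and the dog slept and the cat slept".split()
# After "the", "dog" occurs in 2 of 3 cases and "cat" in 1 of 3,
# so "cat" is the more surprising (less predictable) continuation.
print(bigram_surprisal(tokens, "the", "dog") <
      bigram_surprisal(tokens, "the", "cat"))  # True
```

In reading-time and EEG studies, it is exactly this kind of quantity (estimated from far larger corpora or from neural language models) that predicts the processing cost of a word in context.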
A different type of neural response is called the P600. It is a positive deflection reaching its peak around 600 milliseconds after presentation of the stimulus. This effect is associated with syntactic reanalysis, and in particular with garden-path sentences. Osterhout, Holcomb and Swinney (1994), for instance, observed a P600 effect in response to sentences like ‘The lawyer charged the defendant was …’, in contrast to ‘The lawyer charged that the defendant was …’. Similarly, a positivity between 300 and 600 ms post onset was observed in sentences that required a reversal of the thematic ordering of the arguments (Bornkessel, Schlesewsky and Friederici 2003). Another trigger of the P600 is syntactic anomaly, e.g., an error in subject–verb agreement or in word category, such as noun vs. verb (Hagoort, Brown and Groothusen 1993). At the same time, one should be aware that readers do not always engage in reinterpretation of the input. They are often satisfied with shallow and incomplete interpretations (Ferreira 2003). This approach is known as ‘good-enough’ processing, which can be considered efficient because it saves processing effort and works just fine in most cases.
Memory costs also play an important role in comprehension. The addressee often needs to store in memory parts of the input that may be integrated later with upcoming units (Gibson Reference Gibson1998, Reference Gibson, Marantz, Miyashita and O’Neil2000). Compare (5a) and (5b). The two sentences contain the same words and convey the same meaning. However, (5a) does not represent a problem for processing, whereas (5b) is unprocessable for most people, because the memory load is too heavy (Gibson and Warren Reference Gibson and Warren2004; Grodner and Gibson Reference Grodner and Gibson2005; Bartek et al. Reference Bartek, Lewis, Vasishth and Smith2011). See more in Section 3.2.1.
| a. | [The intern [ who [ the nurse supervised ] ] had bothered the administrator [who [ lost the medical reports ] ] ]. |
| b. | # The administrator [ who [ the intern [ who [ the nurse supervised ] ] had bothered ] ] lost the medical reports. |
There is evidence that pragmatic processing can also be costly. In particular, in Regel’s (Reference Regel2009) study, comprehension of intended ironic meanings incurred processing costs during late phases of processing, as indicated by a P600 effect. An example of a target sentence in German is Das ist ja großartig ‘That’s great!’, which was used in either an ironic or a literal sense. Interpretation of irony requires suppression of some aspects of the literal meaning and computation of the intended meaning, which draws on contextual information and the speaker’s communicative intentions. As shown by Bašnáková et al. (Reference Bašnáková, Weber, Petersson, van Berkum and Hagoort2014), processing of indirect speech acts activates several brain regions usually responsible for mentalizing and empathy, as well as for discourse-level language processing.
At the same time, the literal interpretation is not always the most accessible. For example, Giora, Givoni and Fein (Reference Giora, Givoni and Fein2015) show that negative understatements of the type He is not the smartest president in Hebrew are processed faster than negative literal utterances (e.g., He is not the smartest president. John Adams was the smartest one), as well as affirmative sarcastic utterances (e.g., He is the smartest president. So don’t use any difficult words when talking to him). In fact, in the absence of any context, negative sentences X is not the most Y are by default interpreted sarcastically (e.g., ‘He is stupid’), which means that they should be treated as constructions – that is, conventionalized form–meaning pairings (Goldberg Reference Goldberg1995).
Of course, human language involves many other costs. In particular, the costs of learning a linguistic system can play an important role in explaining why human languages are the way they are. These costs are beyond the scope of this book. Also, we will focus mostly on articulation, time and processing costs because they have been investigated in greater detail.
1.2.3 Cooperation or Selfish Behaviour?
An important question is whether the speaker and the addressee each try to minimize their individual costs, or whether they minimize their joint costs in a collaborative effort. There is a long tradition of regarding communication as a conflict between the speaker’s and the addressee’s interests. In particular, Zipf (Reference Zipf1949) wrote about them as opposing forces. The interests of the speaker are represented by the force of unification. The speaker ‘has the job of not only selecting the meanings to be conveyed but also the words that will convey them’ (Zipf Reference Zipf1949: 20). The highest economy for the speaker would be achieved if the vocabulary consisted of only one word that could mean anything the speaker wanted it to mean: there would be no effort to acquire and maintain a large vocabulary, or to select words with a particular meaning from that vocabulary. In contrast, the addressee is interested in diversification, that is, in having a distinct word for each meaning to be verbalized. A popular view is that the speaker’s and the addressee’s efforts represent a trade-off. As V. Ferreira puts it,
A speaker can expend little effort when constructing an utterance (‘the thing’), leaving much of the burden for understanding to their listener. Or, speakers can work harder (‘the red Honda on your left’), leaving less work for their addressees.
At the same time, it is obvious that the participants are forced to cooperate. If speakers are too ‘stingy’, they will not get their message across and will fail to reach their goals. Alternatively, the speaker will have to produce a repair, which will result in even more effort in total. So, it is in the best interest of the speaker not to make life too difficult for the addressee.
Another question is how costly ambiguity is. According to available experimental evidence, lexical ambiguity represents a challenge for processing in the absence of disambiguating information. During natural reading, fixation times have been shown to be longer for ambiguous words than for unambiguous controls (Frazier and Rayner Reference Frazier and Rayner1990). Also, EEG studies with word-by-word presentation reveal a sustained frontal negativity for ambiguous words presented in a semantically neutral context compared with unambiguous words (Hagoort and Brown Reference Hagoort, Brown, Clifton, Frazier and Rayner1994). These findings show that processing of ambiguous words is more effortful than processing of monosemous words, which supports Zipf’s ideas. However, there are no clear indications that ambiguity is costly in normal language use, where contextual disambiguating information is usually abundant (see also Section 6.2.1):
what is surprising … is not that people sometimes [my emphasis] experience difficulty with ambiguity, but that they experience difficulty so rarely.
In other words, ambiguity as a threat to communication is overrated (Wasow Reference Wasow and Winkler2015). We become aware of it only in jokes, as in the examples below.
| a. | Call me a taxi! – OK, you’re a taxi. |
| b. | Time flies like an arrow, fruit flies like a banana. |
| c. | How do you make a turtle fast? Take away his food. |
Similarly, syntactic ambiguity is sometimes presented as evidence that the speaker’s interests are more important than the addressee’s. In particular, V. Ferreira and Dell (Reference Ferreira and Dell2000) had their participants produce sentences as in (7):
| a. | The coach knew (that) I missed practice. |
| b. | The coach knew (that) you missed practice. |
If speakers took the addressee’s needs seriously, they would avoid creating garden-path sentences in contexts like (7b), where the pronoun you could initially be understood as the direct object of knew, by using the complementizer that more often than in the unambiguous (7a). However, this is not what the participants did. The omission rate of that was nearly identical in the potentially ambiguous and unambiguous contexts (see similar results in Jaeger Reference Jaeger2010).
Notably, when Ferreira and Dell (Reference Ferreira and Dell2000) compared the rates of that-omission in different sentences exemplified in (8), they found that the participants omitted that more often when the subject of the embedded clause was coreferential with the subject of the main clause and therefore more accessible, as in (8a) and (8d), in comparison with the different-subject sentences in (8b) and (8c).
| a. | I knew (that) I would miss the flight. |
| b. | I knew (that) you would miss the flight. |
| c. | You knew (that) I would miss the flight. |
| d. | You knew (that) you would miss the flight. |
Again, disambiguation pressure did not play any role. Note that (8d), with the sequence You knew you, was unlikely to be interpreted as a transitive clause, because a coreferential object would require the reflexive pronoun (You knew yourself). The only potentially ambiguous sentence was therefore (8b), but even this made no difference to the omission rates. One can conclude that the grammatical encoding process works in a speaker-centred rather than addressee-centred way. As argued by V. Ferreira (Reference Ferreira2008), there is a division of labour between the speaker and the addressee: the former takes care of minimizing time and the costs of formulation and articulation, whereas the latter is supposed to do the rest.
However, the experimental task involved sentence recall without actual communication, so it is difficult to interpret the results for or against addressee-centred linguistic behaviour in the first place. Another problem is that ambiguity is understood here as the existence of several possible interpretations, disregarding language users’ expectations based on the context. But, as argued above, we can manage ambiguous words and structures perfectly well if there is enough context. In the examples from Ferreira and Dell, the chances that you will be interpreted as a direct object of knew are in fact very small. A search for the sequence knew you in a segment of the spoken subcorpus of the Corpus of Contemporary American English (COCA, Davies Reference Davies2008– ) reveals that contexts where you is the subject of a complement clause are five times more frequent than contexts with you as the direct object of knew. So, there are few reasons, if any, to provide additional marking in order to avoid the interpretation of you as a direct object.Footnote 4
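The corpus argument can be made concrete with a small calculation. The 5:1 ratio comes from the COCA search described above; the absolute counts below are hypothetical, chosen only to reproduce that ratio:

```python
import math

# Hypothetical counts of the two readings of 'you' after 'knew'.
# The 5:1 ratio reflects the COCA-based observation in the text;
# the absolute numbers are made up for illustration.
subject_reading = 500  # 'knew (that) you ...': subject of a complement clause
object_reading = 100   # 'knew you': direct object of 'knew'

p_object = object_reading / (subject_reading + object_reading)
print(round(p_object, 2))              # 0.17: the garden-path reading is unlikely
print(round(-math.log2(p_object), 2))  # 2.58 bits of surprisal
```

On this view, the object reading is so improbable in context that omitting that rarely leads the addressee down the garden path.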
In fact, ambiguity is highly beneficial both for the speaker and the addressee. First of all, it is advantageous for the speaker to save articulatory effort by using shorter words. Since short words are limited, ambiguity provides a convenient way to minimize the costs without jeopardizing message transfer very seriously, thanks to rich contextual cues (Piantadosi, Tily and Gibson Reference Piantadosi, Tily and Gibson2012; see also Section 6.2.1). Moreover, using ambiguous but short words and constructions helps the addressee to save time, too.
It is likely that ‘participants in a contribution try to minimise the total effort spent on that contribution – in both the presentation and acceptance phases’ (Clark and Schaefer Reference Clark and Schaefer1989: 269). This is known as the Principle of Least Collaborative Effort (see also Clark and Wilkes-Gibbs Reference Clark and Wilkes-Gibbs1986). There are some indications that this is indeed true. For example, there is no immediate benefit for the addressee to say yeah, mm-hm, nod or even systematically blink when listening to the speaker (so-called backchannel communication). But this is what hearers do all the time, signalling to the speaker ‘I’m here with you, please continue’ (Hömke, Holler and Levinson Reference Hömke, Holler and Levinson2017). If there are problems, they quickly ask for repair (e.g., huh?), which means ‘I’m not with you, please don’t continue.’ All these costs, however, seem to save everyone’s effort in the long run.Footnote 5 Without these signals, the speaker does not know if they are on the right track and runs the risk of spending a considerable amount of time and effort in vain. Constant feedback is essential.
So, is communication a cooperative action or selfish behaviour? The most likely answer is ‘both’. The speaker and the addressee cooperate because it is in their best interests.
1.3 How to Be Efficient?
In this book I argue that we use language efficiently, trying to minimize the cost-to-benefit ratio of communication. But how can this be achieved in practice? First of all, the speaker can save time and effort by not expressing irrelevant information that will not produce useful cognitive effects. An example is the deletion of the agent in passive sentences and of the patient in anti-passive sentences. This strategy is so natural that we hardly ever think about it. It only becomes obvious when the speaker violates it by providing information that is irrelevant, for example, to distract or confuse the interlocutor. Experienced politicians are very good at creating a smokescreen in order not to answer difficult questions directly.
Another efficient strategy is to save costs by omitting information that is highly accessible, available, predictable, expected, typical, and so on. In this book I will use the term accessibility in order to highlight the role of the cognitive state of the speaker and the addressee in efficient language use. Accessibility reflects the ease with which some mental representations or forms can be activated in or retrieved from memory (Bock Reference Bock1982; Bock and Warren Reference Bock and Warren1985). If a mental representation or a form is highly accessible, it is either already activated in discourse, or it is easy to access due to high frequency, salience, relatedness to activated information, etc. The notions of activation and accessibility are closely related: if some information has low accessibility, it is normally not activated. But accessibility is not reducible to activation. For example, Bill Gates is accessible as a referent because he is famous, but he was not activated in the context before this sentence.
The notion of accessibility has been successfully used for explaining why some referential expressions are long (e.g., the old schoolteacher) and others are short (e.g., she) in Accessibility Theory (Ariel Reference Ariel1990, Reference Ariel, Sanders, Schliperoord and Spooren2001). According to Ariel, more informative, less ambiguous and longer forms help to identify less accessible referents. Accessibility of referents has been shown to depend on a variety of factors, including recency of mention in discourse, topicality, syntactic role, the presence of competing referents, and others (see Section 2.2.1).
In addition to length differences, accessibility has been argued to affect the order of syntactic constituents. Different flavours of accessibility – conceptual, lexical, semantic and phonological – are important in this regard. For example, referents that are more accessible due to their previous mention or high imaginability tend to appear before less accessible ones (cf. Bock and Irwin Reference Bock and Irwin1980; Bock Reference Bock1982; Bock and Warren Reference Bock and Warren1985; see also Section 3.2.2). Factors including animacy, concreteness, shortness and discourse-givenness play an important role in the choice between near-synonymous constructions, e.g., particle placement, the English double-object vs. prepositional dative constructions, and the active–passive alternation (Weiner and Labov Reference Weiner and Labov1983; Gries Reference Gries2003; Bresnan et al. Reference Bresnan, Cueni, Nikitina, Baayen, Bouma, Krämer and Zwarts2007). Different parameters of accessibility usually overlap (e.g., she normally refers to a discourse-given and animate referent, and is also a highly frequent and short wordform, which is easy to retrieve), but they often have independent effects on language production (e.g., Bresnan et al. Reference Bresnan, Cueni, Nikitina, Baayen, Bouma, Krämer and Zwarts2007).
In Relevance Theory (Sperber and Wilson Reference Sperber and Wilson1995), accessibility is a property of the contextual assumptions needed for recovery of the intended meaning. The more frequently a certain assumption is used for inference, the greater its accessibility. For example, for an average European, stereotypical assumptions like ‘Germans love rules’ or ‘Italians talk with their hands’, which are often used in jokes, are probably more accessible than stereotypes describing Canadians or New Zealanders. Also, the fewer steps one has to take in order to get from the immediate context to certain information for deriving cognitive effects, the higher the accessibility of this information.
In this book, the term ‘accessibility’ will be used in a broad sense, covering diverse kinds of intended information: referents, lexical and grammatical meanings, syntactic functions, connotations, and so on. I believe that the observed manifestations of efficiency in different areas of linguistics have more in common than has been acknowledged so far, and that we can speak of accessibility asymmetries in phonology, lexicon, morphology and syntax, which are correlated with different levels of costs.
In many cases, accessibility can be measured quantitatively, although the measures may vary from one case to another. For example, Haspelmath (Reference Haspelmath2021a) uses the relative frequencies of contrasting grammatical meanings (e.g., singular and plural) to explain many efficient formal asymmetries (see Section 2.3). Accessibility (or rather, inaccessibility) can also be discussed in terms of surprisal, or informativity, which are popular in studies taking an information-theoretic perspective on efficiency (e.g., Cohen Priva Reference Cohen Priva, Abner and Bishop2008; Levy Reference Levy2008; Piantadosi, Tily and Gibson Reference Piantadosi, Tily and Gibson2011; Seyfarth Reference Seyfarth2014). These measures are often estimated with the help of n-gram frequencies in large corpora, or, more recently, with the help of neural network models, which often show higher correlations with measures of brain activity and human behaviour in language processing than simpler n-gram models (e.g., Heilbron et al. Reference Heilbron, Ehinger, Hagoort and de Lange2019). There is substantial evidence that informativity is correlated with formal length (see Chapter 2).
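As a sketch of how such measures work, informativity in the sense of Seyfarth (2014) can be computed as a word's average surprisal over the contexts in which it occurs. The toy corpus below is invented for illustration; real studies use much larger corpora and better-smoothed models:

```python
import math
from collections import Counter

# Informativity: a word's average surprisal, -log2 P(word | context),
# over all of its occurrences. Toy corpus, invented for illustration.
corpus = "the cat sat . the dog sat . a cat ran .".split()

bigrams = Counter(zip(corpus, corpus[1:]))
prev_counts = Counter(corpus[:-1])

def informativity(word):
    """Average -log2 P(word | previous word) over the word's occurrences."""
    pairs = [(p, w) for (p, w) in zip(corpus, corpus[1:]) if w == word]
    return sum(-math.log2(bigrams[pair] / prev_counts[pair[0]])
               for pair in pairs) / len(pairs)

# 'cat' is fully predictable after 'a' and half-predictable after 'the';
# 'dog' is never fully predictable, so its average surprisal is higher.
print(informativity("cat"))  # 0.5 bits
print(informativity("dog"))  # 1.0 bits
```

Note that informativity is a property of the word averaged over its contexts, unlike surprisal, which is a property of a particular occurrence.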
Crucially, accessibility depends on common ground. For example, in studies of the production of referential expressions in a joint activity, speakers rely on the knowledge that they believe they share with the addressee. In particular, Clark and Marshall (Reference Clark, Marshall, Joshi, Webber and Sag1981) speak about three possible sources of common knowledge:
preceding linguistic context;
beliefs about the communities the speaker and the addressee belong to (see also Isaacs and Clark Reference Isaacs and Clark1987);
particular interaction, physical context, common past experience.
Common ground plays a vital role in minimization of effort. For example, in a director-matcher experiment where the participants had to describe irregular geometric shapes (tangrams), speakers used short referential expressions more often in the subsequent interactions than in the very beginning (Clark and Wilkes-Gibbs Reference Clark and Wilkes-Gibbs1986).
While the effects of situational and linguistic context on the formal properties of linguistic expressions have been studied extensively, there is less research that focuses on the role of encyclopaedic knowledge as a factor that influences accessibility. There are attempts to quantify our knowledge of everyday scenarios and scripts, such as eating at a restaurant or cooking food (Venhuizen, Crocker and Brouwer Reference Venhuizen, Crocker and Brouwer2019), but this information is still not easy to obtain for every situation.
A popular idea in the literature on communicative efficiency is that an efficient speaker will provide information at a constant rate. This hypothesis was first formulated by Fenk and Fenk (Reference Fenk and Fenk1980), who argued that an efficient communication system should distribute the information (understood in an information-theoretic sense and measured in bits) as uniformly as possible across small time spans. More recent formulations are known as the Smooth Signal Redundancy Hypothesis (Aylett and Turk Reference Aylett and Turk2004) and the Uniform Information Density Hypothesis (Jaeger Reference Jaeger2006; Levy and Jaeger Reference Levy, Jaeger, Schlökopf, Platt and Hoffman2007). In a noisy communication channel like language, efficiency is maximized when the rate of information is distributed as uniformly as possible throughout an utterance. Crucially, it should not be distributed too densely for perception, because that would lead to a breakdown in communication. But it also should not be distributed too sparsely because that would result in a waste of time.
This claim is supported by positive correlations between informativity (the opposite of contextual predictability) and the duration or orthographic length of linguistic units. For example, Aylett and Turk (Reference Aylett and Turk2004) show an inverse relationship between the predictability of a syllable and its duration: more predictable syllables are shorter. However, it is more challenging to test the uniformity of information carried by more complex meaningful units, such as words or syntactic constituents. In some studies of grammatical variation, it has been shown that language users tend to add optional function words to avoid a peak in surprisal when the upcoming word or construction is not very predictable from previous context (Jaeger Reference Jaeger2006; Levy and Jaeger Reference Levy, Jaeger, Schlökopf, Platt and Hoffman2007; Jaeger Reference Jaeger2010), or to omit function words in predictable contexts (e.g., Bouma Reference Bouma, Wieling, Kroon, van Noord and Bouma2016). Yet the correlations between informativity and length can also be explained by the tendency to spend less effort and time on more accessible information, and more effort and time on less accessible information. In a similar vein, Ferrer-i-Cancho (Reference Ferrer-i-Cancho2017) argues that uniform information density and similar principles are not needed to explain the correlation between length and informativity: the principle of compression (that is, minimization of coding length in bits) in standard information theory will perfectly suffice.
Moreover, there is not much direct evidence showing that information is indeed distributed uniformly across time. A noteworthy exception is the study by Coupé et al. (Reference Coupé, Yoon Mi, Dediu and Pellegrino2019). Using spoken corpora of seventeen diverse languages, they show that languages have similar information rates (approximately 39 bits per second), which is computed on the basis of information per syllable and speech rate (number of syllables per second). This evidence supports the idea that there is a certain optimal quantity of information per second, which is advantageous under communicative pressure and can probably be explained by neurobiological constraints. To what extent this can be extrapolated to units that express lexical and grammatical meanings is an open question.
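The information-rate computation in Coupé et al. (2019) multiplies information density by speech rate. A minimal sketch follows; the figures are illustrative, not the published per-language estimates:

```python
# Information rate (bits/s) = information density (bits per syllable)
#                           * speech rate (syllables per second).
# The figures below are illustrative, not the published estimates.

def information_rate(bits_per_syllable, syllables_per_second):
    return bits_per_syllable * syllables_per_second

# A language with information-dense syllables spoken slowly and one with
# lighter syllables spoken fast can converge on a similar rate:
dense_slow = information_rate(7.0, 5.5)
light_fast = information_rate(5.0, 7.7)
print(dense_slow, light_fast)  # 38.5 38.5 - both close to ~39 bits/s
```

This illustrates the trade-off reported in the study: information density and speech rate vary across languages, but their product clusters around a similar value.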
At the same time, there are some indications that language users tend to avoid processing overload. For example, human languages have restrictions on the number of new referents that need to be integrated into discourse (Du Bois Reference Du Bois1987). One such restriction is that a clause should contain at most one new referent (see more in Section 4.5). So, language users may indeed be sensitive to having too much new and inaccessible information. When the cognitive load is too high, processing can break down, and the interlocutors will have to start anew, which means higher total costs.
Similarly, language users can decrease processing effort by putting semantically and syntactically related linguistic units next to each other, thereby avoiding the costs of storing and integrating long dependencies. This also helps to avoid high surprisal (another cost component) because if one word is strongly associated with its neighbours, its surprisal is low. Also, language users tend to produce accessible information as early as possible, which helps to save time needed for planning of less accessible elements. Examples are discussed in Chapter 3.
Even poetry contains relatively accessible words, contrary to what one might expect. An empirical study of Russian poetry and prose revealed that the surprisal of words in poetry and in prose is surprisingly similar (Manin Reference Manin2012). Poetry does indeed have less conventional lexical choices and word order than prose, but the choice of the next word in verse is restricted by metric, rhythmical and other formal constraints. As any person with some experience in writing verse knows, it is often very difficult to find a good replacement for a word in a poem. Even avant-garde poetry is more formally constrained than prose. This suggests that there is a certain level of accessibility that needs to be reached for communication to be comfortable for all participants.
To summarize, we can formulate the following principles for an efficient communicator:
the principle of positive correlation between benefits and costs, which means that language users should spend more effort and time on information that provides more benefits, and less effort and time on less useful information. If information is useless, no effort or time should be spent on it at all;
the principle of negative correlation between accessibility and costs, which means that language users should spend less effort and time on highly accessible information, and more effort and time on less accessible information;
the principle of maximization of accessibility, which tells language users to maximize accessibility of information at every point in communication.
The principles interact. For example, if the principle of maximization of accessibility fails, e.g., due to the choice of a particular word order, this will lead to higher articulation costs, in accordance with the principle of negative correlation between accessibility and costs. The first principle is more important than the other two. For instance, a message should be informative enough to justify the time and articulation costs needed for articulating it (the first principle). This is why the message still has to be somewhat surprising, despite the pressure to maximize accessibility (the third principle). Also, if some information is inaccessible but useless, it will be omitted, contrary to the second principle, which posits a negative correlation between accessibility and costs. An example is the omission of some arguments, as in agentless passives (see more in Section 2.2.1).
The strategies formulated above are mostly based on available data from spoken languages. Signed languages can have their own ways of being efficient. Importantly, signers can represent information about different referents and their actions sequentially or simultaneously, whereas speakers have to arrange the elements only sequentially. For example, if one needs to encode ‘A woman is holding a child’, a signer can do it sequentially, by using lexical units for ‘woman’, ‘hold’ and ‘child’. Alternatively, it is possible to express this event simultaneously by using a specific head direction, facial expression and eye gaze directed at an imaginary child, while bending the arm and hand as if holding the child. According to Slonimska, Özyürek and Capirci (Reference Slonimska, Özyürek and Capirci2020), simultaneous encoding of complex events helps to reduce the cognitive load. Semantically related information is represented simultaneously, which is analogous to dependency length minimization and similar principles of efficient ordering in spoken languages (see Chapter 3). As a result, working memory costs are minimized, as well as the time needed for encoding the event.
Notably, there is evidence that the use of gesture alongside speech can reduce processing costs in spoken communication. For example, Goldin-Meadow et al. (Reference Goldin-Meadow, Nusbaum, Kelly and Wagner2001) asked children to solve a maths problem. The children were then given a list of unrelated items to remember while explaining how they had solved the problem. It turned out that the children did better on the memory task when they gestured while explaining their solution than when they were told not to gesture. This means that gesturing lightened the load on working memory. How different channels of communication and their costs interact is an open question which requires further investigation.
1.4 Three Principles of Efficient Communication
1.4.1 The Principle of Positive Correlation between Benefits and Costs
This section discusses the principles of efficient communication formulated above in more detail. According to the first principle, benefits and costs should be positively correlated: high costs are associated with high benefits, and low costs are associated with low benefits. On the low-cost, low-benefit side, this idea is similar to Givón’s principle ‘Unimportant information need not be mentioned’ (Reference Givón2017: 3). For example, if one uses a passive construction with an explicit agent, as in (9a), we can assume that the information about the agent (a moped gang) plays a certain role in the story. The addressee will probably expect the speaker to discuss a new form of street crime. If the speaker mentions the object of the crime (his Rolex watch), as in (9b), the addressee may think that the speaker wants to talk about the careless victim, who wore an expensive watch while travelling. Finally, if the location near the police station is mentioned, as in (9c), the addressee is likely to conclude that the speaker will complain how daring criminals have become.
| a. | A tourist was robbed by a moped gang. |
| b. | A tourist was robbed of his Rolex watch. |
| c. | A tourist was robbed near the police station. |
The idea is similar to a recommendation that is often attributed to Anton Chekhov, a famous Russian playwright and writer: ‘If in the first act you have hung a pistol on the wall, then in the following one it should be fired. Otherwise, don’t put it there.’ If the audience in a theatre see a pistol hanging on the wall, they will expect that it will be used later in the play. Similarly, if the speaker introduces a new piece of information, the addressee will expect it to play a part in the story.
This principle is closely related to Grice’s Maxims of Quantity, ‘Make your contribution as informative as required (for the current purposes of the exchange)’, and ‘Do not make your contribution more informative than is required’ (Grice Reference Grice, Cole and Morgan1975: 45), as well as to Horn’s (Reference Horn and Schiffrin1984) Q-principle: ‘say as much as you can’, and the R-principle: ‘say no more than you must’.
At the same time, there may be cultural reasons for omitting some information. In particular, some arguments of a verb can be omitted because they are taboo, or out of politeness. Examples are provided in Section 2.2.1.
In some communicative situations, the costs can be higher than normal. This usually happens in communication where the primary focus is on the form, rather than on the content, of the message. Consider performance dance, a non-linguistic form of communication. In performance dance, the movements are enhanced, not reduced, which creates greater metabolic costs for performers, but gives the audience extra aesthetic pleasure (Napoli and Liapis Reference Napoli and Liapis2019). Therefore, the high costs are justified by the high benefits.
1.4.2 The Principle of Negative Correlation between Accessibility and Costs
This principle reflects the tendency to use shorter forms to express more predictable, expected, typical, etc. meanings, and longer forms to express less predictable, expected, typical, etc. meanings. Numerous examples are given in Chapter 2.
The principle is somewhat similar to the supermaxim of Manner in Grice (Reference Grice, Cole and Morgan1975), which says, ‘Be perspicuous’. It is related to how something is said (not to ‘what is said’), and includes several submaxims: ‘Avoid obscurity of expression’, ‘Avoid ambiguity’, ‘Be brief (avoid unnecessary prolixity)’ and ‘Be orderly’ (Grice Reference Grice, Cole and Morgan1975: 46). If we interpret ambiguity as lack of accessibility of a single interpretation due to insufficient linguistic cues and context, and unnecessary prolixity as using costly expressions for transfer of highly accessible information, we can see that Grice’s supermaxim of Manner subsumes the principle of negative correlation between accessibility and costs.
Probably more directly relevant, however, is the account proposed by Levinson (Reference Levinson2000) because it involves the notion of typicality, which can be directly linked to accessibility. One of Levinson’s main principles is called the I-heuristic: ‘What is expressed simply is stereotypically exemplified’ (Levinson Reference Levinson2000: 37). Consider the following example:
John: I cut a finger.
Under normal circumstances, this utterance communicates that the finger belongs to John, although this information is not encoded in the sentence. This is an instance of a generalized conversational implicature, which can be triggered normally (in the absence of special circumstances) by the use of certain forms in an utterance (Grice 1975). If the finger belongs to someone else, a longer expression will be used (e.g., I cut my brother’s finger). We are speaking about an implicature here because it still depends on the context. One can imagine a situation where the most natural interpretation could be that John cut someone else’s finger. That would be the case, for instance, if John were a manicurist speaking to his colleague (Levinson 2000: 17).
Another example is implicated gender. For example, a nurse implicates ‘a female nurse’ because female nurses are the norm in many countries. If one speaks about male nurses, one often adds the adjective, as in the following scene from the film Meet the Parents (2000) with Ben Stiller and Robert De Niro:
Jack Byrnes: Is your name Gaylord Focker, yes or no?
Greg Focker: Yes.
Jack Byrnes: Are you a male nurse?
Greg Focker: Yes.
Also, English nominal compounds, which are formally ‘lean’, tend to have highly accessible interpretations. For example, a bread knife is a knife for cutting bread, a steel knife is a knife made of steel, and a kitchen knife is a knife used in the kitchen. When the intended interpretation is less accessible, a longer expression is used. For instance, one can speak of a knife made of ice when the interpretation of material is intended, rather than of an ice knife, which normally represents a tool for cutting and carving ice (cf. Hawkins 2004: 47).
From the efficiency perspective, I-implicatures can be accounted for by the principle of negative correlation between accessibility and costs. In the examples above, there is a default interpretation which involves some typical relationship or scenario. The intended interpretation has the highest accessibility given the linguistic cue. For example, female nurses constitute the overwhelming majority of the entire population of nurses, a fact known both to the addressee and the speaker.
Levinson also formulated the Q-heuristic: ‘What isn’t said, isn’t’ (Levinson 2000: 35). This means, in other words, that the lack of extra information or a stronger statement is informative. The Q-heuristic helps the speaker to spare effort because it enables them to omit additional restrictions. In Examples (12) and (13), the (a) version, where this principle is exploited, is shorter than the (b) version, which the speaker would need to produce if language users could not rely on the Q-heuristic.
(12) a. Her dress was red.
        (→ not red and blue or red and any other colour)
     b. Her dress was red, and only red.
(13) a. I’ve eaten some chocolates.
        (→ not all)
     b. I’ve eaten some chocolates, but not all.
The knowledge required for inferring these implicatures is the knowledge of the existing alternative expressions, such as ‘red and X’ for (12a), where X stands for any other colour, and ‘all’ for (13a). The addressee derives the implicatures because they understand that the more informative expressions are not selected (Levinson 2000: 40–41). The examples in (b) can also be motivated by the principle of positive correlation between benefits and costs. In particular, (12b) and (13b) can be efficient if the speaker wants to override the addressee’s false belief that the dress was in different colours, or that all chocolates were eaten. Cancelling old beliefs and creating new ones implies greater cognitive effects, which justifies the higher costs.
Finally, Levinson also formulated the M-heuristic. In the short version, it is expressed as follows: ‘What’s said in an abnormal way isn’t normal’ (Levinson 2000: 38). There is also a longer version, which may be somewhat easier to understand:
The M-heuristic
Speaker’s maxim: Indicate an abnormal, nonstereotypical situation by using marked expressions that contrast with those you would use to describe the corresponding normal, stereotypical situation.
Recipient’s corollary: What is said in an abnormal way indicates an abnormal situation, or marked messages indicate marked situations.
(Levinson 2000: 136)
Consider some examples, where both I- and M-implicatures are present:
a. Sue smiled.
   (I-implicature → Sue produced a nice happy expression.)
b. The corners of Sue’s lips turned slightly upward.
   (M-implicature → Sue produced a smirk or grimace.)
Another pair of examples illustrates the contrast between forms like go to school and go to the school, which involves highly conventionalized inferences:
a. She went to school/church/university/bed/hospital/sea/town …
   (Conventionalized I-implicatures → She went to do the stereotypical activity associated with this location.)
b. She went to the school/church/university/bed/hospital/sea/town …
   (M-implicatures → She went to the place, but not necessarily to do the associated stereotypical activity.)
Other examples of the contrast between more typical and less typical expressions which involve implicatures of stereotypical and non-stereotypical situations include litotes (e.g., happy vs. not unhappy, where the latter implicates ‘less than happy’), lexicalized forms of periphrasis (pink vs. pale red, i.e., an untypical pink), nominal compounds (e.g., a matchbox vs. a box for matches, i.e., an untypical one), and some prepositions (e.g., on the table vs. on top of the table). Another well-known case is the contrast between lexical and analytic causatives, as in the example below:
a. Susan stopped the car.
   (I-implicature → in the usual way, i.e., by putting her foot on the brake pedal.)
b. Susan got the car to stop.
   (M-implicature → in an unusual way, e.g., by using the emergency brake or crashing into a lamppost.)
The notion of markedness is not unproblematic, though. Marked forms are more morphologically complex and less lexicalized than the corresponding unmarked forms; they are also more prolix or periphrastic, less frequent, or less neutral stylistically. One can see that these are very diverse features (cf. Fenk-Oczlon 1991, 2001; Haspelmath 2006). As far as the meaning is concerned, marked forms imply some additional meaning or connotation absent from the corresponding unmarked forms (Levinson 2000: 137).
Thus, we can also regard M-implicatures as inferences based on the principle of negative correlation between accessibility and costs. When confronted with a costly form, the addressee chooses the less accessible interpretation.
For an M-implicature to be derived, it is crucial to have a conventionalized typical expression. If a periphrastic causative does not have a corresponding lexical causative, the implicature of doing something unusual does not emerge, according to Levinson. For example, the expression make someone laugh does not generate such an implicature because English has no lexical causative with a similar meaning (McCawley 1978: 250). It is an open question whether language users are likely to evaluate the meaning of make someone laugh as having low accessibility, even if they do not consider the causation unusual or strange. A more detailed discussion of causative constructions can be found in Chapter 7 of this book.
1.4.3 The Principle of Maximization of Accessibility
Finally, the principle of maximization of accessibility reflects the tendency to minimize processing effort by producing accessible (given, salient, frequent, etc.) meanings and forms early in the sentence, by putting semantically and syntactically related words next to each other, or by integrating referents in discourse by linking them with previous information. This helps to minimize processing costs related to surprisal and memory (Futrell and Levy 2017). Therefore, the addressee can expect that at each point in discourse, the information presented next will be maximally accessible given the rules of grammar, unless this is in conflict with the principle of negative correlation between accessibility and costs.
The principle corresponds closely to V. Ferreira and Dell’s (2000) Principle of Immediate Mention, which is driven by the pressure of producing fluent speech efficiently. According to this principle, speakers tend to choose syntactic structures that allow accessible lemmas to be mentioned early. As a result, the speaker buys some time to retrieve the less available material. Similarly, MacDonald (2013) speaks of the Easy First Bias: easily retrieved words and phrases tend to appear earlier in the utterance. Time is an important resource for both the speaker and the addressee. According to the unwritten rules of communication, the speaker is responsible for using time wisely.
Not only specific words but also abstract grammatical structures can be more or less accessible. Speakers tend to reuse recently executed sentence plans, as one can see from experimental evidence and corpus-based studies of morphosyntactic priming (Pickering and Branigan 1998; Pickering and Ferreira 2008; Szmrecsanyi 2006). With every use, a morphosyntactic plan becomes more likely to be reused in the future. MacDonald (2013) argues that the source of rigid word order lies in the preference for easy, more practised plans of utterances, which is called Plan Reuse. It encourages us to reproduce highly familiar structures. Both principles – Plan Reuse and Easy First – help to optimize language production, the former at the level of abstract schemas, the latter at the level of specific words and phrase elements. Human languages rely on both principles, but the preference for Plan Reuse is stronger in languages with rigid word order and weaker in flexible languages.
Addressees also process language as if they had expectations of maximal accessibility. This can be seen from the following example provided by Gibson (1998):
The bartender told the detective that the suspect left the country yesterday.
The adverb yesterday can be linked to two verbs. One candidate is the local verb left. The other one is the more distant verb told. The local attachment to the verb left is strongly preferred. According to Gibson, this preference is explained by the fact that it is less costly to reactivate the verb left in memory than to reactivate the verb told when integrating the adverb into the syntactic structure. The reason is that the activation of left has decayed less than the activation of the more distant verb told. More information about the costs of syntactic integration is provided in Chapter 3. In the presence of ambiguous parses, the addressee will choose the structure that minimizes the costs of keeping the unit in memory and reactivating it (in addition to other factors, such as the plausibility of an interpretation and usage frequency). In other words, the addressee will prefer the most accessible interpretation.
Moreover, the fact that the addressee buys into garden-path sentences like The horse raced past the barn fell and other misleading expressions, instead of wisely suspending the choice between possible interpretations until the end of the sentence, suggests that the addressee expects that the speaker will adhere to this principle. These sentences can mislead the addressee exactly because they are different from structures used in everyday communication. The addressee knows from experience that the most accessible interpretation is nearly always correct. The use of such misleading sentences is like a betrayal of this trust (although sentences like these are often produced for humorous effect, so the addressee is in fact rewarded with additional cognitive benefits).
1.5 ‘Good-Enough’ Efficiency
Language users tend to behave efficiently, following the principles formulated in the previous section. But how is this possible? What kind of cognitive processes are involved?
According to the pragmatic theory of Grice (1975) and the more recent Rational Speech Act model (Frank and Goodman 2012), language users behave rationally and cooperatively, and expect the same from their interlocutors. They follow the Cooperative Principle, which says:
Make your contribution such as is required, at the stage at which it occurs, by the accepted purpose or direction of the talk exchange in which you are engaged.
In a similar vein, we can also formulate the Principle of Communicative Efficiency:
The Principle of Communicative Efficiency
Communicate in such a way as to minimize the cost-to-benefit ratio.
Like the Cooperative Principle, the Principle of Communicative Efficiency can be treated as an implicit and mutual assumption shared by interlocutors. The speaker knows that the addressee believes that the speaker behaves efficiently. This is why, when using a particular linguistic expression, the speaker will rely on the addressee’s ability to interpret this expression in the intended way. In cases when this principle is obviously violated, the addressee can derive inferences that are similar to Gricean implicatures. For example, if one asks a question and gets a completely irrelevant answer (e.g., Why aren’t you married? – Nice weather, isn’t it?), this means that the question was possibly stupid or tactless. The inference is based on the principle of positive correlation between benefits and costs.
In the absence of obvious violations, the addressee can make inferences intended by the speaker, as well. For example, when hearing a message with a certain referential expression, the addressee can engage in pragmatic reasoning of the sort, ‘The speaker uses the form F1 to refer to some referent Ri. There are two referents, R1 and R2, that fit this description. The speaker is acting efficiently, and she expects that I know that. The referent R1 is more accessible than the referent R2. If she wanted to refer to the less accessible referent R2, she would be using the more costly form F2. But she is using the less costly form F1. Therefore, she should mean R1’.
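This kind of recursive reasoning is formalized in the Rational Speech Act framework (Frank and Goodman 2012) mentioned above. The following sketch is only an illustration of the general idea, not the authors’ implementation: the referents R1 and R2, the forms F1 (cheap and ambiguous, like a pronoun) and F2 (costly but specific, like a full description), and all probabilities and costs are hypothetical toy values. Accessibility is modelled as a prior over referents, and articulation effort as a cost over forms:

```python
import math

# Toy Rational Speech Act (RSA) sketch. All names and numbers are
# hypothetical: R1 is the more accessible referent, F1 the cheaper form.
referents = ["R1", "R2"]
forms = ["F1", "F2"]
prior = {"R1": 0.7, "R2": 0.3}   # accessibility modelled as a prior
cost = {"F1": 0.0, "F2": 1.0}    # articulation effort of each form
# Literal semantics: F1 (e.g., a pronoun) fits both referents,
# while F2 (e.g., a full description) fits only R2.
true_of = {"F1": {"R1", "R2"}, "F2": {"R2"}}
alpha = 4.0                      # speaker rationality parameter

def normalize(d):
    total = sum(d.values())
    return {k: v / total for k, v in d.items()}

def literal_listener(form):
    # Interpret the form literally, weighted by accessibility.
    return normalize({r: prior[r] * (r in true_of[form]) for r in referents})

def speaker(referent):
    # Choose a form by trading informativeness against articulation cost.
    scores = {}
    for f in forms:
        p = literal_listener(f)[referent]
        scores[f] = math.exp(alpha * (math.log(p) - cost[f])) if p > 0 else 0.0
    return normalize(scores)

def pragmatic_listener(form):
    # Reason about which referent would make the speaker choose this form.
    return normalize({r: prior[r] * speaker(r)[form] for r in referents})
```

With these toy numbers, hearing the cheap form F1 raises the listener’s belief in the accessible referent R1 above its prior of 0.7, which mirrors the informal reasoning quoted above: the speaker would have used the more costly F2 for the less accessible R2.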
I believe that this type of reasoning, which involves complex mental recursion with several levels of embedding, is possible, but infrequent. It is too effortful. The number of phonological, lexical and grammatical choices a language user makes during language production is so large that we would quickly be overwhelmed. Somewhat paradoxically, trying to be maximally efficient is inefficient. Using Daniel Kahneman’s (2011) terms, we use ‘fast’ (that is, sloppy, automatic, stereotypic, unconscious and easy) thinking way more often than ‘slow’ (deliberate, logical and effortful) thinking. In other words, our rationality is bounded.
I argue that efficient communicative behaviour is usually automatic, unconscious and therefore not very costly from the processing perspective. Language users adhere to simple shortcuts whenever possible, treating the efficient strategies as heuristics, or rules of thumb, similar to Levinson’s (2000) heuristics described in Section 1.4.2.
Once successful, efficient linguistic choices are reproduced over and over again, requiring even less effort than previously (cf. Diessel 2019: 37). This kind of recycling is efficient by itself because it saves our effort and produces desired effects most of the time. Communicative efficiency is only ‘good enough’, like most of our thinking and language processing (cf. Ferreira 2003). Although language users can engage in the recursive reasoning exemplified above, this happens rarely, when the principles of efficient communication are obviously violated, for example.
This view is supported by the results of experiments in Turnbull (2019), who found no consistent relationship between efficient phonetic reduction driven by predictability and individual differences in theory of mind. Individuals with high scores on the autism-spectrum quotient (AQ) questionnaire and low scores on other theory-of-mind tasks did not show less efficient behaviour in language production than participants with greater theory-of-mind abilities. This suggests that theory of mind is unlikely to be strongly involved in efficient management of communication costs, at least as far as efficient reduction is concerned. However, note that all participants were neurotypical, which leaves open the question of how autistic individuals with even weaker theory-of-mind abilities would perform in this respect.
At the same time, there is some empirical evidence that the principles of efficient communication formulated above can trigger pragmatic inferences. For example, the use of more costly forms signals that something unexpected is going on, which needs more attention and effort. In contrast, when less costly forms are chosen, this signals ‘business as usual’ to the addressee: nothing special is going on. In this way, costly and cheap forms can also be seen as tools for attention management, directing the processor’s cognitive effort to where it is needed most. Gordon and Chan (1995) showed that reading times increase when a repeated long referential expression is used for a highly accessible referent. Consider the following example:
Susan decided to give Fred a hamster. She/Susan was questioned at length by Fred about what to feed it.
When the longer referential expression (Susan) was used, the self-paced reading times were longer than when the pronoun she was used. A repeated proper name normally signals a new or less accessible referent, but this is not the case here. The readers get confused, which slows down their reading. These results indicate that there are certain expectations based on the principle of negative correlation between accessibility and costs.
Similarly, it was demonstrated by Engelhardt, Demiral and Ferreira (2011) that overspecification of referential expressions leads to longer processing times and negativities in event-related brain potentials (ERP, N400), which usually indicate different types of semantic and syntactic anomalies during language comprehension. For example, a participant in an experiment can see a red star and a red square side by side on the screen. If the participant hears ‘Look to the red star’, this results in slower reaction times and ERP negativity in comparison with the situation where the participant can see a red star and a blue star. This means that over-descriptions with an unnecessary modifier impair comprehension performance. More information does not mean more clarity.
There is also evidence that uninformative utterances are costly for pragmatically competent language users, as well. For example, processing of trivial sentences like Some people have lungs triggered a pragmatic N400 effect (Nieuwland, Ditman and Kuperberg 2010). Although accessible information is normally easier to process, language users expect a certain degree of inaccessibility from their interlocutor if a substantial amount of effort and time is used. Similarly, Rohde, Futrell and Lucas (2021) found shorter self-paced reading times for more newsworthy and informative expressions (e.g., chopping carrots with a shovel) in comparison with less newsworthy and informative ones (e.g., chopping carrots with a knife).
To summarize, this evidence suggests that language users have certain expectations about the relationships between the accessibility of information and the costs and benefits of its transfer. These expectations may be based on pragmatic reasoning, but it is more likely that they operate in the form of simple heuristics, e.g., ‘Don’t waste time and effort on useless information’, or even abstract constructional schemas representing pairings of form (e.g., a proper name) and function (e.g., a low-accessibility referent), which emerge as a result of frequent occurrence in language use.
A related question is how much ‘audience design’, or adjustment of one’s message to take into account the addressee’s perspective (Bell 1984), is needed for efficient communication. The evidence is mixed. For example, in an experiment by Isaacs and Clark (1987), pairs of participants were asked to work together to arrange pictures of New York City landmarks by talking about them. Some of the participants knew the city well; others did not. The experiment demonstrated that the participants assessed each other’s level of expertise very quickly and automatically, adjusting their choice of descriptions. In particular, the directors, who were describing the pictures, used fewer words on average when the matchers were familiar with the city than when they were not. For example, a director would say ‘the Citicorp building’ (now ‘Citigroup Center’) to an expert and ‘the tall building with the triangular top’ to a novice. The experts often introduced the proper name after the match. This allowed the novices to become experts, too. As a result, the total number of words decreased in the course of interaction. However, Isaacs and Clark also hypothesize that in more naturalistic settings, where the criteria of success are less stringent than in the experiment, language users would spend less effort by accommodating less and exchanging less expertise.
In some cases, language users take into account the other’s perspective, but they do so inconsistently. For example, Vanlangendonck, Willems and Hagoort (2018) investigated situations where the speaker and the listener have access to the same or different information. In their experiment, the speaker described one object from a set of objects shown in a picture, and the listener was supposed to select it. Some objects in the pictures were visible only to the speaker, while others were visible to both participants. For example, if the speaker saw three glasses, one small, one medium-sized and one large, while the addressee saw only the large and the medium-sized ones, we could expect that the speaker would refer to the medium-sized one as ‘a small glass’, provided that they took into account the addressee’s perspective. The participants indeed chose the pragmatically more felicitous description most of the time, but not always.
Fukumura and van Gompel (2012) investigated whether speakers choose pronouns or full noun phrases depending on whether the addressee also heard an immediately preceding sentence that made the target referent more accessible to the addressee. Although in general the proportion of pronominal reference was higher in the shared condition (i.e., when the addressee heard the preceding sentence) than in the privileged condition (i.e., when the addressee did not hear the sentence), as one could expect based on Accessibility Theory (Ariel 1990, 2001; see also Section 2.2.1), the difference was small and did not reach statistical significance. Moreover, this tendency was observed regardless of whether the preceding sentence mentioned the target referent or the competitor. More exactly, the speakers used pronouns slightly more often when the preceding sentence was shared with their addressee than when it was not. However, they did not take into account whether this sentence made the referent or the competitor accessible to the addressee. It is likely that the speakers experienced difficulties combining two types of information: the fact that the sentence was shared with the addressee, and the content of the sentence.
Experimental evidence suggests that the extent to which a speaker engages in audience design, taking into account the addressee’s perspective, depends on the cognitive load in a specific task (Pate and Goldwater 2015). If it is low, then the speaker adjusts to the addressee’s needs. If it is too high, the speaker fails to do so. If it is in-between, the speaker tries to evaluate the situation and make a choice. For example, speakers can adjust their language to the listener’s needs only if they see that the listener is engaged in interaction. This selectivity can be seen as efficient behaviour.
Pate and Goldwater (2015) found that predictability given preceding context affects word durations in adult-directed speech, but not in infant-directed speech. Also, when interlocutors saw each other and could use the visual channel, they used less phonetic redundancy in communication. The conclusion is that language users modulate predictability effects according to very coarse, salient, and easy-to-track characteristics of the addressee and the channel of communication.
It is also possible that the speaker and the addressee evaluate the costs, benefits and accessibility in many situations automatically, based on their own cognitive states rather than their interlocutor’s. This should work well when the speaker and the addressee create common ground in social interaction. Performing joint actions or simply discussing different issues, they align their conceptual representations and even synchronize their neural patterns (Stephens, Silbert and Hasson 2010). This helps to evaluate the accessibility of certain information for the partner correctly and easily. In the absence of such alignment or under cognitive pressure, language users fall back on their previous communicative experience, reproducing linguistic strategies that were previously successful with similar interlocutors or in general.
To summarize, the Gricean behaviour of the speaker and the addressee, involving full-scale pragmatic reasoning for making efficient choices, is possible, but unlikely in most situations of language use. The pragmatic principles described in the previous sections normally operate as heuristics and are likely to become conventionalized. If an efficient strategy is found, it can be recycled in future communication under similar circumstances, leading to ‘good-enough’ efficiency. The amount of the speaker’s effort required for estimating the accessibility of some information for the addressee is restricted by the cognitive resources the speaker has at the given moment.
1.6 Conclusions
This chapter has introduced the main concepts that will be discussed in this book: communicative efficiency, costs, benefits and accessibility. It was also proposed that language users’ behaviour is guided by several principles, which explain how they can behave efficiently in everyday communication. The three main principles are as follows: the principle of positive correlation between benefits and costs, the principle of negative correlation between accessibility and costs, and the principle of maximization of accessibility. I also argued that it is unlikely that these principles are normally realized in fully rational behaviour. It is more probable that the principles work as heuristics, or rules of thumb, which operate automatically and unconsciously most of the time, and do not involve full-scale mind reading and perfect audience design. A crucial role is played by previous experience of using language. Successful linguistic choices are recycled again and again.
These concepts and principles will be exemplified in the following chapters, which describe different manifestations of efficiency in language structure and use. In Chapter 2, I will discuss longer and shorter alternative forms which can be used efficiently. The main focus will be on the principle of negative correlation between costs and accessibility. Chapter 3 focuses on efficient order of meaningful elements and deals mostly with the principle of maximization of accessibility. Chapter 4 will show several less frequently discussed types of efficiency, which help to maximize accessibility as well.
2.1 Efficient Length Asymmetries
This chapter describes efficient use of linguistic units with different lengths. Length represents articulation costs, but it can also be interpreted as time expenditure. It is difficult to separate articulation costs from time costs. In phonological studies, one often measures the duration of units (e.g., Aylett and Turk 2004), while in studies of lexicon and grammar, which are often based on written corpora, the focus is usually on the number of words, segments or letters (see examples below). Although articulation effort also depends on stress and amount of articulatory detail, which require carefully annotated spoken data, I will focus here on length, which is easier to measure and compare.
As mentioned in Chapter 1, articulation is the slowest and most energy-consuming stage in human communication. The speaker can spare effort and time by omitting or shortening the forms that represent accessible information – that is, the information already available to the addressee, or easily inferable from the context and general knowledge. In contrast, more effort should be spent on information that is less accessible. This behaviour corresponds to the principle of negative correlation between accessibility and costs. We speak of an efficient length asymmetry when there is a negative correlation between formal length and accessibility of information. The sections below illustrate diverse formal asymmetries that display this correlation.
Formal length asymmetries are extremely diverse. Some efficient asymmetries are fully conventionalized, as, for example, zero marking of singular and non-zero marking of plural, e.g., book – books. Some asymmetries are context-dependent and require pragmatic inference. For example, when someone tells a taxi driver their address, they in fact exploit the principle of negative correlation between accessibility and costs. An expression like Park Street, 23, please is efficient, unlike saying I need to get to Park Street, house number 23, in this city. I want you to take me there in your cab now and promise to pay you a certain amount of money in return if you get me there.
The principle of negative correlation between accessibility and costs is also responsible for so-called bridging implicatures. For example, if your friend says, I bought a new bicycle yesterday. The saddle is very comfortable, you will understand that the saddle belongs to the bicycle that your friend has bought. Your friend relies on your ability to access the knowledge that a typical bicycle has a saddle. This allows your friend to spare effort instead of saying The saddle of the bicycle I bought yesterday is very comfortable. This type of efficiency is pervasive in discourse.
Importantly, by opting for a longer or shorter expression, the speaker signals how accessible the intended interpretation is. The length itself represents an instruction for where to search for the interpretation. For example, the pronoun she, as discussed in the next section, means that the referent is not only female and singular, but also highly accessible, whereas the definite description the friend means not only a ‘close acquaintance’, but also that the referent has a relatively low degree of accessibility (Ariel 2001: 29). In this sense, every linguistic expression the speaker chooses is also a marker of the accessibility of its interpretation (cf. Ariel 2001). Some marking of this type involves the speaker’s choice, as in the referential expressions mentioned above (see Section 2.2), while some is fully conventionalized, as in the obligatory marking of grammatical categories (see Section 2.3). I argue that the emergence and maintenance of obligatory marking follows the same pragmatic principle as the speaker’s choice between different coding possibilities in optional marking – namely, the principle of negative correlation between accessibility and costs.
2.2 Accessibility of Referents and Length of Referential Expressions and Markers
2.2.1 Efficient Use of Referential Expressions: Hierarchy of Explicitness
An important type of efficient context-dependent asymmetries is observed in referential expressions. One can formulate a hierarchy of explicitness of such expressions (Ariel Reference Ariel1990, Reference Ariel, Sanders, Schliperoord and Spooren2001; Arnold Reference Arnold2010; see also Givón Reference Givón1983, Reference Givón2017), as shown in (1):
(1)	Hierarchy of explicitness:
	Most explicit
	Semantically rich expressions (the most popular teacher at our school)
	Shorter nominal expressions (Ann, the teacher)
	Pronouns (she)
	Zeros
	Least explicit
This variation is constrained by the degree of accessibility of mental representations of the referents. The notion of accessibility was introduced in Chapter 1. Highly accessible representations are expressed by shorter forms than less accessible ones. Note that there can be more subtle accessibility distinctions within these broad categories, which cannot be explained by length alone. For example, James as a surname can signal lower accessibility than James as a first name (Ariel Reference Ariel, Sanders, Schliperoord and Spooren2001). In Russia, colleagues would refer to me using the full patronymic form Natalja Gennadievna in front of the students, and would say Natasha when speaking with other colleagues, although the accessibility of me as the referent could be the same. As discussed in Section 1.2.2, social costs often interact with formal length.
The level of accessibility of referents in discourse depends on several factors (Ariel Reference Ariel1990, Reference Ariel, Sanders, Schliperoord and Spooren2001, Reference Ariel2008; Arnold Reference Arnold2010), which can interact in complex ways (see Ariel Reference Ariel, Sanders, Schliperoord and Spooren2001). A crucial factor is previous discourse. The referents that have been introduced in discourse have more activated representations than the referents that have not been mentioned. This is why full nouns are typically used to introduce new referents, while pronouns or zeros are usually reserved for the referents already introduced in the discourse. Moreover, the more recent the mention of the referent, the more accessible the mental representation is. For example, Arnold (Reference Arnold2010) provides data to show that the chances of pronominal reference decrease with distance from the last mention of the referent (measured in clauses). Paragraphs and episode boundaries also decrease accessibility. A related factor is density of mention. The higher the density of mention of a referent in previous discourse, the more activated its mental representation is and therefore the higher the chances of short (pronominal) expressions (Levy and McNeill Reference Levy and McNeill1992). In addition, topical referents are more accessible and therefore expressed by less explicit forms than non-topical ones.
The syntactic function of the referent is another important factor. A referent is more accessible if it has been mentioned previously in the same syntactic function. This parallelism makes it easier for the addressee to identify the referent. This explains why reduced forms are more likely if the referring expression and the previous mention of the referent are in the same syntactic position (Levy and McNeill Reference Levy and McNeill1992). Consider an example:
(2)	Ann invited Sue to the conference.
	a. She asked Sue to present her new research on metaphors.
	b. Sue asked her to tell more about the event.
According to Arnold (Reference Arnold2010), the preference for the pronoun she/her that refers to Ann should be stronger in (2a) She asked Sue…, than in (2b) Sue asked her… . Also, the current thematic role of the referent can be important. For example, Arnold (Reference Arnold2001) shows that goals of verbs of transfer, e.g., give/send/bring to Sue, are more frequently referred to by shorter pronominal forms than sources, e.g., accept/get/borrow from Sue. Language users also refer more to goal referents than to source referents in discourse, as Arnold’s story-telling experiment and corpus analyses reveal. This frequency asymmetry is also observed for inanimate goals and sources (e.g., to London/the market/a village is more frequent than from London/the market/a village), which accounts for the cross-linguistic differences in the length of marking of goals and sources (Michaelis Reference Michaelis2017). The higher probability of goals means their higher accessibility, which explains why they are expressed by shorter forms than sources.
The presence of competing referents in the context decreases accessibility. Tily and Piantadosi (Reference Tily, Piantadosi, van Deemter, Gatt, van Gompel and Krahmer2009) found, in particular, that participants were less likely to guess the upcoming referent correctly if there were many referents in the previous text. Notably, the presence of other referents plays a role even if there is no direct need for disambiguation. For example, Arnold and Griffin (Reference Arnold and Griffin2007) performed an experiment with cartoons in which participants saw either one character or two characters of different genders. The first line of the story was, for example, Daisy went for a boat ride {with Mickey} on the lake. Next, a second picture was shown, which displayed one character doing something (e.g., Daisy rowing away). The second character was either present or absent. Participants generated another line for the story (e.g., Daisy left Mickey behind; or She rowed into the sunset). Interestingly, pronouns were more common in the one-character than in the two-character stories, even though there was no risk of confusability, since the characters had different genders. The competition between the characters in the speaker’s mental model results in greater cognitive load and, importantly, in lower activation of each referent.
Finally, we should mention the interaction between the speaker and the addressee. Wilkes-Gibbs and Clark (Reference Wilkes-Gibbs and Clark1992) show that descriptive nominal expressions tend to become shorter when the speaker and the hearer develop and expand their common ground – the information they believe they share. Interestingly, even subtle differences in the status of the hearer, e.g., from being able to overhear or watch the previous interactions to being totally new to the scene, determine the amount of coding in the subsequent interaction.
To summarize, the more accessible a referent is due to the immediate context, interaction settings, syntactic role or previous experience with language, the less costly the referential expression will be. As pointed out by Ariel (Reference Ariel1990), the choice of the specific form helps the addressee to identify the location of the referent in their mental representation. The use of a shorter variant signals that the referent is accessible. Longer forms signal low accessibility. Section 1.5 discussed experimental evidence showing that processing costs increase if there is a mismatch between the length of a referential expression and the accessibility of the referent. This supports the idea that the pragmatic processes captured by the principle of negative correlation between accessibility and costs play a role in the processing.
As for zero anaphora, different languages have different rules with regard to which constituents can or should be omitted in discourse. Yet, there are a few general tendencies. First, given and topical referents, which are restorable from previous discourse, are more frequently omitted than new and focal ones. In many languages, including Chinese, Japanese, Korean, Hindi, Hungarian and Lao, any given, non-focal argument can be omitted, whereas no language omits focal elements (Goldberg Reference Goldberg, Östman and Fried2005). As defined by Lambrecht (Reference Lambrecht1994: 218), the focus relation relates ‘the pragmatically non-recoverable to the recoverable component of a proposition and thereby creates a new state of information in the mind of the addressee’. This is why focal elements need to be overtly expressed.
In Gilligan’s (Reference Gilligan1987) cross-linguistic study, imperative subjects can be omitted in nearly all languages, followed by other subjects and then by direct objects. Other constituents (indirect objects, possessive pronouns and adpositional objects) are very rarely omittable. Note that this hierarchy is observed in languages without agreement (see Section 2.2.2). The hierarchy can be explained by the different average levels of accessibility of different arguments. Imperative subjects are easily restorable from the context, and therefore highly accessible. They are followed by other subjects, which are more frequently thematic and given, and therefore more accessible than objects (Lambrecht Reference Lambrecht1994: 262; see also Chapter 8). Some languages display variation within a specific argument. For example, in Ancient Greek, it was natural to omit definite objects if they were highly accessible (Luraghi Reference Luraghi2003). Notably, their omission depended on the degree of conventionalization and grammaticalization of the information helping the addressee to access the object referent. In highly grammaticalized constructions with conjunct participles, object omission was obligatory. It was also common in coordinated clauses, followed by answers to yes–no questions. In other cases, omission was discourse-conditioned and optional. It affected highly accessible topical objects.
The examples of obligatory zero arguments demonstrate that efficient behaviour motivated by accessibility of a referent in context can become conventionalized, becoming obligatory. This mechanism is efficient by itself, as it makes language production more automatic and reduces the processing load.
English represents an interesting case as far as zero objects are concerned. Although it generally does not allow for object omission, there are a few lexically specific exceptions (Fillmore Reference Fillmore1986). Consider the following contrasts:
She won Ø can be said when the person in question won an election/game/race, but not if she won the gold medal or the first prize.
She lost Ø, again, can be said if she lost some competition, but not if she lost her wallet or keys.
We’ve already eaten Ø can be said in the situation when we have had a meal, but not when we have eaten something specific.
I forgot Ø, e.g., to fix something, but not if the speaker forgot the keys.
Interestingly, the object cannot be omitted even if it is previously mentioned or clear from the context (e.g., Where’re the keys? I forgot *(them)).
One might think that abstract entities and events are more commonly omitted than concrete physical objects. However, this is not quite true. If we take verbs of motion with a specific destination or point of departure, the object can be omitted if it is a physical location, and cannot be omitted if it is abstract and metaphorical:
She was approaching Ø (e.g., the speaker, the town), but not if she approached the solution.
She arrived Ø (e.g., at the summit), but not if she arrived at the answer.
The elliptical use is supported by conventionalized inferences based on the principle of negative correlation between accessibility and costs. This is obvious in the case of motion verbs, where the interpretation of the physical motion (approaching a location and arriving at a certain place) is the stereotypical interpretation, and the metaphorical extensions (approaching a solution or arriving at an answer) are less accessible. In other cases, the interpretation that allows the ellipsis is on average more probable than the interpretation that does not.
Consider the verb win. In a random sample of 100 examples of the verb from the Corpus of Contemporary American English (COCA, Davies Reference Davies2008– ), 90 were instances of win as a verb followed by a direct object or used without any complement. The majority of these instances (61) were about winning some competition (elections, sports, social conflicts, etc.), as in (3a). We will call this sense win1. Only 26 were about winning something for oneself (a prize, confidence, support, more rights, a Senate seat, etc.), as in (3b). This usage will be called win2. In three instances, it was difficult to classify the examples semantically.
(3)	a. Everything counts, everything has to be perfect for you to win the game. (COCA, News, Denver, 2005)
	b. Guess what? You can win a cruise at home as well. (COCA, Spoken, NBC: Today Show, 2017)
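The sense proportions in the sample discussed above involve only simple arithmetic, but a short script makes the frequency asymmetry explicit. The counts (61, 26 and 3 out of 90) come from the text; the shorthand sense labels and the script itself are ours, added purely for illustration.

```python
from collections import Counter

# Counts from the COCA sample of verbal "win" described above.
sample = Counter({
    "win1 (win a competition)": 61,
    "win2 (win something for oneself)": 26,
    "unclear": 3,
})

total = sum(sample.values())  # 90 relevant instances
for sense, n in sample.most_common():
    print(f"{sense}: {n}/{total} ({n / total:.0%})")
```

On these counts, win1 accounts for roughly two thirds of the relevant uses, consistent with the claim that the competition interpretation is the more accessible one.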
The meaning of win1 (i.e., winning some competition) is more common and therefore more restorable from context than win2 (i.e., winning some objects or other benefits). Also, the information about winning a competition is often mentioned previously or clear from context. Consider (4a and b):
(4)	a. If this is a big chess game, did you win or lose? (COCA, Spoken, CBS_48Hours, 2007)
	b. How are you doing in the polls? How are you going to win in New Hampshire? (COCA, Spoken, CBS_Early, 1999)
So, the information about the competition X wins (win1) is often accessible. It is discourse-given and topical. In contrast, the information about the prize X wins (win2) is usually not accessible. It is often focal. This is why the intransitive use of win2 has not become conventional, even if the object is accessible in a given context, e.g., Where did he get ten million dollars from? – He won2 *(them) in a lottery. This contrast demonstrates how an efficient strategy, once conventionalized, can harden into a categorical grammar rule.
Omission can also be due to reasons different from saving articulation effort or time. For example, taboo objects, such as bodily emissions (spit, piss), are usually omitted for reasons of politeness (examples from Goldberg Reference Goldberg, Östman and Fried2005):
(5)	a. Pat sneezed (mucus) onto the computer screen.
	b. The hopeful man ejaculated (his sperm) into the petri dish.
	c. Pat vomited (her lunch) into the sink.
These are cases of the so-called Implicit Theme Construction (Goldberg Reference Goldberg, Östman and Fried2005). At the same time, the object is highly accessible to the addressee from general knowledge, so its omission helps to save effort, as well.
Next, the object can also be irrelevant if the attention is on the action itself (Goldberg Reference Goldberg, Östman and Fried2005):
(6)	a. Tigers only kill at night.
	b. She gave and gave, and he took and took.
These are instances of the so-called Deprofiled Object Construction (Goldberg Reference Goldberg, Östman and Fried2005). This agrees with Givón’s (Reference Givón2017: 3) principle of cataphoric zeros: ‘Unimportant information need not be mentioned.’ Probably the most famous example of this principle at work is omission of the agent in passive constructions:
(7)	An English tourist was robbed of his Rolex watch (by Ø).
This type of argument omission is efficient, as well. The speaker does not spend effort on transfer of information that will bring no communicative benefits (see Section 1.4.2).
Goldberg also explains conventionalized habitual uses like She drinks/smokes/writes as a result of such deprofiling of the object, with subsequent lexicalization of the intransitive use. A similar perspective is taken by Givón (Reference Givón2017: 198). Indeed, what is important is that the person in question is an alcoholic, a smoker or a writer.
Although this interpretation is perfectly reasonable, accessibility of the object may also play a role. In particular, Huang (Reference Huang2007: 48–49) classifies uses like John doesn’t drink in the sense ‘John doesn’t drink alcohol’ as cases of lexical narrowing based on an I-implicature (see Section 1.4.2). Alcohol is a highly accessible interpretation if one is speaking about a habit, as the present simple form suggests. So it can be omitted as a typical object. Similarly, one can say John smokes, implying that he smokes tobacco (cigarettes, cigars or a pipe). Smoking other substances would be a less likely interpretation. One might wonder, however, whether this inference would be made in a community where other plants are preferred.
Resnik (Reference Resnik1996) investigated the use of English verbs with and without objects in corpora and in human subject norms. He also measured selectional preference strength, which reflects the strength of association between verbs and the semantic classes of their objects. The stronger the preference, the more biased a verb is towards objects of certain semantic classes. Resnik found that the percentage of omitted objects positively correlated with selectional preference strength. For example, drink and sing had the highest rates of object omission, as well as the strongest selectional restrictions. In contrast, verbs like get and make had zero object omission rates and weak selectional restrictions. He concluded that strong selectional restrictions are a necessary condition for object omission. Notably, Glass (Reference Glass and Farrell2020) does not find strong support for this claim in general-interest conversations on Reddit. However, when the data are taken from specific-interest threads, an interesting pattern emerges: verb objects are more frequently omitted in the communities where they are more strongly associated with a routine. For example, fitness enthusiasts frequently omit the object of the verb lift (weights), whereas home-brewers do not mention the object of bottle (beer). This demonstrates the importance of social and situational expectations for efficient use and omission of arguments.
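Resnik’s selectional preference strength is the Kullback–Leibler divergence between the distribution of semantic classes of a verb’s objects and the overall distribution of those classes. The sketch below implements that formula; the class inventory and all probabilities are invented toy values for illustration, not Resnik’s actual data.

```python
import math

def selectional_preference_strength(p_class_given_verb, p_class):
    """D(P(c | v) || P(c)) in bits, following Resnik (1996)."""
    return sum(p * math.log2(p / p_class[c])
               for c, p in p_class_given_verb.items() if p > 0)

# Overall (prior) distribution of object classes -- toy values.
p_class = {"beverage": 0.05, "food": 0.15, "artifact": 0.50, "abstract": 0.30}

# "drink" is strongly biased towards beverage objects ...
p_drink = {"beverage": 0.90, "food": 0.05, "artifact": 0.03, "abstract": 0.02}
# ... while "get" takes objects of roughly any class.
p_get = {"beverage": 0.05, "food": 0.15, "artifact": 0.45, "abstract": 0.35}

print(selectional_preference_strength(p_drink, p_class))  # high (strong preference)
print(selectional_preference_strength(p_get, p_class))    # near zero (weak preference)
```

On this measure, a verb like drink, whose object distribution departs sharply from the prior, scores high and is a good candidate for object omission, while a verb like get, whose objects mirror the overall distribution, scores close to zero.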
It is possible that all the factors mentioned above play a role in determining if the argument can be omitted: its level of accessibility (based on diverse sources), the communicative benefits of naming it, and politeness concerns. The interaction of these factors requires further investigation.
2.2.2 Dependent Forms of Arguments
In addition to the factors discussed in the previous section, argument omission also depends on the presence or absence of agreement. In a cross-linguistic survey by Gilligan (Reference Gilligan1987: Section 3.4), languages where the verb agrees with a specific argument nearly always allow for omission of that argument. An example is subject agreement in Pashto:
(8)	Pashto: Indo-European (Huang Reference Huang2007: 142)
	Ø	mana	xwr-əm.
		apple	eat-1.m.sg
	‘(I) ate the apple.’
Languages without agreement allow pro-drop less frequently. As far as subject expression is concerned, this claim is supported by a recent study by Berdicevskis, Schmidtke-Bode and Seržant (Reference Berdicevskis, Schmidtke-Bode and Seržant2020), who report that languages that have subject indexation tend to allow for omission of the pronominal subject. They interpret this as evidence for an efficient trade-off: the subject should be coded only once, either as an independent form or as an agreement marker (see a critical evaluation of this claim in Section 6.2.1). There are some indications of a similar trade-off in the case of object agreement: some languages and language groups (Arabic, Bantu and Iranian) have so-called pro-indexes, which are in complementary distribution with object nominals (Haspelmath Reference Haspelmath, Bakker and Haspelmath2013a; Haig Reference Haig2018). In other words, the indexes cannot occur when the object is explicit (although they may occur in the case of dislocated objects). However, object indexing often depends on diverse semantic and pragmatic factors, which are parallel to those relevant for differential case marking of objects (see Chapter 8). This can lead to patterns opposite to pro-indexing.
Consider Ruuli, a Bantu language, which has differential object indexing. In (9), the index ‑bu- corresponds to the noun class of the object (traps).
(9)	Ruuli: Bantu (Just and Witzlack-Makarevich, Reference Just and Witzlack-MakarevichForthcoming: 2)
	Obuterega	o-bu-maite?
	trap(14)	2sg.sbj-14.obj-know.pfv
	‘Do you know these traps?’
The indexing is probabilistic: 1st- and 2nd-person, human and given objects are more frequently indexed than 3rd-person, non-human and new ones (Just and Witzlack-Makarevich, Reference Just and Witzlack-MakarevichForthcoming). This is efficient because new, indefinite/non-specific, nominal and 3rd-person referents are more likely to be objects than subjects, while given, definite/specific, pronominal and 1st- or 2nd-person referents are biased towards the subject role (see the data in Section 8.4). So, arguments with a more accessible interpretation in terms of their grammatical role are less likely to be marked than arguments with a less accessible interpretation.
Another example is Maltese (Just and Čéplö Reference Just and Čéplö2019). An object index is always present if the object is pronominal and given, and always absent if it is new and non-specific (in typical VO sentences). Thus, arguments whose grammatical role is less accessible are indexed, and those whose role is more accessible are not. Also, an index is always used in sentences with OV order, which is less typical than VO. By providing an object marker, the speaker helps the addressee to process a sentence with a non-canonical order (see another example in Section 8.3.1).
We can also find efficient patterns at a more general level if we compare different arguments. Siewierska (Reference Siewierska2004: 43–46) observes a cross-linguistic correlation between the two scales in (10), which describe types of person markers.
(10)	a. Scale of phonological reduction/dependence of person markers:
		Zero > Bound > Clitic > Weak
	b. Scale of argument prominence:
		Subject > Direct object/Theme > Indirect object > Oblique
In the vast majority of languages that she examined (89 per cent, to be exact), more phonologically reduced and/or dependent person markers according to the scale in (10a) are used for arguments higher on the argument prominence hierarchy in (10b). Siewierska explains this correlation by the differences in accessibility of typical arguments in different syntactic positions:
since dependent person markers involve less encoding than independent ones, the expectation is that they should be characteristic of syntactic functions which tend to realize highly accessible referents.
Therefore, we can observe efficient asymmetries both on a global level (between person forms of different arguments), and on the level of specific arguments (as in differential indexing). As we will see below, such ‘recursive’ organization of efficient patterns is very common.
2.2.3 Expression of Coreferential Objects
Coreferentiality allows us to see two types of efficient correlations between accessibility and formal length. First, reflexive pronouns coreferential with the subject are either as long as or longer than corresponding forms with disjoint reference, e.g., English himself vs. him, Dutch zich or zichzelf ‘him/herself, themselves’ vs. hem ‘him’, and Mandarin Chinese (tā) zìjǐ ‘him/herself’ vs. tā ‘him/her’ (Haspelmath Reference Haspelmath2008a). This has to do with the fact that in the overwhelming majority of cases, the subject and the object have disjoint reference (Ariel Reference Ariel, Sanders, Schliperoord and Spooren2001: 37; Ariel Reference Ariel2008: 218–219). For example, the Book of Genesis in Hebrew contains no direct objects coreferential with their subjects, out of approximately 4,500 clauses. This means that a disjoint reference interpretation of an object is more accessible than a coreferential one, which explains why the corresponding forms are often shorter. A diachronic account of the emergence of reflexive pronouns is offered in Section 5.3.1.
Second, similar to what we saw in the previous section on agreement markers, some languages display efficient asymmetries also at a more local level. There is variation within coreferential uses, which depends on the semantics of the verb. A language can have different coreferential forms for objects of verbs that usually represent self-directed actions, which include grooming verbs (e.g., wash, shave or dress), and for objects of verbs normally representing other-directed actions (e.g., hate, see or envy). Coreferential objects of self-directed verbs tend to have forms that are as long as or shorter than coreferential objects of other-directed verbs (Ariel Reference Ariel2008: Ch. 6; Haspelmath Reference Haspelmath2008a). For example, in English it is possible to omit the object when the action is self-directed, e.g., He shaved and dressed. In contrast, one cannot omit the object of an other-directed verb, e.g., He hates himself. This formal difference is efficient because a coreferential object of a self-directed verb is highly accessible, while a coreferential object of an other-directed verb has low accessibility. The different degrees of accessibility are supported by corpus frequencies (Haspelmath Reference Haspelmath2008a).
Thus, on the global level, coreferential objects are usually less accessible than objects with disjoint reference. This is why reflexive pronouns are often longer than non-reflexive ones. Moreover, at a local level, coreferential objects of verbs like wash are more accessible than coreferential objects of verbs like hate, for which disjoint reference is more typical. This is why coreferential objects of verbs like wash are shorter than coreferential objects of verbs like hate. The multiple layers of efficiency we observe here are similar to global and local markedness patterns and coding splits (Haspelmath Reference Haspelmath2021b), which are discussed in the next section.
2.3 Grammatical Coding Asymmetries and Splits
2.3.1 Global Markedness
Grammatical coding asymmetries are observed in members of contrasting grammatical categories that are expressed by markers of different length (Greenberg Reference Greenberg1966; Haspelmath Reference Haspelmath2021a). Below are some examples.
(11)	a. singular vs. plural nouns (e.g., book – books)
	b. positive vs. comparative and superlative degrees of comparison of adjectives (e.g., nice – nicer – the nicest)
	c. cardinal vs. ordinal numerals (e.g., ten – tenth)
	d. indicative vs. subjunctive (e.g., I go – I would go)
	e. active vs. passive verb forms (I called X – I was called by X).
It is a robust cross-linguistic tendency that the first member in these pairs is formally unmarked (or has a shorter marker), whereas the second (and third) one is formally marked (or has a longer marker). These coding asymmetries became important in structuralist linguistics after Roman Jakobson (Reference Jakobson and Jakobson1971 [1932]) extended the notion of markedness from phonology to grammar. In binary oppositions, the shorter member is considered the unmarked one, whereas the longer one is referred to as marked. The unmarked member appears in neutralization contexts. For instance, in the opposition between singular and plural, as in cat – cats, the singular form is used to express the generic meaning, e.g., The cat is a night wanderer. Therefore, it is considered unmarked. With time, the notion of markedness has become so broad, being understood as non-naturalness, cognitive complexity, language-specific or cross-linguistic rarity, etc., that it can hardly be considered a useful scientific concept (see Haspelmath Reference Haspelmath2006). As argued by Fenk-Oczlon (Reference Fenk-Oczlon1991, Reference Fenk-Oczlon, Bybee and Hopper2001) and later by Haspelmath (Reference Haspelmath2006), markedness phenomena can be reduced to frequency effects, which provide a more parsimonious explanation and a causal mechanism for many interesting facts. For example, the unmarked members in the examples above usually have higher inflectional and syntagmatic potential than the marked members (Croft Reference Croft2003: Chapter 4). This and other observations can be explained by the fact that the unmarked members are more frequent than the marked ones (some corpus evidence is provided in Greenberg Reference Greenberg1966).
Importantly for the efficiency account of these asymmetries, the marked members are usually expressed by longer forms than the unmarked ones. According to Haspelmath, the unmarked categories are more frequent, and therefore, their meaning is more predictable:
Speakers can afford to use short shapes or zero coding for predictable meanings, but they have to make a greater coding effort for unpredictable meaning.
Using the notion of accessibility, we can say that a singular interpretation of a nominal is in general more accessible than a plural one. This allows language users to spare effort when speaking about singular referents. The same logic applies to the other coding asymmetries.
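The predictability account has a simple information-theoretic reading: in an optimal code, a meaning with probability p receives a code of about −log2 p units, so more frequent meanings warrant shorter forms. The sketch below illustrates this; the singular/plural frequencies are invented purely for illustration, not corpus figures.

```python
import math

# Invented illustrative token frequencies of singular vs. plural nouns.
freq = {"singular": 0.75, "plural": 0.25}

for category, p in freq.items():
    # Optimal (Shannon) code length for a meaning of probability p.
    print(f"{category}: optimal code length ≈ {-math.log2(p):.2f} bits")
```

The more frequent, more predictable singular warrants the shorter code, mirroring the zero vs. overt marking asymmetry (book vs. book-s).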
2.3.2 Local Markedness
The examples in (11) illustrated global markedness, where the markedness contrast is the same for all instances of the categories (e.g., singular is unmarked, while plural is marked). Local markedness, in contrast, represents a markedness reversal for some members of the contrasting categories. Tiersma (Reference Tiersma1982) discussed such exceptions in paradigm levelling in Frisian and some other languages. Markedness theory predicts that the levelling of a paradigmatic alternation will favour the unmarked form. However, when some Frisian nouns undergo levelling, it is the originally ‘marked’ plural form, rather than the ‘unmarked’ singular, that becomes the basis of the new paradigm. For example, goes/gwozzen ‘goose/geese’ becomes gwos/gwozzen. Thus, the plural stem can be seen as unmarked. Tiersma showed that this markedness reversal happened to those nouns that are frequently used in the plural (‘arm’, ‘goose’, ‘horn’, ‘stocking’, etc.). Some examples from Slavic languages and Bavarian dialects are given in Fenk-Oczlon (Reference Fenk-Oczlon1991).
In some cases, the frequency effects can be even stronger and trigger a reversal of the formal marking. There are a few languages, for example, that can have both overt plural marking (e.g., day – days) and overt singular marking (e.g., Welsh pys-en ‘pea’ – pys ‘peas’), depending on the noun. Haspelmath and Karjus (Reference Haspelmath and Karjus2017) distinguish between ‘individualist’ nouns, which tend to occur with uniplex meaning, e.g., day, and ‘gregarious’ nouns, which are usually associated with multiplex meaning, e.g., pea. Gregarious nouns are often the names of fruits and vegetables, e.g., Russian kartofel’ ‘potatoes (mass noun)’ – kartofelina ‘potato’; small animals, e.g., Welsh adar ‘birds/flock of birds’ – aderyn ‘bird’; and body parts, e.g., Cushitic farró ‘fingers’ – farri-t ‘finger’. Corpus data from different languages demonstrate that the nouns that tend to have overt singular cross-linguistically are also predominantly gregarious. That is, they are used in the multiplex sense.
It would be efficient if all languages were like Welsh, marking the plural of individualist nouns and the singular of gregarious nouns. However, this is not what we see in the world’s languages. For example, English individualist and gregarious nouns behave similarly, e.g., day – days, pea – peas, potato – potatoes, bee – bees, eye – eyes. There is a strong competing factor, namely the systemic pressure, which explains why such efficient strategies are not very frequent cross-linguistically. A system with simpler rules is easier to learn (Haspelmath Reference Haspelmath, MacWhinney, Malchukov and Moravcsik2014).
2.3.3 Coding Splits
A famous example of coding splits is differential object marking. If a language formally marks some objects and does not mark others, prominent (e.g., animate and definite) objects tend to be formally marked, while less prominent (inanimate and indefinite) ones are usually unmarked. Differential object indexing was discussed in Section 2.2.2. Differential case marking of subjects and objects will be addressed in detail in Chapter 8. In all these cases, languages tend to mark more frequently those arguments for which the interpretation of an object or subject is less accessible given some semantic and pragmatic features or other contextual factors.
Coding splits can also be found in locative marking (Haspelmath Reference Haspelmath2019). If a language has a split depending on the semantics of locative noun phrases, then place names are likely to be unmarked, inanimates can be either unmarked or marked, and animates tend to be marked. The explanation is that place names represent typical locations, while animates are untypical locations. In other words, the interpretation of a location is the most accessible for place names, and the least accessible for animate beings.
Another example is adnominal possessive constructions, e.g., John’s house (Haspelmath Reference Haspelmath2017). In some languages, different possessive constructions are used, depending on whether possession is alienable or inalienable. For example, in Abun, a West Papuan language, there is the following contrast:
| Abun: West Papuan (Berry and Berry 1999: 77–82, cited from Haspelmath Reference Haspelmath2017: 194) | |||
| a. | alienable possession | ||
| ji | bi | nggwe | |
| I | gen | garden | |
| ‘my garden’ | |||
| b. | inalienable possession | ||
| ji syim | |||
| I arm | |||
| ‘my arm’ | |||
This example illustrates a cross-linguistic tendency for inalienable possession constructions, as in (12b), to have shorter coding than alienable possession constructions, as in (12a). Haspelmath’s corpus data demonstrate that entities that are usually inalienable (kinship terms, body parts) occur in possessive constructions (e.g., ‘my hand’, ‘his sister’) more frequently than alienable objects, such as a house, a garden or a knife. In other words, the interpretation of inalienable entities as possessed is more accessible. Since nouns that are more frequently mentioned as possessed receive less formal marking than those that are less frequently mentioned as such, this coding split can be regarded as efficient. More details about the diachronic development of such patterns follow in Sections 5.2 and 5.3.3.
Differential marking of the Recipient can be found in English. It can be expressed by a zero-marked form in the double-object dative (e.g., Sue gives her colleague the memory stick), and by a case-marked form in the prepositional dative (e.g., Sue gives the memory stick to her colleague). The two constructions have different word orders, namely, Recipient + Theme in the double-object construction and Theme + Recipient in the prepositional dative (although there can be exceptions, especially in dialects; Hawkins Reference Hawkins1994: 214; Gast Reference Gast2007). There is substantial evidence that language users switch between the constructions in order to manage the flow of information and optimize processing, as will be shown in Section 3.2.2. For example, Bresnan et al. (Reference Bresnan, Cueni, Nikitina, Baayen, Bouma, Krämer and Zwarts2007) show that the double-object construction is preferred when the Recipient is animate, definite, given and pronominal, whereas the Theme is non-given, non-pronominal and indefinite and has a low rank on the animacy hierarchy. The prepositional dative is preferred in the reverse situations (see also Hawkins Reference Hawkins1994: 212–214; Goldberg Reference Goldberg1995: 91ff). In addition, according to Goldberg (Reference Goldberg1995: Chapters 5–7), the prepositional dative construction is a metaphorical extension of the caused-motion construction ‘X causes Z to move to Y’ (e.g., I sent the letter to my parents/to her old address), while the double-object construction means ‘X causes Y to receive Z’. This semantic difference is also supported by the distinctive collexeme analysis in Gries and Stefanowitsch (Reference Gries and Stefanowitsch2004).
Yet, the constructions differ not only with regard to the order of their constituents and their semantics, but, crucially, also in the amount of formal coding. Haspelmath (Reference Haspelmath2021b) argues that the shorter variant in alternations is normally used if the referential prominence of the arguments corresponds to their roles, while the longer variant is used if there is some deviation from such canonical relationships. In particular, if an argument is animate, given, definite and pronominal, it is more likely to be the Recipient than the Theme. Conversely, if an argument is inanimate, new, indefinite and nominal, it is more likely to be the Theme than the Recipient. The features that provide strong cues to the roles (namely, an animate, given, definite and pronominal Recipient, and an inanimate, new, indefinite and nominal Theme) are associated with the shorter double-object construction, according to the data in Bresnan et al. (Reference Bresnan, Cueni, Nikitina, Baayen, Bouma, Krämer and Zwarts2007). Therefore, we can interpret the division of labour between the two dative constructions as an efficient coding split in the marking of the Recipient: the construction with more formal coding (that is, the prepositional dative) expresses a less accessible assignment of roles to arguments than the construction with less formal coding (the double-object dative).
Interestingly, the frequency of the to-dative rose dramatically in Middle English, when formal marking on verbs and nouns was substantially reduced. Zehentner (Reference Zehentner2022) uses corpus data to show that the more costly to-construction was preferred in contexts with semantically atypical Recipient and Theme – that is, if Recipient is inanimate and/or Theme is animate. These findings can be regarded as support for the idea that the additional marking is used to facilitate a less accessible interpretation.Footnote 5
2.4 The Use and Omission of Clause Connectors
2.4.1 Omission of Adverbial Clause Connectors
In Relevance Theory (Sperber and Wilson Reference Sperber and Wilson1995), an important distinction is made between conceptual (representational) and procedural (computational) information. The former is information about concepts or conceptual representations to be processed, and the latter is information about how to process them (e.g., Blakemore Reference Blakemore1987; Wilson and Sperber Reference Wilson and Sperber1993). For instance, the conjunction so conveys such procedural information:
She’s got a PhD, so she’ll be able to fill in this form.
Such connectors indicate the type of inference process that the addressee is expected to go through. In (13), the connector so indicates that the second clause should be interpreted as a conclusion. As Blakemore points out, expressions like so contribute to relevance by guiding the addressee towards the intended cognitive effects. In Grice (Reference Grice, Cole and Morgan1975), such inferences, which are associated with specific expressions, are called conventional implicatures. The connector so conventionally implicates, according to Grice, that the first clause explains the second. In spite of the differences between the theoretical interpretations, there is one common idea: the speaker guides the addressee’s inferential process by providing an instruction about how to process the propositions in the first and second clauses. Other examples of such cues are the connectors but, and, therefore, on the other hand and after all.
Importantly, connectors can be omitted when the intended inference is expected or easy to make. For example, Blumenthal-Dramé and Kortmann (Reference Blumenthal-Dramé and Kortmann2017) investigate the use and omission of causal and concessive adverbial connectors therefore and still, as in the following examples:
| a. | Ann didn’t read the essay questions properly and therefore failed the exam last January. |
| b. | Ann didn’t read the essay questions properly and failed the exam last January. |
| c. | Peter studied a lot and still failed the exam last January. |
| d. | Peter studied a lot and failed the exam last January. |
It is argued that there is a general tendency for concessive relations to be marked overtly, as in (14c), while causal relations are more often left implicit, as in (14b). The reason is that concessive relationships are more cognitively complex. As a result, implicit concessivity is more disruptive to discourse processing than implicit causality.
Taking the efficiency perspective, we can say that a causal interpretation is generally more accessible in discourse than a concessive one. This claim is supported by the counts from the Penn Discourse Treebank obtained by Asr and Demberg (Reference Asr and Demberg2012), who also show that causal relations are much more often implicit (62% to 69%, depending on the order of cause and effect) than concessive relations (8% to 19%). Therefore, the omission of a connector signals that the more probable (causal) meaning is intended. In addition, we cannot exclude the possibility that humans have a cognitive bias towards establishing causal links between events, even if these events are not causally related, as in the logical fallacy post hoc ergo propter hoc. If this is true, it makes a causal interpretation more accessible.
2.4.2 Omission of Complementizers and Relativizers
Similar reasoning can be applied to other clause-linking elements, such as complementizers and relativizers. They help the addressee to identify the syntactic and semantic role of elements in discourse. In a language with optional clause-linking elements, the speaker can use them if the function of the clause they introduce is more difficult to identify, and omit them if the function is more accessible. An important role is played by their heads – i.e., nominal phrases and predicates. If they are often followed by a clause, the interpretation is easier to access, which allows the speaker to omit the function word. For instance, as shown by Wasow, Jaeger and Orr (Reference Wasow, Jaeger, Orr, Simon and Wiese2011), the relativizer that in non-subject relative clauses is more likely to be omitted when the nominal phrase is definite (e.g., the colleague I’m replacing) or contains a superlative adjective (e.g., the most interesting subject I’ve ever studied) because such nominal phrases are more commonly followed by a relative clause than indefinite nominal phrases (e.g., a secret that I don’t want to tell anyone).
A similar pattern has been observed for that as a complementizer (Jaeger Reference Jaeger2006, Reference Jaeger2010):
| a. | I think (that) alternatives exist. |
| b. | I’ll show ?(that) alternatives exist. |
The corpus data show that the odds of that are lower when the matrix verb is frequently followed by a complement clause (think, guess, suppose, etc.) and higher with matrix verbs that are rarely followed by a complement clause (e.g., teach, see, show). Thus, the omission of that is more likely in (15a) than in (15b).
This variation has been explained by the Uniform Information Density hypothesis, which predicts that speakers aim to transmit information at a rate that is uniform and close to, but does not exceed, the channel capacity (Jaeger Reference Jaeger2006; Levy and Jaeger Reference Levy, Jaeger, Schlökopf, Platt and Hoffman2007; see also Section 1.3). Adding extra markers in more informative contexts helps to keep the information flow even, avoiding peaks and canyons. Mentioning the complementizer that at the onset of a complement clause distributes the same amount of information over one more word, thereby lowering information density.
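The intuition can be made concrete with a toy surprisal calculation. The conditional probabilities below are invented for illustration (they are not corpus estimates): omitting that after a verb like show concentrates the complement clause’s information into one highly surprising word, whereas inserting that spreads it over two less surprising words, lowering the peak information density.

```python
import math

def surprisal(p):
    """Surprisal in bits: -log2 of a word's conditional probability."""
    return -math.log2(p)

# Hypothetical conditional probabilities, invented for illustration:
# after "I'll show", a bare clause onset ("alternatives") is unexpected,
# while "that" and then "alternatives" are each fairly predictable.
without_that = [0.05]          # P(alternatives | I'll show)
with_that = [0.40, 0.30]       # P(that | I'll show), P(alternatives | ... that)

peak_without = max(map(surprisal, without_that))
peak_with = max(map(surprisal, with_that))
print(f"peak surprisal without 'that': {peak_without:.2f} bits")
print(f"peak surprisal with 'that':    {peak_with:.2f} bits")
```

With these numbers, the surprisal peak drops from about 4.3 bits to about 1.7 bits once that is inserted, which is the sense in which the extra word smooths the information flow.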
As was argued in Section 1.3, the explanation of these effects in terms of the negative correlation between accessibility and effort would be sufficient. The speaker provides additional formal cues to help the addressee to make inferences in those situations when the interpretation is less accessible, and omits them when it is more accessible.
As one more illustration, consider the use or absence of the particle to after help. More information about this alternation is provided in Section 9.3. According to Rohdenburg (Reference Rohdenburg1996), the chances of the to-form increase with linguistic distance (in words) between help and the infinitive. For example, the use of to is more likely in (16b) than in (16a):
| a. | You should help him (to) overcome his fears. |
| b. | You should help this troubled teenager with many complexes and a difficult childhood ?(to) overcome his fears. |
This variation has been explained by the principle of (reduction of) cognitive complexity:
| The principle of cognitive complexity (Rohdenburg Reference Rohdenburg1996: 151): |
| In the case of more or less explicit grammatical options the more explicit one(s) will tend to be favored in cognitively more complex environments. |
Rohdenburg also mentions other formal asymmetries, which, according to him, support this principle. They include inflected and uninflected present-tense forms in non-standard varieties of English (e.g., My mother and father drink/drinks), optional prepositions (e.g., time spent (in) doing something) and prepositional substitutions (e.g., She was prevailed on/upon to write another letter). In addition to linguistic distance, which was discussed above, higher complexity is also attributed to passive constructions.
The effect of linguistic distance in (16) can be explained by the principle of negative correlation between accessibility and costs. As the linguistic distance increases and there are more and more words between the matrix verb and the infinitive, the mental representation of the matrix verb becomes less accessible, which makes it more difficult to identify the infinitival complement as a part of the construction with help. At the same time, the addressee may have less experience of using and processing such constructions in discourse because structures like (16b) are quite rare. Therefore, the speaker is more likely to choose the more costly expression in this case.
2.4.3 Resumptive Pronouns
Another illustration is the use of resumptive pronouns in relative clauses. Keenan and Comrie (Reference Keenan and Comrie1977) found that languages use relative clauses according to the following scale, known as the Accessibility HierarchyFootnote 6:
| Subject > Direct Object > Indirect Object > Oblique > Genitive > Obj. of Comparison |
For example, if a language has genitive relative clauses, e.g., I see an equation, the solution to which is well known, it can also have subject relative clauses, as well as direct object, indirect object and oblique relative clauses, as in the examples below.
| a. | I see the woman who works in the room next to mine (Subject RC) |
| b. | I see the woman I admire (Direct Object RC). |
| c. | I see the woman who I sent my manuscript to (Indirect Object/Oblique RC). |
English has all types of relative clauses, although Object of Comparison RCs can be uncomfortable, e.g., the girl who Sue is taller than.
More directly relevant for the topic of this chapter, however, is another finding by Keenan and Comrie, namely that the same hierarchy constrains the use of resumptive pronouns in relative clauses. Consider an example from Hebrew:
| Hebrew: Afro-Asiatic (Keenan and Comrie Reference Keenan and Comrie1977: 92) | |||||
| ha-isha | she-David | natan | la | et | ha-sefer |
| the-woman | that-David | gave | to-her | obj | the book |
| ‘the woman that David gave the book to’ | |||||
Here, la is a resumptive pronoun in the indirect object position. According to the hierarchy, if a language has resumptive pronouns in the subject position, the pronouns will also be used in all other positions. If a language requires or allows them in the indirect object position, it will also require or allow them for obliques, genitives and objects of comparison.
Keenan (Reference Keenan, Fasold and Shuy1975) provided corpus data from English to demonstrate that the order in the hierarchy correlates with the frequency with which different positions occur. In a sample of more than 2,200 relative clauses, subjects were the most commonly relativized (e.g., the girl who is playing a computer game), and objects of comparison were never relativized. There were only a few examples of relativized genitives (e.g., the gate of which the hinges were rusty).
These findings have not been met uncritically, however. In particular, Fox (Reference Fox1987) argued that instead of Subject at the left end of the scale in (18), one should speak of the arguments P and S (that is, transitive objects and intransitive subjects, respectively). In some ergative languages (e.g., Dyirbal and Mayan languages), ergative subjects (A) are not relativized.Footnote 7 Moreover, object relatives are as frequent as subject relatives in conversational English. Fox explains this finding by the important discourse function played by object relatives: they anchor head noun phrases carrying new information to the preceding discourse, often with the help of given, pronominal subjects in the relative clause, e.g., Have you heard about the party we threw in Las Vegas?
One should also mention here a famous debate about the relative complexity of processing of subject and object relative clauses, as in the examples below (from Levy, Fedorenko and Gibson Reference Levy, Fedorenko and Gibson2013; see also references therein):
| a. | The reporter who attacked the senator hoped for a story. (Subject RC) |
| b. | The reporter who the senator attacked hoped for a story. (Object RC) |
It is received wisdom that object relatives are more difficult to comprehend than subject relatives. Numerous accounts have been given. One relevant factor is the memory load, which increases with the number and length of open syntactic dependencies, in particular, with the number of intervening words between the relative pronoun and the verb (see Section 3.2.1). This is why (21a), where the verb follows immediately after the relative pronoun, is easier to process than (21b).
However, this seems only to hold in artificial sentences with full noun phrases. For example, Reali and Christiansen (Reference Reali and Christiansen2007) demonstrated that object relative clauses can be more easily processed (that is, require shorter reading times) when they begin with a personal pronoun, e.g., The consultant that you called, than similar subject clauses, e.g., The consultant that called you. They were also more frequent than subject relative clauses in a large corpus. Object clauses with personal pronouns are much more natural than ones with nouns (cf. Fox Reference Fox1987), which may explain the different results. Thus, the relative complexity of subject and object clauses strongly depends on the specific linguistic cues and the language users’ experience with them. We process more easily what we are frequently exposed to and what we expect to encounter. See also Diessel (Reference Diessel2019: Section 10.5).
Regardless of whether the Accessibility Hierarchy is correct or not, the use or omission of resumptive pronouns can be explained by the principle of negative correlation between accessibility and costs. Ariel (Reference Ariel1990: Section 7.21) argues that the use and omission of resumptive pronouns in Hebrew is driven by the accessibility of their referents. Resumptive pronouns are omitted when the referent is highly accessible and used when it is less accessible. Accessibility depends on different factors, such as the distance from the head noun. Even Subject RCs, which normally do not allow resumptive pronouns in Hebrew, can contain them if the distance is long. Resumptive pronouns are also more acceptable in non-restrictive relative clauses (e.g., The foreign students, whom the university accepted, are very hard-working) than in restrictive ones (e.g., The foreign students who the university accepted are very hard-working), because the former are less semantically and pragmatically dependent on the main clause than the latter. Non-restrictive relative clauses are also separated from the main clause intonationally (and, at least in English, by punctuation). This may reduce the accessibility of the referents in non-restrictive clauses.
In addition, resumptive pronouns can help to ease the memory load and lower the processing costs (see Hawkins Reference Hawkins2004). All this makes the use and omission of resumptive pronouns relevant for efficient communication.
2.5 Same-Subject and Different-Subject Constructions
According to Cristofaro (Reference Cristofaro2003: 250), if the main clause and the subordinate clause share their participants, the reference to those participants in the subordinate clause is likely to be missing. If the situations expressed by the main and dependent clauses have different participants, the subordinate clause is likely to contain overt participant reference. We can think of overt participant reference in subordinate clauses as a switch-reference device, which signals that the participants are different from those in the main clause, while the absence of participant reference signals that the participants are the same (cf. Ariel Reference Ariel1990: Section 7.1). All this means that highly accessible participants obtain less coding than less accessible participants. Frequently, additional coding material is also used to facilitate the less accessible interpretation.
For example, the subject of the verb want is usually the same as the subject of the complement it takes (Haspelmath Reference Haspelmath2013b). That is, the meaning ‘X wants to do Y’ with the same subject is more frequent than the meaning ‘X wants Z to do Y’ with different subjects. When the subject is the same, in most languages it is not mentioned again, as in (22a) from German. If the subjects are different, both of them are mentioned. Moreover, additional coding is often used, such as complementizers and finite verb morphemes, as in (22b).
| German (own knowledge) | ||||||
| a. | Ich | will | zuhause | bleib-en. | ||
| I | want | at.home | stay-inf. | |||
| ‘I want to stay at home.’ | ||||||
| b. | Ich | will, | dass | du | zuhause | bleib-st. |
| I | want | that | you | at.home | stay-2sg.pres | |
| ‘I want you to stay at home.’ | ||||||
In some languages (e.g., Samoan and Korean), a longer verb form is used for the different-subject want. A few languages have the same construction for the same-subject and different-subject meanings, so no coding asymmetry is observed (e.g., Modern Greek). Most importantly, however, the cross-linguistic sample in Haspelmath (Reference Haspelmath2013b) contains no languages in which the same-subject want would be expressed by a longer construction than the different-subject want.
Another example is intend (Comrie Reference Comrie1986). Intentions usually involve our own future actions, as in (23a), where an infinitival clause is used. But if we speak about intentions with regard to someone else’s actions, a finite clause is required, as in (23b).
| a. | Sue intends to stay at home. |
| b. | Sue intends that Joe should stay at home. |
But this is not the whole story. We can find some ‘local markedness’ examples again. If the verb in the main clause has two human arguments, and one of them appears in the subordinate clause, the use of the short and long forms depends on the lexical semantics of the verb. Take the verb promise. We usually promise someone to do something ourselves, because we can control our own actions more easily than other people’s. This is why (24a) is shorter than (24b).
| a. | Sue promised Joe to stay at home. |
| b. | Sue promised Joe that he would stay at home. |
Now consider the verb persuade. When we persuade someone, we expect that they will perform some action. In English, this is expressed by an object-control construction with an infinitival clause, as in (25a). But if the agent of the action is the person who persuades, as in (25b), then a finite clause is used.
| a. | Sue persuades Joe to stay at home. |
| b. | Sue persuades Joe that she should stay at home. |
This formal length asymmetry is efficient because the more accessible interpretation is conveyed by a shorter form than the less accessible one. Although in general the principle observed by Cristofaro (Reference Cristofaro2003) is true, the examples with promise and persuade show that languages can have local formal asymmetries which depend on the expectations triggered by a specific verb in the main clause.
2.6 Zipf’s Law of Abbreviation
This section addresses one of the most famous manifestations of language efficiency, namely, the fact that more frequent words tend to be shorter than less frequent ones. This correlation is known as Zipf’s Law of Abbreviation (Reference Zipf1965 [1935]). Bentz and Ferrer-i-Cancho (Reference Bentz, Ferrer-i-Cancho, Bentz, Jäger and Yanovich2016) have tested the law on 986 languages from 80 families, using massively parallel corpora of Bible translations. They found a negative correlation between word length in characters and word frequency for all languages. The Law of Abbreviation is thus an absolute language universal, although it is statistical in each separate language because the correlation is not perfect.
According to Zipf (Reference Zipf1965 [1935]), this correlation is explained by the general pressure to save time and effort. The linguistic mechanisms responsible for this correlation include truncations, e.g., gas instead of gasoline. There is a lot of evidence for this strategy, e.g., app for application, or German Auto for Automobil.
We should also mention here formal erosion. This often happens as a result of grammaticalization (e.g., Lehmann Reference Lehmann2015: Section 4.2.1), for example when full verbs become auxiliaries (Old English willan ‘want’ > will and ’ll), full pronouns become clitics (e.g., them > ’em) or bound person markers, and because becomes ’cause or coz. A more detailed discussion of the diachronic mechanisms that lead to formal reduction is provided in Chapter 5.
The second strategy, according to Zipf, is to use permanent or temporary lexical substitutions. Temporary substitutions are anaphoric pronouns, which were discussed in Section 2.2. Examples of permanent substitutions are car, used instead of automobile, or, in more specialized domains, juice for electricity and soup for nitroglycerine (at least in Zipf’s time).
There have also been some sceptical opinions about the interpretation of Zipf’s Law of Abbreviation in terms of the efficient organization of language. Miller (Reference Miller1957) noted that a correlation between word length and word frequency is also observed if someone randomly types characters on a keyboard with letters and a space character. A randomly typing monkey would produce a sequence of meaningless strings of characters, whereby shorter strings would appear more frequently than longer ones. At the same time, Howes (Reference Howes1968) argued that the assumptions of Miller’s model are not applicable to natural language. Obviously, we do not form words from randomly reshuffled letters to express random meanings. More recently, Ferrer-i-Cancho, Bentz and Seguin (Reference Ferrer-i-Cancho, Bentz and Seguin2020) showed that Miller’s random typing itself represents an optimal encoding system from the perspective of standard information theory, which means that it is not surprising that its output resembles Zipf’s findings. Moreover, there are multiple indications that efficient formal reduction is an important type of language change. Section 1.1, for example, discussed the shortened forms for ‘coronavirus’. It is impossible to see this and numerous other examples (see Chapter 5) as the result of random processes.
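Miller’s observation is easy to reproduce. The sketch below (illustrative only; the alphabet size and text length are arbitrary choices) simulates a ‘monkey’ typing uniformly over ten letters and a space, splits the output into ‘words’, and shows that shorter word types are on average far more frequent than longer ones, a Zipf-like length–frequency correlation arising with no meaning involved at all.

```python
import random
from collections import Counter

random.seed(42)
letters = "abcdefghij"                         # 10 letters; alphabet size is arbitrary
typed = "".join(random.choices(letters + " ", k=200_000))
words = typed.split()                          # maximal runs of letters count as 'words'

freq = Counter(words)
by_length = {}                                 # word length -> list of type frequencies
for w, f in freq.items():
    by_length.setdefault(len(w), []).append(f)

# Shorter 'words' have far higher mean frequency, despite being meaningless
for length in sorted(by_length)[:4]:
    fs = by_length[length]
    print(f"length {length}: mean type frequency {sum(fs) / len(fs):.1f}")
```

The reason, as Ferrer-i-Cancho and colleagues note, is combinatorial: there are few possible short strings and many possible long ones, so frequency mass concentrates on the short types.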
Word length correlates not only with frequency but also with how predictable a word is from its context. In an experimental study, Manin (Reference Manin2006) showed that word length is correlated with the average probability of guessing the word in context. Informativity can also be inferred from very large corpora. Using n-grams from several Germanic, Romance and Slavic languages, Piantadosi, Tily and Gibson (Reference Piantadosi, Tily and Gibson2011) found that the average informativity, i.e., the negative logarithm of the conditional probability of a word given its previous context (1 to 3 words on the left), is even more strongly correlated with word length than simple frequency is. These findings were complemented and extended by Mahowald et al. (Reference Mahowald, Fedorenko, Piantadosi and Gibson2013), who examined such pairs as exam – examination, chimp – chimpanzee and math(s) – mathematics. Their corpus-based analysis demonstrates that the shorter forms had on average lower informativity given their left context. An experiment with forced-choice sentence completion also revealed that the shorter forms are preferred in more predictive contexts.
These conclusions, however, have recently been challenged by Meylan and Griffiths (Reference Meylan and Griffiths2021), who showed that the dominance of informativity is no longer observed when one encodes strings in UTF-8, which is better suited than ASCII to languages other than English, and excludes words that are not found in the dictionaries of the specific languages. Moreover, one may wonder whether the results would hold if more diverse languages were taken into account.
In order to answer this question, I investigated corpus data from nine languages: Arabic, Czech, English, Finnish, German, Hindi, Hungarian, Indonesian and Russian. The data are online news corpora with 30 million tokens per language, taken from the Leipzig Corpora Collection (Goldhahn, Eckart and Quasthoff Reference Goldhahn, Eckart, Quasthoff, Calzolari, Choukri and Declerck2012). The length of words was measured in UTF-8 characters. For each language, 4,000 wordforms (only alphabetic characters) with a frequency greater than 20 were selected randomly for analysis. This frequency cut-off was used in order to avoid typos and other spurious hits. Frequency was represented by self-information: the frequency of a word was divided by the corpus size, and the negative logarithm of the result was taken. The higher the frequency of a word, the lower its self-information value. Informativity represents the average conditional probability of a word given one previous word, also negatively log-transformed. The more predictable a word is, on average, from the preceding word, the lower its contextual informativity value.
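The two measures can be sketched as follows. This is a simplified re-implementation of the procedure just described, run on a toy token list rather than the Leipzig corpora; the mini-corpus and variable names are invented for illustration.

```python
import math
from collections import Counter, defaultdict

# Toy token list standing in for a 30M-token news corpus (illustrative only)
tokens = ("the cat sat on the mat the cat ate the fish "
          "a dog sat on a mat").split()
n = len(tokens)
unigram = Counter(tokens)
bigram = Counter(zip(tokens, tokens[1:]))

# Self-information: negative log of a word's relative frequency
self_info = {w: -math.log2(c / n) for w, c in unigram.items()}

# Contextual informativity: the average, over a word's occurrences,
# of -log2 P(word | previous word)
surprisals = defaultdict(list)
for (prev, w), c in bigram.items():
    surprisals[w].extend([-math.log2(c / unigram[prev])] * c)
informativity = {w: sum(s) / len(s) for w, s in surprisals.items()}

print(f"'the': self-information {self_info['the']:.2f} bits, "
      f"informativity {informativity['the']:.2f} bits")
```

In this toy corpus the is both frequent (low self-information) and quite predictable from its left neighbours (even lower contextual informativity); on real data, as the chapter shows, which of the two measures correlates better with word length varies across languages.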
Next, Spearman’s rank correlation coefficients were computed for each language (a) between word length and self-information, and (b) between word length and contextual informativity. The results are shown in Figure 2.1. Partial correlations were also computed, such that the correlations between length and self-information were controlled for contextual informativity, and the correlations between length and contextual informativity were controlled for self-information. The partial correlations are represented by symbols (dots and triangles) on the same plot.

Figure 2.1 Spearman’s rank correlation coefficients between word length and self-information, and between word length and contextual informativity. The dots and triangles stand for partial correlations.
The plot shows that in most of the languages, contextual informativity is indeed more strongly correlated with word length, following Piantadosi et al. (Reference Piantadosi, Tily and Gibson2011). The dominance of informativity is particularly striking in highly analytic languages: Indonesian and English, especially if we look at the partial correlations. However, in Finnish and Hungarian, which are highly synthetic, the opposite is the case. Self-information based on simple frequency is more strongly correlated with word length than contextual informativity is. Note that, unlike in Meylan and Griffiths (Reference Meylan and Griffiths2021), words absent from dictionaries were not excluded; however, a follow-up study based on cleaned data reveals divergent correlations between informativity measures and length across languages, whereas the Zipfian correlation between frequency and length remains consistent (Levshina Reference Levshina2022b).
How can we interpret these findings? If we look at the distributions of word and bigram frequencies, we see that Finnish and Hungarian have the highest numbers of hapax legomena (that is, units that occur only once) among both tokens and bigrams. This is not surprising: because of their rich morphology, Finnish and Hungarian have very many distinct forms of content words, and grammatical relationships are expressed by word-internal grams rather than by function words. Individual wordforms are therefore more difficult to predict from neighbouring content wordforms, which are themselves rare. This means that the measures of contextual surprisal can be less reliable in these languages. Yet, even if we remove the hapax legomena when computing the surprisal (or, alternatively, all context words with a frequency of less than 5), the results change very little, which suggests that they are not an artefact of data sparseness. Rather, relatively infrequent neighbours are simply less reliable as cues for infrequent wordforms. Another reason is word order: in languages with rich morphology, word order tends to be less rigid and therefore less predictive of the next word than in languages with poorer morphology (see Section 6.3). This makes neighbouring tokens less reliable predictors of target words. Moreover, individual constructions also play a role. For example, some postpositions in these languages can be quite long and at the same time highly predictable from the previous word with a specific case form, e.g., Hungarian keresztül ‘through, across’, érdekében ‘for the benefit of’, kapcsolatban ‘in connection with’, kapcsolatos ‘in relation to’ and köszönhetően ‘due to, thanks to’.
Thus, there is no clear evidence that either frequency or informativity is more strongly correlated with length. One of the reasons is that informativity as a psychological construct representing the accessibility of a word for a language user is very difficult to estimate from corpora.Footnote 8 Moreover, different strings of characters have different degrees of wordhood, and the results will depend on orthographic conventions. Despite the debate about which measure is the most appropriate one for measuring the accessibility of words, the correlations reported above can be regarded as evidence for communicative efficiency.
2.7 Phonetic Reduction and Enhancement
Speakers tend to reduce articulation effort while at the same time producing a signal which shows sufficient acoustic distinctiveness for the addressee to correctly identify the linguistic content of the message (Lindblom Reference Lindblom, Hardcastle and Marchal1990). There is ample evidence in the literature that more accessible linguistic units (words, syllables and individual sounds) undergo reduction more frequently than less accessible ones. Bolinger (Reference Bolinger1963) observed that words are durationally shorter when they occur more frequently on their own or in combination with other words. For example, the relatively new word robot is pronounced with a longer duration than the more familiar rowboat, whereas verbs can be pronounced shorter when followed by their more typical complements or adjuncts.
The measures of accessibility that determine the degree of phonological reduction can be of different kinds. One of them is the context-free frequency of a given unit in discourse. Another is the conditional probability of the unit given its left or right context, e.g., the n words to the left or right of the target word. Frequency can be measured across different texts or only in the previous discourse. Similarly, conditional probability can be measured in the specific context where the unit of interest is used, or it can be averaged across all contexts where the unit occurs (see Section 2.6 for an illustration). In studies inspired by information theory, the probabilities are usually logarithmically transformed and negated, such that the resulting number represents the informativity (surprisal) of the unit in bits (or nats, depending on the base of the logarithm). Higher probability means lower informativity, and vice versa. Pointwise Mutual Information, which reflects how much more information is obtained about a word upon seeing its neighbour, and the other way round, has also been shown to be relevant for different types of reduction in language production (e.g., Gregory et al. Reference Gregory, Raymond, Bell, Fosler-Lussier and Jurafsky1999).
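These measures are easy to illustrate with a minimal sketch over a toy corpus (the corpus and all resulting values are invented for exposition, not taken from the studies cited). The code computes a word's frequency-based self-information, its surprisal given the preceding word, and the pointwise mutual information of a bigram, all in bits.

```python
import math
from collections import Counter

tokens = ("i do n't know i do n't think "
          "i know you know i think you do n't know").split()

unigrams = Counter(tokens)
bigrams = Counter(zip(tokens, tokens[1:]))
# How often each word occurs as the FIRST element of a bigram
# (needed as the denominator of the conditional probability).
first_counts = Counter(w for w, _ in zip(tokens, tokens[1:]))
n_uni, n_bi = len(tokens), len(tokens) - 1

def self_information(w):
    """Context-free informativity: -log2 p(w)."""
    return -math.log2(unigrams[w] / n_uni)

def surprisal(prev, w):
    """Contextual informativity given the previous word: -log2 p(w | prev)."""
    return -math.log2(bigrams[(prev, w)] / first_counts[prev])

def pmi(w1, w2):
    """Pointwise mutual information: log2 [ p(w1, w2) / (p(w1) p(w2)) ]."""
    p_joint = bigrams[(w1, w2)] / n_bi
    return math.log2(p_joint / ((unigrams[w1] / n_uni) * (unigrams[w2] / n_uni)))

print("self-information of 'know':", round(self_information("know"), 2), "bits")
print("surprisal of 'know' after n't:", round(surprisal("n't", "know"), 2), "bits")
print("PMI of 'do' + n't:", round(pmi("do", "n't"), 2), "bits")
```

Note that in this toy corpus the surprisal of *know* after *n't* is far lower than its context-free self-information: the word is much more accessible in that context, which is exactly the configuration in which reduction is predicted.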
Bell et al. (Reference Bell, Brenier, Gregory, Girand and Jurafsky2009) studied the relationships between pronounced durations of words in a spoken corpus and several factors: frequency, conditional probability and repetition. They looked separately at content and function words. Both in content and in function words, there was a significant effect of different types of conditional probability – given the previous context or the next context. Moreover, word frequency and repetition led to reduction of content words. Similarly, Fowler and Housum (Reference Fowler and Housum1987) found effects of repetition on the duration of content words in a narration.
Phonetic reduction can manifest itself not only in formal shortening but also in the loss of phonetic detail. For instance, Aylett and Turk (Reference Aylett and Turk2004) report that highly predictable phrase-medial syllables are shorter than less predictable ones. At the same time, there is a loss of articulatory detail. In particular, vowels undergo centralization of their first and second formant frequency values. As a result, the vowel space is reduced (Aylett and Turk Reference Aylett and Turk2006).
Both context-specific and average predictability play a role in reducing the acoustic duration of a content word, many other factors being controlled for (Seyfarth Reference Seyfarth2014). Since average predictability is a property of the word itself rather than of any particular context, this suggests that formal reduction is to some extent stored in the lexicon. Similar results are obtained by Cohen Priva (Reference Cohen Priva, Abner and Bishop2008), who finds that oral and nasal stop deletion in English is influenced by the phones’ average informativity. This demonstrates again how the use of a unit in particular contexts percolates into language structure.
Pierrehumbert (Reference Pierrehumbert, Bybee and Hopper2001) proposes an exemplar-based model in order to explain why high-frequency words undergo reduction faster than low-frequency words. For example, the middle schwa is deleted before /r/ and /n/ in high-frequency words, such as evening and every, but is retained in rare words, such as mammary and artillery (Hooper Reference Hooper and Christie1976; see also Fenk-Oczlon Reference Fenk-Oczlon, Bybee and Hopper2001). According to Pierrehumbert, this difference can be explained by the systematic production bias towards lenition (Lindblom Reference Lindblom and MacNeilage1984), or ‘undershooting’ the phonetic target to the extent that it does not disrupt understanding. Since high-frequency words are used more often than low-frequency words, their stored exemplar representations are more affected by this persistent bias. This explains why high-frequency words are more reduced than low-frequency words synchronically and why the former undergo this reduction faster than the latter in diachrony. It does not seem very plausible, though, that there is a certain constant rate of lenition that is applied to every use of a word or sound in every context. Frequent words are also highly accessible on their own and across individual contexts, which is why they can be reduced in the first place.
Speakers also enhance linguistic forms under some circumstances, e.g., when they believe that the addressee may need help to disambiguate between two similarly sounding words. This has been shown in studies of hyperarticulation. For example, when the hearer has to choose between two similarly sounding words, e.g., dose – doze, the speaker tends to increase the voicing of the final consonant in doze more often than in situations when such ambiguity is not present (Seyfarth, Buz and Jaeger Reference Seyfarth, Buz and Florian Jaeger2016). Speakers also hyperarticulate when their communication partners misunderstand instructions (Stent, Huffman and Brennan Reference Stent, Huffman and Brennan2008). Hyperarticulation is observed immediately after the speaker finds out that they were misunderstood, and then decays gradually over several turns in the absence of further misrecognitions.
The explanation of these effects has been a matter of controversy. First, they can be explained by audience design (Bell Reference Bell1984): language users proactively adjust their message in order to increase their communicative success while at the same time reducing their effort whenever they can.
But this is not the only explanation that can be found in the literature. A popular view in usage-based linguistics involves the phenomenon of chunking. According to Bybee, for example, each instance of use further automates and increases the fluency of a sequence of words, leading to their fusion (Bybee Reference Bybee2007: 324; see also Section 5.4.3). A frequently repeated stretch of speech becomes automated as a processing unit due to neuromotor routines. Further repetition leads to reduction and overlapping of articulatory gestures. All this shortens the duration. For instance, Bybee and Scheibman (Reference Bybee and Scheibman1999) found that reduction of the vowel and the consonants in don’t in spoken English is particularly frequent after the pronoun I and before the verbs know and think because this contraction occurs particularly frequently in the phrases I don’t know and I don’t think. The process of automatization is not restricted to language alone and is largely unconscious.
If the automatization account were the only correct one, then the joint probability of neighbouring units (i.e., the frequency with which these units occur together, divided by the total frequency of all sequences) would be the only important factor in predicting formal reduction. However, empirical evidence reveals that conditional probability is more important than joint probability in that regard. In particular, Bell et al. (Reference Bell, Jurafsky, Fosler-Lussier, Girand and Gildea2003) investigated the effects of conditional probabilities and joint probabilities on the duration and phonetic reduction of function words in spoken English. They found that the conditional probabilities have either the strongest or the only significant effect in the predicted direction (i.e., more predictable target words are more frequently reduced than less predictable ones). Joint probabilities, which essentially represent the frequencies of possible chunks and hence their degree of routinization, sometimes have an effect in the opposite direction. Similarly, Barth (Reference Barth2019) shows that the reduction of be and have in highly grammaticalized contexts is due to the high conditional probabilities rather than the joint probabilities of these words with their neighbours (most importantly, the words that follow be and have). This can be regarded as evidence that accessibility due to high contextual predictability is more important than the process of chunking, at least in these cases of formal reduction.
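The difference between the two measures can be sketched numerically. The counts below are hypothetical, invented for illustration (they are not Bell et al.'s data): a word pair can be rare as a chunk (low joint probability) and yet have a near-obligatory second element (high conditional probability), and vice versa.

```python
# Hypothetical bigram counts for two word pairs in an imaginary corpus.
N = 1_000_000  # total number of bigram tokens

pairs = {
    # (first word, second word): (count of the pair, count of the first word)
    ("according", "to"): (200, 210),       # rare chunk, highly predictable continuation
    ("of", "the"):       (9_000, 30_000),  # frequent chunk, many possible continuations
}

for (w1, w2), (pair_count, w1_count) in pairs.items():
    joint = pair_count / N               # p(w1, w2): routinization of the chunk
    conditional = pair_count / w1_count  # p(w2 | w1): predictability of the second word
    print(f"{w1} {w2}: joint = {joint:.5f}, conditional = {conditional:.2f}")
```

Under a pure chunking account, reduction should track the joint probability, so *of the* should reduce more; the finding that the conditional probability matters more favours the accessibility-based account.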
Another popular explanation is that the speaker buys time for planning by using a longer expression. As shown by Bell et al. (Reference Bell, Jurafsky, Fosler-Lussier, Girand and Gildea2003), planning problems, which manifest themselves as disfluencies either preceding or following a function word, increase the chances of longer or fuller variants of words in language production. Planning issues were also one of the explanations offered by Szmrecsanyi (Reference Szmrecsanyi2003) to account for the preference for the construction be going to (in comparison with will/shall) in syntactically complex environments, which are more demanding in terms of processing resources (see Section 4.3).
While planning issues may well play a role, they fail to explain many instances of reduction and enhancement. For example, Jaeger and Buz (Reference Jaeger, Buz, Fernández and Cairns2017) argue that the link between the contextual predictability of a linguistic form and its own realization is not very clear if one accepts the ‘buying-time’ explanation. There is also evidence that backward transitional probabilities (i.e., those that predict the target unit given the following context) play a role that is at least as important as the role of forward transitional probabilities (i.e., the ones that predict the target unit from the preceding context), if not more important (Seyfarth Reference Seyfarth2014; Barth Reference Barth2019). Moreover, speakers adapt subsequent productions towards less reduced variants if previous use of more reduced variants resulted in communicative failure (Stent et al. Reference Stent, Huffman and Brennan2008; Buz, Tanenhaus and Jaeger Reference Buz, Tanenhaus and Jaeger2016). As Jaeger and Buz (Reference Jaeger, Buz, Fernández and Cairns2017) argue, this is incompatible with the idea that the degree of reduction depends solely on production ease.
One cannot exclude the possibility that routinization, ‘stalling for time’ and other production-related and speaker-centred explanations are relevant in some situations (cf. Ernestus Reference Ernestus2014). I argue that the effect of production factors should be ultimately constrained by the communicative need of the speaker to get the message across, although some of the lower-level reduction or enhancement processes can be caused by cognitive processes unrelated to the addressee’s needs (cf. Lindblom Reference Lindblom, Hardcastle and Marchal1990). This constraint becomes obvious if we listen to live announcers (as opposed to pre-recorded announcements) at a railway station. When the speaker announces that the platform number has been changed, the number will be highly accessible to him or her. However, the numeral representing the platform number is unlikely to be reduced because this information is highly important and not accessible to the travellers who need to catch the train.Footnote 9 Notably, numbers tend to be very stable phonologically across languages (Diessel Reference Diessel2019). We can think of at least two reasons for this. First, confusion can be costly in many linguistic and extralinguistic ways. Second, numbers are often used in similar contexts (e.g., X costs two/five/ten/… euros), which makes them on average less predictable from context. We need more research in order to obtain a conclusive answer and to disentangle these competing motivations and explanations.
A final word of warning is in order against a potential misunderstanding: an account based on audience design does not predict only context-specific accessibility effects. There is no conflict between this account and the evidence of entrenchment effects, which can last for a while or even become conventionalized. For example, the voice-onset time of words with initial voiceless stops that have minimal pairs, e.g., cod – god, is greater than that of words without such a pair, e.g., cop – *gop. Baese-Berk and Goldrick (Reference Baese-Berk and Goldrick2009) found that this difference is observed even if the minimal pair is not present in the context (i.e., there is no need for disambiguation). They conclude that this effect is not driven by what they call ‘listener modelling’. We know from Cohen Priva (Reference Cohen Priva, Abner and Bishop2008) and Seyfarth (Reference Seyfarth2014), mentioned above, as well as from other studies, that units that frequently occur in reducing contexts also become more reduced in general, i.e., usage percolates into the system. Therefore, units that are frequently hyperarticulated or reduced in some contexts may become hyperarticulated or reduced across the board. This may lead to short-term or long-term effects. In the study mentioned above, Stent et al. (Reference Stent, Huffman and Brennan2008) show that hyperarticulation is a targeted and flexible adaptation to a specific situation, which decays with time. At the same time, reduced or enhanced forms can become entrenched and conventionalized in conjunction with specific communicative situations. As a result, whole special registers can emerge, e.g., child-directed speech, foreigner-directed speech, etc. (Jaeger and Buz Reference Jaeger, Buz, Fernández and Cairns2017).
As in the previous examples of efficient formal asymmetries, we can observe different kinds of efficiency, from context-sensitive language use, where audience design is probably the strongest and most precise, to conventionalized patterns, which are coarser, but do not require much thinking and produce the desired cognitive effects most of the time.
2.8 Conclusions
We have seen many different manifestations of efficiency as a descriptive phenomenon in all domains of language – lexicon, phonology, morphosyntax and discourse. Some of them lend themselves easily to the efficiency explanation, while others also have alternative accounts. Chapter 5 will discuss these and other explanations in greater detail.
Formal length is related to processing costs. Longer expressions can be used to make processing easier for the addressee. For example, the use of resumptive pronouns (see Section 2.4.3) in some types of relative clauses can help the addressee to process the sentence. This does not automatically mean, however, that shorter expressions mean more processing effort for the addressee, and longer expressions mean less processing effort. First of all, as we saw in Section 1.3, overly informative expressions create problems for comprehension. Second, the use of short and ambiguous expressions does not result in processing difficulties, provided that there is enough relevant context. See more on this topic in Section 6.2.1.
3.1 Efficient Order
The order of linguistic units is another important source of cost minimization. Efficient word order has received a lot of attention in the literature, in particular in the typological work by John Hawkins (e.g., Reference Hawkins2004, Reference Hawkins2014) and in numerous experimental and corpus-based studies, which are discussed below. In addition to word order, I will also discuss efficient order of bound morphemes.
Many theories, especially in psycholinguistics, argue that some word orders are more costly than others. The costs discussed in the literature are usually related to processing effort, especially to memory load. Many accounts give an advantage to word orders that allow for the most efficient use of time, in particular when accessible words and constituents are produced first. I will argue here that different ways of minimizing processing costs can be interpreted as maximization of accessibility, according to the principle discussed in Section 1.4.3.
First, I will discuss which factors can, according to different researchers, make word order more or less costly, based on existing evidence (Section 3.2). Next, I will provide well-known examples of efficiency observed across languages (Section 3.3). Finally, Section 3.4 will discuss the costs and benefits of violating word order conventions, using word order produced by Yoda in Star Wars as an example.
3.2 Factors Determining Efficiency of Order
3.2.1 Minimization of Memory and Surprisal Costs
It is uncontroversial that memory plays a crucial role in determining the costs of syntactic processing. As early as 1960, Yngve (Reference Yngve1960) proposed a way of measuring processing complexity by counting the open dependencies that need to be kept in working memory. He postulated that memory capacity limits, such as Miller’s ‘seven plus or minus two’, determine the maximum depth (that is, the number of open dependencies) of a structure that can be processed. Yngve expected these limits to shape the grammars of human languages.
Memory costs were discussed in great detail in dependency locality theory (Gibson Reference Gibson1998, Reference Gibson, Marantz, Miyashita and O’Neil2000). According to this theory, the costs arise due to two tasks. The first one is related to storage of the structures built so far, as well as predictions about the following element until it appears. The second one has to do with integration of the new material into the structure. Integration requires reactivation of the word in the previous context that has a dependency relationship with the current word. The activation of the word decays as more and more words are added between the previously mentioned word and the current word, so more effort is needed to reactivate the former. Therefore, syntactic predictions held in memory over long distances are costly, which matters both for production and comprehension.
Memory costs must not be too high if a sentence is to be processable. For example, sentences with double centre-embedded clauses, as in (1a), are problematic because there is a state during their parse that exceeds the available memory resources (Gibson Reference Gibson1998: 16).
(1)
a. The administrator who the intern who the nurse supervised had bothered lost the medical reports.
b. The nurse supervised the intern who had bothered the administrator who lost the medical reports.
According to Gibson, this state occurs at the noun nurse. There are too many predictions that the processor needs to keep in mind at this point: predictions about the empty category positions of the first who and the second who, as well as predictions of the verbs in both relative clauses.Footnote 1 Avoiding centre-embedded clauses, as in (1b), helps to avoid the breakdown.Footnote 2
As for integration costs, they are highest at the second lexical verb bothered. This is also the point where reading times are predicted to be the longest. Here, the processor has to perform two particularly long and costly integrations. The first one is to assign a thematic role from bothered to the intern. The second is to link the empty argument of bothered to the first instance of who.
By minimizing dependency distances, language users minimize memory costs. This is called the Principle of Dependency Locality. The processing costs are also lower when the speaker minimizes the domains necessary for the recognition of constituents (Hawkins Reference Hawkins2004). See more on this in Section 3.3.1.
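As a toy illustration of the principle (my own sketch, not Gibson's or Hawkins's exact cost metric), we can compute the summed dependency length of alternative orders: each word is paired with the position of its syntactic head, and the order with the smaller total keeps dependencies more local.

```python
def total_dependency_length(heads):
    """Sum of |dependent position - head position| over all dependencies.

    `heads` maps each word's position to the position of its head;
    the root is marked with None and contributes nothing.
    """
    return sum(abs(i - h) for i, h in enumerate(heads) if h is not None)

# 'John threw out the trash' -- particle adjacent to the verb.
# positions: John=0 threw=1 out=2 the=3 trash=4
# heads:     John->threw, out->threw, the->trash, trash->threw
adjacent = [1, None, 1, 4, 1]

# 'John threw the trash out' -- particle separated from the verb.
# positions: John=0 threw=1 the=2 trash=3 out=4
shifted = [1, None, 3, 1, 1]

print(total_dependency_length(adjacent))  # the more local order
print(total_dependency_length(shifted))
```

With a longer object (e.g., *the trash that had been piling up*), the gap between the two totals grows, which is one way of seeing why speakers tend to place heavy constituents at the end.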
These considerations can also be explained in terms of maximization of accessibility. Long dependencies decrease the accessibility of preceding words because their memory traces fade with time. More effort is needed to reactivate them.
Dependency locality can be seen as a special case of a more general principle, which is called information locality. According to Futrell and Levy (Reference Futrell, Levy, Lapata, Blunsom and Koller2017), processing is difficult when any elements with high mutual information (that is, which are strongly associated, based on the previous linguistic experience) are far from one another, not only members of syntactic dependencies. Efficient order then means that strongly associated words are placed close to each other.
This approach unifies two seemingly unrelated processing costs: the memory-based costs associated with dependency distances, and the expectation-based costs associated with high surprisal. Surprisal (that is, unexpectedness) of a word given its context is a contributor to processing costs. Numerous studies have shown that surprisal is a good predictor of online processing difficulty (see Section 1.2.2). By minimizing distances between semantically and syntactically related words, language users not only minimize storage and integration costs, but also minimize surprisal, because neighbouring words become more predictable. As the distance between two related words increases, the preceding word becomes a less effective cue for predicting the other word. As a result, the latter becomes more surprising, which creates processing costs. Since high surprisal means low accessibility, we can say that the principle of information locality (including dependency locality) reflects the principle of maximization of accessibility.
Locality effects can interact with other factors. In particular, the processing costs associated with increasing dependency distances can be modulated by context that helps to predict the upcoming word (Konieczny Reference Konieczny2000; Vasishth and Lewis Reference Vasishth and Lewis2006). For example, the sentence in (2b) is more plausible than (2a) because the information about cutting onions makes the verb cried more expected than in (2a). This makes the verb easier to interpret (Grodner and Gibson Reference Grodner and Gibson2005).
(2)
a. The fisherman cried.
b. The fisherman who was cutting onions cried.
The discourse status of noun phrases inside long dependencies also plays a role. Warren and Gibson (Reference Warren and Gibson2002) found that reading times at crucial verbs were faster when the referents introduced by intervening nouns were discourse-given and therefore easily accessible. In the example below, sentences with the 1st-person pronoun (we) were processed the most easily, as can be measured by the reading times on the main verb advised together with the following word. Sentences with a famous person’s name (Elon Musk) were more costly, followed by sentences with definite descriptions (the chairman). Finally, sentences with an indefinite description (a chairman) required the longest reading times in the crucial region.
(3) The consultant who we/Elon Musk/the chairman/a chairman called advised wealthy companies.
According to Warren and Gibson (Reference Warren and Gibson2002), given and accessible referents, which are easier to integrate in discourse, also make the syntactic integration of the verb and arguments easier – probably, the resources used in processing of syntactic arguments and integration of discourse referents are not independent.
Locality effects interact with articulation effort. If a sentence has structures with low accessibility due to long dependency distances, for example, the processing costs can be mitigated by using longer forms. This can explain the cognitive complexity hypothesis by Rohdenburg (Reference Rohdenburg1996) discussed in Section 2.4.2, which explains the tendency to use function words when the memory of syntactically related words decays. The example provided there was the use of the particle to in the help + (to) infinitive construction, in situations where there are many intervening words between help and the infinitive. Low accessibility triggers the use of more costly expressions, while high accessibility allows the speaker to use less costly expressions. Another option is to use word order that helps to minimize dependency distances or syntactic domains. This strategy is discussed in Section 3.3.1.
3.2.2 Producing Accessible Elements First
Another criterion of processing ease is directly related to the principle of maximization of accessibility. When more accessible units are produced first, and less accessible ones are produced later, this helps to save processing effort and time. As already discussed in the previous chapter in relation to the expression of referents (Section 2.2.1), accessibility is determined by multiple factors: previous mentioning of the referent and the lexeme, recency in discourse, topicality, predictability from context, and others. There is substantial evidence that more accessible concepts are produced first, if this is allowed by the grammar (Bock and Irwin Reference Bock and Irwin1980; Bock and Warren Reference Bock and Warren1985). In particular, language users place given before new (e.g., Bock and Irwin Reference Bock and Irwin1980 for English and Ferreira and Yoshita Reference Ferreira and Yoshita2003 for Japanese) and animate before inanimate (Tanaka et al. Reference Tanaka, Branigan, McLean and Pickering2011 for Japanese).Footnote 3 Consider (4a) and (4b). Both SO and OS orders are possible in Dutch. Which one will be preferred depends on the relative accessibility of the referents expressed by the arguments. Under normal circumstances, (4a) will be preferred because the referent expressed by the personal pronoun zij is animate and given and therefore more accessible than the referent expressed by the indefinite noun appel.
(4) Dutch (personal knowledge)
    a. Zij  heeft een appel gegeten.
       she  has   an  apple eaten
       ‘She has eaten an apple.’
    b. Een appel heeft zij gegeten.
       an  apple has   she eaten
       ‘She has eaten an apple.’
Why does this help to save costs? First of all, we need to go beyond the boundaries of a sentence, which has been the traditional unit of analysis in many psycholinguistic theories. Referents, events and other pieces of information in discourse are connected by cohesion relationships, which can be seen as a kind of dependency. By mentioning a referent early, we decrease the memory costs required for integration of this referent. We can also save articulation costs because the referent will be more accessible, and therefore a less costly form will be used (see Section 2.2).
Moreover, putting accessible information first helps to save time. According to Levelt’s (Reference Levelt1989) model, language production consists of several stages: Conceptualization (determining the contents of the message), Formulation (building the necessary grammatical and phonological structures) and Articulation (uttering the phonetic representations). Importantly, sentence generation is incremental and can run in parallel, both between the stages and within the stages (De Smedt Reference De Smedt, Adriaens and Hahn1994). Because of the competition between different conceptual content at the Formulation level, the segments that are formulated faster can be sent to Articulation sooner. Heavy components, which usually have low accessibility, are also more time-consuming for Formulation than light components. When light and highly accessible elements are formulated and articulated first, and heavy and less accessible ones are produced later (cf. Arnold et al. Reference Arnold and Griffin2000), this saves time required for speech production. While formulating and articulating more accessible and lighter constituents, the production mechanism is busy with processing less accessible and heavy ones.
Consider binomial expressions, e.g., land and sea, bride and groom, fame and fortune. Fenk-Oczlon (Reference Fenk-Oczlon1989) argues that the order in such expressions is best explained by frequency. The first element is normally more frequent than the second one. Since more frequent words are more accessible than less frequent ones, this order is efficient. Note that semantic relations also help to explain the data to some extent. In particular, an important role is played by iconicity of order, e.g., past and present, birth and death. This principle will be discussed in Section 3.2.4. Similar reasoning can also explain the so-called right dislocation, when the heavy component, which requires a lot of time for formulation, is uttered last. See an example in Section 3.3.5.
As another illustration, consider English dative alternation, which was discussed in Section 2.3.3. The choice is between double-object dative, as in (5a), and prepositional dative, as in (5b).
(5)
a. The teacher gave me an interesting book.
b. The teacher gave them to the smartest student.
Bresnan et al. (Reference Bresnan, Cueni, Nikitina, Baayen, Bouma, Krämer and Zwarts2007) demonstrate that the choice between the constructions is determined by a number of factors. In particular, the double-object construction, in which the recipient is followed by the theme, is more likely to be chosen than the prepositional-object construction if the Recipient is pronominal, animate, definite, discourse-given, 1st or 2nd person and relatively short in comparison with the Theme, and the Theme is not given, not pronominal, and not concrete. The reverse holds for the prepositional-object construction. Therefore, the more accessible and shorter element (Recipient or Theme) tends to come first, and the less accessible and longer one is usually placed second. In addition, the word order helps to minimize dependency distances, which saves memory costs. Note that processing efficiency again interacts with articulation efficiency. The prepositional dative has more coding material (the additional preposition to) than the double-object dative, which can be explained by the fact that the former represents less accessible configurations of participants (see Section 2.3.3).
There are some other factors determining which constituent will come first. In particular, Clark and Chase (Reference Clark and Chase1974) show that figures are better starting points than grounds. Compare the two pictures in (6). The star is the figure, and the line is the ground.
(6)
(a)            *              (b)  ----------------------
     ----------------------                  *
When describing (6a), language users predominantly mention the star first: The star is above the line, rather than The line is below the star. As for (6b), they prefer beginning with the star, too. So, The star is below the line is produced more often than The line is above the star. At the same time, the figure-first preference for (6b) is weaker than for (6a). The reason is another bias: speakers prefer to identify with objects ‘above’ rather than in the marked relation ‘below’. This identification is the starting point for building mental representations and sentence production (MacWhinney Reference MacWhinney1977). It is deeply rooted in our early sensorimotor experience (Piaget Reference Piaget1952). This asymmetry is echoed in the tendency to describe vertical relationships such that the ‘point of reference’ is at the bottom. For example, it is more natural to say Jack is taller than Bill than Bill is shorter than Jack. Similar asymmetries are observed for the pairs ‘in front of’ – ‘in back of’, ‘ahead’ – ‘behind’ and ‘before’ – ‘after’. Agents are also easier to identify with than Patients, which explains why the active voice is more frequent than the passive across languages (Greenberg Reference Greenberg1966). Also, animate entities are more accessible than inanimate ones, as was already mentioned.
The tendency to mention accessible units first is called the ‘Easy First’ bias by MacDonald (2013). She also argues that this bias competes with another principle, which she calls ‘Plan Reuse’. Speakers favour ‘easy’, more practised or recently used utterance plans. This explains the effects of structural priming (Weiner and Labov 1983; Bock 1986; Pickering and Branigan 1998; see an overview in Pickering and Ferreira 2008). For example, if the speaker has recently uttered, heard or read a passive sentence, they are more likely to produce one again. According to MacDonald (2013), structural priming is part of long-term learning, so there is no principled difference between the accessibility of a plan in long-term memory and its accessibility as a result of activation in a recent usage event. While Easy First operates at the level of words and constituents, Plan Reuse involves more abstract sentence schemas, e.g., SOV or SVO. If Plan Reuse strongly dominates language production in a language, the order of constituents will be rigid, as in English or Mandarin Chinese. If it is weaker, then Easy First has more room for action, and word order will be more flexible, as in Russian or Czech. Both Easy First and Plan Reuse maximize accessibility, but they do it at different levels of abstraction.
3.2.3 Avoidance of Reanalysis
An efficient order will enable the recipient to interpret a sentence correctly (for example, to determine who did what to whom) on the first try. If the analysis has to be redone, this creates additional processing costs. This is why so-called garden-path sentences, e.g., the famous The horse raced past the barn fell, are costly. The processor first interprets the participle raced as a finite past-tense form, an analysis that has to be revised once the unambiguous verb form fell is encountered. This results in a waste of processing resources for the recipient.
The criterion of early and correct access can be linked to the principle of maximization of accessibility, as was argued in Section 1.4.3. The speaker leads the addressee up the garden path because the addressee is used to the fact that the most accessible interpretation is the best one in most cases.
That said, it is necessary to mention that language users do not always engage in reanalysis. Experiments demonstrate that language processing often yields a merely ‘good-enough’ rather than a detailed linguistic representation of the meaning of a sentence (Ferreira 2003).
Closely related to the requirement to avoid reanalysis is Hawkins’ (2004) principle Maximize On-line Processing. According to this principle, the speaker should use a word order that provides the earliest possible access to as much structure as possible. For example, antecedents precede anaphors cross-linguistically, e.g., John adores himself is preferred to Himself adores John. The former order helps the addressee to identify the referent of the anaphoric expression (himself) easily. This can also be regarded as a strategy for maximization of accessibility, because the referent of the reflexive pronoun is immediately accessible if we use the standard order.
In addition to word order adjustments, early access can be secured by case marking and semantic cues. For example, verb-final languages tend to have case marking of the main arguments and a strong association between the syntactic roles and the semantics of the nominals that can fill them (Hawkins 1986; Levshina 2020b). This information helps us to understand who did what to whom early in the sentence, avoiding the costs of reanalysis. See more on this in Section 6.3.
3.2.4 Diagrammatic Iconicity of Order
Linguistic iconicity refers to the correspondence between the conceptual structure and the linguistic structure (Haiman 1985; Croft 2003: Section 7.2). This section focuses on diagrammatic iconicity, where the order of linguistic units corresponds to the conceptual relationships between the elements they represent. This correspondence is also known as the semantic principle of linear order (Givón 1990: 92). For example, the order of verbs in the phrase attributed to Julius Caesar, Veni, vidi, vici ‘I came, I saw, I conquered’, corresponds to the order of their conceptualization. An iconic order is efficient because it is easier to produce and to process. For example, it should be easier to process I moved from Berlin to Amsterdam than I moved to Amsterdam from Berlin. Other examples include frozen binomial expressions, e.g., birth and death, there and back, past and present or kiss and tell (Footnote 4; cf. Benor and Levy 2006), although a major role in determining the order in binomial expressions in general is played by the frequency and therefore accessibility of their components (Fenk-Oczlon 1989; see also Section 3.2.2). Moreover, since the default interpretation in the absence of connectives is the sequential one, additional coding should be added in order to override it, in accordance with the principle of negative correlation between accessibility and costs, e.g., I conquered after I saw after I came.
The order does not have to be temporal. Consider the ascending order in numbering, e.g., Each steak needs 6–7 minutes to cook, where the lower estimate is followed by the upper estimate. Sequence relationships can also be very abstract and related to the cognitive and communicative space shared by the interlocutors. For example, in many languages (but not in all) old information usually precedes new information. This corresponds iconically to the development of knowledge and cognition from known to new information, as in (7A) and (7B):
(7)  Russian (personal knowledge)
     A:  Nu, čto ty kupila segodnja?
         well what you bought today
         ‘Well, what have you bought today?’
     B:  Ja kupila novoje platje.
         I bought new dress
         ‘I’ve bought a new dress.’
     Bʹ: Novoje platje ja kupila!
         new dress I bought
         ‘I’ve bought a new dress!’
However, new and newsworthy information can also be put first, followed by old information, as in (7Bʹ), which is more emotionally coloured. The speaker simply cannot wait to boast about her new dress. The old information can in principle be omitted, but it can be added, as in the example, in order to remind the hearer about the continuing topic.
This shows that the pressure for iconicity of information flow can be overridden by the pressure for iconicity of urgency. In some languages, more newsworthy (discourse-new and indefinite) nominal constituents are usually put first. Examples are the polysynthetic languages Cayuga, Ngandi and Coos (Mithun 1987). However, these languages have obligatory bound pronouns that represent the main arguments, so given information is usually not expressed by separate nominal constituents.
Another very abstract type of iconicity is called iconicity of contiguity, using the classification from Haspelmath (2008c). This means that elements that belong together semantically also tend to occur next to each other in speech. Here, the distance between linguistic units in speech iconically corresponds to the conceptual distance between the concepts. This is why most constituents, e.g., nominal phrases, are usually not interrupted by other units, e.g., We listened to a very interesting lecture, and not We a very listened interesting lecture to. At the same time, spontaneous speech is known for exceptions: in Russian, for example, it is possible to say My očen’ interesnuju slušali lekciju (literally, ‘We very interesting listened-to lecture’). Similarly, modifiers are located close to their heads. For example, the intensifier very is placed next to the property it intensifies, e.g., a very interesting lecture, and not an interesting lecture very.
It is possible to interpret many of the examples provided above as the tendency to put more accessible information first, following the principle of maximization of accessibility (see also Section 3.2.2). For example, the word order variation in (7) can be explained by higher accessibility of given information in the emotionally neutral utterance (7B) and higher accessibility of newsworthy, subjectively important information in the emotionally coloured version (7Bʹ). Similarly, we can say that continuous constituents are motivated by the higher accessibility of semantically related elements, which can be overridden by other factors, such as emotional salience. In accordance with the principle of information locality, closely related words or constituents should also be put close to each other in the sentence (see Section 3.2.1).
One difficult problem with iconicity as an explanatory factor is that it is not always easy to access a conceptualization independently of its linguistic expression. We do not have access to the language of thought and cannot compare the isomorphism of conceptual and linguistic structure directly (cf. Croft 2003: 203). Moreover, there is evidence that the conceptualization of events may depend on the preferred word order in a specific language, as we learn from eye-tracking studies. For example, speakers of subject-first languages, such as Dutch, first look at the agent before starting to describe a transitive event. Quite differently, speakers of Murrinhpatha, an Australian Aboriginal language with very flexible word order, do not show a preference for either the agent or the patient in the earliest stage of speech planning (Nordlinger et al. 2022). Notably, speakers of Tzeltal, a predominantly VOS language, direct their eye-gaze first to the agent (the grammatical subject) when describing transitive events, although the preference is weaker (Norcliffe et al. 2015). This might sound surprising, given that the subject in most sentences occurs last. A possible explanation is that the Tzeltal verb carries subject agreement markers. All this means that conventional order influences the order of conceptualization in language production, in accordance with Slobin’s (1987) thinking-for-speaking hypothesis. The causal relationships between conceptual and linguistic structure are likely to be bidirectional.
3.2.5 Uniform Information Density
The Uniform Information Density (UID) hypothesis and similar proposals, which were discussed in Section 1.3, say that information (in the information-theoretic sense) should be distributed evenly throughout an utterance, avoiding high peaks and canyons. Usually, these ideas are used to explain formal variation, e.g., phonetic reduction or the use or omission of function words. But this approach can also help to explain the cross-linguistic distribution of word order. Fenk-Oczlon (1983) argued that word orders are more efficient if they lead to a more uniform distribution of information. In particular, objects can be highly informative. However, when introduced later in the sentence, they become more predictable due to the previous context, so that a peak in informativity is avoided. With subjects, the situation is reversed. This explains why SOV and SVO are the most popular orders cross-linguistically. Maurits (2011) tested this hypothesis empirically, evaluating which of the possible permutations of Subject, Object and Verb leads to the smallest differences between the entropy scores of each word, given the previous words. More on this follows in Section 3.3.4.
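The intuition behind a ‘uniform distribution of information’ can be made concrete with a toy calculation: given the conditional probability of each word in context, compute the per-word surprisal and measure how unevenly it is spread. The sketch below is only an illustration of the idea — the probabilities and the function name are invented, and this is not Maurits’ actual evaluation procedure:

```python
import math

def uid_deviation(probs):
    """Mean squared deviation of per-word surprisal (in bits) from the
    mean surprisal of the utterance; lower = more uniform density."""
    surprisals = [-math.log2(p) for p in probs]
    mean = sum(surprisals) / len(surprisals)
    return sum((s - mean) ** 2 for s in surprisals) / len(surprisals)

# Conditional probabilities of each word given its predecessors for two
# hypothetical orderings of the same content (invented numbers):
peaked = [0.5, 0.5, 0.01, 0.5]    # one word carries a surprisal spike
smooth = [0.25, 0.25, 0.2, 0.25]  # information is spread out evenly
print(uid_deviation(peaked) > uid_deviation(smooth))  # True
```

On this measure, the ‘smooth’ ordering is preferred: it avoids the informativity peak that the third word creates in the ‘peaked’ ordering.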
Also, the avoidance of excessively high surprisal, one desideratum of the UID hypothesis, overlaps with the information locality principle described in Section 3.2.1. According to that principle, closely related words appear together, which helps to minimize the processing costs associated with high surprisal. Therefore, we can interpret this aspect of the UID hypothesis as a manifestation of the principle of maximization of accessibility in word order. As for another aspect of the UID hypothesis, namely the enhancement and reduction of the speech signal, this is explained by the principle of negative correlation between accessibility and costs (see Section 1.3).
3.3 Cross-Linguistic Manifestations of Efficient Order
3.3.1 Minimization of Dependency Distances and Domains
Language users tend to minimize distances between syntactic heads and their dependents (Ferrer-i-Cancho 2006; Liu 2008; Gildea and Temperley 2010; Futrell, Mahowald and Gibson 2015b). An example is so-called heavy-NP shift. In English, the direct object nominal phrase (NP) is usually followed by the prepositional phrase (PP), as in (8a). However, when the NP is heavier than the PP, the preferred order is reversed, as in (8b).
(8)  a.  I’ve read NP[ the fascinating paper on nominal classifiers, which you sent me last week ], PP[ with great interest ].
     b.  I’ve read PP[ with great interest ] NP[ the fascinating paper on nominal classifiers, which you sent me last week ].
Using the Universal Dependencies conventions (Zeman et al. 2020), the dependency distance between the verb read and the object paper in (8a) is 3 words (the, fascinating, paper). The distance between the verb and the head of the prepositional phrase, interest, is 15 words. If we add up these numbers, we get 3 + 15 = 18 as the sum dependency distance for these two dependencies. In (8b), the dependency distance between read and interest is 3 words, and the distance between read and paper is 6 words, which makes a sum distance of 9 words. Since the sum of dependency distances in (8b) is smaller than in (8a), the word order in (8b) is more efficient than the order in (8a). Note that the preposition with is regarded as the head of the prepositional phrase with great interest in many theoretical frameworks (Osborne and Gerdes 2019), but that approach leads to similar results.
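The arithmetic above is easy to reproduce programmatically. Below is a minimal sketch — the function name and the hard-coded 1-based word positions are my own, not part of any Universal Dependencies tooling:

```python
def total_dependency_distance(dependencies):
    """Sum of head-dependent distances, counted in words, over a list
    of (head_position, dependent_position) pairs (1-based positions)."""
    return sum(abs(head - dep) for head, dep in dependencies)

# (8a): read=2, paper=5, interest=17  ->  3 + 15 = 18
# (8b): read=2, interest=5, paper=8   ->  3 + 6  = 9
print(total_dependency_distance([(2, 5), (2, 17)]))  # 18
print(total_dependency_distance([(2, 5), (2, 8)]))   # 9
```

The order with the smaller total, here (8b), is the one predicted to be preferred.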
The principle of minimization of dependency distances is closely related to the law of growing constituents formulated by Behaghel (1909) on the basis of text data from Indo-European languages: if there are two constituents of different length, the longer constituent follows the shorter one. Corpus evidence (Wasow 1997) and experimental data (Stallings and MacDonald 2011) support this claim for English. The higher the ratio of the length of the NP to that of the PP, the more likely speakers of English are to put the shorter PP before the longer NP.
According to the dependency locality theory (Gibson 1998, 2000), longer-distance attachments, as in (8a), involve higher integration costs. As for storage costs, they are incurred if both complements are obligatory and therefore expected. For example, Gibson (1998: 51) argues that (9a) is more memory-expensive than (9b) because the verb give creates an expectation of a Recipient and a Theme coming later in the sentence:
(9)  a.  The young boy gave NP[ the beautiful green pendant that had been in the jewellery store window for weeks ] PP[ to the girl ].
     b.  The young boy gave PP[ to the girl ] NP[ the beautiful green pendant that had been in the jewellery store window for weeks ].
Recent corpus-based studies, however, usually do not distinguish between obligatory and non-obligatory constituents (which is very difficult due to the absence of this information in most corpora, and also theoretically problematic because the distinction is gradient). Nor do they usually distinguish between storage and integration costs. In addition, Gibson’s theory takes into account the number of new referents that need to be integrated into the structure, whereas in corpus-based studies, processing difficulty is usually represented by the number of all words between the head and the dependent. The assumption is that the different measures are highly correlated with each other, so that the differences between them are not substantial (Wasow 2002; Futrell, Levy and Gibson 2020).
Interest in measuring dependency distances has been boosted by the emergence of large corpora annotated for syntactic dependency relations, especially the Universal Dependencies corpora (Zeman et al. 2020). But the preferences in the examples above can also be explained if we focus on syntactic constituents (e.g., NP, VP or PP) instead of dependencies. Most prominently, Hawkins (2004) argued that language users prefer word orders that minimize the syntactic and semantic domains needed for recognizing the constituent structure – a principle called Minimize Domains. A domain is ‘the smallest connected sequence of terminal elements and their associated syntactic and semantic properties that must be processed for the production and/or recognition of the combinatorial or dependency relation in question’ (Hawkins 2004: 32). The domains in which immediate constituent (IC) relations can be processed are called constituent recognition domains. As an illustration, take the following sentence from Hawkins (2004: 23):
(10)  The old lady V[ counted ] PP1[ on him ] PP2[ in her retirement ].
We can find many different domains, depending on what kind of information we are processing. If we focus on the VP and its three immediate constituents (V, PP1, PP2), the domain is counted on him in. We can already recognize the structure from this sequence. Alternatively, if we take the lexical meaning of the verb count, the sufficient domain is counted on him, or possibly just counted on.
The principle Minimize Domains is about making the domains as small as possible. For example, the domain for parsing the lexical combination and dependency between count and on is smaller if the preposition immediately follows the verb. Similarly, if we take the domain for the processing of the VP and its three immediate constituents, counted on him in, the domain is four words. If we change the order of the two prepositional phrases, as shown in (11), the domain necessary for recognizing the constituents will contain five words: counted in her retirement on. Therefore, the order in (11) is less efficient than the order in (10).
(11)  The old lady V[ counted ] PP2[ in her retirement ] PP1[ on him ].
We can explain the preference in the example of heavy-NP shift, repeated below for convenience, by the same principle:
(12)  a.  I’ve read NP[ the fascinating paper on nominal classifiers, which you sent me last week ], PP[ with great interest ].
      b.  I’ve read PP[ with great interest ] NP[ the fascinating paper on nominal classifiers, which you sent me last week ].
In (12a), the domain is read the fascinating paper on nominal classifiers, which you sent me last week, with. It contains fourteen words. In (12b), it is read with great interest the, only five words. Thus, both Hawkins’ constituent approach and the dependency distance approach predict that the word order in (12b) is more efficient and should therefore be preferred by language users.
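Under a simplified operationalization — a constituent recognition domain stretching from the phrasal head to the onset of its furthest immediate constituent — the domain sizes discussed above can be computed as follows. The function name and the simplification are mine, not Hawkins’:

```python
def crd_size(head_position, ic_onsets):
    """Words in a constituent recognition domain: the span from the
    phrasal head to the onset of the furthest immediate constituent,
    inclusive (1-based word positions)."""
    return max(ic_onsets) - head_position + 1

# (10):  counted=4, onsets of PP1 (on)=5 and PP2 (in)=7  -> 4 words
# (11):  counted=4, onsets of PP2 (in)=5 and PP1 (on)=8  -> 5 words
# (12a): read=2, NP onset (the)=3, PP onset (with)=15    -> 14 words
# (12b): read=2, PP onset (with)=3, NP onset (the)=6     -> 5 words
print(crd_size(4, [5, 7]), crd_size(4, [5, 8]))   # 4 5
print(crd_size(2, [3, 15]), crd_size(2, [3, 6]))  # 14 5
```

As with dependency distances, the order yielding the smaller domain is predicted to be preferred.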
The motivation for the principle Minimize Domains has to do with working memory and the computational system (Hawkins 2014: 13). The smaller the recognition domain, the fewer additional phonological, morphological, syntactic and semantic decisions need to be made simultaneously with the task of identifying the domain in question, and the fewer competing structural decisions there are to resolve.
These theories nicely predict the behaviour of postverbal elements, which consistently follow the rule ‘short before long’. There is also some evidence that preverbal constituents follow the rule ‘long before short’, as predicted by the processing principles above. In Japanese, for example, the order of objects and postpositional phrases, subjects and objects, and direct and indirect objects supports the theoretical expectations: long constituents are followed by short ones. This word order minimizes dependency distances and domains (Hawkins 1994; Yamashita and Chang 2001). But if we take two pre- or postpositional phrases depending on one verbal head, corpora of different languages reveal no clear preferences for long before short in the preverbal position (Liu 2020). The evidence for the dependency minimization account is thus not always clear.
Moreover, there are some arguments against the memory-based explanation of domain minimization. As Wasow (1997) argues, this account would require that both constituents be fully planned at the moment of speech, so that their weights can be compared. However, it is questionable whether the speaker can do that. Instead, the actual formulation of the phrases takes place, at least partly, after the order has been chosen. Wasow presents some corpus data to support his claim. He shows that collocations, e.g., take into account/consideration or bring to an end/close, participate in heavy-NP shift more frequently than non-collocations. The reason is that collocations are easier for the speaker to produce as one sequence before the more complex part that requires more planning. Note, however, that producing opaque collocations as one sequence can be beneficial for the addressee, too, because it allows them to decide immediately on the lexical meaning of the verb (cf. Hawkins’ example with count on in (10)). We need more research in order to understand how all these factors interact.
3.3.2 Preferred Order of Elements within a Nominal Phrase
The elements of a nominal phrase – Noun, Adjective, Determiner and Numeral – appear in different orders across the languages of the world. For example, English has the order Determiner – Numeral – Adjective – Noun, as in those three little kittens, whereas in Basque the order is Numeral – Noun – Adjective – Determiner, as in three kittens little those. At the same time, some orders are common, and some are extremely rare or even unattested, e.g., Adjective – Numeral – Determiner – Noun, as in little three those kittens (Culbertson, Schouwstra and Kirby 2020). In particular, in the preferred word orders, the Adjective is placed closest to the Noun, whereas the Determiner is placed farthest away.
It seems that these preferences can be explained by the strength of association between objects and their different properties in the world. This account has been tested on corpus data by Culbertson et al. (2020), who used Pointwise Mutual Information as a measure of association. Associations are strongest between Noun and Adjective. For example, wine is strongly associated with its colour (e.g., red or white), whereas skyscrapers are strongly associated with their height. Colour and height are inherent properties of wine and skyscrapers, respectively. Numerosity is less strongly associated with Nouns, although some objects usually come in pairs, e.g., shoes or socks, and some come in dozens or tens, e.g., eggs. Finally, Determiners, which usually specify the location and/or the relation to the speech act participants, have the weakest association with Nouns. This is not surprising, since individual Determiners are highly frequent and combine with very many diverse nouns.
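As a reminder of what the measure does, Pointwise Mutual Information compares the observed co-occurrence probability of two words with the probability expected if they were independent. The sketch below illustrates the measure only, not Culbertson et al.’s corpus pipeline; all probabilities are invented:

```python
import math

def pmi(p_xy, p_x, p_y):
    """Pointwise Mutual Information in bits: how much more (or less)
    often x and y co-occur than expected under independence."""
    return math.log2(p_xy / (p_x * p_y))

# Invented probabilities: 'red' + 'wine' co-occur well above chance,
# 'those' + 'wine' at roughly chance level.
print(round(pmi(0.001, 0.01, 0.02), 2))   # 2.32  (strong association)
print(round(pmi(0.0002, 0.01, 0.02), 2))  # 0.0   (no association)
```

On this logic, Adjective–Noun pairs show high PMI and end up adjacent, while Determiner–Noun pairs show PMI near zero and end up peripheral.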
Culbertson et al.’s findings can be explained by the principle of locality, which says that semantically and syntactically closely related elements should appear close to each other (see Section 3.2.1). This helps to maximize accessibility, decreasing memory load and expectation-based costs.
The principle of information locality can also explain the order of multiple adjectives in a nominal phrase, as demonstrated by Hahn et al. (2018) and Futrell (2019). For example, English allows the order a large wooden table, but not a wooden large table. The adjective with higher mutual information with the noun will be closer to it. Also, evaluative adjectives are placed further from the noun, e.g., a beautiful red dress, but not a red beautiful dress. This can be explained by the fact that evaluative adjectives do not restrict the set of referents but communicate the speaker’s attitude. Their applicability to any given noun is determined by the speaker’s subjective state rather than by the noun itself. Simply put, what is beautiful for one person can be ugly for another. This explains why evaluative adjectives are located on the periphery of the nominal phrase. The explanation is supported by diachronic evidence. According to Traugott (2010), as a linguistic unit develops more subjective meanings, its position also moves towards the periphery.
3.3.3 Cross-Linguistic Regularities in the Order of Morphemes
There are a few cross-linguistic generalizations concerning the order of elements within a word. Two of them are discussed in this section. The first one is the suffixing preference. The second one is the preference for a particular order of derivational and inflectional morphemes depending on the type of grammatical meaning they express.
It is well known that suffixing is more frequent cross-linguistically than prefixing, and both are more frequent than infixing (Greenberg 1963). Several explanations have been proposed. One theory belongs to Cutler, Hawkins and Gilligan (1985), who argue that suffixes are preferred to prefixes because the word onset is a particularly salient position, serving as a strong cue for word recognition. Word endings are less salient than onsets, but more salient than middles. Moreover, according to Hupp, Sloutsky and Culicover (2009), beginnings are the most salient part of any kind of sequence. If a word is distorted at its onset, the effects on processing are more disruptive than if the distortion happens at the end of the word. Therefore, by putting roots first, as the elements that carry the most important information, it is easier to avoid reanalysis or misunderstanding.
The causality can also be reversed, however. It may be that speakers of WEIRD languages (that is, languages spoken in Western, educated, industrialized, rich and democratic societies), which provide the bulk of psycholinguistic evidence, learn to pay more attention to the onset because it is the most informative part in those languages. Possibly, their experience with a suffixing language leads to the perception of beginnings as the most salient position for determining similarity. In fact, Martin and Culbertson (2020) show that speakers of Kîîtharaka, a prefixing Bantu language, perceive endings as the most salient for determining similarity, contrary to Hupp et al. (2009).
Another explanation of the suffixing preference has to do with the tendency to provide disambiguating information early. As already mentioned (see Section 3.2.3), this tendency is captured by Hawkins’ (2004) principle Maximize On-line Processing, an efficient strategy in communication. A lexical root or stem is less predictable, and therefore more informative, than an affix, because individual lexical roots are more diverse and less frequent in comparison with individual affixes. Affixes are accordingly less important for word recognition. By providing the maximum of information at the beginning, the speaker helps the addressee to make correct predictions about the word.
Yet another explanation was formulated by Himmelmann (2014), who argues that the suffixing preference is due to prosodic factors and grammaticalization processes. In general, affixes are the result of greater grammaticalization and fusion of clitic function words with their lexical hosts. But if a function word precedes its lexical host, there can be a prosodic boundary between them, which impedes the fusion, as in the following example (Chafe 1980: 308, story 9):
(13)  And that’s the end of the .. story.
The prosodic boundary separates the definite article from the head noun. In contrast, when a function word occurs after its lexical host, there are hardly any prosodic boundaries, and prosody does not impede the fusion. As a result, postpositional clitics become suffixes more frequently than clitics that precede their hosts (Footnote 5).
But why does the boundary occur more frequently when a function word occurs before its host than the other way round? According to Diessel (2019: Section 5.5), this can be explained by predictability. The conditional probability of a lexical unit (e.g., a noun) given a functional element (e.g., an article) is low. For example, the article the can be followed by thousands of different nouns. The conditional probability of a functional element given a content word is higher: if we take a typical English noun, it is likely to be accompanied by the. Thus, function words are more predictable given content words than the other way round. If a function word occurs before its host, the order of production means that the host is not very predictable (e.g., the girl/house/conference…). Low predictability may trigger production difficulties, which can result in disfluencies like pauses and hesitations. In contrast, if a function word occurs after its host, the function word has a high degree of predictability. It is retrieved and produced more easily, which means that the chances of a prosodic boundary are lower. High predictability leads to fusion of the host and the postposed element, which explains why suffixing occurs more frequently in the languages of the world. Therefore, the suffixing preference can be explained by the higher accessibility of postposed dependent units in comparison with preposed ones.
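Diessel’s predictability asymmetry can be illustrated with toy bigram counts (all counts invented): an article like the spreads its probability mass over many nouns, while each of those nouns concentrates its mass on the:

```python
from collections import Counter

# Invented bigram counts of (first_word, second_word):
bigrams = Counter({('the', 'girl'): 10, ('the', 'house'): 10,
                   ('the', 'conference'): 10,
                   ('a', 'girl'): 2, ('a', 'house'): 2})

def p_second_given_first(w1, w2):
    """P(w2 | w1): how predictable the noun is given the article."""
    total = sum(c for (a, _), c in bigrams.items() if a == w1)
    return bigrams[(w1, w2)] / total

def p_first_given_second(w1, w2):
    """P(w1 | w2): how predictable the article is given the noun."""
    total = sum(c for (_, b), c in bigrams.items() if b == w2)
    return bigrams[(w1, w2)] / total

# The noun is hard to predict from 'the', but 'the' is easy to predict
# from the noun -- the asymmetry behind the suffixing preference.
print(round(p_second_given_first('the', 'girl'), 2))  # 0.33
print(round(p_first_given_second('the', 'girl'), 2))  # 0.83
```

The same asymmetry holds in reverse order of production: a function word following its host is highly predictable, which favours fusion and hence suffixing.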
Notably, the suffixing preference is not monolithic. It is very strong in verb-final (OV) languages and in grammatical markers expressing nominal number and case, as well as tense and aspect (Cysouw 2009). As for person marking, there is even a slight preference for prefixing (Footnote 6). Among the potential explanations of this preference are word order (e.g., if the Subject precedes the Verb, this can favour the emergence of subject prefixes) and the fact that the main participants of the situation often have high accessibility and are therefore produced earlier.
To finish this discussion, it is necessary to mention that the relative scarcity of infixing can be explained by the general tendency to keep semantically related elements together (due to the information locality principle discussed in Section 3.2.1).
The second important cross-linguistic generalization related to the order of morphemes has to do with the relative distance of inflectional and derivational morphemes from the root. There are several well-known tendencies. Usually, derivational morphemes occur closer to the root than inflectional morphemes do. For example, in the wordform teachers, the derivational suffix ‑er is closer to the root teach than the plural marker ‑s. Inflectional morphemes also tend to be arranged in a particular order. For example, ‘the expression of number almost always comes between the noun base and the expression of case’ (Greenberg 1963: 112).
As for verbal derivational and inflectional morphemes, the order is usually as follows (Bybee 1985):
(14) Valence > Voice > Aspect > Tense > Mood > Agreement (Person and Number)
Bybee argues that the position of a morpheme is determined by the effect that this morpheme has on the root meaning. Derivational morphemes are more relevant to the root meaning in the sense that they change it more dramatically. Similarly, number has ‘a direct effect on the entity or entities referred to by the noun’, while case has ‘no effect on what entity is being referred to’ (Bybee 1985: 34). Also, the categories on the left of the scale in (14) have higher relevance to the verb than the ones on the right. For example, valence changes the number and role of participants involved in the event. It is central to the semantics of the verb. The differences related to valence are often so striking that they are lexicalized, as in the causative–inchoative pair kill and die. In contrast, mood has the whole proposition in its scope, so it is less relevant for the lexical meaning of the verb. Similarly, agreement markers, such as person and number inflections, refer to the participants and are therefore peripheral with regard to the meaning of the verb.
We can explain these tendencies using the information locality principle. If we take aspect markers, which have a strong impact on the meaning of the verb, they would also be less freely applicable to different verbs than more peripheral markers. For example, an imperfective marker is more compatible with durative verbs than with punctual ones. Therefore, the mutual information of the root and the affix will be relatively high. In contrast, a person marker has fewer restrictions on the root. The mutual information would thus be lower. This reasoning is supported by a corpus study by Hahn, Degen and Futrell (2021), who investigated the order of morphemes in Japanese and Sesotho (a Southern Bantu language spoken in Lesotho and South Africa). Hahn et al. find that the order of morphemes correlates with mutual information, which represents the strength of association between neighbouring morphemes.
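The intuition about mutual information can be sketched with invented (root, affix) counts; this is only an illustration of the measure itself, not Hahn et al.'s actual method or data. A selective aspect marker that co-occurs with only some roots yields a high pointwise mutual information, while a freely combining person marker yields a value near zero:

```python
import math
from collections import Counter

# Hypothetical (root, affix) tokens: the aspect markers co-occur
# selectively with roots, the person markers combine freely.
pairs = [("run", "IPFV")] * 8 + [("arrive", "PFV")] * 8 \
      + [("run", "1SG")] * 4 + [("arrive", "1SG")] * 4 \
      + [("run", "3SG")] * 4 + [("arrive", "3SG")] * 4

n = len(pairs)
joint = Counter(pairs)
roots = Counter(r for r, _ in pairs)
affixes = Counter(a for _, a in pairs)

def pmi(root, affix):
    """Pointwise mutual information: log2 P(r,a) / (P(r) * P(a))."""
    p_ra = joint[(root, affix)] / n
    return math.log2(p_ra / ((roots[root] / n) * (affixes[affix] / n)))

print(pmi("run", "IPFV"))  # selective aspect marker: PMI = 1 bit
print(pmi("run", "1SG"))   # freely combining person marker: PMI = 0
```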
Similar reasoning can explain the order of case and number markers on nouns. According to Greenberg’s Universal 39 (1963), number markers are usually located closer to the stem than case markers, as in Turkish kitap-lar-ı ‘book-pl-acc.def’. A series of experiments by Saldana, Oseki and Culbertson (2021) demonstrates that learners of a miniature artificial language consistently reproduce this order even in the absence of wordforms with both case and number markers in the input language. Their behaviour is independent of the learners’ native language (English or Japanese), morpheme position with regard to the stem (prefixal or suffixal), degree of boundedness, frequency and other features. Importantly, this strong tendency can be reversed in the presence of case allomorphy. Since allomorphy increases the dependency between case markers and the stem, this serves as evidence for the principle of maximization of accessibility.
3.3.4 Subject-First Dominance
The dominance of the subject-first order in the world’s languages is a well-known fact (Greenberg 1963; Dryer 2013). There are some grounds to believe that this preference is not a historical contingency. For example, experimental evidence reveals that even speakers of verb-initial languages (Irish, Tagalog) stick to subject-first order when communicating in gestures, while the ‘native’ verb-initial order is the third choice after SOV and SVO for those speakers (Futrell et al. 2015a). So, there is something deeply rooted in human cognition and communication that explains this dominance.
At the same time, the subject-first dominance is probably the champion if we count the number of explanations suggested in the literature. In fact, almost all the explanatory factors discussed in this chapter can potentially play a role.
First of all, putting subject before object is efficient from the planning perspective because transitive subjects are usually highly accessible. That is, they are discourse-given, short, pronominal and animate (see Chapter 8). Therefore, it is efficient to place them first and use the remaining time to plan the less accessible elements (see Section 3.2.2).
A second explanation has to do with memory costs. If the subject comes first, the addressee will not necessarily expect an object because there is a chance that the sentence is intransitive. In contrast, if the first constituent is an object or adverbial phrase, there is still an expectation of the subject, which creates memory costs at this location (Gibson 1998). This theory, however, leaves unexplained the preference for subject-first order in ergative languages. If the subject with ergative marking appears first, it creates an expectation of an object because ergative marking signals that the sentence is transitive. It would be more efficient to place the absolutive-marked object first, since it is formally similar to the intransitive subject, but this order is not very common among ergative languages.
The next potential explanation has to do with diagrammatic iconicity of order. In a prototypical transitive sentence, the action is transferred from Agent to Patient (Hopper and Thompson 1980). This means that the energy ‘flows’ from Agent to Patient, where the Agent is the initiator, and the Patient is the affected entity and endpoint. An example is the causative event in (15). The Agent (the woman) is the source of energy necessary for the change that occurs with the Patient (the door).
The woman closed the door.
The order of subject and object iconically reflects this flow of energy. Thus, there is a correspondence between the subject-first order and the conceptualization of a transitive event.
In addition, Fenk-Oczlon (1983) argues that subject-first basic orders are efficient because they produce a more uniform distribution of information for a randomly selected transitive clause. In contrast, the orders OSV and OVS are particularly inefficient. The logic behind this is as follows. According to Maurits’ (2011: 117) data, there are fewer agents in corpora than objects that those agents can manipulate, and there are also multiple actions that the same agent can perform. Since the number of possible objects is very high, objects will have a very high surprisal value when they first appear. The information density will be more uniform, and surprisal peaks can be avoided, if there are some elements in front (subjects and/or verbs) which can help to reduce the surprisal of objects. Also, objects are highly predictive of verbs. For example, if the object is ‘pizza’, the verb is likely to be ‘eat’. So, if an initial object is followed by a verb, e.g., pizza eat, there will be a peak in surprisal on the object, followed by a very low valley on the verb. This creates large fluctuations in information density, which is not efficient.
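The uniformity argument can be made concrete with toy probabilities (invented for illustration, not Maurits' actual figures): an initial object carries a surprisal peak and then makes the following verb almost certain, so the object-first order shows much larger fluctuations in information density than the subject-first one:

```python
import math

def bits(p):
    """Surprisal in bits of an event with probability p."""
    return -math.log2(p)

# Toy conditional probabilities (illustrative only): subjects are
# given and few; an initial object is one of very many candidates,
# but it makes the following verb highly predictable.
svo = [bits(0.5), bits(0.2), bits(0.5)]    # S, V|S, O|S,V
ovs = [bits(0.01), bits(0.9), bits(0.5)]   # O, V|O, S|O,V

def variance(xs):
    """Population variance, as a rough measure of (non-)uniformity."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

print(variance(svo))  # small fluctuations: more uniform density
print(variance(ovs))  # large peak-then-valley fluctuations
```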
One problem with this approach is that the density is only evaluated at the sentence level, without previous context. Subjects are usually discourse-given and highly accessible from context. This is why their informativeness should be evaluated in discourse, rather than in an isolated sentence. Also, the empirical data provided by Maurits (2011) do not give consistent rankings of possible orders in terms of their information density profiles.
I propose that subject-first dominance can be explained by maximization of accessibility. Since subjects are usually given, putting them first reduces surprisal and minimizes memory costs. The iconic correspondence between word order and the conceptualization of energy flow from the agent to the patient arises because agents are usually humans. Since we usually speak about humans, they are often given and topical.
3.3.5 Continuous Constituents and Rarity of Crossing Dependencies
Cross-linguistically, syntactic trees with crossing dependencies are rare. In the formal literature, such trees are called non-projective. They do not correspond to structures generated by lexicalized context-free phrase-structure grammars (see an overview in Yadav, Husain and Futrell 2021).
We often observe crossing dependencies if there is discontinuity between syntactically related constituents or their elements. Consider the sentence I met a man once who knew too much. An analysis of this sentence according to the Universal Dependencies style (Zeman et al. 2020) is presented in Figure 3.1. The arcs that represent syntactic dependencies (from heads to dependents) cross because the adverb once separates the head noun from the relative clause.

Figure 3.1 A sentence with crossing dependencies, according to the Universal Dependencies style
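Such configurations can be checked mechanically. In the sketch below, the head indices for the sentence above are hand-assigned in UD style (an illustrative annotation); two arcs cross iff exactly one endpoint of one lies strictly inside the span of the other:

```python
def crossing_arcs(heads):
    """Return all pairs of crossing dependency arcs.

    `heads` maps each 1-based token position to its head position
    (0 marks the root). Arcs are stored as sorted position pairs.
    """
    arcs = [tuple(sorted((dep, head)))
            for dep, head in heads.items() if head != 0]
    crossings = []
    for i, (a, b) in enumerate(arcs):
        for c, d in arcs[i + 1:]:
            # Exactly one endpoint strictly inside the other span.
            if (a < c < b < d) or (c < a < d < b):
                crossings.append(((a, b), (c, d)))
    return crossings

# "I met a man once who knew too much" (hand-assigned UD-style heads)
# 1=I 2=met 3=a 4=man 5=once 6=who 7=knew 8=too 9=much
heads = {1: 2, 2: 0, 3: 4, 4: 2, 5: 2, 6: 7, 7: 4, 8: 9, 9: 7}
print(crossing_arcs(heads))  # the met–once arc crosses the man–knew arc
```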
There is corpus evidence that language users avoid crossing syntactic dependencies (Nivre and Nilsson 2005; Havelka 2007; Ferrer-i-Cancho et al. 2018). Yadav et al. (2021) demonstrate that the actual syntactic trees found in corpora have fewer crossing dependencies than random baselines. This means that language users have a bias against crossing dependencies.
A possible explanation of this fact lies in the tendency to keep semantically and grammatically related units close, in accordance with the principle of information locality, which, in turn, is a manifestation of the principle of maximization of accessibility.
Discontinuities and crossing dependencies can arise due to planning issues in spontaneous language production and the tendency to put accessible information first, when these pressures override word order conventions. For example, discontinuities and crossing dependencies in Dutch are often motivated by the sentence bracket structure, which is similar to but looser than the German one, in the sense that diverse constituents are more often allowed to appear after the second (lexical) verb. The sentence in (16) contains a discontinuous nominal phrase with an extraposed prepositional phrase een auto … met zes deuren ‘a car … with six doors’, which is interrupted by the lexical verb gekocht ‘bought’. Since the lexical verb is connected with the auxiliary ‘have’, and ‘a car’ is connected with the prepositional phrase, this sentence contains crossing dependencies.
| Dutch (De Smedt and Kempen 1996: 148) | |||||||
| Ik | heb | een | auto | gekocht | met | zes | deuren. |
| I | have | a | car | bought | with | six | doors |
| ‘I have bought a car with six doors.’ | |||||||
As De Smedt and Kempen (1996: 161) write, such discontinuities offer advantages for incremental sentence production. Right dislocations allow the speaker to produce the constituents that are ready and postpone the ones that are more complex and ‘heavy’ to a later stage. Therefore, discontinuities can help to save time. They can also help to minimize dependency lengths, and therefore save memory costs, both for the speaker and the addressee.
3.3.6 Greenbergian Word Order Correlations and Implications
Probably the most famous universals in typology are Greenbergian word order correlations and implications. For example, Greenberg’s (1963) Universal 2 says, ‘In languages with prepositions, the genitive almost always follows the governing noun, while in languages with postpositions it almost always precedes.’ This is a bidirectional relationship. Notably, we find correlations between multiple features. For example, the order of Verb and Object correlates with the order of adposition and NP, copula verb and predicate, ‘want’-verb and its complement, complementizer and complement clause, question particle and sentence, verb and adpositional phrase, noun and relative clause, adjective and standard of comparison, and some others (Dryer 1992).
Many discussions of these multiple correlations involve the notion of harmony. Word orders are harmonic if they co-occur in a language as predicted by the correlations. For example, the orders Preposition + Noun and Noun + Genitive are harmonic, as are the orders Noun + Postposition and Genitive + Noun, whereas, for example, Preposition + Noun and Genitive + Noun would be disharmonic.
There is a plethora of explanations as to why harmonic orders are preferred. In particular, Dryer’s (1992) Branching Direction Theory focuses on the relative order of phrasal (recursive, branching) and non-phrasal (non-recursive, non-branching) elements. It claims that languages tend to prefer only one order: either phrasal elements followed by non-phrasal ones, or non-phrasal elements followed by phrasal ones. For example, in a language with VO and Noun + Relative Clause orders, the branching elements (Object and Relative Clause) follow the non-branching ones (Verb and Noun). In contrast, in a language with OV and Relative Clause + Noun, the branching elements precede the non-branching ones. Importantly, both languages would display harmonic orders.
Also, one often speaks of head-initial and head-final languages, depending on the order of the head (which is usually non-branching) and the dependent (which is usually branching), although Dryer (1992) showed that the criterion of branching direction is superior to that of head direction, in that the former predicts the cross-linguistic correlations more precisely. Moreover, the head status of some elements is controversial and depends on the theoretical framework.
Can efficiency explain these correlations? There is a possibility that they emerge due to the pressure to minimize processing costs. In particular, Hawkins (e.g., 1994, 2014) argues that head-initial, or right-branching, and head-final, or left-branching, languages satisfy the principle Minimize Domains, while mixed languages do not. For example, we can create four possible scenarios for a verb with an adpositional phrase (Hawkins 2014: 90, 99), where (17a) represents a right-branching and head-initial structure, (17b) represents a left-branching and head-final structure, and (17c) and (17d) are mixed.
| a. | [vp went [pp to the movies]] |
| b. | [[the movies to pp] went vp] |
| c. | [vp went [the movies to pp]] |
| d. | [[pp to the movies] went vp] |
The verb phrase recognition domains are underlined. According to Hawkins, the harmonic orders in (17a) and (17b) are efficient for processing because they result in smaller domains. This is why they are common cross-linguistically, unlike the non-harmonic variants in (17c) and (17d).
Using dependencies instead of constituents, Temperley (2008) argues that a ‘same-branching’ grammar will result in shorter dependencies. At the same time, in the case of multiple dependents and one head, it can be advantageous when one-word constituents branch in the opposite direction. For example, it is efficient to put an adverb before the verb, as in (18a). Compare it with (18b), where the adverbial modifier is long and should not be placed before the verb.
| a. | She is quickly rising in the music industry. |
| b. | She is rising in the music industry too quickly for her age. |
Another relevant factor is analogy. This can be interpreted as a kind of priming due to structural or semantic similarity of the current structure to one experienced before. It was argued in Section 3.2.2 that priming occurs due to increased accessibility of a recent form or meaning, so analogy can be seen as a result of accessibility maximization. Analogy in the order of functionally similar units can be beneficial for processing because it allows us to reuse the same accessible schema (MacDonald 2013; see also Section 3.2.2). For example, the orders Verb + Object, Verb + Adverb and Auxiliary + Non-finite Verb can be generalized as the order of a finite verb followed by something else. Previous linguistic experience and immediate context with Verb + Object can prime the other two orders, making them more accessible, which makes production and comprehension easier.
The advantages of harmonic word orders may not be restricted to processing optimization only. They can also be easier to learn. For example, artificial language experiments reveal that adult and child language learners prefer harmonic word orders in the nominal phrase (e.g., either Adjective + Noun and Numeral + Noun, or Noun + Adjective and Noun + Numeral). This result does not depend on whether the learners’ L1 is itself harmonic (Culbertson, Smolensky and Legendre 2012; Culbertson, Schouwstra and Kirby 2020).
Moreover, we should not underestimate the role of diachronic processes; for example, adpositions develop from verbs or nouns, which determines whether they become prepositions or postpositions (cf. Dryer 2019). This can explain some correlations (but not all, as shown in Section 5.6).
Let us now move to word order implications, which represent one-directional relationships between different word order patterns. Implications usually emerge as a result of competing motivations in language (cf. Croft 2003: Section 3.4). For illustration, consider Greenberg’s Universal 25, ‘If the pronominal object follows the verb, so does the nominal object’ (Greenberg 1963). This is an implicational universal because it works only in one direction: if the nominal object follows the verb, the pronominal object may or may not do the same. This universal can be explained by two competing principles: the tendency to put accessible and short constituents first, and analogy, which means that functionally similar constituents should have the same position, due to the reasons explained above. In other words, the accessibility of specific words competes with accessibility of the abstract schema (cf. MacDonald’s [2013] principles Easy First and Plan Reuse in Section 3.2.2).
Figure 3.2 shows how often nominal and pronominal objects occur after the lexical verb in the Universal Dependencies corpora (version 2.6, Zeman et al. 2020). The numbers are proportions relative to the total number of objects of each type. The labels are the ISO 639-3 codes of the languages. In languages such as Hindi, Turkish, Japanese and others, which are located in the bottom left corner, both pronouns and nouns precede the verb, e.g., I ice-cream love and I it love. Here, the principle of analogy is fulfilled, but the maximization of accessibility of specific constituents is achieved only partially. The speaker can indeed produce accessible pronouns early, but there is no extra time for less accessible nouns. In the languages located in the top right corner (Arabic, English, Hebrew, Indonesian, Irish and others), pronominal and nominal objects follow the verb, e.g., I love ice-cream and I love it. Abstract analogy works here, too, whereas the planning of specific units is optimal only for nouns because the speaker cannot produce accessible pronouns early. The Romance languages (French, Catalan, Spanish and others), which are located in the bottom right corner, have preverbal pronominal objects and postverbal nominal objects, e.g., I love ice-cream, but I it love. Here, the management of light and heavy objects is optimal, but the principle of analogy is not observed because the objects have different positions.

Figure 3.2 Proportions of nominal objects (horizontal axis) and pronominal objects (vertical axis) after verbs in the Universal Dependencies corpora
The top left corner is empty, in full accordance with Greenberg’s Universal. There are no languages with preverbal nominal objects and postverbal pronominal objects, such that one could say I ice-cream love, but I love it. This order would violate both principles. While some of the attested languages correspond to this principle well (the Romance languages, in particular), and the others correspond to some extent (but are probably easier to learn, due to analogy), unattested languages would be very inefficient in terms of processing. They could also be more difficult to learn.
3.4 Star Wars and Violations of Conventional Word Order
In all the above-mentioned examples, the speaker normally prefers a low-cost word order to a high-cost one. But in some cases an inefficient word order can be deliberately and ostensively chosen by the speaker in order to trigger certain cognitive effects in the addressee. This is accompanied by a violation of word order conventions. A famous example is the speech of Yoda, a powerful Jedi Master from the Star Wars universe, who appeared in most of the films of the franchise (Episodes I, II, III, V and VI, as well as the sequels The Force Awakens and The Last Jedi, as a voice). Yoda belongs to an unknown species. One of his distinctive characteristics, in addition to large green ears, is the use of unusual word order patterns. Some examples are provided below:
| a. | Friends you have there. (Episode V) |
| b. | Help you it will. (Episode II) |
| c. | The secret of the Ancient Order of the Whills, he studied. (Episode III) |
Yodish word order has been described by some linguists as OSV or XSV, where X stands for any complement that goes with the verb. A more precise description would be as follows:
(20) Non-finite part of predicate/Object/Oblique – Subject – Finite Verb/Auxiliary/Copula
The first part can be an object, an oblique, the nominal part of the predicate or a non-finite part of the predicate, i.e., a participle or infinitive with dependent elements. This part is followed by the subject and the finite verb, auxiliary or copula. Below are some examples that support this generalization:
| a. | Rest I need (Episode VI) |
| b. | To his family, send him. (Episode III) |
| c. | A certainty it is. (Episode II) |
| d. | Hard to see, the dark side is. (Episode I) |
| e. | Earned it, I have. (Episode VI) |
However, there are some exceptions, as in the examples below:
| Copula/AUX – subject: | |
| a. | Not ready for the burden were you. (Episode VI) |
| b. | Heard from no one, have we. (Episode III) |
| Object – subject – auxiliary – lexical verb: |
| The outlying systems, you must sweep. (Episode III) |
Importantly, some of Yoda’s sentences have a standard word order:
| a. | Master Obi-Wan has lost a planet. (Episode II) |
| b. | A Jedi’s strength flows from the Force. (Episode III) |
| c. | That place is strong with the dark side of the force. (Episode V) |
Of course, the famous formula May the Force be with you is in standard English, too.
Remarkably, Yodish word order has longer dependency distances on average than its standardized version (Levshina 2019c). This conclusion is based on the Yodish data collected from the Internet Movie Scripts Database. Data from five episodes were used: two episodes from the original trilogy (Episodes V and VI) and three episodes from the prequel trilogy (Episodes I, II and III).
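The measure itself is straightforward to reproduce. The sketch below computes mean dependency distance for one Yodish sentence and its standardized counterpart, using hand-assigned UD-style head indices (an illustrative annotation, not the published dataset):

```python
def mean_dep_distance(heads):
    """Mean linear distance between heads and their dependents.

    `heads` maps 1-based token positions to head positions;
    head 0 marks the root and is excluded from the average.
    """
    dists = [abs(dep - head) for dep, head in heads.items() if head != 0]
    return sum(dists) / len(dists)

# Yodish: "Failed to stop the Sith Lord, I have"
# 1=Failed 2=to 3=stop 4=the 5=Sith 6=Lord 7=I 8=have
yodish = {1: 0, 2: 3, 3: 1, 4: 6, 5: 6, 6: 3, 7: 1, 8: 1}
# Standardized: "I have failed to stop the Sith Lord"
# 1=I 2=have 3=failed 4=to 5=stop 6=the 7=Sith 8=Lord
standard = {1: 3, 2: 3, 3: 0, 4: 5, 5: 3, 6: 8, 7: 8, 8: 5}

print(mean_dep_distance(yodish))    # 22/7 ≈ 3.14
print(mean_dep_distance(standard))  # 12/7 ≈ 1.71
```

The fronted predicate separates the subject and auxiliary from their head by six and seven positions respectively, which is what inflates the Yodish average.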
The higher processing costs are to a large extent due to the fact that the auxiliaries and copulas are often separated from the non-finite and nominal parts of the predicates. Below is an example:
| a. | Failed to stop the Sith Lord, I have. (Episode III, original) |
| b. | I have failed to stop the Sith Lord. (standardized) |
Thus, Yodish is less efficient than standard English due to the separation of non-finite, lexical parts from the auxiliaries and copulas. This pattern is not only inefficient, it is also quite unrealistic. The reason is that grammaticalized elements, such as auxiliary verbs, arise in highly predictable contexts (see Section 5.4). For instance, the future marker going to is reduced (cf. gonna) and semantically bleached in the contexts where it is followed by a verb. When the auxiliary is not accompanied by the lexical part, it is less predictable and therefore less likely to undergo formal reduction and semantic change. Frequent co-occurrence of the elements is necessary for grammaticalization and also explains why auxiliaries usually lose their positional freedom (Lehmann 2015: 168). The existence of auxiliaries in Yodish that are often split from their lexical elements is then difficult to explain. This shows that Yodish is truly alien.
However, these additional processing costs are counterbalanced by additional cognitive effects. In particular, we can speak here of defamiliarization, a theoretical concept from Russian Formalism introduced by Viktor Shklovsky:
The technique of art is to make objects ‘unfamiliar’, to make forms difficult, to increase the difficulty and length of perception because the process of perception is an aesthetic end in itself and must be prolonged.
Such violations provide additional cognitive effects of ‘strangeness’, which are important for the creation of a new fictional universe. George Lucas and the other film creators responsible for Yoda’s syntax seem to be exploiting the principle of a positive correlation between benefits and costs. Extra effort spent during the processing of Yoda’s utterances promises the film audience extra benefits in the form of additional inferences. As Yoda himself says, ‘You must unlearn what you have learned’ (Episode V). Thus, although we can conclude that Yoda’s word order is not optimal by itself, it is perfectly efficient for the communication between the film creators and the audience.
3.5 Conclusions
This chapter discussed the main criteria of efficient order of meaningful elements, from morphemes to words and syntactic constituents. Examples of efficient and inefficient order were also given, with alternative theories and accounts explaining language users’ preferences in production and difficulties in comprehension. It is not always easy to tell which explanatory factors are relevant and which are not.
Speaking very broadly, we can say that accessibility plays a crucial role in determining efficient order, similar to the coding length asymmetries discussed in Chapter 2. Many cross-linguistic generalizations – from relative position of morphemes and nominal phrase elements to the prevalence of subject-first orders and rarity of crossing dependencies – can be explained by the principle of maximization of accessibility. Accessibility has different aspects. One of them is the availability of a mental representation due to the semantic and discourse properties of a referent. For example, discourse-given referents are more accessible than new ones. Moreover, accessibility is also determined by the availability of a strong trace of an exemplar in the memory, due to its recency, as well as by surprisal of linguistic units.
Word order interacts with coding length. For example, if the position of an element is non-canonical, more articulatory effort can be necessary because the element will be less expected and therefore less accessible. In Section 3.2.4 it was mentioned that a non-iconic order of events in a sentence leads to the use of longer expressions (e.g., I conquered after I came). Consider a different example from Warlpiri discussed by Hawkins (2004: Ch. 6). When NP constituents are adjacent, as in (26a), the ergative case marking occurs just once in the NP and is not copied on all constituents. However, case copying occurs only if a noun and a dependent adjective are non-adjacent, as in (26b):
| Warlpiri: Pama-Nyungan (Hale 1973: 314) | ||||
| a. | tyarntu | wiri-ngki+tyu | yarlki-rnu | |
| dog | big-erg+me | bite-pst | ||
| b. | tyarntu-ngku+tyu | yarlku-rnu | wiri-ngki | |
| dog-erg+me | bite-pst | big-erg | ||
| ‘The big dog bit me.’ | ||||
Another example is Korean, where object marking is probabilistic and depends on numerous parameters (see Chapter 8). All other things being equal, object marking is less likely to occur if the object is adjacent to the verb (Kim 2008). These examples suggest that lower accessibility of the grammatical function due to word order can be compensated for by additional coding.
One caveat is that many of the existing processing theories are based on WEIRD languages, such as English and Dutch, which have many cross-linguistically rare features. However, recent typologically informed work (e.g., Martin and Culbertson 2020) suggests that the results based on such languages should not be extrapolated to all languages automatically. This means that some ideas presented here might be challenged later when more diverse languages are taken into account.
4.1 Efficiency Beyond Coding Length and Word Order
In the previous two chapters we examined how language users save effort and time by using expressions of different length, or by rearranging meaningful elements. These strategies have received substantial attention in the literature. This chapter describes other methods of saving effort that are less frequently discussed but are equally important. These strategies include the following:
the use of more accessible forms from a set of alternatives. Accessibility here stands for higher frequency (or expectedness) and greater transparency. The related strategies are discussed in Sections 4.2 and 4.3, respectively;
horror aequi, or avoidance of identity, which helps to prevent similarity-based interference. This strategy is examined in Section 4.4;
avoiding cognitive overload in the process of integrating new referents. This method is considered in Section 4.5.
These strategies are related to minimization of processing costs by maximizing accessibility, as will be shown below. They also interact with the other strategies of saving effort and time described in the previous chapters.
4.2 Preference for Accessible Units and Interpretations
As discussed in Chapter 3, word order plays an important role in increasing accessibility at a particular point in discourse. For example, minimization of dependency distances and syntactic domains can be regarded as strategies for maximizing accessibility. We can also increase accessibility by choosing the most accessible forms in production or the most accessible interpretation in comprehension. Unlike in the previous chapter, where we discussed syntagmatic choices, this is a paradigmatic choice. This tendency is probably so obvious that it is hardly noticed. But there is some theoretical support for this thinking. For example, Sperber and Wilson (1995) argue that addressees are very efficient information processors: they choose the interpretation that maximizes cognitive effects while minimizing cognitive costs. The latter component means that they choose the most accessible interpretation at the given moment. Section 5.4.1 argues that semantic bleaching, an important grammaticalization process, can be explained by the addressee’s preference for the most accessible meaning of an expression. Similarly, we can say that speakers also minimize their processing costs, in particular by choosing the most accessible forms from a set of alternatives compatible with their communicative goals.
As in the previous chapters, several quantitative measures of accessibility can be relevant. It has been shown that objects are named faster in picture-naming experiments if their names are more frequent (Wingfield 1968). Moreover, disfluencies, which indicate difficulties in language production, are less likely to occur before frequent expressions (Schnadt and Corley 2006). As for comprehension, numerous studies demonstrate that it takes less time to recognize more frequent words in comparison with less frequent ones (e.g., Howes and Solomon 1951). Also, fixation time on infrequent words is longer than on frequent ones, as one can see from eye-tracking studies (e.g., Rayner and Duffy 1986).
Predictability from previous context plays an important role in production and comprehension, as well. Words that are highly predictable from context are produced faster (Cohen and Faulkner Reference Cohen and Faulkner1983). There is substantial evidence that units with high surprisal (that is, low conditional probability given previous words, as estimated from corpora) are more difficult for comprehension than units with low surprisal (i.e., high predictability) (Hale Reference Hale2001; Levy Reference Levy2008), as one can see from reading times (e.g., Smith and Levy Reference Smith and Levy2013) and from brain activity patterns (Frank et al. Reference Frank, Otten, Galli and Vigliocco2015).
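The notion of surprisal mentioned above is conventionally defined as the negative log probability of a unit given its preceding context. A minimal sketch of how such an estimate can be obtained from raw corpus counts, using a simple bigram model over a toy corpus (the corpus and word choices here are illustrative, not real experimental data):

```python
import math
from collections import Counter

def bigram_surprisal(corpus_tokens, context_word, target_word):
    """Estimate the surprisal (in bits) of target_word given the
    immediately preceding context_word, from raw token counts:
    surprisal = -log2 P(target | context)."""
    bigrams = Counter(zip(corpus_tokens, corpus_tokens[1:]))
    unigrams = Counter(corpus_tokens)
    # Conditional probability P(target | context) via relative frequency
    p = bigrams[(context_word, target_word)] / unigrams[context_word]
    return -math.log2(p)

# Toy corpus: "his" is followed by "steed" twice and "horse" once,
# so "steed" after "his" is more predictable (lower surprisal).
corpus = "he saddled his steed and rode his steed home on his horse".split()
print(bigram_surprisal(corpus, "his", "steed"))  # about 0.58 bits
print(bigram_surprisal(corpus, "his", "horse"))  # about 1.58 bits
```

In practice, surprisal estimates reported in the reading-time literature come from much larger corpora and from n-gram or neural language models rather than raw bigram counts, but the quantity being estimated is the same.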
Accessibility is lower when there are interfering units with similar semantic or formal properties. During word retrieval, the lexical nodes of related words are activated. For example, if the speaker wants to produce the word ‘cow’, the lexical nodes of ‘bull’, ‘goat’, ‘cattle’, ‘animal’, etc. will be activated, as well. These nodes compete with each other. As a result, the more similar the activation levels of the nodes, the more difficult the decision (Schriefers, Meyer and Levelt Reference Schriefers, Meyer and Levelt1990; Roelofs Reference Roelofs1992). For example, an eye-tracking study (Rayner and Duffy Reference Rayner and Duffy1986) revealed that participants spent a longer time fixating ambiguous words with two equally likely meanings than fixating ambiguous words with one highly likely meaning. In production, naming a pictured object (e.g., a shark) was slower when a competing word (e.g., whale) had been recently elicited by a definition (e.g., ‘a very large mammal that lives in the sea’), which suggests a lexical interference effect (Wheeldon and Monsell Reference Wheeldon and Monsell1994). The same holds for constructional variation. For example, when language users were forced to produce a prepositional dative in contexts where a double-object dative was more expected on the basis of available contextual features, or the other way round, they gestured more and were more likely to be disfluent than when they produced more preferred structures (Cook, Jaeger and Tanenhaus Reference Cook, Jaeger and Tanenhaus2009). This indicates that production of more accessible constructional variants is less costly than production of less accessible ones.
We usually produce the most accessible forms, and choose the most accessible interpretation. But we also sometimes use less accessible ones in order to create desirable cognitive effects. Consider an illustration. Chapter 1 discussed Levinson’s (Reference Levinson2000) heuristics and implicatures (Section 1.4.2). Recall that I-implicatures contain the message that the meaning is ordinary, typical, expected, while M-implicatures suggest that the situation is non-stereotypical. In most cases, the contrasts also have length asymmetries, which means that they can be explained by the principle of negative correlation between accessibility and costs (see Chapter 2). For example, if we call someone’s house a ‘mansion’, we can imply ironically that it is pretentious and immodest (Levinson Reference Levinson2000: 138). But in some cases, the length difference between available options is small or non-existent. Examples are some pairs of cross-register doublets, as in the following example from Levinson (Reference Levinson2000: 139):
(1)
a. He was reading a book.
   (I-communicates → He was reading an ordinary book.)
b. He was reading a tome.
   (M-communicates → He was reading some massive, weighty volume.)
How should we interpret such contrasts? First of all, stylistically marked words may be more difficult to retrieve from memory and to comprehend. Their use signals to the addressee that the meaning is less accessible (a large, heavy book). Moreover, we should recall the principle of positive correlation between benefits and costs: rare words are more costly to process, but they can create additional cognitive effects as compensation for the extra effort. The effects can take the form of elevated style or irony, as in the case of mansion used to refer to someone’s house. We can also expect that the ironic interpretation is supported by a special intonation and facial expression, which help the addressee infer the intended meaning.
The addressee’s processing costs can also be reduced by linguistic co-text and situational context. As an illustration, take the pair of stylistic synonyms horse and steed (Levinson Reference Levinson2000: 139). The accessibility of the word steed can be higher in actual language use than in isolation because the word occurs mostly in fiction and with specific attributes that make its surprisal low (e.g., his noble/trusty/mighty steed).
4.3 Analytic Support
It has been claimed that a sentence can be easier to produce and comprehend, especially in cognitively demanding contexts, if the speaker uses analytic forms instead of synthetic ones. The choice can be made in the following cases, for example:
English adjectival forms of comparison (e.g., cleverer – more clever, fuller – more full);
the English genitive alternation (e.g., the topic’s relevance – the relevance of the topic);
English subjunctive alternation (if he agree-Ø vs. if he agrees vs. if he should agree);
German past tense alternation (sie brauchte – sie hat gebraucht ‘she needed’);
English future tense alternation (will – going to), since will is often contracted to ’ll;
Spanish future tense alternation (e.g., comeré vs. voy a comer ‘will eat’).
Although these forms differ in length (cf. Chapter 2), they also differ in the degree of autonomy of the grammatical elements. The choice between more and less bounded expressions can play an independent role for efficiency. In fact, Wilhelm von Humboldt claimed as early as 1836 that analyticity increases explicitness and transparency while decreasing comprehension difficulty (Humboldt Reference Humboldt1836: 284–285).
Mondorf (Reference Mondorf, Moravcsik, Malchukov and MacWhinney2014) argues that the use of these forms can be explained in terms of processing demands. She supports her claims by showing that analytic forms are often used in situations that require more processing effort, while synthetic forms are used in easy-to-process environments. Complexity is multifactorial, and depends on many properties of contexts, from phonology to semantics and syntax. For example, analytic comparative and superlative forms are preferred when the word ends in a consonant cluster, as in strict or apt. Negation is also considered to add complexity to the context. This is why negated contexts generally increase the chances of the longer variant. This correlation has been found for the analytic and synthetic future in Mexican Spanish (Lastra and Butragueño Reference Lastra and Butragueño2010) and for the English subjunctive with zero-inflected verbs vs. would-subjunctive (Schlüter Reference Schlüter, Rohdenburg and Schlüter2009). Syntactic complexity can also play a role. For example, as shown by Szmrecsanyi (Reference Szmrecsanyi2003), the more analytic future form with going to is more frequently used in structures that are more complex to process (longer sentences and dependent clauses) than the will-future (due to the frequent use of the contracted form ’ll, it can be considered more synthetic).
Some examples of variation have to do with the accessibility of specific meanings and interpretations. For example, the analytic more-variant is chosen more often with adjectives that are infrequently used in the comparative (Mondorf Reference Mondorf, Rohdenburg and Mondorf2003: 260–261). Also, the use of a synthetic comparative form is facilitated if a comparative form of any type has been previously activated in context (Mondorf Reference Mondorf, Rohdenburg and Mondorf2003: 285–286), and the comparative meaning becomes more accessible. Moreover, abstract and figurative concepts are regarded as more complex than concrete and literal ones. Compare the figurative use of bitter in the more bitter takeover battles of the past with a literal use: the beer is bitterer.
Note that all the examples provided in this subsection display differences in length. Some of these can be explained by the principle of negative correlation between accessibility and costs. In particular, this motivates the user to choose more costly forms to signal that the information is less accessible due to the more complex environment.
But other explanations have been proposed, as well. For example, Szmrecsanyi (Reference Szmrecsanyi2003: 23) writes,
Because BE GOING TO typically contains more material than WILL/SHALL, it provides a sort of redundancy that will ease online processing for hearers by making the predication more accessible.
Also, it may be advantageous for the speaker to use the longer form when they have planning problems. For example, by using the longer form be going to, speakers can ‘stall’ for planning time (Szmrecsanyi Reference Szmrecsanyi2003: 23). This means that the longer forms can prevent cognitive overload both for the speaker and the addressee and avoid a breakdown in communication. However, this interpretation does not sound very plausible. There are more convenient devices for dealing with planning difficulties, most importantly, disfluency markers (e.g., um and uh), or word lengthening (e.g., theeee).
Another possible explanation of analytic support in complex environments has to do with transparency. The more transparent analytic forms can be easier to process, even if the length is the same. Why should that be the case? It is possible that at least some morphemes are weaker cues of the grammatical category than auxiliary words. For example, unlike the word more, the morpheme ‑er is ambiguous, being used to form both comparative forms and agentive nouns, e.g., The boy is a little cleaner. Also, suffixes can exhibit allomorphy due to phonological and other conditions, whereas auxiliary words are more formally stable. Analytic forms are also regular, while synthetic ones can be irregular. All these factors can create cognitive advantages for production and comprehension of analytic forms. Affixes and clitics are also more formally reduced and may be more difficult to identify in a noisy channel, even if they are highly frequent. For example, Hopper and Traugott (Reference Hopper and Traugott1993: 65) argue that the form going to is more substantive and therefore more accessible to hearers than ’ll or even will. This means that different aspects of accessibility (in particular, in terms of memory retrieval and perception) can be in conflict.
In general, the preference for analytic expressions is stronger in spoken language than in writing. According to Szmrecsanyi (Reference Szmrecsanyi2009), who studied analytic and synthetic expressions in English, explicitness and transparency are particularly important for spoken communication. But the higher analyticity of speech can also be explained by the pressure for maximization of accessibility in situations when interlocutors have to compete for the floor. Analytic expressions consist of highly frequent function words, which are easy to access. Compare the highly accessible analytic expression be happy with the less accessible lexeme rejoice. More research is needed in order to disentangle these factors.
4.4 Horror Aequi, or Avoidance of Identity
The principle horror aequi, or avoidance of identity, says that language users tend to avoid production of formally or structurally similar units close to one another. In phonology, this principle is also known as the Obligatory Contour Principle (Leben Reference Leben1973). For example, Rohdenburg (Reference Rohdenburg, Rohdenburg and Mondorf2003) points out that the bare infinitive after help is more likely if there is to before help. Consider an example:
(2)
a. She corrected me because she wanted to help me improve my German.
b. She corrected me because she wanted to help me to improve my German.
Language users are likely to avoid the second to before improve. More information about this alternation is provided in Section 9.3.
The cognitive motivation of this avoidance is similarity-based interference (MacDonald Reference MacDonald2013). For example, when two semantically related nouns are planned and uttered in close proximity, e.g., the saw and the axe, the production of utterances takes more time, and more errors are made, than when the nouns are unrelated, e.g., the saw and the cat (Smith and Wheeldon Reference Smith and Wheeldon2004). When one word is chosen for production (e.g., the saw), its semantic neighbours (e.g., the axe) need to be inhibited, which makes it more difficult to retrieve them again.
Interestingly, Gennari, Mirković and MacDonald (Reference Gennari, Mirković and MacDonald2012) show that participants produced active and passive relative structures about equally often if the head noun was inanimate, e.g., the bag being punched by the woman – the bag the woman is punching. But they used passives more often when the head noun was animate, e.g., the man being punched by the woman. Moreover, when participants used passive structures, they more frequently omitted the agent if the head noun was animate, the man that’s being punched, than when the head noun was inanimate, e.g., the bag that’s being punched by a woman. This can be regarded as avoidance of interference between semantically similar nouns (that is, the man and the woman).
Inhibitory effects of similarity can also affect comprehension. Evidence from different cognitive domains (including semantic, visual and kinaesthetic information, tones and odours) reveals the same tendency: when some items are followed by stimuli that are similar to them along some dimension, the original items are forgotten more quickly than in the absence of similarity (Lewis Reference Lewis1996; Van Dyke and McElree Reference Van Dyke and McElree2011).
Note that the interfering units need to be close to each other and be simultaneously present in working memory. If there is sufficient distance between them, formal and semantic similarity can in fact increase the accessibility of the target units, as evidence from structural priming shows (e.g., Bock et al. Reference Bock, Loebell and Morey1992).
Similarity-based interference also potentially explains the fact that sentences with double centre-embedded clauses are very difficult to process. Consider the following example, which was discussed in the previous chapter:
(3) The administrator who the intern who the nurse supervised had bothered lost the medical reports.
Many popular accounts are based on some ideas about the limited capacity of working memory (e.g., Gibson Reference Gibson1998, Reference Gibson, Marantz, Miyashita and O’Neil2000; see Section 3.2.1). But the problems with (3) and similar sentences may also be due to the presence of too many structurally and semantically similar constituents with the same syntactic function (Lewis Reference Lewis1996). Remarkably, if we keep the same centre-embedded relative clause structure but make the forms more diverse, the sentence is easier to process:
(4) The administrator everyone I supervised had bothered lost the medical reports.
Another piece of evidence is V. Ferreira and Firato’s (Reference Ferreira and Firato2002) experiment, in which two kinds of stimuli were presented. In sentences like (5a), the target noun phrase was conceptually similar to three previous noun phrases in the same sentence, leading to greater similarity-based interference. In sentences like (5b), the target phrase was conceptually dissimilar, leading to less interference. The use or omission of the complementizer that was distributed evenly between the semantic conditions.
(5)
a. The author, the poet, and the biographer recognized (that) the writer was boring.
b. The author, the poet, and the biographer recognized (that) the golfer was boring.
The task was to recall the sentences. Interestingly, speakers produced the complementizer more often before conceptually similar noun phrases, as in (5a), than before dissimilar ones, as in (5b). There were also more disfluencies. This means that similarity between elements can indeed cause additional processing costs (see also Walter and Jaeger Reference Walter, Jaeger, Edwards, Midtlyng, Sprague and Stensrud2008).
It is efficient therefore to avoid such interference. This can be seen as maximization of accessibility, or rather minimization of inaccessibility, because units that are highly similar to the ones used in the near context are temporarily less accessible.
4.5 Entry Place for New Referents
The last strategy discussed in this chapter has to do with the addressee’s processing resources and the distribution of new and therefore less accessible referents in discourse. There are well-documented universal preferences in the organization of discourse, known as the Preferred Argument Structure (Du Bois Reference Du Bois1987). One of these preferences is called ‘Avoid more than one new core argument’. Another formulation is Chafe’s (Reference Chafe and Tomlin1987: 32) principle ‘one new concept at a time’. It is believed that the introduction of new referents into the current discourse has high processing costs. This is why clauses with more than one new participant are avoided across different languages (Du Bois et al. Reference Du Bois, Kumpf and Ashby2003). For example, the sentence in (6) would have high processing costs:
(6) A German orders a martini.
However, when the information is introduced piecemeal, processing is easier, although the articulation costs are higher. Consider the joke in (7), where the story begins with the formula ‘An X walks into a bar …’.
(7) A German walks into a bar and orders a martini. The bartender asks ‘dry?’ The German says ‘Nein, just one.’
There are different views about whether one needs special structures to facilitate the integration of new referents. Du Bois (Reference Du Bois1987) argues that intransitive subjects and direct objects are suitable entry points. Particularly useful are semantically bleached intransitive predicates like come, arrive and appear, which provide little conceptual information beyond the appearance of the new referent. Other useful structures can be presentational constructions (Lambrecht Reference Lambrecht1994), such as English there is. At the same time, Schnell, Schiborr and Haig (Reference Schnell, Schiborr and Haig2021) do not find clear indications that syntactic argument structure is sensitive to newness. In their cross-linguistic spoken corpora, the only syntactic functions that show a consistently high proportion of new referents are direct objects and various oblique arguments. Schnell et al. argue that the specialized constructions are important only in very local discourse contexts, such as introductions of characters at the outset of a narrative (e.g., Once upon a time there was a little girl …) or major scene transitions. Objects are convenient entry points because they allow the speaker/writer to anchor new referents to a state of affairs with an already established referent. This may be more efficient than isolating the referent in a special introductory clause, also because adding a new clause would be more costly. It is not clear yet whether word order (in particular, OSV or OVS) can affect this preference, though.
To summarize, language users avoid structures that require simultaneous integration of more than one new referent into discourse. This avoidance has several benefits. First, the preference for step-by-step introduction of new referents helps to avoid an overflow of inaccessible information. Moreover, when we have two or more new referents waiting for their integration, this can create competition and interference in the memory, similar to the effects discussed in the previous section. In addition, useful cognitive effects, which represent the main benefits of language communication (see Section 1.2.1), arise when we integrate old information with new, and make new relevant conclusions. If new information cannot be integrated with previous knowledge within a reasonable stretch of discourse, its relevance is questionable. As a result, benefits cannot be obtained, and the cooperation between the speaker and the addressee will be disrupted.
4.6 Conclusions
In this chapter we discussed types of efficiency beyond coding length asymmetries and word order. Language users have a preference for forms and meanings with higher accessibility, which can be understood broadly as ease of retrieval from long-term memory, transparency, or absence of interfering competitors. Higher accessibility facilitates processing. The examples of efficiency discussed here obviously require more research. In particular, we need to understand which aspects of accessibility facilitate processing, and how this preference interacts with other strategies, such as using longer forms for less accessible meanings and providing additional cognitive benefits to compensate for higher costs.