2.1 Efficient Length Asymmetries
This chapter describes efficient use of linguistic units with different lengths. Length represents articulation costs, but it can also be interpreted as time expenditure. It is difficult to separate articulation costs from time costs. In phonological studies, one often measures the duration of units (e.g., Aylett and Turk 2004), while in studies of lexicon and grammar, which are often based on written corpora, the focus is usually on the number of words, segments or letters (see examples below). Although articulation effort also depends on stress and amount of articulatory detail, which require carefully annotated spoken data, I will focus here on length, which is easier to measure and compare.
As mentioned in Chapter 1, articulation is the slowest and most energy-consuming stage in human communication. The speaker can spare effort and time by omitting or shortening the forms that represent accessible information – that is, the information already available to the addressee, or easily inferable from the context and general knowledge. In contrast, more effort should be spent on information that is less accessible. This behaviour corresponds to the principle of negative correlation between accessibility and costs. We speak of an efficient length asymmetry when there is a negative correlation between formal length and accessibility of information. The sections below illustrate diverse formal asymmetries that display this correlation.
Formal length asymmetries are extremely diverse. Some efficient asymmetries are fully conventionalized, as, for example, zero marking of singular and non-zero marking of plural, e.g., book – books. Some asymmetries are context-dependent and require pragmatic inference. For example, when someone gives their address to a taxi driver, they in fact exploit the principle of negative correlation between accessibility and costs. An expression like Park Street, 23, please is efficient, unlike saying I need to get to Park Street, house number 23, in this city. I want you to take me there in your cab now and promise to pay you a certain amount of money in return if you get me there.
The principle of negative correlation between accessibility and costs is also responsible for so-called bridging implicatures. For example, if your friend says, I bought a new bicycle yesterday. The saddle is very comfortable, you will understand that the saddle belongs to the bicycle that your friend has bought. Your friend relies on your ability to access the knowledge that a typical bicycle has a saddle. This allows your friend to spare effort instead of saying The saddle of the bicycle I bought yesterday is very comfortable. This type of efficiency is pervasive in discourse.
Importantly, by opting for a longer or shorter expression, the speaker signals how accessible the intended interpretation is. The length itself represents an instruction for where to search for the interpretation. For example, the pronoun she, as discussed in the next section, means that the referent is not only female and singular, but also highly accessible, whereas the definite description the friend means not only a ‘close acquaintance’, but also that the referent has a relatively low degree of accessibility (Ariel 2001: 29). In this sense, every linguistic expression the speaker chooses is also a marker of the accessibility of its interpretation (cf. Ariel 2001). Some marking of this type involves the speaker’s choice, as in the referential expressions mentioned above (see Section 2.2), while some is fully conventionalized, as in the obligatory marking of grammatical categories (see Section 2.3). I argue that the emergence and maintenance of obligatory marking follows the same pragmatic principle as the speaker’s choice between different coding possibilities in optional marking – namely, the principle of negative correlation between accessibility and costs.
2.2 Accessibility of Referents and Length of Referential Expressions and Markers
2.2.1 Efficient Use of Referential Expressions: Hierarchy of Explicitness
An important type of efficient context-dependent asymmetries is observed in referential expressions. One can formulate a hierarchy of explicitness of such expressions (Ariel 1990, 2001; Arnold 2010; see also Givón 1983, 2017), as shown in (1):¹
| (1) | Hierarchy of explicitness: |
| | Most explicit |
| | Semantically rich expressions (the most popular teacher at our school) |
| | Shorter nominal expressions (Ann, the teacher) |
| | Pronouns (she) |
| | Zeros |
| | Least explicit |
This variation is constrained by the degree of accessibility of mental representations of the referents. The notion of accessibility was introduced in Chapter 1. Highly accessible representations are expressed by shorter forms than less accessible ones. Note that there can be more subtle accessibility distinctions within these broad categories, which cannot be explained by length alone. For example, James as a surname can signal lower accessibility than James as a first name (Ariel 2001). In Russia, colleagues would refer to me using the full patronymic form Natalja Gennadievna in front of the students, and would say Natasha when speaking with other colleagues, although the accessibility of me as the referent could be the same. As discussed in Section 1.2.2, social costs often interact with formal length.
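The relation between the hierarchy in (1) and formal length can be made concrete with a small calculation. The following is only a minimal sketch: the forms are the examples from (1), but the numeric accessibility ranks are an illustrative coding assumed here (1 = least accessible referent), not measured values, and length is counted in letters for simplicity.

```python
# Forms from the hierarchy in (1), paired with an ASSUMED ordinal
# accessibility rank (1 = least accessible, 4 = most accessible).
forms = [
    ("the most popular teacher at our school", 1),
    ("the teacher", 2),
    ("she", 3),
    ("", 4),  # zero expression
]


def length_in_letters(form):
    """Count alphabetic characters only, ignoring spaces."""
    return sum(ch.isalpha() for ch in form)


def ranks(values):
    """Return 1-based ranks of the values (assumes there are no ties)."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]


def spearman(xs, ys):
    """Spearman rank correlation for tie-free data."""
    n = len(xs)
    rank_x, rank_y = ranks(xs), ranks(ys)
    d_squared = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


lengths = [length_in_letters(form) for form, _ in forms]
accessibility = [rank for _, rank in forms]
rho = spearman(lengths, accessibility)

print(lengths)  # letters per form
print(rho)      # negative value = efficient length asymmetry
```

With these four forms the correlation is perfectly negative, which is what an efficient length asymmetry looks like in its idealized form. Real data would contain ties and noise, so a library routine that handles ties would be a better choice there.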
The level of accessibility of referents in discourse depends on several factors (Ariel 1990, 2001, 2008; Arnold 2010), which can interact in complex ways (see Ariel 2001). A crucial factor is previous discourse. The referents that have been introduced in discourse have more activated representations than the referents that have not been mentioned. This is why full nouns are typically used to introduce new referents, while pronouns or zeros are usually reserved for the referents already introduced in the discourse. Moreover, the more recent the mention of the referent, the more accessible the mental representation is. For example, Arnold (2010) provides data to show that the chances of pronominal reference decrease with distance from the last mention of the referent (measured in clauses). Paragraph and episode boundaries also decrease accessibility. A related factor is density of mention. The higher the density of mention of a referent in previous discourse, the more activated its mental representation is and therefore the higher the chances of short (pronominal) expressions (Levy and McNeill 1992). In addition, topical referents are more accessible and therefore expressed by less explicit forms than non-topical ones.
The syntactic function of the referent is another important factor. A referent is more accessible if it has been mentioned previously in the same syntactic function. This parallelism makes it easier for the addressee to identify the referent. This explains why reduced forms are more likely if the referring expression and the previous mention of the referent are in the same syntactic position (Levy and McNeill 1992). Consider an example:
| (2) | | Ann invited Sue to the conference. |
| | a. | She asked Sue to present her new research on metaphors. |
| | b. | Sue asked her to tell more about the event. |
According to Arnold (2010), the preference for the pronoun she/her that refers to Ann should be stronger in (2a) She asked Sue…, than in (2b) Sue asked her… . Also, the current thematic role of the referent can be important. For example, Arnold (2001) shows that goals of verbs of transfer, e.g., give/send/bring to Sue, are more frequently referred to by shorter pronominal forms than sources, e.g., accept/get/borrow from Sue. Language users also refer more to goal referents than to source referents in discourse, as Arnold’s story-telling experiment and corpus analyses reveal. This frequency asymmetry is also observed for inanimate goals and sources (e.g., to London/the market/a village is more frequent than from London/the market/a village), which accounts for the cross-linguistic differences in the length of marking of goals and sources (Michaelis 2017). The higher probability of goals means their higher accessibility, which explains why they are expressed by shorter forms than sources.
The presence of competing referents in the context decreases accessibility. Tily and Piantadosi (2009) found, in particular, that participants were less likely to guess the upcoming referent correctly if there were many referents in the previous text. Notably, the presence of other referents plays a role even if there is no direct need for disambiguation. For example, Arnold and Griffin (2007) performed an experiment with cartoons, in which the subjects could see either one character or two different-gender characters. The first line of the story was, for example, Daisy went for a boat ride {with Mickey} on the lake. Next, the second picture was shown, which displayed one character doing something (e.g., Daisy rowing away). The second character was either present or absent. Participants generated another line for the story (e.g., Daisy left Mickey behind; or She rowed into the sunset). Interestingly, pronouns were more common in the one-character than in the two-character stories, even though there was no risk of confusion, since the characters had different genders. The competition between the characters in the speaker’s mental model results in greater cognitive load and, importantly, in lower activation of each referent.
Finally, we should mention the interaction between the speaker and the addressee. Wilkes-Gibbs and Clark (1992) show that descriptive nominal expressions tend to become shorter when the speaker and the hearer develop and expand their common ground – the information they believe they share. Interestingly, even subtle differences in the status of the hearer, e.g., from being able to overhear or watch the previous interactions to being totally new to the scene, determine the amount of coding in the subsequent interaction.
To summarize, the more accessible a referent is due to the immediate context, interaction settings, syntactic role or previous experience with language, the less costly the referential expression will be. As pointed out by Ariel (1990), the choice of the specific form helps the addressee to identify the location of the referent in their mental representation. The use of a shorter variant signals that the referent is accessible. Longer forms signal low accessibility. Section 1.5 discussed experimental evidence showing that processing costs increase if there is a mismatch between the length of a referential expression and the accessibility of the referent. This supports the idea that the pragmatic processes captured by the principle of negative correlation between accessibility and costs play a role in processing.
As for zero anaphora, different languages have different rules with regard to which constituents can or should be omitted in discourse. Yet, there are a few general tendencies. First, given and topical referents, which are restorable from previous discourse, are more frequently omitted than new and focal ones. In many languages, including Chinese, Japanese, Korean, Hindi, Hungarian and Lao, any given, non-focal argument can be omitted, whereas no language omits focal elements (Goldberg 2005). As defined by Lambrecht (1994: 218), the focus relation relates ‘the pragmatically non-recoverable to the recoverable component of a proposition and thereby creates a new state of information in the mind of the addressee’. This is why focal elements need to be overtly expressed.
In Gilligan’s (1987) cross-linguistic study, imperative subjects can be omitted in nearly all languages, followed by subjects and then by direct objects. Other constituents (indirect objects, possessive pronouns and adpositional objects) are very rarely omittable. Note that this hierarchy is observed in languages without agreement (see Section 2.2.2). The hierarchy can be explained by the different average levels of accessibility of different arguments. Imperative subjects are easily restorable from the context, and therefore highly accessible. They are followed by other subjects, which are more frequently thematic, given, and therefore more accessible than objects (Lambrecht 1994: 262; see also Chapter 8). Some languages display variation within a specific argument. For example, in Ancient Greek, it was natural to omit definite objects if they were highly accessible (Luraghi 2003). Notably, their omission depended on the degree of conventionalization and grammaticalization of the information helping the addressee to access the object referent. In highly grammaticalized constructions with conjunct participles,² object omission was obligatory. It was also common in coordinated clauses, followed by answers to yes–no questions. In other cases, omission was discourse-conditioned and optional. It affected highly accessible topical objects.
The examples of obligatory zero arguments demonstrate that efficient behaviour motivated by the accessibility of a referent in context can become conventionalized and, eventually, obligatory. This mechanism is efficient in itself, as it makes language production more automatic and reduces the processing load.
English represents an interesting case as far as zero objects are concerned. Although it generally does not allow for object omission, there are a few lexically specific exceptions (Fillmore 1986). Consider the following contrasts:
She won Ø can be said when the person in question won an election/game/race, but not if she won the gold medal or the first prize.
She lost Ø, again, can be said if she lost some competition, but not if she lost her wallet or keys.
We’ve already eaten Ø can be said in the situation when we have had a meal, but not when we have eaten something specific.
I forgot Ø, e.g., to fix something, but not if the speaker forgot the keys.
Interestingly, the object cannot be omitted even if it is previously mentioned or clear from the context (e.g., Where’re the keys? I forgot *(them)).
One might think that abstract entities and events are more commonly omitted than concrete physical objects. However, this is not quite true. If we take verbs of motion with a specific destination or point of departure, the object can be omitted if it is a physical location, and cannot be omitted if it is abstract and metaphorical:
She was approaching Ø (e.g., the speaker, the town), but not if she approached the solution.
She arrived Ø (e.g., at the summit), but not if she arrived at the answer.
The elliptical use is supported by conventionalized inferences based on the principle of negative correlation between accessibility and costs. This is obvious in the case of motion verbs, where the interpretation of the physical motion (approaching a location and arriving at a certain place) is the stereotypical interpretation, and the metaphorical extensions (approaching a solution or arriving at an answer) are less accessible. In other cases, the interpretation that allows the ellipsis is on average more probable than the interpretation that does not.
Consider the verb win. In a random sample of 100 examples of the verb from the Corpus of Contemporary American English (COCA, Davies 2008–), 90 were instances of win as a verb followed by a direct object or used without any complement. The majority of these instances (61) were about winning some competition (elections, sports, social conflicts, etc.), as in (3a). We will call this sense win1. Only 26 were about winning something for oneself (a prize, confidence, support, more rights, a Senate seat, etc.), as in (3b). This usage will be called win2. In three instances, it was difficult to classify the examples semantically.
| (3) | a. | Everything counts, everything has to be perfect for you to win the game. |
| | | (COCA, News, Denver, 2005) |
| | b. | Guess what? You can win a cruise at home as well. |
| | | (COCA, Spoken, NBC: Today Show, 2017) |
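The sample counts reported above for win can be turned into proportions with a few lines of arithmetic. This is a minimal sketch that uses only the figures given in the text (90 relevant verb uses: 61, 26 and 3); the dictionary labels are shorthand introduced here for illustration.

```python
# Counts reported in the text for the COCA sample of "win".
relevant_uses = 90  # verb uses with a direct object or no complement
counts = {
    "win1 (competition)": 61,
    "win2 (prize or benefit)": 26,
    "unclear": 3,
}

# Sanity check: the three groups should exhaust the 90 relevant uses.
assert sum(counts.values()) == relevant_uses

# Share of each sense among the relevant uses.
shares = {sense: n / relevant_uses for sense, n in counts.items()}
for sense, share in shares.items():
    print(f"{sense}: {share:.1%}")

# How many times more frequent win1 is than win2 in this sample.
ratio = counts["win1 (competition)"] / counts["win2 (prize or benefit)"]
print(f"win1 : win2 = {ratio:.2f} : 1")
```

In this sample the competition sense accounts for roughly two thirds of the relevant uses and is more than twice as frequent as the prize sense. A sample of this size gives only a rough estimate, so the proportions should be read as indicative.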
The meaning of win1 (i.e., winning some competition) is more common and therefore more restorable from context than win2 (i.e., winning some objects or other benefits). Also, the information about winning a competition is often mentioned previously or clear from context. Consider (4a and b):
| (4) | a. | If this is a big chess game, did you win or lose? |
| | | (COCA, Spoken, CBS_48Hours, 2007) |
| | b. | How are you doing in the polls? How are you going to win in New Hampshire? |
| | | (COCA, Spoken, CBS_Early, 1999) |
So, the information about the competition X wins (win1) is often accessible. It is discourse-given and topical. In contrast, the information about the prize X wins (win2) is usually not accessible. It is often focal. This is why the intransitive use of win2 has not become conventional, even if the object is accessible in a given context, e.g., Where did he get ten million dollars from? – He won2 *(them) in a lottery. This example demonstrates how an efficient strategy becomes conventionalized and turns into a categorical grammar rule.
Omission can also be due to reasons different from saving articulation effort or time. For example, taboo objects, such as bodily emissions (spit, piss), are usually omitted for reasons of politeness (from Goldberg 2005):
| (5) | a. | Pat sneezed (mucus) onto the computer screen. |
| | b. | The hopeful man ejaculated (his sperm) into the petri dish. |
| | c. | Pat vomited (her lunch) into the sink. |
These are cases of the so-called Implicit Theme Construction (Goldberg 2005). At the same time, the object is highly accessible to the addressee from general knowledge, so its omission helps to save effort, as well.
Next, the object can also be irrelevant if the attention is on the action itself (Goldberg 2005):
| (6) | a. | Tigers only kill at night. |
| | b. | She gave and gave, and he took and took. |
These are instances of the so-called Deprofiled Object Construction (Goldberg 2005). This agrees with Givón’s (2017: 3) principle of cataphoric zeros: ‘Unimportant information need not be mentioned.’ Probably the most famous example of this principle at work is omission of the agent in passive constructions:
(7) An English tourist was robbed of his Rolex watch (by Ø).
This type of argument omission is efficient, as well. The speaker does not spend effort on transfer of information that will bring no communicative benefits (see Section 1.4.2).
Goldberg also explains conventionalized habitual uses like She drinks/smokes/writes as a result of such deprofiling of the object, with subsequent lexicalization of the intransitive use. A similar perspective is taken by Givón (2017: 198). Indeed, what is important is that the person in question is an alcoholic, a smoker or a writer.
Although this interpretation is perfectly reasonable, accessibility of the object may also play a role. In particular, Huang (2007: 48–49) classifies uses like John doesn’t drink in the sense ‘John doesn’t drink alcohol’ as cases of lexical narrowing based on an I-implicature (see Section 1.4.2). Alcohol is a highly accessible interpretation if one is speaking about a habit, as the present simple form suggests. So it can be omitted as a typical object. Similarly, one can say John smokes, implying that he smokes tobacco (cigarettes, cigars or a pipe). Smoking other substances would be a less likely interpretation. One might wonder, however, if this inference will be made in a community where other plants are preferred.
Resnik (1996) investigated the use of English verbs with and without objects in corpora and in human subject norms. He also measured selectional preference strength, which reflects the strength of association between the verbs and semantic classes of their objects.³ The stronger the preference, the more biased a verb is to objects of certain semantic classes. Resnik found that the percentage of omitted objects positively correlated with selectional preference strength. For example, drink and sing had the highest rates of object omission, as well as the strongest selectional restrictions. In contrast, verbs like get and make had zero object omission rates and weak selectional restrictions. He concluded that strong selectional restrictions are a necessary condition for object omission. Notably, Glass (2020) does not find strong support for this claim in general-interest conversations on Reddit.⁴ However, when the data are taken from specific-interest threads, an interesting pattern emerges: verb objects are more frequently omitted in the communities where they are more strongly associated with a routine. For example, fitness enthusiasts frequently omit the object of the verb lift (weights), whereas home-brewers do not mention the object of bottle (beer). This demonstrates the importance of social and situational expectations for efficient use and omission of arguments.
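Selectional preference strength, as discussed above, is usually formalized as the relative entropy (Kullback–Leibler divergence) between the distribution of object classes given a verb and the overall distribution of object classes. The sketch below illustrates that calculation. The semantic classes and all probabilities are invented toy values chosen only to make the contrast visible; they are not Resnik's estimates.

```python
import math


def selectional_preference_strength(prior, posterior):
    """Relative entropy (in bits) of P(class | verb) from P(class)."""
    return sum(
        p_cv * math.log2(p_cv / prior[cls])
        for cls, p_cv in posterior.items()
        if p_cv > 0  # classes that never occur with the verb add nothing
    )


# HYPOTHETICAL distribution of object classes across all verbs.
prior = {"beverage": 0.10, "song": 0.10, "artefact": 0.40, "abstraction": 0.40}

# HYPOTHETICAL distributions of object classes for two verbs.
drink = {"beverage": 0.90, "song": 0.00, "artefact": 0.05, "abstraction": 0.05}
make = {"beverage": 0.10, "song": 0.10, "artefact": 0.45, "abstraction": 0.35}

strength_drink = selectional_preference_strength(prior, drink)
strength_make = selectional_preference_strength(prior, make)

print(f"drink: {strength_drink:.3f} bits")
print(f"make:  {strength_make:.3f} bits")
```

Under these toy numbers, a verb whose objects are concentrated in one class (drink) receives a much higher score than a verb whose objects mirror the general distribution (make). On the accessibility account, the high-scoring verb is the one whose object is easier to recover when omitted.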
It is possible that all the factors mentioned above play a role in determining if the argument can be omitted: its level of accessibility (based on diverse sources), the communicative benefits of naming it, and politeness concerns. The interaction of these factors requires further investigation.
2.2.2 Dependent Forms of Arguments
In addition to the factors discussed in the previous section, argument omission also depends on the presence or absence of agreement. In a cross-linguistic survey by Gilligan (1987: Section 3.4), languages where the verb agrees with a specific argument nearly always allow for omission of that argument. An example is subject agreement in Pashto:
| (8) | Pashto: Indo-European (Huang 2007: 142) | | |
| | Ø | mana | xwr-əm. |
| | | apple | eat-1.m.sg |
| | ‘(I) ate the apple.’ | | |
Languages without agreement allow pro-drop less frequently. As far as subject expression is concerned, this claim is supported by a recent study by Berdicevskis, Schmidtke-Bode and Seržant (2020), who report that languages that have subject indexation tend to allow for omission of the pronominal subject. They interpret this as evidence for an efficient trade-off: the subject should be coded only once, either as an independent form or as an agreement marker (see a critical evaluation of this claim in Section 6.2.1). There are some indications of a similar trade-off in the case of object agreement: some languages (Arabic, Bantu and Iranian) have so-called pro-indexes, which are in complementary distribution with object nominals (Haspelmath 2013a; Haig 2018). In other words, the indexes cannot occur when the object is explicit (although they may occur in the case of dislocated objects). However, object indexing often depends on diverse semantic and pragmatic factors, which are parallel to those relevant for differential case marking of objects (see Chapter 8). This can lead to patterns opposite to pro-indexing.
Consider Ruuli, a Bantu language, which has differential object indexing. In (9), the index ‑bu- corresponds to the noun class of the object (traps).
| (9) | Ruuli: Bantu (Just and Witzlack-Makarevich, forthcoming: 2) | |
| | Obuterega | o-bu-maite? |
| | trap(14) | 2sg.sbj-14.obj-know.pfv |
| | ‘Do you know these traps?’ | |
The indexing is probabilistic: 1st and 2nd-person, human and given objects are more frequently indexed than 3rd-person, non-human and new ones (Just and Witzlack-Makarevich, forthcoming). This is efficient because new, indefinite/non-specific, nominal and 3rd-person referents are more likely to be objects than subjects, while given, definite/specific, pronominal and 1st or 2nd-person referents are biased towards the subject role (see the data in Section 8.4). So, arguments with a more accessible interpretation in terms of their grammatical role are less likely to be marked than arguments with a less accessible interpretation.
Another example is Maltese (Just and Čéplö 2019). An object index is always present if the object is pronominal and given, and always absent if it is new and non-specific (in typical VO sentences). Thus, arguments whose grammatical role is less accessible are indexed, and those whose role is more accessible are not. Also, an index is always used in sentences with OV order, which is less typical than VO. By providing an object marker, the speaker helps the addressee to process a sentence with a non-canonical order (see another example in Section 8.3.1).
We can also find efficient patterns at a more general level if we compare different arguments. Siewierska (2004: 43–46) observes a cross-linguistic correlation between the two scales in (10), which describe types of person markers.
| (10) | a. | Scale of phonological reduction/dependence of person markers: |
| | | Zero > Bound > Clitic > Weak |
| | b. | Scale of argument prominence: |
| | | Subject > Direct object/Theme > Indirect object > Oblique |
In the vast majority of languages that she examined (89 per cent, to be exact), the more phonologically reduced and/or dependent person markers on the scale in (10a) are used for arguments higher on the argument prominence hierarchy in (10b). Siewierska explains this correlation by the differences in accessibility of typical arguments in different syntactic positions:
since dependent person markers involve less encoding than independent ones, the expectation is that they should be characteristic of syntactic functions which tend to realize highly accessible referents.
Therefore, we can observe efficient asymmetries both on a global level (between person forms of different arguments), and on the level of specific arguments (as in differential indexing). As we will see below, such ‘recursive’ organization of efficient patterns is very common.
2.2.3 Expression of Coreferential Objects
Coreferentiality allows us to see two types of efficient correlations between accessibility and formal length. First, reflexive pronouns coreferential with the subject are either as long as or longer than corresponding forms with disjoint reference, e.g., English himself vs. him, Dutch zich or zichzelf ‘him/herself, themselves’ vs. hem ‘him’, and Mandarin Chinese (tā) zíji ‘him/herself’ vs. tā ‘him/her’ (Haspelmath 2008a). This has to do with the fact that in the overwhelming majority of cases, the subject and the object have disjoint reference (Ariel 2001: 37; Ariel 2008: 218–219). For example, the Book of Genesis in Hebrew contains no direct objects coreferential with their subjects, out of approximately 4,500 clauses. This means that a disjoint reference interpretation of an object is more accessible than a coreferential one, which explains why the corresponding forms are often shorter. A diachronic account of the emergence of reflexive pronouns is offered in Section 5.3.1.
Second, similar to what we saw in the previous section on agreement markers, some languages display efficient asymmetries also at a more local level. There is variation within coreferential uses, which depends on the semantics of the verb. A language can have different coreferential forms for objects of verbs that usually represent self-directed actions, which include grooming verbs (e.g., wash, shave or dress), and for objects of verbs normally representing other-directed actions (e.g., hate, see or envy). Coreferential objects of self-directed verbs tend to have forms that are as long as or shorter than coreferential objects of other-directed verbs (Ariel 2008: Ch. 6; Haspelmath 2008a). For example, in English it is possible to omit the object when the action is self-directed, e.g., He shaved and dressed. In contrast, one cannot omit the object of an other-directed verb, e.g., He hates himself. This formal difference is efficient because a coreferential object of a self-directed verb is highly accessible, while a coreferential object of an other-directed verb has low accessibility. The different degrees of accessibility are supported by corpus frequencies (Haspelmath 2008a).
Thus, on the global level, coreferential objects are usually less accessible than objects with disjoint reference. This is why reflexive pronouns are often longer than non-reflexive ones. Moreover, at a local level, coreferential objects of verbs like wash are more accessible than coreferential objects of verbs like hate, for which disjoint reference is more typical. This is why coreferential objects of verbs like wash are shorter than coreferential objects of verbs like hate. The multiple layers of efficiency we observe here are similar to global and local markedness patterns and coding splits (Haspelmath 2021b), which are discussed in the next section.
2.3 Grammatical Coding Asymmetries and Splits
2.3.1 Global Markedness
Grammatical coding asymmetries are observed in members of contrasting grammatical categories that are expressed by markers of different length (Greenberg 1966; Haspelmath 2021a). Below are some examples.
| (11) | a. | singular vs. plural nouns (e.g., book – books) |
| | b. | positive vs. comparative and superlative degrees of comparison of adjectives (e.g., nice – nicer – the nicest) |
| | c. | cardinal vs. ordinal numerals (e.g., ten – tenth) |
| | d. | indicative vs. subjunctive (e.g., I go – I would go) |
| | e. | active vs. passive verb forms (I called X – I was called by X). |
It is a robust cross-linguistic tendency that the first member in these pairs is formally unmarked (or has a shorter marker), whereas the second (and third) one is formally marked (or has a longer marker). These coding asymmetries became important in structuralist linguistics after Roman Jakobson (1971 [1932]) extended the notion of markedness from phonology to grammar. In binary oppositions, the shorter member is considered the unmarked one, whereas the longer one is referred to as marked. The unmarked member appears in neutralization contexts. For instance, in the opposition between singular and plural, as in cat – cats, the singular form is used to express the generic meaning, e.g., The cat is a night wanderer. Therefore, it is considered unmarked. With time, the notion of markedness has become so broad, being understood as non-naturalness, cognitive complexity, language-specific or cross-linguistic rarity, etc., that it can hardly be considered a useful scientific concept (see Haspelmath 2006). As argued by Fenk-Oczlon (1991, 2001) and later by Haspelmath (2006), markedness phenomena can be reduced to frequency effects, which provide a more parsimonious explanation and a causal mechanism for many interesting facts. For example, the unmarked members in the examples above usually have higher inflectional and syntagmatic potential than the marked members (Croft 2003: Chapter 4). This and other observations can be explained by the fact that the unmarked members are more frequent than the marked ones (some corpus evidence is provided in Greenberg 1966).
Importantly for the efficiency account of these asymmetries, the marked members are usually expressed by longer forms than the unmarked ones. According to Haspelmath, the unmarked categories are more frequent, and therefore, their meaning is more predictable:
Speakers can afford to use short shapes or zero coding for predictable meanings, but they have to make a greater coding effort for unpredictable meaning.
Using the notion of accessibility, we can say that a singular interpretation of a nominal is in general more accessible than a plural one. This allows language users to spare effort when speaking about singular referents. The same logic applies to the other coding asymmetries.
2.3.2 Local Markedness
The examples in (11) illustrated global markedness, where the markedness contrast is the same for all instances of the categories (e.g., singular is unmarked, while plural is marked). Local markedness, in contrast, represents a markedness reversal for some members of the contrasting categories. Tiersma (1982) discussed such exceptions in paradigm levelling in Frisian and some other languages. Markedness theory predicts that the levelling of paradigmatic alternation will favour the unmarked form. However, as some nouns in Frisian undergo change, the originally ‘marked’ plural form becomes the basis for the singular form, rather than the ‘unmarked’ singular. For example, goes/gwozzen ‘goose/geese’ becomes gwos/gwozzen. Thus, the plural stem can be seen as unmarked. Tiersma showed that this markedness reversal happened to those nouns that are frequently used in the plural (‘arm’, ‘goose’, ‘horn’, ‘stocking’, etc.). Some examples from Slavic languages and Bavarian dialects are given in Fenk-Oczlon (1991).
In some cases, the frequency effects can be even stronger and trigger a reversal of the formal marking. There are a few languages, for example, that can have both overt plural marking (e.g., day – days) and overt singular marking (e.g., Welsh pys-en ‘pea’ – pys ‘peas’), depending on the noun. Haspelmath and Karjus (2017) distinguish between ‘individualist’ nouns, which tend to occur with uniplex meaning, e.g., day, and ‘gregarious’ nouns, which are usually associated with multiplex meaning, e.g., pea. Gregarious nouns are often the names of fruits and vegetables, e.g., Russian kartofel’ ‘potatoes (mass noun)’ – kartofelina ‘potato’; small animals, e.g., Welsh adar ‘birds/flock of birds’ – aderyn ‘bird’; and body parts, e.g., Cushitic farró ‘fingers’ – farri-t ‘finger’. Corpus data from different languages demonstrate that the nouns that tend to have overt singular marking cross-linguistically are also predominantly gregarious. That is, they are mostly used in the multiplex sense.
It would be efficient if all languages were like Welsh, marking the plural of individualist nouns and the singular of gregarious nouns. However, this is not what we see in the world’s languages. For example, English individualist and gregarious nouns behave similarly, e.g., day – days, pea – peas, potato – potatoes, bee – bees, eye – eyes. There is a strong competing factor, namely systemic pressure, which explains why such efficient strategies are not very frequent cross-linguistically. A system with simpler rules is easier to learn (Haspelmath 2014).
2.3.3 Coding Splits
A famous example of coding splits is differential object marking. If a language formally marks some objects and does not mark others, prominent (e.g., animate and definite) objects tend to be formally marked, while less prominent (inanimate and indefinite) objects are usually unmarked. Differential object indexing was discussed in Section 2.2.2. Differential case marking of subject and object will be addressed in detail in Chapter 8. In all these cases, languages tend to mark more frequently those arguments for which the interpretation as an object or subject is less accessible given some semantic and pragmatic features or other contextual factors.
Coding splits can also be found in locative marking (Haspelmath 2019). If a language has a split depending on the semantics of locative noun phrases, then place names are likely to be unmarked, inanimates can be either unmarked or marked, and animates tend to be marked. The explanation is that place names represent typical locations, while animates are untypical locations. In other words, the interpretation of a location is the most accessible for place names, and the least accessible for animate beings.
Another example is adnominal possessive constructions, e.g., John’s house (Haspelmath 2017). In some languages, different possessive constructions are used, depending on whether possession is alienable or inalienable. For example, in Abun, a West Papuan language, there is the following contrast:
| Abun: West Papuan (Berry and Berry 1999: 77–82, cited from Haspelmath 2017: 194) | |||
| a. | alienable possession | ||
| ji | bi | nggwe | |
| I | gen | garden | |
| ‘my garden’ | |||
| b. | inalienable possession | ||
| ji syim | |||
| I arm | |||
| ‘my arm’ | |||
This example illustrates a cross-linguistic tendency for inalienable possession constructions, as in (12b), to have shorter coding than alienable possession constructions, as in (12a). Haspelmath’s corpus data demonstrate that entities that are usually inalienable (kinship terms, body parts) more frequently occur in the possessive constructions (e.g., ‘my hand’, ‘his sister’) than alienable objects, such as a house, a garden or a knife. In other words, the interpretation of inalienable entities as possessed is more accessible. Since nouns that are more frequently mentioned as possessed objects receive less formal marking than those that are less frequently mentioned as such, this coding split can be regarded as efficient. More details about the diachronic development of such patterns follow in Sections 5.2 and 5.3.3.
Differential marking of Recipient can be found in English. It can be expressed by a zero-marked form in the double-object dative (e.g., Sue gives her colleague the memory stick), and by a case-marked form in the prepositional dative (e.g., Sue gives the memory stick to her colleague). The two constructions have different word orders, namely, Recipient + Theme in the double-object construction and Theme + Recipient in the prepositional dative (although there can be exceptions, especially in dialects; see Hawkins 1994: 214; Gast 2007). There is substantial evidence that language users switch between the constructions in order to manage the flow of information and optimize processing, as will be shown in Section 3.2.2. For example, Bresnan et al. (2007) show that the double-object construction is preferred when the Recipient is animate, definite, given and pronominal, whereas the Theme is non-given, non-pronominal and indefinite and has a low rank on the animacy hierarchy. The prepositional dative is preferred in the reverse situations (see also Hawkins 1994: 212–214; Goldberg 1995: 91ff). In addition, according to Goldberg (1995: Chapters 5–7), the prepositional dative construction is a metaphorical extension of the caused motion construction ‘X causes Z to move to Y’ (e.g., I sent the letter to my parents/to her old address), while the double-object construction means ‘X causes Y to receive Z’. This semantic difference is also supported by the distinctive collexeme analysis in Gries and Stefanowitsch (2004).
Yet, the constructions differ not only with regard to the order of their constituents and semantics, but, crucially, also in the amount of formal coding. Haspelmath (2021b) argues that the shorter variant in alternations is normally used if the referential prominence of arguments corresponds to their roles, while the longer variant is used if there is some deviation from such canonical relationships. In particular, if an argument is animate, given, definite and pronominal, it is more likely to be the Recipient than the Theme. And conversely, if an argument is inanimate, new, indefinite and nominal, it is more likely to be the Theme than the Recipient. The features that provide strong cues to the roles (namely, animate, given, definite and pronominal Recipient, and inanimate, new, indefinite and nominal Theme) are associated with the shorter double-object construction, according to the data in Bresnan et al. (2007). Therefore, we can interpret the division of labour between the two dative constructions as an efficient coding split in the marking of Recipient: the construction with more formal coding (that is, the prepositional dative) expresses the less accessible assignment of roles to arguments than the construction with less formal coding (the double-object dative).
Interestingly, the frequency of the to-dative rose dramatically in Middle English, when formal marking on verbs and nouns was substantially reduced. Zehentner (2022) uses corpus data to show that the more costly to-construction was preferred in contexts with semantically atypical Recipient and Theme – that is, if Recipient is inanimate and/or Theme is animate. These findings can be regarded as support for the idea that the additional marking is used to facilitate a less accessible interpretation (Footnote 5).
2.4 The Use and Omission of Clause Connectors
2.4.1 Omission of Adverbial Clause Connectors
In Relevance Theory (Sperber and Wilson 1995), an important distinction is made between conceptual (representational) and procedural (computational) information. The former is information about concepts or conceptual representations to be processed, and the latter is information about how to process them (e.g., Blakemore 1987; Wilson and Sperber 1993). For instance, the conjunction so carries procedural information:
She’s got a PhD, so she’ll be able to fill in this form.
Such connectors indicate the type of inference process that the addressee is expected to go through. In (13), the connector so indicates that the second clause should be interpreted as a conclusion. As Blakemore points out, expressions like so contribute to relevance by guiding the addressee towards the intended cognitive effects. In Grice (1975), such inferences, which are associated with specific expressions, are called conventional implicatures. The connector so conventionally implicates, according to Grice, that the first clause explains the second. In spite of the differences between the theoretical interpretations, there is one common idea: the speaker guides the addressee’s inferential process by providing an instruction about how to process the propositions in the first and second clauses. Other examples of such cues are the connectors but, and, therefore, on the other hand and after all.
Importantly, connectors can be omitted when the intended inference is expected or easy to make. For example, Blumenthal-Dramé and Kortmann (2017) investigate the use and omission of the causal and concessive adverbial connectors therefore and still, as in the following examples:
| a. | Ann didn’t read the essay questions properly and therefore failed the exam last January. |
| b. | Ann didn’t read the essay questions properly and failed the exam last January. |
| c. | Peter studied a lot and still failed the exam last January. |
| d. | Peter studied a lot and failed the exam last January. |
It is argued that there is a general tendency for concessive relations to be marked overtly, as in (14c), while causal relations are more often left implicit, as in (14b). The reason is that concessive relationships are more cognitively complex. As a result, implicit concessivity is more disruptive to discourse processing than implicit causality.
Taking the efficiency perspective, we can say that a causal interpretation is generally more accessible in discourse than a concessive one. This claim is supported by the counts from the Penn Discourse Treebank obtained by Asr and Demberg (2012), who also show that causal relations are much more often implicit (62% to 69%, depending on the order of cause and effect) than concessive relations (8% to 19%). Therefore, the omission of a connector signals that the more probable (causal) meaning is intended. In addition, we cannot rule out that humans have a cognitive bias towards establishing causal links between events, even if these events are not causally related, as in the logical fallacy post hoc ergo propter hoc. If this is true, it makes a causal interpretation more accessible.
2.4.2 Omission of Complementizers and Relativizers
Similar reasoning can be applied to other clause-linking elements, such as complementizers and relativizers. They help the addressee to identify the syntactic and semantic role of elements in discourse. In a language with optional clause-linking elements, the speaker can use them if the function of the clause they introduce is more difficult to identify, and omit them if the function is more accessible. An important role is played by their heads – i.e., nominal phrases and predicates. If they are often followed by a clause, the interpretation is easier to access, which allows the speaker to omit the function word. For instance, as shown by Wasow, Jaeger and Orr (2011), the relativizer that in non-subject relative clauses is more likely to be omitted when the nominal phrase is definite (e.g., the colleague I’m replacing) or contains a superlative adjective (e.g., the most interesting subject I’ve ever studied) because such nominal phrases are more commonly followed by a relative clause than indefinite nominal phrases (e.g., a secret that I don’t want to tell anyone).
A similar pattern has been observed for that as a complementizer (Jaeger 2006, 2010):
| a. | I think (that) alternatives exist. |
| b. | I’ll show ?(that) alternatives exist. |
The corpus data show that the odds of that are lower when the matrix verb is frequently followed by a complement clause (think, guess, suppose, etc.) and higher with matrix verbs that are rarely followed by a complement clause (e.g., teach, see, show). Thus, the omission of that is more likely in (15a) than in (15b).
This variation has been explained by the Uniform Information Density hypothesis, which predicts that speakers aim to transmit information uniformly close to, but not exceeding, the channel capacity (Jaeger 2006; Levy and Jaeger 2007; see also Section 1.3). Adding extra markers in more informative contexts helps to keep the information flow even and uniform, avoiding peaks and troughs. Mentioning the complementizer that at the onset of a complement clause distributes the same amount of information over one more word, thereby lowering information density.
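The spreading effect described above can be made concrete with a small numerical sketch. It is not taken from the cited studies: the probabilities are invented for illustration, the names `bits`, `clause_onset_profile`, `P_CLAUSE` and `P_WORD` are hypothetical, and the model assumes, as a simplification, that the complementizer that signals a complement clause perfectly.

```python
import math

def bits(p):
    """Surprisal of an event with probability p, in bits."""
    return -math.log2(p)

def clause_onset_profile(p_clause_given_verb, p_word_given_clause, with_that):
    """Per-word surprisal at the onset of a complement clause.

    Simplifying assumption: 'that' signals the clause perfectly, so with
    'that' the clause information and the word information fall on two
    separate words; without it, both fall on the first clause word.
    """
    if with_that:
        return [bits(p_clause_given_verb), bits(p_word_given_clause)]
    return [bits(p_clause_given_verb * p_word_given_clause)]

# Hypothetical probabilities, invented for illustration only.
P_CLAUSE = {"think": 0.8, "show": 0.1}  # P(complement clause | matrix verb)
P_WORD = 0.05                           # P("alternatives" | clause onset)

for verb, p_clause in P_CLAUSE.items():
    without = clause_onset_profile(p_clause, P_WORD, with_that=False)
    with_t = clause_onset_profile(p_clause, P_WORD, with_that=True)
    print(f"{verb}: peak without 'that' = {max(without):.2f} bits, "
          f"peak with 'that' = {max(with_t):.2f} bits")
```

Under these assumptions the total information is the same in both variants, but the highest per-word value drops when that is present, and the drop is larger after a verb like show, which rarely takes a complement clause, than after think.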
As was argued in Section 1.3, the explanation of these effects in terms of the negative correlation between accessibility and effort would be sufficient. The speaker provides additional formal cues to help the addressee to make inferences in those situations when the interpretation is less accessible, and omits them when it is more accessible.
As one more illustration, consider the use or absence of the particle to after help. More information about this alternation is provided in Section 9.3. According to Rohdenburg (1996), the chances of the to-form increase with linguistic distance (in words) between help and the infinitive. For example, the use of to is more likely in (16b) than in (16a):
| a. | You should help him (to) overcome his fears. |
| b. | You should help this troubled teenager with many complexes and difficult childhood ?(to) overcome his fears. |
This variation has been explained by the principle of (reduction of) cognitive complexity:
| The principle of cognitive complexity (Rohdenburg 1996: 151): |
| In the case of more or less explicit grammatical options the more explicit one(s) will tend to be favored in cognitively more complex environments. |
Rohdenburg also mentions other formal asymmetries, which, according to him, support this principle. They include inflected and uninflected present-tense forms in non-standard varieties of English (e.g., My mother and father drink/drinks), optional prepositions (e.g., time spent (in) doing something) and prepositional substitutions (e.g., She was prevailed on/upon to write another letter). In addition to linguistic distance, which was discussed above, higher complexity is also attributed to passive constructions.
The effect of linguistic distance in (16) can be explained by the principle of negative correlation between accessibility and costs. As the linguistic distance increases and there are more and more words between the matrix verb and the infinitive, the mental representation of the matrix verb becomes less accessible, which makes it more difficult to identify the infinitival complement as a part of the construction with help. At the same time, the addressee may have less experience of using and processing such constructions in discourse because structures like (16b) are quite rare. Therefore, the speaker is more likely to choose the more costly expression in this case.
2.4.3 Resumptive Pronouns
Another illustration is the use of resumptive pronouns in relative clauses. Keenan and Comrie (1977) found that languages use relative clauses according to the following scale, known as the Accessibility Hierarchy (Footnote 6):
| Subject > Direct Object > Indirect Object > Oblique > Genitive > Obj. of Comparison |
For example, if a language has oblique genitive clauses, e.g., I see an equation, the solution to which is well known, it can also have subject clauses, as well as direct object, indirect object and oblique clauses, as in the examples below.
| a. | I see the woman who works in the room next to mine (Subject RC) |
| b. | I see the woman I admire (Direct Object RC). |
| c. | I see the woman who I sent my manuscript to (Indirect Object/Oblique RC). |
English has all types of relative clauses, although Object of Comparison RCs can be uncomfortable, e.g., the girl who Sue is taller than.
More directly relevant for the topic of this chapter, however, is another finding by Keenan and Comrie, namely that the same hierarchy constrains the use of resumptive pronouns in relative clauses. Consider an example from Hebrew:
| Hebrew: Afro-Asiatic (Keenan and Comrie 1977: 92) | |||||
| ha-isha | she-David | natan | la | et | ha-sefer |
| the-woman | that-David | gave | to-her | obj | the-book |
| ‘the woman that David gave the book to’ | |||||
Here, la is a resumptive pronoun in the indirect object position. According to the hierarchy, if a language has resumptive pronouns in the subject position, the pronouns will also be used in all other positions. If a language requires or allows them in the indirect object position, it will also require or allow them for obliques, genitives and objects of comparison.
Keenan (1975) provided corpus data from English to demonstrate that the order in the hierarchy correlates with the frequency with which different positions occur. In a sample of more than 2,200 relative clauses, subjects were the most commonly relativized (e.g., the girl who is playing a computer game), and objects of comparison were never relativized. There were only a few examples of relativized genitives (e.g., the gate of which the hinges were rusty).
These findings have not gone unchallenged, however. In particular, Fox (1987) argued that instead of Subject on the left end of the scale in (18), one should speak about arguments P or S (that is, objects and intransitive subjects, respectively). In some ergative languages (e.g., Dyirbal and Mayan), ergative subjects (A) are not relativized (Footnote 7). Moreover, object relatives are as frequent as subject relatives in conversational English. Fox explains this finding by the important discourse function played by object relatives. Namely, they anchor the head noun phrase with new information, often with the help of pronominal given subjects in the relative clause, e.g., Have you heard about the party we threw in Las Vegas?
One should also mention here a famous debate about the relative complexity of processing of subject and object relative clauses, as in the examples below (from Levy, Fedorenko and Gibson 2013; see also references therein):
| a. | The reporter who attacked the senator hoped for a story. (Subject RC) |
| b. | The reporter who the senator attacked hoped for a story. (Object RC) |
It is received wisdom that object relatives are more difficult to comprehend than subject relatives. Numerous accounts have been given. One relevant factor is the memory load, which increases with the number and length of open syntactic dependencies, in particular, with the number of intervening words between the relative pronoun and the verb (see Section 3.2.1). This is why (21a), where the verb follows immediately after the relative pronoun, is easier to process than (21b).
However, this seems only to hold in artificial sentences with full noun phrases. For example, Reali and Christiansen (2007) demonstrated that object relative clauses can be more easily processed (that is, require shorter reading times) when they begin with a personal pronoun, e.g., The consultant that you called, than similar subject clauses, e.g., The consultant that called you. They were also more frequent than subject relative clauses in a large corpus. Object clauses with personal pronouns are much more natural than ones with nouns (cf. Fox 1987), which may explain the different results. Thus, the relative complexity of subject and object clauses strongly depends on the specific linguistic cues and the language users’ experience with them. We process more easily what we are frequently exposed to and what we expect to encounter. See also Diessel (2019: Section 10.5).
Regardless of whether the Accessibility Hierarchy is correct or not, the use or omission of resumptive pronouns can be explained by the principle of negative correlation between accessibility and costs. Ariel (1990: Section 7.21) argues that the use and omission of resumptive pronouns in Hebrew is driven by the accessibility of their referents. Resumptive pronouns are omitted when the referent is highly accessible and used when it is less accessible. Accessibility depends on different factors, such as the distance from the head noun. Even Subject RCs, which normally do not allow for resumptive pronouns in Hebrew, can contain them if the distance is long. Resumptive pronouns are better in non-restrictive relative clauses (e.g., The foreign students, whom the university accepted, are very hard-working) than in restrictive ones (e.g., The foreign students who the university accepted are very hard-working), because the former are less semantically and pragmatically dependent on the main clause than the latter. Non-restrictive relative clauses are also intonationally (and, at least in English, with the help of punctuation) separated from the main clause. This may reduce the accessibility of the referents in non-restrictive clauses.
In addition, resumptive pronouns can help to ease the memory load and lower the processing costs (see Hawkins 2004). All this makes the use and omission of resumptive pronouns relevant for efficient communication.
2.5 Same-Subject and Different-Subject Constructions
According to Cristofaro (2003: 250), if the participants of the main clause and subordinate clause are shared, the reference to them in the subordinate clause is likely to be missing. If the situations expressed by the main and dependent clauses have different participants, they are likely to have overt participant reference in the subordinate clause. We can think of overt participant reference in subordinate clauses as a switch-reference device, which signals that the participants are different from those in the main clause, while the absence of participant reference signals that the participants are the same (cf. Ariel 1990: Section 7.1). All this means that highly accessible participants obtain less coding than less accessible participants. Frequently, some coding material is added to facilitate the interpretation, as well.
For example, the subject of the verb want and the subject of its complement are usually the same (Haspelmath 2013b). That is, the meaning ‘X wants to do Y’ with the same subject is more frequent than the meaning ‘X wants Z to do Y’ with different subjects. When the subject is the same, in most languages it is not mentioned again, as in (22a) from German. If the subjects are different, both of them are mentioned. Moreover, additional coding is often used, such as complementizers and finite verb morphemes, as in (22b).
| German (own knowledge) | ||||||
| a. | Ich | will | zuhause | bleib-en. | ||
| I | want | at.home | stay-inf. | |||
| ‘I want to stay at home.’ | ||||||
| b. | Ich | will, | dass | du | zuhause | bleib-st. |
| I | want | that | you | at.home | stay-2sg.pres | |
| ‘I want you to stay at home.’ | ||||||
In some languages (e.g., Samoan and Korean), a longer verb form is used for the different-subject want. A few languages have the same construction for the same-subject and different-subject meanings, so no coding asymmetry is observed (e.g., Modern Greek). Most importantly, however, the cross-linguistic sample in Haspelmath (2013b) contains no languages in which the same-subject want would be expressed by a longer construction than the different-subject want.
Another example is intend (Comrie 1986). Intentions usually involve our own future actions, as in (23a), where an infinitival clause is used. But if we speak about intentions with regard to someone else’s actions, a finite clause is required, as in (23b).
| a. | Sue intends to stay at home. |
| b. | Sue intends that Joe should stay at home. |
But this is not the whole story. We can find some ‘local markedness’ examples again. If the verb in the main clause has two human arguments, and one of them appears in the subordinate clause, the use of the short and long forms depends on the lexical semantics of the verb. Take the verb promise. We usually promise someone that we ourselves will do something, because we can control our own actions more easily. This is why (24a) is shorter than (24b).
| a. | Sue promised Joe to stay at home. |
| b. | Sue promised Joe that he would stay at home. |
Now consider the verb persuade. When we persuade someone, we expect that they will perform some action. In English, this is expressed by an object-control construction with an infinitival clause, as in (25a). But if the agent of the action is the person who persuades, as in (25b), then a finite clause is used.
| a. | Sue persuades Joe to stay at home. |
| b. | Sue persuades Joe that she should stay at home. |
This formal length asymmetry is efficient because the more accessible interpretation is conveyed by a shorter form than the less accessible one. Although in general the principle observed by Cristofaro (2003) is true, the examples with promise and persuade show that languages can have local formal asymmetries which depend on the expectations triggered by a specific verb in the main clause.
2.6 Zipf’s Law of Abbreviation
This section addresses one of the most famous manifestations of language efficiency, namely, the fact that more frequent words tend to be shorter than less frequent ones. This correlation is known as Zipf’s Law of Abbreviation (Zipf 1965 [1935]). Bentz and Ferrer-i-Cancho (2016) have tested the law on 986 languages from 80 families, using massively parallel corpora of Bible translations. They found a negative correlation between word length in characters and word frequency for all languages. The Law of Abbreviation is thus an absolute language universal, although it is statistical in each separate language because the correlation is not perfect.
According to Zipf (1965 [1935]), this correlation is explained by the general pressure to save time and effort. The linguistic mechanisms responsible for this correlation include truncations, e.g., gas instead of gasoline. There is a lot of evidence for this strategy, e.g., app for application, or German Auto for Automobil.
We should also mention here formal erosion. This often happens as a result of grammaticalization (e.g., Lehmann 2015: Section 4.2.1), for example when full verbs become auxiliaries (the Old English willan ‘want’ > will and ’ll), when full pronouns become clitics (e.g., them and ’em) and bound person markers, or when because becomes ’cause and coz. A more detailed discussion of the diachronic mechanisms that lead to formal reduction is provided in Chapter 5.
The second strategy, according to Zipf, is to use permanent or temporary lexical substitutions. Temporary substitutions are anaphoric pronouns, which were discussed in Section 2.2. Examples of permanent substitutions are car, which is used instead of automobile or, in more specialized domains, juice for electricity or soup for nitroglycerine (at least in Zipf’s times).
There have also been some sceptical opinions about the interpretation of Zipf’s Law of Abbreviation in terms of efficient organization of language. Miller (1957) noted that a correlation between word length and word frequency is also observed if someone randomly types characters on a keyboard with letters and a space character. A randomly typing monkey would produce a sequence of meaningless strings of characters, whereby shorter strings would appear more frequently than longer ones. At the same time, Howes (1968) argued that the assumptions of Miller’s model are not applicable to natural language. Obviously, we do not form words from randomly reshuffled letters to express some random meanings. More recently, Ferrer-i-Cancho, Bentz and Seguin (2020) showed that Miller’s random typing itself represents an optimal encoding system from the perspective of standard information theory, which means that it is not surprising that the results of random typing are similar to Zipf’s. Moreover, there are multiple indications that efficient formal reduction is an important type of language change. Section 1.1, for example, discussed the shortened forms for ‘coronavirus’. It is impossible to see this and numerous other examples (see Chapter 5) as a result of random processes.
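Miller’s random-typing argument is easy to reproduce in a small simulation. The sketch below is only an illustration under stated assumptions: a hypothetical two-letter keyboard plus a space bar, all keys equally likely, and a fixed random seed. The function names `random_typing` and `mean_frequency_by_length` are invented here and do not come from any of the cited papers.

```python
import random
from collections import Counter, defaultdict

def random_typing(n_keystrokes, alphabet="ab", seed=0):
    """Simulate a 'monkey' hitting letter keys and the space bar at random.

    Assumption: every key, including the space bar, is equally likely.
    """
    rng = random.Random(seed)
    keys = alphabet + " "
    text = "".join(rng.choice(keys) for _ in range(n_keystrokes))
    return text.split()  # split() discards the empty strings between spaces

def mean_frequency_by_length(words):
    """Average token frequency of the distinct 'words' of each length."""
    freq = Counter(words)
    by_length = defaultdict(list)
    for word, count in freq.items():
        by_length[len(word)].append(count)
    return {n: sum(c) / len(c) for n, c in sorted(by_length.items())}

words = random_typing(60_000)
mean_freq = mean_frequency_by_length(words)
for length in (1, 2, 3, 4):
    print(length, round(mean_freq[length], 1))
```

With these settings, the average frequency of a string falls steadily as its length grows, although no pressure towards efficiency is built into the simulation. This is the observation that the sceptical argument rests on.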
Word length correlates not only with frequency but also with how predictable a word is from its context. In an experimental study, Manin (Reference Manin2006) showed that word length is correlated with the average probability of guessing the word in context. Informativity can be also inferred from very large corpora. Using n-grams from several Germanic, Romance and Slavic languages, Piantadosi, Tily and Gibson (Reference Piantadosi, Tily and Gibson2011) found out that the average informativity, i.e., the negative logarithm of the conditional probability of a word given its previous context (1 to 3 words on the left), is even more strongly correlated with word length than simple frequency. These findings were complemented and extended by Mahowald et al. (Reference Mahowald, Fedorenko, Piantadosi and Gibson2013), who examined such pairs as exam – examination, chimp – chimpanzee and math(s) – mathematics. Their corpus-based analysis demonstrates that the shorter forms had on average lower informativity given their left context. An experiment with forced-choice sentence completion also revealed that the shorter forms are preferred in more predictive contexts.
These conclusions, however, have been challenged recently by Meylan and Griffiths (2021), who showed that the dominance of informativity is no longer observed when one encodes strings in UTF-8, which is better suited than the ASCII standard to languages other than English, and excludes words that are not found in the dictionaries of the specific languages. Moreover, one may wonder whether the results hold if more diverse languages are taken into account.
In order to answer this question, I investigated corpus data from nine languages: Arabic, Czech, English, Finnish, German, Hindi, Hungarian, Indonesian and Russian. The data are online news corpora with 30 million tokens from each language taken from the Leipzig Corpora Collection (Goldhahn, Eckart and Quasthoff 2012). The length of words was measured in UTF-8 characters. For each language, 4,000 wordforms (only alphabetic characters) with frequency greater than 20 were selected randomly for analysis. This frequency cut-off was used in order to avoid typos and other spurious hits. Frequency was represented by self-information. That is, the frequency is divided by the corpus size and then the negative logarithm is taken. The higher the frequency of a word, the lower the self-information value. Informativity represents the average probability of a word given one previous word, also negatively log-transformed. The more predictable a word on average from preceding words, the lower the contextual informativity value.
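The two measures can be sketched on a toy corpus as follows. The corpus, the function names and the logarithm base (2, giving bits) are assumptions made for illustration. In particular, contextual informativity is computed here as the mean of the negative log conditional probabilities over all occurrences of a word, which is one possible reading of the description above and not necessarily the exact procedure used in the study.

```python
import math
from collections import Counter

def self_information(tokens):
    """-log2 of each wordform's relative frequency in the corpus."""
    freq = Counter(tokens)
    n = len(tokens)
    return {w: -math.log2(f / n) for w, f in freq.items()}

def contextual_informativity(tokens):
    """Mean of -log2 P(word | previous word) over all occurrences of the word."""
    bigrams = Counter(zip(tokens, tokens[1:]))
    context_freq = Counter(tokens[:-1])  # how often each word is a left context
    surprisal_sum = Counter()
    occurrences = Counter()
    for (prev, word), f in bigrams.items():
        p = f / context_freq[prev]       # P(word | prev)
        surprisal_sum[word] += f * -math.log2(p)
        occurrences[word] += f
    return {w: surprisal_sum[w] / occurrences[w] for w in occurrences}

# Hypothetical toy corpus, far too small for any real conclusions.
tokens = "the cat sat on the mat and the dog sat on the rug".split()
self_info = self_information(tokens)
informativity = contextual_informativity(tokens)

for w in ("the", "on", "cat"):
    print(w, len(w), round(self_info[w], 2), round(informativity[w], 2))
```

In this toy corpus, on always follows sat, so its contextual informativity is zero, whereas cat is only one of four continuations of the and is therefore less predictable.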
Next, Spearman’s rank correlation coefficients were computed for each language (a) between word length and self-information, and (b) between word length and contextual informativity. The results are shown in Figure 2.1. Partial correlations were also computed, such that the correlations between length and self-information were controlled for contextual informativity, and the correlations between length and contextual informativity were controlled for self-information. The partial correlations are represented by symbols (dots and triangles) on the same plot.
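The following sketch shows one common way of computing Spearman's coefficient and a first-order partial correlation, using only the Python standard library. The data values are invented for illustration, and the partial correlation formula is the standard textbook one; the source does not specify which implementation was used for the reported results.

```python
import math

def ranks(values):
    """Average ranks (1-based); tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def partial_spearman(x, y, z):
    """Spearman correlation of x and y, controlling for z."""
    r_xy, r_xz, r_yz = spearman(x, y), spearman(x, z), spearman(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Invented values for eight hypothetical wordforms.
length = [2, 3, 3, 5, 6, 8, 9, 11]
self_info = [4.1, 5.0, 6.2, 7.5, 7.1, 9.3, 10.0, 11.4]
informativity = [3.0, 2.5, 4.8, 4.1, 6.0, 5.2, 7.7, 6.9]

rho_freq = spearman(length, self_info)
rho_info = spearman(length, informativity)
partial_freq = partial_spearman(length, self_info, informativity)
partial_info = partial_spearman(length, informativity, self_info)
print(round(rho_freq, 2), round(rho_info, 2))
print(round(partial_freq, 2), round(partial_info, 2))
```

The partial coefficients show how much of each correlation with length remains once the other measure has been taken into account.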

Figure 2.1 Spearman’s rank correlation coefficients between word length and self-information, and between word length and contextual informativity. The dots and triangles stand for partial correlations.
The plot shows that in most of the languages, contextual informativity is indeed more strongly correlated with word length, in line with Piantadosi et al. (2011). The dominance of informativity is particularly striking in the highly analytic languages Indonesian and English, especially if we look at the partial correlations. However, in Finnish and Hungarian, which are highly synthetic, the opposite is the case. Self-information based on simple frequency is more strongly correlated with word length than contextual informativity is. Note that, unlike in Meylan and Griffiths (2021), words absent from dictionaries were not excluded; however, a follow-up study based on cleaned data reveals divergent correlations between informativity measures and length across languages, whereas the Zipfian correlation between frequency and length remains consistent (Levshina 2022b).
How can we interpret the findings? If we look at the distribution of word frequencies and bigram frequencies, we will see that Finnish and Hungarian tokens and bigrams have the highest number of hapax legomena (that is, units that occur only once). This is not surprising. Because of their rich morphology, Finnish and Hungarian have very many different forms of content words. The grammatical relationships are expressed by word-internal grams rather than by function words. The individual tokens (individual wordforms) are more difficult to predict from other content wordforms, which are rare. This means that the measures of contextual surprisal can be less reliable in those languages. Yet, even if we remove the hapax legomena when computing the surprisal (or, alternatively, all context words with frequency less than 5), the results change very little. This suggests that the results are not an artefact of data sparseness. Rather, one reason is that relatively infrequent neighbours are less reliable as cues for infrequent wordforms. Another reason is word order: in languages with rich morphology, word order tends to be less rigid and therefore less predictive of the next words than in languages with a less rich morphology (see Section 6.3). This makes the neighbouring tokens less reliable predictors of target words. Moreover, individual constructions also play a role. For example, some postpositions in these languages can be quite long and at the same time highly predictable from the previous word with a specific case form, e.g., Hungarian keresztül ‘through, across’, érdekében ‘for the benefit of’, kapcsolatban ‘in connection with’, kapcsolatos ‘in relation to’ and köszönhetően ‘due, thanks to’.
Thus, there is no clear evidence that either frequency or informativity is more strongly correlated with length. One of the reasons is that informativity as a psychological construct representing the accessibility of a word for a language user is very difficult to estimate from corpora (see footnote 8). Moreover, different strings of characters have different degrees of wordhood, and the results will depend on orthographic conventions. Despite the debate about which measure is the most appropriate one for measuring the accessibility of words, the correlations reported above can be regarded as evidence for communicative efficiency.
2.7 Phonetic Reduction and Enhancement
Speakers tend to reduce articulation effort while at the same time producing a signal which shows sufficient acoustic distinctiveness for the addressee to correctly identify the linguistic content of the message (Lindblom 1990). There is ample evidence in the literature that more accessible linguistic units (words, syllables and individual sounds) undergo reduction more frequently than less accessible ones. Bolinger (1963) observed that words are durationally shorter when they occur more frequently on their own or in combinations with other words. For example, the relatively new word robot is pronounced longer than the more familiar rowboat, whereas verbs can be pronounced shorter when followed by more typical complements or adjuncts.
The measures of accessibility that determine the degree of phonological reduction can be of different kinds. One of them is the context-free frequency of a given unit in discourse. Another factor is the conditional probability given the left or right context, e.g., n words on the left or right from the target word. Frequency can be measured across different texts or only in previous discourse. Similarly, conditional probability can be measured in a specific context where the unit of interest is used, or it can be averaged across all contexts where the unit occurs (see Section 2.6 for an illustration). In studies inspired by information theory, the probabilities are often logarithmically transformed and then negated, such that the resulting number represents the informativity of the unit in bits (or nats, depending on the logarithm base). Higher probability means lower informativity, and vice versa. Pointwise Mutual Information, which reflects how much more information is obtained about a word upon seeing its neighbour, and the other way round, has also been shown to be relevant for different types of reduction in language production (e.g., Gregory et al. 1999).
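As a minimal sketch of these quantities, the code below converts a probability into informativity in bits and in nats, and computes Pointwise Mutual Information for adjacent word pairs. The toy utterance and the function names are hypothetical, and the probabilities are simple relative frequencies without any smoothing, which real studies would normally need.

```python
import math
from collections import Counter

def informativity(p, unit="bits"):
    """Negative log probability: base 2 gives bits, base e gives nats."""
    return -math.log2(p) if unit == "bits" else -math.log(p)

def pmi(tokens, w1, w2):
    """Pointwise Mutual Information (in bits) of the adjacent pair w1 w2.

    Raises ValueError if the pair never occurs (log of zero).
    """
    bigram_freq = Counter(zip(tokens, tokens[1:]))
    unigram_freq = Counter(tokens)
    p_joint = bigram_freq[(w1, w2)] / (len(tokens) - 1)
    p_w1 = unigram_freq[w1] / len(tokens)
    p_w2 = unigram_freq[w2] / len(tokens)
    return math.log2(p_joint / (p_w1 * p_w2))

# Hypothetical toy data.
tokens = "i don't know and i don't think that you know".split()

bits = informativity(0.25)           # 2.0 bits
nats = informativity(0.25, "nats")   # the same quantity in nats
pmi_i_dont = pmi(tokens, "i", "don't")
pmi_dont_know = pmi(tokens, "don't", "know")
print(bits, round(nats, 3), round(pmi_i_dont, 2), round(pmi_dont_know, 2))
```

A positive value means that the two words co-occur more often than expected if they were independent; in this toy example, i and don't are more strongly associated than don't and know.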
Bell et al. (2009) studied the relationships between pronounced durations of words in a spoken corpus and several factors: frequency, conditional probability and repetition. They looked separately at content and function words. Both in content and in function words, there was a significant effect of different types of conditional probability – given the previous context or the next context. Moreover, word frequency and repetition led to reduction of content words. Similarly, Fowler and Housum (1987) found effects of repetition on the duration of content words in a narration.
Phonetic reduction can manifest itself not only in formal shortening but also in the loss of phonetic detail. For instance, Aylett and Turk (2004) report that highly predictable phrase-medial syllables are shorter than less predictable ones. At the same time, there is a loss of articulatory detail. In particular, vowels undergo centralization of their first and second formant frequency values. As a result, the vowel space is reduced (Aylett and Turk 2006).
Both context-specific and average predictability play a role in reducing the acoustic duration of a notional word, many other factors being controlled for (Seyfarth 2014). Therefore, formal reduction is to some extent stored in the lexicon. Similar results are obtained by Cohen Priva (2008), who finds that oral and nasal stop deletion in English is influenced by the phones’ average informativity. This demonstrates again how the use of a unit in particular contexts percolates into language structure.
Pierrehumbert (2001) proposes an exemplar-based model in order to explain why high-frequency words undergo reduction faster than low-frequency words. For example, the middle schwa is deleted before /r/ and /n/ in high-frequency words, such as evening and every, but is retained in rare words, such as mammary and artillery (Hooper 1976; see also Fenk-Oczlon 2001). According to Pierrehumbert, this difference can be explained by the systematic production bias towards lenition (Lindblom 1984), or ‘undershooting’ the phonetic target to the extent that it does not disrupt understanding. Since high-frequency words are used more often than low-frequency words, their stored exemplar representations are more affected by this persistent bias. This explains why high-frequency words are more reduced than low-frequency words synchronically and why the former undergo this reduction faster than the latter in diachrony. It does not seem very plausible, though, that there is a certain constant rate of lenition that is applied to every use of a word or sound in every context. Frequent words are also highly accessible on their own and across individual contexts, which is why they can be reduced in the first place.
Speakers also enhance linguistic forms under some circumstances, e.g., when they believe that the addressee may need help to disambiguate between two similarly sounding words. This has been shown in studies of hyperarticulation. For example, when the hearer has to choose between two similarly sounding words, e.g., dose – doze, the speaker tends to increase the voicing of the final consonant in doze more often than in situations when such ambiguity is not present (Seyfarth, Buz and Jaeger 2016). Speakers also hyperarticulate when their communication partners misunderstand instructions (Stent, Huffman and Brennan 2008). Hyperarticulation is observed immediately after the speaker finds out that they were misunderstood, and then decays gradually over several turns in the absence of further misrecognitions.
The explanation of these effects has been a controversial issue. First, they can be explained by audience design (Bell 1984), which means that language users proactively adjust their message in order to increase their communicative success while at the same time reducing their efforts whenever they can.
But this is not the only explanation that can be found in the literature. A popular view in usage-based linguistics involves the phenomenon of chunking. According to Bybee, for example, each instance of use further automates and increases the fluency of a sequence of words, leading to their fusion (Bybee 2007: 324; see also Section 5.4.3). A frequently repeated stretch of speech becomes automated as a processing unit due to neuromotor routines. Further repetition leads to reduction and overlapping of articulatory gestures. All this shortens the duration. For instance, Bybee and Scheibman (1999) found that reduction of the vowel and the consonants in don’t in spoken English is particularly frequent after the pronoun I and before the verbs know and think because this contraction occurs particularly frequently in the phrases I don’t know and I don’t think. The process of automatization is not restricted to language alone and is largely unconscious.
If the automatization account is the only true one, then the joint probability of neighbouring units (i.e., the frequency of these units together, divided by the total frequency of all sequences of the same length) would be the only important factor in predicting formal reduction. However, empirical evidence reveals that conditional probability is more important than joint probability in that regard. In particular, Bell et al. (2003) investigated the effects of conditional probabilities and joint probabilities on the duration and phonetic reduction of function words in spoken English. They found that the conditional probabilities have either the strongest or the only significant effect in the predicted direction (i.e., more predictable target words are more frequently reduced than less predictable ones). Joint probabilities, which basically represent the frequencies of possible chunks and their degree of routinization, sometimes have an effect in the opposite direction. Also, Barth (2019) shows that reduction of be and have in highly grammaticalized contexts is due to the high conditional probabilities rather than the joint probabilities of these words with their neighbours (most importantly, the words that follow be and have). This can be regarded as evidence that accessibility due to high contextual predictability is more important than the process of chunking, at least, in these cases of formal reduction.
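The difference between joint and conditional probabilities can be made concrete with a small sketch. The toy corpus and the function name are hypothetical, and the estimates are plain relative frequencies of adjacent word pairs; the studies cited above use much larger corpora and more refined estimates.

```python
from collections import Counter

def bigram_probabilities(tokens, w1, w2):
    """Return (joint, P(w2 | w1), P(w1 | w2)) for the adjacent pair w1 w2."""
    bigram_freq = Counter(zip(tokens, tokens[1:]))
    pair = bigram_freq[(w1, w2)]
    # Joint probability: pair frequency over the frequency of all bigrams.
    joint = pair / sum(bigram_freq.values())
    # Probability of w2 given the previous word w1.
    given_previous = pair / sum(f for (a, _), f in bigram_freq.items() if a == w1)
    # Probability of w1 given the following word w2.
    given_following = pair / sum(f for (_, b), f in bigram_freq.items() if b == w2)
    return joint, given_previous, given_following

# Hypothetical toy corpus.
tokens = ("of the cat and of the dog and of course "
          "the cat according to the dog").split()

of_the = bigram_probabilities(tokens, "of", "the")
according_to = bigram_probabilities(tokens, "according", "to")
print([round(p, 3) for p in of_the])
print([round(p, 3) for p in according_to])
```

In this toy example, of the is the more frequent sequence and thus has the higher joint probability, but to is fully predictable after according, so according to has the higher conditional probabilities. The two kinds of measure can therefore make different predictions about which word should be reduced.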
Another popular explanation is that the speaker buys time for planning by using a longer expression. As shown by Bell et al. (2003), planning problems, which are represented by disfluencies either preceding or following a function word, increase the chances of longer or fuller variants of words in language production. Planning issues were also one of the explanations offered by Szmrecsanyi (2003) to provide an account for the preference of the construction be going to in syntactically complex environments (in comparison with will/shall), which are more demanding in terms of processing resources (see Section 4.3).
While planning issues may well play a role, they fail to explain many instances of reduction and enhancement. For example, Jaeger and Buz (2017) argue that the link between the contextual predictability of a linguistic form and its own realization is not very clear if one accepts the ‘buying-time’ explanation. There is also evidence that backward transitional probabilities (i.e., those that predict the target unit given the following context) play a role that is at least as important as the role of forward transitional probabilities (i.e., the ones that predict the target unit from the preceding context), if not more important (Seyfarth 2014; Barth 2019). Moreover, speakers adapt subsequent productions towards less reduced variants if previous use of more reduced variants resulted in communicative failure (Stent et al. 2008; Buz, Tanenhaus and Jaeger 2016). As Jaeger and Buz (2017) argue, this is incompatible with the idea that the degree of reduction depends solely on production ease.
One cannot exclude the possibility that routinization, ‘stalling for time’ and other production-related and speaker-centred explanations are relevant in some situations (cf. Ernestus 2014). I argue that the effect of production factors should be ultimately constrained by the communicative need of the speaker to get the message across, although some of the lower-level reduction or enhancement processes can be caused by cognitive processes unrelated to the addressee’s needs (cf. Lindblom 1990). This constraint becomes obvious if we listen to human (not previously recorded) announcers at a railway station. When the speaker announces that the platform number has been changed, the number will be highly accessible to him or her. However, the numeral representing the platform number is unlikely to be reduced because this information is highly important and not accessible to the travellers who need to catch the train (see footnote 9). Notably, numbers tend to be very stable phonologically across languages (Diessel 2019). We can think of at least two reasons for this. First, confusion can be costly in many linguistic and extralinguistic ways. Second, numbers are often used in similar contexts (e.g., X costs two/five/ten/… euros), which makes them on average less predictable from context. We need more research in order to obtain a conclusive answer and to disentangle these competing motivations and explanations.
A final word of warning should be said against a potential misunderstanding that an account based on audience design should only display effects based on context-specific accessibility. There is no conflict between this account and the evidence of entrenchment effects, which can last for a while, or even become conventionalized. For example, the voice-onset time of words with initial voiceless stops that have minimal pairs, e.g., cod – god, is greater in comparison with words without such a pair, e.g., cop – *gop. Baese-Berk and Goldrick (2009) found that this difference is observed even if the minimal pair is not present in the context (i.e., there is no need for disambiguation). They conclude that this effect is not driven by what they call ‘listener–modelling’. We know from Cohen Priva (2008) and Seyfarth (2014), both mentioned above, and from other studies, that units that frequently occur in reducing contexts also become more reduced in general, i.e., usage percolates into the system. Therefore, units that are frequently hyperarticulated or reduced in some contexts may become hyperarticulated or reduced across the board. This may lead to short-term or long-term effects. In the study mentioned above, Stent et al. (2008) show that hyperarticulation is a targeted and flexible adaptation to a specific situation, which decays with time. At the same time, reduced or enhanced forms can be entrenched and conventionalized in their conjunction with specific communicative situations. As a result, whole special registers can emerge, e.g., child-directed speech, foreigner-directed speech, etc. (Jaeger and Buz 2017).
As in the previous examples of efficient formal asymmetries, we can observe different kinds of efficiency, from context-sensitive language use, where audience design is probably the strongest and most precise, to conventionalized patterns, which are coarser, but do not require much thinking and produce the desired cognitive effects most of the time.
2.8 Conclusions
We have seen many different manifestations of efficiency as a descriptive phenomenon in all domains of language – lexicon, phonology, morphosyntax and discourse. Some of them lend themselves easily to the efficiency explanation, while some others also have alternative accounts. Chapter 5 will discuss some of them and others in greater detail.
Formal length is related to processing costs. Longer expressions can be used to make processing easier for the addressee. For example, the use of resumptive pronouns (see Section 2.4.3) in some types of relative clauses can help the addressee to process the sentence. This does not automatically mean, however, that shorter expressions mean more processing effort for the addressee, and longer expressions mean less processing effort. First of all, as we saw in Section 1.3, overly informative expressions create problems for comprehension. Second, the use of short and ambiguous expressions does not result in processing difficulties, provided that there is enough relevant context. See more on this topic in Section 6.2.1.
