7.1 The Causative Continuum
This chapter focuses on causative constructions in languages of the world. Causatives have received a lot of attention in the typological literature (e.g., Comrie 1989: Ch. 8; Song 1996; Dixon 2000; Shibatani and Pardeshi 2002; Haspelmath et al. 2014). Normally, they are classified into several types, which form the so-called causative continuum:
(1) Lexical – Morphological – Analytic (or Periphrastic)
In lexical causatives, the cause and effect are expressed in one morpheme, e.g., kill, break, give. Morphological causatives contain a causative morpheme, e.g., Turkish öl-dür- ‘kill’ from öl- ‘die’. Finally, analytic or periphrastic causatives are those in which the causative meaning is expressed by a combination of words, whereby the causing and caused events are expressed separately, e.g., make + NP + dead, cause + NP + to die.
These categories form a continuum because the boundaries between them are not clear-cut. In particular, one may argue about the class membership of non-productive morphological causatives, e.g., English wid-en and solid-ify, which may still exhibit morphological boundaries. Such causatives would be located between the prototypical lexical ones (e.g., kill) and the prototypical morphological ones (see the Turkish example above). Another example is formal variation within analytic causatives. For example, French causatives with faire ‘make’ are usually immediately followed by the infinitive, whereas constructions with demander ‘ask, request’ are followed by a nominal phrase, and then the infinitive, as in (2).
| (2) French (Comrie 1989: 169) | |
| a. | J’ai fait manger les pommes à Paul. |
| ‘I made Paul eat the apples.’ | |
| b. | J’ai demandé à Paul de manger les pommes. |
| ‘I asked Paul to eat the apples.’ | |
In this case, the elements expressing cause and effect in the construction with demander will be less integrated than those in the construction with faire. One can also mention here monoclausal and biclausal causatives, although this distinction is not clear-cut, either (Kulikov 2001: 887).
Probably the most famous cross-linguistic generalization about causatives is that the formal continuum in (1) corresponds to the semantic continuum of direct and indirect causation, as shown below:
| (3) Lexical – Morphological – Analytic |
| more direct < ––––––––> less direct |
As Comrie (1989: 173) puts it, ‘the kind of formal distinction found across languages is identical: the continuum from analytic via morphological to lexical causative correlates with the continuum from less direct to more direct causation’.
Although this idea has been very popular, it is somewhat surprising that empirical evidence for it has been equivocal (Escamilla 2012; Bellingham et al. 2020). The most important problem is probably that directness and indirectness of causation can be defined in different ways, which presents a challenge for studying causative constructions (cf. Bellingham et al. 2020). For example, one can speak about the spatiotemporal integration of events. Consider a famous example from Fodor (1970):
| (4) a. | John caused Bill to die (on Sunday by stabbing him on Saturday). |
| b. | John killed Bill (*on Sunday by stabbing him on Saturday). |
In (4a), the causing and caused events are not spatiotemporally integrated, and the causation is indirect. In contrast, in (4b) the events must occur at the same time and in the same place, and the causation is direct. Formally, the periphrastic causative construction cause to die in (4a) represents the causing and caused events separately, whereas the lexical causative verb kill in (4b) contains both of them in one verbal root.
Another factor defining (in)directness is the presence or absence of physical contact between the participants (Haiman 1983). An illustration is provided in (5), taken from Haiman (1983: 784). In (5a), an instance of indirect causation, the Causer employs some unnatural force (e.g., magic or telekinesis) in order to cause the cup to rise without touching it. In contrast, in (5b) the Causer uses their own physical force to raise the cup and therefore has direct physical contact with the object.
| (5) a. | I caused the cup to rise to my lips. |
| b. | I raised the cup to my lips. |
Moreover, direct causation has been defined as causation in which the Causer is the main source of energy responsible for the caused event (cf. Verhagen and Kemmer 1997). When causation is indirect, there is some other source. For example, stabbing someone dead represents an instance of direct causation because the energy comes from the Causer. In contrast, imagine that someone tampers with another person’s gun ammunition, so that the owner kills themselves. This would be an example of indirect causation.
The role of the Causee is crucial. Here, I use this term to refer to the participant immediately following the Causer in the causation chain, which performs an action or undergoes a change of state. If the Causee is animate, it can be the main source of energy. The Causer can make the causation indirect by giving directions to the Causee (so-called directive causation). However, this is only possible when the Causee is agentive and responds to the causing event (e.g., the Causer’s command or request) by performing an action, as in (6a). When this agency is not present, e.g., when the Causee is asleep, as in (6b), the causation is direct despite the fact that the Causee is animate.
| (6) a. | She made the children lie down. |
| b. | She laid the children down. |
This account also agrees with Givón’s, who predicts that periphrastic causatives are more likely to code causation with a human-agentive ‘manipulee’ (i.e., Causee), whereas morphological and lexical ones are more likely to code causation with an inanimate manipulee (Givón 1990: 556).
Indirect causation is also associated with transitivity of the predicate that expresses the caused event, as in the next example, where the Causee (the mechanic) serves as an intermediary in bringing about the change in the Affectee (my transmission), or the end point of causation.
(7) I had the mechanic fix my transmission.
Since causation chains with transitive predicates are longer and involve more participants, the causation is considered indirect. Similarly, in Bellingham et al. (2020), indirectness is operationalized as the presence of a mediator (an agentive human or an instrument).
A special case of indirect causation is so-called curative causation, where the Causer has something done by the Causee. A typical example is when the action is a service provided by the Causee professionally:
(8) I had my hair cut (by the hairdresser).
The Causee is backgrounded and can be omitted, since it is not important who performs the action.
Moreover, one can also regard letting and permission as instances of indirect causation. In Talmy’s Cognitive Semantics, letting is defined as non-impingement, or cessation of impingement. The Causee’s intrinsic tendency towards rest or motion is not changed by the Causer (Talmy 2000: 417–421), which means that the Causer is not the source of energy for the caused event or state. Compare direct causation in (9a) with non-impingement in (9b) and cessation of impingement in (9c):
| (9) a. | She rolled the stone up the hill (i.e., using mostly her own energy). |
| b. | She let the rain cover her dry footprints (i.e., by non-interference). |
| c. | The detective released the criminal’s arm and let him fall from the roof (i.e., by removing the obstacle to the force of gravity). |
Directness and indirectness can manifest themselves differently, and should be thought of as prototypical categories with several features rather than semantic primitives. The above-mentioned features are summarized in (10). They are strongly correlated in language use (Levshina 2016). Note that this list includes the dimensions of (in)directness discussed in Bohnemeyer et al. (2010): mediation, contact and force dynamics.
(10) Semantic features relevant for (in)directness of causation:
| | Direct causation | Indirect causation |
|---|---|---|
| a. | Spatiotemporal integration | Lack of spatiotemporal integration |
| b. | Physical contact | Lack of physical contact, causation by other means (e.g., communication) |
| c. | Causer is the main source of energy | Causee or another force (e.g., magic) is the main source of energy |
| d. | Causee is affected | Causee is an intermediary or agent |
| e. | Short causation chain (two participants, intransitive predicate expressing the effect) | Long causation chain (three participants, transitive predicate expressing the effect) |
| f. | Impingement (‘making’) | Lack or cessation of impingement (‘letting’) |
It is possible to make other semantic distinctions, as well. One of them is whether the Causer is acting intentionally or accidentally. The sentence in (11a) is an example of intentional causation, whereas (11b) exemplifies accidental causation.
| (11) a. | The thief opened the safe with a key. |
| b. | Oops, I’ve just broken your Ming vase! |
This distinction may correlate with (in)directness, but it represents a dimension on its own. A Causer acting accidentally does not necessarily act indirectly, as in (12a), where the Causer has physical contact with the object, while acting intentionally does not mean acting directly, as in (12b), which represents an example of subtle manipulation by using communication.
| (12) a. | Sorry, I’ve broken your iPad by sitting on it. |
| b. | So how do you make it so that he does want to text you back? |
Other semantic types include forceful and comitative causation, which are also difficult to interpret in terms of (in)directness, without making the latter distinction vacuous. These are discussed in the next sections.
The diverse interpretations of (in)directness often make it difficult to evaluate and compare the results of studies where the form–meaning correlation in (3) is tested. In what follows I will demonstrate that typological and corpus data support this correlation. See also Levshina (2016), where the use of analytic and lexical causatives in European languages is correlated with different parameters related to (in)directness, supporting the cross-linguistic generalization in (3).
The main focus of this chapter, however, is on the functional motivations for this generalization. It is traditionally explained by an iconic correspondence between form and function: ‘[t]he linguistic distance between expressions corresponds to the conceptual distance between them’ (Haiman 1983: 782). The closer two events or objects are conceptually, the closer to each other the elements that express them will be. For example, in Fe’fe’ Bamileke (Hyman 1971), two clauses can be separated by a coordinating conjunction nī ‘and’, as in (13a). The sentence conveys a strong implication that the events do not represent one unit. If the clauses are merely juxtaposed, there is a strong implication that the events take place at roughly the same time (Haiman 1983), as in (13b).
| (13) Fe’fe’ Bamileke (Haiman 1983: 788) | |||||||
| a. | à | kà | gén | ntēe | nī | njwēn | lwà’ |
| he | pst | go | market | and | buy | yams | |
| ‘He went to the market and also (at some later date) bought yams.’ | |||||||
| b. | à | kà | gén | ntēe | njwēn | lwà’ | |
| he | pst | go | market | buy | yams | ||
| ‘He went to the market to buy yams.’ | |||||||
In this chapter, I will argue that efficiency provides a better explanation for cross-linguistic variation of causatives than iconicity of cohesion or distance, developing the line of argumentation in Haspelmath (2008c). My explanation involves asymmetries in the accessibility of the meanings expressed by different causative constructions. These asymmetries also explain the differences in the degree of conventionalization and grammaticalization of the constructions. I will argue that the principle of negative correlation between accessibility and costs, which involves the accessibility of different types of causative situations, plays a central role in such efficient form–meaning pairings. The meanings expressed by more compact causatives are more accessible (that is, direct causation and other frequent causation types), whereas the meanings expressed by less compact causatives are less accessible (indirect causation and other less frequent types). Less compact causatives normally have more costly forms in terms of articulation effort and time because they are usually longer. But one can also speak about asymmetries in processing effort (in particular, extraction from long-term memory and integration of parts of a periphrastic expression).
These ideas are compatible with a pragmatic account of causatives based on Levinson’s (2000) I- and M-implicatures (see Section 1.4.2). It is argued that typical lexical causatives, such as stop in (14a), trigger an I-implicature that the causation event is expected and typical. Analytic (periphrastic) causatives, such as get to stop in (14b), are ‘marked’ and therefore generate an M-implicature that the situation is untypical.
| (14) a. | Ann stopped the car. |
| (I-implicature → in the usual way, i.e., by putting her foot on the brake pedal) | |
| b. | Ann got the car to stop. |
| (M-implicature → in an unusual way, e.g., by using the emergency brake or crashing into a lamppost) | |
We can also interpret these expressions in terms of efficiency. Less costly expressions are matched with more accessible interpretations, while more costly expressions are matched with less accessible interpretations. The main difference is that we speak about the match between articulatory effort and accessibility, instead of a match between (un)typical forms and (un)typical meanings.
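The matching described here can be illustrated with a small numerical sketch. It is only a toy model under stated assumptions: the frequencies are invented, not corpus counts, and the number of words stands in as a crude proxy for articulatory cost. The names `expected_cost`, `FREQUENCY` and `COST` are hypothetical.

```python
def expected_cost(pairing, frequency, cost):
    """Average cost per use when each meaning is expressed by its paired form."""
    total_uses = sum(frequency.values())
    weighted = sum(frequency[meaning] * cost[form]
                   for meaning, form in pairing.items())
    return weighted / total_uses


# Hypothetical frequencies of the two interpretations (assumption, not data).
FREQUENCY = {"typical": 80, "untypical": 20}

# Cost proxy: number of words in the form (assumption).
COST = {"stop": 1, "get to stop": 3}

# Efficient pairing: the cheap form carries the more accessible meaning.
EFFICIENT = {"typical": "stop", "untypical": "get to stop"}

# Reversed pairing: the cheap form carries the less accessible meaning.
REVERSED = {"typical": "get to stop", "untypical": "stop"}

EFFICIENT_COST = expected_cost(EFFICIENT, FREQUENCY, COST)
REVERSED_COST = expected_cost(REVERSED, FREQUENCY, COST)

print(EFFICIENT_COST, REVERSED_COST)
```

Under these invented figures the efficient pairing yields a lower average cost per use than the reversed one. This holds for any figures in which the more frequent meaning is paired with the shorter form.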
The arguments that support these ideas are the following.
1. The cross-linguistic variation of causative constructions with regard to their compactness is not restricted to (in)directness. There are other semantic parameters that are correlated with different degrees of formal compactness. This argument is developed in Section 7.2. I will argue that all these correlations, including the one related to (in)directness, can be explained by the accessibility of the corresponding causative meanings.
2. The cross-linguistic variation of causative constructions that express direct and indirect causation correlates more strongly with length differences than with the formal autonomy of the elements that express the cause and the effect, or the distance between them, as the iconicity account would predict. This is demonstrated in the typological study in Section 7.3.
3. Finally, one can model the development of efficient formal asymmetries in causatives without any iconic correlations. This is demonstrated in an artificial language learning experiment reported in Section 7.5. Artificial language learning is a valuable addition to the diachronic evidence showing how the efficient division of labour between causatives has emerged in some languages (see Section 7.4).
7.2 More than Just Direct and Indirect Causation
7.2.1 Causatives around the World
It was observed by Dixon (2000) that more and less compact causatives vary not only with regard to (in)directness of causation, but also with regard to other parameters, such as involvement of the Causer in the caused event and the Causer’s intentions. If this is true, then the iconicity account presented in Section 7.1 is too narrow. At the same time, we will see that the principle of negative correlation between accessibility and costs can deal with the multifactorial variation perfectly.
Let us begin with the scale of formal compactness in Dixon (2000), which decreases from lexical causatives (15a) to periphrastic causatives (15d):
| (15) a. | lexical causatives, e.g., break(tr) or walk(tr); |
| b. | morphological causatives, e.g., internal or tone change, reduplication, or affixation; |
| c. | complex predicates, e.g., serial verbs, French faire ‘make’ + V(inf), or causative particles; |
| d. | periphrastic causatives, which consist of verbs that belong to separate clauses, e.g., French laisser ‘let’ + NP + Infinitive or Portuguese fazer ‘make’ + (NP) + Infinitive. |
According to Dixon, the degree of compactness is correlated with different semantic and syntactic features, as shown in Table 7.1 (Dixon 2000: 76). If a language has two different causative forms, a more compact and a less compact one, they will differ along one or more of the parameters.
Table 7.1. Correlation between formal compactness and semantic and syntactic parameters according to Dixon (2000)
| | More compact forms | Less compact forms |
|---|---|---|
| 1. | non-causal verb describing a state | non-causal verb describing an action |
| 2. | intransitive (or intransitive and simple transitive) non-causal verb | transitive (or ditransitive) non-causal verb |
| 3. | Causee lacking control | Causee having control |
| 4. | Causee willing (‘let’) | Causee unwilling (‘make’) |
| 5. | Causee partially affected | Causee fully affected |
| 6. | direct causation | indirect causation |
| 7. | intentional causation | accidental causation |
| 8. | causation occurring naturally | causation occurring with effort |
The ninth parameter discussed by Dixon is involvement of the Causer in the caused event. Yet Dixon did not find any correlations between this parameter and the degree of compactness. Note also that the fourth parameter in the table predicts more compact forms for willing Causees (letting), and less compact forms for unwilling Causees (making). It seems that two different distinctions are conflated here. The first distinction is whether the Causee resists the Causer’s action or not. This is reflected in the eighth parameter, i.e., whether the causation occurs naturally or with effort. The second distinction is that between making and letting, or factitive and permissive causation. The typological data, which are presented below, as well as previous corpus-based research (Levshina 2016), show clearly that making is expressed by more compact forms than letting.
Unfortunately, Dixon does not provide a clear definition of direct and indirect causation. From the examples, however, one can infer that causation is direct when the Causer performs the caused event personally, by physically manipulating an object, while indirect causation means that the causation happens through someone or something else. The distinctive conceptual features are thus physical contact and mediation (see Section 7.1). Moreover, some of the other distinctions fit our broad definition of (in)directness based on the semantic features listed in (10). In particular, transitivity reflects the length of a causation chain. Transitive verbs will form causative constructions with three participants, which means that causation can be indirect. Also, controlling Causees are agentive, which can be interpreted as a sign of indirect causation. So, some of Dixon’s parameters are closely related to (in)directness of causation in the maximally inclusive sense, which was discussed in the previous section.
But there are other parameters, as well. Let us have a look at intentional and accidental causation. In Section 7.1 it was argued that this distinction cannot be reduced to (in)directness. According to Dixon, if the Causer acts intentionally, the chances of a more compact form are higher than if the causation is accidental. An example can be found in Kammu, an Austro-Asiatic language spoken in Laos. In Kammu, the prefix p(n)- expresses intentional causation, whereas the particle tòk expresses accidental causation. Therefore, intentional causation is expressed by a more compact form than accidental causation.
| (16) Kammu: Austro-Asiatic (Svantesson 1983: 103–111, cited from Dixon 2000: 70) | ||||
| a. | kə̀ə | p-háan | tráak | |
| 3sg+m | caus-die | buffalo | ||
| ‘He slaughtered the buffalo.’ | ||||
| b. | kə̀ə | tòk | háan | múuc |
| 3sg+m | caus | die | ant | |
| ‘He happened to kill the ant (e.g., by accidentally treading on it).’ | ||||
Effortful vs. natural causation, as well as full vs. partial affectedness of the Causee, are also difficult to interpret in the sense of (in)directness, even if we use the broad definition proposed in Section 7.1.
The main question of this section is, do we find evidence of these correspondences between formal compactness and semantic features in languages of the world? We are particularly interested in the features beyond (in)directness of causation. Will Dixon’s observations still hold if we obtain more data from diverse languages?
In order to answer this question, I took a sample of fifty-nine languages, each from a different language family, in which at least two causative constructions were described. The data come from reference grammars. Lexical causatives were excluded (usually grammars provide very little, if any, information about their shared functions), except for labile verb alternations. The list of languages and references is provided in Appendix 1.
The causative constructions were then analysed semantically and formally, and all possible pairs of causatives were compared within each language. I found information about the semantic differences (which was either provided explicitly by the grammars or could be inferred from the examples) in the pairs from fifty-three languages (see more information in Appendix 1). Only these constructions are analysed in this section.
Compactness was determined according to Dixon’s scale in (15). Labile verbs (e.g., burn or melt) were considered more compact than morphological causatives, whereas light verb constructions were considered more compact than serial verb constructions. Causatives with clitics were considered more compact than analytic causatives, but less compact than morphological causatives. Whether a causative was analytic, morphological, or something in-between, was determined according to the descriptions provided in the grammars. If two causatives belonged to the same type, their length was used as a criterion of compactness, following Dixon (2000: 75).
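The comparison procedure described above can be sketched in code. This is a minimal illustration, not the procedure actually used for the survey. The type labels, the numeric ranks, the placement of light verb and serial verb constructions between clitics and periphrastic causatives, and the use of character counts as a length measure are all assumptions. The names `Causative`, `TYPE_RANK` and `compare_pairs` are hypothetical.

```python
from dataclasses import dataclass
from itertools import combinations

# Lower rank = more compact. The relative order follows the text;
# the labels and numeric values are illustrative assumptions.
TYPE_RANK = {
    "labile": 0,
    "morphological": 1,
    "clitic": 2,
    "light_verb": 3,
    "serial_verb": 4,
    "periphrastic": 5,
}


@dataclass(frozen=True)
class Causative:
    language: str
    label: str
    ctype: str   # one of the keys of TYPE_RANK
    length: int  # assumed length measure (here: characters of the marker)


def compactness_key(c):
    """Sort key: construction type first, length as the tie-breaker."""
    return (TYPE_RANK[c.ctype], c.length)


def compare_pairs(causatives):
    """Return (more_compact, less_compact) for every within-language pair."""
    by_language = {}
    for c in causatives:
        by_language.setdefault(c.language, []).append(c)
    pairs = []
    for items in by_language.values():
        for a, b in combinations(items, 2):
            key_a, key_b = compactness_key(a), compactness_key(b)
            if key_a == key_b:
                continue  # same type and same length: no decision possible
            pairs.append((a, b) if key_a < key_b else (b, a))
    return pairs


# Two same-type pairs from the chapter; lengths are rough character counts.
EXAMPLES = [
    Causative("Kayardild", "-THarrma-tha", "morphological", 10),
    Causative("Kayardild", "-lu-tha", "morphological", 5),
    Causative("Mutsun", "-mpi", "morphological", 3),
    Causative("Mutsun", "-si", "morphological", 2),
]
PAIRS = compare_pairs(EXAMPLES)

for more, less in PAIRS:
    print(f"{more.language}: {more.label} is more compact than {less.label}")
```

Because both members of each example pair are suffixes, the ordering is decided by length alone, which is the tie-breaking criterion named in the text.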
Table 7.2 presents the semantic features of the less compact form in a pair of causatives. This is done because in many cases only the less compact form has a special semantic description in a grammar, whereas the more compact form is treated merely as a valency-increasing device, or the ‘default’ causative. One example comes from Trumai, a language isolate from South America. In (17a), the default causative with the particle ka is used. In (17b), one can see the periphrastic causative with the verb tao ‘order/give order’, which means that the periphrastic construction represents causing someone to do something by order.
| (17) Trumai: isolate (Guirardello 1999: 302, 307) | |||||
| a. | hai-ts | Yakair-ø | sa | ka. | |
| 1-erg | Yakairu-abs | dance | caus | ||
| ‘I made Yakairu dance.’ | |||||
| b. | hai-ts | ka_in | [Atawaka-ø | pa] | tao. |
| 1-erg | foc/tns | Atawaka-abs | marry | order | |
| ‘I ordered Atawaka to marry.’ | |||||
Table 7.2. Different types of causation in the typological sample, the meaning of the less compact form
| The less compact form expresses more/more often… | Languages in the sample | Number of languages |
|---|---|---|
| Indirect causation | Ma’di, Gumuz, Humburi Senni, Kayardild, Kusunda, Chimariko, Hebrew, Humburi Senni, Basque, Betta Kurumba, Yukaghir (Kolyma), Creek, Japanese, Urarina | 15 |
| Directive causation (as opposed to manipulative) | Diyari | 1 |
| Agentive or volitional Causee | Aguaruna, Cherokee, Lakhota, Motuna | 4 |
| Causation by communication (e.g., ordering) | Trumai, Great Andamanese | 2 |
| Mediated causation | Hindi | 1 |
| Factitive causation with a human intermediary | Noon | 1 |
| ‘Indefinite’ causation (have something done) with a backgrounded Causee | Ainu | 1 |
| Weaker integration of events | Apinayé, Takelma | 2 |
| Distant causation (vs. contact causation) | Nivkh | 1 |
| ‘Mild’ causation | Caddo | 1 |
| Causee as beneficiary | Tubu/Dazaga | 1 |
| Formed from dynamic verbs, actions (vs. states) | Wappo, Garrwa, Finnish | 3 |
| Letting, permissive (vs. making, factitive) | Ma’di, Kusunda, Finnish, Trumai, Hebrew, Teribe | 6 |
| Forceful causation | Basque, Wappo, Ik, Finnish | 4 |
| Non-volitional, not intentionally acting Causer | Tidore, Adang, Apinayé | 3 |
| Involved Causer | Cavineña | 1 |
| Distributive causation | Yukaghir (Kolyma) | 1 |
| Iterative causation | Yukaghir (Kolyma) | 1 |
| ‘Resultative’ causation (keep X in a certain state) | Yukaghir (Kolyma) | 1 |
| Ballistic causation | Hup | 1 |
Note that some languages are mentioned more than once because they have more than one pair of causatives that can be compared.
It is a difficult question whether the meaning of direct, intentional, non-forceful, factitive, etc. causation, which is not expressed by the more semantically specialized constructions, is encoded in the default causatives, or if it should be pragmatically inferred on the basis of Q-implicatures (Levinson 2000, see also Sections 1.4.2 and 5.3.2). The addressee can reason, ‘The speaker has not used a more semantically specific construction, therefore this meaning is not implied here.’ If the more specific construction becomes sufficiently frequent, Q-implicatures of this kind will become conventionalized (Bybee 1994). That is, the more compact causative will become conventionally associated with direct, intentional, etc. causation. To what extent this conventionalization has taken place is difficult to judge from the available descriptions, but this does not prevent us from assigning the more accessible meanings to the default causatives.
Moreover, I have encountered several combinations of features of the less compact form:
making/letting/compelling (Khoekhoe: Khoe-Kwadi);
permissive and not implicative (Waimiri-Atroarí: Cariban);
permission or coercion (Lahu: Sino-Tibetan, Slave: Na-Dene);
indirect and/or non-implicative (Korean: isolate);
indirect and/or unintentional (Indonesian: Austronesian, Motuna: East Bougainville, Filomeno: Totonacan);
‘weak’ causation with the semantics of motion, i.e., ‘send’ (Yagua: Peba-Yaguan).
The distinction between direct and indirect causation is the most frequent one in the sample, especially if we also consider the features that can be interpreted as indirect causation using the list in (10): directive, mediated or distant causation, causation with agentive or volitional Causee, letting, and some others. So, it is not by chance that the distinction between direct and indirect causation plays a special role in the typology of causatives. Of course, we cannot exclude that this distinction is reported more frequently because it was introduced in influential works.
And yet, we also observe features, such as forceful, non-intentional, distributive and iterative causation, which are more difficult to interpret in terms of (in)directness. Note that Dixon’s observations about the major types are mostly supported by the data, with the exception of letting, which is clearly expressed by less compact forms (see also Levshina 2016). There are also two problematic cases: Kayardild (Tangkic, Australia) and Mutsun (Penutian, North America). In Kayardild, the causative suffix expressing direct causation is actually longer and therefore less compact than the one expressing indirect causation, as shown in (18). However, the indirect causative suffix {-lu-tha} is also used in the factitive function, which means ‘cause to be in a state’ (Evans 1995: 355). This functional overlap makes it difficult to say which of the constructions in general is more direct and which is less direct, since causing a state is usually associated with less agentive Causees.
| (18) Kayardild: Tangkic (Evans 1995: 355) | |
| a. | direct causation: suffix -THarrma-tha |
| thulatha ‘descend’ > thulatharrmatha ‘take down’ | |
| dalija ‘come’ > dalijarrmatha ‘bring’ | |
| b. | indirect causation: suffix {-lu-tha} |
| dulbatha ‘sink (intr)’ > dulbalutha ‘cause to sink, drown’ (e.g., by shooting and not allowing to get out of water) | |
The other problematic case is found in Mutsun, where the mediopassive-causative suffix ‑mpi (causing a change of state) is actually longer than the active causative ‑si (making someone do something). An example is provided in (19), where (19a) illustrates the causative with ‑mpi and (19b) the causative with ‑si.
| (19) Mutsun: Penutian (Okrand 1977: 216, 219) | ||||
| a. | mala-n ‘to get wet’ > mala-mpi- ‘to cause (someone) to get wet’ | |||
| b. | ka·n-was | lolle-si-Ø | sinnise | |
| I-him | babble-caus-npst | baby.obj | ||
| ‘I made the baby babble.’ | ||||
This exception can be explained historically: the suffix ‑mpi in fact represents a fusion of the mediopassive suffix ‑n and the suffix ‑pi, which no longer occurs autonomously (Okrand 1977: 215–216).
And yet, we see that Dixon’s predictions are overall supported. Remarkably, we find one language in which involvement of the Causer in the activity (in addition to the Causee) is expressed by a less compact form (Cavineña, a Tacanan language), although Dixon did not find any formal asymmetries in his data (Dixon 2000: 75). We also find some features that were not mentioned, such as iterative, distributive, resultative causation (Yukaghir (Kolyma)), ballistic causation (Hup: Nadahup) or causation with a beneficiary Causee (Tubu/Dazaga: Saharan). In addition, in Manambu, a Sepik language, verbal cause–effect compounds express the specific type of causing event, e.g., vya-puti- (hit-fall.off) ‘shake something off by hitting, e.g., dust from a mat or a sheet’. Compare those with caused motion constructions and resultative constructions in English, e.g., throw the ball into the street or paint the door green (e.g., Hampe 2011).
Since many of the distinctions go beyond (in)directness, the iconicity account is problematic. It is more natural to explain the findings in Dixon (2000) and in my typological survey by the principle of negative correlation between accessibility and costs. Less compact forms are more costly, and they are associated with less accessible functions, such as indirect, accidental, and other rare types, including the ‘exotic’ ones mentioned in the paragraph above. Note that the Manambu case fits this explanation well because specific types of causation should be less frequent and therefore less compact than non-specific, generic causation types. I expect that as we study more and more languages, the number of possible semantic distinctions will grow asymptotically, never exhausting all possible semantic shades. At the same time, all of them will have one thing in common: the rare types of causation will be expressed by less compact forms.
But how do we know that the features expressed by the more costly forms are less accessible? This will be shown in the next section, where we will look at corpus frequencies of different causation features.
7.2.2 How Accessible Are Different Causative Meanings?
This section presents spoken corpus data from three languages (English, Lao and Russian), which show very clearly that the features expressed cross-linguistically by more compact forms are more accessible than those expressed by less compact forms. It follows that the formal differences between causatives are related to the accessibility of the causative meanings they express, such that forms and meanings are paired in an efficient way.
To obtain the frequencies, I used spoken corpora in the three languages. For English, I took samples of text from fourteen spontaneous informal conversations in the Santa Barbara Corpus of Spoken American English (Du Bois et al. 2000–2005). I searched manually for all kinds of causative meanings in which one could distinguish, at least potentially, the Causer, the Causee, and the causing and caused events. The constructions were transitive verbs (lexical causatives, such as break and kill), analytic causatives (e.g., make/let/force/order/help + (to) Infinitive), and resultative constructions (e.g., keep X in a certain state). In total, I obtained 205 causative situations.
For Lao, I took the transcripts of Enfield’s (Reference Enfield2007) dialogues from the appendix of his grammar of Lao. These are five dialogues about family, agriculture, fishing and work. I found only sixty instances in the entire corpus.
For Russian, I took one large text from Zemskaja and Kapanadze (Reference Zemskaja and Kapanadze1978), which contained the transcripts (with additional contextual information) of one day in a Soviet family. It includes all interactions between the wife, the husband, their son and the husband’s mother during one day. It gives an idea of typical linguistic behaviour of educated Russian speakers in the 1970s. The family members speak about food, health, childcare and home-making. The total number of causative examples was ninety.
The examples from the corpora were coded for several variables, which represent different types of causation expressed cross-linguistically by the less compact form. First, there is a block of variables representing different shades of (in)directness known in the literature. They are not orthogonal to one another, or mutually exclusive. They are followed by several other features, which I was able to code in the corpora.
1. ‘No Overlap’: There is no temporal or spatial overlap between the Causer’s actions (or non-interference) and the event or state that corresponds to what happened with the Causee. Example: The professor had her students keep a diary.
2. ‘Human Causee’: The Causee is human. Examples: She laid the children down; She made the children lie down.
3. ‘Controlling Causee’: The Causee is in control of the caused event. In other words, the Causee can choose, in principle, whether to perform what the Causer causes or allows the Causee to do. Example: The professor had her students write long term papers, where the students can choose, in principle, whether they comply or not.
4. ‘Caused Action’: The Causee performs an action (rather than gets into or keeps being in a certain state). Example: The general had her troops run 10 miles. The caused situation should be dynamic and the Causee should be in control (see Parameter 3).
5. ‘Communication’: The Causer uses only communication in order to achieve the outcome. Example: John talked his grandparents into sponsoring his album.
6. ‘Human Intermediary’: The situation implies a human intermediary, who participates so that the caused event takes place. Example: She made him dig a hole in the ground.
7. ‘Letting’: The causing event is permissive. Example: He let the child play in the yard.
8. ‘Forceful’: Forceful causation, as opposed to natural. Causation requires more effort from the Causer than usual. It is also possible to paraphrase the Causer’s action with ‘force’. Example: Ann forced Peter to sign the agreement.
9. ‘Non-intentional’: The Causer affects the Causee unintentionally, or is incapable of intentional actions (e.g., inanimate). Example: John broke the window when he was playing football.
10. ‘Involved Causer’: The Causer is involved in the caused event. In other words, the Causer performs the caused action or is in the caused state together with the Causee. Example: Susan brought her friends to the party (and came herself).
11. ‘Causee Benefits’: The Causee benefits from the caused event. Example: John fed the child.
12. ‘Non-implicative’: There is a possibility that the caused event does not actually happen. Example: John ordered Bill to surrender (but Bill did not do it).
13. ‘Distributive’: The caused event occurs several times, each time with a different Causee. Example: John baked a cake on Wednesday and brownies on Thursday.
14. ‘Keeping’: The event can be paraphrased as ‘keep X in a certain state or location’. Example: Ann kept all her savings under the mattress.
15. ‘Iterative’: The causation repeats several times (with the same Causee). Example: The gamer had to kill the villain again and again, until the villain had no more lives left.
16. ‘Assistive’: The Causer helps the Causee to perform the caused event. Example: John walked the child into the room.
In a few cases, it was difficult to determine the value due to lack of additional context, but the proportion of missing values was never greater than 4 per cent of the total number of examples in each of the three languages.
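The coding and counting step can be illustrated with a minimal Python sketch. It is not the procedure actually used for the study: the four toy records, their values and the function name `feature_share` are hypothetical, and `None` stands for a value that could not be determined from the context. As in Figure 7.1, the denominator is the total number of causative situations.

```python
from typing import Optional

# Hypothetical toy records (NOT the corpus data): each causative situation is
# coded for binary features. True/False = feature present/absent;
# None = value could not be determined from the context.
Situation = dict[str, Optional[bool]]

toy_sample: list[Situation] = [
    {"HumanCausee": True,  "Letting": False, "NonIntentional": False},
    {"HumanCausee": False, "Letting": False, "NonIntentional": True},
    {"HumanCausee": None,  "Letting": True,  "NonIntentional": False},
    {"HumanCausee": False, "Letting": False, "NonIntentional": False},
]


def feature_share(sample: list[Situation], feature: str) -> tuple[float, float]:
    """Return (percentage with the feature, percentage of missing values),
    both computed over the total number of situations in the sample."""
    total = len(sample)
    present = sum(1 for s in sample if s.get(feature) is True)
    missing = sum(1 for s in sample if s.get(feature) is None)
    return 100 * present / total, 100 * missing / total


for name in ("HumanCausee", "Letting", "NonIntentional"):
    share, gaps = feature_share(toy_sample, name)
    print(f"{name}: {share:.1f}% present, {gaps:.1f}% missing")
```

Keeping the missing values in the denominator makes the percentages conservative: a feature is counted as present only when the context clearly supports it.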
Figure 7.1 presents the proportions of the functions that are cross-linguistically expressed by less compact forms. One can see that none of them accounts for more than a third of all instances of causation in any of the corpora. This means that compact forms express more frequent functions. Interestingly, the Causee is more frequently human, controlling, performing a caused action and serving as an intermediary in Lao than in the other two languages. This can be explained by the fact that the longest of the dialogues contains a discussion of employment, in particular, situations when the boss has the servant do something. Still, these types of causation are not frequent. The frequency of beneficiary Causees in the Russian data is relatively high because the language users often speak about childcare (feeding, dressing, putting to sleep, etc.). Assistive causation is not present in the data.

Figure 7.1 Percentage of the total number of causative situations in corpora of three languages
To summarize, the data from informal spontaneous spoken dialogues demonstrate that the features of causative situations expressed by less compact forms across languages are less frequent than the features represented by more compact forms. I argue that this pattern arises due to the principle of negative correlation between accessibility and costs. Since iconicity relies on (in)directness as the core semantic parameter correlating with formal distance between the elements expressing causing and caused events, it fails to explain the full range of cross-linguistic variation.
7.2.3 Taking a Multifactorial Approach: Evidence from a Parallel Corpus
In Section 7.2.1, we looked at the distribution of individual semantic features. But they co-occur in real language use. Take the sentence, The burglar broke the window in order to get in. We can say that the causation is direct, intentional, physical, factitive, and so on. Will we see the same correlations between meaning and form if we test each semantic parameter while controlling for the others? A positive answer to this question is given in Levshina (Reference Levshina2016), where lexical and analytic causatives from fifteen European languages were compared. The goal of this section is to test the hypothesis on a sample of more diverse languages and constructions.
For this purpose, I found 387 causative situations in the English segment of the ParTy corpus of film and TED Talks subtitles.Footnote 4 The situations were coded manually for the variables representing Dixon’s parameters of semantic and syntactic variation (see Table 7.3). In addition, I coded animacy of the Causer and Causee. Next, I coded the translations of the English causatives in ten languages (Chinese, Finnish, French, Hebrew, Indonesian, Japanese, Russian, Thai, Turkish and Vietnamese), classifying them into three types: lexical, morphological and analytic (including periphrastic ones). Consider an example in (20) from the film Avatar and its translations into French, Turkish and Vietnamese. The French version contains a lexical causative, the Turkish one has a morphological one, and the Vietnamese translation has an analytic causative with the verb làm ‘make, do’.
| Avatar: (Don’t shoot), you’ll piss him off. | |||||||
| a. | French | ||||||
| Vous | allez | l’ | énerver. | ||||
| 2pl | go.2pl | him | make.nervous | ||||
| b. | Turkish | ||||||
| Onu | kız-dır-acak-sın. | ||||||
| 3.acc | get.angry-caus-fut-2sg | ||||||
| c. | Vietnamese | ||||||
| Cậu | sẽ | làm | nó | nổi | điên | đó. | |
| 2sg | fut | make | 3sg | get | mad | part | |
Table 7.3. Semantic variables used in the study based on the parallel corpus
| Variable | Abbreviation | Values | Examples |
|---|---|---|---|
| Semantics of the caused event | CausedEvent | ‘Action’ ‘NonAction’ | The teacher had the students ask questions. Sue broke the vase. |
| Number of main participants | NoPart | ‘2’ ‘3’ | Sue broke the vase. Ann made Bill steal the money. |
| Controlling Causee | CeControl | ‘Yes’ ‘No’ | The teacher had the students ask questions. Sue broke the vase. |
| Causee acting willingly | CeVol | ‘Yes’ ‘No’ | The teacher let the students leave earlier. The minister made the journalists wait for him. |
| Making or letting | MakeLet | ‘Make’ ‘Let’ | Sue broke the vase. The teacher let the students leave earlier. |
| Causer acting directly | CrDirect | ‘Yes’ ‘No’ | John broke Bill’s arm during the fight. The teacher had the students ask questions. |
| Causer acting intentionally | CrIntent | ‘Yes’ ‘No’ | The thieves broke the window to get in. Oops, I’ve broken your Ming vase. |
| Causer acting effortfully | CrForce | ‘Yes’ ‘No’ | Ann forced Bill to steal the money. The teacher had students ask questions. |
| Causer involved in caused event | CrInvolved | ‘Yes’ ‘No’ | Bring your friends! Sue broke the vase. |
Translations that were too different semantically from the original were excluded from the analysis. Next, the associations between the forms and the semantic features were tested with the help of conditional inference trees (Tagliamonte and Baayen 2012; Levshina 2020a). This was done separately for each language.
Figure 7.2 shows the conditional inference tree for analytic and lexical causatives in French. The interpretation is as follows. First, the algorithm looks for the semantic variable that is most strongly associated with the form (that is, analytic or lexical causative). This variable is making or letting (see Node 1). The algorithm makes a binary split in that variable, separating the observations with letting (the branch on the left) from the ones with making (the branch on the right). Next, it tries to find another variable that is significantly associated with the form (with p < 0.05). In the case of letting, no such variable is found. The bar plot in the leaf node (Node 2) shows that cases of letting are expressed predominantly by analytic causatives. In the case of making, the next split is made in the variable ‘Controlling Causee’ (CeControl), see Node 3. If the Causee has no control, lexical causatives are predominantly chosen (Node 7), and no further splits are made. If the Causee is in control, there is another binary split in Node 4, which is done in the variable ‘Causer acting intentionally’ (CrIntent). If the Causer acts intentionally, lexical causatives are preferred (Node 6). If not, analytic ones are more frequently chosen (Node 5).
All this suggests that the variation is multifactorial and cannot be reduced to direct or indirect causation, in either a narrow or a broad sense. We can also conclude that Dixon’s parameters predict correctly the use of more and less compact forms, even when the variables are tested simultaneously. Moreover, we see that the variables interact with each other.

Figure 7.2 A conditional inference tree for French
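The first step of the tree-growing procedure, finding the variable most strongly associated with the form, can be sketched in Python. This is a simplified illustration under stated assumptions, not the algorithm used in the study: conditional inference trees rely on permutation-based tests, whereas the sketch swaps in a plain Pearson chi-square test for 2 × 2 tables with a Bonferroni correction. The observations in `toy_rows` are invented, and the function names `chi_square_2x2` and `best_split` are hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy observations (NOT the study data): two semantic variables
# from Table 7.3 plus the form of the causative chosen in the translation.
toy_rows = (
    [{"MakeLet": "Let", "CeControl": "Yes", "Form": "analytic"}] * 8
    + [{"MakeLet": "Let", "CeControl": "No", "Form": "analytic"}] * 2
    + [{"MakeLet": "Make", "CeControl": "Yes", "Form": "analytic"}] * 3
    + [{"MakeLet": "Make", "CeControl": "Yes", "Form": "lexical"}] * 3
    + [{"MakeLet": "Make", "CeControl": "No", "Form": "lexical"}] * 12
)


def chi_square_2x2(rows, variable, outcome="Form"):
    """Pearson chi-square statistic and p-value (1 df) for two binary variables."""
    cells = Counter((r[variable], r[outcome]) for r in rows)
    row_totals = Counter(r[variable] for r in rows)
    col_totals = Counter(r[outcome] for r in rows)
    n = len(rows)
    stat = 0.0
    for a in row_totals:
        for b in col_totals:
            expected = row_totals[a] * col_totals[b] / n
            stat += (cells[(a, b)] - expected) ** 2 / expected
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


def best_split(rows, variables, alpha=0.05):
    """Return the variable most strongly associated with the form,
    or None if no Bonferroni-adjusted p-value is below alpha."""
    results = {v: chi_square_2x2(rows, v) for v in variables}
    best = min(results, key=lambda v: results[v][1])
    adjusted_p = min(1.0, results[best][1] * len(variables))
    return best if adjusted_p < alpha else None


print(best_split(toy_rows, ["MakeLet", "CeControl"]))
```

With these invented data the function selects `MakeLet`. A full tree would then repeat the same search within each branch, stopping in any branch where no variable passes the significance threshold.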
This procedure was repeated for the other languages. Where available, I tested the contrasts between analytic, morphological and lexical causatives. For Japanese, only the contrast between lexical and morphological causatives was possible, due to the fact that analytic causatives were missing. Table 7.4 shows which variables participate in the splits in the trees when the models compared analytic and lexical causatives. Table 7.5 displays the splits in the models which compared morphological and lexical causatives. Finally, Table 7.6 contains the splits relevant for the comparison of analytic and morphological causatives.
Table 7.4. Variables participating in splits that separate analytic from lexical causatives
| Model | CausedEvent | NoPart | CeControl | CeVol | MakeLet | CrDirect | CrIntent | CrForce | CrInvolved |
|---|---|---|---|---|---|---|---|---|---|
| Chinese | + | + | + | + | + | ||||
| Finnish | + | + | + | ||||||
| French | + | + | + | ||||||
| Hebrew | + | + | + | + | |||||
| Indonesian | + | + | + | + | |||||
| Russian | + | + | + | + | + | ||||
| Thai | + | + | + | + | |||||
| Turkish | + | + | + | ||||||
| Vietnamese | + | + | + | + |
Table 7.5. Variables participating in splits that separate morphological from lexical causatives
| Model | CausedEvent | NoPart | CeControl | CeVol | MakeLet | CrDirect | CrIntent | CrForce | CrInvolved |
|---|---|---|---|---|---|---|---|---|---|
| Finnish | + | ||||||||
| Hebrew | + | ||||||||
| Indonesian | |||||||||
| Japanese | + | ||||||||
| Turkish | + |
Table 7.6. Variables participating in splits that separate analytic from morphological causatives
| Model | CausedEvent | NoPart | CeControl | CeVol | MakeLet | CrDirect | CrIntent | CrForce | CrInvolved |
|---|---|---|---|---|---|---|---|---|---|
| Finnish | + | + | |||||||
| Hebrew | + | + | + | ||||||
| Indonesian | + | ||||||||
| Turkish | + | + |
Recall that Dixon (Reference Dixon, Dixon and Aikhenvald2000) did not find any formal asymmetries that correspond to the Causer’s involvement in the caused event (action). The conditional inference tree of lexical and morphological causatives in Hebrew shows that the proportion of morphological causatives is significantly higher in contexts with the Causer’s involvement than in the other cases. This supports the finding reported in the typological case study above.
It is notable that the variable ‘making or letting’ (MakeLet) is particularly strongly associated with analytic causatives, separating them from the other types. It participates in a split in every language. This may be due to the fact that English subtitles have many instances of letting, which can be explained culturally. According to Wierzbicka (Reference Wierzbicka2006: Section 6.2.3), letting is an important category in Anglo-Saxon culture because it is associated with non-interference, non-imposition and personal freedom. Also, actions as caused events and longer causation chains (more than two participants) are strong cues for analytic causatives. The contrasts between lexical and analytic causatives are particularly strong and involve many different variables. This is not surprising, since these constructions represent two ends of the causative continuum, so the differences between them must be particularly striking. Lexical and morphological causatives are the least distinguishable (with no significant differences in Indonesian), and their differences are the least systematic across the languages. Forceful (effortful) causation is rare in the parallel corpus data, which probably explains why it is not relevant.
Obviously, we need to perform similar analyses on other data sources and text types. It is reassuring, though, that the text source (i.e., the film or TED Talk where the causation situations are mentioned) appeared only in one Finnish tree. This means that the form–meaning associations reported here do not depend on the translator’s whim. But even if the prominence of some situations (in particular, letting) in the English source texts has an effect on the results, what we see here is enough to conclude that the formal variation depends on multiple factors. Although the variables that can be interpreted in terms of direct and indirect causation do play a central role, we see that intentional and accidental causation is also quite powerful across the languages. Importantly, all associations between form and meaning are in line with the efficiency predictions based on accessibility of the semantic features. Finally, it is notable that the data support the correlation between semantic directness and formal integration. In the next section, we will zoom in on this correlation and discuss its multiple explanations.
7.3 Competition between Formal Parameters
This section focuses on the formal parameters of causatives associated with direct and indirect causation. The goal is to demonstrate that the principle of negative correlation between accessibility and costs explains the associations better than iconicity does. I will also discuss productivity, which has been proposed as a factor correlated with (in)directness.
When speaking about the formal variation of causatives, it is convenient to use Haiman’s (Reference Haiman1985: 105) scale of linguistic distance, which is shown in (21).
| a. | X # A # B # Y |
| b. | X # A # Y |
| c. | X + A # Y |
| d. | X # Y |
| e. | X + Y |
| f. | Z |
In this cline, X and Y are the linguistic expressions of interest that express the cause and the effect in a causative construction, A and B are other intervening units, # represents a word boundary, + stands for a morpheme boundary, and Z is a morpheme where X and Y are fused. It is important to note that Haiman’s scale incorporates two related but distinct formal distinctions: formal distance and autonomy of X and Y. In some contrasts, they overlap. Take the types (21e) X + Y and (21f) Z, which differ both in distance and autonomy. But this is not always the case. For example, the types (21a) X # A # B # Y and (21b) X # A # Y differ only in the distance between X and Y, but not in their autonomy. In contrast, (21d) X # Y and (21e) X + Y differ only in autonomy, but not in distance. The difference between autonomy and distance will be important later in this section.
To give a simple illustration, lexical causatives like kill have both zero autonomy and zero distance, because X and Y are perfectly fused in one morpheme. Compare that with an example of an analytic causative. The causing and caused event in the sentence John caused Bill to die are represented by two relatively autonomous units, i.e., the words cause and die, and are separated by the past-tense morpheme ‑ed, the proper name Bill and the particle to. Under the iconicity account, greater autonomy and longer distance are associated with less direct causation, while dependent and closely located X and Y will convey more direct causation.
There exists yet another formal parameter, which was discussed by Shibatani and Pardeshi (Reference Shibatani, Pardeshi and Shibatani2002: Section 5), who argue that indirectness of causation correlates with the degree of productivity of constructions: productive forms tend to express indirect causation, whereas lexically restricted forms are more associated with direct causation. Japanese morphological causatives provide a good illustration. For instance, the verb oros- ‘bring down’ from ori- ‘come down’, which expresses direct causation, is a non-productive causative. In contrast, the form ori-sase- ‘cause to come down’, which is formed with a productive suffix ‑(s)ase, is productive and expresses indirect causation. Shibatani and Pardeshi claim that (in)directness is more strongly correlated with productivity than with the traditional formal distinction between lexical, morphological and analytic causatives.
Another piece of evidence comes from Amharic, which has the causative prefixes a- and as- (Amberber 2000). The prefix a- is not productive. It applies only to intransitive unaccusative verbs, such as verbs of motion and (change of) state (e.g., ‘exist’, ‘melt’, ‘grow’, ‘enter’), and to transitive verbs of ingestion (e.g., ‘eat’ and ‘drink’). It cannot be used with unergative verbs (e.g., ‘dance’ or ‘laugh’). The prefix a- is used to express situations in which the Causer is directly involved in the causation. This is illustrated in (22b), where the Causer transports the Causee. In contrast, the causative prefix as- is productive. It can be added to transitive and intransitive verbs of all classes. It is used to express indirect causation, as in (22c), where the Causer is not directly involved and can simply issue an order or permission. Thus, indirectness correlates with productivity.
| Amharic: Afro-Asiatic (Amberber Reference Amberber, Dixon and Aikhenvald2000: 320) | ||||
| a. | aster | wǝt’t’a-čč | ||
| A. | exit+perf-3f | |||
| ‘Aster exited.’ | ||||
| b. | lǝmma | aster-ɨn | a-wǝt’t’a-t | |
| L. | A.-acc | caus-exit+perf+3m-3f.obj | ||
| ‘Lemma took Aster out (as in ‘out of the house’).’ | ||||
| c. | lǝmma | aster-ɨn | as-wǝt’t’a-t | |
| L. | A.-acc | caus-exit+perf+3m-3f.obj | ||
| ‘Lemma made/let Aster exit.’ | ||||
These and other examples (cf. Shibatani and Pardeshi 2002: Section 5) demonstrate that more productive morphological causatives often express indirect causation, and less productive ones express direct causation. At the same time, they belong to the same class of morphological causatives. It follows that productivity may be more directly aligned with the (in)directness distinction than autonomy or formal distance.
The associations between these formal parameters and (in)directness were tested in Levshina (Reference Levshina2018 [2016]) on the same typological data set as the one discussed in Section 7.2. In forty-six languages, (in)directness or similar semantic distinctions were mentioned as a distinctive semantic parameter of two or more different causative constructions. Consider an example from the Amur dialect of Nivkh (a Paleosiberian isolate) in (23). One of the constructions consists of a non-productive causative suffix ‑u and expresses contact factitive causation, as in (23b). The other morphological causative, shown in (23c), contains a productive suffix ‑ku/‑γu/‑gu/‑xu and usually expresses distant factitive or permissive causation (Nedjalkov and Otaina Reference Nedjalkov and Otaina2013: 133).
| Nivkh: isolate (Nedjalkov and Otaina Reference Nedjalkov and Otaina2013: 234) | ||||
| a. | Lep | ţ‘e-d̦. | ||
| bread | be.dry-ind | |||
| ‘The bread dried up.’ | ||||
| b. | If | lep+se-u-d̦. | ||
| s/he | bread+be.dry-caus-ind | |||
| ‘He dried up the bread’ (for dried crusts). | ||||
| c. | If | lep+ətu-doχ | q‘au-r | ţ‘e-gu-d̦. |
| s/he | bread+cover-sup | not.be-conv:nar:3sg | be.dry-caus-ind | |
| ‘Not covering the bread, he let (it) dry up.’ | ||||
It is crucial to discuss how (in)directness was operationalized. Grammars vary greatly in the semantic distinctions they mention. The full list of the distinctions used in the data sources which were interpreted as direct vs. indirect is as follows (see Table 7.2 for examples of languages):
direct vs. indirect causation;
strong vs. weak integration of the causing and caused events, separability of events;
manipulative vs. directive causation;
contact vs. distant causation;
direct vs. mediated causation;
the Causee as non-controlling undergoer vs. controlling agent (and therefore the main source of energy);
default vs. ballistic causation;
factitive vs. permissive causation;
caused state (or change of state) vs. caused activity;
default causation vs. causation with human intermediary;
default vs. curative or ‘indefinite’ causation;
general vs. ‘mild’ or ‘weak’ causation;
default vs. caused by ordering X to do Y;
implicative vs. non-implicative causal relationships.
This inclusive approach allows us to test the different shades of (in)directness that were mentioned in Section 7.1. It also includes the main dimensions of (in)directness mentioned by Bohnemeyer et al. (Reference Bohnemeyer, Enfield, Essegbey, Kita, Bohnemeyer and Pederson2010): mediation, contact and force dynamics. At the same time, it does not include the intentions of the Causer and some other parameters that are difficult to interpret in terms of conceptual distance and integration (see Sections 7.1 and 7.2).
Note that implicative vs. non-implicative relationships are included because they reflect integration of events, which is important for the iconicity account (Givón Reference Givón1980). Example (24) from a Cariban language Waimiri-Atroarí illustrates a combination of the factitive/permissive distinction and implicativity. The causative suffix py in (24a) expresses factitive causation, whereas the periphrastic construction with injaky ‘let/permit’ and particle tre’me shown in (24b) expresses permission. In addition, causation in (24b) is non-implicative, which means that we cannot say for sure whether the caused event actually happened or not.
| Waimiri-Atroarí: Cariban (Bruno Reference Bruno2003: 100, 103) | ||||||
| a. | Ka | k-yeepitxah-py-pia. | ||||
| 3pro | 1+2obj-laugh-caus-im.p | |||||
| ‘She/he made us laugh.’ | ||||||
| b. | A | ka | m-injaky-piany | wyty | ipy-na | tre’me. |
| 1pro | ?Footnote 5 | 2obj-permit/let-rec.p | meat | look.for-? | part | |
| ‘I permitted you to/let you leave to hunt.’ | ||||||
As was already discussed in Section 7.2.1, the authors of grammars often treat the causative simply as a tool for increasing valency, or speak about ‘default’ or ‘general’ causatives. For example, one can find such distinctions as default vs. permissive, or default vs. curative causation. As was argued in the previous section, in such cases, the addressee is likely to derive a Q-implicature which precludes the non-default interpretation of the default construction. This is why pairs of causative constructions, where one construction is described as the default causative, are also counted here as instances of the (in)directness distinction.
In total, I found seventy-four contrasts related to (in)directness in forty-six languages. Other semantic distinctions involve forceful, unintentional, distributive, iterative causation and other types. These were discussed in the previous section and are not considered here. In each pair of constructions, the construction that expresses (more) direct causation is referred to as the direct causative, and the construction that represents (more) indirect causation is referred to as the indirect causative. The pairs of constructions were compared with regard to the four formal parameters mentioned above: distance, autonomy, productivity and length. That is, I asked whether the direct causative was shorter, less productive, and consisted of less autonomous and less distant elements than its indirect counterpart. The formal criterion for distance was the number of phonological segments (i.e., phones or phonemes) in the intervening elements, including affixes, clitics and autonomous words, which are obligatorily used between the elements representing the cause and effect. Autonomy, which is similar to bondedness of a sign, or ‘the degree to which it depends on, or attaches to … other signs’ (Lehmann 2015: 131), was determined by using the following cline:
one morpheme < morphemes in a word < clitic + host < parts of one verbal phrase (monoclausal) < clauses in a sentence (biclausal)Footnote 6
Lexical causatives, such as kill or transitive break, display no autonomy, while analytic causatives, such as cause X to die, have the greatest autonomy. Morphological causatives are in-between. Productivity is the ability of a unit to freely combine with other units. Commonly, a language has a causative construction that can be used with all verbs, and another with only intransitives or stative verbs (Dixon 2000). Finally, length comparisons were based on the number of segments in grammatically equivalent forms of the same verb. See Levshina (2018 [2016]) for more details about the coding procedure.
To illustrate the approach, let us take two morphological causatives from Urarina, a language isolate in Peru. The causative which usually expresses direct causation is formed with the help of the suffix ‑a (26a). The indirect causative contains the suffix ‑erate (26b). The first causative is shorter and less productive than the second. In addition, it can be attached only to intransitives (Olawsky Reference Olawsky2006: 609–621). Judging from the description, the causatives differ neither in terms of the distance between the suffixes and the non-causal root, nor in terms of their autonomy.
| Urarina: isolate (Olawsky Reference Olawsky2006: 610–611, 616) | |
| a. | eno-a ‘enter’ > eno-a-a ‘make enter’ |
| nalʉ-a ‘fall’ > nalʉ-a-a ‘drop’ | |
| b. | saʉ-a ‘cut’ > sa-eratia ‘make cut’ |
| hjani-a ‘leave’ > hjane-ratia ‘make leave’ | |
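The pairwise comparison on the four formal parameters can be sketched in Python as follows. The class `Causative`, the function `compare` and all numeric values are hypothetical. The values are chosen only to reproduce the qualitative description of the Urarina pair given above (the direct causative is shorter and less productive, with no difference in distance or autonomy), and the productivity rank is an invented ordinal scale, not a measure taken from the source.

```python
from dataclasses import dataclass

# Autonomy cline from the text, ordered from least to most autonomous.
AUTONOMY_CLINE = [
    "one morpheme",
    "morphemes in a word",
    "clitic + host",
    "parts of one verbal phrase",
    "clauses in a sentence",
]


@dataclass
class Causative:
    label: str
    length: int        # number of segments in the form
    distance: int      # segments obligatorily intervening between cause and effect
    autonomy: str      # one of the levels in AUTONOMY_CLINE
    productivity: int  # hypothetical rank: higher = combines with more verb classes


def compare(direct: Causative, indirect: Causative) -> dict[str, str]:
    """For each parameter, report whether the direct causative is
    '<', '=' or '>' relative to the indirect causative."""
    def sign(a: int, b: int) -> str:
        return "<" if a < b else ">" if a > b else "="

    return {
        "Distance": sign(direct.distance, indirect.distance),
        "Autonomy": sign(AUTONOMY_CLINE.index(direct.autonomy),
                         AUTONOMY_CLINE.index(indirect.autonomy)),
        "Productivity": sign(direct.productivity, indirect.productivity),
        "Length": sign(direct.length, indirect.length),
    }


# Illustrative values only (assumptions, not measurements from the grammar).
urarina_direct = Causative("-a", length=1, distance=0,
                           autonomy="morphemes in a word", productivity=1)
urarina_indirect = Causative("-erate", length=5, distance=0,
                             autonomy="morphemes in a word", productivity=2)

print(compare(urarina_direct, urarina_indirect))
```

Each contrasting pair thus yields one of three outcomes per parameter, which is the kind of information that a cross-tabulation of contrasts by parameter and direction summarizes.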
Table 7.7 displays the counts for the individual contrasts between direct and indirect causatives. The numbers in parentheses show the number of languages in which these contrasts were found. Note that the number of contrasting pairs is higher than the number of languages because some languages have more than one contrasting pair. For example, consider the bottom cell in the column ‘Direct causative < Indirect causative’. The numbers tell us that the data contain 59 contrasts in which the direct causative is shorter than the indirect causative. The number 39 in parentheses means that these 59 contrasts occurred in 39 languages.
Table 7.7. Formal parameters associated with (in)directness of causation: number of contrasting pairs
| Parameter | Direct causative < Indirect causative | Direct causative = Indirect causative | Direct causative > Indirect causative |
|---|---|---|---|
| Distance | 44 (27) | 30 (26) | 0 (0) |
| Autonomy | 41 (24) | 33 (28) | 0 (0) |
| Productivity | 40 (24) | 33 (28) | 1 (1) |
| Length | 59 (39) | 13 (10) | 2 (2) |
The numbers reveal that the parameter most strongly associated with (in)directness is formal length. It gives correct predictions for more than 75 per cent of contrasts and languages (59 of 74 contrasts and 39 of 46 languages). Direct causatives are as long as the indirect causatives in only 13 contrasts from 10 languages. There are two exceptions, in which the direct causative is longer than its indirect counterpart (see examples from Kayardild and Mutsun, which were discussed in Section 7.2.1). Length is followed by distance: direct causatives are less distant than indirect causatives in 44 contrasts from 27 languages, whereas in 30 contrasts from 26 languages there is no difference. Next follows autonomy, with 41 contrasts from 24 languages in which the direct causative is less autonomous. The parameter that is least strongly associated with (in)directness is productivity (40 contrasts from 24 languages, plus one exception to the predicted direction).
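The ranking of the parameters can be reproduced from the counts in Table 7.7 with a few lines of Python. The counts and the totals (74 contrasts, 46 languages) are taken from the text. The names `TABLE_7_7` and `predicted_share` are hypothetical.

```python
# Counts copied from Table 7.7 as (contrasts, languages) per outcome, where
# "<" means the direct causative has the lower value on the parameter.
TABLE_7_7 = {
    "Distance":     {"<": (44, 27), "=": (30, 26), ">": (0, 0)},
    "Autonomy":     {"<": (41, 24), "=": (33, 28), ">": (0, 0)},
    "Productivity": {"<": (40, 24), "=": (33, 28), ">": (1, 1)},
    "Length":       {"<": (59, 39), "=": (13, 10), ">": (2, 2)},
}
TOTAL_CONTRASTS = 74
TOTAL_LANGUAGES = 46


def predicted_share(parameter: str) -> tuple[float, float]:
    """Percentage of contrasts and of languages showing the predicted asymmetry."""
    contrasts, languages = TABLE_7_7[parameter]["<"]
    return 100 * contrasts / TOTAL_CONTRASTS, 100 * languages / TOTAL_LANGUAGES


# Rank the parameters by the number of contrasts in the predicted direction.
ranking = sorted(TABLE_7_7, key=lambda p: TABLE_7_7[p]["<"][0], reverse=True)
for parameter in ranking:
    by_contrast, by_language = predicted_share(parameter)
    print(f"{parameter}: {by_contrast:.1f}% of contrasts, {by_language:.1f}% of languages")
```

Note that the language counts in a row need not add up to 46, because a language with several contrasting pairs can appear in more than one column.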
A series of binomial exact tests with random sampling of contrasts from individual languages show that the biases towards direct causatives having smaller length, distance, autonomy and productivity than indirect causatives are statistically significant (see Levshina Reference Levshina2018 [2016] for details). The null hypothesis is that there is no difference with regard to the direction of the asymmetry. In other words, we can have either a direct causative with a shorter form, less autonomy, smaller distance and lower productivity than an indirect causative, or the other way round. The null hypothesis could be safely rejected (p < 0.0001).
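The logic of the exact binomial test can be sketched in Python. This is a simplification and not the procedure reported above: it pools all contrasts from Table 7.7 instead of randomly sampling contrasts from individual languages, so it ignores the non-independence of contrasts from the same language. Contrasts with no difference between the two causatives are left out, and the function name `sign_test_p` is hypothetical.

```python
from math import comb


def sign_test_p(n_predicted: int, n_opposite: int) -> float:
    """Two-sided exact binomial test of H0: both directions of the asymmetry
    are equally likely (probability 0.5). Ties are assumed to be excluded."""
    n = n_predicted + n_opposite
    k = max(n_predicted, n_opposite)
    # Probability of an outcome at least as extreme as k in one tail.
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    # The distribution is symmetric under H0, so the two-sided p doubles the tail.
    return min(1.0, 2 * tail)


# Asymmetric contrasts from Table 7.7: (direct < indirect, direct > indirect).
ASYMMETRIC = {
    "Distance": (44, 0),
    "Autonomy": (41, 0),
    "Productivity": (40, 1),
    "Length": (59, 2),
}

for parameter, (predicted, opposite) in ASYMMETRIC.items():
    print(parameter, sign_test_p(predicted, opposite))
```

On the pooled counts, every p-value falls far below 0.0001, which agrees in direction with the reported result, although the pooled figures overstate the evidence compared with a test based on one contrast per language.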
I also performed comparisons between direct and indirect causatives in situations when both are morphological – that is, they represent affixal derivations, root changes, augmentations, reduplications or tonal changes (cf. Dixon 2000). I found that formal length is again the most strongly associated with (in)directness, followed by productivity. However, only length asymmetry was statistically significant (p = 0.01).
If we take all contrasts where both direct and indirect causatives are analytic, including monoclausal verbal compounds, serial verbs, light verbs and biclausal periphrastic causatives, we see again that length is the most prominent parameter, closely followed by distance. The biases, however, do not reach statistical significance, probably due to the small sample size: only twelve contrasts between analytic causatives were found in the data.
Finally, if we take only the contrasts where one of the causatives is morphological and the other is analytic, we see that length is again in the leading role, followed by distance. Productivity is the least strongly associated parameter. All biases are in the expected direction and statistically significant (all p < 0.001).
The results of these analyses show that in general, all previous accounts have some grain of truth in them. The indirect causation constructions are either more distant/autonomous/productive/longer than the direct causation constructions, or at least as distant/autonomous/productive/long as those. The exceptions are very scarce. Therefore, the typological data overall support the correlation between conceptual and formal integration or distance. This contradicts the results reported by Escamilla (2012) and Bellingham et al. (2020), who did not find significant correlations between (in)directness and formal integration or compactness. This may be due to some methodological choices, such as the use of a more restrictive definition of (in)directness than in the present study. In particular, Bellingham et al. (2020) used ratings of linguistic descriptions of videos with different causation events. The descriptions contained causatives of different degrees of formal compactness. Indirectness of causation was operationalized mainly as the presence of mediation. An example of a mediated scenario from their stimuli is as follows: a woman sneaks up behind a man and yells loudly, startling him, and causing him to knock over a tower of cups. Spatiotemporal integration was not taken into account. Their ordinal regression analysis showed that mediation did not have a significant effect on the ratings of causative constructions with different degrees of formal integration. However, there was a significant effect of the presence or absence of physical contact, which can also be interpreted as a manifestation of (in)directness.
Escamilla (2012) used grammar descriptions of fifty genealogically diverse languages. A binomial test showed that the association between formal compactness in Dixon’s (2000) sense and (in)directness was not significant. Unfortunately, it is unclear which criteria were used for defining (in)directness, and how one derives this information from reference grammars for different constructions. Also, Escamilla seems to compare the number of languages where (in)directness leads to correct predictions with the total number of languages where this distinction is mentioned, not with the number of violations (although this is not entirely clear from the description, either). This is different from the current study. Here, we test if the number of pairs supporting the generalization is greater or smaller than the number of pairs going against it. All this demonstrates very vividly how much our claims about the validity of cross-linguistic generalizations depend on the choice of comparative concepts (cf. Haspelmath 2010) and methodology.
Importantly, one can see that relative length is the parameter which is the most strongly associated with the (in)directness distinction, both in the whole data set and in each constructional type. The results thus favour the explanation based on the principle of negative correlation between accessibility and costs. Indirect causation forms are longer than direct causation forms because the indirect causation scenarios are less accessible than direct causation scenarios, which results in efficient formal asymmetries. It cannot be excluded that other factors may be relevant, too, but the efficiency account is supported by the strongest evidence.
7.4 Diachronic Evidence
Unfortunately, there is not much historical evidence of how causatives have emerged and developed in different languages, especially as far as more compact forms are concerned. Sometimes one can infer information about the diachronic sources of causative auxiliaries and morphemes from their colexifications with non-causative expressions. The known sources of causative markers and auxiliaries include the following:
‘make’ and ‘do’, e.g., the suffix ‑(i)fy in English, which comes from Latin ‑ficāre, a combining form of facere ‘to do, make’; Dutch analytic causative with doen ‘do’;
verbs of communication, e.g., ‘order’ (Trumai: isolate), ‘say’ (Skou: Skou) and ‘ask’ (Great Andamanese family);
verbs of possession: ‘have’ and ‘get’ (English), ‘take’ (Hup: Nadahup), ‘give’ (Finnish: Uralic), ‘hold, grasp’ (Kayardild: Tangkic);
motion verbs, e.g., ‘send’ (Yagua: Peba-Yaguan);
position verbs, e.g., ‘stand’ (Hup: Nadahup);
verbs of caused motion: ‘bring’ (Humburi Senni: Songhay), ‘put’ (Kayardild: Tangkic), ‘pull’ (Tubu/Dazaga: Saharan), ‘push’ (South Eastern Huastec: Mayan);
abstract verbs, e.g., ‘cause’ (English), ‘affect’ (Adang: Timor-Alor-Pantar), ‘force’ (Ik: Eastern Sudanic) or ‘treat in a certain way’ (Yuracaré: isolate);
verbs of physical contact: ‘hit’, ‘step on’ and ‘bite’ (Manambu: Sepik);
instrumental and manner affixes, e.g., ‘by hand’ (Northern Paiute: Uto-Aztecan) or ‘using a sawing action’ (Nishnaabemwin: Algic).
One can see that the sources are extremely diverse and come from different domains of human experience. At the same time, some are more popular than others. In particular, the evolution of verbs with the meaning ‘do’ and ‘make’ into causative markers is among the most common grammaticalization paths cross-linguistically (Bisang et al. 2020: 15).
In addition, causative morphemes can coincide with the following grammatical markers:
transitivizers and verbalizers (Yapese: Austronesian);
directional (allative) case markers, e.g., ‘towards’ (Ijo/Izon: Niger-Congo);
intensifying affixes (Chichewa: Niger-Congo);
aspectual affixes, e.g., punctual action (Mari: Uralic);
passive markers (Southern Min: Sino-Tibetan);
applicatives (Uto-Aztecan languages);
benefactive affixes (Khasi: Austro-Asiatic);
complementizers (Thai: Tai-Kadai).
In most cases, it is very difficult or even impossible to determine the path of historical development. There are some arguments, however, that causative constructions often undergo the process of formal reduction and semantic shift from less syntactically and semantically integrated constructions, such as purposive and subjunctive constructions, to more semantically and formally integrated ones (Song 1996). Examples of such source constructions are ‘X made such that Y should happen’ or ‘X ordered that Y does Z’.
Let us consider the development of the English causative with make. In Old English, it was followed by finite that-clauses, as in the following example:
| Old English, Heptateuch (Exodus 96: 14; cited from Lowrey 2012)Footnote 7 |
| Ge habbaþ us gedon laþe Pharaone and eallum his folce and gemacod þæt hig wyllað us mid hyra sweordum ofslean |
| ‘You have made us hateful to Pharaoh and to all his people, and made them want (lit. that they want) to slay us with their swords’. |
In Middle English and Early Modern English the to-infinitive was predominant, but the bare infinitive occurred as well (Hollmann 2003: 166–167; Moriya 2017). Consider two examples from the King James Bible (1611):
| Early Modern English, King James Bible (Moriya 2017: 44) | |
| a. | And wherefore haue ye made vs to come vp out of Egypt, to bring vs in vnto this euil place? (Numb. 20.5) |
| b. | And hee doeth great wonders, so that hee maketh fire come downe from heauen on the earth in the sight of men (Rev. 13.13) |
Moriya (2017) argues that the preference for the bare or to-infinitive was guided by various factors. One is horror aequi, or avoidance of identity, which was discussed in Section 4.4. In the presence of to before make, the bare form was more likely. Another is the linguistic distance between make and the second verb (in particular, the length of the nominal phrase in between). The marker to is used when the environment is complex, in line with Rohdenburg’s principle of cognitive complexity (see Section 2.4.2). One can also say that long linguistic distance makes the interpretation of the infinitival complement as belonging to make less accessible, which triggers the use of the more costly variant. Although the evidence for the semantic differences between the two variants is not conclusive and a lot of variation looks random, there are also some instances of the to-infinitive being preferred in contexts with willing Causees, which represent a less typical kind of causation, as one can judge from the low frequencies of human (and potentially willing) Causees discussed in Section 7.2.2. This suggests that the marker was preferred in contexts where the interpretation was less accessible, while the bare infinitive first spread in stereotypical causative situations and subschemata of the construction. At the moment, the causative make is used with the bare infinitive (e.g., This makes me laugh), with the exception of the passive form (e.g., He was made to sit on an uncomfortable stool). Due to its low frequency, the passive causative has low accessibility, and therefore remains the last bastion of the marked infinitive. Notably, the passive form of either the matrix verb or the infinitive was associated with the to-form in Late Middle English infinitival complements in general (Fischer 1995).
Importantly, the gradual disappearance of to after make has not happened in the other factitive causatives (cause, force, get, persuade, etc.). This can be regarded as an example of differential formal reduction (see Section 5.2). Hollmann (2003: 151–158) finds that the make-construction scores higher on semantic boundedness (which is operationalized as a weighted score based on directness, intentionality, punctuality, etc.) than the other constructions. I assume that the features which are associated with greater semantic boundedness by Hollmann and others are simply more accessible due to their higher frequency in discourse (see Section 7.2.2). Unfortunately, there is too little data about the development of the causative with have, which is used with the bare infinitive, although it has relatively low frequency.
Another example of formal reduction is provided by Song (1996: 88). In Ijo (Izon), a Niger-Congo language, there is a causative suffix ‑mọ, which is identical to the directive case marker. In some cases, a separate lexical element mie is added, which expresses the causing event. Compare (29a) and (29b):
| Ijo [Izon]: Niger-Congo (Song 1996: 88) | ||||
| a. | áràú | toboú | mìe | búnu-mo-mi |
| she | child | make | sleep-caus-asp | |
| ‘She soothed the child to sleep.’ | ||||
| b. | áràú | toboú | búnu-mo-mi | |
| she | child | sleep-caus-asp | ||
| ‘She laid the child down to sleep.’ | ||||
Song argues that the original purposive construction with two predicates (mie and the verb expressing the caused event) and the originally purposive marker ‑mọ is giving way in this language to the morphological causative. The first predicate is normally omitted. The shorter causative in (29b) expresses more direct causation than the longer one in (29a). This can be explained by efficiency considerations, too: shorter forms represent more accessible causation scenarios.
Mithun (2002) shows that causative morphemes can emerge as a result of semantic generalization and reanalysis of more specific, concrete meanings. For example, in diverse North American languages, there are numerous prefixes of manner and means, e.g., doing things with hands, feet, teeth, a knife, by pressure, etc. For instance, the manner prefixes yu-, pa- and ka- described different hand motions in Lakhota, e.g., bláya ‘be level, plain’ – yubláya ‘open, spread out, unfold, make level’. They are highly frequent, since language users do many things with their hands. In the long run, they were reanalysed as general causative prefixes, and the more specific meaning of a hand movement has disappeared, as in the pair bléza ‘clear’ – yubléza ‘make clear’.
The general grammaticalization path of causatives seems to be the following: from less accessible functions (including indirect causation) and analytic forms to more accessible functions (including direct causation) and more compact forms. The general pragmatic mechanism based on the principle of negative correlation between accessibility and costs was proposed in Section 5.4.1. As a construction becomes more frequent, the causativizing element becomes more predictable, and its meaning becomes more accessible, which leads to its shortening. A construction that formerly expressed only indirect causation can be used in a reduced form to represent more direct causation. The link of the expression with the more typical meaning leads to a further increase in frequency and greater reduction, and so on.
As for the increasing bondedness of causative markers, this has been explained by the fact that short elements do not have enough bulk to stand on their own and need a host (Haspelmath 2008c: Section 6). However, it is also possible that a strong association between the units in a construction leads both to their formal integration and reduction. As a result of its high accessibility, the causativizing element becomes reduced.
An efficient formal asymmetry is created when a new causative emerges with a longer form and less accessible meaning than the old one. Consider Old Dutch. After the Germanic morphological causatives with the suffix ‑ja stopped being productive (possibly due to the loss of transparency in umlaut), there remained many lexical (ex-morphological) causatives. In the twelfth to the thirteenth centuries, analytic causatives with doen ‘do’ and laten ‘let’ emerged (van der Horst 1998). The earliest instances of the doen-causative expressed curative causation (i.e., having someone do something), which implied an agentive Causee, as in the following example:
| Middle Dutch (van der Horst 1998: 56) | |||||
| si | sullen | sin | hus | doen | breken |
| they | want | his | house | do | burst |
| ‘They will have his house broken down.’ | |||||
The first attestations of the construction with laten had a permissive sense:
| Middle Dutch (van der Horst 1998: 64) | ||||
| lat | dise | arme | kinde | leuen |
| let | these | poor | children | live |
| ‘Let these poor children live.’ | ||||
These constructions were used for the relatively infrequent functions, while the more frequent ones were performed by lexical causatives (e.g., breken ‘break, burst’, weuen ‘weave’, spreiden ‘spread’ or leggen ‘lay’). Nowadays the causative construction with doen occupies the niche of affective causation (e.g., to make someone cry, think, believe), especially in Netherlandic Dutch, while the construction with laten expresses very diverse types of indirect causation, including permission, similar to German lassen (Verhagen and Kemmer 1997; Levshina 2011). Both constructions represent less accessible types of semantics in comparison to lexical causatives, which are the default way of expressing a variety of direct causation scenarios.
Thus, if there is a novel causative expression, it is likely to begin with indirect causation or other non-stereotypical functions. If a costly expression is used, one is tempted to attribute to it less accessible meanings. For example, the speaker and the addressee know that there are some typical and untypical ways of making someone dead. They also know that the costs of the new longer form are higher for the speaker (and the addressee) than the costs of the default short form. The more costly expression signals that the interpretation is not trivial.
If the innovation spreads in the community and becomes conventional, this may cause the more compact form to trigger a Q-implicature (see Section 1.4.2). In particular, this will mean that the causation expressed by the compact form is not indirect, not non-intentional, etc., because otherwise the speaker would have chosen the less compact form. Compare Russian and German. In Russian, lexical causatives can still be used to express indirect curative causation:
| Šnur | vstavil | zuby | za | 250000$.Footnote 8 | |
| Sh. | inserted | teeth | for | 250000$ | |
| ‘Shnur (a celebrity) had his teeth replaced (lit. replaced his teeth) for $250,000.’ | |||||
The exact meaning is inferred from the context. In some cases, one can name the actual Causee (e.g., say ‘at the dentist’s’) or the price, as in this example, to indicate that this was a service provided by a professional, but this information is not obligatory. This use of lexical causatives is possible because Russian does not have a special construction to express non-forceful and non-permissive curative causation that would correspond to the English ‘have something done by someone’.
In contrast, German has more frequent analytic causatives, in particular, the causative with lassen ‘let’ (cf. Levshina 2015). In the example below, the analytic auxiliary lassen cannot be omitted if one is referring to the standard procedure performed by a dentist:
| Wollen Sie sich während Ihres Aufenthalts in Rumänien dritte Zähne einsetzen *(lassen)?Footnote 9 |
| ‘Would you like to have your dentures (lit. third teeth) done during your stay in Romania?’ |
The lexical causative einsetzen ‘set in, insert’ cannot be used to express indirect curative causation because it would trigger the Q-implicature that the subject would insert the dentures themselves. Speakers of German are aware of the conventional alternative with lassen. This blocks the use of a lexical causative in this context.
7.5 An Artificial Language Learning Experiment
The previous section provided indirect evidence that the form and function of causatives evolve according to the principle of negative correlation between accessibility and costs. The aim of this section is to demonstrate that language users indeed have a bias towards efficient form–meaning mappings of causatives. That is, they tend to express a more accessible meaning by a shorter construction, and a less accessible one by a longer form. This bias is demonstrated with the help of an artificial language learning experiment in Levshina (2019a). In such experiments, one can observe in real time how linguistic systems undergo change, revealing the cognitive and communicative biases of language users (Kirby, Cornish and Smith 2008; Hudson Kam and Newport 2009; Smith and Wonnacott 2010; Caldwell and Smith 2012; Verhoef 2012; Kirby, Griffiths and Smith 2014; Tamariz 2016; Little, Eryılmaz and de Boer 2017, and many others). The assumption is that those linguistic features that are easier to learn and to use in communication will spread at the expense of less ‘fit’ alternatives (Smith et al. 2017). Also, using artificial languages can help us to control for different semantic and functional properties of causatives, which are usually highly correlated in real languages, as we saw above.
In this case study, I focus on the claim that more frequent situations are expressed by means of less coding material than less frequent ones. Such differences are predicted by the efficiency account. In the previous sections, it was shown that more expected causative situations are usually expressed by shorter causatives, whereas less typical ones are expressed by longer constructions. But will language users adopt such a system spontaneously when learning and using an artificial language? This is the main question that motivates the experiment.
The experiment was performed online, using Google Forms with built-in YouTube videos. The participants had to learn an artificial language, which was introduced with the following instructions:
In this experiment you will learn the lingua franca of a highly developed civilization that exists on a planet in a galaxy far, far away… The planet is called Atruur. Its only vegetation form is called ‘grok’. It is similar to a cactus and is used by the Atruurians for food, as fuel for their flying vehicles and for entertainment. Because the Atruurians traditionally detest any form of physical activity, they have developed a technology for teleportation and telekinesis.
The word order in Atruurian was SV (for intransitives) or SOV (for transitives), as in the examples below:
| Grok | babum. | |
| cactus | grow | |
| ‘A grok (cactus) grows.’ | ||
| Sia | grok | hum. |
| Atruurian | cactus | see |
| ‘An Atruurian sees a grok (cactus).’ | ||
In the training part, the participants learned the language by copying sentences that described thirty-two situations shown in video clips. In each of them, there was a UFO which hovered above the plant and flashed a yellow or blue light. After that, the plant either appeared, disappeared, grew or shrank. Different types of UFOs were shown. Figure 7.3 demonstrates four fragments from one video clip.

Figure 7.3 Fragments of a video clip used in the experiment
Crucially, the causing events were of two types. A UFO could flash either a yellow light above the plant or a blue light from the left-hand side of the plant. There was no reason to assume that one type of causation was more or less direct than the other. The yellow-light causing event was three times as frequent as the blue-light causing event. This means that the yellow-light causative event was more accessible.
In the artificial language, each of the causing events was represented by two allomorphs, one long and one short. One causing event was expressed by the forms tere- or te-, as in (36), and the other by the forms gara- or ga-. The association between the pair of forms and the event varied across subjects. Most importantly, the short and long prefixes were evenly distributed among different events, such that the use of the longer and shorter forms did not depend on the type of caused events or other conditions.
| a. | Sia | grok | te-babum. |
| Atruurian | plant | caus-grow | |
| ‘The Atruurian caused the plant to grow (by flashing with yellow light from above).’ | |||
| b. | Sia | grok | tere-babum. |
| Atruurian | plant | caus-grow | |
| ‘The Atruurian caused the plant to grow (by flashing with yellow light from above).’ | |||
The prediction was that the participants would regularize the free variation, such that the output language would be more efficient than the input language. More precisely, I expected them to choose the short allomorphs more frequently to convey the frequent causing event, and to use the long allomorphs to express the rare causing event.
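The structure of the input language can be made concrete with a hypothetical reconstruction of the training set. The text specifies thirty-two training situations, four caused events, a 3:1 ratio of yellow-light to blue-light causing events and an even distribution of long and short prefixes. The exact number of items per cell is not given, so the balanced design in this Python sketch is an assumption, as is the assignment of tere-/te- to the yellow light (the assignment varied across subjects). The different UFO types are ignored.

```python
from collections import Counter
from itertools import product

# The four caused events shown in the video clips.
CAUSED_EVENTS = ["appear", "disappear", "grow", "shrink"]

# One possible assignment of prefix pairs to causing events (assumed).
PREFIXES = {
    "yellow": {"short": "te-", "long": "tere-"},
    "blue": {"short": "ga-", "long": "gara-"},
}

# Assumed repetitions per cell, yielding the 3:1 frequency ratio.
REPEATS = {"yellow": 3, "blue": 1}

training = [
    (cause, effect, length, PREFIXES[cause][length])
    for cause, effect, length in product(PREFIXES, CAUSED_EVENTS, ("short", "long"))
    for _ in range(REPEATS[cause])
]

by_cause = Counter(item[0] for item in training)
by_cause_and_length = Counter((item[0], item[2]) for item in training)

print(len(training), dict(by_cause))
print(dict(by_cause_and_length))
```

In this reconstruction the input contains no cue linking prefix length to the frequency of the causing event: within each causing event, long and short forms are equally frequent. Any association between length and frequency in the participants' output therefore has to come from the learners themselves.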
In the testing part, the task was to describe what was going on in video clips. The stimuli represented a selection from the previous stimuli: each of the caused events was presented with causing event A and causing event B.
The participants were recruited via my personal network and LinguistList. Overall, I obtained 554 valid data points from 70 participants with different L1 backgrounds. None of the participants guessed the purpose of the experiment judging by their responses to a control question at the end.
Figure 7.4 displays the counts of the long and short forms produced by the participants. Overall, the short forms are preferred to the long ones, but the stimuli with the more frequent causing event are more likely to be described by a short form than the stimuli with the rare causing event. In the latter case, the proportions of the short and long forms are almost equal. These results are supported by a generalized linear mixed-effects model with random intercepts for individual participants. If the causing event is rare, the odds of the longer form being chosen are 1.66 times greater than when the event is frequent (log-odds ratio b = 0.501, p = 0.006).

Figure 7.4 Counts of short and long causative forms in the responses
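The relation between the reported log-odds ratio and the odds ratio can be spelled out as follows. The sketch exponentiates the coefficient and shows how an odds ratio shifts a probability. The baseline probability of a long form is purely hypothetical and is not taken from the experimental data, and the helper name is an assumption.

```python
import math

# Reported log-odds ratio for the rare causing event.
b = 0.501

# Exponentiating a log-odds ratio gives the odds ratio.
odds_ratio = math.exp(b)


def shift_probability(p_baseline, ratio):
    """Apply an odds ratio to a baseline probability."""
    odds = p_baseline / (1 - p_baseline)
    new_odds = odds * ratio
    return new_odds / (1 + new_odds)


# Hypothetical share of long forms with the frequent causing event.
p_long_frequent = 0.4
p_long_rare = shift_probability(p_long_frequent, odds_ratio)

print(f"odds ratio = {odds_ratio:.2f}")
print(f"P(long | rare) = {p_long_rare:.2f}")
```

The exponentiated coefficient is approximately 1.65, which agrees with the reported value of 1.66 up to rounding of the coefficient. In a mixed-effects model, this ratio describes the change in odds for a given participant, not the change in the pooled proportions.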
To summarize, the results demonstrate that frequent causative situations become more commonly expressed by shorter forms, whereas subjects are more tolerant of longer forms when expressing rare causative situations. As a result, a more efficient system emerges. The fact that the effect was detected in a non-iterative experiment with only one ‘generation’ of language learners suggests that the bias is strong. This provides evidence in favour of the efficiency-based account of functionally similar expressions. Since both forms were originally available to the learners, we can witness competition between the existing forms and how they become specialized in different causative situations.
7.6 Conclusions
This chapter presented evidence based on typological data, corpora and an artificial language learning experiment which supports the claim that the form–meaning correspondences in causative constructions are best explained by the principle of negative correlation between accessibility and costs. The other accounts, which involve iconicity and productivity as explanatory factors, are less successful in predicting and explaining the famous correlation between formal and semantic integration of events in causative constructions. Moreover, the efficiency account explains other form–meaning correspondences beyond event integration and the distinction between direct and indirect causation, such as intentional vs. accidental causation. Finally, this account predicts correctly the emergence of efficient formal asymmetries in an artificial language experiment which do not involve an iconic correspondence between form and meaning.
Arguing for efficiency, I do not want to exclude iconicity completely. It may be that iconicity matters at early stages, when a new causative expression is coined. For example, the temporal or spatial distance between the causing and caused events may be emphasized by presenting the cause and effect using two independent clauses, as in the following example:
He pressed the button and down in the control room all the screens suddenly came to life.Footnote 10
However, when a novel expression becomes popular, efficiency considerations become more important.
This chapter did not include semantic distinctions based on marking of arguments (most importantly, the Causee). Consider an example from Dutch in (38):
| Dutch (Kemmer and Verhagen 1994: 136) | |||||||
| a. | Hij | liet | haar | de | brief | lezen. | |
| he | let | her | the | letter | read | ||
| ‘He let/had her read the letter.’ | |||||||
| b. | Hij | liet | de | brief | door | iemand | lezen. |
| he | let | the | letter | by | someone | read | |
| ‘He had the letter read by somebody.’ | |||||||
| c. | Hij | liet | de | brief | aan | iedereen | lezen. |
| he | let | the | letter | to | everyone | read | |
| ‘He let/had everybody read the letter.’ | |||||||
In (38a), the Causee haar ‘her’ is not case-marked. The meaning is that the Causee reads the letter for content. This is the stereotypical interpretation. The use of the preposition door ‘by’ in (38b) implies that the Causee is conceptualized as an instrument. Here, the Causer may be interested in having the letter checked for spelling and grammar, for instance. In (38c), the Causee is marked with the preposition aan ‘to’. The meaning is similar to (38a), in that the Causee reads for content. The difference is that the Causee in (38c) is less accessible than in (38a). Dative-marked Causees in Dutch are likely to be full noun phrases, which are often indefinite and non-topical (Kemmer and Verhagen 1994: 137). Such Causees are postponed, which makes the planning and production more efficient (see Section 3.2.2).
In contrasts like this one, we can predict that the participants that are more accessible will have zero or shorter marking than less accessible ones. This prediction needs to be tested.
8.1 Differential Case Marking
This chapter discusses differential marking of transitive subject and object. In the typological literature, they are often represented as comparative concepts A and P, which correspond to the agent-like and patient-like arguments of a two-place transitive clause. This chapter addresses only differential case marking, or flagging. Case markers include both affixes and adpositions. Differential case marking is defined here in a very broad sense. I speak about differential marking when subjects, objects or other arguments are sometimes marked, and sometimes unmarked, depending on their own semantic, pragmatic and other properties and/or on the properties of other arguments, or other parameters, including gender, number, tense, aspect, word order, distance from the predicate, and so on. In this regard, my approach is different from some earlier ones (e.g., Bossong 1985; Sinnemäki 2014). The reason for this is that efficiency is likely to manifest itself in many of these phenomena, not only in those traditionally discussed in the literature.
An example of differential subject marking can be found in Qiang, where the inanimate A is marked, and the animate A is unmarked:
| Qiang: Sino-Tibetan (LaPolla and Huang 2003: 79–80) | |||
| a. | Animate A: unmarked | ||
| The: | qa | dʑete. | |
| 3sg | 1sg | hit | |
| ‘He is hitting me.’ | |||
| b. | Inanimate A: marked | ||
| Moʁu-wu | qa | da-tuə-ʐ. | |
| wind-agt | 1sg | dir-fall.over-caus | |
| ‘The wind knocked me over.’ | |||
The example of differential object marking in (2) is from Spanish, where animate objects tend to be formally marked, while inanimate objects are unmarked, although definiteness and individual verbs also play a role (von Heusinger and Kaiser 2007):
| Spanish (García García 2018: 211) | |||||
| a. | Inanimate P: unmarked | ||||
| Pepe | ve | la | película. | ||
| Pepe | see.3sg | def | film | ||
| ‘Pepe sees the film.’ | |||||
| b. | Animate P: marked | ||||
| Pepe | ve | a | la | actriz. | |
| Pepe | see.3sg | obj | def | actress | |
| ‘Pepe sees the actress.’ | |||||
Differential marking has been widely discussed in the typological literature (e.g., Silverstein 1976; Comrie 1978; Dixon 1979, 1994; Bossong 1985; Aissen 1999, 2003; de Hoop and de Swart 2008; see also a comprehensive overview in Witzlack-Makarevich and Seržant 2018).
There are many intriguing questions about differential marking. One of these is to what extent it is constrained by different hierarchies, or scales, of semantic, discursive and other features, such as the animacy hierarchy. For example, there is a popular claim that languages with differential object marking are more likely to mark animate objects than inanimate objects. The example from Spanish illustrates that tendency. Opinions differ as to how universal these constraints are. This topic is addressed in Section 8.2, where I also present typological data that support at least some cross-linguistic generalizations.
Another question is how to explain the observed common patterns. There are quite a few different theories, which involve frequency, iconicity, disambiguation, identification, topicality and other notions, which are discussed in Section 8.3. I argue for the efficiency account of these patterns, which is similar in spirit to Haspelmath (Reference Haspelmath2021b). In a language with differential case marking, a referential expression is more likely to be formally marked if there are no reliable cues that can help the addressee to infer its thematic role, and less likely to be marked if there are strong cues. For example, animacy can serve as a cue that the expression is more likely to be A than P. In other words, animacy makes the interpretation of an argument as an A more accessible. Corpus evidence for this claim is presented in Section 8.4. Similar reasoning can be applied to other features, such as definiteness, pronominality, givenness, and so on. Cues increasing accessibility can also come from other sources: the other arguments, the predicate, word order or extralinguistic context. In Section 8.5 I will discuss the use of object marking depending on the presence or absence of visual cues, and present some experimental evidence in support of the efficiency account.
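The logic of the efficiency account can be sketched numerically. The snippet below is only an illustration: the counts are invented placeholders, not corpus results, and the function names are hypothetical. It estimates how probable each role is given animacy and converts that probability into surprisal, so that a higher value stands for a less accessible role interpretation.

```python
import math

# Invented counts of (animacy, role) pairs in transitive clauses.
# They are placeholders for illustration, not corpus results.
counts = {
    ("animate", "A"): 900,
    ("animate", "P"): 100,
    ("inanimate", "A"): 50,
    ("inanimate", "P"): 950,
}


def role_probability(feature, role):
    """Estimate P(role | feature) from the toy counts."""
    total = sum(n for (f, _), n in counts.items() if f == feature)
    return counts[(feature, role)] / total


def surprisal(feature, role):
    """Surprisal in bits: higher values mean a less accessible role."""
    return -math.log2(role_probability(feature, role))


for feature in ("animate", "inanimate"):
    for role in ("A", "P"):
        print(feature, role, round(surprisal(feature, role), 2))
```

Under these assumed counts, an animate P and an inanimate A have the highest surprisal, which is where the efficiency account expects overt marking to be most useful.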
8.2 Cross-Linguistic Generalizations Related to Differential Case Marking
There is a widespread opinion that the use or absence of markers is constrained by universal referential scales, such as the ones presented by Croft (Reference Croft2003: 130–132) and Haspelmath (Reference Haspelmath2021b) (see also Haude and Witzlack-Makarevich Reference Haude and Witzlack-Makarevich2016):
| a. | Person: 1 and 2 > 3 |
| b. | Nominality: pronoun > noun [footnote 1] |
| c. | Animacy: human > animal > inanimate |
| d. | Definiteness/specificity: definite > specific > non-specific |
| e. | Givenness: discourse-given > discourse-new |
| f. | Focus: background (topic) > focus |
Each of the scales may be more or less relevant to the use of differential marking in an individual language. However, there is a remarkable cross-linguistic tendency, which has fascinated linguists for many years. If a language has a coding split, more prominent P arguments (i.e., the ones on the left) are usually formally marked, while less prominent ones (i.e., the ones on the right) are unmarked. The example (2) from Spanish shows the effect of the animacy scale, where animate objects are marked and inanimate ones are unmarked. It has also been claimed that the reverse holds for A arguments. The Qiang example in (1), where animate subjects are unmarked and inanimate ones are marked, supports this view.
In general, differential object marking is more frequent typologically than differential subject marking (also see below). In fact, typological data show that differential case marking is the attractor state for object marking (Sinnemäki Reference Sinnemäki2014). That is, if a language has object marking, it is more likely to be differential than to be consistently used on all objects. The most relevant features for differential object marking cross-linguistically are animacy and definiteness. For example, the definiteness hierarchy plays a role in Biblical Hebrew, where the marker ʔet is only used with definite nouns:
| a. | (Gen. 1:26) | ||||
| naase | adam | be-tzalme-nu | |||
| create | man | in-image-our | |||
| ‘Let’s create a man in our image…’ | |||||
| b. | (Gen. 1:27) | ||||
| va-yivra | Elohim | ʔet | ha-adam | b’-tzalm-o | |
| and-created | Almighty | obj | def-man | in-image-his | |
| ‘So God created the man in his image…’ | |||||
Animacy is important in Dhargari, a Pama-Nyungan language spoken in Australia, in which all animate objects are case-marked. Also, in Sinhalese, an Indo-Aryan language, inanimate objects are never marked, whereas animate objects may receive case marking optionally (Aissen Reference Aissen2003).
Very commonly, animacy and definiteness/specificity interact. An example is Hindi, where case marking is obligatory with human objects, as well as with definite and specific inanimates. It is optional with non-specific human-referring objects. Finally, it is not used with indefinite inanimate objects (Aissen Reference Aissen2003). The Romanian object marker pe, mostly combined with clitic doubling, applies to human objects only if they are definite, specific or topicalized (von Heusinger and Onea Gáspár Reference von Heusinger and Onea Gáspár2008).
Differential subject marking is less common and also less regular (de Hoop and Malchukov Reference de Hoop and de Swart2008). Moreover, systems that involve the animacy hierarchy, as in the Qiang example (1), are rare (Fauconnier Reference Fauconnier2011). There also seems to be little consistent evidence (due to many exceptions) that indefinite or non-specific subjects are marked. Instead, languages use alternative strategies, e.g., passives or presentative constructions similar to the English construction there is… (Comrie Reference Comrie1989: 130; Malchukov Reference Malchukov2008).
There have been some critical voices, claiming that the scale effects are not supported enough by cross-linguistic data. In particular, personal pronouns are the greatest offenders (Filimonova Reference Filimonova2005). This can be explained by the conserving effect of frequency (Bybee and Thompson Reference Bybee, Pagliuca, Perkins, Croft, Denning and Kemmer1997), since personal pronouns are typically very frequent in discourse. As a result, they are stored and accessed as independent units, rather than as members of a morphological paradigm. This is why distinct pronominal case forms in English (e.g., I and me, she and her) persist, while the nominal case distinctions have been lost. In this case, the fact that some pronouns can be case-marked fits the nominality scale: some pronouns are object-marked (although the forms can be suppletive), while all nouns are unmarked. However, this mechanism can also cause unexpected patterns. For example, in some Indo-Aryan and Iranian languages, pronominal subjects are marked, and nominal ones are unmarked, contrary to what one might expect. This can be explained by the fact that these languages are undergoing a change from ergative to tripartite and then to accusative case marking. Due to their resistance to change, personal pronouns retain the old ergative forms in the subject position, while nouns, which have undergone a sweeping change, have lost their marking. This can explain why the scale-based predictions are violated.
A critical view has also been expressed by Bickel, Witzlack-Makarevich and Zakharko (Reference Bickel, Witzlack-Makarevich, Zakharko, Bornkessel-Schlesewsky, Malchukov and Richards2015b), who argue that universal scale effects in differential case marking do not exist. A quantitative analysis of large-scale cross-linguistic data shows that there are no significant effects that would hold in all parts of the world. How should we evaluate their claim? As argued by Schmidtke-Bode and Levshina (Reference Schmidtke-Bode, Levshina, Seržant and Witzlack-Makarevich2018), at least some of the scale effects in object marking can still be considered universal if one uses an alternative statistical method (mixed-effects regression analysis instead of the Family Bias method used by Bickel et al.) and applies a different definition of universal effects. We find significant differences in the chances of marking in pronominality and high/low prominence of nouns, which represents a conflation of animacy, definiteness and other features in Bickel et al.’s data (see more information below). Pronominality and nominal prominence are not equally important for differential object marking in different geographic macroareas. For example, pronominal objects are substantially more likely to be marked than nominal objects in Africa, Australia and Papua New Guinea. High-prominence nouns are much more likely to be marked than low-prominence nouns in Eurasia and both Americas. Crucially, we find no significant violations of the scale effects, e.g., a macroarea where high-prominence objects are unmarked, and low-prominence objects are marked.
I believe that statistical evidence of the relevance of each scale in every possible area of the world is not required for the efficiency account. If there is a marking split, we should evaluate the general principle: it is efficient to mark an expression which is unlikely to function as A or P, given the available cues, and not to mark an expression with a high probability of performing that role. Exactly what those cues are may vary cross-linguistically. The main reason for this variation lies in the strong correlations between the features of individual roles. For example, a transitive subject is usually animate, non-lexical, discourse-given and definite (see below). It is extremely difficult to predict which of these features will become distinctive in a grammar. But if they do, we can predict the direction. If we take a particular scale, for example animacy, one would expect to find more languages with unmarked inanimate objects than with unmarked animate objects, and not the other way round (see Section 8.4 for corpus data showing why this should be the case).
This situation is similar to what we saw in Chapter 7 on causative constructions. When we discussed the semantics of the less compact causatives, we saw that the directionality of the formal asymmetries is universal: less accessible causative meanings are expressed by more costly forms than more accessible ones. This fact should not be obscured by the richness of the semantic distinctions and causation types that we find in the world’s languages.
Still, it is instructive to look at the cross-linguistic data, as we did in Chapter 7, in order to see which features and scales are relevant for differential marking. Below I present frequencies from the AUTOTYP database (Bickel et al. Reference Bickel, Nichols, Zakharko, Witzlack-Makarevich, Hildebrandt, Rießler, Bierkandt, Zúñiga and Lowe2017), version 0.1.0, which show which of the scales are more and less relevant for A and P cross-linguistically. Languages have many possible ways of combining the scales. Imagine Language X, which marks animate nominal and pronominal P arguments, and Language Y, which marks only animate nominal ones. In this case, Language X has a split on the animacy scale, whereas Language Y has a split on both the nominality scale (noun vs. pronoun), and on the animacy scale within nouns (inanimate vs. animate).
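The difference between Language X and Language Y can be made explicit with a small sketch. The representation is a simplifying assumption introduced here (each P argument is reduced to a nominality value and an animacy value, and `split_scales` is a hypothetical helper); it is not how AUTOTYP encodes splits.

```python
NOMINALITY = ("pronoun", "noun")
ANIMACY = ("animate", "inanimate")


def split_scales(marked):
    """Return the scales on which a marking pattern is split.

    `marked` is the set of (nominality, animacy) combinations that
    receive overt marking; all other combinations are unmarked.
    """
    scales = set()
    # Nominality split: pronouns and nouns differ for some animacy value.
    if any((("pronoun", a) in marked) != (("noun", a) in marked) for a in ANIMACY):
        scales.add("nominality")
    # Animacy split: animates and inanimates differ for some nominality value.
    if any(((n, "animate") in marked) != ((n, "inanimate") in marked) for n in NOMINALITY):
        scales.add("animacy")
    return scales


# Language X marks animate nouns and animate pronouns.
language_x = {("pronoun", "animate"), ("noun", "animate")}
# Language Y marks only animate nouns.
language_y = {("noun", "animate")}

print(split_scales(language_x))
print(split_scales(language_y))
```

In this toy encoding, Language X is split on animacy only, while Language Y is split on both nominality and animacy, as described above.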
One feature of AUTOTYP is that it uses the labels ‘high’ and ‘low’ to indicate high or low prominence in discourse (DP). High-DP means that an argument is either given, definite, animate, topical, etc., or a combination of these features. Low-DP indicates the absence of these features (or some of them). This is why it is impossible to separate definiteness from animacy precisely. Instead, we have a very syncretic discourse prominence hierarchy:
| Discourse prominence hierarchy: |
| High-DP (human/animate, definite/specific, given, topical) > Low-DP (non-human/inanimate, indefinite/non-specific, non-topical, new) |
The frequencies of the splits found in AUTOTYP are presented in Tables 8.1 and 8.2. I excluded those that also involve splits between different language-specific lexical classes of nouns, the 1st and 2nd person, singular and plural forms, or inclusive and exclusive pronouns. Splits in number and person are particularly frequent in pronouns (see more information in Schmidtke-Bode and Levshina Reference Schmidtke-Bode, Levshina, Seržant and Witzlack-Makarevich2018), which is probably explained by the conserving frequency effects causing the idiosyncratic patterns described by Filimonova (Reference Filimonova2005). I also excluded cases where one could interpret the data as both a fit and a violation. For example, Menya (an Angan language spoken in Papua New Guinea) has differential object marking, where pronouns and nouns with low discourse prominence are unmarked, while nouns with high discourse prominence are marked. This means that the system fits the discourse prominence scale within nouns (high-DP nouns are marked, while low-DP nouns are unmarked), but violates the nominality scale, according to which pronouns should be marked if (some) nouns are marked.
Table 8.1. Cross-linguistic distribution of differential transitive subject marking in AUTOTYP 0.1
| Distinction | Scale(s) | No. languages | No. families | No. areas | Examples |
|---|---|---|---|---|---|
| Pronouns \|\| Nouns | Nominality | 20 | 7 | 3 | Djapu (Pama-Nyungan, Pacific), Kryz (Nakh-Daghestanian, Eurasia), Cashinahua (Pano-Tacanan, Americas) |
| 1st & 2nd Person \|\| 3rd Person (incl. Nouns) | Person | 11 | 4 | 2 | Kham (Sino-Tibetan, Eurasia), Yidiny (Pama-Nyungan, Pacific) |
| Animates \|\| Inanimates (both pronouns and nouns) | Animacy | 2 | 2 | 2 | Mayali (Gunwinyguan, Pacific), Northern Qiang (Sino-Tibetan, Eurasia) |
| Pronouns, Animate Nouns \|\| Inanimate Nouns | Nominality, Animacy | 1 | 1 | 1 | Hittite (Indo-European, Eurasia) |
| Pronouns, Personal Proper Names \|\| Other Nouns | Nominality, Animacy | 1 | 1 | 1 | Djinang (Pama-Nyungan, Pacific) |
| Pronouns, DP Animate Nouns \|\| Other Nouns | Nominality, Animacy, DP | 1 | 1 | 1 | Mangarrayi (Mangarrayi-Maran, Pacific) |
| Pronouns, Proper Nouns, Kinship terms \|\| Other Nouns | Nominality | 1 | 1 | 1 | Central Pomo (Pomoan, Americas) |
| The most DP Pronouns and Nouns \|\| Other | DP | 1 | 1 | 1 | Tukang Besi (Austronesian, Pacific) |
| Pronouns, Proper nouns, Kinship terms \|\| Non-kin common nouns | Nominality | 1 | 1 | 1 | Eastern Pomo (Pomoan, Americas) |
| Common Nouns \|\| Pronouns, Proper Nouns | Nominality | 1 | 1 | 1 | Gitksan (Tsimshianic, Americas) |
Table 8.2. Cross-linguistic distribution of differential object marking in AUTOTYP 0.1
| Distinction | Scale(s) | No. languages | No. families | No. areas | Examples |
|---|---|---|---|---|---|
| Nouns \|\| Pronouns | Nominality | 65 | 33 | 4 | Logba (Kwa, Africa), Khanty (Uralic, Eurasia), Garrwa (Garrwan, Pacific), Rama (Chibchan, Americas) |
| Low-DP nouns \|\| Pronouns, high-DP nouns | Nominality, DP | 59 | 21 | 4 | Dizi (Omotic, Africa), Awa Pit (Barbacoan, Americas), Tamil (Dravidian, Eurasia), Akoye (Angan, Pacific) |
| Indefinite nouns \|\| Pronouns, definite nouns | Definiteness, Nominality | 14 | 8 | 3 | Amharic (Semitic, Africa), Chuvash (Turkic, Eurasia), Barasano (Tucanoan, Americas) |
| Nouns, 3rd person pronouns \|\| 1st & 2nd person pronouns | Person, Nominality | 7 | 6 | 4 | Dyirbal (Pama-Nyungan, Pacific), Kutenai (isolate, Americas), Waskia (Madang, Pacific), Tsova-Tush (Nakh-Daghestanian, Eurasia) |
| Inanimate nouns \|\| Pronouns, animate nouns | Animacy, Nominality | 7 | 4 | 2 | Hittite (Indo-European, Eurasia), Anamuxra (Madang, Pacific) |
| Inanimate and low-DP animate nouns \|\| Pronouns and high-DP animate nouns | Nominality, DP | 6 | 3 | 2 | Hup (Nadahup, Americas), Djapu (Pama-Nyungan, Pacific) |
| Low-DP pronouns and nouns \|\| High-DP pronouns and nouns | DP | 5 | 5 | 3 | Tariana (Arawakan, Americas), Kharia (Austro-Asiatic, Eurasia), Tainae (Angan, Pacific) |
| Inanimate nouns and pronouns \|\| Animate nouns and pronouns | Animacy | 2 | 2 | 1 | Imonda (Border, Pacific) |
| Common, non-kin nouns \|\| All other | Nominality | 2 | 2 | 2 | Eastern Pomo (Pomoan, Americas), Sardinian (Indo-European, Eurasia) |
| Non-specific nouns \|\| All other | Definiteness, Nominality | 2 | 2 | 1 | Persian (Indo-European, Eurasia) |
| Nouns and low-DP 3rd person pronouns \|\| All other pronouns | Nominality, DP | 1 | 1 | 1 | Yidiny (Pama-Nyungan, Pacific) |
| Non-kin nouns, low-DP 3rd person pronouns \|\| All other | Nominality, DP | 1 | 1 | 1 | Central Pomo (Pomoan, Americas) |
| Low-DP nouns, inanimate pronouns \|\| All other | DP, animacy | 1 | 1 | 1 | Afrikaans (Indo-European, Africa) |
| Pronouns, high-DP nouns \|\| low-DP nouns | Nominality, DP | 1 | 1 | 1 | Maithili only in dependent clauses with converbs (Indo-European, Eurasia) |
| 1st & 2nd person pronouns \|\| 3rd person pronouns, nouns | Person, Nominality | 1 | 1 | 1 | Osage (Siouan, Americas) |
| Highest-DP pronouns and nouns \|\| All other | DP | 1 | 1 | 1 | Tukang Besi (Austronesian, Pacific) |
Let us consider the results for transitive subjects presented in Table 8.1. The double vertical bar ‘||’ represents a split. The features on the left of the sign are unmarked, and the features on the right are marked. Importantly, the marking is understood by the authors in the Silversteinian sense (see Bickel et al. Reference Bickel, Witzlack-Makarevich, Zakharko, Bornkessel-Schlesewsky, Malchukov and Richards2015b): an argument is unmarked if it has the same expression as S (intransitive subject). In practice, it usually means zero expression. The parentheses contain the language family and the geographic macroarea.
The predominant category for subject marking is clearly nominality (cf. de Hoop and Malchukov Reference de Hoop and de Swart2008: 567), followed by person. Animacy and discourse prominence only rarely play a role on their own. This supports the previous observations. There is a violation of the nominality scale (see the shaded row): in Gitksan (a Tsimshianic language spoken in Canada) subjects expressed by common nouns are unmarked, but subjects expressed by proper nouns and pronouns are marked.
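The ranking of the scales for subject marking can be checked with a simple tally over the rows of Table 8.1. The sketch below rests on one assumption that is mine rather than AUTOTYP's: a row that involves several scales is counted once towards each of them.

```python
from collections import Counter

# (scales involved, number of languages), copied row by row from Table 8.1.
table_8_1 = [
    (("Nominality",), 20),
    (("Person",), 11),
    (("Animacy",), 2),
    (("Nominality", "Animacy"), 1),        # Hittite
    (("Nominality", "Animacy"), 1),        # Djinang
    (("Nominality", "Animacy", "DP"), 1),  # Mangarrayi
    (("Nominality",), 1),                  # Central Pomo
    (("DP",), 1),                          # Tukang Besi
    (("Nominality",), 1),                  # Eastern Pomo
    (("Nominality",), 1),                  # Gitksan (violates the scale)
]

# Count each row towards every scale it mentions (an assumption of this sketch).
languages_per_scale = Counter()
for scales, n_languages in table_8_1:
    for scale in scales:
        languages_per_scale[scale] += n_languages

for scale, n in languages_per_scale.most_common():
    print(scale, n)
```

Counted this way, nominality is involved in 26 languages, person in 11, animacy in 5 and discourse prominence in 2.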
The splits in object marking are shown in Table 8.2. The results partly support the previous claims. Inanimate, indefinite and low-DP nouns are overwhelmingly unmarked. However, the most frequent distinction is between nouns and pronouns. Person seems to produce the lowest number of splits. Again, there are a few exceptions (see the shaded rows), but they are not numerous.
To summarize, subjects tend to be marked if they are nominal and, less frequently, if they are 3rd person. The evidence for the other scales is rather weak. Objects tend to be marked if they are pronominal, high-DP (including animacy and definiteness) and 1st or 2nd person. The evidence for the person scale is the weakest.
One problem with this database is that the category DP hides many scales, so that we cannot directly test different scales. Also, there is no information about the presence or absence of case markers. Table 8.3 displays some frequencies from the cross-linguistic differential and optional marking database (Sommer and Levshina Reference Sommer and Levshina2021), which currently includes detailed descriptions of differential A and P marking in twenty-five languages from diverse families and all parts of the world. The counts are the number of languages in which a particular scale effect is found. Only productive patterns with segmental case markers are taken into account. Importantly, almost all these languages fit the scales. We find only one violation of the predictions. This provides additional support for the universal directionality of the scale effects described above. The evidence for objects is more convincing than the evidence for subjects, in accordance with the previous observations. More data will tell us if these observations are correct.
Table 8.3. Number of languages that fit (violate) the scales in the cross-linguistic differential and optional marking database
| Argument | Animacy scale | Definiteness scale | Nominality scale | Person scale |
|---|---|---|---|---|
| A | 3 (0) | 0 (0) | 4 (1) | 4 (0) |
| P | 8 (0) | 11 (0) | 9 (0) | 4 (0) |
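The totals behind Table 8.3 can be summed up as follows. This is only a bookkeeping sketch with hypothetical names; note that the sums count scale effects, not languages, since one language may show effects on several scales.

```python
# (fits, violations) per scale, copied from Table 8.3.
table_8_3 = {
    "A": {"animacy": (3, 0), "definiteness": (0, 0),
          "nominality": (4, 1), "person": (4, 0)},
    "P": {"animacy": (8, 0), "definiteness": (11, 0),
          "nominality": (9, 0), "person": (4, 0)},
}


def totals(argument):
    """Sum the fits and violations over all scales for A or P."""
    cells = table_8_3[argument].values()
    fits = sum(f for f, _ in cells)
    violations = sum(v for _, v in cells)
    return fits, violations


for argument in ("A", "P"):
    fits, violations = totals(argument)
    print(argument, "fits:", fits, "violations:", violations)
```

For A this gives 11 fits and 1 violation, and for P 32 fits and no violations.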
The next question is, how do we know that the coding asymmetries are indeed due to efficiency? And why is it more efficient to mark definite objects, for example? The next section addresses these questions.
8.3 Explanations of Differential Case Marking
8.3.1 Disambiguation and Economy
There exist many theories and accounts that explain the observed scale effects in differential case marking. Probably the most popular explanation has to do with disambiguation, or distinguishing between the arguments. It is often argued that two nominal phrases, when they are simultaneously present in the transitive clause, should be distinguished (Comrie Reference Comrie and Lehmann1978: 379–380, Reference Comrie1989: 124–127; Dixon Reference Dixon1979; Givón Reference Givón1984: 184; Aissen Reference Aissen2003, among others). It is assumed that disambiguation interacts with economy. If a context is not ambiguous, or the argument in question has typical properties, it does not require formal marking. Only situations that may lead to misunderstanding require marking.
This principle is very clear in the systems in which the marking of the subject depends on the properties of the object, and the other way round (cf. de Hoop and Malchukov Reference de Hoop and de Swart2008). Such marking is called global, in contrast to local marking, which depends only on the properties of the argument itself (Witzlack-Makarevich and Seržant Reference Witzlack-Makarevich, Seržant, Seržant and Witzlack-Makarevich2018). For example, Malayalam (a Dravidian language) marks animate objects and usually does not mark inanimate objects. However, when the sentence is potentially ambiguous, the object marker can also be used on inanimate objects, as in (6).
| Malayalam: Dravidian (Asher and Kumari 1997: 204, cited from de Swart Reference De Swart2007: 88) | |||
| a. | Kappal | tiramaalakaḷ-e | bheediccu. |
| ship.nom | waves-acc | split.pst | |
| ‘The ship broke through the waves.’ | |||
| b. | Tiramaalakaḷ | kappal-ine | bheediccu. |
| waves.nom | ship-acc | split.pst | |
| ‘The waves split the ship.’ | |||
More examples can be found in de Swart (Reference De Swart2007: Section 3.2).
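The contrast between local and global marking can be illustrated with a toy decision rule. This is a deliberately crude sketch under my own assumptions: animacy is treated as binary, potential ambiguity is reduced to the case where A and P have the same animacy value, and the function names are hypothetical. It is not a description of Malayalam grammar.

```python
def local_rule(obj_animate):
    """Local marking: the decision depends on the object alone."""
    return obj_animate


def global_rule(subj_animate, obj_animate):
    """Global marking: the decision also depends on the subject.

    The object is marked if it is animate, or if subject and object
    have the same animacy value, so that animacy cannot tell them apart.
    """
    return obj_animate or (subj_animate == obj_animate)


# A clause like (6a): inanimate subject ('ship'), inanimate object ('waves').
print(local_rule(obj_animate=False))                       # False
print(global_rule(subj_animate=False, obj_animate=False))  # True

# A stereotypical clause: animate subject, inanimate object.
print(global_rule(subj_animate=True, obj_animate=False))   # False
```

The two rules differ only in clauses with two inanimate arguments, which is where the object marker appears in the Malayalam examples.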
Also, mood, tense and aspect can play a role. For example, object marking is less likely in imperative sentences in Udihe (Altaic) and Mutsun (Penutian) (Sommer and Levshina Reference Sommer and Levshina2021). This may have to do with the fact that imperatives usually do not contain subjects, so there is no competing nominal. Interestingly, Rapanui (Austronesian) has higher chances of object marking in imperative sentences with an explicit subject than without it, which supports this explanation.
Yet, global marking systems like the one in (6) are less frequent than systems with local marking, as we can judge from the cross-linguistic data. A possible explanation is that local differential marking provides a convenient shortcut, which works in most situations, is easier to learn and probably helps to save processing effort during language production and comprehension. It can be easier to process a sentence incrementally, constituent after constituent, than to monitor it as a whole for potential ambiguity in language production (Seržant Reference Seržant, Schmidtke-Bode, Levshina, Michaelis and Seržant2019: 169), or run the risk of incorrect role assignment and reanalysis in comprehension. Moreover, if animate objects, for example, are often marked in order to avoid ambiguity with subjects in a global marking system, this marking can become entrenched and be triggered automatically by animacy. For example, speakers of Spanish know that la actriz ‘the actress’, when it functions as a direct object, is used with the marker a, while la película ‘the film’ is used without any marker (Diessel Reference Diessel2019: 238). No further considerations are required. It may be that both processing optimization and learning mechanisms contribute to the fact that global marking is less frequent cross-linguistically than local marking, and when it does occur, it usually plays a subordinate role with regard to other constraints (Seržant Reference Seržant, Schmidtke-Bode, Levshina, Michaelis and Seržant2019).
Different referential features (and other contextual information) serve as cues that help the addressee to infer the grammatical role immediately. As Comrie writes (Reference Comrie and Lehmann1978: 385–386),
there seems to be a general supposition in human discourse that certain entities are inherently more agentive than others, and as such inherently more likely to appear as A of a transitive verb and less likely to appear as P of a transitive verb.
For example, animacy is a strong cue for subjecthood, as opposed to objecthood. If a referent is animate, it is more likely to be a subject than an object. The addressee does not need much help in assigning the role. An inanimate entity is much more likely to be an object than a subject. Thus, inanimacy is a strong cue for objecthood. As Royen (Reference Royen1929: 590) observed, ‘Eine Person ist vor allem agens, ein Impersonale vor allem patiens [A person is mainly Agent, a non-person is mainly Patient]’. If this is true, then additional marking on a human P or on a non-human A would help the hearer to identify the role more easily. In fact, this idea had already been proposed by Bishop Robert Caldwell (1856: 271, cited from Filimonova Reference Filimonova2005: 78):
The principle that it is more natural for rational beings to act than to be acted upon; and hence when they do happen to be acted upon – when the nouns by which they are denoted are to be taken objectively – it becomes necessary, in order to avoid misapprehension, to suffix to them the objective case-sign.
An example is Korean, which has probabilistic marking. In particular, Subject is marked more frequently in colloquial speech when it is 3rd person, a non-human participant and a common noun, than when it refers to speech act participants and other human referents expressed by pronouns or proper nouns. For Object, the tendencies are mostly reversed (Lee Reference Lee, de Hoop and de Swart2009). Consider an example of a pronominal object in (7).
| Korean (Kwon and Zribi-Hertz Reference Kwon and Zribi-Hertz2008: 263) | |||
| yeongmi-ga | uli-leul | moim-e | chodae ha-ess-eo |
| yeongmi-sbj | us-obj | meeting-loc | invitation do-pst-dec/inf |
| ‘Yeongmi invited us to the meeting.’ | |||
As will be shown in Section 8.4, the features with higher chances of marking are weaker cues to the grammatical roles.
There are many cases that cannot be easily explained by disambiguation at the sentence level. Consider the following example from Japanese:
| Japanese (Kurumada and Jaeger Reference Kurumada and Jaeger2015: 156) | |||
| Sensei-ga | seito-o | ekimae-de | mi-ta-yo. |
| teacher-sbj | student-obj | station-loc | see-pst-sfp |
| ‘The teacher saw the student at the station.’ | |||
The subject marker ‑ga makes the sentence unambiguous. Still, Kurumada and Jaeger (Reference Kurumada and Jaeger2015) show that the chances of the object marker are higher for animate nouns than for inanimate ones in sentences like this. In addition, they find that the participants tend to mark the object when it occurs in a less plausible configuration of A and P, as in (9a). In contrast, in more stereotypical contexts, such as the one in (9b), the marker is often omitted:
| a. | The police officer attacked the criminal in the middle of the night. |
| b. | The criminal attacked the police officer in the middle of the night. |
The accessibility of the object interpretation in (9a) is lower than in (9b) because it is criminals who usually attack police officers. By using the object marker, the speaker signals that the situation does not correspond to the stereotypical situation. Importantly, the subject marking is always present in their stimuli. There is thus no ambiguity. The object marker helps to override the more accessible interpretation.
According to Haspelmath (Reference Haspelmath2019), differential marking is motivated by expectation management rather than by ambiguity avoidance. Another argument is the fact that coding asymmetries in some languages involve a shorter and a longer marker. The longer marker is preferred in less predictable role–reference associations. This tendency cannot be explained by ambiguity avoidance because there is no ambiguity in the first place. But it can be explained by the principle of negative correlation between accessibility and costs (see also Section 1.4.2).
Differential marking is influenced not only by the referential properties of one or both arguments, but also by other factors, such as word order, tense and aspect, etc. An example is Gurindji Kriol, a mixed language spoken in Australia, which has SVO as the dominant word order. The ergative marker is used on subjects in about 66 per cent of all cases. However, in rare cases when the subject follows the verb, more than 90 per cent of such subjects have an ergative marker (Meakins Reference Meakins2008: 283). Another important variable is the distance from the verb. For example, in conversational Korean, the chances that an object will remain bare gradually decrease as the number of words between the object and the verb increases (Kim Reference Kim2008). This does not necessarily mean that the order must be untypical. For example, there can be an adverb intervening between the object and the verb while the word order remains canonical (i.e., SOV). These cases can also be interpreted as providing additional formal cues in more challenging contexts, where the addressee may have problems with accessing the intended interpretation.
At the same time, it should be said that the facilitation of local role assignment and ambiguity avoidance at the more general sentence level are two sides of the same coin. In both cases, the use of marking depends on the accessibility of the intended interpretation to the addressee. It is necessary to repeat here that the importance of ambiguity for explaining language structure and use is overrated (Ferreira Reference Ferreira2008; Wasow Reference Wasow and Winkler2015; see also Section 6.2.1). In most cases, language users have enough contextual cues to understand each other. Note that even languages with a high level of grammatical ambiguity and underspecification, like those in east and mainland Southeast Asia (Bisang Reference Bisang, Sampson, Gil and Trudgill2009), are still communicatively effective. At the same time, many languages are vastly redundant (see Section 6.3.6). However, if a language develops a coding split, it is likely to be efficient in the sense that more formal cues are provided to express a less accessible meaning. Accessibility is evaluated either globally, at the level of the entire clause, or – more often – locally, depending on the properties of the argument.
8.3.2 Iconicity of Markedness and Economy
Iconicity of markedness means that semantically (or cognitively) marked members of grammatical categories are usually marked formally, while semantically unmarked ones are also formally unmarked. The relationship between semantic and formal marking is iconic. This idea, also labelled by Givón (Reference Givón and Givón1995) as the meta-iconic markedness principle, has been used as an explanation of differential marking (cf. Dalrymple and Nikolaeva Reference Dalrymple and Nikolaeva2011: 3). As noted by Aissen (Reference Aissen2003), iconicity of markedness interacts with the principle of economy because semantically unmarked arguments are left formally unmarked. Semantically and formally unmarked subjects are prominent (animate or human, definite and specific), while marked subjects are non-prominent. For objects, the reverse is claimed to be the case (Aissen Reference Aissen2003).
A problem with this account is that markedness is a slippery concept, which has very many interpretations (Fenk-Oczlon Reference Fenk-Oczlon1991; Haspelmath Reference Haspelmath2006). There is a danger of using it in a circular way (e.g., this form has no marking because it is semantically/conceptually unmarked, and it is semantically/conceptually unmarked because it has no marking, and so on ad infinitum). Following Fenk-Oczlon (Reference Fenk-Oczlon1991) and Haspelmath (Reference Haspelmath2006), I assume that many, if not all, aspects of markedness relevant for grammar can be explained by frequency asymmetries between different categories.
8.3.3 Indexing Function and High Transitivity
Another explanatory principle suggested in the literature is the identifying, or indexing, function of differential marking. As de Hoop and Malchukov (2008) argue, both prominent subjects and prominent objects will be marked because they make for better and more individuated participants. Animate subjects are, for instance, volitional and agentive, which is why they make good subjects. Animate objects are considered to be more (obviously) affected than others and therefore make for better objects. This is why, it is argued, differential object marking is more common than differential subject marking. In differential object marking, the distinguishing and identifying functions overlap: animate or definite objects are both prominent and similar to subjects (and therefore need to be distinguished from the latter). For subjects, however, these principles are in conflict because animate subjects are prominent but they do not require disambiguation, while inanimate subjects are not prominent but require disambiguation (Malchukov 2008).
This approach is related to the one taken by Hopper and Thompson (1980), who say that the situations where differential object marking is observed indicate high transitivity. The latter is argued to be associated with the foregrounding function in discourse. Highly transitive events involve highly individuated subjects and objects, and the object is highly affected. Presumably, definite and animate subjects and objects satisfy these conditions and therefore lead to high transitivity.
Note that high transitivity is different from typical transitivity. For instance, Næss (2007) claims that the prototypical transitive clause is one where the subject and the object are maximally semantically distinct. Similarly, Comrie (1989: 128) speaks of ‘natural transitive constructions’, in which the subject is high on animacy and definiteness, and the object is low on animacy and definiteness.
Certainly, de Hoop and Malchukov’s account is very elegant and nicely explains the relative scarceness of evidence for scale effects in differential subject marking. However, attempts by some researchers to use affectedness in order to explain differential case marking have not brought conclusive results so far (see García García 2018). Animacy and definiteness, as we have seen, play a very important role in differential object marking, but there are far fewer cases where affectedness would play a role per se (see footnote 2). Finally, it is not quite clear what kind of cognitive or pragmatic mechanism is responsible for marking the more salient, more representative, etc. participants, while leaving the others unmarked.
How, then, can we explain the rarity of scale effects in differential subject marking? First of all, ergative languages, where differential subject marking is found, are relatively infrequent, which leaves us with fewer opportunities for variation. A possible reason is a cognitive bias towards processing the unmarked first nominal phrase as the subject (Bickel et al. 2015a). This bias, which can be explained by general processing principles (see Section 3.3.4), exists in ergative languages, as well. This may account for the fact that ergativity is disfavoured cross-linguistically. Other reasons may be the infrequent use of full lexical forms of subjects in discourse (see Section 8.4) and frequent omission of pronominal subjects due to their high accessibility (see Section 2.2), which do not allow the marking to grammaticalize.
8.4 Reverse Engineering: Cross-Linguistic Generalizations and Corpus Data
8.4.1 A Reverse-Engineering Approach to Differential Case Marking
This section focuses on the previous functional-adaptive accounts that involve the notions of disambiguation and iconicity of markedness. Which of them fits the cross-linguistic data the best? In order to compare the explanations, we can perform a kind of reverse engineering. We will use the cross-linguistic tendencies discussed in Section 8.2 to predict how the features of Subject and Object would need to be distributed in discourse for the observed patterns to arise, according to each of the strategies. In order to compare the predictions with the discourse data, I analyse corpora in diverse languages.
Let us begin with the distinguishing account. There are at least two strategies that language users may employ. According to one strategy, language users will tend to mark Subjects that look like typical Objects, and Objects that look like typical Subjects. As Aissen (2003: 437) puts it, ‘it is those direct objects which most resemble typical subjects that get overtly case-marked’. In other words, those Subjects that have properties of typical Objects get extra marking, and the other way round. I will call this strategy ‘Mark Impostors, Don’t Mark the Authentic’.
Formally, this can be expressed with the help of conditional probabilities. Let ℙ (Feature|A) stand for the conditional probability of a certain Feature (i.e., animacy, definiteness, etc.) given the A role. For brevity, we will use conventional labels A and P to represent transitive subject and object, respectively. The symbol ℙ stands for probability, which can be approximated by the proportion of A’s with this feature in the total number of A’s in a reasonably large sample of discourse. The symbol ℙ should not be confused with P, which represents a grammatical role. The predictions are then as follows:
| Reverse-engineering predictions based on the distinguishing account and the strategy ‘Mark Impostors, Don’t Mark the Authentic’ |
| Prediction for A: |
| If A with Feature_i tends to be formally marked in languages of the world, then the probability ℙ (Feature_i |P) in discourse should be high. If A tends to be unmarked, then ℙ (Feature_i |P) should be low (see footnote 3). |
| Prediction for P: |
| If P with Feature_j tends to be formally marked in languages of the world, then the probability ℙ (Feature_j |A) in discourse should be high. If P tends to be unmarked, then ℙ (Feature_j |A) should be low. |
If this approach is correct, one would expect that a typical P in discourse is nominal and 3rd person, since A’s with these features are marked cross-linguistically, as was shown in Section 8.2. An untypical P is then pronominal and 1st or 2nd person, because these features are not marked on A. As for A’s, their most common properties should be pronominality, high discourse prominence (givenness and definiteness, including humanness/animacy), and 1st or 2nd person, because P’s with these features are usually marked in the typological data. It follows that A’s should rarely be nominal, low-DP (including inanimate) or 3rd person.
The other interpretation of distinguishability involves interpreting the features as strong or weak cues for assigning a grammatical role. This disambiguation strategy can be labelled as ‘Mark Weak Cues, Don’t Mark Strong Cues’.
| Reverse-engineering predictions based on the distinguishing account and the strategy ‘Mark Weak Cues, Don’t Mark Strong Cues’ |
| Prediction for A: |
| If A with Feature_i tends to be formally marked in languages of the world, then ℙ (A| Feature_i) should be low. If A tends to be unmarked, then ℙ (A| Feature_i) should be high. |
| Prediction for P: |
| If P with Feature_j tends to be formally marked in languages of the world, then ℙ (P| Feature_j) should be low. If P tends to be unmarked, then ℙ (P| Feature_j) should be high. |
The notation ℙ (A| Feature_i) represents the conditional probability of A given a certain feature. It can be computed as the proportion of A’s among all core arguments with this feature (in our case, A and P). ℙ (P| Feature_i) stands for the conditional probability of P given a feature and can be estimated as the proportion of P’s relative to all A’s and P’s with this feature. The cross-linguistic distribution of the features, which was discussed in Section 8.2, suggests that one could expect to find few A’s among nominal core arguments (i.e., A and P taken together) and possibly among 3rd-person referents because A’s with these features tend to be marked cross-linguistically. We can also expect low proportions of P’s in pronominal, high-DP, animate or definite and possibly 1st and 2nd-person arguments, because P’s with these features tend to be marked.
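The difference between the two kinds of conditional probability, ℙ (Feature|Role) and ℙ (Role|Feature), can be illustrated with a minimal Python sketch. The counts are invented purely for illustration and do not come from any corpus; the function names are likewise hypothetical.

```python
from collections import Counter

# Hypothetical counts of (role, feature value) pairs; NOT real corpus data.
counts = Counter({
    ("A", "lexical"): 10,
    ("A", "non-lexical"): 90,
    ("P", "lexical"): 60,
    ("P", "non-lexical"): 40,
})


def p_feature_given_role(counts, feature, role):
    """Proportion of arguments with `feature` among all arguments in `role`."""
    role_total = sum(n for (r, _), n in counts.items() if r == role)
    return counts[(role, feature)] / role_total


def p_role_given_feature(counts, role, feature):
    """Proportion of `role` among all core arguments (A and P) with `feature`."""
    feature_total = sum(n for (_, f), n in counts.items() if f == feature)
    return counts[(role, feature)] / feature_total


# P(lexical | A): how many A's are lexical?
print(p_feature_given_role(counts, "lexical", "A"))
# P(A | lexical): how many lexical core arguments are A's?
print(p_role_given_feature(counts, "A", "lexical"))
```

With these invented counts, the first quantity is 10 out of 100 A’s, and the second is 10 out of 70 lexical arguments. The same cell of the contingency table is divided by a different total in each case.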
As for iconicity of markedness, this has to do with the typicality of different A’s and P’s, regardless of the other argument. This principle can be called ‘Mark Weirdos, Don’t Mark Normals’. Since markedness is associated with the relative frequency of two or more members within one category, e.g., singular and plural within the category of number (Greenberg 1966), we need to compute the proportions of particular referents (animate, definite, etc.) in the total number of A’s or P’s, and take the least probable ones. One can formulate the predictions as follows:
| Reverse-engineering predictions based on the markedness principle and the strategy ‘Mark Weirdos, Don’t Mark Normals’ |
| Prediction for A: |
| If A with Feature_i tends to be formally marked in languages of the world, then ℙ (Feature_i |A) should be low. If A tends to be unmarked, then ℙ (Feature_i |A) should be high. |
| Prediction for P: |
| If P with Feature_j tends to be formally marked in languages of the world, then ℙ (Feature_j |P) should be low. If P tends to be unmarked, then ℙ (Feature_j |P) should be high. |
As explained above, the quantities ℙ (Feature_i |A) and ℙ (Feature_i |P) represent conditional probabilities of a given feature given the role (A or P, respectively). If this principle is relevant, A should have a low proportion of nouns and possibly 3rd-person referents, while P should have a particularly low number of pronouns, high-DP, animate and possibly 1st and 2nd-person referents, since all these are the formally marked features of A and P in the cross-linguistic data. The predictions for the three marking strategies are summarized in Table 8.4. The features in brackets are less clearly supported cross-linguistically.
Table 8.4. Reverse-engineered predictions for the distribution of features of A and P in discourse
| Marking strategy | Prediction for A distribution | Prediction for P distribution | Conditional probability |
|---|---|---|---|
| Mark Impostors | A is usually pronominal, high-DP/animate (and 1st & 2nd person). | P is usually nominal and 3rd person. | ℙ (Feature\|Role), the other role |
| Don’t Mark the Authentic | A is rarely nominal, low-DP/inanimate (and 3rd person). | P is rarely pronominal and 1st & 2nd person. | |
| Mark Weak Cues | Nominal and 3rd person arguments are rarely A. | High-DP/animate, pronominal arguments (and 1st & 2nd person) are rarely P. | ℙ (Role\|Feature), the same role |
| Don’t Mark Strong Cues | Pronominal and 1st & 2nd person arguments are usually A. | Low-DP/inanimate, nominal arguments (and 3rd person) are usually P. | |
| Mark Weirdos | A is rarely nominal and rarely 3rd person. | P is rarely pronominal, high-DP/animate (and 1st & 2nd person). | ℙ (Feature\|Role), the same role |
| Don’t Mark Normals | A is usually pronominal and 1st and 2nd person. | P is usually nominal, low-DP/inanimate (and 3rd person). | |
As for the identifying explanation, I leave it untested here. In order to test this hypothesis, we need independent evidence that the degree of individuation, affectedness, salience, etc., influences the probability of marking for any role. It seems that such evidence can only be provided experimentally, which is why this hypothesis is beyond the scope of the present corpus-based study.
8.4.2 Recycling Old Data
If any of the functional-adaptive accounts presented here is correct, we can expect to find a correspondence between corpus-based conditional probabilities ℙ (Role|Feature) or ℙ (Feature|Role) and the cross-linguistic distribution. In fact, some statistical evidence was provided by Thomson (1909), who looked at transitive verbs in contemporary Russian and found that almost three-quarters of them had exclusively a person as the subject. Only 14 per cent of transitive verbs had a human being as the object (e.g., the Russian verbs denoting ‘undress’ and ‘hug’), and more than half could be used with a human object (e.g., ‘see’ and ‘steal a child’). In contrast, inanimate things were normally subjects for 10 per cent of all transitive verbs. About 45 per cent of the verbs always had an inanimate thing as a patient, and three-quarters of the verbs could take an inanimate object. Thomson (1909) did not provide the exact frequencies, unfortunately, nor did he describe his sampling method.
After a long period when language in use was not the main focus of most theorists, the end of the twentieth century witnessed a strong interest in discourse-based explanations of grammatical patterns. There was a fruitful discussion of the Preferred Argument Structure as the basis for ergativity, as well as quite a few studies of animacy effects in grammar (e.g., Du Bois 1987; Dahl and Fraurud 1996; Dahl 2000; Du Bois et al. 2003; Haig and Schnell 2016). As a useful product of these studies and debates, one can find a considerable amount of available data showing the distributional properties of A and P in different types of corpora and languages. This section examines these data as evidence for or against the predictions formulated in Section 8.4.1. I chose those studies in which it was possible to find the original frequencies for the parameters of interest and a description of data sources. L2 learner corpora and language impairment data were not considered because I could not find comparable data in many languages for these varieties.
When recycling data from different studies by different authors, one has to make some compromises. First of all, the definitions of transitivity varied somewhat from one study to another. For example, Dahl and Fraurud (1996) only include those clauses in Swedish in which both the subjects and the objects are overt noun phrases. However, since Swedish is not a subject pro-drop language, the results can still be comparable with the other studies, where non-overt arguments are also counted (e.g., the Multi-CAST data). When pro-drop languages were considered, I only included those studies where both overt and covert A’s and P’s were counted. This was done for the sake of comparability of the results. Moreover, there were some discrepancies in the features. For example, most studies coded humanness, but some described the animacy of A and P. This difference was tolerated, since in my own experience, animals are infrequently mentioned in contemporary corpora. Also, most studies counted the 1st and the 2nd-person reference based on the grammatical properties of the arguments, but Dahl (2000) uses the notion ‘egophoric’, which includes reference to the speech act participants, generic and logophoric reference, like he in Peter says that he is sick. Still, I hope that these differences lead to relatively small imprecisions.
The results that are directly relevant for our research question are summarized in Tables 8.5–8.7. The tables display the percentages which represent different conditional probabilities of the features of A and P. The original corpus counts are provided in Appendix 2. The features presented here were chosen because they were available in numerous studies. They are the following:
lexical, i.e., full nominal phrase vs. pronouns, affixes or zero;
human or animate (when specified) vs. all other semantic types;
1st and 2nd person (or egophoric reference in Dahl’s approach) vs. the 3rd person;
new vs. given or inferable from context.
Note that I did not aggregate the frequencies of pronouns because there was a lot of variation with regard to the acceptability of zero anaphora in a language. For instance, English allows pronouns to be omitted only in a few cases (most typically, the subject of an imperative sentence), while in Lao zero anaphora is a very common reference-management device for both subjects and objects (see Chapter 2, Section 2.2.1). This is why I make a distinction between lexical and non-lexical expressions here. Definiteness is absent from this list because this feature is very rarely reported, unfortunately. However, the study based on new data in Section 8.4.3 shows that the frequencies of definite A’s and P’s are similar to those of discourse-given arguments.
Table 8.5. Distribution of features of A (transitive subjects) within the role (%)
| Study | Corpus | Lexical | Human | 1st & 2nd | New |
|---|---|---|---|---|---|
| Du Bois 1987 | Pear Story narratives in Sakapultek | 6.1 | 100 | NA | 3.2 |
| Chui 1992 | Ghost narratives in Chinese | 38.1 | NA | NA | 2.9 |
| Ashby & Bentivoglio 1993 | interviews (monologues) in French | 6.7 | NA | NA | 0 |
| | interviews (monologues) in Spanish | 6.1 | NA | NA | 0.4 |
| Dahl & Fraurud 1996 | written Swedish | NA | 56 | NA | NA |
| Sutherland-Smith 1996 | Several oral narratives in modern Hebrew | 6.7 | NA | NA | 2.2 |
| Dahl 2000 | spontaneous conversations in Swedish | NA | 93.2 (animate) | 60.7 (ego) | NA |
| Allen & Schröder 2003 | Inuktitut child language | 1.1 | 99 (animate) | 97.4 | 0.7 |
| Arnold 2003 | Mapudungun narrative texts | 14.9 | NA | NA | 1.2 |
| Everett 2009 | English talk shows | 9.7 | 91.8 | NA | NA |
| | Portuguese talk shows | 17.1 | 87.1 | NA | NA |
| Lin 2009 | Chinese conversations | 20 | NA | NA | 15 |
| | Chinese narratives | 18.8 | NA | NA | 12.5 |
| | Chinese written texts | 15.9 | NA | NA | 20.5 |
| Schiborr 2016 | English autobiographical narratives | 8.2 | 92.8 | 59 | NA |
| Haig & Thiele 2016 | Northern Kurdish traditional narratives | 13 | 96.7 | 32.5 | NA |
| Adibifar 2016 | Persian stimulus-based narratives | 13.6 | 96.2 | 3.6 | NA |
| Mosel & Schnell 2016 | Teop traditional narratives | 9.7 | 95.4 | 25.6 | NA |
| Schnell 2016 | Vera’a traditional narratives | 7.4 | 94.7 | 15.4 | NA |
Table 8.6. Distribution of features of P within the role (%)
| Study | Corpus | Lexical | Human | 1st & 2nd | New |
|---|---|---|---|---|---|
| Du Bois 1987 | Pear Story narratives in Sakapultek | 45.8 | 10 | NA | 24.7 |
| Chui 1992 | Ghost narratives in Chinese | 84.3 | NA | NA | 33.6 |
| Ashby & Bentivoglio 1993 | interviews (monologues) in French | 67.4 | NA | NA | 29.7 |
| | interviews (monologues) in Spanish | 59.7 | NA | NA | 24.9 |
| Dahl & Fraurud 1996 | written Swedish | NA | 13 | NA | NA |
| Sutherland-Smith 1996 | Several narratives in modern Hebrew | 56.3 | NA | NA | 23.9 |
| Dahl 2000 | spontaneous conversations in Swedish | NA | 16.4 (animate) | 4.3 | NA |
| Allen & Schröder 2003 | Inuktitut child language | 6 | 21.1 (animate) | 14.3 | 27 |
| Arnold 2003 | Mapudungun narrative texts | 85.1 | NA | NA | 47.8 |
| Everett 2009 | English talk shows | 59.7 | 12.6 | NA | NA |
| | Portuguese talk shows | 84.7 | 6.1 | NA | NA |
| Lin 2009 | Chinese conversations | 80 | NA | NA | 55.6 |
| | Chinese narratives | 94 | NA | NA | 72.5 |
| | Chinese written texts | 81.8 | NA | NA | 70.1 |
| Schiborr 2016 | English autobiographical narratives | 47.8 | 12.4 | 4.8 | NA |
| Haig & Thiele 2016 | Northern Kurdish traditional narratives | 54.7 | 25.9 | 6.8 | NA |
| Adibifar 2016 | Persian stimulus-based narratives | 52.7 | 18 | 0 | NA |
| Mosel & Schnell 2016 | Teop traditional narratives | 43 | 43.3 | 4.7 | NA |
| Schnell 2016 | Vera’a traditional narratives | 56.1 | 35.3 | 8.6 | NA |
Table 8.7. Distribution of the roles within the features (only A shown) (%)
| Study | Corpus | Lexical A | Non-lexical A | Human A | Non-Human A | 1st & 2nd A | 3rd A | New A | Non-new A |
|---|---|---|---|---|---|---|---|---|---|
| Du Bois 1987 | Pear Story narratives in Sakapultek | 11.9 | 63.8 | 91.7 | 0 | NA | NA | 12.5 | 58.6 |
| Chui 1992 | Ghost narratives in Chinese | 33.6 | 81.6 | NA | NA | NA | NA | 9 | 62.1 |
| Ashby & Bentivoglio 1993 | interviews (monologues) in French | 9 | 74.1 | NA | NA | NA | NA | 0 | 58.7 |
| | interviews (monologues) in Spanish | 9.3 | 70 | NA | NA | NA | NA | 1.4 | 57 |
| Dahl & Fraurud 1996 | written Swedish | NA | NA | 75.3 | 25.9 | NA | NA | NA | NA |
| Sutherland-Smith 1996 | Several narratives in modern Hebrew | 14 | 74.6 | NA | NA | NA | NA | 11.3 | 63.8 |
| Dahl 2000 | spontaneous conversations in Swedish | NA | NA | 88.8 (animate) | 6.9 | 92.9 (ego) | 27.5 | NA | NA |
| Allen & Schröder 2003 | Inuktitut child language | 15.9 | 51.3 | 82.7 (animate) | 1.4 | 87.2 | 2.9 | 2.4 | 58.1 |
| Arnold 2003 | Mapudungun narrative texts | 14.9 | 85.1 | NA | NA | NA | NA | 2.5 | 65.4 |
| Everett 2009 | English talk shows | 13.8 | 68.9 | 87.8 | 8.4 | NA | NA | NA | NA |
| | Portuguese talk shows | 16.4 | 83.7 | 93.1 | 11.6 | NA | NA | NA | NA |
| Lin 2009 | Chinese conversations | 20 | 80 | NA | NA | NA | NA | 21.7 | 66.4 |
| | Chinese narratives | 16.1 | 92.9 | NA | NA | NA | NA | 14.7 | 76.1 |
| | Chinese written texts | 16.3 | 82.2 | NA | NA | NA | NA | 22.8 | 72.9 |
| Schiborr 2016 | English autobiographical narratives | 13.9 | 62.3 | 87.6 | 7.1 | 92 | 28.8 | NA | NA |
| Haig & Thiele 2016 | Northern Kurdish traditional narratives | 19 | 65.4 | 78.6 | 4.2 | 82.5 | 41.7 | NA | NA |
| Adibifar 2016 | Persian stimulus-based narratives | 19.9 | 63.7 | 83.7 | 4.3 | 100 | 48.1 | NA | NA |
| Mosel & Schnell 2016 | Teop traditional narratives | 22.5 | 67.2 | 74 | 9.6 | 87.6 | 50.3 | NA | NA |
| Schnell 2016 | Vera’a traditional narratives | 16.4 | 75.8 | 79.9 | 10.8 | 72.7 | 57.8 | NA | NA |
The total number of corpora is nineteen. They represent fourteen languages from seven families and different parts of the world: Indo-European (English, French, Northern Kurdish, Persian, Portuguese, Spanish, Swedish), Austronesian (Teop and Vera’a), Afro-Asiatic (Hebrew), Araucanian (Mapudungun), Eskimo-Aleut (Inuktitut), Mayan (Sakapultek) and Sino-Tibetan (Chinese). Different registers and types of discourse are included: spontaneous conversations, transcribed talk shows, narratives retelling films, autobiographical and traditional narratives, stimulus-based monologues in sociolinguistic interviews, child language use and miscellaneous written texts.
Table 8.5 shows the distribution of the above-mentioned features within the A role. In other words, the numbers show the proportion of lexical, human, 1st or 2nd person or new A’s in the total number of A’s. The data show that typical A’s are human/animate, not new and not lexical. Therefore, untypical A’s are new, non-human/inanimate and lexical. The person varies a lot. The highest proportion of 1st and 2nd-person arguments is observed in Inuktitut child language (97.4%), followed by Swedish spontaneous conversations (60.7%) and the English autobiographical narratives (59%). The lowest is observed in the Persian stimulus-based narratives (3.6%).
Do these results support any of the predictions based on the conditional probabilities of a feature given the grammatical role A? Recall that these probabilities are relevant for two marking strategies, namely, ‘Mark Impostors, Don’t Mark the Authentic’ and ‘Mark Weirdos, Don’t Mark Normals’. As for the strategy ‘Mark Impostors, Don’t Mark the Authentic’, we expected A’s to be predominantly pronominal and high-DP/animate (and 1st and 2nd person). The A arguments in the corpora are overwhelmingly non-lexical, non-new and human/animate. There is no preference for the 1st and 2nd person, but this feature does not play a very important role in the differential case marking of P. Therefore, we can say that the predictions are mostly supported by the corpus data. Since the proportions of different values of the same feature (e.g., human and non-human) are mirror images of each other, the conclusion for the second part of the strategy, ‘Don’t Mark the Authentic’ is the same.
As for the strategy ‘Mark Weirdos, Don’t Mark Normals’, we expected A’s to be infrequently nominal and frequently pronominal, and also rarely have 3rd-person reference, and frequently 1st and 2nd-person reference. Indeed, we find relatively few lexical A’s. However, the 3rd person predominates in the data. Overall, the distribution of the person values is very scattered. The 3rd person seems to be frequent in traditional narratives, and the 1st and the 2nd person in autobiographic narratives, child speech and spontaneous dialogues. We can conclude that the predictions are only partly met. In addition, non-human/inanimate and new A’s are also rare, but these features are rarely marked formally by languages, as was shown above. It has already been mentioned that languages avoid indefinite and non-specific subjects, which are usually new, using passives or other strategies.
Table 8.6 shows the proportions of the features within the P role. The most typical properties of P are 3rd-person reference and non-humanness/inanimacy. The other features display substantial variation. The proportion of lexical P’s varies from 6% in Inuktitut child language to 94% in Chinese narratives. The fraction of new arguments also fluctuates, from 23.9% in the Hebrew narratives to 72.5% in Chinese narratives.
Are our predictions borne out? As for the strategy ‘Mark Impostors, Don’t Mark the Authentic’, we expected to find predominantly nominal or lexical P’s, and 3rd-person referents, and rarely non-lexical and 1st and 2nd-person referents. Most of the P’s are indeed lexical, although there are a few exceptions. The 3rd-person P’s are predominant. Thus, we can say that this strategy is mostly supported.
The other relevant strategy, ‘Mark Weirdos, Don’t Mark Normals’, is partly supported by the data. We expected P’s to be rarely pronominal, animate/high-DP, and 1st and 2nd person, and frequently nominal, inanimate, low-DP and 3rd person. In other words, the former features should be infrequent in discourse. P’s in the corpora are indeed rarely animate/human and 1st and 2nd person, but they are neither overwhelmingly lexical, nor new. Therefore, these predictions are only partly supported.
Table 8.7 displays the distribution of the roles (only A) within each feature. The percentages stand for the proportions of A’s among all arguments (A and P taken together) in the sample with a given feature. To obtain the corresponding proportions for P, one can simply subtract these numbers from 100%. With the exception of the 3rd person, the distributions are quite compact: for each feature, the values lie mostly on the same side of 50%. This makes them good candidates for explaining cross-linguistic generalizations. Overall, if an argument (A or P) is non-lexical, human, 1st or 2nd person, and not new, it is more likely to be A; if it is lexical, non-human and new, it is more likely to be P. As for the 3rd-person arguments, they tend to be P rather than A, but there are exceptions in the data.
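The relation between the within-role percentages (Tables 8.5 and 8.6) and the within-feature percentages (Table 8.7) can be sketched in Python. The sketch rests on an assumption that is not stated in the studies themselves, namely that a sample contains equally many A’s and P’s, as would be the case if every transitive clause contributed exactly one of each. Under that assumption, ℙ (A|Feature) is approximately ℙ (Feature|A) divided by the sum of ℙ (Feature|A) and ℙ (Feature|P). The function name is hypothetical.

```python
def p_a_given_feature(pct_feature_in_a, pct_feature_in_p):
    """Approximate P(A | Feature) in %, ASSUMING equal numbers of A's and P's."""
    return 100 * pct_feature_in_a / (pct_feature_in_a + pct_feature_in_p)


# Lexical arguments in the Sakapultek Pear Story narratives:
# 6.1% of A's (Table 8.5) and 45.8% of P's (Table 8.6) are lexical.
lexical_a = p_a_given_feature(6.1, 45.8)
lexical_p = 100 - lexical_a  # the complementary proportion for P

print(round(lexical_a, 1), round(lexical_p, 1))
```

The approximation gives a value of a little under 12% for lexical A’s in Sakapultek, which is close to, though not identical with, the 11.9% reported in Table 8.7. The published figures are based on the original counts in Appendix 2, so they remain the authoritative ones.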
Do the results in Table 8.7 support the predictions of the strategy ‘Mark Weak Cues, Don’t Mark Strong Cues’? Let us first have a look at the weak cues and the predictions for the first part, ‘Mark Weak Cues’. We expected nominal (lexical) and 3rd-person arguments to be A only infrequently. This is supported by the corpus data mostly, although two out of seven data points of the 3rd person are slightly greater than 50 per cent. Also, we expected pronominal, high-DP/animate and 1st and 2nd-person arguments to be P’s only rarely. This is also what we find in the corpora: non-lexical, non-new and human/animate arguments are unlikely to be P’s, as well as the 1st and 2nd-person arguments. Therefore, the predictions are supported. The cues that are marked formally are indeed weak in terms of their frequencies. Still, there are a few features with low probability of A (i.e., non-human A and new A) that do not lead to widely attested marking in the cross-linguistic data. Therefore, the strategy overgenerates predictions.
If we focus on the second part of the strategy, ‘Don’t Mark Strong Cues’, and examine the features with particularly high median probabilities ℙ (Role|Feature), we will see that the predictions are met both for A (non-lexical and 1st & 2nd person) and for P (lexical, non-human/inanimate, new and predominantly 3rd person).
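The kind of check used in the two preceding paragraphs (counting data points on either side of 50% and taking the median) can be reproduced with a short Python sketch. It uses the seven available values of ℙ (A|3rd person) from the ‘3rd A’ column of Table 8.7; the variable names are hypothetical.

```python
from statistics import median

# P(A | 3rd person) in %, copied from the '3rd A' column of Table 8.7.
third_person_a = [27.5, 2.9, 28.8, 41.7, 48.1, 50.3, 57.8]

# Data points where a 3rd-person argument is more likely to be A than P.
above_half = [v for v in third_person_a if v > 50]

print(len(above_half), "of", len(third_person_a), "values exceed 50%")
print("median:", median(third_person_a))
```

Two of the seven values exceed 50%, and the median is 41.7%, which matches the observation that 3rd-person arguments tend to be P rather than A, with some exceptions.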
8.4.3 Data from Informal Conversations
In order to control for the register and add the proportions of definite and discourse-given arguments, I also took data from informal spoken conversations in five languages: English and Russian (Indo-European), Lao (Tai-Kadai), N||ng (Tuu) and Ruuli (Bantu). I performed the annotation for English, Lao and Russian myself. The annotation for N||ng and Ruuli was performed by Alena Witzlack-Makarevich (the database is available upon request). More details about the data and method can be found in Levshina (2021a).
The A and P arguments in transitive clauses were coded for the following variables:
lexicality: lexical (common and proper nouns, adjectival or other nominalizations) or non-lexical (diverse pronouns and implicit arguments);
person: 1st, 2nd or 3rd;
semantic class: animate (human, animal, kinship term, organization) or inanimate (physical object, abstract entity, event);
definiteness (identifiability): definite, indefinite (specific or non-specific);
givenness (discourse accessibility): given (mentioned previously or inferable from context) or new.
The probabilities of features given A and P are displayed in Figures 8.1 and 8.2, respectively. The proportions that represent complementary features (e.g., Non-lexical vs. Lexical, Animate vs. Inanimate) are a mirror image of each other. Figure 8.1 shows that A’s are nearly exclusively non-lexical, animate, definite and discourse-given. As for the person, we do not see consistent results. In Russian, English and N||ng, the A’s are predominantly 1st and 2nd person, but this is not the case in Lao, which has more 3rd person A’s than 1st/2nd-person ones. This may have to do with the fact that the largest text in the Lao corpus can be characterized as gossip about other people. Also, in Ruuli, the probabilities of 1st/2nd-person and 3rd-person A’s are nearly equal. This type of probability is highly sensitive to the topic of conversations. When gossiping, for example, people use more 3rd-person subjects than 1st or 2nd-person subjects. These descriptive observations are supported by mixed-effects Poisson regression models (see Levshina 2021a).
Figure 8.1 Probabilities of different features of A based on corpora of spontaneous conversations in five languages

Figure 8.2 Probabilities of different features of P based on corpora of spontaneous conversations in five languages
Figure 8.2 shows the probabilities of the features given P. Many P’s are non-lexical. The 1st and 2nd-person P’s are very unlikely, while 3rd-person P’s are extremely likely. P also tends to be inanimate. It is also more frequently definite and given than indefinite and new. Again, all these effects are statistically significant.
How do these results relate to the predictions? As for the strategy ‘Mark Impostors, Don’t Mark the Authentic’, the prediction was that A’s will be predominantly pronominal, animate and high-DP, which means in this context given and definite. This is what we observe. The weaker prediction that A’s are usually 1st and 2nd person is not supported: there is a lot of variation across languages. As for P’s, we expected to find predominantly nominal (lexical) arguments and 3rd-person referents. The first prediction is not borne out, while the second one is very clearly supported.
As for the strategy ‘Mark Weirdos, Don’t Mark Normals’, we predicted that A will be frequently pronominal and infrequently nominal. This prediction is supported. Lexical A’s are very rare. They should also usually have 1st or 2nd-person reference and rarely 3rd-person reference. This prediction is confirmed only partly in some languages. If we look at P arguments, we expected them to be frequently nominal, inanimate/low-DP and 3rd person and rarely pronominal, animate/high-DP, and 1st and 2nd person. These predictions hold only for animacy and person. Nominality displays a lot of variation, whereas definiteness and givenness have the opposite results from what was expected: definite and given P’s are in fact predominant.
Now let us consider the third strategy, ‘Mark the Weak Cues, Don’t Mark Strong Cues’. This is connected with the probabilities of roles given features. The plot displaying the probabilities of A given the ten features is shown in Figure 8.3. The plot with the same probabilities for P is a mirror image of this one, and is not displayed.

Figure 8.3 Probabilities of the role A given different features based on corpora of spontaneous conversations in five languages
We expected nominal (lexical), and 3rd-person arguments to be A only infrequently, because this would support the sub-strategy ‘Mark Weak Cues’. This is supported by the corpus data. Also, the prediction was to find fewer pronominal, high-DP/animate and 1st and 2nd-person P’s than A’s. These expectations are supported by the data. The arguments that are case-marked are indeed weak cues. As for ‘Don’t Mark Strong Cues’, the predictions are met both for A, which has a high probability given non-lexical and 1st & 2nd-person referents, and for P, which has a high probability given lexical, 3rd-person, inanimate, indefinite and new referents.
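The difference between the two kinds of probability used in this section, ℙ(Feature|Role) in Figures 8.1 and 8.2 and ℙ(Role|Feature) in Figure 8.3, can be illustrated with a minimal Python sketch. The counts below are invented purely for illustration and do not come from the corpora; the function names are likewise hypothetical.

```python
# Toy counts of transitive arguments cross-classified by role and animacy.
# These numbers are invented for illustration; they are NOT corpus data.
counts = {
    ("A", "animate"): 90,
    ("A", "inanimate"): 10,
    ("P", "animate"): 20,
    ("P", "inanimate"): 80,
}


def p_feature_given_role(feature, role):
    """P(Feature | Role): share of arguments in a role that have the feature."""
    role_total = sum(n for (r, _), n in counts.items() if r == role)
    return counts[(role, feature)] / role_total


def p_role_given_feature(role, feature):
    """P(Role | Feature): share of arguments with a feature that have the role."""
    feature_total = sum(n for (_, f), n in counts.items() if f == feature)
    return counts[(role, feature)] / feature_total


print(p_feature_given_role("animate", "A"))    # how typical animacy is of A
print(p_role_given_feature("A", "animate"))    # how strong a cue animacy is for A
print(p_role_given_feature("P", "inanimate"))  # how strong a cue inanimacy is for P
```

The first quantity asks how typical a feature is of a role, which is what the first two strategies rely on. The second asks how reliably a feature points to a role, which is what ‘Mark the Weak Cues, Don’t Mark Strong Cues’ relies on. With only two roles, ℙ(A|Feature) and ℙ(P|Feature) sum to one, which is why the plot for P mirrors the plot for A.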
8.4.4 Summary of Findings
In this section I have tested reverse-engineered predictions based on the cross-linguistic patterns in differential case marking. These predictions were formulated for three marking strategies, which are based on the available explanations of the typological data involving disambiguation pressure and iconicity of markedness. We expected to find relative frequencies of different A and P that would account for the typological data. A summary of the results for the three strategies is shown in Table 8.8. If one cell contains two words, e.g., ‘Mostly/Partly’, the former refers to the data recycled from the previous studies (Section 8.4.2), and the latter refers to the new data from spontaneous conversations (Section 8.4.3).
Table 8.8. Reverse-engineered predictions and the data
| Marking strategy | Prediction for A distribution | Prediction for P distribution |
|---|---|---|
| Mark Impostors | Mostly | Mostly/Partly |
| Don’t Mark the Authentic | Mostly | Mostly/Partly |
| Mark Weak Cues | Mostly/Yes | Yes |
| Don’t Mark Strong Cues | Yes | Yes |
| Mark Weirdos | Partly | Partly |
| Don’t Mark Normals | Partly | Partly |
The weakest support is found for the strategy based on iconicity of markedness, ‘Mark Weirdos, Don’t Mark Normals’. Its predictions are supported only partly. The first strategy based on ideas about the distinguishing function of differential marking, ‘Mark Impostors, Don’t Mark the Authentic’, does better. Most of its predictions are supported. The exceptions concern the person for A’s, which displays more variation than predicted, and nominality (lexicality) for P’s. Especially in the conversations, there are fewer nominal objects than expected. The strategy ‘Mark Weak Cues, Don’t Mark Strong Cues’ provides the best account of the observed cross-linguistic tendencies, especially the second part ‘Don’t Mark Strong Cues’. This strategy is also related to the distinguishing function of marking. From the efficiency perspective, this strategy has a straightforward interpretation in terms of the principle of negative correlation between accessibility and costs. For example, in the case of differential object marking where animate objects are marked and inanimate ones are unmarked, this principle guides the addressee who is processing an inanimate argument that the most probable interpretation (i.e., the P role) should be taken. This reasoning is similar to Haspelmath’s hypothesis about differential object marking and animacy:
Since more inanimate nominals have P-function than animate nominals, hearers are less surprised when they encounter an inanimate P-argument and therefore have less need for special coding.
Importantly, the conditional probabilities of A and P given specific features of the arguments, i.e., ℙ (Role|Feature), which are associated with the winning strategy, are distributed quite uniformly across different registers and languages. It is logical to assume that cross-linguistic universals should correspond to similar distributions across and within languages. Therefore, the homogeneous distributions of ℙ (Role|Feature) are good candidates for explaining the universal scale effects.
Some of the strategies overgenerate predictions. This means that there are discourse tendencies that do not have correspondences in the cross-linguistic distribution of differential case marking. This is relevant for subjects in particular. We still need to account for this fact.
These results support the more general principle formulated by Haspelmath (Reference Haspelmath2021b: 125): ‘Deviations from usual associations of role rank and referential prominence tend to be coded by longer grammatical forms if the coding is asymmetric.’ Note that by role rank it is meant that the A argument is ranked higher than the P argument, whereas referential prominence is associated with the values on the left in the animacy, definiteness and other scales in (3). Based on the findings presented above, we are able to formulate this principle in a more precise manner. Namely, Haspelmath’s ‘deviations from usual associations’ correspond to low conditional probabilities of roles given referential features. Moreover, we have identified the functional pressure responsible for this. It has to do with weak and strong cues that are supposed to help the addressee infer who did what to whom.
Note that we formulated the predictions for A and P without considering the properties of the other argument. In other words, we did not consider different scenarios, e.g., a ‘downstream scenario’, where A is more prominent than P, an ‘upstream scenario’, where P is more prominent than A, and a ‘balanced scenario’, where A and P have equal prominence (Haspelmath Reference Haspelmath2021b). The reason for this is that the synchronic evidence for co-argument sensitivity of the Malayalam type, or global marking, is rare in comparison with local differential marking of the Spanish or Turkish type, which depends (at least, in the majority of cases) on the properties of the argument that is marked or unmarked.
8.5 Development of Differential Case Marking
Differential case marking emerges as a result of differential reduction and differential enhancement of case forms. The former is described less often than the latter. However, there are a few examples. In particular, Harari, a Semitic language, case-marks only definite objects, while in Old Harari the Accusative suffix also marked indefinite nouns (Tosco Reference Tosco and Brugnatelli1994).
As for differential enhancement, Old Russian represents a particularly interesting case (Seržant Reference Seržant, Schmidtke-Bode, Levshina, Michaelis and Seržant2019). The emergent differential object marking comes from Proto-Slavic. Due to the overall loss of all word-final consonants, in most Proto-Slavic declensions, the singular Accusative and Nominative markers were phonetically indistinguishable. Over time, these turned into zero markers. The Genitive case marker then replaced the old zero Accusative on some pronouns and on animate nouns in some declension classes. The historical data suggest that the emergence of differential object marking can be explained by the fact that subject and object forms had become indistinguishable. We know from the corpus data that pronouns and animate nouns are less likely to be objects than inanimate nouns. Therefore, this development can be regarded as a case of efficient formal enhancement.
When a marker emerges, it often expands from one context to new types of referents. In Spanish, the marker a was first used on strong personal pronouns referring to humans. Now it is used with animate (or at least human) definite and specific objects (at least in Standard Spanish), whose interpretation as direct objects is also not very accessible. As for indefinite animate objects, there is variation. Inanimate objects are usually unmarked (von Heusinger and Kaiser Reference von Heusinger, Kaiser, Kaiser and Leonetti2007). They represent the last bastion of zero marking, although instances of marked inanimate objects were attested even in Old Spanish (García García Reference García García, Seržant and Witzlack-Makarevich2018), which means that there has been sufficient time to develop marking for them, too. As shown in Section 8.4, inanimate and indefinite arguments have higher probability of being objects than subjects, so the lack of case marking is efficient.
A similar development happened in Persian. The object marker râ first appeared on the 1st and 2nd-person pronouns, then spread to definite animate objects, then definite objects, but was still optional. In Modern Persian, it became obligatory on all definite objects and topical indefinite objects (Dalrymple and Nikolaeva Reference Dalrymple and Nikolaeva2011: 203). Focal indefinite objects thus remain the last bastion of zero marking.
Similarly, in Hindi the use of object marking started from the pronouns representing speech act participants. In the fourteenth century, the marking of those pronouns was obligatory. In other situations, it was optional. Later, the frequency of marking increased in human nouns. In modern Hindi, they are always marked. If we take only small clauses (complements of main verbs), the marking was first optional on human and inanimate nouns. In modern Hindi, it is obligatory with human nouns and very frequent with inanimate nouns (Montaut Reference Montaut, Seržant and Witzlack-Makarevich2018).
These examples demonstrate that marking expands depending on the accessibility of the interpretation of an argument as A or P. Notably, languages choose different cut-off points and levels of optionality. How to explain this fact is an intriguing question.
Not all researchers agree that differential marking is actually explained by communicative pressures, however. In particular, Cristofaro (Reference Cristofaro, Schmidtke-Bode, Levshina, Michaelis and Seržant2019) argues that the distributional restrictions on the use of an additional subject or object marker in marking splits (e.g., only with animate objects or nominal subjects) originate from the reinterpretation of an element of a pre-existing construction with similar distributional restrictions. For instance, some object markers come from topical markers. Since topics are usually definite, animate and pronominal (e.g., As for me, …), the marking has spread to objects with all or some of these features, as in Kanuri:
| Kanuri: Nilo-Saharan (Cyffer 1998: 52, cited from Cristofaro Reference Cristofaro, Schmidtke-Bode, Levshina, Michaelis and Seržant2019: 28) | ||||
| a. | Músa | shí-ga | cúro. | |
| Musa | 3sg-acc | saw | ||
| ‘Musa saw him.’ | ||||
| b. | wú-ga | |||
| 1sg-as.for | ||||
| ‘as for me’ | ||||
Topicality also plays a prominent role in Iemmolo’s (Reference Iemmolo2010) account. He observes that dislocated topical objects are case-marked in many Romance varieties. See also Dalrymple and Nikolaeva (Reference Dalrymple and Nikolaeva2011).
The idea that differential object marking originates from topic markers does not exclude the efficiency explanation, in fact. Subjects tend to be the topic par excellence, while objects can be both topics and foci. In particular, Maslova (Reference Maslova2003) reports that transitive sentences in Kolyma Yukaghir (a Yukaghir language spoken in the Russian Far East) have mostly topical subjects and topical objects (65 per cent), while topical subjects and focal objects are responsible only for approximately 35 per cent, and less than 1 per cent of all clauses have focal subjects and topical objects. These numbers suggest that the probability that a focal nominal is an object is much higher than the probability that it is a subject. Focalness therefore serves as a perfect cue for objecthood, while topical objects may need some extra help to be distinguished from subjects, the primary topics. This can be interpreted as a manifestation of efficiency.
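The step from these clause percentages to the probability of objecthood can be made explicit with a small Python sketch. It uses only the shares reported above and rests on two assumptions that are not in the source: ‘less than 1 per cent’ is entered as 1.0 (an upper bound), and the unreported combination of a focal subject with a focal object is set to 0. The function name is hypothetical.

```python
# Shares of transitive clauses in Kolyma Yukaghir as reported in the text
# (Maslova 2003), keyed by (subject status, object status).
clause_share = {
    ("topical", "topical"): 65.0,
    ("topical", "focal"): 35.0,   # "approximately 35 per cent"
    ("focal", "topical"): 1.0,    # assumption: upper bound for "< 1 per cent"
    ("focal", "focal"): 0.0,      # assumption: not reported in the text
}


def p_object_given(status):
    """P(object | a nominal has the given information status)."""
    as_object = sum(v for (_, o), v in clause_share.items() if o == status)
    as_subject = sum(v for (s, _), v in clause_share.items() if s == status)
    return as_object / (as_object + as_subject)


print(round(p_object_given("focal"), 2))
print(round(p_object_given("topical"), 2))
```

Under these assumptions a focal nominal is an object with a probability of roughly 0.97, whereas a topical nominal is an object with a probability of only about 0.40. This is the asymmetry that makes focalness a strong cue and topicality a weak one.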
One can also say that the marking of topical objects is motivated by the need to attract the addressee’s attention to a new topic (Diessel Reference Diessel2019: Section 11.6). The tendency to mark topical objects represents ‘a hearer-oriented strategy to mark an atypical P argument that deviates from listener’s linguistic expectations’ (Diessel Reference Diessel2019: 244), especially in an unusual position. Therefore, there are good reasons to believe that topical objects are marked due to the principle of negative correlation between accessibility and costs.
While object markers can develop from topical markers, ergative markers can originate from focalizing constructions involving appositional pronominals and demonstratives, as in ‘The farmer he/this killed the duckling’ (McGregor Reference McGregor, Josephson and Söhrman2008). These examples can also be explained by efficiency: new and focal agents are very untypical and therefore need extra marking.
In addition, markers participating in differential marking can develop from other case markers. Ergatives often emerge from ablatives, instrumentals, locatives and genitives. Object markers can also develop from various oblique markers, such as the dative marker a in Spanish, the genitive case marker ‑a in Russian, the comitative marker saṇī (haṇī) ‘with’ in Garhwali and Kumaoni, Indo-Aryan languages spoken in India (Montaut Reference Montaut, Seržant and Witzlack-Makarevich2018: 308). Another source is different verbs, e.g., bă ‘hold, take’ in Chinese (Yang Reference Yang1995: 165), the verbal root lag ‘touch, be stuck to’ > lā in Marathi (Montaut Reference Montaut, Seržant and Witzlack-Makarevich2018: 307) or a copula or presentative verb (-)(ʔ)à in the Khoe languages (McGregor Reference McGregor, Seržant and Witzlack-Makarevich2018).
I believe that reanalysis of other source constructions and markers does not constitute sufficient evidence against the efficiency account. Language users create new constructions from semantically compatible material at hand. What is important is that all these different scenarios lead to similar outcomes: a highly grammaticalized marker for subjects or objects, used in a similar fashion in diverse languages of the world. It is unlikely that such convergence can be explained by the lasting impact of the source constructions alone.
8.6 Experimental Evidence from Artificial Languages
This section discusses recent experimental work on differential argument marking. The experimental study of optional marking in Japanese by Kurumada and Jaeger (Reference Kurumada and Jaeger2015) was mentioned in Section 8.3.1. Other experiments involve an artificial miniature language. In a pioneering study, Fedzechkina, Jaeger and Newport (Reference Fedzechkina, Jaeger and Newport2012) used a language with optional object case marking that was not conditioned on the typicality of objects. The learners watched videos of transitive events and had to repeat the sentences in the artificial language. The subjects were always human, while 50 per cent of the objects were human and 50 per cent were inanimate. The participants also had a comprehension task and a production task in which they had to describe a novel transitive event. This procedure was repeated on several days. The experiment showed that the learners used significantly more case markers on atypical (animate) objects than on typical (inanimate) objects.
In the second experiment, the objects were always inanimate, while 50 per cent of the subjects were animate and 50 per cent were inanimate. In the production task, there was an increase in the use of marking with time, but there was no consistent direct effect of animacy on its own.
In a follow-up study, Fedzechkina, Newport and Jaeger (Reference Fedzechkina, Newport and Jaeger2016) used a language with either flexible or fixed word order, and optional marking. They found that learners dropped the case marker more often in the fixed-order language, while retaining it in the flexible-order language. Therefore, their behaviour mirrored the negative correlation between case marking and rigid word order observed across languages (see Section 6.3). There were also indications that untypical OSV order, which was used less often in the training phase, resulted in more case marking than the more typical SOV order.
A more recent study by Smith and Culbertson (Reference Smith and Culbertson2020) could not replicate the results in Fedzechkina et al. (Reference Fedzechkina, Jaeger and Newport2012). Importantly, the latter only involved language learning and no communication. However, when Smith and Culbertson (Reference Smith and Culbertson2020) carried out an interactive experiment that involved a director-matching task between a participant and Smeeble, a monster presented to the participants as the tutor in the artificial language, the sentences produced by the participants exhibited the expected differential marking. Thus, language users make their languages efficient due to communicative pressures. The results found in learning-only experiments (see also Section 7.5) can be explained by a spillover from communication. Similar to D. Slobin’s ‘thinking for speaking’, participants engage in ‘learning for communicating’.
I have tested the distinguishing account in an online communication game, where two participants first learned an ‘alien’ language with optional case marking. The stimuli introducing the language showed two different aliens. Sometimes one of them was the agent, and the other the patient, and sometimes it was the other way round. Half of the objects in the alien language were case-marked, and half were unmarked. The dominant word order (75%) was SOV, but OSV was also used (25%).
In the communication game, one participant had to describe one of two pictures in the alien language so that the other participant could guess which picture was meant. They did it in turns. The goal was to earn as many points (correct matches) as possible. The pictures were of three types:
‘Different Roles’: Both pictures showed the same actions but had different Agent and Patient. I expected the participants to use the object marker the most frequently because there were no other cues to infer this information from.
‘Different Actions’: The two pictures showed different actions with the same Agent and Patient. In that case, I expected the participants to use the object marker the least frequently because it was clear from the visual context who did what to whom.
‘All Different’: The pictures showed different actions with different Agent and Patient. I expected marking to be less frequent than in the first condition but more frequent than in the second condition because one could use the verb as a reliable cue for helping to choose between the pictures.
The proportions of marked object forms produced by twenty-eight pairs of participants are shown in Figure 8.4. On average, the chances of marking are low. Some of the pairs do not use marking in any situation. The participants probably rely on word order. Yet, the condition ‘Different Roles’ has on average the highest proportions of marked objects, as predicted. We do not find that ‘Different Actions’ has the lowest probability. Its mean and median values are in fact slightly higher than those of ‘All Different’, but the difference is not statistically significant. These results show that the availability of cues helping to choose between the referents can be responsible for differential marking, as predicted by the efficiency account developed here. If the information is present visually and fully accessible, there is no need to use case marking.

Figure 8.4 Proportions of marked object forms produced by different pairs of participants in the online communication game. The black diamonds show the mean proportions. The black lines in the boxes are the median values.
Finally, I should also mention here the study by Tal et al. (Reference Tal, Smith, Culbertson, Grossman and Arnon2022), who tested the hypothesis that given objects would be more frequently case-marked than new ones. This is related to the popular view that emergence of differential marking is driven by topicality (see Section 8.5). This prediction was not borne out. At the same time, it is possible that the relationship may be indirect: they do find that object marking is more frequent in OSV sentences, which are produced more frequently when the object is topicalized. As argued above, this is a manifestation of efficient behaviour.
8.7 Conclusions
This chapter discussed diverse typological, corpus-based and experimental evidence related to differential argument marking of subject and object. The different types of evidence converge, supporting the efficiency-based interpretation of differential marking based on the principle of negative correlation between accessibility and costs. This chapter provides a novel contribution to this well-explored topic because the intuitions about the causes that lead to the emergence of cross-linguistic patterns are captured in the form of measurable probabilities that can be found in corpus data. Of particular importance here are the conditional probabilities of the role given the referential features of arguments that determine the accessibility of the intended role interpretation. Also, this chapter demonstrates that we can disentangle some competing explanations of differential marking using experiments with artificial language learning and communication.
9.1 Construction–Filler Predictability and Efficiency
Chapter 2 discussed examples of efficient language use where a function word can be used or omitted depending on the accessibility of the intended semantic and syntactic interpretation. This chapter provides some additional illustrations of constructional variation of this kind. I will discuss the following alternations:
stative verb + (at) home, as in More and more young fathers stay (at) home;
help + (to) Infinitive, as in I helped him (to) install the software;
go (and) Verb, as in Go (and) bring me a beer.
The main focus will be on the relationships between the constructions and the lexemes that fill in their slots. For example, give, show and send are frequent slot fillers in the Verb slot of the double-object dative construction, e.g., She gave/showed/sent him a letter. The construction help + (to) Infinitive has slot fillers in the Infinitive slot, such as install, make, understand, get. There is overwhelming evidence that language users learn and store information about the probabilistic relationships between constructions and their slot fillers (e.g., Goldberg, Casenhiser and Sethuraman Reference Goldberg, Casenhiser and Sethuraman2004, Reference Goldberg, Casenhiser and Sethuraman2005; Gries, Hampe and Schönefeld Reference Gries and Stefanowitsch2005; Ellis and Ferreira-Junior Reference Ellis and Ferreira-Junior2009; Taylor Reference Taylor2012). This is why it is logical to assume that the associations between a lexical slot filler and the construction influence the probability of using optional function words.
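Based on the principle of negative correlation between accessibility and costs, we can formulate the following hypothesis, the Hypothesis of Construction–Lexeme Accessibility and Formal Length (1): the higher the accessibility of a construction and its lexeme, the more likely the shorter form without the function word; the lower the accessibility, the more likely the longer form with the function word.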
In order to measure the accessibility of a construction and its lexeme, we can compute two measures. The first measure is the probability of the lexeme given the construction. More exactly, I will use informativity, which represents a log-transformed inverse of the corresponding conditional probability, as shown in (2):
$$I(L \mid C) = -\log_2 P(L \mid C) = -\log_2 \frac{P(L, C)}{P(C)} \tag{2}$$
where P(L, C) stands for the probability of the filler in the slot, and P(C) represents the probability of the construction. In practice, P(L, C)/P(C) is measured as the token frequency of the lexeme in the construction divided by the token frequency of the construction in a corpus. Because of the negative log-transformation, high probabilities correspond to low informativity, whereas low probabilities correspond to high informativity. The logarithm base 2 is commonly used in information theory to measure information in bits. This measure reflects how unexpected the lexeme is if we encounter the construction.
The second probabilistic measure is informativity of the construction given a particular lexeme. It is shown in (3):
$$I(C \mid L) = -\log_2 P(C \mid L) = -\log_2 \frac{P(L, C)}{P(L)} \tag{3}$$
where P(L, C)/P(L), again, can be obtained from the token frequency of the lexeme in the construction, divided by the token frequency of the lexeme in a corpus. This measure reflects how unlikely the lexeme is to be found in this construction.
These two measures have analogues in usage-based Construction Grammar, which are known as Attraction, i.e., the conditional probability of a lexeme given a construction, and Reliance, i.e., the conditional probability of a construction given a lexeme (Schmid Reference Schmid2000), or Faith (Gries et al. Reference Gries and Stefanowitsch2005). Due to the negative log-transformation, the higher informativity, the smaller Attraction or Reliance (Faith). Although many corpus linguists find it useful to compute one bidirectional measure that represents the association between a construction and one of its collexemes (e.g., Stefanowitsch and Gries Reference Stefanowitsch and Stefan2003 and later work), Schmid has argued that Attraction and Reliance represent two different types of information, each valuable on its own (e.g., Schmid and Küchenhoff Reference Schmid and Küchenhoff2013). This approach is pursued here, as well.
Accessibility is then associated with high predictability and low informativity of a lexeme given the construction, or the other way round. A series of quantitative analyses presented below provide statistical evidence that accessibility allows us to predict the use or omission of function words (the particle to, preposition at and conjunction and), such that higher accessibility is associated with the shorter forms without the function words, and lower accessibility is associated with the longer variants with the function words, in accordance with the Hypothesis of Construction–Lexeme Accessibility and Formal Length in (1). Section 9.2 discusses the use of (at) home. Section 9.3 focuses on the construction help + (to) Infinitive, followed by the discussion of go (and) Verb in Section 9.4.
9.2 Stay (at) Home, Save Lives!
The present section discusses the use of locative adverbials home and at home in US English. When the meaning is directional, e.g., go/return/bring (someone) home or a long way home, no preposition is used. The forms home and at home can only be interchangeable when the meaning is locative, as in (4):
| a. | Dads who stay at home (COCA, Magazines) |
| b. | Stories abound of men staying home to look after newborns (COCA, Magazines) |
One of the few mentions of this alternation can be found in Huddleston and Pullum (Reference Huddleston and Pullum2002: 683). They claim that home marks location only as a subject-oriented complement, as in Are you home? We stayed home, but not in other contexts, e.g., *I kept my computer home or *Home, the children were playing cricket.
At the same time, the use of these expressions attracts language learners’ attention, judging from numerous discussions on internet fora. Moreover, during the COVID-19 pandemic, many slogans have advised people to stay home or at home, e.g., Keep calm and stay at home or Stay Home, Protect the NHS [UK National Health Service], Save Lives. I will focus on US English, where this variation seems to be more common, as one can conclude from language users’ intuitions and experts’ comments.
As shown in Levshina (Reference Levshina2018a), the use of at depends on many diverse factors. The data from the Corpus of Contemporary American English suggest that the bare form is strongly preferred when (at) home is preceded by back:
It’s good to be back home again.
The short form is also strongly preferred when someone has returned home:
| a. | Darling, I’m home! |
| b. | I was home from college for the summer, and I said I’d do it. (NPR Weekend) |
In contrast, the long form at home is nearly always preferred in a figurative sense, when the construction expresses one’s feeling of being comfortable and at ease in a particular situation:
And he’s probably more comfortable and at home with his stage makeup every day. (Ind Geraldo)
Another case is a semantic generalization, when (at) home is used to refer to the city or country where one lives. Here, the longer form is preferred, as well:
But for all his achievements on the international scene, the problems he faces at home seem insurmountable. (ABC Nightline)
This can be seen as a manifestation of the principle of negative correlation between accessibility and costs because the semantic extensions are less accessible than the literal meaning.
The long form is also much more likely with transitive verbs (I build furniture at home), as sentence adjuncts (At home, I drink only tea), attributes (Their stores at home are even emptier than here), in existential constructions (There is too much stress at home), and as part of an elliptic structure (Finally, at home!).
In all these cases, there are quite strong preferences for one or the other form. The only type of context that exhibits substantial variation is when (at) home is an adjunct of an intransitive predicate (e.g., I’m home), the meaning is literal, there is no semantics of arrival and the adjunct is not preceded by back. The variants home and at home are almost equally distributed in these contexts.
In order to test the Hypothesis of Construction–Lexeme Accessibility and Formal Length formulated in (1), I extracted all occurrences of the alternation from the spoken component of COCA, focusing only on these contexts. Set expressions (e.g., charity begins at home, be home free, romp home) were excluded. After manual cleaning, there were 4,032 occurrences with 71 different verbs. The bare variant was used 2,623 times (65%), while the prepositional form occurred 1,409 times (35%). The frequencies of each verb with (at) home and the total frequencies of the verb in the spoken subcorpus were obtained. Based on that information, the informativity scores were computed for each verb, using the approach described in Section 9.1.
The data are distributed in a rather straightforward way. Most verbs are followed by at home only. Only seven verbs are followed by both the prepositional and the bare variants in the sample: be, stay, belong, sit, remain, wait and live. Six of those verbs (all except belong, which occurs only once with home and once with at home) are among the twelve verbs that occur most frequently with (at) home. The top twelve verbs are displayed in Table 9.1. In particular, the verbs be, stay and sit have the highest proportions of the bare variant. Notably, they are also the three most frequent verbs in the construction, which means that they have the lowest informativity of the lexeme given the construction. Moreover, the verbs stay and sit also have very low informativity of the construction given the lexeme in comparison with the other verbs. All this supports the Hypothesis of Construction–Lexeme Accessibility and Formal Length.
Table 9.1. Frequency and informativity of the top twelve verbs most frequently used with locative (at) home
| Verb | Bare variant | Prepositional variant | Total frequency with (at) home | InfoLex | InfoCxn |
|---|---|---|---|---|---|
| be | 1796 (78.1%) | 505 (21.9%) | 2301 | 0.81 | 11.06 |
| stay | 760 (76.9%) | 229 (23.1%) | 989 | 2.03 | 4.83 |
| sit | 62 (24%) | 196 (76%) | 258 | 3.97 | 6.67 |
| live | 2 (1.5%) | 135 (98.5%) | 137 | 4.88 | 8.38 |
| work | 0 (0%) | 66 (100%) | 66 | 5.93 | 10.28 |
| watch | 0 (0%) | 54 (100%) | 54 | 6.22 | 9.22 |
| die | 0 (0%) | 21 (100%) | 21 | 7.58 | 10.15 |
| start | 0 (0%) | 17 (100%) | 17 | 7.89 | 11.79 |
| wait | 1 (5.9%) | 16 (94.1%) | 17 | 7.89 | 10.57 |
| play | 0 (0%) | 15 (100%) | 15 | 8.07 | 11.35 |
| happen | 0 (0%) | 14 (100%) | 14 | 8.17 | 12.64 |
| remain | 1 (7.1%) | 13 (92.9%) | 14 | 8.17 | 9.57 |
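The InfoLex column of Table 9.1 can be recomputed from the counts alone. The following is a minimal sketch, not the procedure used in the study: it assumes that InfoLex is the negative base-2 logarithm of a verb's share of all 4,032 tokens of (at) home, which reproduces the printed values. The names `counts`, `info_lex` and `rows` are illustrative, and only the first five verbs are included. InfoCxn is left out because it requires the verbs' total frequencies in the spoken subcorpus, which are not listed here.

```python
import math

TOTAL_TOKENS = 4032  # all (at) home tokens in the sample, as reported in the text

# (bare count, prepositional count) per verb, copied from Table 9.1
counts = {
    "be": (1796, 505),
    "stay": (760, 229),
    "sit": (62, 196),
    "live": (2, 135),
    "work": (0, 66),
}

def info_lex(verb_total, construction_total=TOTAL_TOKENS):
    """Assumed definition: informativity of the verb given the construction, in bits."""
    return -math.log2(verb_total / construction_total)

# verb -> (InfoLex rounded to two decimals, share of the bare variant)
rows = {}
for verb, (bare, prep) in counts.items():
    total = bare + prep
    rows[verb] = (round(info_lex(total), 2), bare / total)

for verb, (bits, bare_share) in rows.items():
    print(f"{verb:5s} InfoLex={bits:.2f}  bare share={bare_share:.1%}")
```

Sorting the verbs by the first value puts be and stay at the low-informativity end, which is also where the bare variant is most common.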
To summarize, we observe here accessibility effects. If a verb is highly expected to occur before (at) home, the bare form is preferred or permitted. With most verbs, the full form is the norm, although we observe occasional uses of the bare form. The effect of a verb as a strong or weak cue to the following (at) home is less clear, because the verb be, which occurs with the bare variant the most frequently, is multifunctional and does not serve as a strong cue for (at) home. The corpus annotation, however, does not allow me to distinguish between the auxiliary and lexical uses of be, which may have different representations in the mental lexicon, so which direction of predictability is more important remains an open question. We could also interpret some of the other features, such as figurative semantics and syntactic function different from the intransitive use, as indicators of lower accessibility of the construction. It is not surprising, then, that they are usually associated with the full form.
9.3 Efficient Use of Help (to) Infinitive
This section discusses the English construction with help followed by the infinitive with or without to, as in the examples below:
| a. | If this book does not help you to survive the Zombie Apocalypse, a full refund may be obtained from the author.Footnote 3 |
| b. | Just to be on the safe side you might want to start doing these 8 exercises that will help you survive the zombie apocalypse.Footnote 4 |
The construction help + (to) Infinitive is a rare case where the choice between the bare and the to-infinitive is possible in Present-Day English. This choice depends on many factors. For instance, it has been argued that the variant with the bare infinitive designates a more active involvement of the Helper in carrying out the event expressed by the infinitival complement (Dixon 1991: 199). Consider the following examples:
| (Dixon 1991: 199) | |
| a. | John helped Mary eat the pudding (he ate half). |
| b. | John helped Mary to eat the pudding (by guiding the spoon to her mouth, since she was still an invalid). |
When to is omitted, as in (10a), the sentence is likely to describe a cooperative effort where Mary and John ate the pudding together; when to is included, as in (10b), the sentence means that John acted as a facilitator for Mary, who actually ate the pudding herself (Dixon 1991: 199, 230). Similarly, Duffley (1992: Section 2.3) suggests that the use of the to-infinitive evokes help as a condition that enables the Helpee to bring about the event denoted by the infinitive. It has also been argued that animate Helpers have a potentially greater involvement in the event (Lind 1983). Indeed, Lohmann (2011) finds that animate Helpers have higher odds of the bare infinitive than inanimate Helpers, although the effect is not very strong. These tendencies might be explained by the higher accessibility of the cooperative interpretation, although more evidence is needed to defend this claim. Also, many researchers have questioned the relevance of this semantic distinction. For example, Huddleston and Pullum (2002: 1244) argue that there are numerous contexts and examples where this distinction cannot be traced. Similar claims were made by McEnery and Xiao (2005).
Another relevant factor is the principle of avoidance of identity, or horror aequi. Horror aequi is a widespread tendency to avoid repetition of identical elements (Rohdenburg 2003), which was discussed in Section 4.4. Avoidance of identity helps to avoid similarity-based interference. When the verb help is itself preceded by to, the following infinitive is usually without to (Biber et al. 1999: 737), as in the following example:
Sorry, but how is this supposed to help answer the question? (GloWbE, GB)Footnote 5
The next factor has to do with the principle of reduction of cognitive complexity (Rohdenburg 1996), which was discussed in Section 2.4.2. Cognitive complexity here depends on the number of words between help and the infinitive. Consider an example of a complex environment below, where the distance between help and the infinitive is six words.
…it’s a way for me to make a contribution, to help the country in a small way to get back on its feet. (GloWbE, GB)
The longer the distance, the more likely it is that the infinitive will be marked by the particle to. This effect has already been explained from the efficiency perspective in Section 2.4.2. The greater the distance between the matrix verb and the infinitive, the less accessible the mental representation of the matrix verb. Low accessibility leads to more formal coding. Moreover, there is an interaction between distance and horror aequi: the more words there are between help and the infinitive, the weaker the influence of horror aequi (Lohmann 2011; Levshina 2018b).
It has also been shown that the inflectional forms of the verb help have individual preferences for the bare or to-infinitive. In particular, Lohmann (2011) observes that the form helping tends to be more frequently used with the to-infinitive in British English than the other inflectional forms of help (see also Levshina 2018b). According to Rohdenburg (2009: 317), the effect of helping has an analogy with daring and needing, which differ from all other forms of dare and need by being virtually always associated with marked infinitives. In addition to that, there is a weakly significant preference of the 3rd-person singular form helps for the to-infinitive in comparison with the base form (Lohmann 2011). As we will see, helping is the least frequent inflected form that occurs in this construction, so the principle of negative correlation between accessibility and costs may play a role here, as well.
The presence or absence of the Helpee is another relevant factor. Biber et al. (1999: 735) show that the bare infinitive is particularly dominant in the pattern help + NP + infinitive clause. This observation is also supported by Lohmann (2011). Similarly, it matters whether the form of help is passive or active: according to McEnery and Xiao (2005), the passive form should always take a to-infinitive. One could again think about the low accessibility of such constructions as a reason for preferring the longer form. As for the complement, one can find examples of both the bare and to-forms of passive infinitives, as shown below.
| a. | If rural voices are important – the bread basket, our farmers, our miners – then an electoral approach, not a pure popular vote, helps them to be heard. (GloWbE, USA, general, 288902) |
| b. | Thank you so much for sharing and helping our Vets be heard! (GloWbE, USA, blog, 3177307) |
Moreover, the shorter variant with the bare infinitive is considered to be less formal than the one with the marked infinitive (e.g., Rohdenburg 1996: 159; see also Biber et al. 1999: 736–737). This may be due to the fact that formal expressions are used in situations with less common ground between the participants. Lower accessibility of information can lead to more costly expressions. Across cultures, the association between formality and verbosity becomes conventionalized, with speech registers and styles as a result.
In addition, previous studies show that the bare infinitive has been gradually replacing the to-infinitive after help over the last two centuries. A corpus study by Rohdenburg (2009: 318–319) shows that the infinitive marker to was dropped very rarely in British and American English with authors born before the end of the eighteenth century, but there was a significant increase in the dropping of the marker by the end of the nineteenth century. This tendency continued in US English also in the twentieth century. British English speakers followed suit with some delay. Similarly, McEnery and Xiao (2005) find that the bare infinitive is used more frequently in the British and American corpora from 1991 than in the data from 1961, and that the American variant of help is more frequently used with the bare infinitive (see also Mair 2002). Therefore, we are dealing with regional variation in differential formal reduction.
Finally, it is necessary to mention phonological factors. There is some evidence that the use of to in different constructions depends on prosody. Wasow et al. (2015), in particular, found an effect of prosody on the use of the bare or to-infinitive in their investigation of the do-be construction, e.g., All we want to do is (to) celebrate. Namely, they discovered that to was used to eliminate stress clash when both the copula and the first syllable of the infinitive after be were stressed (see also Schlüter 2003). Lohmann (2011) tested two other phonetic variables, namely, whether the infinitive begins with a vowel, and whether the first syllable of the infinitive is stressed. Neither of the variables had a significant effect on the choice between the forms of the infinitive. In Levshina (2022a) I investigated the potential impact of stress clash and stress lapse more directly, but found no effect either.
In the remaining part of this section, I will test the Hypothesis of Construction–Lexeme Accessibility and Formal Length in order to see if the probability of the infinitive given the construction, or the other way round, plays a role in the formal variation. Since the construction with help is not very frequent, and the distribution is skewed in favour of the bare infinitive, especially in US English and in informal registers, I used the British data set of Google Books Ngrams, which is based on books published in Great Britain.Footnote 6 For my analyses, I used 1-grams, 2-grams, 3-grams and 4-grams with part-of-speech (POS) tags. For Modern English, Google promises the accuracy of the POS tags to be around 95 per cent, and likely above 90 per cent for older English texts.Footnote 7 The data represent mostly formal registers (e.g., academic publications).
Only a fraction of this huge data set was used, representing the years from 2001 to 2009. Instances of the construction were extracted from the data sets with 2-grams, 3-grams and 4-grams. More precisely, I extracted the following patterns, where X stands for any string, Y denotes one of the object personal pronouns me, you, him, her, it, us and them, and * represents any ending, including zero:
help*_VERB X_VERB e.g., helps_VERB make_VERB
help*_VERB Y_PRON X_VERB e.g., helped_VERB me_PRON build_VERB
help*_VERB to_PRT X_VERB e.g., helping_VERB to_PRT achieve_VERB
help*_VERB Y_PRON to_PRT X_VERB e.g., help_VERB her_PRON to_PRT understand_VERB
By making this contextual restriction, it was possible to control for some of the relevant factors that influence the use of one or the other variant: linguistic distance (zero or one word), the Helpee (explicit or implicit) and the morphological form of help. The restrictions of the Helpees to personal pronouns are explained by the size of n-grams and by concerns about possible spurious hits. Previous corpus work (Levshina 2018b) suggests that zero and pronominal Helpees account for approximately 80 per cent of all uses of the construction in the corpora. I assume, therefore, that the extracted n-grams can be used as a testing ground.
Both upper-case and lower-case characters were allowed. The verbs in the open slot were later manually checked, and the finite forms, participles and misspellings were excluded. The verbs were normalized with regard to the spelling variant (e.g., organise and organize were treated as one lemma). The total number of occurrences of the construction was 2,471,027, and the total number of individual verbs was 1,672. The relative frequencies of the to-infinitive and bare infinitive were very similar: 47.1 and 52.9 per cent, respectively.
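The four extraction patterns and the case-insensitive matching described above can be sketched with a single regular expression. This is an illustration under stated assumptions rather than the script used for the study: it assumes that each n-gram is available as a space-separated string of word_TAG tokens, as in the examples above, and the names `PRONOUNS`, `PATTERN` and `classify` are hypothetical. The manual exclusion of finite forms, participles and misspellings is not reproduced.

```python
import re

# Object pronouns allowed in the Helpee slot (the "Y" of the patterns)
PRONOUNS = "me|you|him|her|it|us|them"

# help*_VERB [Y_PRON] [to_PRT] X_VERB, matched case-insensitively
PATTERN = re.compile(
    rf"^(help[a-z]*)_VERB(?: ({PRONOUNS})_PRON)?(?: (to)_PRT)? (\S+)_VERB$",
    re.IGNORECASE,
)

def classify(ngram):
    """Return (form of help, has Helpee, has to, verb), or None if no pattern fits."""
    match = PATTERN.match(ngram)
    if match is None:
        return None
    form, helpee, to, verb = match.groups()
    return form.lower(), helpee is not None, to is not None, verb.lower()

for example in (
    "helps_VERB make_VERB",
    "helped_VERB me_PRON build_VERB",
    "helping_VERB to_PRT achieve_VERB",
    "help_VERB her_PRON to_PRT understand_VERB",
):
    print(example, "->", classify(example))
```

Returning the form of help and the presence of the Helpee separately makes it straightforward to tabulate the hits by subschema.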
The frequencies for separate combinations of the forms of help and the presence or absence of the Helpee are displayed in Table 9.2. Notably, we observe a correlation between the relative frequency of the bare form and the total frequency of the form of help followed by the (to) Infinitive with and without the Helpee. The higher the total frequency, the higher the relative frequency of the bare form. This can be regarded as a manifestation of efficiency.
Table 9.2. Frequencies of different subschemata of the construction with help
| Context | Total frequency | Frequency of the to-infinitive | Frequency of the bare infinitive | Number of verb types |
|---|---|---|---|---|
| help + Inf | 897,120 (100%) | 328,329 (36.6%) | 568,791 (63.4%) | 1,329 |
| helped + Inf | 459,042 (100%) | 273,218 (59.5%) | 185,824 (40.5%) | 1,354 |
| helps + Inf | 295,028 (100%) | 193,759 (65.7%) | 101,269 (34.3%) | 873 |
| helping + Inf | 120,815 (100%) | 106,905 (88.5%) | 13,910 (11.5%) | 750 |
| help + Helpee + Inf | 497,241 (100%) | 155,565 (31.3%) | 341,676 (68.7%) | 688 |
| helped + Helpee + Inf | 87,622 (100%) | 41,619 (47.5%) | 46,003 (52.5%) | 321 |
| helps + Helpee + Inf | 73,982 (100%) | 41,687 (56.3%) | 32,295 (43.7%) | 236 |
| helping + Helpee + Inf | 40,177 (100%) | 22,782 (56.7%) | 17,395 (43.3%) | 210 |
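The relationship between total frequency and the share of the bare infinitive can be checked directly against the counts in Table 9.2. The sketch below assumes that the comparison is made separately for the subschemata with and without the Helpee; this grouping, and the names `rows` and `orderings`, are illustrative choices rather than part of the original analysis.

```python
# (form of help, has Helpee, to-infinitive count, bare infinitive count), from Table 9.2
rows = [
    ("help", False, 328329, 568791),
    ("helped", False, 273218, 185824),
    ("helps", False, 193759, 101269),
    ("helping", False, 106905, 13910),
    ("help", True, 155565, 341676),
    ("helped", True, 41619, 46003),
    ("helps", True, 41687, 32295),
    ("helping", True, 22782, 17395),
]

def orderings(with_helpee):
    """Order the forms of help by total frequency and by bare-infinitive share."""
    subset = [
        (form, to + bare, bare / (to + bare))
        for form, helpee, to, bare in rows
        if helpee == with_helpee
    ]
    by_total = [form for form, _, _ in sorted(subset, key=lambda r: r[1], reverse=True)]
    by_bare = [form for form, _, _ in sorted(subset, key=lambda r: r[2], reverse=True)]
    return by_total, by_bare

for with_helpee in (False, True):
    by_total, by_bare = orderings(with_helpee)
    print("with Helpee" if with_helpee else "without Helpee", by_total, by_bare)
```

Within each group, the two orderings coincide: help, helped, helps, helping.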
The total frequencies of the individual verbs were obtained from the file with 1-gram frequencies in the entire British English data set for the period from 2001 to 2009. Also, the frequencies of use with both variants were summed for each individual verb. Based on these frequencies, the two informativity measures were computed as described in Section 9.1.
Consider an example. The verb understand occurs in the construction with help 85,815 times. The total frequency of the construction is 2,471,027. Therefore, the informativity score of understand given the construction is −log2 (85,815/2,471,027) ≈ 4.85. The verb occurs 3,239,809 times in the entire data set. From this follows that the informativity value of the construction given the verb is −log2 (85,815/3,239,809) ≈ 5.24.
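In symbols, the two measures used in this example can be written as follows. The notation is introduced here for illustration only: f(v, C) stands for the frequency of verb v in the construction C, f(C) for the total frequency of the construction, and f(v) for the total frequency of the verb in the data set. The labels InfoLex and InfoCxn follow the column headings of the tables in this chapter.

```latex
\begin{align*}
\mathrm{InfoLex}(v) &= -\log_2 \frac{f(v, C)}{f(C)}, &
\mathrm{InfoCxn}(v) &= -\log_2 \frac{f(v, C)}{f(v)} \\[4pt]
\mathrm{InfoLex}(\textit{understand}) &= -\log_2 \frac{85{,}815}{2{,}471{,}027} \approx 4.85, &
\mathrm{InfoCxn}(\textit{understand}) &= -\log_2 \frac{85{,}815}{3{,}239{,}809} \approx 5.24
\end{align*}
```

Both measures are expressed in bits, and higher values correspond to lower accessibility.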
In order to test the correlation between information context and the use or omission of to, I computed Spearman’s partial correlations between the proportions of the to-infinitives and each informativity measure. Partial correlations allow us to measure the correlation between two variables while controlling for the other(s). The correlations are displayed in Table 9.3.
Table 9.3. Spearman’s coefficients representing partial correlations between the proportions of to-infinitives and the informativity measures
| Context | InfoLex | InfoCxn |
|---|---|---|
| help + Inf | 0.066 (p = 0.002) | 0.169 (p < 0.0001) |
| helped + Inf | 0.431 (p < 0.0001) | 0.225 (p < 0.0001) |
| helps + Inf | 0.547 (p < 0.0001) | 0.147 (p < 0.0001) |
| helping + Inf | 0.552 (p < 0.0001) | -0.055 (p = 0.134), n.s. |
| help + Helpee + Inf | 0.007 (p = 0.857), n.s. | 0.122 (p < 0.0001) |
| helped + Helpee + Inf | 0.204 (p = 0.0002) | 0.123 (p = 0.027) |
| helps + Helpee + Inf | 0.512 (p < 0.0001) | -0.05 (p = 0.447), n.s. |
| helping + Helpee + Inf | 0.407 (p < 0.0001) | 0.115 (p = 0.099), n.s. |
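One common way of obtaining a partial rank correlation is to compute the pairwise Spearman coefficients and combine them with the first-order partial correlation formula. The sketch below illustrates that approach in plain Python. It is not necessarily the implementation behind Table 9.3, the function names are hypothetical, and the five data points are invented purely to show the mechanics.

```python
import math

def average_ranks(values):
    """Rank values from 1 to n, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(a, b):
    """Pearson correlation of two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def spearman(a, b):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(a), average_ranks(b))

def partial_spearman(x, y, z):
    """Rank correlation of x and y, controlling for z."""
    r_xy, r_xz, r_yz = spearman(x, y), spearman(x, z), spearman(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Invented toy values for five hypothetical verbs (not from the study)
prop_to = [0.10, 0.25, 0.40, 0.55, 0.90]   # proportion of to-infinitives
info_lex = [5.0, 4.0, 8.0, 7.0, 9.0]       # informativity of verb given construction
info_cxn = [6.0, 9.0, 7.5, 12.0, 10.0]     # informativity of construction given verb

print(partial_spearman(prop_to, info_lex, info_cxn))
```

Swapping the last two arguments gives the correlation with InfoCxn while controlling for InfoLex, which corresponds to the second column of the table.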
In accordance with the Hypothesis of Construction–Lexeme Accessibility and Formal Length, we expect positive correlations between the proportion of to vs. zero and the informativity measures. This expectation is supported. At least one of the informativity measures is positive and significant for each individual context. The InfoLex correlations for help + Infinitive and help + Helpee + Infinitive are quite low (0.066 and 0.007, respectively); only the former is significant. This may be an artefact of the data, since it is very difficult to control for possible horror aequi effects in n-gram data due to the lack of left context (e.g., ?This is an opportunity to help to protect millions of lives). Notably, the subschemata with individual inflectional forms display different behavioural properties, which is typical of so-called inflectional islands (Newman and Rice 2006).
To summarize, we find that the accessibility of constructions and lexemes correlates positively with the chances of the less costly variant, in accordance with the principle of negative correlation between accessibility and costs. Moreover, the total frequency of the subschemata with different forms of help with and without the Helpee correlates with the relative frequencies of the bare infinitive in these subschemata, which can also be considered efficient. The variation of help (to) Infinitive involves very many factors, but it seems that most of them can be interpreted in terms of accessibility related to the meaning, familiarity of inflected forms, memory decay, common ground and other factors.
9.4 Alternation Go (and) Verb
The third and final alternation discussed in this chapter is go (and) Verb, which is illustrated by the following example:
Let’s go (and) get some pizza!
This alternation has been widely discussed in the generativist literature. The main question of these studies is how the shorter variant is derived by formal operations from the construction with and (see, e.g., Pullum 1990 and Wulff 2006 for an overview). Some semantic differences have been observed, as well. In particular, Carden and Pesetsky (1979: 81) point out that go and Verb, unlike go Verb, can express unexpected events. They provide the following examples:
| a. | ??As we had arranged, the President went and addressed the graduating class. |
| b. | To our amazement, instead of addressing the graduating class, the President went and harangued the janitors. |
The use of go and Verb is more suitable when the action is surprising, as in (15b), than when it is planned, as in (15a). This can be seen as a direct manifestation of efficiency. The more costly form is used when the information is surprising and therefore less accessible.
In addition, Shopen (1971) argued that go Verb implies volitionality, which is not always observed with go and Verb. The shorter variant is also associated with motion away from the viewpoint location. This is why it is possible to say, Go and come back to our house, but not *Go come back to our house.
More recently, Wulff (2006) found that the lexical overlap between the constructions is not very large. Her distinctive collexeme analysis also suggests that the verbs most distinctive of the go Verb construction are process verbs (e.g., run, work, walk and fly).
Another relevant usage-based study is Flach (2017), where data about the syntactic environment of the constructions are provided. Her corpus frequencies suggest that go Verb is more frequently used after adhortative let’s and in the imperative mood, whereas go and Verb is preferred after modals and as a to-complement.
According to the Hypothesis of Construction–Lexeme Accessibility and Formal Length, we can expect the longer variant to be preferred if a verb is highly informative given the construction go (and) Verb, or if the construction is highly informative given the verb.
In order to test this prediction, I extracted all instances of go followed by a verb in the base form, with or without and between them, from the spoken component of COCA. This data source was chosen because go Verb seems to be more popular in American English (e.g., Pullum 1990).Footnote 8 These strings could be followed by any word, with the exception of function words, adverbials (e.g., go there), adjectives (e.g., go crazy) and participles (e.g., go unnoticed and go shopping). After that, it was necessary to check the data manually because some of the verbs were annotated incorrectly (e.g., the second verb in go figure was tagged as a noun). The result was 6,540 instances of the alternation with 627 individual verbs. The go + Verb construction occurred more frequently than go and + Verb: 4,618 occurrences against 1,922 occurrences. I also extracted the token frequencies of individual verbs from the spoken component of COCA. Using this information, I computed the informativity measures as described in Section 9.1.
The relationships between the variants and the informativity measures were tested with the help of Generalized Additive Models (GAMs).Footnote 9 The main distinctive characteristic of GAMs is that they allow for straightforward and convenient modelling of non-linear relationships between the predictors and the response variable. This is done with the help of smooth terms, which can be of various types and degrees of ‘wiggliness’. In order not to oversmooth or undersmooth the data, various regression diagnostics were performed, based on the goodness-of-fit measures provided in the model summary, AIC (the measure that combines the goodness of fit with parsimony), visualization tools and built-in tests. More information about the basic concepts and modelling strategies of GAMs can be found in Wood (2006), Sóskuthy (2017) and Wieling (2018).
The response was based on the frequencies of go and Verb and go Verb for every individual verb. The predictors were again the informativity variables with a bivariate tensor product smooth, which turned out to provide a better fit than individual univariate smooths, since the former produced smaller deviance than the latter.Footnote 10
The effect of both informativity variables on the chances of the longer variant is shown in Figure 9.1. The lighter regions represent the values where the chances of go and Verb are higher. They are observed at the top, where the informativity of the construction given a verb is higher, and the accessibility is lower. Many of those verbs, such as be, become, come, give and join, are highly frequent and are used in many different constructions, e.g., Do you want to go and be in a coma? Because of their ‘promiscuity’, they do not provide a strong cue to the construction and its meaning. The darker regions show the values where the chances of go and Verb are lower. They are observed in the bottom part of the plot, where the informativity of the construction given a verb is lower. Some of the verbs with the lowest values are pee, hunt, check, rent, visit and golf, e.g., Why did you decide to go visit Saddam Hussein? These verbs provide strong cues to the construction. Therefore, the shorter variant is preferred when the construction is more expected given the verb, and the longer variant is preferred in contexts when the construction is less expected. This is exactly what is predicted by the Hypothesis of Construction–Lexeme Accessibility and Formal Length.

Figure 9.1 Effect of informativity on the chances of go and Verb vs. go Verb, based on a Generalized Additive Model
As for informativity of the verb given the construction, represented by the horizontal axis in Figure 9.1, we can see that the regions of high informativity on the right are somewhat lighter than the regions of low informativity, which is also according to the expectations. There is some non-additivity in the effect of both informativity measures, as one can see from the curvy isolines, which is why the tensor products are preferred to individual univariate smooths.
Additional partial correlation analyses show that both informativity measures are positively correlated with the chances of the longer form, as predicted. Spearman’s correlation coefficients are 0.087 (p = 0.03) for informativity of the verb given the construction, and 0.186 (p < 0.0001) for informativity of the construction given the verb.
The analyses thus show that the longer form go and Verb is preferred with verbs that have high informativity (in both directions). It is noteworthy that the semantics of unexpectedness, which, according to Carden and Pesetsky (1979), is associated with go and Verb, is accompanied by relative unexpectedness of its slot fillers, which can be measured quantitatively. Of course, these variables should be tested in the presence of other factors that influence the use of the variants, but preliminary analyses of go (and) Verb with additional contextual variables (in collaboration with Susanne Flach, in preparation) reveal similar effects of informativity, which suggests that the observed effects are robust.
9.5 Conclusions
This chapter presented three case studies that demonstrate that accessibility based on associations between constructional slots and their fillers plays a role in constructional alternations with longer and shorter alternative forms. I formulated the Hypothesis of Construction–Lexeme Accessibility and Formal Length, which predicts that longer forms will be used when the associations are weaker, and shorter forms when the associations are stronger.
Based on these findings, we can formulate a prediction for the evolution of existing constructions and emergence of new ones. When a construction has two or more variants with closely related functions, the less costly variant will be used when the association between the constructions and their slot fillers is higher, and the more costly variant will be used when the mutual predictability between the constructions and the fillers is lower.
These results have consequences for the study of the associations between constructions and collexemes. Construction Grammar has traditionally focused on semantic compatibility (e.g., Goldberg 1995; Stefanowitsch and Gries 2003). The present study demonstrates that the probabilistic relationships between constructions and collexemes can explain the choice between the shorter and longer constructional variants. This needs to be integrated into the constructionist theory.
We also see many other manifestations of high and low accessibility that determine the use of longer and shorter variants. This shows that language users adjust their output to the individual situation in many very subtle ways, depending on semantic features, inflectional forms, distances between constructional components, and many other factors. There is a multitude of soft probabilistic constraints in which communicative efficiency manifests itself in language use. This should have important consequences for probabilistic grammar and lexicology (cf. Grafmiller et al. 2018). We may need to rethink many of the discovered factors in terms of accessibility.
This book has argued that language users have a bias towards efficient communication. They tend to minimize the cost-to-benefit ratio, where the benefits are desirable cognitive effects in the addressee, and the costs are related to articulation, processing and time. If we have several ways of formulating something, the most efficient expression will be the one with minimal costs, under the condition that the same cognitive effects are achieved.
A central role is played by accessibility. If intended meanings, interpretations, syntactic parses and so on are highly accessible, they can be expressed by less costly forms in terms of articulation effort and time, and also produced as soon as possible. Low accessibility means that more effort and time need to be spent in order to get the message across.
Examples of efficient choices include forms of different length related to various generalized implicatures, more and less explicit anaphoric expressions, asymmetric grammatical categories, e.g., singular and plural, the use and omission of function words and grammatical morphemes, reduced or hyperarticulated phonetic variants of the same word, and many others. These examples were discussed in Chapter 2. Examples of efficient word order, which were provided in Chapter 3, include the subject-first bias in languages of the world, cross-linguistic preferences in the order of morphemes, Greenbergian correlations and implications, and the preference for structures that minimize syntactic domains and dependency distances. It can also be useful to choose more accessible forms and meanings and avoid less accessible ones, as argued in Chapter 4. Chapters 5 and 6 discussed different diachronic paths and causal scenarios that can explain how these efficient language patterns emerge and survive in language use. Although I do not exclude that some efficient patterns can emerge due to factors other than the pressure for efficiency, it was argued that efficiency may be a more successful explanatory factor in many cases.
Chapters 7 to 9 zoomed in on a set of diverse grammatical phenomena: causative constructions and differential case marking in languages of the world, as well as variation in the use or omission of function words in some English alternations. In all these cases, the amount of coding material negatively correlates with the accessibility of an intended interpretation. These diverse case studies and numerous other examples allow us to conclude that we are dealing with a universal phenomenon that is observed in different unrelated languages, across different levels of language structure and in the form of categorical and probabilistic rules. This book provides a unified framework for discussing these examples, bringing together insights from diverse accounts and theories: Neo-Gricean pragmatics, audience design, Ariel’s (1990, 2008) Accessibility Theory, Zipf’s Principle of Least Effort, some aspects of markedness theory (cf. Fenk-Oczlon 1991, 2001; Haspelmath 2006) and Optimality Theory (e.g., de Hoop and Malchukov 2008), and many other ideas.
There remain many open questions and tasks for future research. First of all, the framework presented here needs to be tested on a wider range of linguistic phenomena. I expect many more instances of efficient formal asymmetries to be found in languages of the world. There is a certain danger of being selective, picking out the phenomena that support the account presented here and ignoring the violations. We need to find out how we can sample the grammatical functions and the corresponding forms, and to test the null hypothesis of no correlation between benefits and costs in a more systematic way.
The main challenge for efficiency research is how to integrate different types of costs in one metric. Ideally, we would need to compute total metabolic costs in calories or other units. But we do not know how to measure these yet. Until we find a solution, efficiency research is likely to remain a patchwork of different cost types.
There are many open issues in the debate about the role of audience design, mind reading and cognitive control. According to Croft (Reference Croft2000: 163), ‘it is clear that the speaker chooses degree of reduction of a constructional form with the hearer in mind’. But the extent to which the hearer is present in the speaker’s mind can be different. This study has argued that the choice of forms should be driven by simple heuristics, which work automatically and do not require conscious control and theory of mind. However, we can imagine that mentalizing mechanisms and cognitive control can be involved to different extents, depending on the specific case. My expectation is that they may be involved more if there are alternative ways of saying the same thing which are highly entrenched and conventionalized. Examples are particularized Gricean implicatures and Yoda’s word order (see Section 3.4).
Crucially, the connection between production efficiency and learning should be explored. A system that is easy to use may be difficult to acquire, and the other way round. One can hypothesize that there could be a negative correlation between articulatory ease and learnability. For example, formal reduction leads to ease of articulation but may obscure the identity of units. Automatization of articulation due to chunking is efficient for production but makes it more difficult to retrieve the individual components. At the same time, there may be a positive correlation between some aspects of processing ease and learnability. In particular, more accessible production plans (MacDonald Reference MacDonald2013) can also be easier to learn. Also, transparency may be a factor where these two motivations converge. For example, analytic constructions may be easier both to learn and to process than synthetic forms, although they are often more costly in terms of articulation (cf. Section 4.3).
Another important and understudied aspect of efficient behaviour is individual variability. We need more research on the correlation between theory of mind and efficiency (cf. Turnbull Reference Turnbull2019), in particular on the differences between neurotypical and autistic individuals. We can also expect differences between speakers of different ages.
There is some evidence that men and women may have different strategies in some aspects of language processing. Namely, in one experiment Wang et al. (Reference Wang, Bastiaansen, Yang and Hagoort2010: 820) found that
females process information in an exhaustive way, and that they rely on all available information before rendering judgment. In contrast, male information processing is usually partial and incomplete, relying on a subset of highly available and salient cues instead of detailed message elaboration.
Wang et al. (Reference Wang, Bastiaansen, Yang and Hagoort2010) show that men are sensitive only to semantic anomalies that happen in the focus position, as in (1a), while women are sensitive to semantic violations regardless of the information structure, both in (1a) and (1b).
| a. | What kind of vegetables did mum buy for dinner today? – Today mum bought BEEF for dinner. |
| b. | Who bought the vegetables for dinner today? – Today MUM bought beef for dinner. |
This finding suggests that men prefer to minimize the costs, at the same time risking losing some benefits, while women maximize the benefits, at the same time increasing the costs. It is an open question which strategy is optimal.
Finally, the ideas about efficiency presented in this book should be formalized in the framework of probabilistic pragmatics (Franke and Jäger Reference Franke and Jäger2016), as in Rational Speech Act Theory (Frank and Goodman Reference Frank and Goodman2012), for example. The general principles should be spelled out formally and tested empirically, for instance, in communication games. At the same time, it is a matter of ongoing debate to what extent these sophisticated models can capture real communication. Recent empirical evidence shows that the rational models do not predict addressees’ communicative behaviour better than a baseline model driven solely by literal word meaning and a prior reflecting the contextual salience of referents (Sikos et al. Reference Sikos, Venhuizen, Drenhaus and Crocker2021).
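To make the kind of formalization mentioned above more concrete, here is a minimal sketch of a Rational Speech Act model for a toy reference game. It is only an illustration under stated assumptions: the objects, utterances, uniform prior, zero production costs and the rationality parameter `ALPHA` are all hypothetical and are not taken from this book or from the studies cited. The literal listener relies only on word meaning and a prior over referents, the speaker trades off informativity against production cost, and the pragmatic listener reasons about that speaker.

```python
import math

# Hypothetical toy reference game; all names and values are illustrative assumptions.
OBJECTS = ["blue_square", "blue_circle", "green_square"]
UTTERANCES = ["blue", "green", "square", "circle"]
TRUE_OF = {  # literal semantics: which objects each word is true of
    "blue": {"blue_square", "blue_circle"},
    "green": {"green_square"},
    "square": {"blue_square", "green_square"},
    "circle": {"blue_circle"},
}
PRIOR = {o: 1 / 3 for o in OBJECTS}   # contextual salience of referents (assumed uniform)
COST = {u: 0.0 for u in UTTERANCES}   # production cost per utterance (assumed zero)
ALPHA = 1.0                           # speaker rationality (assumed)


def normalize(scores):
    """Rescale non-negative scores so that they sum to one."""
    total = sum(scores.values())
    return {k: v / total for k, v in scores.items()}


def literal_listener(utterance):
    """P(object | utterance) from literal meaning and the prior only."""
    return normalize({o: PRIOR[o] * (o in TRUE_OF[utterance]) for o in OBJECTS})


def speaker(obj):
    """P(utterance | object): soft-max of informativity minus cost."""
    scores = {}
    for u in UTTERANCES:
        p = literal_listener(u)[obj]
        # Utterances that are false of the object get zero probability.
        scores[u] = math.exp(ALPHA * (math.log(p) - COST[u])) if p > 0 else 0.0
    return normalize(scores)


def pragmatic_listener(utterance):
    """P(object | utterance) obtained by reasoning about the speaker."""
    return normalize({o: PRIOR[o] * speaker(o)[utterance] for o in OBJECTS})


if __name__ == "__main__":
    print(literal_listener("blue"))
    print(pragmatic_listener("blue"))
```

In this toy setting, the literal listener is undecided between the two blue objects on hearing "blue", whereas the pragmatic listener favours the blue square, because a speaker who meant the blue circle would more likely have said "circle". Raising an entry in `COST` makes the corresponding utterance less likely to be produced, which is one simple way of representing the trade-off between benefits and costs.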
Communicative efficiency is more relevant for society than one might think. Everything that has to do with distribution of resources, including time and metabolic energy, is a question of power. In ideal communication, both the speaker and the addressee should benefit from their efforts. However, there are some examples where this ratio is different for different communicators. For example, when a boss gives an employee a task in a cryptic message without detail, they save their own effort while exploiting the cognitive resources of the employee. Additionally, they may avoid the social costs of taking responsibility if the task does not bring the expected results. They can say, ‘This is not what I meant!’ Another example is when the addressee has no interest in processing the speaker’s message because there are no cognitive benefits for them, but keeps doing so out of politeness or fear. One can think about women’s and men’s communication, where men take the floor more often, while women listen patiently. This is a consequence of a long tradition of institutionalized misogyny, in which religious and social authorities told women to keep silent in public (and in many places of the world still do).Footnote 1 This asymmetry in benefits and costs may not be obvious, but it further cements the existing inequality, especially as it permeates daily interaction. This is why communication efficiency is a political issue. We may not always have enough power or self-confidence to avoid an imbalance, but we should be aware of how much each participant invests in and gains from communication. How to transform communicative practices, taking into account the interests of underprivileged individuals and groups, remains an important question.
Language is an efficient tool that has helped us to become what we are as a species. It allows us to achieve great things collaboratively. With the development of new communication technologies, the costs and benefits change. For example, nowadays predictive text technology (T9) helps us to minimize writing effort. It is highly successful, despite all the anecdotes about awkward mistakes (and sometimes high social costs). On Twitter, the maximum length of a message is restricted to 280 characters. As a result, Twitter users are very aware of the space costs of different expressions, which has a strong effect on their linguistic choices. Communication via online conference software, such as Skype or Zoom, helps to connect people from all parts of the world, opening new opportunities, but at the same time it is more demanding for the brain than face-to-face communication.Footnote 2 The number of visual signals about body language is restricted, and it is more difficult to connect emotionally. Communication between multiple participants in the gallery view in Zoom, for example, is difficult as well, because one has to monitor too many cues simultaneously. These are all additional costs of communication that we need to adapt to.
Where are we heading? Elon Musk’s start-up Neuralink has been working on a brain implant that could communicate with a computer or mobile phone. In addition to helping people with injuries and disabilities, it could also allow people to communicate with each other without using language.Footnote 3 This kind of telepathy would be extremely efficient. No time and effort would be spent on encoding a message in words and decoding them. Yet I do not think that language will become obsolete in the foreseeable future. Even if one disregards the privacy and ethical issues,Footnote 4 the sheer complexity and diversity of the human brain do not allow anyone to decipher a brain signal and turn it into concepts, and back again. Linguists have nothing to worry about at the moment. But surely this line of research will lead to new and more efficient communication tools and gadgets, in ways that no one can predict now.








