The Evolution of Syntax

Part I The Evolution of Syntax

2 From the Protolanguage Spectrum to the Underlying Bases of Language

2.1 In Search of the Underlying Bases of Language (UBL)

2.1.1 Abandoning the Chomskian Frameworks

When Bickerton first claimed that creolization provides a model for the emergence of language, he lumped precursors of human language, aphasic language, the utterances of trained apes, early child language, and home sign systems together as instances of protolanguage.* Later, Bickerton recanted this view. For Bickerton (2009), and for this chapter, the focus is on the evolutionary aspect: A protolanguage was an open system of communication used by a particular hominid grouping that was distinct from an animal communication system and was a potential precursor of “true” language. I use the word “potential” to signal that many protolanguages would have died out without being succeeded over the long haul by full languages.

My original plan for this chapter was to build upon two papers that contrast Bickerton’s view of protolanguages (Bickerton 2008) with my own (Arbib 2008),¹ along with further material inspired by Adam’s Tongue: How Humans Made Language, How Language Made Humans (Bickerton 2009) as well as a look back at “The Language Bioprogram Hypothesis” (Bickerton 1984). However, I then found that Bickerton’s final book on these topics, More than Nature Needs: Language, Mind, and Evolution (Bickerton 2014), offers a volte-face, based on changing notions of Universal Grammar (UG), that must also be addressed.

Chomsky’s notion of UG is protean, changing each decade or two as he offered major restructuring of syntactic theory. Motivated by what he took to be the Poverty of the Stimulus,² Chomsky (e.g., Chomsky & Lasnik 1993 for an exposition) advanced a Principles-and-Parameters UG (henceforth P&P)³ that saw the human genome as encoding all possible rules of human syntax, positing that the child’s task in language acquisition was simply to recognize and henceforth use those parameters employed in the circumambient language. However, Chomsky’s current approach to language, the Minimalist Program (Chomsky 1995), rejects principles and parameters in favor of a single operation, Merge, and offers no solution to the claimed poverty of the stimulus. Nonetheless, Bickerton (1984) was taking P&P as the regnant version of UG when, having argued that all creoles exhibited similar structure,⁴ he further posited that this was because the children employed default settings for parameters not exemplified in their environment. This changed, as Bickerton himself clearly stated (2014: 224–225):

The claims made in Bickerton [1984] that creoles are produced by children, in a single generation, from a relatively structureless early-stage pidgin, with the aid of an innate program for language, have been challenged frequently … Bickerton [1984] was published at a time when the principles-and-parameters model was in vogue. … [T]here appeared to be a simple way the two models could be reconciled. Creoles might represent the set of unmarked parameters in a parameterized UG, and if “unmarked parameter” meant “default parameter” that setting would be assumed by the child in the absence of evidence to the contrary … [This P&P UG with default parameters was what Bickerton then called the Language Bioprogram, LB.]

[In the late 1980s] the major focus of my interest shifted to the evolution of language, and … this was a serendipitous development. It turned out that one could not hope to fully understand how creoles emerged without understanding how language had evolved, and understanding how language evolved suggested that the true nature of syntax (a mix of biologically given and inductively learned materials …) differed from any account hitherto proposed. Among the benefits of this new account was a final clarification of the true relationship between the LB and UG.

In particular, Bickerton (2014) reduces UG to a very basic set of operations, leaving most of syntax to cultural evolution rather than the human genome. Since the “UG” of 2014 is so different from the UG of P&P, I will rename it as the Underlying Bases of Language (UBL) and assess its relation to protolanguage, language, and (briefly) pidgins and creoles. Moreover, I will argue for a different notion from UBL – the language-ready brain, rooted in comparative neuroprimatology – and suggest challenges in reconciling the two.

2.1.2 Protolanguages and Niche Construction

Niche construction theory (Odling-Smee et al. 2003) emphasizes that creatures don’t evolve within a fixed ecological niche. They may construct new niches, thus changing the conditions in which they and other species can further change – consider beaver ponds creating a whole new ecosystem for diverse species. Bickerton (2009) argues that just one specific protohuman niche crucially reoriented protohuman evolution on a path to language – territory scavenging. I dispute his claim for uniqueness while agreeing that niche construction theory is an excellent framework for studying human evolution.

For Bickerton (2009: 92), “No serious scholar nowadays doubts that language is, at bottom, biological rather than cultural, and therefore was not created, but somehow evolved.” I think this is a mistake if the evolution here is taken to be primarily by natural selection, and perhaps the Bickerton of 2014 would agree. I thus advocate an EvoDevoSocio framework: The genotype of a creature is not a specification of the adult creature, but of the mechanisms whereby that creature will develop into its adult form, and that development depends on the niche in which it occurs, a niche that is both socially and physically constructed. In humans, we see an accelerated cycle wherein the niche is itself transformed by cultural evolution (social processes that do not primarily affect the genome), thus changing the potentialities for future evolution. (Cultural evolution also occurs, though on a much lesser scale, in other animals – see Whiten 2019.)

A crucial hypothesis of my account of language evolution (§2.3) is that early Homo sapiens had “language-ready brains,” that is, brains sufficiently similar to our own that they would have been able to use language had it already existed. However, it was only in the last 100,000 to 50,000 years that humans made widespread use of language – and my claim is that this did not rest on a brain-transforming mutation of the genes but reflected the cumulative effects of cultural evolution. In the same vein, human brains were “agriculture-ready” and even “internet-ready” long before humans developed agriculture or the internet. I argue, then, that biological and cultural evolution yielded the genome for a language-ready brain, but it took tens of millennia of further cultural evolution to yield cultures in which children with similar genomes could develop language-using brains.⁵

2.2 Towards an Adequate Theory of Language Origins

In much of Bickerton’s writing, the units of a protolanguage are words akin to nouns, verbs, and modifiers, but I will argue (§2.3) that protowords need not be akin to words of current human languages but may, rather, be holophrases. This stands in contrast to pidgins which, whatever their limitations, are composed of units akin to words of their related languages.

A group with a protolanguage can create new protowords to convey more meanings. Initially, though, combinability is limited to saying a few things together: An utterance of a protolanguage is a sequence of protowords like beads-on-a-string (Bickerton 2014: 105). It’s a bit like saying water and bird with the possible interpretation that you have seen a swan – but with no developed strategy of combination here.⁶ Combinability marks Bickerton’s Rubicon separating protolanguage from language, with a syntax that goes beyond beads-on-a-string.

Bickerton further argues for uniqueness in the transition to language: “If the proposed trigger for language is anything that affects other species, it’s not likely to be the right one” (2009: 29). On this basis he dismisses hunting, tool making, social relations, rituals, gossip, and so on. This seems mistaken. For example, some other animals use tools, but as new tools were invented in human pre-history they may have demanded new strategies for communicating about them, and these strategies in turn could make possible the development of more complex tools, in an expanding spiral. Thus, hunting, tool making, and social relations (whether gossip, ritual, or coordinating behavior) could all have been part of the way early humans began to have new ways of community that led in turn to larger protolanguages that laid the possible basis for languages.

Bickerton takes displacement as the key feature that distinguishes most Animal Communication Systems from language, stressing its relevance to recruiting others to engage in some joint activity – but note that tool making, too, involves displacement in space (where to acquire the stones) and time (preparing tools for later use). Moreover, displacement offers another point contra “uniqueness” as a criterion. He notes that in certain bee and ant species, one insect can signal to enable others to forage in a place where it has found food. Thus, the fact that some activity is shared by other species does not militate against its relevance for protolanguage and language. Note, though, that Bickerton argues that it is the size of the human brain that lets humans, ultimately, communicate about displacement without the restriction to a specific type of innate behavior that characterizes communication in ant and bee foraging.

2.2.1 Territory Scavenging

Nonetheless, Bickerton (2009) argues for territory scavenging as the unique niche in which displacement was added to the protolanguage repertoire, arising when prehumans, perhaps 2 million years ago, began to range over broad territories for the carcasses of large dead animals, butchering and consuming meat wherever they found it while keeping other predators at bay. The notion is that someone who found such a carcass might have to travel some distance to recruit others to the site, and this would require displacement – communicating about a place and an activity outside the here and now. Some scholars emphasize territory hunting rather than territory scavenging as their proto-activity, but I reiterate that one need not privilege either of these, or the other candidates, over another. One could also object that signaling the location of the site is not necessary if the one who discovered it will lead the group back to the site.

Nonetheless, let’s agree that the territory scavenging niche could be one in which protolanguage could be enriched by a means for expressing displacement as a step toward language. But let’s not grant too much. Such a skill based on scavenging no more indicates a general understanding that protolanguage can convey information about other places and times than does the “language” of bees suggest that bees can dance about displaced events in general.

2.2.2 Micro-Protolanguages

As human praxic skills became more complex, they became harder to learn through trial and error, even if shaped by observation of practitioners, raising a need for teaching (Gärdenfors & Högberg 2017). The claim is that this included increased use of pantomime as a way of communicating skill. Such observations set the stage for the technological pedagogy hypothesis (Stout & Chaminade 2012) that basically says that as you try to teach people more complicated things, you are probably going to need more protowords, and cumulatively this may lead to the more subtle word combinations of language.

Pedagogy is not the only driver. Protolanguages would be of use in diverse aspects of social coordination (hunting, childcare, food preparation, and more). My hypothesis is thus that early humans with different responsibilities within social groups may have developed micro-protolanguages specific to communicating about their own activities (while still relying greatly on context, gesture, and pantomime) long before the emergence of larger protolanguages shared by the whole community, in part through integrating portions of these diverse micro-protolanguages.

An interesting parallel is offered by the special styles and registers in the Indigenous men’s and women’s spoken and signed languages of Australia.⁷ For example, men and women among the Yanyuwa people in the Northern Territory of Australia are distinguished not only by their social roles, such as ritual life, hunting, and nurturing, but also explicitly by the use of different dialects (Bradley 1988). This contrasts with the observation (Kirton 1988) that such differences are normally associated with language dialects in separate locations. Members of each sex have a passive knowledge of aspects of the other dialect but do not normally use it. However, some of the knowledge cannot be shared or made publicly available, with only the owners of the knowledge being allowed to use it.

This example is within the context of fully developed Aboriginal languages in a present-day community, but is concordant with the notion of early human subgroups of a single tribe having distinct micro-protolanguages. Moreover, many crafts today have their own lingo – for example, blacksmiths use terms like hardie hole and anglesmith. I would further argue that mathematics is a full language that can be linked to English or some other natural language, but may work better, in part, in written rather than spoken form to exploit spatial layout of symbol structures.

2.2.3 “The” Debate on the Transition from Protolanguages to Languages

The Emergence of Protolanguage: Holophrasis vs Compositionality (Arbib & Bickerton 2010) was based on the debate between two views:

The compositional view (Bickerton 1995): A protolanguage consists of a few nouns and verbs strung together without syntactic structure. For Bickerton (2009), language emerged by just adding Merge (pure syntax), though by 2014, Bickerton had replaced Chomsky’s Merge by Attach, as described below.

The holophrastic view (Arbib 2005; Wray 1998): In much of early protolanguage, a complete communicative act involved a “unitary utterance” or “holophrase”: A holophrase has no subparts whose meaning can be separated from the whole and the meaning of the whole cannot be reconstituted from its subparts. Nonetheless, it is a candidate for becoming part of a protolanguage because it is a conventionalized performance that had to be “invented” and then passed on among the members of the tribe. Such protowords may often be more akin in meaning to phrases or sentences of a modern language. Moreover, there must be a measure of parity in that a meaning is shared, more or less, by both speaker and hearer. Only some may utter it – but all to whom it is directed can understand it. When I mention speaker and hearer, I will include those who perform non-vocal communicative gestures and those who interpret them.

On the account developed in §2.3, to get to language, words co-evolved culturally with syntax: As “protowords” were fractionated or modified to yield words for constituents of their original meaning, so were constructions developed to arrange the words to reconstitute those original meanings and (the advantage of this transition) many other meanings beside. Despite opposing the holophrastic view, Bickerton does consider how displacement might support a possible transition from holophrase to noun (2008: 171):

Initial displacement signals might well have been holistic; a protoword might have been interpreted as something equivalent to “There’s a dead elephant out there and we can eat it if we all move quickly” … However, … once the “elephant” signal was freed from its dependence on a physically-present, sensorily-accessible elephant, the road was opened to … use of the same signal on seeing elephant footprints or dung, … for example. Use in a constantly widening range of contexts would move the signal closer to … the kind of meaning exemplified by the modern word “elephant.”

The theory of §2.3 does not exclude such mechanisms but may require that the original protoword take somewhat different forms to distinguish the original meaning from the emerging protonoun.

The rest of this chapter will play out the debate over three sections: §2.3 presents the Mirror System Hypothesis on the biocultural evolution of the language-ready brain as setting the stage for the cultural evolution of languages, while §2.4 summarizes Bickerton’s (2014) notion of what I now call the underlying bases of language (UBL). I will argue that the UBL lacks key elements of the language-ready brain, and that these offer support for issues that Bickerton ignores. Finally, §2.5 turns, briefly, to the issue of pidgins, creoles, and the Language Bioprogram Hypothesis, and notes what has and has not been achieved by the transition from the P&P UG of 1984 to the UBL of 2014, and the extra perspective offered by the holophrase hypothesis.

2.3 To the Language-Ready Brain and on to Languages

Where Bickerton emphasizes that humans are very different from other creatures, the Mirror System Hypothesis (MSH) assesses how being related to apes and monkeys may have provided our last common ancestors with a substrate on which that differentiation could operate.⁸ Our closest primate relatives do have some forms of communication systems (very limited vocal calls; plus manual gestures for great apes), but they also share with us manual skills and the ability to master new ones. Given this disparity, MSH postulates that manual skills were more important than vocal skills for the early stages of evolution of the language-ready brain (which, MSH asserts, provided the substrate for the cultural evolution of languages from protolanguages).⁹

2.3.1 Biocultural Evolution on the Path to Homo Sapiens

Humans are creatures that share with other creatures the ability to interact with the real world and observe the behavior of others. For MSH, the starting point is the manual skills we share with apes and monkeys, and it then charts the evolution of and beyond imitation of manual skills and the importance of being able to pay attention to the details of others’ actions. Apes, the evolutionarily closest cousins of humans, have the ability to imitate, but not the ability to facilitate learning by paying attention to the details of movement. MSH posits expansion of imitative ability after the last common ancestor of humans and great apes that yields a blend of

complex action recognition – the ability to “parse” behavior into constituent actions, with the added ability to attend to the trajectories as well as the goals of subactions; with
complex imitation which exploits this ability to achieve a first approximation to that motion without trial and error.

Such complexity is graded, but even in modest form it greatly expanded the ability of protohumans to develop and share new skills that could then propagate through the community. In particular, complex action recognition offers a precursor for parsing in language – a topic that Bickerton largely ignores. Note, too, that action recognition would be invoked more often to understand what another is doing so that one can respond to them, rather than trying to imitate them to gain a new behavior.

MSH emphasizes that these mechanisms were in place as part of manual skill well before protolanguage – but they provided the scaffolding for the emergence of the latter. Being able to parse a behavior into constituent actions is also what we do when we try to listen to a speech stream and pull out the words. In addition, we have to make sense of how they achieve various communicative subgoals, as distinct from the ape-shared ability to recognize practical subgoals (Figure 2.1).

Figure 2.1 How chimpanzees did not acquire language

(Image of the “100th Monkey” – well, chimpanzee – reproduced with permission of the photographer Louie Psihoyos, www.psihoyos.com)

MSH argues that a capacity for pantomime came next. Based on actions performed when objects are present, the panto-mimic carries out the movements in the absence of the objects to convey some message – a form of displacement. The observer must be able to associate the miming with a pattern of behavior and infer the intended message. (If I mime drinking from a coffee cup, am I asking you to get me coffee, or inviting you to drink coffee, or … ? Clearly context helps resolve the ambiguity.) It may take biological and then cultural evolution to create a social niche in which production and recognition of open-ended performances becomes possible.

However, pantomimes can take a long time to perform and yet be ambiguous. Pantomime is then transitional to protosign, as conventionalization provides a sign that is easier to produce and recognize and less ambiguous. In an anachronistic example from American Sign Language (ASL, which is a full human language, not a protosign system), what might be taken as a one-handed pantomime for PLANE FLYING – moving a handshape akin to a plane with outspread wings through a long trajectory – became conventionalized as the sign for FLY, whereas moving the same handshape back and forth in place is now the sign for AIRPLANE (Supalla & Newport 1978). Such a distinction lets one signal, with the help of an extra sign in each case, that “a fly is flying” or that “a plane has crashed.”

With this we can summarize portions of the biocultural evolution (involving both Evo and Socio) that yielded creatures with a language-ready brain – but who did not have language:

A complex action recognition and imitation system
A pantomime system as the ability to freely pantomime performances to open up communication
Emerging protolanguages (an expanding spiral) combining protosign, a manual-based communication system, and protospeech.

Elsewhere (Arbib 2024), I offer a related exposition of MSH but with emphasis on the diverse types of pantomime, the relevance of pantomime in grounding co-speech gestures as well as words and constructions, and on new directions for MSH provided by moving beyond single utterances to narratives.

2.3.2 A Key Role for Fractionation in Language Emergence

MSH hypothesizes that early protolanguages defined a new niche for social interaction that served as the setting for novel behaviors – communicative, practical, and social – that could generate more things to “talk” about (manually and/or vocally) and so in turn could stimulate the emergence of more and more complex (proto)languages. The expanding (proto)language niche came to support a wide range of different constructed niches that further transformed the course of social evolution.

On the posited path to languages, protowords yielded to words governed by constructions; lexicons and grammars evolved together to provide the ability to express a widening range of novel meanings. With Wray (1998, 2000), MSH insists that many (not all) protowords may have been holophrases, and a key claim here is that words co-evolved culturally with syntax through fractionation. The “flying plane” example above offers one of the openings to other mechanisms.

Complex action recognition enables humans to look at what other people are doing and break it into meaningful parts. Here’s the crucial addendum: We may often see a behavior for which we do not know the parts. When we try to break it into meaningful parts, we may or may not succeed – but the claim going forward is that we may “see” parts that did not preexist and give them meaning to make sense of the whole, and the complementary process of putting the pieces back together may yield a new skill. Here, MSH posits that these abilities came (perhaps after hundreds of millennia) to support more and more fractionation of holophrases – breaking them into pieces that came to be associated with their own meanings. The complementary process yielded constructions as a way to “put the pieces back together” – but now with the advantage that they could also combine pieces that had never been combined before.

To use another anachronistic example, consider pantomimes for opening or closing a door. No part of either pantomime corresponds to door, but their shared meaning can yield a new fractionation. The only part in common is the pantomime of turning the handle. This could then become fractionated out as the protosign for door while the complementary parts, the two directions of moving the hand, could yield protosigns for open and close. This yields the constructions X-open where the slot filler X must meet a semantic criterion of being something-that-opens, and similarly for X-close as a distinct idiosyncratic construction.

Moreover, the existence of even such primitive constructions can scaffold the invention of new (proto)words – for example, a rapid opening and closing of the mouth might become adopted as the protosign for mouth that could provide a new X to be available to fill the slot in X-open and X-close.
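The door example can be made concrete with a minimal sketch. It is an illustration under stated assumptions, not a model that MSH itself specifies: each pantomime is treated as a set of component movements, fractionation is approximated by set intersection and difference, and a construction is a single slot with a semantic restriction on its filler. The component labels (`turn-handle`, `push-hand`, `pull-hand`) and the names `fractionate` and `Construction` are hypothetical.

```python
from dataclasses import dataclass, field


def fractionate(a, b):
    """Split two holophrases into their shared part and their two residues."""
    shared = a & b
    return shared, a - shared, b - shared


@dataclass
class Construction:
    """A one-slot construction 'X-<action>' with a semantic restriction on X."""
    action: str
    fillers: set = field(default_factory=set)  # things that meet the criterion

    def apply(self, x):
        if x not in self.fillers:
            raise ValueError(f"{x!r} does not fit the slot of X-{self.action}")
        return f"{x}-{self.action}"


# Two holistic pantomimes; no component means "door" before fractionation.
open_door = frozenset({"turn-handle", "push-hand"})
close_door = frozenset({"turn-handle", "pull-hand"})

shared, rest_open, rest_close = fractionate(open_door, close_door)

# The shared part becomes the protosign for "door"; the residues become
# the protosigns for "open" and "close".
lexicon = {shared: "door", rest_open: "open", rest_close: "close"}

x_open = Construction("open", {"door"})
x_close = Construction("close", {"door"})

# A newly invented protosign can fill the existing slots.
x_open.fillers.add("mouth")
x_close.fillers.add("mouth")

print(x_open.apply("door"))    # door-open
print(x_close.apply("mouth"))  # mouth-close
```

The sketch captures only the bookkeeping. It says nothing about how a community comes to agree on the new parts, which is where the posited cultural evolution does its work.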

There were no general forms for constructions at this early stage. Given our posited framework, the orders in (X that is an openable thing)-open and want-(X that is a desirable thing) could coexist happily at an early stage of protolanguage development. However, generalizing across similar constructions to get more general constructions while also generalizing across the slot fillers could, possibly over a very long period, yield increasingly general categories. As we generalize, more and more constructions may share overlapping classes of slot fillers, and then the semantic restriction becomes weak:

semantic → “semantico-syntactic” → syntactic

Finally, we have general classes like nouns and verbs that can occur in diverse constructions with only weak semantic restrictions like “many elements of this class relate to objects” or “many elements of this class relate to actions.”

Thus, as want and open are recognized as part of a more general category of action words, we would have an emerging conflict between what can now be seen as constructions X-(sort of verb) and (sort of verb)-X, and thus different protolanguage communities might settle on one form or the other, with later complexities calling for the development of, for example, case markings lying far in the future.¹⁰

Kemmerer (2005) emphasized how the varied constructions of different languages ensure that even such seemingly basic syntactic categories as “verb” and “noun” vary from language to language, with variations arising through cultural evolution. Crucially, then, a constituent is not just “within the syntax” but also “in the semantics” – supporting the ability of the hearer to aggregate diverse words to some aspect of the overall understanding.

2.3.3 The Protolanguage Spectrum and the Continuum to Languages

Different groups within a tribe may develop different micro-protolanguages, and different tribes would certainly develop different protolanguages, each then growing with both limited and more general “inventions,” with diffusion both within and between tribes yielding more complex and widely shared protolanguages to the point that some could be considered as simple languages. Rather than there being a single protolanguage and a hard boundary with language, there would be a protolanguage spectrum (see Figure 2.2) as primitive protolanguages became enriched by an increasing lexicon, and more and more constructions.

Figure 2.2 Protolanguage spectrum

There is no strict criterion here for simplicity or complexity of languages. I am simply diagramming the reasonable assumption that over time languages expanded in vocabulary, the number of constructions, and the depth of nesting of those constructions, with some of the later constructions subsuming a range of earlier ones without necessarily displacing them. Construction Grammar posits that many idioms must be associated with constructions since the meaning of an idiom – e.g., kick the bucket – cannot otherwise be inferred via compositional semantics (Fillmore 1966). Conversely, though, a language may lose some of the richness of the construction as it evolves from a precursor and yet still remain a complex language – for example, compare French and Latin. One must distinguish changes along the timeline from claims that one current language is simpler than another.

Even Alison Wray, who contributed so much to the theory of holophrasis, has concerns about what use a “partial” grammar might be (1998: 48):

[T]here is a critical level of complexity that must obtain for a creative grammar to be useful in expressing propositions. … [I]t is difficult to imagine what advantage a primitive, half-way grammar would have for its users, over the highly successful interactional systems of other primates …

I disagree. The trouble comes from viewing a grammar as providing an all-or-none capacity to express propositions or as comprising a rather small set of general rules, rather than as a set of independently useful constructions which can have “standalone utility.” As we move from (micro-)protolanguages to more and more powerful languages, those independently useful constructions become subsumed by more and more general constructions. But even then we do have very specific constructions. For example, if I say the woman in red I have employed a very focused construction in the sense that it means “the woman who is wearing red clothing,” and the second slot filler red has to be a color.

The key claim that words co-evolved culturally with syntax through fractionation does not exclude Bickerton’s concern with how existing words might be combined, but it does emphasize that the emergence of constructions can help create words, not just combine them. Recall the ASL flying plane example – one can reduce and “annotate” a pantomime to reduce ambiguity and get new “words” rather than fractionating the pantomime. We saw how “flying plane” became annotated to distinguish “flying” and “airplane.” Bickerton’s restriction of the “elephant-holophrase” to “just the elephant” would provide an adaptive pressure to put words together via constructions (which might be as simple as word order) to express necessary context.

2.4 To the Underlying Bases of Language and on to Languages

Chomsky abandoned Principles and Parameters (P&P) for Merge – but a Minimalist Language Acquisition Device has no explicit principles built-in, as P&P did, to address Chomsky’s (spurious) Poverty of the Stimulus argument. Similarly, Bickerton no longer views the Language Bioprogram as a P&P-like UG and offers an account of language evolution in which the emergence of Attach as “something-like-Merge” is the key step. However, Bickerton’s Attach is more plausible than Merge in describing language evolution.

2.4.1 Contra Berwick & Chomsky’s Merge-Based Account of Evolution

Thus, before summarizing the account in Bickerton (2014), let’s briefly outline what I see as flaws in the Merge-based account of language evolution offered by Berwick and Chomsky (2015, henceforth B&C) in Why Only Us: Language and Evolution. Their answer to “Why Only Us” reduces, unsatisfyingly, to “Because we had a mutation that no one else had, and that mutation endowed us with Merge.” Moreover, they introduce the term “protolanguage” only to parody it (B&C: 72):

there is no room in this picture for any precursors to language – say a language-like system with only short sentences. There is no rationale for positing such a system: to go from seven-word sentences to the discrete infinity of human language … and there is of course no direct evidence for such “protolanguages.”

They ignore the notion that protolanguages might have employed meaningful protowords with little attempt to combine them. Moreover, they reject the notion of cultural evolution (B&C: 91–92; my italics):

what has evolved … is, of course, not languages but rather the capacity for language – that is, UG. Languages change, but they do not evolve. It is unhelpful to suggest that languages have evolved by biological and nonbiological evolution. The latter is not evolution at all.

The problem is compounded in that hints scattered through the book reveal Merge as a process whose evolution through a one-shot mutation is wildly implausible when one considers the complexity that B&C ascribe to it (see B&C: 10–11, 71–72, and 136 for the sources of this summary; my italics):

Merge, the single operation for building the hierarchical structure required for human language syntax, is just set formation. Given a syntactic object X (either a word-like atom or something that is itself a product of Merge) and another syntactic object Y, Merge forms a new, hierarchically structured object as the set {X, Y}; the new syntactic object is also assigned a label by some algorithm that employs minimal search, which locates the features of the “head” of the combination. Moreover, central components of thought, such as propositions, are basically derived from the optimally constructed generative procedure …

Moreover, we learn (B&C: 71–72) that

[Wolfram Hinzen (2006) espouses the very strong thesis] that central components of thought, such as propositions, are basically derived from the optimally constructed generative procedure … This Strong Minimalist Thesis (SMT) [would] reduce to the emergence of Merge, the evolution of conceptual atoms of the lexicon, the linkage to conceptual systems, and the mode of externalization.

B&C offer no hints as to how this complex of italicized Merge-related properties could have emerged via natural selection.
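To make the summary above concrete, the quoted description of Merge can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names, the category features, and above all the `HEAD_PRIORITY` ranking used to pick a label are hypothetical and are not taken from B&C, whose labeling algorithm is not specified in the passages quoted here.

```python
from dataclasses import dataclass
from typing import FrozenSet, Union


@dataclass(frozen=True)
class Atom:
    """A word-like atom carrying a category feature (hypothetical encoding)."""
    form: str
    category: str


@dataclass(frozen=True)
class Merged:
    """Product of merge: an unordered two-member set plus an assigned label."""
    members: FrozenSet["SyntacticObject"]
    label: str


SyntacticObject = Union[Atom, Merged]

# Assumed ranking used only when the toy search cannot find a unique atom.
HEAD_PRIORITY = {"V": 2, "N": 1, "D": 0}


def label_of(obj: SyntacticObject) -> str:
    """Return the category of an atom or the stored label of a merged object."""
    return obj.category if isinstance(obj, Atom) else obj.label


def find_label(x: SyntacticObject, y: SyntacticObject) -> str:
    """Toy stand-in for 'minimal search' for the head of {x, y}."""
    atoms = [obj for obj in (x, y) if isinstance(obj, Atom)]
    if len(atoms) == 1:
        # Exactly one atom: treat it as the head of the combination.
        return atoms[0].category
    # Two atoms or two phrases: fall back on the assumed priority ranking.
    return max((label_of(x), label_of(y)), key=lambda c: HEAD_PRIORITY.get(c, -1))


def merge(x: SyntacticObject, y: SyntacticObject) -> Merged:
    """Form the set {x, y} and assign it a label."""
    return Merged(members=frozenset({x, y}), label=find_label(x, y))


noun_phrase = merge(Atom("the", "D"), Atom("apples", "N"))
verb_phrase = merge(Atom("eat", "V"), noun_phrase)
```

Even in this toy form, the operation presupposes three separable ingredients: a stock of atoms with features, set formation, and a labeling procedure. Because the output is a set, `merge(x, y)` and `merge(y, x)` yield the same object; linear order has to come from elsewhere.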

2.4.2 Bickerton on the Transition from Protolanguage to Language

Bickerton’s account of UBL (his new version of “Universal Grammar” defined in 2014: chapter 5) starts with nouns and verbs, but with modifiers where the immediate context plus pragmatic considerations (rather than syntax) are sufficient to identify a specific referent. He assumes that a primate brain, once equipped with words, would eventually create algorithms for the construction of phrases (take a noun and add modifiers) and clauses (take a verb and add phrases).

UBL does not specify what could constitute a modifier or how long a string of modifiers could be, but Bickerton crucially invokes a biological prerequisite, namely episodic memory, the ability to recall past events. Moreover, utterance complexity depends on working memory, the ability to hold enough details of the utterance and its associated meaning. We must also add brain structures that support prediction, planning, and imagination, thus licensing the ability to talk about (possible and impossible) futures (for mental time travel, see Corballis 2018; Suddendorf & Corballis 2007). Assessment of these capabilities and neural underpinnings is clearly relevant to any attempt to synthesize compatible elements of UBL and MSH in further development of an evolutionary neurolinguistics.

Bickerton’s challenge is to chart a path from protolanguages in which words are only combined like beads-on-a-string, to languages in all their variations with sentences structured hierarchically. Interestingly, there is a link here to action more generally, but without the details explored by MSH:

[O]nce our ancestors started to talk at even a protolinguistic level, the brain would have begun to elaborate schemas for routinizing sentences (call them algorithms for sentence production, if you prefer), just as it elaborated schemas for throwing and similar actions. Stereotyping and automatization of action schemas save energy and free up neurons for other functions.

(Bickerton 2014: 129–130)

Bickerton (2014) abandons Merge for a somewhat related notion of Attach that allows semantics to enter into selecting what is to be attached. Bickerton’s “Attach A to B” is asymmetrical, with A being a modifier and B a head, so that B has priority over A. The key point is to replace “let’s put syntactically labeled words together” by “let’s put meaningful words together in a meaningful way,” expanding a message of relevance to speaker or hearer. As a variation on Attach, MSH (or construction grammars more generally) license building new constructions with varied slot fillers. However, what UBL and MSH share is that meaning is involved right from the start – “Attach A to B” requires that A express something relevant to B.¹¹

Yes, we can enjoy Jabberwocky given the freedom of syntactically licensed slots, but this is a jeu d’esprit that distracts us from the adaptive pressures that drove the emergence of languages in all their diversity. As further jeux d’esprit, we can note rhyme emerging in time, or alliteration with further iteration – but they are not drivers for the protohistory of language.

Bickerton invokes checking that compares features on the items to be attached to ensure that there is a match. A singular noun cannot accept a plural modifier (e.g., *several dog) – but we are deep in the process of cultural evolution here, where syntactic cues join semantic cues in controlling slot fillers. And not all languages require this check. This is why I insist that discovering general constructions and words together is important – getting the syntactic notion of plurality requires modifying the words within a previously established way to combine them.
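The asymmetry of “Attach A to B” and the feature check just described can be illustrated with a minimal Python sketch. It is not Bickerton’s formulation: the `Word` class, the `number` feature, and the `check` switch (standing in for languages that do not require agreement) are hypothetical names introduced here, and the requirement that A be semantically relevant to B is not modeled at all.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Word:
    """A word with optional features and the modifiers attached to it."""
    form: str
    features: Dict[str, str] = field(default_factory=dict)
    modifiers: List["Word"] = field(default_factory=list)


def features_match(modifier: Word, head: Word) -> bool:
    """True if every feature the two words share has the same value."""
    shared = modifier.features.keys() & head.features.keys()
    return all(modifier.features[name] == head.features[name] for name in shared)


def attach(modifier: Word, head: Word, check: bool = True) -> Word:
    """Attach modifier to head; the head keeps priority and is returned.

    With check=True, a feature clash blocks the attachment, as in *several dog.
    With check=False, the attachment always goes through (assumed behavior
    for a language without this agreement requirement).
    """
    if check and not features_match(modifier, head):
        raise ValueError(f"feature clash: *{modifier.form} {head.form}")
    head.modifiers.append(modifier)
    return head


several = Word("several", {"number": "plural"})
dogs = attach(several, Word("dogs", {"number": "plural"}))      # accepted
unchecked = attach(several, Word("dog", {"number": "singular"}), check=False)
```

Calling `attach(several, Word("dog", {"number": "singular"}))` with the default check raises `ValueError`, mirroring the starred example. Unlike set formation, the result is always the head with its list of modifiers extended, so the two arguments cannot be swapped without changing the outcome.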

Strangely, Bickerton at first seems to accept that the brain must begin sentence construction with some equivalent of what the Chomsky (1995: 225) of Minimalism calls the “Numeration,” the collecting of all the lexical items to be used in a sentence – a bag of words – followed by the operation “Select,” which begins by choosing from the numeration the first items to be merged into the derivation. The operation that Chomsky has described as “Merge” then follows. But Bickerton’s Attach is already pre-grouping words with the head they will modify.

But working with a bag of words, or several closed bags of words, is not what happens. We start with some idea of “what we want to say” and then we seek words and constructions that can express some part of the meaning. In experiments from my lab, Lee (2012) studied people describing visual scenes, recording a time series of both visual fixations and word utterances. He studied two conditions. Under time pressure to rapidly describe the scene while inspecting it, subjects would provide a sequence of fragmentary utterances; however, subjects invited to describe the scene only after a thorough inspection could produce well-formed sentences, and not use some of the extraneous words that were emitted under time pressure. We have developed Template Construction Grammar as the basis for computational models (Barrès 2017; Lee 2012) of how schemas for words and constructions may compete and cooperate (distributed rather than serial computation) to offer coverage for an assemblage of visual schemas providing the conceptual representation of the scene. The same fragment of a scene could, for example, yield the sequences “girl wearing a blue dress” or “young woman in blue,” employing different constructions and different “bags of words.”

Bickerton (2014: 147) concludes that his account of UBL provides the totality of Universal Grammar in the sense of specific computational mechanisms for generating syntax. There are also things that are vital to language that involve words and their meanings rather than structure.

Ideally all truly universal features of syntax should follow from the model described … Since it is universal, it serves nowadays as the common skeleton underlying all modern languages. It is what enables children to acquire any of the several thousand languages in the world today and, in appropriate circumstances, to build an entirely new language for themselves.

(p. 135)

But is the key evolute here “Attach” or “Constructions which constrain the nature of slot fillers”? And are there universal algorithms for phrases and clauses or, rather, a general capability to discover and learn (this is my view) new “algorithms” for deploying constructions in a meaningful way and to parse the resulting utterances when produced by others?

2.5 Pidgins and Creoles and the New-Look Language Bioprogram

We finally circle back to where Bickerton’s efforts started, the study of pidgins and creoles. Bickerton included pidgins in his pre-2009 definition of protolanguages (as well as the speech of young children and of aphasics, and the use of ASL signs by enculturated apes) but, as already noted, it is important not to conflate these with the study of the evolution of protolanguages when language and the associated knowledge do not yet exist. How the Brain Got Language (Arbib 2012) shows how the following fit into the MSH framework:

an account of how the child acquires language,
the emergence of Nicaraguan Sign Language (NSL) and the Al-Sayyid Bedouin Sign Language (ABSL),
the role of grammaticalization in historical linguistics, and
a brief treatment of pidgins and creoles.

Each takes place in the context of a language-rich community in which people offer examples of “what one can talk about” that can drive language acquisition or creation. It would be an interesting challenge to update the accounts of these topics in light of the above discussion and of post-2012 research, but here I just add a few notes that summarize the discussion of pidgins and creoles from How the Brain Got Language, with a few comments inspired by reading Bickerton (2014).

2.5.1 Nicaraguan Sign Language (NSL)

NSL began to emerge when deaf kids, previously separated in their homes and considered mentally challenged because their home sign (a family-generated gestural system, much more impoverished than a pidgin) enabled them to communicate only in very limited ways, were brought together in a school. There they began to share each other’s home signs. Some gained widespread use, while others did not. Regularization of these scattered home signs induced general patterns of sign use. However, these processes occurred within a language-rich environment. Even though the children could not pick up the patterns of spoken language, they did learn from other sign language communities and they could interact with Spanish-speaking teachers who expanded their conceptual sphere. Consider the rich cultural evolution that must be built upon to develop a sign for “Thursday” – it could not be mastered directly through the structures of spoken Spanish, but only through learning to understand the use of calendars and that Thursday is market day. Indeed, the subtitle of Laura Polich’s (2005) The Emergence of the Deaf Community in Nicaragua, “With Sign Language You Can Learn So Much,” is the poignant expression of a woman who had learned NSL.

A creole builds on portions of prior spoken languages (selects, modifies, combines, and extends these fragments, but does not equal the sum of these portions), whereas a signed language starts from home sign – but the hypothesis is that similar mechanisms of social coordination in communicating about a shared physical and social environment will apply, and no P&P-style UG is involved (though some argue to the contrary).

2.5.2 Pidgins and Creoles

A pidgin¹² is a grammatically simplified means of communication that develops between two or more groups that do not have a language in common: for example, in trade. The substrate language (or languages) is that of the native speakers who are obliged to interact with the “elite group” who speak what will become the superstrate language of the pidgin.

A creole is a stable, fully fledged language that develops from the mixed input of different languages – while perhaps adding novel syntax – into a new one within a fairly brief period. Some creoles develop from a pidgin. Creoles are acquired by children as their native languages. The lexicon of a creole language is largely based on the superstrate. However, the grammar of the creole may have new or unique features that differ substantially from those of the parent languages that underlay their emergence.

Bickerton (1984) assessed several different creoles and found that the innovative aspects of creole grammar were similar across all these creoles and (perhaps due to too limited a sample of different creoles) inferred that all creoles had (at least initially) similar grammars, an inference no longer widely accepted. He offered his Language Bioprogram Hypothesis to account for the suggested grammatical similarities that could not be related to the superstrate and substrate languages: In expanding the pidgin, children are hypothesized to use their inborn universal principles of language, the Language Bioprogram, which in 1984 was a P&P-like UG, but with the innovation that each parameter had a default setting to which such children reverted in the absence of external models.

For Bickerton, a pidgin is formed in stripping a language of its grammar, leaving only lexical categories, with no functional/grammatical categories, no syntax, no structured sentences, no meaningful word order, and no subordination. However, many pidgins arise in confrontations between colonial masters and native workers, so which language is being stripped? Can some word order, at least, be preserved? Certainly, semantic specificity must be preserved, or adjusted, including some nouns and verbs and their relation to objects and actions and some rudimentary word order and phrases of special relevance – and pointing, gesture, and pantomime would probably also play a role. Thus, in line with our language emergence theory, words might appear in specific but limited constructions, and not be sorted into general syntactic categories. Such phrases – learned as a whole rather than word by word – may, in the cultural evolution of the pidgin (whether or not the result is viewed as a creole), play a role equivalent to that of holophrases in the cultural evolution of protolanguages. A related point is made below about the young child acquiring a circumambient language.

My reading on creoles (years ago) suggests an influence of substrates that is not always dominant but is not negligible, either. Work is needed to calibrate this against what I have learned about emergent sign languages (NSL and ABSL) where the emergent grammar is indeed novel – but my point when I discussed them (Arbib 2012: chapter 12) was that what is to be signed (or said) is not created de novo but rather in the context of a circumambient language, providing the knowledge that language exists. This latter point is crucial.

What of Bickerton’s Language Bioprogram Hypothesis? With the UBL framework now in hand, Bickerton (2014: 219, my italics) asserts:

Although the theory presented here differs in a number of ways from the original Bioprogram …, the predictions it makes about creole languages do not differ substantively from the original predictions. Since children start to develop syntax with little regard for the language they are supposedly “learning,” and since their (pidgin) input contains little that would add to or run counter to their innate algorithms, they produce similar structures worldwide despite wide variety in the developed languages spoken around them. Since nature has equipped them with mechanisms capable of generating a complete language, the process of creolization can be completed in a single generation. Since once a language has been learned, it is no longer possible to directly access those mechanisms, only children can produce a creole language.

In other words, we don’t need P&P default parameters – we just need innate algorithms that “produce similar structures worldwide.” I think this is misleading.

Children learn about the world and how to act and interact within it even as they learn words. Crucially, much of what they learn takes the form of holophrases. However (unlike the case posited in MSH for the passage from protolanguage to words and constructions), here the holophrases involve repetition of a string of words, but the child does not yet comprehend the separate words, let alone their syntactic roles in the circumambient language. The crucial implication is that the relevant “universal principles” are not those posited in UBL. Rather, what is universal is the child’s ability to do the following:

(a) The child must extract from the caregiver’s speech stream a few words that seem relevant to the current situation – a statistical process that over time will enable the child to associate various holophrases with some patterns of being-in-the-world (Hill 1983).
(b) Only with further experience and further examples will the child perform the fractionation that gives the individual words cognitive status. In that process, the child will begin to extract certain simple constructions. The suggestion is that, when the child first says “wantmilk,” this is a holophrase, and only eventually will diverse examples yield the simple construction “want-X,” with X something wantable, like milk, or mummy, or teddy (a teddy bear).
(c) There is nothing at this stage that makes “want” a verb or “milk” a noun. Indeed, the recognition of these general categories (and their boundaries vary between languages) may await specific instruction in school.
(d) In general, the child will interact with multiple caregivers and, as they grow older, multiple other children. They will thus be exposed to varied ways of putting much the same words together to say the same thing. General principles of “economization of skill learning,” rather than universal syntactic algorithms, as well as automatization (and patterns of competition and cooperation) lead the child to complement a growing vocabulary with fluent employment of a focused set of constructions, rather than randomly repeating the diverse turns of phrase they have encountered. We see here the difference between episodic memory and procedural memory (Squire 2004; Squire & Wixted 2011) – contrast a tennis player recalling how she returned a ball at one moment of a particular game with deploying the skill to vary how she hits the ball depending on both its trajectory and the position and movement of the opposing player.
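The fractionation step in (b) can be illustrated with a toy Python sketch. It is a deliberate simplification and not a model from the literature cited here: holophrases are treated as unsegmented strings, a candidate construction is any sufficiently long prefix shared by two holophrases, and the function name and both thresholds are hypothetical choices made for this example.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, List


def common_prefix(a: str, b: str) -> str:
    """Return the longest string that both a and b start with."""
    length = 0
    for char_a, char_b in zip(a, b):
        if char_a != char_b:
            break
        length += 1
    return a[:length]


def induce_constructions(
    holophrases: Iterable[str],
    min_pivot_len: int = 3,
    min_fillers: int = 3,
) -> Dict[str, List[str]]:
    """Propose 'pivot-X' constructions from unsegmented holophrases.

    A shared prefix becomes a pivot only if it leaves a non-empty remainder
    in both holophrases; it is kept only once at least min_fillers distinct
    remainders (slot fillers) have been observed for it.
    """
    fillers_by_pivot = defaultdict(set)
    for a, b in combinations(sorted(set(holophrases)), 2):
        pivot = common_prefix(a, b)
        if min_pivot_len <= len(pivot) < min(len(a), len(b)):
            fillers_by_pivot[pivot].update({a[len(pivot):], b[len(pivot):]})
    return {
        pivot: sorted(fillers)
        for pivot, fillers in fillers_by_pivot.items()
        if len(fillers) >= min_fillers
    }


heard = ["wantmilk", "wantmummy", "wantteddy", "allgone"]
constructions = induce_constructions(heard)
# constructions == {"want": ["milk", "mummy", "teddy"]}
```

With only “wantmilk” and “wantmummy” available, the longest shared prefix is the spurious “wantm”; it is the third, more diverse example that isolates “want” with fillers milk, mummy, and teddy. Nothing in the procedure labels “want” as a verb or “milk” as a noun, in keeping with (c).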

In conclusion, it appears that the variety of creoles requires attention to such principles as (a)–(d) in a further iteration beyond Bickerton’s transition from a Bioprogram based on a P&P-style UG to one (here referred to as UBL) that bases language evolution and acquisition on “two major categories, those that concern entities (nouns) and those that concern actions, events, or states (verbs),” and simple algorithms for producing “phrases headed by nouns and clauses headed by verbs.” To what extent such a shift would be relevant to addressing recent data on pidgins and creoles, I must leave to the authors of other chapters in this book.

3 From Protolanguage to Deuterolanguage: The Importance of Compounds

3.1 Bickerton and Jackendoff on Protolanguage

In Adam’s Tongue (2009: 211, 232), Derek Bickerton comments on the development of protolinguistic signals into the complex, hierarchically structured sort of language that humans use today.* He asks both (a) how this occurred and (b) when this occurred. His answers are (a) ‘With the greatest difficulty’ and (b) ‘at the very earliest a couple of hundred thousand years ago’. Thus, for about a million years (he suggests), early humans got by with a rudimentary vocabulary and no syntax. My aim in this chapter is not to say anything about the difficult question of how and why syntax evolved – a question that I have tackled elsewhere (Carstairs-McCarthy 1999)¹ – but to explore the implications of a suggestion first put forward (so far as I know) by Jackendoff (2002, 2009), building on observations by Sadock (1998). This is the suggestion that aspects of protolanguage survive vigorously today not only in pidgins (as one might expect) but also in some natively acquired languages, specifically in the realm of compounding. I will call this kind of extended protolanguage ‘deuterolanguage’ (that is, ‘second language’). This is intended to distinguish it on the one hand from fully fledged contemporary syntax and on the other hand from protolanguage (‘first language’) in its original guise, as Bickerton envisaged it in Adam’s Tongue.

One point at issue will be whether Bickerton is right in denying that there could be any transitional or intermediate stage between protolanguage in its original guise and modern languages with their hierarchical syntax. The protolanguage of any group of early humans consisted at first only of a vocabulary of simple protowords (we can assume). But what happened when our ancestors began to produce utterances in which two or more protowords were strung together? One is tempted to think that the combinations must have been interpreted in a purely ad hoc fashion, with much reliance on contextual clues, until syntax slowly came to the rescue. But, if Bickerton is right, this metaphor of syntactic ‘rescue’ is misleading. As he puts it (2009: 234):

You either drive on the left or drive on the right – there’s no intermediate stage … In just the same way, you use either protolanguage – beads on a string – or real language – Merge with hierarchical structure. There could not have been … a series of changes in protolanguage that brought it gradually closer to real language: either an utterance is hierarchically structured or it isn’t.

Syntax has an all-or-nothing character (according to Bickerton), which helps to explain why early tool-using humans remained culturally static for so long (a million years or so).

I will suggest here that Bickerton’s view may be too black and white. I am happy to agree that, whatever kind of structure protolinguistic utterances may have possessed, it was not the ancestor of modern syntax. But there is something else that this protolinguistic structure may be the ancestor of, namely a certain kind of compounding (illustrated by snowman and blackboard) that Jackendoff calls ‘a possible protolinguistic “fossil” in English’ (2002: 249). Elsewhere, Jackendoff expresses the same idea thus: ‘compounding takes on an unexpected status in grammatical theory. It is not some odd peripheral aspect of morphology; it is a system that reveals some of the evolutionary roots of modern language, as it were a coelacanth of grammar’ (2009: 114). A coelacanth is a fish discovered in 1938 that resembles the fossilised marine ancestors of modern land vertebrates, and its linguistic analogue would thus be a hangover from a pre-syntactic variety of language in which word-class labels such as ‘adjective’ and ‘verb’ are inappropriate.

Let us suppose, for the sake of argument, that Jackendoff’s ‘coelacanth’ suggestion is on the right track, and explore the implications of combining it with Bickerton’s contrast between ‘real language’ and ‘beads on a string’. We will expect to observe that compounding (or, at least, a significant subset of the phenomena that linguists classify under the head of ‘compounding’) has the following characteristics:

A. Within complex items, a hierarchical structure that is less clear-cut than that of syntax (as expected, given its distinct historical origin).
B. A bias towards items with nominal rather than verbal or adjectival interpretations (assuming that items of protolanguage would typically have designated things rather than actions or characteristics).²
C. A reluctance to admit modifiers of the kind appropriate for verbs and adjectives syntactically, even when verbal or adjectival elements are present.
D. Reliance on simple juxtaposition, with no structural signalling by means of affixation or stem change.

Each of these characteristics will be illustrated in due course, by way of comparison with actual or hypothetical linguistic patterns that lack them.

My discussion will focus mainly on English, with just a few glances at other languages. This may seem a serious limitation. However, I shall suggest that English compounding turns out to display characteristics which are pretty much what one would expect on the basis of Jackendoff’s suggestion (mentioned earlier) about protolinguistic fossils, but which are otherwise difficult to make sense of – and which are not echoed in the superficially similar compounding habits of closely related Germanic languages. This suggests that my guess may be on the right track, even though (needless to say) much more work on other languages would be needed to confirm it.

3.2 Lexically Listed Items and ‘Compounds’: An Apparent Confusion

For about seventy years, linguistic theory has been bedevilled by an ambiguity in the term ‘lexical’. This has been used to mean (a) ‘pertaining to words’ and (b) ‘semantically opaque, hence needing to be listed in a lexicon or dictionary’. The ambiguity has been tolerated because, first, most items that most linguists want to call ‘words’ (or, more technically, ‘lexemes’³) are indeed semantically opaque, and second, most semantically opaque items are, in syntactic terms, not phrases or clauses but words. Yet there are numerous exceptions to both these generalisations.⁴ Many words (or lexemes) are semantically entirely transparent: moneciousness, unfortuitous and post-Augustan, for example, are not listed in the Shorter Oxford Dictionary (2002 edition) and do not need to be, because their meanings are predictable on the basis of those of monecious, fortuitous and Augustan. Conversely, many semantically opaque items, even though they are not words (or, at least, not obviously so), must be listed somehow in any comprehensive lexicon. Here are some examples:

(1)
a. red herring ‘irrelevant topic’
as in Mike’s drinking habits are a total red herring.

b. lady-in-waiting ‘woman who attends a queen at court’
as in Despite family pressure, she refused to be a lady-in-waiting.

c. keep tabs on ‘monitor, keep under surveillance’
as in His wife keeps tabs on all his expenditure.

d. take a shine to ‘be attracted to’
as in Jeff took a shine to Joe as soon as they met.

e. not be a patch on ‘not be nearly as good as’
as in Smith isn’t a patch on Jones as a pianist.

f. go under ‘fail commercially’
as in No one expected Lehman Brothers to go under.

My qualification ‘not obviously so’ is necessary because of the widespread view that any noun-headed collocation, if its meaning is not transparent, must be classified as a compound – that is, as a word rather than a phrase – even if its internal structure appears syntactic rather than morphological. Thus, for example, Borer (2009: 491) assumes that, in Hebrew, there is a grammatical distinction between beyt sar ‘minister’s house’ (literally ‘house of minister’) and beyt sefer ‘school’ (literally ‘house of book’): the former, being semantically transparent, is a phrase (she says), while the latter, because it is semantically opaque, must be a compound word. Similarly, Kornfeld (2009: 442–3), discussing semantically opaque Spanish collocations such as canción de cuna ‘lullaby’ (literally ‘song of cot’) and ojo de buey ‘porthole’ (literally ‘eye of bull’), classifies them as compound nouns, even though their internal structure is that of a noun phrase with a head noun modified by a prepositional phrase.

A style of analysis that favours ‘compound’ over ‘phrase’ is adopted also by Bauer, Lieber and Plag in their encyclopedic discussion of English morphology (2013). They observe correctly that, in English, many NN (i.e. noun-noun) collocations are stressed on the right, e.g. desert island, apple pie, main road, while many others, e.g. traffic island, banana bread, side-street, are stressed on the left. Traditionally, the former have been called ‘phrases’ while the latter have been called ‘compound words’. But since the only criterion for this supposed grammatical distinction is a phonological one (the position of the stress), Bauer and his colleagues argue that, for grammatical purposes, all these should be classified in the same way, namely as ‘compound nouns’.

One can certainly see the logic of this. Yet Bauer, Lieber and Plag do not apply the same reasoning in respect of AN (adjective-noun) collocations. Those that are stressed on the left, such as greenhouse, heavyweight and blackbird, are indeed categorised as compound nouns, but all those that are stressed on the right, even if semantically more or less opaque (such as green tea, light railway, black ice, hard cash, open secret) are categorised as not compounds but phrases, albeit ‘lexicalised’. This applies to all such right-stressed AN collocations, even ones that are entirely opaque, such as white elephant ‘unwanted object’, red herring ‘irrelevant topic’ and French letter ‘condom’.

The inherited confusion surrounding the term ‘lexical’ may be exercising an influence here, it seems fair to say.⁵ What’s more, Bauer, Lieber and Plag’s phrasal analysis for white elephant and red herring leads to an anomaly that they do not comment on. If all AN collocations deserve to be analysed grammatically in the same way, no matter whether they are interpreted in an opaque (‘lexicalised’) sense or in a transparent (‘literal’) sense, one will expect them all to be open to the same sort of grammatical modification. But this is not what we find. Consider the following contrasts, involving adverbial modifiers (completely, entirely) versus adjectival ones (complete, entire):

(2)
a. completely red herring not ‘completely irrelevant topic’
but ‘herring that is red all over’

b. complete red herring either ‘completely irrelevant topic’
or ‘red herring with no parts missing’

(3)
a. entirely white elephant not ‘entirely unwanted object’
but ‘elephant that is white all over’

b. entire white elephant either ‘entirely unwanted object’
or ‘white elephant with no parts missing’

The glosses supplied show that adverbial and adjectival modifiers are by no means interchangeable. It seems that an adverbial modifier (completely, entirely) is acceptable only if the adjective that it modifies is interpreted literally. If the whole noun phrase has its idiomatic interpretation, the adjective within it cannot be modified by an adverb – even if such modification would appear compatible with the idiomatic sense. We have here an illustration of characteristic C: completely and entirely, even though they are syntactically appropriate as modifiers for adjectives, are not allowed to modify adjectival elements in compound-like collocations that are semantically opaque.

What are we to make of this confusing state of affairs? Alert readers may already be puzzled as to why I labelled this section of the article ‘an apparent confusion’. The implication is that it may after all be appropriate to follow Borer and Kornfeld in assigning a special status to semantically opaque collocations but not to semantically transparent ones that are structurally parallel (though, in saying this, I am putting aside the issue of whether ‘compound’ is the appropriate label). Perhaps, for the same reason, Bauer, Lieber and Plag are wrong in denying ‘compound’ status to end-stressed collocations whose first element is an adjective (such as white elephant). After all, if Jackendoff is right in classifying some compounds as ‘coelacanths’ (hangovers from a pre-syntactic variety of language in which part-of-speech labels such as ‘adjective’ are inappropriate), then to assign a syntactic label such as ‘adjective’ to white in the opaque collocation white elephant meaning ‘unwanted object’ is strictly inappropriate. From this, the unacceptability of entirely as a modifier of white in this collocation follows directly. This unacceptability is surprising from the point of view of contemporary English syntax; but, if we take the ‘coelacanth’ idea seriously, the internal structure of white elephant ‘unwanted object’ is not really syntactic. Likewise, it may not be outrageous to suggest that the semantic opacity of the items beyt sefer ‘school’ (Hebrew) and ojo de buey ‘porthole’ (Spanish) entails that their structure is syntactic only in a superficial sense. Insofar as their meaning is not clearly derivable from their structure, they display characteristic A.

3.3 Non-nouns inside Compounds, and Restrictions on Their Modification

VN compounds such as killjoy and pickpocket are notoriously rare in English. They are commoner in French, where we observe examples such as essuie-mains ‘hand towel’ (literally ‘wipe-hands’) and tire-bouchon ‘corkscrew’ (literally ‘pull-cork’) (Fradin Reference Fradin, Lieber and Štekauer2009: 422). But can the verbal element in such compounds be modified? If not, why not?

To illustrate what I mean, I offer two established verb-initial compounds in (4) and two hypothetical ones in (5):

(4)
a. gratte-ciel ‘sky-scraper’ (literally ‘scrape-sky’)
b. passe-partout ‘skeleton key’ (literally ‘pass-everywhere’)

(5)
a. ne-gratte-guère-ciel ‘medium-rise building’ (literally ‘scarcely-scrape-sky’)
b. passe-presque-partout ‘key that opens most locks’ (literally ‘pass-almost-everywhere’)

The meanings suggested for the examples at (5) seem plausible. Semantically, it looks as if there should be nothing wrong with these hypothetical compounds. Yet they do not exist; what’s more, one feels that they could not exist. Clearly, this is not because the inserted elements ne … guère and presque are inappropriate as syntactic modifiers of gratte and partout respectively. So the situation is parallel to that of white and red in white elephant and red herring. If the two-word expression has its opaque interpretation, then the non-nominal elements cannot be modified. The element gratte in gratte-ciel may look like a verb. If, however, this is indeed a deuterolinguistic context (a hangover from protolanguage), it can contain no verbs; therefore, a modifier which would be appropriate for verbs in normal syntactic contexts cannot occur.

Once again, characteristic C makes sense of this, on the assumption that French compounds, like English ones, have deuterolinguistic features. It is true that for English completely white elephant, a straightforward syntactic interpretation is available, yielding the sense ‘elephant that is white all over’, whereas no such syntactic interpretation is available for the French collocation ne-gratte-guère-ciel. But this difference has nothing to do with deuterolanguage; it is explicable in terms of contemporary French syntax. It is simply that, in a sentence such as Ce bâtiment-là ne gratte guère le ciel ‘That building scarcely scrapes the sky’, the determiner le is essential for well-formedness.

3.4 Deuterolanguage Untamed: Law Degree Language Requirement Changes and Cup Bid Floats

Chomsky and Halle’s (Reference Chomsky and Halle1968) treatment of English stress has been enormously influential. It is also enormously complex – except as regards compounds. In compounds (we are told), the right branch is strong (that is, stressed) if and only if it branches (that is, if and only if it is itself a compound). It follows that the compounds láw degree and lánguage requirement are stressed on the left, as indicated – bearing in mind that requirement, even though morphologically complex, is a derived form, not a compound. (It also follows that right-stressed collocations such as desert ísland and apple píe are, by Chomsky and Halle’s definition, not compounds, even if they are semantically opaque, as white élephant is.) If we combine law degree and requirement, so as to form the compound [[láw degree] requirement], Chomsky and Halle’s rule predicts correctly that stress remains on the left. However, if the right-hand element requirement is replaced by lánguage requirement, which itself branches, then the main stress of the whole switches to this right-hand element. Furthermore, the stress remains on lánguage requirement when the non-compound (and hence non-branching) element change is added, so as to yield a single word (albeit lexically unlisted and internally complex) law degree lánguage requirement change meaning ‘change in the requirements relating to knowledge of languages for a degree in law’.
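The compound stress rule as just paraphrased is simple enough to be written out mechanically. The following Python fragment is a minimal sketch only: the nested-pair representation of compounds and the function names branches and main_stress are assumptions made for this illustration, not part of Chomsky and Halle’s own formalism, and the bracketings are those implied by the discussion above.

```python
# A compound is either a single word (a str) or a pair (left, right) of
# compounds. This nested-tuple encoding is an illustrative assumption.

def branches(node):
    """A node branches if it is itself a compound, i.e. a (left, right) pair."""
    return isinstance(node, tuple)

def main_stress(node):
    """Return the word bearing main stress under the rule as stated above:
    the right branch is strong if and only if it branches."""
    if not branches(node):
        return node
    left, right = node
    strong = right if branches(right) else left
    return main_stress(strong)

law_degree = ("law", "degree")
language_requirement = ("language", "requirement")

examples = [
    law_degree,
    (law_degree, "requirement"),
    (law_degree, language_requirement),
    ((law_degree, language_requirement), "change"),
]

for compound in examples:
    print(compound, "->", main_stress(compound))
```

On these four bracketings the sketch selects law, law, language and language respectively, in line with the stress placements described above. Right-stressed collocations such as apple pie fall outside its scope, since by Chomsky and Halle’s definition they are not compounds at all.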

An example of similar complexity cited by Jackendoff (Reference Jackendoff, Lieber and Štekauer2009: 111) is at (6):

(6) inflectional morphology instruction manual software programming course

Naïve students have no difficulty understanding what this elaborate compound word means, says Jackendoff – but (a big but!) only if it is built up gradually by way of its components inflectional morphology, instruction manual and software programming. By contrast, no such gradual build-up is needed in order to understand a phrase with the same meaning:

(7) course in programming the software that accompanies manuals that teach inflectional morphology

We face a paradox. English speakers’ intuitions on stress placement in elaborate compounds such as law degree … changes and (6) are robust. Our brains know how to handle these linguistic constructs, it seems, even if we have never heard them before. At the same time, as Jackendoff (Reference Jackendoff, Lieber and Štekauer2009: 111) puts it, ‘the productivity [of compounding] is rather fragile by the usual syntactic standards’. In terms of my list of four characteristics, this illustrates characteristic A (the lack of a clear-cut hierarchical structure). What are we to make of this situation?

Jackendoff’s ‘coelacanth’ suggestion supplies an answer. Before there was syntax, strings of protowords could be combined into ‘compounds’ which would extend the vocabulary – yet be interpreted only on an ad hoc, individual basis, unaided (or, perhaps one should say, uninhibited) by any productive principle. And it is not hard to find less elaborate examples of this in contemporary English. Consider these sets of ‘compounds’:

(8)
a. butterfly net, drag-net, fishnet, hairnet, mosquito net
b. hairnet, hairpin, hairspray, hair-restorer
c. fly-paper, sandpaper, newspaper, rice paper
d. schoolboy, barrow-boy, paper-boy, ballboy, water-boy, tallboy
e. water-boy, water-wheel, water-spout, water-snake

All these items are well established in English. But is there any consistent way of interpreting them, on first hearing? Clearly not. Butterfly nets and mosquito nets are nets that have something to do with butterflies and mosquitos respectively, but knowing precisely what the relationship is depends on knowing the different ways in which these insects impinge on humans. A tallboy is indeed something tall, but it is not a human being of any kind; rather, it is a tall chest of drawers on legs, or else a combination of a chest of drawers and a clothes cupboard.

The item hair-restorer seems, at first sight, a counterexample to what I have just said. It can be confidently interpreted, surely, as denoting a substance (an ointment, perhaps) that restores hair. It thus resembles truck-driver ‘someone who drives trucks’: a classic example of a ‘synthetic’ or de-verbal compound of the kind discussed extensively in the early years of generative morphology (e.g. Roeper & Siegel Reference Roeper and Siegel1978; Selkirk Reference Selkirk1982; Lieber Reference Lieber1983). Yet consider the following three examples:

(9)
a. My uncle is going bald, so he is experimenting with some hair-restorer.
b. My uncle is going bald, so he is experimenting with some ointment to try to make his hair grow back.
c. ?My uncle is going bald, so he is experimenting with some ointment to try to restore his hair.

While hair-restorer as a noun raises no eyebrows, the phrase restore hair sounds awkward and unidiomatic. A similar anomaly arises with truck-driver and drive trucks: a person who hires a truck for a day to move some furniture does not thereby become a truck-driver, not even if she or he proceeds to hire two or more trucks on successive days. In fact, it is not hard to find examples of mismatches in both directions: verb phrases that lack a corresponding de-verbal compound, and supposedly de-verbal compounds for which the corresponding verbal expression sounds awkward. Here is a list, drawn from Carstairs-McCarthy (Reference Carstairs-McCarthy1992: 119):

(10)
a.
  take offence                      ?offence-taking
  dwell on misfortune               ?misfortune-dweller
  give a cheer                      ?cheer-giving
  race to the finish                ?finish-racer
  keep a mistress                   ?mistress-keeper
  deliver a verdict                 ?verdict-delivery
b.
  profit-taking                     ?take profits
  slum-dweller                      ?dwell in a slum
  care-giver                        ?give care
  motor-racing                      ?race (in) motors
  door-keeper                       ?keep the door
  time-keeping (i.e. punctuality)   ?keep time (i.e. be punctual)

What we observe here is precisely the sort of lack of consistency that we should expect if well-formed nominal expressions such as in (10b) are not (despite appearances) related grammatically to well-formed verbal expressions such as in (10a). In other words, what we observe is what we expect if, on the one hand, Bickerton is right about the absence of any gradual transition between protolanguage and syntax and, on the other hand, Jackendoff is right about compounding as a coelacanth – an outgrowth of protolanguage in a non-syntactic direction.

The classic nominal collocation law degree language requirement changes may seem, at first sight, a counterexample. Is its meaning not perfectly predictable from that of its component parts? But my characterisation of deuterolanguage does not insist that such expressions should be opaque in every instance. It merely gives us reason not to be surprised by the degree of opacity that we so frequently observe, alongside seemingly random gaps in productivity. This opacity and gappiness is much more pervasive than would be expected if synthetic compound formation were guided by the syntactic faculty that is available to fully modern humans.

As Sadock (Reference Sadock, Lapointe, Brentari and Farrell1998: 164–5) observes, newspaper headlines are a fertile source of the kind of ad hoc nominal formation that we have been looking at. Usually such headlines are readily interpretable; after all, if they were not, a reader would react with a puzzled shrug and move to another story. But The Press newspaper of Christchurch (New Zealand) provides a counterexample. As I have put it elsewhere (Carstairs-McCarthy Reference Carstairs-McCarthy2018: 101–2):

It is unlikely that any readers have previously encountered the compound cup bid float, and unlikely too that many readers, having now encountered it, will be able to hazard much of a guess at its meaning. Yet it is a word that actually appeared in The Press newspaper in Christchurch, New Zealand, on 14 April 1994. The Press had a column on the front page summarising the main stories on inside pages. The words cup bid float appeared as the headline for one of these summaries, which continued: ‘New Zealanders will be offered the chance to buy shares in the company that will finance yachtsman Chris Dickson’s bid to win the America’s Cup next year.’ With just that much contextual information, the interpretation of the enigmatic headline becomes clear. Cup denotes the America’s Cup, bid denotes an attempt to win it (bid being an alternative to attempt that is favoured in newspaper headlines for the sake of brevity), and float refers to the floating of a limited company, i.e. the offering of shares in it on the share market. The fact that the headline cannot be interpreted without the help of the paragraph that it introduces hardly matters, from the journalist’s point of view; it has served its purpose if it has persuaded readers to read on.

What this illustrates is a further role that juxtapositional ‘compounding’ of nominal elements, of a kind that I have called ‘deuterolinguistic’, can play, even in the context of a fully modern language such as English. The meaning of cup bid float is neither institutionalised (since it is a nonce formation) nor transparent. Yet it can still serve a communicative purpose, by stimulating readers’ curiosity: ‘What can that be about?’ they ask themselves. The editor of The Press, without realising it, is exploiting a primitive pre-syntactic resource that survives vigorously in English.

3.5 Deuterolanguage Tamed: Evidence from Germanic Languages

Once, in conversation with a linguist whose native language was Swedish, I mentioned the contrast in English between tóy factory (stressed on toy), meaning ‘factory that manufactures toys’ and toy fáctory (stressed on factory), meaning ‘factory that is a toy’ (e.g. in a model town). She commented that English was a remarkable language. Presumably no such contrast, involving only stress placement, would be possible in Swedish. Could it be that deuterolinguistic relics are less prominent in other Germanic languages than in English?

Traditionally, Germanic languages are seen as much alike in their tolerance of elaborate nominal compounds. But, as I have just noted, a contrast such as that between toy fáctory and tóy factory may be peculiarly English. Let us suppose that this sort of contrast is a deuterolinguistic residue: a somewhat haphazard contrast between two ways of stringing beads (to use Bickerton’s metaphor). In that case, other Germanic languages, inasmuch as they do not exhibit such contrasts, may exhibit styles of compounding that have moved further away from deuterolanguage. So it is not an accident, perhaps, that, in other Germanic languages, compounding frequently involves more than mere juxtaposition – in other words, these languages are less consistent in regard to characteristic D. Don (Reference Don, Lieber and Štekauer2009: 370) illustrates this in relation to Dutch, constructing the novel Dutch compound at (11):Footnote ⁶

(11)
Weersvoorspellingsdeskundigencongres
weer[s].voorspelling[s].deskundige[n].congres
‘weather forecasting experts’ conference’

As can be seen, this contains three linking elements, placed by Don in square brackets. Can such linking elements always be analysed synchronically as markers of case (e.g. genitive) or number? It would seem not. For example, boekhandel ‘bookshop’ and boekenkast ‘bookcase’ differ in that only the latter contains the linking element -en-, even though both bookshops and bookcases are typically associated with more than one book, and hondehok ‘dog-kennel’ contains the plural-like element -e- even though a kennel typically contains only one dog.

It is true that, even in English, there are complex words which would traditionally be labelled ‘compounds’ and which contain linking elements, such as lambswool, beeswax, tradespeople and calvesfoot (Plank Reference Plank and Drachman1976: 204–5). But these examples, says Plank, ‘have to be regarded as lexical relics. … The situation is different in Modern German, Afrikaans, Dutch and Swedish.’ In German, for example, we observe a variety of linking elements in compounds (-e-, -en-, -s-) which bear a superficial resemblance to nominal inflectional affixes but which, in compounds, frequently accompany nominal components for which they are inappropriate inflectionally: for example, Anstalt-s-direktor ‘director of an institution’, where Anstalt is feminine and would therefore never exhibit -s as an inflectional suffix. Plank concludes (Reference Plank and Drachman1976: 209–10): ‘There simply are no general necessary or sufficient conditions for the choice of one and the avoidance of another juncture suffix in German, Swedish, Dutch and Afrikaans.’ He reports experiments with children aged 7 and 10 asked to form novel compounds in German, in which considerable variation was observed. It is true that elaborate novel compounds can be formed in Germanic languages other than English. But the crucial point for our present purposes is that it is hard to visualise analogues in them of the sort of juxtapositional exuberance exhibited by the real-life English example cup bid float, much less Jackendoff’s contrived example inflectional morphology instruction manual software programming course with its seven word-level components.

3.6 Characteristic B and the Truck-Driver Problem

I have alluded already to English compound nouns such as truck-driver and motor-racing. These attracted attention in the early days of generative morphology because of the non-existence of the verbs from which one might have expected them to be derived: verbs such as *truck-drive and *motor-race. I will not attempt here to summarise this earlier work.Footnote ⁷ I will merely observe that, if a verb such as truck-drive did exist in English, it would not (according to my criteria) be an instance of deuterolanguage, precisely because it would be a verb rather than a noun, thereby violating characteristic B.

Does this mean that no language could ever have a verb analogous in structure to truck-drive, with the meaning ‘drive trucks’? That would be a rash assertion, and almost certainly false. One thinks of Inuit languages, which (broadly speaking) use morphology to do much of the grammatical work that in most languages is done by syntax. What it does mean, however, is that the sort of elaborate morphology that Inuit languages possess is not likely to have anything to do with deuterolanguage.

Bickerton envisages a clear break between ‘beads-on-a-string’ protolanguage and modern languages with hierarchical syntactic structure. Yet modern languages may exhibit not only syntactic but also morphological structure. Why two patterns of organisation, not just one? I have suggested possible reasons elsewhere (Carstairs-McCarthy Reference Carstairs-McCarthy2010). For present purposes, what matters is that, if compounding of the English type ‘is not some odd peripheral aspect of morphology’ (Jackendoff Reference Jackendoff, Lieber and Štekauer2009: 114), then one should not expect its behaviour to be explicated by morphology, if that term denotes a pattern of grammar that exists in most if not all modern languages alongside syntax.

3.7 A Final Note: Idiomaticity and the Two Meanings of ‘Lexical’

In section 3.2 I pointed out the confusing double sense that the word ‘lexical’ acquired among linguists during the twentieth century. I could have quoted also Fradin’s blunt criticism of the many linguists who ‘confuse compounding with idiomaticity’ (Reference Fradin, Lieber and Štekauer2009: 42). But perhaps we should consider the possibility of two-way traffic. So far, we have concentrated on the fact that deuterolinguistic items, lacking the interpretative clarity that syntax provides, naturally acquire idiomatic meanings. What if items with idiomatic meanings, just in virtue of their idiomaticity, are readily treated by the brain as deuterolinguistic? Do idioms have a greater tendency to violate normal syntactic and morphological rules than similarly structured non-idiomatic expressions do?

Consider the following examples:

(12)
a. his take-it-or-leave-it response
b. his couldn’t-care-less response
c. his sorry-I-can’t-help response
d. his I-knew-that-already response
e. his apply-to-my-boss response

(13)
a. a dog-in-the-manger attitude
b. a generous-to-a-fault attitude
c. a selfish-in-the-extreme attitude
d. a friendly-to-the-neighbours attitude
e. a desperate-for-success attitude

Something that these two sets of examples have in common is a clear contrast with respect to acceptability between the first and the last (examples (a) and (e)). Furthermore, the modifiers at (a), take it or leave it meaning ‘make up your mind immediately’ and dog in the manger meaning ‘selfish person’, are clearly idiomatic. By contrast, at (e), apply to my boss and desperate for success are perfectly well-formed syntactically but have only a literal meaning, not an idiomatic one – and it is surely not an accident that (12e) and (13e) belong at the unacceptable end of the scale. In other words, ill-formed prenominal modifiers are acceptable only if they are idiomatic.Footnote ⁸

As for the examples (12b) and (13b), they sound to me not quite so natural as (12a) and (13a), yet not so clearly unnatural as the (c) and (d) examples. Why? Because (I suggest) couldn’t care less and generous to a fault are both idiomatic in the sense that they are clichés, whose elements cannot be readily substituted; thus, for example, couldn’t worry less and couldn’t be less bothered both sound unnatural, even though their meaning is clear, and the same applies to kind to a fault and welcoming to a fault.

What relevance does Bickerton’s work on language evolution have to the study of idioms and clichés? None, one might think. I am not aware that Bickerton himself ever considered any such connection. But the argument sketched here, invoking work by Jackendoff and Sadock, suggests that a connection may indeed exist. Perhaps (pace Fradin) compounding and idiomaticity are indeed linked, as a residue of the tortuous route by which language evolved.Footnote ⁹

4 The SOV Mystery and Language Evolution

4.1 Introduction

It is quite senseless to raise the problem of explaining the evolution of human language from more primitive systems of communication that appear at lower levels of intellectual capacity.

Chomsky (1968), Language and Mind

It strains credulity to pretend that language as we know it suddenly sprang up intact as a cultural invention in the absence of extensive cognitive and communicative pre-adaptation.

Lamendella (1976), Subject and Topic

There are good reasons for suspecting that the synchronic structure of language cannot be understood without reference to its prior as well as ongoing developmental trajectory.Footnote ¹ While the discussion has usually focused on diachrony and the way it keeps shaping and reshaping the grammars of individual languages, one may also note intriguing parallels between diachrony and ontogeny. In this chapter I would like to extend the discussion to the third grand developmental trend that has shaped human language, evolution.Footnote ²

While the three developmental trends may display some striking parallels, they have shaped human language in radically different ways, in radically different contexts, and along radically different time frames.

Diachrony: Traditionally assumed to span long stretches of historical time, diachrony is in fact the concatenation of multiple instances of online individual communicative behavior. During each instance, speakers modify their language minutely and subconsciously. The traditionally presumed historical macro-frame of centuries or millennia is but the cumulation of multiple micro-changes that take place during successive instances of interpersonal communication. The gradual accretion of such micro-changes bears the most direct responsibility for the current synchronic state of each language, thus also for cross-language typological diversity.
Ontogeny: The relevant time frame for language ontogeny is the period of cognitive and linguistic growth and maturation of individual speakers. But the end product, the way each mature individual communicates, must fall within the bounds of variation acceptable to the adult speech community. The effect of child language development on the synchronic structure of each language is thus limited, due to the power imbalance between adults and children, so that the adult model most often prevails in early language acquisition.
Evolution: The relevant time frame of language evolution spans the ca. 7 million years since the genus Homo split from its putative primate ancestors. Subsequent hominid evolution is responsible for what is common to all human languages – so-called language universals. However, such universals may be best expressed as constraints on development; that is, on the possible diachronic changes that shape individual languages, as well as on the course of language ontogeny.

How these three developmental trends interact, and the mechanisms that shape the striking parallels between them, is a foundational question whose resolution will only be hinted at here.

Chomsky’s rejection of human language as the product of gradual, adaptive Darwinian evolution rooted firmly in pre-human communication could have been motivated, at least in principle, by two distinct lines of reasoning:

(i) Methodological: Unlike the ample, fine-grained, graduated physical fossil record, no comparable record exists of the multiple evolutionary steps spanning the 7 million years between the communication of our chimp-like ancestors and current human language. Therefore, nothing useful, aside from idle speculation, can be said about gradual language evolution.
(ii) Theoretical: The evolution of human communication cannot be described in the familiar terms of Darwinian bio-evolution, with successive pre-adaptations piling up gradually one on top of the other. Rather, language evolution must have been a rare exception to the rest of evolutionary biology – a gapped, instantaneous leap (Hauser et al. Reference Hauser, Chomsky and Fitch2002).

As far as I can see, Chomsky’s position is clearly the theoretical (ii). However, its only discernible justification is, in fact, the lack of a fossil record, thus the methodological (i). But deriving the theoretical (ii) from the methodological (i) is a rank non sequitur. The main thrust of Chomsky’s stance is thus, manifestly, the Cartesian dualism of body and mind: In spite of the mounds of evidence suggesting that the evolution of human physical traits, including the cranium, was protracted and gradual, the evolution of human cognition and communication must have been, somehow, instantaneous and gapped.

While not as abundant as the evidence that supported Darwin’s initial conjecture of gradual biological evolution driven by adaptive selection, the evidence for gradual language evolution is not zero either. To wit:

Comparative evidence from the neurological, cognitive, social and communicative behavior of modern humans and their nearest primate relatives.
Ontogenetic evidence from the neurological, cognitive, behavioral, social and communicative development and maturation of modern human children.
Evidence from living relics – fossils of language – such as pidginization, creolization and Broca’s aphasia.
Analogical evidence from language diachrony.

These four lines of evidence, however indirect and incomplete, may nonetheless hint at the evolutionary process. And since we have no cogent theoretical reasons to assume that language evolution was the one glaring exception to the gradual, adaptively driven Darwinian model, the two alternatives left to us are:

Give up on understanding language evolution.
Use the available incomplete evidence to extrapolate plausible hypotheses that can then be evaluated and eventually tested.Footnote ³

One line of evidence that may yet prove most productive involves the combined database of cognitive neuroscience, neuro-linguistics and neuro-genetics. In the past few decades, it has become increasingly possible to map genes or gene clusters onto neurological structures. It has also become possible to elucidate, in an increasingly reliable way, the evolutionary history of our genome. Lastly, it has become increasingly possible to map cognitive and linguistic functions to their supporting neurological structures. Put together, these three advances may yet make it possible to elucidate the gradual course of language evolution.Footnote ⁴

Moving from the admittedly sketchy available data to coherent testable hypotheses requires a heady mix of extrapolation, analogical reasoning and abductive inference. All three have been denounced on occasion as speculative. They are nonetheless unimpeachable gambits in the tool kit of empirical science.Footnote ⁵

4.2 The Neo-recapitulationist Perspective

Parallels between ontogeny and phylogeny harken back, at least implicitly, to the biological works of Aristotle, who observed a gradual progression from simple to complex in both his classificatory work – his scala naturae – and his embryology. A more explicit statement of how ontogeny may recapitulate phylogeny is due to Haeckel (Reference Haeckel1874), and an early review of the issues involved may be found in Gould (Reference Gould1977). What is more, the current Evo-Devo or Epigenetic perspective on the unity of developmental trends (West-Eberhard Reference West-Eberhard2004; Tucker and Luu Reference Tucker and Luu2012) is a clear vindication – and elaboration – of the recapitulationist perspective in biology.

As Lamendella (Reference Lamendella, Harnad, Stelkis and Lancaster1976) pointed out, three features of Haeckel’s original formulation have been empirically disconfirmed:

The assumption that ontogenesis recapitulates the phylogenesis of adult traits; whereas the facts suggest that such recapitulation pertains to immature traits at corresponding levels of development.
The assumption that the recapitulation is full; whereas the facts suggest that it is at best partial.
The assumption that recapitulation is expressed at the level of the entire organism; whereas the facts suggest that it is expressed, selectively, at the level of individual organs.

Lamendella (Reference Lamendella, Harnad, Stelkis and Lancaster1976) also noted how the Cartesian cleavage between body and mind still haunts our discussion of recapitulation:

Most scholars have no problem accepting the notion of phylogenetic recapitulation of basic anatomical and physiological systems in the embryo, but there seems to be a general distaste for entertaining the idea that post-natal stages of human cognitive and linguistic information processing might also be a repetition of our species’ history …

In the same vein, Lamendella (ms.) noted the interaction between neuro-cognitive development and maturation, on the one hand, and the evolution of culture and culturally transmitted communication systems, on the other. Thus, protracted post-natal maturation, indeed neoteny – the extension of child-like traits to the adult phenotype – facilitates cultural transmission and learning:

The explanation of the biological utility of immature developmental stages lies partially in the further inverse relationship between the state of maturity at birth … and the potential for a species to rise above stereotyped, automatic responses to a limited range of specific sensory stimuli. Immaturity of neural systems that are nonetheless functional provides the developing individual with flexibility … to adapt to a highly variable environment … Maturation that is partially mediated by individual experience that directs neural growth in an appropriate direction, not only relieves the genetic code of the heavy burden of detailed specification, but also allows individual experience and learning to assume a prime role in the adaptation of both the individual and the group …

(Lamendella ms.: 47)

4.3 The SOV Mystery

In this section we will survey a range of facts about extant human languages that, I believe, constitute a typological relic of an earlier stage of language evolution. In most language families known today, this relic is well attested. In the vast majority of the others, it can be easily reconstructed from internal synchronic evidence to a time-depth going back no farther than 6,000–7,000 BC. Only in a small minority of the world’s languages is there no surviving internal evidence of this relic, due either to an earlier departure from the putative early stage or to a faster rate of subsequent diachronic change.

The facts as I see them may be summarized as follows:

The majority of known languages and language families exhibit SOV (subject-object-verb) syntax, and so far as one can tell have always been that way.Footnote ⁶ This includes major families such as Altaic, Turkic, Dravidian, Sino-Tibetan, Japanese, Cushitic, Sumerian, all Papua-New Guinea phyla, Khoi-San, Athabascan, Hokan and many others.
The overwhelming majority of languages or language families that do not currently exhibit SOV syntax still carry clear internal evidence in their morphosyntax that points toward a reconstructible SOV syntax at some earlier time. This group includes Indo-European, Uralic, Niger-Congo, Nilo-Saharan, Afro-Asiatic, Semitic, Iroquois, Siouan-Caddoan, Uto-Aztecan, Mayan and probably all other Amerindian and Australian language families.
Very few language families seem to exhibit no trace evidence of earlier OV syntax, most of them occupying one geographic corner: Thai-Kadai, Austronesian and Austro-Asiatic.
Most known instances of natural – non-contact-induced – word-order change, or drift, seem to suggest the drift of SOV > flexible word order > V-first > SVO.Footnote ⁷ A natural, non-contact-induced drift toward SOV order is extremely rare, with most exceptions turning out to uphold the rule.Footnote ⁸

While the evidence is not absolute, it is nigh-on overwhelming.Footnote ⁹ And the drift away from SOV syntax suggests that other word orders, particularly (S)VO, are more suited to the current evolutionary stage of human communication, in particular more amenable to the grammaticalization of the two sub-features of topicality – referent accessibility and referent importance.Footnote ¹⁰

As elsewhere in empirical science, it is facts that seem arbitrary and don’t cohere with the current theoretical framework that prompt the next cycle of theory building. I would like to open the discussion here by posing the two questions that will guide our investigation:

What was it in an earlier stage in the evolution of human culture, cognition and communication that prompted the first rigid word-order to grammaticalize as (S)OV?
What were the subsequent changes in human culture, cognition and communication that motivated the gradual drift away from OV syntax?

4.4 Extrapolation #1: Canine Communication

The data reported here are the cumulation of seven years of observation of one male Belgian Shepherd dog between October 1969 and August 1976. While informal and strictly qualitative, the observation was both intensive and extensive, tracking the subject’s communicative behavior with both canines and humans. Without excusing the informality of the method, its drawbacks were mitigated by the direct and near-constant personal access to the rich pragmatic context of the subject’s social and communicative behavior.

The summary below highlights the most salient features of canine – indeed pre-human primate – communication. Anybody who has interacted personally with our best friends will have no trouble recognizing the description. One must still justify the choice of canines over our closer primate kin. The best justification is that, in the main, my canine data closely match the data of primate communication in the wild.

4.4.1 Here and Now, You and I, This and That Visible

The most striking feature of canine communication is how firmly it is anchored in the current speech situation that is equally accessible to both speaker and hearer. The time is invariably now, the place is invariably here, and the referents are, invariably, either you and I or this and that perceptually accessible on the current scene.

(a) Time: Canine behavior strongly suggests ready access to long-term episodic memory of past experiences, as well as some mental representation of the immediate future. Their planning behavior hints at even longer-term representations of future action. But they never seem to communicate about objects or events in displaced time, only about those anchored in the present or the immediate future.
(b) Place: Dogs share our sub-cortical episodic (‘long term’) memory organs, the hippocampus and amygdala, and clearly have memory traces of remote objects and locations. Still, they seldom communicate spontaneously about such objects or locations, only about those present at the shared current scene. They seem, however, to understand human verbal references to salient concrete objects and persons away from the current scene.
(c) Referents: As noted above, dogs communicate primarily about referents present here and now. Those referents are invariably concrete entities (nouns), both animate and inanimate, or concrete activities (verbs). It is easy to teach dogs human vocabulary that codes such concrete referents, but nigh on impossible to teach them abstract vocabulary, or even concrete adjectives.Footnote ¹¹
(d) Animacy and agency: Dogs seem to make a clear distinction between animate and inanimate entities, and thus presumably have some notion, however implicit, of purposive action and agency. Their observational criteria for this distinction are, most likely: “Entities that can move spontaneously without an apparent external cause must possess some internal prompt (intention) and the power (agency) to execute motion – just like me.” There is, thus, no reason to assume that dogs lack the concept, however rudimentary, of cause-and-effect.
(e) Events: When the referent of a canine speech act is an action/activity (event), it is invariably concrete, as are the coded verbal concepts that humans can teach them.
(f) Speech acts: Spontaneous canine speech-acts are never informative (declarative, interrogative), but only manipulative (commands, requests). Nor do dogs seem to understand human declarative or interrogative speech acts.
(g) Speech-act participants: Canine behavior suggests that they must understand the difference between speaker and hearer, both in their own communication and in their interaction with humans. In most of their spontaneous speech acts that are directed at humans, dogs tag themselves as the beneficiary and the human interlocutor as the agent (‘you-H do this for me-C’). But they clearly understand human communication that reverses the two roles (‘you-C do this for me-H’).
(h) Mono-propositional discourse: While dogs’ planning behavior suggests that they can mentally represent coherent multi-propositional – multi-action, multi-event – information, the coherence scope of their speech-acts is strictly mono-propositional. They have no trouble interpreting ‘fetch the ball’, ‘sit’, or ‘roll over’ separately but seem baffled by the sequence ‘fetch the ball, then sit, then roll over’.

4.4.2 Sociocultural Context: The Society of Intimates

As Lamendella (1976) suggested, it makes little sense to talk about a communication system outside the sociocultural context within which it evolved and was designed to perform its adaptive tasks. The sociocultural context of canine communication is in most general features identical to that of nonhuman social primates and early hominids, as well as, within bounds, to the social context of early childhood. The context is that of the Society of Intimates, whose most salient characteristics are:Footnote ¹²

(a) Small social unit: The total group size for wild dogs is ca. 10–25 (van Lawick-Goodall and van Lawick 1971). The group size of chimpanzees in the wild is ca. 15–40 (Goodall 1965), with the variability due to the fluid fusion-fission pattern of the social group. The comparable group size for early hominids, including modern hunters and gatherers, is 25–150 (Dunbar 1992), though Marlowe (2005) gives the median as 25.
(b) Kin-based social organization and cooperation: The society of wild canines and many social primates is organized around families headed by a senior female, together with her female descendants and immature male and female progeny. Social cooperation is organized along kinship lines, with the social position of adult males varying considerably from species to species.
(c) Relatively homogeneous gene pool: With obvious provisions for exogamy, the social unit is composed of close blood relatives.
(d) Restricted territorial range: The canine daily foraging range is ca. 10 km, with a median total home range of 1,700 km². In comparison, the chimpanzee daily foraging range is 3–5 km, with a median total home range of 12 km². The comparable figures for modern human hunter-gatherers are a daily foraging range of 9–14 km, with a median total home range of 175 km², and an average of seven moves per year of the home camp beyond the daily foraging range (Marlowe 2005).
(e) Low rate of sociocultural change: Canine, primate and early human societies are/were extremely time-stable, displaying little cultural change within the lifetime of an individual.
(f) High informational stability and homogeneity: With restricted territorial range, a small and stable social group and a low rate of cultural change, information in the canine society of intimates is highly time-stable, and is distributed homogeneously among all members of the social group.

4.4.3 Types of Information

Information may be divided into two main components:

Generic information – what we all know, share and can take for granted as members of the same cultural group; knowledge about our shared physical, social and mental universe; what cognitive psychologists call semantic memory.
Specific information – what happens at specific times and places to specific persons, animals or objects; what changes, what is new, what psychologists call episodic memory.

Given the salient features of the Society of Intimates, (a) through (e) above, generic information in such a society is time-stable, predictable and universally shared among group members. And due to the group always being (and moving) together and sharing the same here-and-now scene, the bulk of specific new information about what happened and who did what to whom is equally shared. What is then left to communicate about in the Society of Intimates? What topics are neither taken for granted generically nor obvious situationally? What’s news?

There appear to be only three categories of adaptively vital information that is neither generically nor situationally shared among group members present at the here-and-now scene:

Specific internal mental states: fear, anger, arousal, pleasure, pain, hunger.
Specific intents to perform interpersonal acts: aggression, submission, friendliness, courtship.
Urgent external states: predator, prey, enemy.

These are precisely the most common signals communicated among group members in the canine and primate Society of Intimates.

4.4.4 A Note on Primate Communication

The study of chimpanzees, bonobos and other primates, both in the lab and in the wild, has grown exponentially since this paper was originally written,Footnote ¹³ with important works such as de Waal (1982), Cheney and Seyfarth (1990, 2007), Savage-Rumbaugh et al. (1993), Savage-Rumbaugh and Lewin (1993), de Waal and Lanting (1997), Tomasello and Call (1997), Boesch and Boesch-Achermann (2000), Zuberbühler (2000, 2001), Boesch (2002, 2005), Rumbaugh and Washburn (2003) and Tomasello (2009), among many others. There are, clearly, considerable neuro-cognitive, sociocultural and communicative differences between social canines and social primates, evolution having not stood still. Nonetheless, within bounds, the general parameters outlined above of the sociocultural and communicative ecology of the canine Society of Intimates match remarkably well those observed in social primates, both in the wild and in the lab.

4.5 Extrapolation #2: Early Child Language

4.5.1 Communicative Mode

Much of what is known about canine and primate communication and its sociocultural context rings familiar when one considers early childhood communication, from birth to ca. 2 years of age.Footnote ¹⁴ To summarize briefly:

(a) Here and now, you and I, this and that visible: Child communication between birth and ca. 2 years is overwhelmingly anchored in the here-and-now speech situation (Piaget 1952; Werner and Caplan 1963; Carter 1974; Bloom 1973; Scollon 1974, 1976; Bates 1974, 1976; Clark and Clark 1977; among many others).
(b) Speech acts: At the early stage of differentiated crying, starting ca. 2 weeks after birth, the child’s speech acts are exclusively manipulative, expressing requests for rectification of bothersome conditions such as hunger, pain, itching, discomfort and loneliness, or pleasure at their rectification (Carter 1974; Bates et al. 1975; Dore 1975; Bates 1978; Lamendella ms.). With the advent of the first words ca. 1 year of age, most of the child’s coded communication is still manipulative, with declaratives gradually phasing in and interrogatives lagging far behind (Givón 2009: chs. 6, 7, 8).
(c) Temporality: The time-axis of child speech acts in the first two years of life is, overwhelmingly, the present and immediate future, a fact fully consonant with the manipulative nature of the child’s early speech acts. Early declaratives and proto-declaratives, such as pointing and attention-directing gestures, are fully anchored in the here and now.
(d) Spatial deixis: Pointing at objects and persons is the earliest mode of lexical coding, conflating two distinct communicative gestures – attracting the interlocutor’s attention to the child, and simultaneously to the intended object (Carter 1974).
(e) Mono-propositional discourse: The coherence span of the child’s communicative turns during the first two years is overwhelmingly mono-propositional, expanding just before the advent of grammar (ca. 2 years) to conjoined clauses.
(f) Mode of complexity: During the early acquisition of grammar (ca. 2 years), the mode of increased utterance size and complexity is overwhelmingly that of conjunction (clause chaining). Hierarchic, subordinate clauses are phased in much later (Givón 2009: chs. 6, 7, 8).
(g) Coded lexicon: At the early stage of lexical acquisition, ca. 1 year of age, the child’s spontaneous one-word utterances, all standing for clausal information (state, event), are mostly concrete nouns. Even at age 16 months, when 20% of the utterances are longer than one word, only 18% of those words can be considered ‘predicates.’ A sample from Bloom’s (1973) corpus of a 1-year-old’s transcripts yields the following distribution:

(1) Word-types of a 1-year-old child

Category	N	%
object	54	30.5
location	35	19.0
adult human	9	5.0
interjection	13	7.5
“pivot”Footnote ¹⁵	36	20.0
predicate	33	18.0
total	180	100.0
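
As a cross-check on the arithmetic in (1), the following minimal Python sketch recomputes each category's share from the raw counts. The variable names are illustrative assumptions and are not part of Bloom's corpus. Recomputed shares may differ by a fraction of a point from the percentage column above, which is reproduced as given in the source.

```python
# Counts copied from table (1); the dictionary and variable names are
# illustrative only.
counts = {
    "object": 54,
    "location": 35,
    "adult human": 9,
    "interjection": 13,
    "pivot": 36,
    "predicate": 33,
}

# Total number of word types in the sample.
total = sum(counts.values())

# Share of each category as a percentage of all word types.
shares = {category: 100 * n / total for category, n in counts.items()}

# Print one row per category, then the totals row.
for category, pct in shares.items():
    print(f"{category:<12} {counts[category]:>3} {pct:5.1f}")
print(f"{'total':<12} {total:>3} {sum(shares.values()):5.1f}")
```

The sketch only restates the table; it makes no claim about how the original percentages were rounded.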

The nouns in Bloom’s transcripts are either objects of transitive clauses (O), as in (2c,d) below, or subjects of intransitive clauses (S), as in (2e), but seldom agents of transitive clauses (A). This absolutive distribution is highly significant and is no doubt due to the fact that the agent is almost invariably either the speaker or the hearer. Put another way, agents are recoverable from the situational context and thus can be safely zero-coded.Footnote ¹⁶ As I will suggest further below, this absolutive distribution figures prominently in the evolutionary scenario.

The child’s one-word utterances – indeed one-word turns – at this stage are interspersed with adult turns that interpret the child’s speech-act intent (epistemic/informative vs. deontic/manipulative) and expand and elaborate on it. Typical examples of such dyadic child–adult interaction are:Footnote ¹⁷

(2) Adult–child dyadic exchanges	Speech-act interpretation
a.
  MOT: What does the cow say Nomi?	epistemic
  NAO: Moo.
  MOT: Moo.
b.
  MOT: Doggie.
  NAO: Me, me.
  MOT: I don’t think you want any apple juice now.	deontic
c.
  NIN: Open.
  MOT: Okay.
  NIN: More book.
  MOT: Okay, do you want another book?	deontic
d.
  EVE: Napkin
  MOT: Oh, do you want a napkin too?	deontic
e.
  EVE: Baby.
  MOT: What is Eve doing?	epistemic
  EVE: Carry baby.

(h) Pre-grammatical pidgin communication: Just prior to the acquisition of grammar at ca. 2 years of age, the child’s multi-propositional communication, with each proposition now coded by two–three words, bears all the marks of pre-grammatical pidgin (Bowerman 1973; Ochs-Keenan 1974a, 1974b, 1975a, 1975b; Ochs-Keenan and Schieffelin 1976; Slobin 1977; Ochs-Keenan et al. 1979; MacWhinney 1982).Footnote ¹⁸ Thus, Bowerman (1973: 3–4) observes:

early child speech is “telegraphic” – that is consists of strings of contents words like nouns, verbs and adjectives, and lacks inflections, articles, conjunctions, prepositions and post-positions and, in general, all functors or “little words” with grammatical but not referential significance …

(i) Cross-turn spreading of utterances: In both the early one-word stage, when the coherence scope of the child’s message is mostly mono-propositional, as well as in the subsequent two-word stage when the message coherence scope turns multi-propositional, the message is typically distributed across adjacent child–adult turns, with the adult expanding and elaborating on the child’s short turns (Ervin-Tripp 1970; Scollon 1974, 1976). Thus Ochs-Keenan et al. (1979: 267–268) observe:

caretaker and child together construct a single proposition. We suggest that a child may learn how to articulate [full] propositions through such a mechanism. That is, she may learn how to encode [full] propositions by participating in a sequence [of adjacent turns] in which she contributes components of a proposition …

(bracketed material added)

4.5.2 Sociocultural Context

The sociocultural ecology of L1 acquisition during the child’s first one–two years, whether in the confines of the nuclear family, the extended family or the extended clan at the home site of the hunter-gatherers (Marlowe 2005, 2010; Hrdy 2009), resembles in all major respects the Society of Intimates of canines and social primates described above (section 4.4.2), with the added caveat that a vast power and knowledge asymmetry exists between the child and adult caregivers – older siblings or cousins, familiar adult kin – in the early years. This asymmetry dissipates gradually over time.

4.6 Pre-grammatical Pidgin as an Evolutionary Stage

Human language can be processed in two radically different modes, the pre-grammatical (pidgin) mode and the syntactic (grammaticalized) mode, with the major differences between them recapitulated as (Givón 1979: ch. 5):

(3)

pre-grammatical processing	grammatical processing
topic-comment constructions	subject-predicate constructions
loose clause-chaining (simple clauses)	tight subordination (complex clauses)
separate intonation contour over simple clauses	unified intonation contours over complex clauses
flexible-pragmatic word order	rigid-grammatical word order
nearer to 1:1 noun-to-verb ratio in text	higher noun-to-verb ratio in text
no grammatical morphology	rich grammatical morphology
slower, attended processing	faster, automated processing
higher error rate	lower error rate

The diachronic process of grammaticalization, via which grammatical morphology and syntactic constructions arise in tandem, consists of construction-by-construction changes in which pre-grammatical paratactic structures change to grammatical syntactic structure. During the first two years of language development, the child’s communication is overwhelmingly pre-grammatical, and earlier on mostly mono-propositional. Only toward the third year of their life do children begin to acquire grammar.

Pre-grammatical pidgin communication is not devoid of regularity but rather displays a number of universal, transparently iconic ‘rules.’ Most of those are also integrated into grammatical communication. The most salient rules of pre-grammatical (pidgin) communication are as follows:Footnote ¹⁹

(4) Rules of pre-grammatical communication
(i) Intonation rules
  a. Stress and predictability
    “Information chunks that are less predictable are stressed.”
  b. Melodic contours and mutual relevance
    “Information chunks that belong together conceptually are packaged together under unified intonation contours.”
  c. Rhythm and pauses
    “The size of the temporal break between information chunks corresponds to the size of the cognitive or thematic distance between them.”
(ii) Proximity rules
  a. Proximity and relevance
    “Information chunks that belong together conceptually are kept in closer spatio-temporal proximity.”
  b. Proximity and scope
    “Grammatical functors (‘operators’) are closest to the chunks of lexical or propositional information (‘operands’) to which they are most relevant.”
(iii) Linearity rules
  a. Linear order and importance
    “More important information chunks are fronted.”
  b. Linear order and unpredictability
    “More unpredictable (‘new’) chunks of important information are fronted.”
(iv) Quantity rules
  a. Zero coding and predictability
    “Predictable or already-activated information is left unexpressed.”
  b. Zero coding and relevance
    “Unimportant or irrelevant information is left unexpressed.”

The acquisition of grammar by children is gradual and proceeds through intensive interaction with adult interlocutors, who contribute active feedback, interpretation, expansion and correction. What is eventually acquired is, by and large, the adult grammatical model. Children do engage in spontaneous grammaticalization, producing constructions that are not attested in the adult input (Bowerman 1973). But such spontaneous innovations are most often rejected by adult interlocutors and are eventually weeded out of the child’s language, given the overwhelming power imbalance between child and adult.

Natural second-language acquisition by adults seldom proceeds beyond the pre-grammar pidgin stage. This is, in all essential detail, also the language of Broca’s aphasia. As an illustration, consider (5) from Menn (1990: 165):

(5)

… I had stroke … blood pressure … low pressure …
period … Ah … pass out … Uh … Rosa and I, and …
friends … of mine … uh … uh … shore … uh drink,
talk, pass out …
… Hahnemann Hospital … uh, uh I … uh uh wife, Rosa …
uh … take … uh … love … ladies … uh Ocean uh Hospital
and transfer Hahnemann Hospital ambulance … uh …
half’n hour … uh … uh it’s … uh … motion, motion …
uh … bad … patient … I uh … flat on the back …
um … it’s … uh … shaved, shaved … nurse, shaved me …
uh … shaved me, nurse … [sigh] … wheel chair … uh ..

Given the multiple contexts in which pre-grammatical pidgin is the preferred mode of multi-propositional communication, I see no alternative but to assume that in language evolution as well, the first stage of multi-propositional discourse was pre-grammatical pidgin.Footnote ²⁰

4.7 The Evolution of Grammar: A Hypothesis

4.7.1 Ground-Zero: Shift of the Communicative Context

It is utterly senseless to discuss the evolution of human language without considering first the changes in the adaptive context that prompted it. As noted earlier, the Society-of-Intimates context in which primate communication was embedded imposed the following constraints on pre-human speech acts:

mono-propositional coherence scope
strictly manipulative speech acts
anchoring in the here and now
largely un-coded lexicon
strong context dependence

The sociocultural context of pre-human communication also guaranteed the extreme time-stability and intra-group sharing of generic-cultural knowledge. Likewise, the here-and-now anchoring of communication guaranteed the intra-group sharing of specific-situational knowledge.

The change we must contemplate now is one that shifted, however gradually, both categories of shared knowledge, so that both the cultural-generic and situational-specific knowledge ceased to be universally shared within the social group. This change must have made declarative information an adaptive necessity. I would like to suggest that the rudiments of this change are well-documented in the evolutionary history of early humans – Homo habilis and especially Homo erectus. They involved:Footnote ²¹

an expanded foraging range for both big-game scavenging and hunting by males and gathering by females;
splitting into smaller foraging parties;
the establishment of a stable, well-defended home base where both the too-old and the too-young could be left safely during the day;
moving the home-base periodically over the year’s cycle.Footnote ²²

This new cultural-geographic pattern created an information imbalance within the social group, in terms of both generic and specific information. Complex generic knowledge gleaned by the small scavenging, hunting and gathering parties was no longer automatically shared by the whole group. New hunting, gathering, tool-making and fighting skills became specialized and required teaching. Likewise, the specific situation-based context was no longer shared by the entire group. Adaptively crucial new information was now vested in scattered individuals and small subgroups, so that urgent new information about food and water sources and potential enemies was not available to all group members in the shared here and now.

4.7.2 Changes in the Communication System

4.7.2.1 Noun Coding: From Deixis to Well-Coded Nouns

Given the previously evolved referential device of pointing (deixis) for establishing joint attention to a referent, the first step of coded communication must have been, as it is still in child language, the lexicalization of referent nouns. Since the range of adaptively relevant well-coded objects and situations must have been fairly restricted to begin with, the verb – action, state, event – could have been easily inferred by a simple cultural calculus: What else does one do with game animals (hunt)? With food items (eat)? With shelter (enter, exit)? With a tool (use)? With a hungry child (feed)? With a conspecific of the opposite sex (mate)? With a predator (run)? With an enemy (fight)? In the hunter-gatherer Society of Intimates, the answers – the verbs – are universally obvious. And as in early child communication, the noun that was lexicalized once pointing in the here and now no longer sufficed must have been, initially, drawn from the predictable absolutive array – either the object of a transitive event (O) or the subject of an intransitive event (S).

4.7.2.2 Verb Coding: From One-Word to Two-Word Clauses

When increased cultural and environmental complexity made the calculus of inferring the verb less tenable, lexicalized verbs were added to the one-word utterance, much as in the two-word stage of early childhood communication (Bowerman 1973). Since at the beginning the verbs were still fairly predictable from the context, they were added as an afterthought (R-dislocation) following the entrenched lexicalized noun. That is, in the paratactic structures:

(6) Early paratactic two-word clauses
• transitive: O, V (deer, kill)
• intransitive: S, V (deer, run)

Eventually, the separate intonation contours of these early paratactic clauses were merged, yielding the corresponding syntactic clauses under joint intonation contours; respectively:

(7) Early syntactic two-word clauses
• transitive: OV (deer kill)
• intransitive: SV (deer run)

4.7.2.3 From Mono-propositional to Multi-propositional Discourse

With the gradual accretion of the adaptively vital information that could not be assumed to be universally shared within the group, the move from mono-propositional to multi-propositional discourse was nigh inevitable. As noted earlier, behavioral evidence suggests that both canines and nonhuman primates have mental representations of coherent multi-event sequences in their episodic memory. The cognitive pre-adaptation for multi-propositional discourse had thus been already in place. Multi-propositional discourse depends on multi-event coherence; that is, on the fact that adjacent events in a sequence are relevant to each other, as in complex hunting, bulb-foraging, tool-making, mating, food-preparation or raiding routines.

The most concrete element of cross-event coherence is referential coherence – the recurrence of an important topical referent over successive events (Givón 1983a,b). In human communication, this element is most commonly the agent (A) of transitive events or the subject (S) of intransitive events. The most natural pre-grammatical pidgin device for coding recurrent referents is zero anaphora.Footnote ²³ The most common device for coding unpredictable new referents is L-dislocation – fronting of the new topic. Both have been listed in (4) above as part of the ‘rules’ of pre-grammatical communication. The pre-grammatical pidgin that must have emerged as the early mode of multi-propositional discourse must, therefore, have had the following clause types:

(8)
a. Transitive:
  • new topical agent: A, OV
  • recurrent topical agent: [0]-OV
b. Intransitive:
  • new topical subject: S, V
  • recurrent topical subject: [0]-V
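
The four clause types above follow from two binary choices: whether the event is transitive, and whether the topical referent is new or recurrent. A minimal Python sketch of that combination is given here, assuming that L-dislocation and zero anaphora are the only two topic-coding devices in play. The function name `clause_type` and its parameters are hypothetical labels introduced for illustration only.

```python
def clause_type(transitive: bool, new_topic: bool) -> str:
    """Return the pre-grammatical clause pattern for a single event.

    A = agent, S = intransitive subject, O = object, V = verb.
    "[0]" marks a zero-coded (unexpressed) recurrent topic, and a comma
    marks the intonation break after an L-dislocated new topic.
    """
    # The verbal core keeps the inherited noun-before-verb order.
    core = "OV" if transitive else "V"
    if new_topic:
        # New, unpredictable topics are fronted (L-dislocation).
        topic = "A" if transitive else "S"
        return f"{topic}, {core}"
    # Recurrent, predictable topics are left unexpressed (zero anaphora).
    return f"[0]-{core}"


# Enumerate all four combinations of the two choices.
for transitive in (True, False):
    for new_topic in (True, False):
        print(transitive, new_topic, clause_type(transitive, new_topic))
```

The sketch is only a restatement of the list as a lookup; it does not model how the patterns arose.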

4.7.2.4 Grammaticalization as an Evolutionary Process

There is no reason to believe that the evolution of grammatical communication from pre-grammatical pidgin did not follow the sequence seen in diachrony and child-language acquisition, leading to the gradual emergence of:

tight, hierarchic syntactic constructions out of loose paratactic clause chains;
grammatical morphology out of lexical words within syntactic constructions.

But the early SOV word order of human language predates this stage, having been already established during the earlier phase of mono-propositional pidgin communication (see (6), (7) above). It was then carried over to multi-propositional pidgin communication, and onward to grammaticalized language.

The early SOV order of human language is thus an evolutionary relic of Homo sapiens’ early mono-propositional pidgin communication. Its adaptive rationale was rooted in single-event cognition, rather than in multi-event discourse coherence.

4.7.2.5 The Drift Away from SOV

As noted at the outset, SOV is still the most common rigid word order in human language. Wherever it has changed, the natural drift seems to be as in (9):

(9) Natural drift in word-order change
SOV > pragmatically controlled flexible word order > V-first > SVO

The adaptive impetus for this drift does not reside in single-event cognition, but rather in multi-propositional discourse coherence. The dead giveaway is that the earliest stage of drift – from SOV to pragmatically controlled flexible word order – involved three discourse-pragmatic ‘rules,’ two of which commonly conflate into the same communicative device (Givón Reference Givón, Hammond, Moravcsik and Wirth1988; see (4) above):Footnote ²⁴

(10) Discourse-pragmatic word-order devices
a. pre-posing unpredictable new information: L-dislocation
b. pre-posing important information: L-dislocation
c. post-posing more predictable old information: R-dislocation

Given that most recurrent/predictable nominal referents are zero-coded (see (8) above, as well as Givón Reference Givón1983a,b, Reference Givón2017), the preverbal subject position (SV) in the evolved grammaticalized SOV order is the direct consequence of the discourse-pragmatic word-order device (10a,b) above, thus motivated by the adaptive demands of multi-propositional discourse. The preverbal object position (OV), on the other hand, is a relic of the earlier evolutionary phase of mono-propositional discourse, and thus motivated by the prior lexicalization of nouns before verbs (see (6), (7) above).

4.8 Discussion

4.8.1 Vestigial Relicts of Early Communicative Modes

As noted earlier, our capacity for pre-grammatical communication remains an enduring feature of the human linguistic tool kit, as is evident from its ready availability in early child language, adult second-language pidgin, and Broca’s aphasia. A broadly similar communicative genre is telegraphic speech (Janda Reference Janda1976). In bio-evolutionary and neurological terms, this is testimony to the relatively recent evolution of grammatical communication. In the right context, humans can still communicate without grammar.

In the same vein, the rules of pre-grammatical communication (4) have been incorporated whole hog into extant grammars.Footnote ²⁵ One may thus consider the use of zero-coding of referents that are either highly accessible in the anaphoric context or unimportant (passive agent, antipassive patient; see Givón Reference Givón2017) as another surviving relic of pre-grammatical pidgin communication. In the same vein, universal intonation and word-order devices such as contrastive stress, clause-level intonation contours, L-dislocation and R-dislocation may also be considered such relics.

4.8.2 Recapitulation and Developmental Trends

What is meant by ‘recapitulation’ has changed considerably over the notion’s protracted history. The foundations were laid down by Aristotle’s work in biology, involving first the recognition that biological structure is functionally motivated (De partibus animalium). Aristotle’s bio-classification was then presented as a graduated scala naturae of increased size, complexity and ‘perfection’ (Historia animalium). And his study of embryo development (De generatione animalium) implicitly recapitulated the graduated scala naturae.

The late eighteenth and early nineteenth centuries added the explicit notion of phylogenetic evolution (Lamarck Reference Lamarck1809), to which Darwin affixed the adaptive motivation – natural selection. Haeckel’s (Reference Haeckel1874) observed parallelism, couched in the metaphor ‘ontogeny recapitulates phylogeny,’ was an explicit integration of Aristotle’s disparate observations. With proper delimitation of scope and context, recapitulation survived into the expanded theoretical agenda of modern biology (Gould Reference Gould1977).

The modern integration of the third developmental trend – online individual adaptive behavior and lifetime learning – owes much of its original impetus to Lamarck’s (Reference Lamarck1809) idea of inheritance of acquired traits. Having been first rejected by Darwin (1859), it was eventually fleshed out into a credible mechanism, beginning with Baldwin (Reference Baldwin1896) and Waddington (Reference Waddington1942, Reference Waddington1953), then on to Mayr (Reference Mayr1976), Fernald and White (Reference Fernald, White and Gazzaniga2000), West-Eberhard (Reference West-Eberhard2004) and Tucker and Luu (Reference Tucker and Luu2012), among many others. This slow expansion of the recapitulationist agenda allows us to view individual online adaptive behavior as the key shared mechanism of ontogeny and phylogeny; and then to view language diachrony as the concatenation of multiple instances of online individual communicative behavior – thus the linguistic equivalent of individual adaptive behavior. In sum, then:

(11) The three developmental trends

  trend               biology                     language
  phylogeny           bio-evolution               language evolution
  ontogeny            embryology & maturation     language acquisition
  adaptive behavior   online adaptive behavior    language diachrony

The timescale of diachrony was traditionally assumed to be the uniquely human scale of cultural history – decades, centuries, millennia. This view was foisted upon us by the traditional method of comparative reconstruction, a method that imposed on language diachrony the misleading perspective of large, gapped temporal spans. Such a perspective was articulated uncritically by both Saussure and Bloomfield. But in fact, diachronic change is the concatenation of successive instances of online individual adaptive behavior. And realizing this affords us a clearer view of the profound unity of the three developmental trends of human language, not only in terms of analogy, but also in terms of homology; that is, shared mechanisms.

5 Broken Windows: Creoles, Pidgins, and Language Evolution

5.1 The Forensic

In his book Adam’s Tongue (Reference Bickerton2009), Derek Bickerton reminisced that he had “had the great good luck to come to language evolution from the study of pidgins and creoles” (p. 38).Footnote * Throughout his long career, he forcefully argued for a scenario in which the children of nonindigenous adult laborers in plantation societies of the Caribbean and Pacific were the principal agents of creolization. He saw creolization as a catastrophic single-generation process that obtains from first language acquisition in abnormal circumstances (e.g., Bickerton Reference Bickerton, Pütz and Dirven1989: 16; Reference Bickerton, Tallerman and Gibson2012: 464). In the “interesting” cases, at least, an early-stage pidgin provides the primary linguistic data, and an innate biological program of linguistic competence shapes the result. On this view the formation of these languages points directly to humankind’s biological capacity to create language should the normal generation-to-generation means of transmission be disrupted.

Yet – and this is the first thesis of my chapter – accounting for creole genesis was not the ultimate goal of Bickerton’s research agenda. In the preface to the reissue of his book Roots of Language (Reference Bickerton1981), he boasted that “it remains as the first work to suggest that creoles could constitute a window on the earliest stages of language” (Reference Bickerton2016: x). Gradually, perhaps inexorably, the agenda became one of recovering an antecedent protolinguistic stage attributable to our hominin ancestors, and modeling the transition to full language in our species.

My second thesis proceeds from Veenstra’s (Reference Veenstra, Kouwenberg and Singler2008: 220) observation that Bickerton’s work on the “origins-of-language question, especially Bickerton (Reference Bickerton1990, Reference Bickerton1995) and Calvin & Bickerton (Reference Calvin and Bickerton2000), has attracted no attention whatsoever in the field [of creolistics].” That seems to have been largely true up until the early 2000s, save for Mufwene’s review article (Reference Mufwene1991) on Language & Species (Bickerton Reference Bickerton1990). But since then, the question has drawn the attention of creolists, and Bickerton’s idea that creoles provide the most direct window possible on the properties of the human language faculty – and are of probative value for an understanding of the origins of human language – has found little support.

My third thesis proceeds from DeGraff’s impression (Reference DeGraff2020: e297) that “Bickerton’s ‘living linguistic fossils’ and related hypotheses are still relatively popular among linguists, psychologists, and other cognitive scientists interested in language evolution.” Granted, Bickerton’s views have been widely quoted outside the field of creolistics, but his claim that creoles offer a special window on the human language faculty has enjoyed no such popularity among scholars who deal with language evolution.

My fourth thesis has to do with the question of whether hypothetical properties of protolanguage correspond to the typical properties of pidgins, qua restricted systems. Since Bickerton (Reference Bickerton1990: ch. 5) the window potential of pidgins (as opposed to creoles) has received a great deal of attention in the literature. In my own contribution to the Oxford Handbook of Language Evolution, I was optimistic that a properly constructed pidgin (but not creole) window on language evolution would hold great heuristic promise (Roberge Reference Roberge, Tallerman and Gibson2012: 544). Subsequent work in a fast-moving field has overtaken the Handbook, and now, more than a decade later, I am far less sanguine. Claims expressed in hypotheses about the evolution of language need to be empirical. Existing explications of the pidgin window remain elements of rather speculative frameworks resting on largely nonempirical criteria and probing the limits of intuitive appeal. They are themselves to be viewed in the light of a far more cautious assessment of their heuristic potential and limitations.

5.2 Conceptual Clarifications

By pidgin language I understand the linguistic creation of a contact community that has need for a common means of communication but does not share a preexisting language that fulfills this function. Pidgins are social (group) solutions to the problem of interethnic communication. A pidgin is a restricted linguistic system that is used in limited domains by people who retain their spoken languages and identities. It is native to no one.

The ontogeny of creole languages and the extent to which (if any) they constitute a special phylogenetic and/or typological class are warmly disputed questions in creolistics. It has been commonly assumed that a creole language “has a jargon or pidgin in its ancestry” and is “spoken natively by an entire speech community, often one whose ancestors were displaced geographically so that their ties with their original language and sociocultural identity were partly broken,” typically under the conditions of slavery in European colonies (Holm Reference Holm1988: 6). Yet, the idea that creoles derive from antecedent pidgins in any sense has been aggressively attacked (Aboh and DeGraff Reference Aboh, DeGraff and Roberts2017; DeGraff Reference DeGraff2003: 397–98, Reference DeGraff2005: 559–62, Reference DeGraff2020: e297; Mufwene Reference Mufwene and Bonvillain2015).

Not all linguists tie creolization to nativization, even on the premise that creoles emerge from a sudden linguistic encounter, in which circumstances bring people speaking mutually unintelligible languages together in a way that compels them to create a means of intergroup communication rather than adopt a preexisting language. For Baker (e.g., Reference Baker and Baker1995: 4), a distinction between pidgins and creoles based on whether they are the native language of some of their speakers serves no useful purpose. Indeed, there may be no fundamental differences between pidgins and creoles, for “contact languages develop along the same lines and at a similar speed” provided they are in constant daily use (p. 13).

Bickerton opined that in most cases creoles are different enough from any of the languages of the original contact situation to be considered “new” languages (Reference Bickerton1981: 2; Reference Bickerton2004: 830–31). But whether creoles are truly new languages that have no genealogical affiliation in the conventional sense or are to be situated along the phylogenetic branches of their lexifiers is likewise a contentious issue. From one perspective, creole lexicon and grammatical features cannot all be traced back primarily to the same source language. Creoles are therefore contact languages that did not arise primarily through descent with modification from a single source language. From an opposing perspective, however, creoles are disenfranchised varieties of their lexifier languages (Mufwene Reference Mufwene and Bonvillain2015: 349–55) that developed under the same set of conditions that have generally led to language change (Aboh and DeGraff Reference Aboh, DeGraff and Roberts2017: 435).

5.3 From Language Bioprogram Hypothesis to Lexical Learning Hypothesis

In 1978 Bickerton delivered a lecture at the University of Michigan to a colloquium on the genesis of language (published as Bickerton Reference Bickerton and Hill1979), in which he dealt with creole languages that were developed rapidly, within a generation, by children born to speakers of early-stage pidgins in new multilingual contact situations in European colonies. Children born to the immigrant workforce would expand the rudimentary lingua franca into an adequate vehicle for the expression of their own needs, wants, and desires (Bickerton Reference Bickerton and Hill1979: 1–2). Observed similarities among creole grammars can be a matter neither of chance nor of shared substrata. Bickerton’s proposed explanation: Children have a pre-experiential core grammar that is “highly specified with regard to the core items of syntax [… and] semantics” (Reference Bickerton and Hill1979: 16) and is available to them in the context of radically disrupted language transmission. If one believes that people already have a first language encoded in the brain (pp. 14–15), it is but a small step to speculation about creolization affording us a special window on the innate human language faculty and ultimately on the emergence of language in Homo sapiens.

Bickerton (Reference Bickerton1981, Reference Bickerton1984) refined these early formulations into what became known as the Language Bioprogram Hypothesis. The basic idea is that there is an innate biological program that determines the form of human language, just as there is an innate biological program that determines physical development. Bickerton’s hypothesis predicted that instead of merely processing linguistic input, children seek to actualize a blueprint for language with which the bioprogram provides them. With normal language transmission, children adapt their innate blueprint in the direction of the target system. There will be a preexisting full language that they will learn from their elders. Thus, almost from the earliest stages, the developing bioprogram interacts with the target language. In some cases features of the latter will be similar to features in the bioprogram, in which case one can expect “extremely rapid, early, and apparently effortless learning.” In other cases the target language will diverge from the bioprogram to varying degrees. One can anticipate learning errors, which are conventionally attributed to deviant abductions formed by the child, but which Bickerton claimed are simply the result of the child’s not being ready to acquire certain structures in the target system and following out instead the bioprogrammatic template (Reference Bickerton1981: 134–35).

Plantation societies in European colonies combined population displacement, which disrupted normal generation-to-generation transmission of language, with rapid nativization (Bickerton Reference Bickerton, Pütz and Dirven1989: 16). The primary linguistic data for the first generation of locally born children – in the form of makeshift jargons or an early-stage pidgin – were highly variable (both across the speech community and within the output of individual speakers), impoverished, macaronic, and lacking any recognizable structure and resources that languages normally employ in the expression of complex propositions.Footnote ¹ In these conditions, parsing had to be based almost exclusively on semantics and pragmatics (see Bickerton Reference Bickerton1984: 175). Access to the superstrate language was limited. While first-generation creole children could have acquired the heritage languages of their parents, there was little incentive for them to do so in such a highly diverse, if not chaotic linguistic milieu. Yet, they produced a full-fledged language, the grammar of which bears the closest resemblance not to grammars of indigenous and/or labor-caste (substrate) languages, nor to that of the dominant European (superstrate) language, but rather to the grammars of creole languages in other parts of the world, even though they may have quite different affiliations and are geographically far removed (Hawaiian Creole English vis-à-vis Caribbean Creole English). The “most cogent explanation” for the putative typological coherence among creoles is that plantation (and for that matter maroon) children drew on a “species-specific program for language, genetically coded and expressed, in ways still largely mysterious, in the structures and modes of operation of the human brain” (Bickerton Reference Bickerton1984: 173).

The Language Bioprogram Hypothesis is, as Bickerton sought to adumbrate in chapter 4 of Roots, a dynamic evolutionary theory, and the bioprogram itself is supposed to be “an adaptive evolutionary device” (Reference Bickerton1981: 144). If the human language faculty is biologically based, it must have been developed in the normal course of evolution, and therefore must have “a real (and perhaps traceable) history” (Reference Bickerton1984: 187; similarly, Reference Bickerton1981: 159). Although any stab at tracing this history must remain speculative at this point, the Language Bioprogram Hypothesis does, he opined, suggest “some novel approaches to the question” (Reference Bickerton1984: 187). The nature of creole syntax should indicate what is most basic “and hence perhaps also what is evolutionarily earliest in the syntax of language in general” (p. 187). In other words, it reflects an “inner core grammar” that harks back to the emergence of Homo sapiens and from which “more complex and varied grammars may have evolved” (p. 188; further in 1981: 290, 295–97).

Bickerton (Reference Bickerton and Newmeyer1988, Reference Bickerton, Pütz and Dirven1989, Reference Bickerton and DeGraff1999: 56–59) rebranded his Language Bioprogram Hypothesis as the Lexical Learning Hypothesis. Accordingly, children acquire morphemes and language-specific morpholexical properties, which combine with universal principles to yield syntax. In ordinary circumstances children do not learn but recreate a given language. In the case of creole formation, there is no preexisting language for children to recreate. Having a biological endowment for language but only restricted morpholexical input, creole children are still able to produce a language, albeit one that “is not quite like any pre-existing language [...] but does show striking similarities to other languages produced under similar conditions” (Reference Bickerton, Pütz and Dirven1989: 14).

Veenstra (Reference Veenstra, Kouwenberg and Singler2008: 226) notes that in the Lexical Learning Hypothesis iteration Bickerton was rather less concerned with how the study of creole languages could provide a window on the origin and evolution of language. True, one finds only the stray remark in this connection in subsequent writings of a creolistic nature (e.g., Reference Bickerton and Newmeyer1988: 282–83; Reference Bickerton, Pütz and Dirven1989: 12, 19). As it happened, he had merely redirected this pillar of his hypothesis to a different audience. In Language & Species (Reference Bickerton1990), Bickerton reiterated his thesis that creole formation is a recapitulation of the transition from the protolanguage of our hominin ancestors to the full language of humankind: “What happened in Hawaii was a jump from protolanguage [Hawaiian Pidgin English] to language [Hawaiian Creole English] in a single generation” (p. 171). In terms of formal structure, the gulf between pidgin and its associated creole is immense. A pidgin is structureless, whereas a creole is structure-dependent like any other natural human language (p. 169). Yet, the transition is abrupt. Full human language built on protolanguage rather than superseding it. Bickerton (Reference Bickerton1990: 177) rejected the possibility that some kind of “interlanguage” might have served as a bridge between protolanguage and true language, for the emergence of various syntactic elements needed to happen virtually simultaneously.

In Adam’s Tongue (Reference Bickerton2009: chs. 9, 12), More than Nature Needs (Reference Bickerton2014: ch. 5), and in his preface to the Roots reissue (Reference Bickerton2016: viii–ix), Bickerton aligned his hypothesis with precepts of the minimalist program. He has backed away from claims propounded in earlier writings, such as concrete innate specifications for determiners and the encoding of tense, modality, and aspect. Universal Grammar, as instantiated in creoles, has little, if any syntactic apparatus beyond Merge, a process that iteratively attaches constituents to one another in order to build complex structures, rather than inserting one structure within another (Bickerton Reference Bickerton, Tallerman and Gibson2012: 459). In fact Bickerton (Reference Bickerton2014: 132) would prefer to redefine “Merge A and B” as “Attach A to B,” an asymmetrical operation (B having priority over A).Footnote ² The speaker still has to know which elements can be legally attached to which elements. Though there are broad, presumably innate semantic guidelines, the properties enabling attachment must be learned inductively for each language. Pidgin speakers know what these are (for one language, at least) because they have adduced the properties of lexical items in their native languages. Creole children retain access to the innate semantic guidelines that inform them, inter alia, of a default list of semantic distinctions that should correlate with markers of some kind. They then search the restricted lexical inventory of the pidgin for items that might plausibly be interpreted as markers of those distinctions.

5.4 A Creole Window on Early Human Language? The View from Creolistics

Scholars concerned with creole languages have reacted strongly against a number of Bickerton’s ideas. Back in the 1980s Muysken (Reference Muysken and Newmeyer1988: 300) had already concluded that “thinking of creole languages as alike, simple, and mixed is far from unproblematic.” The controversies have had their span and are only cursorily documented here.

Recall that Bickerton (Reference Bickerton1981, Reference Bickerton1984; similarly, Reference Bickerton1995: 38) claimed that Hawaiian Creole English emerged from Hawaiian Pidgin English, guided by the bioprogram, within a single generation. There exists a body of demographic evidence showing that Bickerton’s scenario “bears little resemblance to what actually happened in Hawai’i as documented by contemporary observers” (Roberts Reference Roberts and McWhorter2000: 275). Additionally, Baker (Reference Baker, Spears and Winford1997: 93) has called out the dubious assumption that most, if not all of a creole’s defining features become established within the space of the first cohort of creolephones. The studies of the late Jacques Arends (1952–2005) have amply demonstrated that this assumption does not apply to Sranan, which Bickerton thought to be a “clincher case” (my term) for the Language Bioprogram Hypothesis; see Bickerton (Reference Bickerton1991) for a reply to gradualist views.

As we observed in section 5.2, the notion that creoles have antecedent pidgins, though still accepted by many, was and remains far from uncontested. A related question has to do with whether child language acquisition occupies so central a place in creole formation. In general children of enslaved immigrants (indentured, in the case of Hawai’i) must have had access to at least one preexisting language. Ordinarily, a parental language should have been acquirable, but the primary linguistic data were unlikely to be defective in quality even if sometimes they were “unusually limited in quantity,” as might have been the case if daytime childcare was assigned to someone of a different language background from the birth mother (Baker Reference Baker1991: 268). Unless parents lacked a common heritage language, they are unlikely to have used a pidgin with one another and even less likely to have used it with their offspring (Baker Reference Baker, Selbach, Cardoso and van den Berg2009: 48). Such children would, of course, come into contact with other children and adults, some of whom spoke the heritage language and others of whom did not. As a practical matter the acquisition of superstrate morphemes would enable children to connect to the larger communicative network. But they would have had no sense for the relative utility of other languages beyond their immediate environment until well into adolescence (p. 48).

Slobin (Reference Slobin, Givón and Malle2002: 386–87) reminds us of a body of research that shows how “languages that are considerably more complex than pidgins can arise in interaction with adults, before there are native speakers” (p. 386). Adults participate in language construction “via the unfolding of … competence into performance” (DeGraff Reference DeGraff1999: 13). That is to say, they introduce innovations into the externalized language (E-language) that provides the input for child language learners. For children everywhere, the resources at hand are the ambient E-language and guidance from the human language faculty. Even if children in creole societies encounter quite degenerate input, one might assume with Lightfoot (Reference Lightfoot2006: 151) that they merely lack exposure to as much redundant information as children in noncreole societies. The first generation of native speakers “smooths out” the language by compiling stable grammars (Slobin Reference Slobin, Givón and Malle2002: 386). The learning processes are normal “and do not reveal special capacities of the language-learning child beyond what is already known about the acquisition of ‘full-fledged’ languages” (p. 387). Even if, as Slobin states, “a creole language develops over time, in contexts of expanding communicative use of a limited pidgin language,” child learners “help to push the process forward, arriving at a grammar that is more regular and automated – but they do not appear to be the innovators” (p. 387; similarly, Heine and Kuteva Reference Heine and Kuteva2007: 336). Mufwene (e.g., Reference Mufwene, Laks, Cleuziou, Demoule and Encrevé2008) has also sought to explain why creoles are not made by children, and he does so along similar lines.

Although the explanation of observed parallels has constituted a special problem in creolistics, scholarly opinion has long resisted the idea that creoles can be defined typologically; that is, that there exist diagnostic linguistic features that set them apart from all other languages. While most creolists will agree that there are similarities that do require explanation, a now substantial body of descriptive and systematic comparative work (nicely brought together in Velupillai Reference Velupillai2015) has shown that the latter are “neither so numerous nor so profound nor so unexpected as to require extraordinary theories” (Singler Reference Singler, Kouwenberg and Singler2008: 346).

Especially pernicious in DeGraff’s view is the belief that the pidgin-to-creole life cycle recapitulates the emergence of modern human language out of a structureless protolanguage imputed to our hominin ancestors (Reference DeGraff2003: 396–99; Reference DeGraff2004: 835; Reference DeGraff2005: 545, 558–61; Aboh and DeGraff Reference Aboh, DeGraff and Roberts2017: 406–407, 416). Such a scenario is “perhaps the most spectacular instance of Creole Exceptionalism” (DeGraff Reference DeGraff2005: 558). Yet, even creolists who embrace the idea of pidgin-to-creole development are generally skeptical about the possibility that creolization could offer a view on the emergence of human language in our species. They do not distinguish creoles from other languages as being more natural with respect to innate capacities; “nor can it be said that creoles represent the language competence in anything approaching an ontogenetically primary state” (McWhorter Reference McWhorter2005: 99). For Kihm (Reference Kihm2002), the fact that full languages were present and accessible in venues where creoles formed, even if only as lexical reservoirs, is dispositive. No compelling inferences may be drawn with respect to prehistorical settings in which human language emerges. Mufwene (2008: 272–73) emphasizes what should be obvious, namely, that the development of creoles in European plantation colonies of the seventeenth, eighteenth, and nineteenth centuries “present[s] nothing that comes close to replicating the evolutionary conditions that led to the emergence of modern language.” Nor are there “any conceivable parallels” between the brains and minds of early hominins and those of modern humans, “even if one subscribes to the ontogeny-recapitulates-phylogeny thesis.” I myself have voiced dissent with respect to the idea that creole grammar is close to the biologically based genotype of human language, and that regularities in creoles might therefore provide clues to the innate human language faculty (Roberge Reference Roberge, Tallerman and Gibson2012: 540). Lefebvre (Reference Lefebvre, Lefebvre, Comrie and Cohen2013) sets out to dispel any notion that the sequence pidgin-to-creole could provide us with a window on the emergence of full human language. She subjects Bickerton’s Language Bioprogram Hypothesis ca. Reference Bickerton1984 to a detailed (though far from perfect) criticism, obviously because she considers it the main representative of the pidgin-to-creole life cycle and protolanguage-to-full human language scenarios, and because she has come to her own theory of creole genesis.

The gist of Lefebvre’s (Reference Lefebvre, Lefebvre, Comrie and Cohen2013) critique entails these main points: (1) Pidgins and creoles do not differ qualitatively from one another; that is, they constitute points or segments along a developmental continuum. Extended pidgins are simultaneously both pidgins (for the majority who speak them as an auxiliary language) and creoles (for the minority who speak them as a first language) (see Baker Reference Baker, Spears and Winford1997: 92). They have expanded in the same way as creoles. (2) Nativization in Caribbean plantation societies was an extremely slow process. (3) Although creoles do appear to share at least one feature (they are isolating languages), the extent of typological cohesion does not hold up upon systematic comparison of creoles with typologically different substrata. (4) The claim that salient aspects of creole grammar are inventions on the part of children in creolization, rather than features transmitted from preexisting languages (Bickerton Reference Bickerton1984: 173), is not supported by the facts. (5) Functional categories of creoles replicate those of their substrate languages. Adults have a very significant input in determining the inventory and properties of these functional categories, which are for the most part well established prior to creolization.

For his part, Bickerton continued to promote and defend his hypothesis long after the field of creolistics had moved on, albeit with some concessions and modifications. For example, Bickerton (Reference Bickerton1991: 54) still held that “the first creole generation supplies the minimal structural properties required by natural languages; subsequent generations may (or may not) add features that make the language easier to process or that provide alternative options for saying the ‘same thing.’ … But [there] is no reason for regarding such events as part of the creolization process.” By 1999, he had conceded that most of the first creole generation in Hawai’i did in fact acquire one or more of the heritage languages. Yet, these children did not transfer parameter settings from any of the existing languages in their environment, as evidenced by the lack of ethnolectal versions of Hawaiian Creole English (Chinese-influenced, Japanese-influenced, and so on). What strikes one “most forcibly” about Hawaiian Creole English is its homogeneity, which cannot be attributed to dialect leveling (Bickerton Reference Bickerton and DeGraff1999: 55). Fast forward to Bickerton (Reference Bickerton, Clements, Klingler, Piston-Hatlen and Rottet2006), and we learn that it was the children of the expansion phase – not the children of the establishment phase (i.e., the first locally born generation of plantation children) – who were the creators of a creole, after which “nothing of much linguistic interest happened” (Bickerton 2008: 164).

The label “bioprogram” tends to be submerged in later writings (e.g., Bickerton Reference Bickerton1995, Reference Bickerton2007, Reference Bickerton2009) before resurfacing retrospectively in Bastard Tongues (Reference Bickerton2008) and More than Nature Needs (Reference Bickerton2014: 219, 238). We are more likely to encounter “biological template” (Reference Bickerton2007: 514), “mental template” (Reference Bickerton2008: 181), “innate algorithms” (Reference Bickerton2014: 219), and the unpacked original “biological program for language” (Reference Bickerton2009: 176). While posterity may have wound up in the end with a weaker strain of the original hypothesis, its essential precept remained practically unaltered.

In his preface to the republished Roots of Language (Reference Bickerton2016), Bickerton acknowledged weaknesses and shortcomings in the 1981 book while touting its virtues. It was the first work to propose that the “grammar underlying creoles – the ‘language bioprogram’ as it came to be called – must also be both what enabled children to acquire language on a limited exposure to it and the form in which language originally evolved” (p. ix). His initial formulation of the evolutionary pillar of his hypothesis is admittedly “naïve” (p. x) in hindsight and has been superseded by much subsequent work (especially Bickerton Reference Bickerton2009, Reference Bickerton2014). But, he added, research of some three and a half decades “has uncovered no evidence to challenge [the] relationship” between language evolution in early humankind and child language acquisition (Reference Bickerton2016: ix). Thirty-five years after Roots first appeared, it is, he wrote (pp. vii, ix):

surprising how little needs to be changed. Despite repeated attempts to refute them (and, of course, unfounded claims that this work or that has successfully refuted them) there is no need to change the central contentions of the original book. … Though it is, of course, impossible to say what the earliest true languages of humans looked like, that they looked remarkably like creoles is consistent with all we know about evolution, prehistory, and the faculty of language.

Overall, the preface gives the impression of a case that is more or less settled. It is, though not in a way that the author’s remarks suggest.

5.5 A Creole Window on Early Human Language? The View from Evolutionary Linguistics

We turn now to the alleged popularity of Bickerton’s creole window among scholars who deal with the origin and evolution of human language (DeGraff Reference DeGraff2020). There is actually little systematic examination of the heuristic potential of creoles (as opposed to pidgins), though one can, of course, find considerable incidental discussion. The focal point is typically how a full language can emerge among children who do not have access to a preexisting language that can be culturally transmitted to them. Pinker (Reference Pinker, Christiansen and Kirby2003: 23) uncritically cites Bickerton’s (Reference Bickerton1981) scenario of language creation by the offspring of polyglot plantation laborers as evidence for language being a part of the “standard human phenotype” (p. 22), alongside the emergence of complex sign languages in deaf communities, as in Nicaragua. Johansson (Reference Johansson2005: 179, 240) sees the emergence of certain universal language features in the formation of creoles (following Bickerton Reference Bickerton1995) as “particularly compelling,” while at the same time acknowledging the opposing views of Mufwene (Reference Mufwene2002) and DeGraff (Reference DeGraff2003). Comrie (Reference Comrie2000: 996) likewise acknowledges the controversy within creole studies and arrives at this conclusion: “We cannot use creoles as a clear case of creation of a language or part of a language ex nihilo. At best we can say that if it is true that creole grammar has been created ex nihilo, then creoles are directly relevant and illustrate the rapid creation of a language on the basis of a given lexicon only. But this is a big ‘if.’” At the extreme opposite pole, Slobin (Reference Slobin, Givón and Malle2002: 386) is completely negative in his appraisal: “Bickerton’s proposal that creole genesis reveals an innate ‘bioprogram’ for language seems far less plausible than when it was introduced 20 years ago. (Personally, I am not convinced by any of the evidence or arguments for the bioprogram).”

Fitch (Reference Fitch2010: 406) observes that “perhaps the most intriguing, if controversial of Bickerton’s ‘windows’ comes from pidgin languages, and the transition to creoles.” Yet, Fitch is rather more circumspect with regard to the idea that creoles are close to the biologically based innate form of human language (p. 379), recognizing, as he does, that it is not clear that regularities in creoles are de novo creations that flow from Universal Grammar as opposed to substrate transfer or retention. He is aware that most creolists do not accept Bickerton’s claim in its strong form. One wonders whether very many in that field, by 2010, “remain[ed] intrigued by the possibility that similarities among creoles provide insights into the biological nature of human language” (p. 379). At best, “intriguing” should be taken to mean ‘interesting at first blush, once worthy of debate,’ but not a current theme. Be that as it may, Fitch is careful to add that creole creators are, of course, “fully modern, language-ready humans.” The putative suddenness of pidgin-to-creole development would “illustrate the possibility of ‘instant’ syntax in a glossogenetic sense only” (p. 407). Far more support would be required to make a compelling argument that glossogeny – a cultural phenomenon – recapitulates phylogeny.

DeGraff (Reference DeGraff2020: e297) portrays The Oxford Handbook of Language Evolution (Tallerman and Gibson Reference Tallerman and Gibson2012) as a seminal anthology representing current thinking on the origin and evolution of language, which it certainly was at the time of publication more than a decade ago. Yet, a quick browse through the digital edition of the Handbook does not turn up much in the way of “exceptionalist” views. We can ignore what are transparently neutral allusions to the pidgin-to-creole life cycle in the editors’ volume and section introductions, and discount sporadic, incidental citations in articles by contributors from outside the subdiscipline of creolistics. My own piece (Roberge Reference Roberge, Tallerman and Gibson2012) deals primarily with the window potential of pidgins and contains critical remarks opposing the special status of creole grammar. I would refer here as well to Chater and Christiansen (Reference Chater, Christiansen, Tallerman and Gibson2012), who see language as a cultural product, “that is, a collective construction, across individuals and across generations of language users” (p. 631). For these authors, language evolution is language change “writ large.” They understand the transition from pidgins to creoles as “exhibiting the processes of cultural invention and transmission that underpin language evolution” (p. 631). Content words, referring to objects, properties, and actions, can be communicatively effective even in the absence of a fully structured grammatical system and are presumed to be the starting points in the evolutionary sequence (Bickerton Reference Bickerton, Tallerman and Gibson2012). Such items might initially be concatenated somewhat arbitrarily; “over time (both within individuals and across generations) particular patterns may then come to have conventionalized significance” (p. 631). 
One is reminded of Bickerton’s analogy of pidgin speakers putting words together like beads on a string, which, as he long believed, was all there was in the protolanguage of our hominin ancestors (Reference Bickerton2009: 187). From there, however, Chater and Christiansen differ markedly, having done away with the assumption that language requires a dedicated, genetically specified Universal Grammar.

The total number of contributions in Tallerman and Gibson (Reference Tallerman, Tallerman and Gibson2012) that could conceivably fall into the “exceptionalist” column is two. Bickerton’s own chapter (Reference Bickerton, Tallerman and Gibson2012) outlines a possible sequence of steps in the emergence of “syntactic language,” which would entail the organization of lexical items into hierarchical structures, the determination of the boundaries of elements within such structures, the movement of elements therein, and the determination of reference for elements that are not phonetically realized. At least three potential “windows on early syntax” are available: child language, creole languages, and language change (pp. 464–66). Carstairs-McCarthy (Reference Carstairs-McCarthy, Tallerman and Gibson2012) takes up the controversial question of whether some languages are more or less complex than other languages. He considers cases such as creoles (defined as pidgins that have acquired native speakers), Riau Indonesian, the Basic Variety in early adult L2 acquisition, Pirahã (spoken in a remote area of Brazil), and constructed languages. Like Fitch (Reference Fitch2010), this author characterizes Bickerton’s “suggestions” (Reference Bickerton1981) as “intriguing” and allows that creoles could “shed direct [sic] light on language evolution” (Carstairs-McCarthy Reference Carstairs-McCarthy, Tallerman and Gibson2012: 470). Sympathetic as this opinion is, it is only partially “exceptionalist”: “Bickerton’s account of the tense-mood-aspect system is not widely accepted by other creolists, and some of his later writings on language evolution (e.g., Reference Bickerton1990) emphasize it less. There is wider acceptance, however, of Bickerton’s view that creole languages are, in some important sense, unusually ‘simple’” (p. 471). 
Nonetheless, Carstairs-McCarthy is well aware of a conundrum that Bickerton (Reference Bickerton2014) would attempt to resolve: “If some natural languages really are significantly simpler than the majority, how and why has the brain evolved so as to support unnecessary grammatical complication? Or is the simplicity of those languages really only apparent?” (Reference Carstairs-McCarthy, Tallerman and Gibson2012: 478).

5.6 The Pidgin Window Revisited

Many researchers have proposed that early hominin communication involved single, word-like forms (vis-à-vis holophrases), uttered severally or jointly in short, unstructured concatenations.Footnote ³ Protolanguage is therefore a primitive mode of communication, consisting of asyntactic utterances no longer than three to five units, as the potential for ambiguity implies a practical constraint on length (Bickerton Reference Bickerton2014: 105). Bickerton’s version of lexical protolanguage is often considered definitive (Fitch Reference Fitch2017: 14), although there are actually several variants of this foundational premise. Over 2 million years ago, somewhere between the separation of the hominin line from other primates and the emergence of Homo sapiens, our remote ancestors developed protolanguage, a first approximation of which could be thought of as any form of communication that contains arbitrary, meaningful symbols but lacks any kind of syntactic structure (Calvin and Bickerton Reference Calvin and Bickerton2000: 237). For the next 1.5 (Bickerton Reference Bickerton1990: 141) to 1.8 million years or longer, hominins spoke only in protolanguage. At some time within the last 200,000 years and through some form of exaptation, connections would have formed between compositional protolanguage and social calculus (corresponding to thematic roles), presumably causing the emergence of phrasal-clausal structure (Bickerton Reference Bickerton, Hurford, Studdert-Kennedy and Knight1998: 350–53; Calvin and Bickerton Reference Calvin and Bickerton2000: 215–46). Later, Bickerton (Reference Bickerton2007: 520) would narrow that timeline with a suggestion that the process of syntacticization (the core syntax or Universal Grammar) was most probably completed at some point between 140,000 years ago (“the likeliest date for the speciation of Homo sapiens sapiens”) and 90,000 years ago (the start of the human diaspora). 
Bickerton’s evolutionary back story takes a somewhat different turn in Adam’s Tongue (Reference Bickerton2009) with emphasis on the emergence of displacement (displaced reference), elaborating on speculations that he had articulated in Language & Species (Reference Bickerton1990: 150–54). The particulars lie well beyond the scope of this essay. What matters are the continuities in the story: Words came first, in a structureless protolanguage; syntax emerged later per saltum, with Merge being the end point of language evolution. There was no intermediate stage between protolanguage – “beads on a string” – and “real” language – Merge with hierarchical structure (Reference Bickerton, Hurford, Studdert-Kennedy and Knight1998, Reference Bickerton2009: 234).

Bickerton (Reference Bickerton1990: 106) asserted that “no event happens in the world without leaving traces of itself, subtle and indirect though these may be. … It therefore seems only reasonable to suppose that there may exist contemporary phenomena – living linguistic fossils, so to speak – that would give us some insight into the processes through which language emerged.” “Living linguistic fossils” is of course a metaphor for the protolinguistic mode of expression that he thought to be manifest in incipient pidgins, the communicative abilities of trained apes, early child language (specifically, the verbal behavior of under-twos), and the grammatical competence of adults who have been deprived of language in childhood (pp. 106–22). As a mode of linguistic expression, protolanguage is quite separate from normal human language with regard to formal structure, null elements (any item may be absent from any position), realization of arguments, mechanisms for the expansion of utterances (no recursive addition of constituents), and grammatical items (no inflection, complementizers, auxiliary verbs, conjunctions; few determiners and adpositions) (pp. 122–26).Footnote ⁴

The idea that pidgins might be of heuristic value for language evolution has been entertained by other linguists of varying theoretical persuasions. Givón (Reference Givón2009, Reference Givón2018) is a longtime proponent of the view that human language can be processed in two radically different modes, the “pregrammatical (pidgin) mode” and the “syntactic (grammatical) mode” and that the former – whether in children or adults – “[is] a legitimate analog to a distinct stage in language evolution” (Reference Givón2009: 318). With dramatic differences between them in their conceptualization of how language emerged in our species – catastrophically (Bickerton) versus through adaptation in the interest of enhancing first communication and only secondarily thought, incrementally in a sequence of partially ordered steps – Jackendoff (Reference Jackendoff1999, Reference Jackendoff2002: chs. 4 and 8) elaborates on Bickerton’s idea that one can adduce a number of prehuman innovations from contemporary phenomena, some prior to Bickerton’s protolanguage, and some later (Reference Jackendoff1999: 272). “Fossil principles” such as semantically based ordering of Agent First and Focus Last are among the resilient features that are prior to syntax and reveal themselves most clearly in restricted systems. To Bickerton’s list Jackendoff adds the Basic Variety in early adult second language acquisition, home signs of deaf children of nonsigning parents, and the verbal behavior of agrammatic aphasics (Reference Jackendoff1999: 275–76). Pidgins, however, may be less telling than home signs because they draw upon the vocabulary of their source languages, often in phonologically reduced form (Jackendoff Reference Jackendoff1999: 277; Reference Jackendoff2002: 99).

As for whether one can draw inferences about early human language from the structure of pidgins, Heine and Kuteva conclude, on the basis of their examination of Kenya Pidgin Swahili and other factors, that “one may hesitate to answer this question in the affirmative” (Reference Heine and Kuteva2007: 193).Footnote ⁵ They characterize the earliest stages of pidginization as a “stripping process,” whereby phonological oppositions, affixal morphology, functional categories, and subordination are reduced or jettisoned (pp. 168, 171, 187, 192, 343). The direction of change (from grammatically complex to less complex forms of language) is exactly the opposite of how language evolution must have proceeded (p. 194). Having passed through the “stripping phase,” a pidgin may acquire new grammatical structures under appropriate circumstances, like any other language. Lefebvre (Reference Lefebvre, Lefebvre, Comrie and Cohen2013) has arrived at essentially the same overarching conclusion independently and more forcefully. Both Heine and Kuteva (Reference Heine and Kuteva2007: 194) and Lefebvre (Reference Lefebvre, Lefebvre, Comrie and Cohen2013: 448) agree that quite a number of “nonpidgin” features survive the pidginization process. For Lefebvre, this fact contraindicates feature stripping as a major process in the formation of pidgins. It is important to bear in mind that Heine and Kuteva’s case study is by their own description a post-pidgin continuum (Reference Heine and Kuteva2007: 170). Three of the four principal varieties in Lefebvre’s sample are dual-source pidgins, and all four are developmentally rather more advanced than the early-stage pidgins that are of probative value for pidgin window construction. Although pidgins, typically, are analytic, in some cases they do express semantic categories morphologically. 
Bakker (Reference Bakker, Booij and van Marle2003) reports a number of such cases, with the somewhat surprising determination that overall pidgins show more inflectional morphology than creoles do, and that most of it is “inherited” (p. 23). It is not clear at this time whether a higher frequency of inflection in pidgins than in creoles is due to different processes of genesis, or accidents of history (pp. 24–27).

The pidgin window is a conceptual construct that in principle could enable us to draw and license inferences about the architecture of protolanguage (Bickerton Reference Bickerton1990; Jackendoff Reference Jackendoff1999, Reference Jackendoff2002), the processes by which more complex linguistic systems emerged (Givón Reference Givón2009, Reference Givón2018), and processes by which early forms of language were created. Restricted linguistic systems are what Botha (Reference Botha2016: 81–82) calls analogue windows. They could preserve resilient features that hark back to our hominin ancestors and/or replicate processes of syntacticization (achievement of hierarchical syntax) and grammatical elaboration that may have taken place in the very remote past. Botha (Reference Botha2016: ch. 5) has brought forth a thorough examination of the heuristic potential of the pidgin window, with particular reference to the formulations of Bickerton (Reference Bickerton1990, Reference Bickerton2009), Mufwene (Reference Mufwene, Laks, Cleuziou, Demoule and Encrevé2008), and Roberge (Reference Roberge, Botha and de Swart2009, Reference Roberge, Tallerman and Gibson2012), and in so doing exposes several inadequacies. It has been pointed out often enough that pidgin speakers possess modern minds, a fully evolved language faculty, and also one or more full languages in their repertoire, whereas early hominins lacked not only these capacities but also the full cultural potentialities of modern humans. The emergence of early language may have involved sociocultural ecologies that are not commensurable with those in which pidgins arose. The ontological divide is large, and the requirements of the windows approach are strict, as set out in Botha (Reference Botha2016).

I doubt that anyone would seriously disagree with Heine and Kuteva (Reference Heine and Kuteva2007: 194) on the unlikelihood that reduction (loss of referential and nonreferential power) and simplification (regularization) would have been characteristic of protolanguage or would have played any role in the first appearance of full language. But as I have argued elsewhere (e.g., Roberge Reference Roberge, Botha and de Swart2009), pidgins appear to be heavily reduced simplified versions of full languages only as an artifact of contrastive analysis. In keeping with the “constructivist” view (see Baker Reference Baker and Baker1995, Reference Baker, Spears and Winford1997, Reference Baker, Selbach, Cardoso and van den Berg2009), I do not presuppose a process of severe “stripping” of the lexical source language by people who do not know it. Whenever groups of people in a highly diverse linguistic ecology and lacking a common language enter into sudden and sustained contact and have a mutual interest in both intercommunication and maintenance of group identity, they are likely to start constructing a basic medium for interethnic communication. Following Hazaël-Massieux (Reference Hazaël-Massieux, Selbach, Cardoso and van den Berg2009: 115), I assume further that languages do not take on preexisting grammatical systems: “At best they take up lexical forms that may indeed have a grammatical function in the language of origin but that are only initially taken over, in the context of speakers without a common language, as lexical elements.” This is essentially the aforementioned Basic Variety, which is a likely departure point for the construction of more fully developed grammatical systems in untutored adult second language acquisition and pidginization (Clements Reference Clements2019: 148).

My “creation variant” of the pidgin window (Roberge Reference Roberge, Botha and de Swart2009, Reference Roberge, Tallerman and Gibson2012) takes the position that in the post-Basic Variety phase internal expansion continued to be governed by language-independent forces (see Mühlhäusler Reference Mühlhäusler1997: 221). In most cases of pidgin (and creole) formation, the directional concept of a target language has no relevance. Targeting in these circumstances is internal, save for lexis; that is, the real target is the linguistic system that speakers are actually developing (Baker Reference Baker, Spears and Winford1997: 104; Mühlhäusler Reference Mühlhäusler1997: 198). Pidgin creators may elaborate the developing medium of interethnic communication by configuring the linguistic resources at hand in ways that suit their communicative intent. The resources at hand would include lexical material in all of the languages known to the participants, as well as a set of mechanisms and strategies that are not subject to critical-period limitations and facilitate innovation. Mufwene (Reference Mufwene, Laks, Cleuziou, Demoule and Encrevé2008) acknowledges that what the development of pidgins does tell us about the evolution of language (if little else) has to do with competition and selection of features, the gradualness of the process, and how communal norms arise.

To satisfy the Groundedness Condition of the windows approach (window phenomena must be sufficiently well understood), one must establish that with the provision of a lexicon, language will develop in the presence of a community of potential speakers. One can take this potentiality, at least, to be fairly uncontroversial.Footnote ⁶ Benazzo (Reference Benazzo, Botha and de Swart2009) has subjected the Basic Variety to careful scrutiny. Her findings confirm three key points: (i) Initial systems of adult language learners do not seem to be strongly influenced by their L1. (ii) “A target language-like lexicon is organized on the basis of pragmatic and semantic principles which are largely independent of the source/target language specifics.” (iii) The earliest stage of L2 acquisition involves neither a process of relexification nor “piecemeal imitation” of a native-speaker model (pp. 23–24). Benazzo makes a cogent case for the comparability of the Basic Variety with scenarios whereby language evolution took place progressively by incremental steps (p. 48).

Lest our inferences be less than fully grounded, one will need to marshal a rich body of well-analyzed data regarding jargons and very early-stage pidgins, and their development of syntactic structure and functional elements, as individual solutions to the problem of interethnic communication are overtaken by the establishment of social norms (stabilization). But there is a significant empirical obstacle. The formation of pidgins from the initial point of contact has never been studied in situ. Data from the Cape Dutch Pidgin, which informs my own “creativist” viewpoint, derives from written documentation that survives in the form of fragmentary and impressionistic representations by dilettante observers, and in reflexes in the Cape Dutch Vernacular (a mixed language out of which Afrikaans developed). Longitudinal studies of some extended pidgins (e.g., Tok Pisin) have yielded data that are closer to the initial point of contact (Mühlhäusler Reference Mühlhäusler1997: 187). But mostly, data from that crucial episode is recoverable only through reconstruction, if not wholly out of reach.

Not only must a window phenomenon be properly grounded, but inferences therefrom must be warranted (the Warrantedness Condition). For Botha (Reference Botha2016: 102), the major weakness of existing variants of the pidgin window (including my own) is not insufficient grounding but rather the lack of appropriate bridge theories for licensing inferences from what is known about pidgins to what is not known about language evolution. The bridges must take the form of theories that make empirical claims showing why the differences between these domains are inconsequential and do not compromise the soundness of inferences (Botha Reference Botha2016: 102), a requirement that is apprehended at a general level by Benazzo (Reference Benazzo, Botha and de Swart2009) in regard to the Basic Variety. If the subsequent elaboration of pidgins is to be understood as a process whereby grammatical competence develops out of lexical competence, then it arguably does not differ in substance from grammaticalization. For Botha (Reference Botha, Botha and Knight2009: 103), the grammaticalization window of Heine and Kuteva (Reference Heine and Kuteva2007) runs afoul of the Pertinence Condition. These authors nowhere draw the proper distinction between the phylogenetic evolution of language as a biological phenomenon and the nonphylogenetic change of individual languages as a cultural phenomenon. Furthermore, Botha (Reference Botha2016: 153–59) concludes that the uniformitarian assumptions of Heine and Kuteva (Reference Heine and Kuteva2007), adopted in Roberge (Reference Roberge, Botha and de Swart2009: 116), are insufficient as warrants.

If it is possible to “see” something of a lexical, concatenating protolanguage through the window of incipient pidgins, and if the subsequent transition to an unbounded system of hierarchically structured expressions is an aspect of the cultural evolution of humans with language-ready brains (see Arbib Reference Arbib2017), then one component of a bridge theory could follow from the observation that “the selection pressures driving evolution from one stage to the next, can be related to the increasing complexity of proto-human society” (Johansson Reference Johansson2005: 239). Mithen (Reference Mithen, Botha and Knight2009) conjectures that early humans must have had a need to communicate with strangers and to talk about strange things. With higher frequencies of contact came an increased need to exchange information with individuals in nonlocal contexts and the establishment of long-distance exchange communicative networks. Pidgins are born in communicative exigency. The pressures driving the stabilization and expansion of pidgins can be related to the increasing domains of communication and to the need for predictability and learnability.

5.7 Conclusion

In this chapter – within the limits of the space allotted to me – I have chronicled the creole and pidgin windows on language evolution through their entire arc. Few would deny the fecundity of Bickerton’s ideas or their profound impact on creolistics. His Language Bioprogram Hypothesis provoked a farrago of research in the course of the 1980s and 1990s, the bulk of it in opposition, to be sure. But engagement with such a controversial hypothesis of creole genesis, however reactive, did move the field forward (see Veenstra Reference Veenstra, Kouwenberg and Singler2008: 219–20, 235; Drechsel Reference Drechsel2019: 192).

As for Bickerton’s contribution to evolutionary linguistics, Hurford (Reference Hurford2015: 485) offers a fair assessment: “[Bickerton] has managed to construct an edifice worth taking seriously in its broadest outlines, even though some of its argumentative foundations are questionable or erroneous.” Bickerton (Reference Bickerton, Tallerman and Gibson2012: 463–64) was well aware of the need for caution about modern phenomena serving as windows on language evolution, given that the circumstances in which contemporary phenomena exist are very different from those that prevailed perhaps some hundreds of thousands of years ago. Nevertheless, his adherence to a creole window on the earliest form of human language displayed an unrelenting freedom from doubt on a substantive issue that posterity does not share. Its popularity, as per DeGraff (Reference DeGraff2020: e297), is certainly overstated. At present, the pidgin window on language evolution that began with Bickerton and Givón still holds heuristic promise. But if it is not to shrink to a balistraria, much more work remains to be done.

6 Roots of Syntax: Anaphora and Negation in Creoles

6.1 Introduction

In 1967, standing outside a working-class bar in Guyana, not far from Georgetown, Derek Bickerton heard the local creole for the first time.Footnote * That experience was to change his life and have a major impact on the field of linguistics.

Almost from the outset, Bickerton fixated on the question of how and why creoles the world over were structurally so similar. Looking back years later, he formulated the puzzle as follows:

Bickerton’s ProblemFootnote ¹

How could creoles in different parts of the world be so similar to each other and so different from the grammars of the languages around them?

(e.g., 2008: 109)

In the early 1980s, the pursuit of an answer to this question led Bickerton to the Language Bioprogram Hypothesis, which he first outlined in his seminal book, Roots of Language (Bickerton Reference Bickerton1981).

The [Language Bioprogram Hypothesis] claims that the most cogent explanation [for the properties of creoles] is that [they] derive from the structure of a species-specific program for language, genetically coded and expressed, in ways still largely mysterious, in the structures and modes of operation of the human brain.

(Bickerton 1984: 173)

This proposal marked a sharp break with previous work on creoles, bringing to the fore the same logic that underlies the case for Universal Grammar in the generative tradition. Indeed, Bickerton later noted (Reference Bickerton2014: 224) that “Chomskyan UG was the stimulus for the [Language Bioprogram] without which [it] could not even have been hypothesized.” Similar ideas continue to be prominent in the field of creole studies, thanks in part to Bickerton’s pioneering conjecture.

The line of inquiry that I will pursue in this chapter has exactly the same goal that Bickerton set for himself, which is to understand why creoles have the particular properties that they do – a question that ultimately leads to a search for the roots of syntax. However, I adopt a fundamentally different starting point, which is that the syntax of creoles – and of language in general – is shaped by processing pressures. This idea is most often embedded within the set of beliefs and assumptions that define emergentist approaches to language, particularly those developed by Hawkins (Reference Hawkins2004, Reference Hawkins2014) and O’Grady (Reference O’Grady2005, Reference O’Grady, MacWhinney and O’Grady2015, Reference O’Grady2022).

In pursuing this line of thinking, I take up a proposal put forward in skeletal form by Elizabeth Bates, a founding pioneer of linguistic emergentism, in her comments on Bickerton’s (Reference Bickerton1984) paper in Behavioral and Brain Sciences.

[Bickerton] concludes that the computational facts must stem from an autonomous and language-specific genetic base. There is another possibility: Just as the conceptual components of language may derive from cognitive content, so might the computational facts about language stem from … the multitude of competing and converging constraints imposed by perception, production, and memory for linear forms in real time.

(Bates 1984: 189–90)

Bickerton’s rebuttal was that there are no obvious “commonsense explanations” for the facts he had in mind (Reference Bickerton1984: 215) and that “alternatives to the bioprogram were so vague as to be contentless” (p. 216). These were, I believe, entirely reasonable points at the time, when emergentism was still in its infancy. This chapter offers a more precise characterization of emergentist alternatives to the bioprogram and its UG-based successors by presenting two case studies, one involving anaphora and the other focused on negation. My goal will be to show that an emergentist perspective provides insights into the syntax and acquisition of these phenomena in creole and non-creole languages alike.

6.2 Anaphora

Among the many phenomena that have attracted the attention of linguists over the past half-century, none has garnered more general interest than anaphora:

anaphora has not only become a central topic of research in linguistics, it has also attracted a growing amount of attention from philosophers, psychologists, cognitive scientists, and artificial intelligence workers … [It] represents one of the most complex phenomena of natural language, which, in itself, is the source of fascinating problems …

(Huang 2000: 1)

The classic example of anaphora involves reflexive pronouns, whose interpretation is typically determined by an expression (usually dubbed the ‘antecedent’) elsewhere in the same sentence.

(1) Marvin disguised himself. (himself = Marvin)

As a first and informal approximation, two long-standing generalizations help define the syntax of anaphora (e.g., Falk Reference Falk2006: 62; Jespersen Reference Jespersen1933: 111; Pollard & Sag Reference Pollard and Sag1992: 266; Sag & Wasow Reference Sag and Wasow1999: 149).

(i) The antecedent must be a co-argument of the reflexive pronoun – that is, the two must be arguments of the same predicate. Thus, the following sentence is unacceptable, since the intended antecedent (Marvin) is an argument of say whereas himself is an argument of disguise.

(2) *Marvin said [I disguised himself].

(ii) The antecedent must be in some sense more ‘prominent’ than the reflexive pronoun, as evidenced by the fact that an agent can serve as antecedent for a patient in a transitive clause, but not vice versa.

(3) Marvin disguised himself.

(4) *Himself disguised Marvin.

Various proposals have offered independent evidence for the relative prominence of the agent, based on its status as the starting point of an event:

… the agent is at the head of the causal chain that affects the patient.

(Kemmerer 2012: 50; also Bornkessel-Schlesewsky & Schlesewsky 2009: 41)

… the agent drives the event to take place to begin with, [and] becomes the point of attention or anchor for the upcoming information …

(Cohn & Paczynski 2013: 75)

As we will see next – first for languages in general and then for creoles – this approach to the workings of anaphora provides important insights into the roots of syntax.

6.2.1 The Roots of the Syntax of Anaphora

Consistent with the two generalizations outlined above, coreference can be computed at minimal cost if the reflexive pronoun takes its reference from a prior co-argument in a clause’s semantic representation, as depicted schematically below.

PRED
<arg … arg>
antecedent anaphor

The relevant interpretive algorithm can be formulated as follows. (α = the antecedent; x = uninterpreted reflexive pronoun)

The Anaphor Algorithm
PRED
<α x>
↳α

The following sentence offers a concrete example.

(5) Donald’s friend said that [Mickey scratched himself].

At the point at which the processor encounters the reflexive pronoun, a search for its referent is triggered. As required by the Anaphor Algorithm, the search is resolved immediately and locally within the argument structure of the verb scratch, selecting the co-argument Mickey as the antecedent and ignoring the NPs in the matrix clause.

(6)
Donald’s friend said that [Mickey scratched himself].
<m x>
↳ m (m = Mickey)
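The resolution step depicted in (6) can be sketched in code. What follows is a minimal illustration in Python, not an implementation drawn from the processing literature. The names Predicate, REFLEXIVES and resolve_reflexive are hypothetical, arguments are assumed to be listed in order of prominence (agent first), and agreement features such as person and gender are ignored.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical inventory of reflexive forms, for illustration only.
REFLEXIVES = {"myself", "himself", "herself", "itself", "themselves"}


@dataclass
class Predicate:
    """A predicate with its arguments, ordered by prominence (agent first)."""
    name: str
    args: List[str]


def resolve_reflexive(pred: Predicate) -> Optional[str]:
    """Return the antecedent of the first reflexive among pred.args.

    The search never leaves the predicate's own argument list, so NPs in a
    higher clause are not considered. Returns None when there is no
    reflexive or no prior co-argument.
    """
    for position, arg in enumerate(pred.args):
        if arg in REFLEXIVES:
            prior = [a for a in pred.args[:position] if a not in REFLEXIVES]
            # Simplifying assumption: take the most prominent prior co-argument.
            return prior[0] if prior else None
    return None


# (5)/(6): only the embedded predicate 'scratch' is consulted.
scratch = Predicate("scratch", ["Mickey", "himself"])
print(resolve_reflexive(scratch))

# (4): the reflexive precedes its would-be antecedent, so nothing is found.
disguise = Predicate("disguise", ["himself", "Marvin"])
print(resolve_reflexive(disguise))
```

Because the function inspects a single argument list, the matrix-clause NP Donald’s friend in (5) is never a candidate, which is the locality effect that the Anaphor Algorithm is meant to capture.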

The end result – instant interpretation of the reflexive pronoun – has been independently documented in the experimental literature.

Reflexive binding relations … have been found to be established extremely quickly during processing.

(Cunnings, Patterson & Felser 2014: 51)

… from the earliest measurable point in time referential interpretations are determined by constraints [on coreference] in tandem with other sources of information.

(Clackson, Felser & Clahsen 2011: 140; also O’Grady 2005: 167–69)

A Note on Acquisition

A further feature of the syntax of anaphora is its early mastery in the course of first-language acquisition.

Children display adultlike comprehension of sentences including reflexives from about 3 years and produce such sentences spontaneously from about 2 years. Children … can compute the local domain and, within this, determine the antecedent.

(Guasti 2002: 290)

This is a quite remarkable achievement, especially in light of the very limited exposure that children receive to reflexive pronouns in speech that is directed to them. The data in Table 6.1 comes from a search of the CHILDES corpora of maternal speech to Adam, Eve and Sarah. The samples consist mostly of hour-long bi-weekly child–caregiver interactions over a period of many months: from 2;3 to 5;2 for Adam, from 1;6 to 2;3 for Eve and from 2;3 to 5;1 for Sarah.

Table 6.1 Number of reflexive pronouns in maternal speech

	himself	herself	itself	themselves
Adam	15	1	4	2
Eve	1	0	4	1
Sarah	4	3	4	1
Total	20	4	12	4

Of the 40 reflexives that were uncovered, 19 simply expressed the meaning ‘alone,’ as in by himself, by itself and so on. Examples that illustrate the classic subject–complement pattern (He hurt himself) are thus remarkably sparse for a phenomenon as important as anaphora.
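The arithmetic behind these figures can be checked directly. The sketch below re-enters the counts from Table 6.1 (the variable names are illustrative) and subtracts the 19 ‘alone’ uses. The remainder should be read as an upper bound on the classic pattern, since the counts reported here do not say how the other tokens divide up.

```python
# Counts copied from Table 6.1 (maternal speech to each child).
table = {
    "Adam":  {"himself": 15, "herself": 1, "itself": 4, "themselves": 2},
    "Eve":   {"himself": 1,  "herself": 0, "itself": 4, "themselves": 1},
    "Sarah": {"himself": 4,  "herself": 3, "itself": 4, "themselves": 1},
}

forms = ["himself", "herself", "itself", "themselves"]

# Column totals, corresponding to the "Total" row of the table.
totals = {form: sum(row[form] for row in table.values()) for form in forms}
grand_total = sum(totals.values())

# 19 tokens had the 'alone' meaning (by himself, by itself, and so on).
ALONE_USES = 19
remaining = grand_total - ALONE_USES

print(totals)       # per-form totals
print(grand_total)  # all reflexives in the three samples
print(remaining)    # tokens that could illustrate the classic pattern
```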

A quite similar state of affairs has been documented for Japanese. Orita et al. (Reference Orita, Oho, Feldman and Lidz2021) report that there were just 49 instances of the Japanese reflexive pronoun zibun in 40,412 utterances produced by a mother in speech samples collected when her child was between 2;11 and 5;00 years of age. Moreover, as in the case of English, many occurrences of the pronoun (32 in all) had a non-reflexive meaning, similar to English ‘alone’ or ‘by himself/herself.’ Yet, as Orita et al. go on to show in a comprehension experiment involving 48 Japanese children aged 4;05 to 6;02, the young learners invariably selected a more prominent co-argument as antecedent for zibun.Footnote ²

How then do children learning English, Japanese and other languages master the essentials of anaphora in such a short time? In my view, the answer is simple: the interpretation of reflexive pronouns is guided by a natural impulse to favor operations that minimize processing cost. The consequence of that impulse is the selection of a prior co-argument as antecedent, consistent with the Anaphor Algorithm.

Also striking is the fact that children learning English sometimes err in the interpretation of plain pronouns such as him and her, which are almost 100 times more frequent than reflexive pronouns in the input. Intriguingly, the error involves treating plain pronouns as if they were reflexive pronouns (e.g., Chien & Wexler Reference Chien and Wexler1990; Conroy et al. Reference Conroy, Takahashi, Lidz and Phillips2009; Van Rij et al. Reference Rij, Jacolien and Hendriks2010). Thus, in an experimental setting, young children are prone to (wrongly) agree that the sentence The penguin hit him accurately describes the picture in Figure 6.1.

Figure 6.1 Sample picture from Van Rij et al. (Reference Rij, Jacolien and Hendriks2010: 749).

Source: Reproduced with permission of The Licensor through PLSclear.

In contrast to reflexive pronouns, whose referent can be determined immediately and locally at very minimal cost, the interpretation of plain pronouns is often more wide-ranging and hence more demanding from a processing perspective. Not infrequently, the search for an antecedent extends beyond the sentence and may even require attention to nonlinguistic cues (as when a speaker nods in the direction of someone and says “I know her”). Evidently, the lure of reduced processing cost can lead to overuse of the Anaphor Algorithm by young language learners, resulting in the interpretation of plain pronouns as if they were reflexives.

6.2.2 Anaphora in Creoles

Haspelmath et al. (Reference Haspelmath, the APiCS Consortium, Michaelis, Maurer, Haspelmath and Huber2013b) report the widespread use of reflexive pronouns in creoles, typically in ‘situations in which a nonsubject participant is coreferential with the subject participant’ – the very pattern on which we have been focusing. The seventy-one languages for which there is data in the Atlas of Pidgin and Creole Structures form their reflexive pronouns in a variety of ways, as the following examples help illustrate.

Compound reflexive pronoun – 43 languages

(7)
Morisyen (Adone Reference Adone2012: 82–83)
Ti Zako fin grat li-mem.
little monkey asp scratch 3sg-refl
‘The monkey scratched himself.’

Body-part reflexive pronoun – 35 languages

(8)
Haitian Creole (Fattier Reference Fattier1998: 860)
Li tiye tèt li.
3sg kill head poss.3sg
‘He killed himself.’ (Lit. He killed his head.)

Reflexive marking on the verb – 5 languages

(9)
Sri Lanka Portuguese (Smith Reference Smith1974–75: ex. 41-14)
Eli jaa-cucaa-taam faaka vɔɔnda.
3sg.m pst-stab-refl knife with
‘He stabbed himself with a knife.’

A striking feature of many creoles, noted by Muysken and Smith (Reference Muysken, Smith, Adone and Plag2011: 50), is that the form of their reflexive pronouns tends not to be directly inherited from the colonial lexifier:

neither substrate nor superstrate can in themselves offer an acceptable explanation of more than a small part of [this phenomenon], morphologically speaking.

(p. 54)

Nonetheless, despite their variability and innovation, the reflexive forms found in creole languages comply with the interpretive constraints typical of anaphora the world over: they select a more prominent co-argument as their antecedent. Why should this be?

A straightforward explanation suggests itself while at the same time shedding light on the roots of syntax. Put simply, as previously noted, the properties of anaphora are shaped by processing pressures that favor the rapid resolution of referential dependencies. Regardless of its morphology, a typical reflexive pronoun therefore looks to a prior co-argument for its reference, in accordance with the Anaphor Algorithm.

Anaphora in creole languages appears to manifest another telling property. In a series of experiments on children’s acquisition of the French-based creoles Morisyen (spoken in Mauritius) and Seselwa (the Seychelles), Adone (Reference Adone2012: 104) reports earlier mastery of reflexive pronouns than of plain pronouns in the sorts of experiments described earlier for English:

The results obtained in this study with Seselwa-speaking children … and the results with the Morisyen-speaking children … complement each other and point towards one direction, namely that creole-speaking children behave in a similar way when compared with children who speak other languages. They bind the reflexives locally, and with respect to pronouns, the Seselwa-speaking children also seem to [associate] the pronoun with a local antecedent.

The parallel with the facts for English is both welcome and telling – and not at all surprising. Children learning a creole are no more able to escape the effect of processing pressures than children learning English or any other language.

6.2.3 Summary

The most widely attested properties of anaphora can be traced to the roots of syntax, which consist of processing pressures that, in the case at hand, help minimize the cost of resolving referential dependencies. The end result is a remarkable similarity across languages of all types, both in the syntax of anaphora and in the developmental trajectory that underlies its acquisition. Put simply, processing pressures – not Universal Grammar – constitute the blueprint that shapes languages of all types.

We turn next to a very different subsystem of language, the syntax of negation. The choice is deliberate, since it creates the opportunity to consider a phenomenon that is fundamentally different from anaphora in both its morphosyntax and its communicative function. Yet, as we will see, its properties help confirm the importance of processing pressures in understanding the workings of syntax.

6.3 Negation

Negation is a crucial component of language and, more generally, of human cognition itself. “Negation is what makes us human, imbuing us with the capacity to deny, to contradict” (Horn Reference Horn and Horn2011: 1). The syntax of negation is striking in its intricacy and variation. I will focus here on a particularly puzzling phenomenon involving sentences that contain two instances of negation – a sentential negative and a negative pronoun. I will begin by considering the key properties of this pattern in non-creole languages before turning my attention to their manifestation in creoles.

6.3.1 The Roots of the Syntax of Negation

The most common strategy for expressing negation in English makes use of the sentential negative not, which has the effect of denying the occurrence of a particular event.

(10) Jane didn’t buy a Tesla. (= ‘There is no buying event involving Jane and a Tesla.’)

A second strategy expresses negation with the help of a negative pronoun that denotes a null set. As illustrated below, these items negate an event by indicating that (for example) its agent or patient denotes a null set and therefore has no referent. As a result, the occurrence of a buying event is negated.

(11) No one bought a Tesla.

(12) Jane bought nothing.

Under certain circumstances, both sentential negatives and negative pronouns can co-occur in the same sentence. At least two types of patterns can be identified.

One pattern, dubbed ‘double negation,’ involves sentences such as the following, in which the two negatives cancel each other out. The end result, as illustrated in the example below, is an interpretation that can be paraphrased as ‘I did something.’

(13)
A: Once again, you did nothing.
B: I didn’t do nothing – I washed the car!
(not nothing = something)

As Zeijlstra (Reference Zeijlstra2004: 58–59) and many others have noted, double negation is a quite rare occurrence, largely restricted to situations in which the speaker wishes to deny a negative claim made by someone else, as in the previous example.

A second type of pattern produces a very different result: the two negatives combine to yield a single negative interpretation.

(14)
Middle English (Williams Reference Williams1975: 280)
   Ne taketh nothing.
   not take nothing
   ‘Take nothing.’/‘Don’t take anything.’

(15)
Nonstandard Modern English
I didn’t do nothing.
‘I did nothing.’/‘I didn’t do anything.’

(16)
Formal French
   George ne voit personne. (compare: *George voit personne.)
   George neg see nobody
   ‘George sees nobody.’

Patterns of this type are treated as instances of ‘negative concord.’ Although stigmatized in English, they are standard in many languages, creating a familiar puzzle.

If the meaning of [negative-concord sentences] is equivalent to that of the structure with a single negation …, why do we need them? Negative concord appears to be seriously redundant.

(Giannakidou & Zeijlstra 2017: 10; also Zeijlstra 2016: 4–5)

… because it seemingly defies logic, negative concord, a linguistic construct in which multiple negatives lead to a single negative interpretation, unlike mathematical logic, has generated an enormous body of work from a variety of perspectives in syntax, semantics, and comparative as well as historical linguistics.

(Déprez & Henri 2018b: 1)

There is, I believe, a way to make sense of negative concord and to trace its roots to the processing pressures that shape language in general. The key principle can be stated as follows.

Stability

Processing and learning are facilitated when a form’s default interpretation is stable throughout the sentence.

Negative concord and double negation differ from each other in an interesting way in this regard.

In the double-negative reading of I didn’t do nothing, the default null-set interpretation of nothing is modified by its interaction with not, resulting in a ‘something’ interpretation. This is an instance of the phenomenon known as ‘scope,’ in which one logical operator influences the interpretation of another. In the case at hand, the scopal computation and its effect can be depicted as follows.

Double negative interpretation (= ‘I did something.’)

As illustrated here, the simple null-set interpretation of nothing has to be reworked under the influence of the sentential negative in violation of Stability – at a cost that arguably contributes to the previously noted rarity of these patterns.

In the case of negative concord, in contrast, neither negative has a semantic effect on the other.Footnote ³ Instead, each simply carries out the function that it has in sentences where it is the sole negative.

(i) The sentential negative signals the non-occurrence of an event, just as it does in sentences such as I didn’t go.
(ii) The negative pronoun has the same null-set interpretation that it manifests in contexts without negative concord (e.g., What did you see? Nothing!).

Negative concord interpretation (= ‘I did nothing.’)

On this view then, negative concord enjoys two processing-related advantages. On the one hand, patterns of this type comply with Stability, since the negative pronoun retains its default null-set interpretation, escaping any interaction with the sentential negative. At the same time, negative concord patterns manifest maximal uniformity in the use of negators in general: both sentential negation and negative pronouns carry out the same function that they have in sentences where there is just one negation. I will refer to this as ‘consistency.’

Consistency

Processing and learning are facilitated when a form has a uniform function across sentences.
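The contrast between the two readings of I didn’t do nothing can be made concrete with a toy calculation. In the sketch below, which is an illustration rather than a semantic analysis, each negator is reduced to a bare token. Under negative concord the negators do not interact, so the clause counts as negative as soon as one is present; under double negation each negator reverses the polarity set by the other. The function names are hypothetical.

```python
from typing import List


def concord_is_negative(negators: List[str]) -> bool:
    """Negative concord: negators do not interact.

    Each negator independently signals negation, so one or many
    yield the same single-negative interpretation.
    """
    return len(negators) > 0


def double_negation_is_negative(negators: List[str]) -> bool:
    """Double negation: each negator takes scope over the next.

    Polarity flips once per negator, so an even number cancels out.
    """
    return len(negators) % 2 == 1


sentence = ["not", "nothing"]  # I didn't do nothing.

print(concord_is_negative(sentence))          # True:  'I did nothing.'
print(double_negation_is_negative(sentence))  # False: 'I did something.'
```

On this way of putting things, only the second function has to recompute the contribution of nothing in light of not, which is the extra step that Stability disfavors.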

Although Stability and Consistency contribute to an explanation for the vitality of negative concord in the world’s languages, this cannot be the whole story, as we will see next.

Competing Pressures

Stability and Consistency are just two of numerous pressures that contribute to the character of language. In some cases, these and other pressures compete with each other for dominance – a phenomenon that is well documented in the literature (e.g., MacWhinney et al. Reference MacWhinney, Malchukov and Moravcsik2014). The syntax of negation manifests at least two examples of competition involving Consistency.

The first involves a force that Hawkins (Reference Hawkins2014: 15) dubs “Minimize Forms,” which can be paraphrased as follows:

Processing linguistic forms along with their meanings and grammatical properties requires effort. Minimizing the number of forms can reduce this effort.

Minimize Forms conflicts with Consistency in the case of negative concord patterns, since the presence of negation is expressed twice – once by the sentential negative and once by the negative pronoun. Tellingly, some languages have sought to eliminate this redundancy.

English is a case in point. Whereas earlier varieties of English had negative concord, negative pronouns in standard Modern English are ‘self-licensing,’ thereby eliminating the redundancy that is associated with the consistent use of a sentential negative for any utterance expressing negation.

(17)

Middle English (negative concord) (Chaucer Troilus 1 203; cited by Fischer et al. Reference Fischer, van Kemenade and Koopman2001: 86)
Ther	nys	nat oon	kan	war	by other be.
There	not-is	not one	can	aware	by other be.

(18)
Modern English version (stand-alone negative pronoun)
There is no one who can learn from the mistakes of others.

A different approach can be seen in languages that manifest so-called ‘non-strict’ negative concord, which requires a sentential negative only when the negative pronoun is postverbal (Zeijlstra Reference Zeijlstra2016 and Giannakidou & Zeijlstra Reference Giannakidou, Zeijlstra, Everaert and van Riemstijk2017). The following examples are from Spanish.

Postverbal negative pronoun is licensed by a sentential negative:

(19)
Juan no vió nadie.
Juan neg see nobody
‘Juan saw no one.’

Preverbal negative pronoun is self-licensed:

(20)
Nadie salió.
nobody left
‘No one left.’

In this instance, Consistency competes with a second processing preference, “Maximize Online Processing,” proposed by Hawkins (Reference Hawkins2014: 28), which can be paraphrased as follows:

There is a preference for arranging forms so as to provide the earliest possible access to as much of the relevant syntax as possible.

The relevance of this factor in the case of negative concord stems from the fact that a negative pronoun negates a sentence, triggering the need for a sentential negative to comply with Consistency. In patterns where the negative pronoun is postverbal, the requirement can be immediately satisfied by the already available sentential negative. The following example is from French.Footnote ⁴

Postverbal negative pronoun in French:

(21)

In contrast, the occurrence of a preverbal negative pronoun triggers a search that cannot be resolved until a later point in the sentence – a state of affairs that falls short of optimal online processing.

Preverbal negative pronoun in French:

(22)

As we have seen, Spanish and other languages with non-strict negative concord forgo the sentential negative where it is not immediately accessible to the negative pronoun, thereby facilitating online processing.
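The licensing patterns discussed so far can be summarized as a small decision rule. This is only a schematic restatement of the description above, written in Python. The labels STRICT (a system that always requires the sentential negative), NON_STRICT and SELF_LICENSING, along with the function needs_sentential_negative, are hypothetical names, and the rule abstracts away from the language-internal variation that real systems show.

```python
from enum import Enum


class Position(Enum):
    PREVERBAL = "preverbal"
    POSTVERBAL = "postverbal"


class System(Enum):
    STRICT = "sentential negative always required"
    NON_STRICT = "sentential negative required only with postverbal pronouns"
    SELF_LICENSING = "negative pronoun stands alone"


def needs_sentential_negative(system: System, pronoun: Position) -> bool:
    """Say whether a negative pronoun must co-occur with a sentential negative."""
    if system is System.STRICT:
        # Consistency wins: negation is always marked on the clause.
        return True
    if system is System.NON_STRICT:
        # The sentential negative is kept only where it is already
        # available when the negative pronoun is encountered.
        return pronoun is Position.POSTVERBAL
    # Minimize Forms wins: the negative pronoun suffices on its own.
    return False


# Spanish (19): postverbal nadie requires no.
print(needs_sentential_negative(System.NON_STRICT, Position.POSTVERBAL))
# Spanish (20): preverbal nadie is self-licensed.
print(needs_sentential_negative(System.NON_STRICT, Position.PREVERBAL))
# Standard Modern English (18): no sentential negative.
print(needs_sentential_negative(System.SELF_LICENSING, Position.POSTVERBAL))
```

Note that no branch returns True for preverbal pronouns alone, in keeping with the absence of languages in which negative concord occurs only with preverbal negative pronouns.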

The presence of competing pressures creates diversity in the syntax of negation that is incompatible with the oft-stated conjecture that a “Martian scientist might reasonably conclude that there is a single human language” (e.g., Chomsky Reference Chomsky2000: 7). Still, competition does not license mayhem. There is always a winner; one processing pressure or another ends up imposing limits on what is possible in any particular language. That is why there are no languages in which reflexive pronouns do not permit co-argument antecedents – and no languages in which negative concord occurs only with preverbal negative pronouns.

A Note on Acquisition

Turning now to language acquisition, two findings are particularly suggestive. The first involves children’s use of negation in their own speech. In pioneering work, Bellugi (Reference Bellugi1967) found that one of the three English-speaking children whom she studied (Adam) regularly produced negative-concord sentences even though he had not been exposed to such patterns in the speech of his parents.

Samples of negative concord in spontaneous speech by Adam; see also Hein et al. (2022):

(23) I didn’t do nothing. (file 63, age 3;5)

(24) I didn’t call him nothing. (file 72, age 3;8)

(25) Because nobody didn’t broke it. (file 107, age 4;5)

A second suggestive finding is reported by Thornton et al. (Reference Thornton, Notley, Moscati and Crain2016) in their experimental study of children’s interpretation of patterns such as the following:

(26) The girl who skipped didn’t buy nothing.

A key feature of these sentences is their potential ambiguity. On a negative-concord reading, they have the interpretation ‘The girl bought nothing’; on a double-negative reading, they mean ‘The girl bought something [i.e., not nothing].’

Using pictures and a story-book context, Thornton et al. found that the twenty children (aged 3 to 5) in their study systematically opted for the negative-concord interpretation of the test sentences. That is, they took The girl didn’t buy nothing to mean ‘she bought nothing’ rather than ‘she bought something.’ This preference reflects the effects of Consistency and Stability.

• The sentential negative retains its usual function of denying the occurrence of a particular event. (Consistency)
• The negative pronoun maintains its default null-set denotation. (Stability)
• The two negatives do not interact with each other; there is no scopal operation.

(27) Negative-Concord interpretation

On the ‘double-negative’ reading, in contrast, an additional computation is required to yield the ‘something’ interpretation that comes from the interaction of not with nothing.

(28) Double-negative interpretation

Once again, we see in processing pressures an explanation for fundamental facts about the syntax of negation and the manner in which it emerges in the course of development.Footnote ⁵

6.3.2 Negation in Creoles

Haspelmath et al. (Reference Haspelmath, Michaelis, Maurer, Haspelmath and Huber2013a) report a ‘massive preference’ for negative concord in creoles, confirming a conjecture first noted by Bickerton (Reference Bickerton1981: 65). Indeed, by their count, 59 of the 73 languages in the Atlas of Pidgin and Creole Structures manifest the strict variety of negative concord,Footnote ⁶ which requires sentential negation even when the negative pronoun is preverbal. The following example is from Papiamentu, a primarily Portuguese- and Spanish-based creole.

Postverbal negative pronoun (Unilang n.d.):

(29)
Mi no tin nada.
I neg have nothing
‘I have nothing.’

Preverbal negative pronoun (Kouwenberg Reference Kouwenberg, Michaelis, Maurer, Haspelmath and Huber2013):

(30)
Ningun di nan no a laga nada lòs tokante nan plan.
no.one of 3pl neg pfv let nothing loose about 3pl plan
‘Not one of them has revealed anything about their plan.’

As Déprez and Henri (Reference Déprez, Henri, Déprez and Henri2018a: 313) note, the predominance of strict negative concord in Romance-based creoles is quite striking, given that Portuguese and Spanish adopt the less stringent practice of employing a sentential negative only when the negative pronoun is postverbal.

Also remarkable is the fact that English-based creoles, too, manifest a strong predilection for strict negative concord.

(31)
San Andres Creole English (Bartens Reference Bartens, Michaelis, Maurer, Haspelmath and Huber2013)

a. Postverbal negative pronoun:
Mi neva sii nonbady kom.
Mi neg.pst see nobody come
‘I didn’t see anybody come.’

b. Preverbal negative pronoun:
Nonbady no waahn daans wid Taiga.
nobody neg want dance com Tiger
‘Nobody wanted to dance with Tiger.’

As Bickerton notes (Reference Bickerton2008: 141–42), this pattern of negation differs from the practice followed even in nonstandard varieties of English.Footnote ⁷

So-called “double-negatives” [in dialects of English] always involve negative concord of verb and object (They don’t know nothing, I never told that to nobody), and never negative concord of verb and subject, as in Nobody don’t like me. But negative concord of verb and subject is found in many creoles; in [the English-based creole of] Guyana, for instance, you can’t say [the equivalent of] A dog didn’t bite me or No dog bit me; you have to say:

None dog na bite me.

Further subtleties may well be in play when it comes to the fine syntax of negative concord. As a number of scholars have noted (e.g., van der Auwera & Van Alsenoy Reference Auwera, Johan and Van Alsenoy2016; Déprez, Chapter 12, this volume), not only is non-strict negative concord less common than its strict counterpart in creoles and other languages, it also manifests more internal variation. A particularly interesting example of this variation is noted by Déprez (Chapter 12): in a number of creoles, including Haitian and Mauritian, a sentential negative is not required when the preverbal negative pronoun is complex, as in the second example below from Haitian.

Simple subject; sentential negative required (Déprez Reference Déprez, Ziegler and Bao2017: 82):

(32)
[_subj Pèsonn] pa vini
no one neg come
‘No one came.’

Complex subject; no sentential negative (Déprez, Chapter 12, this volume):

(33)
[_subj Pèsonn ki te wè sa] a ta rapòte li.
no one who pst see that cond report it
‘No one who would see that would report it.’

As can be seen here, the presence of a relative clause in the second pattern increases the distance between the negative pronoun and the position in which the sentential negative would normally occur (after sa) – a substantial escalation in processing cost. Interestingly, under these circumstances, the need for the sentential negative seems to be relaxed.

Finally, it is worth noting that, whatever the apparent complexities and intricacies of negative concord are, its acquisition does not appear to be problematic. Drawing on spontaneous speech samples from twenty-one children aged 1;9 to 5;4 learning Mauritian Creole, Adone (Reference Adone1994: 110) reports that “negation is acquired essentially without any errors.” De Lisser and Durrleman (Chapter 7, this volume) make a very similar observation about the acquisition of Jamaican Creole. Their study of the spontaneous speech of six children aged 18 to 23 months revealed mastery of “the rules giving negation from [the] earliest negative utterances.” These findings fit well with parallel reports from English and other languages on the ease with which young children are attracted to negative concord, even in the absence of exposure to such constructions in the speech of their caregivers.

An emergentist approach to the syntax of creoles offers an elegant and satisfying way to make sense of these facts. Negative concord is the pattern of choice in creole languages and in early child language for the reasons noted in 6.3.1: a commitment to Stability and Consistency reduces the cost of computing negation.

6.3.3 Summary

Like anaphora, the syntax of negation has its roots firmly embedded in the network of processing pressures that shape natural language. Although these factors can produce more than one result, as previously observed, they all converge on the goal of controlling the costs associated with the expression and interpretation of negation, especially in cases involving two negators. The parallels involving typology, creole formation and the trajectory of language acquisition are particularly striking in this regard, suggesting that a genuine insight into the character of the human language faculty may be at hand.

6.4 Concluding Remarks

As noted at the outset, Bickerton’s Problem focuses on a fundamental puzzle of which he became aware at an early point in his career.

the more I read about other creoles, the more apparent it became that the similarities I was finding between creoles in Hawaii and in Guyana were far from unique. They were no more than a special case of what was happening throughout the world. Over and over, creoles totally unrelated to one another showed the same rules, the same kinds of grammatical structure.

(Bickerton 2008: 108)

Bickerton’s explanation, like that of most scholars working within the generative tradition, posited an inborn Universal Grammar – a stance whose popularity has endured despite differences of opinion on many other matters, including even the origin of creoles (Bickerton 2014: 22; DeGraff 2009: 888; Mufwene 2008).

I take no position on the issue of creole genesis other than to insist that there is nothing in the human faculty of language that would permit creoles to be fundamentally different from other types of languages – a sentiment eloquently expressed by the English philologist and early creolist William Greenfield almost 200 years ago.

The human mind is the same in every clime; and accordingly we find the same process adopted in the formation of language in every country.

(Greenfield 1830: 5, cited by DeGraff 2003: 402)

The real question, as always, relates to the nature of the process and how it yields the particular results that it does.

According to the proposal outlined in this chapter, the properties of creoles are best understood by reference to the same processing pressures that shape languages in general. I have illustrated this line of thinking by considering key features of two core syntactic phenomena – anaphora and negation, both of which have been well documented in the creole literature. As we have seen, there are good processing-related reasons why creoles exhibit the particular systems of anaphora and negation that they do. Put simply, for creoles, like all languages, the roots of syntax lie in processing pressures, not in a Language Bioprogram or other variety of Universal Grammar.

Although this might not be the solution to Bickerton’s Problem that Bickerton himself favored, it at least fits well with the description of creoles that he laid out in the final words of his autobiography.

Creoles are not bastard tongues after all. Quite the contrary: they are the purest expression we know of the human capacity for language … [They] spring pure and clear from the very fountain of language, and their emergence, through all the horrors of slavery, represents a triumph of all that’s strongest and most enduring in the human spirit.

(Bickerton 2008: 247)


