Modelling non-specific linguistic variation in cognitive disorders

Clinical linguistic diversity extends far beyond ‘specific language’ disorders, such as acquired aphasia or specific language impairment (SLI), to a large range of mental disorders that are not language-specific. As cognitive impairments are involved in the latter, models with an integrated approach to language and cognition can be useful for understanding and classifying the variation in question. The aim of this paper is to specify such a model, called the Bridge model, which views linguistic cognition as resting on two partially pre-linguistic pillars: (i) perceptual categorisation and (ii) social-communicative interaction. Grammar acting as a bridge crossing between them mediates the lexicalisation of perceptual categories and, based on these, new forms of social interaction and communication conveying thought structured by grammar. This model allows to conceptualise mental disorders as different ways in which this integrated linguistic-cognitive phenotype can deviate from its normal course. We illustrate our general model for the specific instance of language variation within autism spectrum disorder (ASD).


INTRODUCTION
Countless neurodevelopmental syndromes, such as Cri du Chat, Coffin-Siris, Rett, Angelman, Landau-Kleffner, X-fragile, or Phelan-McDermid syndromes, imply fundamental changes not only in how cognition develops, but also in language. In [1] Funding is acknowledged from grant PID2019-105241GB-I00/AEI/10.13039/501100011033, Ministerio de Ciencia, Innovación y Universidades (MCIU) and Agencia Estatal de Investigación (AEI) and grant 2017 SGR 1265 (AGAUR); as well as a SSHRC Insight grant awarded to Martina Wiltschko (Social Sciences andHumanities Research Council of Canada 435-2018-1011). We also wish to thank the anonymous Journal of Linguistics reviewers for valuable feedback. this regard any visit to a large, mixed special-education school is likely to reveal a bewildering variety of linguistic phenotypes, both between and within such disorders. A multi-dimensional spectrum of language capacities will open, in a landscape that linguistics has barely begun to chart in theoretical or empirical terms. At a descriptive level, at the bottom of this spectrum, functional language will be seen to be absent altogether, in both production and comprehension and in any sensory modality, as is often the case in several of the above syndromes and in no less than 25-30% of school-age children with autism spectrum disorder (ASD) (Tager-Flusberg & Kasari 2013;Norrelgen et al. 2015;Hinzen et al. 2019;Slušná et al. 2021). There may be other children with primary physical motor speech disabilities, which can leave them unable of any vocal production, while comprehension and writing, and hence language capacities proper, are relatively less affected (as can be the case in children with perisylvian polymicrogyria or Worster-Drought syndromes: see Kuzniecky, Andermann & Guerrini 1993;Christen et al. 2000). Numerous other linguistic phenotypes will appear between and beyond these.
This diversity serves as a stark reminder that language capacities are neither universal nor uniform in humans. Beyond even a basic map of this territory, we are lacking a typology, and, beyond a typology, a theory of why there should be these types. What is clear is that the variation in question differs profoundly from more familiar clinical linguistic profiles as seen in children typically attending normal schools, such as those with dyslexia, where difficulties centre on reading difficulties, children with specific language impairment (SLI)/developmental language disorder (DLD), where the most obvious problems may be the omission of grammatical morphemes in production, or children with Williams syndrome (WS), where speech can be fluent but content is affected. In these other cases, it is often clear that much of the human language capacity is kept.
Where this is not the case, issues of considerable theoretical linguistic interest arise. The neural basis of absence of functional language by school age, in particular, could be a profound source of insight into the organisation of language in the neurotypical brain. In addition, non-linguistic-cognitive impairments are copresent in all or most of the cases of interest, leading to various degrees of intellectual disability (ID) when standardised tests of IQ are applied, including in non-linguistic cognition in most children with ASD who do not develop language (Slušná et al. 2021). As such, every case represents a natural model of how language and cognition relate and disintegrate together.
Our primary aim here is to address a theoretical challenge that this linguistic variation poses. 2 Today, SLI/DLD and acquired aphasia following strokes still largely define our models of what language impairment is, and it is these models in turn that inform current models of language in the brain (Fridriksson et al. 2018). But SLI/DLD may be inappropriate when modelling language dysfunction in nonspecific language disorders, as these disorders, unlike SLI/DLD as viewed traditionally, do not leave cognition intact. This dilemma illustrates a languagecognition divide, which has long formed a cornerstone in both generative linguistics and developmental psychology, but is a questionable starting point where fundamental cognitive changes define a clinical condition from the beginning.
In fact, the same divide has meanwhile become problematic in 'specific' language disorders themselves. Thus, cognitive impairment across multiple domains in acquired aphasia has now been documented in most group studies (Fonseca, Ferreira & Martins 2016;Gonzalez, Rojas & Ardila 2020;Yao et al. 2020), including in cases where left-hemisphere stroke patients with and without aphasia were directly compared (Baldo et al. 2005(Baldo et al. , 2010. General cognitive impairment has been argued to provide explanatory models for aphasia, with the explicit implication that language at a competence level in fact stays INTACT in aphasia, in direct reversal of the traditional conception of aphasia as a primary language impairment. 3 In turn, in SLI/DLD evidence has accumulated that the language impairments in question are not specific to language and could at least in part reflect general cognitive deficits in auditory processing, working memory, procedural learning, processing speed, or motor developmental delays (Miller et al. 2001;Kohnert & Windsor 2004;Leonard et al. 2007;Bishop 2010;Tsimpli, Kambanaros & Grohmann 2017;Schaeffer 2018). A classical case for a cognitive disorder leaving language intact, namely WS, has also long been questioned (Karmiloff-Smith et al. 1997;Brock 2007).
Together, these findings raise the question of whether there is a theoretically or clinically useful sense in which there are specific language disorders and a language-cognition divide is empirically meaningful at all. They also further reinforce the need for integrated theoretical models of language and cognition. That need extends even further, as any visit to a large hospital's adult psychiatric or neurological ward will equally reveal a wide range of language phenotypes, from the fluent but formally disorganised and often unintelligible speech of patients with the symptom of formal thought disorder (McKenna & Oh 2005), to the hallucinated speech of patients with auditory verbal hallucinations (Tovar, Fuentes-Claramonte, et al. 2019), the 'empty' speech of patients with Alzheimer's disease (Nicholas et al. 1985), or the syntactically disorganised utterances in Huntington's disease .
Linguistically oriented work on these phenotypes remains scarce today. Instead, most work addressing this variation has proceeded opportunistically, harnessing natural language processing tools with the practical aim of finding discriminatory linguistic markers of a pathological process (Ahmed et al. 2013;Clarke, Foltz & Garrard 2020;Hitczenko, Mittal & Goldrick 2021). Typical variables used have been the number of utterances produced, speech rate, number or length of pauses, [3] For example, Hula & McNeil (2008: 169) state that 'Language mechanisms are fundamentally preserved and that aphasic language behaviours are instead due to impairments of cognitive processes supporting their construction'. For related claims and approaches, see Bates et al. (1997), Kolk (1998), Linebarger et al. (2007), Grillo (2009), andMirman &Britt (2014). length of utterances in words, type-token ratio, formal-grammatical errors, dependency depth, idea density, lexical diversity, word connectedness as measured by speech graphs, and proportions of different parts of speech. Few if any of these variables connect to current models of linguistic theory or to specific hypotheses of how structures of language relate to the structure of human cognition and thought. Some studies have taken a more linguistic route and profiled deviances along such 'levels' as the 'lexicon', 'syntax', 'semantics', 'phonology' and 'pragmatics' (Covington et al. 2005). Yet these levels, too, are theoretically and practically problematic when featuring in models of pathological language, as most linguistic phenomena of interest will intersect between these putative domains, and both the theoretical 4 and neurobiological 5 validities of these constructs are open to question. We aim to contribute to addressing these problems here by developing a new general model of language and cognition that can inform empirical investigations: the Bridge model, according to which the human language faculty bridges between two pre-linguistic-cognitive abilities, namely perceptual categorisation and social interaction. It is this bridge which defines human-specific thought. Thus, thinking and language are inherently integrated and there cannot be atypical language without atypical thought, and vice versa.
We begin by motivating this model in Section 2 and pointing to independent evidence that grammar is linked to reference and hence provides a basis for thought and its content; and that, in this way, grammar contributes to meaning (crucially including interactive meaning), rather than merely being a system to combine words with meaning. In Section 3, we introduce the Bridge model in detail and in Section 4, we illustrate how it can shed light on linguistic and cognitive diversity within ASD. In Section 5, we conclude.

LINKING LANGUAGE AND COGNITION
Language in its normal use has to be coherently integrated with effectively all other cognitive functions: we talk about what we SEE, HEAR, or TOUCH, we need to REMEMBER words and their meanings, we PLAN what we say, and we say what we [4] There is a growing consensus in linguistic theory that in language the traditional (sub-) domains of linguistics cannot be viewed as isolated from each other. For example, semantic interpretation (beyond lexical semantics) necessarily depends on structure (hence syntax). In turn, pragmatics today defines a domain largely fused with semantics; and difficulties captured descriptively as 'phonological' may be due to underlying syntactic or pragmatic mechanisms. For example, sentence-final rise in English is an intonational tune that might be considered a phonological phenomenon. However, like all prosodic phonology, intonation has to map onto structure and hence syntax is implicated. Moreover, rising declaratives receive a question interpretation unlike declaratives with falling intonation, which are interpreted as assertions. Hence semantics is implicated.
[5] In particular, a 'substantivist' interpretation of grammar such as Wiltschko (2014) is inherently inconsistent with the idea of a level of syntax separable from aspects of meaning (see Section 2.2 for explicit discussion of this view). For evidence against the neurobiological validity of such an 'autonomous' syntax component, see Blank et al. (2016), Matchin et al. (2019), and Blank & Fedorenko (2020).
do based on representing what others KNOW or BELIEVE. Beyond integrating specific cognitive domains such as vision, executive functioning, or theory of mind (ToM), however, language in its normal functioning inherently expresses a particular mode of THOUGHT. Just as language never seems to occur without thought being conveyed in it, the kind of thought conveyed in language never seems to occur without being expressed in language, in some sensory modality (Hinzen 2017). In particular, it cannot be adequately expressed in representational media suited to other forms of thought, such as pictures or music. Since major mental disorders are disorders in the specific thought process expressed in language (not in music, say), it is the relation between language and thought that we ultimately need to thematise here. Many specific cognitive domains, such as memory, executive functions, or ToM, ARE typically affected in neuropsychiatric disorders such as ASD, bipolar disease, or schizophrenia (Boucher & Anns 2018;Sheffield, Karcher & Barch 2018;Thibaudeau et al. 2020). But none of these disorders are likely to reduce to a deficit in any such specific domain. Neurotypical thought necessarily INTEGRATES all of these domains, and the role of language in this integration across cognitive domains needs to be our target here. This is so even when the same question arises for all of the specific cognitive domains separately as well (e.g. the role of language in ToM; see Schroeder et al. 2021).

Expressivism and beyond
One view of the relation between language and thought suggests that the role of language is restricted to the expression or 'externalisation' of an otherwise already independently existing functional thought system. This view, which we may call 'expressivist', is associated with traditional Cartesian-rationalist theories of language (Arnauld & Lancelot [1676] 1966), who saw language as a 'mirror' of the rational structure of thought, viewed as governed by logic and as independently given; and with the contemporary heirs of Cartesian rationalism (Chomsky 1966). This is true, insofar as they view language as encoding independently constituted thoughts, perhaps generated in a 'language of thought' (LoT) that has its own compositional semantics (Pinker & Jackendoff 2005;Fodor 2008); or insofar as they depict language development as 'latching onto' or communicating a world of 'concepts' and 'thoughts' that we already possess (Pinker 1994). An expressivist view in the above sense can equally be attributed to the contemporary cognitivefunctionalist tradition (Tomasello 2003;Levinson 2019), insofar as it regards language as a social construction for conventionally representing 'communicative intentions', whose contents and hence thought as such are again assumed to be independently given. Expressivist views are challenged by the pervasive co-morbidity of language across neurodevelopmental and neuropsychiatric disorders noted above: if thought is independently constituted, such a co-morbidity is not predicted. They are also challenged by evidence reviewed below that language is a critical factor in cognitive development from the very beginning of a human life, mediating species-specific forms of categorisation, learning, attention, memory, and social cognition (Arunachalam & Waxman 2010;Vouloumanos & Waxman 2014;Dehaene-Lambertz, Flo & Pena 2020). In the remainder of this section, we will challenge expressivist views based on a particular interpretation of the traditional insight that, in language, the relation between form (sound) and meaning is mediated by grammar.
We could imagine several ways in which meaning was NOT so mediated. Monkey call systems identifying predators through warning calls (Seyfarth & Cheney 2017) have meaning, but they may resemble innate human vocalisations like crying, grunting, sobbing, or laughter more than speech (Deacon 2006). In our own species, meanings would not be mediated by grammar if meanings are modelled ontologically as propositions viewed as abstract objects (Frege 1956;Katz & Postal 1991), and natural languages are no more relevant to the existence of the meanings they express than the language of arithmetic is to the existence of the numbers referred to in it. For example, the sentence 'Snow is white' would express the proposition that snow is white, but this proposition would exist independently of whether there are sentences or people communicating with language. On another possible view, meaning in human language might be effectively causally controlled by wordobject links, while language is merely an arbitrary way of referring to the objects in question or for conventionalising the link between them and our non-linguistic 'concepts' (Fodor 1998). In both these latter cases, one would not have to look at the structure of the system in which meaning is grammatically configured and conveyed in order to understand the principles on which it was based.
Even if it was true that grammar mediated the relation between form and meaning, it would not follow that the meaning in question depended on grammar for its existence. Grammar could be theoretically unpacked to indicate a function generating binary sets embedding other binary sets ('Merge', Chomsky 2008). In this case, all semantic content would either be lexical or be located on the nonlinguistic side of a semantic interfacewith grammar providing a means for such non-grammatically based meaning (in the form of features) to be recursively combined or for their combination to be restricted. A problem for this view is that there appears to be no restrictive theory of features at present and that all forms of meaning arising at a grammatical level would effectively have to be pre-coded lexically or else simply assumed to exist pre-linguistically. It would be a substantial assumption that the entire range of meanings that can constitute the contents of possible human thoughts would be available independently of languagewith language simply representing the fortunate accident of a conventional way of expressing such meanings in humans.
A different possibility is that, without grammatical forms of organisation of the kind seen in every human language, the kind of meanings conveyed in language would not exist. This would immediately make sense of the empirical fact that whatever cognitive process we dignify with the term 'thinking' in nonverbal species, cognitive phenotypes across species also differ (Penn, Holyoak & Povinelli 2008;Tomasello 2008); and that we seem to never find the same kinds of meanings when language is absent, as in the case of (declarative) referential meaning which is absent in apes (Tomasello & Call 2018), absent in humans when language is absent (Maljaars et al. 2011;Slušná et al. 2018), and highly correlated with language in neurotypical development (Iverson & Goldin-Meadow 2005;Colonnesi et al. 2010).
Pursuing this possibility, Hinzen & Sheehan (2015) develop an 'un-Cartesian' programme, according to which meaning mediated by grammar specifically is REFERENTIAL MEANING: it comes about through embedding lexical concepts in the context of an act of speech as mediated by grammar. Though abstract poetry clearly takes this idea to its limits, in ordinary language use there is no other option of using words than the referential one. Language could not be used like music, say, or as a calculation device. Moreover, and crucially, mere lexical concepts in the sense of isolated content wordshouse, bark, dog, etc.are not referential, since they only capture general classes of objects or events and cannot as such pick out a particular dog or house, as opposed to the dog I saw in that house, or the specific event of his barking I witnessed yesterday.
On this view, then, the human-specific form of meaning qua referential meaning is an aspect of grammatical organisation: it is CONFIGURATIONAL. This makes the view different from a LoT-type view (Fodor 2008), since it hypothesises that grammar is the mechanism that makes thought referential. But it is equally important, on this view, that unlike a monkey call system, reference is internally mediated by a store of lexicalised concepts or content words (i.e. lexically codified semantic memory). These lexical concepts can be activated at any moment of our mental lives for purposes of thought and reference, crucially irrespective of perceptual stimuli occurring in the here and now. This makes such concepts fundamentally different from what we may call PERCEPTSthe unimodal (e.g. visual) representations causally evoked by a perceptual stimulus (Hinzen & Sheehan 2015: ch. 2).
Meaning in humans is a two-tiered system, then: rooted in internally available concepts, it depends on reference, which is mediated by grammar, a cognitive system different from the system of conceptualisation. On this 'un-Cartesian hypothesis', grammar is the cognitive function converting a lexicalised semantic memory store into expressions functioning referentially. By giving a specific content to the claim that form mediates meaning, the un-Cartesian model connects language to thought directly: referentiality is as intrinsic to normal language use as it is to the thoughts expressed in it.
Linguistic evidence for this model comes from the fact that reference in its declarative forms (Tomasello & Call 2018) does not appear to be available nongrammatically, as noted, and systematically exhibits different forms, all of which are mediated by specific grammatical configurations (Sheehan & Hinzen 2011;Arsenijević & Hinzen 2012;Martin & Hinzen 2014, 2021. Moreover, language is never used non-referentially. 6 By integrating reference in all of its forms into grammar and conceptualising it as configurational, the widely noted developmental link between reference and grammar (Colonnesi et al. 2010) is now predicted: reference in the shape of pointing is not 'pre-linguistic' cognition, but itself a signal of linguistic cognition unfolding, a form of cognition inherently integrated with reference as an essential aspect of the meaning it carries. We also predict, arguably correctly, that language (and grammatically mediated aspects of reference specifically) would be affected in all major mental disorders including Alzheimer's disease, schizophrenia, and ASD (Hinzen 2017;Hinzen et al. 2019;Chapin et al. 2022).

The universal spine hypothesis
In developing the un-Cartesian framework, Hinzen & Sheehan (2015) only touch on how patterns of cross-linguistic variation bear on the un-Cartesian hypothesis. Meanwhile, however, research on linguistic typology has moved in its own directions, at least one of which is consistent with a broadly un-Cartesian point of view. Specifically, Wiltschko (2014) addresses the basic problem that the identification of syntactic categories in a given language has to employ language-specific diagnostics; but then, how are universal categories identified? How do we know whether there is indeed universal substance to structure, which would serve in the grammatical configuration of reference and propositional meaning, and if there is such substance, what would it be?
In the context of this long-standing challenge, Wiltschko (2014) uses MULTI-FUNCTIONALITY as a universal diagnostic for categorial status. Multi-functionality is the phenomenon whereby a given unit of language can be interpreted in multiple ways depending on its grammatical distribution. For example, in its use as a main verb (Virginia has a room of her own), have denotes the relation of possession; in its use as an auxiliary (Virginia has written her essay) it has a bleached meaning which expresses grammatical content (aspect) only. It follows that a given word (or unit of language) may have one meaning in one grammatical context and a different (albeit often related) meaning in another context. Patterns of multi-functionality are ubiquitous in natural languages calling for a model which predicts them to exist. Assuming that structure comes with substance mediating different kinds of meaning does just that. For any given unit of language, the interpretation will differ depending on its structural position, because the substance associated with structure will affect its interpretation. In this way, syntax not only mediates the relation between sound and meaning for complex expressions, but also for simplex ones.
[6] Expressives (e.g. oh, wow, oops) are sometimes not conceptualised as referential, yet they cannot be uttered without the presence of a specific contextually given event and hence without a referent being implied in that broad sense. However, this referent appears to be always restricted to the here and now, there is no lexical description of the referent, and no anaphoric relations are possible. This converges with recent analyses of such expressives as referential and as grammatically complex expressions (Corr 2021;Wiltschko 2021).
The substance of structure adds meaning to words. We refer to this as the SUB-STANTIVIST view of grammar. Based on cross-linguistic variation, Wiltschko (2014) argues for a substantivist view in which grammatical categories are constructed on a language-specific basis. Specifically, units of language associate with a UNIVERSAL SPINE (Wiltschko's term for a hierarchically layered set of structures that define the substance of grammar). Since units of language are necessarily language-specific, as they involve the conventional bundling of sound and meaning that make up the lexicon of a language, it follows that the categories constructed in this way are also languagespecific. But the spine restricts the types of categories that languages construct and the hierarchical order that they display.
In Wiltschko's (2014) system as depicted in Figure 1, each layer turns out to be linked to a function, each of which is essential for the configuration of reference and propositional meaning. They are hierarchically organised in the sense that all 'higher' ones presuppose the foregoing as necessary parts, reflecting an inherent increase in grammatical complexity: (i) CLASSIFICATION serves to classify events and individuals into subcategories (e.g. telic vs. atelic events; mass vs. count nouns, etc.); (ii) the introduction of a POINT OF VIEW serves to map the classified event or individual to a particular perspective on it (e.g. viewing it as perfective or imperfective); (iii) ANCHORING serves to map the perspectivised event or individual to the utterance context; and (iv) LINKING serves to map the anchored event or individual to the ongoing discourse.
This system lends itself to an un-Cartesian construal, insofar as configuring these layers (first an event, then perspectivising it, then embedding it in context, then linking it) inherently corresponds to building the formal ontology of a normal thought itself. The functions of each hierarchical layer are 'cognitive' as much as they are 'linguistic', thus characterising a distinctively linguistic-cognitive phenotype. Uniform cognitive functions of grammar interact with language-specific resources to create the same effect across languages: a form in which meaning is organised in a particular way and along several layers, spanning the space of thought. 7

Figure 1
The universal spine.
[7] This is why the assumption that language configures thought does not commit us to a Sapir-Whorfian position: it is the abstract functions rather than the language-specific categories which universally determine human-specific thoughts.

Beyond reference and truth
Language is as much a vehicle for thought as it is a vehicle for communication. Its communicative functions are reflected in the utterances that humans produce, through discourse markers, expressives, interjections, and the like. These do not contribute to the truth-conditional meaning of an utterance, but instead serve to regulate the interaction itself. Interactional aspects of language are constrained by grammar and thus arguably part of the spine (Wiltschko's 2021 'interactional spine'). In other words, the spine constrains the way in which language is to be used in social interaction, beyond merely mediating referential meaning at a truthconditional level. This implies that the way in which grammar structures the space of possible thought is in fact deeper and indeed potentially exhausts it: it configures both truth-and use-conditional meaning. Among the evidence that the language function implicates this interactional dimension is the fact that speakers of neuro-typical human populations have clear judgements about the appropriate use of language dedicated to encoding it (Wiltschko 2021). Note that even core grammatical categories such as Tense and Person have meaning directly relating to the speech act situation: the relational meaning of Tense connects an event and/or reference time to the utterance time, Person connects event participants to utterance participants (Sigurðsson 2004). But the scope of grammar goes beyond this, in also structuring the mental attitudes that speakers convey as well as the linguistic social interaction. This includes the encoding of speech act types (Ross 1970;Rizzi 1997), epistemic attitudes (Speas & Tenny 2003;Bhadra 2017), tag questions (Wiltschko & Heim 2016), discourse particles (Haegeman 2014;Thoma 2016), politeness markers (Miyagawa 2012), addressee agreement (Miyagawa 2017), vocatives (Hill 2007), and response markers (Farkas & Bruce 2009;Wiltschko 2017). There is significant convergence in this body of work, which suggests that the syntactic spine includes a layer of structure dedicated to hosting the language of interaction as an inherent aspect of the substance it carries. Specifically, the evidence suggests that there are several layers with distinct spinal functions which encode (i) SUBJECTIVE meaning (the speaker's attitude towards the content expressed), (ii) INTERSUBJECTIVE meaning (the speaker's evaluation of how the addressee relates to what is being said), (iii) META-COMMUNI-CATIVE content (such as turn-taking management) (Wiltschko & Heim 2016;Corr 2021;Wiltschko 2021). This matters in the present context, since linguistic diversity in cognitive disorders critically affects the social-interactive and communicative dimensions of language. In this context, models of grammar are needed that include interactional aspects of language and can exhaust the typological space of the co-variation in language and cognition.
Interactional dimensions of grammar are different from meaning configured in lower layers of the spine in a number of respects: (i) They consist of the nonpropositional dimensions of meaning, which do not contribute to the configuration of truth-evaluable structures, as they are configured AFTER a truth value is assigned and do not affect this truth value either, regulating what people DO with propositions in a context of use. (ii) Interactive language allows for limited recursion only: there is recursion to the extent that several (potentially articulated) layers of non-propositional language are attested. (iii) Interactive language is typically (though not always) found in sentence-peripheral positions. For this reason, interactive language is sometimes considered to be outside the clause (Kaltenboeck, Keizer & Lohmann 2016) and thus not part of grammar proper. However, interactive language displays all of the characteristics we expect if it were associated with the spine (Wiltschko 2021): its meaning is mediated by grammatical form, it shows patterns of multi-functionality (many discourse markers, e.g. right, serve double duty in that they can be used both within the grammar of truth and the grammar of use), it is structure-dependent, it exhibits patterns of contrast and may be paradigmatic, and, finally, it displays ordering restrictions. Hence, the interactive dimension, too, has to be part of the universal spine, dominating propositional structure or the grammar of truth, as schematised in Figure 2.
In this way, Wiltschko's model of cross-linguistic variation makes grammar and the type of thought expressed in it inseparable and seemingly co-extensive, in line with an un-Cartesian perspective. Yet the empirical question remains where human cognition as structured by the spine begins: there is mental and cognitive life both before and outside of language. In the next section, we introduce the Bridge model, which seeks to provide an answer to this question by capitalising on two specific interfaces that we perceive in the architecture of grammar.

THE BRIDGE MODEL
The Bridge model is based on the fact that there are two pre-linguistic pillars on which language rests, with which it forms interfaces, and which, we will argue, it connects: First, infant perception functions so as to group stimuli into perceptual classes corresponding to object categories, long before they can produce and, in part, even comprehend words (see below). Second, infants are agents in a space of social interaction from the very beginning. A model that seeks to integrate language and thought has to minimally address the question as to how grammatical cognition is grafted onto these two critical capacities (categorisation and interaction), resulting in a thought system that is as species-specific as it is linguistic. Specifically, we propose that the two ends of the spine (i.e. its bottom and its top) are special in that they have both a linguistic and a pre-linguistic life. This distinguishes them from other cognitive systems, which do not have a dedicated position along the spine and, hence, grammar. For example, there is no aspect of grammar that would indicate how a referent or a proposition is remembered or how it affects our feelings. We can talk about these aspects of cognition (memory, emotions), but there are no grammatical categories dedicated to them. Conversely, the two ends of the spine differ from other layers in that the latter (anchoring, classifying, etc.) do NOT appear to have a pre-linguistic-cognitive life, in the sense of forming direct interfaces with partially preverbal cognitive systems.
In what follows we review evidence that the two pillars in question (categorisation and social interaction) do indeed have a special status in relation to grammar and the language faculty more generally.

At the bottom end of the spine: Categorisation
As perceptual creatures, we form object and event categories, which capture commonalities among perceptually presented objects. Categorisation in this general sense is neither dependent on language in humans nor human-specific (Santos et al. 2002;Hespos & Spelke 2004). Humans, however, are special in that perceptual categories become lexicalised as WORDS, and these play a critical role in how, when, and which object categories are formed; how objects are individuated; which objects are attended to; and how they are memorised (Dehaene-Lambertz et al. 2020). Thus, from at least 3 months old (Ferry, Hespos & Waxman 2010), infants group perceptually different objects into categories when these are named but not when they are presented simultaneously with tones. This suggests an incipient understanding of the fact that words refer to things but tones do not. In line with this, 4-month-olds appreciate that speech, apart from being a social signal, has CONTENT: it communicates, with words picking out objects in the world about which a thought is entertained (Vouloumanos & Waxman 2014;Marno et al. 2015). By 6 months, words are further analysed in accordance with semantic features that make up their meanings: babies at this age are sensitive to the fact that cars are more similar to strollers than they are to bananas (Bergelson & Aslin 2017). In the course of the second year, parts-of-speech distinctions such as noun, adjective, and verb are used to map percepts onto categories of different formal-ontological types: objects, properties, and events, respectively (Arunachalam & Waxman 2010). As infants grow into this linguistic space of shared meanings, words exert a top-down influence on visual object perception (Gliga, Volein & Csibra 2010).
In these ways, while object perception as such is at least partially languageindependent, it is also partially linguistic in humans, taking off on a different course early on in our species, through the way it is connected to speech, and, via speech, to grammar and thought: early thinking about the perceptual world is inextricably linked to language.

At the top end of the spine: Social interaction
The perception of speech structures human social interaction from birth as well. It is one of the earliest stimuli processed prenatally (as early as 24-28 weeks: Eggermont & Moore 2012) and the subject of preferential attention relative to non-speech sounds from birth (Vouloumanos & Curtin 2014). At the brain level, it activates perisylvian language regions similar to those seen activated in language tasks in adults (Dehaene-Lambertz 2017). As speech is a social stimulus and a crucial social bond, a preferential bias for speech processing makes the space of social interaction inherently linguistic from the start. In this space, the infant is not merely a recipient of linguistic utterances directed at it, but an active partner from the very beginning. Thus, newborns are more likely to vocalise while the mother is speaking (Rosenthal 1982). In addition, their early vocalisations, described in the literature as coos and murmurs (Oller 2000), occur as part of rapid vocal exchanges with adult partners. These resemble conversations and feature-alternating vocalisations separated by clearly defined pauses (Bateson 1975;Gratier et al. 2015). By around 2 months, maternal and infant vocalisations are separated by pauses ranging from 500 ms to 1 s (Jaffe et al. 2001). The coos and murmurs are described to be more 'speech-like' as compared with vocalisations outside of a turn-taking format (Bloom, Russell & Wassenberg 1987). They also elicit emotional and motivated responses from social partners, with vocalisations in general eliciting responses from adult partners more frequently than gaze and smiling (Van Egeren, Barratt & Roach 2001), thereby further forging a speech-related social bond.
These findings illustrate that there is a linguistic signature to long-noted precocious social capacities in young infants. A key aspect of the synchronisation of early social interaction is rhythm, to which infants are sensitive: they synchronise their crying, movement, sucking, heart rate, and breathing (Provasi, Anderson & Barbu-Roth 2014), with first evidence of sensitivity to rhythm in fetuses at 35 weeks old (Minai et al. 2017). This is a prerequisite for establishing neurotypical social bonds and is restricted to species which display vocal learning (Schachner et al. 2009). Rhythm is a crucial property of speech and essential for the acquisition of words: basic rhythmic patterns (stress-, syllable-, or mora-timed) are relevant for word segmentation; and it is critical for acquisition of word order, which can be bootstrapped via prosodic phonological phrasing (Nespor & Vogel 1986;Langus, Mehler & Nespor 2017). Similarly, rhythm facilitates early turn-taking behaviour for which neonates at 2-4 days of age show evidence: they display intricate temporal organisation of their vocalisations in face-to-face communication with their mothers. Dominguez et al. (2016) found that 68.9% of these vocalisations occurred within a time window of 1 second following the mother's turn, with 26.9% 'latched' onto them, i.e. occurring within the first 50 ms, suggesting a surprising predictive grasp of when her turn would end. At the same time, 30% of vocalisations are characterised by a pattern of overlap, differing from fully developed turn-taking behaviour, which is characterised by a 'no gap/no overlap' requirement (Sacks, Schegloff & Jefferson 1974), and is acquired only later.
The full development of these adult-like no gap/no overlap turn-taking patterns takes an interesting trajectory hinting at how language acquires its role in providing the turns in question with linguistic content. As early as 5 months (Hilbrink, Gattis & Levinson 2015), and maybe already at 2 months (Gratier et al. 2015), the amount of overlap in the interaction between infants and mothers decreases so as to display more of a turn-taking like structure, i.e. moving towards a no gap/no overlap pattern. However, at some point infants slow down in their responses (Gratier et al. 2015;Leonardi et al. 2017), apparently violating the no gap requirement of adult-like turntaking behaviour. This may be the result of the time it takes for infants to process previous turns and (later on) to plan the appropriate response (Clark & Lindsey 2015;Hilbrink et al. 2015;Casillas, Bobb & Clark 2016) when such turns start to involve linguistic content that has to be processed. The slowdown at 9 months of age reported in Hilbrink et al. (2015) may specifically correlate with the fact that skills relevant for communication, such as joint attention and pointing, start emerging around this time. As argued in Butterworth (2003), pointing is 'the royal road to language'. Early pointing richly correlates with linguistic measures, both lexical and grammatical, as noted in Section 2.1 (Iverson & Goldin-Meadow 2005;Colonnesi et al. 2010). Once pointing emerges, grammar does, and the slowdown in turn-taking behaviour could emerge due to it taking on its first lexical and protogrammatical forms.
Turn-taking, then, like perceptual categorisation, is neither dependent on language in humans nor human-specific, as shown by rudimentary turn-taking abilities in other pro-social species such as marmosets (Chow, Mitchell & Miller 2015) or songbirds (Henry et al. 2015). Yet again, as with perceptual categorisation, when language is grafted on these precursor structures, these are also transformed: categories become words; words become parts of thoughts as configured in clauses; clauses define the contents of conversational turns; and turns are managed with linguistic means. As grammar falls into place, these contents come to exhibit the full referential format of thought as structured by the spine layer by layer, up to and including the discourse and response markers as well as intonational tunes that are the outposts of grammar at its interactional end.
Grammar, in short, acts cognitively as a bridge that connects perception-based categorisation to social interaction in its full scope involving thought, spanning the space in-between, where such thought is hierarchically built. We refer to this as the Bridge model ( Figure 3). As depicted there, the extended universal spine is at the core of our language faculty. Its substance determines the grammatical organisation of human thoughts and the ways these thoughts are communicated. It is grafted on evolutionarily older cognitive domains, namely perceptual categorisation and social interaction, which we have suggested are the pillars that are connected via grammar-based thought. The former defines an entry point after which lexical items appear, the latter an exit point, as thought-sized units are packed into turns of social speech. The spine is the bridge in-between.
As a crucial test case and support of how clinical linguistic diversity might be approached in this way, we finally turn to the challenge of modelling a spectrum of language abilities in ASD.

LINGUISTIC DIVERSITY IN ASD
Language dysfunction has been criterial for ASD during the first decades of the autism concept, up to the point that Frith (1989: 20) could still note that 'More has been written on the language of autistic children … than any other of their psychological disabilities'. This language focus has largely disappeared with the shift of the field in the 1990s to so-called 'high-functioning autism', perceived as being untainted by intellectual disability or language disorder. In line with this shift, current theoretical schemes of the underlying neurocognitive basis of ASD focus on such mechanisms as ToM, weak central coherence, or executive dysfunction, but not language (Happé & Frith 2020). This shift is not unproblematic, however. The practical and clinical significance of language dysfunction in ASD as a whole is uncontested and demands a theoretical explanation. Language remains one of the most important early reasons of referral (Lord, Risi & Pickles 2004). Some of the earliest warning signs for autism are language-related, such as deviance in babbling (Patten et al. 2014) or reaction to speech sounds (Arunachalam & Luyster 2016), including an absence of the preference of speech over non-speech and of infantdirected speech relative to adult-directed speech (Droucker, Curtin & Vouloumanos 2013;Vouloumanos & Curtin 2014). Early language levels are a crucial predictor of outcomes (Howlin et al. 2014), including language markers at the brain level (Eyler, Pierce & Courchesne 2012;Kuhl et al. 2013;Lombardo et al. 2015).
The sharp neglect of 'low-functioning' autism (Jack & Pelphrey 2017) since the 1990s also does not make it go away. ASD with ID is estimated to comprise nearly half of the spectrum in the United States (Centers for Disease Control and Prevention 2014). This half harbours the most significant nonverbal or minimally verbal population in our species, a substantial 25-30% of individuals on the spectrum (Tager-Flusberg & Kasari 2013;Norrelgen et al. 2015), with earlier estimates of up to 50% in the context of other diagnostic criteria (Rutter 1978;Bryson, Clark & Smith 1988). 'Nonor minimally verbal' autism (nvASD) in this sense is different from mutism, the selective and affectively grounded absence of speech in the presence of a language capacity. Crucially, comprehension levels in nvASD do not tend to exceed production levels Slušná et al. 2021). ASD with ID is unique among other ID groups in the severity of the language disorders involved and the degree to which comprehension is equally or more affected than production (Maljaars et al. 2011(Maljaars et al. , 2012Garrido et al. 2015;Slušná et al. 2021). In Down syndrome, the most common cause of ID, no substantial subpopulation with no or minimal language is reported (Martin et al. 2009). Given that a substantial minority of children with nvASD have nonverbal IQ scores in the normal range (Hus Bal et al. 2016), and many children with ASD and total IQ scores in the normal range show language impairment (Kjelgaard & Tager-Flusberg 2001), it does not appear that IQ as such provides a good model of language dysfunction in ASD.
At this junction, merely recording language impairment in ASD as a pervasive co-morbidity is unsatisfactory as well. To the extent that this co-morbidity exhibits SLI-typical features, one could seek to analyse the language impairment as an admixture of SLI, but this would do nothing to explain this apparent connection between SLI and ASD. It would also face two additional problems: linguistic profiles in SLI and ASD diverge (Boucher 2012;Taylor, Maybery & Whitehouse 2012), and an SLI-type impairment cannot account for language impairment of the nature and scale seen in low-functioning or nonverbal ASD. The strategy would seek to account for language impairment in ASD based on a model of ASD as a 'cognitive' disorder and of SLI as a 'language' one, recapitulating a traditional dichotomy. This dichotomy cannot be maintained by restricting how language is measured. For example, measuring the construct of 'language' through non-word and sentence repetition tasks only, leaving out its sentence-level semantic and social-communicative dimensions (Silleresi et al. 2020), would be circular. Moreover, none of the basic mechanisms posited in cognitive models of ASD -ToM, weak central coherence, or executive function deficitsseems to be promising analytical tools for specifically understanding language dysfunction in ASD. 8 The picture of language impairment in ASD, however, is entirely unsurprising on the Bridge model. ASD is defined by deficits in social interaction and communication, yet in humans, these two take linguistic forms. Criterial diagnostic features in use today specifically are deficits in 'communication', 'reciprocal social interaction', 'nonverbal gestures', and 'restrictive and repetitive behaviours'. Yet these clinical descriptive categories create the conceptual problem of how a dysfunction in any of these domains could possibly NOT relate to language (dys-)function. The problems of 'communication' and 'reciprocal social interaction' in question are evidently NOT intended to be problems in forms of communication and social interaction as seen in non-humans. On the contrary, symptoms of ASD are located precisely in HUMAN-SPECIFIC FORMS of communication and social interaction, which involve language function inherently. Individuals with ASD, even at its lowestfunctioning end, DO both socially interact and communicate (Preissler 2008;Maljaars et al. 2011;Cantiani et al. 2016;DiStefano et al. 2016), just in their own ways rather than the species-typical ones linked to language in its normal use. Human communication and social cognition are linguistic from birth as reviewed in the previous section, and one of the most paradigmatic human 'reciprocal interactions', starting from birth, is language. Paradigmatically autism-related so-called 'nonverbal' gestures like pointing and shared attention closely relate to language in development, as reviewed above, including social smiles (Hsu, Fogel & Messinger 2001). Diagnostically significant behaviours such as echolalia or stereotyped and idiosyncratic phrases descriptively ARE anomalies of normal language function, [8] The most promising of these is ToM, yet there is no evidence that ToM in its human form is independent of a language capacity, particularly if a cultural origin of ToM is posited (Heyes et al. 2020), and given pervasive evidence for its correlation with language capacities in humans (Farrar et al. 2017;Schroeder et al. 2021).
which is creative and interactive in its normal use by nature and universally exhibits dedicated grammatical devices to act out such functions. These facts motivate using a model that integrates language, thought, and communication, seeing them as an indissociable triad, when they take their human-specific forms. In terms of the Bridge model, it is evident that the bridge is affected in ASD, not merely the pillars (communication/interaction and perceptual categorisation in a non-linguistic sense). In line with that, grammar-based dimensions of language have often been found to be proportionally more affected in ASD than vocabulary (Boucher 2012;Arunachalam & Luyster 2016). However, vocabulary and lexical meaning, too, are not quite neurotypical, even in those children with ASD who develop words (Tek et al. 2008;Arunachalam & Luyster 2016). This means that the lexicon cannot be separated from grammatical functioning in ASD, which makes sense in terms of the Bridge model: what truly predates grammar is not the lexicon but perceptual categories, whose lexicalisation depends on grammar. Further in line with this is the fact that in nvASD, where children do not develop phrase speech and hence there is no grammar in production, no substantial lexicon (beyond a few, atypically used words) develops either. 9 Forms of meaning based on this integrated system depend on layers of a syntactic hierarchy, along which meaning differentiates as specified in this model. It is in the nature of a hierarchy that higher entails lower, and an immediate prediction is therefore that as the bridge disintegrates, we will find that any intact layer, n, should entail the intactness of any lower layer < n, while layers > n could be deviant. This makes predictions for possible and impossible clinical linguistic change -AND associated cognitive dysfunction to which each of the layers are connected. In particular, a child who can put single phrases together, such as simple verb-noun combinations, might not show evidence of sensitivity to aspect or episodicity as relating to tense or 'anchoring' (see Figure 3); but the other way around (presence of anchoring without basic classifications of events and their participants) should not be observable. It is noteworthy in this regard that problems with time-deixis have been highlighted in ASD from early on (Bartolucci, Pierce & Streiner 1980;Fine et al. 1994).
It could also be that some children will seek to break into language in the pillars of the bridge: A child with nvASD may seek to identify its own strategies of socially interacting so as to achieve its goals, and learn some words for this purpose, based on the pre-linguistic perceptual categories it has. But the prediction is that as long as this child does not develop grammar, the use of the words in question is not the neurotypical one, involving declarative reference. This is borne out by current research: to the extent that words are acquired in nvASD, they are not processed normally (Preissler 2008;Cantiani et al. 2016), and declarative reference is uniformly absent (Slušná et al. 2018). Analogously, a child with WS might break into language via its interactive end, showing great skills as a conversationalist, yet [9] We owe this observation to Dominika Slušná. showing semantic deficits at a grammatical and lexical level, relating to how the spine grows from one pillar to the other.
Where classification, point of view, and anchoring are in place, additional questions arise, such as whether a child can link events placed in context to other events referred to in discourse. If linking is affected, this would predict problems in the more grammaticalised forms of reference (pronouns, definite NPs, and deictic devices), which directly relate to discourse-based linking. This has been the case even in ASD without ID (Bartolucci et al. 1980;Norbury & Bishop 2003;Modyanova 2009;Rumpf et al. 2012;Banney, Harper-Hill & Arnott 2015). If linking is possible, in turn, the intricacies of social-interactive language including turn-taking and the interpretation of utterances in relation to epistemic states, may or may not be neurotypical. If they are not, this may specifically result in misuse or under-use of personal pronouns, which has been documented in ASD (Bartolucci et al. 1980;Mizuno et al. 2011;Shield, Meier & Tager-Flusberg 2015). Areas of particular interest for further investigation are discourse markers, which serve to indicate the epistemic states of the interlocutors (e.g. oh, huh,…), terms of address and other forms specifically geared towards conversational management (e.g. vocatives, intonational tunes), or response markers and backchannels which serve to indicate mutual understanding and (dis)agreement (yes, no, mhm, right).
Given that to be linguistic IS to master this hierarchy along the bridge, and the autism spectrum is a language spectrum, our model thus invites mapping language function seen in ASD systematically onto the different hierarchical layers of the spine, creating implicational predictions at each layer as well as for associated cognitive dysfunction as currently measured by non-linguistic variables as 'perspective-taking', 'reciprocity', or ToM. Language can now be seen as not merely another behavioural and neuro-cognitive domain where a given syndrome shows impairments, but as an integral mechanism in how and why a social-cognitive phenotype unfolds on a different path. Where this mechanism fails, alterations of this phenotype are predicted and rationalisable on a cognitive basis that integrates language, such that linguistic typologies of cognitive disorders can be developed.

CONCLUSIONS
Two types of linguistic diversity mark the human linguistic phenotype: one staying within the confines of a neurotypical thought process, another co-varying with its disintegration. Our perspective suggests to systematically link such change to changes in our language capacity as modelled here. In the tradition of generative grammar, this capacity has long been conceptualised as the incarnation of a genetically driven universal capacity (termed universal grammar). But as noted in the beginning of this paper, our linguistic phenotype is actually not more universal in humans than our cognitive phenotype: both are subject to systematic variation in our species. While that is uncontroversial in the cognitive case, as the existence of countless cognitive disorders with a genetic basis attest, it is rarely recognised for the linguistic case. We have here proposed to conceptualise such variation as affecting cognitive functions rooted in grammar, where grammar, in turn, is a mechanism for turning percepts linked to lexical addresses into forms of thought endowed with reference and truth and entering into social interactions. While the genetic endowment for this capacity is currently still labelled as 'universal grammar' (e.g. Hinzen & Sheehan 2015), the breakdown of such a universal grammar, on our model, is also the breakdown of species-typical cognition more broadly. If so, it would be mislabelled as a universal grammar: when intact, it IS a thought and social communication system as much as it is a grammar system. This single system may then hold a critical new key to making sense of cognitive diversity in mental disorders.