The grammar of engagement I: framework and initial exemplification

abstract
Human language offers rich ways to track, compare, and engage the attentional and epistemic states of interlocutors. While this task is central to everyday communication, our knowledge of the cross-linguistic grammatical means that target such intersubjective coordination has remained basic. In two serialised papers, we introduce the term ‘engagement’ to refer to grammaticalised means for encoding the relative mental directedness of speaker and addressee towards an entity or state of affairs, and describe examples of engagement systems from around the world. Engagement systems express the speaker’s assumptions about the degree to which their attention or knowledge is shared (or not shared) by the addressee. Engagement categories can operate at the level of entities in the here-and-now (deixis), in the unfolding discourse (definiteness vs indefiniteness), entire event-depicting propositions (through markers with clausal scope), and even metapropositions (potentially scoping over evidential values). In this first paper, we introduce engagement and situate it with respect to existing work on intersubjectivity in language. We then explore the key role of deixis in coordinating attention and expressing engagement, moving through increasingly intercognitive deictic systems from those that focus on the location of the speaker, to those that encode the attentional state of the addressee.


keywords: engagement, attention, intersubjectivity, deixis, coordination.
These two individuals, the producer and the recipient of language, or as we may more conveniently call them, the speaker and the hearer, and their relations to one another, should never be lost sight of if we want to understand the nature of language and of that part of language which is dealt with in grammar. (Jespersen, 1924, p. 17)
particular grammatical organisation. We argue that many languages have grammaticalised systems for monitoring and adjusting intersubjective settings; it is this grammaticalised intersubjectivity which we refer to as engagement, in much the same way as grammaticalised time representation merits the special metalinguistic term tense.[3] Our paper is serialised into two parts, across two successive issues of this journal: the first introducing the phenomenon, situating it with respect to other work on intersubjectivity in language, and outlining the key role of deixis in coordinating attention, the second broadening out to a typological survey of the phenomenon of engagement and to the diachronic question of how engagement systems originate. Within this first part, we begin with an initial example from the Colombian language Andoke (§2), whose description by Landaburu (2007) was the first to argue for engagement as a core grammatical phenomenon. We then review two other bodies of work on epistemic distribution in the speech situation. The first research tradition (§3) is attuned to general properties of conversational organisation rather than the use of core grammatical devices. The second (§4) sets up a general framework for viewing multiple perspective in language, necessary to understand the asymmetries of knowledge distribution that accompany any projection by the speaker of what they believe (or wish to portray they believe) the addressee's epistemic disposition to be.[4] In §5, the concluding section of Part I, we pass to the primal scenario for establishing shared access - deixis - and examine the notion of engagement as it applies to the management of joint attention in deictic scenarios of drawing attention to entities, through demonstrative systems such as those of Turkish and Jahai.

What is engagement? An initial example
Consider the following pair of contrasting sentences from Andoke, an isolate language of the Colombian Amazon (Landaburu, 2007).[5]
[3] While we may attribute our use of the term 'engagement' to Landaburu's work on Andoke, we also note that it has been used by others to discuss overlapping phenomena in discourse studies (e.g., Hyland, 2005) and in French linguistics, notably Desclés (2009) and Guentchéva (2011).
[4] As Alan Rumsey points out (p.c.), dissembling may be involved at various levels. How speakers use particular formal devices cannot be taken as a direct reflection of what they think or assume; often it is more a matter of their Goffmanian 'presentation of self' in particular situations. The speaker may be deceptive in the belief they ostensibly project about the addressee, as they may be about their own knowledge state. Caveats about these devices pertaining to 'presented belief' rather than actual belief thus need to be added. However, since adding these caveats at every relevant point in our discussion would clutter our exposition, we confine ourselves to stating it once here.
[5] Abbreviations: 1: first person, 2: second person, 3: third person, ADDR: addressee, AGT: agent, ASYM: asymmetric, DAT: dative, ENCL: enclosure, ENGAG: engagement, GER: gerundive, INAN: inanimate, INGR: ingressive, IQ: WH-question, NONMUTDEM: non-mutual demonstrative, PERV: perfective, PQ: polar question, SPKR: speaker.
(1) a. páa b-ʌ ʌ-pó'kə-i
already +SPKR+ADDR.ENGAG-3SG.INAN 3SG.INAN-light-AGR
'The day is dawning (as we can both see).'
b. páa kẽ-ø ʌ-pó'kə-i
already +SPKR-ADDR.ENGAG-3SG.INAN[6] 3SG.INAN-light-AGR
'The day is dawning (as I witness, but which you were not aware of).'
[6] The zero morpheme is given in the gloss of the original (Landaburu, 2007, p. 26) without explanation, but we presume it is a variant of the 3SG inanimate suffix.
The relevant point of grammatical contrast is seen in the auxiliaries bʌ and kẽ (structurally similar to a word such as is in the English phrase is dawning) that precede the main verb ʌpó'kəĩ 'light(en), dawn'. The Andoke auxiliaries are made up of two parts: the first element (b- or kẽ-) encodes the dimension of 'engagement' - the relative access of speaker and hearer - and the second element marks subject agreement (i.e., who is undertaking the activity; in this case, the day or the sun itself, which is encoded as a third person singular inanimate subject). No descriptive sentence can be constructed without employing one element from the engagement set.[7] Consider the situation where the day is dawning and the two of us, speaker and hearer, are watching the sun rise together, so the speaker can presume joint attention to this mutually accessible event. This would be expressed as in (1a), using the auxiliary base b- (represented as 'plus speaker and plus addressee engagement', +SPKR+ADDR.ENGAG). But if the event is not accessible to the addressee - for example, he is only just waking up and is not attending to it - the base kẽ- ('+SPKR-ADDR.ENGAG') would be chosen (1b).[8]
Though the reference to 'seeing' in our elaborated translations may seem reminiscent of evidentials, in particular those marking the source of information as visual, what is at issue in examples like (1a, b) is not primarily the source of information but whether the addressee is presumed to be attending to, or more broadly to have access to, the event: pure evidentiality is about sources, whereas engagement is about the presumed presence or absence of intersubjective sharing, whatever the source. We will see later, however, that many languages exhibit complex interactions between engagement and evidentiality (Part II, §3). As a second example, consider how one would translate 'it's the white people arriving' into Andoke (Landaburu, 2007, p. 25). In a standard situation, with shared access to the event, the 'shared engagement' auxiliary base b- (2a) would be used - for example, where both the speaker and addressee are together in a canoe, the speaker hears the noise of a distant motor, and directs the addressee to pay attention to it, confident that they, too, will be able to hear it. On the other hand, the 'unshared engagement' auxiliary base in (2b) would be used in situations where (i) the interlocutor does not have direct access to the event described, but (ii) the speaker is sure of their assertion. A strong internal revelation to the speaker would be one such context; another would be the case where the speaker is up in a tree and from there sees the white people, whose arrival would not be visible to the addressee, positioned at the foot of a tree in the forest.
(2) a. duiʌ́hʌ b-ə̃ dã-ə-ʌ
whites +SPKR+ADDR.ENGAG-3PL INGR-move-3
'It's the whites arriving (as we can both witness).'
b. duiʌ́hʌ kẽ-ə̃ dã-ə-ʌ
whites +SPKR-ADDR.ENGAG-3PL INGR-move-3
'It's the whites arriving (which I know / can witness but you can't).'
This initial two-way contrast (shared accessibility versus speaker-only accessibility) is, in turn, part of a four-valued set of auxiliary bases (with a further subdivision of one value) whose other members deal with cases where the speaker lacks knowledge. In the case of true questions, where the interlocutor can be expected to know the answer, the pair k-/d- is used (Landaburu, 2007, p. 27): k- for polar (yes-no) questions such as 'Is it the whites who are arriving?' (3a), and d- for WH-questions like 'Who is coming?' (3b). The fourth value, coded by bã-, is used for self-interrogatory questions to which the speaker expects no answer from their interlocutor, who is simply a witness to the speaker's deliberation; that is, the event is presented as inaccessible to both parties (3c).[9]
(3) a. duiʌ́hʌ k-ə̃ dã-ə-ʌ
whites -SPKR+ADDR.ENGAG.PQ-3PL INGR-move-3
'Is that the whites arriving?'
b. kói d-ə̃ dã-ə-ʌ
who -SPKR+ADDR.ENGAG.IQ-3PL INGR-move-3
'Who is arriving?'
c. duiʌ́hʌ bã-ə̃ dã-ə-ʌ
whites -SPKR-ADDR.ENGAG.PQ-3PL INGR-move-3
'I wonder if those are the whites coming'. (Landaburu, 2005, p. 2)
[9] As a further example of the -SPKR-ADDR.ENGAG bã-, Landaburu (2007, p. 28) gives the example of an aged narrator, describing a genocide he witnessed as a child, using the form bã- as auxiliary base in the question 'And why were they killing?' Given the setting, in which the interlocutors were all too young to have witnessed the terrible events which he is recalling, Landaburu argues that this can only be self-interrogation, and that the addressees are not being expected to supply any type of answer.
As Guentchéva and Landaburu (2007, p. 5) put it, the contrast between the auxiliary bases of Andoke "is better seen, not simply as a relation between the speaker and the truth of their statement but also … as a relation between what the interlocutors know".[10] Further, Landaburu argues (2007, p. 30) that "as well as the knowledge of the speaker, we are dealing here with relations of epistemic authority between the speaker and the hearer. The speaker's judgment of the truth of his proposition combines with the intersubjective dimension of the proposition, inside the grammatical system and not simply in perlocutionary or pragmatic effects."[11] As Table 1 shows, Landaburu posits an orthogonal pairing of two two-valued semantic dimensions, neatly accounting for the functional symmetry of the Andoke system. (He treats k-/d- as specific variants conditioned by polar vs. WH-question as seen above.) We adapt his terminology slightly in the translation process, substituting 'knowledge' vs. 'lack of knowledge' for his terms 'savoir' vs. 'non-savoir', and 'speaker' and 'addressee' for his 'je' vs. 'tu'. In addition to these merely translational changes, we comment here on two more substantive problems of terminology. First, Landaburu's terminology conceals a deep asymmetry: the speaker knows what they themselves know, but can only presume what the addressee knows, so that a more realistic characterisation of the terms in the left-hand column would be 'presumed addressee (lack of) knowledge', an issue we return to in §4 under the rubric 'multiple perspective'.
Second, neither Landaburu's savoir nor its rough English equivalent 'knowledge' fully conveys the range of the addressee's mental dispositions: arguably, the crucial difference between the (a) and (b) examples in each case concerns differential accessibility to the speaker and the addressee. In some of his examples it is clearly knowledge that is at issue, but in others, such as the 'sunrise' examples in (1), attention seems the more crucial mental disposition.
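As an illustrative aside, the 2 x 2 paradigm of Table 1 can be read as a lookup from two binary features to an auxiliary base. The following Python fragment is a minimal sketch added for exposition only, not part of Landaburu's analysis: the names `Engagement`, `ANDOKE_BASES`, and `auxiliary_base` are hypothetical, and the second feature is deliberately named as what the speaker presumes about the addressee, in keeping with the asymmetry just noted.

```python
from typing import NamedTuple


class Engagement(NamedTuple):
    """One cell of the paradigm (hypothetical feature names)."""
    speaker_knows: bool
    addressee_presumed_to_know: bool


# Auxiliary bases as given in Table 1 (Landaburu, 2007, p. 30).
ANDOKE_BASES = {
    Engagement(True, True): "b-",      # shared access
    Engagement(True, False): "kẽ-",    # speaker-only access
    Engagement(False, True): "k-/d-",  # true questions
    Engagement(False, False): "bã-",   # self-interrogation
}


def auxiliary_base(speaker_knows: bool,
                   addressee_presumed_to_know: bool,
                   wh_question: bool = False) -> str:
    """Return the auxiliary base for a feature combination.

    The k-/d- cell is resolved by question type: k- for polar
    questions, d- for WH-questions, as described in the text.
    """
    cell = ANDOKE_BASES[Engagement(speaker_knows, addressee_presumed_to_know)]
    if cell == "k-/d-":
        return "d-" if wh_question else "k-"
    return cell


print(auxiliary_base(False, True, wh_question=True))  # d-
```

The sketch covers only the choice of base; subject agreement on the auxiliary and the attentional (rather than strictly knowledge-based) readings discussed above are left out.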
[11] "Autant que du savoir du locuteur, il s'agit donc de rapports d'autorité épistémique entre le locuteur et l'interlocuteur. Le jugement du locuteur sur la véracité de son propos se combine avec la dimension intersubjective du propos, dans le système grammatical et pas simplement dans les effets perlocutoires ou pragmatiques" (Landaburu, 2007, p. 30).
Landaburu presciently observes (2007, pp. 30-31) that it was unlikely that the contrasts he described there would be found just in Andoke, and that further research would probably turn up comparable phenomena elsewhere. Moreover, he suggests that an emphasis on speaker-knowledge, at the expense of the epistemic relations between speaker and addressee, results from the influence of traditional grammar (whose assumptions were then imported into formal logic), itself reflecting the contingent privileging of certain grammatical categories (tense, aspect, mood) in the classical Indo-European languages.
There are, of course, important and familiar exceptions to the lack of attention paid to grammaticalised epistemic relations between speaker and hearer. The most important are (a) the definiteness contrasts expressed in article systems in western European languages,[12] (b) focus systems responsive to information structure,[13] and (c) discourse particles[14] like German doch 'after all, actually (against earlier expectation)' or Italian mica 'not at all (against earlier positive expectation)' which express incompatibilities between an asserted state and that presumed to have been the case at some prior moment in the discourse.[15] For many investigators of information structure, which takes in "such psychological phenomena as the speaker's hypotheses about the hearer's mental states" (Lambrecht, 1994, p. 3), it is a precondition that "what one individual may know or hypothesize about another individual's belief-state" is only of analytic interest "insofar as that knowledge and those hypotheses affect the forms and understanding of LINGUISTIC productions" (Prince, 1981, p. 233).
Table 1. The Andoke engagement paradigm as a 2 x 2 matrix (Landaburu, 2007, p. 30)
                               Speaker knowledge    Lack of speaker knowledge
Addressee knowledge            b-                   k-/d-
Lack of addressee knowledge    kẽ-                  bã-
[12] "(W)hat type of shared knowledge is needed for language use? and … how is that shared knowledge in practice assessed and secured? The area of language in which we will take up these questions is definite reference, but even our interest in definite reference is secondary to our concern with the two questions of mutual knowledge" (Clark & Marshall, 1981, p. 11).
[13] The linguistic term 'focus' is notoriously variable in its use, being generally partitioned into 'referential givenness/newness' and 'relational givenness/newness' (Gundel & Fretheim, 2006). The latter pertains to divisions of a linguistic unit into given/new, topic/focus, etc., and is not relevant to the phenomena discussed in this paper. The former is defined by Gundel and Fretheim as "a relation between a linguistic expression and a corresponding non-linguistic entity in the speaker/hearer's mind, the discourse (model), or some real or possible world, depending on where the referents or corresponding meanings of these linguistic expressions are assumed to reside." This is closer to many of the phenomena discussed in this paper, though we note the lack of precision with regard to whose mind is involved, or the nature of the intersubjective relationship between them. Elsewhere in the same paper they mention "the speaker/writer's intention to affect the addressee's attention state". This draws their conception of focus closer to the typical purpose of engagement, as discussed in this paper, but the encoding devices they discuss are less grammaticalised and involve prosody and syntactic positioning.
[14] Significantly, Heritage (2012c, p. 77) states that "deep and important findings await us … in an increasing body of cross-linguistic analyses of various epistemic particles (Hayano, 2011, 2012; Wu, 2004)" (see also Wu & Heritage, 2017). We briefly return to the particle issue in Part II, §5. For now, we simply note that while epistemic particles do indeed often encode the sorts of epistemic assessments we are interested in here, they differ from the prototypical systems of engagement in being less integrated into the grammar (e.g., as relating to their status as particles rather than affixes), and being less structured into symmetrical systems of opposition on more than one dimension.
All of these studies, then, are relevant to the domain of intersubjective coordination. But as we will show, they represent only a fraction of the grammatical design space. With the wider typological sample we adduce, it is clear that the world's grammars attest a much wider set of intersubjectively relevant categories than has previously been suspected. The initial typological framework we propose here aims to set out a broad programme of typological research that systematises the great diversity of grammatical devices in the intersubjective domain, along the following two axes: (i) scope, be it semantic or syntactic (entity/location/referent, state of affairs/proposition, evidence/metaproposition), (ii) intersubjective distribution (epistemic authority can be speaker, addressee, neither, or both).
A note on terminology before we proceed. Rather than burden the overworked term intersubjectivity with one further use, we will follow Landaburu's lead in using the term engagement to refer to a grammatical system for encoding the relative accessibility of an entity or state of affairs to the speaker and addressee.[16] This definition clearly relates to Du Bois' (2007, p. 144) notion of 'alignment', "the act of calibrating the relationship between two stances, and by implication between two stancetakers".[17] But whereas his term is intended to be broadly functional, we reserve engagement for grammaticalised systems, which are only one means of addressing the alignment problem. Likewise, while the term 'stance' has been employed in somewhat similar ways by various authors, it is generally used in a broadly functional way rather than focusing on grammaticalised systems: examples are Heritage's (2012a, p. 6) definition of 'epistemic stance' as concerning "the moment-by-moment expression of [social] relationships, as managed through the design of turns at talk", or Engelbretson's (2007) more general definition of stance as expressing 'a personal belief or attitude' or 'social value'.
Finally, a remark on the trajectory by which categories are 'typologically detached' from semantically related categories that they share expression with in many languages. In laying out their analyses, it is helpful for typologists to work with canonical, neatly cut-and-dried categories (Brown, Chumakina, & Corbett, 2013), so as to illustrate the dimensions of the design space with maximal clarity. But the relation of engagement to epistemic categories means that it borders on many more familiar linguistic categories: evidentiality, miratives, focus, mood, and modality.[18] And much of the time actual languages run some of these dimensions together. This may arise through conventionalised polysemous extensions across categories, e.g., the well-known case of Turkish -mIş, used both for evidential categories and for miratives (Aksu-Koç & Slobin, 1986; Slobin & Aksu-Koç, 1982). Or it may come about by exploiting inferences from one type of interpretation to another, e.g., by applying hearsay evidentials to one's own past behaviour to indicate ironical disbelief or lack of responsibility for one's unconscious actions (see, e.g., Michael, 2012; Wilkins, 1986). Our general strategy, in unfolding the typological framework we develop here, is to begin each major section with more clear-cut cases and then look at more complex and transitional ones.
[15] Cf. Kirsner (2003) for the use of the Dutch particles hoor lit. 'hear' vs. hè 'isn't it?' with imperatives.
[16] One understanding of the word 'accessibility' is in reference to perceptual access, for example, something that is visible to a person is also directly 'accessible' to that person (cf. Tournadre & LaPolla, 2014). However, our use of the word is broader than this, in that we also understand it in terms of mental accessibility and in relation to 'having something in mind'. For example, under this latter reading, something that a person is attending to is highly accessible, because it is at the forefront of that person's mind. We can thus think of attention (and other mental dispositions) as a kind of (or even constraint on) accessibility, along with visibility, audibility, etc.
[17] In fact, a similar use of the term 'alignment' goes back beyond Du Bois to Erving Goffman, who used it at least as far back as his 1974 book Frame analysis. In his subsequent book Forms of talk (1981) he defines footing (rather sketchily) as "the alignment we take up to ourselves and the others present as expressed in the way we manage the production or reception of an utterance".
[18] The foundational if rather abstract definition of mood by Jakobson (1990 [1957]) as characterising PnEn/Ps "the relation between the narrated event and its participants with reference to the participants of the speech event", may be charitably interpreted as subsuming engagement since we are talking about intersubjective relations between participants in the speech event with respect to the narrated event, though his actual examples did not touch on phenomena comparable to those we discuss here. Likewise, consider the following interesting and inclusive definition of modality by Timberlake (2007): "Modality is about alternatives - how we come to know and speak about the world, how the world came to be as it is, whether it might be other than it is, what needs to be done to the world to make it what we want. The alternatives are sorted out and evaluated by some sort of authority, often the speaker or, if not the speaker, some other participant or even another situation. Modality, then, is consideration of alternative realities mediated by an authority" (p. 315). This could only be stretched to cover engagement if we include attentional phenomena - 'who knows about or attends to it' - under the rubric of 'how we come to know and speak about it', and even then there is no overt focus on intersubjective calibration. Other definitions of modality fit even less well, e.g., the one by Nuyts (2006, p. 1) as "any kind of speaker modification of a state of affairs, even including dimensions such as tense and aspect … qualifications of states of affairs" which deviates from our interests through its exclusive concentration on the speaker.

Epistemic management in conversation
In a series of papers, John Heritage discusses the related notions of 'epistemic status', 'epistemic stance', 'epistemic gradient', and 'territories of knowledge' in an effort to account for the relation between sentence-type and communicative function, and how this is seen in the sequential unfolding of turns as a form of social action (Heritage, 2002, 2011, 2012a, 2012b, 2013; Heritage & Raymond, 2005). He argues that epistemic status and epistemic stance are keys to understanding the discrepancies between grammatical form and (social) action, an issue that has plagued speech-act theory since its formulation (Austin, 1962; Searle, 1969) and necessitated the label 'indirect speech-acts' to account for such discrepancies (see Levinson, 1979, 1983, for a critique). Epistemic status, as an index of relative epistemic authority, is formulated with reference to the notion of A- and B-events (Labov & Fanshel, 1977): A-events are known only to the speaker (speaker authority) and B-events are known only to the addressee (addressee authority). Typical B-events include the addressee's opinions, beliefs, bodily states, or professional expertise. The observation that authority to comment on events is unevenly distributed across speech-act participants is also explored in detail by Kamio (1997), who notes the infelicity of Japanese statements that target the addressee's 'territory of information' unless these are marked by appropriate sentence-final particles, which serve to weaken the speaker's epistemic claims and mitigate the force of such statements.
Kamio's conceptualisation of 'territories of information' is adopted by Heritage to define epistemic status as a relatively stable concept subject to socio-cultural conventions:
[W]e can consider relative epistemic access to a domain as stratified between actors such that they occupy different positions on an epistemic gradient (more knowledgeable […] or less knowledgeable […] which itself may vary in slope from shallow to deep …). We will refer to this relative positioning as epistemic status, in which persons recognize one another to be more or less knowledgeable concerning some domain of knowledge [.] (Heritage, 2012b, p. 32)
The heuristic of an 'epistemic gradient' allows for a relative positioning of the speech-act participants' knowledge-states and rights to knowledge. This notion has been used, for example, in cross-linguistic research on sentence-final particles that signal different kinds of questions (see Enfield, Brown, & de Ruiter, 2012; Hayano, 2012). The notion of epistemic gradient may be used to determine a speaker's epistemic stance, as indicated by the speaker's choice of sentence-type.
Heritage's efforts to detail how the epistemic statuses of speech participants shape turn-design enable us to look under the hood of the 'epistemic engine' of conversation (Heritage, 2012b). Indeed, language users are continuously keeping track of what others know and how their own knowledge can be related to the knowledge of others, and Heritage offers us a detailed and empirically grounded picture of how this 'epistemic ticker' works in everyday conversation.
There are, however, some issues that concern us in exploring the notion of 'engagement' from a cross-linguistic perspective, which are left mostly without comment in Heritage's work. One particularly important issue is what (linguistic) resources are available for conveying epistemic stance. While sentence-type has occupied a central role in research on English, linguistic forms signalling aspects of epistemic status and stance go well beyond sentence-type distinctions and may involve grammatical sub-systems that specifically target the perception, attention, and perspective of the speech participants, without requiring reformulation as interrogatives.
A final consideration is that Heritage's formulation of an epistemic gradient remains underspecified with respect to the individual commitments of the speech participants. That is, while a 'seesaw' gradient is conceptually useful, it veils the fact that the speaker's assumptions concerning the addressee's knowledge of some event are 'in the mind of the speaker' and do not necessarily correspond to the addressee's actual knowledge state (see below, Evans, 2006; cf. Bergqvist, 2015). The notion of multiple perspective, which we discuss in the next section, provides this underlying asymmetry with an explicit formulation, where the speech participants' points-of-view with respect to objects of discourse are calculated from the speaker's perspective.

Multiple perspective in grammar
As mentioned already, there is a clear asymmetry in the contrasts of epistemic distribution which engagement expresses. Whereas speakers have direct access to their own perspective, and can thus assert with confidence what they know, attend to, or perceive, in the case of the addressee they can only assume, to varying degrees of certainty. Assessments of the mental directedness of others therefore involve a type of complex perspective (Evans, 2006), which represents the speaker's assumption about the addressee's attentional state or access with respect to some state of affairs.[19] As a caution that not all investigators have taken this as obvious, consider the discussion of definite articles in Givón (1989), and in particular his statement that definite descriptions are "inherently about knowledge by one mind of the knowledge of another mind" (p. 206). We do not share Givón's epistemological optimism - that one mind can have knowledge of the knowledge of another mind. As a more accurate and epistemologically cautious characterisation, we prefer the formulation given in Hawkins (1978, p. 97): "the speaker when referring [and choosing between definite and indefinite articles - authors] must constantly take into consideration knowledge of various kinds which he assumes his hearer to have."[20] This asymmetry - i.e., that assessments of knowledge or attention by the interlocutor are based on assumptions by the speaker - should be borne in mind throughout our discussion.
[19] There have of course been long and thorny debates on how far recursive mutual inference about each other's mental states is possible: Sperber and Wilson (1986) argue that speaker and hearer must engage in pragmatic inference about each other at several recursive levels, and Scott-Phillips (2015) posits at least five levels of recursive mind-reading in any ostensive communicative act. For arguments that pragmatic inference is possible with a substantially less rich cognitive package than these scholars maintain, see Planer (2017a, 2017b).
Multiple perspective constructions are constructions that "encode potentially distinct values, on a single semantic dimension, that reflect two or more distinct perspectives or points of reference" (Evans, 2006, p. 99). These are found in various parts of the grammar and fall into three kinds of perspectives: double, meta-, and complex perspective.
'Double perspective' is calculated with regard to two points of reference at once, each having equivalent epistemological status. An example is a demonstrative system like that of Japanese, where both the speaker's and the addressee's positions are taken into account when relating a figure to a location (e.g., Japanese: kore 'speaker proximate', sore 'addressee proximate', are 'proximate to neither speaker nor addressee'; see Hinds, 1973). Double perspective constructions are likely to be limited to 'transparent dimensions of experience' such as space and time, as these do not require calculations regarding the attention and psychological state of others: the stated perspectival values of double perspective constructions are objectively verifiable. (As we shall see, however, this does not mean that spatial demonstratives cannot develop less epistemologically transparent uses, including psychological and attentional parameters - see §5, below.)
Meta- and complex perspective constructions are defined by the embedding of one perspective inside another. In meta-perspective constructions the perspective of one person is considered from the perspective of another. This can be seen in reported speech constructions such as He said (that) linguistics has high standards of evidence, where the speaker asserts a report of another's assertion, but does not directly represent the speaker's position regarding the secondary assertion, i.e., linguistics has high standards of evidence.
[20] Arie Verhagen (p.c.) suggests a third position: that speakers can use the more optimistic common-ground scenario as a useful opening heuristic, at least in cases where there is mutually accessible evidence, then making adjustments (i.e., inferring asymmetries) when necessary; see also Verhagen (2015, especially section 3).
Complex perspective features the speaker's assertion of his/her own perspective along with that assumed by the speaker to hold for the addressee/other. The sentence He is under the illusion that linguistics has high standards of evidence, by using an anti-factive predicate in the main clause, simultaneously predicates one perspective of the embedded subject (who believes linguistics has high standards of evidence) and a different perspective of the speaker (who believes that any claim that linguistics has high standards of evidence is illusory). Summarising the contrast, a meta-perspective does not require the speaker's evaluation regarding the perspective of the other (although it may be present by implicature), whereas a complex perspective features nondefeasible assertions regarding both parties.
In the context of epistemic marking, multiple perspective constructions are arguably restricted to variants of meta- and complex perspective if one concedes that the perspective of the other is necessarily embedded in the speaker's perspective. The conceptualization of multiple perspective in epistemic marking targets the same issues that Heritage (2011, 2012a, 2012b, 2012c) details for epistemic status and stance, but with an increased focus on the different ways in which perspectives may be expressed, and what subsystems of language facilitate such expressions.

Demonstratives and the coordination of attention to objects and places

Es ist das Kernstück, es ist die bevorzugte Technik der anschaulichen Sprache, was wir als Zeigfeld beschreiben. (Bühler, 1934, p. 81)
What we describe as the deictic field is the core, the favoured technique of speech about perceptual things … (Bühler, 1990, p. 95)

Arguably the most basic of intersubjective tasks in conversation is to coordinate the speaker's and addressee's attention on an object present in the context, by drawing the latter's attention towards that object through pointing or eye-gaze. After a long period when the typology of demonstrative systems was dominated by their spatial properties (Anderson & Keenan, 1985; Diessel, 1999a, 1999b; Dixon, 2003), the field is unveiling a growing number of cases where demonstratives can best be understood as grammatical devices for bringing one's interlocutor's attention into line with one's own (cf. Janssen, 2002). As Hausendorf (2003, pp. 257-9) puts it:

"How can we account for the transition from single perceiving activities to mutually shared perception? … Whenever sensory perception is to be extended or differentiated in order to make use of what can be seen, heard, smelt or touched in the physical environment, deictic devices can be expected to make sure that these perceiving activities become mutually shared. … I would propose to consider deixis as a device whose main function is to 'help' perceiving activities to become mutually shared communicative moves. … Deixis allows visual perception to be perceived in itself."
Classic typologies of demonstrative systems (e.g., Anderson & Keenan, 1985) looked at the degrees of distance from the origo or speaker: two in (modern) English (this/that), three in Spanish (este, ese, aquel, using the analyses of Hottenroth, 1982, and Diessel, 1999a), and seven in Malagasy (but with an additional visible/invisible contrast that gives fourteen; Rasoloson & Rubino, 2005). These may then be elaborated by other spatial characteristics like up/down, upstream/downstream, etc. Despite their great variety, on these accounts all are fundamentally egocentric systems.
The next level of interpersonal complexity adds the possibility of taking other parties to the conversation as anchor points. Again, staying at the simplest level, entities can next be related to speaker, addressee, both, or neither, e.g., the three-way contrast in Japanese (kore speaker-proximal vs. sore addressee-proximal vs. are other), or the four-way contrast which is obtained in Quileute (Andrade, 1933, p. 252) by adding a fourth 'first inclusive' value: x̣o´'o 'near the speaker', so´'o 'near the second person', sa´'a 'at a comparatively short distance from both', áˑtca'a 'at a long distance'. Burarra (Glasgow & Glasgow, 1977) is similar, with some interesting further twists. 21

[21] In fact things are even more complex than this in Burarra, because there are proximal and distal forms for each of the four person-defined values, with the distal forms interacting with modes of evidence/knowledge/perceptual access depending on the person. Thus the first person inclusive distal form -gata is translated as 'that/those in sight or known to you and me', while the third person distal form -gaba is 'that/those out of sight there'. The second person proximal forms are still compatible with being close to the speaker as well, but imply either that they are habitually closer to the addressee, or near or known to him/her. For example, an out-of-town visitor to the regional capital, Darwin, on encountering locals there, might use the second person proximal form ngunyunarda because the addressees, who live there, would have greater knowledge of the current locale. This anticipates our discussion of asymmetries of knowledge in Part II, §2.

Systems that take more than one conversational party as spatial anchor points may then be elaborated further by taking degrees of distance from two or more of these reference points. Abui, for instance (Kratochvil, 2007, 2011), has speaker-proximal, addressee-proximal, speaker-medial, addressee-medial, and distal (note that the speaker vs. addressee anchor point becomes irrelevant once the referent is far enough away), among other values bringing in factors like elevation. For example, one would say do fala for 'this house, near me', to fala for 'that house, near you', o fala or lo fala for 'that house, some distance from me (but closer to me than you)', yo fala for 'that house, some distance from you (but closer to you than me)', and oro fala for 'that house (far from us both)'. Inuktitut (Denny, 1982) is another example of a language where there are two sets of demonstratives, speaker-anchored vs. other-anchored, where the second set may be anchored to a previous speaker, to the addressee, or to some other person or thing in the situation, which may not have been referred to before.
With these systems, we have now brought in interpersonal space, through the choice of speaker, addressee, both, or other as spatial anchor point, but not yet any intersubjective considerations, at least as far as most such systems are normally described. One suspects, though, that locations near the addressee, for example, are assumed to be more accessible to their attention, and even early accounts that focus on spatial semantics allow for metaphorical extensions into psychological domains. 22

At a third level of elaboration, perceptual modality enters the typology. We have already mentioned that Malagasy distinguishes visible from nonvisible in addition to seven grades of distance. In Santali (Zide, 1972, digesting material from Bodding, 1929) demonstratives can add -tɛ for objects perceived visually and -nɛ for objects perceived by other senses, which usually means aurally. Quileute (Andrade, 1933, p. 252), in addition to the four person-oriented forms mentioned above, has three forms for different types of partly or wholly invisible location: one for where they are nearby and maybe partly visible, one for where they are invisible but in a known location, and one where they are invisible and also in an unknown location. 23 The detailed analyses of the Yucatec Maya demonstrative system by Hanks (1990, 1999, 2007, 2009) show not only that there are formal contrasts based on a three-way contrast in sensory modality (visual, tactile, auditory/olfactory) in addition to distance, but also that the system is best understood as providing a "directive function … whereby they direct an addressee to look, listen or take an object in hand" (Hanks, 1999, p. 124).
[22] For example, Anderson and Keenan (1985, p. 278) write that "spatial references [in deictic systems] serve as the basis, in most languages, for a variety of metaphorical extensions into other domains. … notions such as 'near to the speaker' may be interpreted not only in the literal, physical sense, but also by extension to 'psychological proximity', i.e. vividness to the mind of the speaker". They stop short, however, of mentioning more intersubjective metaphorisations such as we will see below.
[23] In Andrade's words, the first of the invisible forms is used "when the location is near or when the speaker is in it, and hence, visible only in part". And of the other two, he says "[t]heir use depends on whether the place is known to the speaker from previous direct experience, having been there, or whether he imagines the place or has heard of it" (1933, p. 252).
Our journey through demonstrative systems has thus led us into increasingly intersubjective terrain. Starting with a primarily spatial system, 24 we passed to systems which recognise other conversational participants as the anchor point for reckoning spatial relations, then on to those which direct the sensory modality which their interlocutors should use in searching for referents. We now raise the intercognitive status a final notch, examining demonstrative systems that explicitly encode the speaker's assumptions about whether the addressee has succeeded in locking onto the referent.
The first language for which this was shown clearly was Turkish, in studies by Aslı Özyürek (1998) and her colleagues Sotaro Kita (Özyürek & Kita, n.d.) and Aylin Küntay (Küntay & Özyürek, 2002, 2006). Turkish has a three-valued demonstrative system with the forms bu, şu, and o, which had previously been analysed as a person-based system on Japanese lines (e.g., Lyons, 1968) or as a distance-based system on Spanish lines (Bastuji, 1976; Serebrennikov & Gadzuyeva, 1979). However, these early analyses drew their base data from written texts in which the dynamics of face-to-face interaction could not be gauged accurately. Özyürek and her colleagues broke new ground by using videos of face-to-face interaction in which it was possible to track eye-gaze and pointing 25 behaviour at the same time as demonstrative use, leading to the following breakthrough.
Two of the Turkish demonstrative forms, bu and o, appear to be used roughly like English this and that, contrasting entities close to and distant from the speaker. It is the third form şu which is unusual compared to previously studied systems: it can be used for objects at any distance, but only if joint attention has not yet been established. This gives us the following set (Table 2), adjusting the first two for the fact that, unlike English, they require joint attention to be established in addition to specifying distance.
Consider the following example from the work of Özyürek and her colleagues. A teacher and two students are in a pottery class and one of the students wishes to refer to an object that is at the other end of the room. She points to it but the teacher's gaze has yet to fix on it (example (4) and Figure 1); at this point she uses the term şu:

(4) ya hocam şu oval mesela
    well teacher NONMUTDEM oval for.example
    'well sir that oval (one) for example'

[24] To be clear here: we are not claiming that the systems considered so far disallow intercognitive readings (see footnote 22 on the 'metaphorical extensions' referred to by Anderson & Keenan, 1985), but rather that they contain no form whose meanings have been analysed as primarily intersubjective.
[25] Turkish speakers also use other means of indication, such as eyebrow-raising or raising the chin slightly (Göksel & Kerslake, 2005), though these were not mentioned in the Özyürek & Kita (n.d.) study.
In a second, more elaborated, utterance, in which she keeps pointing to the vase but the teacher's gaze has yet to lock onto it (example (5) and Figure 2), she continues to use the possessive form of şu, namely şunun 'of that one (which you have yet to identify)':

(5) şu-nun dış yüzey-in-e koy-up da
    NONMUTDEM-GEN outer surface-GEN-DAT put-GER CONNEC
    'by putting it on that thing's outer surface'

Finally the teacher's gaze moves up to follow the point and locate the referent (example (6) and Figure 3), and now the speaker switches to o, the form for distant but mutually attended objects (o is suffixed by (n)dan to mean 'from that'):

(6) ondan da olabilir
    DIST:ABL and possible
    'That could be one as well.'

We can summarise how the Turkish deictic routine works in the following way: use a combination of pointing plus şu until you are sure of having achieved mutual attention on the object at issue, then proceed by using bu or o according to the distance to the referent.

Table 2. The Turkish demonstrative system (after Özyürek & Kita, n.d.; Küntay & Özyürek, 2002)

Our second example comes from work by Niclas Burenhult (2003, 2008) on the Aslian language Jahai, spoken in Malaysia. Jahai has a set of eight demonstratives which can be arranged as in Table 3. The forms starting with a glottal stop (ʔ) are adverbials like 'here', while those starting with t are nominal demonstratives with meanings like 'this', but the logic of these two series is otherwise identical.
According to Burenhult, the Jahai conceive of conversation as a sort of container, and as "soon as a person addresses another person, they and the area between them become a connected spatial entity" (Burenhult, 2008, p. 116). The last four pairs in the table position objects with respect to that container. If we imagine it cut in half by a line between the speaker and the addressee, those on the speaker's side but outside the container will be denoted by tadeh, those outside it but on the addressee's side by tɲɨʔ. Those conspicuously above or below the speech situation will be identified using the so-called superjacent or subjacent demonstratives from the 'elevation' set.
But it is the top four which interest us more here, and in particular the 'addressee-anchored accessible' ton. Burenhult obtained revealing data on this system using a 'director-matching task' where a 'director' has a photograph of different arrangements of objects, which he describes orally to a 'matcher' whose job is to reproduce the arrangement using real objects. In addition to his own photograph, the director can see the matcher and what he is setting out, whereas the matcher can only see his own objects and needs to rely on the director's verbal description. Under these circumstances, discourses are produced which typically begin with the director's introduction of a referent (e.g., 'take the one which is flat and round'), proceed with a sequence of demonstrative exhortations by the […] (8)).

Table 3. Jahai demonstratives (Burenhult, 2003)
(7) tũn - tɲɨʔ - ton
    'that (on your side but so far inaccessible to you) - that way over/on your side - that.one.now'

or

(8) taniʔ - taniʔ - ton
    'this one (inaccess.) - this one (inaccess.) - that one now!'

The way the Jahai demonstratives track the speaker's monitoring of the addressee's attention is thus rather similar to Turkish, but the actual progression is almost the converse (see Table 4). The initial şu forms in Turkish give no spatial information of their own, merely telling the addressee to keep looking (in particular, to follow the point), but once lock-in has been achieved they give way to spatially specific forms (close to or far from speaker). In Jahai the forms used give much more spatial information as the progression unfolds: is it in the speaker's or the addressee's half of the container, or close to the speaker or the addressee? But once lock-in has been achieved, the form ton is used regardless of exact spatial position, as if the attentional accessibility of the object now makes spatial information unnecessary.
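The two progressions just contrasted can be read as small decision procedures. The Python sketch below is offered only as an illustration under simplifying assumptions: the function names, the two-valued `Distance` type, and the `spatial_form` parameter are hypothetical and appear in none of the cited analyses, and the Jahai function deliberately leaves the choice among the spatially specific forms to the caller rather than modelling it.

```python
from enum import Enum


class Distance(Enum):
    """Hypothetical two-way distance contrast relative to the speaker."""
    PROXIMAL = "proximal"
    DISTAL = "distal"


def turkish_demonstrative(joint_attention: bool, distance: Distance) -> str:
    """Sketch of the Turkish routine: şu until joint attention, then bu or o."""
    if not joint_attention:
        # Before lock-in, şu is used whatever the distance of the referent.
        return "şu"
    # After lock-in, the choice depends on distance from the speaker.
    return "bu" if distance is Distance.PROXIMAL else "o"


def jahai_demonstrative(joint_attention: bool, spatial_form: str) -> str:
    """Sketch of the Jahai progression: spatial forms first, then ton."""
    if joint_attention:
        # After lock-in, ton is used regardless of exact spatial position.
        return "ton"
    # Before lock-in, a spatially specific form (supplied by the caller) is used.
    return spatial_form


if __name__ == "__main__":
    # Turkish examples (4)-(6): distant object, attention achieved only at the end.
    print([turkish_demonstrative(a, Distance.DISTAL) for a in (False, False, True)])
    # Jahai example (7): two spatial forms, then ton once the matcher has locked on.
    steps = [(False, "tũn"), (False, "tɲɨʔ"), (True, "tɲɨʔ")]
    print([jahai_demonstrative(a, f) for a, f in steps])
```

Run as a script, the sketch prints the sequences şu, şu, o and tũn, tɲɨʔ, ton, mirroring examples (4)-(6) and (7). The point of the comparison is that joint attention switches spatial information on in the Turkish sketch and off in the Jahai one.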
Before leaving these two systems, an observation is in order about the communicative ecology of pointing on the one hand and the demonstrative system on the other. The Turkish example makes it clear that achieving reference in conversation combines both gestural and linguistic elements, as the demonstrative şu signals to the addressee to keep attending to the point. Indeed, Küntay and Özyürek (2002), who were puzzled by the fact that children still have not mastered the correct use of şu by the age of six despite the well-attested abilities of much younger children to monitor the gaze of adults, suggest that the delayed development is due to the extra cognitive demands of coordinating linguistic and gestural elements. 26

[26] Küntay and Özyürek (2002, p. 345) write that "These results might sound surprising in light of research indicating that joint attention is a very early communicative process that appears in infancy (Trevarthen 1998)". They go on to suggest that a "reason that we can propose is the integration of nonverbal factors with verbal expressions is a protracted developmental process (Goldin-Meadow, Alibali & Church 1993), and needs to develop further beyond 6 years of age. Especially when this integration is called for in a conversational task" (2002, p. 345). However, we believe that the late development of Turkish demonstrative use may in fact not be so surprising once we adopt a more graded view of how theory of mind develops, and note the fact that adult levels of theory of mind may not phase in till anywhere between five and eleven according to the specific test used (Saxe & Baron-Cohen, 2006). For a 'dual-process' theory of mind model that starts children off with an innate, rudimentary module available at birth, then refined through cultural learning at a much later age, see Apperly and Butterfill (2009).

On the other hand, in Jahai the use of actual pointing is much more limited. Within the experimental 'director-matcher' set-up, pointing was not an allowable part of the procedure. And in more naturalistic settings Burenhult mentions a number of reasons why pointing is much less common among Jahai than among most other cultures: communication often occurs while walking single-file along forest paths, or between spouses after dark, and in any case there are a number of cultural taboos against pointing. He goes on to suggest that the elaboration of the Jahai demonstrative system, which in effect gives a complex series of clues as to how the addressee should keep looking, compensates for the unavailability of pointing in many circumstances.
We draw our examination of demonstratives to a close by looking more briefly at two further examples where monitoring of the addressee's attention and expectations is relevant, though not in the sense we have seen of directly tracking whether they have latched onto the referent but rather in helping them assess its identification against previous expectations or searches.
The first comes from the Australian language Bininj Gun-wok, Gun-djeihmi dialect (Evans, 2003). Among a large number of demonstratives (and just giving the masculine forms, beginning with na-), an interesting part of this system is the intersection of distance with whether the speaker deems the addressee to have had some previous interest in the entity at issue. Let's say you are looking for something without success, and I spot it: I would then say either nabernu (if it is distant) or nabehrnu (the h represents a glottal stop) if it is close to hand. On the other hand, if I present something which I didn't think you had been interested in before (say I find a new plant which you didn't know existed) I could hold it up to you and say nahni. In other words, the system tracks pre-existing cognitive interest (or not) on the part of the addressee, and crosses this with distance.
A related phenomenon is attested for the Athapaskan language Kaska, namely the class of directionals (Moore, 2002, ch. 19; the term is also used by Golla, 1996), also referred to in the Athabaskanist literature as 'deictic/directionals' (Rice, 1989) and 'locationals' (Henry & Henry, 1969). Leer (1989) has proposed that these derive from old sequences of a demonstrative plus a noun. Kaska directionals resemble demonstrative adverbs, and are built from two parts. The stem has spatial meanings like 'off to the side', 'above', 'below', 'downstream', 'back down a trail', or temporal meanings like 'past' or 'future'. But it is the prefixes which concern us here, since these are sensitive to shared or unshared knowledge states. Of crucial interest is the way three of the prefixes indicate different distributions of knowledge about the location across the speaker and addressee:

"With reference to the more distant locations, the directional also indicates whether the speaker and the addressee know the exact location being referred to. For instance, the prefix kúh- is used when the exact location is known by both the speaker and those they are addressing. As other examples, the prefix de- is used when the location is known by the speaker, but not those they are addressing, and the prefix ah- is used when neither the speaker nor their audience know the exact destination, but only its approximate direction." (Moore, 2002, p. 404; italics added)

In terms of the four-way set of engagement values we found for Andoke (§2), this set covers three of the values: speaker-only, shared, and known to neither. It is only the fourth term, for the situation where the speaker does not know the exact location but expects that the addressee might, that appears to be missing from this system.
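The distribution of knowledge states that Moore describes can be laid out as a lookup table. The Python sketch below is illustrative only: the function name and the boolean encoding of 'knows the exact location' are assumptions introduced here, not part of Moore's analysis, and the table applies only to the more distant locations to which the quoted passage refers.

```python
from typing import Optional

# Keys are (speaker_knows, addressee_knows) for the exact location referred to.
# The prefix forms are those quoted from Moore (2002); the boolean encoding is
# a hypothetical simplification made for this sketch.
KASKA_PREFIXES = {
    (True, True): "kúh-",    # known to both speaker and addressee
    (True, False): "de-",    # known to the speaker only
    (False, False): "ah-",   # known to neither; only approximate direction
}


def kaska_prefix(speaker_knows: bool, addressee_knows: bool) -> Optional[str]:
    """Return the directional prefix, or None for the cell with no reported form."""
    return KASKA_PREFIXES.get((speaker_knows, addressee_knows))
```

The absent key, `(False, True)`, corresponds to the fourth engagement value, which appears to have no dedicated prefix in this system.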
27 Finally, we note that marking the mutual knowledge of speaker and addressee as regards an entity also appears to be relevant to what have been analysed as evidential morphemes either within or outside demonstrative systems, although these are generally less well understood and less documented cross-linguistically (see Jacques, in press). Storch and Coly (2014, p. 8) describe the suffix -dìyà in Maaka (Nigeria) as indicating "that both speaker and hearer know or see the participant in question" (9). They further comment that this form originates from a Kanuri term meaning 'surely, entirely, only', highlighting the connection between joint witnessing and the establishment of truth (see also comments reproduced from Sillitoe, 2010, in Part II, §3).
(9) ʔáa-kè-díɓɓ zùlúm-tò-dìyà
    COND-2SG:MASC-crush:PERV anus-POSS:3SG:FEM-JOINT:VIS
    tà-kwáadà-ntí-mìnê gè-ʔámmà-à
    3SG:FEM-throw:TR-ASSERT-OBJ:1PL LOC-water-DEF
    'If you crush her anus [that we can both see] she will definitely throw us into the water.' (Storch & Coly, 2014, p. 197)

[27] Functionally, we might imagine that an interrogative form would fill this gap.
Across the world, in the South American language Lakondê (Telles & Wetzels, 2006) a nominal morpheme -te- 'N.PROX' is described as encoding both spatial distance and mutual visual perception. For example, 'sih-te-'te 'house-N.PROX-REF' is translated as 'house which we see at a distance'. 28 Such nominal markers seem to be a genetic feature of Mamaindê languages and are especially elaborate in Southern Nambiquaran, which has aspect, tense, evidential, and engagement (termed 'individual/collective verification'; Kroeker, 2001) marking on definite nouns (Lowe, 1999, p. 282). For example, the expression wa3lin3su3ait3tã2 (numbers indicate tones) is glossed as 'this manioc root that I, but not you, saw some time in the past' and may be contrasted to wa3lin3su3ait3ta3li2, meaning 'this manioc root that we (both) saw some time in the past' (Lowe, 1999, p. 282; cf. Kroeker, 2001, pp. 45-6). The meaning contrast between individual and collective verification of the manioc root may be traced to the -tã2 (individual verification) and -li2 (collective verification) suffixes at the ends of the nominals. The complexity of Southern Nambiquaran, while staggering at first glance, is suggestive of the potential range of variation and the richness of such systems.
We have focused in such detail here on demonstratives because they are the syntactically simplest method of achieving mutual coordination, as investigators have pointed out, from Bühler 29 (as quoted in the epigraph to this section) on to Diessel:

"demonstratives function to coordinate the interlocutors' shared attentional focus. In the simplest case, the demonstrative is used to direct the addressee's attention to a referent that previously was not in the shared attentional focus; in this case, the demonstrative creates a new joint focus of attention. However, demonstratives are also commonly used to direct the addressee's attention from the current referent to a previously established referent or to differentiate between multiple referents that are already in the shared attentional focus." (Diessel, 2006, p. 470)

Demonstratives generally distinguish a reasonably large set of ontological categories: entities (this), places (here), times (now), manners (thus), and so forth, welded together with deictics into sets like koko/soko/asoko 'here / there [by you] / there (away from us both)' in Japanese. However, the syntactic level at which they apply can be disarmingly simple. This makes it possible to use them in the most basic imaginable types of mini-dialogue, of the type discussed by Karcevski (1948/1969) 30 for Russian pairs like Ty kuda? Tuda. ('You ('re going) whither?' 'Thither (accompanied by a suitable gesture).'); see Diessel (2003) and Evans (2012) for further discussion of these 'dialogic parallelisms'. These Karcevskian dialogues are possible because the semantics of the deictic expressions is essentially self-contained: 31 a pairing of a deictic value (e.g., proximal vs. distal) and an ontological one (e.g., place, or time, or manner).

[28] However, while both Maaka and Lakondê also have contrastive nominal markers that can indicate the visual perception of the speaker only, these do not form a clear paradigm with the mutual witness forms. In the Maaka case, the speaker-witness form -mu is only used with topicalised participants, while in Lakondê the speaker-only visual evidential -ta- does not encode spatial information and, unlike -te-, can be used on both nouns and verbs. It remains a possibility that joint 'eye-witness' markers are as much to do with mutual attentional status and affirmation as with information source and perception per se.

[29] And in fact this line of argument goes back through Steinthal (1891, p. 313) to Apollonius Dyscolus.

In Part II of this paper, we will pass to a number of systems where attentional coordination has been expanded to the point where it concerns not just objects, but the broader domain of events and the epistemic background to talking about them. There are some important differences between engagement as it can apply to objects (especially objects that are present in the speech situation) and as it applies to events and situations, which may require increased abstraction in reference, and, once in the past, are not available for ostension and must rather be remembered, learnt, believed, etc. We explore the complexity of encoding the differential accessibility of events using data from languages of the Americas, Papua New Guinea, and Northern India (for example: Did the speaker directly experience this event? Did the addressee experience it, too?). Finally, we see that, as regards the category of engagement, the distinction between objects and states of affairs is not so hard-and-fast: Abui shows that a diachronic pathway between the two can be traced via the increased functionality of demonstrative forms. And so we move from the world of entities, as discussed in Part I of this study, to the world of events, the topic of Part II.
FL130100111 'The Wellsprings of Linguistic Diversity'), the Australian Research Council Centre of Excellence for the Dynamics of Language, the Alexander von Humboldt Foundation (Anneliese Maier Forschungspreis to Evans), the Swedish Research Council (dnr. 2011-2274), and the Netherlands Organisation for Scientific Research (NWO), Veni award 275-89-024, 'Learning the senses: Perception verbs in child-caregiver interaction', as well as to our respective host institutions: the Australian National University, Stockholm University, and Radboud Universiteit in Nijmegen. The ideas in this paper have emerged from discussions with many people, and we particularly thank the following: Niclas Burenhult, Bill Hanks, Sotaro Kita, Jon Landaburu, and Aslı Özyürek; we additionally thank Sotaro Kita and Aslı Özyürek for permission to reproduce figures from an unpublished paper they wrote on Turkish demonstratives that has influenced us deeply. Ron Planer, Matt Spike, Alan Rumsey, and Arie Verhagen gave much-appreciated helpful critical comments on an earlier version of this manuscript, as did two anonymous referees, and Susan Ford did an immaculate job in checking, formatting, and editing it.

References