13.1 Introduction
Prosody is an invaluable component of effective spoken communication, particularly in situated dialogue, where it supports many pragmatic functions, including those relating to turn-taking, to conveying stance, to marking topic structure, and so on. Pinning down how prosody does these things is not easy. While one can readily notice that prosody plays a role in some function, such as marking an utterance as a complaint, identifying how it does so is hard.
However, over the past two decades, a diverse group of pioneering researchers has been examining the phonetic details of the prosody of dialogue behaviors. In 2010, Richard Ogden, noting how a pragmatic function may be conveyed by a temporal configuration of prosodic features, introduced the term ‘prosodic construction’ to describe such form–function pairings (Ogden 2010). Subsequent work by Day-O’Connell (2013), Niebuhr (2015), and others, as surveyed elsewhere (Ward 2019), has elucidated the detailed properties of several prosodic constructions and systematized the notion.
The first aim of this chapter is to provide an overview of the notion of prosodic constructions, and the second is to discuss the similarities and differences between prosodic constructions and grammatical constructions. The chapter is structured around eight key aspects of prosodic constructions, covered one by one in the next eight sections. These observations and claims are illustrated using examples from American English, taken mostly from Ward (2019). The chapter concludes by discussing the advantages of using the notion of prosodic construction for analysis and by outlining some prospects and key challenges.
13.2 A Prosodic Construction Is a Combination of Components
One way to express admiration of a cute baby is by saying awwww with a very specific prosody. This includes a single very long syllable, high pitch that is fairly flat with a long shallow dip in the middle, creaky voicing, and nasalization. None of these individual features conveys admiration, but their combination does. Since the meaning of the whole is more than the meaning of its parts, it is appropriate to describe this pattern as a construction: a specific form–function mapping. Figure 13.1 summarizes this with an informal visualization.
As this example illustrates, prosodic constructions may involve many prosodic features. Indeed, the list of features identified as having a role in some prosodic construction is just the list of all perceptually salient prosodic features, as summarized in Table 13.1.
Table 13.1 Prosodic features known to be used in prosodic constructions
| Feature type | Description |
|---|---|
| Pitch features | notably the extent to which the pitch is high, low, wide, narrow, flat, rising, or falling |
| Intensity | namely, the loudness or quietness of the voice |
| Timing features | including local variations in the speaking rate and the durations of syllables and of pauses |
| Voicing features | notably creaky voice, breathy voice, high harmonicity, and falsetto |
| Other features | including nasalization and phonetic articulation and reduction |
Prosodic constructions may also involve non-prosodic components, such as lexical items, other phonetic elements and properties, and multimodal behaviors. For example, the Awww-of-Cute Construction involves not only specific lexical content, namely, the word awww, but also initial and final glottal stops, a gentle smile, cocked head, direct gaze, and a relaxed posture.
The inclusion of many feature types in the notion of prosodic constructions is important: This makes the notion much more descriptively useful than such earlier notions as intonation contour or tone sequence. Historically it was intonation, that is, pitch phenomena, that received the lion’s share of attention in prosody. This is unsurprising, since pitch is the most perceptually salient prosodic property, the easiest to measure, the easiest to visualize, and the easiest to describe symbolically. Yet it is easy to see that intonation-only accounts are often inadequate.
As a first example, consider the prosodic construction used in English to get attention or to cue action, as in excuse me, in peek-a-boo, in knock-knock as the prefix to a knock-knock joke, and when calling a distant person by name. The most salient prosodic feature is an abrupt drop in pitch (a downstep), stereotypically of three semitones, from which this construction takes its name: the Minor Third Construction. But much more is involved. This was shown by Day-O’Connell (2013), who had subjects produce phrases in two ways. For example, they produced dinner either as a call to come and eat or as a word in a declarative statement. Systematic comparison across a broad set of phonetic features revealed many robust differences. The attention-getting and action-cueing renditions not only included the downstep but were also louder, flatter in pitch both before and after the pitch drop, longer both before and after the pitch drop, higher in harmonicity, and higher overall in the speaker’s pitch range. These features are not accidental correlates but essential components of this construction, as shown by the fact that similar downsteps, when appearing with different configurations of other features, can convey different meanings. For example, the prosody of apologies, as might occur with I’m sorry, shares a downstep but tends to be quieter and less high in the speaker’s range, with slightly creaky, rather than harmonic, voicing, a very long post-downstep syllable, and less strict pitch flattening. Pitch downsteps are also found in curses, such as screw you, and in positive assessments, such as good job, but these too are distinguishable by their co-occurring non-intonational prosodic features (Niebuhr 2015; Ward 2019).
A second example is uptalk, in which statements end with rising pitch. Considering intonation alone, the differences from questions are subtle and variable (Ritchart & Arvaniti 2014). However, looking beyond pitch, there is often a clear difference: Uptalk is frequently breathy (Ward et al. 2022). Functionally, utterances with uptalk are not a mere dialectal variant of statements; rather, they are typically employed in the process of establishing common ground, regarding what to talk about or what to call it. Thus uptalk too may be considered a prosodic construction.
Both grammatical constructions and prosodic constructions are built from components, but for prosodic constructions these are not words and structures, but prosodic features.
13.3 A Prosodic Construction May Be a Joint Construction
Some prosodic constructions involve behavior by both speakers. That is, the components of a prosodic construction can include contributions from either speaker, or from more than two participants in multiparty dialogue. For example, English has an Enthusiastic Overlap Construction, in which both speakers are active at the same time, speaking loudly, using a wide pitch range, and tending slightly toward elevated pitch. Speakers ‘perform’ this joint construction together to mark various kinds of shared feeling, including commiseration, appreciation, and amusement, as in shared laughter.
Another example is the Backchanneling Construction, diagrammed in Figure 13.2, which is commonly used by interlocutors both to mark the transition from one chunk of information to the next and to display their intent to continue in their current roles as speaker and listener.

Figure 13.2 Approximate temporal domains of influence of the components of the Backchanneling Construction (adapted from Ward 2019)
While the words spoken in dialogue may reflect joint projects, as seen for example in question–answer pairs, in co-construction, and in repair sequences, it is not yet clear whether it is appropriate to analyze such behaviors in terms of joint grammatical constructions. For prosody, in contrast, there are several types of behavior which are clearly best described as instantiating joint prosodic constructions, including not only enthusiastic overlaps, but also turn exchanges, backchanneling, and response cueing (Ward 2019, 2020).
13.4 A Prosodic Construction Is a Temporal Configuration
Consider the prosody of positive assessment, as it might occur when praising someone with the phrase good job. Figure 13.3 shows this as a gestural score, in the style of articulatory phonology. Laying out the components along a timeline illustrates that this is not just an unstructured collection of features: The ordering matters. This was confirmed in an experiment that elicited judgments of different stimuli in terms of how positive they sound (Ward & Jodoin 2019). Neutral stimuli were manipulated in two ways, either to increase the pitch and duration everywhere or to increase the pitch of only the first syllable and the duration of only the second. The latter forms, that is, those where the temporal configuration was present, were judged more positive.

Figure 13.3 Approximate temporal domains of influence of the components in an instance of the Positive Assessment Construction, with times in milliseconds
Components of prosodic constructions have typical durations, as suggested by the figure, but these are not fixed. Among other factors, these can depend on the lexical content on which the prosody is realized. For example, if the second part of this construction aligns with more syllables, as when the Positive Assessment Construction is used with the word excellent to express praise, the duration increase may affect both of the last two syllables. For longer phrases, such as that was really good, the prosodic construction can align in different places, with the high pitch falling variously on that, really, or just the first syllable of really, and in each case the domain of lengthening and increased volume applies to the following syllable.
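To make this alignment flexibility concrete, here is a minimal Python sketch, assuming a deliberately simplified representation in which each syllable carries a set of discrete feature labels. The `Syllable` class, the function `align_positive_assessment`, and its `lengthen_span` parameter are hypothetical illustrations rather than an established tool; real realizations involve graded, overlapping phonetic values, not labels.

```python
from dataclasses import dataclass, field


@dataclass
class Syllable:
    text: str
    # Discrete feature labels: a simplification of graded phonetic values.
    features: set = field(default_factory=set)


def align_positive_assessment(syllables, peak_index, lengthen_span=1):
    """Hypothetical sketch of aligning the Positive Assessment Construction.

    Component 1: high pitch on the syllable at peak_index.
    Component 2: lengthening and increased loudness on the following
    lengthen_span syllables.
    """
    if not 0 <= peak_index < len(syllables) - 1:
        raise ValueError("the pitch peak needs at least one following syllable")
    aligned = [Syllable(s) for s in syllables]
    aligned[peak_index].features.add("high pitch")
    # Slicing past the end is safe: the span then covers fewer syllables.
    for syl in aligned[peak_index + 1 : peak_index + 1 + lengthen_span]:
        syl.features.update({"lengthened", "louder"})
    return aligned


if __name__ == "__main__":
    for words, peak, span in [
        (["good", "job"], 0, 1),
        (["ex", "cel", "lent"], 0, 2),
        (["that", "was", "rea", "lly", "good"], 2, 1),
    ]:
        result = align_positive_assessment(words, peak, span)
        print([(s.text, sorted(s.features)) for s in result])
```

Moving `peak_index` reproduces the alternative alignments mentioned for that was really good; in this toy model the lengthened, louder domain always immediately follows the pitch peak.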
It is worth noting that prosodic constructions vary greatly in their time scales. The examples in this chapter are mostly of prosodic constructions that typically span just a few words, as these make for convenient illustrations. However, the time spans of prosodic constructions can be much longer or much shorter. For example, someone with a lot to say can employ the Turn-Holding Construction over tens of seconds: This involves greater than average loudness, fairly careful articulation, narrow pitch range at the start and the end, and an overall slow drop in pitch (declination). At the other extreme, the Late Peak Construction, used to mark suggestion and many other functions, operates over just a syllable or two, its signature property being a pitch peak that occurs on the order of 80 milliseconds after the intensity peak of a stressed syllable, rather than closely aligned with it, which is the default position (Ward 2018).
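The timing criterion for the Late Peak Construction can be illustrated with a small sketch, assuming pitch and intensity contours that have already been extracted at a fixed frame rate over a single stressed syllable, with every frame voiced. The 10 ms frame size, the 50 ms threshold for calling a peak ‘late’, and the example contours are invented for illustration; only the roughly 80 ms typical lag comes from the text.

```python
def argmax(values):
    """Index of the largest value (the first one if tied)."""
    return max(range(len(values)), key=values.__getitem__)


def peak_lag_ms(pitch_hz, intensity_db, frame_ms=10):
    """Milliseconds by which the pitch peak follows the intensity peak.

    Assumes both contours cover the same frames of one stressed syllable
    and that every frame is voiced (no missing pitch values).
    """
    if len(pitch_hz) != len(intensity_db) or not pitch_hz:
        raise ValueError("contours must be non-empty and of equal length")
    return (argmax(pitch_hz) - argmax(intensity_db)) * frame_ms


def is_late_peak(pitch_hz, intensity_db, frame_ms=10, threshold_ms=50):
    """Hypothetical classifier: True if the pitch peak lags by >= threshold_ms."""
    return peak_lag_ms(pitch_hz, intensity_db, frame_ms) >= threshold_ms


# Invented example contours, one value per 10 ms frame.
intensity = [60, 62, 65, 68, 70, 72, 71, 69, 67, 65, 63, 61, 60, 58, 55]
late_pitch = [180, 182, 185, 188, 190, 192, 195, 198, 202, 206, 210, 214, 218, 222, 215]
aligned_pitch = [180, 185, 190, 195, 200, 210, 205, 200, 195, 190, 186, 183, 180, 178, 175]

print(peak_lag_ms(late_pitch, intensity))      # 80
print(is_late_peak(aligned_pitch, intensity))  # False
```

A frame-level argmax is the simplest possible peak picker; real pitch tracks would need smoothing and handling of unvoiced frames before such a comparison is meaningful.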
The fact that prosodic constructions have associated time courses is an important differentiator that sets them off from two other realms of prosodic function. On the one hand, this contrasts with the uses of prosody in paralinguistics, where the prosodic properties tend to be ongoing. For example, someone who is old may have pervasively creaky voice and someone who is sad may have low intensity and slow speech as long as the feeling lasts. In contrast, in prosodic constructions the component features appear for limited times and their temporal specifications are interlinked. On the other hand, the temporal structure of prosodic constructions contrasts with that of the unit-related functions of prosody, for example, when syllable-bound prosody marks lexical stress or identifies a tone. In such phonological roles, prosodic features are fairly closely linked to the temporal span of a syllable, word, or phrase. While unit-linked prosody can also involve complex configurations of prosodic features (Landgraf 2014; Niebuhr 2019), the temporal properties of pragmatics-serving prosodic forms generally involve more flexibility and complexity in how they align with syllables and lexical units.
Both grammatical constructions and prosodic constructions involve sequentiality, but for grammatical constructions these are just linear sequences of words and other constructions, whereas for prosodic constructions the time axis is involved in a more complex way, with overlaps of features being quite common. There is also complexity in the processes by which the time axis is stretched or compressed, for example in order to fit a complex intonation contour onto just a word or two, or, in other cases, to truncate it (Torreira & Grice 2018).
13.5 A Prosodic Construction Is a Form–Function Mapping
By definition, a construction is a form that serves a function. Importantly, prosodic constructions embody direct form–function mappings. In this, a construction-based approach departs from many previous approaches to prosodic phenomena. In particular, it stands in contrast to approaches that postulate symbol-level or phonological descriptions of prosody, such as H!H% or L*H% (Ladd 2008), to mediate between, on the one hand, phonetically accurate descriptions and, on the other, descriptions of meaning. There are at least two practical advantages of description in terms of prosodic constructions over descriptions in terms of symbol sequences. First, they encourage observation and accurate description of the details of phonetic form. Second, modeling in terms of direct form–function mappings avoids the difficult or impossible task of devising a finite set of symbols to represent all the diverse components of meaning-bearing prosodic configurations. But it must be acknowledged that the advantages are not all on one side and, in any case, the enterprise of modeling pragmatics-related prosody comes with intrinsic issues that are challenging for any methodology (Niebuhr & Ward 2018).
It is worth noting that prosodic constructions vary greatly both in their forms and in the sorts of functions they serve. One important dimension of variation is from specific to general. At one extreme, the Awww-of-Cute Construction is only used when admiring baby animals and only in specific situations. Correspondingly, its prosody is almost fully specified. At the other extreme, the meaning of the Late Peak Construction is quite diffuse: It serves at least twenty-four functions, including making suggestions, marking incredulity, and correcting misconceptions. While these do bear family resemblances to each other, it is hard to characterize them as subtypes of any single meaning. Its prosodic form is correspondingly underspecified: This construction only specifies things about pitch height and intensity, with every other aspect of the realization left free to be determined by other, superimposed constructions. Towards the middle of the continuum are constructions like the Positive Assessment Construction. While this does seem to have one overarching meaning, in any specific utterance it commonly resolves to a more specific type of positive assessment, such as admiration, as with she dresses so cute, approval, as with you got it, appreciation, as with thank you, encouragement, praise, flattery, and so on.
Both grammatical constructions and prosodic constructions represent direct form–function mappings. Both grammatical constructions and prosodic constructions vary in the level of specificity of the meanings conveyed. However, it is possible that the functions of prosodic constructions more often relate to pragmatic and interactional intents (Couper-Kuhlen & Selting 2018), while grammatical constructions relate more often to semantics.
13.6 A Prosodic Construction Can Have Specific Contexts of Use
Consider again the Awww-of-Cute Construction. This is appropriate only in certain contexts. It would be strange, for example, to say it when watching a baby on television, or to comment on a baby who’s actively playing, or for a baby that is your child or that you’re caring for, or for a baby that you’re reaching to pick up. Rather, this construction is prototypically appropriate when the baby is with her mother, when you’re interacting with the mother and already have a joint focus of attention on the baby, when the baby is asleep or otherwise quiet, when your intention is to admire the baby from a distance, without bothering it or taking it away from the mother, and when you are female. Of course these are not strict requirements; for example, people may say awww when someone is showing a picture of their grandbaby, or of a sleeping puppy or kitten.
The more general prosodic constructions also have constraints on their context of use, although these tend to be less restrictive. For example, the Backchanneling Construction is most appropriate after a conversation is well established, with the participants having settled on a topic and established for the moment who has the floor.
Both grammatical constructions and prosodic constructions have context dependencies. While context dependencies are often less important for grammatical constructions occurring in written language, in dialogue they are very relevant for constructions of both types (see also Chapters 12 and 14).
13.7 A Prosodic Construction Can Be Combined with Other Constructions
Consider calling someone by name, say Susan, to get her attention and warn her of danger.
This often involves two superimposed constructions: the Minor Third Construction and the Late Peak Construction. Both contribute elements of meaning: The Minor Third Construction calls for her attention and the Late Peak Construction calls for her to infer something. Thus, for example, if Susan is a child moving towards water that is too deep for her, this combination can cue her both to infer the danger and to display awareness of it. The two constructions also both contribute elements of the form. Indeed, these two mesh very well, as the Late Peak Construction’s need for a pitch peak on a long syllable is neatly satisfied by the Minor Third Construction’s specification for a second syllable that is long, loud, and relatively high in the speaker’s pitch range. Figure 13.4 illustrates how the pitch specifications of these two constructions can add together to determine the pitch contour: a downstep plus a final upturn. Figure 13.5 shows the same process in terms of the pitch height gestural scores.

Figure 13.4 Pitch Contour Superposition
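The additive combination in Figure 13.4 can be sketched numerically. This minimal Python sketch assumes that each construction contributes a pitch offset, in semitones relative to a speaker baseline, as a function of normalized time over the two syllables of Susan, and that superposition is simple addition. The contour shapes, the upturn starting at 80% of the span, the 4-semitone rise, and the 200 Hz baseline are illustrative assumptions, not measured values.

```python
def minor_third(t):
    """Downstep of three semitones halfway through, flat before and after."""
    return 0.0 if t < 0.5 else -3.0


def late_peak(t):
    """Hypothetical final upturn: linear rise over the last 20% of the span."""
    return 0.0 if t < 0.8 else 4.0 * (t - 0.8) / 0.2


def superpose(*components):
    """Combine constructions by adding their pitch contributions."""
    return lambda t: sum(component(t) for component in components)


def semitones_to_hz(st, base_hz=200.0):
    """Convert a semitone offset from the baseline into Hz."""
    return base_hz * 2 ** (st / 12)


n_frames = 11
times = [i / (n_frames - 1) for i in range(n_frames)]  # normalized time 0.0 .. 1.0
combined = superpose(minor_third, late_peak)
contour_st = [combined(t) for t in times]
contour_hz = [semitones_to_hz(st) for st in contour_st]

for t, st, hz in zip(times, contour_st, contour_hz):
    print(f"t={t:.1f}  {st:+.1f} st  {hz:6.1f} Hz")
```

Because each construction here only specifies an offset, either one can be dropped or swapped out without touching the other, which is one simple way to picture why superimposed prosodic constructions need not constrain each other much.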

Superposition is not rare: People in conversation are usually trying to accomplish many communicative goals simultaneously, and so most utterances exhibit multiple superimposed prosodic constructions.
Both grammatical constructions and prosodic constructions typically occur in combination. However, for grammatical constructions the alignment and licensing constraints can be very strict, at least in the written language, as seen by the popularity of unification-based modeling in accounts of how the constructions of a sentence interrelate (see Chapters 2 and 10). In contrast, the combination of prosodic constructions appears to be more flexible, with the superimposed constructions not necessarily much constraining each other. This is possible due to the multi-dimensional nature of prosody: For example, a syllable can be simultaneously flat, loud, breathy, nasal, and many other things, and these various properties can be governed by different constructions.
13.8 A Prosodic Construction Can Inherit from Other Constructions
Consider again the Awww-of-Cute Construction. While it has its own meaning, it shares aspects of that meaning with other constructions, and these seem to be inherited from more abstract form–function mappings. One of these is the general use in English of creaky voice for indicating distance of some sort. This meaning element is also present in Awww-of-Cute, in that one can appropriately use awww only for a baby at some physical and social distance. The Awww-of-Cute Construction also inherits from the general mapping between nasal voice and appealing to shared knowledge: It is appropriate when others are around and when you assume that your evaluation of the baby is generally shared.
Constructions may also have ‘sisters’ in the network organizations of constructions (for details on networks and inheritance, see Chapter 9). For example, the Awww-of-Cute Construction shares much with the awww of spectators at a missed goal, including most of the prosodic features except the final pitch hump, and some elements of the function, namely, conveying a feeling that is deep, shared, evaluative, and triggered by a visual sensation, and indicating the intent only to keep watching.
Inheritance relations can be fairly complex. Consider the I’m Good Construction, used to politely decline an offer, as when responding to more cake? with I’m good. This inherits from three other constructions: the Late Peak Construction, the Minor Third Construction, and the Positive Assessment Construction. The inherited functions are, respectively, expressing politeness, cueing a next action (whatever comes next in the dessert-sharing ritual), and showing appreciation for the offer. The inherited prosodic components include, respectively, a late peak, a pitch downstep, and harmonicity. The I’m Good Construction appeared, at least in El Paso, only about ten years ago and may be an example of a creative lexico-prosodic combination by one speaker that became a conventionalized part of American English.
Thus both grammatical constructions and prosodic constructions can inherit both form and function from other constructions.
13.9 A Prosodic Construction Can Be Present to a Greater or Lesser Extent
Consider again the Positive Assessment Construction, for example as used with good job to praise a student. When the praise is stronger, the prosodic components of the construction are more strongly present. This is confirmed by experiment: When stimuli were created with the pitch height and duration components progressively increased, the samples with the stronger prosody were judged to sound more positive (Ward & Jodoin 2019). People are very sensitive to such shades of meaning: Most subjects were able to distinguish eight levels of prosodic positivity. Similar results were found with stimuli manipulated to match the prosody of the Contrast Construction more or less closely (Kurumada et al. 2012). While some approaches to prosody model it in terms of symbols, every prosodic construction examined so far seems to be essentially a gradient phenomenon.
For constructions with many components, such as the Minor Third Construction, speakers may omit or modify a few components, such as the lengthening, the harmonicity, or the height relative to the speaker’s pitch range. Such variant productions generally convey the same meaning, but more weakly (Ward Reference Ward2019).
Timings may also vary from the prototype. For example, consider again the Backchanneling Construction. While this, like other turn-taking constructions (Levinson & Torreira 2015), seems to have a prototypical timing for its components, roughly as depicted in Figure 13.6, this varies from instance to instance. For example, the position of the backchannel may vary. While it most typically occurs in the clear, during a pause by the other speaker, it may overlap the previous utterance or overlap the continuation. The other speaker’s behavior may also vary, making a long pause, a short one, or none at all. These variations relate to many factors, including the listener’s cognitive processing rate and the precise communicative intents of the speaker and listener. In cases where the deviation is moderate, the construction’s function may still be served, although more weakly. In other cases, where the timing is far off, or the backchannel or the continuation is missing, or the prosody is infelicitous, the behavior may be seen as resembling this construction, but too far from the prototype to really instantiate it at all.
In general, it seems that occurrences which more closely match the prototype for a construction tend to more strongly convey the meaning of that construction, although other factors are also involved (Cangemi & Niebuhr 2018).
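The idea that closeness to the prototype governs strength can be expressed as a toy degree-of-match score. The sketch below assumes that each component of a construction can be rated on a 0 to 1 scale, that the prototype has every component at full strength, and that missing components count as 0. The function names, the equal weighting, and the 0.5 cutoff for counting a production as an instance at all are hypothetical choices made only for illustration; the component list for the Minor Third Construction follows the features listed in Section 13.2.

```python
# Hypothetical prototype: every component at full strength (1.0).
MINOR_THIRD_PROTOTYPE = {
    "downstep": 1.0,
    "loudness": 1.0,
    "pitch flatness": 1.0,
    "lengthening": 1.0,
    "harmonicity": 1.0,
    "high in range": 1.0,
}


def degree_of_match(observed, prototype):
    """Toy gradient score: 1.0 means a match on every component.

    Each component contributes equally; absent components count as 0.0.
    """
    if not prototype:
        raise ValueError("prototype must have at least one component")
    deviations = [abs(target - observed.get(name, 0.0)) for name, target in prototype.items()]
    return max(0.0, 1.0 - sum(deviations) / len(deviations))


def instantiates(observed, prototype, threshold=0.5):
    """Hypothetical cutoff below which a behavior merely resembles the construction."""
    return degree_of_match(observed, prototype) >= threshold


full = dict(MINOR_THIRD_PROTOTYPE)
# A variant that omits lengthening and harmonicity.
reduced = {k: v for k, v in full.items() if k not in ("lengthening", "harmonicity")}
downstep_only = {"downstep": 1.0}

for label, production in [("full", full), ("reduced", reduced), ("downstep only", downstep_only)]:
    score = degree_of_match(production, MINOR_THIRD_PROTOTYPE)
    print(f"{label}: {score:.2f}, instance: {instantiates(production, MINOR_THIRD_PROTOTYPE)}")
```

An equal-weight average is only one possible choice; as noted above, factors other than prototype closeness also affect how strongly a meaning is conveyed.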
In any given sentence, a grammatical construction is either present or absent, but a prosodic construction can be present to a greater or lesser extent.
13.10 Summary
Figure 13.7 summarizes the key elements of the notion of prosodic construction. Overall it can be seen as a close analog of the notion of grammatical construction. To recap the similarities and differences:
a) While the components of grammatical constructions are words and the like (morphemes, words, word classes, larger morphological or syntactic constituents), for prosodic constructions the components are prosodic features.
b) While in grammatical constructions the temporal relations among components are usually just sequencing, in prosodic constructions the temporal configurations can be complex.
c) Both grammatical constructions and prosodic constructions are direct form–function mappings. However, it seems that grammatical constructions more often serve semantic functions, whereas prosodic constructions more often serve pragmatic and interactional functions.
d) Both grammatical constructions and prosodic constructions have associated contexts of use.
e) While any grammatical construction is generally either present or absent in a text, a prosodic construction may be present to a greater or lesser degree.
f) Both grammatical constructions and prosodic constructions usually appear in combination with other constructions. However, whereas the basic combining principle for grammatical constructions may be some form of unification, for prosodic constructions it is superposition.
g) Both grammatical constructions and prosodic constructions may inherit properties, of both form and function, from other constructions.
h) Both grammatical constructions and prosodic constructions may involve contributions by both speakers, but so far, the details of this have been worked out only for prosodic constructions.
Overall the two notions are highly compatible: Prosodic constructions and grammatical constructions are in essence the same. The added possibilities and complexities of prosodic constructions are easy to understand as due to the nature of spoken dialogue versus text. These include not only the frequently discussed factors that relate to the nature of interacting with others and to the nature of actions in time (e.g., Fried & Östman 2005; Brône & Zima 2014; also Chapters 12 and 14, this volume), but also factors relating to the nature of sound as a medium.

Figure 13.7 Essential properties of prosodic constructions
At the same time, properly handling prosody requires more than just adding a slot to some standard formalism to mention ‘prosodic correlates’. As we have seen, accurate modeling of prosodic constructions, and presumably also of prosody-involving grammatical constructions, is more complicated.
13.11 Methodological Implications
The findings of work on prosodic constructions are relevant for at least two research programs.
The first research program is that of improving the accuracy of descriptions of (mostly) grammatical constructions (e.g., Gras & Elvira-Garcia 2021). For this, the take-home lesson is the diversity of prosodic features involved in constructions and, as a corollary, the frequent inadequacy of phonological descriptions. For example, while it is a reasonable first approximation to note that the Let Alone Construction may involve marking some words as ‘prosodically prominent’, as in I barely got up in time to EAT LUNCH, let alone COOK BREAKFAST (Fillmore et al. 1988), the reality is likely more complex. Today we have resources that can help (for example a trivial search for let alone on Youglish.com yields 5429 spoken examples) and recent years have seen many insightful discussions of the prosody of grammatical constructions (such as Poldvere & Paradis 2020; Lehmann in press), but accurate characterization is still difficult. In particular, we as researchers are hampered by our perceptual limitations. While it is easy to swiftly recognize and react to prosodically conveyed information without conscious effort when engaged in dialogue, to explicitly perceive and discuss prosody is difficult. This is true even for the most salient prosodic features (notably pitch) and the most convenient descriptive terms (stress, prominence, rise, fall, L*H, and the like). Fortunately, there are now tools that can aid perception and also resources to help non-phonetician researchers learn how to more comprehensively and more accurately describe prosody, even the complex forms common in real dialogue (e.g., Szczepek Reed 2010; Benus 2021).
At the same time, Construction Grammar researchers do not always need to start from scratch, as the prosodic components of many constructions may already have been inventoried (Ward 2019). For example, it is likely that the prosody of the Let Alone Construction is largely inherited from two prosodic constructions: the Contrast Construction and the Bipartite Construction.
The second research program is that of categorizing the prosody of various pragmatic functions. While so far few researchers in prosody or pragmatics have chosen to describe their findings in terms of constructions, there are advantages to doing so. Working within this framework imposes no specific constraints, yet guides researchers to study and describe meaningful prosody in ways that have previously been very productive. In particular, adopting this framework can liberate researchers from the constraints that come from respecting postulated intermediate phonological layers, and instead allow them to look for direct form–function mappings; moreover, it can encourage researchers to transcend the limits of mere correlational work – such as itemizing the individual prosodic features that correlate with anger, irony, and so on – to seek, instead, the actual patterns involved.
13.12 Potential Applications
Today some speech synthesizers can take any sentence from any Wikipedia article and render it more intelligibly and pleasingly than most human speakers can. Yet none can synthesize sentences that convincingly convey pragmatic intents (Marge et al. 2022). People have become used to this: Unexpressive voices have become the norm for spoken dialogue systems, but it does not have to be this way, and users would almost certainly prefer systems that are more communicative. The spectacular improvements in speech technology over the past decades have been largely due to two factors: machine learning using large datasets, and the abandonment of mediating symbolic representations in favor of direct mappings from the speech signal to the categories of interest (Shriberg & Stolcke 2004). Prosodic constructions represent direct form–meaning mappings and as such are potentially very compatible with direct, learning-based approaches. They may thus provide an avenue for adding pragmatic competence to speech synthesizers.
Today many people suffer in social interactions due to prosody-related communicative impairments. For learners of foreign languages, constructions hold great promise for language teaching. So far this has been explored mostly for the grammatical aspects of language (Boas 2022; see also Chapter 23 in this volume). Speech raises new challenges (Gilquin 2022), where construction-based approaches may also be useful, especially for prosody, which for many language learners is a major challenge. Current teaching methods can be problematic, as they may, for example, require learners to memorize arbitrary partial descriptions, such as ‘high rising’, then to memorize how these are supposed to sound, and then to memorize the meanings they are supposed to associate with them. My own experiences as a guest instructor for learners of English suggest that lessons organized around prosodic constructions can be effective. These can be grounded in highly memorable examples that illustrate specific things native speakers do in specific situations, and augmented with examples illustrating the range of typical uses, some concise explanation, and lots of pairwork and feedback. With such scaffolding, learners may be able to readily learn to recognize and use constructions. Similarly, teaching in terms of constructions may help people become more aware of how prosody varies across dialects and social groups and thereby help reduce unnecessary misunderstandings. Another population that could benefit is native speakers whose mastery of prosody is incomplete or inadequate for some purposes. Nowadays, people wanting to be more charismatic, or people with autism wishing to overcome social handicaps, are often coached to change superficial prosodic properties, for example, their average pitch height, pitch range, or volume. However, a focus on prosodic constructions, the prosodic elements that are actually the most communicatively relevant, may enable better assessment of skills and more meaningful interventions.
13.13 Research Questions
While the prospects for applications are bright, complete success will require more work on basic questions of prosody representation and processing, including the following questions about constructions:
To what extent are prosody and grammar independent domains? There are certainly times when prosody conveys meaning by itself. For example, the Minor Third Construction can be effective with no lexical content at all, as when produced on unh-uh to warn off a toddler reaching for the cookie jar. Another type of independence is seen when the prosody and the words convey contradictory meanings, for example in sarcasm. Yet strong claims about independence may be overstating the case (Imo & Lanwer 2020).
What does the network of constructions actually look like? The complete picture must reflect the ways in which constructions inherit not only elements of grammatical form but also prosodic form, as well as modeling relations with sister and competitor constructions in both domains.
How are the prosodic aspects of constructions best described? This chapter has presented the prosody of constructions by listing the features involved and sometimes their temporal extents. While such descriptions suffice for evoking the typical prosody for human readers, they are selective and imprecise. Fully quantitative, rich descriptions (Ward 2019) are probably more accurate and more useful for many applications, but they sacrifice readability. Further investigation and development are needed.
What are the processes by which constructions align with each other? We know some things about how prosodic constructions may warp and adjust to align and combine with other constructions, either prosodic or grammatical (Torreira & Grice 2018; Vigario et al. 2019), but we need predictive models of these processes.
How do the prosodic and grammatical aspects of constructions actually relate? We can conclude that the notions of prosodic construction and grammatical construction are overall quite compatible. However, so far, every published description of a construction has focused on either the grammatical domain or the prosodic domain, including at best a few observations regarding the other. As a priority, we need a full description of at least one construction in its entirety. Only after a few such studies will we be ready to work towards a unified understanding, through theory development and comprehensive modeling (Ziem 2017). That will, in turn, enable us to realize the full potential of constructions for both research and applications.
14.1 Introduction
Dingemanse (2017: 195) defines ‘marginalia’ as “typologically unexceptional phenomena that many linguists think can be ignored without harm to linguistic inquiry.” He applies this concept to the analysis of ideophones (like kibikibi ‘energetic’ in Japanese) and interjections (like Ouch! in English), showing that both share a certain degree of syntactic independence (hence their common portrayal as marginal for grammatical theory) but arguing that these categories still shed light on central aspects of the design of human language:
Ideophones challenge us to take a fresh look at language and consider how it is that our communication system combines multiple modes of representation. Interjections challenge us to extend linguistic inquiry beyond sentence level and remind us that language is social-interactive at core.
The concept of marginalia can be easily related to one of the central tenets of Construction Grammar: to provide a full account of the grammar-lexicon of languages, thus challenging presumed distinctions between core and periphery in grammar. In this sense, construction grammarians have often given special attention to irregular and idiomatic constructions:
Our reasons for concerning ourselves with otherwise neglected domains of grammar are not so that we can be left alone, by claiming territory that nobody else wants, but specifically because we believe that insights into the mechanics of the grammar as a whole can be brought out most clearly by the work of factoring out the constituent elements of the most complex constructions.
One phenomenon that has been neglected until recently in descriptive and theoretical grammatical analyses is insubordination, the main clause use of subordination markers (Evans 2007), such as an if-clause used as a polite request in English (If you could open the door, please). Insubordination is by no means an exceptional phenomenon. Since the seminal work by Evans (2007), which covers thirty-seven languages representing twelve linguistic families, a large body of literature has been devoted to describing and analyzing independent constructions with subordination markers in a wide array of languages belonging to unrelated linguistic families. Scholars have been very active in Romance (Debaisieux 2006; Gras 2011, 2013, 2016; Patard 2014; Sansiñena 2015; Sansiñena et al. 2015a, 2015b; Gras & Sansiñena 2015, 2017, 2021; Lombardi Vallauri 2016; Hirata-Vale et al. 2017; Alves & Hirata-Vale 2021) and Germanic languages (Verstraete et al. 2012; Brinton 2014; Wide 2014; D’Hertefelt 2018), but there are studies on insubordinate constructions in many non-Indo-European languages as well, among them Athabaskan and Eskimoan (Mithun 2008; Cable 2011) and Altaic (Robbeets 2009).
Generally, studies tend to concentrate on specific constructions (complement or conditional clauses, non-finite verb forms, or particles), whether in a single language or in a sample of related languages. Since independent insubordinate constructions tend to be highly polyfunctional, many of these studies try to determine whether the different meanings are better represented as separate constructions or as instances of a single schematic construction (e.g., Verstraete et al. 2012; D’Hertefelt & Verstraete 2014; D’Hertefelt 2018 on independent complement and conditional constructions in Germanic languages; Gras 2013, 2016; Gras & Sansiñena 2015, 2017, 2021; Sansiñena 2015 on independent complement constructions in Spanish).
This chapter approaches insubordination from a different angle. Since most insubordinate constructions tend to occur in informal conversation, they are an interesting case for analyzing the interconnections between discourse, grammar, and prosody, given that grammar in interaction inherently has a phonetic shape and must also be related to the turn structure of the conversational flow (see also Chapter 12). A relevant case study in this respect is the contrastive insubordinate conditional construction (CICC) (Montolío 1999; Schwenter 2016). This construction, exemplified in (1), typically occurs as the dispreferred second part of an adjacency pair, stating a reason why the speaker considers the previous turn to be inappropriate. In (1), by stating that the addressee is wearing their glasses, speaker B presents the addressee’s previous question as pragmatically inappropriate.Footnote 1
(1)
A: ¿Has visto mis gafas?
   ‘Have you seen my glasses?’
B: ¡Si las llevas puestas!
   if them wear.2sg.prs.ind put
   ‘But you are wearing them!’
The CICC clearly qualifies as a construction in that it pairs an idiosyncratic form (a self-standing conditional protasis) with a non-compositional meaning (contrast with the previous context). It is an interesting case study for two main reasons. First, as already pointed out, the construction typically instantiates the dispreferred second part of an adjacency pair (Montolío 1999), but it remains unclear whether this is a formal restriction, equivalent to morphosyntactic constraints. Second, previous studies have pointed to the existence of prosodic restrictions (Montolío 1999; Schwenter 2016), although no detailed prosodic research has been conducted.
The integration of prosodic and discursive information in the description of grammatical constructions does not pose a theoretical challenge in Construction Grammar since most constructional approaches define grammatical constructions as pairings of any aspect of phonological, morphological, and syntactic form and any of semantic, pragmatic, and discursive meaning. However, the fact that a specific construct licensed by a construction must have all its features of form and meaning specified does not mean that these must be stated at the level of the construction. As a taxonomic theory, Construction Grammar allows information to be stored at different nodes of the constructional network, enabling some features to be inherited from more abstract constructions, on the one hand, and some features to be specified only in more concrete constructions, on the other. The goal of this chapter is to reflect on the integration of prosody and discourse in a constructional model that uses constructions and networks through the analysis of contrastive insubordinate conditional constructions in Spanish.
The chapter is organized as follows. Section 14.2 summarizes the main features of insubordination as a phenomenon, including its main formal and functional properties in Spanish, and offers a constructional analysis of the CICC in Spanish, providing evidence in favor of its constructional status. Section 14.3 presents the discursive analysis of the construction, while Section 14.4 describes its intonational analysis. Section 14.5 presents the conclusions.
14.2 Insubordination in Spanish
14.2.1 Insubordination
Insubordination “can be defined diachronically as the recruitment of main clause structures from subordinate structures, or synchronically as the independent use of constructions exhibiting prima facie characteristics of subordinate clauses” (Evans & Watanabe 2016: 2). From a synchronic point of view, a construction is an instance of insubordination if it fulfills two criteria: (i) the construction shows features that are typical of subordinate clauses in the language at hand (non-finite verb forms, complementizers, subordinating particles, etc.), and (ii) it is not a syntactic constituent of another syntactic structure uttered by the same speaker or another speaker, in the same or previous turns; that is, it is not a case of regular ellipsis. In this sense the CICC presented in example (1) qualifies as an instance of an insubordinate construction since (i) it bears subordinating marking (the conditional conjunction si ‘if’) and (ii) it cannot be parsed as a conditional protasis of the interrogative sentence uttered by the previous speaker (*¿Has visto mis gafas si las llevas puestas? lit. ‘Have you seen my glasses if you’re wearing them?’).
In terms of form, Spanish insubordinate constructions fall into two patterns: non-finite verb forms, either infinitives (2) or gerunds (3), and subordinating conjunctions followed by a verb in the indicative (4) or subjunctive mood (5).
(2)
Infinitive with retrospective imperative interpretation
A: Tengo hambre.
   ‘I’m hungry.’
B: Pues haber comido antes.
   dp have eaten before
   ‘You should have eaten first.’
(3)
Gerund with imperative interpretation
Andando, que es tarde
walk.grd comp be.3sg.prs.ind late
‘Let’s walk, it’s late.’
(5)
<que ‘that’ + subjunctive> with optative interpretation
¡Buenas noches, que duerm-as bien!
good night comp sleep-2sg.prs.sbjv well
‘Good night, sleep tight!’
The two formal patterns differ in frequency. A study of insubordinate constructions in a corpus of informal conversations among adult speakers of Peninsular Spanish (Briz & Val.Es.Co. 2002) showed that subordinating conjunctions are much more frequent than non-finite verb forms (Table 14.1): Non-finite verb forms constitute only 4.61 percent of all insubordinate constructions in the corpus, and no gerunds were found. As for the subordinating conjunctions, even though the literature identifies independent uses of several subordinating conjunctions, que ‘that’ and si ‘if’ account for most of the cases found in the corpus (83.52 percent).
Table 14.1 Distribution of insubordinate constructions in the Val.Es.Co. corpus (Briz & Val.Es.Co. 2002)
From a semantic-pragmatic perspective, Evans (2007) points out that insubordinate constructions tend to express the same meanings across languages. Considering data from twelve language families, Evans (2007) builds a functional cross-linguistic typology of insubordinate constructions, which consists of three macro functions: (i) They express indirectness and interpersonal control (especially directives, but also permissives, warnings, and threats), like the imperative infinitives (2) and gerunds (3); (ii) They express various kinds of modal framing (especially epistemic and deontic, but also evaluative), like the complement clauses with optative interpretation (5); (iii) They signal a high degree of presupposed material (focus, contrast, reiteration, among others), such as the complement clauses with quotative interpretation (4).
This typology has received different elaborations in further research in Romance and Germanic languages. On the one hand, it has been noted that there is substantial overlap between functions (i) (indirectness) and (ii) (modal framing), since insubordinate directives could fall in either category (Gras 2011, 2016; D’Hertefelt 2018). Recent work has proposed that these uses of insubordination might be better explained as minor sentence type constructions (Siemund 2018; Gras 2020; Pérez et al. 2021): formally marked alternatives of major sentence types, which normally add an expressive value to their major counterparts.Footnote 2 On the other hand, function (iii) has been rephrased as ‘discourse insubordination’ (Gras 2011, 2016; Verstraete et al. 2012) because patterns marking this function tend to express relations between the insubordinate clause and other parts of the discourse. The CICC would be an instance of discourse insubordination, given that it expresses some sort of contrast with previous discourse.
The diversity of form, meaning, and function these constructions show leads to the question of whether we are dealing with a single unified phenomenon or with distinct phenomena that share a formal feature. Part of the literature argues for distinguishing insubordination from the discourse uses of subordinate clauses, which have been referred to as ‘extension of dependency’ (Mithun 2008, 2019) or ‘dependency shift’ (D’Hertefelt & Verstraete 2014; D’Hertefelt 2018).Footnote 3 An example of dependency shift is the use of complement clauses to elaborate on a previous turn, as in the following example:
(6)
Elaborative complement structure in Swedish
A: om vi skulle fråga våra eh förstaklassare här om dom vill ha betyg eller inte skulle dom inte fatta vad det handlade om vet inte hur vad betyg eller vad det e (…) så det ju nånting som / andra lägger på
B: ja
A: att det det kommer ju sen atomatist i skolan att man får betyg
   comp it it come.prs part afterwards automatically in school.def comp one get.prs grades
   å då kommer den här konkurrensen ännu mera in tror jag va
A: ‘if we were to ask our first-graders here if they want to have a diploma or not they wouldn’t understand what it was about don’t know how what grades or what it is (…) so it’s something that / others impose’
B: ‘yes’
A: ‘that it it then comes automatically in school that one gets grades and then this competition starts even more I think right’
(D’Hertefelt & Verstraete 2014: 92)
These uses express a discursive meaning, are discursively dependent on a previous utterance or turn, and do not allow for the reconstruction of a main clause. According to the distinction between insubordination and dependency shift, the two phenomena differ in their meaning/function, their degree of (in)dependence, and their plausible diachronic development, as summarized in Table 14.2.
Table 14.2 Differences between insubordination and extension of dependency
| | Insubordination | Extension of dependency |
|---|---|---|
| Meaning/function | Modal, illocutionary | Discursive |
| Dependency | Syntactic and discursive independence | Syntactic independence, but discursive dependency |
| Plausible diachronic development | Ellipsis of a main clause that gives rise to the modal/interactional meaning of the insubordinate construction | Extension of dependency from a clausal to a discursive domain (increase of scope) |
Even though this distinction has been useful for explaining the differences between superficially similar constructions in several languages (see references above), the CICC challenges it in several ways. Regarding its meaning/function, CICC has a discursive meaning, which consists in signaling a contrast between the proposition and some background content. In example (1), asserting that the addressee is wearing their glasses questions the appropriateness of the question (¿Has visto mis gafas? ‘Have you seen my glasses?’). However, there is also a modal-illocutionary dimension of the construction since it carries a strong assertion (see Section 14.2.2).
As for its degree of independence, CICC in its prototypical use as a dispreferred response is syntactically independent but discursively bound to a previous turn by the addressee. However, in some cases, its instances can be uttered to express the speaker’s surprise at a situation that challenges their previous assumptions. Example (7) could be said by a speaker who, after looking for their glasses, realizes that they are wearing them. In this context, the construction expresses a mirative meaning and is not bound to previous discourse material but to the situational context. A partially equivalent pattern in English is the mostly substantive construction “If it isn’t X” (e.g., Well, if it isn’t my old friend Tom!), which is “used to express surprise about meeting someone when it is not expected” (Merriam-Webster def. ‘if it isn’t’).
(7)
¡Si las llev-o puestas!
if them wear-1sg.prs.ind put
‘But I’m wearing them!’
As for its plausible diachronic development, it has been suggested that CICC derives from epistemic conditionals (‘If you are wearing your glasses, why are you asking for them?’). As Montolío (2001: 201) argues, “only the protasis appears in this construction, the apodosis having been systematically omitted or ‘silenced’, as it were, given its invariant nature: why have you said what you have just said?”. This explanation is compatible with Evans’ (2007) hypothesis that insubordination results from the conventionalization of main clause ellipsis.
In sum, CICC challenges the distinction between insubordination and extension of dependency in several respects: (i) It combines modal-illocutionary and discursive meanings; (ii) It shows discourse-boundedness, while still being usable independently of prior discourse; (iii) It can be accounted for as the result of main clause ellipsis. This situation is not exclusive to this construction. In fact, many Spanish insubordinate patterns combine a modal-illocutionary component with discursive functions. On the one hand, several minor imperative sentence type constructions show discourse restrictions in addition to their illocutionary value. Consider, for instance, infinitives with retrospective imperative interpretation: They have the illocutionary force of a reproach (‘You should have done it’), while they serve as dispreferred responses in interaction, therefore fulfilling a discourse-structuring function (reaction). On the other hand, some discourse insubordinate patterns allow the reconstruction of a main clause. This is the case of complement clauses with quotative interpretation, which always allow the reconstruction of a verb of saying, as the modified version of example (4) in (4′) shows. Therefore, the CICC will be treated in this chapter as an instance of discourse insubordination and not as a case of extension of dependency.
(4′)
<que ‘that’ + indicative> with quotative interpretation
A: Voy a cenar mañana
   ‘I’m coming for dinner tomorrow.’
B: ¿Qué?
   ‘what?’
A: Te digo que voy a cenar mañana
   you say.1sg.prs.ind comp come.1sg.prs.ind to dinner tomorrow
   ‘I’m telling you that I’m coming for dinner tomorrow.’
14.2.2 Contrastive Insubordinate Conditionals in Spanish
Contrastive insubordinate conditionals differ from the most frequent uses of insubordinate conditionals discussed in the literature (Kaltenböck 2016; D’Hertefelt 2018; Lastres-López 2020), namely polite requests (8) and wishes (9).Footnote 4 Even though insubordinate conditional structures can express polite requests and wishes in Spanish, these uses are much less frequent than the CICC, at least in informal conversation. Indeed, all instances of independent insubordinate conditionals in the Val.Es.Co. corpus have a contrastive interpretation (Gras 2011). It should also be noted that the CICC is not exclusive to Spanish. Equivalent constructions can be found in several Romance languages: Catalan (Salvador 2002), Portuguese (Alves & Hirata-Vale 2021), and Italian (Lombardi Vallauri 2016).
(8)
Conditional with polite request interpretation in Spanish
Si pudieras abrir la puerta, por favor.
if can.2sg.pst.sbj open the door please
‘If you could open the door, please.’
(9)
Conditional with wish interpretation in Spanish
¡Si tuviera 10 años menos!
if have.1sg.pst.sbj 10 years less
‘If I were ten years younger!’
CICC should not be confused with suspended conditional constructions (Schwenter 2016), that is, conditional protases whose main clause must be inferentially reconstructed by the addressee, as in (10):
(10)
A: ¿Vamos a la playa?
   ‘Let’s go to the beach!’
B2: Si quieres …
    if want.2sg.pres.ind
    ‘If you want to …’
In example (10), si functions as a conditional marker that introduces a protasis, whose apodosis has been left open for reconstruction: si quieres, vamos a la playa ‘If you want, we go to the beach’. The difference between regular bi-clausal patterns and suspended patterns, thus, has to do with the explicit or implicit character of the apodosis: explicit in the former, implicit in the latter.
CICC has idiosyncratic formal features that set it apart from other conditional constructions, whether regular bi-clausal or suspended ones (Montolío 1999; Schwenter 1998). First, it rejects subjunctive verb forms, as version B1 of example (10′) shows. Second, it rejects coordination with other clauses, as the ungrammaticality of B2 shows. Finally, it does not combine with ‘continuation rise’ intonation; instead, previous studies have suggested it has exclamative intonation (Montolío 1999; Schwenter 2016).
(10′)
A: ¿Vamos a la playa?
   ‘Let’s go to the beach!’
B1: *¡Si esté lloviendo!
    if be.3sg.prs.sbjv raining
B2: *¡Si está lloviendo y si hace frío!
    if be.3sg.prs.ind raining and if make.3sg.prs.ind cold
    *‘It’s raining and it’s cold!’
As for meaning, most authors identify a contrastive meaning (Contreras 1960; Almela 1985; Porroche 1998; Schwenter 1998; Montolío 2001; Schwenter 2016). In particular, Montolío (1999, 2001) identifies two features that capture the meaning side of the construction: (i) It expresses a basic meaning of contrast, which can affect various aspects of a previous turn (its propositional content or illocutionary force); (ii) It indicates the inappropriateness of some aspect of the addressee’s previous turn or their non-linguistic behavior, and leads to an implicit conclusion: p / if q → p is inappropriate (e.g., ‘If it’s raining and cold, then it is inappropriate to suggest going to the beach’). In addition, Montolío suggests that the CICC is discourse-placed in the sense that its instances “encode information about the context in which they appropriately occur” (Evans 1993: 325). In particular, they normally occur as the dispreferred second part of an adjacency pair, often as a turn in itself.
However, as Montolío herself acknowledges, the construction also occurs in discursive contexts other than dispreferred responses. To address this limitation, Schwenter (2016) has developed an epistemic analysis of the construction. According to him, it is useful to distinguish between the ‘codified’ meaning of the construction and a series of context-dependent (pragmatic) interpretations: “The coded, non-truth-conditional meaning of si is argued to be epistemic in nature, marking the proposition that it accompanies as one which is obviously true to the speaker” (Schwenter 2016: 22). According to Schwenter, this epistemic analysis makes it possible to explain apparently non-contrastive contexts of use, such as the introduction of a premise that justifies a previous statement, as exemplified by the rhetorical question in (11), or agreement with a preceding turn, as in (12).
(11)
Tweet about Carmen Electra, an American actress and model
¿Quién quiere conocer a @CarmenElectra?
Si está más buena que unos taquitos al pastor a las 2am
if be.3sg.prs.ind more good than some taquitos al pastor at the 2am
‘Who wants to meet @CarmenElectra? She’s yummier than some taquitos al pastor at 2am.’
(12)
Conversation between faculty members
A: Juana la han aceptado en Stanford.
   ‘Juana has been accepted at Stanford.’
B: Claro, si es muy inteligente.
   sure if be.3sg.prs.ind very intelligent
   ‘Sure, she’s very intelligent.’
However, this analysis also has two interrelated limitations. First, although the meaning of epistemic certainty is common to both the contrastive uses and the non-contrastive uses just presented in (11)–(12), Schwenter’s analysis does not indicate which discursive contexts favor the use of insubordinate conditionals. In principle, presenting a proposition as obviously true for the speaker is compatible with any discursive context, and yet the construction tends to be used in a limited set of contexts. Second, and relatedly, Schwenter’s analysis is based on the qualitative analysis of a set of examples and does not indicate how often the various interpretations or contexts occur.
To summarize, the constructional status of this pattern is clear. On the formal side, the construction exhibits properties that cannot be predicted from knowledge of conditional constructions, especially mood selection and clause combining. On the interpretive side, the construction marks an interactional contrastive and assertive meaning that cannot be attributed to any lexical item. However, the status of the prosodic and discursive information of the construction remains unclear. From a discourse-structural perspective, a corpus-based analysis of the discourse constraints of the construction is needed in order to provide an empirically grounded constructional representation. From a prosodic perspective, it has been suggested that CICC has exclamative prosody. Nevertheless, more prosodic research is needed in order to decide whether the construction accepts other prosodic patterns and whether the attested prosody is idiosyncratic (i.e., does not occur outside the construction) or inherited (i.e., exists outside the construction). To answer these questions, let us review two studies: an interactional corpus study (Section 14.3) and a prosodic study (Section 14.4).
14.3 Discourse Structure
This section presents a study whose goal is to analyze the discourse restrictions of the CICC in a corpus of informal conversations and, based on this analysis, to reflect on the most appropriate way to incorporate discursive information in a constructional approach.
14.3.1 Methodology
The interactional study is based on data from the Val.Es.Co. corpus (Briz & Val.Es.Co. 2002), which consists of informal conversations among adults from Valencia (Spain). A total of seventy-five instances were extracted using a semi-automated process designed to ensure that only occurrences of the CICC were included. Each token has been analyzed according to two types of parameters: conversational and rhetorical. For the conversational analysis, three parameters have been considered:
(1) Type of unit: The construction constitutes (i) a turn in itself, (ii) a turn constructional unit (TCU) in a complex turn, or (iii) a turn extension.
(2) Type of intervention: The construction is used as (i) an initiation (e.g., question or command), (ii) a preferred response (e.g., affirmative reaction or agreement), or (iii) a dispreferred response (e.g., a negative reaction or disagreement).
(3) Position: The construction occurs in (i) an independent position (as a turn in itself), (ii) turn-initial, and (iii) turn non-initial (preceded by a TCU).
Regarding the rhetorical parameters, two have been considered:
(4) Target: The CICC refers to (i) a previous turn, (ii) a previous utterance of the speaker, or (iii) the extralinguistic situation.
(5) Argumentative orientation: (i) counter-orientation or (ii) co-orientation.
Based on the combination of the above parameters, five functions of the construction have been proposed. Four of them had already been mentioned in the literature (rebuttal, controversial agreement, mirativity, and justification), and a fifth one has been identified here (polyphonic rebuttal). They are explained in detail in Section 14.3.2.
14.3.2 Distribution of Functions in the Corpus
Rebuttal
Following Montolío (1999), the occurrences in which the construction is used in a dispreferred response are called rebuttal. From a conversational point of view, the reply can be a turn or a TCU in a complex turn; it occurs preferably in turn-initial position, but also in non-initial position. In initial position, it introduces a premise that leads to an implicit conclusion. This is illustrated in (13), from a conversation between a boy (A) and his girlfriend (B), in which B introduces a premise (Yo no te pido más tiempo ‘I don’t ask you for more time’) that leads to the implicit conclusion ‘You shouldn’t give me more time’, which contradicts what A said in his previous turn.Footnote 5
(13)
A is thinking about ending his relationship with B (ML.84.A.1.: 157–161)
A: no yo SÉ que debería darte más tiempo↓ del que te doy
B: pero si yo no te pido más tiempo ↓
   but if I not you ask.1sg.prs.ind more time
   yo lo que te pido es que estés SEGURO
A: ‘No, I know I should give you more time than I give you’
B: ‘But I don’t ask you for more time; what I ask is that you be sure’
In a non-initial position, the use of CICC introduces a premise that justifies an explicit conclusion, as in (14), in which the propositional content (lo dijo por cachondeo ‘he was joking’) reinforces the disagreement expressed explicitly by ¡qué va! ‘no way’.
(14)
A found a watch in the street (RB.37.B.1: 48–68)
C: ¿pero él– pero él entendía↑ dee– de reLOJES↑ oo?
A: ¡QUÉ VA↓! si lo dijo por cachondeo
   no way if it say.3sg.pst by fun
C: ‘But did he know about watches or?’
A: ‘No way! He was joking’
To summarize, the construction can be used in contexts of explicit disagreement as reinforcement of a statement that codifies said disagreement (14), or else in contexts of implicit disagreement (13) in which the propositional content gives rise to a conventional implicature of a negative nature (‘something of what you have said or done is inappropriate’).
Polyphonic Rebuttal
The term polyphonic rebuttal is used to refer to utterances that contradict an assumption evoked earlier in the speaker’s own turn, as in (15). Throughout her lengthy intervention, the speaker (E) describes certain embarrassing situations she experiences when her roommates invite their boyfriends or partners to the apartment they share. The propositional content of the utterance (si yo creo que lo acepto más de puta madre ‘I think that I absolutely accept it’) contradicts an inference that can be derived from the previous statements uttered by the speaker herself: that she does not accept the behavior of her roommates.
(15)
E talks about her roommates’ behaviors (L.15.A.2.: 933–947)
porque es quee a mí me parece muy bien↑ que venga el novio de Olga y que se acueste con ella/// pero lo comprendo perfectamente si se queda la noche a dormir/ no va a dormir con él ¿no?/ lo que pasa que tú– te armen UUN CACAO to(d)a la noche que (( )) entonces/ […] SI YO CREO QUE LO ACEPTO MÁS DE PUTA MADRE
‘because the thing is that I think it’s fine that Olga’s boyfriend comes and sleeps with her, but I understand it perfectly if he sleeps over, it’s not that she’s not gonna sleep with him, no? The thing is that they make a big fuss all night that, then/ […] I think that I absolutely accept it’
From a conversational point of view, it is a TCU in non-initial position. From the rhetorical-argumentative point of view, the utterance introduces a statement that opposes a possible inference derivable from the previous discourse. Unlike regular cases of rebuttal, in which the counter-argumentation is established between statements issued by different speakers, in polyphonic rebuttals the counter-argumentation relationship is established between what the speaker affirms and what could be inferred from their previous utterances. It can be considered a monological extension of a dialogical resource.
Controversial Agreement
The term controversial agreement is used to describe cases in which the CICC is used as a preferred response to polemic issues. This is what happens in example (16), in which a young man (J) tells his aunt (P) and his mother (C) that he has finally got his driver’s license. The CICC occurs as a response to P’s previous turn: P comments that she had told him that driving was easy and J agrees by saying si es una tontería conducir ‘driving is easy’. J’s intervention expresses his agreement with P’s previous turn and simultaneously contradicts J’s own earlier belief about driving being difficult.
(16)
J had a hard time getting his driving license (G.68.B.1: 365–374)
P: ¿qué? ¿cómo va el coche ya Juan?
J: muy bien/ que lo diga la mamá→
C: ¡ay!/ está hecho un artista
J: que fuimos a la boda dee–/ bueno/ al bautizo
C: al bautizo
P: ¿yo qué te dije? verás cómo eso te vas a ir tú mismo↑ soltando↑
J: si es una tontería conducir
   if be.3sg a silly.thing driving
C: es una tontería
P: ‘How are you doing with the car, Juan?’
J: ‘Very good. Ask mom!’
C: ‘Oh! He’s such an artist!’
J: ‘We went to the wedding, well, the baptism’
C: ‘To the baptism’
P: ‘What did I tell you? You’ll see how you’ll get the hang of it yourself’
J: ‘Driving is easy’
C: ‘It’s easy’
From the conversational point of view, the construction can be used as a turn in itself or as a TCU in initial position. From the rhetorical-argumentative point of view, it introduces a statement that confirms the previous turn while canceling an assumption present in the context. Although it may seem paradoxical that the same construction is used to express both disagreement and agreement, this is a particular type of agreement: The CICC is used in situations in which the asserted content had been challenged in the preceding context.
Mirativity
Mirativity refers to the speaker’s surprised attitude towards a state of affairs perceived at the time of speaking (DeLancey 2001). The label describes situations in which a speaker expresses a counter-expectation reaction to an extralinguistic stimulus, as in (17). This excerpt comes from a conversation in which speaker A talks about a watch she found on the street. In her turn, A says that she took it to a watchmaker to find out how much it would cost to repair it. The CICC reproduces in direct speech the watchmaker’s reaction when he sees the watch.
(17)
A found a watch in the street and brought it to a watchmaker to have it assessed (RB.37.B.1, p. 227: 117–122)
y en seguida quitó la caja↑ y dice ¡vaya reloj!
y dice pues si este reloj es buenísimo
   and say.3sg.prs.ind dp if this watch be.3sg.prs.ind good.sup
‘And right away he took off the case and he said what a watch and he said this is a very good watch.’
Mirativity contexts are conversationally characterized as turns that are initiations – since they are not responses to previous turns – but at the same time they are reactions to extralinguistic stimuli. From the rhetorical-argumentative point of view, they express a contrast between the speaker’s expectations and their perception at the time of speaking.Footnote 6
Justification
The last function departs from the above cases in that the notion of contrast is not salient. In justification uses, the construction is used as a turn extension, as in (18), from a conversation between young adult male friends. Speaker A is talking about his experience as an Erasmus student at the University of Ghent.
(18)
A talks about food during his Erasmus stay in Ghent (H.38.A.1., lines 459–480)
B: de tapas ni de coña ¿no?
A: tenía una– tapas tampoco↓ ¡qué va!// plato combinao me lo hago yo/
   si tenía allí yoo una cocina///
   if have.pst.ind there I a kitchen
   mis huevos y mis cosas (LAUGHTER)
B: ‘Eating tapas, no way, right?’
A: ‘Tapas neither, no way! A combo plate, I make it myself / I had a kitchen there, my eggs and my stuff’
In this example, the CICC is used in a turn extension since it relies on the illocutionary force of the previous TCU. Through the use of CICC, the speaker introduces an argument that reinforces the immediately preceding TCU (‘I could make a combined dish because I had a kitchen there’).
Overview
The results of the corpus study allow us to propose an empirically based analysis of the discursive properties of the construction. As Table 14.3 shows, the construction displays a clear preference for the expression of rebuttal (63 percent) and, therefore, for the dispreferred response position, which confirms the intuitions of grammarians, who tend to illustrate CICC with examples from this type of context. However, a still significant proportion of cases (37 percent) are not dispreferred responses. These data can nevertheless be interpreted in another way if one takes into account that three of the remaining functions – polyphonic rebuttal, controversial agreement, and mirativity – constitute reactive interventions that also express some kind of contrast, although not between the content of two turns but between utterances in one speaker’s turn (polyphonic rebuttal), between what the speaker asserts and what could be inferred from previous turns (controversial agreement), or between what the speaker has just realized and their previous expectations (mirativity).
Table 14.3 Distribution of meanings/functions of CICC in the Val.Es.Co. corpus (Briz & Val.Es.Co. 2002)
| Meaning/function | Tokens | Percentage |
|---|---|---|
| Rebuttal | 48 | 63% |
| Justification | 10 | 13% |
| Mirativity | 7 | 9.2% |
| Polyphonic rebuttal | 6 | 7.8% |
| Controversial agreement | 5 | 6.5% |
| Total | 76 | 100% |
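As an arithmetic check on Table 14.3, the sketch below recomputes the shares from the token counts. Treating rebuttal, polyphonic rebuttal, controversial agreement, and mirativity as reactive interventions follows the discussion above; the names `COUNTS`, `REACTIVE`, `percentages`, and `share` are illustrative only. Rounded to one decimal, the counts give 7.9 and 6.6 percent for polyphonic rebuttal and controversial agreement, so the slightly lower values in the table appear to be truncated rather than rounded.

```python
# Token counts per function, as reported in Table 14.3.
COUNTS = {
    "Rebuttal": 48,
    "Justification": 10,
    "Mirativity": 7,
    "Polyphonic rebuttal": 6,
    "Controversial agreement": 5,
}

# Functions described in the text as reactive interventions expressing contrast.
REACTIVE = {"Rebuttal", "Polyphonic rebuttal", "Controversial agreement", "Mirativity"}


def percentages(counts: dict[str, int]) -> dict[str, float]:
    """Share of each function in percent, rounded to one decimal."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


def share(counts: dict[str, int], subset: set[str]) -> float:
    """Combined share (in percent, one decimal) of a subset of functions."""
    total = sum(counts.values())
    return round(100 * sum(counts[name] for name in subset) / total, 1)


if __name__ == "__main__":
    print("total tokens:", sum(COUNTS.values()))
    for name, pct in percentages(COUNTS).items():
        print(f"{name}: {pct}%")
    print("dispreferred responses:", share(COUNTS, {"Rebuttal"}))
    print("all reactive interventions:", share(COUNTS, REACTIVE))
```

Computed this way, dispreferred responses (rebuttal) account for about 63.2 percent and all reactive interventions for about 86.8 percent of the 76 tokens.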
14.3.3 Constructional Representation
The constructionist approach allows discourse information to be represented as features of a grammatical construction: “A construction is a set of formal conditions on morphosyntax, semantic interpretation, pragmatic function, and phonology, that jointly characterize or license certain classes of linguistic objects” (Fillmore 1999: 113). However, as the corpus study shows, the discursive position of the construction cannot be considered a formal restriction comparable to the morphological and syntactic restrictions described in Section 14.2. Those restrictions are formal conditions on well-formedness: Selecting a subjunctive verb form (lleves), for instance, yields an unacceptable result (*¡Si las lleves puestas! ‘If you are wearing them!’). By contrast, using CICC outside a dispreferred response does not lead to ungrammaticality but to a different pragmatic function. At the same time, the fact that discourse position is not a formal restriction does not mean that it cannot be part of the representation of the construction. On the one hand, the construction cannot occur in just any context: The analysis presented identifies five discursive contexts. On the other hand, these contexts are not equally frequent: Dispreferred responses (rebuttal) account for 63 percent of the occurrences in the corpus, a figure that rises to 86.5 percent if all reactive interventions are included.
Construction Grammar offers the tools to incorporate discursive information in the representation of a construction. On the one hand, discursive information can be represented as a series of pragmatic attributes and values (e.g., Fried & Östman 2005; Linell 2009; Nikiforidou et al. 2014), like the conversational and rhetorical features used in the study reported above. On the other hand, as already mentioned, Construction Grammar allows information to be stored at different nodes of the ‘constructicon’, so that some features can be specified in more concrete constructions. To capture the uneven distribution of contexts of the CICC, with rebuttal as the most frequent function, Langacker’s (1987) approach, which distinguishes a ‘schema’ from a series of ‘instances’ and ‘extensions’, is especially relevant. This conception, which has been applied in Cognitive Construction Grammar (Goldberg 2006), not only allows information to be represented at various levels (like all constructionist approaches) but also accounts for prototype effects. The schema represents the information shared by all members of the category. The CICC constructional schema should include the following features:
Form: <si + indicative>
Meaning: assertion and contrast.
At the same time, the distinction of instances and extensions makes it possible to account for the functional versatility of the construction as well as the relative correspondence between functions and discourse positions. Likewise, this distinction makes it possible to capture the existence of a prototypical instance – the turn that occurs as a dispreferred response – as well as a series of extensions, which move away, to a greater or lesser extent, from the prototype and maintain relations of family resemblance among them. Both the prototypical instance and the extensions have their own discursive and rhetorical features:
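(1) Rebuttal:
Function: contradict, explicitly or implicitly, the interlocutor’s previous turn.
Discourse position: turn or TCU (preferably initial), dispreferred response.
Target: a previous turn by an interlocutor.
Rhetorical relationship: counter-orientation.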
(2) Polyphonic rebuttal:
Function: contradict an assumption derivable from the previous content of the turn.
Discourse position: non-initial TCU.
Target: a previous TCU by the speaker.
Rhetorical relationship: counter-orientation.
(3) Controversial agreement:
Function: agree with the interlocutor while canceling an assumption derivable from previous turns.
Discourse position: initial TCU or turn in itself, preferred response.
Target: a previous turn by an interlocutor.
Rhetorical relationship: co-orientation with the previous turn and counter-orientation with previous turns or shared knowledge.
(4) Mirativity:
Function: express surprise about an extralinguistic situation.
Discourse position: initial TCU or turn in itself.
Target: the extralinguistic situation.
Rhetorical relationship: counter-orientation.
(5) Justification:
Function: reinforce or justify the speaker’s immediately preceding TCU.
Discourse position: non-initial, turn extension.
Target: the speaker’s own preceding TCU.
Rhetorical relationship: co-orientation.
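To make the schema/instance distinction concrete, here is a minimal Python sketch, assuming a simple attribute-value encoding; the class and field names (`Schema`, `Instance`, `features`, `position`, `relation`) are hypothetical and do not belong to any Construction Grammar formalism. The schema carries only what all uses share (<si + indicative>, assertion and contrast), whereas discourse position and rhetorical relationship are stored on the instances, with rebuttal flagged as the prototype.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Schema:
    """Features shared by every member of the category."""
    form: str
    meaning: str


@dataclass
class Instance:
    """A more specific node: inherits the schema and adds discourse-level features."""
    label: str
    schema: Schema
    features: dict = field(default_factory=dict)
    prototypical: bool = False

    def describe(self) -> dict:
        # Schema features are inherited unchanged; instance features are added on top.
        return {"form": self.schema.form, "meaning": self.schema.meaning, **self.features}


CICC = Schema(form="si + indicative", meaning="assertion and contrast")

INSTANCES = [
    Instance("rebuttal", CICC, {"position": "turn or TCU, dispreferred response",
                                "relation": "counter-orientation"}, prototypical=True),
    Instance("polyphonic rebuttal", CICC, {"position": "non-initial TCU",
                                           "relation": "counter-orientation"}),
    Instance("controversial agreement", CICC, {"position": "initial TCU or turn, preferred response",
                                               "relation": "co- and counter-orientation"}),
    Instance("mirativity", CICC, {"position": "initial TCU or turn in itself",
                                  "relation": "counter-orientation"}),
    Instance("justification", CICC, {"position": "non-initial, turn extension",
                                     "relation": "co-orientation"}),
]


def prototype(instances):
    """Return the instance flagged as the category's central member."""
    return next(i for i in instances if i.prototypical)


def with_relation(instances, relation):
    """Labels of the instances that have a given rhetorical relationship."""
    return [i.label for i in instances if i.features.get("relation") == relation]
```

The design choice here is that nothing discourse-related is stored on `Schema`, mirroring the claim that discourse information belongs at the level of instances and extensions.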
The distinction between schemas and instances also allows us to account for geographic variation as well as polyfunctionality. As Levshina (2012) argues, regional varieties of the same language can differ in terms of the organization of the schemas and instances/extensions (subschemas in her terminology) of a construction. In the case of CICC, it has been noted that the mirative function is not available in Argentinian Spanish (Rodríguez Ramalle 2011), even though the construction can be used in its prototypical function (rebuttal). As for polyfunctionality, other discourse-sensitive patterns, such as discourse markers, are well known for their ability to occur in different contexts, expressing diverse meanings. A model that distinguishes schemas and instances/extensions can account for these phenomena, modeling the different functions/contexts either as separate constructions or as instances of an abstract schema (Fried & Östman 2005; Nikiforidou et al. 2014).
14.4 Prosody
The goal of this section is to illustrate an empirical prosodic analysis of CICC and to reflect on the most appropriate way to incorporate prosodic information in constructional representations.Footnote 7
14.4.1 Methodology
The prosodic analysis is based on elicited data, using a Discourse Completion Task in which the participants were given discourse contexts resembling those discussed in the previous section, together with the lexical material they should use (Vanrell et al. 2018). In order to keep the experiment controlled while still accounting for discourse-structural variation, the two most extreme contexts were included in the study: the prototypical dispreferred response expressing rebuttal (19) and the initiation with a mirative value (20).Footnote 8
(19) You are with a close friend, and you are talking about a third person who has put on weight after a pregnancy. Your friend tells you that she has obviously put on weight because she eats chocolate every day, but you have both seen her eating vegetables, and you say to your friend: ¡(Pero) si merienda verdura! ‘But she eats vegetables!’
(20) You are with a close friend who has recently had a child. You think that the child only drinks formula, but you see that his mother is preparing vegetables. When you realize the vegetables are for the baby, you say: ¡(Anda) si merienda verdura! ‘(Wow) He eats vegetables!’
For each context, two sentences were included, differing in the lexical stress of the last word (paroxytone vs. proparoxytone).Footnote 9 Each speaker performed the task three times. All recordings were made by fourteen native speakers of Peninsular Spanish. The participants’ mean age was 24.44 years (SD = 2.10), and they came from four provinces of Spain (Madrid, Barcelona, Cantabria, Seville), thus covering several regional varieties of Peninsular Spanish. The first situation (19) was recorded by every participant (14 speakers × 2 sentences × 3 repetitions = eighty-four responses), whereas the second was recorded by only one participant (six responses) to check whether it had a different prosodic realization.
The annotation was carried out in Praat and comprises three tiers: The first contains an orthographic transcription of the utterance segmented into words, the second a syllable segmentation with phonetic transcription, and the third the break indices, annotated with Sp_ToBI, the Spanish version of ToBI (Tones and Break Indices) (Beckman et al. 2002; Estebas-Vilaplana & Prieto 2008). ToBI systems are based on the Autosegmental-Metrical model (Pierrehumbert 1980). According to this model, pragmatic meaning is mainly encoded by the combination of pitch movements that occur between the last stressed syllable and the end of the intonational phrase, called the nuclear configuration or prosodic contour.
ToBI systems use break indices (from 0 to 4) to mark the degree of prosodic separation between words. They take a level-based approach to tone, in which intonation is understood as a sequence of high (H) and low (L) tones. Finally, they mark tones associated with stressed syllables (pitch accents) with a star (*), intermediate phrase boundaries with a dash (-), and final boundaries with a percent sign (%). An inventory of fourteen different nuclear configurations for Castilian Spanish can be found in Estebas-Vilaplana and Prieto (2010). Each nuclear configuration consists of a pitch accent (the tones borne by the last stressed syllable in the utterance) and a boundary tone, and it is paired with a pragmatic meaning: basic speech act distinctions (assertions, questions, commands) and several pragmatic distinctions that combine with illocutionary forces (e.g., reiterated commands, contrastive focus, or obvious assertions). The goal of the analysis was to test whether the construction at hand (i) has an idiosyncratic nuclear configuration (not included in the inventory), (ii) is compatible with any of the nuclear configurations already identified for Spanish, or (iii) has a preference for one or more of these configurations.
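The labeling conventions just described can be illustrated with a small Python parser that splits a nuclear configuration label into its pitch accent (the starred part) and its boundary tone (ending in % or -). This is a sketch under simplifying assumptions: it accepts only H and L tones, the ¡ (upstep) and ! (downstep) markers, and at most one "+"-joined leading or trailing tone, which covers the labels used in this chapter but not the full Sp_ToBI inventory. The names `parse_nuclear` and `NuclearConfiguration` are hypothetical. Labels may be written with or without a space between accent and boundary (e.g., L* HL% and L*HL%).

```python
import re
from typing import NamedTuple

# One tone: H or L, optionally upstepped (¡) or downstepped (!).
TONE = r"[¡!]?[HL]"
# Pitch accent: a starred tone, optionally with one "+"-joined leading or trailing tone.
ACCENT = rf"(?:{TONE}\+)?{TONE}\*(?:\+{TONE})?"
# Boundary tone: one to three tones, then "%" (final) or "-" (intermediate).
BOUNDARY = rf"(?:{TONE}){{1,3}}[%-]"
NUCLEAR = re.compile(rf"^\s*({ACCENT})\s*({BOUNDARY})\s*$")


class NuclearConfiguration(NamedTuple):
    pitch_accent: str   # tones borne by the last stressed syllable
    boundary_tone: str  # tones at the end of the phrase

    @property
    def is_final(self) -> bool:
        """True if the boundary is a final (%) one rather than an intermediate (-) one."""
        return self.boundary_tone.endswith("%")


def parse_nuclear(label: str) -> NuclearConfiguration:
    """Split a label such as "L+H* L%" or "L*HL%" into pitch accent and boundary tone."""
    match = NUCLEAR.match(label)
    if match is None:
        raise ValueError(f"not a nuclear configuration label: {label!r}")
    return NuclearConfiguration(match.group(1), match.group(2))


if __name__ == "__main__":
    for label in ("L+H* L%", "L* HL%", "¡H* L%", "L*HL%"):
        print(label, "->", parse_nuclear(label))
```

For example, "L+H* L%" splits into the accent L+H* and the boundary L%, while "L*HL%" splits into L* and the complex boundary HL%.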
14.4.2 Nuclear Configurations
Three nuclear configurations (prosodic contours) were found in this study: L* HL%, L+H* L%, and ¡H* L%. The first two appeared in all four varieties, whereas the third appeared only in Seville. As for the contexts, all the contours can appear in both context types (rebuttal and mirative). As shown in Figure 14.1, the majority pattern is L+H* L% (59 percent), followed by L* HL% (39 percent), with only a marginal presence of ¡H* L% (2 percent).

Figure 14.1 Attested nuclear configurations: number of cases and percentages
The pattern L+H* L%, which consists of a rising stressed syllable and a low boundary tone (Figure 14.2), has been attested in almost all varieties of Spanish as the typical contour of contrastive focus (Hualde & Prieto 2015).
L* HL% consists of a low last stressed syllable and a rising-falling boundary tone (Figure 14.3), and it has been described as the intonational pattern of narrow focus and contradiction statements (Prieto & Roseano 2010).

Figure 14.3 Waveform, spectrogram, and pitch contour of the utterance ¡Pero si merienda verdura! ‘But she eats vegetables!’ produced by a female speaker from Barcelona with the nuclear configuration L* HL%
Finally, the ¡H* L% contour, which consists of a stressed syllable rising from a high target to an extra-high one (Figure 14.4), has been described as a dialectal pattern of Canarian yes/no questions. However, it has also been attested in Seville as a dialectal solution for refutational statements.

Figure 14.4 Waveform, spectrogram, and pitch contour of the utterance ¡Pero si merienda médula! ‘But she eats marrow!’ produced by a male speaker from Seville with the nuclear configuration ¡H* L%
Figure 14.5 summarizes the attested prosodic patterns, their schematic contours, and their pragmatic meanings. The three patterns are very similar in both form and meaning. Formally, they consist of a rising stressed syllable and a low boundary tone, and it is unclear whether speakers can perceive the difference between them. Their meanings are also very similar, being related to the expression of focus and contrast. Therefore, they can be considered formal variants of the same pattern.
14.4.3 Constructional Representation
Building on the analysis just presented, this section discusses the status of prosodic contours in a constructional model of grammar. It is possible to identify three potential scenarios regarding the relationship between a grammatical construction and its prosody:
Scenario 1. The construction is prosodically neutral; it can combine with any prosodic contour in the language.
Scenario 2. The construction is prosodically idiosyncratic; it has its own prosodic contour, which does not occur outside the construction (Sadat-Tehrani 2008).
Scenario 3. The construction inherits its prosody from independently existing prosodic constructions, which pair a prosodic contour with a pragmatic meaning (Ogden 2010; Ward 2019).
As shown in the previous section, CICC combines only with focus prosodic contours, with a special regional variant in Seville. Therefore, the prosodic behavior of this construction is consistent with the third scenario: The construction inherits an already existing prosodic contour. A theoretical explanation of the relationship between prosodic patterns and grammatical constructions in a constructional model might consist of two interrelated aspects: (i) to treat prosodic patterns as constructions (with regional variants when relevant) and (ii) to represent prosody as a feature of grammatical constructions (see Chapter 13 for this distinction).
The first aspect rests on modeling prosodic contours as constructions that pair a phonological form (a prosodic contour) with a pragmatic meaning (illocutionary force or information structure status), in line with current research on the prosodic analysis of conversational data (Ogden 2010; Ward 2019). Evidence for this approach comes from the fact that the same prosodic contours occur in constructions with different formal marking but similar meaning-function. Take, for instance, other Spanish insubordinate constructions with a rebuttal function (dispreferred responses), like ni que ‘not even that’ or como si ‘as if’ followed by past subjunctive verb forms (21),Footnote 10 which also inherit focus prosodic contours, as represented in Figure 14.6 (left and right).
(21)
A: ¿Me preparas la cena? ‘Can you cook me dinner?’
B: ¡Ni que fuera tu madre!
   not.even that be.1sg.pst.sbjv your mother
   ‘As if I were your mother!’
B’: ¡Como si fuera tu madre!
   as if be.1sg.pst.sbjv your mother
   ‘As if I were your mother!’

Figure 14.6 Waveform, spectrogram, and pitch contour of the utterances ¡Ni que fuera tu madre! and ¡Como si fuera tu madre! ‘As if I were your mother!’ produced with a focus intonational pattern (L+H* L%)
These insubordinate constructions differ from CICC in form: They are introduced by different subordinating conjunctions (ni que ‘not even that’, como si ‘as if’), and they select subjunctive verb forms. However, they coincide in their pragmatic function: They tend to occur in dispreferred second parts of adjacency pairs to express some correction of a previous statement (usually uttered by the interlocutor) by describing an extreme or absurd situation in which the previous turn would be appropriate. Moreover, the same prosodic contour is also found in simple declarative clauses with no specific marking if they express the same pragmatic function, as in (22), prosodically analyzed in Figure 14.7. Therefore, prosody depends mainly on meaning-function rather than on lexicogrammatical form.
(22)
A: ¿Me preparas la cena? ‘Can you cook me dinner?’
B: ¡No soy tu madre!
   not be.1sg.prs.ind your mother
   ‘I’m not your mother!’

Figure 14.7 Waveform, spectrogram, and pitch contour of the utterance No soy tu madre ‘I am not your mother’ produced with a focus intonational pattern (L+H* L%)
Prosodic constructions, that is, pairings of an intonational contour with a meaning-function, can also have ‘allostructions’: structural variants of constructions that do not encode different meanings (Cappelle 2006). On the one hand, it is not clear whether the differences in peak alignment displayed by the alternation between L+H* L% and L* HL% are phonological, since they do not convey a marked change in meaning. On the other hand, intonational constructions are subject to dialectal variation, as the specific pattern found in Seville suggests.
The second aspect is to treat prosody as a feature of grammatical constructions. Lexical and phrasal constructions that can license sentences must inherit a prosodic construction, depending on the pragmatic function of the construction and on the prosodic constructions available in the language. The range of prosodic constructions compatible with a specific construction depends on the functional specialization of that construction. Some constructions, like CICC, are compatible with only one prosodic construction because of their highly specialized pragmatic function. In other cases, a sentence-level construction can inherit more than one prosodic pattern because it can yield different pragmatic interpretations. For instance, insubordinate complement quotative constructions allow speakers to repeat their own utterance (Gras et al. 2023), as in (23). This construction can inherit two prosodic constructions (Figure 14.8): (i) the declarative prosodic construction, if the speaker simply repeats their previous statement, and (ii) the focus prosodic construction, if the speaker tries to convey that their interlocutor did not react in accordance with their previous statement.
(23)
A: Son las nueve. ‘It’s 9.’
B: ¿Qué? ‘What?’
A: Que son las nueve.
   that be.3pl.prs.ind the nine
   ‘I said it’s 9.’
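The inheritance relation discussed in this section can be sketched as a lookup between two inventories. In this minimal Python sketch, pragmatic functions are reduced to string labels and "compatible meaning" to label membership; that simplification, and the names `ProsodicConstruction`, `GrammaticalConstruction`, and `inherited_prosody`, are assumptions for illustration. The focus construction groups the three contours found for CICC as allostructions (¡H* L% being the Seville variant); the declarative contour is left empty because this chapter does not specify it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProsodicConstruction:
    """Pairs a prosodic form with a pragmatic meaning; allostructions are contour variants."""
    meaning: str
    allostructions: tuple[str, ...]


@dataclass(frozen=True)
class GrammaticalConstruction:
    """A sentence-level construction and the pragmatic functions it can express."""
    name: str
    functions: frozenset[str]


def inherited_prosody(construction, inventory):
    """Return the prosodic constructions whose meaning matches one of the construction's functions."""
    return [p for p in inventory if p.meaning in construction.functions]


# Hypothetical inventory of prosodic constructions.
FOCUS = ProsodicConstruction("focus", ("L+H* L%", "L* HL%", "¡H* L%"))
DECLARATIVE = ProsodicConstruction("declarative", ())  # contour not specified here
INVENTORY = (FOCUS, DECLARATIVE)

# CICC is functionally specialized; the quotative que construction is not.
CICC = GrammaticalConstruction("CICC", frozenset({"focus"}))
QUOTATIVE_QUE = GrammaticalConstruction("insubordinate quotative que", frozenset({"focus", "declarative"}))

if __name__ == "__main__":
    for c in (CICC, QUOTATIVE_QUE):
        print(c.name, [p.meaning for p in inherited_prosody(c, INVENTORY)])
```

Run as a script, it lists only the focus construction for CICC and both the focus and declarative constructions for the quotative, mirroring the contrast between a highly specialized construction and one that allows several pragmatic interpretations.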
Finally, these findings are compatible with construction-based prosodic research on the polyfunctionality of insubordinate constructions. Gras and Cabedo (2022) analyze independent constructions introduced by a ver si (lit. ‘to see if’) in Spanish, which can convey different modal-illocutionary values, like questions, wishes, and expressions of fear, and they find that these correlate with different nuclear configurations. Similarly, Fried and Machač (2022) analyze insubordinate constructions introduced by jestli (lit. ‘if, whether’) in Czech, which can express different degrees of epistemic certainty, and they observe that explicative constructions (‘I think that maybe p’) have a slightly rising melody, whereas argumentative constructions (‘I think that probably not p’) have sharply falling intonation. Differences in prosody offer formal support for differences of meaning and are taken as evidence of separate constructional status.
14.5 Conclusions
One of the most attractive aspects of constructional approaches is that the notion of grammatical construction is flexible enough to represent whatever linguistic information is needed to explain how a linguistic pattern is used, regardless of whether the relevant information belongs to phonology, morphology, syntax, semantics, pragmatics, or discourse. While the meaning potential of constructions has been fruitfully developed in constructional literature, the prosodic dimension of constructions has received comparatively less attention.
The study of marginal phenomena like insubordination helps shed light on the discursive and prosodic functioning of grammatical constructions. As for the discursive aspects, it has been shown that CICC has discourse restrictions, but these are not comparable to morphosyntactic constraints, which need to be satisfied by each instance of the construction. An alternative to encoding discourse information as a formal constraint is to consider constructions as complex categories made up of a schema and a series of instances and extensions. This analysis has several advantages: It (i) accounts for prototype effects (rebuttal as the main function), (ii) incorporates discursive information at the level of instances and extensions (not at the schema level), (iii) allows peripheral meanings (e.g., justification) to be described with greater precision, and (iv) reflects dialectal variation, since not all functions are attested in every geographical variety (e.g., mirativity).
Regarding prosody, it has been shown that CICC combines with a focus prosodic pattern. However, this prosodic information is not idiosyncratic, since the pattern exists outside the construction. The theoretical possibility explored in this chapter is that the prosodic patterns of a language (or language variety) can be represented as schematic constructions pairing a prosodic contour (form) with a pragmatic function (meaning), which are inherited by sentence-level constructions whose meanings are compatible.
In sum, a constructional model that considers constructions as complex categories with inheritance relationships from abstract constructions can also account for the discursive and prosodic features of constructions: Discursive information is better represented at the level of instances, which show different degrees of prototypicality, while prosodic information is better represented as inherited from general prosodic patterns in the language.
15.1 Introduction
To state that language use is fundamentally multimodal is uncontroversial for usage-based linguists. It has long been recognized that the primary setting of language, or its ur-context (Cienki 2016: 605), is face-to-face interaction, and in such interaction we simultaneously draw on verbal speech, gesture, posture, facial expressions, and other nonverbal cues to convey meaning. Yet it is only fairly recently that cognitive linguists have started to fully embrace the multimodal nature of language use by working with authentic, video-recorded discourse data and developing theories to account for how semiotic modes work together in conceptualization.
A major boost for multimodality research from a cognitive-linguistic perspective came from pioneering studies on multimodal metaphor and metonymy as expressed in co-speech gesture (Mittelberg 2006, 2019; Cienki 2008; Müller 2008; Cienki & Müller 2008) as well as in pictures and video (Forceville 2008; for an overview, see Sanaz 2013 and Feyaerts et al. 2017). Over the past decade, other cognitive-linguistic paradigms, most notably Construction Grammar, have followed that path and widened their focus to include the kinesic modalities. A growing number of construction grammarians have raised the question of whether, in light of the inherent multimodality of human language use, the status of constructions as pairings of verbal forms and verbally encoded meanings needs to be reconsidered (Andrén 2010; Cienki 2015, 2016; Zima 2017a, 2017b; Zima & Bergs 2017; Feyaerts et al. 2017; Schoonjans 2018). At the same time, interactional linguists and gesture researchers have turned to Construction Grammar in search of a model of linguistic knowledge and cognitive representation that can account for the tight coupling of verbal and kinesic structures observed in language use (Lanwer 2017; Stukenbrock 2020; Debras 2021).
This convergent development, which, one may hope, will usher in a fully fledged multimodal turn in Cognitive Linguistics (Zima & Brône 2015), originates in the very core of the usage-based model and its premise that all knowledge of language is abstracted from language use. The implications of fully embracing the multimodality of language use, though, are far-reaching for Cognitive Linguistics. The issue raises the question of “what counts as language?” (Cienki 2016: 606) and thus of what the research objects of Cognitive Linguistics should be. Furthermore, many ongoing theoretical debates within the field come to the fore with even greater salience once we take a broader, multimodal perspective (Cienki 2017: 1). This also holds for the nascent field of multimodal Construction Grammar, which struggles with a number of theoretical and empirical issues and is occasionally met with skepticism within Construction Grammar and gesture studies alike (Ningelgen & Auer 2017; Lanwer 2017; Ziem 2017; Debras 2021). My aim in this chapter is therefore to present the current state of the ongoing debate on whether “we really need a multimodal Construction Grammar” (Ziem 2017: 1). I will start by giving a basic introduction to what gestures are, how they convey meaning, and why the discussion on the constructional status of gestural information concerns only co-speech gestures. There is no controversy that emblematic gestures (also called ‘emblems’) are constructions in their own right, just as the signs of sign languages are (Hoffmann 2017). To illustrate how closely co-speech gestures are integrated with speech, I will show how they contribute to an utterance’s meaning at all levels and also touch upon issues of the temporal alignment between gestures and speech. Both are crucial aspects to be borne in mind when exploring the possible existence of multimodal constructions and the nature of the constructicon.
15.2 What Are Gestures and How Do They Convey Meaning?
Lay people often use the word ‘gesture’ as an umbrella term covering all sorts of hand movements, ranging from pointing gestures, iconics, and depictions to unspecific hand movements such as scratching one’s head or fiddling with one’s wedding ring. In gesture studies, the concept is, on the one hand, employed in a broader sense, encompassing all sorts of bodily articulators, such as the hands, the head, shoulders, arms, feet, and also facial expressions. On the other hand, the analytical focus is restricted to what Adam Kendon has termed ‘gesticulation’: “visible bodily action used as an utterance or as part of an utterance” (Kendon 2004: 7), or for short, “utterance visible action” (Kendon 2014: 7). Gestures are thus produced with the intent to be semantically and pragmatically meaningful and are thereby an integral part of utterance construction. In Kendon’s words, they are “employed to accomplish expressions that have semantic and pragmatic import similar to, or overlapping with, the semantic and pragmatic import of spoken utterances” (Kendon 2014: 7). In a similar vein, Calbris (2011: 6) defines gestures as “visible movement[s] of any body part consciously and unconsciously made with the intention of communicating while speech is being produced” (my emphasis). Both definitions emphasize gestures’ communicative meaning or deliberate expressiveness and hence exclude bodily movements that are not produced with the intent of encoding semantic-pragmatic meaning but rather reveal aspects of the speaker’s emotional or psychological state. The boundary, however, is not clear-cut, and the analysis of authentic discourse always reveals a number of ambiguous cases. Nonetheless, there is consensus on what constitutes the core domain of co-speech gesture or ‘visible bodily action’: Gestures are kinesic movements that point towards a referent (present or imagined), depict a concrete or abstract referent, or serve to structure discourse.
David McNeill, one of the leading researchers in the field of psycholinguistic modality research, has accordingly proposed a gesture typology comprising four types: deictics, iconics, metaphorics, and beats (McNeill 1992). Deictic, iconic, and metaphorical gestures are referential in nature; that is, they relate to a referent either by pointing to it or by depicting it. A depicted referent may be a concrete entity (iconics) or an abstract one (metaphorics). Beat gestures (also called ‘batons’; Efron 1941; Ekman & Friesen 1969) are coordinated with the rhythm of the speech they accompany. Their relationship to speech is not semantic but discursive-pragmatic, as they are often used to stress or emphasize a particular aspect. In terms of form, they usually consist of a back-and-forth, up-down, or left-right movement.
Recurrent gestures (Ladewig 2014) form another gesture category, not mentioned in McNeill (1992), that may also play a role in multimodal Construction Grammar. Examples include the palm-up open hand (Kendon 2004; Müller 2004), the throwing-away gesture (Bressem & Müller 2014), and cyclic gestures (Ladewig 2011). Their main characteristic is that they “show a stable form–meaning relationship” (Ladewig 2014: 1158) and are thus more conventionalized than the spontaneous gestures that fall within the four other categories. However, they have not (yet) developed into emblems such as the thumbs-up gesture. They are thus not fully conventional signs, or constructions in a Construction Grammar sense, with a speech-independent semantics, as is the case with emblematic gestures (for an overview of emblematic gestures, see Teßendorf 2014). Rather, the meaning of recurrent gestures is schematic. Most notably, Bressem and Müller (2017) propose that one such recurrent gesture, namely the throwing-away gesture, constitutes the gestural component of a verbo-gestural pattern expressing negative assessment, which may qualify as a multimodal construction in a Construction Grammar sense (explained in more detail in Section 15.4.1).
Two further important facts about gestures concern how they are produced in time relative to speech and how their way of encoding meaning differs from that of verbal language. Pioneering work by Adam Kendon (1980) has identified different phases in the execution of a gesture. The central phase is the so-called stroke phase, and it is from this phase that we extract the gesture’s meaning. The stroke phase is preceded by a preparation phase, in which the hands move from the rest position to where the stroke is performed (usually close to the center of the speaker’s gesture space; McNeill 1992). The stroke phase may be followed by a retraction phase, in which the hands move back into the rest position. Hold phases between these phases (e.g., a post-stroke hold) or within the stroke phase are also possible and often correlate with verbal disfluencies.
These gesture phases can be combined to form higher-level units: gesture phrases and gesture units. Gesture phrases comprise preparation phases and strokes, whereas gesture units involve the full movement cycle from preparation to retraction. Gesture phrases, gesture units, and speech are temporally aligned with each other in a particular way: The preparation phase usually precedes the articulation of the lexical affiliate, that is, the lexical element that is semantically co-expressive with the gesture. The timing of the stroke is more controversial. Some studies argue that the stroke may start and even end before the affiliate is articulated (Ferré 2010; ter Bekke et al. 2020), while others report that the stroke coincides with the affiliate (Chui 2005; McNeill 2005). Focusing on the relationship between gesture and intonation, Loehr (2004) reports that the stroke most typically either shortly precedes or coincides with the utterance’s focus accent. This is corroborated by follow-up studies (Jannedy & Mendoza-Denton 2005; Shattuck-Hufnagel et al. 2007). Although the details of the temporal alignment between speech and gesture are thus partly subject to debate, it is uncontroversial that the two are closely aligned, and this temporal alignment is mirrored in their semantic alignment, as co-speech gestures are generally considered to be co-expressive with speech.
However, verbal language and gestures co-express meaning in different ways. Speech is segmented on various levels, that is, into phonemes, lexemes, phrases, constructions, etc. Although gestures can be segmented, too, their meaning is not compositional; rather, each gesture is considered to constitute one meaningful whole. Furthermore, co-expressiveness should not be confused with semantic redundancy. Although gestures can of course co-express meaning that is also encoded verbally, they commonly express aspects of meaning that are not specified in speech (Kendon 1980; McNeill 1992). For instance, if a speaker recounting a soccer match says that “the defender tackled me and I lost the ball” and moves their right arm to depict an elbow check, we infer from this gesture that the tackle involved an elbow check, and we take the elbow check to be the reason the speaker lost the ball. But the gesture does more than that. It also conveys specific information about how the elbow check was performed, including how quickly and with how much physical force, and whether the elbow was moved along a horizontal trajectory or lifted, possibly to aim at the opponent’s upper body or face. All this information is encoded in, and inferred from, the gesture, and it is obviously far more economical and easier to depict all of it in one gesture than to put it into words. Hence, there is a division of labor between the verbal and the gestural modality; put differently, the two work together to convey one thought. Kendon (1980) has termed this one thought the underlying ‘idea unit’. Speakers, however, do not use referential gestures only to express content that is easier to depict than to recount; they also use gestures to highlight aspects of meaning (Alibali & Kita 2010; Schoonjans 2018). This point has been made most notably by Müller (2008), who argues that when a metaphor is co-expressed in gesture and speech, its metaphoricity is more highly activated than when the metaphor is present in only one modality. This is reminiscent of Givón’s (1985) principle of quantitative iconicity: “More form is more meaning.”
Besides these communicative and semantic-pragmatic functions, gestures also serve a number of functions linked to speech production and interaction management. For instance, gestures are frequent in persistent word searches, and it has been argued that gesturing helps to overcome word retrieval problems because the motor activity stimulates cognitive activity (Krauss 1998) and reduces cognitive load (Goldin-Meadow et al. 2001). Accordingly, it has been claimed that gestures, including representational gestures, have self-oriented cognitive functions (Kita et al. 2017). At the same time, gestures also play a role in turn-taking, as they are used to allocate turns as well as to signal a wish to take the turn (Mondada 2007; Schmitt 2014; Zima 2018). Gestures are hence multifunctional, and this multifunctionality has a number of implications for how co-occurrences of gestures and verbal constructions may be modeled within Construction Grammar.
After this compact overview of some of the main characteristics of the gestural modality, the next section addresses this chapter’s main concern: Are constructions multimodal and how do we know? We start with the theoretical seeds of the idea.
15.3 Multimodal Constructions? The Discussion’s Theoretical Foundations
This contribution focuses on the place of co-speech gestures within Construction Grammar, most notably Cognitive Construction Grammar (Goldberg 1995, 2006), which clearly subscribes to the usage-based thesis (Barlow & Kemmer 2000). Accordingly, it holds that all linguistic knowledge is abstracted from language use, drawing on general cognitive mechanisms such as pattern recognition, abstraction, schematization, and categorization. This usage, that is, the input, is inherently multimodal: We do not only speak with words; we also gesture, direct our gaze somewhere, display emotions and attentional states through our postures and facial expressions, speak up at some times and whisper at others, etc. Language is thus learned in a multimodal environment (Enfield 2009). Most notably, children gesture before they are able to speak, and the language acquisition process is heavily dependent on co-speech gesture use (e.g., to establish the link between a given concept and its name). Although this dependence on co-speech gesture diminishes in the course of language acquisition (Cienki 2015), communication in face-to-face interaction remains inherently multimodal throughout the lifespan. One theoretical argument put forward in favor of a multimodal reconceptualization of language is therefore grounded in the fact that we obviously have extensive, systematic, and structured knowledge of how to communicate in multimodal environments. This knowledge must be stored, that is, entrenched, in one way or another. The crucial question is: Is it part of linguistic knowledge, of grammar?
Usage-based linguistics models grammar as “the cognitive organization of one’s experience with language” (Bybee 2006: 2916), and construction grammarians have posited that this cognitive organization consists of constructions only. This idea is often conveyed by citing Goldberg’s (2003: 226) oft-quoted statement, “It’s constructions all the way down.” Hilpert (2014: 2) has rephrased the same idea as “Knowledge of language consists of a large network of constructions, and nothing else in addition.” However, there is a crucial difference between Bybee’s and Hilpert’s formulations: Bybee refers to grammar, whereas Hilpert speaks of knowledge of language. This difference is not trivial, as the way we model knowledge of gesture use in the Construction Grammar framework crucially depends on how we conceptualize the relationship between grammar and language.
From Hilpert’s encompassing view of constructions and the constructicon, one may infer that constructions must include information on how to instantiate them multimodally, because it is assumed that all knowledge of language is stored as part of constructions and, surely, the way we use constructions is entrenched knowledge. This resonates with the line of argument put forward in Zima (2014b): If we accept the idea that the constructicon comprises all of our knowledge of language but conceptualize language and constructions as purely monomodal, we are left with two unresolved problems: explaining why the usage-based thesis should hold only for recurrences at the verbal level, and explaining where our rich knowledge of how to communicate multimodally, that is, of how to employ constructions multimodally, is stored.
Another take on the issue, however, is to view grammar and knowledge of language as non-equivalent. This implies that knowledge of language includes grammatical knowledge as well as other forms of knowledge that are abstracted from language use, potentially including knowledge of how to combine constructions with gestures. Quite a few authors have advocated this position (Ningelgen & Auer 2017; Ziem 2017; Verhagen 2021), proposing it as a way out of the current impasse in the field. The discussion is not settled and cannot be resolved in this chapter, but one consequence seems evident: If grammar and knowledge of language only partly overlap, the Construction Grammar claim that knowledge of language is “constructions and nothing else in addition” (Hilpert 2014: 2) may not be tenable and may need to be revised.
In this context, it is important to note that the discussion on where to locate gestures in the constructicon did not really originate in Construction Grammar but was initiated by Ronald Langacker, who explicitly acknowledged that gestures may be part of a linguistic unit:
In Cognitive Grammar …, the form in a form–meaning pairing is specifically phonological structure. I would of course generalize this to include other symbolizing media, notably gesture and writing. … Cognitive Grammar takes the straightforward position that any aspect of a usage event, or even a sequence of usage events in a discourse, is capable of emerging as a linguistic unit, should it be a recurrent commonality.
In 2008, he became even more specific, giving the example of a co-speech gesture performed in baseball:
When a baseball umpire yells Safe! and simultaneously gives the standard gestural signal to this effect (raising both arms together to shoulder level and then sweeping the hands outward, palms down), why should only the former be analyzed as part of the linguistic symbol? Why should a pointing gesture not be considered an optional component of a demonstrative’s linguistic form?
The theoretical statement and the example, however, differ in one important respect. In the case of the umpire signal, the gesture is a mandatory component of the sign; that is, the signal is not adequately performed if one only yells Safe! without gesturing. From a Construction Grammar perspective, the status of this form–meaning pairing as consisting of a verbal and a gestural component is therefore rather uncontroversial, and some authors have indeed argued in a similar vein, proposing that constructions are multimodal if and only if a gestural component is mandatory and cannot be omitted without the construction being incomplete (Ningelgen & Auer 2017; Ziem 2017). In the case of the baseball signal, completeness is determined by sports convention: Performing the gesture without yelling Safe! is not uninterpretable, but it is treated as pragmatically unacceptable. This is because at some point people agreed on the convention that, for the umpire signal to be effective and consequential, the verbal and gestural parts have to be performed together. In other cases, especially some deictic constructions that are often discussed as candidate multimodal constructions (Stukenbrock 2010, 2015, 2020; Ningelgen & Auer 2017; Balantani 2021), completeness is a semantic-pragmatic category. This applies, for instance, to deictic constructions like [like that/this] or [this ADJ] (also German so ‘like this’; Stukenbrock 2015; Ningelgen & Auer 2017). These are uninterpretable without a gesture that specifies the deictic slot, for example, by depicting how a certain action has to be performed (‘you need to hold your hand like this’) or by specifying the shape of an object or some spatial dimension (‘the hole was this big’). For constructions that involve an obligatory gestural component, multimodal unit status is likewise uncontested. Rather, the debate centers on whether the obligatoriness of a gestural component is a prerequisite for multimodal constructions and whether gestures can fill optional slots of multimodal constructions. The latter hypothesis is grounded, among other things, in Goldberg’s definition of constructions, which includes a frequency criterion:
Any linguistic pattern is recognized as a construction as long as some aspect of its form or function is not strictly predictable from its component parts or from other constructions recognized to exist. In addition, patterns are stored as constructions even if they are fully predictable as long as they occur with sufficient frequency.
Numerous studies have since examined the effects of frequency on unit formation and entrenchment (e.g., Bybee 2006; Schmid 2007, 2014; Blumenthal-Dramé 2012; Divjak & Caldwell-Harris 2015; Divjak 2019) (see also Section 15.4.1), providing arguments and counterarguments for the unit status of highly frequent instantiations alongside more abstract and/or unpredictable constructional patterns, while agreeing that ‘sufficient frequency’ is too vague a term to serve as an operational criterion (Traugott & Trousdale 2013: 11; for discussion, see also Hartmann & Ungerer 2023). At the same time, the exemplar view advocated by Bybee (2010) holds that even constructions one comes across only once or a couple of times in one’s life may be stored in long-term memory if some salient aspect makes them stick in the mind. The exact role of frequency in Construction Grammar is hence still disputed (Hoffmann 2013), and this has implications for multimodal Construction Grammar. Obviously, it is impossible to define a frequency threshold for gesture recurrence on which any claim about the constructional status of a co-occurrence between a (verbal) construction and a gesture could safely be based. This has been the most critical issue in multimodal Construction Grammar so far. It touches upon the evident gap between the general acceptance of the claim that language is multimodal and the difficulty of proving that a particular construction is multimodal in nature. The next section sketches the state of the art in the field.
15.4 State of the Art in Multimodal Construction Grammar
The current debate in the field can be framed as comprising two main strands. The first includes construction-based case studies that in one way or another rely on the frequency of gesture co-occurrence as an argument for or against the multimodal status of constructions. The second takes a more gesture- and meaning-centered approach. This section is structured as follows. Section 15.4.1 presents the state of the art by focusing on the case studies conducted so far. These studies lay the groundwork for approaches that draw on them to propose novel ways of thinking about the issues under debate, most notably Cienki’s (2017) proposal of an ‘utterance construction grammar’. These proposals are discussed in Section 15.4.2.
15.4.1 Case Studies
As outlined above, one of the main arguments brought forward in favor of a multimodal reconceptualization of constructions and the constructicon is grounded in the claim that “any recurrent aspect of a construction’s usage can become entrenched” (Langacker 2001). Over the past decade, several studies have shown that gestures recurrently and systematically co-occur with particular verbal constructions, but co-occurrence frequencies vary strongly. They range from up to 85 percent for English motion and distance constructions ([all the way from X PREP Y]; see Zima 2014b, 2017a, 2017b, and also Pagán Cánovas & Valenzuela 2017) to approximately 70 percent for different types of English time expressions (Pagán Cánovas et al. 2020), 58 percent for English aspectual verbs (Hinell 2018), and 37 percent (and less) for German modal particles (Schoonjans 2018). Except for Ningelgen & Auer (2017) on German deictic so ‘like this’ (see the discussion in Section 15.3 of mandatory gestures with particular deictic expressions), no study to date reports co-occurrence rates of 100 percent, and it seems safe to say that there may indeed be only very few verbal constructions that qualify as multimodal if a 100 percent co-occurrence rate is taken as the sole criterion. Ziem (2017) takes this to be a strong counterargument against the multimodal conception of constructions and the constructicon. In line with Ningelgen and Auer’s argumentation, he proposes deletion tests, arguing that the gesture’s contribution to the meaning of the construction must be so crucial that without the gesture the construction collapses and becomes uninterpretable.
A different path is followed by Lanwer (2017), Schoonjans (2017), and most recently Debras (2021), who argue that mere frequency is rather uninformative and that the analytical focus needs to shift to how gestures contribute to utterance meaning. Debras (2021) links this to a general complaint that Construction Grammar focuses too strongly on form. If we take the verbal construction and its form as the point of departure, we tend to consider the co-occurring gesture as secondary and optional, that is, as something we add while we speak but could equally well leave out. Our notion of constructions and the constructicon, however, may look fundamentally different if we start from the meaning side (cf. Lasch 2020 and his meaning-centered approach to the German constructicon) and shift focus to how gesture and speech collaborate to express an idea, that is, the ‘idea unit’ in Kendon’s (2004) sense. This is the line of argumentation followed by, for example, Hoffmann (2017), Mittelberg (2017), Bressem & Müller (2017), Schoonjans (2018), and (partly) Zima (2014b, 2017a, 2017b).
Adopting an emergent grammar perspective, which takes grammar to be “the name for certain categories of observed repetitions in discourse” (Hopper 1998: 156), Mittelberg (2017) presents a case study on the German existential construction [es gibt X] ‘there is an X’. She argues that this construction involves a slot for a gestural enactment that depicts an act of “giving or holding something” (Mittelberg 2017: 1). This gestural re-enactment is grounded in the basic pattern of experience that Goldberg has argued motivates (di)transitive constructions: “The initial meaning is an experiential gestalt. This basic pattern of experience is encoded in a basic pattern of language” (Goldberg 1998: 208). Accordingly, Mittelberg (2017: 2) argues that “the basic manual actions of giving and holding … motivate multimodal instantiations of existential constructions in German discourse.” Drawing on semi-experimental data from German spoken discourse, she shows that es gibt-constructions co-occur with unimanual variants of the palm-up open-hand gesture as well as with bimanual palm-vertical open-hand gestures. Her analysis reveals formal recurrence in the gestures, while their semantic-pragmatic meaning is clearly situated and dependent on the discursive context. The semantic recurrence holds only for the very schematic meaning of “holding some kind of imaginary entity.” As Mittelberg acknowledges, these analyses are preliminary, but her work on existential constructions points towards candidate constructions for future research in multimodal Construction Grammar by suggesting that “linguistic constructions that recruit basic embodied manual actions and interactions with the physical and social world are particularly likely to be instantiated multimodally and thus also engender emergent multimodal patterns, or clusters, of experience” (Mittelberg 2017: 5).
This conclusion seems to be supported by my own studies on English motion and distance constructions such as [V(motion) in circles], [zigzag], and [all the way from X PREP Y] (Zima 2014b, 2017a, 2017b). In American English data from various TV formats (UCLA Library NewsScape; Steen et al. 2018), I found gesture co-occurrence frequencies ranging between 37 percent and 85 percent. Although these frequencies are considerably high, gesturing with these constructions is obviously not mandatory, at least not under all circumstances. If one were to perform a deletion test, as proposed by Ziem (2017), the conclusion would have to be that none of these constructions is multimodal in nature, as the constructs are not uninterpretable without the gesture. Yet the gestures are not redundant; they add to the meaning of the utterances. In particular, the iconic gestures make a certain aspect of conceptualization particularly salient, following the quantitative iconicity principle that “more form is more meaning.” The following examples illustrate this.
In example (1), the speaker is telling a story and enacting a scene from a hockey game. The gesture, which consists of consecutive rapid movements of the right hand, emphasizes both the marked path of motion (in circles) and the velocity (faster and faster). It thus fulfills the function of highlighting and drawing attention to the semantic aspects of path and manner of motion.
(1) KNBC Tonight Show with Jay Leno, July 16, 2010

This highlighting function also holds for gestural instantiations of temporal and spatial uses of [all the way from X PREP Y] (Zima 2017). An example of a spatial instantiation accompanied by a co-speech gesture is given in (2). The bimanual gesture performed by the speaker depicts, and thus emphasizes, the long distance between location X (Long Beach) and location Y (Lancaster), thereby communicating that the task of delivering food to all clients in this area on a single day is difficult.
(2) KNBC 4 News at Noon, December 25, 2012

Frame grab (1) shows the first stroke of the gesture, which is co-produced with the articulation of the first geographical reference point (Long Beach), the element that instantiates the X-slot of the constructional template. Frame grab (2) depicts the second stroke, which is aligned with Lancaster. One hand marks the beginning and the other the endpoint of a spatial path, and the space between the two extended hands maps onto the distance between the two places.
Based on both the analysis of the gestures’ semantic-pragmatic meaning and their frequency (63 percent for [V(motion) in circles]; 85 percent for spatial uses of [all the way from X PREP Y]), it is argued that we should not treat these seemingly redundant, co-expressive gestures as totally optional. Rather, our focus should be more data-centered, acknowledging and trying to explain the fact that speakers recurrently do gesture. Following Kendon (2004) and Calbris (2011), these gestures are produced with the intention to convey meaning and hence cannot be dismissed as ‘just optional’.
An equally meaning-centered approach is taken by Bressem and Müller (2017). Departing from a recurrent gesture, the so-called throwing-away gesture, they illustrate that this gesture can be combined with a number of different verbal constructions including a wide range of grammatical categories such as particles, nouns, verbs, and adverbs. The throwing-away gesture is “characterized by a particular kinesic core: a lax flat hand oriented vertically with the palm facing away from the speaker’s body flapping downwards from the wrist” (Bressem & Müller 2017: 3). Just as Mittelberg argues for palm-up open-hand gestures that co-occur with German existential constructions, Bressem and Müller argue for an experiential basis of the gesture, which they situate in the embodied experience of throwing concrete entities away. This is extended to metaphorical uses when referring to abstract objects in speech. They thus identify a constructional pattern, which they term “negative assessment construction,” with the multimodal form [throwing-away gesture] + [particles/negation/N/V/ADV]. From a theoretical perspective, they suggest the compelling idea that whether constructions are multimodal in nature is probably not a polar question requiring a yes-or-no answer. Rather, verbal constructions may constitute a multimodal network, with some of them being more, and others less, bound to particular gestures.
A further pioneering study is Schoonjans’ (2018) monograph on German modal particles and the role of manual and head gestures in co-expressing downtoning meanings. His study is among the very first not only to raise theoretical questions but also to perform a large-scale corpus analysis that inquires in detail into the interdependence of verbal constructions and non-verbal co-occurrence patterns. The frequencies reported for multimodal instantiations of the modal particles under scrutiny are rather low (37 percent and less), but this should not lead one to dismiss Schoonjans’ results and his approach. Indeed, he raises and discusses a number of issues that are critical for future endeavors in multimodal Construction Grammar. These include the problem that the notion of recurrence (e.g., Langacker 2001) involves the assumption that there is a stable formal and semantic core that is common to all instantiations and results from subtraction of all in situ variation. However, as Bressem (2013) illustrates, the form of manual gestures may vary in a great number of dimensions, including hand shape, orientation, movement, and position in gesture space; therefore, “no two tokens of gesture are ever identical” (Harrison 2009: 82). Put differently, the issue of whether two gesture tokens are instantiations of the same gesture type is far from trivial.
Another methodological problem with far-reaching implications that Schoonjans draws attention to is that there is not always perfect temporal alignment between the verbal construction and a co-expressive gesture. For instance, the performance of gesture phrases and units may take more time than the articulation of the lexical affiliate and, more importantly, the lexical affiliate may not be just one verbal construction but a larger semantic unit within an utterance. To date, all these issues remain unresolved. As Schoonjans (2017) argues, however, many of them are not restricted to attempts to develop a multimodal Construction Grammar but also concern monomodal Construction Grammars. This mostly concerns the still debated link between frequency and entrenchment (Hoffmann 2013, 2017) but also the question of the level of granularity at which a construction is assumed to be situated.
15.4.2 Theoretical Proposals: Monomodal, Multimodal Construction Grammar, or Something Else?
Monomodal Construction Grammars posit that constructions exist at every level of granularity or schematicity, ranging from highly abstract patterns to lexically and syntactically fully fixed ones. They further allow for constructions to have optional slots. Therefore, one may consider it arbitrary to posit that verbal elements can be optional but gestural ones need to be obligatory. At the same time, one may equally wonder whether non-obligatory elements in verbally defined constructions are cognitively real or whether they instead point towards the existence of different constructions at different levels of granularity. This issue is raised by Lanwer (2017), who suggests that the difference between mono- and multimodal constructions may be one of degree of schematicity. On this view, multimodal constructions comprising a given [verbal form + gesture] may be stored alongside the more specific monomodal ones that do not involve a slot for a co-speech gesture. This argument is grounded in the very basic claim of Construction Grammar, namely that constructions may be stored redundantly at different levels of granularity. He further argues that in order to account for the varying frequencies with which constructions co-occur with gestures and the varying degree of constructions’ dependence on gesture, we should consider thinking of a multimodal network of interrelated constructions as prototypically structured and involving fuzzy boundaries.
This idea is worked out in more detail in Cienki (2017). He introduces the idea of an Utterance Construction Grammar, with utterance being defined as “a level of description above that of speech and gesture for characterizing audio-visual communicative constructions” (Cienki 2017: 1). The suggestion of yet another model of linguistic knowledge is grounded in the conviction that it may be futile to try to coerce gestures into a verbally based constructional framework. In taking the utterance as point of departure, Cienki aligns with Kendon’s approach to gesture as “utterance dedicated visible bodily action” and speech as “utterance dedicated audible bodily action” (Kendon 2015: 44, cited in Cienki 2017: 3) as well as with Langacker’s concept of the ‘usage event’, defined as including “the full phonetic detail of an utterance, as well as any other kinds of signals, such as gestures and body language” (Langacker 2008: 457, cited in Cienki 2017: 3). His proposal that constructions have a deep as well as a surface structure is reminiscent of two concepts traditionally associated with Generative Grammar, but Cienki stresses that the terms are borrowed without adhering to the nativist assumptions that underlie the Universal Grammar approach. The deep structure is conceptualized as “a set of tools that can be drawn upon to express the construction,” whereas the surface structure is “a metonymic representation of some (if not all) elements of the construction” (Cienki 2017: 3). Accordingly, information about which gestures go with a construction is stored in the construction’s deep structure. Constructions thus exhibit an inherent potential for multimodal realization, and some aspects of this potential may get activated and become visible in a construction’s surface representation, that is, in a construct.
Crucially, potential component elements as part of the deep structure may differ in being more or less prototypically associated with the construction. This way of thinking about constructions, Cienki (2017: 5) argues, “is a more flexible alternative than positing that the model has the binary choice between required and optional elements” and is more compatible with the idea of various degrees of entrenchment.
Cienki thus proposes a new way of thinking about many issues that have turned out to be challenging for multimodal Construction Grammar. However, one may wonder how these ideas could be put to the test. In that vein, Hoffmann (2017) emphasizes the need for larger-scale data studies and the application of quantitative and statistical methods that go beyond absolute and relative frequencies (as used, for example, in Zima 2014b, 2017a, 2017b; Schoonjans 2018).
An example of such a quantitative approach is a recent study by Debras (2021) on French je (ne) sais pas ‘I don’t know’. Her approach is not explicitly situated within multimodal Construction Grammar. However, her paper includes an interesting discussion of why the constructional approach does not do full justice to the semantic-pragmatic import of co-speech gestures, arguing that the original Construction Grammar focus on verbal constructions entails that gestures are regarded as “secondary and dependent on speech” (Debras 2021: 42). At the same time, she concludes that the association of the various uses of je (ne) sais pas as a pragmatic marker with recurrent gestures is too loose to allow for a straightforward categorization as a multimodal construction. In that respect, the methodology applied in her study is especially interesting and points to a potentially fruitful direction: based on a qualitative, multimodal analysis of eighty-four occurrences,Footnote 6 she identifies three multimodal profiles of je (ne) sais pas. A multiple correspondence analysis is then performed to identify the strength of association between all annotated parameters, which include phonetic realization, prosodic detail, functions, type of co-speech gesture, and several others. It turns out that the variable ‘type of co-speech gesture’ accounts for a large portion of the variation in the dataset and is thus only loosely associated with the particular phonetic realizations and functions.
Mirroring the ongoing discussion on obligatoriness, frequency, and prototype structure in the field of multimodal Construction Grammar, these results may thus be interpreted in two ways: either as evidence that je (ne) sais pas is clearly not a multimodal construction, or as an argument for the need for a more nuanced model along the lines proposed by Zima (2017a, 2017b), Lanwer (2017), Cienki (2017), and Schoonjans (2018).
All these studies, hence, suggest that there are many ways to conduct research with a multimodal constructional focus. However, in some way they all struggle with similar issues, most notably the difficulty of answering the open question of where multimodal information is stored in our minds. This question clearly calls for an interdisciplinary approach that brings together experts in multimodal communication and gesture studies as well as cognitive linguists, psycholinguists, and cognitive scientists. However, one step to take before that seems to be to broaden the empirical basis by conducting more case studies on sufficiently large multimodal datasets. Little is known about how systematic the relationship between given verbal constructions and gestures really is. So, where do we go from here?
15.5 The Road Ahead
As I hope to have shown in this chapter, the inquiry into the potential multimodality of constructions and the constructicon is still in its infancy and faces a number of theoretical and methodological challenges. These relate to the debated role of frequency of co-occurrence, the status of open slots in constructions, and the issue of whether grammar is restricted to verbal symbols. Some of these issues are intrinsic to the Construction Grammar framework but come to the fore more saliently when we extend the focus towards multimodal communication. This may leave readers with the impression that the endeavor is futile altogether. I would like to close this chapter with a different conclusion. Much of the current discussion in the field of multimodal Construction Grammar suffers from a top-down approach; instead, we should adopt a more bottom-up perspective. Many arguments, including those presented in Zima (2014a, 2014b, 2017a, 2017b), Zima and Bergs (2017), and in this chapter, depart from the basic tenets of Cognitive Linguistics, the usage-based model, and especially Cognitive Construction Grammar. It is argued that there is a discrepancy between the acknowledgment that language use is multimodal and the way we theorize about language and language use in Construction Grammar. While this observation is valid, the discussion about the place of gesture (and other non-verbal modalities) within communication and grammar will remain a purely theoretical one unless we ground it in a much broader empirical basis. Too little is known about how consistent co-occurrences and mappings between the verbal and the gestural modalities are on a constructional level.
Therefore, we need many more case studies, including studies that start out from verbal constructions and their multimodal instantiations as well as more gesture- and meaning-centered ones. This entails the need for sufficiently large, annotated, multimodal corpora. The NewsScape Library (Steen et al. 2018) is an exceptionally good starting point for any study on multimodal instantiations of constructions, as it is fully searchable (for verbal constructions) and contains enormous amounts of audio-visual data, not only in English but also in Spanish, Russian, German, and many more languages. Of course, this is not to say that smaller multimodal corpora cannot be used. They are equally relevant, especially for constructions that occur frequently enough to yield a sufficiently large dataset. Not least, these corpora are very valuable resources because the NewsScape Library only contains televised interactions and thus no private, face-to-face conversations or other interactional settings.
Finally, we need to broaden our methodological toolkit. To move forward on the issues under scrutiny, we need both qualitative research, which pays close attention to how meaning is expressed in situ in all modalities, and quantitative studies that make use of the full array of statistical methods that have been applied so successfully in Construction Grammar and other Cognitive Linguistic disciplines over the past decade (cf. Janda 2013). Most notably, the issue at hand is fundamentally interdisciplinary; it calls for an interdisciplinary approach and may not be resolvable by construction grammarians alone.
16.1 Signed Languages
The history of deaf people and their signed languages is mired in false assumptions and misunderstandings. Signing was seen as nothing more than imagistic gesture, certainly not a language. Deaf people were long considered uneducable. The first schools for deaf children were not established until the seventeenth and eighteenth centuries. The most famous of these was in Paris. The story goes that Charles-Michel de l’Épée, a Catholic priest, saw young deaf sisters signing, and he had an idea. He believed that this signing could be used to teach deaf children. What he did not realize was that their signing was a fully developed language that met the needs of the French deaf community. We know this from Pierre Desloges (1779, as reported in Lane & Grosjean 1980: 123–124), a deaf Parisian who wrote:
There is no event in Paris, in France, and in the four corners of the world that is not a topic of our conversations. We express ourselves on all topics with as much orderliness, precision, and speed as if we enjoyed the faculties of speech and hearing.
In order to teach deaf children in his Paris Institute, l’Épée took the lexical stock of this language and modified it in various ways to represent French; he called this ‘methodical signs’. Because the Paris Institute was one of the few schools to use signed language, educators from other countries came to adopt l’Épée’s approach, and his method spread throughout France and Europe into the nineteenth century. In the early 1800s an educator from the United States, Thomas Hopkins Gallaudet, visited the Paris Institute. Gallaudet returned with one of the school’s deaf instructors, Laurent Clerc. Together, they established a school for the deaf in Hartford, Connecticut. As a result, instruction by sign spread throughout the United States.
Not everyone believed signing was the proper way to teach deaf children. A growing group favored the Oral Method, by which students were supposed to learn to speak. The opposing views turned into a battle, fought in Milan, Italy, in 1880. A group led by Edward Miner Gallaudet, the son of Thomas Hopkins Gallaudet, advocated for signing instruction. The oralist camp believed signing would corrupt the minds of deaf children. One prominent oralist, Giulio Tarra, drew the argument into sharp relief, equating language with speech and sign with gesture (Lane 1984: 393–394):
Gesture [i.e., sign] is not the true language of man which suits the dignity of his nature. Gesture, instead of addressing the mind, addresses the imagination and the senses. … Thus, for us it is an absolute necessity to prohibit that language and to replace it with living speech, the only instrument of human thought … Oral speech is the sole power that can rekindle the light God breathed into man when, giving him a soul in a corporeal body, he gave him also a means of understanding, of conceiving, and of expressing himself. … While, on the one hand, mimic signs are not sufficient to express the fullness of thought, on the other they enhance and glorify fantasy and all the faculties of the sense of imagination … The fantastic language of signs exalts the senses and foments the passions, whereas speech elevates the mind much more naturally, with calm and truth and avoids the danger of exaggerating the sentiment expressed and provoking harmful mental impressions.
The oralists won the debate and the Oral Method became the dominant form of deaf education. By the turn of the twentieth century, there were no schools in France or the United States that used signing. Deaf instructors were fired from their positions. Signed languages did not disappear, of course. But our understanding of them entered a dark period.
16.2 Linguistic Analysis of Signed Languages
In the late 1950s, William C. Stokoe was a professor of English at Gallaudet College in Washington, DC, the world’s only institution of higher education dedicated to serving deaf students. In a scene reminiscent of l’Épée, Stokoe observed his students signing and came to believe it was a unique visual language. He discussed this with his faculty colleagues, who told him this was nonsense, that signing was simply a bad representation of English and lacked any structure of its own.
The view that signed languages lack linguistic structure is expressed linguistically in the claim that they lack duality of patterning, that is, that they do not have a finite inventory of meaningless elements that combine to form meaningful elements (Pulleyblank 1987). Whenever linguists were asked to consider signed languages, the response was that signed languages are not vocally produced and thus do not consist of sounds. On this view, signed languages have no phonology; the very name, after all, refers to ‘sound’.
Stokoe was not a trained linguist; his PhD had been in medieval literature, and he realized he needed a background in linguistics in order to show that signed languages do have a phonology. So in the summer of 1957 he studied with linguists George Trager and Henry Lee Smith at the Summer Institute of Linguistics in Buffalo, New York. From them he learned structuralist linguistics and phonology. The outcome was a pioneering book, Sign Language Structure: An Outline of the Visual Communication Systems of the American Deaf (Stokoe 2005).Footnote 1 Stokoe demonstrated that signs do have a level of structure equivalent to the phonemes of spoken words. He called these minimal elements cheremes, from the Homeric Greek root cher- ‘hand’. He defined cheremes as “that set of positions, configurations, or motions which function identically [to phonemes] in the language; the structure point of sign language (analogous to ‘phoneme’)” (Stokoe 2005: 33). Relying on the structuralist approach, he observed that “Like consonant and vowel,” the cheremes of position, configuration, and motion “may only be described in terms of contrast with each other” (Stokoe 2005: 20). These three structural elements became known as location, handshape, and movement.
Stokoe moved beyond this structural view in his later writing. Structuralist assumptions, however, remained embedded within sign linguistics with significant implications for what is considered to constitute phonology, the lexicon, grammar, and constructions in a signed language. One significant impact of the structuralist heritage is the notion that the phonology of signed languages constitutes a finite set of elements, a listable inventory of meaningless building blocks for signs. As we will see, the linguistic status of location as one of those phonological elements emerges as a problem in the analyses of grammatical constructions.
16.3 Constructional Approaches
In this section we review constructional approaches to the analysis of grammatical phenomena in signed languages. We start with a proposal for Construction Morphology (Section 16.3.1) and then move to grammatical constructions (Section 16.3.2). One important issue that arises in research on grammatical constructions in signed languages pertains to the relation between language and gesture as components of grammatical constructions (Section 16.3.3).
16.3.1 Construction Morphology
One of the assumptions sign linguistics has inherited from the structuralist tradition is that linguistic knowledge is divided into two distinct categories, the lexicon and the grammar, each constituting a separate ‘module’ requiring a special set of primitives. Lepic and Occhino (2018) offer a Construction Morphology approach to the analysis of signed language structure as an alternative to structuralist sign morphology.
Sign language linguists traditionally distinguish a set of non-compositional, core lexical signs from multimorphemic classifier construction signs.Footnote 2 Core lexical signs have standard citation forms and meanings that are considered to be unpredictable from their sub-lexical structure and are assumed to reside in the lexicon. Classifier constructions exhibit more variability and transparency. Classifier signs are assumed to be productively derived in the grammar.
However, upon closer examination there is a gradient between core lexical and classifier signs. Signs like MEET in ASL are categorized in the core lexicon, with unpredictable forms and meanings (Brentari & Padden 2001). The sign MEET can, however, be modified to create a morphologically related sign such as MISS-EACH-OTHER, which is categorized as a classifier construction. This raises the question of the relation between core lexicon and classifier constructions. One answer has been to claim that with repeated use, classifier constructions lexicalize. Viewed from the opposite direction, core lexical signs may be used in ways that reveal the compositional, transparent morphological structure characteristic of classifier constructions; that is, they may delexicalize.
In the Construction Morphology approach (see also Chapter 4), signs are morphological schemas containing both specific and schematic aspects of form and meaning. Certain sign constructions are highly specified morphological schemas with fixed pairings of form and meaning. MEET consists of a specific handshape (index finger) on both hands and a specific movement (hands approach and contact in front of signer). Others exhibit partially fixed or schematic structure. In MISS-EACH-OTHER the handshape is specified as for MEET; the movement, however, is schematic and contextually determined in use. Analyzability is thus a matter of the degree of entrenchment and conventionality of each component element as well as of the composite construction.
An example of the Construction Morphology approach is the ‘movable object’ construction, a family of signs in which two ‘A’ hands (a closed fist with the thumb extended) move in various ways relative to one another. The ASL signs FAR, CHASE, and FOLLOW are members of this family, along with several others (Frishberg & Gough 2000). Associations of form and meaning across these three signs are extracted to create a morphological schema. The A-handshape remains as a constant, specific aspect of the form; movement and configuration of the hands are schematic components. The morphological schema analysis makes two predictions: “First, conventional (lexical) sign constructions that instantiate a morphological schema are expected to retain analyzable internal structure, even as they begin to gradually take on more idiomatic meanings,” and “Second, signers are expected to productively modulate their articulation of a schematic sign construction” (Lepic & Occhino 2018: 160). This productive modulation produces a classifier construction. As an example, Lepic and Occhino offer an excerpt from an ASL news story discussing the 2015 United States Democratic Party primary polling. At one point, Bernie Sanders was trailing Hillary Clinton by twenty-one points, but then he started catching up and at the time of the report was leading. In the description, the two fixed A-handshapes of the movable object construction represent the polling ranking of Sanders and Clinton; changing the relation between the hands instantiates the schematic relation between two entities in the construction, in this context Sanders’ falling back or catching up. Lepic and Occhino conclude that requiring the signs FALL-BACK and CATCH-UP to be categorized as either core lexical signs or classifier constructions is a vestige of the structuralist tradition.
Lepic and Occhino also explore what they call the language vs. gesture problem: the assumption of a categorical division between language and gesture. They note that previous analyses have tried to distinguish the two in terms of a categorical distinction between elements that are listable, analyzable, and conventional (language) and elements that are holistic, context-dependent, and defy rule-based generalizations (gesture). The result of this view, they say, is that any gradient aspect of signing must be considered gestural and non-linguistic by definition.
Under a Construction Morphology approach, however, gradience and schematicity are aspects of all constructions. This complements what is known from usage-based approaches such as that of Bybee (2010), who observes that all types of units proposed by linguists show gradience, in the sense that they exhibit variation within the domain of the unit (different types of words, morphemes, syllables) and difficulty in setting the boundaries of the unit.
16.3.2 Grammatical Constructions
Research applying Construction Grammar approaches to signed languages is still relatively rare. This section presents research on the discourse functions of constructions with certain ASL verbs. Hou (2022a) investigates recurring constructions that involve a high-frequency sign of visual perception glossed as LOOK and a family of ‘look’ signs. The LOOK sign exhibits two broad functions: LOOK/‘vision’ references literal or metaphorical vision, and LOOK/‘reaction’ signals a person’s reaction to a visual stimulus. These two major constructions were identified on the basis of a corpus of 706 tokens and 36 types from the family of look signs. LOOK/‘vision’ was found to occur in more diverse syntactic environments, including:
(a) presence of an explicit object in a post-verbal position;
(c) co-occurrence with negators in pre-verbal position;
(d) formation of a complex predicate by co-occurrence of LOOK with another verb; and
(e) nominalization of LOOK (e.g., ‘reminiscence’).
LOOK/‘reaction’, on the other hand, tended to occur in first-person expressions representing the signer’s attitudinal stance. This first-person LOOK/‘reaction’ construction appears to be grammaticizing as a highly conventionalized unit. Hou proposes three constructional schemas. Constructional schemas (1) and (3) correspond to LOOK/‘vision’ and LOOK/‘reaction’, respectively; constructional schema (2) is ambiguous between the two.
(1) (subject: agent) – LOOK/‘vision’ – (object)
‘Hey, look at me, could you please look me in the eye?’
(2) (subject: agent) – LOOK/‘vision’ – reaction
‘I was assigned to read Edgar Allan Poe, I read it, it went over my head.’
In this example, read is expressed by the sign LOOK. The first instance expresses vision with a book as the object; the second is ambiguous because the subsequent single sign over my head expresses the signer’s reaction that Poe was too difficult to understand.
(3) (subject: experiencer) – LOOK/‘reaction’ – reaction
‘He was like oh I see, you’re deaf, got it.’
Hou (2022b) adopts a usage-based approach to describe the argument structure of directional verbs in ASL.Footnote 3 Canonically, directional verbs consist of a path movement between locations determined by the arguments. For example, GIVE in a transfer construction might move between two locations, the first corresponding to the agent and the second to the recipient of the giving act.Footnote 4
Hou examined two datasets totalling 494 tokens of seven highly frequent ASL directional verbs: ASK, TELL, REMIND, AWARD, GIVE, CONVINCE, and LOOK-AT. The verbs were grouped into those that can be used in reported speech constructions (ASK, TELL, REMIND), passive constructions (AWARD, GIVE, CONVINCE), and stance verb constructions (LOOK-AT). The analysis revealed two types of reported speech construction (RSC) schemas:
RSC Type 1 schema: [(subject) {ASK, REMIND, TELL …} (object) [CA: …]]Footnote 5
RSC Type 2 schema: [(subject) {ASK, REMIND, TELL …} (object) …]
Four types of passive (and reflexive) constructions were identified:
Passive construction Type 2: [… GIVE.3 object1 TO object2 …]
Passive construction Type 3: [… (subject) AWARD.3 …]
Passive/reflexive construction Type 4: [… PRO.1 CONVINCE.1 …]
Finally, LOOK-AT appeared in two constructions, as described above (Hou 2022b):
LOOK/‘vision’ construction: [(subject) (modal) (negator) LOOK/‘vision’ (object)]
LOOK/‘reaction’ construction: [(subject) LOOK-AT/‘reaction’ X-reaction]
Hou suggests that focusing on the discourse function of the verb can help resolve theoretical questions concerning the status of these constructions, specifically whether argument locations are linguistic or gestural.
16.3.3 Constructions and Gesture
Unburdened of the need to defend signed languages against the claim that they are nothing more than gesture, sign linguists have begun to explore the relationship between the two systems. Diachronic research suggests that gestures become incorporated over time into the linguistic systems of signed languages through lexicalization and grammaticalization (Wilcox 2004, 2005; Pfau & Steinbach 2011; Janzen 2012). In general, the process starts with a manually produced gesture that enters a signed language as a lexical morpheme; that lexical sign then acquires grammatical meaning. For example, it has been proposed that a departure gesture used in the Mediterranean region entered French Sign Language (LSF) as the lexical sign PARTIR ‘leave’ (Janzen & Shaffer 2002). Because ASL is historically related to LSF, the sign also appeared in ASL at the turn of the twentieth century with the lexical meaning ‘to depart’. It also occurs in ASL with a more grammatical function, marking ‘future’.
The second way gesture and sign may interact is by co-occurring in utterances. This synchronic relationship bears directly on the nature of constructions, since it suggests that grammatical constructions may consist of both linguistic and gestural components (see also Chapter 15). One candidate for classification as a gestural component is the location at which signs are directed, for example in pointing and in verbs that mark arguments by spatial location. Liddell (2003) points out the difficulties of specifying locations in pointing constructions (which he calls pointing gestures), in indicating verbs,Footnote 6 in which arguments are marked by location, and in other constructions. The problem arises because these expressions can use innumerable locations to identify arguments. For example, consider the ASL verb ASK-QUESTION used in questioning constructions. When a signer asks an addressee a question, ASK-QUESTION is directed at the addressee’s chin. The specific location changes depending on the spatial location and relative heights of the addressee and the signer. As Liddell (1990) observes, if the signer were facing an exceptionally tall man, ASK-QUESTION would be directed to a location considerably higher than the signer. Other signs are directed at other parts of the body: GIVE is directed at the addressee’s chest and COMMUNICATE-TELEPATHICALLY at the addressee’s forehead (Liddell 2003). Many other factors can, of course, determine the spatial location of the addressee. In all of these cases, where the addressee or some conceptualized addressee is located determines the location at which these signs are directed. Liddell attributes these properties of location to gradience and concludes that these facts about location “are inconsistent with the claim that there is a locus associated with the addressee toward which signs are directed. If there were such a locus, all directional signs referring to the same entity (e.g., the addressee) would be directed toward that single locus” (2003: 76).
On this view, the spatial locations used to specify arguments form an open set of unlistable locations. Since location is a phonological element, the claim implies that these locations do not have linguistic status and are instead categorized as gesture. In a GIVE construction, for example, the handshape and certain other properties of the sign are regarded as linguistic, while the location at which the sign is directed is classified as gesture. Pointing, agreement, and many other grammatical constructions in signed languages are thus seen as combinations of linguistic and gestural elements. Liddell recognizes the need to specify phonological locations, but since these specifications are seen as gesture, he concludes that “Each individual verb has specific gestural characteristics associated with it” (2003: 139).
Although Wilbur (2013: 222–223) does not accept Liddell’s argument, she summarizes it succinctly:
Liddell (2003, 2011) has argued that since directional verbs move between spatial locations associated with referents, and since there are an infinite number of possible points, the forms of these verbs are unlistable, and are therefore just gestural indications of the referent … Thus, the argument goes, if the locations in space that are used for indexic and referential pointing are not listable, they cannot be part of the grammar, and therefore must be external to it, that is, part of ‘gesture’.
Schembri and colleagues (Schembri et al. 2018) apply Liddell’s analysis, examining indicating verb constructions in detail to determine whether they are compatible with an analysis in which agreement is a morphosyntactic mechanism that copies features from one linguistic unit (the controller) to another (the target). Accepting Liddell’s claim that in such constructions any movement of the signing hand towards a location signals an association with a referent (in the same way as a non-signer’s pointing gesture would), they extend his analysis by proposing that indicating verbs are typologically unique unimodal constructions, comparable to multimodal constructions in spoken languages (Schembri et al. 2018). They propose that indicating verbs are conventionalized pairings of form and meaning that consist partly of a monomorphemic sign specified phonologically for handshape, orientation, and movement, and “partly of a deictic gesture which has its own pragmatic properties” (Schembri et al. 2018: 12). They note that if directionality in indicating verbs is a type of gesture rather than a person agreement marker, then the directionality of signs should have more in common with directionality in co-speech gestures than with agreement marking. Following Liddell’s (2000, 2003) argument, they go on to show that the directionality of indicating verbs is controlled not by a formal or semantic property of the controller noun phrase, but by the real or imagined location of the referent.
They argue that an agreement analysis fails because the location of a referent is not reflected in any grammatical feature of the controller in signed languages; that is, they say, there is no evidence that all nouns of a particular signed language “have an inherent grammatical feature of location, with a fixed set of values” (Schembri et al. 2018: 17), as would be required for a feature-copying agreement system.
They support their analysis of indicating verbs as fusions of signs and pointing gestures with evidence from a corpus-based study of indicating verbs in British Sign Language. Following Liddell’s claim that signers direct indicating verbs towards real or imagined referents, they predict that indicating verbs should co-occur with constructed action, in which the signer’s face and body represent an imagined referent’s actions, utterances, or feelings. Fenlon et al. (2018) showed that the presence of constructed action significantly favored modification of indicating verbs to mark subject and object arguments. Schembri et al. conclude that the indicating verb system of signed languages is not an agreement system, because the way in which these constructions exploit space for deictic reference “does not always result in the systematic covariance normally associated with agreement systems” (2018: 29). Instead, they argue that the similarities to gesture support an analysis of indicating verbs as typologically unique, unimodal fusions of morphemes and pointing gestures that function as a construction for reference tracking.
Janzen (2017) approaches the question of whether signed language constructions contain gestural elements from the perspective of Enfield’s (2009, 2013) proposal that expressions are composite utterances. According to this view, utterances are complete units of social action with multiple components embedded in a sequential context, whose meaning draws on “both conventional and non-conventional signs, joined indexically as wholes” (Enfield 2009: 223).
Janzen examines topic-comment constructions and perspective-taking constructions in ASL to identify their linguistic and gestural elements. In ASL topic-comment constructions such as (4), the topic phrase is marked by raised eyebrows and potentially a backward head tilt; at the beginning of the comment phrase the eyebrows and head return to a neutral position. This non-manual grammatical topic marking has been shown to have grammaticalized from a generalized questioning gesture (Janzen 1999).
(4) [TOMORROW NIGHT]-top WORK
‘Tomorrow evening I am working.’
Janzen suggests that in this construction, the signs TOMORROW, NIGHT, and WORK are conventional elements that are lexical, listable, and have a standard form across ASL communities. The status of the facial and head gestures, he contends, is less clear. While these non-manual topic marking elements are conventional in that they are “interpreted as signaling a topic phrase,” Janzen (2017: 522–523) suggests that their “status as grammaticalized from a gestural body action does not preclude that the action is still gestural” and thus that the utterance is a hybrid composed of linguistic and gestural elements.
In a more complex example, a signer recounts how, while driving on the highway, she sees an oncoming police car and pulls her car off onto the side of the road. The signer produces the utterance KNOW R-O-A-D HIGHWAY.Footnote 7 Janzen offers two observations about this utterance. First, the topic “road highway” introduced by KNOW invokes a schema for highways that serves as a reference point for the shoulder area. Second, in this utterance KNOW is articulated on the signer’s cheek rather than at the canonical forehead location that would indicate a fully lexical word. Janzen notes that “there is a tendency for such alternate, non-canonical locations to be somewhat more likely when the sign has a grammaticalized function, and thus has lost its yes/no question interactional function” (2017: 525), and argues that this non-canonical articulation suggests that “KNOW in this utterance is a hybrid signifier and not a purely conventional one” (2017: 524).
In the same story, the signer produces the utterance in (5):
(5) WINDOW, depic:lean-on
‘She was leaning on the car door.’
Janzen analyzes WINDOW as a reference point topic, with a comment “consisting of an entirely gestural depiction of someone inside the car leaning on the car door (with the window completely rolled down)” (Janzen 2017: 527). Thus, this utterance is analyzed as a complete, meaningful utterance consisting of one conventional lexical sign and one gestural element.
A third example of a composite utterance occurs in a comparative construction. In comparative constructions, the entities to be compared are positioned in contralateral and ipsilateral locations in front of the signer (Winston 1995). In Janzen’s example, the signer describes how the size of wolf populations changes with the size of their food source, using the contralateral and ipsilateral positions to indicate the two wolf populations, their comparative sizes, and the relative sizes of the food sources. Janzen observes that
The spatial positioning of each is considered here to be gestural. There is nothing linguistic that would necessitate the use of these spaces, although their conventionalization would indicate that their use has regularized as grammar, so that here we would propose that this construction represents an instance where something has entered the domain of grammar and yet has retained gestural components.
A final example of a composite perspective-taking utterance comes from the story about the driver pulling off to the side of the road. In this utterance, the signer depicts the driver sitting with her hands still grasping the steering wheel; her body and face indicate that she is looking down the road at the police car. Janzen notes that viewpointed gestural stances in depicted narrative events occur frequently in ASL utterances, and suggests that in this case the signer’s body and face are non-conventional gestures because “they portray the signer’s subjective take on a particular character’s interaction in a single event” (Janzen 2017: 532). Whereas Janzen categorizes the depiction of leaning on the car door as wholly non-conventional, this utterance is classified as partly conventional because the signer simultaneously uses “a conventionalized (depicting) verb for driving in ASL” (Janzen 2017: 532). Thus, this construction is classified as having conventionalized linguistic components (the sign for driving) and gestural components (the signer’s body and face).
The distinction between conventional and non-conventional elements, which is often equated with the distinction between lexical and gestural ones, is critical to the argument for sign–gesture composites and to the question of what constitutes a grammatical construction. Janzen warns against making a binary distinction between conventional and non-conventional elements of signed language utterances, suggesting that conventionality should be seen as a continuum. He concludes that the role played by gesture in signed languages has never been clear, and that there is still much to learn about the extent to which signed language utterances may be infused with gestural elements.
16.4 Cognitive Grammar and Sign Constructions
This section focuses on an approach to signed language constructions based on Cognitive Grammar (Wilcox & Occhino 2016; Martínez & Wilcox 2019; Wilcox & Martínez 2020). Cognitive Grammar is a radically austere construction grammar, claiming that lexicon, morphology, and syntax consist solely of assemblies of symbolic structures, each of which pairs a semantic structure with a phonological structure such that one is able to evoke the other (Langacker 1987, 1991, 2008). Semantic structures are conceptualizations that signers and speakers recruit to express meanings. The essential feature of phonological structures is that they can be overtly manifested.
Langacker (2005) compares Cognitive Grammar with other types of construction grammar, such as those of Goldberg (1995) and Croft (2001). All construction grammars assume that constructions subsume lexicon and grammar and consist of form–meaning pairings, but they differ in what is meant by form. In Cognitive Grammar, form specifically refers to phonological structure. Cognitive Grammar does not include what other types of construction grammar would call grammatical form. Goldberg (1995), for example, describes the pairing between a semantic level and a syntactic level of grammatical functions, and Croft (2001: 62) defines a construction as symbolic because it consists of “a pairing of a morphosyntactic structure with a semantic structure.” In the Cognitive Grammar approach, grammatical form “does not symbolize semantic structure, but rather incorporates it, as one of its two poles” (Langacker 2005: 105). The basic claim of Cognitive Grammar is that grammatical notions such as noun, verb, noun phrase, subject, and object “are semantically definable and inherent in symbolic assemblies” (Langacker 2005: 106); linear order is a dimension of the phonological pole rather than an aspect of grammatical form. Whereas other construction grammars treat such notions as irreducible grammatical primitives, Cognitive Grammar claims that only semantic and phonological structures are necessary, instantiating two fundamental domains of human experience: conceptualization (semantic structure) and perceptible forms (auditorily perceptible speech sounds and visually perceptible sign forms).
Constructions in Cognitive Grammar are complex assemblies of symbolic structures. Component symbolic structures are integrated at both the semantic and the phonological poles to form a composite symbolic structure. This integration takes place through correspondences that equate a schematic element in one component structure with a more specific element in another component structure; the two structures are “superimposed, their specifications being merged (or unified)” (Langacker 2003: 50).
We find Cognitive Grammar a productive approach for a number of reasons. Apart from the theoretical appeal of reducing an apparently abstract phenomenon such as grammar to the experiential level of human conception and perception, Cognitive Grammar is particularly well suited to analyzing a class of languages in which highly complex, simultaneous forms are the norm. Forms previously described as irreducible signs are revealed in a Cognitive Grammar analysis to be constructions consisting of highly complex assemblies of symbolic structures.
In the following subsections we present research, carried out within the Cognitive Grammar framework, on complex assemblies of symbolic structures. In Section 16.4.1, we describe nominal grounding as conceptual pointing. Section 16.4.2 presents an analysis of pointing constructions in signed languages. In Section 16.4.3, we describe the Proxy-Antecedent Construction, which allows signed languages to track referents in discourse. In Section 16.4.4, we describe placing constructions in signed languages, which consist of specific, meaningful locations that are created or recruited by the placing of non-body-anchored lexical signs. These meaningful locations are then used in later discourse to track referents. In Section 16.4.5, we present a Cognitive Grammar analysis of signed language agreement verbs. Finally, in Section 16.4.6 we describe the placing-the-signer construction, by which signers indicate changes in character perspective in narratives and reported dialogue.
16.4.1 Nominal Grounding as Conceptual Pointing
Grounding refers to expressions that establish a connection between the ground (the speech or sign event, its participants, and the immediate circumstances, including the time and place of speaking or signing) and the content evoked by a nominal or finite clause. Nominal grounding permits the signer or speaker to direct the interlocutor’s attention to the intended discourse referent, and so may be understood as a kind of conceptual pointing. Physical pointing is a type of linguistic symbol, and the act of pointing is a good point of departure for understanding nominal grounding as a kind of mental pointing (Langacker 2016). Figure 16.1 depicts a prototypical act of pointing.
In the diagram, G is the ground in the current speech event. S and H are the speaker/signer and hearer/addressee. The current discourse environment includes the visually accessible immediate physical context. This onstage (OS) region contains a number of entities (the circles) that could be singled out by pointing. The solid arrow represents the pointing finger directed at FOC, the focus of attention. The act of pointing instructs the addressee to follow its direction, both visually and conceptually. As a result, both interlocutors focus their attention on the same entity, the intended referent.
16.4.2 Pointing Constructions
As we saw in Section 16.3.2, spatial location is used extensively in signed languages, such as in pointing and in marking verb arguments. Yet spatial location has posed a problem for previous analyses, leading many linguists to deny linguistic status to these spatial elements of expressions and to claim instead that signed languages are fusions of gesture and language. In this section we show that this problem is resolved by a Cognitive Grammar symbolic analysis in which spatial location is a schematic phonological component of a variety of sign constructions.
Pointing is a construction consisting of two component symbolic structures – the means of directing attention, called a pointing device, and the focus of attention, a Place,Footnote 8 each consisting of a form and a meaning (Wilcox & Occhino 2016; Martínez & Wilcox 2019). Figure 16.2 depicts the two component symbolic structures and the (bolded) composite pointing construction. Ellipses in the phonological pole of the pointing device indicate schematicity, subsuming, for example, index finger, hand, eye gaze, or body orientation. The only phonological specification of the pointing device is that it has to be capable of directing attention. The pointing device instructs the addressee to follow its direction, so that both participants in the communicative event focus their conceptual attention on the same entity, the Place symbolic structure. The only phonological specification of Place is a spatial location (LOC).

Figure 16.2 Pointing construction
The semantic pole of Place is characterized schematically as ‘thing’ – something conceived through grouping and reification as a single entity. Places arise through the process of schematization acting on our perceptual and experiential world of actual usage events. The baseline Place is a perceptible physical object. Through experience with the world, networks of Places are created with varying degrees of semantic and phonological schematicity/specificity.
Conceptually, a pointing construction selects a particular referent from a pool of candidate entities in our mental universe. It does so by mentally pointing to the selected entity (the arrow in Figure 16.3), thereby profiling it (bolded) as the focus of attention. This candidate pool (large circle) and the potential referents (small circles) are elements of the semantic pole of the pointing device. Semantically, the pointing device does not make reference to a specific entity; rather, the selected referent is a schematic dependent structure internal to the pointing device’s semantic pole (cross-hatching in Figure 16.3 indicates schematicity). This schematic structure is elaborated by the semantic pole of a Place, depicted in the right portion of Figure 16.3.

Figure 16.3 Pointing device semantic pole
16.4.3 Anaphora and Proxy-Antecedent Constructions
Pronominal anaphora relies on conceptual reference point relationships (Langacker 1993, 2000; Van Hoek 1997). The reference point relationship is shown in Figure 16.4, in which C is the conceptualizer; R is the reference point, a salient entity in the current discourse space; T is the target structure to which R provides access; and D is the dominion, the set of potential targets.

Figure 16.4 Reference point
A spatial example from English illustrates how reference points operate (Wilcox & Occhino 2016). Spatial reference points are commonly used in providing directions. Suppose we learn that a new café, Carol’s Croissant Cottage, has opened, and we ask a friend where it is. Our friend might start by providing a reference point: “Do you know the intersection of Maple Street and Sugar Lane?”Footnote 9 This reference point then provides mental access to the target: “Carol’s Croissant Cottage is on the southeast corner.” Every nominal can serve as a reference point, with potential targets falling within its dominion. The intersection of Maple Street and Sugar Lane could be a reference point for several other businesses, or for a recent traffic accident.
The semantic pole of an anaphoric pronoun profiles a schematic thing (indicated in Figure 16.5 by ellipses in the target). It also incorporates the assumption that the speech act participants have mental access to the intended referent, the full nominal antecedent that serves as the pronoun’s reference point. Mental access is provided by the reference point relationship (indicated by the dashed arrow): The pronoun target is in the dominion of the reference point antecedent, which is presumed to be salient and accessible to the interlocutors in the current discourse context. In the utterance “I just bought a new Honda CR-V Hybrid. I really love the gas mileage it gets,” the pronoun it is a target in the dominion of the contextually salient reference point Honda CR-V Hybrid.

Figure 16.5 Antecedent-anaphor as reference point
In signed languages, pointing signs function as anaphoric pronouns, demonstratives, body part labels, and more. The reference point analysis of the relationship between anaphoric pronouns and antecedents applies to signed languages as well, with one difference. Rather than consisting of two elements, an antecedent and an anaphor, antecedent-anaphor constructions in a signed language often require three elements: the antecedent, a proxy antecedent, and the anaphoric pronoun.
For example, in the following Argentine Sign Language narrative excerpt, the signer uses a pointing construction in a more complex nominal, including an embedded relative clause (marked with angle brackets).
POSS1 NEW TEACHER point(right) < SAME(rel.) PRO1 1TELL2(perf) point(right) TO-RESEMBLE POSS1 MOTHER TO-RESEMBLE point(right) > / YESTERDAY TO-BE-ABSENT(perf).
‘My new teacher, the one I told you resembles my mother, was absent yesterday.’
The signer first signs a noun, TEACHER, which is then followed by a pointing sign directed to a particular spatial location, establishing an association between the referent TEACHER and the location. Later, the signer again points to the location on the right to refer to the teacher. In this construction, the first point creates a Place symbolic structure on the right side of signing space. The semantic pole of the Place symbolic structure profiles a schematic thing, which in this construction is associated with and elaborated by the semantic pole of the antecedent TEACHER. This Place structure then serves as a proxy antecedent for TEACHER. When the signer later wants to invoke the antecedent, she does not point directly to the lexical sign TEACHER (which is located at the head, and thus would require pointing to the signer’s head), but to the phonological spatial location of the proxy antecedent Place. In Figure 16.6, the semantic poles of the antecedent (TEACHER) and the proxy antecedent are connected by dotted correspondence lines indicating that the two conceptually map to the same entity – the antecedent TEACHER instantiates the schematic semantic pole of the proxy antecedent Place. Later in the narrative, the signer again points to this proxy antecedent Place; here the Place functions as a pronominal anaphor. The entity designated by the semantic pole of the anaphoric pointing construction and the entity that instantiated the proxy antecedent, the prior lexical sign TEACHER, are conceptually mapped to the same entity: They are co-referential.

Figure 16.6 Proxy-Antecedent Anaphor construction
Proxy-Antecedent constructions exhibit a feature unique to signed languages: conceptual co-reference is expressed by recruiting the same phonological location for the proxy antecedent and the anaphor. In other words, conceptual overlap is expressed symbolically (and iconically) by phonological overlap.
16.4.4 Placing Constructions
Places are also components in placing constructions in signed languages. The term placing was introduced by Clark (2003: 185), who identified pointing and placing as two forms of indicating, that is, of creating indices for things. In pointing, speakers direct their addressee’s attention to the object they are indicating. In placing, “speakers try to place the object they are indicating so that it falls within the addressees’ focus of attention” (Clark 2003: 187). Martínez and Wilcox (2019) extended the concept of placing to signed languages. First, they observed that in signed language constructions, signs are communicative objects that can be placed in spatial locations. Second, they identified two types of placing: create-placing, in which a new Place is created, and recruit-placing, in which the signer recruits an existing Place. Figure 16.7 depicts a generic placing construction. S is the signer, I is the interlocutor, and G is the ground. The bold line with a ball end indicates the physical act of placing. The dashed line with a magnet end indicates the subtle distinction between pointing and placing: Rather than directing attention, placing locates an entity so that it falls within the addressee’s focus of attention, and thus attracts the attention of the interlocutor to the Place.

Figure 16.7 Placing
An example of placing in Argentine Sign Language (LSA) is given in Martínez and Wilcox (2019). The signer introduces the biography of José de San Martín, a hero of the independence of Argentina, Chile, and Perú. At the beginning of the narrative, the noun PERSON is placed on the right side of the signer, creating a new Place (Figure 16.8). The schematic semantic pole of the Place is elaborated by the semantic pole of PERSON, and the schematic phonological pole of the Place is elaborated by the location on the right of the signer. Constructions specific to a signed language often specify phonological locations for new Places, for example, specifying that new discourse entities are introduced on the signer’s dominant side, or specifying the locations of referents in comparative constructions, as seen in Section 16.3.3. The placing construction locates PERSON as a newly introduced referent, intersubjectively identified and accessible to the interlocutors.

Figure 16.8 Create-Placing and Pointing in Argentine Sign Language
Once the referent Place is created, the signer is able to refer to it in subsequent discourse. For instance, the sign RENOWNED (Figure 16.8) incorporates the nominal referent ‘person’ as a participant of the adjectival relation by directing the sign toward the ‘person’ Place. RENOWNED profiles one focal participant and associates the property of being famous with San Martín. Later in the discourse, the signer refers anaphorically to the same referent with pointing constructions directed to the Place instantiated by San Martín.
A more complex use of pointing comes from an LSA video on the official account of the Movimiento Argentino de Sordos (MAS) in support of a bill recognizing Argentine Sign Language. In the video, two deaf leaders explain to those gathered that the MAS movement should not attempt to label or classify hearing people as inherently wrong or bad. The signer introduces what will become the discourse topic: Widespread ideology leads hearing people to believe that deaf people are mentally challenged, not equal to hearing people, or deaf-mute. He does this by first signing IDEOLOGY, a two-handed sign with a location at the head (Figure 16.9). Then, while his non-dominant hand is still in the head location for IDEOLOGY, he begins to point to it with his dominant hand. By the time the signer completes the pointing action, his non-dominant hand has moved down to a neutral position (as seen in the second panel of Figure 16.9); the sign IDEOLOGY is no longer present, and so he is pointing to the spatial location formerly occupied by IDEOLOGY. This pointing construction creates an ideology-Place.

Figure 16.9 IDEOLOGY Pointing and Recruit-placing
To express the idea that the goal is to change society’s ideology pertaining to deaf people, the signer places the sign CHANGE in the newly created ideology-Place (Figure 16.9). The schematic location of CHANGE is elaborated by the location of the ideology-Place, which has been previously elaborated by the phonological location of IDEOLOGY. Semantically, CHANGE profiles an action chain which includes an unexpressed agent and a theme, the changed entity. In this construction, the theme is elaborated by the ideology-Place, which in turn conceptually maps to IDEOLOGY.
Figure 16.10 depicts IDEOLOGY (a) and the ideology-Place (b) created by the pointing construction. IDEOLOGY is a lexical noun with full phonological specification (HC is hand configuration, MOV is movement), including the head location. IDEOLOGY is grounded as a full nominal by the proximal (downward-directed) pointing device, which also creates an ideology-Place. IDEOLOGY thus corresponds to and elaborates the schematic Place. CHANGE is then placed at the location of the ideology-Place, indicating that ideology is the changed entity. CHANGE is shown as a construction consisting of two symbolic structures: the action chain process ‘change’ (double arrow) and the theme ‘ideology’. CHANGE incorporates a schematic symbolic substructure, indicated by the rectangle enclosing the theme (TH) and the location (LOC). The semantic pole of this symbolic substructure specifies the theme. The schematic phonological pole is elaborated by the placing construction.

Figure 16.10 CHANGE Recruit-placed
16.4.5 Place and Placing in Agreement Constructions
Kibrik (2019: 76) categorizes approaches to agreement in spoken languages into two types: a form-to-form approach and a cognition-to-form mapping approach. The traditional form-to-form approach claims that an agreement feature originates in one linguistic element, the controller, and is copied onto another, the target. In the cognition-to-form mapping approach, agreement features are associated with referents in the cognitive representation. A similar approach is suggested by Croft, who proposes that agreement affixes express a symbolic relation (rather than a syntactic relation) indexing the referent, and thus treats agreement as ‘double indexation’ (Croft 2001: 229).
Agreement in Cognitive Grammar is analyzed as multiple symbolization: “That is, information about some entity is symbolized by more than one component structure within the same symbolic assembly and thus has multiple manifestations in a single complex expression” (Langacker 2008: 188). Agreement is a matter of the same information being symbolized in multiple places and thus is a special case of the conceptual overlap characteristic of all grammatical constructions (Langacker 2008: 347). Multiple symbolization suggests a Place and placing account of signed language agreement as a special type of conceptual overlap. As we have shown, the predominant way conceptual overlap is symbolized in these constructions is by phonological overlap. Signed languages have several types of constructions in which agreeing elements are symbolized by phonological overlap of Places. One such type is schematically described as a verb of transfer containing two schematic Places, phonologically expressed by its beginning and ending locations. Semantically, the Places are specified only for agent (beginning Place) and recipient (ending Place). These verb-internal Places are elaborated externally by nominal Places which have been fully elaborated, typically in the previous discourse.
An example of this type of agreement construction in LSA occurs in a narrative about a famous event in Argentina. The signer says, “This man, Lagomarsino, the one who gave the gun to Nisman …” The signer first points to a location in front and slightly to her right, creating a Place, which for the moment remains schematic: We don’t know what or who this Place refers to. She then signs MAN, a body-anchored sign produced at the mouth. Next, she places the sign PERSON at the newly created Place. She then fingerspells the name Lagomarsino, followed by a relative marker meaning ‘the one who’ directed at the Place. Finally, she signs the directional verb GIVE, moving from the Place to a location on her left.
Figure 16.11 diagrams this excerpt. The solid line (a) is the first pointing construction which creates a Place (b). Line (c) represents the placing of PERSON. Line (d) indicates the relative marker directed at the Place. Finally, circle (e) shows the initial location of the verb GIVE (indicated by an arrow), and (f) shows the final location.

Figure 16.11 Lagomarsino discourse segment
Figure 16.12 depicts the semantic pole of the structures in this discourse segment. The arrows indicate the appearance of Place structures across the segment. A semantically schematic Place is created by pointing. Body-anchored MAN elaborates the entity type; we now know that the Place refers to a man. PERSON, placed at the man-Place, provides phonological substance to the location of the Place. Fingerspelled “Lagomarsino” fully elaborates the Place: We now know the referent of the Place. The relative marker, placed at the Lagomarsino-Place, tells us that the referent at this Place is the one who will be doing something. Finally, we learn that Lagomarsino gave a gun (to the person who will be described in the subsequent discourse frame). The initial phonological location of GIVE is placed at the Lagomarsino-Place.

Figure 16.12 Lagomarsino semantic pole
Figure 16.13 depicts the semantic and phonological correspondences. The semantic poles of the initial Place, MAN, PERSON, Lagomarsino, the head of the relative marker, and the agent of GIVE map onto the same conceptual entity. At the bottom of the diagram, the phonological pole correspondences are shown: The initial Place, PERSON, the head of the relative marker, and the agent of GIVE (its initial location) are produced in the same spatial location. MAN is body-anchored and thus has a different phonological location (near the mouth), and Lagomarsino is fingerspelled at a location in neutral signing space.

Figure 16.13 Lagomarsino semantic and phonological poles
A recurrent feature of multiple symbolization or so-called agreement constructions in signed languages is the instantiation of a schematic Place to identify verb arguments: Conceptual overlap or semantic co-reference is symbolized by phonological overlap. This double overlap for the Lagomarsino example is shown in Figure 16.14. The initial Place structure (created by a pointing construction), placed PERSON, the relative marker, and the initial symbolic substructure (agent) of GIVE all map onto the same entity in conceptual space – Lagomarsino. They also map onto the same location in phonological space.

Figure 16.14 Double overlap in multiple symbolization
16.4.6 Placing-the-Signer Constructions
The previous sections presented Place as a symbolic structure and introduced the concept of placing, in which the placed linguistic element was a sign. Placing can be extended to include the signer as a symbolic structure (Wilcox et al. 2022).
Dialogue in narrative can be presented either as a third-person report (indirect quotation) or in the first person (direct quotation) (Chafe 1982). Speakers mark these constructed dialogues with conventional grammatical constructions, or by taking on the voices of characters through changes in pitch, voice quality, and prosody (Schiffrin 1981; Tannen 1986). In doing so, the speaker is said to ‘take the point of view’ of a character in the narrative.
Just as speakers have ways of presenting a point of view by taking on the vocal and behavioral qualities of characters, signers use their whole bodies and the space surrounding them to convey viewpoint in reported dialogue. Padden (1986) offers an example, depicted in Figure 16.15. The signer says, “The husband goes, ‘Really, I didn’t mean it.’” In the first frame the signer faces her actual interlocutor and signs HUSBAND, identifying who will be speaking in the next sequence. The next four frames present the constructed dialogue REALLY ME NOT MEAN “Really, I didn’t mean it” as signed by the husband. To mark the constructed dialogue, the signer shifts her body to the right and directs her eye gaze at the husband’s virtual addressee.
These constructions in signed languages tend to include one or more of these phonological features: (i) a change in body orientation; (ii) a change in eye gaze direction; and (iii) a change in deixis (the body of the signer is rearranged to take somebody else’s point of view). These constructions also exhibit some or all of the following semantic features (Engberg-Pedersen 1993): (i) shifted reference: the use of pronouns to refer to somebody other than the sender/narrator; (ii) shifted attribution of expressive elements: the use of the signer’s face and/or body posture to express emotions or attitudes of somebody other than the sender/narrator in the context of utterance; and (iii) shifted locus: the use of the sender/narrator locus for somebody other than the sender/narrator.
Not only signs but also the signer may function as a symbolic structure that can be placed: The signer can move about, occupying different meaningful locations, either to create a Place or to recruit an existing Place (Wilcox et al. 2022). Such placing-the-signer constructions are used in narrative reporting of dialogue and events.
Signed languages are, quite literally, face-to-face visual languages. In signed language interaction, the canonical communication configuration is for one signer to face another signer at some culturally determined distance. Between the two is a line of sight. This arrangement can be called the canonical interactional configuration. In addition to this canonical configuration, signers adhere to two conversational principles for effective visual communication: (1) reduce excessive moving around from the point of view of the interlocutor, and (2) make your signs as visible as possible.
When narrating a story, the signer is constrained by the canonical interactional configuration and the visual communication principles: The signer is not free to occupy any location whatsoever, and the signs must remain visible. It may not seem obvious at first why we analyze this strategy for expressing reported dialogue as placing the signer: In what sense has the signer in Figure 16.15 been spatially ‘placed’? An instructional video designed to teach students of Brazilian Sign Language (Libras) how a single signer reports interactional dialogue illustrates how placing the signer works. In this video the instructor, Eduardo (on the right), and his colleague, Leonardo (on the left), first demonstrate a signed dialogue as it would actually take place between two interlocutors, maintaining the face-to-face canonical interactional configuration (Figure 16.16).

Figure 16.16 Demonstration of canonical two-person interaction
Eduardo then shows how the same interaction would be presented by a single narrator with reported dialogue (Figure 16.17). The simplest and most realistic way for a narrator to present a two-person reported dialogue to an audience would be to ‘act out’ the interaction by taking both roles – that is, by simply recreating Figure 16.16, moving between the spatial locations of the two participants. This would, however, violate the visual communication principles. Instead, Eduardo as the narrator remains in one location and, by changing the orientation of his body, alternately assumes the role of Eduardo or Leonardo.
A diagrammatic representation of the real two-person interaction and the strategy in which the signer is placed is depicted in Figure 16.18. The top portion (A) shows the original interaction with Leonardo (L) on the left and Eduardo (E) on the right. The construction in which the signer is placed to express the reported interaction is depicted in the lower portion (B). When Eduardo as narrator (N) presents Eduardo’s utterances in interaction with Leonardo, he rotates his orientation slightly to his right, indicating that he has assumed Eduardo’s Place. Virtually present Leonardo assumes a position directly in front of Eduardo to maintain the canonical interactional configuration. When presenting Leonardo’s utterances directed to Eduardo, Eduardo as narrator changes orientation in the opposite direction, thus indicating that the narrator has occupied the Leonardo Place; in doing so, he takes the role of Leonardo. Eduardo, as a virtual addressee, now assumes a position in front of Leonardo to maintain the canonical interactional configuration. Thus, the overall scene is presented with Eduardo ‘playing’ himself and Leonardo, who are alternately presented as virtual versions of themselves (represented by the dashed circles).

Figure 16.18 Placing the signer in reported dialogue constructions
In this instructional video we see the relation between a real two-person interaction and how a single narrator reports the interaction, maintaining the canonical interactional configuration while also abiding by the visual communication maxims. Using Figure 16.18 to depict the original dialogue underlying Figure 16.15 would have the husband (L) facing the actual addressee (E), as in the top portion of Figure 16.18. In the reported dialogue, the signer as narrator (N in the bottom portion of Figure 16.18) is placed by orienting herself to the left, thereby assuming the role of the husband (L). The husband addresses the virtual addressee, now located at E (dashed circle). The bottom portion (B) of Figure 16.18, minus the character labels (L and E), thus represents the schematic description of reported dialogue constructions. Change in orientation marks the placing of a narrator into two different Places, each representing a conceptualizer. In reported dialogue constructions, the conceptualizers are the conversational participants. By using schematic Place symbolic structures, the construction conceptually maps the semantic pole of the signer as a symbolic structure onto the semantic pole of a virtual discourse participant.
This description of a placing-the-signer construction has examined two-person interactions in reported dialogue. Placing the signer can also be used when presenting narrative about two characters interacting, and in narratives in which one character interacts with two different participants (Wilcox et al. 2022). One use of placing the signer occurs in an Argentine Sign Language translation, made by the deaf Argentine poet and storyteller Diego Morales, of the short story Continuidad de los parques by Julio Cortázar. In one section of the story, a man and his lover are talking to each other. In this excerpt, the signer as narrator clearly takes the perspective of the woman and signs directly into the camera, asking a question of the man, “Why do you have a scratch on your cheek?,” which places the audience in the role of the man (Figure 16.19). The signer then replies, taking the perspective of the man, telling the woman “Stop kissing me on the cheek” (Figure 16.20). This now places the audience in the role of the woman. In the placing analysis, the narrator and the audience have conventional, specific phonological locations (the former is the location of the actual signer; the latter is the location of the camera), which are the phonological poles of two Place structures. These two Places are then mapped onto the two lovers, each assuming the canonical interaction configuration with their interlocutor. Just as in the other strategies, the narrator must assume both roles. As he alternates between the man and the woman, the audience also alternates between the two interlocutors – that is, the narrator and the audience Places are alternately conceptually mapped onto the man and the woman. This strategy of placing the audience onstage has a dramatic effect, since the audience ‘becomes’ the characters within the interaction. The audience ‘sees through the eyes’ of one of the characters and thus feels included in the dialogue.

Figure 16.19 Lovers’ interaction (woman)

Figure 16.20 Lovers’ interaction (man)
Placing-the-signer constructions conceptually map the semantic pole of the signer as a symbolic structure onto the semantic pole of a virtual discourse participant, bringing the signer and the participant into conceptual correspondence. The constructions accomplish this by mapping the phonological pole of the signer’s Place onto the discourse participant’s Place. Placing-the-signer constructions are the grammatical means used to indicate changes in the conceptualizer, corresponding to conversational participants in narratives and reported dialogue. Placing-the-signer constructions also appear in other grammatical constructions, including fictive discourse and fictive interaction (Jarque & Pascual 2016), in which the Places may represent different stances of the same conceptualizer; passive constructions in ASL (Janzen et al. 2001); and evidentiality (Shaffer 2012; Jarque & Pascual 2015; Wilcox & Shaffer 2017).
16.5 Summary
From its beginnings with Stokoe’s pioneering work in the late 1950s, the field of signed language linguistics has blossomed into a rich body of research. Linguists now apply a variety of theoretical approaches to the study of the phonology, morphology, and syntax of signed languages.
Recently, research adopting constructional approaches has begun to appear. This work examines the structure of sign grammar in terms of constructions consisting of conventional pairings of meaning and form containing both fixed, specific components and schematic components, organized in complex structured patterns. Constructional approaches are beginning to show that the categorical distinction between monomorphemic lexicon and multimorphemic classifier expressions is no longer tenable, revealing instead a continuum. Constructional approaches are also being applied to multi-word expressions, again revealing patterns with schematic and specific components integrated into conventional complex assemblies. A unique feature of signed languages is their meaningful use of space. Constructional approaches, such as the Cognitive Grammar analysis of Place and placing described in this chapter, reveal that schematic spatial locations are conventional components of constructions. Such analyses may lead to further insight into the use of multiple symbolization in the manifestation of so-called agreement expressions. In addition, signers and their spatial locations are symbolic components of signed constructions in constructed dialogue, fictive interaction, evidentiality, passive constructions, and more.
Much more research adopting the constructional perspective is needed. One area that usage-based constructional approaches can address is the linguistic status of the components of signed language constructions: to what extent locations of real and imaginary entities in the spatial environment are elements of linguistic structure, and whether these and other spatial aspects of constructions are fully linguistic or integrations of linguistic and gestural components.