15.1 Introduction
To state that language use is fundamentally multimodal is uncontroversial for usage-based linguists. It has long been recognized that the primary setting of language, its ur-context (Cienki 2016: 605), is face-to-face interaction, in which we simultaneously draw on speech, gesture, posture, facial expressions, and other non-verbal cues to convey meaning. Yet it is only a fairly recent development that cognitive linguists have started to fully embrace the multimodal nature of language use by working with authentic, video-recorded discursive data and by developing theories to account for how semiotic modes work together in conceptualization.
A serious boost for multimodality research from a cognitive-linguistic perspective came from pioneering studies on multimodal metaphor and metonymy as expressed in co-speech gesture (Mittelberg 2006, 2019; Cienki 2008; Müller 2008; Cienki & Müller 2008) as well as in pictures and video (Forceville 2008; for an overview see Sanaz 2013 and Feyaerts et al. 2017). Over the past decade, other cognitive-linguistic paradigms, most notably Construction Grammar, have followed that path and widened their focus to the kinesic modalities. A growing number of construction grammarians have raised the issue of whether, in light of the inherent multimodality of human language use, the status of constructions as pairings of verbal forms and verbally encoded meanings needs to be reconsidered (Andrén 2010; Cienki 2015, 2016; Zima 2017a, 2017b; Zima & Bergs 2017; Feyaerts et al. 2017; Schoonjans 2018). At the same time, interactional linguists and gesture researchers have turned to Construction Grammar in search of a model of linguistic knowledge and cognitive representation that can account for the tight coupling of verbal and kinesic structures observed in language use (Lanwer 2017; Stukenbrock 2020; Debras 2021).
This convergent development, which one may hope will usher in a fully-fledged multimodal turn in Cognitive Linguistics (Zima & Brône 2015), originates in the very core of the usage-based model and its premise that all knowledge of language is abstracted from language use. The implications of fully embracing the multimodality of language use, though, are far-reaching for Cognitive Linguistics. The issue opens up the question of “what counts as language?” (Cienki 2016: 606) and thus what the research objects of Cognitive Linguistics should be. Furthermore, many theoretical debates that are ongoing within the field come to the fore with even greater saliency once we take a broader, multimodal perspective (Cienki 2017: 1). This also holds for the nascent field of multimodal Construction Grammar, which struggles with a number of theoretical and empirical issues and is occasionally met with skepticism within Construction Grammar and gesture studies alike (Ningelgen & Auer 2017; Lanwer 2017; Ziem 2017; Debras 2021). Therefore, my aim for this chapter is to present the current state of the ongoing debate on whether “we really need a multimodal Construction Grammar” (Ziem 2017: 1). I will start by giving a basic introduction to what gestures are, how they convey meaning, and why the discussion on the constructional status of gestural information concerns only co-speech gestures. There is no controversy that emblematic gestures (also called ‘emblems’) are constructions in their own right, just as signs of sign languages are (Hoffmann 2017). To illustrate co-speech gestures’ close integration with speech, I will show how they contribute to an utterance’s meaning at all levels and also touch upon issues of the temporal alignment between gestures and speech.
Both are crucial aspects to be borne in mind when exploring the possible existence of multimodal constructions and the nature of the constructicon.[1]
15.2 What Are Gestures and How Do They Convey Meaning?
Lay people often use the word ‘gesture’ as an umbrella term covering all sorts of hand movements, ranging from pointing gestures, iconics, and depictions to unspecific hand movements such as scratching one’s head or fiddling with one’s wedding ring. In gesture studies, the concept is on the one hand employed in a broader sense, encompassing all sorts of bodily articulators, such as the hands, the head, shoulders, arms, feet, and also facial expressions. On the other hand, the analytical focus is restricted to what Adam Kendon has termed ‘gesticulation’: “visible bodily action used as an utterance or as part of an utterance” (Kendon 2004: 7), or for short “utterance visible action” (Kendon 2014: 7). Gestures are thus produced with the intent to be semantically and pragmatically meaningful and are thereby an integral part of utterance construction. In Kendon’s words, they are “employed to accomplish expressions that have semantic and pragmatic import similar to, or overlapping with, the semantic and pragmatic import of spoken utterances” (Kendon 2014: 7). In a similar vein, Calbris (2011: 6) defines gestures as “visible movement[s] of any body part consciously and unconsciously made with the intention of communicating while speech is being produced” (my emphasis).[2] Both definitions emphasize gestures’ communicative meaning or deliberate expressiveness and hence exclude bodily movements that are not produced with the intent of encoding semantic-pragmatic meaning but rather reveal aspects of the speaker’s emotional or psychological state. The boundary, however, is not clear-cut, and the analysis of authentic discourse always reveals a number of ambiguous cases.
Nonetheless, there is consensus on what constitutes the core domain of co-speech gesture or ‘visible bodily action’: Gestures are kinesic movements that point towards a referent (present or imagined), depict a concrete or abstract referent, or serve to structure discourse.
David McNeill, one of the leading researchers in the field of psycholinguistic modality research, has therefore proposed a gesture typology comprising four types: deictics, iconics, metaphorics, and beats (McNeill 1992).[3] Deictic, iconic, and metaphoric gestures are referential in nature; that is, they relate to a referent either by pointing to it or by depicting it. This referent may be a concrete entity (iconics) or an abstract one (metaphorics).[4] Beat gestures (also called ‘batons’; Efron 1941; Ekman & Friesen 1969) are coordinated with the rhythm of the speech they accompany. Their relationship to speech is not semantic but discursive-pragmatic, as they are often used to stress or emphasize a particular aspect. With respect to form, they usually consist of a back-and-forth, up-down, or left-right movement.
Another gesture category, not mentioned in McNeill (1992), which may nonetheless also play a role in multimodal Construction Grammar, is that of recurrent gestures (Ladewig 2014), such as the palm-up open hand (Kendon 2004; Müller 2004), the throwing-away gesture (Bressem & Müller 2014), and cyclic gestures (Ladewig 2011). Their main characteristic is that they “show a stable form–meaning relationship” (Ladewig 2014: 1158) and are thus more conventionalized than the spontaneous gestures that fall within the four other categories. However, they have not (yet) developed into emblems such as, for example, the thumbs-up gesture. They are thus not fully conventional signs or constructions in a Construction Grammar sense with a speech-independent semantics, as is the case with emblematic gestures (for an overview of emblematic gestures, see Teßendorf 2014). Rather, the meaning of recurrent gestures is schematic. Most notably, Bressem and Müller (2017) propose that one such recurrent gesture, the throwing-away gesture, constitutes the gestural component of a verbo-gestural pattern expressing negative assessment, which may qualify as a multimodal construction in a Construction Grammar sense (explained in more detail in Section 15.4.1).
Other important facts about gestures pertain to how they are produced in time and in relation to speech, and to how they encode meaning differently from verbal language. Pioneering work by Adam Kendon (1980) identified different phases in the execution of a gesture. The central phase is the so-called stroke phase; it is this phase from which we extract the gesture’s meaning. The stroke phase is preceded by a preparation phase, in which the hands move from the rest position into position to perform the stroke (usually close to the center of the speaker’s gesture space; McNeill 1992). The stroke phase may be followed by a retraction phase, in which the hands move back into the rest position. Hold phases between these phases (e.g., a post-stroke hold) or within the stroke phase are also possible and often correlate with verbal disfluencies.
These gesture phases combine to form higher-level units: gesture phrases and gesture units. Gesture phrases comprise preparation phases and strokes, while gesture units involve the full movement cycle from preparation to retraction. Gesture phrases, gesture units, and speech are temporally aligned with each other in a particular way: The preparation phase usually precedes the articulation of the lexical affiliate, that is, the lexical element that is semantically co-expressive with the gesture. Concerning the gesture stroke, there is more controversy. Some studies argue that the stroke onset may start and even end before the affiliate is articulated (Ferré 2010; ter Bekke et al. 2020), while others report that the stroke coincides with the affiliate (Chui 2005; McNeill 2005). Focusing on the relationship between gesture and intonation, Loehr (2004) reports that the stroke most typically shortly precedes or coincides with the utterance’s focus accent. This is corroborated by follow-up studies (Jannedy & Mendoza-Denton 2005; Shattuck-Hufnagel et al. 2007). Although the details of the temporal alignment between speech and gesture are thus partly subject to debate, it is uncontroversial that the two are closely aligned, and this temporal alignment is mirrored in their semantic alignment, as co-speech gestures are generally considered to be co-expressive.
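The phase structure and alignment pattern described above can be made concrete with a small sketch of how time-aligned gesture annotations might be represented computationally. This is a hypothetical illustration only; the class and function names are my own and do not correspond to any established annotation tool or to the coding schemes of the cited studies.

```python
# Hypothetical representation of Kendon-style gesture phases and their
# temporal alignment with speech. Names and data are illustrative only.
from dataclasses import dataclass

@dataclass
class Phase:
    kind: str        # 'preparation', 'stroke', 'hold', or 'retraction'
    start: float     # onset in seconds
    end: float       # offset in seconds

@dataclass
class Word:
    form: str
    start: float
    end: float

def gesture_phrase(phases):
    """A gesture phrase spans preparation through stroke."""
    kinds = [p.kind for p in phases]
    i, j = kinds.index('preparation'), kinds.index('stroke')
    return phases[i].start, phases[j].end

def stroke_onset_before_affiliate(phases, affiliate):
    """Check one reported alignment pattern: stroke onset at or
    before the onset of the lexical affiliate."""
    stroke = next(p for p in phases if p.kind == 'stroke')
    return stroke.start <= affiliate.start

# Toy example: a gesture depicting circular motion, affiliate 'circles'
phases = [Phase('preparation', 0.0, 0.3),
          Phase('stroke', 0.3, 0.9),
          Phase('retraction', 0.9, 1.2)]
affiliate = Word('circles', 0.4, 0.8)
print(gesture_phrase(phases))                         # (0.0, 0.9)
print(stroke_onset_before_affiliate(phases, affiliate))  # True
```

In this toy case the gesture phrase runs from preparation onset to stroke offset, and the stroke onset precedes the affiliate, matching the pattern reported by, for example, ter Bekke et al. (2020); other alignments discussed above would simply yield different timestamps.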
However, verbal language and gestures co-express meaning in different ways. Speech is segmented on various levels, that is, into phonemes, lexemes, phrases, constructions, etc. Although gestures can be segmented, too, their meaning is not compositional. Rather, they are considered to constitute one meaningful whole. Furthermore, co-expressiveness should not be confused with semantic redundancy. Although gestures can of course co-express meaning that is also encoded verbally, it is common for them to express meaning aspects that are not specified in speech (Kendon 1980; McNeill 1992). For instance, if a speaker recounting a soccer match says “the defender tackled me and I lost the ball” while moving their right arm to depict an elbow check, we infer from this gesture that the act of tackling involved an elbow check and take it to be the reason the speaker lost the ball. But the gesture does more than that. It also conveys specific information about how the elbow check was performed, including how quickly and with how much physical force, and whether the elbow was moved along a horizontal trajectory or was lifted, possibly to aim at the opponent’s upper body or face. All this information is packed into, and inferred from, the gesture, and it is obviously far more economical to depict all of it in one gesture than to put it into words. Hence, there is a division of labor between the verbal and the gestural modality or, put differently, both work together to convey one thought. Kendon (1980) has termed this one thought the underlying ‘idea unit’. Speakers, however, do not only use referential gestures to express content that is easier to depict than to recount; they also use gestures to highlight meaning aspects (Alibali & Kita 2010; Schoonjans 2018).
This point has been made most notably by Müller (2008), who argues that when a metaphor is co-expressed in gesture and speech, the metaphoricity of the construal is activated to a higher degree than when the metaphor is present in only one modality. This is reminiscent of Givón’s (1985) principle of quantitative iconicity: “More form is more meaning.”
Besides these communicative and semantic-pragmatic functions, gestures also serve a number of functions linked to speech production and interaction management. For instance, gestures are frequent during persistent word searches, and it has been argued that gesturing helps to overcome word-retrieval problems because the motor activity stimulates cognitive activity (Krauss 1998) and reduces cognitive load (Goldin-Meadow et al. 2001). Accordingly, it is claimed that gestures, including representational gestures, have self-oriented cognitive functions (Kita et al. 2017). At the same time, gestures also play a role in the turn-taking process, as they are used to allocate turns as well as to signal a wish to take the turn (Mondada 2007; Schmitt 2014; Zima 2018). Gestures are hence multi-functional, and this multi-functionality has a number of implications for how co-occurrences of gestures and verbal constructions may be modeled within Construction Grammar.
After this compact overview of some of the main characteristics of the gestural modality, the next section addresses this chapter’s main concern: Are constructions multimodal and how do we know? We start with the theoretical seeds of the idea.
15.3 Multimodal Constructions? The Discussion’s Theoretical Foundations
This contribution focuses on the place of co-speech gestures within Construction Grammar, most notably Cognitive Construction Grammar (Goldberg 1995, 2006), which clearly subscribes to the usage-based thesis (Barlow & Kemmer 2000). Accordingly, it holds that all linguistic knowledge is abstracted from language use, drawing on general cognitive mechanisms such as pattern recognition, abstraction, schematization, and categorization. This usage, that is, the input, is inherently multimodal: We do not only speak with words; we also gesture, direct our gaze, display emotions and attentional states through our postures and facial expressions, speak up sometimes and whisper at other times, and so on. Language is thus learned in a multimodal environment (Enfield 2009). Most notably, children gesture before they are able to speak, and the language acquisition process is heavily dependent on co-speech gesture use (e.g., to establish the link between a given concept and its name). The dependence of language use on co-speech gesture diminishes in the course of language acquisition (Cienki 2015), but communication in face-to-face interaction nonetheless remains inherently multimodal throughout the lifespan. Therefore, one theoretical argument put forward in favor of a multimodal reconceptualization of language is grounded in the fact that we obviously have extensive, systematic, and structured knowledge of how to communicate in multimodal environments. This knowledge must be stored, that is, entrenched, in one way or another. The crucial question is: Is it part of linguistic knowledge, of grammar?
Usage-based linguistics models grammar as “the cognitive organization of one’s experience with language” (Bybee 2006: 2916), and construction grammarians have posited that this cognitive organization consists of constructions only. This idea is often referred to by citing Goldberg’s iconic statement (2003: 226), “It’s constructions all the way down.” Hilpert (2014: 2) has rephrased the same idea: “Knowledge of language consists of a large network of constructions, and nothing else in addition.” However, there is a crucial difference between Bybee’s and Hilpert’s quotes: Bybee refers to grammar, whereas Hilpert speaks of knowledge of language. This is not a trivial difference, as the way we model knowledge of gesture use in the Construction Grammar framework crucially depends on how we conceptualize the relationship between grammar and language.
From Hilpert’s encompassing view of constructions and the constructicon, one may infer that constructions must include information on how to instantiate them multimodally: if all knowledge of language is stored as part of constructions, and the way we use constructions is surely entrenched knowledge, then that knowledge must reside in the constructions themselves. This resonates with the line of argument put forward in Zima (2014b): If we accept that the constructicon comprises all of our knowledge of language but conceptualize language and constructions as purely monomodal, we are left with the unresolved problem of explaining why the usage-based thesis should hold only for recurrences at the verbal level, and where our rich knowledge of how to communicate multimodally, that is, to employ constructions multimodally, is stored.
Another take on the issue, however, is to view grammar and knowledge of language as non-equivalent. On this view, knowledge of language includes grammatical knowledge as well as other forms of knowledge abstracted from language usage, potentially including knowledge of how to combine constructions with gestures. Quite a few authors have advocated this position (Ningelgen & Auer 2017; Ziem 2017; Verhagen 2021), proposing it as a way out of the current impasse in the field. This discussion is not settled and cannot be resolved in this chapter, but one consequence seems evident: If grammar and knowledge of language only partly overlap, the Construction Grammar claim that knowledge of language is “constructions and nothing else in addition” (Hilpert 2014: 2) may not be tenable and may need to be revised.
In this context, it is important to note that the discussion on where to locate gestures in the constructicon did not really originate within Construction Grammar but was instigated by Ronald Langacker, who explicitly acknowledged that gestures may be part of a linguistic unit:
In Cognitive Grammar …, the form in a form–meaning pairing is specifically phonological structure. I would of course generalize this to include other symbolizing media, notably gesture and writing. … Cognitive Grammar takes the straightforward position that any aspect of a usage event, or even a sequence of usage events in a discourse, is capable of emerging as a linguistic unit, should it be a recurrent commonality.
In 2008, he became even more specific, giving the example of a co-speech gesture performed in baseball:
When a baseball umpire yells Safe! and simultaneously gives the standard gestural signal to this effect (raising both arms together to shoulder level and then sweeping the hands outward, palms down), why should only the former be analyzed as part of the linguistic symbol? Why should a pointing gesture not be considered an optional component of a demonstrative’s linguistic form?
The theoretical statement and the example, however, differ in one important respect. In the case of the umpire signal, the gesture is a mandatory component of the sign; that is, the signal is not adequately performed if one only yells Safe! without gesturing. Therefore, from a Construction Grammar perspective the status of this form–meaning pairing as consisting of a verbal and a gestural component is rather uncontroversial, and some authors have indeed argued in a similar vein, proposing that constructions are multimodal if and only if a gestural component is mandatory and cannot be omitted without the construction being incomplete (Ningelgen & Auer 2017; Ziem 2017). In the case of the baseball signal, completeness is determined by sports convention: performing the gesture without yelling Safe! is not uninterpretable, but it is treated as pragmatically unacceptable. This is because at some point people agreed upon the convention that, for the umpire signal to be effective and consequential, the verbal and gestural parts have to be performed together. In other cases, especially some deictic constructions that are often discussed as candidate multimodal constructions (Stukenbrock 2010, 2015, 2020; Ningelgen & Auer 2017; Balantani 2021), completeness is a semantic-pragmatic category. This concerns, for instance, deictic constructions like [like that/this] or [this ADJ] (also German so ‘like this’; Stukenbrock 2015; Ningelgen & Auer 2017).
These are uninterpretable without a gesture that specifies the deictic slot, for example by depicting how a certain action has to be performed (‘you need to hold your hand like this’) or by specifying the shape of an object or some spatial dimension (‘the hole was this big’). For constructions that involve an obligatory gestural component, multimodal unit status is likewise uncontested. Rather, the debate centers on the questions of whether obligatoriness of a gestural component is a prerequisite for multimodal constructions and whether gestures can fill optional slots of multimodal constructions. The latter hypothesis is grounded, among other things, in Goldberg’s definition of constructions as frequency dependent:
Any linguistic pattern is recognized as a construction as long as some aspect of its form or function is not strictly predictable from its component parts or from other constructions recognized to exist. In addition, patterns are stored as constructions even if they are fully predictable as long as they occur with sufficient frequency. (Goldberg 2006: 5)
Numerous studies have since examined the effects of frequency on unit formation and entrenchment (e.g., Bybee 2006; Schmid 2007, 2014; Blumenthal-Dramé 2012; Divjak & Caldwell-Harris 2015; Divjak 2019; see also Section 15.4.1), providing arguments and counterarguments for the unit status of highly frequent instantiations alongside more abstract and/or unpredictable constructional patterns, while at the same time agreeing that ‘sufficient frequency’ is too vague a term to serve as an operational criterion (Traugott & Trousdale 2013: 11; for discussion, see also Hartmann & Ungerer 2023). At the same time, the exemplar view advocated by Bybee (2010) holds that even constructions one comes across only once or a couple of times in one’s life may be stored in long-term memory if some salient aspect makes them stick in the mind. The exact role of frequency in Construction Grammar is hence still disputed (Hoffmann 2013), and this has implications for multimodal Construction Grammar. Obviously, it is impossible to define a frequency threshold for gesture recurrence on which any claim about the constructional status of a (verbal) construction–gesture co-occurrence can safely be based. This has been the most critical issue in multimodal Construction Grammar so far. It touches upon the recognizable gap between the general acceptance of the claim that language is multimodal and the difficulties of proving that a particular construction is multimodal in nature. The next section sketches the state of the art in the field.
15.4 State of the Art in Multimodal Construction Grammar
The current debate in the field comprises two main strands. The first includes construction-based case studies that in one way or another rely on the frequency of gesture co-occurrence as an argument for or against the multimodal status of constructions. The second strand takes a more gesture- and meaning-centered approach. The remainder of this section is structured as follows: Section 15.4.1 presents the state of the art in the field by focusing on the case studies conducted so far. These studies lay the groundwork for approaches that draw on them to propose novel ways of thinking about the issues under debate, most notably Cienki’s (2017) proposal of an ‘utterance construction grammar’. These proposals are discussed in Section 15.4.2.
15.4.1 Case Studies
As outlined above, one of the main arguments brought forward in favor of a multimodal reconceptualization of the constructicon and constructions is grounded in the claim that “any recurrent aspect of a construction’s usage can become entrenched” (Langacker 2001). Over the past decade, several studies have shown that gestures recurrently and systematically co-occur with given verbal constructions, but co-occurrence frequencies vary strongly. They range from up to 85 percent for English motion and distance constructions ([all the way from X PREP Y]; see Zima 2014b, 2017a, 2017b, and also Pagán Cánovas & Valenzuela 2017) to approximately 70 percent for different types of English time expressions (Pagán Cánovas et al. 2020), 58 percent for English aspectual verbs (Hinell 2018), and 37 percent (and less) for German modal particles (Schoonjans 2018). To date, except for Ningelgen & Auer (2017) on deictic so ‘like this’ in German (see the discussion in Section 15.3 on mandatory gestures with particular deictic expressions), no study has reported co-occurrence rates of 100 percent, and it seems safe to say that only very few verbal constructions would qualify as multimodal if a 100 percent co-occurrence rate were taken as the sole criterion. Ziem (2017) takes this to be a strong counterargument against the multimodal conception of constructions and the constructicon. Similarly to Ningelgen and Auer’s line of argumentation, he proposes deletion tests, arguing that the gesture’s contribution to the meaning of the construction must be so crucial that without the gesture the construction collapses and becomes uninterpretable.
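The co-occurrence rates reported above are, in essence, simple proportions over annotated construction tokens. The following sketch shows how such a rate might be computed and why any fixed threshold for ‘sufficient frequency’ remains arbitrary. The data and the threshold values are invented for illustration; they do not come from the cited studies, and the 85 percent figure is merely chosen to echo the rate reported for spatial uses of [all the way from X PREP Y].

```python
# Hypothetical annotated corpus tokens of one verbal construction:
# each token records whether a co-speech gesture accompanied it.
# Data and thresholds are illustrative only.

def cooccurrence_rate(tokens):
    """Share of construction tokens accompanied by a gesture."""
    with_gesture = sum(1 for t in tokens if t['gesture'])
    return with_gesture / len(tokens)

# 20 toy tokens, 17 of them produced with a gesture
tokens = [{'id': i, 'gesture': i < 17} for i in range(20)]
rate = cooccurrence_rate(tokens)
print(f"{rate:.0%}")   # 85%

# The methodological problem: any cut-off is arbitrary. The same
# 85 percent rate passes or fails depending on the chosen threshold.
for threshold in (0.5, 0.85, 1.0):
    print(threshold, rate >= threshold)
```

The last loop makes the point of the frequency debate concrete: an 85 percent rate counts as evidence for a multimodal construction under a 50 or 85 percent criterion but fails Ziem’s implicit 100 percent (obligatoriness) criterion, and the literature offers no principled way to fix the cut-off.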
A different path is followed by Lanwer (2017), Schoonjans (2017), and most recently Debras (2021), who argue that mere frequency is rather uninformative and that the analytical focus needs to shift to how gestures contribute to utterance meaning. Debras (2021) links this to a general complaint that Construction Grammar focuses too much on form. If we take the verbal construction and its form as the point of departure, we tend to consider the co-occurring gesture as secondary and optional, that is, something we add while we speak but could equally well leave out. Our notion of constructions and the constructicon, however, may look fundamentally different if we start from the meaning side (cf. Lasch 2020 and his meaning-centered approach to the German constructicon) and shift the focus to how gesture and speech collaborate to express an idea, that is, the ‘idea unit’ in Kendon’s (2004) sense. This is the line of argumentation followed by, for example, Hoffmann (2017), Mittelberg (2017), Bressem & Müller (2017), Schoonjans (2018), and (partly) Zima (2014b, 2017a, 2017b).
Starting from an emergent grammar perspective, which takes grammar to be “the name for certain categories of observed repetitions in discourse” (Hopper 1998: 156), Mittelberg (2017) presents a case study on the German existential construction [es gibt X] ‘there is an X’. She argues that this particular construction involves a slot for a gestural enactment that depicts an act of “giving or holding something” (Mittelberg 2017: 1). This gestural re-enactment is grounded in the basic pattern of experience that Goldberg has argued to motivate (di)transitive constructions: “The initial meaning is an experiential gestalt. This basic pattern of experience is encoded in a basic pattern of language” (Goldberg 1998: 208). Accordingly, Mittelberg (2017: 2) argues that “the basic manual actions of giving and holding … motivate multimodal instantiations of existential constructions in German discourse.” Drawing on semi-experimental data of spoken German discourse, she illustrates that es gibt constructions co-occur with unimanual variants of the palm-up open-hand gesture as well as bimanual palm-vertical open-hand gestures.[5] Her analysis shows that there is formal recurrence in the gestures, while their semantic-pragmatic meaning is clearly situated and dependent on the discursive context.
The semantic recurrence holds only for the very schematic meaning of “holding some kind of imaginary entity.” As Mittelberg acknowledges, these analyses are preliminary, but her work on existential constructions points towards candidate constructions for future research in multimodal Construction Grammar by suggesting that “linguistic constructions that recruit basic embodied manual actions and interactions with the physical and social world are particularly likely to be instantiated multimodally and thus also engender emergent multimodal patterns, or clusters, of experience” (Mittelberg 2017: 5).
This conclusion seems to be backed up by my own studies on English motion and distance constructions such as [Vmotion in circles], [zigzag], and [all the way from X PREP Y] (Zima 2014b, 2017a, 2017b). In American English data from various TV formats (UCLA Library NewsScape; Steen et al. 2018), I found gesture co-occurrence frequencies ranging between 37 percent and 85 percent. Although these frequencies are considerable, gesturing with these constructions is obviously not mandatory, at least not under every circumstance. If one were to perform a deletion test, as proposed by Ziem (2017), the conclusion would have to be that none of these constructions is multimodal in nature, as the constructs remain interpretable without the gesture. Yet the gestures are not redundant; they add to the meaning of the utterances. In particular, the iconic gestures make a certain aspect of the conceptualization particularly salient, in line with the quantitative iconicity principle of “more form is more meaning.” The following examples illustrate this.
In example (1), the speaker is telling a story and enacting a scene from a hockey game. The gesture, which consists of consecutive rapid movements of the right hand, emphasizes both the marked path of motion (in circles) and the velocity (faster and faster). It thus serves to highlight and draw attention to the semantic aspects of path and manner of motion.
(1) KNBC Tonight Show with Jay Leno, July 16, 2010

This highlighting function also holds for gestural instantiations of temporal and spatial uses of [all the way from X PREP Y] (Zima & Bergs 2017). An example of a spatial instantiation that is accompanied by a co-speech gesture is given in (2). The bimanual gesture performed by the speaker depicts and thereby emphasizes the long distance between location X (Long Beach) and location Y (Lancaster), communicating that the task of delivering food to all clients in this area on a single day is difficult.
(2) KNBC 4 News at Noon, December 25, 2012

Frame grab (1) shows the first stroke of the gesture, which is co-produced with the articulation of the first geographical reference point (Long Beach) instantiating the X-slot of the constructional template. Frame grab (2) depicts the second stroke, which is aligned with Lancaster. The right and the left hand mark the beginning and the endpoint of a spatial path, respectively. The space between the two extended hands maps onto the distance between the two places.
Based on both the analysis of the gestures’ semantic-pragmatic meaning and their frequency (63 percent for [V(motion) in circles]; 85 percent for spatial uses of [all the way from X PREP Y]), it is argued that we should not treat these seemingly redundant, co-expressive gestures as totally optional. Rather, our focus should be more data-centered, acknowledging and trying to explain the fact that speakers recurrently do gesture. Following Kendon (2004) and Calbris (2011), these gestures are produced with the intention to convey meaning and hence cannot be dismissed as ‘just optional’.
An equally meaning-centered approach is taken by Bressem and Müller (2017). Starting from a recurrent gesture, the so-called throwing-away gesture, they illustrate that this gesture can be combined with a number of different verbal constructions spanning a wide range of grammatical categories such as particles, nouns, verbs, and adverbs. The throwing-away gesture is “characterized by a particular kinesic core: a lax flat hand oriented vertically with the palm facing away from the speaker’s body flapping downwards from the wrist” (Bressem & Müller 2017: 3). Just as Mittelberg argues for palm-up open-hand gestures that co-occur with German existential constructions, Bressem and Müller argue for an experiential basis of the gesture, which they situate in the embodied experience of throwing concrete entities away. This is extended to metaphorical uses when referring to abstract objects in speech. They thus identify a constructional pattern, which they term the “negative assessment construction,” with the multimodal form [throwing-away gesture] + [particles/negation/N/V/ADV]. From a theoretical perspective, they suggest the compelling idea that whether constructions are multimodal in nature is probably not a polar question requiring a yes-or-no answer. Rather, verbal constructions may constitute a multimodal network, with some of them being more, and others less, bound to particular gestures.
A further pioneering study is Schoonjans’ (2018) monograph on German modal particles and the role of manual and head gestures in co-expressing down-toning meanings. His study is among the very first not only to raise theoretical questions but also to perform a large-scale corpus analysis that inquires in detail into the interdependence of verbal constructions and non-verbal co-occurrence patterns. The frequencies reported for multimodal instantiations of the modal particles under scrutiny are rather low (37 percent and less), but this should not lead one to dismiss Schoonjans’ results and his approach. Indeed, he raises and discusses a number of issues that are critical for future endeavors in multimodal Construction Grammar. These include the problem that recurrence (e.g., Langacker 2001) involves the assumption that there is a stable formal and semantic core that is common to all instantiations and results from subtracting all in situ variation. However, as Bressem (2013) illustrates, the form of manual gestures may vary along a great number of dimensions including hand shape, orientation, movement, and position in gesture space; therefore, “no two tokens of gesture are ever identical” (Harrison 2009: 82). Put differently, the issue of whether two gesture tokens are instantiations of the same gesture type is far from trivial.
Another methodological problem with far-reaching implications that Schoonjans draws attention to is the fact that there is not always perfect temporal alignment between the verbal construction and a co-expressive gesture. For instance, the performance of gesture phrases and units may take more time than the articulation of the lexical affiliate and, more importantly, the lexical affiliate may not be just one verbal construction but a larger semantic unit within an utterance. To date, all these issues are unresolved. As Schoonjans (2017) argues, however, many of them are not restricted to attempts to develop a multimodal Construction Grammar but also concern monomodal Construction Grammars. This concerns most notably the still debated link between frequency and entrenchment (Hoffmann 2013, 2017) but also the question of the level of granularity at which one assumes a construction to be situated.
15.4.2 Theoretical Proposals: Monomodal, Multimodal Construction Grammar, or Something Else?
Monomodal Construction Grammars posit that constructions exist at every level of granularity or schematicity, ranging from highly abstract patterns to lexically and syntactically fully fixed ones. They further allow for constructions to have optional slots. Therefore, one may consider it arbitrary to posit that verbal elements can be optional but gestural ones need to be obligatory. At the same time, one may equally wonder whether non-obligatory elements in verbally defined constructions are cognitively real or whether they rather point towards the existence of different constructions at different levels of granularity. This issue is raised by Lanwer (2017), who suggests that the difference between mono- and multimodal constructions may be one of degree of schematicity. On this view, multimodal constructions comprising a given [verbal form + gesture] pairing may be stored alongside more schematic monomodal ones that do not involve a slot for a co-speech gesture. This argument is grounded in a very basic claim of Construction Grammar, namely that constructions may be stored redundantly at different levels of granularity. He further argues that, in order to account for the varying frequencies of constructions’ co-occurrence with gestures and the varying degree of constructions’ dependence on gesture, we should consider thinking of a multimodal network of interrelated constructions as prototypically structured and as involving fuzzy boundaries.
This idea is worked out in more detail in Cienki (2017). He introduces the idea of an Utterance Construction Grammar, with utterance being defined as “a level of description above that of speech and gesture for characterizing audio-visual communicative constructions” (Cienki 2017: 1). The suggestion of yet another model of linguistic knowledge is grounded in the conviction that it may be futile to try to coerce gestures into a verbally based constructional framework. In taking the utterance as his point of departure, Cienki aligns with Kendon’s approach to gesture as “utterance dedicated visible bodily action” and speech as “utterance dedicated audible bodily action” (Kendon 2015: 44, cited in Cienki 2017: 3) as well as with Langacker’s concept of the ‘usage event’, defined as including “the full phonetic detail of an utterance, as well as any other kinds of signals, such as gestures and body language” (Langacker 2008: 457, cited in Cienki 2017: 3). His proposal that constructions have a deep as well as a surface structure is reminiscent of two concepts traditionally associated with Generative Grammar, but Cienki stresses that the terms are borrowed without adhering to the nativist assumptions that underlie the Universal Grammar approach. The deep structure is conceptualized as “a set of tools that can be drawn upon to express the construction,” whereas the surface structure is “a metonymic representation of some (if not all) elements of the construction” (Cienki 2017: 3). Accordingly, information about which gestures go with a construction is stored in the construction’s deep structure. Constructions thus exhibit an inherent potential for multimodal realization, and some aspects of this potential may get activated and become visible in a construction’s surface representation, that is, in a construct.
Crucially, potential component elements as part of the deep structure may differ in being more or less prototypically associated with the construction. This way of thinking about constructions, Cienki (2017: 5) argues, “is a more flexible alternative than positing that the model has the binary choice between required and optional elements” and is more compatible with the idea of various degrees of entrenchment.
Cienki thus proposes a new way of thinking about many issues that have turned out to be challenging for multimodal Construction Grammar. However, one may wonder how these ideas could be put to the test. In that vein, Hoffmann (2017) emphasizes the need for larger-scale data studies and the application of quantitative and statistical methods that go beyond absolute and relative frequencies (as, for example, in Zima 2014b, 2017a, 2017b; Schoonjans 2018).
An example of such a quantitative approach is a recent study by Debras (2021) on French je (ne) sais pas ‘I don’t know’. Her approach is not explicitly situated within multimodal Construction Grammar. However, her paper involves an interesting discussion of why the constructional approach does not do full justice to the semantic-pragmatic import of co-speech gestures, arguing that the original Construction Grammar focus on verbal constructions entails that gestures are regarded as “secondary and dependent on speech” (Debras 2021: 42). At the same time, she concludes that the association of the various uses of je (ne) sais pas as a pragmatic marker with recurrent gestures is too loose to allow for a straightforward categorization as a multimodal construction. In that respect, the methodology applied in her study is especially interesting and points to a potentially fruitful direction: based on a qualitative, multimodal analysis of eighty-four occurrences, she identifies three multimodal profiles of je (ne) sais pas. A multiple correspondence analysis is then performed to identify the strength of association between all annotated parameters, which include phonetic realization, prosodic detail, functions, type of co-speech gesture, and several others. It turns out that the variable ‘type of co-speech gesture’ accounts for a large portion of the variation in the dataset and is thus only loosely associated with particular phonetic realizations and functions.
Mirroring the ongoing discussion on obligatoriness, frequency, and prototype structure in the field of multimodal Construction Grammar, these results may thus be interpreted in two ways: either as evidence that je (ne) sais pas is clearly not a multimodal construction, or as an argument for the need for a more nuanced model along the lines proposed by Zima (2017a, 2017b), Lanwer (2017), Cienki (2017), and Schoonjans (2018).
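For readers unfamiliar with the technique, the core of a multiple correspondence analysis of the kind Debras applies can be sketched in a few lines. The sketch below is a minimal illustration in Python only: the annotation variables, their values, and the toy tokens are invented for the example and do not reproduce Debras’ actual annotation scheme or data.

```python
# Minimal multiple correspondence analysis (MCA) sketch using numpy/pandas.
# Variable names and toy annotations are invented for illustration only.
import numpy as np
import pandas as pd

def mca(df: pd.DataFrame, n_components: int = 2):
    """Basic MCA on a frame of categorical annotation variables."""
    # One-hot encode the categorical variables into an indicator matrix.
    Z = pd.get_dummies(df).to_numpy(dtype=float)
    P = Z / Z.sum()                      # correspondence matrix
    r = P.sum(axis=1)                    # row masses (one per token)
    c = P.sum(axis=0)                    # column masses (one per category)
    # Standardized residuals: (P - r c^T) / sqrt(r c^T); centering removes
    # the trivial dimension.
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, s, Vt = np.linalg.svd(S, full_matrices=False)
    # Principal row coordinates: one point per token in the reduced space.
    coords = (U * s) / np.sqrt(r[:, None])
    return coords[:, :n_components], s ** 2  # coordinates, inertias

# Toy data: each row is one occurrence of a marker with its annotations.
tokens = pd.DataFrame({
    "phonetic": ["full", "reduced", "reduced", "full", "reduced", "full"],
    "function": ["epistemic", "discourse", "discourse",
                 "epistemic", "discourse", "epistemic"],
    "gesture":  ["shrug", "palm_up", "none", "shrug", "palm_up", "none"],
})
coords, inertias = mca(tokens)
```

Inspecting how far a variable’s categories spread across the resulting components indicates how tightly that variable is coupled to the others; on this logic, a gesture variable whose categories scatter widely is only loosely associated with the remaining annotated parameters.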
All these studies hence suggest that there are many ways to conduct research with a multimodal constructional focus. However, in some way they all struggle with similar issues, most notably the difficulty of answering the pending question of where multimodal information is stored in the mind. This question clearly calls for an interdisciplinary approach that brings together experts in multimodal communication and gesture studies as well as cognitive linguists, psycholinguists, and cognitive scientists. However, it seems that one step to take before that is to broaden the empirical basis by conducting more case studies on sufficiently large multimodal datasets. Little is known about how systematic the relationship between given verbal constructions and gestures really is. So, where do we go from here?
15.5 The Road Ahead
As I hope to have shown in this chapter, the inquiry into the potential multimodality of constructions and the constructicon is still in its infancy and faces a number of theoretical and methodological challenges. These relate to the debated role of frequency of co-occurrence, the status of open slots in constructions, and the issue of whether or not grammar is restricted to verbal symbols. Some of these issues are intrinsic to the Construction Grammar framework but come to the fore with greater salience when we extend the focus towards multimodal communication. This may leave readers with the impression that the endeavor is futile altogether. I would like to close this chapter with a different conclusion. Much of the current discussion in the field of multimodal Construction Grammar suffers from a top-down approach; instead, we should adopt a more bottom-up perspective. Many arguments, including those presented in Zima (2014a, 2014b, 2017a, 2017b), Zima and Bergs (2017), and in this chapter, start from the basic tenets of Cognitive Linguistics, the usage-based model, and especially Cognitive Construction Grammar. It is argued that there is a discrepancy between the acknowledgment that language use is multimodal and the way we theorize about language and language use in Construction Grammar. While this observation is valid, the discussion about the place of gesture (and other non-verbal modalities) within communication and grammar remains a purely theoretical one unless we ground it in a much broader empirical basis. Too little is known about how consistent co-occurrences and mappings between the verbal and the gestural modalities are on a constructional level.
Therefore, we need many more case studies, and this includes studies that start out from verbal constructions and their multimodal instantiations as well as more gesture- and meaning-centered ones. This entails the need for sufficiently large, annotated, multimodal corpora. The NewsScape Library (Steen et al. 2018) is an exceptionally good starting point for any study on multimodal instantiations of constructions, as it is fully searchable (for verbal constructions) and contains enormous amounts of audio-visual data, not only in English but also in Spanish, Russian, German, and many more languages. Of course, this is not to say that smaller multimodal corpora cannot be used. They are especially relevant for constructions that occur frequently enough to yield a sufficiently large dataset. Not least, such smaller corpora are very valuable resources because the NewsScape Library only contains televised interactions and thus no private, face-to-face conversations or other interactional settings.
Finally, we need to broaden our methodological toolkit. To move forward on the issues under scrutiny, we need both qualitative research, which pays close attention to how meaning is expressed in situ in all modalities, and quantitative studies that make use of the full array of statistical methods that have been applied so successfully in Construction Grammar and other Cognitive Linguistics disciplines over the past decade (cf. Janda 2013). Most notably, the issue at hand is a fundamentally interdisciplinary one that calls for an interdisciplinary approach and may not be resolvable by construction grammarians alone.

