1 Worldview
The concept of a growth point is new as concepts go and not easily categorized. I regard this as a strength. The historical figures who have influenced the growth point the most – Peirce, Saussure, James, Vygotsky, Piaget, Merleau-Ponty, Werner, and Köhler – have not put forth anything like it (Vygotsky and Werner come closest). But if we imagine these giants stacked up in a gestalt of their own, like those superimposed photos of different faces coalescing into one human face, something like the growth point coalesces as well. There are common threads – respect for the inner world and, for several of them, a resistance to reducing wholes to parts. Looking over notes I wrote in the late 1980s and early 1990s, I see the concept jelling. I first used the term “growth point” in Hand and Mind (McNeill, 1992). A joint paper with Susan Duncan (McNeill & Duncan, 2000) was a major step. The idea has continued to evolve as I carried it into new areas. A book currently in progress, Language is gesture—Meaning is being, is likely to be the capstone.
To forestall misconceptions, however, I start with two proposals that are not the growth point. Neither clarifies what we seek. First, a speaker intends to say something, finds words and a sentence structure she can use to say it, and from this generates speech (e.g. Krauss, Chen, & Gottesman, 2000). Levelt (1989) states it as in Box 1 in Figure 18.1.
Figure 18.1 Box 1
In this view, gestures are outside – paralanguage, add-ons, ornaments, “illustrators” (as some were called by Ekman & Friesen, 1969).
The second proposal sees gesture on a wide semiotic landscape with other kinesic systems and not as special (Kendon, 2008). We lose sight of gesture’s uniqueness. In a growth point, a gesture is an integral part of language and is unique.
A growth point is a generative force and has intellectual scope. Its ultimate source is the speaker’s own social interactive/actional/mental energy. To exist, it absorbs influences of many kinds and creates something new – a world of meaning specific to the immediate context and moment of speaking. The growth point inhabits this world and spreads the speaker’s energy into the workings of the speech-with-gesture emerging out of it.
The growth point takes the moment of speaking as its topic. It adopts a first-person, inside-looking-out view that aims to describe the experience of speaking. The growth point is non-reductive, not turning gesture–speech units into what Vygotsky (1986) called “elements” but keeping them as “units,” which retain the essential properties of the whole.
It shuns computational concepts borrowed from computer modelling, concepts that belong to Simon’s (1996) sciences of the artificial, and instead aligns with Simon’s sciences of the natural. Not all gestures can take part in growth points; gestures must be able to unify with speech: gesticulations – with their phases of preparation, stroke, and retraction – do; pantomimes (repelling speech instead) do not.
Gestures are components of speech, not accompaniments but integral parts of it. Much evidence supports this idea, but its full implications have not always been recognized. The growth point embodies this integral linkage. Gestures offer one kind of symbol, language a different kind, and the two kinds are unified in growth points. A growth point thus forms a single mental gestalt or idea unit made out of semiotically unlike components. Gestures of course do not always occur, and this may seem to restrict the growth point. In fact, it expands it. The absence of a gesture is a natural variation of newsworthiness. Apart from forced suppressions, gestures fall on an elaboration continuum. Even if newsworthiness and energy are low and no gesture appears, a gesture is still present, not materialized but still engaged in gesture–speech unity.
Speaking is more than uttering speech sounds with meaning; more deeply, it is “inhabiting” meaning. The concept of “inhabitance” was outlined by Merleau-Ponty (1962) as “taking up a position in the [speaker’s] world of meanings” (he did not, of course, use the term “growth point”). The speaker inhabits and, in a sense, “becomes” his/her growth point; it is part of his/her momentary being. The growth point is geared to this existential content of speech – this “taking up a position in the world of meanings.” A deeper answer to the query, therefore – when we see a gesture, what are we seeing? – is that we see the speaker’s current cognitive being, her very social/actional/mental existence at the moment it comes to life.
Two dimensions converge in a growth point that have classically been called “linguistic” and “psychological,” but better (and less proprietary) terms are “static” and “dynamic.” Some phenomena are prominent on one dimension, others on the other, but the dynamic and static cannot be isolated: They intersect and interact. The dimensions are structured on different principles and draw on different methods of description and analysis (the field of linguistics specializing in the static dimension). Yet both are dimensions of language, and we need to understand how they work together. Following Humboldt (1999), Werner in his lectures labeled them Ergon and Energeia and saw that they must combine: language as structure, or Ergon, and language as an embodied moment of meaning, alive in an actor, or Energeia, together form language (paraphrasing Glick, 1983).
Related to inhabitance is the concept of a material carrier (Vygotsky, 1987, p. 46) – the embodiment of meaning in material experiences. A material carrier – a gesture, or the very act of writing something down, for example – adds energy and enhances a symbol’s experiential power. From a first-person perspective, gestures (and words, sentences, etc., as well) are thinking in one of its many forms – not only expressions of thought, but thought itself.
Saussure’s langue is everything “systematic” in Human Language (pertaining to the regularities of the “system”) (Saussure, 1959, translation of ca. 1910 lectures). His formula for parole or speech is: langage (the semiotic totality) minus langue (the “system”) = parole (“speech”). A growth point is not langage minus langue. It includes langue in a dialectic that unifies gesture and langue and is intrinsically dynamic, insisting that the static (linguistic form) and the dynamic (the gesture) unify. Static langue does not do this. Langue describes language as a cultural norm – static, complex, delicate, and crystalline, but not motion.
The concept of “performance” (Chomsky, 1965), the modern version of parole, is inappropriate for another reason. Its metaphor is not subtraction but the “use” of competence or langue, and while “use” can be considered dynamic it assumes a separation that does not model gesture–speech unity.
Schegloff (1984) introduced the “lexical affiliate” concept as a way to identify upcoming topics in conversations. A lexical affiliate is not a growth point. It is the word whose lexical meaning matches the gesture most closely. The lexical affiliate is independent of the context of speaking. Like the growth point, the lexical affiliate posits a tight speech–gesture linkage, but you cannot discover the growth point by looking for the lexical affiliate – you will be misled if you do. In a growth point, context and meaning are inseparable. The growth point is what Vygotsky (1986, p. 220) called a psychological predicate – a point of differentiation within a context. A growth point is recoverable only by doing an analysis of the context and often does not include the lexical affiliate at all. Lexical affiliates tend rather to enter speech as part of the unpacking of a growth point (see Section 2.2).
Finally, some may regard the growth point as ignoring the “outer world,” the world of the social-interactive aspects of language and communication. It is true that I have not made the social interaction of conversation and other forms of social discourse my focus. Rather, I have taken them for granted. This reflects my choice of topic. I want to understand the mind and how it functions, develops, and evolved. But there is no conflict with an outside point of view. I am not proposing an alternative to it. What I aim for is a single gesture/speech idea unit that reflects social-interactive functions on the inside and pushes discourse along as they are realized in speech (see McNeill, 2018; Müller, 2017).
2 The Growth Point, a Portrait
From here on, I explain the growth point’s core properties and pull them together into a kind of portrait. The growth point, a concept irreducibly a whole, cannot be reduced to one property or another; there will always be something missing. Indeed, multiple properties emerge – six or ten (depending on how we count). Having multiple properties is not inconsistent with unity. An analogy is a multifaceted gem. As we rotate it we see different colors. So it is here. We see the “colors” sequentially but keep in mind that we are looking at “one thing.” Seen as a whole, growth points jiggle the speaker’s personal energy: filter, rearrange, direct, add, subtract, adjust, and shape it as they spread it into speech (Section 4, “Final words,” gives the finished picture).
2.1 Infusion of Energy
The growth point is so named because it is a distillation of a growth process – an ontogenetic-like process but vastly sped up and made functional in online thinking-for/while-speaking (Slobin, 1987). It is important to emphasize again that langue does not explain how we speak. It explains speech only when unified with gesture. The gesture brings langue to life. According to this framework, the growth point is the initial pulse of thinking-for/while-speaking, out of which utterance and discourse emerge. Imagery and spoken form are mutually influencing. It is not that gesture is the input to spoken form or spoken form is the input to gesture. The growth point is fundamentally both.
Efforts to separate gesture and linguistic form therefore fail. They are tightly bound – either they remain together or they are jointly interfered with; in either case the speech–gesture bond is unbroken. They are held together by the requirements of idea unit formation, stronger than association or convention: Thought itself is the glue. To think while speaking is to be active in both modes at once. The following are illustrations:
Delayed auditory feedback – the experience of hearing your own voice played back after a short delay – produces major speech disturbances but does not break speech–gesture synchrony (McNeill, 1992).
Stuttering and gesture are incompatible. The onset of a gesture inoculates against stuttering and, conversely, once a gesture is ongoing, the onset of stuttering stops the gesture immediately (Mayberry & Jaques, 2000).
People blind from birth, who have never seen gestures and have no benefit from experiencing them in others, gesture as they speak, and do so even to other blind people whom they know to be blind (Iverson & Goldin-Meadow, 1997).
People born without arms “gesture” as they speak; that is, they have the neurological feeling that they gesture with full significance (Ramachandran & Blakeslee, 1998).
Figure 18.2 shows a gesture–speech unit taking form during a narration. The speaker had just watched a cartoon and was recounting it to a listener from memory. We had presented the task as story-telling and did not mention gesture. The speaker was describing an event in which one character (Sylvester) attempted to reach another character (Tweety) by climbing inside a drainpipe. To thwart Sylvester, Tweety had dropped a bowling ball into the pipe. The gesture–speech unit consisted of a gesture showing a moving entity (the bowling ball), its path (downward), and the landmark through which it passed (the pipe), synchronized in a single multimodal package with “[/ and it goes down].” The core idea was something like “falling hollowness” – an idea that does not exist outside growth points, according to Talmy (2000).
Figure 18.2 Speech-synchronized gesture
If we ask (as in the title of McNeill, 2016) “why we gesture,” the answer is not that when speech stops we “talk with our hands.” Gullberg (2013) debunks this ancient idea – when speech stops, gesture stops too. Also, it is not that we gesture because we speak. While that may happen, the growth point offers another reason: Gesture is a template for speech, and we speak because we gesture. Gesture orchestrates speech (not as a conductor or agent, not in a causal relation: speech and gesture form a “harmonious unity” based on the gesture and its temporal, kinesic effects). Efron (1941) described a gentleman in an Eastern European neighborhood of New York City complaining that if you laid a hand on his arm (as was common there) he could not speak. It may have been gesture-orchestrated speech the restraining hand prevented.
****
The first property to describe is an imagery–language dialectic. A dialectic adds energy, propels thought and speech forward, and raises langue to the dynamic. It is a style of cognition special to language. Vygotsky described the process: “The relationship of thought to word is not a thing but a process, a movement from thought to word and from word to thought. […] This flow of thought is realized as an internal movement through several planes […]” (1986, p. 218). A growth point synchronizes a gesture (the “thesis”) with coexpressive speech (the “antithesis”). Coexpressivity means that speech and gesture convey the same meaning in different semiotic modes. The semiotic difference adds energy. The unpacking (the “synthesis”) spreads the energy into speech. The resulting sentence is still langue but is now inhabited dynamically.
The dialectic is not sequential: thesis, antithesis, and synthesis are not steps but logical aspects of the growth point, present all at once. There is, however, an actual time of generation. Imagery and language-forms in a growth point can be activated just partly, with parts of them not used. Such fragments do not vanish. They appear in later cycles and may have a long-range effect on the cohesiveness of discourse.
The dialectic, finally, has a built-in stop-order – the speaker’s intuitions of well-formedness tell her she has attained dialectic synthesis. The ending is also a new beginning, the growth point’s generative energy immediately launching the next growth point in a cycle of energy/repose followed by new energy/repose, and this repeating with fresh meanings as long as cohesion remains and the speaker continues speaking.
****
A dialectic can occur because the growth point has a unique semiotic mode: “dual semiosis.” The growth point inhabits meaning both as imagery and as language at the same time – a global/synthetic/non-combinatoric semiotic face-to-face with a coexpressive analytic/segmented/combinatoric semiotic (Table 18.1). Coexpressivity unifies them. Dual semiosis implies that meanings are always cast in contrasting semiotic modes at points of coexpressivity. The term “coexpressivity” and its variants mean that gesture and the speech synchronized with it express the same meaning or idea. This is a basic property of the growth point from which other properties like the dialectic arise. Even if gesture and speech convey the same content, they remain semiotic opposites.
Table 18.1 Dual semiosis (McNeill, 2012, p. 20, Table 2.1, used with permission of Cambridge University Press)
| Imagery side | Language side |
|---|---|
| Global: Meanings of parts dependent on meaning of whole. | Compositional: Meaning of whole dependent on parts. |
| Synthetic: Distinguishable meanings in single image. | Analytic: Distinguishable meanings in separate linguistic forms. |
| Spontaneous: Forms created by individual on the fly. | Conventional: Forms regulated by sociocultural standards. |
| Additive: Images combine to add new details, but do not create new “higher” gestures or syntagmatic values. | Combinatoric: Linguistic elements combine into new higher units with syntagmatic values created in the process. |
The following apply to Table 18.1’s semiotic contrasts:
By “global,” I mean that the gesture’s parts (= the hands/fingers/trajectory/space/orientation/movement details) have meanings dependent upon the meaning of the gesture as a whole. Only the gesture’s downward motion has independent meaning. It means downward, but this is all, and it is not enough to generate “falling hollowness.” The parts as they reside in the gesture do not have meanings of their own, and the meaning of the whole is not composed out of the parts: rather, significance flows down, from whole to parts. The linguistic semiotic is the opposite. The meaning of the whole (a sentence or other unit) is composed out of its parts, which then must, and do, have their own meanings.
By “synthetic,” I mean that meanings are synthesized into one symbolic form (the “falling hollowness” hand) (synthesis was described by Werner & Kaplan, 1963). In the companion speech, the meaning is analytic, separated into elements spread over the sentence (“goes” + “down”).
The third semiotic contrast is combinatoric potential; linguistic forms possess it, gesture imagery does not, and both exist within the growth point. When linguistic forms combine, new syntagmatic values and new units emerge. Even a single word has the property (hence, labeled “potential”). When gestures combine, they add imagery but not syntagmatic value or new gestures to contain them – part of the semiotic difference.
****
Context is the next topic. It is often regarded as a kind of encyclopedia that the speaker “consults,” “guiding” and “constraining” meaning. That is not the conception of context here. For a growth point, context is not passive; it is active and absorbed into the growth point (cf. Goodwin & Duranti, 1992, for a similar proposal). An implication of the absorption is that one meaning is two things. It is (1) a point of differentiation and (2) the immediate context, a field of meaningful equivalents, that it differentiates: “One meaning” is both. The growth point absorbs the context into its very existence. This makes the growth point inherently dynamic. Merely intending or recalling or associating something is not enough. The field of meaningful equivalents must also be included. This conception differs from familiar “one-thing” conceptions such as “signified content,” “association,” and so on, which regard meaning without absorbing context.
A growth point “absorbing” the immediate context of speaking activates Vygotsky’s “psychological predicate” (not a grammatical predicate). One of Vygotsky’s examples is a crashing clock (Vygotsky, 1986, p. 220): There is a crash in the next room – someone asks “what fell?” (the answer: “the clock”), or “what happened to the clock?” (“it fell”). Depending on the context – here crystallized in the questions – the newsworthy reply (the psychological predicate) highlights different elements. In forming a growth point, the speaker shapes the context in a certain way to highlight the intended differentiation within it while fulfilling a “two-things” meaning, much as the questioner about the falling clock shaped the context of the replies. This logic applies to the psychological predicate (a schematic sketch follows the list below):
It is newsworthy in its context.
It implies and, if need be, shapes the context to ensure its newsworthiness and differentiation.
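To put the two-things logic schematically, here is a minimal sketch in Python – with invented field labels, not anything from Vygotsky or the growth-point literature – showing how the same event yields different psychological predicates depending on which field of equivalents the question opens, and how the “meaning” is the pair, not the reply alone:

```python
def psychological_predicate(question: str) -> tuple[str, str]:
    """Return (field of equivalents, newsworthy differentiation) for
    Vygotsky's crash-in-the-next-room example. The field labels are
    illustrative assumptions, not part of the theory's vocabulary."""
    replies = {
        "what fell?":
            ("things that might have fallen", "the clock"),
        "what happened to the clock?":
            ("things that might have happened to the clock", "it fell"),
    }
    # "One meaning" is both members of the pair: the field and the
    # point differentiated within it.
    return replies[question]
```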
Examples suggest that growth points come in great variety, but as inferred from coexpressive speech/gesture synchrony, a common property runs through them all: psychological predicates are the local maxima of newsworthiness (equally, the point highest in communicative dynamism; Firbas, 1971).
Figure 18.3 illustrates a different speaker describing Sylvester climbing the pipe on the inside, the event in the cartoon immediately preceding the Figure 18.2 downward event the first speaker described. The psychological predicate is the core idea of rising and interiority in “rising hollowness.” The speaker said, “he goe[s / up / th][rough the pipe] this time,” and with “up through” her hand rose and her fingers spread outward to create an interior space. The newsworthiness of “rising hollowness” was the idea of being on the inside. The field of equivalents it differentiated was ways of climbing a pipe. Together, the field of equivalents and the differentiated idea were the growth point – ways of climbing a pipe: on the inside.
Figure 18.3 “Rising hollowness” with “he goes up through the pipe this time.” Computer art by Fey Parrill.
The growth point, as a psychological predicate, means that the mechanism of growth point formation is the differentiation of a point of focus in a context or field of equivalents. A robust phenomenon is that the form and timing of a gesture select just those features that differentiate the psychological predicate in a context that is at least partly the speaker’s own creation (see McNeill, 2005, pp. 108–112).
Growth point differentiation is reflected in the very close temporal connection of gesture strokes with peaks of acoustic output in speech, which also highlight newsworthiness (Nobe, 1996). Parrill (2008), using a method for cuing psychological predicates on demand, found growth points also arising. Their emergence was automatic. Cuing different psychological predicates evoked different sentence structures, behind which were different growth points that the cue also evoked. A flashing arrow pointing at Sylvester’s rounded, bowling-ball-shaped bottom resulted in more ball-subject utterances than an arrow pointing at his head (a cat prompt) – for example, “the ball rolls him down the street” rather than “he rolls down the street.” The gesture following the ball prompt also included manner – the hand rotating rather than moving in the straight path that was common after the cat prompt – a feature the prompt did not convey and that was part of the automatic emergence. This spontaneous manner also suggests a growth point had formed, confirming that growth points and psychological predicates are the same thing.
To see the full genesis of the “up through the pipe” example, we trace the psychological predicates leading up to and then within it, as each psychological predicate differentiates its field of equivalents:
(1.1) and so he tries to get in the building again
(1.2) [he crawls up a pipe]
(1.3) and when he gets up to the bird
(1.4) the grandma hits him over the head with the umbrella
(1.5) and he goes back
(1.6) and [he goes up through the pipe] this time
In (1.2), the newsworthy point was climb a drainpipe and the field of equivalents was something like ways of reaching Tweety. The gesture was the hand rising upward with a relaxed, flattish palm; it depicted only the upward trajectory. The verbal choice of “crawls up” likewise encoded the upward trajectory and added the manner of movement. That the movement was on the outside of the pipe was “obvious” and not newsworthy.
By (1.6), the “rising hollowness” example, on the inside differentiated ways of climbing a pipe, which itself was an updating of the (1.2) point of differentiation, climb a drainpipe (creating both a new field of equivalents and a cohesive link back to (1.2)). Like (1.2), it presented a rising hand, but the hand now formed the “rising hollowness” image, with its palm-up cupped shape. The novelty was the fact that the trajectory was on the inside as opposed to the outside, a contrast that the cohesive thread back to (1.2) made meaningful. Newsworthiness was embodied in the extension of the fingers – the newsworthiness adding energy. Speech also received energy in the stress on “through.” The words “this time” likewise indicated newsworthiness and the contrast to (1.2) but over a different route, to be described below.
But why speak of differentiation and fields of equivalents at all? Why isn’t reference enough? This has a straightforward appeal: The speaker remembers the next event (Sylvester climbs the pipe on the inside) and describes it. This would portray the speaking process as incremental and not contrastive.
I can give three arguments why this position does not explain gestures.
We see the concurrent presence of the field of equivalents in examples like “up through the pipe” – what the gesture shows is not just a reference to going up the inside of the pipe, but a contrast to the previous outside ascent. This contrast was equally part of the gesture’s formation.
Some gestures are pure contrasts without reference. For example, “[they’re sup]posed to be the good guys” (from a narration of the Hitchcock film “Blackmail”; see McNeill, 1992, 2012, 2015), with the hand indexing the left side of space, “[but she] really did kill him,” with the hand indexing the right side – the right-left opposition diagramming (but not illustrating) the concept of virtue-in-appearance vs. crime-in-reality.
Reference is insufficient to explain the succession of growth points. In the (1.6) “up” example, the growth point included “this time” because its field of equivalents, ways of climbing a pipe, contrasted with the (1.2) field of equivalents, ways of reaching Tweety.
I stick with the overall plan. The sentence arises out of the growth point. The growth point retains its identity and dominates everything that occurs. It is not replaced by linguistic form but is the core around which this form rises. These factors are the invisible details of thinking-for/while-speaking that cannot be ignored. Utilizing the speech–gesture system, it is possible to expose the inner workings of this side of speech and infer the real-time genesis of thought and utterance.
****
Differentiation via psychological predicates is thus one way that growth points absorb context. A second is via cohesive threads. We have already seen instances in the links of (1.6) to (1.2). The two modes of absorption convey contrasting new and old relationships to the context: psychological predicates convey novelty; cohesive threads, connections to earlier growth points. Together, they anchor the growth point in the ever-changing waves of the discourse. Thanks to this cohesion, a growth point includes its current context and a web of earlier contexts to guide its unpacking (and, insofar as anticipation is possible, future ones too). It is as if threads reach out to capture a feast of contexts, coherences, presuppositions, memories, and other fields of equivalents. Each moment of speaking, if we look at it in this way, is a synopsis of cohesion. It shapes the growth point, enlarges its world of meaning, becomes part of its field of equivalents, and influences its differentiation. A conversation is mutual web-spinning in this fantasy, building a shared world of meaning. It also illustrates how growth points reflect the social exterior.
I have written about the examples in this section before (McNeill, 2012, 2016), but the connection to cohesion is new and receives exposition here. Adding it deepens our portrayal. Growth points and cohesion work in tandem. Levy and I argued that gestures and referring expressions have similar discourse functions, cooperating in creating new themes and continuing old ones (Levy & McNeill, 1992, 2015; McNeill & Levy, 1993), and so provide a basis for the production of the communicatively dynamic part of an utterance, this being the point of differentiation in a psychological predicate.
Broadly speaking, the threads are of two kinds. Some are imagery, called “catchments” – a phenomenon first noted by Kendon in 1972 (he did not use the term “catchment”); the second kind is built from them. These latter threads are often recycled from earlier gesture–speech dialectics and add cohesion.
A catchment occurs when gesture features recur in at least two (not necessarily consecutive) gestures with the same thematic meaning. Catchments are a way of observing a speaker’s grouping patterns. Each catchment is the material carrier of the imagery that motivated it and a context for the growth point.
Convincing proof of the catchment as a psychologically real phenomenon comes from an ingenious test by Furuyama and Sekine (2007). They noticed a systematic avoidance of gestures with a certain spatial content (involving a reversal of a predominant direction). The avoidance occurred precisely where this content, had it been included, would have disrupted an ongoing catchment. The catchment was the actual force and it blocked imagery that was referentially correct but inconsistent with it.
So, to define the catchment:
A catchment is recognized from recurrences of gesture form features over a stretch of discourse.
A catchment is a kind of thread of consistent dynamic visuospatial imagery running through the discourse and provides a gesture-based window into discourse cohesion.
The logic of the catchment is that discourse themes produce gestures with recurring features; these recurrences give rise to the catchment.
Thus, reasoning in reverse, the catchment offers clues to the cohesive linkages in the text with which it co-occurs, as the sketch below illustrates.
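The definition is concrete enough to operationalize. Here is a minimal sketch in Python – the data structure, feature vocabulary, and two-gesture threshold are illustrative assumptions, not a published coding scheme – recovering catchments as (form feature, theme) pairs that recur across two or more, not necessarily consecutive, gestures:

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Gesture:
    """One hand-coded gesture; the feature vocabulary is invented for illustration."""
    panel: int            # position in the discourse (cf. Figure 18.4's panels)
    features: frozenset   # form features, e.g. {"two-handed", "symmetrical"}
    theme: str            # coder's thematic label

def find_catchments(gestures):
    """Group gestures by (form feature, theme); a pair spanning two or
    more gestures counts as a catchment."""
    groups = defaultdict(list)
    for g in gestures:
        for feature in g.features:
            groups[(feature, g.theme)].append(g.panel)
    return {pair: panels for pair, panels in groups.items() if len(panels) >= 2}

# Toy data echoing the groupings of Table 18.2 below:
coded = [
    Gesture(1, frozenset({"one-handed"}), "Sylvester as solo force"),
    Gesture(2, frozenset({"two-handed", "symmetrical"}), "bowling ball as antagonist"),
    Gesture(3, frozenset({"two-handed", "asymmetrical"}), "ball and Sylvester as equals"),
    Gesture(7, frozenset({"two-handed", "symmetrical"}), "bowling ball as antagonist"),
]
print(find_catchments(coded))
# e.g. {('two-handed', 'bowling ball as antagonist'): [2, 7],
#       ('symmetrical', 'bowling ball as antagonist'): [2, 7]}  (set order may vary)
```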
Figure 18.4 is our example. Table 18.2 lists the catchments of Figure 18.4 and the cohesive threads they launched, as inferred from coexpressive gesture/speech synchrony. The Figure 18.4 Panel 2 growth point is our focus. It launches Catchment 2 and its metaphor thread and absorbs five others – as I call them, the content, linguistic-pattern, accumulated-growth-points, shared-field-of-equivalents, and conversation/interaction threads – two of which were also part of Catchment 2 and all of which converged on Panel 2 and shaped the growth point and its field of equivalents.
Figure 18.4 Catchment view of the “it down” growth point (Panel 2). Transcription by S. Duncan. From McNeill (2016, Figure 4.2). Used with permission of Cambridge University Press.
Table 18.2 Catchment themes of the “it down” growth point (from McNeill, 2012). Used with permission of Cambridge University Press.
| Catchment | Theme |
|---|---|
| Catchment 1 | One-handed gestures – Panels (1) and (6) – tie together the references to Sylvester as a solo force. Part of the growth-point accumulation thread. |
| Catchment 2 | Two-handed symmetrical gestures – Panels (2), (7), (8), and (9) – group descriptions where the bowling ball is the antagonist, the dominant force. The two-handed symmetric gesture form highlights the shape of the bowling ball. The metaphor thread and part of the fields-of-equivalents and linguistic-pattern threads. |
| Catchment 3 | Two-handed asymmetrical gestures – Panels (3), (4), and (5) – group items in which the bowling ball (LH) and Sylvester (RH) are equals differing only in position and direction of motion. The content thread. |
The growth point consisted of the downward thrusting gesture shown, synchronized with the words “it down.” Catchment 2 produced a field of equivalents in which the bowling ball appeared in the role of an antagonist. This set the bowling ball apart from its role in Catchment 3, where the significance was a spatial relationship and the bowling ball was on a par with Sylvester. Coexpressively, the speaker construed the gesture–speech unit not just as releasing the bowling ball but also as a metaphor of conflict: The Good, experienced as the bowling ball going down, vs. The Evil, experienced as Sylvester climbing up (outlandish metaphors are to be expected in cartoon narrations). How this came about requires looking at the sentence in a different way, not only as a verbal form but as a process of metaphor construction which the cohesive threads are shaping. The antagonistic-force metaphor, starting in Panel 2, remained until Panel 9:
bowling ball as antagonistic force: down (Panel 2); inside Sylvester (Panel 7); Sylvester into the bowling alley (Panel 8); strike (Panel 9).
The spoken “it down” and the downward gesture were the cognitive core of the Panel 2 utterance – the “it” coexpressive with the bowling ball as an antagonistic force and the “down” coexpressive with the significant contrast in the field of equivalents that Catchment 2 embodied (the force heading down).
Other cohesive threads were spin-offs of earlier dialectics. For example, Panel 2 contrasts with the Catchment 1 growth-point accumulation thread. This began in a previous growth point (not illustrated), something like ways of getting at Tweety: climb a pipe, which set the stage for the growth point at (1), the (unillustrated) previous psychological predicate (climb a pipe) becoming (1)’s and (2)’s field of equivalents: ways of climbing a pipe: on the inside. The previous growth points and the (1) growth point together thereby formed Catchment 2’s shared-field-of-equivalents thread, which led to Panel 2’s partial repetition of the sentence structure of Panel 1, a “poetic” replication (cf. Jakobson, 1960). The sentences match as closely as possible, given that “drops” is transitive while “going up” is intransitive:
| (Sylvester) | up | in “he tries going up the inside of the drainpipe” |
| (Tweety) | down | in “and Ø(Tweety) drops it down the drainpipe” |
The shared-field-of-equivalents thread launched two more – Catchment 2’s linguistic-pattern thread and Catchment 3’s content thread (about the drainpipe) – wrapping the discourse together. Also, the “tries” in Panel 1 sets the stage for the downward-force metaphor of Panel 2. The linking of growth points created another thread, the very idea that they are cohesively linked, which I will call the accumulation thread. The continued preparation and prestroke hold of “drops” is now explained: “drops” and the downward stroke were absorbing different threads – “drops” the accumulation thread, linking it to the previous growth point; the stroke the antagonistic-force thread, launching a new concept. The stroke waited until the accumulation thread had been fulfilled and then launched the downward force.
The sixth thread, conversation/interaction, arose from the speaker’s ongoing interaction with her listener. It organized the interaction around a pragmatic goal: because the listener had to retell the story from what the speaker told her, the narrative had to be accurate and complete. The thread did not reside in specific verbal/gestural expressions but covered the speaker’s pragmatic deployments overall.
****
Sometimes a construction from langue is a Trojan Horse. It sneaks in and threatens the growth point’s field of equivalents and its differentiation. Part of the growth point is the power to override an errant construction. If the growth point does not deploy this power, speech immediately stops. Overriding does not erase the construction; the override bars it from orchestrating speech. The growth point at Panel 2 had an override. Overriding was no accident and not a failed attempt. It was part of the multiple cohesions the growth point had absorbed, and the override was precise. The Trojan Horse construction was the causative, “Ø(Tweety) drops it down,” which was needed for the growth point but, if allowed to orchestrate speech, would have tangled the accumulation and metaphor threads. Overriding let the metaphor thread hold back until uttering “drops” had fulfilled the accumulation thread. The growth point then launched the metaphor thread with the gesture stroke (see Box 2 in Figure 18.5). If I say the sentence in citation style, a temporal and aspirated “t” break appears at the syntactic “it”/“down” boundary. Such breaks were absent from how the speaker said it. In her speech, “it down” was a single prosodic package, suggesting it had been orchestrated as a whole.
Figure 18.5 Box 2
A different speaker, in a state of confusion, illustrates what happens when an override does not occur and the Trojan Horse enters. The confusion was in navigating the sentence and included a midstream change of fields of equivalents. Confusions like this may be typical of failed overrides. The shutdown is at line (2.2).
(2.1) [and he winds up rolling down the stre][et
(2.2) because it th uh well ac][tually what happens is he I you assume that he swallows this bowling b][all
The speaker seems to have had two fields of meaningful equivalents in mind at once, not in a dialectic but as rivals: what happened next vs. what one supposes. The dropped grammatical subjects in line (2.2) – “it,” “th(e)” – belong to what happened next. That field of equivalents shut down and the second field, what one supposes, took over. Had speech continued, it would have been something like “it (or ‘the bowling ball’) I/you assume goes into him,” with the what one supposes field inserting not simply two words (“I/you assume”) but an entire other-meaning ensemble. Instead, speech stopped. The speaker might have repeated “it/the bowling ball” with “I/you assume” at the start, but this would already have been the what one supposes field. Instead, she kept “you assume he swallows” and went on unpacking.
****
The beat is a simple up/down or in/out motion linked to speech stress-peaks. Its kinesic simplicity belies its semiotic complexity. Dynamically, beats absorb the cohesive threads embodied in stress peaks. They indicate where the growth point has absorbed the thread. Bosker and Peeters (2021) show experimentally “[…] that beat gestures influence the […] perception of lexical stress (e.g. distinguishing OBject from obJECT), and in turn, can influence what vowels listeners hear” (p. 1). A beat at OBject absorbs a thread linking it to an entity; a beat at obJECT, one linking it to an action. The “manual McGurk Effect” the authors identify is set up via gesture–speech unity, conjuring a stressed vowel.
In our narrations, four kinds of beats occur:
(1) To highlight new content. An example is Panel 2 of Figure 18.4. Another, from a narration of Hitchcock’s Blackmail, is “his gIRLfriend, ALIce, Alice WHIte,” with a beat and prosodic stress accompanying each increment of new information – her functional role, first name, and last name. The beats highlight the increments and add cohesive links to the narrative theme targeting the names.
(2) To highlight an anaphor. An example is “the weight came down (with a large downward iconic gesture) and he got clobbered (a beat).” The beat was a miniaturized version of the first gesture, targeting its clause and creating a cohesive link back to the large gesture and its growth point.
(3) To highlight a cataphor – the reverse of (2), creating a cohesive link to the future with an obligation to fulfill it. The beat is a miniaturized anticipation of the larger gesture. For example, “so the next thing he does (the beat creating a link to the following) is go in the front door” (narrative with an iconic for motion).
(4) To highlight discourse significance. For example, a beat with “instead of” indicating a break from expectation in “the grandmother // instead of Tweety.”
****
Finally, growth points can embody both Character and Observer viewpoints. Observer Viewpoint filters everything from the perspective of a detached observer, watching the event as if it occurred on a stage or screen: the hands are the whole character; the space, the space in which the character resides; and the speaker’s own head and body are on the outside, looking in. Character Viewpoint takes the perspective of a participant in the action: the speaker plays the character – her hands the character’s, her motion its, and her body the character in the scene. The viewpoints occur in roughly equal numbers. Once established, a viewpoint is kept intact for more than a single utterance, and often over long stretches. The shifts between them are not random. They set up contrasting story-telling styles: Character, an enactment style; Observer, a picture-painting style. All these qualities – viewpoint, style, energy, differentiation – work together to funnel energy into speech within a viewpoint frame.
A “dual viewpoint” combines the two viewpoints (Parrill, 2009). Dual viewpoints pose contrasts, not alternatives. A unifying theme binds them. In our example (Box 3 in Figure 18.6), the theme was the metapragmatic “this is ironic” (implying the theme, not the words). Each viewpoint has its own material carrier within the gesture: Character Viewpoint, grasping Tweety; Observer Viewpoint, tracing Sylvester’s path. Two viewpoints in one gesture embody the idea of contrast. The cartoon showed Sylvester catapulting himself up to Tweety by throwing a weight onto the other end of a kind of seesaw. He grabbed Tweety, fell back down to the ground, and landed on the seesaw again. This relaunched the weight at the opposite end of the seesaw. It arced through the air and landed on him (Tweety escaping). The gesture potentiated the idea of irony by pitting Sylvester’s viewpoint (an unfolding triumph) against the observer’s (an upcoming disaster).
Figure 18.6 Box 3
The growth points worked as follows. The initial Character Viewpoint grip-gesture at (1) was coexpressive with “grab,” and the speaker continued with it at (2)~(4) as she added an arced trajectory with Observer Viewpoint. At (2), the contrast between the viewpoints created the irony of Sylvester’s premature self-congratulation. Coexpressivity then shifted up a notch to the metapragmatic and became a field of equivalents. Every step – grabbing Tweety, landing, running off with him – differentiated the field of equivalents.
2.2 Dispensing Energy
The dialectic synthesis dispenses the speaker’s energy into a sentence. The energy covers the sentence, experienced in what Wundt called “simultaneous awareness” (Blumenthal, 1970, p. 22). The speaker also experiences a second awareness, the “sequential,” which can be anywhere – at the start, middle, or end, depending on the construction. The two awarenesses are features of the growth point. The simultaneous corresponds to the growth point’s core meaning. The sequential applies to the unpacking – how the energy is disbursed into the construction as it is activated into speech, the transition from Ergon to Energeia. The gesture acts as a template, orchestrating the flow of vocal energy into its preparation, stroke, and retraction phases. The awarenesses give these gesture phases more than descriptive interest. They are a record of the energy of the growth point spreading through the sentence out of langue, bringing life to the static construction. The following diagram traces the energy spreading through “he goes [up through the pipe] this time,” parsed in terms of awarenesses and gesture phases:
| Speaker’s production (2 sec) | Awarenesses and gesture phases |
|---|---|
| “he goes” | Energy flows in; simultaneous awareness of the core meaning; successive awareness of the preparation phase. |
| “[up through the pipe]” | Simultaneous awareness continues; successive awareness of the energy-peak, the stroke. |
| “this time” | Simultaneous and successive awareness end; well-formedness signals that unpacking is complete (retraction is an active part of the sentence). |
How does it all work – and so quickly that the whole process is over in a second or two? The growth point unpacks itself based on its own meaning. It “summons” a construction and words from langue – finding material compatible with its intentions and cohesive ties. The summons meets the requirement that the construction and words, whatever they may be, do not disrupt the differentiation and field of equivalents. The growth point remains intact while the construction builds up around it. The words and constructions the growth point “summons” are usually just right to unpack it into ratified verbal form. The “rising hollowness” growth point summoned what it needed, a declarative structure and words to complete it. The “it down” growth point summoned a causative construction. A construction has its own units and syntagmatic values, slots for words like “this time” and other constructions, and combinatoric potential (Goldberg, 1995). Deployed as a whole, the construction conveys a settled meaning. It also provides the place for the gesture to surface – all this on the timescale of fluent speech. The information load is high. Hesitations and errors reflect the burden, so it is wrong to say the emergence is flawless, but there is, on the whole, a remarkable speed and accuracy to unpacking.
Like the discovery by Marslen-Wilson (1987), who introduced the concept of a cohort for speech-sound recognition, the summons finds constructions for production by forming and winnowing a cohort of its own. A large cohort can be winnowed quite fast. I use the term “langue-feelings” for the intuitions that form cohorts. James (1890) wrote of the feeling of “and” (roughly, more to come) and “but” (contrariness). For “up through the pipe” the langue-feeling is Something Happens. For “it down,” it is Make Something Happen. Feelings are not langue’s systematic features, but a channel into them via one’s own inhabitancies and intentions. Huth, de Heer, Griffiths, Theunissen, and Gallant (2016) mapped regions of semantic-field activity in the brain as subjects listened to extended narrations. The semantic fields could be cohorts that langue-feelings were summoning.
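Read computationally – with the caveat from Section 1 that the growth point aligns with the sciences of the natural, so any program is only an analogy – the summon-and-winnow cycle can be sketched. A minimal illustration in Python; the construction inventory, the langue-feeling labels, and the compatibility test are all invented for the example:

```python
# Hypothetical inventory pairing constructions with the langue-feeling
# each answers to; the feeling labels follow the text's two examples.
CONSTRUCTIONS = {
    "intransitive motion (X goes Y)": "Something Happens",
    "caused motion (X makes Y go Z)": "Make Something Happen",
    "ditransitive (X gives Y Z)": "Make Something Happen",
}

def summon(langue_feeling, preserves_growth_point):
    """Form a cohort of constructions matching the langue-feeling, then
    winnow to those that do not disrupt the differentiation and field of
    equivalents (the predicate stands in for that interpretive judgment)."""
    cohort = [c for c, feeling in CONSTRUCTIONS.items()
              if feeling == langue_feeling]
    return [c for c in cohort if preserves_growth_point(c)]

# "it down": the feeling Make Something Happen forms a cohort of two;
# winnowing keeps the causative the growth point can inhabit.
print(summon("Make Something Happen",
             lambda c: c.startswith("caused motion")))
# ['caused motion (X makes Y go Z)']
```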
3 Empirical Base
The growth point is an empirical concept in that each growth point is a hypothesis concerning an observed speech-gesture-context event.
I will explain a method for testing these hypotheses. But first, we need to consider that some methods, otherwise well suited to normal investigations, pose obstacles under the unique conditions of testing growth points. Some obscure growth points altogether. A standard experimental practice is to isolate the variable you want to study. Gesture experiments isolate gestures from speech to test the gesture’s specific role in communication. It is important to understand that doing this cannot address the growth point. The problem is that isolating a gesture from speech removes gesture–speech coexpressivity. “Coexpressivity” exists when a gesture and its synchronous speech express a shared meaning. Without coexpressivity, growth points do not exist.
An example is Jouravlev et al. (2019), who isolated gestures as stimuli in an experiment recording functional magnetic resonance imaging (fMRI). Gestures and speech were pre-recorded and then presented to the fMRI subjects in two forms: speech alone or gesture alone. The procedure produced a robust response for speech alone but little or none for gesture alone (no more than for grooming movements). The gestures, however, could not have been organized into growth points and would have fallen out of the speech process. A quiet brain is expected and was observed. Despite a misleading title, the experiment shows a property of the growth point: that coexpressivity is indispensable.
Isolation can also take place accidentally. Winter and Duffy (2020) is a case. The experiment had a hidden dimension, a by-product of the experimental design, which separated gesture from speech. Subjects watched a video in which the speaker said, “Next Wednesday’s meeting has been moved forward/backward two days. What day has the meeting been rescheduled to?” The subject was given the choice of either Monday or Friday. In half the cases, the statements were accompanied by a forward gesture with the word “forward” and a backward gesture with the word “backward.” The other half were mismatches, a forward gesture with “backward” and a backward gesture with “forward.” The mismatches had little effect on the subjects’ answers (Winter & Duffy’s Figure 2). This is the crucial datum. Winter and Duffy remark, “our results show that in our task, co-speech gesture cannot carry the mental time line all by itself” (p. 1778).
To understand the test items, the subjects had to form growth points that related a deictic anchor point or “origo” (Bühler, 1982) to the onscreen speaker’s gestures. Matched gesture and speech are coexpressive when the origo is at the speaker’s location, the “obvious” place. It is likely the matched items were understood correctly. For mismatches, however, subjects had to place the origo apart from the speaker’s location – hearing “forward” and seeing a backward gesture, the origo goes in front of the speaker; hearing “backward” and seeing a forward gesture, it goes behind. Given this opacity, an unknown number of mismatches likely did not have proper origo placements. Speech again lost coexpressivity; growth points did not form; gestures fell out of the speech process; and as with a deliberate separation, the absence of an effect shows that coexpressivity is indispensable.
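The origo bookkeeping just described is a small rule system. A minimal sketch, with invented labels (the experiment itself, of course, involved no such program):

```python
def place_origo(word, gesture):
    """Where must the comprehender put the deictic anchor (origo)?

    Paraphrasing the text: matched word and gesture keep the origo at
    the onscreen speaker's location, the "obvious" place; mismatches
    displace it in front of or behind her. Labels are illustrative.
    """
    if word == gesture:
        return "at the speaker"            # coexpressive; a growth point can form
    if (word, gesture) == ("forward", "backward"):
        return "in front of the speaker"   # opaque placement
    if (word, gesture) == ("backward", "forward"):
        return "behind the speaker"        # opaque placement
    raise ValueError("word and gesture must each be 'forward' or 'backward'")
```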
In contrast, when an experiment explicitly includes coexpressivity, other growth-point properties automatically emerge as well. Sekine et al. (2020) observed coexpressivity engendering gesture–speech synchronization – another core growth-point property. In theory, still other core properties also emerged but went untested.
****
Our method excludes speech-isolated gestures. Growth points are inferred from the totality of communicative events, with special focus on speech–gesture synchrony and coexpressivity. The method treats each growth point as a hypothesis about the dynamic construction of one utterance. It keeps faith with a focus on single utterances at the moment of speaking. It is not a test of whether the growth point is a better solution to an empirical issue than other proposals.
A given growth point hypothesis is justified to the extent that speech and gesture (a) are synchronized, (b) are coexpressive, (c) are jointly a “psychological predicate,” (d) hold the same idea in opposite semiotic modes, and (e) follow cohesive threads and catchments. These criteria can be tested for applicability.
You start with a synchronous gesture/speech combination (accurate to within a syllable). Then, using the catchment, you ask what it differentiates – this shows the field of equivalents and coexpressivity. Next, to pin down the field of equivalents, ask how it came to be. Was it the preceding growth point’s differentiation, now a field of equivalents being differentiated itself? Other fields of equivalents form out of cohesive threads. Still others are both. If none of these applies, the hypothesis is rejected. Next, ask how the speech that unpacked the growth point was summoned and whether langue-feeling could have found it. Finally, did the growth point override syntax, and if so, was it to protect the field of equivalents and its differentiation? The answers to these questions must all be yes (a schematic version of this screening follows the next paragraph).
Synchrony is the product (not the condition) of the growth point – a product of thinking with speech and gesture together, when the two modes are brought into direct contact. However, if gesture and speech do not precisely synchronize for some external reason (such as mechanical delay), a growth point can still form. But then the evidence is less certain, and at some point speech–gesture asynchrony grows so large that it signals some other process has intervened. The line for this is far from clear. As a matter of data purity, requiring close speech–gesture synchrony filters out ambiguous data. The ultimate criterion is whether a single idea is embodied in two unlike semiotic modes (with or without different aspects of the idea in each mode) and whether this combination creates a dialectic. To put a figure to it, the time limit on growth-point asynchrony is probably around 1~2 secs., this being the range of immediate attentional focus.
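As a summary of the procedure, here is a minimal sketch of the screening logic – criteria (a)–(e) plus the asynchrony cutoff. The field names, the 2-second figure as a hard threshold, and the reduction of coder judgments to booleans are simplifying assumptions for illustration; the real analysis is interpretive, not mechanical:

```python
from dataclasses import dataclass

# The text's rough figure: asynchrony beyond about 1-2 secs (the range
# of immediate attentional focus) signals another process intervened.
MAX_ASYNC_SECS = 2.0

@dataclass
class GrowthPointHypothesis:
    """Coder judgments for one speech-gesture-context event (invented fields)."""
    asynchrony_secs: float         # (a) |stroke onset - coexpressive speech onset|
    coexpressive: bool             # (b) shared meaning in gesture and speech
    psychological_predicate: bool  # (c) newsworthy differentiation in context
    dual_semiosis: bool            # (d) same idea in opposite semiotic modes
    threads_traced: bool           # (e) field of equivalents from a prior growth
                                   #     point and/or a cohesive thread
    summonable: bool               # langue-feeling could have found the unpacking
    override_protective: bool      # any syntactic override protected the field

def accept(h: GrowthPointHypothesis) -> bool:
    """The answers must all be yes; otherwise the hypothesis is rejected."""
    return (h.asynchrony_secs <= MAX_ASYNC_SECS
            and h.coexpressive
            and h.psychological_predicate
            and h.dual_semiosis
            and h.threads_traced
            and h.summonable
            and h.override_protective)
```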
Mimicry is a surprisingly useful tool for this kind of evidence (see McNeill, 2015). The mimicry must be of speech and gesture together, preserving synchrony. Then it is, in effect, mimicry of the growth point itself. Such mimicry gives the gesture coder insight into the growth point’s cognitive core. I often saw coders in our lab spontaneously mimicking gestures and speech.
I am unaware of statistical concepts that fit this method, which may seem a shortcoming but is intrinsic to the growth-point concept. Should we stop, or should we recognize a limitation of current empirical methods – that they do not enter the world of the growth point? Deep analysis of the growth point may raise entirely new questions of patterns, gestalts, cohesion, differentiation, and so on. Magnusson’s THEME™ detects hidden patterns and seems well suited to many of these questions (Magnusson, 2000).
4 Final Words
At the beginning I said the growth point is an idea not quite like any other. By now, I imagine, this is self-evident. In writing this summary of the growth point, I have drawn on five books, one with Levy (Levy & McNeill, 2015), and various articles and chapters going back three decades. My aim has been to explain it and its conceptual framework, not to compare it to other explanations. What portrait do we find? As the speaker’s momentary state of cognitive being, growth points spread personal energy into verbal and gestural form. Evidence suggests the entire brain is active. I assemble the picture around this theme. While it is active, the growth point refashions the actor’s inhabited meaning into a dialectic of dual gesture/language semiosis; resolves it with a construction, the gesture located at the right place within it; and brings langue to temporary life. Coexpressiveness ensures there is gesture–speech unity and adds energy. Threading the growth point cohesively to other growth points adds more. Psychological predicates pack the growth point’s energy into the dialectic. Awarenesses guide the flow, the simultaneous energizing “the embodied moment of meaning” and the successive “the medium” or material carrier (from the Energeia description). The construction and words that unpacking energizes bring their own worlds of meaning, and growth points exploit, revise, and override them. The growth point can override syntax, withholding energy when it threatens integrity. All of this is our portrait – the growth point drawing on influences of many kinds, fashioning and refashioning energy, and spreading it into gesture and speech.
1 Introduction
Gestures are actions of the body that bridge the mind and the world. Gestures are deeply involved in the processes of producing and comprehending language, as psycholinguistic perspectives on gesture have demonstrated (e.g. McNeill, 2005). They are an integral part of social interaction in cultural context, as ethnographic studies of gesture have richly documented (e.g. Kendon, 2004). They also play an important role in many other cognitive processes, including memory, reasoning, and problem solving (e.g. Alibali, Spencer, Knox, & Kita, 2011; Cook, Yip, & Goldin-Meadow, 2010).
In this chapter, our focus is on cognitive perspectives on gesture. We focus specifically on the role of gesture in cognitive processes that are not directly involved in producing or comprehending language because language is the focus of other sections of this volume. Of course, because language is deeply involved in many cognitive activities, we will draw on psycholinguistic perspectives in some parts of this chapter, and because cognition occurs in context, we will draw on ethnographic perspectives as well. However, in this chapter, we specifically focus on cognitive perspectives on gesture – that is, on gestures as they reflect and affect cognitive processes.
To begin, we briefly consider the scope of psycholinguistic and ethnographic perspectives on gestures, in order to situate cognitive perspectives within the broader landscape of research on gestures. Psycholinguistic theories about gestures have focused primarily on the roles of gestures in producing and, to a lesser extent, comprehending language. Such theories helped to establish gesture as a focus of research in psychology – to put gesture “on the map,” so to speak. David McNeill (1992, 2005) pioneered the psycholinguistic study of gestures with growth point theory, which posits that speech and gesture are two components of a single, irreducible unit of thought (for further details, see McNeill, this volume). In McNeill’s view, concepts are simultaneously both linguistic, as manifested in speech (or sign), and imagistic, as manifested in gestures, and gestures therefore express thought in action.
Later psycholinguistic theories built on the foundation established by McNeill, which emphasized that gesture and language are part of the same system. The interface theory (Kita & Özyürek, 2003) holds that gestures and speech mutually influence one another throughout the process of speaking. Spatial and motoric information activates associated words and syntactic frames, and the linguistic possibilities of the language being spoken, in turn, influence the content and organization of the spatial and motoric information being expressed in gestures. Thus, the expressive possibilities of the speaker’s language influence the speaker’s conceptualization and utterance formation in real time. Other psycholinguistic theories have focused more narrowly on the role of gestures in specific processes involved in language production, including lexical access (Krauss, 1998; Krauss, Chen, & Gottesman, 2000) and packaging of spatial-motoric information into units suitable for verbal expression (Alibali, Yeo, Hostetter, & Kita, 2017; Kita, 2000).
Ethnographic perspectives on gestures focus on the role of gestures in social interaction in cultural context. Research in this tradition often involves highly detailed analyses of interactions that occur in naturalistic settings (e.g. Flood, 2018; Goodwin, 2000; Kendon, 2004; Streeck, 2009; Wolfgram, 2014), and the cultural context within which such interactions occur is often a central focus of the work (e.g. Brookes, 2014; Enfield, 2005). Interactions in different linguistic and cultural settings may require different cognitive abilities; for example, some languages use inflected forms of cardinal direction terms as grammatical markers, so speakers of these languages must mentally track the cardinal directions. One example is the Indigenous Australian language Guugu Yimithirr, which makes “heavy use of the cardinal direction terms in […] talk about position, location, and motion” (Haviland, 1998, p. 29), therefore invoking an “insistent sense of orientation that is a necessary concomitant of such linguistic usage” (Haviland, 1998, p. 30). This sense of orientation is also evident in Guugu Yimithirr speakers’ gestures (Haviland, 1993; see also Levinson, 1997). Other ethnographic studies of gestures highlight cultural practices, such as teaching, that have important cognitive dimensions (e.g. Richland, 2015). Thus, there are aspects of gesture production in different interactional and cultural contexts that reveal variations and applications of cognitive abilities. Nevertheless, we believe it is fair to say that most ethnographic studies of gestures do not focus directly on cognitive processes or abilities. Even so, ethnographic accounts can be illuminating regarding the cognitive bases and implications of gesture production.
Cognitive perspectives on gestures focus on the role of gestures in cognitive processes, including learning, memory, categorization, reasoning, and problem solving. Although such activities often involve language, their aim is not specifically to produce or to comprehend language, but rather to accomplish other objectives, such as solving problems, remembering information, or acquiring skills.
Cognitive perspectives on gestures can be broadly grouped into two “families”: information processing perspectives and embodied perspectives. Information processing perspectives view the cognitive system as one that takes in information from the world, operates on that information in specific ways, and produces responses. Therefore, information processing perspectives on gestures in cognition focus on the role of gestures in taking in, representing, manipulating, and expressing information, and on the consequences of these processes for performing cognitive activities. In contrast, embodied approaches to cognition view the cognitive system as fundamentally grounded in the actions of physical bodies in the material world (Barsalou, Simmons, Barbey, & Wilson, 2003; Glenberg, 1997; Wilson, 2002). Therefore, embodied perspectives on the role of gestures in cognition focus on gesture as a form of physical action and on how such actions shape and even constitute thought. Both embodied and information processing perspectives seek to explain a wide range of gestural phenomena, including the sources of gesture in the human cognitive system and the consequences of producing gestures, both for gesture producers and for gesture comprehenders.
Our focus in this chapter is on the gestures that people use to refer to or represent objects, ideas, events, or locations, either by pointing (e.g. deictic gestures), via resemblance (i.e. iconic gestures), or by indicating or resembling a related object, idea, event, or location (i.e. metaphoric gestures; see McNeill, 1992). We do not directly consider gestures that serve primarily pragmatic functions (e.g. interactive gestures; Bavelas et al., 1992) or discourse functions (e.g. beat gestures; McClave, 1994; Shattuck-Hufnagel & Ren, 2018), or gestures that have conventionalized forms and meanings (e.g. recurrent gestures, Ladewig, 2014; Müller, 2017; or emblems, Matsumoto & Hwang, 2013; Morris, Collett, Marsh, & O’Shaughnessy, 1979). However, it is important to note that these form- and function-based categories are not absolute; for example, some recurrent gestures represent metaphorical actions (e.g. the brushing-away gesture for dismissing an idea; Payrató & Teßendorf, 2014), and given their representational nature, such gestures do fall within our scope.
In this chapter, we review cognitive perspectives on gestures, with a focus on gestures as actions of the body that bridge the mind and the world. We consider four main questions. First, how does the human cognitive system give rise to gestures? We consider theory and empirical work addressing the idea that gestures are based in people’s perceptual and physical experience of the world. Second, do gestures influence how people take in information from the world? We review research suggesting that producing gestures modifies producers’ experience of the world in specific ways. Third, does externalizing information in gestures affect cognitive processing? We consider evidence that expressing spatial and motoric information in gestures has consequences for thinking, including for memory and problem solving. Fourth, how do gestures influence other people’s cognitive processing? We review research indicating that gestures can highlight certain forms of information for others’ thinking, thus engaging social mechanisms that influence cognitive processing.
2 Gestures Are Based on Experience in the World
Gestures occur most frequently when people are thinking or speaking about visual, spatial, or motoric topics (e.g. Lavergne & Kimura, 1987). This observation has led to the proposal that gestures emerge from the imagistic representations that underlie speaking. McNeill’s pioneering growth point theory (1992, 2005), for example, conceptualized gestures as an outgrowth of spatial, imagistic aspects of thought. This contention – that gestures arise from imagistic thought – is shared by many contemporary theories about the mental representations that give rise to gestures (e.g. Chu & Kita, 2016; Goldin-Meadow & Beilock, 2010; Kita & Özyürek, 2003).
Among these views, the Gesture as Simulated Action (GSA) framework (Hostetter & Alibali, 2008, 2019) addresses how an imagistic thought might be realized in a gesture. According to this view, which builds on embodied perspectives on cognition (e.g. Glenberg, Witt, & Metcalfe, 2013), imagistic thought occurs when perceptual and motor systems are activated in the interest of recreating or imagining a physical or perceptual experience. Such images are quite useful, as they allow the cognitive system to access information that may not have been encoded during the original experience or to imagine perceptual consequences of producing a particular action that has not yet been produced (Moulton & Kosslyn, 2009). Further, the formation and maintenance of such images involve activating perceptual and motor systems in ways that resemble how those neural systems are activated during perception and action (e.g. Bone et al., 2019). Because imagery routinely evokes the same neural systems that are involved in perception and action, it is quite possible for activation in these systems to trigger overt movement. Under the GSA framework (Hostetter & Alibali, 2008, 2019), gestures originate from these simulations – from the utilization of the motor system in the interest of forming and maintaining images.
At any moment, the likelihood that an individual will produce an overt movement that is recognizable as a gesture depends on three factors, according to the GSA framework. First, the amount of activation of the motor system matters. Hostetter and Alibali (2008) describe this as the “strength of the simulation,” meaning that imagery can involve the motor system to a greater or lesser degree. Different types of imagery are known to involve the motor system to different degrees, with visual imagery relying less on motor areas than motoric imagery (e.g. Guillot et al., 2009). Consequently, when speakers are using motor imagery to think about how they have interacted with an object, they gesture more than when they are using visual imagery to think about how the object looked (e.g. Hostetter & Alibali, 2010; Kamermans et al., 2019). Similarly, when describing an object that can be easily grasped, people gesture more than when describing an object that cannot be easily grasped (Chu & Kita, 2016), suggesting that affordances of objects can encourage (or discourage) engagement of the motor system.
The second factor that is proposed to affect the likelihood that an individual produces a gesture is the current height of the individual’s “gesture threshold.” Within the GSA framework, the gesture threshold is conceptualized as the individual’s resistance to producing a gesture at a particular moment in time (Hostetter & Alibali, 2008, 2019). Some people may maintain permanently high thresholds, rarely letting simulations be expressed as overt gestures. Further, people may temporarily lower their threshold in particular situations, such as when they perceive the communicative benefit of gesture to be high (e.g. Kelly, Byrne, & Holler, 2011), resulting in increased gesture production in those situations compared to others. The gesture threshold is not necessarily a conscious mechanism, although it can be brought under conscious control, for example, when an individual intentionally decides not to gesture because it may be perceived as impolite. The gesture threshold accounts for the well-documented phenomenon that gesture frequency decreases in some social situations, such as when there is no visible audience (e.g. Bavelas & Healing, 2013), by proposing that simulations can be maintained internally without being expressed outwardly in gesture.
The third factor that is proposed to affect the likelihood that an individual produces a gesture is whether the image and simulation occur simultaneously with speech. Because speech production involves engagement of the motor system in order to make the sounds of speech, it may be more difficult for people to inhibit the motor activation involved in simulation from being expressed as gesture when they are planning and producing speech than when they are engaging in silent thought. Co-thought gestures can and do occur (see Chu & Kita, 2016), but all else being equal, they are less common than co-speech gestures. This is because, when the speech system is activated, it increases motor activation, making it more likely that activation will surpass the gesture threshold. One implication is that speakers who wish not to produce gestures (e.g. for social or cultural reasons) may find it more challenging to inhibit gestures when their motor systems are simultaneously activated for speaking.
Based on these three main tenets, the GSA framework contends that an individual will produce gestures when they engage their motor system during simulation of mental images and when this motor system activity is strong enough to surpass their current gesture threshold. The framework also offers a number of other, more fine-grained predictions about the likely form of a particular gesture, the cognitive effort involved in inhibiting a gesture, and factors that might affect the strength of a simulation or the height of the gesture threshold (see Hostetter & Alibali, 2019, for a review). However, most notably for the present purposes, the GSA framework offers a mechanistic account of gesture production: gestures occur when motor and perceptual experiences in the world are mentally recreated, and as such, gestures resemble – in form, size, and content – the perceptual and action experiences that they are about.
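The decision rule at the heart of the framework can be made concrete with a small sketch. The following toy model is our expository illustration, not part of the GSA framework’s formal apparatus; the names and numeric values (simulation_strength, gesture_threshold, SPEECH_BOOST) are assumptions chosen purely for illustration.

```python
from dataclasses import dataclass

@dataclass
class SimulationState:
    """Illustrative state for a toy GSA-style gesture decision."""
    simulation_strength: float  # motor activation from imagery, 0..1
    gesture_threshold: float    # current resistance to gesturing, 0..1
    speaking: bool              # is the motor system engaged for speech?

SPEECH_BOOST = 0.2  # assumed extra motor activation while speaking

def produces_gesture(state: SimulationState) -> bool:
    """A gesture occurs when total motor activation exceeds the threshold."""
    activation = state.simulation_strength
    if state.speaking:
        activation += SPEECH_BOOST
    return activation > state.gesture_threshold

# Motor imagery while speaking: likely to surpass a moderate threshold.
print(produces_gesture(SimulationState(0.7, 0.6, speaking=True)))   # True
# The same imagery in silent thought with a raised threshold: no overt gesture.
print(produces_gesture(SimulationState(0.7, 0.8, speaking=False)))  # False
```

The sketch captures the framework’s qualitative predictions: stronger simulations, lower thresholds, and concurrent speech all make overt gesture more likely.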
There is accumulating evidence that gestures resemble aspects of the real-world experiences the speaker is describing. For example, the form of speakers’ gestures differs depending on the type of object they are describing. When people gesture about objects that are highly manipulable (e.g. a hammer), they are more likely to mimic how one could interact with the object (e.g. holding and pounding) than to depict the object’s shape (Masson-Carro, Goudbeek, & Krahmer, 2017). Further, when speakers gesture about lifting an object, the movement they produce resembles the actual movement required to physically lift the object. When people describe a real object that they physically lifted, their gestures display a higher, more arced trajectory than when they describe a mouse movement that they used to virtually “lift” a virtual object on a computer screen (Cook & Tanenhaus, 2009). Further, speakers often gesture about small, lightweight objects with one hand rather than with two hands – just as they are more likely to lift small, lightweight objects with one hand rather than two hands (Beilock & Goldin-Meadow, 2010). When speakers describe an object that they perceive as heavy, they produce a slower lifting velocity in gesture – mimicking the increased difficulty associated with lifting a heavy object (Pouw, Wassenburg, Hostetter, de Koning, & Paas, 2020). Finally, the way that people have interacted with an object is also reflected in their gestures. Speakers who had learned a figure by manually feeling its contours were more likely to gesture a tracing movement (as though feeling the figure’s contours) than speakers who had learned the figure by visually inspecting it (Kamermans et al., 2019).
In sum, gestures are deeply tied to the perceptual and motor states that are involved in encoding, storing, and retrieving information. Because gestures reflect perceptual and motor experiences that people have had in the world, they are a way of bridging mental representations with the physical world. By recreating a perceptual or motor experience, simulated actions allow the cognitive system to discover aspects of those experiences that were not fully realized before. In this way, gestures may bring new information into the cognitive system. At the same time, by enacting particular aspects of a mental simulation in gestures, gestures put information back out in the world, a process that reduces the cognitive effort involved in maintaining that information internally and that may help others understand the information.
3 Gestures Modify People’s Experience of the World
Because gestures are actions, and because they reflect real or imagined perceptual and motor experiences, producing gestures should have consequences for cognition more generally. In this section, we review research suggesting that producing gestures modifies people’s experience of the world by influencing the information that they encode. Gestures can provide kinesthetic, visual, and, in some cases, tactile feedback to the cognitive system (Goldin-Meadow & Beilock, 2010; Pouw & Hostetter, 2016). As such, gestures can introduce new information to the cognitive system and can bring that information more fully into conscious awareness. Note that our focus here is on the information that people take in and have available for processing – an information-processing perspective. At the same time, however, our focus is on the nature of the information that people derive from moving their hands and bodies – an embodied perspective.
Some researchers have argued that people use gestures to explore different ways of representing or interacting with objects, inscriptions, or events (Kita, 2000; Kita, Alibali, & Chu, 2017). As an example, consider a child responding to a Piagetian conservation task that involves comparing the amount of water in two glasses. In this example (drawn from an unpublished pilot study), the experimenter had poured water from one of two identical drinking glasses into a taller, thinner glass and asked the child to judge whether the two glasses now contained the same or different amounts of water and to explain her reasoning. The child compared the water levels in the two containers, saying, “Um, because this … (long pause) water is up to here, and this water is up to here.” A close examination of her response highlights the potential role of her gestures in helping her zero in on features of the objects that could inform her quantity judgement. At the outset of her response, as she said, “Um,” she indicated the height of the water in the glass by placing her thumb at the base of the taller glass and her fingers at the water level (see Figure 19.1, Panel A). During her brief mid-utterance silence, she highlighted the narrow width of the glass by using her index finger to trace the circumference of the glass (see Figure 19.1, Panel B). Note that both of these gestures occurred before she expressed any information about the glass in her speech; at this point, she had said only, “Um, because this ...,” and then paused. She then continued her verbal utterance, saying, “water is up to here,” and placed her flat palm, facing down, at the side of the taller glass at the water level (see Figure 19.1, Panel C). Then, as she said, “and this water is up to here,” she moved her hand to the side of the shorter glass and produced a similar indicating gesture at its water level (see Figure 19.1, Panel D). Thus, the girl indicated the height of the water in the taller glass in gesture at the beginning of her response, well before she first verbalized “up to here,” and she also indicated the narrow width of the taller glass, even though she never mentioned this information in her words.
Figure 19.1 Child explaining her solution to a Piagetian conservation of liquid quantity problem (see text). The child’s verbal utterance was: “Um, because this … (long pause) water is up to here, and this water is up to here.” (A) as she said, “Um,” she highlighted the height of the water in the taller glass, by placing her thumb at the base of the glass and her fingers at the water level; (B) during the pause in her speech, she highlighted the narrow width of the glass by using her index finger to trace the circumference of the glass; (C) as she said, “water is up to here,” she placed her flat palm, facing down, at the side of the taller glass at the water level; and (D) as she said, “and this water is up to here,” she moved her hand to the side of the shorter glass and produced a pointing gesture at its water level.
The temporal structure of the girl’s verbal and gestured response is compatible with the idea that she used gestures to explore the perceptual and action-relevant characteristics of the task objects (i.e. the water levels of the glasses and the narrow width of the tall glass), and that these gestures may have brought information about the water levels and the width of the tall glass into her thinking. This example and others like it (see e.g. Alibali, Church, Kita, & Hostetter, 2014; Kita et al., 2017) have been taken as evidence for the idea that people use gestures to explore the perceptual and action-relevant characteristics of objects, events, and ideas that they think and communicate about. As such, producing gestures is thought to increase activation of perceptual and motor information (see Kita et al., 2017).
Some experimental evidence also supports the idea that people use gestures to explore perceptual and motor-relevant features of a task at hand. Kirk and Lewis (2017) addressed this issue in a study using the Alternative Uses Task, in which participants generate alternative ways in which one could use a common object, such as a newspaper or a brick. In their experiment, children performed the task both in a baseline (gesture-allowed) condition and in a gesture-encouraged condition. Participants generated more novel uses in the gesture-encouraged condition, as would be expected if gesture brings new information into the cognitive system. The findings suggest that producing gestures allowed participants to explore the action affordances of the objects. Indeed, many of the gestures that participants produced depicted actions on or with the objects. Kirk and Lewis argued that the gestures made perceptual features of the objects and relevant action schemas more salient for participants, enabling them to generate novel ways of engaging with the objects.
If gestures make perceptual and motoric qualities highly salient, then people should focus more on such information when gestures are allowed than when gestures are prevented. Indeed, people talk more about perceptual and motoric information in conversations when they are allowed to gesture than when they are prohibited from gesturing (Rimé, Schiaratura, Hupet, & Ghysselinckx, 1984). They also rely more on perceptual and motor information in solving problems when gesture is allowed than when gesture is prevented (Alibali & Kita, 2010). As one example, Alibali and colleagues (2011) asked participants to solve gear movement prediction problems, which involved different numbers of gears connected in a sequence. Participants were asked to determine which direction the final gear in the series would turn, if the first gear were turned in a particular direction. Participants who were prevented from gesturing often generated an abstract solution strategy that focused on the number of gears in the series: if the number was even, the final gear would turn in a direction opposite to the first gear, and if the number was odd, the final gear would turn in the same direction as the first gear. In contrast, participants who were allowed to gesture tended to rely on motor simulation strategies; they imagined (and often gestured about) the movement of each individual gear. Thus, the availability of gesture supported participants’ use of an action-based strategy to solve the problems.
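The abstract parity strategy described above can be stated precisely in a few lines. The sketch below is our own illustration; the function name and direction labels are invented for the example and were not part of the original study materials.

```python
def final_gear_direction(n_gears: int, first_direction: str) -> str:
    """Parity strategy for gear-chain problems: adjacent meshed gears turn
    in opposite directions, so the final gear matches the first gear's
    direction when the chain length is odd and reverses it when even."""
    assert first_direction in ("clockwise", "counterclockwise")
    if n_gears % 2 == 1:
        return first_direction
    return "counterclockwise" if first_direction == "clockwise" else "clockwise"

print(final_gear_direction(5, "clockwise"))  # clockwise (odd-length chain)
print(final_gear_direction(4, "clockwise"))  # counterclockwise (even-length chain)
```

The motor simulation strategy, by contrast, corresponds to stepping through the chain gear by gear, which is exactly what the gesturing participants enacted with their hands.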
Producing gestures may also influence participants’ problem-solving strategies by highlighting perceptual features of the problems or by supporting their encoding of those features. In a study of children learning to solve mathematical equivalence problems (e.g. 6 + 9 + 4 = 6 + __ ) (reported in Alibali, McNeil, & Perrott, 1998), some participants were given problems in which the equal sign was printed in red ink. For the problem 6 + 9 + 4 = 6 + __ , one participant wrote an incorrect solution (25) in the blank, but then suddenly moved her hand to trace under the 4 = 6, saying “What’s this?!” She then said, “I, I didn’t look at it that way” while sweeping her hand under the entire problem. She then changed her solution to the problem, and this time applied a correct strategy. In this example, the girl’s initial solution strategy did not take into account the position of the equal sign. Her eventual attention to this perceptual feature of the problem was presumably triggered by the contrasting color ink, and then supported by her gesture tracing under the equal sign (i.e. under 4 = 6). Thus, her gesture appeared to reflect her perceptual encoding of the problem, and it may have even supported her in using that perceptual information to generate a new, correct solution strategy. We turn next to the issue of how gestures may support or influence cognitive processing.
4 Externalizing Information in Gestures Supports Cognitive Processing
Information-processing perspectives on cognition view the human mind as a limited-capacity system. This means that people can maintain or operate on only a finite amount of information at any given time (e.g. Cowan, 2010). Cognitive tasks can be made easier by reducing the amount of information that must be maintained internally (e.g. Risko & Gilbert, 2016), for example, by writing some information down (e.g. Eskritt & Ma, 2014) or by directing one’s gaze to the information in the environment (e.g. Droll & Hayhoe, 2007). Some theorists have argued that gestures also have the potential to relieve processing demands because the information indicated or depicted in gestures does not have to be held internally in the cognitive system (Cook & Fenn, 2017; Pouw, de Nooijer, van Gog, Zwaan, & Paas, 2014). Thus, producing gestures may reduce the working memory demands involved in formulating explanations or solving problems.
Evidence to support this claim comes from studies that use a dual-task paradigm, in which participants are asked to engage in two tasks simultaneously – a primary task (e.g. describing something) and a secondary task (e.g. remembering unrelated information). One can infer how much cognitive capacity is required by the primary task by evaluating performance on the secondary task. The logic is that if the primary task requires more of the cognitive system’s total available capacity, then there will be fewer resources left to devote to the secondary task. In the first study to apply this method to understanding the cognitive consequences of gesture production, Goldin-Meadow, Nusbaum, Kelly, and Wagner (2001) compared the resource demands of explaining how to solve a math problem with gesture to the resource demands of explaining without gesture. For both children and adults, using gesture during the explanations resulted in better memory for unrelated information (letters or words), suggesting that producing gestures had reduced the resource demands of explanation. Later research demonstrated that this effect is unique to gestures that express information; meaningless hand movements do not have the same effect (Cook, Yip, & Goldin-Meadow, 2012).
One explanation for this phenomenon is that when speakers gesture, some of the resource demands of speaking are shifted to the external world (Wagner et al., 2004). In the problem explanations studied by Goldin-Meadow et al. (2001), children explained how they solved equations of the form a + b + c = a + __ , and adults explained how they factored polynomial expressions of the form ax² + bx + c. These solution processes require thinking about both the identities and the positions of elements in the given problems. By pointing to the numbers or other symbols, speakers could index their thinking to the physical environment rather than holding all of the relevant information in their working memory. For example, in the equation 5 + 3 + 6 = 5 + __, it may be easier to jointly refer to both the five and its position by pointing to the 5 on the left side while saying “five,” than by verbally encoding both pieces of information in speech, for example, by saying “the 5 on the left side.” Likewise, in explaining the problem x² + 5x + 6 = (x + 2)(x + 3), the solver needs to explain that the selected factors of 6 (2 and 3) sum to 5, which may be easier to do if pointing to the “6” and the “5” than if holding those values in mind. Importantly, gestures seem to relieve resource demands, even when there are no objects present in the physical environment (Ping & Goldin-Meadow, 2010); merely pointing to a particular location where an object used to be seems to bind the representation being expressed in speech to that spatial location and reduce the mental load involved in explaining. (See the chapter “Gestures in learning and education” in this Handbook for more about research on this topic.)
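To spell out the arithmetic that the solver in the factoring example must coordinate, the following sketch (ours, with an invented helper name) searches for the factor pair referenced in the explanation; pointing to the written “6” and “5” would externalize exactly the values this search holds in working memory.

```python
def factor_pair(c: int, b: int):
    """Find integers p, q with p * q == c and p + q == b,
    as in factoring x^2 + bx + c into (x + p)(x + q)."""
    for p in range(-abs(c), abs(c) + 1):
        if p != 0 and c % p == 0:
            q = c // p
            if p + q == b:
                return p, q
    return None

# x^2 + 5x + 6 = (x + 2)(x + 3): the factors of 6 that sum to 5.
print(factor_pair(6, 5))  # (2, 3)
```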
An embodied perspective suggests an alternative view: namely, gestures ease resource demands because they support processing of visuo-spatial and motoric information. One task domain in which this possibility has been studied is mental rotation (e.g. Shepard & Metzler, 1971). Mental rotation tasks require participants either to imagine how a particular stimulus would look after it has been rotated an indicated amount (e.g. 90 degrees) in a particular direction (e.g. counterclockwise), or to identify, from a set of options, the image that depicts a target object that has been rotated. People regularly produce gestures when completing such tasks, with the form of the gestures being indicative of the solver’s mental strategy. For example, when initially learning the task, people tend to gesture as though they are imagining physically rotating the blocks with their hands; with more experience, they represent the object itself in gestures, rather than the manual action they would use to manipulate the object (Chu & Kita, 2008). Further, people with strong spatial skills are often better at encoding and visualizing the starting configuration of the blocks, and they tend to gesture about this structure more than people with weaker spatial skills (Göksun, Goldin-Meadow, Newcombe, & Shipley, 2013). Such findings support the idea that gestures reflect gesturers’ cognitive processes during mental rotation.
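For concreteness, the computation that mental rotation externalizes can be written as a rigid rotation of the stimulus coordinates. The following 2D sketch is our simplification chosen for brevity; the classic Shepard and Metzler stimuli are three-dimensional block figures.

```python
import numpy as np

def rotate_points(points: np.ndarray, degrees: float) -> np.ndarray:
    """Rotate 2D points counterclockwise about the origin."""
    theta = np.radians(degrees)
    rotation = np.array([[np.cos(theta), -np.sin(theta)],
                         [np.sin(theta),  np.cos(theta)]])
    return points @ rotation.T

# An L-shaped stimulus rotated 90 degrees counterclockwise.
stimulus = np.array([[0, 0], [1, 0], [2, 0], [2, 1]])
print(rotate_points(stimulus, 90).round(6))
# [[ 0.  0.] [ 0.  1.] [ 0.  2.] [-1.  2.]]
```

Carrying out this transformation point by point in imagery is demanding; tracing or miming the rotation with the hands may offload part of that computation.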
However, to make a strong case that gestures support processing of visuo-spatial information, it would be more compelling to show that producing gestures actually aids people in performing mental rotation. Indeed, Chu and Kita (2011) found that speakers tend to gesture more on more difficult mental rotation tasks. Moreover, instructing participants to gesture during mental rotation significantly enhanced their performance, and the benefit persisted on later mental rotation problems that were not accompanied by gesture. Chu and Kita concluded that externalizing the rotation in gestures not only eased the computational demands of mental rotation, but also helped solvers learn to perform such tasks internally. Thus, the cognitive effects of gesture can last beyond a particular gesture. Producing gestures can help people more efficiently process spatial and motor information so that they eventually no longer need gestural support.
In gestures about mental rotation, the hands often “become” a representation of the depicted object that itself can be manipulated. Such gestures embody – that is, literally give a body to – the imagined object. Embodying an object is one of several ways in which gesture may support visuo-spatial processing. Other ways include manipulating an imagined object with the fingers or the whole hand (e.g. holding an imagined object to rotate it, or moving “beads” on an imaginary abacus) and tracing the form of an imagined object (e.g. tracing a word in the air to determine whether it is spelled correctly). It is worth noting that these different functions involve different modes of representation (see Müller, 1998, 2014) – representing, acting, and drawing.
If people rely on gestures to support visuo-spatial processing, then prohibiting or interfering with such gestures should lead to decrements in performance. Some evidence supports this hypothesis, particularly for participants with low visual working memory capacity (Eielts et al., 2020) and for participants who are in the early stages of learning a task, when the resource demands of performance are greatest. For example, Cho and So (2018) found that children who had intermediate skill at using a physical abacus tended to rely on gestures when they performed mental arithmetic, and they struggled when they could not produce gestures; however, children with higher skill performed well with gesture or without. Similarly, young children often hold up fingers to assist with arithmetic calculations, but as their arithmetic skills increase, they shift to using retrieval, and their use of fingers drops off (Jordan, Kaplan, Ramineni, & Locuniak, 2008; Soylu, Lester, & Newman, 2018).
However, some studies have revealed no benefits of allowing vs. prohibiting gestures on cognitive tasks that rely on visuo-spatial processing, such as generating proofs for geometric conjectures (Walkington, Woods, Nathan, Chelule, & Wang, 2019). Further, the mechanisms that underlie the observed benefits of gesture for visuo-spatial processing remain unspecified. In one recent study of abacus experts performing mental arithmetic, Brooks and colleagues found that interfering with motor planning (i.e. preventing simulation of actions) led to poorer performance, although interfering with motor movements (i.e. preventing gestural expression of those simulations) did not (Brooks et al., 2017). It is possible that gestures reflect underlying mental imagery that is critical for performing some visuo-spatial tasks and that interfering with that imagery is detrimental to performance, even if preventing outward production of gestures is not (see Kamermans et al., 2019). Further research is needed to elucidate the kinds of tasks for which producing gestures is beneficial and to better delineate underlying mechanisms.
When perceptual and action-relevant features of a task are important, people may also intentionally use gestures to support their processing of such information. People spontaneously use gestures in this way in a wide range of tasks, including counting objects (e.g. Graham, 1999) and interpreting graphs (e.g. Radford, Demers, Guzmán, & Cerulli, 2003). Alibali and DiRusso (1999) found that gestures helped children to keep track of objects when counting them, and they also helped children to coordinate reciting the counting string (i.e. “1, 2, 3 … ”) and attending to each of the objects. Children who gestured to (i.e. pointed or touched) the objects in an array as they counted them made fewer errors in coordinating these processes (e.g. continuing to recite the count words after the last item, or stopping before the last item) than did children who saw a puppet gesture while they recited the counting string (Alibali & DiRusso, 1999). As another example, train conductors in Japan produce gestures as part of a technique to ensure that they implement operating procedures completely; the system is termed a “point-and-call” system, and it involves reinforcing steps in a task both physically (with gestures) and audibly (with words) (Richarz, 2017). As these examples illustrate, producing gestures can not only increase the likelihood that perceptual or motoric information is available for processing but can also support accurate processing of such information when it is critical to the task at hand.
Gestures may also influence processing of visuo-spatial and motoric information because they provide a natural, analogical way of representing that information. As a spatial and motoric representation, gestures may be readily remembered, and they can be used as an alternative or supplement to verbal forms of representation. For example, when learners gesturally enact an action during encoding, they have better memory for that action than when they encode it through verbal means alone (e.g. Engelkamp & Zimmer, 1994).
This effect extends to speech-accompanying gestures, as well. Cook et al. (2010) asked speakers to view and retell a cartoon, and then tested their memory for events after a three-week delay. Speakers better remembered events for which they spontaneously gestured during their initial retelling than events for which they did not gesture. Although this effect could be due to more salient events being both more likely to be accompanied by gesture and more likely to be remembered, Cook and colleagues demonstrated in a second experiment that instructing participants to gesture also enhanced their memory for events three weeks later. Thus, gesturing about information during encoding seems to encourage storage of the information in a way that is more resistant to decay or forgetting.
It seems clear, then, that externalizing information in gesture makes that information easier to think about, at least in some cases. There are multiple potential mechanisms by which gesture may have this beneficial effect – by offloading information to the environment, by supporting processing of visuo-spatial and motoric information, and by providing a natural, analogical way of representing visuo-spatial and motoric information. Next, we consider how externalizing information through gesture can also make communication about that information easier.
5 Gestures Support People’s Communication of Visuo-Spatial and Motoric Knowledge
We have argued that producing gestures supports processing of visuo-spatial and motoric information. Producing gestures may also support communication of such information, and communicating such information may have consequences for both gesture producers and gesture comprehenders. Simply put, gestures help speakers put visuo-spatial and motoric information into the world, where it can be taken up and used by others. In this section, we also briefly consider gesture’s roles in speech production because speech is the most common form of communication.
Some research has suggested that gestures help speakers retrieve spatial and motoric words from the mental lexicon via a process of cross-modal activation of spatial and motoric ideas. This spreading activation increases activation of lexical items associated with spatial and motoric concepts, facilitating access to those lexical items (e.g. Krauss et al., 2000). This mechanism helps explain why people focus more on visuo-spatial information in speaking and thinking when they are allowed to produce gestures, as discussed above. This mechanism can also explain reports that gesture aids in resolving tip-of-the-tongue states (e.g. Frick-Horbury & Guttentag, 1998; Pine, Bird, & Kirk, 2007). However, it is worth noting that some studies of tip-of-the-tongue states have not revealed beneficial effects of gesture (e.g. Beattie & Coughlan, 1999) or have revealed beneficial effects only for participants with weaker verbal short-term memory (Pyers, Magid, Gollan, & Emmorey, 2021). Other studies have also revealed beneficial effects of non-gestural movements, such as tapping (Ravizza, 2003). Gestures during tip-of-the-tongue states may also have social functions, cuing interaction partners to supply the sought-after words (Goodwin & Goodwin, 1986).
Other research has suggested that gesture helps speakers to “package” spatial and motoric information into verbalizable units (Alibali et al., 2000; Hostetter, Alibali, & Kita, 2007; Kita, 2000). According to the Information Packaging Hypothesis, people use gesture to explore potential “packages” of spatial and motoric information for communicative expression, yielding chunks of information that speakers may focus on in their subsequent utterances, and that listeners may also attend to and process.
When speakers express spatial and motoric information in gestures, their listeners are likely to take in that information. Indeed, two recent meta-analyses (Dargue, Sweller, & Jones, 2019; Hostetter, 2011) yielded strong evidence for beneficial effects of gestures on comprehension, and one (Hostetter, 2011) concluded that speakers’ gestures benefit listeners’ comprehension of motoric information to a greater degree than they benefit comprehension of abstract information. Why might this be the case? One possibility is that, when speakers express motoric information in their gestures, listeners may draw on that information to guide their own simulations of motor actions (see Marghetis & Bergen, 2014). Likewise, when speakers express spatial and perceptual information in their gestures, listeners may draw on that information to guide their own simulations of spatial and perceptual information. Put simply, speakers’ gestures may inform listeners’ simulations.
Speakers’ gestures may also evoke gestures or other actions in their listeners. People sometimes spontaneously imitate the gestures that others produce (e.g. Holler & Wilkin, 2011; Kimbara, 2006, 2008; Vest, Fyfe, Nathan, & Alibali, 2020), and they sometimes imitate others’ actions on objects in their gestures. These mimicked gestures may in turn influence the cognitive processes of those who produce them (e.g. Cook & Goldin-Meadow, 2006). Indeed, some educational technology applications are based on the idea that imitating others’ gestures can bring novel information into the learner’s cognitive system (see, e.g. Nathan & Walkington, 2017).
These findings underscore the importance of gesture in communication and highlight that speakers’ gestures may influence cognitive processing in their listeners, as well as in themselves. Speakers’ simulations give rise to gestures, and producing gestures may in turn influence those simulations. Speakers’ gestures may also influence their listeners’ simulations, and those simulations can also influence listeners’ subsequent gestures. As Marghetis and Bergen (2014) put it, there are “bidirectional causal influences between gesture and simulation, both within the speaker and between speaker and listener” (p. 2000).
6 Conclusion: Gestures as a Nexus of the Body, the Mind, and the World
In this chapter, we have considered gestures as a nexus of the body, the mind, and the world. We began by reviewing evidence that gestures are based on and derive from perceptual experiences of the world (i.e. perceptual states) and physical experiences in the world (i.e. actions). From this perspective, gestures are actions of the body that manifest how people engage with their bodies and minds in the world.
We then considered evidence that producing gestures influences people’s experience of the world by bringing information into their cognitive systems and by highlighting perceptual, spatial, and motoric information for thinking and reasoning. Producing gestures influences the information that people encode about the world because gestures can provide or highlight kinesthetic, visual, and, in some cases, tactile feedback for the cognitive system. As such, people may use gestures as a means to explore the perceptual and action-relevant characteristics of tasks, situations, or ideas. As a consequence, when people produce gestures, they focus more on perceptual and motoric information than when they do not produce gestures. Thus, producing gestures enhances the salience of perceptual and motor information for further cognitive processing. In brief, gestures can both introduce perceptual-motor information into the cognitive system and increase the salience of such information, bringing it more fully into the attentional spotlight and perhaps, in some cases, into conscious awareness.
We also considered evidence that gestures put spatial and motoric information out into the world, and that doing so affects cognitive processing in specific ways. Externalizing information in gestures can reduce the demands of speaking, thinking, and problem solving on the cognitive system because the information expressed in gestures does not need to be held in mind during other processing. Put another way, putting information into the world via gesture can ease the burden of cognitive processing, and can therefore alter the course of cognitive activity.
Finally, we considered evidence that producing gestures supports people’s communication of visuo-spatial and motoric information, and that communicating such information has consequences for the recipients of that gestured information. When speakers use gestures to put visuo-spatial and motoric information into the world, that information can be taken up by others and can influence the content of others’ minds. Speakers’ gestures may enhance listeners’ attention to perceptual and motoric information or may encourage listeners to produce similar gestures themselves.
We have argued in this chapter that gestures play a pivotal role in cognition, as they are a bridge between the body, the mind, and the world. In closing, we would like to highlight two important threads that unify many of the specific claims we have discussed about gestures and cognition. First, gestures are closely tied to action, as they reflect simulations of action on the part of gesture producers (Hostetter & Alibali, 2008, 2019), and they may support gesture comprehenders in simulating actions, either overtly or covertly, as well (Alibali & Hostetter, 2010; Iani et al., 2018; Ping, Goldin-Meadow, & Beilock, 2014). Gestures are actions of the body, yet they simultaneously represent other actions or perceptual states. Future research is needed to address the processes by which speakers abstract from physical actions to produce representational actions in the form of gestures. Research is also needed to investigate the role of action experience in gesture production, including how variations in people’s potential for action influence their gesture production (see Casasanto, 2013).
Second, because gestures represent information, gestures can reveal how producers schematize information in the objects, tasks, events, situations, or inscriptions that they gesture about (Kita et al., 2017). Every mode of representation preserves some information about the thing it represents and omits other information; gesture is no exception. Moreover, the availability of gestures may lead producers to preserve spatial and motoric information, rather than other sorts of information, such as propositional or symbolic information. People’s gestures reveal how they schematize information, and those gestures can guide schematization for the recipients or comprehenders of those gestures, as well. Future research is needed to identify the processes involved in schematization in gestures, both for gesture producers and for gesture comprehenders.
Toward this goal, some scholars have suggested that some gesture forms may derive from more general cognitive schemas, such as image schemas or mimetic schemas (e.g. Cienki, 2013; Zlatev, 2014). Image schemas can be defined as “recurring, dynamic pattern(s) of […] perceptual interactions and motor programs” (Johnson, 1987, p. xiv), such as schemas for PATH or CONTAINER. Mimetic schemas can be defined as preverbal, body-based representations that are learned via imitation (Zlatev, 2014), such as schemas for GRAB or KICK. To illustrate, consider a speaker who is talking about catching a flying insect. The speaker might produce a tracing gesture that depicts the path of the insect’s flight, based on her general image schema for PATH. Or the speaker might produce a gesture that depicts grabbing for the insect, based on her mimetic schema for GRAB. Alternatively (or in addition), speakers may dynamically abstract features of mental simulations that are relevant or important in the moment, and they may spontaneously generate gesture forms to express these important features. To extend the flying insect example, suppose the speaker were focused on the small size of the insect. In this case, she might produce a thumb-and-index finger pinching gesture that depicts the creature’s smallness.
Regardless of the source – whether drawn from previously acquired image or mimetic schemas or generated anew – gestures can reveal the schematic representations that speakers activate, and they may inform listeners’ schematization, as well. Any of the gestures described in the preceding paragraph would be a reasonable and appropriate way to gesture about a flying insect. Importantly, the form of the gesture that a speaker produces can reveal something about what the speaker has in mind – and may also reveal to the listener something about where the speaker is likely to go next in her story.
We close with the idea that gesture is a bridging representation – one that connects concrete or instantiated ideas with more general or abstract ones. Imagine, for example, a person who represents the idea of drinking from a cup in gesture. Their gesture may be informed by their recent experience performing that specific, concrete action – yet it is also a generalized form of action, in the sense that it represents a generalization over many cups, with varying amounts of hot or cold liquid, lifted to the mouth and tipped in different settings, at different times, with different speeds, and so forth. The gesture is abstracted in the sense that it does not accurately model the exact hand shape used in any single instance of drinking from a cup – yet, at the same time the gesture schematizes key information about that class of specific actions, by depicting the hand holding a rigid object and tilting it to the mouth. Thus, gestures “redescribe” concrete, instantiated actions in more abstract ways, and in this sense, gestures connect the concrete and the abstract – or the abstracted.
In sum, gestures are based in people’s experience of the world, they affect how people take in information from the world, and they put information out into the world, where that information can be used by the self or by social others. Thus, gestures play an integral role in cognition, both for gesture producers and for gesture recipients because they are actions of the body that bridge the mind and the world.
This chapter presents and discusses empirical data on the neuropsychology of gesture production. Given the ongoing discussion of the roles of the right and left hemispheres in gesture production, the focus of this chapter is on their specific contributions. The neuroscientific method applied in a given study has a substantial impact on the results obtained, and different methodologies can even yield apparently opposing results concerning gesture production; therefore, the methods, their paradigms, and their limitations will be presented in detail. Based on the research results and the methodological considerations, the chapter will discuss how different aspects of gesture production are linked to left- and right-hemispheric functions such as spatial cognition, nonverbal emotional expression, global and metaphorical thinking, praxis, and language.
1 Gesture Research in Neuroscience
1.1 History
In neuropsychology, gesture research has a long-standing tradition. In 1870, Finkelnburg introduced the concept of “asymbolia” as a new model of aphasia. This model included a wide range of disturbed functions concerning the use of symbols in general, such as musical notation, algebraic, geometrical, and chemical symbols, religious rites, gestures, facial expressions, and pantomime. In contrast, Liepmann (1908) argued that deficits in pantomime gestures could not be explained by asymbolia because this disorder often cooccurred with an impairment in imitating gestures. Visuo-motor imitation requires only that the visually perceived movement be reproduced, not that the meaning of the movement be understood. Therefore, deficits in pantomiming should reflect a general disorder in the conceptualization and execution of spatio-temporal movement concepts. Liepmann’s seminal proposition has been the basis for apraxia models to this day.
Liepmann’s model of apraxia was based on his examination of patients with left hemisphere damage (LHD) who showed – as is to be expected – a palsy of the right half of the body and often also aphasia, but who also could not perform gestures and actions with the left hand on command, although the left hand was not paretic or ataxic (i.e. there was no lack of strength, coordination, or sensation). In order to follow Liepmann’s thoughts, it is important to know that the left cerebral hemisphere controls the contralateral right half of the body and, vice versa, the right hemisphere the contralateral left half (Figure 20.1a).

Figure 20.1a Neurotypical motor control of the right and left hands

Figure 20.1b Left hemisphere damage: paresis of the contralateral right hand

Figure 20.1c Right hemisphere damage: paresis of the contralateral left hand

Figure 20.1d Callosal disconnection: exclusive contralateral control of the right and left hands
Accordingly, in LHD, the right half of the body is paretic and cannot perform movements (Figure 20.1b), and vice versa, in right hemisphere damage (RHD), the left half of the body (Figure 20.1c). Liepmann inferred from the patients’ symptomatology that the left hemisphere must control movement not only for the contralateral right hand but also for the ipsilateral left hand.
Furthermore, since some of these patients were able to imitate movements with the left hand, Liepmann assumed that, in these patients, the selective process of movement execution (as opposed to movement conceptualization) was not disturbed, so that they could execute movements when the movement concept was shown to them. Liepmann localized the process of execution (“innervation”) in the so-called sensomotorium of the motor cortex of each hemisphere. Thus, according to Liepmann, when we perform a movement, first the movement is conceptualized in the left hemisphere or the concept is retrieved from memory (conceptualization), and then it is sent to the respective sensomotorium in the left or right hemisphere, where it is translated into movement (execution). The movement concept is sent to the left-hemispheric sensomotorium when the movement is to be performed by the right hand, and it is sent through the corpus callosum, which connects the two hemispheres, to the right-hemispheric sensomotorium when the movement is to be performed by the left hand (compare Figure 20.1a).
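Liepmann’s two-step model lends itself to a compact algorithmic summary. The following sketch is our own illustrative rendering (all function and variable names are ours; nothing beyond “sensomotorium” is Liepmann’s terminology); it traces the route of a movement concept to the executing hand and shows what the model predicts when the callosal route is interrupted, anticipating the disconnection cases discussed below.

```python
# Illustrative sketch of Liepmann's two-step model (conceptualization -> execution).
# A didactic rendering of the text above, not a validated model; all names are ours.

def liepmann_route(hand: str, callosum_intact: bool = True) -> str:
    """Trace the path of a movement concept to the executing hand."""
    # Step 1: the movement concept is generated (or retrieved) in the left hemisphere.
    path = ["left hemisphere: conceptualization"]
    if hand == "right":
        # The right hand is driven directly by the left-hemispheric sensomotorium.
        path.append("left sensomotorium: execution")
    elif hand == "left":
        # The left hand needs the concept transferred to the right sensomotorium.
        if not callosum_intact:
            return " -> ".join(path + ["callosal transfer BLOCKED: left-hand apraxia on command"])
        path += ["corpus callosum: transfer", "right sensomotorium: execution"]
    return " -> ".join(path)

print(liepmann_route("left"))                         # intact: concept crosses the callosum
print(liepmann_route("left", callosum_intact=False))  # disconnection: left hand fails on command
```

Setting `callosum_intact=False` reproduces the on-command left-hand apraxia that the model predicts for callosal disconnection.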
Liepmann also observed that most of the LHD patients were able to use their left hands normally for daily routine movements, for example, drinking from a glass or using a fork. He proposed, first, that the sensomotorium contains a distinct kinetic memory for certain short, automatized, stereotypically recurring movements and, second, that the object leads the hand: when using a coffee mill, for example, the hand movement is guided by the tool, which facilitates the conceptualization of the movement.
The weak point of Liepmann’s two-step hierarchic apraxia model – (1) conceptualization, (2) execution – is that it does not explain several patient cases, for example, people in whom the visuo-motor imitation of a gesture is disturbed more than the performance of the same gesture on verbal command, although only the latter praxis modality requires a conceptualization of the movement (Reference Ochipa, Rothi and HeilmanOchipa, Rothi, & Heilman, 1994). As a consequence, a recent trend in apraxia research is the development of multidimensional models that consider each praxis modality separately and examine which other cognitive functions are codisturbed; for example, an apraxia in the imitation of finger configurations might cooccur with a deficit in copying complex figures (see below) (Reference Goldenberg, Hartmann and SchlottGoldenberg, Hartmann, & Schlott, 2003). This method enables identification of the cognitive functions with which a praxis modality is associated. The same paradigm is followed in this chapter, that is, to explore which cognitive function the generation of a specific gesture type may be associated with.
1.2 Methodology
This section describes specific neuroscientific methods as well as gesture elicitation and analysis procedures employed in neuropsychological gesture research.
1.2.1 Specific Neuroscientific Methods
As outlined above, a focus of neuropsychological gesture research is on the topographical localization of gesture production in the brain – more precisely, on the determination of brain regions that critically contribute to the production of gestures. The traditional method is the study of patients with circumscribed brain damage. In recent decades, technical development has also enabled the use of neuroimaging techniques for the localization of gesture production.
1.2.1.1 Lesion Studies
In neuropsychological gesture research, lesion studies are typically conducted in patients with LHD or RHD and the two patient groups are compared (only rarely do gesture production studies distinguish between different regions within a hemisphere). The specific deficit in gesture production reflects the impaired competence of the damaged hemisphere. Accordingly, the remaining competence in gesture production is likely to reflect the competence of the undamaged hemisphere. Thus, the paradigm behind lesion studies is as follows: if a deficit in gesture production is found, the damaged hemisphere is crucial to the function that was lost or impaired. A limitation of this method is that it cannot be excluded that remaining undamaged regions in the damaged hemisphere contribute to gesture production, since damage to one hemisphere is rarely complete. However, the direct comparison of LHD and RHD individuals within a study enables relatively reliable identification of right and left hemisphere contributions to gesture production.
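The inference logic of such comparative lesion studies can be made explicit in a few lines. The sketch below is a deliberately simplified illustration of this reasoning (ours, not a published procedure; all names are hypothetical):

```python
def infer_hemisphere(lhd_impaired: bool, rhd_impaired: bool) -> str:
    """Illustrative inference rule behind comparative lesion studies."""
    if lhd_impaired and rhd_impaired:
        return "both hemispheres contribute (or a shared/bilateral function)"
    if lhd_impaired:
        return "left hemisphere crucial for this aspect of gesture production"
    if rhd_impaired:
        return "right hemisphere crucial for this aspect of gesture production"
    return "no evidence of a critical hemispheric contribution from these groups"

# e.g. pantomime of tool use: impaired in LHD, largely spared in RHD (see Section 2.1)
print(infer_hemisphere(lhd_impaired=True, rhd_impaired=False))
```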
The highest degree of reliability can be achieved by investigating patients with callosal disconnection. In this rare patient group, the corpus callosum, which is the biggest neural fiber connection between the left and right hemispheres, is lesioned,Footnote 3 such that information transfer between the two hemispheres is no longer possible (Figure 20.1d). As outlined above, the right hand is controlled by the contralateral left hemisphere, and vice versa, the left hand by the contralateral right hemisphere.Footnote 4 This neural organization implies that information must pass through the corpus callosum whenever the right hand executes concepts of the ipsilateral right hemisphere or, vice versa, the left hand executes concepts of the ipsilateral left hemisphere. If, for example, a person with left-hemispheric speech production writes with the ipsilateral left hand, the control runs from the left hemisphere over the corpus callosum to the motor cortex of the right hemisphere and from there to the left hand. In patients with callosal disconnection, callosal transfer is no longer possible, and thus the left hand can only execute right hemisphere concepts and, vice versa, the right hand only left hemisphere concepts (Reference Gazzaniga, Bogen and SperryGazzaniga, Bogen, & Sperry, 1967; Reference Lausberg, Kita, Zaidel and PtitoLausberg, Kita, Zaidel, & Ptito, 2003; Reference SperrySperry, 1968; Reference Trope, Fishman, Gur, Sussman and GurTrope, Fishman, Gur, Sussman, & Gur, 1987; Reference Volpe, Sidtis, Holtzman, Wilson and GazzanigaVolpe, Sidtis, Holtzman, Wilson, & Gazzaniga, 1982). Therefore, the examination of patients with callosal disconnection enables reliable localization of the production of specific gestures in the left and right hemispheres. Moreover, in these patients the competences of the left and right hemispheres can be directly compared within one individual.
1.2.1.2 Hand Preference Studies
Beginning with Kimura’s seminal studies (Reference Kimura1973a, Reference Kimura1973b) on hand preferences in spontaneous gesture production, neuropsychological gesture research has investigated hand preferences in healthy individuals as an indicator of hemispheric specialization (see detailed review in Reference LausbergLausberg, 2013). As evidenced by behavioral laterality experiments, healthy individuals show a preference for responding with the hand that is contralateral to the hemisphere that performs the task (Reference Zaidel, White, Sakurai, Banks and ChiarelloZaidel, White, Sakurai, & Banks, 1988). As an example, right-handers shift from more right-hand use in verbal tasks, which are processed in the left hemisphere, toward more left-hand use in spatial tasks, which are processed in the right hemisphere (Reference Hampson and KimuraHampson & Kimura, 1984). The limitation of hand preference studies in healthy subjects, however, is that, if required, the intact corpus callosum enables each hemisphere to exert control over the ipsilateral hand. Therefore, other factors can override the effect of hemispheric specialization for a task on the hand preference and induce the use of the hand that is ipsilateral to the hemisphere that performs the task. In gesture production studies, handedness has to be considered first and foremost. Right-handers as compared to left-handers show a trend toward more right-hand use.Footnote 5 Left-handers as compared to right-handers show a trend toward more left-hand use (Reference KimuraKimura, 1973a, Reference Kimura1973b) as well as different gestural roles of the left and right hands (Reference Helmich, Voelk, Coenen, Xu, Reinhardt, Mueller, Schepmann and LausbergHelmich et al., 2021, Reference Helmich, Meyer, Voelk, Coenen, Mueller, Schepmann and Lausberg2022). Furthermore, in one study, in which handedness, however, was not controlled for, males showed a stronger right-hand preference for gestures than females (Reference Saucier and EliasSaucier & Elias, 2001). A semantic purpose, such as talking about the left or the right one of two objects, results in a preference for the left or right hand, respectively, in the accompanying gestures (Reference Lausberg and KitaLausberg & Kita, 2003). Likewise, cultural conventions can determine the hand choice, such as when Arrernte speakers in Central Australia use the left hand to refer to targets that are on the left and, vice versa, the right hand for targets that are on the right (Reference Wilkins, de Ruiter, Van Geenhoven and WarnerWilkins & de Ruiter, 1999). Finally, occupation of the right hand with some other physical activity, such as holding a cup of coffee, can influence the choice of the hand in gesture production.
Thus, if hand preference is used as an indicator of hemispheric specialization in healthy individuals, these factors need to be controlled. Since handedness, in particular, strongly influences the hand choice in gesture production, strictly speaking, only the use of the non-dominant hand is a reliable indicator of hemispheric specialization. As an example, if a right-hander uses the dominant right hand for gesture production, it cannot be reliably distinguished whether (s)he uses the right hand because of her/his handedness or because of hemispheric specialization in the production of that specific gesture. If, however, a right-hander spontaneously uses the nondominant left hand, this can be taken as an indicator of hemispheric specialization (provided the other factors listed above are ruled out).
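The resulting decision rule can be summarized schematically. The following sketch is our illustrative condensation of the reasoning above; the function name and parameters are hypothetical:

```python
def lateralization_evidence(handedness: str, gesture_hand: str,
                            confounds_excluded: bool) -> str:
    """Illustrative rule for reading hand choice as hemispheric specialization.
    'Confounds' stands for semantic purpose, cultural convention, an occupied
    hand, and the like (see above)."""
    dominant = "right" if handedness == "right-handed" else "left"
    if gesture_hand == dominant:
        return "uninformative: handedness and specialization are confounded"
    if not confounds_excluded:
        return "uninformative: other determinants of hand choice not ruled out"
    controlling = "right hemisphere" if gesture_hand == "left" else "left hemisphere"
    return f"evidence of {controlling} specialization for this gesture"

print(lateralization_evidence("right-handed", "left", confounds_excluded=True))
```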
1.2.1.3 Tachistoscopic Experiments
In tachistoscopic experiments, visual stimuli are presented randomly in the left and right visual hemifields. The stimuli are presented for a short time, often 150 ms, such that the individual cannot shift their gaze to the stimulus. As a consequence, the stimuli are first processed in the contralateral visual cortex, that is, stimuli presented in the left visual field are processed in the right visual cortex and, vice versa, stimuli presented in the right visual field are processed in the left visual cortex (Figure 20.2).
Figure 20.2 Visuo-motor processing of lateralized presented stimuli
The paradigm behind this method in gesture production studies is as follows: If the task requires a fast gestural response, for example, depicting the stimulus in gesture, there is a spontaneous preference to use the contralateral hand (see above). Thus, stimuli presented in the left visual field, which are processed in the right hemisphere, will trigger left-hand gestures. Vice versa, stimuli presented in the right visual field, which are processed in the left hemisphere, will trigger right-hand gestures. The preference for the contralateral hand is overridden only if the stimulus requires a hemispherically specialized function. Therefore, deviations from the contralateral hand preference are indicative of a hemispheric specialization in the production of that specific gesture.
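Schematically, the paradigm maps visual hemifield onto processing hemisphere and hemisphere onto responding hand, and reads deviations from the contralateral default as evidence of specialization. A minimal sketch (ours, purely illustrative):

```python
def expected_hand(visual_field: str) -> str:
    """Default expectation: stimulus hemifield -> contralateral hemisphere -> contralateral hand."""
    processing_hemisphere = "right" if visual_field == "left" else "left"
    return "left" if processing_hemisphere == "right" else "right"

def interpret_response(visual_field: str, responding_hand: str) -> str:
    if responding_hand == expected_hand(visual_field):
        return "contralateral default: no inference about specialization"
    # A deviation suggests that a specialized hemisphere captured the response.
    specialized = "right hemisphere" if responding_hand == "left" else "left hemisphere"
    return f"deviation from default: suggests {specialized} specialization for this gesture"

# Stimulus in the right visual field (left hemisphere), yet a left-hand response:
print(interpret_response(visual_field="right", responding_hand="left"))
```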
1.2.1.4 Functional Neuroimaging Studies
In recent decades the spectrum of neuropsychological gesture research has been expanded by neurophysiological and neuroimaging methods. While neurophysiological methods such as event-related potentials (electroencephalography), which measure electrical activity in the brain, are highly valuable for gesture-perception research, they are less suited to gesture-production research because muscle activity would substantially influence the results. In functional neuroimaging studies, such as functional magnetic resonance imaging (fMRI) and functional near-infrared spectroscopy (fNIRS), the neural activity during gesture production is localized via an increase of blood flow in a specific region of the brain. Muscle artifacts are controlled by fixing the head in the scanner (fMRI) or by fixing the device to the head (fNIRS). Among functional neuroimaging techniques, fNIRS has the advantage that the display of hand gestures can be performed in a quasi-natural setting, as compared to the highly unnatural setting in an MRI scanner, which involves lying on one’s back with the head and elbows fixated. However, thus far, fNIRS methodology has focused on block-design paradigms, which require that the same gesture is repeated several times in a row. Thus, the gestures are produced explicitly on command – a condition that differs from the spontaneous, implicit gesture production in natural settings (see below).
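To illustrate what a block-design paradigm of this kind entails, the following sketch generates a schedule in which the same gesture is repeated within each task block, alternating with rest. All labels and timings are hypothetical placeholders of ours, not parameters from any cited study:

```python
# Hypothetical block-design schedule for an fNIRS gesture task (timings are ours).
def block_design(gestures, reps_per_block=5, task_s=20, rest_s=20):
    """Return (label, duration_s) pairs: each gesture repeated within its block,
    alternating with rest blocks, as block designs require."""
    schedule = [("rest", rest_s)]
    for g in gestures:
        schedule.append((f"{g} x{reps_per_block} (on command)", task_s))
        schedule.append(("rest", rest_s))
    return schedule

for label, dur in block_design(["pantomime hammering", "tool-use demonstration"]):
    print(f"{label:45s} {dur:3d} s")
```

The repetition within blocks is exactly what forces explicit, on-command production and distances the paradigm from spontaneous gesturing.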
1.2.2 Gesture Elicitation and Analysis
1.2.2.1 On-Command Gesture Production
In this procedure of gesture elicitation, a specific gesture has to be performed on command. The command can be given in different modes: demonstration of a gesture to be imitated (e.g. Reference Goldenberg, Wimmer, Holzner and WesselyGoldenberg, Wimmer, Holzner, & Wessely, 1985; Reference Króliczak and FreyKróliczak & Frey, 2009), verbal or written command (Reference Frey, Funnell, Gerry and GazzanigaFrey, Funnell, Gerry, & Gazzaniga, 2005; Reference Sunderland, Wilkins and DineenSunderland, Wilkins, & Dineen, 2011), visual presentation of a stimulus (e.g. Reference GoldenbergGoldenberg, 2013; Reference Lausberg, Kita, Zaidel and PtitoLausberg et al., 2003), or presentation of sounds (e.g. Reference Lewis, Brefczynski, Phinney, Janik and DeYoeLewis, Brefczynski, Phinney, Janik, & DeYoe, 2005).
Furthermore, three different gesture types are tested:
(1) Meaningful transitive, for example, pantomimed toothbrushing; these are tool-use pantomime gestures,Footnote 6 which are defined as a gestural display (“as if”) of tool use with an imaginary tool in hand. This gesture type matches the type pantomime applied in other branches of gesture research. A substantial number of neuropsychological gesture studies further include real tool use, defined as the real action with the specific tool in hand and the presence of all relevant physical target objects (e.g. actually hammering with a hammer in hand on a nail), and tool-use demonstration, defined as showing how a tool is used with the actual tool in hand but without a physical target object (e.g. showing hammering with a hammer in hand but without a nail). While the latter two movement forms are not gestures in the stricter sense, they are examined as proxies to tool-use pantomime in order to explore the relation between real tool use and pantomime of tool use.
(2) Meaningful intransitive, for example, military salute, swearing an oath. This gesture type matches the type emblem applied in other branches of gesture research.
(3) Meaningless, for example, extension of only the first, second, and fifth fingers (finger configuration, comparable to hand shape in gesture research), or a flat right hand with palm down at a 90° angle touching the left cheek (hand-head position, comparable to hand orientation in gesture research). These are arbitrary novel static gestures that can be tested only by imitating another person’s demonstration. This gesture type has no equivalent in other branches of gesture research. It is, however, of interest as it reveals which kinetic components of hand gestures (hand shape, hand orientation) – independent of meaning – are generated in which hemisphere.
On-command gesture elicitation has the advantage that it allows a gesture performance to be assessed as correct or incorrect. According to the two-step praxis models (see above), concept and execution errors are distinguished: In concept errors, the correct concept is not retrieved; for example, when asked to pantomime toothbrushing, a floppy throwing movement is demonstrated, that is, the correct movement concept is not recognizable. Execution errors refer to minor errors in which the target movement concept is recognizable but the execution is deficient; for example, when asked to pantomime toothbrushing, a rotating movement in front of the mouth is shown but the hand shape is flat (for a detailed description of the apraxic error types, see Reference Lausberg, Kita, Zaidel and PtitoLausberg, Cruz, Kita, Zaidel, & Ptito, 2003).
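Under the two-step logic, this basic error taxonomy amounts to a small decision rule. The sketch below is our illustrative condensation; the full apraxic error typology in Lausberg, Cruz, Kita, Zaidel, and Ptito (2003) is considerably finer-grained:

```python
def classify_error(concept_recognizable: bool, execution_correct: bool) -> str:
    """Two-step praxis model: concept errors vs. execution errors (illustrative)."""
    if not concept_recognizable:
        # e.g. a floppy throwing movement when toothbrushing was requested
        return "concept error: target movement concept not retrieved/recognizable"
    if not execution_correct:
        # e.g. correct rotating movement at the mouth, but with a flat hand shape
        return "execution error: concept recognizable, execution deficient"
    return "correct performance"

print(classify_error(concept_recognizable=True, execution_correct=False))
```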
With regard to Liepmann’s two-step apraxia model and more recent multidimensional models, the different modes of gesture elicitation and the investigation of different gesture types enable the identification of different components of gesture production. The differentiated analysis of error types provides valuable insights into the gesture production process and enables identification of the distinct production steps in the process, such as de novo concept generation, concept retrieval, and execution. Each step requires specific cognitive competences that can be selectively impaired. Therefore, neuropsychological apraxia tests typically comprise different modes of command and different gesture types (e.g. Goldenberg’s apraxia test, Reference Goldenberg2011).
1.2.2.2 Spontaneous Gesture Production
In this procedure of gesture elicitation, gestures are elicited indirectly by tasks that provide a thematic frame which stimulates gesture production but leaves a free choice of the concepts that are executed in gesture. As an example, in the renarration of the Tweety and Sylvester cartoon (see McNeill, this volume), the gesturer chooses which aspects of the cartoon s/he depicts in the gestures that accompany speech. Further examples of elicitation tasks are semi-standardized interviews, such as those based on the Levels of Emotional Awareness scales (Reference Lane and SchwartzLane & Schwartz, 1987), or cognitive tasks, such as the Stroop (Reference StroopStroop, 1935) or the Tower of London (Reference ShalliceShallice, 1982) tests.
While gesture production on command can be assessed as correct or incorrect, and concept and execution errors can be further classified, the assessment of spontaneous gesture production requires gesture analysis systems (see Part 2, this volume). In neuropsychological gesture research, given that many patients with left hemisphere damage suffer from aphasia, if not from global aphasia with a complete inability to produce language, coding systems have to be employed that enable gestures to be classified independently of speech (Reference LausbergLausberg, 2013, Reference Lausberg2019). Table 20.1 shows different gesture coding systems used in neuropsychological gesture research and illustrates their comparability. The NEUROGES system is used as a reference system because it is detailed, covers all values listed in the other systems, and its objectivity and reliability are empirically established (Reference LausbergLausberg, 2019; Reference Lausberg and SloetjesLausberg & Sloetjes, 2016).
Table 20.1 Comparison of gesture coding systems employed in neuropsychological research
| Reference LausbergLausberg (2019) | Reference EfronEfron (1941/1972) | Reference Freedman, Siegman and PopeFreedman (1972) | Reference KimuraKimura (1973a, Reference Kimura1973b) | Reference McNeillMcNeill (1992) | Apraxia research* |
|---|---|---|---|---|---|

[The body of Table 20.1 did not survive conversion intact. The recoverable fragments show rows aligning Freedman’s “free movements” with McNeill’s beats, deictics, and iconics/metaphorics (the last corresponding to meaningful transitive gestures in apraxia research), as well as rows for object-oriented action (aligned with real tool use), subject-oriented action and self-touch, meaningful intransitive gestures, and meaningless gestures.]

Note: * e.g. Reference Goldenberg and HagmannGoldenberg and Hagmann (1997); ** superordinate categories that comprise subtypes are printed in capital letters; *** comparable gesture types and superordinate categories of different systems are arranged in one row.
2. Gesture Production in the Right and Left Hemispheres
In light of the methodological specificities and limitations outlined above, the following section reviews neuropsychological studies that investigate the roles of the right and left hemispheres in gesture production. Since direct comparisons of the left and right hemisphere functions provide the most reliable results, the focus of the review is on comparative studies.
2.1 Lesion Studies
2.1.1 On-Command Gesture Production
The on-command gesture-production studies in patients with LHD and RHD yield different patterns of impairment for the three gesture types: meaningful transitive, meaningful intransitive, and meaningless.
(i) Meaningful transitive
LHD patients are significantly more impaired than RHD patients in the imitation of transitive gestures, that is, pantomime of tool use (Reference Haaland and FlahertyHaaland & Flaherty, 1984; Reference Roy, Heath, Westwood, Schweizer, Dixon, Black and SquareRoy et al., 2000). The LHD patients’ deficit in producing pantomimes is also found in command modes that require movement conceptualization, that is, in response to visual presentation of the real tool (e.g. Reference Lausberg, Kita, Zaidel and PtitoLausberg et al., 2003), presentation of pictures of the tool (e.g. Reference GoldenbergGoldenberg, 2013), and verbal or written command (Reference Frey, Funnell, Gerry and GazzanigaFrey et al., 2005; Reference Sunderland, Wilkins and DineenSunderland et al., 2011). Often, LHD patients are impaired in pantomime as well as in tool-use demonstration (Reference De Renzi, Faglioni and SorgatoDe Renzi, Faglioni, & Sorgato, 1982; Reference Jarry, Osiurak, Delafuys, Chauviré, Etcharry-Bouyx and Le GallJarry et al., 2013; Reference Randerath, Li, Goldenberg and HermsdörferRanderath, Goldenberg, & Hermsdörfer, 2009).
However, the two faculties may dissociate, and LHD individuals may be impaired in pantomime while tool-use demonstration is preserved (Reference De Renzi, Faglioni and SorgatoDe Renzi et al., 1982; Reference GoldenbergGoldenberg, 2013; Reference Hermsdörfer, Li, Randerath, Goldenberg and JohannsenHermsdörfer, Randerath, Goldenberg, & Johannsen, 2012; Reference Jarry, Osiurak, Delafuys, Chauviré, Etcharry-Bouyx and Le GallJarry et al., 2013; Reference Randerath, Goldenberg, Spijkers, Li and HermsdörferRanderath, Goldenberg, Spijkers, & Hermsdörfer, 2011). Along the same lines, patients with callosal disconnection show with their left hand (separate right hemisphere) a severe apraxia for pantomime, while tool-use demonstration is not or is only mildly impaired (Reference Boldrini, Zanella, Cantagallo and BasagliaBoldrini, Zanella, Cantagallo, & Basaglia, 1992; Reference Frey, Funnell, Gerry and GazzanigaFrey et al., 2005; Reference Lausberg, Kita, Zaidel and PtitoLausberg et al., 2003). In contrast, with their right hand, they perform both tool-use demonstration and pantomime perfectly. These findings indicate that the separate right hemisphere is able to perform tool-use demonstration but not pantomime, while the separate left hemisphere is able to perform both.
Only isolated LHD cases have been reported that show the reverse pattern, with pantomime being performed better than tool-use demonstration (Reference De Renzi, Faglioni and SorgatoDe Renzi et al., 1982; Reference FukutakeFukutake, 2002; Reference Goldenberg, Hentze and HermsdörferGoldenberg, Hentze, & Hermsdörfer, 2004; Reference Jarry, Osiurak, Delafuys, Chauviré, Etcharry-Bouyx and Le GallJarry et al., 2013; Reference Motomura and YamadoriMotomura & Yamadori, 1994). The bidirectional dissociations indicate that, despite substantial overlap in neural control, pantomime and tool-use demonstration also rely in part on distinct competences. Thus, pantomime is not just tool use without a tool; it requires competences of its own.
(ii) Meaningful intransitive
In patients with LHD and RHD, equal degrees of impairment were found in the imitation of intransitive gestures (emblems) (Reference Haaland and FlahertyHaaland & Flaherty, 1984; Reference Heath, Roy, Black and WestwoodHeath, Roy, Black, & Westwood, 2001).
(iii) Meaningless
LHD patients make more errors in the imitation of hand-head positions than of finger configurations, whereas RHD patients display more errors with finger configurations than with hand-head positions (Reference GoldenbergGoldenberg, 1999). The finding of hemispheric specialization for the imitation of hand-head positions versus finger configurations is confirmed by a corresponding pattern of impairment in patients with callosal disconnection (Reference Goldenberg, Müllbacher and NowakGoldenberg, Müllbacher, & Nowak, 1995; Reference Lausberg and CruzLausberg & Cruz, 2004). In response to tachistoscopic presentation of finger configurations and hand-head positions, these patients have a deficit in imitating finger configurations with the right hand (separate left hemisphere), but not with the left hand (separate right hemisphere). In contrast, they have a deficit in the imitation of hand-head positions with the left hand, but not with the right hand. This indicates that imitation of finger configurations relies on right-hemispheric competence, while imitation of hand-head positions requires left-hemispheric competences.
To summarize, pantomime gestures that are produced on command rely on left-hemispheric competences. Dissociations between pantomime and tool-use demonstrations in lesion studies indicate that there is not a perfect overlap in neural control, but that pantomime and tool-use demonstration represent partly distinct cognitive functions. Furthermore, in the visuo-motor imitation of static hand gestures, the right hemisphere contributes to forming spatially complex finger configurations (hand shape), while the left hemisphere contributes to orienting the hand (hand orientation, here: relative to the head).
2.1.2 Spontaneous Gesture Production
In contrast to the on-command studies, which indicate a stronger contribution of the left hemisphere than of the right one to gesture production, the studies of spontaneous gesture production reveal a substantial contribution of the right hemisphere to gesture production.
One clear line of evidence stems from individuals with callosal disconnection. As described above, in these patients the right hand can be controlled only by the left hemisphere and, vice versa, the left hand only by the right hemisphere. Patients with callosal disconnection spontaneously produce considerably more gestures with the left hand than with the right hand (Reference Kita and LausbergKita & Lausberg, 2008; Reference Lausberg, Davis and RothenhäuslerLausberg, Davis, & Rothenhäusler, 2000; Reference Lausberg, Zaidel, Cruz and PtitoLausberg, Zaidel, Cruz, & Ptito, 2007; Reference McNeillMcNeill, 1992; Reference McNeill, Pedelty, Emmorey and ReillyMcNeill & Pedelty, 1995). This has been confirmed for different settings such as personal interviews and renarration of cartoons. Thus, the right hemisphere contributes more to spontaneous gesture production than the left hemisphere. Moreover, since left hemisphere language dominance had been experimentally demonstrated for all these patients, their spontaneous left-hand preference for gestures, which reflects right hemisphere control, proves that a substantial proportion of gestures is generated primarily independently of speech.
These findings are supported by studies of LHD patients as compared to RHD patients (Reference Hadar and SorokerHadar, Wenkert-Olenik, Krauss, & Soroker, 1998; Reference Hogrefe, Rein, Skomroch and LausbergHogrefe, Rein, Skomroch, & Lausberg, 2016; Reference Rousseaux, Daveluy and KozlowskiRousseaux, Daveluy, & Kozlowski, 2010). Reference Hadar and SorokerHadar and colleagues (1998) found in RHD patients a reduction of gestures relative to pictorial input and to lexical production as compared to LHD patients and healthy controls. In particular, iconic gestures (but not emblematic or deictic gestures) were reduced. In contrast, the LHD patients produced the highest rate of gestures relative to pictorial input and to lexical production among the three groups. Likewise, Reference Hogrefe, Rein, Skomroch and LausbergHogrefe and colleagues (2016) reported in patients with RHD, as compared to patients with LHD and to healthy controls, a significant reduction of spontaneous gestures during the renarration of animated cartoons. With their (non-paretic) right hand, RHD patients had a significantly lower mean number of gestures (phasic in space) and produced significantly fewer of them per minute than the controls with either hand. The reduced gesture production in RHD patients was specific to gestures and did not affect other types of hand movements, such as self-touch (on body). In contrast, with the (non-paretic) left hand, LHD patients, most of them with global aphasia, produced significantly more gestures per minute than the controls with either hand and than the RHD patients with the right hand. These studies evidence that damage to the right hemisphere, and not to the left hemisphere, results in a significant reduction of gesture production. The lesion studies prove, therefore, that spontaneous gestures can be produced in the right hemisphere. Given that many of the LHD patients are aphasic or even globally aphasic, their gestures are generated primarily independently of speech.
To summarize, studies in patients with callosotomy and in patients with LHD or RHD evidence that the right hemisphere substantially contributes to spontaneous gesture production, independently of language production.
2.2 Hand Preference Studies
As outlined above, with regard to hemispheric specialization in the production of gestures, in right-handers it is left-hand gestures that are of interest, since in this case handedness can be ruled out as a determining factor. In right-handers, percentages of left-hand gestures among unimanual gestures range between 25 percent and 39 percent (Reference Dalby, Gibson, Grossi and SchneiderDalby et al., 1980; Reference KimuraKimura, 1973a; Reference Lavergne and KimuraLavergne & Kimura, 1987; Reference Sousa-Poza, Rohrberg and MercureSousa-Poza, Rohrberg, & Mercure, 1979; Reference StephensStephens, 1983). As an example, Reference Miller and FranzMiller and Franz (2005) reported for their sample of twelve right-handed individuals a mean of 3.15 seconds per minute spent gesturing with the right hand and 2.32 seconds per minute spent gesturing with the left hand, with great interindividual variability. The question arises as to what causes the production of left-hand gestures in right-handers, in particular since neither handedness nor language lateralization can explain the hand choice in this case.
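To make such distributions comparable across studies, one can condense them into a laterality index – a convention common in laterality research that we apply here purely for illustration; the cited studies did not necessarily report this measure. With $R$ and $L$ denoting the mean gesture time per minute of the right and left hands, Miller and Franz’s (2005) figures yield

$$\mathrm{LI} = \frac{R - L}{R + L} = \frac{3.15 - 2.32}{3.15 + 2.32} \approx 0.15,$$

that is, only a mild right-hand bias on a scale from $-1$ (exclusively left-hand gesturing) to $+1$ (exclusively right-hand gesturing) – consistent with the question, raised above, of why right-handers gesture with the left hand at all.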
Some studies report that the topic influences hand preference for gestures. In right-handers, more left-hand use is found in gestures that are accompanied by an emotional facial expression than in gestures without an accompanying emotional facial expression (Reference Moscovitch and OldsMoscovitch & Olds, 1982). Reference Kita, de Condappa and MohrKita, de Condappa, and Mohr (2007) report a shift toward more left-hand use for those depictive gestures that have a character viewpoint in a metaphor condition. (However, Reference StephensStephens, 1983, reported a stronger right-hand preference among right-handers for metaphorics as well as for iconics.) Other studies investigate hand preferences for specific gesture types. For batons or beats as compared to other gesture types, some studies show a trend toward more left-hand use (Blonder et al., 1995; Reference McNeillMcNeill, 1992; Reference Sousa-Poza, Rohrberg and MercureSousa-Poza et al., 1979; Reference StephensStephens, 1983).
The few gesture studies on left-handers reflect a high heterogeneity within this group, which is linked to differences in language lateralization (Reference KimuraKimura, 1973b) as well as in writing style, that is, left-inverted vs. left-normal (Reference StephensStephens, 1983). Likewise, in left-handers, hand use for specific gestures appears to be more variable than in right-handers (Reference Helmich, Meyer, Voelk, Coenen, Mueller, Schepmann and LausbergHelmich et al., 2022; Reference MeyerMeyer, 2021).
To summarize, left-hand gestures in right-handed healthy individuals can be considered an indicator of a right-hemispheric contribution to gesture production. This appears to apply to gestures that are generated in an emotional or metaphorical context as well as to gestures that lend rhythmical emphasis to a verbal statement. Left-handers show high variability in hand preference for gestures, reflecting the fact that hemispheric specialization is more variable in this group than in right-handers.
2.3 Neuroimaging Studies
Thus far, there are only a few neuroimaging studies that have investigated gesture production. Some of them examine only the left hemisphere (e.g. Reference Balconi, Crivelli and CortesiBalconi, Crivelli, & Cortesi, 2017), which makes a comparison of the contributions of the right and left hemispheres impossible. All of them investigate gesture production on command; none of them consider spontaneous gesture production.
In functional near-infrared spectroscopy (fNIRS) studies, simple meaningless movements of the right and left hands (here: flexion/extension of the thumb) are accompanied by an increase in cerebral oxygenation in the contralateral hemisphere, that is, for the left thumb in the right hemisphere and, vice versa, for the right thumb in the left hemisphere (Reference Helmich, Rein, Niemann and LausbergHelmich, Rein, Niemann, & Lausberg, 2013). In contrast, pantomime, tool-use demonstration, and body-part-as-object (BPO)Footnote 7 gestures show a lateralization to the left hemisphere (Reference Helmich, Holle, Rein and LausbergHelmich et al., 2015).
With an event-related fMRI study design, Reference Hermsdörfer, Terlinden, Mühlau, Goldenberg and WohlschlägerHermsdörfer, Terlinden, Mühlau, Goldenberg, and Wohlschläger (2007) showed an increase of activity in the left intraparietal sulcus in pantomime as compared to tool-use demonstration. (However, this finding is based on a region-of-interest analysis of the left inferior frontal and left inferior parietal cortices.) In an fMRI study by Reference Imazu, Sugio, Tanaka and InuiImazu, Sugio, Tanaka, and Inui (2007), the contrast of pantomime to tool-use demonstration for using chopsticks with the right hand shows significantly greater activity in the left inferior intraparietal lobe. Finally, Reference Lausberg, Kazzer, Heekeren and WartenburgerLausberg, Kazzer, Heekeren, and Wartenburger (2015) investigated tool-use pantomime with either hand in response to visual tool presentation and contrasted it with tool-use demonstration. The fMRI conjunction analysis of the right and left hands’ executions of tool-use pantomime relative to tool-use demonstration shows significant activity in the left middle and superior temporal lobe. While all three fMRI studies examined pantomime and tool-use demonstration, the latter study was the only one in which the study design was perfectly analogous to the above-reported lesion studies, in which differences between pantomime and tool-use demonstration were found in LHD and callosal disconnection patients.
To summarize, for the on-command performance of simple meaningless hand gestures there is no hemispheric specialization, while for tool-use pantomime there is a left-hemispheric specialization. The contrast of tool-use pantomime and tool-use demonstration yields an activation in the left middle and superior temporal gyri. Since the only difference between the two conditions is that the pantomime condition requires the individual to act with an imaginary tool in hand, the left temporal gyri appear to be specifically involved in integrating the mental image of a tool into the gesture execution (see further discussion below).
3. Discussion
At first glance, the above review seems to show conflicting findings on the contributions of the right and left hemispheres to gesture production. While some studies demonstrate the relevance of the right hemisphere, others exclusively indicate left-hemispheric contributions.
If, however, the methodology that was applied is taken into account, it is striking that studies of spontaneous gesture production evidence a substantial contribution of the right hemisphere to gesture production, while studies of on-command gesture production show a relevant role of the left hemisphere. Moreover, if the cognitive-emotional challenges are differentiated, it appears that the left and right hemispheres make specific contributions to gesture production. These two aspects are discussed below.
3.1 The Impact of Investigating Spontaneously Versus On-Command Produced Gestures on the Study Results
As early as 1907, Liepmann reported a patient who was not able to produce a specific gesture on command, but who readily performed the “same” gesture spontaneously in a natural context (Reference Liepmann and MaasLiepmann & Maas, 1907). Since then, there have been numerous reports of patients with LHD, or of the left hand of patients with callosal disconnection (whose control is deprived of left-hemispheric input), who cannot produce gestures explicitlyFootnote 8 upon request, for example, waving goodbye on command, but who can produce the same gesture implicitly without any problem as part of an automated daily routine, for example, waving when actually saying goodbye to somebody. Likewise, patients with LHD or callosal disconnection (left hand) have been reported to spontaneously use a tool correctly in a natural context but to fail in an experimental condition when they were asked to use the same tool on command (e.g. Reference Buxbaum, Schwartz, Coslett and CarewBuxbaum, Schwartz, Coslett, & Carew, 1995; Reference Hermsdörfer, Li, Randerath, Goldenberg and JohannsenHermsdörfer, Randerath, Goldenberg, & Johannsen, 2012; Reference Lausberg, Göttert, Münßinger, Boegner and MarxLausberg, Göttert, Münßinger, Boegner, & Marx, 1999; Reference Liepmann and MaasLiepmann & Maas, 1907). As an example, a patient with callosal disconnection could not even take something out of his trouser pocket with the left hand when he intended to do so, but he could perform the same action when he did not think about it (Reference Lausberg, Göttert, Münßinger, Boegner and MarxLausberg et al., 1999).
As described above, Liepmann proposed that the right hemisphere contains a sensomotorium with a kinetic memory for automatized recurring movements. Later, Reference Rapcsak, Ochipa, Beeson and RubensRapcsak, Ochipa, Beeson, and Rubens (1993, p. 23) suggested that the “praxis system of the right hemisphere is strongly biased toward the ‘concrete’ or context-dependent execution of familiar, well-established routines such as overlearned actions” and intransitive gestures, whereas the left hemisphere is involved in the “‘abstract’ or context-independent performance of transitive movements and in learning of novel movement sequences.” Thus, performing gestures on command appears to require more left-hemispheric competences, while spontaneous gesture production relies more on right-hemispheric competences. Therefore, it is plausible that experimental settings which rely on gesture production on command, such as apraxia tests but also neuroimaging studies, will yield a strong or even exclusive left-hemispheric generation of gestures, while experimental settings in which spontaneous gestures are indirectly elicited, such as in interaction or renarration, will show a stronger right-hemispheric contribution to gesture production.
3.2 The Impact of the Cognitive-Emotional Challenges of the Task on the Study Results
Another methodological factor that strongly influences the study results is the cognitive-emotional challenge of the task. While some tasks require more right-hemispheric functions such as spatial cognition, spatial attention, nonverbal emotional expression including affective prosody, and metaphorical thinking, other tasks address more left-hemispheric functions such as language and praxis. As outlined above, in healthy individuals, the hemisphere that is predominantly engaged in a task influences the hand choice for gesture execution, for example, spatial tasks, which rely on right-hemispheric spatial cognition, induce a shift toward more (contralateral) left-hand use in gesture. It is highly plausible that the right hemisphere not only processes the spatial task and executes the corresponding gesture – as evidenced by the left-hand choice – but also conceptualizes the spatial gesture, which is grounded in spatial cognition. Following this line of thought, the sections below discuss the above findings with regard to gesture production in the right and left hemispheres.
3.2.1 Gesture Production in Relation to Right-Hemispheric Cognitive and Emotional Processes
3.2.1.1 Spatial Cognition and Spatial Attention
The right hemisphere is specialized for spatial cognition, which includes the analysis and construction of spatial relations. It is evident that gestures, as spatial-temporal forms of expression – movement is physically defined as a displacement of a physical body in space over time – are particularly suitable for describing spatial (and temporal) conditions. The link between gesture production and spatial cognition as right-hemispheric functions is evidenced by the fact that right-hemispheric lesions are accompanied by deficits in the production of the spatial components of hand gestures. As described above, individuals with RHD are impaired in the imitation of spatially complex finger configurations (Reference GoldenbergGoldenberg, 1996, Reference Goldenberg1999). Furthermore, in RHD individuals, a deficit in visuo-spatial tests was accompanied by a reduction of iconic gestures (Reference Hadar and SorokerHadar et al., 1998). However, Reference Cocks, Hird and KirsnerCocks, Hird, and Kirsner (2007) could not find a systematic relation between visuo-spatial tests and gesture production in their RHD individuals. As the sample sizes in both studies were small, further research is needed.
In healthy persons, the relation between gesture production and spatial thinking becomes apparent in the fact that they perform more gestures in spatial imagery tasks, when talking about space, landscapes, or sculptures, than when talking about abstract or routine issues (Reference Feyereisen and HavardFeyereisen & Havard, 1999; Reference Miller and FranzMiller & Franz, 2005). In spatial problem-solving tasks, participants show significantly more gestures than in interference tasks such as the Stroop test, which uses color–word interference to measure the extent to which irrelevant stimuli can be suppressed (Reference Barroso, Freedman and GrandBarroso, Freedman, & Grand, 1978). With regard to hand preferences, in tachistoscopic experiments, right-handers prefer the left hand for the gestural depiction of spatial rotation (Reference Helmich, Voelk, Coenen, Xu, Reinhardt, Mueller, Schepmann and LausbergHelmich et al., 2021).
Furthermore, the right hemisphere attends to the whole body and the whole body-external space including the gesture space, while the left hemisphere only attends to the right half of the body and the right half of the body-external space. This implies that RHD individuals who can only rely on their left-hemispheric spatial attention show a neglect of the left half of the body and the left half of the body-external space including the gesture space. Accordingly, individuals with a callosal disconnection show a neglect of the left gesture space in their right-hand gestures, which are controlled by the left hemisphere (Reference Lausberg, Kita, Zaidel and PtitoLausberg et al., 2003). They move their right hands only in the right gesture space and do not cross the body midline, while with their left hands they use the whole gesture space.
3.2.1.2 Emotion and Affective Prosody
It is well documented that the right hemisphere is specialized in recognizing and generating emotional nonverbal expression. Individuals with RHD as well as individuals with callosal disconnection (in the separate left hemisphere) display a deficit in nonverbal emotional communication (Reference Benowitz, Bear, Rosenthal, Mesulam, Zaidel and SperryBenowitz et al., 1983; Reference Blonder, Bowers and HeilmanBlonder, Bowers, & Heilman, 1991; Reference Blonder, Burns, Bowers, Moore and HeilmanBlonder, Burns, Bowers, Moore, & Heilman, 1993; Reference Bowers, Blonder, Feinberg and HeilmanBowers, Blonder, Feinberg, & Heilman, 1991). This has been demonstrated in particular for emotional gestures or gestures displayed in an emotional context. Reference Ross and MesulamRoss and Mesulam (1979) described two patients with RHD as unable to communicate emotion via gestures. Further, individuals with RHD showed a reduction of gesture production especially in discourse samples with high emotional content (Reference Cocks, Hird and KirsnerCocks et al., 2007).
With regard to hand preferences, more left-hand use is found in gestures that are accompanied by an emotional facial expression than in gestures without an accompanying emotional facial expression (Reference Moscovitch and OldsMoscovitch & Olds, 1982). Furthermore, individuals with callosal disconnection display emotional shrugs only with the left shoulder (Reference Lausberg, Zaidel, Cruz and PtitoLausberg et al., 2007).Footnote 9 The left-side preference strongly supports the proposition that emotional gestures are generated in the right hemisphere.
The emotional functions of the right hemisphere also encompass emotional phonemes and affective prosody (e.g. Reference Schirmer, Alter, Kotz and FriedericiSchirmer, Alter, Kotz, & Friederici, 2001). A specific gesture type whose production is coordinated with prosody is the emphasis gesture (baton/beat), which sets rhythmical accents on the verbal utterance. Support for a link between right-hemispheric affective prosody and the production of emphasis gestures is given by the fact that healthy individuals show a relative left-hand preference for emphasis gestures (Blonder et al., 1995; Reference Lausberg, Zaidel, Cruz and PtitoLausberg et al., 2007; Reference Sousa-Poza, Rohrberg and MercureSousa-Poza et al., 1979; Reference StephensStephens, 1983). The shift to more left-hand use suggests right-hemispheric engagement in the production of emphasis gestures.
3.2.1.3 Metaphorical Thinking
The right hemisphere also plays a dominant role in processing metaphors (Reference Ferstl, Neumann, Bogler and von CramonFerstl, Neumann, Bogler, & von Cramon, 2008). In gesture, these can be expressed by presentation gestures (representational gestures). It is well documented that these gestures refer not only to concrete entities but also to abstract entities and metaphors (Reference Cienki and KoenigCienki, 1998; Reference Kita, de Condappa and MohrKita et al., 2007; Reference McNeillMcNeill, 1992; Reference MittelbergMittelberg, 2006; Reference MüllerMüller, 1998), such as the dynamics of an economic development or the psychological closeness or distance in a human relationship. As an example, when a person talks about her/his father and mother, the simultaneous gestural performance of a round form can indicate that s/he experiences them as a “round” entity. In accordance with the fact that metaphorical thinking is a right-hemispheric function, a relative left-hand preference is found for gestures on metaphoric issues (Reference Argyriou, Mohr and KitaArgyriou, Mohr, & Kita, 2017; Reference StephensStephens, 1983) and for depictive gestures that have a character viewpoint in a metaphor condition, as compared to deictics and to other depictive gestures (Reference Kita, de Condappa and MohrKita et al., 2007). Moreover, left-hand use even seems to enhance metaphor explanation (Reference Argyriou, Mohr and KitaArgyriou et al., 2017).
3.2.1.4 Global Versus Analytical Thinking
Finally, the right hemisphere has a preponderance for processing superordinate structures and for organizing events in a narrative structure, in contrast to the perception of details, which is a left-hemispheric specialization (Reference LezakLezak, 1995). A gesture type that matches this cognitive function is the ideographic gesture as defined by Reference EfronEfron (1941/1972, p. 96): “[…] ideographic, in the sense that it traces or sketches out in the air the ‘paths’ and ‘direction’ of the thought pattern. [They] might also be called logico-topographic or logico-pictorial.” While Efron’s ideographic gesture type has rarely been investigated in experimental studies, Reference Lausberg, Zaidel, Cruz and PtitoLausberg and colleagues (2007) reported in two individuals with callosal disconnection a clear left-hand preference for these gestures, indicating right-hemisphere generation.
In summary, some gesture types such as spatial gestures, emotional gestures, emphasis gestures, and presentation gestures, especially those that depict metaphorical or ideographical contents, appear to be directly associated with right-hemispheric nonverbal, cognitive, and emotional functions, such as spatial cognition, emotional processes, prosody, and metaphoric or global thinking. Lesion studies as well as the finding of a left-hand preference in healthy individuals for these gesture types suggest that these gestures are conceptualized and executed in the right hemisphere.
3.2.2 Gesture Production in Relation to Left-Hemisphere Cognitive Processes
3.2.2.1 Tool-Use Praxis
Tool-use praxis is a relevant facet of praxis. The relation between real tool use and its gestural depiction, the tool-use pantomime, has been a focus not only of neuropsychology but also of gesture and multimodal communication research. The close relation between pantomime, tool-use demonstration, and real tool use (see the definitions of the three modalities above) is demonstrated by the numerous studies showing that in LHD all three modalities can be impaired together. However, the precise nature of their relation can best be explored by analyzing dissociations between the three modalities, as evidenced in other studies.
In LHD, pantomime is more often disturbed than real tool use and tool-use demonstration (see above). This can be explained by the fact that – in contrast to pantomime, in which no real tool is present – real tool use and tool-use demonstration are facilitated by perceptual tactile cues provided by the tool (Reference Boldrini, Zanella, Cantagallo and BasagliaBoldrini et al., 1992; Reference Goldenberg and HagmannGoldenberg & Hagmann, 1997; Reference Laimgruber, Goldenberg and HermsdörferLaimgruber et al., 2005; Reference LiepmannLiepmann, 1908; Reference Randerath, Goldenberg, Spijkers, Li and HermsdörferRanderath et al., 2011). It has further been argued that a higher degree of familiarity explains the better performances in real tool use and tool-use demonstration as compared to pantomime. While this is certainly true for real tool use, in particular if executed in a natural context (Reference Buxbaum, Schwartz, Coslett and CarewBuxbaum et al., 1995; Reference De Renzi, Denes and PizzamiglioDe Renzi, 1999; Reference Lausberg, Göttert, Münßinger, Boegner and MarxLausberg et al., 1999), it hardly applies to tool-use demonstration without the target object. On the contrary, it might even be argued that pantomiming tool use is more frequent in everyday communication than tool-use demonstration with a tool in hand but without a physical target.
Findings in patients with callosal disconnection, who show a left-hand (right-hemisphere) apraxia for pantomime cooccurring with preserved tool-use demonstration, provide further clarification (Reference Boldrini, Zanella, Cantagallo and BasagliaBoldrini, Zanella, Cantagallo, & Basaglia, 1992; Reference Frey, Funnell, Gerry and GazzanigaFrey et al., 2005; Reference Lausberg, Kita, Zaidel and PtitoLausberg et al., 2003). It is noteworthy that these patients’ separate right hemispheres have no deficit in tool-use semantics, that is, in associating pictures of tools with movie clips of pantomimes (Reference Frey, Funnell, Gerry and GazzanigaFrey et al., 2005). Furthermore, with the left hand they display not only conceptual errors but also conceptually correct body-part-as-object (BPO) gestures (Reference Lausberg, Kita, Zaidel and PtitoLausberg et al., 2003), for example, showing toothbrushing with the index finger representing the toothbrush. These BPO presentations reveal that the individual correctly recognizes the visually presented tool and associates it with the correct movement concept. Thus, the callosal disconnection patients’ separate right hemispheres must contain intact mental representationsFootnote 10 of tools and the appropriate tool-specific movement concepts. However, the separate right hemisphere does not have the competence to perform tool-use actions with an imaginary tool in hand, that is, to integrate the projected mental image of the tool into the execution of the movement concept. (More specifically, the hand is shaped around the imaginary tool and executes the tool-specific movement concept with it.) Likewise, kinematic studies reveal that grip formation and other kinematic variables differ significantly between real grasping of objects and pantomimed grasping beside the object (Reference Goodale, Jakobson and KeillorGoodale, Jakobson, & Keillor, 1994; Reference Laimgruber, Goldenberg and HermsdörferLaimgruber, Goldenberg, & Hermsdörfer, 2005). Reference Goodale, Jakobson and KeillorGoodale et al. (1994) argued that pantomimed grasping was driven by stored perceptual information, that is, by a mental representation of the object, even if the object was visually present, while real grasping of the object relies on the visuo-motor online control system that directs actions in real time. fMRI studies suggest that the left middle and superior temporal gyri are specifically involved in integrating the projected mental image of a tool into the execution of a tool-specific movement concept (Reference Lausberg, Kazzer, Heekeren and WartenburgerLausberg et al., 2015), as these regions are activated when participants pantomime with an imaginary tool in hand, as contrasted with demonstration with the tool in hand.
To summarize, in contrast to real tool use and tool-use demonstration, in which a real tool is held in the hand, pantomiming tool use with an imaginary tool in hand is based on specific cognitive processes: the mental image of the tool is projected into the gesture space, and the hand is shaped around it and acts with it. While real tool use can in most cases be conducted by both the left and the right hemisphere, the faculty to pantomime tool use is exclusively left-hemispheric (for further discussion, see Reference Lausberg, Kita, Zaidel and PtitoLausberg et al., 2003, Reference Lausberg, Kazzer, Heekeren and Wartenburger2015).
3.2.2.2 Language
As outlined above, spontaneous gestures can be generated in the right hemisphere and thus, primarily independently of speech production. However, it is evident that, at least as a secondary process, gesture and speech are temporally and semantically linked. The following section will discuss the implications of the right-hemispheric generation of gestures for assessing their relation to speech.
Gestures, as spatio-temporal forms of expression generated in association with right-hemispheric processes such as spatial cognition or emotional expression, provide information in a fundamentally different form than speech does. For example, non-representational emphasis gestures create accents and rhythms, while representational presentation gestures realize visual, spatial, sensory, or motor mental images. In most cases, the gestural and verbal messages complement each other by expressing different aspects of one concept. As an example, the combination of the spoken word “ocean” and a presentation gesture with the hand gently stroking along an imaginary flat surface builds up the message of a calm ocean. In healthy individuals, the corpus callosum is the neural basis for integrating a right-hemispheric mental image and a left-hemispherically generated word.
In contrast, in gesture–speech mismatch (McNeill, 1992), the dissociation between verbal and gestural message reflects the presence of two concepts – one expressed verbally and the other gesturally. A neuropsychological explanation for the appearance of such dissociations is the representation of one concept in the right hemisphere and of the other concept in the left hemisphere. In patients with callosal disconnection, nonverbal–verbal dissociations are well documented (Lausberg et al., 2000; Sperry, 1967). While these gesture–speech mismatches may occur spontaneously, they can also be provoked in tachistoscopic experiments. A patient with callosal disconnection, to whom a stimulus was presented in the left visual field (right hemisphere) (see Figure 20.2), nonverbally nodded “Yes” and verbally said “No” at the same time. In fact, both answers were correct: the right (nonverbal) hemisphere could see the stimulus in the contralateral left visual field and therefore nodded “Yes,” whereas the left (verbal) hemisphere saw no stimulus in its contralateral right visual field and therefore verbally answered “No.”
In healthy persons, dissociations between verbal and gestural expressions can occur, too. An everyday example of such a dissociation is a person giving directions who says “turn right” but at the same time points to the left. When it comes to spatial issues, the gesture generally gives the correct information, since gestures as spatio-temporal forms of expression enable the direct depiction of the spatial image. In contrast, the verbal statement requires a translation of the spatial image into a phonetic expression – a transformation process that is more susceptible to errors. Thus, speech–gesture dissociations can be observed particularly in experiments on spatial cognition. Participants show different cognitive strategies simultaneously in gesture and speech in experiments on the planning of spatial tasks (Garber & Goldin-Meadow, 2002). The same was documented in primary school children in Piaget’s behavioral experiments (Church & Goldin-Meadow, 1986; Piaget, 1962). Dissociations between gestural and verbal expression can also occur with emotional problems, as illustrated by the following example from psychotherapeutic practice: A patient performed a left-hand gesture as if vehemently pushing away something or someone, and at the same time she said: “ … as I care for others.” (Context of the utterance: “And … I am a person who ducks out of disputes. I always want to have the ideal world. And … today, it is scary … you always have to watch out that you come into your own … and actually, I don’t like that … I always want, I always think … as I care for others and look after them I always think others also look after me.”) Thus, the gestural expression reflected an unconscious impulse to push others away that did not concur with her verbalized conscious self-perception.
In this connection, it is important to note that facts are represented correctly in gestures, independently of speech. Experimental studies in which participants renarrated animated cartoons show that the gestural depiction of the reference stimuli is correct, independent of the verbal description. Scenes with geometric objects (square, triangle, ball) moving in various ways (rolling, jumping, sliding, etc.) and in various spatial relations to each other (apart, one after another, toward each other, etc.) were depicted precisely and correctly in the gestures that spontaneously accompanied speech, even though the participants did not know that their spontaneous, implicitly displayed gestures were the object of research. This applied also when gestures were performed explicitly, that is, when the participants were requested to depict the scenes in gesture (Lausberg & Kita, 2003; Lausberg et al., 2003). Thus, the scenes were reflected correctly in both implicit and explicit gestural presentation.
Neurophysiological experiments employing event-related potentials reveal that the recipient’s cerebral event-related electrical activity is altered when s/he is presented with stimuli containing a speech–gesture dissociation (Kelly, Kravitz, & Hopkins, 2004), for example, the word “fat” together with a gesture showing a slim entity. While the incongruity is immediately reflected in the electrical brain activity in the course of stimulus processing, this does not necessarily imply that the recipient consciously identifies the dissociation. Rather, the recipient might have a vague feeling that something about the utterance is strange, without being able to determine precisely where this intuition stems from. This has methodological implications for gesture analysis. If gesture and speech are analyzed together from the outset, a bias arises: the gesture is interpreted in terms of the verbal utterance. Thus, there is a risk that the genuine gestural message is neglected and, as a consequence, that gestural–verbal dissociations go undetected. Only a two-stage procedure – first analyzing gestures without speech, then analyzing the relation between gesture and word (Bressem et al., 2013; Davis & Hadiks, 1995; Davis, Walters, Vorus, & Connors, 2000; Lausberg, 2013) – enables the detection of gestural–verbal dissociations.
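To make the two-stage logic concrete, the following minimal sketch in Python is our own illustration, with hypothetical data structures, labels, and a hypothetical one-second alignment window; it is not drawn from any of the cited coding systems. Gestures are coded from the muted video first, and only afterwards related to the speech transcript, so that dissociations can surface rather than being coded away:

```python
from dataclasses import dataclass

@dataclass
class GestureCode:
    time: float   # onset of the gesture stroke, in seconds
    meaning: str  # meaning coded from the muted video alone

@dataclass
class SpeechCode:
    time: float
    meaning: str  # meaning of the co-occurring verbal utterance

def stage_one(video_segments):
    """Stage 1: code each gesture from the muted video only."""
    # In practice a trained rater assigns the meanings; here they are given.
    return [GestureCode(t, m) for t, m in video_segments]

def stage_two(gestures, speech, window=1.0):
    """Stage 2: relate each gesture code to temporally adjacent speech."""
    relations = []
    for g in gestures:
        for s in (s for s in speech if abs(s.time - g.time) <= window):
            label = "match" if s.meaning == g.meaning else "dissociation"
            relations.append((g, s, label))
    return relations

# Hypothetical example: "turn right" spoken while pointing to the left
gestures = stage_one([(12.3, "direction:left")])
speech = [SpeechCode(12.4, "direction:right")]
for g, s, label in stage_two(gestures, speech):
    print(label)  # prints "dissociation"
```

The point of the ordering is that the gesture’s meaning is fixed before the transcript is consulted, so the genuine gestural message cannot be reinterpreted in the sense of the verbal utterance.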
Concerning the relation of speech and gesture, it should also be noted that a mental concept can be expressed first gesturally and only later verbally. Developmental studies using Piaget’s behavioral experiments suggest that children solve a spatial problem correctly via gesture at an early stage of their development but can answer correctly in words only at a later stage: “ […] we have found that correct explanations are produced in gesture before speech in children acquiring both conservation and mathematical equivalence, […]” (Goldin-Meadow, Alibali, & Church, 1993, p. 293). As the correct gestural responses may occur together with false verbal answers – that is, a dissociation as described above – Goldin-Meadow and colleagues propose that the dissociation between correct gestural and false verbal answers indicates a transitional state in which the child acquires a new cognitive concept (see also Novack & Goldin-Meadow, this volume). However, it is not only gestures that contradict the verbal utterance that can indicate that a person is in a process of development, but also gestures that at first appear to have no reference to the accompanying verbal utterance and can be identified only later as complementary. Mahl (1968, pp. 323–324) reports from his detailed nonverbal analyses of psychoanalytic sessions that the gestural expression of a mental image can precede the verbal utterance by a considerable amount of time. One of his examples: “Mrs. B. […] discussed feelings of inferiority toward her husband, and while doing this she momentarily placed her fingers to her mouth. Three minutes later she was stating that her sense of inferiority dated from childhood when she felt she was not as pretty as her sister because she had two buck teeth.” Referring to a series of examples from several patients, he summarizes: “In all of the preceding observations, spontaneous verbalization followed the nonverbal behavior within minutes. The time interval may be greater. In some of our observations the action occurs one day and the spontaneous verbalization follows 1 or 2 days later.”
To summarize, the fact that gestures can be generated primarily in the right hemisphere, while language is produced in the left hemisphere, provides a neuropsychological basis for understanding both the complementarity and the dissociation between gestural and verbal messages. Gesture and speech might express complementary aspects of one concept, with the gesture directly depicting a visual, spatial, sensory, or motor mental image. However, particularly under conditions of cognitive or emotional problem solving, semantic gesture–speech dissociations might occur, with gesture and speech expressing two different concepts. Furthermore, the imagistic processing and, hence, the gestural depiction might precede verbalization, such that a temporal gesture–speech dissociation occurs.
1 Why Is Gesture a Powerful Learning Tool?
Hand gestures are an important part of communication and are ubiquitous in both formal and informal learning contexts. For instance, a teacher may point back and forth between two representations of the same idea to help students understand the relationship between them, a father may use gesture to refer to an object in view when labeling it, and a five-year-old child may use his hands to indicate the height or width of two differently sized cups as he thinks through whether they contain the same amount of liquid. The gestures speakers produce when they talk provide insight into their mental representations (Goldin-Meadow, 2003; Kendon, 2004; McNeill, 1992) and, as a result, can have profound effects on learning and conceptual change.
Gestures are particularly well suited to supporting learning because they are tied to language, which means that they can enhance language processing (Hostetter, 2011). Gestures’ close ties to language also mean that they can be useful for learning language itself. But gestures also have the unique capacity to convey information that is not conveyed in the spoken language they accompany. In this way, learners are able to express ideas in gesture that they might not be able to express in speech, and to glean information from gesture that they might not yet be ready to take from speech.
Throughout this chapter, we consider the ways in which gesture can support learning by influencing speech, and by providing information that is different from speech. Additionally, we explore how these features allow gesture to serve two main roles in the learning process: (1) reflecting a learner’s knowledge, and (2) changing that knowledge, through the gestures that learners produce themselves and through the gestures that learners see others produce. Finally, because gesture plays an important role in learning processes across the life span, we take a developmental perspective. Gesture reflects and supports learning starting with the youngest learners – infants and young children – and continuing through childhood and even into adulthood. Thus, the importance of gesture in learning is a constant across ontogeny.
Before exploring gesture over development, we begin by establishing our view of what gesture is, and what it takes to recognize a gesture when we see one. Gestures are movements of the hands that typically (although not always) accompany speech and communicate information of various kinds (for definitions, see Kendon, 2004; McNeill, 1992). Here, we further specify gestures as meaningful empty-handed movements – movements that do not involve manipulating objects and do not have a direct effect on the environment. This definition makes two important points about gesture. First, gestures are distinct from movements produced directly on objects – we follow the position that gestures represent information (e.g. they show how to operate a novel object without actually touching the object), rather than accomplish tangible, object-directed goals (physically manipulating the object). Second, gestures are distinct from empty-handed movements like those in dance or exercise – on our view, the goal of a gesture is to represent something other than itself (e.g. to show how a dance movement is performed), rather than to perform the movement for its own sake (to perform the dance) (see Novack & Goldin-Meadow, 2017).
How do people know that a movement is a gesture when they see one? If parents are gesturing as they teach their child a new word, or if a teacher is gesturing during a math lesson, is it obvious to the child that those hand movements are gestures and thus might matter for learning new information? How do we know when to classify movements as gestures?
It turns out that people of all ages are quite good at recognizing when hand movements are gestures, as long as the movements are accompanied by sufficient contextual cues. Adults, as well as children as young as four, describe movements that manipulate objects (e.g. moving balls into boxes) differently from movements that do not manipulate objects (e.g. gesturing the action of moving balls into boxes without actually touching them) (Novack, Wakefield, & Goldin-Meadow, 2016; Wakefield, Novack, & Goldin-Meadow, 2017). Movements on objects are described in terms of external goals (e.g. “she moved the balls into the boxes”). In contrast, movements that do not touch objects – empty-handed movements – are candidates for gestures. Empty-handed movements are described either as having representational goals, that is, as gesture (e.g. “she showed how to move balls into boxes”), or as having movement-based goals, that is, as movement-for-its-own-sake (e.g. “she waved her hands back and forth”). The more contextual support surrounding an empty-handed movement (e.g. the presence of objects that could be acted upon; the presence of a hand shape in the movement that resembles how the object would be manipulated; the presence of speech that signals the act is communicative), the more likely adults are to describe the empty-handed movement as gesture, rather than movement-for-its-own-sake (Novack et al., 2016). Children (four- to nine-year-olds) initially tend to describe empty-handed movement as movement-for-its-own-sake rather than as gesture, suggesting that seeing movement as gesture, particularly when it is presented with limited context, may be challenging early in life. However, some children, even those as young as four, can describe empty-handed movements as gesture, and this ability steadily increases with age. The ability to see movement as gesture is thus in place quite early but expands with age (Wakefield et al., 2017).
The early emerging ability to identify hand movements as gestures suggests that there is something about gestures that draws our attention to them and sets up learning from them at an early age. Given this early and persistent attraction to gestures, we first explore the role that gestures play in learning in infants and young children, and then turn to gesture’s role in learning in older children.
2 Early Learning from Gesture – before Language Is Learned
There are two reasons why gesture is a particularly important learning device for children who do not yet have full access to language as a formal system. First, gestures offer children a way to communicate with others at a time when they are not yet able to produce and comprehend full linguistic structures. Children can point to an object that they do not yet know the word for, and a parent can use gesture to show a child how to do an action that would be challenging to explain using words alone. In this way, gestures provide infants with a way to learn from other people, even if they cannot understand the words those people are saying. Second, because gesture and speech are part of a single integrated system (McNeill, 1992), gesture might be particularly beneficial for learning language itself. In other words, because speech and gesture naturally go together, gesture provides a direct link to language. In this way, gesture is able to play a role in reflecting and changing early language learning.
2.1 Child Gesture Indexes Language Knowledge
One way that gesture plays a role in language learning is by indexing when children are ready to make advances in their linguistic development. For example, infants’ own gestures reflect a readiness to learn new words. Babies gesture before they speak, often producing their first pointing gestures between nine and twelve months (Bates, Camaioni, & Volterra, 1975; Greenfield & Smith, 1976). Importantly, the timing of these earliest pointing gestures indicates and predicts infants’ first words: Items that infants point to between ten and fourteen months subsequently begin to show up in their productive vocabularies approximately three months later (Iverson & Goldin-Meadow, 2005). Early pointing gestures therefore set the stage for language learning, perhaps giving caregivers a signal that the child is ready to receive verbal input about specific items. This relationship has important longer-term implications as well: Children’s gesture use at fourteen months predicts their productive vocabulary three years later (Rowe & Goldin-Meadow, 2009). In fact, a meta-analysis has found that pointing is related to language development both concurrently (r = .52) and longitudinally (r = .35), reinforcing the importance of early gesture in language acquisition (Colonnesi, Stams, Koster, & Noom, 2010).
One reason why pointing reflects language learning may be that babies point to items they are interested in, and this interest leads to a desire to learn labels for those specific objects. But if that were the only reason, then any indication of interest on a baby’s part should lead to learning. Instead, there seems to be something special about pointing itself that may be critically tied to language learning (Butterworth, 2003). Eighteen-month-olds are more likely to learn labels for objects if they are provided with those labels after they point to the objects than after they reach toward the objects, or look toward the objects (Lucca & Wilbourn, 2018). Even though looking, reaching, and pointing are all expressions of interest, receiving a label following pointing is more likely to support word learning than receiving a label following the other two interest behaviors. Infants are also more likely to learn the function of an object if they point to that object than if they reach or look toward the object, suggesting that the effects of pointing on learning may not be restricted to the language domain (Lucca & Wilbourn, 2019). Infant pointing may reflect not only an interest in an object, but also a motivation to learn about that object.
Pointing gestures also index semantic development, reflecting infants’ readiness to combine constituents and begin producing two-word utterances. Even after children begin to produce their first words, they continue to gesture, often gesturing together with their speech. Sometimes those gestures provide information that is complementary to speech – for example, when a child points to a cup and says “cup.” But other times the gestures provide information that is supplementary to speech – for example, when a child points to a cup and says “mommy,” indicating mommy’s cup. Children who produce these supplementary combinations begin producing two-word utterances just a few months later (Iverson & Goldin-Meadow, 2005; Özçalışkan & Goldin-Meadow, 2005). This finding suggests that combining constituents across two different modalities is a first step, indexing that a child is gearing up to combine constituents within the verbal modality. Gesture can provide children with a tool to express complex ideas at a time when their language abilities are not yet up to the task.
As children get a little older, gesture can even reflect developing knowledge of abstract linguistic concepts, such as number words. Learning how to map the terms “one,” “two,” and “three” onto specific exact quantities is harder than learning the meaning of concrete nouns such as “cup” or “chair,” because number words can refer to any object or set of objects. Number word acquisition is a drawn-out process that unfolds gradually over the course of many months and years (Carey, 2009; Sarnecka & Carey, 2008; Sarnecka & Lee, 2009; Wynn, 1990, 1992). Although children may learn to recite number words as part of a rote counting routine, it is not until they understand that these terms map onto specific amounts (i.e. “two” means exactly two, not one and not three) that they have truly learned the number word. Gesture plays a critical role in indexing when children are on the brink of making these difficult conceptual mappings. When asked to indicate the number of objects in a set, children often use both words and gestures in their response (Gibson, Gunderson, Spaepen, Levine, & Goldin-Meadow, 2019; Gunderson, Spaepen, Gibson, Goldin-Meadow, & Levine, 2015). If asked to label a set of two objects, a child who fully understands the meaning of “two” will accurately say “two” while holding up two fingers. But a child who does not yet know “two” may be able to show two fingers before giving the correct verbal response. That is, the child may say “three” to describe two objects but hold up two fingers – a speech–gesture mismatch in which speech and gesture convey different information. When this happens, children are more likely to be accurate in gesture before they are accurate in speech (Gunderson et al., 2015). Importantly, children who produce number speech–gesture mismatches of this sort are more likely to benefit from number-word instruction than children who do not produce such mismatches (Gibson et al., 2019). In the domain of number words, gesture can again reflect young children’s readiness to learn.
2.2 Producing Gesture Changes Young Children’s Language Knowledge
We have just seen that gesture can reflect linguistic knowledge, indexing when children are ready to learn concrete nouns or abstract number words, and when they are ready to combine words into sentences. But the gestures that children produce can do more than merely indicate or reflect that a child is ready to make linguistic advances. Producing gesture can play a causal role in the learning process itself.
First, consider the finding that pointing reflects a readiness to learn new nouns (Iverson & Goldin-Meadow, 2005). We know that the act of producing gesture plays a causal role in learning because children who have been instructed to point show gains in language outcomes relative to children who are not instructed to point. In one study, 16-month-old children were either taught words from a picture book (“look at the chair”), taught words while an experimenter pointed at the pictures, or taught words while the experimenter pointed and the child was instructed to point as well (“can you do this?”). After a seven-week intervention, the children who were instructed to point not only increased their pointing in subsequent interactions with their caregivers, relative to the other two groups, but also showed increases in vocabulary (LeBarton, Goldin-Meadow, & Raudenbush, 2015). This finding indicates that gesture in children can be experimentally increased, and that increases in gesture lead to increases in language outcomes.
Gesture’s ability to causally influence language learning extends beyond infancy and beyond simple nouns. Four-year-old children who were taught to produce gestures (e.g. a gesture illustrating an action on an object) while learning novel verbs (e.g. “this is blicking”) were more likely to learn those verbs and correctly extend them to the action performed on a novel object than children who were not instructed to produce gestures (Wakefield, Hall, James, & Goldin-Meadow, 2018). Moreover, children who were taught to produce gestures were better at extending the verbs, compared to children who were given experience producing actions directly on the objects. Gesturing while learning a verb can thus play a causal role in helping children generalize that verb to new contexts, and can do so better than concrete action experience.
2.3 Seeing Gesture Changes Young Children’s Language Knowledge
What about learning from other people’s gestures? Infants and young children see their parents and caregivers gesture all the time. What role do the gestures that young learners see play in language learning?
Pointing gestures are intentional referential cues, and the pointing gestures that infants and young children see can help them interpret labeling scenarios as intentional, pedagogical situations. Certainly, gestures are captivating and salient. Even at four to six months old, infants begin to orient to pointing hands (Bertenthal, Boyer, & Harding, 2014; Rohlfing, Longo, & Bertenthal, 2012) and, at around twelve months, they understand the communicative intentions of others’ pointing gestures (Behne, Liszkowski, Carpenter, & Tomasello, 2012; Krehm, Onishi, & Vouloumanos, 2014; Woodward & Guajardo, 2002). Not surprisingly, then, starting at around one year, infants show a boost in word learning when taught new words paired with pointing gestures (Booth, McGregor, & Rohlfing, 2008; Woodward, 2004).
By the time infants are two years old, it is not just pointing gestures that support learning – young children can also use the iconic gestures they see to learn new information. Iconic gestures encode features of the objects or actions that they represent: for example, banging a fist to represent hammering, dragging a finger across the sky to represent the movement of a bird, or touching one’s thumb to the fingers in a circular shape to indicate the round shape of a ball. Iconic gestures are relatively rare in the spontaneous production of both parents and children (Özçalışkan & Goldin-Meadow, 2011). Nevertheless, experimental studies have shown that iconic gestures can support word learning in early childhood. For example, two-, three-, and four-year-olds are able to infer the meaning of novel intransitive verbs – the movement to which the verb refers – if they are presented with co-speech iconic gestures depicting that movement (Goodrich & Hudson Kam, 2009). Language training that includes iconic gesture not only boosts young children’s initial mapping of novel words to referents but also supports greater enrichment, retention, and generalization of those words (Capone & McGregor, 2005; McGregor, Rohlfing, Bean, & Marschner, 2009).
Gestures are thus well suited to learning words. But gestures can also be useful for learning about things that are hard to describe in words. For example, it is easier to produce gestures explaining how to operate an unfamiliar object (like a can opener) than to describe the process in words. Importantly, even two-year-olds are able to use gestures of this sort to learn how to operate a novel object: they can operate the object after an experimenter demonstrates the action using a gesture (Novack, Goldin-Meadow, & Woodward, 2015). However, two-year-olds are better at learning the function of an object from an action demonstration (specifically, a failed-action demonstration that does not show the completed outcome) than from a gesture demonstration. Although children as young as two can appreciate the representational function of gesture, the ability seems to be relatively fragile, making it difficult to learn from iconic gestures at very young ages.
2.4 Summary
Gesture plays an important role in learning for infants and young children. Children’s gestures reflect their knowledge and indicate when a child may be getting ready to make a linguistic leap or benefit from language instruction. Infants and young children benefit both from producing gesture and from seeing gesture, within the domain of language learning and also in other domains, such as learning about number and object functions. Young learners produce, and make use of, pointing gestures quite early in development, and iconic gestures (which have a more complex relation to their referents) a bit later. Gesture thus plays an important role in learning for young children who have not yet developed full language abilities. What happens when language is in full swing? In Section 3, we delve into how gesture supports learning and conceptual change in older children; specifically, how gesture interacts with speech in a developed communication system.
3 Later Learning from Gesture – When Gesture and Language Work Together
When children are young, gesture plays a dominant role in learning, particularly in learning language. But as children continue to develop their language skills, gestures take on a secondary role to language as the two work together to support learning. Speech and gesture naturally work together as part of a single integrated system (McNeill, 1992), and it seems that the combined effects of speech and gesture may be greater than their effects alone. We explore the powerful effects of the speech–gesture system in school-aged children within the domain of math.
3.1 Child Gesture Indexes Knowledge about Math
To explore how gesture indexes knowledge in older children, we consider how children learn about mathematical equivalence. Mathematical equivalence is the concept that the two sides of an equation must equal the same amount – a concept that is surprisingly challenging for eight- to ten-year-old children in the United States, and one that is foundational to more advanced mathematical skills (McNeil, Hornburg, Devlin, Carrazza, & McKeever, 2019). Equivalence problems that have missing addends, such as 2 + 7 + 8 = __ + 8, present challenges that reveal children’s misconceptions (Perry, Church, & Goldin-Meadow, 1988). For example, some children incorrectly think that the equals sign is a signal to add up all of the numbers in the problem; in this example, adding 2, 7, 8, and 8 and putting 25 in the blank. Other children ignore the addend on the right-hand side (the rightmost 8 in the example) and add up only the numbers to the left of the equals sign, adding 2, 7, and 8 and putting 17 in the blank. If asked to explain why they wrote down an incorrect solution to a missing-addend equivalence problem, children have no problem describing these flawed rationales in words. But, importantly for our purposes, children also gesture as they explain their solutions, and those gestures can reveal information that differs from the information conveyed in the accompanying speech. It is these gestures, considered in relation to speech, that best predict which children are ready to learn from math instruction, and which children are not.
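To make the divergence between these strategies concrete, the following sketch – our own hypothetical illustration, not material from the cited studies – computes the answer each strategy yields for a missing-addend problem of the form a + b + c = __ + c:

```python
def add_all(left_addends, right_addend):
    """Incorrect 'add all' strategy: sum every number in the problem."""
    return sum(left_addends) + right_addend

def add_to_equal_sign(left_addends, right_addend):
    """Incorrect strategy: ignore the addend right of the equals sign."""
    return sum(left_addends)

def equivalence(left_addends, right_addend):
    """Correct strategy: both sides must equal the same amount."""
    return sum(left_addends) - right_addend

# The problem discussed in the text: 2 + 7 + 8 = __ + 8
left, right = [2, 7, 8], 8
print(add_all(left, right))            # 25 -- the 'add all' error
print(add_to_equal_sign(left, right))  # 17 -- ignores the rightmost 8
print(equivalence(left, right))        # 9  -- the correct answer
```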
Sometimes, children’s gestures match what they are saying in speech. For example, a child says, “I added the 2, the 7, the 8, and the 8,” while pointing with her right hand to each of the four addends as she indicates them in speech (an incorrect “add all” strategy that matches in speech and gesture). Other times, children express different information in gesture and speech. For example, a child says the “add all” explanation in speech but uses one hand to sweep under the numbers on one side of the equation, and the other hand to sweep under the numbers on the other side of the equation. Here, the change in hand shape indicates an awareness (albeit implicit) that the equation has two distinct sides, and the parallelism of the sweeps produced under each side indicates an awareness that the two sides should be treated alike (a correct “equalizer” strategy). When children produce different information in speech and in gesture, they have produced a mismatch (Church & Goldin-Meadow, 1986; Perry et al., 1988). Children who produce speech–gesture mismatches as they explain their solutions to missing-addend equivalence problems are more likely to profit from instruction than children who do not produce mismatches. The gestures that a child produces, taken in relation to speech, can thus offer insight into the child’s conceptual state, signaling that the child is in a transitional state with respect to learning mathematical equivalence (Goldin-Meadow, Alibali, & Church, 1993).
Why is this the case? One possibility is that gesture and speech provide separate avenues for considering multiple hypotheses, and that concurrent activation of multiple hypotheses is a feature of the transitional state (Alibali & Goldin-Meadow, 1993; Goldin-Meadow et al., 1993). According to this view, when children are in a transitional state, they might not be aware of the strategies they produce in gesture – in other words, these strategies might not yet be explicit and are thus not yet accessible to speech. Instead, these hypotheses emerge implicitly through gesture. Gesture can thus provide an avenue for exploring new ideas that a child might not yet fully understand and is just beginning to entertain.
Although many children spontaneously gesture when asked to explain their reasoning about math problems, children do not always gesture during their explanations. These children may be missing out on opportunities to express implicit knowledge and, perhaps, missing out on opportunities to learn. But it turns out that mismatches can be coaxed out of a learner by encouraging gesture. If an instructor asks a child to gesture while explaining how to solve a math problem, many of the gestures that the child produces convey strategies that are not found in the child’s speech – and those strategies are often correct. When the children are later given a math lesson, the mismatching gestures that result from encouraged gesture turn out to predict learning, just as spontaneously produced mismatches do (Broaders, Cook, Mitchell, & Goldin-Meadow, 2007). Encouraging children to gesture thus appears to bring out implicit knowledge and help learners consider novel ideas, which, in turn, facilitates learning.
Although the two-modality system of communication (i.e. language through the spoken modality, gesture through the manual modality) provides an ideal vehicle for two conflicting ideas to emerge, it is not a necessary component. Deaf individuals gesture along with their signs (Emmorey, 1999; Sandler, 2009; Wilcox, this volume), and both gesture and sign are produced in the manual modality. Even though this is, for the most part, a one-modality system of communication, deaf children still produce sign–gesture mismatches on missing-addend equivalence problems (Goldin-Meadow, Shield, Lenzen, Herzig, & Padden, 2012). Importantly, as with their hearing counterparts, deaf children who produce mismatches in sign and gesture are more likely to benefit from math instruction than deaf children who do not produce mismatches. Gesture’s ability to predict readiness-to-learn thus stems not from the fact that two different modalities are involved in the explanation, but more likely from the fact that the two modalities use different representational formats – gesture conveys information gradiently, whereas language (speech or sign) conveys information discretely.
The fact that mismatches index when a child is in a transitional state can be a useful cue for teachers and parents who want to help their students and children learn. When shown videos of children explaining their solutions to mathematical equivalence problems, adults use the information in children’s gestures to assess their knowledge (Alibali, Flevares, & Goldin-Meadow, 1997). Adults are thus sensitive to the variability that children display across modalities. Moreover, they can use the information conveyed uniquely in gesture to determine how likely a child is to benefit from instruction. Teachers also adapt their instruction based on their students’ gestures: they provide more variable instruction, and more different types of problem-solving strategies, to children who produce mismatches than to children who do not. Child gesture can thus play a role in getting teachers to adapt their teaching methods in the moment (Goldin-Meadow & Singer, 2003). The mismatching state, as reflected in the relation between a child’s speech (or sign) and gesture, can be profitably exploited by adults to support child learning.
3.2 Producing Gesture Promotes Learning Math Concepts
Children’s mismatching gestures reflect an internal readiness to learn a particular concept. Can the act of producing a mismatching gesture also play a causal role in the learning process itself? To address this question, Goldin-Meadow, Cook, and Mitchell (2009) essentially turned children into mismatchers by giving them a math lesson and instructing them to produce mismatching speech and gesture strategies. They taught children to say one correct strategy in speech while producing a second, correct (mismatching) strategy in gesture. Children who were “turned into” mismatchers benefited from instruction significantly more than children who were taught only the single correct strategy in speech. Mismatch between speech and gesture thus not only reflects readiness to learn a particular concept; it can even play a causal role in learning that concept.
The causal effects of producing gesture on learning can extend beyond the immediate learning context. Children taught to produce gesture during instruction show continued learning boosts at least four weeks after the lesson (Cook, Mitchell, & Goldin-Meadow, 2008). This lasting learning may be tied to underlying neural pathways that encode gesture production during learning. One study using functional magnetic resonance imaging (fMRI) found that children who were taught to produce gesture during a math lesson later recruited their motor networks when solving equivalence problems while lying still in the scanner, without moving their hands (Wakefield, Congdon, Novack, Goldin-Meadow, & James, 2019). In other words, the experience of learning through gesture influences the way that children learn and changes the way that children later access that learned knowledge. Producing gesture thus has a lasting influence, both on learning and on the brain.
Thus far, we have focused on how producing gesture can support learning. But it is important to remember that gestures do not occur in isolation – they occur together with speech. And although the gesture itself is important, the way that gesture complements, interacts with, and reinforces speech may also play a critical role in the effects gesture has on learning.
One finding that demonstrates the powerful influence of producing gesture with speech comes from a study in which one group of children was taught to produce speech and gesture during a math lesson, and another group was taught to produce speech and action (an action that was similar to the gesture but involved manipulating objects) (Novack, Congdon, Hemani-Lopez, & Goldin-Meadow, 2014). In both groups, children were taught the equalizer strategy in speech; they were taught to say, “I want to make one side equal to the other side.” Both groups were also taught the grouping strategy (the idea that the problem can be solved by grouping the non-repeated addends on the left side of the problem and putting their sum in the blank on the right side), expressed in either action or gesture. In the gesture group, the grouping strategy was expressed by making a V-point with the index and middle fingers to the two numbers on the left side that can be added together. In the action group, the strategy was expressed by physically picking up two magnetic number tiles placed over the addends in the problem. Thus, both the action and the gesture represented a strategy distinct from the equalizer strategy expressed in speech, and both provided mismatching, supplementary information to the speech strategy.
In this study, children who produced gesture with speech showed greater depth of learning than children who produced action with speech (Novack et al., 2014). But, importantly, how children understood the speech they were taught to say predicted the extent of their learning. At the end of the study, children’s learning and generalization were measured in a posttest assessment, and the children were asked to explain their reasoning on the posttest problems. Children in both conditions tended to repeat the verbal strategy they had learned during the lesson, mimicking the words they were taught and thus expressing the equalizer strategy in speech. But for children who had learned the speech strategy together with gesture, posttest use of the equalizer strategy in speech predicted learning and generalization scores – the more they used the verbal strategy, the higher their posttest and generalization scores, indicating that they had truly understood the meaning of the words they were taught. In contrast, for children who had learned the speech strategy together with action, posttest use of the equalizer strategy in speech did not predict learning or generalization scores. In other words, children who learned a speech strategy while producing gesture seemed to have internalized the meaning of the rotely produced words, whereas children who learned the same speech strategy while producing action did not. The combined effect of speech and gesture (but not speech and action) thus appears to help learners make sense of speech, tying that verbal information into their broader conceptual understanding.
3.3 Seeing Gesture Promotes Learning Math Concepts
The gestures that learners see can also have a significant impact on learning math concepts. Here again, it is not just gestures in isolation, but rather the relationship between gestures and the accompanying speech, that is critical for learning outcomes. Teachers naturally gesture when explaining concepts, and this is particularly true in math classrooms, where gesturing is routinely used to ground and link ideas (Alibali et al., 2014; Alibali & Nathan, 2012). For example, teachers asked to explain mathematical equivalence to students spontaneously gesture and convey substantive, lesson-relevant information in those gestures (Goldin-Meadow, Kim, & Singer, 1999). Children are more likely to reiterate correct speech when it is accompanied by matching gestures than when it is accompanied by no gesture at all. Gesturing by teachers can thus support speech and, as a result, facilitate student comprehension of a lesson.
Children can also benefit when the information conveyed in gesture does not match the information conveyed in speech. Children are more likely to learn from instruction in which a teacher provides different, but potentially integrable, information in speech and in gesture than from instruction in which the teacher provides the same information in both (Singer & Goldin-Meadow, 2005). If a teacher expresses one problem-solving strategy in speech and a second strategy in gesture, children are more likely to learn how to solve the problem than if the teacher expresses both strategies in speech, or if she expresses only one of the strategies in both speech and gesture. Children thus benefit when they are provided with multiple strategies, but only if those strategies are distributed across speech and gesture. Getting multiple strategies all in the verbal domain appears to be too much information for the child to take in. Gesture provides a method of presenting multiple ideas to children in a format that they can learn from.
If different strategies in speech and gesture support optimal learning, do they have to be presented at the same time? Or can gesture benefit learners even if it is not presented simultaneously with speech? It turns out that the simultaneous timing of speech and gesture is key to gesture’s learning benefits. If speech is presented before a gesture strategy, the learning benefits of gesture instruction are greatly reduced (Congdon et al., 2017). Children who receive math instruction with simultaneous speech and gesture are more likely to generalize their knowledge, and to retain that knowledge, than children who receive instruction with speech followed by gesture (or instruction with one speech strategy followed by a second speech strategy). When asked to explain their solutions at the end of the study, children who successfully learned from simultaneous speech and gesture instruction also integrated the speech strategy into their own understanding of the problems. Simultaneously presented speech and gesture seemed to solidify the speech, helping children internalize its meaning.
Wakefield and colleagues (Wakefield, Novack, Congdon, Franconeri, & Goldin-Meadow, 2018) used eye tracking to probe how children visually attend to gesture when it is presented simultaneously with speech. Children saw a teacher explain how to solve missing-addend equivalence problems. In a Speech + Gesture condition, the instructor said the verbal equalizer strategy and, at the same time, produced a gestural grouping strategy. In a Speech Alone condition, the instructor provided the same verbal strategy but without moving her hands. As the previous literature would lead us to expect, children were more likely to learn how to solve the equivalence problems, as measured by a posttest assessment, after Speech + Gesture instruction than after Speech Alone instruction. Importantly, however, the eye-tracking evidence revealed differences in how children in the two conditions attended to instruction, and in how their visual attention linked to posttest learning. Children in the Speech + Gesture condition looked at the problem more than children in the Speech Alone condition. However, the amount of looking at the problem did not predict children’s learning outcomes. Instead, it was the timing of learners’ looks at the problem that predicted the depth of their learning. Children in the Speech + Gesture condition were more likely than children in the Speech Alone condition to look at the part of the problem that the instructor was mentioning in speech, at the time when the instructor mentioned it. The gestures helped children follow along with speech. Some children in the Speech Alone condition were also able to follow along with speech. Importantly, however, the degree to which children followed along with speech predicted how well they performed on the posttest only in the Speech + Gesture condition, not in the Speech Alone condition. In other words, following along with the instructor’s speech was beneficial only when gesture co-occurred with that speech and presumably helped learners glean information from it.
3.4 Summary
Gesture works together with speech to support learning in school-aged children. Gesturing on a task reveals when children are ready to profit from instruction in that task. Producing one’s own gestures can change knowledge, as can seeing other people’s gestures. Here we have focused on gesture’s role in learning within the domain of math. But gesture indexes who is ready to learn in tasks such as Piagetian conservation (Church & Goldin-Meadow, 1986) and moral reasoning (Beaudoin-Ryan & Goldin-Meadow, 2014); producing gesture helps children learn in domains such as word concepts (Wakefield & James, 2015) and second language acquisition (Macedonia, Müller, & Friederici, 2011); and seeing gesture helps children learn about topics such as bilateral symmetry (Valenzeno, Alibali, & Klatzky, 2003) and mental rotation (Goldin-Meadow, Levine, et al., 2012). The effects of gesture on learning thus extend across learning contexts, suggesting that gesture can play a major role in how children learn.
4 Open Questions for Future Research
We have reviewed the roles that gesture plays in learning contexts – in indexing learners’ knowledge, and in changing learners’ knowledge through both the gestures learners produce and the gestures learners see. For the youngest learners, gesture plays an important role in the language learning process itself. For older learners who have full access to language, gesture plays a role in learning across a variety of domains, and does so by interacting with, and reinforcing, the language it accompanies. In this last section, we discuss a number of questions that remain with respect to how gesture reflects knowledge and how it changes knowledge.
4.1 Open Questions about How Gesture Indexes Knowledge
The gestures learners spontaneously produce on a task provide insight into when they are ready to profit from instruction in that task. Moreover, encouraging learners to gesture on a task brings out implicit knowledge of the task, which, in turn, makes learners more likely to profit from instruction on the task. As described earlier, Broaders et al. (2007) asked school-aged children to use their hands as they explained their solutions to math problems, and found that this request increased the number of speech–gesture mismatches children produced, which subsequently led to greater learning outcomes. Similarly, LeBarton et al. (2015) asked toddlers to point to photos by saying “Can you put your finger here?” This training manipulation led to an increase in spontaneous pointing gestures with caregivers, which subsequently led to greater vocabulary growth. Explicitly asking learners to use their hands is one way to effectively encourage gesture. But is it the only way?
Cook and Goldin-Meadow (2006) addressed this question by asking whether children are more likely to gesture if they are asked to gesture (i.e. if they are explicitly told to imitate their teacher’s gestures) or if they simply see their teacher gesture during a lesson. A control group was taught by a teacher who did not use gesture. Whether or not they were explicitly told to imitate their teacher’s gestures, children who saw their teacher gesture were more likely to gesture themselves than children who saw instruction without gesture. The mere presence of gesture in a learner’s input (at least, from a teacher) thus has the potential to boost learner gesture production, which can then lead to greater learning gains. Although this study offers some insight into how to boost gesture, it is still not clear whether there are more effective ways to get children to gesture than merely modeling gesture.
An additional question is: What is the best way to increase gesture in a child’s input? Are children more likely to increase their own gesture production if their parents and teachers gesture more, or if their classmates and siblings gesture more? And how might a child’s relationship with these various social partners matter? That is, would children be more likely to imitate a teacher’s gestures if they liked, or felt connected to, that teacher than if they did not? Future work could consider the role that social factors play in encouraging gesture production, and whether those factors can be manipulated to boost gesture in the classroom or at home.
4.2 Open Questions about How Gesture Changes Knowledge
Focusing first on the impact that producing gesture has on learning, we need to ask whether gesture benefits all children, or whether there are certain populations, or children at certain conceptual stages, for whom gesture is particularly helpful. Some studies have found that producing gesture during instruction is differentially beneficial for specific groups of children (Congdon, Kwon, & Levine, 2018; Wakefield, Foley, et al., 2019). For example, in a task designed to teach first-grade children about linear measurement, children using a more conceptually advanced (although still incorrect) strategy at the start of the study were more likely to benefit from producing gesture during instruction than children using a more rudimentary (incorrect) strategy (Congdon et al., 2018). This study raises the question of how individual differences, such as a child’s starting state with respect to a task, affect when the child is most likely to benefit from producing gesture in that task. Future work that adopts an individual-differences approach might be particularly useful for translating gesture findings into classroom settings.
Turning next to the impact that seeing gesture has on learning, we need to explore gesture’s role in classrooms and learning environments as online materials are incorporated into teaching practice. Teachers now often use online teaching materials in the classroom, and students themselves use internet resources (such as YouTube) to look for instructional videos at home. Future research therefore needs to consider how gesture can best be used in online educational content. Surprisingly, only a minority of online instructional videos show an instructor’s hands; the rest offer no opportunity for gesture. One study found that only thirty-two percent of randomly selected YouTube instructional math videos showed hands, while the remainder presented instructional content without any humans at all (Son, Ramos, DeWolf, Loftus, & Stigler, 2018). Of the videos that showed hands, almost all included some sort of gesture, although primarily pointing gestures. Research suggests that learning from gesture online is as effective as learning from gesture in person (Koumoutsakis, Church, Alibali, Singer, & Ayman-Nolley, 2016), but more research is needed on how best to incorporate gesture into online materials. For example, videos can use gesture, but also visual highlighting, animated content, or multiple screens, all to convey multiple ideas at the same time. Are these techniques equally effective? These questions need to be asked by gesture researchers, given the changing dynamics of modern learning.
5 Conclusions
The gestures that children, parents, and teachers produce play a key role in how learners acquire new information. Gestures support learning even before children know language and continue to reflect and change knowledge throughout development. Children take in information from gestures that they see others produce and learn from the gestures that they themselves produce. Gestures are separate from, but interact with, speech to influence how language is understood and, in this way, provide a unique tool for learning in a wide range of domains. Although a number of open questions remain, it is clear that gesture has the potential to play a powerful – and previously underused – role in educational contexts.




