
Action Understanding

Published online by Cambridge University Press:  10 April 2024

Angelika Lingnau, Universität Regensburg, Germany
Paul Downing, Bangor University

Summary

The human ability to effortlessly understand the actions of other people has been the focus of research in cognitive neuroscience for decades. What have we learned about this ability, and what open questions remain? In this Element the authors address these questions by considering the kinds of information an observer may gain when viewing an action. A 'what, how, and why' framing organises evidence and theories about the representations that support classifying an action; how the way an action is performed supports observational learning and inferences about other people; and how an actor's intentions are inferred from her actions. Further evidence shows how brain systems support action understanding, from research inspired by 'mirror neurons' and related concepts. Understanding actions from vision is a multi-faceted process that serves many behavioural goals, and is served by diverse mechanisms and brain systems.
Type
Element
Information
Online ISBN: 9781009386630
Publisher: Cambridge University Press
Print publication: 09 May 2024

1 Introduction

1.1 Motivation

Our experience of everyday social life is deeply shaped by the actions that we see others perform: consider a parent carefully watching her infant try to feed herself, a fan watching a tennis match, or a pottery student observing her teacher throw a pot. Although we may sometimes pause momentarily in puzzlement (what is my neighbour doing up there on his roof?) or be caught by surprise (by a partner’s sudden romantic gesture), we normally understand others’ actions quickly and without a feeling of expending much effort. By doing so, we unlock answers to important questions about the world around us: What will happen next? How could I learn to do that? How should I behave in a similar situation? What are those people like?

How, then, do we understand observed actions? The simplicity of this question, and the fluency of action understanding, obscure the complexity of the underlying mental and neural processes. To start to answer it, and in contrast to several recent valuable perspectives (e.g. Kilner, 2011; Oosterhof et al., 2013; Pitcher & Ungerleider, 2021; Tarhan & Konkle, 2020; Thompson et al., 2019; Tucciarelli et al., 2019; Wurm & Caramazza, 2021), we do not focus first on possible brain mechanisms (including the possible role of mirror neurons; see Bonini et al., 2022; Heyes & Catmur, 2022). Instead, thinking first about the problem at Marr’s (1982) computational level, we ask: why would an observer attend to the actions of others? A reasonable answer might be: observers attend to others’ actions to learn about the meaning and outcomes of different action kinds; to establish causal links between actors’ actions and their goals, states, traits, and beliefs; and to use that learned knowledge to make predictions about the social and physical environment, and to extend one’s own action repertoire. (Although beyond the focus of this review, we also sometimes attend to others’ actions for pure enjoyment, e.g. when watching ballet or figure skating; see e.g. Christensen & Calvo-Merino, 2013; Orgs et al., 2013.) Achieving these multiple complex aims requires suitable mental representations and processes – algorithms, in Marr’s (1982) terms. That is the main focus of Section 2 of this article.
In Section 3, we go on to describe key neuroscientific evidence on action understanding (focusing on Marr’s implementation level), drawing links to the concepts and constructs described in Section 2. In the final section, we identify directions for future research that are highlighted by this review.

1.2 Definitions and Scope

A survey of the literature in neuroscience, psychology, computer science, and cognitive science reveals a proliferation of terminology (action recognition, action comprehension, action identification, action perception, action observation, action interpretation, activity recognition) and equally diverse definitions. These make related, but not always consistent, assumptions and distinctions (Table 1) that may in part be due to different aspects of action that are highlighted in different experimental paradigms (Figure 1). This diversity is to be expected, given the complexity of the topic and the need to simplify it to gain traction. For this review, we adopt the term ‘action understanding’ as an umbrella term of convenience, to refer in general to the act of making sense of viewed human actions, and we avoid making further terminological distinctions. We resist the temptation to provide a single, concise definition of action understanding, preferring that this should emerge from the breadth of behaviours, cognitive mechanisms, and brain systems that we describe. However, some basic assumptions provide a grounding: we are concerned with observable behaviours that are intended to effect changes in the physical world or on others’ minds.

Table 1 Definitions of action understanding.

Gallese et al. (1996): ‘the capacity to recognize that an individual is performing an action, to differentiate this action from others analogous to it, and to use this information in order to act appropriately’

Rizzolatti et al. (2001): ‘We understand actions when we map the visual representation of the observed action onto our motor representation of the same action’

Kohler et al. (2002), Science: Audio-visual mirror neurons might contribute to action understanding by evoking ‘motor ideas’

Fogassi et al. (2005), Science: Mirror neurons selectively encode the goals of motor acts and thus facilitate action understanding

Bonini & Ferrari (2011): Action recognition: ‘know again, recall to mind’; the ability to form a link between sensorimotor description and motor representations

Rizzolatti & Sinigaglia (2016): ‘the outcome to which the action is directed’

Figure 1 What we talk about when we talk about action understanding.

A: Examples of paradigms used in the monkey literature.

B: Examples of paradigms used in the human literature. These examples give a sense of the wide variety of stimuli and tasks used in this literature, which may include schematics, still images, animations, or movies of typical or atypical manual or whole-body actions, either in a natural or a constrained context. The diversity of these examples is matched by the diversity of terminology and definitions adopted in the action understanding literature (see Table 1).

What topics fall under the broad umbrella of ‘action understanding’? We focus here on human action understanding, so we do not consider purely engineering-led approaches such as AI systems for what is typically known in that literature as action classification or activity recognition (Muhammad et al., 2021; Vrigkas et al., 2015). As vision is at the heart of most treatments of human action understanding, we focus on understanding seen real-world actions (but see Camponogara et al., 2017; Repp & Knoblich, 2004, for discussion of action understanding in other modalities). Evidence from animals is reviewed for its influence on thinking about human action understanding. We set aside the interpretation of actions and interactions that are conveyed symbolically, such as the decisions of a partner in an economic game like the Prisoner’s Dilemma (e.g. Axelrod, 1980). Finally, we focus on understanding by typical healthy adult observers, to the exclusion of neuropsychological or neuropsychiatric populations. The logic here is that while action understanding difficulties are associated with (for example) autism, schizophrenia, or semantic dementia, it is not clear that they are necessarily a central feature of those conditions (see e.g. Cappa et al., 1998; Cusack et al., 2015; Frith & Done, 1988). Action clearly is central to apraxia; however, in that case definitions and diagnostics tend to focus on patients’ production of appropriate gestures and skilled actions, particularly those relevant to tool use (Baumard & Le Gall, 2021), rather than on understanding per se (but see e.g. Kalénine et al., 2010). That said, these difficulties may be informative for our thinking about the different computations and algorithms involved in action understanding; the same caveat applies to developmental evidence (Reddy & Uithol, 2016; Southgate, 2013).

Other, more specific action-related topics have recently been reviewed elsewhere: these include the perception of social interactions (McMahon & Isik, 2023; Papeo, 2020; Quadflieg & Westmoreland, 2019), the execution of joint or collaborative actions (Azaad et al., 2021; Sebanz & Knoblich, 2021), and the visual perception of biological motion, especially from ‘point-light’ displays (Blake & Shiffrar, 2007; Thompson & Parasuraman, 2012; Troje & Basbaum, 2008).

1.3 General Principles

Two principles that have motivated many researchers’ thinking about action understanding recur in our review. First, inspired by theories of hierarchies in the motor system (Georgopoulos, 1990; Harpaz et al., 2014; Turella et al., 2020; Uithol et al., 2012), actions are often described at different hierarchical levels (see Table 2). These include kinematics (the how of an action), the action kind (the what of an action), and the intention (the why of an action). These levels have strong implications for the representations and processes required for action understanding, and accordingly we adopt this three-way distinction to structure Section 2. The idea that actions can be described at multiple levels implies that action understanding may emphasize one of these levels over the others, depending on the observer’s goals. For example, a basketball player who aims to improve his three-point shooting might attend to the kinematics of the throw (e.g. the angle of the arm and hand, the trajectory of the ball). In contrast, a player who aims to prevent a three-pointer by an opponent might attend to that opponent’s intention (e.g. by focusing on his gaze direction). This view conflicts with descriptions of action understanding as ‘automatic’, which would imply a process that unfolds independently of observer goals and of the demands of other concurrent tasks that may ‘load’ cognition or perception. So in Section 2 we also describe different conceptions of automaticity and how they might play out in different action understanding situations.

Table 2 Action understanding at different hierarchical levels.

Vallacher & Wegner (1989): Actions can be identified on a range of different levels, from low level (how is the action performed?) to high level (why, or with what effect, is the action performed?)

Hamilton & Grafton (2007):
  • Muscle level (pattern of activity in all involved muscles)
  • Kinematic level (shape of the hand, movement of the arm)
  • Goal level (intention and outcome)

Spunt et al. (2011): How vs What vs Why

Kilner (2011):
  • Kinematic level (trajectory and velocity profile)
  • Motor level (processing and pattern of muscle activity)
  • Goal level (immediate purpose of the action)
  • Intention level (overall reason)

Wurm & Lingnau (2015):
  • Abstract level (generalization across different exemplars)
  • Concrete level (exemplar-specific)

Thompson et al. (2019):
  • Action identification (e.g. precision versus whole-hand grasp)
  • Goal identification (e.g. to grasp the cup)
  • Intention identification (e.g. to quench thirst)

Zhuang & Lingnau (2022): Taxonomic levels (superordinate, basic, and subordinate)

Second, like any form of perception (Bar et al., 2006; de Lange et al., 2018; Hutchinson & Barrett, 2019; Rao & Ballard, 1999), action understanding enables predictions about what is likely to follow next, over timescales from seconds to years (Kilner et al., 2004; Oztop et al., 2005; Schultz & Frith, 2022; Umiltà et al., 2001). In some situations predictions are implicit (e.g. watching our tennis partner prepare to serve, a hunch that she will fault), and in others explicit (e.g. anticipating that the opposing player will hit cross-court while one is caught in the opposite corner). Predictions emerge across the hierarchical levels identified above. For example, from local cues such as hand or arm kinematics, gaze direction, and grasp preshaping, an observer can make spatially and temporally precise predictions about how an action will unfold (McDonough et al., 2019), and about the target of a reaching movement or the intended use of a grasped object (Ambrosini et al., 2011, 2015; Amoruso & Finisguerra, 2019; Amoruso & Urgesi, 2016). At the same time, our semantic knowledge about different kinds of actions includes descriptions of their typical aims, and of the kinds of events that typically follow (cf. Schank & Abelson, 1977). For example, observing a friend hand-washing the dishes implies that next they will be dried and put away.
Finally, observing an action supports inferences about an actor’s underlying goals and beliefs, enabling predictions about what future actions would be consistent with those beliefs, or further those goals, and indeed how that actor might behave in new situations even into the distant future.

2 What, How, and Why?

2.1 ‘What’: Two Conceptions of Action Categorization

To answer the question ‘what kind of action am I seeing now?’ requires extracting visual information about the surrounding scene, the actors and their movements, objects, and the relationships among those elements. This perceptual evidence must be compared to stored representations of the actions that the observer knows about. The studies considered in this section have addressed two main research questions posed by those requirements: How is long-term knowledge about action kinds organized? And how is perceptual data matched to that knowledge?

Classifying an action requires the ability to generalize over variation caused by different viewpoints, lighting effects, occlusion, and other visual variables, just as in visual object recognition (see also Perrett et al., 1989). Further, a given action (e.g. chopping vegetables) may be carried out by many possible actors, using many possible objects, in many possible locations. That problem of generalization is complemented by the problem of specificity, which requires correctly excluding from a category exemplars that do not belong. Taking an analogy from objects, one must understand that a robin (canonical exemplar) and a penguin (unusual exemplar) are both birds, but that a bat, despite numerous shared features with the bird category, is not. Figure 2 illustrates that similar problems arise for action understanding, where the challenge is to correctly include visually diverse exemplars while excluding attractive foils.

Figure 2 Successful action understanding requires generalizing over highly distinct exemplars (e.g. of <chopping vegetables>; right side) including unusual ones (centre bottom image) while excluding highly similar non-exemplars (e.g. carving; left side).

Finally (and also like objects), actions are well described by taxonomies that include an abstract (or ‘superordinate’) level, a basic level, and a subordinate level (Rosch et al., 1976; Zhuang & Lingnau, 2022). For example, ‘playing tennis’ may describe an action at the basic level that belongs to the superordinate category ‘sporting activities’ and includes the subordinate level ‘performing a forehand volley’. The basic level has been proposed to play a key role in object categorization, as evidenced, for example, by the number of features used to describe objects and by the speed of processing (Rosch et al., 1976). Zhuang & Lingnau (2022) recently reported similar results for actions. Specifically, participants produced the highest number of features to describe actions at the basic level (see also Morris & Murphy, 1990; Rifkin, 1985). Moreover, they verified action categories faster and more accurately at the basic and subordinate levels than at the superordinate level. These findings suggest that the taxonomic levels of description proposed for objects have a homologue in the long-term representation of action knowledge.

Action Spaces

One major approach to understanding the representation of action knowledge was influenced by previous work investigating the mental representation of objects (e.g. Beymer & Poggio, 1996; Edelman, 1998; Gärdenfors, 2004; Kriegeskorte et al., 2008a, b). These studies develop the idea that known actions are described by multidimensional ‘spaces’ (see Figure 3), in which each type of action occupies a point (Dima et al., 2022; Kabulska & Lingnau, 2022; Lingnau & Downing, 2015; Thornton & Tamir, 2022; Tucciarelli et al., 2019; Watson & Buxbaum, 2014; Zhuang & Lingnau, 2022). Traversing along one hypothetical dimension, actions should vary systematically on one action property; furthermore, by hypothesis, the distance between a pair of actions in such a space should be directly related to the perceived subjective similarity of those actions (Dima et al., 2022; Tucciarelli et al., 2019).

Figure 3 Illustration of the action ‘spaces’ idea. Action kinds may be construed as atom-like points in representational spaces, the dimensions of which may correspond to psychologically meaningful distinctions. Positions of actions reflect their values on hypothetical mental dimensions. Distances between actions are proportional to subjective judgments of the similarity between them. Here we present only a reduced example for the sake of clarity; realistic action spaces would be far more complex.

Several possible action spaces have been identified in studies that combine a data-driven element, such as subjective similarity ratings or analyses of text corpora, with data-reduction methods such as hierarchical cluster analysis, principal component analysis, or multidimensional scaling. For example, in a series of studies based on analyses of large text corpora and other measures, Thornton & Tamir (2022) identified Abstraction, Creation, Tradition, Food, Animacy, and Spiritualism as key action dimensions (referred to as the ACT-FASTaxonomy). These dimensions successfully captured variance in the judged similarity of action words, and also described the socially relevant features of actions (e.g. how, why, and by whom an action is performed).

Focusing instead on action images, Tucciarelli et al. (2019) asked participants to group pictures of actions by their perceived similarity in a multi-arrangement task (Kriegeskorte & Mur, 2012). Analysis by k-means clustering and principal component analysis revealed several meaningful action categories, including food-related actions, communicative actions, and locomotion. These categories fell along dimensions organized according to the type of change induced by the action and the type of need fulfilled by it (e.g. basic/physiological versus higher social needs). Kabulska & Lingnau (2022) used a very similar approach with pictures of 100 different actions, which revealed additional action categories such as interaction, gestures, and aggressive actions. Analyses of action features provided evidence for a strong dimension capturing the positive or negative valence of the action. Still further evidence shows how typically intended outcomes may also shape action spaces. For example, Tarhan et al. (2021) analysed data from a multi-arrangement task and found that a goal-similarity model predicted judgments of action similarity better than models based on movement similarity or visual similarity.
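As a rough illustration of the analysis pipeline behind such studies, the sketch below recovers a low-dimensional ‘space’ from a matrix of pairwise dissimilarities using classical multidimensional scaling, one of the data-reduction methods mentioned above. The actions, the two underlying dimensions, and all numerical values are invented for illustration; in real studies the dissimilarities come from behavioural judgments such as multi-arrangement distances.

```python
import numpy as np

# Invented example: six actions with made-up scores on two assumed
# dimensions (say, "sociality" and "object-directedness").
actions = ["chopping", "stirring", "waving", "hugging", "running", "jumping"]
features = np.array([
    [0.10, 0.90],   # chopping
    [0.20, 0.80],   # stirring
    [0.90, 0.10],   # waving
    [0.95, 0.20],   # hugging
    [0.30, 0.10],   # running
    [0.35, 0.15],   # jumping
])

# Pairwise Euclidean dissimilarities, standing in for the behavioural
# dissimilarity matrix obtained from a multi-arrangement task.
D = np.linalg.norm(features[:, None, :] - features[None, :, :], axis=-1)

# Classical (Torgerson) MDS: double-centre the squared distances and
# take the leading eigenvectors, scaled by sqrt(eigenvalue), as coordinates.
n = D.shape[0]
J = np.eye(n) - np.ones((n, n)) / n
B = -0.5 * J @ (D ** 2) @ J
eigvals, eigvecs = np.linalg.eigh(B)           # ascending order
order = np.argsort(eigvals)[::-1]              # descending
top = order[:2]
coords = eigvecs[:, top] * np.sqrt(np.maximum(eigvals[top], 0.0))

# Distances in the recovered 2-D space should match the input
# dissimilarities (exactly here, because the toy data are truly 2-D).
D_hat = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
print(np.allclose(D, D_hat))  # True
```

The recovered coordinates are unique only up to rotation and reflection, which is why the resulting dimensions must be interpreted post hoc, for example by inspecting which actions fall at their extremes.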

Together, this family of approaches has revealed a diverse set of candidate principles and dimensions that organize action knowledge. The ‘spaces’ that emerge from a given study depend on the kinds of actions being considered, and the specific task set for the observers (a topic that we will return to in Section 3). It may be that multiple distinct spaces, or hierarchically nested spaces, best capture the huge and diverse range of actions that observers can understand, rather than a single space. Indeed, in each of the studies described, a large proportion of variability remained unexplained, suggesting that there are additional organizing principles not yet identified.

Our knowledge about actions changes as we learn, and in different contexts different aspects of actions may be more or less immediately relevant. Accordingly, action spaces are likely to be dynamic over both short and long time scales. Attention or context effects may dynamically ‘warp’ the shape of action spaces as they are used in making action judgments. For example, Shahdloo et al. (2022) used neuroimaging to reveal how the distributed patterns of activity evoked by actions were modulated by changing the observers’ immediate task to attend either to the actions’ communicative or to their locomotion characteristics. At longer time scales, effects of experience and expertise are relevant. For example, over years of practice a gymnast must build a dense and detailed ‘space’ representing her specialist events, one likely to have more, and more meaningful, dimensions than that of a novice observer. Moreover, attention and experience might modify the weights of specific dimensions (Gärdenfors, 2004). To our knowledge, these ideas have not been explored empirically from the action spaces perspective.

The organization of observed actions into action spaces shows similarities with semantic networks (e.g. Collins & Quillian, 1969) and with semantic categories generally (see e.g. Levin, 1993; Pinker, 1989; Talmy, 1985). However, the organization of actions depicted by visual stimuli and by verbal material is bound to differ in important ways, since visually presented actions are concrete instantiations of a specific action, whereas language has the flexibility to refer to actions at varying levels of abstraction. For related discussions, see Tucciarelli et al. (2019); Vinson & Vigliocco (2008); and Watson & Buxbaum (2014).

Action Frames

The ‘space’ metaphor is powerful for capturing the key dimensions of action knowledge, as well as subjective judgments of the similarity of different kinds of action. One limitation of the approach, however, is that it obscures some of the rich internal structure that constitutes our knowledge of familiar actions, structure that is not easily captured by a dimensional representation treating action concepts as single points in a mental space. Accordingly, building on previous conceptions of knowledge frames (Minsky, 1975) and scripts (Bower et al., 1979; Schank & Abelson, 1977), here we consider the idea of action frames. Related ideas have also been explored more recently in the context of action understanding (Aksoy et al., 2017; Chersi et al., 2011; Zacks et al., 2007), although these have tended to focus more narrowly on specific issues such as the sequential nature of actions. An action frame may be seen as a schematic representation that describes, abstractly, important features of an action: its intended outcomes or goals; the means by which those goals are typically achieved; and the kinds of movements, postures, objects, and locations associated with that kind of action (Figure 4). These associations are assumed to be picked up from statistical co-occurrences in our natural environment. Action frames may help to identify action kinds by interacting with the output of perceptual systems that recognize objects and scenes (Epstein & Baker, 2019), detect and classify people (Pitcher & Ungerleider, 2021), and estimate their poses and movements (Giese & Poggio, 2003). These perceptual systems analyse an observed action, abstracting over some details (e.g. the colour of a knife) while emphasizing others (e.g. its position relative to the ingredients, and its motion relative to the movements of the chef). Consistent evidence gathered in the perceptual systems and in the schematic action frame representations is mutually reinforcing, whereas inconsistent evidence leads to suppression. Recent perspectives have also highlighted the perceptual significance of typical relationships amongst scene elements (Bach et al., 2005; Green & Hummel, 2006; Hafri & Firestone, 2021; Kaiser et al., 2019), which will also have diagnostic value for distinguishing among different kinds of actions. We can understand action classification as emerging from competitive interactions amongst perceptual systems, and between perceptual systems and action frames, such that (normally) the system rapidly converges on an interpretation that best coheres with the available evidence (cf. Ernst, 2006; Netanyahu et al., 2021).
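As a toy illustration of this competitive convergence, consider the minimal sketch below, in which each frame accumulates support from detected scene elements and the frames then compete via normalization, so that the best-matching frame dominates. The frame names, slot features, weights, and confidence values are all invented for illustration.

```python
# Hypothetical action frames: each names the scene elements (slots)
# it expects, with an invented weight per slot.
frames = {
    "cooking":  {"knife": 1.0, "stove": 1.0, "chopping-motion": 1.0},
    "cleaning": {"sponge": 1.0, "sink": 1.0, "wiping-motion": 1.0},
}

# Perceptual evidence: detected scene elements with confidences
# (again invented; a knife and a stove are seen clearly, a sink faintly).
evidence = {"knife": 0.9, "stove": 0.7, "sink": 0.3}

# Each frame accumulates support from the slots that match the evidence...
support = {name: sum(w * evidence.get(feat, 0.0) for feat, w in slots.items())
           for name, slots in frames.items()}

# ...and the frames compete: dividing by the total support implements a
# simple form of mutual suppression, so gains for one frame come at the
# expense of the others.
total = sum(support.values())
belief = {name: s / total for name, s in support.items()}

best = max(belief, key=belief.get)
print(best, round(belief[best], 2))  # cooking 0.84
```

A single normalization step stands in here for what the text describes as an iterative, bidirectional process; a fuller model would also let the winning frame feed back to pre-activate its expected objects and movements.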

Figure 4 Illustration of the ‘action frames’ perspective. A, B: Perceptual subsystems process objects, body postures, movements, and scenes to extract relevant aspects of the action, and the relationships among them. C: Mental ‘action frames’ capture the roles, relationships, and reasons that comprise our action knowledge. Slots of a given frame gather perceptual evidence about scene elements. Matches increase the evidence for one action (<cooking>) relative to others (<cleaning>). Normally, interactions between perceptual subsystems and action frames cohere rapidly to select one action frame; action understanding is this convergence of activity. Links omitted for clarity.

Action frames must be abstract enough to encompass the wide perceptual variety of action instances described earlier. They must also be flexible or probabilistic, rather than rigid, to account for our ability to tolerate variation: for example, cooking normally happens in the kitchen but may also take place outdoors at a campsite. Further, a key aspect of our knowledge about action kinds is an understanding of the desired outcomes that normally motivate a given action. Accordingly, frames need to describe not only knowledge about the directly observable elements that constitute an action; they also need to include descriptions of the expected mental states of the actors. Finally, they require access to more general semantic knowledge of the physical and social world. This includes, for example, knowledge about typical cause-and-effect relationships (cooking pasta makes it soft and edible; stealing from someone makes them angry). Likewise, we deploy knowledge about the ways in which the properties of objects like tools make them suited to specific kinds of manipulations for specific kinds of outcomes – the shape, hardness, and weight distribution of a hammer make it useful for driving in nails (e.g. Buxbaum et al., 2014; Osiurak & Badets, 2016; see also Binkofski & Buxbaum, 2013).

Action frames as described here might offer several useful properties. First, they may capture the highly predictable way in which actions generally unfold over time, something not readily represented by a semantic space of actions. For example, purchasing food ingredients is not just semantically related to cooking; one typically precedes the other in a predictable way. Likewise, at a finer grain, preparing a soup may include obtaining, washing, peeling, and slicing vegetables, sub-actions that only make sense in a specific order. These regularities enable an observer to anticipate what is likely to follow next (Aksoy et al., 2017; Chersi et al., 2011; Schank & Abelson, 1977; Zacks et al., 2007). It may be difficult to capture such relationships in a scheme in which action kinds are considered as ‘points’ in a multidimensional Euclidean space; a more abstract and compositional representation may be better suited to capturing the temporal and causal relationships that describe typical chains of actions.

Prediction

We previously highlighted the important theme of prediction in action understanding. Here we briefly explore how expectations and predictions might play out from the perspectives of action spaces and action frames. For one example, Tamir, Thornton, and colleagues have proposed that the proximity of actions in a space reflects not only semantic similarity, but also transitional probability. In general, one cooking event is more likely to immediately follow another cooking action than (say) a vehicle-repair action. In a series of studies, Thornton & Tamir (2021a) found that participants’ ratings of transition probabilities between actions corresponded well to actual rates of action transitions (determined on the basis of several large naturalistic datasets). More important, Thornton & Tamir (2021b) demonstrated that actions that were close to each other in the ACT-FAST action space described above were also more likely to follow each other.
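The intuition that semantic proximity and transitional probability go hand in hand can be made concrete with a toy simulation. The sketch below (all action labels, embedding coordinates, and the observed sequence are invented for illustration, and are not taken from the ACT-FAST studies) estimates transition frequencies from a short action sequence and compares them with distances in a small ‘semantic space’.

```python
import numpy as np

# Hypothetical 2-D "semantic space": cooking actions cluster apart
# from DIY actions. All coordinates are invented for illustration.
actions = ["chop", "stir", "taste", "drill", "sand"]
embedding = np.array([
    [0.9, 0.1],   # chop
    [0.8, 0.2],   # stir
    [0.7, 0.1],   # taste
    [0.1, 0.9],   # drill
    [0.2, 0.8],   # sand
])

# An invented sequence of observed actions, standing in for a
# naturalistic dataset of action transitions.
sequence = ["chop", "stir", "taste", "stir", "chop",
            "stir", "taste", "drill", "sand"]

# Estimate joint transition frequencies from consecutive pairs.
idx = {a: i for i, a in enumerate(actions)}
counts = np.zeros((len(actions), len(actions)))
for a, b in zip(sequence, sequence[1:]):
    counts[idx[a], idx[b]] += 1
trans = counts / counts.sum()

# Pairwise Euclidean distances in the semantic space.
dist = np.linalg.norm(embedding[:, None] - embedding[None, :], axis=-1)

# In this toy example, transitions between nearby actions
# (chop -> stir) are more frequent than transitions between
# distant ones (chop -> drill), mirroring the proposed link
# between proximity and transitional probability.
```

Of course, in the empirical work the direction of inference runs the other way: proximity in a space fitted to behavioural ratings is used to predict transition rates measured in large naturalistic datasets.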

From the action frames perspective, expectations – for example, evidence that an action will unfold in a kitchen – allow relevant action frames (e.g. for cooking, eating, and washing up) to compete with and suppress less relevant ones. In turn, this enables pre-activation of cooking-relevant objects (e.g. a knife), again at the expense of other unrelated objects (e.g. pliers). The net effect of these competitive interactions should be a relative advantage in understanding actions that are consistent with expectations, by suppression of unlikely alternatives. Indeed, actions embedded in an incongruent context take longer to process than actions embedded in a neutral or congruent context (Wurm & Schubotz, 2012, 2017). Likewise, ambiguous actions are recognized with higher accuracy when taking place in a congruent context in comparison to incongruent or neutral contexts (Wurm & Schubotz, 2017; Wurm et al., 2017a). Here, the surrounding context (e.g. the emotional facial expression of an agent) shapes the interpretation of the action (e.g. an approaching fist with the intention to punch or to greet the observer with a fist bump; see e.g. Kroczek et al., 2021), just as ambiguous objects (e.g. Brandman & Peelen, 2017) and emotional facial expressions (Aviezer et al., 2012) are interpreted in reference to their immediate context in the domains of scene and body perception.

To summarize, here we have considered two complementary perspectives on how the mind organizes long-term knowledge about familiar actions. These are not mutually exclusive ideas: as action understanding is so complex, each perspective may better describe different aspects of what we know about actions, how that knowledge is applied to understanding ‘what’ a given action is, and how that supports predictions about the actors and events that we interact with.

2.2 ‘How’: Observational Learning, Imitation, and Expertise

In many contexts, the specific manner in which an action is carried out (‘how’) may be more immediately relevant than its category (‘what’), so here the action frames and spaces constructs may have fewer applications. Much of the research under this heading has focused on learning, to ask how observing actions can change the observer’s own action repertoire; and conversely, how one’s own experience with a family of actions influences how those actions are perceived. We also describe a strand of the literature in which attending to the ‘how’ of an action provides the observer with cues about the beliefs, intentions, or longstanding traits of the actors.

Observational Learning

Observational learning (sometimes ‘social learning’) refers, in the broadest sense, to acquiring knowledge about the contingencies between behaviour and outcomes by observing others (Bandura & Walters, 1977). By observation, without the need for first-hand experience that may be ineffective, slow, or even dangerous, we can learn that touching an electrified fence is painful; that others find playing a musical instrument rewarding; or that posting controversial opinions to Twitter attracts both praise and condemnation.

One focus area within this broad theme concerns the transfer of learning from one domain (normally vision) to motor performance. What can we learn about serving a tennis ball, shaping a clay pot, or performing brain surgery, by watching an expert do those things? A fundamental issue in this literature concerns the role of symbolic or cognitive representations in mediating the benefits of observational learning. As an example, a tennis novice observing a coach in order to learn to serve might try to segment the observed movement according to summary cues such as ‘down together’, ‘up together’, ‘back’, ‘hit’, and ‘follow-through’. In a classic study of motor sequence learning, Bandura & Jeffrey (1973) compared participants’ learning and retention of simple manual action sequences as a function of rehearsal type. Those participants who were instructed to encode observed sequences in verbal terms (e.g. with letter codes) recalled sequences better than those who were not, especially over longer intervals. The insight is that at least in some cases, symbolic representations are more informationally compact and durable (Uithol et al., 2012), and therefore more readily rehearsed and retrieved at a later time, compared to ‘raw’ motor representations.

Related research asks whether the action representations acquired from observation are explicit, in the sense of being overtly retrievable and usable as part of a strategy, or instead implicit, in the sense of being acquired without awareness. For example, serial response tasks require participants to rapidly press one of several keys corresponding to the location of a single target. Typically, if the sequence of locations is repeated in a second-order cycle, response times improve with practice. However, explicit knowledge of the sequence may be absent, for example as tested in a subsequent task requiring participants to guess the next item in a series (Seger, 1997). In contrast, studies of observational learning – in which participants learned keypress sequences simply by watching the target events appear – found that sequence learning was mediated mainly by explicit, verbalizable knowledge (Kelly et al., 2003). Interestingly, Bird and colleagues (2005) found that when sequences were observed not simply as visual events, but as the outcomes of a live actor’s behaviour, implicit learning was also revealed, suggesting that the actor’s presence encouraged a more first-person like encoding of the action sequences.

The preceding examples all relate to categorical actions and action sequences – pressing one of four keys, for example – for which the specific action dynamics were not relevant. Other studies have examined observational learning with actions that involve more continuous variables. For example, Mattar & Gribble (2005) required participants to make simple reaches under the influence of an unseen ‘force field’ that deflected those movements. Participants who first watched another actor perform this task before attempting it showed stable benefits (e.g. smaller disruptions to their own reach trajectories) compared to controls. Notably, this observational learning remained essentially intact even when it took place under a demanding concurrent cognitive load, suggesting a relatively automatic and implicit form of learning (see also Section 3).

Conversely, other work has examined the transfer of motor learning to visual action judgments. Casile & Giese (2006) demonstrated how learning to perform an unusual pattern of walking movements selectively improved visual detection of those movements when they were rendered as point-light animations. In a more naturalistic context, Aglioti et al. (2008) demonstrated that experienced basketball players made better predictions about the outcome of observed free throws in comparison to individuals with similar visual experience (experienced coaches, sports journalists) and to novices. Improved performance of players in this example, compared to experienced coaches, invites the interpretation that motor experience specifically contributes to improved action understanding. In a similar vein, Knoblich & Flach (2001) found that participants were better able to judge from a video where a thrown dart would land, when that video depicted a previous throw that they had performed themselves, compared to another thrower. An important feature of each of these motor-to-vision studies is that the observed actions were seen from a side view, that is, one that is normally unavailable for one’s own actions. Therefore, the learning exhibited in those situations must extend over modalities (from motor to visual) and must also generalize across visual perspective.

The preceding findings imply a close overlap between an observer’s own motor repertoire and her ability to understand actions. Yet other findings show that these two variables can be dissociated. For example, a series of studies of individuals with congenital dysplasia who lack upper arms (and therefore have no upper-limb motor representations), revealed essentially normal performance in a variety of tasks. These included different aspects of action understanding, including the ability to name pantomimes and point-light animations, to learn new actions, and to predict the outcome of basketball free-throws (Vannuscorps & Caramazza, 2016; but see Vannuscorps & Caramazza, 2023). Developmental studies reveal similar dissociations; for example, three-month-old infants have been shown to interpret observed actions as goal-directed before they are able to perform reach and grasp actions themselves (Liu et al., 2019; see also Southgate, 2013). In sum, whereas several studies suggest that the ability to detect subtle differences in the kinematics of observed movements is modified by the observer’s experience, relevant motor experience is not always a necessary requirement for the ability to understand actions.

Imitation

Observational learning generally relates to the effects of experience on later performance (or perception) of an action. In contrast, imitation concerns the attempt to immediately replicate another person’s action. Here, key research questions have concerned the development of imitation (to what extent is imitation present from birth?) and automaticity (e.g. to what extent does imitation occur in spite of the observers’ current goals?).

The claim that even newborn infants possess not only the ability to imitate facial expressions but a tendency to do so spontaneously (Meltzoff & Moore, 1977) has been highly influential, although the core findings have been questioned by more recent large-scale replication efforts (e.g. Oostenbroek et al., 2016). Similarly, the study of ‘automatic’ imitation in adults has proven fruitful. A simple procedure, typically called the automatic imitation task, was developed by Brass and colleagues (Brass et al., 2000; Cracco et al., 2018). Here, participants lift either their index or middle finger in response to a visually presented numeric cue. At the same time as the cue, an on-screen hand is shown to lift either the index or middle finger. While the finger movement is task-irrelevant, participants are nonetheless normally faster when that movement also matches the action they are required to execute, compared to when it does not match. Variants of this procedure have been developed to understand this compatibility effect, to identify its neural correlates (Darda & Ramsey, 2019), to assess its malleability following training (Catmur et al., 2007), and to test the claim that it is ‘automatic’ (Cracco et al., 2018).

In contrast to these relatively simple and controlled tasks, researchers in social psychology have asked whether, in more naturalistic settings, participants tend to unwittingly mimic the movements or body postures of confederates. For example, Chartrand & Bargh (1999) reported a ‘chameleon effect’ whereby individuals may unintentionally match others’ overt behaviours, and moreover that the experience of being imitated in this fashion increases liking. In general, then, there is some evidence of the tendency for irrelevant or incidental actions of others to influence the observer’s own concurrent behaviours, even in the absence of an explicit goal to imitate.

A final important distinction is that between imitation and emulation, where the latter refers to an achievement of the same end state via different specific motoric means (see also Bekkering et al., 2000; Csibra, 2008; Heyes, 2001; Tomasello et al., 1993). For example, given no specific instructions, preschool children will tend to emulate the target of an action (e.g. reaching for the right ear) instead of producing a faithful copy of the observed action (e.g. reaching for the right ear with the contralateral hand; Bekkering et al., 2000). This finding illustrates that actions may normally be understood by default from the ‘intentional stance’ – as deliberate and rational behaviours, performed by an agent for a reason – a topic we return to in Section 2.3.

‘How’ beyond Observational Learning

The specific manner in which an action is performed (e.g. grasping a bottle at the top or the bottom) can provide cues about the immediate goal of an actor (e.g. to move the bottle or to use it to pour a drink). Observers may use a variety of sources, such as the kinematics and the preshaping of the hand, as well as perceived gaze direction (e.g. Aglioti et al., 2008; Ambrosini et al., 2011; Cavallo et al., 2016) to anticipate how an action will unfold, and to coordinate actions of two or more actors (see also Azaad et al., 2021). Access to the precise way in which an action is performed also plays a role in the predictive coding framework of action understanding (Kilner, 2011; Kilner et al., 2007). We will return to this point in Section 4.

Studies from the direct perception tradition (Gibson, 1979/2014) and more recently from the social vision framework (Adams et al., 2011) examine how the observed patterns of others’ movements provide rich clues about the states and traits of other individuals (with the caveat that such cues may not be fully valid). For example, studies of point-light recordings of actors performing simple actions revealed that they support above-chance identification of the actor (Loula et al., 2005) and discrimination of emotion (Atkinson et al., 2004), gender (Kozlowski & Cutting, 1977), or sexuality (Johnson et al., 2007). Those studies guided by the direct perception framework have tried to identify simple physical properties of movement patterns that reliably cue social variables without the need for complex cognitive analysis. For example, Kozlowski & Cutting (1977) identified that a lower centre of movement reliably signals female as opposed to male actors from walking patterns. Studies in the related social vision framework have tended to focus on outcomes, as seen, for example, in the finding that observers’ judgments of actors’ health from movement patterns was a reliable predictor of which actor would be selected in a hypothetical political election (Kramer et al., 2010). In sum, the details of action dynamics, even from minimal stimuli like ‘point-light’ animations, can provide information about the states and traits of the actors that perform those actions. In the following section, we examine how observed actions also provide evidence about more complex mental states such as goals and beliefs.

2.3 ‘Why’: Intentions, Mental States, and Traits of Observed Actors

A meaningless waggle of the hands, or a flag waving in the wind, are not actions: actions are carried out with the intent to effect a change in the state of the world. As described in Section 2.1, typical outcomes are an essential part of our general, abstract semantic knowledge about different action kinds. Here, we explore the situation in which an observer understands the goals of a specific actor undertaking a specific instance of an action. To emphasize the distinction: it is one thing to know that, in general, cleaning the kitchen is an action intended to reduce the amount of dirt in that room, and another to understand the behaviour of a specific individual performing specific movements in a specific kitchen with a broom and dustpan.

As noted in Section 1, understanding the goal or desired outcome of an action is sometimes regarded as the pinnacle of a hierarchical encoding of that action. Yet the ‘why’ behind an action may often be described at multiple levels: Why is he moving the broom forward and backward? Because he knows that this is an effective way to gather dust. Why is he sweeping? Because he desires the end-state of a clean floor. Why is he cleaning the floor? Because he wants his expected visitors to judge him positively. Furthermore, the goals of the observer will influence the level at which she seeks to identify the actor’s intentions (Bach et al., 2007; Spunt & Lieberman, 2014; Thompson & Parasuraman, 2012; Thompson et al., 2023). As an example, an observer might have the goal to imitate for the sake of learning; to figure out whether the other person needs help; or to form a first impression. What is common across all of these levels, however, is that the observer normally treats the actor with the intentional stance (Dennett, 1987). That is, when watching the man sweep his kitchen, she attributes to him mental states such as knowledge, beliefs, and goals – all of which may well differ from her own. She will understand these mental states as having a causal role in his decisions about what actions to perform and how; and conversely, she will expect that his actions follow rationally from his beliefs and goals, given his available repertoire.

Framed in these terms, we can examine some of the main approaches to revealing how observers understand the intentions of a specific actor from a specific observed action. In one approach (e.g. Brass et al., 2007; de Lange et al., 2008; Dungan et al., 2016; for a meta-analysis, see van Overwalle, 2009), actions are presented that are unusual or unfamiliar in some aspect: for example, a person switching on a light with her knee (which makes sense if the hands of the actor are occupied, but not if they are empty). The error signals that are generated by such unusual actions would normally trigger a search for an explanation, just as would be expected for other violations of expectations (such as seeing a rowboat in a desert landscape; Brandman & Peelen, 2017; Oliva & Torralba, 2007). Generally, when there is a significant mismatch between a percept and one’s expectations or action knowledge, a more explicit and effortful process is engaged to understand the action. To what extent does that search involve representing the actor’s mental states?

One approach to examining mental state attribution in action understanding is by reverse inference from the activity of brain regions that are thought to support such ‘mentalizing’, as revealed in false-belief or perspective-taking tasks (e.g. Saxe & Kanwisher, 2003; Schurz et al., 2014). Unusual actions (switching on a light with the knee) recruit such brain regions more when they are presented in an implausible context (actor’s hands are free) relative to a more plausible context (the hands are otherwise occupied; Brass et al., 2007). The logic is that the implausible action elicits an attempt to identify an account of the situation, which by default is one that relies on representing the mental states of the actor.

A related topic in social psychology (e.g. Ambady & Rosenthal, 1992; Estes, 1938; Tamir & Thornton, 2018) concerns how action understanding provides cues about the states and traits of an actor (Bach & Schenke, 2017). Here we are concerned with the meaning and outcomes of the action, rather than the dynamics as reviewed in the section ‘How’ beyond Observational Learning. For example, observing an actor make a donation to a charity supports general predictions about his future behaviour in related situations (such as helping an old person cross the road), perhaps mediated by a guess about his personality traits. Indeed, the fundamental attribution error (Gilbert & Malone, 1995; Ross, 2018) reveals how observers tend to emphasize explanations of other people’s behaviour in terms of the actor’s personality traits, often neglecting the contribution of the situation or context. For example, having observed that a colleague regularly drives his car instead of his bike to work despite a relatively short distance, we might consider him lazy, without taking into account that he might have to drop his children at a more distant nursery on his way to work. These concepts and findings reveal how action understanding contributes to general processes of person perception in the social-psychological sense.

Finally, several authors have adopted a Bayesian inverse planning approach to model how mental state inferences are drawn from observed actions (e.g. Baker et al., 2009; Baker et al., 2017). As an example, Baker et al. (2009) presented human observers with simple animations consisting of an agent moving through a two-dimensional environment with obstacles and target locations. The animations stopped at a predefined point in time, and participants had to report which target they thought the agent was trying to reach. The authors modelled causal relations between the environment, goals, and actions in the form of rational probabilistic planning in Markov decision problems. To infer the agent’s beliefs and goals from their actions, this relation is inverted using Bayes’ rule. The goal of the agent is to achieve a specific state of the environment, and this goal can change over time and can have varying levels of complexity. The authors observed that participants’ judgements could be well predicted on the basis of these Bayesian inverse planning models. Whereas the paradigm focused on spatial navigation, similar models can also be adapted to more complex, naturalistic tasks. In sum, the contribution of this modelling approach is to formalize ideas about how observed variables (actions) may be used to make inferences about unobservable variables (actors’ beliefs and goals).
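The logic of inverse planning can be conveyed with a minimal sketch. The toy model below is far simpler than the Markov decision problems used in this literature, and the agent, goal positions, and rationality parameter are invented for illustration: the observer assumes a noisily rational agent, scores each candidate goal by how well it explains the observed movements, and combines these likelihoods with a prior via Bayes’ rule.

```python
import math

# Hypothetical setup: an agent on a 1-D track steps left (-1) or
# right (+1) towards one of two candidate goals. All values invented.
GOALS = {"A": 0, "B": 10}   # candidate goal positions
BETA = 2.0                  # rationality: higher = more deterministic agent

def action_likelihood(pos, step, goal_pos, beta=BETA):
    """P(step | position, goal) for a softmax-rational agent:
    steps that reduce distance to the goal are exponentially preferred."""
    utilities = {s: -abs((pos + s) - goal_pos) for s in (-1, +1)}
    z = sum(math.exp(beta * u) for u in utilities.values())
    return math.exp(beta * utilities[step]) / z

def infer_goal(trajectory, prior=None):
    """Posterior over goals given observed positions (Bayes' rule):
    P(goal | actions) proportional to P(actions | goal) * P(goal)."""
    prior = prior or {g: 1 / len(GOALS) for g in GOALS}
    post = dict(prior)
    for pos, nxt in zip(trajectory, trajectory[1:]):
        step = nxt - pos
        for g, goal_pos in GOALS.items():
            post[g] *= action_likelihood(pos, step, goal_pos)
    z = sum(post.values())
    return {g: p / z for g, p in post.items()}

# An agent starting at 5 and moving rightwards becomes increasingly
# likely to be heading for goal B, even before arriving.
posterior = infer_goal([5, 6, 7, 8])
```

The key design choice mirrors the modelling approach described above: the forward model (how goals generate actions) is explicit and generative, and mental state inference is simply its Bayesian inversion, applied to a partial trajectory.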

To briefly summarize Section 2: we have so far reviewed different perspectives on action understanding, asking what kinds of mental representations and processes might be used to understand an action. What emerges clearly is that the answer depends on the goals of the observer: action understanding is not monolithic. While there are important examples that cross the boundaries, the tasks of classifying an action, understanding how an action is carried out, and understanding the intentions of the actors, draw on different mental capacities. Broadly, classifying actions requires a rich semantic ‘database’ of our long-term knowledge about actions; attention to the means by which an action is performed implicates implicit, motoric knowledge as well; and adopting the intentional stance to make inferences about others’ mental states requires implicit theories of how traits, states, intentions, and behaviour interact.

3 Attention and Automaticity

3.1 Varieties of Attention

Do observers automatically understand an action that they observe, as sometimes suggested (Ferrari et al., 2009; Iacoboni, 2009; see also Cook et al., 2014, for a review)? The evidence reviewed in Section 2 already indicates the limits of the automaticity of action understanding, given its multifaceted nature, and its dependence on distinct processes as well as contextual factors including the observer’s own experience and goals. To focus more closely on the question, here we consider several conceptions of automaticity that have been put forward in the social cognition literature (Bargh, 1989). To simplify the discussion, in each case we refer to examples that have used the ‘automatic imitation’ task (Brass et al., 2000; see above) as a proxy measure of understanding a simple viewed movement.

First, what aspects of action understanding proceed even when they are not relevant to the task at hand? Say the observer is trying to find a friend who is performing on a crowded stage; to what extent does he also represent the performer’s actions even though these are not relevant to his goal? In the context of the automatic imitation task, Hemed et al. (2021) approached this issue by including incompatible finger movements that were also never task relevant (and so not part of the participants’ response set). Such irrelevant movements did not affect task performance, providing one example of the attentional filtering of action even in a very minimalistic setting. In other words, there is a limit to the automaticity of processing even simple movements viewed in isolation.

Second, what aspects of action understanding are resistant to top-down control, which is to say they are carried out even when the observer deliberately tries not to do so? Chong et al. (2009) reported that the ‘automatic imitation’ of a viewed grasping action (measured via response compatibility effects) was eliminated when participants’ attention was directed to another object presented at the same location. Here, again we see evidence against strong ‘automaticity’: even a single, foveated action affects the observer’s behaviour less when it falls outside the focus of selective attention.

Third, to what extent does action understanding persist in a complex visual environment, or under increased mental load? For example, in daily life, an action may be observed in a serene setting (watching the only other patient in a dentist’s waiting room) or in a complex one (watching fans in a sporting arena). At the same time, one may be free of distraction, or alternatively heavily distracted by another ongoing mental task (e.g. attending an online meeting while also home-schooling). These examples highlight the dimensions of perceptual and cognitive load, which deeply affect everyday cognition (Lavie & Dalton, 2014). Several recent studies have explored the effects of perceptual load (Catmur, 2016; Thompson et al., 2023) and cognitive load (Ramsey et al., 2019) on tasks that require either explicit action category judgments or measure action perception implicitly (but see Benoni, 2018). The general strategy is to assess how an action task is impacted by a second concurrent task, performed at low versus high load. In perceptual tasks, load is typically manipulated by adding more, or more varied, stimulus items along with the task-relevant item. Cognitive load may be varied by requiring participants to maintain one versus many letters or digits in working memory. Catmur (2016) reported that perceptual load amplifies the effects of irrelevant finger movements in the automatic imitation task. In contrast, Ramsey et al. (2019) reported no effects of concurrent cognitive load on the strength of the automatic imitation effect. This was the case even when the items to be maintained in working memory (images of hand postures) were highly similar to the automatic imitation cues. Findings like these help establish the automaticity of action understanding with respect to other ongoing perceptual and cognitive processes.

The studies discussed in this section all focused on relatively simple finger movements within the automatic imitation paradigm. Other studies have tested different action understanding tasks, along with manipulations to examine the relative automaticity of action processing and its modulation by perceptual and cognitive load (Lingnau & Petris, 2013; Spunt & Lieberman, 2014).

3.2 Task Set and Observer Goals

Observers may actively try to attend to the kinematics of an action (perhaps to learn how to improve one’s tennis backhand), its category (is that backhand a slice shot or not?), or its intended result (is that a drop shot or a long volley?). These distinct kinds of attentional sets in turn have an impact on more basic perceptual processes that analyse the scene: in the first example, perhaps attention is focused on the movements of the arm, whereas the angle of the racket may be more relevant in the second example. This intention to select aspects of the action may fail, in the sense that there may be processing of irrelevant aspects of the action as well. For one example, on the principles of object-based attention (Cavanagh et al., 2023), attempting to focus on the movement of the arm may necessarily entail selection of the tennis racket it holds as well. Similarly, based on neuroimaging studies, Spunt & Lieberman (2013) have suggested that focusing attention on ‘why’ an action is executed also elicits a representation of ‘how’ it is executed, even if the latter is not task relevant.

Finally, attention is sometimes construed as the selection of internal representations or templates, for example to support visual search for a certain target item such as a face or house (Chun et al., 2011; Peelen & Kastner, 2014; Serences et al., 2004). Applied to actions, we can think about search templates in the frameworks of action spaces and action frames (Section 2.1). In terms of action spaces, attention might ‘reshape’ representational geometries (see also Edelman, 1998; Kriegeskorte & Kievit, 2013; Nosofsky, 1986). As an example, attending closely to the location in which an action takes place (e.g. a kitchen) might effectively ‘expand’ the representational space of kitchen-related actions, and ‘compress’ the space around other actions (see also Nastase et al., 2017; Shahdloo et al., 2022; Wurm & Schubotz, 2012, 2017). In this metaphor, ‘expanding’ dimensions of a representational space implies enhancing distinctions that are relevant to that dimension (e.g. amongst different kinds of slicing, chopping, and grating) and de-emphasizing other distinctions that are not relevant (Figure 5). In contrast, in terms of action frames, attention might facilitate or inhibit the connections between different scene elements (cf. Figure 4B) or between different action frames (Figure 4C), again to highlight those that are contextually relevant.

A: Action space of four hypothetical action categories without attention (see also Figure 3).

B: Action space of four hypothetical action categories while attending to the category highlighted in red. In this example, distinctions among the members of the attended category are enhanced, whereas distinctions within irrelevant action categories, and also between action categories, are attenuated.

Figure 5 Schematic illustrating expansion and compression of action spaces via attention.
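The ‘expand and compress’ metaphor can be made concrete with a toy simulation. In the sketch below (our illustration, not drawn from the literature; the category names, coordinates, and scaling factors are all invented), exemplars are rescaled around their category centroid, so that within-category distances grow for the attended category and shrink elsewhere:

```python
import math

# Hypothetical 2-D action space with four categories (cf. Figure 5).
# All coordinates and scaling factors below are invented for illustration.
space = {
    "chop":  [(1.0, 1.0), (1.2, 1.1), (0.9, 1.3)],
    "grate": [(1.1, 3.0), (1.3, 3.2), (0.8, 2.9)],
    "pour":  [(4.0, 1.0), (4.2, 1.2), (3.9, 0.8)],
    "stir":  [(4.1, 3.1), (4.3, 2.9), (3.8, 3.2)],
}

def centroid(points):
    xs, ys = zip(*points)
    return sum(xs) / len(xs), sum(ys) / len(ys)

def attend(space, attended, expand=2.0, compress=0.5):
    """Rescale each exemplar around its category centroid: distances within
    the attended category grow (finer distinctions become available), while
    distances within unattended categories shrink."""
    warped = {}
    for category, points in space.items():
        cx, cy = centroid(points)
        scale = expand if category == attended else compress
        warped[category] = [(cx + scale * (x - cx), cy + scale * (y - cy))
                            for x, y in points]
    return warped

def mean_within_distance(points):
    """Mean pairwise Euclidean distance among a category's exemplars."""
    pairs = [(i, j) for i in range(len(points))
             for j in range(i + 1, len(points))]
    return sum(math.dist(points[i], points[j]) for i, j in pairs) / len(pairs)

warped = attend(space, "chop")
```

Under this warp, distinctions among ‘chop’ exemplars are enhanced while distinctions within the other categories are attenuated, matching the pattern sketched in Figure 5B.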

To briefly summarize Section 3: while we argue that a general answer to the question ‘is action understanding automatic?’ must be ‘no’, much remains to be learned about how different senses of automaticity apply in different contexts. We suggest that the concepts and approaches developed in the study of visual attention in general are well suited to test assumptions about the representations captured in action spaces and action frames. This broader approach, we suggest, will be more fruitful than seeking a simple answer to the question of whether or not action understanding proceeds automatically.

4 Brain Mechanisms

In the preceding sections, we focused on the mental processes and representations that enable action understanding. Next, we review evidence and theories about the brain regions, networks, and distributed patterns of activity that support action understanding tasks. Neuroscientific studies in this area have been very strongly influenced by the discovery of the ‘mirror neuron’ and related theoretical views on the contribution of the motor system to visual action understanding. Accordingly, we structure this section roughly chronologically: we track the initial findings and conceptions of mirror neurons, follow subsequent waves of human neuroimaging and non-human primate studies, and finally consider more recently emerging theoretical perspectives. Specifically, we start our journey in Section 4.1 by briefly reviewing evidence for visual action-selective neurons in the macaque superior temporal sulcus (STS). We then review in Section 4.2 the initial reports and key findings about ‘mirror neurons’ in macaque premotor cortex. Section 4.3 reviews studies inspired by those findings that sought signatures of a human ‘mirror neuron system’. These have used several methods to probe the activity of motor regions in visual action understanding tasks, and to identify potential markers of ‘mirror-like’ representations. More recently, as we see in Section 4.4, several groups have turned away from the emphasis on motor representations, to instead draw methodological and theoretical parallels between action understanding and research on visual object perception. Finally, in Section 4.5, we come full circle to consider more recent discoveries about mirror neurons in the macaque, and to review how thinking has evolved about possible alternative functional roles of mirror neurons or a mirror ‘system’ in human action understanding. Throughout, we highlight points of contact between neuroscientific findings and concepts, and the themes introduced in Sections 2 and 3. Figure 6 provides a visual guide to some of the regions in the human and the macaque brain that we discuss.

A: Macaque brain, lateral view.

B: Human brain, lateral view.

Adapted from https://www.supercoloring.com/coloring-pages/human-brain-anatomy. F5: rostral portion of ventral premotor cortex, CS: central sulcus, AIP: anterior intraparietal area, IPL: inferior parietal lobe, STS: superior temporal sulcus, IT: inferior-temporal cortex, V1: primary visual cortex, PMv: ventral premotor cortex, PMd: dorsal premotor cortex, IFG: inferior frontal gyrus, S1: primary somatosensory cortex, SPL: superior parietal lobule, pSTS: posterior superior temporal sulcus, LOTC: lateral occipitotemporal cortex, MT: middle temporal area.

Figure 6 Key brain regions discussed in Section 4.

4.1 High-Level Visual Representations of Actions in the Macaque

Perrett and colleagues demonstrated that macaque STS contains neurons that selectively respond to different types of observed manual actions, such as picking, tearing, or rotating (e.g. Perrett et al., 1989). Some of these neurons generalized over different instances (e.g. front versus side view), and were also sensitive to agent-object interaction (e.g. a hand manipulating fur versus a hand performing the same action but with a gap between hand and fur). From findings like these, the authors concluded that networks of neurons within the STS collectively represent socially significant aspects of others’ movements and postures, such as their direction of attention, or their intention to act.

Later, action-sensitive neurons with more complex properties were discovered in the same general region. As mentioned earlier, under natural conditions intentional grasping actions in humans are accompanied by an anticipatory gaze shift of the actor towards the object (Ambrosini et al., 2011, 2015; Flanagan & Johansson, 2003). Neurons in macaque STS have been shown to detect subtle variations in this relationship. For example, Jellema et al. (2000) found stronger responses in anterior STS neurons when both a reaching movement and gaze were directed towards the monkey, in comparison to a reach towards the monkey accompanied by a shift of gaze somewhere else. Findings like these have been taken as evidence of neural computations that support discriminating intentional actions from unintentional movements.

4.2 Initial Discovery and Characterization of Mirror Neurons

Macaque ventral premotor area F5 was long known to contain neurons that discharge during the execution of specific object-directed hand actions (e.g. Rizzolatti et al., 1981, 1988), and during the observation of objects that require a specific grip type (Murata et al., 1997; Rizzolatti et al., 1981). Other studies also showed selectivity in these neurons for object-directed actions, irrespective of the effector involved (e.g. grasping food with the hand or the mouth; Rizzolatti et al., 1988). In a further study of this region, Di Pellegrino et al. (1992) incidentally observed that some F5 neurons also discharged during the passive observation of certain actions (e.g. picking up food) performed by the experimenter. Further observations with other actions revealed a direct correspondence between the effective action during observation and execution in a subset of all examined neurons (12 out of 184). Later studies identified additional properties of these ‘mirror neurons’: for example, that they would only discharge during an interaction between an actor and an object (Gallese et al., 1996), and that they sometimes responded to expected but occluded grasping actions (Umiltà et al., 2001). Furthermore, visuo-motor neurons found in the monkey inferior parietal lobule (IPL) were sensitive to the target of otherwise similar manual actions (e.g. grasping to place food into a container next to the shoulder versus grasping to place food into the mouth; Fogassi et al., 2005; see also Bonini et al., 2010). Further reviews of these and other early key studies are found in Kilner & Lemon (2013).

Modern perspectives on the relationship between perception, decision-making, action planning, and action execution tend to emphasize shared representations (e.g. the common coding framework; Prinz, 1997), and describe these as cascading parallel processes rather than serial stages (e.g. Cisek, 2007). Mirror neurons, because they discharge during the observation and execution of similar actions, have been proposed to provide the neural basis of such shared representations. As we will show in the following, this view has evolved and expanded greatly as new findings have emerged (for related reviews, see Kilner & Lemon, 2013; Rizzolatti & Sinigaglia, 2016; Heyes & Catmur, 2022; Bonini et al., 2022).

Based on the initial discovery of mirror neurons, Di Pellegrino et al. (1992) concluded that premotor cortex not only retrieves appropriate motor acts in response to sensory stimuli (such as the shape and size of objects), but also in response to the meaning of the motor acts of another individual. In other words, the authors argued that these neurons provide an explicit representation of the link between the execution of a motor act and its visual appearance when performed by another individual (Di Pellegrino et al., 1992). Gallese et al. (1996) went further to propose that mirror neurons play a role in action understanding of motor events, which they defined as ‘the capacity to recognize that an individual is performing an action, to differentiate this action from others analogous to it, and to use this information in order to act appropriately’. In line with the division between the ventral and dorsal pathways (Goodale & Milner, 1992; Ungerleider & Mishkin, 1982), the authors argued that neurons in STS might provide an initial description of hand-object interactions and capture a semantic (or ‘What’) representation of the action, whereas mirror neurons in F5 might provide a match with the ‘motor vocabulary’, capturing a pragmatic (or ‘How’) representation of actions.

Based on the observation that mirror neurons respond to hidden actions, Umiltà et al. (2001) further reasoned that mirror neurons have the capability to infer both the action and the object from past perceptual history, and suggested that the hidden condition requires cognitive effort from the monkey, since it must pay attention to the actions of the experimenter and ‘reconstruct the missing part of the action’ (page 161). These key points – italics ours – contrast with earlier proposals that mirror neuron activity supports ‘automatic’ action understanding (see also Cook et al., 2014).

Following the observation that some mirror neurons in monkey inferior parietal cortex code the target of an action, Fogassi et al. (2005) argued that individual motor acts are combined by means of ‘intentional chains’ which allow the observer to predict the goal of the action, and from that to ‘read the intention’ of the actor. This is a proposal about the discovery of internal mental states from observed actions, as discussed in Section 2. As a form of ‘direct perception’, it stands in contrast to the mentalizing or ‘theory of mind’ perspective, by which goals would be understood via inferences about beliefs and other mental states. Most starkly, some researchers (e.g. Rizzolatti et al., 2001) have claimed that actions are understood ‘when we map the visual representation of the observed action onto our motor representation of the same action’, without ‘inferential processing’ or ‘high-level mental processes’ – and that mirror neurons constitute the basis for this understanding. Note the contrast between this perspective and the descriptions of action spaces and action frames (Section 2), which describe our rich semantic knowledge about actions that is not obviously motoric in nature.

Claims that mirror neurons constitute a solution to the problem of action understanding, and that this takes place automatically, have remained controversial. For example, single cell recordings are correlational, so they do not allow inferences regarding a causal role of the measured neurons in the tasks under investigation (see also Caramazza et al., 2014; Hickok, 2009; Thompson et al., 2019). So it remains unknown whether mirror neurons play a causal role in action understanding in the macaque, a problem that is exacerbated because identifying suitable tasks and measures of ‘understanding’ in non-human primates is not trivial. Moreover, for practical reasons, studies of mirror neurons have in most cases focused on immediate reach-to-grasp movements targeting food or other desirable objects. It is therefore not clear how these kinds of findings generalize to the wide repertoire of actions (see also Sliwa & Freiwald, 2017) performed with various body parts, objects and tools in human daily life.

We return in Section 4.5 to more elaborate arguments and debates about the role of mirror neurons. First, however, we review key points in the large literature on human observers that has been directly inspired by the discovery of mirror neurons and by the initial ideas about their possible functional roles.

4.3 A Human Mirror System?

Whereas it is not straightforward to identify and characterize mirror neurons in humans directly, several indirect approaches have been developed to identify mechanisms that link observed and executed actions in the human brain (Figure 7). In each case, the core of the logic is that there should be some neural signature that is sensitive to the match between a specific performed action, and observation of that same action.

A: Post-stimulus ‘rebound’ of the suppressed cortical mu-rhythm response following execution of a repetitive action (solid lines) or passive observation of a similar action (dotted lines).

B: Enhancement of the contralateral Motor-Evoked Potential by passive observation of a grasping action (top) relative to an object observation control (bottom) in two hand muscles (first dorsal interosseus, left; opponens policis, right).

C: Human brain regions commonly activated in action observation, in action execution tasks, or by both tasks, in fMRI experiments.

D: Top: Human IPL exhibits repetition suppression for transitive hand actions that were mimed and then observed. From Chong et al., 2009. Bottom: Reduction in the hemodynamic response function to repeated actions relative to non-repeated actions.

E: Top: Schematic illustration of the logic from multivoxel pattern analysis (MVPA) fMRI studies that sought to identify regions in which local voxel patterns are more similar for the same action than different actions, across performance and observation. Bottom: Brain regions that exhibit the similarity patterns described in the top panel, as revealed by surface-based MVPA of fMRI data.

Figure 7 Examples of approaches to identifying aspects of human brain activity that share properties in common with the mirror neuron.

Physiological Measures of Motor System Activity

One way to examine the level of activation of the motor system is to induce a brief electrical current in the primary motor cortex via transcranial magnetic stimulation (TMS). This can trigger measurable motor evoked potentials (MEPs) in contralateral peripheral muscles such as in the hand. The strength of these MEPs indexes the excitability of the corresponding stimulated motor region (for a review, see Bestmann & Krakauer, 2015). Moreover, the comparison of MEPs between different muscles of the hand that are involved in specific types of grasping movements enables an examination of muscle-specific activation of the motor cortex during action observation. In general, findings that the excitability of the motor cortex can be modulated by passively observed compatible actions have been taken as evidence of a ‘mirror-like’ mechanism in humans (Rizzolatti et al., 2001).

Using this approach, several studies found that observing manual grasping actions leads to a muscle-specific activation of some of the same motor pathways that would be used if the observer were to perform that action (Baldissera et al., 2001; Fadiga et al., 1995; Maeda et al., 2002; Strafella & Paus, 2000). Such findings have been interpreted as a sign of an automatic ‘resonance’ in the observer’s motor system caused by observing the action. Other studies demonstrated that MEPs are sensitive to predicted action outcomes. For example, Aglioti et al. (2008) found that in expert basketball players, specific MEPs were evoked during the observation of missed free-throws, upon release of the ball but before the outcome was known (see also Gangitano et al., 2001, 2004; Kilner et al., 2004). More recent studies demonstrated that MEPs are not only modulated by low level kinematic features of an action, but are also affected by higher level processes, such as the difference between honest and deceptive actions (Finisguerra et al., 2018) or the congruence between the context and the action (Amoruso & Urgesi, 2016; Betti et al., 2022). Findings like these have contributed to a debate about whether MEPs reflect an automatic motor resonance, or whether instead they are also modulated by top-down influences (for a review, see Amoruso & Finisguerra, 2019).

Another series of experiments exploited the mu rhythm, an oscillation in the cortical EEG signal in the range of 8–13 Hz over sensorimotor cortex. In general terms, the mu rhythm is suppressed during selective attention and motor preparation, and it can be sensitive to the type of movement and handedness (for review, see Hobson & Bishop, 2007). Mu suppression, like MEPs, has been used as an index of motor system activity during the passive observation of others’ actions. In common with the properties ascribed to some mirror neurons, for example, suppression of the mu rhythm is stronger during the observation of a precision grip of an object compared to a mimicked precision grip in the absence of an object (e.g. Muthukumaraswamy & Johnson, 2004a; Muthukumaraswamy et al., 2004b). However, the view of the mu rhythm as an index of human mirror neuron activity also remains debated (e.g. Hobson & Bishop, 2017). In particular, it is not straightforward to determine whether a suppression in the 8–13 Hz window originates from sensorimotor areas, or whether it instead stems from a modulation of the alpha rhythm originating from occipital cortex. This alternative indicates that the modulation of the mu rhythm during action observation might instead, or additionally, reflect visual attention or perceptual processes.
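Mu suppression is commonly quantified as a log ratio of 8–13 Hz power during action observation relative to a baseline period, with negative values indicating suppression. The sketch below is our simplified illustration of that idea (a naive DFT over a short window; the sampling rate and signals are invented), not a published analysis pipeline, which would typically use Welch's method or time-frequency decomposition:

```python
import math

def band_power(signal, fs, lo, hi):
    """Power in the [lo, hi] Hz band via a naive DFT (illustration only;
    real EEG pipelines use Welch's method or wavelets)."""
    n = len(signal)
    power = 0.0
    for k in range(1, n // 2):
        freq = k * fs / n
        if lo <= freq <= hi:
            re = sum(signal[t] * math.cos(2 * math.pi * k * t / n) for t in range(n))
            im = sum(signal[t] * math.sin(2 * math.pi * k * t / n) for t in range(n))
            power += (re * re + im * im) / n
    return power

def mu_suppression_index(observation, baseline, fs, band=(8.0, 13.0)):
    """Log ratio of mu-band (8-13 Hz) power during action observation
    relative to baseline; negative values indicate mu suppression."""
    return math.log(band_power(observation, fs, *band) /
                    band_power(baseline, fs, *band))
```

As the text notes, an index like this cannot by itself distinguish sensorimotor mu suppression from occipital alpha modulation; that ambiguity is a property of the measurement, not of the arithmetic.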

Studies from the brain stimulation and mu rhythm lines of work have been useful to explore how the states of the observers’ motor system are influenced by what the observer sees and understands about an action. However, the functional implications of some of these findings remain debated, in that several interpretations remain about what processes these neural measures reveal.

Human Neuroimaging

Early human neuroimaging studies using PET (Grafton et al., 1996; Rizzolatti et al., 1996) and fMRI (Iacoboni et al., 1999) adopted the logic that anatomical overlap between brain areas that are recruited during the observation of actions, and the execution, imagination, or imitation of actions, would provide evidence of ‘mirror-like’ human brain representations. Some common findings in these initial studies laid the foundation for later human neuroimaging investigations. For example, fMRI studies demonstrated that during passive observation of goal-directed actions, participants recruit a consistent set of brain regions including the ventral premotor cortex (PMv) extending into the posterior IFG, the preSMA, somatosensory cortex, anterior and superior sections of the parietal cortex, and portions of the lateral occipitotemporal cortex (see Figure 7C). As a shorthand, these regions are often collectively referred to as the ‘action observation network’. Later studies showed how parts of this network (premotor, parietal, and somatosensory areas) also overlap with the areas involved during motor imagery and/or movement execution (for meta-analyses, see e.g. Arioli & Canessa, 2019; Caspers et al., 2010; Hardwick et al., 2018; but see Turella et al., 2009). Together, findings like these have been taken to show a common neural representation of the corresponding visual and motor aspects of actions, as a possible system-level homologue of the mirror neuron.

However, an influential commentary by Dinstein et al. (2008) noted limitations in this logic, namely that spatially overlapping activations (e.g. of regions responding to observed and to executed actions) may reflect overlapping but distinct neural populations rather than a shared representation (see also Peelen & Downing, 2007). Better evidence for a ‘mirror-like’ representation would be a demonstration that neuronal populations within overlapping regions are selective for specific motor acts. Accordingly, several studies have investigated cross-modal action selectivity using fMRI adaptation or repetition suppression (e.g. Grill-Spector & Malach, 2001). This method is based on the observation that the repetition of a specific stimulus property, such as object category, leads to an attenuation of the fMRI signal in neuronal populations that represent the repeated stimulus property.
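The cross-modal adaptation logic can be caricatured in a few lines. In this toy model (entirely our illustration; the response values, suppression factor, and modality labels are invented), a population carrying a shared visuo-motor representation attenuates its response whenever the action repeats, regardless of whether it was executed or observed:

```python
def crossmodal_adaptation(trials, suppression=0.4):
    """Toy model of cross-modal fMRI adaptation (illustration only; the
    unit response of 1.0 and the suppression factor are invented).
    Each trial is an (action, modality) pair. A population carrying a
    shared visuo-motor representation attenuates its response whenever
    the action repeats, regardless of modality."""
    responses, previous_action = [], None
    for action, modality in trials:
        repeated = action == previous_action
        responses.append(1.0 - suppression if repeated else 1.0)
        previous_action = action
    return responses
```

On this toy account, a ‘mirror-like’ population responds less to observing an action that was just executed, whereas a region containing separate visual and motor subpopulations would show no such cross-modal attenuation — the contrast the studies below set out to test.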

Several studies followed this approach to seek evidence for cross-modal action selectivity as suggested by Dinstein et al. (2008). Neuronal populations with such properties should adapt when the same action is repeated, across performance or observation of that action, compared to different actions. Dinstein et al. (2007) and Press et al. (2012) obtained action-selective adaptation during observation and also during execution in overlapping parietal regions. However, they did not observe cross-modal adaptation – that is, for observation of an action followed by its execution, or vice versa. Chong et al. (2008) found cross-modal adaptation in the right IPL, but only tested for execution followed by observation (see also de la Rosa et al., 2016). In contrast, Lingnau et al. (2009) tested for cross-modal adaptation in both directions; this effect was found in the left IPL, but only when observation was followed by execution. Finally, using a similar approach, Kilner et al. (2009) found cross-modal adaptation in the IFG in both directions.

Following these initial contradictory results, doubts arose about one of the key assumptions underlying these studies: namely, that mirror neurons adapt to repetition in the same way as other types of neurons. Caggiano et al. (2013) reported that mirror neurons in F5 do not reduce their response amplitude following two repetitions. By contrast, Kilner, Kraskov, & Lemon (2014) found a modulation of the firing rate, the latency, and the beta band power of the local field potential in this region, but only after repetitions of 7–10 trials.

Together, human neuroimaging studies using repetition suppression to examine cross-modal action selectivity remain inconclusive. One likely contributor to this is the variety of tasks, stimuli, and action types that have been tested. For example, the combined effects of action type (e.g. object-directed vs intransitive), viewpoint (e.g. first- or third-person), and meaningfulness (e.g. simple movements vs grasps vs pantomimes) have not been factorially explored within a single repetition suppression study of action understanding.

Multivoxel pattern analysis (MVPA; Norman et al., 2006) approaches offer another way to identify shared visual and motor representations of actions that may avoid the issues with interpreting ‘overlap’ identified by Dinstein et al. (2008). For example, in a series of studies, Oosterhof et al. (2010, 2012a, 2012b; reviewed in Oosterhof et al., 2013) used whole-brain surface-based ‘searchlight’ MVPA (Kriegeskorte et al., 2006; Oosterhof et al., 2010) to identify brain regions in which the local patterns of activity are a) similar for a given action, whether passively observed or performed by the participant; and also b) dissimilar for different actions. This logic captures the core concept of the mirror neuron in carrying representations that generalize over modality and also distinguish between different actions. These studies consistently revealed regions of the anterior parietal and lateral occipitotemporal cortex that met those defining criteria. Further, patterns of activity in the ventral premotor cortex were also cross-modal and action specific, but only for actions viewed from the first-person perspective – in contrast to initial evidence on mirror neurons that exhibited at least some evidence for selectivity to third-person views of action (see also Caggiano et al., 2011). By contrast, viewpoint independence of cross-modal action-selective representations was obtained in parietal and occipitotemporal cortex only (Oosterhof et al., 2012a).
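The two criteria above — similar patterns for the same action across modalities, dissimilar patterns for different actions — amount to a simple correlation contrast within each searchlight. The sketch below is our schematic rendering of that contrast (the data structure, the modality labels "observe"/"execute", and the action names are all invented for illustration), not the authors' actual analysis code:

```python
import statistics

def pearson(a, b):
    """Pearson correlation between two equal-length voxel patterns."""
    ma, mb = statistics.mean(a), statistics.mean(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = (sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b)) ** 0.5
    return num / den

def crossmodal_index(patterns):
    """patterns maps (action, modality) -> voxel pattern (list of floats).
    Returns mean same-action minus mean different-action correlation,
    computed across modalities; positive values meet the 'mirror-like'
    criterion of cross-modal, action-specific pattern similarity."""
    actions = sorted({action for action, _ in patterns})
    same, different = [], []
    for a in actions:
        for b in actions:
            r = pearson(patterns[(a, "observe")], patterns[(b, "execute")])
            (same if a == b else different).append(r)
    return statistics.mean(same) - statistics.mean(different)
```

In a searchlight analysis, an index like this is computed within a small neighbourhood centred on each surface node in turn, yielding a whole-brain map of candidate cross-modal action-selective regions.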

Finally, the most direct way to examine whether the human brain contains cells with mirror properties is to perform direct recordings in humans undergoing preparation for neurosurgery. Mukamel et al. (2010) recorded extracellular single and multiunit activity from a group of neurons in patients being treated for epilepsy. The authors found neurons that responded both during observation and execution of actions in the supplementary motor area, the hippocampus, and other nearby regions. A subset of these neurons showed excitation during execution, but inhibition during observation (see also Kraskov et al., 2009). The presence of both excitation and inhibition is in line with computational models of action planning (see e.g. Cisek, 2007) that assume that several potential actions are specified in parallel and compete with each other until there is enough sensory evidence in favour of one of these actions.

The preceding section has briefly laid out some of the main neuroscientific approaches that have been used to apply the mirror neuron logic to the human brain. Overall, the results of these studies converge to implicate several key regions in one or more aspects of action understanding (see Figure 6). Where they diverge is in the extent to which they confirm or fail to confirm the key concepts of cross-modal, view-invariant, and action-specific representations that were inherited from initial descriptions of mirror neurons.

Expertise

If one’s own motor representations play a causal role in action understanding, it stands to reason that the richness of those representations should influence the nature of understanding. Accordingly, several studies have examined how different kinds and levels of action expertise (and specifically motor expertise) change the way these actions are processed in brain regions of the action observation network. The general logic is that relative to the novice, an expert’s richer motor representations of an action repertoire enable an improved, or even qualitatively different, understanding of observed actions from that domain.

Observers’ expertise modulates fMRI activity within the action observation network (see Turella et al., 2013, for a review). For example, one series of studies examined brain responses of expert dancers from two disciplines (ballet and capoeira). In their domain of expertise, dancers exhibited more activity in prefrontal and parietal regions relative to dance movements of the other domain (Calvo-Merino et al., 2005) and to dance movements of the expert domain that were motorically but not visually familiar (Calvo-Merino et al., 2006; see also Cross et al., 2006, and Jola et al., 2012). The interpretation of these findings was that motoric aspects of dance expertise influenced the way that experts visually perceived and understood actions, by way of a cross-modal visuo-motor representation.

An apparent paradox in this literature is that in some cases the effect of experience appears to decrease rather than increase the activity in action observation regions (see e.g. Gardner et al., 2017). For example, Petrini et al. (2011) found such a pattern of results when comparing the neural activity elicited by observing ‘point light’ animations of drumming actions in experienced versus novice drummers. These divergent effects may reflect two different facets of expertise: on the one hand, expertise (e.g. with performing a class of actions) provides a rich framework by which observed actions may be assigned meanings that are not accessible to novices; hence a relative increase in activity in relevant regions for experts. On the other hand, expertise also entails familiarity with actions from the relevant domain, supporting an improved ability to predict what will be seen next. Indeed, the literature on perceptual expectations emphasizes the suppressing effect of expectations on neural activity, in line with predictive coding models (Summerfield et al., 2008).

Modulation by Task Requirements

In Section 3, we discussed the automaticity of action understanding. Neuroscientific studies have also approached this question by asking to what extent brain activity is modulated by manipulations of the observers’ task, such as by instruction to attend to an action or instead to an object in a scene (Wurm et al., 2015); to attend to the goal or to the effector involved in an action (Lingnau & Petris, 2013); to attend to the type of action performed by an animal or rather its taxonomic category (Nastase et al., 2017; see also Kemmerer, 2021); or to attend to the type of action, the actor, or the colour of the object (Orban et al., 2019).

Here, typically the task modulates the engagement of specific brain regions implicated in action understanding. For example, one study showed a higher response in the lateral occipitotemporal cortex when participants focused on the ‘what’ of an action, and higher responses in several areas, including the dorsomedial prefrontal cortex and the temporal pole, when they focused on the ‘why’ of an action (Spunt et al., 2011; but see Spunt et al., 2016). Part of the logic of such studies is to apply reverse inference from previous findings. For example, activity in the ‘action observation network’ may be interpreted as evidence for processing the ‘how’ of an action (Rizzolatti & Craighero, 2004; Rizzolatti & Sinigaglia, 2010; Caspers et al., 2010), whereas activity in regions linked to mentalizing tasks is taken to reveal an effort to understand the intentions behind an action (‘why’; e.g. Van Overwalle, 2009; Van Overwalle & Baetens, 2009). More generally, these studies reinforce the view discussed in Section 3, namely that action understanding is not reflex-like, but rather recruits neural processes that adapt to serve the observer’s current goals.

4.4 Parallels with Object Vision

Alongside the studies that have focused on describing possible human homologues of mirror neurons, other researchers have increasingly adapted research questions and methods from the domain of object recognition to action understanding. These parallels include, for example: how are invariant representations achieved over viewpoints, or over different exemplars (Figure 2)? What are the critical features and dimensions underlying the encoding of actions? And what are the temporal dynamics of the brain’s extraction of those features? Here we briefly summarize some recent work in this area.

Generalization and Abstraction

Which brain regions show selectivity for specific observed actions, and how abstract or generalized are those representations? Initially, motivated by findings from the mirror-neuron literature, many studies used region-of-interest (ROI) approaches to focus on regions such as the PMv and the IPL. To establish whether these regions demonstrate action selectivity – a response that can distinguish between different observed actions – several studies relied on fMRI adaptation. For example, Hamilton & Grafton (2006) reported that the anterior IPS encodes the object that is the target of the reach, in a way that generalizes over the specific trajectory that is required to reach that object. Similarly, Hamilton & Grafton (2008) reported a representation of the outcome of an action (e.g. an opened or closed box) that generalizes over the specific kinematics required to achieve that outcome, in the right IPL, the left aIPS and the right IFG (see also Majdandžić et al., 2009). Finally, using a related approach called TMS adaptation, Cattaneo et al. (2010) adapted participants to the observation of hand or foot actions manipulating an object. TMS applied to the IPL and the PMv led to shorter response times for repeated actions relative to non-repeated ones, irrespective of the effector. By contrast, TMS applied to the STS revealed effector-specific adaptation, suggesting action representations at different hierarchical levels in the STS and the IPL/PMv. Together, these studies are a good early example of how neuroimaging and brain stimulation methods could answer qualitative questions about levels of neural action representation, and how they vary across different brain regions.

The abstractness of an action representation has also been tested with a cross-decoding MVPA approach. Here, a classifier might initially be trained to distinguish between two observed actions (A and B), based on the activity patterns within a given brain region. Next, that classifier is tested to see whether it can still distinguish between the two actions following a variation in the way the actor performed the action. Using this logic with a whole-brain searchlight approach, several studies reported that it is possible to decode observed actions from patterns of activity in the lateral occipitotemporal cortex (LOTC) and in the IPL across different target objects (Wurm et al., 2015) and across objects and the kinematics required to manipulate these objects (Wurm & Lingnau, 2015). Similarly, Hafri, Trueswell, & Epstein (2017) were able to distinguish between different interaction categories (e.g. biting, kicking, slapping) across different visual formats (static images versus dynamic videos) based on activity in several regions, including occipitotemporal, parietal and left premotor cortex. And Wurm & Caramazza (2019) were able to decode actions from videos to written descriptions and vice versa from activation patterns in human LOTC.
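The cross-decoding logic can be sketched in a few lines of code. This is a toy illustration with synthetic ‘voxel’ patterns, not a reanalysis of any of the cited studies: a simple nearest-centroid classifier is trained to separate actions A and B under one condition (say, one target object) and then tested under a second condition, so that above-chance transfer indicates a representation that generalizes across the manipulated variable.

```python
import random

random.seed(0)

def make_pattern(prototype, noise=0.3):
    """Simulate one trial's voxel pattern: prototype plus Gaussian noise."""
    return [v + random.gauss(0.0, noise) for v in prototype]

# Hypothetical 'voxel' prototypes for two actions (A, B) under two
# conditions (e.g. two target objects). Action identity is carried by the
# first ten voxels; condition shifts the last ten, so a classifier must
# rely on action-related voxels to transfer across conditions.
proto = {
    ("A", 1): [1.0] * 10 + [1.0] * 10,
    ("B", 1): [-1.0] * 10 + [1.0] * 10,
    ("A", 2): [1.0] * 10 + [-1.0] * 10,
    ("B", 2): [-1.0] * 10 + [-1.0] * 10,
}

def centroid(patterns):
    return [sum(col) / len(col) for col in zip(*patterns)]

def classify(x, centroids):
    """Nearest-centroid decision rule."""
    def dist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))
    return min(centroids, key=lambda label: dist(x, centroids[label]))

# Train on condition 1 ...
train = {lab: [make_pattern(proto[(lab, 1)]) for _ in range(20)] for lab in "AB"}
cents = {lab: centroid(pats) for lab, pats in train.items()}

# ... and test on condition 2 (the cross-decoding step).
test = [(lab, make_pattern(proto[(lab, 2)])) for lab in "AB" for _ in range(20)]
acc = sum(classify(x, cents) == lab for lab, x in test) / len(test)
print(f"cross-decoding accuracy: {acc:.2f}")  # well above the 0.5 chance level
```

In real searchlight analyses the same train/test split is repeated at every brain location, and statistical testing replaces the simple comparison with chance used here.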

Together, what these MVPA decoding findings show is that distributed activity patterns can reveal rich information about viewed actions that goes beyond a literal description of a single instance of an action, to extend to more general properties. One important point of focus in this body of work has been around the anatomical regions implicated. As noted, the initial human neuroimaging work focused on the role of ventrolateral frontal and parietal regions. However, a typical pattern in a growing number of more recent human studies (e.g. Oosterhof et al., 2012; Wurm & Lingnau, 2015; Wurm et al., 2015, 2017b) is that these abstract action representations are instead found more consistently in posterior occipitotemporal regions. In part, this discrepancy may reflect different neural distributions in different regions, which may be more or less visible to MVPA. Indeed, using single-cell recordings from two tetraplegic patients with electrode arrays in the posterior parietal cortex, Aflalo et al. (2020) were able to decode manipulative actions across different stimulus formats in human parietal cortex.

The studies reviewed in this section so far clearly indicate how rich information about observed actions is implicit in the activity patterns seen in human brain regions beyond the core motor system. Indeed, a common pattern over multiple studies is that the highest degree of generalization, in common with object vision, is found in higher-level visual areas and the parietal cortex (see e.g. Ayzenberg & Behrmann, 2022).

Organization of Observed Actions in Space and Time

In Section 2.1, we described the logic of multidimensional ‘spaces’ that could describe some aspects of knowledge about action categories. More recent studies have adapted this logic – which emerged from work on the representations of concepts, objects, and faces (Gärdenfors, 2004; Shepard, 1958; Valentine et al., 2016) – to examine how patterns of brain activity might describe similar neural ‘spaces’ for action representation. Many of these studies adopt the representational similarity analysis (RSA) approach, which uses measures of similarity to describe the notional geometry of a representation of a class of events or stimuli (Kriegeskorte et al., 2008a). In this way, comparisons between behavioural and neural measures, or between two different neural measures, are possible at a level of abstraction above the specific items. For example, Tucciarelli et al. (2019; see also Tarhan et al., 2021; Zhuang et al., 2023) compared representational geometries based on the perceived semantic similarity of observed actions from behavioural measurements, with geometries derived from fMRI multi-voxel activity patterns. They found that neural activity patterns in a set of regions along the ventral and dorsal stream resembled the behaviourally determined action space, in the sense that there was a significant positive relationship between the ‘space’ inferred from behavioural judgments, and that determined from the patterns of brain activity. Thus, these approaches can link, at an abstract level, subjective and neural representations of action knowledge.
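The core RSA computation (build a dissimilarity matrix for each measure, then correlate the two matrices rank-wise) is compact enough to sketch directly. The ‘behavioural’ and ‘neural’ data below are simulated and purely illustrative:

```python
import random

random.seed(1)

def rdm(items, dist):
    """Representational dissimilarity matrix, as a condensed upper triangle."""
    n = len(items)
    return [dist(items[i], items[j]) for i in range(n) for j in range(i + 1, n)]

def euclid(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b)) ** 0.5

def spearman(x, y):
    """Spearman rank correlation (no tie correction; fine for a sketch)."""
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0] * len(v)
        for rank, i in enumerate(order):
            r[i] = rank
        return r
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Hypothetical data for six observed actions: behavioural judgments place
# each action in a 2-D 'semantic space'; the simulated neural patterns embed
# that same space into 20 noisy 'voxels' (each coordinate repeated 10 times),
# so the two RDM geometries should agree.
behav = [[random.uniform(-1, 1), random.uniform(-1, 1)] for _ in range(6)]
neural = [[x + random.gauss(0, 0.1) for x in b * 10] for b in behav]

behav_rdm = rdm(behav, euclid)
neural_rdm = rdm(neural, euclid)
rho = spearman(behav_rdm, neural_rdm)
print(f"RDM correlation (Spearman rho): {rho:.2f}")  # strongly positive
```

Because the comparison operates on the two RDMs rather than on the raw measurements, it abstracts away from the units and dimensionality of each measure, which is what allows behavioural ratings and voxel patterns to be related at all.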

If patterns of neural activity capture action ‘spaces’, what is the organization of these spaces? Tarhan & Konkle (2020) identified five distinct distributed clusters of brain regions covering the lateral and ventral occipitotemporal cortex and the intraparietal sulcus. These carried information about body parts and the target of an action during the passive observation of short naturalistic video clips. Responses in four of the identified clusters were organized by the spatial scale of the action (e.g. from small, precise movements involving the hands to large movements involving the entire body). Using a similar approach, Thornton & Tamir (2022) were able to decode amongst observed actions on the basis of their six-dimensional ACT-FAST taxonomy, based on fMRI activity measured from a widespread set of occipitotemporal, parietal and frontal regions. Finally, using EEG during passive observation of short video clips depicting everyday actions in combination with behavioural ratings, Dima et al. (2022) observed a temporal gradient in action representations. Over a period from 60 to 800 ms, the shape of action ‘spaces’ changed from an emphasis on visual features, to action-related features, and then to social-affective features. Together, studies like these show how specific action-space models can be developed and tested on the basis of neuroimaging data.

Multiple studies have found particularly strong evidence that LOTC plays a role in representing action spaces. For example, Tucciarelli et al. (2019) found that patterns of activity across the LOTC best capture the semantic similarity structure of observed actions, when variability due to specific action features such as body parts, scenes, and objects is removed. In that study, actions related to locomotion, communication, and food formed clusters both in the behaviourally determined and in the neural action space. Given evidence for abstract action ‘spaces’ in LOTC, is there evidence of any anatomical organization to the patterns of activity within this region? We have previously made the case for representational gradients across the LOTC, such that the way it encodes an action property (e.g. the extent to which it is person- or object-directed) varies continuously across the region. Specific proposed gradients include a posterior-anterior gradient for the dimensions concrete-abstract and visual-multimodal, and a dorsal-ventral gradient for the dimensions intentional-perceptual and animate-inanimate (e.g. Papeo et al., 2019; Tarhan et al., 2021; Wurm et al., 2017b; for reviews, see Lingnau & Downing, 2015; Wurm & Caramazza, 2022).

Together, this family of findings shows that the representational similarity approach can test hypotheses about how action knowledge is captured in distributed patterns of brain activity. Moreover, these studies have highlighted the role of the LOTC, and point to several action-relevant features that are captured in this region. At the same time, this review highlights that there is not yet consensus on a single set of organizing dimensions. Indeed, given the flexibility with which observers can process an action depending on their attentional state or task set, such a consensus may not be expected.

4.5 Mirror Neurons Revisited

Rizzolatti & Sinigaglia (2010) argued that while other people’s actions could, in principle, be perceived on the basis of visual processing, such a description lacks an understanding ‘from the inside as a motor possibility’, which was instead proposed to be provided by mirror neurons. Since then, following their discovery and initial characterization, additional properties of mirror neurons have been revealed that continue to shape ideas about how they contribute to action understanding. These findings have highlighted several complex factors that influence, or are even a core part of, action understanding. They take us further away from thinking of action understanding as a direct mapping of ‘the visual representation of the observed action onto our motor representation of the same action’ (Rizzolatti et al., 2001) or the idea that actions are understood ‘without inferential processing’ or ‘high-level mental processes’ (Rizzolatti & Fogassi, 2014; Rizzolatti & Sinigaglia, 2010). Here, we review some of that newer evidence, and then go on to describe more recent perspectives that extend beyond the idea of mirroring in action understanding.

A key family of findings is that mirror neuron responses are in some cases influenced by contextual factors. As an example, Csibra (2008) pointed out that the reach-to-place and the reach-to-eat conditions used in the study by Fogassi et al. (2005) differed with respect to the object (food versus non-food) and the presence or absence of a container. The role of context is also explicitly highlighted in a computational model for the execution and recognition of action sequences proposed by Chersi et al. (2011). Likewise, several studies demonstrated sensitivity to the distinction between peripersonal and extrapersonal space (Caggiano et al., 2009; Maranesi et al., 2017) and to the subjective value of an object that is the target of an action (Caggiano et al., 2012). Further, some F5 mirror neurons are sensitive to the difference between visual stimuli that either caused or did not cause an action (e.g. a hand, represented as a disc, reaching, holding and moving an object, compared to a control condition with a similar movement pattern in which the disc made no contact with the object; Caggiano et al., 2016). This difference was obtained for naturalistic stimuli, and also for abstract stimuli depicting the same causal (or non-causal) relationships, suggesting a broader role in understanding events beyond observed motor behaviours.

Further, some mirror neurons have properties that suggest they form a representation of an upcoming action based on the action affordances that an object presents (Bonini et al., 2014; see also Bach et al., 2014). (‘Affordances’ refer to aspects of an object that are closely linked to a particular kind of action, such as the handles of objects such as pans or mugs.) This class of so-called ‘canonical’ mirror neurons discharges both during an observed action (e.g. grasping a large cone with a whole hand grip) and during the presentation of an object for which that same grip would be appropriate (e.g. a large cone). Further, the firing rate of the majority of such neurons is suppressed when the object is presented behind a transparent plastic barrier (Bonini et al., 2014), suggesting that these neurons only fire when it is actually possible for the monkey to interact with the object. This pattern of findings implies a pragmatic coding of an observed object by mirror neurons, in the sense that the representation is influenced by context and the potential for an overt action. While this observation does not necessarily apply to all mirror neurons, it does strongly imply that mirror neuron activity may at least in part support the preparation to act on an object, in contrast to contributing to a more receptive understanding process.

Together, findings like these highlight the contribution of the object, the context and the potential to perform an action in shaping mirror neuron activity, in line with a network-level approach to action understanding (see also Bonini et al., 2022). Inspired by findings like these, and by other theoretical considerations, several authors have addressed the possible contributions of the mirror neuron system from a broader perspective; we discuss these next. Note that these proposals, like the original studies on mirror neurons, tend to focus on manual actions performed on a single object, so their applicability to a wider range of actions requires further investigation (see also Section 5).

Csibra (2008) proposed that mirror neurons might play a role in action reconstruction instead of direct matching. Similar to the steps involved in object recognition, where mid-level features are assembled into objects (e.g. Brincat & Connor, 2004; Güçlü & van Gerven, 2015; Kravitz et al., 2013; Tanaka, 1997; Yau et al., 2013), the proposal is that visual analysis can translate mid-level features such as movements and body parts into complete action representations (see also Fleischer et al., 2013; Lanzilotto et al., 2020; Perrett et al., 1989; Wurm et al., 2017b). Csibra (2008) furthermore argues that if observed actions are interpreted at a relatively abstract level in the visual system, the resulting representation can serve as the input to the motor system, where these actions can be reproduced. In this view, the role of mirror neurons would be to help an observer to reconstruct the motor programs required to perform such observed actions (for similar arguments, see Bach et al., 2014; Kilner, 2011). Thus, the action reconstruction proposal posits a role for mirror neurons not as the initial or sole route to action understanding, but rather as an intermediate step between primarily visual encoding and the retrieval of relevant motor behaviours (‘perception-for-action’; see also Maranesi et al., 2017). This interpretation is compatible with the observation that the activation of canonical mirror neurons is suppressed when a plastic barrier prevents the monkey from manipulating the object (Bonini et al., 2014). Such an intermediate step supports an observer in coordinating their own actions and in engaging in joint actions (cf. Azaad et al., 2021).

In contrast to viewing the mirror neuron system as a strict feedforward recognition system, several authors have proposed predictive coding models of the mirror neuron system (Donnarumma et al., 2017; Kilner, 2011; Kilner et al., 2007; Oztop et al., 2005; Oztop et al., 2013; Wilson & Knoblich, 2005). In general, predictive coding is the idea that the brain constantly generates and updates mental models, each of which tries to predict representations at the next lower processing level. In this framework, backward connections that compare the prediction to the obtained representation are used to compute a prediction error, which the system tries to minimize. Applying this framework to the mirror neuron system, Kilner (2011) proposed that the most likely goal of an action is derived from a visual analysis of the context of the action (in particular, the target object). Ventral stream areas including the middle temporal gyrus and the anterior portion of the IFG are proposed to retrieve actions that are semantically associated with this object, whereas medial regions of the IFG select the most appropriate action. In turn, the motor parameters corresponding to the selected action are retrieved by mirror neurons in the posterior IFG. On this view, the sensory consequences of actions are fed back to the ventral stream via dorsal regions of the action observation network, where the predicted sensory consequences are compared with the observed sensory information. The neural representations of the likely sensory causes of the action are adjusted until the mismatch between the predicted sensory consequences and the observed sensory information is minimized (see also Oztop et al., 2005). Here, then, ‘understanding’ the action constitutes a reverse inference of the intent from what is observed. In line with this view, a recent depth-resolved ultra-high-field fMRI study comparing feedback signals arriving in parietal cortex reported a higher signal during the observation of predictable versus scrambled sequences (Cerliani et al., 2022). In sum, predictive coding provides a biologically plausible mechanism that might describe an alternative role for mirror neurons during action observation (namely, the prediction of sensory consequences of the most likely action), and that can explain a number of findings that are hard to reconcile with a strict feedforward account of the mirror neuron system (see also Oztop et al., 2013).
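The inference-by-prediction-error idea can be caricatured in a few lines of code. In this toy sketch (object names, positions, and the straight-line trajectory model are all hypothetical, with no claim of biological detail), the observer generates the movement each candidate goal would predict, compares it with the observed movement, and settles on the goal with the smallest prediction error:

```python
# Toy predictive-coding-style inference: which of two candidate goals best
# explains an observed hand trajectory?
def predict_trajectory(goal, steps=10):
    """Predicted straight-line reach from the origin towards a goal position."""
    return [(goal[0] * t / steps, goal[1] * t / steps) for t in range(1, steps + 1)]

def prediction_error(observed, predicted):
    """Summed squared mismatch between observed and predicted positions."""
    return sum((ox - px) ** 2 + (oy - py) ** 2
               for (ox, oy), (px, py) in zip(observed, predicted))

candidate_goals = {"cup": (1.0, 0.2), "phone": (0.3, 1.0)}  # hypothetical object locations
observed = predict_trajectory((1.0, 0.2))  # the actor is in fact reaching for the cup

errors = {label: prediction_error(observed, predict_trajectory(goal))
          for label, goal in candidate_goals.items()}
inferred = min(errors, key=errors.get)
print(f"inferred goal: {inferred}")  # the hypothesis with the smallest prediction error
```

The full predictive coding proposals go further in that the error signal is computed hierarchically and used to update the hypothesis continuously as the movement unfolds, rather than in a single batch comparison as here.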

Finally, in a recent review, Orban et al. (2021) highlight the role of parietal area AIP in integrating different types of visual information (body movements, body-object relationship, and action-related object features) along with haptic feedback. The authors draw a connection to the affordance competition hypothesis (Cisek, 2007), which describes a model of action preparation and execution. In contrast to the assumption of serial processing stages consisting of sensory processing, decision-making and movement planning, this view proposes that sensory processing includes, in parallel, an analysis of the action possibilities, which compete with each other until enough evidence is collected in favour of one of these options. Orban et al. (2021) argue that, similar to the concept of object affordances, parietal neurons code the affordances of an observed action (‘social affordances’). According to this proposal, visuo-motor parietal neurons code observed actions, such as grasping, and action classes, such as kinds of object manipulation. In turn, these are linked to associated motor plans for the selection and planning of potential motor actions in response to the observed action. Thus, in contrast to the special roles originally attributed to mirror neurons, the proposal by Orban et al. (2021) highlights the convergence of various different types of visual, somatosensory, and proprioceptive information in parietal cortex, which helps both to identify an observed action and to support context-appropriate movement planning.

In sum, these recent findings and theoretical proposals suggest ways in which mirror neurons are more complex than originally conceived, and further are embedded in a wider network of brain areas, some of which are more specialized for a visual analysis of the observed action. Collectively, these developments reduce the focus on mirror neurons per se as providing a unified, abstract representation of actions at the pinnacle of an action understanding system. What emerges instead is a view of mirror neurons operating as part of a wider set of processes in which they may provide a concrete representation of observed actions that is closely related to the preparation of corresponding motor plans.

5 Directions for Future Research

Our review points to many open questions. Here we highlight a few, following the structure of the preceding sections.

Both the action frames and action space perspectives (Section 2) require further development. As an example, there is more to learn about how action spaces develop and change with experience. Developmental studies as well as studies with specific populations might provide valuable insights into these questions. Moreover, we need to better understand the structure underlying the representational spaces of actions, and how they are influenced by current task goals. Hypothetical action spaces amount to a proposal about dimension reduction, collapsing many observations into a simpler structure. However, depending on the algorithms used to reduce the dimensionality of the data, we might arrive at very different kinds of structures. One computational approach offers a method by which such structural principles might be discovered, bottom-up, in neural or behavioural data (Kemp & Tenenbaum, 2008). Likewise, we need to better understand how action frames organize action knowledge, and how they are acquired – another topic that would profit from developmental studies, as well as from studies with special populations (such as neurological patients, or experts in specific types of actions). There are some initial findings on how information about an action is extracted and elaborated over time, particularly from an action spaces perspective (e.g. Dima et al., 2022), but this requires further investigation. Finally, some action categories might have processing ‘priority’ over others, on the basis of being more related to survival over an evolutionary time frame (e.g. attacking, eating) than others that are more recent (e.g. reading; see also Cisek, 2019). This relates to similar previous proposals about, for example, emotional face expressions, direct eye gaze, and fear-inducing objects such as snakes. The methods applied to those topics could be extended to learn more about highly salient action kinds.
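The point that different dimension-reduction algorithms can impose very different structures on the same data is easy to demonstrate. In this illustrative sketch (synthetic two-dimensional ‘action ratings’; the interpretation of the dimensions is hypothetical), the first principal axis follows the high-variance dimension and collapses two groups of items together, whereas single-linkage clustering recovers them cleanly:

```python
import math

# Two 'action clusters' that differ on dimension 2 (say, sociality) but vary
# widely on dimension 1 (say, movement amplitude).
points = [(float(x), 0.0) for x in range(7)] + [(float(x), 2.0) for x in range(7)]

def pca_first_axis(pts, iters=100):
    """First principal axis via power iteration on the 2x2 covariance matrix."""
    n = len(pts)
    mx = sum(p[0] for p in pts) / n
    my = sum(p[1] for p in pts) / n
    cxx = sum((p[0] - mx) ** 2 for p in pts) / n
    cyy = sum((p[1] - my) ** 2 for p in pts) / n
    cxy = sum((p[0] - mx) * (p[1] - my) for p in pts) / n
    v = (1.0, 1.0)
    for _ in range(iters):
        w = (cxx * v[0] + cxy * v[1], cxy * v[0] + cyy * v[1])
        norm = math.hypot(*w)
        v = (w[0] / norm, w[1] / norm)
    return v

def single_linkage(pts, threshold):
    """Merge any two points closer than `threshold`; return cluster labels."""
    parent = list(range(len(pts)))
    def find(i):
        while parent[i] != i:
            i = parent[i]
        return i
    for i in range(len(pts)):
        for j in range(i + 1, len(pts)):
            if math.dist(pts[i], pts[j]) <= threshold:
                parent[find(j)] = find(i)
    return [find(i) for i in range(len(pts))]

axis = pca_first_axis(points)
projection = [p[0] * axis[0] + p[1] * axis[1] for p in points]
labels = single_linkage(points, threshold=1.5)

# PCA's first axis tracks the high-variance dimension, so the two groups
# overlap almost completely in the 1-D projection; clustering recovers them.
# Same data, two very different 'structures'.
print("clusters found:", len(set(labels)))
```

The moral for action spaces is that any claimed organization (gradients, clusters, dimensions) is partly a property of the reduction method, so converging evidence from several methods is needed before treating a structure as real.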

While the action spaces perspective has proved productive in generating hypotheses about patterns of activity in human neuroimaging studies, this is less straightforward from the action frames view. Given the conceptual similarity with abstract knowledge schemas, we might expect to find similar brain networks engaged, such as the ventromedial prefrontal cortex and the hippocampus (Gilboa & Marlatte, 2017). Measures of functional connectivity, or of connectivity patterns (e.g. Anzelotti & Coutanche, 2018; Anzelotti et al., 2017), could be used to seek evidence of the predicted interplay between regions involved in action observation, object recognition, body and face perception, and scene perception. A better understanding of this interplay would also provide a basis for examining how these dynamics are shaped by the observer’s action understanding goals.

The research to date on the effects of attention and perceptual or cognitive load on action understanding has focused on a fairly limited set of tasks that could be expanded in further studies. In parallel, as the brain encoding of action knowledge becomes better understood (such as in pattern classification studies of the LOTC), this creates opportunities to use multivariate approaches to measure how action representations are modulated under different attention and load conditions.

To date, much of the human neuroscientific work on action understanding has used correlational measures such as fMRI or EEG. Perturbation methods such as TMS allow the targeted disruption of one or more brain regions, as a way to index their normal contributions to behavioural action understanding tasks. That approach has mainly been applied to motor regions, and to quite simple action observation tasks (see also Section 4). Yet more recent work implicating parietal and occipitotemporal regions in rich action knowledge points to further targets for intervention, and predictions about how disrupting those regions should impact on action understanding behaviours.

Biologically inspired models of action understanding have been developed to explain manual reaching and grasping (e.g. Fleischer et al., 2013) and have been inspired by predictive coding and Bayesian modelling (e.g. Bach & Schenke, 2017; Baker et al., 2009; Kilner et al., 2007; Oztop et al., 2005). Extending this line of research towards a wider range of actions, while incorporating the rich sources of information that are known to contribute to processing the ‘What, How and Why’ of actions, would be fruitful for the generation of new testable hypotheses. More specifically, potential lines for this modelling work will be to more explicitly incorporate (a) the role of information obtained about actions from different perceptual systems that analyse objects, scenes, postures and movements, and the way this information is combined; and (b) the observer’s own knowledge about how a family of actions is performed, such as through first-hand experience with a particular sport.

The cognitive neuroscience of object understanding has been transformed in recent years by the use of deep neural network models (Cadieu et al., 2014; Cichy & Kaiser, 2019; Spoerer et al., 2017). These models have been proposed to offer a source of hypotheses about the transformations that link the early visual encoding of a scene (edges, surfaces, contours, etc.) to later high-level object representations (see also Güçlü & van Gerven, 2015; Seeliger et al., 2021). Similarly, the layers of such networks have been compared to stages of the inferotemporal pathways of the visual brain (although such comparisons are not necessarily straightforward; Bowers et al., 2022). It may be worthwhile to explore whether similar correspondences can be identified between the processing hierarchy and critical features for actions captured in the visual system and deep neural networks trained on action understanding tasks. Additional insights might be gained from using generative adversarial networks to synthesize images that are expected to strongly drive brain regions known to be involved in the processing of observed actions – an approach successfully used in the domain of object perception (Murty et al., 2021).
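One standard way to relate such networks to brain data is representational similarity analysis (RSA), which compares the pairwise dissimilarity structure of stimuli in a model layer with that in a measured brain region. The minimal sketch below uses randomly simulated data as stand-ins for network activations and voxel patterns; nothing in it is taken from a specific study.

```python
import numpy as np

# Minimal RSA sketch: compare the dissimilarity geometry of a model
# layer with that of a 'brain region'. All data are simulated; the
# 'brain' patterns are a noisy linear readout of the 'layer', so the
# two geometries should agree.

rng = np.random.default_rng(0)
n_videos, n_units, n_voxels = 12, 50, 80

layer_act = rng.normal(size=(n_videos, n_units))  # model layer activations
brain_pat = (layer_act @ rng.normal(size=(n_units, n_voxels))
             + 0.5 * rng.normal(size=(n_videos, n_voxels)))  # noisy voxels

def rdm(acts):
    """Representational dissimilarity matrix: 1 - Pearson r between the
    response patterns evoked by each pair of stimuli (rows of `acts`)."""
    z = (acts - acts.mean(1, keepdims=True)) / acts.std(1, keepdims=True)
    return 1.0 - (z @ z.T) / acts.shape[1]

def rsa_corr(rdm_a, rdm_b):
    """Correlate the upper triangles of two RDMs (second-order similarity)."""
    iu = np.triu_indices(rdm_a.shape[0], k=1)
    return np.corrcoef(rdm_a[iu], rdm_b[iu])[0, 1]

val = rsa_corr(rdm(layer_act), rdm(brain_pat))
print(round(val, 3))  # high for a well-matched layer
```

In practice the same comparison would be repeated across network layers and brain regions (e.g. LOTC) to ask which stages of a trained network best match which stages of the action observation pathway.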

Finally, as described throughout this review, action understanding typically goes hand in hand with planning our own actions, even if the degree to which these two processes mutually depend on each other is still a matter of debate. Recent technological developments in virtual reality and mobile human neuroimaging (see e.g. Stangl et al., 2023) make it possible to examine the processes involved in action understanding in the real world, and thus open up an entirely new approach.

6 Concluding Remarks

Action understanding, like other kinds of understanding, is a complex construct. It covers a broad class of behaviours that are aimed at learning about events in the world, and about the links between cause and effect, including physical and mental causes. Accordingly, a key message of this review is that multiple kinds of cognitive processes and representations are implicated in action understanding, and the nature of these depends on the experience and the goals of the observer.

Many recent treatments of the topic of action understanding begin with the mirror neuron system and work outwards from observations about its properties and the ostensibly analogous properties of the human brain and behaviour. This approach has clearly been productive, as witnessed by the resulting explosion of empirical findings and theoretical perspectives. However, it has also sometimes begged the question, by assuming a role for mirror neurons and then seeking evidence for that role, and in some cases by fitting definitions of action understanding around the resulting findings – a form of reverse inference that may be partly responsible for perpetuating controversies around this topic.

In contrast, we have started by asking why an observer might attend to others’ actions – what goals this might serve – and then, in turn, what cognitive and neural machinery might be needed to achieve those goals. As a guiding framework, we were led by three broad themes: understanding what an action is, how it is carried out, and why it is performed. While these distinctions highlight different requirements of cognitive systems for action understanding, it is also clear that crosstalk amongst these action understanding goals, and the systems they implicate, is probably the norm rather than the exception in real-world behaviour.

One point that emerges repeatedly is that predictive processes of various kinds are central to action understanding. These include, for example, abstract predictions about a hypothetical actor, to guess what kind of action she might carry out given her aims; predictions about the kind of action that is observed, and its intended outcomes, based on the metric details of the actor’s grasp and eye movements, the objects involved, and the scene (see also Wurm & Schubotz, 2012, 2017); and predictions about the traits of a specific actor, and her future behaviours, based on the evidence of her current actions. Prediction, of course, is arguably central to all forms of perception and understanding (Kilner et al., 2007). Forming a meaningful model of the world involves processing information about what might come next, and about the possible outcomes of one’s own behaviours. In this light, the connection between prediction and action understanding may not be unique; rather, actions, even simple ones, are an especially rich source of many kinds of cues about the social and physical world.

In sum, we believe that progress in the study of action understanding profits from a focus on the diverse goals of observers, and on the cues available to support those goals. This approach opens up new avenues for research, especially where paradigms and methods from the domain of object recognition can be transferred to action understanding. We hope that this review inspires the current and next generations of researchers to pick up these threads and to carry out future studies along these lines.

Acknowledgments

Our thanks to Jens Schwarzbach, Marius Zimmermann, Moritz Wurm, Deyan Mitev, Maximilian Reger, Marisa Birk, Federica Danaj, Zuzanna Kabulska, and Filip Djurovic for helpful discussions and comments on previous versions of this manuscript. A.L. was supported by a DFG Heisenberg-Professorship (LI 2840/2-1).

Series Editor

  • James T. Enns, The University of British Columbia

  • James T. Enns is Professor at the University of British Columbia, where he researches the interaction of perception, attention, emotion, and social factors. He has previously been Editor of the Journal of Experimental Psychology: Human Perception and Performance and an Associate Editor at Psychological Science, Consciousness and Cognition, Attention, Perception, & Psychophysics, and Visual Cognition.

Editorial Board

  • Gregory Francis, Purdue University

  • Kimberly Jameson, University of California, Irvine

  • Tyler Lorig, Washington and Lee University

  • Rob Gray, Arizona State University

  • Salvador Soto-Faraco, Universitat Pompeu Fabra

About the Series

  • The modern study of human perception includes event perception, bidirectional influences between perception and action, music, language, the integration of the senses, human action observation, and the important roles of emotion, motivation, and social factors. Each Element in the series combines authoritative literature reviews of foundational topics with forward-looking presentations of the recent developments on a given topic.

Footnotes

1 Applied to neuroimaging, ‘reverse inference’ describes inferring the cognitive processes involved in a task from the brain regions that the task engages (in fMRI, for example). While sometimes used pejoratively, reverse inference can be a strong form of induction when the activity of the region in question is consistently selective across different contexts (Poldrack, 2006).

References

Abdollahi, R. O., Jastorff, J., & Orban, G. A. (2013). Common and segregated processing of observed actions in human SPL. Cerebral Cortex, 23(11), 2734–2753.
Adams, R. B., Jr., Ambady, N., Nakayama, K., & Shimojo, S. (Eds.). (2011). The Science of Social Vision (Vol. 7). Oxford University Press.
Aflalo, T., Zhang, C. Y., Rosario, E. R., et al. (2020). A shared neural substrate for action verbs and observed actions in human posterior parietal cortex. Science Advances, 6(43), 1–16.
Aglioti, S. M., Cesari, P., Romani, M., & Urgesi, C. (2008). Action anticipation and motor resonance in elite basketball players. Nature Neuroscience, 11(9), 1109–1116.
Aksoy, E. E., Orhan, A., & Wörgötter, F. (2017). Semantic decomposition and recognition of long and complex manipulation action sequences. International Journal of Computer Vision, 122(1), 84–115. https://doi.org/10.1007/s11263-016-0956-8
Ambady, N., & Rosenthal, R. (1992). Thin slices of expressive behavior as predictors of interpersonal consequences: A meta-analysis. Psychological Bulletin, 111(2), 256–274.
Ambrosini, E., Costantini, M., & Sinigaglia, C. (2011). Grasping with the eyes. Journal of Neurophysiology, 106(3), 1437–1442.
Ambrosini, E., Pezzulo, G., & Costantini, M. (2015). The eye in hand: Predicting others’ behavior by integrating multiple sources of information. Journal of Neurophysiology, 113(7), 2271–2279.
Amoruso, L., & Finisguerra, A. (2019). Low or high-level motor coding? The role of stimulus complexity. Frontiers in Human Neuroscience, 13, 1–9.
Amoruso, L., & Urgesi, C. (2016). Contextual modulation of motor resonance during the observation of everyday actions. NeuroImage, 134, 74–84.
Anzelotti, S., & Coutanche, M. N. (2018). Beyond functional connectivity: Investigating networks of multivariate representations. Trends in Cognitive Sciences, 22, 258–269.
Anzelotti, S., Caramazza, A., & Saxe, R. (2017). Multivariate pattern dependence. PLoS Computational Biology, 20, 1–20. https://doi.org/10.1371/journal.pcbi.1005799
Arioli, M., & Canessa, N. (2019). Neural processing of social interaction: Coordinate-based meta-analytic evidence from human neuroimaging studies. Human Brain Mapping, 40(13), 3712–3737.
Atkinson, A. P., Dittrich, W. H., Gemmell, A. J., & Young, A. W. (2004). Emotion perception from dynamic and static body expressions in point-light and full-light displays. Perception, 33(6), 717–746.
Aviezer, H., Trope, Y., & Todorov, A. (2012). Body cues, not facial expressions, discriminate between intense positive and negative emotions. Science, 338(6111), 1225–1229.
Axelrod, R. (1980). Effective choice in the prisoner’s dilemma. Journal of Conflict Resolution, 24(1), 3–25.
Ayzenberg, V., & Behrmann, M. (2022). Does the brain’s ventral visual pathway compute object shape? Trends in Cognitive Sciences, 1119–1132.
Azaad, S., Knoblich, G., & Sebanz, N. (2021). Perception and Action in a Social Context. Cambridge University Press.
Bach, P., & Schenke, K. C. (2017). Predictive social perception: Towards a unifying framework from action observation to person knowledge. Social and Personality Psychology Compass, 11(7), 1–17.
Bach, P., Knoblich, G., Gunter, T. C., Friederici, A. D., & Prinz, W. (2005). Action comprehension: Deriving spatial and functional relations. Journal of Experimental Psychology: Human Perception and Performance, 31(3), 465–479.
Bach, P., Peatfield, N. A., & Tipper, S. P. (2007). Focusing on body sites: The role of spatial attention in action perception. Experimental Brain Research, 178, 509–517.
Bach, P., Nicholson, T., & Hudson, M. (2014). The affordance-matching hypothesis: How objects guide action understanding and prediction. Frontiers in Human Neuroscience, 8, 1–13.
Baker, C. L., Saxe, R., & Tenenbaum, J. B. (2009). Action understanding as inverse planning. Cognition, 113, 329–349.
Baker, C. L., Jara-Ettinger, J., Saxe, R., & Tenenbaum, J. B. (2017). Rational quantitative attribution of beliefs, desires and percepts in human mentalizing. Nature Human Behaviour, 1(4), 1–10.
Baldissera, F., Cavallari, P., Craighero, L., & Fadiga, L. (2001). Modulation of spinal excitability during observation of hand actions in humans. European Journal of Neuroscience, 13(1), 190–194.
Bandura, A., & Jeffrey, R. W. (1973). Role of symbolic coding and rehearsal processes in observational learning. Journal of Personality and Social Psychology, 26(1), 122–130.
Bandura, A., & Walters, R. H. (1977). Social Learning Theory (Vol. 1). Englewood Cliffs, NJ: Prentice Hall.
Bar, M., Kassam, K. S., Ghuman, A. S., et al. (2006). Top-down facilitation of visual recognition. Proceedings of the National Academy of Sciences, 103(2), 449–454.
Bargh, J. A. (1989). Conditional automaticity: Varieties of automatic influence in social perception and cognition. Unintended Thought, 3–51.
Baumard, J., & Le Gall, D. (2021). The challenge of apraxia: Toward an operational definition? Cortex, 141, 66–80.
Bekkering, H., Wohlschlager, A., & Gattis, M. (2000). Imitation of gestures in children is goal-directed. The Quarterly Journal of Experimental Psychology: Section A, 53(1), 153–164.
Benoni, H. (2018). Can automaticity be verified utilizing a perceptual load manipulation? Psychonomic Bulletin & Review, 25(6), 2037–2046.
Bestmann, S., & Krakauer, J. W. (2015). The uses and interpretations of the motor-evoked potential for understanding behaviour. Experimental Brain Research, 233, 679–689.
Betti, S., Finisguerra, A., Amoruso, L., & Urgesi, C. (2022). Contextual priors guide perception and motor responses to observed actions. Cerebral Cortex, 32(3), 608–625.
Beymer, D., & Poggio, T. (1996). Image representations for visual learning. Science, 272(5270), 1905–1909.
Binkofski, F., & Buxbaum, L. J. (2013). Two action systems in the human brain. Brain and Language, 127(2), 222–229.
Bird, G., Osman, M., Saggerson, A., & Heyes, C. (2005). Sequence learning by action, observation and action observation. British Journal of Psychology, 96(3), 371–388.
Blake, R., & Shiffrar, M. (2007). Perception of human motion. Annual Review of Psychology, 58, 47–73.
Bonini, L., Rozzi, S., Serventi, F. U., et al. (2010). Ventral premotor and inferior parietal cortices make distinct contribution to action organization and intention understanding. Cerebral Cortex, 20, 1372–1385.
Bonini, L., & Ferrari, P. F. (2011). Evolution of mirror systems: A simple mechanism for complex cognitive functions. Annals of the New York Academy of Sciences, 1225(1), 166–175.
Bonini, L., Maranesi, M., Livi, A., Fogassi, L., & Rizzolatti, G. (2014). Space-dependent representation of objects’ and other’s action in monkey ventral premotor grasping neurons. Journal of Neuroscience, 34(11), 4108–4119.
Bonini, L., Rotunno, C., Arcuri, E., & Gallese, V. (2022). Mirror neurons 30 years later: Implications and applications. Trends in Cognitive Sciences, 767–781.
Bower, G. H., Black, J. B., & Turner, T. J. (1979). Scripts in memory for text. Cognitive Psychology, 11(2), 177–220.
Bowers, J. S., Malhotra, G., Dujmović, M., et al. (2022). Deep problems with neural network models of human vision. Behavioral and Brain Sciences, 1–77.
Brandman, T., & Peelen, M. V. (2017). Interaction between scene and object processing revealed by human fMRI and MEG decoding. Journal of Neuroscience, 37(32), 7700–7710.
Brass, M., Bekkering, H., Wohlschläger, A., & Prinz, W. (2000). Compatibility between observed and executed finger movements: Comparing symbolic, spatial, and imitative cues. Brain and Cognition, 44(2), 124–143.
Brass, M., Schmitt, R. M., Spengler, S., & Gergely, G. (2007). Investigating action understanding: Inferential processes versus action simulation. Current Biology, 17(24), 2117–2121.
Brincat, S. L., & Connor, C. E. (2004). Underlying principles of visual shape selectivity in posterior inferotemporal cortex. Nature Neuroscience, 7, 880–886.
Buxbaum, L. J., Shapiro, A. D., & Coslett, H. B. (2014). Critical brain regions for tool-related and imitative actions: A componential analysis. Brain, 137(7), 1971–1985.
Cadieu, C. F., Hong, H., Yamins, D. L., et al. (2014). Deep neural networks rival the representation of primate IT cortex for core visual object recognition. PLoS Computational Biology, 10(12), 1–18.
Caggiano, V., Fogassi, L., Rizzolatti, G., Thier, P., & Casile, A. (2009). Mirror neurons differentially encode the peripersonal and extrapersonal space of monkeys. Science, 324(5925), 403–406.
Caggiano, V., Fogassi, L., Rizzolatti, G., et al. (2011). View-based encoding of actions in mirror neurons of area f5 in macaque premotor cortex. Current Biology, 21(2), 144–148.
Caggiano, V., Fogassi, L., Rizzolatti, G., et al. (2012). Mirror neurons encode the subjective value of an observed action. Proceedings of the National Academy of Sciences, 109(29), 11848–11853.
Caggiano, V., Pomper, J. K., Fleischer, F., et al. (2013). Mirror neurons in monkey area F5 do not adapt to the observation of repeated actions. Nature Communications, 4(1), 1–8.
Caggiano, V., Fleischer, F., Pomper, J. K., Giese, M. A., & Thier, P. (2016). Mirror neurons in monkey premotor area F5 show tuning for critical features of visual causality perception. Current Biology, 26(22), 3077–3082.
Calvo-Merino, B., Glaser, D. E., Grèzes, J., Passingham, R. E., & Haggard, P. (2005). Action observation and acquired motor skills: An fMRI study with expert dancers. Cerebral Cortex, 15(8), 1243–1249.
Calvo-Merino, B., Grèzes, J., Glaser, D. E., Passingham, R. E., & Haggard, P. (2006). Seeing or doing? Influence of visual and motor familiarity in action observation. Current Biology, 16(19), 1905–1910.
Camponogara, I., Rodger, M., Craig, C., & Cesari, P. (2017). Expert players accurately detect an opponent’s movement intentions through sound alone. Journal of Experimental Psychology: Human Perception and Performance, 43(2), 348–359.
Cappa, S. F., Binetti, G., Pezzini, A., et al. (1998). Object and action naming in Alzheimer’s disease and frontotemporal dementia. Neurology, 50(2), 351–355.
Caramazza, A., Anzellotti, S., Strnad, L., & Lingnau, A. (2014). Embodied cognition and mirror neurons: A critical assessment. Annual Review of Neuroscience, 37, 1–15.
Casile, A., & Giese, M. A. (2006). Nonvisual motor training influences biological motion perception. Current Biology, 16(1), 69–74.
Caspers, S., Zilles, K., Laird, A. R., & Eickhoff, S. B. (2010). ALE meta-analysis of action observation and imitation in the human brain. NeuroImage, 50(3), 1148–1167.
Catmur, C. (2016). Automatic imitation? Imitative compatibility affects responses at high perceptual load. Journal of Experimental Psychology: Human Perception and Performance, 42(4), 530–539.
Catmur, C., Walsh, V., & Heyes, C. (2007). Sensorimotor learning configures the human mirror system. Current Biology, 17(17), 1527–1531.
Cattaneo, L., Sandrini, M., & Schwarzbach, J. (2010). State-dependent TMS reveals a hierarchical representation of observed acts in the temporal, parietal and premotor cortices. Cerebral Cortex, 20(9), 2252–2258.
Cavallo, A., Koul, A., Ansuini, C., Capozzi, F., & Becchio, C. (2016). Decoding intentions from movement kinematics. Scientific Reports, 6(1), 1–8.
Cavanagh, P., Caplovitz, G. P., Lytchenko, T. K., Maechler, M. R., Tse, P. U., & Sheinberg, D. L. (2023). The architecture of object-based attention. Psychonomic Bulletin & Review, 1–25.
Cerliani, L., Bhandari, R., De Angelis, L., et al. (2022). Predictive coding during action observation – A depth-resolved intersubject functional correlation study at 7T. Cortex, 148, 121–138.
Chartrand, T. L., & Bargh, J. A. (1999). The chameleon effect: The perception–behavior link and social interaction. Journal of Personality and Social Psychology, 76(6), 893–910.
Chersi, F., Ferrari, P. F., & Fogassi, L. (2011). Neuronal chains for actions in the parietal lobe: A computational model. PLoS ONE, 6(11), 1–15.
Chong, T. T. J., Cunnington, R., Williams, M. A., Kanwisher, N., & Mattingley, J. B. (2008). fMRI adaptation reveals mirror neurons in human inferior parietal cortex. Current Biology, 18(20), 1576–1580.
Chong, T. T. J., Cunnington, R., Williams, M. A., & Mattingley, J. B. (2009). The role of selective attention in matching observed and executed actions. Neuropsychologia, 47(3), 786–795.
Christensen, J. F., & Calvo-Merino, B. (2013). Dance as a subject for empirical aesthetics. Psychology of Aesthetics, Creativity, and the Arts, 7(1), 76–88.
Chun, M. M., Golomb, J. D., & Turk-Browne, N. B. (2011). A taxonomy of external and internal attention. Annual Review of Psychology, 62(1), 73–101.
Cichy, R. M., & Kaiser, D. (2019). Deep neural networks as scientific models. Trends in Cognitive Sciences, 23, 305–317.
Cisek, P. (2007). Cortical mechanisms of action selection: The affordance competition hypothesis. Philosophical Transactions of the Royal Society B: Biological Sciences, 362(1485), 1585–1599.
Cisek, P. (2019). Resynthesizing behavior through phylogenetic refinement. Attention, Perception, & Psychophysics, 81, 2265–2287.
Collins, A. M., & Quillian, M. R. (1969). Retrieval time from semantic memory. Journal of Verbal Learning and Verbal Behavior, 8(2), 240–247.
Cook, R., Bird, G., Catmur, C., Press, C., & Heyes, C. (2014). Mirror neurons: From origin to function. Behavioral and Brain Sciences, 37(2), 177–192.
Cracco, E., Bardi, L., Desmet, C., et al. (2018). Automatic imitation: A meta-analysis. Psychological Bulletin, 144(5), 453–500.
Cross, E. S., Hamilton, A. F. D. C., & Grafton, S. T. (2006). Building a motor simulation de novo: Observation of dance by dancers. NeuroImage, 31(3), 1257–1267.
Csibra, G. (2008). Action mirroring and action understanding: An alternative account. Sensorimotor Foundations of Higher Cognition: Attention and Performance XXII, 435–459.
Cusack, J. P., Williams, J. H., & Neri, P. (2015). Action perception is intact in autism spectrum disorder. Journal of Neuroscience, 35(5), 1849–1857.
Darda, K. M., & Ramsey, R. (2019). The inhibition of automatic imitation: A meta-analysis and synthesis of fMRI studies. NeuroImage, 197, 320–329.
de la Rosa, S., Schillinger, F. L., Bülthoff, H. H., Schultz, J., & Umildag, K. (2016). fMRI adaptation between action observation and action execution reveals cortical areas with mirror neuron properties in human BA 44/45. Frontiers in Human Neuroscience, 1–11. https://doi.org/10.3389/fnhum.2016.00078
de Lange, F. P., Spronk, M., Willems, R. M., Toni, I., & Bekkering, H. (2008). Complementary systems for understanding action intentions. Current Biology, 18(6), 454–457.
de Lange, F. P., Heilbron, M., & Kok, P. (2018). How do expectations shape perception? Trends in Cognitive Sciences, 22(9), 764–779.
Dennett, D. C. (1987). The Intentional Stance. MIT Press.
Di Pellegrino, G., Fadiga, L., Fogassi, L., Gallese, V., & Rizzolatti, G. (1992). Understanding motor events: A neurophysiological study. Experimental Brain Research, 91, 176–180.
Dima, D. C., Tomita, T. M., Honey, C. J., & Isik, L. (2022). Social-affective features drive human representations of observed actions. eLife, 11, 1–22.
Dinstein, I., Hasson, U., Rubin, N., & Heeger, D. J. (2007). Brain areas selective for both observed and executed movements. Journal of Neurophysiology, 98(3), 1415–1427.
Dinstein, I., Thomas, C., Behrmann, M., & Heeger, D. J. (2008). A mirror up to nature. Current Biology, 18(1), R13–R18.
Donnarumma, F., Costantini, M., Ambrosini, E., Friston, K., & Pezzulo, G. (2017). Action perception as hypothesis testing. Cortex, 89, 45–60.
Dungan, J. A., Stepanovic, M., & Young, L. (2016). Theory of mind for processing unexpected events across contexts. Social Cognitive and Affective Neuroscience, 11(8), 1183–1192.
Edelman, S. (1998). Representation is representation of similarities. Behavioral and Brain Sciences, 21, 449–498.
Epstein, R. A., & Baker, C. I. (2019). Scene perception in the human brain. Annual Review of Vision Science, 5, 373–397.
Ernst, M. O. (2006). A Bayesian view on multimodal cue integration. Human Body Perception from the Inside Out, 105–131.
Estes, S. G. (1938). Judging personality from expressive behavior. The Journal of Abnormal and Social Psychology, 33(2), 217–236.
Fadiga, L., Fogassi, L., Pavesi, G., & Rizzolatti, G. (1995). Motor facilitation during action observation: A magnetic stimulation study. Journal of Neurophysiology, 73(6), 2608–2611.
Ferrari, P. F., Bonini, L., & Fogassi, L. (2009). From monkey mirror neurons to primate behaviours: Possible ‘direct’ and ‘indirect’ pathways. Philosophical Transactions of the Royal Society B: Biological Sciences, 364(1528), 2311–2323.
Finisguerra, A., Amoruso, L., Makris, S., & Urgesi, C. (2018). Dissociated representations of deceptive intentions and kinematic adaptations in the observer’s motor system. Cerebral Cortex, 28(1), 33–47.
Flanagan, J. R., & Johansson, R. S. (2003). Action plans used in action observation. Nature, 424(6950), 769–771.
Fleischer, F., Caggiano, V., Thier, P., & Giese, M. A. (2013). Physiologically inspired model for the visual recognition of transitive hand actions. Journal of Neuroscience, 33, 6563–6580.
Fogassi, L., Ferrari, P. F., Gesierich, B., et al. (2005). Parietal lobe: From action organization to intention understanding. Science, 308(5722), 662–667.
Frith, C. D., & Done, D. J. (1988). Towards a neuropsychology of schizophrenia. The British Journal of Psychiatry, 153(4), 437–443.
Gallese, V., Fadiga, L., Fogassi, L., & Rizzolatti, G. (1996). Action recognition in the premotor cortex. Brain, 119(2), 593–609.
Gangitano, M., Mottaghy, F. M., & Pascual-Leone, A. (2001). Phase-specific modulation of cortical motor output during movement observation. Neuroreport, 12, 1489–1492.
Gangitano, M., Mottaghy, F. M., & Pascual-Leone, A. (2004). Modulation of premotor mirror neuron activity during observation of unpredictable grasping movements. European Journal of Neuroscience, 20(8), 2193–2202.
Gärdenfors, P. (2004). Conceptual Spaces: The Geometry of Thought. MIT Press.
Gardner, T., Aglinskas, A., & Cross, E. S. (2017). Using guitar learning to probe the action observation network’s response to visuomotor familiarity. NeuroImage, 156, 174–189.
Georgopoulos, A. P. (1990). Neurophysiology of reaching. In M. Jeannerod (Ed.), Attention and Performance 13: Motor Representation and Control (pp. 227–263). Lawrence Erlbaum Associates.
Gibson, J. J. (1979/2014). The Ecological Approach to Visual Perception: Classic Edition. Psychology Press.
Giese, M. A., & Poggio, T. (2003). Neural mechanisms for the recognition of biological movements. Nature Reviews Neuroscience, 4(3), 179–192.
Gilbert, D. T., & Malone, P. S. (1995). The correspondence bias. Psychological Bulletin, 117(1), 21–38.
Gilboa, A., & Marlatte, H. (2017). Neurobiology of schemas and schema-mediated memory. Trends in Cognitive Sciences, 21, 618–631.
Goodale, M. A., & Milner, A. D. (1992). Separate visual pathways for perception and action. Trends in Neurosciences, 15(1), 20–25.
Grafton, S. T., Arbib, M. A., Fadiga, L., & Rizzolatti, G. (1996). Localization of grasp representations in humans by positron emission tomography: 2. Observation compared with imagination. Experimental Brain Research, 112, 103–111.
Green, C., & Hummel, J. E. (2006). Familiar interacting object pairs are perceptually grouped. Journal of Experimental Psychology: Human Perception and Performance, 32(5), 1107–1119.
Grill-Spector, K., & Malach, R. (2001). fMR-adaptation: A tool for studying the functional properties of human cortical neurons. Acta Psychologica, 107(1–3), 293–321.
Güçlü, U., & van Gerven, M. A. J. (2015). Deep neural networks reveal a gradient in the complexity of neural representations across the ventral stream. Journal of Neuroscience, 35, 10005–10014.
Hafri, A., & Firestone, C. (2021). The perception of relations. Trends in Cognitive Sciences, 25(6), 475–492.
Hafri, A., Trueswell, J. C., & Epstein, R. A. (2017). Neural representations of observed actions generalize across static and dynamic visual input. Journal of Neuroscience, 37(11), 3056–3071.
Hamilton, A. F., & Grafton, S. T. (2007). The motor hierarchy: From kinematics to goals and intentions. Sensorimotor Foundations of Higher Cognition, 22, 381–408.
Hamilton, A. F., & Grafton, S. T. (2008). Action outcomes are represented in human inferior frontoparietal cortex. Cerebral Cortex, 18(5), 1160–1168.
Hamilton, A. F. D. C., & Grafton, S. T. (2006). Goal representation in human anterior intraparietal sulcus. Journal of Neuroscience, 26(4), 1133–1137.
Hardwick, R. M., Caspers, S., Eickhoff, S. B., & Swinnen, S. P. (2018). Neural correlates of action: Comparing meta-analyses of imagery, observation, and execution. Neuroscience and Biobehavioral Reviews, 94, 31–44.
Hari, R. (2006). Action–perception connection and the cortical mu rhythm. Progress in Brain Research, 159, 253–260.
Harpaz, N. K., Flash, T., & Dinstein, I. (2014). Scale-invariant movement encoding in the human motor system. Neuron, 81(2), 452–462.
Hemed, E., Mark-Tavger, I., Hertz, U., Bakbani-Elkayam, S., & Eitam, B. (2021). Automatically controlled: Task irrelevance fully cancels otherwise automatic imitation. Journal of Experimental Psychology: General, 996–1017.
Heyes, C. (2001). Causes and consequences of imitation. Trends in Cognitive Sciences, 5(6), 253–261.
Heyes, C., & Catmur, C. (2022). What happened to mirror neurons? Perspectives on Psychological Science, 17(1), 153–168.
Hickok, G. (2009). Eight problems for the mirror neuron theory of action understanding in monkeys and humans. Journal of Cognitive Neuroscience, 21(7), 1229–1243.
Hobson, H. M., & Bishop, D. V. (2017). The interpretation of mu suppression as an index of mirror neuron activity: Past, present and future. Royal Society Open Science, 4(3), 1–22.
Hutchinson, J. B., & Barrett, L. F. (2019). The power of predictions: An emerging paradigm for psychological research. Current Directions in Psychological Science, 28(3), 280–291.
Iacoboni, M. (2009). Imitation, empathy, and mirror neurons. Annual Review of Psychology, 60, 653–670.
Iacoboni, M., Woods, R. P., Brass, M., et al. (1999). Cortical mechanisms of human imitation. Science, 286(5449), 2526–2528.
Jellema, T., Baker, C. I., Wicker, B., & Perrett, D. I. (2000). Neural representation for the perception of the intentionality of actions. Brain and Cognition, 44, 280–302.
Johnson, K. L., Gill, S., Reichman, V., & Tassinary, L. G. (2007). Swagger, sway, and sexuality: Judging sexual orientation from body motion and morphology. Journal of Personality and Social Psychology, 93(3), 321–334.
Jola, C., Abedian-Amiri, A., Kuppuswamy, A., Pollick, F. E., & Grosbras, M. H. (2012). Motor simulation without motor expertise: Enhanced corticospinal excitability in visually experienced dance spectators. PLoS ONE, 7(3), 12.
Kabulska, Z., & Lingnau, A. (2022). The cognitive structure underlying the organization of observed actions. Behavior Research Methods, 55, 1890–1906.
Kaiser, D., Quek, G. L., Cichy, R. M., & Peelen, M. V. (2019). Object vision in a structured world. Trends in Cognitive Sciences, 23(8), 672–685.
Kalénine, S., Buxbaum, L. J., & Coslett, H. B. (2010). Critical brain regions for action recognition: Lesion symptom mapping in left hemisphere stroke. Brain, 133(11), 32693280.CrossRefGoogle ScholarPubMed
Kelly, S. W., Burton, A. M., Riedel, B., & Lynch, E. (2003). Sequence learning by action and observation: Evidence for separate mechanisms. British Journal of Psychology, 94(3), 355372.CrossRefGoogle ScholarPubMed
Kemmerer, D. (2021). What modulates the Mirror Neuron System during action observation? Multiple factors involving the action, the actor, the observer, the relationship between actor and observer, and the context. Progress in Biology, 205, 124.Google ScholarPubMed
Kemp, C., & Tenenbaum, J. B. (2008). The discovery of structural form. Proceedings of the National Academy of Sciences, 105(31), 1068710692.CrossRefGoogle ScholarPubMed
Kilner, J. M. (2011). More than one pathway to action understanding. Trends in Cognitive Sciences, 15(8), 352357.CrossRefGoogle ScholarPubMed
Kilner, J. M., Friston, K. J., & Frith, C. D. (2007). Predictive coding: An account of the mirror neuron system. Cognitive Processing, 8, 159166.CrossRefGoogle ScholarPubMed
Kilner, J. M., Kraskov, A., & Lemon, R. N. (2014). Do monkey F5 mirror neurons show changes in firing rate during repeated observation of natural actions? Journal of Neurophysiology, 111(6), 12141226.CrossRefGoogle ScholarPubMed
Kilner, J. M., & Lemon, R. N. (2013). What we know currently about mirror neurons. Current Biology, 23(23), R1057R1062.CrossRefGoogle ScholarPubMed
Kilner, J. M., Vargas, C., Duval, S., Blakemore, S. J., Sirigu, A. (2004). Motor activation prior to observation of a predicted movement. Nature Neuroscience, 7(12), 12991301.CrossRefGoogle ScholarPubMed
Kilner, J. M., Neal, A., Weiskopf, N., Friston, K. J., & Frith, C. D. (2009). Evidence of mirror neurons in human inferior frontal gyrus. Journal of Neuroscience, 29(32), 1015310159.CrossRefGoogle ScholarPubMed
Knoblich, G., & Flach, R. (2001). Predicting the effects of actions: Interactions of perception and action. Psychological Science, 12, 467472.CrossRefGoogle ScholarPubMed
Kohler, E., Keysers, C., Umilta, M. A., et al. (2002). Hearing sounds, understanding actions: Action representation in mirror neurons. Science, 297(5582), 846848.CrossRefGoogle ScholarPubMed
Kozlowski, L. T., & Cutting, J. E. (1977). Recognizing the sex of a walker from a dynamic point-light display. Perception & Psychophysics, 21, 575580.CrossRefGoogle Scholar
Kramer, R. S., Arend, I., & Ward, R. (2010). Perceived health from biological motion predicts voting behaviour. The Quarterly Journal of Experimental Psychology, 63(4), 625632.CrossRefGoogle ScholarPubMed
Kraskov, A., Dancause, N., Quallo, M. M., Shepert, S., & Lemon, R. N. (2009). Corticospinal neurons in macaque ventral premotor cortex with mirror properties: A potential mechanism for action suppression? Neuron, 64, 922930.CrossRefGoogle ScholarPubMed
Kravitz, D. J., Saleem, K. S., Baker, C. I., Ungerleider, L. G., & Mishkin, M. (2013). The ventral visual pathway: An expanded neural framework for the processing of object quality. Trends in Cognitive Sciences, 17(1), 2649.CrossRefGoogle ScholarPubMed
Kriegeskorte, N., & Kievit, R. A. (2013). Representational geometry: Integrating cognition, computation, and the brain. Trends in Cognitive Sciences, 17(8), 401412.CrossRefGoogle ScholarPubMed
Kriegeskorte, N., & Mur, M. (2012). Inverse MDS: Inferring dissimilarity structure from multiple item arrangements. Frontiers in Psychology, 3, 113.CrossRefGoogle ScholarPubMed
Kriegeskorte, N., Goebel, R., & Bandettini, P. (2006). Information-based functional brain mapping. Proceedings of the National Academy of Sciences, 103(10), 38633868.CrossRefGoogle ScholarPubMed
Kriegeskorte, N., Mur, M., & Bandettini, P. (2008a). Representational similarity analysis – connecting the branches of systems neuroscience. Frontiers in Systems Neuroscience, 2, 128.Google ScholarPubMed
Kriegeskorte, N., Mur, M., Ruff, D. A., et al. (2008b). Matching categorical object representations in inferior temporal cortex of man and monkey. Neuron, 60(6), 11261141.CrossRefGoogle ScholarPubMed
Kroczek, L. O., Lingnau, A., Schwind, V., Wolff, C., & Mühlberger, A. (2021). Angry facial expressions bias towards aversive actions. Plos one, 16(9), 113.CrossRefGoogle ScholarPubMed
Lanzilotto, M., Maranesi, M., Livi, A., et al. (2020). Stable readout of observed actions from format-dependent activity of monkey’s anterior intraparietal neurons. Proceedings of the National Academy of Sciences, 117(28), 1659616605.CrossRefGoogle ScholarPubMed
Lavie, N., & Dalton, P. (2014). Load theory of attention and cognitive control. The Oxford Handbook of Attention, 5675.Google Scholar
Levin, B. (1993). English Verb Classes and Alternations. Chicago: The University of Chicago Press.Google Scholar
Lingnau, A., & Downing, P. E. (2015). The lateral occipitotemporal cortex in action. Trends in Cognitive Sciences, 19(5), 268277.CrossRefGoogle ScholarPubMed
Lingnau, A., & Petris, S. (2013). Action understanding inside and outside the motor system: The role of task difficulty. Cerebral Cortex, 23(6), 13421350. https://doi.org/10.1093/cercor/bhs112.CrossRefGoogle Scholar
Lingnau, A., Gesierich, B., & Caramazza, A. (2009). Asymmetric fMRI adaptation reveals no evidence for mirror neurons in humans. Proceedings of the National Academy of Sciences, 106(24), 99259930.CrossRefGoogle ScholarPubMed
Liu, S., Brooks, N. B., & Spelke, E. S. (2019). Origins of the concepts cause, cost, and goal in prereaching infants. Proceedings of the National Academy of Sciences, 116(36), 1774717752.CrossRefGoogle ScholarPubMed
Livi, A., Lanzilotto, M., Maranesi, M., et al. (2019). Agent-based representations of objects and actions in the monkey pre-supplementary motor area. Proceedings of the National Academy of Sciences, 116(7), 26912700.CrossRefGoogle ScholarPubMed
Loula, F., Prasad, S., Harber, K., & Shiffrar, M. (2005). Recognizing people from their movement. Journal of Experimental Psychology: Human Perception and Performance, 31(1), 210220.Google ScholarPubMed
Maeda, F., Kleiner-Fisman, G., & Pascual-Leone, A. (2002). Motor facilitation while observing hand actions: Specificity of the effect and role of observer’s orientation. Journal of Neurophysiology, 87(3), 13291335.CrossRefGoogle ScholarPubMed
Majdandžić, J., Bekkering, H., van Schie, H. T., & Toni, I. (2009). Movement-specific repetition suppression in ventral and dorsal premotor cortex during action observation. Cerebral Cortex, 19(11), 27362745.CrossRefGoogle ScholarPubMed
Maranesi, M., Livi, A., & Bonini, L. (2017). Spatial and viewpoint selectivity for others’ observed actions in monkey ventral premotor mirror neurons. Scientific Reports, 7(1), 17.CrossRefGoogle ScholarPubMed
Marr, D. (1982). Vision. W.H. Freeman.Google Scholar
Mattar, A. A., & Gribble, P. L. (2005). Motor learning by observing. Neuron, 46(1), 153160.CrossRefGoogle ScholarPubMed
McDonough, K. L., Hudson, M., & Bach, P. (2019). Cues to intention bias action perception toward the most efficient trajectory. Scientific Reports, 9(1), 110.CrossRefGoogle ScholarPubMed
McMahon, E., & Isik, L. (2023). Seeing social interactions. Trends in Cognitive Science, 27(12), 11651179.CrossRefGoogle ScholarPubMed
Meltzoff, A. N., & Moore, M. K. (1977). Imitation of facial and manual gestures by human neonates. Science, 198(4312), 7578.CrossRefGoogle ScholarPubMed
Milner, A. D., & Goodale, M. A. (1995). The Visual Brain in Action. Oxford: Oxford University Press.Google Scholar
Minsky, M. (1975). Minsky’s Frame System Theory. Proceedings of the 1975 workshop on theoretical issues in natural language processing, pages 104116.Google Scholar
Morris, M. W., & Murphy, G. L. (1990). Converging operations on a basic level in event taxonomies. Memory & Cognition, 18(4), 407418.CrossRefGoogle ScholarPubMed
Muhammad, K., Ullah, A., Imran, A. S., et al. (2021). Human action recognition using attention based LSTM network with dilated CNN features. Future Generation Computer Systems, 125, 820830.CrossRefGoogle Scholar
Mukamel, R., Ekstrom, A. D., Kaplan, J., Iacoboni, M., & Fried, I. (2010). Single-neuron responses in humans during execution and observation of actions. Current Biology, 20(8), 750756.CrossRefGoogle ScholarPubMed
Muthukumaraswamy, S. D., & Johnson, B. W. (2004a). Changes in rolandic mu rhythm during observation of a precision grip. Psychophysiology, 41(1), 152156.CrossRefGoogle ScholarPubMed
Muthukumaraswamy, S. D., Johnson, B. W., & McNair, N. A. (2004b). Mu rhythm modulation during observation of an object-directed grasp. Cognitive Brain Research, 19(2), 195201.CrossRefGoogle ScholarPubMed
Murata, A., Fadiga, L., Fogassi, L., et al. (1997). Object representation in the ventral premotor cortex (area F5) of the monkey. Journal of Neurophysiology, 78(4), 22262230.CrossRefGoogle ScholarPubMed
Murty, N. A. R., Bashivan, P., Abate, A., DiCarlo, J. J., & Kanwisher, N. (2021). Computational models of category-selective brain regions enable high-throughput tests of selectivity. Nature Communications, 12, 114.Google Scholar
Nastase, S. A., Connolly, A. C., Oosterhof, N. N., et al. (2017). Attention selectively reshapes the geometry of distributed semantic representation. Cerebral Cortex, 27(8), 42774291.CrossRefGoogle ScholarPubMed
Netanyahu, A., Shu, T., Katz, B., Barbu, A., & Tenenbaum, J. B. (2021). Phase: Physically-grounded abstract social events for machine social perception. In Proceedings of the aaai Conference on Artificial Intelligence, 35(1), 845853.CrossRefGoogle Scholar
Norman, K. A., Polyn, S. M., Detre, G. J., & Haxby, J. V. (2006). Beyond mind-reading: Multi-voxel pattern analysis of fMRI data. Trends in Cognitive Sciences, 10(9), 424430.CrossRefGoogle ScholarPubMed
Nosofsky, R. M. (1986). Attention, similarity, and the identification–categorization relationship. Journal of Experimental Psychology: General, 115(1), 3957.CrossRefGoogle ScholarPubMed
Oliva, A., & Torralba, A. (2007). The role of context in object recognition. Trends in Cognitive Sciences, 11(12), 520527.CrossRefGoogle ScholarPubMed
Oostenbroek, J., Suddendorf, T., Nielsen, M., et al. (2016). Comprehensive longitudinal study challenges the existence of neonatal imitation in humans. Current Biology, 26, 13341338.CrossRefGoogle ScholarPubMed
Oosterhof, N. N., Wiggett, A. J., Diedrichsen, J., Tipper, S. P., & Downing, P. E. (2010). Surface-based information mapping reveals crossmodal vision–action representations in human parietal and occipitotemporal cortex. Journal of Neurophysiology, 104(2), 10771089.CrossRefGoogle ScholarPubMed
Oosterhof, N. N., Tipper, S. P., & Downing, P. E. (2012a). Viewpoint (in) dependence of action representations: An MVPA study. Journal of Cognitive Neuroscience, 24(4), 975989.CrossRefGoogle ScholarPubMed
Oosterhof, N. N., Tipper, S. P., & Downing, P. E. (2012b). Visuo-motor imagery of specific manual actions: A multi-variate pattern analysis fMRI study. Neuroimage, 63(1), 262271.CrossRefGoogle ScholarPubMed
Oosterhof, N. N., Tipper, S. P., & Downing, P. E. (2013). Crossmodal and action-specific: Neuroimaging the human mirror neuron system. Trends in Cognitive Sciences, 17(7), 311318.CrossRefGoogle ScholarPubMed
Orban, G. A., Ferri, S., & Platonov, A. (2019). The role of putative human anterior intraparietal sulcus area in observed manipulative action discrimination. Brain and Behavior, 9, 113.CrossRefGoogle ScholarPubMed
Orban, G. A., Lanzilotto, M., & Bonini, L. (2021). From observed action identity to social affordances. Trends in Cognitive Sciences, 25(6), 493505.CrossRefGoogle ScholarPubMed
Orgs, G., Hagura, N., & Haggard, P. (2013). Learning to like it: Aesthetic perception of bodies, movements and choreographic structure. Consciousness and Cognition, 22(2), 603612.CrossRefGoogle ScholarPubMed
Osiurak, F., & Badets, A. (2016). Tool use and affordance: Manipulation-based versus reasoning-based approaches. Psychological Review, 123(5), 534568.CrossRefGoogle ScholarPubMed
Oztop, E., Wolpert, D., & Kawato, M. (2005). Mental state inference using visual control parameters. Cognitive Brain Research, 22, 129151.CrossRefGoogle ScholarPubMed
Oztop, E., Kawato, M., & Arbib, M. A. (2013). Mirror neurons: Functions, mechanisms, and models. Neuroscience Letters, 540, 4355.CrossRefGoogle ScholarPubMed
Papeo, L. (2020). Twos in human visual perception. Cortex, 132, 473478.CrossRefGoogle ScholarPubMed
Papeo, L., Agostini, B., & Lingnau, A. (2019). The large-scale organization of gestures and words in the middle temporal gyrus. Journal of Neuroscience, 39(30), 59665974.CrossRefGoogle ScholarPubMed
Peelen, M. V., & Downing, P. E. (2007). Using multi-voxel pattern analysis of fMRI data to interpret overlapping functional activations. Trends in Cognitive Sciences, 11(1), 4–4.CrossRefGoogle ScholarPubMed
Peelen, M. V., & Kastner, S. (2014). Attention in the real world: Toward understanding its neural basis. Trends in Cognitive Sciences, 18(5), 242250.CrossRefGoogle ScholarPubMed
Perrett, D. I., Harries, M. H., Bevan, R., et al. (1989). Frameworks of analysis for the neural representation of animate objects and actions. Journal of Experimental Biology, 146(1), 87113.CrossRefGoogle ScholarPubMed
Petrini, K., Pollick, F. E., Dahl, S., et al. (2011). Action expertise reduces brain activity for audiovisual matching actions: An fMRI study with expert drummers. Neuroimage, 56(3), 14801492.CrossRefGoogle ScholarPubMed
Pinker, S. L. (1989). Cognition: The Acquisition of Argument Structure. MIT Press.Google Scholar
Pitcher, D., & Ungerleider, L. G. (2021). Evidence for a third visual pathway specialized for social perception. Trends in Cognitive Sciences, 25(2), 100110.CrossRefGoogle ScholarPubMed
Poldrack, R. A. (2006). Can cognitive processes be inferred from neuroimaging data? Trends in Cognitive Sciences, 10(2), 5963.CrossRefGoogle ScholarPubMed
Press, C., Weiskopf, N., & Kilner, J. M. (2012). Dissociable roles of human inferior frontal gyrus during action execution and observation. Neuroimage, 60(3), 16711677.CrossRefGoogle ScholarPubMed
Prinz, W. (1997). Perception and action planning. European Journal of Cognitive Psychology, 9(2), 129154.CrossRefGoogle Scholar
Quadflieg, S., & Westmoreland, K. (2019). Making sense of other people’s encounters: Towards an integrative model of relational impression formation. Journal of Nonverbal Behavior, 43, 233256.CrossRefGoogle Scholar
Ramsey, R., Darda, K. M., & Downing, P. E. (2019). Automatic imitation remains unaffected under cognitive load. Journal of Experimental Psychology: Human Perception and Performance, 45(5), 601615.Google ScholarPubMed
Rao, R. P., & Ballard, D. H. (1999). Predictive coding in the visual cortex: A functional interpretation of some extra-classical receptive-field effects. Nature Neuroscience, 2(1), 7987.CrossRefGoogle ScholarPubMed
Reddy, V., & Uithol, S. (2016). Engagement: Looking beyond the mirror to understand action understanding. British Journal of Developmental Psychology, 34, 101114.CrossRefGoogle ScholarPubMed
Repp, B. H., & Knoblich, G. (2004). Perceiving action identity: How pianists recognize their own performances. Psychological Science, 15(9), 604609.CrossRefGoogle ScholarPubMed
Rifkin, A. (1985). Evidence for a basic level in event taxonomies. Memory & Cognition, 13(6), 538556.CrossRefGoogle ScholarPubMed
Riley, M. R., & Constantinidis, C. (2016). Role of prefrontal persistent activity in working memory. Frontiers in Systems Neuroscience, 9, 114.CrossRefGoogle ScholarPubMed
Rizzolatti, G., & Craighero, L. (2004). The mirror-neuron system. Annual Review of Neuroscience, 27, 169192.CrossRefGoogle ScholarPubMed
Rizzolatti, G., & Fogassi, L. (2014). The mirror mechanism: Recent findings and perspectives. Philosophical Transactions of the Royal Society B: Biological Sciences, 369(1644), 112.CrossRefGoogle Scholar
Rizzolatti, G., & Sinigaglia, C. (2010). The functional role of the parieto-frontal mirror circuit: Interpretations and misinterpretations. Nature Reviews Neuroscience, 11(4), 264274.CrossRefGoogle ScholarPubMed
Rizzolatti, G., & Sinigaglia, C. (2016). The mirror mechanism: A basic principle of brain function. Nature Reviews Neuroscience, 17(12), 757765.CrossRefGoogle ScholarPubMed
Rizzolatti, G., Scandolara, C., Gentilucci, M., & Camarda, R. (1981). Response properties and behavioral modulation of ‘mouth’ neurons of the postarcuate cortex (area 6) in macaque monkeys. Brain Research, 225(2), 421424.CrossRefGoogle ScholarPubMed
Rizzolatti, G., Camarda, R., Fogassi, L., et al. (1988). Functional organization of inferior area 6 in the macaque monkey: II. Area F5 and the control of distal movements. Experimental Brain Research, 71, 491507.CrossRefGoogle ScholarPubMed
Rizzolatti, G., Fadiga, L., Matelli, M., et al. (1996). Localization of grasp representations in humans by PET: 1. Observation versus execution. Experimental Brain Research, 111, 246252.CrossRefGoogle ScholarPubMed
Rizzolatti, G., Fogassi, L., & Gallese, V. (2001). Neurophysiological mechanisms underlying the understanding and imitation of action. Nature Reviews Neuroscience, 2(9), 661670.CrossRefGoogle ScholarPubMed
Rosch, E., Mervis, C. B., Gray, W. D., Johnson, D. M., & Boyes-Braem, P. (1976). Basic objects in natural categories. Cognitive Psychology, 8(3), 382439.CrossRefGoogle Scholar
Ross, L. (2018). From the fundamental attribution error to the truly fundamental attribution error and beyond: My research journey. Perspectives on Psychological Science, 13(6), 750769.CrossRefGoogle Scholar
Saxe, R., & Kanwisher, N. (2003). People thinking about thinking people: The role of the temporo-parietal junction in ‘theory of mind’. Neuroimage, 19(4), 18351842.CrossRefGoogle ScholarPubMed
Schank, R. C., & Abelson, R. P. (1977). Scripts, Plans, Goals, and Understanding: An Inquiry into Human Knowledge Structures. Psychology press.Google Scholar
Schultz, J., & Frith, C. D. (2022). Animacy and the prediction of behaviour. Neuroscience & Biobehavioral Reviews, 140, 111.CrossRefGoogle ScholarPubMed
Schurz, M., Radua, J., Aichhorn, M., Richlan, F., & Perner, J. (2014). Fractionating theory of mind: A meta-analysis of functional brain imaging studies. Neuroscience & Biobehavioral Reviews, 42, 934.CrossRefGoogle ScholarPubMed
Sebanz, N., & Knoblich, G. (2021). Progress in joint-action research. Current Directions in Psychological Science, 30(2), 138143.CrossRefGoogle Scholar
Seeliger, K., Ambrogioni, L., Güçlütürk, Y., et al. (2021). End-to-end neural system identification with neural information flow. PLoS Computational Biology, 17(2), 122.CrossRefGoogle ScholarPubMed
Seger, C. A. (1997). Two forms of sequential implicit learning. Consciousness and Cognition, 6(1), 108131.CrossRefGoogle ScholarPubMed
Serences, J. T., Schwarzbach, J., Courtney, S. M., Golay, X., & Yantis, S. (2004). Control of object-based attention in human cortex. Cerebral Cortex, 14(12), 13461357.CrossRefGoogle ScholarPubMed
Shahdloo, M., Çelik, E., Urgen, B. A., Gallant, J. L., & Çukur, T. (2022). Task-dependent warping of semantic representations during search for visual action categories. Journal of Neuroscience, 42(35), 67826799.CrossRefGoogle ScholarPubMed
Shepard, R. N. (1958). Stimulus and response generalization: Tests of a model relating generalization to distance in psychological space. Journal of Experimental Psychology, 55(6), 509523.CrossRefGoogle Scholar
Singer, J. M., & Sheinberg, D. L. (2010). Temporal cortex neurons encode articulated actions as slow sequences of integrated poses. Journal of Neuroscience, 30(8), 31333145.CrossRefGoogle ScholarPubMed
Sliwa, J., & Freiwald, W. A. (2017). A dedicated network for social interaction processing in the primate brain. Science, 356, 745749.CrossRefGoogle Scholar
Southgate, V. (2013). Do infants provide evidence that the mirror system is involved in action understanding? Consciousness and Cognition, 22(3), 11141121.CrossRefGoogle ScholarPubMed
Spoerer, C. J., McClure, P., & Kriegeskorte, N. (2017). Recurrent convolutional neural networks: A better model of biological object recognition. Frontiers in Psychology, 8, 114.CrossRefGoogle ScholarPubMed
Spunt, R. P., & Lieberman, M. D. (2013). The busy social brain: Evidence for automaticity and control in the neural systems supporting social cognition and action understanding. Psychological Science, 24(1), 8086.CrossRefGoogle ScholarPubMed
Spunt, R. P., & Lieberman, M. D. (2014). Automaticity, control, and the social brain. In J. W. Sherman, B. Gawronski, & Y. Trope (Eds.), Dual-process theories of the social mind (pp. 279298). New York, NY: Guilford Press.Google Scholar
Spunt, R. P., Satpute, A. B., & Lieberman, M. D. (2011). Identifying the what, why, and how of an observed action: An fMRI study of mentalizing and mechanizing during action observation. Journal of Cognitive Neuroscience, 23(1), 6374.CrossRefGoogle ScholarPubMed
Spunt, R. P., Kemmerer, D., & Adolphs, R. (2016). The neural basis of conceptualizing the same action at different levels of abstraction. Social Cognitive and Affective Neuroscience, 11(7), 11411151.CrossRefGoogle ScholarPubMed
Stangl, M., Maoz, S. L., & Suthana, N. (2023). Mobile cognition: Imaging the brain in the ‘real world’. Nature Reviews Neuroscience, 24, 347362.CrossRefGoogle ScholarPubMed
Strafella, A. P., & Paus, T. (2000). Modulation of cortical excitability during action observation: A transcranial magnetic stimulation study. Neuroreport, 11(10), 22892292.CrossRefGoogle ScholarPubMed
Summerfield, C., Trittschuh, E. H., Monti, J. M., Mesulam, M. M., & Egner, T. (2008). Neural repetition suppression reflects fulfilled perceptual expectations. Nature Neuroscience, 11(9), 10041006.CrossRefGoogle ScholarPubMed
Talmy, L. (1985). Lexicalization patterns: Semantic structure in lexical forms. Language Typology and Syntactic Description, 3(99), 36149.Google Scholar
Tamir, D. I., & Thornton, M. A. (2018). Modeling the predictive social mind. Trends in Cognitive Sciences, 22(3), 201212.CrossRefGoogle ScholarPubMed
Tanaka, K. (1997). Mechanisms of visual object recognition: Monkey and human studies. Current Opinion in Neurobiology, 7, 523529.CrossRefGoogle ScholarPubMed
Tarhan, L., & Konkle, T. (2020). Sociality and interaction envelope organize visual action representations. Nature Communications, 11(1), 111.CrossRefGoogle ScholarPubMed
Tarhan, L., de Freitas, J., & Konkle, T. (2021). Behavioral and neural representations en route to intuitive action understanding. Neuropsychologia, 163, 110.CrossRefGoogle ScholarPubMed
Thompson, E. L., Bird, G., & Catmur, C. (2019). Conceptualizing and testing action understanding. Neuroscience & Biobehavioral Reviews, 105, 106114.CrossRefGoogle ScholarPubMed
Thompson, E. L., Long, E. L., Bird, G., & Catmur, C. (2023). Is action understanding an automatic process? Both cognitive and perceptual processing are required for the identification of actions and intentions. Quarterly Journal of Experimental Psychology, 76(1), 7083.CrossRefGoogle ScholarPubMed
Thompson, J., & Parasuraman, R. (2012). Attention, biological motion, and action recognition. Neuroimage, 59(1), 413.CrossRefGoogle ScholarPubMed
Thornton, M. A., & Tamir, D. I. (2021a). People accurately predict the transition probabilities between actions. Science Advances, 7, 112. https://doi.org/10.1126/sciadv.abd4995.CrossRefGoogle ScholarPubMed
Thornton, M. A., & Tamir, D. I. (2021b). Perceiving actions before they happen: Psychological dimensions scaffold neural action prediction. Social Cognitive and Affective Neuroscience, 16(8), 807815.CrossRefGoogle ScholarPubMed
Thornton, M. A., & Tamir, D. I. (2022). Six dimensions describe action understanding: The ACT-FASTaxonomy. Journal of Personality and Social Psychology, 122(4), 577605.CrossRefGoogle ScholarPubMed
Tomasello, M., Kruger, A. C., & Ratner, H. H. (1993). Cultural learning. Behavioral and Brain Sciences, 16(3), 495511.CrossRefGoogle Scholar
Troje, N. F., & Basbaum, A. (2008). Biological motion perception. The Senses: A Comprehensive Reference, 2, 231238.Google Scholar
Tucciarelli, R., Wurm, M., Baccolo, E., & Lingnau, A. (2019). The representational space of observed actions. elife, 8, 124.CrossRefGoogle ScholarPubMed
Turella, L., Pierno, A. C., Tubaldi, F., & Castiello, U. (2009). Mirror neurons in humans: Consisting or confounding evidence? Brain and Language, 108(1), 1021.CrossRefGoogle ScholarPubMed
Turella, L., Wurm, M. F., Tucciarelli, R., & Lingnau, A. (2013). Expertise in action observation: Recent neuroimaging findings and future perspectives. Frontiers in Human Neuroscience, 7, 15. https://doi.org/10.3389/fnhum.2013.00637.CrossRefGoogle ScholarPubMed
Turella, L., Rumiati, R., & Lingnau, A. (2020). Hierarchical action encoding within the human brain. Cerebral Cortex, 30(5), 29242938. https://doi.org/10.1093/cercor/bhz284.CrossRefGoogle ScholarPubMed
Uithol, S., van Rooij, I., Bekkering, H., & Haselager, P. (2012). Hierarchies in action and motor control. Journal of Cognitive Neuroscience, 24(5), 10771086.CrossRefGoogle ScholarPubMed
Umiltà, M. A., Kohler, E., Gallese, V., et al. (2001). I know what you are doing: A neurophysiological study. Neuron, 19, 155165.CrossRefGoogle Scholar
Umiltà, M. A., Escola, L., Intskirveli, I., et al. (2008). When pliers become fingers in the monkey motor system. Proceedings of the National Academy of Sciences, 105(6), 22092213.CrossRefGoogle ScholarPubMed
Ungerleider, L. G., & Mishkin, M. (1982). Two cortical visual systems. In Analysis of Visual Behavior. Edited by Ingle, D. J., Goodale, M. A., & Mansfield, R. J. W., 549586. MIT Press.Google Scholar
Valentine, T., Lewis, M. B., & Hills, P. J. (2016). Face-space: A unifying concept in face recognition research. Quarterly Journal of Experimental Psychology, 69(10), 19962019.CrossRefGoogle ScholarPubMed
Vallacher, R. R., & Wegner, D. M. (1989). Levels of personal agency: Individual variation in action identification. Journal of Personality and Social Psychology, 57, 660671.CrossRefGoogle Scholar
Van Overwalle, F. (2009). Social cognition and the brain: A meta-analysis. Human Brain Mapping, 30(3), 829858.CrossRefGoogle ScholarPubMed
Van Overwalle, F., & Baetens, K. (2009). Understanding others’ actions and goals by mirror and mentalizing systems: A meta-analysis. Neuroimage, 48(3), 564584.CrossRefGoogle ScholarPubMed
Vannuscorps, G., & Caramazza, A. (2016). Typical action perception and interpretation without motor simulation. Proceedings of the National Academy of Sciences, 113(1), 8691.CrossRefGoogle ScholarPubMed
Vannuscorps, G., & Caramazza, A. (2023). Effector-specific motor simulation supplements core action recognition processes in adverse conditions. Social Cognitive and Affective Neuroscience, 18(1), 111.CrossRefGoogle ScholarPubMed
Vinson, D. P., & Vigliocco, G. (2008). Semantic feature production norms for a large set of objects and events. Behavior Research Methods, 40(1), 183190.CrossRefGoogle ScholarPubMed
Vrigkas, M., Nikou, C., & Kakadiaris, I. A. (2015). A review of human activity recognition methods. Frontiers in Robotics and AI, 2, 128.CrossRefGoogle Scholar
Watson, C. E., & Buxbaum, L. J. (2014). Uncovering the architecture of action semantics. Journal of Experimental Psychology: Human Perception and Performance, 40(5), 18321848.Google ScholarPubMed
Wilson, M., & Knoblich, G. (2005). The case for motor involvement in perceiving conspecifics. Psychological Bulletin, 131(3), 460473.CrossRefGoogle ScholarPubMed
Wurm, M. F., & Caramazza, A. (2019). Distinct roles of temporal and frontoparietal cortex in representing actions across vision and language. Nature Communications, 10(1), 289.CrossRefGoogle ScholarPubMed
Wurm, M. F., & Caramazza, A. (2022). Two ‘what’ pathways for action and object recognition. Trends in Cognitive Sciences, 26(2), 103116.CrossRefGoogle ScholarPubMed
Wurm, M. F., & Lingnau, A. (2015). Decoding actions at different levels of abstraction. Journal of Neuroscience, 35, 77277735.CrossRefGoogle ScholarPubMed
Wurm, M. F., & Schubotz, R. I. (2012). Squeezing lemons in the bathroom: Contextual information modulates action recognition. Neuroimage, 59, 15511559.CrossRefGoogle ScholarPubMed
Wurm, M. F., & Schubotz, R. I. (2017). What’s she doing in the kitchen? Context helps when actions are hard to recognize. Psychonomic Bulletin & Review, 24, 503509.CrossRefGoogle ScholarPubMed
Wurm, M. F., Ariani, G., Greenlee, M., & Lingnau, A. (2015). Decoding concrete and abstract action representations during explicit and implicit conceptual processing. Cerebral Cortex, 26(8), 33903401. https://doi.org/10.1093/cercor/bhv169.CrossRefGoogle ScholarPubMed
Wurm, M. F., Artemenko, C., Giuliani, D., & Schubotz, R. I. (2017a). Action at its place: Contextual settings enhance action recognition in 4- to 8-year-old children. Developmental Psychology, 53(4), 662670.CrossRefGoogle ScholarPubMed
Wurm, M. F., Caramazza, A., & Lingnau, A. (2017b). Action categories in lateral occipitotemporal cortex are organized along sociality and transitivity. Journal of Neuroscience, 37, 562575.CrossRefGoogle ScholarPubMed
Yau, J. M., Pasupathy, A., Brincat, S. L., & Connor, C. E. (2013). Curvature processing dynamics in macaque area V4. Cerebral Cortex, 23, 198209.CrossRefGoogle ScholarPubMed
Zacks, J. M., Speer, N. K., Swallow, K. M., Braver, T. S., & Reynolds, J. R. (2007). Event perception: A mind/brain perspective. Psychological Bulletin, 133(2), 273293.CrossRefGoogle ScholarPubMed
Zhuang, T., & Lingnau, A. (2022). The characterization of actions at the superordinate, basic and subordinate level. Psychological Research, 86(6), 18711891.CrossRefGoogle ScholarPubMed
Zhuang, T., Kabulska, Z., & Lingnau, A. (2023). The representation of observed actions at the subordinate, basic and superordinate level. Journal of Neuroscience, 43(48), 82198230.CrossRefGoogle ScholarPubMed
Table 1 Definitions of action understanding.

Figure 1A: Examples of paradigms used in the monkey literature.

Figure 1B: Examples of paradigms used in the human literature. These examples give a sense of the wide variety of stimuli and tasks used in this literature, which may include schematics, still images, animations, or movies of typical or atypical manual or whole-body actions, either in a natural or a constrained context. The diversity of these examples is matched by the diversity of terminology and definitions adopted in the action understanding literature (see Table 1).

Table 2 Action understanding at different hierarchical levels.

Figure 2 Successful action understanding requires generalizing over highly distinct exemplars (e.g. of ; right side), including unusual ones (bottom centre image), while excluding highly similar non-exemplars (e.g. carving; left side).

Figure 3 Illustration of the action ‘spaces’ idea. Action kinds may be construed as atom-like points in representational spaces, the dimensions of which may correspond to psychologically meaningful distinctions. Positions of actions reflect their values on hypothetical mental dimensions. Distances between actions are proportional to subjective judgments of the similarity between them. Here we present only a reduced example for the sake of clarity; realistic action spaces would be far more complex.

Figure 4 Illustration of the ‘action frames’ perspective. A, B: Perceptual subsystems process objects, body postures, movements, and scenes to extract relevant aspects of the action, and the relationships among them. C: Mental ‘action frames’ capture the roles, relationships, and reasons that comprise our action knowledge. Slots of a given frame gather perceptual evidence about scene elements. Matches increase the evidence for one action () relative to others (). Normally, interactions between perceptual subsystems and action frames cohere rapidly to select one action frame; action understanding is this convergence of activity. Links omitted for clarity.

Figure 5A: Action space of four hypothetical action categories without attention (see also Figure 3).

Figure 5B: Action space of four hypothetical action categories while attending to the category highlighted in red. In this example, distinctions among the members of the attended category are enhanced, whereas distinctions within irrelevant action categories, and also between action categories, are attenuated.

Figure 6A: Macaque brain, lateral view.

Adapted from Riley & Constantinidis (2016).
Figure 6B: Human brain, lateral view.

Adapted from https://www.supercoloring.com/coloring-pages/human-brain-anatomy. F5: rostral portion of ventral premotor cortex, CS: central sulcus, AIP: anterior intraparietal area, IPL: inferior parietal lobe, STS: superior temporal sulcus, IT: inferior-temporal cortex, V1: primary visual cortex, PMv: ventral premotor cortex, PMd: dorsal premotor cortex, IFG: inferior frontal gyrus, S1: primary somatosensory cortex, SPL: superior parietal lobule, pSTS: posterior superior temporal sulcus, LOTC: lateral occipitotemporal cortex, MT: middle temporal area.
Figure 7A: Post-stimulus ‘rebound’ of the suppressed cortical mu-rhythm response following execution of a repetitive action (solid lines) or passive observation of a similar action (dotted lines).

From Hari et al., 2006.
Figure 7B: Enhancement of the contralateral motor-evoked potential by passive observation of a grasping action (top) relative to an object-observation control (bottom) in two hand muscles (first dorsal interosseous, left; opponens pollicis, right).

From Fadiga et al., 1995.
Figure 7C: Human brain regions commonly activated in action observation, in action execution tasks, or by both tasks, in fMRI experiments.

From Hardwick et al., 2018.
Figure 7D: Top: Human IPL exhibits repetition suppression for transitive hand actions that were mimed and then observed. Bottom: Reduction in the hemodynamic response function to repeated actions relative to non-repeated actions.

From Chong et al., 2009.
Figure 7E: Top: Schematic illustration of the logic from multivoxel pattern analysis (MVPA) fMRI studies that sought to identify regions in which local voxel patterns are more similar for the same action than different actions, across performance and observation. Bottom: Brain regions that exhibit the similarity patterns described in the top panel, as revealed by surface-based MVPA of fMRI data.

From Oosterhof et al., 2010, 2013.
