3 Dynamic information for face perception
3.1 Introduction
All faces share the same spatial configuration – eyes above nose above mouth. Yet, when we look at a face, even briefly, we not only classify the face as a face, but we also notice its individual uniqueness. The human face provides a wide range of social signals to the perceiver. It tells us about gender, intention, health, and approximate age. The relative shapes and postures of the facial features help to specify the emotional state of the person (Ekman 1982), and movements of the lips and tongue help us to distinguish different speech sounds (for example see Campbell 1986). Direction of eye gaze often signals the focus of our attention (Kleinke 1986), and aids turn-taking during conversation (Kendon 1967). In addition to these kinds of information, the face provides a particularly accessible and salient cue to identity, although people can of course be identified by means other than the face. Voice, body shape, and clothing may all act as cues to identity in circumstances where facial detail is not available.
Face research has been criticized for its reliance on static photographs of faces (Bruce 1994), which may be processed in a different manner to real faces (Pike et al. 1997). One obvious difference between a photograph of a face and the real thing is that the latter moves. Indeed, the preponderance of our experience with faces is not with static images, but with live moving faces. At a simplistic level, information from the face can be thought of in two distinct ways – that based on static (time-independent) parameters, and that based on dynamic (time-varying) parameters. Dynamic information arises because the face moves in a variety of ways, some to do with its signal-sending functions (smiling, nodding, speaking) and some to do with other functions (looking, chewing). When we yawn, when we laugh, when we talk, and when we smile, our face moves in a complex manner. Faces move in both rigid and non-rigid ways. During rigid motion, the face maintains its three-dimensional form, while the whole head changes its relative position and/or orientation. In non-rigid motion, individual parts of the face move in relation to one another, as during the formation of expressions or the articulation of speech. Typically, a complex combination of both rigid and non-rigid motion is required for everyday interaction, with head and face movements superimposed onto larger body movements (Munhall and Vatikiotis-Bateson 1998). In this chapter we examine the usefulness of facial motion for a variety of face perception tasks.
One way to illustrate the importance of information conveyed by motion is to present displays in which motion is just about the only source of information available. In the classic experiment of this kind, Johansson (1973) attached small lights to the major joints of an actor, who was then videotaped in the dark performing a range of activities. When the tape was played, the contrast was adjusted so that just the movements of the ‘point-lights’ were visible. Importantly, when static the point-lights appeared to the viewer as a random distribution, and it was only when the video was played that the prevailing structure of a moving person became apparent. Work using this point-light technique has shown that isolated dynamic information can convey useful information about the gender of the actor (Kozlowski and Cutting 1977), the approximate amount of weight lifted (Runeson and Frykholm 1981; Bingham 1987), and the emotion displayed during a dance routine (Dittrich et al. 1996).
A similar point-light technique has also been applied to human faces. Bassili (1978) attached a large number of bright dots to the surface of the face, which was then filmed carrying out a series of different expressions. Results indicated that when the point-light display was shown moving, participants were highly accurate at determining which expression was shown. In a follow-up experiment, Bruce and Valentine (1988) created point-light displays of a number of personally familiar colleagues, and asked other members of the department to act as participants. Results concurred with Bassili (1978), indicating that decisions such as identifying a face as a face, and discriminating between different facial expressions, could be made much more accurately from moving point-light displays than from static displays. Bruce and Valentine (1988) also asked participants to try to decide the sex of each face as well as identifying the person viewed. Participants’ abilities to categorize gender and to identify which person (from the six familiar faces captured) appeared in each clip were significantly better when the displays were seen moving than when static, though overall performance was very poor. Even though performance in this ‘identity’ task was highly inaccurate, this study does provide us with limited evidence that isolated dynamic information is also helpful for within-category discriminations, of the type involved in recognizing an individual’s face.
Thus, initial evidence indicates an important role for dynamic information in categorizing faces in various ways. The following sections review further evidence to support this claim, focusing on the role that dynamic information plays in categorizing facial expressions and processing visual speech. We then address in more detail whether the additional information afforded by motion is also useful during identity processing. Here we describe some of our own work which suggests that dynamic information does indeed act as a useful cue when recognizing familiar faces, particularly when recognition is problematic from static form alone. However, the role of motion in recognition memory for relatively unfamiliar faces is much less clear from current evidence. Finally, we evaluate the importance of these findings from both a practical and theoretical perspective.
3.2 Motion information for expression perception
Much past research on facial expression processing has utilized static facial images, devoid of motion. Bruner and Tagiuri (1954) state that ‘historically speaking, we may have been done a disservice by the pioneering efforts of those, who, like Darwin, took the human face in a state of arrested animation as an adequate stimulus situation for studying how well we recognize human emotion’. Indeed, while it is clear that judgements of emotion from static images can be very accurate (see Ekman 1982), ordinarily when we assess an individual’s emotion we have a wealth of cues in addition to a fixed facial expression. The patterning of the facial expression from onset to offset provides dynamic cues, often accompanied by larger body movements. These cues, in turn, are supplemented by our knowledge of the context of the emotional response and, sometimes, by knowledge of the individual exhibiting the expression.
Although most past research has utilized static images of faces, it seems unlikely that the information provided by motion is redundant. Edwards (1998) suggests that humans are attuned to the dynamic aspects of facial expressions of emotion. He presented participants with a number of photographs, each of which depicted a ‘snapshot’ of the same expression, taken at intervals of 67 ms in real time. Participants were asked to reproduce the progression of the spontaneous expression (from onset to offset) from the scrambled series of photographs. Results indicated that participants were able to utilize extremely subtle dynamic cues between the expression photographs to reproduce the correct temporal progression of the display at above-chance accuracy.
Dynamic aspects (e.g. speed of onset and offset, degree of irregularity) of facial movement also appear to distinguish genuine from posed emotional facial expressions (see Duchenne 1990). During the 1970s, Ekman and Friesen (1978) developed a ‘Facial Action Coding System’, or FACS. This system allows a researcher to precisely catalogue the movement of different groups of facial muscles over the time course of an expression. Using this system, Ekman and his colleagues claim that it is possible to distinguish between 7000 different facial expressions (including 19 different types of smiles). Often differences between expressions are reflected in their temporal dynamic properties. For example, Ekman, Friesen, and Simons (1985) found that the onset of a posed ‘startle’ expression was 100 ms later than that of a spontaneous ‘startle’ expression. Ekman and Friesen (1982) speculated that false (deceptive) expressions tend to have very short onset and offset times, and durations that are either over-long or unusually short. In line with this suggestion, Weiss, Blum, and Gleberman (1987) found that deliberate facial expressions had shorter onset times and more irregularities (pauses and stepwise intensity changes). Hess and Kleck (1990) found these differences were most marked when the deception involved the concealment of a different emotion (for example, smiling while watching a disgusting episode). However, later work by Hess and Kleck (1994) suggested that participants were relatively poor at using these cues to differentiate between genuine and posed expressions.
Further evidence for the importance of dynamic information during expression decoding has been found from experimental studies, studies of the patterns of impairments found in brain-injured patients, and more recent work using brain imaging techniques.
Matsuzaki and Sato (2008) examined the contribution of motion information to facial expression perception using point-light displays of faces. In the motion condition, apparent motion was induced by displaying a neutral expression followed by an emotional face image. In the repetition condition, the same emotional face image was presented twice. Results indicated that correct expression perception was higher in the motion than in the repetition condition, and that this advantage was reduced when a white blank field was inserted between the neutral and emotional expressions. Thus, even viewing a simplistic induced-motion display served to increase expression recognition.
Interestingly, Kamachi et al. (2001) found that the precise dynamic characteristics of the observed motion affected how well different expressions could be recognized. In this study, dynamic expressions were created by displaying morph sequences, morphing between a neutral and a peak expression. The speed of the motion was varied by adding different numbers of intervening frames to the sequence: fast (6 frames), medium (26 frames), and slow (101 frames). In a free description task, participants were asked to describe the emotion viewed. Results suggested that sadness was identified most accurately from slow sequences, with happiness, and to a lesser extent surprise, identified most accurately from fast sequences. Angry expressions were best recognized from medium-speed sequences. A second experiment confirmed that this result was not simply due to differences in the total time of the display, but rather reflected differences in the dynamic properties of the observed motion. Later work by Pollick, Hill, Calder, and Paterson (2003) found that changing the duration of an expression had a small effect on ratings of emotional intensity, with a trend for expressions with shorter durations to receive lower intensity ratings.
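The speed manipulation reduces to simple arithmetic: at a fixed playback rate, the number of intervening frames determines the duration of the neutral-to-peak change. A minimal sketch (the frame counts are those reported in the study; the 30 frames-per-second playback rate is an assumption for illustration only):

```python
def morph_duration_seconds(n_frames, fps=30.0):
    """Duration of a neutral-to-peak morph sequence at a given
    playback rate (fps = 30 is an assumed illustration value)."""
    return n_frames / fps

# Frame counts from the fast / medium / slow sequences.
durations = {n: morph_duration_seconds(n) for n in (6, 26, 101)}
# The slow sequence plays roughly 17 times longer than the fast one.
```

The same expression change can thus unfold over anything from a fifth of a second to several seconds, which is what allows speed to be studied independently of the expression's spatial form.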
Finally, in terms of experimental work, Bould, Morris, and Wink (2008) investigated the importance of dynamic temporal characteristic information in facilitating the recognition of subtle expressions of emotion. In Experiment 1 there were three conditions: dynamic moving sequences that showed the expression emerging from neutral to a subtle emotion; a Dynamic-9 presentation containing nine static stills from the dynamic moving sequences (run together to encapsulate a moving sequence); and a First-Last condition containing only the first (neutral) and last (subtle emotion) stills. The results showed recognition was significantly better for the dynamic moving sequences than for both the Dynamic-9 and First-Last conditions. Further experiments changed the dynamics of the moving sequences by speeding up, slowing down, or disrupting the rhythm of the motion. These manipulations significantly reduced recognition, and it was concluded that, in addition to the perception of change, recognition is facilitated by the characteristic muscular movements associated with the portrayal of each emotion.
In terms of patient work, Humphreys, Donnelly, and Riddoch (1993) report the case study of a brain-injured patient with severe face processing impairments. Patient HJA is markedly impaired at recognizing the identity of familiar faces, and is poor at making gender and emotional expression judgements from static photographs. In contrast, when asked to make judgements of expression or gender from moving point-light displays, he performs normally. It seems that with expressions, this patient is able to use movement but not static form information. This pattern of deficits supports not only the idea that identity and expression processing fractionate (see later in this chapter), but also that expression processing itself can be separated according to whether expression is conveyed through static form or motion information.
Recent work by Trautmann, Fehr, and Herrmann (2009) used fMRI to examine the neural networks involved in the perception of static and dynamic facial stimuli separately (neutral, happy, and disgusted expressions). Dynamic faces elicited enhanced emotion-specific activation patterns in a network including the parahippocampal gyrus (PHG), amygdala (AMG), fusiform gyrus (FG), superior temporal gyrus (STG), inferior frontal gyrus (IFG), and occipital and orbitofrontal cortex (OFC). Post hoc ratings of the dynamic stimuli revealed better recognizability in comparison to the static stimuli.
In conclusion, while most work by psychologists interested in emotional expressions has used static displays of posed expressions, it seems likely that motion information provides an important dimension of emotional processing in everyday interpersonal interactions. Dynamic facial expressions might therefore provide a more appropriate way to examine emotional face perception than static stimuli.
3.3 Motion information for visual speech perception
It has been well documented that visual information from a talker’s mouth and face plays an important role in the perception and understanding of spoken language (see Massaro 1987 for an early review, and contributions to this volume). Under noisy conditions, viewing the talking face supplements the auditory signal, increasing perceptual accuracy (e.g. Sumby and Pollack 1954; Walden et al. 1977). We make use of visual information from the face even during the understanding of clear and unambiguous speech (see Reisberg et al. 1987; Vitkovitch and Barber 1994). The classic demonstration that we use visual information when perceiving speech is ‘the McGurk effect’, where visual and auditory speech signals are combined in a way which can give rise to illusory percepts of phonemes that correspond to neither what was seen nor what was heard (e.g. auditory ‘ba’ plus visual ‘ga’ often results in observers hearing ‘da’) (McGurk and MacDonald 1976).
Most current descriptions of visual speech information are based on static parameters, such as lip shape, tongue height, place of cavity constriction, and amount of visible teeth (Montgomery and Jackson 1983; Summerfield and McGrath 1984). Indeed, there is much evidence that static-based parameters can convey some visual speech information. For example, Campbell (1986) showed that participants could readily distinguish between point vowels, such as ‘ee’ and ‘oo’, from photographs of faces posturing articulatory positions. However, the fact that we can make use of static-based information for visual speech does not preclude the use of the dynamic information afforded by a moving speaking face.
Evidence demonstrating the salience of isolated dynamic information for visual speech processing comes from studies carried out using point-light displays. In early work, Summerfield (1979) used simplistic point-light displays (four ‘points’ placed on the lips) to determine whether visible lip motion could help listeners process heard speech played against a background of interfering prose. Although listeners were able to identify the visual displays as moving lips, comprehension of speech was only marginally improved. However, a follow-up study by Rosenblum, Johnson, and Saldaña (1996) found that moving point-light configurations could indeed enhance the perception of speech embedded in white noise. No such advantage was found with static point-light displays.
Additional evidence for the salience of dynamic information in visual speech processing comes from Rosenblum and Saldaña (1996), who found that point-light displays could generate audiovisual fusion illusions (McGurk and MacDonald 1976). For example, when a moving point-light display of a face saying /va/ was paired with an auditory /ba/, participants often experienced /fa/ being spoken. No such fusions between what was seen and what was heard were found when the visual display was a static image of a face mouthing /va/ (but see Benoît et al. 1995). The fact that static facial speech does not integrate strongly with auditory speech led Rosenblum and Saldaña (1998) to suggest ‘that time-varying (dynamic) dimensions of visible speech should be given serious consideration as the most salient informational form’.
Other evidence for the primacy of dynamic information in visual speech comes from Vitkovitch and Barber (1994), who investigated the effect of frame rate on speechreading ability. Results indicated that faster frame rates (16.5 Hz to 30 Hz) were much better at conveying visual speech than slower rates (8.3 Hz to 12.5 Hz). Vitkovitch and Barber (1994) concluded that this improvement must be due to additional information becoming available to the viewer as the frame rate is increased. The most likely source of this information is dynamic parameters.
It is clear, then, that dynamic information has an important role to play in the processing of visual speech, although it is difficult, on the basis of the research outlined so far, to assess its relative importance compared with information based on static parameters. One interesting way to explore this issue involves testing brain-injured patients who have established problems with the perception of visual motion (e.g. McLeod et al. 1996). Here, it is possible to directly clarify the importance of dynamic facial information for speechreading, as dynamic information is not available to these patients. Campbell and colleagues (1997) reported the speechreading ability of one such patient (LM). While LM’s reading of natural speech was severely impaired, she was able to recognize speech patterns from face photographs and achieve reasonable speechreading of monosyllables produced in isolation. As with other visual events (for example, tracking the direction of gaze), the rate of presentation was critical to her performance. She was able to report events during slow presentation (~ one event per 2 seconds), but was poor at distinguishing between normal, fast (double-speed), and slow (half-speed) seen speech. Campbell et al. (1997) concluded that visible speech perception cannot be based solely on dynamic properties of speech, otherwise LM should have lost the ability to perform any speechreading task (from either static or moving displays). Instead, Campbell et al. (1997) suggest that both static and dynamic information are required for effective speechreading of natural speech.
However, Rosenblum and Saldaña (1998) emphasize that data from brain-injured patients should be interpreted cautiously, since speechreading ability varies considerably between individuals (Demorest et al. 1996) and the speechreading ability of these patients prior to their lesions is not known.
In summary, much research has emphasized the importance of dynamic information for visual speech processing. This observation, alongside the previous discussion of expression processing, leads naturally to the question of whether dynamic information, which is available from a moving face and clearly used in at least some processing tasks, might also provide information useful for the recognition of identity. Next, we report a series of experiments that compare recognition performance from moving and static faces, to answer this question.
3.4 Dynamic information for familiar face recognition
The majority of research on face recognition has been concerned with how static faces are recognized, and static form-based information has – implicitly or explicitly – been emphasized in most theoretical accounts of face recognition. It has long been known that static-based information about the shape and configuration of individual features (for example, see Tanaka and Farah 1993) and the overall shape and pigmentation of the skin is utilized in the recognition of identity (see Bruce and Langton 1994; Kemp et al. 1996). This information can, of course, be as easily extracted from a static face image as from a moving one.
Given that our recognition of known people from photographs or pictures is typically so good (see Burton et al. 1999), it has often been assumed that ‘motion is little used for face identification’ (Humphreys et al. 1993). However, this conclusion seems premature, considering that dynamic information seems to be particularly salient for both expression processing and visual speech processing. Indeed, seeing a face move undoubtedly adds information for the viewer that is unavailable from a static image. The key question, then, is the nature of this additional information and the role it plays in identity processing. It is important to investigate this issue not only to advance our theoretical understanding of the processes involved in face recognition, but also to determine whether static images are an adequate way to represent faces in studies of this kind.
Initial findings by Bruce and Valentine (1988), using point-light displays, suggested that isolated dynamic information may act as a cue to identity when static cues are impoverished. More convincing evidence that movement is important in the recognition of individual faces comes from Knight and Johnston (1997), who presented famous faces either in a negative (contrast-reversed) format or upside down, and compared recognition performance from moving and static sequences. In experiments of this kind it is usually necessary to degrade spatial cues in order to bring recognition performance down from ceiling. Results indicated that moving famous faces were recognized significantly more accurately than static ones, but only when they were shown as upright negative images. Knight and Johnston (1997) proposed that seeing the face move may provide evidence about its three-dimensional structure, compensating for the degraded depth cues available within a negative image (see Bruce and Langton 1994). Alternatively, they suggest that known faces may have characteristic facial gestures, idiosyncratic to the individual viewed.
Our follow-up research (see Lander et al. 1999; Lander et al. 2001) showed that the recognition advantage for moving faces is not specific to upright negated images. Instead, motion confers benefits across a range of image manipulations, including thresholding (where a multiple grey-level image is converted to a one-bit-per-pixel black-and-white image), pixelation, and Gaussian blurring. In these studies, what seems important for the demonstration of a motion recognition advantage is not the nature of the image manipulation but rather that recognition performance is below ceiling, allowing higher recognition rates to be found. Our moving sequences show famous faces, pictured from the shoulders up, talking and expressing. Thus, these experiments clearly demonstrate that non-rigid movement adds useful information when recognizing the identity of famous faces shown in difficult viewing conditions. It is important to note at this point that the motion recognition advantage is not simply due to the increased number of static images shown when the face is in motion (25 frames per second in the UK). Indeed, when the number of images was equated across static and moving presentation conditions (Lander and Bruce 2001), there was still an advantage for viewing the face in motion.
Furthermore, the viewing conditions in which facial motion is observed affect the extent to which motion aids recognition. Research indicates that disruptions to the natural movement of the face can influence the size of the motion advantage in facial recognition. Lander et al. (1999) and Lander and Bruce (2001) found lower recognition rates for famous faces when the motion was slowed down, speeded up, reversed, or rhythmically disrupted. Thus, seeing the precise dynamic characteristics of the face in motion provides the greatest advantage for facial recognition. A further demonstration of this point comes from research using both natural and artificially created (morphed) smiling stimuli (Lander et al. 2006). In order to create an artificially moving sequence, Lander et al. (2006) used a morphing technique to create intermediate face images between the first and last frames of a natural smile. When shown in sequence, these images formed an artificially moving smile that lasted the same amount of time, and had the same start and end points, as the natural smile for that individual. Results showed that familiar faces were recognized significantly better when shown naturally smiling than when shown as a static neutral face, a static smiling face, or a morphed smiling sequence. This further demonstrates that motion must be natural in order to produce the motion advantage.
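The morphing used to build the artificial smiles can be thought of, in its simplest pixel-level form, as a linear cross-dissolve between the first and last frames. The sketch below is a deliberate simplification under that assumption (a true morph also warps feature positions, not just pixel intensities); it illustrates only how intermediate frames preserving the start point, end point, and duration can be generated:

```python
import numpy as np

def linear_morph_sequence(first, last, n_frames):
    """Generate n_frames images blending linearly from `first`
    to `last` (a simplified stand-in for a true morph)."""
    alphas = np.linspace(0.0, 1.0, n_frames)
    return [(1.0 - a) * first + a * last for a in alphas]

# Toy 2x2 greyscale 'frames': neutral (zeros) to peak smile (ones).
neutral = np.zeros((2, 2))
peak = np.ones((2, 2))
sequence = linear_morph_sequence(neutral, peak, 5)
```

Such a sequence matches the natural smile in duration and endpoints while discarding its characteristic tempo and irregularities, which is precisely what makes it a useful control condition.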
While investigating the effects of natural versus morphed motion, Lander et al. (2006) also found a main effect of familiarity, revealing that the nature of the to-be-recognized face can mediate the effect of motion on facial recognition. It is posited that the more familiar a person’s face is, the more we may be able to utilize the movement of their face as a cue to identity; characteristic motion patterns may become a more accessible cue to identity as a face becomes increasingly familiar. Indeed, in recent work, Butcher (2009) found a significant positive correlation between rated face familiarity and the recognition advantage for moving compared with static faces. This research was conducted using famous faces and found that the more familiar the famous face was rated to be, the larger the recognition advantage for viewing that face in motion.
Another factor of the to-be-recognized face that has been shown to be important in understanding what mediates the motion advantage is distinctiveness. Facial recognition research has demonstrated a clear benefit for faces that are thought to be spatially distinctive: distinctive faces are better recognized than faces rated as ‘typical’ (Light et al. 1979; Bartlett et al. 1984; Valentine and Bruce 1986; Valentine and Ferrara 1991; Vokey and Read 1992). It has also been established that a larger motion recognition advantage is obtained from distinctive motion than from typical motion (Lander and Chuang 2005; Butcher 2009). Lander and Chuang (2005) found that the more distinctive or characteristic a person’s motion was rated to be, the more useful a cue to recognition it was. This finding can be considered within Valentine’s (1991) multidimensional face space model of facial recognition, which is often used to explain the spatial distinctiveness effect. Because faces are on the whole highly homogeneous, most faces are perceived as similar to a prototype or ‘typical’ face; their representations therefore cluster close to the prototype in face space, making them harder to differentiate from each other, whereas distinctive faces, positioned away from this cluster, are easily recognized.
A similar theoretical explanation could be applied to moving faces, whereby faces in the centre of the space move in a typical manner. Consequently, faces that exhibit distinctive facial motions could be located away from the centre of the space, making them easier to recognize than faces displaying typical motion. It is important here to consider that distinctiveness in facial motion may refer to (1) a motion characteristic or typical of a particular individual; (2) an odd motion for a particular individual to produce; or (3) a generally odd or unusual motion. Also, it may be that the spatial and temporal distinctiveness of faces are in some way related. For instance, spatially distinctive faces might naturally have more distinctive movements, a notion that should itself be addressed in future research.
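The face space account can be caricatured numerically: if each face is a point in a feature space, distinctiveness is distance from the prototype (the mean face). The sketch below assumes a hypothetical two-dimensional space with invented coordinates; the same logic would apply whether the dimensions code spatial features or motion parameters:

```python
import numpy as np

def distinctiveness(face, all_faces):
    """Distance of one face from the prototype (the mean of all
    faces) in a hypothetical face space."""
    prototype = all_faces.mean(axis=0)
    return float(np.linalg.norm(face - prototype))

# Three 'typical' faces clustered near the centre of the space, and
# one distinctive face lying well away from the cluster.
faces = np.array([[0.9, 1.1], [1.0, 1.0], [1.1, 0.9], [3.0, 3.0]])
scores = [distinctiveness(f, faces) for f in faces]
# The outlying face receives the highest distinctiveness score.
```

On this account, the clustered faces are hard to tell apart because their distances from the prototype (and from each other) are small, while the outlier is easily individuated.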
Having clearly demonstrated a moving recognition advantage, it is important to investigate the theoretical basis of this effect. Two accounts have been proposed by O’Toole, Roark, and Abdi (2002): the representation enhancement hypothesis and the supplemental information hypothesis. The representation enhancement hypothesis suggests that facial motion aids recognition by facilitating the perception of the three-dimensional structure of the face. It posits that motion enhances the quality of the structural information available from a human face, over and above the benefit of merely seeing the face from many static viewpoints (Pike et al. 1997; Christie and Bruce 1998; Lander et al. 1999). As this mechanism does not depend on any previous experience with an individual face, it predicts that motion should also aid recognition of previously unfamiliar faces, a notion discussed later in this chapter.
In contrast, the supplemental information hypothesis (O’Toole et al. 2002) assumes that we represent the characteristic facial motions of an individual’s face as part of our stored representation of that individual. For an individual’s characteristic facial motions to be learnt, experience with that face is needed – the face must be familiar. ‘Characteristic motion signatures’ are learnt over time, allowing a memory of the facial motions a person typically exhibits to be stored as part of their facial representation. Therefore, once motion information for an individual has been integrated into the representation of their face, this information can be retrieved and used to aid recognition of that face.
When considering the theoretical basis of the motion recognition advantage from a cognitive and representational perspective, a number of studies support the idea that facial motion becomes intrinsic to a familiar individual’s face representation. For example, Knappmeyer, Thornton, and Bülthoff (2003) used two synthetic heads, each animated by the movement of a different volunteer. Participants became familiar with either head A animated with motion from volunteer A, or head B with motion from volunteer B. In the test phase, an animated head constructed by morphing the two synthetic heads (A and B) was shown, and participants were asked to identify whose head it was. Participants’ identity judgements were biased by the motion they had originally learnt, demonstrating that representations of an individual’s characteristic facial motions are learnt and become inherent to that individual’s face representation.
Repetition priming studies (Lander and Bruce 2004; see also Lander et al. 2009) have also produced evidence consistent with the view that dynamic (motion) information is inherent to the face representation of a particular individual. Repetition priming is the facilitation demonstrated at test when the to-be-recognized item (here a face) has been encountered at some point prior to the test (Lewis and Ellis 1999). Such priming effects have previously been demonstrated for words (Jackson and Morton 1984) and objects (Warren and Morton 1982; Bruce et al. 2000), as well as for familiar faces (Bruce and Valentine 1985). Repetition priming has been used to probe the nature of the representations underlying face recognition (e.g. Ellis et al. 1996; Ellis et al. 1997): when priming is sensitive to some change in the form of the faces between study and test, that parameter may be intrinsic to the representations that mediate face recognition. In the prime phase of Lander and Bruce’s (2004) experiments, participants were presented with a series of famous faces and asked to name or provide some semantic information about each person. Half of the faces were presented in static form and half in motion. In the test phase, participants were presented with a series of faces and asked to judge whether each face was familiar or unfamiliar. Lander and Bruce (2004) found that, even when the same static image was shown in the prime and test phases, a moving image primed more effectively than a static image (Experiment 1).
This finding was extended (Experiment 2) to show that a moving image remains the more effective prime, compared to a static image, when moving images are used in the test phase. Significantly, supporting the notion of ‘characteristic motion signatures’ inherent to a person’s face representation, Lander and Bruce (2004) also found that the largest priming advantage came from naturally moving faces, rather than from faces shown in slow motion (Experiment 3). However, viewing the same moving facial sequence at prime as at test produced more priming than using different moving images (Experiment 4).
3.5 Dynamic information for unfamiliar face learning
O’Toole et al.’s (2002) first explanation of the advantages of dynamic presentation should predict benefits for unfamiliar face recognition too. However, in contrast to these intriguing effects on the identification of familiar faces, the effect of motion on the learning of unfamiliar faces is much less clear. Early work by Christie and Bruce (1998) found no advantage for either rigid or non-rigid motion in face learning. In this incidental learning task, participants were shown faces either as a moving computer-animated display or as a series of static images, and were asked to decide whether they thought each person shown studied arts or science subjects at university. The number of frames in the moving and static conditions was equated in the learning phase. The motion involved was either non-rigid (expression changes) or rigid (head nodding or shaking). At test, participants saw either moving sequences or a single static image (so the number of frames was not equated at test) and were asked which faces belonged to people seen earlier in the arts/science task. There was no benefit of studying faces in motion on the subsequent recognition task, although there was a slight benefit of testing faces in (rigid) motion, compared with static images. In line with this finding, Schiff, Banka, and De Bordes Galdi (1986) found an advantage for testing recognition memory for unfamiliar faces using a moving sequence rather than a static ‘mug-shot’ photograph. These findings are comparable with studies of familiar faces (Knight and Johnston 1997; Lander et al. 1999; Lander et al. 2001), which also found a beneficial effect of motion at test.
Despite this early work, a number of studies have found an advantage for learning faces from moving sequences. For example, Pike, Kemp, Towell, and Phillips (1997) filmed actors rotating on a motorized chair, illuminated from a single light source. In the learning phase, participants were asked to try to learn the identity of previously unfamiliar faces from either a dynamic sequence (a 10-second clip), multiple static images (5 images selected from the moving sequence, each presented for 2 seconds), or a single static image (presented for 10 seconds). The dynamic sequence showed the target initially facing the video camera and then undergoing a full 360-degree rotation of the chair. At test, participants viewed a single (full-face) static image, different from any shown in the learning phase, and were asked to decide whether the face had been present in the earlier learning phase. There was a significant advantage for faces learned via a coherent moving sequence. Bruce and Valentine (1988) reported a similar trend in an experiment comparing learning from video sequences of the target faces speaking, nodding, etc. with learning from sequences of static images. Again, test images were single images taken from a different viewpoint, on a different occasion. Performance was best when the faces were learned via a moving sequence, although the difference between the moving and static conditions failed to reach significance, which the authors attributed to the variability of performance on this task across the participant population.
In later follow-up work, Lander and Bruce (2003) conducted four experiments investigating the usefulness of rigid (head nodding, shaking) and non-rigid (talking, expressions) motion for establishing new representations of previously unfamiliar faces. Viewing a face in motion led to more accurate face learning than viewing a single static image (Experiment 1). The advantage for viewing the face moving rigidly seemed to be due to the different angles of view contained in these sequences (Experiment 2). However, the advantage for non-rigid motion was not simply due to multiple images (Experiment 3) and was not specifically linked to forwards motion, as it also extended to reversed sequences (Experiment 4). Thus, although there are clear beneficial effects of motion for face learning, they do not seem to be due to the specific dynamic properties of the sequences shown. Instead, the advantage for non-rigid motion may reflect increased attention to faces moving in a socially important manner.
Finally, Lander and Davies (2007) investigated the impact of facial motion as a previously unfamiliar face becomes known. We presented participants with a series of faces, each preceded by a name, and asked them to learn the names for the faces. When participants felt they had learnt the names, they continued to the recognition phase, in which they were presented with the same faces (using the same presentation method as in the learning phase) and asked to name each individual. If any names were incorrectly recalled, the learning phase was repeated and the recognition test taken again; this procedure was repeated until the participant correctly named all 12 faces shown. In the test phase, participants were presented with 48 degraded faces: 24 as single static images and 24 moving, with each moving face presented for 5 seconds. Participants were informed that some would be learnt faces and some would be ‘new’ faces, for which they had not learnt names, and were asked to name each face or respond ‘new’, providing a response on every trial. Results suggested that facial motion learning was rapid, and that the beneficial effect of motion did not depend strongly on how long the face had been seen for. Rather, there was support for the idea of rapidly learnt characteristic facial motion patterns: an advantage for recognizing a face in motion (at test) emerged only when the face had been learnt moving. Conversely, when the face was learnt as a static image, there was no advantage for recognizing moving faces compared with a static image. Indeed, it seems that participants were able to extract and encode dynamic information even from very short moving clips of 5 seconds.
Furthermore, Experiment 2 showed that the beneficial effect of motion remained stable despite prolonged viewing and learning of the face identity. In this experiment, participants were assigned to one of four groups. Group 1 viewed episode 1 of a TV drama before the test phase, group 2 viewed episodes 1 and 2, group 3 episodes 1 to 3, and group 4 episodes 1 to 4. Each episode was 30 minutes long. In the test phase, participants viewed moving and static degraded images of the characters and were asked to identify them by character name or other unambiguous semantic information. Although recognition of characters from the TV drama improved as the number of episodes viewed increased, the relative importance of motion information did not increase with a viewer’s experience with the face (O’Toole et al. 2002). The size of the beneficial effect remained relatively stable across time, demonstrating how rapidly motion information can be integrated into a face representation, through familiarization with the to-be-recognized face, and utilized at recognition.
To summarize, the role of movement in building face representations is somewhat unclear. Christie and Bruce (1998) found no benefit of learning faces moving either non-rigidly or rigidly. In contrast, Pike et al. (1997) and Lander and Bruce (2003) found that learning faces in rigid motion did subsequently help participants recognize the faces more accurately, compared with faces originally presented as a single static image or as a series of statics. It is clear, however, that as a face moves from being unfamiliar to familiar, motion information becomes an important cue to identity (Lander and Davies 2007), although how this process unfolds remains unknown. Further work is needed to investigate the familiarization process and to evaluate the role of motion in building face representations.
3.6 Practical considerations
The effects we have reviewed have practical as well as theoretical implications. From an applied perspective, it has become increasingly important to understand how human observers process moving faces. Facial ‘animation’ has become a developing computer technology, highly important in the entertainment industry (for example Parke and Waters 1996). Animation techniques also have wider impact, for example allowing the construction of realistic dynamic faces useful in the planning of reconstructive facial surgery and in forensic medicine (Alley 1999). A better understanding of how dynamic information is processed by the human observer should help the development of face animation systems, as well as giving a practical estimate of when (and why) seeing a face move can aid the recognition of identity.
With Closed Circuit Television (CCTV) surveillance systems now commonplace in the UK, moving video footage is often used as a source of evidence in the criminal justice process. Often the captured footage is of poor ‘grainy’ quality, with additional image size, lighting, and focus problems (Aldridge and Knupfer 1994). Typically the role of the police, witnesses, and jurors alike is either to identify the (familiar) target from the footage or to ‘match’ the viewed person to a (captured) suspect. Experiments described in this chapter suggest that the recognition of known faces from degraded video footage is significantly better when the image is viewed moving rather than static. It seems likely that viewing moving CCTV footage will help to maximize the chances of an observer recognizing a known person. This suggestion also has implications for the design and installation of CCTV systems, as many current systems do not capture continuous motion but operate on a time-lapse basis. There may be some very real benefits to be gained from installing ‘continuous’ motion systems, although further work is needed to establish the extent of this potential benefit under such circumstances.
While often it is important to reveal the identity of people shown in video sequences, sometimes attempts must be made to conceal this. In the UK, for example, documentary programmes often show people who for various legal or security reasons should not be identifiable to viewers of the programme. Sometimes faces are concealed by pixellating the face area – presenting the face as a small number of square pixels whose grey levels flicker as the image moves on the screen. More recently some television companies have been using blurring rather than pixellation to conceal identity. While these effects on the surface appear to disguise information which could specify individual identity, our research has shown that familiar faces can quite often be recognized from such image sequences, with moving sequences giving very much higher recognition than static ones (Lander et al. 2001). It may be extremely difficult for a film editor unfamiliar with a person shown on the film to judge appropriate levels of image degradation to protect a person from recognition by someone who knows them well. We recommend that the only certain way to conceal identities of faces in moving sequences is to cover them completely with an opaque block.
3.7 Theoretical interpretations
So, it seems that non-rigid movement patterns – either of faces generally or of specific faces – aid the recognition of identity. How might this ‘dynamic’ information be stored in memory? One possibility is that the motion trace is quite distinct from the static form-based representation. If so, dynamic information may feed into the face identity system either directly or via other aspects of face processing where dynamic information is known to be important, for example expression and/or visual speech processing. However, given our current understanding of face processing, it is difficult to conceptualize how dynamic information from expression and/or visual speech processing could play a role in identity processing, as we now explain in a little more detail.
Bruce and Young (1986) proposed that expression and visual speech processing operate independently of each other and of face identification, and all are processed in parallel from a viewed face (see Figure 3.1). There is a considerable body of converging evidence to support this suggestion of independence.
Figure 3.1 A functional model for face recognition (Bruce and Young 1986)
Firstly, regarding expression processing and identity processing, evidence comes from prosopagnosic patients who are unable to recognize familiar faces, instead typically identifying the person from their voice or gait (Damasio et al. 1990). Bruyer et al. (1983) reported the case of ‘Mr W’, who could accurately perceive and interpret expressions but was unable to recognize familiar faces (see also Shuttleworth et al. 1982; Schweinberger et al. 1995). Conversely, a number of non-prosopagnosic patients have been found who are impaired at facial expression judgements but have no problems identifying familiar people (Kurucz et al. 1979; Etcoff 1984; Parry et al. 1991; Humphreys et al. 1993; Young et al. 1993). There thus seems to be a double dissociation between facial expression processing and identity processing, supporting the notion of independence. Further support comes from Young, McWeeny, Hay, and Ellis (1986), who conducted a matching task with ‘normal’ participants. Participants were required to decide as quickly as possible whether two simultaneously presented faces belonged to the same person (identity matching) or showed the same expression (expression matching). Half of the stimulus faces were familiar to the participants and half were unfamiliar. For the identity matching task, reaction times were significantly faster with familiar faces, but there was no effect of familiarity in the expression matching task.
These results clearly support the view, proposed by Bruce and Young (1986), that expression processing and identity processing are carried out independently.
Secondly, support for independence between visual speech processing and identity processing comes from a study by Campbell, Landis, and Regard (1986). They reported the case of a prosopagnosic patient, ‘Mrs D’, who despite being severely impaired at identifying familiar faces performed entirely normally on all visual speech processing tasks. In contrast, another patient, ‘Mrs T’, was unable to perform these speechreading tasks but was unimpaired at recognizing faces or expressions. This pattern of impairments indicates a double dissociation between visual speech processing and identity processing. Campbell, Brooks, De Haan, and Roberts (1996) also found that while matching judgements based on identity were significantly affected by familiarity (reaction times to familiar faces were significantly faster than to unfamiliar faces), no such effect of familiarity was found when the matching task involved decisions about visual speech (Experiment 1). These results again confirm that visual speech decisions are relatively insensitive to face familiarity.
In summary, then, there is considerable evidence that both expression processing and visual speech processing operate independently of identity processing. However, most of the studies supporting independence have used static images of faces as stimuli. Here, we are concerned with the importance, for identity processing, of dynamic information provided by expression and/or visual speech processing. It is difficult to address this issue using static stimuli; dynamic faces should be used instead. Indeed, a number of recent studies using dynamic stimuli have indicated that there may be some subtle links between these different types of processing. For example, Walker, Bruce, and O’Malley (1995) used the McGurk effect to examine the claims of independence between identity and facial speech processing. The role of familiarity in speechreading was studied by manipulating the familiarity of the faces used to create the McGurk stimuli: participants were either familiar or unfamiliar with the people ‘speaking’ the syllables. The faces and voices used were also either congruent (from the same person) or incongruent (from different people, some gender-matched, some not). Participants who were familiar with the people reported significantly fewer expected combination responses than participants who were unfamiliar with the seen face (regardless of whether the face and voice were congruent or incongruent). A similar familiarity effect was found for the expected blend responses when the seen face and heard voice were incongruent (from different people).
So, when the face and voice came from different but familiar people, participants rarely reported McGurk blend illusions (perceiving ‘da’ from a seen ‘ga’ paired with an auditory ‘ba’), but when the same materials were shown to participants unfamiliar with the faces, McGurk blend illusions were common even when the face and voice belonged to people of different genders. These results do not support the dissociation between facial identity and facial speech processing found previously using static stimuli (see Campbell et al. 1986; de Gelder et al. 1991, outlined earlier in this chapter) but show instead that audiovisual speech integration can be influenced by signals from the identity system.
It is clear that speakers show systematic individual variations in the articulation of phonemes. These idiosyncrasies are evident in facial speech as well as auditory speech (see Montgomery and Jackson 1983). Evidence suggests that speech perception is affected by familiarity with a speaker’s voice (Nygaard et al. 1994). Similarly, Lander and Davies (2008) found that speechreading performance is influenced by face familiarity. In this experiment, we first measured participants’ baseline speechreading performance with unfamiliar faces. Next, participants were familiarized with the face and voice of either the same or a different speaker, or instead took part in a word puzzle. Speechreading performance was measured again, before participants completed a further period of familiarization (or puzzle completion) and a final speechreading task. Speechreading performance increased overall with practice, but it increased significantly more as participants became increasingly familiar with the same speaker. Our findings demonstrate the importance of talker-specific variations or other instance-based characteristics, and suggest that these are a useful source of information for speechreading.
Recently there have also been demonstrations that identity can influence facial expression processing as well as facial speech processing. Schweinberger and Soukup (1998) used a Garner (1974) interference paradigm to test the independence between face identification and facial speech processing, and between face identification and expression analysis. Consistent with the Bruce and Young framework, they observed that responses based upon facial identity were unaffected by variation in expression or facial speech; however, responses based upon facial expressions and facial speech were affected by variations in identity, suggesting that the identity of a face can influence the analysis conducted within these other routes. This asymmetric interference of identity onto expressions, but not expressions onto identity, was replicated by Schweinberger, Burton, and Kelly (1999).
These recent findings suggest that there may be some moderation of the facial expression and facial speech analysis routes on the basis of facial identity (see also Baudouin et al. 2000). Importantly, though, the facial identification route itself appears uninfluenced by variations in expression or speech. This affirms the position of Bruce and Young (1986) that the task of face recognition is logically and functionally independent of the other uses made of facial information, which has consequences for the ways in which face recognition can be studied and explained. Importantly for our purposes here, however, it makes it appear unlikely that the source of the effects of dynamic information on identification lies within the expression and facial speech systems, since there is no evidence that these feed into the person identification pathway.
Instead it seems that dynamic information may in some way be represented within the identification system itself. The Bruce and Young (1986) model and its more recent developments (Burton et al. 1990; Burton et al. 1999) have assumed that the face is one of a number of means by which more abstract person identities can be established. Voices and written names are the most frequently mentioned other access routes, and it has occasionally been suggested that ‘gait’ forms a further means of access (though gait patterns alone seem to be relatively poor cues to identity – see Burton, Wilson, Cowan, and Bruce 1999). It is possible that the dynamic information associated with a person’s expressive and speech movements is part of some more general memory of the way people move. On this view, facial dynamics would be a discrete source of information about personal identity, like voices and written names, but not incorporated within the face representation system itself. Alternatively, if dynamic information is stored within the face identity system, it may be linked to the static form-based face representations, or may be intrinsically incorporated into the representations themselves (cf. dynamic representations in Freyd 1987; 1993). If the representations mediating face recognition are dynamic (with the temporal dimension inextricably embedded in the representation; see Freyd and Pantzer 1995), then recognition from a static image should be thought of as a ‘snapshot’ within an essentially dynamic process.
Finally, when considering the theoretical basis of the moving recognition advantage, it is interesting to note a possible dissociation between the ability to recognize a face from the motion it produces and the ability to recognize it from a static image. Steede, Tree, and Hole (2007) reported the case study of a developmental prosopagnosic patient, CS. Despite being impaired at recognizing static faces, CS was able to discriminate effectively between different dynamic identities, and could learn the names of individuals on the basis of their facial movement information (at levels equivalent to control subjects). This case study indicates a possible dissociation between the cognitive mechanisms involved in recognizing a static face and those involved in recognizing a dynamic face. This research is supported by neuroimaging studies demonstrating functional separation of motion and structural aspects of face perception in humans (Haxby et al. 2002). Haxby et al. (2002) found that facial movements activated the superior temporal sulcus (STS) while the more shape-based aspects of the face activated the fusiform gyrus. Further neuroimaging studies have revealed differential activations for processing moving and static face information (Schultz and Pilz 2009). Based on such neuroimaging studies, O’Toole et al. (2002) proposed a ‘two-route’ model of face recognition that could explain why facial motion information aids recognition when other stimulus cues, e.g. spatial information and pigmentation, are absent. O’Toole et al. (2002) argued that the moving aspects of a face may be encoded and represented separately from its static-based aspects.
However, Schultz and Pilz (2009) have provided evidence that spatial and temporal aspects of a face are processed in an integrative manner. They found that for most of the classic face-sensitive areas (bilateral fusiform gyrus, left inferior occipital gyrus, and the right superior temporal sulcus [STS]) the response to dynamic faces was higher than to static faces, with the STS the region most sensitive to moving faces. Thus, there is evidence that both motion- and form-related areas participate in the processing of moving faces, with higher brain activation for moving than for static faces.
Despite support for a potential dissociation between the processing of static and moving faces, research findings are mixed. Steede et al. (2007) suggested that CS could use motion as a cue to identity even when impaired at static face recognition. However, Lander, Humphreys and Bruce (2004) found that an acquired prosopagnosic patient, HJA, was unable to use motion either overtly or covertly as a cue to facial identity. In Experiment 1, HJA attempted to recognize the identity of dynamic and static famous faces. He was severely impaired in his ability to recognize identity, and was not significantly better at recognizing moving faces than static ones. To test HJA's ability to learn face–name pairings, a second experiment was conducted using an implicit face recognition task, in which HJA was asked to learn true and untrue names for famous faces shown in either a moving clip or a static image. HJA found this a difficult task and was no better with moving faces or with true face–name pairings. Some prosopagnosic patients have previously been found to learn true face–name pairings more accurately and efficiently than untrue ones, a form of covert recognition (de Haan et al. 1987). A final experiment demonstrated that HJA was able to decide whether two sequentially presented dynamic unfamiliar faces had the same or different identities. He was better at this task with moving than with static images, and his performance with moving stimuli was comparable to that of control participants. His good performance on this matching task demonstrates that HJA retains sufficient motion-processing abilities to match dynamic facial signatures, yet insufficient abilities to store, recognize, or learn facial identity on the basis of facial movements.
3.8 Future research and conclusions
Our current and future programme of research aims to tease apart the different theoretical interpretations of the effects of dynamic information on face recognition.
Some of these questions could be answered using the kind of synthetic animated face displays that Christian Benoît was developing at the time of his death (e.g. Le Goff and Benoît 1997; see also several contributions to this volume). For example, what would happen if we showed one face displaying someone else's movements? Would Al Gore's face be easier to recognize if animated with movements derived from Bill Clinton's face? Here, static-based and dynamic cues to identity would be placed in conflict, allowing us to investigate their relative importance. Using high-quality three-dimensional face models derived from Cyberware scanners and animated with sophisticated models of expressive and speech movements, it should be possible to ask such questions. Animated head models should also allow us to investigate whether the benefits of movement for face identification depend upon the face having been learned in motion. Most of us are familiar with famous faces from past centuries that we have seen only in portraits or photographs. Would Abraham Lincoln be easier to recognize as an animated model than in conventional portrait form? If no effect were found, this would further support the idea that dynamic benefits do not arise from a very general facilitation of the identification system by patterns of natural movement, but rather reflect specific knowledge about characteristic individual facial gestures. Such experiments could help us to understand the ways in which patterns of movement in facial expressions and facial speech help us to retrieve the identities of faces.
To conclude, while faces are complex, mobile surfaces which change both rigidly and non-rigidly when gesturing, expressing, and talking, much past research on face perception has ignored this mobility and treated the face as a static snapshot of facial form. Here we have reviewed evidence suggesting that dynamic information provides an important source of information for expression processing, for speech perception, and for the identification of faces. Future research will help us to understand the way in which dynamic information from the face is represented in memory and the precise mechanism by which it facilitates the retrieval of other information about personal identity.