Published online by Cambridge University Press: 01 June 2004
The emulation theory of representation is developed and explored as a framework that can revealingly synthesize a wide variety of representational functions of the brain. The framework is based on constructs from control theory (forward models) and signal processing (Kalman filters). The idea is that in addition to simply engaging with the body and environment, the brain constructs neural circuits that act as models of the body and environment. During overt sensorimotor engagement, these models are driven by efference copies in parallel with the body and environment, in order to provide expectations of the sensory feedback, and to enhance and process sensory information. These models can also be run off-line in order to produce imagery, estimate outcomes of different actions, and evaluate and develop motor plans. The framework is initially developed within the context of motor control, where it has been shown that inner models running in parallel with the body can reduce the effects of feedback delay problems. The same mechanisms can account for motor imagery as the off-line driving of the emulator via efference copies. The framework is extended to account for visual imagery as the off-line driving of an emulator of the motor-visual loop. I also show how such systems can provide for amodal spatial imagery. Perception, including visual perception, results from such models being used to form expectations of, and to interpret, sensory input. I close by briefly outlining other cognitive functions that might also be synthesized within this framework, including reasoning, theory of mind phenomena, and language.
1. Those interested in more technical details should see any of the many works that discuss KFs in detail, for example, Kalman (1960); Kalman and Bucy (1961); Gelb (1974); Bryson and Ho (1969); and Haykin (2001); for some discussion of applications of KFs (among other constructs) to understanding brain function, see Eliasmith and Anderson (2003).
2. It might be wondered what justification there is for assuming that the driving force can be predicted accurately. This is just by definition. It is assumed that the process is subject to external influences. Any influence that is completely predictable is a driving force; the rest of the external influence – whatever is not predictable – is process noise. So in a case where there were an “unpredictable” driving force, this would actually be part of the process noise.
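The definitional point in note 2 can be written out as a worked equation. This is only a sketch under assumed notation: the symbols $x_t$ (process state), $F$ (state-transition function), $d_t$ (driving force), and $n_t$ (process noise) are chosen here for illustration and are not taken from the text.

```latex
% Assumed notation: x_t = process state, F = transition matrix,
% d_t = driving force (fully predictable), n_t = process noise (zero mean).
\begin{align}
  x_t &= F\,x_{t-1} + d_t + n_t, \qquad \mathbb{E}[n_t] = 0, \\
  \hat{x}_t^{-} &= F\,\hat{x}_{t-1} + d_t .
\end{align}
% If the external influence u_t is only partly predictable, split it as
% u_t = \bar{u}_t + \tilde{u}_t, with \bar{u}_t the predictable part,
% and set d_t = \bar{u}_t, absorbing \tilde{u}_t into n_t.
```

On this reading the a priori estimate $\hat{x}_t^{-}$ can always include $d_t$ exactly, because whatever part of the external influence cannot be predicted has, by definition, already been counted as process noise.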
3. See discussion in Hutchins (1995; especially Ch. 3).
4. What gets suppressed is the overt performance. Interestingly, however, a great many other bodily events normally associated with overt performance, such as increases in metabolic activity, heart rate, and so on, accompany many kinds of motor imagery. For a review, see Jeannerod (1994). In this article, when I speak of motor commands being suppressed in favor of the processing of an efference copy, I mean only that the overt bodily movements are suppressed. It may even be that in some cases there is a small degree of muscular excitation, perhaps because the motor signals are not completely blocked.
5. It should be noted that Johnson's position here is not exactly the same as Jeannerod's, because he claims that this imagery is used in order to construct a final motor plan. But, as far as I can tell anyway, Johnson nevertheless is maintaining that it is imagery that is being used, and that this imagery is the result of the “simulated” operation of efferent motor areas, those involved in planning a movement. The details are complex, though, and Johnson's position may not be a good example of what I call the simulation theory.
6. The situation here is complex. It is not clear to what extent and under what conditions the MSS emulator adapts as a function of plant drift. While the case of phantom limb patients suggests that it can, other cases of paralysis suggest that this is not always so (Johnson 2000b). I will simply note that the emulation theory itself need not take a stand on whether, and under what conditions, emulators are malleable. I use the example of apparent malleability in the case of phantom limb patients to make the contrast between the emulation and the simulation theories clear. But that clarificatory role does not depend on the empirical issue of the conditions under which such malleability actually obtains.
7. A benefit of the emulation theory over the simulation theory is that it allows us to make sense of the difference between (a) things which we cannot move but do not feel paralyzed and (b) things which we cannot move and do for that reason feel paralyzed. The first group includes not only our own body parts over which we have no voluntary control, such as our hair, but also foreign things such as other people's arms, chairs and tables, et cetera. We cannot move these things, but the phenomenology of their not being voluntarily movable is not like that of a paralyzed part. If mere lack of ability to produce a motor plan accounted for the feeling of paralysis, then all of these things should seem paralyzed. On the emulation theory, the feeling of paralysis is the product of a mismatch between a motor plan and the resultant feedback, whether from the body or the emulator. Such a mismatch is possible only when we can produce a motor plan that mismatches the result of the attempt to effect that motor plan.
8. One difference is that Murphy was an actual robot, whereas the model discussed here is completely virtual.
9. Mel's Murphy is a very simple system working in a very constrained environment. More complex environments, including those with objects that move without the agent willing it, would be far less predictable. This is of course much of the reason why, in perception, the Kalman gain is set fairly high. I use Murphy because its simplicity makes it a good exemplar for introducing the basic ideas, and I simply note that real perceptual situations will require much more sophistication. For an example of the sort of complexities that a full version of this sort of mechanism would need to deal with, see Nolfi and Tani (1999) and the references therein.
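The link drawn in note 9 between unpredictability and a high Kalman gain can be illustrated with a minimal sketch. It assumes a scalar random-walk process with hypothetical noise variances. The function name `steady_state_gain` and its parameters are illustrative only and are not part of the Murphy model or of any library.

```python
def steady_state_gain(process_var, measurement_var, steps=200):
    """Iterate the scalar Kalman recursion for a random-walk state.

    Assumed toy model: x_t = x_{t-1} + process noise,
    observation = x_t + measurement noise.
    Returns the gain after `steps` iterations.
    """
    p = 1.0  # initial estimate variance (arbitrary starting value)
    k = 0.0
    for _ in range(steps):
        p_pred = p + process_var                 # prediction step inflates uncertainty
        k = p_pred / (p_pred + measurement_var)  # weight placed on the sensory residual
        p = (1.0 - k) * p_pred                   # uncertainty left after the correction
    return k


# A highly predictable environment versus one with large unpredictable change.
predictable = steady_state_gain(process_var=0.01, measurement_var=1.0)
unpredictable = steady_state_gain(process_var=10.0, measurement_var=1.0)
print(f"gain, predictable environment:   {predictable:.3f}")
print(f"gain, unpredictable environment: {unpredictable:.3f}")
```

With the measurement noise held fixed, the larger the process noise, the closer the gain moves toward 1, so the estimate leans more heavily on the sensory residual and less on the emulator's own prediction.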
10. The emulation theory predicts exactly Wexler et al.'s results. Thus, if it were the case that motor areas were not active during “active” visual imagery, or if it were the case that the specific nature of the motor command associated with the imagined movement (rotate right vs. rotate left, for example) were not recruited during such imagery, then this would be prima facie evidence against the emulation theory.
11. I call this kind of imagery amodal rather than multimodal because this sort of imagery, if it in fact exists, is not tied to any modality. But in a sense it is multimodal, because it can be used to produce a modal image in any modality so long as a measurement procedure appropriate to that modality is available. The expressions amodal and multimodal are used in many ways, and it may be that what I here am calling amodal imagery might be close to what some researchers have called multimodal imagery.
12. For example: If x is between a and b, and b is between a and c, is x necessarily between a and c? There is reason to think that such questions are answered by engaging in spatial imagery, but little reason to think that much in the way of specifically visual mock experience is needed, though of course it might be involved in specific cases.
13. What I have in mind here is the idea that the neurally implemented emulator represents states by things like firing frequencies, phases, and such. A neural pool that is representing the presence of a predator behind the rock by firing rapidly can be “directly measured” in the sense that other neural systems can be wired such as to sniff that pool's activation state, and hence be sensitive to the presence of the predator. A “measurement” of this state would yield a visual image of a rock, because the predator is not in the visual image, and hence the narrowly modal emulator would throw away relevant information.
14. Exactly how to understand such a system is not trivial. Understanding the “which” system as an attentional tagging mechanism is sufficient for present expositional purposes, but my suspicion, which I am not prepared to argue for here, is that it is a system that has richer representational properties, such as the constitution of basic object identity. Of course, the richer sort of mechanism, if there is one, will surely be based at least in part upon a simpler attentional tagging mechanism.
15. I owe this phrase to Ramesh Jain, who produced it during a talk at UCSD.