Our ability to recognize the current environment determines our ability to act strategically, for example when selecting a route for walking, anticipating where objects are likely to appear, and knowing what behaviors are appropriate in a particular context.
Whereas objects are typically entities that we act upon, environments are entities that we act within or navigate towards: they extend in space and encompass the observer. Because of this, we often acquire information about our surroundings by moving our head and eyes, getting at each instant a different snapshot or view of the world. Perceived snapshots are integrated with the memory of what has just been seen (Hochberg, 1986; Hollingworth and Henderson, 2004; Irwin et al., 1990; Oliva et al., 2004; Park and Chun, 2009), and with what has been stored over a lifetime of visual experience with the world.
In this chapter, we review studies in the behavioral, computational, and cognitive neuroscience domains that describe the role of the shape of space in human visual perception. In other words, how do people perceive, represent, and remember the size, geometric structure, and shape features of visual scenes? One important caveat is that we typically experience space in a three-dimensional physical world, but we often study our perception of space through two-dimensional pictures.
Perhaps the most characteristic aspect of life, and a powerful engine driving adaptation and evolution, is the ability of organisms to interact with the world by responding adequately to sensory signals. In the animal kingdom, the development of a neural system that processes sensory stimuli, learns from them, and acts upon them has proven to be a major evolutionary advantage in the struggle for existence. It has allowed organisms to flee danger, actively search for food, and inhabit new niches and habitats at a much faster pace than ever before in evolutionary history.
The more complex animals became, the more extensive and specialized became their nervous system (Randall et al., 1997). Whereas some simple invertebrates such as echinoderms lack a centralized brain and have only a ring of interconnected neurons to relay sensory signals, vertebrates such as mammals have developed a highly specialized neural network, consisting of a central and a peripheral nervous system, in which each subunit has its own functional properties in controlling the body. While the spinal cord and brainstem are involved in controlling automated, internal vegetative processes such as heartbeat, respiration, and reflexes, the prosencephalon (the forebrain, containing the neocortex) has specialized in so-called higher-order functions, such as perception, action, learning, memory, emotion, and cognition (Kandel et al., 2000). The specialization of the neural control of movement is a major feature that distinguishes primates from other animals.
Human observers have a remarkable ability to identify thousands of different things in the world, including people, animals, artifacts, structures, and places. Many of the things we typically encounter are objects – compact entities that have a distinct shape and a contour that allows them to be easily separated from their visual surroundings. Examples include faces, blenders, automobiles, and shoes. Studies of visual recognition have traditionally focused on object recognition; for example, investigations of the neural basis of object and face coding in the ventral visual stream are plentiful (Tanaka, 1993; Tsao and Livingstone, 2008; Yamane et al., 2008).
Some recognition tasks, however, involve analysis of the entire scene rather than just individual objects. Consider, for example, the situation where one walks into a room and needs to determine whether it is a kitchen or a study. Although one might perform this task by first identifying the objects in the scene and then deducing the identity of the surroundings from this list, this would be a relatively laborious process, which does not fit with our intuition (and behavioral data) that we can identify the scene quite rapidly. Consider as well the challenge of identifying one's location during a walk around a city or a college campus, or through a natural wooded environment. Although we can perform this task by identifying distinct object-like landmarks (buildings, statues, trees, etc.), we also seem to have some ability to identify places based on their overall visual appearance.
In understanding visual processing, it is important to establish not only the local response properties for elements in the visual field, but also the scope of neural interactions when two or more elements are present at different locations in the field. Since the original report by Polat and Sagi (1993), the presence of interactions in the two-dimensional (2D) field has become well established by threshold measures (Polat and Tyler, 1999; Chen and Tyler, 1999, 2001, 2008; Levi et al., 2002). A large array of other studies has also examined such interactions with suprathreshold paradigms (e.g., Field et al., 1993; Hess et al., 2003). The basic story from both kinds of studies is that there are facilitatory effects between oriented elements that are collinear with an oriented test target and inhibitory effects elsewhere in the 2D spatial domain of interaction (although the detectability of a contrast increment on a Gabor pedestal also reveals strong collinear masking effects).
The present work extends this question to the third dimension of visual space as specified by binocular disparity, asking both what interactions are present through the disparity dimension and how these interactions vary with the spatial location of the disparate targets. Answering these questions is basic to the understanding of the visual processing of the 3D environment in which we find ourselves.
Seeing in 3D is a fundamental problem for any organism or device that has to operate in the real world. Answering questions such as “how far away is that?” or “can we fit through that opening?” requires perceiving and making judgments about the size of objects in three dimensions. So how do we see in three dimensions? Given a sufficiently accurate model of the world and its illumination, complex but accurate methods exist for generating the pattern of light that will strike the retina or cameras of an active agent (see Foley et al., 1995). The inverse problem, how to build a three-dimensional representation from such two-dimensional patterns of light impinging on our retinas or the cameras of a robot, is considerably more complex.
In fact, the problem of perceiving 3D shape and layout is a classic example of an ill-posed and underconstrained inverse problem. It is an underconstrained problem because a unique solution is not obtainable from the visual input. Even when two views are present (with the slightly differing viewpoints of each eye), the images do not necessarily contain all the information required to reconstruct the three-dimensional structure of a viewed scene. It is an ill-posed problem because small changes in the input can lead to significant changes in the output: that is, reconstruction is very vulnerable to noise in the input signal. Recovering the three-dimensional structure of the viewed scene uniquely is therefore extremely difficult and usually impossible.
When an object in the world moves relative to the eye, the image of the object moves across the retina. Motion that occurs on the retina is referred to as retinal motion. When objects move within our visual field we tend to move our eyes, head, and body to track them in order to keep them sharply focused on the fovea, the region of the retina with the highest spatial resolution. When the eyes move to track the object, there is no retinal motion if the tracking is perfect (Figure 10.1), yet we still perceive object motion. Retinal motion is therefore not the only signal contributing to motion perception. In this chapter, we discuss the problem of how retinal motion and eye movements are integrated for motion perception. After introducing the problem of representing position and motion in three-dimensional space, we will concentrate specifically on the topic of how retinal and eye-movement signals contribute to the perception of motion in depth. To conclude, we discuss what we have learned about how the combination of eye movements and retinal motion differs between the perception of frontoparallel motion and the perception of motion in depth.
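As a rough numerical illustration of this integration (a sketch, not the specific model developed in the chapter), head-centric target velocity can be approximated as a weighted sum of the retinal slip signal and an extra-retinal eye-velocity signal; the gain parameters and units below are assumptions for illustration only.

```python
def headcentric_velocity(retinal_slip_deg_s, eye_velocity_deg_s,
                         retinal_gain=1.0, extraretinal_gain=1.0):
    """Estimate head-centric target velocity (deg/s) by combining retinal
    slip with an extra-retinal (eye-movement) signal. With perfect pursuit
    the retinal slip is ~0, so the estimate comes entirely from the
    eye-movement signal; gains below 1 would model its underestimation."""
    return (retinal_gain * retinal_slip_deg_s
            + extraretinal_gain * eye_velocity_deg_s)

# Perfect pursuit of a target moving at 10 deg/s: zero retinal motion,
# yet the combined estimate is 10 deg/s.
print(headcentric_velocity(retinal_slip_deg_s=0.0, eye_velocity_deg_s=10.0))
```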
A headcentric framework for motion perception
Position (and motion) in the physical three-dimensional world can be described in a number of different ways. For example, it can be described in Cartesian coordinates (x, y, z) or in terms of angles and distances with respect to a certain origin.
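As a concrete illustration of these two descriptions (a minimal sketch; the axis conventions and units are assumptions, not those of the chapter), the function below converts a head-fixed Cartesian position into azimuth, elevation, and distance.

```python
import math

def cartesian_to_angles(x, y, z):
    """Convert a head-fixed Cartesian position (x right, y up, z straight
    ahead; any consistent length unit) into azimuth and elevation angles
    in degrees plus radial distance from the origin."""
    distance = math.sqrt(x**2 + y**2 + z**2)
    azimuth = math.degrees(math.atan2(x, z))            # left/right angle
    elevation = math.degrees(math.asin(y / distance))   # up/down angle
    return azimuth, elevation, distance

# A point 1 m straight ahead and 0.5 m to the right:
print(cartesian_to_angles(0.5, 0.0, 1.0))   # ~ (26.6 deg, 0.0 deg, 1.12 m)
```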
We perceive the world as three-dimensional. The inputs to our visual system, however, are only a pair of two-dimensional projections on the two retinal surfaces. As emphasized by Marr and Poggio (1976), it is generally impossible to uniquely determine the three-dimensional world from its two-dimensional retinal projections. How, then, do we usually perceive a well-defined three-dimensional environment? It has long been recognized that, since the world we live in is not random, the visual system has evolved and developed to take advantage of the world's statistical regularities, which are reflected in the retinal images. Some of these image regularities, termed depth cues, are interpreted by the visual system as depth. Numerous depth cues have been discovered. Many of them, such as perspective, shading, texture, motion, and occlusion, are present in the retina of a single eye, and are thus called monocular depth cues. Other cues are called binocular, as they result from comparing the two retinal projections. In the following, we will review our physiologically based models for three binocular depth cues: horizontal disparity (Qian, 1994; Chen and Qian, 2004), vertical disparity (Matthews et al., 2003), and interocular time delay (Qian and Andersen, 1994; Qian and Freeman, 2009). We have also constructed a model for depth perception from monocularly occluded regions (Assee and Qian, 2007), another binocular depth cue, but have omitted it here owing to space limitations.
Binocular vision provides important information about depth that helps us navigate in a three-dimensional environment and identify and manipulate 3D objects. The relative depth of any feature with respect to the fixation point can be determined by triangulating the horizontal shift, or disparity, between the images of that feature projected onto the left and right eyes. The computation is difficult because, in any given visual scene, there are many similar features, which create ambiguity in the matching of corresponding features registered by the two eyes. This is called the stereo correspondence problem. An extreme example of such ambiguity is demonstrated by Julesz's (1964) random-dot stereogram (RDS). In an RDS (Figure 7.1a), there are no distinct monocular patterns. Each dot in the left-eye image can be matched to several dots in the right-eye image. Yet when the images are fused between the two eyes, we readily perceive the hidden 3D structure.
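To make the triangulation concrete, here is a minimal sketch (not taken from the chapter) using the standard small-angle approximation, disparity ≈ I·Δd/D², where I is the interocular separation, D the fixation distance, and Δd the depth of the feature relative to fixation; the 6.5 cm interocular separation and the sign convention are assumptions.

```python
import math

def depth_from_disparity(disparity_deg, fixation_distance_m, interocular_m=0.065):
    """Approximate depth relative to the fixation point from horizontal
    disparity via the small-angle relation disparity ~= I * delta_d / D**2
    (disparity in radians). Positive disparity is treated as uncrossed,
    i.e. the feature lies beyond fixation."""
    disparity_rad = math.radians(disparity_deg)
    return disparity_rad * fixation_distance_m**2 / interocular_m

# 0.1 deg of uncrossed disparity while fixating at 1 m:
print(f"{depth_from_disparity(0.1, 1.0) * 100:.1f} cm beyond fixation")
```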
In this chapter, we will review neurophysiological data that suggest how the brain might solve this stereo correspondence problem. Early studies took a mostly bottom-up approach. An extensive amount of detailed neurophysiological work has resulted in the disparity energy model (Ohzawa et al., 1990; Prince et al., 2002). Since the disparity energy model is insufficient for solving the stereo correspondence problem on its own, recent neurophysiological studies have taken a more top-down approach by testing hypotheses generated by computational models that can improve on the disparity energy model (Menz and Freeman, 2003; Samonds et al., 2009a; Tanabe and Cumming, 2009).
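As a rough illustration of the disparity energy model referred to above, the sketch below implements a one-dimensional, position-shift version: two binocular simple cells in quadrature each sum their left- and right-eye Gabor filter outputs and square the result, and the complex cell adds the two squared responses. The filter parameters and stimuli are illustrative choices, not values from the cited studies.

```python
import numpy as np

def gabor(x, sigma=0.5, freq=2.0, phase=0.0, center=0.0):
    """One-dimensional Gabor receptive-field profile (x in degrees)."""
    return (np.exp(-(x - center)**2 / (2 * sigma**2))
            * np.cos(2 * np.pi * freq * (x - center) + phase))

def disparity_energy(left_img, right_img, x, preferred_disparity):
    """Binocular energy response of a complex cell whose right-eye
    receptive field is shifted by `preferred_disparity` degrees."""
    energy = 0.0
    for phase in (0.0, np.pi / 2):                    # quadrature pair
        f_left = gabor(x, phase=phase)
        f_right = gabor(x, phase=phase, center=preferred_disparity)
        simple = np.dot(f_left, left_img) + np.dot(f_right, right_img)
        energy += simple**2
    return energy

# Toy 1D "images": a bright bar whose right-eye copy is shifted by 0.25 deg.
x = np.linspace(-2, 2, 201)
left = np.exp(-x**2 / 0.02)
right = np.exp(-(x - 0.25)**2 / 0.02)
for d in (0.0, 0.25, 0.5):
    print(f"preferred disparity {d:.2f} deg -> energy "
          f"{disparity_energy(left, right, x, d):.2f}")
```

The response peaks near the stimulus disparity of 0.25 deg, illustrating the disparity tuning such a unit produces; as the text notes, however, this mechanism on its own still faces false matches in cluttered scenes.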
Studies of the evolution of animal signals and sensory behaviour have recently shifted from considering 'extrinsic' (environmental) determinants to 'intrinsic' (physiological) ones. The drive behind this change has been the increasing availability of neural network models. With contributions from experts in the field, this book provides a complete survey of artificial neural networks. The book opens with two broad, introductory-level reviews on the themes of the book: neural networks as tools to explore the nature of perceptual mechanisms, and neural networks as models of perception in ecology and evolutionary biology. Later chapters expand on these themes and address important methodological issues that arise when applying artificial neural networks to the study of perception. The final chapter provides perspective by introducing a neural processing system in a real animal. The book provides those new to the field with the foundations for implementing artificial neural networks, and identifies potential research areas for specialists.
The purpose of this chapter is to introduce the physical principles underlying models of the electrical activity of neurons. Starting with the neuronal cell membrane, we explore how its permeability to different ions and the maintenance by ionic pumps of concentration gradients across the membrane underpin the resting membrane potential. We show how the electrical activity of a small neuron can be represented by equivalent electrical circuits, and discuss the insights this approach gives into the time-dependent aspects of the membrane potential, as well as its limitations. It is shown that spatially extended neurons can be modelled approximately by joining together multiple compartments, each of which contains an equivalent electrical circuit. To model neurons with uniform properties, the cable equation is introduced. This gives insights into how the membrane potential varies over the spatial extent of a neuron.
A nerve cell, or neuron, can be studied at many different levels of analysis, but much of the computational modelling work in neuroscience is at the level of the electrical properties of neurons. In neurons, as in other cells, a measurement of the voltage across the membrane using an intracellular electrode (Figure 2.1) shows that there is an electrical potential difference across the cell membrane, called the membrane potential. In neurons the membrane potential is used to transmit and integrate signals, sometimes over large distances. The resting membrane potential is typically around -65 mV, meaning that the potential inside the cell is more negative than that outside.
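To make this concrete, here is a minimal sketch (with round illustrative numbers, not measured values) of the passive membrane equation that the equivalent-circuit description leads to, C_m dV/dt = (E_L − V)/R_m + I_e, integrated with a simple forward Euler step.

```python
# Passive (RC) membrane patch with a constant injected current.
C_m = 1.0      # membrane capacitance (nF)
R_m = 100.0    # membrane resistance (Mohm)
E_L = -65.0    # leak (resting) potential (mV)
I_e = 0.1      # injected current (nA), switched on at t = 0

dt, t_max = 0.1, 300.0                      # time step and duration (ms)
V = E_L                                     # start from rest
for step in range(int(t_max / dt)):
    dVdt = ((E_L - V) / R_m + I_e) / C_m    # mV per ms
    V += dt * dVdt

# V relaxes towards E_L + R_m * I_e = -55 mV with time constant
# tau = R_m * C_m = 100 ms.
print(f"V after {t_max:.0f} ms: {V:.2f} mV")
```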
So far we have been discussing how to model accurately the electrical and chemical properties of nerve cells and how these cells interact within the networks of cells forming the nervous system. The existence of the correct structure is essential for the proper functioning of the nervous system, and in this chapter we discuss modelling work that addresses the development of the structure of the nervous system. Existing models of developmental processes are usually designed to test a particular theory for neural development and so are not of such wide application as, for example, the HH model of nerve impulse propagation. We discuss several examples of specific models of neural development, at the levels of individual nerve cells and ensembles of nerve cells.
The scope of developmental computational neuroscience
Modelling of the development of the nervous system has been intense, but largely restricted to the development of the features of neurons and networks of neurons in specific cases. This means that computational theories of, for example, neural precursors, or stem cells, are not considered, although they could be. One long-established field of research, the elegant mathematical treatment of morphogenetic fields, provides possible mechanisms by which continuous gradients of molecules called morphogens can be read out to specify regions of the brain in early development (Turing, 1952; Meinhardt, 1983; Murray, 1993); it is conventionally regarded as the province of theoretical biology rather than of developmental computational neuroscience.
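Although such models fall outside the book's main scope, a toy illustration may help fix the idea of reading out a morphogen gradient; the exponential profile, thresholds, and region labels below are arbitrary assumptions, not a model from the cited literature.

```python
import numpy as np

# Steady-state profile of a morphogen produced at x = 0 that diffuses and
# degrades: an exponential gradient. Two hypothetical thresholds partition
# the tissue into discrete regions (the classic "French flag" readout).
x = np.linspace(0.0, 1.0, 101)        # position along the tissue (arbitrary units)
decay_length = 0.2
concentration = np.exp(-x / decay_length)

high, low = 0.5, 0.1                  # hypothetical readout thresholds
region = np.where(concentration > high, "A",
                  np.where(concentration > low, "B", "C"))

for xi in (0.0, 0.3, 0.8):
    i = int(round(xi * 100))
    print(f"x = {xi:.1f}  concentration = {concentration[i]:.2f}  region = {region[i]}")
```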
This book has been about the principles of computational neuroscience as they stand at the time of writing. In some cases we have placed the modelling work that we described in its historical context when we felt this would be useful and interesting. We now make some brief comments about where the field of computational neuroscience came from and where it might be going.
The development of computational modelling in neuroscience
The field of computational modelling in neuroscience has been in existence for almost 100 years. During that time it has gone through several stages of development. From the 1930s, researchers in the field of mathematical biophysics conceived of mathematical and physical models of biophysical phenomena. In the more neural applications, there was work on how networks of nerve cells could store and retrieve information through the adaptive tuning of the thresholds of selected nerve cells (Shimbel, 1950), on linking psychophysical judgements with the underlying neural mechanisms (Landahl, 1939), and on mathematical accounts of how nerve cells could act as logical elements and what the computational capabilities of a network of such elements were (McCulloch and Pitts, 1943). From the 1950s onwards there was interest in treating the brain as a uniform medium and calculating its modes of behaviour, such as the conditions under which it would support the propagation of waves of activity across it (Beurle, 1956). Much of this work laid the foundation for the field of artificial neural networks in the 1950s and 1960s.
In this chapter, we show how to model complex dendritic and axonal morphology using the multi-compartmental approach. We discuss how to represent an axon or a dendrite as a number of compartments derived from the real neurite's morphology. We discuss issues with measurement errors in experimentally determined morphologies and how to deal with them. Under certain assumptions, complex morphologies can be simplified for efficient modelling. We then consider how to match compartmental model output to physiological recordings and determine model parameters. We discuss in detail the techniques required for determining passive parameters, such as membrane resistance and capacitance over a distributed morphology. The extra problems that arise when modelling active membrane are also considered. Parameter estimation procedures are introduced.
Modelling the spatially distributed neuron
The basis of modelling the electrical properties of a neuron is the RC electrical circuit representation of passive membrane, consisting of a capacitor, leak resistor and a leak battery (Figure 2.14). Active membrane channels may be added by, for example, following the Hodgkin–Huxley approach. Further frameworks for modelling the myriad of ion channels found in neuronal membrane are covered in Chapter 5. If we are interested in voltage changes in more than just an isolated patch of membrane, we must consider how voltage spreads along the membrane. This can be modelled with multiple connected RC circuits. This approach is used widely and often referred to as multi-compartmental modelling or, more simply, compartmental modelling.
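A minimal two-compartment sketch of this idea follows (parameter values are illustrative round numbers, not fitted to any cell): each compartment is a passive RC circuit, neighbouring compartments are coupled through an axial resistance, and current is injected into the first compartment only.

```python
C_m = 0.1      # compartment membrane capacitance (nF)
R_m = 200.0    # compartment membrane resistance (Mohm)
R_a = 50.0     # axial resistance between compartments (Mohm)
E_L = -65.0    # leak reversal potential (mV)
I_e = 0.05     # current injected into compartment 0 only (nA)

V = [E_L, E_L]                              # membrane potential of each compartment
dt, t_max = 0.05, 200.0                     # time step and duration (ms)
for step in range(int(t_max / dt)):
    I_axial = (V[1] - V[0]) / R_a           # axial current flowing 1 -> 0 (nA)
    dV0 = ((E_L - V[0]) / R_m + I_axial + I_e) / C_m
    dV1 = ((E_L - V[1]) / R_m - I_axial) / C_m
    V[0] += dt * dV0
    V[1] += dt * dV1

# The injected compartment depolarises most; the voltage spreads to its
# neighbour in attenuated form, as expected for passive cable-like spread.
print(f"V0 = {V[0]:.2f} mV, V1 = {V[1]:.2f} mV")
```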
Understanding the nervous system of even the simplest of animals requires an understanding at many different levels, over a wide range of both spatial and temporal scales. We need to know at least the properties of the nerve cell itself and of its specialist structures such as synapses, how nerve cells become connected together, and what the properties of networks of nerve cells are.
The complexity of nervous systems makes it very difficult to theorise cogently about how such systems are put together and how they function. To aid our thought processes we can represent our theory as a computational model, in the form of a set of mathematical equations. The variables of the equations represent specific neurobiological quantities, such as the rate at which impulses are propagated along an axon or the frequency of opening of a specific type of ion channel. The equations themselves represent how these quantities interact according to the theory being expressed in the model. Solving these equations by analytical or simulation techniques enables us to show the behaviour of the model under the given circumstances and thus addresses the questions that the theory was designed to answer. Models of this type can be used as explanatory or predictive tools.
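As a minimal illustration of the kind of equation meant here (a generic sketch, not a model taken from this book), consider a two-state ion channel that opens at rate alpha and closes at rate beta: the single variable p, the fraction of channels open, obeys dp/dt = alpha(1 − p) − beta·p, and simulating it shows relaxation to the steady-state value alpha/(alpha + beta).

```python
alpha, beta = 0.5, 0.1          # hypothetical opening/closing rates (per ms)
p, dt = 0.0, 0.01               # all channels initially closed; time step (ms)
for step in range(int(50.0 / dt)):
    p += dt * (alpha * (1 - p) - beta * p)

print(f"open fraction after 50 ms: {p:.3f} "
      f"(steady state {alpha / (alpha + beta):.3f})")
```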
This field of research is known by a number of largely synonymous names, principally computational neuroscience, theoretical neuroscience or computational neurobiology.