The inferior temporal cortex (ITC) is the highest echelon within the visual stream concerned with processing visual shape information. The Felleman and Van Essen diagram (Chapter 1, Figure 1.5) places the hippocampus at the top. While visual responses can be elicited in the hippocampus, people with bilateral lesions to the hippocampus can still see very well. A famous example is the patient known as H. M., who had no known visual deficits but whose inability to form new memories gave rise to the entire field of memory studies. The hippocampus is not a visual area and instead receives inputs from all sensory modalities (Chapter 4).
We have been traveling through the wonderful territory of the visual cortex, examining the properties of different brain areas and neural circuits, learning how animals and their neurons respond to visual stimuli and what happens when different parts of the visual cortex are lesioned or artificially stimulated. It is now time to put all this biological knowledge into a theory of visual recognition and to instantiate this theory in a computational model that can see and interpret the world. En route to this goal, we start here by discussing how scientists describe neural circuits using computational models, and we define the basic properties of neural networks.
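To make the notion of a model neuron concrete, here is a minimal sketch (our own illustration, not code from any particular study; all names and numerical values are arbitrary) of the canonical rate-based unit underlying most neural network models: the output firing rate is a rectified, weighted sum of the input rates.

import numpy as np

def rate_neuron(inputs, weights, threshold=0.0):
    # Firing rate of a single model neuron: a rectified weighted sum.
    drive = np.dot(weights, inputs) - threshold  # total synaptic drive
    return max(drive, 0.0)                       # firing rates cannot be negative

# Illustrative example: three presynaptic inputs with mixed
# excitatory (positive) and inhibitory (negative) synaptic weights.
presynaptic_rates = np.array([1.0, 0.5, 2.0])
synaptic_weights = np.array([0.8, -0.3, 0.4])
print(rate_neuron(presynaptic_rates, synaptic_weights, threshold=0.2))  # -> 1.25

Chaining many such units, with the outputs of one layer serving as the inputs to the next, yields the kind of layered network whose basic properties are defined in this chapter.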
By the 1950s, a wealth of behavioral experiments had characterized many phenomenological aspects of visual perception that begged for a mechanistic explanation (Chapter 3). Lesion studies had provided a compelling case that damage to circumscribed brain regions led to specific visual processing deficits (Chapter 4). These lesion studies pointed to specific brain areas to investigate visual processing, especially the primary visual cortex in the back of the brain. In addition, the successful use of microelectrode electrical recordings had led to direct insights about the function of neurons within the retinal circuitry (Chapter 2). The time was ripe to open the black box of the brain and begin to think about how vision emerges from the spiking activity of neurons in the cortex.
We want to understand the neural mechanisms responsible for visual cognition, and we want to instantiate these mechanisms in computational algorithms that match and perhaps even surpass human performance. In order to build such biologically inspired visually intelligent machines, we first need to define visual cognition capabilities at the behavioral level. What types of shapes can be recognized, and when and how? Under what conditions do people make mistakes during visual processing? How much experience with the world, and what type of experience, is required to learn to see? To answer these questions, we need to quantify human performance in well-controlled visual tasks. A discipline with the picturesque and attractive name of psychophysics aims to rigorously characterize, quantify, and understand behavior during cognitive tasks.
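A staple of this kind of quantification is the psychometric function, which relates performance to a physical stimulus parameter. The sketch below is a toy example of our own (the data points and starting parameter values are made up for illustration): it fits a logistic psychometric function to detection accuracies in order to estimate a perceptual threshold.

import numpy as np
from scipy.optimize import curve_fit

def psychometric(x, x0, k):
    # Logistic curve rising from 0.5 (chance in a two-choice task) to 1.0;
    # x0 is the stimulus level yielding 75% correct responses.
    return 0.5 + 0.5 / (1.0 + np.exp(-k * (x - x0)))

contrast = np.array([0.01, 0.02, 0.04, 0.08, 0.16, 0.32])   # stimulus intensities
p_correct = np.array([0.52, 0.55, 0.68, 0.85, 0.95, 0.99])  # fraction correct (toy data)

(x0, k), _ = curve_fit(psychometric, contrast, p_correct, p0=[0.05, 30.0])
print(f"estimated 75%-correct threshold: {x0:.3f}")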
In the previous chapter, we introduced the idea of directly comparing computational models with human behavior in visual tasks. For example, we assess how a model classifies an image against how humans classify the same image. In some tasks, the types of errors made by computational models can be similar to human mistakes. Here we will dig deeper into what current computer vision algorithms can and cannot do. We will highlight the enormous power of current computational models, while at the same time emphasizing some of their limitations and the exciting work ahead of us to build better models.
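One simple way to operationalize such a comparison, sketched below with made-up labels for eight toy images (none of this comes from the text), is to ask not only whether model and human accuracies match but also whether they err on the same images.

import numpy as np

ground_truth = np.array([0, 1, 1, 0, 2, 2, 1, 0])  # true class per image (toy data)
model_labels = np.array([0, 1, 2, 0, 2, 1, 1, 0])  # the model's classifications
human_labels = np.array([0, 1, 2, 0, 2, 2, 0, 0])  # a human observer's responses

model_errors = model_labels != ground_truth
human_errors = human_labels != ground_truth
print(f"model accuracy: {1 - model_errors.mean():.2f}")
print(f"human accuracy: {1 - human_errors.mean():.2f}")

# Do model and human tend to fail on the same images?
both = np.logical_and(model_errors, human_errors).sum()
either = np.logical_or(model_errors, human_errors).sum()
print(f"error overlap (Jaccard index): {both / either:.2f}")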
We want to understand how neuronal circuits give rise to vision. We can use microelectrodes and the type of neurophysiological recordings introduced in Section 2.7. In the case of the retina, it is evident where to place the microelectrodes to examine function. However, there are about 10¹¹ neurons in the human brain, and we do not have any tools that enable us to record from all of them. How do we figure out what parts of the brain are relevant for vision so we can study them at the neurophysiological level?
And there was light. Vision starts when photons reflected from objects in the world impinge on the retina. Although this may seem rather clear to us right now, it took humanity several centuries, if not more, to arrive at this conclusion. The compartmentalization of the study of optics as a branch of physics and visual perception as a branch of neuroscience is a recent development. Ideas about the nature of perception were interwoven with ideas about optics throughout antiquity and the Middle Ages. Giants of the caliber of Plato (~428–~348 BC) and Euclid (~300 BC) supported a projection theory according to which cones of light emanating from the eyes either reached the objects themselves or met halfway with other rays of light coming from the objects, giving rise to the sense of vision. The distinction between light and vision can be traced back to Aristotle (384–322 BC) but did not reach widespread acceptance until the investigations of the properties of the eye by Johannes Kepler (1571–1630).
As discussed in the last two chapters, there has been significant progress in computer vision. Machines are becoming quite proficient at a wide variety of visual tasks. Teenagers are not surprised by a phone that can recognize their faces. Self-driving cars are a matter of daily real-world discussion. Having cameras in the house that can detect a person’s mood is probably not too far off. Now imagine a world where we have machines that can visually interpret the world the way we do. To be more precise, imagine a world where we have machines that can flexibly answer a seemingly infinite number of questions about a given image. Let us assume that we cannot distinguish the answers given by the machine from the answers that a human would give; that is, assume that machines can pass the Turing test for vision, as defined in Section 9.1. Would we claim that such a machine can see? Would such a machine have visual consciousness?
Understanding how the brain works constitutes the greatest scientific challenge of our times, arguably the greatest challenge of all time. We have sent spaceships to peek outside our solar system, and we study galaxies far away to build theories about the origin of the universe. We have built powerful accelerators to scrutinize the secrets of subatomic particles. We have uncovered the secrets to heredity hidden in the billions of base pairs in DNA. But we still have to figure out how the three pounds of brain tissue inside our skulls works to enable us to do physics, biology, music, literature, and politics.
We have come a long way since our initial steps toward defining the basic properties of vision in Chapter 1. We started by characterizing the spatial and temporal statistics of natural images (Chapter 2). We summarized visual behavior – that is, how observers perceive the images around them (Chapter 3). Lesion studies helped define specific circuits in the cortex that are responsible for processing distinct types of visual information (Chapter 4). We explored how neurons in the retina, the thalamus, and the ventral visual cortex respond to a variety of different stimulus conditions (Chapters 2, 5, and 6).