It is well known that sensory information on a number of modalities is arranged in a spatial “map” in biological brains, such that information from similar sensors arrives close together in the cortex. For example, much visual sensory information is represented in a map in the primary visual cortex, which is arranged roughly as the image coming in through the eyes. The human somesthetic cortex receives touch information arranged so that sensors which are close to each other on the body tend to be represented in close areas in the cortex. Similarly the bat auditory cortex receives information arranged by frequency, so that close frequencies produce a similar response.
A common property of these feature maps is that the representation scale is non-uniform: some areas are magnified when compared with others. For example, the area of the visual cortex which corresponds to the fovea is magnified much more than that for peripheral vision (in terms of angle on the retina). The cortical area given over to touch sensors in the lips and fingers is proportionally much greater than that given over to the back. The bat has a much more magnified representation for frequencies around its echo-location frequency than for other frequencies (Anderson and Hinton, 1981).
These magnified areas of feature maps all seem to correspond to receptors which are proportionally more important than others. Information from the fovea is required for discernment of detail in an object; information from the fingertips or lips is needed for fine manipulation of tools or food; while the detailed frequency differences around the bat's echo-location frequency are important if it is to find and catch its prey.
Spoken language can be regarded as the combination of two processes. The first is the process of encoding a message as an utterance. The second is the transmission process which ensures the encoded message is received and understood by the listener.
In this chapter I will argue that the clarity variation of individual syllables is a direct consequence of such a transmission process and that a statistical model of clarity change gives an insight into how such a process functions.
Clarity
We often do not say the same word the same way in different situations. If we read a list of words out loud we say them differently from when we produce them, spontaneously, in a conversation. Even within spontaneous speech there are wide differences in the articulation of the same word by the same speaker. If you remove these words from their context some instances are easier for a listener to recognise than others. The instances that are easier to recognise share a number of characteristics. They tend to be carefully articulated, the vowels are longer and more spectrally distinct and there is less coarticulation. These instances have been articulated more clearly than others. One extreme example of a clear instance of a word is when a speaker is asked to repeat a word because the listener does not understand it. For example:
A. Bread, Flour, Eggs, Margarine.
B. Sorry what was that last item?
A. MARGARINE.
The second instance of “margarine” will be significantly different acoustically from the first instance. It will be much more clearly articulated.
The relevance of information theory to psychology depends a bit on what you think psychology is about. Since this book is about the relationship between IT and the brain, it could be argued that the whole of it is relevant to psychology. If you are of the school that thinks the only real psychology is clinical, then IT has rather little to offer as yet, though it is making inroads even there: Germine (1993) describes a model of the physiology of the mind as an informational system. It is probably no coincidence that all three of the editors of this volume work in psychology departments. Our interests are all rather in the area of perception: how things may be represented in the brain and how those representations come to be formed. Much of the rest of the volume concerns such things: this part concerns some attempts to apply information theory to other aspects of behaviour.
Janne Sinkkonen (Chapter 13) argues that the amount of processing an input receives will depend upon its importance to the organism, which he identifies with its unexpectedness. He shows that this assumption leads to an identity of form in the equations used to describe the theoretical information content of the message and the resource allocation by the organism. The principal resource to be allocated is energy. An experiment to test these ideas with human participants is reported. A device is used that can measure the EMF generated from postsynaptic currents in the brain, which, potentially, indicate energy consumption. The participants are played a sequence of two tones, where one occurs with lower probability than the other.
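The link between unexpectedness and information content can be made concrete with the standard Shannon measure, in which an event of probability p carries -log2(p) bits. The sketch below applies it to a two-tone sequence of the kind described above. The probabilities 0.9 and 0.1 and the names `surprisal_bits`, `p_standard` and `p_deviant` are illustrative assumptions, not values or terms taken from the chapter.

```python
import math


def surprisal_bits(p):
    """Shannon information content, -log2(p), of an event with probability p."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    return -math.log2(p)


# Hypothetical tone probabilities (assumed for illustration only).
p_standard = 0.9  # the frequent tone
p_deviant = 0.1   # the rare tone

info_standard = surprisal_bits(p_standard)
info_deviant = surprisal_bits(p_deviant)

# Entropy of the two-tone source: the average information per tone.
entropy = p_standard * info_standard + p_deviant * info_deviant

print(f"standard tone: {info_standard:.3f} bits")
print(f"deviant tone:  {info_deviant:.3f} bits")
print(f"entropy:       {entropy:.3f} bits per tone")
```

Under these assumed probabilities the rare tone carries far more information than the frequent one, which is the sense in which a less probable input would, on the argument above, attract more processing resources.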
It has been experimentally observed that the receptive fields (RF) of cortical cells have a dynamic nature. For instance, it was found that some time (of the order of minutes) after the occurrence of a retinal lesion the area of the RF increased by a factor of order 5 (Gilbert and Wiesel, 1992), and that cortical cells with their classical RF inside the damaged region recovered their activity. A similar effect can be obtained without the existence of real lesions. Stimuli can emulate the lesion if they are localized; that is, if there is some small part of input space that receives stimulation strongly different from that of its surround. Lack of stimulation in a small region of the visual space produces an effect similar to a scotoma. Experiments with localized stimuli have been done in both the visual (Pettet and Gilbert, 1992) and the somatosensory systems (Jenkins et al., 1990).
These changes in the RFs of cortical neurons can be quantitatively studied with psychophysical experiments. For instance, changes in RF sizes are reflected in a systematic bias in feature localization tasks. It has been found (Kapadia et al., 1994) that the ability to determine the relative position of a short line segment in the middle of another two, presented close to the border of the artificial scotoma, was strongly biased in a way that is consistent with the expansion of RFs of neurons in the cortical scotoma.
It has been speculated (Gilbert, 1992; Pettet and Gilbert, 1992) that the expansion of RF sizes is responsible for the perceptual filling-in effect (Ramachandran and Gregory, 1991) and other visual illusions.
Recent advances in techniques for the formal analysis of neural networks (Amit et al., 1987; Gardner, 1988; Tsodyks and Feigelman, 1988; Treves, 1990; Nadal and Parga, 1993) have introduced the possibility of detailed quantitative analyses of real brain circuitry. This approach is particularly appropriate for regions such as the hippocampus, which show distinct structure and for which the microanatomy is relatively simple and well known.
The hippocampus, as archicortex, is thought to predate phylogenetically the more complex neocortex, and certainly possesses a simplified version of the six-layered neocortical stratification. It is not of interest merely because of its simplicity, however: evidence from numerous experimental paradigms and species points to a prominent role in the formation of long-term memory, one of the core problems of cognitive neuroscience (Scoville and Milner, 1957; McNaughton and Morris, 1987; Weiskrantz, 1987; Rolls, 1991; Gaffan, 1992; Cohen and Eichenbaum, 1993). Much useful research in neurophysiology and neuropsychology has been directed qualitatively, and even merely categorially, at understanding hippocampal function. Awareness has dawned, however, that the analysis of quantitative aspects of hippocampal organisation is essential to an understanding of why evolutionary pressures have resulted in the mammalian hippocampal system being the way it is (Stephan, 1983; Amaral et al., 1990; Witter and Groenewegen, 1992; Treves et al., 1996). Such an understanding will require a theoretical framework (or formalism) that is sufficiently powerful to yield quantitative expressions for meaningful parameters, that can be considered valid for the real hippocampus, is parsimonious with known physiology, and is simple enough to avoid being swamped by details that might obscure phenomena of real interest.
This book is the result of a dilemma I had in 1996: I wanted to attend a conference on information theory, I fancied learning to surf, and my position meant that it was very difficult to obtain travel funds. To solve all of these problems in one fell swoop, I decided to organise a cheap conference, in a place anyone who was interested could surf, and to use as a justification a conference on information theory. All I can say is that I thoroughly recommend doing this. Organising the conference was a doddle (a couple of web pages, and a couple of phone calls to the hotel in Newquay). The location was superb. A grand hotel perched on a headland looking out to sea (and the film location of that well-known film Witches). All that and not 100 yards from the most famous surfing beach in Britain. The conference was friendly, and the talks were really very good. The whole experience was only marred by the fact that Jack Scannell was out skilfully surfing the offshore breakers, whilst I was still wobbling on the inshore surf.
Before the conference I had absolutely no intention of producing a book, but after going to the conference, getting assurances from the other editors that they would help, and realising that in fact the talks would make a book that I would quite like to read, I plunged into it.
Connectionism has recently become a very popular framework for modelling cognition and the brain. Its use in the study of basic language processing tasks (such as reading and speech recognition) is particularly widespread. Many effects (such as regularity, frequency and consistency effects) arise naturally, as a simple consequence of the gradual acquisition of the appropriate conditional probabilities, for virtually any mapping for virtually any neural network trained by virtually any gradient descent procedure. Other effects (such as cohort, morphological and priming effects) can arise as a simple consequence of information or representation overlap. More effects (such as robustness) follow easily from information redundancy. These effects show themselves during learning, after learning and after simulated brain damage. There is thus much scope for the connectionist modelling of developmental, normal and patient data, and the literature reflects this.
The problem is that many of these effects are essentially “free gifts” that come with virtually any neural network model, and yet we often see them being quoted in the literature as being “evidence” for the correctness of particular models of the brain. This can be very misleading, particularly for researchers that have no direct modelling experience themselves. In this chapter I shall review the main effects that we can expect to arise naturally in connectionist models and attempt to show, in simple terms, how these effects are a natural consequence of the underlying information theory and how the details of the network models do not make any real difference to these results.
Part of the function of the neuron is communication. Neurons must communicate voltage signals to one another through their connections (synapses) in order to coordinate their control of an animal's behaviour. It is for this reason that information theory (Shannon and Weaver, 1949) represents a promising framework in which to study the design of natural neural systems. Nowhere is this more so than in the early stages of vision, involving the retina, and in the vertebrate, the lateral geniculate nucleus and the primary visual cortex. Not only are early visual systems well characterised physiologically, but we are also able to identify the ultimate “signal” (the visual image) that is being transmitted and the constraints which are imposed on its transmission. This allows us to suggest sensible objectives for early vision which are open to direct testing. For example, in the vertebrate, the optic nerve may be thought of as a limited-capacity channel. The number of ganglion cells projecting axons in the optic nerve is many times less than the number of photoreceptors on the retina (Sterling, 1990). We might therefore propose that one goal of retinal processing is to package information as efficiently as possible so that as little as possible is lost (Barlow, 1961a).
Important to this argument is that we do not assume the retina is making judgements concerning the relative values of different image components to higher processing (Atick, 1992b). Information theory is a mathematical theory of communication. It considers the goal of faithful and efficient transmission of a defined signal within a set of data.
This chapter examines coding efficiency in the light of our recent analysis of the metabolic costs of neural information. We start by reviewing the relevance of coding efficiency, as illustrated by work on the blowfly retina, and subsequently on mammalian visual systems. We then present the first results from a new endeavour, again in the blowfly retina. We demonstrate that the acquisition and transmission of information demands a high metabolic price. To encode and transmit a single bit of information costs a blowfly photo-receptor or a retinal interneuron millions of ATP molecules, but the cost of transmission across a single chemical synapse is significantly less. This difference suggests a fundamental relationship between bandwidth, signal-to-noise ratio and metabolic cost in neurons that favours sparse coding by making it more economical to send bits through a channel of low capacity. We also consider different modes of neural signalling. Action potentials appear to be as costly as analogue codes and this suggests that a major reason for employing action potentials over short distances in the central nervous system is the suppression of synaptic noise in convergent circuits. Our derivation of the relationship between energy expended and the useful work done by a neural system leads us to explore the molecular basis of coding. We suggest that the representation of information by arrays of protein molecules is relatively cheap – it is transmission through cellular systems that makes information costly. Finally, we demonstrate that the cost of fuelling and maintaining the retina makes significant demands on a blowfly's energy budget.
This chapter addresses the problem of training a self-organising neural network on images derived from multiple sources; this type of network potentially may be used to model the behaviour of the mammalian visual cortex (for a review of neural network models of the visual cortex see Swindale, 1996). The network that will be considered is a soft encoder which transforms its input vector into a posterior probability over various possible classes (i.e. alternative possible interpretations of the input vector). This encoder will be optimised so that its posterior probability is able to retain as much information as possible about its input vector, as measured in the minimum mean square reconstruction error (i.e. L2 error) sense (Luttrell, 1994a, 1997c).
In the special case where the optimisation is performed over the space of all possible soft encoders, the optimum solution is a hard encoder (i.e. it is a “winner-take-all” network, in which only one of the output neurons is active) which is an optimal vector quantiser (VQ) of the type described in Linde et al. (1980), for encoding the input vector with minimum L2 error. A more general case is where the output of the soft encoder is deliberately damaged by the effects of a noise process. This type of noisy encoder leads to an optimal self-organising map (SOM) for encoding the input vector with minimum L2 error, which is closely related to the well-known Kohonen map (Kohonen, 1984).
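As a rough illustration of the hard-encoder limit, the sketch below trains a winner-take-all vector quantiser by a generic Lloyd-style iteration: assign each input to its nearest code vector, then move each code vector to the mean of its assigned inputs. This is a textbook procedure shown under stated assumptions, not the chapter's own soft-encoder optimisation. The toy two-cluster data and all function names (`encode`, `mean_l2_error`, `train_vq`) are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean (L2) distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def encode(x, codebook):
    """Hard (winner-take-all) encoder: index of the nearest code vector."""
    return min(range(len(codebook)), key=lambda k: sq_dist(x, codebook[k]))


def mean_l2_error(data, codebook):
    """Mean squared error when each input is reconstructed as its code vector."""
    return sum(sq_dist(x, codebook[encode(x, codebook)]) for x in data) / len(data)


def train_vq(data, n_codes, n_iters=20, seed=0):
    """Lloyd-style training; returns the codebook and the error after each step."""
    rng = random.Random(seed)
    # Initialise the code vectors from randomly chosen training inputs.
    codebook = [list(x) for x in rng.sample(data, n_codes)]
    errors = [mean_l2_error(data, codebook)]
    for _ in range(n_iters):
        # Assignment step: group the inputs by their winning code index.
        clusters = [[] for _ in range(n_codes)]
        for x in data:
            clusters[encode(x, codebook)].append(x)
        # Update step: move each code vector to the centroid of its group.
        for k, members in enumerate(clusters):
            if members:  # a code vector that wins no inputs is left unchanged
                dim = len(members[0])
                codebook[k] = [sum(m[d] for m in members) / len(members)
                               for d in range(dim)]
        errors.append(mean_l2_error(data, codebook))
    return codebook, errors


# Toy data (assumed): two Gaussian clusters in the plane.
rng = random.Random(1)
data = [(rng.gauss(0.0, 0.3), rng.gauss(0.0, 0.3)) for _ in range(50)]
data += [(rng.gauss(3.0, 0.3), rng.gauss(3.0, 0.3)) for _ in range(50)]

codebook, errors = train_vq(data, n_codes=2)
print("initial error:", errors[0])
print("final error:  ", errors[-1])
```

Each of the two steps can only lower the mean L2 error or leave it unchanged, so the recorded errors never increase. A soft encoder would replace the single winning index with a probability over all code indices.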
The soft encoder network that is discussed in this chapter turns out to have many of the emergent properties that are observed in the mammalian visual cortex, such as dominance stripes and orientation maps.
A definition for “resource” reads: “a source of supply, support, or aid, esp. one held in reserve” (Webster's, 1996). In the context of biological organisms, this definition can be complemented in several ways. Resources of a certain type are more or less freely allocable for specific functions of the organism. As long as the functions serve an adaptive purpose, resource allocation does so as well. Because re-acquiring resources is almost always costly, natural selection gradually shapes organisms toward economic design. In biology, this principle of economization by natural selection is called “design for economy”, and it is thought to play an important part in evolutionary adaptation (Sibly and Calow, 1986; Diamond, 1993).
For example, energy is a typical resource. It is universal in the sense that almost every action of the organism, including passive maintenance of its state, requires energy. Plants typically have large and costly structures to acquire light, while many animals use most of their time in seeking food.
From another viewpoint, organisms maintain their life (homeostasis) by means ranging from simple chemical loops to complex nervous systems. The former implicitly employ a very simple model of the environment, in which everything but a few parameters remains constant. On the other hand, organisms with large brains have a much richer picture of the world and of the relations among its constituents. Acting on the basis of complex causal relationships is much more efficient than simple momentary adaptation, because the outcomes of possible actions can be explicitly or implicitly predicted, sometimes into the distant future.
We have argued that principles of universal grammar (i.e., of syntax, morphology, semantics, and phonology) have, with respect to the nervous system, a status much like that of Mendelian laws of classical genetics (Jenkins, 1979). That is, they are an abstract characterization of physical mechanisms which, in this case, reflect genetically specified neural structures.
Moreover, we argued that it made no more (or less) sense to ask whether what we then called “Chomsky's laws” were “psychologically real” than it did to ask whether “Mendel's laws” were “physiologically real.” If you were convinced by the evidence from the argument from poverty of the stimulus, or by other nonlinguistic evidence, that UG represented the genetic component or initial state of the language faculty, then it made sense to talk about the genes involved in the specification of the initial state. And one could ask the usual things that get asked about genes – what chromosomes are they on? Do they act in a dominant, recessive, polygenic, or other fashion? What do they do – are they structural or regulatory genes? And so on.
Objections were raised that Mendel's laws were either outmoded or else, if they still were operative at all, they didn't have much to do with UG:
His [Mendel's] fundamental approach, using statistical methods and proposing abstract laws to describe the regularities, was a plausible one in the initial stages of the scientific study of heredity; but it would make no sense nowadays, with the knowledge we have acquired about the chemistry of the genetic program.
Chomsky has posed what we consider to be the central questions for the study of language and biology (biolinguistics):
(1) What constitutes knowledge of language?
(2) How is this knowledge acquired?
(3) How is this knowledge put to use?
(4) What are the relevant brain mechanisms?
(5) How does this knowledge evolve (in the species)?
Chomsky asks “how can we integrate answers to these questions within the existing natural sciences, perhaps by modifying them?” (Chomsky, 1991a:6). This more general question is part of what he has referred to as the unification problem, a topic to which we return below (Chomsky, 1994a:37,80).
The discussion of the questions (1)–(5) above within the tradition of generative grammar began in the early 1950s: “At least in a rudimentary form, these questions were beginning to be the topic of lively discussion in the early 1950s, primarily among a few graduate students. In Cambridge, I would mention particularly Eric Lenneberg and Morris Halle, and also Yehoshua Bar-Hillel” (Chomsky, 1991a:6).
The period between the mid-1950s and the present is sometimes referred to as the “cognitive revolution.” However, Chomsky has observed that contemporary work might be more properly viewed as a “renewal” of the “classical concerns” of the seventeenth and eighteenth centuries (Chomsky, 1997a). This earlier period of the study of mind, which includes as a central element the Cartesian theory of body and mind, might then be called the “first cognitive revolution” (Chomsky, 1994a:35). There are, in addition, many antecedents to modern-day studies of language and mind, both before and after this period.