The properties of ANNs described in the preceding chapters should make them interesting candidates both as models of some brain functions and for technical applications in certain areas of computer development or artificial intelligence. In either case, one of the first questions that comes to mind concerns the storage capacity of such systems, namely the quantity of information that can be stored in, and effectively retrieved from, the network. It is of primary interest to know, for example, how the number of possible memories, whether single patterns or sequences, varies with the number of elements, neurons and synapses, of the network.
The storage capacity of a network can be quantified in a number of ways; to be meaningful, it must be expressed per unit network element. Here we mention a few possible measures:
The number of stored bits per neuron.
The number of stored bits per synapse.
The number of stored patterns per neuron.
The number of stored patterns per synapse.
The number of stored bits per coded synaptic bit.
Any one of the items in this list must be supplemented by informational qualifications. It should be realized that the usefulness of any of these quantifications depends strongly on the level of correlation between the stored bits.
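To make these measures concrete, here is a back-of-the-envelope sketch in Python. The network size and loading level are illustrative assumptions only; the critical storage ratio of roughly 0.138 patterns per neuron is the standard value for a fully connected Hopfield-type network storing uncorrelated random patterns, precisely the situation in which such bookkeeping is simplest.

```python
# Illustrative bookkeeping for a fully connected network of N two-state
# neurons storing p uncorrelated random patterns (assumed values only).

N = 1000                    # number of neurons (illustrative)
alpha_c = 0.138             # critical ratio p/N for random patterns (Hopfield model)
p = int(alpha_c * N)        # number of storable patterns
synapses = N * (N - 1)      # directed synapses, excluding self-couplings

bits = p * N                # each uncorrelated random pattern carries N bits
print("patterns per neuron:", p / N)
print("bits per neuron:    ", bits / N)
print("bits per synapse:   ", bits / synapses)
```

For correlated patterns the naive count of p times N bits overstates the information actually stored, which is why the qualification above is essential.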
It would have been wonderful if some distinguished thinker had written 'thinking about learning is a real headache', so that we could have used the phrase as a quote. A glimpse of the type of difficulties that present themselves when a new experience is to lead to learning is given by Minsky[1]:
Which agents could be wise enough to guess what changes should then be made? The high level agents can't know such things; they scarcely know which lower-level processes exist. Nor can lower-level agents know which of their actions helped us to reach our high-level goals; they scarcely know that higher-level goals exist. The agencies that move our legs aren't concerned with whether we are walking toward home or toward work – nor do the agents involved with such destinations know anything of controlling individual muscle units…
[p.75]
Hebb[2] adds that ‘the more we learn about the nature of learning, the farther we seem to get from being able to give firm answers.’
The scope is truly awesome, and very soon it overlaps the deep controversy between innateness and behaviorism. There are several courses of action which avoid, at this stage, the full dimensions of the issue:
To consider artificial devices which can serve as associative memories with increasingly complex performance, decreed by the designer, and to restrict the question of learning to that of training such machines to their final, known performing organization.
Analog neurons, spike rates, two-state neural models
The two-state representation of neural output, which enjoys such wide popularity among modelers of neural networks, is often considered oversimplified both by biologists and by device designers. Biologists prefer to describe relevant neural activity by firing rates: continuous variables describing the mean spike activity of neurons, rather than discrete variables describing the presence or absence of an individual spike. Device designers sometimes prefer to think in terms of operational amplifiers, currents, capacitances, resistors, and continuous-time equations. It turns out that in a wide range of parameters the performance of a network as an ANN is largely independent of the representation[1]. As we shall see in the following chapters, e.g., Chapter 5, the discrete representation provides a more transparent framework for structured manipulations of attractors.
Eventually the gap between the different descriptions closes, because in our formulation of the output mechanism, Section 1.4.4, a significant event is said to occur when a time average over spike activity is found to be high, which is nothing but a measure of the mean firing rate. On the other hand, in the analog description, in terms of electronic components, which is deterministic in structure, one eventually generates spikes stochastically, at a mean instantaneous rate proportional to the continuous variable at hand, e.g., the excess of the potential over the threshold.
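The two descriptions can be set side by side in a few lines of Python. This is a minimal sketch under assumed parameters (random couplings, a tanh transfer function, illustrative time constants), not a prescription from the text:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100
J = rng.normal(0.0, 1.0 / np.sqrt(N), (N, N))   # illustrative random couplings
np.fill_diagonal(J, 0.0)                        # no self-coupling

# Two-state description: S_i = +/-1, one asynchronous threshold update.
S = rng.choice([-1.0, 1.0], N)
i = rng.integers(N)
S[i] = 1.0 if J[i] @ S >= 0 else -1.0           # discrete update of neuron i

# Analog description: continuous variables V_i with a sigmoid transfer
# function, relaxing under a continuous-time equation (Euler-integrated here).
V = rng.normal(0.0, 0.1, N)
dt, tau = 0.05, 1.0
for _ in range(200):
    V += (dt / tau) * (-V + np.tanh(J @ V))     # dV/dt = (-V + g(J V)) / tau
```

In a wide range of parameters both versions relax to corresponding attractors, which illustrates the sense in which the performance is largely representation-independent.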
By now the reader may feel that the issue of attractors has been somewhat belabored. This is unavoidable, given that attractors and their close relatives are the main message we bring from physics. The accompanying taste and style will ultimately be judged by the results, by the clarity that can be produced, and by their qualitative novelty. This task is undertaken in this chapter and in Chapter 6, which are devoted to an exposition of tools and results. Admittedly, the results are transparent in situations which are not fully realistic. But once the model is formulated, with its extreme simplifications, it becomes clear that the qualitative nature of the difficulties involved in disentangling its properties is of a similar order to that which one would expect in a highly interactive network of realistic neurons. Whether cognition can be accounted for by the dynamics of either the realistic or the simplified network is, of course, a problem of a different magnitude. But if an aesthetic criterion is involved in selecting a mechanism for higher mental functions, this chapter should go a long way toward reinforcing ANNs' claim to the role.
Let us recapitulate the simplifying assumptions which will be most pertinent for the technical manipulations of the rest of this chapter:
This book summarizes in some detail the ideas, techniques and results developed in the last 5-6 years in the physics community concerning the collective properties of large assemblies of neurons. The subject has been, and still is, a source of great excitement among physicists the world over, and new original ideas are generated incessantly. This enthusiasm has produced a wealth of new concepts and detailed results which has not gone unnoticed outside physics departments. Biologists have begun to ask themselves whether the properties that physics anticipates in neural networks can indeed be observed, and whether they provide useful theoretical guides for the empirical investigation of brain activity; computer scientists would not rule out these ideas as candidates for coherent parallel processing; psychologists and neurologists have been expecting some new useful metaphors for interpreting behavioral dysfunction; cognitive scientists study the new concepts in their continued struggle with the elusiveness of processes of mind, even at the most elementary levels; and technologists have added, of course, Attractor Neural Networks to the list of future industries for sale.
One explanation for this impact of the study of neural networks seems to lie in the type of new concepts that have been generated: they appear plausible upon introspection, and they are based on elements with a biological flavor. Another attraction is the clarity, the wealth and the detail provided by the quantitative analysis of the properties of such networks.
In Section 1.4.1 we saw that the dynamics of an ANN is a march on the vertices of a hypercube in a space of N dimensions, where N is the number of neurons in the network. Every one of the vertices of the cube, Figure 1.13, is a bona fide network state. Let us first consider the case of asynchronous dynamics, Section 2.2.3, with a single neuron changing its state at every time step. In this case the network steps from one vertex to any of its N nearest neighbors. The question of ergodicity, and correspondingly of the ability of the system to perform as an associative memory, is related to the dependence of trajectories on their initial states. Recall that the initial states of the network are those states which are strongly influenced by an external stimulus. If the network were to enter a similar dynamical trajectory for every stimulus, no classification would be achieved. This is the sense in which we will employ the term ergodic behavior. In terms of our landscape pictures, Figures 2.10 and 2.11, this would be the case of a single valley to which everything flows. Alternatively, if trajectories in the space of network states depend strongly on the initial states, and correspondingly on the incoming stimuli, then the network can recall selectively and hold a variety of items in memory.
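The dependence of trajectories on initial states can be checked directly in a few lines. The following Python sketch (network size, noise level and the Hebb-type storage prescription are illustrative assumptions) stores two random patterns and starts the asynchronous dynamics from a corrupted version of each:

```python
import numpy as np

rng = np.random.default_rng(1)
N, p = 200, 2
xi = rng.choice([-1.0, 1.0], (p, N))       # two stored random patterns
J = (xi.T @ xi) / N                        # Hebb-type couplings (illustrative)
np.fill_diagonal(J, 0.0)                   # no self-coupling

def recall(stimulus, steps=20 * N):
    """Asynchronous dynamics: one randomly chosen neuron updates per step."""
    S = stimulus.copy()
    for _ in range(steps):
        i = rng.integers(N)
        S[i] = 1.0 if J[i] @ S >= 0 else -1.0
    return S

for mu in range(p):
    noisy = xi[mu] * np.where(rng.random(N) < 0.1, -1, 1)  # flip ~10% of bits
    final = recall(noisy)
    print("overlap with pattern", mu, ":", final @ xi[mu] / N)
```

Each run ends with overlap close to 1 with the pattern resembling its stimulus: different initial conditions flow to different valleys, which is precisely the non-ergodic behavior required of an associative memory.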
One obvious attraction of artificial neural networks is their potential technological applications, for which they serve as early feasibility studies. This is an issue that is better left at this stage to popular journalism. See e.g., [2,3,4]. Inasmuch as an individual neuron is interpreted as a computing device, artificial neural networks may provide answers to some of the outstanding questions of parallel computing – the coherent coordination of a multitude of processors. This motivation will also not be discussed here. Instead, such networks will be described below for several other reasons.
To provide a physical environment in which any set of simplifying assumptions about neural networks can be literally implemented. This possibility was raised in Section 1.1.3 in the context of the methodological discussion about the verifiability of theoretical results. Some of it can, of course, be investigated by computer simulations.
In addition, there are uncontrollable variables which are naturally present in a real system, such as random delays, inhomogeneities of components, etc. In this sense such real networks are one step removed from computer simulation.
In Section 1.2.1 we listed some of the simplifying assumptions involved in the construction of the models discussed so far. Many more assumptions may have been detected by the reader along the way. No amount of lifting of simplifications will bring the model close to the full glory of an assembly of real live neurons. Yet, as the grossest assumptions are replaced by more realistic ones, and as the model is modified to account for more complex types of behavior without a significant loss in its basic functional features and in its effectiveness, the model gains in plausibility. To recapitulate our general methodological point of view: the lifting of simplifications is not performed as an end in itself. If the more complicated system functions in a qualitative way that can be captured by the simplified system, then the complication is removed and the analysis continues with the simplified system.
We shall recognize two types of robustness, related to two types of results:
Robustness of specific properties to perturbation of the underlying parameters.
Robustness of general features to modifications required by more complex functions.
Since this chapter will be primarily concerned with robustness of the first kind, we start by giving examples of situations of the second kind.
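Before turning to those examples, a small numerical sketch may make robustness of the first kind concrete. It is illustrative only (network size, loading and dilution level are assumptions): half of the Hebb-type synapses are deleted at random, and a stored pattern is still retrieved from a corrupted stimulus.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 200
xi = rng.choice([-1.0, 1.0], (3, N))        # three stored random patterns
J = (xi.T @ xi) / N                         # Hebb-type couplings (illustrative)
np.fill_diagonal(J, 0.0)
J *= rng.random((N, N)) > 0.5               # perturbation: cut 50% of synapses

S = xi[0] * np.where(rng.random(N) < 0.05, -1, 1)  # corrupted stimulus
for _ in range(20 * N):                     # asynchronous dynamics
    i = rng.integers(N)
    S[i] = 1.0 if J[i] @ S >= 0 else -1.0
print("overlap after 50% dilution:", S @ xi[0] / N)
```

At this low loading the final overlap remains essentially 1: the specific property, retrieval of a given pattern, survives a drastic perturbation of the underlying parameters.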
Reduction to physics and physics modeling analogues
When physics ventures to describe biological or cognitive phenomena it provokes a fair amount of suspicion. The attempt is sometimes interpreted as the expression of an epistemological dogma which asserts that all natural phenomena are reducible to physical laws; that there is an intrinsic unity of science; that there are no independent levels or languages of description, only more or less detailed ones. The intent of this section is to allay such concern with regard to the present monograph, which remains neutral on the issue of reductionism. Yet, before explaining the conceptual alternative, of analogies to physical concepts, which has informed the work of physicists in the field of neural networks, it is hard to resist a few comments on the general issue of reductionism, as well as an expression of our own commitment.
It should be pointed out that the misgivings about reductionism cast many shadows. Biologists often still harbor traces of vitalism and feel quite uncomfortable at the thought that life, evolution or selection could be described by laws of physics and chemistry. Cognitive scientists resent the reduction of cognitive phenomena both to neurobiology[1,2] and to computer language[3]. A physicist who reads Fodor's proof of the impossibility of reduction between different levels of description should be troubled by the connection that was so ingeniously erected by Boltzmann and Gibbs between the macroscopic phenomena of thermodynamics and the underlying microscopic dynamics of Newton, Maxwell and Planck.
The type of neural network described in the previous chapter is a first prototype in the sense that:
it stores a small number of patterns;
it recalls single patterns only;
once a pattern has been recalled, the system will linger on it until the coming of some unspecified dramatic event.
Such a system may find useful technical applications as a rapid, robust and reliable pattern recognizer; such devices are discussed in Chapter 10. But it seems rather unlikely that it could satisfy one's expectations of a cognitive system.
Very rudimentary introspection gives rise to the impression that, with or without explicit instruction, a single stimulus (or a very short string of stimuli) usually gives rise to the retrieval (or recall) of a whole cascade of connected 'patterns'. Most striking are effects such as the recall of a tune, which can be provoked by a very simple stimulus not directly related to the tune itself. Similarly, rather simple stimuli bring about the recall of sequences of numbers, especially in children, or of the alphabet. Moreover, much of the input into the cognitive system seems to arrive in the form of temporal sequences, rather than single patterns. This appears to be accepted in the study of speech recognition (see e.g., ref. [1]), as well as in vision, where a strong paradigm has it that form is deciphered from motion (see e.g., ref. [2]).
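A standard caricature of such sequence retrieval, found throughout the literature, adds to the symmetric Hebbian couplings an asymmetric term that pushes each stored pattern toward its successor. The Python sketch below (pattern count, network size and the strength lam of the asymmetric term are illustrative assumptions, and synchronous updates are used purely for simplicity) shows a single short stimulus igniting a whole cascade:

```python
import numpy as np

rng = np.random.default_rng(3)
N, p = 400, 4
xi = rng.choice([-1.0, 1.0], (p, N))        # patterns forming a sequence

J_sym = (xi.T @ xi) / N                     # symmetric part: stabilizes each pattern
J_seq = (xi[1:].T @ xi[:-1]) / N            # asymmetric part: maps xi[mu] to xi[mu+1]
lam = 1.5                                   # strength of the sequence-driving term

S = xi[0].copy()                            # the stimulus: the first pattern only
for t in range(p):
    print("step", t, "overlaps:", np.round(xi @ S / N, 2))
    h = J_sym @ S + lam * (J_seq @ S)       # local fields on all neurons
    S = np.where(h >= 0, 1.0, -1.0)         # synchronous update
```

The printed overlaps show the state marching through the stored patterns in order, coming to rest on the last one, for which the asymmetric term supplies no successor.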