Chapter 3 was devoted to modeling the dynamics of neurons. The standard model we arrived at embodies the main features revealed by neuroelectrophysiology: it treats neural nets as networks of probabilistic threshold binary automata. Real neural networks, however, are not mere automata networks. They display specific functions, and the problem is to decide whether the standard model is able to show the same capabilities.
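As a concrete illustration, the update rule of such a probabilistic threshold binary automaton can be sketched in a few lines; the weights, thresholds and noise parameter below are illustrative, not taken from the text:

```python
import numpy as np

rng = np.random.default_rng(0)

def update(sigma, J, theta, beta):
    """One parallel update of a probabilistic threshold binary network.

    sigma : current 0/1 states, J : synaptic efficacies,
    theta : thresholds, beta : inverse noise level.
    Each neuron fires with a probability given by a sigmoid of its
    membrane potential; as beta -> infinity this recovers the
    deterministic threshold automaton.
    """
    h = J @ sigma - theta                     # membrane potentials
    p_fire = 1.0 / (1.0 + np.exp(-beta * h))  # firing probabilities
    return (rng.random(len(sigma)) < p_fire).astype(int)

# tiny 3-neuron example with hand-picked couplings
J = np.array([[0., 1., -1.],
              [1., 0., 1.],
              [-1., 1., 0.]])
theta = np.zeros(3)
sigma = np.array([1, 0, 1])
sigma = update(sigma, J, theta, beta=50.0)
```

At large beta the sigmoid approaches a step function and the network behaves as a deterministic threshold automaton; at small beta the updates become increasingly random.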
Memory is considered one of the most prominent properties of real neural nets. Everyday experience shows that imprecise, truncated information is often sufficient to trigger the retrieval of full patterns: we correct misspelled names, we associate images or flavors with sounds, and so on. It turns out that the formal nets display these memory properties if the synaptic efficacies are determined by the laws of classical conditioning described in section 2.4. The synthesis, in a single framework, of observations from neurophysiology and from experimental psychology, to account for an emergent property of neuronal systems, is an achievement of the theory of neural networks.
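A minimal sketch of such a Hebbian associative memory, in the spirit of the models analyzed in this chapter, follows; the patterns, network size and number of relaxation steps are illustrative:

```python
import numpy as np

def hebb_weights(patterns):
    """Hebbian efficacies J_ij = (1/N) sum_mu xi_i^mu xi_j^mu, zero diagonal."""
    N = patterns.shape[1]
    J = patterns.T @ patterns / N
    np.fill_diagonal(J, 0.0)
    return J

def retrieve(J, sigma, steps=10):
    """Zero-noise relaxation: repeatedly threshold the local fields."""
    for _ in range(steps):
        sigma = np.sign(J @ sigma)
    return sigma

# two illustrative +/-1 patterns on N = 8 neurons
xi = np.array([[1, 1, 1, 1, -1, -1, -1, -1],
               [1, -1, 1, -1, 1, -1, 1, -1]])
J = hebb_weights(xi)

cue = xi[0].copy()
cue[0] = -cue[0]        # corrupt one bit of the first pattern
out = retrieve(J, cue)  # relaxes back onto the stored pattern
```

Starting from a corrupted cue, the dynamics is attracted back to the nearest memorized pattern, which is precisely the retrieval property described above.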
The central idea behind the notion of conditioning is that of associativity. It has given rise to many theoretical developments, in particular to the building of simple models of associative memory which are called Hebbian models. The analysis of Hebbian models has been pushed rather far and a number of analytical results relating to Hebbian networks are gathered in this chapter. More refined models are treated in following chapters.
Something essential is missing from the description of memory introduced in previous chapters. A neural network, even an isolated one, is a continuously evolving system which never settles indefinitely into a steady state. We are able to retrieve not only single patterns but also ordered strings of patterns: a few notes are enough for an entire song to be recalled, and, after training, one can go through the complete set of movements needed to serve in tennis. Several schemes have been proposed to account for the production of memorized strings of patterns. Simulations show that they perform well, but this says nothing about the biological relevance of the mechanisms they involve. In fact, no observation supporting one or another of these schemes has been reported so far.
Parallel dynamics
Up to now the dynamics has been built so as to make the memorized patterns its fixed points. Once the network settles into one pattern it stays there indefinitely, at least at low noise levels. We have seen that fixed points are the asymptotic behavior of rather special neural networks, namely those which are symmetrically connected. In asymmetrically connected neural networks whose dynamics is deterministic and parallel (the Little dynamics at zero noise level), limit cycles are the rule. It is then tempting to imagine that the retrieval of temporal sequences of patterns occurs through limit cycles.
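A tiny example, with illustrative antisymmetric couplings, shows how deterministic parallel dynamics can produce a limit cycle instead of a fixed point:

```python
import numpy as np

# Deterministic parallel (Little) dynamics with asymmetric couplings.
# This antisymmetric two-neuron network never reaches a fixed point:
# it cycles through four states, a limit cycle of length 4.
J = np.array([[0., 1.],
              [-1., 0.]])

sigma = np.array([1, 1])
trajectory = [tuple(sigma)]
for _ in range(8):
    sigma = np.where(J @ sigma >= 0, 1, -1)  # parallel threshold update
    trajectory.append(tuple(sigma))
# trajectory visits four distinct states before returning to the start
```

With symmetric couplings such cycling is impossible under sequential dynamics, which is why asymmetry is the natural candidate for storing temporal sequences.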
The architectures of the neural networks we considered in Chapter 7 are made exclusively of visible units. During the learning stage, the states of all neurons are entirely determined by the set of patterns to be memorized. They are, so to speak, pinned, and the relaxation dynamics plays no role in the evolution of the synaptic efficacies. How to deal with more general systems is not a simple problem. Endowing a neural network with hidden units amounts to adding many degrees of freedom to the system, which leaves room for 'internal representations' of the outside world. The building of learning algorithms that make general neural networks able to set up efficient internal representations is a challenge which has not yet been satisfactorily met. Pragmatic approaches have nevertheless been made, mainly using the so-called back-propagation algorithm. We owe the current excitement about neural networks to the surprising successes obtained so far with this technique: in some cases neural networks seem to extract the unexpressed rules hidden in sets of raw data. But for the moment we really understand neither the reasons for these successes nor those for the (generally unpublished) failures.
The back-propagation algorithm
A direct derivation
To solve the credit assignment problem is to devise means of building relevant internal representations; that is to say, to decide which state I^{µ,hid} of hidden units is to be associated with a given pattern I^{µ,vis} of visible units.
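As an illustration of how back-propagation builds internal representations, here is a minimal two-layer network trained by gradient descent on the XOR problem, which no network without hidden units can solve. The architecture, learning rate and iteration count are illustrative choices, not prescriptions from the text:

```python
import numpy as np

rng = np.random.default_rng(1)

# XOR: the visible input-output pairs
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0.], [1.], [1.], [0.]])

W1 = rng.normal(0, 1, (2, 4)); b1 = np.zeros(4)   # visible -> hidden
W2 = rng.normal(0, 1, (4, 1)); b2 = np.zeros(1)   # hidden -> output

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

eta = 1.0
for _ in range(5000):
    # forward pass
    h = sigmoid(X @ W1 + b1)      # hidden-unit activities
    out = sigmoid(h @ W2 + b2)    # output activity
    # backward pass: propagate the output error down to each layer
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 -= eta * h.T @ d_out; b2 -= eta * d_out.sum(0)
    W1 -= eta * X.T @ d_h;   b1 -= eta * d_h.sum(0)

mse = float(np.mean((out - y) ** 2))
pred = (out > 0.5).astype(int)  # the hidden layer typically learns XOR
```

The error measured at the output is propagated backwards through the hidden layer to adjust all efficacies, which is what gives the algorithm its name.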
A neural network self-organizes if learning proceeds without evaluating the relevance of output states. Input states are the sole data to be given and during the learning session one does not pay attention to the performance of the network. How information is embedded into the system obviously depends on the learning algorithm, but it also depends on the structure of input data and on architectural constraints.
The latter point is of paramount importance. In the first chapter we saw that the central nervous system is highly structured, that the topologies of the signals conveyed by the sensory tracts are somehow preserved in the primary areas of the cortex, and that different parts of the cortex process well-defined types of information. A comprehensive theory of neural networks must account for the architecture of the networks. Up to now this has hardly been the case, since only two types of structure have been distinguished: fully connected networks and feedforward layered systems. In reality the structures themselves result from the interplay between a genetically determined gross architecture (the sprouting of neuronal contacts towards defined regions of the system, for example) and the modification of this crude design by learning and experience (the pruning of contacts). The topology of the networks, the functional significance of their structures and the form of the learning rules are therefore closely intertwined. There is as yet no global theory explaining why the structure of the CNS is the one we observe and how its different parts cooperate to produce such an efficient system, but there have been attempts to explain at least the simplest functional organizations, those of the primary sensory areas in particular.
This text started with a description of the organization of the human central nervous system and it ends with a description of the architecture of neurocomputers. An inattentive reader would conclude that the latter is an implementation of the former, which obviously cannot be true. The only claim is that a small but significant step towards the understanding of the processes of cognition has been made in recent years. The most important point is probably that recent advances have made it more and more conspicuous that real neural networks can be treated as physical systems: theories can be built and predictions can be compared with experimental observations. This methodology brings the neurosciences at large closer and closer to the classical 'hard' sciences such as physics or chemistry. The text strives to explain some of the progress in the domain, and we have seen how productive the imagination of theoreticians is.
For some biologists, however, the time of theorizing about neural nets has not come yet owing to our current lack of knowledge in the field. The question is: are the models we have introduced in the text really biologically relevant? This is the issue I would like to address in this last chapter. Many considerations are inspired by the remarks which G. Toulouse gathered in the concluding address he gave at the Bat-Sheva seminar held in Jerusalem in May 1988.
Clearly, any neuronal dynamics can always be implemented in classical computers and therefore we could wonder why it is interesting to build dedicated neuronal machines. The answer is two-fold:
Owing to the inherent parallelism of neuronal dynamics, the time gained by using dedicated machines rather than conventional ones can be considerable, making it possible to solve problems which are out of the reach of the most powerful serial computers.
It is perhaps even more important to become aware that dedicated machines compel one to think differently about the problems one has to solve. Programming a neurocomputer does not mean writing a linear series of instructions, step by step. One is forced instead to think more globally, in terms of phase space, and eventually to figure out an energy landscape and to determine an expression for this energy. Z. Pylyshyn made this point clear in the following statement (quoted by D. Waltz):
‘What is typically overlooked (when we use a computational system as a cognitive model) is the extent to which the class of algorithms that can even be considered is conditioned by the assumptions we make regarding what basic operations are possible, how they may interact, how operations are sequenced, what data structures are possible and so on. Such assumptions are an intrinsic part of our choice of descriptive formalism.’
Mind has always been a mystery and it is fair to say that it still is one. Religions settle this irritating question by assuming that mind is non-material: it is merely linked to the body for the duration of a life, a link that death breaks. It must be realized that this metaphysical attitude pervaded even the theorization of natural phenomena: to 'explain' why a stone falls and a balloon filled with hot air tends to rise, Aristotle, in the fourth century BC, assumed that stones house a principle (a sort of mind) which makes them fall and that balloons embed the opposite principle, which makes them rise. Similarly Kepler, at the turn of the seventeenth century, thought that the planets were maintained on their elliptical tracks by immaterial spirits. To cite a last example, chemists were long convinced that organic molecules could never be synthesized, since their synthesis required the action of a vital principle. Archimedes, about a century after Aristotle, Newton, a century after Kepler, and Wöhler, who carried out the first synthesis of urea using only mineral materials, disproved these prejudices, and, at least for positivists, there is no reason why mind should be kept outside the realm of experimental observation and logical reasoning.
We find in Descartes the first modern approach to mind.
Neural networks are at the crossroads of several disciplines and the putative range of their applications is immense. The exploration of the possibilities is just beginning. Some domains, such as pattern recognition, which seemed particularly suited to these systems, still resist analysis. On the other hand, neural networks have proved to be a convenient tool for tackling combinatorial optimization problems, a domain in which at first sight they had no application. This indicates how difficult it is to foresee the main lines of the developments yet to come. All that can be done now is to give a series of examples, which we will strive to arrange in a logical order, although the link between the various topics is sometimes tenuous. Most of the applications we shall present were put forward before the fall of 1988.
Domains of applications of neural networks
Neural networks can be used in different contexts:
For the modeling of simple biological structures whose functions are known. The study of central pattern generators is an example.
For the modeling of higher functions of central nervous systems, in particular of those properties such as memory, attention, etc., which experimental psychology strives to quantify. Two strategies may be considered. The first consists in explaining the function of a given neural formation (as far as the function is well understood) by taking all available data on its actual structure into account. This strategy has been put forward by Marr in his theory of the cerebellum. The other strategy consists in looking for the minimal constraints that a neuronal architecture has to obey in order to account for some psychophysical property. The structure is now a consequence of the theory. If the search has been successful, it is tempting to identify the theoretical construction with biological structures which display the same organization.
This text is the result of two complementary experiences which I had in 1987 and 1988. The first was the opportunity, which I owe to Claude Godrèche, of delivering, in a pleasant seaside resort in Brittany, a series of lectures on the theory of neural networks. Claude encouraged me to write the proceedings in the form of a pedagogical book, a text which could be useful to the many people who are interested in the field. The second was a one-year sabbatical which I spent at the Hebrew University of Jerusalem on a research program on spin glasses and neural networks. The program was initiated by the Institute for Advanced Studies and organized by a team of distinguished physicists and biologists, namely Moshe Abeles, Hanoch Gutfreund, Haim Sompolinsky and Daniel Amit. Throughout the year, the Institute welcomed a number of researchers who shed different lights on a multi-faceted subject. The result is this introduction to the modeling of neural networks.
First of all, it is an introduction. Indeed, the field evolves so fast that it is already impossible to encompass its various aspects within a single account.
Also it is an introduction, that is, a particular perspective, one which rests on the fundamental hypothesis that the information processed by nervous systems is encoded in individual neuronal activities. This is the most widely accepted point of view; other assumptions, however, have been suggested.
This chapter is not a course on neurobiology. As stated in the title, it is intended to gather a few facts relevant to neural modeling, for the benefit of those not acquainted with biology. The material has been selected on the following grounds. First of all, it consists of the neurobiological data that form the basic building blocks of the model. It then comprises a number of observations which have been the subject of theoretical investigation. Finally, it strives to delimit the current status of research in this field by giving an insight into the huge complexity of central nervous systems.
Three approaches to the study of the functioning of central nervous systems
Let us assume that we have a very complicated machine of unknown origin and that our goal is to understand its functioning. Probably the first thing we do is to observe its structure. In general this analysis reveals a hierarchical organization comprising a number of levels of decreasing complexity: units belonging to a given rank are made of simpler units of lower rank and so on, till we arrive at the last level of the hierarchy, which is a collection of indivisible parts.
The next step is to bring to light what the units are made for, how their presence manifests itself in the machine and how their absence damages its properties. This study is first carried out on pieces of the lowest order, because the functions of these components are bound to be simpler than those of items of higher rank.
Autoregressive data modelling using the least-squares linear prediction method is generalized to multichannel time series. A recursive algorithm is obtained for forming the system of multichannel normal equations which determines the least-squares solution of the multichannel linear prediction problem. These multichannel normal equations are solved by the Cholesky factorization method. The corresponding multichannel maximum entropy spectrum, derived from these least-squares estimates of the autoregressive model parameters, is compared to that obtained using parameters estimated by a multichannel generalization of Burg's algorithm. Numerical experiments show that the multichannel spectrum obtained by the least-squares method provides more accurate frequency determination for truncated sinusoids in the presence of additive white noise.
INTRODUCTION
Multi-channel generalizations of Burg's [1-3] now-classical algorithm for the modelling of data as an autoregressive sequence, and therefore for the estimation of its equivalent maximum entropy spectrum, have been obtained independently by several authors (Jones, Nuttall, Strand, Morf et al., Tyraskis, and Tyraskis and Jensen). For single-channel data, Ulrych and Clayton [11] have also introduced an alternative procedure, commonly described as 'the exact least-squares method', for the estimation of the autoregressive data model parameters, from which a spectrum can be directly obtained. This method has been further developed and extended, and efficient recursive computational algorithms have been provided by Barrodale and Erickson and by Marple. The exact least-squares method has been demonstrated to allow much improved spectral resolution and accuracy when compared to Burg's algorithm for single-channel time series, although Burg's algorithm requires somewhat less computational time and storage.
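For the single-channel case, the least-squares strategy described above (form the normal equations of the linear prediction problem, then solve them by Cholesky factorization) can be sketched as follows; the AR(2) test process and all parameter values are illustrative:

```python
import numpy as np

def ar_least_squares(x, p):
    """Least-squares (covariance-method) estimate of AR(p) coefficients.

    Builds the normal equations of the one-step linear prediction
    problem and solves them by Cholesky factorization, the same
    strategy the multichannel method applies to vector-valued series.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    # design matrix of lagged samples: row t is [x_{t-1}, ..., x_{t-p}]
    A = np.column_stack([x[p - k - 1:n - k - 1] for k in range(p)])
    b = x[p:]
    R = A.T @ A                   # normal-equation matrix
    r = A.T @ b
    L = np.linalg.cholesky(R)     # R = L L^T
    y = np.linalg.solve(L, r)     # forward substitution
    return np.linalg.solve(L.T, y)  # back substitution

# recover a known AR(2) process buried in white noise
rng = np.random.default_rng(0)
true_a = np.array([1.5, -0.9])
x = np.zeros(500)
for t in range(2, 500):
    x[t] = true_a @ x[t - 2:t][::-1] + 0.1 * rng.normal()

a_hat = ar_least_squares(x, 2)  # close to true_a for this sample size
```

Once the coefficients are estimated, the maximum entropy spectrum follows directly from the AR model; the multichannel version replaces the scalar normal equations by block equations but is solved by the same Cholesky factorization.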