In Section 1.4.1 we saw that the dynamics of an ANN is a march on the vertices of a hypercube in a space of N dimensions, where N is the number of neurons in the network. Every one of the vertices of the cube, Figure 1.13, is a bona fide network state. Let us first consider the case of asynchronous dynamics, Section 2.2.3, in which a single neuron changes its neural state at every time step. In this case the network steps from one vertex to one of its N nearest neighbors. The question of ergodicity, and correspondingly of the ability of the system to perform as an associative memory, is related to the dependence of trajectories on their initial states. The initial states of the network are those states which are strongly influenced by an external stimulus. If the network were to enter a similar dynamical trajectory for every stimulus, no classification would be achieved. This is the sense in which we will employ the term ergodic behavior. In terms of our landscape pictures, Figures 2.10 and 2.11, this would be the case of a single valley to which all trajectories flow. Alternatively, if trajectories in the space of network states depend strongly on the initial states, and correspondingly on the incoming stimuli, then the network can recall selectively and hold a variety of items in memory.
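The march on the hypercube can be made concrete with a short numerical sketch. The following is not taken from the text: it assumes plus/minus one neurons and an arbitrary symmetric coupling matrix (here a random one, chosen only for illustration), and lets a single randomly chosen neuron align with its local field at each time step, so that the state moves between neighbouring vertices of the hypercube. Different initial vertices typically settle on different fixed points, which is the non-ergodic behavior needed for selective recall.

import numpy as np

rng = np.random.default_rng(1)

N = 40                                    # number of neurons = dimension of the hypercube
J = rng.normal(size=(N, N))
J = (J + J.T) / 2                         # symmetric couplings (an assumption for illustration)
np.fill_diagonal(J, 0.0)                  # no self-coupling

def asynchronous_step(s, J):
    """One randomly chosen neuron aligns with its local field, so the network moves
    to (at most) one of the N nearest-neighbour vertices of the hypercube."""
    i = rng.integers(len(s))
    s[i] = 1 if J[i] @ s >= 0 else -1
    return s

def march(s0, J, steps=5000):
    s = s0.copy()
    for _ in range(steps):
        asynchronous_step(s, J)
    return s

# Different initial vertices (different stimuli) typically flow to different fixed points,
# i.e. the dynamics is not ergodic and the final state carries information about the stimulus.
finals = {tuple(march(rng.choice([-1, 1], size=N), J)) for _ in range(5)}
print("distinct final vertices reached from 5 random initial states:", len(finals))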
One obvious attraction of artificial neural networks is their potential technological applications, for which they serve as early feasibility studies. This is an issue that is better left at this stage to popular journalism. See e.g., [2,3,4]. Inasmuch as an individual neuron is interpreted as a computing device, artificial neural networks may provide answers to some of the outstanding questions of parallel computing – the coherent coordination of a multitude of processors. This motivation will also not be discussed here. Instead, such networks will be described below for several other reasons.
To provide a physical environment in which any set of simplifying assumptions about neural networks can be literally implemented. This possibility was raised in Section 1.1.3 in the context of the methodological discussion about verifiability of the theoretical results. Some of it can, of course, be investigated by computer simulations.
In addition, there are various uncontrollable variables which are naturally present in a real system, such as random delays, inhomogeneities of components, etc. In this sense such real networks are one step removed from computer simulation.
In Section 1.2.1, we listed some of the simplifying assumptions involved in the construction of the models discussed so far. Many more assumptions may have been detected by the reader along the way. No amount of lifting of simplifications will closely approximate the full glory of an assembly of real live neurons. Yet, as the grossest assumptions are replaced by more realistic ones and as the model is modified to account for more complex types of behavior without a significant loss in its basic functional features and in its effectiveness, the model gains in plausibility. To recapitulate our general methodological point of view: The lifting of simplifications is not performed as an end in itself. If the more complicated system functions in a qualitative way that can be captured by the simplified system, then the complication is removed and analysis continues with the simplified system.
We shall recognize two types of robustness, related to two types of results:
Robustness of specific properties to perturbation of the underlying parameters.
Robustness of general features to modifications required by more complex functions.
Since this chapter will be primarily concerned with robustness of the first kind, we start by giving examples of situations of the second kind.
Reduction to physics and physics modeling analogues
When physics ventures to describe biological or cognitive phenomena it provokes a fair amount of suspicion. The attempt is sometimes interpreted as the expression of an epistemological dogma which asserts that all natural phenomena are reducible to physical laws; that there is an intrinsic unity of science; that there are no independent levels or languages of description, only more or less detailed ones. The intent of this section is to allay such concern with regard to the present monograph, which remains neutral on the issue of reductionism. Yet, before explaining the conceptual alternative, of analogies to physical concepts, which has informed the work of physicists in the field of neural networks, it is hard to resist a few comments on the general issue of reductionism, as well as an expression of our own commitment.
It should be pointed out that the misgivings about reductionism cast many shadows. Biologists often still harbor traces of vitalism and feel quite uncomfortable at the thought that life, evolution or selection could be described by laws of physics and chemistry. Cognitive scientists resent the reduction of cognitive phenomena both to neurobiology[1,2] and to computer language[3]. A physicist who reads Fodor's proof of the impossibility of reduction between different levels of description should be troubled by the connection that was so ingeniously erected by Boltzmann and Gibbs between the macroscopic phenomena of thermodynamics and the underlying microscopic dynamics of Newton, Maxwell and Planck.
The type of neural network described in the previous chapter is a first prototype in the sense that:
it stores a small number of patterns;
it recalls single patterns only;
once a pattern has been recalled, the system will linger on it until the coming of some unspecified dramatic event.
Such a system may find useful technical applications as a rapid, robust and reliable pattern recognizer. Such devices are discussed in Chapter 10. It seems rather unlikely that they can satisfy one's expectations of a cognitive system.
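For concreteness, a minimal sketch of such a prototype, in the spirit of the model of the previous chapter but with details that are our own assumptions: a few patterns are stored with a simple Hebb-like rule, a single noisy stimulus relaxes onto one stored pattern, and the network then lingers on it, since the recalled pattern is a fixed point of the dynamics.

import numpy as np

rng = np.random.default_rng(2)

N, P = 100, 3                                   # many neurons, only a few stored patterns
patterns = rng.choice([-1, 1], size=(P, N))
J = (patterns.T @ patterns) / N                 # Hebb-like storage of the P patterns
np.fill_diagonal(J, 0.0)

def relax(s, J, sweeps=20):
    """Asynchronous relaxation: repeatedly align single neurons with their local fields."""
    s = s.copy()
    for _ in range(sweeps * len(s)):
        i = rng.integers(len(s))
        s[i] = 1 if J[i] @ s >= 0 else -1
    return s

# A noisy version of pattern 0 acts as the stimulus ...
cue = patterns[0] * rng.choice([1, -1], size=N, p=[0.85, 0.15])
recalled = relax(cue, J)
print("overlap with pattern 0:", recalled @ patterns[0] / N)   # close to 1: a single pattern is recalled

# ... and once recalled the network lingers on it: the pattern is a fixed point of the dynamics.
print("unchanged by further relaxation:", np.array_equal(relax(recalled, J), recalled))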
Very rudimentary introspection gives rise to the impression that, with or without explicit instruction, a single stimulus (or a very short string of stimuli) usually gives rise to a retrieval (or recall) of a whole cascade of connected ‘patterns’. Most striking are effects such as the recall of a tune, which can be provoked by a very simple stimulus, not directly related to the tune itself. Similarly, rather simple stimuli bring about the recall of sequences of numbers, especially in children, or of the alphabet. Moreover, much of the input into the cognitive system seems to be in the form of temporal sequences, rather than single patterns. This appears to be accepted in the study of speech recognition (see e.g., ref. [1]), as well as in vision, where a strong paradigm has it that form is deciphered from motion (see e.g., ref. [2]).
The statistical study of spatial patterns and processes has during the last few years provided a series of challenging problems for theories of statistical inference. Those challenges are the subject of this essay. As befits an essay, the results presented here are not in definitive form; indeed, many of the contributions raise as many questions as they answer. The essay is intended both for specialists in spatial statistics, who will discover much that has been achieved since the author's book (Spatial Statistics, Wiley, 1981), and for theoretical statisticians with an eye for problems arising in statistical practice.
This essay arose from the Adams Prize competition of the University of Cambridge, whose subject for 1985/6 was ‘Spatial and Geometrical Aspects of Probability and Statistics’. (It differs only slightly from the version which was awarded that prize.) The introductory chapter answers the question ‘what's so special about spatial statistics?’ The next three chapters elaborate on this by example, illustrating new difficulties with likelihood inference in spatial Gaussian processes and the dominance of edge effects in the estimation of interaction in point processes. We show by example how Monte Carlo methods can make likelihood methods feasible in problems traditionally thought intractable.
The last two chapters deal with digital images. Here the problems are principally ones of scale, dealing with up to a quarter of a million data points. Chapter 5 takes a very general Bayesian viewpoint and shows the importance of spatial models in encapsulating prior information about images.
Images as data are occurring increasingly frequently in a wide range of scientific disciplines. The scale of the images varies widely, from meteorological satellites, which view scenes thousands of kilometres square, and optical astronomy, which looks at sections of space, down to electron microscopy working at scales of 10 µm or less. However, they all have in common a digital output of an image. With a few exceptions this is on a square grid, so each output measures the image within a small square known as a pixel. The measurement on each pixel can be a greylevel, typically one of 64 or 256 levels of luminance, or a series of greylevels representing luminance in different spectral bands. For example, earth resources satellites use luminance in the visual and infrared bands, typically four to seven numbers in total. One may of course use three bands to represent red, blue and green and so record an arbitrary colour on each pixel.
The resolution (the size of each pixel, hence the number of pixels per scene) is often limited by hardware considerations in the sensors. Optical astronomers now use 512 × 512 arrays of CCD (charge coupled device) sensors to replace photographic plates. The size of the pixels is limited by physical problems and also by the fact that these detectors count photons, so random events limit the practicable precision. In many other applications the limiting factor is digital communication speed. Digital images can be enormous in data-processing terms.
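To give a feel for the numbers, a small back-of-the-envelope calculation (our own arithmetic, not the essay's), assuming one byte per greylevel, i.e. up to 256 levels per band:

# Rough storage arithmetic for the pixel rasters described above
# (an illustration only, assuming one byte per greylevel).

def image_bytes(rows: int, cols: int, bands: int = 1, bytes_per_level: int = 1) -> int:
    """Storage for a rectangular pixel raster with one greylevel per spectral band."""
    return rows * cols * bands * bytes_per_level

# A single-band 512 x 512 CCD frame, as in optical astronomy:
print(image_bytes(512, 512))            # 262144 bytes, i.e. about a quarter of a million values

# An earth-resources scene recorded in, say, seven spectral bands on the same grid:
print(image_bytes(512, 512, bands=7))   # 1835008 bytes, close to 2 MB per scene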
This essay aims to bring out some of the distinctive features and special problems of statistical inference on spatial processes. Realistic spatial stochastic processes are so far removed from the classical domain of statistical theory (sequences of independent, identically distributed observations) that they can provide a rather severe test of classical methods. Although much of the literature has been very negative about the problem, a few methods have emerged in this field which have spread to many other complex statistical problems. There is a sense in which spatial problems are currently the test bed for ideas in inference on complex stochastic systems.
Our definition of ‘spatial process’ is wide. It certainly includes all the areas of the author's monograph (Ripley, 1981), as well as more recent problems in image processing and analysis. Digital images are recorded as a set of observations (black/white, greylevel, colour…) on a square or hexagonal lattice. As such, they differ only in scale from other spatial phenomena which are sampled on a regular grid. Now the difference in scale is important, but it has become clear that it is fruitful to regard imaging problems from the viewpoint of spatial statistics, and this has been done quite extensively within the last five years.
Much of our consideration depends only on geometrical aspects of spatial patterns and processes.