The statistical study of spatial patterns and processes has, during the last few years, provided a series of challenging problems for theories of statistical inference. Those challenges are the subject of this essay. As befits an essay, the results presented here are not in definitive form; indeed, many of the contributions raise as many questions as they answer. The essay is intended both for specialists in spatial statistics, who will discover much that has been achieved since the author's book (Spatial Statistics, Wiley, 1981), and for theoretical statisticians with an eye for problems arising in statistical practice.
This essay arose from the Adams Prize competition of the University of Cambridge, whose subject for 1985/6 was ‘Spatial and Geometrical Aspects of Probability and Statistics’. (It differs only slightly from the version which was awarded that prize.) The introductory chapter answers the question ‘what's so special about spatial statistics?’ The next three chapters elaborate on this by providing examples of new difficulties: likelihood inference in spatial Gaussian processes, and the dominance of edge effects in the estimation of interaction in point processes. We show by example how Monte Carlo methods can make likelihood methods feasible in problems traditionally thought intractable.
The last two chapters deal with digital images. Here the problems are principally ones of scale, dealing with up to a quarter of a million data points. Chapter 5 takes a very general Bayesian viewpoint and shows the importance of spatial models to encapsulate prior information about images.
Images as data occur increasingly frequently in a wide range of scientific disciplines. The scale of the images varies widely, from meteorological satellites, which view scenes thousands of kilometres square, and optical astronomy, looking at sections of space, down to electron microscopy, working at scales of 10µm or less. What they all have in common is a digital output of an image. With a few exceptions this is on a square grid, so each output measures the image within a small square known as a pixel. The measurement on each pixel can be a greylevel, typically one of 64 or 256 levels of luminance, or a series of greylevels representing luminance in different spectral bands. For example, earth resources satellites use luminance in the visual and infrared bands, typically four to seven numbers in total. One may of course use three bands to represent red, blue and green, and so record an arbitrary colour at each pixel.
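As a concrete illustration (this sketch is not from the original text, and all sizes and values are chosen for illustration only), such images are naturally represented as arrays of pixel values, one number per pixel for a greylevel image and one number per band per pixel for a multispectral one:

    import numpy as np

    rng = np.random.default_rng(0)

    # A 512 x 512 greylevel image: one luminance value per pixel,
    # quantized here to 256 levels (0..255).
    grey = rng.integers(0, 256, size=(512, 512), dtype=np.uint8)

    # A multiband image, e.g. four spectral bands per pixel as for an
    # earth resources satellite: shape (rows, columns, bands).
    multi = rng.integers(0, 256, size=(512, 512, 4), dtype=np.uint8)

    print(grey.size)       # 262144 pixels per scene
    print(multi.shape[2])  # 4 numbers recorded at each pixel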
The resolution (the size of each pixel, and hence the number of pixels per scene) is often limited by hardware considerations in the sensors. Optical astronomers now use 512 × 512 arrays of CCD (charge-coupled device) sensors to replace photographic plates. The size of the pixels is limited by physical problems and also by the fact that these detectors count photons, so random events limit the practicable precision. In many other applications the limiting factor is digital communication speed. Digital images can be enormous in data-processing terms: a single 512 × 512 greylevel image already contains 262,144 pixel values.
This essay aims to bring out some of the distinctive features and special problems of statistical inference on spatial processes. Realistic spatial stochastic processes are so far removed from the classical domain of statistical theory (sequences of independent, identically distributed observations) that they can provide a rather severe test of classical methods. Although much of the literature has been very negative about the problem, a few methods have emerged in this field which have spread to many other complex statistical problems. There is a sense in which spatial problems are currently the test bed for ideas in inference on complex stochastic systems.
Our definition of ‘spatial process’ is wide. It certainly includes all the areas of the author's monograph (Ripley, 1981), as well as more recent problems in image processing and analysis. Digital images are recorded as a set of observations (black/white, greylevel, colour…) on a square or hexagonal lattice. As such, they differ only in scale from other spatial phenomena which are sampled on a regular grid. Now the difference in scale is important, but it has become clear that it is fruitful to regard imaging problems from the viewpoint of spatial statistics, and this has been done quite extensively within the last five years.
Much of our consideration depends only on geometrical aspects of spatial patterns and processes.
Structural models for covariance matrices are used when studying relationships between variables and are employed predominantly in the social sciences. The best known of these is the factor analysis model, but recently there has been rapid development of extensions and alternatives. Most of this work has appeared in psychometric journals. Textbooks on applied multivariate analysis have recently begun to include chapters on factor analysis. They tend, however, to be out of date and give little attention to modern developments such as efficient computational methods in maximum likelihood factor analysis, effective methods for oblique rotation, methods for obtaining standard errors of factor loadings and extensions of the factor analysis model.
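For reference (this display is standard and is not part of the original text), the factor analysis model expresses a p-vector of observed variables x in terms of k < p common factors f and unique factors e, giving the covariance structure

    x = \mu + \Lambda f + e, \qquad
    \Sigma = \operatorname{Cov}(x) = \Lambda \Lambda^{\top} + \Psi,

where \Lambda is the p × k matrix of factor loadings and \Psi is the diagonal covariance matrix of the unique factors.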
This paper will present a personal view of structural models for covariance matrices arising from continuous variates. The lack of robustness of statistical tests involving the assumption of multivariate normality will receive some attention and alternative approaches will be examined. Modern developments will be considered but no attempt will be made to provide an exhaustive review. Readers who require additional information are referred to the review article by Bentler (1980) and to the bibliographies in Harman (1976) and Mulaik (1972).
The paper is divided into two main parts. Part I is concerned with general technical background common to all structural models for covariance matrices. The manner in which covariance structures arise will be examined and general procedures for estimating parameters and comparing the adequacy of alternative models will be dealt with.
The principles underlying data scaling originated in the field of psychology. Direct measurement of intelligence, aggressiveness, depression and other mental states is impossible, whereas it is fairly easy to observe various manifestations of these states. For example, a typical IQ test consists of a set of questions designed to test various aspects of the underlying quality which the researcher calls “intelligence”. Every person who undergoes such a test generates a vector of responses, and these are then converted into a single value which is called the IQ of that person. The process whereby this is achieved is called scaling, because each person becomes a point on a single dimension, in this case the IQ scale. Geometrically, the original data vector may be considered a point in a high-dimensional space, and the process of scaling maps this point to a point in a one-dimensional subspace. These ideas can be generalized to mapping data vectors to points in a two-dimensional subspace, so that the original vectors are “scaled” as points in a plane. Whether the dimensionality of the final subspace of representation be one, two or higher, the underlying principles are the same. Firstly, scaling transforms data points of very high dimensionality to points of much lower dimensionality. Secondly, the original data vectors are often points in a non-Euclidean space, that is, distances are not defined in the usual physical way.
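As an illustrative sketch (not from the original text; the data and dimensions are hypothetical), one common way to carry out such a mapping for numeric response vectors is to project them onto the leading principal axes of their covariance matrix:

    import numpy as np

    rng = np.random.default_rng(0)
    # Hypothetical data: 26 respondents, each answering 10 test items.
    responses = rng.normal(size=(26, 10))

    # Centre the data and take the eigenvectors of the covariance matrix.
    centred = responses - responses.mean(axis=0)
    cov = np.cov(centred, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order

    # Project each 10-dimensional response vector onto the axis of largest
    # variance: each person becomes a point on a single scale.
    scale_1d = centred @ eigvecs[:, -1]

    # The same idea gives a two-dimensional representation (a plane).
    scale_2d = centred @ eigvecs[:, -2:]
    print(scale_1d.shape, scale_2d.shape)    # (26,) (26, 2)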
In 1980, the National Research Institute for Mathematical Sciences (NRIMS) of the Council for Scientific and Industrial Research held a highly successful Summer Seminar Series on Extensions of Linear-Quadratic Control Theory. This was followed in February 1981 by one on Applied Multivariate Analysis. The papers presented at that seminar formed the penultimate drafts of those in this book.
The choice of Applied Multivariate Analysis as the topic for the second Summer Seminar Series was prompted by the recognition that there is a much wider divergence between theory and practice in multivariate methods than is the case with univariate statistics. As a result, multivariate methods are widely applied but little understood by their users.
The authors of these papers, apart from having considerable practical experience in the application of the techniques they discuss, have also contributed to their theoretical development. Their papers give a distillation of what they regard as the major features of each topic from the viewpoint of applicability, and so may go some way towards bridging the chasm between theory and practice.
Valuable comments on the first drafts and suggestions for improvements were made by Mr. J.C. Gower (Chapters 4 and 6), Professor J.N. Darroch (Chapter 3) and Professor J. Aitchison (Chapter 1). The generosity of these reviewers in finding time to go through the manuscript is greatly appreciated.
Automatic Interaction Detection (AID) is a family of methods for handling regression-type data in a way that is almost free of the usual assumptions necessary to process the data using linear hypothesis methods.
In AID, one has a dependent variable Y which one wishes to predict, and a vector of predictors X from which to predict Y. The predictors are all categorical (i.e. either nominal or ordinal), and generally take on only a few possible values. Interval predictors may be reduced to this form by grouping their possible values into classes, and then using the (ordinal) class variable as the predictor.
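A minimal sketch (not from the chapter; the variable and cut points are hypothetical) of reducing an interval predictor to ordinal classes by grouping its values:

    import numpy as np

    # Hypothetical interval predictor: ages of 8 respondents.
    age = np.array([23, 35, 47, 18, 62, 41, 29, 55])

    # Group the values into ordinal classes using chosen cut points;
    # np.digitize returns the class index (0, 1, 2, ...) for each value.
    cut_points = [30, 45, 60]
    age_class = np.digitize(age, cut_points)
    print(age_class)   # [0 1 2 0 3 1 0 2]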
Various different methods within the AID family have been devised for situations in which the dependent variable Y is: (a) a scalar interval variable, (b) a scalar nominal variable, (c) a vector of interval variables. Other possibilities such as an ordinal Y or a vector of nominal Y are easy to fit into the general conceptual framework of AID.
The name AID suggests that the function of the technique is to discover whether the linear hypothesis model predicting Y from X contains only main effects, or whether interactions also occur. This is indeed one of the things that AID can do, but it has a number of other uses as well, which we consider overshadow this use in importance.
Before going into a detailed study of the aims and methods of AID, it may help to consider a simple example.
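As one such illustration (this sketch is not the chapter's own example; the data are hypothetical), the basic step of AID in case (a) is to split the data on a categorical predictor, choosing the binary grouping of its categories that maximizes the between-group sum of squares of Y:

    from itertools import combinations
    import numpy as np

    def best_aid_split(y, x):
        """Find the binary split of the categories of a nominal predictor x
        that maximizes the between-group sum of squares of y."""
        cats = sorted(set(x))
        best = (None, -np.inf)
        # Try every non-empty proper subset of categories as the 'left'
        # group; the remaining categories form the 'right' group.
        for r in range(1, len(cats)):
            for left in combinations(cats, r):
                mask = np.isin(x, left)
                n1, n2 = mask.sum(), (~mask).sum()
                # Between-group sum of squares for a two-group split.
                bss = (n1 * (y[mask].mean() - y.mean()) ** 2
                       + n2 * (y[~mask].mean() - y.mean()) ** 2)
                if bss > best[1]:
                    best = (set(left), bss)
        return best

    # Hypothetical data: interval Y and a nominal predictor.
    y = np.array([4.0, 5.1, 9.8, 10.2, 4.7, 9.5])
    x = np.array(['a', 'b', 'c', 'c', 'a', 'c'])
    print(best_aid_split(y, x))   # splits category 'c' from 'a' and 'b'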
By L. Paul Fatti, Douglas M. Hawkins and E. Liefde Raath, Council for Scientific and Industrial Research
Discriminant analysis is concerned with the problem of classifying an object of unknown origin into one of two or more distinct groups or populations on the basis of observations made on it. As evidenced by the examples given below, this problem occurs frequently in various fields as diverse as medicine, anthropology and mining, and the techniques of discriminant analysis have been used successfully in many situations. Computer packages for performing the necessary calculations involved in applying some of the techniques have been readily available for some time, although there are still some serious omissions in most of these packages.
Some examples
1. Haemophilia is a sex-linked genetic disease which is transmitted only by females, but whose symptoms are manifest only in males. Under normal medical examination it is impossible to distinguish between females carrying the disease and those not carrying it. In order to try to identify female carriers, the levels of a coagulant factor and its related antigen (Factor VIII and Factor VIII RA) in the blood have been suggested as possible discriminators between carriers and non-carriers.
A pilot study was carried out by Gomperts et al. (1976) to test how well Factor VIII and its related antigen discriminate between carriers and non-carriers. A sample of 26 white females, of whom 11 were known, for genetic reasons, to be carriers and 15 were known to be non-carriers, was selected, and the Factor VIII and Factor VIII RA levels were measured in each subject.
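As an illustrative sketch (not from the chapter, and with made-up numbers in place of the study's data), Fisher's linear discriminant for two groups allocates a new subject using a linear combination of the two measurements:

    import numpy as np

    rng = np.random.default_rng(0)
    # Hypothetical (Factor VIII, Factor VIII RA) measurements; the real
    # data of Gomperts et al. (1976) are not reproduced here.
    carriers = rng.normal([0.8, 1.1], 0.15, size=(11, 2))
    non_carriers = rng.normal([1.1, 0.9], 0.15, size=(15, 2))

    # Fisher's discriminant direction: w = S^{-1} (m1 - m2), with S the
    # pooled within-group covariance matrix.
    m1, m2 = carriers.mean(axis=0), non_carriers.mean(axis=0)
    s1 = np.cov(carriers, rowvar=False)
    s2 = np.cov(non_carriers, rowvar=False)
    pooled = (((len(carriers) - 1) * s1 + (len(non_carriers) - 1) * s2)
              / (len(carriers) + len(non_carriers) - 2))
    w = np.linalg.solve(pooled, m1 - m2)

    # Allocate a new subject to the group whose mean its discriminant
    # score is nearer to (equal priors assumed).
    def classify(x):
        midpoint = w @ (m1 + m2) / 2
        return 'carrier' if w @ x > midpoint else 'non-carrier'

    print(classify(np.array([0.85, 1.05])))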
Categorical variables can be divided into two main sub-classes: nominal and ordinal. Ordinal variables, such as social status, number of children, categorized continuous variables, etc., have some sort of inherent grading which allows one to say, for example, that social class II is on the same “side” of social class I as is social class III, but is “nearer” to it than is social class III. Nominal variables, on the other hand, do not have this inherent grading. They are typified by eye colour, the type of disorder diagnosed in a psychiatric patient, and the city in which one resides.
The binomial, multinomial and Poisson distributions are commonly used for modelling the frequency of occurrence of a particular value of both nominal and ordinal variables.
All three distributions belong to the broader class of the exponential family. This family has the unifying property that the logarithms of the frequencies of occurrence of a particular value of the variable may be expressed as a linear function of the parameters of the distribution. Such models for log frequencies are termed loglinear models. They may be used for the analysis of two-way or multi-way contingency tables and of data on rates and proportions (both of which arise naturally with nominal variables), and may also be extended to logit analysis and log-odds ratios, which are the natural tools for dealing with ordinal variables.
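As a minimal sketch (not from the chapter; the table is hypothetical), the simplest loglinear model for a two-way contingency table is the independence model, log m_ij = u + u_i + u_j, whose fitted counts come directly from the table margins:

    import numpy as np

    # Hypothetical 2 x 3 contingency table of observed counts.
    observed = np.array([[30., 20., 10.],
                         [15., 25., 20.]])

    # Under independence, the fitted (expected) counts are
    # row total x column total / grand total.
    n = observed.sum()
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / n

    # Likelihood-ratio statistic G^2 = 2 * sum obs * log(obs / exp),
    # referred to chi-squared on (r - 1)(c - 1) degrees of freedom.
    g2 = 2 * np.sum(observed * np.log(observed / expected))
    df = (observed.shape[0] - 1) * (observed.shape[1] - 1)
    print(g2, df)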
The urge to classify objects must be recognized both as a basic human attribute, and as one of the cornerstones of the scientific method. A sorting and classification of a set of objects is the necessary prerequisite to an investigation into why that particular classification works to the extent that it does.
The most striking example of this process is in biology. The Linnaean system of classification predated Darwin by a century, but the very success of a scheme which was able to describe biological species as if they were the leaves of a tree invited a model to explain such an effect. The theory of evolution provided just such a model.
Much later, the same evolutionary model and its consequent tree structure have been used in linguistics to study the evolution of languages. These two applications illustrate two possible uses for cluster analysis. The first is to take a set of objects with particular interobject similarities and classify them “blindly”; one can then study the resultant typology and build up models to explain it. The second is to take a set of objects with a known form of typology (e.g. an evolutionary tree) and use an appropriate method to classify the objects.
There are other uses of cluster analysis in which the existence of a “real” typology is not presumed, but the analysis provides a convenient summary of a large body of data.
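As an illustrative sketch (not from the chapter; the data are synthetic), a “blind” classification of this kind can be produced by agglomerative clustering of the interobject distances:

    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster

    rng = np.random.default_rng(0)
    # Synthetic data: 10 objects described by 4 measurements each,
    # drawn from two well-separated groups.
    objects = np.vstack([rng.normal(0, 1, size=(5, 4)),
                         rng.normal(4, 1, size=(5, 4))])

    # Single-linkage clustering builds a tree (dendrogram) from the
    # interobject distances, merging the closest pair at each step.
    tree = linkage(objects, method='single')

    # Cutting the tree at a chosen level yields the classification.
    labels = fcluster(tree, t=2, criterion='maxclust')
    print(labels)   # two recovered groups, e.g. [1 1 1 1 1 2 2 2 2 2]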