Categorical variables can be divided into two main sub-classes: nominal and ordinal. Ordinal variables, such as social status, number of children and categorized continuous variables, have some sort of inherent grading which allows one to say, for example, that social class II is on the same “side” of social class I as is social class III, but is “nearer” to it than is social class III. Nominal variables, on the other hand, do not have this inherent grading. They are typified by eye colour, the type of disorder diagnosed in a psychiatric patient, and the city in which one resides.
The binomial, multinomial and Poisson distributions are commonly used for modelling the frequency of occurrence of a particular value of both nominal and ordinal variables.
All three distributions belong to the broader class of the exponential family. This family has the unifying property that the logarithms of the frequencies of occurrence of a particular value of the variable may be expressed as a linear function of the parameters of the distribution. Such models for log frequencies are termed loglinear models. They may be used for the analysis of two-way or multi-way contingency tables and data on rates and proportions (both of which arise naturally with nominal variables), and may also be extended to logit analysis and log-odds ratios which are the natural tools for dealing with ordinal variables.
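As a minimal numerical sketch of the loglinear idea, the following Python fragment fits the independence loglinear model log m_ij = μ + α_i + β_j to a hypothetical 2 × 2 contingency table (the counts are invented for illustration) and computes the log-odds ratio, which vanishes exactly when the independence model holds.

```python
import numpy as np

# Hypothetical 2x2 contingency table of counts: rows = exposure, cols = outcome.
table = np.array([[30.0, 10.0],
                  [20.0, 40.0]])

# Under the independence loglinear model log m_ij = mu + alpha_i + beta_j,
# the fitted count in cell (i, j) is (row total * column total) / grand total.
row = table.sum(axis=1, keepdims=True)
col = table.sum(axis=0, keepdims=True)
fitted = row * col / table.sum()

# The log-odds ratio measures departure from independence;
# it is exactly zero for the fitted independence table.
def log_odds_ratio(t):
    return np.log(t[0, 0]) + np.log(t[1, 1]) - np.log(t[0, 1]) - np.log(t[1, 0])

print(log_odds_ratio(table))   # non-zero: association present in the data
print(log_odds_ratio(fitted))  # ~0 under the fitted independence model
```

The log-odds ratio here is the natural tool mentioned above for ordinal and binary responses; a non-zero value signals an interaction term missing from the independence model.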
The urge to classify objects must be recognized both as a basic human attribute, and as one of the cornerstones of the scientific method. A sorting and classification of a set of objects is the necessary prerequisite to an investigation into why that particular classification works to the extent that it does.
The most striking example of this process is in biology. The Linnaean system of classification predated Darwin by a century, but the very success of a scheme which was able to describe biological species as if they were the leaves of a tree invited a model to explain such an effect. The theory of evolution provided just such a model.
Much later, the same evolutionary model and its consequent tree structure have been used in linguistics to study the evolution of languages. These two applications illustrate two possible uses for cluster analysis. The first is to take a set of objects with particular interobject similarities and classify them “blindly”. As a result of this, one can study the resultant typology, and use it to build up models to explain the typology. The second use is to take a set of objects with a known form of typology (e.g. an evolutionary tree) and use an appropriate method to classify the objects.
There are other uses of cluster analysis in which the existence of a “real” typology is not presumed, but the analysis provides a convenient summary of a large body of data.
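The first, “blind” use of cluster analysis can be sketched with a minimal single-linkage agglomerative procedure; the one-dimensional “objects” below are invented purely for illustration, and a practical analysis would of course use a general inter-object dissimilarity rather than one-dimensional distance.

```python
# Hypothetical 1-D "objects" with an obvious two-group structure.
points = [0.0, 0.2, 0.3, 5.0, 5.1, 5.4]

# Minimal single-linkage agglomerative clustering: repeatedly merge the
# two clusters whose closest members are nearest, until k clusters remain.
def single_linkage(xs, k):
    clusters = [[i] for i in range(len(xs))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(abs(xs[i] - xs[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]

print(single_linkage(points, 2))  # the two "blindly" recovered groups
```

The successive merges trace out precisely the tree structure discussed above; cutting the tree at a chosen level yields the typology to be explained.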
Although we have used a title for this chapter which often has a medical connotation, the problem arises in many other fields – for example, in the diagnosis of a fault in a complex industrial process, or in categorising an archaeological or anthropological specimen. From an expository point of view, however, the nature of a diagnostic problem is most easily described, and the corresponding theory best developed, within the context of a specific situation. For this purpose we have selected a medical problem concerning the differential diagnosis of three forms or types of a particular syndrome on the basis of two diagnostic tests or observable features. We have deliberately selected this three-type, two-feature problem because it allows the maximum exploitation of diagrammatic means of expressing concepts and analyses. All the concepts and analyses carry over straightforwardly into higher-dimensional problems. Indeed, the introductory illustrative problem which we now present is a subproblem extracted from a larger real one.
Example 11.1
Differential diagnosis of Cushing's syndrome. Cushing's syndrome is a rare hypertensive disorder associated with the over-secretion of cortisol by the adrenal cortex. For illustrative purposes we confine ourselves here to three ‘types’ of the syndrome, those types in which the cause of this over-secretion is actually within the adrenal gland itself. The types are
a: adenoma,
b: bilateral hyperplasia,
c: carcinoma,
and we investigate the possibilities of distinguishing the types on the basis of two observable ‘features’, the determination by paper chromatography of the urinary excretion rates (mg/24 h) of two steroid metabolites, tetrahydrocortisone and pregnanetriol.
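The diagnostic calculation that such a three-type, two-feature problem calls for can be sketched in Python as follows. The type means, common covariance matrix and prior type probabilities below are invented for illustration and are not the values estimated from the Cushing's syndrome data; the features are taken, as is common for excretion rates, on a log scale.

```python
import numpy as np

# Invented illustration: bivariate normal within-type models with a common
# covariance, and prior probabilities for the three adrenal types.
types = ["adenoma", "bilateral hyperplasia", "carcinoma"]
means = {"adenoma": np.array([1.0, -1.5]),   # (log tetrahydrocortisone, log pregnanetriol)
         "bilateral hyperplasia": np.array([2.0, 0.0]),
         "carcinoma": np.array([3.0, 1.0])}
cov = np.array([[0.5, 0.1], [0.1, 0.5]])     # common within-type covariance
prior = {"adenoma": 0.3, "bilateral hyperplasia": 0.6, "carcinoma": 0.1}

def mvn_density(x, mu, cov):
    # Bivariate normal density at x.
    d = x - mu
    norm = 1.0 / (2 * np.pi * np.sqrt(np.linalg.det(cov)))
    return norm * np.exp(-0.5 * d @ np.linalg.inv(cov) @ d)

def diagnose(x):
    # Bayes' theorem: p(type | x) is proportional to p(type) * p(x | type).
    joint = {t: prior[t] * mvn_density(x, means[t], cov) for t in types}
    total = sum(joint.values())
    return {t: joint[t] / total for t in types}

posterior = diagnose(np.array([1.8, -0.2]))  # features of a new patient
print(posterior)
```

The posterior type probabilities summarise all that the two features say about the patient; in two dimensions the regions of Y where each type has greatest posterior probability can be displayed diagrammatically, which is the attraction of the three-type, two-feature setting.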
An essential feature of statistical prediction analysis is that it involves two experiments e and f. From the information which we gain from a performance of e, the informative experiment, we wish to make some reasoned statement concerning the performance of f, the future experiment. In order that e should provide information on f there must be some link between these two experiments. Throughout this book we shall deal with problems where this link is through the indexing parameter of the two experiments e and f, and so we make the following assumption.
Assumption 1 The class of probability models which form the possible descriptions of e and the class of possible models for f have the same index set Θ, and the true models have the same (though unknown) index θ*.
A further general feature of all the problems we shall consider is contained in the following independence assumption.
Assumption 2 For given index θ the experiments e and f are independent.
By adopting this second assumption we deliberately exclude a range of prediction problems in which f is a continuation of some stochastic process of which e records a realisation to date. Techniques such as forecasting by exponential weighting, linear least squares prediction and time series analysis are thus outside the scope of this book.
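A minimal numerical sketch of Assumptions 1 and 2, with invented numbers: e and f are binomial experiments sharing the success probability θ as their common index, and are independent given θ, so the predictive density for f is the binomial model averaged over the posterior density of θ. With a Beta(a, b) prior this average is available in closed form (the beta-binomial).

```python
from math import comb, lgamma, exp

# Invented illustration: e observes x = 7 successes in n = 10 Bernoulli
# trials with unknown success probability theta (the shared index theta*);
# f will consist of m = 5 further trials, independent of e given theta.
n, x = 10, 7
m = 5
a, b = 1.0, 1.0   # uniform Beta(1, 1) prior density p(theta)

def log_beta(p, q):
    return lgamma(p) + lgamma(q) - lgamma(p + q)

def predictive(y):
    # p(y | x) = C(m, y) * B(x + a + y, n - x + b + m - y) / B(x + a, n - x + b),
    # obtained by integrating Bin(y | m, theta) against the Beta posterior.
    return comb(m, y) * exp(log_beta(x + a + y, n - x + b + m - y)
                            - log_beta(x + a, n - x + b))

probs = [predictive(y) for y in range(m + 1)]
print(probs)       # beta-binomial predictive distribution for y
print(sum(probs))  # sums to 1
```

The link through θ is essential: if e carried no information about the index of f, the predictive distribution would simply reproduce the prior guess.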
To give some idea of the wide applicability of statistical prediction analysis as defined above and to motivate the development of appropriate theory we devote the remainder of this chapter to the presentation of specific prediction problems. All these problems are later analysed and extended in the sections indicated in the text.
In a book of this size, the myriad of possible statistical sampling inspection procedures forces us to consider in detail only a very small selection; we would need a separate book to do justice to the huge variety of plans. In this chapter, therefore, we show how some standard plans come within the framework of decisive prediction, and how the framework can readily cope with less standard problems. The application of prediction theory to this area will provide some additional justification and motivation for some of these plans. We hope that those selected will be sufficient to indicate the direction of analysis to any reader with a specific problem.
We consider both fixed-sample-size and sequential sampling schemes. Wetherill (1966) and Wetherill and Campling (1966) also provide a decision theory approach to sampling inspection but do not consider predictive distributions.
Fixed-size single-sample destructive testing
We consider first a fixed-size single-sample plan for deciding whether to accept or reject a batch. For a process which produces an item at each of a number of independent operations we may imagine as our basic future experiment the determination of the quality y of a single item. This quality y may be a simple counting variable taking the value 1 for an effective and 0 for a defective item, or may be more sophisticated, for example the lifetime of a component or the degree of purity of a chemical preparation. We suppose that the probabilistic mechanism which describes the production of the variable y is a density function p(y|θ) on Y where, as in previous work, θ is an indexing parameter with density function p(θ).
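Under these assumptions the acceptance decision can be sketched numerically. Here the quality y is taken to be an exponential lifetime with rate θ, given a conjugate gamma prior density p(θ); the prior constants, the destructive-test data and the acceptance rule are all invented for illustration.

```python
# Invented illustration: lifetimes y are exponential with rate theta, and
# theta has a Gamma(alpha, beta) prior density p(theta). After destructively
# testing n items with total lifetime s, the gamma posterior integrates
# against the exponential model to give the predictive survivor function
#   P(y > t | data) = ((beta + s) / (beta + s + t)) ** (alpha + n).
alpha, beta = 2.0, 100.0                      # prior: rate around alpha/beta
lifetimes = [45.0, 80.0, 120.0, 60.0, 95.0]   # hypothetical test data (hours)
n, s = len(lifetimes), sum(lifetimes)

def predictive_survival(t):
    return ((beta + s) / (beta + s + t)) ** (alpha + n)

# A simple acceptance rule: accept the batch if a future item is predicted
# to survive 50 hours with probability at least 0.5.
p50 = predictive_survival(50.0)
accept = p50 >= 0.5
print(p50, accept)
```

Because the items tested are destroyed, the decision about the remainder of the batch rests entirely on this predictive distribution for a future, untested item.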
When a treatment is applied to an object or individual it is with the express purpose of altering the future of that object or individual. Thus when we choose one of a number of possible refining processes for a batch of raw material we intend that the batch will in the future attain some desirable quality. When we select a method of machining an industrial component we have in mind some future characteristic of the component. When we prescribe a particular treatment for a patient we hope that some specific aspect of his future condition will be more agreeable than his present state of disease. Because of this preoccupation with the future state of an object or individual it will not be surprising to find that statistical prediction analysis has an important role to play in the problem of treatment allocation.
In the examples already mentioned there are three basic sets which must clearly play an important role. First we suppose that the present state or indicator t of the individual unit under consideration belongs to some specifiable set T of possible initial states or indicators. Secondly, there is some set A of possible treatments from which we have to select a treatment a to apply to the individual unit. Thirdly we must to some extent assess the effectiveness of treatment in terms of the future state or response y attained by the unit after application of treatment; we thus have to be in a position to envisage the set Y of possible future states or responses.
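The interplay of the three sets can be sketched as follows; the present states, treatments, responses, predictive probabilities and utilities below are all invented for illustration, and the rule shown simply picks the treatment maximising expected utility over the future state.

```python
# Invented illustration of the three basic sets.
T = ["mild", "severe"]            # possible present states t
A = ["drug", "surgery"]           # possible treatments a
Y = ["recovered", "unchanged"]    # possible future states (responses) y

# p(y | t, a): predictive probability of each response given state and treatment.
p = {("mild", "drug"):      {"recovered": 0.8, "unchanged": 0.2},
     ("mild", "surgery"):   {"recovered": 0.9, "unchanged": 0.1},
     ("severe", "drug"):    {"recovered": 0.3, "unchanged": 0.7},
     ("severe", "surgery"): {"recovered": 0.7, "unchanged": 0.3}}

# U(a, y): utility of applying treatment a and obtaining response y
# (surgery carries a fixed cost, reflected in its lower utilities).
U = {("drug", "recovered"): 10.0, ("drug", "unchanged"): 0.0,
     ("surgery", "recovered"): 7.0, ("surgery", "unchanged"): -3.0}

def best_treatment(t):
    # Choose the treatment in A maximising expected utility over Y.
    expected = {a: sum(p[(t, a)][y] * U[(a, y)] for y in Y) for a in A}
    return max(expected, key=expected.get), expected

print(best_treatment("mild"))
print(best_treatment("severe"))
```

Note how the allocation depends on the present state t: the cheaper treatment suffices for the mild state, while the severe state justifies the costlier one.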
If we are asked to predict the outcome of a performance of a future experiment f our answer will clearly depend on how we view the consequences of being wrong. More specifically we may attempt to assess the relative consequences of being ‘close’ to the realised outcome and of being ‘badly’ wrong. If we can quantify these visualised consequences then we can present the problem as one of statistical decision theory. Since in constructing the predictive density function p(y|x) we have already carried out the information-extraction part of the problem we have a particularly simple confrontation in this decision problem. The components are as follows:
(i) Parameter set. The unknown outcome of the future experiment f plays the role of an unknown state of nature, so that Y, the sample space of f, is the parameter set of the statistical decision problem. Our assessment of the plausibility of a particular y at the time of making a decision is p(y|x), the predictive density at y.
(ii) Action set. The set A of possible actions is simply a reproduction of Y, since any element of Y is a possible prediction a.
(iii) Utility function. Associated with each prediction or action a and each realisable outcome y there is a utility or value U(a, y). We thus suppose defined a function U on the product domain A × Y.
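With the three components in place, the optimal prediction maximises the expected utility of a over the predictive density. The discrete predictive density below is invented for illustration; for the quadratic utility U(a, y) = −(a − y)² the maximising prediction is the predictive mean, which the grid search recovers numerically.

```python
import numpy as np

# Invented discrete predictive density p(y | x) over the sample space Y.
ys = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
py = np.array([0.1, 0.2, 0.4, 0.2, 0.1])    # predictive probabilities

def expected_utility(a, utility):
    # E[U(a, y)] with y distributed according to the predictive density.
    return sum(p * utility(a, y) for y, p in zip(ys, py))

quad = lambda a, y: -(a - y) ** 2            # quadratic utility U(a, y)

# The action set A reproduces Y; search a fine grid of candidate predictions.
grid = np.linspace(0.0, 4.0, 401)
best = max(grid, key=lambda a: expected_utility(a, quad))
print(best, float(ys @ py))                  # both equal the predictive mean
```

A different utility changes the answer: under U(a, y) = −|a − y| the optimum is a median of the predictive distribution rather than its mean, which is why the utility function must be specified before a “best” prediction can be named.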
Prediction by its derivation (L. praedicere, to say before) means literally the stating beforehand of what will happen at some future time. It is an occupational hazard of many professions: meteorologist, doctor, economist, market researcher, engineering designer, politician and pollster. It is indeed a precarious game because any specific prediction can eventually be compared with the actuality. Many prophets of doom predicting that the world will end at 12.30 on 7 May are left in quieter mood by 12.31. Prediction is a problem simply because of the presence of uncertainty. Seldom, if ever, is it a case of logical deduction; almost inevitably it is a matter of induction or inference. Probabilistic and statistical tools are therefore necessary components of any scientific approach to the formalisation of prediction problems.
In this book we shall be concerned not only with prediction in this narrow sense of making a reasoned statement about what is likely to happen in some future situation, but also with a much wider class of problems. Any inferential problem whose solution depends on our envisaging some future occurrence will be termed a problem of statistical prediction analysis. The presentation in chapter 1 of a selection of motivating examples illustrates the nature and diversity of statistical prediction analysis, and serves as an introduction to the ingredients of the problem.
A science historian, writing on the development of the concepts and practice of prediction, would probably start by pointing out how primitive man was compelled to attempt prediction, for example the forecasting of the date on which the local river would flood.