When an inquirer seeks to improve his current state of full belief, the legitimacy of the alteration made depends on the aims of the inquirer. There are many kinds of aims inquirers might and do have in altering their full beliefs. These aims need not be economic, political, moral, or aesthetic. Cognitive aims may be pursued as well. The kind of cognitive aim that, in my opinion, does best in rationalizing scientific practice is one that seeks, on the one hand, to avoid error and, on the other, to obtain valuable information. Whether inquirers always seek error-free information or not need not concern us here. I rest content for the present with making the claim that agents can coherently pursue cognitive aims of this kind.
A consequence of this view is that states of full belief should be classifiable as error free or erroneous. Otherwise it makes little sense for an inquirer to seek to avoid error in changing his or her state of full belief. Likewise states of full belief should be classifiable as stronger or weaker; for those who seek valuable information should never prefer weaker states of full belief to stronger ones.
The two classifications are interrelated. If state 1 is stronger than state 2 and state 1 is error free, then state 2 is error free as well; and if state 2 is erroneous, so is state 1.
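The interrelation can be made vivid with a toy model, on the assumption (ours, not the author's) that a state of full belief is represented as a set of atomic claims, that "stronger" means believing strictly more, and that a state is error free when everything it contains is true:

```python
def stronger(state1, state2):
    """State 1 is stronger than state 2 if it entails everything state 2 does, and more."""
    return state2 < state1  # proper subset: state 1 believes strictly more

def error_free(state, truths):
    """A state is error free when every claim it contains is true."""
    return state <= truths

truths = {"p", "q", "r"}
state1 = {"p", "q"}   # the stronger state
state2 = {"p"}        # the weaker state

assert stronger(state1, state2)
# If the stronger state is error free, so is the weaker one.
assert error_free(state1, truths) and error_free(state2, truths)

state3 = {"p", "x"}   # erroneous: "x" is false
# If the weaker state is erroneous, any stronger state is erroneous too.
assert not error_free(state3, truths)
assert not error_free(state3 | {"q"}, truths)
```

The two assertions at the end mirror the two directions of the claim in the text.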
Rational agents and inquirers are sometimes capable of evaluating their beliefs, values, and actions with respect to their coherence or consistency – that is, with respect to prescriptive standards of rationality. But this is not always so. Often the predicaments faced are too complex or the time available before a judgment is to be made too short or the cost of deliberation too great for even the most intelligent and well balanced agents to engage in such self-criticism. And even when these considerations present no obstacle, emotional difficulties and indolence may impede this type of activity.
Because agents sometimes can and sometimes cannot fulfill the demands of reason by themselves, they need help. Education in logic and mathematics can contribute. Practical training in various types of deliberation is often useful. Whether current forms of psychotherapy are of value is a matter of dispute, but their chief importance for us lies in their claim or promise to provide such help. The same is true of reading good literature.
We also use prosthetic devices when they become available. We write notes to ourselves when we don't trust our memories. We consult handbooks as stores of information and resources for how to make exact or approximate calculations. And more recently we have become fond of using the products of the burgeoning technologies that furnish us with calculators and other automata that enhance our capacity to engage in self-criticism.
I make these banal observations to emphasize a simple but important point.
The conclusion of an inductive inference is a coming to full belief. It entails a change in the agent's state of full belief and, in this respect, in the agent's doxastic commitment. Or, if the reasoning is suppositional, it is a full belief conditional on an inductively extended supposition. In both cases, the induction is a transformation of one corpus of full belief into another. If default reasoning is a species of nonmonotonic inductive inference, the conclusion of a default inference must be representable by such a transformation.
It is far from clear, however, that students of default reasoning always understand the conclusions of default inference to be held so confidently. The inquirer might judge the conclusion as worthy of belief to some degree without coming to endorse it fully. In this sense, nonmonotonic reasoning need not involve inductive expansion to a new corpus of full beliefs.
What, however, is to be meant by a degree of belief or beliefworthiness? The currently fashionable view is that degrees of belief are to be interpreted as credal probabilities used in evaluating the expected values of options in decision problems.
There are alternative proposals. In particular, measures having the formal properties first identified by G. L. S. Shackle (1949), and embedded in the b-functions described in section 6.7, can themselves be interpreted as degrees of belief. Moreover, there are diverse proposals for ordinal or quantitative indices of beliefworthiness or inductive support, such as the difference between a “posterior” probability and a “prior” probability, the ratio of a posterior to a prior, or the logarithm of such a ratio.
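The three support indices just mentioned are simple arithmetic on a prior and a posterior probability. A minimal sketch, with invented numbers chosen only for illustration:

```python
import math

def support_indices(prior, posterior):
    """Three common indices of how far evidence has raised a hypothesis:
    the posterior-minus-prior difference, the posterior/prior ratio,
    and the logarithm of that ratio."""
    return {
        "difference": posterior - prior,
        "ratio": posterior / prior,
        "log_ratio": math.log(posterior / prior),
    }

# A hypothesis whose probability rises from 0.2 to 0.6 on the evidence.
s = support_indices(prior=0.2, posterior=0.6)
print(s)  # difference ~ 0.4, ratio ~ 3.0, log_ratio ~ 1.10
```

Note that the difference and the ratio can rank the same pair of hypotheses differently, which is one reason the choice among such indices is contested.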
A great deal of hyperbole has been devoted to neural networks, both in their first wave around 1960 (Widrow & Hoff, 1960; Rosenblatt, 1962) and in their renaissance from about 1985 (chiefly inspired by Rumelhart & McClelland, 1986), but the ideas of biological relevance seem to us to have detracted from the essence of what is being discussed, and are certainly not relevant to practical applications in pattern recognition. Because ‘neural networks’ has become a popular subject, it has collected many techniques which are only loosely related and were not originally biologically motivated. In this chapter we will discuss the core area of feed-forward or ‘back-propagation’ neural networks, which can be seen as extensions of the ideas of the perceptron (Section 3.6). From this connection, these networks are also known as multi-layer perceptrons.
A formal definition of a feed-forward network is given in the glossary. Informally, they have units which have one-way connections to other units, and the units can be labelled from inputs (low numbers) to outputs (high numbers) so that each unit is only connected to units with higher numbers. The units can always be arranged in layers so that connections go from one layer to a later layer. This is best seen graphically; see Figure 5.1.
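The informal definition can be illustrated with a very small network of this kind: two inputs feed two hidden units, which feed one output, with every connection running from a lower-numbered layer to a higher one. The weights below are hand-picked (our invention, not from the text) so that the network computes exclusive-or:

```python
import math

def logistic(x):
    """The logistic activation function used at each unit."""
    return 1.0 / (1.0 + math.exp(-x))

def forward(inputs, layers):
    """Propagate inputs through successive layers; each layer is a pair
    (weight matrix, bias vector), and connections only run forward."""
    activations = list(inputs)
    for weights, biases in layers:
        activations = [
            logistic(sum(w * a for w, a in zip(row, activations)) + b)
            for row, b in zip(weights, biases)
        ]
    return activations

# Hidden layer: one unit acting as OR, one as NAND; output unit acts as AND.
xor_net = [
    ([[20.0, 20.0], [-20.0, -20.0]], [-10.0, 30.0]),  # hidden layer
    ([[20.0, 20.0]], [-30.0]),                        # output layer
]

for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    out = forward([a, b], xor_net)[0]
    print(a, b, round(out))  # rounds to a XOR b
```

The point of the example is structural: the computation visits the units strictly in layer order, exactly as the informal definition requires.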
Pattern recognition has a long and respectable history within engineering, especially for military applications, but the cost of the hardware both to acquire the data (signals and images) and to compute the answers made it for many years a rather specialist subject. Hardware advances have made the concerns of pattern recognition of much wider applicability. In essence it covers the following problem:
‘Given some examples of complex signals and the correct decisions for them, make decisions automatically for a stream of future examples.’
There are many examples from everyday life:
Name the species of a flowering plant.
Grade bacon rashers from a visual image.
Classify an X-ray image of a tumour as cancerous or benign.
Decide to buy or sell a stock option.
Give or refuse credit to a shopper.
Many of these are currently performed by human experts, but it is increasingly becoming feasible to design automated systems to replace the expert and either perform better (as in credit scoring) or ‘clone’ the expert (as in aids to medical diagnosis).
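The problem statement above can be made concrete with the simplest possible automatic decision rule: assign each future example the class of its nearest training example. The two-dimensional measurements below are invented for illustration:

```python
def nearest_neighbour(train, point):
    """Classify `point` by the label of the closest training example."""
    def dist2(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))
    features, label = min(train, key=lambda ex: dist2(ex[0], point))
    return label

# Training set: (measurements, correct decision) pairs supplied in advance.
train = [((1.0, 1.0), "class A"), ((1.2, 0.8), "class A"),
         ((3.0, 3.2), "class B"), ((2.8, 3.0), "class B")]

# Decisions for a stream of future examples.
print(nearest_neighbour(train, (1.1, 0.9)))  # class A
print(nearest_neighbour(train, (3.1, 3.1)))  # class B
```

Crude as it is, this rule already has the shape demanded by the problem: it is built once from labelled examples and then applied mechanically to new signals.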
Neural networks have arisen from analogies with models of the way that humans might approach pattern recognition tasks, although they have developed a long way from the biological roots. Great claims have been made for these procedures, and although few of these claims have withstood careful scrutiny, neural network methods have had great impact on pattern recognition practice.
Tree-based methods for classification are relatively unfamiliar in both statistics and pattern recognition, yet they are widely used in some applications such as botany (Figure 7.1) and medical diagnosis, because they are extremely easy to comprehend (and hence to have confidence in).
The automatic construction of decision trees dates from work in the social sciences by Morgan & Sonquist (1963) and Morgan & Messenger (1973). (Later work such as Doyle, 1973, and Doyle & Fenwick, 1975, commented on the pitfalls of such automated procedures.) In statistics Breiman et al. (1984) had a seminal influence both in bringing the work to the attention of statisticians and in proposing new algorithms for constructing trees. At around the same time decision tree induction was beginning to be used in the field of machine learning, which we review in Section 7.4, and in engineering (for example, Sethi & Sarvarayudu, 1982).
The terminology of trees is graphic, although conventionally trees such as Figure 7.2 are shown growing down the page. The root is the top node, and examples are passed down the tree, with decisions being made at each node until a terminal node or leaf is reached. Each non-terminal node contains a question on which a split is based. Each leaf contains the label of a classification. A subtree of T is a tree with root a node of T; it is a rooted subtree if its root is the root of T.
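The terminology can be sketched directly in code, assuming (our representation, not the book's) that each non-terminal node holds a yes/no question and two subtrees, while each leaf holds a class label; classification passes an example down from the root until a leaf is reached:

```python
class Node:
    """A node of a decision tree: either a question with two subtrees,
    or a leaf carrying a classification label."""
    def __init__(self, question=None, yes=None, no=None, label=None):
        self.question = question  # function of an example; None at a leaf
        self.yes, self.no = yes, no
        self.label = label        # classification stored at a leaf

def classify(tree, example):
    """Pass the example down from the root until a terminal node is reached."""
    while tree.label is None:
        tree = tree.yes if tree.question(example) else tree.no
    return tree.label

# A tiny botanical-flavoured tree; the thresholds are invented.
tree = Node(
    question=lambda x: x["petal_length"] < 2.5,
    yes=Node(label="species 1"),
    no=Node(
        question=lambda x: x["petal_width"] < 1.8,
        yes=Node(label="species 2"),
        no=Node(label="species 3"),
    ),
)

print(classify(tree, {"petal_length": 1.4, "petal_width": 0.2}))  # species 1
print(classify(tree, {"petal_length": 5.0, "petal_width": 2.1}))  # species 3
```

In this representation, any `Node` of the tree is itself the root of a subtree, matching the definition of a rooted subtree in the text.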
The supervised methods considered so far have learnt both the structure of the probability distributions and the numerical values from the training set, or in the case of parametric methods, imposed a conventional structure for convenience. Other methods incorporate non-numerical ‘real-world’ knowledge about the subject domain into the structure of the probability distributions. Such knowledge is often about causal relationships, or perhaps the lack of causality as expressed by conditional independence.
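Conditional independence, the kind of structural knowledge just mentioned, can be checked numerically: A and B are conditionally independent given C when P(a, b | c) = P(a | c) P(b | c) for every combination of values. The distribution below is invented, and built from this very factorisation so that the property holds:

```python
from itertools import product
import math

p_c = {0: 0.5, 1: 0.5}   # P(C = c)
p_a = {0: 0.3, 1: 0.8}   # P(A = 1 | C = c)
p_b = {0: 0.2, 1: 0.6}   # P(B = 1 | C = c)

def bern(p, x):
    """Probability that a Bernoulli(p) variable takes value x (0 or 1)."""
    return p if x == 1 else 1.0 - p

# Joint P(A = a, B = b, C = c), factorised so that A and B are
# independent given C -- the structure encodes the "real-world" knowledge.
joint = {(a, b, c): p_c[c] * bern(p_a[c], a) * bern(p_b[c], b)
         for a, b, c in product((0, 1), repeat=3)}

def conditional(a, b, c):
    """P(A = a, B = b | C = c) read off the joint table."""
    return joint[(a, b, c)] / p_c[c]

# The factorisation P(a, b | c) = P(a | c) P(b | c) holds in every cell.
for a, b, c in product((0, 1), repeat=3):
    assert math.isclose(conditional(a, b, c), bern(p_a[c], a) * bern(p_b[c], b))
```

The practical gain is economy: the full joint table has 8 cells, but the factorised form needs only the 6 numbers above, and the saving grows rapidly with the number of variables.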
These ideas have been most explored within the field of expert systems. This is a loosely defined area, and definitions vary:
‘The label “expert system” is, broadly speaking, a program intended to make reasoned judgements or to give assistance in a complex area in which human skills are fallible or scarce. …’
(Lauritzen & Spiegelhalter, 1988, p. 157)
‘A program designed to solve problems at a level comparable to that of a human expert in a given domain.’ (Cooper, 1989)
‘An expert system has two parts. The first one is the knowledge base. It usually makes up most of the system. In its simplest form it is a list of IF … THEN rules: each specifies what to do, or what conclusions to draw, under a set of well-defined circumstances’.
The second part of the expert system often goes under the name of “shell”. As the name implies, it acts as a receptacle for the knowledge base and contains instruments for making efficient use of it.
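In its simplest form, the division just described can be sketched as a list of IF … THEN rules (the knowledge base) together with a small forward-chaining loop (the shell) that fires any rule whose conditions are established until no new conclusions follow. The rules and facts are invented for illustration:

```python
# Knowledge base: each rule is (set of conditions, conclusion).
rules = [
    ({"has feathers"}, "is a bird"),
    ({"is a bird", "cannot fly"}, "may be a penguin"),
]

def shell(facts, rules):
    """A minimal shell: repeatedly fire any rule whose conditions are all
    established, until no rule adds a new conclusion."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            if conditions <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts

print(sorted(shell({"has feathers", "cannot fly"}, rules)))
```

The separation matters in practice: the knowledge base can be revised by a domain expert without touching the shell, and the same shell can serve quite different domains.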