In days of yore, logic was neatly divided into two parts, Deductive Logic and Inductive Logic (Mill 1949/1843). The two parts were often taught as parts of a single course. Inductive logic has faded away, and now the very term has acquired a slightly antiquated patina. It has also acquired a number of quite specific modern meanings.
One very narrow and very specific meaning is that inductive inference is the inference from a set of observations or observation sentences (crow #1 is black; crow #2 is black; …; crow #n is black) to their universal generalization (all crows are black). There is not much logic here, but there is a big problem: to determine when, if ever, such an inference is “justified.”
A somewhat less ambitious construal of “induction” is as the inference from a statistical sample to a statistical generalization, or an approximate statistical generalization: from “51% of the first 10,000 tosses of this coin yielded heads,” to “Roughly half of the tosses of the coin will, in the long run, yield heads.” So construed (as by Baird 1992), the line between the logic of induction and the mathematics of statistics is a bit vague. This has been of little help to inductive logic, since the logic of statistical inference has itself been controversial.
THE HORSE OR THE CART?
John Maynard Keynes (Keynes 1952) proposed that probability should be legislative for rational belief. He also proposed that probabilities should form only a partial order: There were to be incomparable pairs of probabilities where the first is not larger than the second, the second not larger than the first, yet the two probabilities are not equal.
Frank Plumpton Ramsey objected (Ramsey 1931), quite correctly, that any such scheme depended on being able to relate beliefs and probabilities. He disregarded the second proposal, and so took probabilities to be numbers, so that what he took to be necessary was a way of measuring degrees of belief.
Ramsey offered a somewhat naïve operational way of measuring beliefs. He himself took it to be no more than approximate: “I have not worked out the mathematical logic of this in detail, because this would, I think, be rather like working out to seven places of decimals a result only valid to two” (ibid., p. 180). What was important about Ramsey's proposal was that it also suggested why beliefs (assuming their measurability) should satisfy the probability calculus.
Ramsey's approach became the model for later “subjectivistic” approaches. First, we think about ways in which to measure degrees of belief; second, we consider why those degrees should satisfy the probability calculus; and third, we consider how those probabilities should be updated in the light of new evidence.
Classical logic—including first order logic, which we studied in Chapter 2—is concerned with deductive inference. If the premises are true, the conclusions drawn using classical logic are always also true. Although this kind of reasoning is not inductive, in the sense that any conclusion we can draw from a set of premises is already “buried” in the premises themselves, it is nonetheless fundamental to many kinds of reasoning tasks. In addition to the study of formal systems such as mathematics, in other domains such as planning and scheduling a problem can in many cases also be constrained to be mainly deductive.
Because of this pervasiveness, many logics for uncertain inference incorporate classical logic at the core. Rather than replacing classical logic, we extend it in various ways to handle reasoning with uncertainty. In this chapter, we will study a number of these formalisms, grouped under the banner nonmonotonic reasoning. Monotonicity, a key property of classical logic, is given up, so that an addition to the premises may invalidate some previous conclusions. This models our experience: the world and our knowledge of it are not static; often we need to retract some previously drawn conclusion on learning new information.
Logic and (Non)monotonicity
One of the main characteristics of classical logic is that it is monotonic, that is, adding more formulas to the set of premises does not invalidate the proofs of the formulas derivable from the original premises alone. In other words, a formula that can be derived from the original premises remains derivable in the expanded premise set.
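Monotonicity can be checked mechanically on a small propositional example. The following is a minimal sketch, not a general theorem prover: it tests semantic entailment by enumerating truth assignments, which for classical propositional logic agrees with derivability. The atoms p, q, r and the helper name `entails` are illustrative assumptions.

```python
from itertools import product

def entails(premises, conclusion, atoms):
    """Classical entailment: the conclusion is true in every
    valuation that makes all of the premises true."""
    for values in product([False, True], repeat=len(atoms)):
        v = dict(zip(atoms, values))
        if all(prem(v) for prem in premises) and not conclusion(v):
            return False
    return True

# Hypothetical atoms and formulas, each formula a function of a valuation.
atoms = ["p", "q", "r"]
p = lambda v: v["p"]
p_implies_q = lambda v: (not v["p"]) or v["q"]
q = lambda v: v["q"]
r = lambda v: v["r"]
not_q = lambda v: not v["q"]

base = [p, p_implies_q]

print(entails(base, q, atoms))            # q follows from the original premises
print(entails(base + [r], q, atoms))      # and still follows after adding r
print(entails(base + [not_q], q, atoms))  # and even after adding the negation of q
```

The last call illustrates the price of monotonicity: adding a premise that contradicts the conclusion does not retract it; the premise set merely becomes inconsistent and entails everything.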
This book is the outgrowth of an effort to provide a course covering the general topic of uncertain inference. Philosophy students have long lacked a treatment of inductive logic that was acceptable; in fact, many professional philosophers would deny that there was any such thing and would replace it with a study of probability. Yet, there seems to many to be something more traditional than the shifting sands of subjective probabilities that is worth studying. Students of computer science may encounter a wide variety of ways of treating uncertainty and uncertain inference, ranging from nonmonotonic logic to probability to belief functions to fuzzy logic. All of these approaches are discussed in their own terms, but it is rare for their relations and interconnections to be explored. Cognitive science students learn early that the processes by which people make inferences are not quite like the formal logic processes that they study in philosophy, but they often have little exposure to the variety of ideas developed in philosophy and computer science. Much of the uncertain inference of science is statistical inference, but statistics rarely enter directly into the treatment of uncertainty to which any of these three groups of students are exposed.
At what level should such a course be taught? Because a broad and interdisciplinary understanding of uncertainty seemed to be just as lacking among graduate students as among undergraduates, and because without assuming some formal background all that could be accomplished would be rather superficial, the course was developed for upper-level undergraduates and beginning graduate students in these three disciplines. The original goal was to develop a course that would serve all of these groups.
In Chapter 3, we discussed the axioms of the probability calculus and derived some of its theorems. We never said, however, what “probability” meant. From a formal or mathematical point of view, there was no need to: we could state and prove facts about the relations among probabilities without knowing what a probability is, just as we can state and prove theorems about points and lines without knowing what they are. (As Bertrand Russell said [Russell, 1901, p. 83] “Mathematics may be defined as the subject where we never know what we are talking about, nor whether what we are saying is true.”)
Nevertheless, because our goal is to make use of the notion of probability in understanding uncertain inference and induction, we must be explicit about its interpretation. There are several reasons for this. In the first place, if we are hoping to follow the injunction to believe what is probable, we have to know what is probable. There is no hope of assigning values to probabilities unless we have some idea of what probability means. What determines those values? Second, we need to know what the import of probability is for us. How is it supposed to bear on our epistemic states or our decisions? Third, what is the domain of the probability function? In the last chapter we took the domain to be a field, but that merely assigns structure to the domain: it doesn't tell us what the domain objects are.
There is no generally accepted interpretation of probability.
We have abandoned many of the goals of the early writers on induction. Probability has told us nothing about how to find interesting generalizations and theories, and, although Carnap and others had hoped otherwise, it has told us nothing about how to measure the support for generalizations other than approximate statistical hypotheses. Much of uncertain inference has yet to be characterized in the terms we have used for statistical inference. Let us take a look at where we have arrived so far.
Our overriding concern has been with objectivity. We have looked on logic as a standard of rational argument: Given evidence (premises), the validity (degree of entailment) of a conclusion should be determined on logical grounds alone. Given that the Hawks will win or the Tigers will win, and that the Tigers will not win, it follows that the Hawks will win. Given that 10% of a large sample of trout from Lake Seneca have shown traces of mercury, and that we have no grounds for impugning the fairness of the sample, it follows with a high degree of validity that between 8% and 12% of the trout in the lake contain traces of mercury.
The parallel is stretched only at the point where we include among the premises “no grounds for impugning. …” It is this that is unpacked into a claim about our whole body of knowledge, and embodied in the constraints discussed in the last three chapters under the heading of “sharpening.”
We are now in a position to reap the benefits of the formal work of the preceding two chapters. The key to uncertain inference lies, as we have suspected all along, in probability. In Chapter 9, we examined a certain formal interpretation of probability, dubbed evidential probability, as embodying a notion of partial proof. Probability, on this view, is an interval-valued function. Its domain is a combination of elementary evidence and general background knowledge paired with a statement of our language whose probability concerns us, and its range is the set of subintervals of [0, 1]. It is objective. What this means is that if two agents share the same evidence and the same background knowledge, they will assign the same (interval) probabilities to the statements of their language. If they share an acceptance level 1 – α for practical certainty, they will accept the same practical certainties.
It may be that no two people share the same background knowledge and the same evidence. But in many situations we come close. As scientists, we tend to share each other's data. Cooked data is sufficient to cause expulsion from the ranks of scientists. (This is not the same as data containing mistakes; one of the virtues of the system developed here is that no data need be regarded as sacrosanct.) With regard to background knowledge, if we disagree, we can examine the evidence at a higher level: is the item in question highly probable, given that evidence and our common background knowledge at that level?
There are a number of epistemological questions raised by this approach, and some of them will be dealt with in Chapter 12.
As Carnap points out [Carnap, 1950], some of the controversy concerning the support of empirical hypotheses by data is a result of the conflation of two distinct notions. One is the total support given a hypothesis by a body of evidence. Carnap's initial measure for this is his c*; this is intended as an explication of one sense of the ordinary language word “probability.” This is the sense involved when we say, “Relative to the evidence we have, the probability is high that rabies is caused by a virus.” The other notion is that of “support” in the active sense, in which we say that a certain piece of evidence supports a hypothesis, as in “The detectable presence of antibodies supports the viral hypothesis.” This does not mean that that single piece of evidence makes the hypothesis “highly probable” (much less “acceptable”), but that it makes the hypothesis more probable than it was. Thus, the presence of water on Mars supports the hypothesis that there was once life on Mars, but it does not make that hypothesis highly probable, or even more probable than not.
Whereas c*(h, e) is (for Carnap, in 1950) the correct measure of the degree of support of the hypothesis h by the evidence e, the increase of the support of h due to e given background knowledge b is the amount by which e increases the probability of h: c*(h, b ∧ e) – c*(h, b). We would say that e supports h relative to background b if this quantity is positive, and undermines h relative to b if this quantity is negative.
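The difference between total support and increase of support can be made concrete in a toy language. The sketch below assumes a hypothetical language with one predicate F and three individuals, and builds the underlying measure m* in the usual textbook way: each structure description (here, each possible number of F's) gets equal weight, shared equally among its state descriptions. The names `m_star`, `c_star`, `h`, `b`, and `b_and_e` are illustrative, not taken from the text.

```python
from fractions import Fraction
from itertools import product
from math import comb

N = 3  # hypothetical toy language: individuals a1, a2, a3 and one predicate F

def m_star(state):
    """Weight of a state description (a tuple of 0/1 values, one per individual):
    1/(N+1) for its structure description, split equally among the
    comb(N, k) state descriptions that have the same number k of F's."""
    k = sum(state)
    return Fraction(1, N + 1) / comb(N, k)

def c_star(h, e):
    """Degree of confirmation of h on e: m*(h and e) / m*(e)."""
    states = list(product([0, 1], repeat=N))
    m_e = sum(m_star(s) for s in states if e(s))
    m_he = sum(m_star(s) for s in states if h(s) and e(s))
    return m_he / m_e

h = lambda s: s[2] == 1                      # hypothesis: a3 is F
b = lambda s: s[0] == 1                      # background: a1 is F
b_and_e = lambda s: s[0] == 1 and s[1] == 1  # background plus evidence: a2 is F

total_before = c_star(h, b)        # 2/3
total_after = c_star(h, b_and_e)   # 3/4
increase = total_after - total_before
print(total_before, total_after, increase)  # 2/3 3/4 1/12
```

Here e supports h relative to b, since the increase is positive, even though the total support of h on b and e together is only 3/4.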
Traditionally, logic has been regarded as the science of correct thinking or of making valid inferences. The former characterization of logic has strong psychological overtones—thinking is a psychological phenomenon—and few writers today think that logic can be a discipline that can successfully teach its students how to think, let alone how to think correctly. Furthermore, it is not obvious what “correct” thinking is. One can think “politically correct” thoughts without engaging in logic at all. We shall, at least for the moment, be well advised to leave psychology to one side, and focus on the latter characterization of logic: the science of making valid inferences.
To make an inference is to perform an act: It is to do something. But logic is not a compendium of exhortations: From “All men are mortal” and “Socrates is a man” do thou infer that Socrates is mortal! To see that this cannot be the case, note that “All men are mortal” has the implication that if Charles is a man, he is mortal, if John is a man, he is mortal, and so on, through the whole list of men, past and present, if not future. Furthermore, it is an implication of “All men are mortal” that if Fido (my dog) is a man, Fido is mortal; if Tabby is a man, Tabby is mortal, etc. And how about inferring “If Jane is a man, Jane is mortal”? As we ordinarily construe the premise, this, too, is a valid inference. We cannot follow the exhortation to perform all valid inferences: There are too many, they are too boring, and that, surely, is not what logic is about.
We form beliefs about the world, from evidence and inferences made from the evidence. Belief, as opposed to knowledge, consists of defeasible information. Belief is what we think is true, and it may or may not be true in the world. On the other hand, knowledge is what we are aware of as true, and it is always true in the world.
We make decisions and act according to our beliefs, yet they are not infallible. The inferences we base our beliefs on can be deductive or uncertain, employing any number of inference mechanisms to arrive at our conclusions, for instance, statistical, nonmonotonic, or analogical. We constantly have to modify our set of beliefs as we encounter new information. A new piece of evidence may complement our current beliefs, in which case we can hold on to our original beliefs in addition to this new evidence. However, because some of our beliefs can be derived from uncertain inference mechanisms, it is inevitable that we will at some point encounter some evidence that contradicts what we currently believe. We need a systematic way of reorganizing our beliefs, to deal with the dynamics of maintaining a reasonable belief set in the face of such changes.
The state of our beliefs can be modeled by a logical theory K, a deductively closed set of formulas. If a formula φ is considered accepted in a belief set, it is included in the corresponding theory K; if it is rejected, its negation ¬φ is in K. In general the theory is incomplete.
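The three possible attitudes toward a formula can be illustrated with a small sketch. It assumes a propositional language over two hypothetical atoms and represents the deductively closed theory K indirectly, by the set of valuations that satisfy a finite base; a formula belongs to K exactly when it is true in all of those valuations. The names `models` and `status` are illustrative.

```python
from itertools import product

ATOMS = ("p", "q")  # hypothetical atoms of a toy language

def models(base):
    """All valuations that satisfy every formula in the finite base."""
    valuations = [dict(zip(ATOMS, bits))
                  for bits in product([False, True], repeat=len(ATOMS))]
    return [v for v in valuations if all(f(v) for f in base)]

def status(base, phi):
    """Attitude toward phi in the theory K generated by the base."""
    ms = models(base)
    if all(phi(v) for v in ms):
        return "accepted"      # phi is in K
    if all(not phi(v) for v in ms):
        return "rejected"      # the negation of phi is in K
    return "undetermined"      # K contains neither: K is incomplete

p = lambda v: v["p"]
q = lambda v: v["q"]
not_p = lambda v: not v["p"]

base = [p]
print(status(base, p))      # accepted
print(status(base, not_p))  # rejected
print(status(base, q))      # undetermined
```

The third case is what it means for the theory to be incomplete: the base settles p but says nothing either way about q.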
The idea behind evidential probability is a simple one. It consists of two parts: that probabilities should reflect empirical frequencies in the world, and that the probabilities that interest us—the probabilities of specific events—should be determined by everything we know about those events.
The first suggestions along these lines were made by Reichenbach [Reichenbach, 1949]. With regard to probability, Reichenbach was a strict limiting-frequentist: he took probability statements to be statements about the world, and to be statements about the frequency of one kind of event in a sequence of other events. But recognizing that what concerns us in real life is often decisions that bear on specific events—the next roll of the die, the occurrence of a storm tomorrow, the frequency of rain next month—he devised another concept that applied to particular events, that of weight. “We write P(a) = p thus admitting individual propositions inside the probability functor. The number p measures the weight of the individual proposition a. It is understood that the weight of the proposition was determined by means of a suitable reference class, …” [Reichenbach, 1949, p. 409]. Reichenbach appreciated the problem of the reference class: “… we may have reliable statistics concerning a reference class A and likewise reliable statistics for a reference class C, whereas we have insufficient statistics for the reference class A·C. The calculus of probability cannot help in such a case because the probabilities P(A, B) and P(C, B) do not determine the probability P(A · C, B)” [Reichenbach, 1949, p. 375]. The best the logician can do is to recommend gathering more data.
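Reichenbach's point that P(A, B) and P(C, B) do not determine P(A · C, B) can be seen from a pair of made-up finite populations. The counts below are hypothetical and chosen only for illustration: both populations agree on the frequency of B in A and of B in C, yet disagree on the frequency of B in the intersection A · C.

```python
from fractions import Fraction

def freq(pop, in_class):
    """Relative frequency of B within the reference class picked out by in_class.
    pop maps (is_A, is_C) to (number of individuals, number of those that are B)."""
    total = sum(n for (a, c), (n, n_b) in pop.items() if in_class(a, c))
    hits = sum(n_b for (a, c), (n, n_b) in pop.items() if in_class(a, c))
    return Fraction(hits, total)

# Hypothetical counts: (A only), (C only), (both A and C).
pop1 = {(True, False): (80, 40), (False, True): (80, 40), (True, True): (20, 10)}
pop2 = {(True, False): (80, 30), (False, True): (80, 30), (True, True): (20, 20)}

in_A = lambda a, c: a
in_C = lambda a, c: c
in_AC = lambda a, c: a and c

for name, pop in (("pop1", pop1), ("pop2", pop2)):
    print(name, freq(pop, in_A), freq(pop, in_C), freq(pop, in_AC))
# pop1 1/2 1/2 1/2
# pop2 1/2 1/2 1
```

Statistics for A and for C alone cannot distinguish the two populations, which is why knowledge of the narrower class A · C requires further data.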
We consider a group of puppies, take what we know about that group as a premise, and infer, as a conclusion, something about the population of all puppies. Such an inference is clearly risky and invalid. It is nevertheless the sort of inference we must make and do make. Some such inferences are more cogent, more rational than others. Our business as logicians is to find standards that will sort them out.
Statistical inference includes inference from a sample to the population from which it comes. The population may be actual, as it is in public opinion polls, or hypothetical, as it is in testing an oddly weighted die (the population is then taken to be the hypothetical population of possible tosses or possible sequences of tosses of the die). Statistical inference is a paradigm example of uncertain inference.
Statistical inference is also often taken to include the uncertain inference we make from a population to a sample, as when we infer from the fairness of a coin that roughly half of the next thousand coin tosses we make will yield heads, a conclusion that might be false. Note that this is not probabilistic inference: the inference from the same premises to the conclusion that the probability is high that roughly half of the next thousand tosses will yield heads is deductive and (given the premises) not uncertain at all.
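The deductive, probabilistic version of the coin inference can be carried out exactly. The sketch below assumes that the tosses are independent with probability 1/2 of heads, and reads "roughly half" as between 450 and 550 heads in 1000 tosses; that tolerance is an assumption made for illustration, not something the text fixes.

```python
from fractions import Fraction
from math import comb

def prob_heads_between(n, lo, hi):
    """Exact probability that a fair coin yields between lo and hi heads
    (inclusive) in n independent tosses."""
    favourable = sum(comb(n, k) for k in range(lo, hi + 1))
    return Fraction(favourable, 2 ** n)

p_roughly_half = prob_heads_between(1000, 450, 550)
print(float(p_roughly_half))  # very close to, but strictly less than, 1
```

The computation yields a probability statement with certainty. The step from that statement to the flat conclusion that roughly half of the tosses will yield heads is the uncertain one, since the remaining probability is small but not zero.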
The inference from a statistical premise about a population to a nonprobabilistic conclusion about part of that population is called direct inference. The inference from a premise about part of a population to the properties of the population as a whole is called inverse inference.
The theory of Induction is the despair of philosophy–and yet all our activities are based upon it.
Alfred North Whitehead: Science and the Modern World, p. 35.
Ever since Adam and Eve ate from the tree of knowledge, and thereby earned exile from Paradise, human beings have had to rely on their knowledge of the world to survive and prosper. And whether or not ignorance was bliss in Paradise, it is rarely the case that ignorance promotes happiness in the more familiar world of our experience—a world of grumbling bellies, persistent tax collectors, and successful funeral homes. It is no cause for wonder, then, that we prize knowledge so highly, especially knowledge about the world. Nor should it be cause for surprise that philosophers have despaired and do despair over the theory of induction: For it is through inductive inferences, inferences that are uncertain, that we come to possess knowledge about the world we experience, and the lamentable fact is that we are far from consensus concerning the nature of induction.
But despair is hardly a fruitful state of mind, and, fortunately, the efforts over the past five hundred years or so of distinguished people working on the problems of induction have come to far more than nought (albeit far less than the success for which they strove). In this century, the debate concerning induction has clarified the central issues and resulted in the refinement of various approaches to treating the issues. To echo Brian Skyrms, a writer on the subject [Skyrms, 1966], contemporary inductive logicians are by no means wallowing in a sea of total ignorance and continued work promises to move us further forward.