Part I distinguished models and theories and clarified the characteristics of model construction in theoretical economics. Its chapters may defuse superficial criticisms of economics, but the discussion postponed addressing questions of empirical assessment. Economics only provides knowledge of economies if it can use its models to tell us some truths about actual economic phenomena. Conceptual exploration is well and good, but explanation and prediction require that there be evidence in support of our theoretical hypotheses. In Part II, I turn to the central problems of theory assessment, canvass the traditional solutions to them, and, inspired by the views of John Stuart Mill, offer my own account.
When one thinks of economic methodology, the first questions that come to mind are questions of appraisal. Are equilibrium models useful for the purposes of explanation and prediction? What sort of confidence should economists place in generalizations that employ these models? Do economists behave as good scientists should? Are standards for appraising social theories the same as standards for appraising theories in the natural sciences? When one focuses on economics, these questions seem particularly pressing, for economic theory resembles theories in the natural sciences, except in predictive success. One striking problem is that equilibrium theory is full of “laws” that are, if taken literally, false, and further false assertions are made when economists use their models to answer specific questions. Do these facts show that there is something fundamentally wrong with economics?
I maintain that the answers to these questions lie mainly in the peculiarities of the structure and strategy of economic theory discussed in Part I, in the complexities of economic phenomena, and in the difficulties of testing rather than in mistaken views of confirmation or theory appraisal. I shall argue that one can regard economists as employing an unremarkable, indeed platitudinous, theory of confirmation in their appraisals of theoretical hypotheses. Although their appraisals are sometimes too favorable, their overconfidence does not result from an erroneous view of theory assessment. It comes instead from methodological and substantive commitments to equilibrium theory as a “separate science” (as discussed above in Chapter 7).
The generalizations of equilibrium theory are not true universal statements. Preferences are not always complete or transitive. Firms do not always aim to maximize profits. Individuals are sometimes satiated. Yet the generalizations that constitute equilibrium theory are informative, and mainstream economists have constructed useful models that incorporate them. How is one to understand the content and value of such “inexact” (i.e., false) claims?
9.1 Mill on Tendencies
In “On the Definition of Political Economy and the Method of Investigation Proper to It,” John Stuart Mill (Reference Mill1836a) argues that political economy is a science of “tendencies”: that its claims are “true in the abstract” and would be true in the concrete were it not for disturbing causes. What can he mean?
When Mill returns to these issues in A System of Logic (1843), his language is a little different and clearer. He maintains that in an inexact science:
[T]he only laws as yet accurately ascertained are those of the causes which affect the phenomenon in all cases, and in considerable degree; while others which affect it in some cases only, or, if in all, only in a slight degree, have not been sufficiently ascertained and studied to enable us to lay down their laws, still less to deduce the completed law of the phenomenon, by compounding the effects of the greater with those of the minor causes.
Mill cites the science of tides as an example. Scientists know the laws of the greater causes – that is, the gravitational attraction of the sun and the moon – but they are ignorant of the laws of minor causes, and they do not know the precise initial conditions, such as the configuration of the shore and ocean bottom. One might suggest that there are no exact sciences, although in some cases for some purposes the inexactness of a science might be negligible. Mill disagrees. He believes that astronomy is an exact science, “because its phenomena have been brought under laws comprehending the whole of the causes by which the phenomena are influenced … and assigning to each of those causes the share of the effect which really belongs to it” (1843, 6.3.1).
Mill regards motives as analogous to forces. When he speaks of “compounding the effects” of causes, he has in mind the vector addition of forces in mechanics. Compounding of causes need not be additive. Perhaps it can be understood more generally as deducing a prediction from some principle of combination and a group of lawlike generalizations, the effects of which when operating singly are known.
When Mill talks about an “inexact science,” he is not concerned mainly with imprecision in the predictions of a science. Even if knowledge of relevant causal factors were complete, economists might still be unable to make accurate predictions because of difficulties in learning the initial conditions or because of computational or measurement limitations. Mill is concerned with inexactness within theories – within the set of lawlike statements that constitutes a theory (see §6.3).
Mill is also not mainly concerned with rough empirical generalizations such as “birds fly” or “trees shed their leaves in winter.” In his view, these are not explanatory. Instead, they express patterns in the data for which one seeks explanations.Footnote 1 In Mill’s view, the “empirical laws” of the social sciences are typically just rough generalizations, not laws at all (compare Rescher Reference Rescher1970, pp. 164–7):
All propositions which can be framed respecting the actions of human beings as ordinarily classified, or as classified according to any kind of outward indications, are merely approximate. We can only say, Most persons of a particular age, profession, country, or rank in society have such and such qualities.
Although rough generalizations such as the Phillips curve lack explanatory power,Footnote 2 they are the raw material for theorizing and may play an important role in models. In Mill’s view, the explanatory or causal laws of inexact sciences are not rough generalizations, which are mere correlations among features of human action as these are ordinarily classified. The “science of Human Nature” counts as a science, insofar as its rough empirical laws can be connected deductively to genuine laws of human nature.
Mill writes:
[T]here is no reason that it [the science of human nature] should not be as much a science as Tidology is …
But in order to give a genuinely scientific character to the study, it is indispensable that these approximate generalisations, which in themselves would amount only to the lowest kind of empirical laws, should be connected deductively with the other laws of nature from which they result … In other words, the science of Human Nature may be said to exist in proportion as the approximate truths which compose a practical knowledge of mankind can be exhibited as corollaries from the universal laws of human nature on which they rest, whereby the proper limits of those approximate truths would be shown, and we should be enabled to deduce others for any new state of circumstances, in anticipation of specific experience.
The generalizations concerning market demand and supply discussed in Chapters 2 and 3 are somewhere in between empirical laws and universal laws of human nature. These generalizations are causal claims, not merely statements of correlations. However, they are shallow and their explanatory power is limited. The “laws” of equilibrium theory from which the generalizations concerning supply and demand can be derived are, as stated, false and hence hardly “universal laws of human nature,” though they seem to identify genuine causes and function in economics as if they were laws. Tendencies are the causal powers underlying the regularities that inexact laws express.Footnote 3
In Mill’s view, knowing only the laws of the “greater causes” of the phenomena, economists are unable reliably to infer from them what will occur. Economics is in this way an inexact science. This inability is a consequence of inexactness within the theory, not merely of faulty data or mathematical limitations. Economics employs inexact laws and thus inexact theories. Although the fundamental generalizations of equilibrium theory are not true, it seems that there is a good deal of truth to them. But what does it mean to say that a claim has “a good deal of truth to it” other than putting a happy face on the admission that it is false? What exactly is inexactness? How should one analyze this inexactness and make precise the idea that economists possess true causal laws that nevertheless capture only the behavior of the most important causes of economic phenomena?
9.2 Four Kinds of Inexactness
There are at least four ways, which are not mutually exclusive, in which to analyze inexact laws:
1. Inexact laws are probabilistic or statistical. Instead of stating how human beings always behave, economic laws state how they usually behave.
2. Inexact laws are approximate. They are true within some margin of error.
3. Inexact laws are qualified with ceteris paribus clauses.
4. Inexact laws state tendencies that causal factors exert both singly and in combination.
As I argue in this chapter, the first two construals do not capture the important ways in which economic generalizations are inexact, even though it may sometimes be useful to identify the approximations and probabilistic aspects of economics. In contrast, the third and the fourth interpretations go to the heart of the matter. J. N. Keynes (Reference Keynes1917) appears to endorse the third interpretation, as I did in the first edition of this book. However, the view that the generalizations of economics express tendencies seems most faithful to Mill and to the thinking of most economists. Despite apparent metaphysical commitments in invoking tendencies, the fourth construal is more natural than the third. I argue that the differences between the third and fourth interpretations are subtle and may matter little to the practice of economics.
9.2.1 Inexactness as Probabilistic
Are the “laws” of equilibrium theory implicitly probabilistic claims? After all, even though people’s preferences are not always transitive, the frequency of intransitive preferences in circumstances of economic choice is low. Satiation is not impossible, only unusual.
There is little support in Mill’s writing for this construal, and economists have seldom explicitly defended it. To see why, consider three interpretations of probabilistic claims:
1. The probability of an event e is the limit of the relative frequency of e in some reference class.
2. The probability of an event e is the propensity or objective chance of e obtaining in some chance set-up.
3. The probability of a proposition p is an agent’s degree of belief in the truth of p.
On a frequentist interpretation (1), one needs to identify a reference class and measure frequencies, but as far as I know, there are no measurements of the frequencies of intransitive choices, satiation, or firms not attempting to maximize profits.
Most plausible among the probabilistic interpretations of approximate truth is the propensity or objective chance interpretation (2). However, this interpretation is not helpful. It merely adds an attribution of a probabilistic magnitude (which is seldom, if ever, to be found in the economic literature) to the view of economic generalizations as expressing tendencies. Propensities or objective chances are probabilistic tendencies.
The inexact generalizations of economics are not stated in an explicitly statistical or probabilistic form; they instead appear to have counterexamples. Whatever statistical validity they possess is not the validity of explicitly statistical generalizations. Without a richer probabilistic structure, identifying the inexactness of a generalization with some frequency of correct implications in some reference class merely says that the generalization has some frequency of false implications.
Interpreted as degrees of belief, probabilities are of no help in understanding inexactness. One can hardly maintain that what makes a generalization such as “preferences are transitive” inexact is some middling degree of belief in whether preferences are transitive. To the contrary, anyone who is well informed has a degree of belief in transitivity (as a universal generalization) that is close to zero. Perhaps inexactness implies a limited degree of belief in the claim that people’s preferences tend to be transitive. But in that case, the serious work in understanding inexactness will lie in the account of tendencies, not in assigning subjective probabilities to propositions concerning tendencies.
9.2.2 Inexactness as Approximation
Sometimes lawlike claims, which are not true as stated, can be made true by specifying a margin of error in a certain domain. If the claims of special relativity theory are true, then the claims of Newtonian mechanics are in this sense approximately true in most macroscopic domains. Provided that one is dealing with bodies that move slowly compared to the speed of light, the predictions one makes using Newtonian theory are correct within a small margin of error. Limiting the scope of Newton’s laws and slightly “smearing” what they say results in literally true statements.
Mill does not interpret the laws of inexact sciences as true within a margin of error, and very little of the inexactness of economic generalizations is a matter of approximation in this sense. The difficulties with the claim that firms are profit maximizers are not resolved by making the weaker claim that the actions of firms are always within some neighborhood surrounding the profit-maximizing action. They aren’t.
9.2.3 Inexactness as Vague Qualification
A third interpretation of inexact generalizations is that they are qualified with ceteris paribus clauses – that the antecedents of these generalizations proscribe the influence of any disturbing causes. In that way, one can maintain that, with these qualifications, inexact generalizations may be true.
According to the vague qualification view, the “laws” of inexact sciences carry with them implicit ceteris paribus clauses.Footnote 4 This interpretation is consistent with Mill’s empiricism and much of what he writes about inexact sciences.Footnote 5 To assert that people’s preferences are transitive or that there are diminishing marginal returns is to make a qualified claim. A change in tastes, for example, does not falsify the first generalization, since changes in tastes are ruled out by implicit ceteris paribus clauses. According to this interpretation, when Mill speaks of the “psychological law” “that a greater gain is preferred to a smaller,” he is claiming that people prefer greater gains when there are no interferences or disturbing causes. The models that economists construct analyze the predominant factors that operate in economic behavior, which may, however, be modified and sometimes counteracted by disturbing causes.
The ceteris paribus clauses that render laws inexact are imprecise and ineliminable and thus problematic. Is it sensible to regard vaguely qualified statements as laws (see Hutchison Reference Hutchison1938, pp. 40–1)? Not all appeals to ceteris paribus qualifications to explain away apparent disconfirmations are legitimate: it is certainly not the case that, ceteris paribus, horses have six legs. One who regards the laws of inexact sciences as vaguely qualified claims must distinguish legitimate from illegitimate ceteris paribus qualifications. What do sentences with ceteris paribus clauses mean, and when, if ever, can they be true? When is one justified in regarding them as laws? Some, such as Earman and Roberts (Reference Earman and Roberts1999), argue that the answer is “never.”
Moreover, Mill complains – and Cartwright follows him in her (1989) and (1999) – that economic generalizations qualified with ceteris paribus clauses do not tell us about what happens when, as is often the case, ceteris is not paribus and “disturbing causes” are present. Economists need to know what contribution a causal factor makes to outcomes that are influenced by multiple causes:
Now, if we happen to know what would be the effect of each cause when acting separately from the other, we are often able to arrive deductively, or a priori, at a correct prediction of what will arise from their conjunct agency. To render this possible, it is only necessary that the same law which expresses the effect of each cause acting by itself shall also correctly express the part due to that cause of the effect which follows from the two together.
It thus appears that the claim that a price decrease tends to cause an increase in demand is stronger than the claim that ceteris paribus, or in the absence of disturbing causes, price decreases cause increases in demand. It looks as if we need to opt for the tendency interpretation, which, unlike the ceteris paribus interpretation, maintains that the influence of price on demand is still “at work” when the ceteris paribus condition is not met.
9.2.4 Inexactness as Tendency
Mill, like others, such as Schumpeter (Reference Schumpeter1954, pp. 1049–50) or Gibbard and Varian (Reference Gibbard and Varian1978), sometimes interprets inexact laws as stating tendencies rather than hedged regularities. By a tendency I mean some nomological factor (mainly, but not exclusively, causal)Footnote 6 that has an influence on an outcome that can in some sense be “added” to the influence of other causes (within some set of relevant possible causal factors).Footnote 7 Forces in physics are in this sense tendencies, unlike the effect of pouring water over salt, which does not have the same component influence on the outcome regardless of the introduction of other chemicals. Mill identifies tendencies with what he calls “mechanical” causes:
I soon saw that in the more perfect of the sciences, we ascend, by generalization from particulars, to the tendencies of causes considered singly, and then reason downward from those separate tendencies, to the effect of the same causes when combined. I then asked myself, what is the ultimate analysis of this deductive process; … the Composition of Forces, in dynamics, occurred to me as the most complete example of the logical process I was investigating. On examining, accordingly, what the mind does when it applies the principle of the Composition of Forces, I found that it performs a simple act of addition. It adds the separate effect of the one force to the separate effect of the other, and puts down the sum of these separate effects as the joint effect. But is this a legitimate process? … I now saw, that a science is either deductive or experimental, according as, in the province it deals with, the effects of causes when conjoined, are or are not the sums of the effects which the same causes produce when separate.
If inexact laws express tendencies, then they say not only how a cause operates in the absence of interferences, but they also permit us to understand their contribution to effects of combinations of causes (Cartwright Reference Cartwright1999, chapters 4 and 6).
Mill sometimes explicitly endorses a tendency view of the inexact “laws” of economics:
To accommodate the expression of the law to the real phenomena, we must say, not that the object moves, but that it tends to move, in the direction and with the velocity specified. We might, indeed, guard our expression in a different mode, by saying that the body moves in that manner unless prevented, or except in so far as prevented, by some counteracting cause, but the body does not only move in that manner unless counteracted; it tends to move in that manner even when counteracted; it still exerts in the original direction the same energy of movement as if its first impulse had been undisturbed, and produces, by that energy, an exactly equivalent quantity of effect.
The alternative way in which one might “guard our expression” of economic generalizations restricts their content to circumstances where there are no interferences or disturbing causes. That restriction would be intolerable. Tendency claims must thus identify a causal “force” that continues to act when there are disturbing causes.
However, the case for a tendency interpretation of inexactness as opposed to a qualification view is weaker than it may appear, because the algorithm for deriving the consequences of multiple inexact generalizations is plausibly not part of the content of those inexact generalizations themselves. Consider the motion of a projectile. Ignoring air resistance, its path will be determined by the constant horizontal component of its velocity (there being no horizontal acceleration) and the downward component of its velocity, which increases (approximately) uniformly under gravity. The claim that the net change in velocity is the vector sum of these components can plausibly be regarded as an additional law. Laws qualified with ceteris paribus clauses do not tell us how they combine, but they do not need to do so. That is the job of the generalization concerning the combination of causes. It is no demerit of the qualified generalization view that those generalizations tell us what happens only when the ceteris paribus qualification is satisfied.
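A minimal sketch of this point in code, with invented numbers: each component law says only what its cause contributes on its own, and the vector addition that combines them functions as a further law rather than as part of either component law.

```python
# Two component "tendencies" acting on a projectile (air resistance ignored; numbers illustrative).
g = 9.8       # m/s^2, downward acceleration due to gravity
vx = 20.0     # m/s, horizontal velocity component (no cause considered here changes it)
vy0 = 15.0    # m/s, initial upward velocity component

def velocity(t):
    # Each "law" describes one component considered by itself.
    horizontal = vx
    vertical = vy0 - g * t
    # The claim that the net velocity is the vector sum of the components
    # is an additional law of combination, not part of either component law.
    return (horizontal, vertical)

print(velocity(2.0))  # approximately (20.0, -4.6)
```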
The interpretation of inexact laws as qualified universal generalizations is thus not ruled out by the requirement that inexact laws be pertinent to circumstances in which, owing to the action of other causes, the ceteris paribus condition is not met. Indeed, it is questionable whether there is any important distinction between an interpretation of inexactness in terms of tendencies or in terms of ceteris paribus qualifications. “For how could there be a ‘tendency to cause or bring about something’ without there being a law to the effect that, ceteris paribus, if certain conditions are satisfied, such and such will be the result” (Pietroski and Rey Reference Pietroski and Rey1995, pp. 103–4). To treat some claim as a tendency rather than as a ceteris paribus law is to invoke (possibly implicitly) some principle of composition of the effects of causes. What makes the laws governing those causes inexact lies in the imprecision and inaccuracy of the implications of those laws, both singly and in combination.
There is a deeper question at issue in interpreting inexactness in economics and elsewhere in terms of ceteris paribus qualifications or in terms of tendencies. In Nancy Cartwright’s view, one faces a choice between understanding science as fundamentally either (1) a matter of laws which, in conjunction with initial conditions and auxiliary assumptions, including ceteris paribus conditions, give rise to tendencies and enable us to explain and predict phenomena; or (2) the identification of tendencies that in combination enable us to explain and predict phenomena and that when combined in just the right way give rise to regularities. For the purposes of this book, I do not need to choose.
9.2.5 Some Remarks on Idealizations
The tendency view of inexact laws and theories should be distinguished from the related claim that economics involves ideal entities or circumstances. The claims that people’s preferences are transitive and that commodities are infinitely divisible may both be regarded as idealizations, but only the first has any pretenses to be a law. Although laws may involve idealizations, idealizations are especially important with respect to the nonlaw aspects of models. The modeling assumption that commodities are infinitely divisible or that individuals have perfect knowledge are paradigm instances of idealizations, while the exaggerations in asserting that preferences are complete or that individuals are not satiated are not clear cases of idealizations at all.
Not every claim that is known to be false, whether purportedly a law or not, counts as an idealization. The claim that crocodiles have feathers is not an idealization. Idealizations involve exaggerating some actual property toward some limit.Footnote 8 In the class of quantitative relations, idealization can be a matter of taking some small quantity to be zero, some large quantity to be infinite, some quantities that are almost equal to be exactly equal, or some approximations to be precise values.
Idealizations have a purpose. They allow theorizing to escape from the “mess” of reality. Idealization permits interconnected phenomena to be treated as isolated, and it cuts off (in theory) the effects of subsidiary causes.Footnote 9 Idealizations can be successively relaxed and the complications from which idealizations permit one to abstract successively can be tackled. As Mill’s remarks on geometry show, he believes that idealization has a legitimate role to play in science and that statements involving idealizations are confirmable and may be true counterfactuals.Footnote 10
An idealization is a false claim that exaggerates some feature of reality for some abstractive or isolating theoretical purpose. Inexact laws and statements of tendencies may involve idealizations but they need not. Idealizations permit scientists to draw conclusions about how things would be were friction zero rather than small or were people perfectly rational rather than merely not usually irrational. Not all false claims in models are in this sense idealizations. Sometimes models contain assumptions that are essential to their implications; replacing them with some more realistic assumptions would not result in more or less the same implications. Models that contain such assumptions are much more troubling.Footnote 11
9.3 The Meaning or Truth Conditions of Inexact (Causal) Generalizations
Economic laws are qualified with ceteris paribus clauses in two different ways. In partial equilibrium theories and practical work, it is common practice to consider separately the effects of different known causal factors. As discussed in Section 2.1, for example, demand for some commodity or service depends on its price, the prices of substitutes and complements, income, and tastes. Yet economists may want to consider demand for coffee as a function (ceteris paribus) of the price of coffee only. In the language of tendencies, they may want to consider how a change in the price of coffee tends to affect the quantity of coffee demanded when acting by itself. Here the constituents of the ceteris paribus clause are those factors that economic theory itself identifies as other causal determinants of demand for coffee. Such ceteris paribus qualifications are of philosophical interest in the analysis of the causal structure of partial equilibrium explanations, but the meaning and justification of “laws” with only such qualifications is unproblematic. If one takes for granted fundamental economic theory, the term “ceteris paribus” in generalizations such as the law of demand can be replaced with a list of specific causal factors, the effects of which are considered separately. Moreover, in principle, changes in the price of coffee, the prices of complements and substitutes, and incomes all increase or decrease the quantity of coffee demanded, and there should be an algorithm predicting the net effect of the separate tendencies. Exactly how to add in the effect of changes in tastes is murkier. Although the ceteris paribus clauses attached to derivative laws introduce no additional vagueness, they inherit the vague qualifications attached to the fundamental “laws” of equilibrium theory.
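A sketch of the kind of bookkeeping this involves, using a made-up linear demand function for coffee; the functional form, the coefficients, and the prices are purely illustrative assumptions:

```python
# Hypothetical linear demand for coffee as a function of its own price, the price of a
# substitute (tea), the price of a complement (sugar), and income.
def coffee_demand(p_coffee, p_tea, p_sugar, income):
    return 1000 - 80 * p_coffee + 30 * p_tea - 20 * p_sugar + 0.01 * income

base = coffee_demand(4.0, 3.0, 2.0, 50000)

# The ceteris paribus effect of a $1 rise in the price of coffee: every other listed
# determinant is held fixed, so the clause here is a definite list, not a vague qualification.
own_price_effect = coffee_demand(5.0, 3.0, 2.0, 50000) - base
print(own_price_effect)  # -80.0

# Because this function is additively separable, the net effect of several price or income
# changes is just the sum of the separate ceteris paribus effects. Tastes, which shift the
# whole function, have no comparably simple slot in this algorithm.
```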
The ceteris paribus laws or statements of tendencies I am concerned with in this section and the next are more problematic. Fundamental economic theory considers only some of the causes of economic phenomena. The remaining causes are not enumerated and are often unknown. The basic claims of economics are true only under various conditions that are not fully specified. Without specifying the disturbing causes, can one still make substantive claims concerning the “greater” economic causes? What precisely is a vague ceteris paribus clause? (Or, alternatively, what makes a generalization a statement of a tendency?) What does it mean to say that people’s preferences tend to be transitive or that, ceteris paribus, people’s preferences are transitive? What must the world be like if such claims are true?
The same sentence can be used to say different things in different contexts. Following Stalnaker (Reference Stalnaker, Davidson and Harman1972, pp. 380–97), let us distinguish the meaning of a sentence (the context-invariant interpretation of the sentence) from its content (the proposition expressed by the sentence), which may vary in different contexts. “I’m confused by this book” has a single meaning, but its content depends on who utters it and when and where it is uttered. Stalnaker suggests that one should regard the meaning of a sentence as a function from contexts to contents or propositions. The meaning of a sentence determines a content in a given context.
Adapting this terminology, one might suggest that ceteris paribus clauses, both explicit and implicit, have one meaning – “other things being equal” – which in different contexts picks out different propositions or properties.Footnote 12 The context – especially the economist’s background understanding – determines what the “other things” are and what it is for them to be “equal.” So, for example, in the simpler case of the precise ceteris paribus clauses of partial equilibrium analyses, the term “ceteris paribus” might pick out the proposition “other prices, tastes, and incomes do not change.”
The term “ceteris paribus” need not determine a property or proposition in every context. Sometimes in uttering a sentence containing such a clause, one fails to express any proposition. For example, I suggest that the ceteris paribus clause in the sentence “ceteris paribus all dogs have three heads” has no content. Moreover, the properties ceteris paribus clauses pick out in different uses may vary greatly in clarity and precision. At one extreme are examples such as those in supply and demand explanations or in some laws of physics such as Coulomb’s law.Footnote 13 Consider, in contrast, clauses such as “holding technology and other inputs constant,” which one finds in the law of diminishing returns. Such clauses do not have a precise extension, but they are not completely vague either. Although there are formal difficulties with vague predicates, such predicates abound in science and ordinary language, and we cannot do without them.
What proposition does a vaguely qualified law, such as “ceteris paribus people’s preferences are transitive,” express? Suppose that the logical form of an inexact law were “ceteris paribus everything that is an A is a B,” where A and B are predicates with definite extensions.Footnote 14 Consider first the unqualified generalization, “everything that is an A is a B.” Logicians interpret sentences with this form to mean that there is nothing in the extension of the predicate A that is not in the extension of the predicate B. (Recall that the extension of a predicate is the set of all things of which the predicate is true.)
In the case of qualified generalizations such as “ceteris paribus everything that is an A is a B,” some things that belong to the extension of A do not belong to the extension of B – otherwise the qualification would be unnecessary. One view, which I endorsed in the first edition, is to regard “ceteris paribus everything that is an A is a B” as a true universal statement if and only if, in the given context, the ceteris paribus clause picks out a property – call it C – and everything that is both C and A is B. The extension of the vague predicate C must contain only properties that economists consider to be nomologically relevant to B. Otherwise one might take C to be B itself or some property whose extension includes the extension of B and trivializes the analysis (Earman and Roberts Reference Earman and Roberts1999, p. 475). If one considers only the interior of region C in Figure 9.1, one sees that all of region A that is contained there (i.e., the intersection of regions A and C) lies within region B. In offering a qualified generalization, one is only asserting that, once the qualifications are met, all of region A lies within region B. The predicate C belongs in the antecedent of the law, although it may be awkward to state the law in this form. I have drawn C without a solid boundary only to suggest that economists do not know precisely what the extension of the ceteris paribus predicate is and not to suggest that it does not have a definite extension – which it must have if the qualified claim is truly to be a law. In committing oneself to a law qualified with a ceteris paribus clause, one envisions that the imprecision in the extension of the predicate one is picking out will diminish without limit as one’s scientific knowledge increases.

Figure 9.1 Ceteris paribus clauses.
To believe that, ceteris paribus, everybody’s preferences are transitive is to believe that anything that satisfies the ceteris paribus condition and is a human being has transitive preferences. One need not be disturbed by intransitive preferences caused by, for example, changes in tastes, because such counterexamples to the unqualified generalization lie outside region C. In my analysis, sentences qualified with ceteris paribus clauses may be laws. A sentence with the form “ceteris paribus everything that is an A is a B” is a law just in case the ceteris paribus clause determines a property C in the given context, and it is a law that everything that is C and A is also B.
This is not the only analysis of a ceteris paribus law in economics. It is plausible to believe that events and tendencies in separate sciences such as economics or psychology supervene on a variety of physical states and physical laws. Suppose that microstates S1, S2, …, Sn are realizers of some economic state A, microstates such that necessarily anything that is Si is A. In other words, each Si is one way of being A. Suppose then that for almost all of the realizers Si of A, there are completers C1, C2, …, Cn, such that it is a true exceptionless law that if Si and Ci, then B. In such a case, because A supervenes on the microstates, there would clearly be a nomological connection between A and B, but there would be no exceptionless law in the vocabulary of economics connecting them. This possibility describes a different way in which the generalizations of economics can be inexact laws or tendencies. We can call such generalizations nomological but irreparably inexact. If any of the many different physical realizations of any preference ordering do not guarantee that preferences are transitive, then there will be no condition C that is stateable in the language of economics that will make “if C, then preferences are transitive” true (Schiffer Reference Schiffer1991; Fodor Reference Fodor1991). But this statement can be an (irreparably) inexact law nevertheless.
9.4 Qualification or Independent Specification
James Woodward criticizes the strategy of taking the antecedents of inexact laws to contain a ceteris paribus qualification, which he calls “exception incorporating” (2000, pp. 228–35; 2003, pp. 273–9). His name for strategies such as the one I have so far defended is tendentious, because the qualifications that I envision as implicit in the inexact generalizations of economics are not ad hoc exclusions of apparent falsifications, but instead characterize significant causal factors that enhance or impede the action of the explicitly specified causes.
Instead of regarding generalizations such as profit maximization as universal truths once they are properly qualified, Woodward suggests that one might regard them as possessing a limited scope that is specified independently. With a different interpretation, Figure 9.1 might represent the independent specification interpretation just as well as it represents the exception-incorporating view. The difference is that C is now an independent specification of the scope of the generalization, not an antecedent in the law.
Woodward has an additional and much more radical critique (2002; see also Lange Reference Lange2002) of regarding ceteris paribus conditions as antecedents in laws. In his view, causal laws are not exceptionless regularities. Instead, they are statements of relations among variables that are “invariant to interventions.” What this means, roughly, is that for some interventions that change the values of variables in the antecedents of causal laws, the generalization will correctly state the values of the variables in the consequent. A generalization Y = F(X) explains why Y takes the value y, if Y = F(X) and, for some interventions that set X to a value x′, for the corresponding value y′ of Y, the generalization is invariant – that is, y′ = F(x′). An intervention on a variable, X, that changes the value of X is a cause of X that has no causal connection to any other variables except in virtue of changing the value of X. Although not endorsing Woodward’s view of causality, Nancy Cartwright also argues that laws have a much less significant role in modeling and explanation. In her view, laws result from stable arrangements of tendencies, which, rather than laws, are the fundamental building blocks of scientific theorizing (1999, chapter 6).
Woodward’s and Cartwright’s views are tempting but controversial, and I do not want to stake my account of tendencies or inexact laws on them. The solutions that their views on explanatory generalizations offer to the puzzle of how the laws of equilibrium theory can be explanatory despite not being true universal generalizations would be straightforward, but I am unwilling to stake my analysis on an idiosyncratic minimalist take on what causal explanation requires.
Setting aside Woodward’s and Cartwright’s accounts of causal explanation, one can nevertheless appreciate how the independent specification approach avoids complicating the generalizations of economics with qualifications, which are now off-loaded to a codicil specifying that this law is limited to a domain in which condition C holds. One is only concerned with states of affairs in which C is satisfied, and within C, on an old-fashioned view of laws, all A are B. In both the exception-incorporating and independent specification views, all A that are also C are B. But the independent specification approach allows the generalizations of economics to remain simple and unqualified, if limited in scope. The task of specifying the disturbing causes will be handed off to something like a commentary detailing when one can and cannot make use of economic laws either singly or in combination.
The independent specification view has some drawbacks. It can encourage a lazy instrumentalism (which is definitely not true of Woodward’s own views). On the independent specification view, a mistaken prediction need not call for the revision of an economic generalization such as acquisitiveness. It may only call for an adjustment in the scope specified for the generalization, which itself appears not to be subject to empirical scrutiny. Of course, if one has to keep cutting down the scope of a generalization such as acquisitiveness, there may come a time when it will make sense to abandon it. But according to the independent specification view, unfavorable evidence seems to have very little direct or immediate bearing on the credibility of purported laws themselves.
A second problem is that sometimes adding a small qualification to a generalization can vastly increase its explanatory and predictive power. Adding the qualification to the generalization might in addition point the way toward a deeper theoretical grounding for the generalization. In such a case, independent specification would impede scientific progress.
Nevertheless, the independent specification view fits the practice of economics much better than the exception-incorporating view. Economic essays are not clogged with myriad qualifications. Moreover, the independent specification view simplifies the task (see Chapter 10) of clarifying when it is reasonable or unreasonable to regard a generalization that is qualified with a ceteris paribus clause as an inexact law. The test is instead whether the independently specified ceteris paribus condition is met in the domain that economists are studying.
The term “ceteris paribus” may be used in other ways. In offering a rough generalization, such as “birds fly,” one need not believe that there is any set of conditions in which being a bird is sufficient for flying.Footnote 15 One might simply believe that almost all birds fly. Indeed, scientists may sometimes believe that the true law will not involve the current predicates used in the generalization at all. One may regard a rough generalization, such as “As are generally Bs,” as having some predictive force, even though one expects it to be superseded in the course of further inquiry. My analysis is not intended to deny these truths, nor am I taking any position concerning whether there are probabilistic or statistical laws. All I am claiming is that when one takes an inexact generalization to be an explanatory law, one supposes that the ceteris paribus clause picks out conditions in which the purported law no longer faces counterexamples.
9.5 Mechanical Phenomena and the Composition of Economic Causes
Suppose one has two qualified laws: (1) ceteris paribus, for every $1 increase in the price P of some commodity, the quantity demanded Q drops by 1,000 units; and (2) for every $1 increase in the price Ps of a substitute, S, Q increases by 200 units. The ceteris paribus clause attached to (1) (whether it be incorporated as a qualification or independently specified) maintains that there are no effects from changes in income, from changes in prices of any complements or substitutes, from changes in tastes, or from a miscellany of other possible interferences whether they be earthquakes or alien invasions. The ceteris paribus clause attached to (2) has the same content except that it precludes changes in P, and it does not preclude changes in Ps.
From these two laws and the claim that the total effect is the sum of the two separate effects, one can apparently deduce (3) ceteris paribus, if P increases by $5 and Ps increases by $10, then Q decreases by 5,000 − 2,000 or 3,000 units. Notice that the drop in demand will be the simple sum of the two effects only if the demand is the sum of separable functions of P and Ps (i.e., Q = f(P) + g(Ps)). If the proportional change in demand is a linear function of the proportional change in price, then one composes the effects of the causes by multiplying their separate effects rather than adding them.Footnote 16 The “composition of causes” is not always addition.
Furthermore, it may be complicated to keep track of the contents of ceteris paribus clauses when there are multiple causes acting. The qualification in (3) clearly cannot include all the qualifications in (1) and (2), because the ceteris paribus qualification that specifies the domain of application for each law rules out the change considered in the other law. The ceteris paribus condition for the combination is the intersection of the conditions ruled out for each of the separate laws.
Consequences such as (3) can only be drawn reliably when one has what Mill calls “mechanical phenomena” (1843, 3.6.1 and 3.6.2). Mill maintains that in mechanical phenomena the effect of two causal factors acting simultaneously is the sum of the effects of each acting separately. As we have seen, this definition is too narrow. Even though in note 16 the existence of a change in the price of a substitute changes the absolute amount that Q decreases in response to a change in P, it does not change the functional relationship. Mill needs instead something like the claim that the relationships between the effects of two causes x and y and the value of an effect z are mechanical if and only if, from the relations “ceteris paribus, z = f(x)” and “ceteris paribus, z = g(y),” it follows that, ceteris paribus, z is a mathematical function of both x and y, such as the sum or product. Each factor continues to “operate” no matter what other causes are operating (1843, 3.10.5; Cartwright Reference Cartwright1983, pp. 44–73). When one has such “mechanical phenomena” the causal factor captured in the qualified law is responsible for a “tendency” in the phenomena that is present whenever the causal factor is.
When one is not dealing with causal factors that compose in this way or when one simply does not know how various causal factors will interact, one may still use laws qualified with ceteris paribus clauses. Qualified laws dealing with nonmechanical phenomena will, however, be more provisional and will have a more restricted scope. They may apply only when there are no appreciable interfering factors (Elster Reference Elster1989a, p. 216). Even if the basic generalizations of equilibrium theory are inexact laws, they will not help one to understand real economies with their inevitable disturbing causes, unless economic phenomena are mechanical phenomena.
Mill simply asserts that economic phenomena are mechanical: that the basic economic causal factors continue to act as component “forces” in the total complicated effect (1843, 6.7.1). Such a supposition is implicit in many applications of economic models. I see no justification for it other than the empirical confirmation of the implications of composing the effects of multiple causes. For an illustration of how Mill treats economic phenomena as mechanical, see his discussion of the combined effects on rents, profits, and wages of an increase in capital and labor and of technological change (1871, book IV, chapter 3).
Since scientists do not know exactly what property a ceteris paribus clause picks out, why regard it as picking out any property at all? Is there enough clarity in the independent specification of a ceteris paribus clause that rules out “other interferences” as in the example just discussed? Does such a specification identify a domain in which the basic “laws” of equilibrium theory are true? One can recognize that the generalizations of equilibrium theory may guide research and help economists to interpret data without regarding them as laws. If the interferences vaguely specified by the implicit ceteris paribus clause are absent, then economists can regard the generalization in that domain as a restricted law.Footnote 17 Without the limitation to a specific domain provided by the independent specification of the ceteris paribus condition, economists can regard the generalizations of equilibrium theory merely as assumptions in models. To regard inexact general “laws” as merely assumptions in models highlights the elusiveness of ceteris paribus clauses, which I have perhaps understated, and it emphasizes that economists regard inexact “laws” differently when they use them to give explanations than when they rely on them in doing speculative research.
Because theorists use basic economic “laws” to try to explain economic phenomena, they cannot regard them as mere assumptions, but must take them as expressing some truth, however rough (see §A.3). Otherwise their attempts to use them to explain economic phenomena would be incomprehensible (Reiss Reference Reiss2012; 2013, chapter 7). At some point, with respect to some domains, economists must construe the assumptions of the basic equilibrium models either as true qualified lawlike assertions or as true statements of tendencies.
Countenancing qualified laws forthrightly, one need not make invidious comparisons between the natural sciences and social sciences. One finds instead gradations of inexactness. Scientists strive for exactness, but possessing, as they typically do (whether in economics or chemistry), only qualified generalizations or generalizations with restricted scope, they nevertheless have learned something about their subject matter and can explain some of the phenomena in the domain.
9.6 Conclusions
Mill’s views of tendencies and inexactness are of value only if there is some way to tell whether generalizations express genuine tendencies. Can those generalizations that express tendencies be distinguished from rough generalizations that happen by accident occasionally to give the right answers? Since it is entirely consistent with the claim that A tends to cause B that we observe instances in which A is not followed by B, how can claims about tendencies be tested? Chapter 10 attempts to provide the answer that economists from Mill in the first half of the nineteenth century to Lionel Robbins in the first half of the twentieth century have given.
In addition to offering an account of inexact laws and tendencies, Mill also discusses how to confirm claims about complicated circumstances in which multiple causes play a role. Crucial in his account of confirmation is his distinction between direct and indirect inductive methods. Mill maintains that the indirect inductive method, which he calls “the method a priori” or “the deductive method,” is especially important in economics.
Section 10.1 begins with descriptions of contemporary methods of confirmation against which to situate Mill’s views. Section 10.2 lays out the broad outlines of Mill’s deductive method. Section 10.3 expands upon Mill’s method a posteriori – his direct inductive method – to address the question of how economists can know whether their fundamental generalizations express inexact laws or tendencies. Section 10.4 examines in detail what Mill has to say about his deductive method, while Section 10.5 lays out the implicit algorithm that Mill offers for testing the implications of theoretical hypotheses in an inexact science. Section 10.6 concludes.
10.1 Confirmation: Likelihoods and Bayesian and Hypothetico-Deductive Methods
In contemporary philosophy of science, one finds several views of theory appraisal, the most important of which are the old-fashioned “hypothetico-deductive” method and Bayesian views, broadly understood.
The hypothetico-deductive method has four steps:
1. Formulate a hypothesis.
2. Deduce a prediction from the hypothesis and other statements.
3. Test the prediction.
4. Evaluate the hypothesis on the basis of the test results.
One tests hypotheses by testing predictions derived from amalgamations of hypotheses and other statements, which specify the initial conditions and assume away complications or “disturbing causes.” Owing to the inexactness of the basic “laws” of equilibrium theory, there are bound to be many failures of implications of those laws and other statements, which can reasonably be attributed to the “other statements.” The evaluation of economic hypotheses in the light of test results will not be simple. Economists typically have little basis for increased confidence in a hypothesis when things are as predicted, and they have little basis for lessened confidence when things are not as predicted. How then can economists learn from experience?
Bayesian views rest on the idea that degrees of belief can be modeled as subjective probabilities, with confirmations increasing probabilities. The definition of a conditional probability, Pr(A | B) = Pr(A & B)/Pr(B), implies that Pr(A | B) = Pr(B | A)Pr(A)/Pr(B), which is known as Bayes’ theorem. Substituting h (a statement of the hypothesis of interest) for A and substituting for B, e (a statement that, if true, would be evidence for h), one can rewrite Bayes’ theorem as Pr(h | e) = Pr(e | h)Pr(h)/Pr(e). In the philosophical and statistical literature, these terms have somewhat misleading or confusing names. Pr(h) is called the “prior” probability of h, while Pr(h | e) is called the “posterior” probability. Pr(e | h) is, confusingly, called the “likelihood of h.” Although Pr(h) and Pr(h | e) are called prior and posterior, Bayes’ theorem says nothing about time. Pr(h) is not earlier than Pr(h | e). What generates the names is an idea about how to update one’s degrees of belief in response to new evidence. Suppose scientists begin at time t0 with subjective probabilities or degrees of belief Pr(h), Pr(e), Pr(e | h), and Pr(h | e), where e is a testable implication of h. To test h harshly, scientists will look for a testable implication e such that Pr(e) is low. e is presumably inferred from h and other premises, and let us suppose that Pr(e | h) is high. In that case Pr(h | e) > Pr(h). Suppose then that at t1, through experiment or observation, scientists determine that e is true – that is, that Pr(e) = 1. The suggestion then is to adjust one’s subjective probabilities, setting the new degree of belief in h equal to the old posterior Pr(h | e). (Since Pr(e | h) is high, Pr(e) was low, and Pr(h | e) = Pr(e | h)Pr(h)/Pr(e), the new degree of belief in h exceeds the old, and h is confirmed.)
Nothing in Bayes’ theorem itself implies that one should update in this way. One might, for example, conclude that the previous likelihood and prior were mistaken. Bayesian views of confirmation echo the hypothetico-deductive method with updating substituting for the hypothetico-deductive method’s vague fourth evaluation step.
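A small numerical sketch of this updating rule; all of the probabilities are invented for illustration:

```python
# Bayesian updating on a "harsh" test, with hypothetical numbers.
prior_h = 0.2       # Pr(h): initial degree of belief in the hypothesis
likelihood = 0.9    # Pr(e | h): the prediction e is very likely if h is true
prob_e = 0.3        # Pr(e): e is fairly surprising, which makes the test a harsh one

# Bayes' theorem: Pr(h | e) = Pr(e | h) * Pr(h) / Pr(e)
posterior_h = likelihood * prior_h / prob_e
print(posterior_h)  # about 0.6: observing e raises the degree of belief in h from 0.2 to 0.6

# The Bayesian suggestion: once e is observed, adopt the old posterior as the new degree of belief.
prior_h = posterior_h
```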
The dependence on subjective probabilities is disturbing. Should confirmation depend on the mental states of individuals? Bayesians can argue that with enough evidence differences in people’s initial subjective probabilities will wash out, but that is true only in what could be a very long run. Another way to assuage this worry is to take the consensus within a scientific community as determining the prior probabilities.
Yet another possibility for the theory of confirmation is to set aside concerns with prior or posterior probabilities altogether. Suppose that we consider how a particular evidence statement e bears on competing hypotheses h1 and h2. We can rewrite Bayes’ theorem as Pr(h1 | e) = Pr(e | h1)Pr(h1)/Pr(e) or as Pr(h2 | e) = Pr(e | h2)Pr(h2)/Pr(e). Dividing the first of these by the second, we have the following: Pr(h1 | e)/Pr(h2 | e) = [Pr(e | h1)/Pr(e | h2)] × [Pr(h1)/Pr(h2)]. What this says is that the likelihood ratio Pr(e | h1)/Pr(e | h2) measures how much more (or less) credence the evidence e should give you in the truth of h1 as compared to h2.
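Again with invented numbers, a minimal sketch of the comparison:

```python
# Likelihood ratio: how much the evidence e favors h1 over h2, independently of Pr(e).
likelihood_h1 = 0.9    # Pr(e | h1), hypothetical
likelihood_h2 = 0.15   # Pr(e | h2), hypothetical

likelihood_ratio = likelihood_h1 / likelihood_h2
print(likelihood_ratio)  # about 6: observing e shifts the odds on h1 versus h2 by a factor of six
```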
Unfortunately, whether one thinks about confirmation as resting on Bayes’ theorem, evidence, likelihood, and prior probabilities, or one thinks of confirmation comparatively as resting on likelihoods alone, these probabilistic approaches relabel the problems we saw in the brief discussion of the hypothetico-deductive method rather than solving them. A crucial difficulty is that the additional premises needed to derive testable implications from economic generalizations are so dubious and the generalizations themselves sufficiently inexact that predictive successes provide scant confirmation and predictive failures provide little disconfirmation.
In the context of Bayesian or likelihoodist views of confirmation, this problem shows up as the difficulty of specifying likelihoods. Suppose that economists construct a model that relies on a generalization g and permits economists to deduce a prediction, p. Economists thus know that Pr(p | g & a) = 1, where a consists of all the other premises needed to deduce p from g. The problem is that Pr(p | g & a) says nothing about the value of Pr(p | g), which is what the Bayesian or likelihoodist needs to know to judge whether and how strongly the observation of p confirms g or whether and how strongly the observation that p is false disconfirms g.
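The gap can be made explicit with the law of total probability (a standard identity, stated here only for illustration; a abbreviates the conjunction of the auxiliary premises):

```latex
\Pr(p \mid g) \;=\; \Pr(p \mid g \,\&\, a)\,\Pr(a \mid g) \;+\; \Pr(p \mid g \,\&\, \neg a)\,\Pr(\neg a \mid g)
```

Even if Pr(p | g & a) = 1, the likelihood Pr(p | g) depends on how probable the auxiliary premises are given g and on how likely the prediction would be were they false, neither of which economists typically know.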
In the following chapters in this part of the book we shall see several responses to this conundrum. This chapter presents Mill’s view of confirmation in economics, which is implicit in most mainstream economic inquiries. It has some serious problems, which Chapter 13 attempts to solve.
10.2 Confirmation in Economics: An Old-Fashioned View
One approach to the general problems of theory appraisal in economics dominated methodological discussion until the 1940s and still appears to govern a good deal of methodological practice.Footnote 1 This view dates back at least to David Ricardo’s time, early in the nineteenth century. It was first explicitly articulated in the 1830s and 1840s by John Stuart Mill (Reference Mill1836a; 1843) and Nassau Senior (Reference Senior1836).Footnote 2 I focus on Mill’s presentations, which are more philosophically sophisticated than Senior’s.
Mill’s economics, which derived from Ricardo (Reference Ricardo, Sraffa and Dobb1817), posed problems of assessment resembling those posed by modern economics. For its basic claims, such as “individuals seek more wealth,” are, as Mill explicitly points out, not true universal generalizations, and its predictions were often incorrect. Mill was both a Ricardian economist and an empiricist, but his economics seems not to measure up to empiricist standards for knowledge. The implications of Ricardian economics were not strenuously tested, and the most important of them appeared to be consistently disconfirmed (de Marchi Reference de Marchi1970). For example, Ricardo’s theory predicted that the share of national income paid as rent would increase, which was not the case. How can Mill reconcile his confidence in economics and his empiricism?
In Mill’s view (1836a; 1843, book VI), the basic premises of economics are either introspectively established psychological claims, such as “people seek more wealth,” or experimentally confirmed technical claims, such as the law of diminishing returns. As discussed in Chapter 9, Mill believes that these established premises state accurately how specific causal factors operate. They are statements of tendencies and are inexact rather than universal generalizations. In formulating them, vague ceteris paribus qualifications will be unavoidable.Footnote 3 Economics explores the consequences of these established premises. Since so much is left out of economic theory, these consequences will not always obtain. The confidence of economists in this science is based on confirmation of its basic “laws,” not on confirmations of their economic implications. In Mill’s terminology, the method of economics, the way it establishes its conclusions, is “deductive” or “a priori.”Footnote 4
This view solves the problem of the inapplicability of the hypothetico-deductive and Bayesian methods by denying that the grounds for accepting or rejecting economic theories are the successes or failures of their economic predictions. Theorists rely instead on other empirical evidence, which requires further clarification. Mill’s views on theory appraisal in economics were adopted by followers such as J. E. Cairnes (Reference Cairnes1875) and early neoclassical methodologists such as John Neville Keynes (Reference Keynes1917). Moreover, with some updating, one has the view to which I suggest many economists (regardless of what they may say in methodological discussion) still subscribe.
In the so-called neoclassical revolution of the last quarter of the nineteenth century, both economic theory and its methodology changed. Neoclassical theory (particularly in its Austrian or Walrasian forms) focuses on individual decision-making and short-run microeconomic effects, unlike classical economics, which was more concerned with social classes and questions about long-run growth and distribution. Despite these differences, which were emphasized by authors such as Frank Knight (Reference Knight1935b; Reference Knight1940), Ludwig von Mises (Reference Mises1949; Reference Mises1978; Reference Mises1981), and Lionel Robbins (Reference Robbins1935), neoclassical economists agreed with Mill that the basic premises of economics are well-justified, and that empirical failures do not cast them into doubt. In defending this view, Lionel Robbins explicitly notes this long tradition (1935, p. 121), and provides the following formulation of essentially Mill’s view:
The propositions of economic theory, like all scientific theory, are obviously deductions from a series of postulates … The main postulate of the theory of value is the fact that individuals can arrange their preferences in an order, and in fact do so. The main postulate of the theory of production is the fact that there are [sic] more than one factor of production. The main postulate of the theory of dynamics is the fact that we are not certain regarding future scarcities. These are not postulates the existence of whose counterpart in reality admits of extensive dispute once their nature is fully realised. We do not need controlled experiments to establish their validity: they are so much the stuff of our everyday experience that they have only to be stated to be recognised as obvious.
Many questions remain. What exactly is the “deductive method” or the “method a priori?” Why is it particularly apt for inexact sciences? How can one use evidence to support or dispute claims about tendencies? How does the method a priori relate to contemporary views of theory appraisal? Can one rationally defend economics by employing this method?
10.3 When Do Generalizations Express Genuine Tendencies?
It is not enough to explain the inexactness of the economic “laws” in terms of tendencies or in terms of ceteris paribus qualifications or the independent specification of ceteris paribus scope restrictions. One also needs to consider when, if ever, economists have reason to believe that a generalization is indeed a law when its scope is restricted or a ceteris paribus qualification is attached to its antecedent. If the content of the ceteris paribus condition were known, then one would face the standard question of how to determine whether a generalization is true in a particular domain. But economists do not know precisely what the extension of the ceteris paribus condition is – that is, what the size and location of region C in Figure 9.1 is. When is one justified in regarding a statement with a vague ceteris paribus clause as a law?
When controlled experiments are possible, these difficulties may not be pressing, although they do not disappear. Even without knowing what the disturbing causes are, whose effects are ruled out by the ceteris paribus clause, economists may be able to arrange two circumstances in which there is little that differs between them except whether F obtains. If G then obtains just in case F does, we have evidence for a lawful connection between them. But there are, of course, anomalies even in controlled experiments, and the failures of the “law” in less controlled circumstances would still demand explanation. So the possibility of carrying out controlled experiments (which is circumscribed in economics) is only a partial cure for the problems of justification.
I suggest that economists are justified in regarding a causal generalization such as “Fs cause Gs,” whose domain is only vaguely specified with a ceteris paribus clause, as expressing a tendency or as a restricted inexact law only when four conditions are met. Tendencies must be lawlike, reliable, refinable, and excusable:Footnote 5
1. The generalization must be lawlike. It must be the sort of statement that would be a law if it were true. It does not make much sense to say of a book that, ceteris paribus, it is a Bible. Ceteris paribus clauses attach to purported laws. As explained in Section A.4, the notion of lawlikeness is philosophically perplexing, but the philosophical problems here are oddly untroubling in practice. Scientists and lay people are usually easily able to distinguish lawlike from nonlawlike claims.
2. The generalization must be reliable. There must be a significant domain in which Fs cause Gs (which one determines by considering whether interventions that bring about F succeed in bringing about G as well). Whether a domain is “significant” depends on the interests of the scientists or economists who are concerned with whether Fs cause Gs. Reliability is a vague and context-dependent requirement: the causal generalization has to “work” often enough for the purposes of economists in a domain that matters to them. How reliable a generalization needs to be depends in part on what it is to be used for.

3. Tendencies must be refinable. As one elaborates and refines the specification of the ceteris paribus clause, the generalization should become more reliable. Economists may not be interested in refining the specification of the domain in which the generalization can be counted on. The less complicated original specification may be more convenient. Refinability only demands that scientists can make the generalization more reliable. The refinability condition does not, however, demand that theorists can completely replace the ceteris paribus condition with specific provisos. Refinability is a trivial condition unless one imposes constraints on the domain restrictions that refine the applicability of a generalization. Otherwise, one could refine a generalization simply by removing from its scope each case where the implications of the generalization do not obtain. But, as in the case of reliability, there are context-dependent constraints on the specifications of domains of applicability.
4. Generalizations that express tendencies must also be excusable. Economists should know enough about what sort of phenomena count as “disturbing causes” to be able to justify invoking the ceteris paribus clause as an excuse. When, for example, inflation remained low despite the Federal Reserve pumping massive amounts of money into the economy during and following the 2008–9 Great Recession, economists should have been able to point to the causal factors that explain what happened rather than invoking the ceteris paribus clause blindly. Economists should know which disturbing causes are important and should usually be able to justify relying on the ceteris paribus clause as an excuse (see Rescher Reference Rescher1970, p. 172). It should not seem to be a miracle that the generalization sometimes “works” and sometimes fails. Again, there is a danger of trivialization. To cite an interference is not just to cite an arbitrary feature of a circumstance in which the generalization fails. It is to cite a “causal factor,” and thus to commit oneself to a lawful connection between the factor cited and the failure of the generalization. To explain away anomalies in terms of interferences is to make claims that can be tested in other circumstances in which these “interferences” are present (Pietroski and Rey Reference Pietroski and Rey1995). Such claims should have other testable implications beyond explaining away the specific failure of the generalization.
In my view, one may regard a generalization as expressing a tendency or an inexact law, even though it would face disconfirmation without its ceteris paribus condition, only if it is lawlike, reliable, refinable, and excusable. These requirements supplement rather than replace theories of confirmation such as the Bayesian and hypothetico-deductive accounts with which this chapter began.
The four conditions seem to me to be both rationally defensible and a reasonable formulation of the implicit criteria by which scientists and laymen assess the legitimacy of invoking ceteris paribus clauses to explain away apparent disconfirmations. Since one does not know precisely which predicate C the ceteris paribus clause picks out in a given context, the claim “within domains in which C is satisfied, everything that is F is G” is unavoidably vague and hard to test. Without knowing the boundaries of C, it is hard to look for disconfirmations of “all Fs are Gs” within C, and if the generalization is not lawlike, reliable, refinable, and excusable, there will be little reason to regard positive instances of the generalization as anything more than accidents.
10.4 Mill’s Deductive Method
Even inexact causal laws are going to be hard to come by when dealing with complicated phenomena, such as those encountered by the social scientist. But the method by which advanced sciences study complex phenomena, which Mill calls “the deductive method” or “the method a priori,” offers a partial solution. Mill describes the deductive method and defends its necessity as follows:
When an effect depends on a concurrence of causes, these causes must be studied one at a time, and their laws separately investigated, if we wish, through the causes, to obtain the power of either predicting or controlling the effect; since the law of the effect is compounded of the laws of all the causes which determine it.
If, for example, one wants to “obtain the power of predicting or controlling” an effect such as projectile motion through understanding its causes, one needs to investigate separately the separate causal factors (gravity, momentum, friction) and their laws. This notion of “compounding of the laws of all the causes” is crucial to Mill’s methodological views. Mill regards compounding as adding the effects of the separate causes, but as we have seen that is too restrictive.
By a deductive method Mill does not mean the hypothetico-deductive method, which he calls the “hypothetical method” and which he criticizes, when it fails to prove its conclusions inductively (1843, 3.14.4–5). In insisting on the need for a deductive method, Mill is also not primarily concerned with how laws and theories are discovered. For example, in discussing Whewell’s views, Mill makes clear that his methods of induction serve to justify scientific claims, whether or not they also serve as methods of discovery (1843, 3.9.6). Mill is not maintaining that what distinguishes the deductive method is that one creates hypotheses rather than derives them from evidence. Quite the contrary, the deductive method is in part an account of how one can derive economic laws from inductive evidence of a different kind (see §A.6 and §A.7).
Mill’s deductive method consists of three stages (1843, 3.11). In the first, one establishes laws by induction. Whether induction functions here as a method of discovery does not matter. First, for example, scientists interested in tides induce the laws of mechanics and of gravitation, or they borrow information concerning these laws that has been established by the inductions of others. Good evidence for these laws comes from diverse sources but rarely from direct inductive study of complex phenomena such as tides.Footnote 6 Mill believes that inductive methods can provide very strong support – indeed, he speaks of “empirical proof.” Second, scientists deduce the laws of tides from these fundamental laws and specifications of the relevant circumstances. Third, they must verify the deductive results. In doing so, they are not testing the basic laws, just their (inexact) lawlike consequences concerning the tides. Since many causal factors are left out, there is no way to know without testing how accurate or reliable the theory of tides is. The more complex the phenomena, the less one can study them directly and the more one needs to develop one’s science deductively on the basis of laws that are independently established. Mill does not regard induction and deduction as contraries. What is opposed to deduction is observation or experimentation (1843, 2.4.5). Deductive justification is in Mill’s view ultimately inductive. The evidence that supports (inductively) the premises of a deductive argument is the (inductive) basis for one’s belief in the argument’s conclusions (1843, 2.3.3).
To make the basic idea clearer, let me give two further illustrations. Suppose Wendy is sick and we want to know whether penicillin will help cure her (compare this to Mill’s own example, 1843, 3.10.6). The a posteriori method, or, as Mill calls it, the method of direct experience, would have us inquire whether others with symptoms resembling Wendy’s recovered more often or more rapidly when given penicillin. The method a priori, in contrast, would have us draw upon our knowledge of the causes of Wendy’s symptoms and upon our knowledge of the operation of penicillin to decide whether penicillin will help cure her. Both methods are “empirical” and involve testing. The difference is that the former attempts to use experiment or observation to learn about the complex phenomenon directly, while the latter employs observation or experiment to study the relevant component causal factors.
Similarly, one could determine empirically the range of an artillery piece directed at different angles with different charges, wind conditions, and atmospheric pressure. Or one could make use of the law of inertia, Galileo’s law of falling bodies, and experimentally determined laws of air resistance and explosive force to calculate the range. The latter deductive method is, in Mill’s view, the method of all advanced sciences, although for practical applications, direct experience is needed as a check on the deductive results.
Presented in conjunction with examples like those in the previous paragraphs, the deductive method seems unobjectionable. The evidence concerning the correctness of Galileo’s law or the law of inertia that can be garnered from controlled experiments is of a higher quality than that provided by observations of the range of artillery pieces, so the application of these laws to complex phenomena tests these laws only slightly. The laws do not say what will inevitably happen, only what tends to happen or would happen in the absence of other causal factors.
But the application of the deductive method to economics is problematic, because, in contrast to the example of determining the range of the artillery piece, causal factors that are known to be significant are left out of the story. The inexactness is far from negligible. Indeed, Mill criticizes members of the “school of Bentham” (especially, by implication, his father, James Mill) for analogous “geometrical” theorizing about government. James Mill argued in defense of representative government on the grounds that only in representative governments will the rulers have the same interests as the governed (1820). This account is, in the view of the younger Mill, empirically inadequate and methodologically flawed, for it focuses on only one (admittedly important) causal factor and ignores many others. J. S. Mill writes:
They would have applied, and did apply, their principles with innumerable allowances. But it is not allowances that are wanted … It is unphilosophical to construct a science out of a few of the agencies by which the phenomena are determined, and leave the rest to the routine of practice or the sagacity of conjecture. We either ought not to pretend to scientific forms, or we ought to study all the determining agencies equally, and endeavour, so far as it can be done, to include all of them within the pale of the science; else we shall infallibly bestow a disproportionate attention upon those which our theory takes into account, while we misestimate the rest, and probably underrate their importance. That the deductions should be from the whole and not from a part only of the laws of nature that are concerned, would be desirable even if those omitted were so insignificant in comparison with the others, that they might, for most purposes and on most occasions, be left out of the account.
But when it comes to economics, Mill apparently recommends just the methodological practice that he condemns in these remarks. For the correct method of including all the “determining agencies” “within the pale of the science” is not feasible. Economists must set their sights lower and aim only at a hypothetical science of tendencies, which is, in Mill’s view, generally “insufficient for prediction” yet “most valuable for guidance” (1843, 6.9.2). Since in political economy “the immediate determining causes are principally those which act through the desire of wealth” (1843, 6.9.3), one can separate the subject matter of political economy from other social phenomena and theorize about political economy as if the desire for wealth were virtually the only relevant causal factor.
Mill defends this sort of partial deductive method as follows:
The motive which suggests the separation of this portion of the social phenomena from the rest, and the creation of a distinct branch of science relating to them, is, that they do mainly depend, at least in the first resort, on one class of circumstances only; and that even when other circumstances interfere, the ascertainment of the effect due to the one class of circumstances alone is a sufficiently intricate and difficult business to make it expedient to perform it once for all, and then allow for the effect of the modifying circumstances; especially as certain fixed combinations of the former are apt to recur often, in conjunction with ever-varying circumstances of the latter class.
The defenses Mill offers for employing this partial or inexact deductive method seem to be (1) practical – that there is no alternative; (2) metaphysical – that, although the results are only hypothetical, the same causal influences persist even when there are other disturbing causes; and (3) pragmatic – that this is an efficient way of theorizing and that more order can be found this way than in any other.Footnote 7
In the case of economics, theorists first borrow basic “laws” from the natural sciences or psychology, which Mill regards as an introspective experimental science. Practitioners of other sciences test the fundamental laws upon which economics is constructed on other phenomena (including controlled experimental circumstances) where there are fewer disturbing causes. Then economic theorists develop economics deductively. Verification is essential, but not in order to test the basic laws; they are already established and could hardly be cast in doubt by the empirical vicissitudes of a deduction from a partial set of causes. Mill is unclear about whether verification is necessary in order to regard the deductively derived laws as economic laws at all, or whether verification merely determines the economic applicability or usefulness of these laws.Footnote 8
The deductive development of economics is not a matter of proving theorems with nothing but established laws and true descriptions of the relevant circumstances as premises. The premises of the deductions also include stipulations and auxiliary hypotheses that are often poorly established and often known to be false. Contrary to Samuelson’s assumptions, some goods keep and some people support their old parents without any expectations of a quid pro quo from members of the next generation. Furthermore, the implicit ceteris paribus qualifications in the fundamental lawlike claims themselves complicate matters, for the theorems will carry complex qualifications compounded of the qualifications on all the laws.
The messiness of the “deduction” in the inexact deductive method as it is applied in economics is not necessarily a fatal handicap when one is attempting to discover or generate theories. One task of the weakest sort of logic of discovery is to lay bare the reasoning which makes plausible first attempts at scientific theories, and deduction from somewhat plausible premises does make what is deduced plausible (Nooteboom Reference Nooteboom1986). If an economic claim can be shown to follow from more fundamental generalizations and auxiliary hypotheses, which are reasonable approximations or idealizations, one has reason to take that claim seriously. Principles such as Say’s law were embraced by economists on such grounds.
It might be argued that the partial deductive method can do no more than help make economic hypotheses plausible. For, as Mill notes, the deduced implications must themselves be confirmed, and it might be contended that the results of testing them should determine our confidence in them, not whether they were deduced from inductively established laws and various simplifications. One might thus be inclined to conclude that the deductive method is only really valuable when such testing cannot be carried out.
This dismissal of the inexact deductive method would be unjustified. There are degrees of confirmation and, as Bayesians emphasize, degrees of belief. An economist’s confidence in generalizations such as those concerning market demand may be rationally increased by showing that they can be derived from the inexact fundamental laws of the theory of consumer choice and specifications of relevant circumstances. The general strategy of developing models that incorporate the laws of equilibrium theory provides the implications of those models with a certain credibility in advance of specific testing. Furthermore, when deciding whether to attribute anomalous data to some disturbing cause or to a fundamental inadequacy in the theory itself, the deductive method turns out to be crucial. I return to this point in Chapter 15.
10.5 The Inexact Deductive Method
To sharpen the discussion, let us formulate a schema expressing the broad outlines of the deductive, or a priori, method as it appears to have been conceived by Mill to apply to economics. Later, in Chapter 13, qualifications will be needed, but at this point a bold formulation will provide a useful focus:
1. Borrow proven (ceteris paribus) laws concerning the operation of relevant causal factors and how they combine.
2. Deduce from these laws and statements of initial conditions, simplifications, etc., predictions concerning relevant phenomena.
3. Test the predictions.
4. If the predictions are correct, then regard the whole amalgam as confirmed. If the predictions are not correct, then consider:
a. whether there is any mistake in the deduction,
b. what sort of interferences occurred,
c. how central the borrowed laws are (how important the causal factors they identify are, and whether the set of borrowed laws should be expanded or contracted).
This method should be called Mill’s inexact method a priori. The true deductive method relies only on facts and causes, not on simplifications, and includes all the causes. The inexact deductive method, to which Mill believes economics is condemned, cheats and omits significant causal factors. I have left out of the summary formulation the “proving” of the laws concerning relevant causal factors, which Mill takes to be the first step of the deductive method, because I want to focus on the tasks of economists, who are more concerned with applying psychological and technical laws than with establishing them. Formulating the deductive method in this way also helps to make clear how this method differs from the hypothetico-deductive method. The differences are in step 1, where one begins with “proven” (but inexact) laws rather than mere hypotheses to be tested, and in step 4. Since the laws are already established, they are not open to question in the judgment step. Apart from discovering logical errors in the deduction, all that is open to assessment are the sufficiency and accuracy of the other premises and the extent of the “coverage” provided by the borrowed laws.
Knowing (as Mill maintains) that individuals seek wealth (and leisure and “the present enjoyment of costly indulgences”), economists investigate deductively what follows from these tendencies in various situations given other plausible assumptions and simplifications. The deductive method is needed for all sciences in which there is a complexity of causal factors.
In disciplines such as economics the correspondence between the phenomena and the implications of theory is rough, and complete failures are frequent. Since economic phenomena are the effects of numerous causes, many of which the theory does not encompass, one can expect nothing better. Yet, with only this sort of evidence, how could economists rationally commit themselves to these theories? What good reason do they have to accept them? Mill believes that one cannot answer these questions by attempting to apply the hypothetico-deductive method directly and considering how well the claims of economic theory are confirmed by observations of economic phenomena. In Mill’s view, only the deductive method renders commitment to the (inexact) truth of economic theory justifiable.
10.6 Conclusion and Qualms
Having thus offered (1) an interpretation of the inexactness of the “laws” of economics, whether they be fundamental or derived; (2) an account of how one may rationally become convinced that generalizations are inexact laws, despite apparently disconfirming evidence; and (3) a construal of how one can rationally have indirect inductive grounds for accepting claims about economies by showing them to be deductive implications of premises that include fundamental laws of equilibrium theory, all that remains – so it seems – is to consider how justified various portions of economics are. Do the fundamental generalizations express genuine tendencies? Are they well established? What is one to say about the credibility of the simplifications that are needed to deduce economic conclusions? To what extent are these conclusions indeed justified?
To offer a fully satisfactory answer to these questions requires detailed knowledge of the success of particular applications of equilibrium theory. In some contexts, it seems to me uncontroversial that the propositions of equilibrium theory employed do satisfy these conditions, and that the simplifications used can be given an analogous defense.Footnote 9 In Capital, Profits, and Prices, I raised serious questions about whether the laws of equilibrium theory satisfied the excusability condition (Hausman Reference Hausman1981a, p. 134). At that time, I maintained that economists showed little concern to explain empirical anomalies, and I offered as an explanation their commitment to equilibrium theory as a separate science coupled with the pragmatic virtues of equilibrium theory that are discussed in Chapter 15. In my view, questions about whether it is reasonable to regard the postulates of equilibrium theory as inexact laws should be regarded as questions about the scope of these postulates and ultimately about the strategy of economic theorizing.
But there is no point in asking to what extent equilibrium theory satisfies the conditions that should be imposed on inexact sciences, if the reliance on tendencies and on the deductive method is scientifically illegitimate. And during the past century both economists and philosophers have made this charge. During this period, those concerned with economic methodology have increasingly found something fishy or even fraudulent in Mill’s and Robbins’ dogmatic attachment to inexact fundamental laws despite their frequently disconfirmed consequences. For it seems that, on Mill’s and Robbins’ view, evidence can only confirm economic theory or show that there is some disturbing cause. There seems to be no real possibility of empirical criticism and, thus, no real empirical justification for the theory. In the judgment step, no judgment of the laws themselves is permitted.
Mill’s inexact method a priori has been subject to (1) logical, (2) methodological, and (3) practical criticisms. The logical criticism (1) is directed to inexact laws themselves, and maintains that statements that are vaguely qualified with ceteris paribus clauses are scientifically illegitimate, because they are too vague, untestable, or not conclusively refutable by empirical testing. The accounts in this chapter and Chapter 9 of the truth and justification conditions for vaguely qualified statements provide most of the answer to this logical criticism, which is completed when we consider in Chapter 12 whether the logical requirement of falsifiability can justifiably be imposed on scientific claims.
The methodological criticism (2) maintains that the rules implicit in the deductive method are unacceptably dogmatic. In particular, it may plausibly be alleged that one ought not to regard the basic laws as proven or to refuse ever to regard unfavorable test results as disconfirming them. This criticism is considered in Chapter 13, where I also discuss the practical criticism.
The practical criticism (3) alleges that, by regarding apparent disconfirmations as inevitably the result of some disturbing cause, the inexact method a priori winds up justifying theories that cannot be of any practical use. For policy purposes we need to know what will happen, not what would happen in the absence of disturbing causes.
These are serious criticisms, and indeed, since the early 1940s, the only defenses of the traditional view of justification in economics have been J. Watkins’ (Reference Watkins, Feigl and Brodbeck1953), J. Melitz’s (Reference Melitz1965), I. M. W. Stewart’s (Reference Stewart1979), mine, and those of the Austrian school (Dolan Reference Dolan1976). Beginning in the 1930s, there was a dramatic revolution in theorizing about economic methodology, which led to a repudiation of the inexact deductive method in precept, although, I suggest, not in practice. From what is now the orthodox contemporary perspective, the view of theory assessment in economics that I have developed in this chapter appears reactionary and wrong-headed.
Chapters 11 and 12 consider contemporary alternatives to the deductive method, their philosophical underpinnings, and the theoretical basis for the criticisms of the deductive method. I point out the inadequacies in these alternatives and in their philosophical roots before returning in Chapter 13 to the specific criticisms of Mill’s inexact method a priori. There I show how to resolve the conflict that has arisen between methodological practice (which still appears to adhere to the deductive method) and methodological precept (which is typically positivistic or Popperian in character), not by preaching better methodology to the practicing economist, but by preaching better preaching to the methodologist.
Although there had been challenges to the “abstract” deductive method in the nineteenth century by members of the so-called historical school, who defended a view of economics as normative and historically bounded,Footnote 1 the first major change in accepted views of theory assessment in economics occurred in the 1930s. In this chapter I examine this revolution in the methodological self-conception of the economics profession. In Chapter 12, I then explore criticisms and alternatives that derive from the philosophy of science defended by Popper and Lakatos, as well as a second homegrown methodological revolution that is transforming economics as I write. Unlike the previous chapters, a large part of this chapter and Chapter 12 is critical, but the traditional objections to the method a priori are best answered by showing constructively in Chapter 13 what role empirical criticism should have in economics.
I argue in this chapter and Chapter 12 that a great deal of both the criticisms and defenses of economics in methodological writings in the second half of the twentieth century was misconceived, owing to faulty views of the nature of science. These misconceived views are either reminiscent of the early logical positivists or of Karl Popper,Footnote 2 or they depend on the more sophisticated views of the later logical empiricists. In criticizing the philosophical presuppositions of many of the controversies concerning economic theory, I also point out the methodological schizophrenia that is characteristic of a large portion of contemporary economics, whereby methodological doctrine and practice frequently contradict one another. This schizophrenia is a symptom of the unsound philosophical premises underlying economic methodology in the late twentieth century, and it shows the importance of transcending the terms of that debate.
11.1 Terence Hutchison and the Initial Challenge
The positivist challenge that first caught the attention of the economics profession was Terence Hutchison’s.Footnote 3 In The Significance and Basic Postulates of Economic Theory, Hutchison criticizes theoretical economics, which he regards as without empirical content, and he recommends that economists concentrate on the discovery of empirical laws that will permit “prognoses.” Hutchison was influenced by the logical positivists and Popper, with whose work (in German) Hutchison was familiar.
Hutchison’s principal criticism of theoretical economics – of, in his terminology, “propositions of pure theory” – is that it does not have testable implications. Its generalizations are either disguised tautologies or they are so hedged with ceteris paribus clauses that their unambiguous interpretation and testing is impossible.Footnote 4 It is hard to interpret this criticism, because it is unclear what a “proposition of pure theory” is. Hutchison seems to regard the law of diminishing marginal utility as an empirical law (1938, p. 64), while the claim that firms are profit maximizers is supposed to be a proposition of pure theory.
Hutchison does not condemn all uses of ceteris paribus clauses. He claims: “We suggest that the ceteris paribus assumptions can only be safely and significantly used in conjunction with an empirical generalisation verified as true in a large percentage of cases but occasionally liable to exceptions of a clearly describable type” (1938, p. 46). Hutchison is suggesting justification conditions on the legitimate use of ceteris paribus clauses that are similar to those developed in Chapter 10 (and indeed my work was influenced by studying Hutchison’s account). On the basis of these sketchy and, in my view, unreasonably stringent justification conditions, Hutchison turns immediately to criticism. For the economic generalizations to which ceteris paribus clauses are appended are not almost universally true. Exceptions are widespread. Economists are not just covering their ignorance of the causes of infrequent failures. Furthermore, economists have done little to classify the cases in which their generalizations fail. They cannot specify which interferences are ruled out by the ceteris paribus clauses. For these reasons, Hutchison regards the reliance on ceteris paribus clauses in economics as illegitimate.
Hutchison also criticizes what he calls the “hypothetical” or “isolating” method – the method of theorizing about simplified states of affairs with the hope of reaching an understanding of actual economies via “successive approximation” (1938, pp. 43, 119–20). He is thus forthrightly rejecting Mill’s deductive method. Hutchison maintains that the “inexact” laws upon which economists rely are not laws at all. Without their qualifications, they are not true, and, with their qualifications, they are not testable or empirically significant.
Hutchison argues that it is no defense to claim that economic laws are statements of “tendencies.” He quotes the following comments from Hayek: “There seems to be no possible doubt that the only justification for this [special concern with equilibrium analysis] is the existence of a tendency toward equilibrium. It is only with this assertion that economics ceases to be an exercise in pure logic and becomes an empirical science.”Footnote 5 Hutchison criticizes Hayek by distinguishing between two kinds of tendencies. In the first, “the position actually is regularly arrived at,” while in the second there is no assumption that one even comes close to the position toward which there is a supposed tendency (1938, p. 106). Hutchison finds talk of tendencies in the second sense cheap. One might talk of a human tendency toward immortality that is, alas, always counterbalanced by other tendencies. Whether or not generalizations express tendencies, they are inadequate unless, at least for a specified set of conditions, they are usually correct – in other words, unless Hutchison’s stringent reliability condition is satisfied. Talk of tendencies does not resolve the problem of justifying purported laws that are apparently false.
Hutchison’s basic criticism is that claims qualified with ceteris paribus clauses and theories relying on extreme simplifications are untestable and empirically empty. Hutchison extends this criticism in various ways:
by pointing out how pervasive the inaccuracies of economic generalizations are and how economists have failed to specify sharply what classes of phenomena these generalizations are supposed to apply to,
by pointing out that the method of isolating causal factors and successively approximating the complexities of reality never gets beyond its first step, and
by arguing that claims about tendencies have little content unless the supposed tendency is not often counteracted.
The upshot in each case is the same: economics does not make testable empirical claims.
According to Hutchison, economic theorists need to free themselves from abstract, tautologous, contentless theorizing and concentrate on the inductive development of empirical laws that permit genuine prognoses (1938, p. 166). How this task is to be accomplished is not clear. Hutchison has no definite program for economics, apart from his call to face the facts of uncertainty, and his philosophical apparatus is unsophisticated. Yet he is pointing out real problems in traditional economic theory. The simplifying assumption to which he most vehemently objects (1938, chapter 4) – the attribution of perfect knowledge to economic agents – still carries a heavy weight in contemporary theory. Can economists justifiably claim to have evidence for purported laws that are not supposed to apply precisely to any real economy? Successive approximations that begin with models such as Samuelson’s overlapping-generations model never get very “close” to economic reality. How can such work be of value? Might one not be better off eschewing such theorizing altogether?
Hutchison’s attack was disquieting. Did microeconomic theory measure up to the standards for science defended by up-to-date philosophy of science? Those who first rose to answer Hutchison’s challenge, such as Frank Knight (Reference Knight1940; 1941), may have aggravated rather than allayed this disquiet, for Knight repudiates the empiricist or positivist philosophy of science upon which Hutchison’s challenge relied. Knight accuses the positivists of overlooking the complexity and uncertainty of testing in all sciences (1940, p. 153), and he argues that positivist views of science are particularly inappropriate with respect to economics, which, like all sciences of human action, must concern itself with reasons, motives, values, and errors, not just causes and regularities.
By resting his response to Hutchison on controversial theses in philosophy of science, Knight may have left less philosophically sophisticated economists wondering whether there was any way to respond to Hutchison without repudiating up-to-date philosophy of science. Knight worries about the pernicious effect Hutchison’s book may have on the young (1940, pp. 151, 152), who might be influenced by the tide of empiricist philosophical views sweeping philosophy and the sciences. Knight was right to worry, for, although few wound up fully accepting Hutchison’s criticisms, even defenders of economics wound up accepting Hutchison’s central philosophical premises.
11.2 Paul Samuelson’s “Operationalism”
Given how sketchy and hard to implement Hutchison’s constructive suggestions were, it might appear that Knight had little to worry about. But, at about the time Hutchison issued his challenge, Paul Samuelson sketched out an “operationalist” program for economic theory that apparently offered a new, empirically respectable way of doing economics.
Samuelson’s views on the assessment of economic theories are scattered throughout his economic writings. I focus on his most explicit discussions of methodologyFootnote 6 and on the exemplification of his methodological commitments in his early work on revealed preference theory.
The relevant part of Samuelson’s (Reference Samuelson1963) argument goes as follows. Let T be a theory and C be the set of all its empirical consequences. Then (so Samuelson maintains) T is true if and only if C is true. If all of C is correct, then T is a good (perhaps a perfect) theory. If only some part of C, C′, is correct, then only that portion of T, T′, that implies and is implied by C′, is a good theory. The remainder of T is false and ought to be discarded. The application of the hypothetico-deductive method is thus rendered more difficult in some ways, since all the consequences of the theory need to be tested, but simpler in another, since there are no longer any inductive leaps. Since theories supposedly entail and are entailed by the complete set of their empirical consequences, inferences concerning the correctness or incorrectness of theories are deductive.
Samuelson illustrates what he means by referring to his own work on revealed preference theory. In the late 1930s, he showed (for the two-commodity case) that an individual maximizes a complete and transitive utility function that is an increasing function of quantities of the two commodities if and only if the individual’s choices satisfy the WARP.Footnote 7 The WARP states that if an individual chooses the bundle x, which is more expensive than commodity bundle y at one set of prices, p, then the individual will not choose y over x when y is more expensive than x. If the individual chooses y over x when y is more expensive, having previously chosen x when x was more expensive, then the individual violates the WARP. Samuelson believes that he has “shown that the standard theory of utility maximization implied, for the two-good case, no more and no less than that ‘no two-observed [sic] points on the demand functions should ever reveal … [a] contradiction of the Weak Axiom’” (1964, p. 738).
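To fix ideas, the axiom can be restated as a simple check on a pair of observed choices. The sketch below is an illustration only – the function names, the list representation of prices and bundles, and the strict-inequality reading all follow my gloss above rather than anything in Samuelson:

```python
# A minimal sketch of the WARP check described above (illustrative, not
# Samuelson's own formalism). Each observation pairs a price vector with the
# bundle chosen at those prices; cost(p, x) is the expenditure on bundle x at
# prices p.

def cost(prices, bundle):
    return sum(p * q for p, q in zip(prices, bundle))

def violates_warp(p1, x1, p2, x2):
    """True if the two choices contradict WARP on the strict reading above:
    x1 was chosen at prices p1 although it cost more there than x2 (so x2 was
    affordable), and x2 was chosen at prices p2 although it cost more there
    than x1 (so x1 was affordable)."""
    return cost(p1, x1) > cost(p1, x2) and cost(p2, x2) > cost(p2, x1)

# Two-good example: the two observations reveal contradictory preferences.
print(violates_warp(p1=[2, 1], x1=[4, 1], p2=[1, 2], x2=[1, 4]))  # True
```

Standard statements of the axiom use weak rather than strict budget comparisons; the sketch simply follows the strict phrasing used in the text.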
Samuelson is proposing a radically “behaviorist” reformulation of economic theory, on the lines of revealed preference theory. But few economists (and certainly not Samuelson himself) do such behaviorist theorizing. Nor (as I argued in §1.2) could they without dramatic loss. The general view Samuelson espouses, of replacing theories insofar as possible with representations of their correct empirical implications, is incoherent and unhelpful, and to attempt to implement it would mean abandoning economic theory.
The incoherence of Samuelson’s proposal is a consequence of the fact that the “implications of economic theory” are not implications of economic theory alone, but of economic theory coupled with statements of initial conditions concerning beliefs, market structure, etc., and auxiliary theories concerning, for example, data generation and ceteris paribus clauses. The notion of “the set of empirical consequences of a theory” has meaning only relative to these other stipulations. If an individual falsely believes that bundle x is not available in the circumstances in which y is more expensive than x, then the choice of y, which violates the WARP, does not falsify “the regular theory of utility maximization.”
Not only does the set of empirical consequences of an economic theory depend on the other theories economists accept (which cannot thus themselves be equivalent to their “sets of empirical consequences”), but there is no empirical advantage in insisting on this equivalence. Although each individual empirical consequence is observable, the correctness of the whole infinite set (whatever it is) is no more observable than the correctness of the theory, and an inductive leap is equally necessary.
Third, even if one could formulate a clear notion of the set of all the empirical consequences of a theory and could somehow determine that all the empirical consequences in some subset were correct, there would in general be no feasible way to replace the original theory with a pared-down version that implied only this correct subset. Unless one were extraordinarily lucky, all that would be left would be a long and useless list of conditional statements.
Finally, in attempting to eschew all theorizing that goes beyond observable consequences, one surrenders almost all explanatory ambitions. Standard utility theory apparently explains why individuals choose the consumption bundles they do (in terms of their beliefs and preferences and the constraints on their choices). Revealed preference theory permits no such explanations and no significant way of linking choice behavior in the market to other sorts of rational choice behavior.
One might respond that explanation is just excess metaphysical baggage and that the possibilities of theoretical unification do not justify a refusal to pare theories down to their empirical consequences. But such a response is hardly tenable for economists who, like Samuelson, make use of theoretical idealizations and simplifications that have many false empirical implications. Just recall Samuelson’s “Exact Consumption-Loan Model” in Chapter 8. What bearing could his views on theory appraisal have on work such as this, apart from roundly condemning it? When Fritz Machlup pointed out this apparent conflict between Samuelson’s preaching and practice (1964), Samuelson replied, “[s]cientists constantly utilize parables, paradigms, strong polar models to help understand more complicated reality. The degree to which these do more good than harm is always an open question, more like an art than a science” (1964, p. 739). But what can the word “understand” mean here?
Juxtaposing the author of “An Exact Consumption-Loan Model” with the methodologist whose views I have been discussing, one might reasonably suppose that these are two different people who happen to have the same name. This is a vivid example of the methodological schizophrenia of late twentieth-century economics, and it is found in “the very model of a modern neoclassical.”Footnote 8 What causes it? Why does Samuelson espouse a methodology that he so regularly violates? The reason, I conjecture, is that he believes that the equation of a theory with its empirical consequences is mandated by up-to-date philosophy of science. Since he is not about to reject this authority nor to keep his economic theorizing within behaviorist boundaries, he chooses instead to live with the contradiction.
11.3 Fritz Machlup and Logical Empiricism
During the 1940s, empirical qualms concerning economic theory grew in response not only to Hutchison’s critique and Samuelson’s operationalist program but also to efforts of economists to test fundamental propositions of the theory of the firm. For example, Richard Lester (Reference Lester1946, Reference Lester1947) tried to determine whether firms attempt to maximize expected returns, whether they face rising cost curves, and whether they in fact adjust production until marginal revenue equals marginal cost.Footnote 9 Lester’s tests, which consisted of surveys sent to various businesses, were not well designed. But they attracted considerable attention and provoked strong responses (e.g., Machlup Reference Machlup1946, Reference Machlup1947; Stigler Reference Stigler1947), partly because everybody knew that Lester was right: that firms did not behave precisely as the theory of the firm maintains. As Fritz Machlup, one of Lester’s harshest critics, wrote in response to Hutchison’s empiricist critique:
Surely some businessmen do so [maximize net returns] some of the time; probably most businessmen do so most of the time. But we would certainly not find that all of the businessmen do so all of the time. Hence, the assumption of consistently profit-maximizing conduct is contrary to fact.
Machlup apparently confesses that Lester was right; yet he does not accept Lester’s critique. To criticize the details of Lester’s surveys, while conceding the relevance of more sophisticated studies of the same kind, seems to surrender the traditional neoclassical ship to the rising tide of logical positivism.
Machlup defends economic theory from empirical criticisms such as Lester’s and from philosophical criticisms such as Hutchison’s by applying more philosophically sophisticated views of theory structure and theory appraisal defended in the later work of the logical positivists or, as they preferred to be called, “logical empiricists.”Footnote 10 Although Machlup’s reasons are different from Samuelson’s or Hutchison’s, he rejects Mill’s deductive method as completely as they do. In attempting to reply to what he calls “ultra-empiricist” criticisms of economic theory, Machlup argues that the truth or falsity of the basic postulates of economics is not open to direct observation or test. For example, he compares the notion of “money illusion” to that of the neutrino:
With the help of the new construct the consequences deduced from the enlarged system promised to correspond to what was thought to be the record of observation; but the construct is without direct reference to observables and no one could reasonably claim to have any direct experience of illusions suffered by other minds. The reference to observed phenomena is entirely indirect.
In Machlup’s view, there is no direct way to observe or test the assumptions or basic postulates of economics. One can only assess them indirectly by testing the observable consequences that one can derive with their help.Footnote 11 At times (1955, 1956), Machlup suggests an instrumentalist view (see §A.2), whereby it is inappropriate to assess the truth or falsity of theoretical claims at all. The only relevant question is whether such claims are good tools for making predictions concerning observable market phenomena. At other times (1960), Machlup suggests instead that such theoretical claims are “partially interpreted” through their links with observational consequences and may justifiably be judged true or false, according to whether their consequences are true or false. Either way, the denial that these propositions can or should be (directly) tested themselves is central to Machlup’s position:Footnote 12
Unfortunately, writers on verification have all too often overlooked the important difference between the (direct) verification of a single empirical proposition and the (indirect) verification of a theoretical system consisting of several propositions, some of which need not be directly verifiable and need not be composed of operational concepts. These are not directly verifiable propositions and these non-operational concepts may be perfectly meaningful.
Machlup’s response to Hutchison, to Lester, to Samuelson, and to all who question fundamental theory is to accuse them of the methodological error of attempting to assess directly the basic postulates of economic theory instead of focusing on their observational consequences. Machlup contends that up-to-date philosophy of science supports the view that fundamental theory need do no more than demonstrate its fruitfulness in deriving correct observational consequences. Just as sophisticated logical positivists recognize the legitimacy of theories in physics that concern unobservable phenomena, yet have correct observational implications, so should economists recognize the legitimacy of theories in economics that have correct observational implications. Machlup concludes that Hutchison is mistaken about which criteria must be satisfied by the statements of “pure theory,” that Lester is mistaken about what to test, and that Samuelson is mistaken about the role of theory in systematizing data. Machlup maintains that with a more sophisticated understanding of philosophy of science, the empiricist criticisms of economic theory dissolve.Footnote 13
Although superficially plausible, the analogy between the unobservable claims of particle physics and the false claims of equilibrium theory breaks down. Economic theories rarely make claims about unobservable things other than the unobservables (“commonsensicals” in Mäki’s term) of everyday life. Although one cannot “see” money illusion, economists can observe the behavior of salaried workers who have had a 3 percent raise in a period of 4 percent inflation or ask them whether they are richer or poorer than a year before. Such a test is no more “indirect” than are the econometric tests employed to assess the supposedly observable implications of economics concerning price changes.Footnote 14 No interesting claims in any of the sciences are appreciably more observable than is the claim that laborers suffer from money illusion. In any scientifically relevant sense of “direct testing” or “direct observing,” the behavioral postulates of economics are generally directly testable. The problem with claims such as “people’s preferences are transitive” or “firms attempt to maximize profits” is not that they are untestable but that they are false.
Why does Machlup defend this thesis? One reason is that he is motivated by a philosophical view of the privacy of subjective experience. In addition, he thinks that he can exploit contemporary philosophical work concerning theoretical physics. Given the failure of attempts to relate theoretical claims closely to observational claims via either explicit definition or reduction sentences, the logical empiricists retreated either to “partial interpretation” or to noncognitive instrumentalist views of theoretical claims.Footnote 15 Machlup gives the instrumentalist and partial interpretation views a twist when he applies them to defend equilibrium theory. Unlike the logical empiricists, he is not trying to show how statements might be legitimate even if one cannot test them directly. He argues instead that one should not test the basic assertions of equilibrium theory individually and that one should ignore their apparent falsity. There is nothing in the work of the logical positivists that supports an injunction not to test or to ignore the results of tests. Without the mistaken analogy with the instrumentalist or partial interpretation views of the logical empiricists, Machlup has no argument against testing the “laws” of equilibrium theory by psychological experimentation or surveys, no matter how unsettling the results may be (see Chapters 14, 15, and 16). Moreover, Machlup insists forcefully both that theories are explanatory and that falsehoods are not explanatory. Machlup has no coherent answer to Hutchison’s philosophical critique or to Lester’s survey results. This conclusion is not surprising, for survey results obviously can be relevant data (see, e.g., Blinder and Choi Reference Blinder and Choi1990). Although now largely abandoned, one of the most damaging methodological legacies that Machlup (and, as we shall see, Friedman) left behind was a repudiation of survey research by economists.
11.4 Friedman’s Narrow Instrumentalism
The most influential way of apparently reconciling economics and up-to-date philosophy of science was not Machlup’s but Milton Friedman’s. Friedman’s essay, “The Methodology of Positive Economics,” is by far the most influential methodological statement of the twentieth century. Although Friedman does not explicitly refer to contemporary philosophy of science, he, too, attempts to show that economics satisfies positivist standards. In “The Methodology of Positive Economics” (1953c), Friedman offered the apparent way out of the empirical difficulties raised by Lester and others which has proven most popular with economists.Footnote 16 It is that apparent way out, not the intricacies of Friedman’s views, with which I am concerned.
After distinguishing positive and normative economics, Friedman begins his response to critics of economics, such as Lester, by asserting that the goals of a positive science are exclusively predictive (1953c, p. 7). Economists seek significant and usable predictions, not understanding or explanation. The view that science, or at least economic science, aims only at prediction is a contentious one, for which Friedman offers no argument, and it might reasonably be challenged (see §A.2). But since Friedman’s methodological views are untenable even if one grants his claim that the goals of economics are exclusively predictive, let us grant his view of the ultimate goals of science for the purposes of argument.Footnote 17
In Friedman’s usage, any implication of a theory whose truth is not yet known counts as a prediction of a theory, whether or not it is concerned with the future. He argues that since the goals of science are exclusively predictive, a theory which enables one to make reliable predictions is a good theory. In the case of a tie on the criterion of predictive success, simpler theories or theories of wider scope (that apply to a wider range of phenomena) are to be preferred, unless they are inconvenient to use (1953c, p. 10).
Friedman stresses that there is no other test of a theory and, in particular, no test in terms of whether its “assumptions” are “unrealistic” (1953c, p. 14). When Friedman speaks of the “assumptions” of a theory, he includes both fundamental assertions (such as the claim that consumers are utility maximizers) and additional premises needed in particular applications (such as the claim that different brands of cigarettes are perfect substitutes for one another). It is not clear what Friedman or the critics he is responding to mean by the term “unrealistic.” Friedman equivocates. Sometimes he means simply “abstract” or “not descriptively complete.” But usually, when he calls an assumption unrealistic, he means (as he must if he is to respond to Lester’s challenge) that it is not true, nor even approximately true, of the phenomena to which the theory is applied.
Friedman argues that researchers such as Lester mistakenly attempt to assess the “assumptions” of economic theory instead of its predictions. In dismissing assessment of assumptions, Friedman is responding to a critical tradition which extends back to the German Historical School via American institutionalists such as Veblen (1898, 1900, 1909). Authors in this tradition object to the unrealistic assumptions of economic theory because they question the worth of abstract theorizing. Friedman apparently enables economists to reject all such criticism as fundamentally confused.Footnote 18
However, Lester’s case cannot be dismissed so easily. He apparently shows that economic theory makes false predictions concerning the results of his surveys. The distinction between assumptions and implications is a shallow one that depends on the particular formulation of a theory. Assumptions trivially imply themselves, and theories can be reformulated with different sets of assumptions that have the same implications. False assumptions concerning observable things will always result in false predictions.
For a standard instrumentalist (§A.2) who regards all the observable consequences of a theory as significant, this difficulty is insuperable, but Friedman is not such a standard instrumentalist. When one looks hard, one can find ample evidence that his instrumentalism is narrower. Consider the following six passages; note in each the restriction to a particular class of phenomena:
“Viewed as a body of substantive hypotheses, theory is to be judged by its predictive power for the class of phenomena which it is intended to ‘explain’” (1953c, pp. 8–9).
“For this test [of predictions] to be relevant, the deduced facts must be about the class of phenomena the hypothesis is designed to explain” (1953c, pp. 12–13).
“Misunderstanding about this apparently straightforward process centers on the phrase ‘The class of phenomena the hypothesis is designed to explain.’ The difficulty in the social sciences of getting new evidence for this class of phenomena and of judging its conformity with the implications of the hypothesis makes it tempting to suppose that other, more readily available, evidence is equally relevant” (1953c, p. 14).
“Clearly, none of these contradictions of the hypothesis is vitally relevant; the phenomena involved are not within the ‘class of phenomena the hypothesis is designed to explain …’” (1953c, p. 20).
“The decisive test is whether the hypothesis works for the phenomena it purports to explain” (1953c, p. 30).
“The question whether a theory is realistic ‘enough’ can be settled only by seeing whether it yields predictions that are good enough for the purpose in hand” (1953c, p. 41).
Although some ambiguities are hidden by taking these quotations out of context, they show that Friedman rejects a standard instrumentalist view whereby all the observable predictions of a theory matter to its assessment. He is maintaining in effect that a good tool need not be an all-purpose tool. The goal of economics and of science in general is “narrow predictive success” – correct prediction for “the class of phenomena the hypothesis is designed to explain.” Lester’s surveys are irrelevant because answers to survey questions are not among the phenomena that the theory of the firm was designed to explain. Those who reject any inquiry into whether the claims of the theory of choice are true of individuals reason the same way.
Friedman’s views are a distinctive form of instrumentalism. Mistaken predictions matter only if they detract from a theory’s performance in predicting the phenomena it was designed to “explain.” A theory of the distribution of leaves on trees which states that it is as if leaves had the ability to move instantaneously from branch to branch is thus regarded by Friedman as perfectly “plausible” (1953c, p. 20), although of narrower scope than the accepted theory. On Friedman’s view, if a theory predicts accurately what one wants to know, it is a good theory, otherwise not.
When Friedman says that it is as if leaves move or as if expert billiard players solve complicated equations (1953c, p. 21), what he means is that attributing movement to leaves or higher mathematics to billiard players leads to correct predictions concerning the phenomena in which one is interested. And a theory which accomplishes this is a good theory, for a “theory is to be judged by its predictive power for the class of phenomena which it is intended to ‘explain’” (1953c, p. 8). It may thus seem obvious that the realism of a theory’s assumptions or the truth of its uninteresting or irrelevant implications is unimportant except insofar as either restricts the theory’s scope. Since economists are not interested in what business people say, the results of Lester’s surveys are irrelevant.
It might seem that Friedman has drawn an obvious implication of the instrumentalist view that the goals of economics are exclusively predictive. If all that matters are correct predictions concerning some class of phenomena, then surely the only test of a theory is the correctness of its predictions concerning that class of phenomena? Since the unrealistic claims within theories are not predictions concerning the relevant class of phenomena, their falsity is irrelevant.
Although apparently plausible, this line of thought is fallacious. Whether the assumptions of a model are true or false remains relevant, even if one grants Friedman’s narrow instrumentalism. Consider the following elaboration of the line of thought presented in the previous paragraph:
1. A good scientific hypothesis provides valid and meaningful predictions concerning the class of phenomena it is intended to explain (premise).
2. If a good scientific hypothesis provides valid and meaningful predictions concerning the class of phenomena it is intended to explain, then the only relevant test of whether a scientific hypothesis is a good scientific hypothesis is whether it provides valid and meaningful predictions concerning the class of phenomena it is intended to explain (premise).
3. Thus, the only relevant test of whether a scientific hypothesis is a good scientific hypothesis is whether it provides valid and meaningful predictions concerning the class of phenomena it is intended to explain (from 1 and 2).Footnote 19
4. Any other facts about a hypothesis, including whether its assumptions are realistic, are irrelevant to its scientific assessment (trivially from 3).
The argument is valid and enticing (at least if one accepts Friedman’s criterion of narrow predictive success, restated in premise 1), but it is not sound. Premise 2 is false. To see why, consider the following analogous argument.
a. A good used car drives reliably (premise).
b. If a good used car drives reliably, then the only relevant test of whether a used car is a good used car is a road test.
c. The only test of whether a used car is a good used car is a road test (from a and b).
d. Anything one discovers by opening the hood and checking the separate components of a used car is irrelevant to its assessment (trivially from c).
Presumably nobody believes c or d.Footnote 20 What is wrong with the argument? If a road test were a conclusive test of a car’s future performance, then premise b would be true, and there would indeed be no point in looking under the hood. We would know everything about its performance, which is all we care about. But a road test only provides a small sample of this performance. A mechanic who examines the engine can provide relevant and useful information. The mechanic’s input is particularly important when one wants to use the car under new circumstances and when the car breaks down.Footnote 21 One wants a sensible and skilled mechanic who not only notices that the components have flaws, but who can also judge how well the components are likely to serve their separate purposes.
Similarly, given Friedman’s view of the goal of science, there would be no point to examining the assumptions of a theory if it were possible to carry out a definitive assessment of its future performance with respect to the phenomena it was designed to explain. But one cannot do such an assessment. Indeed, the whole point of a theory is to guide us in circumstances where we do not already know whether the predictions are correct.Footnote 22 There is thus much to be learned by examining the components (assumptions) of a theory and its predictions concerning phenomena that it was not intended to explain. Such consideration of the “realism” of a theory’s assumptions is particularly important to provide guidance when extending the theory to new circumstances or when revising it in the face of predictive failure.Footnote 23 What is relevant in the messy world of economics is not whether the assumptions are perfectly true, but whether they are adequate approximations and whether their falsehood is likely to matter for particular purposes. Saying this is not conceding Friedman’s case. Wide, not narrow, predictive success constitutes the grounds for judging whether a theory’s assumptions are adequate approximations. The fact that a computer program correctly solves a few problems does not render study of its algorithm and code superfluous or irrelevant.
As is implicit in the previous remarks, there is some truth in Friedman’s defense of theories containing unrealistic assumptions. For failures of assumptions may sometimes be irrelevant to the performance of the hypothesis with respect to the designated range of phenomena. Just as a malfunctioning air-conditioner is irrelevant to a car’s performance in Alaska (setting aside global warming), so is the falsity of the assumption of infinite divisibility unimportant in hypotheses concerning markets for basic grains. Given Friedman’s narrow view of the goals of science (which I am conceding for the purposes of argument but would otherwise contest), the realism of assumptions may thus sometimes be irrelevant. But this practical wisdom does not support Friedman’s strong conclusion that only narrow predictive success is relevant to the assessment of a hypothesis.
One should note three qualifications. First, we sometimes have a wealth of information concerning the track record of theories and automobiles. I may know that my friend’s old Ford has been running without trouble for the past seven years. The more information we have about performance, the less important is separate examination of components. But it remains sensible to assess assumptions or components, particularly in circumstances of breakdown and before applying them in a new way. Second, intellectual tools, unlike mechanical tools, do not wear out. But, if one has not yet grasped the fundamental laws governing a subject and does not fully know the scope of the laws and the boundary conditions on their validity, then generalizations are as likely to break down as are physical implements. Third (as Erkki Koskela reminded me), it is easier to interpret a road test than an econometric study. The difficulties of testing in economics make it all the more mandatory to look under the hood.
When either theories or used cars work, it makes sense to use them – although caution is in order if the risks of failure are large, and their parts have not been examined or appear to be faulty. But known performance in some sample is not the only relevant information. Economists must (and do) look under the hoods of their theoretical vehicles. When they find embarrassing things there, they must not avert their eyes and claim that what they have found cannot matter.
Thus, even if one fully grants Friedman’s view of the goals of science, one should still be concerned about the realism of assumptions. There is no good way to know what to try when a prediction fails or whether to employ a theory in a new application without judging its assumptions. Without assessments of the realism – that is, the approximate truth – of assumptions, the process of theory modification would be hopelessly inefficient and the application of theories to new circumstances nothing but guesswork. Even if all one wants of theories are valid predictions concerning particular phenomena, one needs to judge whether the needed assumptions are reasonable approximations, and one thus needs to be concerned about incorrect predictions, no matter how apparently irrelevant.
I have dwelled on Friedman’s views not only because of their influence but because they show the same methodological schizophrenia that we saw in Samuelson’s work. Friedman’s confidence in “the maximization-of-returns hypothesis” and in mainstream theory in general purportedly rests entirely on “the repeated failure of its implications to be contradicted” (1953c, p. 22; but see pp. 26–30 on indirect testing). On this, Friedman is at one with Popperian methodologists such as Blaug (1980a, 1980b). But the implications of economic theory have been contradicted on many occasions, and they would have been contradicted even if the theory deserved its highest praise. All it takes is some disturbance, such as a change in tastes, a new invention, a pandemic, or a real or imagined invasion from Mars.Footnote 24 Does any economist accept neoclassical theory on the basis of “the repeated failure of its implications to be contradicted?” Is this not rather a doctrine piously enunciated in the presence of philosophers or of their economist fellow travelers and conveniently forgotten when there is serious work to do (Mäki 1986, pp. 137–40)?
11.5 Koopmans’ Restatement of the Difficulties
In concluding this survey of the methodological revolution in the 1930s, 1940s, and 1950s, it is instructive to look back to Tjalling Koopmans’ Three Essays on the State of Economic Science (1957).Footnote 25 Koopmans’ Essays was written in the wake of Hutchison’s and Friedman’s works (to which Koopmans refers), at about the same time as Machlup’s contributions, but before the great wave of comment and criticism directed toward Friedman’s “Methodology of Positive Economics” broke out. Koopmans cogently rejects Friedman’s methodological position, but also expresses hesitation about the exaggerated claims Robbins makes for the obviousness of the basic assumptions of economics. Koopmans incisively links his methodological comments to the details of particular problems in economics and argues that some problems call for more mathematical investigation of the implications of fairly obvious postulates, while others require more empirical work. He relies upon philosophical distinctions much stressed by the positivists, such as the distinction between syntax and semantics (see §6.1), to defend the importance of purely logical and mathematical explorations in economics, yet he also defends a nonpositivist notion of model building that is similar to the view developed in Chapter 6. He states his overall conclusion concerning the assessment of economic theory as follows:
Whether the postulates are placed beyond doubt [as in Robbins], or whether doubts concerning their realism are suppressed by the assertion that verification can and should be confined to the hard-to-unravel more distant effects [as in Friedman] – in either case the argument surrounds and shields received economic theory with an appearance of invulnerability which is neither fully justified nor at all needed. The theories that have become dear to us can very well stand by themselves as an impressive and highly valuable system of deductive thought, erected on a few premises that seem to be well-chosen first approximations to a complicated reality. They exhibit in a striking manner the power of deductive reasoning in drawing conclusions which, to the extent one accepts their premises, are highly relevant to questions of economic policy. In many cases the knowledge these deductions yield is the best we have, either because better approximations have not been secured at the level of the premises, or because comparable reasoning from premises recognized as more realistic has not been completed or has not yet been found possible. Is any stronger defense needed, or even desirable?
Depending on one’s aims, perhaps no stronger defense is needed. But a clearer one is. Although Koopmans’ general vision exudes good sense, he has avoided rather than answered the criticisms of Mill’s deductive method. Is it scientifically acceptable to rely on premises “that seem to be well-chosen first approximations?” How can such a methodology make use of the results of observation or experiment? Can such a methodology be legitimate? Given the inexactness of the premises, are the conclusions “highly relevant to questions of economic policy”? Is the test of whether this “knowledge” “is the best we have” (as Koopmans implies) a comparison only with other deductively structured and derived theoretical systems? As I argue in Chapter 13, Koopmans’ views are partially defensible. But their defensible core was not understood, and his remarks did nothing to head off three or four decades of misconceived methodological debate largely divorced from methodological practice.
Chapter 11 criticized mid-to-late twentieth-century alternatives to traditional views of theory appraisal in economics. This chapter looks to some of the philosophical underpinnings of later twentieth-century concerns about the methodology of economics. Apart from the logical positivists or logical empiricists, the philosopher who has had the greatest influence is Karl Popper (de Marchi 1988; Mäki 1990). His views are often invoked by leading figures such as Mark Blaug and Terence Hutchison, as well as by lesser writers on economic methodology. Popper’s philosophy influenced a major introductory textbook, Richard Lipsey’s An Introduction to Positive Economics (1966). If Popper is right about what scientific methodology requires, then the traditional Millian view of justification in economics developed in Chapters 9 and 10 is indefensible.
It is odd that Popper’s philosophy of science has been so influential in economics, because, as we shall see, he demands that scientific theories make claims that, unlike the inexact generalizations of economics, can be conclusively refuted by data from observation or experiment. Such a view immediately condemns virtually all of economics, and there is no prospect of replacing mainstream economics with any general theory that would begin to satisfy Popper’s strictures. The views defended by Popper’s follower and then challenger, Imre Lakatos, might seem better suited to the task, but interest in Lakatos appears to have collapsed. Like Popper, Lakatos emphasizes the centrality of theory and sees science as devoted to the articulation and refinement of theory, while the focus in economics is on specific questions, which are often of relevance to policy. Popper’s views are still mentioned, but they are now of much less interest to economists.Footnote 1
This chapter presents and answers the Popperian and Lakatosian challenge to Mill’s views (and to mainstream economicsFootnote 2) by criticizing Popper’s and Lakatos’ philosophy of science. These criticisms are widely accepted among philosophers.Footnote 3 Although I share Popper’s and Lakatos’ concern that scientific theories be testable, their views prevent economists from coming to terms with the problems of testing in economics. A reasoned concern with testability leads one to reconsider the Robbins–Mill view of theory appraisal in economics, rather than to discard it entirely.
12.1 The Problems of Demarcation
Throughout his work, Popper is concerned with what he calls “the problem of demarcation”: the problem of distinguishing science from nonscience (§A.8). He is careful to stress that this is not the problem of distinguishing what is true from what is false. Science may lead us astray, and nonsciences or even pseudo-sciences may happen on the truth. Scientific results are not conclusively proven, and even the most successful scientific theories, such as Newton’s theory of motion, can be found in error.
If the demarcation of what is scientific from what is not scientific is not a demarcation of the true from the false, why should one care whether some discipline or theory is scientific? Of course, it matters when it comes to membership in the American Association for the Advancement of Science or to the receipt of grants from the National Science Foundation. Our culture values science (or used to). Other cultures value sorcery. Does the question of whether something is scientific have any further significance?
The answer to which Popper subscribes is simple. What justifies a concern with the problem of demarcation is that science has a particularly excellent method of weeding out truth from falsity. In non-Popperian terminology, the conclusions of science have a special claim to be believed. Nancy Reagan’s reliance on the advice of an astrologer was disturbing because there is no good reason to believe astrologers, not because astrologers are always wrong.Footnote 4 The distinction between the questions “is this claim true?” and “are we justified in believing this claim?” is critical to clear thinking in the philosophy of science.
Popper argues that what distinguishes scientific theories, such as Newton’s or Einstein’s, from unscientific theories, such as Freud’s or those endorsed by astrologers, is that scientific theories are falsifiable. A theory is falsifiable if there are some possible tests or observations which, if the results are unfavorable, would be evidence that the theory is false. “All swans are white” is the sort of statement which is appropriate in science, because the observation of a nonwhite swan can establish its falsity (Popper’s own example, 1968, p. 27).
Popper refines this intuitive notion by distinguishing a class of “basic statements” upon whose truth agreement is easily obtained. Basic statements are true or false reports of observations that are of an “unquestioned empirical character” (1969a, p. 386; see also 1968, §§28, 29).Footnote 5 Accepted basic statements are not certain, infallible, or incorrigible. One is not forced by the facts to accept them. But people do (albeit tentatively) decide to do so, and they rather easily reach agreement on which basic statements to accept.
Popper defines a theory as falsifiable if and only if it is logically inconsistent with some finite set of basic statements (whether true or false). A falsifiable but true theory will not be inconsistent with any set of true basic statements, but it will be inconsistent with some set of false basic statements. What is important is that a falsifiable theory is not consistent with everything that might be claimed to have been observed: it forbids some possible observations.
In an introduction to the Postscript to the Logic of Scientific Discovery written in the early 1980s, Popper writes:
It is of great importance to current discussion to notice that falsifiability in the sense of my demarcation criterion is a purely logical affair … A statement or theory is, according to my criterion, falsifiable if and only if there exists at least one potential falsifier – at least one possible basic statement that conflicts with it logically.
Popper insists here on a demarcation between scientific theories, which are logically falsifiable, and theories that are not logically falsifiable. Yet, decades before in The Logic of Scientific Discovery itself, Popper wrote: “Indeed, it is impossible to decide, by analysing its logical form, whether a system of statements is a conventional system of irrefutable implicit definitions, or whether it is a system which is empirical in my sense; that is, a refutable system” (1968, p. 82).
By a “conventionalist theory” Popper means a theory whose claims are taken, as a matter of convention or decision, to be true, or at least beyond questioning. Here the demarcation depends on the methods and attitudes of the practitioners rather than on the logical properties of theories.
Despite his apparent concern to distinguish scientific theories or statements from theories or statements that are not scientific, throughout his career, Popper stressed the importance of methodological decisions and was concerned with a demarcation between scientific practices and practices that are not scientific. Popper maintains that through the use of “conventionalist stratagems” people can cling unscientifically to theories such as Marx’s, which were falsifiable, and were, in Popper’s view, falsified. Let us first examine how Popper distinguishes scientific theories from theories that are not scientific.
12.2 Logical Falsifiability and Popper’s Solution to the Problem of Induction
As a corollary of logical falsifiability, Popper emphasizes an “asymmetry between verifiability and falsifiability; an asymmetry which results from the logical form of universal statements” (1968, p. 41; see also 1983, pp. 181–9). A universal statement concerning an unbounded domain may be falsifiable – that is, it may be inconsistent with some basic statements. But it will not be verifiable – it will not be deducible from any finite set of basic statements, and its negation will not be inconsistent with any finite set of basic statements. For example, “this swan is black” falsifies “all swans are white.” But no set of observation reports entails “all swans are white.” It is not possible to verify any truly universal statement, but one can falsify it or verify its negation.
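The asymmetry can be put schematically (my illustration, not Popper’s notation). Let the universal statement be “all swans are white,” (∀x)(Sx → Wx). A single accepted basic statement of the form Sa ∧ ¬Wa (“a is a swan and a is not white”) is logically inconsistent with it, so one counterexample suffices for falsification. But no finite set of basic statements of the form Sa₁ ∧ Wa₁, …, Saₙ ∧ Wₐₙ entails it, because the unbounded domain may always contain an unexamined nonwhite swan. The negation, “some swan is not white,” is correspondingly verifiable by a single observation but not falsifiable by any finite set of them.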
Popper argues that this asymmetry between falsifiability and verifiability leads to a solution to the problem of induction.Footnote 6 As Popper understands Hume (1972, p. 7), the problem of induction is the unsolvable problem of finding a valid argument with only basic statements as premises and a universal statement as a conclusion. In accepting the conclusion that no such argument can be given, Popper agrees with Hume. But Popper argues that one need not accept Hume’s skeptical conclusion that human inductive proclivities have no rational justification. Since one can provide valid deductive arguments against universal statements, one can (albeit fallibly, since basic statements are not themselves infallible) find out that theories are wrong.
By itself this observation does not solve the problem of induction: only the fallacy of elimination enables one to find one theory meritorious merely because an alternative theory has been refuted. However, Popper has a more radical proposal, which is to cut the linkage between knowledge and justification altogether. Conjectures about the world constitute knowledge if they are true. No justification needed. In testing conjectures, scientists sometimes find out that they are false and not knowledge at all. That which has not been falsified one takes to be knowledge. Justification has no role. Popper maintains that Hume was right to point out that claims about the future and universal generalizations cannot be justified, but he was wrong to believe that justification is needed.
Popper has put pleasant make-up on Hume’s ugly conclusion, but Popper’s bottom line is as skeptical and ultimately as nihilistic as Hume’s. For Popper explicitly denies that there is any room for argument in support of any theory or law. He writes, for example, “that in spite of the ‘rationality’ of choosing the best-tested theory as a basis of action, this choice is not ‘rational’ in the sense that it is based upon good reasons for expecting that it will in practice be a successful choice; there can be no good reasons in this sense, and this is precisely Hume’s result” (1972, p. 22, emphasis in original). One has no better reason to expect that the predictions of well-tested theories will be correct than to expect that untested theories will predict correctly.
Although I reject Popper’s solution to the problem of induction, I think that his proposed revision of the concept of knowledge is in the correct direction; it is merely too extreme. As argued in Section 12.4 and sketched in Section A.7, we should reject Hume’s view that human knowledge is in need of foundational justification – that is, the view that justification requires a logically valid argument from premises that are self-justified. Rather than surrender justification altogether, as Popper proposes, we need to temper our justificatory demands.
Popper’s purported solution to the problem of induction presupposes that individual scientific statements or individual scientific theories are falsifiable. But, as Popper himself notes, scientific theories are not logically falsifiable. Neither are probabilistic claims. Even a million heads in a row does not logically falsify the claim that a particular coin is unbiased. Moreover, claims that cannot be tested individually are not themselves inconsistent with any finite set of basic statements and hence are not logically falsifiable. And virtually no scientific claims of any interest and none of the conjunctions of such claims that constitute recognizable scientific theories are, by themselves, inconsistent with sets of basic statements. To falsify even so simple a scientific claim as Galileo’s law of falling bodies requires nonbasic statements concerning whether nongravitational forces are present. Only conglomerates of various theories, statements of initial conditions, and auxiliary assumptions concerning the absence of interferences will entail a prediction. If statements or theories can be regarded as scientific only if they are logically falsifiable, all nontrivial science is not science after all. Requiring that scientific theories must be individually falsifiable demands too much.
Despite his insistence on logical falsifiability, Popper recognizes that scientific theories are not logically falsifiable, and he discusses at length the role of background knowledge in testing and the “conventionalist stratagems” true believers might employ to shield theories from falsification. Logical falsifiability is not a criterion that scientific statements or even whole scientific theories have to satisfy individually. Only whole systems of scientific theories, auxiliary assumptions, and statements of initial conditions are falsifiable (1983, p. 187). Let us call such logically falsifiable conglomerates “test systems.” Galileo’s law of falling bodies is not itself logically falsifiable, but conjoined with claims about resistance and friction and other forces, one has a falsifiable test system.
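Schematically (my gloss, not Popper’s own formulation): what entails a testable prediction P is a conjunction of theory, auxiliary assumptions, and initial conditions, T ∧ A ∧ C. If observation yields ¬P, deductive logic licenses only the conclusion ¬(T ∧ A ∧ C) – at least one conjunct is false – not ¬T. The law of falling bodies is convicted only relative to a decision to hold the claims about friction, air resistance, and other forces beyond question for the purposes of the test.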
To require merely that scientific theories be incorporated into logically falsifiable test systems is unfortunately inadequate as a criterion of demarcation and inconsistent with Popper’s purported solution to the problem of induction. Demanding that test systems be logically falsifiable fails to distinguish science from pseudo-science. Virtually nothing fails to count as science. For example, Popper’s objection to Marx and Freud is not that one cannot derive falsifiable predictions from their theories and various statements of initial conditions and auxiliary assumptions. What Popper objects to is instead the behavior of Freudians and Marxists – their purported unwillingness to abandon their theories in the face of predictive failures.Footnote 7
Popper’s concession that only whole test systems are logically falsifiable also rules out his solution to the problem of induction. Since individual scientific theories need not be falsifiable, there is no logical asymmetry between the verifiability and falsifiability of particular scientific theories: they are neither logically verifiable nor logically falsifiable. Accepted basic statements and deductive logic can get one to the falsity of whole test systems and no further.
12.3 Falsificationism as Norms to Govern Science
Popper has always stressed that methodology is concerned with rules, not simply with logic. As documented earlier, he sometimes addresses the problem of what distinguishes scientific practices from practices that are not scientific, and he specifies norms that scientists should follow. Popper’s central methodological claim is that what distinguishes scientists from nonscientists is that scientists have a critical attitude. They look for hard tests and they take test results seriously. Scientists treat their theories as corrigible hypotheses that should be put to serious tests. Such a critical attitude does not separate scientists from classical scholars, historians, or literary critics, but it does highlight the dogmatism of “scientific creationists” or of many astrologers (e.g., West and Toonder 1973).
Descending from this plausible and salutary general vision to a more detailed level, Popper’s account runs into difficulties, because he demands too much and denies the existence of empirical justification. Popper’s falsificationist methodology – his account of what a critical attitude is – consists in outline of three rules addressed to scientists:
1. Propose and consider only contentful and thus testable theories.
2. Seek only to falsify scientific theories.
3. Accept theories that withstand attempts to falsify them as worthy of further critical discussion, never as certain, probable, or close to the truth.Footnote 8
The second and third rules are better understood as requirements on the institutions of science than on individual scientists (1969b, p. 112; but see 1972, p. 266). With open and free communication, the institution as a whole may be critical, even though individual scientists attempt to protect their own theories from criticism.
Are these three rules good rules? The answer presumably depends on what the objectives of scientists should be. And therein lies a tangled story, which we had better avoid tackling here. Since realists and instrumentalists (see §A.2) agree that one fundamental goal of science is to provide correct predictions, let us consider how well following these rules promotes this goal.
The first rule is unproblematic and generally accepted. Science (not necessarily individual scientists) should scrutinize theories with lots of content that can be tested harshly. But there is no news in the first rule. Inductivists have been saying these things since at least the seventeenth century (Grünbaum 1976, pp. 17–18).
The second and third rules are hard to accept. Why should scientists seek only to falsify theories, never to support them, and why should they never regard theories as more than conjectures that may be worthy of criticism?Footnote 9 Popper offers four reasons in defense of these two rules:
1. Popper maintains that confirming evidence is worthless since “[i]t is easy to obtain confirmations, or verifications, for nearly every theory – if we look for confirmations” (1969a, p. 36). But only comparatively worthless confirmation is readily available. From a Bayesian perspective, for example, a good test requires not only a high likelihood, Pr(e | h), but also an unlikely prediction, a low Pr(e) (§A.7–§A.8; see also the illustration following this list). Good supporting evidence is hard to obtain and leads one to seek harsh tests (see Grünbaum 1976, pp. 215–29).
2. Popper argues that to seek confirmation or to believe that one has found it shows a dogmatic attitude rather than the critical attitude shown by those who seek falsifications (1969a, pp. 49–50). But someone who seeks confirmation for a theory need not be credulous, closed-minded, or dogmatic. A person seeking the solution to a problem, who is concerned both with confirming and with disconfirming evidence, does not automatically qualify as a dogmatist.
3. Popper suggests that to seek supporting evidence or to regard scientific theories as sometimes well-established falsely supposes that scientific knowledge is infallible. But knowledge claims may be both well supported and fallible.Footnote 10
4. Popper finally maintains that it is impossible for evidence to support scientific theories. Scientific theories cannot be confirmed. Popper says bluntly, “there are no such things as good positive reasons; nor do we need such things” (1974, p. 1043). There is no such thing as supporting evidence.Footnote 11 In Popper’s view, evidence that truly provides positive reason for accepting a scientific claim cannot be had, while evidence that inductivists mistakenly take to support scientific theories is easy to obtain. Popper’s conviction that one never has good reason to accept any scientific claims takes us back to the problem of induction. Because there are no valid arguments with only basic statements as premises and scientific theories as conclusions, Popper concludes that there is no supporting evidence, no sense in seeking it, and no justification for believing that one has found it. It remains rational to seek to criticize scientific theories, because such theories are falsifiable: there may be good arguments against them. But this asymmetry depends on the mistaken view, which is explicitly denied by Popper, that individual laws and theories in science are falsifiable.
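To illustrate the Bayesian point in the first reply above (a schematic gloss in terms that Popper himself would of course reject): by Bayes’ theorem,

Pr(h | e) = Pr(e | h) · Pr(h) / Pr(e).

If the evidence e was to be expected regardless of the hypothesis – if Pr(e) is close to 1 – then even a perfect likelihood, Pr(e | h) = 1, leaves the posterior probability of h barely above its prior. Only when Pr(e) is small, so that the prediction is risky, does a successful prediction substantially raise the credibility of h. Cheap confirmations are thus nearly worthless, and good supporting evidence requires courting falsification.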
12.4 Decisions, Evidence, and Scientific Method
Since scientific statements and theories are not individually falsifiable, how can one test theories harshly? Popper’s answer is that it is legitimate for the purpose of testing to make further decisions to take nonbasic statements as unproblematic background knowledge. These further decisions make it possible to “falsify” specific scientific theories.Footnote 12
Let us call falsifications whose premises include both basic statements and background knowledge “conventional falsifications” as opposed to logical falsifications. (These labels are somewhat misleading, because the tentative acceptance of basic statements as true is just as conventional as the decision to regard nonbasic statements as background knowledge.) Notice that there is no conventional asymmetry of falsification and verification. If it is permissible to include background knowledge among one’s premises in order to make conventional falsifications possible, then one also makes conventional verifications possible. The conventional asymmetry thesis fails, and Popper has failed to defend his claim that scientists should seek falsifications only.
Consider, for example, a scientist attempting to determine the spectrum of a newly discovered metallic element (see Nisbett and Thagard 1982 and Holland et al. 1986). The scientist already knows that the spectrum of an element is invariant from pure sample to sample. Given (1) this background knowledge, (2) the report of a particular Bunsen burner’s flame turning orange, and (3) the claim that the particular sample was pure, the scientist can deduce that all pure samples of the element will turn a Bunsen burner’s flame orange. Given background knowledge in addition to basic statements, one can provide good arguments verifying universal statements. There is no claim to incorrigibility or infallibility in pointing to such possibilities of verification, and one need be no more dogmatic in offering such arguments than one is when one relies on basic statements and background knowledge to falsify scientific claims.Footnote 13
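The structure of the example can be set out explicitly (my reconstruction of the inference, with the premises labeled to show their status):
Background knowledge B: every pure sample of the element produces the same flame color.
Basic statements O: this sample is pure, and it turned the flame orange.
Conclusion: every pure sample of the element turns the flame orange.
The argument is deductively valid. What is conventional is only the decision to treat B as unproblematic – precisely the kind of decision that conventional falsification also requires – so admitting such decisions makes verification no less available than falsification.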
Reliance on decisions does not set falsification and verification apart, and Popper’s rules are a poor procedure for determining from which theories useful predictions can be derived. In practice, in both science and everyday life, people make estimates of how well-established and plausible various claims are and how risky it is to rely on them. These estimates may be mistaken and may be revised, but they are used in pure science as well as in everyday action and engineering. Beliefs about how well-supported different propositions are may be crucial to the interpretation of experimental failures. If someone reports that price increases were followed by demand increases, economists would conclude that there was some statistical error or that other causal factors were involved. This conclusion is based on the judgment that the law of demand is a good approximation to the truth. Weaker links are more likely to break (see §13.1). Popper alleges that such judgments are unsupportable, and a Popperian scientist would not make them. He is calling for a revolution in the conduct of inquiry.
Consider how a scientific practice would work that relies on evidentially unsupported decisions to regard statements as part of background knowledge. Could there even be a completely noninductivist Popperian science (see Watkins 1984)? In deciding what to do in the light of the failure of a prediction of a whole test system, one might, perhaps, be guided entirely by consideration of which revisions are maximally content increasing, least ad hoc, etc. Questions about the differing degrees of confirmation of the constituents of the system would play no role. Such an enterprise is radically unlike the science we are familiar with, and, indeed, Popper is hesitant in presenting it. Zahar (1983, p. 168) quotes the following passages from Popper (1979), written in 1930–1, which illustrate vividly Popper’s early hesitance:
We unquestionably believe in the probability of hypotheses. And what is more significant: our belief that many a hypothesis is more probable than others is motivated by reasons which undeniably possess an objective character (Grunde, denen ein objecktiver Zug nicht abgesprochen werden kann).
The subjective belief in the probability [of hypotheses] … assumes that a corroborated hypothesis will be corroborated again. It is clear that without this belief we could not act and hence that we could not live either … Its objective motives are clarified by the notion of corroboration to such an extent that this belief should not give rise to the deployment of any further epistemological questions.
Popper seems to grapple with the problem (“without this belief we could not act”), but then to back away from it with words that baffle me:
There is first the layered structure of our theories – the layers of depth, of universality, and of precision. This structure allows us to distinguish between more risky or exposed parts of our theory, and other parts which we may – comparatively speaking – take for granted in testing an exposed hypothesis.
Perhaps this passage is consistent with Popper’s philosophy, but it seems that Popper has a hard time avoiding all reliance on evidential support.
A noninductive Popperian science would be inefficient. Just as I argued with respect to Friedman’s dismissal of questions concerning the realism of assumptions (§11.4), one good way to proceed in the case of failure and one basis for determining whether extensions of a theory to new domains are likely to work is to consider how well supported the components of one’s theory are. It should not take detailed philosophical argument to defend the truism that it matters for predictive purposes how well supported claims are. (But it may require a great deal of philosophical analysis and argument to clarify this truism and make it precise.) The concern with justification cannot be avoided by insisting on a purely theoretical view of science, for theoretical scientists, just as much as engineers, need to be able to rely on some statements in order to test others.
There is little to be said in defense of Popper’s second and third rules. It is sometimes sensible to regard theories that have been well tested and that have passed these tests as admissible into the background knowledge that one relies upon in developing and testing new theories – just as it is sometimes sensible to regard such theories as a reliable basis for engineering purposes. In learning more we are stuck on Neurath’s boat (§A.7), and as it becomes more seaworthy, we can repair it better.
12.5 Why Are Economic Theories Unfalsifiable?
Even if these criticisms could be answered, Popper’s rules for scientific procedure would be of little value to economists, because they foreclose the interesting questions one might ask concerning the falsifiability of economic theory.Footnote 14 The questions Popper permits all have trivial answers:
1. Are economic theories logically falsifiable by themselves? No, but neither are any interesting theories in science.
2. Can economic theories be incorporated into logically falsifiable test systems? Yes, but the same goes for theories of practically all disciplines, no matter how patently unscientific they may appear.
3. Can economists take the other statements in test systems to be background knowledge and regard economic theories as conventionally falsifiable? Yes, if one decides (without any evidential support) to take other statements to be background knowledge.
Economists concerned about whether economic theory is testable have not been preoccupied with these questions. They have instead wanted to know when experiment or observation provides good reason to believe that economic theories are correct or mistaken. That it is possible to incorporate such theories into logically falsifiable test systems and to decide to regard the other statements as background knowledge is only a necessary condition for tests to provide such reasons. In addition, economists need good reason to believe that the other statements in such test conglomerates are true or close to the truth or that their falsity does not matter for the particular inquiry. But, according to Popper, one never has such good reason. One can never justify the decision to regard a claim as part of background knowledge on the grounds of its confirmation or corroboration (Lieberson 1982b). Consequently, there is no way within Popper’s philosophy of science to capture the questions economists ask concerning the falsifiability of their theories. To understand in what way economic theories have seemed untestable, one must reject Popper’s third methodological rule prohibiting scientists from regarding theories as more than conjectures.
It is also unreasonable to follow Popper’s second methodological rule requiring scientists to seek falsifications only, and unflinchingly to discard falsified theories. If science consisted only of claims that are falsifiable but unfalsified, economic theory would be either an empirical failure or an unfalsifiable metaphysical theory. In his discussion of “the logic of the situation,” Popper seems inclined to regard economics as a metaphysical theory (Hands Reference Hands1985a). Even though one might still find economics to be useful metaphysics, the costs of such an interpretation are considerable. In such a view, there are no empirical discriminations to be drawn between mainstream economics and other approaches (unless, unlike mainstream economics, some of the other approaches actually qualified as scientific), nor could one discriminate which propositions of economic theory were better supported by the evidence.
The most prominent economic methodologists who have defended parts of the Popperian vision (Blaug 1980a, 1980b, 1985; Hutchison 1977, 1978, 1981, 1988; Klant 1984) have been unwilling to draw these drastic conclusions. They have instead argued only for the importance of criticism and testing. Such advice might appear harmless, like defenses of clean living and family values. But it may have distracted economists from the real difficulties that stand in the way of developing better tested theories.
Consider the allegations of Popperian methodologists such as Mark Blaug: (1) that economists rarely formulate their theories in ways which facilitate testing, (2) that they carry out few tests, and (3) that they pay little attention to negative results. Popperians and non-Popperians agree that one prominent feature of good science is a serious concern with testing and its results, even when they are unfavorable. Presumably economists know this much methodology.
Blaug’s comments on economics are no longer defensible as written. In contemporary economics, there is a great deal of empirical investigation – although of course, like other scientists, economists do not conform to Popper’s strictures. Blaug’s comments were far more justifiable in 1980 than they are today. Why was testing so unimportant in economics when Blaug was writing? Was there a prolonged lapse in scientific conduct, which can only be explained by sociological or institutional facts, or are there good reasons? Is there something about economic theories or the phenomena economists study that explains the scarcity of testing? One Popperian answer would be that economic theories are themselves untestable. But, as already argued, if the accusation is that economic theories are not by themselves logically falsifiable, then the accusation is true but trivial, for no interesting theories are falsifiable in this sense. If, on the other hand, the accusation is that economic theories cannot be combined with other statements to derive testable predictions, then it is false. Furthermore, even if it were true that economic theories are in some significant sense unfalsifiable, this would only push the explanatory question back one step. In asking why testability has so little grip in economics, one surely wants also to know why economic theories are untestable, if indeed they are.
So it seems that the only explanation for the apparent methodological failings of economics that the Popperian methodologist can give is Mark Blaug’s: that there has been too little methodological nerve. Toward the end of The Methodology of Economics, Blaug argues:
Mainstream neoclassical economists do not have the same excuse. They preach the importance of submitting theories to empirical tests, but they rarely live up to their declared methodological canons. Analytical elegance, economy of theoretical means, and the widest possible scope obtained by ever more heroic simplification have been too often prized above predictability and significance for policy questions. The working philosophy of science of modern economics may indeed be characterized as “innocuous falsificationism.”
On the contrary, I suggest that economists were so little involved with testing for two reasons. First, many were involved with nonempirical conceptual work, especially concerning general equilibrium theory, which was a far more vibrant area of economic research at that time (see §6.4). Second, it is difficult to formulate feasible and ethically permissible tests of economic predictions or to interpret the results of tests.
To test a theory requires not merely that one derive a testable prediction from the theory and a set of further statements. One must also have good reason to regard these further statements as unproblematic in the context. One cannot arbitrarily decide to treat them as part of unproblematic background knowledge. Testing requires knowledge and relatively simple phenomena, such as those created in experimentation, so that few auxiliary theories are needed to derive predictions. Facing a complex subject matter, lacking such knowledge, and believing themselves unable to experiment, economists could not effectively test their theories (see §15.3). In fact, there was a partial cure, as economists acquired better experimental techniques and discovered ways of simulating experiments when they could not be carried out. To take full advantage of these new possibilities has required methodological reform – particularly of the commitment to economics as a separate science (§7.6, §13.7, and Chapter 16) – but not better standards of theory assessment. Moral entreaty to be good scientists will not help, and it can even hurt, for it disguises the real problems. Popper’s philosophy of science does not permit economists to pose the central problems of theory appraisal in economics, and it does not help to resolve them.
12.6 Lakatos and Sophisticated Methodological Falsificationism
Imre Lakatos was a follower of Popper’s, but their views came into conflict shortly before Lakatos’ premature death. Lakatos’ writings on the philosophy of science date from the late 1960s and early 1970s. For a couple of decades in the late twentieth century, Lakatos had a tremendous influence on methodological thinking in economics, exceeded only by Popper’s.
Although Lakatos’ views are a brilliant modification of Popper’s, they fall prey to the same fundamental difficulties.Footnote 15 Lakatos grants many of the criticisms of Popper in this chapter, but he thinks them unfair, for Lakatos argues that Popper was moving toward a more sophisticated position to which the criticisms do not apply. Lakatos calls this new position “sophisticated methodological falsificationism.”Footnote 16 One can best grasp what sophisticated methodological falsificationism requires by contrasting its basic rules of scientific conduct with Popper’s:
1. Whereas Popper required that theories worthy of scientific attention possess a great deal of content (and thus be testable), the sophisticated methodological falsificationist requires that scientific theories possess excess content when compared to the “touchstone” theories that were previously entertained. The sophisticated methodological falsificationist requires of every new theory T′ that:
a. T′ must explain all the corroborated content of the previous theory T,
b. T′ must have additional implications or “excess content” compared to T, and
c. some of the “excess content” of T′ must be “novel predictions.”Footnote 17
A new theory T′ that satisfies these conditions shows “theoretical progress.”
2. Popper’s second rule required that scientists attempt to falsify theories by seeking harsh tests, which are made possible by accepting certain statements as unproblematic background knowledge for the purpose of testing. Theories that fail such tests must be rejected. The second rule of the sophisticated methodological falsificationist, in contrast, calls upon scientists:
a. to modify existing theories by proposing alternatives that make novel predictions,
b. to test the novel predictions of the proposed alternatives – if some of these are corroborated, the alternative shows “empirical progress” – and
c. to reject existing theories when some of the novel predictions of a proposed alternative are corroborated or, in some circumstances, when the alternative shows sufficient theoretical progress.
Rather than accepting theories that withstand attempts to falsify them as worthy of further critical discussion, never as certain, probable, or close to the truth, as Popper’s third rule requires, sophisticated methodological falsificationism shifts the object of appraisal to sequences of theories. Criticism is more difficult and more constructive than in Popper’s view. The methodological directives of sophisticated methodological falsificationism are also, Lakatos argues, less risky and more in accord with the history of science. When faced with an empirical anomaly – the falsification of a “test system” –
we do not have to decide which of the ingredients of the theory we regard as problematic and which ones we regard as unproblematic: we regard all ingredients as problematic in the light of the conflicting basic statement and try to replace all of them. If we succeed in replacing some ingredient in a “progressive” way (that is, the replacement has more corroborated empirical content than the original), we call it “falsified”.
Sophisticated methodological falsificationism is scarcely “falsificationist”: “the few crucial excess-verifying instances are decisive” (1970, p. 36); “the only relevant evidence is the evidence anticipated by a theory” (1970, p. 38). For Popper, surviving harsh testing makes a theory particularly testworthy, nothing more. Lakatos, in contrast, argues that increasing corroboration must be taken as evidence of increasing “verisimilitude,” or else science becomes a mere game. We have to recognize progress. We need an inductive principle which connects realist metaphysics with methodological appraisals, verisimilitude with corroboration. Lakatos would thus reinterpret the rules of the “scientific game” as a – conjectural – theory about the signs of the growth of knowledge, that is, about the signs of growing verisimilitude of our scientific theories (1974, p. 156).
Lakatos’ sophisticated methodological falsificationism differs from Popper’s presentation of falsificationism in two main ways. First, Lakatos shifts the focus from individual theories to series of theories. What distinguishes science from nonscience is not how scientists test and criticize individual theories but how scientists modify their theories. Second, Lakatos retreats from Popper’s repudiation of all inductive principles and insists that successfully following the rules of sophisticated methodological falsificationism permits the tentative conclusion that science is moving toward its epistemic goals.
12.7 The Appraisal of Scientific Research Programs
Lakatos maintains that sophisticated methodological falsificationism, which he attributes to Popper, needs further modification, because it leaves unexplained the continuity that persists across theory modifications. According to the rules of sophisticated methodological falsificationism, the shift from T to T′ would be theoretically progressive if T′ merely tacked on to T some unrelated bold conjecture. To offer a sensible appraisal of a series of theories, T1, T2, … , Tn requires an account of what links these theories and generates the theory modifications.
Research programs thus play an important part not only in Lakatos’ account of the global theory structure of science (see §7.2), but also in his account of appraisal within science. Lakatos stresses:
The idea of growth and the concept of empirical character are soldered into one.
We accept theories if they indicate growth in truth-content (“progressive problemshift”); we reject them if they do not (“degenerative problemshift”). This provides us with rules for acceptance and rejection even if we assume that all the theories we shall ever put forward will be false.
[T]he essence of science is growth: fast potential growth … and fast actual growth.
The crucial unit of appraisal is the research program. The heuristics of a research program are responsible for its progress, although luck, genius, and nature have roles to play too. Within research programs there are also appraisals of specific theory modifications, but the standards for these appraisals are largely determined by the heuristics of the research program. Appraisals of research programs are fundamental to the understanding of science.
Lakatos reinterprets the problem of demarcation as the problem of distinguishing between “scientific and pseudo-scientific adjustments, between rational and irrational changes of theory” (1970, p. 33). The “new, nonjustificationist criteria for appraising scientific theories” are “based on anti-adhocness.” The failures of theory modifications all involve ad hocness of some kind:
I distinguish three types of ad hoc auxiliary hypotheses: those which have no excess empirical content over their predecessor (“ad hoc1”), those which have such excess content but none of it is corroborated (“ad hoc2”) and finally those which are not ad hoc in these two senses but do not form an integral part of the positive heuristic (“ad hoc3”).
The essential feature of science is growth, and what distinguishes sciences is their autonomous and rapid growth. Lakatos’ views, like Popper’s, cut science off from ordinary inquiry, but Lakatos retains one thread linking science to human interests: the fallible metaphysical hypothesis that increasing corroboration is a sign of increasing verisimilitude.
It is in some regards unsurprising that Lakatos’ views were appealing to economists, because they are well suited to the defense of mainstream economic theory. Lakatos makes the heuristic power of a research program, in which equilibrium theory is rich, central to its assessment, greatly downplays the importance of refutations of individual theories, and dismisses criticisms of the central propositions as methodologically misguided. Thus one finds Lakatosian defenses of microeconomics in Latsis (1976) and Weintraub (1985a; 1985b; 1987; 1988).Footnote 18 On the other hand, Lakatos envisions scientists as focused on honing and elaborating theory, rather than as using theory to address specific questions, which has more and more come to characterize economics.
Lakatos’ work is a brilliant caricature that calls attention to features of science that others have overlooked. But his account of appraisal is unworkable and misconstrues scientific progress. It exaggerates the importance of growth, and its concessions to inductivism do not go far enough to meet the objections to Popper’s views presented in Section 12.4. Both in practical applications and in theoretical science, decisions to rely on particular claims depend on judgments of how well supported they are. The knowledge that T has not been falsified or that T′ represents progress over T is not enough.
Lakatos argues that the shift from theory T to theory T′ is not progressive unless T′ includes all the corroborated content of T and T′ makes novel predictions. Because these conditions are seldom met, one must either condemn virtually all of mainstream economics or seek some other account of success than Lakatos’. Theory shifts typically involve loss as well as gain of content. For example, some of the corroborated content of the law of diminishing marginal utility was lost in the shift to ordinal utility theory, which is nevertheless regarded by economists as an important step forward.Footnote 19 To insist that corroborated content must not be lost in theory modifications would block most theory modifications. The insistence on novel predictions is also inconsistent with the history of science, as Lakatos’ shift to progressively weaker senses of “novel predictions” reveals. Lakatos is right to point out that scientists are greatly impressed by successful novel predictions, but there is little support for the view that this is the only evidence they are concerned with (§A.7).
Like Popper, Lakatos believes that theoretical science can dispense with the notion of supporting evidence for theories or theory changes. In his view, it is a mistake to inquire, “yes, I can see that T′ is indeed much better than T, but how good is T′?” Lakatos recognizes that such questions may be unavoidable in practical life,Footnote 20 but Lakatos allows them no role in theoretical science. In this, Lakatos is still a follower of Popper’s, and his view of science inherits the central flaws of Popper’s view.
Unlike Popper, Lakatos responds explicitly to the objection that practical applications depend upon judgments of reliability. In his view, the knowledge needed for engineering emerges from theoretical science:
[W]e take the extant “body of science” and replace each refuted theory in it by a weaker unrefuted version. Thus we increase the putative verisimilitude of each theory, and turn the inconsistent body of scientific theories (accepted1 and accepted2) into a consistent body of accepted3 theories, which we may call, since they can be recommended for use in technology, the “body of technological theories.”
This account has three central features, all of which are questionable. First, like Popper, Lakatos sees theoretical science as autonomous. It has nothing to learn from engineering. Second, engineering knowledge derives from theoretical knowledge through weakening the bold claims of theory. Lakatos denies that engineering possesses any autonomy. There seems to be no such thing as engineering research. Third, there is a radical discontinuity in the methods of theoretical and applied science. In theoretical science, growth and heuristic power are everything, while in engineering, the concern is with reliability.
However, theoretical science needs technological knowledge both to build its experimental devices and to judge which claims are unlikely to be responsible for apparent refutations. Much of what we call “science” is devoted to determining nitty-gritty “facts,” such as the density and tensile strength of materials, the price elasticity of demand for commodities, or the toxicity of chemicals. It is unhelpful to have to classify this work as “mere” engineering. There is no radical discontinuity between the methodology evinced by work designed to provide technologically useful results and work in theoretical science.
Lakatos links scientific theory to human concerns only via a speculative connection between corroboration and verisimilitude. His account thus places heavy reliance on the concept of verisimilitude, which was undermined by work published at the time of Lakatos’ death.Footnote 21 Lakatos distinguishes the formal notion of verisimilitude from an intuitive notion of closeness to the truth:
“Verisimilitude” has two distinct meanings which must not be conflated. First it may be used to mean intuitive truthlikeness of the theory; in this sense, in my view, all scientific theories created by the human mind are equally unverisimilar and “occult.” Secondly, it may be used to mean a quasi-measure-theoretical difference between the true and false consequences of a theory which we can never know but certainly may guess.
But with the demise of the measure of this “quasi-measure-theoretical” difference, all that is left is the intuitive notion, with respect to which, in Lakatos’ view, there is no increase in verisimilitude with the growth of science. Moreover, even if the formal notion were defensible, it would not do the work that assessments of reliability need to do. For example, Lakatos argues that “thus we cannot grade our best available theories for reliability even tentatively, for they are our ultimate standards of the moment” (1968, p. 185, emphasis in original). In Lakatos’ view, there are no grounds to believe that physical theories are more trustworthy than economic theories or that it is a better bet that properly maintained bridges will last for another five years than that the rate of unemployment a year from now will be as predicted. This is to deny what Lakatos ought instead to be explaining. Philosophy might convince us that we are wrong in matters this fundamental to ordinary life, but it will take an awful lot of convincing.Footnote 22
Lakatos recognizes that actions are based on beliefs, that not all beliefs are equally reliable, and that science has something to tell us about the reliability of beliefs. But he does not want to permit questions of justification into theoretical science. The unsatisfactory result is the earlier account of “acceptability3.”
Both science and technology need a noncomparative assessment of the extent to which various claims are supported by evidence. Without any such “justificationism,” Lakatos’ science will be as inefficient as Popper’s or Friedman’s. To repeat an argument already made and to be repeated yet again (§13.1), scientists need to judge how well supported claims are in order to modify theories efficiently in the face of apparent disconfirmations. Lakatos would advise scientists to attempt to modify everythingFootnote 23 and to see which modifications are progressive (1970, pp. 40–1, 45). This is just like the mechanic who repairs a car by replacing its components one by one until it runs again. Such a brute force method may sometimes work, but it is typically a waste of time.
12.8 Further Comments on Induction, Falsification, and Verification
Let me return to the problem of induction, reactions to which are essential to the wrong turn that Popper and Lakatos take and to the differences between Popper and Lakatos that led to their intellectual parting. The issues are central not only to Popper’s and Lakatos’ philosophy but to the nature of science (see also §A.7).
Hume’s problem of induction follows from the combination of (1) his empiricism, which limits empirical evidence to reports of sensory experiences and which treats these reports as self-justified; and (2) his foundationalism, which stipulates that a statement is justified only if it is self-justified or follows from justified statements by means of a valid argument. Although I shall argue against foundationalism, it is a plausible view of justification. If everything needed justifying, then the process of justification could never get started, but according to most epistemologies, including empiricism, there is a base or foundation that is not itself in need of any further justification. Genuine justification relies on nothing that is not part of this foundation other than deductive logic.
Empiricism and foundationalism jointly create an insoluble problem of induction. There are no valid deductive arguments with general laws as conclusions and nothing but basic statements as premises. The correct reaction is neither to conclude, with Hume and Popper, that all generalizations are equally unsupported, nor to conclude, with Popper, that support is never needed, nor to conclude, with Lakatos, that induction is a metaphysical leap in the dark which serves only to give some point to the game of science. The proper reaction is to take seriously the piecemeal, nonfoundational justification of generalizations relative to what scientists regard provisionally as background knowledge. This piecemeal “internal” justification is what matters in both science and practice. As Isaac Levi has rightly stressed, justification plays its part in responding to specific challenges and in changing our knowledge (1980; compare Williams 1977 and Popper 1969a, p. 228). We need to justify a particular claim only when it is challenged or when we run into conflicts within our beliefs.
Popper aims in the right direction but overshoots the target, and Lakatos’ correction is too slight. Popper and Lakatos stress that human knowledge does not rest on any epistemically privileged foundations. Even basic statements are not certain. One decides to accept them, although such decisions may lead one astray. Decisions do not stop there. In Popper’s view, scientists advance beyond the uninformative logical falsification of a whole test system by deciding to take a large portion of the system as “background knowledge” and to attribute the error to the remaining part. In doing so they may blunder, but Popper believes that without doing so, nothing can be learned. In Lakatos’ view decisions are also unavoidable, although they are less haphazard, for they are determined by the heuristics of a research program.
Scientists need more than mere logical falsification, and, if they want science to grow efficiently, they cannot live with Lakatos’ sophisticated methodological falsificationism either. They need a rational basis for deciding which statements to take to be true in order to test others. Bayesian models are closer to the mark. Popper denies that the extent to which a hypothesis is “corroborated” by the data ever provides such a basis. In his view, one may decide to take claims to be part of background knowledge, but one never has good reason to believe that they are true, probable, or good approximations to the truth. Lakatos will allow a metaphysical presumption of verisimilitude, but he permits this metaphysical conjecture no methodological role within theoretical science and winds up, like Popper, proscribing the efficient use of knowledge in the acquisition of knowledge.
This question of whether scientists can rely on what they think they have established in learning more about the world goes to the heart of traditional discussions of economic methodology. As argued in Chapter 10, the mainstream view of justification in economics, as enunciated by Nassau Senior, John Stuart Mill, John Neville Keynes, or Lionel Robbins, maintained that economics explores the deductive consequences of well-established principles such as “agents prefer more commodities to fewer.” These deductions constitute reasons to accept their conclusions.
So, for example, the strongest argument for the hypothesis of rational expectations is not that it survives hard tests, but that it seems to follow from equilibrium theory, once one accepts the claim that knowledge is, from an economic perspective, a commodity like any other.Footnote 24 Consider the famous argument that John Muth offered:
I should like to suggest that expectations, since they are informed predictions of future events, are essentially the same as the predictions of the relevant economic theory …
If the prediction of the theory were substantially better than the expectations of the firms, then there would be opportunities for the “insider” to profit from the knowledge – by inventory speculation if possible, by operating a firm, or by selling a price forecasting service to the firms. The profit opportunities would no longer exist if the aggregate expectation of the firms is the same as the prediction of the theory.
This argument can be reformulated as deductively valid – that is, as an argument whose conclusion must be true if all its premises are. The conclusion is that, ceteris paribus, the expectations of firms “are essentially the same as the predictions of the relevant economic theory.” Some of the premises, such as that few firms run by economists make extraordinary profits or that the expectations of some firms coincide with the predictions of economic theory, are roughly reports of observations. But also involved are premises concerning the advantages of accurate predictions and the accuracy of the predictions of economic theory. These are not observation reports. Some of these premises are questionable, but the argument is still valid, and the premises are largely contained within the background knowledge of a mainstream economist. If scientists can make use of background knowledge – which must be the case, if there is to be any science – then there are valid arguments with accepted premises for the truth of general scientific conclusions.Footnote 25
If one surrenders foundationalism and countenances conventional falsifications and verifications, one can offer a partial solution to the problem of induction: conclusions that transcend observation can be defended by good arguments if one permits large parts of our presumptive knowledge to supply some of the premises. This “solution” turns crucially on reformulating the problem of induction and changing what one expects of justification. Hume would certainly cry “foul!” As a foundationalist, he would insist that the premises themselves must have a foundational justification. He would not permit economists to help themselves to the premises in the argument that are not reports of observations. But, if one rejects a foundationalist epistemology, there is no reason to insist that the only admissible premises are observation reports. In the repudiation of all justification, which is central in Popper’s and Lakatos’ philosophy of science, Popper slides backward toward the foundationalism that he rejects, while Lakatos steps away from assessing theories at all.
12.9 Concluding Remarks on Popperian and Lakatosian Methodologies
Popper’s and Lakatos’ slogans are in many cases consistent with the reasoned consensus within the philosophy of science. Empirical criticism is crucial to science, and scientific theories must, however indirectly, be open to empirical criticism. The most important evidence in support of scientific theories comes from hard tests and analogous explanatory achievements, not from adding up favorable instances. Scientific knowledge is corrigible, and scientists may be forced to surrender even the best-established theories. All of these Popperian theses may be used to criticize irresponsible proponents of unsupported theories.
Similarly, Lakatos is correct to emphasize the role of heuristics in the development of science, and he may be right to argue that heuristic power is important in theory assessment. One must take seriously the competition between different theories within research programs and the competition between research programs, and one should not lightly surrender a powerful scientific theory until one can find a better alternative. But once economists tie themselves to a philosophical system such as Popper’s or Lakatos’, they will be trapped with its unattractive aspects. A greater measure of philosophical agnosticism among economic methodologists is sensible and fortunately widespread.
Popper’s and Lakatos’ methodologies have dramatic flaws both from the perspective of knowledge acquisition and from the perspective of error avoidance. Popper’s decisions about what to regard as unproblematic background knowledge and Lakatos’ decisions about how to modify theories must depend on the evidence. Just as those who rely on theories to build bridges or to manage inflation want those theories to be well supported, so scientists want the theories they use to test other theories to be well supported. We need confirmations to decide which theories to use in practice and to decide which theories to rely on when testing others. And, as I have shown, if we can have falsifications, we can have confirmations too.
One might object that this critique of Popper and Lakatos is just semantics. Popper writes, for example, “the decision to ascribe the refutation of a theory to any particular part of it amounts, indeed, to the adoption of a hypothesis; and the risk involved is precisely the same” (1983, p. 189). Perhaps Popper is only denying that scientists can have foundational justifications for their claims. Similarly, Lakatos insists that corroboration can be taken as evidence of verisimilitude and emphasizes the importance of corroboration and of the acceptance of theories in the falsification of others.
But to regard Popper and Lakatos as granting the importance of supporting evidence in determining which claims theoretical scientists should rely on eviscerates their philosophies. If Popper conceded that there are nonfoundational justifications, he would have to surrender his central theses and his methodological rules. He would have to reject falsificationism as an apt label for his views, for he would have granted that science is devoted to seeking verifications as well as falsifications. Essential to Popper’s work has been not the platitude that scientists should be critical and take disconfirming evidence seriously, but the striking thesis that there is nothing to scientific rationality except conjecture, evidentially unsupported methodological decision, and refutation. The rejection of “justificationism” in theoretical science is just as essential to Lakatos’ vision. Lakatos insists that heuristic power and empirically progressive theory changes are all that scientists should be concerned with. I doubt that an enterprise that functioned according to either Popper’s or Lakatos’ methodology could exist. It would be a poor tool for acquiring knowledge and inefficient in practice.
Chapters 11 and 12 canvassed the main twentieth-century alternatives to the method a priori. None shows how economists can rationally commit themselves to a highly inexact science such as economics. Each of the alternatives runs into internal philosophical difficulties, and each of them, except Koopmans’, implies drastic changes in methodological practice.
Perhaps methodological practice in economics was due for a major overhaul. Many writers on methodology, including myself in the first edition, called for a greater engagement with the details of actual economic relations among individuals and firms. Let us look again at Mill’s deductive method and the criticisms it faced to see what is defensible and what is mistaken.
As I said before, Mill’s inexact deductive method has been subject to logical, methodological, and practical criticisms:
1. The logical criticism, which one finds in Hutchison and to some extent in Samuelson, maintains that inexact (ceteris paribus) laws are scientifically illegitimate, because they are meaningless or unfalsifiable. But the arguments in Chapters 9 and 10 show that qualified claims are not meaningless or untestable and, as argued in Section 12.2, no interesting scientific claims are logically falsifiable.
2. The methodological criticism of Mill’s inexact deductive method, which one finds in Popperians such as Mark Blaug, is that it is too dogmatic: it rules out the possibility of disconfirming the basic “laws.” Adhering to Mill’s inexact deductive method thus, it is alleged, impedes the progress of economics and leads to ad hoc responses to apparent disconfirmation characteristic of a degenerating research program. I accept this criticism of the method, but not generally of economists, who, despite appearances, rarely adhere to it.Footnote 1
3. Furthermore, methodological vice is alleged to lead to practical impotence by authors such as Hutchison and, from a different perspective, Friedman. Even if the inexact laws and the other statements needed to deduce a prediction are true, the unspecified ceteris paribus clauses mean that the prediction follows only if there are no interferences. Since these ceteris paribus qualifications are vague, it is hard to know when they are satisfied. If economists do not know when they are satisfied, then economic theory is of little use in practice. For practical purposes, economists need to know what will happen if a policy is instituted, not what would happen if there were no disturbing causes. Even if Milton Friedman’s views are mistaken, at least he is concerned about when theories actually work. Does not the reliance on the deductive method render economics useless?
Although the methodological rules of Mill’s inexact method a priori, as summarized at the end of Chapter 10, cannot be defended as stated, I shall nevertheless defend (for the most part) the existing practices of theory assessment among economists. These practices appear to follow Mill’s inexact deductive method, but they are, I contend, also consistent with standard methods of theory appraisal in the special circumstances with which economists have to cope. Although apparent Millians in practice, economists can be good Bayesians or hypothetico-deductivists in principle.Footnote 2 After demonstrating this possibility in Sections 13.1 and 13.2, I sketch the method of theory appraisal economists typically employ, and I discuss the large and legitimate role of pragmatic factors in economic theory choice. Only then does this chapter discuss the practical objection. The chapter concludes by pointing to the real source of dogmatism in economics.
13.1 Apparent Dogmatism and the Weak-Link Principle
To maintain that economists should never attribute apparent disconfirmations to shortcomings in their theories is unjustifiable. To this extent, the critics of the deductive method are correct. To follow such a rule would preclude theoretical and empirical progress. Such a rule is objectionably dogmatic.
It looks as if economists are adhering to this rule, but, given the tasks and difficulties economists face, widely accepted theories of confirmation, such as the Bayesian and hypothetico-deductivist views, recommend confirmational practice that is largely indistinguishable from what Mill’s inexact method a priori recommends. The methods of theory appraisal economists employ may be defensible, even though the method a priori is indefensible, and economists appear to conform to it.
It is not unacceptably dogmatic to refuse to find disconfirmation of economic “laws” in typical failures of their market predictions. When the anomalies are those cast up by largely uncontrolled observation of complicated market phenomena, it may be more rational to pin the blame on some of the many disturbing causes, which are always present. Since the confidence of economists in the simplifications and ceteris paribus assumptions necessary to apply economic theory to actual market phenomena will generally be much lower than their confidence in the basic laws, the more likely explanation for the apparent disconfirmation will usually be a failure of the simplifications and ceteris paribus assumptions. In consequence, little can be learned about the purported laws from such observations, but the failure will lie in the difficulties of the task, not in methodological mistake. The possibility of discovering errors in the “laws” of equilibrium theory may be foreclosed by the inadequacies in the data and limitations in economic knowledge, not by unjustifiable methodological fiat.
In responding this way to apparent disconfirmation, economists are implicitly relying on what I call “the weak-link principle”:Footnote 3
The weak-link principle: when a false conclusion depends on multiple premises, attribute the mistake to the most questionable of the premises.
This is but one of many principles that economists might use in modifying a model so that it conforms to some observation and serves as a reliable predictive tool. Scientists and nonscientists alike use the weak-link principle, and (pace Popper) it is rationally justifiable to do so. If either q or r is false and the subjective probability of q is less than that of r, then (other things being equal) it is more likely that q is false than that r is. However, the weak-link principle is neither inviolable nor always appropriate. Details concerning the model’s failures might point to a different premise as the culprit.
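Put a little more formally (a gloss of my own, not in the original text, where q and r stand for the two premises and the evidence establishes only that at least one of them is false):

```latex
\Pr(\neg q \mid \neg q \vee \neg r) \;=\; \frac{\Pr(\neg q)}{\Pr(\neg q \vee \neg r)}
\;>\; \frac{\Pr(\neg r)}{\Pr(\neg q \vee \neg r)} \;=\; \Pr(\neg r \mid \neg q \vee \neg r)
\qquad \text{whenever } \Pr(q) < \Pr(r).
```

Since the falsity of q entails that at least one premise is false, the numerators are just the prior probabilities of error in each premise, and learning that something has gone wrong shifts suspicion toward whichever premise was less probable to begin with.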
Since the simplifications and ceteris paribus clauses needed to derive predictions concerning uncontrolled market phenomena from equilibrium theory are the weak links, mistaken predictions rarely disconfirm the theory. Hence, one can see why Mill’s views seemed so plausible and were so easily refuted, and yet why methodological practice apparently continues to conform to them.
Given their subject matter, economists are bound to look like followers of Mill’s inexact deductive method. Powerful tests require either experimentation, with its possibilities of intervention and control, a great deal of knowledge, or good fortune in finding natural experiments, and without such tests (or superior alternatives) it would be irrational to react to apparent disconfirmations by surrendering credible hypotheses with great pragmatic attractions. If economists could do experiments easily, then they could control for disturbances and avoid the complexity of the phenomena with which they are presented nonexperimentally. If they knew enough, they could exert much the same control even if experiments are not possible. If they were blessed with a comparatively simple set of phenomena such as those of celestial motion, then neither the inability to experiment nor the paucity of their knowledge would be crippling. But the combination of these handicaps makes knowledge of economic phenomena hard to garner.
Limitations in the ability to test can make the basic “laws” of economics de facto unfalsifiable, even if economists were explicitly employing a Bayesian account of confirmation (§§10.1 and A.7). Let H be either a “law” of equilibrium theory or a conjunction of such laws and A be the conjunction of all the other statements needed to derive a prediction, p, from H. The prior probability of H, Pr(H), is much larger than the prior probability of A, Pr(A). In the case of uncontrolled market predictions, Pr(A) will be tiny. For each of the simplifications and ceteris paribus qualifications is improbable, and the probability of the conjunction will be much smaller than that of the separate conjuncts. If p is deducible from H and A, then Pr(p | H & A) = 1, but that does not imply that Pr(p | H) is large. Given how weakly evidence concerning p bears on H, the credible “laws” with which economists begin will be de facto nonfalsifiable.
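To illustrate with made-up numbers (the figures are mine, purely for illustration, and not drawn from the text): suppose H and A are probabilistically independent, Pr(H) = 0.9, Pr(A) = 0.05, Pr(p | H & A) = 1, and the prediction has a base-rate probability of 0.3 of coming out true when H & A is false. If the prediction then fails, Bayes’ theorem gives

```latex
\Pr(H \mid \neg p) \;=\; \frac{\Pr(\neg p \mid H)\,\Pr(H)}{\Pr(\neg p)}
\;=\; \frac{(0.95)(0.7)(0.9)}{(0.95)(0.7)(0.9) + (0.7)(0.1)} \;\approx\; 0.90,
\qquad
\Pr(A \mid \neg p) \;\approx\; 0.005 .
```

The failed prediction leaves the “law” essentially where it started while nearly demolishing the auxiliary assumptions – which is just what the weak-link principle and the de facto nonfalsifiability claim describe.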
13.2 Are Economists Too Dogmatic?
The deficiencies of market data coupled with the weak-link principle will mimic Mill’s inexact method a priori only if economists judge the “laws” of equilibrium theory to be much more probable than the simplifications and ceteris paribus claims that are needed to test them. Given the empirical problems with those “laws,” can such a judgment be defended? And, if it cannot be defended, then are not economists as unjustifiably dogmatic as critics of the method a priori have alleged?
Few economists regard the “laws” of equilibrium theory as proven or obvious truths, although they may take them for granted when trying to predict the outcome of some proposed policy. Only some fancy philosophical footwork permits one to regard these “laws” as true (see Chapter 9), and it is questionable whether they can be regarded as well established (see Chapter 10). Why then do economists take these behavioral postulates for granted? Why do they make use of them in the face of their apparent disconfirmation? Is it introspection, as Mill maintains, or everyday experience, as in Robbins’ view, or are these assumptions implicit in the very concept of action, as has been maintained by Austrian theorists such as von Mises (1978, p. 8)?
A full answer would take us back to the discussion of the theoretical strategy of economics in Chapter 7 or forward to the conclusion of this chapter. Everyday experience and introspection are sufficient to establish that some of these laws, such as diminishing marginal rates of substitution and diminishing returns, are reasonable approximations. Without qualifications and a margin of error, they are false, but with these they are unlikely to lead one astray; and economists have good reason to be committed to them.
Furthermore, each of the laws of equilibrium theory possesses pragmatic virtues, for each plays an important role in making the theory mathematically tractable, consistent, and determinate. Indeed, this is about the only virtue of the postulate of constant returns to scale, to which economists are not nearly as committed. Constant returns to scale figures in many economic theories for essentially mathematical reasons, and because it is hoped that its falsity does not do much harm.
Claims such as acquisitiveness and profit maximization are not such good approximations to the truth as are diminishing returns or diminishing marginal rates of substitution, but neither are they as poorly established as constant returns to scale. And what is the alternative? There is a great deal of truth to them, and their virtues in permitting determinate mathematical formulations are considerable. Firms pursue all sorts of objectives besides profits, and a usable theory that heeds these facts should be more accurate. But the accuracy would be purchased at the cost of complexity, and complications could destroy the normative force equilibrium theory has when it is coupled with minimal benevolence. In such circumstances, pragmatic factors may justifiably be more than empirical tie-breakers. If the empirical benefits of a theory change are small – that is, if (1) a slightly more accurate theory does not serve the often practical purposes of economists appreciably better and (2) economic theorists do not believe that an appreciably more exact or useful economic theory is feasible – then the pragmatic virtues of current theory may be decisive. It may be more sensible to treat other objectives of managers and other behavioral generalizations concerning individuals as disturbing causes that may usually be ignored, even if they are important in particular contexts.Footnote 4
Thus, one finds a combined empirical and pragmatic basis for refusing to regard the basic propositions of equilibrium theory as disconfirmed. Although economists are not necessarily unjustifiably dogmatic, there is a serious risk that they become so entranced by their models that they overlook anomalies and do not consider alternatives. As Mill so presciently remarked (though only when criticizing his father, not his own work):
We either ought not to pretend to scientific forms, or we ought to study all the determining agencies equally, and endeavour, so far as it can be done, to include all of them within the pale of the science; else we shall infallibly bestow a disproportionate attention upon those which our theory takes into account, while we misestimate the rest, and probably underrate their importance.
In my view, the dogmatism of economists, such as it is, lies in an overly complacent commitment to equilibrium theory and to the theoretical strategy underlying it, not in a mistaken view of theory appraisal.
Because economists have reasonable grounds for judging their basic explanatory and predictive generalizations to be less open to revision than are the simplifications and ceteris paribus claims that are also needed to derive conclusions about market phenomena, they may behave in the way that Mill’s inexact deductive method recommends, without being committed to a dogmatic view of theory appraisal. The apparent dogmatism may be just the result of the good fortune of beginning with a set of plausible generalizations coupled with the bad luck of being unable to perform good tests. Consider these remarks of Robert Lucas (1980, pp. 710–11):
How is confidence [in the components of models] of this sort earned? This is a question on the answer to which economists are fairly well agreed, yet I cannot recall where I have seen the nature of this agreement articulated. The central idea is that individual responses can be documented relatively cheaply, occasionally by direct experimentation, but more commonly by means of the vast number of well-documented instances of individual reactions to well-specified environmental changes made available “naturally” via censuses, panels, other surveys, and the (inappropriately maligned as “casual empiricism”) method of keeping one’s eyes open …
Notice that, having specified the rules by which interaction occurs in detail, and in a way that introduces no free parameters, the ability to predict individual behavior is nonexperimentally transformed into the ability to predict group behavior.
How can one tell whether economists are committed to Mill’s inexact method a priori or whether they are good Bayesians or hypothetico-deductivists doing the best they can in the face of poor data coupled with informal sources of knowledge? If the only data economists could gather were the results of uncontrolled observations of markets, then we might not be able to find out what the methodological commitments of economists were. But experimentation in economics has never been impossible, and in some experiments going back to the middle of the twentieth century, the auxiliary assumptions – the additional premises needed to derive predictions from equilibrium theory – have been sufficiently strong links that the experimental results could actually disconfirm the theory. By examining how economists have responded in such cases, one can determine whether they are proponents of a dogmatic theory of confirmation or whether their apparent dogmatism in nonexperimental circumstances is a rational response to weak evidence.
13.3 Expected Utility Theory and Its Anomalies
Although I cannot offer a comprehensive survey of how economists have reacted to well-established anomalies, I offer some examples here and in Chapters 14 and 15. Expected utility theory (§1.3) has many of the same axioms as equilibrium theory (including completeness, an axiom I focus on). It is, like parts of equilibrium theory, a theory of rationality, and it is accepted on the same sort of grounds as equilibrium theory. But unlike equilibrium theory it is readily testable. Consequently, it is easier to consider how apparent disconfirmations bear on expected utility theory.
The case for expected utility theory, as for equilibrium theory, seems to rest upon an application of Mill’s inexact method a priori. Consider the following remarks of Daniel Ellsberg:
However, this proposition [that individuals have cardinal expected utility functions], which we will call the Hypothesis on Moral Expectations, has little inherent plausibility. The major feat of von Neumann and Morgenstern is to show that the Hypothesis on Moral Expectations is logically equivalent to the hypothesis that the behavior of given individuals satisfies certain axiomatic restrictions. Since the axioms appear, at first glance, highly “reasonable,” the second hypothesis seems far more intuitively appealing than the equivalent Hypothesis on Moral Expectations. It is thus more likely to be accepted on the basis of casual observation and introspection, although the two hypotheses would both be contradicted by exactly the same observations.
Ellsberg is pointing out that economists sometimes accept theories, such as expected utility theory, because the axioms appear “reasonable.” The credibility of the axioms is largely prior to any testing of predictions derived from the theory.
One cannot regard the axioms of expected utility theory as proven scientific truths, though one may say on their behalf (1) that they are, as Ellsberg notes, “reasonable”; (2) that there is some experimental evidence that confirms them; and (3) that, if people do not conform to the axioms of expected utility theory, then, contrary to observation, they may make fools of themselves.Footnote 5 These grounds provide the axioms with some credibility, and (via the weak-link principle) they provide the theory with an ability to withstand casual falsifications.
Unlike equilibrium theory, expected utility theory is readily testable, and the assumptions necessary to derive predictions from the theory need not always be weak links. Psychologists and decision theorists have shown that human behavior sometimes does not conform to the “laws” of expected utility theory. Some of these anomalies can be explained as the consequence of nonrational disturbing causes – such as the result of minor peculiarities in how people process information or of people’s failure to take small differences in probabilities seriously.Footnote 6 Others are more troubling. Let us see how economists and decision theorists do and should deal with some of these.
13.3.1 The Allais Problem
In the early 1950s, Maurice Allais formulated the problem shown in Table 13.1.Footnote 7 A ball is drawn from an urn containing one red ball, eighty-nine white balls, and ten blue balls. So the probabilities are known. Depending on the color and the choice of A or B in problem I or of C or D in problem II, one receives one of the prizes in the table.
Table 13.1 The Allais problem (pay-offs by ball color)

| Problem | Choice | Red (1) | White (89) | Blue (10) |
|---|---|---|---|---|
| I | A | $1 million | $1 million | $1 million |
| | B | $0 | $1 million | $5 million |
| II | C | $1 million | $0 | $1 million |
| | D | $0 | $0 | $5 million |
Many people are inclined to prefer option A to option B in problem I and to prefer option D to option C in problem II. Even the Bayesian statistician, Leonard Savage, was at first so inclined (Savage 1972, p. 103). If these choices reflect preferences, then they violate the independence principle, for the only difference between the choice pairs is in the magnitude of the pay-off if a white ball is drawn, which should be irrelevant to the choices because it does not depend on whether A or B in problem I or C or D in problem II is selected. Thus, A should be preferred to B if and only if C is preferred to D.Footnote 8 Yet many individuals are unpersuaded. In one view:
In Situation 1 [problem I], I have a choice between $1,000,000 for certain and a gamble where I might end up with nothing. Why gamble? The small probability of missing the chance of a lifetime to become rich seems very unattractive to me.

In Situation 2, there is a good chance that I will end up with nothing no matter what I do. The change [sic] of getting $5,000,000 is almost as good as getting $1,000,000 so I might as well go for the $5,000,000 and choose Gamble 4 [D] over Gamble 3 [C].
If expected utility theory is a correct normative theory of rationality, then this reasoning must be fallacious or irrational.Footnote 9
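A short derivation (my own gloss, not quoted from the text) makes the conflict explicit. Writing u for the agent’s utility function and using the urn probabilities 0.01, 0.89, and 0.10 for red, white, and blue, the common pattern of choices requires

```latex
EU(A) > EU(B) \;\Longleftrightarrow\; 0.11\,u(\$1\mathrm{M}) \;>\; 0.01\,u(\$0) + 0.10\,u(\$5\mathrm{M}), \\
EU(D) > EU(C) \;\Longleftrightarrow\; 0.01\,u(\$0) + 0.10\,u(\$5\mathrm{M}) \;>\; 0.11\,u(\$1\mathrm{M}).
```

No assignment of utilities satisfies both inequalities, so whoever chooses A in problem I and D in problem II cannot be maximizing expected utility, whatever his or her attitude toward money.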
Allais devised the example as a criticism of the normative adequacy of expected utility theory, not as an empirical refutation. So, even if people do stubbornly choose A in problem I and D in problem II, one can still ask whether these choices are evidence against subjective expected utility theory or evidence of human irrationality. The latter view would be supported if one could find some obvious sign of irrationality, but there is none. If many people are inclined to choose A over B and D over C, and a variety of thoughtful decision theorists, such as Allais himself, Levi (1986), or Sugden (1986), are prepared to defend the rationality of choosing A and D, then there are significant grounds for questioning the independence condition as a normative condition of rationality.
If one is concerned with independence as a generalization about how people actually choose, then it might seem that it does not matter why people make these choices. The fact that they do is sufficient to show that they do not act in accordance with expected utility theory. But predictions do not follow from expected utility theory all by itself, and the diagnosis of the reasoning responsible for the anomalous decisions may still be important. Since we believe that people’s choices are influenced by their reasoning and that fallacious reasoning is unstable, such a diagnosis remains of the utmost importance. For paradoxes such as Allais’ call for a fundamental modification of expected utility theory as an inexact positive theory of choice behavior only if the choices cannot reasonably be attributed to disturbing causes of secondary importance. There is another regard in which examples such as these may have less force in an empirical critique of expected utility theory than in a challenge to its normative adequacy, for it might be objected that choice problems such as these are unusual and unimportant.
Are people’s choices in the Allais problem evidence against subjective expected utility theory, or do they show that there is some disturbing cause? There is little evidence of irrationality, but the interference might instead be some further rational factor. One might want to supplement subjective expected utility theory with some further rational tendency counteracting the independence principle. Expected utility theory is falsified only if such a hypothesis is inferior to one which denies rather than merely qualifies one or more of its “laws.” What makes this issue more tractable than those raised by the myriad apparent “falsifications” of equilibrium theory revealed by market data is the possibility of experimentation. Instead of an impenetrable mess in which one can do no better than to hold on to what is independently plausible, one has a partially penetrable mess from which one may learn how to correct or improve what one begins with.
13.3.2 Qualification versus Disconfirmation
One important effect of mitigating the empirical difficulties that stand in the way of testing inexact claims is to make the conceptual difficulties clearer. To patch up an apparently disconfirmed model by changing an auxiliary hypothesis or citing a disturbing cause (whether rational or nonrational) is to change one’s model or applied theory in response to disconfirming evidence. The new applied theory has different empirical consequences than the old. Hence it is wrong to say that those who cite interferences to explain away unfavorable evidence ignore disconfirmations. Perhaps they do not react correctly, but they do react.
Disturbing causes, like all causes, have their (inexact) laws, and to explain away a disconfirmation by citing an interference may not be purely ad hoc, at least in Lakatos’ first two senses (1970, p. 112n). The disturbing cause cited is to be expected in similar circumstances, and the modification has nonvacuous empirical content – although the complexity of the phenomena may make testing impossible. The more general the disturbing cause, the more contentful and less ad hoc is the hypothesis that cites it.
Once one has largely ruled out failures of rationality, the question “does the Allais paradox reveal mistakes in expected utility theory, or does it merely reveal a mistake in some simplification or the influence of some disturbing cause?” turns out to be less straightforward than it might appear. The right question in a well-controlled experimental context is not “is the theory disconfirmed or is there an interference?” but “what should one do about this disconfirmation? Should one add a qualification to the model or narrow the independent specification of its scope (which might in many contexts harmlessly be ignored), or should one revise the model in some more fundamental way?” One cannot draw any sharp line between qualifications and modifications, but one does not need to do so either. In both cases, empirical evidence exerts some control over theory change. The difference is pragmatic: qualifications can often be dropped while modifications leave a permanent mark. The significant question is whether theorists can, for particular purposes in a particular context, ignore the necessary changes and employ the original model.
Another way of grasping the issue would be to ask how economists are supposed to know, in Mill’s terminology, that equilibrium theory has captured the “greater” causes of economic phenomena.Footnote 10 Introspection provides evidence that acquisitiveness is a significant causal factor affecting economic phenomena, but introspection does not give economists good reason to believe that acquisitiveness is a more important cause of economic behavior than, for example, the attitudes toward risk that seem to influence choices in the Allais paradox.
How can economists decide whether a disturbing cause is “major” or “minor” and whether they may justifiably regard expected utility theory as capturing the “greater causes” of choice behavior? The quantitative statistical question “how much of the variation in some dependent variable is due (in the actual complicated circumstances) to a particular independent variable?” is subject to fallible statistical investigation. But Mill’s concern in distinguishing major and minor causes is not simply quantitative. “Major” causes are fundamental and have universal scope, while minor causes are more superficial and have narrower scope. The decision whether to deal with an empirical anomaly by changing one’s model or by citing a disturbing cause is tantamount to the decision of whether to treat the factors mentioned by one’s current model and only those factors as the “major” causes. If expected utility theory leaves out a major cause that is responsible for the Allais paradox, then a serious theory change is called for. If it encompasses all the major causes, then anomalies such as the Allais paradox only call for qualifications, which for many purposes can be ignored. The decision depends on both pragmatic and empirical factors.
In its pragmatic aspect this question demands that economists be clear about both practical and theoretical employments and aspirations for the model. What do they want the model for and what sort of theoretical grasp of the subject matter do they think is possible?Footnote 11 Although this way of thinking is most congenial to instrumentalists, it carries no instrumentalist commitments. Realists can also think about the cognitive jobs they want particular models to do and how well they think such jobs can be done. Some of the pragmatic virtues of the axioms of expected utility theory – of the “laws” of equilibrium theory – have already been mentioned: they lead to a mathematically tractable and determinate theory. But there are other pragmatic virtues, to which I return shortly, which are related to the fact that these are also theories of rationality.
The decision whether to qualify or to modify also hinges on the empirical scope, frequency, and distribution of the apparent disconfirmations experimenters have uncovered. If, for example, the disconfirmations are not very important in the domain that is of the greatest theoretical and practical importance, and economists do not believe that a much better theory is likely to be found (which obviously will depend on what alternatives have been suggested), then it would be reasonable to account for the disconfirmations in terms of “interferences.” If, on the other hand, the qualifications need to be invoked often and economists believe that considerably more exactness is possible, then it would be more reasonable to seek to modify the model fundamentally.
The presence and promise of alternatives also influence theory choices. It is fair to say (following to some extent Lakatos’ views) that what converts anomalies or difficulties such as Allais’ paradox into disconfirming evidence demanding fundamental theory modification is the formulation of alternatives, which accommodate such anomalies within a theory that can do the job done by expected utility theory.
At this point a distinctive element enters the picture about which I have more to say in Section 16.3. For one job that expected utility theory, like ordinal utility theory, does is provide a theory of rationality. Should the fact that utility theory is a theory of rationality affect its empirical appraisal?
The suggestion may seem ludicrous. To argue that utility theory is a good theory of how people actually behave because it is also a plausible approximate theory of how they ought to behave seems like the argument that the moral judgment that people ought not to cheat on their taxes implies that people do not in fact do so. The argument seems to presuppose what is in question, which is whether people behave as (according to this theory) they ought rationally to behave.
This response does not settle the matter. Irrationality can be costly, and the costs of irrational behavior may make it unstable. Although people’s behavior diverges from that predicted by expected utility theory, it may be that there can be no better general theory precisely because of the instability of these divergences. Furthermore, people’s behavior is influenced by its theoretical description. For example, it is plausible that the arguments of economists concerning the advantages of index funds over actively managed mutual funds are a large part of the explanation for why index funds have captured such a large portion of individual investment in equities. There is evidence that students who learn economics learn to conform more closely to utility theory.Footnote 12 The defense of utility theory as a first approximation may be self-supporting, while espousing nonrational theories of choice may be self-defeating. The fact that utility theory is a theory of rationality seems to provide some grounds to believe that it is a correct approximation to how people actually choose.
The fact that utility theory is a theory of rationality may provide pragmatic reasons not to give it up and to attribute anomalies to interferences. There are two pragmatic considerations here. First, the fact that utility theory is a theory of rationality permits explanations in economics to be reason-giving explanations in addition to causal explanations (see §16.3 and §A.9). Explanations in economics justify as well as explain choices, and they consequently depend on the factors that economic agents focus on and find of interest. A different sort of explanation might be more successful empirically, but the costs of severing the links between economics and the concerns of economic agents are significant, and they give economists reason to favor a positive theory of choice that is, like the current theory, also a theory of rational choice.
In addition, rather than changing their theories to conform to how people behave, perhaps economists should try to change the behavior. Those who are unclear on what rationality requires or who are lazy or ineffective in their efforts to conform need reeducation. This educative function of expected utility theory provides a pragmatic reason for accepting it, unless a competitor is much better supported by the evidence or better able to guide choice. I am not proposing that theorists pretend that people behave according to expected utility theory even when they do not do so. But the educative function of a theory of choice gives economists reason to describe the divergences as lapses or interferences and to retain expected utility theory (or some alternative with similar normative force) rather than opting for a nonnormative alternative. This reason may not be decisive, for the empirical advantages of a nonnormative alternative might be overwhelming. But such pragmatic grounds are neither trivial nor irrational.Footnote 13
The similarities between the complex mixture of empirical and pragmatic elements one finds in defenses of the use of equilibrium models and Mill’s inexact method a priori are superficial. What drives most economists to regard interferences as minor disturbing causes is not the manifest truth of the basic axioms, nor any methodological rule prohibiting revisions of them, but the nature of the disconfirmations coupled with the pragmatic attractions of accepted theory.
13.3.3 Incomplete Preferences: Levi’s Alternative
An argument for an alternative to expected utility theory illustrates how this complicated process of theory assessment might work. There are several alternatives to expected utility theory which purport to inherit many of its normative and predictive virtues and to accommodate anomalous examples, of which Allais’ problem is but one instance: regret theory, theories which surrender the independence principle, such as Edward McClennen’s (Reference McClennen, Stigum and Wenstop1983) and Mark Machina’s (Reference Machina1987), theories which surrender completeness, such as Isaac Levi’s (Reference Levi1980), theories which surrender independence and completeness, such as Edward McClennen’s (Reference McClennen1990), and theories that surrender context independence, like the prospect theory of Tversky and Kahneman (Reference Tversky and Kahneman1981), about which I say more later.
In this section, I discuss only one of these – Levi’s proposal – which brings out the methodological points clearly. Levi is a philosopher (one of my teachers), and his views, unlike Machina’s, for example, have little following within economics. However, his work illustrates the issues that arise concerning the structure of rational criticism and theory change. For a case study concerning how mainstream economists behave in the face of apparently disconfirming evidence, see Chapter 14.
As discussed in Section 1.1.2, one questionable axiom of both ordinal utility theory and expected utility theory is completeness or comparability – that among any two options x and y, a rational agent will either prefer x to y or y to x, or the agent will be indifferent. The axiom appears to be false: individuals are often unable to rank alternatives. The related claim that agents can and should form precise subjective probability judgments, which is required by completeness of preferences over gambles, is similarly dubious.
The standard defense of completeness assumes that choice demonstrates preference. What one chooses is what one prefers. But the standard defense gives rise to spurious intransitivities. The only remaining ground upon which to accept completeness is that it is a reasonable approximation or a harmless idealization that permits the development of a simple and systematic theory of rationality. Levi argues that paradoxes such as Allais’ – as well as a pragmatic perspective on inquiry – suggest that assuming completeness is not harmless.
In Levi’s view, people are often unable to rank options with respect to expected utility, owing to indeterminacies in their utility functions or in their probability judgments. Given this inability, Levi maintains that they ought to suspend judgment, as indeed people often do, rather than making arbitrary presumptions. After screening out those options that are unambiguously inferior with respect to expected utility for any admissible utility function or probability judgment, agents should choose on the basis of secondary criteria such as security. Consider the Allais problem again. The sure $1 million in the first pair of gambles obviously beats its risky alternative on this criterion of security. It is, however, less obvious that one of the options in the second pair is more secure than the other, although Levi argues that it is.Footnote 14
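To make the two-stage structure of Levi’s proposal concrete, here is a minimal sketch. The options, the payoffs, and the set of admissible probability judgments are invented illustrations of mine, not Levi’s examples; the sketch captures only the screen-then-apply-security logic described above, not Levi’s formal apparatus.

```python
# A minimal sketch of a Levi-style two-stage choice rule with indeterminate
# probabilities. The options, payoffs, and admissible probability judgments
# below are invented for illustration.

def expected_value(lottery, probs):
    """Expected payoff of a lottery under one admissible probability judgment."""
    return sum(probs[event] * payoff for event, payoff in lottery.items())

def screen(options, admissible_probs):
    """First stage: keep any option that maximizes expected payoff under at
    least one admissible probability judgment; screen out the rest."""
    survivors = set()
    for probs in admissible_probs:
        best = max(options, key=lambda name: expected_value(options[name], probs))
        survivors.add(best)
    return survivors

def choose_by_security(options, survivors):
    """Second stage: among the survivors, pick the option whose worst
    possible payoff is highest (a security criterion)."""
    return max(survivors, key=lambda name: min(options[name].values()))

# Two gambles over the events 'up' and 'down' (hypothetical payoffs).
options = {
    "safe":  {"up": 100, "down": 100},
    "risky": {"up": 500, "down": -300},
}
# The agent cannot settle on a single probability for 'up'; both 0.3 and 0.7
# are admissible judgments.
admissible_probs = [{"up": p, "down": 1 - p} for p in (0.3, 0.7)]

survivors = screen(options, admissible_probs)   # both options survive the first stage
print(choose_by_security(options, survivors))   # security then favors 'safe'
```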
Levi argues that his account of rational choice, which surrenders ordering (and hence completeness) and sharply distinguishes preference and choice, accounts for a wide range of choice behavior that conflicts with expected utility theory and in which (as in the Allais paradox) many subjects refuse to see the error of their ways. Permitting indeterminacies in probability judgments and utilities is not ad hoc, but is required by a pragmatic theory of inquiry that takes ignorance seriously, and the theory that results is neither normatively nor empirically empty (Seidenfeld et al. Reference Seidenfeld, Schervish and Kadane1987). If these claims are defensible, then Levi has presented a strong case disconfirming completeness.
Levi’s alternative cannot be classified unambiguously as a fundamental theory change, although this is clearly how Levi would classify it (compare Kaplan Reference Kaplan1989). Since expected utility theory is preserved as a special case within Levi’s theory, when one has precise preferences and precise probability judgments, one might plausibly argue that Levi is offering a theory that includes more causal factors and thus supplements rather than replaces expected utility theory. Yet Levi believes that circumstances in which agents can be treated as if they had precise preferences and precise probability judgments are exceptional and that failures of completeness should not be treated as unusual complications.Footnote 15
There was never any question of an empirical proof of completeness. At best it appeared to be a reasonable approximation. Levi argues that these appearances are misleading. The example shows that theories that are regarded as inexact and that are defended by means of what looks like Mill’s inexact deductive method are not immunized against refutation.
13.4 Behavioral Economics and Methodological Changes
One way to determine whether economists are committed to the dogmatic method a priori or whether they are instead handicapped by the complexity of the phenomena coupled with the inability to experiment is to see what happens when economists discover new possibilities for exerting or finding experimental controls. Do they continue to defend their theories from the possibility of disconfirmation, or do they modify them when their implications are not borne out by the data? Do they expand the range of the variables and relations discussed in their theories as they acquire the ability to make testable and significant claims about them?
Over the last half-century, slowly at first but with an increasing pace, economists have discovered new possibilities for both laboratory and field experiments, and they have uncovered ways to interpret changes in policies and other historical events as if they were experimental interventions. Although some economists have argued that much of this experimentation is irrelevant to economics (notably Gul and Pesendorfer 2008), experimentation and historical investigation that identify natural experiments have become standard tools of economics. Fifty years ago, the leading journals published only a tiny fraction of the empirical investigations to be found in the same journals today. Chapter 14 is devoted to an extended case study concerning the reception by economists of psychological investigations purporting to reveal the existence of preference reversals. In that history, one can trace the evolution of the responses of economists to this phenomenon, from attempting to explain it away, to attempting to relegate the problems it suggests to peripheral aspects of economic models, to uncomfortable acceptance of the phenomenon, and then finally to inquiry concerning its causes and ramifications.
One serious issue raised by critics of behavioral and neuroeconomics,Footnote 16 such as Gul and Pesendorfer, concerns the relevance of behavioral and neurological findings to the questions that economists are concerned with, which focus on the aggregate consequences of individual choices rather than on the idiosyncratic factors that are responsible for preference rankings. The discovery that people are no less cooperative when hungry (Reynolds Reference Reynolds2019) may be very useful to an employee approaching her boss for a raise before lunch, but it doesn’t bear significantly on our understanding of economic fluctuations or the properties of markets.
Although a fair criticism of some work in behavioral and neuroeconomics, these qualms do not apply to all of it. Consider, for example, anchoring, loss aversion, framing, and the endowment effect. Many experiments show that individuals “anchor” their evaluation of alternatives to a reference state of affairs, typically the status quo, and they assess gains and losses from that reference point differently. People weigh losses more heavily than gains with both marginal losses and marginal gains of diminishing importance. People’s choices reflect a value function shaped like the one shown in Figure 13.1.

Figure 13.1 Anchoring and loss aversion.
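A rough rendering of the shape just described may be helpful. The power functional form and the parameter values in the sketch below are common illustrative choices, not figures drawn from the text; the only point is that the function is concave over gains, convex over losses, and steeper for losses than for gains.

```python
# An illustrative reference-dependent value function with the broad shape
# described above. The functional form and parameters are invented choices.

def value(outcome, reference=0.0, alpha=0.88, beta=0.88, loss_aversion=2.25):
    """Value of an outcome judged relative to a reference point."""
    x = outcome - reference
    if x >= 0:
        return x ** alpha                      # diminishing sensitivity to gains
    return -loss_aversion * ((-x) ** beta)     # losses loom larger than gains

# The same $100 outcome is valued differently depending on the reference point.
print(value(100, reference=0))     # framed as a $100 gain
print(value(100, reference=200))   # framed as a $100 loss: larger in magnitude
```

Shifting the reference point flips the sign of the change being evaluated, which is the mechanism at work in the framing example that follows.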
Like a standard preference ranking, this value function is supposed to determine choices, but unlike a preference ranking, it depends on the reference point, and for that reason it is obviously context dependent. Contextual factors that affect people’s reference point, which are irrelevant to a preference ranking that satisfies the standard axioms, can dramatically shift the rankings of alternatives. Tversky and Kahneman (Reference Tversky and Kahneman1981) present the following striking example, which is by now very well known. They asked groups of subjects to assess alternative policies to treat a disease that threatens the lives of 600 people. When given the choice between programs A and B in Table 13.2, about three-quarters of experimental subjects prefer program A.
Table 13.2 Saving lives
Program A | Saves 200 people |
Program B | Saves 600 people with probability 1/3; saves no one with probability 2/3 |
Other experimental subjects were asked to compare Programs C and D in Table 13.3. Faced with a choice between C and D, about three-quarters of experimental subjects prefer program D. Yet A and C are descriptions of the same state of affairs, as are B and D. Anchoring and loss aversion explain this framing effect. The same outcomes look different depending on whether the reference point is the death of 600 people (and one’s action is seen as saving 200 for sure or 600 with a probability of one-third) or whether the reference point is a situation in which all 600 are alive, so that the policies are seen as permitting different numbers of deaths.
Table 13.3 Allowing deaths
Program C | 400 people die |
Program D | No one dies with probability 1/3; 600 people die with probability 2/3 |
Because people weigh gains and losses differently, their choices depend on the reference point. Rather than unsystematic failures of the axioms, these data identify an additional systematic influence on choice, which is relevant to the behavior of consumers and firms. As Kahneman et al. put it:
It is in the nature of economic anomalies that they violate standard theory. The next question is what to do about it. In many cases there is no obvious way to amend the theory to fit the facts, either because too little is known, or because the changes would greatly increase the complexity of the theory and reduce its predictive yield. The anomalies that we have described… may be an exceptional case, where the needed amendments in the theory are both obvious and tractable. The amendments are not trivial: the important notion of a stable preference order must be abandoned in favor of a preference order that depends on the current reference level. A revised version of preference theory would assign a special role to the status quo, giving up some standard assumptions of stability, symmetry and reversibility which the data have shown to be false. But the task is manageable.
The crucial points are, first, that these anomalies show that the influence of the diverse factors that motivate people cannot be summarized in a single ranking that is complete, transitive, context independent, and choice determining; and second, that some of the divergences are systematic and predictable. The preferences of consumers and the ranking of investments by firms are likely to depend significantly and regularly on the reference point from which individuals evaluate alternatives.
Loss aversion is also manifest in the so-called endowment effect (Kahneman et al. Reference Kahneman, Knetsch and Thaler1991) whereby individuals demand more to part with some commodity than they would have been willing to pay to acquire it. Although the effect was demonstrated with students and coffee mugs, the phenomenon is far from trivial. As Benjamin Friedman (Reference Friedman2005) argues, loss aversion and the endowment effect are among the factors that make redistributive policies so much more difficult to implement when there is little economic growth. When the economy is growing, redistribution looks to those who lose from it like a smaller gain rather than a loss.
The newly discovered factors that affect choice behavior derive from a blossoming of empirical economics. This takes the forms of laboratory experimentation, field experimentation, and the exploitation of natural experiments, particularly with the assistance of instrumental variable techniques. Empirical investigation of economic phenomena has always faced serious problems. Economists can hardly cast some individuals into poverty to observe the effects of an unexpected loss of wealth on a random sample. It would be unethical and scarcely feasible. Giving some random sample of the population $1,000,000 to observe the effects on consumption might be ethical, but it would be prohibitively expensive. Controlled experiments on participants in realistic markets are difficult to carry out, given how many factors there are to control for. When controlled experiments are possible, economists face the huge (“external validity”) problem of determining whether the findings in the controlled environment of the experiment will hold true outside of the laboratory.
These problems are ineliminable, but economists have found ways of lessening them. Modeling interactions between firms or consumers as simple games has made it possible to simulate economic problems facing market participants by interactions among students and other experimental subjects communicating via computer terminals. Shifting from explicit controls to the control exercised by randomization has made it easier to investigate specific putative causal factors and has lessened (although certainly not eliminated) the problems of external validity.
Randomized control trials are especially useful in field experiments, where explicit controls are infeasible, but field experiments raise special problems of external validity, because they are necessarily carried out within a specific and typically not-well-understood social context. For example, the Tamil Nadu Integrated Nutrition Project (TINP) provided nutritional education to pregnant women in rural districts of the state of Tamil Nadu and apparently greatly lessened malnutrition among infants. Encouraged by these results, policy-makers instituted a similar program in Bangladesh, where it failed. The explanation lies in a difference in an additional causal factor: in Bangladesh, mothers-in-law distribute the family’s food, while in Tamil Nadu, mothers do (Cartwright and Hardie Reference Cartwright and Hardie2012).
Although the TINP was not a randomized control trial, it illustrates the difficulties of extrapolating a finding from one context to another. Moreover, since not all of the districts in Tamil Nadu received this assistance, one might conjecture that the receipt of the assistance was close enough to an experimental intervention that the results could be attributed to the intervention rather than to unknown confounding factors. Such conjectures can be more or less reasonable, and sometimes the implementation of policies can be sequenced so as to create an experiment. For example, Mexico’s PROGRESA (Programa de Educación, Salud y Alimentación), which is a comprehensive antipoverty program combining health care, education, and nutrition, was implemented gradually, with the first locations chosen randomly and the outcomes compared to locations in which PROGRESA had not yet been implemented. Although there is always some possibility that the districts in which PROGRESA was first introduced differed in some other relevant regard from districts in which PROGRESA had not yet been implemented, such a study should count as a randomized control trial. The fact that the experiment is part of the implementation of a policy may create a risk of bias, but it is otherwise irrelevant.
Natural experiments are not easy to find, because of the importance of confounders, but sometimes there are ways around the problems. For example, consider Angrist’s (Reference Angrist1990) study of the effects of serving in the Vietnam War on lifetime earnings. The possibility that young men from less affluent families were more likely to enlist rules out any simple attribution of the lesser earnings of veterans to having served in the military. But an individual’s number in the draft lottery, which determined whether the individual was “draft eligible,” counts as an intervention. Because draft eligibility is determined by a random process, one can attribute the earnings differences between those who were and those who were not draft eligible to the difference in draft eligibility. (Of course, it is possible by chance that those whose numbers in the draft lottery made them eligible to be drafted all happened to be less valuable employees, but this is enormously improbable.) Knowing the effect of draft eligibility on lifetime earnings, Angrist could then use information concerning the proportions of draft-eligible and draft-ineligible men who wound up serving in the military to reach a conclusion concerning the effect of military service on lifetime earnings.Footnote 17 Draft eligibility serves as an instrument that permits inferences concerning the effects on earnings of military service.
In explicitly causal language, one can explain the definition and use of instrumental variables as follows. If economists want to determine whether X causes Y, in circumstances in which there are likely to be unmeasured confounders, they cannot draw any causal conclusions from observing a correlation between X and Y. But if there is some other variable Z that is (1) a cause of X, that is (2) independent of any confounders of the relationship (if any) between X and Y, and that (3) has no causal connection to Y unless there is a causal path from Z to Y via X, then Z counts as an instrumental variable, and the discovery of a correlation between Z and Y is strong evidence that X causes Y.

Figure 13.2 An instrumental variable.
In Figure 13.2, economists cannot determine whether X causes Y from information concerning whether X and Y are correlated, owing to the possibility of unknown confounders, C. But if there is a cause of X, such as Z, that has no causal relationship to Y apart from a causal relationship that it might have in virtue of causing X, then the discovery of a correlation between Z and Y is evidence that X causes Y. The causal inferences economists can draw on the basis of measuring the correlation between Z and Y presuppose causal knowledge concerning the relationship between Z and the other variables. For example, Angrist’s conclusions rest on the premises that draft eligibility does not affect lifetime earnings, except via its effects on military service, that draft eligibility increases the probability of military service, and that draft eligibility is not caused by any of the other variables, including possible confounders.
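To illustrate the inferential pattern just described, here is a small simulation loosely modeled on the draft-lottery case. All of the numbers, including the stipulated causal effect of service on earnings, are invented; the point is only that the naive comparison of veterans and nonveterans is distorted by the confounder, while the instrumental-variable (Wald) calculation recovers the stipulated effect.

```python
# A simulation of the instrumental-variable logic described above, loosely
# modeled on the draft-lottery case. All numbers are invented.
# z = draft eligibility (randomized), x = military service, y = earnings,
# u = an unobserved confounder (the C of Figure 13.2).
import random

random.seed(0)
data = []
for _ in range(100_000):
    z = random.random() < 0.5                    # instrument: assigned by lottery
    u = random.gauss(0, 1)                       # confounder
    p_serve = 0.1 + 0.4 * z + 0.2 * (u > 0)      # eligibility and confounder both raise service
    x = random.random() < p_serve
    y = 10.0 - 2.0 * x + 3.0 * u + random.gauss(0, 1)   # stipulated effect of service is -2
    data.append((z, x, y))

def mean(values):
    return sum(values) / len(values)

# Naive comparison of those who did and did not serve is biased by the confounder.
naive = mean([y for z, x, y in data if x]) - mean([y for z, x, y in data if not x])

# Wald estimator: (effect of Z on Y) divided by (effect of Z on X).
z_on_y = mean([y for z, x, y in data if z]) - mean([y for z, x, y in data if not z])
z_on_x = mean([x for z, x, y in data if z]) - mean([x for z, x, y in data if not z])
iv_estimate = z_on_y / z_on_x

print(f"naive difference: {naive:.2f}; IV estimate: {iv_estimate:.2f} (stipulated effect: -2.00)")
```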
Although the ethical and feasibility constraints on economic experimentation remain, economists are now able and willing to confront economic hypotheses with evidence. For example, instead of concluding (plausibly) that an increase in the minimum wage increases unemployment among unskilled workers, because firms can substitute machinery or more highly skilled workers for the now more expensive unskilled labor, economists have taken advantage of natural experiments to discover that relatively small increases in the minimum wage have little effect on the employment of unskilled workers (Card and Krueger Reference Card and Krueger1994).
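Card and Krueger’s study rested on a difference-in-differences comparison: fast-food employment in New Jersey, which raised its minimum wage, was compared before and after the increase with employment in neighboring Pennsylvania, which did not. The sketch below shows only the arithmetic of such a comparison; the employment figures are invented for illustration.

```python
# Difference-in-differences arithmetic of the kind used in minimum-wage
# natural experiments. The employment figures are invented for illustration.

employment = {
    # (group, period): average employees per store
    ("raised_wage_state", "before"): 20.0,
    ("raised_wage_state", "after"):  20.5,
    ("comparison_state", "before"):  23.0,
    ("comparison_state", "after"):   21.5,
}

change_treated = employment[("raised_wage_state", "after")] - employment[("raised_wage_state", "before")]
change_control = employment[("comparison_state", "after")] - employment[("comparison_state", "before")]

# The comparison state's change estimates what would have happened anyway;
# subtracting it isolates the estimated effect of the wage increase.
did_estimate = change_treated - change_control
print(f"estimated effect on employment per store: {did_estimate:+.1f}")
```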
13.5 The Economists’ Deductive Method
We are now in a position to formulate a schema sketching a “deductive” method of theory appraisal that is both justifiable and consistent with existing theoretical practice in economics, insofar as that practice aims to appraise theories empirically. For, as I stressed earlier (§§6.4, 7.3, and 7.4), a good deal of theoretical work in economics is still concerned with conceptual exploration, not with empirical theorizing.
To facilitate the comparison of what I am calling the economists’ deductive method with Mill’s inexact method a priori, I have juxtaposed sketches of the two methods in Table 13.4.
Table 13.4 Deductive methods
| Mill’s inexact deductive method | Economist’s deductive method |
|---|---|
| Borrow proven (ceteris paribus) laws concerning the operation of relevant causal factors | Formulate credible (ceteris paribus) and pragmatically convenient generalizations concerning the operation of relevant causal factors |
| Deduce from these laws and statements of initial conditions, simplifications, etc., predictions concerning relevant phenomena | Deduce from these generalizations, and statements of initial conditions, simplifications, etc., predictions concerning relevant phenomena |
| Test the predictions | Test the predictions |
| If the predictions are correct, then regard the whole amalgam as confirmed. If the predictions are not correct, then judge (1) whether there is any mistake in the deduction, (2) what sort of interferences occurred, and (3) how central the borrowed laws are (how major the causal factors they identify are) and whether the set of borrowed laws should be expanded or contracted | If the predictions are correct, then regard the whole amalgam as confirmed; if the predictions are not correct, then compare alternative accounts of the failure on the basis of explanatory success, empirical progress, and pragmatic usefulness |
The economist’s deductive method, unlike the inexact method a priori, is consistent with standard views of confirmation. What justifies continuing to call it a deductive method, despite its concessions that the inexact laws with which one begins are not proven and that they can be refuted by economic evidence? First (in sharp contrast to the methodological views discussed in Chapters 11 and 12), independent direct confirmation of the basic inexact laws plays a crucial role.Footnote 18 Second, refutation is largely ruled out, albeit by the circumstances, not by methodological rule. Since economists are typically dealing with complex phenomena in which many simplifications are required and in which interferences are to be expected, the evidential weight of predictive failure will be small. It will rarely be rational to surrender a well-supported hypothesis because of a predictive failure in circumstances such as these. The Allais problem, with its quasi-experimental basis, exaggerates the evidential weight that predictive failures typically carry.
The simplified account of the economist’s deductive method sketched earlier follows the hypothetico-deductive method (§10.1) precisely in steps 2 and 3 and is consistent with it in steps 1 and 4, where it is merely more specific. The hypothetico-deductive method is mute on where hypotheses to be tested come from and permits one to begin with a theory with known empirical and pragmatic virtues.
The fourth step of the economist’s deductive method abbreviates Section 13.3.2. It is consistent with the hypothetico-deductive method, which merely requires that the correctness or incorrectness of the predictions contribute to the appraisal of the hypothesis tested. The empirical grounds for discriminating between theories in the economist’s deductive method remind one of Lakatos’ formulations, and they direct one to consider what theory modifications or qualifications best explain the data and best increase the confirmed empirical content of the theory. It will be extremely difficult to judge theory modifications on empirical grounds, because of the acute practical Duhem–Quine problem in economics (§A.7), which is a consequence of how dubious the various auxiliary hypotheses needed to perform most tests are. Pragmatic grounds may consequently play a large role. If one cannot tell which theory modification is empirically better, it is sensible to choose the one that has greater pragmatic virtues – that is, the one that is easier to use, gives sharper advice, lends itself to cleaner mathematical expression, and so forth.Footnote 19
Yet, when experiments are possible and when alternatives are available that inherit the initial credibility of the accepted theory and offer similar pragmatic advantages, then the economist’s deductive method favors theory change. If one studies how economists respond to experimental anomalies, one can see that they are not committed to a dogmatic view of confirmation, such as the inexact method a priori. The case study in Chapter 14 provides evidence for this claim. The dogmatism one used to find in economists’ responses to anomalies, which resulted more from a commitment to an image of economics as a separate science than from any theory of confirmation, has not disappeared, but it has faded. Many modifications are proposed, discussed, and tested.
13.6 The Deductive Method and the Demands of Policy
The one remaining criticism of the deductive method is practical: in following a deductive method, economists allegedly condemn their work to practical futility. This criticism is mistaken. The economist’s deductive method does not rule out theory changes when doing so will increase the empirical content of the theory – on the contrary, it mandates them. Nor does it – or any other variety of the deductive method – condemn empirical generalization. Mill is explicit in endorsing common sense on this point: if something works, use it (though with due caution). Moreover, the development of empirical generalizations, for which no deductive derivation is currently possible, is of great theoretical importance too, for such generalizations constitute the most important data for which theories need to account. There are, as we saw in Section 7.5, methodological rules against employing ad hoc generalizations – that is, generalizations that do not permit rational choice explanations, that do not give pride of place to consumerism, that have narrow scope, or that rule out the possibility of equilibrium. But these result from the vision of economics as a separate science, not from distinctive views of theory appraisal.
The economist’s deductive method does not recommend repudiating useful empirical generalizations or abandoning accurate predictive devices. Instead, it condemns naive reliance on unreliable empirical generalizations, and it offers an additional means of getting a predictive grasp on the phenomena. Whether the best way to aim an artillery piece is by firing it in various circumstances and fitting a curve to the data points or by calculating from fundamental laws is an empirical question. Rather than forbidding the first procedure, the deductive method offers a way of improving, correcting, and extending the results one gets by it.
If the standard theory of the firm had all the empirical virtues claimed for it by Milton Friedman and others, then one should make use of it for relevant practical purposes. The economist’s deductive method does not recommend theoretical purism that spurns useful tools that are not in perfect condition or perfectly understood. By considering the realism of a theory’s assumptions – the constituent causal processes and their laws – one may be able to get some guidance concerning when the predictions of the theory are likely to break down and concerning how to modify the theory in the face of apparent disconfirmation.
13.7 Conclusion: Economics as a Decreasingly Separate Science
What may stand in the way of developing generalizations that are of practical utility is not the deductive method per se but Mill’s vision of economics as a separate science: as a discipline that is concerned with a domain in which a small number of causal factors predominate. I argued in Chapter 7 that this vision of economics as a separate science, although not often expressed in this terminology, remains central to contemporary microeconomics. Mainstream microeconomics, macroeconomics, and general equilibrium theory presuppose that a single set of causal factors underlies economic phenomena and determines their broad features. Other relevant causal factors are countenanced typically only as disturbing causes, whose influence must be acknowledged, but which do not form a part of the central theory. Their effects are allegedly significant with respect to a narrow range of cases; while, without its many specific qualifications, the basic theory is still purportedly a good general guide. As already noted, Mill makes such a pragmatic case:
[T]he ascertainment of the effect due to the one class of circumstances alone is a sufficiently intricate and difficult business to make it expedient to perform it once for all, and then allow for the effect of the modifying circumstances; especially as certain fixed combinations of the former are apt to recur often, in conjunction with ever-varying circumstances of the latter class.
To surrender the understanding of economics as a separate science would be to part with the grand vision that a single theory could provide one with a basic grasp of the subject matter. A considerable number of economists are now willing to pay the price, and they have paired up with psychologists, political scientists, neurologists, and much less frequently with sociologists and anthropologists. However, the temper and character of modern economics still embodies the Millian vision of the discipline as a separate science.
Can one better understand economies by applying equilibrium theory, or would economists do better to develop a variety of different theories with narrower domains and a larger repertory of causes? The latter alternative would lower the barriers between economics and other social sciences, since the causal factors with which sociologists and psychologists have been concerned may be important in particular economic subdomains. Although the question is an empirical one, the answer also depends on the objectives and uses of economic theories. For a separate science of economics has aesthetic appeal, heuristic power, and normative force, none of which economists will willingly sacrifice unless the more fragmented and less purely “economic” alternatives have similar virtues and fit the data much better. So long as the data consist of noisy economic statistics, I doubt that the sacrifice will often appear worthwhile.
But, with the development of experimental economics and with increasingly sophisticated field research, this situation is changing; and if twenty-five years from now I undertake a third edition of this book, its title may no longer be appropriate. The central economic theories may have more structural similarities to the sorts of theories favored by institutionalist economistsFootnote 20 than to contemporary microeconomics, macroeconomics, behavioral economics, and general equilibrium theory. But I have no crystal ball, and it may be impossible to generate additional significant theories that provide any appreciably better grip on the data than do contemporary mainstream theories. In that case, economics will go on as it has; and critics may continue to complain that economists are not behaving as responsible scientists should. But before criticizing prematurely, they should recognize that the apparent dogmatism can arise from the circumstances in which economists find themselves – blessed with behavioral postulates that are plausible, powerful, and convenient, and cursed with the inability to learn much from experience.
Economists are committed to equilibrium theory because they regard its basic laws as credible and as possessing heuristic and pragmatic virtues. Their response to anomalous market data, which mimics the inexact method a priori, is not illegitimately dogmatic. It is, on the contrary, consistent with standard views of theory assessment, once one takes account of how bad these data are. The problem is not a moral failing among economists – their inability to live up to their Popperian convictions – but a reflection of how hard it is to learn about complex phenomena if one does not know a great deal already and can do few controlled experiments.
What happens when economists come across apparently disconfirming experimental evidence? In this chapter, I discuss one fascinating case. I chose it because of its tractability and because the anomalous results have been discussed repeatedly in prominent economic journals. It is an illustration rather than an argument for the interpretation of the evolution of economic methodology defended in Chapter 13. As we shall see, the initial reactions of economists to the anomalous results of experiments carried out by psychologists are very different than current attitudes.
14.1 The Discovery of Preference Reversals
No sensible economist believes that the axioms of utility theory are exceptionless universal laws, but utility theory may still be a reasonable first approximation that is useful in predicting and explaining behavior. What should be of concern to economists would be evidence that people’s choices differ systematically from those predicted by utility theory.
One way in which people’s choice behavior does apparently deviate systematically from that predicted by utility theory involves so-called preference reversals. Paul Slovic and Sarah Lichtenstein describe the discovery of this phenomenon as follows:
The impetus for this study was our observation in our earlier 1968 article that choices among pairs of gambles appeared to be influenced primarily by probabilities of winning and losing, whereas buying and selling prices were primarily determined by the dollar amounts that could be won or lost … Subjects setting a price on an attractive gamble appeared to start with the amount to win and adjust it downward to take into account the probability of winning and losing, and the amount that could be lost. The adjustment process was relatively imprecise, leaving the price response greatly influenced by the starting point payoff. Choices, on the other hand, appeared to be governed by different rules. In our 1971 article, we argued that, if the information in a gamble is processed differently when making choices and setting prices, it should be possible to construct pairs of gambles such that people would choose one member of the pair but set a higher price on the other. We proceeded to construct a small set of pairs that clearly demonstrated this predicted effect.
Slovic and Lichtenstein called the bets with a high probability of winning “P-bets”; bets with large prizes are “$-bets.” Given their earlier conjectures, Slovic and Lichtenstein predicted that, among pairs of bets with a positive expected value, individuals who choose the P-bets should often be willing to pay more for $-bets. For example, consider the P-bet (P*) consisting of a gamble in which one wins $4.00 if a roulette wheel comes up with any number except 1 (i.e., with a probability of 35/36) or loses $1.00 if the roulette wheel comes up 1 (i.e., with a probability of 1/36). Slovic and Lichtenstein paired it with the $-bet ($*) in which one has an 11/36 chance to win $16.00 and a 25/36 chance to lose $1.50. The expected monetary values of the two gambles (the sums of the payoffs weighted by their probabilities) are respectively $3.86 and $3.85. Slovic and Lichtenstein made the conditional prediction that if individuals preferred the P-bets in pairs such as (P*, $*), they would be likely to pay more for the $-bets. Call such reversals “predicted.” A reversal in which an individual prefers a $-bet and prices a P-bet higher is “unpredicted.”
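The expected monetary values quoted above are easy to verify; the sketch below simply restates the figures in the text as arithmetic.

```python
# Expected monetary values of the two bets described above.
p_bet_ev = (35 / 36) * 4.00 + (1 / 36) * (-1.00)         # P*: about $3.86
dollar_bet_ev = (11 / 36) * 16.00 + (25 / 36) * (-1.50)   # $*: about $3.85

print(f"P* expected value: ${p_bet_ev:.2f}")
print(f"$* expected value: ${dollar_bet_ev:.2f}")
```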
14.1.1 The First Experiments
In their essay, “Reversals of Preference Between Bids and Choices in Gambling Decisions,” Lichtenstein and Slovic (Reference Lichtenstein and Slovic1971) report the results of three experiments in which subjects were first asked to choose among bets with approximately the same expected value, such as P* and $*. Then the subjects were distracted with other tasks before they were asked to put a price on bets presented to them one at a time. In the first two experiments, subjects were paid for participating, and there was no actual gambling. In the third experiment, the bets were played, and the subjects were paid their winnings. In the pricing part of experiment I, subjects were asked to suppose that they owned tickets to play the lotteries and to state the minimum price they would accept to sell their tickets. In experiment II, subjects were asked to state the highest price they would pay to purchase each lottery. In experiment III, choices were all repeated three times, with prompting concerning prior choices, and a special device, to be described shortly, was used to give subjects incentives to state accurately the minimum selling prices for lotteries. In experiment I nearly three-quarters of the subjects reversed their preference every time they chose the P-bet in the pairwise comparison. There were few unpredicted reversals. In experiment II the results were not as striking, but more than two-thirds of the subjects had a higher rate of predicted reversals than of unpredicted reversals. In experiment III, which used only fourteen subjects, six always made conditional predicted reversals, five sometimes made them, and unpredicted reversals were infrequent. As Lichtenstein and Slovic’s hypotheses concerning choices and valuations of gambles implied, reversals were most frequent when the loss in the $-bet was larger than in the P-bet (which led subjects to prefer the P-bet more often), and when the win in the $-bet was large relative to the win in the P-bet (which led individuals to bid more for the $-bet).
To encourage subjects to reveal their true minimum selling price, Lichtenstein and Slovic arranged in the third experiment to purchase the bet from a subject whenever a chance mechanism generated a purchase price exceeding the subject’s selling price. If a subject announced a selling price higher than the probabilistically generated purchase price, then the subject would play the lottery instead. Given this arrangement, there is nothing to be gained by understating one’s minimum selling price and there may be real costs, for doing so may result in selling the lottery for less than it is worth to one. To overstate the minimum selling price again brings no additional revenue, and may lead one to play the lottery when one would prefer to sell it at the price offered. This method is due to Becker et al. (Reference Becker, deGroot and Marschak1964).
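A small simulation makes the incentive property of the Becker–deGroot–Marschak mechanism vivid. The lottery below is modeled on the P-bet described earlier, while the range of randomly generated purchase offers and the number of trials are invented; the sketch shows only that stating (approximately) one’s true valuation does best on average, just as the argument above claims.

```python
# A simulation of the Becker-deGroot-Marschak selling mechanism described
# above. The range of random offers and the number of trials are invented.
import random

random.seed(1)

def play_lottery():
    """A P-bet-like lottery: win $4.00 with probability 35/36, lose $1.00 otherwise."""
    return 4.00 if random.random() < 35 / 36 else -1.00

def average_payoff(stated_price, trials=100_000):
    """Average payoff from stating a given selling price. A random offer is
    drawn; if it is at least the stated price, the lottery is sold at the
    offer; otherwise the subject plays the lottery."""
    total = 0.0
    for _ in range(trials):
        offer = random.uniform(0.0, 8.0)
        total += offer if offer >= stated_price else play_lottery()
    return total / trials

true_value = 3.86   # suppose the lottery is worth its expected value to the subject
for stated in (2.00, true_value, 6.00):
    print(f"stated price ${stated:.2f}: average payoff ${average_payoff(stated):.2f}")
# Understating sells the lottery too cheaply; overstating forgoes attractive
# offers. Stating the true value does best on average.
```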
In the case of bets with negative expected values and improbable but large losses, Lichtenstein and Slovic predicted the opposite reversals among those preferring the $-bets to the P-bets. This implication was not tested in the experiments reported in the 1971 paper. But when the above results were replicated in a later paper (Lichtenstein and Slovic Reference Lichtenstein and Slovic1973), this additional implication was also confirmed. The experiment discussed in the 1973 paper was carried out on the balcony of the Four Queens Casino in Las Vegas, and the experimental subjects, who included professional gamblers, played with their own money! Lichtenstein and Slovic again found frequent predicted reversals and infrequent unpredicted reversals (see also Lindman Reference Lindman1971).
14.1.2 Apparent Significance
If individuals prefer more money to less, preference reversal apparently involves gross choice inconsistency. As the economists David Grether and Charles Plott point out:
Taken at face value the data are simply inconsistent with preference theory and have broad implications about research priorities within economics. It [sic] suggests that no optimization principles of any sort lie behind even the simplest of human choices and that the uniformities in human choice behavior which lie behind market behavior may result from principles which are of a completely different sort from those generally accepted … Notice this behavior is not simply a violation of some type of expected utility hypothesis. The preference measured one way is the reverse of preference measured another and seemingly theoretically compatible way. If indeed preferences exist and if the principle of optimization is applicable, then an individual should place a higher reservation price on the object he prefers.
Suppose I prefer bet a to bet b and place a price of p(a) on a and a price of p(b) on b. If we assume that I place a price of p(a) on a if and only if I am indifferent between a and p(a), and similarly for b and p(b), then I must prefer p(a) to p(b). This equivalence between pricing and indifference is called “procedure invariance” by Tversky et al. (Reference Tversky, Slovic and Kahneman1990, p. 205). If I am indifferent between p(a) and a, which I prefer to b, and I am indifferent between b and p(b), then, by transitivity, I must prefer p(a) to p(b). If, in addition, I prefer more money to less, p(a) must be larger than p(b). Yet, in the case of preference reversals, individuals who prefer P-bets price $-bets higher. Preferring a P-bet and pricing a $-bet higher violates either transitivity or procedure invariance.
14.2 Grether and Plott’s Experiments
These results were greeted with skepticism by economists. However, those who reacted to them in print did not argue that such results could not shake their confidence in the fundamental propositions of economic theory. They showed neither the dogmatism implied by the inexact deductive method nor the dogmatism implicit in Milton Friedman’s argument against considering the realism of assumptions. Economists have, of course, considered the possibility that the results are due to disturbing causes or that they arose only because of peculiarities of the experimental set-up. But these possibilities suggest experiments rather than providing automatic excuses. Thus Grether and Plott comment:
There is little doubt that psychologists have uncovered a systematic and interesting aspect of human choice behavior. The key question is, of course, whether this behavior should be of interest to economists. Specifically it seems necessary to answer the following: 1) Does the phenomenon exist in situations where economic theory is generally applied? 2) Can the phenomenon be explained by applying standard economic theory or some immediate variant thereof?
Grether and Plott did not dismiss the results as due to experimental error or economically insignificant disturbing causes. Instead, they attempted to see whether the preference reversal phenomenon would disappear in properly designed experiments. Grether and Plott are explicit about how they want the experiments to come out, for they say bluntly that the purpose of their experiments was “to discredit the psychologists’ works as applied to economics” (1979, p. 623). Nevertheless, whether equilibrium theory can be defended depends on the experimental results, not on methodological fiat.
Accordingly, Grether and Plott constructed a list of possible explanations for the preference reversal phenomenon. On the list are psychological explanations, including two in terms of human information-processing procedures. The first of these is Lichtenstein and Slovic’s, in terms of the different methods devoted to different cognitive tasks, while the second, which is not their view, although it might complement their view, explains the preference reversals in terms of information-processing strategies designed to lessen the costs of decision-making. The other possible psychological hypotheses on Grether and Plott’s list cannot explain the data.
In addition, the list includes explanations in terms of faults in the experiment – misunderstanding among unsophisticated subjects, expectations produced by the knowledge that these were psychological experiments, and so forth. Grether and Plott do not believe that Lichtenstein and Slovic have botched their experiments, but, just to be sure, they try to control for these unlikely sources of the odd results.
14.2.1 How Preference Reversals Might Be Explained Away
Grether and Plott are particularly interested in the following four ways in which economists might attempt to explain away the preference reversal phenomenon. If supported by the evidence, these explanations would show that the preference reversal phenomenon poses no serious challenge to mainstream economics. The four possible “economic” explanations are:
1. Poor incentives: the incentives in the experiment were insufficient to get people to behave as they would in real life when making significant decisions.
2. Income effects: as people acquire more wealth, they may rationally come to be willing to gamble more. This change in aversion to risk as a result of increases in wealth could contaminate the results in some of Lichtenstein and Slovic’s experiments, in which many gambles were played, and wealth changed between separate choices.
3. Indifference: in Lichtenstein and Slovic’s experiment, subjects were not allowed to say that they were indifferent between the two bets. If subjects were indifferent between the P- and $-bets, when they said they preferred the P-bet, then there would be less irrationality in pricing the $-bet higher.Footnote 1
4. Strategic pricing: subjects might not be telling the truth when asked to state the minimum price they would accept to sell a lottery. It is often advantageous to place a higher price on what one is selling than one would truly be willing to accept. Since it is hard to exaggerate the value of the P-bet, this general strategy could account for the reversals.
Grether and Plott endeavored to control for these factors to see whether the conditionally predicted preference reversals would then go away (see also Grether and Plott Reference Grether and Plott1982). Before discussing their experiments, it is worth noting that these alternatives to accepting Lichtenstein and Slovic’s hypothesis are implausible and generally insufficient:
1. Poor incentives: since the same results obtained in Lichtenstein and Slovic’s experiments whether the gambles were played, whether it was the subject’s own money, and whether individuals were driven to attend carefully, it is hard to believe that the preference reversals result merely from the weakness of the incentives. And, while it would be reassuring to economists if preference reversals went away when the incentives were substantial, economists should still be curious as to why weak incentives would lead only to the predicted, not the unpredicted, reversals.
2. Income effects: it is hard to believe that these could be important, since the results obtained whether the gambles were played or not; and, as is noticed in the first excuse, the stakes were low. Furthermore, the opposite reversals in the case of bets with large possible losses, which were predicted and observed in the Las Vegas replication, are inconsistent with a purported explanation in terms of income effects.
3. Indifference: even if individuals were indifferent between P- and $-bets when they announced a preference for the P-bet, it would still be inconsistent with rational choice theory to price the $-bet higher. Also, indifference would not explain the asymmetry between the frequency of predicted reversals and unpredicted reversals. Furthermore, although Lichtenstein and Slovic did not permit individuals to register indifference, they did ask them to indicate strength of preference on a four-point scale – “slight,” “moderate,” “strong,” and “very strong” – and the mean strength of preference indicated was “strong.”
4. Strategic pricing: strategic misrepresentation would not explain reversals when individuals were asked only to price gambles rather than to state buying or selling prices, and it would predict that, when asked to state a buying price, individuals would understate the prices of $-bets.
14.2.2 Grether and Plott’s Results
It is not surprising that Grether and Plott failed to make the preference reversal phenomenon go away by controlling for these factors. Here is what happened:
1. Poor incentives: to determine the importance of incentives, Grether and Plott varied them. The phenomenon was unaffected, refuting the explanation in terms of weak incentives. The fact that incentives had little effect was taken by Grether and Plott as evidence also against the explanation in terms of information-processing costs, since individuals should devote more care to adjusting for probabilities as the stakes increase.
2. Income effects: to control for these, subjects played only one of the gambles (which was chosen randomly) and the order of choosing versus bidding varied. The phenomenon persisted.
3. Indifference: Grether and Plott permitted subjects to register indifference as well as preference, but scarcely any subject ever did, so the phenomenon is not explained by indifference.
4. Strategic pricing: Grether and Plott used the same Becker–deGroot–Marschak mechanism as Lichtenstein and Slovic in order to elicit a truthful statement of minimum selling price. They also compared the result with simply asking people to state what they believed a lottery was worth. The amounts stated when pricing and evaluating were not appreciably different, ruling out the explanation in terms of strategic responses.
Grether and Plott conclude:
Needless to say, the results we obtained were not those expected when we initiated this study. Our design controlled for all the economic-theoretic explanations of the phenomenon which we could find. The preference reversal phenomenon which is inconsistent with the traditional statement of preference theory remains.
What is surprising is not the result of Grether and Plott’s experiments but their surprise at the results. Given the implausibility of the alternatives to Lichtenstein and Slovic’s hypothesis, which incidentally predicted this phenomenon before it was ever observed, it seems, at least with hindsight, that Grether and Plott should not have expected any different results.
14.2.3 Apparent Dogmatism: Grether and Plott’s Conclusions
What then do Grether and Plott conclude?
The fact that preference theory and related theories of optimization are subject to exception does not mean that they should be discarded. No alternative theory currently available appears to be capable of covering the same extremely broad range of phenomena. In a sense the exception is an important discovery, as it stands as an answer to those who would charge that preference theory is circular and/or without empirical content. It also stands as a challenge to theorists who may attempt to modify the theory to account for this exception without simultaneously making the theory vacuous.
After the preceding open-minded discussion and the striking concession that the preference reversal phenomenon really does appear to refute a central behavioral postulate of contemporary economics, these words (which constitute the last paragraph in Grether and Plott’s conclusion) are a letdown. It is almost as if they conclude that, “since these data cannot be discredited, economists should ignore them, after first congratulating themselves for possessing a false rather than vacuous theory.” Is this caricature unfair? Is their response justifiable?
14.3 Dogmatism and the Commitment to Economics as a Separate Science
Dogmatism is sometimes justifiable. As philosophers such as Lakatos have pointed out, theories are too valuable and too hard to generate to be easily discarded, even when they face serious problems. Theory change awaits the formulation of better alternatives.
Moreover, Grether and Plott use their experiments to test the explanations of preference reversal proposed by some psychologists, and they argue that some of these hypotheses are unsuccessful too (1979, p. 634). They suggest that they are confronting a mysterious phenomenon rather than rejecting a well-confirmed alternative hypothesis. But Lichtenstein and Slovic’s own hypothesis anticipated the experimental results, and it is confirmed by these new experiments. At first glance, Grether and Plott’s reaction seems indefensibly dogmatic.
What explains this dogmatism? Commentators such as Hutchison and Blaug complain that economists are employing something like Mill’s deductive method and are unwilling to take evidence seriously. But Grether and Plott are not committed to Mill’s inexact deductive method. They do not refuse to take the disconfirming evidence provided by Lichtenstein and Slovic seriously, and they are not content to say merely that the problem must be caused by some disturbing cause. On the contrary, here is an instance where respected economists, who are committed to equilibrium theory, are prepared to conclude that the evidence has disconfirmed one of its most central claims. But having done so, surprisingly little changes.
Why? How else might one explain this apparent dogmatism? The reason Grether and Plott give for refusing to move from refutation to theory change or modification is: “No alternative theory currently available appears to be capable of covering the same extremely broad range of phenomena.” This way of defending economic theory is familiar. Recall the comments quoted from Koopmans near the end of Chapter 11. At first glance, the defense seems reasonable. Equilibrium theory is only a first approximation, so disconfirmations are not decisive. And, in any case, theory assessment is comparative. As problematic as economic theory may be, there are no alternatives which provide “better approximations … at the level of the premises” (Koopmans 1957, pp. 141–2) and enable one to draw conclusions comparable to those which can be drawn from accepted theory. Friedman offers a similar defense when he remarks that “criticism of this type is largely beside the point unless supplemented by evidence that a hypothesis … yields better predictions for as wide a range of phenomena” (1953c, p. 31).Footnote 2 But Friedman’s and Koopmans’ defenses of equilibrium theory, like Grether and Plott’s, have a tacit premise: that any good economic theory must, like the accepted theory, have both comprehensive scope and a parsimonious theoretical core. The stipulated standard that an alternative theory must meet is that it “be capable of covering the same extremely broad range of phenomena.”
Grether and Plott, like Koopmans and Friedman, are committed to a vision of economics as a separate science, as a science that explains and predicts all central and significant economic phenomena by means of a single systematic theory. This theoretical strategy precludes accepting hypotheses with a narrow scope such as Lichtenstein and Slovic’s generalizations concerning choosing and pricing gambles. These generalizations are significant for only a small set of phenomena; they are not significant factors in all economic phenomena.
Grether and Plott, Koopmans, and Friedman are not just saying that it is reasonable to hang on to accepted theory, since there are no alternatives that are better confirmed. Instead, they implicitly demand that any alternative to accepted theory must preserve a peculiarly “economic” realm to be spanned by a single unified theory. They are not merely defending simplicity, unity, and broad scope as methodological desiderata or as criteria to be employed when there are ties or near ties on empirical grounds. Instead, one finds a constraint in operation here against considering hypotheses with narrow scopes, regardless of their empirical support. Similarly, one of the attractions of real business cycle theory was precisely the unification between microeconomics and macroeconomics it hoped to achieve.
As argued in Chapter 7, this requirement seems unjustified. In defense of it in this context, economists might argue that since utility theory is a theory of rationality, as well as a set of generalizations about how people in fact behave, it should not have a piecemeal structure. But this assumes that the positive theory of choice must also be a theory of rational choice. As argued in Chapter 13, there are pragmatic grounds for preferring theories of choice that are also theories of rational choice, but those grounds must take second place to empirical evidence. It would be nice if a better confirmed alternative possessed such unity and scope, united positive economics and the theory of rationality, and preserved the peculiar moral authority of economists, but one cannot inflate these methodological desiderata into methodological constraints against considering alternatives, no matter how much better they fit the data.
14.4 Further Responses by Economists
Let us now consider how other economists reacted to the discovery of preference reversals. In the decade following Grether and Plott’s essay, economists authored several significant papers on the phenomenon, most of which appeared in the American Economic Review, one of the most prestigious economics journals. One striking feature of these publications is that none pays careful attention to Lichtenstein and Slovic’s hypothesis (that people employ different cognitive processes when pricing than when choosing), and there was at the time no attempt on the part of economists to incorporate Lichtenstein and Slovic’s hypothesis into economics. At that time, there was little theoretical collaboration between economists and psychologists in this area, and the continuing work by psychologists on aspects of preference reversals was not cited by economists.
In the immediate aftermath of Grether and Plott’s essay, Pommerehne et al. (Reference Pommerehne, Schneider and Zweifel1982) and Reilly (Reference Reilly1982) tried even harder to make the preference reversal phenomenon go away and were able to reduce the frequency of preference reversals (although in doing so, they also blunted Grether and Plott’s criticism of the information-processing-costs explanation). But Pommerehne et al. found that, although experimental subjects can learn from repetitions to accept more profitable gambles, they do not learn to avoid preference reversals (1982, p. 573). In a more dramatic demonstration of just how robust the phenomenon is, Berg et al. (Reference Berg, Dickhaut, O’Brien and Smith1985) ran a series of experiments in which they exploited the choice inconsistencies to lead the subjects through a “money pump” cycle of exchanges in which they paid money to wind up back where they started. The effect was to decrease the dollar amount of the preference reversals but not to eliminate them (Roth Reference Roth1988, p. 1015).Footnote 3 All of this confirms Lichtenstein and Slovic’s initial hypothesis.
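To see what such a “money pump” exploits, here is a minimal arithmetic sketch in Python. It is not a reconstruction of Berg et al.’s actual protocol: the dollar figures are invented, and it assumes, purely for illustration, that the subject’s stated prices function as both buying and selling prices and that she always trades in accordance with her stated choices and prices.

```python
# Stylized money pump against a subject who exhibits the classic reversal:
# she chooses the P-bet over the $-bet, yet prices the $-bet higher.
# All numbers are hypothetical; stated prices are assumed to act as both
# buying and selling prices.

price_P_bet = 3.00       # her stated price for the P-bet
price_dollar_bet = 5.00  # her stated (higher) price for the $-bet

experimenter_cash = 0.0

holding = "$-bet"        # step 1: she starts out holding the $-bet

holding = "P-bet"        # step 2: she chooses the P-bet, so she swaps bets

experimenter_cash -= price_P_bet       # step 3: buy the P-bet back from her
holding = None                         #         at her own stated price

experimenter_cash += price_dollar_bet  # step 4: sell her the $-bet at her
holding = "$-bet"                      #         own (higher) stated price

# She ends up exactly where she began, holding the $-bet, but poorer.
print(f"Experimenter's gain (her loss) per cycle: {experimenter_cash:.2f}")  # 2.00
```

Each pass through the cycle costs the subject the difference between her two stated prices, which is why the persistence of reversals under this pressure is such striking evidence of their robustness.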
Instead of examining that hypothesis, some economists still tried to shield equilibrium theory from refutation. Thus Holt (Reference Holt1986) and Karni and Safra (Reference Karni and Safra1987) pointed out that the experimental results can be explained by a failure of the independence axiom, rather than by a failure of transitivity. Since the independence condition is not a part of ordinal utility theory and not as central to the theory of rationality, this was an encouraging result for some mainstream economists. With odd preferences for money and a strange function relating degrees of belief to objective probabilities, one can explain the experimental results as what Karni and Safra call “announced price reversals” that show no intransitivities. In a similar vein, Segal (Reference Segal1988) pointed out that the preference reversals in some of Grether and Plott’s experiments could be due to a failure of the reduction postulate,Footnote 4 which is an even less important part of the theory of rational choice. Since equilibrium theory presupposes only ordinal utility theory and does not rely on either the independence condition or the reduction postulate, these alternatives would save equilibrium theory from apparent disconfirmation.
But these ways of “saving” transitivity are implausible and do not account for the details of the data. The purported explanation of preference reversals in terms of a failure of the independence condition requires attributing to people, in an ad hoc way, bizarre preferences and subjective probability judgments, for which there is no independent evidence.Footnote 5 Furthermore, no single set of such beliefs and preferences can account for the whole series of choices subjects make in the experiments. The purported explanation in terms of a failure of the reduction postulate is just as ad hoc and, as noted by Tversky et al. (Reference Tversky, Slovic and Kahneman1990, p. 209), it cannot explain the asymmetry in preference reversals.Footnote 6 Tversky et al. also establish that a random mixture of P-bets and $-bets is not preferred to the P-bets and $-bets for sure, as it should be if there is a failure of independence (1990, p. 209). Furthermore, the explanation in terms of a failure of the reduction postulate is refuted by the result that the selling prices elicited by the Becker, deGroot, and Marschak mechanism do not differ significantly from the other valuations subjects make.
Although these alternative explanations for the preference reversal phenomenon are of interest mainly as evidence of how unwilling economists were to accept the disconfirmation or to take seriously psychological hypotheses, they were tested. In a 1989 essay (also published in the American Economic Review), Cox and Epstein report the results of tests of the explanations of preference reversals in terms of failures of the independence condition or of the reduction postulate. The paper begins with a misstatement of the preference reversal phenomenon: it is described simply as any inconsistency between the pricing and choice of $- and P-bets rather than pricing $-bets higher than chosen P-bets. The authors do note the point in a footnote (1989, p. 409), in which they mention that a referee pointed it out! The essay pays little attention to the details of the phenomenon or to the psychological hypothesis that predicts just these reversals.
To determine whether reversals might be due to failures of independence or of the reduction postulate, Cox and Epstein jettison the Becker–deGroot–Marschak elicitation mechanism. Instead, subjects were asked to price both of the gambles in a pair at the same time and were told that they would get to play the gamble with the higher price and would be paid a fixed amount for the gamble with the lower price (1989, p. 412). This experimental procedure is seriously faulty, for it makes pricing just a way of stating a choice. Cox and Epstein themselves conjecture “that most of our subjects realized that the particular numbers they stated for prices were irrelevant except for their relative magnitudes. This was evidenced by their comments and by their propensity to state prices such as 1,000 francs for lottery a and 999 francs for lottery b in any given (a, b) pair” (1989, p. 422).
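To make vivid why pricing becomes just a way of stating a choice under this scheme, it helps to contrast it with the Becker–deGroot–Marschak elicitation it replaced. The following Python sketch is only illustrative: the pay-off numbers and the uniform distribution of random offers are my assumptions, and the claim that truthful pricing is optimal under BDM holds for expected utility maximizers, which is precisely the assumption that Holt and Karni and Safra called into question.

```python
import random

def play_gamble(win_prob, payoff):
    """Resolve a simple two-outcome gamble: payoff with win_prob, else 0."""
    return payoff if random.random() < win_prob else 0.0

def bdm_payout(stated_price, gamble, max_offer=10.0):
    """Becker-deGroot-Marschak: a random buying offer is drawn; if it is at
    least the stated price the subject sells at the offer, otherwise she
    keeps and plays the gamble.  The absolute level of the stated price
    therefore affects what she takes home."""
    win_prob, payoff = gamble
    offer = random.uniform(0.0, max_offer)
    return offer if offer >= stated_price else play_gamble(win_prob, payoff)

def cox_epstein_payout(price_a, price_b, gamble_a, gamble_b, fixed_payment=2.0):
    """Scheme in the style Cox and Epstein describe: the higher-priced gamble
    is played and the other is exchanged for a fixed payment.  Only the
    ranking of the two stated prices affects the outcome."""
    if price_a >= price_b:
        return play_gamble(*gamble_a) + fixed_payment
    return play_gamble(*gamble_b) + fixed_payment

p_bet = (0.9, 4.0)        # hypothetical: high chance of a modest pay-off
dollar_bet = (0.3, 12.0)  # hypothetical: low chance of a large pay-off

# Under the Cox-Epstein-style scheme, stating (1000, 999) and stating (5, 1)
# induce exactly the same distribution of pay-offs; under BDM they do not.
print(cox_epstein_payout(1000, 999, p_bet, dollar_bet))
print(cox_epstein_payout(5, 1, p_bet, dollar_bet))
print(bdm_payout(3.5, p_bet))
```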
The procedure removes the central difference between the tasks of pricing and choosing that led Lichtenstein and Slovic to predict the reversals in the first place, and there is no reason to expect the phenomenon to present itself in these circumstances. Were Cox and Epstein’s procedure to show the same preference reversals, one would have grounds to doubt Lichtenstein and Slovic’s account of the source of the phenomenon.
Cox and Epstein do not find the standard (asymmetrical) preference reversal phenomena, and they conclude mistakenly that these results disconfirm Lichtenstein and Slovic’s work. Cox and Epstein write: “However, if the anchoring and adjustment theory is to be immunized to the apparent falsifying evidence of our experiments, it will have to be extended to incorporate more than a message space explanation of choice reversals” (1989, p. 422). What they mean by the “message space explanation of choice reversals” is an explanation in terms of whether bids rather than choices are elicited. Cox and Epstein’s conclusions are unpersuasive. What matters in Lichtenstein and Slovic’s view is the task subjects are asked to carry out, not the way the task is worded. Cox and Epstein find reversals in either direction about one-third of the time, but in the absence of any inquiry concerning how consistently the subjects otherwise choose, it is impossible to diagnose the causes of these reversals.
Although one sees in this history little evidence of a distinctively dogmatic view of theory appraisal, one does see insularity of a different sort. In particular, economists at that time showed little interest in the hypotheses psychologists formulated to explain this aberrant choice behavior. I suggest that what explains this lack of interest is the threat these hypotheses pose to the structure of theoretical economics.
The unwillingness to take seriously theoretical work by psychologists is ironic, because Lichtenstein and Slovic’s hypothesis concerning information processing can be modeled with the mathematical tools economists employ and combined with much that is standard in economic theory. Their views on information processing owe a great deal to the work of less orthodox economists such as Richard Cyert, James March, or Herbert Simon (Cyert and March Reference Cyert and March1963; Simon Reference Langley, Simon, Bradshaw and Zytkow1959). But to incorporate Lichtenstein and Slovic’s hypothesis within an economic model would be to move toward modeling economic behavior with many behavioral postulates rather than few and with behavioral postulates that apply only to a comparatively narrow range of phenomena. It calls on economists to surrender their vision of a single unifying mode of economic analysis.
After the failure of attempts such as Holt’s, Karni and Safra’s, and Segal’s to “save” the standard theory, decision theorists and theoretical economists in the late 1980s and 1990s began to countenance the possibility that preference reversals constitute a disconfirmation of ordinal utility theory, and some theorists proposed alternatives to account for the phenomenon. Loomes and Sugden (Reference Loomes and Sugden1983) argue that their revision of expected utility theory, “regret theory” (1982), can explain the preference reversal phenomenon. However, it later became clear that the regret theory explanation of the reversals fails.Footnote 7 Subsequently, Sugden (Reference Sugden2003) offered an explanation that depends on loss aversion in selling the $-bet relative to a reference level inflated by anchoring on the large pay-off in the $-bet. Machina (Reference Machina1987) suggests that formal choice models involving intransitive preferences can be formulated. These examples show that distinguished economists such as Sugden and Machina are willing to discard even such a central postulate of equilibrium theory as transitivity. Economists are more willing to jettison important parts of their theories than to abandon their theoretical strategy: these proposals for modifying utility theory still cling to the vision of a separate science of economics.
14.5 Preference Reversals and “Procedure Invariance”
Somewhat later, contributions by psychologists concerning preference reversals undermined these theories, for they suggest that preference reversals are not due to a failure of transitivity after all.Footnote 8 Recall that pricing the $-bet higher than the P-bet violates transitivity only if one assumes that, in pricing a bet, an individual is indifferent between the stated price and the bet. Failures of this assumption of procedure invariance, rather than failures of transitivity, might be responsible for preference reversals. In a 1990 paper, Tversky, Slovic, and Kahneman point out this fact and report the findings of experiments designed to discriminate intransitivity from failures of procedure invariance.
Suppose an individual is offered a choice among three alternatives: a P-bet, a $-bet, and some pay-off for certain, X, whose value lies between the subject’s stated values of the P-bet and the $-bet. Let $(P) and $($) represent the prices the individual would put on the P-bet and the $-bet, and assume in each case that the following preference orderings hold:

P-bet ≻ $-bet

$($) > X > $(P)
We know the orderings within the two rows, but we do not necessarily know where items in one row fit in the ordering in the other. Depending on how these are combined, preference reversals in the predicted direction can arise in four different ways (Tversky et al. Reference Tversky, Slovic and Kahneman1990, p. 206):
1. Intransitivity: given procedure invariance, individuals are indifferent between the P-bet and $(P) and between the $-bet and $($), and one has the intransitive ranking: P-bet ≻ $-bet ≻ X ≻ P-bet.
2. Overpricing the $-bet: subjects are indifferent between $(P) and the P-bet, and X is preferred to both of the bets. The consistent preference ordering is: X ≻ P-bet ≻ $-bet.
3. Underpricing the P-bet: subjects are indifferent between $($) and the $-bet, and both bets are preferred to X. The consistent preference ordering is: P-bet ≻ $-bet ≻ X.
4. Overpricing the $-bet and underpricing the P-bet: one consistent preference ordering is: P-bet ≻ X ≻ $-bet.Footnote 9
Procedure invariance requires one to rank the bet and its price equally, and it leads to intransitivity. If it is not assumed, then the bets can be placed in various places in the monetary ranking, and purely transitive preference orderings are possible.
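The diagnostic logic of Tversky et al.’s design can be put in a few lines of code. This Python sketch assumes that, for a subject who displays the predicted reversal (choosing the P-bet over the $-bet while setting $($) > X > $(P)), we also observe her direct choices between each bet and the sure amount X; the function name and data layout are mine, not Tversky et al.’s.

```python
def classify_reversal(prefers_P_to_X, prefers_dollar_to_X):
    """Classify a predicted preference reversal by the subject's choices
    between each bet and the sure amount X (cf. Tversky et al. 1990, p. 206)."""
    if not prefers_P_to_X and prefers_dollar_to_X:
        # X > P-bet and $-bet > X: together with P-bet > $-bet, a cycle.
        return "intransitivity"
    if not prefers_P_to_X and not prefers_dollar_to_X:
        # X preferred to both bets: the price $($) overstates the $-bet's value.
        return "overpricing the $-bet"
    if prefers_P_to_X and prefers_dollar_to_X:
        # Both bets preferred to X: the price $(P) understates the P-bet's value.
        return "underpricing the P-bet"
    # P-bet > X > $-bet: both over- and underpricing.
    return "over- and underpricing"

# Example: a subject who chooses X over the P-bet but the $-bet over X
print(classify_reversal(prefers_P_to_X=False, prefers_dollar_to_X=True))
# -> intransitivity
```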
As summarized in Table 14.1, the four cases above provide testable criteria for intransitivity, overpricing, underpricing, or both over- and underpricing as explanations of preference reversals. In a sizable study, Tversky et al. tested for the frequency of these four patterns; the observed frequencies appear beneath each heading in the table. It seems that failure of procedure invariance, in particular overpricing of the $-bet, is a much more important factor in preference reversals than is intransitivity.
Table 14.1 Sources of preference reversals
| Intransitivity | Overpricing $-bet | Underpricing P-bet | Over- and underpricing |
|---|---|---|---|
| 10% | 65% | 6% | 18% |
These data are compatible with Lichtenstein and Slovic’s original explanation for preference reversals. Agents pay more attention to pay-offs when they are pricing bets than when they are stating a preference. This explanation is superficial, and Tversky et al. offer a conjecture with a wider scope. They argue that human thinking is influenced by what they call “scale compatibility.” If asked to answer a question about quantities in a particular unit, people give a larger role to data expressed in the same units. Dollar amounts have a greater influence in the pricing task, because dollars are the units in which one prices. It follows that choices and the pricing of bets that involve nonmonetary prizes should be more consistent, as a study by Slovic et al. (Reference Slovic, Griffin, Tversky and Hogarth1990) has shown. If one examines rankings and the pricing of monetary options in which there is no explicit element of risk, similar reversals should be found. This implication is supported by the results of a second experiment by Tversky et al. (Reference Tversky, Slovic and Kahneman1990, pp. 212–14) in which subjects were asked to rank and to price options involving different time patterns of incomes. Many who prefer smaller short-run gains place a higher price on larger longer-run gains.
The implications for economics are disturbing. Tversky and Thaler conclude a summary article as follows:
The discussion of the meaning of preference and the status of value may be illuminated by the well-known exchange among three baseball umpires. “I call them as I see them,” said the first. “I call them as they are,” claimed the second. The third disagreed, “They ain’t nothing till I call them.” Analogously, we can describe three different views regarding the nature of values. First, values exist – like body temperature – and people perceive and report them as best they can, possibly with bias (I call them as I see them). Second, people know their values and preferences directly – as they know the multiplication table (I call them as they are). Third, values or preferences are commonly constructed in the process of elicitation (they ain’t nothing till I call them). The research reviewed in this article is most compatible with the third view of preference as a constructive, context-dependent process.
This context-dependence has serious implications, for, as Tversky et al. conclude:
These developments highlight the discrepancy between the normative and the descriptive approaches to decision-making, which many choice theorists (see Mark Machina Reference Machina1987) have tried to reconcile. Because invariance – unlike independence or even transitivity – is normatively unassailable and descriptively incorrect, it does not seem possible to construct a theory of choice that is both normatively acceptable and descriptively adequate.
What seems to be required is the sort of theorizing that has traditionally been repugnant to economic theorists. The pragmatic preference for a theory of choice that is also a theory of rational choice will have to be abandoned. The hope of a unitary account of all economic choice behavior vanishes. There also seems to be a case here for the sort of subjectivist perspective that contemporary Austrian economists defend. For, in eliciting preferences, we must attend to how agents interpret our actions and questions (Schick Reference Schick1987). The independence between belief and preference that is fundamental to standard decision theory is cast into doubt.
Contrast these disturbing implications with Machina’s (Reference Machina1987) discussion of preference reversal. Although that essay takes the phenomenon seriously, holds out no hope of making it disappear, and indeed seems to urge economists to consider what sort of influence such anomalies may have in real market behavior (1987, p. 140),Footnote 10 Machina’s theoretical prescription is to construct a still more general formal theory of utility maximization that permits intransitivities. Machina apparently does not regard piecemeal theorizing that relies on substantive generalizations with limited applicability as worth considering. But such theorizing seems to be needed.
14.6 Current Thoughts on Preference Reversals
Over the past generation, economists and psychologists have not published much on preference reversals, but the chasm between them has disappeared (Starmer Reference Starmer, Durlauf and Blume2008). What one finds are new ways to investigate the phenomenon experimentally by both psychologists and economists as well as refinements of previously proposed explanations. Examples of new experimental findings are Bleichrodt and Pinto Prades (Reference Bleichrodt and Pinto Prades2009), Kim et al. (Reference Kim, Seligman and Kable2012), and Alós-Ferrer et al. (Reference Alós-Ferrer, Granić, Kern and Wagner2016).
Bleichrodt and Pinto Prades find an odd reversal that may be peculiar to choices involving a possibility of death and that may not be related to the preference reversals between ranking and pricing. Consider the following health state, H, which is one possible outcome of having had a stroke:
As a consequence of the health problem the patient is unable to live independently. He is unable to travel alone or shop without help if he did these things previously; and he is unable to look after himself at home for some reason (for example he may not be able to prepare a meal, do household chores, or look after money). He can attend to his bodily needs (such as washing, going to the toilet, and eating) without problems.
Many experimental subjects who prefer health state H to death (as most do) nevertheless prefer a treatment for strokes that has a 75 percent chance of cure and a 25 percent chance of death to a treatment that has a 75 percent chance of cure and a 25 percent chance of their winding up in state H. It is not clear how this reversal should be understood, but it certainly complicates the task of determining the treatment priorities of patients.
In contrast, both Kim et al. (Reference Kim, Seligman and Kable2012) and Alós-Ferrer et al. (Reference Alós-Ferrer, Granić, Kern and Wagner2016) are concerned with the asymmetric preference reversals identified by Lichtenstein and Slovic. Kim et al. (a group of psychologists) provide additional evidence for Tversky’s compatibility hypothesis by measuring where the visual attention of subjects is directed when they are engaged in different tasks. When choosing, subjects visually fixate on the probabilities; when bidding, they fixate on the pay-offs. Alós-Ferrer et al., in contrast, measure how long it takes individuals to express a preference or to announce a price. By studying decision times, which are indicators of cognitive difficulty, the authors make the case that both predicted and unpredicted reversals involve greater cognitive difficulties. They hypothesize that reversals are due to two factors: “noisy” lottery evaluations – that is, imprecise preferences – give rise to the reversals, while overpricing (which in turn can be explained by Tversky’s compatibility hypothesis) explains the asymmetry in the reversals, whereby reversals are much more frequent among those who preferred P-bets. When Alós-Ferrer et al. substituted a nonmonetary ranking method for pricing, the asymmetry disappeared. (Indeed, the reversals among those who chose the $-bet were now somewhat more frequent than reversals among those who chose the P-bet.) Alós-Ferrer et al. sum up as follows:
Given the fundamental importance of preference elicitation methods for both decision theory and applied economics, and the amount of attention dedicated to the preference reversal phenomenon in the last half century, we believe that fleshing out these mechanisms is an important step. At the same time, we show that a parsimonious combination of insights from the literature with standard facts on decision times can account for received evidence and provide new, testable hypotheses allowing us to better understand the determinants of the preference reversal phenomenon.
This attitude toward the preference reversal phenomena bears little resemblance to the efforts in the 1970s and 1980s to make the phenomena go away. Instead of treating preference reversals as something from which economics needs to be protected, Alós-Ferrer et al. regard preference reversals as phenomena that economists should attempt to understand.
The general complaisance with which most economists regard the claims of equilibrium theory has not disappeared, but economists have become more willing to take seriously relevant psychological hypotheses. The attractions of a separate science run deep, but there is no justification for insisting on such a structure, and doing so has in the past created unreasonable barriers to theoretical and empirical progress.
Christian Seidl concludes his survey of work on preference reversals with the following prediction (2002, pp. 646–7):
A plethora of empirical phenomena, so far hardly ever noticed by the economics profession, will become centerpieces of applied economic research: Anchoring moves individuals’ values and preferences in the direction marked by the anchor [cf., e.g., Slovic (Reference Slovic1972); Tversky and Kahnemann (Reference Kahnemann1974); Slovic et al. (Reference Slovic, Fischhoff, Lichtenstein., Jungermann and de Zeeuw1977, p. 16); Edwards and von Winterfeldt (Reference Edwards and von Winterfeldt1986, p. 247); Northcraft and Neale (Reference Northcraft and Neale1987); Kahnemann (Reference Kahnemann1992)]. The background contrast effect purports that an alternative appears attractive on the background of less attractive alternatives and unattractive on the background of more attractive alternatives [Simonson and Tversky (Reference Simonson and Tversky.1992); Tversky and Simonson (Reference Tversky and Simonson1993)]. The tradeoff contrast effect means that the relative scarcity of attributes of choice alternatives influences the weighting of an option’s attributes for subsequently presented alternatives [Tversky and Simonson (Reference Tversky and Simonson1993, p. 1181)]. The asymmetric dominance effect notes that the presentation of a choice alternative Z, which is dominated by X, but not by Y, shifts preferences in favor of X [Huber et al. (Reference Huber, Payne and Puto1982); Huber and Puto (Reference Huber and Puto1983); Tyszka (Reference Tyszka, Sjöberg, Tyszka and Wise1983); Ratneshwar et al. (Reference Ratneshwar, Shocker and Stewart.1987); Wedell (Reference Wedell1991)]. The endowment effect observes that people demand more to give up an object than they are willing to pay to acquire it, which causes differences in willingness-to-accept and willingness-to-pay on the one hand, and nonreversibility of indifference curves on the other [Thaler (Reference Thaler1980); Knetsch (Reference Knetsch1989; 1992); Kahneman et al. (Reference Kahneman, Knetsch and Thaler1990; 1991)]. The availability bias means that subjects judge the probability of events by the ease of getting information.
I cannot vouch for the accuracy of Seidl’s prediction, but I agree with him on the direction of change.