“Causation is a topic of perennial philosophical concern” (Hitchcock Reference Hitchcock1996: 267). The way nature operates is via causation: the processes unfolding around us are causal processes, with earlier processes linked to later ones by causal relationships (Beebee Reference Beebee2006). Although words related to causation pervade our everyday conversations, natural scientists are more cautious in using such vocabulary. Judea Pearl, a key founder of the causal modeling approach discussed later in this chapter, laments such conservatism in scientific research:
The word cause is not in the vocabulary of standard probability theory. It is an embarrassing yet inescapable fact that probability theory, the official mathematical language of many empirical sciences, does not permit us to express sentences such as “Mud does not cause rain”; all we can say is that two events are mutually correlated, or dependent − meaning that if we find one, we can expect to encounter the other. Scientists seeking causal explanations for complex phenomena or rationales for policy decisions must therefore supplement the language of probability with a vocabulary for causality, one in which the symbolic representation for the causal relationship “Mud does not cause rain” is distinct from the symbolic representation for “Mud is independent of rain.” Oddly, such distinctions have not yet been incorporated into standard scientific analysis.
In contrast to their natural science colleagues, social scientists seem to be more cognizant of the fact that knowledge of causation affects their understanding of the social world (Gerring Reference Gerring2008). In particular, management researchers do not shy away from using causal language. For instance, case studies enable researchers to tease out ever-deepening layers of reality when searching for mechanisms and contingencies and to peer into the box of causality when identifying the factors connecting some critical cause with its purported effect (Gerring Reference Gerring, Boix and Stokes2007). More specifically, a longitudinal case design allows researchers to collect data about how events of interest unfolded over time and thus provide stronger evidence for proposed causal relationships than a cross-sectional design would allow. In other words, a main objective of case studies is to figure out the causes of events.
Explanation and causation are intimately related. To explain an event is to cite a cause of the event (Hausman Reference Hausman1998) and the event “stands at the end of a long and complicated causal history” (Lewis Reference Lewis1986: 214). Explanation involves causation but not vice versa; we may observe a causal process unfolding without any intention to explain it. Explanation is epistemological and causation is metaphysical. Causation is objective in that it is a relationship between events out there. Many causal relationships would exist even if no one observed or thought of them. In contrast, explanation is a human activity affected by human interests. “The intimate bond between causation and explanation threatens the objectivity of causation” (Hausman Reference Hausman1998: 7). Before I present the major modes of explanation in Chapter 3, I will discuss here as a backdrop the concept of causation.
Regularity Theory of Causation
Although David Hume, one of the best-known scholars in Western philosophy, developed his concept of causation more than 200 years ago, its influence can be felt even in modern-day academic research. Mackie (Reference Mackie1974: 3) considers that Hume made “the most significant and influential single contribution to the theory of causation.” Hume is traditionally credited with creating the regularity theory of causation, according to which the causal relationship between two events consists merely in the fact that events of one kind are always followed by events of another kind.
Necessary Connection
Hume’s argument begins with his favorite everyday case that clearly shows cause and effect − colliding billiard balls. Suppose we observe a red ball rolling toward a blue ball and the red ball coming into contact with the blue ball. Then we see the blue ball rolling away from the spot where it was struck. Of course, we also hear a noise when the balls come into contact. Do we see a connection or tie between the two events (i.e., the collision of the balls and the ensuing motion of the blue ball)? Hume’s answer is a resounding “no.” He generalizes from the billiard ball case that no individual case of causation involving objects that we perceive by our senses will provide any impression of necessary connection. To put it in his words:
When we look about us toward external objects, and consider the operation of causes, we are never able, in a single instance, to discover any power or necessary connexion; any quality, which binds the effect to the cause, and renders the one an infallible consequence of the other. We only find, that the one does actually, in fact, follow the other. The impulse of one billiard-ball is attended with motion in the second. This is the whole that appears to the outward senses. The mind feels no sentiment or inward impression from this succession of objects: Consequently, there is not, in any single, particular instance of cause and effect, any thing which can suggest the idea of power or necessary connexion.
In other words, the idea of necessary connection cannot be derived from observing any individual pair of events in the physical world and so must be derived from an internal impression:
This, therefore, is the essence of necessity. Upon the whole, necessity is something that exists in the mind, not in objects; nor is it possible for us ever to form the most distant idea of it, considered as a quality in bodies. Either we have no idea of necessity, or necessity is nothing but that determination of the thought to pass from causes to effects, and from effects to causes, according to their experienc’d union.
According to Hume, all simple ideas are copies of impressions. When we exercise our wills, we have an idea of power derived from an impression of power that we have. For example, if we force ourselves to lift a heavy object, we form, by introspection, an “impression” of power, which leads to our awareness of the power. Could this idea of power be what we have in mind when we assert that one billiard ball exerts power on another, or that there is a necessary connection between the collision and the movement of billiard balls? Definitely not, because a billiard ball is a material object and cannot have an impression of power similar to the one we have in voluntary action. The same argument applies to other material objects that enter into causal relationships. Therefore, even if we have an idea of power derived from human volition, this idea does not enable us to understand causation in material objects. Hume assumes that any idea of power or necessary connection between events worth taking seriously must be based on a deductive inference from one event to another; otherwise the idea is “vulgar” and “inaccurate” (Dicker Reference Dicker1998). He therefore arrives at the conclusion that the idea is nonexistent in this sense. The following passage summarizes his reasoning:
All events seem entirely loose and separate. One event follows another; but we never can observe any tie between them. They seem conjoined, but never connected. And as we can have no idea of any thing which never appeared to our outward sense or inward sentiment, the necessary conclusion seems to be, that we have no idea of connexion or power at all, and that these words are absolutely without any meaning, when employed either in philosophical reasonings, or common life.
After establishing that our idea of necessary connection or power is derived from an internal impression, Hume examines how we infer, from the occurrence of one event, that some other event will occur. It is only after we repeatedly experience events of kind A being followed by events of kind B that we begin to inductively infer, upon observing an event of kind A, that an event of kind B will follow. Consequently, we come to think there is some necessary connection between the two kinds of events, calling event A the “cause” and event B the “effect.” The idea of necessary connection cannot, however, represent any mind-independent relationship between causes and effects. Hume (Reference Hume, Norton and Norton2007: 61) uses the example of flame and heat to illustrate his point:
We remember to have had frequent instances of the existence of one species of objects; and also remember, that the individuals of another species of objects have always attended them, and have existed in a regular order of contiguity and succession with regard to them. Thus we remember to have seen that species of object we call flame, and to have felt that species of sensation we call heat. We likewise call to mind their constant conjunction in all past instances. Without any farther ceremony, we call the one cause and the other effect, and infer the existence of the one from that of the other.
In other words, the idea of necessary connection arises from the experience of constant conjunction through observing many similar pairs of events rather than any individual pairs. If, whenever we observe an event like the first member of the pair, an event like the second member follows, we develop a feeling of expectation or anticipation that is in our minds rather than in the events themselves. This feeling is “the only new ingredient added by having the experience of constant conjunction” (Dicker Reference Dicker1998: 107) and is the impression of necessary connection. This impression arises simply from the psychological principle of human nature, which Hume calls custom or habit. Once we have acquired the habit of inferring events B from events A, we come to judge that events A are causes of events B. Then events A and events B no longer seem entirely loose and separate (Beebee Reference Beebee2006).
Why do we have a notion of some necessary connection between events themselves if the necessary connection is just a feeling in our minds? To answer this question, Hume argues that “we project our own feeling of expectation or anticipation outward into the observed events, and thereby mistakenly come to think that we are aware of a necessary connection” (Dicker Reference Dicker1998: 107–108). In the words of Hume (Reference Hume, Norton and Norton2007: 112–113):
the mind has a great propensity to spread itself on external objects, and to conjoin with them any internal impressions … the same propensity is the reason, why we suppose necessity and power to lie in the objects we consider, not in our mind, that considers them.
Definitions of Causation
Hume offers two definitions of causation that have subsequently led to much debate among philosophers as to how they should be consistently interpreted. The first definition is as follows:
Similar objects are always conjoined with similar. Of this we have experience. Suitably to this experience, therefore, we may define a cause to be an object, followed by another, and where all the objects similar to the first are followed by objects similar to the second. Or in other words, where, if the first object had not been, the second had never existed.
The second definition goes this way:
The appearance of a cause always conveys the mind, by a customary transition, to the idea of the effect. Of this also we have experience. We may, therefore, suitably to this experience, form another definition of cause, and call it, an object followed by another, and whose appearance always conveys the thought to that other.
Hume uses the example of vibration and sound to illustrate both definitions:
We say, for instance, that the vibration of this string is the cause of this particular sound. But what do we mean by that affirmation? We either mean, that this vibration is followed by this sound, and that all similar vibrations have been followed by similar sounds: Or, that this vibration is followed by this sound, and that, upon the appearance of the one the mind anticipates the senses, and forms immediately an idea of the other. We may consider the relation of cause and effect in either of these two lights; but beyond these, we have no idea of it.
Hume’s definitions are written rather loosely. For example, “it is more accurate to regard causes and effects as events than as objects” (Dicker Reference Dicker1998: 112). When we observe that a red billiard ball – one object – hits a blue billiard ball – another object – and causes the blue ball to move, the cause is not just the red ball as such, but its collision with the blue ball, which is an event.
The two different definitions have led to controversy concerning Hume’s intentions, such as whether he had two different theories of causation (Beauchamp and Rosenberg Reference Beauchamp and Rosenberg1981). It is beyond the scope of this book to discuss this controversy. Suffice it to say that the first definition is concerned with causation occurring objectively in nature, regardless of whether there are any people observing, while the second definition refers to the triggering of expectations through observation of the cause’s occurrence. Stroud (Reference Stroud1977: 90) describes the relationship between the two definitions:
Any events or objects observed to fulfil the conditions of the first “definition” are such that they will fulfil the conditions of the second “definition” also. That is to say that an observed constant conjunction between As and Bs establishes a “union in the imagination” such that the thought of an A naturally leads the mind to the thought of a B. That is just a fundamental, but contingent, principle of the human mind.
The first definition makes no reference to necessary connection between cause and effect because necessary connection is just the feeling of expectation mentioned in the second definition. Instead, the first definition involves only “constant conjunction” – one type of event being always followed by another type of event − and lays the foundation of the regularity theory of causation.
Causal Relationships versus Accidental Regularities
Mackie (Reference Mackie1974: 196) argues that the problem of distinguishing causal from accidental regularities “is the great difficulty for any regularity theory of causation.” The most common objection to the theory is that it cannot distinguish between genuine causal relationships − or what Lewis (Reference Lewis1973: 556) calls “causal laws” − and regular but non-causal relationships (Dicker Reference Dicker1998). For the former, suppose that last year John reached the retirement age of the company that he worked for − Company X − and so started receiving the pension provided by that company. All retired employees of Company X have been receiving the pension. When an employee reaches the retirement age, other employees expect that person to receive the pension. According to the two definitions of causation discussed above, reaching the retirement age in Company X causes pension payouts, as embodied by Statement (a) “Whenever an employee of Company X reaches the retirement age, they start receiving a pension.” As to accidental regularities, consider Companies A and B, whose fiscal years end on September 30 and December 31, respectively. Every year, therefore, Statement (b) “Company A’s annual financial reports are followed by B’s” holds. After seeing A’s reports, one expects to see B’s. The same regularity holds for any pair of companies whose fiscal year-ends fall on September 30 and December 31, respectively. Again, according to the two definitions of causation, one may conclude that A’s financial reports cause B’s. This time, of course, the causal inference is flawed.
The main difference between causal relationships and accidental regularities is that the former do, but the latter do not, support counterfactuals (Beebee Reference Beebee2006). A counterfactual statement says that if something that did not in fact happen had − counter or contrary to the fact − happened, then something else would have happened. To illustrate how causal relationships support counterfactuals, we return to the above example of John’s pension. Suppose that John in fact did not reach the retirement age last year. The causal relationship captured in Statement (a) supports the corresponding counterfactual in the sense that we can infer from Statement (a) that if John had reached the retirement age last year, he would have started receiving his pension. The same is not true of accidental regularities. Suppose that in the current year, Company A changed its fiscal year-end so that it no longer fell on September 30. We cannot infer from Statement (b) that if Company A’s fiscal year had ended on September 30 in the current year, its annual financial reports would have been followed by B’s. Indeed, if Company A’s new fiscal year-end is June 30, its financial reports are still followed by B’s; the consequent holds regardless of whether the antecedent does, so the regularity involves no genuine counterfactual dependence. In brief, Statement (b) does not support the counterfactual. We use genuine causal relationships, not accidental regularities, as a basis for prediction and counterfactual reasoning. Some philosophers argue that Statement (a) possesses a special necessity that Statement (b) lacks. This difference shows that the problem faced by the regularity theory − to distinguish between causal relationships and accidental regularities − is insuperable (Dicker Reference Dicker1998).
Necessary and Sufficient Conditions
Immediately after giving the first definition, Hume (Reference Hume and Beauchamp1999: 146) adds a remark: “Or in other words, where, if the first object had not been, the second had never existed.” The remark is puzzling in that it is different from, and cannot be implied by, the definition. Although Hume makes the remark only once, some philosophers do not dismiss it as a careless slip because they deem that “an adequate analysis of causation should imply that a cause is not just a sufficient condition for its effect, but also a necessary condition for its effect” (Dicker Reference Dicker1998: 125). Returning to the example of John’s pension, reaching the retirement age caused the pension payouts; that is, reaching the retirement age is a sufficient condition for receiving the pension. But this does not imply that it is also a necessary condition − that if John had not reached the retirement age last year, he would not have received the pension. The remark plays the role of specifying the necessary condition.
The expanded definition − that is, the first definition plus the remark − therefore specifies both the sufficient and necessary conditions for the effect to occur. However, a difficulty arises because in this case, if the cause occurs, the effect occurs and if the effect occurs, the cause occurs. In other words, the relationship between cause and effect is perfectly symmetrical and we can no longer distinguish between cause and effect. Yet it is well known that a causal relationship is asymmetrical: reaching the retirement age causes pension payouts but receiving pensions does not cause an employee to reach the retirement age. Hume’s requirement that the cause must occur before the effect in time offers one way to deal with the difficulty; one must reach the retirement age before receiving pensions. This temporal condition would restore the asymmetry of the causal relationship.
As Dicker (Reference Dicker1998: 128) points out, “the idea that a cause is a necessary condition for its effect is not wholly accurate. Rather, a cause is necessary for its effect only on the assumption that no other cause of that effect is operative.” For example, it may not be accurate to hold that the statement “Reaching the retirement age caused pension payouts” implies that if an employee had not reached the retirement age, they would not have received pensions. The statement in fact implies that if the employee had not reached the retirement age and nothing else could enable them to receive pensions, then they would not have the pensions. Note that it is rather common for companies to offer employees the option of early retirement after serving for a certain number of years, and early retirement also enables them to receive pensions.
The complexity of necessary and sufficient conditions was highlighted by John Stuart Mill, another prominent philosopher in the English-speaking world after Hume:
It is not true, then, that one effect must be connected with only one cause, or assemblage of conditions; that each phenomenon can be produced only in one way. There are often several independent modes in which the same phenomenon could have originated. One fact may be the consequent in several invariable sequences; it may follow, with equal uniformity, any one of several antecedents, or collections of antecedents. Many causes may produce mechanical motion: many causes may produce some kinds of sensation: many causes may produce death. A given effect may really be produced by a certain cause, and yet be perfectly capable of being produced without it.
The gist of the above passage is that individual causal factors are neither necessary nor sufficient. Rather, they constitute an overall combination that is sufficient for the outcome and alternative combinations are possible.
Mackie (Reference Mackie1974) systematically develops Mill’s idea and argues that a cause is at least an INUS condition for the effect: an insufficient but nonredundant part of a condition that is itself unnecessary but sufficient for the occurrence of the effect. Bennett (Reference Bennett1988) simplifies the term to NS conditions − necessary parts of sufficient conditions. To illustrate an INUS condition, which offers some insights for management research, we use the example of Samsung’s relocation of its display production from China to Southern Vietnam discussed in Chapter 1. Let us assume that the cause given by the Vietnamese state-run newspaper − Vietnam being an important gateway to other Southeast Asian countries and a link in Samsung’s global supply chain − is genuine. Suppose further that this particular relocation decision was triggered by the Covid-19 pandemic, which revealed the risk of concentrating production activities in one host country (i.e., China), and that there were other causes such as low land cost and an abundant labor supply in Southern Vietnam. Together these causes constitute an unnecessary but sufficient condition for Samsung’s decision. The condition is sufficient given the fact that Samsung made the decision, but it is unnecessary because other causes could have led to the same decision − for instance, a political conflict arising between South Korea and China and Samsung wanting to hedge against its political risk in China. Within the current set of causes, the cause cited by the Vietnamese newspaper is insufficient because it cannot by itself account for Samsung’s decision. However, the cause is also nonredundant because without it, Samsung would not have considered moving to Vietnam; other Southeast Asian countries, such as Indonesia, have low land costs and abundant labor supply but are less well located than Vietnam. Hence, the cause is an INUS condition for Samsung’s relocation.
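The logical shape of an INUS condition can be made concrete with a small boolean sketch. The factor names below are illustrative labels inspired by the Samsung example, not anything from the original case; the combinations are likewise assumptions made for illustration only.

```python
# A toy boolean model of Mackie's INUS condition. Factor names are
# hypothetical labels; the point is the logical structure.

def relocation(gateway, pandemic_risk, low_cost_labor, political_conflict):
    """The effect occurs if either (assumed) sufficient combination holds:
    combination 1: gateway AND pandemic_risk AND low_cost_labor;
    combination 2: political_conflict (an alternative route to the effect).
    """
    return (gateway and pandemic_risk and low_cost_labor) or political_conflict

# 'gateway' alone is Insufficient:
assert relocation(True, False, False, False) is False
# ...but Nonredundant within its combination (drop it and the effect fails):
assert relocation(False, True, True, False) is False
# The combination itself is Sufficient:
assert relocation(True, True, True, False) is True
# ...but Unnecessary, since another route produces the same effect:
assert relocation(False, False, False, True) is True
```

Each assertion checks one letter of INUS: the cited cause is insufficient alone and nonredundant within its combination, while the combination is sufficient but unnecessary.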
A Counterfactual Analysis of Causation
Collins et al. (Reference Collins, Hall, Paul, Collins, Hall and Paul2004: 3) claim that “counterfactuals are fundamental to any philosophical understanding of causation.” Referring to Hume’s (Reference Hume and Beauchamp1999: 146) abovementioned remark: “Or in other words, where, if the first object had not been, the second had never existed,” Lewis (Reference Lewis1973: 557) argues that the remark is a proposal for “a counterfactual analysis of causation,” and he is the principal advocate of such an analysis. He rephrases Hume’s remark with a caveat indicating the difficulty of his task:
We think of a cause as something that makes a difference, and the difference it makes must be a difference from what would have happened without it. Had it been absent, its effects − some of them, at least, and usually all − would have been absent as well. Yet it is one thing to mention these platitudes now and again, and another thing to rest an analysis on them. That has not seemed worth while. We have learned all too well that counterfactuals are ill understood, wherefore it did not seem that much understanding could be gained by using them to analyze causation or anything else.
Possible Worlds
To deal with the difficulty concerning counterfactuals, Lewis brings in the concept of “possible worlds.” The core idea is that in the world in which we live, things need not have been as they are and might have been different in countless ways. History, since the Big Bang, could have unfolded in a way different from what it did. In short, the actual world is only one among many possible worlds. Lewis (Reference Lewis1973) assumes that possible worlds can be ordered with respect to their similarity to the actual world.
Given the complexity of Lewis’s arguments and subsequent developments and debates among other scholars, here I follow Hausman’s (Reference Hausman1998: 112) exposition because of its clarity and conciseness. Lewis (Reference Lewis1973) specifies that his analysis applies to particular events only and not general phenomena. For two distinct events, C and E, E is said to be counterfactually dependent on C if and only if both of the following counterfactual statements are true:
(1) If C were to occur, then E would occur.
(2) If C were not to occur, then E would not occur.
If both C and E occur, the first statement is automatically true because the closest possible world in which C occurs is the actual world and in that world E also occurs. As to the second counterfactual statement, we consider possible worlds in which C does not occur. Given that these possible worlds can be ordered with respect to their similarity to the actual world, some of them will be more similar to the actual world than others. The statement is true if a possible world without C (a “non-C possible world”) in which E does not occur is more similar to the actual world than any other non-C possible world in which E occurs. This counterfactual argument lays the foundation for understanding causation.
Let’s return once more to the example of John’s pension, assuming for now that John’s company did not offer the option of receiving a pension upon early retirement. In the actual world, John reached the retirement age and received his pension, satisfying the first counterfactual statement. Among possible worlds in which John did not reach the retirement age, one where he did not receive the pension is more similar to the actual world than the others, satisfying the second counterfactual statement. That is to say, receiving the pension is counterfactually dependent on reaching the retirement age. Lewis (Reference Lewis1979) argues that the non-C possible world most similar to the actual world should have exactly the same history as the actual world until shortly before the time when C occurs in the actual world, with the necessary adjustments that lead to C’s non-occurrence. Now suppose that in one of the possible worlds in which John did not reach the retirement age, he took early retirement and so received his pension. This possible world is less similar to the actual world than one in which he did not receive his pension, because by our assumption John’s company in the actual world did not offer the early retirement option.
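As a rough illustration (not Lewis’s actual formalism), the truth condition for the second counterfactual in the pension example can be sketched as a toy model. Everything below is an assumption made for the sketch: worlds assign truth values to C (John reaching the retirement age) and E (John receiving the pension), similarity crudely counts differences from the actual world, and worlds violating the assumed company rule “pension only upon retirement” are penalized as far less similar.

```python
from itertools import product

# Toy sketch of evaluating "If C were not to occur, E would not occur."
# The similarity measure and the rule penalty are illustrative assumptions.

ACTUAL = {"C": True, "E": True}

def distance(world):
    # Crude similarity: count assignments that differ from the actual world,
    # with a heavy penalty for worlds violating the assumed rule
    # "pension (E) only upon retirement (C)".
    d = sum(world[k] != ACTUAL[k] for k in ACTUAL)
    if world["E"] and not world["C"]:
        d += 10
    return d

WORLDS = [dict(zip("CE", vals)) for vals in product([True, False], repeat=2)]

def if_not_c_then_not_e(worlds):
    """Lewis-style condition: true iff some ~C world without E is closer
    to the actual world than every ~C world with E."""
    non_c = [w for w in worlds if not w["C"]]
    d_not_e = [distance(w) for w in non_c if not w["E"]]
    d_e = [distance(w) for w in non_c if w["E"]]
    return bool(d_not_e) and (not d_e or min(d_not_e) < min(d_e))
```

With this ordering, the closest world without C is also one without E, so `if_not_c_then_not_e(WORLDS)` comes out true and E is counterfactually dependent on C in the toy model.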
Symmetrical Overdetermination
Needless to say, Lewis’s counterfactual approach is not without problems. Consider the well-known example of window-shattering, in which a rock is thrown at a window and the window is broken. Saying that the striking of the window caused the shattering of the window is the same as saying that if the window had not been struck, it would not have shattered. In other words, the shattering was counterfactually dependent on the striking. Suppose Tom and Mary both threw rocks at a window at the same time with exactly the same force. The window shattered. Moreover, each rock was thrown with sufficient force to shatter the window all by itself. Intuitively speaking, both Tom’s and Mary’s throws were causes of the shattering. Yet a counterfactual analysis says otherwise. If Tom had not thrown his rock, the window would still be shattered (by Mary’s rock); the same applies to Mary’s throw. Therefore, the shattering was not counterfactually dependent on either Tom’s or Mary’s throw; neither throw was a cause of the shattering. This example shows that Lewis’s analysis breaks down in the case of symmetrical overdetermination of an effect (Collins et al. Reference Collins, Hall, Paul, Collins, Hall and Paul2004).
Cases of symmetrical overdetermination are not rare in business. Let’s continue with the example of John’s pension. Suppose that when John joined his current employer decades ago, he decided that if he continued to work there, he would, once he was eligible, take the company’s early retirement option, which allowed employees to receive their pensions after serving the company for at least thirty years. At the time of making his decision, John was thirty years old and the company’s mandatory retirement age was sixty-five. In other words, he planned to retire at sixty, not sixty-five. Suppose further that not long after his joining the company, the mandatory retirement age was changed to sixty. Last year John reached sixty. His pension payouts were overdetermined in the sense that either the mandatory age or his plan of early retirement would have caused it. Yet, the payouts were not counterfactually dependent on either.
A more straightforward example comes from finance. Suppose a mutual fund manager programmed a sell instruction on a particular stock in her portfolio such that if the price of the stock fell during the day to $90 or by 10 percent of the opening price, 15 percent of the stock would be sold immediately. Then on a particular day the opening price of the stock was $100. After about an hour, it fell to $90 (or, equivalently, by 10 percent) and so triggered the sell instruction. The sale was caused by either of the two conditions of the sell instruction but was counterfactually dependent on neither.
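The overdetermination is easy to see once the instruction is written out. The following is a minimal sketch under the stated assumptions ($90 floor, 10 percent drawdown, $100 open); the function name and the on/off flags are hypothetical devices for “switching off” one condition counterfactually.

```python
# Hypothetical sketch of the sell instruction from the example; the
# flags let us counterfactually remove one condition at a time.

def sell_triggered(price, opening, use_floor=True, use_drawdown=True):
    hit_floor = use_floor and price <= 90.0                  # condition 1: $90 floor
    hit_drawdown = use_drawdown and price <= opening * 0.90  # condition 2: 10% drop
    return hit_floor or hit_drawdown

# With a $100 open, both conditions fire at exactly $90:
assert sell_triggered(90.0, 100.0) is True
# Remove either condition and the sale still occurs, so the sale is
# counterfactually dependent on neither:
assert sell_triggered(90.0, 100.0, use_floor=False) is True
assert sell_triggered(90.0, 100.0, use_drawdown=False) is True
```

Each condition alone suffices to trigger the sale, mirroring the two simultaneous rocks in the window-shattering case.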
Backtracking
The above cases of symmetrical overdetermination show that counterfactual dependence is not necessary for causation. A further difficulty is that counterfactual statements are vague, at least with respect to the issue of backtracking. Lewis (Reference Lewis1979: 456) borrows this example from Downing (Reference Downing1959):
Jim and Jack quarreled yesterday, and Jack is still hopping mad. We conclude that if Jim asked Jack for help today, Jack would not help him. But wait: Jim is a prideful fellow. He never would ask for help after such a quarrel; if Jim were to ask Jack for help today, there would have to have been no quarrel yesterday. In that case Jack would be his usual generous self. So if Jim asked Jack for help today, Jack would help him after all.
Hence, there are two different interpretations of the counterfactual statement “If Jim were to ask Jack for help, Jack would help him.” According to the first interpretation stated in the above passage, the statement is false because Jack is in no mood to be helping Jim given that he is still hopping mad. On the other hand, the second interpretation views the statement as obviously true: Jim would not ask Jack for help unless there had been no quarrel between them and if there had been no quarrel, Jack would be generous in offering his assistance. The first interpretation is non-backtracking and the second is backtracking. Heller (Reference Heller1985: 77) makes a distinction that “a non-backtracking counterfactual is concerned with what the result would be of a certain antecedent’s being true in a situation similar to the actual situation” whereas a backtracking counterfactual “takes into account how the world would have to have been different in order for the antecedent to get to be true.”
Returning to the example of Jim and Jack, the antecedent in question is Jim’s asking Jack for help. The non-backtracking interpretation considers what the result would be if the antecedent were true in the possible worlds closest to the actual world, in which Jack is still angry about the quarrel. Therefore, the result of Jim seeking help is that Jack would not help. In contrast, the backtracking interpretation considers the closest worlds in which Jim asks Jack for help to be those in which there has been no quarrel (given that Jim is a prideful fellow and would never ask Jack for help after such a quarrel). In all of these worlds Jack is not angry and so he would help Jim. Here the focus is on the result of the antecedent being true in a situation where there has been no quarrel because, otherwise, the antecedent would not be true.
In discussing causal analysis of singular events in history, Reiss (Reference Reiss2009: 713) examines the following three counterfactual claims related to historical events:
Had the Greeks not won against the Persians at Salamis, Western civilization would not have become dominant in the world.
Had Chamberlain confronted Hitler at Munich, World War II would have been no worse and probably better.
Had Kennedy shown more resolve prior to the Cuban Missile Crisis, Khrushchev would not have deployed missiles.
He concludes that counterfactuals in history are backtracking, although in philosophy “it is a generally accepted pillar of truth that if counterfactuals are to be used as stand-ins for causal claims, they have to be nonbacktracking” (720). Consider, for example, the third counterfactual claim above. Lebow and Stein (Reference Lebow, Stein, Tetlock and Belkin1996) argue that to evaluate the counterfactual, we need to examine what conditions would have to have been present for Kennedy to show more resolve before the Cuban Missile Crisis. Since the conditions that would have made Kennedy show resolve were simply not present during that historical period, Lebow and Stein regard the antecedent as inadmissible. The topic of counterfactuals is discussed further in Chapter 3, where historical explanation is introduced.
Probabilistic Causation
An obvious problem of the regularity theory of causality is that, contrary to the constant-conjunction view, most causes in everyday life are not invariably followed by their effects, and causal attributions are often nondeterministic. For example, while it is well known that smoking is a cause of lung cancer, some smokers never develop the cancer. Dropping a glass on the floor causes it to break, but occasionally a glass is dropped but does not break. People generally believe that college education increases an individual’s earning potential, but this may not hold for certain individuals. Punishments should deter theft, but the deterrence is not perfect. Such examples support an indeterministic view of causation and motivate the development of probabilistic causation.
The central idea of probabilistic causation is that causes raise the probability of their effects. Suppose Event At1 occurred at time t1 and Event Bt2 at time t2. Suppes (Reference Suppes1970) defines a cause as a probability-raising event:
At1 is a prima facie cause of Bt2 if and only if
(1) t1 < t2
(2) P(At1) > 0
(3) P(Bt2/At1) > P(Bt2)
That is, At1 is a cause of Bt2 if and only if At1 occurred before Bt2 and the conditional probability of Bt2 given At1 is greater than the absolute probability of Bt2. Simply put, if the probability of an event given another event is higher than the probability of the first event alone, the two events are causally connected in some way. This definition addresses each of the four simple examples discussed above: smoking increases the probability of having lung cancer; dropping a glass on the floor increases the probability of its breaking; college graduates are likely to earn more than non-college graduates; and punishments increase the probability of having a lower level of theft incidents. According to the definition, a sufficient or determinate cause underlying the constant-conjunction view is one that produces its effect with certainty (i.e., a probability of one).
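Suppes’ definition is simple enough to express directly in code. The sketch below checks the probability-raising condition for the smoking example; the probabilities are invented for illustration, not real epidemiological data.

```python
# Sketch of Suppes' prima facie cause test. The temporal condition t1 < t2 is
# assumed to hold; only the probabilistic conditions are checked here.

def is_prima_facie_cause(p_a: float, p_b: float, p_b_given_a: float) -> bool:
    """A (earlier) is a prima facie cause of B (later) iff
    P(A) > 0 and P(B/A) > P(B)."""
    return p_a > 0 and p_b_given_a > p_b

# Invented probabilities: P(smoking), P(cancer), P(cancer / smoking)
print(is_prima_facie_cause(p_a=0.30, p_b=0.06, p_b_given_a=0.15))  # True
# If conditioning on smoking left the probability unchanged, the test fails:
print(is_prima_facie_cause(p_a=0.30, p_b=0.06, p_b_given_a=0.06))  # False
```

A sufficient cause in the constant-conjunction sense is simply the limiting case in which `p_b_given_a` equals one.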
A well-known fact in statistics is that a correlation between two variables X and Y does not warrant the conclusion that X causes Y or vice versa. Suppose a study found that the extent of peak hour traffic in Dallas from 2000 to 2010 was correlated positively with sales of lipstick in the city. Without doubt, traffic conditions do not cause lipstick sales. Rather, a higher level of employment causes more peak hour traffic. Moreover, women constitute about half of the US workforce; when more women go to work, the demand for lipstick rises. Suppes (Reference Suppes1970) considers several ways to address this issue by introducing the term “spurious cause.” One solution is to define a spurious cause as:
At1 is a spurious cause of Bt2 if and only if At1 is a prima facie cause of Bt2 and there is a t3 < t1 and an Event Ct3 such that
(1) P(At1&Ct3) > 0
(2) P(Bt2/At1&Ct3) = P(Bt2/Ct3)
(3) P(Bt2/At1&Ct3) ≥ P(Bt2/At1)
The idea is that a spurious cause does not change the conditional probability of Event Bt2 given Ct3. The addition of Event At1 into the picture has no real effect upon the occurrence of Bt2; Event Ct3 can account for Event Bt2 at least as well as At1 can. Returning to the above example, traffic conditions are a spurious cause of lipstick sales and level of employment is what Suppes (Reference Suppes1970) calls a “genuine cause,” defined as a prima facie cause that is not spurious. Level of employment alone can account for lipstick sales and peak hour traffic has no additional effect on lipstick sales once level of employment has been included into the calculation.
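The spurious-cause conditions can be verified numerically. The sketch below encodes the Dallas example with invented probabilities, assuming the structure traffic ← employment → lipstick sales, and confirms that traffic (A) satisfies Suppes’ conditions for a spurious cause of lipstick sales (B) given employment (C).

```python
# C = high employment, A = heavy peak-hour traffic, B = high lipstick sales.
# A and B each depend only on C, so A should come out as a spurious cause of B.
# All probabilities are invented for illustration.
from itertools import product

P_C = 0.6
P_A_given_C = {True: 0.9, False: 0.2}   # P(A / C)
P_B_given_C = {True: 0.7, False: 0.3}   # P(B / C): traffic adds nothing

# Joint distribution P(A, B, C) under the assumed structure A <- C -> B
joint = {}
for a, b, c in product([True, False], repeat=3):
    pc = P_C if c else 1 - P_C
    pa = P_A_given_C[c] if a else 1 - P_A_given_C[c]
    pb = P_B_given_C[c] if b else 1 - P_B_given_C[c]
    joint[(a, b, c)] = pc * pa * pb

def p(event):
    """Probability of all outcomes (a, b, c) satisfying event."""
    return sum(pr for k, pr in joint.items() if event(*k))

p_b_given_ac = p(lambda a, b, c: a and b and c) / p(lambda a, b, c: a and c)
p_b_given_c = p(lambda a, b, c: b and c) / p(lambda a, b, c: c)
p_b_given_a = p(lambda a, b, c: a and b) / p(lambda a, b, c: a)

# Condition (2): P(B/A&C) = P(B/C); condition (3): P(B/A&C) >= P(B/A)
print(round(p_b_given_ac, 3), round(p_b_given_c, 3), round(p_b_given_a, 3))
```

Once employment is conditioned for, adding traffic to the picture leaves the probability of high lipstick sales unchanged, which is exactly what makes traffic spurious and employment genuine.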
The above discussion shows some of the complexity involved in inferring causal relationships from probabilistic correlations. Scholars have developed a number of techniques for representing systems of causal relationships and inferring causal relationships from probabilities. As a result, an interdisciplinary field called “causal modeling” devoted to the study of methods of causal inference has emerged. Given the technicalities of the field and the space limitations of this book, in the following section I introduce briefly two major techniques − causal graph modeling and vector space modeling. Readers may skip the discussion if they find it too technical.
Causal Graph Modeling
Figures of causal models are commonly found in management research papers to represent relationships between constructs or variables, although the authors may not use the technique of causal graph modeling. To my knowledge, Durand and Vaara (Reference Durand and Vaara2009) were the first to introduce causal graph modeling systematically to management research, using the relationship between firm resources and performance as an illustration. My discussion in this section is based on their paper, and the section after, “Vector Space Modeling,” presents an alternative causal modeling technique that addresses the problems of causal graph modeling. The discussion here focuses on some basic principles to illustrate the nature of causal graph modeling and skips the mathematical details involved.
The attraction of causal graph modeling is that, under certain conditions, it permits the determination of the causal relationships between types of events with logical necessity. Both Spirtes et al. (Reference Spirtes, Glymour and Scheines1993) and Pearl (Reference Pearl2009) prove theorems showing how causes can be identified through the probabilistic analyses of causal graph modeling, and present various methods of estimation. In particular, Pearl (Reference Pearl2009: xv–xvi) distinguishes between probabilistic and causal relationships: “I now take causal relationships to be the fundamental building blocks both of physical reality and of human understanding of that reality, and I regard probabilistic relationships as but the surface phenomena of the causal machinery that underlies and propels our understanding of the world.”
Figure 2.1 shows three directed acyclic graphs. Each graph consists of a set of dots connected by directed arrows, which represent causal or explanatory relationships. A graph is acyclic if following a series of arrows never brings one back to where one started. A dot in a graph can represent a trope, an event or a variable; for simplicity of discussion, I consider dots to represent variables. Figure 2.1a depicts a causal graph where the two arrows represent causal relationships between Z at the arrows’ origin and the other two variables X and Y at their heads. Z is a parent of X (and X is a descendant of Z) since there is a directed path from Z to X. By the same token, Z is also a parent of Y. Figure 2.1b shows that Z affects Y directly and also indirectly through X. Unlike Figure 2.1a and b, Figure 2.1c has a white dot, U, which represents an unobservable variable. The two dashed arrows represent causal influence from U on Z and F.

Figure 2.1 Back-door causal paths
Back-door Paths
As Durand and Vaara (Reference Durand and Vaara2009: 1257) aptly put it, “the general principle of causal graph estimation is to eliminate ‘back-door paths.’” A back-door path is constituted by any causal factor that influences the phenomenon to be explained through intermediate causes. Figure 2.1a illustrates a case in which Z affects X and Y directly and separately. There are no intermediate causes and therefore no back-door paths. The above example of lipstick sales in Dallas can be represented by this figure, with Z, X and Y standing for level of employment, peak hour traffic and lipstick sales, respectively. Figure 2.1b and c illustrate cases in which there is a back-door path from X to Y, assuming that these two variables constitute the cause-effect pair in question. The back-door path in Figure 2.1b is X ← Z → Y, while Figure 2.1c has a longer back-door path: X ← Z ← U → F → Y.
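Back-door paths can be enumerated mechanically once a graph is written down as a list of directed edges. The sketch below is a minimal illustration for the graphs of Figure 2.1b and c; the edge lists are my reading of the figures as described in the text, and the function is a simple path search, not a full implementation of Pearl’s back-door criterion.

```python
# Enumerate back-door paths from x to y: undirected paths whose first edge
# points INTO x. Graphs are given as lists of directed (parent, child) edges.

def backdoor_paths(edges, x, y):
    nbrs = {}  # undirected adjacency built from the directed edges
    for u, v in edges:
        nbrs.setdefault(u, []).append(v)
        nbrs.setdefault(v, []).append(u)
    parents_of_x = [u for u, v in edges if v == x]
    paths = []

    def walk(node, path):
        if node == y:
            paths.append(path)
            return
        for nxt in nbrs.get(node, []):
            if nxt not in path:
                walk(nxt, path + [nxt])

    for parent in parents_of_x:  # a back-door path must start at a parent of x
        walk(parent, [x, parent])
    return paths

# Figure 2.1b: Z -> X, Z -> Y, X -> Y  gives the back-door path X <- Z -> Y
print(backdoor_paths([("Z", "X"), ("Z", "Y"), ("X", "Y")], "X", "Y"))
# Figure 2.1c: U -> Z, U -> F, Z -> X, F -> Y  gives X <- Z <- U -> F -> Y
print(backdoor_paths([("U", "Z"), ("U", "F"), ("Z", "X"), ("F", "Y")], "X", "Y"))
```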
The logic of blocking a back-door path is based on the concept of “screening-off” introduced by Reichenbach (Reference Reichenbach1956). Formally, if events X and Y are probabilistically independent, they display the following relationship: P(X&Y) = P(X) × P(Y).
Events are probabilistically dependent if they occur together either more or less frequently than they would be expected to do by chance. This is the case when: P(X&Y) ≠ P(X) × P(Y).
Screening-off occurs when there is a common cause of two events that are initially probabilistically dependent. Suppose there are three events X, Y and Z (see Figure 2.1a). Initially, there is a probabilistic dependence between events X and Y. Screening-off arises when a third event, Z, screens off the dependence between the first two events X and Y, rendering X and Y independent conditional on Z, that is: P(X&Y/Z) = P(X/Z) × P(Y/Z).
The above two conditions constitute a jointly sufficient screening-off condition. Therefore, Z must lie in the causal history of X and Y.
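The two conditions can be checked against a concrete joint distribution. In the toy sketch below (all probabilities invented), Z is a common cause of X and Y as in Figure 2.1a: X and Y are dependent overall, yet become independent once Z is conditioned for.

```python
# A numerical check of Reichenbach's screening-off for the structure X <- Z -> Y.

def prob(joint, cond):
    """Sum the probability of all outcomes (x, y, z) satisfying cond."""
    return sum(p for outcome, p in joint.items() if cond(*outcome))

# Build the joint distribution P(X, Y, Z); X and Y each depend only on Z.
joint = {}
for z in (0, 1):
    for x in (0, 1):
        px = (0.8 if z else 0.1) if x else (0.2 if z else 0.9)
        for y in (0, 1):
            py = (0.7 if z else 0.2) if y else (0.3 if z else 0.8)
            joint[(x, y, z)] = 0.5 * px * py   # P(Z=1) = P(Z=0) = 0.5

p_xy = prob(joint, lambda x, y, z: x and y)
p_x = prob(joint, lambda x, y, z: x)
p_y = prob(joint, lambda x, y, z: y)
p_z1 = prob(joint, lambda x, y, z: z)
p_xy_z1 = prob(joint, lambda x, y, z: x and y and z) / p_z1
p_x_z1 = prob(joint, lambda x, y, z: x and z) / p_z1
p_y_z1 = prob(joint, lambda x, y, z: y and z) / p_z1

print(p_xy, p_x * p_y)           # unequal: X and Y are dependent
print(p_xy_z1, p_x_z1 * p_y_z1)  # equal: Z screens X off from Y
```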
The logic of screening-off can be extended to eliminate any back-door path. Taking the situation depicted in Figure 2.1c, Z influences Y through a back-door path; hence, there is initially a dependence between Z and Y: P(Y/Z) ≠ P(Y).
However, once we condition for X and F (see the definition of “conditioning” below), Z becomes independent of Y: P(Y/Z&X&F) = P(Y/X&F).
Consequently, Z must lie in the causal history of either F, X, or both.
The first two strategies introduced by Durand and Vaara, conditioning (Figure 2.2a) and instrumenting (Figure 2.2b), are based on the back-door criterion. Conditioning involves accounting for all back-door paths (C1 … Cn) of one or more potential causal factors (X, Z). As shown in Figure 2.2a, C becomes independent of Y after X and Z are taken into account. Hence, one can establish that there is a causal chain from the back-door factor C through X and Z to Y. In the case of instrumenting, there is a controllable instrument (T) that directly influences X (Figure 2.2b). If T becomes independent of Y, given X, we know that there must be a causal chain from T through X to Y. Conditioning and instrumenting are thus both cases of the back-door path condition. In the former we search for a whole host of factors, whereas in the latter we already know of a factor, T, that indirectly triggers Y and with which the effect estimation can be carried out.

Figure 2.2 Three strategies for causal effect estimation
The third strategy presented by Durand and Vaara, mediating (Figure 2.2c), is based on the front-door condition, which involves looking for a factor, M, that lies between the potential causal factor of interest, X, and the outcome, Y (Pearl Reference Pearl2009). The front-door condition is fulfilled if X becomes independent of Y once M is conditioned for. Such conditional independence implies that X causes M, which in turn causes Y.
The front- and back-door criteria − and thus all three strategies for causal effect estimation − are based on the logic of screening-off. Events that were initially dependent become independent once some other events are conditioned for. This conditional independence indicates that the conditioned events must lie in the causal history of the event under consideration. Conditioning and instrumenting use the back-door criterion, since there is a movement backwards along the causal chain to establish causal relationships. Mediating through M (as shown in Figure 2.2c) uses the front-door criterion, since the factor M, which is a descendant of X, is used to establish that X is causally relevant for Y. Durand and Vaara (Reference Durand and Vaara2009) ultimately dropped conditioning and instrumenting, as the assumptions on which these strategies are based are often violated under real world conditions.
Markov Condition
The fact that causal graph modeling is based on the logic of screening-off highlights one of the core assumptions on which this modeling rests – the Markov condition. This condition states that, once all of its parents are conditioned for, a variable will be probabilistically independent of all other variables except its descendants. It is important to recognize how fundamental this assumption is to Durand and Vaara’s project and how quickly the arguments favoring causal graph theory break down once it is violated. More specifically, a major limitation of causal graph modeling is that the Markov condition frequently fails to hold under the kind of circumstances faced in management decision-making. We consider below two cases in which this is so: missing information and causal interdependence.
The Markov condition may fail to hold when there is insufficient information about potential causes and their effects. This includes cases in which not all relevant parents are specified and in which events have not been specified in a sufficiently fine-grained way (Arntzenius Reference Arntzenius, Hull and Okruhlik1992). The problem of missing information can be illustrated using an adapted version of Durand and Vaara’s prime example of the mediating strategy. Consider Figure 2.3, which is a simplified version of their Figure 2.4. In addition to excluding some variables, the main modification I have made is to place the unobservable event U in a slightly different position.

Figure 2.3 Missing information on causally relevant events

Figure 2.4 Process tracing tests of evidential strength
Suppose we want to establish whether certain resources R lead to high firm performance Yt. We assume that the individual and combined probabilities of these factors are known and that R is mediated by P (a set of intermediate factors such as rareness, immobility and low substitutability of resources). To be able to establish that Yt is causally related to R in the way that Durand and Vaara propose, the following relationships would have to hold: P(Yt/R) ≠ P(Yt) and P(Yt/R&P) = P(Yt/P).
In this case R and Yt would initially be probabilistically dependent but become independent once we condition for P. We would then establish that R influences firm performance Yt through some mediating factor P.
However, given the relationships depicted in our example, the following probabilities actually hold: P(Yt/R) ≠ P(Yt) and P(Yt/R&P) ≠ P(Yt/P).
P does not make R and Yt independent because there is an unknown factor U, say, whether the technological environment of an industry is changing rapidly, that influences P directly. Durand and Vaara (Reference Durand and Vaara2009), accordingly, have to assume that “the unobservable factors … do not influence P” (1259) for their approach to work. The presence of any such unknown factor that directly influences either the mediating factor P or the potential causal factor R and the phenomenon under consideration Yt will lead to the breakdown of the Markov condition, rendering causal graph modeling infeasible (Arntzenius Reference Arntzenius, Hull and Okruhlik1992).
For the Markov condition to hold in Durand and Vaara’s example, the relationship between R and P must be unaffected by any unknown factor U, a condition that is highly unlikely ever to obtain in reality. Most readers, for instance, would probably have already found this example overly simplistic. In real world situations, firm performance is likely to depend on a complex web of causal relationships with “many back-door paths,” as Durand and Vaara themselves argue.
The second reason for the Markov condition not being satisfied is causal interdependence. Unlike the previous reason, which is related to a lack of relevant information, here the Markov condition fails to hold even under conditions of perfect information. As mentioned, the Markov condition requires that once we condition for all its parents, a variable has to be probabilistically independent of all other variables except its descendants. To illustrate this requirement, take again the case represented by Figure 2.1a of a parent Z that causes X and Y. For the Markov condition to hold, X and Y must be statistically independent, given that Z has occurred. This would be equivalent to a situation in which, once Z has occurred, the likelihood of X and Y is determined by tossing two fair coins independently of each other. However, it seems intuitively much more likely that X and Y will depend on each other in some way, given that they share a common cause (Cartwright Reference Cartwright1999).
Consider a more concrete example. Suppose a firm purchases a new machine (Z) that will, on average, reduce the defect rate of output (X) and the chance of machine breakdown (Y), as compared with the old machine. The Markov condition demands that, once the purchase of the new machine is taken into account, reductions in the defect rate and reductions in machine breakdown will occur independently of each other, as if the occurrence of each type of event were determined by tossing a coin. Yet, since both types of events result from a common cause, it is reasonable to expect that their occurrences are correlated. For instance, when the machine is close to a breakdown state, the defect rate is likely to be higher. In cases like this, it is highly unlikely that the Markov condition would hold.
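The machine example can be simulated to make the failure of the Markov condition concrete. The sketch below uses invented probabilities and a latent wear state that drives both defects and breakdowns: even though we condition on the new machine throughout, the joint frequency of defect-and-breakdown exceeds the product of the individual frequencies.

```python
# Monte Carlo sketch: defects (X) and breakdowns (Y) stay correlated even after
# conditioning on the new machine (Z), because both depend on a latent wear
# state. All probabilities are invented for illustration.
import random

random.seed(0)  # fixed seed so the simulation is reproducible

def simulate(n=100_000):
    xy = x = y = 0
    for _ in range(n):
        worn = random.random() < 0.3                      # latent wear state
        defect = random.random() < (0.4 if worn else 0.05)
        breakdown = random.random() < (0.5 if worn else 0.02)
        x += defect
        y += breakdown
        xy += defect and breakdown
    return xy / n, (x / n) * (y / n)

p_joint, p_product = simulate()
print(f"P(X&Y/Z) = {p_joint:.4f}  vs  P(X/Z) x P(Y/Z) = {p_product:.4f}")
# The joint probability exceeds the product: X and Y are not independent
# given Z, so the Markov condition fails.
```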
An alternative example runs as follows. It has long been accepted that firms exist precisely because they are complex, interdependent structures (Coase Reference Coase1937) in which the output of the combined work of employees is greater than the output of its constituent parts (Alchian and Demsetz Reference Alchian and Demsetz1972). Similar synergies are observed for firms’ external relationships. Cohen and Levinthal (Reference Cohen and Levinthal1990), for example, argue that firms that have prior knowledge in a particular area are better at acquiring new knowledge in the same area than firms without such prior knowledge. The newly acquired knowledge will contribute to the knowledge base, which enhances further the acquisition of knowledge in that area. Knowledge interactions of this kind create a virtuous loop that can help firms to get ahead of their competition.
Durand and Vaara acknowledge that “causal graphs are non-parametric and acyclic (i.e., they do not permit representation of circular causation …)” (1257). However, they do not mention that causal graph modeling also requires that causes lead to their effects independently, a particular form of atomism according to which causal factors exert their effects in isolation from each other. Such atomism is unlikely to hold in firms, which are composed of structured, interdependent relationships.
Vector Space Modeling
In contrast to causal graph modeling, vector space–based algorithms have a large number of successful scientific and business applications, such as search engines (Berry and Young Reference Berry and Young1995), literature-based discovery where previously unknown relationships between phenomena are inferred (Swanson Reference Swanson1988), image recognition (Bulcão-Neto et al. Reference Bulcão-Neto, Camacho-Guerrero, Dutra, Barreiro, Parapar and Macedo2011) and web-based translation (Bishop Reference Bishop2006). Vector space models such as latent semantic indexing or latent Dirichlet allocation were first developed to identify similarities in linguistic concepts (Blei et al. Reference Blei, Ng and Jordan2003). Although the models themselves can be highly sophisticated, the underlying logic is straightforward. States of affairs are represented as points in a multidimensional space, with each aspect of a state of affairs occupying a particular dimension. The points are modeled as vectors that have a length and a direction (hence the term vector space modeling).
The working of the algorithms can be illustrated using the analogy of linguistic text, the application for which they were originally developed. A text has two characteristics, namely (1) that it depicts a number of different entities, also called terms, and (2) that it contains information on how the entities are structurally related. In a piece of text, each word represents a basic unit of information. Words in turn are composed of letters. To infer the meaning of a word or string of words, vector space–based algorithms analyze how words and the letters they are composed of relate. By analyzing multiple texts, the meaning of words in their particular context can be identified. Texts do not have to be composed of the same words to have similar meanings. The algorithms are able to pick up similar structural arrangements, even if some of the words differ within the texts (Bishop Reference Bishop2006).
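A minimal sketch of this idea represents each text as a term-frequency vector and measures similarity as the cosine of the angle between the vectors; the example sentences are my own invented illustrations, and real systems such as latent semantic indexing add dimensionality reduction on top of this basic representation.

```python
# Term-frequency vectors over a shared vocabulary; similarity is the cosine
# of the angle between them.
import math
from collections import Counter

def cosine_similarity(text_a: str, text_b: str) -> float:
    va, vb = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    vocab = set(va) | set(vb)
    dot = sum(va[w] * vb[w] for w in vocab)
    norm = math.sqrt(sum(c * c for c in va.values())) * \
           math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm

print(cosine_similarity("the firm builds supplier trust",
                        "the firm builds supplier alliances"))   # high overlap
print(cosine_similarity("the firm builds supplier trust",
                        "quarterly earnings beat expectations"))  # no overlap
```

Texts that share most of their structure score close to one even when some words differ, which is the property the chapter draws on.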
In management research, a firm’s particular structural configuration might be regarded as analogous to text. This text comprises intra- and inter-organizational processes (the words) such as particular routines, how these routines are internally structured (the letters that comprise the words) and how the routines relate (as words form sentences in a text). Suppose that the way a firm is integrated with its suppliers is a key structural feature for achieving superior R&D performance and that it is difficult to imitate such integration because there is causal ambiguity as to whether and which elements of the relationship lead to the performance. Suppose further that whether a desired type of relationship can be achieved depends on the wider culture in which the firm operates. For instance, building relationships of mutual trust might require substantial investment and might be difficult to achieve in a society with a “transactional” culture. This example can be used to illustrate how vector space–based modeling differs from causal graph modeling in terms of (1) the nature of data used, (2) the mechanism that converts the data input into an output, (3) the nature of the output and (4) the assumptions made.
Nature of Data Used
A vector space–based modeling exercise might take information about meetings between a company and its suppliers from computer-based diaries as a source of data. Such diaries typically record the names of participants and their companies, their rank within these companies and the topic(s) of each meeting. Since we want to investigate R&D performance, this could involve collecting key R&D metrics such as which products were developed and how successful the products were in terms of indicators like speed of development, budget and sales. While researchers would still have to decide what data, such as the structure of meetings, they want to collect, the algorithm does not require the inputting of a classification of the structures that these meetings take. Establishing such structures will be, rather, an outcome of the analysis.
In contrast, causal graph modeling requires researchers to define a set of constructs and variables that measure the key factors they believe to be causally relevant. These constructs frequently describe whether firms possess particular characteristics or resources. Examples of the types of categories that the firms under investigation might have to be slotted into might include whether they have a matrix or a pyramid structure, how frequently they hold joint meetings with their suppliers, the level of trust established with the suppliers and whether they have joint development teams with their suppliers. Identifying in advance clearly defined entities or properties, rather than their structural relationship, therefore, becomes the main focus of data collection.
Conversion of Data Input into Output
In the vector space–based model, the composition of meetings is comparable to written text in the analysis of linguistic meaning. Each processual feature of a meeting, such as its participants and the departments they are affiliated with and the companies they belong to, is depicted in multidimensional vector space. The resulting vector represents the total structural description of each meeting. The similarity between structures can then be determined by calculating the angles between the vectors that represent them, with lower angles of deviation indicating higher structural similarity. If, for example, the same departments are present in a number of meetings, these meetings will be considered structurally more similar in this respect.
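The angle computation can be sketched as follows. The feature dimensions and meeting encodings are hypothetical, chosen only to show how a lower angle indicates higher structural similarity.

```python
# Meetings encoded as vectors, one dimension per structural feature
# (hypothetical features: departments present and participant seniority).
import math

# Dimensions: [R&D, purchasing, supplier_engineering, senior_rank]
meeting_1 = [1, 1, 1, 0]   # three departments, junior ranks only
meeting_2 = [1, 1, 1, 1]   # same departments, plus senior participants
meeting_3 = [0, 1, 0, 1]   # purchasing only, senior rank

def angle_degrees(u, v):
    """Angle between two feature vectors; smaller means more similar."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return math.degrees(math.acos(dot / norm))

print(angle_degrees(meeting_1, meeting_2))  # small angle: similar structure
print(angle_degrees(meeting_1, meeting_3))  # larger angle: different structure
```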
Causal graph modeling assumes perfect knowledge of probabilities. This implies that traditional statistical analysis is required to (1) identify the conditional probabilities between the entities in question and (2) assess whether the sample probabilities correspond to the population probabilities. For example, we might take 1,000 firms, determine whether they have an arm’s length relationship with their suppliers or whether they form strategic alliances and then look at the conditional probabilities of arm’s length relationships (or strategic alliances) in conjunction with R&D performance characteristics such as speed of product development.
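The statistical step described above amounts to estimating conditional probabilities from classified counts. The sketch below uses fabricated counts for the 1,000-firm example purely to show the shape of the computation.

```python
# Fabricated counts: relationship type -> (number of firms, number of those
# firms with fast product development).
firms = {
    "arm's length": (600, 180),
    "strategic alliance": (400, 220),
}

for rel, (n, fast) in firms.items():
    print(f"P(fast development / {rel}) = {fast / n:.2f}")
```

Causal graph modeling then takes such conditional probabilities as its input, together with the assumption that they reflect the population values.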
Nature of Output
The way in which data are converted into an output influences the nature of that output. In the case of the vector space–based model, the output could be a characterization of the type and structure of interaction that is associated with the development of particularly successful products. Possible findings might include that:
a particular structure of interaction (e.g., between different departments) is fruitful;
a particular structural evolution of interactions over time is fruitful (e.g., initial interaction between certain departments of the focal firm and the supplier and then interaction between some of the firm’s own departments);
some structures of interaction are more common if strategic alliances are present; or
particular compositions of teams may be effective.
Causal graph modeling, in contrast, would require population-based probabilities and that the Markov condition holds. The causal graph model structures the conditional probabilities in terms of causal relationships. We may then find, for example, that a long-term relationship with suppliers, such as a joint venture, lies in the causal history of successful product development.
Assumptions
The differences between vector space modeling and causal graph modeling boil down to the assumptions made. Vector space modeling treats the inference of structural relationships and the inference of causal directionality as two steps of a single, inseparable problem. Thus, it identifies causal structures where the relata have an internal structure and where interdependencies between relata drive the performance of firms. The aim is to explicate the causal mechanisms, the concept of which is discussed later in this chapter, underlying the phenomena of interest. Some of these structures may share similarities and, if so, it might then be possible to identify the types of structural relationship associated with performance. We can name and describe these structures, but no two will be exactly the same, as each varies in its elements and how the elements are arranged. As such, vector space modeling is particularly well suited to situations that involve a complex web of causation, where no single factor, but rather an interdependent web of causes, leads to performance outcomes. Uncovering the underlying causal mechanisms can be of considerable help in understanding these situations.
Causal graph modeling assumes perfect knowledge of the probabilistic relationship between events; the main task is to draw conclusions about causal relationships from these probabilities. Hence, this modeling represents a view of event causation (Lewis Reference Lewis1973) where clearly definable, identifiable and separable entities − which we call resources − exist. In particular, it is assumed that the influence of one entity is directed clearly at another so that there are no mutual interdependencies and that entities can be measured and their causal influence separated from each other. The effects of the entities are assumed to be conditionally independent and therefore it is assumed that the Markov condition holds. Further, while entities may causally relate to each other, they are denied any internal structure, which, according to vector space modeling, is crucial for generating causal mechanisms. Table 2.1 summarizes the comparison between vector space modeling and causal graph modeling.
Table 2.1. Comparison between causal graph modeling and vector space modeling
| | Causal graph modeling | Vector space modeling |
|---|---|---|
| Nature of data | Predefined constructs and variables measuring the factors believed to be causally relevant; firms classified in advance into categories | Raw records of structures and processes (e.g., meeting diaries and R&D metrics); classifications emerge from the analysis |
| Conversion of data input into output | Statistical estimation of conditional probabilities between predefined entities | Structures represented as vectors; similarity determined by the angles between vectors |
| Nature of output | Causal relationships among predefined entities (e.g., a factor lies in the causal history of an outcome), assuming population probabilities and the Markov condition | Characterization of the types and structures of interaction associated with an outcome |
| Assumptions | Event causation; clearly definable, separable entities without internal structure; effects conditionally independent (Markov condition) | Internally structured, interdependent relata; inference of structure and causal directionality combined; aims to explicate causal mechanisms |
Strengths
Perhaps the most important benefit of vector space modeling is that it can deal with causal complexity. As discussed above, vector space models do not depend on the Markov condition, which often breaks down in business phenomena. Furthermore, even if the Markov condition were to hold, causal graph modeling would not give us results that take sufficient account of causal complexity. Returning to the example of what causes superior R&D performance, a causal graph model may show that a pyramid structure is more effective than a matrix structure, or that cross-functional meetings are more effective than single-function meetings. These results are very limited, as they reduce causation to a few, supposedly independent factors. In contrast, vector space models give us an understanding of the underlying mechanism and consequently provide insights into the complex web of causation that typically leads to superior performance. A vector space model may, for example, identify that joint product development teams between a company and its supplier, in which a variety of ranks meet frequently inside and outside of work to create a high-trust culture, lead to greater R&D success.
How does vector space modeling perform when applied to a complex web of causal interactions and missing data? Suppose researchers had access to the transcripts of R&D meetings but did not have access to other rich data sources, such as R&D expenditure by project and scientist locations. It turns out that vector space modeling outperforms methods that require traditional statistical analysis, such as causal graph modeling, in cases of causal complexity and missing data (Duch et al. Reference Duch, Swaminathan and Meller2007). Even if one had only the transcripts of R&D meetings, these documents would contain information on numerous potential causal factors. Thus an almost infinite number of potential combinations could explain the effect, leaving the internal structure of the relevant mechanism a black box. It is highly likely that methods such as multivariate regression or genetic algorithms would result in overfitting and the identification of spurious relationships. As causal graph modeling requires a similar type of data input, it suffers from the same problem. Vector space modeling, on the other hand, represents the internal structure of a mechanism, which is composed of a multiplicity of components, as a single vector. It is thus much less likely to identify spurious correlations because (1) the number of potentially explanatory factors is substantially lower; and (2) we obtain a sense of the internal structure of the complex web of causation and can thus verify whether the causal mechanism, so identified, is plausible.
There is clear evidence for the advantages of vector space modeling in pharmaceutical research. This is an area in which significant advances have been made over the last several decades by focusing increasingly on the mechanisms and pathways through which drugs work, rather than on merely whether a drug is efficacious and safe (Rainsford Reference Rainsford1995). Vector space modeling has been shown to be more effective than traditional statistical approaches in this context because it takes into account a variety of complex structures, such as the three-dimensional structure of molecules, as well as the interaction of a number of different genes (Nobel Reference Nobel2006).
Limitations
Vector space modeling often requires detailed data on underlying structures and processes. It may be difficult to acquire such data from firms and to transform the data into a format that can be used by the algorithms. Moreover, sometimes there may be only a few observations of key variables. For example, if a critical decision leading to R&D success is made in a single meeting and is not captured by the data, vector space modeling will not be able to identify this causal factor. Of course, causal graph modeling will also fail in this instance, as no correlations can be identified.
A further limitation is that there is always some judgment involved; we need to select the structures to be studied, such as the meetings discussed in the example above. However, in contrast to causal graph modeling, it is not necessary to define in advance which particular properties of a meeting, such as their cross-functional nature, could be the cause of superior performance and to classify the meetings accordingly. Rather, such structural characteristics will emerge from the analysis.
Finally, while vector space modeling returns information that helps to uncover underlying causal mechanisms, it cannot guarantee that the mechanism identified was causally relevant in any particular case. Making causal attributions ultimately involves an unavoidable element of judgment, which will also depend on the background knowledge, experience and intuition of the researcher concerned. While causal modeling techniques offer little help in this respect, knowledge of judgment biases helps to avoid making mistakes (see Kahneman Reference Kahneman2011). As an analytical technique, vector space modeling is also silent with respect to philosophical issues such as the debates between agent and event causalists discussed in the next section.
Agent versus Event Causation
Consider a scenario where Mary hit a red billiard ball with her cue, the red ball collided with a blue ball and the blue ball moved. Roughly speaking, the scenario consists of two events. First, Mary hit the red ball and the ball moved; second, the red ball collided with the blue ball and the blue ball moved. One key difference between the two events is that the first event involved Mary’s action based on her intention to hit the red ball with her cue whereas the second event involved the movement of two inanimate objects without any human intention. Both events exhibit causation; the movement of Mary’s cue caused the red ball’s movement and the red ball’s collision with the blue ball caused the latter’s movement. Are the two incidents of causation of the same nature? There has been heated debate about the answer to this question. Those who answer “yes” are called event causalists while those who answer “no” are called agent causalists.
The answer depends on whether one thinks that an intentional action involves an irreducible causal relationship, whose subject is the agent carrying out the action, or involves an event or sequence of events (Bishop Reference Bishop1983). An agent refers to someone or something that makes things happen2 and “to make something happen is to cause an event of some kind, that is, to exercise the power to cause an event of that kind to occur” (Alvarez and Hyman Reference Alvarez and Hyman1998: 221). Given the nature of management research, I focus on human agents and exclude the agency of animals, plants or inanimate things, although the latter can also make things happen − birds building nests or oxygen rusting iron, for example. When an agent acts intentionally, such as Mary hitting the red ball with her cue, an incident of agency occurs. Event causalists maintain that actions are movements of the agent’s body parts that were caused in a particular way by mental events involving the agent’s intentions, desires, emotions or beliefs. In this sense, they downplay the role played by agency in the process. In contrast, agent causalists argue that if an agent intentionally caused an event, we cannot “reduce it to the case of an event being a cause” (Davidson Reference Davidson1980: 128). Therefore, a correct account of agency has to preserve for the agent the role of an action’s cause and the subject of the causal relationship in question is the agent, not the event in which the action occurred. The stress is on the causal power by virtue of which the agent has freedom of will to act (O’Connor Reference O’Connor1996). To ascribe a power to an object is to say something about what it will or can do, under suitable conditions, in virtue of its intrinsic nature (Harré and Madden Reference Harré and Madden1975). 
In brief, for the event causalist, “actions are events caused by intentions” whereas for the agent causalist, “they are events intentionally caused by agents” (Alvarez and Hyman Reference Alvarez and Hyman1998: 222). It is beyond the scope of this book to present the debate in full; suffice it to say that each side has some unique insights.
A related issue that is pertinent to management research is whether reasons can be considered causes. Davidson (Reference Davidson1963), who subscribes to event causation, famously proposes that reasons are causes. Simply put, the reason for which an individual performs an action is the cause of the action. Actions are motivated by beliefs and desires. “We need beliefs and desires because our wanting this and believing that, besides being our reasons for doing what we do, are − sometimes at least − the reasons why we do” (Dretske Reference Dretske1989: 1). Suppose John’s early retirement was due to his desire to spend more time on his hobbies. His desire constituted the reason that caused his decision to retire early. The role played by reasons in intentional explanation is discussed in Chapter 3.
Mechanisms
“To give a description of a mechanism for a phenomenon is to explain that phenomenon, i.e., to explain how it was produced” (Machamer et al. Reference Machamer, Darden and Craver2000: 3). Similarly, Bunge (Reference Bunge1997: 410) highlights the central role played by mechanisms in formulating explanations in the natural and social sciences:
If we wish to understand a real thing, be it natural, social, biosocial, or artificial, we must find out how it works. That is, real things and their changes are explained by unveiling their mechanisms: in this respect, social science does not differ from natural science.
Social science does differ from natural science with respect to experiments. Scientists need to conduct experiments because of the open character of the world in which events are subject to diverse causal influences. Natural scientists design experiments with the objective of achieving the conditions of a closed system in which “a constant conjunction of events obtains; i.e. in which an event of type a is invariably accompanied by an event of type b” (Bhaskar Reference Bhaskar1978: 70). The ideal of experiments is to create closed systems so that extraneous causal factors are controlled for, regular sequences of events are observed and the causal relationships among the events can be analyzed. However, it is often very difficult, if at all possible, for social researchers to achieve conditions of closure, as shown by the artificiality of laboratory experiments performed by social psychologists (Harré and Secord Reference Harré and Secord1972). Sayer (Reference Sayer1992) provides two main reasons for the openness of social systems. First, the configuration of social systems is continuously modified by human actions and, second, unlike inanimate objects or other animals, humans have the capacity for learning and self-change. Hence, social structures are less enduring than structures found in nature.
Given the impossibility of performing experiments under conditions of closure, a serious problem facing social researchers is to make reliable inferences about the causes of social phenomena. Some scholars maintain that a mechanism approach to research can significantly ameliorate this problem (e.g., Elster Reference Elster1989; Little Reference Little1991; Hedström and Swedberg Reference Hedström and Swedberg1996). It should be noted that the notion of mechanism can be traced to the scientific worldview of the seventeenth century when natural science was dominated by mechanics, the exemplar of which was Newtonian mechanics. The idea then spread from physics and astronomy to other natural sciences such as chemistry and biology (Hedström and Swedberg Reference Hedström and Swedberg1996). The original concept of mechanism has been broadened over the centuries; while a few of the mechanisms studied by contemporary scientists are mechanical, most are not (Bunge Reference Bunge1997).
Whereas in economics the concept of market mechanism was evident as early as 1776, in Adam Smith’s The Wealth of Nations published that year, the term “mechanism” came into use in social research much more recently. One of the earliest instances of its use was in the prominent sociologist Robert Merton’s (Reference Merton1968) paper “On Sociological Theories of the Middle Range,” first published in 1949. In that paper, Merton advocates theories of the middle range that “lie between the minor but necessary working hypotheses that evolve in abundance during day-to-day research and the all-inclusive systematic efforts to develop a unified theory that will explain all the observed uniformities of social behavior, social organization, and social change” (39). Using the example of role-set theory, Merton shows how social mechanisms − “the social processes having designated consequences for designated parts of the social structure” (43) − serve as elementary building blocks of a middle-range theory. In management research, March and Simon’s (Reference March and Simon1958) landmark work Organizations was an early attempt at explicating the mechanisms of differentiation and aggregation through which individuals are able to accomplish organizational objectives − individuals being grouped into hierarchically connected functional units and performing specialized yet coordinated tasks.
A mechanism-centered social science can also be seen as a reaction to Friedman’s (Reference Friedman1953: 14–15) famous challenge to the necessity of model or theory realism:
Truly important and significant hypotheses will be found to have “assumptions” that are wildly inaccurate descriptive representations of reality, and, in general, the more significant the theory, the more unrealistic the assumptions … the relevant question to ask about the “assumptions” of a theory is not whether they are descriptively “realistic,” for they never are, but whether they are sufficiently good approximations for the purpose in hand. And this question can be answered by seeing whether the theory works, which means whether it yields sufficiently accurate predictions.
Since core assumptions often constitute a significant element of a mechanism, incorporating false assumptions will render a mechanism unrealistic. For example, marginal theory in economics assumes that business executives arrive at their production decisions through consulting schedules or multivariate functions showing marginal cost and marginal revenue. However, Lester’s (Reference Lester1946) empirical study of US business executives falsified the assumption, implying that the mechanism entailed by marginal theory was not realistic. Friedman’s above view was a response to the heated debate aroused by Lester’s study about whether core assumptions of a theory had to be realistic. Contrary to Friedman’s instrumentalist stance, researchers who adopt a mechanism approach are not satisfied with a model that merely generates accurate predictions based on covariational analyses. Rather, they attempt to specify discrete causal paths that connect the variables together. This will enhance our knowledge of the phenomenon by allowing us to peer deeply into the box of causality (Gerring Reference Gerring2008).
From a policy perspective, it is surely important to know what effect a given policy has produced, but it is also useful to know why the policy has that effect. The latter knowledge, which is gained through studying the mechanism concerned, will help policymakers anticipate possible unintended side effects and improve the policy accordingly (Gerring Reference Gerring2010). In the management discipline, bridging the gap between theory and practice is no easy task. The key criticisms that managers make of theorists are that theorists “comment on practice but elide context, overlook constraints, take the wrong things for granted, overestimate control, presume unattainable ideals, underestimate dynamism, or translate comprehensible events into incomprehensible variables” (Weick Reference Weick, Tsoukas and Knudsen2003: 453). As discussed below, a mechanism approach commits to the locality of causal processes and thus situates mechanisms in context. The approach therefore seems well placed to address managers’ concerns. Hence, “a deeper understanding of mechanisms might be one way to better translate organizational theories into managerial action” (Anderson et al. Reference Anderson, Blatt, Christianson, Grant, Marquis, Neuman, Sonenshein and Sutcliffe2006: 109).
Characteristics
In science, what constitutes a mechanism has evolved over time (Machamer et al. Reference Machamer, Darden and Craver2000). In social research, the concept of mechanism “contains a plethora of meanings” (Gerring Reference Gerring2010: 1500). Mayntz (Reference Mayntz2004: 238) laments that “a survey of the relevant empirical and methodological literature soon bogs down in a mire of loose talk and semantic confusion about what ‘mechanisms’ are.” Mahoney (Reference Mahoney2001), for example, lists twenty-four definitions proposed by twenty-one authors. That various definitions of mechanism have been proposed is not surprising, given that the entities and processes studied by different disciplines are rather heterogeneous and that a mechanism is identified by the kind of effects or phenomena it generates. A mechanism is therefore always a mechanism for something (Darden Reference Darden2006). For the sake of discussion, I adopt a general definition of mechanisms that is applicable to both natural and social phenomena and is modeled on the definition of Machamer et al. (Reference Machamer, Darden and Craver2000): mechanisms consist of entities and activities organized so as to produce regular changes from a beginning state to an ending one. Entities can be understood as the actors, organizations, structures and so on that engage in activities, and the activities are the producers of change.3 Let’s illustrate this definition using Merton’s (Reference Merton1948) well-known example of self-fulfilling prophecy in the context of a bank run. The entities here refer to the bank, its cash reserves, its depositors’ belief that the bank is having financial difficulties, a banking system that lacks a deposit insurance scheme and so on. The activities are depositors’ withdrawals of their deposits in large numbers within a short period of time and the bank paying the depositors from its cash reserves. 
The entities and activities are not random but are related and form an organized whole. Unless the government intervenes or other banks give a helping hand, the bank will eventually become insolvent (or even bankrupt) − a change from its beginning state of solvency. The mechanism is regular in that it always works in the same way from beginning to end under the same conditions.
Since mechanisms involve change, “it makes no sense to talk about mechanisms in pure ideas or abstract objects, such as sets, functions, algorithms, or grammars, for nothing happens in them (when taken in and by themselves)” (Bunge Reference Bunge1997: 418). For example, 3+2=5 does not represent the mechanism of adding three and two to arrive at the answer of five. Similarly, a deductive inference is not a mechanism through which a conclusion is drawn. Therefore, the concept of mechanism is alien to logic, mathematics and linguistics, none of which are concerned with changes that take place in time. This point is also consistent with the nature of a mechanism being an irreducibly causal process that produces the effect of interest. Adding three and two, for instance, does not cause five to exist. Consider the following syllogism:
Premise 1. All human beings are mortal.
Premise 2. Mary is a human being.
Conclusion. Mary is mortal.
The two premises do not cause Mary to be mortal.
Including the terms “entities” and “activities” in the definition entails the philosophical intuitions of both substantivalists and process ontologists (Machamer et al. Reference Machamer, Darden and Craver2000). Substantivalists focus on entities and their properties, believing that talk of activities can be reduced to that of properties and their transitions. They speak of entities with capacities to act, such as aspirin’s capacity to relieve a headache. Note here that entities refer to concrete things rather than pure ideas or abstract objects, such as functions, sets or algorithms (Bunge Reference Bunge1997). In contrast, process ontologists reify activities, believing that talk of entities can be reduced to that of processes that entities generate. Each side is biased and fails to capture fully the nature of mechanisms. In fact, entities and activities are interdependent in that “entities having certain kinds of properties are necessary for the possibility of acting in certain specific ways, and certain kinds of activities are only possible when there are entities having certain kinds of properties” (Machamer et al. Reference Machamer, Darden and Craver2000: 6). Mechanisms are active in generating phenomena and so need to be conceptualized as the activities of their entities. In the bank run example, the entities and their properties alone do not give rise to a bank run. Rather, a significant percentage of depositors have to, based on their belief, act within a short period of time in order to start a bank run. The definition is dualist in that both entities and activities constitute mechanisms. In short, “it is the activities that entities engage in that move the mechanism from an initial causal condition through different parts to an outcome” (Beach Reference Beach2016: 465).
A mechanism is a causal chain producing the effect of interest. The effect of the bank run example is the insolvency of the bank that confirms depositors’ initial belief. The mechanism perspective commits to the locality of causal processes in that whether X is a cause of Y depends on facts about the spatiotemporally restricted causal process in question (Hedström and Ylikoski Reference Hedström and Ylikoski2010). For example, if depositors withdraw their deposits over a longer period of time, the bank may be able to call back some of its loans and have sufficient cash to pay the depositors. Consequently, some depositors may change their initial belief that the bank is in financial trouble and so do not withdraw their deposits or even re-deposit their money. The bank run may stop; that is, the mechanism does not run its course. The notion of a causal chain implies that there should be some intermediate steps between cause and effect (Mayntz Reference Mayntz2004). In the case of a bank run, there is a series of steps between the formation of depositors’ belief and the eventual insolvency of the bank. On the other hand, when a cause directly leads to an effect, such as one billiard ball colliding with another, the whole event does not constitute a causal chain.
Management research is concerned with social mechanisms. As a subset of mechanisms, social mechanisms are characterized by interactions among individuals that underlie and account for social regularities (Little Reference Little1991). Such individuals are categorized into groups defined by salient features that members of a group share. In describing a mechanism, the relevant behavior of an individual depends on the group to which that individual belongs. In the bank run example, depositors form a group and an individual depositor’s behavior is affected by the general concern of the group that if the bank fails, depositors will not be able to get back all of their money. Therefore, when the bank’s financial situation is believed to be poor, it makes sense to withdraw one’s deposit as soon as possible.
Another issue is whether a social mechanism refers to a recurrent or a unique causal process. Mayntz (Reference Mayntz2004: 241) opts for the former and proposes that mechanisms “‘are’ sequences of causally-linked events that occur repeatedly in reality if certain conditions are given.” On the other hand, Boudon (Reference Boudon, Hedström and Swedberg1998: 172) deems a mechanism to be “the well-articulated set of causes responsible for a given social phenomenon” and that mechanisms “tend to be idiosyncratic and singular.” Hedström and Swedberg (Reference Hedström and Swedberg1996) are right that the generality of mechanisms gives them explanatory power.4 In reality, each bank run is unique. A mechanism can be formulated to fit the idiosyncratic features of a particular bank run. Yet such a tailor-made description is only valid for that bank run and is better regarded as an account of a unique chain of events that led from one event to another than as a mechanism (Hedström and Swedberg Reference Hedström and Swedberg1996). An approach that is followed here and is more in line with scientific research is to work out a general bank run mechanism applicable to many bank run cases, although the mechanism may not describe accurately the details of any of the cases. The spirit of this approach is captured by Tilly’s (Reference Tilly2001: 25–26) definition of mechanisms in political science: “Mechanisms form a delimited class of events that change relations among specified sets of elements in identical or closely similar ways over a variety of situations.” In other words, a mechanism is supposed to describe a variety of similar political situations, not one particular situation.
Temporality is an essential characteristic of social mechanisms, which take place in time (Mayntz Reference Mayntz2004). That said, causal mechanisms, especially complex ones, do not always unfold in a linear manner; there may be branch causal chains, escalation processes and feedback loops. For example, during the early stage of a bank run, some depositors may not firmly hold to their belief that the bank is in financial difficulty and so hesitate to withdraw their money from the bank. However, when they see long queues of people outside the branches of the bank trying to get back their deposits, their belief is strengthened and motivates them to join the queue. This action in turn motivates more depositors to follow suit. This feedback loop escalates the process of deposit withdrawal.
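The escalation dynamic just described can be caricatured in a toy simulation. All parameters and figures below are hypothetical, invented purely to illustrate the feedback loop: each round, a fraction of the remaining depositors withdraws, and the sight of the queues converts a further fraction of the rest, until the bank’s reserves are exhausted.

```python
# Toy model of the bank run feedback loop; all figures are hypothetical.
def simulate_bank_run(depositors=1000, reserves=500_000, deposit=1_000,
                      withdrawing=0.05, contagion=0.5):
    """Each round, a fraction of remaining depositors withdraws; visible
    queues then convert a further fraction of the rest (the feedback
    loop), escalating withdrawals until reserves run out."""
    remaining = depositors
    rounds = 0
    while reserves > 0 and remaining > 0:
        n = int(remaining * withdrawing)
        if n == 0:
            break  # the belief never catches on; the run peters out
        reserves -= min(n * deposit, reserves)
        remaining -= n
        # feedback: queues strengthen the belief of the remaining depositors
        withdrawing = min(1.0, withdrawing * (1 + contagion))
        rounds += 1
    return rounds, reserves

rounds, reserves = simulate_bank_run()
print(rounds, reserves)  # reserves are exhausted within a few rounds
```

The sketch also shows the locality of the process noted below: with a weaker feedback parameter (a slower spread of the belief), the same entities and activities need not run their course to insolvency.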
Most of the mechanisms studied by natural or social scientists are unobservable or hidden and thus their description usually contains concepts that do not appear in empirical data (Bunge Reference Bunge2004). We do not see or observe the mechanism of self-fulfilling prophecy per se in an actual bank run. More likely, we read news about depositors scrambling to get back their money, their belief about the bank’s financial situation and the bank trying to satisfy these depositors. We then link these entities and activities together and compare the information with our knowledge of self-fulfilling prophecy to determine whether our observation fits the mechanism. For new mechanisms, researchers have to use their skills of reasoning and imagination:
To discuss mechanisms is to reason about possible and plausible states of the world as they bear on a particular causal relationship. In reasoning, writers build on their knowledge of a particular context and on general knowledge of the world. They may also play out elaborate reconstructions of the events as they actually occurred or might have occurred to test the relative plausibility of various hypotheses.
Generally speaking, mechanisms cannot be inferred from empirical data and have to be conjectured (Bunge Reference Bunge1997). For instance, the mechanism of self-fulfilling prophecy cannot be inferred from the data about a bank run; it has to be conjectured by researchers “with imagination both stimulated and constrained by data, well-weathered hypotheses, and mathematical concepts such as those of number, function, and equation” (Bunge Reference Bunge2004: 200). As Harré (Reference Harré1970: 40) aptly put it about half a century ago, “making models for unknown mechanisms is the creative process in science, by which potential advances are initiated, while the making of models of known things and processes has, generally speaking, a more heuristic value.” A certain degree of creativity on the part of the researcher is needed. Interestingly, Stinchcombe (Reference Stinchcombe1968: 13) once noted, “a student [of sociology] who has difficulty thinking of at least three sensible explanations for any correlation that he is really interested in should probably choose another profession.” A staunch advocate of theorizing with mechanisms in the social sciences, Stinchcombe was referring to conjecturing at least three different mechanisms that could explain a correlation between variables.
It goes without saying that a conjectured mechanism has to be empirically testable in order to be regarded as scientific. A conjectured mechanism must have survived empirical tests before it can be regarded as true to some degree and, according to Popper (Reference Popper1959), the mechanism should also be falsifiable (i.e., there is a chance of it being refuted by empirical tests). Pseudoscientific or superstitious practices, such as parapsychology, telepathy, astrology, horoscopes, feng shui (風水) and faith healing, do not belong to the scientific domain unless they are based on falsifiable mechanisms; such practices, however, rarely work other than by offering a placebo effect (Bunge Reference Bunge2004).
Causal Inference
Mechanisms assist researchers in making causal inferences in two main ways:
On the positive side, we can infer that X is a cause of Y if we know that there is a mechanism through which X influences Y. The negative flip side is that if no plausible mechanism running from X to Y can be conceived of, then it is safe to conclude that X does not cause Y, even if the two variables are probabilistically dependent.
In particular, the second way implies that mechanisms can help to address the problem of confounders, which refer to common causes that explain observed but spurious correlations. We can exclude the possibility of a spurious correlation between two variables if we can formulate a plausible mechanism that links the variables directly in the circumstances (Little Reference Little1991). In the earlier example, where peak hour traffic in Dallas was positively correlated with lipstick sales, it is virtually impossible to come up with a plausible mechanism through which peak hour traffic would affect lipstick sales or vice versa. Hence the correlation is spurious. Here, the confounder was the level of employment, which affected both peak hour traffic and lipstick sales. As discussed above, a plausible mechanism can be conjured to link the level of employment with peak hour traffic or lipstick sales. Of course, a mechanism approach is not the only way to deal with the problem of confounders; vector space modeling, for instance, can also tackle the problem.
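The logic of the confounder can be illustrated with a small simulated example (the figures are entirely hypothetical): an “employment” variable drives both a “traffic” and a “lipstick sales” series, which therefore correlate strongly with one another even though neither causes the other, and the correlation all but vanishes once the common cause is controlled for.

```python
import random

random.seed(42)  # deterministic toy data

# Hypothetical confounder: employment level drives both peak hour
# traffic and lipstick sales; neither causes the other.
n = 1000
employment = [random.gauss(0, 1) for _ in range(n)]
traffic = [e + random.gauss(0, 0.5) for e in employment]
lipstick = [e + random.gauss(0, 0.5) for e in employment]

def corr(xs, ys):
    """Pearson correlation coefficient."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def residualize(ys, xs):
    """Strip the linear effect of xs out of ys (simple OLS residuals)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    beta = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))
    return [(y - my) - beta * (x - mx) for x, y in zip(xs, ys)]

# Strong but spurious correlation between traffic and lipstick sales...
print(round(corr(traffic, lipstick), 2))
# ...which all but disappears once employment is controlled for
print(round(corr(residualize(traffic, employment),
                 residualize(lipstick, employment)), 2))
```

The simulation makes the negative inference concrete: the observed dependence between the two variables is fully accounted for by the common cause, which is exactly why no plausible direct mechanism between them can be formulated.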
However, Steel (Reference Steel2004: 65) suggests that cases similar to the lipstick sales example are “too few and far between for the no-plausible-mechanism strategy to be of much use in distinguishing cause from mere correlation in social science.”5 A good example is the positive correlation between opportunity for advancement and level of frustration with the promotion system, found by Stouffer et al.’s (Reference Stouffer, Suchman, Devinney, Star and Williams1949) study of American soldiers in World War II. Contrary to common sense, soldiers in the military police, which offered relatively few promotion opportunities, were on average more satisfied with the promotion system than those in the army air corps, which offered more opportunities. Gambetta (Reference Gambetta, Hedström and Swedberg1998) summarizes five mechanisms that have been proposed by sociologists over the years to account for Stouffer et al.’s counterintuitive finding. For example, Merton (Reference Merton1957: 237) proposes that a “generally high rate of mobility induces excessive hopes among members of the group so that each is more likely to experience a sense of frustration in his present position and dissatisfaction with the chances of promotion.” Here, the mechanism is one of excessive hopes that led more soldiers in the army air corps to frustration.
Gambetta (Reference Gambetta, Hedström and Swedberg1998) fails, however, to consider the possibility that opportunity for advancement had little or no significant effect on the level of frustration with the promotion system. Steel (Reference Steel2004: 65) puts forward a possibility that Stouffer et al.’s finding was due to a confounder: “extremely ambitious people are much more likely to embark on career paths that promise greater opportunities for advancement and that their lofty aspirations are also more likely to make them dissatisfied with their current stations in life.” Steel’s self-selection bias argument implies that with sufficient imagination and luck, researchers can often conjure up mechanisms linking variables together. Given the complexity of human psychology, I believe that the five mechanisms reviewed by Gambetta and the confounder suggested by Steel were all possible in reality; that is, soldiers’ levels of frustration could be accounted for by one or more of these mechanisms. If Stouffer and his colleagues had asked their subjects the reasons for their frustration over or satisfaction with the promotion system, they would have had a much better understanding of the mechanism(s) underlying their finding.
Amid the hype around the mechanismic turn in the social sciences − in particular, management (Anderson et al. Reference Anderson, Blatt, Christianson, Grant, Marquis, Neuman, Sonenshein and Sutcliffe2006) − during the past three decades or so, Gerring (Reference Gerring2010: 1504) issued a cautionary note by questioning the turn’s novelty:
It seems unlikely that anyone has ever published an article or book in any field that merely announces a covariational result as causal without any discussion of possible causal mechanisms. These are things that social scientists do, more or less self-consciously, when they argue about causes.
For instance, when Hempel − an arch-positivist who supposedly had serious reservations about the concept of causal mechanisms − attempted to substantiate the point that general laws serve similar functions in history as in the natural sciences, he offered the following description of what brings about a revolution:
if a particular revolution is explained by reference to the growing discontent, on the part of a large part of the population, with certain prevailing conditions, it is clear that a general regularity is assumed in this explanation, but we are hardly in a position to state just what extent and what specific form the discontent has to assume, and what the environmental conditions have to be, to bring about a revolution.
Surely, he used mechanismic wording here. What distinguishes a mechanism-centered approach may be that researchers are more explicit in their theorizing with mechanisms and take more seriously the specification of discrete causal pathways compared to their peers following a different approach.
Process Tracing
As mentioned, if we know that there is a mechanism through which X affects Y, we can infer that X is a cause of Y. The question then becomes: How do we know that a mechanism exists? Little (Reference Little1991: 30) proposes two ways that researchers may acquire knowledge of mechanisms:
To credibly identify causal mechanisms we must employ one of two forms of inference. First, we may use a deductive approach, establishing causal connections between social factors based on a theory of the underlying process …. Second, we may use a broadly inductive approach, justifying the claim that a caused b on the ground that events of type A are commonly associated with event of type B …. But in either case the strength of the causal assertion depends on the discovery of a regular association between event types.
Little’s suggestion implies that researchers possess some prior knowledge − theoretical knowledge in the case of deduction and empirical knowledge in the case of induction − of the connections between events. Since a mechanism often consists of a series of events, both kinds of knowledge may be needed to identify the mechanism.
Little’s method may be categorized under a more general approach called “process tracing,” which, in the case of social mechanisms, “consists in presenting evidence for the existence of several prevalent social practices that, when linked together, produce a chain of causation from one variable to another” (Steel Reference Steel2004: 67). Simply put, process tracing means tracing a mechanism. A successful instance of process tracing provides empirical support for the existence of a mechanism linking the variables of interest. Process tracing is a fundamental tool of qualitative analysis, usually employed by researchers who carry out within-case analysis (Collier Reference Collier2011). It can also be used in tandem with quantitative methods: “process tracing is used to establish qualitative claims about causal structure, and statistical analysis is called on to estimate the strengths of these relationships” (Steel Reference Steel2004: 72).
To explicate process tracing, Bennett (Reference Bennett, Brady and Collier2010) uses the analogy of a detective attempting to solve a crime by examining clues and potential suspects and collecting evidence that bears on suspects’ motives, means and opportunity to have committed the crime. To uncover variables not previously considered in a mechanism, researchers may conduct process tracing backward from observed outcomes to potential causes, or forward from hypothesized causes to eventual outcomes. Similarly, after a crime has occurred, a detective can work forward from potential suspects as well as backward from clues about the crime. In our bank run example, researchers may start their investigation by interviewing depositors who were among the first to withdraw their money or interviewing bank executives who were struggling to maintain the bank’s solvency. Needless to say, the assumption here is that research access is readily available.
Since detailed guidelines for process tracing already exist (e.g., Collier Reference Collier2011; Beach and Pedersen Reference Beach and Pedersen2019), discussing these techniques is beyond the scope of this section. Yet, it is worth presenting, in a simplified manner, Van Evera’s (Reference Van Evera1997) classification of four tests according to whether a piece of evidence is necessary and/or sufficient for accepting a hypothesis. The classification is useful for evaluating evidential strength in process tracing. As shown in Figure 2.4, the four tests are called “straw in the wind,” “smoking gun,” “hoop” and “doubly decisive.” Let’s illustrate the tests by returning to the example of self-fulfilling prophecy in the context of a bank run. Admittedly, this illustration may not reflect today’s sophisticated banking and payment systems; within this limitation, my aim is to demonstrate the nature of each of the four tests. Suppose that, after observing some events, a researcher wants to know whether a bank run has occurred. The hypothesis here is that the mechanism of self-fulfilling prophecy has begun. The following events, corresponding to the four tests, provide different extents of support for the hypothesis.
Straw in the wind: Outside the branches of a bank, there are long queues of depositors trying to withdraw their money. Is this a piece of evidence that a bank run of the self-fulfilling prophecy kind has started? No, the evidence is neither necessary nor sufficient for establishing the hypothesis. Depositors in large numbers may withdraw their money for various reasons, such as shopping during holidays. An essential element of the self-fulfilling prophecy is depositors’ belief that the bank is experiencing financial trouble and so it is risky to keep their money with the bank. Without knowing the depositors’ reasons, the observation of the long queues provides little information that favors or calls into question the researcher’s hypothesis.
Smoking gun: Here, there is additional information about the above event: these depositors believe that the bank is experiencing financial trouble. In this case, the event confirms the hypothesis. However, depositors may withdraw their money through other means, such as writing checks and transferring funds from their bank accounts to their mutual fund accounts. A bank run due to the same depositors’ belief may occur without long queues outside its branches. In other words, the evidence is sufficient but not necessary for supporting the hypothesis. As Van Evera notes, when the suspect is holding a smoking gun right after a murder, this evidence implicates the suspect clearly. However, the absence of the gun does not exonerate the suspect.
Hoop: The term “hoop” here means that the hypothesis must “jump through the hoop” posed by a piece of evidence in order not to be eliminated; passing the hoop test, however, does not strongly support the hypothesis in question. Suppose there is reliable evidence that a large number of a bank’s depositors believe that the bank is having financial difficulties. Yet, most of these depositors have not acted on their belief and withdrawn money from the bank because they expect the government to assist banks during a financial crisis. As mentioned above, the belief is a necessary element of the self-fulfilling prophecy, but the belief alone is not sufficient for the bank run to occur.
Doubly decisive: Further to the above hoop test example, suppose the depositors do act on their belief and withdraw money from the bank through various channels. In this case, the evidence is both necessary and sufficient for confirming the hypothesis. Van Evera uses the example of a bank camera photographing the faces of all robbers, thereby implicating those caught by the camera and exonerating others.
While one piece of “doubly decisive” evidence may suffice to confirm a hypothesis and is better than many pieces of “straw in the wind” evidence, such high-quality data are hard to come by in reality. Instead, researchers may collect “hoop” and “smoking gun” data that together provide evidence of equivalent strength to “doubly decisive” evidence (Van Evera Reference Van Evera1997).
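Van Evera’s four tests amount to a simple two-by-two classification of evidence along the dimensions of necessity and sufficiency. As a rough illustration only (the function name, boolean encoding and example calls are my own, not Van Evera’s), the logic behind the classification can be sketched as:

```python
def van_evera_test(necessary: bool, sufficient: bool) -> str:
    """Classify a piece of evidence by Van Evera's (1997) four tests.

    necessary:  failing the test eliminates the hypothesis.
    sufficient: passing the test confirms the hypothesis.
    """
    if necessary and sufficient:
        return "doubly decisive"   # passing confirms; failing eliminates
    if sufficient:
        return "smoking gun"       # passing confirms; failing is inconclusive
    if necessary:
        return "hoop"              # failing eliminates; passing is weak support
    return "straw in the wind"     # neither confirms nor eliminates

# The bank run illustrations above:
# long queues alone are neither necessary nor sufficient
print(van_evera_test(necessary=False, sufficient=False))  # straw in the wind
# queues plus the belief that the bank is in trouble: sufficient, not necessary
print(van_evera_test(necessary=False, sufficient=True))   # smoking gun
```

The sketch also makes plain why combining a “hoop” result with a “smoking gun” result approximates a “doubly decisive” one: together they supply both the necessity and the sufficiency conditions.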
Process tracing is not without limitations. Bennett (Reference Bennett, Brady and Collier2010) mentions two key problems: degrees of freedom and infinite regress. The former is a usual problem of case studies in which there are too few cases included in a sample relative to the large number of variables studied. The latter is more specific to process tracing: attention to exceedingly fine-grained details of mechanisms can result in an infinite regress of studying “causal steps between any two links in the chain of causal mechanisms” (King et al. Reference King, Keohane and Verba1994: 86). Suppose researchers investigate a bank run and interview depositors who believe that the bank is experiencing financial trouble. After hearing that, say, a depositor’s belief came from their friend, they may investigate further how the two individuals communicated for the belief to be transferred, what cognitive processes were involved and so on. Basically, this issue concerns the “stopping point” − that is, when an inquiry into a mechanism should stop. For example, Checkel (Reference Checkel2006), in his study of the mechanism of socialization in the Council of Europe, broke the mechanism into three sub-mechanisms, namely strategic calculation, role playing and persuasion, and focused on persuasion. The question then becomes: why should he stop at this point? It is plausible that persuasion could be broken down into further sub-mechanisms, such as more micro cognitive processes.



